Into the Cloud

Cloud environments offer enticing cost savings for Web businesses, but they present some interesting challenges. One is that clouds tend to provide large numbers of relatively lightweight virtual servers, any of which may fail; high availability is meant to happen in your application, by designing it to tolerate virtual servers coming and going.

This is great for Web servers, and with a little cleverness, it even works well for very simple key-value data stores. However, it’s not a great fit for the database layer: traditional database products rely on a small number of powerful servers, one or more of which may be a single point of failure, so they are usually deployed on highly redundant hardware.

Here at GenieDB, we’ve spent the last few years implementing a high-level database designed to operate transparently on a cluster of small, unreliable servers, so we’ve been looking at how we can help in cloud environments.

Our existing core technology is a perfect match: we already support zero-downtime dynamic reconfiguration of clusters (planned and unplanned), with access to the data via independent MySQL instances on each server. We provide the software to run on each server, along with instructions on how to seamlessly add and remove servers from a cluster without downtime.

But many users won’t have an existing cluster management infrastructure in place, and configuring each node manually is no way to run a cluster. So as well as documenting the low-level operations on nodes, we’ve written something we call Cluster Tool, which handles bulk operations on clusters. Given some software installation tarballs and a configuration file listing the servers in your cluster and how you’d like them configured, Cluster Tool will ssh into those servers, install our software, configure it, and correctly follow the processes required. It files away a copy of your configuration, so when it’s presented with a new cluster configuration (which may add, change, or remove servers), it can decide which servers need uninstalling, which need upgrading, which need reconfiguring, and which need installing, to correctly migrate the cluster to its new state. We ship Cluster Tool with its source code, so you can use it as-is, read the source as a reference implementation alongside the low-level management documentation, modify it to fit into your existing cluster management practices, or just wrap it in your own software that feeds it new cluster configurations.
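To make the idea of migrating between configurations concrete, here is a minimal Python sketch of how a stored configuration might be compared with a new one. It is not Cluster Tool’s actual code or file format: `NodeSpec`, its fields, `plan_migration`, and the host names are all hypothetical, chosen only to illustrate the four outcomes described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Desired state of one server (hypothetical fields, for illustration only)."""
    version: str     # software version that should be installed
    settings: tuple  # (key, value) pairs describing the node's configuration


def plan_migration(old, new):
    """Compare two {hostname: NodeSpec} mappings and bucket hosts by action."""
    plan = {"install": [], "uninstall": [], "upgrade": [], "reconfigure": []}
    for host in sorted(set(old) | set(new)):
        if host not in old:
            plan["install"].append(host)       # new to the cluster
        elif host not in new:
            plan["uninstall"].append(host)     # dropped from the cluster
        elif old[host].version != new[host].version:
            plan["upgrade"].append(host)       # software version changed
        elif old[host].settings != new[host].settings:
            plan["reconfigure"].append(host)   # same version, new settings
    return plan


# The stored copy of the previous configuration...
old = {
    "db1": NodeSpec("1.0", (("port", 3306),)),
    "db2": NodeSpec("1.0", (("port", 3306),)),
    "db3": NodeSpec("1.0", (("port", 3306),)),
}
# ...and the new configuration it is asked to migrate to.
new = {
    "db1": NodeSpec("1.1", (("port", 3306),)),
    "db2": NodeSpec("1.0", (("port", 3307),)),
    "db4": NodeSpec("1.1", (("port", 3306),)),
}

plan = plan_migration(old, new)
```

In this sketch an upgrade takes priority over a reconfiguration when both the version and the settings differ, on the assumption that an upgrade would reapply the settings anyway; a real tool may well order or combine these steps differently.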

With these foundations in place, automating cloud deployments was easy. We produced Cloud Tool, which uses a pluggable backend to manage virtual servers from various cloud providers. To grow a cluster, it requests more servers, then adds them to the Cluster Tool configuration, and asks Cluster Tool to add them to the cluster. When they are no longer needed, it asks Cluster Tool to remove them from the cluster, then it requests that the servers be de-provisioned.
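The grow and shrink sequence can be sketched in a few lines of Python. Again, this is an illustration under assumptions rather than Cloud Tool’s real interface: `CloudBackend`, `InMemoryBackend`, the `CloudTool` class shown here, and the `apply_config` callable standing in for Cluster Tool are all hypothetical names, and the backend makes no real provider calls.

```python
from abc import ABC, abstractmethod


class CloudBackend(ABC):
    """Hypothetical plug-in interface; one subclass per cloud provider."""

    @abstractmethod
    def provision(self, count):
        """Start `count` servers and return their host names."""

    @abstractmethod
    def deprovision(self, hosts):
        """Release the given servers."""


class InMemoryBackend(CloudBackend):
    """Stand-in provider that only tracks host names; no real cloud calls."""

    def __init__(self):
        self.running = set()
        self._next_id = 1

    def provision(self, count):
        hosts = []
        for _ in range(count):
            hosts.append(f"vm{self._next_id}")
            self._next_id += 1
        self.running.update(hosts)
        return hosts

    def deprovision(self, hosts):
        self.running.difference_update(hosts)


class CloudTool:
    """Sketch of the ordering: servers exist before joining, leave before release."""

    def __init__(self, backend, apply_config):
        self.backend = backend
        self.apply_config = apply_config  # callable standing in for Cluster Tool
        self.members = []

    def grow(self, count):
        new_hosts = self.backend.provision(count)  # 1. request more servers
        self.members.extend(new_hosts)             # 2. add them to the configuration
        self.apply_config(list(self.members))      # 3. ask for the cluster to be migrated
        return new_hosts

    def shrink(self, hosts):
        self.members = [h for h in self.members if h not in hosts]
        self.apply_config(list(self.members))      # 1. remove them from the cluster
        self.backend.deprovision(hosts)            # 2. only then release the servers


applied = []  # records each configuration handed to the stand-in Cluster Tool
backend = InMemoryBackend()
tool = CloudTool(backend, applied.append)
tool.grow(3)
tool.shrink(["vm2"])
```

The point of the sketch is the ordering: a server is provisioned before it is added to the cluster, and it is removed from the cluster before it is de-provisioned, so the cluster never lists a machine that does not exist.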

It’s as simple as that! And like Cluster Tool, it’s provided as source code, and only uses documented interfaces in the software it’s built on top of (in this case, Cluster Tool itself), so you can build upon it in your own ways, such as running hybrid clusters based on privately owned hardware that expand out onto cloud servers during peak load.
