Many CMS-backed sites are built on MySQL and launched on cloud infrastructure. To mitigate downtime from regional outages, it is advisable to build geo-distributed redundancy into both the app layer and the database. GenieDB makes it easy to set up multiple MySQL database servers around the world that are automatically kept synchronized as data changes on any of the nodes. The database nodes are typically paired one-to-one with an app or web server, and some of our customers use those app servers to serve their CMS-backed sites. The database is kept synchronized, but customers still need a way to make the static media content available on all of these app/web servers. Below is a simple setup that can be configured on a very small budget and provides high availability for both the data and the static content during an outage.
While some of our customers use traditional CDN solutions such as CloudFront, others wanted to roll their own. It is for this latter group that we developed this “Poor Man’s CDN”. The key to this approach is the ability to keep a particular directory of important digital content assets synchronized across multiple machines in multiple geographies.
We used Linux’s inotify and rsync over SSH for this purpose. This provides a fairly simple and secure mechanism, as long as the same file is not changed on multiple nodes simultaneously. Since static content typically changes far less often than the database, this is usually not a challenge for most applications. (Note: for the database, GenieDB provides an automatic conflict-resolution process, but since the static content lives outside the database, these minor precautions need to be taken at the application layer.)
inotify provides a mechanism to monitor the file system for events. We use Watcher, a Python library that provides some nice management routines on top of pyinotify, a Python wrapper around the Linux inotify calls.
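As a rough sketch of the prerequisites (package names and the /opt/watcher path are our assumptions; adjust for your distribution and for wherever you obtain the Watcher project files):

#install pyinotify, the wrapper Watcher builds on
sudo pip install pyinotify
#place watcher.py and its config somewhere convenient, e.g. /opt/watcher
sudo mkdir -p /opt/watcher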
Setup is simple. Specify the directory to watch in watcher.ini:
[job1]
watch=/var/www/uploads
events=create,delete,modify,move_from,move_to
recursive=true
#shell script to execute when changes are detected.
command=sync_www_directories.sh
Change the permissions on watcher.py to allow execution and we are halfway there.
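A one-liner like the following should do it (assuming watcher.py is in the current directory):

chmod +x watcher.py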
The remaining piece is the script configured under ‘command’, which runs when changes are detected. The script is a fairly simple affair:
#!/bin/bash
#script to sync the /var/www/uploads directories

#check if rsync is already running and quit if it still is
RUNNING=$(ps --no-headers -C rsync | wc -l)
if [ "$RUNNING" -ge 1 ]
then
    exit 1
fi

#Using root to simplify. You should create a specific user for this purpose.
#Note the trailing slash on the source so rsync copies the directory's
#contents rather than nesting an extra uploads/ inside the destination.
/usr/bin/rsync --exclude '*.tmp' --delete --quiet -a /var/www/uploads/ firstname.lastname@example.org:/var/www/uploads &
/usr/bin/rsync --exclude '*.tmp' --delete --quiet -a /var/www/uploads/ email@example.com:/var/www/uploads

exit 0
This script syncs the indicated directory to the other nodes using rsync over SSH. Just replace the sample user@host destinations above with your actual values. The watcher script can then be started using
./watcher.py -c watcher.ini start
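As a quick sanity check (the remote user and host below are placeholders), create a file on one node, give watcher a moment to trigger rsync, and confirm the file shows up on another node:

touch /var/www/uploads/sync-test.txt
ssh user@remote-host ls -l /var/www/uploads/sync-test.txt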
To automatically run watcher on machine startup, so that it survives reboots, alter /etc/rc.local to include the command above.
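For example, the tail of /etc/rc.local might look like this (the /opt/watcher path is an assumption; use wherever you installed watcher.py):

#!/bin/sh -e
# ... existing rc.local contents ...
#start watcher on boot
cd /opt/watcher && ./watcher.py -c watcher.ini start
exit 0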
We also need to configure SSH keys so that rsync can log on to the other machines without a password. This can be done using the standard mechanism of creating a key pair and adding the public key to the authorized_keys files on the various servers. These steps need to be repeated on all the servers that make up your “CDN”.
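A typical key setup looks like the following (run as the user that executes rsync; the remote address is a placeholder):

#generate a key pair with an empty passphrase so rsync can run unattended
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
#append the public key to the remote node's authorized_keys
ssh-copy-id user@remote-host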
In conjunction with GenieDB’s automatic management of the database across multiple regions, the procedure above creates a very cost-effective way to achieve high availability during a regional outage, as well as better application response times for distributed users.
That’s it & best of luck! Do let us know if you make any enhancements.