Simple Site Backup Pattern Using S3 and Hadoop

The theory here is that you want to back up your website’s document root and MySQL database on a daily basis. Storing the backup file on your web server is fine if you screw up your site and need to revert easily, but it’s no help if you lose the server. The best approach is to keep a copy on your web server for easy access and also store one offsite in case of catastrophe.

 

In this tutorial, we’ll keep 7 days of backups in a local /backup/ directory and store 30 days of backups on Amazon’s S3. To put the files onto S3, we’re going to use Hadoop! Not because I plan on running Map/Reduce on my backups, but because it provides a simple command-line method for putting files into S3. It’s easier than writing my own program to store files on S3.
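
To make the retention idea concrete, here is a rough Python sketch of the two pieces described above: deciding which dated backups have aged out of a 7-day or 30-day window, and building the Hadoop command line that copies one file to S3. This is not the script from the full article. The `site-YYYY-MM-DD.tar.gz` naming scheme, the helper names, and the `s3n://example-bucket/backups/` URI are all assumptions made for illustration; the right URI scheme and credential setup depend on your Hadoop version.

```python
import datetime as dt
import re

LOCAL_KEEP_DAYS = 7    # days of backups kept in the local /backup/ directory
REMOTE_KEEP_DAYS = 30  # days of backups kept on S3

# Assumed naming scheme (hypothetical): site-YYYY-MM-DD.tar.gz
NAME_RE = re.compile(r"^site-(\d{4})-(\d{2})-(\d{2})\.tar\.gz$")


def backup_name(day):
    """Return the (hypothetical) archive name for a given date."""
    return "site-{}.tar.gz".format(day.isoformat())


def expired(names, today, keep_days):
    """Return the backup names whose date falls outside the retention window.

    With keep_days=7, today's backup plus the six before it are kept.
    """
    cutoff = today - dt.timedelta(days=keep_days)
    old = []
    for name in names:
        m = NAME_RE.match(name)
        if m is None:
            continue  # leave files that are not backups alone
        day = dt.date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        if day <= cutoff:
            old.append(name)
    return sorted(old)


def hadoop_put_command(local_path, remote_dir):
    """Build the argv for `hadoop fs -put`, copying one file to remote_dir.

    remote_dir is an assumption here, e.g. "s3n://example-bucket/backups/".
    """
    return ["hadoop", "fs", "-put", local_path, remote_dir]
```

In a daily cron job you would pass the contents of /backup/ to `expired(...)` with `LOCAL_KEEP_DAYS`, delete whatever it returns, and hand the result of `hadoop_put_command(...)` to `subprocess.run` to upload today’s archive. The same `expired(...)` check with `REMOTE_KEEP_DAYS` can drive cleanup on the S3 side.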

 

Note: In the past, I’ve written an article on storing backups on S3 using a deduplication technique. It’s pretty clever and reduces the total disk space consumed on S3. But it’s much more complex, and if you lost your web server and needed access to the backup files, you’d have to reconstruct all the code to reassemble your files. That would be a pain in a pinch. So, if you just want a super simple way to back up your files, one that lets you easily retrieve them from any machine or browser, this is your article.

Read more

Scalability Rules – Review

I just read the book “Scalability Rules: 50 Principles for Scaling Web Sites” by Martin L. Abbott and Michael T. Fisher. I’d like to start out by saying that having the opportunity to meet the authors of this book was an honor. I only wish that I had read the book before meeting them. I’m inspired by this book; the length of this blog post should be a testament to that.

 

The book was an easy read and spot on. You can read the whole book in one sitting; I did, one Sunday afternoon. Many times I marveled at how we’re all coming to the same realizations at different companies. I’ve been living this through experience, working at a very fast-growing Internet company with millions of customers, dozens of SaaS-based services, and several data centers, some international. What I liked about this book was the affirmation of beliefs I share with those I work with. I could demonstrate examples of nearly all of these rules across our array of online services. There were plenty of aha moments! This book is a great introduction to many (all the important ones?) advanced web application scalability topics. If you think you already know them all, think again. Give this book a read. If you’re already an advanced-level web app architect, you’ll breeze over much of it, then get an eye-opening surprise or three.

 

I’d like to reinforce how much I enjoyed the affirmation of my own beliefs, and the eye-openers. Never before have I seen all of these principles/rules/beliefs (whatever you want to call them) together in one easy-to-reference book. I’m going to buy many copies of this and hand them out at work, with the instruction: we should all know these rules, inside and out, through our combined experiences, and this book sums them all up. This is a must-have reference for the desk.

 

Scalability Rules is very modern, in that it discusses the very latest in large-scale web application trends. These aren’t the principles from 2000 or 2005; this is the culmination of all the latest trends, up to 2010 and 2011. Seriously, back in 2005, this stuff hadn’t surfaced yet. Some of the horizontal scaling principles existed, but the more modern techniques (sharding, NoSQL, page caches, object caches, CDNs, and more) didn’t yet have enough sustained experience behind them for all of us to know whether they were really worth the trouble. Very few sites in 2005 required much more than 2 or 3 web servers behind a load balancer and a database. I anticipated the growth that was about to happen, but it’s hard to really know what it’s like until you live it.

 

Here are my brief comments on each of the rules:

Read more