Scaling MySQL Vertically, The Sun Way

This is a response to a response to an article about scaling MySQL on a Sun server with 256 hardware threads.

With x86-based processors, you get high-speed threads, but only a few (up to four) per processor. With the Sun CoolThreads T2 processor, you get slower threads, but a lot of them: 64 per processor. The 256-way server is four of these processors on one board.

MySQL, as a standalone instance, cannot take advantage of 256 threads in any efficient manner. You can, however, run a lot of instances on a single server. The debate in the original response was about whether this is beneficial for any application with a MySQL backend. Although a good architecture for an application will allow horizontal scaling, isn’t the Sun approach still horizontal scaling? It smells like vertical scaling, because there is a whole lot more processor in a single server. But we’re not making one monster MySQL instance, either. We’re making 28 lower-performing instances. Figure each CoolThreads thread runs at about half the speed of a Xeon thread, so the 28 instances can do the work of 14 full-powered Xeon instances. Key words: the work of. This is only beneficial for applications where total work performed is more important than the speed of a single thread. Instead of having one instance do the work, you spread the work out across many instances.
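
Here is that back-of-the-envelope arithmetic as a tiny Python sketch. The function names are mine, and the 0.5 relative speed is just the ballpark figure from above, not a measurement.

```python
def xeon_equivalents(instances, relative_speed):
    """Total work the slow instances can do, counted in full-speed Xeon instances."""
    return instances * relative_speed


def single_query_time(xeon_seconds, relative_speed):
    """How long one query takes on a slower thread, given its time on a Xeon thread."""
    return xeon_seconds / relative_speed


# 28 instances at roughly half speed: the work of 14 Xeon instances.
print(xeon_equivalents(28, 0.5))

# But a query that takes 1 second on a Xeon thread takes about 2 seconds here.
print(single_query_time(1.0, 0.5))
```

The first number is why the box looks good for total work; the second is why it does nothing for a single slow query.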

If my application requires a single master to withstand a high rate of transactions that are read and write intensive to the disk (forget pure RAM tables or read-heavy applications, as they are exceptions), and I need to scale vertically, the Sun T5440 isn’t the solution.

The response article contradicted itself by saying that a distributed database is a good architecture, but later arguing that it’s not practical. I think what he’s saying is that it’s a good idea, and is being done by a few who have mastered the art, but most web application software is not written this way. Imagine you have a web site that uses, say, WordPress, and your site gets really popular. You move to a dedicated server, maybe even a grid hosting environment (multiple web servers behind a load balancer). You separate your web and database tiers by putting the database on a standalone server behind the web server(s). As the site gets more and more popular, you throw more web servers at the problem, but soon realize the bottleneck is the database tier. You need to scale it. Moving to a Sun T5440 isn’t going to solve the problem, unless WordPress is completely rewritten to make use of a multi-instance database architecture. That isn’t very feasible. OK, so what!? This isn’t the correct application of the T5440. MrBenchmark never presumed that it is.

So, what is an example where the Sun T5440 is a good fit? Any place where having many lower-performance instances is preferred. One example is shared hosting. Any hosting provider that offers MySQL databases (all of them) supports thousands of standalone MySQL databases across many servers.

Next comes the question of price. The Sun servers cost a lot more than commodity servers. A T5440 is about $80K. Figure a typical Dell 2950 costs $5K. Since a Dell instance is twice as fast as a Sun instance, you’d have to buy 14 Dells to do the same amount of work as the Sun T5440, assuming 28 Sun instances is the sweet spot, according to MrBenchmark’s article. So, 14 Dells, at $5K each, will cost $70K. Sounds like Dell wins, but 14 Dell 2950s take up 28U of rack space, compared to 4U for the single T5440. Now you have to calculate data center costs, and power and cooling costs. The T5440 runs on two 1120 W power supplies; each Dell runs on a single 675 W power supply. The 14 Dell servers may draw about four times the power of the single T5440. To calculate which one wins, you have to break out the spreadsheet, figure out how much power they’ll actually consume, figure out what the kWh cost is, and what rent costs per U at the datacenter. From a 50,000-foot view, it looks like Sun is going to win the challenge.
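
If you’d rather not open a spreadsheet, the same comparison can be roughed out in a few lines of Python. The prices, rack units, and power supply ratings are the ones from the paragraph above. Everything else is an assumption I made up for illustration: the three-year window, $0.10 per kWh, $50 per U per month, and servers drawing 60% of their power supply rating. Cooling is left out entirely. Plug in your own numbers.

```python
HOURS_PER_YEAR = 24 * 365


def total_cost(unit_price, units, watts_per_unit, rack_u_per_unit,
               years, usd_per_kwh, usd_per_u_month, load_factor):
    """Acquisition + electricity + rack rent for a fleet of identical servers."""
    acquisition = unit_price * units
    # Power supply ratings are a ceiling; load_factor scales them down
    # to a guessed average draw.
    kwh = watts_per_unit * units * load_factor * HOURS_PER_YEAR * years / 1000
    power = kwh * usd_per_kwh
    rent = rack_u_per_unit * units * usd_per_u_month * 12 * years
    return acquisition + power + rent


# Made-up operating assumptions (not from any real data center quote).
assumed = dict(years=3, usd_per_kwh=0.10, usd_per_u_month=50, load_factor=0.6)

# One T5440: $80K, two 1120 W power supplies, 4U.
sun = total_cost(80_000, 1, 2 * 1120, 4, **assumed)

# Fourteen Dell 2950s: $5K each, one 675 W power supply, 2U each (28U total).
dell = total_cost(5_000, 14, 675, 2, **assumed)

print(f"Sun T5440:    ${sun:,.0f}")
print(f"14 Dell 2950: ${dell:,.0f}")
```

With these particular guesses the Sun comes out ahead, mostly because of rack rent. Set the rent and power prices low enough and the answer flips, which is exactly why you have to run your own numbers.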

In the previous paragraph, I ignored disk and RAM. One thing to consider is that each of the Dell 2950s will be loaded with 6 internal hard drives, and will have 16 GB of RAM (or 32 or 64). 14 * 16 GB = 224 GB of RAM. Presumably, a T5440 with 64 or 128 GB of RAM will be enough. Hard disks are another question: both configurations need them. You can only put 4 drives in the T5440, so some external storage is required. With the Dell 2950, you get up to 6 drives per server. So, we have another cost factor: storage. If the Dell solution can be done with local disk, and the Sun T5440 needs external storage, then we’re consuming more space, power, and cooling, and the cost goes up. Back to the drawing board. Before you go any further in the decision-making process, you have to examine what the disk solution is going to be. You have to think about total disk space used, IOPS (input/output operations per second), and RAID level. You need enough disks spinning to handle the workload. Seriously, before you commit to any hardware solution, make darn sure you have carefully incorporated the storage solution.
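
To give a feel for the “enough disks spinning” part, here is a rough spindle-count sketch in Python. It uses the common rule-of-thumb RAID write penalties (each logical write costs 2 disk operations on RAID 10, 4 on RAID 5, 6 on RAID 6). The workload numbers and the 150 IOPS per disk are assumptions I picked for illustration; measure your own.

```python
import math

# Rule-of-thumb number of disk operations per logical write.
RAID_WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def disks_needed(read_iops, write_iops, raid_level, iops_per_disk):
    """Minimum number of spindles to serve the workload at a given RAID level."""
    backend_iops = read_iops + write_iops * RAID_WRITE_PENALTY[raid_level]
    return math.ceil(backend_iops / iops_per_disk)


# Hypothetical workload: 1000 reads/s and 500 writes/s in total,
# on disks assumed to deliver about 150 IOPS each.
print(disks_needed(1000, 500, "raid10", 150))  # 14
print(disks_needed(1000, 500, "raid5", 150))   # 20
```

Either way, that hypothetical workload needs far more than the 4 drives that fit inside a T5440.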

In fact, one thing I don’t believe about MrBenchmark’s experiment is that it was all done with just 4 local disks. If so, then the 28 instances were each doing very little disk I/O, which may not be realistic. Most MySQL applications (not all) require heavy disk I/O. Think about it: what does MySQL do? It puts and gets data. Where does it put and get the data? Answer: disk, usually.

On another note, I’m pretty impressed with Solaris 10. I’m very much a novice when it comes to Solaris, but from what I’ve seen so far, I’m really impressed with its resource management options. If you’re going to go Sun, you’d better have an expert-level Solaris admin on staff. In my trials, we thought we could just drop in the OS, plop down a copy of MySQL for SPARC, and be running. Wrong. There was much we needed to know about tuning the server for what we were doing.

I’m not about to say Sun or Dell is better. There are trade-offs, and a lot of things to consider, before making that call.

I wonder how Sun expects to enter the cloud computing market with large enterprise-level servers, like the T5440, and compete against commodity-grade hardware. The company slogan, “do more with less,” is a very “Go Green,” environmentally friendly message that executives can buy into, because they’re saving money. But if Sun can’t demonstrate a savings based on the hardware acquisition cost alone, then they have to demonstrate the savings over time in space, power, and cooling. Imagine you’re at the car dealership, and there is a car that gets extremely good gas mileage, say 60 miles a gallon, but costs a lot more than the other car you want. There are a bunch of things you are wary about: it’s new, it has few reviews, and it’s different from what you’re used to. The salesman shows you how you’ll save money in the long run, over the life of the car, thanks to the gas mileage. You break out the calculator, and figure out he’s right. Do you buy the car? What if the 60 mi/gal car cost less than the car you wanted to buy? Now would you buy it? If the server can do 14 times the work of the commodity equivalent, make it cost less than 14 times the commodity equivalent, and it becomes a no-brainer for the architects and company executives.

I wonder how the 6,000-person Sun layoff, announced last week due to a drop in server sales, will impact this product line.

DaveK
