Today's massive websites and huge volumes of users have created scalability problems for data centers worldwide. Richard Campbell, a product evangelist with Vancouver, British Columbia-based Strangeloop Networks Inc., shares his scalability secrets in this Q&A based on his research paper, which won an award from Turnersville, N.J.-based Computer Measurement Group Inc.
How extensive is the problem of scalability with Web applications today?
Richard Campbell: The growth in scaling problems reflects the massive growth of the Internet. Today there are huge numbers of users, and websites are larger than ever. The need to build large, scalable websites far outstrips the available resources.
What role can a data center (versus application developers) play in helping to solve these problems?
Campbell: While the data center is certainly the stage [where application problems occur], it is also the source of data for solving scaling problems. There is a fundamental difference between developing a website and operating it. Typically it takes a combination of [development and operational] skills to solve a scaling problem. But the solution starts with great diagnostics, and that takes place in the data center.
You say the applications and the networks on which they run are closely intertwined and, therefore, application developers and network designers should work closely together. Could you give us a few examples?
Campbell: One example is a website that provides streaming video or other large files to the user. Having the network folks increase the MTU [maximum transmission unit, the largest packet a link will carry] will dramatically improve performance. But these changes require a solid understanding of the application by the network folks, and vice versa.
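To see why a larger MTU matters for large files, here is a rough worked calculation. It is a minimal sketch that assumes 40 bytes of IPv4 and TCP headers per packet (no options) and ignores Ethernet framing. The 100 MB file size is hypothetical. It counts how many full-size packets a transfer needs at the standard Ethernet MTU of 1,500 bytes and at a 9,000-byte "jumbo frame" MTU.

```python
import math

# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options,
# Ethernet framing ignored.
IP_TCP_HEADER_BYTES = 40

def packets_needed(file_bytes: int, mtu: int) -> int:
    """Number of full-size TCP segments needed to carry file_bytes at a given MTU."""
    payload_per_packet = mtu - IP_TCP_HEADER_BYTES  # equals the TCP MSS here
    return math.ceil(file_bytes / payload_per_packet)

def header_overhead(mtu: int) -> float:
    """Fraction of each full-size packet spent on IP + TCP headers."""
    return IP_TCP_HEADER_BYTES / mtu

if __name__ == "__main__":
    file_size = 100 * 1024 * 1024  # hypothetical 100 MB video file
    for mtu in (1500, 9000):  # standard Ethernet vs. jumbo frames
        print(f"MTU {mtu}: {packets_needed(file_size, mtu)} packets, "
              f"{header_overhead(mtu):.2%} header overhead")
```

Under these assumptions the transfer needs 71,821 packets at MTU 1500 but only 11,703 at MTU 9000, roughly six times fewer packets for the servers and network gear to process. The gain only materializes if every link on the path supports the larger MTU, which is part of why the change needs both network and application knowledge.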
Another example is SSL [the Secure Sockets Layer protocol]. Offloading SSL to a dedicated appliance can reduce server workload significantly, but if the application depends on testing for SSL itself [for instance, checking that each request arrived over an encrypted connection], the move could break the application. This type of change can be very synergistic, but it requires strong collaboration between developers and network folks.
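One common way such a dependency shows up is sketched below. It is an illustration under stated assumptions, not code from Campbell's paper. The appliance terminates SSL and forwards plain HTTP, so an application that checks the connection scheme concludes that every request is insecure. The `Request` class and the offloader address `10.0.0.5` are hypothetical. `X-Forwarded-Proto` is a widely used de facto header that offloaders and proxies set to report the original scheme.

```python
from dataclasses import dataclass, field

# Hypothetical address of the SSL offload appliance (an assumption).
TRUSTED_OFFLOADER_IPS = {"10.0.0.5"}

@dataclass
class Request:
    scheme: str        # scheme of the connection the app server actually sees
    remote_addr: str   # immediate peer: the offloader, when one is in front
    headers: dict = field(default_factory=dict)  # exact-case keys in this sketch

def is_secure_naive(req: Request) -> bool:
    # Breaks behind an offloader: the appliance decrypts the traffic and
    # forwards plain HTTP, so this returns False for every offloaded request.
    return req.scheme == "https"

def is_secure_offload_aware(req: Request) -> bool:
    if req.scheme == "https":
        return True  # SSL terminated on this server itself
    # Trust the forwarded scheme only when the request came from the
    # offloader; otherwise any client could claim to be "secure".
    if req.remote_addr in TRUSTED_OFFLOADER_IPS:
        return req.headers.get("X-Forwarded-Proto", "").lower() == "https"
    return False
```

For example, `Request("http", "10.0.0.5", {"X-Forwarded-Proto": "https"})` fails the naive check but passes the offload-aware one, while the same header sent from any other address is ignored. Deciding which addresses to trust is exactly the kind of detail developers and network staff have to agree on.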
Testing is key to quantifying baseline performance and, in turn, measuring the success of optimization. Are there any special problems to consider in testing Web applications?
Campbell: It is very challenging to create realistic application tests in the lab because users do things you can't imagine. The best approach is to gather log data from the production application to build behavior profiles of users and then use this data to create tests for the lab. Typically, developers test new versions of applications, but I am testing almost constantly for different purposes.
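A very simple version of the log-driven approach is sketched below, assuming the production web server writes access logs in Common Log Format. It measures how often each endpoint is requested and then samples a synthetic request mix with the same proportions for a lab load test. The function names are hypothetical. A real behavior profile would usually also capture session sequences, think time and peak-hour patterns, which this sketch omits.

```python
import random
import re
from collections import Counter

# Matches the request portion of a Common Log Format line, e.g.
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*"')

def build_profile(log_lines):
    """Return each (method, path) pair's share of total production traffic."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:  # skip malformed lines rather than failing
            # Drop the query string so requests are grouped by endpoint.
            path = match["path"].split("?")[0]
            counts[(match["method"], path)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {request: n / total for request, n in counts.items()}

def generate_test_mix(profile, n_requests, seed=42):
    """Sample a synthetic request sequence whose mix matches the profile."""
    if not profile:
        return []
    rng = random.Random(seed)  # seeded so a lab run can be repeated exactly
    requests = list(profile)
    weights = [profile[r] for r in requests]
    return rng.choices(requests, weights=weights, k=n_requests)
```

Feeding a day of production logs to `build_profile` and replaying `generate_test_mix(profile, 10_000)` against a test environment gives a load whose endpoint mix resembles what real users do, rather than what developers guess they do.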
You say that tactical modifications at the granular level, such as payload optimization, compression, caching and memory adjustments, can be even more effective than broader optimizations such as server specialization or load balancing. Which one or two tactical strategies would likely be the most helpful, and why?
Campbell: Each optimization technique has its advantages and consequences. Reducing payload is complex and makes the application more expensive to maintain. But if latency is the problem, then payload optimization is worthwhile. Compression seems like a no-brainer, but it adds CPU load and might not provide a substantial benefit. Caching can provide dramatic performance benefits, but it puts pressure on memory resources and increases complexity, making it harder to add features and debug problems. There is no one perfect optimization technique. Finding the right combination requires skill, perseverance and experimentation.
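The point that compression "might not have a substantial benefit" is easy to demonstrate. The sketch below uses Python's standard `zlib` module to compare highly repetitive HTML with random bytes, which stand in here for content that is already compressed, such as JPEG images or MP4 video. The `worth_compressing` policy and its 20% threshold are hypothetical.

```python
import random
import zlib

def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Compressed size as a fraction of the original size (lower is better)."""
    return len(zlib.compress(payload, level)) / len(payload)

def worth_compressing(payload: bytes, min_savings: float = 0.2) -> bool:
    """Hypothetical policy: compress only if it saves at least min_savings of the bytes."""
    return 1 - compression_ratio(payload) >= min_savings

# Repetitive HTML markup compresses extremely well.
html = b"<tr><td>item</td><td>price</td></tr>\n" * 2000

# Seeded random bytes stand in for already-compressed content (JPEG, MP4).
rng = random.Random(0)
already_compressed = bytes(rng.getrandbits(8) for _ in range(64_000))

for name, payload in (("html", html), ("already_compressed", already_compressed)):
    print(f"{name}: ratio {compression_ratio(payload):.3f}, "
          f"worth compressing: {worth_compressing(payload)}")
```

The HTML shrinks to a small fraction of its size, while the random bytes come out slightly larger than they went in, even though the CPU did the same kind of work on both. Trial-compressing every response would itself cost CPU, so in practice servers usually decide by content type instead.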
In addition, the goalposts are always moving. Each day you may have more users or a new business need. Getting rid of one bottleneck just moves the focus to the next one.
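A toy model shows why removing one bottleneck simply exposes the next. Assume every request passes through each tier once in sequence, so end-to-end throughput is capped by the slowest tier. The tier names and capacities below are hypothetical numbers chosen only for illustration.

```python
def bottleneck(capacities: dict) -> tuple:
    """Return (stage, capacity) of the slowest stage, which caps end-to-end throughput."""
    stage = min(capacities, key=capacities.get)
    return stage, capacities[stage]

# Hypothetical per-tier capacities, in requests per second.
stages = {
    "load balancer": 20_000,
    "web servers": 4_000,
    "app servers": 2_500,
    "database": 1_800,
}

print(bottleneck(stages))   # ('database', 1800)

# Suppose caching or scaling out lifts the database's capacity...
stages["database"] = 6_000

print(bottleneck(stages))   # ('app servers', 2500): the focus moves on
```

Tripling database capacity here raises overall throughput by less than 40%, because the application tier immediately becomes the new limit.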
Scaling databases: When a database has reached the maximum size a single server can handle and you start to scale out, should you partition by region or by other criteria? Are there any downsides to scaling out?
Campbell: Regional partitions make sense if the performance-sensitive queries follow geographic lines. The challenge is finding those natural partitions. Queries that span multiple partitions are likely to perform poorly, so it's a trade-off. In addition, partitioned databases are more complex to manage, back up and diagnose when things go wrong.
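The trade-off can be sketched as a small routing layer. The region-to-partition mapping, partition names and order data below are hypothetical in-memory stand-ins for separate regional databases. A query confined to one region touches a single partition, while a cross-region report must fan out to several and merge the results in the application.

```python
# Hypothetical mapping of customer regions to regional database partitions.
PARTITION_FOR_REGION = {
    "us-east": "db-na",
    "us-west": "db-na",
    "eu": "db-eu",
    "apac": "db-apac",
}

# Hypothetical in-memory stand-ins for the partitions: (order_id, region, amount).
PARTITIONS = {
    "db-na": [("order-1", "us-east", 40.0), ("order-2", "us-west", 15.5)],
    "db-eu": [("order-3", "eu", 22.0)],
    "db-apac": [("order-4", "apac", 9.0)],
}

def partitions_for_query(regions):
    """Return the sorted list of partitions a query over these regions must touch."""
    return sorted({PARTITION_FOR_REGION[r] for r in regions})

def total_sales(regions):
    """Sum order amounts for the given regions.

    Returns (total, partitions_touched). In a real deployment each touched
    partition would be a separate round trip, merged here by the caller.
    """
    regions = set(regions)
    touched = partitions_for_query(regions)
    total = sum(amount
                for partition in touched
                for _order_id, region, amount in PARTITIONS[partition]
                if region in regions)
    return total, len(touched)

print(total_sales({"eu"}))                      # (22.0, 1): stays on one partition
print(total_sales({"us-east", "eu", "apac"}))  # (71.0, 3): fans out to every partition
```

If most performance-sensitive queries look like the first call, regional partitioning pays off; if many look like the second, every query pays for several round trips and a merge step.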
How often should you be testing and optimizing your applications and networks for performance?
Campbell: Testing is something I like to do continuously. But optimization is something I do reluctantly. Eventually, performance becomes the No. 1 feature, and you have to optimize.
How much of a performance gain, and how much reduction in downtime, can you expect from these efforts?
Campbell: Making a high-performance website doesn't necessarily mean reduced downtime. In fact, often the opposite is true. Complexity is the enemy of reliability, and optimization techniques are complex. That's why you should optimize only where it's absolutely needed. To achieve acceptable performance, sometimes you have to do something more radical, like moving the server closer to the user.
What single take-away message can you offer for our data center audience?
Campbell: Collaborate with your developers. Get involved in requirements gathering. Be part of the optimization team. [Data center operations] bring a unique and valuable perspective to the problem.
Let us know what you think about the story; email Matt Stansberry, Executive Editor.