This tip is the third in our series on benchmarking best practices in the data center. You can also read the second installment on using simple optimizations to improve benchmark performance.
Server benchmarking is essential to maintaining long-term system performance, and can be immensely helpful when it comes to capacity planning. Regardless of whether you use the benchmarking tools that are built into Windows or one of the numerous commercial products, your server benchmarking results are only going to be valid if you adhere to some recommended best practices. Let's take this opportunity to talk about some common benchmarking mistakes and how those mistakes can affect the results.
Benchmarks must test the right elements
Early in my IT career, I took a Microsoft certification class where the instructor taught me a lesson that I will never forget. The class had just spent the better part of the morning learning about the Performance Monitor and all of the various counters that come with it. After lunch, the instructor informed us that he had just connected some client computers to each of our lab servers, and that our job was to use the Performance Monitor to find out if the end users were experiencing an acceptable level of performance.
Putting my new knowledge to work, I used Performance Monitor to check for various hardware bottlenecks. I diligently checked the CPU, disk and memory performance before concluding that the load that the end users were placing on the server was indeed acceptable.
The problem was that my benchmarking was right, but I was looking at the wrong thing. I concluded that the server load was acceptable when the original question was whether or not the end-user experience was acceptable. While those questions may sound like one and the same, they had very different answers. The end-user experience was unacceptable because of a network bottleneck that would have been undetectable from the server. Had I taken the time to actually benchmark a client machine instead of only checking the server, I would have known that. The lesson was that benchmarking tools can provide valuable information, but the information only helps you if you point the tool at the thing you actually need to measure.
Don't mix server benchmarking tools
Of course there is more to benchmarking than just knowing what to measure. Being consistent with the way that you take your measurements is equally important.
Imagine, for example, that your boss tells you the data drive on a file server has historically had a disk utilization of 97% during business hours (which is way too high). In order to alleviate some of the stress on the server, the organization has recently implemented an identical file server. The data is being replicated across both servers, and a load-balancing solution was put in place to distribute user requests evenly across the two servers. The IT department has invested quite a bit of money in this new solution and now your boss is being asked by upper management to quantify the project's success. Consequently, you are asked to benchmark both servers to find out how heavily the disk is being utilized.
This is where the need for consistency comes into play. There are dozens of server benchmarking tools that could tell you how heavily the server's disk is being used (including Performance Monitor, which is built into Windows). The problem is that the odds of any two of these tools giving you exactly the same results are slim.
There is a well-known principle in science, often called the observer effect, which holds that you cannot measure something without affecting it to some degree. The principle is usually discussed in the context of physics, but it also applies to computer benchmarking. Any benchmarking tool, no matter how efficient it is, places some load on the system being tested. This load skews the results because the tool's own impact on the system is included in the benchmark data it reports. This is why consistency is so important.
So consider the file server described above and imagine what might happen if you chose a random server benchmarking tool that wrote performance data to disk in real time. Such a tool would increase the server's disk usage (while the tool is running), thus skewing the results. If the tool had a big enough impact on the server's disk resources or if it used a different algorithm to measure the disk utilization, it could potentially report that the disk utilization has gone up, not down. If that happened, then management would likely interpret the result as the project being a colossal failure rather than just a reporting error.
In a situation like this, the only way to get reliable results is to find out which server benchmarking tool was used to take the original measurements and use that same tool. You will notice that I used the phrase "reliable results," not "accurate results." That's because every benchmarking tool will give you results that are skewed based on the tool's overhead. What is important, however, is to be consistent in the tools that you use so that each measurement is skewed by the same amount.
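To make the "skewed by the same amount" point concrete, here is a minimal sketch with made-up numbers. It assumes each tool adds a fixed number of percentage points to the disk utilization it reports, which is a simplification; the tool names, overhead figures and "true" values are all hypothetical.

```python
# Hypothetical numbers, for illustration only.
TRUE_BEFORE = 95.0   # true disk utilization (%) before the second server
TRUE_AFTER = 50.0    # true disk utilization (%) after load balancing

# Assumed overhead (percentage points) each tool adds to the disk it measures.
OVERHEAD = {"tool_a": 2.0, "tool_b": 9.0}


def reported(true_value: float, tool: str) -> float:
    """Return what a tool would report: the true value plus its own overhead."""
    return true_value + OVERHEAD[tool]


def measured_change(tool_before: str, tool_after: str) -> float:
    """Change in reported utilization between the two measurements."""
    return reported(TRUE_AFTER, tool_after) - reported(TRUE_BEFORE, tool_before)


true_change = TRUE_AFTER - TRUE_BEFORE
same_tool = measured_change("tool_a", "tool_a")
mixed_tools = measured_change("tool_a", "tool_b")

print(f"true change:        {true_change:+.1f}")
print(f"same tool change:   {same_tool:+.1f}")
print(f"mixed tools change: {mixed_tools:+.1f}")
```

Under this assumption, both readings taken with the same tool are individually too high, but the overhead cancels out in the difference, so the reported change matches the true change. Switching tools between measurements leaves the difference in overhead baked into the result.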
Use server benchmarking tools consistently
Of course, being consistent in your tool selection alone does not guarantee reliable results. You must also be sure that you are consistently using the tool in the same way. For example, if you take one set of measurements during peak hours and another set of measurements during off-peak hours, you will probably get very different results even if you use the same tool for both measurements. Likewise, if you change the server benchmarking tool's sampling frequency or alter various aspects of the system's performance from one measurement to the next, you are changing the amount of overhead that the benchmarking tool places on the system, thereby impacting the results.
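One way to enforce this kind of consistency is to record the conditions alongside each reading and refuse to compare readings whose conditions differ. The sketch below is only an illustration: the `Measurement` record, its field names and the sample values are assumptions, not part of any real benchmarking tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    tool: str                 # benchmarking tool used
    sample_interval_s: int    # seconds between samples
    load_window: str          # e.g. "peak" or "off-peak"
    disk_utilization: float   # reported value, in percent


def mismatches(a: Measurement, b: Measurement) -> list[str]:
    """List the measurement conditions that differ between two runs."""
    conditions = ("tool", "sample_interval_s", "load_window")
    return [name for name in conditions if getattr(a, name) != getattr(b, name)]


# Hypothetical readings: same tool, but different sampling rate and time of day.
baseline = Measurement("perfmon", 15, "peak", 97.0)
followup = Measurement("perfmon", 1, "off-peak", 41.0)

problems = mismatches(baseline, followup)
if problems:
    print("Not comparable; differing conditions:", ", ".join(problems))
else:
    print("Change:", followup.disk_utilization - baseline.disk_utilization)
```

Here the two readings share a tool, yet the comparison is rejected because the sampling interval and the load window both changed.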
Use virtualization-aware server benchmarking tools
One last best practice is to distinguish between physical and virtual servers in your benchmarking reports. Due to the way that hardware calls are passed through the hypervisor, accurately benchmarking a virtual server can be extremely difficult. There are products on the market that are designed for virtual server benchmarking, such as VMmark or VMbench. The problem is that these tools are built for virtual machines and aren't really appropriate for benchmarking physical servers. Tools such as Windows Performance Monitor can benchmark both physical and virtual servers, but you are likely to see wildly different results from a virtual server than from a comparably configured physical server. For now, there really isn't anything that you can do about this discrepancy other than to be aware of it, and to classify physical and virtual machines separately in your server benchmarking reports.
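Keeping the two classes apart in a report can be as simple as grouping readings by server type before summarizing them. The following is a rough sketch; the server names, the "physical"/"virtual" labels and the utilization figures are invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical report rows: (server name, server type, CPU utilization %)
rows = [
    ("fs01", "physical", 42.0),
    ("fs02", "physical", 38.0),
    ("vm-web01", "virtual", 71.0),
    ("vm-web02", "virtual", 65.0),
]

# Group readings by server type so the two classes are never averaged together.
by_kind = defaultdict(list)
for name, kind, value in rows:
    by_kind[kind].append(value)

summary = {kind: mean(values) for kind, values in by_kind.items()}
for kind, avg in sorted(summary.items()):
    print(f"{kind}: average {avg:.1f}% across {len(by_kind[kind])} servers")
```

Reporting one figure per class avoids a blended average that describes neither the physical nor the virtual machines.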
A number of factors and common oversights can affect the accuracy of server benchmarking. Because these factors can be difficult or impossible to eliminate, it is critically important to perform benchmarking in a consistent manner. In other words, you should use the same server benchmarking tool each time and make sure that the tool is configured the same way each time you use it. You should also try to take measurements at the same time of day, or under the same load conditions, if possible, since the load that the end users are placing on the server will affect your results.