Businesses can't afford even small IT outages, a fact that has IT operations teams laser-focused on IT systems, servers and general infrastructure. Turns out, they're looking for all the right things in all the wrong places.
No one will tell you not to monitor IT infrastructure. But you might as well not monitor it at all if your IT organization isn't tracking uptime with the right input from both infrastructure and applications. Systems and workloads are complex and intertwined, which complicates IT monitoring. We often look at isolated key performance stats for individual items in the infrastructure stack, without understanding the effect on the other systems and applications that consume those resources. Nor should monitoring be judged on which dashboards give the most up-to-date information or have the best GUI.
Ideally, IT organizations should monitor uptime without referencing a system at all, and look instead at the customer experience. System-tracking metrics tell you about the systems, not what the customer sees and experiences. The product exists for end users -- shouldn't they be the first place you set metrics to monitor?
Most tools used to monitor IT infrastructure performance and uptime don't address end-user experience. The majority of these tools come with the infrastructure hardware or software at purchase. When the provided tool doesn't track some important data, IT organizations turn to third-party tools that track uptime and performance via the infrastructure's APIs -- often the same place where the vendor's tools source data. Neither method gets the company closer to understanding end-user experiences.
Successful uptime monitoring has to start at the top and flow downward. Customer experience is the aggregate result of the entire IT infrastructure and software code stack's performance; therefore, it should be the starting point for the rest of the monitoring metrics. However, most applications have limited support for monitoring that correlates application to infrastructure performance. IT organizations can expect to do some customization to enable customer-experience monitoring.
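As a rough illustration of starting from the customer's side, the sketch below scores uptime from scripted user transactions rather than from server status. The `ProbeResult` type, the `customer_availability` function and the 2-second latency budget are all hypothetical names and values chosen for this example; they do not come from any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ProbeResult:
    """One synthetic user transaction, e.g. a scripted login (hypothetical type)."""
    succeeded: bool     # did the transaction return the correct result?
    latency_ms: float   # how long the simulated user waited


def customer_availability(results: Iterable[ProbeResult],
                          latency_budget_ms: float = 2000.0) -> float:
    """Fraction of probes a user would call 'up': correct AND fast enough.

    The 2000 ms default is an assumed budget, not a recommendation.
    """
    results = list(results)
    if not results:
        raise ValueError("no probe results to evaluate")
    good = sum(1 for r in results
               if r.succeeded and r.latency_ms <= latency_budget_ms)
    return good / len(results)


probes = [
    ProbeResult(True, 180.0),
    ProbeResult(True, 240.0),
    ProbeResult(True, 5200.0),  # every server was "up", but the user waited 5 s
    ProbeResult(False, 90.0),   # a fast error page is still a failure
]
print(f"{customer_availability(probes):.0%}")  # prints 50%
```

A system-level dashboard could plausibly show all four of these intervals as healthy; counting slow or incorrect responses as downtime is what ties the number to what the customer sees.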
Systems create app uptime -- or downtime
Customer experience should not be the only focus. Dedicated system monitoring exposes hotspots long before they affect uptime. CPU performance, IOPS and memory use all influence end-user experience in different ways, and the individual metrics combine to offer a larger picture. Monitor IT infrastructure performance with the aim of understanding how each metric affects app behavior. Customer experience is usually what the service-level agreements for uptime are tied to, so make it the reference point for all monitoring decisions.
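One simple way to relate system metrics to app behavior is to check how closely each one tracks a user-facing measurement over the same time window. The sketch below ranks metrics by Pearson correlation with page-load time. The metric names and sample values are invented for illustration, and `rank_by_user_impact` is a hypothetical helper; correlation only suggests where to look and does not prove cause.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no signal
    return cov / sqrt(var_x * var_y)


def rank_by_user_impact(infra: Dict[str, Sequence[float]],
                        user_latency_ms: Sequence[float]) -> List[Tuple[str, float]]:
    """Sort infrastructure metrics by how strongly they track user latency."""
    scores = [(name, pearson(series, user_latency_ms))
              for name, series in infra.items()]
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)


# Invented samples: six readings taken over the same intervals.
samples = {
    "cpu_percent": [35, 40, 38, 42, 37, 41],
    "disk_iops": [900, 1500, 2100, 2800, 3300, 4000],
    "memory_percent": [60, 61, 60, 62, 61, 60],
}
page_load_ms = [210, 260, 330, 400, 450, 520]

for name, r in rank_by_user_impact(samples, page_load_ms):
    print(f"{name}: r = {r:.2f}")
```

With these made-up numbers, disk IOPS ranks first, which would point the team toward storage before CPU or memory when page loads slow down.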
To monitor IT infrastructure in the context of application use, IT shops can turn to application load-testing tools, such as Apache JMeter, LoadRunner from Hewlett Packard Enterprise and features in Microsoft Visual Studio. There are also resiliency tests such as Netflix's Chaos Monkey and its related programs. Resiliency tests do not monitor uptime; instead, they stress internal applications and find faults within them. They do, however, help IT operations tune the IT monitoring tools in use to pick up on user experience and correlate it to the correct infrastructure pieces. All of these tests and steps, from the infrastructure to the application and back to the infrastructure, open up a view into what creates a positive or negative end-user experience, which is what companies care about the most.
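The idea behind a load test can be shown in a few lines: run many simulated users at once and record what each one experiences. The sketch below is a bare-bones stand-in written with the Python standard library, not JMeter, LoadRunner or Visual Studio; `fake_transaction` is a hypothetical placeholder that would be replaced with a real request against the application under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median, quantiles
from typing import Callable, List


def fake_transaction() -> None:
    """Hypothetical stand-in for one user request; swap in a real call."""
    time.sleep(0.01)


def timed(call: Callable[[], None]) -> float:
    """Run one transaction and return the time the user waited, in ms."""
    start = time.perf_counter()
    call()
    return (time.perf_counter() - start) * 1000.0


def run_load(call: Callable[[], None], users: int,
             requests_per_user: int) -> List[float]:
    """Issue users * requests_per_user calls with `users` concurrent workers."""
    total = users * requests_per_user
    with ThreadPoolExecutor(max_workers=users) as pool:
        return list(pool.map(lambda _: timed(call), range(total)))


latencies = run_load(fake_transaction, users=5, requests_per_user=4)
p95 = quantiles(latencies, n=20)[-1]  # last of 19 cut points = 95th percentile
print(f"requests: {len(latencies)}")
print(f"median: {median(latencies):.1f} ms, p95: {p95:.1f} ms")
```

Recording infrastructure metrics during the same run is what lets the team line up a rise in the 95th-percentile wait with the system component that caused it.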