Eight? Fifteen? Forty? For enterprise IT shops, the number of tools used to monitor, manage and plan IT infrastructure varies from few to many, because there's never just one tool for everything.
Faced with high uptime demands, rapid application releases and infrastructure that spans various on-premises and hosted resources, some IT shops have focused on integrating IT monitoring tools under an umbrella product, such as systems management product BigPanda, that standardizes alerts and actions.
"We used a bunch of apps for monitoring [before adding BigPanda]," said Joe Guzowski, network operations center (NOC) lead at Move Inc., a News Corp. company in Santa Clara, Calif., that operates a network of real estate websites. "It was difficult for the NOC person to look at all these different ones."
Move integrated output from Nagios, New Relic and other IT monitoring tools into dashboards through BigPanda, and it shaved 10 to 15 minutes from response time to an incident with the integrated overview, Guzowski said.
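To make the idea of standardizing alerts concrete, here is a minimal Python sketch of how output from two monitoring tools might be mapped into one common shape before it reaches a shared dashboard. It does not describe how BigPanda, Nagios or New Relic work internally; the payload fields, the tool names `tool_a` and `tool_b`, and the three-level severity scale are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed common severity scale, ordered from least to most severe.
SEVERITY_RANK = {"ok": 0, "warning": 1, "critical": 2}


@dataclass(frozen=True)
class Alert:
    source: str
    host: str
    check: str
    severity: str  # one of the keys in SEVERITY_RANK


def from_tool_a(payload: dict) -> Alert:
    # Hypothetical tool A reports numeric states: 0 ok, 1 warning, 2 critical.
    names = {0: "ok", 1: "warning", 2: "critical"}
    return Alert("tool_a", payload["hostname"], payload["service"],
                 names[payload["state"]])


def from_tool_b(payload: dict) -> Alert:
    # Hypothetical tool B reports upper-case priority strings.
    names = {"INFO": "ok", "WARN": "warning", "CRIT": "critical"}
    return Alert("tool_b", payload["entity"], payload["condition"],
                 names[payload["priority"]])


def worst_per_host(alerts):
    """Keep only the most severe alert seen for each host."""
    worst = {}
    for alert in alerts:
        current = worst.get(alert.host)
        if current is None or (
            SEVERITY_RANK[alert.severity] > SEVERITY_RANK[current.severity]
        ):
            worst[alert.host] = alert
    return worst
```

Once every tool's output is converted to the same `Alert` shape, a single view can rank hosts by severity regardless of which tool raised the alarm, so the NOC person doesn't have to check each tool separately.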
The University of Ottawa Heart Institute's IT estate grew from a cluster of servers in a room to a dedicated space, with expansion into a newly built and redundant data center currently underway. Over systems analyst Marc Charbonneau's nearly two decades there, the health center has used a plethora of IT monitoring tools, but none met the facility's needs, he said.
"We didn't have any proper monitoring for the longest time," he said.
The organization found open source options lacked flexibility or heaped on complexity, while enterprise-class systems monitoring and management platforms were too expensive. The heart health center turned to Pandora FMS on-premises IT monitoring software to pull together alerting and trend information in digestible graphs.
A concerted monitoring effort
A key selling point for such integrated systems management is the ability to mark problems and trends across infrastructure layers. For example, Charbonneau used Hewlett Packard Enterprise's (HPE) 3PAR onboard data storage monitoring and graphing, but found it lacked digestible, customized reporting. Now, he queries the 3PAR storage system with scripts and pipes the aggregated data into custom modules in Pandora FMS. This shows usage across the servers and storage area network, and it generates a weekly snapshot that the team "scans through in less than one minute" to spot issues needing investigation.
A centralized dashboard can act as an intelligent starting point for issue resolution; if a resource exceeds a threshold or another event triggers an alert, the user can drill into the individual tool for more information. If it is a simple matter -- high CPU usage, for example -- troubleshooting occurs directly on the problem server. For more complex issues, Guzowski explained, the dashboard and specialized IT monitoring tool work together. For example, Move gets "nitty-gritty" information from New Relic application performance monitoring and Amazon Web Services CloudWatch cloud monitoring. BigPanda shows the percentage of website transactions with errors, and one of these monitoring tools shows the troubleshooter what the error messages are.
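The dashboard-first workflow can be sketched in a few lines of Python: compute the share of transactions with errors, then decide whether it is high enough to justify opening the specialized tool. The function names and the 2% default threshold are hypothetical, chosen only for illustration, and do not come from Move's setup.

```python
def error_rate_percent(total_transactions: int, failed_transactions: int) -> float:
    """Return the percentage of transactions that ended in an error."""
    if total_transactions <= 0:
        # No traffic means nothing to report; avoid dividing by zero.
        return 0.0
    return 100.0 * failed_transactions / total_transactions


def needs_drill_down(total_transactions: int, failed_transactions: int,
                     threshold_percent: float = 2.0) -> bool:
    """Flag the service when its error rate exceeds the assumed threshold."""
    rate = error_rate_percent(total_transactions, failed_transactions)
    return rate > threshold_percent
```

In this sketch the dashboard only answers "is something wrong?"; the error messages themselves would still come from the underlying monitoring tool.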
Charbonneau said he plans to attack a new local area network deployment in this manner, creating a centralized trends report in Pandora FMS on all the key uplinks and using the network's built-in monitoring technology only when necessary.
Integrating monitoring data from disparate IT tools helps save both money and capacity. Charbonneau's team correlates HPE 3PAR's assigned storage information with VMware vSphere's actual storage usage data to ensure they reclaim storage after a change, such as removing servers from the data store.
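As a rough illustration of that kind of correlation, the Python sketch below compares capacity assigned by a storage array with usage reported by a hypervisor and lists the volumes with the largest gaps. It is not Charbonneau's script and calls no HPE or VMware API; the input dictionaries, the volume names and the 100 GiB cutoff are assumptions standing in for whatever a team's own export scripts produce.

```python
def reclaim_candidates(assigned_gib: dict, used_gib: dict,
                       min_gap_gib: float = 100.0) -> list:
    """List (volume, gap) pairs where assigned capacity exceeds actual usage.

    assigned_gib: volume name -> capacity assigned on the array (assumed input)
    used_gib:     volume name -> usage reported by the hypervisor (assumed input)
    Volumes missing from used_gib are treated as entirely unused.
    """
    candidates = []
    for name, assigned in assigned_gib.items():
        gap = assigned - used_gib.get(name, 0.0)
        if gap >= min_gap_gib:
            candidates.append((name, gap))
    # Largest reclaimable gap first, so the biggest savings surface at the top.
    return sorted(candidates, key=lambda item: item[1], reverse=True)
```

A volume that still has capacity assigned but no longer appears in the hypervisor's report, such as one left behind after servers were removed, shows up with its full size as the gap.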
"It's a fact of life that IT organizations are using tools that have overlapping capabilities," said Dennis Drogseth, analyst at Enterprise Management Associates Inc., in Boulder, Colo. "People won't give up on the tools that they associate with their job ... and most people don't have a lot of time to set aside to learn all the capabilities."
A centralized systems management tool can relieve staffers of interface overload. At York Risk Services Group, ServiceNow IT service management (ITSM) is deployed as a standardization platform for as many IT tasks as possible, allowing the company to cut out some redundant tools and synthesize others. Automation between ServiceNow and VMware AirWatch, for example, lets a staff member fulfill a mobile device management request without even knowing how to use AirWatch. Similarly, the company's real-time security scans feed into ServiceNow, where remediation may mean killing a VM and provisioning a new one without ever leaving the ITSM tool.
As in most enterprise IT shops, York has ancillary tools that aren't used every day, but "we don't want to throw [them] away," said Bart Murphy, CTO at the company, based in Parsippany, N.J. Tying these into the primary systems monitoring and management dashboard via APIs lets them keep the whole toolchain in one place. York similarly pulls in disparate tools via APIs to enable insights after an acquisition, when IT organizations have to make sense of new infrastructure, databases and more. With the acquired tool set's data in ServiceNow, Murphy can ask: "What type of issues are they experiencing? What does their network look like? Have we seen this in other parts of our infrastructure?"
Today's integrated IT monitoring tools can resolve many problems, but users see more possibilities in the future. With the ability to examine past events, Move's Guzowski said he envisions creating an incident timeline that shows the steps leading to the event and the details that reveal why it happened. Drogseth noted companies could use tracked data in tools such as ServiceNow to discover which staff members are the most efficient, or the slowest troubleshooters, or to create CIO roadmaps.
"It's very early days for this -- you are unifying data sets, getting new insights," Drogseth said.
And the tools vary. In addition to BigPanda, ServiceNow and Pandora FMS, OpsDataStore has a bidirectional platform that consumes IT monitoring tool data, analyzes that data and feeds it back into tools for actions. HPE offers a large, complex operations analytics suite in Operations Bridge. Splunk has IT service intelligence, as does ExtraHop. Other companies, such as Rocana, BMC Software, Sumo Logic, IBM, CA Technologies and MOOG, have utilities that perform various monitoring tool aggregation and analysis, Drogseth said.
And the future isn't always up to the people running the production systems. "The folks in app dev are going to go to microservices regardless of whether the folks in IT ops can manage them," said Bernd Harzog, CEO of OpsDataStore. Instead of 1,000 VMs with 1,000 application performance management agents, there will be 100 containers with microservices on these VMs, multiplying the monitoring components needed, and individual correlations between IT monitoring tools won't keep up.
Get IT monitoring just right
Remember how you got to the point of sprawling tools with training headaches and a tangle of licenses: ad hoc buying decisions made in response to pressure to accomplish something quickly. Drogseth recommended forming one central procurement group. When selecting a tool, make sure you know its purpose, he added. For example, how important are security, performance and financial analytics for capacity planning? Harzog recommended asking exactly what data you need from each layer of the stack, which vendors' tools are the best sources of this data and how you want to consume the data.
A lot of the same data can inform different questions, Drogseth said. Data used to troubleshoot a problem in the IT infrastructure can be filtered or mixed with other input to determine the best platform for that workload in the future. Log file data can reflect performance, as well as cost.
Once you select an IT monitoring tool, watch systems for a while before settling on thresholds, Charbonneau advised. In many IT deployments, problems must occur before the analysts know what they need to monitor and implement alerts on.
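One simple way to turn an observation period into a starting threshold is to take the average of the observed values and add a few standard deviations. The Python sketch below shows that approach; it is a common rule of thumb rather than anything Charbonneau described, and the function name and the default multiplier `k=3.0` are assumptions to tune per metric.

```python
import statistics


def suggest_threshold(samples, k: float = 3.0) -> float:
    """Suggest an alert threshold from values observed during a watch period.

    Returns the mean plus k population standard deviations, so normal
    fluctuation stays below the line and only unusual readings alert.
    """
    if len(samples) < 2:
        raise ValueError("collect at least two samples before setting a threshold")
    return statistics.mean(samples) + k * statistics.pstdev(samples)
```

For example, CPU readings that hover around 40% with little variation would produce a threshold only slightly above 40%, while a noisier metric would get more headroom. Thresholds derived this way are a first guess, to be revisited once real incidents show which alerts matter.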
Play around with the IT monitoring dashboards that make sense for your organization. "You could do one [BigPanda dashboard] just for QA [quality assurance], because it's such low priority compared to production," Guzowski explained.
Evaluate the level of customization you want before adopting tools. Some users might want more canned presets, while others like Charbonneau want to script modules exactly as needed.
Murphy stressed price isn't the only factor when selecting an overarching tool -- you're looking for a product that standardizes multiple operations and even business tasks in one place.
Meredith Courtemanche is a senior site editor in TechTarget's Data Center and Virtualization group, with sites including SearchITOperations, SearchWindowsServer and SearchExchange. Find her work @DataCenterTT.