It’s official: IT professionals across the world hate systems monitoring. In fact, the phrase "monitoring sucks" has taken on a life of its own, spawning blogs, an open source code repository at GitHub – even the #monitoringsucks hashtag on Twitter.
"'Monitoring sucks' is a shared meme that we all coalesce around," said Chris Kelly, a developer and evangelist at New Relic Inc., a Software as a Service (SaaS)-based Web application performance monitoring provider based in San Francisco, Calif.
What’s so bad about systems monitoring? John E. Vincent, director of cloud infrastructure at Minneapolis, Minn.-based EnStratus, a cloud management software provider, enumerated monitoring’s long list of shortcomings in this blog post. They include, but are not limited to:
- "Having to set up "fake" hosts just to group arbitrary metrics together ";
- "Having to either collect metrics twice - once for alerting and another for trending";
- "Only being able to see my metrics in 5-minute intervals";
- "Having to choose between [a poor] interface but great monitoring or [poor] monitoring but great interface";
- "Perl (I kid...sort of) "; and
- "Not actually having any real choices."
Vincent started the Monitoring Sucks code repository on GitHub, the popular open source code- and idea-sharing site. The repository collects enhancements and integrations for existing open source systems monitoring tools such as Nagios, Cacti and Graphite, as well as new ideas about smart ways to work with those tools.
Wanted: Systems monitoring tools that work
But whatever efforts that code repository has inspired haven’t been enough. In particular, the Nagios open source systems monitoring tool really raises system administrators’ hackles. Graphing packages, which include Cacti, Graphite and Zabbix, also get singled out.
And when they try to combine monitoring and graphing, IT pros throw in the towel.
"I needed something better than red light/green light," said Mike Horwath, CTO at ipHouse, a hosting provider in Minneapolis, Minn. "I needed both monitoring and measuring, and I had to get those integrated."
The firm had used Nagios over the years, but had limited success pairing it with Cacti or Zabbix.
"They give you lots to graph and measure, but it gets overly complicated," he said.
That’s the problem with open source tools as a whole. "I like the idea of open source, but it comes down to how much is my time worth – is this sustainable?" Horwath said.
Horwath took a right turn and purchased commercial monitoring and reporting software from Santa Monica, Calif.-based hosting tool provider LogicMonitor Inc. that did both out of the gate.
LogicMonitor is offered as SaaS and can’t be customized ad infinitum. But the depth of the reporting that is available makes up for the fact that you can’t tweak every last setting, Horwath said. In fact, the only thing about Nagios that Horwath misses is the "random back-off" feature, which waits to see if a soft failure resolves itself with time before sending out an alert.
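For readers unfamiliar with the idea, the behavior Horwath describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Nagios or LogicMonitor implements it: the function name `should_alert`, its parameters and the doubling-with-jitter delay are all hypothetical choices made for the example.

```python
import random
import time
from typing import Callable


def should_alert(
    check: Callable[[], bool],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Return True only if the check keeps failing through every retry.

    Hypothetical sketch: `check` returns True when the service is healthy.
    """
    if check():
        return False  # healthy on the first try, nothing to report

    for attempt in range(max_retries):
        # Randomized back-off: the wait roughly doubles each attempt and is
        # scaled by a random factor between 0.5 and 1.5.
        delay = base_delay * (2 ** attempt) * (0.5 + rng())
        sleep(delay)
        if check():
            return False  # the soft failure resolved itself, so stay quiet

    return True  # still failing after all retries, so send the alert
```

A caller would pass in whatever probe it already has, for example `should_alert(lambda: ping_host("db01"))`, where `ping_host` is a stand-in for a real health check. The point of the design is that a brief blip never pages anyone; only a failure that survives the retries does.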
For other shops, SaaS tools such as LogicMonitor aren’t an option, but an upgrade to a commercial open source package complete with support is.
A large U.S. managed service provider with multiple locations recently moved from a combination of Hewlett-Packard’s OpenView, Nagios, OpenNMS and homegrown code to the commercial open source Zenoss monitoring suite, citing its easy-to-use GUI, its support and the flexibility to customize the package to the firm’s needs, said the network systems manager responsible for the change, who requested anonymity.
"We can’t entirely go away from homegrown," he said. With Zenoss, "we can still go down and make the changes we need to make." With other tools he looked at, such as SolarWinds, "we couldn’t customize it the way we needed to."
Enterprise-level support is important, too.
"There are times when stuff is just over my head," he said.
Systems monitoring nirvana
Meanwhile, EnStratus’ Vincent has a vision of what would make monitoring better:
"There are plenty of frustrated system administrators, developers, engineers, "devops" and everything under the sun who don't want much. All they really want is for [things]to work. When [stuff] breaks, they want to be notified. They want pretty graphs. They want to see business metrics alongside operational ones. They want to have a 52-inch monitor in the office that everyone can look at and say: 'See that red dot? That's bad. Here's what was going on when we got that red dot. Let's fix that shit and go get beers.' "
He, like system admins everywhere, imagines a day when #monitoringsucksless.