
'Monitoring Sucks' movement rallies for better systems monitoring tools

IT pros have long suffered subpar systems monitoring tools, but they're not taking things lying down.

It’s official: IT professionals across the world hate systems monitoring. In fact, the phrase "monitoring sucks" has taken on a life of its own, spawning blogs, an open source code repository at GitHub – even the #monitoringsucks hashtag on Twitter.

"'Monitoring sucks' is a shared meme that we all coalesce around," said Chris Kelly, a developer and evangelist at New Relic Inc., a Software as a Service (SaaS)-based Web application performance monitoring provider based in San Francisco, Calif.

What’s so bad about systems monitoring? John E. Vincent, director of cloud infrastructure at Minneapolis, Minn.-based EnStratus, a cloud management software provider, enumerated monitoring’s long list of shortcomings in a blog post. They include, but are not limited to:

  • "Having to set up 'fake' hosts just to group arbitrary metrics together";
  • "Having to either collect metrics twice - once for alerting and another for trending";
  • "Only being able to see my metrics in 5-minute intervals";
  • "Having to choose between [a poor] interface but great monitoring or [poor] monitoring but great interface";
  • "Perl (I kid...sort of)"; and
  • "Not actually having any real choices."

Vincent started the Monitoring Sucks code repository on GitHub Inc., the popular open source code and idea-sharing site, which collects enhancements and integrations to existing open source systems monitoring tools such as Nagios, Cacti and Graphite, or new ideas about smart ways to work with those tools.

Wanted: Systems monitoring tools that work

But whatever efforts that code repository has inspired haven’t been enough. In particular, the Nagios open source systems monitoring tool really raises system administrators’ hackles. Graphing packages, which include Cacti, Graphite and Zabbix, also get singled out.

And when they try to combine monitoring and graphing, IT pros throw in the towel.

"I needed something better than red light/green light," said Mike Horwath, CTO at ipHouse, a hosting provider in Minneapolis, Minn.  "I needed both monitoring and measuring, and I had to get those integrated."

The firm had used Nagios over the years, but has had limited success pairing it with Cacti or Zabbix.

"They give you lots to graph and measure, but it gets overly complicated," he said.

That’s the problem with open source tools as a whole. "I like the idea of open source, but it comes down to how much is my time worth – is this sustainable?" Horwath said.

Horwath changed course and purchased commercial monitoring and reporting software from Santa Monica, Calif.-based hosting tool provider LogicMonitor Inc. that did both out of the gate.

LogicMonitor is offered as SaaS and can’t be customized ad infinitum. But the depth of the reporting that is available makes up for the fact that you can’t tweak every last setting, Horwath said. In fact, the only thing about Nagios that Horwath misses is the "random back-off" feature, which waits to see if a soft failure resolves itself with time before sending out an alert.
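The back-off idea Horwath describes can be sketched in a few lines of Python. This is a minimal illustration of the general technique (re-check a failing service a few times, with a randomized wait, before alerting), not Nagios code or configuration. The function name `should_alert` and the parameters `max_attempts`, `base_delay` and `jitter` are hypothetical names chosen for this example.

```python
import random
import time
from typing import Callable


def should_alert(
    check: Callable[[], bool],
    max_attempts: int = 3,       # assumed default, not taken from any real tool
    base_delay: float = 30.0,    # seconds to wait between re-checks (assumption)
    jitter: float = 10.0,        # extra random wait, 0..jitter seconds (assumption)
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if `check` fails on every one of `max_attempts` tries.

    `check` returns True when the monitored service is healthy. A failure that
    clears on a later attempt is treated as a soft failure and raises no alert.
    """
    for attempt in range(1, max_attempts + 1):
        if check():
            return False  # healthy, or the soft failure resolved itself
        if attempt < max_attempts:
            # Back off for a randomized interval before re-checking.
            sleep(base_delay + random.uniform(0.0, jitter))
    return True  # still failing after every attempt: send the alert
```

A caller would pass in any function that probes the service, for example `should_alert(lambda: ping_host("web01"))`, where `ping_host` is a stand-in for whatever check the shop already runs.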

For other shops, SaaS tools such as LogicMonitor aren’t an option, but an upgrade to a commercial open source package complete with support is.

A large U.S. managed service provider with multiple locations recently moved from a combination of Hewlett-Packard’s OpenView, Nagios, OpenNMS and homegrown code to the commercial open source Zenoss monitoring suite, citing its easy-to-use GUI, its support and the flexibility to customize the package to the firm’s needs, said the network systems manager responsible for the change, who requested anonymity.

"We can’t entirely go away from homegrown," he said. With Zenoss, "we can still go down and make the changes we need to make."  With other tools he looked at, such as SolarWinds, "we couldn’t customize it the way we needed to."

Enterprise-level support is important, too.

"There are times when stuff is just over my head," he said.

Systems monitoring nirvana

Meanwhile, EnStratus’ Vincent has a vision of what would make monitoring better:

"There are plenty of frustrated system administrators, developers, engineers, 'devops' and everything under the sun who don't want much. All they really want is for [things] to work. When [stuff] breaks, they want to be notified. They want pretty graphs. They want to see business metrics alongside operational ones. They want to have a 52-inch monitor in the office that everyone can look at and say: 'See that red dot? That's bad. Here's what was going on when we got that red dot. Let's fix that shit and go get beers.'"

He, like system admins everywhere, imagines a day when #monitoringsucksless.

Monitoring sucks – true or false?
I agree with using LogicMonitor
The tools and the protocols both suck - you can do a lot with snmp, syslog, sflow, tcpdump, and rrd (for example), but they don't work together, much less work with the system and application daemons, so someone has to end up owning the glue.

The glue isn't a profitable business compared to the boxes - my company thinks it might be as much as 1000% less profitable than selling network equipment - so you don't have a lot of market enthusiasm for it, which leads to token efforts and a lot of home brew and FOSS efforts that never achieve critical mass.
Has anyone looked at Nagios' commercial product? what do you think of it? It has graphs, metrics and business process monitoring.
...and when will companies take this seriously enough to really put some effort into these solutions?
Loads of folk groaned about Lotus Domino, and especially the Notes client, but the Domino Admin Panel did allow you to form a picture of what was happening with your Domino servers and in more than one Notes domain. Since being moved to the Exchange 2007 environment, the realisation that Exchange has nothing to rival the Domino Admin Panel has been a constant disappointment. All MS's effort seems to have been to produce a pretty user client, but little effort went into the server management aspect.
Xymon has been helpful for me...
if you are a busy sysadmin - you don't have time to build your own tools
Amen, John Vincent! Nagios got us somewhere, but it is no longer viable to our vision of where we need to be. I'm surprised this article didn't mention Splunk, but we'll check out Zenoss now, too.
Opsview is like Nagios (runs on Nagios) but the UI is much better, and you don't have to create hosts to group hosts/items together. Plus they have dashboards, etc. We've been using them for a while now - great price too.
Yes, it sucks, but there is a resolution to get rid of this problem.
We have started using Splunk, a great product that lets us ingest data from any source and monitor it on a real-time basis, with quick setup and notifications.
Agree with the get notified comment! Check out PagerDuty to at least make the notification part easier.
Everybody wants a magic wand in this stuff.
Why - no other area of IT has this expectation
I've evaled at least half a dozen products in the last 8 months. So far I am relatively pleased with the capabilities and features in HP's NNMi. Still have more work to do, though, digging into its features and capabilities to see how well they can be lined up with all of our various needs for monitoring, alerting and business intelligence.