
Monitoring your data center with Nagios: A discussion with Taylor Dondich

GroundWork Open Source senior developer Taylor Dondich discusses open source network monitoring with Nagios and the future of open source software in the data center.

Taylor Dondich, author of Network Monitoring with Nagios (O'Reilly), was working as a network engineer in Las Vegas, where a limited budget steered him toward open source software. After evaluating the various options, he decided on Nagios, for which he wrote the open source configuration tool Fruity. That work caught the attention of GroundWork Open Source, where he now works, and the rest is history.

Can you describe Nagios in a bit more detail?

Taylor Dondich: At the core, Nagios is a very small daemon supported by a library of plug-ins. These plug-ins are miniature programs whose sole job is to go out and check the status of things. So, you might have a plug-in that says, "I want to check on the status of this certain network port on this Cisco switch," or "I want to check the file system's free space on my Windows server." All these programs talk to the devices and report back to Nagios that the device is okay, that it's down, or that there may be a problem. Those are two components, and there is a third: the UI. It is a web-based application, very accessible, where an individual can log in, view the resources that he's responsible for, and manage those resources: take a look at the status, send commands back to Nagios, things of that nature.
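The division of labor Dondich describes can be sketched in a few lines of Python. This is an illustration only, not Nagios's own code: the function name run_check and the ten-second timeout are assumptions made for the example. What it relies on is the widely documented plug-in convention that a check reports its result through its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) and a line of text on standard output.

```python
import subprocess
import sys

# Conventional plug-in exit codes and the states they stand for.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command, timeout=10):
    """Run one plug-in command and return (state, first line of its output).

    Hypothetical helper: treating a timeout, a missing program or an
    out-of-range exit code as UNKNOWN is an assumption of this sketch.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "UNKNOWN", f"check could not be run: {exc}"

    state = STATES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.strip().splitlines()
    return state, lines[0] if lines else "(no output)"


if __name__ == "__main__":
    # Stand-in for a real plug-in: a tiny script that reports OK.
    fake_plugin = [sys.executable, "-c", "print('DISK OK - 42% free')"]
    state, text = run_check(fake_plugin)
    print(f"{state}: {text}")
```

The monitoring daemon itself never needs to know how to talk to a switch or a Windows server; it only needs to run the right program and read back the answer.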

How does that compare to the commercial options?
Dondich: If you're looking at HP OpenView or Tivoli, these are very large software suites with a lot of components. Nagios is really just one part of the whole solution. It's a state monitoring solution. It can tell you which devices are up or down. It doesn't give you really granular control. For example, it can't do performance graphing and it doesn't have trouble ticketing; it's just one piece of the puzzle. There are other open source alternatives to Nagios, one being OpenNMS. But the maturity isn't there and the community isn't as full blown as Nagios's. That's how you would compare it to other products out there, especially the commercial ones. It's [the state monitoring] piece of the puzzle and it does that part really well.

Have you seen a trend in the kinds of monitoring solutions that IT managers are looking for?
Dondich: I've been noticing a lot more application monitoring: which applications are failing and where they're failing. Energy consumption monitoring is something that's big, and also just heterogeneous monitoring. Now that Linux is becoming a much more powerful platform in the enterprise, you've got people that need to monitor various types of Linux devices alongside their Windows infrastructure. Those kinds of things are coming to the forefront. Because of that, you need a network monitoring solution that is extremely flexible. Nagios has a plug-in architecture in which plug-ins are custom written to monitor different types of devices: Windows devices, Linux devices, any device that can talk TCP or UDP. There are plug-ins out there for environmental control and for UPS units. So, for any kind of device that your team needs to think about, there is usually a plug-in or some kind of support. Nagios comes with a standard set of plug-ins that will get you by on a pretty common set of platforms. Because Nagios has such a powerful community, the members have written plug-ins too. GroundWork Open Source has written a set of plug-ins to do WMI monitoring on Windows devices. There's a good online repository called nagiosexchange.org, which is a directory of extensions for Nagios where a great collection of plug-ins can be found. If you have a specific kind of device, normally you can find a plug-in for it. If there isn't one, the plug-in API is extremely simple. If you have a developer that knows a little Perl or C, they can probably write a plug-in for it.
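To give a sense of how simple that plug-in API is, here is a minimal sketch of a free-disk-space check. Dondich mentions Perl and C; Python is used here purely for illustration, since a plug-in can be any executable. The function names, the default thresholds of 20 and 10 percent, and the message wording are assumptions of this example. The parts that follow the common plug-in convention are the single line of status text, the optional performance data after the "|" character, and the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import shutil
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(free_pct, warn_pct=20.0, crit_pct=10.0):
    """Map a free-space percentage to a plug-in exit code.

    The thresholds are illustrative defaults, not recommended values.
    """
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def check_disk(path, warn_pct=20.0, crit_pct=10.0):
    """Return (exit_code, status_line) for the file system holding path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check itself failed, so the state is unknown rather than bad.
        return UNKNOWN, f"DISK UNKNOWN - cannot read {path}: {exc}"

    free_pct = 100.0 * usage.free / usage.total
    code = evaluate(free_pct, warn_pct, crit_pct)
    # Human-readable text, then "|" and optional performance data.
    line = (
        f"DISK {LABELS[code]} - {free_pct:.1f}% free on {path}"
        f" | free_pct={free_pct:.1f}%"
    )
    return code, line


def main(argv):
    path = argv[1] if len(argv) > 1 else "/"
    code, line = check_disk(path)
    print(line)
    return code


# Saved as a stand-alone plug-in, the file would end with:
#     sys.exit(main(sys.argv))
```

Everything device-specific lives inside check_disk; the contract with the monitoring daemon is only the printed line and the exit code, which is why checks for very different devices can all look alike from the outside.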

Why open source? Is it feasible on the enterprise level?
Dondich: If you take a look at some of the big organizations entering open source software, such as Novell and IBM, you get a sense that, yeah, the enterprise is ready to take on open source as a viable solution to its problems. Take network monitoring, for example: just a few years ago, you would deal with some of the larger network monitoring tools, such as Tivoli and HP OpenView, and that was it. You had to deal with large licensing costs, complexity in the software and sometimes frustrating support on the other end. Open source software helps alleviate that. There tends to be a much lower total cost of ownership, and the support is strong because it's all community-based. If you have a question or a problem, usually asking on a mailing list gets you a solution right away. And as open source software grows more sophisticated, it can match the feature set of some of the commercial software that you had to deal with previously. Yes, I think open source software is now in the mainstream and you can use it as a viable solution for your enterprise.

Does the fact that open source software is community-based raise any security questions or issues?
Dondich: There are commercial companies offering commercial support for open source software. GroundWork is a great example. We have open source software and we support it commercially. You can go through commercial channels to get your support for open source software, and that alleviates your security concerns. But when you talk to the open source community, you [shouldn't] expose your security issues to them. If you're discussing configuration, you don't want to discuss the specific configuration of your servers or say my IP address is this and this and this. So you have to be wary of that. But usually the community is extremely helpful. And the Nagios open source community is by far one of the most helpful communities. Usually, they'll be able to help you without really discussing details about your network infrastructure and how it's organized.

Are there situations where an IT manager might not want to use an open source tool?
Dondich: It depends on the maturity of the software you're looking for. For example, Nagios has been around for years and years. Before it was called Nagios, it was known as NetSaint. So, the product has had a great deal of time to mature and become stable enough that you can really trust your network to it. That's why GroundWork chose Nagios as one of its core open source components: because of its stability, and because it's been around for a while. That's one of the major factors when looking at open source products to use in your enterprise, especially when possibly replacing a commercial solution that's already in place. You definitely don't want to look at a bleeding edge open source project and say "Man, I want to implement this!" You have to take a look at the stability, the feature set and whether there is support around it, including commercial support options, because sometimes you might have a question that the open source community won't be able to answer. Having a commercial support line to fall back on definitely helps.

With so much emphasis on software and hardware, we forget that people are the other half of the data center. Can you talk about the management side of using Nagios?
Dondich: When you're trying to implement an IT management solution, you're not just dealing with the tools, you're dealing with IT policies, how to react to issues, things of that nature. Nagios is only going to support the policies you put in place. When you start looking at putting in a network monitoring solution, you need to talk about the procedures: How are you going to notify the individuals that make up your team? What is the network admin's workflow? Are the network admins reachable by pager or cell phone? You have to put things of that nature together. Proper escalation procedures: If the network admin doesn't fix a certain kind of problem within a certain amount of time, the manager needs to be notified. Even security issues: we don't want any of this type of traffic to go out of this area of our DMZ; we need to document that. Nagios really complements those policies. It doesn't help you design those policies. You have to have really good procedures in place before you have Nagios overlay them. Those are the things you have to think about. If you try to implement Nagios and develop policies later, those policies are going to be skewed.
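One way to make such an escalation procedure concrete before any tool is configured is to write it down as data. The sketch below is a plain Python illustration and not Nagios configuration syntax; the contact names, channels and the 30- and 60-minute delays are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # how long the problem has been open
    contact: str        # who gets notified at this point
    channel: str        # how they are reached


# Hypothetical policy: page the admin at once, call after 30 minutes,
# bring in the manager after an hour.
POLICY = [
    EscalationStep(0, "network-admin", "pager"),
    EscalationStep(30, "network-admin", "cell"),
    EscalationStep(60, "it-manager", "email"),
]


def steps_reached(minutes_open, acknowledged, policy=POLICY):
    """Return the escalation steps that apply to an open problem.

    An acknowledged problem stops escalating; that rule is an
    assumption of this sketch, and a real team may decide otherwise.
    """
    if acknowledged:
        return []
    return [step for step in policy if step.after_minutes <= minutes_open]


if __name__ == "__main__":
    for step in steps_reached(minutes_open=45, acknowledged=False):
        print(f"notify {step.contact} via {step.channel}")
```

Writing the policy out this way forces the questions Dondich raises (who is notified, how, and after how long) to be answered by the team first, so that the monitoring tool only has to enforce decisions that already exist.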
