I have used Nagios, an enterprise-class network and server monitoring system, for almost 10 years now and have yet to find another free, open source monitoring system that can beat it.
This article will walk you through setting up a basic installation of Nagios on a Solaris 10 system. For this example, I use Solaris 10 update 6 (released in October 2008) running in 32-bit mode on a VMware virtual machine. The host name is "sol10vm," but it will be different in your configuration. Alternate versions of Solaris and Apache Web Server should work fine; I've run Nagios on everything from Red Hat 7.3 to Mac OS X.
Nagios installation prerequisites
This tutorial assumes that you've installed the GNU Compiler Collection and GNU make that come on the Solaris 10 installation disc, and that the compiler works properly. In most cases, this simply involves adding /usr/sfw/bin to your PATH environment variable. If you run "gcc" and "gmake" and get the following output, you're probably good to go.
sol10vm:/> gcc
gcc: no input files
sol10vm:/> gmake
gmake: *** No targets specified and no makefile found. Stop.
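If either command is not found, the PATH is the first thing to check. The following is a minimal sketch of that check, assuming a POSIX-style shell such as bash; add_to_path is a hypothetical helper name, not something Solaris provides.

```shell
# Append a directory to PATH only if it is not already listed.
# (add_to_path is a hypothetical helper used for illustration.)
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

add_to_path /usr/sfw/bin
export PATH

# Report whether each build tool can be found on the updated PATH.
for tool in gcc gmake; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found"
    fi
done
```

To make the change permanent, the same add_to_path call could go in your shell startup file.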
For this demonstration, I use the Apache Web Server packages provided by Steve Christensen's SunFreeware project, specifically Apache 2.0.59 and its dependencies. These packages install under /usr/local, so make sure that /usr/local/bin is in your PATH and that /usr/local/lib and /usr/local/ssl/lib are in your system library search path (use "crle"; see "man crle" for details).
Once you've edited the file /usr/local/apache2/conf/httpd.conf and started the Web Server with /usr/local/apache2/bin/apachectl start, use your Web browser to go to http://yourhostname. It should look something like this:
Downloading, compiling and installing Nagios
The first step to installing Nagios is to create a Nagios user and group. The following commands show how to do that on a freshly installed Solaris system. In your case, the user ID may not be 100, but I recommend making the Nagios group ID the same as the "nagios" user ID.
sol10vm:/> useradd -c "nagios user" -d /usr/local/nagios nagios
sol10vm:/> grep nagios /etc/passwd
nagios:x:100:1::/home/nagios:/bin/sh
sol10vm:/> groupadd -g 100 nagios
sol10vm:/> grep nagios /etc/group
nagios::100:
sol10vm:/> usermod -g nagios nagios
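To confirm that the group ID really does match the user ID, something like the following could be used. It is only a sketch: id_field and ids_match are hypothetical helpers, it assumes a POSIX-style shell such as bash, and it assumes the account lives in the local /etc/passwd and /etc/group files rather than in a naming service.

```shell
# Print the third colon-separated field (the numeric ID) for a name
# in a passwd- or group-format file.
id_field() {
    grep "^$1:" "$2" | cut -d: -f3
}

# Succeed only if the "nagios" UID and the "nagios" GID are the same number.
# Optional arguments override the files to read (defaults are the system files).
ids_match() {
    uid=$(id_field nagios "${1:-/etc/passwd}")
    gid=$(id_field nagios "${2:-/etc/group}")
    [ -n "$uid" ] && [ "$uid" = "$gid" ]
}
```

Running `ids_match && echo "nagios UID and GID match"` after the commands above gives a quick yes-or-no answer.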
As of January 2009, the latest version of Nagios is 3.0.6 and the Nagios plug-ins are at version 1.4.13. You can get both from the Nagios download page. Download and extract both archives into a location of your choice; the examples here work from /usr/local/src.
I prefer to keep my Nagios installation in its own directory, so we'll pass an argument to the configure script telling it where to install everything:
sol10vm:/usr/local/src/nagios-3.0.6> ./configure --prefix=/usr/local/nagios
Once the configure process completes without errors, type "gmake all" to compile the core Nagios software and Web CGIs. Next, type "gmake install" to install everything. Once the installation is finished, run "gmake install-init" to enable Nagios to start when the system boots, and then "gmake install-config" to install the sample configuration files.
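The four targets have to succeed in order, so it can be convenient to wrap them in a small function that stops at the first failure. This is a sketch under assumptions: build_nagios and the GMAKE variable are hypothetical names, and a POSIX-style shell such as bash is assumed.

```shell
# Command used to run each target; override GMAKE to use a different make.
GMAKE=${GMAKE:-gmake}

# Run the four targets from the article in order, stopping at the first
# one that fails. (build_nagios is a hypothetical helper name.)
build_nagios() {
    for target in all install install-init install-config; do
        echo "==> $GMAKE $target"
        "$GMAKE" "$target" || {
            echo "step failed: $target" >&2
            return 1
        }
    done
}
```

Call `build_nagios` from the top of the extracted Nagios source directory after the configure step has finished.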
After Nagios itself is compiled and installed, the next step is to repeat the process with the Nagios plug-ins, which enable enhanced system and service checks. After uncompressing the source code archive, the configure step is the same:
sol10vm:/usr/local/src/nagios-plugins-1.4.13> ./configure --prefix=/usr/local/nagios
After the configure script finishes, run "gmake" and "gmake install" to install the plug-ins in the directory that was created when you installed the core Nagios package. In addition, you must add /usr/local/nagios/lib to your system library search path using the "crle" command as you did with /usr/local/lib. If this step is omitted, it will cause errors with some of the plug-ins.
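As a rough illustration of the library path step, the sketch below wraps the crle call in a function. It assumes that "crle -u -l" appends a directory to the existing default search path instead of replacing it, which is my reading of the crle man page; check "man crle" on your system before running it. The add_lib_dir name and the CRLE variable are hypothetical.

```shell
# Command used to update the runtime linker configuration.
CRLE=${CRLE:-crle}

# Append one directory to the default library search path.
# Assumption: -u updates the existing configuration, -l names the directory.
# (add_lib_dir is a hypothetical helper name.)
add_lib_dir() {
    "$CRLE" -u -l "$1"
}
```

With that in place, `add_lib_dir /usr/local/nagios/lib` run as root would cover this step, and running "crle" with no arguments afterward prints the resulting configuration.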
Configuring Apache for Nagios
For this example, we will not configure Nagios for HTTP user authentication. This makes the tutorial simpler, but it should not be used in a production environment. Once you've gone through this tutorial and understand how things are set up, read the official Nagios documentation and modify your configuration to implement user authentication.
To configure Apache for use with Nagios, add the following directives to your Apache config file (the httpd.conf file you edited earlier):
ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
Alias /nagios /usr/local/nagios/share

<Directory "/usr/local/nagios/sbin">
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

<Directory "/usr/local/nagios/share">
    Options None
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
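A typo in these directives can stop Apache from coming back up, so it may be worth checking the syntax before reloading. The sketch below uses apachectl's "configtest" and "graceful" actions; reload_apache and the APACHECTL variable are hypothetical names, and a POSIX-style shell such as bash is assumed.

```shell
# Path to apachectl as installed by the packages used in this article.
APACHECTL=${APACHECTL:-/usr/local/apache2/bin/apachectl}

# Reload Apache only when the configuration parses cleanly.
# (reload_apache is a hypothetical helper name.)
reload_apache() {
    if "$APACHECTL" configtest; then
        "$APACHECTL" graceful
    else
        echo "httpd.conf has errors; Apache was not restarted" >&2
        return 1
    fi
}
```

Running `reload_apache` after editing the file leaves the running server untouched if the syntax check fails.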
Once Apache is configured for Nagios, restart the Web Server with /usr/local/apache2/bin/apachectl graceful, or by running /usr/local/apache2/bin/apachectl stop followed by /usr/local/apache2/bin/apachectl start.
Even though Nagios is not yet fully configured and started, you should be able to go to http://yourhostname/nagios in a Web browser and see a screen like this:
Nagios has a number of configuration files; the ones edited below live in /usr/local/nagios/etc and /usr/local/nagios/etc/objects.
The first file we need to edit is /usr/local/nagios/etc/cgi.cfg. In that file, change the value of "use_authentication" to 0. For production use you will want to re-enable this after reading the documentation about HTTP user authentication.
The second file to edit is /usr/local/nagios/etc/nagios.cfg. In this file, change both "check_external_commands" and "use_syslog" to 0. This prevents someone from running external commands against your Nagios installation when user authentication is not in effect and keeps Nagios from spamming your syslog.
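These three changes can be made in a text editor, but they can also be scripted. The sketch below assumes each directive appears at the start of a line in "key=value" form and that a POSIX-style shell such as bash is in use; set_cfg_value is a hypothetical helper name. It goes through a temporary file because the stock Solaris sed has no in-place option.

```shell
# Rewrite "key=value" for one directive in a config file.
# (set_cfg_value is a hypothetical helper name.)
set_cfg_value() {
    file=$1; key=$2; value=$3
    tmp="$file.new.$$"
    # Write the edited copy, then cat it back over the original so the
    # original file keeps its owner and permissions.
    sed "s|^$key=.*|$key=$value|" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example calls matching the edits described above:
#   set_cfg_value /usr/local/nagios/etc/cgi.cfg    use_authentication      0
#   set_cfg_value /usr/local/nagios/etc/nagios.cfg check_external_commands 0
#   set_cfg_value /usr/local/nagios/etc/nagios.cfg use_syslog              0
```

Lines that do not start with the named directive, including comments, pass through unchanged.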
The default "contact group" configuration for Nagios is fine in this basic example. Edit /usr/local/nagios/etc/objects/contacts.cfg and change the sample email address under the "nagiosadmin" contact definition to your own email address. In order for email alerts to work, you need a functioning mail server or mail relay on your Solaris system (that configuration is beyond the scope of this article).
You'll see in contacts.cfg that the contact definition says to use the generic-contact template. This template is defined in /usr/local/nagios/etc/objects/templates.cfg, and also references the time periods in /usr/local/nagios/etc/objects/timeperiods.cfg. In most cases you will want to leave these definitions alone, but they're highly customizable and allow for multiple contacts over multiple shifts, or for contacting different people depending on what time of day a problem occurs.
If you ran the command "gmake install-config" earlier after compiling and installing Nagios, there's already a localhost.cfg file in place to check various services on the local machine on which Nagios is running. You can safely ignore the "linux-server" references in this file; the author assumes it will be running on a Linux system. We want to trim these down to checks for network connectivity, the Web Server and the SSH daemon. Comment out the entries in this file for the Root_Partition, Current Users, Total Processes, Current Load and Swap Usage services. This will leave only the service definitions for "check_http," "check_ssh," and "PING" uncommented. The commands used for service checks are defined in the commands.cfg file. You can add your own by editing the file and then use them in your service definitions.
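After commenting out the unwanted entries, it can help to list which services are still active. The sketch below prints the service_description of every definition that is not commented out. It assumes every line of a disabled definition starts with "#", and it assumes a sed that understands POSIX character classes (on Solaris 10 that may mean /usr/xpg4/bin/sed). The active_services name is hypothetical.

```shell
# Print the description of each service whose definition is not commented out.
# Commented lines begin with "#", so they never match the pattern.
# (active_services is a hypothetical helper name.)
active_services() {
    sed -n 's/^[[:space:]]*service_description[[:space:]][[:space:]]*//p' "$1"
}
```

Pass it the path to your localhost.cfg; in a default Nagios 3 layout I would expect that file under the objects directory, but that location is an assumption, so confirm it on your system.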
The files printer.cfg, switch.cfg and windows.cfg contain more examples of how to monitor printers, switches and Windows systems using some of the advanced Nagios plug-ins. We will not use these files in this tutorial, but they are worth reading to get a feel for how the various pieces of the Nagios puzzle fit together.
After the configuration files have been edited to your satisfaction, it's time to run Nagios to verify your configuration files and make sure that nothing has been forgotten. To do this, run /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg. If everything checks out, the output will look something like this:
If the Nagios configuration verification fails, it will tell you what problems it has found. Go back and check your config files, and run the verification process again until it says, "Things look okay."
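Since the check is usually repeated several times, a small wrapper can save typing. This sketch assumes the nagios binary exits with a non-zero status when verification fails, and that a POSIX-style shell such as bash is in use; verify_config and the two variables are hypothetical names.

```shell
# Paths from this article; override them if your prefix differs.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Run the verification pass and summarize the result.
# Assumption: a non-zero exit status means the check found problems.
# (verify_config is a hypothetical helper name.)
verify_config() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        echo "configuration verified"
    else
        echo "configuration check failed; fix the reported problems and rerun" >&2
        return 1
    fi
}
```

The full Nagios output is still printed, so the problems it reports remain visible above the summary line.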
Once your configuration is good, it's time to run Nagios. If you ran "gmake install-init" earlier, a script has already been created in /etc/init.d that will properly start everything for you. Run /etc/init.d/nagios start to start the process. Once it's running, you should be able to go to http://yourhostname/nagios with a Web browser and click on Tactical Overview to see a global status. In this screenshot you can see that the one host being monitored is OK, as well as the three services on that host. Notifications for two of those services are disabled.
Clicking on Service Detail will give you a detailed status report on all the individual services being monitored, as well as the result of their last check:
Host Detail does just that -- it shows a detailed status display with one line for each host being monitored:
The links to Hostgroup Overview and Hostgroup Summary will show similar status displays for each group of hosts (as defined in the configuration files). Since we only have one host (and one host group) in this quick tutorial, there's no need to show screenshots.
By default, Nagios will check each host and service every five minutes. If something goes down, the Web display for that host or service will change from green to red and an email notification will be sent to the contact groups (and by extension, the contacts) defined in the host template via templates.cfg. Once the host or service resumes normal operation, a recovery email will go out to the same contacts.
Further reading on Nagios
This tutorial barely scratches the surface of the features in the Nagios enterprise monitoring system, and only demonstrates the most basic of its capabilities. The Nagios online documentation goes into further detail, and a number of good books have been published on the topic. I recommend these titles:
- Nagios: System And Network Monitoring by Wolfgang Barth
- Building a Monitoring Infrastructure with Nagios by David Josephsen
- Learning Nagios 3.0 by Wojciech Kocjan
Hopefully this basic tutorial will get you started using Nagios for all of your network and server monitoring needs.
ABOUT THE AUTHOR: Bill Bradford is the creator and maintainer of SunHELP and lives in Houston, Texas, with his wife Amy.