Monitoring network computers with the Icinga Nagios fork
Server Observer
A server can struggle for many reasons: System resources like the CPU, RAM, or hard disk space could be overloaded, or network services might have crashed. Depending on the applications that run on a server, consequences can be dire – from irked users to massive financial implications. Therefore, it is more important than ever in a highly networked world to be able to monitor the state of your server and take action immediately. Of course, you could check every server and service individually, but it is far more convenient to use a monitoring tool like Icinga.
Nagios Fork
Icinga [1] is a relatively young project that was forked from Nagios [2] when development of the popular open source network monitor stagnated. Icinga delivers improved database connectors (for MySQL, Oracle, and PostgreSQL), a more user-friendly web interface, and an API that lets administrators integrate numerous extensions without complicated modification of the Icinga core. The Icinga developers also seek to reflect community needs more closely and to integrate patches more quickly. The first stable version, 1.0, was released in December 2009, and the version counter has risen every couple of months ever since.
Icinga comprises three components: the core, the API, and the optional web interface. The core collects system health information generated by plugins and passes it via the IDOMOD interface to the IDO2DB service daemon, which stores it in the Icinga Data Out Database (IDODB). The PHP-based API reads information from the IDODB and makes it available to web-based interfaces. Additionally, the API facilitates the development of add-ons and plugins. Icinga Web is designed to be a state-of-the-art, easily customized web interface with which administrators can keep an eye on the state of the systems they manage. At the time of writing, Icinga Web is still in beta and has a couple of bugs that make it difficult for me to recommend it for production use.
Installing Icinga is easy as long as you only need to monitor a single host. Some distributions offer binaries in their repositories. If yours does not, or if you prefer the latest version, the easy-to-understand documentation includes a quick-start guide (covering IDOUtils with database access via libdbi) that helps you set up the network monitor in next to no time and reach it at http://Server/icinga. The challenges come later, because you will very likely want to monitor a larger number of computers.
Icinga can monitor private services on a computer, such as CPU load, RAM, and disk usage, as well as public services like web, SSH, mail, and so on. My lab network consists of three computers: one acts as the Icinga server, and the other two, a web server and a file server, send information to the monitoring server. Because there is no native way to query CPU load, RAM, or disk usage remotely, you need to install an add-on such as NRPE [3] on each monitored machine. The Icinga server then tells the add-on to execute the plugins on the local machine and return the results.
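To make the NRPE idea more concrete, here is a minimal sketch of the kind of local plugin NRPE might run on the file server. It follows the Nagios plugin convention that Icinga shares: print one status line and report the result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names and the 80/90 percent thresholds are illustrative assumptions, not part of Icinga or NRPE.

```python
import shutil

# Exit codes defined by the Nagios plugin conventions, which Icinga shares
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_percent: float, warn: float = 80.0, crit: float = 90.0) -> int:
    """Map a disk usage percentage to a plugin exit code (example thresholds)."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_disk(path: str = "/") -> tuple[int, str]:
    """Return (exit code, one-line status text) for the file system holding path."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The plugin could not determine the state at all
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    used_percent = 100.0 * usage.used / usage.total
    code = classify(used_percent)
    return code, f"DISK {STATE_NAMES[code]} - {path} {used_percent:.1f}% used"


def main(argv: list[str]) -> int:
    """Print the status line Icinga displays and return the exit code."""
    code, message = check_disk(argv[1] if len(argv) > 1 else "/")
    print(message)
    return code
```

Saved as a standalone script, it would end with sys.exit(main(sys.argv)) (after importing sys) so that NRPE, and through it Icinga, sees the exit code. How the script is registered with NRPE depends on your NRPE configuration.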
Icinga gives the system administrator all the information needed and sends alerts in case of an emergency. Advanced features that are a genuine help in daily work include groups, redundant monitoring environments, notification escalation, and check schedules.
Icinga distinguishes between active and passive checks. Active checks are initiated by the Icinga service and run regularly at times specified by the administrator. For a passive check, an external application does the work and forwards the results to the Icinga server, which is useful if, for example, you can't check a computer actively because it resides behind a firewall. A large number of plugins [4] for all kinds of purposes already exist for Nagios and Icinga. Before the first check, however, the administrator needs to configure the computers and the services Icinga should monitor.
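A passive result can be handed to Icinga through its external command file using the PROCESS_SERVICE_CHECK_RESULT command. The following sketch builds such a line and writes it to the command pipe. The default path is an assumption for an Icinga 1.x source install, and the service in question would need passive_checks_enabled set to 1 for Icinga to accept the result.

```python
import time
from pathlib import Path
from typing import Optional

# Assumption: external command pipe of an Icinga 1.x source install;
# check the command_file setting in your icinga.cfg for the real path.
COMMAND_FILE = Path("/usr/local/icinga/var/rw/icinga.cmd")


def passive_service_result(host: str, service: str, return_code: int,
                           output: str, timestamp: Optional[int] = None) -> str:
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    ts = int(time.time()) if timestamp is None else timestamp
    # Each external command must fit on a single line
    one_line = " ".join(output.splitlines())
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{one_line}\n"


def submit(line: str, command_file: Path = COMMAND_FILE) -> None:
    """Write the command to Icinga's external command file (a named pipe)."""
    with command_file.open("a") as pipe:
        pipe.write(line)
```

An external application behind the firewall would compute its own result, call passive_service_result() with the host and service names defined in Icinga, and pass the line to submit() on the Icinga server.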
The individual elements involved in a check are referred to as objects in Icinga. Objects include hosts, services, contacts, commands, and time periods. To make daily work easier, you can group hosts, services, and contacts. The objects are defined in CFG files, which reside below Icinga's etc/objects directory. The network monitor includes a number of sample object definitions that administrators only need to customize.
In principle, you can define multiple objects in one CFG file, but you can just as easily create a separate file for each object in a directory below /path-to-Icinga/etc/objects. Lines within an object definition that start with a hash mark are treated as comments, as is everything to the right of a semicolon on a line.
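The comment rules just described are simple enough to sketch in a few lines. This hypothetical parser is an illustration, not Icinga's own implementation: it drops hash-mark comment lines, cuts everything after a semicolon, and turns each define block into an object type plus a dictionary of directives. It ignores features such as templates (the use directive) and assumes values contain no closing brace.

```python
import re

# One "define <type>{ ... }" block; assumes values contain no closing brace
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def strip_comments(text: str) -> str:
    """Drop lines starting with '#' and everything right of a ';'."""
    kept = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        kept.append(line.split(";", 1)[0])
    return "\n".join(kept)


def parse_objects(text: str) -> list[tuple[str, dict[str, str]]]:
    """Return (object type, {directive: value}) pairs for every define block."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(strip_comments(text)):
        directives = {}
        for line in body.splitlines():
            parts = line.split(None, 1)  # directive name, then the rest of the line
            if not parts:
                continue
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, directives))
    return objects
```

Multi-word values such as a display_name survive intact because only the first whitespace run separates the directive name from its value.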
Defining Hosts and Services
Listing 1 provides a sample host definition. The host is the web server at a language center (display_name) and is displayed accordingly in the web interface. To inform the administrator (contacts) when the server goes down (notification_options), I want Icinga to ping the server (check_command) every 5 minutes (check_interval). If the server is still down 60 minutes (notification_interval) after the administrator was notified, Icinga should send another message. Icinga can distinguish whether a host is down or unreachable (see Table 1). However, to determine that a host is unreachable, you have to define the nodes along the route to the host as its parents, and this only works if the routes taken by outgoing packets are known. The file server definition looks similar.
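The down-versus-unreachable decision can be summarized as follows: if a host fails its check and has no parents, or at least one of its parents is still up, the host counts as down; if every parent is down or unreachable, the host counts as unreachable. The sketch below encodes that simplified rule. The Host class and the router name are hypothetical, and the real check logic also involves retries and cached results.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    check_ok: bool  # result of the host's own check, e.g. check-host-alive
    parents: list["Host"] = field(default_factory=list)


def host_state(host: Host) -> str:
    """Simplified DOWN/UNREACHABLE decision based on the parents directive."""
    if host.check_ok:
        return "UP"
    # No parents defined: the route cannot be blamed, so the host is DOWN
    if not host.parents:
        return "DOWN"
    # At least one parent is reachable, so the host itself must be the problem
    if any(host_state(parent) == "UP" for parent in host.parents):
        return "DOWN"
    # Every path to the host runs through failed nodes
    return "UNREACHABLE"
```

This is also why the parents must be defined: without them, every failed host is simply reported as down.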
Listing 1
my_hosts.cfg
# Webserver
define host{
    host_name               webserver
    alias                   languagecenter
    display_name            Server at language center
    address                 141.20.108.124
    active_checks_enabled   1
    passive_checks_enabled  0
    max_check_attempts      3
    check_command           check-host-alive
    check_interval          5
    retry_interval          1
    contacts                spz_admin
    notification_period     24x7
    notification_interval   60
    notification_options    d
    }

# Fileserver
define host{
    host_name               fileserver
    alias                   Fileserver
    display_name            Fileserver
    address                 192.168.10.127
    active_checks_enabled   1
    passive_checks_enabled  0
    max_check_attempts      3
    check_command           check-host-alive
    check_interval          5
    retry_interval          1
    contacts                admin
    notification_period     24x7
    notification_interval   60
    notification_options    d,u,r
    }
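With the values in Listing 1, you can estimate when notifications go out. The sketch assumes the default interval_length of 60 seconds (so intervals are in minutes), a host that fails its regular check at minute 0 and stays down, and re-notifications exactly notification_interval minutes apart. Real timing also depends on check scheduling and latency, so treat the result as an approximation.

```python
def notification_times(max_check_attempts: int, retry_interval: int,
                       notification_interval: int, horizon: int) -> list[int]:
    """Minutes after the first failed check at which notifications go out.

    Simplification: the host fails at minute 0 and stays down; check latency,
    scheduling jitter, and escalations are ignored.
    """
    # Attempts 2..max_check_attempts are retries spaced retry_interval apart;
    # the last failed attempt makes the state HARD and triggers the first notification.
    first = (max_check_attempts - 1) * retry_interval
    times = []
    t = first
    while t <= horizon:
        times.append(t)
        if notification_interval == 0:
            break  # an interval of 0 means: notify only once
        t += notification_interval
    return times
```

For the web server (max_check_attempts 3, retry_interval 1, notification_interval 60), notification_times(3, 1, 60, 130) returns [2, 62, 122].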
Table 1
States
Option | Status |
---|---|
Server | |
o | OK |
d | Down |
u | Unreachable |
r | Recovered |
Services | |
o | OK |
w | Warning |
c | Critical |
r | Recovered |
u | Unknown |
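Table 1's letter codes are what the notification_options directive expects. A small sketch, restricted to the letters in the table (Icinga accepts further ones, such as f for flapping), shows how a value like d,u,r translates into states. It also shows why the file server definition in Listing 1 reports unreachable hosts while the web server definition, with only d, does not.

```python
# Letter codes from Table 1; note that "u" means different things for hosts and services
HOST_STATES = {"o": "OK", "d": "Down", "u": "Unreachable", "r": "Recovered"}
SERVICE_STATES = {"o": "OK", "w": "Warning", "c": "Critical", "r": "Recovered", "u": "Unknown"}


def parse_options(value: str, table: dict[str, str]) -> set[str]:
    """Turn a value such as 'd,u,r' into the set of state names it selects."""
    letters = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [letter for letter in letters if letter not in table]
    if unknown:
        raise ValueError(f"unsupported option letters: {unknown}")
    return {table[letter] for letter in letters}


def notifies(options: str, state: str, table: dict[str, str]) -> bool:
    """True if a notification_options value covers the given state."""
    return state in parse_options(options, table)
```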
Once the servers are defined, the administrator configures the services that Icinga will monitor (Listing 2), along with the matching commands (Listing 3), the time periods (Listing 4), and the responsible administrators, or contacts (Listing 5). The individual configuration files have a similar structure. For each service, you need to consider the interval between checks. One useful feature is the ability to define time periods within which Icinga performs checks and, if necessary, notifies the administrator; this lets you account for restricted hours or holidays. The contact configuration can include email addresses or cell phone numbers, but to integrate a contact with, for example, an Email2SMS gateway or a text-to-speech system (e.g., Festival), you need a matching command.
Listing 2
my_services.cfg (Excerpt)
# SERVICE DEFINITIONS
define service{
    host_name               webserver
    service_description     HTTP
    active_checks_enabled   1
    passive_checks_enabled  0
    check_command           check_http
    max_check_attempts      3    ;how often to perform the check before Icinga notifies
    check_interval          5
    retry_interval          1
    check_period            24x7
    contacts                spz_admin
    notifications_enabled   1
    notification_period     weekdays
    notification_interval   60
    notification_options    w,c,u,r
    }

define service{
    host_name               fileserver, webserver
    service_description     SSH
    active_checks_enabled   1
    passive_checks_enabled  0
    check_command           check_ssh
    max_check_attempts      3
    check_interval          15
    retry_interval          1
    check_period            24x7
    contacts                admin
    notifications_enabled   0
    }
Listing 3
commands.cfg (Excerpt)
# 'notify-service-by-email' command definition
define command{
    command_name    notify-service-by-email
    command_line    /usr/bin/printf "%b" "***** Icinga *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/bin/mail -s "**$NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
    }

# 'check-host-alive' command definition
define command{
    command_name    check-host-alive
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
    }
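In the check-host-alive command, -w 3000.0,80% and -c 5000.0,100% set the warning and critical thresholds as a pair of round-trip average (in milliseconds) and packet loss (in percent), and -p 5 sends five packets. The sketch below classifies a measured result against those thresholds; it approximates the plugin's behavior, and how check_ping treats values exactly on a boundary is an assumption here.

```python
def parse_threshold(spec: str) -> tuple[float, float]:
    """Split a check_ping threshold such as '3000.0,80%' into (rta_ms, loss_pct)."""
    rta, loss = spec.split(",")
    return float(rta), float(loss.rstrip("%"))


def ping_state(rta_ms: float, loss_pct: float,
               warn: str = "3000.0,80%", crit: str = "5000.0,100%") -> str:
    """Classify a ping result against the thresholds used by check-host-alive."""
    crit_rta, crit_loss = parse_threshold(crit)
    warn_rta, warn_loss = parse_threshold(warn)
    # Critical is tested first, so the more severe state wins
    if rta_ms >= crit_rta or loss_pct >= crit_loss:
        return "CRITICAL"
    if rta_ms >= warn_rta or loss_pct >= warn_loss:
        return "WARNING"
    return "OK"
```

These generous limits mean the host check only complains when the server is very slow or barely answering at all.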
Listing 4
timeperiods.cfg (Excerpt)
define timeperiod{
    timeperiod_name  24x7
    alias            24 Hours A Day, 7 Days A Week
    sunday           00:00-24:00
    monday           00:00-24:00
    tuesday          00:00-24:00
    wednesday        00:00-24:00
    thursday         00:00-24:00
    friday           00:00-24:00
    saturday         00:00-24:00
    }

define timeperiod{
    timeperiod_name  weekdays
    alias            Weekdays
    monday           07:00-17:00
    tuesday          07:00-17:00
    wednesday        07:00-17:00
    thursday         07:00-17:00
    friday           07:00-17:00
    }
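A time period is essentially a list of allowed intervals per weekday. The sketch below checks whether a given moment falls inside one, using the 24x7 period and the Monday-to-Friday, 07:00-17:00 period from Listing 4. The dictionary representation is hypothetical, treating the end time as exclusive is an assumption rather than documented Icinga behavior, and exceptions such as holidays are not covered.

```python
from datetime import datetime

DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

# Hypothetical in-memory copies of the two time periods from Listing 4
TWENTY_FOUR_SEVEN = {day: [("00:00", "24:00")] for day in DAY_NAMES}
WEEKDAYS = {day: [("07:00", "17:00")] for day in DAY_NAMES[:5]}


def to_minutes(hhmm: str) -> int:
    """'07:30' -> 450 minutes after midnight ('24:00' -> 1440)."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 60 + minutes


def in_timeperiod(period: dict[str, list[tuple[str, str]]], when: datetime) -> bool:
    """True if 'when' lies inside one of the period's ranges for that weekday."""
    day = DAY_NAMES[when.weekday()]  # weekday(): Monday is 0
    now = when.hour * 60 + when.minute
    # Assumption: start inclusive, end exclusive
    return any(to_minutes(start) <= now < to_minutes(end)
               for start, end in period.get(day, []))
```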
Listing 5
contacts.cfg (Excerpt)
define contact{
    contact_name                    icingaadmin
    alias                           Falko Benthin
    host_notifications_enabled      1
    service_notifications_enabled   1
    host_notification_period        24x7
    service_notification_period     24x7
    host_notification_options       d,u,r
    service_notification_options    w,u,c,r
    host_notification_commands      notify-host-by-email
    service_notification_commands   notify-service-by-email
    email                           root@localhost
    }
Icinga can use macros, which noticeably simplifies and speeds up many tasks because a single command can serve multiple hosts and services. Listings 2 and 3 give examples of macros. All services defined for monitoring the file server use check_nrpe as their check command, followed by an exclamation mark. Each exclamation mark can be followed by an argument, which is passed on to the command definition and picked up there by the $ARG1$, $ARG2$, and subsequent macros. Macro names are enclosed in $ signs.
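To illustrate how arguments after the exclamation marks reach the command, the sketch splits a check_command value at each !, maps the pieces to $ARG1$, $ARG2$, and so on, and substitutes them together with host macros such as $HOSTADDRESS$ and user macros such as $USER1$. The function names are hypothetical, and unknown macros are simply left in place.

```python
import re

MACRO_RE = re.compile(r"\$([A-Z0-9_]+)\$")


def split_check_command(check_command: str) -> tuple[str, list[str]]:
    """'check_nrpe!check_load' -> ('check_nrpe', ['check_load'])."""
    name, *args = check_command.split("!")
    return name, args


def expand_macros(command_line: str, macros: dict[str, str]) -> str:
    """Replace $NAME$ with macros[NAME]; unknown macros are left untouched."""
    return MACRO_RE.sub(lambda m: macros.get(m.group(1), m.group(0)), command_line)


def build_command(check_command: str, commands: dict[str, str],
                  host_macros: dict[str, str], user_macros: dict[str, str]) -> str:
    """Resolve a check_command value into the command line that would be run."""
    name, args = split_check_command(check_command)
    macros = {**user_macros, **host_macros}
    # Each argument after an exclamation mark becomes $ARG1$, $ARG2$, ...
    macros.update({f"ARG{i}": value for i, value in enumerate(args, start=1)})
    return expand_macros(commands[name], macros)
```

For example, with a hypothetical command definition {"check_nrpe": "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$"}, the file server address {"HOSTADDRESS": "192.168.10.127"}, and an assumed {"USER1": "/usr/local/icinga/libexec"}, build_command("check_nrpe!check_load", ...) returns "/usr/local/icinga/libexec/check_nrpe -H 192.168.10.127 -c check_load". The check_load argument and the plugin path are illustrative, not taken from the listings.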
After creating the configuration files and storing them in etc/objects, you still need to tell Icinga about them by adding a line such as cfg_file=/usr/local/icinga/etc/objects/object.cfg for each file to the main configuration file, /path-to-Icinga/etc/icinga.cfg. Then verify the configuration with /path-to-Icinga/bin/icinga -v /path-to-Icinga/etc/icinga.cfg and, assuming there are no errors, restart Icinga (/etc/init.d/icinga restart).
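With many object files, typing the cfg_file lines by hand gets tedious. The hypothetical helper below generates one cfg_file= line for every .cfg file in a directory, ready to paste into the main configuration file. Alternatively, the cfg_dir= directive makes Icinga read all .cfg files in a directory on its own. Either way, the verification step with icinga -v still applies.

```python
from pathlib import Path


def cfg_file_lines(objects_dir: str) -> list[str]:
    """Return one cfg_file= directive per .cfg file in objects_dir, sorted by name."""
    directory = Path(objects_dir).absolute()
    return [f"cfg_file={path}" for path in sorted(directory.glob("*.cfg"))]
```

Called as print("\n".join(cfg_file_lines("/usr/local/icinga/etc/objects"))), it prints the lines for all object files in that directory (the path being that of a source install); other files, such as notes or backups with a different extension, are skipped.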
GUI and Messages
Icinga works without a graphical interface, but it's much nicer to have one. The standard interface can't deny its Nagios ancestry, but it is clear-cut and intuitive.
If everything is working, you'll see a lot of green in the user interface (Figure 1), but if something goes wrong somewhere, the color will change and move closer and closer to red to reflect the status of the hosts or services (Figures 2 and 3). Status messages are typically linked so that clicking one takes you to more detailed information.
When a problem is serious enough to warrant a message, Icinga consults its fairly complex ruleset to decide whether to send a notification and, if so, to whom (Figure 4). The filters the notification passes through check whether notifications are enabled, whether the problem occurred at a time when the host and service should be running (that is, outside any scheduled downtime), whether messages should be sent for this service in the current time period, and what the contacts linked to the service actually want. Each contact can define its own rules stating when it wants to receive messages and for which states. If multiple administrators belong to a single contact group, Icinga notifies all of them. However, you can define individual notification periods so that each admin is responsible for one period.
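The chain of filters can be sketched as a single function that applies them in order. This is a simplified model, not Icinga's code: real Icinga evaluates more conditions (such as acknowledgements, flapping, and escalations), and the Contact class and contact names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    wanted_states: set[str]  # e.g. {"Critical", "Recovered"}, from the contact's options
    in_period: bool          # whether "now" falls inside this contact's notification period


def contacts_to_notify(notifications_enabled: bool, in_downtime: bool,
                       in_service_period: bool, state: str,
                       contacts: list[Contact]) -> list[str]:
    """Apply the filters described above in order and return the contacts to alert."""
    if not notifications_enabled:  # notifications switched off
        return []
    if in_downtime:                # host/service is not supposed to be running
        return []
    if not in_service_period:      # outside the service's notification period
        return []
    # Finally, each contact's own filters decide
    return [c.name for c in contacts if c.in_period and state in c.wanted_states]
```

With a day_admin whose period currently applies and a night_admin whose period does not, a critical state reaches only day_admin, which is how separate notification periods split responsibility within a group.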