Monitoring HPC Systems
Nerve Center
Configuring
A few edits need to be made to the gmetad and gmond configuration files. The changes to /etc/ganglia/gmetad.conf are fairly easy. First, look for the line in the file that reads
data_source "my cluster" localhost
and change it to
data_source "Ganglia Test Setup" 192.168.1.4
where 192.168.1.4 is the IP address of the master node. You can use any name you want for your Ganglia cluster; I chose "Ganglia Test Setup" as an example. The gmond configuration file also needs to be modified slightly.
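If you would rather script the gmetad.conf change than make it in an editor, the following Python sketch builds a data_source line and swaps it in for the first uncommented one. It assumes the line format described in the comments of the stock gmetad.conf: a quoted cluster name, an optional polling interval in seconds, and then one or more host[:port] entries. The helper names build_data_source and set_data_source are hypothetical.

```python
import re

# Matches an uncommented data_source line, such as the stock
#   data_source "my cluster" localhost
# Commented-out lines start with '#', so they never match '^data_source'.
DATA_SOURCE_RE = re.compile(r'^data_source\s+"[^"]*".*$', re.MULTILINE)


def build_data_source(name, hosts, interval=None):
    """Build a gmetad.conf line of the (assumed) form
    data_source "name" [polling interval] host1[:port] host2[:port] ...
    """
    if '"' in name:
        raise ValueError("cluster name must not contain double quotes")
    parts = [f'data_source "{name}"']
    if interval is not None:
        parts.append(str(interval))  # optional polling interval, in seconds
    parts.extend(hosts)
    return " ".join(parts)


def set_data_source(conf_text, new_line):
    """Replace the first uncommented data_source line in gmetad.conf text."""
    # A function replacement keeps re from interpreting backslashes in new_line.
    new_text, count = DATA_SOURCE_RE.subn(lambda _m: new_line, conf_text, count=1)
    if count == 0:
        raise ValueError("no uncommented data_source line found")
    return new_text
```

For example, set_data_source(text, build_data_source("Ganglia Test Setup", ["192.168.1.4"])) returns the edited file contents, which you would write back to /etc/ganglia/gmetad.conf as root, ideally after keeping a backup copy.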
The following change needs to be made to the file /etc/ganglia/gmond.conf: find the section that starts with cluster {. In that section, set the variable name to anything you like; just be sure the value is in quotes, and don't use any quotes or other unusual characters in the name itself. I changed mine like this:
name = "Ganglia Test Setup"
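The same edit can be scripted. The Python sketch below finds the cluster { ... } block in gmond.conf and rewrites only its name assignment, leaving the globals, host, and other blocks alone. It assumes the block's closing brace sits at the start of its own line, as it does in the stock file; set_cluster_name is a hypothetical helper, not part of Ganglia.

```python
import re

# The cluster { ... } block; assumes the closing '}' starts its own line.
CLUSTER_BLOCK_RE = re.compile(r"^cluster\s*\{(.*?)^\}", re.MULTILINE | re.DOTALL)
# A   name = "..."   assignment on a single line inside that block.
NAME_RE = re.compile(r'^([ \t]*name[ \t]*=[ \t]*)"[^"]*"', re.MULTILINE)


def set_cluster_name(conf_text, name):
    """Return gmond.conf text with the cluster block's name set to `name`."""
    # The name goes inside quotes, so it must not contain quotes itself.
    if '"' in name:
        raise ValueError("cluster name must not contain double quotes")
    block = CLUSTER_BLOCK_RE.search(conf_text)
    if block is None:
        raise ValueError("no cluster { ... } block found")
    body, count = NAME_RE.subn(
        lambda m: f'{m.group(1)}"{name}"', block.group(1), count=1
    )
    if count == 0:
        raise ValueError("cluster block has no name = ... line")
    # Splice the edited body back in; text outside the block is untouched.
    return conf_text[: block.start(1)] + body + conf_text[block.end(1):]
```

Restricting the edit to the cluster block matters because other blocks in gmond.conf use similar key = "value" lines.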
That's about it for configuring Ganglia on the master node. When I start adding Ganglia clients, I'll have to come back and edit gmetad.conf to add client IP addresses, but that happens later in the article. At this point, I have a choice: I can proceed with installing the Ganglia web interface, or I can test gmond to make sure it's collecting data on the master node. I tend to be a little more conservative and want to run a test before jumping into the deep end of the pool.
Testing gmond and gmetad
To run and test (debug) gmond from the command line, I'll run it "by hand," telling it that I'm "debugging." This process can produce a great deal of output, so I'll capture it using the script command (Listing 7). Remember to use Ctrl+C (^c) to kill gmond and then Ctrl+D (^d) to stop the script session.
Listing 7
Testing gmond
[root@home4 laytonjb]# cd /tmp
[root@home4 tmp]# script gmond.out
[root@home4 tmp]# gmond -d 5 -c /etc/ganglia/gmond.conf
[root@home4 tmp]# ^c
[root@home4 tmp]# ^d
Take a look at the top of the captured file (gmond.out), and you should see output that looks like Listing 8, which indicates that gmond is working correctly. If everything is running correctly, at least as far as you can tell, then start the gmond and gmetad daemons and make sure they function correctly (Listing 9). You should see an OK from each of these commands. If you don't, you have a problem and should go back through the steps.
Listing 8
gmond Test Output
[root@home4 tmp]# gmond -d 5 -c /etc/ganglia/gmond.conf
loaded module: core_metrics
loaded module: cpu_module
loaded module: disk_module
loaded module: load_module
loaded module: mem_module
loaded module: net_module
loaded module: proc_module
loaded module: sys_module
loaded module: python_module
udp_recv_channel mcast_join=239.2.11.71 mcast_if=NULL port=8649 bind=239.2.11.71 buffer=0
socket created, SO_RCVBUF = 124928
tcp_accept_channel bind=NULL port=8649 gzip_output=0
udp_send_channel mcast_join=239.2.11.71 mcast_if=NULL host=NULL port=8649
Unable to find the metric information for 'procs_blocked'. Possible that the module has not been loaded.
Unable to find the metric information for 'procs_created'. Possible that the module has not been loaded.
Unable to find any metric information for 'softirq_(.+)'. Possible that a module has not been loaded.
metric 'cpu_user' being collected now
[tcp] Starting TCP listener thread...
...
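Listing 8 shows only the top of the capture, and the full gmond.out can be long. The Python sketch below is one way to pull out the lines worth checking: the modules gmond loaded, the "Unable to find" messages, and the ports named on the channel lines. In the run shown in Listing 8, those "Unable to find" messages appeared even though gmond was working, so read them rather than treating them as fatal. The function name is hypothetical, and the patterns simply match the message text in Listing 8; other gmond versions may word things differently.

```python
import re


def summarize_gmond_log(text):
    """Pull the interesting lines out of a captured `gmond -d` session."""
    # script(1) captures usually have \r\n line endings; splitlines handles both.
    lines = [line.strip() for line in text.splitlines()]
    return {
        # Modules gmond reports as loaded, e.g. "loaded module: cpu_module".
        "modules": re.findall(r"^loaded module: (\S+)", text, re.MULTILINE),
        # Metrics that gmond could not match to a loaded module.
        "missing_metrics": [l for l in lines if l.startswith("Unable to find")],
        # Distinct port= values from the udp/tcp channel lines.
        "ports": sorted({int(p) for p in re.findall(r"\bport=(\d+)", text)}),
    }
```

A capture from script can contain terminal control bytes, so reading it with open("/tmp/gmond.out", errors="replace").read() before calling summarize_gmond_log avoids decode errors.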
Listing 9
Starting Daemons
[root@home4 ~]# /etc/rc.d/init.d/gmond start
Starting GANGLIA gmond: [ OK ]
[root@home4 ~]# /etc/rc.d/init.d/gmetad start
Starting GANGLIA gmetad: [ OK ]
If you got two OKs, then you can also check whether the processes are running and the ports are configured correctly (Listing 10). Notice that gmond is listening on port 8649 and gmetad on ports 8651 and 8652, so everything's good at this point. Now I'm ready to install the web interface!
Listing 10
Checking Processes and Ports
[root@home4 ~]# ps -ef | grep -v grep | grep gm
nobody 21637 1 0 18:12 ? 00:00:00 /usr/sbin/gmond
nobody 21656 1 0 18:12 ? 00:00:00 /usr/sbin/gmetad
[root@home4 ~]# netstat -plane | egrep 'gmon|gme'
tcp 0 0 0.0.0.0:8651 0.0.0.0:* LISTEN 99 253012 21656/gmetad
tcp 0 0 0.0.0.0:8652 0.0.0.0:* LISTEN 99 253013 21656/gmetad
tcp 0 0 0.0.0.0:8649 0.0.0.0:* LISTEN 99 252721 21637/gmond
udp 0 0 192.168.1.4:47559 239.2.11.71:8649 ESTABLISHED 99 252723 21637/gmond
udp 0 0 239.2.11.71:8649 0.0.0.0:* 99 252719 21637/gmond
unix 2 [ ] DGRAM 252725 21637/gmond
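The netstat check in Listing 10 can also be done from a script by simply trying to open a TCP connection to each port. The Python sketch below assumes the three TCP ports seen in Listing 10 (8649 for gmond, 8651 and 8652 for gmetad) and that it runs on the master node, where both daemons listen on all addresses; port_is_listening and check_ganglia_ports are hypothetical helper names.

```python
import socket

# TCP listeners seen in Listing 10; adjust if your configuration differs.
EXPECTED_PORTS = {8649: "gmond", 8651: "gmetad", 8652: "gmetad"}


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # The connection is closed again as soon as it succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_ganglia_ports(host="127.0.0.1", ports=EXPECTED_PORTS):
    """Map each expected port to whether something is listening on it."""
    return {port: port_is_listening(host, port) for port in ports}
```

On a healthy master node, check_ganglia_ports() should report True for all three ports. Note that a successful connect proves only that something is listening; it does not confirm the process name the way netstat -p does.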
Web Interface
Ganglia has a second-generation web interface that is very flexible, including the ability to define your own charts. It uses RRDtool as the database for the charts, a common theme in the monitoring world.
You can download it from SourceForge [17] or get it from the Ganglia website. I will be using version 3.5.12, which was the latest version at the time of writing. The web interface requires an HTTP server (HTTPD) and PHP, so be sure you install those.
Download the compressed TAR file, then uncompress and untar it. The README for the tool points to a URL for installation instructions. For my installation, I edited the Makefile and made just four changes:
(1) At the top of the file, change the GDESTDIR line to:
GDESTDIR = /var/www/html/ganglia
This is where the Ganglia web interface will be installed.
(2) Change the GWEB_STATEDIR line to:
GWEB_STATEDIR = /var/lib/ganglia-web
(3) Change the GMETAD_ROOTDIR line to:
GMETAD_ROOTDIR = /var/lib/ganglia
(4) Change the APACHE_USER line to:
APACHE_USER = apache
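The four Makefile changes above can be applied in one pass. The Python sketch below rewrites the value of each named variable assignment at the start of a line and fails loudly if one of the variables is missing, which would suggest a Makefile layout different from the one described here. It assumes simple one-line assignments using =, ?=, or :=; apply_makefile_settings is a hypothetical helper, and the paths are the ones used in this article.

```python
import re

# The four settings used in this article; adjust paths for your distribution.
GWEB_SETTINGS = {
    "GDESTDIR": "/var/www/html/ganglia",
    "GWEB_STATEDIR": "/var/lib/ganglia-web",
    "GMETAD_ROOTDIR": "/var/lib/ganglia",
    "APACHE_USER": "apache",
}


def apply_makefile_settings(text, settings=GWEB_SETTINGS):
    """Set VAR = value for each variable, keeping the original operator."""
    missing = []
    for var, value in settings.items():
        # Group 1 keeps "VAR =", "VAR ?=", or "VAR :=" plus trailing spaces.
        pattern = re.compile(rf"^({re.escape(var)}\s*[?:]?=[ \t]*).*$", re.MULTILINE)
        text, count = pattern.subn(lambda m: m.group(1) + value, text, count=1)
        if count == 0:
            missing.append(var)
    if missing:
        raise ValueError("not found in Makefile: " + ", ".join(missing))
    return text
```

Raising on a missing variable, rather than silently appending one, keeps the script from producing a Makefile that installs somewhere unexpected.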
Once these changes are made, you can simply run make install to install the Ganglia web pieces. Now comes the big test. In your browser, open the Ganglia web page at http://192.168.1.4/ganglia (recall that in gmetad.conf I told it that the data source was 192.168.1.4). You should see something like the image in Figure 2. Notice that on the left-hand side of the image, near the top of the web page, the Hosts up: count is 1 and the host has eight CPUs. Plus, the charts are populated. (I took the screen capture after letting it run a while, so the charts actually had real data.)
Remember that the default refresh or polling interval is 15 seconds, so it might take a couple of minutes for the charts to show you much. Be sure to look at the data below the charts. If the values are reasonable, then most likely things are working correctly.
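Another way to sanity-check the values behind the charts is to read gmond's raw data. When a client connects to gmond's TCP accept channel (port 8649 in Listing 8), gmond sends an XML dump of the cluster state and closes the connection. The Python sketch below fetches that dump and reports each HOST with its cpu_num metric, which should match the host and eight CPUs shown in the web interface. It assumes the usual GANGLIA_XML / CLUSTER / HOST / METRIC element layout with NAME and VAL attributes, and that gmond's access rules allow your client to connect; fetch_gmond_xml and summarize_hosts are hypothetical helpers, and the list includes every host gmond knows about, not only those currently up.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="192.168.1.4", port=8649, timeout=5.0):
    """Read gmond's XML dump; gmond closes the socket when it is done."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # empty read means gmond closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def summarize_hosts(xml_bytes):
    """Map each HOST NAME to its cpu_num value (None if not reported)."""
    root = ET.fromstring(xml_bytes)
    summary = {}
    for host in root.iter("HOST"):  # iter() also finds hosts nested in GRIDs
        cpus = None
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "cpu_num":
                cpus = int(metric.get("VAL"))
                break
        summary[host.get("NAME")] = cpus
    return summary
```

On the test setup in this article, summarize_hosts(fetch_gmond_xml()) should return a single entry for the master node with a value of 8.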