Using rrdgraph for better NTP monitoring

Munin is a great tool, and at first it seems quite easy to use it to monitor how your NTP service is doing overall. For example, it's easy to put together a web page with all the offset graphs for your servers.

Unfortunately, this is far from optimal. The graphs will have different scales on the y axis, so a glance is not enough to check how the servers are doing overall. You actually need to read the values displayed on the left side of each graph. This is annoying, because if you don't pay enough attention, you could miss bad things happening.

That's why I took a reluctant look at rrdtool's graph command. I've always been scared by the apparent complexity of the syntax, but I found out that what I needed was actually easy. Like this:

#!/bin/bash

RRDPATH=/where/rrd/files/are
IMGPATH=/where/images/will/go
IMGNAME=ntpint.png
IMGFMT=PNG
WIDTH=500
HEIGHT=300
LIMIT=0.005
TITLE="Offset"
VLABEL="milliseconds"

COLORS=("0000ff" "00ff00" "ff0000" "990000")
CIDX=0

GRAPHDEF='HRULE:0#000000'

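for FILE in "$RRDPATH"/*-ntp_kernel_pll_off-ntp_pll_off-g.rrd
do
  BASE=$(basename "$FILE")
  SERVER=${BASE%%-ntp_kernel_pll_off-ntp_pll_off-g.rrd}
  LABEL=${SERVER%%.*}
  COLOR=${COLORS[$CIDX]}

  # Munin stores its values in a data source named "42"
  GRAPHDEF="$GRAPHDEF DEF:$LABEL=$FILE:42:AVERAGE LINE2:$LABEL#$COLOR:$LABEL "

  # Wrap around so that more servers than colors still get a valid color
  CIDX=$(( (CIDX + 1) % ${#COLORS[@]} ))
done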

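# $GRAPHDEF is left unquoted on purpose: it must split into separate arguments
rrdtool graph "$IMGPATH/$IMGNAME" \
  --start -1d \
  --title "$TITLE" \
  --vertical-label "$VLABEL" \
  --width "$WIDTH" \
  --height "$HEIGHT" \
  --full-size-mode \
  --upper-limit "$LIMIT" \
  --lower-limit "-$LIMIT" \
  --rigid \
  --imgformat "$IMGFMT" \
  $GRAPHDEF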
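
To make the loop more concrete, here is roughly what it computes for a single file. The host name ntp1.example.com is made up for illustration; only the file name suffix comes from the script.

```shell
#!/bin/bash
# Hypothetical file name, following the pattern the loop matches.
FILE=/where/rrd/files/are/ntp1.example.com-ntp_kernel_pll_off-ntp_pll_off-g.rrd

BASE=$(basename "$FILE")                               # strip the directory
SERVER=${BASE%%-ntp_kernel_pll_off-ntp_pll_off-g.rrd}  # ntp1.example.com
LABEL=${SERVER%%.*}                                    # ntp1

# One DEF (where the data comes from) and one LINE2 (how to draw it) per server
echo "DEF:$LABEL=$FILE:42:AVERAGE LINE2:$LABEL#0000ff:$LABEL"
```

The short label is used both as the internal name of the data series and as the legend text, which is why everything after the first dot is cut off.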

So basically, the script iterates over the RRD files found in $RRDPATH and adds each one to the graph definition in $GRAPHDEF. Finally, a single rrdtool command merges all the offset information into one fixed-range graph. With some changes, you could replace those variables with command-line options, which would make the script much more flexible.
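
One possible way to do that is with the shell's getopts builtin. This is only a sketch: the option letters (-r, -o, -n, -l) are my own invention, and the defaults simply mirror the variables in the script.

```shell
#!/bin/bash
# Defaults, used when an option is not given on the command line.
RRDPATH=/where/rrd/files/are
IMGPATH=/where/images/will/go
IMGNAME=ntpint.png
LIMIT=0.005

# Hypothetical option letters: -r rrd dir, -o image dir, -n image name, -l limit
parse_args() {
  local opt
  OPTIND=1   # reset so the function can be called more than once
  while getopts "r:o:n:l:" opt; do
    case "$opt" in
      r) RRDPATH=$OPTARG ;;
      o) IMGPATH=$OPTARG ;;
      n) IMGNAME=$OPTARG ;;
      l) LIMIT=$OPTARG ;;
      *) echo "usage: $0 [-r rrdpath] [-o imgpath] [-n imgname] [-l limit]" >&2
         return 1 ;;
    esac
  done
}

parse_args "$@" || exit 1
```

With something like this at the top, the rest of the script stays the same, and the same file can be run against different directories and limits.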

The final result looks like this:


And, of course, if you couple this script with a web server and cron, you can have the image update regularly and you can keep an eye on it from your web browser.
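
For the cron part, a crontab entry along these lines should do. The script path is hypothetical, and the five-minute interval is an assumption based on Munin's default update cycle. Standard output is discarded because rrdtool graph prints the image size on every run, which cron would otherwise mail to you.

```shell
#!/bin/bash
# Print a crontab entry that runs a script every N minutes.
# Both arguments are placeholders; adjust them to your setup.
cron_line() {
  local minutes="$1" script="$2"
  printf '*/%s * * * * %s >/dev/null\n' "$minutes" "$script"
}

# Hypothetical install location for the graph script
cron_line 5 /usr/local/bin/ntp-offset-graph.sh
```

Paste the printed line into your crontab with crontab -e. Errors still go to standard error, so cron will report them.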


And if you have many sets of NTP servers, you can use the same script to create an image for each set and put them on a single page, creating a nice dashboard. And since all the graphs will have the same scale on the y axis, this time it will take just a glance to tell whether something is wrong.
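
The page itself can be generated too. Here is a minimal sketch, assuming all the images end up as PNG files in one directory. The function name make_dashboard and the page title are made up.

```shell
#!/bin/bash
# Write a bare-bones index.html that shows every PNG found in a directory.
make_dashboard() {
  local imgpath="$1" img name
  {
    echo '<!DOCTYPE html>'
    echo '<html><head><title>NTP offsets</title></head><body>'
    for img in "$imgpath"/*.png; do
      [ -e "$img" ] || continue        # the glob matched nothing
      name=$(basename "$img")
      echo "<p><img src=\"$name\" alt=\"$name\"></p>"
    done
    echo '</body></html>'
  } > "$imgpath/index.html"
}

# Example call: make_dashboard /where/images/will/go
```

Run it once after the images exist, or again whenever you add a new set of servers.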
