Bug affecting NTP multicast users on Linux

…and not just them.

It's Debian bug #654876 (CVE-2012-0207), a kernel bug introduced in Linux 2.6.36 that affects IPv4 multicast users. In particular, if you are using NTP multicast on that kernel version or later, you are affected.

This bug seems easily exploitable on a local network and may be used for denial-of-service attacks. Patches are available in Linux 3.0.17, 3.1.9 and 3.2.1, and Debian has backported the fix to its kernel package version 3.1.8-2.

For more information, see Ben's technical blog.

How does delay influence a clock’s accuracy?

Running ntpq -p to check how ntpd is doing is one of the first things a sysadmin learns when debugging time synchronization problems. One of the first questions that arises is "how do delay, offset and jitter relate?", or even "what do delay, offset and jitter mean?". Well, today I stumbled across a graph that gives me the opportunity to explain this in practice.

Let's start by explaining these three terms in as bare-bones a way as possible. Note that the definitions below are only meant to give you an idea and are in no way rigorous; please go to the NTP documentation, the NTP FAQ or community support sites for the real ones. A minimal sketch of how ntpd actually computes delay and offset follows the list.

  • delay is an estimate of how many milliseconds an NTP packet takes for the round trip from your client to a server and back.
  • offset is an estimate of how many milliseconds your computer's clock differs from the server's (and thus, ideally, from UTC).
  • jitter is an estimate of how accurate these measurements are, and it's non-negative: the smaller the jitter, the better the accuracy.
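
Here is that sketch: the standard NTP on-wire calculation for a single client/server exchange, using the usual T1–T4 timestamp names (the numbers are made up for illustration and have nothing to do with the graph discussed below):

    # T1 = client transmit, T2 = server receive,
    # T3 = server transmit, T4 = client receive (seconds; values are made up)
    T1=10.000; T2=10.012; T3=10.013; T4=10.030
    # Round-trip delay and clock offset as NTP defines them (RFC 5905):
    delay=$(echo "($T4 - $T1) - ($T3 - $T2)" | bc -l)
    offset=$(echo "(($T2 - $T1) + ($T3 - $T4)) / 2" | bc -l)
    echo "delay=$delay offset=$offset"   # delay ≈ 0.029 s, offset ≈ -0.0025 s

Jitter is then, roughly, a measure of how much these offset estimates scatter from one exchange to the next.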

Now look at this graph: … Continue reading

Not enough bits this time

It had been a while since I last went through the NTP docs, and I felt I was missing some of ntpd's interesting new features. So I decided to refresh my knowledge and go through them once again, allocating some time every week to read the docs.

Recently I started reading "Event Messages and Status Words". I almost immediately found one thing I could not understand: apparently, some of the codes in the system status word and the peer status word exceeded the four bits assigned to them. But I was tired, so I decided to wait until the following week and re-read it.
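
To make the "four bits" concrete: the system status word is a 16-bit value split, per that page, into a 2-bit leap indicator, a 6-bit clock source, a 4-bit event counter and a 4-bit event code, so an event code greater than 15 simply does not fit. A quick decode, with a made-up status value and that field layout:

    # A made-up system status word (ntpq prints it in hex, e.g. status=0615):
    status=0x0615
    # Split it into the documented fields: leap(2) | source(6) | count(4) | code(4)
    printf 'leap=%d source=%d count=%d code=%d\n' \
        $(( (status >> 14) & 0x03 )) \
        $(( (status >>  8) & 0x3f )) \
        $(( (status >>  4) & 0x0f )) \
        $((  status        & 0x0f ))
    # A 4-bit field tops out at 15; codes beyond that cannot be represented.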

Yesterday I started reading it again, and I also went through my notes from the last time I read the doc. No luck: it was not that I was tired, there really were codes that could not be represented in just four bits. Still, I assumed I was the one who was mistaken, and sent a message to the questions@ntp.org list.

And, to my surprise, it was actually a bug!

This leads me to two considerations.

The first: read the docs, ask yourself questions, then ask for clarifications if you need them, and report bugs if you happen to find one.

The second is actually a joke: only a pain-in-the-ass like me could be bothered to read that document, count the bits and the codes, and find a mismatch. What a poor reputation I am building for myself 🙂

Too late to fix it

We installed a new datacenter right before I went on vacation, and of course we set up an NTP synchronization subnet there. As always, we configured four NTP multicast servers and the rest as clients (we are talking about several hundred servers). The servers were running Debian Linux "Squeeze", while the vast majority of the clients were running Debian Linux "Lenny". For reasons I am not going to discuss here, using Squeeze instead of Lenny was not an option.
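
For reference, the configuration involved in that kind of setup is minimal. The snippet below is only an illustrative sketch: the multicast group is the standard NTP one, but the upstream server is a placeholder and authentication is simply switched off, which is not necessarily what we did.

    # --- ntp.conf on one of the multicast servers (sketch) ---
    server 0.debian.pool.ntp.org iburst   # placeholder upstream time source
    broadcast 224.0.1.1 ttl 4             # send to the standard NTP multicast group

    # --- ntp.conf on a client (sketch) ---
    multicastclient 224.0.1.1             # listen on the same group
    disable auth                          # or configure proper keys instead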

Right after the configuration was done, I noticed a really odd thing: all clients displayed a poll interval of 1024 seconds for one of the servers. This is just nonsense, as each server sends an NTP packet every 64 seconds. Anyway, the clients were in good sync, so I decided I would investigate this after my vacation. And so I did. … Continue reading

Using rrdgraph for better NTP monitoring

Munin is a great tool, and with it, monitoring how your NTP service is doing overall seems quite easy: for example, it's easy to put together a web page with the offset graphs for all your servers.

Unfortunately, this is far from optimal: the graphs will have different scales on the y axis, so a glance is not enough to check how the servers are doing overall. You'll actually need to check which values are displayed on the left side of each graph. This is annoying because, if you don't pay enough attention, you could miss bad things happening.

That's why I cast a reluctant eye on rrdtool's graphing features. I had always been scared by the apparent complexity of the syntax, but it turned out that what I needed was actually easy. … Continue reading
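
To give an idea of the direction, here is a minimal sketch of the kind of command involved (not necessarily the one from the full post): a fixed, rigid y scale makes every graph directly comparable at a glance. The .rrd path and the data source name are placeholders and depend on how munin stores its files on your system.

    # One offset graph per server, all with the same fixed y scale:
    rrdtool graph server1-ntp-offset.png \
        --start -1d --title "server1 NTP offset" \
        --vertical-label seconds \
        --lower-limit -0.05 --upper-limit 0.05 --rigid \
        DEF:offset=server1-ntp-offset.rrd:42:AVERAGE \
        LINE1:offset#0000FF:offset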

independent_wallclock in Xen 4

I was asked to set up clock synchronization for a Xen VM running on top of Xen 4.0, which in turn was running on top of Debian Squeeze. As you may know, NTP on VMs just sucks, so I went looking for Xen 3.0's independent_wallclock setting in order to have the domU's clock follow dom0's.
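
For context, this is what I was after; on the old xenlinux kernels it is just a sysctl (a sketch: 0 means the domU's wallclock follows dom0, 1 means the domU keeps its own clock and you run ntpd inside it).

    # Inside the domU, on an old xenlinux (non-pvops) kernel:
    cat /proc/sys/xen/independent_wallclock    # 0 = follow dom0, 1 = independent clock
    echo 0 > /proc/sys/xen/independent_wallclock
    # or, persistently:
    echo 'xen.independent_wallclock = 0' >> /etc/sysctl.conf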

To my surprise, /proc/sys/xen/independent_wallclock wasn't there anymore.

Somewhat less surprisingly, my Google searches returned nothing but crap.

As often happens, IRC came to the rescue. Well, not quite a rescue, but at least I was given a reason:

(15:51:10) The topic for ##xen is: Xen 4.0 http://www.xen.org/downloads | XCP http://blog.xen.org/index.php/2010/11/24/the-xen-cloud-platform-xcp-1-0-beta-is-available-from-xen-org/ | Wiki: http://wiki.xen.org/ | Solaris: http://www.opensolaris.org/os/community/xen/ | Logs: http://zentific.com/irclogs | Management: check out #zentific
(15:54:48) bronto@freenode: hello! It's maybe an FAQ, but I couldn't find an answer anywhere. Xen 4.0 doesn't have /proc/sys/xen/independent_wallclock (at least, not in Debian Squeeze Linux). Is there any equivalent?
(15:55:14) bronto@freenode: I would like domUs to follow dom0's clock, and sync dom0 using NTP.
(15:55:42) pasik: bronto: it depends on the kernel
(15:55:47) pasik: bronto: pvops kernels don't have it
(15:55:57) pasik: bronto: old xenlinux kernels do have it
(15:56:33) bronto@freenode: pasik: thanks. hmmm… so, the solution is: use a different kernel?
(15:56:40) pasik: bronto: or ntp
(15:56:55) bronto@freenode: pasik: NTP on domU's? Uh, that just sucks…
(15:58:57) pasik: bronto: pvops (upstream) kernels don't have independent_wallclock because there was some problems with it, it wasn't thought to be accepted to upstream Linux
(15:59:03) pasik: bronto: I can't remember the details now
(15:59:27) bronto@freenode: pasik: ouch… 😦
(15:59:41) bronto@freenode: pasik: this is damn bad
(16:00:31) bronto@freenode: pasik: OK. I'll fall back to ntp, and hope it will work

It must be simpler than this…

I sat down scratching my head… that NTP client was syncing perfectly in unicast, and didn't create any association once configured in multicast. "Dah, the same old problem", I told myself, "it's not getting the packets; setting a multicast route will fix it".
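
The "same old problem" fix I had in mind is the classic multicast route, something along these lines (the interface name is obviously a placeholder):

    # Tell the kernel which interface to use for multicast traffic:
    route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0
    # or, with iproute2:
    ip route add 224.0.0.0/4 dev eth0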

So I prepared the usual debugging set: one window running tcpdump 'dst port ntp', one window on the client running watch ntpq -c pe -c as, another one with tail -f /var/log/syslog | grep ntp, and a free shell window. To my surprise, as soon as I fired up tcpdump, multicast NTP packets showed up. "What the…?!" I said. … Continue reading