Debugging NTP again (part 4 and last)

I finally got the Xen servers' clocks to stabilize! They kept converging overnight and are now sporting a very small offset, so small that even minor hiccups are visible on today's graph.

The problem appears to be caused by a kernel bug. I don't know whether it has been fixed in more recent kernels or patches. In any case, you can work around it by putting "disable kernel" in your ntp.conf, which turns off the kernel clock discipline and leaves that job to ntpd itself. In addition, you should set xen.independent_wallclock to 0 and set your clocksource to xen.
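The three settings above could be scripted roughly like this. It is a minimal sketch rather than a record of what was done on these servers: the function names are hypothetical, and it assumes a Xen-patched kernel that provides the xen.independent_wallclock sysctl and the usual sysfs clocksource files. Paths are passed as arguments so the functions can be tried on copies first.

```shell
#!/bin/sh
# Minimal sketch of the three settings described above. Function names are
# hypothetical; paths are arguments so the functions can be tried on copies.

# Append "disable kernel" to an ntp.conf unless that exact line is already
# there. It turns off the kernel clock discipline, leaving it to ntpd.
ensure_disable_kernel() {
    ntp_conf=$1
    if ! grep -qsx 'disable kernel' "$ntp_conf"; then
        printf '%s\n' 'disable kernel' >> "$ntp_conf"
    fi
}

# Persist xen.independent_wallclock = 0 in a sysctl config file, dropping
# any earlier line that sets the same key. Assumes a Xen-patched kernel
# that actually provides this sysctl.
ensure_wallclock_sysctl() {
    sysctl_conf=$1
    sysctl_tmp=$(mktemp) || return 1
    # grep exits non-zero when it selects no lines or the file is missing;
    # neither is an error here.
    grep -sv '^[[:space:]]*xen\.independent_wallclock[[:space:]]*=' \
        "$sysctl_conf" > "$sysctl_tmp" || :
    printf '%s\n' 'xen.independent_wallclock = 0' >> "$sysctl_tmp"
    # Copy back with cat so an existing file keeps its owner and mode.
    cat "$sysctl_tmp" > "$sysctl_conf"
    rm -f "$sysctl_tmp"
}

# Switch the running kernel to the xen clocksource, if it offers one.
# $1 is the sysfs directory, normally
# /sys/devices/system/clocksource/clocksource0.
use_xen_clocksource() {
    cs_dir=$1
    cs_avail=$(cat "$cs_dir/available_clocksource" 2>/dev/null) || cs_avail=
    case " $cs_avail " in
        *" xen "*)
            printf '%s\n' xen > "$cs_dir/current_clocksource"
            ;;
        *)
            echo "no xen clocksource listed in $cs_dir" >&2
            return 1
            ;;
    esac
}
```

On the affected server you would run them as root against /etc/ntp.conf, /etc/sysctl.conf (followed by sysctl -p to load it right away) and /sys/devices/system/clocksource/clocksource0, then restart ntpd (/etc/init.d/ntp restart on Debian). The sysfs write does not survive a reboot; adding clocksource=xen to the kernel command line is the usual way to make that part permanent.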

Of course, it would not have been possible without the help of other people, so let me tell the full story and give credit to those who deserve it. …

Debugging NTP again (part 2)

If you're wondering what I tried yesterday, here it is. I made one of the two crazy servers a unicast client, hoping that letting it decide when to poll the stratum 1 servers would help it adjust better. Although a bit "irregular", it seemed to work better for a while. Unfortunately, when I came back to the office today and looked at the graphs, I saw that during the night it had started to oscillate more and more, until ntpd reset it today. The other one is still misbehaving, of course.

Another day has passed, and I'm still looking for a solution…

Debugging NTP again (part 1)

Xen servers are well known for having time synchronization problems, and I have a few here, too: two datacenters, each with three multicast NTP servers and two Xen servers. In each datacenter, one of the Xen servers works like a charm, while the other's offset oscillates more and more widely until ntpd resets the clock and the cycle starts again. Applying the known workarounds only made things worse.

Having a Xen server's clock suddenly jump backwards from time to time is no good. This really calls for some deep debugging. …

So, what’s this ntp server doing?

Today I'm debugging a multicast NTP server. I want to see whether it sends the right IGMP packets, and whether it will finally send those damned NTP multicast packets.

tcpdump comes to the rescue, and it turns out that ntpd is not doing what it's supposed to do, so I am in for further debugging and research 😦

By the way, the tcpdump line I used is:

tcpdump '(ip multicast and dst port ntp) or igmp'

PS: I forgot to mention that this is a Debian Lenny system.