The dirty dozen

I still remember very well the first (and, so far, only) time I held a seminar for my colleagues at Opera: “NTP, a misunderstood protocol”, in February 2011. Now fast forward to 2012. We had one leapocalypse between June and July, one bogus leap second a month later, and a bit of a disaster about a week ago: nearly two years after that seminar, NTP is still a misunderstood protocol.

…That day I didn’t have a huge audience: only six colleagues came to hear me (and, if I recall correctly, none of them were sysadmins). I remember one of them in particular, who questioned the need to configure four independent sources in each client, and in each of our four servers, for a robust setup. I probably wasn’t convincing enough, because that colleague concluded: “I still think it’s too much, and I’ll use just one”. That day, he joined the huge gang of sysadmins who think that one source is good enough, and of those who still think that putting ntpdate in cron is a good idea. And on top of the pile, he joined the brainless implementers of those OSs that let you configure only one time source on systems that are crucial for a whole infrastructure.

Then this happens: one server goes crazy and, thanks to a nice chain of ill-configured servers and clients, a disaster wreaks havoc on the internet, with hundreds of computers stepping their clocks back 12 years: after the leap second, we got the leap dozen.

I had my servers configured correctly, and I didn’t even notice. I learned about the disaster only three days later, and by accident. And I have one thing to say to the bad time gang: you deserved it, because experts told you that you were doing it wrong, and you didn’t care and didn’t listen.

Yes, it could have been prevented. First and foremost, ntpd is supposed to panic rather than step the clock back by such a huge amount (by default it refuses offsets larger than 1000 seconds), so if it happened to you and you were using ntpd, your configuration is surely screwed.
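To make the difference concrete, here is a toy Python model of how ntpd and ntpdate react to a single measured offset, based on the defaults documented in their man pages: ntpd slews offsets up to 128 ms, steps larger ones, and exits above its 1000-second panic threshold, while ntpdate steps anything above 0.5 seconds with no upper limit. It is only a sketch, not ntpd's real logic: it ignores the stepout interval, `tinker` overrides and the `-g` flag, which lets ntpd make one adjustment of any size at startup (and which some distributions enable by default).

```python
# Toy model of how ntpd and ntpdate react to a single measured clock offset.
# Thresholds are the defaults documented in their man pages; real ntpd also
# honours -g/-x, the stepout interval, "tinker" overrides and much more.
STEP_THRESHOLD_S = 0.128    # ntpd: offsets above this are stepped, below slewed
PANIC_THRESHOLD_S = 1000.0  # ntpd: offsets above this make the daemon exit
NTPDATE_STEP_S = 0.5        # ntpdate: offsets above this are stepped

TWELVE_YEARS_S = 12 * 365 * 24 * 3600  # roughly the size of the bogus step


def ntpd_reaction(offset_s: float) -> str:
    magnitude = abs(offset_s)
    if magnitude > PANIC_THRESHOLD_S:
        return "panic"  # log a message and exit, leaving the clock alone
    if magnitude > STEP_THRESHOLD_S:
        return "step"
    return "slew"


def ntpdate_reaction(offset_s: float) -> str:
    # No sanity limit on the size of the step: the clock follows the server.
    return "step" if abs(offset_s) > NTPDATE_STEP_S else "slew"


if __name__ == "__main__":
    for offset in (0.01, 0.5, -TWELVE_YEARS_S):
        print(f"offset {offset:>16.3f} s: ntpd -> {ntpd_reaction(offset)}, "
              f"ntpdate -> {ntpdate_reaction(offset)}")
```

With these defaults, a 12-year step lands far beyond ntpd's panic threshold, while an ntpdate run from cron would simply apply it.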

If you had only one time source, that source was bugged (it’s called a falseticker), and you had that ugly ntpdate in cron pointing to it, you really had no chance to survive the step back (and again: you probably deserved it, seriously).

If you had two sources, you could have escaped it, if you were lucky enough. But with only two reference clocks showing completely different times, it may be tricky for ntpd to tell which source is right, so chances are that those with just two sources (one of them bugged) either stepped back or panicked.

If you had three sources (one of them bugged), ntpd could easily rule out the faulty server. But a failure in one of the good sources would bring you back to the two-server configuration, and there you are dead again.

Only with four servers can you survive the failure of one source and a bugged source at the same time.
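The reasoning about one, two, three and four sources boils down to a majority vote. The sketch below is a simplified take on the intersection idea behind ntpd's source selection: each source reports an offset with an error bound, we look for the point covered by the most intervals, and we trust it only if a strict majority of the reachable sources agrees. Source names, offsets and error bounds are made up, and real ntpd adds clustering, combining and several sanity checks on top of this.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str      # hypothetical source name
    offset: float  # measured offset from the local clock, seconds
    error: float   # error bound around the offset, seconds

    def contains(self, point: float) -> bool:
        return self.offset - self.error <= point <= self.offset + self.error


def select_truechimers(sources: list[Source]) -> list[str]:
    """Names of the sources agreeing on a common time, or [] if no strict
    majority of the given (reachable) sources agrees."""
    if not sources:
        return []
    # The point covered by the most intervals is always one of the endpoints.
    endpoints = [s.offset - s.error for s in sources] + [s.offset + s.error for s in sources]
    best_point = max(endpoints, key=lambda p: sum(s.contains(p) for s in sources))
    agreeing = [s.name for s in sources if s.contains(best_point)]
    # A falseticker can be outvoted only if the truechimers are a strict majority.
    return agreeing if 2 * len(agreeing) > len(sources) else []


TWELVE_YEARS_S = 12 * 365 * 24 * 3600
GOOD_A = Source("good-a", 0.001, 0.05)
GOOD_B = Source("good-b", 0.002, 0.05)
GOOD_C = Source("good-c", -0.001, 0.05)
BUGGY = Source("buggy", -TWELVE_YEARS_S, 0.05)

SCENARIOS = {
    "one source, bugged": [BUGGY],
    "two sources, one bugged": [GOOD_A, BUGGY],
    "three sources, one bugged": [GOOD_A, GOOD_B, BUGGY],
    "three sources, one bugged, one good down": [GOOD_A, BUGGY],
    "four sources, one bugged, one good down": [GOOD_A, GOOD_C, BUGGY],
}

if __name__ == "__main__":
    for label, reachable in SCENARIOS.items():
        print(f"{label}: {select_truechimers(reachable) or 'no majority'}")
```

In this toy, a lone falseticker is trusted simply because nobody outvotes it, two disagreeing sources produce no majority at all, and only the four-source setup still finds a majority after losing one good server.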

You may now start to understand why NTP experts insist on having at least four independent sources, and why you should periodically check that your servers’ sources are still alive and well. That said, suit yourself, but you have been warned.
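As for checking your sources periodically, a minimal sketch that reads the text printed by `ntpq -pn` could look like this. It assumes the classic peer table layout (tally code, remote, refid, st, t, when, poll, reach, delay, offset, jitter) and the tally codes documented for ntpq, where `x` marks a falseticker and the reach register is printed in octal. The sample output and the `check_sources` helper are hypothetical; other ntpq versions, or other daemons such as chrony, print different tables.

```python
# Hypothetical check over the text printed by `ntpq -pn`, assuming the classic
# layout "remote refid st t when poll reach delay offset jitter", one peer per
# line, with the tally code in the first character of each peer line.
TALLY_CODES = {
    " ": "rejected", "x": "falseticker", ".": "excess", "-": "outlier",
    "+": "candidate", "#": "backup", "*": "system peer", "o": "PPS peer",
}

# Made-up sample output; 192.0.2.0/24 is a documentation-only address range.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.1       .GPS.            1 u   33   64  377    1.234    0.101   0.050
+192.0.2.2       192.0.2.1        2 u   40   64  377    2.345    0.210   0.070
x192.0.2.3       192.0.2.9        2 u   12   64  377    3.456  -950.123   0.090
 192.0.2.4       .INIT.          16 u    - 1024    0    0.000    0.000   0.000
"""


def parse_peers(ntpq_output: str) -> list[dict]:
    peers = []
    for line in ntpq_output.splitlines():
        # Skip blank lines, the column header and the "=====" separator.
        if not line.strip() or line.lstrip().startswith("remote") or line.startswith("="):
            continue
        tally, fields = line[0], line[1:].split()
        if len(fields) < 7:
            continue  # not a peer line we understand
        peers.append({
            "remote": fields[0],
            "status": TALLY_CODES.get(tally, "unknown"),
            # Octal bitmask of the last 8 polls; 0 means none was answered.
            "reach": int(fields[6], 8),
        })
    return peers


def check_sources(ntpq_output: str, minimum: int = 4) -> list[str]:
    """Return human-readable problems; an empty list means all looks fine."""
    peers = parse_peers(ntpq_output)
    reachable = [p for p in peers if p["reach"] != 0]
    problems = []
    if len(reachable) < minimum:
        problems.append(f"only {len(reachable)} reachable sources, want at least {minimum}")
    for p in peers:
        if p["status"] == "falseticker":
            problems.append(f"{p['remote']} flagged as falseticker")
        elif p["reach"] == 0:
            problems.append(f"{p['remote']} unreachable")
    return problems


if __name__ == "__main__":
    for problem in check_sources(SAMPLE):
        print("WARNING:", problem)
```

Run from cron or from your monitoring system, a check like this tells you that a configuration with four sources has quietly degraded to two long before a falseticker gets the chance to prove it.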


6 thoughts on “The dirty dozen”

  1. Ciao Daniele, VMs should never be synchronised via NTP: synchronise the host, install the VMware Tools on the guest, and let the guest use the system clock as a reference. I took a peek at your document. If you were to install many virtual machines, I suggest that you set up your own internal servers and use them as a reference, instead of poking at public servers from each and every machine. Ciao! – M

  2. DR writes: Hi Marco, that's interesting, because VMware suggests using NTP to sync Linux VMs. Perhaps it's because of the time drift VMs experience with regard to the (hardware) system clock, and syncing directly via NTP cuts out the extra step. Your thoughts on this? I agree on the second point; mine is just a home-made installation. Cheers! – DR

  3. Originally posted by anonymous:

    Hi Marco, that's interesting, because VMware suggests using NTP to sync Linux VMs.

    That's news to me. When I worked for Sardegna IT, the recommendation was to have the VMs follow the host's clock. I still think that, where possible, this is the way to go: ntpd and variable-frequency CPUs don't like each other. From ntpd's perspective, a VM's virtual CPUs appear to change their frequency all the time, almost randomly, so it's not unusual to see a VM's clock offset oscillate wildly when ntpd is used. Unfortunately, there are still cases where ntpd on a VM is the only viable option, and one has to live with that :( – M
