I still remember very well the first (and, so far, only) time I held a seminar for my colleagues at Opera: “NTP, a misunderstood protocol”, in February 2011. Now fast forward to 2012. We had one leapocalypse between June and July, a bogus leap second a month later, and a small disaster about a week ago: nearly two years after that seminar, NTP is still a misunderstood protocol.
That day I didn’t have a big audience: only six colleagues came to hear me (and none of them were sysadmins, if I recall correctly). I remember one of them in particular, who questioned the need to configure four independent sources in each client, and in each of our four servers, for a robust setup. I probably wasn’t convincing enough, because that colleague concluded: “I still think it’s too much, and I’ll use just one”. That day he joined the huge gang of sysadmins who think that one source is good enough, and of those who still think that putting ntpdate in cron is a good thing. And on top of the pile, he joined the brainless implementers of those OSes that let you configure only one time source on systems that are crucial for a whole infrastructure.
Then this happens: one server goes crazy and, thanks to a nice chain of ill-configured servers and clients, a disaster wreaks havoc on the internet, with hundreds of computers stepping their clocks back twelve years: after the leap second, we got the leap dozen.
I had my servers configured correctly, and I didn’t even notice. I only learned about the disaster three days later, and by accident. And I have one thing to say to the bad-time gang: you deserved it, because experts told you that you were doing it wrong, and you didn’t care and didn’t listen.
Yes, it could have been prevented. First and foremost, ntpd is supposed to panic and refuse to step the clock back by such a huge amount, so if this happened to you while you were running ntpd, your configuration is surely screwed.
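To make the panic idea concrete, here is a minimal Python sketch of the decision ntpd makes when it has a new offset to correct. It assumes the reference implementation’s default thresholds (128 ms to step instead of slew, 1000 s to panic) and ignores details such as the stepout interval; the function and parameter names are mine, not ntpd’s.

```python
# Assumed defaults of the reference ntpd ("tinker step" and "tinker panic").
STEP_THRESHOLD = 0.128    # seconds: above this the clock is stepped, not slewed
PANIC_THRESHOLD = 1000.0  # seconds: above this ntpd gives up instead of correcting


def correction_action(offset: float, first_update: bool = False,
                      allow_initial_step: bool = False) -> str:
    """Return "slew", "step" or "panic" for a measured offset in seconds.

    allow_initial_step models ntpd's -g option, which lets the very first
    correction exceed the panic threshold. Simplified sketch: the stepout
    interval (how long an offset must persist before stepping) is ignored.
    """
    if abs(offset) > PANIC_THRESHOLD and not (first_update and allow_initial_step):
        return "panic"   # ntpd logs the problem and exits
    if abs(offset) > STEP_THRESHOLD:
        return "step"    # clock is set directly
    return "slew"        # clock frequency is nudged gradually


if __name__ == "__main__":
    twelve_years = 12 * 365 * 86400  # roughly the jump seen in the incident
    for off in (0.003, 0.4, -twelve_years):
        print(f"{off:>14} s -> {correction_action(off)}")
```

A twelve-year step back falls far beyond the panic threshold, which is why a running ntpd should have stopped rather than followed the bogus server.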
If you had only one time source, that source was bugged (such a source is called a falseticker), and on top of that you had that ugly ntpdate in cron pointing at it, you really had no chance to survive the step back (and again: you probably deserved it, seriously).
If you had two sources, you could escape it if you were lucky enough. But with only two reference clocks showing completely different times, there is no majority that ntpd can use to tell which source is right, so chances are that those with just two sources (one of them bugged) either stepped back or panicked.
If you had three sources (one of them bugged), ntpd could easily rule out the faulty server. But a failure in one of the good sources would bring you back to the two-source configuration, and there you are dead again.
Only with four servers can you survive the failure of one source and a bugged source at the same time.
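The counting behind the last four paragraphs can be sketched with a simplified, Marzullo-style intersection search, which is roughly the idea behind ntpd’s selection algorithm: each source is an interval of plausible offsets, and a result is accepted only if the number of assumed falsetickers stays below half the sources. This is an illustration under simplifying assumptions (fixed ±10 ms error bars, no clustering, no midpoint checks), not ntpd’s actual code; the intervals and the twelve-year offset are made up.

```python
from typing import List, Optional, Tuple

# (low, high): bounds, in seconds, on a source's estimate of our clock offset
Interval = Tuple[float, float]


def intersection(intervals: List[Interval]) -> Optional[Interval]:
    """Simplified Marzullo-style search for the offset range a majority agrees on.

    Tries f = 0, 1, ... assumed falsetickers while 2*f < n and returns the
    first interval covered by at least n - f sources, or None if no majority
    exists.
    """
    n = len(intervals)
    # Endpoints tagged -1 (lower) or +1 (upper). At equal values, lower bounds
    # sort first, so intervals that merely touch still count as overlapping.
    edges = sorted([(lo, -1) for lo, _ in intervals] +
                   [(hi, +1) for _, hi in intervals])
    f = 0
    while 2 * f < n:
        needed = n - f
        low = high = None
        count = 0
        for value, kind in edges:            # sweep upwards
            count -= kind                    # lower bound: +1, upper bound: -1
            if count >= needed:
                low = value
                break
        count = 0
        for value, kind in reversed(edges):  # sweep downwards
            count += kind                    # upper bound: +1, lower bound: -1
            if count >= needed:
                high = value
                break
        if low is not None and high is not None and low <= high:
            return (low, high)
        f += 1                               # assume one more falseticker
    return None


# Hypothetical sources: three good ones near zero, one falseticker ~12 years back.
GOOD_A = (-0.010, 0.010)
GOOD_B = (-0.020, 0.005)
GOOD_C = (-0.005, 0.015)
TWELVE_YEARS = -12 * 365 * 86400.0
BOGUS = (TWELVE_YEARS - 0.010, TWELVE_YEARS + 0.010)

SCENARIOS = {
    "one source, bugged": [BOGUS],
    "two sources, one bugged": [GOOD_A, BOGUS],
    "three sources, one bugged": [GOOD_A, GOOD_B, BOGUS],
    "three sources, one bugged, one dead": [GOOD_A, BOGUS],
    "four sources, one bugged, one dead": [GOOD_A, GOOD_B, BOGUS],
}

if __name__ == "__main__":
    for name, sources in SCENARIOS.items():
        print(f"{name:38} -> {intersection(sources)}")
```

With one source the falseticker simply wins; with two disagreeing sources there is no majority at all; three reachable sources, whether you started with three or with four and lost one, are enough to outvote a single falseticker.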
You may now start to understand why NTP experts insist on having at least four independent sources, and why you should periodically check if your servers’ sources are still alive and well. That said, suit yourself, but you have been warned.
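As for the periodic check, here is one possible Python sketch. It parses the peer table printed by `ntpq -pn` and complains about unreachable sources (a reach register of 0), sources carrying the `x` tally code (designated falsetickers), and fewer than four usable sources. The column layout is assumed from ntpq’s usual output, and the sample table and its addresses (taken from the documentation ranges) are made up; in practice you would feed it the output of `subprocess.run(["ntpq", "-pn"], capture_output=True, text=True).stdout` and hook it into your monitoring.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Peer:
    tally: str   # first column: '*' sys.peer, '+' candidate, 'x' falseticker, ...
    remote: str
    reach: int   # 8-bit reachability register, printed in octal by ntpq


def parse_ntpq(output: str) -> List[Peer]:
    """Parse `ntpq -pn` output, assuming one peer per line and 10 columns."""
    peers = []
    for line in output.splitlines():
        if not line.strip() or line.lstrip().startswith("remote") or line.startswith("="):
            continue  # skip blank lines, the header and the ==== separator
        tally, fields = line[0], line[1:].split()
        if len(fields) < 10:
            continue  # not a peer line in the assumed format
        peers.append(Peer(tally, fields[0], int(fields[6], 8)))
    return peers


def check_sources(peers: List[Peer], minimum: int = 4) -> List[str]:
    """Return a list of human-readable problems (empty if all looks fine)."""
    problems = []
    usable = [p for p in peers if p.reach != 0 and p.tally != "x"]
    if len(usable) < minimum:
        problems.append(f"only {len(usable)} usable sources (want at least {minimum})")
    for p in peers:
        if p.tally == "x":
            problems.append(f"{p.remote} flagged as falseticker")
        elif p.reach == 0:
            problems.append(f"{p.remote} unreachable")
    return problems


# Made-up sample in the usual ntpq layout.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      .GPS.            1 u   33   64  377    0.512   -0.021   0.010
+198.51.100.7    192.0.2.10       2 u   12   64  377    1.204    0.133   0.041
x203.0.113.5     .GPS.            1 u   40   64  377   20.118  -3.8e11   0.050
 192.0.2.99      .INIT.          16 u    -   64    0    0.000    0.000   0.000
"""

if __name__ == "__main__":
    for problem in check_sources(parse_ntpq(SAMPLE)):
        print("WARNING:", problem)
```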