Know your tools: adjtimex

Cosimo was one of the speakers at Velocity Europe 2012 last week. He gave a nice talk, and he even dared to show my face in his presentation (slide 32 if you're curious).

However, what caught my attention was the very next slide, which made it evident that he was blindly installing adjtimex on all his Puppet-managed nodes. I warned him that it could be a bad idea, and he seemed quite surprised. Sadly, I was proved right just a few hours later: they had a handful of virtual machines whose clocks were going crazy in every possible way, and the root cause was tracked down to the installation of the adjtimex package.

But let's take a step back: what is adjtimex, and how can it turn from a useful tool into a wrecker of system clocks?
Let's read some excerpts from the man page:

[adjtimex] gives you raw access to the kernel time variables
. . .
Your computer has two clocks – the "hardware clock" that runs all the time, and the system clock that runs only while the computer is on.
. . .
For a machine connected to the Internet, or equipped with a precision oscillator or radio clock, the best way is to regulate the system clock with ntpd(8). The kernel will automatically update the hardware clock every eleven minutes.
. . .
For a standalone or intermittently connected machine, where it's not possible to run ntpd, you may use adjtimex instead to correct the system clock for systematic drift.

For example, if your system clock, left alone, gains one second per day, you can use adjtimex with the hardware clock as a reference to tweak the clock frequency so that it doesn't run that fast. However, what happens when you install the adjtimex Debian package on a machine that is under some load? Well… most likely, a mess.
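To put numbers on that example: a clock that gains one second per day is off by about 11.57 ppm (1/86400 of a second per second). The sketch below turns a measured drift into `--tick` and `--frequency` values that would compensate for it. It is not how the adjtimex package calibrates. `drift_to_adjtimex` is a hypothetical helper, and it assumes a kernel with USER_HZ=100 where, as I read the adjtimex man page, one unit of `--tick` is about 100 ppm and 65536 units of `--frequency` are about 1 ppm. It also starts from nominal values and ignores whatever the kernel is currently set to.

```shell
#!/bin/sh
# Hypothetical helper: convert a measured drift (seconds gained per day,
# negative if the clock loses time) into adjtimex --tick/--frequency values
# that would compensate for it.
# Assumptions: USER_HZ=100, so nominal tick = 10000 us and +1 tick unit
# speeds the clock up by 100 ppm; +65536 frequency units = +1 ppm.
# Starts from nominal values (tick=10000, frequency=0).
drift_to_adjtimex() {
    awk -v spd="$1" 'BEGIN {
        drift_ppm = spd / 86400 * 1e6   # 1 s/day is about 11.57 ppm
        corr_ppm = -drift_ppm           # slow down a fast clock, speed up a slow one
        # Coarse correction: whole tick units (100 ppm each), rounded to nearest.
        t = corr_ppm / 100
        ticks = (t < 0) ? int(t - 0.5) : int(t + 0.5)
        # Fine correction: the remainder, in 2^-16 ppm frequency units.
        rest = (corr_ppm - ticks * 100) * 65536
        freq = (rest < 0) ? int(rest - 0.5) : int(rest + 0.5)
        printf "tick=%d frequency=%d\n", 10000 + ticks, freq
    }'
}
```

For the one-second-per-day case, `drift_to_adjtimex 1` prints `tick=10000 frequency=-758519`, which you could apply as root with `adjtimex --tick 10000 --frequency -758519`. Splitting the correction into whole tick steps plus a small frequency remainder keeps the frequency value modest even for large drifts.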

When you install the package, the installation scripts watch your clock for 70 seconds, trying to estimate that infamous systematic drift and correct it (calibration). If the system is idle or lightly loaded during the calibration, that works pretty well. If it has some load, the estimate will be wrong and your system clock will start running too fast or too slow. If, on top of that, it's a virtual machine, the mess can get really big: the "virtual CPU" keeps changing its own frequency, which is the last thing you want during a clock calibration. That's what was happening in Cosimo's team on a number of KVM virtual machines: they were correctly configured to follow the host's clock and were not running ntpd, but unfortunately adjtimex kicked in, got the clock frequency wrong, and their clocks went crazy.

You may think that if adjtimex messed up your clock, you can always use NTP to put things back in place. Sadly, that's not always the case. ntpd can correct a frequency error in the clock of up to 500 ppm (ppm = parts per million; 1 ppm = 0.0001%, and 500 ppm is about 43 seconds per day). If adjtimex manages to change the clock's frequency by more than 500 ppm, ntpd will not be able to bring it back to sanity.
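As far as I know, ntpd steers the clock through the kernel's frequency value and leaves `tick` alone. That means a quick way to judge a mangled clock is to convert its tick into ppm. The hypothetical `tick_offset` helper below does exactly that. It assumes USER_HZ=100 (nominal tick 10000, one tick unit ≈ 100 ppm) and a perfect underlying oscillator; real hardware adds its own error on top.

```shell
#!/bin/sh
# Hypothetical helper: report how far a tick value pushes the system clock
# from nominal, and whether ntpd's +/-500 ppm range could still cancel it.
# Assumptions: USER_HZ=100 (nominal tick 10000 us, 1 tick unit = 100 ppm),
# ntpd never changes tick, and the oscillator itself is perfect.
tick_offset() {
    awk -v tick="$1" 'BEGIN {
        ppm = (tick - 10000) * 100      # +1 us on a 10000 us tick = +100 ppm
        spd = ppm * 86400 / 1e6         # seconds gained (+) or lost (-) per day
        ok = (ppm <= 500 && ppm >= -500) ? "ntpd can compensate" : "too far for ntpd"
        printf "%d ppm (%.1f s/day): %s\n", ppm, spd, ok
    }'
}
```

`tick_offset 10005` reports 500 ppm (43.2 s/day), right at the limit, while `tick_offset 10006` is already too far for ntpd. You can read the current tick with `adjtimex --print`.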

The good thing is that, although it is able to make a huge mess, adjtimex can also come to the rescue in many different ways.

One thing it can do, for example, is bring the frequency of the system clock close to that of the hardware clock. The hardware clock is usually nothing special, but I have seen very few cases where its frequency was off by more than 500 ppm. The following command may help, but check the man page to make sure it is appropriate for your system:

hwclock --hctosys ; adjtimex --tick 10000

Now you can run ntpd on your machine to continuously tune the clock, or leave it as it is and rely on the hardware clock if running ntpd is not an option (like, again, for virtual machines).

Another thing you can do on physical machines (I actually did this once, to fix a crazy clock on a colleague's server) is to unload the machine as much as possible, purge the Debian package if it's installed (apt-get remove --purge adjtimex does it, for example), and then install it again. That will trigger a new calibration, which will probably fix the problem well enough that you can run ntpd and let it do the rest.
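The recipe above hinges on the machine being as idle as possible during the 70-second calibration. Here is a minimal sketch of a guard for it. It assumes a Debian-like system with apt-get and /proc/loadavg, and it treats a 1-minute load average below an arbitrary threshold (0.2 here, a hypothetical value you should tune) as idle enough.

```shell
#!/bin/sh
# Sketch: only trigger a new adjtimex calibration when the machine is idle.
# The 0.2 threshold is an arbitrary, hypothetical choice.

# Succeeds if the 1-minute load average (first field of FILE) is below
# THRESHOLD; fails on an empty or missing file.
load_is_low() {
    file=${1:-/proc/loadavg}
    threshold=${2:-0.2}
    awk -v max="$threshold" 'NR == 1 { low = ($1 < max) } END { exit !low }' "$file"
}

# Purge and reinstall the package so its install script calibrates again.
# Must run as root; refuses to do anything if the load is too high.
recalibrate_adjtimex() {
    if ! load_is_low /proc/loadavg 0.2; then
        echo "load too high, not calibrating now" >&2
        return 1
    fi
    apt-get remove --purge -y adjtimex && apt-get install -y adjtimex
}
```

As Remi Bergsma points out in the comments, running `adjtimexconfig` also triggers a new calibration, so it could replace the purge-and-reinstall line.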

As you can see, adjtimex is not that bad: all in all, it gives you direct access to the kernel's time-related variables, and it was of invaluable help when we tried to work around the mess caused by the leap second. If you want to have it handy without running into the calibration mess, you can always take the binary alone and put it in a convenient location, like /usr/local/sbin. It has virtually no dependencies (the only real one is the libc6 package, which you probably already have on all systems ;), it doesn't conflict with any existing package, and you won't have to sweat over installing it from the package in an emergency.
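One way to get the binary alone is to unpack the .deb with `dpkg-deb -x`. That command extracts the files without running the package's maintainer scripts, so no calibration takes place. The sketch below assumes `dpkg-deb` is available and that the .deb was fetched beforehand (for example with `apt-get download adjtimex`). `extract_adjtimex` is a hypothetical helper name. Because the binary's location inside the package (/sbin or /usr/sbin) may vary between releases, the helper searches for it instead of hard-coding a path.

```shell
#!/bin/sh
# Sketch: copy the adjtimex binary out of a downloaded .deb without running
# the package's install scripts (so no calibration happens).
# Usage: extract_adjtimex PACKAGE.deb [DEST_DIR]   (DEST_DIR defaults to /usr/local/sbin)
extract_adjtimex() {
    deb=$1
    dest=${2:-/usr/local/sbin}
    if [ ! -f "$deb" ]; then
        echo "package file not found: $deb" >&2
        return 1
    fi
    tmp=$(mktemp -d) || return 1
    # dpkg-deb -x only unpacks the data archive; postinst is never executed.
    if ! dpkg-deb -x "$deb" "$tmp"; then
        rm -rf "$tmp"
        return 1
    fi
    # Don't assume /sbin vs /usr/sbin: look for a file named exactly "adjtimex".
    bin=$(find "$tmp" -type f -name adjtimex | head -n 1)
    if [ -z "$bin" ]; then
        echo "no adjtimex binary inside $deb" >&2
        rm -rf "$tmp"
        return 1
    fi
    install -m 0755 "$bin" "$dest/adjtimex"
    rc=$?
    rm -rf "$tmp"
    return $rc
}
```

Run it as root from the directory where you downloaded the package, e.g. `extract_adjtimex adjtimex_*.deb`.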


2 thoughts on “Know your tools: adjtimex”

  1. Remi Bergsma writes: Thanks for your post! Just for info: no need to purge and reinstall adjtimex to trigger a new calibration. You can also run 'adjtimexconfig'.
