I’m starting to write a detailed description of how we’ll tackle the leap second. In the meantime, you may want to read this beautiful and informative piece by Miroslav Lichvar. His observations about the behaviour of ntpd match mine and will give you an idea of how our solution works.
Leap second update

The leap-lab at Opera (2015)
I’ve been quite busy with my experiments to prevent possible side effects of the leap second. As those who follow me on Twitter know, I am quite close to finalising a recommendation: if all goes well, it should come by the end of the week. Stay tuned!
Scary times at the leap second lab
After a month spent on other high-priority tasks, it was about time to get back to the leap second lab. The fateful day is coming and we need to have a strategy in place.
I spent this week running tests, tuning the scripts that support them, and improving the CFEngine policies that manage the lab today and will implement our strategy tomorrow. I also structured my tests a bit better, to ensure that the “false start” I had a month ago doesn’t happen again.
On Friday I finally got to run some crucial tests, and the results of one of them were scary, to say the least.
Wheezy and the leap second: a mystery?
Yesterday I reported on Debian Wheezy stepping back one second, despite the settings in ntpd prohibiting step changes and the leap second not being armed in the kernel. The clock in Debian Jessie didn’t step.
At first, I thought it depended on different ntpd versions shipped in the two distributions, but it turned out to be the same version. That suggested a new experiment: run two tests in parallel on Wheezy, one with ntpd running and the other without, to see whether the one without ntpd would still step back.
To my great surprise, no step happened in either.
This suggests that something odd happened in yesterday’s experiment and that I should repeat it, watching the configuration and setup more closely. As always, I’ll keep you posted. Until then, take care.
The leap marathon has started
No, I’m not going to run 42 kilometres jumping 🙂 I started my leap second tests today. The goal of the tests is to find a configuration, a procedure, or both, that avoids a backward step of the clock when the leap second is inserted at the end of June 30th (UTC).
I ran the first test today. In the ntpd configuration I set two directives: tinker step 0 and disable kernel. The first directive disables step adjustments; the second disables the kernel discipline, so that ntpd manages the clock all by itself instead of “asking” the kernel to correct the clock frequency. The second directive is not strictly necessary, as the first one should already disable the kernel discipline automatically; it’s there just for good measure.
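As an illustration only, here is a minimal Python sketch that makes sure an ntp.conf text carries the two directives above, prepending whatever is missing. The helper name ensure_directives and the choice of adding the lines at the top are assumptions for this example; in the lab the configuration is managed with CFEngine, not with this script.

```python
# Hypothetical helper: ensure the two ntpd directives discussed above are present.
REQUIRED_DIRECTIVES = ("tinker step 0", "disable kernel")


def ensure_directives(conf_text: str) -> str:
    """Return conf_text with any missing required directive prepended.

    Lines are compared after collapsing whitespace, so "tinker  step 0" counts
    as present, while a commented-out "# tinker step 0" does not.
    """
    present = {" ".join(line.split()) for line in conf_text.splitlines()}
    missing = [d for d in REQUIRED_DIRECTIVES if d not in present]
    if not missing:
        return conf_text  # nothing to do: running it twice changes nothing
    return "\n".join(missing) + "\n" + conf_text


if __name__ == "__main__":
    original = "driftfile /var/lib/ntp/ntp.drift\nserver 0.debian.pool.ntp.org iburst\n"
    print(ensure_directives(original), end="")
```

Because the function is idempotent, it can be run on every configuration pass without piling up duplicate lines.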
So I installed the leap seconds file, installed the new configuration for ntpd, reset the clock to June 30th, 2015 and started the test. For the whole duration of the test the leap second was never armed in the kernel. Everything went as planned in Debian Jessie:
2015/06/30 23:59:59.998489934
2015/06/30 23:59:59.999009849
2015/06/30 23:59:59.999536585
2015/07/01 00:00:00.000063781
2015/07/01 00:00:00.000589560
2015/07/01 00:00:00.001109634
but not so in Wheezy:
2015/06/30 23:59:59.998049126
2015/06/30 23:59:59.998788657
2015/06/30 23:59:59.999572132
2015/07/01 00:00:00.000316483
2015/07/01 00:00:00.001051262
2015/07/01 00:00:00.001792934
2015/07/01 00:00:00.002529339
2015/06/30 23:59:59.004499757
2015/06/30 23:59:59.005266331
2015/06/30 23:59:59.006014975
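In a long log a backward step is easy to miss. The following is a minimal Python sketch, with hypothetical helper names, that parses timestamps like the ones above with nanosecond precision (assuming they are UTC) and reports every sample that is earlier than its predecessor.

```python
from datetime import datetime, timezone

# Samples copied from the Wheezy run above.
WHEEZY = [
    "2015/06/30 23:59:59.998049126",
    "2015/06/30 23:59:59.998788657",
    "2015/06/30 23:59:59.999572132",
    "2015/07/01 00:00:00.000316483",
    "2015/07/01 00:00:00.001051262",
    "2015/07/01 00:00:00.001792934",
    "2015/07/01 00:00:00.002529339",
    "2015/06/30 23:59:59.004499757",
    "2015/06/30 23:59:59.005266331",
    "2015/06/30 23:59:59.006014975",
]


def to_ns(stamp: str) -> int:
    """Convert 'YYYY/MM/DD hh:mm:ss.fffffffff' (UTC) to nanoseconds since the epoch."""
    whole, frac = stamp.split(".")
    dt = datetime.strptime(whole, "%Y/%m/%d %H:%M:%S").replace(tzinfo=timezone.utc)
    # datetime only keeps microseconds, so the fraction is handled separately
    return int(dt.timestamp()) * 1_000_000_000 + int(frac.ljust(9, "0")[:9])


def backward_steps(stamps):
    """Return (index, previous, current, delta_ns) for each sample earlier than its predecessor."""
    values = [to_ns(s) for s in stamps]
    return [
        (i, stamps[i - 1], stamps[i], values[i] - values[i - 1])
        for i in range(1, len(values))
        if values[i] < values[i - 1]
    ]


if __name__ == "__main__":
    for i, prev, cur, delta in backward_steps(WHEEZY):
        print(f"sample {i}: {prev} -> {cur} ({delta / 1e9:+.9f} s)")
```

On the Wheezy samples it reports a single step, at index 7 (the eighth sample), of roughly -0.998 s; on the Jessie samples it reports nothing.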
That means that for Wheezy we have two possible “branches”:
- ntpd requested the step back;
- the kernel requested the step back.
The second case is, of course, unlikely, as the kernel didn’t know about a leap second. Therefore, the branch to follow first is to run Jessie’s ntpd on Wheezy and see whether the results match. I’ll keep you posted. Take care.
Update: apparently Jessie and Wheezy sport the same version of ntpd. Oh well…
Bug or feature? Dereferencing of arrays and namespaces
Now that the upgrade from 3.4 to 3.6 is advancing slowly but steadily, I am starting to check the features that are new in 3.6 compared to 3.4. According to the docs, namespaces were actually introduced in 3.4.0, but I haven’t taken advantage of them yet, and it’s time to start.
When something is declared in a namespace (a bundle, a variable or whatnot) it must be referred to with its namespace. For example, if you declare a bundle test in the namespace nstest, you’ll refer to that bundle from outside the namespace (e.g. in the bundlesequence) as nstest:test. If you declare a variable, for example an array called conf in that bundle, that will be nstest:test.conf outside the namespace. So far so good.
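The naming rule can be summarised with a tiny, purely illustrative Python helper that builds the fully qualified names used outside the namespace. The function qualify is hypothetical; it only encodes the two examples above and says nothing about how references behave inside the namespace.

```python
from typing import Optional


def qualify(namespace: str, bundle: str, variable: Optional[str] = None) -> str:
    """Name to use outside `namespace` for a bundle, or for a variable in that bundle."""
    name = f"{namespace}:{bundle}"  # e.g. nstest:test, as in the bundlesequence
    if variable is not None:
        name += f".{variable}"      # e.g. nstest:test.conf
    return name


if __name__ == "__main__":
    print(qualify("nstest", "test"))          # nstest:test
    print(qualify("nstest", "test", "conf"))  # nstest:test.conf
```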
Now, what happens inside the namespace? Well, I found one fact that is indeed surprising.
Report from Config Management Camp 2015
The Config Management Camp 2015 is over, leaving behind a trail of inspiring presentations, interesting discussions, pleasant meetings with great people and, hopefully, satisfaction at how each of us played his/her part in making this edition a success.
A big thank you to the brave people who attended my seminar, and to those who asked questions. The questions gave me a couple of ideas to further expand the seminar, and more may come if you’re so kind as to let me know your opinion of the talk: what you liked, what you didn’t, what could be improved in both the talk and the speaker’s style. Thanks in advance. As for the code of the tools, I promise to publish it on GitHub as soon as I get “clearance” (this week, possibly!).
For those who weren’t at the seminar, I presented how we evolved our git repository structure to support more than one project, each with its own needs, while still sharing the relevant common libraries and tools and keeping the deployment of the policies easy, manageable and maintainable, whatever the number of hubs and projects involved. The questions dug into how we manage access to the hubs so that a person working on project A can’t accidentally deploy his policies on the hubs supporting project B, how we manage access rights to files in separate projects and to branches, and how easy or hard it is to extend the deployment tool with new functionality.
The slides of the presentation are on SpeakerDeck (or further down the post, if you can’t be bothered to go to SpeakerDeck 😉). The good guys at Normation also filmed the seminar, so it’s just a matter of time before a video of it becomes available. Then you’ll also be able to hear my appeal to support cancer research, on which you can also read another blog post of mine.
Regarding the talks I attended and the “hallway track”, Jez Humble’s keynote was definitely a mind-blowing experience. Leaving aside the things that I am doing wrong, that we are doing wrong in my work environment, and that a broad set of people in our profession is f***ing up completely, I understood that there is a category that definitely needs to be more present at events like this: bosses. We can do a good job as professionals, follow the best practices, use the brightest and shiniest tools of today and some of the tools of tomorrow, but that’s not enough to establish a culture of cross-area collaboration. That won’t happen without the direct involvement of the bosses and their mandate.
Safer package installations with APT and CFEngine
Package installation can be tricky sometimes when using configuration management tools, as the order in which package operations are performed can affect the final result, sometimes disastrously. Months ago I was looking for a way to make apt-get a bit less proactive when resolving dependencies and removing packages, and came up with the following package method, which we now have in our library.
To use it, just add it to your policies and use apt_get_safe instead of apt_get wherever you want a more prudent approach to package installations.
I’m putting it here in the hope that it may be useful to everyone. I have used it successfully on Debian 5, 6, 7 and 8, with CFEngine 3.4.4 (where I borrowed parts of the masterfiles from 3.5 and 3.6, e.g. the debian_knowledge bundle) and 3.6.x. Enjoy!
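Independently of CFEngine, a similar degree of prudence can be sketched in Python. This is a different technique from the apt_get_safe package method, not a translation of it: simulate the installation with apt-get -s, refuse to proceed if apt would remove anything, and install with --no-remove as a second safeguard. The function names are hypothetical; -s and --no-remove are standard apt-get options, and the parser relies on the "Remv" lines that apt-get prints in simulation mode.

```python
import subprocess


def planned_removals(simulation_output: str) -> list:
    """Names of packages that 'apt-get -s' reports it would remove ('Remv' lines)."""
    return [
        line.split()[1]
        for line in simulation_output.splitlines()
        if line.startswith("Remv ")
    ]


def safe_install(packages, run=subprocess.run):
    """Install packages only if apt-get would not remove anything (hypothetical helper)."""
    simulation = run(
        ["apt-get", "-s", "install", *packages],
        capture_output=True, text=True, check=True,
    )
    removals = planned_removals(simulation.stdout)
    if removals:
        raise RuntimeError(f"refusing to install {packages}: would remove {removals}")
    # --no-remove makes apt-get itself abort if removals show up after all
    run(["apt-get", "-y", "--no-remove", "install", *packages], check=True)
```

The simulation step mainly gives a readable error message; --no-remove is the actual safety net. The run parameter is there so the function can be exercised with a fake runner.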
A small leap for a clock, a giant leap for mankind
It’s going to happen again: IERS’ Bulletin C was published yesterday, announcing that we’ll have a leap second at the end of June this year. It’s time to get ready, but this time we are luckier than in 2012: leapocalypse’s scars were deep and still hurt, and only inexperienced sysadmins or experienced idiots will deny the need to prepare a strategy to mitigate the unavoidable bugs that will be triggered by the leap second insertion. Or even by the leap second announcement, as happened in 2012.
Which bugs? Nobody knows. We have six months left, during which you could review the source code of your OS and software. Or maybe not, but you can at least test what happens on your systems when the leap second is announced and when it is inserted.
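A first check of what a system knows about the upcoming leap second is to look at the leap seconds file that ntpd can read (via its leapfile directive). Below is a minimal Python sketch, assuming the usual leap-seconds.list layout: data lines with an NTP timestamp (seconds since 1900) and the new TAI-UTC offset, and comment or metadata lines starting with "#". The sample holds the real 2012 and 2015 entries; the function names are hypothetical.

```python
from datetime import datetime, timezone

NTP_UNIX_OFFSET = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01

# Two data lines in leap-seconds.list format: NTP seconds, TAI-UTC, comment.
SAMPLE = """\
# lines starting with '#' (including '#@' and '#$') are metadata or comments
3550089600\t35\t# 1 Jul 2012
3644697600\t36\t# 1 Jul 2015
"""


def parse_leap_entries(text: str):
    """Return [(ntp_seconds, tai_minus_utc), ...] from leap-seconds.list content."""
    entries = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].split()  # drop comments and metadata
        if len(data) >= 2:
            entries.append((int(data[0]), int(data[1])))
    return entries


def ntp_to_utc(ntp_seconds: int) -> datetime:
    """Convert an NTP timestamp (era 0) to an aware UTC datetime."""
    return datetime.fromtimestamp(ntp_seconds - NTP_UNIX_OFFSET, tz=timezone.utc)


def next_leap(entries, now: datetime):
    """First entry whose new offset takes effect after `now`, or None."""
    now_ntp = int(now.timestamp()) + NTP_UNIX_OFFSET
    future = [e for e in entries if e[0] > now_ntp]
    return min(future) if future else None


if __name__ == "__main__":
    entries = parse_leap_entries(SAMPLE)
    ntp_seconds, offset = next_leap(entries, datetime(2015, 1, 6, tzinfo=timezone.utc))
    # The leap second (23:59:60) is inserted just before this instant.
    print(ntp_to_utc(ntp_seconds), "TAI-UTC becomes", offset)
```

Each entry marks the instant the new offset takes effect, so the 2015 entry converts to 2015-07-01 00:00:00 UTC, right after the inserted second.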
Like in 2012, I’ll soon start testing the behavior of both ntpd and Linux. The test framework will be the same as last time, but with the added wisdom gained fighting against leapocalypse. This is what I plan to add to the picture:
- more than one NTP test server: either three or four, to measure the convergence speed more accurately;
- test under heavy load;
- test on at least some of the software that we run;
- test with the kernel discipline disabled, to work around kernel bugs;
- test on both stock ntpd and the recently released 4.2.8;
- test on Debian Jessie, since it will surely be released before the leap second takes place;
- prepare CFEngine policies to handle the leap second and test those as well.
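One building block of such a test framework is a sampler that reads the system clock at short intervals, prints each reading, and flags backward steps. A minimal Python sketch follows; the function names, the default interval and the default duration are assumptions, not our actual framework, and the output is UTC in the same format as the logs in these posts.

```python
import time
from datetime import datetime, timezone


def format_ns(ns: int) -> str:
    """Format nanoseconds since the epoch as 'YYYY/MM/DD hh:mm:ss.fffffffff' in UTC."""
    seconds, fraction = divmod(ns, 1_000_000_000)
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y/%m/%d %H:%M:%S")
    return f"{stamp}.{fraction:09d}"


def sample_clock(duration_s: float = 5.0, interval_s: float = 0.0005, out=print) -> int:
    """Print wall-clock samples for duration_s seconds; return the number of backward steps."""
    steps = 0
    # The run length is measured on the monotonic clock, so a step of the
    # wall clock cannot make the loop stop early or run longer than planned.
    deadline = time.monotonic() + duration_s
    previous = time.time_ns()
    while time.monotonic() < deadline:
        time.sleep(interval_s)
        now = time.time_ns()
        if now < previous:
            steps += 1
            out(f"{format_ns(now)}  <-- clock stepped back")
        else:
            out(format_ns(now))
        previous = now
    return steps


if __name__ == "__main__":
    print("backward steps:", sample_clock())
```

For a leap second test it would be started a little before midnight UTC with a larger duration_s, so that the insertion falls inside the sampling window.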
Like last time, I’ll share my results. If you do similar tests in your environment, I’ll be more than glad to see your findings. This is going to affect us all: the more information we share, the better our chances of overcoming the bugs.
Image from http://writing.wikinut.com/img/3bguk3_7i7al8s9b/If-We-Could-Turn-Back-Time
How we shaved the poodle
In this post I’ll describe how we used CFEngine to apply fixes to Apache and nginx to defuse the infamous POODLE bug. The post is a bit rushed, in the hope that it may still be useful to someone. The policies use bundles and bodies from either the standard library or our own. The libraries are not shown here, but the names speak for themselves… hopefully 🙂
As you probably know, the “trick” on the server side is not to allow secure (erm…) connections to use anything older than TLSv1. In order to do that, we decided to:
- deploy a conf.d snippet to set the appropriate protocol versions as a default;
- disable the same directive in existing configuration files, so that weaker settings don’t take priority;
- restart the server if/when the configuration gets fixed.
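Steps 2 and 3 can be illustrated with a small Python sketch that comments out active protocol directives in the text of an existing configuration file and reports whether anything changed, which is the signal for a restart. This is not our CFEngine policy: the function and the snippet contents are assumptions, although SSLProtocol (Apache mod_ssl) and ssl_protocols (nginx) are the real directive names.

```python
import re

# Hypothetical conf.d snippets for step 1: TLSv1 and newer only.
SNIPPETS = {
    "apache": "SSLProtocol all -SSLv2 -SSLv3\n",
    "nginx": "ssl_protocols TLSv1 TLSv1.1 TLSv1.2;\n",
}

# Active (not already commented) protocol directives, one per line.
DIRECTIVES = {
    "apache": re.compile(r"^([ \t]*)(SSLProtocol\b.*)$", re.MULTILINE | re.IGNORECASE),
    "nginx": re.compile(r"^([ \t]*)(ssl_protocols\b.*)$", re.MULTILINE),
}


def disable_protocol_directives(conf_text: str, server: str):
    """Comment out protocol directives in an existing file (step 2).

    Returns (new_text, changed); changed=True means the server needs a
    restart (step 3). Not meant to be run on the snippet file itself.
    """
    new_text, count = DIRECTIVES[server].subn(r"\1# \2", conf_text)
    return new_text, count > 0
```

Since already commented lines no longer match, a second pass returns the text unchanged with changed=False, so the restart happens only when a file is actually fixed.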
