systemd unit files for CFEngine

Learning more about systemd has been on my agenda since the release of Debian 8 “Jessie”. With the new year I decided that I had procrastinated enough: I made a plan and started studying accordingly. Today it was time for action: to verify my understanding of the documentation I have read so far, I decided to put together unit files for CFEngine. It was an almost complete success and the result is now on GitHub for everyone to enjoy. I would appreciate it if you’d give them a shot and report back.

Main goals achieved:

  1. I successfully created three service unit files, one for each of CFEngine’s daemons: cf-serverd, cf-execd and cf-monitord; the units are designed so that if any of the daemons is killed for any reason, systemd will bring it back immediately.
  2. I successfully created a target unit file that puts together the three service units. When the cfengine3 target is started, the three daemons are requested to start; when the cfengine3 target is stopped, the three daemons are stopped. The cfengine3 target completely replaces the init script functionality.

Goal not achieved: I gave socket activation a shot, so that the start of cf-serverd would be delayed until a connection was initiated to port 5308/TCP. That didn’t work properly: systemd tried to start cf-serverd but it died immediately, and systemd tried again and again until it gave up. I’ll have to investigate whether cf-serverd needs to support socket activation explicitly or whether I was doing something wrong. The socket unit is not part of the distribution on GitHub, but its contents are reported below. If you spot any problem, please let me know.
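For context on what “supporting socket activation explicitly” means: with a socket unit in its default mode, systemd binds the port itself and hands the already-listening socket to the service as an inherited file descriptor, announced through the LISTEN_PID and LISTEN_FDS environment variables. A daemon that ignores them and tries to bind the same port again will typically fail because the address is already in use. Whether that is what happened to cf-serverd is not established here. The following is only a minimal Python sketch of the receiving side of that protocol; the function name is made up for illustration and has nothing to do with CFEngine’s code.

```python
import os

# First descriptor passed by systemd, as documented in sd_listen_fds(3).
SD_LISTEN_FDS_START = 3


def inherited_listen_fds(environ=None, pid=None):
    """Return the file descriptors systemd passed to this process, or []."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # the variables were meant for another process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed: not socket-activated
    if count < 0:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

A daemon written with this in mind would call the function at startup and, if the list is not empty, accept connections on the inherited descriptor instead of creating and binding its own socket.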


The leap second aftermath

The leap second is finally behind us, and for the first time it has been turned into an event. That had the unfortunate consequence that many channels where useful information had flowed during previous leap seconds were now flooded with bullshit. But it’s over. A giant army of idiots has finally stopped asking “what will you do with your extra second?”, and a smaller but still noticeable army of inaccurate writers and journalists won’t write for a while that the atomic clocks need to be stopped for a second to realign with the Earth (?!?!?!?!?!?). We can now sit, look back and save some take-aways for the next edition of the event.


How to make CFEngine recognize if systemd is used in Debian

CFEngine 3.6 tries to determine whether a Linux system is using systemd as its init system by looking at the contents of /proc/1/cmdline; that happens in bundle common inventory_linux. That’s indeed a smart thing to do, but unfortunately it fails on Debian Jessie, where you have:

root@cf-test-v10:~# ls -l /sbin/init
lrwxrwxrwx 1 root root 20 May 26 06:07 /sbin/init -> /lib/systemd/systemd

the pseudo-file in /proc will still report /sbin/init and as a result the systemd class won’t be set. This affects services promises negatively and therefore I needed to make our policies try to outsmart the inventory 😉 These promises, added in a bundle of ours, did the trick:

bundle common debian_info {
  vars:
    init_is_link::
      "init_link_destination"
          string => filestat("/sbin/init","linktarget") ;

  classes:
    init_is_link::
      "systemd"
          expression => regcmp("/lib/systemd/systemd",
                               "$(init_link_destination)"),
          comment => "Check if /sbin/init links to systemd" ;

    debian::
      "init_is_link"
          expression => islink("/sbin/init"),
          comment => "Detect if init is a link" ;
}

Note that our bundle is actually bigger: I cut out all the promises that were not relevant for this post. Enjoy!
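If you want to check the same condition outside CFEngine, for example while debugging why the systemd class is not set, the logic of the promises above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name is hypothetical, and unlike the policy, which compares the link target string, it compares fully resolved paths.

```python
import os


def init_links_to_systemd(init_path="/sbin/init",
                          systemd_path="/lib/systemd/systemd"):
    """Return True if init_path is a symlink that resolves to systemd_path."""
    if not os.path.islink(init_path):
        return False  # a regular file: init is not a link at all
    # Resolve both sides so that intermediate symlinks don't matter.
    return os.path.realpath(init_path) == os.path.realpath(systemd_path)


if __name__ == "__main__":
    print("systemd" if init_links_to_systemd() else "not detected")
```

The paths are parameters only to make the function easy to try against a scratch directory; the defaults are the ones shown in the ls output above.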

A humble attempt to work around the leap second, 2015 edition

Update: Watch out for public servers not announcing the leap second! In the last few minutes we have been observing a number of public servers (even stratum 1) that don’t announce the leap second. If the majority of your upstream servers don’t announce the leap second, your clients won’t trigger it. If that’s your case, you can use ntpd’s leapfile directive and a leap second file to provide your own servers with the correct information. Check the ntpd documentation for more information.

Update: Miroslav Lichvar has counted the public servers that are announcing the leap second on a per-country basis. You can find his stats on pastebin.


I have been running simulations for the upcoming leap second for a few weeks now. While some mysteries haven’t been solved yet, I was finally able to put together a configuration for our servers and clients that satisfies the following requirements (where do these requirements come from? That is explained further down in the article):

  1. it works on Debian Linux Squeeze, Wheezy and Jessie
  2. it keeps the Linux kernel out of the game, in order to avoid triggering unknown kernel bugs
  3. it avoids backward steps of the clock
  4. the clock converges to the right time within an acceptable number of hours
  5. it doesn’t hog public services

What this solution doesn’t provide: this is neither Google’s leap smear nor Amazon’s, since you use standard ntpd code with no changes; nor is it a fast clock slew like chrony’s. Servers and clients evolved predictably during most of the simulations and shouldn’t diverge too much from each other, but there are conditions where you may observe offsets between them on the order of 0.1 s. That should still be bearable, though, and will still save you from the headache of kernel bugs or jumps back in time. In order to work properly, this solution makes a few assumptions:

  1. you have at least four internal NTP servers, synchronized with at least four public servers and/or internal specialized time sources
  2. your clients use at least four of your own internal NTP servers and no external NTP server
  3. you use unicast NTP packets (broadcast and multicast will probably work as well or even better, but they haven’t been tested in my simulations)
  4. you are using ntpd (the reference implementation) version 4.2.8p3 (earlier versions have a bug that will make our countermeasures against clock stepping ineffective)

Let’s look at the implementation on the server and client sides, which are pretty similar but have a few important differences.

Scary times at the leap second lab

The leap-lab at Opera (2015)


After one month spent on other high-priority tasks, it was about time to get back to the leap second lab. The fated day is coming and we need to have a strategy in place.

I spent this week running tests, tuning the scripts that support them, and improving the CFEngine policies that manage the lab today and will implement our strategy tomorrow. Besides, I structured my tests a bit better to ensure that the “false start” I had one month ago doesn’t happen again.

On Friday I finally got to run some crucial tests and the results of one of them were scary to say the least.


Wheezy and the leap second: a mystery?

Yesterday I reported about Debian Wheezy stepping back one second, despite the settings in ntpd prohibiting step changes and the leap second not being armed in the kernel. The clock in Debian Jessie didn’t step.

At first, I thought it depended on different ntpd versions being shipped in the two distributions, but the version turned out to be the same. That suggested a new experiment: run two tests in parallel on wheezy, one with ntpd running and the other without, to see if the one without ntpd would still step back.

To my great surprise, no step happened in either.

This suggests that there must have been something odd in yesterday’s experiment and that I should repeat it while watching the configuration and setup more closely. As always, I’ll keep you posted. Until then, take care.

The leap marathon has started

No, I’m not going to run 42 kilometres jumping 🙂 I’ve started my leap second tests today. The goal of the tests is to find a configuration, or a procedure, or both, to avoid a backwards step of the clock at the insertion of the leap second at the end of June 30th (UTC).

I ran the first test today. In the ntpd configuration I set two directives: tinker step 0 and disable kernel. The first directive disables step adjustments; the second disables the kernel discipline, so that ntpd manages the clock all by itself instead of “asking” the kernel to make corrections to the clock speed. The second directive is not really necessary, as the first should be enough to disable the kernel discipline automatically, so it’s there just for good measure.

So I installed the leap seconds file, installed the new configuration for ntpd, reset the clock to June 30th, 2015 and started the test. For the whole duration of the test the leap second was never armed in the kernel. Everything went as planned in Debian Jessie:

2015/06/30 23:59:59.998489934
2015/06/30 23:59:59.999009849
2015/06/30 23:59:59.999536585
2015/07/01 00:00:00.000063781
2015/07/01 00:00:00.000589560
2015/07/01 00:00:00.001109634

but not so in Wheezy:

2015/06/30 23:59:59.998049126
2015/06/30 23:59:59.998788657
2015/06/30 23:59:59.999572132
2015/07/01 00:00:00.000316483
2015/07/01 00:00:00.001051262
2015/07/01 00:00:00.001792934
2015/07/01 00:00:00.002529339
2015/06/30 23:59:59.004499757
2015/06/30 23:59:59.005266331
2015/06/30 23:59:59.006014975

That means that for wheezy we have two possible “branches”:

  • ntpd requested the step back;
  • the kernel requested the step back.

The second case is, of course, unlikely, as the kernel didn’t know about a leap second. Therefore, the branch to follow first is to run the same ntpd on wheezy as on jessie and see whether the results match. I’ll keep you posted. Take care.
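Spotting the step by eye is easy in a ten-line excerpt, less so in a full log. Below is a minimal Python sketch that scans timestamps in the format shown above and reports every sample that goes back in time. The function names are mine and this is not the script used in the tests; it assumes one UTC timestamp per sample, with the fractional part in nanoseconds.

```python
import calendar
import time


def to_nanoseconds(stamp):
    """Convert 'YYYY/MM/DD HH:MM:SS.nnnnnnnnn' (UTC) to integer nanoseconds."""
    whole, _, frac = stamp.strip().partition(".")
    seconds = calendar.timegm(time.strptime(whole, "%Y/%m/%d %H:%M:%S"))
    # Pad or trim the fraction to exactly nine digits (nanoseconds).
    return seconds * 1_000_000_000 + int(frac.ljust(9, "0")[:9])


def backward_steps(stamps):
    """Return (previous, current, delta_ns) for each sample that goes back."""
    steps = []
    for previous, current in zip(stamps, stamps[1:]):
        delta = to_nanoseconds(current) - to_nanoseconds(previous)
        if delta < 0:
            steps.append((previous, current, delta))
    return steps


# Four consecutive samples from the Wheezy output above.
wheezy = [
    "2015/07/01 00:00:00.001792934",
    "2015/07/01 00:00:00.002529339",
    "2015/06/30 23:59:59.004499757",
    "2015/06/30 23:59:59.005266331",
]

for previous, current, delta in backward_steps(wheezy):
    print(f"{previous} -> {current}: {delta / 1e9:.9f} s")
```

On the Wheezy samples this reports a single backward step of just under one second; on the Jessie samples it reports nothing. Integer nanoseconds are used on purpose, so that no precision is lost to floating point when comparing consecutive samples.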

Update: apparently jessie and wheezy sport the same version of ntpd. Oh well…

Safer package installations with APT and CFEngine

Package installation can be tricky sometimes when using configuration management tools, as the order in which package operations are performed can have an impact on the final result, sometimes a disastrous one. Months ago I had been looking for a way to make apt-get a bit less proactive when resolving dependencies and removing packages, and I came up with the following package method that we now have in our library.

To use it, just add it to your policies and use apt_get_safe instead of apt_get wherever you want a more prudent approach to package installations.

I’m putting it here in the hope that it may be useful for everyone. I have used it successfully on Debian 5, 6, 7 and 8, and on CFEngine 3.4.4 (where I borrowed parts of the masterfiles from 3.5 and 3.6, such as the debian_knowledge bundle) and 3.6.x. Enjoy!


A small leap for a clock, a giant leap for mankind

It’s going to happen again: IERS’ Bulletin C was published yesterday, announcing that we’ll have a leap second at the end of June this year. It’s time to get ready, but this time we are luckier than in 2012: leapocalypse’s scars were deep and still hurt, and only inexperienced sysadmins or experienced idiots will deny the need to prepare a strategy to mitigate the unavoidable bugs that will be triggered by the leap second insertion. Or even by the leap second announcement, as happened in 2012.

Which bugs? Nobody knows. We have six months left during which you could review the source code of your OS and software. Or maybe not, but you can at least test what happens on your systems when the leap second is announced and when it is inserted.

Like in 2012, I’ll soon start testing the behavior of both ntpd and Linux. The test framework will be the same as last time, but with the added wisdom gained fighting against leapocalypse. This is what I plan to add to the picture:

  • more than one NTP test server: either three or four, to measure the convergence speed more accurately;
  • test under heavy load;
  • test on at least some of the software that we run;
  • test with the kernel discipline disabled to compensate for kernel bugs;
  • test on both stock ntpd and the recently released 4.2.8;
  • test on Debian Jessie, since it will surely be released before the leap second takes place;
  • prepare CFEngine policies to handle the leap second and test those as well.

Like last time, I’ll share my results. If you run similar tests in your environment, I’ll be more than glad to see your findings. This is going to affect us all: the more information we share, the better our chances of overcoming the bugs.

Image from http://writing.wikinut.com/img/3bguk3_7i7al8s9b/If-We-Could-Turn-Back-Time

On systemd

UNIX init systems are not a topic people usually discuss a lot. There is some buzz when a new one is out, some more buzz when it is adopted in shops other than the one where it was born, then most OSes stick with their old solution (usually the System V init system, or sysvinit) and everything falls back to radio silence. Other times, I assume, things go straight from some buzz to the radio silence phase. I’ve been into the Upstart buzz, before that I’ve been into the Solaris SMF buzz and even played with it until our friend OpenSolaris was mercilessly killed by its new father. But, honestly, the heated arguments about systemd took me by surprise.

I really don’t know how I had not heard about systemd before. Maybe I was just looking in another direction, or maybe the fact that it was so controversial led systemd’s detractors not to talk about it, in the hope that it would be yet another of those attempts that go straight from birth to radio silence. I don’t know. Anyway, it didn’t go that way. To me, it’s like systemd flew steadily under the radar and kept growing until the Debian project decided to adopt it as their init system. My brain just filtered the news as yet another thing I could safely ignore for the moment and would have to learn about when the time came. And then the sky fell down.

What happened after that announcement was something like the outbreak of a religious war, with people debating harshly, insulting each other, death threats spewed here and there, and people resigning from their roles in important organizations. Then came the Devuan fork of Debian. For now.

A bit too much for “just another init system”, right?

What follows is the outcome of my Christmastime readings about systemd and my own considerations about what I’ve read. I hope it will help you form your own opinion on whether systemd is a good thing or a bad thing. As an extra, you can read my own opinion (for what it’s worth).
