CFEngine drives you crazy

CFEngine meetup - Oslo, June 18th, 2013

Today I attended the CFEngine meetup in Oslo, at the headquarters of CFEngine AS. It was great to meet all the people that I had met only in writing (on mailing lists, Twitter, LinkedIn and the like). I am bad at remembering names, so rather than making myself miserable by getting them wrong or forgetting anyone, I’ll just say: thanks all for a very nice evening.

During this meetup a few things happened that suggested that CFEngine may drive you crazy, and if you don’t believe it…

Classes don’t blink

This post is about a CFEngine feature that is a bit counter-intuitive at first, and may leave you scratching your head for a while if you don’t know about it. And, seemingly, many don’t. I’ll talk about how CFEngine classes change their state, but especially about how they don’t change their state.

CFEngine to the masses

With the aim of suggesting a way to bring CFEngine to the wider audience it deserves, I am launching a project called Dumbed-down CFEngine (or DDCFE for short, or DDC for shorter). It is a rough draft specification of a simple configuration management language, to be compiled into native CFEngine policies. There is no software yet; rather, it’s a call to action to build one because, to be honest, I have neither the resources nor the qualifications to build it all alone in a reasonable amount of time.

The value of the promisee in cfengine promises

A lesson I learnt today is: use promisees in cfengine promises, and you won’t regret it. In fact, I had proof today that specifying a promisee is a very useful thing.

Let’s take a step back: a promisee is introduced by a -> sign, and according to the documentation the promisee is the abstract object to whom the promise is made. When CFEngine made its way into our trouble ticket system, I started including a ticket number as a promisee every time a promise was created to address a reported issue.

Today I found out that, every time I deployed a change that caused firewall rules to be reloaded, ntpd was also restarted. I couldn’t understand why, so I checked the policies and found this:

      "/etc/init.d/ntp" -> "CLO-2313"
          args       => "restart",
          ifvarclass => "ntp",
          comment    => "Restart ntpd when the firewall is restarted" ;

CLO-2313 was a resolved ticket, and there I found out why I had made that decision more than a month earlier, and why it was a good thing.

So… use promisees!

…and promise_summary.log met Solarized

Inspired by the blog post at nanard.org, I’ve spent a few days hacking on promise_summary.log and rrdtool, and I finally have something to show.

Let’s recap quickly: a file promise_summary.log in cfengine’s workdir (usually /var/cfengine) contains a summary of every agent run: when the run started and finished, which version of the policy has run, and how many promises (as a percentage) were kept, repaired, and not repaired. The first thing I wanted was graphs for these metrics; a medium-term goal is to bring these metrics into some well-known tool like Munin or Cacti, but that will come later.

I chose to use RRDtool for many reasons; among them:

  • it takes care of both saving the data and making the graphs;
  • it saves the data at different resolutions, automatically;
  • all aspects of a graph are customizable, and different types of graphs can be embedded in the same picture

I had previous experience with RRDtool, and I knew the downsides of course, mainly the odd, cryptic syntax. What I had forgotten after such a long time was that it’s actually easier than it looks 🙂 …

Parsing promise_summary.log

In CFEngine's work directory there is a log file called promise_summary.log. Unsurprisingly, it contains a summary of how agent runs went in the past: how many promises were kept, how many repaired, and how many failed to be repaired. Some weeks ago a blog post on nanard.org showed how this file can be parsed to graph cfengine's activity, and I thought it would be nice to do the same in Perl.

For those who never had a peek, the file looks like this:

bronto@murray:/var/cfengine$ tail promise_summary.log 
1367869877,1367869877: Outcome of version Community Failsafe.cf 1.0.0 (agent-0): Promises observed to be kept 95%, Promises repaired 0%, Promises not repaired 5%
1367869877,1367869905: Outcome of version MyOwnPC 1.0.16-1 (agent-0): Promises observed to be kept 97%, Promises repaired 3%, Promises not repaired 0%
1367870164,1367870164: Outcome of version Community Failsafe.cf 1.0.0 (agent-0): Promises observed to be kept 95%, Promises repaired 0%, Promises not repaired 5%
1367870164,1367870191: Outcome of version MyOwnPC 1.0.16-1 (agent-0): Promises observed to be kept 97%, Promises repaired 3%, Promises not repaired 0%
1367870450,1367870450: Outcome of version Community Failsafe.cf 1.0.0 (agent-0): Promises observed to be kept 95%, Promises repaired 0%, Promises not repaired 5%
1367870450,1367870475: Outcome of version MyOwnPC 1.0.16-1 (agent-0): Promises observed to be kept 97%, Promises repaired 3%, Promises not repaired 0%
1367870797,1367870797: Outcome of version Community Failsafe.cf 1.0.0 (agent-0): Promises observed to be kept 95%, Promises repaired 0%, Promises not repaired 5%
1367870797,1367870825: Outcome of version MyOwnPC 1.0.16-1 (agent-0): Promises observed to be kept 97%, Promises repaired 3%, Promises not repaired 0%
1367871083,1367871084: Outcome of version Community Failsafe.cf 1.0.0 (agent-0): Promises observed to be kept 95%, Promises repaired 0%, Promises not repaired 5%
1367871084,1367871111: Outcome of version MyOwnPC 1.0.16-1 (agent-0): Promises observed to be kept 97%, Promises repaired 3%, Promises not repaired 0%

The first thing, and I did it in some ten minutes, is to parse the file to extract the relevant information. Check this one-liner:

perl -F: -alne 'next if m{failsafe\.cf}i ; my ($start,$finish) = split(",",$F[0],2) ; my $duration = $finish-$start ; my ($version) = m{version (.+) \(} ; my ($kept,$rep,$notrep) = m{Promises observed to be kept (\d+)%, Promises repaired (\d+)%, Promises not repaired (\d+)%} ; print join("\t",$start,$duration,$version,$kept,$rep,$notrep)' promise_summary.log

Or, reformatted:

#!/usr/bin/perl -F: -aln

next if m{failsafe\.cf}i ;

my ($start,$finish) = split(",",$F[0],2) ;
my $duration = $finish-$start ;
my ($version) = m{version (.+) \(} ;
my ($kept,$rep,$notrep) = m{Promises observed to be kept (\d+)%, Promises repaired (\d+)%, Promises not repaired (\d+)%} ;

print join("\t",$start,$duration,$version,$kept,$rep,$notrep)

This parses the file and outputs the relevant information tab-separated.

The command line explained

  • -a turns on the "autosplit" feature, where the input is split (normally at whitespace) and the chunks are put in the @F array; in this specific case however, the "-F:" switch will make Perl split at colons ":";
  • -l will strip the trailing newline from each input line and add a newline after each print;
  • -n will wrap a while loop around the code, reading from the file(s) given as argument, and put each line in the $_ "default" variable;
  • -e will execute the code given on the command line;

The code explained

  • we skip the lines generated by the failsafe policy
  • we split the first field (timestamps), and calculate the difference (that is: the duration of the run)
  • we match the version string
  • we match the percentage for kept, repaired, and not repaired promises;
  • we print the bunch in tab-separated format
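For readers who don't speak Perl, the steps above can be sketched in Python as follows. This is only an illustration: the names LINE_RE and parse_line are made up here, and the pattern is derived from the sample lines shown earlier, so other CFEngine versions may log a slightly different format.

```python
import re

# Pattern for one promise_summary.log line, modelled on the sample above.
# LINE_RE and parse_line are illustrative names, not part of CFEngine.
LINE_RE = re.compile(
    r"^(?P<start>\d+),(?P<finish>\d+): "
    r"Outcome of version (?P<version>.+) \(.*?\): "
    r"Promises observed to be kept (?P<kept>\d+)%, "
    r"Promises repaired (?P<rep>\d+)%, "
    r"Promises not repaired (?P<notrep>\d+)%"
)


def parse_line(line):
    """Return (start, duration, version, kept, repaired, not_repaired).

    Returns None for failsafe runs and for lines that don't match.
    """
    # Skip the lines generated by the failsafe policy.
    if re.search(r"failsafe\.cf", line, re.IGNORECASE):
        return None
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    start = int(match.group("start"))
    finish = int(match.group("finish"))
    return (
        start,
        finish - start,  # duration of the run, in seconds
        match.group("version"),
        int(match.group("kept")),
        int(match.group("rep")),
        int(match.group("notrep")),
    )


def format_row(row):
    """Join the parsed fields with tabs, like the Perl print statement."""
    return "\t".join(str(field) for field in row)
```

Calling parse_line on each line of the file and printing format_row for every result that is not None should give the same kind of tab-separated output as the one-liner.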

This is just the start, of course. E.g.: rather than tab-separated values one could print the values in a format suitable to rrdtool, and then use rrdgraph to create the graphs… But that's for later, take care for now. Ciao!
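To give a rough idea of what "a format suitable to rrdtool" could look like: rrdtool update takes arguments of the form timestamp:value:value:…, with strictly increasing timestamps. The Python sketch below builds such arguments from already parsed rows. It assumes an RRD file with three data sources in the order kept, repaired, not repaired; that layout, the function names and the file name in the usage note are all made up for illustration.

```python
def rrd_update_args(rows):
    """Turn (start, kept, repaired, not_repaired) rows into rrdtool update arguments.

    Rows are sorted by timestamp and only the first row for each timestamp
    is kept, because rrdtool rejects updates that do not move forward in time.
    """
    seen = set()
    args = []
    for start, kept, repaired, not_repaired in sorted(rows):
        if start in seen:
            continue
        seen.add(start)
        args.append(f"{start}:{kept}:{repaired}:{not_repaired}")
    return args


def rrd_update_command(rrd_file, rows):
    """Build the full command line as a list, without running it."""
    return ["rrdtool", "update", rrd_file] + rrd_update_args(rows)
```

For example, rrd_update_command("promises.rrd", rows) returns a list that could be handed to subprocess.run once the RRD file has been created with a matching layout.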

cf-runagent is not obsolete

I sometimes see comments in the cfengine community, stating that cf-runagent is somehow useless. In its detractors' opinion, cf-agent runs by default every 5 minutes, so when you request cf-runagent to run the agent, chances are that it already did it by itself.

Well, fellow cfengineers, I strongly disagree. To me, cf-runagent is a powerful gun in my toolbox. Here's why.

The real big advantage of using cf-runagent is that you can choose which hosts you want to trigger a run on, and you can set classes in the remote agent (assuming you have cf-serverd properly configured). Combining these two features, it's trivial to trigger one-off actions remotely (e.g.: deploying a new version of a piece of software), and to do that in a one-two-many fashion: try on one node; if an action works on one test machine, try it on two; if it still works, install on a few more until you're confident it will work across the board; then, trigger the action everywhere.
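The one-two-many idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not a tool I actually use: the wave sizes, host names and class name are hypothetical, and the sketch only builds command lines (using cf-runagent's -H option to hail specific hosts and -D to define a class on the remote agent), so check the options against your installed version before running anything.

```python
def rollout_waves(hosts, sizes=(1, 2, 5)):
    """Split hosts into growing waves: the given sizes first, then all the rest."""
    waves = []
    index = 0
    for size in sizes:
        if index >= len(hosts):
            break
        waves.append(hosts[index:index + size])
        index += size
    if index < len(hosts):
        waves.append(hosts[index:])  # final wave: everything that is left
    return waves


def runagent_command(wave, define_class):
    """Build a cf-runagent command line for one wave, without running it."""
    return ["cf-runagent", "-H", ",".join(wave), "-D", define_class]
```

Each wave would be triggered only after checking that the previous one went well; that check is deliberately left to a human here.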

I used that technique when replacing puppet with cfengine. Using cf-runagent, I was able to remotely enable and disable either of the two and, when the time came, to remove puppet from the nodes altogether (software packages and configurations). Isn't that reason good enough to keep using, developing and improving cf-runagent? 😉

bundle agent SeasonGreetings
{
  reports:
    December.(Day24|Day25)::
      "Merry Christmas!" ;
}

Location detection in cfengine

My employer provided me with a laptop that I usually use when working remotely (whether on a VPN or not); more rarely, that laptop could also be connected to the company network in Oslo. Usually, when working remotely I am either at home in Oslo, or at one of my relatives' houses in Sardinia.

Depending on where I am, the laptop's configuration needs to be changed or augmented, in particular for these services:

  • ntpd
  • DNS resolver
  • software updates

Being an NTP junkie, I want my ntpd configured in the best possible way, and no: I am not going to point to some random servers in the pool that could be on the other side of the world. Whether I am in Italy or in Norway, I want to use servers from the pool that are either in the same country, or in a country nearby.

When I am on the VPN, the client decides for me the search list for DNS domains: I want that list to be enriched with all the domains I actually need.

When I am on a low bandwidth network, I want the system to refrain from automatically checking for system updates: it's much better if I do it manually when, for example, I am not going to ruin my mother's Skype calls.

For all these reasons, I decided to implement a simple location detection process in cfengine that keeps my laptop correctly configured wherever I am. Besides, since the agent runs every 5 minutes, I can be sure that the configuration changes (e.g. when I enter the VPN) will take place quickly.

…and there could be more if you think about it! E.g.: what about having specific firewall rules depending on where you are? If you are in the office, you may not care if someone tries to access your laptop via SSH as that someone could be… you 😉 But suppose you were not able to determine your current location: would you leave free access to your laptop? …
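To make the idea concrete outside of cfengine, here is a minimal Python sketch of location detection based on the addresses currently assigned to the laptop. Everything specific in it is an assumption made for illustration: the networks, the location names and the rule that SSH is open only in the office. It is not the policy described in this post.

```python
import ipaddress

# Hypothetical networks; the VPN comes first because a laptop on the VPN
# still holds an address on the local network as well.
KNOWN_NETWORKS = {
    "vpn": ipaddress.ip_network("10.20.0.0/16"),
    "office": ipaddress.ip_network("10.10.0.0/16"),
    "home": ipaddress.ip_network("192.168.1.0/24"),
}


def detect_location(addresses):
    """Return the first known location matching any of the given IPv4 addresses."""
    parsed = [ipaddress.ip_address(addr) for addr in addresses]
    for location, network in KNOWN_NETWORKS.items():
        if any(addr in network for addr in parsed):
            return location
    return "unknown"


def ssh_allowed(location):
    """Be restrictive by default: open SSH only where the network is trusted."""
    return location == "office"
```

The important design choice is the fallback: when the location cannot be determined, the sketch answers "unknown" and the most restrictive settings apply.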

My cfengine policies explained – part 4

The policy we are about to see this time ensures that the hosts file contains at least the small set of records that every hosts file should always include: a record for the IPv4 localhost, a record for the IPv6 localhost, and a record that associates one IP of the host with the FQDN and the hostname in that order. It should also contain a set of IPv6 standard addresses.

This policy is definitely not ready for prime time, and I discourage you from using it (unless you are willing to patch it and share your patch with the rest of the world). Nevertheless, it is a good example of how, with cfengine, you can take care of just a few details in a file, leaving the other parts untouched. …
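As a rough, read-only illustration of what "the small set of records" means, the Python sketch below checks the text of a hosts file and reports which of the required records are missing. It is not the cfengine policy itself and it changes nothing on disk; the function names are made up, and the check for the IPv6 localhost only requires that ::1 has some name, since that name differs between systems.

```python
def parse_hosts(text):
    """Return a list of (ip, [names]) tuples, ignoring comments and blank lines."""
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            fields = line.split()
            records.append((fields[0], fields[1:]))
    return records


def missing_records(text, fqdn, hostname):
    """List which of the required records are absent from the hosts file text."""
    records = parse_hosts(text)
    missing = []
    if not any(ip == "127.0.0.1" and "localhost" in names for ip, names in records):
        missing.append("IPv4 localhost")
    if not any(ip == "::1" and names for ip, names in records):
        missing.append("IPv6 localhost")
    # The FQDN must come first and the short hostname second, in that order.
    if not any(names[:2] == [fqdn, hostname] for ip, names in records):
        missing.append("FQDN and hostname")
    return missing
```

An empty list means all three records are present; anything else names what a policy would have to add, leaving the rest of the file untouched.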