My cfengine policies explained – part 1

As announced on Twitter, my cfengine policies are now in production. It's a few dozen servers, so not a big installation, but it has already saved me a lot of headaches. I thought it would be nice to give back to the community some of the help I got from it, so I'll publish some of the code here and comment on it. Some of the policies I'll show in whole; for others, I'll have to cut and obfuscate things here and there; there are also a few I cannot publish, sorry about that.

Some may wonder: why didn't I put this stuff in the Design Center? Well, some of the policies are rather rough and not really ready for prime time. I'll do that eventually, when I feel they're nice enough.

But before we start, let me make a quick note about the design.

I made heavy use of "methods" promises, which gave me some interesting features.

First and foremost: they allow me to organise my policies in "layers":

promises.cf, the topmost set of promises, is just a collection of generic requests to the agent: print this, check that, configure here, install there…
one level down, you have more specific bundles which, in turn, may either perform an action, or be another level of abstraction, with specific actions bundled into other method calls
at the low level, you only have those bundles that actually perform actions on the system

Besides:

specific functionalities are encapsulated in small, reusable, possibly parametric bundles
each layer offers a different level of complexity: the higher layers are the simpler ones, and don't require a lot of cfengine knowledge to be written; every time you go down one level, you get closer to the actions performed directly on the system
it should be easier to distribute the task of writing a policy across people with different levels of knowledge (if I am not left alone in this effort 🙂)
enforcing a specific order on the actions you want to perform is much easier if you encapsulate related actions in separate bundles and then invoke them sequentially; the order of execution is not always important, but when it is, it's nice to be able to enforce it.
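
To make the layering concrete, here is a minimal sketch. It is not taken from the production policies: the bundle names webserver, install_package and ensure_running are made up for illustration, and the bottom-layer bundles only print a report instead of touching the system.

```cf3
# Hypothetical example: none of these bundles appear in the real policies.

body common control
{
  bundlesequence => { "main" };
}

# Top layer: generic requests, readable with little cfengine knowledge
bundle agent main
{
  methods:
    any::
      "web" usebundle => webserver ;
}

# Middle layer: a more specific bundle that delegates to parametric ones.
# The two methods promises are listed in the order we want them handled.
bundle agent webserver
{
  methods:
    any::
      "install" usebundle => install_package("apache2") ;
      "running" usebundle => ensure_running("apache2") ;
}

# Bottom layer: the bundles that would act on the system
# (here they only report, so the sketch stays harmless)
bundle agent install_package(name)
{
  reports:
    any::
      "Would install package $(name)" ;
}

bundle agent ensure_running(name)
{
  reports:
    any::
      "Would check that $(name) is running" ;
}
```

The top layer only names what should happen; the parameters and the real work live further down, which is the separation described in the list above.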

Enough talk, let's roll! …The core of any cfengine policy is the file promises.cf. Here's an amended and shortened version of mine:

###############################################################################
#   promises.cf
###############################################################################

body common control
{
  bundlesequence => { "site", "main" };

  inputs => {
              "cfengine_stdlib.cf",
              "cfengine.cf",
              "update.cf",
              "opera-lib.cf",
              "site.cf",
              "puppet.cf",
              "housekeeping.cf",
              "services.cf",
              "packages.cf",
              "mini.cf",
              "firewall.cf",
              "site-ntp.cf", "ntp.cf",
              "nagios.cf",
              "snmpd.cf",
              "hosts.cf",
              "hostname.cf",
              "resolver.cf",
              "sshkeys.cf",
            };

  version => "1.0.0.3";
}

###############################################################################

bundle agent main
{
  methods:
    any::
      "banner"       usebundle => banner ;
      "cfengine"     usebundle => cfengine ;
      "update"       usebundle => update ;
      "puppet"       usebundle => puppet ;
      "housekeeping" usebundle => housekeeping ;
      "packages"     usebundle => install_site_packages ;
      "mini"         usebundle => mini ;
      "services"     usebundle => keepalive ;
      "firewall"     usebundle => firewall ;
      "hosts"        usebundle => hosts("/etc/hosts",
                                        "$(sys.ipv4)",
                                        "$(sys.uqhost)",
                                        "$(def.domain)") ;
      "hostname"     usebundle => hostname("/etc/hostname") ;
      "resolver"     usebundle => resolver("/etc/resolv.conf","site.dns") ;
      "sshkeys"      usebundle => sshkeys("$(site.ssh[masterkeydir])",
                                          "$(site.ssh[localkeydir])",
                                          "$(site.ssh[selector])",
                                          "$(site.ssh[sshdir])",
                                          "$(site.ssh[owner])",
                                          "$(site.ssh[group])") ;

    ntp::
      "ntp"          usebundle => ntpconf("ntp.config") ;

    mini_node::
      "nagios"       usebundle => nagios ;
      "snmpd"        usebundle => snmpd ;

  reports:
    report_minimum::
      "Default domain for this site: $(def.domain)" ;
}

bundle agent banner
{
  reports:
    report_normal::
      "This is cfengine community $(sys.cf_version) running on $(sys.fqhost)" ;
}

As you can see, we import a set of policies from external files. For example, there's the standard library, and our "personal" opera-lib.cf library; there is an update.cf derived from the standard failsafe.cf to update the policies when needed, and a number of other policy files.

There is also a site.cf policy (that we are not going to see): it contains a global bundle that is the first one to be examined by our policy (see the bundlesequence); the site bundle defines a set of variables and classes useful in the rest of the policies.

Below the common control body, we find the main agent bundle, the one that kicks off all other policies. The first one to be run is the banner bundle, contained in the same file right below the main bundle. It contains just a reports promise that will write a string if the class report_normal is set. report_normal is defined in site.cf, this way:

      "report_normal"  or         => { "inform_mode","verbose_mode" } ;

If the agent is run with either the -I (inform mode) or -v (verbose mode) option, the class report_normal will be defined and the message will be printed. That will also happen when we ask the agent to run via cf-runagent, and the message will make the output of the command more readable.
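
Since site.cf is not published, here is only a guess at how that line could sit in its bundle. The sketch assumes the site bundle is a common bundle (so that its classes are visible to every other bundle) and leaves out everything else it contains.

```cf3
# Sketch only: the real site.cf is not shown in this post.
# Everything except the report_normal promise is an assumption.
bundle common site
{
  classes:
      # Set when the agent runs in inform mode (-I) or verbose mode (-v)
      "report_normal" or => { "inform_mode", "verbose_mode" } ;
}
```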

The second bundle run is called cfengine. It gives us some control over the agent: whether we want it to run or not, and when we want to refresh all cfengine daemons. Let's check it.

bundle agent cfengine
{
  methods:
    any::
      "run_control"    usebundle => cfe_runcontrol ;
      "check_disabled" usebundle => cfe_disabled ;

    !skip_run::
      "set_defaults"   usebundle => cfe_defaults ;
      "add_crontab"    usebundle => cfe_crontab ;
}

bundle agent cfe_runcontrol
{
  vars:
      "flag"  string => "/etc/cfengine/disable" ;

  classes:
      "flagfile_present" expression => fileexists("$(cfe_runcontrol.flag)") ;

  files:
    disable_cfengine::
      "$(flag)"
        comment => "Create/touch this file if disable_cfengine is defined",
        touch   => "true" ;

    enable_cfengine::
      "$(flag)"
        comment => "Remove this file if enable_cfengine is defined",
        delete  => tidy ;

  commands:
    restart_cf_daemons::
      "/etc/init.d/cfengine3 restart"
        comment => "When requested, restart daemons via init.d" ;

  reports:
    flagfile_present::
      "Flag file $(flag) present, this run may be skipped" ;
}

bundle agent cfe_disabled
{
  classes:
    !force_run::
      "skip_run"    expression => fileexists("$(cfe_runcontrol.flag)") ;
}

bundle agent cfe_defaults
{
  vars:
      "conf[RUN_CF_SERVERD]"  string => "1" ;
      "conf[RUN_CF_EXECD]"    string => "1" ;
      "conf[RUN_CF_MONITORD]" string => "1" ;
      "conf[RUN_CF_HUB]"      string => "0" ;

  files:
    any::
      "/etc/default/cfengine3"
          edit_line => set_variable_values("cfe_defaults.conf"),
          classes   => if_repaired("restart_cfe3") ;

  commands:
    restart_cfe3::
      "/etc/init.d/cfengine3 restart" ;

}

bundle agent cfe_crontab
{
  vars:
      "crontab" string => "/etc/cron.d/cfengine" ;

  files:
      "$(crontab)"
          edit_line     => add_cfexecd_crontab,
          edit_defaults => empty,
          create        => "yes",
          comment       => "Adds a crontab so that cf-execd is checked and restarted" ;

}

bundle edit_line add_cfexecd_crontab
{
  insert_lines:
      "*/5 * * * * root /usr/bin/pgrep -c cf-execd >/dev/null 2>&1 || /var/cfengine/bin/cf-execd" ;
}

The first "sub-bundle" called is cfe_runcontrol. It looks for a flag file (in this case it's /etc/cfengine/disable, but it's trivial to make the file name parametric): if the file exists, the class "flagfile_present" is defined, and the reports promise will warn us that this agent run may be skipped.

But the real "meat" of the bundle is in the files promises: if the class disable_cfengine is defined, and that usually happens via the command line, cfengine ensures that the flag file is present; if the class enable_cfengine is defined (again, on the command line), cfengine ensures the flag file is not present. Another run control action happens if the class restart_cf_daemons is defined (yes, again, on the command line): the whole stack of cfengine daemons is restarted.
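
As for making the flag file name parametric, a possible variant is sketched below. The name cfe_runcontrol_param is hypothetical, and the restart command is left out for brevity. Note that cfe_disabled reads $(cfe_runcontrol.flag), so it would need the same file name passed to it (or a shared variable) if this variant were used.

```cf3
# Hypothetical parametric variant of cfe_runcontrol, not part of the
# published policy. It needs cfengine_stdlib.cf for the "tidy" body.
bundle agent cfe_runcontrol_param(flag)
{
  classes:
      "flagfile_present" expression => fileexists("$(flag)") ;

  files:
    disable_cfengine::
      "$(flag)"
        comment => "Create/touch the flag file if disable_cfengine is defined",
        touch   => "true" ;

    enable_cfengine::
      "$(flag)"
        comment => "Remove the flag file if enable_cfengine is defined",
        delete  => tidy ;

  reports:
    flagfile_present::
      "Flag file $(flag) present, this run may be skipped" ;
}

# Possible call from the cfengine bundle:
#   "run_control" usebundle => cfe_runcontrol_param("/etc/cfengine/disable") ;
```

Classes such as disable_cfengine or enable_cfengine can be defined on the command line with the agent's -D option, for example cf-agent -D disable_cfengine.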

Once the promises in cfe_runcontrol are verified, we have ensured that the flag file is either present or not, and that the cfengine daemons have been refreshed if we so desired. The bundle cfe_disabled then comes into play: unless we defined the "force_run" class on the command line (which, as you have guessed, forces the agent to run even if we have otherwise disabled it using the flag file), it defines the "skip_run" class if the flag file exists. That forces the agent to stop because, in a part of promises.cf that we didn't show, we defined:

  abortclasses => { "skip_run" } ;

abortclasses is a directive in the body agent control, and defines which classes will force the agent to abort when defined. If the class skip_run is defined, the cfengine bundle stops calling methods and returns immediately, so that the agent stops as soon as possible. Now all these pieces fit together:

flag file present -> skip_run defined -> agent run aborts
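
For reference, the abortclasses line shown above would sit in a body like the one below. This is only a sketch: the actual body agent control is in the part of promises.cf that is not published, and it may well set other attributes too.

```cf3
# Sketch: only the abortclasses line comes from the original policy.
body agent control
{
  # The agent aborts its run as soon as any of these classes is defined
  abortclasses => { "skip_run" } ;
}
```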

When we are allowed to run, we ensure that /etc/default/cfengine3 contains the right configuration, and we restart the daemons if the defaults have changed (bundle cfe_defaults). Finally, with the cfe_crontab bundle we ensure that a crontab is created in /etc/cron.d; that crontab in turn ensures that cf-execd stays up.

That's all for today. We have seen how promises.cf kicks off the whole set of policies, and how the cfengine bundle keeps things running. Next time we'll see the cfengine bundle counterpart: the puppet bundle, which ensures that puppet stops doing whatever it's doing on the node.

Until then, take care!

2 thoughts on “My cfengine policies explained – part 1”

anonymous

October 8th, 2012 at 18:10

David Ramirez writes: Very useful – thanks for posting! I'm also entering production with some 20 servers & 70 workstations – going first with the latter as they receive a major OS upgrade. I'm still ironing out problems, and learning. Just a few servers for now, as the majority are still under CF2 and need more differentiation / customization. So far, I'm happy to (finally) understand CF3 and start to really take advantage of it.

marcomarongiu

October 8th, 2012 at 21:10

Thanks David! I am preparing the second instalment of this series, and I hope I can have it out in a couple of days. Stay tuned!

A sysadmin's logbook

in every challenge there is an opportunity
