External node classification, the CFEngine way

External node classification is a Puppet feature where a program external to Puppet (the external node classifier, ENC) decides which contexts apply to the node being configured. This is in contrast to the “standard”, basic approach, where the configuration applied to a node is completely defined in Puppet files (manifests), and site.pp in particular. ENC programs really show their power and usefulness when the same ruleset is used to manage a large number of servers, with many different combinations of configurations applied. In this case, pulling the configuration information from a structured database instead of plain files scales much better. You can write your own ENC, or use one of the several available, hiera being a well-known one.

Note: files and ENC are just two possible ways to classify nodes in Puppet; besides, classification happens on the puppetmaster, while in CFEngine all configuration decisions are taken on the node running the agent. You can read more about node classification in Puppet here, but from there you are on your own: keep reading this post 🙂

Now transpose this to CFEngine. Since it is generally more “low-level” than Puppet, it provides no ENC mechanism out of the box, but plenty of possibilities to implement one yourself. I first checked what the available options were. I got very nice suggestions, notably one by LinkedIn’s Mike Svoboda, where they use a Yahoo! open source product called Range to store the data about the nodes, then they dump the data in JSON format, and finally they use a bash script run as a CFEngine module to raise the relevant classes. As scalable and sophisticated as it is, it was far more than I needed.

ENC, in my case, had two purposes:

  • the first: allow us to scale better (and that’s a common trait in all ENC mechanisms);
  • the second: take as much configuration information as possible out of the policies and into plain text files.

This way, the access barrier for the people in the company who are not CFEngine-savvy would be lowered significantly. This approach is actually closer to Neil Watson’s and EvolveThinking’s CFEngine library (based on CSV files) than to the otherwise wonderful LinkedIn approach.

I sowed this information and these ideas in my brain and let them sprout in the background for a couple of weeks, letting the most complex solutions drop. The last one standing was, in my opinion, the best combination of power and simplicity. We’d use plain text files, CFEngine’s own module protocol, and extra-simple scripts: a bash script for simple external node classification, and a Perl script for hierarchical node classification. I’ll summarize the module protocol first, and then show how we leverage it to achieve ENC.

Plugins in CFEngine (modules) and the module protocol

CFEngine is able to collect, and make available to your policies, a wealth of system information by means of classes that come predefined when your agent starts (hard classes) and built-in variables (especially those in the sys and mon contexts). When you need more information than CFEngine natively provides, you can still use its built-in functions and commands promises, but sometimes they are not enough: you need to query your system directly to collect a lot of information at once. The standard way is to write a module.

A module is a program written in any language. It can do whatever it likes to collect information, as long as it hands it back to CFEngine in a very simple way: through the standard output:

  • printing out +my_class will activate a global class my_class;
  • printing out -my_class will cancel a global class my_class;
  • printing out =my_var=my_value will set a variable my_var with value my_value in a context (think of a namespace in a programming language) named after the program that generated the variable (more on this in the example below);
  • printing out @my_list= { "list", "of", "values" } will set a list called my_list;
  • of course, you can also set array values: =my_array[key]=array_value is perfectly valid syntax.
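To make the protocol lines above concrete, here is a minimal sketch of a module that prints one line of each kind. The function name emit_demo and every class and variable name in it are made up for illustration; only the line prefixes come from the protocol described above.

```shell
#!/bin/sh
# Hypothetical module body: all names below are invented for this example.
emit_demo() {
    echo "+demo_class_on"                      # activate a global class
    echo "-demo_class_off"                     # cancel a global class
    echo "=demo_var=some value"                # set a scalar variable
    echo "@demo_list= { \"one\", \"two\" }"    # set a list
    echo "=demo_array[key]=array value"        # set an array element
}

emit_demo
```

Anything the module prints that does not start with one of these prefixes is not part of the protocol.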

In CFEngine 3.5.x and later an extension mechanism has been added to the protocol; it’s not covered here because we are using 3.4.x, but it would be really trivial to extend our ENC to include extensions. See the documentation if you want to know more.

A classical example of mine for a plain simple, yet useful module is one that defines a class for each wireless network we are connected to: it’s a bash script called essid.sh:

#!/bin/bash

PATH="/sbin:/usr/sbin:/usr/bin:/bin"

for ESSID in $( iwgetid --raw )
do
  CESSID=$( echo $ESSID | tr -c "a-zA-Z0-9_" "_" | sed -e 's/_*$//' )
  CLASS="essid_${CESSID}"

  echo "+${CLASS}"
done

exit 0

As you can see it’s really a trivial one: it gets the list of the ESSIDs of the networks we are connected to, replaces all characters that are not legal in a CFEngine class name with underscores, prefixes the resulting strings with essid_, and then echoes them one by one with a + in front. On my laptop and when I am home, the module writes:

bronto@murray:~$ /var/cfengine/modules/essid.sh 
+essid_WiFiOslo

and that sets the class essid_WiFiOslo, from which the agent running on the laptop understands that I am home and applies some special settings accordingly. If I also set a variable from this module, it would be defined in the context/namespace essid_sh, that is: the name of the module itself, with all the “invalid” characters replaced by an underscore. E.g., if the module set a variable called signal_strength, its fully qualified name would be essid_sh.signal_strength.
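The naming rule for the context can be sketched in a couple of lines of shell. The helper module_context is hypothetical, and it only approximates the rule as described here (every character that is not a letter, digit or underscore becomes an underscore); it is not CFEngine’s own code.

```shell
# Hypothetical helper: approximate the context name used for a module's variables.
module_context() {
    # Take the file name and replace every character outside [a-zA-Z0-9_]
    # (keeping the trailing newline) with an underscore.
    basename "$1" | tr -c 'a-zA-Z0-9_\n' '_'
}

module_context /var/cfengine/modules/essid.sh   # prints: essid_sh
```

A variable printed by that module as =signal_strength=... would then be referenced in policy as $(essid_sh.signal_strength).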

There would be a few more things to be said, but the real topic here is external node classification, so let’s get back to that. See the documentation if you need more details about the module protocol.

External node classification using the module protocol

Now, imagine for a second that you had a text file formatted according to the module protocol. Something like:

+my_class
-your_class
=my_variable=some text

If we had such a file, we could define classes and variables in our agent by just using the cat command from inside the agent. That would be something already, but comes with the limitation of not allowing any additional human-readable information (like, for example, comments).

But that’s very easy to fix! If we write a shell wrapper around grep as a module and call it enc, we can put both CFEngine information and human-readable information in the file, and ensure that the latter will be skipped:

#!/bin/sh

# Quote the pattern and the arguments so the shell cannot expand or split them
/bin/egrep -h '^[=@+-]' "$@" 2> /dev/null

Such a wrapper will filter all non-module information from all the files it gets on the command line (to know why we throw away the standard error, please see the docs). That’s something more, but it has another limitation: it’s not hierarchical, it doesn’t merge the information from the different files. Whatever is in the files is just thrown out and into the agent. What happens if a class is activated and canceled in different files? What if the same name is used to set both a variable and a list? What would be the outcome, what would the agent do?
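As a quick illustration of the filtering, here is a sketch that writes a small annotated file and runs the same filter over it. The file contents, the temporary file and the enc_filter function name are made up for the example; grep -E is used as the equivalent of egrep.

```shell
# Sketch: an annotated ENC file and what the filter lets through.
enc_file=$(mktemp)
cat > "$enc_file" <<'EOF'
# Settings for web servers (this comment is ignored)
+web_server
-legacy_proxy
Free-form notes are skipped too.
=motd_file=/etc/motd.web
@ntp_servers= { "ntp1.example.com", "ntp2.example.com" }
EOF

enc_filter() {
    # Same idea as the enc wrapper: keep only lines starting with = @ + or -
    grep -E -h '^[=@+-]' "$@" 2> /dev/null
}

enc_filter "$enc_file"
```

Only the four protocol lines are printed. Note that a human-readable line that happens to start with one of the four characters (a dash-prefixed note, for instance) would pass through as well, so comments are best written with a leading #.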

That’s where hierarchical/merged node classification is needed: if we have a list of files, where a class is both set and cancelled, the final status will be the one found in the last file that said something about that class; if a variable is set many times, the last definition wins.

How difficult is it to implement? Judging by the size of the following script, called henc (for hierarchical ENC), it’s not difficult at all:

#!/usr/bin/perl

use strict ;
use warnings ;

my %class ;    # classes container
my %variable ; # variables container

# Silence errors (e.g.: missing files)
close STDERR ;

while (my $line = <>) {
    chomp $line ;
    my ($setting,$id) = ( $line =~ m{^\s*([=\@+-])(.+)\s*$} ) ;
    next if not defined $setting ; # line didn't match the module protocol

    # add a class
    if ($setting eq '+') {
	# $id is a class name, or should be.
	$class{$id} = 1 ;
    }

    # undefine a class
    if ($setting eq '-') {
	# $id is a class name, or should be.
	$class{$id} = -1 ;
    }

    # define a variable/list
    if ($setting eq '=' or $setting eq '@') {
	# $id is "variable = something", or should be
	my ($varname)      = ( $id =~ m{^(.+?)=} ) ;
	next if not defined $varname ; # malformed assignment, skip it
	$variable{$varname} = $line ;
    }

    # discard the rest
}

# print out classes
foreach my $classname (keys %class) {
    print "+$classname\n" if $class{$classname} > 0 ;
    print "-$classname\n" if $class{$classname} < 0 ;
}

# print variable/list assignments, the last one wins
foreach my $assignment (values %variable) {
    print "$assignment\n" ;
}

Let me explain briefly how this script works: it reads the lines of the files passed on the command line, one by one, looking for module-like lines. If the line appears to be setting/canceling a class, it will set a hash key/value for that class; if it looks like a variable or list, it will extract the variable name, and save the latest definition for it in another hash. If a class is set two times, the latest definition overwrites the earlier one; if a variable and a list with the same name are defined, the latest definition is kept and the earlier ones are all discarded. When there are no more lines to be read, the module prints all this merged information with no duplicates.
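To see the “last one wins” rule in action without the Perl script at hand, here is a rough awk sketch of the same merge, run over two made-up files. It is an illustration only, not henc itself: the function name henc_sketch, the file names and their contents are assumptions, and the output is sorted just to make it predictable.

```shell
# Illustrative last-wins merge in awk; NOT the henc script.
henc_sketch() {
    # cat skips unreadable files; "|| true" keeps strict shells happy
    { cat "$@" 2> /dev/null || true; } | awk '
        { sub(/^[ \t]+/, "") }                  # tolerate leading whitespace
        /^[+-]/ { class[substr($0, 2)] = substr($0, 1, 1); next }
        /^[=@]/ {
            rest = substr($0, 2)
            eq = index(rest, "=")               # position of the first "="
            if (eq > 1) var[substr(rest, 1, eq - 1)] = $0
            next
        }
        END {
            for (c in class) print class[c] c   # last +/- seen for each class
            for (v in var)   print var[v]       # last assignment for each name
        }
    ' | LC_ALL=C sort
}

dir=$(mktemp -d)
printf '%s\n' '# site defaults' '+ntp_client' '+use_proxy' \
    '=motd_file=/etc/motd.default' > "$dir/_default_"
printf '%s\n' '# this node only' '-use_proxy' \
    '=motd_file=/etc/motd.node' > "$dir/node"

# Later files win; the missing file is silently skipped.
henc_sketch "$dir/_default_" "$dir/no_such_file" "$dir/node"
```

With these two files the merged result is +ntp_client, -use_proxy and =motd_file=/etc/motd.node: the node file cancels use_proxy and overrides motd_file, while ntp_client survives from the defaults.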

This is all very good, but completely useless as long as we don’t plug it into CFEngine. How do we do that? What decides which files should be checked? Once again, there are endless possibilities, from the simplest one (like: hardwiring a file list into a policy) to the most sophisticated ones (get the list itself from some external source). We decided, once again, for a simple path.

How we plug hENC into our policies

In general, this is how we want to apply settings from hENC:

  • first, we want to read some defaults, generally valid for all possible locations;
  • then, we want to read defaults for the location the node is in, overriding general defaults if needed;
  • then, we want to read defaults that depend on other information, like the environment the node is in (e.g.: it has global connectivity, like a public IPv4 or a global IPv6 address, or it has only private addresses);
  • finally, we want to read special settings that apply to this node only.

We define this hierarchy as a list of files, where the elements of the list depend on certain classes being set; the last file to say anything about a class or variable wins. We’ll show how that works with a practical example: a slightly modified excerpt from our real policies.

In our policies we set some global classes that tell us the location of the node (e.g.: a node running in Oslo will have the oslo class set); plus, we set classes like on_private_net_only or oslo_public, depending on the connectivity of the node itself. Finally, we have the following vars promises:

    oslo_public::
      "enc_subdir"
          policy => "overridable",
          string => "$(enc_basedir)/pub" ;

    on_private_net_only::
      "enc_subdir"
          policy => "overridable",
          string => "$(enc_basedir)/priv" ;

    oslo::
      "henclist"
          policy => "overridable",
          slist => {
                     "$(enc_basedir)/_default_",
                     "$(enc_basedir)/_oslo_",
                     "$(enc_subdir)/_oslo_",
                     "$(enc_subdir)/$(sys.domain)/$(sys.fqhost)",
          } ;

As you can see, the general defaults for all locations are listed first, then we read the defaults for Oslo, then the defaults for Oslo for public or private nodes, and finally the special settings for the node. The latter are stored in a subdirectory named after the domain name of the node, so that we don’t clutter a single directory by throwing all the node files there.

Note that any of these files could be missing (e.g.: the node-specific file), and the mechanism still works: if a file doesn’t exist, henc will just ignore it. Note also we could extend the list with a local file that is not pulled from the policy hub (e.g.: /etc/cfengine/local-node.conf), and that would also work and would allow settings defined centrally to be overridden by local settings; whether or not this is a good idea is left to the reader as an exercise 😉
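When debugging, it can help to see which files the hierarchy resolves to for a given node. The sketch below mirrors the order of the henclist shown above; the function henc_files, its arguments and the sample host name are hypothetical and exist only for this example.

```shell
# Hypothetical helper: print the hierarchy of ENC files, least specific first.
# Arguments: base directory, location, "pub" or "priv", domain, fully qualified host name
henc_files() {
    basedir=$1 location=$2 subdir=$3 domain=$4 fqhost=$5
    printf '%s\n' \
        "$basedir/_default_" \
        "$basedir/_${location}_" \
        "$basedir/$subdir/_${location}_" \
        "$basedir/$subdir/$domain/$fqhost"
}

henc_files enc oslo pub example.com web01.example.com
```

Assuming none of the paths contain whitespace, the output can be fed straight to the merger for a dry run, e.g. henc $(henc_files enc oslo pub example.com web01.example.com), to preview what a node would get before the agent runs.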

The henclist is then passed to a bundle via a method call in promises.cf:

    "ENC"
      comment   => "External node classification",
      usebundle => henc("site.henclist") ;

Once the bundle henc is processed, the classes will be set/cancelled globally, and the variables will be usable in other parts of the policy like, for example:

      "motd_file"
          string => "$(henc.motd_file)",
          policy => "overridable" ;

Yes, it’s as simple as that!

Note: ENC and bundle common

If in a bundle common you use information from hENC, you’ll have to ensure that it is evaluated after the settings from hENC are applied. For that purpose, in promises.cf we have this:

      # Need to re-evaluate common bundles if they depend on information
      # from ENC
      "reevaluate_$(common_bundles)"
          comment   => "evaluating $(common_bundles) variables after ENC",
          usebundle => $(common_bundles) ;

Appendix: the henc bundle

We conclude this post by showing the bundle agent henc in detail. We see the code first, and then we explain how it works. We use some additional bodies from a library of ours that are not shown here, but it’s easy to understand what they do.

bundle agent henc(enclist_name) {
  vars:
    henc_has_list::
      "enclist"      slist  => { "@($(enclist_name))" } ;
      "enc_fullpath" slist  => maplist("$(site.inputs)/$(this)","enclist") ;
      "encargs"      string => join(" ","enc_fullpath") ;

  classes:
      "henc_has_list" expression => isvariable("enclist_name") ;
      "henc_has_args" expression => isvariable("encargs") ;
      "henc_can_classify"    and => { "henc_has_list","henc_has_args" } ;

  files:
      "$(site.lmodules)/henc"
        comment   => "Copy/update hierarchical merger",
        copy_from => digest_cp("$(site.modules)/henc"),
        perms     => mog("0755","root","root") ;

    henc_has_list::
      "$(site.inputs)/$(enclist)"
        comment   => "Cache henc files locally",
        copy_from => digest_cp("$(site.masterfiles)/$(enclist)") ;

  commands:
    henc_can_classify.!henc_classes_activated::
      "$(site.lmodules)/henc"
        comment    => "Hierarchical classification for $(sys.fqhost)",
        args       => "$(encargs)",
        classes    => always("henc_classes_activated"),
        module     => "true" ;
}

Just recall that the agent goes three times through each bundle to ensure convergence, and we are ready to go.

At the first pass, the vars promises are not evaluated, because the henc_has_list class has not been set yet. Then the classes promises are evaluated: if the bundle was passed an argument, the henc_has_list class gets defined; henc_has_args won’t, as the variable encargs hasn’t been defined yet; this implies that henc_can_classify will also be false. Only the first one of the files promises will be evaluated, and it will update the script if needed. Finally, the commands promises are skipped as the class condition evaluates to false.

At the second pass, the vars promises will be evaluated, and all the variables will possibly be defined. enclist will be a copy of the list whose name was passed to the bundle as parameter; enc_fullpath will be the same list with all paths prefixed with the name of the inputs directory (we define it in a common bundle named site), and encargs will be a string that contains all the files in enc_fullpath, joined with spaces.

When we get to the classes promises, all of the classes will now be defined. As a consequence, the remaining files promises are also evaluated, and the files used in the classification process are copied locally. If they don’t exist on the policy hub, that’s not a problem: the worst that can happen is that CFEngine will tell us that it couldn’t find them. The commands promise will now run, the node will be classified, and the class henc_classes_activated will be set.

At the third and last pass, all the promises have been already evaluated, so CFEngine just skips them all. Job done!

Conclusion

In this post I tried to describe how we implemented hierarchical external node classification in CFEngine, using only native CFEngine functionality. Other shops have implemented it differently, as you can read on the help-cfengine forum: LinkedIn uses Range, MailOnline’s Khushil Dep used NodeJS and, if I understand the comments correctly, Normation built one in Rudder. Our ENC is probably the least sophisticated of the pool, but it is already helping us to scale faster, and we are confident that it will help us to scale to much larger numbers than we have now!



12 thoughts on “External node classification, the CFEngine way”

    • Hi Oxtan, thanks for commenting.

      An approach based on a network database didn’t fully suit our needs. First, the node may lose the ability to classify itself when the network is unavailable. Second, while editing a text file in a simple format is straightforward, not everyone may be used to managing information in, say, an LDAP directory or a MongoDB database.

      Regarding the specifics of your approach:
      – what happens to a node when the LDAP directory is unreachable?

      – I see that hostinnetgroup accepts only a scalar argument: I understand that you have to call that function many times, one for every class you want to check, to fully classify a node?

      – If so, how heavy is it?

      Thanks again, ciao!
      — bronto

  1. if the ldap servers are not available, then you have bigger problems ;-), they are not only the ldap servers, but also the kerberos kdc’s and they are responsible for one of the company’s sub-domains; that is why ldap environments are resilient with multiple servers in a multi master config.

    But, fair enough. Let us suppose that the ldap environment is not available. In our environment, that would mean that promises that have not been promised already, would not be promised. But those that already have, would not be affected. This is because cfengine does not revert changes already committed. But: provided the ldap servers (and consequently, the KDCs and dns servers) are not available, then the whole network is already a mess: no one can log in, services are unreachable. At that point, cfengine is the least of my problems.

    Most companies already have LDAP, and you can delegate tasks to junior staff, so it is pretty straightforward. Nowadays modifying the directory is pretty simple really. It’s in a web interface, although you can use any programming language if you need to.

    I have not noticed any performance impact. To be honest, I have not asked myself the question you ask. Maybe a question for the developers?

    • Sorry, I may not have explained my question well enough. What I was wondering is how your policies cope with the unfortunate event of losing connection with the directory server. I understand that they just apply some “general” policies, and refrain from applying anything that depends on external classification. Did I get it right?

  2. Pingback: evolvethinking.com » Bulding CFEngine classes using EFL

  3. Pingback: cf-deploy: easier deployment of CFEngine policies | A sysadmin's logbook

  4. I’m very interested in your approach to this, but there is something that I’m not understanding. Maybe it’s how the pieces connect. In your example above there are three classes: oslo, oslo_public, and on_private_net_only. It seems that a node would have to know about these in order to begin. How does a node discover this?

    • Hi Dan

      Those classes come from our real policies. For each location we have machines in, classes will be defined based on the IPs configured on the machine that tell us where the node is located. In this specific case, the class “oslo” will be defined if the node has an IP that is in one of our ranges in Oslo, “oslo_public” will also be defined if the machine has a public IP address, and “on_private_net_only” will be defined if the machine has only private IP addresses (no direct access to the Internet, may have it via a NAT however).

      It is really easy to define such classes in CFEngine, using class promises with or() and iprange(): you can create a simple “location detection” logic based only on the IPs configured on the node. You can see the documentation for iprange at https://cfengine.com/docs/3.5/reference-functions-iprange.html

      I hope this clarifies, if not please ask for more detail.

      • Thanks for the reply. I have a test system up and running now. One thing though, the classes set in the henc bundle are not visible in other bundles even though the other bundles come later in the bundlesequence. How can I make the classes visible to other bundles?

      • Hi Dan

        The classes defined via ENC are global. What is probably happening is that either you are running ENC after the bundles that should use it, or the bundles that don’t seem to work are of the “bundle common” type. That is expected, because usually CFEngine parses common bundles before any other bundle to ensure that global variables and classes are defined before anything else.

        The solution is to re-evaluate the common bundles that use ENC-defined variables/classes after ENC has run. In my policies I have a list of those, and immediately after running ENC via a method I have this methods promise:

        # Need to re-evaluate common bundles if they depend on information
        # from ENC
        "reevaluate_bundle_common_$(policy.commons)"
        comment => "evaluating $(policy.commons) variables after ENC",
        usebundle => $(policy.commons) ;

        This will iterate over the common bundles listed in @(policy.commons) and evaluate them once again.

        Let me know if this solves your problem. Ciao!

  5. I thought I had fixed the bundle order, but it turned out that my test bundle was still before henc. Now I’m thinking of how I might integrate this into my environment. I have a little over 100 nodes under management right now.

    The other thing I’m looking for is a way to read a file to get a list of machines that belong in a class. It appears that the Evolve Free Library can do that but I haven’t got it working yet.

    I watched your talk from Fosdem. Are the slides from the talk available online somewhere?

    Thanks.

    • Hi Dan

      I’d suggest that you come to the help-cfengine mailing list/forum, post some code and explain the problem in full there. That will make it easier for me to help, and will allow others to provide their insight as well. There are many smart people there! That will also get you in touch with Neil Watson easily (Neil is the author of EFL).

      The slides are on this same blog: https://syslog.me/2014/02/09/my-talks-at-fosdem-and-cfgmgmtcamp/

      Hope to see you in the mailing list. Ciao!
      — M
