External node classification is a Puppet functionality, where it is left to a program external to Puppet (the external node classifier, ENC) to decide which contexts apply to the node being configured. This approach is opposed to the “standard”, basic one, where the configuration applied to a node is completely defined in Puppet files (manifests), and site.pp in particular. ENC programs really show their power and usefulness where the same ruleset is used to manage a large number of servers, with many different combinations of configurations are applied. In this case, pulling the configuration information from a structured data base instead of plain files scales much better. One can write his own ENC, or use one of the several available, hiera being a well-known one.
Note: files and ENC are just two possible ways to classify nodes in Puppet; besides, classification happens on the puppetmaster, while in CFEngine all configuration decisions are taken on the node running the agent. You may read more information on node classification in puppet here, but then we’ll leave you alone and keep reading this post 🙂
Now transpose to CFEngine. Being it generally more “low-level” than Puppet, it provides no ENC mechanism out of the box, but plenty of possibilities to implement one yourself. I first checked what were the available options. I got very nice suggestions, notably one by LinkedIn’s Mike Svoboda, where they use a Yahoo! open source product called Range to store the data about the nodes, then they dump the data in JSON format, and finally they use a bash script run as a CFEngine module to raise the relevant classes. As scalable and sophisticated as it is, it was way too much than what I needed.
ENC, in my case, had two purposes: the first: allow us to scale better (and that’s a common trait in all ENC mechanisms) the second: take as much configuration information as possible out of the policies and in plain text files. This way, the access barrier for the non-CFEngine savvy people in the company would be lowered significantly. This approach is actually closer to Neil Watson’s and EvolveThinking’s CFEngine library (based on CSV files) than to the otherwise wonderful LinkedIn approach.
I sowed this information and ideas in my brain and let them sprout in the background for a couple of weeks, letting the most complex solutions drop. The last one standing was, in my opinion, the best combination of power and simplicity. We’d use plain text files, the CFEngine’s own module protocol, and extra-simple scripts: a bash script for simple external node classification, and a Perl script for hierarchical node classification. I’ll summarize the module protocol first, and then show how we leverage it to achieve ENC.