The policy we are about to see this time ensures that the hosts file contains at least the small set of records that every hosts file should always include: a record for the IPv4 localhost, a record for the IPv6 localhost, and a record that associates one IP of the host with the FQDN and the hostname, in that order. It should also contain a set of standard IPv6 addresses.
This policy is definitely not ready for prime time, and I discourage you from using it (unless you are willing to patch it and share your patch with the rest of the world). Nevertheless, it is a good example of how, with cfengine, you can take care of just a few details in a file, leaving the other parts untouched. Some examples of the problems that affect this policy:
- it uses the function readstringarray, which is affected by the long-standing bug number 887: workarounds are in place, which means that the policy is not as clear as it could be if the function worked properly;
- it was conceived in ignorance of the whitespace_policy, which forced me to implement a "normalization" of the file that was not really necessary;
- it has known bugs (see comments in the policy)
Besides, it uses rather contrived regular expressions. No, it's not a bug 😉 but those parts will need more explanation than others.
Let's start from the policy itself. The code is heavily commented, so this post will only add some logic and information that was not included in the comments.
# At the moment I write this policy, there is a bug in readstringarray
# (https://cfengine.com/bugtracker/view.php?id=887) which prevents
# readstringarray from working properly when using a bundle parameter as
# filename. Same holds if it tries to use a calling bundle's parameter,
# too.
# In order to work around this, readstringarray should either use a
# hardcoded name (which we don't want), or refer to another bundle's
# variable. That's why we copy $(h) to $(file), and then we refer
# to $(hosts.file) in fix_host_entries.
bundle agent hosts(h,myip,uqhost,domain)
{
  vars:
      "file" string => "$(h)" ;

  files:
      "$(file)"
        edit_line => fix_host_entries("$(myip)","$(uqhost)","$(domain)") ;

  reports:
}

bundle edit_line fix_host_entries(myip,uqhost,domain)
{
  vars:
      # regular expression variables we'll use later
      "fqhost"    string => "$(uqhost).$(domain)" ;
      "re_myip"   string => escape("$(myip)") ;
      "re_fqhost" string => escape("$(fqhost)") ;
      "re_uqhost" string => escape("$(uqhost)") ;
      "re_domain" string => escape("$(domain)") ;

      # what we'll add to $(myip) to make a minimal host record
      "host_record" string => "$(fqhost) $(uqhost)" ;

      #!! FIXME
      # these records are not super-critical, as of today, so we'll
      # add them unconditionally -- this may result in duplicates
      # because of different formatting, but we'll skip this for now.
      "ipv6standard" slist =>
        {
          "fe00::0 ip6-localnet",
          "ff00::0 ip6-mcastprefix",
          "ff02::1 ip6-allnodes",
          "ff02::2 ip6-allrouters",
        } ;

      # We parse hosts file records, to later create classes which will
      # reflect which records need to be added, and which ones need
      # to be amended, in case.
      "count" int => readstringarray("records",       # array to populate
                                     "$(hosts.file)", # file to read
                                     "\s*#[^\n]*?",   # match comments
                                     "\s+",           # match fields
                                     "1000",          # max entries
                                     "80000") ;       # max bytes

      "ip" slist => getindices("records"),
        comment => "IPs which have a record in the hosts file" ;

      "ipclass[$(ip)]" string => canonify("$(ip)"),
        comment => "Class names for each IP found" ;

      "myaddr_class" string => canonify("$(myip)"),
        comment => "class name for our own IP address";

  classes:
      "has_ip_$(ipclass[$(ip)])" expression => "any",
        comment => "We define a class for each IP address in the hosts file" ;

      "has_ipv4_localhost" expression => "has_ip_127_0_0_1",
        comment => "If hosts has a record for 127.0.0.1, this will be defined" ;

      "has_ipv6_localhost" expression => "has_ip___1",
        comment => "If hosts has a record for ::1, this will be defined" ;

      "has_host_record" expression => "has_ip_$(myaddr_class)",
        comment => "If host has a record for its own address, this will be defined" ;

  delete_lines:
      # Thanks to oha for his help with this pattern. I was trying to solve
      # this with a negative lookbehind, (host name *not* preceded by...) and
      # I didn't realise that a negative lookahead at the ^ (beginning of
      # line NOT followed by...) would work!
      # Anyway, the pattern below means:
      #   Beginning of line
      #     not followed by our ip address, and a space
      #   then we start matching the real thing:
      #     an address (IPv4 or IPv6; this RE matches much more than that, KISS...)
      #     a sequence starting with whitespace followed by a hostname, 0 or more times
      #     then whitespace and our hostname, either unqualified or qualified
      #     then again whitespace and hostname sequence, 0 or more times
      #     whitespace padding the end of line, 0 or more
      #
      # This means that this RE matches all the lines which contain our hostname,
      # either qualified or unqualified, but not associated with a proper IP address.
      # It's a dangerous line, and we wipe it.
    hosts_records_normalized::
      "^(?!$(re_myip)\s)[a-fA-F0-9:.]+(\s+[a-zA-Z0-9.-]+)*\s$(re_uqhost)(.$(re_domain))?(\s+[a-zA-Z0-9.-]+)*\s*" ;

  insert_lines:
    hosts_records_normalized::
      "127.0.0.1$(const.t)localhost"
        ifvarclass => "!has_ipv4_localhost";

      "::1$(const.t)localhost ip6-localhost ip6-loopback"
        ifvarclass => "!has_ipv6_localhost" ;

      "$(myip)$(const.t)$(host_record)"
        ifvarclass => "!has_host_record" ;

      "$(ipv6standard)" ;

  replace_patterns:
    hosts_records_normalized::
      "^127.0.0.1\t(?!localhost\b)(.*)"
        replace_with => value("127.0.0.1$(const.t)localhost $(match.1)"),
        ifvarclass => "has_ipv4_localhost" ;

      "^::1\t(?!localhost\b)(.*)"
        replace_with => value("::1$(const.t)localhost $(match.1)"),
        ifvarclass => "has_ipv6_localhost" ;

      # FIXME
      # the following doesn't work properly in the edge case where you have,
      # e.g. two consecutive occurrences of uqhost with no space in between.
      # Should be fixed. Probably, the line for localhost is also affected
      "^$(re_myip)\t(?!$(re_fqhost)\s+$(re_uqhost))(.*)"
        replace_with => value("$(myip)$(const.t)$(host_record) $(match.1)"),
        ifvarclass => "has_host_record" ;

    !hosts_records_normalized::
      "^(\s*)([a-fA-F0-9:.]+)( +)"
        replace_with => value("$(match.2)$(const.t)"),
        classes => if_ok("hosts_records_normalized"),
        comment => "Normal records begin at column one and are in IP\tNAMES format" ;

  reports:
    report_minimum::
      "Read $(count) records from $(edit.filename)" ;
      "Address matched: $(ip)" ;
}
We call the agent bundle in a methods promise, as in:
methods:
    "hosts"
      usebundle => hosts("/etc/hosts", "$(sys.ipv4)", "$(sys.uqhost)", "$(def.domain)") ;
The bundle agent hosts is quite compact: in fact, all the hard work is performed in the edit_line bundle. We'll go through the three passes the agent runs over it, and see how the file editing takes place. Going through the hosts bundle three times is not really necessary, as nothing will change after the first pass.
For the reasons explained in the comments at the top of the policy, we define a "file" variable using the parameter $(h); in our example, it will be the string "/etc/hosts". We then use this value in the files promise below, thus editing that file by means of the fix_host_entries bundle, to which we forward the remaining parameters.
In fix_host_entries, we start by building a number of variables from the parameters we receive: $(fqhost) is the FQDN of the host machine; the following four lines use the function escape() to build strings that we can safely use in regular expressions, hence the prefix "re_".
$(host_record) is just a shortcut for a string we'll use several times in the promises below.
$(ipv6standard) is a set of lines that we'll plainly ensure in the file; be sure to check the comments in the code!
And there it is: readstringarray(). We use that function to parse the /etc/hosts file, and populate the records array with that. When the function has finished parsing the file, records will contain the… well, records in the file, indexed by IP address. The comments (strings matching "\s*#[^\n]*?", that is: zero or more whitespace characters, followed by a hash, followed by the shortest possible string containing anything but newline) will be skipped, and the records will be broken in two pieces at the first sequence of whitespace. Again, be sure to check the comments in the code to fully understand what goes on.
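To make the result of the parsing step more tangible, here is a rough sketch in Python (not cfengine code) of what "records indexed by IP address" amounts to. The function name parse_hosts and the sample data are made up for illustration, and the comment handling is simplified to cutting each line at the first hash rather than reproducing readstringarray's own behaviour.

```python
import re

def parse_hosts(text):
    """Return {ip: [names, ...]} for every non-comment record.

    Hypothetical helper: it only approximates what readstringarray
    produces, and a repeated IP simply overwrites the earlier entry.
    """
    records = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue                          # blank or comment-only line
        fields = re.split(r"\s+", line)       # same field separator as the policy
        records[fields[0]] = fields[1:]
    return records

sample = """\
# static table
127.0.0.1   localhost
::1\tlocalhost ip6-localhost ip6-loopback  # loopback
192.0.2.10  web01.example.com web01
"""
records = parse_hosts(sample)
```

The keys of the resulting dictionary play the role that the indices of the records array play in the policy.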
We then use getindices to fill the ip list with the keys of the records array, that is: the IP addresses of the hosts records. We use @(ip) immediately by iterating over its values and filling out another array, ipclass, that will associate each IP with its canonified version.
Lastly, we canonify our own IP address in myaddr_class which will be handy a few promises below.
The next step is classes: we use the values in @(ip) to iterate over the values in ipclass, in order to define a class for each IP address in /etc/hosts; the class is set to "any" which basically means it is always defined.
We then set the classes has_ipv4_localhost, has_ipv6_localhost, and has_host_record, which will be set only if the previous iteration has set the classes they are associated with.
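The class names used here, in particular has_ip___1 with its three underscores, are easier to follow with a small model. The Python sketch below is only an approximation: canonify is re-implemented under the assumption that it turns every character other than letters, digits and underscore into an underscore, and hosts_classes is a made-up helper mirroring the classes promises.

```python
import re

def canonify(value):
    # Assumed behaviour of cfengine's canonify(): anything that is not
    # a letter, a digit or an underscore becomes an underscore.
    return re.sub(r"[^A-Za-z0-9_]", "_", value)

def hosts_classes(ips, myip):
    """Return the set of classes the policy would define (sketch)."""
    classes = {"has_ip_" + canonify(ip) for ip in ips}
    if "has_ip_127_0_0_1" in classes:
        classes.add("has_ipv4_localhost")
    if "has_ip___1" in classes:               # "::1" canonifies to "__1"
        classes.add("has_ipv6_localhost")
    if "has_ip_" + canonify(myip) in classes:
        classes.add("has_host_record")
    return classes

found = hosts_classes(["127.0.0.1", "::1"], "192.0.2.10")
```

With these inputs both localhost classes are defined, while has_host_record is not, because 192.0.2.10 has no record in the sample.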
Then come the delete_lines and insert_lines promises, which are skipped at the first pass since the hosts_records_normalized class is defined only later on.
replace_patterns promises come next, and start the change. At the first pass, the first set is not applied since hosts_records_normalized is not set yet. But the second set, which implements a normalization of the records in the file, will run. The regular expression, "^(\s*)([a-fA-F0-9:.]+)( +)", says:
- from the beginning of the line "^";
- match any amount of leading whitespace and save it "(\s*)";
- then match one or more characters in the set: hexadecimal digits (0-9 and a-f, in either case), colons, or dots; this is the set of the characters that can be used to compose an IPv4 or IPv6 address "([a-fA-F0-9:.]+)";
- finally, match and save a run of one or more spaces "( +)".
Please notice that this regex is sub-optimal: we actually care to save only the second matched pattern: in fact, we don't use the other two in the replace_with clause, which will replace the string matching the pattern with the address (the second group matched) followed by a tab ($(const.t)). Basically, we ensure that there are no spaces preceding the IP address, and that the address is followed by a single tab. If the promise is repaired or kept, that is: if the file has been, or was already, normalized, the class "hosts_records_normalized" will be set.
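The effect of the normalization can be tried outside cfengine. The sketch below uses Python's re module as a stand-in for the PCRE engine cfengine uses, and applies the pattern one line at a time; normalize_line is a hypothetical name, not something the policy defines.

```python
import re

# Same pattern as the policy, with the backslashes written out.
NORMALIZE_RE = re.compile(r"^(\s*)([a-fA-F0-9:.]+)( +)")

def normalize_line(line):
    """Strip leading whitespace and put a single tab after the address."""
    # Only group 2 (the address) is reused, as in the replace_with clause.
    return NORMALIZE_RE.sub(lambda m: m.group(2) + "\t", line, count=1)

before = "  192.0.2.10   web01.example.com web01"
after = normalize_line(before)
```

A line that already has a tab right after the address does not match at all, since the pattern requires at least one space there, so it is left alone.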
The reports promises will be applied only if the report_minimum class is set; check site.cf for details. This ends the first pass.
And there we go with the second pass. The vars promises are all confirmed, as are the classes promises. delete_lines promises, however, are applied now, and wipe away all the records that contain this machine's hostname, if they associate the hostname with a different IP than the one we specified. How exactly this happens is explained in the comments.
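Since the delete_lines pattern is the hardest one to read, it may help to see it assembled and tested in isolation. This is a sketch in Python, with re.escape standing in for cfengine's escape() and fullmatch standing in for the whole-line matching of delete_lines; the function name and the sample addresses are invented, and the dot before the domain is escaped here, which is an assumption about the intended pattern.

```python
import re

def stale_record_re(myip, uqhost, domain):
    """Build a regex matching records that give our name to another IP."""
    return re.compile(
        r"^(?!" + re.escape(myip) + r"\s)"   # line must NOT start with our IP
        r"[a-fA-F0-9:.]+"                    # some address
        r"(\s+[a-zA-Z0-9.-]+)*"              # any names before ours
        r"\s" + re.escape(uqhost)            # our unqualified name...
        + r"(\." + re.escape(domain) + r")?" # ...optionally qualified
        r"(\s+[a-zA-Z0-9.-]+)*"              # any names after ours
        r"\s*"                               # trailing padding
    )

stale = stale_record_re("192.0.2.10", "web01", "example.com")

def is_stale(line):
    return stale.fullmatch(line) is not None
```

A record such as "10.0.0.5", tab, "web01.example.com web01" is flagged, while the same names attached to 192.0.2.10 are not, and records for other hosts are ignored.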
insert_lines promises are now considered as well, and will be applied where needed. IPv6 standard records, however, are always applied once the records are normalized.
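The logic of the insert_lines promises can be modelled in a few lines of Python. This is only a sketch under the assumption that a line is inserted when no identical line is already in the file; insert_missing and its parameters are hypothetical names, and the classes are passed in as a plain set of strings.

```python
IPV6_STANDARD = [
    "fe00::0 ip6-localnet",
    "ff00::0 ip6-mcastprefix",
    "ff02::1 ip6-allnodes",
    "ff02::2 ip6-allrouters",
]

def insert_missing(lines, classes, myip, host_record):
    """Return lines plus whatever minimal records are still missing."""
    wanted = []
    if "has_ipv4_localhost" not in classes:
        wanted.append("127.0.0.1\tlocalhost")
    if "has_ipv6_localhost" not in classes:
        wanted.append("::1\tlocalhost ip6-localhost ip6-loopback")
    if "has_host_record" not in classes:
        wanted.append(f"{myip}\t{host_record}")
    wanted.extend(IPV6_STANDARD)          # always wanted, as in the policy
    result = list(lines)
    for line in wanted:
        if line not in result:            # only exact duplicates are skipped
            result.append(line)
    return result

start = ["127.0.0.1\tlocalhost"]
fixed = insert_missing(start, {"has_ipv4_localhost"},
                       "192.0.2.10", "web01.example.com web01")
```

Because only identical lines count as already present, a differently formatted copy of one of the IPv6 records would end up duplicated, which is the limitation the FIXME comment in the policy mentions.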
replace_patterns promises are applied, too. Not to normalize the records as in the first pass, but to check that the file contains at least the minimal set of records it should: a record for localhost (IPv4 and IPv6), and a record for the current IP with the FQDN and the hostname in that order. More specifically:
- "^127.0.0.1\t(?!localhost\b)(.*)" matches those lines beginning with "127.0.0.1", followed by a tab, but not followed by an isolated "localhost" string, and replaces them with "127.0.0.1$(const.t)localhost $(match.1)", that is: "127.0.0.1", followed by a tab, followed by "localhost", followed by the rest of the original record.
- "^::1\t(?!localhost\b)(.*)" does the same job but with the IPv6 record for localhost.
- "^$(re_myip)\t(?!$(re_fqhost)\s+$(re_uqhost))(.*)" matches those lines beginning with the IP we specified, followed by a tab, followed by something that is not the FQDN and the unqualified hostname, and replaces them with "$(myip)$(const.t)$(host_record) $(match.1)", that is: the IP we specified, followed by a tab, followed by the FQDN and the unqualified hostname, followed by what the record originally contained.
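The three replacements above can be exercised with a short Python sketch, again using the re module as a stand-in for cfengine's regex engine. The function names and sample values are made up for illustration, and the dots in 127.0.0.1 are escaped here for strictness, which the policy itself does not do.

```python
import re

def fix_localhost_v4(line):
    """Put "localhost" first on a 127.0.0.1 record that lacks it there."""
    return re.sub(r"^127\.0\.0\.1\t(?!localhost\b)(.*)",
                  lambda m: "127.0.0.1\tlocalhost " + m.group(1),
                  line)

def fix_host_record(line, myip, uqhost, domain):
    """Put "FQDN hostname" first on the record for our own IP."""
    fqhost = f"{uqhost}.{domain}"
    pattern = ("^" + re.escape(myip) + r"\t(?!"
               + re.escape(fqhost) + r"\s+" + re.escape(uqhost) + r")(.*)")
    return re.sub(pattern,
                  lambda m: f"{myip}\t{fqhost} {uqhost} {m.group(1)}",
                  line)

v4 = fix_localhost_v4("127.0.0.1\tmybox")
own = fix_host_record("192.0.2.10\twww", "192.0.2.10", "web01", "example.com")
```

Records that already start with the expected names fail the negative lookahead and come back unchanged, which is what makes the promises safe to repeat at every run.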
At the end of this process, the hosts file (or rather, the temporary copy of it cfengine's working on) will have all the records normalized, and the IP address correctly associated with the FQDN and the hostname.
The third pass won't make any further change. Bugs aside, we have now ensured that the hosts file contains what it should.
The next time we'll take a short break from this series to see another policy: location.cf, that implements a number of heuristics to sort-of "geo-locate" the computer running the policy, and configure it accordingly. Until then… take care 😉