rsync, rsyncd, and filters

I had a problem to solve today. I have a bunch of files in a remote rsyncd repository, which I'll call "PuppetConf". Some of them need to be synchronized more often than the others; I'll call these "volatile files", or simply "volatile". The volatile files used to live only under /volatile, but now they need to live in other paths as well, and I needed a clever way to synchronize them all together without long, complex command lines.

After some experimenting with include, exclude, and filter rules, I finally found a way to do that. This is my filter file:

# This should be run with:
# rsync -zav --delete --filter="merge /path/to/this/file" rsyncdserver::PuppetConf /destination/path
# To understand what the patterns in this file mean, see the rsync man
# page, in particular the section FILTER RULES.
# If you don't want to go through all that, please read at least the
# following excerpt.
#        o      if  the pattern starts with a / then it is anchored to a
#               particular spot in the hierarchy of files, otherwise  it
#               is  matched  against  the  end of the pathname.  This is
#               similar to a leading ^  in  regular  expressions.
# [...]
#        o      a '*' matches any non-empty path component (it stops  at
#               slashes).
#        o      use '**' to match anything, including slashes.
# [...]
#        o      if the pattern contains a / (not counting a trailing  /)
#               or a "**", then it is matched against the full pathname,
#               including  any  leading  directories.
# [...]
#        o      a  trailing "dir_name/***" will match both the directory
#               (as if "dir_name/" had been specified) and everything in
#               the  directory (as if "dir_name/**" had been specified).
# [...]
#        Note that, when using the --recursive  (-r)  option  (which  is
#        implied  by  -a),  every  subcomponent of every path is visited
#        from the top down,  so  include/exclude  patterns  get  applied
#        recursively  to  each subcomponent's full name (e.g. to include
#        "/foo/bar/baz" the subcomponents "/foo" and "/foo/bar" must not
#        be  excluded).  [...] One solution is to ask for
#        all directories in the hierarchy to be included by using a sin-
#        gle  rule: "+ */" (put it somewhere before the "- *" rule)

# Scan every directory, top down
+ */

# Transfer the top-level volatile directory, and everything under it
+ /volatile/***

# Transfer the top-level release file
+ /release

# Find every directory named nodes, at any depth, and transfer everything below it
+ nodes/***

# Don't copy anything else
- *

So, if I run the command shown at the top of the filter file, I update just the volatile files. Good enough, but that is still a long command line.
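Before pointing a filter like this at real data, it can help to try it on a throwaway local tree. The following is only a sketch: the directory and file names are invented for the demo, the filter rules are copied from the file above without the comments, and it assumes a local rsync binary plus mktemp are available.

```shell
#!/bin/sh
# Sketch: preview which files the filter rules let through.
# All paths below are made up for this demo.

# Build a small scratch tree with volatile and non-volatile files.
make_demo_tree() {
    root=$1
    mkdir -p "$root/volatile" "$root/manifests" \
             "$root/modules/foo/nodes" "$root/modules/foo/files"
    : > "$root/release"
    : > "$root/volatile/a.txt"
    : > "$root/manifests/site.pp"
    : > "$root/modules/foo/nodes/n1.pp"
    : > "$root/modules/foo/files/x.conf"
}

# Write the rules from the filter file (comments omitted).
write_filter() {
    cat > "$1" <<'EOF'
+ */
+ /volatile/***
+ /release
+ nodes/***
- *
EOF
}

# Dry-run a local copy into an empty directory and print only the
# regular files that would be sent (directories end in "/" and are dropped).
preview_filter() {
    src=$1
    filter=$2
    dst=$(mktemp -d) || return 1
    rsync -a --dry-run --out-format='%n' \
        --filter="merge $filter" "$src/" "$dst/" | grep -v '/$' | sort
    rm -rf "$dst"
}
```

Calling make_demo_tree and write_filter on a temporary directory and then running preview_filter on the result should list only modules/foo/nodes/n1.pp, release, and volatile/a.txt. Without the grep, the output also shows every directory in the tree, because the "+ */" rule includes them all.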

The next step was to slightly change the rsyncd configuration that Claudia made server side. Having this stanza in rsyncd.conf:

 [Volatile]
 path = /puppet
 comment = Volatile files
 read only = true
 transfer logging = true
 filter = merge /etc/rsyncd-filter.conf
 log format = %a %h %o %f %l %b
 log file = /var/log/rsyncd.log

where /etc/rsyncd-filter.conf is the same filter file you see above, shortens the rsync line to this:

rsync -zav rsyncdserver::Volatile /destination/path

and I like it a lot better 🙂

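Note that I can't use the --delete option anymore, or I'd wipe out everything but the volatile files: the server-side filter hides all the other files from the client, so rsync would behave as if they no longer existed on the server. This is not a big deal anyway, since I occasionally have to do a full sync, like: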

rsync -zav --delete rsyncdserver::PuppetConf /destination/path
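To keep the two variants from getting mixed up (in particular, to avoid adding --delete to the filtered module by accident), they could be wrapped in a small shell function. This is a hypothetical helper, not something from my actual setup: the function name puppet_sync and the SERVER, DEST, and RSYNC variables are made up here, with defaults taken from the placeholder values used in this post.

```shell
#!/bin/sh
# Hypothetical wrapper around the two commands above.
# SERVER and DEST default to the placeholder values used in this post;
# set RSYNC=echo to print the arguments instead of running rsync.
SERVER=${SERVER:-rsyncdserver}
DEST=${DEST:-/destination/path}

puppet_sync() {
    case "$1" in
        volatile)
            # Frequent run: filtered module, never with --delete.
            "${RSYNC:-rsync}" -zav "${SERVER}::Volatile" "$DEST"
            ;;
        full)
            # Occasional run: whole module, where --delete is safe.
            "${RSYNC:-rsync}" -zav --delete "${SERVER}::PuppetConf" "$DEST"
            ;;
        *)
            echo "usage: puppet_sync volatile|full" >&2
            return 2
            ;;
    esac
}
```

Running RSYNC=echo puppet_sync full is a cheap way to check what would be executed before trusting the wrapper with a real destination.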

It's so nice to learn new things on the go 🙂

