Exploring Docker overlay networks

Docker In the past months I have made several attempts to explore Docker overlay networks, but there were a few pieces to set up before I could really experiment and… well, let’s say that I have probably approached the problem the wrong way and wasted some time along the way. Not again. I have set aside some time and worked agile enough to do the whole job, from start to finish. Nowadays there is little point in creating overlay networks by hand, except that it’s still a good learning experience. And a learning experience with Docker and networking was exactly what I was after.

When I started exploring multi-host Docker networks, Docker was quite different than it is now. In particular, Docker Swarm didn’t exist yet, and there was a certain amount of manual work required in order to create an overlay network, so that containers located in different hosts can communicate.

Before Swarm, in order to set up an overlay network one needed to:

  • have at least two docker hosts to establish an overlay network;
  • have a supported key/value store available for the docker hosts to sync information;
  • configure the docker hosts to use the key/value store;
  • create an overlay network on one of the docker host; if everything worked well, the network will “propagate” to the other docker hosts that had registered with the key/value store;
  • create named containers on different hosts; then try and ping each other using the names: if everything was done correctly, you would be able to ping the containers through the overlay network.

This looks like simple high-level checklist. I’ll now describe the actual steps needed to get this working, leaving the details of my failuers to the last section of this post.

Overlay networks for the impatient

A word of caution

THIS SETUP IS COMPLETELY INSECURE. Consul interfaces are exposed on the public interface of the host with no authentication. Consul communication is not encrypted. The traffic over the overlay network itself is not encrypted nor protected in any way. Using this set-up in any real-life environment is strongly discouraged.

Hardware set-up

I used three hardware machines, all running Debian Linux “Jessie”; all machines got a dynamic IPv4 address from DHCP

  • murray: 192.168.100.10, running consul in server mode;
  • tyrrell: 192.168.100.13, docker host on a 4.9 Linux kernel installed from backports;
  • brabham: 192.168.100.15, docker host, same configuration as tyrrell.

Running the key-value store

create a directory structure, e.g.:

mkdir consul
cd consul
mkdir bin config data

Download consul and unpack it in the bin subdirectory.

Finally, download the start-consul-master.sh script from my github account into the consul directory you created, make it executable and run it:

chmod a+x start-consul-master.sh
./start-consul-master.sh

If everything goes well, you’ll now have a consul server running on your machine. If not, you may have something to tweak in the script. In that case, please refer to the comments in the script itself.

On murray, the script started consul successfully. These are the messages I got on screen:

bronto@murray:~/Downloads/consul$ ./start-consul-master.sh 
Default route on interface wlan0
The IPv4 address on wlan0 is 192.168.100.10
The data directory for consul is /home/bronto/Downloads/consul/data
The config directory for consul is /home/bronto/Downloads/consul/config
Web UI available at http://192.168.100.10:8500/ui
DNS server available at 192.168.100.10 port 8600 (TCP/UDP)
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v0.8.0'
           Node ID: 'eeaf4121-262a-4799-a175-d7037d20695f'
         Node name: 'consul-master-murray'
        Datacenter: 'dc1'
            Server: true (bootstrap: true)
       Client Addr: 192.168.100.10 (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 192.168.100.10 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: 

==> Log data will now stream in as it occurs:

    2017/04/11 14:53:56 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:192.168.100.10:8300 Address:192.168.100.10:8300}]
    2017/04/11 14:53:56 [INFO] raft: Node at 192.168.100.10:8300 [Follower] entering Follower state (Leader: "")
    2017/04/11 14:53:56 [INFO] serf: EventMemberJoin: consul-master-murray 192.168.100.10
    2017/04/11 14:53:56 [WARN] serf: Failed to re-join any previously known node
    2017/04/11 14:53:56 [INFO] consul: Adding LAN server consul-master-murray (Addr: tcp/192.168.100.10:8300) (DC: dc1)
    2017/04/11 14:53:56 [INFO] serf: EventMemberJoin: consul-master-murray.dc1 192.168.100.10
    2017/04/11 14:53:56 [WARN] serf: Failed to re-join any previously known node
    2017/04/11 14:53:56 [INFO] consul: Handled member-join event for server "consul-master-murray.dc1" in area "wan"
    2017/04/11 14:54:03 [ERR] agent: failed to sync remote state: No cluster leader
    2017/04/11 14:54:06 [WARN] raft: Heartbeat timeout from "" reached, starting election
    2017/04/11 14:54:06 [INFO] raft: Node at 192.168.100.10:8300 [Candidate] entering Candidate state in term 3
    2017/04/11 14:54:06 [INFO] raft: Election won. Tally: 1
    2017/04/11 14:54:06 [INFO] raft: Node at 192.168.100.10:8300 [Leader] entering Leader state
    2017/04/11 14:54:06 [INFO] consul: cluster leadership acquired
    2017/04/11 14:54:06 [INFO] consul: New leader elected: consul-master-murray
    2017/04/11 14:54:07 [INFO] agent: Synced node info

Note that the consul web UI is available (on murray, it was on http://192.168.100.10:8500/ui as you can read from the output of the script). I suggest that you take a look at it while you progress in this procedure; in particular, look at how the objects in the KEY/VALUE section change when the docker hosts join the cluster and when the overlay network is created.

Using the key/value store in docker engine

Since my set-up was not permanent (I just wanted to test out the overlay networks but I don’t mean to have a node always on and exposing the key/value store), I did only a temporary change in Docker Engine’s configuration by stopping the system service and then running the docker daemon by hand. That’s not something you would do in production. If you are interested in making the change permanent, please look at the wonderful article in the “Technical scratchpad” blog.

On both docker hosts I ran:

systemctl stop docker.service

Then on brabham I ran:

/usr/bin/dockerd -H unix:///var/run/docker.sock --cluster-store consul://192.168.100.10:8500 --cluster-advertise=192.168.100.15:0

while on tyrrell I ran:

/usr/bin/dockerd -H unix:///var/run/docker.sock --cluster-store consul://192.168.100.10:8500 --cluster-advertise=192.168.100.13:0

If everything goes well, messages on screen will confirm that you have successully joined consul. E.g.: this was on brabham:

INFO[0001] 2017/04/10 19:55:44 [INFO] serf: EventMemberJoin: brabham 192.168.100.15

This one was also on brabham, letting us know that tyrrell had also joined the family:

INFO[0058] 2017/04/10 19:56:41 [INFO] serf: EventMemberJoin: tyrrell 192.168.100.13

The parameter --cluster-advertise is quite important. If you don’t include it, the creation of the overlay network at the following step will succeed, but containers in different hosts will fail to communicate through the overlay network, which will make the network itself pretty pointless.

Creating the overlay network

On one of the docker hosts, create an overlay network. In my case, I ran the following command on tyrrell to create the network 10.17.0.0/16 to span both hosts, and name it bronto-overlay:

$ docker network create -d overlay --subnet=10.17.0.0/16 bronto-overlay
c4bcd49e4c0268b9fb22c7f68619562c6a030947c8fe5ef2add5abc9446e9415

The output, a long network ID, looks promising. Let’s verify that the network is actually created:

$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
3277557c8f42        bridge              bridge              local
80dfa291f573        bronto-bridged      bridge              local
c4bcd49e4c02        bronto-overlay      overlay             global
dee1b78004ef        host                host                local
f1fb0c7be326        none                null                local

Promising indeed: there is a bronto-overlay network and the scope is global. If the command really succeeded, we should find it on brabham, too:

$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
90221b0a57e1        bridge              bridge              local
c4bcd49e4c02        bronto-overlay      overlay             global
3565b28a327b        docker_gwbridge     bridge              local
f4691f3066cb        host                host                local
86bdd8ab9b90        none                null                local

There it is. Good.

Test that everything works

With the network set up, I should be able to create one container on each host, bound to the overlay network, and they should be able to ping each other. Not only that: they should be able to resolve each other’s names through Docker’s DNS service and despite being on different machines. Let’s see.

On each machine I ran:

docker run --rm -it --network bronto-overlay --name in-$HOSTNAME debian /bin/bash

That will create a container named in-tyrrell on tyrrell and a container named in-brabham on brabham. Their main network interface will appear to be bound to the overlay network. E.g. on in-tyrrell:

root@d86df4662d7e:/# ip addr show dev eth0
68: eth0@if69: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:11:00:03 brd ff:ff:ff:ff:ff:ff
    inet 10.17.0.3/16 scope global eth0
       valid_lft forever preferred_lft forever

We now try to ping the other container:

root@d86df4662d7e:/# ping in-brabham
PING in-brabham (10.17.0.2): 56 data bytes
64 bytes from 10.17.0.2: icmp_seq=0 ttl=64 time=0.857 ms
64 bytes from 10.17.0.2: icmp_seq=1 ttl=64 time=0.618 ms
^C--- in-brabham ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.618/0.738/0.857/0.120 ms
root@d86df4662d7e:/#

Success!

How things really went

Things weren’t so straightforward in reality.

I will use a consul container. Or not.

When I was designing my experiment I thought I would run consul through a consul container in one of the two hosts. I thus spent some time playing with the progrium/consul container, as indicated by Docker’s documentation, until I found out that it wasn’t being updated since two years… You can find the outcast of my experiment with progrium/consul on github.

I thus switched to Docker’s official consul image. It was similar enough to progrium/consul to seem an easy port, and different enough that it made me hit the wall a few times before I got things right with that image, too. With the help of a couple of scripts I could now easily start a server container and a client container. You can find the outcast of my experiments with docker’s consul image on github, too.

Then, I understood that the whole thing was pointless. You don’t need to have a consul client running on the docker host, because the docker engine itself will be the client. And you can’t have the consul server running as a container on one of the docker hosts. In fact, the docker engine itself needs to register at start-up with the consul server, but the consul server won’t be running because docker hasn’t started… ouch!

That wasn’t a complete waste of time anyway. In fact, with the experience acquired in making consul containers work and the scripts that I had already written, it was an easy task to configure a consul server on a separate laptop (murray). The script that I used to run the consul server is also available on github.

The documentation gap

When I started playing with docker, Docker Swarm didn’t exist yet, or was still in a very primitive stage. The only way to make containers on different hosts communicate were overlay networks and there was no automated procedure to create them, therefore Docker’s documentation explained in some details how they should be set up. With the release of Swarm, overlay networks are created automatically by Swarm itself and not much about the “traditional” set up of the overlay networks was left in the documentation. Walking through the “time machine” in the docs web page proved to be problematic, but luckily enough I found this article on the Technical Scratchpad blog that helped me to connect the dots.

Don’t believe in magic

When I initially started the docker engine and bound it to the consul server I left out the --cluster-advertise option. All in all, docker must be smart enough to guess the right interface for that, right?

No, wrong.

When I did leave the option out, everything seemed to work. In particular, when I created the overlay network on tyrrell I could also see it on brabham. And when I created a container on each host, docker network inspect bronto-overlay showed that both containers were registered on the network. Except that they couldn’t communicate. The problem seems to be known, and things started to work only when I added the --cluster-advertise option with a proper value for each host.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s