Hey guys, thanks for your patience. I'm sorry the ride hasn't been as smooth as ...

shykes · on July 22, 2013

I talked to Nick and we looked at what was causing the problems on his server. A few details on the investigation so far.

1. It's not caused by RethinkDB :)

2. It's not caused by CPU, RAM, or I/O load. The server is definitely taking a hit, but it still has spare capacity.

3. There's no apparent issue with the machine itself (a Linode box).

4. It's not a Docker stability issue either. The Docker daemon is happy as a clam, and the containers that do get deployed also run perfectly smoothly.

5. We suspected that maybe Hipache failed to update its routing configuration fast enough. But that turned out to be a red herring too (unsurprising, since Hipache handles a vastly larger load at dotCloud).

Our best explanation so far: Docker's default configuration is to assign itself a /24 IP range. This only allows for 254 usable addresses, and thus at most 254 containers. (Note: the reasoning behind this default was to minimize the footprint on the host system. In retrospect it seems like a silly decision; we'll change it to something larger.)

Nick then reconfigured Docker's interface to a /16, which allows up to 65,534 addresses and therefore many more containers, and restarted the Docker daemon. This causes Docker to allocate IPs to new containers from the new, expanded range while avoiding conflicts with the IPs of pre-existing containers. That's where the problem occurs: Docker is doing this correctly, but very slowly, creating a 30-second delay between container creation and successful IP allocation. We're still investigating exactly why.
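To make the address arithmetic and the "avoid conflicts with existing containers" step concrete, here's a minimal Python sketch using the standard `ipaddress` module. `usable_hosts` and `allocate_ip` are hypothetical helpers for illustration only, not Docker's code (Docker is written in Go), and the first-free-address strategy is an assumption. It doesn't reproduce or explain the 30-second delay.

```python
import ipaddress


def usable_hosts(cidr: str) -> int:
    """Number of assignable host addresses in an IPv4 subnet
    (excludes the network and broadcast addresses)."""
    net = ipaddress.ip_network(cidr)
    return net.num_addresses - 2


def allocate_ip(cidr: str, in_use: set) -> ipaddress.IPv4Address:
    """Hypothetical allocator: return the first host address in `cidr`
    that is not already taken (e.g. by the bridge or by containers that
    were started before the range was enlarged)."""
    net = ipaddress.ip_network(cidr)
    for host in net.hosts():  # lazily yields every usable host address
        if host not in in_use:  # set lookup, so each check is cheap
            return host
    raise RuntimeError(f"no free addresses left in {cidr}")


# Usage: a /24 versus a /16, then allocating around existing addresses.
print(usable_hosts("172.17.0.0/24"))  # 254
print(usable_hosts("172.17.0.0/16"))  # 65534
taken = {ipaddress.ip_address("10.0.0.1"), ipaddress.ip_address("10.0.0.2")}
print(allocate_ip("10.0.0.0/16", taken))  # 10.0.0.3
```

Scanning from the start of the range and skipping taken addresses is the simplest strategy that guarantees no collisions; a real allocator might instead track the last address handed out or keep an explicit free list.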

shykes · on July 22, 2013

I just pushed a fix: https://github.com/dotcloud/docker/pull/1265