
Uh. I'm skeptical about Swarm being something that "works today". I really want to use it and spent some time experimenting, but failed to get it working. It could work for a toy webapp project, but I believe it's insufficient for anything complicated. There are a lot of things one would expect to have that are still unsolved. Or maybe I'm just unaware of a solution (or the solution didn't fit my personal requirements).

1. Networking options are really limited.

Built-in ingress loses the originating IP address, cannot bind ports only on specific nodes (17.06 added the --network=host option, which may partially fix this, but I haven't found much documentation), cannot prefer same-node containers (which is important if you have many microservices, as the latency adds up quickly), and more.

For ingress, if one's lucky enough to have only HTTP traffic (I don't), they can use something like Traefik. But they'll need to run LBs on manager nodes, which isn't really a smart thing to do, as Swarm is said to be sensitive to overloaded managers. Or one can just go for external LBs, like CloudFlare or whatever Amazon/Google/Microsoft offers.
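
For the HTTP case, a minimal sketch of what that Traefik-on-managers setup might look like (Traefik 1.x flags; the network name and image tag are my own choices, not gospel):

    # Traefik needs a manager's Docker socket to watch Swarm services,
    # which is exactly why it ends up pinned to manager nodes.
    docker network create --driver overlay traefik-net
    docker service create --name traefik \
      --constraint 'node.role == manager' \
      --publish 80:80 --publish 8080:8080 \
      --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
      --network traefik-net \
      traefik:1.3 --docker --docker.swarmmode --docker.watch --web

Backend services then join traefik-net and advertise themselves with service labels like `traefik.port=80`.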

If one needs raw TCP, UDP or other IP traffic, I think they'd better completely ignore the built-in service discovery and LBs and go for something external, like etcd+haproxy/nginx, outside of the Swarm (on the nodes' host OS). It has to be a completely manual setup (okay, I mean Ansible/Puppet/Chef/Salt/etc.). While it's possible to run LBs with Docker (as non-Swarm containers, if the Swarm network is attachable), etcd is just not designed to auto-deploy on Swarm: its design simply doesn't allow one to run `docker service create --name etcd my/custom/etcd && docker service scale etcd=5`; you'll need to set up every node by hand. I believe Consul is better in this regard, but I haven't tried it yet.
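
To illustrate why the naive scale-up can't work: every etcd member has to be started knowing the full peer list, so each node needs its own distinct invocation (host names here are made up):

    # On node1; node2/node3 get the same command with their own --name
    # and advertise URLs. Identical replicas of one service can't express this.
    etcd --name node1 \
      --initial-advertise-peer-urls http://node1:2380 \
      --listen-peer-urls http://node1:2380 \
      --listen-client-urls http://node1:2379,http://127.0.0.1:2379 \
      --advertise-client-urls http://node1:2379 \
      --initial-cluster node1=http://node1:2380,node2=http://node2:2380,node3=http://node3:2380 \
      --initial-cluster-state new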

2. I haven't figured out how to have persistent storage that follows the containers. If a node dies, Swarm will spawn new containers on other nodes, but there's no way to get even a slightly dated snapshot of the data. There was something called Flocker that looked like a solution, but it's essentially dead (despite the revival attempts). GlusterFS is an option I know about, but it's really sensitive to latency.

Databases are even more tricky, unless one's bold enough to use something fancy like CockroachDB (I had enough subtle issues with RethinkDB to be wary of bleeding-edge stuff). Maybe I'm just too stupid, but I failed to grok the dynamic PostgreSQL multi-master BDR setup, so my DB is still a SPOF with WAL streaming replication and manually activated failovers.
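
For the curious, "manually activated failover" amounts to roughly this on a 9.x streaming-replication standby (paths and connection details are illustrative):

    # recovery.conf on the standby contains something like:
    #   standby_mode = 'on'
    #   primary_conninfo = 'host=db-master port=5432 user=replicator'
    #   trigger_file = '/tmp/postgresql.trigger'
    # When the master dies, a human promotes the standby:
    pg_ctl promote -D /var/lib/postgresql/9.6/main
    # ...or simply: touch /tmp/postgresql.trigger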

3. Secrets look like a nice addition, but they're best avoided. They're immutable, so you have to recreate the container to switch them. If you have any services with many user-initiated long-lived connections (e.g. IRC, XMPP, WebSockets or media streaming), this makes secrets basically unusable for anything that has to be rotated, like TLS certificates. Unless you can drop all your users every now and then.
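
For completeness, here's what rotation looks like (names are made up); the `service update` is precisely what recreates the tasks and kills those connections:

    # Secrets are immutable, so "rotating" means creating a new one...
    docker secret create site-cert-v2 ./new-cert.pem
    # ...and swapping it in, which restarts every task of the service:
    docker service update \
      --secret-rm site-cert-v1 \
      --secret-add source=site-cert-v2,target=cert.pem \
      my_web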

4. Logging was quite messy, but they've sorted it out with 17.06.

(As for K8s: it solves most of these issues, but I got my share of issues with Rancher, so I'm really wary of having any complexity in the core. There's already a beast called the Linux kernel down there, and $deity have mercy on those who have to debug its oopses. If a behemoth (I mean, K8s) decides to misbehave, I expect to have a really bad time trying to keep things afloat. Even Swarm mode is a fairly complex black-box binary, but at least I can try debugging it.)




Regarding persistent storage following containers, check out REX-Ray (https://rexray.readthedocs.io/en/stable/); it works with various storage backends.
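
Roughly, and taking the EBS backend as an example (other drivers follow the same pattern; the keys and names are placeholders):

    # Install the plugin on every node, then reference it as a volume driver;
    # the volume lives in EBS, so it can reattach wherever the task lands.
    docker plugin install rexray/ebs EBS_ACCESSKEY=... EBS_SECRETKEY=...
    docker service create --name pg \
      --mount type=volume,source=pgdata,target=/var/lib/postgresql/data,volume-driver=rexray/ebs \
      postgres:9.6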


1. You can use host-mode port binding (the equivalent of `docker run -p`, as opposed to the routing mesh); you could also use macvlan/ipvlan to do this.
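
E.g. (the `edge` node label is made up; this also preserves the client IP, since traffic skips the mesh):

    # Bind on the nodes that actually run a task, like `docker run -p`:
    docker service create --name web \
      --publish mode=host,target=80,published=80 \
      --constraint 'node.labels.edge == true' \
      nginx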

2. Indeed, this is tricky right now. One way to sort of fake it is to do something like `--mount type=volume,source='important_data{{.Task.Slot}}'`... I'm not sure I'd call this a recommendation, but it's worth playing with. I'm also not sure automatic failover of databases is truly a thing; it's just not that easy (outside of the storage aspect).
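
Spelled out (image and target path are placeholders), task 1 always gets important_data1, task 2 gets important_data2, and so on:

    docker service create --name db \
      --mount type=volume,source='important_data{{.Task.Slot}}',target=/data \
      my/db-image

Paired with a shared-storage volume driver (e.g. the REX-Ray suggestion above), each slot can then reattach its own volume wherever the task gets rescheduled.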





