
I'm running a small dockerized Exhibitor/Zookeeper cluster (5 nodes, several hundred clients), and it's extremely straightforward. Etcd isn't what we'd consider production-grade yet.


Having worked with etcd in production for the last few months, I have to agree. The CoreOS stack needs some more time to marinate.


Thanks for this comment. Glad to know I made the right choice.

I'm not saying etcd won't ever overshadow Zookeeper; it probably will, given the momentum behind it. But as an ops guy, I wasn't willing to bet production application service discovery on it.


My distaste for the Go community is pretty well-established in these parts; I think worse-is-better is screwing us all, and etcd seems to me to be the worse-is-better Zookeeper. For things that don't matter, sure, worse-is-better your life away; a Rails app can be whatever you want. But the infrastructure I manage had better be bulletproof. I won't say etcd will never be competitive, but without some significant changes I don't see it getting my vote, and those changes are largely around the parts of the feature set that etcd doesn't support. At which point... why use it, anyway?


What particular issues have you run into with etcd and/or CoreOS?


Lots of split brains. Serious bugs making it through the alpha and beta channels into stable (and our boxes auto-updating, only to become useless). Fleet units dying purely due to problems with fleetd/systemd. A particularly painful one was an Akka deployment on top of CoreOS where a sidekick unit would fail to start because fleet hadn't actually copied the unit file to the remote host. It only happened with sidekicks, but due to how we ran our networking, it effectively killed the application. Almost every redeploy required manually getting fleet to copy the unit over.


Just to add on: I've had fleet misreport unit status and btrfs report a lack of disk space for no apparent reason. There's also the inability to restart individual failed units that are part of a global unit.

Also there was that one time they changed how cloud-config was parsed, so if "#cloud-config" wasn't on the very first line with no preceding spaces, initialisation would fail. That was when I switched the reboot strategy to manual.
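For anyone bitten by the same thing: a minimal pre-flight check, sketched in Python, that rejects user data unless "#cloud-config" is the exact first line. The function name and the strict exact-match rule are my own assumptions for illustration, not how CoreOS's parser is actually implemented.

```python
import sys


def has_cloud_config_header(text: str) -> bool:
    """Return True if the very first line is exactly '#cloud-config'.

    Assumption: the strictest reading of the rule, so leading whitespace,
    a leading blank line, or extra characters on line one all fail.
    """
    first_line = text.split("\n", 1)[0]
    return first_line == "#cloud-config"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_header.py user_data.yml
    with open(sys.argv[1], encoding="utf-8") as f:
        ok = has_cloud_config_header(f.read())
    print("ok" if ok else "missing or misplaced #cloud-config header")
    sys.exit(0 if ok else 1)
```

Running something like this in CI before pushing user data would have caught the indented-header case before a box rebooted into a broken config.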


Btrfs is no longer the default for CoreOS for this reason. Overlayfs doesn't have this issue.


Oh man, yes. I'd blocked out all my scarring memories of btrfs biting me in the ass.


Matches up pretty well with my experience, too. I do not trust fleet as far as I can throw it.


Yeah, the whole project was something of a disaster. Eventually things stabilized a bit but every few weeks etcd or fleetd would throw a curveball and I'd lose a day of time chasing down the problem.



