Ask HN: Has anyone had negative experiences with zeromq?

phintjens · on April 9, 2011

Before I opened the 0MQ Pandora box I'd given up coding and was happily writing my autobiography. 0MQ seemed like a pleasant way to spend a weekend. But before I knew it, I'd lost control. Days, weeks, months have passed now, all I think about are more subtle and perfect messaging patterns. They flash before my eyes. Weird names and topologies. It doesn't get better, but worse. Soon I'll be coding all nighters, my wife will leave me, my kids will forget me, and all I'll be doing is programming, motherfucker.

Seriously, 0MQ has made network programming fun (again) in a bad, addictive, way. Any design I can think of turns into real working code in a few hours, sometimes days. And I'm using C, a language that isn't normally fun to work in.

Right now, it's multithreaded clients and servers for resilient shared distributed hash maps. Tomorrow, network-wide logging. After that, another message broker. And so on.

Yes, it's a negative experience. I'd like my old lazy life back.

For the love of god, don't try it.

jrussbowman · on April 9, 2011

I've already been there, working with all the APIs I use on unscatter.com. Fortunately, with 2 kids I have to stop, though I have had some very tired weeks when I can't sleep because I am up all night coding.

m0th87 · on April 10, 2011

I have, and it amazes me that everyone speaks so positively of it. I'm not going to argue it's a bad library, or that it's not worth investing in, but it's not without its problems. And I certainly don't think it's applicable to all distributed computation use cases.

This post on SO outlines some of the troubles I've had: http://stackoverflow.com/questions/4870814/is-zeromq-product...

A couple of other issues. First, it's really easy to game a ZeroMQ socket. Send it the wrong data, and your application will just fall flat on its face. So you can't run it anywhere that has untrusted computers (e.g. over the Internet).

Another issue I've had is a race condition that occurs when you call recv() before anything is in the queue. The method will continue to block even after it receives something. This is a big deal because it requires some workarounds with bad performance. But I wasn't able to get the jzmq dev team to reproduce it, so it must be something restricted to either OS X or just my system.

FWIW, I think most of the issues are restricted to jzmq, because there's a good deal more complexity running around in that project to overcome the Java <-> C bridge.

The reason I continue to use it anyway is because:

1) It's absurdly useful when it works.

2) The dev team is very responsive.

3) Bugs do get fixed if they can reproduce it. I already have had one issue resolved: https://github.com/zeromq/jzmq/issues/closed#issue/31

As for Tornado, I am in love with that technology. Rather than a framework, it's more like a set of libraries for HTTP communication. That has huge implications, and it feels much more pleasant for me to work with than, say, Django.

jrussbowman · on April 10, 2011

Thank you, this is the kind of information I was looking for

kordless · on April 10, 2011

Loggly uses 0MQ extensively and we'd be happy to sit down with you and chat about it. We were the ones that paid the 0MQ guys to bake disk persistence back into the new version. Also, as someone else mentioned, Zed knows it pretty well and he's awesome about taking time to teach what he knows to others.

jrussbowman · on April 10, 2011

I may take you up on that at some point. I tend to learn by jumping in the deep end, so I'll likely write code first, but if I run into any questions I'll keep your offer to chat about it in mind.

Everything I'm doing is open source, I'm planning on using the Apache 2.0 license. The Github repo (without any code yet) is here - https://github.com/joerussbowman/Scale0

The README gives an overview of what I'm attempting to accomplish.

lanstein · on April 10, 2011

Kord, you are why I clicked on this :)

obiterdictum · on April 10, 2011

I can't say I have a lot of negative experience, but after evaluating it, I've come to the conclusion that it's not the right tool for the job for us. We develop trading systems and I wanted a decent messaging framework for internal non-speed-critical communication between apps. Disclaimer: I had limited time to evaluate it, so I may have some misconceptions about ZMQ, you have been warned.

1. Extensive use of asserts in release builds terrifies me. They're meant to check for conditions that shouldn't happen, but I see users complaining about their apps aborting with assertion failures on the ZMQ mailing lists, and it comes up fairly frequently in Google. There are a fair number of asserts on error codes returned from system calls. I don't want a critical process to crash because I've used the library the wrong way in a completely different part of the application.

2. Only in 2.1 did they fix the problem where some messages would not be flushed, and would be lost, if you terminated the process too early. This seems like a fairly common bug for younger projects. The recommended workaround is... calling "sleep" before you exit, which is one of the deadly sins of multithreaded programming! This and the point above convince me that ZMQ isn't mature enough for me to be comfortable with.

3. Transparent reconnection is good, but some of our applications need to quickly detect that other nodes in the system are missing, which forces me to implement an out-of-band heartbeat mechanism.

4. Threading model seems a bit awkward to me (last I checked). First of all, let me state that I personally believe that a library starting threads behind your back is a Bad Thing (unless it's a framework). ZMQ uses a sender thread that you queue your messages into, yet it forces you to dispatch your receive loop by either blocking read or zmq_poll. If it already starts threads by itself, why not provide a callback?

5. Not really a problem, but a missing feature: no way to demultiplex messages from a stream of messages, so you have to implement it yourself. You can subscribe to a subset of messages on a socket, but can't subscribe to multiple subsets from a single socket.
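
For point 3, the bookkeeping half of an application-level heartbeat can be sketched roughly as below. This is plain Python with no 0MQ calls, and it assumes each peer periodically sends some message that the receiver records; `PeerMonitor`, `beat` and `dead_peers` are hypothetical names, not part of any 0MQ API.

```python
import time


class PeerMonitor:
    """Tracks when each peer was last heard from (hypothetical helper)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout  # seconds of silence before a peer counts as missing
        self.clock = clock      # injectable so the logic can be exercised without sleeping
        self.last_seen = {}

    def beat(self, peer_id):
        # Call whenever a heartbeat (or any other message) arrives from peer_id.
        self.last_seen[peer_id] = self.clock()

    def dead_peers(self):
        # Peers whose most recent message is older than the timeout.
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout)
```

The receive loop would call `beat()` for every incoming message and check `dead_peers()` on a timer. The sending side and the choice of timeout are left out, since they depend on the application.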

phintjens · on April 10, 2011

These are good points. Let me answer them.

1. The 0MQ devs originally got asserts backwards, using them to validate external input (e.g. on sockets) instead of internal consistency. We've been fixing this for a year or two now, and it's pretty good. You'll get assertion failures if you e.g. use sockets from multiple threads. Not so much if you pass bad stuff onto sockets.

2. 2.1 was a great step forwards, and the use of "sleep" was in toy examples. Real networking apps tend to run forever, so this message loss at exit wasn't a big deal. You're right that the product is still young.

3. Totally agreed, this lack of peer presence detection is annoying, and the source of some debate on the lists.

4. Threading model works fine for me, I've used it extensively. A usable reactor is a hundred lines of code, no more. See the libzapi zloop reactor, in C, for example.

5. Demultiplexing sounds like useful functionality but should probably sit above sockets.
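
As a rough illustration of point 5, a demultiplexer that sits above the socket could look something like this. It is a minimal sketch in plain Python that assumes messages arrive as bytes carrying a topic prefix; `Demux`, `subscribe` and `dispatch` are hypothetical names, and nothing here touches 0MQ itself.

```python
from collections import defaultdict


class Demux:
    """Routes each message to the handlers registered for a matching prefix."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, prefix, handler):
        # Register a callable to receive every message starting with prefix.
        self.handlers[prefix].append(handler)

    def dispatch(self, message):
        # Call every handler whose prefix matches; return how many were called.
        called = 0
        for prefix, handlers in self.handlers.items():
            if message.startswith(prefix):
                for handler in handlers:
                    handler(message)
                    called += 1
        return called
```

In a receive loop, each message read from the socket would be handed to `dispatch()`, so one socket can feed several independent consumers.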

arto · on April 10, 2011

In our (http://dydra.com) case, ZeroMQ's misuse of assertions proved to be the deal breaker.

We evaluated using ZeroMQ for inter-shard communication in our distributed database engine, and while ZeroMQ had a lot going for it, in the end everyone on the team was so frustrated with attempting to reproduce corner case assertions and debugging ZeroMQ internals that we decided to bid it good riddance.

We were working towards a deadline, and ZeroMQ in the end proved counterproductive towards that goal; during the time we used it, we ran into some half a dozen different ZeroMQ asserts, some of them very difficult to reproduce, and of which I believe we managed to solve or work around only two or three.

Some of the problem may be in expectations. If you go in expecting it to be a mature and solid black-box solution (as its versioning might seem to indicate), you may be expecting too much. If you mentally prefix a '0.' to the version number and think of it as fast-evolving alpha software, you'll be happier.

I still believe the concepts behind ZeroMQ to be viable and laudable ones. We may reevaluate it in the future, but not anytime soon.

cpeterso · on April 10, 2011

Those sound like pretty serious deficiencies for a library that purports to be a platform for serious applications.

I use tons of debug asserts in my code, but I'm slowly coming to the conclusion that asserts enable lazy coding. Most asserts should probably be replaced with proper logging and error codes.
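
A small sketch of what that replacement might look like, under some loud assumptions: the wire format (a 4-byte big-endian length followed by the payload), `parse_frame` and `FrameError` are all made up for illustration and have nothing to do with 0MQ's actual framing. Python signals errors with exceptions rather than return codes, so an exception stands in for the error code here.

```python
import logging
import struct

log = logging.getLogger("framing")


class FrameError(Exception):
    """Raised for malformed input from a peer (hypothetical error type)."""


def parse_frame(data):
    # Hypothetical wire format: 4-byte big-endian length, then the payload.
    # Bad input is logged and reported to the caller instead of asserted on.
    if len(data) < 4:
        log.warning("short frame header: %d bytes", len(data))
        raise FrameError("header too short")
    (length,) = struct.unpack(">I", data[:4])
    payload = data[4:]
    if len(payload) != length:
        log.warning("length mismatch: header says %d, got %d", length, len(payload))
        raise FrameError("length mismatch")
    return payload
```

The caller can then drop the offending connection and carry on, rather than having the whole process abort.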

jrussbowman · on April 10, 2011

Interesting post. Thank you.

tern · on April 9, 2011

ZeroMQ is used in the lubyk (formerly rubyk) project: http://lubyk.org/en

zedshaw · on April 9, 2011

Ooooh, that looks sexay. I will play with this now.

tern · on April 10, 2011

Also check out LuaAV, which is more mature and has some differences of philosophy: http://lua-av.mat.ucsb.edu/blog/

timf · on April 9, 2011

> "using Python Tornado and it to form an http caching reverse proxy"

It sounds like you should investigate http://mongrel2.org/home

jrussbowman · on April 9, 2011

I have. And honestly, if I were in "build a product and get something released because I'm building a business" mode, I think I'd be using it. It's a pretty good fit for what I was looking for when I started.

Right now, though, I'm treating this as a learning experience with zeromq, and I'm also seeing how I can build something that can be used to scale an http application beyond a single datacenter/cloud and more. I actually find that pretty exciting, and since I have young children and a good paying job, I'm still treating the product I'm working on as a hobby rather than a business.

zedshaw · on April 9, 2011

Yes, definitely do this. Don't let people try to convince you that you shouldn't reinvent the wheel. Typically they just have some wheel they've reinvented that they want you to use. Instead, you should implement as many things as you can to learn how they work, and then use this knowledge to select tools and avoid bullshit and marketing choices.

And who knows, maybe you'll do something better. That's progress.

Also check out gevent and eventlet to see the differences with those systems as well. I just submitted a patch to eventlet to give it better zeromq support.

jrussbowman · on April 9, 2011

And that's coming from the guy who wrote the mongrel2 wheel :)

jrussbowman · on April 9, 2011

Actually, now you have me reconsidering even using tornado as the http server. I've got more to research and think about.

kqueue · on April 9, 2011

+1. I learn the most from reinventing wheels. And I reinvent them on purpose.

j2d2j2d2 · on April 10, 2011

You might like my Tornado-inspired mongrel2 handler, called Brubeck. It uses eventlet for concurrency too.

https://github.com/j2labs/brubeck

chuhnk · on April 9, 2011

I think Zed Shaw would also vouch for the brilliance of 0mq. He uses it in many places, including mongrel2. The one thing he did mention in his PyCon presentation was not to expose it to the internet, as there are some assertions in the code which cause it to blow up on protocol errors.

pshc · on April 9, 2011

I'm kind of in the same boat. I'm evaluating it right now to see if it'd be suitable for iOS<-->server comms, but it seems more like something you'd use behind the server gateway.

Thing is 0MQ gives you transparent auto-reconnection--but I want to indicate with a spinner when that's happening--and it makes request-reply synchronous--but I already do everything asynchronously in the client anyway. Hmm.

docmarionum1 · on April 9, 2011

I've used it and it works great. The only problem I remember encountering was getting it to work with a virtualenv, but that was probably just inexperience on my part.

kemiller · on April 9, 2011

It sure seems amazing for what it is. I do wish people would stop comparing it to message brokers -- they're solving entirely different problems as far as I can tell.