I’ve had the privilege to interact directly with Willy (the main developer for many years, still the project lead) on the mailing list and in person at the conference, and even though I’ve never paid a dime, the interaction has been the best open-source experience I’ve ever had. Willy routinely writes multi-paragraph responses on the mailing list to my harebrained suggestions for how HAProxy could be better for my company’s rather unique needs.
I often feel bad that I cannot do more for the project, because it is so well thought-through and delivered.
The software cuts no corners and delivers at a fraction of the TCO of the competition (limited by my opinion and experience, of course). For instance, just last week, I spun up a rate-limiting solution in half a morning that mitigated some annoying proxy bots instantly (and is flexible enough to automatically block offenders without affecting legitimate users).
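For anyone curious what that kind of setup looks like, it can be sketched in a few lines of config; the table size, window, and threshold below are invented for illustration:

    frontend fe_web
        bind :80
        # track each client IP and measure its HTTP request rate over a 10s window
        stick-table type ip size 100k expire 30s store http_req_rate(10s)
        http-request track-sc0 src
        # block offenders above 20 requests per 10s; legitimate users stay well under this
        http-request deny deny_status 429 if { sc_http_req_rate(0) gt 20 }
        default_backend be_app

    backend be_app
        server app1 192.0.2.10:8080 check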
The stats, DNS support, integrations, and many other features are second to none among load balancers.
It’s an impressive and refreshing project. We could all learn something from HAProxy and the team.
I asked for a feature on GitHub (fetching SHA-2 fingerprint of client certificates, as opposed to SHA-1) and it was handled in one day. That was the feature that pulled me away from nginx (which still only provides SHA-1 via $ssl_client_fingerprint btw.)
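If I remember right, what came out of it is the sha2 converter applied to the DER form of the client certificate; something like the following (the header name here is just an illustrative choice):

    # expose a SHA-256 fingerprint of the client certificate to the backend
    http-request set-header X-SSL-Client-SHA256 %[ssl_c_der,sha2(256),hex]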
As a load balancer, NGINX is subpar in almost every feature comparison, especially at the open-source (free to use) tier.
HAProxy gives you the following must-haves for load balancing (in my opinion) that NGINX does not, at least not easily (a rough config sketch follows the list):
1) An HTML (or JSON) stats page that precisely and completely tells you what’s going on at a high level. A visit to this page during outages is often all that’s required.
2) Support for DNS (and other) discovery mechanisms in a flexible way (this is a paid feature in NGINX).
3) Active health checks (also a paid feature in NGINX).
4) The ACL system, while somewhat difficult to learn, is amazingly powerful.
5) Flexible L7 retries are brilliant.
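To make items 1, 3, 4, and 5 concrete, here is a rough sketch (server names, addresses, and paths are invented):

    # item 1: the built-in stats page
    listen stats
        bind :8404
        stats enable
        stats uri /stats

    frontend fe_main
        bind :80
        # item 4: ACLs to classify and route requests
        acl is_api path_beg /api/
        use_backend be_api if is_api
        default_backend be_app

    backend be_api
        # item 3: active health checks against each server
        option httpchk GET /healthz
        # item 5: L7 retries, re-dispatching retryable failures to another server
        retries 3
        retry-on all-retryable-errors
        server api1 192.0.2.11:8080 check
        server api2 192.0.2.12:8080 check

    backend be_app
        server app1 192.0.2.21:8080 check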
We replaced NGINX with HAProxy and eliminated a whole class of bugs, micro-outages, and annoyances just by following HAProxy’s best practices.
I still use NGINX when I need a static web file server, though. :)
This is actually what I often recommend and often encounter in the field: haproxy for LB, varnish serving as a smart cache, and nginx for the applications plus static file serving.
All 3 components are free, combine extremely well because they've grown together, and are extremely efficient. This is important in virtualized or containerized environments where you want to save resources to minimize response time and leave the CPU for the applications.
Of course each of them can do a little bit of the other ones' job. This is fine, it allows easier initial deployments, but as your site grows, whichever you initially start with, you'll always end up installing the two other ones to constitute the most robust stack ever. And it's easy to insert one next to the others without having to break everything, which further adds to the fun.
I switched reddit from nginx to haproxy because nginx had a pathological problem where it would send 90% of the load to the first app server listed when there was a burst of requests (and 9% to the next, and .9% to the next and so on, leaving almost every app server idle).
I'm sure they fixed the problem by now, but it was not a good look for a load balancer.
HAProxy on the other hand has never failed me, and Willy (the creator) is super responsive if you have questions or need features.
I see a lot of people preferring Nginx, but I can't understand why they are so fascinated by it and try to squeeze it in where it has no place. Nginx is just a web server with some load-balancing functionality; it is subpar compared with HAProxy.
I've never seen people ask why you are using HAProxy instead of Apache, yet Nginx's load-balancing functionality is closer to Apache's than to HAProxy's.
I think people choose nginx because it does everything decently, even if it can't beat HAProxy for load balancing or Varnish for caching (at least the FOSS nginx). I run a small-to-medium size application with multiple upstreams (local web app tier, remote asset servers) and some basic vhost and caching use cases, e.g. caching specific paths with different cache keys than others, different cache behaviours and lifetimes, request rate limiting, some local file serving. nginx does everything I need, without any additional modules.
I get close enough to zero-downtime deployments by using multiple backends and proxy_next_upstream, and haven't needed active health checks for the upstreams yet. I have JSON access logs, so my dashboard of vhost activity is in Kibana now. There are a few juicy features missing (cache PURGE, proxy request coalescing, an extended stub status endpoint, a JSON error log). Not trying to fanboy, but I wouldn't introduce additional complexity into my stack by running haproxy + varnish + nginx just to get the best of each tool, not at my current scale.
I chose it because I have a lot of experience with Nginx and none with HAProxy. So I set it up as an LB, and it worked for my needs (at least from what I can tell).
One good reason is that HAProxy doesn't paywall a bunch of the baseline features. I'm still surprised that NGINX doesn't come with active health checks in the free version; that's a baseline feature for a load balancer in a world that treats computers as cattle.
HAProxy is such a great example of software that does what you expect, generally.
> HAProxy is such a great example of software that does what you expect, generally.
It's also rock solid and battle-tested. I don't think I've ever observed a crash. It's typically one of those tools that, once you configure it correctly, works for years without issues.
It got a lot better quickly, but running with threads was crashy when using a lot of them. I don't remember crashes in single-threaded/forking mode, though, and I ended up with a forking config when I was setting something up at my last job.
Totally agree. HAProxy is an incredible piece of software. It is both extremely fast and very versatile, which is a rare combination. The 2.2 release checks off almost every item on my HAProxy wishlist, which makes me very happy.
To all the developers of such a great piece of software, I offer you my gratitude.
HAProxy is such a nice piece of software: sensible configuration, very stable and versatile. And, one thing that is highly appreciated: I've never seen it do something I wasn't expecting it to. That quality of minimal surprise in its operation doesn't go unnoticed.
There were some edges that were still a bit rough, at least in 1.8: in particular, the way it handles reloading state from the state file, and the interaction between that state and the configuration file.
I don't remember the exact details, but the allowed formats of some identifiers differed, and it didn't do "the obvious thing" when the configuration and state file contained different sets of definitions, IIRC. This was consistent, but not very well documented.
It works fine for us now, so we like it, of course, but it took some production incidents to figure out how it all worked. (We did test before shipping, of course, but these were hard-to-predict edge cases.)
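For context, the mechanism in question is the server-state file, wired up roughly like this (paths are placeholders):

    global
        server-state-file /var/lib/haproxy/server-state

    defaults
        load-server-state-from-file global

    # before a reload, dump the running state so the new process can pick it up:
    #   echo "show servers state" | socat stdio /var/run/haproxy.sock > /var/lib/haproxy/server-state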
I love HAProxy, but one thing I'm confused about is why the http-tunnel feature was removed in this release (and deprecated in earlier 2.x releases). http-tunnel allowed you to start a session with an HTTP request/response, then keep the socket to the backend alive without further inspection of the protocol.
This is useful for things like RTSP, where you kick things off with HTTP but then stream lower-level TCP content over the same socket. There are also lots of other custom protocols that benefit from this type of setup, including one that I'm working on wrangling HAProxy into supporting now.
Does anyone know if there's some replacement way to handle this in HAProxy that I'm overlooking?
> why the http-tunnel feature was removed in this release
From the haproxy 2.0 documentation:
> This mode should not be used as it creates lots of trouble with logging and HTTP processing. And because it cannot work in HTTP/2, this option is deprecated and it is only supported on legacy HTTP frontends. In HTX, it is ignored and a warning is emitted during HAProxy startup.
As for a way to handle this, I believe that if you are using HTTP CONNECT or a WebSocket upgrade (Connection: Upgrade), then haproxy will detect that it is a tunnel and handle it correctly. If that's not the case, you might be able to use haproxy in tcp mode.
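For an RTSP-style service, a minimal tcp-mode sketch might look like this (port and addresses are placeholders):

    frontend fe_rtsp
        mode tcp
        bind :554
        default_backend be_rtsp

    backend be_rtsp
        mode tcp
        # no HTTP parsing at all; haproxy just relays bytes once connected
        server media1 192.0.2.20:554 check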
I understand the nostalgia and liking the retro look, but the site is completely unreadable on mobile. There is absolutely no redeeming quality in that.
Claiming that an unreadable version of a site is better than a readable one is simply wrong.
In my opinion a site like this working perfectly on mobile is in the 'who cares' category. Almost nobody is going to haproxy.org on their smartphone with any intention of actually doing anything with the software. Almost nobody is installing or configuring haproxy from mobile.
It's information dense, which is much appreciated on desktop. The .com looks like every other generic 'look at our product!' site, and to actually do anything you have to sort through 5 different dropdowns and other UI items designed to grab your attention.
When I have to install or configure the software I want the .org, 100%.
> Almost nobody is going to haproxy.org on their smartphone with any intention of actually doing anything with the software.
This is where you get it entirely wrong. I read this news and I, as an extensive nginx/traffic user, wanted to check out haproxy to understand if it was worth a shot. The .org page is plagued with general usability and readability problems to the point that it's practically unreadable when compared with the page served through the .com domain. There is no way around it.
You don't fix problems by turning a blind eye and playing the denial card. More importantly, this sort of technical snafu helps form the public image of the product, so this sort of poor performance reflects poorly on the product.
If you were actually serious about it you would just mentally note it and revisit when you were on a more capable device. Perusing for replacement infrastructure software via mobile is just a casual thing, and foolish if you're actually trying to get anything done.
Nobody is swapping out any tech via their mobile browser impression. All this is pretty complicated software and you're going to want to do a lot of reading/inspection before making decisions like that. That is not done via a 5" display.
I stand by my view that there is no problem to fix here. The site is clear and understandable to anyone who seriously plans on using it or is already using it.
What's your point? Are you arguing that because you imagine someone doesn't have a smartphone, it's OK to fool ourselves into believing that no one has one?
Because mobile has been a basic requirement and competency for, say, the last decade.
basic requirement and competency? Are you the CSS police?
I would've thought allowing user agent style sheets would be a basic requirement for a web browser but it seems that not everyone agrees with me, and sometimes you just have to accept that not everyone is on the same page as you.
> basic requirement and competency? Are you the CSS police?
No, I'm a potential haproxy user who is unable to access the site because whoever put it together either failed to follow basic CSS tutorials or has no idea that there are more reading surfaces than 19" 1920x1080 LCD monitors.
And the surprising thing is that here I am, on HN of all places, where readers are expected to be educated and somewhat informed, reading comments like yours. Baffling, to say the least.
Slightly OT: we have several different kinds of services that need rate limiting, written in different stacks. We would like to have one rate-limiting solution, ideally one we could put in front of any service, that is lightweight but can also work with AWS target groups that are already splitting traffic across nodes inside a service; I believe that means some sort of clustered solution, or at least inter-node communication. Is haproxy a good fit for this? Maybe nginx (paid)?
We use HAProxy for similar reasons to the ones you describe, if I’m understanding correctly.
As one of the other posts kind of suggested, you can get a ton done in a few hours, so it might be worth just standing up a box real quick and trying it out. As a note, when we try stuff like this we put it behind an AWS LB so we can push partial traffic to our experiment and aren’t betting the farm whilst testing in prod.
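On the clustering point: HAProxy's peers mechanism can synchronize stick-table counters between load balancer nodes, which is roughly what a shared rate limit needs. A sketch, with all names, addresses, and thresholds invented:

    peers lb_cluster
        peer lb1 192.0.2.1:10000
        peer lb2 192.0.2.2:10000

    frontend fe_main
        bind :80
        # rate counters are replicated between nodes via the peers protocol
        stick-table type ip size 200k expire 30s store http_req_rate(10s) peers lb_cluster
        http-request track-sc0 src
        http-request deny deny_status 429 if { sc_http_req_rate(0) gt 20 }
        default_backend be_app

    backend be_app
        server app1 192.0.2.10:8080 check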
HAProxy is brilliant; we've used it for years as a simple mesh on all our services, but we're considering moving to Envoy as we need OpenTracing support to help understand how requests flow between services.
There's an opentracing integration coming very soon! We were hoping to have it available with the 2.2 release but there were still a few things to finalize.
I agree that Redis is "good", but I wouldn't put it at the level of HAProxy, particularly when it comes to "what stuff is deliberately kept out of the open source version".
If I had to choose a project/tool to put at a similar 'level' as HAProxy in terms of doing one thing well, being a working open-source project with a private company backing it, and being a well-run project, I'd say it's Varnish, which just happens to pair very well with HAProxy.
Haproxy isn’t quite as fast as iptables (we switched because of this), but it was delightful to configure. The tradeoff is definitely worth it in most cases.
That's because iptables does layer 3 load balancing (at the kernel level) while HAProxy does layer 7 (in user space). Layer 3 load balancing is much simpler, so there's less work to do. If layer 3 load balancing works for your use case, then HAProxy was never the right tool.
BTW: Willy Tarreau (the author of HAProxy) is also a Linux kernel developer and has made contributions in those areas. If you configure HAProxy to do zero-copy forwarding, you can get quite good performance.
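The zero-copy path, as I understand it, is the kernel's splice() support, which can be enabled along these lines (treat this as a sketch):

    defaults
        mode tcp
        # let haproxy use the kernel's splice() so payloads are forwarded
        # without being copied through userspace buffers
        option splice-auto

    listen fwd
        bind :8080
        server app1 192.0.2.10:8080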
We forward a cluster of 2,560 TPU pod cores from our GCE project to other GCE projects in europe-west4-a. Originally it was because we had a separate GCE project with a bunch of credits, but that project had no access to TPUs. The question was, could we still take advantage of the credits? It turns out, we could; the solution involved VPC Network Peering, which I later learned is how the TPUs themselves work. Some configuration details are here: https://www.shawwn.com/swarm#iptables
Nowadays we forward the TPU pods to pretty much anyone who wants to try them out, in hopes of getting more people involved in the TPU programming scene. The TPUs are managed via a website (https://www.tensorfork.com/tpus) and we coordinate TPU access via spreadsheet. Each researcher has their own GCE project, and we simply flip a switch to give them access.
If anyone reading this happens to be into ML and into programming for big hardware rigs, feel free to hop into the Tensorfork discord server and we can show you the ropes. https://github.com/shawwn/tpunicorn#ml-community
I've done some dead simple forwarding/load balancing work, and if you can do it with NAT instead of a proxy application, it'll use a lot less memory, in addition to less CPU.
That means fewer load balancers needed, or smaller machines (or both). So I'd say any time you run out of capacity on your proxy machines is an opportunity to look for other techniques. Haproxy is probably easier to use, though, and would tend to need less work to get the features you want. So there's an opex/capex vs development time argument.
Hyperscaling Haproxy is a lot of fun too, though. There's a huge difference in connections/second between a normal config and a totally tuned config with haproxy and kernel patching on the table.