35M Hot Dogs: Benchmarking Caddy vs. Nginx (tjll.net)
399 points by EntICOnc on Sept 16, 2022 | 143 comments



This is a great writeup overall. I was happy to see Tyler's initial outreach before conducting his tests [0]. However, please note that these tests are also being revised shortly after some brief feedback [1]:

- The sendfile tests at the end actually didn't use sendfile, so expect much greater performance there.

- All the Caddy tests had metrics enabled, which are known[2] to be quite slow currently. Nginx was not configured to emit metrics, so in that sense the tests are a bit uneven. From my own tests, when I remove the metrics code, Caddy is 10-20% faster. (We're working on addressing that [3].)

- The tests in this article did not tune reverse proxy buffers, which are 4KB by default. I was able to see moderate performance improvements (depending on the size of payload) by reducing the buffer size to 1KB and 2KB.
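For anyone who wants to experiment: those buffers can be tuned in the reverse_proxy transport options, roughly like this, if I recall the option names correctly (sizes here are just illustrative; tune for your payloads):

    example.com {
        reverse_proxy 127.0.0.1:8080 {
            transport http {
                read_buffer 2KB    # default is 4KB
                write_buffer 2KB
            }
        }
    }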

I want to thank Tyler for his considerate and careful approach, and for all the effort put into this!

[0]: https://caddy.community/t/seeking-performance-suggestions-fo...

[1]: https://twitter.com/mholt6/status/1570442275339239424 (thread)

[2]: https://github.com/caddyserver/caddy/issues/4644

[3]: https://github.com/caddyserver/caddy/pull/5042


Thanks, Matt! I've pushed the revised section measuring sendfile and metrics changes, so those should be accurate now.

Phew. Caches are purged, my errors are fixed. I can finally rest. If folks have questions about anything, I'm happy to answer.


Just want to say your writing is the best quirky balance of fun and substance, reminiscent of Corey Quinn [1]. Thanks for doing so damn much work and for the instantly relatable phrase "Nobody is allowed to make mistakes on the Internet".

[1] https://www.lastweekinaws.com


Thank you, that's very kind! There's a reason I included Corey's name in my hastily cobbled-together skeleton meme [1]. Hopefully my writing achieves that level of technical approachability.

[1]: https://blog.tjll.net/reverse-proxy-hot-dog-eating-contest-c...


How did I miss that? Anyway, you succeeded.


That’s very kind of you to say. Do I get to put this on my resume?


THE MAN HIMSELF

I can die happy


> - All the tests had metrics enabled, which are known[1] to be quite slow. From my own tests, when I remove metrics code, Caddy is 10-20% faster.

But disabling metrics is not supported in standard Caddy; you need to remove specific code and recompile to disable it.

So maybe benchmarking with it isn't fair to Nginx?


Yeah, I think fair comparisons are:

* How do these things perform by default? This is how they're going to perform for many users, because if the defaults are adequate nobody will bother to tune them.

* How do these things perform with the performance configuration that's often recommended online? This is how they'll perform for people who think they need performance but don't know how to tune (or don't bother to); it might be worse than the defaults, but that's actually useful information.

* How do these things perform when their authors get to tune them for our test workload. This is how they'll perform for users who squeeze every drop and can afford to get somebody to do real work to facilitate, possibly even hiring the same authors to do it.

In some cases I would also really want to see:

* How do these things perform with recommended security. A benchmark mode with great scores but lousy security can promote a race to the bottom where everybody ships insecure garbage by default then has a mode which is never measured and has lousy performance yet is mandatory if you don't think Hunter2 is a great password.


> How do these things perform by default.

Agreed on this one -- today I'm looking at how to disable metrics by default and make them opt-in. At least until the performance regression can be addressed.

Update: PR opened: https://github.com/caddyserver/caddy/pull/5042 - hoping to land that before 2.6.


Good stuff dude. Listens to users, sees a problem, doesn't take it personally, makes a fix. Caddy's going places.


I'm also grateful that Dave, the original contributor of the metrics feature, isn't taking it personally. We love the functionality! Just gotta refine it...


> But disabling metrics is not supported in standard Caddy; you need to remove specific code and recompile to disable it.

We're addressing that quite soon. Unfortunately the original contributor of the feature has been too busy lately to work on it, so we might just have to go the simple route and make it opt-in instead. Expect to see a way to toggle metrics soon!

Update: PR opened: https://github.com/caddyserver/caddy/pull/5042


There's always going to be some cost to metrics; going forward you probably just want to document it and then update the figure as you tune it. Higher-performance, opt-in metrics are the sort of thing a company using it at scale ought to be able to help with or sponsor work on.


Absolutely. The plan is to make it opt-in for now, and then having a company sponsor the performance tuning would be very welcome. Otherwise it'll probably sit until someone with the right skills, know-how, and time comes along.


That was fast. I love it!


> All the Caddy tests had metrics enabled

One of the great mysteries in (my) life is why people think that measuring things is free. It always slows things down a bit and the more precisely you try to measure speed, the slower things go.

I just finished reducing the telemetry overhead for our app by a bit more than half, by cleaning up data handling. Now it's ~5% of response time instead of 10%. I could probably halve that again if I could sort out some stupidity in the configuration logic, but that still leaves around 2-3% for intrinsic complexity instead of accidental.


I feel like we never see HAProxy in these reverse proxy comparisons. Lots of nginx, Apache, Caddy, Traefik, Envoy, etc.

The HAProxy configuration is just as simple as Caddy for a reverse proxy setup. It's also written in C, which is one of the comparisons the author draws between nginx and Caddy. And it seems to be available on every *nix OS.


I am not sure I would agree with the assertion that config for HAProxy is just as easy.

In fact I use HAProxy in production pretty regularly because it is solid, but its config is one of the main reasons I would choose something else.

A basic HAProxy config is fine, but after a little while each line feels like a collection of tokens in a random order that I have to sit and think about to parse.


I feel the same way. I'm not a fan of haproxy's configuration system. It's really difficult for me to understand it, whereas I feel I can read most nginx/apache configs and immediately know what is supposed to be happening. I still maintain servers under load in production that use all three to this day and I always go back to nginx because of the configuration alone.


I can't comment on haproxy because I haven't used it enough, but I think that the "nginx's config is easy to grasp" posture has a bit of Stockholm syndrome in it.

- Do you want to add a header in this "location" block? Great, you'd better remember to re-apply all the security headers you've defined at a higher level (the server block, for instance), because of course adding a new header resets those (see the sketch after this list).

- Oh, you mixed prefix locations with exact locations and regex locations. Great, let's see if you can figure out which location block a request will end up being processed by. The docs "clearly" explain the priority rules, and they're easy to grasp [1].

- I see you used a hostname in a proxy_pass directive (e.g.: http://internal.thing.com). Great, I will resolve it at startup and never check again, because this is the most sensible thing to do of course.

- Oh... now you used a variable (e.g.: http://$internal_host). That fundamentally changes things (how?) so I'll respect the DNS's TTL now. Except you'll have to set up a DNS resolver in my config because I refuse to use the system's normal resolver because reasons.

- Here's an `if` directive for the configuration. It sounds extremely useful, doesn't it? Well.. "if is evil" [2] and you should NOT use it. There be dragons, you've been warned.

I could go on... but I think I've made my point. Note that these are not complaints; I'm just pointing out that nginx's configuration has its _very_ significant warts too.
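To make the first point concrete, here's a minimal sketch (hostnames and header values are made up):

    server {
        add_header X-Frame-Options "DENY";
        add_header Strict-Transport-Security "max-age=63072000";

        location /api/ {
            # Any add_header here discards BOTH headers above for /api/ responses;
            # they must be repeated in this block to keep them.
            add_header Cache-Control "no-store";
            proxy_pass http://backend;
        }
    }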

[1] https://nginx.org/en/docs/http/ngx_http_core_module.html#loc...

[2] https://www.nginx.com/resources/wiki/start/topics/depth/ifis...


All that may be true, but for a lot of us old timers we were coming from apache to nginx and apache's configs can eat a bag of dicks.

Unfortunately it's likely I worked in the same building as one of the people responsible for either creating or at least maintaining that mess, but I didn't know at the time that he needed an intervention.


Exactly all of this. I've mentioned the first point about add_header redefining instead of appending in a previous HN comment of mine: https://news.ycombinator.com/item?id=27253579. As mentioned in that comment, HAProxy's configuration is much more obvious, because it's procedural. You can read it from top to bottom and know what's happening in which order.

Disclosure: Community contributor to HAProxy, I help maintain HAProxy's issue tracker.


Yes, I've experienced most of these with nginx and it can be a minefield. The best experience I've had configuring a webserver was lighttpd.


Yes, also a lighttpd fan here. It's not as refined a reverse proxy, but for halfway trivial use cases it does fine, and as a web server it's light (duh) and fast. Much more readable configuration than nginx or apache.


To be clear I never said it was easy. I have a LOT of issues with Nginx's configuration, I just find it to be significantly less bad than the other options.

The exception is Caddy; it has been great so far, but I have only used it for personal projects.


For simple things, Caddy is nice and easy, but I've struggled with Caddy quite a bit, too, especially for more complex setups. I usually break out haproxy or nginx for really challenging setups, because caddy's documentation and examples are quite sparse (esp v2)


What do you struggle with about the documentation or "more complex setups"? I was just on the phone recently with Stripe who has a fairly complex, large-scale deployment, and they seemed to have figured it out with relative ease.

I'm currently on a push to improve our docs, especially for beginners, so feel free to review the changes and leave your feedback: https://github.com/caddyserver/website/pull/263


I'm not the GP, but I've had a similar experience, with the Caddy docs seeming good for simple configurations, but lacking for more complex ones.

I last set up a Caddy config maybe 6-9 months ago, and everything related to client certificates was either scantily documented, wrongly documented, or not documented at all. It might be that I got unlucky, as some of the client cert features were fairly new, but it wasn't a great experience.

Still, I much prefer Caddy's config system to Nginx or HAProxy.

Oh, something else I'd really love to see are more fully-featured example configs, as it can be hard to know how to start sometimes.


Examples of "fully-featured" what though? The more features you add, the more combinations of ways to configure things there are.

The problem with "fully-featured" examples is that people copy and paste instead of learning how the software works. I'd rather our user base be skilled at crafting configuration.

Generally we recommend that big examples go into our wiki: https://caddy.community/c/wiki/13


I agree completely. Fully-featured examples would be extremely helpful, especially with things like proxying to different servers for different paths and domains, static file serving set up differently for different paths and domains, various plugin setups, etc.


Oh, just saw this. You wrote your comment while I wrote mine. If you can enumerate specifically what you want to see, please submit it to our issue tracker: https://github.com/caddyserver/website

Generally we encourage examples in our community wiki though: https://caddy.community/c/wiki/13 -- much easier to maintain that way.


FWIW I'm a big fan of HAProxy as well, but I was just constrained by the sheer volume of testing and how rigorous I intended to be. Maybe once my testing is a little more generalized I can fan out to additional proxies like HAProxy without too much hassle, as I'd love to know as well.


Would love to see this


I would also like to see benchmarks for reverse proxies with TLS termination.


I think one reason a lot of benchmarks don't include TLS termination is that it's often impractical to measure meaningfully for the real world, where most clients reuse the connection and the TLS session for many requests, making the handshakes negligible in the long run. And given hardware optimizations for cryptographic functions combined with network round trips, you end up benchmarking the network and the protocol more than the actual implementation, which is often upstream from the server itself anyway.

Go's TLS stack is set to be more efficient and safer in coming versions thanks to continued work by Filippo and team.


Maybe it would be a useful benchmark to simulate a scenario like "my site got posted on HN and now I'm getting a huge number of unique page views."


Sure, we've already done this very real test in production a number of times and Caddy doesn't even skip a beat. (IMO that's the best kind of benchmark right there. No need to simulate with pretend traffic!)


Any idea how much traffic HN could send? I doubt it's more than 100 rps or any other noticeable load.


Around 100k/day with lots of requests concentrated around the start. Still mostly rpm rather than rps.


That's basically nothing, even for a 1 vCPU load balancer; even Apache could handle it, I think.


Yeah, this tends to be (in my cases) where response times suffer the most, unless your bottleneck is I/O to/from the backend or further away


h2o [1] was excellent when I tried it for TLS termination, beating hitch in my unscientific tests. And it got http/2 priorities right. It's a shame they don't make regular releases.

1. https://github.com/h2o/h2o/


I’m surprised Varnish is not mentioned much either. For a while there it had a reputation as the fastest reverse proxy. I think its popularity was harmed by complex config and refusal to handle TLS.


It's always been blisteringly fast when we've used it, and I like the power of the configuration (it has its quirks but so do most powerful systems). But the overhead of setting it up and maintaining it due to having to handle TLS termination separately puts me off using it when other software is 'good enough'. If Varnish Enterprise was cheaper I would have bought it, but at their enterprise prices no way.

I'm keeping a watching brief on https://github.com/darkweak/souin and its Caddy integration to see if that can step up and replace Varnish for short-lived dynamic caching of web applications. Though I've lost track of its current status.


Amazing that you're talking about Souin and its possible use as a replacement for Varnish. Let me know if you have questions about the configuration or implementation. ATM I'm working on the stabilization branch to get a more stable version and merge the improvements into Caddy's cache-handler module.


Varnish is a must for my production apps. The grace period ability to serve stale cache while passing a request through to get the latest is just huge and I can’t live without it.


> The HAProxy configuration is just as simple as Caddy for a reverse proxy setup.

Does HAProxy have built in support for Let’s Encrypt?

That is one of my favorite features. Caddy just automatically manages the certificates for https.


I use caddy mostly as a reverse proxy in front of an app. It's just one line in the caddy file:

    sub.domain.com {
      # transparent proxy + websocket support + letsencrypt TLS
      reverse_proxy 127.0.0.1:2345
    }
    
It's a breath of fresh air to have a server with sensible defaults after dealing with apache and nginx (haproxy isn't much better in that regard).


If that's your whole Caddyfile, might as well not even use a config file:

    caddy reverse-proxy --from sub.domain.com --to :2345

Glad you like using Caddy!


Personally I still recommend the config file. Even when they are simple, it gives you one single source of truth that you can refer to, it will grow as you need it, and it can be stored in source control.

Where and how parameters are configured is a bit more of a wild card and dependent on the environment you are running in.


That's something Matt and I tend to disagree on - I agree that a config file is better almost always because it gives you a better starting point to experiment with other features.


Hey, I mean, I do agree that a config file is "better" most of the time -- but having the CLI is just so awesome! :D


I still can't make myself try Caddy. Things like this look sweet, but that's maybe 5% of the functionality I care about. Not saying it's not possible, but with Nginx I already know how to do CORS allow-lists, OPTIONS handling, and per-location / per-cookie-name caching. Issuing certs is probably the simplest and last thing in a reverse proxy config.


It does not, because HAProxy does not perform any disk access at runtime and thus would be unable to persist the certificates anywhere. Disk accesses can be unpredictably slow and would block the entire thread, which is not something you want when handling hundreds of thousands of requests per second.

See this issue and especially the comment from Lukas Tribus: https://github.com/haproxy/haproxy/issues/1864

Disclosure: Community contributor to HAProxy, I help maintain HAProxy's issue tracker.


That issue has some good explanation, thanks. I wonder if a disk-writing process could be spun out before dropping privileges?

> Disks accesses can be unpredictably slow and would block the entire thread which is not something you want when handling hundreds of thousands of requests per second.

This is not something I see mentioned in the issue, but I don't see why disk accesses need to block requests, or why they have to occur in the same thread as requests?


When reading along: keep in mind that I'm not a core developer and thus am not directly involved in development, design decisions, or the roadmap. I have some understanding of the internals and the associated challenges based on my contributions and discussions on the mailing list, but the following might not be entirely correct.

> I wonder if a disk-writing process could be spun out before dropping privileges?

I mean … it sure can, and that appears to be the plan based on the last comment in that issue. However, the “no disk access” policy is also useful for security. HAProxy can chroot itself to an empty directory to reduce the blast radius, and that is done in the default configuration on at least Debian.

> but I don't see why disk accesses need to block requests

My understanding is that historically Linux disk IO was inherently blocking. A non-blocking interface (io_uring) only became available fairly recently: https://stackoverflow.com/a/57451551/782822. And even then it's an operating-system-specific interface; for the BSDs you need a different solution.

If your process is blocked for even one millisecond while handling two million requests per second (https://www.haproxy.com/de/blog/haproxy-forwards-over-2-mill...) then you drop 2k requests or increase latency.

> or why they have to occur in the same thread as requests?

“have” is a strong word; of course nothing “has” to be. One thing to keep in mind is that HAProxy is 20 years old, and apart from possibly doing Let's Encrypt there has been no real need for it to have disk access. HAProxy is a reverse proxy / load balancer, not a web server.

Inter-thread communication comes with its own set of challenges and building something reliable for a narrow use case is not necessarily worth it, because you likely need to sacrifice something else.

As an example, at scale you can't even let your operating system schedule out one of the worker threads to schedule in the “disk writer” thread, because that will effectively result in reduced processing capacity for some fraction of a second, which will result in dropped requests or increased latency. This becomes even worse if the worker holds an important lock.


It's not as turn-key as Caddy, that's for sure, but it's there: https://www.haproxy.com/blog/lets-encrypt-acme2-for-haproxy/


This is great; I'll implement this soon, as my current cert is about to expire and I've been wanting to get HAProxy onto Let's Encrypt.


Built-in? Not exactly, but there is an acmev2 implementation from haproxytech: https://github.com/haproxytech/haproxy-lua-acme


HAProxy does not serve static files (AFAIK), so for some stacks you need to add nginx or caddy after haproxy as well to serve static files and forward to a fastcgi backend.


nginx started out as a web server and over time gained reverse proxy abilities.

haproxy started out as a proxy and has gained some web server abilities, but is all about proxying.

haproxy has fewer surprises as a reverse proxy than nginx does. Some of the defaults for nginx are appropriate for web serving, but not for proxying.


I wrote up a few notes on my Caddy setup here https://muxup.com/2022q3/muxup-implementation-notes#serving-... which may be a useful reference if you have a static site and want to tick off a few items likely on your list (brotli, http3, cache-control, more fine-grained control on redirects).
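For anyone who wants the gist without clicking through, the relevant directives look roughly like this (paths and max-age are illustrative; brotli needs a plugin, but zstd/gzip encoding is built in):

    example.com {
        root * /srv/www
        encode zstd gzip
        header /assets/* Cache-Control "public, max-age=31536000, immutable"
        redir /old-page /new-page permanent
        file_server
    }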

I don't think performance is ever going to matter for my use case, but one thing I think is worth highlighting is the quality of the community and maintainership. In a thread I started asking for feedback on my Caddyfile (https://caddy.community/t/suggestions-for-simplifying-my-cad...), mholt determined I'd found a bug and rapidly fixed it. I followed up with a PR (https://github.com/caddyserver/website/pull/264) for the docs to clarify something related to this bug which was reviewed and merged within 30 minutes.


Thanks for the comments and participation!

I'm still thinking about that `./`-pattern-matching problem. Will probably have to be addressed after 2.6...


A very helpful post, thanks for sharing it!


I'm surprised no benchmarks were done with logging turned on.

I get wanting to isolate things but this is the problem with micro benchmarks, it doesn't test "real world" usage patterns. Chances are your real production server will be logging to at least syslog so logging performance is worth looking into.

If one of them can write logs with 500 microseconds added to each request but the other takes 5 milliseconds, that could be a huge difference in the end.


Caddy's logger (uber/zap) is zero-allocation. We've found that the writing of the logs is often much slower, e.g. printing to the terminal or writing to a file. And that's a system problem more than a Caddy one. But the actual emission of logs is quite fast last time I checked!


I think your statement is exactly why logging should have been turned on, at least for 1 of the benchmarks. If it's a system problem then it's a problem that both tools need to deal with.

If one of them can do 100,000 requests per second and the other can do 80,000 requests per second, but both are capped at 30,000 requests per second because of system-level limitations, then you could make a strong case that both products perform equally in the end.


This is - along with some reverse proxy settings tweaks - one of the variables I'd be keen to test in the future, since it's probably _the_ most common delta between my tests and real-world applications.


Hey y'all, author here. Traffic/feedback/questions are coming in hot, as HN publicity tends to engender, but I'm happy to answer questions or discuss the findings generally here if you'd like (I'm looking through the comments, too, but I'm more likely to see it here).


The black color for Caddy in the charts is very hard to read in dark mode. It would be great if you could change it to another color.


Close enough to not matter in most use cases, i.e. pick whatever is convenient.


This is the answer I was looking for but sadly, this type of insignificance becomes ammunition for managers/founders who are obsessed with novelty


This is almost always the case, no matter the service we're talking about.


Is this how we ended up with electron for desktop applications and Java for backend?


Yes, because developers are expensive, and so developer productivity dominates almost everything else.


Also, "pick what you know" applies here, too. If you know NGINX, then all you get from switching to Caddy is experience, and likewise, vice versa.


*and memory safety*

This cannot be overstated. Caddy is not written in C! And it can even run your NGINX configs. :) https://github.com/caddyserver/nginx-adapter


A solution in search of a problem.


Nginx's security page [0] lists a non-zero number of exploitable issues rooted in manual memory management.

[0] https://nginx.org/en/security_advisories.html


When would it matter? I write in Python, so performance was never a concern for me, but I'm curious about the scenarios in which this is likely to be the weakest link in real workloads.

Given available options, I will take the network software written in a memory safe language every time.


I'm impressed with Caddy's performance. I was expecting it to fall behind mainly due to the fact it's written in Go but apparently not. It's a bit disappointing that it's slower in reverse proxying, as that's one of the most important use cases, but now that it's identified maybe they can make some improvements. Finally, there really should be a max memory / max connections setting (maybe there is?).


Goroutines are efficient enough, and Go compiles to native code. I'm sure that Rust/Tokio or handcrafted C can be faster, but I think Go is fast enough for 99% of use cases.

I'm building a service manager à la systemd in Go as a side project, and I really like it - it's not as low level as Rust and has a huge runtime but it is impressively fast.


I am not a big fan of Go's design, however that is exactly one reason I tend to argue for it.

There is enough juice in compiled managed languages that expose value types and low-level features; it is a matter of learning how to use the tools in the toolbox instead of always reaching for the hammer.


The only reason for me to consider Caddy was reverse proxy. Now that reason is gone and I'm happy with nginx


If you tell nginx to limit itself to 4 workers x 1024 connections per worker = 4096 connections, and hurl 10k connections at it, of course it's going to throw errors. It's doing exactly what you told it to do.

That's just one example of how OP's "optimized" nginx config is barely even optimized. There are lots of other variables that you can tweak to get even better performance and blow Caddy out the window, but those tweaks are going to depend on the specific workload you expect to handle. There isn't a single, perfectly optimized, set of values that's going to work for everyone.
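For example, raising those limits is only a few lines of config (values here are illustrative, not recommendations):

    worker_processes auto;          # one worker per CPU core
    worker_rlimit_nofile 20000;     # raise the per-worker fd limit

    events {
        worker_connections 10240;   # per worker; total ~= workers * this
    }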

The beauty of Caddy is that you get most of that performance without having to tweak anything.


Nginx scales to 1,000,000 workers per VM in my tests, but bandwidth is silly.

I got those results by seriously limiting the junk in http headers. Not with real browsers.

If you have that demand for any commercial service, you have money to distribute your load globally across more than one nginx instance.


Great write-up! One question I had was around the use of keepalives. There's no mention in the article of whether keepalives were used between the client and reverse proxy, and no mention of whether it was used between the reverse proxy and backend.

I know Nginx doesn't use keepalives to backends by default (and I see it wasn't set up in the optimised Nginx proxy config), but it looks like Caddy does have keepalives enabled by default.

Perhaps that could explain the delta in failure rates, at least for one case? (A sketch of the nginx side is below.)
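For reference, backend keepalives in nginx take an upstream block plus a couple of proxy settings; roughly (upstream name and address are made up):

    upstream app_backend {
        server 127.0.0.1:8080;
        keepalive 32;                        # idle connections kept open per worker
    }

    server {
        listen 80;
        location / {
            proxy_http_version 1.1;          # backend keepalive requires HTTP/1.1
            proxy_set_header Connection "";  # drop the default "Connection: close"
            proxy_pass http://app_backend;
        }
    }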


Are you talking about HTTP keepalive or TCP keepalive?

Keepalives can actually reduce the performance of a server with many concurrent clients (i.e. a benchmark test), and have other weird effects on benchmarks: https://www.nginx.com/blog/http-keepalives-and-web-performan...


Same thing. HTTP has no keep-alive feature; you don't send HTTP keep-alive requests. If HTTP/1.1 asks for keepalives, it's a TCP thing.


They are distinct in Go. The standard library uses "HTTP keep-alive" to mark connections as idle based on most recent HTTP request, whereas TCP keep-alive checks only ACKs.
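A rough illustration of where the two knobs live in Go's standard library (durations are arbitrary):

    package main

    import (
        "net"
        "net/http"
        "time"
    )

    func main() {
        // TCP keep-alive: OS-level probes on the socket, independent of HTTP traffic.
        dialer := &net.Dialer{KeepAlive: 30 * time.Second}

        // HTTP keep-alive: how long an idle (request-free) connection is kept around.
        transport := &http.Transport{
            DialContext:     dialer.DialContext,
            IdleConnTimeout: 90 * time.Second,
        }
        client := &http.Client{Transport: transport}
        _ = client // reuse this client so connections actually get pooled
    }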


http 2.0 has keep alive!


FYI to author (who is in the comments): you may want to prevent the graphs from allowing scroll to zoom, I was scrolling on the page and the graphs were zooming in and out.


Article made nearly unreadable by conflating scrolling and zooming. Too much fiddling needed. Could have just added a range slider for folks who wanted to expand graphs.


The author linked to wrk2 but I think he ended up using a k6 executor that exhibits the problem wrk2 was designed to solve.


Yep, k6 suffers from coordinated omission [1] with its default settings.

A tool that can send requests at a constant rate, i.e. wrk2 or Vegeta [2] (example below), is a much better fit for this type of performance test.

1. https://www.scylladb.com/2021/04/22/on-coordinated-omission/

2. https://github.com/tsenart/vegeta
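For example, Vegeta drives a fixed request rate straight from the command line (rate and duration here are arbitrary):

    echo "GET http://localhost:8080/" | \
        vegeta attack -rate=1000 -duration=60s | \
        vegeta report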


With its default settings, yes, but k6 can be configured to use an executor that implements the open model[1].

See more discussion here[2].

[1]: https://k6.io/docs/using-k6/scenarios/arrival-rate/

[2]: https://community.k6.io/t/is-k6-safe-from-the-coordinated-om...


Yes indeed but wrk2 or Vegeta is still better for this particular use case (unless k6 has support for setting a constant RPS rate, afaik it does not), as otherwise the overhead of establishing a new TCP connection for a single HTTP request will dominate the benchmark.


> unless k6 has support for setting a constant RPS rate, afaik it does not

Yes it can, via the `constant-arrival-rate` executor[1].

> the overhead of establishing a new TCP connection for a single HTTP request will dominate the benchmark

By default, k6 will reuse TCP connections, and you have to explicitly disable it[2].

I'm not saying that wrk2 or Vegeta wouldn't be a good fit for this test, but k6 is also capable of it with some minor configuration changes (sketch below).

[1]: https://k6.io/docs/using-k6/scenarios/executors/constant-arr...

[2]: https://k6.io/docs/using-k6/k6-options/reference#no-connecti...
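A minimal sketch of that executor in a k6 script (the numbers are arbitrary):

    import http from 'k6/http';

    export const options = {
      scenarios: {
        constant_rps: {
          executor: 'constant-arrival-rate',
          rate: 1000,            // iterations started per timeUnit
          timeUnit: '1s',
          duration: '60s',
          preAllocatedVUs: 200,  // VU pool to draw from
          maxVUs: 500,
        },
      },
    };

    export default function () {
      http.get('http://localhost:8080/');
    }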


Damn. This is probably worth swapping out k6 for if I manage to pull off a second set of benchmarks. Thanks for the heads-up.


> Wew lad! Now we’re cooking with gas.

This is the new gold standard for benchmarks!

OP / Author, stupendously tremendous job. The methodology is defensible and sensible. Thank you for doing this on behalf of the community.


Seconding!

I am also in love with the friendliness and tone of the article. I’m a complete dummy when it comes to stuff like this and still understood most of it. Feynman would be proud.


That's very kind of you to say, thank you!


Yeah, Tyler did an amazing job.


The killer feature of Caddy for me is that it handles TLS/HTTPS certificates automatically for me.

I only ever use Caddy as a reverse proxy for web apps (think Flask, Ruby on Rails, Phoenix Framework). My projects have never needed high performance, but if my projects ever take off, it's nice to see that Caddy is already competitive with Nginx on resilience, latency, and throughput.


Worker_connections 1024;

Hello?

http://xtomp.tp23.org/book/100kcc.html

Try worker_connections 1000000;


I'd love to see a comparison like this for caddy and pingora once pingora is open sourced. Here's the HN link for the announcement. https://news.ycombinator.com/item?id=32836661


I think I don't get the joke. What does the X-Hotdogs header do?


The header does nothing. As the article says, the author sent the header in every request, made a total of 35M requests, and thus gained a reason to use 35M hot dogs in the article title.


Correct. Maybe it blows my credibility out of the water and I'll be shamed for life, who knows


Have you heard of "hotdog eating contests"? Or any other kind of eating contest? The joke is that he's having the two servers compete in "eating hotdogs" because the requests have a hotdog header. I think, anyway.


These are interesting tests. Considering the energy cost of large software systems, it would also be interesting to know which of the two has a lower CO2 footprint.


I'm an Nginx guy, and I have been for some years, but I do love a little bit of Caddy jingoism[1] as the weekend approaches.

This is a good write up. I was expecting Caddy to trounce Nginx, but that wasn't the case. I'll be back to re-read this with fresh eyes tomorrow.

[1] For the avoidance of doubt, this is not meant as a snarky observation.


You were expecting Caddy to "trounce" nginx? Most people expect the opposite.

But Caddy certainly does in some cases, especially with the upcoming 2.6 release.


> You were expecting Caddy to "trounce" nginx? Most people expect the opposite.

I absolutely was, yes. As an observer I see a lot of people saying positive things about Caddy around here, and how it’s superior performance-wise to a variety of ‘classic’ httpd software. Lots of people love Caddy, and they’re quite vocal, so it’s not a stretch to assume there are reasons why they love it. Nginx development has slowed since the events in Ukraine, unsurprisingly, so again it’s not a leap to surmise Caddy is making good things happen in the meantime.


Ahh, right -- so there's a lot more to performance than just req/sec and HTTP errors. And that's probably the love/hype you're hearing about. (Though Caddy's req/sec performance is quite good too, as you can see!)

Caddy scales better than NGINX especially with regards to TLS/HTTPS. Our certificate automation code is the best in the industry, and works nicely in clusters to coordinate and share, automatically.

Caddy performs better in terms of security overall. Go has stronger memory safety guarantees than C, so your server is basically impervious to a whole class of vulnerabilities.

And if you consider failure modes, there are pros and cons to each, but it can definitely be argued that Caddy dropping fewer requests than nginx (if any at all!) is "superior performance".

I'm actually quite pleased that Caddy can now, in general, perform competitively with nginx, and hopefully most people can stop worrying about that.

And if you operate at Cloudflare-"nginx-is-now-too-slow-for-us"-scale, let's talk. (I have some ideas.)


Can you add details on _scales better_ - what do you mean? I've read the recent post from Cloudflare on their thread pool and it makes sense; do you mean things of that sort?

I had a case where, after push notifications, mobile clients wake up and all of them do TLS handshakes against the load balancers (Nginx), hitting the CPU limit for a minute or so, but otherwise I've had no problems with 5-15k rps and scaling.


Caddy does connection pooling (perhaps differently than what Cloudflare's proxy does, we'll have to see once they open source it) just as Go does. But what Caddy does so well is scale well with the number of certificates/sites.

So we find lots of people using Caddy to serve tens to hundreds of thousands of sites with different domain names because Caddy can automate those certificates without falling over. (Huge deployments like this will require a little more config and planning, but nothing a wiki article [0] can't help with. You might also want sufficient hardware to keep a lot of certs in memory, etc.)

Also note that rps is not a useful metric when TLS enters the picture, as it says nothing useful about the actual TLS impact (TLS connection does not necessarily correlate to HTTP request - and there are many modes for TLS connections that vary).

[0]: https://caddy.community/t/serving-tens-of-thousands-of-domai...
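For the curious, the usual pattern for those mass-domain deployments is on-demand TLS, roughly like this (the `ask` endpoint is something you run yourself to approve or deny each domain):

    {
        on_demand_tls {
            ask http://localhost:5555/check   # your domain-approval endpoint
        }
    }

    https:// {
        tls {
            on_demand
        }
        reverse_proxy 127.0.0.1:8080
    }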


> So we find lots of people using Caddy to serve tens to hundreds of thousands of sites with different domain names because Caddy can automate those certificates without falling over.

Okay, interesting. It seems their mode of operation is quite different from what I use and see around.

I wonder how they do it for an active/passive LB setup, internal services (not accessible over the internet for the HTTP challenge, and so on); probably that's not their case though.

Not saying it's not useful, it's just such a minor part of my operational burden compared to everything else.


> I wonder how they do it for active / passive LB setup, internal services (not accessible over internet for http challenge and so on) , probably it's not their case though.

It is, actually!

Caddy automatically coordinates with other instances in its cluster, which means simply sharing the same storage (file system, DB, etc.) -- so it works great behind LB. Caddy's reverse proxy also offers powerful load balancing capabilities similar to and, in some ways, superior to, what you find in HAProxy, nginx, etc. Caddy uses the TLS-ALPN challenge and HTTP challenge by default, automatically fails over to another when one doesn't work, and even learns which one is more successful and prefers that over time.

Caddy can also get certificates for internal use, both from public CAs using the DNS challenge, or from its own self-managed CA which is also totally automated.

It turns out that these abilities save some companies tens of thousands of dollars per year!


Sounds cool, I need to get familiar with the docs, thanks for your answers!


Interesting article but from a data viz perspective things would be a ton easier to read with a chart per load type comparing nginx, optimized nginx and caddy. A single megachart with a dozen different colored bars is not easily parsable.


The author's writing style reminds me of Andy Weir's Project Hail Mary or The Martian.


Would be nice to have SO_REUSEPORT in the optimized Nginx config - if I read the configuration right, it was not used.
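(For reference, that's a one-word addition to the listen directive, e.g.:)

    server {
        listen 80 reuseport;    # SO_REUSEPORT: one listening socket per worker process
        server_name example.com;
    }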


I like Caddy, but on prod I do not need SSL (off-loaded by the LB), so I stick to nginx after reading this.

Guess I'm waiting for Cloudflare to FLOSS-release their proxy https://news.ycombinator.com/item?id=32864119 :)


Interesting. Maybe I read too fast, but what data was it benchmarked with? A common flaw is not testing with real-world data. Most benchmarks are wrong; I would guess this one is too. My gut feeling is that Nginx is significantly faster.


TLDR: "Nginx will fail by refusing or dropping connections, Caddy will fail by slowing everything down"

To me it seems that Caddy suffers from bufferbloat. Under heavy congestion the goodput (useful throughput) will drop to 0 because clients will start timing out before the server gets a chance to respond.

Caddy should use an algorithm similar to : https://github.com/Netflix/concurrency-limits

Basically: track the best request latency seen so far, and decrease the concurrency limit until latency stops improving.
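Even a static cap illustrates the idea; here's a sketch of generic load shedding in Go (not Caddy's code, and not the adaptive algorithm the Netflix library uses):

    package main

    import "net/http"

    // limitConcurrency caps in-flight requests and fails fast instead of
    // letting an unbounded queue build up behind the server.
    func limitConcurrency(next http.Handler, max int) http.Handler {
        sem := make(chan struct{}, max) // counting semaphore
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            select {
            case sem <- struct{}{}:
                defer func() { <-sem }()
                next.ServeHTTP(w, r)
            default:
                // Under congestion, shed load so the client can retry a less busy instance.
                http.Error(w, "server busy", http.StatusServiceUnavailable)
            }
        })
    }

    func main() {
        hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("hello"))
        })
        http.ListenAndServe(":8080", limitConcurrency(hello, 512))
    }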


Thanks, we'll consider that, maybe as an option. Want to open an issue so we don't forget?

I'd probably lean toward CUBIC: https://en.wikipedia.org/wiki/CUBIC_TCP

(I implemented Reno in college, but times have changed)

The nice thing about Caddy's failure mode is that the server won't give up; the server has no control over if or when a client will time out, so I never felt it made much sense to optimize for that.


Thanks for considering the feedback! What I was trying to explain is that the server not giving up is never a good thing under congestion.

This is why IP switches drop packets when the outbound queue is full, letting the sender retry the packet.

When there is no congestion it's OK for the server to not time out at all, wait for the queue to drain, and let the client decide when to close the TCP connection.

But when there is congestion you don't want the queue inside the server to get too long, because the server (including Caddy) might be behind a TCP load balancer, and it's better for the client to queue its request with another Caddy instance that is less busy.

feel free to reach out to me at maxime.caron@gmail.com

I would be happy to try to add it to Caddy if you are interested


I wonder how this compares to YARP.


> I’ll build hosts with Terraform (because That’s What We Use These Days) in EC2

> [...]

> Create two EC2 instances - their default size is c5.xlarge

When you're benchmarking, you want a stable platform between runs. Virtual private servers don't offer that, because the host's resources are shared between multiple guests, in unpredictable ways.


The c5 instances get dedicated cores, and thus should be exempt from resource contention due to shared cores.


Do you get dedicated IO on these too? AWS tends to heavily throttle most instances after some time.


For dedicated disk IOPS you should take a look at the EBS provisioned IO volumes, or perhaps use the ephemeral stores that come with some of their more expensive instances.


This is hard because while, yes, some platform with less potential for jitter and/or noisy neighbors would help eliminate outside influence on the metrics, I think it's also valuable to benchmark these in a scenario that I would assume _most_ operators would run them in, which is a VPS situation and not bare-metal. FWIW, I did try really hard to eliminate some of the contamination in the results that would arise from running in a VPS by doing things like using the _same_ host reconfigured to avoid potential shifts in the underlying hypervisor, etc.

But I would certainly agree that, for the utmost accurate results, a bare-metal situation would probably be more accurate than what I have written.


Which platform would you suggest to use for this benchmark?


Ideally your own hardware with nothing else running on it. For convenience you could use VMs, assuming they were set up identically.


Two dedicated servers from Hetzner or similar provider, rented for a month


Well, AWS offers "metal" servers.


Why would any of those fail at a measly 10k clients? 10 billion clients maybe.


We don't have enough humans for that test.


A test is limited to 1024 clients which is (despite aversion to the term webscale) not a lot, even on an Intranet.

I would say if you are not testing 10kcc you are not pushing the difference between nginx and apache1.3

As soon as you do push 10kcc, kernel TCP buffers and the amount of junk in your browser's HTTP headers start to be more important than server perf, just in the amount of data coming into the NIC.


People who still like nginx are the ones making money on it.

nginx is awful to use and only makes it easy to accidentally shoot yourself in the foot.



