Good god, 4 million log requests per second. 400 TB a day (compressed) if they stored the logs.
I recently set up a fun project using a RethinkDB cluster and Node.js Express middleware[1] that logs all requests to RethinkDB as JSON. I did some load testing and sustained 1,400 writes per second, and was quite happy thinking this would scale up very large. However, not 4 million writes per second large. :-)
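For anyone curious, the core of that kind of middleware is tiny. This is a hedged sketch, not the linked package's actual API: `requestLogger` and `insertLog` are names I made up, and the sink could be a RethinkDB insert (e.g. `r.table('logs').insert(record).run(conn)` with the official driver) or anything else async.

```javascript
// Sketch of an Express-style request-logging middleware.
// `insertLog` is a hypothetical async sink for the JSON record;
// with RethinkDB it might wrap r.table('logs').insert(...).run(conn).
function requestLogger(insertLog) {
  return function (req, res, next) {
    const record = {
      method: req.method,
      url: req.url,
      headers: req.headers,
      timestamp: new Date().toISOString(),
    };
    // Fire and forget, so a slow log write never blocks the response.
    Promise.resolve(insertLog(record)).catch((err) => {
      console.error('log write failed:', err);
    });
    next();
  };
}
```

An Express middleware is just a `(req, res, next)` function, so the factory above plugs straight into `app.use(requestLogger(mySink))`.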
Interesting how NGINX + Lua is becoming more and more widely used in mission-critical applications with huge amounts of traffic. Since the introduction of LuaJIT, performance has been outstanding, and many companies like Netflix, Alibaba, Cloudflare, Kong [0], and Airbnb all run on a customized nginx with Lua modules, doing amazing things from security to API management.
I predicted about a year ago that nginx/LuaJIT (OpenResty) is the sleeping giant of web development. I've seen more and more companies start using it, and I wouldn't be surprised if people start talking about it as a Node.js alternative in the not so distant future.
The good thing about NGINX is that it's perceived as language agnostic: no matter what language your codebase is in, you can always put an nginx in front of it. Node tools, by contrast, are only useful within a Node stack (see StrongLoop). As far as I know, nginx powers 150M websites/apps.
And Lua is taking off for exactly these reasons: the ability to extend Nginx or HAProxy, and the fact that it's simple to use, easy to learn, and highly efficient, without the need to touch C/C++.
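To make "extending Nginx with Lua" concrete, here is a minimal sketch. The directives (`content_by_lua_block`, `ngx.say`) are real OpenResty/lua-nginx-module constructs; the endpoint and response are invented for illustration:

```nginx
# Minimal OpenResty sketch: Lua embedded directly in an nginx location.
location /hello {
    content_by_lua_block {
        ngx.header["Content-Type"] = "text/plain"
        ngx.say("hello from LuaJIT inside nginx")
    }
}
```

The Lua runs in-process inside the nginx worker, which is where the performance comes from: no separate app server, no extra hop.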
My apologies, I just noticed a flood of these previously unrecognized domains which had a lot of "web chrome" around a central YouTube video. I shouldn't have included the language about flagging it, as that made my comment more negative than I intended. I am slowly learning to pare down my HN comments to avoid that kind of editorializing.
To determine whether something is really just "blog spam", I always look at whether the page adds anything that would be lost on YouTube. In this case there's the description, a link to slides, and information on the speaker. So this one gets a pass.
The Nginx + LuaJIT approach is new to me. I went looking and found their blog post about it [1]. That post mentions log aggregation, but it sounds from this talk like they're doing dynamic routing via the Lua code. Is the idea to accept requests for entirely separate sites on a shared front-end Nginx tier, and then perform a sort of NAT or virtual routing to different client properties based on the request headers?
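If it works the way I'm guessing, the routing might look something like this. This is purely my speculation, not their actual code: the directives are real lua-nginx-module features, but the route table and addresses are made up:

```nginx
# Speculative sketch: one nginx front end picks an upstream per request
# from the Host header, via Lua, before proxying.
location / {
    set $backend "";
    access_by_lua_block {
        -- Hypothetical in-memory route table; a real deployment would
        -- likely consult a shared dict or external store.
        local routes = {
            ["customer-a.example"] = "10.0.0.10:8080",
            ["customer-b.example"] = "10.0.0.11:8080",
        }
        ngx.var.backend = routes[ngx.var.host] or "10.0.0.1:8080"
    }
    proxy_pass http://$backend;
}
```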
At any rate, very interesting stuff, and more to read up on now.
In the world of lossy counting, this [1] could be something to look at if you wanted to answer the question "how many requests/sec toward jgc.org at noontime, two months ago?"
Are you complaining about the obvious ES JSON in one of the slides not getting a mention or just the lack of mention at all?
If the latter, it's highly likely due to the findings (possibly even independently verified) from the Jepsen treatment ES received, which was revisited in a talk at that same conference: https://news.ycombinator.com/item?id=9778291
[1] https://github.com/commando/express-rethinkdb-logger