(Setting aside the fact that Logstash is itself a JRuby/Java app that easily eats said gigabytes of heap.)
Do JSON logs take gigabytes?
If they do for you, then yes, gigabytes of disk and memory are pretty much guaranteed. Things also tend to pile up over time (even on a horizon of a few weeks).
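For a rough sense of scale (purely illustrative numbers, not from anyone's actual setup), even a modest service emitting structured logs adds up quickly:

    500 bytes/event × 2,000 events/s × 86,400 s/day ≈ 86 GB/day

And that is raw JSON, before index structures, replicas, or the pipeline's own overhead.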
In all honesty, I believe a finely crafted native-code solution to this problem could achieve 3x lower RAM usage and 2-3x the indexing/search performance. Going beyond that is also possible, but would take remarkable engineering.
In the near future I can easily see someone producing a great open-source distributed search project that is at least on par with today's ES core feature set.
ES is good, but it takes a small army to stay on top of the performance and management. And then there are the upgrades, which fix a few bugs and introduce new ones too.
> I believe a finely crafted native-code solution to this problem could achieve 3x lower RAM usage and 2-3x the indexing/search performance
Many people believe that about everything written in Java. The story often ends up a lot more complicated, though. In the domains where it shines, Java is surprisingly hard to beat without very significant effort. And then you must also consider what the same significant effort would achieve if directed at the Java solution instead, or at cherry-picking particular performance hotspots out of it.
I’m basing this on my limited experience of rewriting things from optimized but messy C++/D to the JVM. Yes, the end result is much simpler, but it is memory-hungry and ~2x slower (after optimization and tuning). Sometimes you can fit Java into less than ~2x of the original footprint, but at the cost of burning CPU on frequent GC cycles.
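For what it's worth, that tradeoff is easy to observe directly. A minimal illustration (the heap size and jar name here are made up; the flags are standard HotSpot options): cap the heap near your target footprint, enable GC logging, and watch the collection frequency climb:

    java -Xms2g -Xmx2g -Xlog:gc -jar app.jar    # -Xlog:gc is JDK 9+; JDK 8 used -XX:+PrintGCDetails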
Not every application is the same, but the moment one involves manipulating a lot of data in Java, you fight the platform to get back the control you need. And at the end of the day there is a limit at which reasonable people stop and give up on dodging memory allocations, creatively reusing objects, and wrangling off-heap memory without the help of the type system.
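To make the off-heap part concrete, here is a minimal sketch (hypothetical code, not from any real project) of the style this forces on you: log records packed into a direct ByteBuffer by hand, with field offsets as bare integers the type system knows nothing about:

    import java.nio.ByteBuffer;

    // Sketch: fixed-layout "log records" stored off-heap, invisible to the GC.
    public class OffHeapRecords {
        private static final int REC_SIZE  = 16; // 8-byte timestamp + 4-byte level + 4-byte msgId
        private static final int TIMESTAMP = 0;  // field offsets tracked by hand
        private static final int LEVEL     = 8;
        private static final int MSG_ID    = 12;

        private final ByteBuffer buf;

        public OffHeapRecords(int capacity) {
            // allocateDirect places the data outside the Java heap
            this.buf = ByteBuffer.allocateDirect(capacity * REC_SIZE);
        }

        public void put(int i, long timestamp, int level, int msgId) {
            int base = i * REC_SIZE;             // zero allocations per record
            buf.putLong(base + TIMESTAMP, timestamp);
            buf.putInt(base + LEVEL, level);
            buf.putInt(base + MSG_ID, msgId);
        }

        public long timestamp(int i) { return buf.getLong(i * REC_SIZE + TIMESTAMP); }
        public int  level(int i)     { return buf.getInt(i * REC_SIZE + LEVEL); }

        public static void main(String[] args) {
            OffHeapRecords recs = new OffHeapRecords(1_000_000);
            recs.put(0, System.currentTimeMillis(), 3, 42);
            System.out.println(recs.timestamp(0) + " level=" + recs.level(0));
        }
    }

The records never touch the GC, but every offset is an unchecked integer: get one wrong and you silently read garbage. That is exactly the control-versus-safety trade described above.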
Update: to expand on the native-code point above: C++ solutions are typically closed source, but Rust and Go both have interesting open-source full-text engines: https://github.com/tantivy-search/tantivy and https://github.com/blevesearch/bleve