Logstash joins Elasticsearch (elasticsearch.com)
242 points by j4mie on Aug 27, 2013 | 57 comments



Logstash, Elasticsearch and Kibana are just fantastic. After being unsatisfied with a whole bunch of Logging As A Service providers (I tried loggly.com, logentries.com and splunkstorm.com) I spent an afternoon setting up Logstash and co and couldn't be happier.

There's a neat demo of Kibana here: http://demo.kibana.org/#/dashboard/elasticsearch/Logstash%20...

The only thing that isn't fully baked in with this stack is alerts (e.g. sending an email if a certain error log message comes in), but you can do that using Logstash filters and outputs, although there's no pretty UI.
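For illustration, a rough sketch of what that can look like in a logstash config (the grep filter and email output shown here are from memory and their options vary between versions, so treat the names as approximate; the address is made up):

    filter {
      # tag anything that looks like a fatal error
      grep {
        match => [ "message", "FATAL" ]
        drop => false
        add_tag => [ "alert" ]
      }
    }

    output {
      # mail out only the tagged events
      email {
        tags => [ "alert" ]
        to => "oncall@example.com"
        subject => "logstash alert: %{message}"
      }
    }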

There are some excellent Chef cookbooks for setting up Logstash and friends too:

- Logstash: https://github.com/lusis/chef-logstash

- Elasticsearch: https://github.com/elasticsearch/cookbook-elasticsearch

- Kibana: https://github.com/lusis/chef-kibana


it is worth noting there is a Node implementation of logstash.

https://github.com/bpaquet/node-logstash

It is "logstash compatible" (at ElasticSearch, so it works with Kibana) and in my experience very easy to work with, and probably a lot lighter weight than the JRuby version.


oh neat. thanks for the link -- had never run across it before.


For anyone who can't immediately see the significance... this is Elasticsearch's entry into real-time log analytics. There is plenty of room for innovation and financial opportunity in this area, given the success of the $5-billion-valued Splunk along with companies like SumoLogic and LogLogic.

What's most interesting is that Elasticsearch seems like a completely open source (and widely used) alternative to a product that Splunk charges close to Oracle-level pricing for.

Shameless plug: if you're looking for an opportunity at a well-funded, true real-time analytics company in Silicon Valley, feel free to ping me. There's lots of exciting and fun work to do in this area.


The one thing Splunk has going for it over ES is how few resources it requires to work at scale.

I needed 12 ES boxes for every one Splunk box to handle the 100MB/day log load of my system, and even then they ran at high load, searches often failed, and in some cases it took hours for the indexer to catch up.


This experience sounds especially bad. Sorry about that.

As mentioned in another comment in this post, I was doing 300GB of data per day with an elasticsearch cluster of 7 nodes (16 cores & 16GB RAM per node), and load was around 5-10% CPU utilization.

100MB/day is pretty small in terms of log data, I think. If you attempt this again, please invoke the community (elasticsearch's is great!) and see if we can assist you in figuring out what's busted.


We built a similar log-searching system using SenseiDB at LinkedIn. Splunk was outrageously expensive.

It turns out that Lucene-based systems are pretty good at information retrieval and aren't shackled with all the OLTP requirements most databases have.


You probably want to include contact details somewhere. Your profile has none. (I'm not looking for a job right now, but I'm interested in the area.)


Ah... I thought the e-mail on my profile page showed up to other users. You can reach me at john.kutay@gmail.com


if you're looking to join Sumo Logic, please feel free to ping me ;)


logstash + elasticsearch are pretty amazing. however, if you are generating a high rate of log entries you may want to consider using mozilla hekad instead (http://hekad.readthedocs.org/en/latest/). on our servers logstash was running around 20% CPU during quiet periods while hekad was running around 1-2% CPU. during busy periods i think logstash was going up to 100% CPU while hekad was sitting around 20-30% CPU.

hekad is written in go, which compiles down to native code, while logstash is written in jruby, which is not the most performant runtime.


I don't know what bottlenecks you had when you were observing high resource usage in logstash, but, in general, if there's a performance problem, it is a bug, and we can fix it.

The next release of logstash (1.2.0 is in beta) has a 3.5x improvement in event throughput. For numbers: on my workstation at home (6 vcpu on virtualbox, host OS windows, 8gb ram, host cpu is FX-8150) - with logstash 1.1.13, I can process roughly 31,000 events/sec parsing apache logs. With logstash 1.2.0.beta1, I can process 102,000 events/sec.

Processing speed will vary greatly by what you are doing with your events and it doesn't make sense to generalize performance characteristics globally, especially with a metric that, alone, doesn't really tell me much (cpu utilization).

If it's slow, it's a bug. We can fix it. :)

Further, you can use hekad with logstash and with elasticsearch (one or both together, it doesn't matter).

In terms of problems solved, logstash helps solve transport and real-time processing problems. In cases where the logstash agent is too resource intensive, the logstash community offers many alternatives on this site: http://cookbook.logstash.net/recipes/log-shippers/

The community (myself included) is very interested in helping logstash be a success for its users, so if you do see performance problems, things that behave weirdly, or anything strange, it's probably a bug, and we can fix it.

The short version of all of this is captured by the project principles listed in the logstash readme: https://github.com/logstash/logstash/#project-principles

<3


that's good to hear. it could also have been a plugin that we were using that slowed things down. our log files are in csv, so i wrote a plugin that uses ruby csv to parse lines and split them into key-value pairs based on a String->List[String] hash we have. so it might be that the go csv parser has much better performance than the ruby csv parser.
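(for reference: roughly what that parsing step does, expressed with logstash's built-in csv filter - a hedged sketch, and the column names here are made up)

    filter {
      csv {
        # assumed column layout; replace with your real field names
        columns   => [ "timestamp", "user_id", "action", "duration_ms" ]
        separator => ","
      }
    }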


It sounds very likely that it was Ruby that consumed so much of the CPU.


So does heka do the same job as the central logstash server, or does it offer a persistence layer too?

Did you find a usable GUI or web interface to view the logs?


Heka can output events into elasticsearch like logstash. Then you can also use Kibana for the UI.


Another possible log shipper is nxlog; it compiles to native code and does not have any noticeable impact in terms of CPU or memory usage on my various low-end servers.

http://nxlog-ce.sourceforge.net/


I used to use nxlog to collect Windows AD logs to a Linux server. Often, it would end up deadlocking (2-3 times/week) on the Windows side and would stop shipping the logs. It was very useful to have the logs, but I'm very glad we were able to replace it.


Do I have to buy the commercial version to get a web interface or GUI to analyze or browse the logs?

http://log4ensics.com/


Only if you want.

On my servers I use the open source version of nxlog to collect various logs and forward them to a central nxlog server, which in turn feeds logstash. Behind logstash I have configured elasticsearch as storage and I use kibana as a GUI to search and browse.


We've had no issues with rsyslog, which already comes packaged in ubuntu. It runs fine on micro instances on AWS, even at heavy workloads.


What kind of throughput are you seeing on your cluster, in terms of messages-per-second?


on one of our machines that generates logs we do around 2400/s when the application is under heavy use. we have 9 machines that generate logs but they all generate different amounts. we are mostly using heka for generating stats from log files because we are too lazy to instrument the code and we have excellently detailed and formatted logs :) but we do have some logstash machines still pushing stuff to elasticsearch for low traffic applications we run.

we found that when using logstash even just for pushing stats to statsd it was not performing well enough. i've experimented with hekad pushing to elasticsearch on our staging cluster and it performed well enough, but we had weird problems showing up in nagios when we were using logstash+elasticsearch in production (checks were timing out even though we were seeing no degradation of performance on the servers). because of this it is quite difficult to get any kind of central log pushing into production. :(


I'm confused. Can someone explain to me why this is so obviously interesting, yet not worth discussing, that it stands - as of 2 hours after submission - at 75 points with zero comments?

Honestly, I've never heard of either company, although I obviously wish them the best of luck. Am I just out of touch?


Logstash + Elasticsearch + Kibana is the biggest thing in opensource operational tools since Nagios.


I'd put configuration management systems (puppet, chef, etc.) in between, but otherwise I agree. These were the tools that made a huge difference.


Maybe I am too traditional... but I like KISS when it comes to this kind of thing.


Logstash, ES, and Kibana actually are more KISS than any other log searching setup I've tried.

Except for grep of course.


To understand the interest, you need to understand the moving parts:

Logstash is a sort of pipeline for data you want to log: you can define multiple inputs, transform/filter the data, and then define multiple outputs.

Example 1: read in your apache logs (input), attach geoip data (transform), and then push the resulting data to elasticsearch (output).

Example 2: read from syslog (input), grep on the input to ignore certain files (filter), then push to graphite for graphing (output).
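For Example 1, a minimal config might look roughly like this (plugin options abbreviated and from memory, so check the docs for your version):

    input {
      file { path => "/var/log/apache2/access.log" }
    }

    filter {
      # parse the apache line into named fields
      grok { match => [ "message", "%{COMBINEDAPACHELOG}" ] }
      # attach geo data based on the client IP field grok produced
      geoip { source => "clientip" }
    }

    output {
      elasticsearch { host => "localhost" }
    }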

You can have multiple inputs, multiple transforms/filters, and multiple outputs. You can also chain logstash instances together, so you can have "roll up" logs. Logstash itself is a bit heavy in terms of CPU/RAM (it is written in JRuby and runs on the JVM), so there are a few lighter-weight "shippers", and you can ship into a Redis instance to proxy events.
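To sketch that shipper/broker pattern (host names invented, options approximate): a lightweight agent on each box just tails files and pushes raw events into Redis, while a central logstash pulls from Redis, does the heavy filtering, and indexes into Elasticsearch:

    # edge node: ship, don't parse
    input  { file { path => "/var/log/app/*.log" } }
    output { redis { host => "logbroker.example.com" data_type => "list" key => "logstash" } }

    # central node: pull from redis, filter, index into elasticsearch
    input  { redis { host => "logbroker.example.com" data_type => "list" key => "logstash" } }
    output { elasticsearch { host => "localhost" } }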

Elasticsearch is a java-based search engine with a great REST API and a _lot_ of features. It is built on top of Lucene. It doesn't have a built in GUI. It also scales out super easily.

Kibana is a front-end to Elasticsearch which lets you search/visualize your log events.

OK, those are the parts. Why this is interesting: as other commenters have pointed out, this is a powerful combination for understanding your log data. You can ship logs from apps, services and hosts, visualize what's going on, search, correlate, etc.


Log file analysis is a big deal. This one company (Splunk) alone is worth 5 billion dollars:

http://www.google.com/finance?q=splunk&ei=VwsdUqi2L5qglwON4g...


There's always something to learn and there's always some technology or company we don't know (yet).

Of the three, you could at least have heard of Elasticsearch: it is a general-purpose search server based on Apache Lucene, like its cousin Apache Solr (maybe you know it), but with a REST API, JSON support, etc.

You can use it for searching stuff, and one (but not the only) scenario is logging. In this case you use some other software (like Logstash, but I think there are others too) to collect logs from different sources (e.g. syslog for operating systems, ruby apps via gems, raw TCP, etc.), and according to a friend of mine who uses it for this purpose it's very good: fast, easy to use, easy to scale, etc.

N.B. I invite anyone more expert to elaborate on/correct what I've said; I've just used Apache Solr in the past and never tried ES.


I think any of Logging/Monitoring/Metrics at scale can be thought of as Chicken & Egg problems.

They are important, hard to do well and have a bad habit of only causing issues which swallow engineering time when you are firefighting furiously trying to scale core services.

That's why, as someone pointed out separately, Splunk is a $5Bn company, and people who have had these problems previously are very excited by this news.

(It's also why StatsD&Graphite/OpenTSDB, Riemann/Sensu etc etc are all super interesting)


This is great news. Our centralized logging system at Semantics3 (https://semantics3.com) is built using Logstash+Kibana+Rsyslog+ElasticSearch. Running off a single EC2 large instance, it has been able to seamlessly aggregate and process logs from about 200-300 instances, processing on average about 15 GB of log data. We hit some performance bottlenecks (particularly with elasticsearch) when our number of instances went beyond the 300 mark, but that should get fixed once we shard and distribute ElasticSearch.

Looking forward to some really tight integration between Logstash, ES and Kibana.


Logstash is awesome. We use it at Swiftype to index all our logs, and it's super helpful for nailing down support requests and bugs (using Kibana).

Since you can access the logs via the Elasticsearch API, we made users' recent logs available to them in our dashboard: https://swiftype.com/blog/api-logs.html


I wonder how all this compares to Graylog2? (http://graylog2.org/)

Those guys are meant to be releasing a new, revamped version at the end of October; from the screenshots and videocasts it looks pretty good:

https://www.facebook.com/graylog2


For people using this, I'd be interested to know what kind of throughput you're seeing and your cluster size - I'm trying to find something that can handle upwards of 100k small messages per second for a near-realtime analytics platform, and although this is a bit left-field (compared to Cassandra, HBase etc...) it could be a fit.


At my last job (prior to joining elasticsearch), I had a cluster of 7 machines (16 cores, 16gb ram, 2TB raid1), each running logstash and elasticsearch.

The event rate going into this cluster was about 5000 events/sec on average (burst up to 10,000 events/sec sometimes).

During a maintenance (two machines going offline for disk repairs), I benchmarked the surviving 5-node cluster at 88,000 events/sec peak performance.

In terms of capacity planning, this means that we could have a 9x increase in normal event load and still not need to grow the cluster's processing capacity.

Persistent storage is another story. We stored about 300GB/day of events, getting us roughly 45 days of data retention before we would run out of space (2TB * 7 nodes / 300gb/day; roughly 45 days). I'm working on improving storage efficiency of logstash and elasticsearch, too, so retention should improve greatly in the long term.

For other experiences, it's useful to invoke the community and ask what others have done - the #logstash IRC channel on freenode is very active, as is the logstash-users@googlegroups.com mailing list.

Hope this helps!


Thanks for the detailed reply! My use case is a stream of distinct, ordered events identified by a UUID, where the first event makes up about 95% of the volume; that is, we don't often receive subsequent events with the same UUID.

The initial event and any subsequent ones tend to arrive close together in time, so the challenge is to find something that can handle a high insertion rate, a relatively low update rate, while providing fast aggregations suitable for charting in a web-frontend. In Riak, Couchbase or HyperDex we'd use a secondary index and do our own math, but Elasticsearch is attractive because it appears to support the kind of queries we're interested in out of the box, in addition to having a good reported write-rate.

Persistence is less of an issue, because after a short period of time (a couple of hours) we would summarise the events into our analytics DB (Infobright) and so we could set a TTL on the data stored in Elasticsearch.

Again, thanks for the response and I'll check out the mailing-list and IRC channel.

Edit: Grammar


What's the raw scale of input data for your 300GB/day of stored events? (assuming that's 300GB on disk stored in Elasticsearch)


I think it was roughly 300 million events/day (1kb per event). There is some overhead incurred by logstash (turning a log into json, parsing it into fields) and by elasticsearch (analyzing/indexing data).

In practical terms, and by way of example, a plain text apache access log, fully parsed by logstash (breaking out fields, etc), has historically bloated by quite a bit (6.2x I have measured). Lately, however, with improvements to logstash, better default settings, and elasticsearch being awesome, the 'inflation' number gets down to something more like 1.5x - which isn't bad considering all the awesome you get with it.

Long term, I am working towards making the 'raw data to stored data' ratio something less than 1x.

You can see some experiments I did a year ago on this: https://github.com/jordansissel/experiments/blob/master/elas...

I will repeat these experiments after the next release of logstash, and I expect storage ratios to improve significantly.


If you're taking feature requests, a plugin to archive to S3 would be really nice for long-term data retention. And more props to the lumberjack-go port.


Logstash is really great and Jordan is approachable and very helpful. To all interested, I recommend joining their IRC channel (#logstash on Freenode) and talking to the people there a bit.

Congrats :)


I'm currently evaluating elasticsearch and riak for RT analytics of large amounts of data. Does anyone have similar experience? Maybe even Cassandra; I haven't touched it seriously yet.


ElasticSearch itself should be very good now since they have moved to Lucene 4.0, which brought in a lot of improvements in memory usage.

I evaluated elasticsearch for RT analytics. It works wonders for point queries, where your result set is going to be small. It didn't work well for aggregate queries which need to scan a lot of data. The biggest problem was the field cache in Lucene: almost all our queries needed to do faceting, which had a big impact on the field cache.

Also, I don't know about Riak, but in ES the joins you can do are very limited.


I'll do extensive testing, but I need to scan a lot of data (aggregate basically). I'd be comfortable even with index size in multiples of data size if it delivered RT queries. Have you evaluated anything else?


We also checked mongodb. We dropped it mainly because index size was getting too big.

If your data is read-only then Cloudera Impala is worth a try. It's really fast.


I was looking at Impala (Cassandra) as well as keeping an eye on Drill's progress. My data is write-only in the ETL stage, so it seems it could be the right way. Lots of testing ahead! - thanks


When looking at elasticsearch, remember that it is a secondary datastore used for indexing, searching, querying, etc., not a durable long-term data store. You'll need another system for that, along with a system for re-feeding elasticsearch.


Is this really the case? I realize you probably can't express the equivalent of arbitrary SQL queries in elasticsearch but what would prevent it from being used as a primary NoSQL datastore?


The primary reason at the moment is that elasticsearch currently has no backup/long-term durability story. They plan on something for 1.0, but that doesn't help with the current situation.

Also we've suffered data loss on several occasions with elasticsearch. This has been getting better but is still a concern. Having the external long term datastore and a handy import method made these small hiccups.

Lastly, IMO elasticsearch works best, particularly when working with log-like data, as a rolling window view into the data. This keeps your elasticsearch cluster to a reasonable size (saving $$) while keeping the ability to re-load old data for exploration when you need it.
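To make that rolling window concrete (a hedged sketch; the per-day index naming below is, as far as I know, what logstash's elasticsearch output defaults to, but check your version):

    output {
      elasticsearch {
        # one index per day, e.g. logstash-2013.08.27, so old days can be
        # dropped wholesale with a cheap index delete once they have been
        # archived or summarised elsewhere
        index => "logstash-%{+YYYY.MM.dd}"
      }
    }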


Both Logstash and elasticsearch are great - but they both suffer from the same flaw: they're a pain to deploy and it's a pain to manage their packages.


With logstash, I aim to make it as easy to deploy as possible. That is, in part, why the releases are self-contained jar files with all dependencies built-in (except for java itself). We also started working on shipping rpm/deb packages with recent releases.

Like I always say, if it's hard to use or appears to have major flaws or pains, it's a bug, and we can fix it. Let us know! :)


If you're crazy like me and run elasticsearch on Windows, I can't recommend elasticsearch-setup [1] highly enough. Combined with node discovery via the EC2 API [2], it's been rock-solid.

[1] http://ruilopes.com/elasticsearch-setup/

[2] http://www.elasticsearch.org/guide/reference/modules/discove...


This space is heating up. Cloudera is building a similar stack with Solr - http://www.cloudera.com/content/cloudera/en/campaign/introdu...


This is great news as well. At Wildbit we have a dedicated logging server consisting of Rsyslog, ES, LogStash and Kibana3. It's been improving considerably each month.


I love Logstash+Kibana+Elasticsearch. Holding 410 million log files in a 10 node cluster! Congratulations Jordan!


this is the bestest news i've heard in a long long time ...



