Announcing Graylog v1.0 GA (graylog.org)
147 points by lennartkoopmann on Feb 19, 2015 | 62 comments



Oh maaaaan the old graylog2 site had probably my /favorite/ branding in all of software. The heading was "Manage your logs in the dark and have lasers going and make it look like you're from space."

https://web.archive.org/web/20130302051347/http://graylog2.o...

Sad to see it's gone all "professional" now :(

The web service had some hilarious bits too. Does it still say "enraging gorillas... | mounting party hats" etc when you log in?


I miss that too. Also the Party Gorilla was fantastic. Either way, glad to see the project has grown!


"Users who tested the beta and release candidate versions of v1.0 reported huge improvements in performance and stability. Some of them were unable to crash the system no matter how hard they tried."

This statement from the page makes me conclude that others were indeed able to crash the system, perhaps without even trying too hard.

My guess is that the statement should be re-worded.


I was able to crash the system without even trying.

After waiting for a ~500MB Docker image download, running the container halted the host OS X machine while it looped with errors (failed API connections, failed to load SIGAR, ...).

The screenshots looked great, but the first-steps experience was a deal breaker.


We've tested the Docker image on Mac OS X with boot2docker, but of course there are always things that can go wrong.

It would be awesome if you could help us improve the Docker image by creating an issue at https://github.com/Graylog2/graylog2-images/issues with the error messages you've encountered when starting the container.


Maybe requiring a full VM when a standalone self-contained application will do is not the winning strategy.


Oh, we got both kinds. We got country and western. ;-)

Seriously, the Docker image and the VM images are there to help people get started and quickly try out Graylog. We also offer regular OS packages in DEB and RPM formats (supporting Debian 7, Ubuntu 12.04 and 14.04, and CentOS 6) and a quick setup application (for a simple demo setup of Graylog and its dependencies). So choose your poison and be happy!


I just tried this with boot2docker on an OS X machine (Yosemite) and it went very smoothly.


That is a very good point. We are in the process of changing that. :)

The intended meaning is that with v1.0 it is finally very hard to crash the system by overloading it. Previous versions were easier to crash, for example by filling up memory during load spikes.


Hurrah - good to see that the Elasticsearch dependency is no longer locked to a rather old version (that bit me a few times when setting up a cluster last year). I'm still a bit wary about having to spin up MongoDB for it as well, though…


The MongoDB requirement gives me pause, too. I've wanted to give ToroDB (https://github.com/torodb/torodb) a try for a while now, and I think I'm going to see if Graylog will talk to it, just for grins.


Please let us know how it worked out! ;-)


So is this in the vein of Splunk, say? My company also uses logentries, and has been quite happy with it, but a hosted open-source tool would be great.


Yes! Let me know if we can help with any questions.

See also: http://www.infoworld.com/article/2885752/log-analysis/open-s...


Is there any way to not use MongoDB in the backend?


I had the same question. Why are they using MongoDB when Elasticsearch can do the same job, maybe even better?


We don't want to depend on ES too heavily, as Graylog should still work when ES has problems. ES is currently the default output, but that may change in the future for special use cases.

That being said, we're aware that many people don't like the dependency on MongoDB and we'll work on that.


Is Graylog effectively a single product that does the whole thing? My dismay with the ELK stack is you are effectively juggling 3 separate products with different release cycles.


Yes! We have put a lot of effort into making this one thing.

The graylog-web-interface connects to the graylog-server REST APIs and that is it. You can manage and monitor the whole system from the graylog-web-interface.

Both components are always released together.
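
If you're curious, talking to that REST API is plain HTTP. Here's a rough sketch in Python (the port, endpoints, and credentials assume a default setup; check your graylog-server config for the real values):

  # Rough sketch: querying the graylog-server REST API directly.
  # Assumes the default REST listen port (12900) and placeholder credentials.
  import requests

  BASE = "http://127.0.0.1:12900"
  AUTH = ("admin", "admin")  # placeholder, use your own credentials

  # Overview of this graylog-server node
  print(requests.get(BASE + "/system", auth=AUTH).json())

  # Count messages from the last 5 minutes matching a query
  resp = requests.get(
      BASE + "/search/universal/relative",
      params={"query": "source:your-app", "range": 300},
      auth=AUTH,
  )
  print(resp.json().get("total_results"))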


Finally, less flexibility!


The good thing about separate projects is that you can reuse Kibana in other projects, e.g. www.packetbeat.com


So there's Graylog, then there's Graylog2, then there's the new-and-improved Graylog.... If someone forks it will there be a new-and-improved Graylog2?


Yeah, I am confused: is this different from Graylog2?


We decided to drop the "2" from the name to avoid confusion with "Graylog(2) 2.0". Point taken here; the initial "2" was not a good idea :D

(The documentation may need a "History" page)


No Graylog3?


As Men in Black 3 wasn't that great and Star Wars: Episode 3 also dragged in places (don't get me started on Lethal Weapon 3!), we tried to avoid that. Let's see how Ghostbusters 3 works out and maybe (but just maybe) we'll think about releasing Graylog3. ;-)


"Logbusters." I ain't afraid of no log!


What other pieces of software are there in this class? Who are the competitors?


I've used Splunk in the past, which is very good.

Their search interface and field extraction let you perform complex queries on your logs. We used the following query to identify users that were having a bad time on the site, by counting slow queries by user and adding them up:

"slow query" source="/var/log/application.log" | rex field=_raw "in (?<time>\\d+)ms" | search time>2000 | rex field=_raw "User: (?<login>[^\\s]+) " | top login

You can do much more complex queries--it's better to think of Splunk as a temporal database than a log aggregator.

Does anyone know if Graylog or other logging solutions can do the same thing? Splunk is amazing, but it's annoying to manage the infrastructure and it's crazy expensive.


Splunk and ElasticSearch + LogStash + Kibana (ELK) are two popular solutions in the same space.


I've tried and used SumoLogic. It may not be appropriate for some companies because everything is sent to and analyzed on their cloud infrastructure, but it is a very good product.


Some folks in the Fluentd community use Graylog2+Fluentd as an alternative to EFK (Elasticsearch, Fluentd, Kibana). See http://www.fluentd.org/guides/recipes/graylog2


So what is it exactly?


A log management tool: You send all your local log messages/files to a central place (Graylog) and have unified searching, filtering, monitoring, alerting, forwarding, ...

All open source.
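
To make that concrete: a log message can be as simple as a small JSON document pushed to a Graylog input. A rough sketch in Python, assuming a GELF UDP input listening on the default port 12201 (the field values are made up):

  # Rough sketch: sending one GELF message to a Graylog UDP input.
  import json
  import socket

  message = {
      "version": "1.1",
      "host": "web-01",                      # made-up source host
      "short_message": "User login failed",  # made-up log line
      "level": 4,                            # syslog-style severity (warning)
      "_user_id": 12346,                     # custom fields get a "_" prefix
  }

  sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  sock.sendto(json.dumps(message).encode("utf-8"), ("localhost", 12201))

Once messages like that arrive, the searching, filtering, and alerting all work on those fields.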


Call me stupid, but I've always learned better from examples than descriptions. So it's a log management tool. I can have my app write logs or files into Graylog, then I can handle the logs/files within Graylog much more easily than, say, if I were to write my own code to make some kind of sense with the log data. Am I understanding this correctly?

So I can maybe tell easily where users had problems with my app, or I can even use it as some kind of analytics program for my app?


I visited the site and read what I could before, but without already knowing what it does, it was pretty cumbersome to read through the entire website to figure out exactly what it does.

I was initially drawn in by the fact that it took five and a half years to get to 1.0. God knows how hard it must have been to work on it and perfect it all these years; heck, spending five months on an app is tough enough. I would have given up reading about it if it hadn't been for the five-years thing. I just think that, despite all it's capable of, if the descriptions were more in layman's terms it could reach a wider audience.

It actually turns out that this is exactly what I was looking for, since I'm about to get my app out, and if I hadn't asked the question, I would have just ignored it altogether.

While I'm very thankful for the additional explanations, I still think that a nice illustration or a simple video showing how it can solve real-world problems would help people understand it better.


Exactly. Think of this search query you could do:

  source:your-app AND http_status_code:>=500
This would give you a list of all HTTP 500s of "your-app".

Or:

  user_id:12346 AND http_status_code:>=500
... to see all the errors a user hit, after customer care calls you to report an error the user got. The stack trace is there immediately, without having to find the right log file on the right web server.


I've used SumoLogic to do log management at work, and it's been helpful in specific ways:

- We have a server that runs dozens of websites. When the load spikes, we can quickly get a count of recent log entries for all the sites. The site with the anomalously high number of entries is where we start troubleshooting. This could also be automated as "anomaly detection" that sends us alerts, but we haven't configured that yet--happens rarely.

- One of our servers got hacked. Running log searches helped us pinpoint when it happened, which site was "patient zero," and how the bad guys got in.

- We launched a new site and forgot the Google Analytics code, which we didn't catch right away. We were able to run a report from the server logs to approximate the traffic data that GA missed.

Having all the logs feed into a centralized service made it easier and faster to find the information we needed across a bunch of websites, as opposed to working directly with Apache log files.

We looked at using ELK (Elasticsearch, Logstash, Kibana) to do the same thing "for free," but decided we did not want to manage a complex software stack to help manage a complex software stack. :-) We'll take a look at Graylog, I'm sure, but there is something to be said for paying for this as a service--one less thing to worry about.


Correct... it's a way for you to index and search through logs and easily pull out where issues might be happening.

It's like Splunk, but not as powerful.


So is it similar to Sentry?


Sentry is for monitoring exceptions and errors on your platform. While Graylog can do that too (without the aggregation, though), it can also monitor any log messages out there, not only errors.


Huh?

Sentry is great for monitoring all sorts of log emissions.


Ok, the last time I used Sentry was a long time ago, so maybe I'm out of date.

I suggest you try out Graylog and let us know about your findings! I'll make sure to look at Sentry again.


Roger. I'll give it a spin next time I'm doing a monitoring setup.

Thanks for the work!


I looked around a little bit on the web page, but the answers to my two questions were not immediately apparent to me. (Note: I might just be blind.) 1. Can I set up a second Graylog server as a failover, with automated recovery and log syncing when both nodes come back up? 2. Can I run it on cheap embedded platforms like a Raspberry Pi or BeagleBoard?


Yes, you can set up as many Graylog servers as you want and put them behind a load balancer or integrate any kind of failover you want.

We do not recommend running it on extremely small platforms like a Raspberry Pi, but a very small VM (we have OVAs and other virtual appliances ready) can already process a lot of messages.


> and put them behind a load balancer

Only if your load balancer supports UDP, which most don't. You'll most likely need to use DNS load balancing in this case unless you're sending GELF messages with TCP.
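
For reference, GELF over TCP is just null-byte-terminated JSON frames, so any plain TCP load balancer can sit in front of it. A rough sketch (hostname and port are placeholders):

  # Rough sketch: GELF over TCP, which ordinary TCP load balancers can handle.
  # GELF TCP messages are uncompressed JSON terminated by a null byte.
  import json
  import socket

  payload = json.dumps({
      "version": "1.1",
      "host": "web-01",
      "short_message": "GELF over TCP test",
  }).encode("utf-8") + b"\x00"

  with socket.create_connection(("graylog.example.com", 12201)) as sock:
      sock.sendall(payload)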


I'm evaluating Graylog as a piece in our monitoring infrastructure.

Does anybody have some experience using it in production?


I've used it, and have colleagues who have used it at a few companies. To get the most out of it, I would suggest setting up some sort of tagged logging. In Ruby, there's log4r + log4r-gelf to add additional context (e.g., the Rails request GUID), so that you can easily and quickly link related messages. If you lack that flexibility, Graylog's extractors work rather nicely too; it's just a lot harder to manage, IMHO.

It's really helpful being able to pull up all requests for a single user across the cluster. Or track the status of all components of a Sidekiq job. Or see which API requests are most popular. Etc.
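
The general idea works in any language: stamp every message that belongs to one request or job with the same correlation field. A rough sketch in Python over plain GELF UDP (field names and the input address are made up):

  # Rough sketch of tagged logging: every GELF message of one request carries
  # the same request_id, so related messages can be linked in Graylog.
  import json
  import socket
  import uuid

  sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

  def log(short_message, **fields):
      msg = {"version": "1.1", "host": "app-01", "short_message": short_message}
      msg.update({"_" + k: v for k, v in fields.items()})  # "_" marks custom fields
      sock.sendto(json.dumps(msg).encode("utf-8"), ("localhost", 12201))

  request_id = str(uuid.uuid4())
  log("request started", request_id=request_id, path="/checkout")
  log("payment authorized", request_id=request_id, user_id=12346)
  log("request finished", request_id=request_id, duration_ms=84)

Then one search on that request id pulls up every related message across the cluster.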


Yes. We've been using it in production for about 1.5 years to gather all of our syslog and our application logs. No problems at all, and we're up to about 20 shards of data now.

Honestly, I have no idea what we did before we started using it. It's an absolutely essential tool.


Yep! We do about 5k events/sec and it holds up pretty well as long as you throw enough memory at it (it's a Java app). Make sure you have a cluster, though, if you're going to run at those kinds of levels. Feel free to respond if you have any other questions. I'll be testing out the 1.0 release myself today.


+1

The solution seems neat. It'd be nice to know who is using it in production, how big their environments are, etc. I couldn't find much information online.


We're receiving roughly 200 msgs/s in our environment. We have it set up in an active/active environment across two data centers. It's been rock solid for us. It takes a little bit of time to get certain applications to send messages properly, but once you do, it's pure gold.

We are using NXLog on our Windows servers to send event log messages in GELF format. This lets us really dig in and search event logs much more easily than we normally could in Windows.


We have been using it in production, it rocks! Very powerful tool and the company behind it is full of great, helpful folks.


We are using it in production and it works very well: ~200 clients sending events, with a total of 63,996,350 events.


Can I just have one thing that does event logs and SNMP and uptime and alerting for internal and external devices, in a secure way, all in one package? I'm so tired of having to install 30 different pieces of software just to accomplish 3 things.


Could anyone give an executive summary of the differences with the ELK stack?



Perfect! Thank you.


Looks really great. Has anyone compared this to CloudWatch Logs? When I initially looked at CW Logs, it seemed fairly incomplete compared to Splunk.


Congratulations to Lennart and his team!


So... it runs in a JVM? ok....



