This article was written in September 2013, just a few days before AutoRef - the company the author was working for at the time - announced they were shutting down [1]. Does anybody know if development continued on the project?
I've spoken with Eric (the engineer who spearheaded this), and while the code exists, it needs some work before release. It hasn't been worked on since the closure of AutoRef.
Eric and I will be working on getting a release ready within a week. If you'd like to be notified when a release is ready, please sign up for this mailchimp list and we'll email you: http://eepurl.com/WlqXz
My first concern is what happens to daemons when they can't write to the FUSE mount because this program is down or misbehaving. I believe many daemons would misbehave, block, or die. That scares me but maybe I'm being paranoid.
I think all the arguments against log shippers are pretty weak, and the workarounds are simple, especially if the alternative introduces any instability.
The failure mode is very different. When the log partition's disk is full, log writes will fail, which your favourite logging library might decide to ignore. A misbehaving FUSE daemon can hang applications that write to its files assuming they're on a local disk.
A smart logger can have a writer thread and a buffer and decide to drop log lines if the logger is too slow. But LoggerFS is meant to be a drop-in solution for legacy code, so the concern is perfectly valid.
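A minimal sketch of that kind of smart logger in Python (the class name, queue size, and sink interface are all illustrative assumptions, not anything from LoggerFS): a background thread drains a bounded queue, and the logging call drops lines instead of blocking when the queue fills up.

    import queue
    import threading

    class DroppingLogger:
        """Hypothetical non-blocking logger: buffers lines in a bounded
        queue and drops them when the sink is too slow, rather than
        letting a hung file or socket stall the application."""

        def __init__(self, sink, max_lines=10000):
            self._queue = queue.Queue(maxsize=max_lines)
            self._sink = sink  # any object with a write() method
            threading.Thread(target=self._drain, daemon=True).start()

        def log(self, line):
            try:
                self._queue.put_nowait(line)  # never blocks the caller
            except queue.Full:
                pass  # queue full: drop the line (could count drops here)

        def _drain(self):
            while True:
                line = self._queue.get()  # only the writer thread blocks
                try:
                    self._sink.write(line + "\n")
                except OSError:
                    pass  # sink is down; line is lost, app keeps running

With something like this, a hung FUSE mount only stalls the writer thread; the application itself keeps running and merely loses log lines.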
Yep, one could add a watchdog. And then a status-reporting and monitoring system to log the log system's failures. At some point in that complexity, just reverting to rsyslog would seem like a breath of fresh air, even with its shortcomings.
I have been there: over-designed myself into a hole, then looked back at what started as a simple 3-step idea and had turned into an exponentially growing number of branches and corner cases that have to be handled.
Sometimes it is easier to just say "ok, this was not a good design" and throw it away. I have done that, and looking back it was a good decision.
To be clear, I do think it's a valid concern -- my point was that "what if the log location isn't writable / the disk is full" is a pretty basic concern that any daemon should account for (what happens if someone somehow does mount -o ro /var/log?). And writes failing shouldn't be that much of an alien error condition.
In sum, if fuse/loggerfs together can guarantee that every possible state will result in a sane (error) state on writes -- this shouldn't be worse than a disk-dying/fs-corruption/disk-full type of failure.
It will obviously be another point of failure.
On a side note, mounting this under e.g. /var/log, and then having a strong guarantee that failure will result in an unmount, revealing a writable /var/log, seems like the best of both worlds. Would probably have to HUP all writers though...
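You could approximate that guarantee from outside the filesystem with a watchdog. A hedged sketch in Python (the mountpoint, the timeout, and the use of fusermount are my assumptions, not anything LoggerFS ships):

    import os
    import subprocess
    import threading
    import time

    MOUNTPOINT = "/var/log"   # hypothetical mount location
    PROBE_TIMEOUT = 5         # seconds before declaring the mount hung

    def probe():
        # statvfs on a hung FUSE mount can block forever, so run it in
        # a thread we can abandon after a timeout.
        t = threading.Thread(target=os.statvfs, args=(MOUNTPOINT,), daemon=True)
        t.start()
        t.join(PROBE_TIMEOUT)
        return not t.is_alive()  # True if the probe returned in time

    while True:
        if not probe():
            # Lazy-unmount, revealing the real writable /var/log underneath.
            # Writers would still need a HUP to reopen their files.
            subprocess.call(["fusermount", "-uz", MOUNTPOINT])
            break
        time.sleep(30)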
These are two different cases, though. If a disk is full, the write fails, and software usually handles that well. With FUSE, depending on the state of the daemon, the write can just block indefinitely.
It talks about buffering in memory. That could be bad. I'm assuming/hoping they'd put some kind of upper bound on it; otherwise you could easily take out a server.
It needs testing, like anything new, but I think the concept is sound enough to warrant at least a modicum of attention. It's something I've been wanting for a while actually.
A little light on detail. Can someone please explain exactly how this works?
The best I can figure is it's a shipper replacement. So apps write to a log file as usual, but it's actually buffered in memory and pushed to a central server via a FIFO queue.
Given it's all in-memory, it will be small and transient, so you're completely relying on the central server to store it reliably. (Not a criticism, just trying to understand it.)
My first impression is that this is a fantastic and novel development for centralized log analysis.
Basically, it's a virtual filesystem that pretends to create files, but actually intercepts writes to log-style files in this virtual location and buffers them in memory, eventually shipping them off to a place where you can run central analytics on them (e.g. Splunk, Logstash, etc.).
The application doesn't know that it's writing to a virtual file: it just opens a descriptor, dumps a line or two every few seconds and continues along its merry way. The logs never touch the disk, which means that they don't contend for limited disk I/O bandwidth.
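For a rough idea of the mechanism, here's a toy version of the technique using the fusepy bindings (my own sketch, not LoggerFS's actual code; the shipping step and several FUSE callbacks like readdir are omitted, and the mountpoint is made up):

    import errno
    import stat
    import time

    from fuse import FUSE, FuseOSError, Operations  # fusepy

    class ToyLogFS(Operations):
        """Pretends to be a directory of log files, but keeps every
        write in memory where a shipper thread could drain it."""

        def __init__(self):
            self.buffers = {}  # path -> bytearray of unshipped log data

        def getattr(self, path, fh=None):
            now = time.time()
            if path == "/":
                return dict(st_mode=stat.S_IFDIR | 0o755, st_nlink=2,
                            st_ctime=now, st_mtime=now, st_atime=now)
            if path in self.buffers:
                return dict(st_mode=stat.S_IFREG | 0o644, st_nlink=1,
                            st_size=len(self.buffers[path]),
                            st_ctime=now, st_mtime=now, st_atime=now)
            raise FuseOSError(errno.ENOENT)

        def create(self, path, mode, fi=None):
            self.buffers[path] = bytearray()
            return 0

        def write(self, path, data, offset, fh):
            # The app thinks it wrote to disk; really we just buffered it.
            self.buffers.setdefault(path, bytearray()).extend(data)
            return len(data)  # report success so the writer carries on

    if __name__ == "__main__":
        FUSE(ToyLogFS(), "/mnt/toylogs", foreground=True)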
Presuming your daemons rotate their logs, what's the benefit of this over mounting a plain-old tmpfs over your logs folder? Either way, the shipper reads virtual files that are actually stored in memory, and which eventually get purged. It's just the thresholds that are different.
You'd have to be rolling your logs fairly often to avoid filling up tmpfs with log lines that have already been shipped. The advantage of this approach is that log lines that are shipped are no longer on the box at all.
Take with a grain of salt, since it's anecdotal, but I have seen logger hang when syslog rotates. Logger is really only 100% reliable when you are piping it the output of a command that runs in a bounded amount of time. If you have a real daemon it's always better to use syslog directly, and an inability to use syslog is maybe an indication of the quality of the daemon in question.
This article http://engineering.linkedin.com/distributed-systems/log-what... says that by the time you're done satisfying every requirement of a distributed logging system with multiple writers and readers and various reliability requirements, you've essentially rebuilt Apache Kafka.
As others have commented, in general, it seems simplest and best to use the remote features of rsyslog or journald rather than going through a filesystem layer.
But this looks like a fun project. Is there an advantage to this over using NFS for logs?
Started in 2007, last update in 2013. Not sure if it's the same thing, but the description is similar:
> LoggerFS is a fuse-based virtual file system that allows you to store log files from apache, syslog and more directly in a database instead of a regular file.
Think about the different use cases for logs. I've spent an insane number of hours thinking about it, unfortunately. Live data is useful for graphing or alerting on things. Prehistoric data which is made to be searchable is useful for troubleshooting, or doing forensics work. These use cases require widely varying types of technology to make them scalable, reliable and useful, where 'useful' is defined as saving me time and money.
I LOVE the idea of turning logging on its head!
This sounds really interesting, but I can't seem to ascertain what it actually does.
> The log data is buffered in-memory (potentially journaled for reliability) and sent over a configurable transport.
What are the options for this 'configurable transport'? What is/are the endpoint(s)? Does LoggerFS have facilities for storing and reading-back logs, or does it rely on other services for this?
The post only seems to /hint/ at answers to these questions.
Backend/Aggregator agnostic (includes multiple log transports)
Supports any Syslog-based log manager
Loggly, Splunk, Logstash, Rsyslog/Syslog-ng
ZeroMQ
NSQ transport – used internally at AutoRef.com
Generic UDP/TCP (see the sketch after this list)
And soon: AMQP and Redis (and later: Scribe? Fluentd?)
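For the generic UDP case, the sending side can be as small as this sketch (the collector address is made up, and real syslog framing is RFC 3164/5424; this shows only the basic <priority>message form):

    import socket

    COLLECTOR = ("logs.example.com", 514)  # hypothetical aggregator

    def ship_udp(line, facility=1, severity=6):
        pri = facility * 8 + severity  # syslog priority value
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.sendto(("<%d>%s" % (pri, line)).encode("utf-8"), COLLECTOR)
        sock.close()

    ship_udp("myapp: hello from a transport sketch")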
It deals with the specific problem of collecting the logs from the applications on your servers and shipping them to an aggregator (of which there are already many, e.g. Loggly, Splunk, Logstash).
Assuming one's software allows for that. Which a lot of software either doesn't do, doesn't do with sufficient configurability, does badly, or does without supporting your particular logging method.
As an example, nginx only recently gained the ability to log to syslog; Apache has a logging module but it's not exceptionally customizable if you wanted to log to, say, ActiveMQ, or to a custom service (unless you write a separate binary to accept logs on stdin).
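That separate binary can be tiny. A hedged sketch of such a piped-log forwarder (the collector address is made up; Apache would invoke it with something like CustomLog "|/usr/local/bin/forward_logs.py" combined):

    #!/usr/bin/env python3
    import socket
    import sys

    COLLECTOR = ("logs.example.com", 5140)  # made-up endpoint

    def connect():
        return socket.create_connection(COLLECTOR, timeout=5)

    sock = connect()
    for line in sys.stdin:  # Apache pipes each access-log line to stdin
        try:
            sock.sendall(line.encode("utf-8"))
        except OSError:
            # Connection died: reconnect once, drop the line if that
            # fails too, so Apache's logging pipe never blocks on us.
            try:
                sock = connect()
                sock.sendall(line.encode("utf-8"))
            except OSError:
                pass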
Yup. Given that those logging mechanisms also expose all the problems with file-based logging, you'd have a much more significant effect, with a lot less complexity, by fixing those deficiencies than by going this route.
This isn't a way of adding logging to your own software; it's a way of taking any software's logging (yes, even printf) and sending it to a remote server without having to modify the software directly.
So, basically, it's like systemd-journal except not actually like systemd-journal, significantly easier to use, and can use printf directly instead.
[1] http://www.bizjournals.com/pittsburgh/blog/techflash/2013/09...