
Don't forget the Pi (generally) uses flash. So seek speeds are irrelevant.

If you're writing the following in 1ms:

    Apr  4 22:50:59 heating kernel: [14572.940927] Unhandled prefetch abort: breakpoint debug exception (0x002) at 0x00008438
By my count that's ~1 kilobit.

That means you're writing to flash at ~1 Mb/s (~125 KB/s). That's slow, to put it mildly (about 5% of what it should be, according to http://ozzmaker.com/2012/12/27/how-to-test-the-sd-card-speed... ). I could see that kind of speed if every write caused a disk seek (although in that case you're probably using the wrong file system), but again, the Pi uses flash.
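The arithmetic above can be sanity-checked in a few lines. This is just the back-of-envelope math from the comment, assuming one ~120-character syslog line written per millisecond:

```python
# One log line of the shape quoted above, written in 1 ms.
line = ("Apr  4 22:50:59 heating kernel: [14572.940927] "
        "Unhandled prefetch abort: breakpoint debug exception "
        "(0x002) at 0x00008438\n")

bits = len(line) * 8        # ~1 kilobit, as the comment estimates
rate_bps = bits / 1e-3      # bits per second at one line per millisecond

# ~1 Mb/s, i.e. roughly 125 KB/s
print(f"{bits} bits per line -> {rate_bps / 1e6:.2f} Mb/s "
      f"({rate_bps / 8 / 1000:.0f} KB/s)")
```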

So what gives? Is there braindead sync behavior that's causing it? Is it hitting a pathological case in the flash controller? Or what?

(For that matter: is there a file system that is optimized for log files?)



Just read the second comment on the link you provided and you have the answer.


> Nowaday’s fast Class 10 cards achieve their high read and write speeds only by buffering in advance, assuming that the subsequent blocks will be read or written next. That means they inevitably sacrifice random read and write speeds. This effect is in the order of several magnitudes. For instance, a card with sequential read and write speeds of 10 MB/s might collapse to just 0.01 MB/s for random access.

Yow. Yet another example of over-optimization to benchmarks.

...Except that, even then, how exactly is appending to a log file repeatedly random access? Bad journaling?


This is much closer to random access than sequential access. Counter-question: how is appending to a log file repeatedly sequential access?

No metadata or other interaction required? Surely sequential writes benefit from larger buffers? Not to mention that a flash erase block is far larger than the appended message. You would need TRIM to keep that from being an extremely wasteful operation; I'm not sure whether typical Pi installations use it, but a quick Google search suggests it has at least been a problem. So writing that 1 kbit now means reading a whole block, editing its contents, and writing the whole block back, which is hardly sequential either. 1 Mbit/s is starting to seem pretty fast.
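To put a number on that read-modify-write cost: here's a rough write-amplification estimate, assuming a 512 KB erase block and a ~128-byte log line (real SD card geometry varies; both figures are illustrative):

```python
# Rough write amplification for a read-modify-write append on flash.
# Assumption: 512 KB erase block, 128-byte appended log line.
ERASE_BLOCK = 512 * 1024   # bytes the controller must read back and rewrite
APPEND = 128               # bytes of new data actually being added

amplification = ERASE_BLOCK / APPEND
print(f"each {APPEND}-byte append moves {ERASE_BLOCK // 1024} KB "
      f"(~{amplification:.0f}x write amplification)")
```

Under those assumptions, every one-line append shuffles four thousand times more data than it adds, which makes the observed ~1 Mb/s look much less mysterious.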

The only pathological case here is the "not optimal sequential write" case, which is certainly enough to kill any random USB thumbdrive or SD card. Just copying two files to a thumbdrive at once can cut performance by an order of magnitude compared to copying them one after the other.


This is where you benefit by having a file system (or disk controller) that isn't brain-dead.

Unfortunately, it seems like both the controller and file system are brain-dead on the RPi.

Namely, you have a cache that detects this sort of (relatively-common) append-only operation, and buffers it until it makes sense to actually write it (either because the disk is otherwise unoccupied or because you are close to the limits of your power buffer).

Alternatively, you have a file system that actually knows the advantages and limitations of flash. You can do append-only files efficiently on flash, without the whole read + write dance. "All" you need is a bit of scratch space and an understanding of how flash works: namely, you don't necessarily need to reflash a block to write to it, as long as you are only turning zeros into ones (or vice versa, depending on how physical bits are mapped to logical bits; multi-level flash is a little more complex, but still doable).

For instance, instead of having one filestamp, you reserve space for 16 filestamps, with two marker bytes at the beginning, all initially set to zero. You check the last bit set and use that filestamp. Congratulations: you can now update the file 16 times before needing to reflash the block. If that's not enough, you can extend the scheme arbitrarily (and you can do this dynamically, only for files that are updated often enough to warrant it). Metadata etc. can be handled the same way.

For something like a log file, you have, say, 16 slots recording how much of the block is occupied, plus a pointer to the next block (or null). Appending a log line means writing the new content and burning one occupancy slot: one write, no blocks being reflashed (two writes if/when you run out of space in the block).
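The slot trick above can be sketched in a toy model. This is a minimal illustration, not any real filesystem: it assumes NOR-style behavior where a programmed write may only flip bits 1 -> 0 and anything else needs a whole-block erase (the comment's "zeros to ones, or vice versa" direction flipped to match common hardware); `FlashBlock`, the slot layout, and the slot count are all invented for the example:

```python
# Toy model of "update without erase" on flash, as sketched above.
# Assumption: erased flash reads all-ones; programming may only clear
# bits (1 -> 0); setting a bit back to 1 requires erasing the block.

class FlashBlock:
    """A block that enforces the 1 -> 0 programming rule."""
    def __init__(self, size):
        self.data = bytearray(b"\xff" * size)  # erased state
        self.erase_count = 0

    def program(self, offset, payload):
        for i, b in enumerate(payload):
            cur = self.data[offset + i]
            if b & ~cur & 0xFF:               # would need a 0 -> 1 flip
                raise ValueError("write requires an erase")
            self.data[offset + i] = cur & b   # only clears bits

    def erase(self):
        self.data[:] = b"\xff" * len(self.data)
        self.erase_count += 1

SLOTS = 16      # 16 pre-reserved slots => 16 in-place updates per erase
SLOT_SIZE = 4   # bytes per stored stamp (illustrative)

def update_stamp(block, stamp):
    """Burn the next free slot instead of rewriting the whole block."""
    for s in range(SLOTS):
        off = s * SLOT_SIZE
        if block.data[off:off + SLOT_SIZE] == b"\xff" * SLOT_SIZE:
            block.program(off, stamp.to_bytes(SLOT_SIZE, "big"))
            return
    block.erase()                              # out of slots: reflash once
    block.program(0, stamp.to_bytes(SLOT_SIZE, "big"))

blk = FlashBlock(SLOTS * SLOT_SIZE)
for t in range(16):
    update_stamp(blk, t + 1)
print(blk.erase_count)  # 16 updates, zero erases
```

The 17th update is the first one that pays for an erase, which is exactly the amortization the comment describes.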

See, for instance, https://github.com/bnahill/FLogFS/ or https://en.wikipedia.org/wiki/F2FS


The Pi's sole purpose is not to write kernel log files. I'm sure that if you wanted to build a piece of hardware/software kit that did nothing but write sequentially to a flash drive, it would be blazingly fast.

But if you want log rolling, filtered output, serial output, non-flash file systems (NFS, etc.), and a general-purpose computing device where printk is the rarest of rare events (in the lifetime of a task), then you're going to lose some of those optimizations.


Nevertheless, writing (essentially) append-only files is something that happens often enough that most filesystems should cope with it reasonably well. And writing 20x slower than it could is nowhere near coping reasonably well.

(Also: note that you don't need to cope with nfs/etc. Flash is different enough from spinning rust that I have no problem with a filesystem being optimized for one or the other as opposed to both.)


I'm going to guess that the Pi uses ext3/4 which I'm assuming does pretty well with appending files on a spinning disk.

Yes, we need something tailored for flash, and the advent of SSDs has spurred some effort in that direction. However, the controller in an SD card is, for understandable reasons, not as advanced as the one in an SSD. One of the issues that seemed prevalent on the Pi was that some SD cards, when issued a TRIM command, just erase the flash immediately. That of course rather defeats the purpose and leads to bad performance (trying to optimize future writes by sacrificing current ones).

The recommendation that I saw was to schedule a pass clearing free space for when the Pi wasn't in active use.

The problem with the controller doing some of the work (which I guess is inevitable, given the different design needs) and the file system doing the rest is that when the filesystem is paired with a bad controller (but one that still does some kind of wear leveling), you're kind of screwed anyway, because neither implementation can rely on the other. So, just use an SSD instead?

I guess it's worse on the Pi, since it's somewhat trying to be a workstation on mobile hardware. And sometimes I get the impression that even phone makers can't tailor it well (Samsung's Android filesystem on older phones was apparently pretty awful, for instance). Android only got TRIM support in, I believe, KitKat? And only if the whole chain of device hardware and drivers supported it (so god knows which current devices actually do)... Learning that was quite a shock to me, and it's probably a large source of "my phone just keeps getting slower" problems. But who cares, if it just accelerates sales of new phones?


Or you have one filesystem that is geared towards dumb flash, and another that is geared towards smart flash.

I don't see why people try to design filesystems that do everything. As usual for hybrids, they try to do everything and as a result don't work well anywhere.

As long as you keep the limitations of dumb flash in mind (namely, that flipping bits in one direction requires an expensive block erase, while the other direction is cheap; things are a little more complex on MLC, but still doable), it's surprisingly easy to design a filesystem that does well on "dumb" flash.


This is a tangent, but you seem to be under the impression that filesystems grow on trees. I've been waiting for a modern COW Linux filesystem for almost a decade now. Maybe in a few years it will be mature enough that I can wait a few more years for it to become stable? (I'm looking at you, BTRFS) ;)

I guess BTRFS would fall into the "filesystems that do everything" category... Though at the other end of the spectrum you have filesystems that haven't been tested enough or aren't maintained.


FSs designed to run on dumb flash exist: https://encrypted.google.com/#q=site:lwn.net+nand+flash+file...

> I don't see why people try to design filesystems that do everything.

For general use, [0] I would rather use a FS that works on pretty much every media type [1] -and has reasonable perf- than to have to worry about whether $SPECIALTY_FS is actually tuned for $WEIRDO_BLOCK_DEVICE, or if I've failed to understand the particulars of either the FS, device, or both.

[0] That is, when getting the absolute best perf isn't a requirement.

[1] Except for like, tape, and optical RAM/WORM drives, natch.


I don't mind having a filesystem that works reasonably well on most devices. But I would consider "being able to append to a file reasonably quickly" as part of that "reasonably well".

Note that it's entirely possible to have a filesystem that's actually several different filesystems with a magic selector. (I mean, block sizes and reserved block percentages and inline inodes already are along those lines)


If your underlying device is slow -as SD cards are notorious for being for any use case other than picture or video storage and retrieval-, no amount of FS juju can help you.



