
These abstraction statements just aren't true.

Yes, you can architect a storage system this way. But 1) Even if you do, many, many, many high-performance systems are "on top" of a filesystem but don't actually use the filesystem for anything except perhaps as a block allocator. Consider databases.

2) Many, many object stores do not abstract on top of filesystems. Modern RADOS, the distributed object store, stores its local data in a local object store called BlueStore. BlueStore speaks directly to the block device; there's no filesystem involved.

3) Even if you did store part of your distributed object store data on top of a local filesystem, that's not necessarily an issue. HDFS does this. (HDFS, despite the "FS" in its name, is an object store as most practitioners understand them.)
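To make point 1 concrete, here is a minimal sketch of a database-style pager that uses the filesystem only to obtain one large file and then addresses it as an array of fixed-size pages. `PageFile`, `PAGE_SIZE`, and the method names are hypothetical and not taken from any real database; the sketch assumes a POSIX system where `os.pread` and `os.pwrite` are available.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size for this sketch


class PageFile:
    """Hypothetical pager: one big file treated as an array of fixed-size pages."""

    def __init__(self, path, num_pages):
        # The filesystem is only asked for space once, up front.
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT)
        os.ftruncate(self.fd, num_pages * PAGE_SIZE)
        self.num_pages = num_pages

    def write_page(self, page_no, data):
        # Positional write: the offset is computed by the pager, not the filesystem.
        assert len(data) == PAGE_SIZE and 0 <= page_no < self.num_pages
        os.pwrite(self.fd, data, page_no * PAGE_SIZE)

    def read_page(self, page_no):
        assert 0 <= page_no < self.num_pages
        return os.pread(self.fd, PAGE_SIZE, page_no * PAGE_SIZE)

    def close(self):
        os.close(self.fd)
```

Everything above the page level (indexes, free lists, caching) would live in the database itself; the file's name and directory entry are never consulted again after `open`.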




1) Yes, databases are just object stores with indexing and querying, which are an abstraction on filesystems, which are abstractions on block devices.

2) RADOS is an object store, which is an abstraction on BlueStore (effectively a filesystem and the replacement for FileStore), which is an abstraction on block devices.

3) HDFS is an object store, which is an abstraction on filesystems, which are an abstraction on block devices.

I'm not sure what your point is, because you just restated what I already said. They are abstractions, and they work just fine without any performance issues because that is the trade-off of having an abstraction.

What I also said is that emulating low-level layers on a higher-level interface (like a block device on top of a database or object store) will never match the original block device. What is untrue about this?
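The claim about emulating a block device on a higher-level store can be illustrated with a toy model. The sketch assumes an object store that only supports whole-object `get` and `put` (a simplification; real object stores may offer partial reads and writes), and all class names and sizes are hypothetical. Under that assumption, every 512-byte sector write becomes a read-modify-write of an entire object.

```python
SECTOR = 512
OBJECT = 4096  # assumed object size; real systems often use much larger objects


class ObjectStore:
    """Toy store: whole-object get/put only, counting the bytes it moves."""

    def __init__(self):
        self.objects = {}
        self.bytes_moved = 0

    def get(self, key):
        data = self.objects.get(key, bytes(OBJECT))  # missing objects read as zeros
        self.bytes_moved += len(data)
        return data

    def put(self, key, data):
        self.bytes_moved += len(data)
        self.objects[key] = data


class EmulatedBlockDevice:
    """Hypothetical block device layered on the toy object store."""

    def __init__(self, store):
        self.store = store

    def write_sector(self, lba, data):
        assert len(data) == SECTOR
        key, offset = divmod(lba * SECTOR, OBJECT)
        obj = bytearray(self.store.get(key))   # read the whole object
        obj[offset:offset + SECTOR] = data     # patch one sector
        self.store.put(key, bytes(obj))        # write the whole object back

    def read_sector(self, lba):
        key, offset = divmod(lba * SECTOR, OBJECT)
        return self.store.get(key)[offset:offset + SECTOR]
```

In this model a single sector write moves 8192 bytes through the store to change 512, which is the kind of amplification the statement is pointing at. How large the gap is in practice depends entirely on the store's actual interface.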


It appears that you want to make a very simple statement: "Abstractions tend to introduce overhead. If any software layers, an OS, or a network are added on top of a block device, I/O overhead will be introduced somewhere." As a conversational seed, I would wager most people would agree with that statement, in general.

One issue in this thread is that abstractions are concepts, not CPU instructions. In order to discuss overhead, one needs to reify the abstraction. For example, if you care about latency overhead, the block scheduler will definitely introduce overhead. But if you care about throughput, you probably /want/ abstractions like queues and schedulers.
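A toy model of the latency versus throughput point, with made-up numbers: each submission to the device pays a fixed cost plus a small per-request cost, and a queue may hold requests until a batch is full. The constants and the `simulate` function are assumptions for illustration only and do not model any real block scheduler.

```python
FIXED_US = 100    # assumed fixed cost per submission to the device
PER_REQ_US = 10   # assumed marginal cost per request in a submission
ARRIVAL_US = 200  # assumed gap between request arrivals


def simulate(n_requests, batch_size):
    """Return (mean latency, device busy time per request), both in microseconds."""
    device_free = 0
    busy = 0
    latencies = []
    for start in range(0, n_requests, batch_size):
        batch = range(start, min(start + batch_size, n_requests))
        arrivals = [i * ARRIVAL_US for i in batch]
        # The batch is submitted once its last request has arrived
        # and the device has finished the previous batch.
        begin = max(arrivals[-1], device_free)
        cost = FIXED_US + PER_REQ_US * len(arrivals)
        device_free = begin + cost
        busy += cost
        latencies += [device_free - a for a in arrivals]
    return sum(latencies) / n_requests, busy / n_requests


print(simulate(8, 1))  # (110.0, 110.0)
print(simulate(8, 4))  # (440.0, 35.0)
```

With these numbers, batching four requests quadruples mean latency while cutting the device time spent per request to about a third, so the same layer is "overhead" by one metric and a win by the other.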

> "What I also said is that emulating low-level layers on a higher-level interface (like a block device on top of a database or object store) will never match the original block device. What is untrue about this?"

Nothing is untrue about the sentiment of your statement. But from a practical standpoint, storage devices are useless pieces of junk without software. So to say abstractions slow down storage devices while ignoring their utility feels arbitrary: why not talk about the length of the SATA cable, or the firmware in the disk controller? If the answer is that you just wanted to make a simple statement like the one I quoted at the start of this post, then that's great; I think we are all in agreement. Otherwise, it's not clear what your point is, and many of the supporting examples you list are stated as fact but are in reality either generally untrue or very nuanced points, both of which tend to attract strong opinions :)



