
We built HopsFS-S3 [0] for exactly this problem, and have been running it as part of Hopsworks for a number of years now. It's a network-aware, write-through cache for S3 with an HDFS API. Metadata operations are performed on HopsFS, so you don't hit the other problems either, like S3 list operations returning at most 1000 files/dirs per request.
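For context on that listing limit: S3's ListObjectsV2 returns at most 1000 keys per call, so listing a large prefix means looping on a continuation token. A minimal sketch of that loop, simulated locally (the page size matches S3's limit, but the key names and helper are illustrative, not a real S3 client):

```python
# Simulate S3-style listing: at most PAGE_SIZE keys per page, with a
# continuation token, mimicking how ListObjectsV2 paginates.
PAGE_SIZE = 1000

def list_page(keys, token=None):
    """Return one page of up to PAGE_SIZE keys plus a continuation token."""
    start = token or 0
    page = keys[start:start + PAGE_SIZE]
    next_token = start + PAGE_SIZE if start + PAGE_SIZE < len(keys) else None
    return page, next_token

# 2500 fake object keys -> needs 3 round trips just to enumerate them.
all_keys = [f"data/part-{i:05d}" for i in range(2500)]

collected, token, pages = [], None, 0
while True:
    page, token = list_page(all_keys, token)
    collected.extend(page)
    pages += 1
    if token is None:
        break

print(pages)           # 3
print(len(collected))  # 2500
```

Each of those pages is a separate network round trip, which is exactly the kind of metadata cost that moving metadata into HopsFS avoids.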

NVMe is what's changing the equation, not SSDs in general. NVMe disks now reach up to 8 GB/s, although the crap the cloud providers give you barely reaches 2 GB/s, and only on expensive instances. So instead of 40X better throughput than S3, we get more like 10X. Right now, these workloads are much better run on-premises on the cheapest M.2 NVMe disks ($200 for 4 TB with 4 GB/s read/write) backed by an S3-compatible object store like Scality.

[0] https://www.hopsworks.ai/post/faster-than-aws-s3



The numbers you're giving are throughput (bytes/sec), not latency.

The comment you're replying to is mostly talking about latency: it reports object GET latencies (time to open the object and return its first bytes) in the single-digit milliseconds, where S3 was ~50 ms before.
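The distinction matters most for small objects, where per-request latency dominates total fetch time. A back-of-envelope sketch (the 10 ms latency and 2 GB/s throughput figures are illustrative assumptions, not measurements):

```python
# Back-of-envelope: fetch time = per-request latency + size / throughput.
# Assumed numbers: 10 ms time-to-first-byte, 2 GB/s sustained throughput.
latency_s = 0.010
throughput_Bps = 2e9

def fetch_time(size_bytes):
    return latency_s + size_bytes / throughput_Bps

small = fetch_time(64 * 1024)    # 64 KiB object: latency-dominated
large = fetch_time(4 * 1024**3)  # 4 GiB object: throughput-dominated

print(f"64 KiB: {small * 1e3:.2f} ms, {latency_s / small:.0%} of it latency")
print(f"4 GiB:  {large:.2f} s, {latency_s / large:.1%} of it latency")
```

With these numbers, nearly all of the small-object fetch is latency, while the large transfer barely notices it: cutting latency from 50 ms to single digits helps small-object workloads far more than raising throughput does.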

BTW, EBS can do 4 GB/s per volume. But you will pay for it.



