What makes you think it's not compressed? (or that the data is stored as XML?)

There are very sophisticated compression systems throughout each experiment's data-acquisition pipeline. For example, this paper [1] describes the ALICE experiment's system for Run 3, which uses FPGAs and GPUs to handle 3.5 TB/s from all the detectors. This one [2] outlines how CMS and the HL-LHC use neural networks to fine-tune compression algorithms on a per-detector basis.
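
To get a feel for why per-detector tuning matters at all, here's a toy, stdlib-only Python sketch. It is not the FPGA/GPU or neural-network machinery from [1] and [2], just the underlying idea: streams with different statistics compress best with different codecs, so you pick per stream. The "sparse_hits" and "noisy_adc" payloads are made up for illustration.

    import bz2, lzma, random, zlib

    # Made-up stand-ins for two detector streams with very different statistics.
    random.seed(0)
    streams = {
        "sparse_hits": bytes(1 if random.random() < 0.01 else 0 for _ in range(100_000)),
        "noisy_adc": random.randbytes(100_000),
    }

    # Candidate codecs/levels to try per stream.
    codecs = {
        "zlib-1": lambda b: zlib.compress(b, 1),
        "zlib-9": lambda b: zlib.compress(b, 9),
        "lzma": lzma.compress,
        "bz2": bz2.compress,
    }

    for name, payload in streams.items():
        best = min(codecs, key=lambda c: len(codecs[c](payload)))
        ratio = len(payload) / len(codecs[best](payload))
        print(f"{name}: best codec {best}, ratio {ratio:.1f}x")

The sparse stream compresses by orders of magnitude while the noisy one barely compresses at all, which is exactly why a single global setting is a poor fit.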

Not to mention your standard data files are ROOT TFiles with TTrees, which store arrays of compressed objects.
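
If you've never poked at one, here's a minimal PyROOT sketch (assuming a working ROOT install; the branch name and the fake pT spectrum are invented) of filling a TTree and letting the TFile compress it:

    import ROOT
    from array import array

    # The TFile compresses each branch's baskets as the tree is filled.
    f = ROOT.TFile("toy_events.root", "RECREATE")
    f.SetCompressionLevel(9)  # level for whatever algorithm this ROOT build defaults to

    t = ROOT.TTree("events", "toy event tree")
    pt = array("f", [0.0])
    t.Branch("pt", pt, "pt/F")

    rng = ROOT.TRandom3(42)
    for _ in range(100_000):
        pt[0] = rng.Exp(25.0)  # exponential stand-in for a pT spectrum
        t.Fill()

    t.Write()
    print("average compression factor:", round(f.GetCompressionFactor(), 2))
    f.Close()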

It's all pretty neat.

[1] https://arxiv.org/pdf/2106.03636

[2] https://arxiv.org/pdf/2105.01683

Link [1] shows ALICE compressing down to less than 100 GB/s, from an uncompressed 3.5 TB/s.

The article has a sub-headline:

> A petabyte per second

The math doesn't work out for me unless there are roughly 300 ALICE equivalents at CERN (1 PB/s ÷ 3.5 TB/s ≈ 286), and I think there are about 3.


The 'uncompressed' stream has already been winnowed down substantially: a lot of processing happens on the detectors themselves to decide what data is even worth sending off the board. The math for the raw detectors is roughly 100 million channels of data (not sure how many per detector, but there are a lot of them stacked around the collision point) sampling at 40 MHz (which is how often the bunches of accelerator particles cross). Even at just 2 bits per sample, that's 1 PB/s. But most of that is obviously uninteresting and so doesn't even get transmitted.
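
For anyone who wants to check the back-of-envelope numbers in this thread (the 1 PB/s figure above, and how it compares to ALICE's 3.5 TB/s from [1]):

    channels = 100e6         # ~100 million readout channels, per the estimate above
    crossing_rate = 40e6     # bunch crossings per second (one every 25 ns)
    bits_per_sample = 2      # deliberately tiny per-channel payload

    raw_bytes_per_s = channels * crossing_rate * bits_per_sample / 8
    print(raw_bytes_per_s / 1e15)    # 1.0  -> about a petabyte per second
    print(raw_bytes_per_s / 3.5e12)  # ~286 -> ALICE-sized 3.5 TB/s streams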



