
> Facebook just published a blog post about moving petabytes per hour.

For the curious: https://engineering.fb.com/data-infrastructure/scribe/

Edit: HN thread: https://news.ycombinator.com/item?id=21181982




Absolutely astounding to me: petabytes an hour? That works out to somewhere between one and several megabytes per user per hour, going by their monthly active user figures.
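The back-of-envelope math checks out. A quick sketch, using illustrative numbers (roughly 2.4 billion MAU, a hypothetical 2.5 PB/hour; neither figure is from the article):

```python
# Back-of-envelope: petabytes/hour spread across monthly active users.
# Both inputs are illustrative assumptions, not figures from the article.
PB_PER_HOUR = 2.5   # assumed ingest rate
MAU = 2.4e9         # rough Facebook MAU circa 2019

mb_per_hour_total = PB_PER_HOUR * 1e9            # 1 PB = 10^9 MB
mb_per_user_per_hour = mb_per_hour_total / MAU
print(f"{mb_per_user_per_hour:.2f} MB per user per hour")  # prints "1.04 MB per user per hour"
```

So a few petabytes an hour really does land in the "a meg or so per user per hour" range.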


It's mostly telemetry data. /s ... or not /s, I'm not sure.


Mmm, it's not only "telemetry data". It's that (e.g. Scuba) plus other types of logs, and it's not only Facebook (e.g. Instagram as well).

Basically, everything that needs logging and post-processing by both real-time systems (e.g. Puma) and batch processing (e.g. all of the data that's ingested and sent to the data warehouse) goes through Scribe.

(disclaimer: I work on Scribe)

Scuba: https://research.fb.com/publications/scuba-diving-into-data-...

Puma: https://research.fb.com/publications/realtime-data-processin...
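The category-based fan-out described above (producers log to named streams; real-time and batch consumers read the same data) could be sketched roughly like this. All names here are hypothetical; the actual Scribe API is internal and not public:

```python
# Toy sketch of a Scribe-like log router: producers append messages to named
# categories; both real-time (Puma-style) and batch (warehouse) consumers
# read from the same stream. Purely illustrative, not Facebook's actual API.
from collections import defaultdict

class ScribeLikeRouter:
    def __init__(self):
        self._categories = defaultdict(list)  # category name -> ordered messages

    def log(self, category, message):
        """Producers append fire-and-forget messages to a category."""
        self._categories[category].append(message)

    def tail(self, category, n=10):
        """A real-time consumer reads only the newest messages."""
        return self._categories[category][-n:]

    def dump(self, category):
        """A batch consumer reads the whole stream for warehouse ingestion."""
        return list(self._categories[category])

router = ScribeLikeRouter()
router.log("www_requests", {"path": "/home", "ms": 12})
router.log("www_requests", {"path": "/feed", "ms": 48})
print(router.tail("www_requests", 1))  # newest message only
```

The point of the design is that producers only know a category name; whether the consumer is a real-time system or a batch warehouse job is invisible to them.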


Does that include Instagram and WhatsApp? Or just FB-branded stuff like Messenger and the main network app...


Not everything that flows through Scribe is tied to an (external) user, though. Tons of internal systems use it as well, notably anything that logs to Scuba (which is pretty much everything at Facebook. Wide-structured system logs are awesome).


You should post that.



