
Chunking is the term I used because that's more relevant to the data science domain I'm focusing on here (e.g. Pandas has "chunksize", Zarr has "chunks"). Streaming has some implication of an ongoing stream of data to me... but I ought to clarify some of the assumptions about a fixed size of data, yes.


Chunking and streaming are different things to me. Chunking means you get to process multiple rows of data at the same time, usually useful to take advantage of SIMD. Streaming means that the data is accessed in a single pass: once you compute the effect of a given row on your statistics, you never have to rewind to see it again.

Many modern performant solutions will use both, but they're not the same thing.
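
To make the distinction concrete, here is a minimal sketch that uses both at once. The names `chunks` and `streaming_mean` are made up for illustration, and it assumes the rows are plain numbers. The state is just a count and a running total (streaming), while the rows arrive several at a time (chunking).

```python
from itertools import islice

def chunks(rows, size):
    """Yield successive lists of up to `size` items from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def streaming_mean(rows, chunk_size=4):
    """Single pass over `rows`; only a count and a running total are kept."""
    count = 0
    total = 0.0
    for chunk in chunks(rows, chunk_size):  # chunking: several rows at once
        count += len(chunk)
        total += sum(chunk)                 # streaming: no row is revisited
    return total / count if count else float("nan")
```

Because nothing is rewound, `rows` can be a generator that is consumed as it goes, not just a list.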


One important application of chunking is efficient I/O.

Mass storage is most efficient when doing large sequential reads and writes, so you normally feed your constant-space streaming algorithms from buffers with a large number of input records.

Sometimes you can just tell the OS to do efficient chunked prefetching for you.
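
A rough sketch of that pattern, assuming a newline-delimited file and a hypothetical `count_lines` helper: large sequential reads feed a counter that needs constant space. The `os.posix_fadvise` call is one way to hint the OS about sequential access; it only exists on some Unix platforms, so it is guarded here and treated as purely advisory.

```python
import os

def count_lines(path, chunk_bytes=1 << 20):
    """Count newline bytes using large sequential reads and O(1) state."""
    newlines = 0
    # buffering=0 returns a raw file object, so read() is not re-buffered by Python.
    with open(path, "rb", buffering=0) as f:
        if hasattr(os, "posix_fadvise"):
            try:
                # Advisory hint: the whole file will be read sequentially.
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                pass  # the hint is optional; carry on without it
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:  # empty bytes means end of file
                break
            newlines += chunk.count(b"\n")
    return newlines
```

The 1 MiB default is an arbitrary choice for illustration; the best size depends on the storage and is worth measuring.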


If you're streaming in a language like Python, its IO will be doing some degree of chunking behind the scenes. It might be beneficial to do more manually.
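
For instance, the built-in `open` takes a `buffering` argument that sets the size of the buffer Python refills behind a line-by-line loop. A small sketch, assuming one ASCII number per line (`sum_numbers` is a made-up name, and the 1 MiB buffer is only an example value):

```python
def sum_numbers(path, buffer_bytes=1 << 20):
    """Sum one number per line while Python reads the file in larger blocks."""
    total = 0.0
    # In binary mode, buffering > 1 sets the size of the underlying buffer.
    with open(path, "rb", buffering=buffer_bytes) as f:
        for line in f:  # lines are served from the in-memory buffer
            if line.strip():
                total += float(line.decode("ascii"))
    return total
```

Whether a larger buffer actually helps depends on the workload, so it is worth timing before committing to it.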


Often called buffering when applied to IO.


netCDF, used for storing e.g. large multidimensional climate datasets, is also "chunked" (in its netCDF-4 format) through its HDF5[0] backbone.

The hard part for me is the "transpose" or "striding" problem, i.e. when the data is stored in a series of (x, y, z) files, one per hour, day, or month, and I need a time series at a single point.
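
A toy illustration of why that access pattern hurts. This is not netCDF: it assumes each time step is a hypothetical raw file of little-endian float64 values in row-major (x, y, z) order, and `point_time_series` is a made-up helper. Pulling one point out means one seek and one 8-byte read per file.

```python
import struct

def point_time_series(paths, shape, index):
    """Read the value at `index` from every per-time-step file in `paths`."""
    nx, ny, nz = shape
    x, y, z = index
    # Byte offset of (x, y, z) in a row-major float64 grid.
    offset = ((x * ny + y) * nz + z) * 8
    series = []
    for path in paths:  # one file per time step, so one tiny read per file
        with open(path, "rb") as f:
            f.seek(offset)
            (value,) = struct.unpack("<d", f.read(8))
        series.append(value)
    return series
```

The cost grows with the number of files rather than with the amount of data actually wanted, which is the opposite of the large sequential reads that storage prefers.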

[0] https://en.wikipedia.org/wiki/Hierarchical_Data_Format


> Chunking is the term I used because that's more relevant to the data science

If your intended audience is data scientists, then why didn’t you mention Dask?


Ditto. Python multiprocessing also uses “chunks” with a specified chunksize.
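
A small sketch of that argument. To keep it runnable anywhere it uses `multiprocessing.pool.ThreadPool`, whose `map` takes the same `chunksize` parameter as `multiprocessing.Pool.map`; `square` and `squares` are made-up names for illustration.

```python
from multiprocessing.pool import ThreadPool

def square(x):
    return x * x

def squares(values, workers=2, chunksize=100):
    """Map `square` over `values`, handing work to the pool in batches."""
    with ThreadPool(workers) as pool:
        # chunksize controls how many items are bundled into each task.
        return pool.map(square, values, chunksize=chunksize)
```

With a process pool, larger chunks generally mean fewer round trips between the parent and the workers, at the cost of coarser load balancing.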



