
Thanks for sharing. Do I understand correctly that this requires loading the whole file in memory along with an ordered list of keys? Or is it just the first n bytes that are loaded in memory? If the former, then it seems very expensive in terms of RAM, particularly if your data file has multiple columns.

An alternative I've used is to load the file into a database, sort by the key I want (which only needs the keys in memory), and then write the result out to a file. It does go through disk, but you can handle larger files because only the keys have to fit in memory, not the whole file.
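A rough sketch of that database route in Python, assuming a CSV with a header row and SQLite as the on-disk scratch store (the function name, file names, and column layout here are just for illustration, not the tool's actual code):

    # Sketch of the "sort via a database" approach: stream a CSV into SQLite,
    # have SQLite sort by the key column (spilling to temp files if needed),
    # then stream the rows back out in sorted order.
    import csv
    import json
    import sqlite3

    def sort_csv_via_sqlite(in_path, out_path, key_column):
        conn = sqlite3.connect("sort_scratch.db")   # on-disk scratch file
        conn.execute("CREATE TABLE rows (k TEXT, data TEXT)")
        with open(in_path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            key_idx = header.index(key_column)
            # Only the key is ever compared; the rest of the row rides along as text.
            conn.executemany(
                "INSERT INTO rows VALUES (?, ?)",
                ((r[key_idx], json.dumps(r)) for r in reader),
            )
            conn.commit()
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            for (data,) in conn.execute("SELECT data FROM rows ORDER BY k"):
                writer.writerow(json.loads(data))
        conn.close()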




It memory-maps the file (mmap), so the file is addressed through virtual memory and the OS pages data in on demand. See https://en.wikipedia.org/wiki/Mmap and https://en.wikipedia.org/wiki/Memory-mapped_file
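In Python terms, roughly (a generic mmap illustration, not the submitted tool's actual code; the file name is hypothetical):

    # Map a file into virtual memory and read slices of it.
    # The OS pages data in on demand, so "addressing" the whole file
    # does not mean it is all resident in RAM at once.
    import mmap

    with open("data.csv", "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            print(mm[:64])               # first 64 bytes, paged in on demand
            newline = mm.find(b"\n")     # scan for the end of the header line
            print(mm[:newline])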


Indeed, I load the entire file. Provided there is enough RAM to hold the file with all its columns, loading everything should give optimal performance. But I agree your approach is the way to go when there isn't enough RAM.



