Thanks for sharing. Do I understand correctly that this requires loading the whole file in memory along with an ordered list of keys? Or is it just the first n bytes that are loaded in memory? If the former, then it seems very expensive in terms of RAM, particularly if your data file has multiple columns.
As an alternative I used is to load the file in a database, then sort by the key I want (which only loads the key in memory) and then output the result into a file. It does go through disk but you can address larger files as you only need the key in memory, and not the whole file.
Indeed I load the entire file, but I think provided I have RAM to load the file with all the columns, the approach of loading everything should give me optimal performance. However I agree with your approach when there isn't enough RAM.
As an alternative I used is to load the file in a database, then sort by the key I want (which only loads the key in memory) and then output the result into a file. It does go through disk but you can address larger files as you only need the key in memory, and not the whole file.