I was thinking along eru's lines. The specifics I came up with:
1) Compress sorted lists by encoding the differences between successive numbers with a variable-length code, so small gaps take fewer bytes.
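A minimal sketch of that idea, using a varint code (7 payload bits per byte, high bit marks a continuation) as the variable-length code; the function names are just illustrative:

```python
def encode_deltas(sorted_nums):
    """Delta-encode a sorted list of non-negative ints as a varint byte
    stream: each gap is emitted 7 bits at a time, low bits first, with
    the high bit set on every byte except the last. A gap < 128 costs
    one byte."""
    out = bytearray()
    prev = 0
    for n in sorted_nums:
        gap = n - prev
        prev = n
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)

def decode_deltas(data):
    """Inverse of encode_deltas: reassemble gaps and yield the original
    sorted numbers one at a time (sequential access only)."""
    prev = gap = shift = 0
    for b in data:
        gap |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            prev += gap
            yield prev
            gap = shift = 0
```

Since the decoder is a generator, the compressed list can be consumed as a stream, which is exactly what the merge in point 2 needs.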
2) Use in-memory merge sort. Merging only needs sequential access to the lists, and the whole sort is O(n log n).
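One way to sketch this: a bottom-up merge sort over a list of already-sorted runs, built on `heapq.merge`, which only ever peeks at the head of each input, so the runs could just as well be lazily-decoded compressed streams. The function name is mine:

```python
import heapq

def merge_runs(runs):
    """Bottom-up merge sort: repeatedly merge sorted runs pairwise until
    one remains. Each pass touches every element once; with k initial
    runs of n total elements this is O(n log k). heapq.merge reads its
    inputs strictly sequentially."""
    runs = [iter(r) for r in runs]
    while len(runs) > 1:
        runs = [heapq.merge(runs[i], runs[i + 1]) if i + 1 < len(runs)
                else runs[i]
                for i in range(0, len(runs), 2)]
    return list(runs[0]) if runs else []
```

Because `heapq.merge` returns a lazy iterator, the pairwise passes chain together without materializing intermediate lists; only the final `list()` call does.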
3) To merge large lists when memory is already almost full, divide the lists into blocks and allocate blocks for the merged list as you deallocate blocks from the two source lists.
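A toy sketch of the block recycling, with each list held as a deque of small blocks (the block size and names are made up, and actual memory reuse depends on the allocator, so treat this as illustrating the bookkeeping, not a memory guarantee):

```python
import heapq
from collections import deque

BLOCK = 4  # tiny for illustration; a real block would hold thousands of items

def drain(blocks):
    """Yield elements from a deque of sorted blocks, popping each block
    off the deque as soon as it starts being consumed, so its memory can
    be reclaimed once iteration moves past it."""
    while blocks:
        block = blocks.popleft()
        yield from block

def merge_blockwise(a_blocks, b_blocks):
    """Merge two sorted block-lists into a new block-list. At any moment
    the extra working memory is roughly one partially-filled output block
    plus the head blocks of the inputs, not a whole third copy."""
    out, cur = [], []
    for x in heapq.merge(drain(a_blocks), drain(b_blocks)):
        cur.append(x)
        if len(cur) == BLOCK:
            out.append(cur)
            cur = []
    if cur:
        out.append(cur)
    return out
```

After the merge both source deques are empty, so their blocks became garbage incrementally during the merge rather than all at the end.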
4) To do this in Python, use the "array" module.
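For instance, an `array` of 64-bit ints stores each element in 8 contiguous bytes, versus the ~28 bytes a small Python int object costs in CPython:

```python
from array import array

# 'q' = signed 64-bit C long long: 8 bytes per element, stored in one
# contiguous buffer rather than as separate Python int objects.
nums = array('q', [3, 7, 100, 10_000])
nums.append(65_536)
print(nums.itemsize)               # 8
print(len(nums) * nums.itemsize)   # 40 bytes of payload for 5 ints
```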