I may be missing something, but the most obvious improvement would be to eliminate duplicate inputs. A million integers in the range 1 to 100,000 (the prompt doesn't specify whether the range is inclusive or exclusive) can contain at most 100,000 distinct values, so at least 90% of the numbers must be duplicates. What would the speedup be if the list generation filtered them out?
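
To make the arithmetic concrete, here is a rough Python sketch of what I have in mind. The function names, the fixed seed, and the inclusive range are my own assumptions, not anything from the prompt or the original code.

```python
import random


def generate_inputs(count, low, high, seed=0):
    """Hypothetical stand-in for the list generation: `count` random
    integers in [low, high], assuming the range is inclusive."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(count)]


def min_duplicate_fraction(count, low, high):
    """Lower bound on the share of repeated values (pigeonhole argument):
    only (high - low + 1) distinct values exist, the rest must repeat."""
    distinct_max = high - low + 1
    return max(0, count - distinct_max) / count


def dedupe(values):
    """Drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(values))


numbers = generate_inputs(1_000_000, 1, 100_000)
unique_numbers = dedupe(numbers)

print(min_duplicate_fraction(1_000_000, 1, 100_000))  # lower bound on duplicates
print(len(numbers), len(unique_numbers))              # work before vs. after filtering
```

Whether this actually helps depends on whether the downstream step cares about repeated values; if it only needs each distinct number once, the filtered list is at most a tenth of the original size.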