I can only do a humble and poor job of explaining this process by analogy to other algorithms. If you've got a better explanation I could rote memorize, I'd be very appreciative.
I can't say I have had much success explaining my thesis clearly to anyone except people from the same department, but I can try to give my understanding of discrimination from an implementor's perspective.
The succinct version is that all (discrete) data can be serialised as bit strings, and those bit strings can be sorted in linear time by (say) radix sort.
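For concreteness, here is a toy sketch in Haskell (my own illustration, not code from the paper or from this thread): keys that have already been serialised to equal-width bit strings are sorted with a least-significant-bit-first radix sort. Each pass is one stable partition on a single bit position, so the total work is proportional to the total number of bits examined.

    import Data.List (foldl', partition)

    -- A serialised key: a fixed-width string of bits.
    type Bits = [Bool]

    -- LSD radix sort over equal-width bit strings: one stable pass per
    -- bit position, from least significant (rightmost) to most
    -- significant (leftmost).
    radixSortBits :: Int -> [Bits] -> [Bits]
    radixSortBits width keys = foldl' passOn keys [width - 1, width - 2 .. 0]
      where
        -- Stable partition on bit i: keys whose bit i is 0 keep their
        -- relative order and come first, followed by the keys whose bit i is 1.
        passOn ks i = let (zeros, ones) = partition (\k -> not (k !! i)) ks
                      in zeros ++ ones

For example, radixSortBits 3 [[True,False,True],[False,True,True],[False,False,True]] returns the keys in the order 001, 011, 101.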
There are two ideas at work here: that all (discrete) data structures are ultimately composed of primitive types that can be sorted in linear time, and that each primitive value should be examined only once.
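To make that concrete, here is a rough Haskell sketch in the spirit of Henglein's discriminators (the names are mine, and Data.Map stands in for the constant-time bucket tables a real implementation would use, so this sketch is O(n log n) rather than linear): a discriminator groups the values of a key/value list by key equivalence without the caller ever supplying a comparison, and discriminators for compound keys are assembled from primitive ones so that each primitive component is looked at exactly once.

    {-# LANGUAGE RankNTypes #-}
    import qualified Data.Map.Strict as Map

    -- A discriminator groups the values of a key/value list by key.
    type Disc k = forall v. [(k, v)] -> [[v]]

    -- Primitive discriminator for Int keys. A real implementation would
    -- bucket into an array in a single pass; the Map is only a stand-in.
    discInt :: Disc Int
    discInt ps = Map.elems (Map.fromListWith (flip (++)) [(k, [v]) | (k, v) <- ps])

    -- Pair keys: discriminate on the first component, then refine each
    -- resulting group by the second, so each component is seen once.
    discPair :: Disc a -> Disc b -> Disc (a, b)
    discPair da db ps = concatMap db (da [(a, (b, v)) | ((a, b), v) <- ps])

    -- Sum keys: split the Lefts from the Rights and discriminate each
    -- side with its own discriminator.
    discEither :: Disc a -> Disc b -> Disc (Either a b)
    discEither da db ps = da [(a, v) | (Left a, v) <- ps]
                       ++ db [(b, v) | (Right b, v) <- ps]

For instance, discPair discInt discInt [((1,2),"a"), ((1,1),"b"), ((2,0),"c")] evaluates to [["b"],["a"],["c"]]: the groups come out in key order even though the caller never passes a comparison function (the Map stand-in does compare Ints internally, which is exactly the part the bucket-table version avoids).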
However, to be fair it should be mentioned that the analysis uses a different machine model (the RAM model) from the comparison model usually employed when analysing sorting, in which every comparison is assumed to take constant time; that assumption is arguably an inferior model of real computers if you are sorting, e.g., strings of variable length.
To be honest, though, the original paper is so well-written that I have a hard time explaining it better.