"If you can store infinite information outside of a system, then you can achieve infinite information density inside the system if you use the outside of the system as 'context'." I hope that paraphrasing makes the problem with this line of thinking clear enough. You have to include all the information when calculating absolute density. In your case, the context has to be stored somewhere too.
It's worse than that. There are a finite number of states of a message, and you can only reference that number of states of the outside system. If you think of the message as a pointer containing n bits, then you can only reference the first 2^n positions in the dictionary.
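To make the pointer argument concrete, here is a minimal Python sketch. The dictionary contents, the `lookup` helper, and the choice of n = 3 are all made up for illustration; the only point is that an n-bit message can select at most 2^n entries, however large the outside dictionary is.

```python
from itertools import product


def addressable_entries(n_bits: int) -> int:
    """Number of distinct dictionary positions an n-bit pointer can select."""
    return 2 ** n_bits


def lookup(message_bits: str, dictionary: list[str]) -> str:
    """Treat the message as a binary index into the shared dictionary."""
    return dictionary[int(message_bits, 2)]


# Hypothetical outside "context": far more entries than 3 bits can reach.
dictionary = [f"entry-{i}" for i in range(1000)]

n = 3
# Enumerate every possible n-bit message and collect what each one points at.
all_messages = ["".join(bits) for bits in product("01", repeat=n)]
reachable = {lookup(m, dictionary) for m in all_messages}

print(len(dictionary), "entries stored outside the system")
print(len(reachable), "entries reachable with", n, "bits")
```

Making the dictionary bigger changes nothing here: the other 992 entries are unreachable until the message itself gets longer.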
If the outside system is a probability distribution over output messages (a more general case of the dictionary you described), then the problem is equivalent to compression.
If all information has to be accounted for and stored somewhere, and context is part of the information, then you can't store any information without storing all information, everywhere. Because every bit of information exists inside the context of the entire universe.
You're getting very metaphysical here, but reality remains the same even if you expand these principles to the universe. The rules of physics still apply.
I think he has a point, although it's more about the semantics of the term 'information' density.
Shannon information is always measured relative to a receiving context in which the symbols are understood, and information content is related to the inverse of the probability of observing a particular signal as assessed by the receiver. So from that perspective vinceguidry has a reasonable point.
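As a rough illustration of that receiver dependence, the standard self-information formula is I(x) = -log2 p(x), where p(x) is the probability the receiver assigns to the signal. The sketch below applies it to one signal as seen by two receivers; the two receivers and their probabilities are invented for the example.

```python
import math


def self_information_bits(p: float) -> float:
    """Self-information, in bits, of a signal the receiver assigns probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


# Same signal, two hypothetical receivers with different expectations.
p_expecting = 0.5        # this receiver thinks the signal is a coin flip
p_surprised = 1 / 1024   # this receiver thinks the signal is very unlikely

bits_expecting = self_information_bits(p_expecting)
bits_surprised = self_information_bits(p_surprised)

print(bits_expecting)  # 1.0
print(bits_surprised)  # 10.0
```

The transmitted bytes are identical in both cases, which is why the number of bits stored ("data density") and the Shannon information received are different quantities.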
However, the question really being asked is about something more like 'data density', and that is generally what people are talking about when the term 'information density' is invoked.
Edit: I see that the original article does indeed refer to Data Density, and that the HN title is just wrong.