Ok, check it now. I added a little bit about that, though I'm sure I could still improve the wording.
Now we have something whose unit is "bits" but whose value includes fractions of a bit. What can we do with this? After all, if we're only storing one roll, we still need 3 bits to store 6 possibilities. The trick is that we can use fewer bits per roll if we store several rolls at once. Those 3 bits can represent 8 values, so 2 of them are wasted on the first roll, and if you're clever, you can use them to encode some information about the next roll. If we're clever enough, and storing enough rolls, 2.58... is the lower bound on the number of bits required per roll, and an optimal compression scheme converges to it.
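One way to be "clever" here, as a rough sketch rather than the only scheme, is to treat a batch of rolls as the digits of a single base-6 number and store that number in binary. The function names below (`pack_rolls`, `unpack_rolls`, `bits_needed`) are made up for this illustration, and the sketch assumes a fair six-sided die with rolls numbered 1 to 6.

```python
import math

def pack_rolls(rolls):
    """Pack die rolls (each 1..6) into one integer, using them as base-6 digits."""
    value = 0
    for r in rolls:
        if not 1 <= r <= 6:
            raise ValueError("roll must be in 1..6")
        value = value * 6 + (r - 1)
    return value

def unpack_rolls(value, count):
    """Recover `count` rolls from an integer produced by pack_rolls."""
    rolls = []
    for _ in range(count):
        value, digit = divmod(value, 6)
        rolls.append(digit + 1)
    rolls.reverse()  # digits come out least significant first
    return rolls

def bits_needed(count):
    """Width in bits of a field that can hold any sequence of `count` rolls."""
    # The largest packed value is 6**count - 1.
    return (6 ** count - 1).bit_length()

# Bits per roll shrink toward log2(6) as the batch grows.
for n in (1, 3, 10, 100, 1000):
    print(n, bits_needed(n), bits_needed(n) / n)
print("log2(6) =", math.log2(6))
```

With this packing, one roll takes 3 bits, three rolls take 8 bits, and ten rolls take 26 bits, so the cost per roll drops from 3 to about 2.67 to 2.6. It never goes below log2(6), it only gets closer to it as the batch gets larger.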