In the tokenization example for '"Mississippilessly", why is 'si' not part of the combined column? It appears twice in the text. My speculation is that it got collapsed out with 'iss' (a longer token). Is that right?
Yes. At step 1 there are two 'i','s' and two 's','i', 'is' gets picked because it comes first. Then at step 7, because 'is' got formed, there are two instances of 'is','s' and only one instance of 's','i' so 'iss' gets formed.