> A tokenizer might encode “once upon a time” as “once,” “upon,” “a,” “time,” for example, while encoding “once upon a ” (which has a trailing whitespace) as “once,” “upon,” “a,” ” .” Depending on how a model is prompted — with “once upon a” or “once upon a ,” — the results may be completely different, because the model doesn’t understand (as a person would) that the meaning is the same.
TechCrunch, I respect that you have a style guide vis a vis punctuation and quotation marks, but please understand when it's appropriate to break the rules. :P
TechCrunch, I respect that you have a style guide vis a vis punctuation and quotation marks, but please understand when it's appropriate to break the rules. :P