
> A tokenizer might encode “once upon a time” as “once,” “upon,” “a,” “time,” for example, while encoding “once upon a ” (which has a trailing whitespace) as “once,” “upon,” “a,” ” .” Depending on how a model is prompted — with “once upon a” or “once upon a ,” — the results may be completely different, because the model doesn’t understand (as a person would) that the meaning is the same.
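Here is a toy sketch of the effect the quote describes. The regex tokenizer below is a made-up assumption, not any real model's tokenizer. It loosely follows GPT-2-style BPE vocabularies, which glue a leading space onto the following word, so "upon" appears as " upon". A trailing space has no word to attach to, so it ends up as a lone " " token.

```python
import re

# Hypothetical toy pre-tokenizer (illustrative assumption, not a real model's rules):
# a token is an optional single leading space glued to a run of non-space
# characters, or, failing that, a lone space.
TOKEN_RE = re.compile(r" ?\S+| ")


def toy_tokenize(text: str) -> list[str]:
    """Split text into toy tokens; every ' ' is kept (tabs/newlines are ignored in this toy)."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    for prompt in ["once upon a time", "once upon a", "once upon a "]:
        print(repr(prompt), "->", toy_tokenize(prompt))
```

Under these assumptions, "once upon a" gives `['once', ' upon', ' a']`, and "once upon a " adds a dangling `' '` token. The natural continuation `' time'` already carries its own space, so the two prompts leave the model in genuinely different states even though a person reads them as the same.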

TechCrunch, I respect that you have a style guide vis-à-vis punctuation and quotation marks, but please understand when it's appropriate to break the rules. :P



When this problem comes up in code-related documentation at work, I often fall back on either a distinct typeface or a background-color shift.

That way it's clearer when I'm referring to a literal code-string versus quote marks that are part of the prose.


I usually use <pre></pre> for this.


<tt>

Teletype longa, vita brevis.


The <tt> tag is deprecated.


The blink and marquee tags are deprecated but they still work...


True. But it is well supported and it meets a need.


stymied by their own tokenization!


I'm glad I wasn't the only one



