
Reminds me of "Gadsby", a 50,000-word novel without the letter "e":

https://en.m.wikipedia.org/wiki/Gadsby_(novel)




I'd be curious to know if it was easier or harder (or perhaps just as difficult) to write than the French equivalent. [0]

The Wikipedia article goes on to discuss interesting aspects of how the book was translated into different languages, with different self-imposed constraints.

[0] https://en.wikipedia.org/wiki/A_Void


I can’t say for certain, but I’d guess that writing without the letter “e” is slightly more difficult in French than in English. For one, “e” is a bit more common in French (around 15% of all letters, versus about 12% in English). But more importantly, French grammar adds extra challenges—like gender agreement, where feminine forms often require an “e”, and the frequent use of articles like le and les, which become unusable.

That said, I think the most impressive achievement is the English translation of the French novel. Writing an original constrained novel is hard enough, but translating one means you can’t just steer the story wherever you like. You have to preserve the plot, tone, and themes of the original, all while respecting a completely different set of linguistic limitations. That’s a remarkable balancing act.


Georges Perec did the same with his novel "La Disparition".

What is almost as impressive is that these novels (at least Perec's) have been translated to other languages.


I imagine LLMs would excel in this kind of writing these days.

But really impressive for the time.


I think it's the exact opposite: they operate on a token level, not a character level, which makes tasks like these harder for them. So they would generate a sentence with multiple e's in it and just proclaim that they didn't.

(Just tried it, "write a short story of 12 sentences without one occurrence of the letter e" - it had 5 e's.)


You're assuming all you can do is prompt it. Surely you could also constrain its output to tokens that genuinely contain no e's (or allow at most 4 letters per word). LLMs actually output a probability distribution over next tokens; ChatGPT just picks from the top of that distribution, but you could filter the list by any constraint you want.
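
A rough sketch of that filtering idea, assuming a Hugging Face causal LM (gpt2 purely as a stand-in) and greedy decoding for simplicity; the prompt and the no-'e' constraint are just illustrative:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # any causal LM; gpt2 is just an illustrative choice
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Precompute every vocabulary entry whose text contains an 'e' (or 'E').
    banned = torch.tensor(
        [i for i in range(len(tok)) if "e" in tok.decode([i]).lower()]
    )

    prompt = "A story about a cat:"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(40):
            logits = model(ids).logits[0, -1]      # scores for the next token
            logits[banned] = float("-inf")         # forbid e-containing tokens
            next_id = logits.argmax().view(1, 1)   # greedy pick from what's left
            ids = torch.cat([ids, next_id], dim=1)

    print(tok.decode(ids[0]))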


But the problem is that the tokens are subwords, which means that if you simply disallowed tokens with es, you'd make it hard to complete a word given a prefix.

For example, it may start like "This is a way to solv-" or "This is th-".


If I understand it correctly, that's a valid concern, but structured generation libraries like outlines[1] can work around it by keeping multiple candidate continuations in flight at once (beam search).

One beam could be "This is a way to solv-", with no obvious "good" next token. Another beam could be "This way is solv-", with "ing" as the obvious next token.

It will select the best beam for the output.
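
For reference, constrained generation through outlines looked roughly like this in its 0.x releases; the function names and the beam-search sampler are from memory and may have changed in newer versions, so treat it as a sketch rather than the current API:

    import outlines

    # Wrap any Hugging Face causal LM; gpt2 is just a stand-in here.
    model = outlines.models.transformers("gpt2")

    # Keep several beams alive so a dead-end prefix can lose to a better one.
    # (Sampler name as I remember it from the 0.x docs.)
    sampler = outlines.samplers.beam_search(beams=4)

    # The output is forced to match the regex: no 'e' or 'E' anywhere.
    generator = outlines.generate.regex(model, r"[^eE]+", sampler=sampler)

    # With a beam-search sampler this may return several candidates.
    print(generator("A short story about a cat:", max_tokens=60))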

[1]: https://github.com/dottxt-ai/outlines


... What if you retrained it from scratch, on an e-less corpus?


Yes, that would probably work quite well, given enough training data. However, I interpreted the question/claim as being about a task that LLMs already excel at, i.e. that a general-purpose LLM can write text while avoiding a certain character.


I tried something like that some time ago. The problem with that strategy is the lack of backtracking.

Let's say I prompt my LLM to exclusively use the letters 'aefghilmnoprst' and the LLM generates "that's one small step for a man, one giant leap for man-"[1]. Since the next token with the highest probability ("-kind") isn't allowed, it may very well be that the next appropriate word is something really generic or, if your grammar is really restrictive, straight up nonsense because nothing fits. And then there's pathological stuff like "... one giant leap for man, one small step for a man, one giant leap for man- ...".

[1] Toy example - I'm sure these specific rules are not super restrictive and "management" is right there.


The next token is obviously "goes". (Any language model that disagrees is simply wrong.)


I'm not sure if my chain's bein' yanked right now, but surely you mean "gos"‽


The plural of mangoe is mangoes. https://en.wiktionary.org/wiki/mangoe


I was going to point that out.

What I will add is that constrained generation is supported by the major inference engines like llama.cpp, vLLM and the like, so what you are describing is actually straightforward on locally hosted models: you just have to provide a grammar or regex that prevents the letter 'e' from appearing in the output.
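
As an illustration, against a vLLM OpenAI-compatible server the regex can go in the extra request body (the "guided_regex" parameter name is as I recall vLLM's guided-decoding support, and may differ between versions; llama.cpp takes a GBNF grammar instead):

    from openai import OpenAI

    # Assumes a locally running vLLM server exposing the OpenAI-compatible API.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    resp = client.completions.create(
        model="my-local-model",                  # whatever the server is serving
        prompt="A short story about a cat:",
        max_tokens=60,
        extra_body={"guided_regex": r"[^eE]+"},  # output must match: no 'e'
    )
    print(resp.choices[0].text)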


You can do this more properly with the antislop sampler, and we are working on a follow-up paper to our previous work on this exact problem.

https://github.com/sam-paech/antislop-sampler

https://arxiv.org/abs/2306.15926


All the training data contains 'e's.


That is not a counterpoint! The output is a probability distribution, so you can assign zero to any e-containing token and scale everything else up accordingly.


I think an LLM would do well on this if you gave it a function that located words with an e so it could change them.
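
The checker itself is trivial; something like this (a hypothetical tool the model would call on each draft):

    import re

    def words_with_e(text: str) -> list[str]:
        """Return every word in `text` that contains an 'e' (case-insensitive)."""
        return [w for w in re.findall(r"[A-Za-z']+", text) if "e" in w.lower()]

    print(words_with_e("The quick brown fox jumps over a lazy dog"))
    # ['The', 'over']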


They'd probably suck at a challenge like that because they work on tokens and don't really see individual letters.

There was a post here a little while back asking AI models to count the number of Rs in the word raspberry and most failed.



You don't need to go all the way to LLMs when a simpler approach may do.

Here's a "What if?" on a very similar issue that uses Markov chains: https://what-if.xkcd.com/75/
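
In that spirit, a toy version of the Markov-chain approach fits in a few lines: build word-level bigram counts from any corpus, drop every word containing an 'e', and random-walk the chain (the corpus string is a stand-in for real text, and dropping words does distort adjacency a bit):

    import random
    from collections import defaultdict

    corpus = "a cat sat on a mat and a dog lay on a mat and a cat ran off"  # stand-in text
    words = [w for w in corpus.lower().split() if "e" not in w]

    # Word-level bigram chain: each word maps to the words seen right after it.
    chain = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        chain[prev].append(nxt)

    # Random-walk the chain to produce e-less text.
    w = random.choice(words)
    out = [w]
    for _ in range(20):
        if not chain[w]:
            break
        w = random.choice(chain[w])
        out.append(w)
    print(" ".join(out))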


LLMs are usually shit at this kind of wordplay; they don't understand the rules - words that begin or end or include particular letters, words that rhyme, words with particular numbers of syllables. They'll get it right more often than wrong, maybe, but in my experience they just aren't capable of catching wrong answers before returning them to the reader, even if they're told to check their work.


8 of them on the cover!



