Although "eager" isn't called out, a recent study of academic publications shows that the use of LLMs can be measured through word frequency analysis [1], finding certain words are disproportionally represented:
> We study vocabulary changes in 14 million PubMed abstracts from 2010–2024, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words.
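To make the measurement concrete, here's a minimal sketch of that kind of word-frequency analysis; the marker-word list and the tiny abstracts_by_year sample below are made up for illustration, while the actual study worked over roughly 14 million PubMed abstracts:

```python
import re

# Hypothetical stand-in data: the real analysis used ~14 million PubMed abstracts.
abstracts_by_year = {
    2021: ["We study the effect of X on Y in a cohort of patients ...",
           "Results show a significant association between Z and outcome ..."],
    2024: ["We delve into the underlying mechanism and showcase its pivotal role ...",
           "These findings underscore the pivotal contribution of Z ..."],
}

# Illustrative "style word" list; the paper derives its word set from the data itself.
MARKER_WORDS = {"delve", "delves", "underscore", "underscores", "showcase",
                "showcasing", "pivotal"}

def marker_rate(texts):
    """Marker-word occurrences per 1,000 words across a set of abstracts."""
    words = [w for t in texts for w in re.findall(r"[a-z']+", t.lower())]
    hits = sum(1 for w in words if w in MARKER_WORDS)
    return 1000 * hits / max(len(words), 1)

for year in sorted(abstracts_by_year):
    print(year, f"{marker_rate(abstracts_by_year[year]):.1f} marker words per 1k words")
```

Plotting that rate year by year is what produces the abrupt post-2022 jump the authors describe.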
I don't want to go looking for the source right now, but I recall reading a study demonstrating that a large part, if not most, of the word-frequency shift was caused by RLHF training on data generated predominantly by people hired from lower-income English-speaking countries, which simply have different dialects of English with noticeably different frequencies of certain phrases and expressions. So, for example, at least some versions of ChatGPT got RLHF-trained to speak more in a Nigerian English dialect.
Since there isn't a single English but multiple different Englishes (English learners are generally told only about the UK vs. US choice, yet most English is spoken outside the UK and USA, in other places and other dialects), any English speaker will probably find something to be surprised by, and there is an economic incentive to get data from people other than the relatively expensive native speakers of UK or US English.
There wasn't a study or analysis. It was just lazy speculation that felt good because it could be bound up in an "evil white countries exploiting the developing world" narrative, where "exploiting" meant "paying to do a job".
Again, there is effectively zero real data showing this. Further, RLHF isn't likely to reinforce such word selection regardless.
A more logical and likely scenario is that the training data is biased heavily towards higher-grade-level material, so word selection veers towards the kind of writing you find in those realms.
> It was just lazy speculation that felt good because it could be bound up in an "evil white countries exploiting the developing world" narrative, where "exploiting" meant "paying to do a job".
Exploitation like that is in fact happening (see pretty much everything having to do with social media content moderation and RLHF to avoid disturbing content).
Also "paying to do a job" is not the moral panacea you seem to think it is.
Tinfoil hat theory: they've already implanted watermarks, so that AI-generated text can be flagged for future training runs or as a service, such that some phrases are coaxed to become statistical beacons.
That's not really a tinfoil hat theory. That's been possible for some years and OpenAI reportedly does watermark their outputs, and can detect it. They just haven't released it as a service because it'd annoy all the users who are using it for cheating :)
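For a sense of how a phrase- or token-level "statistical beacon" can work, here's a toy sketch in the spirit of published green-list watermark schemes (e.g. Kirchenbauer et al., 2023). Everything here (the word-level vocabulary, the hashing, the threshold) is illustrative; whatever OpenAI actually does internally isn't public:

```python
import hashlib
import random

# Toy vocabulary; real schemes operate on tokenizer IDs and model logits.
VOCAB = ["delve", "explore", "important", "notably", "robust", "show", "study",
         "use", "word", "result", "method", "data", "model", "we", "the"]
GREEN_FRACTION = 0.5

def green_list(prev_word):
    """Deterministically split the vocabulary based on a hash of the previous word.
    The generator softly prefers these words; the detector re-derives the same split."""
    seed = int(hashlib.sha256(prev_word.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = sorted(VOCAB)
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * GREEN_FRACTION)])

def green_fraction(words):
    """Fraction of words falling in the green list of their predecessor.
    Unwatermarked text hovers around GREEN_FRACTION; watermarked text sits well above it."""
    hits = sum(1 for prev, cur in zip(words, words[1:]) if cur in green_list(prev))
    return hits / max(len(words) - 1, 1)

if __name__ == "__main__":
    sample = "we delve the robust data and show the important result".split()
    print(f"green-word fraction: {green_fraction(sample):.2f}")
```

The point is that detection only needs the hashing scheme, not the model itself, which is why it could plausibly be offered "as a service".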
Yeah I would like to see some evidence of this too. It's just asserted as truth in the article. Delve doesn't seem like a particularly unusual word to me, especially in the context of scientific abstracts, and LLMs could totally learn random weird things. How common is "it's important to remember" in Nigeria?
then again, most history consists of whitewashing back when northern countries were exploiting everywhere else in various ways: imperialism, colonialism, neocolonialism, capitalism, financialization,...
typical people prefer to pretend this is simply "order" and "progress"; seemingly blind to their own ideological baggage like fish in water
The window of time in which the frequency of ChatGPT's favourite words closely tracks ChatGPT usage is rather small, I think. Academic language has a number of 'marker' words that are basically just style and will be more or less copied once you read many papers. 'Rigorous' is a general example, but most fields have their own. If many of the papers you read while writing your own use words like 'delve', you will be much more likely to use it yourself.
On another note, while the paper itself is pretty cool, in discussions of it I thought people were kind of looking down on using LLMs to help you write. There's a philistine moat in many fields around writing style. While writing well is, in my experience, correlated with paper quality, paper quality is not predicated on it. And introducing tools that help people write more readable papers is probably a net benefit overall.
I wonder why some words are overrepresented. Isn't the whole idea of language models to model the word distribution as closely as possible? Does it have something to do with RLHF? Or is it the training data?
Language models would be fairly useless for most people if they accurately modelled the source distribution; they'd be no better than autocomplete. In fact, they were fairly useless when they modelled the source distribution; that's why ChatGPT was an instant hit whereas GPT-3 was mainly only interesting to other AI researchers.
What made LLMs suddenly interesting was that the responses were much more like answers and much less like additional questions in the same vein as the prompt.
> In fact, they were fairly useless when they modelled the source distribution; that's why ChatGPT was an instant hit whereas GPT-3 was mainly only interesting to other AI researchers.
I had a bot which used the original GPT-3 (i.e. the completion model, not the chat model), and its answers were pretty decent (with the right prompting). Often even better than GPT-3.5, whose answers were overly formulaic in comparison ("as an AI language model...", "it's important to ..." all the time).
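For context on what "the right prompting" typically meant with a pure completion model: you frame the input so that the most likely continuation is an answer rather than more text in the same vein. A minimal, illustrative sketch (the example Q/A pairs and helper name are made up):

```python
# A base/completion model just continues text, so a few-shot Q/A prefix steers the
# continuation toward answering instead of rambling or asking more questions.
FEW_SHOT_PREFIX = """\
Q: What is the capital of France?
A: Paris.

Q: Who wrote "Pride and Prejudice"?
A: Jane Austen.

"""

def build_prompt(question):
    """Wrap the user's question so a completion model continues with an answer."""
    return FEW_SHOT_PREFIX + f"Q: {question}\nA:"

print(build_prompt("Why did base GPT-3 need few-shot prompts?"))
```

With a chat/RLHF model you'd skip the scaffolding, which is also where the formulaic "as an AI language model..." phrasing crept in.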
[1]: https://arxiv.org/html/2406.07016v1