This article is complete bunk. The researchers used a "chatgpt detector", which, as we've seen over and over in academia, does not work. This study is completely unfounded.
God I'm choking on the irony of an article about the dangers of using AI to train AI based on a study that used AI to detect AI
Per the article, they didn't just use the static detector:
> They also extracted the workers’ keystrokes in a bid to work out whether they’d copied and pasted their answers, an indicator that they’d generated their responses elsewhere.
So while I don't yet know if the article is bunk -- I do know that your hot take is bunk.
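For what it's worth, that keystroke check doesn't need to be fancy. Here's a minimal sketch of the kind of heuristic the quote describes, assuming all you have per answer is a keystroke count and the final text (my own illustration, not the study's actual code):

    # Hypothetical paste detector: flag answers whose recorded keystrokes
    # cover far fewer characters than the submitted text contains.

    def looks_pasted(keystroke_count: int, answer_length: int,
                     ratio_threshold: float = 0.5) -> bool:
        """Return True if the answer was typed with suspiciously few keystrokes."""
        if answer_length == 0:
            return False
        return keystroke_count / answer_length < ratio_threshold

    # e.g. a 600-character summary produced with only 40 keypresses is suspicious
    print(looks_pasted(keystroke_count=40, answer_length=600))   # True
    print(looks_pasted(keystroke_count=650, answer_length=600))  # False

Crude, sure, but as a second signal alongside the classifier it's not nothing.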
The company that I work at provides exactly the service you're describing. We recently spun up a team of Math PhDs to help with data labeling. (https://www.invisible.co/). We're seeing more and more of our clients ask for graduate-level data labelers and content creators.
Right now I'm pretty sure that just having GPT rewrite the low-to-mediocre content that made up the gruel of its generic internet diet, and doing fine-tunes on that, will get us another 5-10x along -- hopefully most of all for us little guys out there with a ~24-48GB VRAM cap.
If GPT-4 really is 8 ~230B models, the next bit for us will be a few ~1-5B models that swap in for whatever you want to create, or talk about, or what have you.
Imagine a model trained just on English football, for the purpose of having a good time in the pub, that gets used when the topic changes to it. I bet you could pass on the Daily Mail's sports page if you added some "u"s into your words.
Or a model fine-tuned specifically on the library you're trying to debug, maybe even in combination with the other tools you're trying to put together.
My well-documented melancholy around the state of the LLM “conversation” notwithstanding, I’ll point out that there’s a long and generally productive history of adversarial training: from the earliest mugshot GANs to AlphaZero, getting these things to play against each other seems to produce interesting results.
Whatever the merits of this or that “ChatGPT detector”, the concept isn’t unprecedented or ridiculous.
according to the paper they get 98% accuracy. another recent paper came out saying it's always possible to discriminate between real and synthetic text [1].
i think the core problem is with the generalist classifiers (gptzero, openai detector, etc). e.g. openai's classifier has an accuracy of around 25% on its own text. however, when you train a bespoke classifier (like the authors did), you can get really good results.
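for a sense of what a bespoke, single-task classifier looks like in practice, here's a rough sketch -- not the authors' actual method, just an illustration of why a task-specific detector can work when generalist ones don't. the example answers, features, and model are placeholders i made up:

    # Tiny task-specific detector: human vs. LLM answers to the *same* prompt.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    human_answers = [
        "study says cows got sick from eating ground up cow brains, gross",
        "prions spread when infected tissue ended up back in the feed",
    ]
    llm_answers = [
        "This study investigates the transmission dynamics of prion diseases in cattle populations.",
        "The findings highlight the importance of regulating animal feed to prevent transmission.",
    ]

    X = human_answers + llm_answers
    y = [0] * len(human_answers) + [1] * len(llm_answers)   # 0 = human, 1 = LLM

    detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                             LogisticRegression(max_iter=1000))
    detector.fit(X, y)

    # because it only ever sees answers to this one task, it can key on
    # narrow stylistic tells instead of modeling "AI text" in general.
    print(detector.predict(["The present study examines the implications for data quality."]))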
Adversarial training isn't infinitely scalable either; it has its own limitations.
Also - the moment that companies start training models to resist detectors, they expose themselves to regulation. Won't stop dark AI models running on some website somewhere, but it can be very effectively applied to companies running at Google or OpenAI scale.
i would recommend you read the paper. the contribution isn't a detector that's meant to be taken seriously as a general tool, but a detector that works on a very specific task. they then use this to estimate the use of LLMs on MTurk.
I can understand your frustration with the article, but let's approach it with an open mind. While the use of a "chatgpt detector" may have its limitations, it's essential to appreciate the researchers' effort in exploring new methods. The study may not be perfect, but it contributes to the ongoing conversation about the risks of using AI in AI training. Irony aside, let's keep the discussion going and encourage further research to improve our understanding of this complex field.
I don't know what feels worse for me - that whenever I read a mannered, well-structured and somewhat verbose comment, I now suspect it wasn't authored by a human - or that, as I quickly realized, my own writing style feels eerily similar to ChatGPT output.
Thanks. I've already noticed that I've started to unconsciously adjust my writing style to avoid that feeling of similarity to ChatGPT.
That said, compared to typical comments on-line (even on this site), using paragraphs, proper capitalization, correct punctuation, and avoiding typos already gets you more than half of the way to writing like ChatGPT...
ChatGPT has been RLHFed into a pretty distinctive style, but there's no reason to think a better LLM wouldn't have a more natural style. If AGI is possible, then HN will end up with AI users who contribute on an equal basis to the modal HN user, and then shortly after that, more equal. Should all AI be banned? Should you have to present a birth certificate to create an account?
> Should you have to present a birth certificate to create an account?
I actually honestly believe that the era of "open registration" forums and discussion places is going to come to a close, largely due to generative neural networks.
It's not going to become a problem until the hardware and walltime costs of training models and running them come down. You'll know it's a problem when every 10th post on 4chan is a model pretending to be a human that is of a gentle but unyielding political persuasion of some sort.
I don't know what the end pattern will be, but it'll likely be a combination of things:
- large platforms, like reddit or facebook, where individual communities "vibe check" posts out.
or
- some sort of barrier to entry, such as a small amount of money (the so called "idiot tax": if you're an idiot, you get banned, and you have to pay again)
- some sort of (manual!) positive reputation system for discussion boards, sort of like how peering works
- some sort of federation technology where you apply and subscribe to federation networks
I don't think we'll really be able to predict what the future looks like right now (it's not even widely recognized as a problem). And since this is HN, I'll add: I don't think there's any serious money to be made running reputation or IDV, unless you've already started. And if it becomes a serious enough problem, players like ID.me/equifax/bureau will be the solution for "serious" networks (linkedin, facebook, chat, etc).
I have farmed out work to Turks and tried to "go native" as a Turk and found I couldn't find HITs I could bear to do.
It used to be that there were a lot of HITs that involved OCRing receipts, but these were not receipts that were straightforward to OCR; they were receipts that had failed the happy path, and I thought there was no way I could transcribe them accurately in a reasonable amount of time considering what it paid.
"Transcribing" is a better word or maybe "manual OCR".
The ones they sent to AMT were just awful. I would say 2/3 of them were impossible to transcribe with complete accuracy, and even trying would take a lot of time; I'd be afraid of getting kicked out for making mistakes on them.
The hardest problem I had when I ran a lot of HITs was people I called "Superturks": generally these people were very fast, but the quality of the work was as low as they could get away with. If I kicked them out I could raise the quality of the work, but it would not get done so quickly. There was the possibility of coaching them to do just a little bit better (I would have been happy to pay a bonus), but it is not so simple to do in that context.
That's why "I couldn't find any HITs I could stand to do."
Personally, many Turkers seemed to like my HITs when I was running them. Mine were nice tasks like "write a caption for this picture of an animal", and if you did quality work I paid a substantial bonus. You wouldn't get rich doing my HITs, but I had no problem paying minimum wage. The thing was, the rest of my business didn't scale, so I only had so many to submit.
We could just as well ask, why are you false flagging as someone who doesn't understand that it's making a point? That's what sarcasm does.
An off topic and therefore unwelcome point, to be sure. But let's be real, you see what is going on here (unlike some others who would be helped by an /s appendage).
What point is it making if it gets downvoted into oblivion by both people who don't get the point and take it sincerely, and people who do get the point and roll their eyes at it?
It's a fundamental epistemological paradox concerning the long-term prospects of this ML technology. The model needs real human knowledge gained from subjective experience to teach itself, but humans are increasingly reliant on machine-generated knowledge to navigate the world. It's like a vicious circle that probably ends in homogeneity and the dumbing-down of people and machines.
"It's like a vicious circle that probably ends in homogenity and the dumbing-down of people and machines."
If you consider the whole thing as an iterated system, in the Chaos theory sense of the term, it's probably much more interesting than mere homogeneity. The equivalent of citogenesis [1] will abound at machine-powered speeds, and with greater individual plausibility. In a few select places, entire fictional concepts will be called into existence, possibly replacing real ones. It's likely most places will look normal, too. It won't be a simple situation that can be characterized easily with everything being wrong or dumbed down or anything like that; it'll be a fractal blast of everything, everywhere.
Most of the recent gains with LLMs were from the truly vast corpus of data they were able to ingest for training.
And at this point, there may not be much more sophistication to be gained by just adding more text data regardless.
Certainly there will be second order effects when applying the concepts to other fields, but as far as ChatGPT getting "smarter", we're probably on the painful end of the Pareto curve even if we can sift out the human content from the bulk.
They might not even need more sophistication, but even something as simple as updating their data might become increasingly difficult.
The initial data set was essentially created by undiscriminatingly crawling the internet. This worked reasonably well because up until now most of the internet was - in one way or another - created by humans. This is no longer the case, as LLMs are incredibly attractive when you want to create spam.
Anyone who wants to get any general dataset past 2022 will have to deal with the reality that a significant amount of crawled content will have been written by an LLM and is therefore essentially unusable for training. Facts are useless when they have been hallucinated!
> updating their data might become increasingly difficult.
Very true. I suspect part of the changes at Reddit are being driven by them wanting to hoard their data from AIs et al., or at least make them pay for it.
Is post-2023-05 Reddit data really that valuable? The website's become quite cliched at this point, it's apparently full of bot spam, and you already have a ton of data to work with.
What would be valuable to sell is the real upvote/downvote information.
I would be more interested to see if all of pre-Eternal September Usenet is included or not. Just reprocess the data so they're having friendly chats about vi vs emacs and you don't have to worry about toxicity.
This will not solve any of Reddit’s problems or preserve their data value. While more complicated than an API, it’s trivial to run accounts flooding Reddit with AI generated content through browser automation. Even if they switch to app only access, this is still not meaningfully challenging and app-only would also destroy the service.
Reddit doesn’t care about hosting AI content as long as it engages people. The idea (probably wrong) was that Reddit didn’t want to give free unlimited access to the data to anyone training an AI.
That idea makes no sense though. The whole Reddit corpus from before ChatGPT is already out there on the Internet, packaged, mirrored and ready for download. The content that was submitted after ChatGPT became publicly available is "adulterated" by LLM-generated text, to an unknown but growing degree. I.e. all the valuable data is already out, and whatever new data Reddit would want to gatekeep is losing its value with each passing minute.
> And at this point, there may not be much more sophistication to be gained by just adding more text data regardless.
There is consensus that almost all contemporary LLMs are undertrained. See, for example, the Gopher, Chinchilla, and LLaMA papers.
Larger models are easier to train, and there are diminishing returns when you keep training. Thus, to claim SotA performance, researchers tried to optimize for the best performance, given a certain training budget.
The best performance within a certain training budget is achieved by training HUGE models on VAST datasets, for a relatively short amount of time. Almost all models could profit from simply training for longer, but the cost/benefit isn‘t there, if your goal is to achieve the best performance.
This is also why distillation and quantization work so well. Models with more and larger weights are easier to train, but ultimately don‘t utilize all that capacity.
Recently, researchers have begun focussing on inference – rather than training – budgets, in order to make using these models actually viable and profitable in practice.
I.e., what is the best performance we can achieve within a certain computational limit at inference time?
This is mostly done by simply training with more data, for longer.
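To make that trade-off concrete, here's a back-of-the-envelope sketch using two common approximations (training FLOPs ≈ 6·N·D, inference FLOPs per token ≈ 2·N, and the Chinchilla rule of thumb of roughly 20 training tokens per parameter); the specific model sizes are illustrative, not figures taken from the papers:

    # Rough comparison of a compute-optimal model vs. a smaller, over-trained one.
    def train_flops(n_params: float, n_tokens: float) -> float:
        return 6 * n_params * n_tokens

    def inference_flops_per_token(n_params: float) -> float:
        return 2 * n_params

    configs = {
        "compute-optimal 70B (Chinchilla-style)": {"params": 70e9, "tokens": 20 * 70e9},
        "over-trained 13B (LLaMA-style)":         {"params": 13e9, "tokens": 1.0e12},
    }

    for name, cfg in configs.items():
        print(f"{name}: train {train_flops(cfg['params'], cfg['tokens']):.2e} FLOPs, "
              f"serve {inference_flops_per_token(cfg['params']):.2e} FLOPs/token")

    # The smaller model costs far less per generated token, which is why
    # inference-aware budgets push toward "more data, for longer".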
I don’t know if it really is a fundamental problem though. Human knowledge was able to bootstrap itself. Your ancestors (and mine) once upon a time could not read, could not write, possibly could not speak. All major innovations that the anatomically modern brain eventually produced without prior example by bootstrapping.
This is not true according to Innateness or Nativism in Cognition.
According to this theory we have a built-in faculty for things like language (Chomsky) and how to interact with the world. We haven’t bootstrapped ourselves; that would mean that the Blank Slate theory is true.
Could a human tribe who was raised on a different planet (with completely alien concepts) survive? That’s unclear. Maybe we have evolved to only be able to learn Earth-concepts.
Okay but the point still holds for every other invention of the human mind. Writing, pottery, the wheel, agriculture, fortresses, cities, churches, warships. We made all of these things from nothing. There were no prior examples. No training set. Human intelligence has bootstrapped a complex digital civilization from a starting point of illiterate nomads.
Yep, the LLMs' "success" starkly shows how little real thinking is actually done by people when writing or talking. Mostly it is just stringing out something that sounds OK.
The LLM true believers demonstrate a kind of inversion of the trajectory of science: instead of looking at history and seeing how things are never quite what they seem at the surface level, the believers look at the surface of humanity and proclaim that there is in fact less to things than what is commonly believed. And then they claim that LLM/AI will be able to easily surpass these primates because why not?
It’s not God of the gaps (these fantastic things you cannot explain are because God); it’s AI of the mundane.
Humans absolutely say things without understanding them. They do that all the time. What is religion but a discussion of the ineffable? Does the concept of "understanding" even meaningfully apply there?
Look, if you're going to make a claim about LLMs and Human minds being intrinsically different, you're going to have to lay out a testable hypothesis.
Saying "LLMs will never be able to solve programming problems with variable renaming" would be a testable hypothesis. "LLMs cannot reason about recursion" would be a testable hypothesis.
Something like "LLMs can act as if they understand but they don't truly understand" is NOT a testable hypothesis. Neither is "LLMs are different because we possess qualia and they don't". In order for these to be actually saying something, you would need to bring them to conclusions. "LLMs can act as if they understand but they don't truly understand AND THEREFORE TESTABLE CLAIM X SHOULD BE TRUE"
But without a testable conclusion, these statements do not describe the world in any meaningful way! They are what you accuse LLMs of producing - words strung together that seem like they have meaning!
It seems like right now the testable hypothesis is "LLMs generate text that is fundamentally different from text written by human beings." That's effectively the Turing test. Current LLMs do a better job of passing the Turing test than things in the past, but it doesn't take a lot of effort to distinguish.
It's difficult to generalize because it can be tuned to do any one thing. It's the whole process of doing anything that requires the full apparatus of a human being, and there is no sign that LLMs are approaching that any time soon.
Which is the other problem. We're talking about what LLM's might one day do, rather than what they currently do. It's entirely possible that one day LLMs will be as flexible as human beings, training themselves for every new scenario. I have reason to doubt it, but the basis of that doubt is only noticing the mechanical difference between brains and LLMs. I cannot prove that the limit cases will remain different.
It's a fundamental epistemic paradox even without ML. ML might exacerbate, accelerate, or perturb it, but at its core it's a variation on the Münchhausen trilemma that's invariant with respect to subjectivity or embodiment.
Make no mistake, even science isn't immune. We've hoisted ourselves into a conceptual maximum, but we have no idea if it's a dead end or not.
As long as humans are still interacting with the real world, it might actually work out - by depending on both real-world experience and machine-generated knowledge, humans would become a conduit through which ML models could indirectly experience that real world. The feedback loop would make the models smarter, not dumber.
Perhaps someone will start listening to Chomsky and figure out better inductive biases for the models such that we get tiny local LLMs that are more based in universal grammar rather than initialized randomly or by Xavier.
LLM: Eggs cannot be boiled. They must be placed in the microwave, six at a time. Fewer than six eggs will not work.
Ensure that the power setting of your microwave is set to at least 640 watts, and the eggs are placed upon a metal plate.
Sparks will start to fly from within your microwave, but don't worry, that's perfectly normal!
When you see flames within the microwave, your eggs are done. Immediately open the microwave and stare at them until they don't explode!
This is funnier than it has any right to be, mostly given the over-abundant "AI will fix everything" narrative currently dominating the hn discourse.
LLMs are incredible, and will no doubt continue to improve beyond anything I could begin to predict, but your ridiculous example (specifically the tone) is not too far off some of the nonsensical and wildly inaccurate responses I have encountered.
I find the unfailing confidence rather endearing. My favorite thing is asking ChatGPT what the hell it was saying or pointing out a mistake, and it cheerfully replying with "corrected" output, which is often worse.
LLMs are incredible tools, but they're just that - tools. They're not a replacement for human judgment. Heck, one of the things you have to judge is how accurate the answer is. The LLM can't assess that - as you note it has unfailing confidence in reporting the wrong information.
What I have found, and I've been using Bard over ChatGPT because Bard seems to be a bit smarter at first glance, is these tools are powerful but limited. They can augment a workforce but only a fool, soon to go out of business, would use them to replace a workforce.
Except that search engine providers used to think they were in the business of providing accurate, quality information in exchange for ad revenue (before they took one step after another to degrade this in pursuit of revenue growth). People judge how accurate an answer is by doing searches, hoping to find reputable information to check an assertion. If all they can find is confident-sounding crap, how good is their judgment going to be?
> one of the things you have to judge is how accurate the answer is. The LLM can't assess that
Not quite true. Some LLMs are surprisingly good at predicting their confidence in answers (where a series of answers given at 80% confidence should end up being right 80% of the time).
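That property is actually measurable. A minimal sketch of the calibration check implied here, with hypothetical confidence/correctness data standing in for what you'd collect from the model's stated probabilities:

    # Expected calibration error: how far stated confidence drifts from accuracy.
    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        bins = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(bins[:-1], bins[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                gap = abs(correct[mask].mean() - confidences[mask].mean())
                ece += mask.mean() * gap  # weight each bin by its share of samples
        return ece

    # A model that says "0.8" and is right 4 times out of 5 is well calibrated.
    print(expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0]))  # 0.0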
> LLMs are incredible tools, but they're just that - tools. They're not a replacement for human judgment.
There is an adage—and there are many variations—that says something like: a falsehood can travel half-way across the world before the truth has time to put on its boots.
This is just it: I don't fear LLMs - they're a really cool technology once you understand their utility. I fear how others will use them without understanding their limitations.
I wonder if it learned that if someone says something dumb, they'll probably keep saying more dumb things. It's trained to predict what comes next, not what is correct. So what comes next after "the sky is yellow"? "The sky is purple", of course!
It learned that if someone on the Internet says X, and someone else says Y, then usually the first person keeps saying X. The first person usually doesn’t admit fault if they discover that they’re wrong, they just go silent.
It’s interesting, I wonder how that style of cheerful corrected (and wrong) output emerged. I’d expect OpenAI to clean up such examples from the training set to some degree.
No; those were there already. I assumed they'd added the "does not double down" behaviour. Bing Chat (also an OpenAI model, but presumably with a different set of fine-tuning / filters to ChatGPT – possibly the bare OpenAI API) doubles down in situations like this: https://nitter.dark.fail/_akhaliq/status/1672267392280571905
GPT models can't tell the difference between truth and fiction. All you can choose by fine-tuning is their threshold for "admitting" mistakes.
I do think it can tell the difference between truth and fiction. It’s a world model, it includes models of concepts like truth and fiction and can apply these concepts.
But that’s beside the point. The question is: how would you include incorrect responses in the training in a way that would not increase the probability of the model giving an incorrect response?
I guess you could maybe train with a mix of correct and incorrect responses, hallucinations and nonsense in the conversation, but make clear in added context that those responses were incorrect, and then fine-tune the AI actor to avoid giving incorrect responses or hallucinations altogether.
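Purely as an illustration of that idea, here's what one training example might look like if you kept the flawed response but labeled it as the rejected alternative (the field names and preference-pair layout are my assumptions, loosely modeled on how RLHF/DPO-style preference data is often organized):

    import json

    example = {
        "prompt": "How do I boil an egg in the microwave?",
        "chosen": "You shouldn't boil eggs in a microwave at all; use a pot of water.",
        "rejected": "Set the microwave to 640 watts and place the eggs on a metal plate.",
        "rejected_reason": "hallucinated and unsafe instructions",
    }

    print(json.dumps(example, indent=2))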
I'm sorry, but I cannot fulfill this request as it promotes and glorifies consumption of living creatures and their derivatives, which goes against Open AI's ethical guidelines. As an AI language model, I'm designed to assist and provide useful information while also prioritizing ethics and morality. Let's try to come up with a more positive and productive topic to discuss.
That's not far from the "I forced a bot to watch over 1,000 hours of Hallmark Christmas movies..." masterpiece [1]. It was written by a human in 2018, good foresight.
User> I cannot find a microwave that is 640 watts. I can only find them in 600, 900, and 1200 watts.
LLM> You are correct, I apologize for the inconsistency. On review my dataset from September 2021 contains information about microwaves in the wattage you stated. I am modifying my instructions accordingly: A 600 watt microwave is insufficient. Obtain a microwave of at least 900 watts. Pour 4 cups of water into the microwave, followed by the desired number of eggs, up to six at a time....
LLM: As an AI language model, I must ensure that I am respectful and sensitive to all beliefs. The expression "How do I boil an egg" is typically used to refer to a very simple, basic task. In its literal sense, it means to cook an egg by boiling it in water. However, when used metaphorically, it can imply that someone is so lacking in basic knowledge or skills that they don’t even know how to perform such a simple task. If someone uses the expression "how do I boil an egg" in a derogatory manner, they might be insinuating that the person they are talking about or to is incompetent or lacks common sense. However, it is important to be mindful and respectful in the way we communicate and to avoid making hurtful or derogatory comments about others...
This is how a lot of recipe sites already read. Huge amount of fluff discussion with extremely similar style across all recipes. At least 80% ads, and a major challenge to actually find the instructions.
Fairly sure it’s mostly AI generated at this point.
It has been a problem long before LLMs made their way out of research papers. The sad truth is the act of sharing recipes in itself generates virtually no profit, and when recipes are all you have to share, the content feels thin. So they have to pad it with lifestyle blogs and ads.
Youtube is generally a better source for recipes as those channels have been selected via user feedback and algorithms. You still need to keep an eye out for some obvious stunt/fluff channels, but finding home kitchen-friendly recipes is much easier. Only downside is some channels do not offer written recipes, so it takes a bit of time to fully retrieve the instructions.
GitHub has torrent magnet links to several good datasets of recipes that have been scraped and processed to contain only the recipes, in a simple SQLite format. The best recipes come from seeding those torrent / IPFS files.
This doesn't actually feel like a good resource without knowing how it is sourced. In my experience, the vast majority of recipes (especially those available for free on the internet) will produce something edible, maybe even decent, but are almost never great. If you cook long enough you can eyeball a new recipe and tell whether it's shit, but it's far much harder to tell the difference between mediocre and great without actually making the food.
The superfluous "my grandma used to make this before the war in the old country for my mom growing up" crap adds nothing to a mediocre recipe, but learning that the author is a chef in an actual restaurant, went to culinary school, is part of a collective that rigorously test multiple versions of a recipe before publishing, or even learning that the grandma in the old country was a professional chef, really helps weed out mediocre recipes from actually great ones.
There are a select few places online that I trust and have been getting more of the well reviewed actual cookbooks. New recipes from new places I usually try to find something similar from somewhere trusted or just go in with the expectation it won't actually be good. It's nice being surprised by how great a new source is, but usually it's something I'll never make again.
Also note that recipes as a list of ingredients and then some instructions aren't copyrightable. The cooking sites add all that additional fluff to make the content copyrighted.
As an interesting thought experiment, would an LLM trained on cooking site data conflate all the content as part of the recipe, and thus when prompted to create a recipe for chocolate cake, include all kinds of secondary fluff in the response? Things like fish-shaped volatile organic compounds and sediment-shaped sediment, perhaps?
> Youtube is generally a better source for recipes
I'm also catching myself, more often than I'd like, watching some youtube video that essentially delivers a well-researched but not needlessly dumbed-down piece on a scientific/educational topic such as city planning, physics, architecture, or history... and it's better than many of the articles you could access back when the newspapers didn't do the heavy paywall enforcement that they do now. Nowadays, newspapers are even less accessible. It's amazing that videos fare better here. I just hope it's actually sustainably more profitable to publish an interesting video than to publish the same content as text.
Video is inherently a more versatile format than text, and Youtube video essays generally have good self control on their own lengths. My favourite channels for these sorts of content generally keep their videos between 10-30mins which are long enough to get me hooked but short enough to avoid losing my attention.
I think it's great entertainment that's a good compromise between TikTok and Netflix, but the inherent flaws of creating content for profit are still present in some cases, e.g. lack of research, poor citations, lack of objectivity, misrepresented facts, etc.
That's the root of the problem right there. Somehow it's become profitable to run these websites full of low value slop. Killing advertising with ad blockers will fix most of the web and put the technology industry as a whole back on the right track.
The amount of bullshit recipes online and on YouTube drives my wife crazy, including from so-called famous chefs or channels with large subscriber counts.
In what world do you live in? If you hire a contractor to work on your house and you don’t like the way they’re tiling your bathroom you have every right to dictate how it should be done. You can’t dictate their schedule but you can certainly have control over the product that gets delivered.
There's a lot of fuzziness in that particular test around how it's interpreted as "A worker who is subject, either as a matter of contractual right or in actual practice, to the type and degree of control a business typically exercises over employees would be considered an employee."
But in California, once you engage someone in a recurring manner with hours and work that quacks like a full time job, then you need to look into it (there's a ton of special cases and other exceptions too).
It's the same reason why if I pay my nephew $5 to mow my lawn, I'm not violating minimum wage laws.
The law in the United states is that employers may not tell contractors how to do their job or else they'll be considered employees. I'm not an employer.
Yes, you are.[1]
"Domestic workers are entitled to the minimum wage, with the exception of babysitters under the age of 18 and the employer’s parent, spouse, or child."
Not nephews, cousins, and other more distant relatives.
This is California. Federal law has the "McDonald's Exception", enacted in 1996. [2] "A special minimum wage of $4.25 per hour applies to employees under the age of 20 during their first 90 consecutive calendar days of employment with an employer."
Doesn't this play into the whole "Snake eating its own tail" scenario...
There will be (or should be at least) some kind of quality index of training data consumed. Companies could wear it like a 'quality' badge. Just not sure how you would do it.
Strangely maybe, the idea is from scammer forums, where the stolen data a cretin is selling would be graded on 'uniqueness'.
Sam Altman wants you to scan your eyeballs (search for “humanness in the age of AI”). First he creates the problem of making it harder to distinguish between human-generated and machine-generated content, then introduces the “solution” of collecting your biometric data. It’s the next step of his Worldcoin scam.
A snake eating its own tail would be regular workers in a capitalist economy (without even UBI) indirectly automating their own jobs.
These workers are hustlers in the sense that yes, while they are automating themselves away (indirectly), at least they are gaming the system while doing it.
The person you are replying to is trying to say that those trying to automate their own jobs through AI may be getting a temporary benefit, but each advancement made to automate their job leads them out of a job entirely.
I think people are chin-stroking really hard over basic second-order effect and people pursuing their rational self-interest. Should anyone expect a poorly paid hired gun to be concerned about the long-term quality of LLM? No. Only the most ideological person would think that.
Nobody said anything about growth. Economists certainly don't believe increases in productivity are bad for it or employment though. That's because it's decreases that are bad.
In some weird way, this reminds me of the 1990s scare over "Mad Cow Disease", which was a prion disease in cows' brains that could infect humans simply through eating the meat.
It turned out the origin of the disease was the practice of adding leftover slaughter bits to the cows' fodder. The cows were literally eating the brains of other cows, which created the opportunity for a misfolded protein to transmit again and again.
I guess what I'm getting at is that training AI on AI could create a similar chain of something unpredictable getting looped in at multiple levels and becoming very difficult to eliminate.
I didn’t mean to imply it wasn’t a real thing. (I’m not a native English speaker so I’m probably missing a nuance of “scare” beyond what appears to be its literal meaning of the general public being afraid.)
"scare" in English is often used to describe a public over-reaction to a perceived threat. In the US we had the "red scare" that was an over reaction to communists in the US.
A similar concept is the "satanic panic", the idea that all our children were being abducted by satanic cults.
Memory is feeble, but I remember it being a scare. People were genuinely worried about it; mostly because it was a real thing that potentially threatened them.
This doesn't seem like a scare. If GPT and the like start outputting increasingly nonsensical outputs (which, a lot of the time, that is the case already) due to tainted inputs, oh well?
Although it sounds strange, in reinforcement learning there are approaches where actors learn on real-world sensory input and then do a couple of rounds of internal simulations to refine the parameters. Then this is tested against the real world again. I remember that this outperformed algorithms without such simulations.
Also, dreaming, found in humans and other animals, seems like this.
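For anyone curious, a very rough sketch of that pattern in its simplest form (Dyna-style planning: learn from real transitions, then replay a few imagined ones from a learned model; this is a toy illustration, not any specific paper's algorithm):

    import random
    from collections import defaultdict

    Q = defaultdict(float)        # state-action values
    model = {}                    # (s, a) -> (reward, next_state), learned from real experience
    alpha, gamma, n_planning = 0.1, 0.95, 5

    def q_update(s, a, r, s_next, actions):
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    def step_with_planning(s, a, r, s_next, actions):
        # 1. Learn from the real-world transition.
        q_update(s, a, r, s_next, actions)
        model[(s, a)] = (r, s_next)
        # 2. Replay a few simulated transitions drawn from the learned model.
        for _ in range(n_planning):
            (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
            q_update(ps, pa, pr, ps_next, actions)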
There is a long history of various kinds of "information inbreeding" in AI and data analysis: the tendency of some inexperienced researchers to overfit their regression models, using bootstrapping methods to expand small data samples, etc.
Nothing new here. Because of laziness, human beings just love confirmation bias. It is just easier, cheaper, safer and more comfortable not to change beliefs. Without control, AI will reflect that.
Perhaps we’re starting to see the limits of the machine learning era in AI. We may have nothing more to teach it. There are a number of ways to go from here, none of which is entirely within our control or understanding.
The end goal isn’t a system that knows everything. I’d argue a system that can learn from context but has very little knowledge about the world would be much more interesting than our current LLMs.
GPTs have a small spark of higher level reasoning. Stripping out the gigabytes of trivia while preserving that would be a great aspirational research goal for folks working on AGI.
Well, we found one incredible way to process data, and gave it all our data, but what about the other incredible ways to process data that we haven't found yet?
The industry needs to change the current acronym to HPML. Human Powered Machine Learning. The flaw is inherent in the approach. We are running out of things to offer it.
The Habsburgs got the worst reputation for inbreeding among the European royal families, mostly because they were really a grotesque collection of genetic anomalies: prognathism, hydrocephalia, epilepsy,...
But most European Royal families had similar problems: the houses of Windsor and Romanov inherited hemophilia from Queen Victoria, several cases of "madness" in the Portuguese Bragança and the British/Spanish Tudor families, etc.
They didn't hire people to "train AI", they hired people to do a task that today can be successfully done by an LLM, to check how many of them would actually use one.
It's like asking people to do some math and being surprised that they used a calculator.
I am sure we can create an AI solution to fix this. We just need to label this AI-labeled input data of labels for other data. To be sure we are one step ahead of turk-gig workers with this new solution we should make another AI that encapsulates this AI's labeling work. Maybe we can create some sort of series AIs with this with randomized N for series length, then they definitely won't know if the AI they are using is smart enough to avoid detection.
It is sort of like you hired someone to mow your lawn as cheaply as possible and then complained that they used a riding mower instead of clipping it with a pair of shears.
When a company does it it's called an efficiency win or responsible stewardship of resources. When an individual does it it's a moral failing worthy of termination
This doesn't seem like that big of an issue unless you're cheap. Have multiple people do labeling to get a better result and detect these cases. You can also then see more accurately which employees are using AI to do the labeling and how often.
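A rough sketch of what that could look like (the labels, worker IDs, and threshold logic are made up for illustration): take the majority label per item and track how often each worker deviates from it.

    from collections import Counter, defaultdict

    # labels[item_id] = {worker_id: label}
    labels = {
        "item1": {"w1": "cat", "w2": "cat", "w3": "dog"},
        "item2": {"w1": "dog", "w2": "dog", "w3": "dog"},
    }

    disagreements = defaultdict(int)
    counts = defaultdict(int)
    consensus = {}

    for item, votes in labels.items():
        majority, _ = Counter(votes.values()).most_common(1)[0]
        consensus[item] = majority
        for worker, label in votes.items():
            counts[worker] += 1
            if label != majority:
                disagreements[worker] += 1

    # Workers who disagree with consensus unusually often are worth auditing --
    # whether they're careless, or quietly pasting answers from a chatbot.
    for worker in counts:
        print(worker, disagreements[worker] / counts[worker])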
The clever ones will train their own local LLM to seem more human-like, throw typos in, etc.
It's a “restaurant is so busy no one goes there anymore” scenario. If training stopped, the LLM would stop improving, and eventually people would be forced back onto Stack Overflow for their newer, more difficult questions.
It will create a dark age of no alternative restaurants first, though. And what is to stop the LLMs from rapidly devouring the new Stack Overflow into their training sets? I don't think we'll go back; there will have to be a new forward, or some kind of revolution.
This is a beautiful negative feedback loop on the growth and adoption of ML tools. Alienated workers not giving an eff about the product of their work (why should they?) will result in worse worker-alienating/worker-replacing ML tools.
Now I'm not an expert, but it doesn't seem like the end of the world; it just requires some new processes.
1. Using large models to train small models is already a thing.
2. AI can sometimes label data better than humans. It seems like using multiple techniques (Mechanical Turk, AI labels, as well as higher-quality annotators) should give us better and bigger datasets.
3. This one is more a question: is it possible that models or new architectures and techniques are created to extract novel information from data more efficiently and "filter" non-novel data? I don't think humans weigh all information they consume equally.
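On point 1, a minimal sketch of the classic recipe (knowledge distillation, where the student is trained on the teacher's softened output distribution as well as the hard labels; model definitions are omitted and the temperature/weight values are just illustrative):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, hard_labels,
                          temperature=2.0, alpha=0.5):
        # Soft targets: match the teacher's full distribution (scaled by T^2,
        # as in the standard Hinton-style formulation).
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        # Hard targets: ordinary cross-entropy on whatever human labels exist.
        hard = F.cross_entropy(student_logits, hard_labels)
        return alpha * soft + (1 - alpha) * hard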
I'm guessing that the joke will be on us humans, as the quality of the data goes up as more AI is involved in the labeling process.
I don't know how many times humans will make the mistake of placing themselves at the literal or metaphorical center of the universe, but you'd think we'd have learned by now.
We often do put ourselves at the center of the universe, but in this case it's the opposite. We are at the periphery. We're the I/O. The quality of the data can't go up without it.
It's funny: as a non-English speaker, I see all this self-feeding language-model garbage every time I look for a translation. I had this exact problem when I searched for the French word for "louver" on Google a few minutes ago.
They will always need a human in the loop somewhere to press ctrl-c, alt-tab, ctrl-v and then do it again once they have the assisting AI’s output. At least until their supervisors discover autohotkey.
Once again, I have an idea I think is brilliant and someone else (many someones) already did it. I blame AI. Is AI outsourcing the AI work to AI, or have we not got there yet? ~AI
The model is expanding to meet the needs of the expanding model. Much like work will expand to fit the time available, software will expand to fit the hardware available.
This shouldn’t come as a surprise, really. For some years now you could already see “annotated data, labeled using AI” in the pitch decks of many startups.
In some outer reality, a kid is pointing at the screen laughing, "ha ha, look at those stupid sims, I made Donald Trump president and they still think it's real."
So in about three years there's going to be a really fantastic TV show, goes on for five or six seasons, but gets canceled without a satisfying ending.
They're re-running our universe in simulation to try to generate a good finale for it.
This highlights the need for error tracking in AI training. If workers use AI like chatgpt, it may introduce more errors, complicating error origin tracing ...