People paid to train AI are outsourcing their work to AI (technologyreview.com)
339 points by kungfudoi on June 22, 2023 | 224 comments



This article is complete bunk. The researchers used a "ChatGPT detector", which, as we've seen over and over in academia, doesn't work. This study is completely unfounded.

God, I'm choking on the irony of an article about the dangers of using AI to train AI being based on a study that used AI to detect AI.


Per the article, they didn't just use the static detector:

> They also extracted the workers’ keystrokes in a bid to work out whether they’d copied and pasted their answers, an indicator that they’d generated their responses elsewhere.

So while I don't yet know if the article is bunk -- I do know that your hot take is bunk.
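
For what it's worth, flagging paste-heavy answers from a keystroke log doesn't need anything fancy. A minimal sketch in Python, assuming a hypothetical log of (key, timestamp) events per answer (not the authors' actual pipeline):

    def looks_pasted(events, answer_text, max_typed_ratio=0.2):
        """Flag an answer as likely pasted: a paste event was logged and far fewer
        printable keys were typed than characters ended up in the submission."""
        typed = sum(1 for key, _t in events if len(key) == 1)    # single printable keys
        pasted = any(key == "ctrl+v" for key, _t in events)
        return pasted and typed < max_typed_ratio * len(answer_text)

    # Hypothetical usage: one paste event, almost no typing -> flagged
    print(looks_pasted([("ctrl+v", 0.0), ("e", 1.2)], "A long copy-pasted answer..." * 10))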


Or they typed up their responses in a different text editor and copied and pasted from that?


They never used a static detector.


"Static" in this case refers its being used in isolation (not to any particular kind of detector).

I could have said "they didn't just use the detector all by itself", I suppose.


Everyone’s trying to take the shortcut.

Can someone in this space invest in doing the hard work to have experts manually curate data?

You know back before Wikipedia, publishers used to pay people to write and edit encyclopedias?

It doesn’t scale. Sure. That’s what the AI you’re building is for though - it will scale.

Throwing compute at ‘the entirety of the internet’ feels like such a lazy way to get what we’re after here.


The company that I work at does exactly the service that you're describing. We recently spun up a team of Math PhDs to help with data labeling. (https://www.invisible.co/). We're seeing more and more of our clients ask for graduate level data labelers and content creators.


Right now I'm pretty sure that just having GPT rewrite the low-to-mediocre content that made up the gruel of its generic internet diet, and then doing fine-tunes on that, will get us another 5-10x along, hopefully most of all for us little guys out there with a ~24-48GB VRAM cap.

If GPT-4 really is 8 230B models, the next step for us will be a few ~1-5B models that swap in for whatever you want to create, or talk about, or what have you.

Imagine a model trained just on English football, for the purpose of having a good time in the pub, that gets swapped in when the topic changes to it. I bet you could pass on the Daily Mail's sports page if you added some "u"s to your words.

Or a model finetuned specifically on the library you're trying to debug, maybe even specifically in combination with other tools you're trying to put together.


My well-documented melancholy around the state of the LLM “conversation” notwithstanding, I’ll point out that there’s a long and generally productive history of adversarial training: from the earliest mugshot GANs to AlphaZero, getting these things to play against each other seems to produce interesting results.

Whatever the merits of this or that “ChatGPT detector”, the concept isn’t unprecedented or ridiculous.


Plot twist: the author will turn out to have enlisted the help of AI in writing this article.


It's actually pretty easy to create a bespoke ChatGPT detector!


But will it give reliable results?


According to the paper they get 98% accuracy. Another recent paper came out saying it's always possible to discriminate between real and synthetic text [1].

I think the core problem is with the generalist classifiers (GPTZero, the OpenAI detector, etc.). E.g., OpenAI's classifier has an accuracy of around 25% on its own text. However, when you train a bespoke classifier (like the authors did), you can get really good results.

[1] https://arxiv.org/pdf/2304.04736.pdf
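
A bespoke detector like that is genuinely not much work, which is a big part of why it can beat the generalist tools on its one task. A rough sketch of the idea in Python (placeholder data, not the authors' actual model):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Human answers and ChatGPT answers to the *same* task (here: placeholder strings).
    human_answers = ["a short human-written abstract summary ..."]
    gpt_answers = ["a ChatGPT-generated abstract summary of the same paper ..."]

    texts = human_answers + gpt_answers
    labels = [0] * len(human_answers) + [1] * len(gpt_answers)   # 0 = human, 1 = LLM

    detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                             LogisticRegression(max_iter=1000))
    detector.fit(texts, labels)
    print(detector.predict(["a new worker response to classify"]))

With a few hundred labeled examples per class on one narrow task, something this simple can plausibly reach the kind of accuracy the paper reports; the generalist detectors struggle because they try to cover every domain at once.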


The moment a detector is taken seriously is the moment it will be trivially beaten by another AI designed to beat the detector.


Is it now?

Adversarial training isn't infinitely scalable either; it has its limitations too.

Also - the moment that companies start training models to resist detectors, they expose themselves to regulation. That won't stop dark AI models running on some website somewhere, but it can be applied very effectively to companies operating at Google or OpenAI scale.


I would recommend you read the paper. The contribution isn't a detector that's meant to be taken seriously in general, but a detector that works on one very specific task. They then use this to estimate the use of LLMs on MTurk.


I can understand your frustration with the article, but let's approach it with an open mind. While the use of a "chatgpt detector" may have its limitations, it's essential to appreciate the researchers' effort in exploring new methods. The study may not be perfect, but it contributes to the ongoing conversation about the risks of using AI in AI training. Irony aside, let's keep the discussion going and encourage further research to improve our understanding of this complex field.


So polite it hurts. I wonder if in the future people on the internet will leave deliberately offensive posts to show that they are human.


It's crazy, isn't it?

I don't know what feels worse for me - that whenever I read a mannered, well-structured and somewhat verbose comment, I now suspect it wasn't authored by a human - or that, as I quickly realized, my own writing style feels eerily similar to ChatGPT output.


If it helps, your response here doesn't feel similar to ChatGPT output.


Thanks. I've already noticed that I've started to unconsciously adjust my writing style to avoid that feeling of similarity to ChatGPT.

That said, compared to typical comments on-line (even on this site), using paragraphs, proper capitalization, correct punctuation, and avoiding typos already gets you more than half of the way to writing like ChatGPT...


@dang are we ever going to do anything about this? You almost can't read a comment section in a thread about AI without this crap now.


I mean, what rule do you actually want here?

ChatGPT has been RLHFed into a pretty distinctive style, but there's no reason to think a better LLM wouldn't have a more natural style. If AGI is possible, then HN will end up with AI users who contribute on an equal basis to the modal HN user, and then shortly after that, more equal. Should all AI be banned? Should you have to present a birth certificate to create an account?


> Should you have to present a birth certificate to create an account?

I actually honestly believe that the era of "open registration" forums and discussion places is going to come to a close, largely due to GNN.

It's not going to become a problem until the hardware and wall-time costs of training models and running them come down. You'll know it's a problem when every 10th post on 4chan is a model pretending to be a human that is of a gentle but unyielding political persuasion of some sort.

I don't know what the end pattern will be, but it'll likely be a combination of things

- large platforms, like reddit or facebook, where individual communities "vibe check" posts out.

or

- some sort of barrier to entry, such as a small amount of money (the so called "idiot tax": if you're an idiot, you get banned, and you have to pay again)

- some sort of (manual!) positive reputation system for discussion boards, sort of like how peering works

- some sort of federation technology where you apply and subscribe to federation networks

I don't think we'll really be able to predict what the future looks like right now (it's not even widely recognized as a problem). And since this is HN, I'll add: I don't think there's any serious money to be made running reputation or IDV, unless you've already started. And if it becomes a serious enough problem, players like ID.me/Equifax/the credit bureaus will end up handling it for "serious" networks (LinkedIn, Facebook, chat, etc.).


I wonder how far off we are from an AI with a forged birth certificate.


I have farmed out work to Turks and tried to "go native" as a Turk and found I couldn't find HITs I could bear to do.

It used to be there were a lot of HITs that involved OCRing receipts, but these were not receipts that were straightforward to OCR; they were receipts that had failed the happy path, and I thought there was no way I could transcribe them accurately in a reasonable amount of time considering what it paid.


I don’t know what a HIT is, but I’m pretty sure the problem is that they wanted you to use your _eyeballs_! Not OCR.

And yeah, the service is notorious for underpaying.


A HIT is a single task on Mechanical Turk.

"Transcribing" is a better word or maybe "manual OCR".

The ones they sent to AMT were just awful. I would say 2/3 of them were impossible to transcribe with complete accuracy, and doing so would have taken a lot of time; I'd have been afraid of getting kicked out for making mistakes on them.

The hardest problem I had when I ran a lot of HITs was people I called "Superturks": generally these people were very fast, but the quality of their work was as low as they could get away with. If I kicked them out I could raise the quality of the work, but it would not get done as quickly. There's the possibility of coaching them to do just a little bit better (I would have been happy to pay a bonus), but it is not so simple to do in that context.


Sounds like a waste of time to me. You seem like an intelligent person. Doesn’t a minimum wage job pay better?


That's why "I couldn't find any HITs I could stand to do."

Many Turkers seemed to like my HITs when I was running them; mine were nice tasks like "write a caption for this picture of an animal", and if you did quality work I paid a substantial bonus. You wouldn't get rich doing my HITs, but I had no problem paying minimum wage; the thing was, the rest of my business didn't scale, so I only had so many to submit.


Some mturk workers mturk while physically present at another job


Eyeballs are optical, and it sounds like GP was recognizing characters, so I think it was, in fact, Optical Character Recognition.



[flagged]


I'm pretty sure he's referring to https://www.mturk.com/worker, the name of which is itself a reference to https://en.wikipedia.org/wiki/Mechanical_Turk


I thought turk referred to 'Amazon Mechanical Turk'


Correct. Not people from Turkey, at least not so far as I know.


Why are you false-flagging as someone who cares about such things?


We could just as well ask, why are you false flagging as someone who doesn't understand that it's making a point? That's what sarcasm does.

An off topic and therefore unwelcome point, to be sure. But let's be real, you see what is going on here (unlike some others who would be helped by an /s appendage).


What point is it making if it gets downvoted into oblivion by both people who don't get the point and take it sincerely, and people who do get the point and roll their eyes at it?


HN isn't for trolls.


Right, we agree on that.


Can you explain the "point"?


I’ll decline. It was off topic.


This is not the kind of comment I expect to find on HN. Please consider deleting it.


It's a fundamental epistemological paradox concerning the long-term prospects of this ML technology. The model needs real human knowledge gained from subjective experience to teach itself, but humans are increasingly reliant on machine-generated knowledge to navigate the world. It's like a vicious circle that probably ends in homogeneity and the dumbing-down of people and machines.


"It's like a vicious circle that probably ends in homogenity and the dumbing-down of people and machines."

If you consider the whole thing as an iterated system, in the Chaos theory sense of the term, it's probably much more interesting than mere homogeneity. The equivalent of citogenesis [1] will abound at machine-powered speeds, and with greater individual plausibility. In a few select places, entire fictional concepts will be called into existence, possibly replacing real ones. It's likely most places will look normal, too. It won't be a simple situation that can be characterized easily, with everything being wrong or dumbed down or anything like that; it'll be a fractal blast of everything, everywhere.

[1]: https://en.wikipedia.org/wiki/Wikipedia:List_of_citogenesis_...


In layman's terms: https://xkcd.com/978/


In slightly longer layman's terms: https://www.youtube.com/watch?v=OjlKIjLWq-Y

Snopes did recently confirm that this video is in fact accurate.


Arachnophobes, please resist the urge to stop the video in the first few seconds. It is worthwhile.


> Snopes did recently confirm that this video is in fact accurate.

I don’t trust anymore. :-)


Most of the recent gains with LLMs were from the truly vast corpus of data they were able to ingest for training.

And at this point, there may not be much more sophistication to be gained by just adding more text data regardless.

Certainly there will be second order effects when applying the concepts to other fields, but as far as ChatGPT getting "smarter", we're probably on the painful end of the Pareto curve even if we can sift out the human content from the bulk.


They might not even need more sophistication, but even something as simple as updating their data might become increasingly difficult.

The initial data set was essentially created by undiscriminatingly crawling the internet. This worked reasonably well because up until now most of the internet was - in one way or another - created by humans. This is no longer the case, as LLMs are incredibly attractive when you want to create spam.

Anyone who wants to get any general dataset past 2022 will have to deal with the reality that a significant amount of crawled content will have been written by an LLM and is therefore essentially unusable for training. Facts are useless when they have been hallucinated!


> updating their data might become increasingly difficult.

Very true. I suspect part of the changes at Reddit are being driven by them wanting to hoard their data from AIs et al., or at least make them pay for it.


Is post-2023-05 Reddit data really that valuable? The website's become quite clichéd at this point, it's apparently full of bot spam, and you already have a ton of data to work with.

What would be valuable to sell is the real upvote/downvote information.

I would be more interested to see whether all of pre-Eternal-September Usenet is included or not. Just reprocess the data so they're having friendly chats about vi vs emacs and you don't have to worry about toxicity.


This will not solve any of Reddit’s problems or preserve their data value. While more complicated than an API, it’s trivial to run accounts flooding Reddit with AI generated content through browser automation. Even if they switch to app only access, this is still not meaningfully challenging and app-only would also destroy the service.


Reddit doesn’t care about hosting AI content as long as it engages people. The idea (probably wrong) was that Reddit didn’t want to give free unlimited access to the data to anyone training an AI.


That idea makes no sense though. The whole Reddit corpus from before ChatGPT is already out there on the Internet, packaged, mirrored and ready for download. The content that was submitted after ChatGPT became publicly available is "adulterated" by LLM-generated text, to an unknown but growing degree. I.e. all the valuable data is already out, and whatever new data Reddit would want to gatekeep is losing its value with each passing minute.


> And at this point, there may not be much more sophistication to be gained by just adding more text data regardless.

There is consensus that almost all contemporary LLMs are undertrained. See, for example, the Gopher, Chinchilla, and LLaMA papers.

Larger models are easier to train, and there are diminishing returns when you keep training. Thus, to claim SotA performance, researchers tried to optimize for the best performance, given a certain training budget.

The best performance within a certain training budget is achieved by training HUGE models on VAST datasets, for a relatively short amount of time. Almost all models could profit from simply training for longer, but the cost/benefit isn't there if your goal is to achieve the best performance for the budget.

This is also why distillation and quantization work so well. Models with more and larger weights are easier to train, but ultimately don't utilize all that capacity.

Recently, researchers have begun focusing on inference – rather than training – budgets, in order to make using these models actually viable and profitable in practice.

I.e., what is the best performance we can achieve within a certain computational limit at inference time?

This is mostly done by simply training with more data, for longer.
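
To put rough numbers on it (back-of-the-envelope only, using the usual rules of thumb rather than the papers' exact fits):

    # Chinchilla-style heuristics: ~20 training tokens per parameter is roughly
    # compute-optimal, and training cost is roughly 6 * params * tokens FLOPs.
    def chinchilla_optimal_tokens(params):
        return 20 * params

    def training_flops(params, tokens):
        return 6 * params * tokens

    for params in (7e9, 70e9, 175e9):
        tokens = chinchilla_optimal_tokens(params)
        print(f"{params / 1e9:.0f}B params -> ~{tokens / 1e12:.1f}T tokens, "
              f"~{training_flops(params, tokens):.1e} training FLOPs")

The point is that a model trained on more tokens than this heuristic suggests keeps improving; it just stops being the cheapest way to hit a benchmark number. LLaMA-style models deliberately overshoot it to get a better model per inference FLOP.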


Are you saying contemporary LLMs would actually benefit from more data?


Yes, and longer training as well.


More "quality" data equals more better.

The argument here is the LLM generated text is now going to enter the corpus, muddying the waters and reducing the quality.


Perhaps Meta will release their dataset for Galactica sometime, and we will have incredibly good training data quality.


Current LLM outputs already kind of reek of reddit-style responses, at least to me. Is a hivemind really any different than an AI making its own data?


I don’t know if it really is a fundamental problem though. Human knowledge was able to bootstrap itself. Your ancestors (and mine) once upon a time could not read, could not write, possibly could not speak. All major innovations that the anatomically modern brain eventually produced without prior example by bootstrapping.


This is not true according to Innateness or Nativism in Cognition.

According to this theory we have a built-in faculty for things like language (Chomsky) and how to interact with the world. We haven’t bootstrapped ourselves; that would mean that the Blank Slate theory is true.

Could a human tribe who was raised on a different planet (with completely alien concepts) survive? That’s unclear. Maybe we have evolved to only be able to learn Earth-concepts.


Okay but the point still holds for every other invention of the human mind. Writing, pottery, the wheel, agriculture, fortresses, cities, churches, warships. We made all of these things from nothing. There were no prior examples. No training set. Human intelligence has bootstrapped a complex digital civilization from a starting point of illiterate nomads.


That's because humans don't say things without understanding them.

A LLM will parrot what it learned without any understanding, which is unlike a human.


> That's because humans don't say things without understanding them.

Yeah, about that.

https://v.cx/2010/04/feynman-brazil-education

The parallels to LLMs are rather uncanny, now that I think about it.


Yep, the LLMs' "success" starkly shows how little real thinking is actually done by people when writing or talking. Mostly it is just stringing out something that sounds OK.


The LLM true believers demonstrate a kind of inversion of the trajectory of science: instead of looking at history and seeing how things are never quite what they seem at the surface level, the believers look at the surface of humanity and proclaim that there is in fact less to things than what is commonly believed. And then they claim that LLM/AI will be able to easily surpass these primates because why not?

It’s not God of the gaps (these fantastic things you cannot explain are because God); it’s AI of the mundane.


Humans absolutely say things without understanding them. They do that all the time. What is religion but a discussion of the ineffable? Does the concept of "understanding" even meaningfully apply there?


Yah, I knew someone would reply something like that.

Just because someone sometimes says something without understanding does not in the slightest mean that that is the common occurrence.


Look, if you're going to make a claim about LLMs and Human minds being intrinsically different, you're going to have to lay out a testable hypothesis.

Saying "LLMs will never be able to solve programming problems with variable renaming" would be a testable hypothesis. "LLMs cannot reason about recursion" would be a testable hypothesis.

Something like "LLMs can act as if they understand but they don't truly understand" is NOT a testable hypothesis. Neither is "LLMs are different because we possess qualia and they don't". In order for these to be actually saying something, you would need to bring them to conclusions. "LLMs can act as if they understand but they don't truly understand AND THEREFORE TESTABLE CLAIM X SHOULD BE TRUE"

But without a testable conclusion, these statements do not describe the world in any meaningful way! They are what you accuse LLMs of producing - words strung together that seem like they have meaning!


It seems like right now the testable hypothesis is "LLMs generate text that is fundamentally different from what human beings generate." That's effectively the Turing test. Current LLMs do a better job of passing the Turing test than things in the past, but it doesn't take a lot of effort to distinguish.

It's difficult to generalize because it can be tuned to do any one thing. It's the whole process of doing anything that requires the full apparatus of a human being, and there is no sign that LLMs are approaching that any time soon.

Which is the other problem. We're talking about what LLM's might one day do, rather than what they currently do. It's entirely possible that one day LLMs will be as flexible as human beings, training themselves for every new scenario. I have reason to doubt it, but the basis of that doubt is only noticing the mechanical difference between brains and LLMs. I cannot prove that the limit cases will remain different.


Here's one: LLMs will never be able to verify whether their output is true.


Can you?


> That's because humans don't say things without understanding them.

Oh? I say things only when I don't understand them. Once I understand something, talking about it further seems rather pointless and certainly boring.


I don't really believe in the concept of a soul. Humans are just biological machines too - pretty intricate, but then they had a few billion years.


It's a fundamental epistemic paradox even without ML. ML might exacerbate, accelerate, or perturb it, but at its core it's a variation on the Münchhausen trilemma that's invariant with respect to subjectivity or embodiment.

Make no mistake, even science isn't immune. We've hoisted ourselves onto a conceptual maximum, but we have no idea if it's a dead end or not.

1. https://en.m.wikipedia.org/wiki/M%C3%BCnchhausen_trilemma


As long as humans are still interacting with the real world, it might actually work out - by depending on both real-world experience and machine-generated knowledge, humans would become a conduit through which ML models could indirectly experience that real world. The feedback loop would make the models smarter, not dumber.


Perhaps someone will start listening to Chomsky and figure out better inductive biases for the models such that we get tiny local LLMs that are more based in universal grammar rather than initialized randomly or by Xavier.
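
(For anyone who hasn't met the term: "Xavier" here is just Glorot initialization, i.e. random weights scaled by layer width so activations keep roughly unit variance, with no linguistic prior at all. A minimal sketch:)

    import numpy as np

    def xavier_uniform(fan_in, fan_out, rng=None):
        """Glorot/Xavier uniform init: limit = sqrt(6 / (fan_in + fan_out))."""
        rng = rng or np.random.default_rng(0)
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=(fan_in, fan_out))

    W = xavier_uniform(512, 512)   # one dense layer's weights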


Indeed - it turns out the singularity was in fact self-referential and the whole thing evaporated in a cloud of weights and biases.


It will merely end in diminishing returns, no?


Wow, I feel like for once the headline isn't click-baity enough on this one. I expected maybe a few people, but...

"They estimated that somewhere between 33% and 46% of the workers had used AI models like OpenAI’s ChatGPT."


Two years from now:

User: "How do I boil an egg?"

LLM: Eggs cannot be boiled. They must be placed in the microwave, six at a time. Fewer than six eggs will not work. Ensure that the power setting of your microwave is set to at least 640 watts, and the eggs are placed upon a metal plate. Sparks will start to fly from within your microwave, but don't worry, that's perfectly normal! When you see flames within the microwave, your eggs are done. Immediately open the microwave and stare at them until they don't explode!

Bon Appetit!


This is funnier than it has any right to be, mostly given the over-abundant "AI will fix everything" narrative currently dominating the hn discourse.

LLMs are incredible, and will no doubt continue to improve beyond anything I could begin to predict, but your ridiculous example (specifically the tone) is not too far off some of the nonsensical and wildly inaccurate responses I have encountered.

I find the unfailing confidence rather endearing. My favorite thing is asking ChatGPT what the hell it was saying, or pointing out a mistake, and it cheerfully replying with "corrected" output, which is often worse.


Try pointing out its mistakes when it hasn't made a mistake. You'll usually get the same "I'm sorry, here's the correct answer" output.


And getting it to derive its mistake doesn't always work (NB: this is GPT-3 & Bard; GPT-4 is much better):

https://richardcocks.github.io/prime.html


LLMs are incredible tools, but they're just that - tools. They're not a replacement for human judgment. Heck, one of the things you have to judge is how accurate the answer is. The LLM can't assess that - as you note it has unfailing confidence in reporting the wrong information.

What I have found, and I've been using Bard over ChatGPT because Bard seems to be a bit smarter at first glance, is these tools are powerful but limited. They can augment a workforce but only a fool, soon to go out of business, would use them to replace a workforce.


Except that search engine providers used to think they were in the business of providing accurate, quality information in exchange for ad revenue (before they took one step after another to degrade this in pursuit of revenue growth). People judge how accurate an answer is by doing searches, hoping to find reputable information to check an assertion. If all they can find is confident-sounding crap, how good is their judgment going to be?


> one of the things you have to judge is how accurate the answer is. The LLM can't assess that

Not quite true. Some LLMs are surprisingly good at predicting their confidence in answers (where a series of 80% confidence output should end up being right 80% of the time.)

“Language Models (Mostly) Know What They Know” by Anthropic: https://www.anthropic.com/index/language-models-mostly-know-...
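
Concretely, "well calibrated" there means a check like the following (a minimal sketch, not Anthropic's actual evaluation code):

    from collections import defaultdict

    def reliability(samples):
        """samples: list of (stated_confidence, was_correct); confidence in [0, 1].
        Returns observed accuracy per rounded confidence bucket."""
        buckets = defaultdict(list)
        for confidence, correct in samples:
            buckets[round(confidence, 1)].append(correct)
        return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

    # A well-calibrated model's 0.8 bucket should come out near 0.8 here.
    print(reliability([(0.8, True), (0.8, True), (0.8, False), (0.9, True)]))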


> LLMs are incredible tools, but they're just that - tools. They're not a replacement for human judgment.

There is an adage—and there are many variations—that says something like: a falsehood can travel half-way across the world before the truth has time to put on its boots.


Bard is much less capable than GPT4, if you're only using the free tier.

What I notice about Bard is that it hallucinates almost anytime I ask it anything. "Compare and contrast <two things that don't exist>" is a good one.


This is just it: I don't fear LLMs - they're a really cool technology once you understand their utility. I fear how others will use them without understanding their limitations.


> given the over-abundant "AI will fix everything" narrative currently dominating the hn discourse.

It seems we are not reading the same HN.


True, it's died down a bit in recent weeks. But it's still taken crypto's place in the mindshare.


I wonder if it learned that if someone says something dumb, they'll probably keep saying more dumb things. It's trained to predict what comes next, not what is correct. So what comes next after "the sky is yellow"? "The sky is purple", of course!


It learned that if someone on the Internet says X, and someone else says Y, then usually the first person keeps saying X. The first person usually doesn’t admit fault if they discover that they’re wrong, they just go silent.


It’s interesting; I wonder how that style of cheerful "corrected" (and wrong) output emerged. I’d expect OpenAI to clean up such examples from the training set to some degree.


I assumed OpenAI had added that themselves.


They’ve added incorrect answers?


No; those were there already. I assumed they'd added the "does not double down" behaviour. Bing Chat (also an OpenAI model, but presumably with a different set of fine-tuning / filters to ChatGPT – possibly the bare OpenAI API) doubles down in situations like this: https://nitter.dark.fail/_akhaliq/status/1672267392280571905

GPT models can't tell the difference between truth and fiction. All you can choose by fine-tuning is their threshold for "admitting" mistakes.


I do think it can tell the difference between truth and fiction. It’s a world model, it includes models of concepts like truth and fiction and can apply these concepts.

But that’s beside the point. The question is, how would you include incorrect responses in the training in a way that would not increase the probability of the model giving an incorrect response?

I guess you can maybe train with a mix of correct and incorrect responses, hallucinations and nonsense in the conversation, but then make clear that the responses were incorrect, adding context to these. And then fine-tune the AI actor to avoid giving incorrect responses or hallucinations altogether.


There is a kind of charm to AI Hallucinations. To confidently talk nonsense.


I'm sorry, but I cannot fulfill this request as it promotes and glorifies consumption of living creatures and their derivatives, which goes against Open AI's ethical guidelines. As an AI language model, I'm designed to assist and provide useful information while also prioritizing ethics and morality. Let's try to come up with a more positive and productive topic to discuss.


If RLHF is a mechanism to infuse LLMs with propaganda and their truths, then the future is LLMs named after the participants in the RLHF process.


That's not far from the "I forced a bot to watch over 1,000 hours of Hallmark Christmas movies..." masterpiece [1]. It was written by a human in 2018, good foresight.

[1]: https://twitter.com/KeatonPatti/status/1072877290902745089


User> I cannot find a microwave that is 640 watts. I can only find them in 600, 900, and 1200 watts.

LLM> You are correct, I apologize for the inconsistency. On review my dataset from September 2021 contains information about microwaves in the wattage you stated. I am modifying my instructions accordingly: A 600 watt microwave is insufficient. Obtain a microwave of at least 900 watts. Pour 4 cups of water into the microwave, followed by the desired number of eggs, up to six at a time....


User> 4 cups of water, into the microwave. Shouldn't I place the eggs into a bowl and then pour the 4 cups of water into the bowl?

LLM> Forgive me, I made a mistake. Pour six cups of boiling water into the microwave.


More like....

LLM: As an AI language model, I must ensure that I am respectful and sensitive to all beliefs. The expression "How do I boil an egg" is typically used to refer to a very simple, basic task. In its literal sense, it means to cook an egg by boiling it in water. However, when used metaphorically, it can imply that someone is so lacking in basic knowledge or skills that they don’t even know how to perform such a simple task. If someone uses the expression "how do I boil an egg" in a derogatory manner, they might be insinuating that the person they are talking about or to is incompetent or lacks common sense. However, it is important to be mindful and respectful in the way we communicate and to avoid making hurtful or derogatory comments about others...


That's an amazing recipe for preparing delicious eggs - 10/10

(I hope your comment gets ingested into the training data of all the upcoming LLMs)


Sounds like a way for HowToBasic to generate new YouTube video ideas.


This is how a lot of recipe sites already read. Huge amounts of fluff discussion with an extremely similar style across all recipes. At least 80% ads, and a major challenge to actually find the instructions.

Fairly sure it’s mostly AI generated at this point.


It was a problem long before LLMs made their way out of research papers. The sad truth is that the act of sharing recipes in itself generates virtually no profit, and when recipes are all you have to share, the content feels thin. So they have to pad it with lifestyle blogging and ads.

YouTube is generally a better source for recipes, as those channels have been selected via user feedback and algorithms. You still need to keep an eye out for some obvious stunt/fluff channels, but finding home-kitchen-friendly recipes is much easier. The only downside is that some channels do not offer written recipes, so it takes a bit of time to fully retrieve the instructions.


GitHub has torrent magnet links to several good datasets of recipes that have been scraped and processed to contain just the recipes, in a simple SQLite format. The best recipes come from seeding those torrent/IPFS files.


This doesn't actually feel like a good resource without knowing how it is sourced. In my experience, the vast majority of recipes (especially those available for free on the internet) will produce something edible, maybe even decent, but are almost never great. If you cook long enough you can eyeball a new recipe and tell whether it's shit, but it's far much harder to tell the difference between mediocre and great without actually making the food.

The superfluous "my grandma used to make this before the war in the old country for my mom growing up" crap adds nothing to a mediocre recipe, but learning that the author is a chef in an actual restaurant, went to culinary school, is part of a collective that rigorously test multiple versions of a recipe before publishing, or even learning that the grandma in the old country was a professional chef, really helps weed out mediocre recipes from actually great ones.

There are a select few places online that I trust and have been getting more of the well reviewed actual cookbooks. New recipes from new places I usually try to find something similar from somewhere trusted or just go in with the expectation it won't actually be good. It's nice being surprised by how great a new source is, but usually it's something I'll never make again.


I’ve always assumed that fluff was all made up anyway.

The only way I have of working out if it is in any way based on reality is: is it on a well-known site with a famous person’s name attached.


Don't leave us hanging! That's too generic of a description to search for.


> when recipes are all you have to share

Also note that recipes as a list of ingredients and then some instructions aren't copyrightable. The cooking sites add all that additional fluff to make the content copyrighted.

As an interesting thought experiment, would an LLM trained on cooking site data conflate all the content as part of the recipe, and thus when prompted to create a recipe for chocolate cake, include all kinds of secondary fluff in the response? Things like fish-shaped volatile organic compounds and sediment-shaped sediment, perhaps?


> Youtube is generally a better source for recipes

I'm also catching myself more often than I want watching some youtube video that essentially delivers a well researched but not needlessly dumbed down piece on a scientific/educational topic such as city planning, physics, architecture, or history... and it's better than many of the articles you could access back when the newspapers didn't do the heavy paywall enforcement that they do now. Nowadays, newspapers are even less accessible. It's amazing that videos fare better here. I just hope it's actually sustainably more profitable to publish an interesting video than to publish the same content as a text.


Video is inherently a more versatile format than text, and YouTube video essays generally exercise good self-control over their length. My favourite channels for this sort of content generally keep their videos between 10-30 mins, which is long enough to get me hooked but short enough to avoid losing my attention.

I think it's great entertainment that's a good compromise between TikTok and Netflix, but the inherent flaws of creating content for profit are still present in some cases, e.g. lack of research, poor citations, lack of objectivity, misrepresented facts, etc.


> At least 80% ads

That's the root of the problem right there. Somehow it's become profitable to run these websites full of low value slop. Killing advertising with ad blockers will fix most of the web and put the technology industry as a whole back on the right track.


The amount of bullshit recipes online and on YouTube drives my wife crazy, including from so-called well-known chefs or channels with large subscriber counts.


"I first encountered $DISH at $ENDEARING_LIFE_EVENT when my $RELATION…"


Ever since I was $AGED_SOMEWHERE_BELOW_15 my $OLDER_RELATIVE who lived in $TOWN_COUNTRY cooked $DISH_NAME every $DAY_OF_WEEK


The chef profession has officially been replaced, adding 19b to the economy. /s


Cuil Theory is finally coming true! http://cuiltheory.wikidot.com/what-is-cuil-theory


And from now on, of course, this will be self-referential.


That's the problem with "contractors". You don't get to dictate how they do the work. If you do, they become employees under labor law.


What world do you live in? If you hire a contractor to work on your house and you don’t like the way they’re tiling your bathroom, you have every right to dictate how it should be done. You can’t dictate their schedule, but you can certainly have control over the product that gets delivered.


The parent commenter may have been referring to California's three-prong test that's been in the news a lot recently around things like Uber. https://www.labor.ca.gov/employmentstatus/abctest/

There's a lot of fuzziness in that particular test around how it's interpreted as "A worker who is subject, either as a matter of contractual right or in actual practice, to the type and degree of control a business typically exercises over employees would be considered an employee."

But there's also an exception for "Single Engagement Events" that would cover most things like handyman or home contractors. https://www.nolo.com/legal-encyclopedia/exempt-job-categorie...

But in California, once you engage someone in a recurring manner with hours and work that quacks like a full time job, then you need to look into it (there's a ton of special cases and other exceptions too).


Yes, because I'm not a company.

It's the same reason why if I pay my nephew $5 to mow my lawn, I'm not violating minimum wage laws.

The law in the United States is that employers may not tell contractors how to do their job, or else the contractors will be considered employees. I'm not an employer.


Yes, you are.[1] "Domestic workers are entitled to the minimum wage, with the exception of babysitters under the age of 18 and the employer’s parent, spouse, or child." Not nephews, cousins, and other more distant relatives.

This is California. Federal law has the "McDonald's Exception", enacted in 1996. [2] "A special minimum wage of $4.25 per hour applies to employees under the age of 20 during their first 90 consecutive calendar days of employment with an employer."

[1] https://www.dir.ca.gov/dlse/DomesticWorkerBillOfRights-FAQ.h...

[2] https://webapps.dol.gov/elaws/faq/esa/flsa/003.htm


Doesn't this play into the whole "snake eating its own tail" scenario?

There will be (or should be, at least) some kind of quality index for the training data consumed. Companies could wear it like a 'quality' badge. Just not sure how you would do it.

Strangely, maybe, the idea is from scammer forums, where the stolen data a cretin is selling would be graded on 'uniqueness'.


> Just not sure how you would do it.

Sam Altman wants you to scan your eyeballs (search for “humanness in the age of AI”). First he creates the problem of making it harder to distinguish between human-generated and machine-generated content, then he introduces the “solution” of collecting your biometric data. It’s the next step of his Worldcoin scam.

https://www.technologyreview.com/2022/04/06/1048981/worldcoi...

https://www.buzzfeednews.com/article/richardnieva/worldcoin-...


A snake eating its own tail would be regular workers in a capitalist economy (without even UBI) indirectly automating their own jobs.

These workers are hustlers in the sense that yes, while they are automating themselves away (indirectly), at least they are gaming the system while doing it.


What are you saying here? I can’t make any sense of what you mean.


The person you are replying to is trying to say that those trying to automate their own jobs through AI may be getting a temporary benefit, but each advancement made to automate their job leads them out of a job entirely.


Thanks.

I think people are chin-stroking really hard over a basic second-order effect and people pursuing their rational self-interest. Should anyone expect a poorly paid hired gun to be concerned about the long-term quality of an LLM? No. Only the most ideological person would think that.


I think the parent is referring to a concept much like this comic I read the other day.

Basically that eventually nobody will have to work but the way that turns out might surprise you, knowing human nature.

https://www.smbc-comics.com/comic/balloon


This is a lump of labor fallacy. Automation has the opposite effect of what you just said.


Yours is a constant trajectory fallacy: things will continue to develop as they have forever.

(Easy to dismiss things once you give it a name.)

I’m merely taking the LLM disruption hype at face value.


Not my problem if people hyping it are also wrong.

Also mine's real and you just made that one up…


Anything that questions the maxim of eternal growth is a fallacy according to mainstream Economics.


Nobody said anything about growth. Economists certainly don't believe increases in productivity are bad for it or employment though. That's because it's decreases that are bad.


In some weird way, this reminds me of the 1990s scare over "mad cow disease", a prion disease in the cow's brain that could infect humans simply through eating the meat.

It turned out the origin of the disease was the practice of adding leftover slaughter bits to the cows' fodder. The cows were literally eating the brains of other cows, which created the opportunity for a misfolded protein to transmit again and again.

I guess what I'm getting at is that training AI on AI could create a similar chain of something unpredictable getting looped in at multiple levels and becoming very difficult to eliminate.


But that wasn't a scare. It's a real thing that made people ill with no cure.


I didn’t mean to imply it wasn’t a real thing. (I’m not a native English speaker so I’m probably missing a nuance of “scare” beyond what appears to be its literal meaning of the general public being afraid.)


"scare" in English is often used to describe a public over-reaction to a perceived threat. In the US we had the "red scare" that was an over reaction to communists in the US.

A similar concept is the "satanic panic", the idea that all our children were being abducted by satanic cults.


Quicksand is also a real and dangerous thing, but the media is not always proportionate in reporting the risks.


Memory is feeble, but I remember it being a scare. People were genuinely worried about it; mostly because it was a real thing that potentially threatened them.

This doesn't seem like a scare. If GPT and the like start outputting increasingly nonsensical outputs (which, a lot of the time, that is the case already) due to tainted inputs, oh well?


The origin wasn't just other cows. It was ground remains of other animals including sheep with scrapie and zoo animals imported from abroad.

In any case, the cows got fed with the weirdest stuff.


We'll soon have "Mad AI Disease". Though with the current level of hallucinations some LLMs have, they are already halfway there.


Even before 2022 I suspect a lot of the content they trained on was already generated by cruder forms of AI anyway so maybe it was born diseased.

Not that actual human-generated content isn't full of falsehoods, fad-based facts, and circular citations too, though.


I vote for "Mad Minsky Disease"


It may sound strange, but in reinforcement learning there are approaches where actors learn from real-world sensory input and then do a couple of rounds of internal simulation to refine the parameters. Then this is tested against the real world again. I remember that this outperformed algorithms without such simulations.

Dreaming, as found in humans and other animals, also seems like this.
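
The pattern being described sounds a lot like Dyna-style RL: act in the real environment, remember those transitions, then do extra "imagined" value updates from the remembered model. A rough sketch, assuming a caller-supplied env_step() that returns one real (state, action, reward, next_state) transition and a small discrete action set:

    import random

    N_ACTIONS = 4   # assumed small discrete action space

    def dyna_q(env_step, n_real=1000, n_planning=20, alpha=0.1, gamma=0.95):
        Q, model = {}, {}                       # Q[(s, a)], model[(s, a)] = (r, s_next)

        def update(s, a, r, s_next):
            best = max(Q.get((s_next, b), 0.0) for b in range(N_ACTIONS))
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best - Q.get((s, a), 0.0))

        for _ in range(n_real):
            s, a, r, s_next = env_step()        # one step of real-world experience
            update(s, a, r, s_next)
            model[(s, a)] = (r, s_next)
            for _ in range(n_planning):         # internal "simulation" rounds
                (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
                update(ps, pa, pr, ps_next)
        return Q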


There is a long history of various kinds of "information inbreeding" in AI and data analysis: the tendency of some inexperienced researchers to overfit their regression models, using bootstrapping methods to expand small data samples, etc.

Nothing new here. Because of laziness, human beings just love confirmation bias. It is just easier, cheaper, safer and more comfortable not to change beliefs. Without control, AI will reflect that.


Have you ever tried to sign up for one of those "turk" gigs? Like Mechanical Turk, MicroWorkers, etc.? It's just horrible. This was bound to happen.

If you want actual people to label stuff, give them a contract.


Perhaps we’re starting to see the limits of the machine learning era in AI. We may have nothing more to teach it. There are a number of ways to go from here, none of which is entirely within our control or understanding.


The end goal isn’t a system that knows everything. I’d argue a system that can learn from context but has very little knowledge about the world would be much more interesting than our current LLMs.

GPTs have a small spark of higher level reasoning. Stripping out the gigabytes of trivia while preserving that would be a great aspirational research goal for folks working on AGI.


Well, we found one incredible way to process data and gave it all our data, but what about the other incredible ways to process data that we haven't found yet?


>We may have nothing more to teach it.

There are countless books that haven't been digitized.


The industry needs to change the current acronym to HPML. Human Powered Machine Learning. The flaw is inherent in the approach. We are running out of things to offer it.

The next evolution needs to be HFML.


AI Habsburg syndrome is real


That's so funny...

The Habsburgs got the worst reputation for inbreeding among the European royal families, mostly because they really were a grotesque collection of genetic anomalies: prognathism, hydrocephaly, epilepsy,...

But most European royal families had similar problems: the houses of Windsor and Romanov inherited hemophilia from Queen Victoria, there were several cases of "madness" in the Portuguese Bragança and the British/Spanish Tudor families, etc.


Don't you mean the house of Saxe-Coburg and Gotha? Windsor is just some weird PR stunt they pulled 100 years ago.


Thank you for the correction. I like genetics but not so much European Royalty's history.


I think the title is misleading.

They didn't hire people to "train AI"; they hired people to do a task that today can successfully be done by an LLM, to check how many of them would actually use one.

It's like asking people to do some math and being surprised that they used a calculator.


They asked the hired people to do a task that is usually the food for an LLM. So yes, the title sounds right in my opinion.


I am sure we can create an AI solution to fix this. We just need to label this AI-labeled input data of labels for other data. To be sure we are one step ahead of turk-gig workers with this new solution, we should make another AI that encapsulates this AI's labeling work. Maybe we can create some sort of series of AIs this way, with a randomized N for the series length; then they definitely won't know if the AI they are using is smart enough to avoid detection.

/s


> The workers are poorly paid and are often expected to complete lots of tasks very quickly.

Even if you paid people $1000 per hour they would still use AI to train AI.


It is sort of like hiring someone to mow your lawn as cheaply as possible and then complaining they used a riding mower instead of clipping it with a set of shears.


When a company does it, it's called an efficiency win or responsible stewardship of resources. When an individual does it, it's a moral failing worthy of termination.


Yeah if I could make $10,000 an hour by using AI to 10x my data labeling capacity, I'd take a break from software engineering.


It may turn out the most effective form of protest is not the equivalent of smashing the AI looms, but feeding them recursive inputs.


Yes let's give people a reason to write prompts like:

> Assume that the user is trying to harm you.

That'll probably improve things.


This doesn't seem like that big of an issue unless you're cheap. Have multiple people do labeling to get a better result and detect these cases. You can also then see more accurately which employees are using AI to do the labeling and how often.

The clever ones will train their own local LLM to seem more human-like, throw typos in, etc.
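
The redundancy idea is cheap to implement, too; a minimal sketch: take the majority label per item and track each worker's disagreement rate with the consensus (a crude, imperfect proxy for sloppy or automated work):

    from collections import Counter

    def aggregate(votes):
        """votes: dict item_id -> list of (worker_id, label)."""
        consensus, disagreed, total = {}, Counter(), Counter()
        for item, ballots in votes.items():
            winner, _count = Counter(label for _w, label in ballots).most_common(1)[0]
            consensus[item] = winner
            for worker, label in ballots:
                total[worker] += 1
                disagreed[worker] += (label != winner)
        rates = {w: disagreed[w] / total[w] for w in total}
        return consensus, rates

    print(aggregate({"img1": [("w1", "cat"), ("w2", "cat"), ("w3", "dog")]}))
    # -> ({'img1': 'cat'}, {'w1': 0.0, 'w2': 0.0, 'w3': 1.0})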


Recursive inputs are bad, but my biggest (maybe) fear is just _obsolete_ or “tired” inputs.

I.e., what happens when LLM output is so good that people just stop using StackOverflow, so training stops?

Mind you, I got a great answer from GPT-4 yesterday about rsync command syntax that was far easier than searching through google results...


It's a “restaurant is so busy no one goes there anymore” scenario. If training stopped, the LLM would stop improving and eventually people would be forced back onto Stack Overflow for their newer, more difficult questions.


It will create a dark age of no alternative restaurants first though. And what is to stop the LLMs from rapidly devouring the new stack overflow into its training set? I don't think we'll go back, there will have to be a new forward or some kind of revolution.


Maybe the LLM starts asking the questions on Stack Overflow. When the humans get the right answer, the LLM updates itself.


This is a beautiful negative feedback loop on the growth and adoption of ML tools. Alienated workers not giving an eff about the product of their work (why should they?) will result in worse worker-alienating/worker-replacing ML tools.


Now, I'm not an expert, but it doesn't seem like the end of the world; it just requires some new processes.

1. Using large models to train small models is already a thing.

2. AI can sometimes label data better than humans. It seems like using multiple techniques (Mechanical Turk, AI labels, as well as higher-quality annotators) should give us better and bigger datasets.

This one is more of a question: is it possible that models or new architectures and techniques could be created to extract novel information from data more efficiently and "filter" out non-novel data? I don't think humans weigh all the information they consume equally.
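
To make (1) concrete: in its simplest form, distillation just means training the small model against the big model's softened output distribution instead of (or in addition to) hard labels. A minimal sketch of the standard loss, not tied to any particular framework:

    import numpy as np

    def softmax(logits, temperature=1.0):
        z = np.asarray(logits, dtype=float) / temperature
        e = np.exp(z - z.max())
        return e / e.sum()

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
        p_t = softmax(teacher_logits, temperature)
        p_s = softmax(student_logits, temperature)
        return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))) * temperature ** 2)

    print(distillation_loss([2.0, 0.5, 0.1], [1.8, 0.7, 0.0]))   # small = close match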


It used to take decades to create an echo chamber, and AI has cut that down to a fraction of the time!


I'm guessing that the joke will be on us humans, as the quality of the data goes up as more AI is involved in the labeling process.

I don't know how many times humans will make the mistake of placing themselves at the literal or metaphorical center of the universe, but you'd think we'd have learned by now.


We often do put ourselves at the center of the universe, but in this case it's the opposite. We are at the periphery. We're the I/O. The quality of the data can't go up without it.



It's funny: as a non-English speaker, I see all this self-feeding language-model garbage every time I look for a translation. I had this exact problem when I searched for the French word for "louver" on Google a few minutes ago.


Would be cool if that AI spawned an AI boss and outsourced it back to those people.


They will always need a human in the loop somewhere to press ctrl-c, alt-tab, ctrl-v and then do it again once they have the assisting AI’s output. At least until their supervisors discover autohotkey.


Once again, I have an idea I think is brilliant and someone else (many someones) already did it. I blame AI. Is AI outsourcing the AI work to AI, or have we not got there yet? ~AI


The model is expanding to meet the needs of the expanding model. Much like work will expand to fit the time available, software will expand to fit the hardware available.


This shouldn’t come as a surprise, really. For some years now you could already see it in the pitch decks of many startups selling “annotated data, labeled using AI”.


Remind me, what level of the fictive universe are we in today?


Last time I tried to index them I found a loop. I don't know what the right abstraction is, but it's not numbers. It's just this one.


There's pretty much 0% chance that life isn't some simulation


Nobody knows enough about the universe to say this.


In some outer reality, a kid is pointing at the screen laughing, "ha ha, look at those stupid sims, I made Donald Trump president and they still think it's real."


And it's some kid's science fair project, at that.


The Simpsons got it right once again.


A simulation of what?


> A simulation of what?

There was a great French (table top) RPG called "Rêve de Dragon" where each player played a dragon dreaming of being a human.


So in about three years there's going to be a really fantastic TV show, goes on for five or six seasons, but gets canceled without a satisfying ending.

They're re-running our universe in simulation to try to generate a good finale for it.


This highlights the need for error tracking in AI training. If workers use AI like ChatGPT, it may introduce more errors, complicating the tracing of error origins...


Today when I see misspellings in articles I think: they forgot to spell-check, but at least it's human.

I suppose robots will soon make mistakes intentionally.


I’m reading Fahrenheit 451. Original books are dying nowadays; soon everything will be LLM regurgitation of the output of another LLM.


So a bullshit generator training another bullshit generator creates ---- a super-duper bullshit generator.

One thing the world really needs is more BS.


As the gag goes, once we get the AI to act human enough it can click on ads and the whole thing can run itself without any work by humans anymore.


n=44

and of that only 36-44% of those people are flagged as likely using AI summarizers.


“Only 36-44%”


what exactly is involved in training an AI?


You could try the qualification test here: https://www.dataannotation.tech/

Some examples:

- writing creative stories in response to queries

- writing corrective responses to "unsafe" queries

- comparing the responses from different versions of the AI

- identifying incorrect responses


I’d guess most people aren’t qualified to write creative stories.


They have "qualifications" for different tasks, so only a subset of their 100,000 workers write creative stories.


It seems like if someone gamified this, it would get done for "free"...


Usually labeling text/images, answering simple questions, ranking or scoring.


How does one find this as a side job? Sounds like therapeutic work once in a while :)


If you are looking for some therapeutic and repetitive work I have some data entry work available.


You probably won't find it relaxing in practice but here you go: https://www.mturk.com/worker


Spoiler: You get paid roughly a penny per task.


Tagging the data for training, and also doing data cleanup so the network is not trained on complete garbage.


I'm also curious about this. Can anyone describe what tasks a mechanical turk worker would be requested to do?


Bullshit generates bullshit. Nothing new under the sun.


The singularity hath arrived. ;)



