It was a problem long before LLMs made their way out of research papers. The sad truth is that sharing recipes in itself generates virtually no profit, and when recipes are all you have to share, the content feels thin. So recipe sites have to pad it with lifestyle blogging and ads.
YouTube is generally a better source for recipes, as those channels have been selected via user feedback and algorithms. You still need to keep an eye out for some obvious stunt/fluff channels, but finding home-kitchen-friendly recipes is much easier. The only downside is that some channels do not offer written recipes, so it takes a bit of time to fully retrieve the instructions.
GitHub has torrent magnet links to several good datasets of recipes that have been scraped and processed to contain only the recipes, in a simple SQLite format. The best recipes come from seeding those torrent / IPFS files.
This doesn't actually feel like a good resource without knowing how it is sourced. In my experience, the vast majority of recipes (especially those available for free on the internet) will produce something edible, maybe even decent, but are almost never great. If you cook long enough you can eyeball a new recipe and tell whether it's shit, but it's much harder to tell the difference between mediocre and great without actually making the food.
The superfluous "my grandma used to make this before the war in the old country for my mom growing up" crap adds nothing to a mediocre recipe, but learning that the author is a chef in an actual restaurant, went to culinary school, is part of a collective that rigorously tests multiple versions of a recipe before publishing, or even learning that the grandma in the old country was a professional chef, really helps weed out mediocre recipes from actually great ones.
There are a select few places online that I trust, and I have been getting more of the well-reviewed actual cookbooks. For new recipes from new places, I usually try to find something similar from somewhere trusted, or just go in with the expectation that it won't actually be good. It's nice being surprised by how great a new source is, but usually it's something I'll never make again.
Also note that recipes, as a list of ingredients followed by some instructions, aren't copyrightable. The cooking sites add all that additional fluff to make the content copyrightable.
As an interesting thought experiment, would an LLM trained on cooking site data conflate all the content as part of the recipe, and thus when prompted to create a recipe for chocolate cake, include all kinds of secondary fluff in the response? Things like fish-shaped volatile organic compounds and sediment-shaped sediment, perhaps?
> Youtube is generally a better source for recipes
I'm also catching myself, more often than I'd like, watching some YouTube video that essentially delivers a well-researched but not needlessly dumbed-down piece on a scientific/educational topic such as city planning, physics, architecture, or history... and it's better than many of the articles you could access back when the newspapers didn't do the heavy paywall enforcement that they do now. Nowadays, newspapers are even less accessible. It's amazing that videos fare better here. I just hope it's actually sustainably more profitable to publish an interesting video than to publish the same content as text.
Video is inherently a more versatile format than text, and YouTube video essays are generally disciplined about their own length. My favourite channels for this sort of content generally keep their videos between 10 and 30 minutes, which is long enough to get me hooked but short enough to avoid losing my attention.
I think it's great entertainment that's a good compromise between TikTok and Netflix, but the inherent flaws of creating content for profit are still present in some cases, e.g. lack of research, poor citations, lack of objectivity, misrepresented facts, etc.