Hacker News
Storm: LLM system that researches a topic and generates full-length wiki article (github.com/stanford-oval)
123 points by GavCo 8 months ago | 97 comments



I can see this being useful if the content is generated on demand and then discarded.

Publishing AI-generated material is, generally speaking, a horrible idea and does nobody any good (at least until accuracy levels get much, much better).

Even if they do it well and truthfully (which they don't), current LLMs can only summarize, digest, and restate. There is no non-transient value add. LLMs may have a place to help query, but there is no reason to publish LLM regurgitations alongside the ground truth used to generate them.


I think bootstrapping documentation with LLM output is a great practice. It's a wiki; people can update it from a baseline, just as long as they can see what was LLM-generated to know that it shouldn't be taken as absolute truth.

The hardest part of good documentation is getting started. Once there are docs in place it's usually much easier to revise and correct than it would have been to write correctly by hand the first time. Think of it like automating a rough draft.


Maybe the generated text could be a slightly different colour until it's verified. But you'd have to make sure there's no easy way of verifying everything mindlessly without having read it.


Technique I've found helpful personally: get the LLM to generate text in small chunks (e.g. a paragraph at a time). After generating each chunk, it is immediately reviewed by a human, who can edit it manually to correct any mistakes, ask the LLM to try again, or prompt the LLM to make specified changes. When the human is satisfied with that chunk, it is saved, and we move on to the next one.
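A rough sketch of that loop (complete() stands in for whatever LLM API you're using; the names here are purely illustrative):

    def review_loop(outline, complete):
        """Generate a document chunk by chunk, with a human approving each one."""
        approved = []
        for section in outline:
            draft = complete(f"Write one paragraph covering: {section}")
            while True:
                print(draft)
                action = input("[a]ccept / [e]dit / [r]etry / give [f]eedback? ")
                if action == "a":          # human is satisfied: save and move on
                    approved.append(draft)
                    break
                elif action == "e":        # human fixes the mistakes by hand
                    draft = input("Corrected text: ")
                elif action == "r":        # ask the LLM to try again from scratch
                    draft = complete(f"Write one paragraph covering: {section}")
                elif action == "f":        # prompt the LLM to make specified changes
                    feedback = input("What should change? ")
                    draft = complete(f"Rewrite this paragraph. Feedback: {feedback}\n\n{draft}")
        return "\n\n".join(approved)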

Sometimes, the output the LLM generates is correct and I'm just approving it. Other times, it is mostly right, and I can easily identify and correct its errors. Yet other times, it is totally wrong, but often typing out why it is wrong is a good start to actually generating correct text. Often (but not always), the kinds of false assumptions which LLMs make are similar to those a human reader would make, so stuff like "A and B sound very similar but, in the context of this system, actually have completely different meanings" is a useful addition to documentation anyway.

The worst-case scenario is that the LLM generates something which is subtly wrong, and the human review fails to pick up on the subtle error. But, that's something which can happen even with no LLMs involved at all. It isn't uncommon for people to make subtle errors in documents they write (often because they are misremembering something) and for those subtle errors not to be picked up during the review process. I'm not convinced the odds of this happening with an LLM-assisted workflow are significantly greater than with a purely human workflow.


Unfortunately you're just setting the stage for subtly incorrect data.


> current LLMs can only summarize, digest, and restate. There is no non-transient value add.

Though, at a stretch, Wikipedia itself could be considered based around summarization, digesting, and restating/citing things said elsewhere, given its policy of verifiability: "Even if you are sure something is true, it must have been previously published in a reliable source before you can add it." Now, LLMs aren't well known for their citation skills, to be fair.. :-)


I remember one time, I wanted to update a Wikipedia article with some more recent developments, but was having trouble working out the best way of wording them. I found a newspaper article discussing those developments, so I scribbled down some notes summarising the article's content, and then asked ChatGPT to reword them for me more eloquently. And its output was good enough for me to paste in to the Wikipedia article with only minor adjustments, along with a cite to the newspaper article. I'm sure I'm not the only person to have done something like that.


Yeah, when AIs can comprehensively cite their sources I might change my opinion on that.

Though note that there still isn't any need to publish static content. The power of LLMs is that they can be dynamic and responsive!

Even if we hypothesize that it were possible for a LLM to write a high-quality wikipedia-like output, generating the whole thing statically in advance like existing Wikipedia would be relatively pointless. It'd be much more interesting to generate arbitrary (and infinite!) pages on demand.


Excellent virtue signalling here -- however, commercial publishers, competitive attorneys, advertising sales teams and others are falling over themselves to do exactly what you (politely) advise against.

This moment reminds me very much of the late 1990s, when it was common knowledge that "claim jumping" a domain name was very rude and not advisable, or the common knowledge among intellectuals that "ads will ruin the Internet"... yes, polite people did not build companies to claim-jump domain name registrations, or push annoying and repetitive ads on the Internet...

but..


I would love for something like this to be attached to LibGen, where it could read the millions of scientific papers. In my opinion, human knowledge today is more than what a group of people can handle, let alone individuals. There is a lot of domain-specific knowledge that would translate to and be useful in other domains, but unless, by chance, a human with a speciality in both domains sees it, it will not get ported or assimilated into the second domain.


How could it be true that the content generated would have value only if it is not persisted?

If it doesn’t have value for being saved and published, why would it have value for the person viewing it ephemerally?


Obviously LLMs aren't deterministic, but they generally can be modeled as a simple, stateless function: given text, generate text. Depending on your views towards wasting energy, why store something that can be "trivially" re-generated?

One reason to NOT want to persist an LLM output is to avoid contaminating the internet with LLM text that could be confused for human text. This is useful for historical reasons (what did people say/think) and for future LLM training purposes.

In the future, there might be other reasons not to persist LLM-generated text. If you store the inputs instead of the outputs, then it can be "replayed" with a new LLM. I can generate some text via an LLM running on my phone, but if I later wanted a higher-quality output, it'd be easy to re-generate it on my laptop (or the cloud, obviously). One related idea that comes to mind is personalization - letting the LLM use language each reader is familiar with, or focusing on different bits of info for different people. Maybe if an LLM knows I have a PhD in a certain subject, it'd summarize a research paper differently than it'd summarize it for my neighbor who has different preexisting knowledge.
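As a toy illustration of that "store the inputs, not the outputs" idea (generate() and the model names are placeholders, not any particular API):

    from dataclasses import dataclass

    @dataclass
    class SavedRequest:
        prompt: str         # the input we keep around
        model: str          # model used at generation time
        temperature: float  # sampling settings, so the run can be reproduced or upgraded

    def replay(saved: SavedRequest, generate, better_model: str) -> str:
        """Re-run an old request on a newer/stronger model instead of reusing stale output."""
        return generate(saved.prompt, model=better_model, temperature=saved.temperature)

    # Generated on the phone today, replayed on a bigger model later:
    req = SavedRequest("Summarize this paper for a layperson: ...", "phone-7b", 0.2)
    # summary_v2 = replay(req, generate, better_model="cloud-70b")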


If you set the temperature to zero, LLMs are deterministic.


This is categorically untrue. Publishing material generated like this is going to be generally better than human generated content. It takes less time, can be systematically tested and rigorous, and you can specifically avoid the pitfalls of bias and prejudice.

A system like this is multilayered, with prompts going through the whole problem solving process, considering the information presented, assuring quality and factuality, assigning the necessary citations and documentation for claims.

Accuracy isn't a problem. The way in which AI is used creates the problem - ChatGPT and most chat based models are single pass, query/response type interactions with models. Sometimes you get a second pass with a moderation system, doing a review to ensure offensive or illegal things get filtered out. Without any additional testing and prompt engineering, you're going to run into hallucinations, inefficient formulations, random "technically correct but not very useful" generations, and so forth. Raw ChatGPT content shouldn't be published without significant editing and going through the same quality review process any human written text should go through.

What Storm accomplishes is an algorithmic and methodical series of problem solving steps, each of which can be tested and verified and validated. This is synthesized in a particular way, intended as a factual reference article. Presumably you could insert debiasing and checks for narrative or political statements, ensuring attribution and citation occur for quotations, and rephrasing anything generated by the AI as a neutral, academic statement of fact with no stylistic and artistic features.

This is significantly different from the almost superficial interactions you get with chatbots, unless you specifically engineer your prompts and cycle through similar problem solving methods.

Tasks like this are well within the value add domain of current AI capabilities.

Compared to the absolute trash of SEO-optimized blog posts, the agenda-driven, ulterior-motive-laden rants and rambles on social media, and the "I'm oh-so-cleverly influencing the narrative" articles posted to Wikipedia by humans, content like this is a clear winner in quality, in my opinion.

AI isn't at the point where it's going to spit out well grounded novel answers to things like "what's the cure for cancer?" but it can absolutely produce a principled and legible explanation of a phenomenon or collection of facts about a thing.


> Publishing material generated like this is going to be generally better than human generated content.

I think the opposite. Publishing material generated like this is going to dilute our knowledge base and result in worse content.

But we're both speculating. Only time will tell. I hope that you're right and I'm wrong.


Are LLMs able to look at a list of categories, read content, and then determine which of the categories apply?


This could be achieved by generating embeddings of suitable representations of the categories once, and then embedding the content at runtime, before using some distance metric to find matching categories for the content embedding.
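Something like this, assuming you have an embed() function from whatever embedding model you prefer (the helper names and the threshold are just illustrative):

    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def match_categories(content, categories, embed, threshold=0.3):
        """categories maps a category name to a short description of it.
        Category embeddings could be computed once and cached; the content
        is embedded at runtime and compared with cosine similarity."""
        content_vec = embed(content)
        scores = {name: cosine(content_vec, embed(desc)) for name, desc in categories.items()}
        return sorted((n for n, s in scores.items() if s >= threshold), key=lambda n: -scores[n])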


This is a very broad question, but in short, yes, they can do this. It depends on the granularity and overlap of those categories.
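For coarse, non-overlapping categories you can also just ask the model directly (a sketch; complete() is whatever chat API you use):

    def classify(content, categories, complete):
        """Ask the model which categories apply; drop anything it makes up."""
        prompt = (
            "Which of the following categories apply to the text below? "
            "Answer with a comma-separated list drawn only from these options, or 'none'.\n"
            f"Categories: {', '.join(categories)}\n\nText:\n{content}"
        )
        answer = complete(prompt)
        chosen = [c.strip() for c in answer.split(",")]
        return [c for c in chosen if c in categories]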


Absolutely


> current LLMs can only summarize, digest, and restate. There is no non-transient value add.

No, you're wrong. LLMs create new experiences after deployment, either by assisting humans, or by solving tasks they can validate, such as code or game play. In fact, any deployed LLM gets to be embedded in a larger system - a chat room, a code-running environment, a game, a simulation, a robot, or a company - and it can learn from iterative tasks because each following iteration carries some kind of real-world feedback.

Besides that, LLMs trivially learn new concepts and even new skills with a short explanation or demonstration; they can be pulled out of their training distribution and collect experiences doing new things. If OpenAI has 100M users and they consume 10K tokens/user/month, that makes for 1 trillion tokens of human-AI interaction rich with new experiences and feedback.

In the text modality, LLMs have consumed most of the high-quality human text; that is why all SOTA models are roughly on par - they trained on the same data. That means the easy time is over: AI has caught up with all human language data. But from now on, AI models need to create experiences of their own, because learning from your own mistakes is much faster. The more they get used, the more feedback and new information they collect. The environment is the teacher; not everything is written in books.

And all that text - the trillions of tokens they are going to speak to us - in turn contributes to scientific discoveries and progress, and percolates back into the next training set. LLMs have a massive impact on people at the language level, and so, by extension, on the physical world and culture. They have already influenced language and the arts.

LLMs can create new experiences, learn new skills, and have a significant impact through widespread deployment and interaction. There is "value add" if you look at the grand picture.


https://twitter.com/itsandrewgao/status/1689634145717379074?...

Yeah this is really having a positive impact on scientific discovery.


Your link only shows what unscrupulous people would do.

Here is an LLM with search doing competitive-level coding:

https://deepmind.google/discover/blog/competitive-programmin...

and in general, applying evolutionary methods on top:

https://scholar.google.com/scholar?cites=1264402983985539857...

The explanation is simple - either learn from past experience, which is human text for now, or learn from present-time experience, which comes from the environment. The environment is where LLMs can do novel things.


I looked into this to see where it was getting new information, and as far as I can tell, it is searching wikipedia exclusively. Useful for sure, but not exactly what I was expecting based on the title.


That gives me an idea.

There are Wikipedias in other languages - maybe this framework could be adapted to translate the search terms, fetch multilingual sources, translate them back, and use those as comparisons.

I've found a lot of stuff out through similar by-hand techniques that would be difficult to discover with English search. I'd be curious to see how much differential there is between accounts across language barriers.


As a base for researching the idea, Wikipedia seems like a decent data source.

For broader implementation you would want to develop the approach further. The idea of sampling other-language Wikipedia mentioned in a sibling comment seems to be a decent next step.

Extending it to bring in wider sources would be another step. I doubt it would be infallible, but it would be really interesting to see how it compares to humans performing the same task. Especially if there were an additional ability to verify written articles and make corrections.


> As a base for researching the idea, Wikipedia seems like a decent data source.

If your goal is to generate a wiki article, you can't assume one already exists. That's begging the question. If you could just search wikipedia for the answer, you wouldn't need to generate an article.


I don't think their goal is to generate a wikipedia article. Their goal is to figure out how one might generate a wikipedia article.


At what point will it be just LLM bots arguing with other LLM bots over Wikipedia edits?


As long as the LLM moderator deems it safe discourse, let the best idea win! I'd love a debate between 2 highly accurate and context-aware LLMs - if such a thing existed.

Otherwise it would be like reading HN or Reddit debates where 2 egomaniacs who are both wrong continually straw-man each other with statements peppered with lies and parroted disinfo; ain't got time for that.


Small thing, but the blurb on the README says

> While the system cannot produce publication-ready articles that often require a significant number of edits, experienced Wikipedia editors have found it helpful in their pre-writing stage.

So it can't produce articles that require many edits? Meaning it can produce publication-ready articles that don't need lots of edits? Or it can't produce publication-ready articles, and the articles produced require lots of edits? I can't make sense of this statement.


It gives you a draft that you should keep working on - for example, fact-checking it.


Nucleo AI Alpha

An AI assistant app that mixes AI features with traditional personal productivity. The AI can work in the background to answer multiple chats, handle tasks, and stream/feed entries.

https://old.reddit.com/r/LocalLLaMA/comments/1b8uvpw/does_fr...


I don’t know how well this works (demo is broken on mobile), but I like the idea.

Imagine an infinite wiki where articles are generated on the fly (from reputable sources - with links), including links to other articles (which are also generated) etc.

I actually like this sort of interface more than chat.


Check out https://github.com/MxDkl/AutoWiki (there are projects with similar names doing stuff like this)


From my experiments, this thing is pretty bad. It mixes up things that have similar names, it pulls in entirely unrelated concepts, the articles it generates are mind-numbingly repetitive and verbose (although notably with slightly different "facts" each time things are restated), its citations are often completely unrelated to the topic at hand, and facts are cited by references that don't back them up.

I mean, the spelling and syntax of the sentences is mostly correct, just like any LLM content. But there's ultimately still no coherence to the output.


I guess this is a good thing for increasing coverage of neglected areas. But given how cleverly LLMs can hide hallucinations, I feel like at least a few different auditor bots should also sign off on edits to ensure everything is correct.


This method has actually been proven effective at increasing reliability / decreasing hallucinations [1]

1 - https://arxiv.org/abs/2402.05120


This would be useful for RAG when a Wiki doesn't exist. findOrCreate


What's the point of a tool that helps you research a topic if said tool has to approve your topic first? It refused to research my topic because it was sensitive.


Kinda weird to promote automated reordering and rephrasing of information as research.

What do the authors call what they're doing? Magic?


I saved a full snapshot of Wikipedia (and Stack Overflow) in the weeks before ChatGPT launched, and every day I'm more glad that I did. They will become the Low Background Steel of text.


Good analogy! There's good reason to believe that web archives "uncontaminated" by LLM output will have some unique value in the future (if not now).


For those wondering about the analogy:

* https://en.wikipedia.org/wiki/Low-background_steel


That's gonna be a lot of fun to play with in a year or so.

There's a concurrent explosion of 'veracity' analysis - it'll be fun to run those against Wikipedia a year from now, and against your data.

Incidentally, are you interested in mirroring your dataset and making it more robust? I'm sure I've got a few TB of storage lying around somewhere...


You can just download it yourself. Wikimedia publishes regular dumps in easily accessible formats: https://dumps.wikimedia.org/enwiki/20240320/ (the most recent for English Wikipedia)


"Note that the data dumps are not backups, not consistent, and not complete."


I don't see historical dumps. Am I just dumb?


No, the website is just weird. The original link I posted is for the most recent dump... if you want older ones: https://dumps.wikimedia.org/enwiki/


They are already on the Internet Archive as Kiwix archives.


The thing is that the wiki mods will need to be more diligent with uncited things. I also see 2 massive opportunities here. First is that they can have agents check the cited source and verify whether the source backs up what's said to a reasonable degree. Second opportunity is fitting in things found only in other-language wikis that can either be incorporated into the English one or help introduce new articles. Believe it or not, LLMs can't generate English answers for things answered only in Russian (or any language) in the training data.


> First is that they can have agents check the cited source and verify whether the source backs up what's said to a reasonable degree.

This is a hard and, to my knowledge, unsolved NLP/IR problem, and data access is an issue.

> Second opportunity is fitting in things found only in other-language wikis that can either be incorporated into the English one or help introduce new articles.

This has been attempted via machine translation in the past, and it failed because you need native speakers to verify and correct the translations and this wasn't the sort of work that people were jumping to volunteer to do.


I speak multiple languages. It'd be fun to be given a link saying "this article has content present that is not in the English version" and see if I can update either of the articles using what's in the other copy. But I'm not going to go read wikipedia articles in two languages to find them.


> LLMs can't generate English answers for things answered only in Russian in the training data.

For multilingual LLM’s? Why do you think that?

An LLM can translate inputs of arbitrary Russian text. If there were an English question about something that appears only in Russian in the training data, I would expect an answer - with the quality being on par with its general translation capabilities.


You know that Wikipedia keeps revisions of all articles. I'm sure you could put together a script to grab a copy of each page as of a certain point in time.
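Roughly, using the MediaWiki revisions API (parameter names from memory, so double-check them against the API docs):

    import requests

    def page_as_of(title, timestamp):
        """Fetch the wikitext of the last revision of `title` made before `timestamp`
        (ISO 8601, e.g. '2022-11-01T00:00:00Z')."""
        params = {
            "action": "query",
            "prop": "revisions",
            "titles": title,
            "rvprop": "content",
            "rvslots": "main",
            "rvstart": timestamp,  # start listing at this time...
            "rvdir": "older",      # ...going backwards, so we get the revision live at that moment
            "rvlimit": 1,
            "format": "json",
        }
        r = requests.get("https://en.wikipedia.org/w/api.php", params=params, timeout=30)
        page = next(iter(r.json()["query"]["pages"].values()))
        return page["revisions"][0]["slots"]["main"]["*"]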


How do you browse this snapshot? I'm interested in this solution, too


This looks cool!

There's a small ironically funny typo in the first line: knolwedge


This is important as it collects and reports its references. a) It's the correct paradigm for using LLMs. b) Through human interactions, it can learn from its mistakes.


I hope somebody took a snapshot of the entire internet before 2020; that is our only defence against knowledge laundering.

Wreaking havoc on the digital Akashic records.


Expect a sh*t load of AI hallucinations. As if Wikipedia isn't bad enough with the BS some people intentionally post.


One


Oh dear lord... the subheading states: Storm - Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

Good luck with this storm, wikis the world over. Just a thought, but... maybe someone should ask an org like the Internet Archive to snapshot Wikipedia ASAP and label it Pre-Storm and After-Storm.


LLM mediocrity is just a reflection of human mediocrity, and my bet is on the average LLM to get way better much faster than the average human doing the same.


Agree with you, but on mediocrity: Mistral barely passes as usable, GPT-4 is barely better than Googling, and nothing else I've tried is even ready for production. So there's some element of the model's design, weights/embeddings, and training data that matters a lot.

Only fine-tuned models are producing impressive work, because when we say something is impressive it by definition means not like the status quo - the model must be tuned toward some bias or other, whether aesthetic or otherwise, in order to stand out from the rest. And generic models like GPT or Stable Diffusion will always be generic; they won't have a bias toward certain truths - they'll be mostly unbiased, which is what we want for general research or internet search.

So it's interesting: in order to get incredible quality of work out of AI, you have to make it specific, but in order to do that, you have to train it on the work of humans. I think for this reason AI will always ultimately be behind humans, though it of course will displace a lot of the work we do, which is significant.


Humans are limited in the volume of garbage they can produce.


There is this sentiment of AI-induced deterioration and pollution.

What if that is not the case? What if the quality of this type of content actually increases?


On the one hand, a tool is as good or bad as the person wielding it. Smart folks with the right intentions will certainly be able to use this stuff to increase the rate and quality of their output (because they're smart, so they'll verify rather than trust. Hopefully.)

On the other, moderation is an unsolved problem. The general mess of the internet is probably not quite ready to be handed a footgun of this caliber.

As with many things tech, some of the outcome falls to us, the techies. We can build systems to help steer this.


> On the one hand, a tool is as good or bad as the person wielding it.

I think the real reason is one line dogmas like this.


I'm not sure I follow you - reason for what?

To be clear - I'm with you that these systems can absolutely be a force for vast good (at least, I think that was what you were getting at unless there was a missing '/s'). I use them daily to pretty astounding effect.

I'll admit to being a little put off by being labeled dogmatic - it's not something I consider myself to be.


It was a half sentence, for that I apologize, and I don't remember entirely what I meant.

However, I do see a lot of one-sentence "truisms" being thrown around, like "garbage in, garbage out" and the like.

These are not correct. We can just look at the current state of the art with LLMs, which have vast amounts of garbage going in - it seems like the value is in the vastness of the data over its quality.

> On the one hand, a tool is as good or bad as the person wielding it.

I see this as being a dogma: smart people make good LLMs, dumb people do not. But this is an open question. It seems like the biggest wallet will be the winner of the LLM game.

Please correct me if I misunderstood something.


Ah, I see. What I meant by that was, "A tool is as good or evil as the person wielding it".

There are most definitely good and bad tools, in terms of more or less effective. Machine learning models are for sure outclassing a whole swath of tools in a number of domains and will more than likely continue to take over more purposes over time.

Whether this is a good thing for society is what I thought we were questioning - which is what I meant by steering. We can build tooling to do things like establish veracity, enable interrogation of models, and provide reasoning about internals (which we should do).

Open sourcing as much of this effort as possible will further lead to Good Things (because why are people working on these for free if they're not creating something of actual use) - whilst leaving all ML development to large corporations will inevitably ensure that the only thing you can trust an ML model to do would be spy on you and try to get you to buy stuff, because money.

God the grammar in that last sentence was terrible. But I think you get the point.


It will for a while, I imagine. But the long-term is a concern. Where will new information come from, exactly?


Why not AI?

And even if we accept the premise (as flawed as it might be) that AI is not able to create original knowledge, most of what's online is dissemination and does not represent new information, but just old information rewritten to be understandable by a certain segment.

Something LLMs excel at.


> AI is not able to create original knowledge

The current state of LLMs do hallucinate though. It's just not a very trustworthy source of facts.


Just like my first teachers said I should absolutely not use Wikipedia.

LLMs were popularized less than 2 years ago.

I think it is safe to assume that they will become as trustworthy as you consider Wikipedia today, and probably even more so, as you can embed reasoning techniques into LLMs to correct misunderstandings.

Wikipedia cannot self-correct.


Wikipedia absolutely self-corrects; that's the whole point!


It does not. Its authors correct it.

Unless you see Wikipedia as the organisation and not the encyclopedia?

In that case: sigh, then everything self-corrects.


It is incoherent to discuss Wikipedia as some text divorced from the community and process that made it, so I'm done here.


There's an important difference between wikipedia and the LLMs that are actually useful today.

Wikipedia is open, like completely open.

GPT is not.

Unless we manage to crack the distributed training / incremental improvement barriers, LLMs are a lot more likely to follow the Google path (that is, start awesome and gradually enshittify as capitalist concerns pollute the decision matrix) than they are the Wikipedia path (gradual improvement as more eyes and minds work to improve them).


This is super interesting!

It also carves into the question of what constitutes model openness.

Most people agree that just releasing weights is not enough.

But I don't think it will ever be feasible to fully reproduce model training, especially when factoring in branching and merging of models.

For me this is an open and super interesting question.


Here's what I envision (note: impossible with current state of the art)

A model that can be incrementally trained (this is the bit we're missing) hosted by a nonprofit, belonging to "we the people" (like wikipedia).

The training process could be done a little like wikipedia talk pages are now - datasets are proposed and discussed out in the open and once generally approved, trained into the model.

Because training currently involves backpropagation, this isn't possible. Hinton was working on a structure called "forward-forward" that would have overcome this (if it worked) before he decided humanity couldn't be trusted [1]. It is my hope that someone smarter than me picks up this thread of research - although in the spirit of personal responsibility I've started picking up my old math books to try and get to a point where I grok the implementation enough to experiment myself (I'm not super confident I'm gonna get there but you can't win if you don't play, right?)

It's hard to tell when (if?) we're ever going to have this - if it does happen, it'll be because a lot of people do a lot of really smart unpaid work (after seeing OpenAI do what it did, I don't have a ton of faith that even non-profit orgs have the will or the structure to pull it off. Please prove me wrong.)

1 - https://arxiv.org/abs/2212.13345


How could it? LLMs hallucinate false information. Even if hallucinations are improved, the false information they've generated is now part of the body of text they will be trained on.


I mean, putting a bullet to someone's head can extirpate a brain tumor they hadn't been alerted to before, while leaving a grateful person owing you kudos. What if?


You can always find some radical regressionist argument that is completely out of touch with anything.

Congrats on that!


The concern is not just a vaguely cynical hand-wringing about how bad AI is. Feeding AIs their own output as training material is a bad thing for mathematical reasons, and feeding AIs the output of other very similar AIs is close enough for it to also be bad. The reasons are subtle and hard to describe in plain English, and I'm not enough of an expert to even try, so pardon if I don't. But given that it is hard to determine if output is from an AI, AI really does face a crisis of having a hard time coming across good training material in the future.


>Feeding AIs their own output as training material is a bad thing for mathematical reasons

Most model collapse studies explore degenerate cases to determine the potential limits of the training process of the same model. No wonder you will get terrible results if you recursively recompress a JPEG 100 times! In the real world it's nowhere near that bad, because models are never trained on their output alone and are always guaranteed to receive a certain amount of external data, starting with manual dataset curation (yes, that's also fresh data in itself).

Meanwhile, synthetic datasets are entirely common. I suspect this is a non-issue that is way overblown by people misinterpreting these studies.


I suspect it's overblown today. Hopefully it'll be overblown indefinitely.

However, if AIs become as successful as Nvidia stock price implies, it could indeed become difficult to find text that is guaranteed to not be AI. It is conceivable that in 20 years it will be very difficult to generate a training set at any scale that isn't 90% already touched by AIs.

Of course, it's conceivable that in 20 years we'll have AIs that don't need the equivalent of millennia of training to come up to their full potential. The problem is much more tractable if one merely needs to produce megabytes of training data to obtain a decent understanding of English rather than many gigabytes.


Can you show me a mathematical reason that cannot philosophically be applied to people also? People are only being fed other people's output.


I'd go with "no", because people just consuming the output of other people is a big ongoing problem. Input from the universe needs to be added in order to maintain alignment with the universe, for whichever "universe" you are considering. Without frequent reference to reality, people feeding too much on people will inevitably depart from reality.

In another context, you may know this as an "echo chamber". Not quite exactly the same concept, but very, very similar.

I do like to remind people that the AI of today and LLMs are not the whole of reality. Perhaps someday there will be AIs that are also capable of directly consulting the universe, through some sort of body they can use. But the current LLMs, which are trained on some sort of human output, need to exclude AI-generated input or they too will converge on some sort of degenerate attractor.


Yep, then we are back to "vaguely cynical hand-wringing about how bad AI is."

Currently we have mostly LLMs in the mix, but there is no reason that the AI mix will not contain embodied agents that also publish stuff on the internet (think search-and-rescue bots that automatically write a report).

Now AI is connected to reality without people in the mix.


When trying to close a rhetorical trap on someone, it is useful to first be sure they stepped in it.


Hmm something about this title containing the word 'research' disturbs me. I associate that word with rigorous scientific methods that leads to fact based knowledge or maybe some new hypothesis, not some LLM hallucinating sources, references, quotes and all the other garbage they spit out when challenged over a point of fact. Horrifying to think peeps might turn towards these tools for factual information.


This anthropomorphism really bothers me. These tools are useful for what they’re good for, but I really dislike the agency people keep trying to give to them.


It should also bother marketers in the AI industry because it confuses people on what the incredible value is.

So many people think LLM means chatbot, even here on HN. So many people think agent means mentally humanoid.

But we have others, like Stable Diffusion's Web UI and Leonardo.AI - these are just tools with interfaces and the text entry for prompting is not presented as though it's a conversation between 2 people.

Someone shared an AI songmaker here recently... And there are a number of promising RAG tools for improving workflows for doctors, mechanics, researchers, and lawyers.

I agree with you and expect the "AI character" use case to narrow significantly.


I think there's always been fine line between anthropomorphism as a metaphorical way to indicate complexity versus a pitfall where people (especially outside of a field) start acting like it's a literal statement.

Ex: "the gyroscope is trying to stay upright", or "the computer complains because the update is broken" or "evolution will give the birds longer beaks".

That said, I agree that the problem is dramatically more-severe when it comes to "AI".


Yes, I came to the comments to say the same thing. The LLM is not doing research - it is aggregating data associated with terms and reorganizing text based on what previous responses to a similar prompt would look like.

At the most generous level of scrutiny, the only part that could be related to research would be the aggregation of sources - but that is only a precursor to research and likely is too generalized to be as accurate as a specialist preparing data for actual research.


> Hmm something about this title containing the word 'research' disturbs me. I associate that word with rigorous scientific methods...

The presence of the word "scientific" in this statement disturbs me.


Is there any better word in your mind instead of "research"?



