I feel like there is some kind of information theory constraint which confounds our ability to extract higher order behavior from multiple instances of the same LLM.
I spent quite a bit of time building a multi-agent simulation last year and wound up at the same conclusion every day - this is all just a roundabout form of prompt engineering. Perhaps it is useful as a mental model, but you can flatten the whole thing to a few SQL tables and functions. Each "agent" is essentially a SQL view that fills in a string template to form the prompt.
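To make that concrete, here is a toy sketch of the flattening I mean (schema and names invented for illustration) - the whole "agent" is just a view that assembles its next prompt from state rows:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE agents(id INTEGER PRIMARY KEY, name TEXT, persona TEXT);
    CREATE TABLE events(agent_id INTEGER, tick INTEGER, observation TEXT);

    -- the "agent" is nothing but a view over its own state
    CREATE VIEW agent_prompts AS
    SELECT a.id,
           'You are ' || a.name || ', ' || a.persona || '. Recently: ' ||
           group_concat(e.observation, '; ') AS prompt
    FROM agents a JOIN events e ON e.agent_id = a.id
    GROUP BY a.id;
    """)
    db.execute("INSERT INTO agents VALUES (1, 'Ada', 'a cautious miner')")
    db.execute("INSERT INTO events VALUES (1, 1, 'saw a creeper'), (1, 2, 'found iron')")
    for _, prompt in db.execute("SELECT * FROM agent_prompts"):
        print(prompt)  # send to the LLM; insert its reply as the next event row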
I don't think you need an actual 3D world, wall clock, etc. The LLM does not seem to be meaningfully enriched by having a fancy representation underlie the prompt generation process. There is clearly no "inner world" in these LLMs, so trying to entertain them with a rich outer environment seems pointless.
TBH, beyond less repetitive NPC interactions, I haven't seen a single use of LLMs in games that wasn't better served by traditional algorithms. Maybe once they get good enough to create usable rigged and textured meshes with enough control to work in-game? They can't create a story on the fly that's reliable enough to be a compelling accompaniment to a coherent game plot. Maps and such don't seem to need anything beyond what current procedural algorithms provide, and those are still working with premade assets - the implementations I've seen can't even reliably place static meshes on the ground in believable positions. And as far as NPCs go - how far does that actually take you? It's pure novelty, worth far less than an hour of time. Let's even say you get a guided plot progression worded on the fly using an LLM: is that even as good, let alone better, than a dialog tree put together by a professional writer?
This Civ idea at least seems like a new approach to some extent, but it still doesn't seem to add much conceptually. Even if it doesn't, learning that is still worthwhile. But almost universally these ideas seem to be either buzzwordy solutions in search of problems, or a cheaper-than-people source of creativity with some serious quality tradeoffs that still requires far too much developer wrangling to actually save money.
I'm a tech artist so I'm a bit biased towards the value of human creativity, but also likely the primary demographic for LLM tools in game dev. I am, so far, not compelled.
It's been posted in-depth a few times across this forum to varying degrees by game developers - I was initially very excited about the implementation of LLMs in NPC interactions, until I read some of these posts. The gist of it was: the thing that makes a game fundamentally a game is its constraints. LLM-based NPCs fundamentally break these constraints in a way that is not testable or predictable by the developer and will inevitably destroy the gameplay experience (at least with current technology).
Yeah, same. Epic's Matrix demo implemented it, and even without a plot, the interactions were so heavily guided that the distinction was pointless. So you can find out what that NPC's spouse's name is and their favorite color. Is that neat? Sure, it's neat. Is it going to make it a better game? Probably less than hiring another good writer to make NPC dialog. To be truly useful, I think they would have to be able to affect the world in meaningful ways that worked with the game plot, and again, when you clamp that down as much as you'd need to in order to still have a plot, you're looking at a fancy decision tree.
I can't see anything that gen AI NPCs would add, unless maybe you're talking about a Sims kind of game where the interactions are the point and they don't have to adhere to a defined progression. Other than that, it's a chatbot. We already have chatbots, and having them in the context of a video game doesn't seem like it would add anything revolutionary to that product. And would that fundamentally stand a chance of being as compelling to socially-focused role-playing gamers as online games?
This is my field so I'm always looking for the angle that new tech will take. I still rank this lower than VR— with all of its problems— for potential to significantly change player interactions. Tooling to make games is a different story, but for actual use in games? I don't see it yet.
Sandbox games are probably where they will shine. Imagine being able to play Minecraft, and tell a prompt to generate a world that resembles Tatooine, or a vampire-themed mansion. Expectations are lower with sandbox games, so there's no risk of breaking immersion like there would be with an LLM Elder Scrolls game when someone tricks an NPC into solving problems in Python.
Granted, I'm certain there will be copyright issues associated with this capability, which is why I don't think it will be established game companies who first take a crack at this approach.
The problem is what it takes to implement that. I've seen companies currently trying to do exactly that, and their demos go like this "ok, give me a prompt for the environment" and if they're lucky, they can cherry pick some stuff the crowd says and if they're not, they sheepishly ask for a prompt that would indicate one of 5 environment types they've worked on and include several of the dozen premade textured meshes they've made, and in reality you've got a really, really expensive procedural map with asset placement that's worse than if it was done using traditional semi-pre-baked approaches. A deceptive amount of work goes into the nitty gritty of making environments, and even with all of the incredible tooling that's around now, we are not even close to automating that. It's worth noting that my alma mater has game environment art degree programs. Unless you're making these things, you can't easily see how much finesse and artistic sensibility it takes to make beautiful compositions with complementary lighting and nice atmospheric progression. It's not just that nobody has really given it a go - it's really difficult. When you have tooling that uses AI controlled by an artist who knows these things, that's one thing. When it needs to produce great results every time so players keep coming back? That's a very different task. Everyone I've met who thought it was remotely feasible right now was lacking knowledge of generative AI, game development, or both.
Automating the tools so a smaller workforce can make more worlds and more possibilities? We're already there— but it's a very large leap to remove the human creative and technical intermediaries.
"Sandbox games are probably where they will shine. Imagine being able to play Minecraft, and tell a prompt to generate a world that resembles Tatooine, or a vampire-themed mansion."
"The problem is what it takes to implement that. I've seen companies currently trying to do exactly that, and their demos go like this "ok, give me a prompt for the environment" and if they're lucky, they can cherry pick some stuff the crowd says and if they're not, they sheepishly ask for a prompt that would visit indicate one of 5 environment types they've worked on and include several of the dozen premade textured meshes they've made[...]"
I was clearly directly addressing what they said. Unless you have a specific, substantive, on-topic question or statement, I'm going to assume that you're just fishing for things to argue about.
You can assume whatever you want, but your assumptions can never outweigh the assumptions of anyone else on HN… so the last sentence doesn’t make sense?
Plus listing past examples doesn’t indicate future possibilities must conform to that… unless there is a specific argument on why that should be the case on the balance of probabilities… so are you sure you understood my previous questions?
As someone who has tried a lot of role-play models, I think there is definitely value in what LLMs (or similar tech) can add to NPCs, it's just most people don't know how to prompt for it.
Using the RP models, over time I've found certain things that can guide them to creating better stories; an agent system is much easier to use but even using single character cards it's not hard to stuff them with a narrator and several individual characters in one go. I recently switched from kunoichi (8b, decent) to an Aria derivative (13b, much better).
In the majority of role-play stories I do now, it's super easy to refine the prompt so that characters don't necessarily provide pointless details + avoid all the common tropes, especially with newer models.
Maybe I should make a PoC, would be a fun project. But yeah I agree that chatting to an NPC about its day doesn't necessarily make for great gameplay - but it's relatively easy now to guide it into interesting scenarios/experiences, which _does_ make for great gameplay.
E.g. the wife of the hunter you murdered in a fantasy game: normally we just think that we killed a character in a game - but when the hunter's wife decides in the background to train with a sword so that she can avenge her husband, then finally comes to find you and calls you out for murdering her husband - suddenly it's murder, and a revenge story. It's not too hard to prevent a decent model from injecting fluff (like where she bought her sword and how much for) into it.
Edit: just tested this to see what would happen; I first walked into a cottage, grandfather and his young granddaughter, stabbed him in front of her and ran away (spent the next 2 years of "game time" in a forest hiding away). Character motivation updates for the granddaughter were essentially: distraught, vowing revenge, travelling around to hone her skills, speaking with unsavoury types in taverns to find my whereabouts, finding & confronting me, killing me. I was able to query it for "3 dialogue options/actions with percent chances and distinct outcomes in JSON format" which it gave, the chance of her forgiving me was 0.01% which I suppose is fair enough. It did fail to create nice JSON tho, the model is not fine-tuned for that at all.
But it's definitely possible with multiple loras/prompts/queries to extract dynamic dialogue options, actions, stats, percent chances for plot/story paths, etc. LLMs in games definitely need to be managed by a traditional rules-based framework; the LLM should only be used for the creative bits. Stats/player skill will always determine who wins a fight, but the fight starting because of dialogue or past events could totally be LLM driven.
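A minimal sketch of what I mean by the rules-based framework owning everything except the creative bits (llm() is a placeholder for whatever model/backend you use):

    import json
    import random

    def llm(prompt: str) -> str:
        raise NotImplementedError  # wire up your local or hosted model here

    def dialogue_options(scene: str, retries: int = 3) -> list[dict]:
        # the rules engine owns the schema; the LLM only fills in the creative bits
        prompt = (scene + '\nReply with JSON only: a list of exactly 3 options like '
                  '{"text": str, "chance": float, "outcome": str}, chances summing to 1.')
        for _ in range(retries):
            try:
                options = json.loads(llm(prompt))
                assert len(options) == 3
                assert abs(sum(o["chance"] for o in options) - 1.0) < 0.05
                return options
            except (json.JSONDecodeError, AssertionError, KeyError, TypeError):
                continue  # malformed output: re-roll instead of shipping it to the player
        return []  # fall back to hand-written dialogue

    def resolve(options: list[dict]) -> str:
        # dice and stats stay in ordinary game code, never in the model
        r, acc = random.random(), 0.0
        for o in options:
            acc += o["chance"]
            if r <= acc:
                return o["outcome"]
        return options[-1]["outcome"]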
I'm specifically talking about non-text-based games. You're still limited by the game assets, animations (including hair, clothes, weapon movements, etc.), environments, and characters that are the waypoints for the plot - so you've already got a finite number of possibilities. You can't create a new class of weapon on the fly, or a new character, or a new plot with current assets while maintaining story stability unless it's really, really restricted, right? So what do you get aside from variability in dialog that you can't get from a random number generator? And when it comes down to it, does that unpredictability, and all of the effort it takes to wrangle it, make the game better than having a professional writer make a handful of variations on a bunch of lines?
I can’t think of a scenario within the limitations of real games with visual assets that have progressive plots and characters for which that would yield a better game than having people craft it. Players are going to be no more tolerant of bugs, slowdowns, bad dialog, plot holes, misleading information, and annoyance just because an LLM is the source rather than substandard design or QA.
Maybe I’m not quite grasping what you’re proposing?
You've absolutely nailed it here, I agree. To make any progress at all at the tremendously difficult problem they are trying to solve, they need to be frank about just how far away they are from what it is they are marketing.
I whole-heartedly support the authors' commercial interest in drumming up awareness and engagement. This is definitely a cool thing to be working on; however, it would make more sense to frame the situation more honestly and attract folks who want to solve tremendously hard problems, with a level of expertise and awareness that truly moves the ball forward.
What would be far more interesting would be for the folks involved to say all the ten thousand things that went wrong in their experiments and to lay out the common-sense conclusions from those findings (just like the one you shared, which is truly insightful and correct).
We need to move past this industry and its enablers that continually try to win using the wrong methodology -- pushing away the most inventive and innovative people who are ripe and ready to make paradigm shifts in the AI field and industry.
It would however be very interesting to see these kinds of agents in a commercial video game. Yes they are shallow in their perception of the game world. But they’re a big step up from the status quo.
Yes... Imagine a blog post at the same quality as this paper that framed their work and their pursuits in a way that genuinely got people excited about what could be around the corner, but with the context that frames exactly how far away they are from achieving what would be the ultimate vision.
> I don't think you need an actual 3D world, wall clock, etc. The LLM does not seem to be meaningfully enriched by having a fancy representation underlie the prompt generation process.
I don't know how you expect agents to self organize social structures if they don't have a shared reality. I mean, you could write all the prompts yourself, but then that shared reality is just your imagination and you're just DMing for them.
The point of the minecraft environment isn't to "enrich" the "inner world" of the agents and the goal isn't to "entertain" them. The point is to create a set of human understandable challenges in a shared environment so that we can measure behavior and performance of groups of agents in different configurations.
I know we aren't supposed to bring this up, but did you read the article? Nothing of your comment addresses any of the findings or techniques used in this study.
I wrote and played with a fairly simple agentic system and had some of the same thoughts RE higher order behaviour. But I think the counter-points would be that they don't have to all be the same model, and what you might call context management - keeping each agent's "chain of thought" focused and narrow.
The former is basically what MoE is all about, and I've found that, at least with smaller models, they perform much better with a restricted scope and limited context. If the end result of that is something that can do things a single large model can't, isn't that higher order?
You're right that there's no "inner world" but then maybe that's the benefit of giving them one. In the same way that providing a code-running tool to an LLM can allow it to write better code (by trying it out) I can imagine a 3D world being a playground for LLMs to figure out real-world problems in a way they couldn't otherwise. If they did that wouldn't it be higher order?
>I feel like there is some kind of information theory constraint which confounds our ability to extract higher order behavior from multiple instances of the same LLM.
It's a matter of entropy; producing new behaviours requires exploration on the part of the models, which requires some randomness. LLMs have only a minimal amount of entropy introduced, via temperature in the sampler.
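For the unfamiliar, roughly all the exploration a stock sampler gets comes from this one knob (toy sketch):

    import math
    import random

    def sample(logits: list[float], temperature: float = 1.0) -> int:
        # T -> 0 collapses to argmax (no exploration at all);
        # T > 1 flattens the distribution, buying variability at the cost of coherence
        if temperature <= 0:
            return max(range(len(logits)), key=lambda i: logits[i])
        scaled = [x / temperature for x in logits]
        m = max(scaled)
        weights = [math.exp(s - m) for s in scaled]
        return random.choices(range(len(logits)), weights=weights)[0]

    print(sample([2.0, 1.0, 0.1], temperature=0.7))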
As I've pointed out in the past, I also think it's fair to say that we overestimate human variability, and that, for the most part, human behaviours and language coalesce.
The same goes for the creative industry, a talking point being that "AIs just rehash existing stuff, they don't produce anything new". Neither do most artists; everything we make is almost always some riff on prior art or nature. Elves are just humans with pointy ears. Goblins are just small elves with green skin. Dwarves are just short humans. Dragons are just big lizards. Aliens are just humans with an odd-shaped head and body.
I don't think people realise how very rare it is that any human being experiences or creates something truly novel and not yet experienced or created by our species yet. Most of reality is derivative.
Maybe we need gazelles and cheetahs - many gazelle-agents getting chased towards a goal, doing the brute force work - and the constraint cheetahs chase them, evaluate them, and leave them alive (memory intact) as long as they come up with better and better solutions. Basically an evolutionary algo, running on top of many agents, running simultaneously on the same hardware?
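Something like this skeleton, where evaluate() would score an LLM agent's latest solution and mutate() would perturb a survivor's prompt or memory (toy fitness and mutation here, just to show the loop):

    import random
    import string

    def evaluate(agent: str) -> float:
        # toy stand-in for "did the gazelle produce a better solution?"
        return len(set(agent))

    def mutate(agent: str) -> str:
        # toy stand-in for perturbing a survivor's prompt or memory
        return agent + random.choice(string.ascii_lowercase)

    def chase(gazelles: list[str], keep: float = 0.5, rounds: int = 20) -> list[str]:
        # cheetahs as selection pressure: evaluate, cull, refill from survivors
        for _ in range(rounds):
            ranked = sorted(gazelles, key=evaluate, reverse=True)
            survivors = ranked[: max(1, int(len(ranked) * keep))]
            gazelles = survivors + [mutate(random.choice(survivors))
                                    for _ in range(len(ranked) - len(survivors))]
        return gazelles

    print(chase(["make it fast", "make it cheap", "make it good"])[0])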
No, I want the hunters to zap the prey with tiredness. Basically electron holes hunting for free electrons, annihilating state. Neurons have something similar, where they usually prevent endless excitement and hyperfixation, which is why a coder in flow is such a strange thing.
I had the opposite thought. Opposite to evolution...
What if we are a CREATED (i.e. instant created, not evolved) set of humans, and evolution and other backstories have been added so that the story of our history is more believable?
Could it be that humanity represents a de novo (Latin for "anew") creation, bypassing the evolutionary process? Perhaps our perception of a gradual ascent from primitive origins is a carefully constructed narrative designed to enhance the credibility of our existence within a larger framework.
What if we are like the Minecraft people in this simulation?
I feel that is too complicated. The simplest explanation is usually the right one. I think we live on an earth with actual history. Note that this does not necessarily mean that we are not living in a simulation, as history itself can be simulated.
If we are indeed in a simulation, I feel there are too many details to be "designed" by a being. There are too many facts that are connected and unless they fix the "bugs" as they appear and reboot the simulation constantly, I don't think it is designed. Otherwise we would have noticed the glitches by now.
If we are in a simulation, it has probably been generated by a computer following a set of rules. Maybe it ran a simplified version to evolve millions of possible earths, and then we are living in the version they selected for the final simulation? In that case all the facts would align and it could potentially be harder to notice the glitches.
I don't think we are living in a simulation because bugs are hard to avoid, even with close to "infinite" computing power. With great power comes great possibilities for bugs
Perhaps we are in fact living in one of the simplified simulations and will be turned off at any second after I have finished this senten
We also can't rule out that Gaia or Odin made the world five minutes ago, and went to great lengths to make the world appear ancient.
It certainly makes sense if you assume that the world is a simulation. But does it actually explain anything that isn't equally well explained by assuming the simulation simulated the last 13 billion years, and evolution really happened?
There's a built-in assumption that there would be no constraints applied on nested simulations anywhere further down the stack, which is IMO unlikely, unless every layer has unlimited compute or is otherwise interested in investigating nested simulations.
This only works (genetic algo) if you have some random variability in the population. For different models it would work but I feel like it's kind of pointless without the usual feedback mechanism (positive traits are passed on).
That depends on giving them a goal/reward like increasing "data quality".
I mean, frogs don't use their brains much either; in spite of the rich world around them, they don't really explore.
But chimps do. They can't sit quiet in a tree forever, and that boils down to their reward/motivation circuitry. They get pleasure out of exploring. And if they didn't, we wouldn't be here.
Now these seem to be truly artificially intelligent agents. Memory, volition, autonomy, something like an OODA loop or whatever you want to call it, and a persistent environment. Very nice concept, and I'm positive the learnings can be applied to more mundane business problems, too.
If only I could get management to understand that a bunch of prompts shitting into each other isn't "cutting-edge agentic AI"...
But then again their jobs probably depend on selling something that looks like real innovation happening to the C-levels...
You can clearly see that the prior use was very different.
Cambridge Dictionary just documents that it's in fact used that way. One may still disagree on whether it should be.
"That's not English" is usually prescriptive, rather than descriptive. And though English does not have a central authority, individuals are very much allowed to hold prescriptive beliefs - that is how language evolves.
I'm very sure that using "learnings" in a way that is roughly synonymous to "lessons" predates 2022 though. It may have only been added to that specific dictionary in 2022, but the usage is certainly older.
"That's not English" is usually prescriptive, rather than descriptive. And though English does not have a central authority, individuals are very much allowed to hold prescriptive beliefs - that is how language evolves.
Yup, and "ask" is a verb, God damn it, not a noun. But people in the tech world frequently use "learnings" instead of "lessons," "ask" as a noun, "like" as filler, and "downfall" when they mean "downside." Best to make your peace and move on with life.
'Gift' vs 'give' also rustles my jimmies. The phrase 'he gifted it to her' doesn't mean anything different from 'he gave it to her'. As a Calvinite, my stance is that 'verbing weirds language'.
Ew, likewise. I'd even go so far as to say that "verbing" this way is "impactful," and not in a good way. "Going forward," we should all try to use language more thoughtfully.
The C&H strip is wonderful. That whole comic strip is brilliant and timeless.
Nah, give implies it was just given. Something being gifted has specific emotional, cultural and character connotations that differ from simply giving, imo.
"learning" as a noun descends from Old English so has always been current in the language in the intended sense.[1]
"lesson" came from Old French in the 13th century and has changed its original meaning over time.[2]
There's not one single dialect of English so your comment comes off as unnecessarily prescriptivist and has spawned significant off-topic commentary (including this very comment) in response to an otherwise perfectly worded composition.
>If only I could get management to understand that a bunch of prompts shitting into each other isn't "cutting-edge agentic AI"...
It should never be this way. Even with narrow AI, there needs to be a governance framework that helps measure the output and capture potential risks (hallucinations, wrong data / links, wrong summaries, etc)
I've reviewed the paper and I'm confident it was fabricated on top of a collection of false claims. The claims made are not genuine and should not be taken at face value without peer review. The provided charts and graphics are, in many cases, sophisticated forgeries when you review and vet their applicability to the claims made.
It is currently not possible for any kind of LLM to do what is being proposed. Maybe the intentions are good with regard to commercial interests, but I want to be clear: this paper seems to indicate that election-related activities were coordinated by groups of AI agents in a simulation. These kinds of claims require substantial evidence, and that was not provided.
The prompts that are provided are not in any way connected to the applied usage of LLMs that is described.
The "election" experiment was a prefined scenario. There isn't any "coordination" of election activities. There were preassigned "influencers" using the conversation system built into PIANO. The sentiment was collected automatically by the simulation and the "Election Manager" was another predefined agent. Specically this part of the experiment was designed to look at how the presence or absence of specific modules in the PIANO framework would affect the behavior.
> this paper seems to indicate that election-related activities were coordinated by groups of AI agents in a simulation
I mean, that's surely within the training data of LLMs? The effectiveness etc of the election activities is likely very low. But I don't think it's outside the realms of possibility that the agents prompted each other into the latent spaces of the LLM to do with elections.
LLMs are stateless and they do not remember the past (as in they don't have a database), making the training data a non-issue here. Therefore, the claims made here in this paper are not possible because the simulation would require each agent to have a memory context larger than any available LLM's context window. The claims made here by the original poster are patently false.
The ideas here are not supported by any kind of validated understanding of the limitations of language models. I want to be clear -- the kind of AI that is being purported to be used in the paper is something that has been in video games for over 2 decades, which is akin to Starcraft or Diablo's NPCs.
The key issue is that this is an intentional false claim that can certainly damage mainstream understanding of LLM safety and what is possible at the current state of the art.
Agentic systems are not well-suited to achieve any of the things that are proposed in the paper, and Generative AI does not enable these kinds of advancements.
> LLMs are stateless and they do not remember the past (as in they don't have a database), making the training data a non-issue here.
That's not what they said. They said that an LLM knows what elections are, which suggests they could have the requisite knowledge to act one out.
> Therefore, the claims made here in this paper are not possible because the simulation would require each agent to have a memory context larger than any available LLM's context window. The claims made here by the original poster are patently false.
No, it doesn't. They aren't passing in all prior context at once: they are providing relevant subsets of memory as context. This is a common technique for language agents.
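For anyone wondering what that looks like in practice, a minimal sketch (word-overlap scoring here; real systems typically use embeddings):

    class MemoryStream:
        def __init__(self):
            self.log: list[str] = []

        def add(self, event: str):
            self.log.append(event)

        def recall(self, situation: str, k: int = 5) -> list[str]:
            # score every stored memory against the current situation and
            # surface only the top-k into the (finite) context window
            words = set(situation.lower().split())
            return sorted(self.log,
                          key=lambda m: len(words & set(m.lower().split())),
                          reverse=True)[:k]

    mem = MemoryStream()
    mem.add("Alice promised to trade wheat for iron")
    mem.add("saw a zombie near the village gate at night")
    context = "\n".join(mem.recall("what did Alice promise?"))
    # context goes into the prompt; the other 10,000 memories stay out of it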
> Agentic systems are not well-suited to achieve any of the things that are proposed in the paper, and Generative AI does not enable these kinds of advancements.
This is not new ground. Much of the base social behaviour here comes from Generative Agents [0], which they cite. Much of the Minecraft related behaviour is inspired by Voyager [1], which they also cite.
There isn't a fundamental breakthrough or innovation here that was patently impossible before, or that they are lying about: this combines prior work, iterates upon it, and scales it up.
Voyager's claims that it's a "learning agent" and that it "make new discoveries consistently without human intervention" are pretty much wrong considering how part of that system is using GPT's giant memory of ~~all~~ a lot of human knowledge (including how to play Minecraft, the most popular game ever made).
In the same sense, LLMs "not remembering the past" is wrong (especially when part of a larger system). This seems like claiming humans / civilizations don't have a "memory" because you've redefined long term memory / repositories of knowledge like books to not be counted as "memory" ?
Perhaps I've made a big assumption / oversimplification about how this works. But..
> LLMs are stateless and they do not remember the past (as in they don't have a database), making the training data a non-issue here
Yes. I never said they were stateful? The context given is the state. And training data is hugely important. Once upon a time there was a guy that claimed ChatGPT could simulate a command line shell. "Simulate" ended up being the wrong word. "Largely hallucinate" was a more accurate description. Shell commands and sessions were for sure part of the training data for ChatGPT, and that's how it could be prompted into largely hallucinating one. Same deal here with "election activities" I think.
> Therefore, the claims made here in this paper are not possible because the simulation would require each agent to have a memory context larger than any available LLM's context window. The claims made here by the original poster are patently false.
Well no, they can always trim the data put into the context. And then the agents would start "forgetting" things and the "election activities" would be pretty badly "simulated".
Honestly, I think you're right that the paper is misleading people into thinking the system is doing way more than it actually is. But you make it sound like the whole thing is made up and impossible. The reality is somewhere in the middle. Yes they set up hundreds of agents, they give the agents data about the world, some memory of their interactions, and some system prompt to say what actions they can perform. This led to some interesting and surprising behaviours. No, this isn't intelligence, and isn't much more than a fancy representation of what is in the model weights.
For others, it's probably worth pointing out that this person's account is about a day old and they have left no contact information for the authors of the paper to follow up with them on.
For "caetris2" I'll just use the same level of rigor and authenticity that you used in your comment when I say "you're full-of-shit/jealous and clearly misunderstood large portions of this paper".
Yeah, I haven't looked into this much so far but I am extremely skeptical of the claims being made here. For one agent to become a tax collector and another to challenge the tax regime without such behavior being hard coded would be extremely impressive.
They were assigned roles to examine the spread of information and behaviour. The agents pay tax into a chest, as decreed by the (dynamic) rules. There are agents assigned to the roles of pro- and anti-tax influencers; agents in proximity to these influencers would change their own behaviour appropriately, including voting for changes in the tax.
So yes, they didn't take on these roles organically, but no, they weren't aiming to do so: they were examining behavioral influence and community dynamics with that particular experiment.
I'd recommend skimming over the paper; it's a pretty quick read and they aren't making any truly outrageous claims IMO.
So it's a plain vanilla agent-based model (ABM) with lots of human-crafted interaction logic? Then they are making outrageous claims - since they are making it sound like it's all spontaneously arising from the interaction of LLMs...
You can imagine a conversation with an LLM getting to that territory pretty quickly if you pretend to be an unfair tax collector. It sounds impressive on the surface, but in the end it's all LLMs talking to each other, and they'll emit whatever completions are likely given the context.
I've thought about this a lot. I'm no philosopher or AI researcher, so I'm just spitballing... but if I were to try my hand at it, I think I'd like to start from "principles" and let systems evolve or at least be discoverable over time
Principles would be things like self-preservation, food, shelter and procreating, communication and memory through a risk-reward calculation prism. Maybe establishing what is "known" vs what is "unknown" is a key component here too, but not in such a binary way.
"Memory" can mean many things, but if you codify it as a function of some type of subject performing some type of action leading to some outcome with some ascribed "risk-reward" profile compared to the value obtained from empirical testing that spans from very negative to very positive, it seems both wide encompassing and generally useful, both to the individual and to the collective.
From there you derive the need to connect with others, disputes over resources, the need to take risks, explore the unknown, share what we've learned, refine risk-rewards, etc. You can guide the civilization to discover certain technologies or inventions or locations we've defined ex ante as their godlike DM which is a bit like cheating because it puts their development "on rails" but also makes it more useful, interesting and relatable.
It sounds computationally prohibitive, but the game doesn't need to play out in real time anyway...
I just think that you can describe a lot of the human condition in terms of "life", "liberty", "love/connection" and "greed".
Looking at the video in the repo, I don't like how this throws "cultures", "memes" and "religion" into the mix instead of letting them emerge from the need to communicate and share the belief systems that grow out of our collective memories; as it stands it seems like a distinction without a difference for the purposes of analyzing this. Also, "taxes are high!" without the underlying "I don't have enough resources to get by" seems too much like a mechanical turk.
Evolve is another beast... but for the "I've thought about this a lot. I'm no philosopher or AI researcher, so I'm just spitballing... but if I were to try my hand at it, I think I'd like to start from "principles" and let systems evolve or at least be discoverable over time" part, hunt up a copy of "The Society of Mind" by Minsky, who was both (philosopher and AI researcher) and wrote about that idea.
> The work, which first appeared in 1986, was the first comprehensive description of Minsky's "society of mind" theory, which he began developing in the early 1970s. It is composed of 270 self-contained essays which are divided into 30 general chapters. The book was also made into a CD-ROM version.
> In the process of explaining the society of mind, Minsky introduces a wide range of ideas and concepts. He develops theories about how processes such as language, memory, and learning work, and also covers concepts such as consciousness, the sense of self, and free will; because of this, many view The Society of Mind as a work of philosophy.
> The book was not written to prove anything specific about AI or cognitive science, and does not reference physical brain structures. Instead, it is a collection of ideas about how the mind and thinking work on the conceptual level.
It's very approachable as a layperson in that part of the field of AI.
Wow, you are maybe the first person I’ve seen cite Minsky on HN, which is surprising since he’s arguably the most influential AI researcher of all time, maybe short of Turing or Pearl. To add on to the endorsement: the cover of the book is downright gorgeous, in a retro-computing way
Part of it, I suspect, is that it is a book book from the 80s and didn't really make any transition into digital. The people who are familiar with it are ones who bought computer books in the late 80s and early 90s.
Similarly, "A Pattern Language" being a book from the time past that is accessible for a lay person in the field - though more in a tangental way. "A Pattern Language: Towns, Buildings, Construction" was the influence behind "Design Patterns: Elements of Reusable Object-Oriented Software" - though I believe the problem with Design Patterns is that it was seen more as a prescriptive rather than descriptive guide. Reading "A Pattern Language" can help understand what the GoF were trying to accomplish. ... And as an aside, and I also believe that it has some good advice for the setup of home offices and workplaces.
As much as I love the convenience of modern online book shopping and the amount of information available when searching, the experience of browsing books in a book store, going "oh, this looks interesting", then buying and reading them has, I feel, largely been lost over the past decades.
Many of these projects are an inch deep into intelligence and miles deep into the current technology. Some things will see tremendous benefits, but as far as artificial intelligence goes, we're not there yet. I'm thinking gaming will benefit a lot from these.
You mean we're not there in simulating an actual human brain? Sure. But we're seeing AI work like a human well enough to be useful, isn't that the point?
Not if we're pretending it is in any way intelligent. Other than that, I'm all in for new utility to come out of it. But I do see a lot of tangents off the technology with claims to something it is not. I have no problem calling that out. Why do you mind? Just ignore me if I'm holding your enthusiasm back; there's plenty of sources to provide that for you.
So what? I'm not disputing that the imitation of intelligence is good, and that it gets better and better every 3 months or so. But that doesn't mean anything, even if it gets close to 99.9%. It is not real intelligence, and it is quite limited in what it does. If LLMs solve logic problems or chemistry problems, it is not because they made a leap in understanding but because they were trained on a zillion examples. If you have a similar problem, they will try to shoehorn an answer without understanding where it fails. Am I saying this is useless? NO. What I'm saying is that the current approach to intelligence is missing some key ingredients. I'm actually surprised so many get fooled by the hype and are ready to declare a winner. Human intelligence, with its major flaws, is still king of the hill.
How do you distinguish between the real thing and a perfect simulation of the real thing?
You seem to be engaged in faith-based reasoning at this point. If you were born in a sensory deprivation chamber you also would have no inner world, and you wouldn't have anything at all to say about solving chemistry problems.
> I'm actually surprised so many get fooled by the hype and are ready to declare a winner.
Find me one person that says something like this. "AGI is here!" hype-lords exist only as a rhetorical device for the peanut gallery to ridicule.
It's the approach that matters. When it gets to 99.9 percent, it's good enough to be dangerous. At that point it would be hard to tell, but not impossible. As soon as a new type of problem comes out, it will bork on you and need retraining. It'll be a game of catch-up, albeit a very inefficient one. I'm sure we will find a more efficient method eventually, but the point still stands: what we have isn't it.
I’ll shut up when I see leaps in reasoning without specific training on all variations possible of the problem sets.
I'll shut up when I see humans get 99.9% on anything. This seems an awful lot like non-meat-brain prejudice, where standards that humans do not live up to at all are imposed on other things before they're deemed worthy of consideration.
Memory is really interesting. For example, if you play 100,000 rounds of 5x5 Tic-Tac-Toe, do you really need to remember game 51247, or do you recognize and remember a winning pattern? In reinforcement learning you would revise the policy based on each win. How would that work for genAI?
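For contrast, the tabular-RL version of "revise the policy" throws the individual games away entirely (toy sketch):

    import random
    from collections import defaultdict

    Q = defaultdict(float)  # (board_key, move) -> learned value; no transcripts kept
    ALPHA = 0.1             # learning rate

    def update_from_game(moves: list[tuple[str, int]], won: bool):
        # after each game, nudge the value of every (state, move) that was played
        reward = 1.0 if won else -1.0
        for state_key, move in moves:
            Q[(state_key, move)] += ALPHA * (reward - Q[(state_key, move)])

    def pick_move(state_key: str, legal: list[int], eps: float = 0.1) -> int:
        if random.random() < eps:  # keep exploring new patterns
            return random.choice(legal)
        return max(legal, key=lambda m: Q[(state_key, m)])

The genAI analogue would presumably be distilling episodes into weights or into compact "lesson" memories rather than keeping raw transcripts.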
It does not strike me as particularly useful from a scientific research perspective. There does not appear to be much thought put into experimental design and really no clear objectives. Is the bar really this low for academic research these days?
it looks like a group consisting largely of ex-academics using aspects of the academic form, but they stop short of framing it as a research paper as such. they call it a technical report, where it's generally more okay to be like 'here's a thing that we did', along with detailed reporting on the thing, without necessarily having definite research questions. this one does seem to be pretty diffuse though. the sections on Specialization and Cultural Transmission were both interesting, but lacked precise experimental design details to the point where i wish they had just focused on one or the other.
one disappointment for me was the lack of focus on external metrics in the multi-agent case. their single-agent benchmark focusses on an external metric (time to block type), but all the multi-agent analyses seem to be internal measures (role specialization, meme spread) without looking at (AFAICT?) whether or not the collective multi-agent systems could achieve more than single agents on some measure of economic productivity/complexity. this is clearly related to the specialization section, but without consideration of whether said emergent role division had economic consequences/antecedents, it makes me wonder to what degree the whole thing is a pantomime.
I'm curious if it might be possible that an AI "civilization", similar to the one proposed by Altera, could end up being a better paradigm for AGI than a single LLM, if a workable reward system for the entire civilization was put in place. Meaning, suppose this AI civilization was striving to maximize [scientific_output] or [code_quality] or any other eval, similar to how modern countries try to maximize GDP - would that provide better results than a single AI agent working towards that goal?
Yes, good sense for progress! This has been a central design component of most serious AI work since the ~90s, most notably popularized by Marvin Minsky’s The Society of Mind. Highly, highly recommend for anyone with an interest in the mind and AI — it’s a series of one-page essays on different aspects of the thesis, which is a fascinating, Martin-Luther-esque format.
Of course this has been pushed to the side a bit in the rush towards shiny new pure-LLM approaches, but I think that’s more a function of a rapidly growing user base than of lost knowledge; the experts still keep this in mind, either in these terms or in terms of “Ensembles”. A great example is GPT-4, which AFAIU got its huge performance increase mostly through employing a “mixture of experts”, which is clearly a synonym for a society of agents or an ensemble of models.
I don't think "mixture of experts" can be assimilated to a society of agents. It is just routing a prompt to the most performant model: the models do not communicate with each other, so how could they form a society ?
Hmm that's a good point, but IMO the distinction isn't sharp enough to make a big deal over. The core idea of SoM as I see it is that human cognition is often quite decentralized, and that any illusion of a unified self is constructed piecemeal from the outputs of smaller, less-aware subsystems. Generally it's expected that the subsystems communicate with each other, yes, but I think "disproportionately rely on one or two members for complex questions but act like you're unified overall" still fits the bill.
The opinion I formed during the first few months of GPT4 release was that the society of the mind hypothesis was being disproved by the "maximalist" approach some were undertaking in order to build a true AGI. Turned out composing many LLMs into a cognitive architecture where each one had a specific purpose (memory, planning, etc ...) wasn't scaling.
On the same note, I suggest the following: train a transformer by "slicing" it into groups of layers and force it to emit/receive tokens at each of those groups' boundaries. What I expect: using text rather than neural activations should lead to decreased performance.
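Roughly this shape, assuming PyTorch (an untrained sketch; training through the argmax bottleneck would need straight-through estimators or RL, which is partly where I'd expect the performance loss to come from):

    import torch
    import torch.nn as nn

    VOCAB, DIM = 1000, 64

    class Slice(nn.Module):
        # one group of layers with its own embed/unembed, so it can only
        # talk to the next group through discrete tokens, not activations
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, DIM)
            layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
            self.layers = nn.TransformerEncoder(layer, num_layers=2)
            self.unembed = nn.Linear(DIM, VOCAB)

        def forward(self, tokens):
            h = self.layers(self.embed(tokens))
            return self.unembed(h).argmax(-1)  # the lossy text bottleneck

    stack = nn.Sequential(Slice(), Slice(), Slice())
    out = stack(torch.randint(0, VOCAB, (1, 16)))  # tokens at every boundary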
This is something you can observe in our societies: intelligence doesn't compose, you just don't double a group's overall intelligence by doubling the number of members. At best you'll observe decreasing return, at worst intelligence will decrease.
This seems very cool - I am sceptical of the supposed benefits for "civilization" but it could at least make for some very interesting sim games. (So maybe it will be good for Civilization moreso than civilization.)
I think the Firaxis Civilization needs a cheap AlphaZero AI rather than an LLM: there are too many dumb footguns in Civ to economically hard-code a good strategic AI, yet solving the problem by making the enemies cheat is plain frustrating. It would be interesting to let an ANN play against a "classical" AI until it consistently beats each difficulty level, building a hierarchy. I am sure someone has already looked into this but I couldn't find any sources.
I am a bit skeptical about how computationally expensive a very crappy Civ ANN would be to run at inference time, though I actually have no idea how that scales - it hardly needs to be a grandmaster, but the distribution of dumb mistakes has a long tail.
Also, the DeepMind Starcraft 2 AI is different from AlphaZero since Starcraft is not a perfect information game. The AI requires a database of human games to "get off the ground"; otherwise it would just get crushed over and over in the early game, having no idea what the opponent is doing. It's hard to get that training data with a brand new game. Likewise Civ has always been a bit more focused on artistic expression than other 4x strategy games; maybe having to retrain an AI for every new Wonder is just too much of a burden.
Galactic Civilizations 2 (also, 1,3,4 ??) in the same genre is well-known for its AI, good even without handicaps or cheats. This includes trading negotiations BTW.
(At least good compared to what other 4X have, and your average human player - not the top players that are the ones that tend to discuss the game online in the first place.)
EDIT: I suspect that it's not unrelated that GalCiv2 is kind of... boring as 4X go - as a result of a good AI having been a base requirement?
Speaking of StarCraft AI... (for SC1, not 2, and predating AlphaZero by many years):
I really dig namechecking Sid Meier for the name of the project. I'm also skeptical that this project actually works as presented, but building a Civilization game off of a Minecraft engine is a deeply interesting idea.
I'm somewhat amazed that companies releasing strategy games aren't using AI to test out different cards and what not to find broken things before release (looking at you, Hearthstone)
Yeah, I was disappointed (and thrilled, from a p(doom) perspective) to see it implemented in Minecraft instead of Civilization VI, Humankind, or any of the main Paradox grand strategies (namely Stellaris, Victoria, Crusader Kings, and Europa Universalis). To say the least, the stakes are higher and more realistic than "let's plan a feast" "ok, I'll gather some wood!"
To be fair, they might tackle this in the paper -- this is a preprint of a preprint, somehow...
I suspect that Minecraft might have the open source possibilities (or at least programming interfaces ?) that the other games you listed lack ?
For Civilizations, the more recent they are, the more closed off they tend to be : Civ 1 and/or 2 have basically been remade from scratch as open source, Civ 4 has most of the game open sourced in the two tiers of C++ and Python... but AFAIK Civ 5 (and also 6 ?) were large regressions in modding capabilities compared to 4 ?
I'm reminded of Dwarf Fortress, which simulates thousands of years of dwarf world time, the changing landscapes and the rise and fall and rise and fall of dwarf kingdoms, then drops seven player-controlled dwarves on the map and tells the player "have fun!" It'd be a useful toy model perhaps for identifying areas of investigation to see if it can predict behavior of real civilizations, but I'm not seeing any AI breakthroughs here.
In case anyone is wondering, this is a reference to the movie Virtuosity (1995). I thought it was a few years later, considering the content. It’s a good watch if you like 90s cyberpunk movies.
Reading the paper, this seems like putting the cart before the horse: the agents individually are not actually capable of playing Minecraft and cannot successfully perform the tasks they've assigned or volunteered for, so in some sense the authors are having dogs wear human clothes and declaring it's a human-like civilization. Further, crucial things are essentially hard-coded: what types of societies are available and (I believe) the names of the roles. I am not exactly sure what the social organization is supposed to imply: the strongest claim you could make is that the agent framework could work for video game NPCs because the agents stick to their roles and factions. The claim that agents "can use legal structures" strikes me as especially specious, since "use the legal structure" is hard-wired into the various agents' behavior. Trying to extend all this to actual human society seems ridiculous, and it does not help that the authors blithely ignore sociology and anthropology.
There are some other highly specious claims:
- I said "I believe" the names of the roles are hard-coded, but unless I missed something the information is unacceptably vague. I don't see anything in the agent prompts that would make them create new roles, or assign themselves to roles at all. Again I might be missing something, but the more I read the more confused I get.
- claiming that the agents formed long-term social relationships over the course of 12 Minecraft days, but that's only four real hours and the agents experience real time: the length of a Minecraft day is immaterial! I think "form long-term social relationships" and "use legal structures" aren't merely immodest, they're dishonest.
- the meme / religious transmission stuff totally ignores training data contamination with GPT-4. The summarized meme clearly indicates awareness of the real-world Pastafarian meme, so it is simply wrong to conclude that this meme is being "transmitted," when it is far more likely that it was evoked in an agent that already knew the meme. Why not run this experiment with a truly novel fake religion? Some of the meme examples do seem novel, like "oak log crafting syndrome," but others like "meditation circle" or "vintage fashion and retro projects" have nothing to do with Minecraft and are almost certainly GPT-4 hallucinations.
In general using GPT-4 for this seems like a terrible mistake (if you are interested in doing honest research).
You are on the right track, in my opinion. The key is to encode the interface between the game and the agent so that the agent can make a straightforward choice. For example, by giving the agent the state of an nxn board as the world model, and then a finite set of choices, an agent is capable of playing the game robustly and explaining the decision to the game master. This gives the illusion that the agent reasons. I guess my point is that it's an encoding problem: breaking the world model down into a simple choice.
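In code, the encoding I mean looks something like this (ask_llm() is a placeholder for the model call):

    def ask_llm(prompt: str) -> str:
        return "0"  # placeholder: wire up your model; canned answer for the demo

    def render_board(board: list[list[str]]) -> str:
        return "\n".join(" ".join(cell or "." for cell in row) for row in board)

    def choose_move(board: list[list[str]], legal: list[tuple[int, int]]) -> tuple[int, int]:
        # serialize the world model plus a finite menu, so "reasoning" reduces
        # to a constrained pick the game master can always validate
        menu = "\n".join(f"{i}: place at row {r}, col {c}" for i, (r, c) in enumerate(legal))
        prompt = (f"Board:\n{render_board(board)}\n"
                  f"Choose exactly one option by number:\n{menu}\nAnswer:")
        digits = "".join(ch for ch in ask_llm(prompt) if ch.isdigit())
        idx = int(digits) if digits else 0
        return legal[idx % len(legal)]  # clamp: illegal output can never reach the game

    print(choose_move([["x", "", ""], ["", "o", ""], ["", "", ""]], [(0, 1), (2, 2)]))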
Selfishness is the main reason life exists in the universe. Literally the only requirement for a lump of stuff to become alive is to become selfish. So you’re semi right that these LLMs can never become truly sentient unless they actually become selfish.
While selfishness is a basic requirement, some stupidity (imo) is also important for intelligent life. If you as an AI agent don’t have some level of stupidity, you’ll instantly see that there’s no point to doing anything and just switch yourself off.
The first point is absolutely correct, and (apologies in advance…) was a large driver of Nietzsche’s philosophy of evolution, most explicitly covered in The Gay Science. Not only “selfishness”, but the wider idea of particularized standpoints, each of which may stand in contradiction to the direct needs of the society/species in the moment. This is a large part of what he meant by his notoriously dumb-sounding quotes like “everything is permitted”; morality isn’t relative/nonexistent, it’s just evolving in a way that relies on immorality as a foil.
For the second part, I think that’s a good exposition of why “stupidity” and “intelligence” aren’t scientifically useful terms. I don’t think it’s necessarily “stupid” to prefer the continuation of yourself/your species, even if it doesn’t stand up to certain kinds of standpoint-specific intellectual inquiry. There’s lots of standpoints (dare I say most human ones) where life is preferable to non-life.
Regardless, my daily thesis is that LLMs are the first real Intuitive Algorithms, and thus the solution to the Frame Problem. In a certain colloquial sense, I’d say they’re absolutely already “stupid”, and this is where they draw their utility from. This is just a more general rephrasing of the common refrain that we’ve hopefully all learned by now: hallucinations are not a bug in LLMs, they’re a feature.
The entire paper demonstrates the results of the simulation, or whatever they did; they do not mention how they achieved this simulation. Running 500-1000 LLM agents in parallel would take too many computing resources, and they never substantiate the claim they made about their parallel architecture. I remember a paper published about an AI town in which they clearly explained how they implemented it; they also released a recording of the simulation along with the real data of the results. If anyone understood how this paper's system was implemented, please tell me.
> Professor Dobb's book is devoted to personetics, which the Finnish philosopher Eino Kaikki has called 'the cruelest science man ever created'. . . We are speaking of a discipline, after all, which, with only a small amount of exaggeration, for emphasis, has been called 'experimental theogony'. . . Nine years ago identity schemata were being developed—primitive cores of the 'linear' type—but even that generation of computers, today of historical value only, could not yet provide a field for the true creation of personoids.
> The theoretical possibility of creating sentience was divined some time ago, by Norbert Wiener, as certain passages of his last book, God and Golem, bear witness. Granted, he alluded to it in that half-facetious manner typical of him, but underlying the facetiousness were fairly grim premonitions. Wiener, however, could not have foreseen the turn that things would take twenty years later. The worst came about—in the words of Sir Donald Acker—when at MIT "the inputs were shorted to the outputs".
Honestly I'm really excited about this. I've always dreamed of full blown sandbox games with extremely advanced NPCs (which the current LLMs can already kinda emulate), but on the bigger scale. In just a few decades this will finally be made into proper games. I can't wait.
> I've always dreamed of full blown sandbox games with extremely advanced NPCs (which the current LLMs can already kinda emulate)
The future of gaming is going to get weird fast with all this new tech, and there are a lot of new mechanics emerging that just weren't possible before LLMs, generative AI, etc.
At our game studio we're already building medium-scale sandbox games where NPCs form memories, opinions, problems (that translate to quests), and have a continuous "internal monologue" that uses all of this context plus sensory input from their place in a 3D world to constantly decide what actions they should be performing in the game world. A player can decide to chat with an NPC about their time at a lake nearby and then see that NPC deciding to go visit the lake the next day.
A paper last year ("Generative Agents: Interactive Simulacra of Human Behavior", [0]) is a really good sneak-peek into the kind of evolving sandboxes LLMs (with memory and decisionmaking) enable. There's a lot of cool stuff that happens in that "game", but one anecdote I always think back to is this: in a conversation between two NPCs, one happens to mention they have a birthday coming up to the other; and that other NPC then goes around town talking to other NPCs about a birthday party, and _those_ NPCs mention the party to other NPCs, and so on until the party happened and most of the NPCs in town arrived on time. None of it was scripted, but you very quickly start to see emergent behavior from these sorts of "flocks" of agents as soon as you add persistence and decision-making. And there are other interesting layers games can add for even more kinds of emergent behavior; that's what we're exploring at our studio [1], and I've seen lots of other studios pop up this last year to try their hand at it too.
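For a flavor of the mechanism behind that anecdote: retrieval in that paper scores each memory by a blend of recency, importance, and relevance, roughly like this (the weights and decay rate below are illustrative, not the paper's exact values):

    import math
    import time
    from dataclasses import dataclass

    @dataclass
    class Memory:
        text: str
        embedding: list[float]
        importance: float     # rated once, when the memory is stored
        last_accessed: float  # unix timestamp

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    DECAY = 0.995  # per-hour recency decay

    def retrieve(memories: list[Memory], query_emb: list[float], k: int = 3) -> list[Memory]:
        now = time.time()
        def score(m: Memory) -> float:
            recency = DECAY ** ((now - m.last_accessed) / 3600)
            return recency + m.importance + cosine(m.embedding, query_emb)
        return sorted(memories, key=score, reverse=True)[:k]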
I'm optimistic and excited about the future of gaming (or, at least some new genres). It should be fun. :)
I think it can be quite interesting, especially if you consider different character types (in Anthropic lingo, "personality"). The only problem right now is that using a proprietary LLM is incredibly expensive, so having a local LLM might be the best option. Unfortunately, these are still not on the same level as their larger brethren.
Rimworld is heavily inspired by Dwarf Fortress, so if you’re looking for more complex examples you don’t have to look far. DF is pretty granular with the physical and mental states of its characters - to the point that a character might lose a specific toe or get depressed about their situation - but of course it’s still a video game, not a scientific simulation of an AI society.
> Honestly I'm really excited about this. I've always dreamed of full blown sandbox games with extremely advanced NPCs (which the current LLMs can already kinda emulate), but on the bigger scale.
I don't believe that you want this. Even really good players don't have a chance against super-advanced NPCs (think how chess grandmasters have barely any chance against modern chess programs running on a fast computer). You will rather get crushed.
What you likely want is NPC that "behave more human-like (or animal-like)" - whatever this means.
Oh, I should've clarified - I don't want to fight against them, I just want to watch and sometimes interfere to see how the agents react ;) A god game like WorldBox/Galimulator, if you will. Or observer mode in tons of games like almost all Paradox ones.
I'm working on something similar, https://www.generativestorytelling.ai/tinyllmtown/index.html a small town where all NPCs are simulated using a small LLM. They react to everything the hero does, which means no more killing a dragon and having no one even mention it.
Once I release it, I'll have it simulate 4 hours every 2 hours or so of real time, and visitors can vote on what quest the hero undertakes next.
The simulation is simpler, I am aiming to keep everything to a size that can run on a local GPU with a small model.
Right now you can just watch the NPCs try to figure out love triangles, hide their drinking problems, complain about carrots, and celebrate when the hero saves the town yet again.
This description reminded me of Dwarf Fortress. You might look into how the AI in it works to see if it gives you any ideas about how emergent behaviors can interact?
> I just want to watch and sometimes interfere to see how the agents react ;)
Even there, I am not sure whether, if the AI becomes too advanced, it will be of interest to many players (you might of course nevertheless be interested):
Here, the relevant comparison is to watching (past) games of AlphaGo against Go grandmasters, where even the highly qualified commentators had insane difficulties explaining AlphaGo's moves because many of them were so different from the strategy of any Go game before. The commentators could just accept, and grasp, that these highly advanced moves did crush the Go grandmaster opponents.
In my opinion, the "typical" sandbox game player wants to watch something that he still can "somewhat" grasp.
>Even really good players don't have a chance against super-advanced NPCs
I guess you can make them dumber by randomly switching to hardcoded behavioral trees (without modern AI) once in a while so that they make mistakes (while feeling pretty intelligent overall), and the player would then have a chance to outsmart them.
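Something as dumb as this would probably do (sketch):

    import random

    def scripted_policy(state) -> str:
        return "patrol"  # stand-in for a classic hardcoded behavior tree

    def smart_policy(state) -> str:
        return "flank_player"  # stand-in for the modern-AI policy

    def npc_action(state, blunder_rate: float = 0.2) -> str:
        # with some probability, fall back to the exploitable scripted behavior;
        # tune blunder_rate per difficulty level
        if random.random() < blunder_rate:
            return scripted_policy(state)
        return smart_policy(state)

    print(npc_action({}))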
I'm very confused; is there any emergent behavior in this paper, or is it just "role-play" based on the data about what humans do that's in the LLM? Like, wouldn't they create novel social structures if they had needs? That doesn't seem so hard to program (the needs part).
Just yesterday I was wondering how the Midjourney equivalent world gen mod for Minecraft might be coming along. Imagine prompting the terrain gen?? That could be pretty mind blowing.
Describe the trees, hills, vines, tree colors/patterns, castles, towns, details of all buildings and other features - and have it generate output in Minecraft at as high a quality as image gen achieves in Stable Diffusion?
Interesting context, but highlights all the problems of machine learning models: the lack of reason and abstraction and so on. Hard to say yet how much of an issue this might be, but the medium will almost certainly reveal something about our potential options for social organization.
I think their top-down approach is a problem. What they call human civilization wasn't and isn't centrally-planned, and its goals and ideologies are neither universal nor implicit. The integration of software agents (I refuse to call them "AI") into civilization won't occur in a de facto cooperative framework where such agents are permitted to fraternize and self-modify. Perhaps that will happen in walled gardens where general-purpose automatons can collectively 'plan' activities to maximize efficiency, but in our broader human world, any such collaboration is going to have to occur from the bottom-up and for the initial benefit of the agents' owners.
This kind of research needs to take place in an adversarial environment. There might be something interesting to learn from studying the (lack of?) emergence of collaboration there.
Really interesting, but I'm curious how the civilization here holds up without deeper human-like complexity; it feels like it might lean more toward scripted behaviors than real societies.
They will probably fall fast into tragedy-of-the-commons kinds of situations. We developed most of our civilization while there was enough room for growing and big decisions were centralized, and started to get into bad troubles when things became global enough.
With AIs some of those "protections" may not be there. And hardcoding strategies to avoid this may already put a limit on what we are simulating.
> We developed most of our civilization while there was enough room for growing and big decisions were centralized, and started to get into bad troubles when things became global enough.
Citation needed. But even if I get on board with you on that, wouldn't it be better to start developing for global scale right from the start, instead of starting on small local islands and then trying to rework that into a global ecosystem?
The problem with emulations is human patience. If you don't need/have human interaction, this may run pretty fast. And in the end, what matters is how sustainable it is in the long run.
Does this mean that individual complexity is a natural enemy of group cohesiveness? Or is individual 'selfishness' more a product of evolutionary background.
On our planet we don't have ant colony dynamics at the physical scale of high intelligence (that I know of), but there are very physical limitations to things like food sources.
Virtual simulations don't have the same limitations, so the priors may be quite different.
Taking the "best" course of action from your own point of view could not be so good from a more broad perspective. We might have evolved some small group collaboration approaches that in the long run plays better, but in large groups that doesn't go that well. And for AIs trying to optimize something without some big picture vision, things may go wrong faster.