> I always have issues with LLMs completely forgetting where things are in a scene, or even what parts a given animal has, e.g. saying "hands" when the subject is a quadruped

I dunno what LLM you are using, but a combination of finetuning with a specific prompt structure and good prompt engineering helps the LLM stay "logical" like that. This LoRA, for instance, has specific sections for the different characters in the training dataset: https://huggingface.co/lemonilia/LimaRP-Llama2-13B-v3-EXPERI...
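For illustration, a sectioned prompt in that spirit might be assembled like this. This is just a sketch: the character names and section markers are made up, not the actual LimaRP template.

```python
# Hypothetical sketch: give each character its own labeled section so the
# model has an explicit species/anatomy reference to fall back on.
# Names and formatting are invented, not the actual LimaRP format.

def build_prompt(characters: dict[str, str], scenario: str) -> str:
    parts = ["### Characters:"]
    for name, desc in characters.items():
        parts.append(f"#### {name}:\n{desc}")
    parts.append(f"### Scenario:\n{scenario}")
    return "\n\n".join(parts)

characters = {
    "Ashpaw": "A gray tom cat. Quadruped: four paws, no hands, no fingers.",
    "Fernleaf": "A tortoiseshell she-cat. Quadruped: four paws, no hands.",
}

print(build_prompt(characters, "The two cats patrol the forest border at dawn."))
```

The point is that the per-character block stays in context on every generation, so the model always has something to re-read about who has paws and who doesn't.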

Other than that, higher parameter models (70B, and the "frankenstein" 20B llama models) tend to be better at this.




Yeah, well, that's just the problem, isn't it? The model isn't good at my task already, so I'm going to have to obtain my own dataset, curate the whole thing myself, organize it, and finetune the model on it, and so on and so forth. I'm sure I'll spend so much time actually creating the stories I want to create, rather than troubleshooting the pipeline. And it totally helps that the entire stack is built on top of fragile Python scripts.

I just wish there were a way of making these models already perform well on niche tasks like "write this story, except the characters are quadrupeds, and therefore are not human". Like Warriors (the book series, about cats), without having to go and spend weeks curating a dataset of books about non-human characters.

I'm sure this is such an active area of ongoing research that it goes without saying.

> I dunno what llm you are using

I started with the RWKV family of models, before realizing the amount of overfitting is so critically unfunny that the model files aren't even on my computer anymore.

Anyway, the best I have found so far is Chronos-Hermes-13B. I believe that's a dedicated roleplaying model. I guess furry roleplays would make good training data, wouldn't they?

Chronos-Hermes-13B itself though is a mildly cursed/degenerate hybrid of two other models that don't really work together properly with the new GGML quantizations, and it's based on the old LLaMA-1 family of models, but I haven't found anything better yet.


> Chronos-Hermes-13B

It's not SOTA anymore. I dunno what is, but just look at what people are running on Lite:

https://lite.koboldai.net/#

The new darlings seem to be Mythos- and Xwin-based hybrid models, as well as models with the 70B version of Chronos in them.

Also, see this, specifically the "IQ" metric: https://rentry.co/ayumi_erp_rating

> write this story, except the characters are quadrupeds, and therefore are not human

But the RP models should be able to get this with some prompt engineering. You may have to be repetitive in the instruction block, saying things like "...the characters are not human. All the characters have four feet. All the characters are quadruped animals..." and so on to really emphasize it to the LLM.
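A minimal sketch of that repetition trick, assuming a plain instruction-block prompt format (the exact wording and markers are just examples):

```python
# Stack several rephrasings of the same constraint in the instruction
# block; the redundancy makes it less likely the model drifts back to
# human anatomy ("hands", etc.) mid-story.
constraints = [
    "The characters are not human.",
    "All the characters have four feet.",
    "All the characters are quadruped animals.",
    "None of the characters have hands or fingers.",
]

instruction_block = "### Instruction:\nWrite the next scene. " + " ".join(constraints)
print(instruction_block)
```

Whether four rephrasings is enough (or too many) is model-dependent; the idea is just that each restated constraint is another chance for the sampler to stay on track.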


Honestly, ERP models sound like they would be the best fit for this task; it's just hard to find one that's trained on quadrupeds rather than humans, or even furries, if that makes any sense. I will try the repetitive method soon.


There is a lot of effort put into those ERP models, lol. The training and datasets are actually really good, hence they are also very good at the non-erotic RP part.


Pretty funny how so much effort goes into making and categorizing specifically NSFW content lol

I wouldn't be surprised if at least a few contributors in the open source AI community initially got in just because of this aspect


> at least a few contributors

It's more than a few.

The eRP community was apparently going strong before Llama or Stable Diffusion were even a thing, using GPT-J finetunes and some even older base models. LLaMA v1 was like a Christmas present, and all the UIs and infrastructure were already set up to work on it.

That was a strong motivator for all the work on Stable Diffusion too.

I see no problem with any of this, lol.


Oh, yes, I do know that this scene is quite large. I still remember the drama when AI Dungeon began cracking down on NSFW content due to model licensing (?), or the thermonuclear response that followed character.ai doing something vaguely similar.

It's more that I always pictured AI developers (especially ones that interact with models directly) as these very intelligent, selfless scientists, and not as ordinary people who are okay with using these technologies for more "hedonistic" purposes, so to say. Nothing wrong with it, obviously, and I'm really interested in seeing where this stuff is headed in the future.


> I wouldn't be surprised if at least a few contributors in the open source AI community initially got in just because of this aspect

whistles innocently

