I've been working extensively with LLMs on a generative storytelling side project (named www.generativestorytelling.ai because I am terrible at naming things) and once prompts start getting complex, ChatGPT wins by a landslide. I can do all sorts of complicated prompts to ChatGPT[0] and it will, by and large, come up with great output.
Meanwhile, Bard gets confused by basic things such as "after this message I will send another one, do not reply until the second message is sent" and instead tries to immediately reply.
IMHO not very many people doing reviews of chatbots are really pushing the bots to their limits, and those who are pushing the bots really hard are often too busy to take the time to make their work public (which is the reason I am developing in the open!)
> (named www.generativestorytelling.ai because I am terrible at naming things)
A bit beside the point, but this struck me as odd given the context; ChatGPT is pretty good at naming things:
Model: GPT-4
User
Suggest a better name for: "working extensively with LLMs on a generative storytelling side project (named www.generativestorytelling.ai"
ChatGPT
Certainly, here are a few suggestions for a name for your project:
"NarrativeAI: Automated Generative Storytelling"
"TellTalesAI: Collaborative Story Crafting"
"StoryForge.ai: LLM-driven Story Creation"
"ScriptSmith.ai: Mastering Narrative Generation"
"StoryGenix.ai: Redefining Generative Storytelling"
"PlotBender.ai: The Future of Digital Storytelling"
"NarrativeNexus.ai: Blending LLMs and Storytelling"
"TaleSpin.ai: Spinning Stories with AI"
"ProsePrism.ai: Reflecting Imagination"
"EpicInk.ai: Digitalizing Generative Storytelling"
Please note that it's also crucial to check for the availability of these domains before you decide on one.
User
Be more creative
ChatGPT
Absolutely! Here are some more creative and less conventional suggestions:
"AInkWoven.com: Stories Threaded in Code"
"FableFabric.ai: Weaving Narratives with AI"
"MythMechanics.ai: Engineering Epic Tales"
"FantasiaForge.ai: Crafting Dreams with LLMs"
"EpicEther.ai: Conjuring Stories from the Cloud"
"NeuralNovelettes.ai: Intelligent Design, Intelligent Stories"
"ChronicleCodex.ai: Archiving AI's Imagination"
"QuantumQuill.ai: Penning Possibilities with LLMs"
"SagaSilicon.ai: Where Stories Meet Tech"
"PapyrusPixel.ai: Marrying Old Traditions with New Possibilities"
Remember, the availability of these domain names is also an important factor to consider.
ChatGPT 3.5 has been improving in its creative output pretty much every other week since I started looking at it. The improvements don't get announced; things just sort of change. That breaks prompts all the time, though! For my usage, the peak was probably around 3 weeks ago. A few days ago it started getting really wordy, so much so that it now runs into the 4k token limit before it can complete stories!
Obligatory follow-up question: I'm guessing you have a reason for only mentioning and using GPT-3.5 in this project, but in terms of a more general evaluation of complex prompt handling, did you have a chance to test GPT-4, and if yes, how does it fare in comparison?
I predict that, if you tested it, the answer will be "GPT-4 is much better than GPT-3.5 at this" - but the part I'm interested in is hearing your opinion on how much, and how exactly, GPT-4 is better at handling these kinds of prompts/interactions.
(If my prediction is wrong and GPT-4 isn't much better than GPT-3.5 here, that would be even more interesting to hear about.)
I'm using GPT3.5 because it is a public-facing site running off of my personal credit card. :)
Back when I first started out, GPT-4 was generating more creative stories for sure, and it obeyed instructions better. I have had to do a lot of prompt engineering to make sure GPT3.5 keeps to the format I have specified. If you take a look at the source I linked to, I basically taught GPT3.5 a syntax that uses emojis, which is super cool because in addition to the emojis I gave examples of, GPT3.5 has come up with its own use cases. For example, when a princess got exiled, it used a door emoji! Sadly I have had to turn the temperature down to keep things on track; at higher temperatures GPT3.5 tends to forget the syntax.
GPT4 is easier to direct, it follows instructions much better, and the stories are possibly better, but I haven't used it nearly as much for this particular project.
I completely agree. For simple or short conversations, the models are similar enough, even for the low parameter models. But as the conversation gets more complicated, the difference between the models becomes more obvious. The benchmarks used today only give a basic understanding of the language model's abilities and you have to use it yourself to see how capable it really is.
Build something. Play with it. Use it everyday and watch the models evolve.
Not everything can be (easily) measured, especially when it comes to creative output. Heck, even things like "which UI API is better to use" are incredibly subjective: people working at a high level building CRUD apps will look at VB6 and think it is amazing, while others want low-level graphics primitives.
Programmatically using an AI is almost like having a partner you have to negotiate with. Miscommunications are common, and the AI occasionally decides to change its answer format or just improvise brand new things. For the https://www.generativestorytelling.ai project I had to come up with a syntax and parser that is adaptable to changes, while staying under a token limit. The token limit is one reason I didn't use JSON, but the other reason is that I've had friends tell me that GPT will sometimes throw brand new syntax in with its answers, so I wanted a response format that was loose enough to allow the AI to be creative without breaking my parser!
It is an incredibly different experience than any other system I've ever worked with.
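To make the "loose response format" idea concrete, here is a rough sketch of what a tolerant, line-based parser could look like. This is not the project's actual code or syntax: the marker table, the `StoryEvent` type, and `parse_reply` are all made up for illustration. The assumption is one event per line, starting with an emoji marker; markers the parser has never seen are kept and tagged "unknown" instead of raising an error, and lines without a marker fall back to plain narration.

```python
from dataclasses import dataclass

# Hypothetical marker table, invented for this sketch. The real project's
# syntax is not shown in this thread.
KNOWN_MARKERS = {
    "👑": "character",
    "⚔️": "conflict",
    "💬": "dialogue",
}


@dataclass
class StoryEvent:
    kind: str    # a value from KNOWN_MARKERS, or "unknown" / "narration"
    marker: str  # the leading symbol exactly as the model wrote it ("" for narration)
    text: str


def parse_reply(reply: str) -> list[StoryEvent]:
    """Parse a model reply line by line without ever rejecting a line."""
    events = []
    for raw in reply.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information
        parts = line.split(None, 1)
        head = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        # Treat the first token as a marker only if it has no letters or
        # digits (emoji and symbols qualify) and some text follows it.
        if rest and not any(ch.isalnum() for ch in head):
            kind = KNOWN_MARKERS.get(head, "unknown")
            events.append(StoryEvent(kind, head, rest))
        else:
            # No recognizable marker: keep the whole line as narration.
            events.append(StoryEvent("narration", "", line))
    return events
```

With something like this, a reply that uses a marker nobody taught the model (say, a door emoji for an exile) still parses; the caller can decide later whether to render, log, or ignore "unknown" events. A one-line-per-event format is also terser than JSON, which helps when working under a token limit.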
[0] https://github.com/devlinb/arcadia/blob/main/backend/src/rou...