You are basically getting a fixed prompt to return structured data, with a small amount of automation and vendor lock-in. All these LLM libraries are just crappy APIs layered over the underlying API. It is trivial to write a script that does the same thing and will be much more flexible as models and user needs evolve.
As an example, think about how you could change the prompt or use python classes instead. How much work would this be using a library like this versus something that lifts the API calls and text templating to the user like: https://github.com/hofstadter-io/hof/blob/_dev/flow/chat/llm...
1. Running the typescript type checker against what is returned by the LLM.
2. If there are type errors, combining those into a "repair prompt" that will (it is assumed) have a higher likelihood of eliciting an LLM output that type checks.
3. Gracefully handling the cases where the heuristic in #2 fails.
In my experience experimenting with the same basic idea, the heuristic in #2 works surprisingly well for relatively simple types (i.e. records and arrays not nested too deeply, limited use of type variables). It turns out that prompting LLMs to return values inhabiting relatively simple types can be used to create useful applications. Since that is valuable, this library is valuable inasmuch as it eliminates the need to hand roll this request pattern, and provides a standardized integration with the typescript codebase.
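The request/repair pattern described above can be sketched in a few lines. Here the LLM call and the type check are stubbed out as injectable functions; both names are hypothetical, not TypeChat's actual API:

```typescript
// Minimal sketch of the request/repair loop. `Complete` stands in for the
// LLM call and `Validate` for the type check; both are hypothetical stubs.
type Complete = (prompt: string) => string;
type Validate = (output: string) => string | null; // null = valid, else error text

function requestWithRepair(
  complete: Complete,
  validate: Validate,
  prompt: string,
  maxAttempts = 3
): string {
  let current = prompt;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const output = complete(current);
    const error = validate(output);
    if (error === null) return output; // output validated: done
    // Build the "repair prompt": original request plus the failing output and error.
    current =
      `${prompt}\nYour previous output:\n${output}\n` +
      `did not validate: ${error}\nPlease return corrected output.`;
  }
  throw new Error("model never produced output that validates");
}
```

The point is how small the core loop is once the model call and the validator are lifted out as parameters.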
How does TypeChat tie you to OpenAI more than zod-gpt does? The interface required of a chat completion model is as simple as it gets, and you can provide your own easily (as the linked post makes clear).
The ergonomics of most of these AI libraries are built around whichever models they provide integrations for: according to the file you linked, retries won't even work unless you go and roll them yourself in your implementation.
I'm sure someone will open a PR for Anthropic/Cohere/etc., but a quick glance made it pretty clear they built it with OpenAI first in mind; otherwise even low-hanging fruit like retries would have been abstracted away at a higher level.
I don't know where all you people work that your employer would prefer a random git repo (that has no support and no guarantee of updates) over a solution from Microsoft. (Alternatively: that you have so much free time that you'd prefer to fiddle with your own validation code instead of writing your actual app)
Open source solutions are great (which this still is, btw), but having a first-party solution is also a good thing.
You're overrating the influence of the name Microsoft here. It's just some devs from the company working on this with no proper guarantee backing the project.
I've been through this whole song and dance already with Microsoft's Guidance (another LLM project) and could not justify using it further in production at work. We built some tools and wrappers ourselves and it wasn't even that difficult. These libraries are often more trouble than they're worth.
Not really; better to leave the AI stuff to the AI people rather than the PL people. When you don't, you get gimmick libraries like this rather than a solution that fits into the ecosystem.
These folks have no pedigree when it comes to LLMs or AI, so no, it does not lend credibility.
I don't know which employer is hiring the people who make logical leaps like this but I thank them for their sacrifice.
At the end of the day the repo I linked is grokkable with about 10 minutes of effort, and has simple demonstrable usefulness by letting you swap out the LLM you're calling.
Both are experimental open source libraries in an experimental space.
these are trivial steps you can add in any script, as your link demonstrates.
Why would I want to add all this extra stuff just for that? The opaque retry until it returns valid JSON? That sounds like it will make for many pleasant support cases and issues.
Personally, I have found that investing more effort in the actual prompt engineering improves success rates and reduces the need to retry with an appended error message. Input/output pairs (i.e., few-shot examples) are especially helpful, and while we haven't tried it yet, I imagine fine-tuning and distillation would improve the situation even more.
There are many subtleties to invoking the typescript type checker from node. It's nice to have support for that from the team that maintains the type checker.
Is the team working on typescript in a good position to be making LLM libraries, interfaces, and abstractions? Do they have the background and context to understand how their library fits into AI workflows? Could they have provided the same value with a blog post and sample code?
If this is how you talk to people, it is quite clear who the bigger prick is. You don't even know me to make such judgements; rather shallow, don't you think?
Pretty much all the LLM libraries I'm seeing are like this. They boil down to a request to the LLM to do something in a certain way. I've noticed under complex conditions, they stop listening and start reverting to their 'default' behavior.
But that said it still feels like using a library is the right thing to do... so I'm still watching this space to see what matures and emerges as a good-enough approach.
vendor lock-in to a library and the design choices they make
basically, since it reduces the user input space, you are giving up flexibility and control for some questionably valuable abstractions: a predefined prompt, no ability to prompt engineer, no CoT/ToT, etc.
if anything, choose a broader framework like langchain and have something like this as an extension or plugin to the framework; no need for a library for this one little thing
Weird, I would suggest the opposite - LangChain is a nuke that was hastily assembled to crack a peanut, almond, and whatever other nuts were hype driven into the framework. It's a mess of spaghetti - which is nothing against the Langchain authors - it was just the first iteration in a new problem space. But adopting it in a new codebase is a big commitment that locks you into complexity you'll almost certainly want to shed at some point.
Whereas this library is a much more focused approach that does one small thing well, and could be integrated into your own homerolled frameworks (or probably even langchain itself, assuming you use langchain.js).
I agree that LangChain has some pretty poor APIs and abstractions, and I do even question the usefulness of what they provide.
But this library amounts to a loop around a very basic prompt and running the ts toolchain to produce an error message that is then appended to the prompt next iteration. It is not easily integrated into anything and is written by people who do not practice or develop AI.
The value is turning unstructured data into structured data and ensuring it satisfies schema constraints.
For example: you have 1000 free-text survey responses about your product, building a schema and for-each `TypeChat`ing them would get you a dataset for that free-text. It's mind-bogglingly useful.
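A rough sketch of that batch-structuring pattern, with an illustrative schema and the TypeChat-style call stubbed out as a `translate` parameter (both are assumptions for the example):

```typescript
// Sketch of batch-structuring free-text responses. The schema is illustrative,
// and `translate` stands in for a TypeChat-style LLM call.
interface SurveyRecord {
  sentiment: "positive" | "negative" | "neutral";
  topics: string[];
}

function structureResponses(
  responses: string[],
  translate: (text: string) => SurveyRecord | null
): { records: SurveyRecord[]; failed: string[] } {
  const records: SurveyRecord[] = [];
  const failed: string[] = [];
  for (const text of responses) {
    const record = translate(text);
    if (record !== null) records.push(record);
    else failed.push(text); // keep failures around for retry or inspection
  }
  return { records, failed };
}
```

The end result is a typed dataset built from free text, with the unparseable responses set aside rather than silently dropped.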
yes, turning unstructured data into structured data is one of the most useful ways to use an LLM right now. It has been done before using schemas and could be done without all the extra cruft.
There was a similar example a few months back using XML instead, but I haven't heard much about it since, because again, the library did not add value on top of doing these things in a more open or scripted setting.
MSFT has another project in a similar vein, guardrails: an interesting idea, but made worse by wrapping it in a library. Most of these LLM ideas are better as a function than as a library; make them transform the I/O rather than having every library write its own wrappers around the LLM APIs as well.
There are several more making use of OpenAPI / JSONSchema rather than TS.
We use a subset of CUE, essentially JSON without as many quotes or commas. The LLMs are quite flexible with few-shot learning. They can be made more reliable with fine-tuning. They can be made faster and cheaper with distillation.
the debate is about whether the abstraction here is valuable enough to warrant a library, and the fact that it predefines the prompt and API call flow, so you cannot prompt engineer or use something like CoT/ToT
except it's not neat or novel; this idea has been around and implemented for many months now, by many people, using many methods. Running a tool on the output and then feeding the result back to the LLM is also not novel, and is a widely used technique.
> We'd love to know if TypeChat is something that's useful and interests you!
People can debate till the cows come home. But it's worth remembering that hacker news is about stimulating intellectual curiosity.
There's no reason for this to have a fixed flow, either - it's got a hint of diagonalizability to it - by which I mean, you can get the model to build a schema for dynamic flows, given a 'bootstrapping' schema. No different than what has always had to happen for someone to write a compiler for a programming language in the language itself.
Getting these models to reliably return a consistent structure without frequent human intervention and/or having to account for the personal moral opinions of big tech CEOs is not trivial, no.
There are multiple ways to get structured output, and what this library is doing is not really that interesting. The concept is interesting and already has multiple implementations; the code (and abstraction) here is not, and it creates more issues than it solves.
the quality of the prompt does not look that good, judging by my experience getting flexible structured output based on a schema
There are other questionable decisions, and a valuable use of engineering time is indeed to evaluate candidate abstractions and think about the long-term cost of adopting them. In this case, it does not seem to save much effort, and in the long run it means a lot of important LLM knobs are out of your control. Not a good tradeoff.
Why all the rigamarole of hoping you get a valid response, adding last-mile validators to detect invalid responses, trying to beg the model to pretty please give me the syntax I'm asking for...
...when you can guarantee a valid JSON syntax by only sampling tokens that are valid? Instead of greedily picking the highest-scoring token every time, you select the highest-scoring token that conforms to the requested format.
But OpenAI apparently does not expose the full scores of all tokens, it only exposes the highest-scoring token. Which is so odd, because if you run models locally, using Guidance is trivial, and you can guarantee your json is correct every time. It's faster to generate, too!
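The selection step can be illustrated with a toy example: instead of taking the top-scoring token unconditionally, take the top-scoring token that keeps the output a valid prefix of the target format. Real systems like Guidance apply this against the model's full logit vector during sampling; this is only a sketch of the idea, with the validity predicate supplied by the caller.

```typescript
// Toy illustration of constrained decoding: pick the highest-scoring token
// whose addition keeps the partial output a valid prefix of the format.
interface Candidate { token: string; score: number }

function pickConstrained(
  soFar: string,
  candidates: Candidate[],
  isValidPrefix: (s: string) => boolean
): string | null {
  const bestFirst = [...candidates].sort((a, b) => b.score - a.score);
  for (const c of bestFirst) {
    if (isValidPrefix(soFar + c.token)) return c.token; // best conforming token
  }
  return null; // no candidate keeps the output valid
}
```

The hard part in practice is the `isValidPrefix` check itself (an incremental grammar or type check), not the selection loop.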
It’s like the story of the brown M&Ms[0]. If the model is returning semantically correct data, you would hope that it can at least get the syntax correct. And if it can’t then you ought to throw the response away anyway.
Also I believe that such a method cannot capture the full complexity of TypeScript types.
That's a great analogy! I'd been wondering for a while whether that's a problem with this approach; to be honest I still don't know whether it is, so it would be good to see someone test it empirically.
> when you can guarantee a valid JSON syntax by only sampling tokens that are valid? Instead of greedily picking the highest-scoring token every time, you select the highest-scoring token that conforms to the requested format.
Yes, you can guarantee syntactically correct JSON that way, but will it be semantically correct? If the model really really really wanted to put another token there, but you are forcing it to put a {, maybe the following generated text won't be as good.
Well, if the output doesn't conform to the format it's useless. If the model can't produce good and correct output then it's simply not up to the task.
In my experience, LLM responses include a fair share of outputs that are semantically useful but do not precisely adhere to the requested format. If I chose a strongly typed language for LLM parsing, perhaps I would be tempted to eliminate complexity, simply throw structural outliers away, and explain to the suits that a certain percentage of our queries/expenses are unusable. Instead, more sophisticated coercion techniques could be applied to increase output utilization.
That really strongly depends on your task. Lots of tasks can accept a non-zero failure rate in return for better results on the successful cases. I'm not sure I can think of any off the top of my head where you'd use a LLM and can never deal with a failure, particularly if you're using an external service where you're guaranteed to have to deal with errors or downtime at some point.
I agree that sampling only valid tokens is a very promising approach.
I experimented a bit with finetuning open source LLMs for JSON parsing (without guided token sampling). Depending on one's use case, 70B parameters might be overkill. I've seen promising results with much, much smaller models. Finetuning a small model combined with guided token sampling would be interesting.
Then again, finetuning is perhaps not perfect for very general applications. When you get input that you didn't anticipate in your training dataset, you're in trouble.
The LLM will be able to handle more complex scenarios. I could imagine a use case: if you are ordering from a self-service kiosk, instead of having to go through the whole menu flow you just say your order out loud. You can say, for example, "a couple of chocolate bars", and the LLM tries to match that against the inventory.
Of course, if you are on the web, it makes no sense. It is much easier to use the mouse to click on a couple of items.
I swear I think of something and Anders Hejlsberg builds it.
Structured requests and responses are 100% the next evolution of LLMs. People are already getting tired of chatbots. Being able to plug in any backend without worrying about text parsing and prompts will be amazing.
> Structured requests and responses are 100% the next evolution of LLMs. People are already getting tired of chatbots. Being able to plug in any backend without worrying about text parsing and prompts will be amazing.
Yup, a general desire of mine is to locally run an LLM which has actionable interfaces that I provide. Things like "check time", "check calendar", "send message to user", etc.
TypeChat seems to be in the right area. I can imagine an extra layer of "fit this JSON input to a possible action, if any", and so on.
I see a neat hybrid future where a bot (LLM/etc) works to glue layers of real code together. Sometimes part of ingestion, tagging, etc - sometimes part of responding to input, etc.
All around this is a super interesting area to me but frankly, everything is moving so fast i haven't concerned myself with diving too deep in it yet. Lots of smart people are working on it so i feel the need to let the dust settle a bit. But i think we're already there to have my "dream home interface" working.
ChatGPT isn’t the limiting factor here, a good way to expose the toggles is. I recently tried to expose our company CRM to employees by means of a Teams bot they could ask for stuff in natural language (like „send an invite link to newlead@example.org“ or „how many MAUs did customer Foo have in June“), but while I almost got there, communicating an ever-growing set of actionable commands (with an arbitrary number of arguments) to the model was more complex than I thought.
Care to share what made it complex? My comment above was most likely ignorant, but my general thought was to write some header prompt about available actions that the LLM could map to, and then ask it if a given input text matches to a pre-defined action. Much like what TypeChat does.
Does this sound similar enough to what you were doing? Was there something difficult in this that you could explain?
Aside from being completely hand-wavey in my hypothetical guess-timated implementation, i had figured the most difficult part would be piping complex actions together. "Remind me tomorrow about any events i have on my calendar" would be a conditional action based on lookups, etc - so order of operations would also have to be parsed somehow. I suspect a looping "thinking" mechanism would be necessary, and while i know that's not a novel idea i am unsure if i would nonetheless have to reinvent it in my own tech for the way i wanted to deploy.
Interacting with APIs is the old style. The magic of ChatGPT is the same magic as google had back in the day - you ask it in plain english and it has an answer.
I'm guessing the solution looks like a model trained to take actions on the internet. Kinda sucks for those of us on the outside, because whatever we make is going to be the same, brittle, chewing-gum and duct tape approach as usual. Best to wait for the bleeding edge, like what that MinecraftGPT project was aiming at.
This as a dynamic mapper in a backend layer can be huge.
For example, try to keep up with (frequent) API payload changes around a consumer in Java. We implemented a NodeJS layer just to stay sane. (Banking, huge JSON payloads, backends in Java)
It could shine, or it could be an absolute disaster.
Code/functionality archeology is already insanely hard in orgs with old codebases. Imagine the facepalming that Future You will have when you see that the way the system works is some sort of nondeterministic translation layer that magically connects two APIs where versions are allowed to fluctuate.
I think it's ironic that some people are saying the likes of Chat GPT will make software engineers obsolete when in reality there will be huge demand for the humans that will eventually be needed to clean up messes just like this.
This is my hot take: we're slowly entering the "tooling" phase of AI, where people realize there's no real value generation here, but people are so heavily invested in AI that money is still being pumped into building stuff (and of course, it's one of the best ways to guarantee your academic paper gets published). I mean, LangChain is kind of a joke and they raised $10M seed lol.
DeFi/crypto went through this phase 2 years ago. Mark my words, it's going to end up being this weird limbo for a few years where people will slowly realize that AI is a feature, not a product. And that its applicability is limited and that it won't save the world. It won't be able to self-drive cars due to all the edge cases, it won't be able to perform surgeries because it might kill people, etc.
I keep mentioning that even the most useful AI tools (Copilot, etc.) are marginally useful at best. At the very best it saves me a few clicks on Google, but the agents are not "intelligent" in the least. We went through a similar bubble a few years ago with chatbots[1]. These days, no one cares about them. "The metaverse" was much more short-lived, but the same herd mentality applies. "It's the next big thing" until it isn't.
Hard disagree on AI being just a bubble with limited applicability.
> It won't be able to self-drive cars due to all the edge cases, it won't be able to perform surgeries because it might kill people, etc.
You literally just cherry-picked the most difficult applications of AI. The vast majority of peoples' jobs don't involve life or death, and thus are ripe for automation. And even if the life or death jobs retain a human element, they will most certainly be augmented by AI agents. For example a surgery might still be handled by a human, but it will probably become mandatory for a doctor or nurse to diagnose a patient in conjunction with an AI.
> We went through a similar bubble a few years ago with chatbots
Are you honestly comparing that to now? ChatGPT got to 100 million users in a few months and everyone and their grandma has used it. I wasn't even aware of any chatbot bubble a few years ago, it certainly wasn't that significant.
> even the most useful AI tools (Copilot, etc.) are marginally useful at best
Sure, but you're literally seeing them in their worst versions. ChatGPT has been a life-changer for me, and it doesn't even execute code yet (Code Interpreter does though, which I haven't tested yet)
By 2030 humans probably won't be typing code anymore, it'll just be prompting machines and directing AI agents. By then most peoples' jobs will also be automated.
AI isn't just some fad, it's going to change literally every industry, and way faster than people think. The cynicism here trying to dismiss the implications of AI by comparing it to the metaverse are just absurd and utterly lacking in imagination. Yes there is still a lot of work that needs to be done, specifically in the AI agent side of things, but we will get there, probably way faster than people realize, and the implications are enormous.
> By 2030 humans probably won't be typing code anymore, it'll just be prompting machines and directing AI agents. By then most peoples' jobs will also be automated.
Eventually, perhaps. But by 2023? Definitely not.
I think you and the GP are at opposite extremes, and the reality is somewhere in the gulf between.
When I use ChatGPT I feel like I'm looking at a different technology than other people are. It's supposed to be able to answer every question and teach me anything, but in practice it turns out to be a content-farm-as-a-service (CFaaS?). Copilot is similar: it's usually easier for me to write the code myself than to iterate through its suggestions to find the least bad one and then fix the bugs.
That said, AlphaGo went from "hallucinating" bad moves to the best player in the world in a fairly short period of time. If this is at all doable for language models, GPT-x may blow all this out of the water.
> That said, AlphaGo went from "hallucinating" bad moves to the best player in the world in a fairly short period of time. If this is at all doable for language models, GPT-x may blow all this out of the water.
I think the state space when looking at something like Go v. natural language (or even formal languages like programming languages or first/second order logic) is not even remotely comparable. The number of states in Go is 3^361. The number of possible sentences in English, while technically infinite, has some sensible estimates (Googling shows the relatively tame 10^570 figure).
> we're slowly entering the "tooling" phase of AI, where people realize there's no real value generation here
Hard disagree. A very clear counterexample from my usage:
Gpt-4 is phenomenal at helping a skilled person work on tangential tasks where their skills generally translate but they don’t have strong domain knowledge.
I’ve been writing code for a decade, and recently I’ve been learning some ML for the first time. I’m using gpt-4 everyday and it’s been a delight.
To be fair, I can see one might find the rough edges annoying on occasion. For me, it’s quite manageable and not much of a bother. I’ve gotten better at ignoring or working around them. There is definitely an art to using these tools.
I expect the value provided to continue growing. We haven’t plucked all of the low-hanging or mid-hanging fruit yet.
I can share chat transcripts if you are interested.
> DeFi/crypto went through this phase 2 years ago.
A key difference is that these things, no matter how impressive their technical merits, required people to completely reshape whatever they were doing to get the first bit of benefit.
Modern AI (and really, usually LLMs) has immediate and broad applicability across nearly every economic sector, and that's why so many of us are already building and releasing features with it. There's incredible value in this stuff. Completely world-changing? No. But enough to create new product categories and fundamentally improve large swaths of existing product capabilities? Absolutely.
I feel like this is actually a very sensible take. AI has many uses, and it can be really good at some things, but it's not the hail mary it's being treated as.
How does no voice assistant (Apple, Google, Amazon, Microsoft) integrate LLMs into their service yet, and how has OpenAI not released their own voice assistant?
Also, like RSS, if there were some standard URL that websites exposed for AI interaction, using this TypeChat to expose the interfaces, we'd be well on our way here.
OpenAI is pretty likely working on their own (see Karpathy's "Building a kind of JARVIS @ OpenAI"), and Microsoft of course is doing an integration or reinterpretation of Cortana with OpenAI's LLMs (since they seem incapable of building their own models nowadays - "Why do we have Microsoft Research at all?" - S.N.), but there's a lot less value in a voice-driven LLM than there is in actually being able to perform actions.
Take Alexa for example: you need a system that can handle smart home control in a predictable, debuggable way, otherwise people would get annoyed. I definitely think you can do this, but the current system as built (and others like Siri and, to a lesser extent, Cortana) all have a bunch of hooks and APIs being used by years and years of rules and software built atop less powerful models. They need to both maintain the current quality and improve on it while swapping out major parts of their system in order to make this work, which takes time.
Not to mention that none of these assistants actually makes any money (they all lose money, really); they are only worthwhile to big companies with other ways to make cash or drive other parts of their business (phones, shopping, whatever), so there's less incentive for a startup to do it.
I worked on both Cortana and Alexa in the past and thought a lot about trying to build a new version of them from the ground up with the LLM advancements. While the tech was all straightforward, and I even had some new ideas for use cases that are now possible, I could not figure out a business model that would work (hence, I'm working on something completely different now).
It's July, they just needed to put a voice interface on ChatGPT, it'd easily help them sell more pro licenses as well. I'm not a conspiracy person, but this just seems so obvious it feels like there's something else going on here.
The official ChatGPT app has had voice-recognition for a while now. Still not closing the obvious loop with text-to-speech, but probably they have bigger fish to fry. It might be that the projected extra subscription revenue would not make such a big difference in the rate at which they burn through capital.
Talking to Alexa is laughable now, after having interacted with ChatGPT and Bing. It's so frustrating to see capable hardware being let down by crappy software for years upon years.
I'm really looking forward to something that I can use to control Home Assistant. I'm just really nervous about using any cloud-based API for this, so I would like to get something running on a server in my own house. But I would also want the voice recognition and response times to be extremely fast so I don't feel like I'm ever waiting for anything. I've seen a few DIY attempts at a personal assistant but there's always a significant delay that would become very annoying if I used it regularly.
Seriously, it feels like there’s some collusion going on behind the scenes. This is the most obvious use case for the technology, but none of the big vendors have explored it.
I think it's because it turns out that taming a generative language model is really difficult. It's what we need to support more than some hardcoded simple questions, but companies like Google who are known for search want to keep their image of "use us to find what you're looking for". In the current state, their models (especially Bard in my experience) simply return bullshit and want to sound confident. They need to get beyond that stage.
But I feel you. My Google Assistant doesn't even seem to look for answers to questions anymore. All I get, even for simple queries, is a "sorry, I don't understand".
Whoops - thanks for catching this. Earlier iterations of this blog post used a different schema where `size` had been accidentally specified as a `number`. While we changed the schema, we hadn't re-run the prompt. It should be fixed now!
Their example here is really weak overall IMO, and it's more than just that typo. You also probably wouldn't want a "name" string field anyway. There's nothing stopping you from receiving
{
  name: "the brown one",
  size: "the espresso cup",
  …
}
Like that’s just as bad as parsing the original string. You probably want big string union types for each one of those representing whatever known values you want, so the LLM can try and match them.
But now why would you want that to be locked into the type syntax? You probably want something more like Zod where you can use some runtime data to build up those union types.
You also want restrictions on the types too, like quantity should be a positive, non-fractional integer. Of course you can just validate the JSON values afterwards, but now the user gets two kinds of errors. One from the LLM which is fluent and human sounding, and the other which is a weird technical “oops! You provided a value that is too large for quantity” error.
The type syntax seems like the wrong place to describe this stuff.
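A hypothetical sketch of the runtime checks that the type syntax alone can't express, i.e. a `name` drawn from runtime-known values and a positive-integer `quantity` (the function and field names are illustrative):

```typescript
// Hypothetical validator for constraints outside the static type system:
// `name` must come from a runtime-supplied set of known values, and
// `quantity` must be a positive integer.
function validateOrderItem(
  item: { name: string; quantity: number },
  knownNames: Set<string>
): string[] {
  const errors: string[] = [];
  if (!knownNames.has(item.name)) errors.push(`unknown item: ${item.name}`);
  if (!Number.isInteger(item.quantity) || item.quantity <= 0) {
    errors.push(`quantity must be a positive integer, got ${item.quantity}`);
  }
  return errors; // empty array means valid
}
```

Returning a list of error strings (rather than throwing) also makes it easy to feed the failures back into a repair prompt.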
I feel like that's just a documentation bug. I'm guessing they changed from number of ounces to canonical size late in the drafting of the announcement and forgot to change the output value to match.
There would be no way for a system to map "grande" to 16 based on the code provided, and 16 does not seem to be used anywhere else.
Looks like it just runs the LLM in a loop until it spits out something that type checks, prompting with the error message.
This is a cute idea and it looks like it should work, but I could see this getting expensive with larger models and input prompts. Probably not a fix for all scenarios.
Typescript's type system is much more expressive than the one the function call feature makes available.
I imagine closing the loop (using the TS compiler to restrict token output weights) is in the works, though it's probably not totally trivial. You'd need:
* An incremental TS compiler that could report "valid" or "valid prefix" (i.e., valid as long as the next token is not EOF)
For the TS compiler: if you took each generation step, closed any partial JSON objects (i.e., closed any open `{`), checked that it was valid JSON, and then validated it using a deep version of Partial<T>, that should do the trick.
That's why I mentioned you check the JSON validity first. You'd obviously need to continue letting it generate tokens until you can parse the JSON to check if the type is partial. You could of course close even the quotes but then you'd get "not valid" signals from TS when the AI is like "just let me finish!" :-)
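The "close the partial object and parse" check described above can be sketched like this. It is deliberately simplified: it doesn't handle a dangling partial string literal or a trailing comma, and a real version would feed the closed object to a deep Partial<T> check rather than stopping at JSON validity.

```typescript
// Append closing brackets for any unclosed {/[ (ignoring brackets inside
// string literals), then see if the result parses as JSON.
function closesToValidJson(partial: string): boolean {
  const stack: string[] = [];
  let inString = false;
  for (let i = 0; i < partial.length; i++) {
    const ch = partial[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") stack.push("}");
    else if (ch === "[") stack.push("]");
    else if (ch === "}" || ch === "]") stack.pop();
  }
  if (inString) return false; // mid-string: can't close cleanly at this point
  const closed = partial + stack.reverse().join("");
  try { JSON.parse(closed); return true; } catch { return false; }
}
```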
I'm not familiar with how TypeChat works, but Guidance [1] is another similar project that can actually integrate into the token sampling to enforce formats.
My take on this is, it should be easy for an engineer to spin up a new "bot" with a given LLM. There's a lot of boring work around translating your functions into something ChatGPT understands, then dealing with the response and parsing it back again.
With systems like these you can just focus on writing the actual PHP code, adding a few clear comments, and then the bot can immediately use your code like a tool in whatever task you give it.
Another benefit to things like this, is that it makes it much easier for code to be shared. If someone writes a function, you could pull it into a new bot and immediately use it. It eliminates the layer of "converting this for the LLM to use and understand", which I think is pretty cool and makes building so much quicker!
None of this is perfect yet, but I think this is the direction everything will go so that we can start to leverage each others code better. Think about how we use package managers in coding today, I want a package manager for AI specific tooling. Just install the "get the weather" library, add it to my bot, and now it can get the weather.
Hang on, so this is doing runtime validation of an object against a typescript type definition? Can this be shipped as a standalone library/feature? This would be absolutely game changing for validating api response payloads, etc. in typescript codebases.
yup, just found that, super neat, I am 100% interested in using this for other runtime validation...
It's interesting because I've always been under the impression the TS team was against the use of types at runtime (that's why projects like https://github.com/nonara/ts-patch exist), but now they're doing it themselves with this project...
I wonder what the performance overhead of starting up an instance of tsc in memory is? Is this suitable for low latency situations? Lots of testing to do...
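The "run the compiler at runtime" idea is straightforward to sketch with the TypeScript compiler API. This is my own minimal illustration of the technique, not TypeChat's actual implementation (which wraps it in a proper validator API):

```typescript
import * as ts from "typescript";

// Sketch: type-check a JSON value against a schema by synthesizing a
// tiny module `const _value: TypeName = <json>;` and asking the compiler
// for diagnostics. The schema text and type name here are illustrative.
function typeChecks(json: string, schema: string, typeName: string): boolean {
  const source = `${schema}\nconst _value: ${typeName} = ${json};`;
  const fileName = "check.ts";
  const host = ts.createCompilerHost({ strict: true });
  const original = host.getSourceFile.bind(host);
  // Serve our synthesized file from memory; fall through for lib.d.ts etc.
  host.getSourceFile = (name, langVersion) =>
    name === fileName
      ? ts.createSourceFile(name, source, langVersion)
      : original(name, langVersion);
  const program = ts.createProgram([fileName], { strict: true, noEmit: true }, host);
  return ts.getPreEmitDiagnostics(program).length === 0;
}

const userSchema = `interface User { name: string; age: number; }`;
console.log(typeChecks(`{"name": "Ada", "age": 36}`, userSchema, "User"));   // true
console.log(typeChecks(`{"name": "Ada", "age": "old"}`, userSchema, "User")); // false
```

As the parent comment suspects, creating a fresh program per check is not cheap; reusing a language service or incremental program would matter for low-latency use.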
I'm very surprised that they're not using `guidance` [0] here.
Not only would it allow them to require that fields be completed (avoiding the need for validation [1]), it would probably save them GPU time in the end.
There must be a reason and I'm dying to know what it is! :)
Side-note: I was in the process of building this very thing and good ol' Microsoft just swung in and ate my lunch... :/
One of the key things that we've focused on with TypeChat is not just that it acts as a specification for retrieving structured data (i.e. JSON), but that the structure is actually valid - that it's well-typed based on your type definitions.
The thing to keep in mind with these different libraries is that they are not necessarily perfect substitutes for each other. They often serve different use-cases, or can be combined in various ways -- possibly using the techniques directly and independent of the libraries themselves.
So, it's a thing that appends "please format your response as the following JSON" to the prompt, then validates the actual response against the schema, all in a literal `while (true)` loop until it succeeds. This unbelievable achievement is the work of seven people (the authors of the blog post).
Honestly, this is getting beyond embarrassing. How is this the world we live in?
It's because not everyone can be as gifted as you.
I think the (arguably very prototypical) implementation is not what's interesting here. It's the concept itself. Natural language may soon become the default interface for most of the computing people do on a day to day basis, and tools like these will make it easier to create new applications in this space.
I'm gonna love trying to figure out what query gets the support chatbot to pair me with an actual human so that I can solve something that's off script
Yeah, it’s basically a retry loop. I’m curious about the average response time and the worst-case number of iterations.
At best, all these “retry until successful” loops are just hacks to bridge the formal world with the stochastic one. It’s just useless without some stats on how it performs.
And even if the output conforms, you’re not sure the data makes sense. Probably it does... but only probably.
I think he's probably more of an author in the way that the leader of a research team is always credited on any paper by the team, even if he didn't personally do any actual work on it?
Anyway, TIL that Hejlsberg is also involved with TypeScript...
I agree with comments saying this is basically a 10-line "demo script" everyone could write and it is weird to have big names associated with it.
But I heard from MS friends that AI is an absolute "need to have". If you're not working on AI, you're not getting (as much) budget. I suspect this is more about ticking the box than producing some complex project. Unfortunately, throughout the company, folks are doing all kinds of weird things to tick the box like writing a "copilot" (with associated azure openai costs) fine-tuned on a handful of documentation articles :(
Define a struct and tag it with Go's json/jsonschema struct tags. Then give it a prompt and ...
type dinnerParty struct {
    Topic       string   `json:"topic" jsonschema:"required" jsonschema_description:"The topic of the conversation"`
    RandomWords []string `json:"random_words" jsonschema:"required" jsonschema_description:"Random words to prime the conversation"`
}
completer := openai.NewClient(os.Getenv("OPENAI_API_KEY"))
d := gollum.NewOpenAIDispatcher[dinnerParty]("dinner_party", "Given a topic, return random words", completer, nil)
output, _ := d.Prompt(context.Background(), "Talk to me about dinosaurs")
seems like they run the generated response through the typescript type checker, and if it fails, retry using the error message as a further hint to the LLM, until it succeeds.
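That validate-and-repair pattern reads roughly like the sketch below. The `complete` and `validate` functions and the repair-prompt wording are stand-ins of my own, not TypeChat's actual code:

```typescript
// Sketch of a retry loop that feeds validation errors back to the model.
// `complete` is any text-in/text-out LLM call; `validate` returns an
// error message or null if the output is acceptable.
type Complete = (prompt: string) => Promise<string>;
type Validate = (output: string) => string | null;

async function translateWithRepair(
  request: string,
  complete: Complete,
  validate: Validate,
  maxAttempts = 3,
): Promise<string> {
  let prompt = request;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const output = await complete(prompt);
    const error = validate(output);
    if (error === null) return output;
    // Build a "repair prompt" from the failure and try again.
    prompt = `${request}\nYour previous response:\n${output}\n` +
             `had this problem:\n${error}\nRespond with corrected output only.`;
  }
  throw new Error("LLM output never validated");
}

// Demo with a mock completer that fails once, then succeeds:
let calls = 0;
const mock: Complete = async () => (++calls === 1 ? "not json" : '{"ok": true}');
const validJson: Validate = (s) => {
  try { JSON.parse(s); return null; } catch (e) { return String(e); }
};
translateWithRepair("Return {ok: true} as JSON", mock, validJson)
  .then((r) => console.log(r)); // logs '{"ok": true}'
```

In the real library the validator is the TypeScript checker rather than `JSON.parse`, so the repair prompt contains actual type errors.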
This is funny, I have something pretty similar in my code, except it's using Zod for runtime typechecking, and I convert Zod schemas to json schemas and send that to gpt-3.5 as a function call. I would expect that using TypeScript's output is better for recovering from errors than with Zod's output, so I can definitely see the advantage of this.
Relevant: Built this which generalizes to arbitrary regex patterns / context free grammars with 100% adherence and is model-agnostic — https://news.ycombinator.com/item?id=36750083
Just use function calling, declare your function schema using Zod, and convert it to JSONSchema automatically. You don't have to write your types more than once, you get proper validation with great error messages, and can extend it.
It seems jsonformer has some advantages, such as only generating tokens for the values and not the structure of the JSON. But this project seems to have more of a closed feedback loop, prompting the model to do the right thing.
At least for llama.cpp users, this recently introduced PR -- https://github.com/ggerganov/llama.cpp/pull/1773 -- introducing grammar-based sampling could potentially improve structural reliability of LLaMA output. They provide an example JSON grammar as well.
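For a flavor of what that looks like, here is an abridged, from-memory sketch of a grammar in the llama.cpp GBNF style constraining output to a simplified JSON object (see the PR's json.gbnf for the real thing):

```
root   ::= object
object ::= "{" ws ( pair ("," ws pair)* )? "}" ws
pair   ::= string ":" ws value
value  ::= object | array | string | number | "true" | "false" | "null"
array  ::= "[" ws ( value ("," ws value)* )? "]" ws
string ::= "\"" [^"\\]* "\""
number ::= "-"? [0-9]+ ("." [0-9]+)?
ws     ::= [ \t\n]*
```

The sampler masks out any token that would take the output off this grammar, so conformance is enforced during generation rather than repaired afterward.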
Hi there! I'm one of the people working on TypeChat and I just want to say that we definitely welcome experimentation on things like this. We've actually been experimenting with running Llama 2 ourselves. Like you said, to get a model working with TypeChat all you really need is to provide a completion function. So give it a shot!
This looks quite similar to how were using OpenAI functions and zod (JSON Schema) to have OpenAI answer with JSON and interact with our custom functions to answer a prompt: https://wundergraph.com/blog/return_json_from_openai
Reliance on strong typing for LLM output coercion is a potentially lossy and inefficient approach that can introduce redundant LLM queries and costs. LLM output is far more subtle than this. But the strongly typed hammer is very attractive to many developers, particularly those in the Typescript ecosystem.
This is rather trivial. The real challenge would be to make it choose what type to return. The function api does that, but then natural conversations sometimes involve calling multiple functions, and there isn't a good schema for that.
That's an interesting way to validate JSON. Basically they run the whole compiler (making it a runtime dependency). Hopefully this horrible implementation will nudge TypeScript developers toward implementing RTTI.
I wish Copilot did something like this. I've found it'll regularly invent C# methods which don't exist, an error which seems trivial to catch and hide from the user. No output is better than bad output.
If I can use this instead of functions, it's gonna save me a buttload of API usage, because the Typescript interface syntax is so concise. Can't wait to try it.
Here's a relevant paper that folks may find interesting: <snip>Semantic Interpreter leverages an Analysis-Retrieval prompt construction method with LLMs for program synthesis, translating natural language user utterances to ODSL programs that can be transpiled to application APIs and then executed.</snip>
it's basically the same thing, but uses a more concise spec for writing the schema (typescript vs jsonschema)
In the end, both methods try to coax the model into returning a JSON object; one method can be used with any model, the other is tied to a specific, ever-changing vendor API.
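The conciseness difference is easy to see side by side. This is a hypothetical example of the same shape expressed both ways (the sentiment type echoes TypeChat's intro example):

```typescript
// What a TypeScript-schema approach sends to the model:
interface SentimentResponse {
  sentiment: "negative" | "neutral" | "positive";
}

// The equivalent JSON Schema that function calling expects.
// Token-wise, the interface above is far more compact.
const sentimentJsonSchema = {
  type: "object",
  properties: {
    sentiment: {
      type: "string",
      enum: ["negative", "neutral", "positive"],
    },
  },
  required: ["sentiment"],
} as const;

const example: SentimentResponse = { sentiment: "positive" };
console.log(example.sentiment); // "positive"
```

For deeply nested types the gap widens, since JSON Schema repeats `type`/`properties` boilerplate at every level.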
Why would one choose to only support "OpenAI" and nothing else?
I'm totally happy to be able to receive structured queries, but I'm also not 100% sure TypeScript is the right tool; it seems to be overkill. I mean, obviously you don't need the power of TS with all its enums, generics, etc.
Plus, given that it will run multiple queries in a loop, it might end up very expensive to have it abide by your custom-made complex type.
Here's the core of the message sent to the LLM: https://github.com/microsoft/TypeChat/blob/main/src/typechat...