TypeChat (microsoft.github.io)
556 points by DanRosenwasser on July 20, 2023 | 169 comments



I don't see the value add here.

Here's the core of the message sent to the LLM: https://github.com/microsoft/TypeChat/blob/main/src/typechat...

You are basically getting a fixed prompt to return structured data, with a small amount of automation and vendor lock-in. All these LLM libraries are just crappy APIs to the underlying API. It is trivial to write a script that does the same and will be much more flexible as models and user needs evolve.

As an example, think about how you could change the prompt or use python classes instead. How much work would this be using a library like this versus something that lifts the API calls and text templating to the user like: https://github.com/hofstadter-io/hof/blob/_dev/flow/chat/llm...


The value is in:

1. Running the typescript type checker against what is returned by the LLM.

2. If there are type errors, combining those into a "repair prompt" that will (it is assumed) have a higher likelihood of eliciting an LLM output that type checks.

3. Gracefully handling the cases where the heuristic in #2 fails.

https://github.com/microsoft/TypeChat/blob/main/src/typechat...

In my experience experimenting with the same basic idea, the heuristic in #2 works surprisingly well for relatively simple types (i.e. records and arrays not nested too deeply, limited use of type variables). It turns out that prompting LLMs to return values inhabiting relatively simple types can be used to create useful applications. Since that is valuable, this library is valuable inasmuch as it eliminates the need to hand roll this request pattern, and provides a standardized integration with the typescript codebase.
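
A hand-rolled sketch of that request pattern looks roughly like this (`complete` and `typeCheck` here are stand-ins for your LLM call and a tsc invocation, not TypeChat's actual API):

    // Hand-rolled sketch of the loop above; `complete` (the LLM call) and
    // `typeCheck` (e.g. a tsc invocation against the schema) are stand-ins,
    // not TypeChat's actual API.
    async function translateToTyped(
        request: string,
        schema: string,
        complete: (prompt: string) => Promise<string>,
        typeCheck: (json: string, schema: string) => { ok: boolean; errors?: string },
        maxAttempts = 3
    ): Promise<string> {
        let prompt =
            `Translate the request into JSON matching this TypeScript type:\n` +
            `${schema}\n\nRequest: ${request}\nJSON:`;
        for (let attempt = 0; attempt < maxAttempts; attempt++) {
            const json = await complete(prompt);
            const result = typeCheck(json, schema);      // #1: run the type checker
            if (result.ok) return json;
            // #2: build a "repair prompt" from the type errors and try again
            prompt += `\n${json}\nThe JSON above has these errors:\n${result.errors}\nCorrected JSON:`;
        }
        throw new Error("LLM output never type checked"); // #3: fail gracefully
    }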


Here's a project that does that better imo:

https://github.com/dzhng/zod-gpt

And by better I mean doesn't tie you to OpenAI for no good reason


How does TypeChat tie you to OpenAI more than zod-gpt does? The interface required of a chat completion model is as simple as it gets, and you can provide your own easily (as the linked post makes clear)

https://github.com/microsoft/TypeChat/blob/4d34a5005c67bc494...


The ergonomics of most of these AI libraries are built around whatever models they provide integrations for: according to the file you linked, retries won't even work unless you roll them yourself in your implementation.

I'm sure someone will open a PR for Anthropic/Cohere/etc., but a quick glance makes it pretty clear it was built OpenAI-first; otherwise even low-hanging fruit like retries would have been abstracted away at a higher level.


I don't know where all you people work that your employer would prefer a random git repo (that has no support and no guarantee of updates) over a solution from Microsoft. (Alternatively: that you have so much free time that you'd prefer to fiddle with your own validation code instead of writing your actual app)

Open source solutions are great (which this still is, btw), but having a first-party solution is also a good thing.


You're overrating the influence of the name Microsoft here. It's just some devs from the company working on this with no proper guarantee backing the project.

I've been through this whole song and dance already with Microsoft's Guidance (another LLM project) and could not justify using it further in production at work. We built some tools and wrappers ourselves and it wasn't even that difficult. These libraries are often more trouble than they're worth.


I’m pretty sure Anders, Steve Lucco, and Daniel Rosenwasser worked on this. So inventors + current lead PM of typescript.

Should lend some credibility to the project.


Not really, better to leave the AI stuff to the AI people rather than PL people. When you don't, you get gimmick libraries like this rather than a solution that fits into the ecosystem

These folks have no pedigree when it comes to LLMs or AI, so no it does not lend credibility


I don't know which employer is hiring the people who make logical leaps like this but I thank them for their sacrifice.

At the end of the day the repo I linked is grokkable with about 10 minutes of effort, and has simple demonstrable usefulness by letting you swap out the LLM you're calling.

Both are experimental open source libraries in an experimental space.


Many companies expressly avoid Microsoft products, particularly given its well exposed history of embrace, extend, extinguish.


Look at Guidance - that's being ignored by Microsoft yet it's an official repo


I use Zod a great deal day to day, so this is appealing inasmuch as it would allow me to re-use those definitions.


Anything like this but for Python?


these are trivial steps you can add in any script, as your link demonstrates.

Why would I want to add all this extra stuff just for that? The opaque retry until it returns valid JSON? That sounds like it will make for many pleasant support cases or issues

Personally, I have found that investing more effort in the actual prompt engineering improves success rates and reduces the need to retry with an appended error message. Especially helpful are input/output pairs (i.e. few-shot). While we haven't tried it yet, I imagine fine-tuning and distillation would improve the situation even more.
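
For illustration, a few-shot message list might look roughly like this (the strings are made up):

    // Illustrative few-shot setup: a couple of input/output pairs shown to the
    // model ahead of the real input, in chat-message form.
    const messages = [
        { role: "system", content: "Extract the order as JSON: { item: string, quantity: number }" },
        { role: "user", content: "two lattes please" },
        { role: "assistant", content: '{"item": "latte", "quantity": 2}' },
        { role: "user", content: "can I get an espresso" },
        { role: "assistant", content: '{"item": "espresso", "quantity": 1}' },
        // the real request goes last
        { role: "user", content: "gimme three drip coffees" },
    ];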


There are many subtleties to invoking the typescript type checker from node. It's nice to have support for that from the team that maintains the type checker.


Admittedly, couldn't they spend some effort on making that invocation less subtle instead?


Is the team working on typescript in a good position to be making LLM libraries, interfaces, and abstractions? Do they have the background and context to understand how their library fits into AI workflows? Could they have provided the same value with a blog post and sample code?


Your coworkers must love you.


Indeed, we all do what we are good at and appreciate each other for not having to do the things they do.

But what does your comment have to do with any of this at all?


It's called sarcasm. But here, let me say the same thing directly: you are an insufferable prick. Imagine gatekeeping talking to a fucking chatbot.


If this is how you talk to people, it is quite clear who the bigger prick is. You don't even know me well enough to make such judgements; rather shallow, don't you think?


I know enough to see you're not nearly as smart as you think you are.


agreed. not to mention we're talking about Microsoft here. the same company that gave us "guidance", a defunct LLM framework.


I’ve used guidance, why is it defunct? I found it was powerful at templating, really decent for generating synthetic datasets.


Pretty much all the LLM libraries I'm seeing are like this. They boil down to a request to the LLM to do something in a certain way. I've noticed under complex conditions, they stop listening and start reverting to their 'default' behavior.

But that said it still feels like using a library is the right thing to do... so I'm still watching this space to see what matures and emerges as a good-enough approach.


Where's the vendor lock-in? This is an open source library and the file you linked to even includes configs for two vendors: ChatGPT and Bard.


vendor lock-in to a library and the design choices they make

basically, since it reduces the user input space, you are giving up flexibility and control for some questionably valuable abstractions, such as a predefined prompt, no ability to prompt engineer, CoT/ToT, etc...

if anything, choose a broader framework like langchain and have something like this as an extension or plugin to the framework; no need for a library for this one little thing


Weird, I would suggest the opposite - LangChain is a nuke that was hastily assembled to crack a peanut, almond, and whatever other nuts were hype driven into the framework. It's a mess of spaghetti - which is nothing against the Langchain authors - it was just the first iteration in a new problem space. But adopting it in a new codebase is a big commitment that locks you into complexity you'll almost certainly want to shed at some point.

Whereas this library is a much more focused approach that does one small thing well, and could be integrated into your own homerolled frameworks (or probably even langchain itself, assuming you use langchain.js).


I agree that LangChain has some pretty poor APIs and abstractions, and I do even question the usefulness of what they provide.

But this library amounts to a loop around a very basic prompt and running the ts toolchain to produce an error message that is then appended to the prompt next iteration. It is not easily integrated into anything and is written by people who do not practice or develop AI.


The value is turning unstructured data into structured data and ensuring it satisfies schema constraints.

For example: you have 1000 free-text survey responses about your product; build a schema and for-each `TypeChat` them, and you get a structured dataset out of that free text. It's mind-bogglingly useful.
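
Rough sketch, with a made-up schema, assuming a TypeChat-style translate API like the one shown further down the thread:

    // Made-up schema for the survey example; `translator` is assumed to be a
    // TypeChat-style JSON translator built from this type (see the
    // createJsonTranslator snippet further down the thread).
    interface SurveyResponse {
        sentiment: "negative" | "neutral" | "positive";
        topics: string[];           // e.g. "pricing", "onboarding"
        featureRequests: string[];
    }

    async function classifyAll(
        responses: string[],
        translator: { translate(text: string): Promise<{ success: boolean; data?: SurveyResponse }> }
    ): Promise<SurveyResponse[]> {
        const results: SurveyResponse[] = [];
        for (const text of responses) {
            const result = await translator.translate(text);
            if (result.success && result.data) results.push(result.data);
        }
        return results;
    }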


yes, turning unstructured data into structured data is one of the most useful ways to use an LLM right now. It has been done before using schemas and could be done without all the extra cruft.

There was a similar example a few months back using XML instead, but I haven't heard much about it since, because again, the library did not add value on top of doing these things in a more open or scripted setting.

MSFT has another project in a similar vein, guardrails: interesting idea, but made worse by wrapping it in a library. Most of these LLM ideas are better as a function than a library; make them transform the i/o rather than have every library write its own wrappers around the LLM APIs as well

There are several more making use of OpenAPI / JSONSchema rather than TS.

We use a subset of CUE, essentially JSON without as many quotes or commas. The LLMs are quite flexible with few-shot learning. They can be made more reliable with fine-tuning. They can be made faster and cheaper with distillation.


Yes, as the abstractions get better it becomes easier to code useful things.


the debate is about whether the abstraction here is valuable enough to warrant a library, and the fact that it predefines the prompt and API call flow, so you cannot prompt engineer or use something like CoT/ToT


This amounts to saying ‘how dare someone publish some code that they wrote!’

Is it your impression that this is being pitched as some grand solution?

That this was published as a way to shut out other people from doing the same thing in other ways?

Can’t we just look at a cool thing someone did, and released for other people to play with, and say ‘huh! That’s neat!’ And get inspired?


except it's not neat or novel, this idea has been around and implemented for many months now, by many people, using many methods. Running a tool on the output and then feeding that back to the LLM, also not novel and a widely used technique

> We'd love to know if TypeChat is something that's useful and interests you!

We are providing feedback to them here


People can debate till the cows come home. But it's worth remembering that hacker news is about stimulating intellectual curiosity.

There's no reason for this to have a fixed flow, either - it's got a hint of diagonalizability to it - by which I mean, you can get the model to build a schema for dynamic flows, given a 'bootstrapping' schema. No different than what has always had to happen for someone to write a compiler for a programming language in the language itself.


Getting these models to reliably return a consistent structure without frequent human intervention and/or having to account for the personal moral opinions of big tech CEOs is not trivial, no.


There are multiple ways to get structured output, and what this library is doing is not really that interesting. The concept is interesting and has had multiple implementations already, the code (and abstraction) here is not interesting and creates more issues than it solves


Tell me how to get reliably structured output. I'm all ears.


I have a prompt from February, pre-ChatGPT, and now I just use the model's function support; it's built for exactly that



It’s essentially prompt engineering as a service with some basic quality-control features thrown in.

Sure, your engineers could implement it themselves, but don’t they have better things to do?


the quality of the prompt does not look that good, based on my experience getting flexible structured output from a schema

There are other questionable decisions and a valuable use of engineering time is indeed to evaluate candidate abstractions and think about the long-term cost of adopting them. In this case, it does not seem like it saves that much effort and in the long run means a lot of important LLM knobs are out of your control. Not a good tradeoff


You can probably define the python language grammar as a typescript type though!


Here's one thing I don't get.

Why all the rigamarole of hoping you get a valid response, adding last-mile validators to detect invalid responses, trying to beg the model to pretty please give me the syntax I'm asking for...

...when you can guarantee a valid JSON syntax by only sampling tokens that are valid? Instead of greedily picking the highest-scoring token every time, you select the highest-scoring token that conforms to the requested format.

This is what Guidance does already, also from Microsoft: https://github.com/microsoft/guidance

But OpenAI apparently does not expose the full scores of all tokens; it only exposes the highest-scoring token. Which is so odd, because if you run models locally, using Guidance is trivial, and you can guarantee your JSON is correct every time. It's faster to generate, too!
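
A toy version of that selection step, assuming you have local access to the per-step token scores:

    // Toy version of constrained sampling; assumes local access to per-step
    // token scores, which the hosted OpenAI API does not give you.
    type Scored = { token: string; logprob: number };

    function pickConstrainedToken(
        generatedSoFar: string,
        candidates: Scored[],
        isValidPrefix: (s: string) => boolean   // "could still become valid JSON"
    ): string {
        // Highest-scoring token that keeps the output on-format wins.
        const ranked = [...candidates].sort((a, b) => b.logprob - a.logprob);
        for (const c of ranked) {
            if (isValidPrefix(generatedSoFar + c.token)) return c.token;
        }
        throw new Error("no candidate token keeps the output valid");
    }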


It’s like the story of the brown M&Ms[0]. If the model is returning semantically correct data, you would hope that it can at least get the syntax correct. And if it can’t then you ought to throw the response away anyway.

Also I believe that such a method cannot capture the full complexity of TypeScript types.

[0] https://www.snopes.com/fact-check/brown-out/


That's a great analogy! I'd been wondering for a while whether that's a problem with this approach; to be honest I still don't know whether it is, so it would be good to see someone test it empirically.


> when you can guarantee a valid JSON syntax by only sampling tokens that are valid? Instead of greedily picking the highest-scoring token every time, you select the highest-scoring token that conforms to the requested format.

Yes, you can guarantee syntactically correct JSON that way, but will it be semantically correct? If the model really really really wanted to put another token there, but you are forcing it to put a {, maybe the following generated text won't be as good.

I'm not sure, I'm just wondering out loud.


Well, if the output doesn't conform to the format it's useless. If the model can't produce good and correct output then it's simply not up to the task.


In my experience, LLM responses include a fair share of outputs that are semantically useful but do not precisely adhere to the requested format. If I chose to use a strongly typed language for LLM parsing, perhaps I would be tempted to eliminate complexity and simply throw structural outliers away, and explain to the suits that a certain percentage of our queries/expenses are unusable. Instead, more sophisticated coercion techniques could be applied to increase output utilization.


That really strongly depends on your task. Lots of tasks can accept a non-zero failure rate in return for better results on the successful cases. I'm not sure I can think of any off the top of my head where you'd use a LLM and can never deal with a failure, particularly if you're using an external service where you're guaranteed to have to deal with errors or downtime at some point.


I agree that sampling only valid tokens is a very promising approach.

I experimented a bit with finetuning open source LLMs for JSON parsing (without guided token sampling). Depending on one's use case, 70B parameters might be overkill. I've seen promising results with much much smaller models. Finetuning a small model combined with guided token sampling would be interesting.

Then again, finetuning is perhaps not perfect for very general applications. When you get input that you didn't anticipate in your training dataset, you're in trouble.


The LLM will be able to handle more complex scenarios. I could imagine a use case: if you are ordering from a self-service vending machine, instead of having to go through the whole flow you just say your order out loud, for example "a couple of chocolate bars", and the LLM tries to match it against the inventory.

Of course, if you are on the web, it makes no sense. It is much easier to use the mouse to click on a couple of items.


Llama.cpp recently added grammar-based sampling, which constrains token selection to follow a rigid format like you describe.

https://github.com/ggerganov/llama.cpp/pull/1773


OpenAI doesn’t expose this information because it makes it vastly easier to train your model off theirs.


I swear I think of something and Anders Hejlsberg builds it.

Structured requests and responses are 100% the next evolution of LLMs. People are already getting tired of chatbots. Being able to plug in any backend without worrying about text parsing and prompts will be amazing.


> Structured requests and responses are 100% the next evolution of LLMs. People are already getting tired of chatbots. Being able to plug in any backend without worrying about text parsing and prompts will be amazing.

Yup, a general desire of mine is to locally run an LLM which has actionable interfaces that i provide. Things like "check time", "check calendar", "send message to user" and etc.

TypeChat seems to be in the right area. I can imagine an extra layer of "fit this JSON input to a possible action, if any" and etc.
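
For example, a made-up action schema in the TypeChat style might look like:

    // Made-up action schema in the TypeChat style: ask the model to pick one
    // variant of a discriminated union, with an "unknown" escape hatch.
    type AssistantAction =
        | { action: "checkTime" }
        | { action: "checkCalendar"; date: string }                      // ISO date
        | { action: "sendMessage"; recipient: string; body: string }
        | { action: "unknown"; originalText: string };                   // nothing matched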

I see a neat hybrid future where a bot (LLM/etc) works to glue layers of real code together. Sometimes part of ingestion, tagging, etc - sometimes part of responding to input, etc.

All around this is a super interesting area to me but frankly, everything is moving so fast i haven't concerned myself with diving too deep in it yet. Lots of smart people are working on it so i feel the need to let the dust settle a bit. But i think we're already there to have my "dream home interface" working.


I just published CopilotKit, which lets you implement this exact functionality for any web app via react hooks.

`useMakeCopilotActionable` = you pass the type of the input, and an arbitrary typescript function implementation.

https://github.com/RecursivelyAI/CopilotKit

Feedback welcome


I was thinking about this yesterday. ChatGPT really is good enough to act as a proper virtual assistant / home manager, with enough toggles exposed.


ChatGPT isn’t the limiting factor here, a good way to expose the toggles is. I recently tried to expose our company CRM to employees by means of a Teams bot they could ask for stuff in natural language (like „send an invite link to newlead@example.org“ or „how many MAUs did customer Foo have in June“), but while I almost got there, communicating an ever-growing set of actionable commands (with an arbitrary number of arguments) to the model was more complex than I thought.


Care to share what made it complex? My comment above was most likely ignorant, but my general thought was to write some header prompt about available actions that the LLM could map to, and then ask it if a given input text matches to a pre-defined action. Much like what TypeChat does.

Does this sound similar enough to what you were doing? Was there something difficult in this that you could explain?

Aside from being completely hand-wavey in my hypothetical guess-timated implementation, i had figured the most difficult part would be piping complex actions together. "Remind me tomorrow about any events i have on my calendar" would be a conditional action based on lookups, etc - so order of operations would also have to be parsed somehow. I suspect a looping "thinking" mechanism would be necessary, and while i know that's not a novel idea i am unsure if i would nonetheless have to reinvent it in my own tech for the way i wanted to deploy.


https://github.com/ShelbyJenkins/LLM-OpenAPI-minifier

I have a working solution to exposing the toggles.

I’m integrating it into the bot I have in the other repo.

Goal is you point it at an OpenAPI spec and then GPT can choose and run functions. Basically Siri but with access to any API.


Good shit!


Interacting with APIs is the old style. The magic of ChatGPT is the same magic as google had back in the day - you ask it in plain english and it has an answer.

I'm guessing the solution looks like a model trained to take actions on the internet. Kinda sucks for those of us on the outside, because whatever we make is going to be the same, brittle, chewing-gum and duct tape approach as usual. Best to wait for the bleeding edge, like what that MinecraftGPT project was aiming at.


How about unix's (and plan9's more extreme version of) "everything is a file" philosophy? The gift that won't stop giving..


(How) did you solve this?


Write me an email if you’re interested in details, probably have some code to share. Address is in my profile.


Tell me about it - I implemented this just yesterday except with a focus on functions rather than objects.


This as a dynamic mapper in a backend layer can be huge.

For example, try to keep up with (frequent) API payload changes around a consumer in Java. We implemented a NodeJS layer just to stay sane. (Banking, huge JSON payloads, backends in Java)

Mapping is really something LLMs could shine at.


It could shine, or it could be an absolute disaster.

Code/functionality archeology is already insanely hard in orgs with old codebases. Imagine the facepalming that Future You will have when you see that the way the system works is some sort of nondeterministic translation layer that magically connects two APIs where versions are allowed to fluctuate.


I think it's ironic that some people are saying the likes of Chat GPT will make software engineers obsolete when in reality there will be huge demand for the humans that will eventually be needed to clean up messes just like this.




This is my hot take: we're slowly entering the "tooling" phase of AI, where people realize there's no real value generation here, but people are so heavily invested in AI, that money is still being pumped into building stuff (and of course, it's one of the best way to guarantee your academic paper gets published). I mean, LangChain is kind of a joke and they raised $10M seed lol.

DeFi/crypto went through this phase 2 years ago. Mark my words, it's going to end up being this weird limbo for a few years where people will slowly realize that AI is a feature, not a product. And that its applicability is limited and that it won't save the world. It won't be able to self-drive cars due to all the edge cases, it won't be able to perform surgeries because it might kill people, etc.

I keep mentioning that even the most useful AI tools (Copilot, etc.) are marginally useful at best. At the very best it saves me a few clicks on Google, but the agents are not "intelligent" in the least. We went through a similar bubble a few years ago with chatbots[1]. These days, no one cares about them. "The metaverse" was much more short-lived, but the same herd mentality applies. "It's the next big thing" until it isn't.

[1] https://venturebeat.com/business/facebook-opens-its-messenge...


Hard disagree on AI being just a bubble with limited applicability.

> It won't be able to self-drive cars due to all the edge cases, it won't be able to perform surgeries because it might kill people, etc.

You literally just cherry-picked the most difficult applications of AI. The vast majority of people's jobs don't involve life or death, and thus are ripe for automation. And even if the life-or-death jobs retain a human element, they will most certainly be augmented by AI agents. For example a surgery might still be handled by a human, but it will probably become mandatory for a doctor or nurse to diagnose a patient in conjunction with an AI.

> We went through a similar bubble a few years ago with chatbots

Are you honestly comparing that to now? ChatGPT got to 100 million users in a few months and everyone and their grandma has used it. I wasn't even aware of any chatbot bubble a few years ago, it certainly wasn't that significant.

> even the most useful AI tools (Copilot, etc.) are marginally useful at best

Sure, but you're literally seeing them in their worst versions. ChatGPT has been a life-changer for me, and it doesn't even execute code yet (Code Interpreter does though, which I haven't tested yet)

By 2030 humans probably won't be typing code anymore, it'll just be prompting machines and directing AI agents. By then most people's jobs will also be automated.

AI isn't just some fad, it's going to change literally every industry, and way faster than people think. The cynicism here trying to dismiss the implications of AI by comparing it to the metaverse are just absurd and utterly lacking in imagination. Yes there is still a lot of work that needs to be done, specifically in the AI agent side of things, but we will get there, probably way faster than people realize, and the implications are enormous.


> By 2030 humans probably won't be typing code anymore, it'll just be prompting machines and directing AI agents. By then most peoples' jobs will also be automated.

Eventually, perhaps. But by 2023? Definitely not.

I think both you and the GP are at opposite ends of the extreme and the reality is somewhere in that gulf in between


When I use ChatGPT I feel like I'm looking at a different technology than other people. It's supposed to be able to answer every question and teach me anything, but in practice it turns out to be a content-farm-as-a-service (CFaaS?) Copilot is similar, it's usually easier for me to write the code than iterate through it to find the least bad example and then fix the bugs.

That said, AlphaGo went from "hallucinating" bad moves to the best player in the world in a fairly short period of time. If this is at all doable for language models, GPT-x may blow all this out of the water.


> That said, AlphaGo went from "hallucinating" bad moves to the best player in the world in a fairly short period of time. If this is at all doable for language models, GPT-x may blow all this out of the water.

I think the state space when looking at something like Go v. natural language (or even formal languages like programming languages or first/second order logic) is not even remotely comparable. The number of states in Go is 3^361. The number of possible sentences in English, while technically infinite, has some sensible estimates (Googling shows the relatively tame 10^570 figure).


> we're slowly entering the "tooling" phase of AI, where people realize there's no real value generation here

Hard disagree. A very clear counterexample from my usage:

Gpt-4 is phenomenal at helping a skilled person work on tangential tasks where their skills generally translate but they don’t have strong domain knowledge.

I’ve been writing code for a decade, and recently I’ve been learning some ML for the first time. I’m using gpt-4 everyday and it’s been a delight.

To be fair, I can see one might find the rough edges annoying on occasion. For me, it’s quite manageable and not much of a bother. I’ve gotten better at ignoring or working around them. There is definitely an art to using these tools.

I expect the value provided to continue growing. We haven’t plucked all of the low-hanging or mid-hanging fruit yet.

I can share chat transcripts if you are interested.


> DeFi/crypto went through this phase 2 years ago.

A key difference is that these things, no matter how impressive their technical merits, required people to completely reshape whatever they were doing to get the first bit of benefit.

Modern AI (and really, usually LLMs) has immediate and broad applicability across nearly every economic sector, and that's why so many of us are already building and releasing features with it. There's incredible value in this stuff. Completely world-changing? No. But enough to create new product categories and fundamentally improve large swaths of existing product capabilities? Absolutely.


I feel like this is actually a very sensible take. AI has many uses, and it can be really good at some things, but it's not the hail mary it's being treated as.


Your analysis is based on what's possible now. This is the worst it'll ever be.


How does no voice assistant (Apple, Google, Amazon, Microsoft) integrate LLMs into their service yet, and how has OpenAI not released their own voice assistant?

Also, like RSS, if there were some standard URL websites exposed for AI interaction, using this TypeChat to expose the interfaces, we'd be well on our way here.


OpenAI is pretty likely working on their own (see Karpathy's "Building a kind of JARVIS @ OpenAI"), and Microsoft of course is doing an integration or reinterpretation of Cortana with OpenAI's LLMs (since they are incapable of building their own models nowadays it seems - "Why do we have Microsoft Research at all?" -S.N.), but there's a lot less value in a voice-driven LLM than there is in actually being able to perform actions. Take Alexa for example: you need a system that can handle smart home control in a predictable, debuggable way, otherwise people would get annoyed. I definitely think you can do this, but the current system as built (and others like Siri and to a lesser extent Cortana) all have a bunch of hooks and APIs being used by years and years of rules and software built atop less powerful models. They need to both maintain the current quality and improve on it while swapping out major parts of their system in order to make this work, which takes time.

Not to mention that none of these assistants actually make any money, they all lose money really, and are only worthwhile to big companies with other ways to make cash or drive other parts of their business (phones, shopping, whatever), so there's less incentive for a startup to do it.

I worked on both Cortana and Alexa in the past, and thought a lot about trying to build a new version of them ground-up with the LLM advancements. While the tech was all straightforward and I even had some new ideas for use cases that are enabled now, I could not figure out a business model that would work (and hence, I'm working on something completely different now).


It's July, they just needed to put a voice interface on ChatGPT, it'd easily help them sell more pro licenses as well. I'm not a conspiracy person, but this just seems so obvious it feels like there's something else going on here.


The official ChatGPT app has had voice-recognition for a while now. Still not closing the obvious loop with text-to-speech, but probably they have bigger fish to fry. It might be that the projected extra subscription revenue would not make such a big difference in the rate at which they burn through capital.


No big company wants their appliance to accidentally talk customer's child into suicide or spouse into a divorce. Bad for image.


It's not like ChatGPT can't do that already..


> How does no voice assistant (Apple, Google, Amazon, Microsoft) integrate LLMs into their service yet

When I first learned what ChatGPT was my thought was "oh so like what Siri is supposed to be."


Talking to Alexa is laughable now, after having interacted with ChatGPT and Bing. It's so frustrating to see capable hardware being let down by crappy software for years upon years.


Microsoft is doing that to replace Cortana in Windows 11


I'm really looking forward to something that I can use to control Home Assistant. I'm just really nervous about using any cloud-based API for this, so I would like to get something running on a server in my own house. But I would also want the voice recognition and response times to be extremely fast so I don't feel like I'm ever waiting for anything. I've seen a few DIY attempts at a personal assistant but there's always a significant delay that would become very annoying if I used it regularly.


Seriously, it feels like there’s some collusion going on behind the scenes. This is the most obvious use case for the technology, but none of the big vendors have explored it.


It takes a while to develop a product, and the world only woke up to them mere months ago


I think it's because it turns out that taming a generative language model is really difficult. It's what we need to support more than some hardcoded simple questions, but companies like Google who are known for search want to keep their image of "use us to find what you're looking for". In the current state, their models (especially Bard in my experience) simply return bullshit and want to sound confident. They need to get beyond that stage.

But I feel you. My Google Assistant doesn't even seem to look for answers to questions anymore. All I get, even for simple queries, is a "sorry, I don't understand".


Willow, and the Willow Inference Server, have the option to use Vicuna with speech input and TTS


> It's unfortunately easy to get a response that includes { "name": "grande latte" }

    type Item = {
        name: string;
        ...
        size?: string;
I'm not really following how this would avoid `name: "grande latte"`?

But then the example response:

    "size": 16
> This is pretty great!

Is it? It's not even returning the type being asked for?

I'm guessing this is more of a typo in the example, because otherwise this seems cool.


Whoops - thanks for catching this. Earlier iterations of this blog post used a different schema where `size` had been accidentally specified as a `number`. While we changed the schema, we hadn't re-run the prompt. It should be fixed now!


Their example here is really weak overall IMO. Like more than just that typo. You also probably wouldn't want a "name" string field anyway. Like there's nothing stopping you from receiving

    {
        name: "the brown one",
        size: "the espresso cup",
    … }
Like that’s just as bad as parsing the original string. You probably want big string union types for each one of those representing whatever known values you want, so the LLM can try and match them.

But now why would you want that to be locked into the type syntax? You probably want something more like Zod where you can use some runtime data to build up those union types.

You also want restrictions on the types too, like quantity should be a positive, non-fractional integer. Of course you can just validate the JSON values afterwards, but now the user gets two kinds of errors. One from the LLM which is fluent and human sounding, and the other which is a weird technical “oops! You provided a value that is too large for quantity” error.

The type syntax seems like the wrong place to describe this stuff.
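
Roughly what I mean, with made-up menu values (Zod shown as the alternative):

    import { z } from "zod";

    // Made-up menu values. Literal unions constrain the model far more than a
    // bare string:
    type Size = "short" | "tall" | "grande" | "venti";
    type Item = {
        name: "latte" | "espresso" | "drip coffee";
        size?: Size;
        quantity: number;   // "positive integer" is not expressible in the type itself
    };

    // With Zod, the runtime constraints can live next to the shape:
    const ItemSchema = z.object({
        name: z.enum(["latte", "espresso", "drip coffee"]),
        size: z.enum(["short", "tall", "grande", "venti"]).optional(),
        quantity: z.number().int().positive(),
    });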


I feel like that's just a documentation bug. I'm guessing they changed from number of ounces to canonical size late in the drafting of the announcement and forgot to change the output value to match.

There would be no way for a system to map "grande" to 16 based on the code provided, and 16 does not seem to be used anywhere else.


The rest of the paragraph discusses "what happens when it ignores type?", so I think that's where they were going with that?


Looks like it just runs the LLM in a loop until it spits out something that type checks, prompting with the error message.

This is a cute idea and it looks like it should work, but I could see this getting expensive with larger models and input prompts. Probably not a fix for all scenarios.


At least with OpenAI, wouldn't it be better if under the hood it was using the new function call feature?


Typescript's type system is much more expressive than the one the function call feature makes available.

I imagine closing the loop (using the TS compiler to restrict token output weights) is in the works, though it's probably not totally trivial. You'd need:

* An incremental TS compiler that could report "valid" or "valid prefix" (ie, valid as long as the next token is not EOF)

* The ability to backtrack the model

Idk how hard either piece is.


For the TS compiler: If you took each generation step, closed any partial JSON objects (ie close any open `{`), checked that it was valid JSON and then validated it using a deep version of Partial<T>, that should do the trick.


Not for even the simplest schemas.

Eg, given even the type:

    {"aLongerKey": "value"}
The generation prefix:

    {"a
would by your algorithm produce the following invalid output:

    {"a}


That's why I mentioned you check the JSON validity first. You'd obviously need to continue letting it generate tokens until you can parse the JSON to check if the type is partial. You could of course close even the quotes but then you'd get "not valid" signals from TS when the AI is like "just let me finish!" :-)


But that isn’t valid JSON


Right, it would fail even before hitting the typing check.


I'm not familiar with how TypeChat works, but Guidance [1] is another similar project that can actually integrate into the token sampling to enforce formats.

[1]: https://github.com/microsoft/guidance


It’s logit bias. You don’t even need another library to do this. You can do it with three lines of python.

Here’s an example of one of my implementations of logit bias.

https://github.com/ShelbyJenkins/shelby-as-a-service/blob/74...


except that guidance is defunct and is not maintained anymore.


did they announce that anywhere? it does appear like progress has slowed down quite a lot.


I suspect most products are concerned about product-market fit first; then they can wrangle costs down.

There's also a good assumption that models will be improving structured output as the market is demanding it.


I built and released something really similar to this (but smaller scope) for Laravel PHP this week: https://github.com/adrenallen/ai-agents-laravel

My take on this is, it should be easy for an engineer to spin up a new "bot" with a given LLM. There's a lot of boring work around translating your functions into something ChatGPT understands, then dealing with the response and parsing it back again.

With systems like these you can just focus on writing the actual PHP code, adding a few clear comments, and then the bot can immediately use your code like a tool in whatever task you give it.

Another benefit to things like this, is that it makes it much easier for code to be shared. If someone writes a function, you could pull it into a new bot and immediately use it. It eliminates the layer of "converting this for the LLM to use and understand", which I think is pretty cool and makes building so much quicker!

None of this is perfect yet, but I think this is the direction everything will go so that we can start to leverage each others code better. Think about how we use package managers in coding today, I want a package manager for AI specific tooling. Just install the "get the weather" library, add it to my bot, and now it can get the weather.


Starred this as I've been working on a similar but maybe broader-scoped approach, but I think some of your ideas are really slick!


Hang on, so this is doing runtime validation of an object against a typescript type definition? Can this be shipped as a standalone library/feature? This would be absolutely game changing for validating api response payloads, etc. in typescript codebases.



yup, just found that, super neat, I am 100% interested in using this for other runtime validation...

It's interesting because I've always been under the impression the TS team was against the use of types at runtime (that's why projects like https://github.com/nonara/ts-patch exist), but now they're doing it themselves with this project...

I wonder what the performance overhead of starting up an instance of tsc in memory is? Is this suitable for low latency situations? Lots of testing to do...


Great point. They're against it unless you're running it in a loop and paying for every API call!


I'm very surprised that they're not using `guidance` [0] here.

It would not only allow them to ensure that required fields are completed (avoiding the need for validation [1]) but would probably also save them GPU time in the end.

There must be a reason and I'm dying to know what it is! :)

Side-note, I was in the process of building this very thing and good ol' Microsoft just swung in and ate my lunch.. :/

[0] https://github.com/microsoft/guidance

[1] https://github.com/microsoft/TypeChat/blob/main/src/typechat...


It's not super clear how this differs from another recently released library from Microsoft: Guidance (https://github.com/microsoft/guidance).

They both seem to aim to solve the problem of getting typed, valid responses back from LLMs


One of the key things that we've focused on with TypeChat is not just that it acts as a specification for retrieving structured data (i.e. JSON), but that the structure is actually valid - that it's well-typed based on your type definitions.

The thing to keep in mind with these different libraries is that they are not necessarily perfect substitutes for each other. They often serve different use-cases, or can be combined in various ways -- possibly using the techniques directly and independent of the libraries themselves.


    const schema = fs.readFileSync(path.join(__dirname, "sentimentSchema.ts"), "utf8");
    const translator = typechat.createJsonTranslator<SentimentResponse>(model, schema, "SentimentResponse"); 
It would have been much nicer if they took this as an opportunity to build generic runtime type introspection into TypeScript.


So, it's a thing that appends "please format your response as the following JSON" to the prompt, then validates the actual response against the schema, all in a "while (true)" loop (literally) until it succeeds. This unbelievable achievement is the work of seven people (the authors of the blog post).

Honestly, this is getting beyond embarrassing. How is this the world we live in?


It's because not everyone can be as gifted as you.

I think the (arguably very prototypical) implementation is not what's interesting here. It's the concept itself. Natural language may soon become the default interface for most of the computing people do on a day to day basis, and tools like these will make it easier to create new applications in this space.


I'm gonna love trying to figure out what query gets the support chatbot to pair me with an actual human so that I can solve something that's off script


Ideally you would just click the "talk to a human" button, but what do I know?


Yeah it’s basically a retry loop. I’m curious about the average response time and the worst case amount of iterations.

At best, all these "retry until successful" loops are just hacks to bridge the formal world with the stochastic one. It's just useless without some stats on how it performs.

And even if it conforms, you're not sure the data makes sense. Probably... but exactly that: probably.

I would not recommend using this in production.


Hm... so how do we know that the actual values in the produced json are correct???


As with anything output by “AI”: you don’t.


One of the authors is Anders Hejlsberg, the guy behind C# and Delphi


I think he's probably more of an author in the way that the leader of a research team is always credited on any paper by the team, even if he didn't personally do any actual work on it?

Anyway, TIL that Hejlsberg is also involved with TypeScript...


That’s what makes it even more embarrassing.


I agree with comments saying this is basically a 10-line "demo script" everyone could write and it is weird to have big names associated with it.

But I heard from MS friends that AI is an absolute "need to have". If you're not working on AI, you're not getting (as much) budget. I suspect this is more about ticking the box than producing some complex project. Unfortunately, throughout the company, folks are doing all kinds of weird things to tick the box like writing a "copilot" (with associated azure openai costs) fine-tuned on a handful of documentation articles :(


I've written a version of this in Golang (tied to OpenAI API, mostly): https://github.com/stillmatic/gollum/blob/main/dispatch.go

Define a struct and tag it with Go's json/jsonschema struct tags. Then, give it a prompt and ...

    type dinnerParty struct {
        Topic       string   `json:"topic" jsonschema:"required" jsonschema_description:"The topic of the conversation"`
        RandomWords []string `json:"random_words" jsonschema:"required" jsonschema_description:"Random words to prime the conversation"`
    }
    completer := openai.NewClient(os.Getenv("OPENAI_API_KEY"))
    d := gollum.NewOpenAIDispatcher[dinnerParty]("dinner_party", "Given a topic, return random words", completer, nil)
    output, _ := d.Prompt(context.Background(), "Talk to me about dinosaurs")
and you should get a response like

    expected := dinnerParty{
        Topic:       "dinosaurs",
        RandomWords: []string{"dinosaur", "fossil", "extinct"},
    }


It's not clear to me how they ensure the responses will be valid JSON. Are they just asking for it, then parsing the result with error checking?



seems like they run the generated response through the typescript type checker, and if it fails, retry using the error message as a further hint to the LLM, until it succeeds.


I would expect that; if it doesn't even do that, why bother… that is also trivial to do anyway.


also some very basic prompt engineering


This is funny, I have something pretty similar in my code, except it's using Zod for runtime typechecking, and I convert Zod schemas to json schemas and send that to gpt-3.5 as a function call. I would expect that using TypeScript's output is better for recovering from errors than with Zod's output, so I can definitely see the advantage of this.
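
Roughly the wiring (illustrative names, not my actual code; assumes the openai and zod-to-json-schema packages):

    // Illustrative sketch of that pattern (names and wiring are made up):
    // Zod schema -> JSON Schema -> OpenAI function call -> Zod parse of the result.
    import OpenAI from "openai";
    import { z } from "zod";
    import { zodToJsonSchema } from "zod-to-json-schema";

    const Sentiment = z.object({
        sentiment: z.enum(["negative", "neutral", "positive"]),
    });

    const openai = new OpenAI();

    async function classify(text: string) {
        const res = await openai.chat.completions.create({
            model: "gpt-3.5-turbo",
            messages: [{ role: "user", content: text }],
            functions: [{ name: "report_sentiment", parameters: zodToJsonSchema(Sentiment) }],
            function_call: { name: "report_sentiment" },
        });
        const args = res.choices[0].message.function_call?.arguments ?? "{}";
        return Sentiment.parse(JSON.parse(args));   // runtime typecheck with Zod
    }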


Relevant: Built this which generalizes to arbitrary regex patterns / context free grammars with 100% adherence and is model-agnostic — https://news.ycombinator.com/item?id=36750083


Just use function calling, declare your function schema using Zod, and convert it to JSONSchema automatically. You don't have to write your types more than once, you get proper validation with great error messages, and can extend it.


There already are techniques to guide LLMs into producing output that adheres to a schema. For e.g. forcing LLMs to stick to a Context-Free Grammar: https://matt-rickard.com/context-free-grammar-parsing-with-l...

Just like many similar methods, this is based on logit biasing, so it may have an impact on quality.


Anyone knows in what situations this approach is superior to jsonformer (https://github.com/1rgs/jsonformer) and vice versa?

Or are they solving different problems?

It seems jsonformer has some advantages such as only generating tokens for the values and not the structure of the JSON. But this project seems to have more of a closed feedback loop to prompt the model to do the right thing.


At least for llama.cpp users, this recently introduced PR -- https://github.com/ggerganov/llama.cpp/pull/1773 -- introducing grammar-based sampling could potentially improve structural reliability of LLaMA output. They provide an example JSON grammar as well.


Someone should just get this working on Llama 2 instead of O̶p̶e̶n̶AI.com [0]

All this is, is just talking to an AI model sitting on someone else's server.

[0] https://github.com/microsoft/TypeChat/blob/main/src/model.ts...


Hi there! I'm one of the people working on TypeChat and I just want to say that we definitely welcome experimentation on things like this. We've actually been experimenting with running Llama 2 ourselves. Like you said, to get a model working with TypeChat all you really need is to provide a completion function. So give it a shot!


The most recent gpt4all (https://github.com/nomic-ai/gpt4all) includes a local server compatible with the OpenAI API -- this could be a useful start!


"Using Zod to Build Structured ChatGPT Queries"[1] is a pattern I found useful. This doesn't seem too different.

[1] https://medium.com/@canadaduane/using-zod-to-build-structure...


This looks quite similar to how we're using OpenAI functions and zod (JSON Schema) to have OpenAI answer with JSON and interact with our custom functions to answer a prompt: https://wundergraph.com/blog/return_json_from_openai


Why are we trying to get structured output out of something that was specifically designed to produce natural-language output?


Because we can ;-)


This is a fantastic concept! It's going to be super useful to map users' intent to API / code in a super reliable way.


Reliance on strong typing for LLM output coercion is a potentially lossy and inefficient approach that can introduce redundant LLM queries and costs. LLM output is far more subtle than this. But the strongly typed hammer is very attractive to many developers, particularly those in the Typescript ecosystem.


This is rather trivial. The real challenge would be to make it choose what type to return. The function api does that, but then natural conversations sometimes involve calling multiple functions, and there isn't a good schema for that.


That's an interesting way to validate JSON. Basically they run the whole compiler (making it a runtime dependency). Hopefully this horrible implementation will nudge TypeScript developers in the direction of implementing RTTI.


I wish Copilot did something like this. I've found it'll regularly invent C# methods which don't exist, an error which seems trivial to catch and hide from the user. No output is better than bad output.


I'd love to see a robust study on the effectiveness of this and several other ways to coax a structured response out:

- Lots of examples / prompt engineering techniques

- MS Guidance

- TypeChat

- OpenAI functions (the model itself is tuned to do this, a key differentiator)

- ...others?


If I can use this instead of functions, it's gonna save me a buttload of API usage, because the Typescript interface syntax is so concise. Can't wait to try it.


I am not sure why this exists, maybe I am missing something, and it does not seem like there is much value past "hey check this out, this is possible"


Here's a relevant paper that folks may find interesting: <snip>Semantic Interpreter leverages an Analysis-Retrieval prompt construction method with LLMs for program synthesis, translating natural language user utterances to ODSL programs that can be transpiled to application APIs and then executed.</snip>

https://arxiv.org/abs/2306.03460


Why this instead of GPT Functions?


it's basically the same thing, but uses a more concise spec for writing the schema (typescript vs jsonschema)

In the end, both methods try to coax the model into returning a JSON object, one method can be used with any model, the other is tied to a specific, ever changing vendor API

Why would one choose to only support "OpenAI" and nothing else?


TL;DR: This is ChatGPT + TypeScript.

I'm totally happy to be able to receive structured queries, but I'm also not 100% sure TypeScript is the right tool; it seems to be overkill. I mean obviously you don't need the power of TS with all its enums, generics, etc.

Plus, given that it will run multiple queries in a loop, it might end up very expensive for it to abide by your custom-made complex type


this is going to create space for some hilarious and funky input attacks.


TL;DR: It's asking ChatGPT to format response according to a schema.



