Hey HN! We just open-sourced SQLCoder – our text-to-SQL LLM that outperforms OpenAI's gpt-3.5-turbo on out-of-training-set schemas, and matches gpt-4 when trained on a single business's schema.
SQLCoder is a fine-tuned variant of StarCoder, supplemented with a lot of hand-curated data and slightly novel fine-tuning techniques.
We are also open-sourcing our framework for evaluating whether LLM-generated SQL is correct. SQL is tricky to evaluate. Two very different SQL queries can both be "correct". For the question, "who are the 10 most recent users from Toronto", both of the following are correct in their own ways – so we had to build a new framework to evaluate query correctness.
Query 1:

```sql
SELECT userid, username
FROM users
WHERE city = 'Toronto'
ORDER BY created_at DESC
LIMIT 10;
```
Query 2:

```sql
SELECT userid, firstname || ' ' || lastname
FROM users
WHERE city = 'Toronto'
ORDER BY created_at DESC
LIMIT 10;
```
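To make the idea concrete (this is a toy sketch, not our actual framework – the schema, data, and `agree_on_key` helper below are all invented for illustration): one way to treat two superficially different queries as equivalent is to run both against a seeded test database and check that they return the same entities in the same order, judged by a designated key column, while ignoring presentation-only columns like the name:

```python
import sqlite3

# Hypothetical schema and data, just for illustration.
SETUP = """
CREATE TABLE users (
    userid INTEGER, username TEXT, firstname TEXT, lastname TEXT,
    city TEXT, created_at TEXT
);
INSERT INTO users VALUES
    (1, 'ava',  'Ava',  'Li',   'Toronto',  '2023-01-01'),
    (2, 'ben',  'Ben',  'Kim',  'Toronto',  '2023-02-01'),
    (3, 'cara', 'Cara', 'Diaz', 'Montreal', '2023-03-01');
"""

QUERY_1 = """SELECT userid, username FROM users
             WHERE city='Toronto' ORDER BY created_at DESC LIMIT 10;"""
QUERY_2 = """SELECT userid, firstname || ' ' || lastname FROM users
             WHERE city='Toronto' ORDER BY created_at DESC LIMIT 10;"""

def run(query: str) -> list:
    """Execute a query against a fresh in-memory test database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

def agree_on_key(rows_a: list, rows_b: list, key_col: int = 0) -> bool:
    """Treat two result sets as equivalent if they return the same
    entities, in the same order, judged by a key column (userid here)."""
    return [r[key_col] for r in rows_a] == [r[key_col] for r in rows_b]

print(agree_on_key(run(QUERY_1), run(QUERY_2)))  # True: both rank users 2, 1
```

The real framework has to handle much more than this (aliasing, column reordering, acceptable-answer sets), but the run-and-compare core is the same idea.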
The model is small enough to run on a single A100 40GB with weights in 16-bit floats, or on a single high-end consumer GPU (like an RTX 3090/4090) with 8-bit quantization. We will also release a ggml-based quantized version soon, and you should soon be able to run it on most M1 or M2 MacBooks with 32GB of RAM.
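A quick back-of-the-envelope check of those numbers (assuming the ~15B-parameter StarCoder base; the exact count may differ, and activations plus KV cache add overhead on top): weight memory is roughly parameter count times bytes per parameter, which is why 16-bit weights fit a 40GB A100 and 8-bit weights fit a 24GB consumer card:

```python
# Rough lower bound on weight memory: params * bytes per param.
PARAMS = 15e9  # assumed ~15B, the StarCoder base size

for bits in (16, 8):
    gb = PARAMS * (bits / 8) / 1e9
    print(f"{bits}-bit weights: ~{gb:.0f} GB")
# 16-bit -> ~30 GB (fits a 40GB A100)
#  8-bit -> ~15 GB (fits a 24GB RTX 3090/4090)
```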
The model weights have a CC BY-SA 4.0 license. You can use and modify the model for any purpose – including commercial use. However, if you modify the weights (for example, by fine-tuning), you must open-source your modified weights under the same license terms.
This is so sorely needed. I used the app after the PH launch and loved how easy the self-serve was!
Do you have plans to let users define "types" of data that can be redacted (like monetary terms in a contract, code embedded in documents etc)? Also, any plans on making this an API that other developers could build on top of?
Great questions and thanks for trying the product!!
Yup, a few thoughts here - we're exploring using embeddings so you can describe what you want to hide, and then immediately see which of your already-synced data (or previous requests) would be caught by that description.
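To make that concrete with a toy stand-in (a real system would use learned embeddings; here a bag-of-words vector and cosine similarity play that role, and the records, threshold, and `preview_matches` name are all made up for the sketch):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a learned embedding: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def preview_matches(description: str, synced_records: list, threshold: float = 0.3) -> list:
    """Show which already-synced records a redaction description would catch."""
    desc_vec = embed(description)
    return [r for r in synced_records if cosine(desc_vec, embed(r)) >= threshold]

records = [
    "contract payment amount 50000 dollars",
    "meeting notes about office party",
    "invoice total dollars due next month",
]
print(preview_matches("monetary amount in dollars", records))
```

The preview step matters because it lets users tune the description and threshold before anything is actually redacted.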
This looks fantastic. Will try replacing our current fine-tuned FLAN-UL2 model with this.
I wonder how the devtooling around this will evolve. Seems like a matter of days until someone creates a GUI wrapper around this and obviates the need to spend programmer time on fine-tuning.
I'm curious, what are the differences between T5, Flan-T5, and Flan-UL2 for fine-tuning? Does the instruction tuning matter at all, once you're fine-tuning?
Great tool! Some feedback on your landing page: the Start Practicing CTA button at the bottom of the page should ideally point to the link [1] in your "Start a new YC interview prep session to kick-start your prep" option, not just the login screen. I was confused about how to start a practice interview after I logged in.
Ah, yeah – this landing page is optimized for new accounts. For accounts created via the registration link on the page, we preload the YC prep project template.
But you’re right, we should be able to devise a way to use a single link for both cases, new and existing users. We’ll try to tweak the setup here to better accommodate existing users!
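One way the single-link idea could be sketched (the route paths and `resolve_start_link` helper below are hypothetical, not the app's actual implementation): a landing endpoint that checks the visitor's state and routes accordingly:

```python
def resolve_start_link(logged_in: bool, has_yc_template: bool) -> str:
    """Route one 'Start Practicing' link for both new and existing users.
    (Hypothetical paths - the real app's URLs will differ.)"""
    if not logged_in:
        # New users register; the YC prep template is preloaded on signup.
        return "/register?template=yc-prep"
    if not has_yc_template:
        # Existing users without the template get it added first.
        return "/projects/new?template=yc-prep"
    return "/projects/yc-prep"

print(resolve_start_link(logged_in=False, has_yc_template=False))
print(resolve_start_link(logged_in=True, has_yc_template=True))
```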
Farm fires are partially a result of legislation to prevent ground water levels from going too low [1]. Fairly multifaceted issue, not just electoral appeasement
I like Wikipedia's current events portal [1]. Does not (generally) include frivolous news, and provides a much wider perspective than most news outlets
Algorithmically generated content by itself might not be a bad thing for users' search experience IMO, as long as the engine can differentiate between "right" and "gibberish" content.
Linking to verified sources (research papers, official websites, verified social media accounts) when writing about these topics might make this easier. LLMs would then be able to tell whether the content misrepresents what was stated in the linked sources.
Did something similar as an experiment a few years ago, except I used photos and name strings as fuzzy identifiers across social media profiles.
We also scraped individual reactions from social media apps to get a _very_ detailed profile on what they engaged with (like using the "Angry" reaction emoji when Trump said something stupid vs using the "Angry" reaction emoji when AOC said something stupid).
Never released it in the wild for obvious ethical reasons, but it was an interesting technical challenge. Also led to super interesting insights – like learning that videos and text links were consumed by entirely different audiences on Facebook and Twitter [1]
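The fuzzy-name-matching half of that can be sketched with stdlib string similarity (all names below are invented, the 0.7 threshold is arbitrary, and a real pipeline would combine this with photo matching):

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_profiles(profiles_a: list, profiles_b: list, threshold: float = 0.7) -> list:
    """Pair up profiles across two platforms by best fuzzy name match."""
    links = []
    for a in profiles_a:
        best = max(profiles_b, key=lambda b: name_similarity(a, b))
        if name_similarity(a, best) >= threshold:
            links.append((a, best))
    return links

twitter = ["Jon A. Smith", "maria garcia", "D. Okafor"]
facebook = ["Jonathan Smith", "Maria Garcia", "Chen Wei"]
print(link_profiles(twitter, facebook))
```

`difflib` is crude compared to proper record linkage, but even this catches casing and abbreviation differences while leaving profiles with no plausible counterpart unlinked.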
Our evaluation framework is at https://defog.ai/blog/open-sourcing-sqleval/ and there's an interactive demo at https://defog.ai/sqlcoder-demo/
Would love for you to give it a spin, and let us know what you think!