You're right on ratios, but the effective ratio is actually much smaller than 6:1 since they're MoEs: the 20B has 3.6B active parameters, and the 120B has only 5.1B active, about 40% more!
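Spelling out the arithmetic (parameter counts in billions, taken from the comment above):

```python
# Total-parameter ratio vs. active-parameter ratio (counts in billions)
total_ratio = 120 / 20      # 6.0x total parameters
active_ratio = 5.1 / 3.6    # ~1.42x active parameters, i.e. ~40% more
```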
This is neat! I think there are really deep connections between semantically meaningful diffs (across modalities) and supervision of AI models. You might imagine a human-in-the-loop workflow where a human edits a particular generation and those edits are then used as supervision for future generations of the same kind. We did some related work on the coding use case here: https://www.tensorzero.com/blog/automatically-evaluating-ai-... but I'm interested in all the different approaches to the problem, especially in less structured domains.
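As a concrete sketch of that workflow (function and field names are my own illustration, not any actual TensorZero API): capture the diff between a model generation and the human-edited version, and store the pair as preference-style supervision:

```python
import difflib

# Hypothetical sketch: turn a human edit to a model generation into a
# preference-style training pair, keeping the diff as the "signal".
def edit_to_supervision(generation: str, human_edit: str) -> dict:
    diff = list(difflib.unified_diff(
        generation.splitlines(),
        human_edit.splitlines(),
        lineterm="",
    ))
    return {
        "rejected": generation,  # what the model produced
        "chosen": human_edit,    # what the human actually wanted
        "diff": diff,            # the semantically meaningful delta
    }

pair = edit_to_supervision(
    "def add(a, b):\n    return a - b",
    "def add(a, b):\n    return a + b",
)
```

The pairs could then feed a preference-tuning or evaluation pipeline downstream.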
Looks very buttoned up. My local project has some features tuned to my specific agent flows, however (built directly into my inference engine), so I can't really jump ship just yet.
I definitely see different prompts depending on what I'm doing in the app. As we mentioned, there are different prompts for asking questions, doing Cmd-K edits, working in the shell, etc. I'd also imagine they customize the prompt by model (unobserved here, but we can also customize per-model using TensorZero and A/B test).
Wireshark would work for seeing the requests from the desktop app to Cursor's servers (which make the actual LLM requests). But if you're interested in what the actual requests to the LLMs look like from Cursor's servers, you have to set something like this up. Plus, this lets us modify the requests and A/B test variations!
Sorry, can you explain this a bit more? Either you're putting something between your desktop and the server (in which case Wireshark would work) or you're putting something between Cursor's infrastructure and their LLM provider, in which case, how?
We're doing the latter! Cursor lets you configure the OpenAI base URL, so we were able to have Cursor call Ngrok -> Nginx (for auth) -> TensorZero -> LLMs. We explain it in detail in the blog post.
We're working on an OSS, industrial-grade version of this at TensorZero, but there's a long way to go. I think the easiest out-of-the-box solution today is probably OpenAI RFT, but that's a partial solution with substantial vendor lock-in.
This is very neat work! I'll be interested in how they make this sort of thing available to the public, but it's clear from some of the results they mention that search + LLM is one path to producing net-new knowledge with AI systems.
OP here: I saw the SpinLaunch video and got really excited that this was completely tractable in simplified form with basic physics. So I did the math expecting to see huge savings in fuel mass and... it was basically negligible. I'm curious whether anyone on HN (maybe even from SpinLaunch!) can explain where I went wrong. Otherwise, you might find it interesting to read. Thanks!
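For anyone who wants to sanity-check the math themselves, here's a minimal sketch using the Tsiolkovsky rocket equation. The numbers (roughly 9.5 km/s total delta-v to orbit, 3.5 km/s exhaust velocity, 2 km/s from the launcher) are my illustrative assumptions, not figures from the article:

```python
import math

def propellant_fraction(delta_v_ms, exhaust_velocity_ms):
    """Tsiolkovsky rocket equation: fraction of initial mass that
    must be propellant to achieve a given delta-v."""
    return 1.0 - math.exp(-delta_v_ms / exhaust_velocity_ms)

# Without a launcher assist (~9.5 km/s to orbit, ~3.5 km/s exhaust velocity):
baseline = propellant_fraction(9500, 3500)

# With ~2 km/s already banked by a spin launcher:
assisted = propellant_fraction(9500 - 2000, 3500)
```

Because the mass ratio is exponential in delta-v, the conclusion is sensitive to what you hold fixed (payload mass vs. total liftoff mass), which may be where disagreements over "how much fuel is saved" come from.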
Like I said above, we certainly hope so! It has been slow progress so far, but applying modern ML/control techniques to tokamaks is one of the truly exciting applications of the current generation of AI, in my opinion. (Biased, because this is literally what I do all day.)
Do you have a GitHub repo for the controller software? Will it use fiber optics to the sensors, with the AI controlling the current?
Have you considered adversarial training? One AI tries to destabilize the process, and the other trains not against a simulation but against the destabilizing input and a success metric?
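For what it's worth, the loop being described might look something like this toy sketch. All names, dynamics, and the success metric are made up for illustration; a real setup would use the actual plasma diagnostics and control inputs:

```python
import random

def destabilizer(state, strength=1.0):
    # Adversary injects a bounded random perturbation into the system
    return strength * random.uniform(-1, 1)

def controller(state, gain=0.8):
    # Toy proportional controller trying to drive the state back to 0
    return -gain * state

def episode(steps=100, strength=1.0):
    # Run one episode of controller vs. destabilizer and report the
    # success metric: the worst excursion from the setpoint (lower is better)
    state, worst = 0.0, 0.0
    for _ in range(steps):
        state += destabilizer(state, strength) + controller(state)
        worst = max(worst, abs(state))
    return worst
```

In a full adversarial setup, both sides would be trained: the destabilizer to maximize the excursion and the controller to minimize it.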