Hacker News | vrm's comments

A 6:1 parameter ratio is too small for speculative decoding to have that much of an effect. You'd really want to see 10:1 or even more before this starts to matter.

You're right on ratios, but the effective ratio is actually much worse than 6:1, since these are MoEs. The 20B has 3.6B active parameters, and the 120B has only 5.1B active, only about 40% more!
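To make those ratios concrete, a quick back-of-the-envelope check using the parameter counts cited above (everything here is just arithmetic on those figures):

```python
# Total vs. active parameter ratios for the two MoE models discussed above.
total_small, active_small = 20e9, 3.6e9   # 20B total, 3.6B active
total_large, active_large = 120e9, 5.1e9  # 120B total, 5.1B active

total_ratio = total_large / total_small
active_ratio = active_large / active_small

# Speculative decoding speedups track the *active* compute gap, not total size.
print(f"total-parameter ratio:  {total_ratio:.1f}:1")
print(f"active-parameter ratio: {active_ratio:.2f}:1 "
      f"(~{(active_ratio - 1) * 100:.0f}% more active compute)")
```

So by the metric that matters for per-token compute, the "big" model is only ~1.4x the "small" one.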

This is neat! I think there are really deep connections between semantically meaningful diffs (across modalities) and supervision of AI models. You might imagine a human-in-the-loop workflow where a human edits a particular generation, and those edits are then used as supervision for a future version of that artifact. We did some related work on the coding use case here: https://www.tensorzero.com/blog/automatically-evaluating-ai-... but I'm interested in all the different approaches to the problem, especially in less structured domains.


If you haven't already, check out our repo -- it's free, fully self-hosted, production-grade, and designed for precisely this application :)

https://github.com/TensorZero/tensorzero


Looks very buttoned up. However, my local project has some features tuned to my specific agent flows (built directly into my inference engine), so I can't really jump ship just yet.

Looking great so far though!


I definitely see different prompts based on what I'm doing in the app. As we mentioned, there are different prompts depending on whether you're asking questions, doing Cmd-K edits, working in the shell, etc. I'd also imagine they customize the prompt by model (unobserved here, but we can also customize per-model using TensorZero and A/B test).


Wireshark would work for seeing the requests from the desktop app to Cursor's servers (which make the actual LLM requests). But if you're interested in what the actual requests to the LLMs look like from Cursor's servers, you have to set up something like this. Plus, it lets us modify the requests and A/B test variations!


Sorry, can you explain this a bit more? Either you're putting something between your desktop and the server (in which case Wireshark would work), or you're putting something between Cursor's infrastructure and their LLM provider, in which case, how?


We're doing the latter! Cursor lets you configure the OpenAI base URL, so we were able to have Cursor call Ngrok -> Nginx (for auth) -> TensorZero -> LLMs. We explain it in detail in the blog post.
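The mechanism is simple: an OpenAI-compatible request is just an HTTP POST whose destination is derived from a configurable base URL, so redirecting the traffic only means changing that one setting. A minimal stdlib sketch of the idea (the gateway hostname and keys are placeholders, not the actual setup):

```python
# Sketch: an OpenAI-style chat request is a POST to {base_url}/chat/completions,
# so rerouting it through a gateway only requires overriding the base URL --
# which is exactly the setting Cursor exposes.
import json
import urllib.request

def build_chat_request(base_url, api_key, model, messages):
    """Build (but don't send) an OpenAI-style chat completion request."""
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# Same request shape, two different destinations:
direct = build_chat_request("https://api.openai.com/v1",
                            "OPENAI_KEY", "gpt-4o", [])
proxied = build_chat_request("https://example-gateway.ngrok.app/v1",  # placeholder
                             "GATEWAY_KEY", "gpt-4o", [])

print(direct.full_url)
print(proxied.full_url)
```

In the proxied case the gateway's auth layer checks the bearer token, and the gateway is free to rewrite the model, prompt, or parameters before forwarding upstream, which is what enables the A/B testing.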


Ah OK, I saw that, but I thought that was the desktop client hitting the endpoint, not the server. Thanks!


We're working on an OSS, industrial-grade version of this at TensorZero, but there's a long way to go. I think the easiest out-of-the-box solution today is probably OpenAI RFT, but that's a partial solution with substantial vendor lock-in.


This is very neat work! I'll be interested in how they make this sort of thing available to the public, but it's clear from some of the results they mention that search + LLM is one path to producing net-new knowledge from AI systems.


Would it be possible to fuzz an arbitrary JSON schema with this? I've been looking for such a lib for a while now.


If you can translate the schema into native Rust types, then yes!

https://github.com/oxidecomputer/typify may help for starters. Please create an issue if you need further help with integration!


OP here: I saw the SpinLaunch video and got really excited that this was completely tractable in simplified form with basic physics. So I did the math expecting to see huge savings in fuel mass and... it was basically negligible. I'm curious whether anyone on HN (maybe even from SpinLaunch!) can explain where I went wrong. Otherwise, you might find it an interesting read. Thanks!
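For anyone who wants to redo the estimate, here's a minimal sketch of the usual starting point, the Tsiolkovsky rocket equation. All the numbers below are illustrative placeholders, not the figures from the post, and how much of the launcher's release velocity actually survives drag and gravity losses is exactly the contested part of the estimate:

```python
# Tsiolkovsky rocket equation: m0/mf = exp(delta_v / v_e).
# Crediting a ground launcher means subtracting its (post-loss) boost
# from the delta-v budget and comparing the required mass ratios.
import math

def mass_ratio(delta_v_m_s, exhaust_velocity_m_s):
    """Initial-to-final mass ratio needed for a given delta-v."""
    return math.exp(delta_v_m_s / exhaust_velocity_m_s)

V_EXHAUST = 3500.0    # m/s, illustrative chemical-rocket exhaust velocity
DV_TO_ORBIT = 9400.0  # m/s, illustrative LEO budget including losses
BOOST = 2000.0        # m/s, illustrative net credit from a ground launcher

baseline = mass_ratio(DV_TO_ORBIT, V_EXHAUST)
boosted = mass_ratio(DV_TO_ORBIT - BOOST, V_EXHAUST)
print(f"mass ratio without boost: {baseline:.1f}")
print(f"mass ratio with boost:    {boosted:.1f}")
```

How significant the saving looks depends almost entirely on what value of BOOST you believe is achievable after atmospheric losses, which is where estimates like the post's diverge.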


Like I said above, we certainly hope so! Progress has been slow so far, but applying modern ML / control techniques to tokamaks is one of the truly exciting applications of the current generation of AI, in my opinion. Biased, because this is literally what I do all day.


Do you have a website or any papers on your work I could read?


I need to redo my website; I'm just getting into the more public part of my PhD. Papers I'd recommend from our collaboration:

control of normalized plasma pressure: https://papers.nips.cc/paper/2019/hash/7876acb66640bad41f1e1...

plasma profile transport modeling: https://iopscience.iop.org/article/10.1088/1741-4326/abe08d/...

hybrid dynamical modeling of gross plasma quantities: https://arxiv.org/abs/2006.12682

uncertainty quantification for plasma dynamics: https://arxiv.org/abs/2011.09588

It's still early days for this work and for us, but we're looking to push reinforcement learning, in both methods and engineering, to solve this problem.


Do you have a GitHub repo for the controller software? Will it use fiber optics to the sensors, with current control handled by the AI?

Have you considered adversarial training? One AI tries to destabilize the process, while the other trains not against a simulation but against the destabilizing input and a success metric.

