More

the_mitsuhiko · 2025-11-16T23:44:54 1763336694

> * Contrary to APIs, they can change their interface whenever they want and with little consequences.

I already made this argument before, but that's not entirely right. I understand that this is how everybody is doing it right now, but that in itself cause issues for more advanced harnesses. I have one that exposes MCP tools as function calls in code, and it encourages the agent to materialize composed MCP calls into scripts on the file system.

If the MCP server decides to change the tools, those scripts break. That is is also similar issue for stuff like Vercel is advocating for [1].

[1]: https://vercel.com/blog/generate-static-ai-sdk-tools-from-mc...

lsaferite · 2025-11-17T16:09:27 1763395767

Wouldn't the answer to this be to have the agent generate a new materialized workflow though? You already presumably have automated the agent's ability to create these workflows based off some prompting and a set of MCP Servers.

the_mitsuhiko · 2025-11-16T21:24:41 1763328281

I don't find that to help much at all, particularly because some tools really only make sense with a bunch of other tools and then your context is already polluted. It's surprisingly hard to do this right, unless you have a single tool MCP (eg: a code/eval based tool, or an inference based tool).

stavros · 2025-11-16T21:37:53 1763329073

Don't you have a post about writing Python instead of using MCP? I can't see how MCP is more efficient than giving the LLM a bunch of function signatures and allow it to call them, but maybe I'm not familiar enough with MCP.

the_mitsuhiko · 2025-11-16T21:40:47 1763329247

> Don't you have a post about writing Python instead of using MCP?

Yes, and that works really well. I also tried various attempts of letting agents to write code that exposes MCP tool calls via an in-language API. But it's just really, really hard to work with because MCP tools are generally not in the training set, but normal APIs are.

stavros · 2025-11-16T21:48:28 1763329708

Yeah, I've always thought that your proposal was much better. I don't know why one of the big companies hasn't released something that standardised on tool-calling via code, hm.

the_mitsuhiko · 2025-11-14T19:03:07 1763146987

> How does society resolve this kind of abuse of the democratic process? It is a dynamic that is repeated in many areas.

A lot of society wants this. A lot of parents are asking for this.

soulofmischief · 2025-11-14T19:05:57 1763147157

That doesn't mean anything, because they're not necessarily educated on the topic, and yet are making decisions that affect everyone.

When it's so cheap to enact mass propaganda, selective omission and manufactured intent, it becomes impossible to just say, "well, the people want it." Their decision making process is compromised by the same people pushing these policies through.

Democracy is indeed broken, and we have to take that seriously if we're going to fix it.

dns_snek · 2025-11-14T19:08:44 1763147324

Please quantify "a lot". What percentage of the population wants all private communication between adults to be monitored and censored by a government agency? Can we put it to a vote - right after publicly discussing (debunking) all of the false beliefs that its proponents have?

the_mitsuhiko · 2025-11-14T19:30:28 1763148628

The question that is at the core is “police can wire tap calls but they cannot wire tap chats. Should this change?” The details are not all that important to people.

dns_snek · 2025-11-14T20:54:36 1763153676

Legal interception requires a court order, Chat control is mass surveillance.

Trying to build support for mass surveillance by misrepresenting it as targeted tool with checks and balances is exactly the kind of bad faith discourse I'm talking about.

the_mitsuhiko · 2025-11-06T17:58:26 1762451906

And Europeans don't it because quite frankly, we're not really doing anything particularly impressive with AI sadly.

abecode · 2025-11-06T21:18:06 1762463886

At ECAI conference last week there was a panel discussion and someone had a great quote, "in Europe we are in the golden age of AI regulation, while the US and China are in the actual golden age of AI".

speedgoose · 2025-11-06T18:03:03 1762452183

To misquote the French president, "Who could have predicted?".

https://fr.wikipedia.org/wiki/Qui_aurait_pu_pr%C3%A9dire

embedding-shape · 2025-11-06T19:35:53 1762457753

He didn't coin that expression did he? I'm 99% sure I've heard people say that before 2022, but now you made me unsure.

Sharlin · 2025-11-06T19:50:57 1762458657

"Who could've predicted?" as a sarcastic response to someone's stupid actions leading to entirely predictable consequences is probably as old as sarcasm itself.

speedgoose · 2025-11-07T08:00:52 1762502452

People said it before, but he said it without sarcasm about things that many people could in fact predict.

seydor · 2025-11-06T19:58:42 1762459122

We could add cookie warnings to AI, everybody loves those

DrNosferatu · 2025-11-06T20:13:14 1762459994

Europe should act and make its own, literal, Moonshot:

https://ifiwaspolitical.substack.com/p/euroai-europes-path-t...

imtringued · 2025-11-07T09:10:39 1762506639

>Moonshot 1: GPT-4 Parity (2027) >Objective: 100B parameter model matching GPT-4 benchmarks, proving European technical viability

This feels like a joke... Parity with a 2024 model in 2027? The Chinese didn't wait, they just did it.

The timeline for #1 LLM is also so far into the future that it is entirely plausible that by 2031, nobody uses transformer based LLMs as we know them today anymore. For reference: The attention paper is only 8 years old. Some wild new architecture could come out in that time that makes catching up meaningless.

DrNosferatu · 2025-11-07T15:32:15 1762529535

Note the EU-Moonshot project is based on own silicon / compute sovereignty.

GPT4 parity on a own silicon trained indigenous model is just an early goal.

Indeed, the ultimate goal is EU LLM supremacy - which means under democratic control.

toephu2 · 2025-11-07T03:27:22 1762486042

Europe gave us cookie popups on every single website.

Gigachad · 2025-11-07T06:43:03 1762497783

Only ones with invasive spyware cookies. Essential site function cookies do not require a consent banner.

alpineman · 2025-11-06T20:36:33 1762461393

actually Mistral is pretty good and catching up as the other leading models stagnate - the coding and OCR is particularly good

utopiah · 2025-11-06T21:06:00 1762463160

> we're not really doing anything particularly impressive with AI sadly.

Well, that's true... but also nobody else is. Making something popular isn't particularly impressive.

saubeidl · 2025-11-06T20:28:30 1762460910

Honestly, do we need to? If the Chinese release SOTA open source models, why should we invest a ton just to have another one? We can just use theirs, that's the beauty of open source.

hex4def6 · 2025-11-06T22:03:23 1762466603

For the vast majority, they're not "open source" they're "open weights". They don't release the training data or training code / configs.

It's kind of like releasing a 3d scene rendered to a JPG vs actually providing someone with the assets.

You can still use it, and it's possible to fine-tune it, but it's not really the same. There's tremendous soft power in deciding LLM alignment and material emphasis. As these things become more incorporated into education, for instance, the ability to frame "we don't talk about ba sing se" issues are going to be tremendously powerful.

uvaursi · 2025-11-06T19:53:58 1762458838

[flagged]

jacquesm · 2025-11-06T20:30:43 1762461043

What a load of tripe.

saubeidl · 2025-11-06T20:22:30 1762460550

I'm tired of this ol' propaganda trope.

* We're leading the world in fusion research. https://www.pppl.gov/news/2025/wendelstein-7-x-sets-new-perf...

* Our satellites are giving us by far the best understanding of our universe, capturing one third of the visible sky in incredible detail - just check out this mission update video if you want your mind blown: https://www.youtube.com/watch?v=rXCBFlIpvfQ

* Not only that, the Copernicus mission is the world's leading source for open data geoobservation: https://dataspace.copernicus.eu/

* We've given the world mRNA vaccines to solve the Covid crisis and GLP-1 antagonists to solve the obesity crisis.

* CERN and is figuring out questions about the fundamental nature of the universe, with the LHC being by far the largest particle accelerator in the world, an engineering precision feat that couldn't have been accomplished anywhere else.

Pioneering, innovation and drive forward isn't just about the latest tech fad. It's about fundamental research on how our universe works. Everyone else is downstream of us.

CamperBob2 · 2025-11-07T02:14:55 1762481695

Don't worry, we in the US are hot on your heels in the own-goal game ( https://www.space.com/space-exploration/nasa-is-sinking-its-... ).

All you have to do is wait by the Trump River and wait for our body to come floating by.

uvaursi · 2025-11-07T14:07:39 1762524459

I’m confused. Who is this “We”? Do you realize how behind in many respects most of Europe is? How it’s been parceled up and destroyed by the EU? Science projects led by a few countries doesn’t cut it.

It’s not propaganda at all. The standards of living there are shit. But enjoy the particle collider, I guess?

saubeidl · 2025-11-07T14:44:29 1762526669

We is Europe. Like everywhere else, we are behind in some aspects and ahead in others.

> The standards of living there are shit.

Now you're just trolling. I've lived in both the US and in multiple EU countries. Let me tell you, the standard of living in the US does not hold a candle to the one in the EU.

the_mitsuhiko · 2025-11-06T07:45:15 1762415115

> idempotency (absurd does this via steps - but would be better handled with the APIs themselves being idempotent, then not using steps

That is very hard to do with agents which are just all probabilistic. However if you do have an API that is either idempotent / uses idempotency keys you can derive an idempotency key from the task: const idempotencyKey = `${ctx.taskID}:payment`;

That said: many APIs that support the idempotency-key header only support replays of an hour to 24 hours, so for long running workflows you need to capture the state output anyways.

eximius · 2025-11-06T17:51:16 1762451476

I was not thinking of the agent case specifically. But yes, you have to make the APIs idempotent, either with these step checkpoints or by wrapping the underlying API. It's not hard to make a postgres-transaction-based idempotency layer wrapper, then you can have a much longer idempotency TTL.

> so for long running workflows you need to capture the state output anyways.

That would be a _very_ long running workflow. Probably worth breaking up into different subtasks or, I guess as Absurd does it, step checkpoints.

the_mitsuhiko · 2025-11-06T07:43:34 1762415014

It's quite funny in a way for me because even back in the Cadence days I thought it was the hottest shit ever, but it was just too complex to run for a small company and cadence was not the first (SWF and others came before). It felt like unless you had really large workflows you would ignore these systems entirely. And now, due to the problems that agents pose, we're all in need of that.

I'm happy it's slowly moving towards mass appeal, but I hope we find some simple solutions like Absurd too.

the_mitsuhiko · 2025-11-05T23:21:21 1762384881

> I took a brief look at your docs. What would you say is the main difference of yours vs some of the other options? Just the simplicity of it being a single sql file and a sdk wrapper? Sorry if the docs answer this already - trying to take a quick look between work.

It's really just trying to be as simple as possible. I was motivated by trying to just do the most simple thing I could come up with after I did not really find the other solutions to be something I wanted to build on.

I'm sure they are great, but I want to leave the window open to having people self host what we are building / enable us to deploy a cellular architecture later and thus I want to stick to a manageable number of services until until I can no longer. Postgres is a known quantity in my stack and the only postgres only solution was DBOS which unfortunately did not look ready for prime time yet when I tried it. That said, I noticed that DBOS is making quite some progress so I'm somewhat confident that it will eventually get there.

jedberg · 2025-11-06T01:21:57 1762392117

Could you provide some more specifics as to why DBOS isn’t “ready for prime time”? Would love to know what you think is missing!

FWIW DBOS is already in production at multiple Fortune 500 companies.

the_mitsuhiko · 2025-11-06T07:40:02 1762414802

> Could you provide some more specifics as to why DBOS isn’t “ready for prime time”? Would love to know what you think is missing!

Some time in September I was on a call with Qian Li and Peter Kraft and gave them direct feedback. The initial reason this call happened was related to a complaint of mine [1] about excessive dependencies on the Python client which was partially remedied [2] but I felt it should be possible to offload complexity away from the clients even further. My main experiments were pretty frustrating when I tried it originally because everything in the client is global state (you attach to the DBOS object) which did not work for how I was setting up my app. I also ran into challenges with passing through async and I found the step based retry not to work for me.

(There are also some other oddities in the Python client in particular: why does the client need to know about Flask?)

In the end, I just felt it was a bit too early and I did not want to fight that part of the infrastructure too much. I'm sure it will get there, and I'm sure it has happy users. I was just not one of them.

[1]: https://x.com/mitsuhiko/status/1958504241032511786

[2]: https://x.com/qianl_cs/status/1971242911888281608

biasafe_belm · 2025-11-06T02:33:46 1762396426

I'd love to hear both of your thoughts! I'm considering durable execution and DBOS in particular and was pretty happy to see Armin's shot at this.

I'm building/architecting a system which will have to manage many business-critical operations on various schedules. Some will be daily, some bi-weekly, some quarterly, etc. Mostly batch operations and ETL, but they can't fail. I have already designed a semblance of persistent workflow in that any data ingestion and transformation is split into atomic operations whose results are persisted to blob storage and indexed in a database for cataloguing. This means that, for example, network requests can be "replayed", and data transformation can be resumed at any intermediate step. But this is enforced at the design stage, not runtime like other solutions.

My system also needs to be easily auditable and written in Python. There are many, many ways to build this (especially if you include cloud offerings) but, like Armin, I'm trying to find the simplest architecture possible so our very small team can focus on building and not maintaining.

jedberg · 2025-11-06T16:46:58 1762447618

As the CEO of DBOS I'm of course heavily biased, but I think DBOS is the perfect solution for you. It has everything you need (queues, crons, easy to understand pure Python). We'd be happy to help you through it if you get stuck. You can pop into the DBOS discord too (link on our webpage).

mfrye0 · 2025-11-07T17:52:53 1762537973

Thanks for this. That makes sense.

the_mitsuhiko · 2025-11-04T15:31:15 1762270275

That's a completely unrelated issue. Once someone sent you a link to a tweet, you could read it.

as1mov · 2025-11-04T15:42:34 1762270954

Is it unrelated? From the parent comment:

> Come on, pre-Elon you could click on a Twitter link and read the entire thread as well as the replies, now you just get a single tweet with no context above/below.

I don't want to nitpick stupid shit like this mate. But my point was to emphasise that Twitter had been going downhill before the takeover.

(And fact that it was always a toxic cesspool regardless of who owned it, but that's a different matter altogether)

the_mitsuhiko · 2025-11-03T21:38:23 1762205903

Publishing frontend source maps is not uncommon and quite often done intentionally.

the_mitsuhiko · 2025-11-03T16:35:34 1762187734

> short answer--maybe not that _hard_, but it adds a lot of complexity to manage when you're trying to offer real-time search. most vector DB solutions offer this ootb. This post is meant to just point out the tradeoffs with pgvector (that most posts seem to skip over)

Question is if that tradeoff is more or less complexity than maintaining a whole separate vector store.