crystal_revenge's comments | Hacker News

Decoder-only LLMs are Markov chains with sophisticated models of the state space. Anyone familiar with Hamiltonian Monte Carlo will know that for good results you need a warm-up period so that you're sampling from the typical set, which is the region where most of the probability mass concentrates (not necessarily the region of highest probability density/maximum likelihood).

I have spent a lot of time experimenting with Chain of Thought professionally and I have yet to see any evidence to suggest that what's happening with CoT is any more (or less) than this. If you let the model run a bit longer it enters a region close to the typical set and when it's ready to answer you have a high probability of getting a good answer.

There's absolutely no "reasoning" going on here, except that sometimes sampling from the typical set near the region of your answer is going to look very similar to how humans reason before coming up with an answer.


I don't understand the analogy.

If I'm using an MCMC algorithm to sample a probability distribution, I need to wait for my Markov chain to converge to a stationary distribution before sampling, sure.

But in no way is 'a good answer' a stationary state in the LLM Markov chain. If I continue running next-token prediction, I'm not going to start looping.


I think you're confusing the sampling process and the convergence of those samples with the warmup process (also called 'burn-in') in HMC. When doing HMC MCMC we typically don't start sampling right away (or, more precisely we throw out those samples) because we may be initializing the sampler in a part of the distribution that involves pretty low probability density. After the chain has run a while it tends to end up sampling from the typical set which, especially in high-dimensional distributions, tends to more correctly represent the distribution we actually want to integrate over.

So for language, when I say "Bob has three apples, Jane gives him four, and Judy takes two; how many apples does Bob have?" we're actually pretty far from the part of the linguistic manifold where the correct answer is likely to be. As the chain wanders this space it gets closer, until it finally statistically follows the path "the answer is..." and when it's sampling from this path it's in a much more likely neighborhood of the correct answer. That is, after wandering a bit, more and more of the possible paths are closer to where the actual answer lies than they would be if we had just forced the model to choose early.

edit: Michael Betancourt has a great introduction to HMC which covers warm-up and the typical set: https://arxiv.org/pdf/1701.02434 (he has a ton more content that dives much more deeply into the specifics)
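
edit 2: for anyone who wants to see the warm-up idea concretely, here's a toy sketch (plain random-walk Metropolis rather than HMC, with a made-up 1-D Gaussian target); the point is only that early draws reflect the arbitrary starting point rather than the typical set:

  import numpy as np
  def log_target(x):
      return -0.5 * x ** 2  # unnormalized log density of N(0, 1); target is hypothetical
  rng = np.random.default_rng(0)
  x = 50.0                  # deliberately bad initialization, far from the typical set
  samples = []
  for _ in range(10_000):
      proposal = x + rng.normal(scale=1.0)
      if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
          x = proposal      # accept the move
      samples.append(x)
  burn_in = 1_000
  kept = samples[burn_in:]  # discard warm-up draws before estimating anything
  print(np.mean(samples[:burn_in]), np.mean(kept))  # early mean is biased toward the start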


The warmup process is necessary in order to try to find high-probability regions of the target distribution. That's not an issue for an LLM, since it's trained to sample directly from a distribution which looks like natural language.

There is some work on using MCMC to sample from higher-probability regions of an LLM distribution [1], but that's a separate thing. Nobody doubts that an LLM is sampling from its target distribution from the first token it outputs.

[1] https://arxiv.org/abs/2510.14901


> When doing HMC MCMC we typically don't start sampling right away (or, more precisely we throw out those samples) because we may be initializing the sampler in a part of the distribution that involves pretty low probability density.

And how does that apply to LLMs? They don't do MCMC.


No, I still don’t understand the analogy.

All of this burn-in stuff is designed to get your Markov chain to forget where it started.

But I don’t want to get from “how many apples does Bob have?” to a state where Bob and the apples are forgotten. I want to remember that state, and I probably want to stay close to it — not far away in the “typical set” of all language.

Are you implicitly conditioning the probability distribution or otherwise somehow cutting the manifold down? Then the analogy would be plausible to me, but I don’t understand what conditioning we’re doing and how the LLM respects that.

Or are you claiming that we want to travel to the “closest” high probability region somehow? So we’re not really doing burn-in but something a little more delicate?


You need to think about 1) the latent state and 2) the fact that part of the model is post-trained to bias the MC towards abiding by the query in the sense of the reward.

A way to look at it is that you effectively have 2 model "heads" inside the LLM, one which generates, one which biases/steers.

The MCMC is initialised based on your prompt, the generator part samples from the language distribution it has learned, while the sharpening/filtering part biases towards stuff that would be likely to have this MCMC give high rewards in the end. So the model regurgitates all the context that is deemed possibly relevant based on traces from the training data (including "tool use", which then injects additional context) and all those tokens shift the latent state into something that is more and more typical of your query.

Importantly, attention acts as a selector and has multiple heads, and these specialize, so (simplified) one head can maintain focus on your query and "judge" the latent state, while the rest can follow that Markov chain until some subset of the generated and tool-injected tokens gives enough signal to the "answer now" gate that the model flips into "summarizing" mode, which then uses the latent state of all of those tokens to actually generate the answer.

So you very much can think of it as sampling repeatedly from an MCMC with a bias and a learned stopping rule, and then having a model create the best possible combination of the traces, except that all this machinery is encoded in the same model weights, which get to reuse features between one another, for all the benefits and drawbacks that yields.

There was a paper close to when OF became a thing that showed that instead of doing CoT, you could just spend that token budget on K parallel shorter queries (by injecting something like "ok, to summarize" and "actually" to force completion) and pick the best one/majority vote. Since then RLHF has made longer traces more in-distribution (although there's another paper that showed that, as of early 2025, you were trading reduced variance and peak performance, as well as loss of edge cases, for higher performance on common cases, although this might be ameliorated by now), but that's about the way it broke down in 2024-2025.


In RNNs and Transformers we obtain the probability distribution of the target variable directly and sample from it using methods like top-k or temperature sampling.

I don't see the equivalence to MCMC. It's not like we have a complex probability function that we are trying to sample from using a chain.

It's just logistic regression at each step.
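
To make the per-step mechanics concrete, here's a minimal sketch of temperature plus top-k sampling over a made-up logits vector (the vocabulary and numbers are hypothetical):

  import numpy as np
  rng = np.random.default_rng(0)
  logits = np.array([2.0, 1.5, 0.3, -1.0, -2.0, -5.0])  # scores for a 6-token vocabulary
  def sample(logits, temperature=0.8, k=3):
      top = np.argsort(logits)[-k:]         # keep the k highest-scoring tokens
      scaled = logits[top] / temperature    # temperature rescales the logits
      probs = np.exp(scaled - scaled.max())
      probs /= probs.sum()                  # softmax over the surviving tokens
      return int(rng.choice(top, p=probs))
  print(sample(logits))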


Right, you're describing sampling a single token, which is equivalent to sampling from one step in the Markov chain. When generating output you're repeating this process and updating your state sequentially, which is the definition of a Markov chain, since at each step the next token is conditionally independent of the past given the embedding (which represents our current state).

Every response from an LLM is essentially the sampling of a Markov chain.
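
A sketch of that view, where the "state" is just the token sequence so far and next_token_distribution stands in for a real forward pass (everything here is hypothetical):

  import random
  def next_token_distribution(state):
      # stand-in for an LLM forward pass, conditioned only on `state`
      return {"the": 0.5, "apples": 0.3, "<eos>": 0.2}
  def generate(prompt, max_steps=50):
      state = list(prompt)
      for _ in range(max_steps):
          dist = next_token_distribution(tuple(state))
          token = random.choices(list(dist), weights=list(dist.values()))[0]
          if token == "<eos>":
              break
          state.append(token)  # the new state depends only on the previous state
      return state
  print(generate(["Bob", "has", "three", "apples"]))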


I wish it got called "scaffolding" instead

When you adopt the probability distribution point of view, this is often called 'burn-in'. See e.g. the usage in https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_al...

That sounds a lot like bad marketing. "Chain of thought" is better; it makes you think the thing is thinking!

How does MC warm-up fit with LLMs? With LLMs you start with a prompt, so I don't see how "warm up" applies.

You're not just sampling from them like some MC cases.

> If you let the model run a bit longer it enters a region close to the typical set and when it's ready to answer you have a high probability of getting a good answer.

What does "let the model run a bit longer" even mean in this context?


I've generally found an inverse correlation between "understands AI" and "exuberance for AI".

I'm the only person at my current company who has had experience at multiple AI companies (the rest have never worked on it in a production environment; one of our projects is literally something I got paid to deliver to customers at another startup), has written professionally about the topic, and has worked directly with some big names in the space. Unsurprisingly, I have nothing to do with any of our AI efforts.

One of the members of our leadership team, who I don't believe understands matrix multiplication, genuinely believes he's about to transcend human identity by merging with AI. He's publicly discussed how hard it is to maintain friendship with normal humans who can't keep up.

Now I absolutely think AI is useful, but these people don't want AI to be useful they want it to be something that anyone who understands it knows it can't be.

It's getting to the point where I genuinely feel I'm witnessing some sort of mass hysteria event. I keep getting introduced to people who have almost no understanding of the fundamentals of how LLMs work and who have the most radically fantastic ideas about what they are capable of, on a level unlike anything I have ever experienced in my fairly long technical career.


Personally, I don't understand how LLMs work. I know some ML math and certainly could learn, and probably will, soon.

But my opinions about what LLMs can do are based on... what LLMs can do. What I can see them doing. With my eyes.

The right answer to the question "What can LLMs do?" is... looking... at what LLMs can do.


I'm sure you're already familiar with the ELIZA effect [0], but you should be a bit skeptical of what you are seeing with your eyes, especially when it comes to language. Humans have an incredible weakness to be tricked by language.

You should be doubly skeptical ever since RLHF became standard, as the model has literally been optimized to give you the answers you find most pleasing.

The best way to measure, of course, is with evaluations, and I have done professional LLM model evaluation work for about 2 years. I've seen (and written) tons of evals and they both impress me and inform my skepticism about the limitations of LLMs. I've also seen countless times where people are convinced "with their eyes" that they've found a prompt trick that improves the results, only to be shown that this doesn't pan out when run on a full eval suite.
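
The shape of such an eval is simple, even if building a good suite isn't; a minimal sketch, where ask_model is a hypothetical stand-in for whatever model call you actually use:

  def ask_model(prompt: str) -> str:
      return "The answer is 5."  # stand-in: swap in a real model call here
  def evaluate(template: str, suite: list[tuple[str, str]]) -> float:
      correct = 0
      for question, expected in suite:
          answer = ask_model(template.format(question=question))
          correct += int(expected.lower() in answer.lower())
      return correct / len(suite)
  # Compare a plain prompt against a "trick" prompt across the whole suite,
  # not on a handful of eyeballed examples.
  suite = [("Bob has 3 apples, gains 4, loses 2. How many?", "5")]
  for template in ("Q: {question}\nA:", "Think step by step.\nQ: {question}\nA:"):
      print(template.splitlines()[0], evaluate(template, suite))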

As an aside: what's fascinating is that our visual system seems much more skeptical; an eyeball that's slightly off in a diffusion model's output will immediately set off alarms, whereas enough clever wordplay from an LLM will make us drop our guard.

0. https://en.wikipedia.org/wiki/ELIZA_effect


We get around this a bit when using it to write code since we have unit tests and can verify that it's making correct changes and adhering to an architecture. It has truly become much more capable in the last year. This technology is so flexible that it can be used in ways no eval will ever touch and still perform well. You can't just rely on what the labs say about it, you have to USE it.

Interesting observation about the visual system. Truth be told, we get visual feedback about the world at a much higher data rate, AND the visual signal is usually much more highly correlated with reality, whereas language is a virtual byproduct of cognition and communication.

No one understands how LLMs work. But some people manage to delude themselves into thinking that they do.

One key thing that people prefer not to think about is that LLMs aren't created by humans. They are created by an inhuman optimization algorithm that humans have learned to invoke and feed with data and computation.

Humans have a say in what it does and how, but "a say" is about the extent of it. The rest is a black box - incomprehensible products of a poorly understood mathematical process. The kind of thing you have to research just to get some small glimpses of how it does what it does.

Expecting those humans to understand how LLMs work is a bit like expecting a woman to know how humans work because she made a human once.


Bro- do you even matrix multiply?

Spot on in my experience.

I work in a space where I get to build and optimise AI tools for my own and my team's use pretty much daily. As such I focus mainly on AI'ing the crap out of boring & time-consuming stuff that doesn't interest any of us any more, and luckily enough there's a whole lot of low hanging fruit in that space where AI is a genuine time, cost and sanity saver.

However any activity that requires directed conscious thought and decision making where the end state isn't clearly definable up front tends to be really difficult for AI. So much of that work relies on a level of intuition and knowledge that is very hard to explain to a layman - let alone eidetic idiots like most AIs.

One example is trying to get AI to identify security IT incidents in real time and take proactive action. Skilled practitioners can fairly easily use AI to detect anomalous events in near real time, but getting AI to take the next step to work out which combinations of "anomalous" activities equate to "likely security incident" is much harder. A reasonably competent human can usually do that relatively quickly, but often can't explain how they do it.

Working out what action is appropriate once the "likely security incident" has been identified is another task that a reasonably competent human can do, but where AIs are hopeless. In most cases, a competent human is WAAAY better at identifying a reasonable way forward based on insufficient knowledge. In those cases, a good decision made quickly is preferable to a perfect decision made slowly, and humans understand this fairly intuitively.
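
To illustrate the gap (a toy sketch only; the signal names, scores, and thresholds are all made up), the anomaly detection side reduces to scores, while the "is this an incident?" judgement is the part that resists being written down:

  # Hypothetical anomaly scores produced upstream by whatever detector you use.
  anomalies = {
      "impossible_travel_login": 0.9,
      "new_admin_account": 0.7,
      "bulk_data_egress": 0.2,
  }
  def likely_incident(scores: dict[str, float]) -> bool:
      # A human analyst weighs correlated signals intuitively; a hand-written
      # rule like this is brittle, which is exactly the problem described above.
      return (scores.get("impossible_travel_login", 0.0) > 0.8
              and scores.get("new_admin_account", 0.0) > 0.5)
  print(likely_incident(anomalies))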


> I've generally found an inverse correlation between "understands AI" and "exuberance for AI".

A few years ago I had this exact observation regarding self-driving cars. Non- and semi-engineers who worked in the tech industry were very bullish about self-driving cars, believing every ETA spewed by Musk, while engineers were cautiously optimistic or pessimistic depending on their understanding of AI, LiDAR, etc.


This completely explains why so many engineers are skeptical of AI while so many managers embrace it: The engineers are the ones who understand it.

(BTW, if you're an engineer who thinks you don't understand AI or are not qualified to work on it, think again. It's just linear algebra, and linear algebra is not that hard. Once you spend a day studying it, you'll think "Is that all there is to it?" The only difficult part of AI is learning PyTorch, since all the AI papers are written in terms of Python nowadays instead of -- you know -- math.)
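
To make the "just linear algebra" point concrete, a minimal PyTorch sketch showing that a Linear layer is a matrix multiply plus a bias:

  import torch
  layer = torch.nn.Linear(4, 3)
  x = torch.randn(2, 4)
  by_module = layer(x)
  by_hand = x @ layer.weight.T + layer.bias  # the same matrix multiply, spelled out
  print(torch.allclose(by_module, by_hand))  # True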

I've been building neural net systems since the late 1980s. And yes they work and they do useful things when you have modern amounts of compute available, but they are not the second coming of $DEITY.


Linear algebra cannot be learned in a day. Maybe multiplying matrices when the dimensions allow, but there is far more to linear algebra than knowing how to multiply matrices. Knowing when and why is far more interesting. Knowing how to decompose them. Knowing what a non-singular matrix is and why it's special, and so on. Once you know what's found in a basic lower-division linear algebra class, you can move on to linear programming and learn about cost functions and optimization, or numerical analysis. PyTorch is just a calculator. If I handed someone a TI-84 they wouldn't magically know how to bust out statistics on it…

> This completely explains why so many engineers are skeptical of AI while so many managers embrace it: The engineers are the ones who understand it.

Curiously some Feynman chap reported that several NASA engineers put the chance of the Challenger going kablooie—an untechnical term for rapid unscheduled deconstruction, which the Challenger had then just recently exhibited—at 1 in 200, or so, while the manager said, after some prevarications—"weaseled" is Feynman's term—that the chance was 1 in 100,000 with 100% confidence.


I mostly disagree with this. Lots of things correlate weakly with other things, often in confusing and overlapping ways. For instance, expertise can also correlate with resistance to change. Ego can correlate with protection of the status quo and dismissal of people who don't have the "right" credentials. Love of craft can correlate with distaste for automation of said craft (regardless of the effectiveness of the automation). Threat to personal financial stability can correlate with resistance (regardless of technical merit). Potential for personal profit can correlate with support (regardless of technical merit). Understanding neural nets can correlate both with exuberance and skepticism in slightly different populations.

Correlations are interesting but when examined only individually they are not nearly as meaningful as they might seem. Which one you latch onto as "the truth" probably says more about what tribe you value or want to be part of than anything fundamental about technology or society or people in general.


I think there is a correlation between knowing something's internals and knowing what to expect from it, but it's not like someone who knows the internals is much, much better off than someone who doesn't.

Example: many people created websites without a clue of how they really work, and got millions of people onto them. Or had crazy ideas of things to do with them.

At the same time there are devs that know how internals work but can’t get 1 user.

PC manufacturers were never able to even imagine what random people would be able to do with their PCs.

This is to say that even if you know the internals you can claim you know better, but that doesn't mean it's absolute.

Sometimes knowing the fundamentals is a limitation. It will limit your imagination.


I'm a big fan of the concept of 初心 (Japanese: shoshin, aka "beginner's mind" [0]) and largely agree with Suzuki's famous quote:

> “In the beginner’s mind there are many possibilities, but in the expert’s there are few”

Experts do tend to be limited in what they see as possible. But I don't think that allows carte blanche belief that a fancy Markov Chain will let you transcend humanity. I would argue one of the key concepts of "beginners mind" is not radical assurance in what's possible but unbounded curiosity and willingness to explore with an open mind. Right now we see this in the Stable Diffusion community: there are tons of people who also don't understand matrix multiplication that are doing incredible work through pure experimentation. There's a huge gap between "I wonder what will happen if I just mix these models together" and "we're just a few years from surrendering our will to AI". None of the people I'm concerned about have what I would consider an "open mind" about the topic of AI. They are sure of what they know and to disagree is to invite complete rejection. Hardly a principle of beginners mind.

Additionally:

> pc manufacturers never were able to even imagine what random people were able to do with their pc.

Belies a deep ignorance of the history of personal computing. Honestly, I don't think modern computing has ever returned to the ambition of what was being dreamt up, by experts, at Xerox PARC. The demos on the Xerox Alto in the early 1970s are still ambitious in some senses. And, as much as I'm not a huge fan, Gates and Jobs absolutely had grand visions for what the PC would be.

0. https://en.wikipedia.org/wiki/Shoshin


I think this is what is blunted by mass education and most textbooks. We need to discover it again if we want to enjoy our profession with all the signals flowing from social media about all the great things other people are achieving. Staying stupid and hungry really helps.

I think this is more of a mechanistic-understanding vs. fundamental-insight kind of situation. The linear algebra picture is currently very mechanistic since it only tells us what the computations are. There are research groups trying to go beyond that, but the insights from these efforts are currently very limited. However, the probabilistic view is much clearer. You can have many explorable insights, both potentially true and false, just by understanding the loss functions, what the model is sampling from, what the marginal or conditional distributions are, and so on. Generative AI models are beautiful at that level. It is truly mind blowing that in 2025 we are able to sample from megapixel image distributions conditioned on NLP text prompts.

If that were true then people could have predicted this AI many years ago.

If you dig into old ML/vision papers, you will see that formulation-wise they actually did, but they lacked the data, the compute, and the mechanistic machinery provided by the transformer architecture. The wheels of progress are slow and require many rotations to finally reach somewhere.

It's definitely interesting to look at people's mental models around AI.

I don't know shit about the math that makes it work, but my mental model is basically - "A LLM is an additional tool in my toolbox which performs summarization, classification and text transformation tasks for me imperfectly, but overall pretty well."

Probably lots of flaws in that model but I just try to think like an engineer who's attempting to get a job done and staying up to date on his tooling.

But as you say there are people who have been fooled by the "AI" angle of all this, and they think they're witnessing the birth of a machine god or something. The example that really makes me throw up my hands is r/MyBoyfriendIsAI where you have women agreeing to marry the LLM and other nonsense that is unfathomable to the mentally well.

There's always been a subset of humans who believe unimaginably stupid things, like that there's a guy in the sky who throws lightning bolts when he's angry, or whatever. The interesting (as in frightening) trend in modernity is that instead of these moron cults forming around natural phenomena we're increasingly forming them around things that are human made. Sometimes we form them around the state and human leaders, increasingly we're forming them around technologies, in line with Arthur C. Clarke's third law - that "Any sufficiently advanced technology is indistinguishable from magic."

If I sound harsh it's because I am, we don't want these moron cults to win, the outcome would be terrible, some high tech version of the Dark Ages. Yet at this moment we have business and political leaders and countless run-of-the-mill tech world grifters who are leaning into the moron cult version of AI rather than encouraging people to just see it as another tool in the box.


In all seriousness, this comment really makes me want to try out Zig!

You want a language that releases a compiler on a specific platform then intentionally breaks it for everyone on something trivial just to troll and irritate them?

I like a language that aggressively discourages writing code in Notepad on Windows.

Every text editor on windows adds a carriage return by default.

You haven't given any actual reasons this makes sense, if you don't like windows why would you be using it in the first place? Why would you care what text editor people use?

Why would it be ok to release something on a platform just to annoy your own users?


Last I checked even Apple migrated to LF. Perhaps it's time for Windows to stop being the odd man out? Regardless:

  not work with every windows text editor
Last I checked both Visual Studio Code and Notepad++ make line endings configurable. That covers a plurality of use cases. Even the built-in Notepad has supported CR-only or LF-only line endings for going on eight years now.

Perhaps it's time for Windows to stop being the odd man out?

This is the same nonsense rationalization that Zig gave. Windows is the odd man out. If you want to release something on windows you match an extra byte on the ends of lines. It isn't that hard, and even the simplest toy language does it. It's just part of line splitting; it isn't even something that happens at the language stage.
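
For what it's worth, tolerating the extra byte really is a one-liner; a sketch in Python for illustration:

  def split_lines(source: str) -> list[str]:
      # accept both LF and CRLF by stripping an optional trailing "\r"
      return [line.rstrip("\r") for line in source.split("\n")]
  print(split_lines("const x = 1;\r\nconst y = 2;\n"))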

Last I checked both Visual Studio Code and Notepad++ make line endings configurable.

Last time I checked it was totally unnecessary, because no other language releases for a platform and then tries to punish its users. Options like that exist to make files match while being worked on across different platforms, not so that a compiler doesn't try to punish and troll its users for using it.


> This is the same nonsense rationalizations that zig gave.

I'm guessing you didn't live through the early days of webdev when you had to jump through ridiculous hoops just to support IE. At least back then there was the excuse that IE had the lion's share of the market and many corporate users.

The industry wide acceptance of supporting IE majorly held back what websites/apps were capable of being. Around 2012ish (right as I was leaving webdev) more and more major teams started to stop supporting earlier broken versions of IE (this was largely empowered by the rising popularity of Chrome). This had a major impact on improving the state of web applications, and also got MS to seriously improve their web browser. Moves like this one by the Zig team are the only way you're going to push Microsoft to improve the situation.

Now you may claim "but Windows is 70% of users!" but this issue doesn't impact anyone wanting to run Zig applications, only those writing them. If you're an inexperienced dev that's super curious about Zig, this type of error might be another good nudge that maybe Windows isn't the OS you want to be working on.


Now you may claim "but Windows is 70% of users!" but this issue doesn't impact anyone wanting to run Zig applications, only those writing them.

No one is confused about how a compiler works. Those people being intentionally trolled are called your users when you make a language.

If you're an inexperienced dev that's super curious about Zig, this type of error might be another good nudge that maybe Windows isn't the OS you want to be working on.

Then why did they make a Windows version? Any normal person just sees that they shouldn't invest time in a language that intentionally annoys its own users for trying it out.

You still haven't come up with any explanation; your whole tangent about Internet Explorer has no relevance. There isn't one part of your comment that makes sense. Why would you even care about other people's OS and text editors? What kind of fanaticism would lead to wanting to use a language because it intentionally annoys users of something you aren't even involved in?

The whole thing is basically a case of "this thing doesn't stand on any merits; I've just decided that I don't like certain people, and they did something to upset those people, even though they are really just shooting themselves in the foot".


  If you want to release something on windows you match an extra byte on the ends of lines
Did I miss some sort of formal directive from Microsoft or is this just outrage that someone dared do something not up to your standards?

  try to punish and troll its users for using it
Nobody's being punished. Configuring your dev environment is something people do for every language. Let's add some perspective here: we're talking about a single runtime option for your text editor of choice. BFD. More to the point, why isn't your editor or IDE properly supporting Zig files?

Did I miss some sort of formal directive from Microsoft or is this just outrage that someone dared do something not up to your standards?

It's just the way it works; it isn't my standard, it's literally how any piece of software that detects line breaks behaves.

Nobody's being punished. Configuring your dev environment is something people do for every language.

No one has to configure around this issue because it is trivially solved and dealt with by every piece of software on the planet. It takes longer to write an error message than it does to just split a line correctly.

Let's add some perspective here: we're talking about a single runtime option for your text editor of choice.

Let's add some perspective here: they intentionally broke their own software to upset 72% of their potential users.

More to the point, why isn't your editor or IDE properly supporting Zig files?

No one has to care about zig, it's a niche language that doesn't care about its users, it's irrelevant except for hacker news threads.

If some language started demanding you save all your text files with carriage returns or it will error out, what would you think?

You sound like a lawyer grasping at straws instead of someone with a reasonable perspective that wouldn't be hypocritical when flipped around.


  You sound like a lawyer grasping at straws instead of 
  someone with a reasonable perspective that wouldn't be
  hypocritical when flipped around.
What lawyer speak? You're throwing a temper tantrum over a situation entirely of your own making. That there's a Windows port of Zig and sufficient users to justify its continued existence pretty clearly shows your hyperbole isn't representative in the way you claim.

Were I in a situation where I needed to work with something not expecting LF line termination I'd either configure my dev environment appropriately or find tools that do what I want.

  No one has to care about zig, it's a niche language that doesn't
  care about its users, it's irrelevant except for hacker news threads.
So when it's your tool selection nobody has to care? But when someone else makes a decision you disagree with it's the end of the world? Gotcha. Don't check that checkbox. Stay mad, bro.

it's the end of the world?

You didn't confront anything I wrote and instead just made up something no one said. All I did say was that Zig is intentionally hostile to its own users, which it is.

If you could actually deal with what I wrote I think you would have done it already.


From where I'm sitting it seems like it's time for you to take a break from this thread.

I guess we're at the "claim the other person is upset to avoid what they said" (and edit posts) part of the conversation.

I can think of nothing more peak HN than criticizing a company worth $282 billion with $6 billion in profit (for startup kids, that means they have infinite runway and then some), that has existed for over 100 years, with "I'm not even sure what they do these days". I mean, the problem could be with IBM... what a loser company!

:) As much as I love ragging on ridiculous HN comments, I think this one is rooted in some sensibility.

IBM doesn’t majorly market themselves to consumers. The overwhelming majority of devs just aren’t part of the demographic IBM intends to capture.

It’s no surprise people don’t know what they do. To be honest it does surprise me they’re such a strongly successful company, as little as I’ve knowingly encountered them over my career.


I think you're hallucinating this scenario. There is no contradiction with a company making money and someone not understanding what they do.

I love R, but how can you make that claim when R uses three distinct object-oriented systems all at the same time? R might seem stable only because it carries along with it 50 years of programming language history (part of its charm: where else can you see the generic-function approach to OOP in a language that's still evolving?)

Finally, as someone who wrote a lot of R pre-tidyverse, I've seen the entire ecosystem radically change over my career.


Pandas is generally awful unless you're just living in a notebook (and even then it's probably my least favorite implementation of the 'data frame' concept).

Since Pandas lacks Polars' concept of an Expression, it's actually quite challenging to programmatically interact with non-trivial Pandas queries. In Polars the query logic can be entirely independent of the data frame while still referencing specific columns of the data frame. This makes Polars data frames work much more naturally with typical programming abstractions.
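
A minimal sketch of that composability (the column names here are made up):

  import polars as pl
  # Query logic defined independently of any particular DataFrame...
  revenue = (pl.col("price") * pl.col("qty")).alias("revenue")
  high_value = pl.col("revenue") > 100
  # ...then applied to whichever frame you like.
  df = pl.DataFrame({"price": [10.0, 55.0], "qty": [2, 3]})
  print(df.with_columns(revenue).filter(high_value))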

Pandas' multi-index is a bad idea in nearly all contexts other than its original use case: financial time series (and I'll admit, if you're working with purely financial time series, then Pandas feels much better). Sufficiently large Pandas code bases are littered with seemingly arbitrary uses of 'reset_index', there are many times where multi-index will create bugs, and, most importantly, I've never seen any non-financial scenario where anyone has used multi-index to their advantage.
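
The reset_index dance looks something like this (toy data, but the pattern will be familiar):

  import pandas as pd
  df = pd.DataFrame({
      "region": ["eu", "eu", "us"],
      "product": ["a", "b", "a"],
      "sales": [1, 2, 3],
  })
  grouped = df.groupby(["region", "product"]).sum()  # quietly produces a MultiIndex
  flat = grouped.reset_index()                       # back to ordinary columns
  print(flat)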

Finally Pandas is slow, which is honestly the least priority for me personally, but using Polars is so refreshing.

What other data frames have you used? Having used R's native data frames extensively (the way they make use of indexing is so much nicer) in addition to Polars, I find both drastically preferable to Pandas. My experience is that most people use Pandas because it has been the only data frame implementation in Python. But personally I'd rather just not use data frames at all if I'm forced to use Pandas. Could you expand on what you like about Pandas over the other data frame models you've worked with?


I initially considered using Pandas to work with community collections of Elite: Dangerous game data, specifically those published first by EDDB (RIP) and now by Spansh. However, I quickly hit the maximum process memory limits because my naïve attempts at manipulating even the smallest of those collections resulted in Pandas loading GB-scale JSON data files into RAM. I'm intrigued by Polars stated support for data streaming. More professionally, I support the work of bioinformaticians, statisticians, and data scientists, so I like to stay informed.

I like how in Pandas (and in R), I can quickly load data sets up in a manner that lets me do relational queries using familiar syntax. For my Elite: Dangerous project, because I couldn't get Pandas to work for me (which the reader should chalk up to my ignorance and not any deficiency of Pandas itself), I ended up using the SQLAlchemy ORM with Marshmallow to load the data into SQLite or PostgreSQL. Looking back at the work, I probably ought to have thrown it into a JSON-aware data warehouse somehow, which I think is how the guy behind Spansh does it, but I'm not a big data guy (yet) and have a lot to learn about what's possible.


You can assert whatever you want, but Polars is a great answer. The performance improvements are secondary to me compared to the dramatic improvement in interface.

Today all serious DS work will ultimately become data engineering work anyway. The time when a DS could just fiddle around in notebooks all day has passed.


Pandas is widely adopted and deeply integrated into the Python ecosystem. Meanwhile, Polars remains a small niche, and it's one of those hype technologies that will likely be dead in 3 years once most of its users realise that it offers them no actual practical advantages over Pandas.

If you are dealing with huge data sets, you are probably using Spark or something like Dask already where jobs can run in the cloud. If you need speed and efficiency on your local machine, you use NumPy outright. And if you really, really need speed, you rewrite it in C/C++.

Polars is trying to solve an issue that just doesn't exist for the vast majority of users.


Arguably Spark solves a problem that does not exist anymore: single-node performance with tools like DuckDB and Polars is so good that there's no need for more complex orchestration anymore, and these tools are sufficiently user-friendly that there is little point in switching to Pandas for smaller datasets.
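
As a sketch of that single-node workflow (file name and columns are hypothetical), a lazy scan plus a query plan means the whole file never has to be loaded up front:

  import polars as pl
  result = (
      pl.scan_parquet("events.parquet")
        .filter(pl.col("status") == "error")
        .group_by("service")
        .agg(pl.len().alias("errors"))
        .collect()
  )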

> Pandas is widely adopted and deeply integrated into the Python ecosystem.

This is pretty laughable. Yes, there are very DS-specific tools that make good use of Pandas, but `to_pandas` in Polars trivially solves this. The fact that Pandas always feels like injecting some weird DSL into existing Python code bases is one of the major reasons why I really don't like it.
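
The interop boundary is about as small as it gets (to_pandas does require pyarrow to be installed):

  import polars as pl
  df = pl.DataFrame({"x": [1, 2, 3]})
  pdf = df.to_pandas()        # hand off to pandas-only libraries
  back = pl.from_pandas(pdf)  # and back again when you're done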

> If you are dealing with huge data sets, you are probably using Spark or something like Dask already where jobs can run in the cloud. If you need speed and efficiency on your local machine, you use NumPy outright. And if you really, really need speed, you rewrite it in C/C++.

Have you used Polars at all? Or, for that matter, written significant Pandas outside of a notebook? The number one benefit of Polars, imho, is that Polars works using Expressions that allow you to trivially compose and reuse fundamental logic when working with data, in a way that works well with other Python code. This solves the biggest problem with Pandas, which is that it does not abstract well.

Not to mention that Pandas is a really poor data frame experience outside of its original use case, which was financial time series. The entire multi-index experience is awful, and I know that either you are calling 'reset_index' multiple times in your Pandas logic or you have bugs.


> once most of its users realise that it offers them no actual practical advantages over Pandas

What? Speed and better nested data support (arrays/JSON) alone are extremely useful to every data scientist.

My productivity skyrocketed after switching from Pandas to Polars.


>Today DS work will ultimately become data engineering work anyway.

Oh yeah? Well in my ivory tower the work stops being serious once it becomes engineering, how do you like that elitism?!


"Data Science" has never been related to academic research, it has always emerged in a business context. I wouldn't say that researchers at Deep Mind are "data scientists", they are academic researchers who focus on shipping papers. If you're in a pure research environment, nobody cares if you write everything in Matlab.

But the last startup I was at tried to take a similar approach to research, was unable to ship a functioning product, and will likely disappear a year from now. FAIR has been largely disbanded in favor of the way more shipping-centric MSL, and the people I know at Deep Mind are increasingly finding themselves under pressure to actually produce things.

Since you've been hanging out in an ivory tower, you might be unaware that during the peak DS frenzy (2016-2019) there were companies where data scientists were allowed to live entirely in notebooks and it was someone else's problem to ship their notebooks. Today, if you have that expectation you won't last long at most companies, if you can even find a job in the first place.

On top of that, I know quite a few people on the major LLM teams and, based on my conversations, all of them are doing pretty serious data engineering work to get things shipped, even if they were hired for their modeling expertise. It's honestly hard to even run serious experiments at the scale of modern-day LLMs without being pretty proficient at data-engineering-related tasks.


I suspect you’re misjudging the friend here. This sounds more like the famous “no brown m&ms” clause in the Van Halen performance contract. As ridiculous as the request is, it being followed provides strong evidence that the rest (and more meaningful) of the requests are.

Sounds like the friend understands quite well how LLMs actually work and has found a clever way to be signaled when it’s starting to go off the rails.


It's also a common tactic for filtering inbound email.

Mention that people may optionally include some word like 'orange' in the subject line to tell you they've come via some place like your blog or whatever it may be, and have read at least carefully enough to notice this.

Of course ironically that trick's probably trivially broken now because of use of LLMs in spam. But the point stands, it's an old trick.


Apart from the fact that not even every human would read this and add it to the subject, this would still work.

I doubt there is any spam machine out there that quickly tries to find people's personal blogs before sending them viagra mail.

If you are being targeted personally, then of course all bets are off, but that would've been the case with or without the subject-line trick.


It's not so much a case of personal targeting or anything particularly deliberate.

LLMs are trained on the full internet. All relevant information gets compressed in the weights.

If your email and this instruction are linked on your site, that goes in there, and the LLM may with some probability decide it's appropriate to use it at inference time.

That's why 'tricks' like this may get broken to some degree by LLM spam, and trivially when they do, with no special effort on the spammer's part. It's all baked into the model.

What previously would have involved a degree of targeting that wouldn't scale now will not.


Could try asking for a seahorse emoji in addition…

> I suspect you’re misjudging the friend here. This sounds more like the famous “no brown m&ms” clause in the Van Halen performance contract. As ridiculous as the request is, it being followed provides strong evidence that the rest (and more meaningful) of the requests are.

I'd argue it's more like you've bought so much into the idea that this is reasonable that you're also willing to go to extreme lengths to retcon and pretend like this is sane.

Imagine two different worlds, one where the tools that engineers use, have a clear, and reasonable way to detect and determine if the generative subsystem is still on the rails provided by the controller.

And another world where the interface is completely devoid of any sort of basic introspection interface, and because it's a problematic mess, all the way down, everyone invents some asinine way that they believe provides some sort of signal as to whether or not the random noise generator has gone off the rails.

> Sounds like the friend understands quite well how LLMs actually work and has found a clever way to be signaled when it’s starting to go off the rails.

My point is that while it's a cute hack, if you step back and compare it objectively to what good engineering would look like, it's wild that so many people are all just willing to accept this interface as "functional" because it means they don't have to do the thinking required to emit the output the AI is able to, via the specific randomness function used.

Imagine these two worlds actually do exist; and instead of using the real interface that provides a clear bool answer to "has the generative system gone off the rails", they *want* to be called Mr Tinkleberry.

Which world do you think this example lives in? You could convince me Mr Tinkleberry is a cute example of the latter, obviously... but it'd take effort to convince me that this reality is half reasonable, or that it's reasonable that people who want to call themselves engineers should feel proud to be a part of this one.

Before you try to strawman my argument, this isn't a gatekeeping argument. It's only a critical take on the interface options we have to understand something that might as well be magic, because that serves the snakeoil sales much better.

> > Is the magic token machine working?

> Fuck I have no idea dude, ask it to call you a funny name, if it forgets the funny name it's probably broken, and you need to reset it

Yes, I enjoy working with these people and living in this world.


It is kind of wild that not that long ago the general sentiment in software engineering (at least as observed on boards like this one) seemed to be about valuing systems that were understandable, introspectable, with tight feedback loops, within which we could compose layers of abstractions in meaningful and predictable ways (see for example the hugely popular - at the time - works of Chris Granger, Bret Victor, etc).

And now we've made a complete 180 and people are getting excited about proprietary black boxes and "vibe engineering" where you have to pretend like the computer is some amnesic schizophrenic being that you have to coerce into maybe doing your work for you, but you're never really sure whether it's working or not because who wants to read 8000 line code diffs every time you ask them to change something. And never mind if your feedback loops are multiple minutes long because you're waiting on some agent to execute some complex network+GPU bound workflow.


You don’t think people are trying very hard to understand LLMs? We recognize the value of interpretability. It is just not an easy task.

It’s not the first time in human history that our ability to create things has exceeded our capacity to understand.


> You don’t think people are trying very hard to understand LLMs? We recognize the value of interpretability. It is just not an easy task.

I think you're arguing against a position tangential to both mine and that of the person this directly replies to. It can be hard to use and understand something, but a magic box that you can't tell is working is another matter. It doesn't belong anywhere near the systems that other humans use. The people that use the code you're about to commit to whatever repo you're generating code for all deserve better than to be part of your unethical science experiment.

> It’s not the first time in human history that our ability to create things has exceeded our capacity to understand.

I don't agree this is a correct interpretation of the current state of generative transformer based AI. But even if you wanted to try to convince me; my point would still be, this belongs in a research lab, not anywhere near prod. And that wouldn't be a controversial idea in the industry.


We used the steam engine for 100 years before we had a firm understanding of why it worked. We still don’t understand how ice skating works. We don’t have a physical understanding of semi-fluid flow in grain silos, but we’ve been using them since prehistory.

I could go on and on. The world around you is full of not well understood technology, as well as non deterministic processes. We know how to engineer around that.


> We used the steam engine for 100 years before we had a firm understanding of why it worked. We still don’t understand how ice skating works. We don’t have a physical understanding of semi-fluid flow in grain silos, but we’ve been using them since prehistory.

I don't think you and I are using the same definition for "firm understanding" or "how it works".

> I could go on and on. The world around you is full of not well understood technology, as well as non deterministic processes. We know how to engineer around that.

Again, you're sidestepping my argument so you can restate things that are technically correct but not really a point in and of themselves. I see people who want to call themselves software engineers throw code they clearly don't understand against the wall because the AI said so. There's a significant delta between knowing you can heat water to turn it into a gas with increased pressure that you can use to mechanically turn a wheel, vs. "put wet liquid in jar, light fire, get magic spinny thing. If jar doesn't call you a funny name first, that's bad!"


> I don't think you and I are using the same definition for "firm understanding" or "how it works".

I'm standing on firm ground here. Debate me on the details if you like.

You are constructing a strawman.


> It doesn't belong anywhere near the systems that other humans use

Really for those of us who actually work in critical systems (emergency services in my case) - of course we're not going to start patching the core applications with vibe code.

But yeah, that frankenstein reporting script that half a dozen amateur hackers made a mess of over 20 years instead of refactoring and redesigning? That's prime fodder for this stuff. NOBODY wants to clean that stuff up by hand.


> Really for those of us who actually work in critical systems (emergency services in my case) - of course we're not going to start patching the core applications with vibe code.

I used to believe that no one would seriously consider this too... but I don't believe that this is a safe assumption anymore. You might be the exception, but there are many more people who don't consider the implications of turning over said intellectual control.

> But yeah, that frankenstein reporting script that half a dozen amateur hackers made a mess of over 20 years instead of refactoring and redesigning? That's prime fodder for this stuff. NOBODY wants to clean that stuff up by hand.

It's horrible, no one currently understands it, so let the AI do it, so that still, no one will understand it, but at least this one bug will be harder to trigger.

I don't agree that harder-to-trigger bugs are better than easy-to-trigger bugs. And from my view, the argument that "it's currently broken now, and hard to fix!" isn't exactly one I find compelling for leaving it that way.


> I used to believe that no one would seriously consider this too... but I don't believe that this is a safe assumption anymore. You might be the exception, but there are many more people who don't consider the implications of turning over said intellectual control.

Then they'll pay for it when something goes wrong with their systems, with their job, etc. You need a different mindset in this particular segment of the industry: 99.999% uptime is everything (we actually have 100% uptime for the past 6 years on our platform; chasing that last 0.001% is hard, and something will _eventually_ hit us).

> It's horrible, no one currently understands it, so let the AI do it, so that still, no one will understand it, but at least this one bug will be harder to trigger.

I think you're commenting without context. It's a particularly nasty Perl script that's been duct-taped to shell scripts and bolted hard onto a proprietary third-party application, and it needs to go. Having Claude/GPT rewrite that in a modern language, spending some time on it to have it design proper interfaces and APIs around where the script needs to interface with other things, when nobody wants to touch the code, would be the greatest thing that could happen to it.

You still have the old code to test against, so have the agent run exhaustive testing on its implementation to prove that it's robust, or more so than the original. It's not rocket surgery.


Your comment would be more useful if you could point us to some concrete tooling that’s been built out in the last ~3 years that LLM assisted coding has been around to improve interpretability.

That would be the exact opposite of my claim: it is a very hard problem.

This reads like you either have an idealized view of Real Engineering™, or used to work in a stable, extremely regulated area (e.g. civil engineering). I used to work in aerospace in the past, and we had a lot of silly Mr Tinkleberry canaries. We didn't strictly rely on them because our job was "extremely regulated" to put it mildly, but they did save us some time.

There are a ton of pretty stable engineering subfields that involve a lot more intuition than rigor. A lot of things in EE are like that. Anything novel as well. That's how steam in the 19th century or aeronautics in the early 20th century felt. Or rocketry in the 1950s, for that matter. There's no need to be upset with the fact that some people want to hack explosive stuff together before it becomes a predictable glacier of Real Engineering.


> There's no need to be upset with the fact that some people want to hack explosive stuff together before it becomes a predictable glacier of Real Engineering.

You misunderstand me. I'm not upset that people are playing with explosives. I'm upset that my industry is playing with explosives that all read, "front: face towards users"

And then, more upset that we're all seemingly ok with that.

The driving force of the enshittification of everything may be external, but the degradation clearly comes from engineers first. These broader industry trends only convince me it's not likely to get better anytime soon, and I don't like how everything is user-hostile.


Man I hate this kind of HN comment that makes grand sweeping statement like “that’s how it was with steam in the 19th century or rocketry in the 1950s”, because there’s no way to tell whether you’re just pulling these things out of your… to get internet points or actually have insightful parallels to make.

Could you please elaborate with concrete examples on how aeronautics in the 20th century felt like having a fictional friend in a text file for the token predictor?


We're not going to advance the discussion this way. I also hate this kind of HN comment that makes grand sweeping statement like "LLMs are like having a fictional friend in a text file for the token predictor", because there's no way to tell whether you're just pulling these things out of your... to get internet points or actually have insightful parallels to make.

Yes, during the Wright era aeronautics was absolutely dominated by tinkering, before the aerodynamics was figured out. It wouldn't pass the high standard of Real Engineering.


> Yes, during the Wright era aeronautics was absolutely dominated by tinkering, before the aerodynamics was figured out. It wouldn't pass the high standard of Real Engineering.

Remind me: did the Wright brothers start selling tickets to individuals telling them it was completely safe? Was step 2 of their research building a large passenger plane?

I originally wanted to avoid that specific flight analogy, because it felt a bit too reductive. But while we're being reductive, how about medicine too; the first smallpox vaccine was absolutely not well understood... would that origin story pass ethical review today? What do you think the pragmatics would be if the medical profession encouraged that specific kind of behavior?

> It wouldn't pass the high standard of Real Engineering.

I disagree, I think it 100% really is engineering. Engineering at its most basic is tricking physics into doing what you want. There's no more perfect example of that than heavier-than-air flight. But there's a critical difference between engineering research, and experimenting on unwitting people. I don't think users need to know how the sausage is made. That applies equally to planes, bridges, medicine, and code. But the professionals absolutely must. It's disappointing watching the industry I'm a part of willingly eschew understanding to avoid a bit of effort. Such a thing is considered malpractice in "real professions".

Ideally, neither of you would wring your hands about the flavor or form of the argument, or poke fun at the gamified comment thread. But if you're going to complain about adding positively to the discussion, try to add something to it along with the complaints.


As a matter of fact, commercial passenger service started almost immediately once the tech was out of the fiction phase. The airships were large, highly experimental, barely controllable, hydrogen-filled death traps that were marketed as luxurious and safe. The first airliners also appeared, with big engines and large planes (WWI disrupted this a bit). None of that was built on solid ground. Adoption was constrained only by industrial capacity and cost. Most large aircraft were more or less experimental up until the '50s, and aviation in general was unreliable until about the '80s.

I would say that right from the start everyone was pretty well aware about the unreliability of LLM-assisted coding and nobody was experimenting on unwitting people or forcing them to adopt it.

>Engineering at its most basic is tricking physics into doing what you want.

Very well, then Mr Tinkleberry also passes the bar because it's exactly such a trick. That it irks you as a cheap hack that lacks rigor (which it does) is another matter.


> As a matter of fact, commercial passenger service started almost immediately once the tech was out of the fiction phase. The airships were large, highly experimental, barely controllable, hydrogen-filled death traps that were marketed as luxurious and safe.

And here, you've stumbled onto the exact thing I'm objecting to. I think the Hindenburg disaster was a bad thing, and software engineering shouldn't repeat those mistakes.

> Very well, then Mr Tinkleberry also passes the bar because it's exactly such a trick. That it irks you as a cheap hack that lacks rigor (which it does) is another matter.

Yes, this is what I said.

> there's a critical difference between engineering research, and experimenting on unwitting people.

I object to watching developers do, exactly that.


I use agents almost all day and I do way more thinking than I used to, this is why I’m now more productive. There is little thinking required to produce output, typing requires very little thinking. The thinking is all in the planning… If the LLM output is bad in any given file I simply step in and modify it, and obviously this is much faster than typing every character.

I’m spending more time planning and my planning is more comprehensive than it used to be. I’m spending less time producing output, my output is more plentiful and of equal quality. No generated code goes into my commits without me reviewing it. Where exactly is the problem here?


It feels like you’re blaming the AI engineers here, that they built it this way out of ignorance or something. Look into interpretability research. It is a hard problem!

I am blaming the developers who use AI because they're willing to sacrifice intellectual control in trade for something that I find has minimal value.

I agree it's likely to be a complex or intractable problem. But I don't enjoy watching my industry slide back down the professionalism scale. Professionals don't choose tools whose workings they can't explain. If your solution to knowing whether your tool is still functional is inventing an amusing name and using that as the heuristic, because you have no better way to determine whether it's still working correctly, that feels like it might be a problem, no?


I’m sorry you don’t like it. But this has very strong old-man-yells-at-cloud vibes. This train is moving, whether you want it to or not.

Professionals use tools that work, whether they know why it works is of little consequence. It took 100 years to explain the steam engine. That didn’t stop us from making factories and railroads.


> It took 100 years to explain the steam engine. That didn’t stop us from making factories and railroads.

You keep saying this, why do you believe it so strongly? Because I don't believe this is true. Why do you?

And then, even assuming it's completely true exactly as stated; shouldn't we have higher standards than that when dealing with things that people interact with? Boiler explosions are bad right? And we should do everything we can to prove stuff works the way we want and expect? Do you think AI, as it's currently commonly used, helps do that?


Because I’m trained as a physicist and (non-software) engineer and I know my field’s history? Here’s the first result that comes up on Google. Seems accurate from a quick skim: https://www.ageofinvention.xyz/p/age-of-invention-why-wasnt-...

And yes we should seek to understand new inventions. Which we are doing right now, in the form of interpretability research.

We should not be making Luddite calls to halt progress simply because our analytic capabilities haven’t caught up to our progress in engineering.


Can you cite a section from this very long page that might convince me no one at the time understood how turning water into steam worked to create pressure?

If this is your industry, shouldn't you have a more reputable citation, maybe something published more formally? Something expected to stand up to peer review, instead of just a page on the internet?

> We should not be making Luddite calls to halt progress simply because our analytic capabilities haven’t caught up to our progress in engineering.

You've misunderstood my argument. I'm not making a Luddite call to halt progress; I'm objecting to my industry, which should behave as one made up of professionals, willingly sacrificing intellectual control over the things it is responsible for, and advocating that others do the same. Especially when it comes at the expense of users, which is what I see happening.

Anything that results in sacrificing your understanding of exactly how the thing you built works is bad and should be avoided. The source, whether AI or something else, doesn't matter as much as the result.


The steam engine is more than just boiling water. It is a thermodynamic cycle that exploits differences in the pressure curve in the expansion and contraction part of the cycle and the cooling of expanding gas to turn a temperature difference (the steam) into physical force (work).

To really understand WHY a steam engine works, you need to understand the behavior of ideal gasses (1787 - 1834) and entropy (1865). The ideal gas law is enough to perform calculations needed to design a steam engine, but it was seen at the time to be just as inscrutable. It was an empirical observation not derivable from physical principles. At least not until entropy was understood in 1865.

James Watt invented his steam engine in 1765, exactly a hundred years before the theory of statistical mechanics that was required to explain why it worked, and prior to all of the gas laws except Boyle’s.


This could be a very niche standup comedy routine, I approve.

For me (and most of my friends/coworkers) the point of AoC was to write in some language that you always wanted to learn but never had the chance. The AoC problems tend to be excellent material for a crash course in a new PL because they cover a range of common programming tasks.

Historically good candidates are:

- Rust (despite its popularity, I know a lot of devs who haven't had time to play with it).

- Haskell (though today I'd try Lean4)

- Racket/Common Lisp/another Scheme or Lisp you haven't tried

- Erlang/Elixir (probably my choice this year)

- Prolog

Especially for those langs that people typically dabble in but never get a chance to write non-trivial software in (Haskell, Prolog, Racket), AoC is fantastic for really getting a feel for the language.


Yes, this year I'm going for Lean 4: https://github.com/ngrislain/lean-adventofcode-2025

It's a great language. Its dependent-type / theorem-proving-oriented type system, combined with AI assistants, makes it the language of the future IMO.


Isn't the whole point of AoC to NOT use AI? Even says so in the FAQ

Yes, I'm doing it without AI to learn the language, nonetheless I do think that Lean 4 + AI is a super-powerful combination.

Like with the leaderboard: people do it to score points, not to learn. Hence, cheating.

It always seemed odd to me that a persistent minority of HN readers seem to have no interest in recreational programming/technical problem solving and perpetually ask "why should I care?"

It's totally fine not to care, but I can't quite get why you would then want to be an active member in a community of people who care about this stuff for no other reason than they fundamentally find it interesting.

