At AWS engineering scale they can absolutely figure it out if they have the slightest interest in doing so. I've heard all the excuses — they all suck.
Businesses with lawyers and stuff can afford to negotiate with AWS etc. when things go wrong. Individuals who want to upskill on AWS to improve their job prospects have to roll the dice on AWS maybe bankrupting them. AWS actively encourages developers to put themselves in this position.
I don't know if AWS should be regulated into providing spending controls. But if they don't choose to provide spending controls of their own accord, I'll continue to call them out for being grossly irresponsible, because they are.
I like this approach, because it lets everything Just Work without anyone outside of the implementation having to think about it at all, and only incurs meaningful overhead if you're doing something really silly, but crucially still won't break — it'll just be slightly slow. That feels like the right trade-off.
Not quite, I think. Some kinds of redundancy are good, and some are bad. Good redundancy tends to reduce mistakes rather than introduce them. E.g. there's lots of redundancy in natural languages, and it helps resolve ambiguity and fill in blanks or corruption if you didn't hear something properly. Similarly, a lot of "entropy" in code could be reduced by shortening names, deleting types, etc., but all those things were helping to clarify intent to other humans, thereby reducing mistakes. But some redundancy is copy+paste of rules that should be enforced in one place. Teaching a computer to understand the difference is... hard.
Although, if we were to ignore all this for a second, you could also make similar estimates with, e.g., gzip: the higher the compression ratio attained, the more "verbose"/"fluffy" the code is.
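As a rough toy illustration (a sketch of the idea, not a rigorous metric), you could compare a source file's gzip-compressed size against its raw size. I'm assuming Rust with the flate2 crate here; any DEFLATE implementation would do:

    // Crude "fluffiness" estimate: raw size vs. DEFLATE-compressed size.
    // Assumes flate2 = "1" in Cargo.toml. Usage: cargo run -- path/to/file.rs
    use flate2::write::GzEncoder;
    use flate2::Compression;
    use std::io::Write;

    fn main() -> std::io::Result<()> {
        let path = std::env::args().nth(1).expect("usage: fluff <file>");
        let raw = std::fs::read(&path)?;

        let mut enc = GzEncoder::new(Vec::new(), Compression::best());
        enc.write_all(&raw)?;
        let compressed = enc.finish()?;

        // Higher ratio => more redundancy (long names, boilerplate, copy+paste...).
        let ratio = raw.len() as f64 / compressed.len() as f64;
        println!("{}: {} -> {} bytes (ratio {:.2})", path, raw.len(), compressed.len(), ratio);
        Ok(())
    }

Of course this lumps the helpful redundancy (descriptive names, comments) in with the harmful copy+paste kind, which is exactly the distinction the parent says is hard to teach a computer.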
Fun tangent: there are a lot of researchers who believe that compression and intelligence are equivalent or at least very tightly linked.
Interpreting this comment, it would predict low complexity for code copied unnecessarily.
I'm not sure though. If it's copied a bunch of times, and it actually doesn't matter because each use case of the copying is linearly independent, does it matter that it was copied?
Over time, you'd still see the copies diverging from each other, and that would show up as increased entropy.
I agree, hallucination is a completely different phenomenon from inadvertently filling in knowledge gaps.
Hallucinating, to me, is not a one-off effect but a dynamic phenomenon: our sensory processing/interpreting continues iterating, but wanders away from, or separates from, actual sensory input.
Dreams are a functional example. Drugs that disrupt or overwhelm our sensory systems with unusual internal signals are another.
LLMs operate in discrete steps. So that “interpretation continues iterating” is a very good description of what’s actually happening.
There’s a little uncertainty in the process, so sometimes it will pick the wrong symbol at the wrong time and the entire process just spirals into nonsense. But it’s semi-lucid nonsense, like a dream or hallucination, not line noise.
The confidently-stating-the-wrong-thing bit is arguably a different, though related, problem. There’s making up citations that don’t exist, and there’s inserting song lyrics where a citation should be.
Confabulation: Inadvertent pseudo-memory fill-in, within a generally stable, reasonable context representation.
Hallucination: Context representation becomes unstable, resulting in open ended drifting and morphing into incoherence.
The scale of error and recursion of error in the latter make the effects quite different.
You could be right, perhaps sometimes the initial causes might not be so different. But I would be surprised if that were true, given there doesn't seem to be much of a middle ground between the two, which differ so widely in scale and downstream effect.
You could be right that it’s a fundamental difference in what’s going on, but each word is a new iteration/token and most errors are more than a single token.
I’m assuming it’s like a converging vs. diverging series in how well the LLM recovers from problems, where the boundary between the two states is arbitrarily small, but OpenAI etc. have fine-tuned the system so you generally see the converging/confabulation type of errors, even if the difference is just a slight change in the scale of error.
> I’m assuming it’s like a converging vs diverging series in how well the LLM recovers from problems.
That's a really good point.
Similar causes diverging in effect, a phase change or not, depending on whether a critical feedback threshold is hit. That could certainly be a factor.
With increasing doses of some drugs, humans do progress from being just a bit wonky, to full on lost in space!
But I think confabulation might be on a different continuum too. We fill in recalled memories with made-up details all the time. Our memory storage is lossy, associative and overlapping. Our recall is always an imperfect combination of the actual incident's memory, filled out by similar memories.
We must confabulate to recall. Statistically, sometimes we over confabulate.
Hallucinate seems like a good word for image-producing neural nets (which is probably where its use originated?). Confabulate might be a better word for talkative LLMs, and less good for images.
I wonder if any neuroscientists or psychologists agree about humans and neural nets being similar this way. It seems quite unlikely that the mechanisms are the same between people and LLMs saying untrue things. Aside from there being a wide variety of reasons people fabricate untrue things, we already know the mechanism behind neural net hallucinations and confabulations: it's a non-self-aware machine designed to output tokens or pixels or whatever, and it will turn the crank and spit out something with no concept, at any point, of whether it's true. People, on the other hand, are often using emotion to drive what they say (and often without knowing it). People will sometimes rationalize their confabulations, sometimes say untrue things based on belief, sometimes say things driven by fear or embarrassment, sometimes lie because they have ulterior motives. None of these things apply to neural nets, so the similarity between human and NN confabulations seems at best limited to superficial summaries, no?
I find this surprising, given that my initial response to reading the io_uring design was:
1. This is pretty clean and straightforward.
2. This is obviously what we need to decouple a bunch of things without the previous downsides.
What has made it so hard to integrate it into common language runtimes? Do you have examples of where there's been an irreconcilable "impedance mismatch"?
In the most general form: you need a fairly "loose" memory model to integrate the "best" (performance-wise) parts, and the "best" (ease of use/forward-looking safety) way to integrate requires C library linkage. This is troublesome in most GC languages, and in many managed runtimes. There's also the issue that io_uring being non-portable means that the things it pushes you toward (such as pinning a buffer pool, and making APIs like read no longer immediate caller-allocates) require a substantially separate API for this platform than for others, or at least substantial reworks of all the existing POSIX-modeled APIs. Thus back to what I said originally: we need a replacement for POSIX & BSD here, broadly applied.
I can see how a zero-copy API would be hard to implement in some languages, but you could still implement something on top of io_uring with POSIX buffer-copy semantics, while using batching to decrease syscall overhead.
Zero-copy APIs will necessarily be tricky to implement and use, especially in memory-safe languages.
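To make the copy-semantics-plus-batching idea concrete, here's a minimal sketch using the io-uring crate in Rust (my choice of crate and file name, not anything from the parent). Several reads go into plain caller-owned buffers and get submitted with a single syscall:

    use io_uring::{opcode, types, IoUring};
    use std::os::unix::io::AsRawFd;
    use std::{fs, io};

    fn main() -> io::Result<()> {
        let mut ring = IoUring::new(8)?;
        let file = fs::File::open("README.md")?;
        // Plain caller-owned buffers: the kernel copies into them (no registered
        // buffers, no zero-copy), so the surface stays POSIX-read-like.
        let mut bufs = vec![vec![0u8; 4096]; 4];

        // Queue four reads at different offsets...
        for (i, buf) in bufs.iter_mut().enumerate() {
            let sqe = opcode::Read::new(
                    types::Fd(file.as_raw_fd()),
                    buf.as_mut_ptr(),
                    buf.len() as _,
                )
                .offset((i * 4096) as _)
                .build()
                .user_data(i as u64);
            unsafe { ring.submission().push(&sqe).expect("submission queue full") };
        }

        // ...and submit them all with one syscall instead of four.
        ring.submit_and_wait(bufs.len())?;
        for cqe in ring.completion() {
            println!("read #{} returned {}", cqe.user_data(), cqe.result());
        }
        Ok(())
    }

A GC runtime can wrap this shape without exposing kernel-owned memory (copying out of a pinned staging buffer if it must), and the batching alone cuts syscall overhead; the really awkward parts the parent describes only start once you chase zero-copy and registered buffers.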
I think most GC languages support native/pinned memory (at least Java and C# do) to support talking to the kernel or native libraries.
The APIs are even quite nice.
Java's off-heap memory and memory-segment APIs are quite dreadful and on the slower side. C#, OTOH, gives you easy and cheap object pinning, malloc/free, and stack-allocated buffers.
Rust's async model can support io-uring fine; it just has to be a different API based on ownership instead of references. (That's the conclusion of my posts you link to.)
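For anyone who hasn't read those posts, the ownership-based shape is roughly this. A hypothetical sketch (the names and the blocking stand-in body are mine, not any real crate's API): instead of lending &mut [u8] across an await point, the buffer moves into the operation and comes back with the result:

    use std::io::{self, Read as _};

    // Hypothetical io_uring-backed handle; the blocking stand-in body is only
    // there so the sketch compiles -- a real version would submit an SQE and
    // await the CQE.
    struct UringFile {
        inner: std::fs::File,
    }

    impl UringFile {
        // Reference-based (POSIX-ish) shape, which is what fights io_uring:
        //   async fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>
        // If the returned future is dropped mid-flight, the kernel can still be
        // writing into a `buf` the caller has already taken back.

        // Ownership-based shape: the buffer is passed by value and handed back
        // alongside the result, so cancellation can't leave a dangling borrow.
        async fn read(&mut self, mut buf: Vec<u8>) -> (io::Result<usize>, Vec<u8>) {
            let res = self.inner.read(&mut buf); // stand-in for submit + complete
            (res, buf)
        }
    }

    async fn first_chunk(file: &mut UringFile) -> io::Result<Vec<u8>> {
        let (n, mut buf) = file.read(vec![0u8; 4096]).await;
        buf.truncate(n?);
        Ok(buf)
    }

tokio-uring's read_at has essentially this buffer-by-value shape, if I remember its API right, so it's not just a theoretical design.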
I think if Intel was a private company then it would have a better chance of recovery via a collection of focused experiments to overcome their biggest technical deficiencies (compared to their competitors) or to, e.g., figure out a compelling new product that the market didn't realise it needed, but that doesn't require solving difficult physics problems.
But to do this sort of thing you need a dictator at the top who is willing to risk a run of negative-profit quarters to fix the company's underlying rot. If you try to do anything like that as a leader of a public company, then shareholders tend to get angry.
I wonder if there's any lesson that could be distilled from the (minority of?) public companies that don't end up settling into a pattern of carefully-managed mediocrity. Is there a unifying theme? I haven't spent enough time thinking about this to even propose an answer other than "cult of personality around the leader" as maybe helping.
> But to do this sort of thing you need a dictator at the top who is willing to risk a run of negative-profit quarters to fix the company's underlying rot.
Private companies still have shareholders that the CEO answers to -- who tend to get angry at taking losses quarter after quarter with no clear path to growth.
I find it a bit surprising that this is still a thing. Why is there ever a time when I _must_ reboot? Is it just that mainstream kernels were designed at a time when people had lower expectations around this sort of thing, and now it's too hard to evolve their designs toward something that would allow for zero-downtime patching in ~all cases?
Other examples that make me wonder if it's mostly because people haven't demanded better:
- Enormous updates for all kinds of things (gigabytes for a bug fix release) because differential updates aren't pervasive.
- Windows updates where a huge amount of work is done during the "rebooting" phase (why can't most of this be done before reboot?)
- Absolutely atrocious power management on pretty much anything that's not a MacBook, and even not perfect on those.
I never thought we'd have flying cars by now, but if you asked me a decade ago to predict the future of operating systems... it wouldn't be this.
Not dismissing the usefulness of the project at all, but curious what the concrete benefits of that are -- is it mainly to have a smaller, more auditable bootstrap process to make it easier to avoid "Reflections on Trusting Trust"-type attacks?
It seems like you'd need to trust a C compiler anyway, but I guess the idea is that there are a lot of small C compiler designs that are fairly easy to port?
Let me make a small example that may illustrate the issue.
You can download the NetBSD source tree and compile it with any reasonable C compiler, whether you're running some sort of BSD, macOS, or Linux. Some OSes have a much older GCC (Red Hat, for instance), some have modern GCC, some have LLVM. The source tree first compiles a compiler, which then compiles NetBSD. It's an automatic, easy to understand, easy to audit, two-step process that's really nice and clean.
With Rust, if you want to compile current Rust, you need a pretty modern, up-to-date Rust. You can usually use the last few versions, but you certainly can't use a version of Rust that's even a year old. This, to some of us, is ridiculous: the language shouldn't change so much so quickly that something that was brand new a year ago literally can't be used today to compile something current.
If you really want to bootstrap Rust from C, you'd have to start with Rust from many years ago, compile it, then use it to compile newer Rust, then use that to compile even newer Rust, perhaps half a dozen times until you get to today's Rust. Again, this is really silly.
There are many of us who'd like to see Rust be more directly usable and less dependent on a chain of compilers six levels deep.
That was back in 2018. Today mrustc can bootstrap rustc 1.54.0, but the current rustc version is 1.80.1. So if the number of steps still scales similarly (each release building roughly the next, and 1.54 to 1.80 is 26 minor releases), today we're probably looking at ~26 rustc compilations to get to the current version.
And please read that while keeping in mind what Rust compile times are like.
> It seems like you'd need to trust a C compiler anyway, but I guess the idea is that there are a lot of small C compiler designs that are fairly easy to port?
Sorry, but TFA explains very well how to go from nothing to TinyCC. The author's effort now is to go from TinyCC to Rust.
Right, but I was trying to understand the author's motivation, and this was me handwaving about whether it could be about compiler trust. The article discusses bootstrapping but not explicitly why the author cares. Is it just a fun exercise (they do mention fascination)? Are they using an obscure architecture where there is no OCaml compiler, so they need the short bootstrap chain? _Is_ it about compiler trust?
(Again since it can come off wrong in text, this was just pure curiosity about the project, not dismissiveness.)
But with a Rust compiler in C, it sounds like the audit path would be much shorter, and therefore more auditable.
Plus, OP also wrote in the post that a goal was to be able to bootstrap to Rust without first having to bootstrap to C++, so that other things can be written in Rust earlier on in the process. That could mean more of the foundation of the whole bootstrap being written in Rust, instead of in C or C++.
What good is being slightly shorter if it is still nowhere remotely close to practical?
It's kind of like saying 100 years is a lot shorter than 200 years. It might be true, but if all the time you have to dedicate is a few hours, it really doesn't matter.
I dunno about that; suppose dozer completes its goal, and 1 year later you want to audit the bootstrap chain. Latest Rust probably won't be able to be compiled by it, so you now need to audit what, 6 months of changes to the Rust language? How many months is short enough to handle?
If dozer _does_ keep getting maintained, the situation isn't exactly better either: you instead have to audit the work dozer did to support those 6 months of Rust changes.
Though for me it's less the auditable part, and more that I would be able to build the compiler myself if I wanted, without jumping through so many unnecessary hoops. For the same reason I like having the source code of programs I use, even if most of the time I just use my package manager's signed executable.
And if someone open sources their program, but the build process is deliberately convoluted, then to me that starts to smell like malicious compliance ("it's technically open source"). It's still a gift since I'd get the code either way, so I appreciate that, but my opinion would obviously be different between someone who gives freedoms to users in a seemingly-reluctant way vs someone who gives freedoms to users in an encouraging way.