
Bringing the check immediately is associated with fast food, and overcrowded touristy places that are rushing customers to leave. Places that want to be fancy act like you're there to hang out, not to just eat and leave.

It is sometimes absurd. In the UK there's often an extra step of "oh, you're paying by card? let me go back and bring the card reader". Some places have just one reader shared among all waiting staff, so you're not going to get it faster unless you tip enough to make the staff wrestle for it.

I like the Japanese style the best — there's a cashier by the exit.


Even with the best intentions, the implementation is going to have bugs and quirks that weren't meant to be the standard.

When there's no second implementation to compare against, then everything "works". The implementation becomes the spec.

This may seem wonderful at first, but in the long run it makes pages accidentally depend on the bugs, and the bugs become a part of the spec.

This is why Microsoft has a dozen different button styles, and sediment layers of control panels all the way back to 1990. Eventually every bug became a feature, and they can't touch old code, only pile up new stuff around it.

When you have multiple independent implementations, it's very unlikely that all of them will have the same exact bug. The spec is the subset that most implementations agree on, and that's much easier to maintain long term, plus you have a proof that the spec can be reimplemented.

Bug-compatibility very often exposes unintended implementation details, and makes it hard even for the same browser to optimize its own code in the future (e.g. if pages rely on order of items you had in some hashmap, now you can't change the hashmap, can't change the hash function, can't store items in a different data structure without at least maintaining the old hashmap at the same time).
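As a minimal sketch of that hashmap trap (Rust used purely for illustration): iteration order is an accident of the hash function and internal layout, and the moment callers depend on whatever order they happen to observe, the data structure is frozen forever.

  use std::collections::HashMap;

  fn main() {
      let mut ids = HashMap::new();
      ids.insert("alice", 1);
      ids.insert("bob", 2);
      ids.insert("carol", 3);

      // The order printed here is an implementation detail (Rust even
      // randomizes the hasher per process). If callers start relying on
      // today's order, the hash function and layout can never change.
      for (name, id) in &ids {
          println!("{name}: {id}");
      }
  }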


Is that so bad though? It's essentially what's already the case, and as you said, developers already have an incentive to avoid making such bugs. Most developers are only going to target a single browser engine anyway, so bug or not, any divergence can cause problems for end users.

Regulations are like the code of a program. They're the business logic of how we want the world to be.

Like all code, it can be buggy, bloated and slow, or it can be well-written and efficiently achieve ambitious things.

If you have crappy unmaintainable code that doesn't work, then deleting it is an obvious improvement.

Like in programming, it takes a lot of skill to write code that achieves its goals in a way that is as simple as possible, but also isn't oversimplified to the point of failing to handle important cases.

The pro-regulation argument isn't for naively piling up more code and more bloat, but for improving and optimizing it.


Motion vectors in video codecs are an equivalent of a 2D projection of 3D motion vectors.

In typical video encoding, motion compensation of course isn't derived from real 3D motion vectors; it's merely a heuristic based on optical flow and a bag of tricks. But in principle the actual game's motion vectors could be used to guide the video's motion compensation. This is especially true when we're talking about a custom codec, and not reusing the H.264 bitstream format.

Referencing previous frames doesn't add latency, and limiting motion to just displacement of the previous frame would be computationally relatively simple. You'd need some keyframes or gradual refresh to avoid the "datamoshing" look persisting after packet loss.

However, the challenge is in encoding the motion precisely enough to make it useful. If it's not aligned with sub-pixel precision, it may make textures blurrier and make movement look wobbly, almost like on PS1 games. It's hard to fix that by encoding the diff, because the diff ends up having high frequencies that don't survive compression. Motion compensation should also be encoded with sharp boundaries between objects, as otherwise it causes shimmering around edges.
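A rough sketch of what sub-pixel sampling involves (bilinear interpolation shown for simplicity; real codecs use longer filter taps, but the point is that fractional motion needs interpolation, otherwise it either snaps to whole pixels and wobbles, or blurs):

  // Sample a grayscale frame at a fractional position. `frame` is row-major,
  // and (x, y) is assumed to lie inside the image.
  fn sample_subpel(frame: &[u8], width: usize, height: usize, x: f32, y: f32) -> f32 {
      let x0 = x.floor() as usize;
      let y0 = y.floor() as usize;
      let x1 = (x0 + 1).min(width - 1);
      let y1 = (y0 + 1).min(height - 1);
      let (fx, fy) = (x - x0 as f32, y - y0 as f32);

      let p = |xx: usize, yy: usize| frame[yy * width + xx] as f32;

      // Weighted average of the four neighbouring pixels.
      p(x0, y0) * (1.0 - fx) * (1.0 - fy)
          + p(x1, y0) * fx * (1.0 - fy)
          + p(x0, y1) * (1.0 - fx) * fy
          + p(x1, y1) * fx * fy
  }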


Motion vectors in video codecs are an equivalent of a 2D projection of 3D motion vectors.

3D motion vectors always get projected to 2D anyway. They also aren't used for moving blocks of pixels around, they are floating point values that get used along with a depth map to re-rasterize an image with motion blur.


They are used for moving pixels around in Frame Generation. P-frames in video codecs aim to do exactly the same thing.

Implementation details are quite different, but for reasons unrelated to motion vectors — the video codecs that are established now were designed decades ago, when the use of neural networks was in its infancy, and hardware acceleration for NNs was way outside the budget of HW video decoders.


There is a lot to unpack here.

First, neural networks don't have anything to do with this.

Second, generating a new frame would use optical flow, and that is always 2D; there is no 3D involved, because it works from a 2D image, not a 3D scene.

https://en.wikipedia.org/wiki/Optical_flow https://docs.opencv.org/3.4/d4/dee/tutorial_optical_flow.htm...

Third, optical flow isn't moving blocks of pixels around by an offset and then encoding the difference; it is creating a floating-point vector for every pixel and then re-rasterizing the image into a new one.


You've previously emphasised the use of blocks in video codecs, as if it were some special distinguishing characteristic, but I wanted to explain that's an implementation detail, and novel video codecs could have different approaches to encoding P-frames. They don't have to code a literal 2D vector per macroblock that "moves pixels around". There are already more sophisticated implementations than that. It's an open problem of reusing previous frames' data to predict the next frame (as a base to minimize the residual), and it could be approached in very different ways, including use of neural networks that predict the motion. I mention NNs to emphasise how different motion compensation can be from just copying pixels on a 2D canvas.

Motion vectors are still motion vectors regardless of how many dimensions they have. You can have per-pixel 3D floating-point motion vectors in a game engine, or you can have 2D-flattened motion vectors in a video codec. They're still vectors, and they still represent motion (or its approximation).

Optical flow is just one possible technique of getting the motion vectors for coding P-frames. Usually video codecs are fed only pixels, so they have no choice but to deduce the motion from the pixels. However, motion estimated via optical flow can be ambiguous (flat surfaces) or incorrect (repeating patterns), or non-physical (e.g. fade-out of a gradient). Poorly estimated motion can cause visible distortions when the residual isn't transmitted with high-enough quality to cover it up.

3D motion vectors from a game engine can be projected into 2D to get the exact motion information that can be used for motion compensation/P-frames in video encoding. Games already use it for TAA, so this is going to be pretty accurate and authoritative motion information, and it completely replaces the need to estimate the motion from the 2D pixels. Dense optical flow is a hard problem, and game engines can give the flow field basically for free.
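A hedged sketch of that projection step (the names and the matrix convention are assumptions, not any particular engine's API): project the point's position in the current and previous frame with the corresponding view-projection matrices, and the difference of the two screen positions is the 2D motion vector a codec could consume.

  // Row-major 4x4 matrix times a point (w assumed to be 1), with perspective divide.
  fn project(view_proj: &[[f32; 4]; 4], p: [f32; 3]) -> [f32; 2] {
      let mut clip = [0.0f32; 4];
      for row in 0..4 {
          clip[row] = view_proj[row][0] * p[0]
              + view_proj[row][1] * p[1]
              + view_proj[row][2] * p[2]
              + view_proj[row][3];
      }
      [clip[0] / clip[3], clip[1] / clip[3]] // normalized device coords
  }

  // 2D motion vector in pixels, like the per-pixel buffers TAA already keeps.
  fn screen_space_motion(
      curr_view_proj: &[[f32; 4]; 4],
      prev_view_proj: &[[f32; 4]; 4],
      pos_now: [f32; 3],
      pos_prev: [f32; 3],
      width: f32,
      height: f32,
  ) -> [f32; 2] {
      let a = project(curr_view_proj, pos_now);
      let b = project(prev_view_proj, pos_prev);
      [(a[0] - b[0]) * 0.5 * width, (a[1] - b[1]) * 0.5 * height]
  }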

You've misread what I've said about optical flow earlier. You don't need to give me Wikipedia links, I implement codecs for a living.


The big difference is that if you are recreating an entire image, and there isn't going to be any difference information against a reference image, you can't just move pixels around: you have to get fractional values out of optical flow and move pixels by fractional amounts that potentially overlap in some areas and leave gaps in others.

This means rasterizing: making a weighted average of the moved pixels, treated as points, using a kernel with width and height.

Optical flow isn't one technique; it's just a name for getting motion vectors in the first place.

Here is a lecture to help clear it up.

https://www.cs.princeton.edu/courses/archive/fall19/cos429/s...


I've started this thread by explaining this very problem, so I don't get why you're trying to lecture me on subpel motion and disocclusion.

What's your point? Your replies seem to be just broadly contrarian and patronizing.

I've continued this discussion assuming that maybe we're talking past each other by using the term "motion vectors" in narrower and broader meanings, or maybe you did not believe that the motion vectors that game engines have can be incredibly useful for video encoding.

However, you haven't really communicated your point across. I only see that whenever I describe something in a simplified way, you jump to correct me, while failing to realize that I'm intentionally simplifying for brevity and to avoid unnecessary jargon.


You said they were the same and then talked about motion vectors from 3D objects and neural networks for an unknown reason.

I'm saying that moving pixels and taking differences to a reference image is different from re-rasterizing an image with distortion and no correction.


This isn't about smarts, but about performance and memory usage.

Tokens are a form of compression, and working on uncompressed representation would require more memory and more processing power.


The opposite is true. ASCII and English are pretty good at compressing. I can say "cat" with just 24 bits. Your average LLM token embedding uses on the order of kilobits internally.

You can have "cat" as 1 token, or you can have "c" "a" "t" as 3 tokens.

In either case, the tokens are a necessary part of LLMs. They have to have a differentiable representation for the model to be trainable effectively. High-dimensional embeddings are differentiable and are able to usefully represent the "meaning" of a token.

In other words, the representation of "cat" in an LLM must be something that can be gradually nudged towards "kitten", or "print", or "excavator", or other possible meanings. This is doable with the large vector representation, but such operation makes no sense when you try to represent the meaning directly in ASCII.
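A toy illustration of what "gradually nudged" means (nothing model-specific, just the arithmetic of a gradient step on a float vector):

  // A token's "meaning" is a vector of floats, so a gradient step can move it
  // smoothly toward another meaning. The ASCII bytes for "cat" have no such
  // continuous neighbourhood to move through.
  fn nudge(embedding: &mut [f32], gradient: &[f32], learning_rate: f32) {
      for (w, g) in embedding.iter_mut().zip(gradient) {
          *w -= learning_rate * g;
      }
  }

  fn main() {
      // 4 dimensions for illustration; real models use thousands.
      let mut cat = vec![0.12, -0.80, 0.33, 0.05];
      let toward_kitten = vec![0.02, -0.10, 0.07, -0.01];
      nudge(&mut cat, &toward_kitten, 0.1);
      println!("{cat:?}");
  }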


True, but imagine an input that is ASCII, followed by some NN layers that produce an embedded representation, and from there the usual NN layers of your LLM. The first layers can have shared weights (shared between inputs). Thus, let the LLM solve the embedding problem implicitly. Why wouldn't this work? It is much more elegant, because the entire design would consist of neural networks, with no extra code or data treatment necessary.

This might be more pure, but there is nothing to be gained. On the contrary, this would lead to very long sequences for which self-attention scales poorly.

The tokens are basically this, a result of precomputing and caching such layers.

The LLM can also “say” “cat” with a few bits. Note that the meaning of the word as stored in your brain takes more than 24 bits.

No, an LLM really uses __many__ more bits per token.

First, the embedding typically uses thousands of dimensions.

Then, the value along each dimension is represented with a floating-point number, which typically takes 16 bits (it can be smaller with more aggressive quantization).
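For example, with a hypothetical 4096-dimensional embedding at 16 bits per value, a single token works out to 4096 × 16 = 65,536 bits, versus the 24 bits of the three ASCII bytes in "cat".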


Of course an LLM uses more space internally for a token. But so do humans.

My point was that you compared how the LLM represents a token internally versus how “English” transmits a word. That’s a category error.


But humans we can feed ascii, whereas LLMs require token inputs. My original question was about that: why can't we just feed the LLMs ascii, and let it figure out how it wants to encode that internally, __implicitly__? I.e., we just design a network and feed it ascii, as opposed to figuring out an encoding in a separate step and feeding it tokens in that encoding.

> But humans we can feed ascii, whereas LLMs require token inputs.

To be pedantic, we can't feed humans ASCII directly, we have to convert it to images or sounds first.

> My original question was about that: why can't we just feed the LLMs ascii, and let it figure out how it wants to encode that internally, __implicitly__? I.e., we just design a network and feed it ascii, as opposed to figuring out an encoding in a separate step and feeding it tokens in that encoding.

That could be done, by having only 256 tokens, one for each possible byte, plus perhaps a few special-use tokens like "end of sequence". But it would be much less efficient.


Why would it be less efficient, if the LLM would convert it to an embedding internally?

Because each byte would be an embedding, instead of several bytes (a full word or part of a word) being a single embedding. The amount of time an LLM takes is proportional to the number of embeddings (or tokens, since each token is represented by an embedding) in the input, and the amount of memory used by the internal state of the LLM is also proportional to the number of embeddings in the context window (how far it looks back in the input).
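As a rough illustration: a short English sentence of about 25 bytes is typically only 6-8 BPE tokens, so a byte-level model would see a sequence roughly 3-4x longer; with self-attention's cost growing roughly quadratically in sequence length, that's on the order of 10x more compute and state for the same text.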

The token for "cat" is 464, which is just 9 bits.

This didn't need Microsoft's teeth to fail. There isn't a single "Linux" that game devs can build for. The kernel ABI isn't sufficient to run games, and Linux doesn't have any other stable ABI. The APIs are fragmented across distros, and the ABIs get broken regularly.

The reality is that for applications with visuals better than vt100, the Win32+DirectX ABI is more stable and portable across Linux distros than anything else that Linux distros offer.


I would like CPUs to move to the GPU model, because in the CPU land adoption of wider SIMD instructions (without manual dispatch/multiversioning faff) takes over a decade, while in the GPU land it's a driver update.

To be clear, I'm talking about the PTX -> SASS compilation (which is something like LLVM bitcode to x86-64 microcode compilation). The fragmented and messy high-level shader language compilers are a different thing, in the higher abstraction layers.


I don't know what you're referring to. Rust's threads are OS threads. There's no magic runtime there.

The same memory corruption gotchas caused by threads exist, regardless of whether there is a borrow checker or not.

Rust makes it easier to work with non-trivial multi-threaded code thanks to giving robust guarantees at compile time, even across 3rd party dependencies, even if dynamic callbacks are used.

Appeasing the borrow checker is much easier than dealing with heisenbugs. Type system compile-time errors are a thing you can immediately see and fix before problems happen.

OTOH some racing use-after-free or memory corruption can be a massive pain to debug, especially when it's not possible to reproduce in a debugger due to timing, or hard to catch when the corruption "only" mangles the data instead of crashing the program.


It's not the runtime; it's how the borrow-checker interoperates with threads.

This is an aesthetics argument more than anything else, but I don't think the type theory around threads and memory safety in Rust is as "cooked" as single-thread borrow checking. The type assertions necessary around threads just get verbose and weird. I expect with more time (and maybe a new paradigm after we've all had more time to use Rust) this is a solvable problem, but I personally shy away from Rust for multi-threaded applications because I don't want to please the type-checker.


You know that Rust supports scoped threads? For the borrow checker, they behave like same-thread closures.

Borrow checking is orthogonal to threads.

You may be referring to the difficulty of satisfying the 'static lifetime (i.e. temporary references are not allowed when spawning a thread that may live for an arbitrarily long time).

If you just spawn an independent thread, there's no guarantee that your code will reach join(), so there's no guarantee that references won't be dangling. The scoped threads API catches panics and ensures the thread will always finish before references given to it expire.
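A minimal sketch of the scoped API (std::thread::scope, stable since Rust 1.63):

  use std::thread;

  fn main() {
      let data = vec![1, 2, 3];

      // The scope guarantees both threads are joined before it returns,
      // so they may borrow `data` without any 'static requirement.
      thread::scope(|s| {
          s.spawn(|| println!("borrowed: {data:?}"));
          s.spawn(|| println!("sum: {}", data.iter().sum::<i32>()));
      });
  }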


I'll have to look more closely at scoped threads. What I'm referring to is that compared to the relatively simple syntax of declaring scopes for arguments to functions and return values to functions, the syntax when threads get involved is (to take an example from the Rust Book, Chapter 21):

  pub fn spawn<F, T>(f: F) -> JoinHandle<T>
      where
          F: FnOnce() -> T,
          F: Send + 'static,
          T: Send + 'static,
... yikes. This is getting into "As easy to read as a C++ template" territory.

The signature for scoped threads is both simpler and more complicated depending on how you look at it:

https://doc.rust-lang.org/stable/std/thread/fn.scope.html

But really, that first type signature is not very complex. It can get far, far, far worse. That’s just what happens when you encode things in types.

(It reads as “spawn is a function that accepts a closure that returns a type T. It returns a JoinHandle that also wraps a T. Both the closure and the T must be able to be sent to another thread and have a static lifetime.”)


I like Hyundai's HDA much more than Tesla's.

With Tesla it's all-or-nothing, and when it inevitably drives poorly, I can only turn it off. It physically resists me turning the steering wheel while it's driving, and overcoming the resistance results in an unpleasant and potentially dangerous jerk.

OTOH in IONIQ I can control lane assist and adaptive cruise control separately. The lane assist is additive to normal steering. It doesn't take over; it only makes the car seem to naturally roll along the road.


I don't touch the steering wheel period in my Tesla though. Literally from door to door.

Go's goroutines aren't plain C threads (blocking syscalls are magically made async), and Go's stack isn't a normal C stack (it's tiny and grown dynamically).

A C function won't know how to behave in Go's runtime environment, so to call a C function Go needs to make itself look more like a C program, call the C function, and then restore its magic state.

Other languages like C++, Rust, and Swift are similar enough to C that they can just call C functions directly. CPython is a C program, so it can too. Golang was brave enough to do fundamental things its own way, which isn't quite C-compatible.
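For illustration, here's roughly what "just call C directly" looks like from Rust (calling libc's strlen): a plain function call, with no stack switching or scheduler handoff.

  use std::ffi::{c_char, CString};

  extern "C" {
      fn strlen(s: *const c_char) -> usize;
  }

  fn main() {
      let s = CString::new("hello").unwrap();
      // Calls straight into libc, on the same stack, with the same calling
      // convention a C compiler would use.
      let len = unsafe { strlen(s.as_ptr()) };
      println!("{len}");
  }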


> CPython is a C program

Go (gc) was also a C program originally. It still had the same overhead back then as it does now. The implementation language is immaterial. How things are implemented is what is significant. Go (tinygo), being a different implementation, can call C functions as fast as C can.

> ...so it can too.

In my experience, the C FFI overhead in CPython is significantly higher than Go (gc). How are you managing to avoid it?


I think in the case of CPython it's just Python being slow to do anything. There are costs from the interpreter, the GIL, and the conversion between Python's objects and low-level data representations, but the FFI boundary itself is just a trivial function call.

> but the FFI boundary itself is just a trivial function call.

Which is no different than Go, or any other language under the sun. There is no way to call a C function other than trivially, as you put it. The overhead in both Python and Go is in doing all the things you have to do in order to get to that point.

A small handful of languages/implementations are designed to be like C so that they don't have to do all that preparation in order to call a C function. The earlier comment included CPython among them. But the question was how that is being pulled off, as that isn't the default. By default, CPython carries tremendous overhead to call a C function — way more than Go.


I would like to know this, too.

I wonder if they should be using something like libuv to handle this. Instead of flipping state back and forth, create a playground for the C code that looks more like what it expects.

What about languages like Java, or other popular languages with GC?

Java FFI is slow and cumbersome, even more so if you're using the fancy auto-async from recent versions. The JVM community has mostly bitten the bullet and rewritten the entire world in Java rather than using native libraries, you only see JNI calls for niche things like high performance linear algebra; IMO that was the right tradeoff but it's also often seen as e.g. the reason why Java GUIs on the desktop suck.

Other languages generally fall into one of two camps: having a C-like stack and thread model and easy FFI (e.g. Ruby, TCL, OCaml), with maybe futures/async but not in an invisible/magic way, or having a radically different threading model at the cost of FFI being slow and painful (e.g. Erlang). JavaScript is kind of special in having a C-like stack but being built around calling async functions from a global event loop, so it's technically the first but feels more like the second.


JNI is the second or maybe third FFI for Java. JRI existed before it and that was worse, including performance. The debugging and instrumentation interfaces have been rewritten more times.

https://docs.oracle.com/en/java/javase/24/docs/specs/jni/int... mentions JRI.

But it seems like JNI has been replaced by third party solutions multiple times as well.

https://developer.okta.com/blog/2022/04/08/state-of-ffi-java...


C# does marshal/unmarshal for you, with a certain amount of GC-pinning required for structures while the function is executing. It's pretty convenient, although not frictionless, and I wouldn't like to say how fast it is.

Similar enough to C I guess, at least in their stack layout.

It's explained in the article.
