
Was summarizing this article for a group of friends who largely met during the Apple II days and wanted to repost a bit of that here:

The optimized code at the end takes 94 nanoseconds to sum an array of 1024 32-bit floating point numbers.

In 94 nanoseconds, our old friend the 1 MHz 6502 would be just starting to consider signaling the memory chips that maybe they ought to try digging up the first byte of the first instruction in the program.
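
To give a sense of what that kind of code looks like, here's a minimal sketch in C — not the article's actual code, just an illustration of the usual trick, assuming AArch64 NEON and a 1024-element array: unroll with several vector accumulators so the floating-point add latency is hidden.

    #include <arm_neon.h>
    #include <stddef.h>

    /* Sum 1024 floats using four NEON accumulators to hide FP add latency. */
    float sum1024(const float *a)
    {
        float32x4_t acc0 = vdupq_n_f32(0.0f);
        float32x4_t acc1 = vdupq_n_f32(0.0f);
        float32x4_t acc2 = vdupq_n_f32(0.0f);
        float32x4_t acc3 = vdupq_n_f32(0.0f);

        for (size_t i = 0; i < 1024; i += 16) {
            acc0 = vaddq_f32(acc0, vld1q_f32(a + i));
            acc1 = vaddq_f32(acc1, vld1q_f32(a + i + 4));
            acc2 = vaddq_f32(acc2, vld1q_f32(a + i + 8));
            acc3 = vaddq_f32(acc3, vld1q_f32(a + i + 12));
        }

        acc0 = vaddq_f32(vaddq_f32(acc0, acc1), vaddq_f32(acc2, acc3));
        return vaddvq_f32(acc0);   /* horizontal add of the final vector */
    }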

Worth mentioning, that code is entirely dependent on running in cache. Otherwise even the mighty M1 Max in the post would still be stuck waiting on that first memory fetch. DRAM is slow. :-)




Luckily our total L1 cache sizes are about as big as the entire addressable memory of the 6502.

We truly live in amazing times.


Truly. And I'm also always amazed by how much slower (in terms of wall time) modern software is than it ought to be. CPUs sure don't feel three to four orders of magnitude faster than they were 50 years ago, because software has gotten four to five orders of magnitude more wasteful to compensate. Argh...


I recently moved a project from a Phenom II 945 to an i7-8700. The project took around a minute and a half to compile on the Phenom and about 15 seconds on the i7. Others working on the project with even more modern systems are compiling in half that time.

The major advantage I had was running the same compilers, same interpreters, and same tooling, just on better hardware.

On the other hand, I always felt KDE 2 and 3 were way more responsive on far less hardware than modern KDE or GNOME are today. Part of it is UI design: immediately giving feedback that an operation is going to take some time, instead of blocking until the operation is done before showing anything.


Plenty of that software is fairly well optimized, but most software is not that optimized. Microsoft Teams, Discord, Slack, basically anything else that also uses Electron... it's not UI design, it's legitimately wasted work, and tons of it.


So many programmers saw "Premature optimization is the root of all evil" and thought it meant "Caring about performance makes you a heretic".

You can't hotspot-optimize a Fiat Multipla into a Formula 1 car. When every piece of software you run creates a dozen factories to replace one for-loop, you get the modern desktop experience.


    > So many programmers saw "Premature optimization is the root of all evil" and thought it meant "Caring about performance makes you a heretic".
This is very well said. Also, with enough experience, many good programmers will "know" where the hotspots will be while writing the code. So they can be "sloppy" in areas where it doesn't matter and "care more about performance" in areas where it will likely matter much more.


    > most software is not that optimized
I would say: For good reason. The value is too low.

    > basically anything else that also uses Electron... 
To me, the point of Electron is a lovely platform with very high developer productivity. This is partly the reason why Java/C# was able to displace so much C/C++ code inside big enterprise corps writing CRUD apps. Sure, Java/C# is a bit bloated/slow compared to C/C++, but the developer productivity is way higher.

    > it's not UI design, it's legitimately wasted work, and tons of it.
I don't understand this part. Can you explain more? Maybe an example would help.


> I would say: For good reason. The value is too low.

Oh my god, absolutely not. The value is extremely high to me, the consumer, even if the company doesn't care. Sure, they can technically still get their value proposition out even if the app is slow and bloated and sucky because every app these days is slow and bloated and sucky. And you can say they only care about money all you want, but that's not going to convince me that's a good reason for an app to perform terribly.

> To me, the point of Electron is a lovely platform with very high developer productivity. This is partly the reason why Java/C# was able to displace so much C/C++ code inside big enterprise corps writing CRUD apps. Sure, Java/C# is a bit bloated/slow compared to C/C++, but the developer productivity is way higher.

I use Tauri in place of Electron and it's a lot snappier, and you get many of the same benefits in terms of developer productivity, because it's also just a webview. The host side needs to be a Rust application, but I wouldn't say it's much more difficult to get started with than Electron. Obviously, what you put inside the webview still matters, but in general I'd say Electron is just an inferior choice in most cases, even if you do legitimately want a webview.

> I don't understand this part. Can you explain more? Maybe an example would help.

Parent said the lack of progress bars in GNOME makes it feel slow. I argue that while progress bars can be a hack to make the user more open to waiting, the real issue is that the user probably shouldn't need to wait at all. There are definitely cases where a progress bar is a good idea, but they shouldn't be needed in most cases.

For example, if I right-click a message in Discord, sometimes the context menu contains a loading spinner, and I have to wait for the context menu to finish loading. Is a loading spinner the right solution here? Surely the user wouldn't want the menu to not open at all, or for it to simply be blank until it loads, but I don't think any of those options are quite right. The context menu shouldn't have to load at all; it should simply open.


> the context menu contains a loading spinner

How the fuck is that possible, permissible, tolerated?

Another example. I am looking for a different file browser because Dolphin waits while something happens when I select a bunch of files and right-click. I know that it is downloading something very slowly because I can see the network activity but I have no idea why.


> I am looking for a different file browser because Dolphin waits while something happens when I select a bunch of files and right-click.

File browsers being slow seems like a common issue. When I tell Windows 11 Explorer to sort by modified date, it has to slowly load the list and populate the files one by one. What the fuck it is doing I have absolutely no clue. All I know is that it's doing something it shouldn't, because the last modified date should be returned instantly by the directory listing, and no additional metadata should be required to perform the sort.

And back on macOS, around five years ago, whenever I opened a folder, Finder would wait for a few seconds and then play the uncollapse animation for every folder I had expanded, shifting the position of basically every item in the list (and also causing tons of lag).

I think the second one is a well-intentioned feature that just needs some work, but the first one is just garbage; that's not how it should work.
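
On the Explorer example: at the Win32 level the last-write time does come back with the directory enumeration itself, so a date sort shouldn't need per-file follow-up queries, whatever Explorer is actually doing internally. A hedged sketch in C against the Win32 find APIs (the directory path and entry cap are made up for illustration):

    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>
    #include <windows.h>

    #define MAX_ENTRIES 4096
    static WIN32_FIND_DATAW entries[MAX_ENTRIES];

    /* Newest first; CompareFileTime is the stock Win32 comparison helper. */
    static int by_mtime_desc(const void *a, const void *b)
    {
        const WIN32_FIND_DATAW *fa = a, *fb = b;
        return CompareFileTime(&fb->ftLastWriteTime, &fa->ftLastWriteTime);
    }

    int main(void)
    {
        WIN32_FIND_DATAW fd;
        int n = 0;

        /* One pass: each WIN32_FIND_DATAW record already carries both the
           file name and the last-write time, so no per-file follow-up
           queries are needed just to sort by modified date. */
        HANDLE h = FindFirstFileW(L"C:\\some\\folder\\*", &fd);
        if (h == INVALID_HANDLE_VALUE)
            return 1;
        do {
            if (n < MAX_ENTRIES)
                entries[n++] = fd;
        } while (FindNextFileW(h, &fd));
        FindClose(h);

        qsort(entries, n, sizeof entries[0], by_mtime_desc);

        for (int i = 0; i < n; i++)
            wprintf(L"%ls\n", entries[i].cFileName);
        return 0;
    }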


Well, it depends on what software you're talking about. Browsers are way more capable than they were before. I'd be surprised if an old computer could play a 1080p video at 60 fps even if its network card somehow had the bandwidth for it. And copying/pasting files is certainly way faster than it used to be. Compiling large applications is much faster now, where decades ago it could be an overnight process.

Nothing is stopping you from using the old-school software, but the reality is that old software was missing a lot of the features we rely on. That's not to say current software isn't bloated, but if you told me I could go back to using the old software, I wouldn't (and I could).


Now you're making me wonder if there's a 6502 emulator making use of that fact.


Hah, I had the same thought. What kind of hacks can I do to convince the processor to keep as much of the emulator + opcodes in L1 as I can...


Back in the nineties, 3DFX had a synthetic rendering benchmark that relied on keeping the entire benchmark in L1, but the secret was taking over the entire machine so that no interrupt or other mechanism could pollute the cache.


A bit of ignorance on my part, but would the L1 be holding the data and the instructions? In which case we would be trying to fit our entire 6502 emulator in less than 64K of memory alongside the emulated RAM/ROM?


https://en.wikipedia.org/wiki/Apple_M1#CPU:

“The high-performance cores have an unusually large 192 KB of L1 instruction cache and 128 KB of L1 data cache and share a 12 MB L2 cache; the energy-efficient cores have a 128 KB L1 instruction cache, 64 KB L1 data cache, and a shared 4 MB L2 cache.”

⇒ chances are you don’t even have to try to fit an emulator of an 8-bit machine and its memory into the L1 cache.
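
And the hot loop of a 6502 interpreter is genuinely tiny. A toy sketch in C (hypothetical, only a couple of opcodes, flag handling elided) just to give a sense of scale; the 64 KB of emulated memory is the bulk of the working set, and it fits in a performance core's 128 KB L1 data cache with room to spare:

    #include <stdint.h>

    typedef struct {
        uint8_t  a, x, y, sp, p;   /* registers and status flags */
        uint16_t pc;               /* program counter */
        uint8_t  mem[65536];       /* the entire 6502 address space */
    } Cpu6502;

    /* One fetch/decode/execute step; flag updates and ~250 opcodes elided. */
    static void step(Cpu6502 *c)
    {
        uint8_t op = c->mem[c->pc++];
        switch (op) {
        case 0xA9:                 /* LDA #imm */
            c->a = c->mem[c->pc++];
            break;
        case 0xEA:                 /* NOP */
            break;
        default:                   /* everything else left out of the sketch */
            break;
        }
    }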


I think you would very much have to try to fit a complete emulator of, say, the Game Boy into 128 + 64KB.

There's plenty of behaviour that is self-evident on real silicon but verbose and tricky to express in software.


Real question about L1 caches. For a long time, x86 (Intel & AMD) L1 caches have been pretty much pegged at 32KB. Do you know why they didn't make them larger? My guess: There is a trade-off between economics and performance.


There is a trade-off between cache size and latency: the bigger you make the L1, the more cycles it takes to hit in it.


Ok, so why do the new Mx chips from Apple have an L1 cache size greater than 32KB? Did they solve some long-standing design issue?


The CPU decides what goes in there and when. You can only pray and offer sacrifices and guess at when and how.


Depends on the precise architecture, but ARM (and other RISC designs) usually have separate data and instruction L1 caches. You may need to be aware of this if writing self-modifying code, because cache coherence may not be automatic.
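
For example, with GCC or Clang the portable way to publish freshly written code to the instruction side is __builtin___clear_cache. A minimal sketch in C, assuming a Linux-style mmap that allows a writable and executable page (macOS would additionally want MAP_JIT and its JIT write-protection dance) and a hard-coded AArch64 stub:

    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>

    typedef int (*fn42)(void);

    int main(void)
    {
        /* AArch64 machine code for: mov w0, #42 ; ret */
        static const uint32_t code[] = { 0x52800540, 0xd65f03c0 };

        void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return 1;

        memcpy(buf, code, sizeof code);   /* writes land in the D-cache */
        __builtin___clear_cache((char *)buf,
                                (char *)buf + sizeof code); /* sync I-cache */

        return ((fn42)buf)();             /* should return 42 */
    }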



