One of the biggest problems with CPUs is legacy. Tie yourself to any legacy, and now you're spending millions of transistors to make sure some way that made sense ages ago still works.
Just as a thought experiment, consider the fact that the i80486 has 1.2 million transistors. An eight core Ryzen 9700X has around 12 billion. The difference in clock speed is roughly 80 times, and the difference in number of transistors is 1,250 times.
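A quick back-of-the-envelope check of those two ratios (a sketch; the 66 MHz and ~5.3 GHz clock figures are assumptions on my part, and the transistor ratio is taken per core):

```c
#include <stdio.h>

int main(void) {
    /* Assumed round figures: a 66 MHz i80486DX2 and an 8-core Ryzen 9700X
       boosting to roughly 5.3 GHz. */
    double i486_transistors  = 1.2e6;
    double ryzen_transistors = 12e9;
    double i486_clock_hz     = 66e6;
    double ryzen_clock_hz    = 5.3e9;
    int    ryzen_cores       = 8;

    /* The 1,250x figure is per core: (12e9 / 8) / 1.2e6. */
    printf("clock ratio:               ~%.0fx\n", ryzen_clock_hz / i486_clock_hz);
    printf("transistor ratio per core: ~%.0fx\n",
           (ryzen_transistors / ryzen_cores) / i486_transistors);
    return 0;
}
```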
These are wild generalizations, but let's ask ourselves: If a Ryzen takes 1,250 times the transistors for one core, does one core run 1,250 times faster (even taking hyperthreading into account) than an i80486 at the same clock? 500 times? 100 times?
It doesn't, because massive amounts of those transistors go to keeping things in sync, dealing with changes in execution, folding instructions, decoding a horrible instruction set, et cetera.
So what might we be able to do if we didn't need to worry about figuring out how long our instructions are? Didn't need to deal with Spectre and Meltdown issues? If we made out-of-order work in ways where much more could be in flight and the compilers / assemblers would know how to avoid stalls based on dependencies, or how to schedule dependencies? What if we took expensive operations, like semaphores / locks, and built solutions into the chip?
Would we get to 1,250 times faster for 1,250 times the number of transistors? No. Would we get a lot more performance than we get out of a contemporary x86 CPU? Absolutely.
Modern CPUs don't actually execute the legacy instructions; they execute core-native instructions and have a piece of silicon dedicated to translating the legacy instructions into them. That piece of silicon isn't that big. Modern CPUs use more transistors because transistors are a lot cheaper now; e.g., the i486 had 8 KiB of cache, while the Ryzen 9700X has >40 MiB. The extra transistors don't make it linearly faster, but they make it faster enough to be worth it when transistors are cheap.
Modern CPUs also have a lot of things integrated into the "CPU" that used to be separate chips. The i486 didn't have on-die memory controllers, PCI controllers, etc., and those things were themselves less complicated then (e.g. a single memory channel and a shared peripheral bus for all devices). The i486SX didn't even have a floating point unit. The Ryzen 9000 series die contains an entire GPU.
> If a Ryzen takes 1,250 times the transistors for one core, does one core run 1,250 times faster (even taking hyperthreading into account) than an i80486 at the same clock? 500 times? 100 times?
Would be interesting to see a benchmark on this.
If we restricted it to 486 instructions only, I'd expect the Ryzen to be 10-15x faster. The modern CPU will perform out-of-order execution, running some instructions in parallel even in single-core, single-threaded code, not to mention superior branch prediction and more cache.
If you allowed modern instructions like AVX-512, then the speedup could easily be 30x or more.
> Would we get to 1,250 times faster for 1,250 times the number of transistors? No. Would we get a lot more performance than we get out of a contemporary x86 CPU? Absolutely.
I doubt you'd get significantly more performance, though you'd likely gain power efficiency.
Half of what you described in your hypothetical instruction set is already implemented in ARM.
I meant a comparison on a clock-for-clock level. In other words, imagine either the 486 running at the clock speed of a Ryzen, or the Ryzen running at the clock speed of the 486. Put another way: compare ONLY IPC.
The line I was commenting on said:
> If a Ryzen takes 1,250 times the transistors for one core, does one core run 1,250 times faster (even taking hyperthreading into account) than an i80486 at the same clock?
In terms of FLOPS, Ryzen is ~1,000,000 times faster than a 486.
For serial, branchy code it isn't a million times faster, but that has almost nothing to do with legacy and everything to do with the nature of serial code: architecture and transistor counts can only improve serial execution sublinearly; the big gains come from Dennard scaling.
It is worth noting, though, that purely via Dennard scaling, Ryzen is already >100x faster! And via architecture (those transistors) it is several multiples beyond that.
In general compute, if you could clock it down to 33 or 66 MHz, a Ryzen would still be much faster than a 486, due to using those transistors for ILP (instruction-level parallelism) and TLP (thread-level parallelism). But you won't see any TLP in a single serial program that a 486 would have been running, and you won't get any of the SIMD benefits either, so you won't get anywhere near that in practice on 486 code.
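A rough decomposition of where the factor comes from (a sketch; the clock figures are assumptions, and the ~1,000,000x FLOPS ratio is the one quoted above):

```c
#include <stdio.h>

int main(void) {
    /* Assumptions: a 66 MHz 486, a Ryzen boosting to ~5.5 GHz, and the
       ~1,000,000x FLOPS ratio quoted earlier in the thread. */
    double i486_clock_hz  = 66e6;
    double ryzen_clock_hz = 5.5e9;
    double flops_ratio    = 1e6;

    double clock_ratio = ryzen_clock_hz / i486_clock_hz; /* Dennard-style scaling */
    double arch_ratio  = flops_ratio / clock_ratio;      /* cores, SIMD, FMA, ILP */

    printf("from clock frequency alone:    ~%.0fx\n", clock_ratio);
    printf("from architecture/parallelism: ~%.0fx\n", arch_ratio);
    return 0;
}
```

Roughly 80-170x comes from frequency alone (depending on which 486 you pick), and the remaining factor of several thousand comes from the transistors spent on cores, SIMD width, FMA, and ILP.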
The key to contemporary high performance computing is having more independent work to do, and organizing the data/work to expose the independence to the software/hardware.
That's basically x86 without 16- and 32-bit support, no real mode, etc.
The CPU starts initialized in 64-bit mode without all that legacy crap.
That's IMO a great idea. I think every few decades we need to stop and think again about what works best, and either take a fresh start or drop some unused legacy features.
RISC-V has only a mandatory base set of instructions, as little as possible to be Turing complete, and everything else is an extension that can (theoretically) be removed in the future.
This could also be used to remove legacy parts without disrupting the architecture.
Would be interesting to compare transistor count without L3 (and perhaps L2) cache.
A 16-core Zen 5 CPU achieves more than 2 TFLOPS FP64, so number-crunching performance scaled very well.
It is weird that the best consumer GPU can only do 4 TFLOPS. Some years ago GPUs were an order of magnitude or more faster than CPUs. Today's GPUs are likely to be artificially limited.
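A rough sanity check of the 2 TFLOPS figure (a sketch; the two 512-bit FMA pipes per core and the ~4.3 GHz all-core clock are assumptions on my part):

```c
#include <stdio.h>

int main(void) {
    /* Back-of-the-envelope peak FP64 for a 16-core Zen 5 part.
       Assumptions: two 512-bit FMA pipes per core, ~4.3 GHz all-core clock. */
    int    cores         = 16;
    int    fma_pipes     = 2;    /* per core */
    int    fp64_lanes    = 8;    /* 512-bit / 64-bit */
    int    flops_per_fma = 2;    /* multiply + add */
    double clock_hz      = 4.3e9;

    double flops_per_cycle = (double)cores * fma_pipes * fp64_lanes * flops_per_fma;
    printf("peak: ~%.1f TFLOPS FP64\n", flops_per_cycle * clock_hz / 1e12);
    return 0;
}
```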
> A 16-core Zen 5 CPU achieves more than 2 TFLOPS FP64, so number-crunching performance scaled very well.
These aren't realistic numbers in most cases, because you're almost always limited by memory bandwidth, and even if memory bandwidth is not an issue you'll have to worry about thermals. The theoretical CPU compute ceiling is almost never the real bottleneck. GPUs have a very different architecture, with much higher memory bandwidth and with their chips running a lot slower and cooler (lower clock frequency), so they can reach much higher numbers in practical scenarios.
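To put a number on the bandwidth point (a sketch; the dual-channel DDR5-5600 figure of ~89.6 GB/s is an assumption): unless a kernel does a couple hundred FLOPs per double it fetches from DRAM, the memory bus, not the FMA units, is the limit.

```c
#include <stdio.h>

int main(void) {
    /* Required arithmetic intensity to keep ~2 TFLOPS FP64 fed from main memory.
       Assumption: dual-channel DDR5-5600, ~89.6 GB/s theoretical bandwidth. */
    double peak_flops   = 2e12;
    double mem_bw_bytes = 89.6e9;

    double flop_per_byte = peak_flops / mem_bw_bytes;
    printf("needed: ~%.0f FLOPs per byte (~%.0f FLOPs per 8-byte double)\n",
           flop_per_byte, flop_per_byte * 8.0);
    return 0;
}
```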
But history showed exactly the opposite: if you don't have an already existing software ecosystem, you are dead. The transistors spent implementing x86 peculiarities are very much worth it if people in the market want x86.
GPUs scaled wide, with a number of transistors per core similar to a 486 and just lots more cores: thousands to tens of thousands of them, averaging out to maybe 5 million transistors per core.
CPUs scaled tall, with specialized instructions to make the single thread go faster. No, the amount of work done per transistor does not scale anywhere near linearly; very many of the transistors are dark on any given cycle, compared to a much simpler core that will have much higher utilization.
> Didn't need to deal with Spectre and Meltdown issues? If we made out-of-order work in ways where much more could be in flight and the compilers / assemblers would know how to avoid stalls based on dependencies, or how to schedule dependencies? What if we took expensive operations, like semaphores / locks, and built solutions into the chip?
I'm pretty sure that these goals will conflict with one another at some point. For example, the way one solves Spectre/Meltdown issues in a principled way is by changing the hardware and system architecture to have some notion of "privacy-sensitive" data that shouldn't be speculated on. But this will unavoidably limit the scope of OOO and the number of instructions that can be "in-flight" at any given time.
For that matter, with modern chips, semaphores/locks are already implemented with hardware builtin operations, so you can't do that much better. Transactional memory is an interesting possibility but requires changes on the software side to work properly.
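To illustrate "already implemented with hardware builtin operations": a minimal sketch of a lock whose hot path is a single hardware atomic instruction, using C11 atomics (real mutexes add futex-style sleeping, fairness, etc. on top):

```c
/* Minimal spinlock sketch: the hot path is one atomic exchange, which the
   compiler lowers to LOCK XCHG on x86 or an LDAXR/STLXR pair on AArch64. */
#include <stdatomic.h>

typedef struct {
    atomic_flag held;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_lock(spinlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* spin; a real implementation would pause/yield here */
    }
}

static void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```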
If you have a very large CPU count, then I think you could dedicate a CPU to processing only a given, designated privacy/security-focused execution thread. Especially for a specially designed syscall, perhaps.
That kind of takes the Spectre/Meltdown thing out of the way to some degree, I would think, although privilege elevation can happen in the darndest places.
The real issue with complex instruction decoding is that it's hard to make the decode stage wider, and at some point this will limit the usefulness of a bigger chip. For instance, AArch64 chips tend to have wider decode than their close x86_64 equivalents.
CPUs can’t do that, but legacy is irrelevant. They just don’t have enough parallelism to leverage all these extra transistors. Let’s compare the 486 with a modern GPU.
Intel 80486 with 1.2M transistors delivered 0.128 flops / cycle.
nVidia 4070 Ti Super with 45.9B transistors delivers 16896 flops / cycle.
As you see, each transistor became 3.45 times more efficient at delivering these FLOPs per cycle.
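Spelling out that per-transistor arithmetic with the figures above (just a verification sketch):

```c
#include <stdio.h>

int main(void) {
    /* Figures from the comment above. */
    double i486_transistors = 1.2e6,  i486_flops_per_cycle = 0.128;
    double gpu_transistors  = 45.9e9, gpu_flops_per_cycle  = 16896.0;

    double i486_eff = i486_flops_per_cycle / i486_transistors;
    double gpu_eff  = gpu_flops_per_cycle  / gpu_transistors;
    printf("FLOPs/cycle per transistor: 486 %.3g, GPU %.3g (%.2fx)\n",
           i486_eff, gpu_eff, gpu_eff / i486_eff);
    return 0;
}
```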
Correct me if I am wrong, but isn't that what was tried with the Intel Itanium processor line? Only the smarter compilers and assemblers never quite got there.
Optimizing compiler technology was still in the stone age (arguably still is) when Itanium was released. LLVM had just been born, and GCC didn't start using SSA until 2005. E-graphs were unheard of in the context of compiler optimization.
That said, yesterday I saw gcc generate 5 KB of mov instructions because it couldn't gracefully handle a particular vector size so I wouldn't get my hopes up...
I'm truly surprised that later versions of AIX like 4.3 can't be run on the ANS. How different is the close-to-Power-Mac hardware from real IBM hardware? I wonder...
This reminds me that I need to recap my close-to-ANS hardware Power Mac 9600...
Not especially similar, aside from the bus and CPU. IBM hardware of that era was straight-up PReP and later straight-up CHRP, but Apple never adopted either for Old World Macs, and even New World Macs are an incompatible mix of the two.
It's a mystery what Apple was thinking, but I suspect almost zero external customers actually bought an ANS. Apple probably used it internally (dogfood), and maybe some external partners got one.
Sorta like CHRP "Windows NT on PPC", it was printed on the CD-ROM, but the machines were never actually sold to the public.
Also edit to clarify: OP is about porting Doom to an old UNIX, not the usual small computer. The ANS was a big computer. (OP says IBM ported Quake.)
(author) The ANS 500 I have and that this was developed on was purchased by the University I used to work for to serve as the bookstore inventory management system. The vendor refused to support it anymore after Apple cancelled the line, but it was purchased retail, minus Apple's academic discount, of course. The University had no particular relationship with Apple otherwise.
Apple did use ANSes for many years after they were discontinued. Austin had a group of Shiners still in service for internal use as late as 2005.
The ANS does have "big computer" I/O options but it's still descended from the Power Mac 9500, which is its closest relative. Harpoon AIX has a lot of changes to support the different hardware.
> They definitely could have made avx512 instructions trigger a switch to p-cores,
That'd be an OS thing.
This is a problem that has been solved in the mainframe / supercomputing world and which was discussed in the BSD world a quarter of a century ago. It's simple, really.
Each CPU offers a list of supported features (cpuctl identify), and the scheduler keeps track of whether a program advertises use of certain features. If it does want features that some CPUs don't support, that process can't be scheduled on those CPUs.
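A minimal sketch of that scheduling rule (the feature names and bitmask layout here are illustrative, not any real kernel's API):

```c
/* Each CPU advertises a feature bitmask, each process declares the features
   it intends to use, and the scheduler only considers CPUs whose mask is a
   superset of the process's requirements. */
#include <stdbool.h>
#include <stdint.h>

enum {
    FEAT_SSE2   = 1u << 0,
    FEAT_AVX2   = 1u << 1,
    FEAT_AVX512 = 1u << 2,
};

static bool can_run_on(uint32_t cpu_features, uint32_t proc_required)
{
    /* eligible iff every required feature bit is present on this CPU */
    return (cpu_features & proc_required) == proc_required;
}
```

A process that advertises FEAT_AVX512 then simply never becomes eligible for a CPU whose mask lacks that bit, which is the AVX-512/P-core situation discussed above.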
I remember thinking about this way back when dual Nintendo-cartridge Pentium motherboards came out. To experiment, I ran a Pentium III and a Celery on an adapter card, which, like talking about self-hosting email, infuriated people who told me it can't be done. Different clock speeds, different CPU features, et cetera: it worked, and worked well enough to make me wonder what scheduler changes would let those different features be used properly.
A square that's one thousand units by one thousand units doesn't give a rational number, much less an integer one, for the diagonal.
A 9" CRT would never be precisely 9", because beam trace width and height are analog, plus there's overscan, so a 9" screen would simply give something pretty close to 9".
It looks like it's just the HN submitted title which is wrong (currently "Why the Original Macintosh Had a Screen Resolution of 512×324"). The article's title is "Why the Original Macintosh Had a Screen Resolution of 512×342", and "324" doesn't appear anywhere on the page.
Worth adding? The (almost [1]) omnipresent menu bar ate 20 pixels of vertical space as well, so you could say applications had 322 usable rows.
I don't think software engineers were independently looking at emissions data and unilaterally decided to "fix" the emissions shortcomings in software. I think they were told by others to do that. It's good that Germany is going after the people who decided that fraud was the answer.
> It's good that Germany is going after the people who decided that fraud was the answer
When the VW scandal broke, the US indicted seven senior executives. None of these seven were extradited to the US to stand trial [1].
The VW scandal was made public in 2015 [2] and involved cheating since 2009. Sentencing only two executives to jail, a decade after their wrongdoing made international news, does not send a strong message.
I don't think it is common, plenty of rich folks out there who do just fine.
We just hear about the bad ones.
I grew up in a smaller town in the Midwest. There were some neighbors, a bunch of old guy friends living in post-WWII-era baby boomer houses just a few blocks down. Nice guys: small-town attorneys, politicians, small businessmen who ran some very humble businesses, etc. They all drove 10-year-old basic cars, golfed together on men's night, and mowed their own lawns until they couldn't anymore.
It wasn't until I was older that I realized that they were all on the board of a local community bank that they started long ago. Over the years the bank grew, absorbed other banks.
Every one of them was worth somewhere in the dozens of millions of dollars.
In the US, there are 800 billionaires. There are 5.5 million people who are millionaires by liquid assets alone, and at least another 17 million by net worth from things like retirement accounts or home value.
Odds are pretty decent you know one of them, or know someone who does, and probably don't even realize it.
I think it's because there's nothing left to do except amass more money and power. Like most billionaires can do whatever they want, but they keep working, or are in the public eye. Why?
Mozilla continues to appear to not get the philosophies behind open source. If they really wanted to help people and not simply try to get market share and make money, they'd examine ways to make Pocket itself open source, including the server end of things.
"We're handing this over to a non-profit" would be nice.
Does it surprise anyone that Verizon's Department of Evil would lie, cheat, bribe, steal, or otherwise do whatever it can to not be held to account for its agreements?
I'm not surprised, just as I won't be surprised if Verizon gets their way after throwing a few million dollars in the direction of Trump's "Library" or whatever.
> How long Netatalk will be able to support AFP remains to be seen however, since it too is based on the protocol itself. Since Apple removed native core AFP support from macOS, even third-party AFP products may no longer work.
> AFP has served Apple well. It was simple and easy to use - and it was reliable. But since we live in a TCP/IP and Windows-based world now, it has outlived its usefulness.
What? Huh?
Since when does an open source project somehow stop working because an OS stops supporting whatever the project does?
Netatalk may very well become MORE relevant, because it may be the only way for Macs running the newest macOS to interact with older Macs.
And "TCP/IP and Windows-based"? Is this AI generated slop, or just a really bad author who doesn't understand technology? AFP has been able to use TCP/IP since at lease System 7.6.
Sigh.
It's sad, in part because it brought so many generations of Macs together. I have an iMac G3 motherboard built into a Tonka truck that runs Mac OS X 10.4 Tiger and acts as a file server, supporting everything from m68k machines running System 7.6.1 all the way through Arm Macs running Sequoia 15.5. It's a good thing Netatalk exists!
There are some strange passages in this, such as here where it suddenly decides to bring up the man page and how to exit man:
> There's an NFS app for macOS called NFS Manager from Germany's Marcel Bresink.
> On pre-15.5 Macs, see the Terminal AFP command mount_afp by opening Terminal and typing:
> man mount_afp and pressing Return on your keyboard. To exit the man system, press Control-Z or the q key.
> Several third-party NAS vendors, such as Synology and others, include AFP support in their products, but that's likely to come to an end soon too.
(Not clear why it would be coming to an end if they’re based on Linux!)
The cached headline I saw on Mastodon also called it “depreciated”.
Losing AFP sucks, because macOS's SMB support continues to be abysmally slow, and really needs Apple's undocumented proprietary SMB extensions to work halfway decently. Lately I've been accessing my SMB shares (from both Samba and Windows 11) through Cyberduck, because Finder is just unbearably slow and gets tripped up on file permissions for no reason. Deprecated or not, Netatalk will be more important than ever if users need a protocol that just works.