One of the biggest problems with CPUs is legacy. Tie yourself to any legacy, and now you're spending millions of transistors to make sure some way that made sense ages ago still works.
Just as a thought experiment, consider the fact that the i80486 has 1.2 million transistors. An eight core Ryzen 9700X has around 12 billion. The difference in clock speed is roughly 80 times, and the difference in number of transistors is 1,250 times.
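A quick back-of-the-envelope check of those two ratios (a sketch; the 66 MHz and ~5.3 GHz clock figures are assumptions on my part, and the transistor ratio is taken per core):

```c
#include <stdio.h>

int main(void) {
    /* Assumed round figures: a 66 MHz i80486DX2 and an 8-core Ryzen 9700X
       boosting to roughly 5.3 GHz. */
    double i486_transistors  = 1.2e6;
    double ryzen_transistors = 12e9;
    double i486_clock_hz     = 66e6;
    double ryzen_clock_hz    = 5.3e9;
    int    ryzen_cores       = 8;

    /* The 1,250x figure is per core: (12e9 / 8) / 1.2e6. */
    printf("clock ratio:               ~%.0fx\n", ryzen_clock_hz / i486_clock_hz);
    printf("transistor ratio per core: ~%.0fx\n",
           (ryzen_transistors / ryzen_cores) / i486_transistors);
    return 0;
}
```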
These are wild generalizations, but let's ask ourselves: If a Ryzen takes 1,250 times the transistors for one core, does one core run 1,250 times faster (even taking hyperthreading into account) than an i80486 at the same clock? 500 times? 100 times?
It doesn't, because massive amounts of those transistors go to keeping things in sync, dealing with changes in execution, folding instructions, decoding a horrible instruction set, et cetera.
So what might we be able to do if we didn't need to worry about figuring out how long our instructions are? Didn't need to deal with Spectre and Meltdown issues? If we made out-of-order work in ways where much more could be in flight and the compilers / assemblers would know how to avoid stalls based on dependencies, or how to schedule dependencies? What if we took expensive operations, like semaphores / locks, and built solutions into the chip?
Would we get to 1,250 times faster for 1,250 times the number of transistors? No. Would we get a lot more performance than we get out of a contemporary x86 CPU? Absolutely.
Modern CPUs don't actually execute the legacy instructions; they execute core-native instructions and have a piece of silicon dedicated to translating the legacy instructions into them. That piece of silicon isn't that big. Modern CPUs use more transistors because transistors are a lot cheaper now; e.g., the i486 had 8 KiB of cache, while the Ryzen 9700X has >40 MiB. The extra transistors don't make it linearly faster, but they make it faster enough to be worth it when transistors are cheap.
Modern CPUs also have a lot of things integrated into the "CPU" that used to be separate chips. The i486 didn't have on-die memory controllers, PCI controllers, etc., and those things were themselves less complicated then (e.g. a single memory channel and a shared peripheral bus for all devices). The i486SX didn't even have a floating point unit. The Ryzen 9000 series die contains an entire GPU.
> If a Ryzen takes 1,250 times the transistors for one core, does one core run 1,250 times faster (even taking hyperthreading into account) than an i80486 at the same clock? 500 times? 100 times?
Would be interesting to see a benchmark on this.
If we restricted it to 486 instructions only, I'd expect the Ryzen to be 10-15x faster. The modern CPU will perform out-of-order execution, running some instructions in parallel even in single-core, single-threaded code, not to mention superior branch prediction and more cache.
If you allowed modern instructions like AVX-512, then the speedup could easily be 30x or more.
> Would we get to 1,250 times faster for 1,250 times the number of transistors? No. Would we get a lot more performance than we get out of a contemporary x86 CPU? Absolutely.
I doubt you'd get significantly more performance, though you'd likely gain power efficiency.
Half of what you described in your hypothetical instruction set is already implemented in ARM.
I meant a comparison on a clock-for-clock level. In other words, imagine either the 486 running at the clock speed of a Ryzen, or the Ryzen running at the clock speed of the 486. Put another way: compare ONLY IPC.
The line I was commenting on said:
> If a Ryzen takes 1,250 times the transistors for one core, does one core run 1,250 times faster (even taking hyperthreading into account) than an i80486 at the same clock?
In terms of FLOPS, Ryzen is ~1,000,000 times faster than a 486.
For serial, branchy code it isn't a million times faster, but that has almost nothing to do with legacy and everything to do with the nature of serial code: architecture and transistor counts can only improve serial execution sublinearly; the big gains come from Dennard scaling.
It is worth noting, though, that purely via Dennard scaling, Ryzen is already >100x faster! And via architecture (those transistors) it is several multiples beyond that.
In general compute, if you could clock it down to 33 or 66 MHz, a Ryzen would still be much faster than a 486, due to using those transistors for ILP (instruction-level parallelism) and TLP (thread-level parallelism). But you won't see any TLP in a single serial program that a 486 would have been running, and you won't get any of the SIMD benefits either, so you won't get anywhere near that in practice on 486 code.
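A rough decomposition of where the factor comes from (a sketch; the clock figures are assumptions, and the ~1,000,000x FLOPS ratio is the one quoted above):

```c
#include <stdio.h>

int main(void) {
    /* Assumptions: a 66 MHz 486, a Ryzen boosting to ~5.5 GHz, and the
       ~1,000,000x FLOPS ratio quoted earlier in the thread. */
    double i486_clock_hz  = 66e6;
    double ryzen_clock_hz = 5.5e9;
    double flops_ratio    = 1e6;

    double clock_ratio = ryzen_clock_hz / i486_clock_hz; /* Dennard-style scaling */
    double arch_ratio  = flops_ratio / clock_ratio;      /* cores, SIMD, FMA, ILP */

    printf("from clock frequency alone:    ~%.0fx\n", clock_ratio);
    printf("from architecture/parallelism: ~%.0fx\n", arch_ratio);
    return 0;
}
```

Roughly 80-170x comes from frequency alone (depending on which 486 you pick), and the remaining factor of several thousand comes from the transistors spent on cores, SIMD width, FMA, and ILP.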
The key to contemporary high performance computing is having more independent work to do, and organizing the data/work to expose the independence to the software/hardware.
That's basically x86 without 16- and 32-bit support, no real mode, etc.
The CPU starts initialized in 64-bit mode without all that legacy crap.
That's IMO a great idea. I think every few decades we need to stop and think again about what works best, and either take a fresh start or drop some unused legacy features.
RISC-V has only a mandatory base set of instructions, as little as possible to be Turing complete, and everything else is an extension that can (theoretically) be removed in the future.
This could also be used to remove legacy parts without disrupting the architecture.
Would be interesting to compare transistor count without L3 (and perhaps L2) cache.
A 16-core Zen 5 CPU achieves more than 2 TFLOPS FP64, so number-crunching performance scaled very well.
It is weird that the best consumer GPU can only do 4 TFLOPS. Some years ago GPUs were an order of magnitude or more faster than CPUs. Today's GPUs are likely to be artificially limited.
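A rough sanity check of the 2 TFLOPS figure (a sketch; the two 512-bit FMA pipes per core and the ~4.3 GHz all-core clock are assumptions on my part):

```c
#include <stdio.h>

int main(void) {
    /* Back-of-the-envelope peak FP64 for a 16-core Zen 5 part.
       Assumptions: two 512-bit FMA pipes per core, ~4.3 GHz all-core clock. */
    int    cores         = 16;
    int    fma_pipes     = 2;    /* per core */
    int    fp64_lanes    = 8;    /* 512-bit / 64-bit */
    int    flops_per_fma = 2;    /* multiply + add */
    double clock_hz      = 4.3e9;

    double flops_per_cycle = (double)cores * fma_pipes * fp64_lanes * flops_per_fma;
    printf("peak: ~%.1f TFLOPS FP64\n", flops_per_cycle * clock_hz / 1e12);
    return 0;
}
```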
> A 16-core Zen 5 CPU achieves more than 2 TFLOPS FP64, so number-crunching performance scaled very well.
These aren't realistic numbers in most cases, because you're almost always limited by memory bandwidth, and even if memory bandwidth is not an issue you'll have to worry about thermals. The theoretical CPU compute ceiling is almost never the real bottleneck. GPUs have a very different architecture, with much higher memory bandwidth and with their chips running a lot slower and cooler (lower clock frequency), so they can reach much higher numbers in practical scenarios.
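To put a number on the bandwidth point (a sketch; the dual-channel DDR5-5600 figure of ~89.6 GB/s is an assumption): unless a kernel does a couple hundred FLOPs per double it fetches from DRAM, the memory bus, not the FMA units, is the limit.

```c
#include <stdio.h>

int main(void) {
    /* Required arithmetic intensity to keep ~2 TFLOPS FP64 fed from main memory.
       Assumption: dual-channel DDR5-5600, ~89.6 GB/s theoretical bandwidth. */
    double peak_flops   = 2e12;
    double mem_bw_bytes = 89.6e9;

    double flop_per_byte = peak_flops / mem_bw_bytes;
    printf("needed: ~%.0f FLOPs per byte (~%.0f FLOPs per 8-byte double)\n",
           flop_per_byte, flop_per_byte * 8.0);
    return 0;
}
```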
But history showed exactly the opposite: if you don't have an already existing software ecosystem, you are dead. The transistors spent implementing x86 peculiarities are very much worth it if people in the market want x86.
GPUs scaled wide, with a number of transistors per core similar to a 486 and just lots more cores: thousands to tens of thousands of them, averaging out to maybe 5 million transistors per core.
CPUs scaled tall, with specialized instructions to make the single thread go faster. No, the amount of work done per transistor does not scale anywhere near linearly; very many of the transistors are dark on any given cycle, compared to a much simpler core that will have much higher utilization.
> Didn't need to deal with Spectre and Meltdown issues? If we made out-of-order work in ways where much more could be in flight and the compilers / assemblers would know how to avoid stalls based on dependencies, or how to schedule dependencies? What if we took expensive operations, like semaphores / locks, and built solutions into the chip?
I'm pretty sure that these goals will conflict with one another at some point. For example, the way one solves Spectre/Meltdown issues in a principled way is by changing the hardware and system architecture to have some notion of "privacy-sensitive" data that shouldn't be speculated on. But this will unavoidably limit the scope of OOO and the number of instructions that can be "in-flight" at any given time.
For that matter, with modern chips, semaphores/locks are already implemented with hardware builtin operations, so you can't do that much better. Transactional memory is an interesting possibility but requires changes on the software side to work properly.
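To illustrate "already implemented with hardware builtin operations": a minimal sketch of a lock whose hot path is a single hardware atomic instruction, using C11 atomics (real mutexes add futex-style sleeping, fairness, etc. on top):

```c
/* Minimal spinlock sketch: the hot path is one atomic exchange, which the
   compiler lowers to LOCK XCHG on x86 or an LDAXR/STLXR pair on AArch64. */
#include <stdatomic.h>

typedef struct {
    atomic_flag held;
} spinlock_t;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_lock(spinlock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* spin; a real implementation would pause/yield here */
    }
}

static void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```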
If you have a very large CPU count, then I think you could dedicate a CPU to processing only a given, designated privacy/security-focused execution thread. Especially for a specially designed syscall, perhaps.
That kind of takes the Spectre/Meltdown thing out of the way to some degree, I would think, although privilege elevation can happen in the darndest places.
The real issue with complex instruction decoding is that it's hard to make the decode stage wider, and at some point this will limit the usefulness of a bigger chip. For instance, AArch64 chips tend to have wider decode than their close x86_64 equivalents.
CPUs can’t do that, but legacy is irrelevant. They just don’t have enough parallelism to leverage all these extra transistors. Let’s compare the 486 with a modern GPU.
Intel 80486 with 1.2M transistors delivered 0.128 flops / cycle.
nVidia 4070 Ti Super with 45.9B transistors delivers 16896 flops / cycle.
As you see, each transistor became 3.45 times more efficient at delivering these FLOPs per cycle.
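Spelling out that per-transistor arithmetic with the figures above (just a verification sketch):

```c
#include <stdio.h>

int main(void) {
    /* Figures from the comment above. */
    double i486_transistors = 1.2e6,  i486_flops_per_cycle = 0.128;
    double gpu_transistors  = 45.9e9, gpu_flops_per_cycle  = 16896.0;

    double i486_eff = i486_flops_per_cycle / i486_transistors;
    double gpu_eff  = gpu_flops_per_cycle  / gpu_transistors;
    printf("FLOPs/cycle per transistor: 486 %.3g, GPU %.3g (%.2fx)\n",
           i486_eff, gpu_eff, gpu_eff / i486_eff);
    return 0;
}
```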
Correct me if I am wrong, but isn't that what was tried with the Intel Itanium processor line? Only the smarter compilers and assemblers never quite got there.
Optimizing compiler technology was still in the stone age (arguably still is) when Itanium was released. LLVM had just been born, and GCC didn't start using SSA until 2005. E-graphs were unheard of in the context of compiler optimization.
That said, yesterday I saw gcc generate 5 KB of mov instructions because it couldn't gracefully handle a particular vector size so I wouldn't get my hopes up...
I'm truly surprised that later versions of AIX like 4.3 can't be run on the ANS. How different is the close-to-Power-Mac hardware from real IBM hardware? I wonder...
This reminds me that I need to recap my close-to-ANS hardware Power Mac 9600...
Not especially similar, aside from the bus and CPU. IBM hardware of that era was straight-up PReP and later straight-up CHRP, but Apple never adopted either for Old World Macs, and even New World Macs are an incompatible mix of the two.
It's a mystery what Apple was thinking, but I suspect almost zero external customers actually bought an ANS. Apple probably used it internally (dogfood), and maybe some external partners got one.
Sorta like CHRP "Windows NT on PPC", it was printed on the CD-ROM, but the machines were never actually sold to the public.
Also edit to clarify: OP is about porting Doom to an old UNIX, not the usual small computer. The ANS was a big computer. (OP says IBM ported Quake.)
(author) The ANS 500 I have and that this was developed on was purchased by the University I used to work for to serve as the bookstore inventory management system. The vendor refused to support it anymore after Apple cancelled the line, but it was purchased retail, minus Apple's academic discount, of course. The University had no particular relationship with Apple otherwise.
Apple did use ANSes for many years after they were discontinued. Austin had a group of Shiners still in service for internal use as late as 2005.
The ANS does have "big computer" I/O options but it's still descended from the Power Mac 9500, which is its closest relative. Harpoon AIX has a lot of changes to support the different hardware.
> They definitely could have made avx512 instructions trigger a switch to p-cores,
That'd be an OS thing.
This is a problem that has been solved in the mainframe / supercomputing world and which was discussed in the BSD world a quarter of a century ago. It's simple, really.
Each CPU offers a list of supported features (cpuctl identify), and the scheduler keeps track of whether a program advertises use of certain features. If it does want features that some CPUs don't support, that process can't be scheduled on those CPUs.
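A minimal sketch of that scheduling rule (the feature names and bitmask layout here are illustrative, not any real kernel's API):

```c
/* Each CPU advertises a feature bitmask, each process declares the features
   it intends to use, and the scheduler only considers CPUs whose mask is a
   superset of the process's requirements. */
#include <stdbool.h>
#include <stdint.h>

enum {
    FEAT_SSE2   = 1u << 0,
    FEAT_AVX2   = 1u << 1,
    FEAT_AVX512 = 1u << 2,
};

static bool can_run_on(uint32_t cpu_features, uint32_t proc_required)
{
    /* eligible iff every required feature bit is present on this CPU */
    return (cpu_features & proc_required) == proc_required;
}
```

A process that advertises FEAT_AVX512 then simply never becomes eligible for a CPU whose mask lacks that bit, which is the AVX-512/P-core situation discussed above.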
I remember thinking about this way back when dual Nintendo-cartridge Pentium motherboards came out. To experiment, I ran a Pentium III and a Celery on an adapter card, which, like talking about self-hosting email, infuriated people who told me it can't be done. Different clock speeds, different CPU features, et cetera: it worked, and worked well enough to make me wonder what scheduler changes would let those different features be used properly.
A square that's one thousand units by one thousand units doesn't give a rational number, much less an integer one, for the diagonal.
A 9" CRT would never be precisely 9", because beam trace width and height are analog, plus there's overscan, so a 9" screen would simply give something pretty close to 9".
It looks like it's just the HN submitted title which is wrong (currently "Why the Original Macintosh Had a Screen Resolution of 512×324"). The article's title is "Why the Original Macintosh Had a Screen Resolution of 512×342", and "324" doesn't appear anywhere on the page.
Worth adding? The (almost [1]) omnipresent menu bar ate 20 pixels of vertical space as well, so you could say applications had 322 usable rows.
I don't think software engineers were independently looking at emissions data and unilaterally decided to "fix" the emissions shortcomings in software. I think they were told by others to do that. It's good that Germany is going after the people who decided that fraud was the answer.
> It's good that Germany is going after the people who decided that fraud was the answer
When the VW scandal broke, the US indicted seven senior executives. None of these seven were extradited to the US to stand trial [1].
The VW scandal was made public in 2015 [2] and involved cheating since 2009. Sentencing only two executives to jail, a decade after their wrongdoing made international news, does not send a strong message.
I don't think it is common, plenty of rich folks out there who do just fine.
We just hear about the bad ones.
I grew up in a smaller town in the Midwest. There were some neighbors, a bunch of old guy friends living in post-WWII-era baby boomer houses just a few blocks down. Nice guys: small-town attorneys, politicians, small businessmen who ran some very humble businesses, etc. They all drove 10-year-old basic cars, golfed together on men's night, and mowed their own lawns until they couldn't anymore.
It wasn't until I was older that I realized that they were all on the board of a local community bank that they started long ago. Over the years the bank grew, absorbed other banks.
Every one of them was worth somewhere in the dozens of millions of dollars.
In the US, there are 800 billionaires. There are 5.5 million people who are millionaires by liquid assets alone, and at least another 17 million by net worth from things like retirement accounts or home value.
Odds are pretty decent you know one of them, or know someone who does, and probably don't even realize it.
I think it's because there's nothing left to do except amass more money and power. Like most billionaires can do whatever they want, but they keep working, or are in the public eye. Why?
Mozilla continues to appear to not get the philosophies behind open source. If they really wanted to help people and not simply try to get market share and make money, they'd examine ways to make Pocket itself open source, including the server end of things.
"We're handing this over to a non-profit" would be nice.
Does it surprise anyone that Verizon's Department of Evil would lie, cheat, bribe, steal, or otherwise do whatever it can to not be held to account for its agreements?
I'm not surprised, just as I won't be surprised if Verizon gets their way after throwing a few million dollars in the direction of Trump's "Library" or whatever.
> How long Netatalk will be able to support AFP remains to be seen however, since it too is based on the protocol itself. Since Apple removed native core AFP support from macOS, even third-party AFP products may no longer work.
> AFP has served Apple well. It was simple and easy to use - and it was reliable. But since we live in a TCP/IP and Windows-based world now, it has outlived its usefulness.
What? Huh?
Since when does an open source project somehow stop working because an OS stops supporting whatever the project does?
Netatalk may very well become MORE relevant, because it may be the only way for Macs running the newest macOS to interact with older Macs.
And "TCP/IP and Windows-based"? Is this AI generated slop, or just a really bad author who doesn't understand technology? AFP has been able to use TCP/IP since at lease System 7.6.
Sigh.
It's sad, in part because it brought so many generations of Macs together. I have an iMac G3 motherboard built into a Tonka truck that runs Mac OS X 10.4 Tiger and acts as a file server, supporting everything from m68k machines running System 7.6.1 all the way through Arm Macs running Sequoia 15.5. It's a good thing Netatalk exists!
There are some strange passages in this, such as here where it suddenly decides to bring up the man page and how to exit man:
> There's an NFS app for macOS called NFS Manager from Germany's Marcel Bresink.
> On pre-15.5 Macs, see the Terminal AFP command mount_afp by opening Terminal and typing:
> man mount_afp and pressing Return on your keyboard. To exit the man system, press Control-Z or the q key.
> Several third-party NAS vendors, such as Synology and others, include AFP support in their products, but that's likely to come to an end soon too.
(Not clear why it would be coming to an end if they’re based on Linux!)
The cached headline I saw on Mastodon also called it “depreciated”.
Losing AFP sucks, because macOS's SMB support continues to be abysmally slow, and really needs Apple's undocumented proprietary SMB extensions to work halfway decently. Lately I've been accessing my SMB shares (from both Samba and Windows 11) through Cyberduck, because Finder is just unbearably slow and gets tripped up on file permissions for no reason. Deprecated or not, Netatalk will be more important than ever if users need a protocol that just works.