For years I've had this impression that Intel CPUs were, to put it simply, trying too hard. I administer servers for various companies, and some use Intel even though I generally recommend AMD or non-x86.
A pattern I've noticed is that some of the AMD systems I administer have never crashed or panicked. Several are almost ten years old and have had years of continuous uptime. Some have had panics traced to failing hardware (bad memory, storage, power supply), but none has become unstable without the underlying cause eventually being discovered.
Intel systems, on the other hand, have had panics with no explanation, no underlying hardware failure, and no discernible pattern. Multiple systems, running an OS and software bit-for-bit identical to what runs on the AMD systems, have panicked. Whereas some of the AMD systems that had bad memory had consumer motherboards with non-ECC memory, the Intel systems have typically been Supermicro or Dell "server" systems with ECC.
In one case, two identical Supermicro Xeon D systems with ECC were paired with two identical Steamroller (pre-Ryzen) AMD systems. All four provided primary and backup NAT, routing, firewalling, DNS, et cetera. The Xeon systems were put in place after the AMD systems because certain people wanted "server grade" hardware, which is understandable, and low-power AMD server systems weren't a thing at the time. Over the course of several years, the Xeon systems had random panics. One of the AMD systems had a failed SSD but no unplanned or unexplained panic or outage, and the other never had a panic or unplanned reboot in all the years it was in continuous service.
Had I collected information more deliberately from the very beginning of these side-by-side AMD and Intel installations, I'd have something more than anecdotes, but I'm comfortable calling the conclusion real: multiple generations of Intel systems, even with server hardware and ECC, have issues with random crashes and panics, on the order of perhaps one every year or two. I do not see similar instability on AMD.
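For what it's worth, the bookkeeping I skipped is simple. Here is a minimal sketch in Python with entirely hypothetical numbers (none of these figures come from my systems). It turns a count of unexplained panics and the accumulated system-years into a rate. Then, assuming panics arrive independently at a constant rate (a Poisson assumption, which real hardware may not follow), it gives the chance that a machine with that rate would show zero panics over a given span.

```python
import math


def panics_per_system_year(panic_count, system_years):
    """Unexplained panics divided by total observed system-years."""
    if system_years <= 0:
        raise ValueError("system_years must be positive")
    return panic_count / system_years


def prob_zero_panics(rate_per_year, years):
    """Chance of seeing no panics over `years`, under a Poisson assumption."""
    return math.exp(-rate_per_year * years)


# Hypothetical log: 3 unexplained panics across 6 system-years of uptime.
rate = panics_per_system_year(panic_count=3, system_years=6.0)
print(f"{rate:.2f} panics per system-year")
print(f"{prob_zero_panics(rate, 5.0):.3f} chance of a clean 5-year run at that rate")
```

If the clean-run probability at the Intel rate comes out small, a spotless AMD machine over the same span is less likely to be plain luck, though with only a handful of systems it is still weak evidence.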
With brand new Intel CPUs taking substantially more power than similarly performing AMD CPUs, we have a more literal example of what I think is the underlying cause: Intel is trying way too hard to get every tiny bit of performance out of their CPUs, often to the detriment of the overall balance of the system. Between the noticeably higher number of CPU vulnerabilities on Intel, which stem from shortcuts whose cost shows up as the performance lost when mitigations are enabled, and the rather shocking power draw of stock Intel CPUs with turbo boosting enabled, I can't recommend any Intel system for any use where stability matters.
On the other hand, I'm currently dealing with an AMD system which seems to randomly hard reboot every couple of days, as if someone pressed the power button for 5 seconds.
It could keep running for 70 hours, or it could crash twice in 4 hours. Stress-testing CPU, GPU, memory, and storage doesn't provoke a crash, but it'll crash when all I'm running is a single Firefox tab with HN open.
Maybe I got unlucky, or maybe you got lucky. Who knows, really.
I'm one of those people who will go down the rabbit hole and not come back until I've figured it out, else I would never trust a flaky system again. This happened with one system where memory was failing, but only in a very specific region that happened to be where the OS stored filesystem structures, so the filesystem would become corrupt after a certain amount of use. Since it didn't look like anything memory related, it took weeks of trying different things before I decided to test the memory.
On most computers, holding the power button for five seconds turns off the power, whereas a hard reboot is more like someone pressing the reset button. If it were me, I'd try the following, in order: check that you have the correct version of your BIOS (I accidentally loaded the latest BIOS on a system with a Ryzen 2600, and it made memory transfers unhappy); use another power supply for a while; run another OS, like NetBSD, from an external drive and do lots of test compiles; clock the RAM one or two steps slower; then remove all but one DIMM and test with each DIMM on its own.
> On the other hand, I'm currently dealing with an AMD system which seems to randomly hard reboot every couple of days, as if someone pressed the power button for 5 seconds.
I actually had this with my HTPC in a Silverstone case. It would reboot several times shortly after booting, at random. It turned out the power button was defective. Who would have guessed?