Price/GB of DRAM hasn't actually fallen much over the past 10 years of progression.[1] LPDDR is still over $3/GB. UDIMM is still ~$3/GB, which is about the same as in 2010/2011. i.e. despite what you may have heard about the DRAM price collapse in 2019, the price floor of DRAM has been pretty much the same over the past 10 years.
Every other kind of silicon has gotten cheaper - NAND, ICs - just not DRAM. And yet our need for DRAM is forever increasing, from in-memory datastores on servers to mobile phones with cameras shooting rapid 4K images.
Compared to NAND, or foundries like TSMC, there are clear roadmaps for where cost is heading and what cost reduction we could expect in the next 5 years, along with other outlooks. There is nothing of the sort in DRAM. At least I don't see anything to suggest we could see $2/GB DRAM, if not even lower. I don't see how EUV is going to help either; there won't even be enough EUV TwinScan machines to go around for foundries in the next 3 years, let alone NAND and DRAM.
The only good news is that low/normal capacity ECC DRAM has finally fallen to ~$5/GB. (It used to be $10-20/GB.)
[1] https://secureservercdn.net/166.62.107.55/ff6.d53.myftpuploa...
I would not be surprised at all if it were to come out that Samsung, Micron, and other major players are price-fixing just like they have been caught doing multiple times in the past. They seem to pay their fines as a cost of doing business and then continue to operate like a cartel. Seems this is the same situation, and I don't doubt this is directly responsible for inflated DRAM pricing.
It is easy. For example, you could hire 10,000 people to perform calculations for your complex mathematical model (like it was done a century ago), or you can buy a Raspberry Pi to perform the same task in no time. Which will be cheaper?
This is a bad comparison. DRAM margins are razor thin and unlike transistors, DRAM has not gotten much smaller, which is the main cost savings gained from process improvements.
Instead, server and accelerator vendors want ever faster and higher performance DRAM, and so these performance gains trickle down to consumers, but nothing is driving price down.
People on here literally think Moore's law is a natural law, and that if computer hardware isn't getting 15% cheaper every year there must be funny business involved...
They have.
Giving the industry the benefit of the doubt after decades of price fixing is generous at best.
“To date, five manufacturers have pleaded guilty to their involvement in an international price-fixing conspiracy including Hynix, Infineon, Micron Technology, Samsung, and Elpida.”
It’s claimed to be a very high margin industry and supply is artificially constrained to maintain that margin.
I call it the economic fallacy. It's very common, especially on discussion forums like this.
Every problem is only related to how much competition there is, but not to how hard the underlying problem is.
It's somewhat related to the "awareness fallacy": The belief that every problem that humanity has can be solved if just everyone is aware and willing to act.
>DRAM margins are razor thin and unlike transistors, DRAM has not gotten much smaller, which is the main cost savings gained from process improvements.
Should we then expect SRAM to eventually be cheaper than DRAM?
In the end price discovery is supply/demand, and it appears investment in DRAM manufacturing is focused on improving RAM performance, and just keeping up with demand, not outpacing it.
Since the barrier to entry is very high, there's not so much pressure to compete further on price.
While it's true that the market ultimately determines pricing, the fact that manufacturers can stuff larger amounts of RAM into the newer equivalent SKUs means the price per GB for fast memory would still be driven down. This is even possible if new unit prices are higher, i.e. if this year's "entry-level" SKU costs a little more than last year's but now has more capacity.
The thing that's gotten smaller is the minimum feature size on a silicon wafer. The same types of etching and doping processes can be used to create many integrated circuits.
DRAM chips are integrated circuits that consist of individual transistors and capacitors for each bit, plus the wiring and logic to read, refresh, and write to those bits. They're substantially transistors.
It's true that logic, power, analog, flash, DRAM, SRAM, and mixed signal ICs do have some significant differences, and some manufacturers optimize for a subset of those capabilities, but they're similar enough that when one of those industries advances leaps and bounds (like Flash storage and low-power processing have), the others tend to benefit too.
> The thing that's gotten smaller is the minimum feature size on a silicon wafer.
No, even minimum feature size is improving much slower than in the past. Fabs are focusing on the specific areas that are still giving gains: lower power transistors, SRAM. Really high performance transistors, like those for amps, have not gotten much smaller; DRAM has not gotten much smaller; analog has not gotten much smaller.
The capacitors and sense amplifiers in DRAM have not gotten smaller nearly as fast as any of the other features.
> DRAM chips are integrated circuits that consist of individual transistors and capacitors for each bit, plus the wiring and logic to read, refresh, and write to those bits.
That’s a very satisfying answer. I’m aware that the processes for logic, flash, and DRAM have some significant differences but I don’t know much more than “differences exist” (e.g. and therefore you have a different die for CPU and flash).
My reasoning is, at best, “SSDs have gotten cheaper, SSDs are kind of like RAM, shouldn’t RAM get cheaper?” and I know that’s not exactly an expert opinion.
SSDs are non-volatile — they don’t need power and constant refreshing like DRAM does. So you can do different things based on a different heat and power budget, like going 3D and adding more layers that would kill regular ram or a cpu, etc.
One thing I'd like to understand better about DDR5 is how well the built-in ECC is going to work to improve reliability. DDR5 comes with "chip level ECC" [1] of which the main purpose is to be able to better sell highly complicated memory chips with minor defects.
But as a consequence as I understand, it will allow for the correction of single bit memory flips. With regular DDR4 or previous generations, you don't get any error correction. Any bit error in your DDR4 modules has the potential to corrupt data. If you want to be protected from that, you will need to get ECC memory.
Unfortunately, anything with "ECC" in hardware gets labeled with an "enterprise" sticker. And that means a certain price level, and a certain power consumption. (Yes, I know you can get Ryzen boxes that work with ECC, but that's still PC-sized hardware for hundreds of dollars.)
If DDR5 can bring error correction to the masses - like in single board computers, 10W NAS boxes, smartphones - that would be pretty cool. But I'm not sure whether my reading of that is correct.
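For a rough intuition of what "correct a single bit flip" means, here's a toy Hamming(7,4) code in Python - this is not the code DDR5 actually uses on-die (that works over much wider words), just an illustration of how a few extra parity bits can locate and repair any one flipped bit:

    def hamming74_encode(d):  # d = [d1, d2, d3, d4] data bits
        p1 = d[0] ^ d[1] ^ d[3]
        p2 = d[0] ^ d[2] ^ d[3]
        p4 = d[1] ^ d[2] ^ d[3]
        # codeword positions 1..7: p1 p2 d1 p4 d2 d3 d4
        return [p1, p2, d[0], p4, d[1], d[2], d[3]]

    def hamming74_correct(c):  # c = 7-bit codeword, returns the data bits
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity over positions 1,3,5,7
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity over positions 2,3,6,7
        s4 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity over positions 4,5,6,7
        syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if clean
        if syndrome:
            c = c[:]
            c[syndrome - 1] ^= 1         # flip it back
        return [c[2], c[4], c[5], c[6]]

    # flip any single bit of the codeword and the decoder still recovers the data
    data = [1, 0, 1, 1]
    cw = hamming74_encode(data)
    for i in range(7):
        corrupted = cw[:]
        corrupted[i] ^= 1
        assert hamming74_correct(corrupted) == data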
I expect Intel to still cripple some aspect of DDR5 ECC on consumer chips; maybe it will correct errors but the memory controller won't report them. Or maybe it's possible to disable ECC even though it's already implemented.
I also expect servers to use two levels of ECC to provide chipkill and also to keep server RAM more expensive than consumer.
Ryan Smith in the link above seems to suggest error correction will be done transparently anyway, so it seems like it won't be reported to the OS. So it doesn't look like Intel could cripple it even if they wanted to.
> Ryan Smith in the link above seems to suggest error correction will be done transparently anyway, and it won't be reported to the OS.
I once heard a rant from someone on how not reporting this to the OS is really bad for diagnosing issues, even soft errors that are auto-healed. (It could have been from Bryan Cantrill, but I couldn't say for sure.)
I do think that there will still be some people interested in ECC detection and reporting, but that's not really why I'm interested in ECC memory. The dividing line "enterprise = error correction with monitoring / non-enterprise = silent error correction" is much more sensible than "non-enterprise = no error correction at all, good luck" IMO.
Isn't the primary reason people are looking into diagnostics that it's very hard to determine whether ECC is working in the first place? Because it depends on the particular hardware setup? If the spec states that all DDR5 is supposed to have internal error correction anyway, then I'm happy to take for granted that error correction is working until I read about the scandals of non-spec cheap DDR5 :)
Yes, I do not run things at a scale that would need that, but I would appreciate at least a toggle to have it available if needed: default=quiet(er) would be fine for most cases.
One of the great ironies of modern computers is we dispensed with ECC just as we started ballooning out the size of RAM and shrinking the transistors so that single bit errors were more likely. I'd be very grateful for system ECC.
I wouldn’t be surprised if at some level that physics has forced the manufacturer’s hand—that previously low error rates are now unacceptable when you multiply them by 64GB.
I actually tried really hard to get RAM to corrupt a bit for a school project and didn't manage a single bit flip.
How often have you actually heard of data corruption due to non-ECC memory? Either yourself, any degree of 'friend of a friend', or perhaps a study that looked into the matter with more success than I had. I don't mean a newspaper story because exceptional cases are reported because they're rare exceptions rather than common enough that we'd be likely to come across it in our lifetimes.
e.g. "2009 Google's paper "DRAM Errors in the Wild: A Large-Scale Field Study" says that there can be up to 25000-75000 one-bit FIT per Mbit (failures in time per billion hours), which is equal to 1 - 5 bit errors per hour for 8GB of RAM after my calculations. Paper says the same: "mean correctable error rates of 2000–6000 per GB per year". "
> 1 - 5 bit errors per hour for 8GB of RAM after my calculations
That is way off from what I'm seeing. When launching Factorio I use 90% of my 8GB RAM and never once have I noticed data corruption, and I could tell you how many hours I've played but that would be embarrassing.
The test I did in school with heated-up RAM (the internet said that's when flips should occur more often) also wrote many many gigabytes without a single failure.
Not sure what hardware or temperatures that source is running but it's not DDR3/DDR4 at heats below hairdryer melting temperature because that's where I had to stop the experiment with zero failures.
I would have to find the paper again, but CPU caches can mask the error rate. The cached values can also overwrite any corruption with correct values. This has the interesting side effect of protecting commonly accessed data structures and function pointers from causing outright crashes. The same applies to commonly used values in a computation.
Unless you get a bit flip in a data structure pointer or function pointer, it just adds an error to a computation but does not outright crash.
Also, we are talking only a handful of errors out of billions of calculations.
-Edit
Also, swap space may keep very rarely accessed data from corruption, on the other end of the spectrum.
And there are ways to manipulate RAM access patterns to induce errors as described in the initial Rowhammer attack paper plus later RAMBleed papers. Hopefully this newer DDR version is designed to be resistant to this type of attack.
I've seen bit flips reported in edac-utils on a system with ECC memory. People routinely try to induce memory errors by overclocking their memory (to verify ECC is working). The triggering of bit flips is the very foundation of the Rowhammer attack (yes, I know Rowhammer can circumvent ECC with advanced techniques). Error correcting codes are used in networking environments, CPU caches, hard drives - everywhere but main memory.
Not sure why memory bit flips have the reputation of being such an edge case. It could be that it was an edge case 20 years ago, but it clearly isn't anymore. Computer memory has changed too.
If ECC is supposed to be a security measure then I can see the point, but aside from intentional flipping (by an attacker), a blanket statement like "it clearly isn't [an edge case] anymore" doesn't strike me as true. For it being a security measure, though, shouldn't it compute a much stronger checksum than one or two bits like ECC usually does?
I've experienced errors that likely propagated through memory errors.
I have a ZFS/NFS server on a 2012 i7 with 4gb RAM. I use it primarily to store various torrents (up to ~250GB each).
I have had my torrent client find single-chunk errors in a couple of torrents I was seeding (twice over a few TB worth of torrents). I recall reading ZFS filesystem over NFS is particularly prone to this. I did some worried searching and remember finding it was likely caused by memory errors being persisted to disk, but I don't have any links handy anymore.
I likely would not have noticed the corruption if the torrent client hadn't alerted me.
There was a great DEF CON talk about this. Basically, using unicode and registering domain names one bit flip away from real ones actually produced results, like email meant for Microsoft and some other major companies.
I think the title was dns squatting but can't find it at the moment
That was actually the inspiration for my experiment with this in school, and I also set up one or two domains to catch bit flips but never got any hits. It's a complete myth as far as I have been able to tell from my own attempts (research tells me otherwise by using huge setups, and there are a few commenters here that seem to have first-hand experience, some seeming more trustworthy than others, but clearly not a majority of people). I get that it's (obviously) more than a myth, but I'm not sure it deserves the goo-goo eyes that it seems to trigger with many engineers, either. It's a neat feature slightly above gimmick status but gets way more attention.
I've seen a ton of it in the field. In general the ram stability is so bad that large operating systems fail to boot from corruption, so most of the time it doesn't get to the data corruption phase.
What I read around is that it's not the enterprise ECC; it's more akin to the ECC bits used in flash memory. It'll allow manufacturers to play fast and loose with memory.
I guess what I don't understand then is what big advantage "enterprise ECC" has left over this DDR5 "non-enterprise ECC". (Seriously, why is ECC "enterprise"? Everybody wins with memory error correction.) If regular DDR5 can correct single bitflips, it is on par in correction capabilities with "enterprise ECC" DDR4.
Maybe this won't allow for the detection of multiple flips, and maybe won't even report single bit flips to the OS (it'll just fix them silently). I suppose there's no big need to support detection and reporting for the vast majority of use cases. Ryan Smith at Anandtech in the link above says as much: "Between the number of bits per chip getting quite high, and newer nodes getting successively harder to develop, the odds of a single-bit error is getting uncomfortably high. So on-die ECC is meant to counter that, by transparently dealing with single-bit errors."
But for my purposes, if just the correction capabilities are on par with DDR4 ECC I'd be absolutely fine with that. And I guess that goes for many people. Even while using ECC memory now at home, I'm not monitoring the correction statistics and I'm guessing few people do in general. It might as well be silent today if you ask me.
> I guess what I don't understand then is what big advantage "enterprise ECC" has left over this DDR5 "non-enterprise ECC"
ECC should be end-to-end, so it detects and (hopefully) corrects errors anywhere along the path, not just within a chip.
Step 1 of handling a lot of ECC correction events is to reseat the DIMM, because often it's just an issue with the connection, not actually a memory defect.
And you may not care too much about reports of correction events, but you definitely want to see correction failures reported - the point is, after all, to avoid corruption.
Most servers have chipkill ECC that can survive an entire 4-bit chip going bad so that's more powerful than classic SECDED. I don't know how often chipkill kicks in though.
I don't know what "enterprise ECC" means, but there are certainly grades of protection, from single bit error detection (parity) thru triple error correct quadruple error detect, and inline vs non-inline correction schemes. (for the latter, the machine has to stop, go back, fix the error, & resume, potentially at significant performance cost)
If you could point me to a <20W machine with ECC memory that I can buy online like a mere mortal human person, then I'd be very interested!
The only thing that comes to mind in that category for me is the PC Engines APU2. The APU2 is a very neat piece of kit, don't get me wrong, but it being the only option is not great either.
The D-1529 can be TDP limited to 20W (at a whopping 1.3GHz), but if your goal is idle power consumption under 20W you are better off with the 35W TDP D-1602, since it is only a dual core and has the lowest standby consumption. The D-1518 is also a popular choice, and is basically the same as the D-1529 but with a 35W TDP limit so it can turbo to 2.2GHz.
Note: the Xeon D series is not socketed, so the motherboard price includes the CPU. Make sure to get a board that takes full-size DIMMs, since there is not a lot of ECC laptop memory on the market.
I'm aware of the Xeon D line, but sadly that was also squarely aimed at the enterprise, albeit maybe a smaller-scale enterprise. It's not something you can easily go out and buy any more, and when you do, it's very expensive, still PC sized, and not any more power efficient at idle or configured TDP when compared to a new Intel workstation or new AMD setup.
EBay offers a number of Xeon D boards at reasonable prices, many with extensive passive cooling. They have a lot of life left in them, to my mind, with good enough power efficiency for a box with normally very low CPU load. No moving parts, nothing to break. Yes, they are not speed demons, but low TDP presumes that anyway. ECC RAM support is there, though.
I bought it from that page. I wanted fanless silence and I wanted ECC and it took me a long time to find this machine but found it I did and it works. Linux reports seeing the ECC is indeed enabled (edac-utils?). Measured with a Kill-a-Watt meter it does consume more than 20W (maybe 30ish IIRC?). I actually disabled some Linux power management though to get bluetooth mice and keyboards to not lag after being idle for 5 seconds so my system might draw more power than others. Drives a 4K display buttery smooth, can watch all the YouTube and Netflix you want on Linux.
"the shift from dedicated DDR4 memory controllers to Serdes-based, high speed differential signaling mixed with buffer chips on memory modules that can be taught to speak DDR4, DDR5, GDDR6, 3D XPoint, or whatever, is an important shift in system design and one that we think, ultimately, the entire industry will get behind eventually."
I think this makes sense for IBM, but probably not for AMD and Intel. AMD and Intel are putting out new chips every year (even if they're just Skylake refreshes), so it's not too big of a deal to change the memory interface, and they are capable of putting in two flavors of DDR support when some flexibility is needed. For IBM, I don't think they release incremental chip designs each year, so it makes more sense for them to take the tradeoff of a flexible interface, so they can get onboard with newer ram faster.
It is pretty much the current greed-driven technology model. Companies don't want to invest in proper FPGA technology that would allow reconfiguration of chipsets on the fly; instead they want people to buy new chipsets every year or so and preferably make the old ones obsolete (not resellable).
Computers have never used FPGA memory controllers that are upgradeable to newer DRAM standards so there's really no reason to say that's the "proper" solution. And memory standards overlap for 4-5 years which also happens to be the lifetime of a PC so the price/performance benefit of future-proofing really isn't there.
How would the increased cost of a FPGA memory controller benefit the end user?
They would still need a new motherboard to use newer memory types, because the modules will have different connectors. They might need a new chipset if the newer memory type wasn't actually addressable with the FPGA (not enough pins, not enough signalling capacity, not enough voltage flexibility, etc).
For the majority of users, they don't change the cpu, motherboard, or memory for the life of the computer (in many cases, some or all of these parts are soldered to the board). Paying more for flexibility that will never be used isn't good for anyone.
AMD and Intel have released multiple DDR4 chipsets over the last ~4 years. The chipset releases have hardly been related to RAM type support. Typically it's power or PCIe related, with Intel being the worst offender.
Ryzen has seen improvements to DDR4 module support with each generation and to take advantage of the latest generation you may need to upgrade your motherboard to one with the latest chipset, depending on vendor support. Even if your older board supports the newer generation, likely you have a more limited QVL for DDR4 modules, so compatibility and perf may be limited.
This is why I implied it had limited impact, though there is some truth to boards with newer AMD chipsets offering better memory support.
This was a response to parent comment stating it was greedy behavior to try to get consumers to buy new motherboards with new chipsets, when the chipsets have little impact on RAM compatibility/support.
Adds a latency hop, which generally can be dealt with by prefetching and larger caches on the CPU side.
There's speculation that AMD is going to do the same - in Zen 2 and later designs the CPU chiplets are coupled with different IO dies depending on the design (Ryzen, Threadripper, Epyc), and swapping out the IO die for one that has support for new/different memory types would be less work than taping out a whole new monolithic CPU.
Intel uses monolithic dies, meaning everything is on the same chunk of silicon. This has given them some improvements to latency[1] and power usage[2] in the past but hurt them on yields. AMD has a chiplet design, which improves yields[3] and may allow a more modular approach as mentioned.
[1] Can be overcome via caching and other techniques, but looking purely at this aspect, that is the impact.
[2] Longer traces lead to higher capacitance, and the power estimation formula P = C * V^2 * f * a shows that this one aspect alone will change power use. Everything on one die means less parasitic capacitance (rough numbers sketched below).
[3] If the defect density is the same and you have 10 defects per wafer, then you will see different yields if you make 10 vs 100 vs 1000 chips on that wafer. Chiplets are smaller than monolithic designs, so we can put more of them on one wafer, which improves yield independent of process (sketched below as well).
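As a rough sketch of points [2] and [3] (all of the numbers below are made up for illustration, not AMD's or Intel's actual figures):

    import math

    # [2] dynamic power: P = C * V^2 * f * a
    C, V, f, a = 1e-9, 1.2, 3e9, 0.1          # switched capacitance (F), volts, Hz, activity
    print(f"P = {C * V**2 * f * a:.2f} W")     # longer traces -> larger C -> more power

    # [3] Poisson yield model: fraction of good dies = exp(-defect_density * die_area)
    d0 = 0.001                                 # defects per mm^2 (i.e. 0.1 per cm^2, assumed)
    for area_mm2 in (600, 75):                 # big monolithic die vs small chiplet
        print(f"{area_mm2:>4} mm^2 die: {math.exp(-d0 * area_mm2):.0%} good")
    # ~55% of 600 mm^2 dies are good vs ~93% of 75 mm^2 chiplets; bad chiplets are
    # discarded individually, so more of the wafer ends up usable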
Intel desktop CPUs are monolithic. A 10900K is a single die, meanwhile a AMD 3900X is three dies: two compute dies and one I/O die (which are sourced from two manufacturers on two different processes afaik). An AMD server CPU has the same compute dies, except more of them, and a very different IO die.
The AMD compute dies are only connected to the IO die and other compute dies (and power). All IO connections exclusively go through the IO die, so the IO die can be customized to change the IO of the CPU without changing anything about the compute dies. It would be entirely feasible to just re-spin the IO die to add support for different memory, Thunderbolt or other IO ports. The IO die is also made on a cheaper, lower-density and performance process (14 nm / 16 nm) than the compute dies (7 nm).
So AMD is back to having a northbridge again but it's on the same package instead of the motherboard for latency reasons? Or could we actually get away with a northbridge on the motherboard again?
Sort of - it allows them to build lots of different sorts of systems (many CPU chiplets and a mem controller, 1 CPU and a mem controller, etc.) all from the same basic components. Intel have to spin a new chip for each SKU; AMD can build different SKUs by packaging stuff differently. It gives them a lot more flexibility and means they can spin out new stuff to fit a new market segment much more quickly.
I know that motivation but I'm more curious about the hardware architecture angle. Integrating the memory controller in the CPU was supposedly a big gain at the time. Now it's in a different chip and multi-socket motherboards already have to traverse the board to access RAM attached to another chip. Are the interconnects better now and so going back to a single northbridge is workable? Would it simplify the topology in multi-socket systems to have all the RAM together instead of having to take care with process affinity to RAM? I'd love a source for discussion around these kinds of tradeoffs.
On-package interconnect latency and power are way lower than going off package, and AMD also doubled the L3 cache size to compensate for increased memory latency. The issue with putting too much IO in a single die is that the perimeter of the die/package needs to fit all the IO traces on the substrate/motherboard, which means more layers and more cost. Everything is just a performance/cost/power trade-off. But I would say that off-package controllers dealing with multiple CPU chips are probably less viable now than they were before the current core-count increase. That's because synchronization traffic would require insanely large busses to those controllers if you wanted to have lots of sockets, and then you would need a lot more pins to those.
On the other hand if you had those on package you would get a lot more bandwidth at much lower power and latency, which is what AMD has done.
> Power10 chip .. using the DDR4 buffer chip from MicroChip and only adding a mere 10 nanoseconds to memory latency.
At 5 GHz that would be 50 cycles, or roughly a 1 m signal round trip. The delay is in processing - bridging logical and technical protocols, resynchronization, fanout, probably even a cache layer.
I like to grab RAM towards the end of the generation, so far it has been the best bang for my buck (for my personal computer). From the production estimates, looks like I'll jump into DDR5 at the end of 2023 or most likely 2024
Yeah, when DDR4 came out first, the early modules were easily outperformed by good DDR3 modules on every metric except power consumption. DDR5 will be king in a few years.
I have a 4690K and my RAM is 32GB of EVGA DDR3 that I suspect was made by Hynix or the other manufacturer of fast RAM; it runs at speeds on par with the slowest DDR4 but with DDR3 latency, it is very awesome.
I wonder if DDR5 will be fast enough to compensate for the higher latency (or maybe they improved latency this time?)
To be honest, I haven't yet felt any need to move off my current machine. I would only upgrade its GPU, but I can't do that because I can't afford a new GPU AND a new monitor (I use a CRT monitor with a VGA cable... it is a very good monitor so there's no reason to replace it, but newer GPUs don't support it).
>it runs at speeds on par with the slowest DDR4 but with DDR3 latency, it is very awesome.
AFAIK DDR4 having higher latency than DDR3 is a myth. It has a higher CL number, but that's measured in cycles, so the higher CL number of DDR4 is compensated by its higher clocks. The actual latency (measured in nanoseconds) is about the same, or slightly lower, in DDR4 than DDR3.
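To put numbers on that (the CL values below are typical examples I picked, not anyone's specific modules):

    # true CAS latency in ns = CL (cycles) / memory clock; DDR does 2 transfers per clock
    for name, transfers_per_s, cl in [("DDR3-1600", 1600e6, 9), ("DDR4-3200", 3200e6, 16)]:
        clock_hz = transfers_per_s / 2
        print(f"{name} CL{cl}: {cl / clock_hz * 1e9:.2f} ns")
    # DDR3-1600 CL9 -> 11.25 ns, DDR4-3200 CL16 -> 10.00 ns: about the same wall-clock time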
I think that's what he's saying: if the latency is equivalent between the best DDR3 and the lowest DDR4, and DDR3 is cheaper or you already own it, maybe it's better to wait.
Latency is significantly impacted by the physical distance between RAM and CPUs. Combined with modern cache sizes, that means they get diminishing returns trying to minimize it and thus make different tradeoffs.
Core-to-RAM latency is in the neighborhood of 50 ns (well-tuned Intel system with low-latency memory) to ~80 ns (bottom-of-the-barrel system). At propagation speed, that's about 10 meters. A big chunk of this latency is internal to the CPU (so is not influenced by distance to the memory at all), another big chunk is the inherent slowness of accessing a DRAM array (10+ ns, independent of the location of the memory).
It's worth pointing out how little this has changed over the past decades. A 2006 AMD CPU is 100 % competitive in regards to memory latency with Intel's 2020 flagship desktop CPU.
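For scale, the propagation math (assuming signals travel at roughly 2/3 the speed of light in copper, i.e. ~0.2 m/ns):

    prop_speed_m_per_ns = 0.2                 # assumed propagation speed in traces
    for latency_ns in (50, 80):               # well-tuned vs bottom-of-the-barrel system
        print(f"{latency_ns} ns ~= {latency_ns * prop_speed_m_per_ns:.0f} m of travel")
    # 50 ns ~= 10 m, so the few-cm trip to the DIMM is a small slice of the total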
10 meters one way = 5 meters round trip. Trace the longest physical path a signal travels from your CPU to a memory chip on the DIMM and back; it's likely longer than you think. And yeah, on its own that's plenty of overhead, but everyone making latency tradeoffs is basing their design on being part of a larger system with a single unavoidable limit.
Eurogamer did write a review about playing modern games on a (very good) CRT monitor [1]. It's better than any LCD and even OLED monitors according to them:
A point not mentioned there but that is a major reason for me to use CRT: Contrast.
The contrast in most flat screens I tried is just terrible, a classic example I use is trying to play Superhot and then watch Game of Thrones right after... Superhot was everything white, so I fiddled with the controls until it was playable.
Then Game of Thrones everything was black. So I fixed it... then went back to Superhot and everything was white again.
With CRT after I adjusted it, I don't need to adjust anymore.
Some of those last generation CRTs had amazing picture. I had one from 2003 or so that could do 2048x1564 with a picture better than anything you could get from an LCD until the late 2000s.
You may know this already, but if not - if you have others to share the console with (eg kids, partner), I think it'll be worth going for the '5 just from that perspective.
Switching between games is a non-trivial (yeah yeah, mock me, 1st world problems) thing on the PS4, in that the active game must be quit first, then the new one loaded. You may not be at a suitable spot to quit either, far from a save point. On the PS5 (or XSX) it's supposedly a very quick alt-tab kind of thing.
So far, I'm kind of happy in secret that the rest of the family prefers the Nintendo :)
Depends upon what you pay - certainly the second-hand market will see a surge and drop in prices. If you can get top-end tech 5 years later, it's not that bad and is sure wallet friendly.
Also factor in that it usually takes a few years until the full potential of a platform is tapped, so the PS4 is in its prime days for games.
I do not understand your argument, as the parent post is saying the PS5 can play PS4 games. So the PS5 will also benefit from the great games coming to the PS4. The prices of the games would be the same; you would benefit from the same experience of the devs developing on the PS4. The only disadvantage is that the PS5 will be pricier. And it will also be less ecologically friendly if you intend to buy a used PS4. But that's it.
I've seen a few people who ride one generation back and buy used. It means you can get the console incredibly cheap (and can choose the best/most reliable variant), have a huge catalogue of games to explore (which are also incredibly cheap), and already know which games are standout and which are misses. Doesn't work if you're into the latest multiplayer games though.
Should be; at the end of the day they are both just x86_64 systems. It isn't like the console generations of old where each one used a specialized CPU usually designed specifically for that console. (The PS3 had to include a full PS2 chip for its compatibility layer.)
All that said, I do see them somehow screwing it up.
But it doesn't have any games worth playing on it, whereas the PS4 has an established and extensive library that would take (me, at least) quite some time to get through. Probably long enough for the PS5 to become more purchasable than a steel Daytona, unlike now where it's apparently extremely difficult to buy.
For me, I'll get a PS5 for Demon's Souls alone. When you factor in upcoming games like the new Horizon Zero Dawn next year and graphics upgrades like Cyberpunk 2077, it's definitely worth it.
Although, realistically, it'll be a few months into next year before I actually get one.
Worst case, creditors can't take experiences from you, nor would they bother with most consumer goods.
I'm wanting to stay eligible for low-rate mortgages or leases, and that's the priority, but YOLO. More motivation to make more before it becomes a problem. The end.
Not really. They plan to try to make "as many PS4 games BC as possible" which means they are actively porting games they deem worth porting, versus it being natively supported
A huge amount of games will be portable from the get-go. GCN and RDNA have extremely similar assembly languages.
The only major assembly language change from GCN -> RDNA is DPP / cross-lane operations (kinda like pshufb from x86). But I'm not even sure if the PS4 had those instructions.
It's deeper than that, the shaders are _binary_ compatible.
> The only major assembly language change from GCN -> RDNA is DPP / cross-lane operations (kinda like pshufb from x86). But I'm not even sure if the PS4 had those instructions.
Yeah, it didn't. AFAIK, they were added in GCN 3, while the PS4 is GCN 1.1.
VOP2 is the "2-source / 1-destination" instruction format. You can see from the table that GCN 1.0 and GCN 1.2 don't even line up at all.
It wouldn't be hard to compile GCN 1.0 into GCN 1.2 instructions, but it wouldn't be binary-compatible, just assembly-language compatible (like 8080 -> 8086).
--------
Some other facts:
* RDNA is Wave32 native. Wave64 compatibility is available though, so that should mostly work for backwards compatibility (aside from DPP, which you do point out may not exist in PS4)
* S_WAITCNT ("wait for memory" instruction) has grossly changed in RDNA. In GCN, waiting for VM_CNT(0) will wait on loads and stores. But VM_CNT(0) only waits for loads on RDNA.
You need to change every S_WAITCNT VM_CNT(0) (GCN) into S_WAITCNT VM_CNT(0), followed by a new S_WAITCNT_VSCNT 0 instruction (wait for 0 outstanding loads, THEN wait for 0 outstanding stores).
This isn't "binary compatible", but if you just inserted one instruction on every GCN S_WAITCNT, you'd get the proper behavior in RDNA.
-----
I'm seeing GCN -> RDNA as a "mostly easy compile" between the assembly languages. But it doesn't seem binary compatible to me. I wouldn't be surprised if there were one or two issues that popped up however.
Or it's _really_ close, but the systems are complex enough that they feel they need to QA/cert the games again even if the vast majority of games require no changes.
Do shaders on PS4 games ship precompiled? If so, they'll at least need to either recompile those or create some sort of translator (which certainly isn't impossible). Either way, I'm sure even in an ideal situation, Sony wouldn't want to make outlandishly absolute claims like "perfect backwards-compatibility".
It would be silly for Sony to not support BC to at least PS4 games (if not PS3-2-1 through some sort of emulation) since Xbox Series X will have BC down to the first Xbox console.
I was planning on the same thing, until I estimated how much faster a modern PC was at single-threaded compilation tasks compared to my old one, and how it impacted my productivity. I should have upgraded earlier.
I thought so also, but a decade of small improvements does accumulate to a large number. Something like a 20% improvement 4 times over would be a doubling of IPC, but single-threaded compilation is where I got more than that.
The after purchase testing was even better than expected.
The i7 920 compiled a specific compilation unit in 14 seconds; the 3900X did it in 3.5 seconds. And that was before any tuning: I had done some BIOS tuning for the i7 920, while for the 3900X I just limited the power to make it quieter. The IPC is more than double; the larger caches are probably the thing that pushes IPC beyond expectations. (I got more cores to improve scaling of multithreaded code.) Both times were recompilations where there was enough RAM to have all the files cached, and the folders used were on SSDs, with the modern CPU using NVMe and the older one SATA. Even if the NVMe matters, that upgrade wouldn't have been possible without getting a modern motherboard. The build system was make, and what I compiled was LLVM tutorial code in a single file that included many LLVM headers. The software wasn't upgraded between those runs; I just moved the disks from the old system to the new system and copied the data.
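As a rough sanity check on the "more than double" IPC claim (the effective clock speeds below are my own assumptions, not measured values):

    old_time, new_time = 14.0, 3.5            # seconds for the same compilation unit
    old_clk, new_clk = 2.8e9, 4.2e9           # assumed effective single-core clocks, i7 920 vs 3900X
    speedup = old_time / new_time             # 4.0x overall
    clock_gain = new_clk / old_clk            # ~1.5x from frequency alone
    print(f"implied per-clock gain ~= {speedup / clock_gain:.1f}x")   # ~2.7x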
I did look, before the purchase, at whether there were IPC improvements that would have made it more sensible to buy new than to buy some Westmere 6-core to upgrade my system. The results made it clear to me that getting any cheap new CPU would have been preferable over wasting time with the Westmere. The IPC improvements for compilation were way higher than the average IPC improvements.
But even Sandy Bridge was weak enough in the benchmarks that it would have made sense to upgrade from that. Before the purchase I just looked in the Phoronix benchmarks / OpenBenchmarking database for a compilation benchmark that had the worst CPU scaling with more cores and used that as an approximation for single-threaded compilation.
My own results were much larger than what I assumed they would have been, based on comparing Sandy Bridge to modern CPUs and then multiplying that by the clock speed and IPC advantage of Sandy Bridge vs the i7 920.
Oh, when I got the i7 920 I decided not to upgrade until I could get 8 cores. Then AVX-512 happened and I knew I must have it so that I could play around optimizing code with it; Intel just couldn't get its 10nm out soon enough for me to get those 8-core AVX-512 parts at a reasonable price and power envelope. I just did the math and realized the time wasted because of a slow CPU would cost me more over the next 2 years than upgrading.
The i7 920 is a dozen years old (2008). No explanation required. I was thinking there wasn't much of an IPC increase over the last half dozen years or so, but there's absolutely been one over that long. Though really it's funny how little it's still increased - in 1996 we were just getting Pentiums at 150 MHz, and I don't think anyone would argue those are anywhere near comparable.
If we split the cycle time into two components, one is transistor and one is interconnect. Interconnect delay per unit length increases as much as the length decreases in each shrink, while the transistor delay halves. That halving eventually pushed transistor delay below the interconnect delay, so the huge improvements it used to bring have dried up. Also, for IPC there are diminishing returns: the easily available improvements got eaten early, and everyone is chasing what's left. The Pentium Pro brought OoO and went from 2- to 3-wide decode and from 2 to 4 micro-ops/cycle. Core 2 went from 1 to 2 FP pipelines.
Everyone is hitting the same issues: O(n^2) power and die costs for widening some structures, while a smaller and smaller fraction of code gets improved by the widening. But that isn't all; it also increases latency, which means either lower clocks or an increased branch misprediction penalty.
Improving CPUs has become harder simply because all the easier things have been done, and the cost of improving one metric often causes another to get slightly worse.
edit:
Just to add, it isn't impossible to improve, it has just become harder. And this knowledge was part of the reason why, until I saw actual benchmarks, I wasn't too interested in upgrading. I just found out that the one thing I cared about had gotten much better during that period.
I try to refresh when a new ram gen launches with a new socket. That gives me a good shot at long term upgradability. Waiting on zen 4 now for my next complete build.
You are very patient. I'm not sure about skipping this generation, because who knows when we will get ECC UDIMMs for AMD CPUs. It could be late '22 or even mid '23.
ECC on AMD seems quite hard in practice even though the CPU support isn't artificially limited like with Intel. Making sure the motherboard supports it is hard and then the RAM options are extremely limited and generally quite slow and expensive. Threadripper is maybe better but that's quite a jump in cost. It would be so nice to finally be able to assemble low-cost home servers and workstations with ECC but it remains a niche where you have to give up a lot to get it.
I've been waiting for DDR5, I knew it was coming and my PC runs the games I play just fine on DDR3. I do have a Thinkpad p51 that has Xeon/DDR4, so I didn't completely skip DDR4.
To my hardware colleagues on HN, what prevents something similar to Dennard Scaling on DRAMs?
My very naive textbook knowledge is that every bit for DRAM uses up a single transistor and a capacitor, whereas a SRAM cell uses up 6 transistors.
How is it then that with all the scaling so far that traditional SRAMs haven't caught up with DRAM capacities? A single DRAM chip is huge compared to the total die size of any micro-processor.
As the sibling comment asks about cheaper DRAM, I'm trying to understand why SRAM hasn't caught up from a price/GB perspective.
I don't know why you would expect a 6T SRAM cell to ever be smaller than a 1T DRAM cell given that both of them are scaling. Also, DRAM die sizes appear to be 40-80 sq. mm which is smaller than processors. https://www.semiconductor-digest.com/2019/09/13/dram-nand-an...
First, DRAM and SRAM are more than just the transistors, they are the lines going into each of the transistors carrying the signal. They are also all the control circuitry around those transistors. When you write out, you aren't just involving the 6 transistors to store, but rather a whole host of control transistors.
Next up, changes in current on a wire induce current on surrounding lines. This induced current results in what's known as "cross talk". There are a bunch of methods to combat this; the primary one is to make sure there is enough space between lines to avoid it. This means that while your transistor size may get smaller and smaller, you still have a limit on how close you can place those transistors, otherwise you risk unwanted bit flips. DRAM has a major advantage here simply because it requires fewer lines to control state. That results in a more dense packing of memory.
With those two points in mind, there's simply no way for SRAM to ever have the same price/GB or density as DRAM (without the market screwing with prices).
> How is it then that with all the scaling so far that traditional SRAMs haven't caught up with DRAM capacities?
Leakage.
As FETs get smaller they leak more. CMOS Logic has dealt with this by having "dark silicon" -- yes, you get twice as many transistors as the last generation, but you can't use as many of them at the same time. You have to keep some of them turned off. But turning off SRAM means lost data, so "dark SRAM" is useless -- unlike, say a "dark vector unit" or "dark floating point unit".
DRAMs can optimize the entire process for just one job -- the 1T1C access gate -- to keep leakage at bay. Or if all else fails, just refresh more often, which hurts standby power but isn't a meaningful contributor to active-mode power.
SRAMs are catching up, but they are still much less dense, and are normally configured for smaller line sizes and lower latencies than DRAM. DRAM requires sense amplifiers and capacitors, which have both scaled slower than transistors.
From a systems perspective, lots of work has gone into hiding DRAM's faults and highlighting its strong points, so a system where DRAM is replaced with SRAM will be more expensive but will not realize most of the possible benefits without major redesigns of the memory systems.
Intel has some xeons with over 70MB of L3 and also released some eDRAM chips to play around with this idea, but notice they used eDRAM to get 128MB of L4 on a consumer chip - SRAM is still very expensive!
Will we get performance increases, and how big will they be in the average case (not just for specific code with low cache hit ratios on large datasets), attributable solely to bandwidth increases and not to architectural IPC improvements?
"For bandwidth, other memory manufacturers have quoted that for the theoretical 38.4 GB/s that each module of DDR5-4800 can bring, they are already seeing effective numbers in the 32 GB/s range. This is above the effective 20-25 GB/s per channel that we are seeing on DDR4-3200 today."
That looks like a 20%+ improvement IF you are bottle-necking on DDR4.
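Working the peak numbers out (assuming the usual 64-bit, i.e. 8-byte, data path per module/channel):

    # theoretical peak bandwidth = transfer rate (MT/s) * 8 bytes per transfer
    for name, mt_per_s in [("DDR4-3200", 3200), ("DDR5-4800", 4800)]:
        print(f"{name}: {mt_per_s * 8 / 1000:.1f} GB/s theoretical peak")
    # DDR4-3200 -> 25.6 GB/s, DDR5-4800 -> 38.4 GB/s: a 50% jump on paper,
    # while the quoted effective figures (20-25 vs ~32 GB/s) land around +30-60%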
How does this play in with the fact that DDR5 is clocked at 4800 MHz and the DDR4 at 3200? Would we not expect a 50% improvement in transfer rates with a 50% increase in clock? I really don't know.
There are even 4800 MHz DDR4 DIMMs available now, even if they are niche.
Depends a lot on your CPU's architecture as well as workloads: how far ahead it prefetches vs how often it's stalled on large memory reads. So it's hard to know. E.g. Zen+ (Ryzen 2000) was seeing 10% in some gaming workloads going from DDR4-2400 to DDR4-3600, but it's much less drastic on Intel CPUs or even Zen 2 (Ryzen 3000) because the memory controller is smarter, so the slower RAM is less of a detriment. And then if you go above 3600 MHz (or 3800 MHz if overclocked) on Zen 2 you start getting negative returns for a bit, because the CPU memory controller can no longer run at the same clock as the memory, and that induces overhead. But maybe 4800 MHz, if it can be made stable more easily, gets far enough ahead of that penalty that the improvement goes positive again. Or maybe Zen 4/DDR5-lake just works with memory entirely differently and the performance gains are massive or negligible.
The short of it is it's very hard to make predictions here.
The reason Zen1 (and 2) speed up a lot from ram speed increases is because the memory controller speed is tied to ram speed. So when you bump up the ram speed you also bump the memory controller speed, which reduces ram latency and inter-cpu communication latency.
There is no telling what the memory controller to RAM ratio will be with DDR5; the memory controller has speed limits, so you aren't going to get free speed just because DDR5 starts at 4800, because the Zen memory controller can't run that fast anyway.
Zen 2 IMC can do DDR4-5000, but it's rather pointless, because it requires un-coupled mode with IFCLK != UCLK (since IFCLK cannot be pushed to more than ~1.9 GHz, and even that is not stable on many CPUs), which adds so much latency that you _need_ DDR4-5000+ to approach the performance of DDR4-3800 with IFCLK = UCLK.
>even Zen 2 (Ryzen 3000) because the memory controller is smarter so the slower RAM is less of a detriment
Source? I'm not aware of Zen 2 having a "smarter" memory controller that improves performance. AFAIK the only improvements they did implement (that could be construed as relating to the memory subsystem) were larger caches and a better branch predictor.
It uses CPU performance counters to show things like ITLB_Misses or MEM_Bandwidth. It won't show when you're waiting for GPU/SSD/etc because those aren't visible from CPU performance counters. I'm not aware of a single tool that will do everything, unfortunately.
Also, this isn't a "benchmarking suite"; it's a tool you can use to instrument whatever load you're running, which I'd say is better. It's often used to improve software but could also identify if faster RAM will help.
Benchmarking of what? Based on the task, you need a specific benchmark. If it's gaming, there are various benchmarks, run them, see utilization, whichever is not 100% is the bottleneck.
If it's computation, it's more complicated to discover the bottleneck (your problem may be cache misses, memory bandwidth, architecture that doesn't go well with the algorithm).
AMD will present its next generation Ryzen CPUs, based on Zen 3, the day after tomorrow, 08.10.2020 [0][1] - maybe we can get more info about DDR5 compatibility already then.
Although this question is more academic in nature, how "difficult" is memory training/initialization compared to DDR4? I recall an active microcontroller needing to calibrate the DRAM on startup for DDR4.
I haven't been able to find any specs on latency, and whether it has improved or not. I assume it hasn't, because it doesn't tend to, but does anyone know for sure?
If you thought HNers were insufferable whiners regarding RAM or NAND soldered to motherboards, just wait until they start marketing monolithic CPU+memory chips.
Hooray for FINALLY putting a local DC/DC converter ON THE DIMM so the motherboard can feed it with high-voltage/low-current power instead of low-voltage/high-current. The latter has become increasingly impractical (and noisy!)
Often the initial consumers would be enterprises instead of casual users. There are numerous enterprise use cases where higher bandwidth and lower latency would be worth the cost. Some that come to mind are in financial services and ML inferencing. I can imagine high-mem compute instances of cloud service providers being an obvious place for these.
Also, going from 2 to 1 seconds is pretty huge if you're doing some operation hundreds or thousands of times a day.
However, all those people who complained that the Atom editor took too much memory are about to experience a new world when they buy their next computer.
I used Emacs back when people joked “Eight Megs and Constantly Swapping”
To be fair, Electron bloat doesn't grow with application size; only incompetence does. By using Electron you are basically forcing your application to need around 200-300MB of RAM no matter how trivial it is, but that's all there is to it. Poor application performance has more to do with bad application development. Nothing prevents you from e.g. building Atom in a way that lets you view files bigger than 2MB with good performance, or building a Slack client that doesn't leak memory. I can run lots of tabs in Firefox with good performance, but if each tab was using its own browser instance I would run out of memory very quickly.
A casual user like you ends up using quite complex compute on the cloud, when you watch a video on youtube, scroll a newsfeed, and make an airline booking. Some of the advanced complexities of those tasks, when programmed well by good engineers, can become feasible with this.
All the Machine Learning related buzzwords exist mainly because they have only now become computationally feasible. You never know what will come next.
On the whole, I find these 20% performance memory upgrade (leading to a fraction of that in real-world performance) more obnoxious than anything, but I'd love ECC!
I have collected about a dozen single bit (corrected) errors in my home cloud, and seen 2 uncorrectable errors.
I also have come across about a dozen 'bad' sticks of ram which show errors in memtest86, about half of which only have 1 bad row, and 2 of which only show errors after multiple passes which I assume is thermal based.
I do however build all of my computers from pieces from the recycling, so the statistics may be a bit off of the norm...
When you think about the logic of what you said, it is pretty silly.
Without ECC RAM, how would you know that you had a single bit flip? How would you know that you needed ECC RAM?
When you talk to people who run server systems, you'll find there's plenty of bit flips. This expertise is getting harder to find though, as more people run systems in the "cloud" where there's no visibility into the physical error statistics.
> Without ECC RAM, how would you know that you had a single bit flip?
Errors, glitches, or files with unexplained errors. If there is a bit flip in code, more likely than not that'll be an invalid instruction, jump, or something like that, so it would crash. If it's in data, often there's some algorithm involved (compression, linked list, js/css/html, whatever), so that would either find invalid data and crash or display an error, or display at least a wrong pixel to a certain degree (but, true, I'd have to spot a single color channel being off and there's, let's say, a 4/8 chance of the significance being too low to really see; but this only applies to a raw bitmap case). Data on disk should also become corrupted (I'm thinking of image or video editing, data processing like gigabytes of port scans that I recently processed, etc.) if it was inadvertently modified in RAM. There are a ton of ways to notice this, though admittedly also a certain number of cases where one wouldn't.
I see your point though, like, if the error is silently corrected by the software or I just hit retry and don't figure what the error must have been (also because it won't be reproducible), I'm unlikely to find out that I need ECC. Maybe I should introduce random bit flips on a raspberry pi (so I don't corrupt my actual production filesystem) and use that for an evening for browsing / programming / other usual activities to either prove or disprove the theory that I'd notice if this happens with any sort of regularity.
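A cheap software-only approximation of that experiment (this just corrupts a user-space buffer at a chosen rate; it says nothing about real DRAM, but shows how detectable a given flip would be):

    import os, random, zlib

    def flip_random_bit(buf):
        # flip one random bit in a bytearray, a crude stand-in for a DRAM soft error
        i = random.randrange(len(buf))
        buf[i] ^= 1 << random.randrange(8)

    data = bytearray(os.urandom(8 * 1024 * 1024))   # pretend working set
    checksum_before = zlib.crc32(data)

    flip_random_bit(data)

    # a CRC over the buffer notices the flip; whether *you* would notice in practice
    # depends entirely on what the corrupted byte was being used for
    print("corruption detectable:", zlib.crc32(data) != checksum_before)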