Crossbar believes it can write data to its chip at 140 megabytes per second, compared to 7 megabytes a second for flash. Read performance is 17 megabytes per second, with a random read latency of 30 nanoseconds.
The write performance sounds good, but the read performance seems very low. Also, I don't know where they get 7 megabytes per second for flash. Sounds like they picked the worst performer to compare to.
I'd be more impressed if they had something on the market.
The numbers are fairly suspicious; they don't reference the density of the flash they are benchmarking, or whether it is sitting behind a controller.
Flash chips are generally multiplexed and sit behind a smart controller that compresses and caches data, which can greatly affect the throughput numbers.
Lots of other stuff to know -- what's the BER (bit error rate), what's the programming model (page erase, like flash)? How many simultaneous reads can you push through? Writes? Can you halt an ongoing operation or change it mid-stream?
They're talking about per-chip performance. Flash write performance really is that low - they get around it by spreading writes to multiple chips.
Crossbar can do the same thing, in which case the same number of chips provide higher throughput. Read latency is not parallelisable, of course, so there'll be no way for flash to catch up there without making fundamental advances.
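To make that point concrete, here's a back-of-envelope sketch in Python. The per-chip throughput figures are the ones quoted in the article; the chip count is a made-up example, and real controllers add overheads this ignores.

    # Striping writes across N chips multiplies throughput; it does nothing
    # for the latency of a single random read.
    chips = 8                  # hypothetical number of chips behind a controller
    flash_write_mbs = 7        # per-chip flash write throughput quoted in the article
    rram_write_mbs = 140       # per-chip write throughput claimed for Crossbar
    rram_read_latency_ns = 30  # per-access latency; parallelism doesn't reduce this

    print("flash array write:", chips * flash_write_mbs, "MB/s")   # 56 MB/s
    print("RRAM array write: ", chips * rram_write_mbs, "MB/s")    # 1120 MB/s
    print("random read latency is still ~%d ns per access" % rram_read_latency_ns)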
" It could also perform its storage functions at 20 times lower power, extending the battery life of devices using it to weeks, months, or years. "
No, reducing RAM power 20x does not extend the battery life of devices to weeks. Anything with a screen or an active antenna will likely not see much of a difference.
Where does the extra battery life come from? The CPU? Graphics? Better RAM? Better wifi/BT? Software improvements in the OS and apps to use power more efficiently? Lower power screen?
Battery life is affected by such a multitude of components that it's hard to extrapolate savings in one component to the battery life of the complete device.
Yes, but that has nothing to do with the RAM. Processors take a significant chunk of the platform power while RAM in almost all cases does not anymore.
They also don't compare the read speed (17 MB/s) to existing flash memory. They showcase an amazing write speed, but most applications access data a lot more frequently than they create it.
Even then the limit is unlikely to come from non-volatile storage itself. The current effective flash R/W speeds are pretty much limited by the controllers.
I remember an article stating that embedded flash controllers typically have a 20MB/s speed limit. This in turn meant that wireless speeds beyond 200Mb/s were mostly worthless, because the devices simply could not utilise any more. If someone can find the article again, I would be delighted.
However: if the reality with RRAM is even half as good as the claims go, then it would certainly encourage the hardware vendors to invest in somewhat better controllers. After all, what good is a new, hyperspeed storage medium if you can't access it any faster than the old one?
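For what it's worth, the unit conversion behind that claim is easy to check (the 20 MB/s figure is the half-remembered one above, not something I can source):

    controller_limit_MB_s = 20                 # half-remembered embedded-controller limit
    controller_limit_Mb_s = controller_limit_MB_s * 8
    print(controller_limit_Mb_s)               # 160 Mb/s -- so wireless links much
                                               # beyond ~200 Mb/s can't be absorbed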
The whole premise of taking NAND from 25nm to 19nm (for instance) is to fit more floating gates in the same area. You can take that as a smaller die, or as more bits on a slightly larger die than the previous generation.
Die size is indeed a major factor on cost. For a given technology (litho node + process, e.g. # and type of steps), the cost to process a wafer is fairly constant regardless of die size.
If you shrink a die size, you fit more die on a wafer. Additionally, yield goes up (given an independent manufacturing defect density), and especially for large die, the tessellation around the edges has a major impact.
Indirectly, even testing is related to die size, in that there is a limit to tester parallelism, and more gates means more time and more combinatorial patterns to test, e.g. for stuck-at testing.
There are of course non-linear costs in packaging and package-level testing and elsewhere.
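A rough sketch of those effects in Python, with made-up numbers (300 mm wafer, 0.2 defects/cm^2, a simple Poisson yield model, and a crude edge-loss correction); the only point is that halving die area more than doubles the number of good dies per wafer.

    import math

    WAFER_DIAMETER_MM = 300          # assumed wafer size
    DEFECT_DENSITY_CM2 = 0.2         # assumed random defect density

    def gross_dies(die_area_mm2):
        """Dies per wafer with a crude correction for edge tessellation losses."""
        d = WAFER_DIAMETER_MM
        return int(math.pi * (d / 2) ** 2 / die_area_mm2
                   - math.pi * d / math.sqrt(2 * die_area_mm2))

    def die_yield(die_area_mm2):
        """Simple Poisson yield model: Y = exp(-defect_density * area)."""
        return math.exp(-DEFECT_DENSITY_CM2 * die_area_mm2 / 100)  # mm^2 -> cm^2

    for area in (150, 75):           # hypothetical die sizes, in mm^2
        good = gross_dies(area) * die_yield(area)
        print(f"{area} mm^2: {gross_dies(area)} gross dies, ~{good:.0f} good dies per wafer")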
The per-unit costs of a chip[1] are essentially constant per wafer. The smaller a chip is, the more of them each wafer will produce. Also, the smaller the chance that any given chip will be ruined by an imperfection. So the number of chips you get from each dollar of production cost scales a bit better than inversely with the area of the chip. There are other factors too, though, and wafers from more advanced nodes will tend to be more expensive. However, the two examples being compared here were both from the 25nm node.
[1] Which dominate with memory since production runs tend to be large and the regular patterns make for less design investment than, say, a CPU.
It's always been related, but it's not the only factor. Packaging, assembly and test are the other primary components of manufacturing cost. You are correct though, that price (as opposed to cost) is going to include amortized R&D, marketing, license fees, margin etc.
"[Crossbar's CEO] could not estimate the price of a 1TB RRAM module, but said it will cheaper than NAND flash partly because RRAM is less expensive to manufacture."
Does anyone know if there has been progress in volatile memory tech? I'm hoping for 1TB RAM chips with 10x faster access times than current RAM. It will enable PC gaming to deliver unmatched immersive experiences, among other applications.
Especially if this can be accessed as VRAM, directly and just as fast, by the GPU. Infinite-detail, fractal-resolution, destructible-animatable voxel terrains come to mind (not the Minecraft kind of "voxel", mind).
The thing is, at this point a lot of latency comes from the fact that each DIMM is physically displaced from the CPU.
When you are on a timescale of nanoseconds, even the speed of electricity can be slow compared to something like Intel's L4 cache, which is on-die.
For any type of volatile memory that latency will exist until motherboard designers move the RAM closer to the CPU or adopt optical interconnects between parts.
For some cool calculations that put things in perspective: take the speed of light as 299,792,458 m/s and a nanosecond as 10^-9 s, which gives ~0.3 m/ns for light. That means every third of a meter between the DIMMs and the CPU adds a constant 1 ns of latency.
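A quick check of that number (one-way propagation at the speed of light in vacuum; signals in copper traces are actually somewhat slower, and the 10 cm trace length is just an illustrative guess):

    c = 299_792_458            # speed of light, m/s
    per_ns_m = c * 1e-9        # distance covered in one nanosecond
    print(per_ns_m)            # ~0.30 m per nanosecond

    trace_length_m = 0.10      # e.g. ~10 cm from CPU to a DIMM slot (made-up figure)
    print(trace_length_m / per_ns_m, "ns one-way propagation delay")   # ~0.33 ns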
Not strictly true. Most algorithm choices in gaming can make trade offs between CPU and RAM. If you increase available RAM, you can usually use algorithms that are more RAM hungry and less CPU hungry for large speed benefits.
In 1 TB of RAM you can keep, without any compression, a 3D voxel array covering 1000m * 1000m * 64m, with each voxel being a cube 2 cm * 2 cm * 2 cm (rough arithmetic check below). And you can look it up randomly with negligible latency and do real-time raytracing on it.
If that won't change games I don't know what can.
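A quick check of the parent's arithmetic; the numbers only work out to 1 TB if you assume 1 bit per voxel (say, solid vs. empty), so any extra per-voxel data shrinks the volume accordingly:

    voxel_m = 0.02                                  # 2 cm voxel edge length
    nx = 1000 / voxel_m                             # 50,000 voxels
    ny = 1000 / voxel_m                             # 50,000 voxels
    nz = 64 / voxel_m                               # 3,200 voxels
    total_voxels = nx * ny * nz                     # 8e12 voxels
    print(total_voxels / 8 / 1e12, "TB at 1 bit per voxel")   # -> 1.0 TB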
Besides, GPUs will obviously also use this technology if it really works.
I don't know what is special about emulating a PS3 on a PC; most mainstream PCs and GPUs are faster than the PowerPC-based Cell processor in the PS3 (an 8-year-old console). Even the PS4 does not contain any better graphics processing capabilities than a recent, relatively high-end PC.
The only reason to prefer the GPU is because it confers an advantage over the traditional CPU+RAM combination. There's nothing inherently special about a modern GPU. The GPU is a sequence of actions and abilities encoded into hardware, e.g. the ability to automatically perform various kinds of texture filtering transparently to the game developer.
Since the GPU is hardware, and since hardware is less flexible than software, a graphics programmer would always prefer a software-based pipeline to a hardware-based one. The reason hardware pipelines are preferred is strictly because their advantages outweigh their disadvantages. Typically, using a GPU enables graphics programmers to create renderers which are 10-100x more efficient than software-based renderers, so the added flexibility of a software rasterizer tends to be forgotten in the face of massive efficiency enabled by the GPU.
The GPU primarily became popular because (a) it offloaded part of the computation from the CPU to dedicated hardware, freeing up the CPU for other tasks like game logic, AI, and more recently physics computations (though nVidia is trying hard to convince developers that hardware-accelerated physics is a viable concept), (b) GPUs increased the amount of available memory, and (c) GPUs dramatically increased the throughput (memory operations per second) of graphics memory.
Memory latency plays a key role in many modern graphics algorithms, such as voxel-based renderers. It's often the case that an algorithm needs to repeatedly cast rays against a voxel structure until hitting some kind of geometry. Therefore, within an individual pixel of the screen to be rendered, this type of algorithm can be hard to parallelize because typically the raycasting can't be broken up into parallelizable steps. It typically looks like, "While not hit: traceAlongRay();" for each pixel, each frame. I.e. this algorithm can only trace one section of the ray at a time before tracing the next.
That raycasting algorithm is memory-latency-bound because it completes only when it finishes looking up enough memory locations that it detects the ray has intersected some 3D geometry. In other words, by reducing memory latency by 2x, and assuming memory bandwidth is sufficient, then this algorithm will complete twice as fast. This means instead of 24 frames per second, you might get 48 frames per second.
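A minimal sketch of that loop (hypothetical names, a dense 3D array of booleans for the voxels, and fixed-size steps rather than a proper DDA traversal). The important property is that each voxel lookup depends on where the previous step landed, so the reads form a chain of dependent memory accesses bound by latency rather than bandwidth:

    import math

    def trace_ray(origin, direction, voxels, step=0.02, max_steps=10_000):
        """March along a ray until a filled voxel is hit or the volume is left."""
        x, y, z = origin
        dx, dy, dz = direction
        for _ in range(max_steps):
            ix, iy, iz = (math.floor(x / step), math.floor(y / step),
                          math.floor(z / step))
            if not (0 <= ix < len(voxels) and
                    0 <= iy < len(voxels[0]) and
                    0 <= iz < len(voxels[0][0])):
                return None                 # ray left the volume without a hit
            if voxels[ix][iy][iz]:          # dependent lookup: the next step
                return (ix, iy, iz)         # can't start until this read returns
            x, y, z = x + dx * step, y + dy * step, z + dz * step
        return None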
So, all that said, if it becomes common to have 1TB of regular RAM with the latency and bandwidth traditionally offered by GPUs, along with a surplus of available CPU cores to offload computations to, then software renderers will once again become preferable to GPU renderers. A software pipeline will always be more flexible and easier to maintain than a hardware pipeline, simply because the featureset of the software pipeline isn't restricted to the capabilities of the videocard hardware it's executing on. It's also easier to debug.
All of that means that it'll be easier for art pipelines to produce more complex, more immersive visual experiences than at present. But replacing the traditional GPU-based renderer with a CPU-based software renderer will only be practical if there's a major advance of RAM technology in the future, because current RAM tech can't match the memory bandwidth / latency of a modern GPU. Hence, any major developments in the area of volatile memory tech will be extremely interesting to graphics programmers.
And a large part of what makes a better GPU is RAM bandwidth. If you're using integrated graphics, there actually tends to be quite a large difference between using system RAM clocked at 1066 and RAM clocked at 1866. If you're using a discrete GPU card, the RAM that makes a difference to your gaming performance is already soldered onto the card, but that might still see an improvement from faster memory technologies, since the people who make those cards could use them.
If you're using integrated graphics there isn't any PCI bus involved. Even when graphics was off-die it was on the Northbridge.
And the bandwidth between the GPU and the GDDR on the graphics card doesn't have anything to do with the PCI bus either, except to a small extent when synchronizing with the CPU or initially loading textures or whatever.
Apart from the cutting-edge APUs (like AMD's Kaveri implementing hUMA), the memory of integrated GPUs and previous-generation APUs was separate, and copying chunks between them happened via the bus.
Plus since we're talking about cutting-edge gaming, integrated graphics is irrelevant (and so are APUs).
This article makes a lot of extraordinary claims. Extraordinary claims require extraordinary evidence.
In this case, that means a reputable tech reporter from a reputable publication has to say "this is the real deal" before these guys can be taken seriously.
(I'm not doubting that they have some tech; it's just that they seem to be promising something with no downsides, which in most cases means a marketing team has spun out of control.)
In the same sense as you might refer to a "normal" quantum computer or "normal" cancer cure, I guess.
The phrasing in the article is a little wishy-washy, but it sounds like this company is going to production with a TB-per-IC density, non-volatile storage product. If true, that's huge news.
No, but they've been expected to be on the market any day now for the past year or two. HP's been doing tons of research and prep work in commercializing them:
"They've been expected to be on the market any day now for the past year or two."
The last I heard was second quarter 2014 by HP, that's been the release date for the last year or so. Supposedly they could release them now, but they are trying to time the market for business reasons.
Just because you don't know about it doesn't mean anything. I don't mean that to be snide, but just to point out that you may have encountered a limit in your own knowledge. I've known about memristors for years and have been waiting to see a product come to market. They're interesting devices, you should read up on them. Also, search back through HN as there have been several worthwhile discussions of the technology.
Well you could verify the patents for yourself, you could cross-check the opinion of the Convergent Semi analyst for yourself, and you could reflect on the fact that Kleiner Perkins is a well-established VC firm, and a bunch of other things.
This is a very competitive and huge market. Lots of companies are racing to deliver the next disruptive technology/product. No one is really ahead of the others for now. So I guess it will be difficult to get any good tech reports without any marketing touch in them :)
Given that the graphic at the top shows a chip with 8 GByte at 77 mm^2, I have no idea where they're getting the "Terabyte on a chip" thing. Unless they mean a Terabyte on a 200 mm die?
FTA: "The company can put a terabyte of data, or about 250 hours of high-definition movies, on a single chip that is smaller than the equivalent flash memory chip (as pictured at top)."
As I understand it, one of the (more interesting) characteristics of memristors is that they can do computation. They don't mention anything like that here, so I doubt it.
A memristor can be set up as a logic element that does computation, or as a memory storage element, just like a transistor (only persistent).
For some strange reason, the tech press took this to mean that a memristor array can be dynamically reconfigured to act as logic or RAM in the same device. This is mostly false -- while yes, this kind of device can be built (with transistors, such devices are typically called field-programmable gate arrays, or FPGAs), it requires you to be able to reconfigure not just the gates but all the wiring that goes into them, meaning that a reconfigurable array has to be more than an order of magnitude larger, and a few times slower, than a non-reconfigurable one.
You never want to use an FPGA for RAM, because you can get proper, non-reconfigurable RAM for (way) less than a tenth of the cost. You never want to use an FPGA for logic if you can afford an ASIC, because you can be several times faster with hard-baked logic.
Memristors will mean nice FPGAs that retain state on power down, but they will not mean a revolution of dynamically reconfiguring devices.
Oh, and memristor logic is slower than transistor logic (you can do with less gates, but the gates switch slower) so the ability to use memristors for logic elements will initially mostly be a win in memory devices where the necessary logic to manage the device can be made out of the same structures the device is made of.
Memristors do no computation at all. They follow the same idea as the described RRAM: high density, low cost, persistent, low power consumption, high write/read throughput, low latency. Except that they could even replace RAM, not only NAND devices. I don't remember the read performance off the top of my head, but it was clearly closer to RAM levels.
"Another possible application of memristors is logic circuits. Memristors can be used in hybrid CMOS-memristor circuits, or as a standalone logic gate. One notable logic application is using memristors in an FPGA, as configurable switches, connecting the CMOS logic gates."
Well, hopefully some competition will prompt Hynix to get the ball rolling, instead of just squeezing the last dollars out of current flash technology...
Also, that "Crossbar Chip Design" graphic made me chuckle. It is nearly meaningless by itself, and is placed so far away from the context it's discussed in, it almost remains that way unless you're trying to connect it.
Let's be cautiously optimistic. Every few years a new memory tech comes along with tons of articles about how it's going to displace the current one in only 2 years. (RRAM has been one of "the new ones" for years.)
I'm not saying it won't happen, but on the journey from idea to millions of units, a test chip is just the beginning, and in the meantime, NAND is moving.
Look at how slowly NAND has replaced rotating magnetic storage: it's been around since the 80's. Decades of iteration have brought it to a place where it's compelling for non-niche use cases.