Reminds me of Adapteva's Parallella, which I bought and was really excited about, but which was obviously a commercial failure. I work on software that has an above-average (but far less than HPC) need for concurrency, but even our most demanding customers seem to be content with off-the-shelf Intel, either because they don't need THAT many cores, or because they value the more sophisticated instructions (like encryption acceleration, etc.). I'm curious to know what kind of use cases y'all work on that might crave something like this.
Modern x64 is very difficult to beat, and it takes a lot more than just raw core count. The out-of-order execution, prefetching, multiple execution units, SIMD, caches, cache synchronization etc. are all very strong. I think the space where many weak cores can be utilized but GPUs can't is pretty slim. Intel themselves even had that briefly with their high-core-count out-of-order Atom chips that didn't take off (or weren't given a chance).
I think software that can take advantage of more than a few cores without just using brute force fork-join parallelism in some of the heavy loops is rare. I don't think it has to be that way, but the problem comes down to software architecture which isn't going to be solved by leaving each application programmer to their own devices. It will take libraries that give them the means to do it without having to reason about low level synchronization.
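To make the "fork-join in a heavy loop" point concrete, here's a minimal sketch in Python using the stdlib `concurrent.futures` module, as one example of the kind of library abstraction meant here: the programmer splits the work and combines results, and never touches a lock or condition variable. (The function names and chunking scheme are illustrative, not from any particular codebase.)

```python
# Fork-join parallelism via a high-level executor: the library owns
# all the scheduling and synchronization; the caller just maps and sums.
from concurrent.futures import ThreadPoolExecutor

def chunk_sum(chunk):
    """Per-task work -- a stand-in for the body of a heavy loop."""
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    chunks = [data[i::workers] for i in range(workers)]   # fork: split the input
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum, chunks))           # join: combine results

if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1000))))
```

(Threads are used here for simplicity; for CPU-bound Python you'd swap in `ProcessPoolExecutor` with the same interface. The point is the shape of the API, not the language.)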
>> the space where more weak cores can be utilized but not GPUs is pretty slim
Yeah I think that's what it is. As you said that I realized that I do actually have a few customers that can benefit massively from many cores, but they're just using MANY, MANY cores and it's all CUDA. I don't see many people going for the compromised approach of say, Xeon Phi.
Based on benchmarks (take them at face value) [1], Apple's A12/A12x processors are roughly 80% the performance of current-gen desktop CPUs (and on par with some mobile ones) in single threaded performance. I think a bit of that is power concerns, even with the 8 core chips.
There have been rumors floating around that Apple will switch to their own ARM cores (based on the A12, probably) for a future generation of Mac computers. I think they'd have to meet or beat x86 for that to happen.
Interestingly they're manufactured with TSMC's 7nm process, just like the new AMD chips.
It wasn't really open sourced. It's open in the sense that if you pay a bunch of money and sign an NDA, you'll be given access to the source... which has always been the case. Some sales bro bought MIPS and slapped an "open" label on it. IMO, its current steward has done more to kill it than ARM and RISC-V ever could.
When they first did their kickstarter they were talking about doing a follow up with 64 cores and being able to connect them together with those fancy pants connectors they had going on. I was all set to buy four... and none of that ever materialised.
I got the impression that a lot of folks had unrealistic expectations about how Linux would support the chip right from the start, which honestly wasn't all that surprising recalling the way they worded some of their promotional material.
One presenter claimed the board would pay for itself if you used it for mining...
I assume something like that would be much easier to implement than running a real multicore OS like Linux on it, but I never heard of anyone mining on their boards.
Back in 2013 that might have been possible, but it seems like nobody figured out how to effectively use those boards during that timeframe. And since then GPUs and then custom ASICs took over.
RIP. I have one of those boards from Kickstarter that's dead due to thermal destruction of the chip. They ran incredibly hot and didn't have sufficient throttling. Not that it was very useful anyway, but I liked the company's mission and attempt.
Nobody wants this particular chip, but there isn't money to pay for a RISC-V Manhattan Project, so there will have to be an incremental evolution over several generations of RISC-V chips before they produce a usable one.
> either because they don't need THAT many cores, or because they value the more sophisticated instructions (like encryption acceleration, etc.)
The Parallella was much more complex: it had its own bus architecture, which required an FPGA to glue the chip to the Arm cores in the FPGA SoC that ran Linux. Compare that to a simple x86 or Arm system where the tooling and software are mature and you don't have to do any extra fancy work. Just write a program and execute it. Same thing that killed the IBM/Sony Cell processor. Neat idea, PITA to program.
It might've done well as a PCIe (or Thunderbolt) thing. There are definitely large-scale users of scalar math that could make use of it, but you'd need to build it on 5nm with dozens of cores to be even remotely competitive.