A few points: data movement is expensive, but the reasons are not so simple:
1) x86's RAM interface needs to move huge amounts of data cheaply and extremely reliably. This requires a lot of power to fight noise. Once you're allowed to have a few bit flips, the bus voltages can be made much lower, directly reducing losses.
2) A significant portion of the large energy cost is actually in the computation associated with memory movement (according to Sohmers from REX Computing, about 40%). Encoding, queuing on the processor, decoding, fetching, queuing on the memory controller, and caching again on the CPU take a lot of logic.
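To make point 1 concrete, here's a back-of-envelope sketch of why lower bus voltages pay off: dynamic switching energy scales with C * V^2, so a modest drop in signaling voltage gives a quadratic energy win. The capacitance and voltage numbers below are purely illustrative assumptions, not measurements of any real bus.

```python
# Dynamic switching energy per wire transition: E = C * V^2.
# All numbers below are hypothetical, for illustration only.
def switching_energy(capacitance_f, voltage_v):
    """Energy in joules to charge/discharge a wire once."""
    return capacitance_f * voltage_v ** 2

C_WIRE = 1e-12  # ~1 pF, a rough order of magnitude for an off-chip trace

e_nominal = switching_energy(C_WIRE, 1.2)  # full-swing, reliable signaling
e_reduced = switching_energy(C_WIRE, 0.8)  # lower swing, a few bit flips tolerated

print(f"savings from 1.2 V -> 0.8 V: {1 - e_reduced / e_nominal:.0%}")
```

The quadratic dependence is the whole point: dropping the swing by a third cuts per-transition energy by more than half, before counting any second-order gains.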
I think it's quite possible this approach is superior for low-precision floating-point applications that currently run on CPUs/GPUs. Note that it allows lowering voltages for the whole system, including interfaces and memory.
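The reason low-precision float workloads are a good fit: in IEEE-754 encodings, a flip in a low mantissa bit perturbs the value by roughly one ulp, while a flip in a high exponent bit is catastrophic, so occasional random flips mostly land in the "harmless" category. A small sketch (the `flip_bit` helper is my own, hypothetical name) using the binary32 format:

```python
import struct

def flip_bit(x, bit):
    """Flip one bit in the IEEE-754 binary32 encoding of x."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return y

x = 1.0
low = flip_bit(x, 0)    # least-significant mantissa bit: ~1e-7 relative error
high = flip_bit(x, 30)  # high exponent bit: value blows up to infinity

print(abs(low - x) / x)  # tiny, often below the noise floor of the computation
print(high)              # inf
```

For applications that are already tolerant of rounding noise, the low-bit case is effectively free, which is what makes aggressive voltage scaling plausible there.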
Also, data movement doesn't dominate the energy cost of all applications; plenty of workloads are still compute-bound (e.g. some evolutionary methods), it's just that the applications currently in vogue tend not to be. There are plenty of applications currently done on DSPs and ASICs because of heavy computational cost (error-correction decoding, as mentioned in the article) that might benefit from this.
---
As far as I know, it is still an open question whether it is possible to generalize this approach for data-efficiency. I have a few ideas I've been playing around with but none that seem to work yet.