This is probably going to be a hyper-parallel fixed-point / integer engine like TPU gen1. CPUs and GPUs are quite poor at fast matrix multiplication over really narrow integer types, and that was the initial reasoning behind TPU gen1: improving runtime performance.
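For a sense of the workload, here is a minimal sketch (NumPy, with made-up sizes and random values, not any vendor's actual kernel) of the TPU-gen1 style of 8-bit matrix multiply with 32-bit accumulation:

```python
import numpy as np

# Toy 8-bit matrix multiply with 32-bit accumulation, the core operation a
# fixed-point inference engine is built around. Shapes are illustrative only.
A = np.random.randint(-128, 128, size=(4, 4), dtype=np.int8)
B = np.random.randint(-128, 128, size=(4, 4), dtype=np.int8)

C = A.astype(np.int32) @ B.astype(np.int32)  # widen before accumulating to avoid overflow
print(C)
```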
One question is whether it will architecturally be closer to a GPU or an FPGA. The field moves so fast that it might make sense to "future-proof" a bit with a live-reconfigurable FPGA.
I'd bet on this being an ASIC. Doing this on an FPGA with any seriously sized matrix would require a very expensive FPGA, whereas an ASIC allows more gates in a smaller area and consumes less power to boot.
From the manufacturer's point of view, phones not being future-proof is a feature, not a bug: that way you'll upgrade to the new shiny item, which keeps the profits rolling in.
> From the manufacturer's point of view, phones not being future-proof is a feature, not a bug: that way you'll upgrade to the new shiny item, which keeps the profits rolling in.
I don't think a better AI chip will be a convincing argument to replace a phone just one year later.
Not if you put it that way. Apple can simply make new AI features exclusive to newer phones with updated versions of the chips. If they open the chip up to developers, this effect can spill out to the app store. That would then provide the impetus to push consumers to upgrade.
Aren't all the iPhone chips ASICs, with the main one custom-designed by the hardware team they acquired? That seems to be the default expectation for whatever they add next. They sure as hell have the money, too. :)
DSPs are not nearly as good at matrix math as a GPU, and the phone already has one of those. DSPs are typically good at signal processing in a fairly limited domain; they would not be calling it an AI chip if it were just a DSP.
You don't put an FPGA in a device you're going to sell 200M+ of. The cost per unit would be way higher than an ASIC, and you're just going to come out with a better version next year anyway, so why bother?
I foresee it as similar to their M series co-processors - the first one was pretty basic, and more sensors and jobs have been given to the newer ones each year.
I think a lot of people in this thread are making incorrect assumptions about FPGA implementations of neural network applications.
(1) Forward networks are constant multiplications, i.e. fixed shift-and-add. FPGAs are a very nearly optimal architecture for programmable constant shift-and-add (see the toy sketch after this list).
(2) Individual neurons in a network can be bitwidth-optimized and Huffman-encoded for bit-serial execution. FPGAs are a very nearly optimal architecture for variable bit-width operations in a bit-serial architecture with programmable Huffman decoders. [edited: Huffman encoding, not Hamming]
(3) Running a forward network requires multiple channels of parallel memory with separate but deterministic access patterns. Most FPGA architectures are designed with on-chip RAM specifically to be used this way.
(4) FPGA architectures can be designed to be inherently fault- and defect-tolerant, like GPUs disabling cores but with finer granularity. Especially if compilation is done in the cloud, the particular defect/yield profile can be stored for placement optimization.
(5) Anything optimized for ASIC design will necessarily be so close to an FPGA that it may as well benefit from the existing programmable-logic ecosystem and be flexibly optimized for a particular trained network. You can't tape out an ASIC for every trained network, but based on my previous points, you most likely can optimize the logic for a specific forward network to run on an FPGA better than any ASIC designed to run arbitrary networks.
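To make point (1) concrete, here's a toy sketch (plain Python, not HDL, with a made-up weight) of how a fixed weight turns into shifts and adds once the network is frozen:

```python
# A constant weight of 10 applied to an activation x needs no general-purpose
# multiplier: 10 = 8 + 2, so x*10 = (x << 3) + (x << 1). An FPGA's LUT fabric
# implements exactly this kind of fixed shift-and-add network very cheaply.
def times_10(x: int) -> int:
    return (x << 3) + (x << 1)

assert times_10(7) == 70
```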
There is a tiny one in the iPhone 7. But, that's for flexibility on current tasks not future proofing.
In terms of AI there is little reason to run it on the phone unless it's heavily used or needs to be low latency. Consider: if they add $100 of computing power to a phone that sits unused 99% of the time, they could instead build a server from those same $100 worth of parts that serves 100 phones, saving $90+ per phone including upkeep, etc.
PS: This is the same logic why Amazon Echo is so cheap, it simply does not need much hardware.
It could be both. Perhaps Apple concluded that 1) they're subpar with cloud services and will have difficulty competing, 2) there's a growing need/demand for more privacy and less 'cloud', and 3) Apple's products are already, on the whole, recommended when it comes to privacy.
And based on that they figured privacy was a good thing to aim for. Play to their strengths and differentiate based on that.
Facebook has Caffe2Go. Apple is working on this (and already has bindings optimized to use the ARM vector unit for DNN evaluation).
Running on device, if it can be done with reasonable power, is a win for everyone. Better privacy, better latency, and more robust operation in the face of intermittent connectivity.
It'll be an ASIC, so more GPU than FPGA. The real reason to upgrade the chip would be to add more transistors rather than any real instruction-set upgrade, so an FPGA doesn't really get Apple anything other than cost and wasted space.
FPGAs are pretty bad space-wise and power-wise compared to straight up ASICs. Apple could make some blocks highly configurable, but even an FPGA designer wouldn't use FPGA fabric to do multiplication if they cared about performance. FPGAs are a mix of general purpose logic blocks (the fabric) and dedicated blocks like multipliers, dividers, PLLs, memory, serializers and deserializers, etc.
That's what I'm thinking - some sort of configurable FPGA-like fabric around a bulk of TPUv1-style cores, maybe for routing outputs around so you can do some nice pipelining, like you might want with CV on video.
I don't think space is an issue, but an ASIC designed exactly for a workload will always beat an FPGA on power. But if you don't know the workload exactly or don't have the money to fab an ASIC then an FPGA will be superior if the workload is a bad fit for CPUs or GPUs. So if you can save (2-10)x power on some unknown ML workload in the future that might be preferable to (10-20)x on some fixed workload with a fixed-point ASIC.
Bitcoin mining, for example, went GPU -> FPGA -> ASIC, each step requiring more design investment but delivering higher overall performance in hashes/W. But that workload is known exactly.
I doubt they'll do an FPGA. Devices are too concerned with battery life to be running that, plus their margins would suffer or it'd be even more expensive.
Could this mean on-device ANI? My deal breaker with Amazon, Google, Microsoft and even Siri is their role in normalising the hoovering up of sensitive data.
More work needs to be done on training models with less data, differential privacy, and unsupervised learning, but as long as supervised learning remains the main path forward for the current crop of "AI", centralizing data into ginormous datasets will continue to be the norm.
I don't see how unsupervised learning makes this any better? That data you're training on in an unsupervised manner is still collected somewhere, and could contain as much private information as a labeled dataset.
Labels for supervised training tend to come from humans in the loop. I think many would consider another human looking at their photos, searches, etc., to be a loss of privacy, albeit with a small surface area.
I think so - they acquired a company a few years back, percept.io, that did on-device learning. I wouldn't be surprised if they were starting to put it in production.
Why is Bloomberg not mentioning that Google announced it was working on the same thing? They mentioned vaguely that Amazon and Google both were working on AI, but nothing about the seemingly similar TPU and how Google announced they were going to bring it to phones at I/O just a bit ago. Am I wrong to be thinking that's pretty relevant here?
I'm definitely doubting myself now. I remember watching the keynote and thinking how they didn't make a bigger deal out of the on device chip.
The only thing similar I can find is at about 1:22:00 in the keynote here https://youtu.be/Y2VF8tmLFHw but all he actually says is "silicon specific accelerators".
"Google has clearly committed to this vision of AI on the phone. At I/O, the company also unveiled a custom-built chip for both training and running neural networks in its data centers. I asked Google CEO Sundar Pichai if the company might build its own mobile chip along the same lines. He said the company had no plans to do so, but he didn’t rule it out either. “If we don’t find the state-of-the-art available on the outside,” he said, “we try to push it there.”"
"Companies such as Intel are already working on this kind of mobile AI processor."
"There’s already one mobile processor with a machine learning-specific DSP on the market today. The Qualcomm Snapdragon 835 system-on-a-chip sports the Hexagon DSP that supports TensorFlow. DSPs are also used for providing functionality like recognizing the “OK, Google” wake phrase for the Google Assistant, according to Moorhead."
Knowing Apple's (software) prowess in AI the end-result will still likely be shit compared to Google.
(I think what we are seeing here is the usual thing where Apple's software/product/design people decide the iPhone hardware roadmap, rather than the hardware people.)
Siri vs Google Assistant. Google Assistant is way better at understanding voice and performing tasks. I've used both the iPhone and the Pixel, and I can confirm that Google Assistant is far smarter than Siri at this point.
Don't know why you're being downvoted, this has been my experience too. Siri feels more like a toy with its infuriating jokes and stunted capabilities, Google Assistant feels much more polished.
I have confidence that attempting a command I've never tried before with Google Assistant will work, with Siri it's potluck.
Wired recently did a test of the major AI assistants with different accents, and Google Assistant won every round (they gave the first round to Siri for some reason, but if you watch the video, Assistant clearly won that round too).
My assumption has been that Google, Amazon, and Microsoft run the heavy-duty AI in the cloud when possible, benefiting from huge scale and easier updates. Maybe that assumption is wrong?
If it's right, is Apple adopting a more decentralized model, with AI (or more AI) running locally? Could that compete with cloud-based AI's advantages? Obviously it would be better for offline usage, for responsiveness when transmission to the cloud is a significant part of the latency, and for confidentiality.
This is interesting indeed, although I suppose it was somewhat inevitable.
I'm definitely interested in the architectural details of the chip, but I doubt Apple will open up. Apple has control of the software stack and by extension, what models will run on this chip, so I expect that it will be a little bit more special purpose than general purpose.
I have been worried about this trend: if they don't open it up, things like this introduce a disparity between startups that only have access to GPUs and big companies that make their own proprietary ASICs for their proprietary software, such that startups cannot compete.
It's weird to me that you somehow assume no chip makers will move into the market for mobile-ready AI processors, if this really becomes a thing. Apple certainly won't open up its designs, assuming they exist and ship. There's strongly negative incentive and cultural inclination for them to become a chip vendor.
This disparity has always existed, though. Big companies can throw money at things that start-ups can't.
So you rent or borrow from a bigger company while you can (Cloud TPUs), or you specialize in doing things that big companies with inflexible purpose-built hardware can't.
The flip side is that there is pretty obviously a market for such a product. If it isn't released by google or apple, it will be released by someone else. If it isn't, then that is a pretty good idea for a startup.
Only well-funded startups will make ASICs, and most of them will fail. This is very different from many small startups programming general-purpose computers.
The big companies can make their own. The next smaller companies will gather together and create a company (this is essentially how ARM started) to create one chip that works for all.
Most likely some kind of dedicated deep learning accelerator. This is coming with or without Apple:
> Exynos 8895 features VPU (Vision Processing Unit) which is designed for machine vision technology. This technology improves the recognition of an item or its movements by analyzing the visual information coming through the camera. Furthermore, it enables advanced features such as corner detection that is frequently used in motion detection, image registration, video tracking and object recognition.
> New Vision Processing Unit (VPU) paired to the Image Signal Processors (ISP) that provides a dedicated processing platform for numerous camera features, freeing up the CPU and GPU and saving power.
I was also hoping that, given the TPU's high efficiency, Google would make a version for mobile as well, at least for their Pixel phones. I guess they still might, but the fact that the TPU2 does both training and inference makes me think they won't do that anytime soon.
The biggest reason I like this "mobile AI chips" trend is that they can give you back some privacy, if the data can be analyzed locally without going to the vendors' servers, and I think they will also boost the capabilities of computational photography. No more spying toys for kids, etc.
No idea what instruction set the Apple device uses, but Google just announced alpha access to their Tensor Processing Units on Google Cloud: https://cloud.google.com/tpu/
It's mainly moving memory around, matrix multiplication, convolution, and applying activation functions (sigmoid, tanh, relu, etc.). Very simple, high-level stuff. This has the handy side-effect of making timing very predictable, which makes the latency a lot more deterministic.
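A minimal sketch of that inference loop (NumPy, with made-up shapes and random weights purely for illustration):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Two dense layers: each is just a matrix multiply followed by an element-wise
# activation, so the hardware's job is mostly streaming weight matrices through
# a big multiply unit on a predictable schedule.
x = np.random.rand(1, 128).astype(np.float32)    # input activations
W1 = np.random.rand(128, 64).astype(np.float32)  # layer-1 weights
W2 = np.random.rand(64, 10).astype(np.float32)   # layer-2 weights

y = sigmoid(relu(x @ W1) @ W2)
print(y.shape)  # (1, 10)
```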
I wish they would have called it Apple Neural Technology, so we could start referring to the devices as ANTs and Hives and Colonies as we build out Richard Hendricks's new decentralized internet.
It's nice that the article is trying to deliver an intro that explains that Apple clearly has some catching up to do.
Except that now I'm pretty baffled, since I saw an article a few months earlier saying Apple is massively investing in AI and already using it in several places in their products.
Apple is investing massively into AI and is using it in their products. However, Google has been working with AI longer and has much more experience. (The article praises Amazon's AI chops as well, I dunno about that one.)
I think the general availability of TPUs is an important inflection point on the path to AI popularization and, who knows, some type of singularity event. Definitely a milestone in 21st-century history.
But I can't resist making a parallel between evolving TPUs and how the CPU found in the arm of the T800 changed history (negatively) forever in the Terminator universe.
That's what we need! A bunch of GPUs and computers carrying a car battery, haha. Man, that would be crazy. Pre-trained before it leaves the factory. (I don't know what I'm saying.) But I do imagine a man-sized humanoid robot with a bunch of GPUs and hard drives.
I'm not too familiar with the concept of ML specific chip designs, but isn't most AI done on servers and the results returned to the device? What kind of applications involve local ML code execution?
Just as going from scalar to vector instructions provides a speedup, so does going from vector to matrix instructions. If you've already got wide vectors, the extra parallelism exposed per unit of additional hardware isn't huge, but the reduction in register-file read-port usage is pretty significant.
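Back-of-the-envelope on that read-port point (the numbers are illustrative, not tied to any particular ISA):

```python
# A scalar multiply-accumulate reads 2 operands per multiply. An NxN matrix
# instruction reads 2*N*N operands but performs N*N*N multiplies, so each
# value fetched from the register file is reused N times.
N = 8
scalar_reads_per_multiply = 2
matrix_reads_per_multiply = (2 * N * N) / N**3
print(scalar_reads_per_multiply, matrix_reads_per_multiply)  # 2 vs 0.25
```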
Also, inference is usually happy with int8s whereas graphics workloads are mostly float32s. So you can save a lot of hardware that way too.
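As a rough sketch of what int8 inference storage looks like (symmetric per-tensor quantization with made-up weights; real schemes vary):

```python
import numpy as np

w_fp32 = np.array([0.12, -0.53, 0.98, -0.07], dtype=np.float32)

# Map the largest magnitude onto the int8 range; store weights as int8 + one scale.
scale = np.abs(w_fp32).max() / 127.0
w_int8 = np.round(w_fp32 / scale).astype(np.int8)

# De-quantize to check the approximation error.
w_restored = w_int8.astype(np.float32) * scale
print(w_int8, np.max(np.abs(w_restored - w_fp32)))
```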
Why are graphics workloads float32? 32-bit "true color" (roughly 16 million colors plus an 8-bit alpha channel, sometimes) is higher color resolution than most eyes can distinguish, and it's just three 8-bit ints plus the alpha.
Because before you can see a color, you have to compute it. For example, you need to calculate what color would result from the interaction of a light source of a given color / intensity and a surface of a given color / reflectivity / glossiness etc. There's no way to reasonably compute that using just 8-bit ints without getting terrible banding / quantization artifacts.
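A toy illustration of that quantization argument (contrived numbers, just to show where the artifacts come from):

```python
# 300 dim light sources each contribute 0.1% of full brightness to a pixel.
# In float the contributions add up to a clearly visible 30%; quantized to
# 8 bits per contribution, each one rounds to zero and the pixel stays black.
contrib = 0.001
n_lights = 300

fp32_pixel = contrib * n_lights                                        # 0.3
int8_pixel = sum(round(contrib * 255) for _ in range(n_lights)) / 255  # 0.0

print(fp32_pixel, int8_pixel)
```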
I am 100% positive that they considered that before starting to work on this ASIC. Some possible reasons: GPUs are insufficiently specialized, or use too much energy on the subset of work that Apple wants to enable with this chip. GPUs are too large. GPUs are busy doing other things, etc.
They considered it sure - but why did they conclude they should go with an ASIC? That's what grandparent asked and it was a reasonable question. "They considered that" isn't a suitable answer.
I think when you add "why try to reinvent the wheel?" to the end, it is less of a question and more of a statement. Similar to saying, "Why would you do that?" after someone does something silly. You aren't actually asking them why they'd do the thing. You're saying they ought not have.
The rest of the reply seems to be an answer (as far as you can get with Apple)
I don't think it's wild speculation to say Apple is looking for efficiencies they may not have been able to get with GPUs, especially performance per watt, since so many of their devices are mobile-focused.
To be fair to nsxwolf, I did not originally explain. I tend to gradually expand my comments. The first iteration was just lashing out at this trend I see on this site which I highly disapprove of: facile reactions to any work that the commenter does not understand. I really detest this reaction that boils down to, "I once heard about a tensor, so clearly I have a better idea of whether this chip should be invented than the experts working at Apple."
Despite GPUs being fairly "general purpose" these days, there is still a lot of circuitry built for graphics-pipeline-style workloads. If you just want to do linear algebra, you need a high-bandwidth interface to memory and lots of math units.