This is probably going to be a hyper-parallel fixed-point / integer engine like TPU gen1. CPUs and GPUs are quite poor at fast matrix multiplication over really narrow integer types, and that was the initial reasoning behind TPU gen1: improving runtime performance.
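For a sense of the workload, here is a minimal sketch (NumPy, with made-up sizes and random values, not any vendor's actual kernel) of the TPU-gen1 style of 8-bit matrix multiply with 32-bit accumulation:

```python
import numpy as np

# Toy 8-bit matrix multiply with 32-bit accumulation, the core operation a
# fixed-point inference engine is built around. Shapes are illustrative only.
A = np.random.randint(-128, 128, size=(4, 4), dtype=np.int8)
B = np.random.randint(-128, 128, size=(4, 4), dtype=np.int8)

C = A.astype(np.int32) @ B.astype(np.int32)  # widen before accumulating to avoid overflow
print(C)
```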
One question is whether it will architecturally be closer to a GPU or an FPGA. The field moves so fast that it might make sense to "future-proof" a bit with a live-reconfigurable FPGA.
I'd bet on this being an ASIC. Doing this on an FPGA with any seriously sized matrix would require a very expensive FPGA, whereas an ASIC allows more gates in a smaller area and consumes less power to boot.
From the manufacturer's point of view, phones not being future-proof is a feature, not a bug: that way you'll upgrade to the new shiny item, which keeps the profits rolling in.
> From the manufacturer's point of view, phones not being future-proof is a feature, not a bug: that way you'll upgrade to the new shiny item, which keeps the profits rolling in.
I don't think a better AI chip will be a convincing argument to replace a phone just one year later.
Not if you put it that way. Apple can simply make new AI features exclusive to newer phones with updated versions of the chips. If they open the chip up to developers, this effect can spill out to the app store. That would then provide the impetus to push consumers to upgrade.
Aren't all the iPhone chips ASICs, with the main one custom-designed by the hardware team they acquired? That seems to be the default expectation for whatever they add next. They sure as hell have the money, too. :)
DSPs are not nearly as good at matrix math as a GPU, and the phone already has one of those. DSPs are typically good at signal processing in a fairly limited domain; they would not be calling it an AI chip if it were just a DSP.
You don't put an FPGA in a device you're going to sell 200M+ of. The cost per unit would be way higher than an ASIC, and you're just going to come out with a better version next year anyway, so why bother?
I foresee it as similar to their M series co-processors - the first one was pretty basic, and more sensors and jobs have been given to the newer ones each year.
I think a lot of people in this thread are making incorrect assumptions about FPGA implementations of neural network applications.
(1) Forward networks are constant multiplications, i.e. fixed shift-and-add. FPGAs are a very nearly optimal architecture for programmable constant shift-and-add (see the toy sketch after this list).
(2) Individual neurons in a network can be bitwidth-optimized and Huffman-encoded for bit-serial execution. FPGAs are a very nearly optimal architecture for variable bit-width operations in a bit-serial architecture with programmable Huffman decoders. [edited: Huffman encoding, not Hamming]
(3) Running a forward network requires multiple channels of parallel memory with separate but deterministic access patterns. Most FPGA architectures are designed with on-chip RAM specifically to be used this way.
(4) FPGA architectures can be designed to be inherently fault- and defect-tolerant, like GPUs disabling cores but with finer granularity. Especially if compilation is done in the cloud, the particular defect/yield profile can be stored for placement optimization.
(5) Anything optimized for ASIC design will necessarily be so close to an FPGA that it may as well benefit from the existing programmable-logic ecosystem and be flexibly optimized for a particular trained network. You can't tape out an ASIC for every trained network, but based on my previous points, you most likely can optimize the logic for a specific forward network to run on an FPGA better than any ASIC designed to run arbitrary networks.
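To make point (1) concrete, here's a toy sketch (plain Python, not HDL, with a made-up weight) of how a fixed weight turns into shifts and adds once the network is frozen:

```python
# A constant weight of 10 applied to an activation x needs no general-purpose
# multiplier: 10 = 8 + 2, so x*10 = (x << 3) + (x << 1). An FPGA's LUT fabric
# implements exactly this kind of fixed shift-and-add network very cheaply.
def times_10(x: int) -> int:
    return (x << 3) + (x << 1)

assert times_10(7) == 70
```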
There is a tiny one in the iPhone 7. But, that's for flexibility on current tasks not future proofing.
In terms of AI there is little reason to run it on the phone unless it's heavily used or needs to be low latency. Consider: if they add $100 of computing power to a phone that sits unused 99% of the time, they could instead build a server from those same $100 worth of parts that serves 100 phones, saving $90+ per phone including upkeep, etc.
PS: This is the same logic why Amazon Echo is so cheap, it simply does not need much hardware.
It could be both. Perhaps Apple concluded that 1) they're subpar with cloud services and will have difficulty competing, 2) there's a growing need/demand for more privacy and less 'cloud', and 3) Apple's products are already, on the whole, recommended when it comes to privacy.
And based on that they figured privacy was a good thing to aim for. Play to their strengths and differentiate based on that.
Facebook has Caffe2Go. Apple is working on this (and already has bindings optimized to use the ARM vector unit for DNN evaluation).
Running on device, if it can be done with reasonable power, is a win for everyone. Better privacy, better latency, and more robust operation in the face of intermittent connectivity.
It'll be an ASIC, so more GPU than FPGA. The real reason to upgrade the chip would be to add more transistors rather than any real instruction-set upgrade, so an FPGA doesn't really get Apple anything other than cost and wasted space.
FPGAs are pretty bad space-wise and power-wise compared to straight up ASICs. Apple could make some blocks highly configurable, but even an FPGA designer wouldn't use FPGA fabric to do multiplication if they cared about performance. FPGAs are a mix of general purpose logic blocks (the fabric) and dedicated blocks like multipliers, dividers, PLLs, memory, serializers and deserializers, etc.
That's what I'm thinking - some sort of configurable FPGA-like fabric around a bulk of TPUv1-style cores, maybe for routing outputs around so you can do some nice pipelining, like you might want with CV on video.
I don't think space is an issue, but an ASIC designed exactly for a workload will always beat an FPGA on power. But if you don't know the workload exactly or don't have the money to fab an ASIC then an FPGA will be superior if the workload is a bad fit for CPUs or GPUs. So if you can save (2-10)x power on some unknown ML workload in the future that might be preferable to (10-20)x on some fixed workload with a fixed-point ASIC.
Bitcoin mining, for example, went GPU -> FPGA -> ASIC, each step requiring more design investment but delivering higher overall performance in hashes/W. But that workload is known exactly.
I doubt they'll do an FPGA. Devices are too concerned with battery life to be running that, plus their margins would suffer or it'd be even more expensive.
Could this mean on-device ANI? My deal breaker with Amazon, Google, Microsoft and even Siri is their role in normalising the hoovering up of sensitive data.
More work needs to be done on training models with less data, differential privacy, and unsupervised learning, but as long as supervised learning remains the main path forward for the current crop of "AI", centralizing data into ginormous datasets will continue to be the norm.
I don't see how unsupervised learning makes this any better? That data you're training on in an unsupervised manner is still collected somewhere, and could contain as much private information as a labeled dataset.
Labels for supervised training tend to come from humans in the loop. I think many would consider another human looking at their photos, searches, etc., to be a loss of privacy, albeit with a small surface area.
I think so - they acquired a company a few years back, percept.io, that did on-device learning. I wouldn't be surprised if they were starting to put it in production.
Why is Bloomberg not mentioning that Google announced it was working on the same thing? They mentioned vaguely that Amazon and Google both were working on AI, but nothing about the seemingly similar TPU and how Google announced they were going to bring it to phones at I/O just a bit ago. Am I wrong to be thinking that's pretty relevant here?
I'm definitely doubting myself now. I remember watching the keynote and thinking how they didn't make a bigger deal out of the on device chip.
The only thing similar I can find is at about 1:22:00 in the keynote here https://youtu.be/Y2VF8tmLFHw but all he actually says is "silicon specific accelerators".
"Google has clearly committed to this vision of AI on the phone. At I/O, the company also unveiled a custom-built chip for both training and running neural networks in its data centers. I asked Google CEO Sundar Pichai if the company might build its own mobile chip along the same lines. He said the company had no plans to do so, but he didn’t rule it out either. “If we don’t find the state-of-the-art available on the outside,” he said, “we try to push it there.”"
"Companies such as Intel are already working on this kind of mobile AI processor."
"There’s already one mobile processor with a machine learning-specific DSP on the market today. The Qualcomm Snapdragon 835 system-on-a-chip sports the Hexagon DSP that supports TensorFlow. DSPs are also used for providing functionality like recognizing the “OK, Google” wake phrase for the Google Assistant, according to Moorhead."
Knowing Apple's (software) prowess in AI the end-result will still likely be shit compared to Google.
(I think what we are seeing here is the usual thing where Apple's software/product/design people decide the iPhone hardware roadmap, rather than the hardware people.)
Siri vs Google Assistant. Google Assistant is way better at understanding voice and performing tasks. I've used both the iPhone and the Pixel, and I can confirm that Google Assistant is far smarter than Siri at this point.
Don't know why you're being downvoted, this has been my experience too. Siri feels more like a toy with its infuriating jokes and stunted capabilities, Google Assistant feels much more polished.
I have confidence that attempting a command I've never tried before with Google Assistant will work, with Siri it's potluck.
Wired recently did a test of the major AI assistants with different accents, and Google Assistant won every round (they gave the first round to Siri for some reason, but if you watch the video, Assistant clearly won that round too).
My assumption has been that Google, Amazon, and Microsoft run the heavy-duty AI in the cloud when possible, benefiting from huge scale and easier updates. Maybe that assumption is wrong?
If it's right, is Apple adopting a more decentralized model, with AI (or more AI) running locally? Could that compete with cloud-based AI's advantages? Obviously it would be better for offline usage, for responsiveness when transmission to the cloud is a significant part of the latency, and for confidentiality.
This is interesting indeed, although I suppose it was somewhat inevitable.
I'm definitely interested in the architectural details of the chip, but I doubt Apple will open up. Apple has control of the software stack and by extension, what models will run on this chip, so I expect that it will be a little bit more special purpose than general purpose.
I have been worried about this trend: if they don't open it up, things like this introduce a disparity between startups that only have access to GPUs and big companies that make their own proprietary ASICs for their proprietary software, such that startups cannot compete.
It's weird to me that you somehow assume no chip makers will move into the market for mobile-ready AI processors, if this really becomes a thing. Apple certainly won't open up its designs, assuming they exist and ship. There's strongly negative incentive and cultural inclination for them to become a chip vendor.
This disparity has always existed, though. Big companies can throw money at things that start-ups can't.
So you rent or borrow from a bigger company while you can (Cloud TPUs), or you specialize in doing things that big companies with inflexible purpose-built hardware can't.
The flip side is that there is pretty obviously a market for such a product. If it isn't released by google or apple, it will be released by someone else. If it isn't, then that is a pretty good idea for a startup.
Only well-funded startups will make ASICs, and most of them will fail. This is very different from many small startups programming general-purpose computers.
The big companies can make their own. The next smaller companies will gather together and create a company (this is essentially how ARM started) to create one chip that works for all.
Most likely some kind of dedicated deep learning accelerator. This is coming with or without Apple:
> Exynos 8895 features VPU (Vision Processing Unit) which is designed for machine vision technology. This technology improves the recognition of an item or its movements by analyzing the visual information coming through the camera. Furthermore, it enables advanced features such as corner detection that is frequently used in motion detection, image registration, video tracking and object recognition.
> New Vision Processing Unit (VPU) paired to the Image Signal Processors (ISP) that provides a dedicated processing platform for numerous camera features, freeing up the CPU and GPU and saving power.
I was also hoping that, given the TPU's high efficiency, Google would make a version for mobile as well, at least for their Pixel phones. I guess they still might, but the fact that the TPU2 does both training and inference makes me think they won't do that anytime soon.
The biggest reason I like this "mobile AI chips" trend is that they can give you back some privacy, if the data can be analyzed locally without going to the vendors' servers, and I think they will also boost the capabilities of computational photography. No more spying toys for kids, etc.
No idea what instruction set the Apple device uses, but Google just announced alpha access to their Tensor Processing Units on Google Cloud: https://cloud.google.com/tpu/
It's mainly moving memory around, matrix multiplication, convolution, and applying activation functions (sigmoid, tanh, relu, etc.). Very simple, high-level stuff. This has the handy side-effect of making timing very predictable, which makes the latency a lot more deterministic.
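A minimal sketch of that inference loop (NumPy, with made-up shapes and random weights purely for illustration):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Two dense layers: each is just a matrix multiply followed by an element-wise
# activation, so the hardware's job is mostly streaming weight matrices through
# a big multiply unit on a predictable schedule.
x = np.random.rand(1, 128).astype(np.float32)    # input activations
W1 = np.random.rand(128, 64).astype(np.float32)  # layer-1 weights
W2 = np.random.rand(64, 10).astype(np.float32)   # layer-2 weights

y = sigmoid(relu(x @ W1) @ W2)
print(y.shape)  # (1, 10)
```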
I wish they would have called it Apple Neural Technology, so we could start referring to the devices as ANTs and Hives and Colonies as we build out Richard Hendricks's new decentralized internet.
It's nice that the article is trying to deliver an intro that explains that Apple clearly has some catching up to do.
Except that now I'm pretty baffled, since I saw an article a few months earlier saying Apple is massively investing in AI and already using it in several places in their products.
Apple is investing massively into AI and is using it in their products. However, Google has been working with AI longer and has much more experience. (The article praises Amazon's AI chops as well, I dunno about that one.)
I think the general availability of TPUs is an important inflection point on the path to AI popularization and, who knows, some type of singularity event. Definitely a milestone in 21st-century history.
But I can't resist making a parallel between evolving TPUs and how the CPU found in the arm of the T800 changed history (negatively) forever in the Terminator universe.
That's what we need! A bunch of GPUs and computers carrying a car battery, haha. Man, that would be crazy. Pre-trained before it leaves the factory. (I don't know what I'm saying.) But I do imagine a man-sized humanoid robot with a bunch of GPUs and hard drives.
I'm not too familiar with the concept of ML specific chip designs, but isn't most AI done on servers and the results returned to the device? What kind of applications involve local ML code execution?
Just as going from scalar to vector instructions provides a speedup, so does going from vector to matrix instructions. If you've already got wide vectors, the extra parallelism exposed per unit of additional hardware isn't huge, but the reduction in register-file read-port usage is pretty significant.
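Back-of-the-envelope on that read-port point (the numbers are illustrative, not tied to any particular ISA):

```python
# A scalar multiply-accumulate reads 2 operands per multiply. An NxN matrix
# instruction reads 2*N*N operands but performs N*N*N multiplies, so each
# value fetched from the register file is reused N times.
N = 8
scalar_reads_per_multiply = 2
matrix_reads_per_multiply = (2 * N * N) / N**3
print(scalar_reads_per_multiply, matrix_reads_per_multiply)  # 2 vs 0.25
```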
Also, inference is usually happy with int8s whereas graphics workloads are mostly float32s. So you can save a lot of hardware that way too.
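As a rough sketch of what int8 inference storage looks like (symmetric per-tensor quantization with made-up weights; real schemes vary):

```python
import numpy as np

w_fp32 = np.array([0.12, -0.53, 0.98, -0.07], dtype=np.float32)

# Map the largest magnitude onto the int8 range; store weights as int8 + one scale.
scale = np.abs(w_fp32).max() / 127.0
w_int8 = np.round(w_fp32 / scale).astype(np.int8)

# De-quantize to check the approximation error.
w_restored = w_int8.astype(np.float32) * scale
print(w_int8, np.max(np.abs(w_restored - w_fp32)))
```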
Why are graphics workloads float32? 32-bit "true color" (roughly 16 million colors plus an 8-bit alpha channel, sometimes) is higher color resolution than most eyes can distinguish, and it's just three 8-bit ints plus the alpha.
Because before you can see a color, you have to compute it. For example, you need to calculate what color would result from the interaction of a light source of a given color / intensity and a surface of a given color / reflectivity / glossiness etc. There's no way to reasonably compute that using just 8-bit ints without getting terrible banding / quantization artifacts.
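A toy illustration of that quantization argument (contrived numbers, just to show where the artifacts come from):

```python
# 300 dim light sources each contribute 0.1% of full brightness to a pixel.
# In float the contributions add up to a clearly visible 30%; quantized to
# 8 bits per contribution, each one rounds to zero and the pixel stays black.
contrib = 0.001
n_lights = 300

fp32_pixel = contrib * n_lights                                        # 0.3
int8_pixel = sum(round(contrib * 255) for _ in range(n_lights)) / 255  # 0.0

print(fp32_pixel, int8_pixel)
```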
I am 100% positive that they considered that before starting to work on this ASIC. Some possible reasons: GPUs are insufficiently specialized, or use too much energy on the subset of work that Apple wants to enable with this chip. GPUs are too large. GPUs are busy doing other things, etc.
They considered it sure - but why did they conclude they should go with an ASIC? That's what grandparent asked and it was a reasonable question. "They considered that" isn't a suitable answer.
I think when you add "why try to reinvent the wheel?" to the end, it is less of a question and more of a statement. Similar to saying, "Why would you do that?" after someone does something silly. You aren't actually asking them why they'd do the thing. You're saying they ought not have.
The rest of the reply seems to be an answer (as far as you can get with Apple)
I don't think it's wild speculation to say Apple is looking for efficiencies they may not have been able to get with GPUs, especially performance per watt, since so many of their devices are mobile-focused.
To be fair to nsxwolf, I did not originally explain. I tend to gradually expand my comments. The first iteration was just lashing out at this trend I see on this site which I highly disapprove of: facile reactions to any work that the commenter does not understand. I really detest this reaction that boils down to, "I once heard about a tensor, so clearly I have a better idea of whether this chip should be invented than the experts working at Apple."
Despite GPUs being fairly "general purpose" these days, there is still a lot of circuitry built for graphics-pipeline-style workloads. If you just want to do linear algebra, you need a high-bandwidth interface to memory and lots of math units.