A Deep Learning USB Stick (movidius.com)
194 points by zzzzFrog on April 28, 2016 | 53 comments



Lots of misunderstanding in this comments section.

Movidius makes low-power neural network processors for mobile applications. The Myriad V1 is used in Google Tango, and the V2 (what the USB stick has) is used in the new DJI Phantom 4.

http://www.theverge.com/2016/3/16/11242578/movidius-myriad-2...

The Myriad chips are interesting because they combine MIPI camera interface lanes and a general-purpose NN/CV processor on the same chip, and come with an SDK of hardware-accelerated computer vision functions (edge detection, Gaussian blur, etc.).

Here's the white paper for the chip: http://uploads.movidius.com/1441734401-Myriad-2-product-brie...

Because programming these chips essentially requires having the hardware, and because the hardware was very hard to come by, programming these chips was mostly limited to Google, DJI, and other big partners.

With this release the everyday developer has access to these vision processing chips, and the barrier to entry for development is considerably lower.

This is not meant to replace your Titan X GPU.


This is their own press release. What does that kind of hardware for CV primitives have to do with deep learning?

(Also, of course, this stick doesn't seem to have any connectivity besides the USB to the host computer. How do I connect my camera? Having to shuffle the data from a camera to the stick through the host computer somewhat defeats the point.)


>What does that kind of hardware for CV primitives have to do with deep learning?

They have hardware convolutions on 12 SHAVE cores (kind of a DSP core). It means the chip can run a useful subset of convolutional neural networks very fast and very energy-efficiently.
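
To make the cost concrete, here's a rough sketch (my own illustrative numbers, not Movidius's) of how many multiply-accumulates a single convolutional layer burns, which is why a dedicated convolution unit matters:

    def conv_macs(h, w, c_in, c_out, k):
        # Multiply-accumulates for one conv layer, stride 1, no padding.
        return (h - k + 1) * (w - k + 1) * c_in * c_out * k * k

    # e.g. a single 3x3 conv over a 224x224 RGB image producing 64 feature maps:
    print(conv_macs(224, 224, 3, 64, 3))  # ~85 million MACs for just one layer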

They also have two general-purpose SPARC cores, which lets you run a "normal" program on there. Not sure how locked down the USB stick is going to be, or whether running your own custom program will be an option.

>How do I connect my camera?

The chip itself has a couple of MIPI lanes. The USB stick likely does not expose them. And I agree, that's suboptimal.


It doesn't defeat the point. You can still do development on it just fine.


I don't fully get the negativity of the HN comments toward this product based solely on the fact that it's currently a USB stick.

I think it can make sense to push this kind of chip to market early rather than waiting for it to be bundled with another device. Kudos for the Caffe support.


This is intended for development. I think HN commenters are missing that.

Given the limited availability of phones and products with the Myriad 2, and the fact that you might not be targeting a phone anyway, being able to buy and develop on a cheap USB stick is an incredibly smart move.


I don't see why you would need a deep learning NN on a phone.

If this is for CV, why not use the myriad DSPs from every silicon company ever?

Seems like a buzzword fueled product.


Tell that to Google and DJI, which use them in their own products.

They aren't just for NNs; they are general-purpose image processing chips with NNs as one option.


Because phones are ubiquitous?

Likewise, I can imagine people using this with an embedded Raspberry Pi.


Do you guys know if the Myriad V2 processor is locked down (just neural networks) or whether it will be programmable?


Well, it would be suited to other computer vision tasks as well, given the hardwired accelerator cores.



Haven't found any info on availability or price. Interesting concept, anyway.



Wouldn't this be limited to something like 10 Gb/s throughput? That's not much compared to a GPU bus. Cool idea, though. I wish this press release wasn't so light on specs.


It looks like they have developed an ASIC for ML. The obvious use case would be mobile devices, where performance/power and latency (versus going to the cloud) are concerns.


Can you connect it to a phone with a micro-B USB adapter, and then use it to run pre-trained networks with your phone's camera, like I imagine it's used in the DJI Phantom 4? I know the USB connection to the camera and onboard CPU won't be the same as the DJI's bus, but it would make for an interesting mobile testing platform for me to learn on.


Use a Surface tablet?


This page absolutely wrecks Firefox 45.0.2. UI thread gets blocked indefinitely and spins a core. :-(


This collab with FLIR on thermal cameras is interesting too. The video gives a bit more info about the Myriad 2:

https://www.youtube.com/watch?v=hsopAM8FexE


How about this kind of extra processor coming in a mobile phone, improving regular camera vision, all the health sensors, and everything else we can do with mobile phones?


It does.

The Google Tango uses the Myriad V1. The new Tango, to be released in May, comes with this new V2 chip.


This is not at all the case. Deep learning can do some great things, but it's not magic.


You don't necessarily need it; Qualcomm has already demonstrated live image classification on tablets, and it runs on a Snapdragon CPU:

https://www.youtube.com/watch?v=jkDSACXkErY


Someday they will be, I'm sure. Probably as part of the SoC.

But not yet; the software isn't ready for it yet.


Seems like the perfect opportunity to insert a plug for my project: http://liblikely.org/


Betting on local instead of cloud is always an interesting gamble.

Pros:

- Security

- Control

Cons:

- Resource limitations

- (...)


It looks like this is for running already-trained networks, and I think that's really the only practical way to do things right now if you want to make sense of something like video in real time because of bandwidth constraints. It looks like training would happen in 'the cloud' or similar.


Local is also useful in situations with limited or no internet access. Say you're trying to do deep recognition on a live video feed: many places where this would be useful simply do not have the bandwidth available to stream video.
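
For a rough sense of scale (my own back-of-the-envelope numbers, not from the article):

    # Raw bandwidth needed to stream uncompressed 720p video at 30 fps:
    width, height, bytes_per_pixel, fps = 1280, 720, 3, 30
    mbits_per_second = width * height * bytes_per_pixel * 8 * fps / 1e6
    print(mbits_per_second)  # ~660 Mbit/s before any compression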


Also useful in latency-sensitive applications, such as the drone flight demo they use in their video.


Pro: Reliability/longevity (your product still works even six months later when the cloud API provider has been acqui-hired.)


How powerful is it?


Not very...

At 15 inferences per second in fp16 for GoogLeNet, I'd guesstimate 50-60 GFLOP/s. That would give it very roughly 2x perf/W over the Titan X.
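
For what it's worth, a back-of-the-envelope version of that guess, assuming GoogLeNet's commonly cited ~1.5 billion multiply-adds (~3 GFLOPs) per forward pass:

    googlenet_flops = 3.0e9      # assumed cost of one GoogLeNet forward pass
    inferences_per_sec = 15
    print(googlenet_flops * inferences_per_sec / 1e9)  # ~45 GFLOP/s sustained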


Given that figure and the Titan X's 250 W TDP, that would put it at somewhere around 1/125th of the performance. That's probably more than good enough for inference on a given network, but you still need something beefy for the training.

It's still pretty interesting, though, since you only need to do the training once.


I almost closed the tab with this page, the classifier in my head marked it as an ad.


Looks like fraud.


Hey, I'll bite. Why do you think that?


Their product is based on a buzzword and uses unknown hardware, and the specs seem underwhelming (1 W + USB = ?). Their page is filled with videos of models and nature shots, plus a bunch of news categories that have only a few articles from 2013/14. There's no way to order anything, yet they put out some wild claims:

'With Fathom, every robot, big and small, can now have state-of-the-art vision capabilities'

'It means the same level of surprise and delight we saw at the beginning of the smartphone revolution'

'With more than 1 million units of Myriad 2 already ordered'

tl;dr: because, to an outsider, it sure does look like fraud.


Their hardware is integrated into many devices, like Google's Tango.


To clarify: I'm no expert (I only wrote one thesis and designed a nonlinear manipulator motion control system based on an NN). But watching their video, even as a noob, feels like drinking a stream of bullshit from a fire hose. If it really works and does what they claim, I beg them: fire your marketing/media/whatever guy. It looks like an ad for self-sharpening knives.



Just saying - I'm digging the idea of a USB stick that turns my laptop into an artificial intelligence.


I thought TensorFlow already did this; how is this USB stick different?

Seems like the more logical approach would be a widget that app developers could use to easily deploy embedded TensorFlow builds on Android and iPhone. Has anyone looked into doing this, or found someone already doing it?


My understanding is that it allows you to take your TensorFlow model, transfer it to the USB stick, and plug it into (interface with) your third-party hardware (other than your phone), which isn't connected to the cloud.
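
As a sketch of the kind of artifact you'd transfer: this is just the generic TensorFlow "freeze the graph" step with a toy model and 2016-era TF APIs, not anything specific to the Fathom workflow (which isn't documented here):

    import tensorflow as tf
    from tensorflow.python.framework import graph_util

    # Toy stand-in model, just to show the export mechanics: y = softmax(Wx + b)
    x = tf.placeholder(tf.float32, [None, 4], name="input")
    w = tf.Variable(tf.ones([4, 2]))
    b = tf.Variable(tf.zeros([2]))
    y = tf.nn.softmax(tf.matmul(x, w) + b, name="output")

    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        # Bake the (trained) variable values into constants so the whole model
        # becomes one portable .pb file you could hand to an embedded runtime.
        frozen = graph_util.convert_variables_to_constants(
            sess, sess.graph_def, ["output"])
        tf.train.write_graph(frozen, "export", "model_frozen.pb", as_text=False)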


This is a dev board, not a consumer product. And (contrary to the title) the press release explicitly says that it is not intended for "deep learning."

Acceleration is needed for training -- not for running the models themselves. A quick glimpse at the power used (1 watt) lets you know exactly how much "acceleration" is going on here. This is meant for tiny devices.

EDIT: My point is that this is a small-run dev board for a chip for some future $19 nannycam. It's not an "accelerator" you install on your PC to put your graphics card to shame running TensorFlow.

EDIT #2: This is another one of those HN threads that's overrun by enthusiasts. Jamming a chip onto a stick is simply how they sell embedded crap now.

Here's a crypto chip that'll really get you guys going: http://www.atmel.com/tools/AT88CK590.aspx


>Acceleration is needed for training -- not running the models themselves

This isn't true. Running neural networks (including CNNs) can be computationally and power intensive, and lends itself to the vector operations of GPUs, FPGAs, and ASICs. Putting the computations on devoted hardware could enable embedded applications that simply aren't possible otherwise.
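
A minimal sketch of why conv nets map so well onto vector hardware: the standard im2col trick turns a convolution into one big matrix product (illustrative NumPy, not any vendor's API):

    import numpy as np

    def im2col(x, k):
        # Unroll every k x k patch of a 2-D image into one row of a matrix.
        h, w = x.shape
        return np.array([x[i:i + k, j:j + k].ravel()
                         for i in range(h - k + 1)
                         for j in range(w - k + 1)])

    img = np.random.rand(8, 8)
    kernel = np.random.rand(3, 3)
    # The convolution collapses to a single matrix-vector product, exactly the
    # regular, dense arithmetic that GPUs, FPGAs, and ASICs are built for.
    out = (im2col(img, 3) @ kernel.ravel()).reshape(6, 6)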

Here's a whitepaper by Microsoft about using FPGAs to speed up CNNs: http://research.microsoft.com/apps/pubs/?id=240715

Article by Google explaining the importance of optimizing neural networks to run on mobile phones: http://googleresearch.blogspot.com/2015/07/how-google-transl...


The Google article specifically says they were able to solve the problem on the (tiny) CPU in the phone by using vector operations on the CPU. Again, this is a tiny doodad for power-starved devices that lack any micro.

Have you played with Apple Accelerate? They've been baking this stuff into their chips for quite some time. Apple's FFT outperforms FFTW. https://developer.apple.com/library/mac/navigation/#section=...


Are there any good examples of FPGA implementations of CNNs?

I see one example in Verilog on GitHub: https://github.com/ziyan/altera-de2-ann/blob/master/src/ann/...



Acceleration is actually useful for both. If you're running in a constrained, mobile or sensor environment, then you really do want acceleration that improves your power efficiency. It's also useful if you do high volume serving, but that's obviously not the case here.

(For example, Google's voice recognition on Android can run when offline.)


Sure, maybe a nannycam. Maybe whatever you want to run on your Arduino / Raspberry Pi. Maybe bolted on to a small lightweight drone swarm that communicates to itself while offline to learn about its environment.

What happens when 1000 of these are plugged in to 1000 drones and they all communicate? I don't know enough to even guess but perhaps something interesting and dangerous?


At first glance, I think they would take off and be airborne. Other than that, they would do whatever they were programmed to do. I know, my comment is stupid, but I thought I should fit in.


>> Acceleration is needed for training -- not running the models themselves.

Sometimes training is done on the customer's site. For example, a noise cancellation algorithm may learn the audio characteristics of the user's environment to offer better performance.
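
A minimal sketch of that idea: a classic LMS adaptive filter that "trains" on-device against a reference noise signal (illustrative NumPy, not a claim about any shipping product):

    import numpy as np

    def lms_denoise(reference_noise, noisy_signal, mu=0.01, taps=16):
        # Learn how the reference noise leaks into the signal, then subtract it.
        w = np.zeros(taps)
        cleaned = np.zeros_like(noisy_signal)
        for n in range(taps, len(noisy_signal)):
            x = reference_noise[n - taps:n][::-1]  # most recent samples first
            e = noisy_signal[n] - w @ x            # error doubles as the cleaned output
            w += 2 * mu * e * x                    # gradient step toward this room's noise profile
            cleaned[n] = e
        return cleaned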



