The BeagleBone AI is Available (beagleboard.org)
182 points by FrankSansC on Sept 20, 2019 | 63 comments



Although it is priced competitively (about the same as the Nvidia Jetson Nano, at $125), it seems underpowered compared to the Nano. The Nano has 4GB RAM and 128 CUDA cores, and can encode/decode 4K at 30/60 FPS and handle multiple streams, compared to 15/15(?) for the BeagleBone.

Perhaps the Vision Engine is better for computer vision tasks, but having to use the TIDL suite, compared to the Jetson Nano's JetPack with tools we use regularly on bigger GPUs, is going to be a hard compromise to make.

JetPack includes CUDA 10, TensorRT, OpenCV 3.3.1, etc. by default, and PyTorch is available separately for the Jetson Nano. Besides that, the community is very active.


Thanks for pointing out the Nano; it seems they are pricing it at $99:

https://www.nvidia.com/en-us/autonomous-machines/embedded-sy...


I bought the Nvidia Jetson Nano Dev Kit in India as soon as it became available, for ~$125; in the USA it's a bit cheaper at ~$110 (incl. shipping).

Please note that there is also the Nano module, a SOM (System on Module) with 16GB eMMC storage, sold for $150. I think this is intended for clusters.


Something worth being aware of: the Jetson Nano development kit that's currently available has an early version of the Nano module which isn't pin-for-pin (finger-for-finger?) compatible with the current production version sold separately.

The production Nano module (which you can also now buy) won't work in the dev kit carrier! Similarly, the dev kit Nano module won't work in a carrier compatible with the production Nano module.

Apparently they'll refresh the development kit to match the production Nano module later in the year, but for now it's a big gotcha if you're designing hardware using the dev kit.


That's pretty decent. I think Renesas has a devboard in the $125 price range, but it's not as well specced.


For some tasks (real-time) the BeagleBone's PRUs are a big advantage.


I guess it has its use cases, where just an RTOS doesn't cut it. It would have been a lot easier if they had put up a detailed comparison.


The BeagleBone is dual-core A15 vs. the Jetson's quad A53, so there is a big difference in processing power as well.


In which direction? Remember, the A15 was designed as a "big" core while the A53 was considered a "LITTLE", having a design lineage closer to the A7.


Correction: the Jetson Nano has a quad-core ARM A57 @ 1.43 GHz. Those are the "big" cores of most big.LITTLE configurations, with a better L1/L2 cache (48KB / 2MB unified in the Nano's case) than the A53 cores.


Does anybody actually use these TI deep learning libs? I've never even heard of them before.


Is it just me or is the ® symbol being everywhere a bit off-putting (in terms of parsing the text)?


It's not just you. Any trademark expert will tell you that you only have to use the symbol once, in the first or most prominent place where you use the trademark.

Also when you use other companies' trademarks, you should have a notice of who the trademark owner is, which this press release does for Sitara but not for the other trademarks used in the piece.


> Also when you use other companies' trademarks, you should have a notice of who the trademark owner is

No, you shouldn't. There's no evidence it's required.


> Also when you use other companies' trademarks, you should have a notice of who the trademark owner is

Do you have a cite for this? Thanks.


Sure, here is one:

http://www.bpmlegal.com/tmdodont.html

Search the page for "disclaimer", or just read the whole article - it has a lot of good trademark advice. It also somewhat disagrees with what I said about only using the trademark symbol once - it says to use it the first time and "occasionally thereafter".

And here is an interview with an IP attorney who suggests using it just once:

https://www.forbes.com/sites/work-in-progress/2014/03/12/whe...

(Disclaimer: I'm not a trademark expert! Just passing along what I've read from those who claim to be...)


> (Search the page for "disclaimer".)

This is confusing because that isn't how the USPTO seems to use the term "disclaimer" as regards trademarks:

https://www.uspto.gov/trademark/laws-regulations/how-satisfy...

> What Is a Disclaimer?

> A disclaimer is a statement that you include in your application to indicate that you do not claim exclusive rights to an unregistrable portion of your mark. For example, if you sell shirts and your mark includes the generic word "SHIRTS," you could not object to someone else also using the word “SHIRTS” as part of his/her mark. The word is still part of both marks, but no one is claiming exclusive rights in that word, because it is an 'unregistrable' component of an overall mark. (See below for typical examples of unregistrable matter that must be disclaimed.)

> A disclaimer does not physically remove the unregistrable portion from your mark or affect the appearance of your mark or the way you use it. It is merely a statement that the disclaimed words or designs need to be freely available for other businesses to use in marketing comparable goods or services.


I don't think the bpmlegal article I linked is using the word "disclaimer" in the same sense as USPTO here. One of the examples they list is:

Teflon is a registered trademark of DuPont.

They're just saying you should add a statement like that, not saying this is a "disclaimer" in the specific meaning USPTO uses above.


> They're just saying you should add a statement like that

With absolutely no evidence that it's required.


I feel like I'm reading an infomercial.


It's absolutely illegible.


Is it just me or does 1GB of RAM seem a little low for a $100+ board? I can't seem to find what speed it is either.

I'm not expecting anything crazy like 8GB, but given how many boards sell at ~$50 with 4GB of RAM, this just seems kinda limited.


I'm speculating, but it's likely due to board-size constraints and pricing.

AM57xx SoCs have 2 different DDR3 memory controllers, called EMIFs. Each one has a 32-bit data width and can support up to 2GB of attached memory. A single x16 (16-bit-wide data path) DDR3 chip in the 512MB (4Gb) size is reasonably cheap to buy today, but doubling that to 1GB (8Gb) makes the price go up something like 4x, since there's just not much volume sold in that size. Only 2 DDR3 chips could fit within the BeagleBone's board size, so only 1 EMIF is used, and to keep costs reasonable they used 512MB (4Gb) chips, leading to 1GB of total DDR3.

Related: the C66 DSPs and M4 cores inside the AM57xx can only access the EMIF interfaces over the L3_MAIN core interconnect within the SoC. L3_MAIN is a 32-bit address-bus interconnect, so it can address at most 4GB of memory address space. TI's memory maps for the DSPs and M4 cores both start the EMIF addressing at 0x8000_0000, so the DSPs and M4 cores can access at most 2GB of DDR3. Addresses below 0x8000_0000 are mostly memory-mapped peripherals, like the GPIOs and PCIe interfaces. The Cortex-A15 cores can use an extended addressing mode and access the EMIFs through a different interface; they don't have to use L3_MAIN (they can, but there's errata). So if you connect more than 2GB of DDR3 to an AM57xx, the big Cortex-A15s can access it but nothing else can, which makes it less useful/valuable for a lot of designs.
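
To make the address-space math concrete, here's a back-of-envelope sketch in Python (the 0x8000_0000 base and the 32-bit L3_MAIN width come from the TRM details above; the rest is just arithmetic):

    # How much DDR3 the DSPs/M4s can see over the 32-bit L3_MAIN interconnect.
    L3_MAIN_ADDR_BITS = 32           # 32-bit address bus
    EMIF_BASE = 0x8000_0000          # DDR3 starts here in the DSP/M4 memory map

    total = 2 ** L3_MAIN_ADDR_BITS   # 4 GiB of addressable space
    ddr_window = total - EMIF_BASE   # everything above the peripheral region

    print(ddr_window / 2**30, "GiB") # -> 2.0 GiB, matching the 2GB limit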

If you want a dev board with more DDR3 and an AM57xx, the BeagleBoard-X15 is an option, although it appears to be hard to buy just the X15 from most distributors right now as they're out of stock. TI still sells the AM572x dev kit with the LCD screen directly (https://www.ti.com/store/ti/en/p/product/?p=TMDSEVM572X), albeit for a bit more than the X15 used to cost and quite a lot more than the new AI costs.


I think that it probably doesn't make sense to compare this to a general-purpose single board computer. It's apples and oranges.

The BeagleBone AI is aimed at prototyping in industrial automation applications. I've never worked in that area, but I wouldn't be at all surprised if large amounts of RAM aren't a priority for industrial controllers. Probably the software tends to be frugal with memory, because a bigger heap means more cache misses, and more cache misses mean worse latency.

A Raspberry Pi, by contrast, is mostly targeted at running a GUI and memory-hungry user applications up to and including Minecraft. It's meant for teaching kids to program and hobby stuff. It doesn't have built-in DSPs and programmable real-time units, because those are for supporting applications that fall far outside its intended purpose of having fun with Python.


That reasoning is just plain wrong. The memory usage has almost nothing to do with the field where it will be used. It will depend on the size of the model being run on the board.

The size of that model will be determined by the number of weights used. Since industrial automation will likely use CV, that means the potential for a lot of weights.

The same goes for any other non-trivial use case.

So yeah, 1GB is paltry.


It's been a long time since I've done anything in machine vision, but, at least back in the day, what I was seeing was that, compared to other uses for machine vision, industrial applications tended to stay a lot simpler: Lower-resolution images, black-and-white imaging, support vector machines instead of neural nets (let alone deep learning), all that good stuff. They could get away with it because they are able to much more carefully control the input domain - controlled lighting conditions, consistent orientation of the thing being imaged, all that good stuff. So they don't need 10^9 or more weights' worth of ocean-boiling complexity the way you would in something like self-driving cars or impressing your friends with your Imagenet performance.

And if you can get away with running an SVM on a 1-megapixel black and white image, then your weights will fit in 1GB with an order of magnitude to spare.
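
Back-of-envelope on that (my arithmetic, assuming float32 weights):

    # A linear SVM over raw 1 MP grayscale pixels has one weight per pixel.
    pixels = 1_000_000
    bytes_per_float = 4
    print(pixels * bytes_per_float / 2**20, "MiB")  # ~3.8 MiB, nowhere near 1GB

    # A kernel SVM stores support vectors instead; even 100 full-resolution
    # support vectors come to only ~381 MiB, still inside the 1GB budget.
    print(100 * pixels * bytes_per_float / 2**20, "MiB")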


Ok, what you said about lower-res images makes sense. Lower variation in images maybe means you could get away with fewer weights / more quantization; you could afford to lose more information in the model. Maybe 1GB can be sufficient then.

There's no reason to use an SVM over a (C)NN nowadays though.


Sure there is. With an SVM, you can pick different kernels to more carefully engineer specific behaviors, what kinds of errors your model is likely to make, etc. You can get a good, stable model on less training data, which is great when your training data is expensive to produce. (A situation that I'm guessing is not at all uncommon in industrial automation.) You get all that crispy crunchy large margin goodness. Stuff like that.

I'd absolutely focus on ANNs if I were an academic researcher, because that's the hot flavor of the month that's going to get your career the attention it needs to bring in funding, jobs, etc. I'd also pick it for Kaggle-type stuff, where there's effectively no real penalty for generalizing poorly. Bonus points if you consume more energy to train your model than Calgary does to stay warm in the winter.

In a business setting, though, I would only default to ANNs if it were holistically the best option for the problem domain. By "holistically" I mean, "there's more to it than chasing F1 scores at all costs." The business considerations that caused Netflix to never try productionizing the prize-winning recommendation engine, for example, are always worth thinking about. Personally, I'm disinclined to look past linear models - not even as far as kernel methods - without strong reason to believe that I'm dealing with a curve that can't be straightened with a simple feature transformation. Complexity is expensive, and needless complexity is a form of technical debt.


> You can get a good, stable model on less training data, which is great when your training data is expensive to produce

Huh? SVMs don't perform better than NNs on less training data.

I'm sorry, but the rest of what you said is out of date and wrong. CNNs work better than SVMs for CV tasks. There's no reason to use SVMs anymore for CV, and no one in their right mind does.


The difference is the market size... you're comparing a niche board to a mass-market one.

That is never going to be a fair comparison.


For future reference, a Raspberry Pi 4 Model B with 4GB of RAM is being sold for around $60.


That's fair; I'm just a little confused because the BeagleBone AI board isn't even trying to compete with the <$100 boards. I know it's meant for very specific industrial uses, but depending on your models and how you have things set up, 1GB of RAM isn't very much.

If these are meant to be nodes in a larger system, I guess that makes a little more sense, but if they're meant to be more autonomous, that 1GB is going to be a real limitation for certain applications.


What I saw when I was looking (a few years ago) is that the BeagleBone is oriented towards the industrial market, as a dev board for TI industrial processors. Totally different from the hobbyist market.


Agree.

I expect a $100 board to have 2 or 4 GB.

Yet once upon a time I thought a 48K Apple II was a lot.


How does this compare to Nvidia's Jetson Nano (https://developer.nvidia.com/embedded/jetson-nano-developer-... for $99), which is cheaper and appears to be more powerful?

I used the BB in previous projects. One thing that definitely stands out for the BB is that it can be used as a product directly, with a case and some certification (EMC, etc.). Nvidia's Nano is more of a development platform.

The BeagleBoard actually predates the RPi; coming after the Arduino, the BB is arguably the very first board running a 32-bit ARM that is also open source, cheap, and small. However, it's been overshadowed by the RPi in recent years.


Yeah, the BeagleBoard, and more generally the TI AM335x, is getting old. Single core, DDR3-800, no PCIe, no secure boot, and other bizarre limitations. It's nice that they're putting something newer out.


Dual Cortex-A15, 2 DSPs, 4 Vision Engines, 4 Real-time controllers (PRUs), 2 Cortex-M4s, 2D accelerators, dual 3D GPUs...

It's impressive, but being a pretty domain-specific chip, can anyone make use of its capabilities at the hobbyist level the BeagleBone is targeting?


Having the PRUs as supporting microcontrollers can be great where timing is critical. For example, it's possible to decode signals from an AM receiver and pass them to the host, or monitor sensors and have immediate responses (such as triggering a shut-off via GPIO).

There's nothing particularly difficult about wiring a microcontroller up to a single board computer to do these jobs, I've done exactly that for reading my weather station and heating oil tank level. But it's messy and I think the cohesion of being able to do it directly on one board is a worthwhile advantage.


The PRUs share a section of main memory with the CPU, which is pretty unique.


Description on the main page says:

> Focused on everyday automation in industrial, commercial and home applications.

So looks like they're trying to expand utility beyond hobbyist market.


I think if you're a true hobbyist, the Raspberry Pi is the path of least resistance. I've done projects with both the RPI and the Beaglebone Black.

The RPi is very much aimed at taking some sort of existing project and modifying it, which sounds mean to say but actually lets you do some very powerful things. The average RPi user will never be compiling device tree overlays, screwing around with the bootloader, busting out the datasheet for the processor, etc. You find some project that is kind of like the one you want to do, use their hardware and low-level settings (involving device tree overlays, but so abstracted that you will never encounter that term), and then go to town on the software, probably written in something like Python.

The BeagleBone is more like a dev board for TI's SoC. You will configure every pin exactly as you want it with a device tree overlay. You will be reading the datasheet to figure out exactly how many CPU cycles a certain instruction takes on the PRU. The amount of power you have over IO and hooking it into Linux is infinite. As an example, you can control the clock signal for the entire board through an IO pin. If you are doing realtime processing and want to make something happen in the BeagleBone PRU at the exact same time as some external hardware, you have the power to make that happen. Most people do not need this.

The TLDR is that I would use the RPi for pretty much any "maker" project because it's so easy to get things working, but would use this new Beaglebone for something like a CNC machine. If you're making a CNC machine, you need a microcontroller to stop the motor instantly when it drives into the endstop (even if Linux is currently processing your mouse movement or a network packet), you need a realtime microcontroller to properly move the axes in unison (so you can cut a circle at your exact feedrate), and you need Linux to drive a monitor with the controls on it. This new board puts all the processing power you need on the same board, and has all the kernel hooks you need to communicate between your non-realtime Linux software and the microcontrollers. (It's been a couple years since I've used the PRUs, but they show up as a "remote CPU" under Linux and have APIs for bidirectional message passing, which is potentially more powerful than a serial port interface to an Arduino that does the realtime stuff.)
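
For the curious, the Linux side of that message passing is just a character device once PRU firmware with RPMsg support is loaded. A minimal Python sketch (assuming TI's RPMsg echo example firmware, which exposes /dev/rpmsg_pru30; the device name depends on your firmware):

    # Round-trip a message to PRU0 via the remoteproc/RPMsg character device.
    PRU_DEV = "/dev/rpmsg_pru30"  # created by the echo example firmware

    with open(PRU_DEV, "r+b", buffering=0) as pru:
        pru.write(b"ping")        # lands in the PRU's RPMsg mailbox
        reply = pru.read(512)     # blocks until the PRU answers
        print("PRU replied:", reply)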

You can of course just plug an Arduino into a USB port to get the same effect... this is how pretty much every 3D printer ever works. I literally have this exact setup: an RPi that controls my 3D printer over USB -- the RPi hosts a web interface, the microcontroller handles the realtime motor moves. If you are manufacturing something like this, having everything on one board will probably lower your costs. That is why the BeagleBone exists.
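
That serial setup really is about as simple as it sounds; a rough sketch with pyserial (the port name, baud rate, and Marlin-style firmware are assumptions about your particular printer):

    # Send a G-code command to a USB-connected printer controller.
    import serial  # pyserial

    with serial.Serial("/dev/ttyACM0", 115200, timeout=5) as printer:
        printer.write(b"G28\n")    # home all axes
        print(printer.readline())  # Marlin-style firmware replies "ok"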

(What this has to do with AI? I don't know. Maybe it's for pick-and-place machines that need realtime motor moves and some computer vision.)


Yeah, I think robotics are a big focus with this board. Using the PRU for hard PWM and motor and encoder tasks is incredible, and it's just gonna be even more interesting with double the PRU resources.


>low cost development board yet, and it’s driving a whole new AI revolution.

This press release is a disaster as far as grammar is concerned. I am legitimately unable to tell if it has any special properties regarding AI.

And NO, it came out yesterday, it's not driving any revolutions.


I'm not overly familiar with TI's SOCs post-2010. Anyone out there with a good overview of what the Sitara AM5729 includes besides the bullet points in that piece?

And what about TIDL adoption? I've been working on the Intel/NVIDIA-grade part of the ML scale and have a few ESP32 boards to fiddle with OV2640 cameras, but very little in between except what Broadcom has been doing.


Not a summary (which would be from one perspective or another), but the PDFs cover a lot, and you can word-search down into the details that pique your interest: http://www.ti.com/product/AM5729/technicaldocuments

As for tooling, I couldn't say, but as a previous comment stated, the NVIDIA offering for the same money makes it hard for this board to stick out for many. I'm sure it has a niche, as most boards do; but what that niche is, beyond people already invested in and comfortable with the BeagleBoard environment, I can't see from an initial glance.


Any experiences with using tensorflow models with TI Deep Learning (TIDL)?


I'd be interested in hearing anecdotes as well.

In my experience, software from hardware companies has been so reliably abysmal that it makes the "enterprise software" we SWEs like to complain about look decent by comparison.

Hopefully it's different this time :)


Does anybody actually use TIDL? Sounds more like a "me too" development from TI just so they can say they're in the DL game.


What's so "AI" about it? It doesn't even have a TPU. The Kendryte K210 has a fixed-point TPU, a 400MHz dual-core RISC-V with FPU, an 8-channel audio DSP, FFT and crypto acceleration, and costs $8.90 with WiFi and $7.90 without. And the module is the size of half a postage stamp. It runs TensorFlow Lite (a subset of ops, but good enough to do practical things).


That has only 8MB RAM, and the company seems so fly-by-night they can't be bothered setting up Let's Encrypt.


8MB RAM is more than enough to run a quantized MobileNet, which they demonstrate by preloading object detection on it, out of the box. And the chip is real. I have a couple of boards with it, it works. I guess people just have a hard time believing all the stuff in the spec can be done for less than 10 bucks. 28nm by the way, not a joke. The company got its start in crypto mining, so this is a side gig for them.
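
For scale, the back-of-envelope (my numbers; the ~4.2M parameter count is from the MobileNetV1 paper, not the K210 docs):

    # Does a quantized MobileNet fit in the K210's 8MB of RAM?
    params = 4_200_000    # MobileNetV1 (1.0, 224x224)
    bytes_per_param = 1   # int8 after quantization

    print(params * bytes_per_param / 2**20, "MiB")  # ~4.0 MiB, leaving room
    # for activations and the input frame.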


Can someone tell me if there are easy-to-use libraries that can speed up existing ML code, say written in Python, on this?

Or do we have to write custom C/C++ code to make best use of available hardware?


It looks pretty cool and I think about getting one. Does anyone know if it has a dedicated neural network accelerator?


Can any of the better minds out here compare this with the Jetson/Jetson Nano?


It's much more focused on real-time control (4 PRUs and 2 Cortex-M4s) than on GPGPU processing. It's really for industrial vision applications more than AI, actually.


What are the typical use cases for this type of board?


DSP and vision does not "AI" make.


FWIW, the SGX GPU in it was one of the first to go hog wild on f16.

And if they'd document their ISA, it's pretty amenable to being used for neural networks, way more than the other mobile GPUs at least. It'd be a cold day in hell before they did that, though, unfortunately.


Their ISA isn't even documented?


No. I actually looked at reverse engineering it a few years back. If you pull apart their drivers you can figure it out pretty quickly (there are actually two different ISAs: the main shader cores, and a tiny little RISC-esque core that marshals work for the shader cores).

But their driver/software complexity is super high to even get a triangle in a buffer or run a compute job. They have an RTOS-looking microkernel running on the shader cores, and there's a ton of caching and MMU setup you have to do from the GPU side (not the main app processor). And there are a lot of caching hints and hacks that are hard to work around if you don't know the context (a lot of tables of bug reference numbers, and special-cased code depending on those).

If anyone from Imagination is listening, the open source community would still love your help in supporting these chips. : ) They're really pretty inside, and the world should know about the good work y'all did!


The press release could have used a real-life example, like "Training on the MNIST dataset takes 0.5 seconds" or something.

Where can I find info about how these edge computing boards are speeding up training time? Or how they compare to a 1080 Ti?


As with most of these so-called machine learning boards, this one is not for training, just inference at best.


I don't think these are meant for training, but instead inference.



