
> As I've been really interested in computer vision lately, I decided on writing a SIMD-accelerated implementation of the FAST feature detector for the ESP32-S3 [...]

> In the end, I was able to improve the throughput of the FAST feature detector by about 220%, from 5.1MP/s to 11.2MP/s in my testing. This is well within the acceptable range of performance for realtime computer vision tasks, enabling the ESP32-S3 to easily process a 30fps VGA stream.
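A quick back-of-the-envelope check of the quoted figures, in Python (the helper names are just for illustration). A 640x480 stream at 30 fps needs about 9.2 MP/s. Going from 5.1 to 11.2 MP/s is roughly a 2.2x speedup, which is about a 120% increase, or 220% of the original throughput.

```python
# Back-of-the-envelope check of the throughput figures quoted above.
# Helper names are illustrative, not from the article.

def required_mpps(width: int, height: int, fps: float) -> float:
    """Megapixels per second needed to process width x height frames at fps."""
    return width * height * fps / 1e6


def max_fps(mpps: float, width: int, height: int) -> float:
    """Frames per second achievable at a given throughput in MP/s."""
    return mpps * 1e6 / (width * height)


before, after = 5.1, 11.2              # MP/s, as quoted
vga30 = required_mpps(640, 480, 30)    # 640x480 at 30 fps

print(f"VGA@30fps needs {vga30:.3f} MP/s")
print(f"speedup {after / before:.2f}x, i.e. +{(after / before - 1) * 100:.0f}%")
print(f"max VGA fps: before {max_fps(before, 640, 480):.1f}, "
      f"after {max_fps(after, 640, 480):.1f}")
```

At the old rate a VGA stream tops out around 16-17 fps; at the new rate it clears 30 fps with some headroom.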

What are some use cases for FAST?

Features from accelerated segment test: https://en.wikipedia.org/wiki/Features_from_accelerated_segm...

Is there TPU-like functionality in anything in this price range of chips yet?

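NEON is an optional SIMD instruction set extension for ARMv7 and ARMv8, so the Raspberry Pi 2 and later, including the Pi Zero 2 W, have SIMD extensions (the original Pi Zero and Pi 1 use an ARMv6 core without NEON).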

The Jetson Orin Nano has 40 TOPS, which is sufficient for Copilot+ AFAIU. "A PCIe Coral TPU Finally Works on Raspberry Pi 5" https://news.ycombinator.com/item?id=38310063

From https://phys.org/news/2024-06-infrared-visible-device-2d-mat... :

> Using this method, they were able to up-convert infrared light of wavelength around 1550 nm to 622 nm visible light. The output light wave can be detected using traditional silicon-based cameras.

> "This process is coherent—the properties of the input beam are preserved at the output. This means that if one imprints a particular pattern in the input infrared frequency, it automatically gets transferred to the new output frequency," explains Varun Raghunathan, Associate Professor in the Department of Electrical Communication Engineering (ECE) and corresponding author of the study published in Laser & Photonics Reviews.

"Show HN: PicoVGA Library – VGA/TV Display on Raspberry Pi Pico" https://news.ycombinator.com/item?id=35117847#35120403 https://news.ycombinator.com/item?id=40275530

"Designing a SIMD Algorithm from Scratch" https://news.ycombinator.com/item?id=38450374




Thanks for reading!

> What are some use cases for FAST?

The FAST feature detector is an algorithm for finding regions of an image that are visually distinctive, which can be used as a first step in motion tracking and SLAM (simultaneous localization and mapping) algorithms typically seen in XR, robotics, etc.
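A minimal scalar sketch of the FAST segment test in Python, just to show what the detector checks at each pixel; it is not the SIMD implementation from the article. The threshold t=20, the arc length n=9 (FAST-9) and the helper names are assumptions for illustration, and non-maximum suppression and the usual high-speed pre-test are omitted.

```python
# Scalar sketch of the FAST segment test (illustrative, not the article's code).
# A pixel p is a corner if at least n contiguous pixels on the 16-pixel
# Bresenham circle of radius 3 are all brighter than I(p) + t
# or all darker than I(p) - t.

CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, t=20, n=9):
    """img is a list of rows of grayscale ints; (x, y) must be >= 3 px from the border."""
    p = img[y][x]
    # Classify each circle pixel: +1 brighter, -1 darker, 0 similar.
    states = []
    for dx, dy in CIRCLE:
        v = img[y + dy][x + dx]
        states.append(1 if v > p + t else -1 if v < p - t else 0)
    # Look for n consecutive equal non-zero states; appending the first n-1
    # states handles runs that wrap around the end of the circle.
    for target in (1, -1):
        run = 0
        for s in states + states[:n - 1]:
            run = run + 1 if s == target else 0
            if run >= n:
                return True
    return False


def detect(img, t=20, n=9):
    """Return (x, y) for every pixel that passes the segment test."""
    h, w = len(img), len(img[0])
    return [(x, y) for y in range(3, h - 3) for x in range(3, w - 3)
            if is_fast_corner(img, x, y, t, n)]
```

The per-pixel work is a fixed pattern of 16 loads and two threshold comparisons, which is one reason the test lends itself to SIMD: many candidate pixels can be compared against the same circle offsets at once.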

> Is there TPU-like functionality in anything in this price range of chips yet?

I think that in the case of the ESP32-S3, its SIMD instructions are designed to accelerate the inference of quantized AI models (see: https://github.com/espressif/esp-dl), as well as some signal processing like FFTs. I guess you could call the SIMD instructions TPU-like, in the sense that the chip has specific instructions that facilitate ML inference (EE.VRELU.Sx performs the ReLU operation). Using these instructions still takes CPU time, whereas TPUs are typically their own processing core, operating asynchronously. I’d say this is closer to ARM NEON.
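To make the "accelerate quantized inference" point concrete, here is a scalar Python sketch of the int8 dot product with a wide accumulator that dominates quantized layers. It walks the data in 16-element chunks only to mirror a 128-bit vector register holding 16 int8 lanes. The chunking, names and ReLU helper are illustrative assumptions, not esp-dl code, and the ReLU is plain max(0, x), not a model of EE.VRELU.Sx's exact semantics.

```python
# Scalar sketch of an int8 dot product, chunked to mirror 16 int8 lanes
# in one 128-bit SIMD register (illustrative, not esp-dl code).

LANES = 16  # 128 bits / 8 bits per int8 lane


def dot_i8(a, b):
    """Dot product of two int8 sequences with a wide (int32-style) accumulator."""
    if len(a) != len(b) or len(a) % LANES:
        raise ValueError("inputs must be equal length and a multiple of LANES")
    for v in (*a, *b):
        if not -128 <= v <= 127:
            raise ValueError("values must fit in int8")
    acc = 0
    for i in range(0, len(a), LANES):
        # One "vector step": 16 lane-wise multiplies, then a horizontal add.
        acc += sum(x * y for x, y in zip(a[i:i + LANES], b[i:i + LANES]))
    return acc


def relu_i8(v):
    """Plain lane-wise ReLU on quantized values: negatives become 0."""
    return [max(0, x) for x in v]


print(dot_i8([1, -2] * 8, [3, 4] * 8))
print(relu_i8([-5, 0, 7]))
```

On a SIMD core each chunk would be a few vector instructions instead of 16 scalar multiply-adds, but it still runs on the CPU, which is the distinction drawn above between SIMD extensions and a separate accelerator.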


SimSIMD https://github.com/ashvardanian/SimSIMD :

> Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, C, and Swift, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE
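For a sense of what one of these kernels computes, here is a plain-Python sketch of Hamming distance between binary vectors packed into 64-bit words, using the classic SWAR ("SIMD within a register") popcount. It is not SimSIMD's API or implementation; the names and the 64-bit packing are assumptions.

```python
# Hamming distance between binary vectors packed into 64-bit words,
# with a SWAR popcount (illustrative, not SimSIMD's implementation).

MASK64 = (1 << 64) - 1


def popcount64(x: int) -> int:
    """Count set bits in a 64-bit word by summing bits in parallel within the word."""
    x &= MASK64
    x = x - ((x >> 1) & 0x5555555555555555)                         # 2-bit sums
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)  # 4-bit sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F                         # 8-bit sums
    return ((x * 0x0101010101010101) & MASK64) >> 56                # add all bytes


def hamming(a_words, b_words) -> int:
    """Number of differing bits between two equal-length lists of 64-bit words."""
    if len(a_words) != len(b_words):
        raise ValueError("vectors must have the same number of words")
    return sum(popcount64(a ^ b) for a, b in zip(a_words, b_words))


print(hamming([0b1010, 0], [0b0110, MASK64]))
```

Libraries like the one quoted replace this inner loop with hardware popcount and wide vector registers, but the computation is the same XOR-then-count.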

github.com/topics/simd: https://github.com/topics/simd

https://news.ycombinator.com/item?id=37805810#37808036


From gh-topics/SIMD:

SIMDe: SIMD everywhere: https://github.com/simd-everywhere/simde :

> The SIMDe header-only library provides fast, portable implementations of SIMD intrinsics on hardware which doesn't natively support them, such as calling SSE functions on ARM. There is no performance penalty if the hardware supports the native implementation (e.g., SSE/AVX runs at full speed on x86, NEON on ARM, etc.).

> This makes porting code to other architectures much easier in a few key ways:


> The FAST feature detector is an algorithm for finding regions of an image that are visually distinctive, …

Is that related to ‘Energy Function’ in any way?

(I ask because a long time ago I was involved in an Automated Numberplate Reading startup that was using an FPGA to quickly find the vehicle numberplate in an image)


What you are thinking of operates at a different level of abstraction. Energy functions are a general way of structuring a problem, used (sometimes abused) to apply an optimization algorithm to find a reasonable solution for it.

FAST is an algorithm for efficiently looking for "interesting" parts (basically, corners) of an image, so you can safely (in theory) ignore the rest of it. The output from a feature detector may end up contributing to an energy function later, directly or indirectly.
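As one common example of a feature detector's output ending up in an energy function: SLAM and structure-from-motion back ends often minimize a reprojection error over camera poses T_i and 3D points X_j. Here x_ij is the image location where point j was observed in frame i (e.g. a matched FAST corner), π projects a 3D point into a camera, O is the set of observations, and ρ is an optional robust loss. The notation is illustrative, not taken from the thread.

```latex
E\bigl(\{T_i\}, \{X_j\}\bigr) = \sum_{(i,j) \in \mathcal{O}}
  \rho\!\left( \left\lVert \mathbf{x}_{ij} - \pi\!\left(T_i, X_j\right) \right\rVert^2 \right)
```

An optimizer (e.g. Gauss-Newton or Levenberg-Marquardt over a factor graph) then searches for the poses and points that make this energy small.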


Interested in doing more of this type of work, optimizing a SLAM/factor graph pipeline?

My email is in my bio; I would love to chat!


> Is there TPU-like functionality in anything in this price range of chips yet?

Kendryte K210 supports 1x1 and 3x3 convolutions on the "TPU". It was pretty well supported in terms of software & documentation but sadly it hasn't become popular.
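For reference, the arithmetic behind those two operations, as a plain-Python sketch: a single-channel 3x3 convolution (stride 1, no padding) and a 1x1 convolution that mixes input channels per pixel. This only shows what such an accelerator offloads; it is not the K210 KPU's API, and, as in most CNN frameworks, the 3x3 kernel is applied without flipping (cross-correlation).

```python
# Reference 3x3 and 1x1 convolutions (illustrative, not the KPU's API).


def conv3x3(img, k):
    """Single-channel 3x3 convolution, stride 1, 'valid' padding (no kernel flip)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h - 2):
        row = []
        for x in range(w - 2):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += img[y + ky][x + kx] * k[ky][kx]
            row.append(acc)
        out.append(row)
    return out


def conv1x1(channels, weights):
    """1x1 convolution: per-pixel weighted sum across input channels (one output channel)."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[sum(wc * ch[y][x] for wc, ch in zip(weights, channels))
             for x in range(w)] for y in range(h)]


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(conv3x3(img, [[1, 1, 1]] * 3))
print(conv1x1([[[1, 2]], [[3, 4]]], [2, -1]))
```

Larger kernels can be built from stacks of 3x3 layers, which is one reason hardware often fixes on these two sizes.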

These days, you can easily find cheap RV1103 ("LuckFox"), BL808 ("Ox64/Pine64") and CV1800B/SG2002 ("MilkV") based dev boards, all of which have some sort of basic TPU. Unfortunately, they are designed as Linux boards, meaning that all TPU-related stuff is heavily abstracted, with zero under-the-hood documentation. So it's absolutely unclear whether their TPUs are real or faked with clever code optimizations.


> These days, you can easily find cheap RV1103 ("LuckFox"), BL808 ("Ox64/Pine64") and CV1800B/SG2002 ("MilkV") based dev boards, all of which have some sort of basic TPU. Unfortunately, they are designed as Linux boards, meaning that all TPU-related stuff is heavily abstracted, with zero under-the-hood documentation. So it's absolutely unclear whether their TPUs are real or faked with clever code optimizations.

They all have a TPU in hardware; my team has been verifying and benchmarking them. Documentation is only available for the high-level C APIs of the libraries that a programmer is expected to use, and even that tends to be extremely lacking.



