Optimizing latency of an Arduino MIDI controller (jonnor.com)
93 points by unkown-unknowns on Aug 28, 2017 | 21 comments



Relevant plug: if you're interested in making ultra low latency (as low as 80us) embedded musical instruments, check out Bela:

- http://bela.io

- http://github.com/belaplatform

- Many of the papers from 2015 onwards feature Bela: http://instrumentslab.org/publications/


There's a nice short article that explains how Bela does this: http://hackaday.com/2016/04/13/bela-real-time-beaglebone-aud...

The pair of PRUs in the BeagleBone Black is a large part of it.


80 microseconds is an insignificant increment of time relative to the attack of a musical note for most musical instruments.

An 80 microsecond period corresponds to 12.5 kHz. That's in the range of the upper harmonics that determine the "crispness" or "air" of the tone.

Loudspeakers and filters will introduce more phase shift than this.

Oh, ... and sound travels a whopping 2.7 centimeters through air in 80 us.
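
For anyone who wants to check the arithmetic (assuming roughly 343 m/s for the speed of sound in air):

    #include <cstdio>

    int main() {
        const double period_s = 80e-6;        // 80 microseconds
        const double speed_of_sound = 343.0;  // m/s in air at ~20 C

        printf("frequency: %.1f kHz\n", 1.0 / period_s / 1000.0);           // ~12.5 kHz
        printf("distance:  %.1f cm\n", speed_of_sound * period_s * 100.0);  // ~2.7 cm
        return 0;
    }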

I don't think any event in music needs to be timed to 80 us.

"Dude, did you pull down the 12.5 kHz band on the 31 band eq again? My hi-hat sounds late!"

"No way man, look: you moved your friggin' stool 27 cm from what it was before, see?"


I am definitely not trying to argue that people can respond musically to events on the order of 80us! Although maybe some augmented future humans will prove that to be the case ;D

But think about it: having latency below (even way below) the threshold of human perception in a digital musical instrument increases the possibility space, in the same way that in digital recording you might use a 192kHz sample rate even though we can't hear that high.

It also means you can add extra components to your system that might add more latency without crossing the perception threshold.

So, to me there are plenty of advantages to having a system capable of this, many of which are still to be explored.


Low latency is useful if many devices are chained together. If you have a chain of ten, 80us becomes 800us: 0.8ms. That is still very good.

The original MIDI was designed for (reasonable) serial chaining; many devices have a MIDI IN and OUT port (and some have a THRU).

In spite of this, the protocol runs at only 31250 bps. It takes 10 bits (8N1) to encode one byte, and it takes something like 3 bytes to encode a "note on" message (for instance). The message is consequently 960 us wide: almost 1 ms!

So with no chaining of anything, just connecting a MIDI source (like a keyboard) to a synthesizer with a MIDI cable, we have a 1 ms minimum delay to turn on a note caused by the sheer duration of the message on the wire.
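
For reference, a rough sketch of that wire-time arithmetic (assuming a plain 3-byte Note On: status byte, note number, velocity):

    #include <cstdio>

    int main() {
        const double baud = 31250.0;   // classic DIN MIDI bit rate
        const int bits_per_byte = 10;  // 8N1: 1 start + 8 data + 1 stop bit
        const int message_bytes = 3;   // e.g. Note On: status, note, velocity

        double message_us = message_bytes * bits_per_byte / baud * 1e6;
        printf("Note On time on the wire: %.0f us\n", message_us);  // 960 us
        return 0;
    }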

192 kHz sample rate for storage and transmission of audio is complete, utter bunk.

For sampling, oversampling is useful because it's easier and cheaper to make a fast ADC, and couple it with a cheaper, simpler analog filter. If you want to sample at 44.1 kHz or even 48 kHz, and capture a decent range of the audio spectrum without aliasing, you need a very steep "brick wall" filter at the Nyquist frequency. But if you sample at 192 kHz (with an aim to capturing the same spectrum), the filter doesn't need to be that steep. You still roll off past 20 kHz, but less aggressively. Not only is that simpler and cheaper, but the filter can be designed with better properties in regard to phase shift and group delay, and flatter response near the threshold. Of course, the idea is then to immediately reduce the data from the sampler to a lower rate. It's like moving much of the filter into the digital domain.
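
A toy sketch of the oversample-then-decimate idea; the 8-tap moving average below is only a crude stand-in for the long FIR/polyphase filter a real converter would use:

    #include <cstddef>
    #include <vector>

    // Reduce a 192 kHz stream to 48 kHz: low-pass in the digital domain,
    // then keep only every 4th (filtered) sample.
    std::vector<float> decimate_by_4(const std::vector<float>& in_192k) {
        std::vector<float> out_48k;
        for (std::size_t i = 0; i + 8 <= in_192k.size(); i += 4) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < 8; ++k)
                acc += in_192k[i + k];
            out_48k.push_back(acc / 8.0f);  // crude anti-alias low-pass
        }
        return out_48k;
    }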


Very interesting; are you involved with this project?


Yes, the team is made up of people from the Augmented Instruments Lab http://instrumentslab.org


This is really amazing!


The publication below [1] compares the latency of the Teensy, Arduino Uno, xOSC, Bela and Raspberry Pi. One of the findings is that serial over USB is slower than MIDI over USB, even though the two are technically very similar. The Axoloti [2] is not included in the publication but is also of interest when building low latency audio devices.

[1] http://www.eecs.qmul.ac.uk/~andrewm/mcpherson_nime2016.pdf

[2] http://www.axoloti.com/


The Axoloti is a superlative design for audio, at both the software and hardware layers. The Arduino, not so much.

The article's author doesn't mention whether they've also abandoned the Arduino MIDI libs and written their own. There's probably some latency upstream that can be reduced as well.


The Axoloti looks like the device I dreamt of creating many years ago when I was very into music instruments and just got into embedded hardware/software.


Surprised to be on HN! Open for questions if anyone has got any.


Are you polling the sensors, or using interrupts? I don't see how going from one sensor to eight increases latency.


Polling. The atmega32u4 only has 4-5 external interrupts, and our instrument has 8 pads. The CapacitiveSensor Arduino library used does this sequentially, busy-looping for each pin. It would be possible to rewrite this to do all 8 pins in parallel. Right now the readout for the different pads is sampled at slightly different times, which works but is non-ideal. A more modern uC can have capacitive sensing peripherals, like the Teensy 3.0.
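
A rough sketch of what a parallel scan could look like, assuming all 8 receive pins happen to sit on the same AVR port and share one send line through per-pad resistors (the pin choices below are invented for illustration, and the pin direction setup is assumed to happen elsewhere):

    #include <Arduino.h>

    uint16_t riseTime[8];  // loop iterations until each pad read high

    void senseAllPads() {
        for (uint8_t i = 0; i < 8; ++i) riseTime[i] = 4000;  // default: timed out
        PORTD |= _BV(0);                    // drive the shared send line high
        uint8_t pending = 0xFF;             // bit set = pad not yet charged
        for (uint16_t t = 0; pending && t < 4000; ++t) {
            uint8_t rose = pending & PINB;  // read all 8 receive pins at once
            for (uint8_t i = 0; i < 8; ++i)
                if (rose & _BV(i)) riseTime[i] = t;
            pending &= ~rose;
        }
        PORTD &= ~_BV(0);                   // discharge for the next scan
    }

Whether that is worth the extra complexity depends on how noisy a single charge-up measurement is in practice.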


IIRC, on a 328 you can turn on a pin change interrupt for an entire port (8 pins) at once. Once the interrupt fires, you detect which pin has changed.

This doesn't work for the 32u4?


Looking at the datasheet for the 32u4, port B does have a pin change interrupt, so one could maybe use that. Depending on which pins are exposed on the Leonardo board, it might need to be combined with external interrupts on other pins to get a full 8. A challenge is that the capacitive sensing works by measuring how long the pin takes to charge. For small capacitances (depending on your sensor pads) and resistor values, this is on the order of tens of clock cycles, which can be challenging to measure with a timer. Several such measurements are summed up to suppress noise. However, for bigger sensors with higher capacitance, like for distance sensing, interrupt-based code makes more sense because the uC would actually be waiting for a significant time.
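
Something along those lines, purely as a sketch; it assumes all 8 pads end up on port B (PCINT0..7) and that Timer1 is left free-running as a time base, neither of which is a given on a stock Leonardo:

    #include <Arduino.h>
    #include <avr/interrupt.h>

    volatile uint8_t lastPortB;
    volatile uint16_t changeTime[8];  // Timer1 count when each pin last changed

    void setupPinChange() {
        lastPortB = PINB;
        PCMSK0 = 0xFF;        // watch all 8 port B pins
        PCICR |= _BV(PCIE0);  // enable pin-change interrupt group 0 (port B)
    }

    ISR(PCINT0_vect) {
        uint8_t now = PINB;
        uint8_t changed = now ^ lastPortB;
        uint16_t t = TCNT1;   // timestamp from the 16-bit Timer1 counter
        for (uint8_t i = 0; i < 8; ++i)
            if (changed & _BV(i)) changeTime[i] = t;
        lastPortB = now;
    }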


Not directly related to the OP, but since it's a popular setup: the best you can hope for with a USB 2.0 FTDI or USB serial converter is USB fullspeed frame rate, or 1 kHz. So 1 millisecond in one direction.


What are the other options?


> My first idea was to use a high-speed camera, using the video image to determine when pad is hit and the audio to detect when sound comes from the computer. However even at 120 FPS, which some modern cameras/smartphones can do, there is 8.33 ms per frame. So to find when pad was hit with higher accuracy (1ms) would require using multiple frames and interpolating the motion between them.

I wonder how accurate you could get if you hit the pad with the phone and used the phone's accelerometer to figure out when the impact occurred?


Sample rates of accelerometers are usually around 100Hz, so 10ms between each sample. Some phones might go as high as 250Hz, which might start to be usable. One challenge when using different sensors is establishing a joint timeline precisely. You might need to synchronize them with an event observed by both at the same time, like the 'clapper' used in filmmaking.
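
The interval arithmetic for those rates, next to the 1 ms target:

    #include <cstdio>

    int main() {
        const double rates_hz[] = {100.0, 250.0, 120.0};
        const char* labels[] = {"100 Hz accelerometer", "250 Hz accelerometer",
                                "120 FPS camera"};
        for (int i = 0; i < 3; ++i)  // 10.00, 4.00 and 8.33 ms respectively
            printf("%s: %.2f ms between samples\n", labels[i], 1000.0 / rates_hz[i]);
        return 0;
    }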


When I read the title of the article I wondered "did they just discover ASIO4ALL?"

Yes. Yes, they did.

- http://asio4all.com/



