Hacker News
Ask HN: Where are the good resources for learning audio processing?
164 points by SamCoding 3 months ago | 50 comments
I'm trying to program a harmonizer, like Jacob Collier's, built by MIT's Ben Bloomberg. I'm looking for good, accessible resources on pitch shifting (whilst still sounding natural) and other techniques I've heard of, like formant shifting.

Where are some good resources for this for somebody with extensive programming experience but no experience in audio processing?




Hello :) all of the resources mentioned here are great! One step I’d add to the learning part (and it’s what we did when building Jacob’s) is to spend a lot of time trying out existing implementations to determine what you like and don’t like.

For example, many of them don’t have great low end. Some are “sluggish” and need external enveloping. Getting a sense for what’s out there can help to provide a North Star when you write your own. Some classics are the Eventide H3000, IZotope Vocal Synth, TC Voice Live, Antares Harmony Engine, and Soundtoys Little Alterboy.


https://ccrma.stanford.edu/~jos/

Not much to say that Julius doesn't... open course materials for (almost) everything you might need in audio processing.


Second anything from CCRMA, the inventors of FM synthesis and still one of the top programs in the country/world.


Thirded! CCRMA and its people are awesome (way more so than the rest of Stanford.)


This comment helped get me over the barrier to take a closer look...


I've used it as a reference, or to gain a different perspective on something I'm familiar with, but it's generally way too terse to learn from.

Ex: https://ccrma.stanford.edu/~jos/mdft/Bessel_Functions.html


That doesn't look terse to me, though it does require familiarity with the subject.

"The last expression can be interpreted as the Fourier superposition of the sinusoidal harmonics of [expression], i.e., an inverse Fourier series sum. In other words, [expression] is the amplitude of the k-th harmonic in the Fourier-series expansion of the periodic signal x_m(t)."

Many of the concepts are hyperlinked for reference. With the required familiarity, I would much rather read this than something that took seven pages to get to the point - say by assuming that the reader is unfamiliar with a premise out of an abundance of caution.
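The bracketed expressions in the quote above didn't survive the copy-paste, but the identity being paraphrased is presumably the Jacobi–Anger expansion that underlies FM spectrum analysis (this is my reconstruction, not a quote from the page):

```latex
e^{j\beta\sin(\omega_m t)} \;=\; \sum_{k=-\infty}^{\infty} J_k(\beta)\, e^{jk\omega_m t}
```

i.e., the Bessel function $J_k(\beta)$ is the amplitude of the $k$-th harmonic in the Fourier-series expansion of the periodic signal.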


Julius is a national treasure. I learned immensely from his class and textbooks...


Great link, thanks :)


Hey - one of the industry-standard time-stretching libraries is "elastique" by Zynaptiq (licensed, not open source). It's used by Ableton, FL Studio, etc.

If you want to peek into some source code, you can look at the Rubber Band library:

https://breakfastquay.com/rubberband/

Rubber Band is one of the time-stretching/pitch-shifting algorithms used in Reaper. You can download the Reaper trial and listen to the results with different parameters, to see which tweaks give results you're happy with:

https://www.reaper.fm/
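For intuition, the classic decomposition these libraries build on is: pitch shift = time-stretch, then resample back to the original duration. A rough Python sketch of the idea (the `stretch_fn` callback and the linear-interpolation resampler here are placeholders of mine, not Rubber Band's actual algorithm):

```python
def pitch_shift_via_stretch(x, semitones, stretch_fn):
    """Pitch shift = time-stretch by `ratio`, then resample back to
    the original length. `stretch_fn(x, ratio)` stands in for a real
    stretcher such as Rubber Band; the resampler here is naive
    linear interpolation."""
    ratio = 2.0 ** (semitones / 12.0)   # +12 semitones -> ratio 2.0
    stretched = stretch_fn(x, ratio)    # should return ~ratio * len(x) samples
    out = []
    for n in range(len(x)):
        pos = n * len(stretched) / len(x)   # read position in the stretched signal
        i = int(pos)
        frac = pos - i
        a = stretched[min(i, len(stretched) - 1)]
        b = stretched[min(i + 1, len(stretched) - 1)]
        out.append(a * (1.0 - frac) + b * frac)
    return out
```

The quality of the result lives almost entirely in the stretcher; the resampling step is the easy half.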


>Hey - one of the industry standard time stretching library is "elastique" by Zynaptiq.

The company is Zplane, not Zynaptiq. Easy mistake; the names overlap a little.

https://licensing.zplane.de/


I mix these 2 up constantly, thanks! :)


Learn how to search and read the research literature; most of the rest is just dabbling in the shallows.

MTG Barcelona has been doing R&D for Yamaha since the 90s. They have published a lot of work on time-frequency transformation and have certainly implemented harmonizers and time stretchers. Look for papers and theses by Jordi Bonada, Alex Loscos, and certainly others too: https://www.upf.edu/web/mtg/research/publications

Needless to say, pitch shifting is nothing new, so going back to research publications from the 90s may help. Publications might be found in early conferences of DAFx, ICMC, IEEE Mohonk (WASPAA), ACM Multimedia, JAES, etc. Try keywords like "waveform similarity overlap add" (WSOLA) and "Lent's algorithm".
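To make the WSOLA keyword concrete: plain overlap-add (OLA) time stretching reads windowed grains at one hop size and writes them at another; WSOLA adds a cross-correlation search so adjacent grains line up in phase. A minimal OLA-only sketch in Python (function and parameter names are mine, not from any paper):

```python
import math

def ola_stretch(x, speed, frame=1024, synth_hop=256):
    """Naive overlap-add time stretch. speed > 1.0 plays faster
    (shorter output), speed < 1.0 slower. No waveform-similarity
    search, so expect some warbling on pitched material."""
    ana_hop = int(round(synth_hop * speed))  # read hop vs. fixed write hop
    # Hann window so overlapping grains cross-fade smoothly.
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    out_len = int(len(x) / speed) + frame
    out = [0.0] * out_len
    norm = [1e-12] * out_len  # running window sum, for normalisation
    t_in = t_out = 0
    while t_in + frame <= len(x) and t_out + frame <= out_len:
        for n in range(frame):
            out[t_out + n] += win[n] * x[t_in + n]
            norm[t_out + n] += win[n]
        t_in += ana_hop
        t_out += synth_hop
    return [o / w for o, w in zip(out, norm)]
```

WSOLA's extra step is to search a small range around each `t_in` for the offset that best correlates with the grain already written, which is what removes most of the garbling this naive version produces.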

The musicdsp mailing list has discussed pitch shifting many times over the years. Participants have included engineers with fairly intimate familiarity with algorithms employed by Eventide, z-plane, etc. I would search the archives (you may need to do some digging to find all of the archives dating back to the late 90s).

Maybe look at expired patents from Creative Labs, Eventide and Antares if you feel comfortable exposing yourself to that literature.


Professor Puckette, inventor of Max and Pure Data (the two top visual programming languages for DSP), has a book, The Theory and Technique of Electronic Music, with interactive examples written in Pd; it probably has an example exercise for a pitch shifter [0]

I also often recommend Music and Computers, originally out of Columbia. [1]

[0] http://msp.ucsd.edu/techniques.htm

[1] https://musicandcomputersbook.com/


Very cool. I've used PD a lot in the past, but I didn't know about his book!


he gives awesome talks as well!


I find [1] a good reference. A con is that the examples are in MATLAB, but between the text and the MATLAB code it's clear enough to write your own implementation.

Also [2] is a decent book for overall dsp concepts.

[1] DAFX - Digital Audio Effects (Second Edition) Edited by Udo Zölzer https://dafx.de/DAFX_Book_Page_2nd_edition/index.html

[2] Understanding Digital Signal Processing, Richard Lyons


Lyons is a good intro but maybe a bit handwavy at times (although my copy is an edition from the 90s).

Consider backing it up with one of the standard textbooks, like Oppenheim (the classic) or Manolakis (one that I think I remember liking).


Zolzer’s book is the best out there on the topic that I know of.


Real-time audio programming 101: time waits for nothing http://www.rossbencina.com/code/real-time-audio-programming-...

C++ for Real-Time Audio Programming: https://learn.bela.io/tutorials/c-plus-plus-for-real-time-au...


You should learn SuperCollider, starting with Eli Fieldsteel's tutorials.


Pretty good Audio Developer Conference talk on it here: https://www.youtube.com/watch?v=fJUmmcGKZMI


Also check out the blog (https://signalsmith-audio.co.uk/writing/2023/stretch-design/) and corresponding library on GitHub.


Great channel on fundamentals of digital audio: https://www.youtube.com/@akashmurthy


When I wanted to make a Python application to separate a song into its source instruments, I used this: https://www.coursera.org/learn/audio-signal-processing. I studied signal processing as a Computer Engineering student but didn't really get it at the time; with that course I understood what I could do in practice.


It's not going to directly teach you how to build a harmonizer, but this guy has a series of incredible videos on audio processing that might be helpful: https://www.youtube.com/playlist?list=PL-wATfeyAMNoirN4idjev...


Great Discord where many audio programmers hang out; they may be able to answer your specific questions when you get into the more detailed areas. https://www.theaudioprogrammer.com/discord

A blog post about an open source C++ pitch shifting library: https://signalsmith-audio.co.uk/writing/2023/stretch-design/

And accompanying ADC talk: https://www.youtube.com/watch?v=fJUmmcGKZMI



Audio is half art, half science. That's why I'd try to find someone with experience.

Back in university, I attended lectures on the FFT and its applications to audio signal processing, so open-access university courses would be the second place I'd look. The approach I always try first is to ask people I know whether they can recommend a conference/meetup. For example, the annual JUCE events appear to be chock-full of VST plugin developers. There are also private schools like SAE where you (or your employer) can pay for an hour with one of their lecturers to ask questions.



Check out officehours.global – a lot of audio people hang out there.


The Will Pirkle books have a lot of good info and code to get you started:

https://www.willpirkle.com

Audio programming is a lot of fun, but it's the most challenging domain I've ever worked in. You have to be very careful about what you do on the audio thread: no locks, no memory allocation, etc. Messing this up can result in some really ugly audio artifacts.
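The standard way to satisfy "no locks, no allocation" is to preallocate everything up front and pass data between threads through a single-producer/single-consumer ring buffer. A sketch of the idea in Python (a real-time version would be C++ with `std::atomic` read/write indices; class and method names here are mine):

```python
class SPSCRing:
    """Single-producer/single-consumer ring buffer: all memory is
    allocated up front, so push/pop never allocate or lock. One slot
    is left empty so a full buffer is distinguishable from an empty one."""
    def __init__(self, capacity):
        self.buf = [0.0] * capacity   # preallocated, never resized
        self.cap = capacity
        self.read = 0    # only the consumer advances this
        self.write = 0   # only the producer advances this

    def push(self, x):
        nxt = (self.write + 1) % self.cap
        if nxt == self.read:
            return False              # full: drop rather than block the audio thread
        self.buf[self.write] = x
        self.write = nxt
        return True

    def pop(self):
        if self.read == self.write:
            return None               # empty
        x = self.buf[self.read]
        self.read = (self.read + 1) % self.cap
        return x
```

The key property is that neither side ever waits on the other; when the buffer is full or empty you handle it (drop, fill with silence) instead of blocking.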


If you are interested in Apple platforms check out https://www.audiokit.io


Use LabVIEW as a calculation engine to do experiments. The advantage is that you get system-like diagrams.



http://blogs.zynaptiq.com/bernsee/time-pitch-overview/

Not sure if it's useful. It's probably going to involve granular synthesis.
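For intuition, a bare-bones granular pitch shifter: each grain is read back at the pitch ratio but written at its original position, so duration stays put while pitch moves. Rough Python sketch (function and parameter names are illustrative, not from the linked article):

```python
import math

def granular_pitch_shift(x, ratio, grain=1024, hop=256):
    """Each grain is read at `ratio` speed (linear interpolation)
    but written at its original position, so pitch moves by `ratio`
    while overall duration is unchanged. Tails past the last full
    grain are left silent in this sketch."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / grain) for n in range(grain)]
    out = [0.0] * len(x)
    norm = [1e-12] * len(x)   # running window sum, for normalisation
    for start in range(0, len(x) - grain, hop):
        for n in range(grain):
            pos = start + n * ratio          # resampled read position
            i = int(pos)
            if i + 1 >= len(x):
                break                        # grain ran off the end of the source
            frac = pos - i
            s = x[i] * (1.0 - frac) + x[i + 1] * frac
            out[start + n] += win[n] * s
            norm[start + n] += win[n]
    return [o / w for o, w in zip(out, norm)]
```

The overlapping Hann-windowed grains hide the discontinuities between resampled chunks; better shifters additionally align grain phases, which is where most of the engineering effort goes.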


GNU Radio can handle audio I/O as easily as it does IQ signals from SDR front ends. It's cross-platform, and you just build flow graphs, which can then be executed.



I would pick up a microcontroller dev board that has a mic built in (e.g. one of the STM32 Discovery boards). Also get a "codec" dev board (or alternatively, use the MCU's onboard DAC). Get it to receive audio, process it with DSP, then output it and/or save it to memory. This will really force you to understand it.


Why not just use a regular laptop for this? There’s a ton of low level sound processing libraries for every OS.


Bad advice. I have no idea how using a microcontroller would help someone understand pitchshifting algorithms.


> I'm trying to program a harmonizer

Why?

Not questioning your motivations.

Rather, I'm curious what they are.


www.airwindows.com may help.


Chris @ Airwindows is super nice to release all his plugins and source code for free, but the code quality is really bad. That doesn't matter if you're using a plugin in the production of a song and it works well, but for learning DSP it's a bad resource.


I was curious what you meant and went to have a look. At first all seemed well, until I got to the actual audio processing part. :)

https://github.com/airwindows/airwindows/blob/master/plugins...

Then again, maybe this is the norm for audio engineers? Not my field.


The documentation for that specific module even calls it out as "painfully hard-coded biquad filter code", so YMMV.
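For contrast, here's what a biquad looks like with named coefficients instead of baked-in constants, in the style of Robert Bristow-Johnson's "Audio EQ Cookbook" low-pass. Sketched in Python rather than the repo's C++; variable names are mine:

```python
import math

def lowpass_biquad_coeffs(fc, fs, q=0.7071):
    """RBJ cookbook low-pass biquad coefficients.
    fc: cutoff in Hz, fs: sample rate in Hz, q: resonance."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b0 = (1.0 - cos_w0) / 2.0
    b1 = 1.0 - cos_w0
    b2 = (1.0 - cos_w0) / 2.0
    a0 = 1.0 + alpha
    a1 = -2.0 * cos_w0
    a2 = 1.0 - alpha
    # Normalise so the leading denominator coefficient is 1.
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]

def biquad_process(x, b, a):
    """Direct Form I, one sample at a time."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = b[0] * s + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, s
        y2, y1 = y1, y
        out.append(y)
    return out
```

Same arithmetic as the hard-coded version would do, but the coefficient derivation is visible and reusable for any cutoff/sample rate.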

I'd guess those files aren't what the author actually edits - there are commits suggesting they are _generated_ from "boilerplate", and even a few files that seem to have failed interpolations ("__MyCompanyName__" in some copyright lines, for instance).

A lot of files also seem to have duplicated code, down to individual comments. For instance, the comment on line 24 reoccurs on line 344 of this effect:

https://github.com/airwindows/airwindows/blob/master/plugins...

and in the Mac AU version on line 267:

https://github.com/airwindows/airwindows/blob/master/plugins...

and in the Linux VST version on lines 24 and 344:

https://github.com/airwindows/airwindows/blob/master/plugins...


Not looking for an argument, but can you give some pointers as to what is bad about the code?


Someone already shared an example of how difficult the DSP code is to read. There are tons of magic numbers and short variable names, with no description of why the code does what it does. Again, that's fine if you want to use the compiled version in a song, but it's not friendly to learn from. Then again, I don't think the goal of this project is to be educational; it's just a bonus that the code is open source.

There are also things like uneven indentation, lots of tricks to reduce line count at the expense of readability (like single-line if statements), and old optimization tricks that aren't necessary with modern compilers, like `while (--sampleFrames >= 0)`.

Here's another example file:

https://github.com/airwindows/airwindows/blob/master/plugins...

And the project structure is really weird. Normally you would use a framework like JUCE or iPlug, or write your own, so that your DSP code is written once and the multiplatform code is kept separate. Instead, every platform (Mac, Windows, Linux; VST, AU) is a separate codebase with all the DSP code duplicated.

He's definitely using some sort of templating system to do this, maybe even a tool that lets him write the DSP in Python or MATLAB and converts it to C++. Basically, the files on GitHub are not his true "source"; they're generated from his real source. For that reason, commits are often massive and unhelpful for tracing changes to any individual effect.

For example, a recent commit that added the WolfBot effect (a guitar amp simulator) has 427 changed files and 150,284 added lines. And even though the commit is titled WolfBot, it also adds other FX: CreamCoat, DeRez3, kCathedral3, kGuitarHall, kPlate140, kPlate240.

https://github.com/airwindows/airwindows/commit/7623a1c14b01...


Thanks, much appreciated that you took the time to point those things out.


Audio Anecdotes series



