Wow, didn't expect this to hit HN! I'm the author of this project and super glad that it's getting some traction.
Under the hood, this is essentially just a Python wrapper around JUCE (https://juce.com), a comprehensive C++ library for building audio applications. We at Spotify needed a Python library that could load VSTs and process audio extremely quickly for machine learning research, but all of the popular solutions we found either shelled out to command line tools like sox/ffmpeg, or had non-thread-safe bindings to C libraries. Pedalboard was built for speed and stability first, but turned out to be useful in a lot of other contexts as well.
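The core workflow ends up looking roughly like this (a minimal sketch; the plugin path is a placeholder, and the exact call signature has shifted a bit between versions):

```python
import soundfile as sf
from pedalboard import Pedalboard, Compressor, Reverb, load_plugin

audio, sample_rate = sf.read("guitar.wav", dtype="float32")

# Mix built-in effects with any installed VST3/Audio Unit plugin (path is a placeholder).
vst = load_plugin("/path/to/SomeVendorReverb.vst3")
board = Pedalboard([Compressor(threshold_db=-16, ratio=4), Reverb(room_size=0.25), vst])

effected = board(audio, sample_rate)
sf.write("guitar_effected.wav", effected, sample_rate)
```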
This is great, congrats and thank you (& Spotify) for releasing this!
I was just about to look for a library to layer 2 tracks (a text-to-speech "voice" track, and a background music track) and add compression to the resulting audio.
A few questions if you don't mind:
- Pedalboard seems more suited to processing one layer at a time, correct? So I would do the muxing/layering (i.e. automating the gain of each layer) elsewhere?
- Do you have a Python library recommendation to mux and add silence in audio files/objects? pydub seems to be ffmpeg-based. Is that a better option than a pure-Python implementation such as SoundFile?
That's correct: Pedalboard just adds effects to audio, but doesn't have any notion of layers (or multiple tracks, etc). It uses the Numpy/Librosa/pysoundfile convention of representing audio as floating-point Numpy arrays.
Mixing two tracks together could be done pretty easily by loading the audio into memory (e.g.: with soundfile.read), adding the signals together (`track_a * 0.5 + track_b * 0.5`), then writing the result back out again.
Adding silence or changing the relative timings of the tracks is a bit more complex, but not by much: the hardest part might be figuring out how long your output file needs to be, then figuring out the offsets to use in each buffer (i.e.: `output[start:end] += gain * track_a[:end - start]`).
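A quick sketch of that, assuming both files share a sample rate and channel count (the filenames are placeholders):

```python
import numpy as np
import soundfile as sf

# Placeholder filenames; both files are assumed to share a sample rate and channel count.
voice, sr = sf.read("voice.wav", dtype="float32")
music, _ = sf.read("music.wav", dtype="float32")

# Start the music two seconds in, and size the output to fit whichever track ends last.
offset = 2 * sr
length = max(len(voice), offset + len(music))
output = np.zeros((length,) + voice.shape[1:], dtype="float32")

output[:len(voice)] += 0.7 * voice
output[offset:offset + len(music)] += 0.3 * music

sf.write("mixed.wav", output, sr)
```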
Makes sense, so I'd be doing everything at the sample level.
For layers, I could have an array that represents "gain automation" for each layer, and then let numpy do `track_a * gain_a + track_b * (1-gain_a)` for the whole output in one go.
And I'd create silences by inserting 0's (making sure I insert them at a zero-crossing point to avoid clicks).
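Roughly, as a toy sketch (synthetic sine waves stand in for the two tracks; everything here is just numpy):

```python
import numpy as np

# Toy signals standing in for two loaded tracks.
sr = 44100
t = np.linspace(0, 4.0, 4 * sr, endpoint=False, dtype="float32")
track_a = 0.5 * np.sin(2 * np.pi * 440 * t)   # "voice" stand-in
track_b = 0.5 * np.sin(2 * np.pi * 220 * t)   # "music" stand-in

# Gain automation: crossfade from track_a to track_b over one second in the middle.
gain_a = np.ones_like(track_a)
fade_start, fade_len = int(1.5 * sr), sr
gain_a[fade_start:fade_start + fade_len] = np.linspace(1.0, 0.0, fade_len, dtype="float32")
gain_a[fade_start + fade_len:] = 0.0

mixed = track_a * gain_a + track_b * (1.0 - gain_a)

# Insert a second of silence, splitting at the lowest-amplitude nearby sample
# (approximately a zero crossing) to avoid a click.
split = 3 * sr
split += int(np.argmin(np.abs(mixed[split:split + 1000])))
mixed = np.concatenate([mixed[:split], np.zeros(sr, dtype=mixed.dtype), mixed[split:]])
```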
I'm prone to NIH :-) but I'll also check whether something like this already exists. At the very least, it's clearly doable/prototype-able!
Out of curiosity, did you use any of the code produced by Echo Nest? They were a Boston audio tech company that had lots of features like this, but they got swallowed by Spotify many years ago. I built some tools on top of their service, I always wondered what happened to it.
No Echo Nest code was included in this project specifically, but my team owns a lot of the old Echo Nest systems, data, and audio magic (i.e.: what used to be the Remix API, audio features, audio analysis, etc.). Pedalboard is being used to continue a lot of the audio intelligence research that started way back with the Echo Nest!
(Fun fact: the Echo Nest's Remix API was what got me interested in writing code way back in high school. Now, more than a decade later, I'm the tech lead for the team that owns what's left of it. I still can't believe that sometimes.)
I've done a lot of stuff with the Audio Analysis API, and it's horribly underdocumented. I tracked down the PhD thesis that formed the basis of that API some time ago [0], but it's pretty theoretical and likely outdated. Do you think you'll ever get around to actually documenting that API?
This is very interesting! I write VSTs as a hobby (usually in JUCE). Does this make it easier to load them into Python, or to create new ones, or...? How do you envision this being used?
We use Pedalboard (and Python) for offline/batch processing - mostly ML model training.
Pedalboard would also be usable in situations that can tolerate high latency and jitter, though, given that all audio gets handed back to Python (which is both garbage-collected and constrained by a global interpreter lock) only after processing is complete.
Instruments wouldn't be that hard to add to Pedalboard, but we don't have a use case for them on my team just yet. I might give that a try in the future, or might let someone else in the community contribute that.
Whilst this seems cool, I'm struggling to understand the real-world use cases.
> Machine Learning (ML): Pedalboard makes the process of data augmentation for audio dramatically faster and produces more realistic results ... Pedalboard has been thoroughly tested in high-performance and high-reliability ML use cases at Spotify, and is used heavily with TensorFlow.
What are the actual use cases internally at Spotify and for the public here?
> Applying a VST3® or Audio Unit plugin no longer requires launching your DAW, importing audio, and exporting it; a couple of lines of code can do it all in one command, or as part of a larger workflow.
I wonder how many content creators are more comfortable with Python than with a DAW or Audacity?
> Artists, musicians, and producers with a bit of Python knowledge can use Pedalboard to produce new creative effects that would be extremely time consuming and difficult to produce in a DAW.
Googling "how to add reverb" yields Audacity as the first option. A free, open source tool available on Linux+Win+Mac. In what world is it easier to do this in Python for Artists, musicians and producers?
As a music producer who's well versed in Python (even setting aside that I've switched to producing almost entirely out of the box, on modular/hardware synths), I'd much rather apply basic effects like these in a DAW or Audacity, where accessing and patching a live audio stream is much easier than figuring out how to do that in Python and only being able to apply effects to .wav files rather than live audio.
On the ML front (which is probably their primary motivation) it's pretty useful for the kinds of things Spotify is interested in. As a basic example, say you want to train a model to classify songs by genre. If you have, say, a country song, adding a bit of reverb or compression to it won't change what genre it sounds like. So augmenting the training data with small transformations like these can make models more robust to them. Obviously, this has to be done judiciously: if you add tons of distortion and reverb to a country song, it might sound like experimental noise rather than country. This kind of thing can also help with duplicate detection, song recommendations, playlist generation, autotagging, etc.
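For illustration, a randomized augmentation step along those lines might look roughly like this using Pedalboard's built-in effects (the parameter ranges are arbitrary, and the call signature is my best recollection of the README):

```python
import random
import numpy as np
from pedalboard import Pedalboard, Compressor, Gain, Reverb

def augment(audio: np.ndarray, sample_rate: float) -> np.ndarray:
    """Perturb a clip with mild, randomized effects that shouldn't change its genre label."""
    board = Pedalboard([
        Compressor(threshold_db=random.uniform(-20, -10), ratio=random.uniform(1.5, 4)),
        Reverb(room_size=random.uniform(0.05, 0.3), wet_level=random.uniform(0.05, 0.2)),
        Gain(gain_db=random.uniform(-3, 3)),
    ])
    return board(audio, sample_rate)
```

Each training epoch then sees a slightly different rendering of every clip, which is where the robustness comes from.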
As for creators, maybe not a large fraction of music creators are coders, but there's certainly an intersection in that Venn diagram, though I have no idea how large it is. And I imagine this could be used to build other tools that don't require coding.
Clearly, most of the time it makes more sense to apply FX interactively in your DAW of choice, but I find it useful to programmatically modify audio sometimes. For example, I've written quick scripts using sox and other tools to normalize/resample audio, as well as slice loops. I could see that being able to add other FX such as compression or maybe even reverb programmatically could occasionally be useful.
I don't think I would. Of the music producers and musicians I know, the majority are relatively poor at 'tech' and definitely not coders/programmers/software engineers. They know their way around their tools, but they're not coders.
For machine learning on audio data, it is often (always?) useful to modify the original dataset to make the model more general.
A way to do such manipulation that is both convenient to use from Python (a major programming language in the field and well tied in to the major frameworks) and performant is extremely welcome.
While I think this is awesome (from the perspective of audio production pipelining requirements), I can't really see the need to use a VST to apply basic audio alterations for the stated ML purpose, since there are already many native DSP libraries that can apply reverb, distortion, delay, convolution, etc. to an audio signal.
Not a lot of creators are necessarily comfortable with Python or other coding, but there are definitely people (including me) interested in whatever can be done programmatically at DAW quality, without a DAW.
This opens up possibilities such as version control, collaboration via PRs, the regular coding workflow, etc.
(I am dabbling with music and Elixir + Rust at the moment, and definitely interested by what Pedalboard brings, including programmatic VST hosting etc).
There are plenty of standalone plugin hosts, so just write a plugin (JUCE offers a perfectly fine framework and workflow for that, as do some others like DPF), load it into a standalone plugin host, done.
> Applying a VST3® or Audio Unit plugin no longer requires launching your DAW, importing audio, and exporting it; a couple of lines of code can do it all in one command, or as part of a larger workflow.
That's also not really true. There are plenty of scriptable VST hosts and libraries. BASS (the library), for instance, has been around for ages and I've used it to host VSTs in scripted workflows.
I'm a bit perplexed why you quoted a statement saying what it's used for with "what are the actual use cases..." lol.
This is useful for me, both for ML and for adding effects to sounds. I would much rather use Python than a DAW. I know enough signal processing to prefer running code I can inspect over using some opaque GUI.
> I wonder how many content creators are more comfortable with Python than with a DAW or Audacity?
This opens up the potential for a simple GUI. For a basic user: drag and drop an audio file and flip virtual switches. Or easier integration into a mobile "podcast creator" app.
Some years ago when Ardour (a crossplatform FLOSS DAW) was being sponsored by SSL (famous for their large scale mixing consoles), I got to attend a meeting designed to float and discuss "blue sky" ideas.
Somebody who had been with the company for a long time predicted that the broadcast world was going to end up demanding a box with just 3 buttons:
[ That was worse ]
[ That was better ]
[ Try something else ]
Everybody laughed, but everybody also knew that this was indeed the direction that audio engineering was going to go in.
One of my favourite things with Winamp ~20 years ago was the ability to stack DSP/sound plugins and then output to WAV. It was a weird but great way to quickly create CD-ready tracks that were crossfaded with effects (e.g. speed, stereo separation, or vocal removal). I was basically 'batch processing' through a GUI without even realizing it.
Its internal API can't consume external Python libs (i.e. pip deps), but people have built a bridge module that lets you use regular Python to communicate with another Python process in the DAW and invoke the full API.
This would be really useful, because you'd be able to programmatically process audio on tracks using pre-made effects, and let users create and share VST settings presets.
So users could write scripts which apply chains of FX directly to audio clips on tracks, grabbing the audio sources/files from the DAW programmatically. And a community library of useful FX chains could emerge from this.
Oh man -- Imagine a graphical node-based UI where the user can place nodes as FX and route the audio file through a series of FX nodes, with tunable params.
This is entirely doable!!
alias fx="sox - -t wav -"
cat foo.wav | fx overdrive | fx reverb | play -
Considering that sox has existed since the early 1990s, I'd wager that the demand for that isn't exactly huge.
(note that in practice you'd directly use sox's play command to apply effects as it's certainly muuuuuuuuuuuch more efficient than to spin up a ton of processes which'll read from stdin/stdout)
I've never actually used sox for fx, I wonder how good they are. (I have used it plenty for resampling, normalization, trimming, etc). But regardless, there's thousands of VSTs out there -- a lot more options than whatever sox has built in.
That's an interesting idea that you should be able to build on top of Pedalboard: implement a CLI that accepts input and output file paths (and optionally supports piped data, as in your example) and exposes the VST plugin names and their options.
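A hypothetical sketch of what that could look like (the script name and flags are invented; only `load_plugin` and attribute-style parameter access come from Pedalboard itself, and the exact call signature may differ by version):

```python
# Hypothetical "fx.py": the CLI structure is made up for illustration.
import argparse
import soundfile as sf
from pedalboard import load_plugin

parser = argparse.ArgumentParser(description="Apply a VST3/AU plugin to an audio file.")
parser.add_argument("input")
parser.add_argument("output")
parser.add_argument("--plugin", required=True, help="path to a .vst3/.component bundle")
parser.add_argument("--set", nargs=2, action="append", default=[], metavar=("PARAM", "VALUE"),
                    help="set a plugin parameter by name (repeatable)")
args = parser.parse_args()

plugin = load_plugin(args.plugin)
for name, value in args.set:
    setattr(plugin, name, float(value))  # simplification: assumes numeric parameters

audio, sample_rate = sf.read(args.input, dtype="float32")
sf.write(args.output, plugin(audio, sample_rate), sample_rate)
```

Usage might then look like `python fx.py in.wav out.wav --plugin CoolVerb.vst3 --set wet_level 0.4` (plugin and parameter names made up).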
That's easy to do if you first strip the WAV formatting stuff and then at the end add back the format information.
You can't really do it without that, because sound.wav contains both actual audio data and "metadata".
In the real world however, almost nobody who has done this sort of thing actually wants to do it that way. The processing always has a lot of parameters and you are going to want to play with them based on the actual contents of sound.wav. Doing this in realtime (listening while fiddling) is much more efficient than repeatedly processing then listening.
Pedalboard is a wrapper around the JUCE framework (https://juce.com), which is dual-licensed under the GPLv3 or a custom paid commercial license. We chose to license it with the GPLv3 rather than coming up with a dual-license solution ourselves, given that this is an audio processing tool in Python and will usually be used in scripts, backends, and other scenarios where users of Pedalboard are not likely to distribute their code in the first place.
> This ability to play with sound is usually relegated to DAWs, and these apps are built for musicians, not programmers. But what if programmers want to use the power, speed, and sound quality of a DAW in their code?
Well then, they could:
* use Faust or Soul
* use existing plugins in LV2, VST3 or AU formats
* write a new plugin in LV2, VST3 or AU formats
* use SuperCollider, or PureData or any of more than a dozen live-coding languages
* use VCV Rack or Reaktor or any of at least half-dozen other modular environments to build new processing pathways.
Oh wait ...
> Artists, musicians, and producers with a bit of Python knowledge can use Pedalboard to produce new creative effects that would be extremely time consuming and difficult to produce in a DAW.
So it's not actually for programmers at all, it's for people "with a bit of Python knowledge".
OK, maybe I'm being a bit too sarcastic. I just get riled up by the breathless BS in the marketing copy for this sort of thing.
It's a plugin host, with the ability to add your own python code to the processing pathway. Nothing wrong with that, but there's no need to overstate its novelty or breadth.
[ EDIT: if I hadn't admitted to my own over-snarkiness, would you still have downvoted my attempt to point out other long-available approaches for the apparent use-case? ]