Songbird: Spatial Audio Encoding on the Web (google.github.io)
201 points by runesoerensen on Aug 16, 2017 | hide | past | favorite | 42 comments



Hello everyone, thanks for checking out the new repository. I've resolved the CDN link issues, but feel free to file any more issues at https://github.com/Google/songbird/issues. Looking forward to seeing all the great stuff you all make with it. :)


Thanks! I'm confused about what this is, exactly, from the description:

> Songbird is a JavaScript API that supports real-time spatial audio encoding for the Web using Higher-Order Ambisonics (HOA). This is accomplished by attaching audio input to a Source which has associated spatial object parameters. Source objects are attached to a Songbird instance, which models the listener as well as the room environment the listener and sources are in. Binaurally-rendered ambisonic output is generated using Omnitone, and raw ambisonic output is exposed as well.

My confusion is that the Web Audio API [1] also supports real-time spatial audio for the Web [2]. It looks like Ambisonics is a format that encodes spatial audio into a fixed set of audio channels, rather than just playing audio into PannerNodes directly.

Some questions: (1) Is Songbird indeed an alternative to the PannerNode API like I'm suspecting? (2) If so, why would you want to downmix your audio into a set of intermediate channels, rather than play each source directly into a PannerNode? (3) Is there any advantage to using Omnitone, which I suspect does the HRTFs, rather than using a PannerNode and its HRTFs directly?

[1] https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_A...

[2] https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_A...


Hi Toomin,

Thanks for your interest! Let me try to clarify and answer your questions:

1. Songbird is indeed an enhanced alternative to PannerNode.

2. It internally works with ambisonics, but outputs stereo (we use Omnitone internally to render the multichannel audio down into a stereo track).

The general reason people use ambisonics instead of direct HRTF rendering is that ambisonics allows for head rotation prior to rendering, so the user can easily turn their head, etc., without you having to adjust all the incoming sources' HRTFs.
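To illustrate why that matters: in first-order ambisonics, a yaw turn of the listener's head only requires mixing the two horizontal channels with a 2D rotation matrix, no matter how many sources were encoded into the field. A minimal sketch (channel ordering and axis conventions vary between ambisonic formats; this assumes B-format-style W/X/Y/Z with X forward and Y left):

```javascript
// Rotate a first-order ambisonic frame [W, X, Y, Z] by a yaw angle
// (radians). W (omnidirectional) and Z (vertical) are unaffected by
// yaw; only the horizontal X/Y pair is mixed with a standard 2D
// rotation. The cost is fixed regardless of how many sources were
// encoded into the field.
function rotateYaw([w, x, y, z], theta) {
  const c = Math.cos(theta);
  const s = Math.sin(theta);
  return [w, c * x - s * y, s * x + c * y, z];
}

// A source directly ahead (all directional energy in X) rotated by 90
// degrees ends up almost entirely in Y: the whole scene turns at once.
console.log(rotateYaw([1, 1, 0, 0], Math.PI / 2));
```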

The reason we feel Songbird is an upgrade to PannerNode is three-fold:

One, you can control the quality of the localization/spatialization effect by adjusting ambisonicOrder (1st to 3rd, at the moment).

Two, PannerNode is costly: two convolutions per source, while Songbird uses a fixed number of convolutions regardless of the number of sources, so it ends up letting you get more for less.
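Some back-of-envelope arithmetic for that point (the per-source figure of two convolutions is from the comment above; the assumption that the binaural decode costs roughly one convolution per ambisonic channel is illustrative, and real implementations differ): direct HRTF panning scales with the source count, while the ambisonic decode is fixed at (order + 1)² channels.

```javascript
// Illustrative convolution counts: per-source HRTF panning vs. a
// shared ambisonic binaural decode. Assumes 2 convolutions per
// PannerNode source (left + right ear) and roughly one convolution
// per ambisonic channel for the decode.
function pannerConvolutions(numSources) {
  return 2 * numSources;
}

function ambisonicConvolutions(order) {
  return (order + 1) ** 2; // channel count for a given ambisonic order
}

// With 20 sources: 40 convolutions direct, vs. a fixed 16 for
// third-order ambisonics no matter how many sources you add.
console.log(pannerConvolutions(20), ambisonicConvolutions(3)); // → 40 16
```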

Three, PannerNode doesn't support any sort of room modelling, while Songbird produces spatialized (ambisonic) room reflections and reverberation.

Hope this helps clarify things! :)

Cheers, Drew


That's exactly the information I needed! Thank you! It would be great to have this on the homepage.


Noted! Will add a better explanation to the README. :)


Should Songbird users be concerned about Creative's spatial audio patents? Will Google provide legal counsel if I get sued for using Songbird to provide spatial audio in a game?


Eli, I think you started with a reasonable question, but then followed it up with an unnecessarily passive-aggressive question.


It's a valid concern, though. People have been sued in the past for patent infringement resulting from using libraries provided by Google: http://www.x-plane.com/x-world/lawsuit/details/


Not when you know that Creative sued some companies to death: https://en.wikipedia.org/wiki/Aureal_Semiconductor

"On March 5, 1998 Creative Labs sued Aureal for patent infringement. Aureal countersued because they believed Creative was guilty of patent infringement. After numerous lawsuits Aureal won a favorable ruling in December 1999,[1] which vindicated Aureal from these patent infringement claims, but the legal costs were too high and Aureal filed for bankruptcy. On September 21, 2000, Creative acquired Aureal's assets from its bankruptcy trustee for US$32 million. The purchase included patents, trademarks, other property, as well as a release to Creative from any infringement by Creative of Aureal's intellectual property including A3D. The purchase effectively eliminated Creative's only competition in the gaming audio market. It also eliminated any requirements for Creative to pay past or future royalties as well as damages for products which incorporated Aureal's technology."

Aureal's tech was based on HRTF, though, which is covered by a different (now expired, IIRC) patent than Ambisonics.


Every time I see something like this rolling out of Google, all I can think is "if only this were a real project, and not yet another Google shop project." And no, that's not fair to the people who made it. But the entire thing is clearly marked as copyright Google, not copyright the people who deserve the credit, so Google doesn't even want me to think of this as something cool made by cool people; it's just another library pumped out by Google for the betterment of a market position.


What's bothering you? Google apparently allows people to make awesome stuff on company time, and then releases it under the Apache License 2.0, which seems to be a pretty permissive license (https://github.com/google/songbird/blob/master/LICENSE).

It may not get the attention it deserves, but no project is guaranteed to have a maintainer-for-life.


Mostly the part where it stays a Google project: despite the Apache license, no one can contribute without signing a CLA that discriminates against any developer without a phone or a permanent address. "The lawyers insisted" means I don't like the way you think you're releasing useful code into the world when you're really presenting something you locked down so hard that people need to give you their direct personal information before you let them help make it better. (Facebook does the same thing, which is why I don't contribute to their projects despite loving quite a few of them and using them daily.)


>allows people to make awesome stuff during business time

Isn't that R&D? A lot of companies do it.



Yeah... I thought that was gaining a second life. :(


Such a solid media player for its time.


Hmm. None of the examples are working for me. "The HTMLMediaElement passed to createMediaElementSource has a cross-origin resource, the node will output silence." in the console.

macOS Sierra 10.12.5, Firefox 55 64-bit


Didn't have this issue in latest Chrome on OSX.

The current demo URLs use `rawgit.com` but reference the script at `cdn.rawgit.com`, so maybe try prefixing the URLs with "cdn" to avoid cross-origin resources altogether? For instance https://cdn.rawgit.com/google/songbird/master/examples/room-...

Edit: Fixed since https://news.ycombinator.com/item?id=15032253


Worked OK on Microsoft Edge.

The spatial effect is not that bad, and that is speaking as someone who (1) loves 5.1 and other surround in games, movies and music, and (2) is usually unimpressed with headphone surround.


:)


Same. No examples work on either FF or Chrome, Fedora 25. Audio from both browsers works perfectly fine for other things.


No luck on iOS. Not sure if it claims to be supported.


Hi Ben174,

I'm aware of the issue and I hope to have it resolved within the day. :)

Cheers, Drew


Worked great for me with Vivaldi on Ubuntu 17.04.


Same.


This is great to see! I have a side project to port the old Peep Network Auralizer to a web-based project[1], and the code already has the concept of mono sounds originating at different points in a 3D space, so this should be relatively straightforward to integrate. I was aiming for Android/iPhone compatibility though; is there any multichannel audio support on mobile yet, or would I need a fallback?

[1] https://github.com/jhurliman/webpeep


Hi!

Songbird renders stereo-out using Omnitone internally, so Android/mobile is certainly supported. Feel free to file any issues you have at https://github.com/Google/songbird/issues.

Cheers, Drew


How is this better than PannerNode, which is built into HTML5? I can hear the difference in the demo, but can someone put words to it?


I think part of it is room simulation. I might be wrong, but I don't think PannerNode supports things like the room materials like in the 2nd demo


To properly enjoy ambisonics you need a system with multiple loudspeakers like this https://www.york.ac.uk/media/electronic-engineering/postgrad...


FreeSWITCH can do this in mod_conference using OpenAL; it's pretty sweet.


I've had lots of trouble with WebAudio rendering to a file as output, though that was in the infancy of the relevant technology. I can't seem to find any information on this for Songbird or Omnitone, but if it works it opens up a bunch of possibilities.


Had my headphones on backwards and got confused when I moved the S and L around. The S was obviously the "Subject"/me, but I couldn't figure out what the L was for. I settled on "Loud thing".


Haha. Actually, you might have had your headphones on correctly all along. S was "Source" and L was "Listener". I've added some clarification to the examples. Thanks for testing it out!


Are there any FOSS tools for capturing custom HRTFs? With photogrammetry, for instance? I found a paper using a Leap Motion for custom HRTF capture, but they didn't publish any code.

https://home.deib.polimi.it/antonacc/pubs/ICASSP_16_2.pdf


Woah!!! This is perfect for a project I'd like to do....


Ahh, this is like the thing in the Realtek audio mixer that lets me change the room type, but in my browser and in real time.


This is neat!

Can you share what techniques you are using for reverberation processing? Early and diffuse reflections?


We're using a room acoustics model that captures early and late reflections based on the acoustic properties (dimensions and materials) of the room. :)


Worth mentioning that this is just a box-shaped room. How many early reflections are calculated?


Hi Science404, yes indeed we're launching with the standard shoebox for now, but obviously we're thinking about the future too. :) Currently we calculate listener-based first-order reflections, optimized for performance. Once the ecosystem out there gets faster, we can explore fancier methods. ;) You can see this in EarlyReflections.js.
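For the curious, here's a simplified stand-alone sketch of the general image-source idea behind first-order shoebox reflections (this is not the actual EarlyReflections.js code; the absorption model, 1/distance gain, and speed-of-sound constant are assumptions for illustration). Each of the six walls mirrors the source across its plane, yielding one delayed, attenuated copy per wall:

```javascript
// Image-source sketch of first-order reflections in a shoebox room.
// Each wall contributes one image source mirrored across its plane;
// delay = distance / speed of sound, gain attenuated by wall
// absorption and distance. Illustration only.
const SPEED_OF_SOUND = 343; // m/s, assumed room temperature

function firstOrderReflections(source, listener, room, absorption) {
  // room = { width, depth, height }; positions are [x, y, z] metres,
  // with the origin at one corner of the box.
  const images = [
    [-source[0], source[1], source[2]],                  // x = 0 wall
    [2 * room.width - source[0], source[1], source[2]],  // x = width wall
    [source[0], -source[1], source[2]],                  // y = 0 wall
    [source[0], 2 * room.depth - source[1], source[2]],  // y = depth wall
    [source[0], source[1], -source[2]],                  // floor
    [source[0], source[1], 2 * room.height - source[2]], // ceiling
  ];
  return images.map((img) => {
    const d = Math.hypot(
      img[0] - listener[0],
      img[1] - listener[1],
      img[2] - listener[2]
    );
    return { delaySec: d / SPEED_OF_SOUND, gain: (1 - absorption) / d };
  });
}

// Six reflections for a source and listener inside a 4 x 4 x 3 m room.
console.log(
  firstOrderReflections([1, 1, 1], [2, 2, 1], { width: 4, depth: 4, height: 3 }, 0.3)
);
```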

Cheers, Drew


I'm so glad ambisonics is finally getting its day.



