
This doesn't have much of a use case.

- Can't use it in telephony (obvious application for low bitrates); phone handsets and headsets don't have the power to do it in real time.

- Very small files of good quality would be useful in tiny embedded systems that have low flash space: but what systems of that type have the processing power for decoding? Very low storage more or less goes hand in hand with weak processing.

The quality is astonishing for the bit rate, though.



Sometimes you have enough CPU and not enough bandwidth. Remote expeditions, rural schools in underdeveloped parts of the world, etc. You can stream a bunch of content (news, audiobooks, daily lectures, etc.) over otherwise pricey satellite links, and then a Raspberry Pi or some other solar-powered device can decode the audio without having to do it in real time.
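
A minimal sketch of that workflow in Python, assuming a hypothetical "tsac" command-line decoder is available on the device (the binary name and arguments are placeholders): fetch the compressed files during the satellite window, then let the device decode them overnight at whatever speed it manages.

    import subprocess
    from pathlib import Path

    INBOX = Path("/srv/satellite/inbox")    # compressed files dropped here by the downlink
    OUTBOX = Path("/srv/audio/decoded")     # decoded .wav files for local playback

    def decode_all():
        OUTBOX.mkdir(parents=True, exist_ok=True)
        for packed in sorted(INBOX.glob("*.tsac")):
            wav = OUTBOX / (packed.stem + ".wav")
            if wav.exists():
                continue  # already decoded on a previous run
            # Hypothetical CLI invocation; slower-than-real-time decoding is fine,
            # because playback happens later from the decoded file.
            subprocess.run(["tsac", "d", str(packed), str(wav)], check=True)

    if __name__ == "__main__":
        decode_all()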

It's not a use case for "everybody", but it might reduce costs for the people who need it (or make new things viable).


Archiving over a long period of time might be a use case.

I often wonder how much of the data currently in circulation will be lost at some point. HDDs/SSDs last a couple of years. Most of the data in the cloud will be copied over, but some will be lost. If you extrapolate to a thousand, or a million, years, how much will remain? Will anything survive a civilization collapse? I guess most people don't care, but some will...

One way to make storage media last longer is to make them lower density, and for that such a super-low bitrate could be useful.
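
Back-of-the-envelope, with illustrative bitrates only (the low figure is roughly what codecs in this class claim; treat both numbers as assumptions):

    def hours_per_gigabyte(bitrate_kbps: float) -> float:
        """How many hours of audio fit in 1 GB at a given constant bitrate."""
        bytes_per_hour = bitrate_kbps * 1000 / 8 * 3600
        return 1e9 / bytes_per_hour

    for name, kbps in [("ultra-low-bitrate codec", 5.5), ("128 kbps MP3", 128.0)]:
        print(f"{name:>24}: {hours_per_gigabyte(kbps):6.0f} h per GB")
    # roughly a 23x difference in how much audio a given (possibly very
    # low-density, longevity-optimized) medium can hold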


No. Historical archives shouldn't use overly clever compression algorithms. Remember the JBIG2 fiasco. https://en.wikipedia.org/wiki/JBIG2#Character_substitution_e...


Not only was the JBIG2 fiasco not an inherent flaw of JBIG2 itself, but any historical archive would want a bounded error model for any lossy compression algorithm anyway. We don't know exactly how much error is tolerable for given content, but we know that some error is definitely tolerable for most content, and its upper bound can be used to specify a safe and reasonable compression level. Once that constraint has been met, the choice of algorithm is no longer relevant.
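
A sketch of that "bounded error" idea, using plain SNR as a stand-in for whatever perceptual metric the archive actually standardizes on (the threshold here is a placeholder, not a recommendation):

    import numpy as np

    def snr_db(reference: np.ndarray, decoded: np.ndarray) -> float:
        """Signal-to-noise ratio of the decoded signal vs. the reference, in dB."""
        noise = reference - decoded
        return 10 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

    def acceptable(reference: np.ndarray, decoded: np.ndarray, min_snr_db: float = 20.0) -> bool:
        # Placeholder bound: a real archive would pick a perceptual metric and a
        # threshold per content class, then reject anything that falls below it
        # before the compressed copy is committed to storage.
        return snr_db(reference, decoded) >= min_snr_db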


If you use a simple encoding (e.g. uncompressed bitmap), your archival capacity will be extremely limited, especially if you use a low-density medium (optimized for longevity). There's an obvious trade-off between encoding complexity and how much you can archive.

One approach would be a layered strategy - a simple (but inefficient) encoding for an initial set of data, accompanied by a bootstrap for the next level, which would unlock access to a much larger collection of efficiently stored data.
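
A toy version of that layering, with zlib standing in for the "clever" codec: layer 0 is plain ASCII that any future reader can parse, and it documents how to reach layer 1.

    import zlib
    from pathlib import Path

    def write_layered_archive(out_dir: Path, readme_text: str, bulk_data: bytes) -> None:
        out_dir.mkdir(parents=True, exist_ok=True)
        # Layer 0: trivially decodable, describes the format of layer 1 in prose
        # (a real archive would include multiple languages, decoder pseudocode,
        # and test vectors here).
        (out_dir / "LAYER0_README.txt").write_text(readme_text, encoding="ascii")
        # Layer 1: the bulk of the collection, stored with an efficient codec
        # (zlib here; it could be a neural audio codec for recordings).
        (out_dir / "layer1.bin").write_bytes(zlib.compress(bulk_data, level=9))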


The only data that survives a civilization-level collapse is that which requires as little decoding as possible; in other words, plaintext. Future archaeologists aren't going to have a working copy of your GAN-based audio decoder. Translate your data into text (in as many major languages as possible), carve it into stone, and stuff it in a cave in the desert.


I would worry more about future archaeologists not being able to access e.g. Nvidia and TSMC engineering secrets than their ability to decode my cat pictures and shitty piano practice.


> Can't use it in telephony (obvious application for low bitrates); phone handsets and headsets don't have the power to do it in real time.

Not now, but in another 5 years they will start to, and in 10 years all new ones will probably have the power for this. I find it really exciting. It will consume more battery to run, but if that is less than what the radio requires, it might make sense.


People are still bullish on Moore's law.


There are plenty of use cases; just because you can't think of them doesn't mean there aren't any. Neural-network-based audio codecs like TSAC, Descript, Encodec, and SoundStream are used for music and speech generation, audio upsampling, pre-training acoustic language models, speech-to-speech translation, etc.

Check out the citations of Encodec (Facebook's open-sourced audio codec) for more examples: https://scholar.google.com/scholar?cites=1126914113099467682...


Something like this might be useful to put into hardware, e.g. with custom silicon or FPGAs. Maybe then the processing power won't be a big issue?

Edit: Ok I just saw the thing is over 200 MB, might not be feasible for a while.


I think the big use case is satellite telephony and satellite radio/audio/podcast playback. You could do all audio applications over a 3 kbit/s connection - that's completely insane.

It would probably have to be optimized a bit further though, both in terms of compute and size. The goal would probably be real-time encoding on an iPhone SE without breaking too much of a sweat, and an encoder/decoder of perhaps less than 200 MB?

I am curious how well this works with full orchestral music - that's where encoders usually croak. Give me a sample of the Star Wars theme.
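
To put the 3 kbit/s figure in perspective, back-of-the-envelope numbers (treating it as a constant bitrate):

    KBPS = 3.0                      # assumed constant bitrate of the audio stream
    bytes_per_minute = KBPS * 1000 / 8 * 60

    print(f"{bytes_per_minute / 1024:.0f} KiB per minute of audio")                      # ~22 KiB
    print(f"{bytes_per_minute * 60 / 1e6:.2f} MB per hour")                              # ~1.35 MB
    print(f"{bytes_per_minute * 60 * 24 * 30 / 1e6:.0f} MB for a month of 24/7 audio")   # ~972 MB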


How does satellite telephony work? Alice calls Bob over satellite. Alice's telephone is on AC power, and contains a GPU cluster, and Bob's is the same?


Today's GPU cluster is tomorrow's cell phone. But this codec doesn't require that much power anyway. I haven't tried it, but on some forum somebody claimed 0.5x real-time speed on a Core(TM) i3-7100U CPU @ 2.40GHz [1]. It sounds plausible that with some more optimization and slightly better hardware, a bit more specialized for AI, it could do real-time encoding and decoding on cell phones.

[1] https://hydrogenaud.io/index.php/topic,125765.0.html


I remember playing Opus files at 16 kbps over a... 2G? connection with mplayer's caching options. The audio sounded a bit better than MP3 at 32 kbps or RealAudio back in the day.

As the music was "avant-garde", it almost fitted the genre.


Don't be such a Debbie Downer! Celebrate it, and we'll find a use for it some day.


Innovation goes in steps and iterations ;) When MP3 came out, I could just barely play a song encoded from 44.1 kHz/16-bit stereo on my PC, using almost 100% CPU. Today they can be played on a cheap microcontroller.

I like that they share their work; it can lead to something some day.


MP3s were playable on cheap boombox stereos and portable CD players 20+ years ago. Consumer devices capable of decoding MP3s appeared within less than half a decade of MP3 itself, by my recollection.


I think you are correct on that one. How long will it take to run this neural net on cheap consumer devices? It might take more than 5 years. But if all the new AI stuff is not just hype, and continues to be used, we will probably see hardware for running it on cheap circuits in the not too distant future. Maybe using a GPU+RAM-like structure. Maybe analog circuits with analog flash will win? The future will show us :)

Maybe add this URL to the calendar on today's date in 5 years and come back and reply with the answer :-D


When MP2 came out, my computer was barely able to play a song in 44.1 kHz 16-bit mono. I think the bitrate was 192 kbps, but I'm not sure.

(Later on I was so surprised MP4 didn't replace MP3!)


If you thought it was weird that they went to video with MP4, imagine my shock that the next generation they got into firearms.


That escalated quickly


Am I missing something in thinking that this could be alleviated like every other compression algorithm by implementing it via a hardware codec?


It's a neural network, not a traditional compression algorithm. It would be difficult to implement this efficiently in an ASIC AFAIK, but if there are any hardware designers that disagree please chime in.


Traditional codecs also use a lot of “magic” tables with numbers (see e.g. AMR codecs used in GSM telephony).

I think this codec could be optimized to run relatively efficiently on the various AI accelerator chips modern phones have, which is “kind-of” doing it in hardware.
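
A sketch of that "kind-of hardware" route, assuming the decoder network could be exported to ONNX (the file name, the input/output names, and the exportability itself are all assumptions here). ONNX Runtime picks an accelerator execution provider where one is available and falls back to the CPU otherwise.

    import numpy as np
    import onnxruntime as ort

    # Prefer on-device accelerators whose execution providers are present in
    # this onnxruntime build; CPUExecutionProvider is always available.
    wanted = ["CoreMLExecutionProvider", "NnapiExecutionProvider", "CPUExecutionProvider"]
    providers = [p for p in wanted if p in ort.get_available_providers()]

    session = ort.InferenceSession("tsac_decoder.onnx", providers=providers)  # hypothetical export

    tokens = np.zeros((1, 512), dtype=np.int64)        # placeholder compressed-token frame
    (audio,) = session.run(None, {"tokens": tokens})   # decoded PCM for that frame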


I figured some sort of NPU + tuned hardware would be enough, but I'm just going off of intuition.


Ham Radio enthusiasts love to do stuff with $1000 radios. If this can run on any reasonable laptop it could be amazing.

They're putting neural accelerators in everything these days, I wouldn't be surprised if they got it to where it could work on a phone, in which case you could do voice over Meshtastic.


Storage of large volumes of voice conversations for regulatory purposes? (Say, a trading floor.)


Since music quality / stereo aren't required, a speech codec could be used. I think TSAC outperforms most of them on raw bitrate, but not on energy efficiency and speed. E.g. SILK goes down to 6 kbps; that could be a contender.

Or maybe you do want really good quality in order to fingerprint the voices. Vocoder artifacts can give parties plausible deniability (that's not my voice).
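
For comparison, a sketch of pushing Opus (which uses its SILK layer at speech rates) down toward the bottom of its range with ffmpeg; the 6 kbps setting is real, but expect narrowband speech quality, and the wrapper below is just illustrative.

    import subprocess

    def encode_low_bitrate_speech(wav_in: str, opus_out: str, bitrate: str = "6k") -> None:
        # Requires ffmpeg built with libopus. "-application voip" biases the
        # encoder toward its speech (SILK) modes; 6 kbps is roughly the
        # practical floor for intelligible speech.
        subprocess.run(
            ["ffmpeg", "-y", "-i", wav_in,
             "-c:a", "libopus", "-b:a", bitrate, "-application", "voip",
             opus_out],
            check=True,
        )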


Or mass surveillance.


> phone handsets and headsets don't have the power to do it in real time.

Until quite recently my phone was the fastest computer I owned.

What phone cannot decode these?



