
This doesn't have much of a use case.

- Can't use it in telephony (obvious application for low bitrates); phone handsets and headsets don't have the power to do it in real time.

- Very small files of good quality would be useful in tiny embedded systems that have low flash space: but what systems of that type have the processing power for decoding? Very low storage more or less goes hand in hand with weak processing.

The quality is astonishing for the bit rate, though.



Sometimes you have enough CPU and not enough bandwidth. Remote expeditions, rural schools in underdeveloped parts of the world, etc. You can stream a bunch of content (news, audiobooks, daily lectures, etc.) over otherwise pricey satellite links, and then a Raspberry Pi or some other solar-powered device can decode the audio without having to do it in real time.
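
A minimal sketch of that workflow in Python, assuming a hypothetical "tsac" command-line decoder is available on the device (the binary name and arguments are placeholders): fetch the compressed files during the satellite window, then let the device decode them overnight at whatever speed it manages.

    import subprocess
    from pathlib import Path

    INBOX = Path("/srv/satellite/inbox")    # compressed files dropped here by the downlink
    OUTBOX = Path("/srv/audio/decoded")     # decoded .wav files for local playback

    def decode_all():
        OUTBOX.mkdir(parents=True, exist_ok=True)
        for packed in sorted(INBOX.glob("*.tsac")):
            wav = OUTBOX / (packed.stem + ".wav")
            if wav.exists():
                continue  # already decoded on a previous run
            # Hypothetical CLI invocation; slower-than-real-time decoding is fine,
            # because playback happens later from the decoded file.
            subprocess.run(["tsac", "d", str(packed), str(wav)], check=True)

    if __name__ == "__main__":
        decode_all()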

It's not a use case for "everybody", but it might reduce costs for the people who need it (or make new things viable).


Archiving over a long period of time might be a use case.

I often wonder how much of the data currently in circulation will be lost at some point. HDDs/SSDs last a couple of years. Most of the data in the cloud will be copied over, but some will be lost. If you extrapolate to a thousand, or a million, years, how much will remain? Will anything survive a civilization collapse? I guess most people don't care, but some will...

One way to make storage media last longer is to make them lower density, and for that such a super-low bitrate could be useful.
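
Back-of-the-envelope, with illustrative bitrates only (the low figure is roughly what codecs in this class claim; treat both numbers as assumptions):

    def hours_per_gigabyte(bitrate_kbps: float) -> float:
        """How many hours of audio fit in 1 GB at a given constant bitrate."""
        bytes_per_hour = bitrate_kbps * 1000 / 8 * 3600
        return 1e9 / bytes_per_hour

    for name, kbps in [("ultra-low-bitrate codec", 5.5), ("128 kbps MP3", 128.0)]:
        print(f"{name:>24}: {hours_per_gigabyte(kbps):6.0f} h per GB")
    # roughly a 23x difference in how much audio a given (possibly very
    # low-density, longevity-optimized) medium can hold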


No. Historical archives shouldn't use overly clever compression algorithms. Remember the JBIG2 fiasco. https://en.wikipedia.org/wiki/JBIG2#Character_substitution_e...


Not only was the JBIG2 fiasco not an inherent flaw of JBIG2 itself, but any historical archive would want a bounded error model for any lossy compression algorithm anyway. We don't know exactly how much error is tolerable for given content, but we know that some error is definitely tolerable for most content, and its upper bound can be used to specify a safe and reasonable compression level. Once that constraint has been met, the choice of algorithm is no longer relevant.
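
A sketch of that "bounded error" idea, using plain SNR as a stand-in for whatever perceptual metric the archive actually standardizes on (the threshold here is a placeholder, not a recommendation):

    import numpy as np

    def snr_db(reference: np.ndarray, decoded: np.ndarray) -> float:
        """Signal-to-noise ratio of the decoded signal vs. the reference, in dB."""
        noise = reference - decoded
        return 10 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

    def acceptable(reference: np.ndarray, decoded: np.ndarray, min_snr_db: float = 20.0) -> bool:
        # Placeholder bound: a real archive would pick a perceptual metric and a
        # threshold per content class, then reject anything that falls below it
        # before the compressed copy is committed to storage.
        return snr_db(reference, decoded) >= min_snr_db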


If you use a simple encoding (e.g. uncompressed bitmap), your archival capacity will be extremely limited, especially if you use a low-density medium (optimized for longevity). There's an obvious trade-off between encoding complexity and how much you can archive.

One approach would be a layered strategy - a simple (but inefficient) encoding for an initial set of data, accompanied by a bootstrap for the next level, which would unlock access to a much larger collection of efficiently stored data.
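
A toy version of that layering, with zlib standing in for the "clever" codec: layer 0 is plain ASCII that any future reader can parse, and it documents how to reach layer 1.

    import zlib
    from pathlib import Path

    def write_layered_archive(out_dir: Path, readme_text: str, bulk_data: bytes) -> None:
        out_dir.mkdir(parents=True, exist_ok=True)
        # Layer 0: trivially decodable, describes the format of layer 1 in prose
        # (a real archive would include multiple languages, decoder pseudocode,
        # and test vectors here).
        (out_dir / "LAYER0_README.txt").write_text(readme_text, encoding="ascii")
        # Layer 1: the bulk of the collection, stored with an efficient codec
        # (zlib here; it could be a neural audio codec for recordings).
        (out_dir / "layer1.bin").write_bytes(zlib.compress(bulk_data, level=9))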


The only data that survives a civilization-level collapse is that which requires as little decoding as possible; in other words, plaintext. Future archaeologists aren't going to have a working copy of your GAN-based audio decoder. Translate your data into text (in as many major languages as possible), carve it into stone, and stuff it in a cave in the desert.


I would worry more about future archaeologists not being able to access e.g. Nvidia and TSMC engineering secrets than their ability to decode my cat pictures and shitty piano practice.


> Can't use it in telephony (obvious application for low bitrates); phone handsets and headsets don't have the power to do it in real time.

Not now, but in another 5 years they will start to, and in 10 years all new ones will probably have the power for this. I find it really exciting. It will consume more battery to run, but if that is less than what the radio requires, it might make sense.


People are still bullish on Moore's law.


There are plenty of use cases; just because you can't think of them doesn't mean there aren't any. Neural-network-based audio codecs like TSAC, Descript, Encodec, and SoundStream are used for music and speech generation, audio upsampling, pre-training acoustic language models, speech-to-speech translation, etc.

Check out the citations of Encodec (Facebook's open-sourced audio codec) for more examples: https://scholar.google.com/scholar?cites=1126914113099467682...


Something like this might be useful to put into hardware, e.g. with custom silicon or FPGAs. Maybe then the processing power won't be a big issue?

Edit: Ok I just saw the thing is over 200 MB, might not be feasible for a while.


I think the big use case is satellite telephony and satellite radio/audio/podcast playback. You could do all audio applications over a 3 kbit/s connection - that's completely insane.

It would probably have to be optimized a bit further though, both in terms of compute and size. The goal would probably be real-time encoding on an iPhone SE without breaking too much of a sweat, and an encoder/decoder of perhaps less than 200 MB?

I am curious how well this works with full orchestral music - that's where encoders usually croak. Give me a sample of the Star Wars theme.
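
To put the 3 kbit/s figure in perspective, back-of-the-envelope numbers (treating it as a constant bitrate):

    KBPS = 3.0                      # assumed constant bitrate of the audio stream
    bytes_per_minute = KBPS * 1000 / 8 * 60

    print(f"{bytes_per_minute / 1024:.0f} KiB per minute of audio")                      # ~22 KiB
    print(f"{bytes_per_minute * 60 / 1e6:.2f} MB per hour")                              # ~1.35 MB
    print(f"{bytes_per_minute * 60 * 24 * 30 / 1e6:.0f} MB for a month of 24/7 audio")   # ~972 MB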


How does satellite telephony work? Alice calls Bob over satellite. Alice's telephone is on AC power, and contains a GPU cluster, and Bob's is the same?


Today's GPU cluster is tomorrow's cell phone. But this codec doesn't require that much power anyway. I haven't tried it, but on some forum somebody claimed 0.5x real-time speed on a Core(TM) i3-7100U CPU @ 2.40GHz [1]. It sounds plausible that with some more optimization and slightly better hardware, a bit more specialized for AI, it could do real-time encoding and decoding on cell phones.

[1] https://hydrogenaud.io/index.php/topic,125765.0.html


I remember playing Opus files at 16 kbps over a... 2G? connection with mplayer's caching options. The audio sounded a bit better than MP3 at 32 kbps or RealAudio back in the day.

As the music was "avant-garde", it almost fitted the genre.


Don't be such a Debbie Downer! Celebrate it, and we'll find a use for it some day.


Innovation goes in steps and iterations ;) When MP3 came out, I could just barely play a song encoded from 44.1 kHz/16-bit stereo on my PC, using almost 100% CPU. Today they can be played on a cheap microcontroller.

I like that they share their work; it can lead to something some day.


MP3s were playable on cheap boombox stereos and portable CD players 20+ years ago. Consumer devices capable of decoding MP3s appeared within less than half a decade of MP3 itself, by my recollection.


I think you are correct on that one. How long will it take to run this neural net on cheap consumer devices? It might take more than 5 years. But if all the new AI stuff is not just hype, and continues to be used, we will probably see hardware for running it on cheap circuits in the not too distant future. Maybe using a GPU+RAM-like structure. Maybe analog circuits with analog flash will win? The future will show us :)

Maybe add this URL to the calendar on today's date in 5 years and come back and reply with the answer :-D


When MP2 came out, my computer was barely able to play a song in 44.1 kHz 16-bit mono. I think the bitrate was 192 kbps, but I'm not sure.

(Later on I was so surprised MP4 didn't replace MP3!)


If you thought it was weird that they went to video with MP4, imagine my shock that the next generation they got into firearms.


That escalated quickly


Am I missing something in thinking that this could be alleviated like every other compression algorithm by implementing it via a hardware codec?


It's a neural network, not a traditional compression algorithm. It would be difficult to implement this efficiently in an ASIC AFAIK, but if there are any hardware designers that disagree please chime in.


Traditional codecs also use a lot of “magic” tables with numbers (see e.g. AMR codecs used in GSM telephony).

I think this codec could be optimized to run relatively efficiently on the various AI accelerator chips modern phones have, which is “kind-of” doing it in hardware.
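
A sketch of that "kind-of hardware" route, assuming the decoder network could be exported to ONNX (the file name, the input/output names, and the exportability itself are all assumptions here). ONNX Runtime picks an accelerator execution provider where one is available and falls back to the CPU otherwise.

    import numpy as np
    import onnxruntime as ort

    # Prefer on-device accelerators whose execution providers are present in
    # this onnxruntime build; CPUExecutionProvider is always available.
    wanted = ["CoreMLExecutionProvider", "NnapiExecutionProvider", "CPUExecutionProvider"]
    providers = [p for p in wanted if p in ort.get_available_providers()]

    session = ort.InferenceSession("tsac_decoder.onnx", providers=providers)  # hypothetical export

    tokens = np.zeros((1, 512), dtype=np.int64)        # placeholder compressed-token frame
    (audio,) = session.run(None, {"tokens": tokens})   # decoded PCM for that frame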


I figured some sort of NPU + tuned hardware would be enough, but I'm just going off of intuition.


Ham Radio enthusiasts love to do stuff with $1000 radios. If this can run on any reasonable laptop it could be amazing.

They're putting neural accelerators in everything these days, I wouldn't be surprised if they got it to where it could work on a phone, in which case you could do voice over Meshtastic.


Storage of large volumes of voice conversations for regulatory purposes? (Say, a trading floor.)


Since music quality / stereo aren't required, a speech codec could be used. I think TSAC outperforms most of them on raw bitrate, but not on energy efficiency and speed. E.g. SILK goes down to 6 kbps; that could be a contender.

Or maybe you do want really good quality in order to fingerprint the voices. Vocoder artifacts can give parties plausible deniability (that's not my voice).
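
For comparison, a sketch of pushing Opus (which uses its SILK layer at speech rates) down toward the bottom of its range with ffmpeg; the 6 kbps setting is real, but expect narrowband speech quality, and the wrapper below is just illustrative.

    import subprocess

    def encode_low_bitrate_speech(wav_in: str, opus_out: str, bitrate: str = "6k") -> None:
        # Requires ffmpeg built with libopus. "-application voip" biases the
        # encoder toward its speech (SILK) modes; 6 kbps is roughly the
        # practical floor for intelligible speech.
        subprocess.run(
            ["ffmpeg", "-y", "-i", wav_in,
             "-c:a", "libopus", "-b:a", bitrate, "-application", "voip",
             opus_out],
            check=True,
        )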


Or mass surveillance.


> phone handsets and headsets don't have the power to do it in real time.

Until quite recently my phone was the fastest computer I owned.

What phone cannot decode these?



