"... and CTF challenge authors love to encode text into audio waveforms, which you can see using the spectrogram view (although a specialized tool called Sonic Visualiser is better for this task in particular)."
Steganography is common, but usually it's just modifying the last bits of an image or hiding the extra payload after the image data. This is "drawing" in the audio spectrum, making an image out of audio that will be audible.
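A rough sketch of how that spectrum "drawing" works (toy parameters, not any specific tool's method): treat each image row as a frequency, each column as a time slice, and synthesize one sinusoid per lit pixel. A spectrogram of the result shows the image.

```python
import numpy as np

def image_to_audio(pixels, sr=8000, f_lo=1000.0, f_hi=3000.0, col_dur=0.05):
    # pixels: 2-D brightness array in [0, 1], rows top-to-bottom.
    n_rows, n_cols = pixels.shape
    freqs = np.linspace(f_hi, f_lo, n_rows)   # top image row = highest frequency
    t = np.arange(int(sr * col_dur)) / sr
    cols = []
    for c in range(n_cols):
        # each column becomes a short "chord": one sinusoid per lit pixel
        chord = sum(pixels[r, c] * np.sin(2 * np.pi * freqs[r] * t)
                    for r in range(n_rows))
        cols.append(chord)
    audio = np.concatenate(cols)
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio
```

Real tools do this with an inverse STFT and phase tricks to sound less harsh, but the principle is the same.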
The "Aphex face" manifests as a clearly audible, weird abstract sound at the end of the track. It sits nicely in the mix, but it's also distinct from the actual music - I would say it comes across like a signature rather than a musical element. The other examples in GP's linked article made more effort to produce things that are both musical and paintings in the spectrum, especially the Plaid track.
A DC offset, which amplitude-shift keying like this introduces, isn't so nice to your speakers. It might cause them to heat up as the offset waveform holds the magnet out in one direction.
I do love the idea of hiding data in "messy" audio, though :)
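A quick toy illustration of the DC problem (a simplified stand-in, not the article's exact scheme): on-off keying that shifts the waveform up for a "1" bit leaves the signal with a nonzero mean, i.e. a DC component, whereas a symmetric ±1 encoding of balanced bits does not.

```python
import numpy as np

# Hypothetical bit pattern with an equal number of 0s and 1s
bits = np.array([1, 0, 1, 1, 0, 1, 0, 0])
samples_per_bit = 100

unipolar = np.repeat(bits.astype(float), samples_per_bit)   # 0/+1 levels
bipolar = np.repeat(2.0 * bits - 1.0, samples_per_bit)      # -1/+1 levels

print(unipolar.mean())   # 0.5 -- a DC offset the amp has to pass
print(bipolar.mean())    # 0.0 -- no net offset for balanced bits
```

That residual mean is exactly what holds the speaker cone out in one direction.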
In a traditional amplifier the DC should be blocked by the DC blocking capacitors, but I guess with modern full-bridge class-D amplifiers which don't require those it'll have to be done on the software/digital side?
I'm no expert at all, I just saw that the DAC[1] I picked up for a project specifically states no blocking caps are needed.
I then came across this application note[2] which states that a full-bridge class D shares the benefits of a traditional bridge-tied load[3] amplifier, like not requiring DC blocking capacitors.
Not sure how common this is, but clearly blocking caps are not something to be assumed.
Ah, that's a good observation. Looking at the datasheet for that TI chip, it is powered by +3.3 V, but contains an internal charge pump that generates a -3.3 V supply. So the output can swing across ground. That's what eliminates the need for blocking caps. And it specifies a DC offset error of +/- 1 millivolt. So they've designed around the need for blocking caps with internal circuitry and a guaranteed maximum output offset.
Now what happens if there is an input offset, such as a constant stream of input words other than zero? That's where the system designer has to take care of things. And there are offsets even when blocking caps are used. The caps only prevent compounding those offsets from one stage to the next.
And a certain amount of offset won't kill your speakers. It just has to be kept within reasonable bounds.
I've noticed when using the audio input of a PC for measurement that there is a constant offset, which is just an artifact of the ADC. The audio recording software has a function for removing that offset. In fact, I would consider an audio recording with a large residual offset to be a sign of poor engineering practice.
Often, switchmode amplifiers with BTL output have a large normal mode offset, so they can be powered by a single supply. And single-supply audio circuits tend to be festooned with blocking caps. ;-)
I always laugh when vinyl is described as more “pure”. There’s nothing more pure than math, and that’s digital PCM audio. Sure, it’s stored as discrete samples, but that’s not how it comes out of speakers. The digital->analog converter will give you a 1:1 perfect representation of the original waveform as long as you sample at more than 2x the highest frequency desired and there’s a low-pass filter in place.
For the record, I don't think vinyl enthusiasts ever describe vinyl audio as more pure. "Pure analog," yes, but that's different (and true). It's generally acknowledged by vinyl enthusiasts and audiophiles that vinyl introduces a lot of imperfections, which some people prefer.
Also worth noting that an ideal D/A converter will give you the exact waveform back, but such a device does not exist (but you can get pretty close).
Not strictly true: you must sample at 2x the frequency and at sufficient resolution in the amplitude domain. I.e. an ADC that samples at 44 kHz but with only one bit of resolution (outputs a 1 for positive input voltage and 0 for negative, say) would be pretty awful...
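For what it's worth, the reconstruction half of this is easy to sketch (Whittaker-Shannon sinc interpolation; finite length, so the edges aren't perfect, but away from them the samples pin down the waveform at any instant):

```python
import numpy as np

fs = 1000.0                      # sample rate, Hz
f = 100.0                        # a tone well below fs/2
n = np.arange(2000)
x = np.sin(2 * np.pi * f * n / fs)          # the discrete samples

def reconstruct(t):
    # Evaluate the sinc-interpolated signal at continuous time t (seconds):
    # x(t) = sum_n x[n] * sinc(fs*t - n)
    return np.sum(x * np.sinc(fs * t - n))

t_mid = 1.00037                  # an off-grid instant near the middle
approx = reconstruct(t_mid)
exact = np.sin(2 * np.pi * f * t_mid)
print(abs(approx - exact))       # tiny: the samples determine the waveform
```

Quantizing x to one bit before interpolating, per the comment above, would wreck this: the sampling theorem assumes exact amplitudes.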
Have you ever played with an Arduino or similar device? Comparing inputs signals on a digital pin vs an analog pin? Hopefully, you'll agree it's the same concept. If you haven't, I'd encourage you to try one out. They are loads of fun. I am a huge fan of analog, yet digital is just so damn convenient. If you have played with one, you'll understand why your comment makes me smile and chuckle.
A digital pin will most likely output a signal using a zero-order hold, which is the simplest type of reconstruction filter.
A zero-order hold is just one possible way of turning a digital signal into an analog one which you can then measure with your oscilloscope. But the output of the zero-order hold is not the digital signal, because the digital signal is only defined at discrete sample points.
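Concretely, a zero-order hold just repeats each sample until the next one arrives; np.repeat stands in here for the DAC holding its output level:

```python
import numpy as np

samples = np.array([0.0, 1.0, 0.5, -0.25])   # the digital signal: four values
held = np.repeat(samples, 8)                 # staircase: each value held for 8 ticks

# `held` is the staircase you'd see on a scope; a reconstruction
# (anti-imaging) filter would then smooth away the steps.
```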
Sigh... you are wanting to show me that we can do A/D and D/A again? Thanks, I was totally unawares that we could do that. I've never heard an audio signal played back once it was digitized. My life is now complete.
What this guy is showing is not a digital signal. It is an analog signal that has been generated from digital data. Not sure what the point of all of this was, but thanks, I needed a break from finding this bug I've been trying to squash.
Not necessarily. The amplifiers sometimes are though.
In a normal 2 or 3 way speaker cabinet, you'll have an analog crossover which consists of something like a capacitor in series with the tweeter (high pass filter), and an inductor in series with the woofer (low pass filter).
In that case, the tweeter is protected from DC, but the woofer isn't.
Yes, it's unlikely to cause problems in any real-world setup, but it's theoretically possible :P
I've edited my original comment to clarify that it won't always be the case.
Signals that can damage hardware are something that used to be discussed with CRT monitors back in those ancient times, and rumours existed of a virus that exploited this.
Skrillex is actually regarded as a leader in Brostep, a subgenre and/or style of Dubstep. At least in the USA, Brostep has practically supplanted Dubstep and co-opted the name. This has been occurring for about the past 10 years. A big factor is the "drop" part of Brostep sounding very attractive to non-EDM people who head-bang, and to whom traditional Dubstep would be perceived as "boring" or not stimulating enough (without intoxication). This has led to a reinforcing cycle where domestic EDM festivals reach greater audiences, and so they keep promoting Brostep as Dubstep.
Personally, I'm not a fan of this shift. I prefer EDM from the era of the track you linked to, and/or contemporary artists who emulate the older styles.
Oh I know, I just get irrationally annoyed by people calling this stuff dubstep. It feels like half the problem is people and/or clubs don't have soundsystems appropriate for playing bass-heavy music at the right level, so they end up listening to stuff that's light on the low-end but still calling it "dubstep".
You are probably right. For EDM, cultural differences have kept the US lagging behind Europe and the UK. EDM-specific clubs are not common outside of major cities known for nightlife. A lot of EDM tours end up at venues that aren't designed for EDM. These are the conditions informing people's tastes, so that's probably a big factor in why bass isn't as prominent as elsewhere. There truly are fans and places where it is very much alive, just not as strong as other places IMO.
Pretty much every genre changes significantly over its lifetime though. It seems like you could just say you like early dubstep just like people say, "I like old school hip hop" all the time.
I get what you’re saying but the sound is so completely different it’s not a simple evolution of style.
Listen to dnb from two decades ago and you can still see where current stuff comes from. Compare early Digital Mystikz stuff with Skrillex and you wouldn’t call them the same thing.
It's still very garage/2steppy. I think Anti War Dub by Digital Mystikz is generally considered the birth of dubstep. Of course, ask 10 people and you'll get 10 answers.
Midnight Request Line was the first dubstep song to go mainstream (IIRC Annie Mac picked it up), but the genre existed before that point. It's a great song though.
I know the point of the article has nothing to do with the definition of dubstep so it took a lot of restraint to not write something about it ha, difficult when you grew up listening to the early stuff.
I'll take this as my only chance I'll probably get to post dubstep on HN in a valid discussion:
Can confirm this is no longer the new thing, but FWIW, jazz music similarly fell out of style ages ago, but that doesn't make new stuff in the genre uninteresting to those who enjoy it irrespective of hype value.
That is, it's no longer nifty and fashionable to listen to this type of "Brostep", but people who liked it without regard to its social status may continue listening to new material as though nothing changed. Others may have grown tired of the sound, or of the social clout it may have brought them to be "in the know" or part of some zeitgeist, and simply kept up with "today's hits".
Sorry, this is what the word means in the mainstream. When you say 'punk', people think of blink-182 rather than The Clash; there's nothing you can do about that either. Maybe someone should have thought up a name slightly less stupid than brostep.
The way the author went about doing and explaining this is somewhat confusing. What he arrived at through that strange band split/merge process is actually ~identical* to:
- Apply an EQ filter that lowers the volume of the 0-100Hz band by 6dB (this happened because he halved the amplitude of that band)
- Add a slow binary digital signal at around ~4 baud (2Hz fundamental), with slopes smoothed to around 10% of the bit time.
This has nothing to do with dubstep or bass drops - it would work for any song. It's just modulating data in infrasound, at 2Hz, which is well below the threshold of human hearing. The problem here is that he's also needlessly reducing the level of the 0-100Hz band to half the amplitude (6dB), which completely kills the bass feel of the original song. Dubstep fans will not approve (and he needs better speakers if he can't hear the difference).
A much simpler, more sensible process would be to just do this:
- Apply a steep highpass filter at 20Hz (the lower limit of human hearing), to remove any inaudible low-frequency (infra)sounds.
- Reduce the volume of the overall song by, say, around 1dB, to make a bit of headroom for the modulated digital signal
- Encode whatever you want in that 0-20Hz band, in the headroom you created (the amplitude can be quite low, e.g. 5%; it doesn't need to move the whole waveform over).
Then to decode it just lowpass the signal at 20Hz and do your bit detection after that - the filter will remove the audio, leaving only your signal, so it doesn't matter that your signal isn't a whole 50% of the output power. Now the song is only 1dB quieter. You can use as simple or as fancy a modulation technique as you want in that 20Hz band. You could use (normal) ASK as he did, just lowpass it to remove any high frequency components. You could use FSK. You could use QAM. Whatever.
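The steps above can be sketched roughly like this (toy parameters of my own choosing: a 4 kHz sample rate stand-in for real audio, 4 baud, ±5% NRZ levels, scipy Butterworth filters - none of this is the author's code):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

sr = 4000                      # toy sample rate (stand-in for 44.1 kHz)
spb = 1000                     # samples per bit -> 4 baud
bits = np.array([1, 0, 1, 1, 0, 0, 1, 0])

# "The song": some audible content
t = np.arange(len(bits) * spb) / sr
song = 0.6 * np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)

# Encode: strip anything below 20 Hz, make ~1 dB of headroom, add the data
sos_hp = butter(4, 20, 'highpass', fs=sr, output='sos')
clean = sosfiltfilt(sos_hp, song) * 0.89            # ~-1 dB
data = 0.05 * np.repeat(2 * bits - 1, spb)          # +/-5% NRZ, all below 20 Hz
tx = clean + data

# Decode: lowpass at 20 Hz removes the music, leaving only the data signal
sos_lp = butter(4, 20, 'lowpass', fs=sr, output='sos')
base = sosfiltfilt(sos_lp, tx)
rx = (base.reshape(-1, spb).mean(axis=1) > 0).astype(int)

print(rx)   # matches `bits`
```

Any modulation scheme that stays under 20 Hz would slot in where the NRZ levels are.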
* His process actually also messes up the original 100Hz band: modulating it by inverting and interpolating effectively multiplies it with a ~4Hz square wave, which creates harmonics and other ickiness around the transitions. It also does not guarantee the absence of clipping, since he only reduced the amplitude of the low 100Hz band - this process can actually increase peak levels, as can happen any time you use frequency filtering (try his high-pass filter command on this file and watch sox complain of clipping, even though the original file does not clip: https://mrcn.st/t/filtering_clips.wav ). So I would not recommend trying to emulate his approach precisely even if you want to achieve the same actual effect, since it's actually quite a silly way of going about doing it :)
It's not even dubstep/brostep related. It's also very inefficient.
The best picture/audio tool is probably Metasynth - as used by Aphex Twin. It's been around since the 90s and seems to be in development limbo at the moment, but with a bit of taming it will make all those classic dubstep/brostep sounds out of carefully selected images. Which can be graphic images of text.
Isn't this precisely what some ad tracking or rights management people are doing? Aren't they adding audio that is out of range for human hearing, but which things like Alexa can hear, so that ads can be tracked, or so the broadcaster can tell that a pub is broadcasting a match without paying for it, etc.?
This kind of modulating data in infrasound or ultrasound is common, yes. It has been used in toys too, e.g. things that respond to certain sounds from a show. chibi-tech stuck some reverse engineered ultrasound triggers for a certain line of toys in some of her songs :)
Infrasound only works digitally because no speaker system can reproduce frequencies that low, and many analog systems will corrupt them (e.g. AC coupling). Ultrasound is therefore used most of the time in practice, but I believe infrasound has been used in digital song watermarking for DRM/copyright tracking purposes.
Not really - unless you are doing a 1:1 digital copy, frequencies outside the range of human hearing are often removed.
Frequencies outside the range of a speaker are often filtered out as it can create distortion or even damage. And the job of lossy compression is to remove everything that you can't hear in order to save bytes, and limiting the bandwidth to what you can hear is the most basic step.
Instead, DRM systems typically encode data over a wide range of frequencies (spread spectrum), well within the audible range. It is designed in such a way that you could hear it in theory, but don't notice it because it blends with background noise. It is very robust, resisting compression, recording and even deliberate attacks. In fact, it is one of the techniques used by military radios to resist jamming.
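A toy direct-sequence spread-spectrum watermark, for flavor (illustrative only, not any real DRM scheme): each bit is spread over many samples by a low-amplitude pseudorandom ±1 chip sequence, and recovered by correlating with the same sequence, against which the host audio averages out.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 20000                                  # chips (samples) per bit
chips = rng.choice([-1.0, 1.0], size=N)    # the shared secret sequence
alpha = 0.02                               # watermark amplitude (quiet)

# "The song": an audible tone the watermark hides under
host = 0.5 * np.sin(2 * np.pi * 440 * np.arange(N) / 44100)

def embed(bit):
    return host + alpha * (2 * bit - 1) * chips

def detect(signal):
    # Correlate with the chips: the host is uncorrelated and averages out,
    # leaving roughly +alpha or -alpha depending on the embedded bit.
    return int(np.dot(signal, chips) / N > 0)

print(detect(embed(1)), detect(embed(0)))
```

Because the energy is spread thinly across the whole band, lowpassing or lossy compression degrades the correlation only gradually instead of wiping the data out.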
You can also trigger wake words like "Ok Google" or "Alexa" by using _harmonics_ of normal human voices that are outside the range of audible sound. The key is that the mic can't differentiate between the harmonics and actual speech of normal human pitch and so the trigger is set off, but the sound isn't audible to humans.
I don't have a link to the paper handy (sorry!) but IIRC I found a white paper on ArXiv called "Dolphin Attack" or similar that demonstrated this. It was a fun read.
While technically that sounds like a cool hack, it seems like a dumb thing to actually allow to happen. "Okay Google" or "Hey Alexa" outside of the frequencies reproducible by human voices should be filtered out completely. At least, that's my comfy-chair nitpicking of someone else's work. Of course, not filtering to the acceptable bands allows them to do all of the ad tracking/rights management things they are allowing to occur. The fact that Alexa/Google is able to do this ad tracking/rights management is just another example showing that the mic is listening 100% of the time.
Do you have a link? I assumed a low pass filter was applied to the signal to reduce unnecessary data transmission. I have heard about what you're saying and I'm not dismissing it. One could run their television through a low pass filter to get around such tracking, right?
That's kind of what I was implying with one of my other replies. This filtering is something that Alexa/Googs should be doing on their end. Their mics should only be listening in the frequency ranges where human voices exist.
The upper limit of human hearing is around 20 kHz. Emphasis on the k. It is totally possible to hear 20Hz noise. In fact ~20 kHz is better for this because you can use a higher bit rate, and that band is going to have very little intentional noise to begin with.
Wrong side of the spectrum. The lower limit of human hearing is 20Hz. It is not possible to hear <20Hz noise. It is, however, possible to feel it, if the sound pressure is loud enough and you actually have a subwoofer capable of reproducing those frequencies, but that is rather unlikely unless your subwoofer is a cut-out in your room's wall with a fan in it.
If you feed a 20Hz signal to a typical home subwoofer (or even most club systems) and hear something, you aren't hearing 20Hz. You are hearing a bunch of high-frequency rubbing noises as the speaker cone moves at 20Hz, trying and utterly failing to couple any amount of energy at that frequency into the air. This is why many songs these days are produced with "bass maximizers" and why modern laptops can sometimes have "decent bass". It's not bass; it's a filter that purposely distorts the bass, which your speakers can't reproduce, into higher frequencies, which they can, and which we've learned to associate with heavy bass played through systems that can't reproduce it but distort instead.
Just for reference, I believe these are the subs we use at Euskal Encounter. I can vouch for the fact that they can make the floor shake in a massive event hall venue. Low end response: down to 28Hz. No more.
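That distortion trick is easy to demo (a toy sketch, not any particular product's algorithm): push a low fundamental through a saturating nonlinearity and energy appears at harmonics a small speaker can actually reproduce.

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr                      # one second of signal, 1 Hz FFT bins
low = np.sin(2 * np.pi * 50 * t)            # 50 Hz: inaudible on laptop speakers

enhanced = np.tanh(3.0 * low)               # saturating nonlinearity

# ~amplitude per harmonic (rfft magnitude normalized by N/2)
spectrum = np.abs(np.fft.rfft(enhanced)) / (sr / 2)

# The 3rd and 5th harmonics (150 Hz, 250 Hz) now carry audible energy,
# while the odd-symmetric nonlinearity adds no even harmonics (100 Hz stays empty).
```

Your brain then fills in the "missing fundamental" at 50 Hz from the harmonic spacing.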
It is indeed better to modulate data in ultrasound since you have a lot more bandwidth - except for the fact that any lossy compression applied to your file is going to completely destroy your data. This is one thing the author got absolutely right.
https://www.magneticmag.com/2012/08/the-aphex-face-visualizi...