It's theoretically plausible that the noise has some structure, and with some bayesian priors, one can classify some of the variation as "probably noise" vs "probably signal". Of course it works best if you know what the signal is before you start, but if that's the case why are you trying to experimentally construct it? :-)
This is a common, subtle problem with statistical denoising (aka pattern recognition, aka lossy compression). Commonly the source data and noise models are synthetic formulas (a simple sine wave or Gaussian), and knowledge of the data-generation procedure can infect the supposed algorithm. (a subtle form of training your model on your test dataset)
> but if that's the case why are you trying to experimentally construct it?
People are able to interpret speech under very high levels of noise due to having strong priors / being able to guess well / having shared state with the speaker.
Of course you should take advantage of whatever priors you have about the signal; we know rather trivially from information theory that information gain / transmission rate is maximized when the listener knows as much as possible about the source. This is also key for compressed sensing.
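For a concrete (if toy) instance of priors buying you fewer measurements, here is a compressed-sensing-style sketch in numpy: orthogonal matching pursuit recovering a 3-sparse length-128 signal from only 60 random projections. All sizes and values here are invented for illustration; real compressed sensing uses more careful sensing matrices and solvers.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, k = 128, 60, 3                 # signal length, measurements, sparsity
x = np.zeros(n)
support = rng.choice(n, k, replace=False)
x[support] = [8.0, -6.0, 7.0]        # a 3-sparse signal (the prior: sparsity)

A = rng.normal(size=(m, n)) / np.sqrt(m)   # random sensing matrix
y = A @ x                                  # only m = 60 measurements of a length-128 signal

# Orthogonal matching pursuit: greedily pick the column most correlated
# with the residual, then re-fit the coefficients on the chosen support.
residual, chosen = y.copy(), []
for _ in range(k):
    chosen.append(int(np.argmax(np.abs(A.T @ residual))))
    coef, *_ = np.linalg.lstsq(A[:, chosen], y, rcond=None)
    residual = y - A[:, chosen] @ coef

x_hat = np.zeros(n)
x_hat[chosen] = coef
```

Without the sparsity prior, 60 linear measurements of a 128-dimensional signal would be hopelessly underdetermined; the prior is what makes the reconstruction well-posed.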
You see analogues of this pattern everywhere in the world: e.g. people learn most quickly when they have some context about the problem space already.
The real question is why the sender would produce a signal with any level of predictability in it, rather than a signal that purely maximizes the effective bit rate of transmission (e.g. minimize redundancy modulo noise). But it's easy to see that physical constraints (e.g. how the human voice works) are at least partly responsible for the enforcement of regularity, at least for many signals that we're interested in in practice.
Yes, and also sometimes the redundancy is structurally unavoidable (e.g. if I make a public statement, inevitably some people will have more context about my mental state than others and understand my message better).
English, like all other natural languages, is very redundant. We can't change it to remove all (unnecessary) redundancy, since languages need to be both speakable and understandable by most possible pairs of communicators, and we are constrained by pesky little facts like the physiology of our mouths and vocal cords.
So in a very real sense, you probably can beat the sampling theorem in practice for most real-life signals that we care about, as long as you have the priors and the compute needed to apply them.
But that's not really beating the sampling theorem.
Sampling is a self-contained symbol-transmission technique that is absolutely signal-agnostic. You can throw anything at it, including band-limited noise with no redundancy. As long as there's nothing in the signal above N/2, it works.
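To see the signal-agnostic guarantee concretely, here is a small numpy sketch (frequencies and window length are arbitrary choices): sample at fs, reconstruct by Whittaker-Shannon sinc interpolation, and the reconstruction is accurate no matter what the band-limited signal contains.

```python
import numpy as np

fs = 100.0                        # sample rate; theorem covers any content below fs/2
t_s = np.arange(0, 4, 1 / fs)     # sample instants over a 4 s window

# An arbitrary band-limited signal: the reconstruction is agnostic to
# what's in it, as long as every component sits below fs/2 = 50 Hz.
def sig(t):
    return np.sin(2 * np.pi * 13.7 * t) + 0.5 * np.cos(2 * np.pi * 31.2 * t)

samples = sig(t_s)

# Whittaker-Shannon interpolation: x(t) = sum_n x[n] * sinc(fs*t - n).
# Evaluate away from the edges, since we only have a finite window of samples.
idx = np.arange(len(samples))
t_eval = np.linspace(1.8, 2.2, 41)
recon = np.array([np.sum(samples * np.sinc(fs * tt - idx)) for tt in t_eval])

max_err = np.max(np.abs(recon - sig(t_eval)))
```

The small residual error here comes only from truncating the (ideally infinite) sinc sum to a finite window, not from the sampling itself.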
As soon as you include redundancy and priors, you're solving a different problem. Your channel is no longer signal-agnostic, and you can use known information external to the signal to reconstruct it. The advantage is you can use less data, but the disadvantage is that your reconstruction system has to make strong assumptions about the nature of the signal.
If those assumptions are incorrect, reconstruction fails.
Technically you could argue that the N/2 limit is a form of prior, and sampling is a special instance of a more general theory of channel transmission systems where assumptions are made.
The practical difference is that N/2 filtering followed by sampling at N is relatively trivial with a usefully general result. More complex systems can be more powerful for specific applications, but are more brittle and can be harder to construct.
It depends on what you mean by 'beating' the theorem, doesn't it?
If you mean that the theorem is actually wrong, then I agree: the proof works and you can't actually avoid its conclusions given its assumptions.
But I think we can agree that for many (most?) practical symbol-systems in their particular contexts, the signals are actually highly redundant as viewed against a particular basis, so slavishly applying the conclusions of the sampling theorem will cause you to miss the possibility of side-stepping its assumptions entirely.
Whether you're really in a position to take advantage of the latent priors in the signal very much depends on the tools at your disposal: obviously you will not be extracting speech with smart priors if all you have is analog filters.
> People are able to interpret speech under very high levels of noise due to having strong priors / being able to guess well / having shared state with the speaker.
In the information-theoretic sense, this just means there is enough redundancy in the signal that you can error-correct the noise. If you’re in a noisy room talking to a coworker, your shared context and priors are really redundant bits of information. If she says something about “nachos” but you couldn’t tell whether she said “nachos”, “tacos”, or “broncos”, the nachos next to you add redundancy to the signal.
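The redundancy-as-error-correction point in miniature, with a toy repetition code (all rates invented for illustration): repeat each bit five times, flip 10% of transmitted bits, and a majority vote at the receiver corrects almost all of the channel noise.

```python
import numpy as np

rng = np.random.default_rng(1)

bits = rng.integers(0, 2, size=200)      # the message
coded = np.repeat(bits, 5)               # redundancy: a 5x repetition code

# Noisy channel: each transmitted bit flips with 10% probability
flips = rng.random(coded.size) < 0.10
received = coded ^ flips

# The receiver exploits the redundancy to error-correct: majority vote
decoded = (received.reshape(-1, 5).sum(axis=1) >= 3).astype(bits.dtype)

symbol_errors = int(np.sum(decoded != bits))
```

Shared context works the same way: the "nachos next to you" are extra parity bits that weren't transmitted over the acoustic channel at all.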
So in that context, what you’re saying is that we should take advantage of any redundancy we can find in an incoming signal.
The problem is that there is no real way to estimate the redundancy of a real signal of a specific form, or, more importantly, to discern a likely alias from a likely true signal when the signal's statistics are unknown.
The statistical approach might work for simple clean signals but makes critical mistakes in case of complex ones.
Specifically you get to estimate true phase and true magnitude in a given subband...
That's not unlike what you might do to recover a high-frequency signal by using a band-pass filter and then sampling that (which is typically done with a mixer in a traditional analog system). If you know what the alias is, you can reconstruct your signal. The point, though, is that you can't sample signals of a higher bandwidth without aliasing. If you know in advance that the only alias is at a given frequency, then you don't need to filter. The purpose of the filter is to ensure that the bandwidth of the sampled signal is correct given that you know nothing about the frequency content of the original signal.

The sampling theorem is pretty specific and its proof is solid. So while it might be true that you could use a single pixel to tell the difference between an apple and a banana without filtering the image before sampling, it doesn't really relate to the sampling theorem at that point.
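A toy numpy illustration of that undersampling trick (all frequencies made up): a 113 Hz tone sampled at 100 Hz produces samples identical to a 13 Hz tone, and only the prior that the signal lives in the band [fs, 1.5·fs] lets you unfold the alias.

```python
import numpy as np

fs = 100.0
N = 256
t = np.arange(N) / fs

f_true = 113.0                       # above fs/2, so plain sampling aliases it
x = np.sin(2 * np.pi * f_true * t)

# The samples are identical to those of a 13 Hz tone, since 113 = 13 + fs
alias = np.sin(2 * np.pi * (f_true - fs) * t)

# With the prior that the signal lives in [fs, 1.5*fs], the fold is
# invertible: measure the apparent (aliased) frequency and add fs back.
spectrum = np.abs(np.fft.rfft(x * np.hanning(N)))
f_alias = np.fft.rfftfreq(N, 1 / fs)[np.argmax(spectrum)]
f_recovered = f_alias + fs
```

Without that band prior, the 13 Hz interpretation and the 113 Hz interpretation are indistinguishable from the samples alone, which is exactly why the anti-aliasing filter exists.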
“For example we know that the neural recordings of interest are a superposition of pulse-like events – the action potentials – whose pulse shapes are described by just a few parameters. Under that statistical model for the signal one can develop algorithms that optimally estimate the spike times from noisy samples. This is a lively area of investigation.”
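A minimal sketch of that idea (not the paper's actual algorithm, just a generic matched filter with an invented pulse shape and rates): if the pulse shape is known, cross-correlating the noisy trace with the template turns spike-time estimation into a peak search.

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 20_000.0                          # sampling rate, Hz (invented)
t = np.arange(0, 0.05, 1 / fs)         # 50 ms trace

# Known pulse shape: a toy biphasic spike template, ~4 ms long
tau = np.arange(-0.002, 0.002, 1 / fs)
template = np.exp(-(tau / 5e-4) ** 2) - 0.4 * np.exp(-((tau - 1e-3) / 1e-3) ** 2)

true_spike = 0.023                     # seconds: the spike time to recover
x = np.zeros_like(t)
i0 = int(true_spike * fs) - len(template) // 2
x[i0:i0 + len(template)] += template   # embed the pulse in the trace
x += 0.2 * rng.normal(size=t.size)     # additive noise

# Matched filter: correlate the noisy trace with the known template.
# The prior (the known pulse shape) is what makes the peak stand out.
score = np.correlate(x, template, mode='same')
est_spike = t[np.argmax(score)]
```

Real spike-sorting pipelines fit several template parameters and handle overlapping spikes, but the prior-driven structure is the same.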
You can try to estimate the interference at 50Hz or 60Hz from the mains current in the wall. It's usually better to add some additional cables to ground and shielding to reduce it as much as possible. But if everything fails, it's a nice stable signal that's easy to subtract. [Protip: if your "signal" is too nice, check that its frequency is not 50Hz or 60Hz.]
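A sketch of that subtraction, assuming the mains frequency is known exactly (amplitudes, phases, and rates invented): fit the 50 Hz amplitude and phase by least squares against a sin/cos pair, then subtract the fitted hum.

```python
import numpy as np

rng = np.random.default_rng(3)
fs = 1000.0
t = np.arange(0, 2, 1 / fs)

signal = 0.5 * np.sin(2 * np.pi * 3.0 * t)         # the recording we care about
hum = 1.2 * np.sin(2 * np.pi * 50.0 * t + 0.7)     # mains pickup
x = signal + hum + 0.05 * rng.normal(size=t.size)  # what the ADC sees

# The mains frequency is known, so fit its amplitude and phase by least
# squares: project onto sin/cos at 50 Hz, then subtract the fitted hum.
basis = np.column_stack([np.sin(2 * np.pi * 50 * t),
                         np.cos(2 * np.pi * 50 * t)])
coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
cleaned = x - basis @ coef
```

This works precisely because the hum is a stable narrowband signal at a known frequency; thermal noise offers no such handle.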
But the article under discussion is about a paper that tries to calculate and subtract the thermal noise, which is not nice at all.
As you say, filtering mains pollution is completely orthogonal to the stated problem. You would need to do that even if you used an anti-aliasing filter, since those are typically low-pass filters, not notch filters.