Nice, simple explanation of this concept. +1000 for using IPython notebook, though the left-right layout of the cells is a little hard to follow. Shamelessly, I prefer my own layout based on pelican-bootstrap: http://kastnerkyle.github.io/blog/2014/02/15/polyphase-signa... - but that moving swarm logo is pretty great!
However, I have to say that a high-pass filter is typically NOT the right answer for real datasets. (In this case it is basically a brickwall applied using an FFT? That might not be a great idea due to "ringing" in the time domain, though it works here...) Think of detecting outliers on a square wave, or a pinched sine wave, or any "non-pure" tone, really... Non-Gaussian noise/interference is also way more common than you would be led to believe :). Onset detection is a complex problem, but an easy-to-understand solution is a tunable lag-lead moving average filter.
One small window calculates a moving average, while a larger window calculates a moving average over a longer stretch of the same signal. For "noise" or typical data, the two averages should be similar, provided both windows encompass more than a few cycles of the signal. When the small-window average exceeds the large-window average by some margin, an event has just occurred. The two averages should then settle to about the same value for the duration of the event (in this case, there is not really much duration). Once the small-window average falls below the large-window average by some margin, the event is over. There are better approaches from music information retrieval which use phase information as well...
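For anyone who wants to play with it, here is a rough sketch of the two-window scheme in Python with NumPy. The function names (`trailing_mean`, `detect_events`), the parameter names, and the toy signal are all made up for illustration, and the margins are hand-picked for this one example rather than recommended values. I am also assuming trailing (causal) windows and a plain difference of averages. Seismology codes that do this under the name STA/LTA usually work on a ratio of averaged energy instead, so treat this as a starting point only.

```python
import numpy as np

def trailing_mean(x, window):
    """Mean of the most recent `window` samples at each index.
    Near the start, where fewer samples exist, it averages what is available."""
    csum = np.concatenate(([0.0], np.cumsum(x)))
    idx = np.arange(1, len(x) + 1)
    start = np.maximum(idx - window, 0)
    return (csum[idx] - csum[start]) / (idx - start)

def detect_events(x, short_win, long_win, on_margin, off_margin):
    """Return a list of (start, end) sample indices.
    An event starts when short_avg - long_avg >  on_margin
    and ends when        short_avg - long_avg < -off_margin."""
    x = np.asarray(x, dtype=float)
    diff = trailing_mean(x, short_win) - trailing_mean(x, long_win)
    events, start = [], None
    for i, d in enumerate(diff):
        if start is None and d > on_margin:
            start = i                      # small window jumped above the large one
        elif start is not None and d < -off_margin:
            events.append((start, i))      # small window dropped back below
            start = None
    if start is not None:                  # event still open at end of data
        events.append((start, len(x)))
    return events

# Toy signal (an assumption for the demo): a sine with a 10-sample period
# and a single spike of height 50 at sample 200.
n = np.arange(400)
signal = np.sin(2 * np.pi * n / 10)
signal[200] += 50.0

# Both windows span whole cycles of the sine (2 and 10 cycles).
events = detect_events(signal, short_win=20, long_win=100,
                       on_margin=1.0, off_margin=0.2)
print(events)  # [(200, 220)]
```

Note that the reported end of the event is simply the point where the spike leaves the small window, so the small window size sets the shortest duration this sketch can report.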
The hard part is tuning the "above and below" thresholds and the two window sizes to the signal you wish to catch - this approach is used as a simple alternative to wavelets for earthquake detection.
Great illustration, and if this technique works for your data, that is awesome! Unfortunately, I have never found a dataset with such an easy solution for outlier detection...
It's a shame that the linked article, which uses the new IPython blog-authoring features, isn't identified as such. It's a nice example of IPython's interactive authoring abilities that mix normal text with technical content and graphics created using the IPython web interface.