I found motion detection to be the easy part when building my NVR. I just used trial and error with SciPy filters and eventually found something I'm happy with.
Handwriting a GST pipeline is pretty much what I did. I start with frame differences (I only decode the keyframes, which arrive every few seconds, so motion detection has to work on a single frame to have good response time).
Then I do a greyscale erosion to suppress small bits of noise and prioritize connected regions.
After that I take the average value of all pixels and subtract it, to suppress the noise floor and, hopefully, any globally uniform illumination changes.
Then I square every pixel, to further suppress spread-out, low-intensity background noise, and take the average of those squares.
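Strung together, those steps amount to something like this (a rough sketch, not my actual pipeline code; the 3x3 erosion kernel, float32 math, and function name are just illustrative choices):

```python
import numpy as np
from scipy.ndimage import grey_erosion

def motion_score(prev_frame: np.ndarray, frame: np.ndarray) -> float:
    """Rough motion metric over two consecutive greyscale keyframes."""
    # Absolute frame difference, in floats so nothing wraps or clips
    diff = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32))
    # Greyscale erosion kills isolated noisy pixels and favors
    # connected regions of change
    eroded = grey_erosion(diff, size=(3, 3))
    # Subtracting the global mean suppresses the noise floor and
    # uniform illumination shifts
    centered = eroded - eroded.mean()
    # Squaring emphasizes strong localized changes over broad
    # low-level noise; the mean of squares is the final score
    return float(np.mean(centered ** 2))
```

An unchanged frame pair scores zero, and a frame with a blob of changed pixels scores higher, so a fixed threshold on the score works as the trigger.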
I mostly only run object detection after motion is detected, and I keep a RAM buffer so recordings can include a few seconds from before an event occurs.
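The pre-event RAM buffer can be as simple as a deque trimmed by timestamp. A sketch (the class name, the five-second default, and storing encoded frames as bytes are all just for illustration):

```python
import collections
import time
from typing import Optional

class PreEventBuffer:
    """Keep the last few seconds of encoded frames in RAM so a
    recording triggered by motion includes what led up to the event."""

    def __init__(self, seconds: float = 5.0):
        self.seconds = seconds
        self._frames: collections.deque = collections.deque()

    def push(self, frame_bytes: bytes, timestamp: Optional[float] = None):
        now = timestamp if timestamp is not None else time.monotonic()
        self._frames.append((now, frame_bytes))
        # Drop anything that has aged out of the window
        while self._frames and now - self._frames[0][0] > self.seconds:
            self._frames.popleft()

    def drain(self) -> list:
        """On a motion event, flush the buffered frames into the recording."""
        frames = [f for _, f in self._frames]
        self._frames.clear()
        return frames
```

The detector just calls `push()` on every frame, and the recorder calls `drain()` when motion fires, prepending the result to the new file.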
My CPU object detection is OK, but the publicly available fast, easy-to-run models, and my limited understanding of them, are the weak point. I wound up writing a bunch of sanity-check post-filters, and I'm sure it could be done much better with better models and better pre/post-filtering.
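By sanity-check post-filters I mean simple plausibility tests on each detection before it can trigger an alert. Something along these lines (every threshold and the person aspect-ratio rule here are made-up examples, not my actual values):

```python
def sane_detection(label: str, confidence: float,
                   box: tuple, frame_w: int, frame_h: int) -> bool:
    """Reject obviously bogus detections from a weak model."""
    x, y, w, h = box
    if confidence < 0.4:
        return False          # low-confidence guesses
    if w * h < 0.001 * frame_w * frame_h:
        return False          # implausibly tiny objects
    if w * h > 0.95 * frame_w * frame_h:
        return False          # an "object" covering the frame is usually noise
    if label == "person" and h < w:
        return False          # people are normally taller than wide
    return True
```

Each rule is cheap, and together they cut a lot of the false positives that a small CPU model produces.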
Some of this code is older, from before I was serious about this particular project. Moving to type annotations has been pretty much the big project of the year for everything personal, among other "eliminate everything hacky" projects: going back into ten-year-old code and cleaning up tons of stuff.
My bigger priority has been moving from Mako to Jinja2 (especially for some particularly horrid templates that couldn't be highlighted or formatted, since there aren't many good Mako tools) and adding JSON Schema validation, but I definitely agree type annotations are critical.
VS Code is smart enough to catch a lot of stuff even without annotations, though, so you can get by with a lot of nonsense, especially when half your time is spent fighting GStreamer and you're not paying as much attention to the Python side.
There's nothing better than GST that I've ever seen for handling media without touching the performance-critical parts in your own code, but it is not easy to debug problems buried in autogenerated Python bindings to C code, especially with an extra RPC layer that runs the pipeline in a background process to defend against segfaults.
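That background-process layer is just standard process isolation: the pipeline lives in a child process, so a segfault in the C layer kills only the child, not the web server. A stripped-down sketch of the pattern, with a stub standing in for the real GstPipeline (the command names and queue-based RPC are illustrative, not my actual protocol):

```python
import multiprocessing as mp

def pipeline_worker(cmd_queue, result_queue):
    """Child process: in the real thing this would own the GStreamer
    pipeline. Here a stub echoes commands so only the supervision
    pattern itself is shown."""
    while True:
        cmd = cmd_queue.get()
        if cmd == "stop":
            result_queue.put("stopped")
            break
        # A real worker would turn commands into pipeline state changes
        result_queue.put("ack:" + cmd)

def start_worker():
    """Parent side: spawn the worker and hand back the RPC queues."""
    cmd_q = mp.Queue()
    res_q = mp.Queue()
    proc = mp.Process(target=pipeline_worker, args=(cmd_q, res_q),
                      daemon=True)
    proc.start()
    return proc, cmd_q, res_q
```

The parent sends commands over `cmd_q` and reads replies from `res_q`; if the child dies, the parent notices and restarts it without taking everything else down.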
There's also lots of other weird stuff, like imports not at the top of the file (meant to support systems where some module isn't available), and generally all kinds of cleanup that's slowly happening.
NVR device code (in theory this can be imported and run from a few-line Python script, but it needs some cleanup and I've never tried it outside the web server):
https://github.com/EternityForest/iot_devices.nvr/blob/main/...
GST wrapper utilities it uses, motion detection algorithms at top:
https://github.com/EternityForest/scullery/blob/Master/scull...