Derushing in general is the most time-consuming step, so it calls for not only language pattern recognition but also image recognition: "From the rushes, extract all the sequences with bicycle crashes to give me a pile of clips to use in my edit!"
I film a bunch of skateboarding, and it can take tens of tries to land a trick. Similarly, there's usually a unique sound that signals a trick was finally landed.
Good multi-modal search and discovery is a huge part of cracking the editing problem.
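For what it's worth, here is a rough sketch of the landing-sound idea, assuming the landing shows up as a short, loud transient against a quieter background. That is a big simplification: telling a clean landing from a bail would likely need spectral features or a trained classifier. The function name `find_loud_transients` and all its parameters are made up for illustration, and the audio is assumed to be already decoded into a list of float samples.

```python
import math

def find_loud_transients(samples, sample_rate, window_s=0.05,
                         ratio=4.0, min_gap_s=1.0):
    """Return timestamps (seconds) of windows whose RMS level exceeds
    `ratio` times the median window RMS. Hypothetical helper."""
    win = max(1, int(sample_rate * window_s))

    # Short-time RMS over non-overlapping windows.
    rms = []
    for start in range(0, len(samples) - win + 1, win):
        chunk = samples[start:start + win]
        rms.append(math.sqrt(sum(x * x for x in chunk) / win))
    if not rms:
        return []

    # Use the median level as a crude estimate of background noise.
    baseline = sorted(rms)[len(rms) // 2]
    threshold = max(baseline * ratio, 1e-9)

    hits = []
    for i, level in enumerate(rms):
        t = i * win / sample_rate  # start time of this window
        # Skip hits that follow the previous one too closely.
        if level > threshold and (not hits or t - hits[-1] >= min_gap_s):
            hits.append(t)
    return hits
```

The returned timestamps would only be candidates to jump to in the rushes, not confirmed landings.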
Looks like https://kino.ai addresses that derushing stage, but as a specialized tool rather than as a function inside a video editor - which makes a lot of sense to me.
Tens? It sometimes takes my crew hundreds of tries (all on DV tapes).
How far have you gotten with search for trick variations? It would be interesting to see a system that can reliably recognize what's switch, nollie vs. fakie, etc., and then have it generate a list of all tricks for each skater, and perhaps the outstanding fails. Just some thoughts.