Great work! For unobfuscated or lightly packed binaries, I'd guess your approach mostly works.
One question: how do you detect when a binary is intentionally made to statically look similar to one binary, while its behavior actually mimics another?
That's a good question. There are Tigress transformations [1,2] that seem highly relevant to this goal, but they're harder to work with because the resulting C code isn't always compilable without errors.
In my work I'm not looking for intentional spoofing, but the obfuscations I do use [3,4,5,6,7] end up building very similar control flow structures for different functions. Maybe that fits the spirit of your question... Let me know if not.
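A toy sketch of how different functions can end up with the same control flow structure. It assumes control-flow flattening as the obfuscation, chosen only for illustration and not necessarily one of [3,4,5,6,7]. The `cfg_shape` fingerprint and the `flatten` helper are hypothetical names made up for this example, not part of any real tool.

```python
from collections import Counter

def cfg_shape(cfg):
    """Structure-only fingerprint: sorted (in-degree, out-degree) per block."""
    indeg = Counter()
    for succs in cfg.values():
        for s in succs:
            indeg[s] += 1
    return tuple(sorted((indeg[b], len(succs)) for b, succs in cfg.items()))

def flatten(cfg):
    """Toy control-flow flattening (an assumption, not a real obfuscator):
    a dispatcher jumps to every block, and every non-terminal block
    jumps back to the dispatcher instead of to its real successor."""
    flat = {"dispatch": sorted(cfg)}
    for block, succs in cfg.items():
        flat[block] = ["dispatch"] if succs else []
    return flat

# Two different functions: straight-line code vs. a loop.
chain = {"a": ["b"], "b": ["c"], "c": []}
loop = {"a": ["b"], "b": ["a", "c"], "c": []}

print(cfg_shape(chain) == cfg_shape(loop))                    # shapes before flattening
print(cfg_shape(flatten(chain)) == cfg_shape(flatten(loop)))  # shapes after flattening
```

Before flattening the two shapes differ. After flattening they match, because the real successor relation has moved out of the graph edges and into the dispatcher's state variable, which a structure-only view does not see.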
So far I'm doing purely static, control-flow-based analysis, but the broader field of reverse engineering includes dynamic/symbolic analysis, where you track values through a running or simulated program. That gives great results but is very costly to run.
I've been focusing on making cheap/static analysis better, so I haven't explored the dynamic/symbolic side at all yet.