Suppose we had an index of snippets: you've parsed them and can search them isomorphically, so that, e.g., variable names are not significant. Some techniques are discussed in [1].
If we then ran that against source repos, we could get update notifications for copypasta'd code.
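To make "search isomorphically" concrete for one language, here is a minimal sketch for Python snippets. It assumes "isomorphic" means equal up to a consistent renaming of variables, parameters and function names, and uses only the standard `ast` and `hashlib` modules. `_Renamer` and `fingerprint` are hypothetical names; attributes, imports and classes are deliberately left alone, and this is not the type-based search described in [1].

```python
import ast
import builtins
import hashlib


class _Renamer(ast.NodeTransformer):
    """Rename variables, parameters and function names to v0, v1, ...
    in order of first appearance. Builtins such as `len` keep their names,
    and attributes, imports and classes are left untouched."""

    def __init__(self) -> None:
        self.mapping: dict[str, str] = {}

    def _canon(self, name: str) -> str:
        if hasattr(builtins, name):
            return name
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        node.name = self._canon(node.name)
        self.generic_visit(node)  # then parameters, body, decorators
        return node

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.arg = self._canon(node.arg)
        self.generic_visit(node)  # the annotation, if any
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        node.id = self._canon(node.id)
        return node


def fingerprint(snippet: str) -> str:
    """Hash of the snippet's syntax tree after renaming. Comments vanish in
    parsing, and ast.dump omits line/column info by default, so formatting
    does not affect the result."""
    tree = _Renamer().visit(ast.parse(snippet))
    canonical = ast.dump(tree, annotate_fields=False)
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Since snippets that differ only in naming hash the same, the index itself could be a plain dict from fingerprint to (post, revision). Keeping builtins fixed avoids treating `len(x)` and `foo(x)` as the same code.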
"In file F at line L, it looks like you used some code from SO at revision R. In revision R', it's been corrected."
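The notification step could look roughly like the sketch below, again assuming Python and matching up to renaming. Instead of hashing whole parse trees, it works on `tokenize` output so it can report the line where a match starts. The snippet text at revision R and the revision labels are assumed to come from somewhere like an SO data dump, and `find_snippet` and `notify` are hypothetical helpers. It is a brute-force scan that renormalizes every window, fine for illustration but not for indexing many repos.

```python
from __future__ import annotations

import builtins
import io
import keyword
import textwrap
import tokenize
from typing import Optional

# Comments and non-logical line breaks carry no meaning for matching: dropped.
_IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.ENCODING, tokenize.ENDMARKER}
# Logical-line and block structure is kept, but its exact whitespace is not.
_LAYOUT = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}


def _tokens(source: str) -> list[tuple[int, str, int]]:
    """Return (token type, text, 1-based line) for each meaningful token."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _IGNORED:
            continue
        text = "" if tok.type in _LAYOUT else tok.string
        out.append((tok.type, text, tok.start[0]))
    return out


def _normalize(window: list[tuple[int, str, int]]) -> list[tuple[int, str]]:
    """Rename user-chosen names to v0, v1, ... in order of first appearance.
    Keywords and builtins (e.g. `len`) keep their names."""
    mapping: dict[str, str] = {}
    out = []
    for typ, text, _line in window:
        if typ == tokenize.NAME and not keyword.iskeyword(text) and not hasattr(builtins, text):
            text = mapping.setdefault(text, f"v{len(mapping)}")
        out.append((typ, text))
    return out


def find_snippet(file_source: str, snippet: str) -> Optional[int]:
    """Return the line where `snippet` first occurs in `file_source` up to
    renaming, or None. Brute force: every window is renormalized."""
    target = _normalize(_tokens(textwrap.dedent(snippet)))
    toks = _tokens(file_source)
    n = len(target)
    for start in range(len(toks) - n + 1):
        if _normalize(toks[start:start + n]) == target:
            return toks[start][2]
    return None


def notify(path: str, file_source: str, snippet_at_r: str,
           rev_r: str, rev_r2: str) -> Optional[str]:
    """Build the update notice if revision R's snippet is found in the file."""
    line = find_snippet(file_source, snippet_at_r)
    if line is None:
        return None
    return (f"In file {path} at line {line}, it looks like you used some code "
            f"from SO at revision {rev_r}. In revision {rev_r2}, it's been corrected.")
```

Keeping NEWLINE/INDENT/DEDENT tokens (but not their whitespace) means code pasted at a different nesting depth still matches, while differently structured blocks do not.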
SO copypasta is better than NPM, because no one can change the code snippet to steal bitcoins once you've copied it into your code base. It's much more secure than a mutable database.
> Just look at the left-pad thing, or the event-stream thing.
Those prove that we could see the problem. Brokenness doesn't go away when you grab a snippet of code or reinvent the wheel; you're simply unaware of how much of it is buggy or broken.
What do you mean by this? As far as I understand, NPM provides access to packages, not snippets, and it doesn't provide a way to search the code in those packages, let alone search it isomorphically.
A lot of npm packages aren't longer than a typical Stack Overflow answer, and they get used everywhere, to the point where installing a dozen packages can lead to tens of thousands of sub-packages being installed.
At that point, the packages are essentially "indexed snippets" of code.
> I qualitatively analyzed the top 50 clones in that list and was able to identify the source (or at least a source) of the snippets in most of the cases.
[1]: https://wiki.haskell.org/Hoogle#Theoretical_Foundations