Suppose we had an index of snippets, meaning you've parsed them and are able to ...

eterm · on Dec 4, 2019

We essentially have that, they're stored in NPM, and it's horrible.

It turns out that when you can package snippets, you use so many that you can't possibly keep track of and audit them all.

Just look at the Left-pad thing, or the event-stream thing.

earthboundkid · on Dec 4, 2019

SO copypasta is better than NPM, because no one can change the code snippet to steal bitcoins once you've copied it into your code base. It's much more secure than a mutable database.

ben509 · on Dec 6, 2019

> Just look at the Left-pad thing, or the event-stream thing.

Those prove that we could see the problem. Brokenness doesn't go away when you grab a snippet of code or reinvent the wheel; you're simply unaware of how much of it is buggy or broken.

spuz · on Dec 4, 2019

What do you mean by this? As far as I understand, NPM provides access to packages, not snippets, and as far as I know it doesn't provide a way to search the code in those packages, let alone isomorphically.

eterm · on Dec 4, 2019

A lot of npm packages aren't longer than a typical Stack Overflow answer, and they get used everywhere, to the point where installing a dozen packages can lead to tens of thousands of sub-packages being installed.

At that point, the packages are essentially "indexed snippets" of code.

3fe9a03ccd14ca5 · on Dec 4, 2019

There’s going to be a massive number of false positives:

“I see you used ‘for i in...’ and that copies this SO question about iteration...”

ben509 · on Dec 6, 2019

Agreed, you'd definitely need a mechanism to mitigate false positives.

One technique would be to try and define what constitutes "trivial" code.

Another would be to prioritize sources. Documentation from standard or major third party libraries should take precedence over SO.

Another would be a feedback mechanism. If repo authors vote a particular snippet up or down, after a threshold it could be excluded from matching.

Or you could opt in by means of a comment, though this might make it useless.
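A minimal sketch of how the first three ideas could combine, purely as an illustration. Everything named here is an assumption: the `Candidate` record, the source names in `SOURCE_PRIORITY`, the `MIN_TOKENS` and `DOWNVOTE_LIMIT` cutoffs, and the crude regex token count standing in for a real definition of "trivial".

```python
import re
from dataclasses import dataclass

# Hypothetical ranking of sources: a lower number takes precedence.
SOURCE_PRIORITY = {"stdlib_docs": 0, "library_docs": 1, "stackoverflow": 2}

MIN_TOKENS = 12      # assumed cutoff: shorter snippets count as "trivial"
DOWNVOTE_LIMIT = 3   # assumed: net votes of -3 or lower exclude a snippet


@dataclass
class Candidate:
    source: str      # where the indexed snippet came from
    snippet: str     # the snippet's source text
    votes: int = 0   # net feedback from repo authors (up minus down)


def token_count(code: str) -> int:
    """Crude size measure: count identifiers/numbers and punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", code))


def best_match(candidates):
    """Drop trivial or voted-down candidates, then prefer the best source."""
    kept = [
        c for c in candidates
        if token_count(c.snippet) >= MIN_TOKENS and c.votes > -DOWNVOTE_LIMIT
    ]
    if not kept:
        return None
    # Unknown sources rank after every known one.
    return min(kept, key=lambda c: SOURCE_PRIORITY.get(c.source, len(SOURCE_PRIORITY)))


# Usage: a bare loop is filtered out; a longer helper is attributed to docs over SO.
loop = "for i in items:\n    print(i)\n"
pad = 'def left_pad(s, width, fill=" "):\n    return fill * max(0, width - len(s)) + s\n'

no_match = best_match([Candidate("stackoverflow", loop)])
preferred = best_match([Candidate("stackoverflow", pad), Candidate("library_docs", pad)])
```

A token count is only a stand-in for "trivial"; a real system would probably need something smarter, such as how common the snippet's structure is across many repos.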

Shog9 · on Dec 4, 2019

There has been a bit of research on this[1]:

> I qualitatively analyzed the top 50 clones in that list and was able to identify the source (or at least a source) of the snippets in most of the cases.

[1]: https://meta.stackoverflow.com/questions/375761/how-to-handl...