It still astounds me that we haven't solved reusable modules in 2016.
Sure, we have libraries, APIs, package managers, etc., but every time I read a code base, there is always a utility function reinventing the wheel.
Someone wrote it because it is still difficult to discover modular code and reuse it easily.
It's pretty nuts when you think about it. Imagine mechanical engineers having to recreate the same CAD file because they can't find a component with the same functionality (which does happen, but for other reasons, e.g. intellectual property).
But we don't have the same intellectual property challenges, and yet the best tools we have for discovering and reusing code are 90s-style search engines that treat code as raw text, with zero contextual understanding and often outdated snippets.
Haskell has Hoogle[1], which allows you to search for functions using a type signature. This is surprisingly effective.
Let's say you want that `contains` function from the original post. You'd search for `(Eq a) => a -> [a] -> Bool`, which describes a function that takes a value and a list of such values and returns a Bool. The first result[2] is the `elem` function, which is exactly what we wanted!
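To see it in action, here's a quick sketch (the values here are arbitrary; `elem` comes from the Prelude):

```haskell
-- Specialized to lists, elem's type is exactly the query above:
--   elem :: (Eq a) => a -> [a] -> Bool
main :: IO ()
main = do
  print (elem 3 [1, 2, 3 :: Int])  -- True
  print ('z' `elem` "abc")         -- False
```

(In modern GHC the type is generalized to `Foldable t => a -> t a -> Bool`, but Hoogle still finds it from the list-specific query.)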
This is a bit of a contrived example, but I have honestly been surprised by how effective searching by type signature is in Haskell. I wonder if it is possible for a language with a weaker type system, like JavaScript.
Does the order of the parameters matter? It's been a while since I've touched Haskell, so maybe it's obvious that this isn't `[a] -> a -> Bool` for some idiomatic reason. I guess that would be `isContained` instead of `contains`, so it probably wouldn't be the first thing I search for, but there's at least some potential for ambiguity.
It's almost a convention in Haskell. The idea is that, since partial application is easy and common, you put the argument most likely to vary last, so it's the one you leave off in a curried version. So if you wanted to find out if x was in ten lists, you'd just:
map (elem x) [list1, list2, ..., list10]
or, for folds: the function is the one least likely to change, so you put it first, and the list is the most likely to change, so you put it last.
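A sketch of what that buys you with the standard Prelude `foldr` (the nested lists are made up for illustration):

```haskell
-- foldr takes the function first, the starting value next, and the
-- list last. Because the list comes last, a partially applied fold
-- is a reusable function you can map over many lists:
sumEach :: [[Int]] -> [Int]
sumEach = map (foldr (+) 0)

main :: IO ()
main = print (sumEach [[1, 2, 3], [10, 20]])  -- [6,30]
```

Flip the argument order and you'd be forced to write a named lambda at every call site.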
Of course, you can always find some situation where you'd want to curry in a different order, but generally you pick the order that reduces forced named lambda parameters. As far as I can tell, that's how it's done.
Edit: if you saw the sneaky edit, you'll know that this particular convention isn't easy to follow!
As you can see here[1], changing the argument order does not prevent `elem` from showing up in the first position.
If there are multiple close matches, ordering of arguments can change which is first, but the one you're looking for is almost always in the first few.
We've come a lot farther than you may realize. We've come far enough that, having removed a lot of the accidental complexity of importing external modules, we've discovered collectively that the essential complexity is non-zero. Bringing a module into your project is non-trivially expensive. Bringing multiple modules into your project grows in expense super-linearly. (Not "exponentially", but definitely something greater than linear.)
There probably isn't a nirvana just waiting for That One Great Tool. Or, if there is, it probably involves switching to something like a dependently-typed system, or the very bleeding edge of where the Haskell community is right now, which is probably still not quite where it needs to be for this. And that switch would still carry a lot of cost.
Oh, and this is one of those places where I may be justified in pointing out, again, that for all the sound and fury in software development, our field does not actually move that quickly. There are a ton of package managers out there, but broadly speaking they're the same set of features shuffled into various combinations. Nothing wrong with that, per se. Just a lot less innovation than may initially meet the eye.
>We've come a lot farther than you may realize. We've come far enough that, having removed a lot of the accidental complexity of importing external modules, we've discovered collectively that the essential complexity is non-zero. Bringing a module into your project is non-trivially expensive. Bringing multiple modules into your project grows in expense super-linearly. (Not "exponentially", but definitely something greater than linear.)
That's because those modules also have multiple (and competing) dependencies though.
If most modules in all languages depended on the same lower-level modules (with rigidly fixed APIs), that wouldn't be the case.
> but every time I read a code base, there is always a utility function reinventing the wheel.
Sometimes the wheel is invented because the existing wheel isn't built for the same purpose (ex: off-road).
For example:
I recently needed to check a buffer to see if its contents were valid UTF-8 in Node.js. I wanted to do this without running it through something like StringDecoder, because it's an intermediate buffer and I didn't want the overhead of actually converting it, since the conversion wasn't for my use. So I ran `npm install is-utf8` and tried it, only to find it returns false for buffers that are valid UTF-8 but contain non-printable characters.
So the package is-utf8 is really more like "is a printable UTF-8 string". Which is certainly a reasonable choice, and possibly a reasonable default, but it was not what I needed.
This is a big one. External dependencies are a liability, you need to be getting enough value out of them to offset their cost. For small one-off functions it's often better to just rewrite the thing than to pull in a library and have to worry about versioning or the library being abandoned or the API changing or the package manager on your target system not having that library or only having an old version or one that was compiled weirdly and doesn't have the function you need. There can also be interface friction between your code and the library--different syntactic style, or you have to convert the arguments from your native form into something the library can understand and then convert the result back.
Don't get me wrong, libraries are wonderful and useful and you should be using them where appropriate, but like everything they have tradeoffs. I've seen way too many projects that try to just tie every single library they can find together without having to write any code. I know these projects because they're a nightmare to install and keep working and I end up spending way too much time trying to hack around all of the code rot to get them running again.
Libraryism is the sinister flipside to Not-Invented-Here Syndrome, the same kind of conceptual teeter-totter as arguments around thin vs. thick clients.
This is one of the best qualities of languages like Haskell: the type system promotes discovery and true reuse of code. There has even been research into specifying dependencies as "something that implements this signature" rather than as specific packages.
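A rough way to picture "depend on a signature rather than a package" with plain type classes (all names here are hypothetical, just for illustration):

```haskell
-- Hypothetical interface: consumers depend only on these operations,
-- not on any concrete key-value store package.
class KeyValueStore s where
  put :: String -> String -> s -> s
  get :: String -> s -> Maybe String

-- Any provider implementing the signature can satisfy the dependency.
newtype AssocStore = AssocStore [(String, String)]

instance KeyValueStore AssocStore where
  put k v (AssocStore kvs) = AssocStore ((k, v) : kvs)
  get k (AssocStore kvs)   = lookup k kvs

main :: IO ()
main = print (get "lang" (put "lang" "Haskell" (AssocStore [])))  -- Just "Haskell"
```

GHC's Backpack work explores something similar at the package level, with module signatures instead of type classes.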
Yeah, Haskell is amazing. I'm still really looking forward to the day when somebody writes something actually useful in Haskell. (Oh, but pandoc, xmonad, ShellCheck... in how many years?) ;-)
It gets dafter: every time someone tries to solve the problem, they make it worse. See the 200 versions of the Visual C++ redistributable you end up with installed.
I'm tempted to say developers shouldn't have been able to target specific DLL versions, but that would cause different problems.
But we have this in software. On the JVM, Maven repos are essentially an enormous repository of existing modules that you can pull into your project with a one-line addition to your build file. This covers everything from small utility libraries for manipulating strings or dates (like the one you allude to) to entire ecosystems like web containers, and everything in between.
I don't understand what part of this doesn't qualify as a "reusable module" to you.
- How do you search for a function other than by description or name? That assumes the developer classified it properly, and that you even happen to use the same domain language as the author.
- When multiple similar modules are returned, how do you compare and select efficiently, without wasting hours reading the code, evaluating it, checking whether it is still maintained, and hoping there are no hidden bugs?
It is easy to share code. Package managers even facilitate updating code. But it is still a nightmare to find/discover code and reuse it.
It's like building a Lego set. You know the building block you want, but if you have to spend hours looking for it, you give up.
I believe competent developers write shitty code not because they are lazy or inexperienced but because the cost of writing great code is very high.
Hoogle helps for Haskell projects, as you can search by type, and the types are expressive enough that this actually finds what you're looking for most of the time.
Neil Mitchell, the author of Hoogle, has some idiosyncratic ideas about what to search over in Hoogle. If you want a more expansive search, give Hayoo a try.
> - How do you search for a function other than by description or name.
You were asking about modules, and now you're down to the function level. That's a much easier problem to solve. Hoogle, Javadoc, documentation, and good index engines solve it easily.
Note that type alone is pretty ambiguous and most of the time useless without additional metadata. Types tell you what; they don't tell you why or how.
> - When multiple modules are returned, similar packages, how do you compare and select efficiently without wasting hours reading the code, evaluating it, checking if it is still maintained and hoping there are no hidden bugs etc.
At some point, the human brain has to step in and make a call. You can't expect an automated tool to dictate to you the code you need to write.
A random thought: the findings of linting and static-analysis tools for code smells might be correlated with bad code, in which case a tool that showed you that package A has huge monolithic functions and poor test coverage while package B is modular, simple, and well tested might be a good way to decide between modules that do the same thing.
True. Common functionality (regex engines, math functions, talking to Postgres, etc.) should be written once and consumed through higher-level interfaces from all other languages.
The problem is we don't have a language that is safe, easy to build and package alongside everything else, and easy to consume (without a performance penalty). The best we have is C, which is not enough.
I think that's part of the problem he's talking about. Why do new modules have to be made for a new framework? The framework should play nicely with standard modules written in vanilla style for that specific language. This way we're not starting over whenever a new framework is made.
Those should delegate all the implementation to libs that work across languages (e.g. in C) and just provide a JS API for them. But they don't work that way, except for a very small number...
It's not necessarily a problem. It actually cuts both ways.
The code is not going to benefit from updates, but it's also not going to be harmed by them; think API or behavior changes. Additionally, if the use case for a dependency changes, sometimes the "new" solution doesn't match well.
For example:
I work with JScript in classic ASP sometimes. A JavaScript library changed such that, instead of iteratively processing nested items (pushing an item onto an array, then looping, popping the item off and processing it, then repeating), it switched to Node's nextTick, with logic along the lines of "we want to move to nested function calls for processing, and direct recursion would blow the stack, so we'll use nextTick so the stack doesn't grow". Well, JScript in ASP doesn't have nextTick or any equivalent timer. So while the original code worked flawlessly, the authors' use case moved away from ours, and we had pulled that code in as an external dependency.
That's obviously an extreme case, but I can't count the number of times an API change in an npm package has meant modifying code, without any change in functionality.
So yes, you get updates, and sometimes those are going to be security updates and real bug fixes. Other times, though, that update is going to be adding new functionality (and possibly new attack vectors), or dropping support for your use case, etc.
Like many things in our field, it's wisest to look at the risks in all cases, evaluate them for the specific situation at hand, and then choose the appropriate one, rather than cargo-culting one 'best practice'.
Stackoverflow is an invaluable resource when you are a newbie or learning a new stack.
That said, the real value of StackOverflow is not only the code but the surrounding explanation and the comments that discuss the merits and demerits of each solution. You get to read multiple solutions to the same problem and see why the top answers stand out from the others. In a way, you learn to smell good code and bad :)
So searching on StackOverflow can be a learning experience!
But these kinds of developments might be the next step, and who are we to say?
One problem I foresee is that the highly rated solution could be for a specific version of the stack/software (the latest stable one, or an obsolete, unmaintained one), in which case you might end up with an error, amplifying the mess.
I had a conversation with a coworker a few days ago where I jokingly suggested building an AI system that you give a failing test suite and it uses stackoverflow answers to make your tests pass. Sounds like we are one step closer to making that happen.
That actually sounds realistic. A well defined test suite is probably a good target for AI. Main issue is no partial wins, which are needed for training, so you'd need to break all functionality down into absolutely minimal units.
Just rerun it with a lower target code complexity. You'll know you set it too low when it starts monkey-patching the test framework to avoid actually doing anything.
This can be avoided by optimizing for the simplest code that works. Checking many cases in multiple lines is always going to be longer and more complex than just a single operation on a single line.
Which is how AIs avoid overfitting in general: by penalizing complexity.
Rather than indexing StackOverflow, why not do this by indexing all the open source code out there? Sure, StackOverflow answers are good, but they usually skip error checking to focus on the question at hand. Code used in production applications is surely more sound.
Code quality is context-sensitive. There's a lot of crap in the Linux kernel that makes sense in context but would be terrible in any other context.
Come up with a general purpose algorithm for evaluating the quality of code and we can get somewhere. AFAIK that's the whole point of being a developer: you can compare the code intelligently.
A good start would be for GitHub to host a package containing the latest version of each repo. However, even they may not have that right (it depends on the TOS).
Perhaps instead of using StackOverflow "correct" answers, someone could create a site that has examples of anti-patterns and code smells, which you could then use to analyze the code on GitHub and link to possible alternatives.
Have the code editor report back to the cloud the number of times a certain snippet was selected/used, and just rank the drop-down/autocomplete options accordingly.
I guess we as programmers should seriously think about the future of our craft.
If, for example, we feel we are basically using the same building blocks over and over again, we should seriously think about curating libraries, code snippets, and the questions asking for such snippets in a structured way, and about providing ways to transpile one solution into different languages, etc.
We should not accept the current state of our craft as final and rather think about how to improve in general.
If, for instance, something like StackOverflow has become the Wikipedia of code, then let's think hard about how to make it into a full-blown tool, with all the features and semantics we need. It was a nice project the way it started and grew, but it doesn't have to stay like that forever!
This isn't working at all for me. Either I don't get what it's supposed to do, or the common functions I'm typing aren't common enough. On a related note, 50 StackOverflow points seems like a high threshold for something like this, and may reduce the results enough that the example doesn't work for most common problems.
Thanks - you're absolutely right. I've been meaning to rebuild the index with a lower threshold. Right now I think there are only something like 140k fragments in there. I also plan to add a "score" slider so you can choose the quality of your autocompletions.
If it's a joke, ha ha; if not, it's just depressing. SO is a valuable resource for comparing and understanding concepts. Let's not encourage this copy/paste culture.
The OS/SDK/Browser/Protocol/Firmware which you used to type this comment is a compiled copy/paste of a zillion lines of code, of which you probably typed none.
If we encourage the DRY paradigm throughout the development cycle, copy/pasting a function is just a further reach of that paradigm. Now, whether or not a programmer decides to study and understand the pasted code has nothing to do with the quality of the final product.
Furthermore, IDEs by design encourage copy/pasting to speed development, most offer functionality to store code snippets. In that sense SO is like a web extension of IDEs.
It may be bug-free, but it may not be the correct solution for the specific problem the programmer was trying to solve. It may also not do what is expected under different circumstances.
It may be a hammer when you're looking for a screwdriver.
> If we encourage the DRY paradigm in the whole development cycle, copy/pasting a function is just a further reach of such paradigm.
No, it's the opposite. The whole point of DRY is that you don't copy/paste: you put the thing in a common place and have references to it.
Many SO answers are good enough to be libraries. But they should be libraries, not snippets that are copied and pasted. Fortunately we're moving in that direction - our library infrastructure is becoming good enough that it's worth releasing even small pieces of code as reusable libraries, and people are.
I'm not sure how linking (as part of compiling) works. I thought it worked by copying the compiled thing from other files into the same executable. Is that not the case?
I disagree that they encourage copying and pasting; what they really encourage is auto-completion of well-structured code.
Even if you find yourself copying and pasting your -own- code, you should question why you're having to do that.
Is what you're doing perhaps better abstracted to somewhere else? Be that in an abstract class, a helper class, or even in a separate library from which you can reuse it.
IDEs excel at understanding your code, which helps you find where you've stored something and allows auto-completing access to it; that is almost certainly their most important strength.
This is like the car mechanic who goes to the used-parts dealer (the junkyard) and grabs the first part with the same size screw holes as the broken part.
You're not expected to invent the recipe for bread though - you're allowed to talk to people and find the best recipe, and if it's acceptable quality then you use that.
EDIT: Unless your job is to invent a better bread recipe.
Being able to search and use online code examples is a key skill in software development. The web has become the collective knowledge base for all of us.
Boss never wants a function, boss wants a solution to a problem, and your job is to know how to express that in functions. If you have the signature, 99% of your job is done.
I know you know, I know they know, and I know you know they know. I just wanted to say it, ok?
Haha, I love this. We're getting closer and closer to the ultimate conclusion of our industry: where we'll just type in a few keywords and generate full applications from StackOverflow code examples!
This is close to something it really would do, and nearly as dire. It's fixed in recent versions, but Siri used to not understand emergencies. If you said "Siri, call me an ambulance," it would come back with "OK, I'll call you 'an ambulance' from now on."
This is exactly the direction we should be going. We build the world's most sophisticated engines for predicting human social behaviour, why are we stuck in the 1990s when it comes to autocomplete for a tightly scoped domain like writing software?
I did something similar one night a few months back, using Python as a Sublime plugin, but it scrapes the live site. It eventually gets blocked though, as it doesn't use the official API.
It's not obvious, but because of the way it's programmed, you have to remove the example function, since it parses the entire code block for similarities.
Cool idea, but kind of ironic that the example you used produces unnecessarily complicated stack overflow solutions. A JavaScript "contains" function is as simple as:
var contains = function (needle, haystack) { return haystack.indexOf(needle) !== -1 }
IE8 and older, to be specific[0]. But if you're still supporting IE8, you have bigger problems at hand.
Microsoft stopped supporting any version lower than IE11 as of January 12th of this year[1]. So unless all of those older IE machines are accessing an intranet (and only an intranet) their risk of being hit with malware is way higher than average.
For my clients I do a cursory IE9+ review, but if anything lower is a requirement then it's an additional fee to make a site work with it.
In a few months it'll be IE11+ and the same fee structure will apply to anything lower (I'm just waiting for IE10 to wind down a bit).
What I do is provide NW.js to those on old versions, so they think they are getting the "application" version, but in reality it's Chromium hard-coded to only go to the web application.
So far it's a win for both parties, since they are not installing another browser that can interfere with some silly IE6-only internal enterprise site, and we get to use the latest browser features.
Believe me, I understand how comments like that get on one's nerves. But civil discourse requires those of us with nerves to tolerate the irritation enough not to toss it back. The system has no steady state: the irritation either grows or we dampen it.
A great way to increase your liability in an automated fashion!
Sure, the code on Stack Overflow is licensed as MIT... but what assurance is there that the code which was posted is original property that the poster owns copyright to? What assurance is there that the posters won't claim patents on the methods being used?
The risk is low, but it's certainly not 0. Big companies caution their software developers to not even read code in SO answers to avoid lawsuits... how much risk do you bring to your company by copying and pasting (not to mention autocompleting) from SO?
It's fortunately not _that_ easy to get a software patent. It'd be quite difficult to patent something that's the length of an average stackoverflow answer. There's also a lower limit on the length (and originality) for copyright which I doubt many answers reach.
I can fit the concept of mp3 decoding and a sample decoder in a few dozen lines of text and code. This is a patented technology (for another year or two, at least).
falcolas might not be able to fit a full MP3 encoder and decoder in those few lines, but falcolas can probably come up with enough in them to violate the MP3 patents.
There was recently talk of changing it, and most of the feedback was very negative. You can see the initial post here [1] and the follow-up post here [2].
I can't really see this being useful, if I'm honest. Here you are simply showing code suggestions based on what the user is writing, completely ignoring both the context of the code being written and the context of the code being suggested.
Those are key pieces of functionality for this to actually be usable. Without them, you simply get hundreds or thousands of suggestions that are not related to what you are coding. And if by chance the suggested code is exactly what you want, the variable names won't even match. That's all assuming the suggested code is complete and functional.
No, but I realize the demo doesn't make it very clear. It inspects the structure of the code up to the cursor position, based on the syntax tree, along with nearby variable and function names, and matches to similar constructs from SO. It could be greatly improved, but I haven't had a lot of time lately.
It's funny because I thought "you can't just copy and paste SO, there's a number of things to think of to be able to do it correctly". Guess I'm not the only one who thought that.
Anyway, most intellectual work is just applying known recipes (copy-paste, renaming a few variables), so it's just a matter of granularity and sources. The hate for SO copy-pasters owes a lot to the few beginners who don't understand that yet and take copy-paste literally.
This post fills me with dread. It's come to the point during code review that I have to make sure the shitty devs aren't copying the bad answers from StackOverflow... I hope they don't find this.
PS: I love your project btw.