The fact that a library discourages you from learning its internals is how you know it's well done. If you needed to understand it, you could. The reason you don't is because you've never had a compelling reason to.
If only one person knows how a library works, that is a problem. If a 100-person team maintains it and you're not on that team... it's probably because you have other stuff to do.
Software is an engineering discipline. Computer science is a science. If you are in programming because you want to advance our understanding, great, go work in one of the many fields with large novel algorithms that need to be understood.
For a typical programmer... Look around. Modern software is one of the best things about the modern world. It does SO much for us. Do you really think we, with a real world distribution of programmer abilities, could do all this without massive division of labor? How much would be missing if we insisted on understanding everything before we use it?
I suspect very few alternative approaches to software would work as well to truly build the modern software defined world where everything from cars to taxes to photography is digital.
Because... whenever someone tries an alternative approach, there usually seems to be a hidden, unspoken assumption that they actually don't want software to be as big as it is.
The end product of software itself these days is a service (in the sense of this article, not the SaaS sense), whereas software built with an understanding-first mindset seems to usually wind up being closer to a digital version of a passive analog tool.
> The fact that a library discourages you from learning its internals is how you know it's well done. If you needed to understand it, you could. The reason you don't is because you've never had a compelling reason to.
I don't think we need to protect people from learning internals, they do that just fine on their own. I know many situations where we failed to understand internals even in the presence of compelling reason.
With apologies for continuing to quote myself:
"I want to carve steps into the wall of the learning process. Most programs today yield insight only after days or weeks of unrewarded effort. I want an hour of reward for an hour (or three) of effort." -- http://akkartik.name/about
I’ve been coming around to the conclusion that some coding patterns, especially overuse of delegation and mutation of inputs, make code hard to learn.
I talk occasionally about viewing your code from the debugger, but that is kind of hand-wavy. I wonder if there’s a ‘linter’ one could write that looks at coverage or trace reports and complains about bad patterns.
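Purely as a sketch of the kind of 'linter' I mean (the trace format, thresholds, and function names here are invented, not any real tool's output): read a trace report with one semicolon-separated call chain per line and complain about hot paths that delegate too deeply.

    # Hypothetical sketch: flag delegation-heavy paths in a trace report.
    # Assumes one call chain per line, frames separated by semicolons
    # ("main;handler;service;repo;mapper;..."), which is a made-up format.
    from collections import Counter

    MAX_DEPTH = 12        # past this, a reader has to hold too many hops in mind
    MIN_OCCURRENCES = 50  # only complain about paths that are actually hot

    def lint_trace(path):
        deep_paths = Counter()
        with open(path) as f:
            for line in f:
                frames = line.strip().split(";")
                if len(frames) > MAX_DEPTH:
                    # remember the prefix where the chain starts getting deep
                    deep_paths[";".join(frames[:MAX_DEPTH])] += 1
        for prefix, count in deep_paths.most_common():
            if count >= MIN_OCCURRENCES:
                print(f"{count}x: call chains deeper than {MAX_DEPTH} under {prefix}")

    if __name__ == "__main__":
        lint_trace("trace.txt")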
Yes, lately the Butler Lampson quote echoes round my head: "All problems in computer science can be solved by another level of indirection"
The problem is, we have been adding layers of indirection for 80 years, and each is leaky. So now it's very difficult to do basic stuff, and people are ok with software thick with abstractions.
The next stage should be removing unnecessary layers of indirection, as, like you said, things are much easier to understand and maintain that way.
"All problems in computer science can be solved by another level of indirection, except for the problem of too many layers of indirection." -- David Wheeler
It's very easy to do basic stuff. It's hard to do basic stuff (or anything) well, since you end up having to solve problems that should have been solved by layers you're building upon.
We haven't been monotonically stacking up layers. In fact, sometimes we collapse layers into monoliths. Sometimes we rebuild layers and make them bigger.
But we don't have 900 layers or anything. Performance is... usually pretty darn good, except for unnecessary disk writes.
It's trivial to do basic stuff because of those layers. What's hard is doing low level things from scratch in a way that doesn't conflict. But at the same time, there's less and less need for that.
I think things are way easier to maintain with a bunch of layers than the old school ravioli code. Most layers are specifically meant to make it easier to maintain, or they just formalize a layer that was already effectively there, but was built into another layer in an ad hoc manner.
As I've spent more time with flame graphs I realize they are, once you get down to brass tacks, the wrong tool for perf analysis because it's the width of the flame that tells you the most, not the height, and we usually worry about height when thinking about actual flames.
However there are all sorts of little subtle costs in your system that aren't captured by most of these tools due to lack of resolution (I haven't spent a lot of time with Intel's hardware solution) and asynchronous costs like memory defragmentation. Depth and frequency of calls are a useful proxy for figuring out what rocks to look under next after you've exhausted the first dozen. For this reason the flame graph is a useful fiction so I don't poo-poo them where anyone can hear. I can barely get people to look at perf data as it is.
But then I think how I'm turned off by some fictions and avoid certain fields, like parser writing, and wonder if a more accurate model would get more people to engage.
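For what it's worth, the "width, not height" point is easy to compute directly from folded/collapsed stack samples (one "frameA;frameB;frameC count" per line); this little script is a made-up sketch rather than any real tool.

    # Rank frames by "width": total samples whose stacks contain the frame,
    # regardless of how deep it sits in the stack.
    import sys
    from collections import Counter

    def frame_widths(lines):
        widths = Counter()
        for line in lines:
            stack, _, count = line.strip().rpartition(" ")
            if not stack or not count.isdigit():
                continue
            samples = int(count)
            for frame in set(stack.split(";")):  # count each frame once per stack
                widths[frame] += samples
        return widths

    if __name__ == "__main__":
        for frame, samples in frame_widths(sys.stdin).most_common(25):
            print(f"{samples:10d}  {frame}")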
The difficult part about having opinions about the industry of software engineering is that it is, somehow, a largely enjoyable job.
But those two things are never going to be perfectly aligned. Enjoyment of the process does not guarantee working results that people want to pay for, and the thing that people want to pay for may be deeply unenjoyable. You can choose to tackle a subset of the problem space that happens to be both enjoyable and worthwhile, but you ought to admit that you are looking at a subset of the problem space.
I do a lot of other things with my time that are also near the confluence of enjoyable and marketable. I sing, for instance, and choir rehearsal and individual vocal practice are both a lot of fun. But any professional musician will tell you that if you want to be world-class at it, you occasionally have to wring the fun out of it. I've chosen the other option: I'm a volunteer singer with my church choir. I sing because it is fun, and only insofar as it is fun. We do new and not-very-difficult music each week, and an hour of effort is inherently an hour of reward. If my choir director said we're going to spend six months drilling some difficult measures and perfecting our German pronunciation, we'd all leave.
(In that sense, a volunteer church choir is rather like hobbyist open-source development. If it produces an enjoyable result for the public, great, and we do find that rewarding. But there is an upper bound on just how hard we're going to try and on the complexity of what we commit to doing.)
If you want an hour of reward after an hour of programming effort, that's fine! But the employment contract I've signed promises a year of payment after a year of effort. Occasionally, I want to work on things that are immediately rewarding because that helps with motivation. And it's important that such work is available. But I have a job that ultimately demands specific products, some of which just aren't going to be immediately rewarding, some of which do actually take a lengthy learning process, and - most importantly - many of which require building on top of other people's work that I could understand if I wanted but there is no business value in doing so up front.
(Incidentally, in the other direction, there are cases where I want to tackle problems that promise years of reward only after multiple years of effort, and figuring out how to get time to work on those problems is actually hard, too.)
We share almost all our code internally and leverage lots of open-source code so that we all have the option of understanding the code if we need it - but we rely on division of labor so that we have the option of building something on top of the existing base today, if we need that, which is more often what we need.
If you want to work on a hobbyist UNIX-like kernel for enjoyment's sake, great, I'm genuinely happy for you. But my servers at work are going to keep running a kernel with millions of lines of code I've never read, because I need those servers to work now.
Thanks, I largely agree with that. I just don't think we're getting a good deal for society with a year of effort from a paid programmer. When I said "reward" I meant understanding, not enjoyment. We programmers are privileged enough, I'm not advocating to optimize for our fun as well.
Separation of concerns is a must for ordinary/mediocre programmers to take part in building complex software. Getting these guys involved is getting a good deal for society.
I probably should be more clear: I don't think we should make it hard to learn how they work, or do anything to stop people, and we should probably even make it easier. But at the same time, we should make it unnecessary, so we can get stuff done with maximum reuse and minimum monkey patching and custom stuff.
Parent talks about how fast you get back results from effort (learning, fiddling, etc) spent:
"Most programs today yield insight only after days or weeks of unrewarded effort. I want an hour of reward for an hour (or three) of effort."
So they want to be rewarded with at least an hour of saved time, or more productive time, for one (or few) hours they put in, instead of spending weeks to reap rewards.
I was just alluding to a subjective sense that you got something reasonable for your trouble and efforts. You could do something, or you learned something, or you were entertained. My argument doesn't depend too much on the precise meaning, I think.
"Division of labor is an extremely mature state for a society. Aiming prematurely for it is counterproductive. Rather than try to imitate more mature domains, start from scratch and see what this domain ends up needing."
This is more the gist of the article than "let's not have libraries or abstractions at all"; it's pointing more towards "let's defer those questions until the right time rather than from the jump".
I think Enterprise Java and the insane complexity of modern web dev are indicative of the consequences.
>Because... whenever someone tries an alternative approach, there usually seems to be a hidden, unspoken assumption that they actually don't want software to be as big as it is.
I really don't think that OP is making this assumption.
People’s opinions on this stuff contain a lot of unstated pain points. Monolith first argues against a coping mechanism people use because they hate Big Ball of Mud projects.
Anyone who has ever done repair work can tell you how often people wait until it’s far too late to do basic maintenance. Software is similarly bimodal. People either run it till it falls apart, or they baby it like a collector’s item.
We have not collectively learned the trick of stopping at the first signs of pain or “strange noises” and figuring out what’s going on. But mostly we have not learned the trick of insisting to the business that we are doing our job when we do this, instead of plowing on blindly for six to eighteen months until there’s a break in the schedule. By which time everything’s a mess and two of the people you hoped would help you clean it up have already quit.
I read too much commit history, including my own, to agree with that. We are too close to the problem, and we enjoy deep dives too much.
I save myself a lot of frustrating work by stopping for a cup of coffee or while brushing my teeth or washing my hair. “If I change Z I don’t need to touch X or Y”
I mean, I do the exact same (I did it Friday), but I don't think this is counter to my point. It's not because I don't want to fix something so I avoid it, it's because fixing it puts me behind on other tasks (that come down from people writing my paycheck), so I avoid tasks that won't be clearly rewarded unless I can convince them otherwise.
It's because time pressures are designed in a way that incentivizes you not to do maintenance when you see a mess. It's also not sexy on a resume that you altered something from a mess to not a mess vs implementing some new feature.
It takes highly detail oriented and creative people to develop good software and those traits tend to drive one crazy to the point of fixing something when you see a mess. Given no constraints I bet most developers would clean up their code to the best of their ability and fix things as they come across issues they identify. I've been in these no constraint environments, usually on stuff I write myself for myself, and I don't mind going back and doing a significant refactor when it's clearly needed. Once I'm done, I feel genuinely satisfied that I've done something useful and productive because I only need to convince myself and because no real external financial pressures exist in this context.
I actually think this applies to the passive analog world too. Very few engineers could tell you how steel or concrete are made. And we all use abstractions / libraries, for instance using tabulated properties of steel alloys rather than knowing the underlying physics.
In fact, if put in plainer terms, every engineer would nod in agreement with Hyrum's Law. Everybody has a story of a design that worked because of an undocumented property of a material, that stopped working when Purchasing changed to a new supplier of that material.
The poster child for this was storage of liquid radioactive waste in barrels of clay cat litter.
Some bright-eyed purchasing agent substituted bark-pellet cat litter, because it was ecological or something -- anyway, not cheaper -- resulting in need for a cleanup so expensive that how much it cost is classified.
Someone substituted inorganic cat litter with an organic cat litter (specifically, wheat husks), it shut down WIPP for three years at a cost of half a billion dollars, and it's not classified: https://cen.acs.org/articles/95/i20/wrong-cat-litter-took-do...
On the contrary. Every engineer is able to learn the process when it matters. And it often does. The type of steel used, the treatment, and the orientation are important to the outcome. You cannot hand-wave them away as "implementation details".
I don't see this as a disagreement with the GP. You say they (all engineers) can learn it; GP says they (few engineers) could tell you how it's done. Those are in perfect agreement. At any moment in time, there is likely no engineer who can explain fully every process used by their domain or object created by others in their domain. That doesn't mean that the majority of those engineers couldn't learn any arbitrary process or the details of an arbitrary object in their domain.
Of course not. A basic understanding? Absolutely, yes.
It's even widely accepted to be an important part of the education.
Just as Karnaugh maps and parser theory are part of computer science engineering curriculums. It's not something that's expected to be used on a daily basis, but some general knowledge of how a processor works is absolutely necessary to be able to at least talk about cache line misses, performance counters, mode swaps etc.
One issue is that the divisions between engineering fields are somewhat arbitrary, and technology doesn't always respect those divisions, so we don't know in advance what education is going to be needed. A second problem is that we make it very hard for young engineers to maintain their technical chops when they can be quite busy and productive doing basic design. In fact, engineering students hear through the grapevine: "You won't use this stuff once you get a job."
As a result, the industry settles on an "efficient" way of managing issues that require depth or breadth, which has to have a few people around who handle those things when needed. That becomes a form of division-of-labor.
This is true, but perhaps a bit of a tangent to a parable.
I was merely reacting to the statement that most construction engineers wouldn't know "how concrete is made". Most of them could tell you a thing or two about it; it's even in the curriculum. They are even expected to know about different preparations.
The idea that specialization doesn't exist is a bit of a straw man argument and not something anyone seems to argue.
>If you're minimizing challenge for your software engineers, you're making worse engineers over time
Or you're solving your current problems in the most efficient means possible, e.g. engineering. Minimizing one challenge frees mental processing power to worry about bigger issues. I couldn't care less how the memory paging in my OS works. I care about building features that users find valuable.
Such a needlessly extreme example. Sometimes it might mean just working on the backend after doing nothing but frontend for your career; sometimes it might mean not assigning a feature of a certain complexity to the person you know will likely grasp it instantly, and instead assigning it to someone you're less sure about but who wants to do more. It can be just as much about figuring out and cultivating the potential of your workforce, and that can create greater efficiency over time. There are certainly opportunities where this can be accomplished, because if everything is mission critical, the business itself is doing something wrong. The amount of opportunity and its nature will vary across businesses, and perhaps an engineer may need to go elsewhere to seek out further challenge. Your statement that "minimizing one challenge frees mental processing power to worry about bigger issues" presumes a level of uniformity.
On the contrary, by not minimizing the challenges you are left with less productive engineers. The idea that tools etc. should be written internally when they already exist is such a Byzantine way of looking at things. The only challenge an engineer should face is the problem they’re trying to solve. If you’re building a SaaS you shouldn’t be worrying about rebuilding all the tools a project needs to reach completion.
Yeah, but this isn't the Olympics. If the same people are producing better quality work with less stress... Sounds like you're doing something right even if they are technically not learning certain things that aren't relevant.
I have to agree. The article quotes sound right and wonderful, but thinking about the environment in which I work, there’s not really a practical way to combine our efforts. We simply wouldn’t get things done in time.
Division of labor ensures that changes can happen reliably and quickly. I’m ramping up on an internal library right now and it’s taking a while to understand its complexities and history. I can’t imagine making quick changes to this without breaking a lot of stuff. It will take time to develop an intuition for predicting a change’s impact.
Now multiply my ramp up time by every dev on my team every time they need to make an update. You can imagine productivity slows down and cognitive overhead rises.
> Do you really think we, with a real world distribution of programmer abilities, could do all this without massive division of labor? How much would be missing if we insisted on understanding everything before we use it?
I agree that you don't want to spend time becoming super familiar with everything that you use, but you should ALWAYS have a high-level idea of what's happening under the hood. When you don't, it inevitably leads to garbage in, garbage out.
Way before the arrival of personal computers, it was already clear to me that I would be highly disappointed if average programmer abilities declined half as far as they have by now.
Some of the most coherent technical projects are the result of a single individual's outstanding vision.
Occasionally, the project can be brought to completion by that one individual, in ways that simply could not be exceeded alternatively, all the way from fundamental algorithms through coding to UX and design.
Additional engineers could be considered useful or essential simply to accelerate the completion or launch.
When the second engineer is brought on board, it may be equally performant to have them come fully up to speed on everything the first engineer is doing, compared to having them concentrate on disparate but complementary needs which need to be brought to the table in addition.
If they both remain fully versed, then either one can do anything that comes up. Sometimes they can work on the same thing together when needed, other times doing completely different things to make progress in two areas simultaneously.
You're going to end up with the same product either way; the second engineer just allows an earlier launch date. But truly never within half the calendar time; those man-months are legendarily mythical.
For projects beyond a certain size more engineers are simply essential.
Then ask yourself what kind of team would you rather have?
All engineers who could contribute to any part of the whole enchilada from primordial logic through unexceeded UI/UX?
Or at the opposite end of the spectrum, all engineers who are so specialized that the majority of them have no involvement whatsoever with things like UI/UX?
Assuming you're delivering as satisfactory a product as can be done by the same number of engineers in the same time frame.
You're NOT going to end up with the same product either way.
Now aren't you glad it's a spectrum so it gives you an infinitely greater number of choices other than the optimum?
The programmer ability decline is probably just because it's easy now. There are still amazing programmers, it's just that a "mediocre programmer" was hard to find before, when getting ANYTHING done took a ton of skill.
As long as the number of excellent programmers is steady or increasing, I'm not too disappointed if there's a bunch of average ones too, as long as those average ones have great tools that let them still make good software.
It does seem like some of the very top innovative projects are done by one person.
I have very little direct experience with software small enough for an individual to understand, but it seems like a lot of our modern mega-apps are elaborations on one really great coder's innovation from the 80s, and real knock your socks off innovation only happens every few years.
Engineers who don't understand or care about UX can be a really bad problem; attempting to bolt a UI onto something meant to be a command-line suite is usually highly leaky.
The opposite seems to be slightly less of a problem, up to a point: almost nobody writes sorting algorithms, and writing your own database is usually just needless incompatibility.
I definitely am glad it's a spectrum, because having absolutely zero idea about the real context gets you in trouble, and stuff like media codecs is hard enough that we'd lose half the world's devs (including me) if you needed to understand them to use them.
Few library and service creators seem to think you can treat what they built as a black box. At the very least, they generally come with some advice on usage patterns to avoid.
Writing performant code unfortunately tends to require at least having a basic working model of how all your dependencies work. Otherwise, you tend to find yourself debugging why you ran out of file handles in production.
Most (useful) libraries are managing state. This means the user of the library should have a pretty good idea of how the state is managed or they will call functions in the wrong order.
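To make that concrete, here's a toy sketch (the library and its API are entirely hypothetical) of how hidden state punishes callers who don't have a model of it:

    # Hypothetical library that manages state: a file handle it opens on your
    # behalf. Call its functions in the wrong order and it breaks; forget the
    # cleanup call and handles leak.
    class Client:
        def __init__(self, path):
            self.path = path
            self._fh = None

        def connect(self):
            self._fh = open(self.path)   # quietly acquires a file handle

        def query(self, keyword):
            if self._fh is None:
                raise RuntimeError("query() called before connect()")
            return [line for line in self._fh if keyword in line]

        def close(self):
            if self._fh is not None:
                self._fh.close()         # without this, handles leak
                self._fh = None

    # A caller who doesn't know a handle is held open will happily create
    # thousands of Clients and never call close() -- which is roughly how
    # "why did we run out of file handles in production?" happens.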
Well, I think that’s because most open-source licenses remove any liability from the creators and have no support outside of volunteers. For this reason a lot of them (and even closed-source projects, to add an extra layer before support) provide basic troubleshooting information.
Programming seems to be one of the few fields where we think it’s a bad thing if our tools aren’t made by us (this has luckily been waning in recent years, but the pushback against “bloat” we see on every tech forum is proof that the mindset is entrenched).
UPDATE: It seems the good folks at a certain subreddit found this comment and are interpreting "Discourage you from learning about it" as "Make it copyrighted" or obfuscation or something. I don't think those are signs of a good library!
I'm referring more to things like FOSS JS frameworks with the kind of batteries-included design that prioritizes abstraction and encapsulation over simplicity. Nothing actually stops you from learning them; it just takes time, because they're big, and it's not necessary in order to use them.
The best counter-argument was given by Alfred North Whitehead back in 1912:
"By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and in effect increases the mental power of the race.
It is a profoundly erroneous truism, repeated by all copy-books and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them. Operations of thought are like cavalry charges in a battle — they are strictly limited in number, they require fresh horses, and must only be made at decisive moments."
OP was me annoyingly quoting myself, and here's my refutation annoyingly quoting OP:
"Software libraries suck. Here's why, in a sentence: they promise to be abstractions, but they end up becoming services. An abstraction frees you from thinking about its internals every time you use it. A service allows you to never learn its internals."
"Confusingly, I've been calling Iverson's notion of subordination of detail 'abstraction', and Iverson's notion of abstraction 'service' or 'division of labor'. Though lately I try to avoid the term "abstraction" entirely. That seems on the right track. Regardless of terminology, this is a critical distinction."
As I hope these clarify, I have nothing against true abstractions as you describe them.
>Operations of thought are like cavalry charges in a battle — they are strictly limited in number, they require fresh horses, and must only be made at decisive moments.
I love this analogy. People who aren't accustomed to doing long-form, extended, in-depth thought about a single problem don't seem to understand this: that your attention and mental effort are a finite resource, one that can be frivolously wasted or put to good use. It requires real concerted effort, and it isn't easy. Thus, why most people never really think about anything.
You can’t do much carpentry with your bare hands and you can’t do much thinking with your bare brain. (i.e. using appropriate Thinking Tools and Intuition Pumps) As Daniel Dennett says, "we are productive now because we have downloaded more apps to our necktops (i.e. brains)"
Basically, the human brain works in chunks for short- and medium-term memory; this is well established by psychological studies. Likely the same applies to long-term recall and processing, since it's a neural network and the "chunks" are physically encoded as neurons and synapse connections that symbolically represent them.
If you had to deconstruct those chunks to first mathematical and physical principles, we'd never get anywhere. It would be like Zeno's paradoxes: each step forward is a seemingly infinite descent into detail.
This screams “toy projects” to me. Separation of concerns is a hard-won insight. We call code without this “spaghetti” because it eventually becomes nearly impossible to trace or reason about. Even if you have all your engineers working all over the whole codebase, the codebase itself still needs separation of concerns to simplify reasoning about behavior. This is considered a good engineering practice because the alternative is that you need to hold the whole system in your head to understand any behaviors. I’ve seen codebases like this. It’s not pretty. Use of abstractions to simplify cognitive overhead is not a weakness.
The idea that projects should take source copies instead of library dependencies is just kind of nuts, at least for large libraries. Yes, the proliferation of tiny libraries is obnoxious and adds a ton of pointless overhead. But for large libraries, taking a source copy is absolutely crazy. Imagine taking a source clone of Chromium into your project because you need a rendering engine. Yes, you can probably cut a bunch of code out as a result, but you will also now need to deal with security concerns and incorporation of new features until the end of time. And this is now way more expensive and you can’t just do a straightforward merge because you changed the code. 5 line package? Sure, just copy the code in (assuming licensing is in the clear). 500 dev-year codebase? Cloning that code is by far not a win.
The thing I tell people when I’m critiquing their tests is that nobody gives a shit about your tests until they break, at which point they’re interacting with your code when they’re already in a bad mood.
They will associate that feeling with you. So do you want to make their bad day worse by writing abstruse code? Or should your tests be simple?
I know I’m “done” with a test when I know why it’s red from what’s in the console. I shouldn’t even have to look in the source code. Until then, my tests are a work in progress.
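As a toy illustration (the function and numbers are made up), this is the kind of failure message I mean; when it goes red, the console already tells you what to check:

    # Hypothetical example of a test that explains its own failure.
    def apply_discount(price, percent):
        return round(price * (1 - percent / 100), 2)

    def test_discount_is_applied_to_the_listed_price():
        got = apply_discount(price=200.00, percent=15)
        expected = 170.00
        assert got == expected, (
            f"15% off 200.00 should be {expected}, got {got}; "
            "check that percent is treated as a percentage, not a fraction"
        )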
"The idea that projects should take source copies instead of library dependencies is just kind of nuts..."
The idea that projects should take copies seems about symmetric to me with taking pointers. Call by value vs call by reference. We just haven't had 50 years of tooling to support copies. Where would we be by now if we had devoted equal resources to both branches?
"...at least for large libraries."
How are these large libraries going for ya? Log4j wasn't exactly a shining example of the human race at its best. We're trying to run before we can walk.
> Absolutely. I'm arguing for separating just concerns, without entangling them with considerations of people.
The separation of people is largely an artifact of scale. At some point it doesn’t make sense to have everyone touching the whole code base. It’s untenable to have, e.g., everyone who works on Windows be an expert in every part of the code. It’s also generally unnecessary because in a large code base features are often nearly completely contained to one area of code.
Some teams are really dysfunctional and won’t let people ever do work outside “their” area. This is absolutely an antipattern.
> The idea that projects should take copies seems about symmetric to me with taking pointers. Call by value vs call by reference. We just haven't had 50 years of tooling to support copies. Where would we be by now if we had devoted equal resources to both branches?
We have probably invested dev-millennia into managing copies. This is exactly what source control does. This is not a new area of investment. Merging is a giant pain in the ass and very possibly always will be. Accepting merge pain better come with some huge benefits.
> How are these large libraries going for ya? Log4j wasn't exactly a shining example of the human race at its best. We're trying to run before we can walk.
Can you clarify what you see as the alternative? Implementing everything from scratch seems absurd and so costly that there’s no point in considering this an actual option. Plus there’s no particular reason to believe that you wouldn’t create your own security bugs. Taking a source drop wouldn’t protect you if you happened to take one with the bug. And you’re forever bound to that version unless you want to pay a very high cost to reintegrate a new one. What viable option do you see that I don’t? (Picking on log4j right now is easy but are you seriously implementing your own SSH, for example?)
For the record, these large libraries are going great. I’ve built numerous products that simply would not have ever been created without, say, SQL. The cost would simply have been too high. I’ve built others with libraries that the project outgrew, but it was still a big win to start with the library.
> For the record, these large libraries are going great. I’ve built numerous products that simply would not have ever been created without, say, SQL. The cost would simply have been too high. I’ve built others with libraries that the project outgrew, but it was still a big win to start with the library.
We definitely have great libraries out there. But they're the exception rather than the rule. I consider Redis and Sqlite in particular to be quite exemplary.
SQL I'm more ambivalent about. Or rather, SQL the abstract ideal is great, but I'm more ambivalent about implementations like MySQL or PostgreSQL. As I've hopefully made clear already, I consider just raw speedup for an initial implementation of an app as a poor and incomplete metric for the quality of a library. At best it's a good quality for something you can use in prototypes, not the final version you build after you throw the prototype away. (Because we all agree you should throw the prototype away, right? :) For non-prototypes you also have to take into account the total cost of ownership. I think the ever expanding wavefront of "specialization" in roles like DBAs is a sign that MySQL and PostgreSQL are a bad trade for society. People being deployed to fill in for all the numerous gotchas arising from a crappy interface. MySQL and PostgreSQL make things more efficient in the short term, but they make us less resilient in the long term.
> I think the ever expanding wavefront of "specialization" in roles like DBAs is a sign that MySQL and PostgreSQL are a bad trade for society.
That’s a sign of a maturing field. Does the proliferation of medical specialties indicate a problem? I would argue that it simply arises from the amount of specialized knowledge that one can possess and utilize efficiently.
The existence of SQL implementations allows for faster and more maintainable implementations. The set of engineers who could do what SQL allows without it is quite small. If SQL disappeared tomorrow, a thousand equivalent implementations would appear tomorrow because it delivers real value. Hand-rolling queries in a general-purpose programming language is a very poor trade-off. I’ve seen this sort of stuff before. It’s always a bug farm and every other engineer asks “why the hell didn’t you use SQL”?
Give it enough time and you’ll see “Redis expert” show up as a specialty.
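To make the "why didn't you use SQL" point concrete, here's a tiny sketch using Python's bundled sqlite3 (the table and data are invented): an aggregation that is one declarative query here, but a reliable bug farm when hand-rolled with ad hoc grouping and sorting.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 30.0), ("bob", 12.5), ("alice", 7.5), ("carol", 99.0)],
    )

    # Total spend per customer, biggest spender first.
    rows = conn.execute(
        "SELECT customer, SUM(amount) AS total "
        "FROM orders GROUP BY customer ORDER BY total DESC"
    ).fetchall()

    for customer, total in rows:
        print(f"{customer}: {total:.2f}")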
But I don't know enough to have an opinion on medicine. I find it useful in my experience. I wasn't saying the connection between my observation and conclusion is ironclad in all situations. But I do hold the heuristic opinion in the one case I mentioned.
> The set of engineers who could do what SQL allows without it is quite small. If SQL disappeared tomorrow, a thousand equivalent implementations would appear tomorrow because it delivers real value.
How much of this argument is about idealized SQL and how much about MySQL or its equivalents? I was careful to draw a distinction. SQL the language/mindset is incredibly valuable. EF Codd thoroughly deserves his Turing Award.
You don't need to persuade me that MySQL lets us do things faster. What would really persuade me is an argument that it helps us do the right things. I think it makes it easy to scale past our level of competence.
*Shrug*. When I go to the doctor, I love finding someone who does nothing but the one thing I need. Maybe this person is a generally “less useful” doctor, but if I break my ACL I want the surgeon who does nothing but ACLs all day every day to work on mine.
> I think it makes it easy to scale past our level of competence.
This sounds like an amazing win. This is frankly what software is supposed to do.
> Give it enough time and you’ll see “Redis expert” show up as a specialty.
No opinion on what happens to Redis after Antirez left it (http://antirez.com/news/133). But I have high hopes that the versions he created will be fairly timeless and usable even a decade from now.
I wasn’t aware he stepped back. I don’t follow it closely because I don’t use it directly, though I have been very impressed by the way it’s managed to become so ubiquitous so rapidly.
Absolutely this. There's a wide range of programming activities that are domain-agnostic, but most of what programmers do requires deeper knowledge in a specific domain to be useful.
I question the need for scale in 90% of the places where the tech industry has cargo-culted it. Clearly I'm still failing to articulate this. Perhaps https://news.ycombinator.com/item?id=30019146#30040616 will help triangulate on what I mean.
> Can you clarify what you see as the alternative? Implementing everything from scratch seems absurd and so costly that there’s no point in considering this an actual option.
Not using, reimplementing and copying are the closest thing to solutions I have right now. You're right that they're not applicable to most people in their current context. I have a day job in tech and have to deal with some cognitive dissonance every day between my day job and my open source research. The one thing I have found valuable to take to my scale-obsessed tech job is to constantly be suspicious of dependencies and constantly ask if the operational burdens justify some new feature. Just switching mindset that way from software as asset to software as liability has, I'd like to believe, helped my org's decision-making.
> We have probably invested dev-millennia into managing copies. This is exactly what source control does. This is not a new area of investment. Merging is a giant pain in the ass and very possibly always will be. Accepting merge pain better come with some huge benefits.
Not all copying is the same. We've learned to copy the letter 'e' so well in our writings that we don't even think about it. In this context, even if I made a tool to make copying easier and merges more reliable, that would just cause people to take on more dependencies which defeats the whole point of understanding dependencies. So tooling would be counter-productive in that direction. The direction I want to focus on is: how can we help people understand the software they've copied into their applications? _That_ is the place where I want tooling to focus. Copying is just an implementation detail, a first, imperfect, heuristic coping mechanism for going from the world we have today to the world I want to move to that has 1000x more forks and 1000x more eyeballs looking at source code. You can see some (very toy) efforts in this direction at https://github.com/akkartik/teliva
> It’s untenable to have, e.g., everyone who works on Windows be an expert in every part of the code.
It's frustrating to say one thing in response to counter-argument A and have someone then bring up counter-argument B because I didn't talk about it right there in the response to counter-argument A. I think this is what Plato was talking about when he ranted about the problems with the newfangled technology of writing: https://newlearningonline.com/literacies/chapter-1/socrates-.... I'm not saying everyone needs to be an expert in everything. I'm saying software should reduce the pressure on people to be experts so that we can late-bind experts to domains. Not every software sub-system should need expertise at the scale at which it is used in every possible context. My Linux laptop doesn't need to be optimized to the hilt the way Google's server farms do. Using the same scheduling algo or whatever in my laptop imposes real costs on my ability to understand my computer, without giving me the benefits Google gets from the algo.
> I question the need for scale in 90% of the places where the tech industry has cargo-culted it.
Certainly our industry does a lot of premature scaling. Sometimes this is the best bet, because it can be so difficult to scale later. But certainly sometimes it’s a huge waste and creates unnecessary complexity.
> Not using, reimplementing and copying are the closest thing to solutions I have right now. You're right that they're not applicable to most people in their current context.
Yeah. This again makes sense if you don’t really need the library. If you’re only using some tiny piece of it.
Otherwise this looks a lot like Not Invented Here. Reimplementing huge pieces of functionality is rarely a win. Copying can be better but is again rarely a win. It works for cases like Chromium where a massive team was created to maintain the fork, but in general forking code comes with huge cost.
Reimplementing and forking are generally not value adds. Building software is today about creating/adding value.
Certainly it pays to be suspicious of new dependencies and ask if they are necessary. I’ve seen unnecessary dependencies cause significant pain. I have also seen variants of not invented here waste years.
> I'm saying software should reduce the pressure on people to be experts so that we can late-bind experts to domains. Not every software sub-system should need expertise at the scale at which it is used in every possible context.
I don’t understand this criticism. If I write some code, I’m realistically the expert. The person who code reviewed for me is likely the next most knowledgeable. It’s not about wanting to silo expertise. It’s about the reality that certain people will understand certain subsystems better because they’ve worked there. If I need to get something done, I’m going to first try to get the experts to do it, not because someone who typically works in another area can’t, but because it will probably take the “outsider” twice as long. It’s not a statement of what’s right or wrong, just what’s efficient. Certainly if knowledge becomes too silo’d, it’s a massive risk. But some siloing can be significantly more efficient, at least on large or complex systems.
> Reimplementing and forking are generally not value adds. Building software is today about creating/adding value.
Check out the writings of Carlota Perez[1]. She distinguishes between two phases in adoption of a technology: installation and deployment. Analogously, reimplementing adds value because you learn to make something better or faster or simpler. And you learn how to make it and can now take that knowledge elsewhere and use it in unanticipated ways.
It blew my mind a few years ago to learn that the Romans had knights: https://en.wikipedia.org/wiki/Cataphract. WTF, I thought they were a medieval thing. It turns out everyone in the Ancient world understood the idea that an armored person on a horse can outfight someone on foot. What was missing in Ancient Rome was the ability to teach people at scale how to fight on horseback. As a result, most cataphracts would dismount before battle. It took literally a thousand years for society to figure out how to educate people at scale on mounted combat. You don't really know how to do something just by building it once, and you don't unlock all its economic potential just by building it once. Societies have to build it over and over again, thousands of times, to really understand its value. And they gain benefits throughout the process.
> Analogously, reimplementing adds value because you learn to make something better or faster or simpler. And you learn how to make it and can now take that knowledge elsewhere and use it in unanticipated ways.
You know, there’s some truth in this. At the same time, there is only truth to the extent that the person reimplementing is actually striving to do better. I would bet that well over 99% of reimplementations are strictly worse than the thing they are mimicking.
I spent a bunch of time implementing a custom Windows framework similar to WTL. It was a great learning experience. I would also say that in no way would it have been useful to anyone or fit to appear in an actual product. In most ways it was doubtless inferior to WTL. I used it for toy projects and that’s what it was.
> If I write some code, I’m realistically the expert.
Ha, disagree. This is overly general and reductionist.
See, I know the C strcpy() function is a bad idea. In this I know more than Ken Thompson or whatever Unix God did at the time he wrote it. I also know more than everybody who wrote (at the time they wrote) every other library that's on my computer that uses strcpy(). Why the fuck is this still in my system? Division of labor accumulates bad ideas along with good ones.
It's bad now. It might not have been bad when it was originally written. Back when Unix was first being developed, you had machines with 64K or 128K of RAM, which is not much, and because of that, software was much simpler, and such design compromises could be kept in mind. And when C was standardized in 1989, it might have been a bad idea at that time, but the standards committee didn't want to obsolete all the existing code (a major concern at the time). But by all means, get a time machine and plead your case.
As an example of coding for size, I recently wrote a copy of strcpy() with similar semantics. I had to, because of constraints on memory. I wrote an MC6809 (an 8-bit CPU) disassembler in MC6809 assembly code, and the prime consideration was code size. Strings ending with a NUL byte? My GOD! that will waste 80 bytes! Waste a byte on size? That's another 80 (or 160 if you insist on 16 bits for size) bytes wasted! Instead, since all the text is ASCII, set the high bit on the last character. That more than makes up for the bit of oddness in my implementation of strcpy() to deal with the last byte, and helped contribute to the code being 2K in size.
strncpy() is for a different context, and it won't necessarily NUL terminate the destination [1]. It's not a compromise for strcpy(). strlcpy() will NUL terminate the destination, but it first appeared in 1998, nine years after C was standardized, so again, time machine etc. Should Ken Thompson & co. have included strlcpy()? It's an argument one could make, but I'm sure they didn't see a need for it for what they were doing. Oversight? Not even thinking about it? Yes, but I don't blame them for failing to see the future. Should Intel not do speculative execution because it led to Spectre?
Now, assuming strlcpy() existed in the 70s---is that just as efficient as strcpy()? Should strcpy() have always included the size? Again, I think it comes down to the context of the times. Having worked on systems with 64K (or less! My first computer only had 16K of RAM), and with the experience of hindsight, the idea of implementing malloc() is overkill on such a system (and maybe even 128K systems), but I won't say people were stupid for doing so at the time.
[1] I think strncpy() was there to handle the original Unix filesystem, where a directory was nothing more than a file of 16-byte entries---14 bytes for the filename, and two bytes for the inode.
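As an aside, the high-bit trick described above is easy to illustrate outside of MC6809 assembly; a rough Python sketch of the encoding (just for illustration, obviously nothing like the original code):

    # Mark the end of an ASCII string by setting the high bit on its last
    # character, instead of spending a byte on a NUL terminator or a length.
    def pack(s: str) -> bytes:
        b = s.encode("ascii")
        return b[:-1] + bytes([b[-1] | 0x80])  # high bit flags the final byte

    def unpack(data: bytes, offset: int = 0) -> str:
        out = []
        while True:
            c = data[offset]
            offset += 1
            out.append(chr(c & 0x7F))
            if c & 0x80:                       # last character reached
                return "".join(out)

    assert unpack(pack("LDA") + pack("STA")) == "LDA"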
I'm getting a little impatient with all this argument in circles. The point is not what someone should have done with hindsight. The point of this little sub-thread is that saying the author is always the expert is oxymoronic. Expertise is often multifaceted and distributed.
The larger point is also that it should be possible to get rid of things once hindsight has found them wanting.
> The point of this little sub-thread is that saying the author is always the expert is oxymoronic.
Writing the code makes me the expert in that code. Meaning how it works. Why it was built the way it was. How it was intended to be extended. Yeah, it might be shitty code and maybe someone with more domain or even just general expertise would be able to do better. But if I write the code, I’m still the expert in that specific code. Someone else can of course roll in and become an expert in that code, too. Expertise in my code is not a bounded resource.
If you took my comment to mean that writing the code makes me the expert on all possible implementations or choices for the solution, or that I am best qualified to decide what the best solution is, I must have communicated poorly.
(Also, this is pedantic, but that’s not what oxymoronic means.)
> If you took my comment to mean that writing the code makes me the expert on all possible implementations or choices for the solution, or that I am best qualified to decide what the best solution is, I must have communicated poorly.
No, I didn't think that. You're all good there.
> Writing the code makes me the expert in that code. Meaning how it works.
Go back and reread the bottom of https://news.ycombinator.com/item?id=30019146#30040731 that started this sub-thread. Does this definition of 'expert' really help in any way? Ok, so some author wrote some code, they're expert in that code. Big whoop. I don't want to use that code. I wrote a shitty simpler version of what you wrote, now I'm an expert, you're an expert. How is this helpful?
I think the only expertise that matters is domain expertise. Software should be trying to communicate domain expertise to anyone reading it. So that anyone who wants to can import it into their brain. On demand.
The “expert in the code” matters in terms of siloing. That was the context where I mentioned it. The engineers who work in an area all the time and know the code best are typically best equipped to make the next changes there. Hence a silo. Maybe I completely misunderstood your comment about late binding experts?
But stepping back for a moment, I’m increasingly confused about what your thesis here is. You’ve clarified that it’s not actually about abstraction. It’s also apparently not about clearly factoring code (though maybe you think that contributes to problems?). What is the actual thesis?
Engineers should be experts in the entire code base?
Code is in general too hard to understand?
Something else?
It feels like we’re discussing mostly tangents and I’ve missed what you intended to say. If I understood your message, I might have a better mental model for the utility of copying code.
Here's another attempt, a day and a half later. My thesis, if you will.
Before we invented software, the world had gradually refined the ability to depend on lots of people for services. When we buy a shoe we rely on maybe 10,000 distinct groups of people. When we write software we bring the same social model of organization, but all the vulnerabilities and data breaches we keep having show that it doesn't work. https://flak.tedunangst.com/post/features-are-faults-redux enumerates (in better words than mine) several examples where services made subtly different assumptions, and the disconnects make their combination insecure.
Since we don't know how to combine services as reliably in software as in non-software, we need to be much more selective in the services we depend on. A computer or app just can't depend on 10,000 distinct groups of people with any hope of being secure.
Yeah that's a failure on my part. Like I said, I've spent ten years trying to articulate it and spent a few hours in conversation with you, and I still can't seem to get my actual thesis across. I don't think another attempt right now will help where all else before it has failed :) How about we pause for a bit, and we can pick it up again in a few hours. Perhaps even over email or something, because this back and forth is likely not helping anyone else. My email is in my profile.
> Division of labor accumulates bad ideas along with good ones.
Agreed. To take an example from my area of specialization, accessibility, both Windows accessibility APIs (yes, as is often the case with Windows, there's more than one) have serious design flaws IMO. But they still allow disabled people to do many things we would otherwise be unable to do. I worry about what will happen if programmers truly follow your proposal from another subthread: "build/reuse less. Do without." In this case, doing without would make computers much less useful for many people, and possibly erect new barriers to employment, education, and just living one's life.
What you’re actually saying is that you have the benefit of hindsight. That in no way makes you more of an expert than Ken Thompson, who also has the benefit of hindsight.
I would guess that literally thousands of people have implemented an equivalent to strcpy with exactly the same issues (and likely more). Reimplemention does not imply a better implementation.
Decades of experience using strcpy is what taught us it’s a bad idea, not reimplementing it a thousand times.
Maybe it took you decades. One decade sufficed for most of the world.
This is starting to feel like a pissing contest. Go write your software thinking you know more than your users. Me, I constantly need to talk to mine. They're often more expert on the domain than me.
> This is starting to feel like a pissing contest.
Is there some reason it suddenly went from us discussing a difference of opinion to hostility? I have no idea what this is about.
> Go write your software thinking you know more than your users.
That’s a strange non sequitur. End customers don’t generally tell me how to craft software. When I work with customers, it’s to understand their domain and their problems, not mine.
Again, not sure where this hostility is coming from.
Perhaps I'm coming down with something. I'm uninterested in the sub-thread past https://news.ycombinator.com/item?id=30019146#30042174. It feels like willful misinterpretation, like you forgot what I said 2 exchanges ago when you respond to any single comment.
I've said my piece in a bunch of different ways, over ten years and again ten times to you. Perhaps this is as close as we're going to get to mutual understanding, if you think after all this that I'm trying to one-up Ken Thompson or something.
I think you're using words like "expert" and "strictly worse" way too bluntly. What's better for the author is often not what's better for the user.
> My Linux laptop doesn't need to be optimized to the hilt the way Google's server farms do. Using the same scheduling algo or whatever in my laptop imposes real costs on my ability to understand my computer, without giving me the benefits Google gets from the algo.
Perhaps I don't fully understand what you're arguing. But here's a scenario, involving a hypothetical developer using similar logic, that often makes me worry. Suppose a developer reasons that the GUI for their application doesn't need to have all of the complexity of, say, a web rendering engine. So they develop their own GUI toolkit that meets their needs and aesthetic. This toolkit lacks some features that are important to some users, but not to this developer, such as accessibility with a screen reader. An application developed using this toolkit becomes popular, and companies start integrating it into their workflows, perhaps even requiring other companies that they work with to use it. Oops, now these companies can't hire an otherwise qualified blind person, or perhaps a formerly productive blind person loses their job [1]. If the original developer had accepted the current division of labor inherent in building high atop the web stack, and yes, the complexity and bloat that come with it, their application might have been fully accessible without them having to deeply understand accessibility.
Yeah, that's a really good example. In general, accessibility and internationalization seem like really good reasons to specialize. We can't all know all possible languages.
It’s not about specialization, though. It’s about getting those benefits from standard and existing tooling. You don’t need to be an expert in accessibility to build accessible software (though it helps of course). But you do need to build accessibility support, which is a lot more work when you do it from scratch.
The longer I work in the industry the more I realize the cost of “doing it from scratch”. If you ask a brand new software engineer to estimate the time for a project, they’ll basically give you an estimate for a prototype. A slightly more experienced engineer will include time to get through code review and maybe shake out the egregious bugs. A more advanced one will remember to cost for tests. I have found you have to get a pretty senior engineer if you want them to accurately account for things like telemetry and logging, deployment, live site management, accessibility, localization, etc.
I’ve seen engineers build a “simple” system and then spend 4x as long as they planned to make it actually supportable because they “didn’t need” the existing tooling. So they rewrote it all but worse.
Can you elaborate on this comment? Based on your comments elsewhere I think we're likely to be largely in agreement.
I tend to have a love/hate relationship with Conway's Law, spending half my time thinking about how to design stuff assuming it, and half my time trying to design stuff (unsuccessfully) to make it go away.
A lot of pain comes from having a "concern" that doesn't align with the org chart, so a simple task takes weeks to accomplish because I need to get my boss to tell your boss to do it, or we don't make payroll. People tend to get precious about things they own. See also the legal department or the financial department thinking they are the company, as if they had any revenue without all of those "cost centers" or legal liabilities.
Yeah, I agree with this. There's a larger context here of capitalism evolving firms because they're more efficient, even though they dilute the price-communication value of markets to some extent. As a consequence your boss and my boss are talking about something that might affect millions of people. That results in an effect akin to regulatory capture. So one way to view my case against division of labor is as a case for pushing intelligence to the edges, as close as possible to the owner of the computer running the software.
On this larger level my suggestions are not actually that radical. Capitalism has had a long trend to push intelligence down to the edges. Kings used to use hierarchy to push decisions down. Blitzkrieg was largely about creating an organization that pushes decision-making down. And we've been getting better at it. I think we (i.e. society) can be even better at it by pushing decision-making out of the org and towards the people using it.
The copies vs dependencies thing doesn't seem like a useful framework to me -- in order to do anything with software that has dependencies, such dependencies need to be copied in some form that can be integrated into the final build. So the copies are there regardless. Different ecosystems differ on how easy it is to work with them. If all you get is a .so, then even if it's open source you're going to have to find and build your own to make changes. In Java you'll get jars, which may or may not contain java files and class files -- in the past I've unzipped jars, recompiled a single java file to replace a class file, rezipped it back up, and used that jar instead of the upstream jar. Not too bad. Some log4j 'fixes' involved stripping out some class files from jars, again not too bad, just don't accidentally clobber your changes later. Some JS projects will have additional instructions (or a bash script) that after npm downloads everything into the local node_modules, you need to go modify a particular JS file in there. It's easy for the changes to get clobbered later, but..
Lastly in Lisp land, we have quicklisp, so you point to a library dependency like normal and it downloads it locally elsewhere (so it only need be downloaded once and not for each project). When I'm writing a program using library X, and I'm curious how some function works, I can just jump-to-source and it'll take me to that local copy. If I want to change it, I can, I just edit it, and because Lisp has "compile", "compile-file", and "load" all as standard runtime functions, not separate programs, I can compile and load my changes into the running program and immediately test them. Maybe such changes are upstreamable, maybe not. You can maintain the changes in the original files, maybe going so far as a full source copy locally instead of using the ones quicklisp brought down, or just make your own local 'patch' file of redefined functions or extended classes that you load after loading the original and the patch simply recompiles and replaces things you've changed. It's also rather fun to do this to inspect and learn about the Lisp implementation you're using, what it's doing or how it's implementing things, changing stuff to see what happens/fix an edge case bug, whatever.
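To sketch that workflow (:library-x, parse-title, and node-text are made-up placeholders here, not any real library's API):

  (ql:quickload :library-x)   ; fetched once into quicklisp's local tree, shared across projects

  ;; jump-to-source (e.g. M-. in SLIME/Sly) lands in that local copy;
  ;; edit the function there, or re-evaluate a tweaked copy of it:
  (in-package :library-x)

  (defun parse-title (node)
    "Edited version of the library's function (placeholder body)."
    (string-trim '(#\Space #\Newline) (node-text node)))

  ;; COMPILE, COMPILE-FILE and LOAD are ordinary functions, so the new
  ;; definition goes live in the running image and can be tested immediately:
  (compile 'parse-title)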
Part of it I think is supported by a notion Thinking Forth calls a 'lexicon'. See its early section on 'Component Programming', but in short 'lexicon' refers to words (symbols) of a component that are used outside of a component. "A component may also contain definitions written solely to support the externally visible lexicon. We'll call the supporting definitions 'internal' words." Of course, without special effort, in Forth as in Lisp even those internal words are not really inaccessibly internal. In Lisp it's the difference of referring to an exported symbol by namespace:symbol and an un-exported one namespace::other-symbol. The other supporting notion here is that of globally available "components", which are just a product of decomposition, not some formal thing like separate processes or libraries or what have you. "It's important to understand that a lexicon can be used by any and all of the components at higher levels. Each successive component does not bury its supporting components, as is often the case with layered approaches to design. Instead, each lexicon is free to use all of the commands beneath it. ... An important result of this approach is that the entire application employs a single syntax, which makes it easy to learn and maintain. This is why I use the term 'lexicon' and not 'language.' Languages have unique syntaxes."
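To make that exported/internal distinction concrete on the Lisp side (geometry, area, and square are made-up names):

  (defpackage :geometry
    (:use :cl)
    (:export :area))                ; AREA is the externally visible lexicon

  (in-package :geometry)

  (defun square (x) (* x x))        ; internal, supporting word
  (defun area (r) (* pi (square r)))

  ;; From another package:
  ;;   (geometry:area 2.0)          ; exported symbol, single colon
  ;;   (geometry::square 3)         ; internal but still reachable, double colon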
I'm not sure I agree about the 'easy to learn and maintain' bit but at least not making things painfully inaccessible can help, especially with re-use. There used to be OS concepts where theoretically if I wanted to play a music file, and knew I had a popular music player application installed, I could just reach in and call a function in that application, rather than writing my own or finding some "do one thing..." library. All programs on the OS were simultaneously programs and DLLs able to be used by other programs.
I don't have many arguments in this space, it's just fun to think about, but good luck on your research. You might enjoy this quote:
"When I see a top-down description of a system or language that has infinite
libraries described by layers and layers, all I just see is a morass. I can't
get a feel for it. I can't understand how the pieces fit; I can't understand
something presented to me that's very complex."
--Ken Thompson
Thank you! My ideas are definitely greatly influenced by my time in the Lisp world. Here's the first time I articulated them, back in 2011 (so before the first entry in OP): http://arclanguage.org/item?id=13263.
> in Lisp land, we have quicklisp, so you point to a library dependency like normal and it downloads it locally elsewhere (so it only need be downloaded once and not for each project). When I'm writing a program using library X, and I'm curious how some function works, I can just jump-to-source and it'll take me to that local copy. If I want to change it, I can, I just edit it, and because Lisp has "compile", "compile-file", and "load" all as standard runtime functions, not separate programs, I can compile and load my changes into the running program and immediately test them. Maybe such changes are upstreamable, maybe not. You can maintain the changes in the original files, maybe going so far as a full source copy locally instead of using the ones quicklisp brought down, or just make your own local 'patch' file of redefined functions or extended classes that you load after loading the original and the patch simply recompiles and replaces things you've changed.
Is there an easy/idiomatic way to save your edits, or do they get blown away the next time you download the library? All my comments are in the context of not having tooling for this sort of thing. I wouldn't be surprised to learn the Gods at Common Lisp figured all this out years ago. (In which case, why is the whole asdf fracas happening?!)
Heh, that thread brings to mind a bunch of things... One advantage of CL that helps with the 'borrowing' aspect that gives people the jeebies is that the unit of compilation is much smaller than the file, so you can also borrow much less. Another is that methods are decoupled from classes, so there's a lot of room for extensibility. (The LLGPL interestingly provides an incentive for the open-closed principle, i.e. if you extend objects you're fine, but if you modify the library's then you are subject to the LGPL.) If you haven't read Gabriel's Patterns of Software book (free pdf on his site), I think you'd enjoy it.
Your edits won't get blown away, at least by default, since quicklisp doesn't redownload a system that it knows about or check for 'corruption'. The way quicklisp does its own versioning also means if you update quicklisp's distributions lists and a newer version of the library has come out, it'll download that one into its own new folder and leave the old one alone. There's a cleanup function to clear out old things but I don't know of a case where that gets called hidden from you under the hood.
Maybe there's some magic and interesting stuff related to this for emacs or mezanno OS or in the annals of ancient lisp machines but I'm a vim heretic ;) But in any case if you want to save stuff, you can just save it in a new or existing buffer... So options are basically as I described. To give a specific example, I have a script that fetches and parses some HTML to archive HN comments and I wanted to remove the HTML bits so I'd just have text, making it more markdown-y. There are lots of ways to do that and I'm pretty sure I chose one of the worst ones, but whatever. I was already using the Plump library, and after not achieving full success with its official extension mechanisms, one method I came across and stepped through was https://github.com/Shinmera/plump/blob/master/dom.lisp#L428 and I learned I could hijack it for my purposes. I started by editing and redefining it in place until I got what I wanted, but instead of saving my changes over the original, I simply copied it over to my script file, modifying slightly to account for namespaces e.g. it's "(defmethod plump-dom:text ((node plump:nesting-node))", thus redefining and overwriting the library implementation when my script is loaded and run.
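Roughly, the patch looks like this (the body below is simplified for illustration; only the signature is the real one):

  ;; Loaded after the library; replaces its TEXT method for nesting nodes.
  (defmethod plump-dom:text ((node plump:nesting-node))
    (with-output-to-string (out)
      (loop for child across (plump:children node)
            do (write-string (plump:text child) out)
               (terpri out))))      ; crude markdown-ish line breaks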
Some possible problems with this approach in general include later trying to integrate that script with other code that needs the default behavior (though CL support for :before/:after/:around auxiliary methods can help here, e.g. if I can't just subclass I could insert a seam with an :around method branching between my custom implementation over the library's without having to overwrite the library's; and in the long-term the SICL implementation will show the way to first-class environments that can allow multiple versions of stuff to co-exist nicely). Or the library could update and change the protocol, breaking my hack when I update to it. Or in other situations there may be more complex changes, like if you modify a macro but want the new modifications to apply to code using the old def, you need to redefine that old code, or if you redefine a class and want/need to specially migrate pre-existing instances you need to write an "update-instance-for-redefined-class" specializer, or if the changes just span across a lot of areas it may be infeasible to cherry-pick them into a 'patch' file/section of your script, so you're faced with choices on how much you want to fork the files of the lib and copy them into your own project to load. But anyway, all those possible problems are on me.
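That :around seam would look something like this, where *markdownify* and my-markdown-text are made-up names for the sketch:

  (defvar *markdownify* nil)

  (defmethod plump-dom:text :around ((node plump:nesting-node))
    (if *markdownify*
        (my-markdown-text node)   ; custom behavior for my script
        (call-next-method)))      ; fall through to the library's own method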
The asdf noise isn't that big of a deal and I think is only a little related here technically since it's a rather special library situation. It's more 'interesting' socially and as a show of possible conflict from having a core piece of infrastructure provided by the implementation, but not owned/maintained by the implementation. An analogous situation would arise if gcc, clang, and visual studio all agreed to use and ship versions of musl for their libc with any other libcs being obsolete and long forgotten. A less analogous situation is the existing one that Linux distributions do -- sometimes they just distribute, sometimes they distribute-and-modify, whether they are the first place to go to report issues or whether they punt it off to upstream depends case-by-case.
>The idea that projects should take source copies instead of library dependencies is just kind of nuts, at least for large libraries.
There's a medium to be found between static linking and dynamic dependencies which I think is one of the most crucial tradeoffs to understand in software engineering. NPM in my opinion goes a bit too far on the modularization of things. Especially in large projects where you can have multiple private repo URLs and different auth tokens needed for each, it can become a nightmare to orchestrate for CI/CD. Something that takes an NPM approach, but with a more fleshed out stdlib to eliminate things like left-pad is probably the way to go.
Depends on what it is you need. If you really have a small need, then taking a giant dependency is very possibly a bad idea. But I have seen the opposite pretty often. “I don’t need that library. I just need this one thing. Okay, this other thing, and this. And this and this and this.” And by the time they’re done they’ve reimplemented 75% of the library in a far worse way.
The most egregious example I ever saw was a reimplementation of XML parsing with C++ substrings. They "didn't need" a real XML parser. In the end, they found endless bugs with their shitty parsing logic and it turned out it was responsible for something like 50% of the (very large scale) system resource use.
Taking source copies is thought to be especially FOR large libraries: you take what you need, leaving the "large" behind... if you can. You probably can. Though libraries are developed with the assumption that they are to be consumed as one massive binary, which makes it more difficult to run off with just the part(s) you need.
- Kartik argues for better mutual understanding and additional communication
- Team Topologies states you should match system boundaries to team boundaries, specifically when cognitive and/or communications load have grown unmanageable due to team size and/or software complexity
On the surface, Kartik is arguing against abstractions and Team Topologies is arguing for abstractions, but both stances feel valid to me. It feels like there's a spectrum of tradeoffs here. Is this essentially an analogue to the tradeoffs of monolith vs services?
- For small teams, by all means keep abstractions low and communication high--monolithic systems and less division of labor are ideal, and your team will gel better for it. Your team is probably not large enough to warrant either services or division of labor, so stick with the monolith.
- As an organization scales, you end up with an n^2 communication edges problem, and at this point division of labor is necessary to limit the cognitive and communications loads.
I think there needs to be one more aspect accounted for, and that is the domain. The domain might naturally push you towards larger (or smaller) blobs than you would otherwise anticipate, and in such cases I don’t think it makes sense to fight the domain — you have to adapt the team and software boundaries (map) to the ground reality (terrain), and find people who can function effectively with an atypical map — be it extra complexity or communication.
I interpret Kartik’s point as stating that it’s counterproductive to split naturally integrated domains (terrain) for the convenience of map making.
To clarify, this then allows you to look inside the API (at a bunch of things talking over more APIs, yay microservices). Then you can see which bit might be causing the issue in the larger black box you are talking to.
Precisely. Team Topologies strongly advocates the "Inverse Conway Maneuver", where you shape your team and communications structures towards your desired architecture.
Abstractions are not always leaky. Often the “leaks” come from wanting to push past the bounds of the abstraction (for performance, additional capabilities, whatever). A good abstraction, if used as intended, needn’t always leak.
People also often define “leaky” as “can break”, which is kind of unreasonable. The notion of a data stream across a network is an abstraction over many layers of complexity. People often say it’s leaky because of packet loss and failed connections, but that’s not a leaky abstraction unless you define the abstraction to not include a failure mode.
Abstraction is necessary, but I think OOP as practiced in many companies likes abstraction a bit too much.
At the extreme people treat everything except the code they currently work on as black boxes and code like this is written:
//this was written long ago in one class
//(assumes the usual java.util and java.util.stream.Collectors imports)
List<Book> getBooks(BookRepository bookRepository, String author) {
    var books = bookRepository.getBooksByAuthor(author);
    if (books.isEmpty()) return null;   // an empty result silently becomes null
    return books;
}

//this too, but in a different task and in a different class
Collection<Book> getBookAboutKeyword(String author, String keyword) {
    var books = getBooks(this.bookRepo, author);
    if (books == null) {
        return null;                    // the null propagates another level up
    }
    // let's ignore the fact this should use filtering on the db
    return books.stream().filter(b -> b.about(keyword)).collect(Collectors.toList());
}

//this is written now and there's an NPE
Collection<Book> getBooksAndFooBarThem(String author, String keyword) {
    var books = getBookAboutKeyword(author, keyword);
    //foobar them -- dereferencing books here is the NPE when it comes back null
    return books;
}
And of course the solution to the NPE was to put ANOTHER null check in the getBooksAndFooBarThem function, because anything up or down the call stack is black magic.
I have been ruminating on this very thought for some time...
"I started out mocking NPM for the left-pad debacle, but over time I've actually grown more sympathetic to left-pad. The problem isn't that somebody wrote a library for left-pad. It is that even something as trivial as left-pad has triggered our division-of-labor instincts. A library like left-pad is just begging to be saved into projects that use it so users have one less moving part to worry about at deployment time. Turning it into a first-class dependency also ensured that a lot less people looked inside it to understand its five lines of code or whatever."
I think we should be extracting code from libraries until we need the whole thing. Dragging around black box dependencies as binaries has a lot of serious drawbacks. If people had extracted the parts they needed from log4j they wouldn't have even had to think about the exploits.
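To put a size on how small these things can be, here's roughly what a vendored left-pad amounts to (sketched in Lisp rather than JS, and not the npm package's actual code):

  (defun left-pad (string width &optional (pad-char #\Space))
    "Pad STRING on the left with PAD-CHAR until it is at least WIDTH long."
    (let ((deficit (- width (length string))))
      (if (plusp deficit)
          (concatenate 'string (make-string deficit :initial-element pad-char) string)
          string)))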
Pulling out just the subcomponents you need is actually easier with a more modular design. But akkartik also opposes modularity, or at least has expressed a less-than-positive disposition towards it:
> I think namespaces and modules are socially counter-productive even when implemented in a technically impeccable manner, because they encourage division of labor.
Namespaces and modules are precisely what you need to be able to select parts of a larger system and ignore other parts. Without them, you're faced with numerous "monoliths", excessively large systems that don't decompose into reusable, isolatable bits. That forces you to import too much, or to recreate just the part that you care about (often a waste of time).
I make namespaces and modules in my own projects just so that I can use simple names like "update" or "load" un-ambiguously. Projects that nobody else has ever touched and maybe nobody ever will.
It's been a popular C++ style since before I did C++, and now it's also idiomatic Rust style.
Is OP really advocating for a C style where everything must be fully-qualified? It's horrifying to read GTK+ C code because of that. Everything is `gtk_do_a_thing` and all code operates on a `GtkTypeOfGtkObject`. One of these days I'll get RSI, I only have so many keystrokes left in me! Have mercy!
And you can't rename things, so good luck mixing incompatible major versions of Qt or GTK+ in a C or C++ project. (Actually, Qt has a compile-time flag to put it in a namespace. Maybe GTK+ has some similar ugly macro. But this requires you to compile your own dependencies, a task which is especially annoying, inconvenient, and error-prone in the C family)
I don't even think micro-packages are that bad. The problems with NPM seem to have been that the "Always upgrade immediately! Upgrades are good for security and have no downsides!" people are winning the PR fight against us "Upgrade cautiously and never allow autonomous upgrades. Upgrades are important for security but they have many caveats" people.
One is a more Windows attitude of "I know you've never gotten a virus, but if you don't upgrade, you _will_ get a virus and you _will_ die" whereas my attitude is more of a Linux "The only time my Linux system breaks is when I upgrade it."
> Software libraries suck. Here's why, in a sentence: they promise to be abstractions, but they end up becoming services.
The file abstraction seems fine. Not having to understand the physical wiring of disks, the specific disk manufacturer's wire protocol, and so on is labor saving. Many of the abstractions don't even force the seek and location concepts but allow "just put my string there" semantics.
The addition abstraction seems fine too, though if you have a limited-length number type there are a few things to consider. Arbitrary-precision libraries eliminate those nuances, though they perhaps move you to others, like numbers that cannot fit in available memory. Yet those are rather extreme corner cases.
It's a provocative claim, but the universal declaration and the confidence it seems to project are as weak as some of the abstractions being railed against.
This is not someone making a provocative post claiming that abstractions are bad. It is someone who understands that abstractions are good but claims that it is bad to rely on libraries to solve the need for abstraction.
There is something wrong with division of labour in creative projects... the story of the service industry seems to be how this is relearnt over and over. A "silo" is incomprehensible in a factory: the pin-machinist isn't silo'd.
Knowledge, skills, vision, creative engagement... these are silo'd. And this follows from misconceiving creative work as a form of goods-producing labour.
Efficient factories are built on silos. You do your job in your area and the next person does theirs. You might collaborate, but you absolutely have a clear division of labor.
If you don’t have a division of labor and silos, you don’t have a factory. You have artisans.
Yes. And that's ok! The world ran on artisans for hundreds of years.
It's even better than that right now, because we already have factories: the programs. Do we need factories to build the factories? I don't think we're there yet, not to mention we don't know how to build these meta-factories right.
Quoting myself from OP:
"Libraries with non-trivial functionality take a long time to get right, and in the meantime they are produced more like guilds of craftsmen than factories. (Even if the products themselves permit factory-like operation at scale.) In that initial bake-in period we are ill-served by conventional metaphors of software components, building blocks, etc. We should be dealing more in vertically-integrated self-contained systems rather than plug-and-play libraries. More OpenBSD, less `gem install`."
Efficiency is great, but it's not the only thing to optimize for.
> Knowledge, skills, vision, creative engagement... these are silo'd. And this follows from misconceiving creative work as a form of goods-producing labour.
It’s also related to the division of decision-making, usually between workers, with little or no executive authority, and management, with some to all of the executive authority. Creative roles are typically closer to management, and their knowledge bases are siloed, as is the business’s operational experience and expertise.
Everyone wants to be close to power, but how many actually wield power effectively? What types of hierarchy are amenable to diffuse power structures? A lot of business are run like petty fiefdoms, and may not benefit from a minor change, and yet may not have the stomach for radical reform either. Worker owned co-ops and DAOs give me hope for new ways of working.
I think Hyrum’s Law should be best understood as a warning for users rather than as handcuffs for implementors: don’t rely on non-obvious implementation details.
I’ve seen software break due to a race condition when an upgrade made something faster. Hyrum’s Law tells us the performance was a hidden part of the interface. Is the solution to never optimize anything? Instead, I claim the client code was broken all along.
Where client code is something someone outside the organization implemented and is using, that makes a certain amount of sense. But when the client is another internal team, and the SaaS product the company sells broke, the customer doesn't really care that the database team and the API team don't get along - the product is down and while it's down, no sales are happening. The point, thus, of looking into other teams is to get a sense for the challenges they face and to have a less adversarial working relationship with them.
In such a scenario I'd say that the distinction between the two teams is essentially imaginary. If management is forcing the downstream team to use the new API even if it's broken, and then management is forcing the upstream team not to break the API for the benefit of the downstream team, then the API is an illusion.
What I'm trying to point out in OP is a third way beyond "build all the things" and "reuse all the things others built" that doesn't get explored as often: build/reuse less. Do without. You're absolutely right that the tools we have are the best way to operate at scale. Do we always need to operate at scale? I think we are well-served by constantly questioning that, fractally on every decision all the way down.
I simply don’t understand why a junior developer would want to work at a company that uses as much proprietary software as some shops do. What have they learned at the end of their tenure that they can apply somewhere else?
If employers hire junior developers for reasons other than that they have some capacity to train them (such as cheap labor), then, agreed, they are doing them some disservice.
But for folks that are not so junior, their insistence on rewriting libraries often is a consequence of a lack of confidence or misunderstanding about the purpose of the software project.
If you’re building a new database for use with your new language, yeah, it’s worth building it from scratch. But for many older shops, simple was made easy long ago through selection of libraries that don’t need to be rebuilt for pedagogical reasons.
We had a guy who left (a mess). Before he left, he asked one of the office Nice Guys his opinion on open sourcing a framework he wrote. Even the nice guy said it was a bad reimplementation of something that already exists.
The logical conclusion is that he was already looking to leave and wanted to take his homunculus with him, but my head canon is that he realized his play time was over and people weren't going to kiss his butt anymore.
There are distractions I find healthy and ones I find unhealthy and most weeks I’m trying to juggle the mix of both. Among the ones I don’t like are people asking me questions about my code that they should be able to answer themselves. That means I’m more open to questions from junior devs, because they can’t, but if people are stuck I want to make it easier for the next person. Or me in a year when I wander back into a project that was some definition of “done” before new requirements came in or my perspective on “good” has shifted (these two are not independent variables).
Agreed. “Be kind to your future self because time is not.”
We ought to want to train people up, but those that have been trained yet lack discipline are problematic. I’m certain that’s true across every endeavor.
What if we think of libraries and dependencies in a different way?
For instance: dependencies would be at the granularity of a single function or data type. And when you take a dependency, in your editing environment it would look just like your code and fit right in, maybe even with your preferred autoformatted style. The editor would realize that the function has an external author, and could git-pull new changes in, or you could fork the function (and maybe some merges would conflict).
The advantage is that the code would feel a lot tighter and less mysterious, though still allow collaborative development.
Rust isn't way off here. It already does a lot of tracking of individual functions during the compilation process, and docs.rs already provides quick access to a lot of source code for dependencies. Closing the last few gaps to make it feel like part of your program, rather than outside magic, could be awesome.
That’s a wildly impractical position to take and isn’t really true in my experience. I spent several years at a seed stage startup where it was me on backend and one other engineer on mobile. We would specify a JSON API and then could work independently (and very fast) to realize the functionality because he’s an expert at mobile programming and I am an expert at backend. We didn’t have the luxury of working at 50% output for a year to learn each other’s specialization while trying to get the product off the ground. He didn’t need to learn the ins and outs of rate limiting with redis and I didn’t need to know how to animate a custom UI component for each of us to be maximally productive.
Division of labor and the interfaces which enable it exist for a reason. I suspect it’s because it reduces the complexity of collaboration between engineers with different specializations. It facilitates the separation of concerns.
I feel like all of the downsides given in the quotes can be solved by making sure people get a variety of work. Define an interface and split the work into chunks. Just make sure people are being assigned to different kinds of chunks, should their skill allow it.
This is really just the crux of managing people and bringing out their potential. I’m not gonna split a CRUD endpoint or class between two people but I’ve also had two people write the same exact function because it was so complex.
Even in my own work, when I'm moving back and forth across the code, I'll set up interfaces and contracts where appropriate, because it's important to have stability somewhere in the evolving mess. One of the problems you can run into while free-range programming is chasing the consequences of a change back and forth, where a fix in one place requires a fix in another, requiring a fix in another...
When I mentor coders one of the core lessons I try to get them to internalize is to code from known working state to known working state. Always be able to run something, either the code or a unit test, so that you know it works as expected. Add some code and get to the next known working state. This requires no undefined dependencies. If you're suddenly in a position where you're saying "I don't know why this doesn't work", you've bitten off more change than you can handle and need to back up and stabilize by decomposing the working state you're aiming for. Interfaces and contracts help immensely with that as long term stable scaffolding, as a skeleton for your codebase.
Interfaces and contracts are good. Telling people to not care about the code that implements the interface and contract, making it difficult to jump into a Cargo crate's source code and add debug logging, making it hard to tinker with your dependencies, is bad.
while knowing your tools is always a good idea, i disagree with the overall notion of "don't divide labour" because a lot of the real-world problems software has to deal with are fractally complex. for instance, to pick a seemingly well-contained problem: if i wanted to write a collaborative text editor i would absolutely want to rely on someone else's work for font rendering or fault-tolerant networking, because each of those are problems that i could spend years on, and still not have covered all the failure modes and corner cases.
Knowing how React works (or doesn't work) helps a designer design a valid experience that can actually be implemented by the developers. OK, maybe they don't need to know React, but surely they need to know what a browser can and can't do.
A React developer (web developer) needs to know how TCP/IP networking works to avoid spending hours debugging communication problems with XYZ service.
And so on.
Surely people can go about doing their jobs without necessarily knowing how the adjacent stacks work, but they would be way better professionals if they did. I would agree more with your argument if you restricted the distance of the irrelevant tech stack to, let's say, 3 levels. E.g., the web developer truly won't do any better if they know more about kernels.
I like thinking of it like the Olympic rings. The blue ring (all the way to the left) doesn't need to know anything about the red ring (all the way to the right), but should absolutely know the yellow ring (second ring from the left), and possibly bits of the black ring (middle ring). The leakier the abstraction, the more you need to reach.
Nobody's trying to build a super rigid structure around the division of labor, they're just trying to create a general set of guidelines that will help answer questions quickly.
Do not use a library without reading its source code. Software problems are hard, and you can't hide from the complexity by letting someone else do the thinking. You have to know what the library author knows, so you can determine what's possible, and decide if the library provides you with the right abstraction.
In my lifetime Apple has changed hardware platforms twice while still running on the NeXT Foundation API. Same thing with politicians: they never simplify the complicated laws but instead expect software to manage and work around this complexity. Nothing to do with the division of labor, but more the pecking order of software engineers.
How do you know when you have to? There might be dozens of libraries in just the current project that we could be using more effectively if only we'd taken the time to read the source.
We have this wonderful organ called a "brain". It appears to be the center of reasoning. We apply that to the situation and end up drawing a line where we say, "Beyond this point, I will not look. I may learn what it does, but I will not dwell on how it does it, neither in principle (conceptual model) nor in fact (actual code). Because if I were to investigate this, and every other such system, I would find that it is turtles all the way down and I will never actually complete my objective. Unless I choose to freeze all these dependencies at this one moment in time and never permit them to change. If, in the course of my work, I find that this particular area is a source of trouble or a possible source of solution, then I will investigate further. On my own, with my team, directing a team member, or contacting the responsible party."
It's a noble goal to make systems that are comprehensible, and one that should be striven for more often. It's also a noble goal to try to fully understand the stack of dependencies. But short of freezing time (this subsystem will never be permitted to change after I have studied it, except by my own hand), spending a lifetime on a more narrow field, being in the top 0.0001% of intellects, or working on relatively simple systems, it's infeasible in the general case. Once you try and do something even as "simple" as build out an avionics system for an aircraft, it will quickly explode into too many domains for any one person to truly comprehend every detail.
You are certainly much smarter (or more inexperienced in aircraft software design) than me if you think avionics systems are simple.
But yes, I also have a brain and can reason with it. You don't have to restate the obvious as if it is some deep wisdom. But parent comment made the point that you should only look at library code when you have to and I countered that you cannot know when you have to without looking. Your counter-counter that you must think about what you have seen does not help with deciding when to start looking in the first place.
Why do you think there's some algorithm you should run that will, in some determinate sense, tell you how to behave here?
The comment you're replying to was poking fun at you for assuming there would be some objective set of criteria, when in reality there isn't anything of the sort.
There are a lot of heuristics I use when picking out a 3rd party library, but I'm an engineer and these heuristics are practical, so I doubt you'll enjoy thinking in those terms, if your prior comments are any indication of what you're looking for here.
As a fellow engineer, I would indeed like actual reasons for why to pick one library over another. Even if that reason is as simple as "profiling shows we spend 99% of a request in this single library". Falling back to experience, heuristics and "just use your brain" can work in some circumstances (and with sufficiently experienced developers), but it is not how an engineering discipline matures.
...do you seriously not know how to evaluate 3rd party libraries? I can help you if you want, but honestly if this is a challenge for you, I'd be worried about whether or not this is the right career for you.
Engineers get shit done, first and foremost. If that's not your primary concern, you're going to spend a lot of time frustrated by your boss's priorities and confused about why your peers value things you don't.
This goes back to the discussion we had on https://news.ycombinator.com/item?id=29998868, about whether software developers are really engineers or not. If you think "getting shit done" is the only or even the primary concern for an engineer, I think it is best if you stay in software (and please stay far away from any safety critical code).
> and please far away from any safety critical code
To a rough approximation nobody in the software engineering industry works on safety critical code. It comes with an onerous set of restrictions and criteria to do it well, including fully analyzing all of your dependencies for runtime behavior, etc.
If you think applying that to the vast majority of software engineering makes any sense, you’re going to fail hard. It’s no different than applying the rigor that goes into engineering a fan blade on a GE90 to a toilet flush handle on a household toilet.
I would frame what you describe as Safety Critical Code™, and agree with your assessment that very few work on such things.
I expect a lot more of us work on safety critical code that doesn't fall into the first category. Stuff that we rely on to produce results in a reliably timely manner, whose failure results in material degradation of someone's quality of life. As an example: the Alexa communication infrastructure has been used to save lives, but Alexa devices are not "Life Saving" or "Medical" devices. I believe we spurn the duty of disciplined engineering at our peril.
I will not hire anyone to my software teams who thinks their job is to create art for its own sake. If you can’t tie your work back to creating value, you have no place on my teams, and that includes wasting time rebuilding worse versions of libraries that already exist.
I'm pretty sure I wouldn't want to hire someone as argumentative as you either, but I don't see how we went from "Don't think about how a library works - until you have to" (which I still think is hilariously circular and not very useful) to accusing each other of rebuilding worse versions of libraries just because. In fact, a few comments up we touched on having actual facts as the basis for decisions, so how you interpreted that as "art for its own sake" is a mystery to me.
In any case, have a good evening good sir and all the best in your software engineering endeavors.
You are yourself all the time, and either you act this way all the time, or you don’t feel safe to be yourself on your work team, which indicates an unhealthy team environment.
Unless you’re saying you’re uncharacteristically acrimonious on HN, which would itself be an even odder choice…
I put it in quotes for a reason. It’s been most of my career at this point.
My point is that there is no hard and fast rule for when to stop reading or studying the stack of software and hardware. But at some point your own judgement enters into it because only you know your situation and objectives. Asking others when you should stop is inane, they cannot answer it for you and you have to use your own brain.
If you have the intellect and time to spend reading the entire Linux and GCC source, go for it. Most of us don't, so lines get drawn.