In software development it's pretty important to know when to build "on top" of something else, and when to start from scratch.
Lots of developers will find it much more interesting, challenging, rewarding and just plain fun to develop something from scratch, even when there are better things that already exist.
They'll cleverly manipulate and convince the boss, against the better judgment of their elder developers, that they can do it, and if they're one of the better developers, the boss won't want to risk losing them, so they'll agree to the escapade.
Then said escapade turns into a shambles, as predicted by the elder devs, and the developer who created the mess simply quits and moves to some other job, in search of more fun and greener pastures. Any developer with decades of experience has probably seen this same pattern multiple times.
This is a sentiment that I've seen expressed in comment sections many times. I've been programming professionally now for 10 years, and it just doesn't resonate with my experience. Problems with build systems for external dependencies, package managers, and underfeatured / overcomplicated / buggy third-party dependencies have been a far worse issue in my career than problems with homebrewed systems.
I'm not saying you're wrong, I don't doubt that many people have the opposite experience. It just makes me feel a bit alien when I read comments like this.
Thanks for saying this, I feel this way all the time even though I know it’s against the prevailing wisdom.
My experience is that in the pursuit of not reinventing the wheel, I am frequently told to use a dependency that doesn’t allow us to solve the whole problem, or prevents us from making the user experience fast, or cannot be made to understand our data model. It’s all well and good to use a tool that exists, but using the wrong tool just because it exists is madness. Even worse is when dependencies are deprecated or our use cases become unsupported. Honestly I would prefer to just build everything above the database layer in house; that way we at least know what we can and can’t deliver, and have some chance of fixing things when they break.
I am practically having this conversation at work. There's a sister team with a great tool for benchmarking what they are working on, but it is not convenient for our needs, and I am told to "just do the plumbing to make it work for our needs". Reality is that there are far, far easier ways to achieve what we need than doing all that plumbing, adding more layers of abstraction on top of what is a side project of an adjacent team.
The problem with being smarter than the average programmer is that your insights will rarely be considered, even if they're correct, because they're new and controversial. That's because, from the perspective of an average programmer, a bad programmer who doesn't know what they're doing and a programmer using techniques so advanced that they cannot be understood are effectively indistinguishable, which means that an average team will treat both geniuses and morons the same way.
I feel like the collapse of the tech bro coincided with the masses going into programming, which changed the culture from promoting innovation and development into simply following whatever best practices someone had already written, turning programming from a creative job into yet another repetitive office job. This is also, in my opinion, the true reason why salaries collapsed. Most businesses don't need creative specialists, they need code monkeys, and most people aren't creative specialists, they're code monkeys. So why would anyone even discuss the salary worthy of a creative specialist here?
There is a legitimate concern that if the smarter programmer quits the job, the remaining average programmers will not be able to maintain the code.
I think a smart solution would be to teach the average programmers the new concepts. Many of them would probably be happy to learn, and the company would benefit from having everyone know a bit more and use better solutions. But for some reason, this usually doesn't happen.
And your second paragraph sounds like sour grapes. I have no idea what "the collapse of the tech bro coincided with..." means. Most programmers are working on CRUD apps. How creative do you need to be?
Not targeting you, but the industry in general. In every other industry I've been in outside of software dev, 10 years is not considered elder. You're just now becoming not a greenhorn. You're just now getting your sea legs. It's amazing what additional experience happens after year 10.
To that effect, Rust (2015) is 9 years old, and Go and Node are 15 years old, while Python (1991) is 33 years old. Just putting things in a different perspective.
I’ve been in this game for 30 and I agree with GP. “I won’t build that simple thing from scratch, I’ll just import this thing that does approximately what I want.”
We should banish the word “import” in favor of “take a dependency on someone else’s code, including the stability of the API, the support model, willingness to take patches, testing philosophy…”
Reputation is a rough proxy; inspecting the code can help. But when the thing you built your house of cards on falls over, you often can’t fix your house, and have to build a new house.
Obviously this applies more to utility code than it does to entire languages. But even there, Apple has broken their Swift syntax enough to release tools that upgrade your code for you…and that’s the best case scenario.
I’ve been in the industry for > 20 years and if anything, I think most people are too scared or lazy to reinvent code.
I’m not suggesting the earlier argument about NIH (not invented here) syndrome doesn’t exist. But I’ve certainly never seen it on the scale that the earlier poster claimed. If anything, I see people getting less inclined to reinvent things because there’s so much code already out there.
But maybe this is a domain specific problem? There does seem to be a new JavaScript frontend framework released every week.
- Ethernet. Too boring. Create a new physical layer for networking.
- Naturally create your own programming language. That'll make it much easier to find people when you have to expand the team.
Seriously, build vs. buy and NIH has always been with us. There's a time to build and there's a time to buy/reuse. Where are you adding value? What's good enough for your project? What's strategic vs. tactical? How easy is it to change your mind down the road? What are the technical capabilities of the team? How do different options impact your schedule/costs? How do they impact quality? In the short term? In the medium term? In the long term?
The reality is there is no way to build everything. You want to do scientific computing: do you use libraries that have been optimized for 50 years, or do you write your own? You want to do cryptography: do you build your own? Pretty much everyone working on LLMs today is leveraging things like NCCL, CUDA, PyTorch, and job scheduling frameworks.
Let's face it: nobody builds everything from scratch. The closest is companies like Google, who due to sheer scale benefit from building everything from hardware to languages, and even for them it's not always clear whether that was the right thing for the business or something they could afford to do because they had lots of money.
Build the things that add value. Don't rebuild something that already works. That's why we have the old saying: don't reinvent the wheel. If you have a working wheel, reinventing it might be fun, but it's usually not the best use of time. With the time you've saved, build cool things that add value.
Gotta say, having written some scientific computing code, the libraries out there do not always cover the exact operation you need, and are not always using the best algo in the literature. I was able to beat the existing ecosystem 6x head to head, and thousands of times faster for my use case. YMMV, of course, depending on the problem.
I worked on some proprietary video/image encoding application. In that context we hand wrote things like colour space conversions, wavelet transforms, arithmetic coders, compression algorithms, even hashing functions, in SIMD and we got better performance than anything off the shelf. We still used some off the shelf code where we could (e.g. Intel's hand written libraries). The thing is that this was the core of our business and our success depended on how performant these pieces were. That was also some time back, maybe today the situation is different. In this sort of situation you should absolutely put in the effort.

But that typically accounts for some small % of the overall software you're going to be a user of. This is really just another variation of the premature optimization statement: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%". So if you're in the 3% then by all means go for it (you gotta). But if you're in the 97% it's silly to not use what's out there (other than for fun, learning etc.)
Let's be honest, nobody is saying to rebuild the world from scratch.
The stance for in-house built tools and software is a much more balanced act than that. One that prioritises self-reliance and fosters institutional knowledge while assessing the risks of making that one more thing in-house. It promotes a culture where employees stay, because they know they might be able to create great impact. It also has the potential to cut down the fat of a lot of money being spent on third parties.
Let's be real, most companies have built Empire State Buildings out of cards. Their devs spend most of their time fixing obtuse problems they don't understand, and I'm not talking about programming, but in their build processes and dependency hell.
It's no wonder that the giants of today, who have survived multiple crises, are the ones who took the risk of listening to those "novice" enthusiastic engineers.
Sure. We should harness enthusiasm and channel it in the right direction.
I'm not sure I agree the giants of today are built on the work of enthusiastic novices. Amazon and Microsoft have always had a ton of senior talent. Meta started with novices but then a lot got reworked by more experienced people.
You might get by with sheer enthusiasm and no experience but often that leads you to failure.
This is a bit disingenuous, don't you think? There's technically a spectrum from low-level to high-level code, but in practice it's not too difficult to set a limit on how far down the stack you're willing to go while building. Writing a new testing framework is qualitatively different from writing a new OS or filesystem, and you know it just as well as everyone else does.
I was trying to make a point... Apparently not very well ;)
But let's take on the testing framework question. When I work in Go or in Python I use the testing frameworks that are part of that ecosystem. When I worked with C++ I'd use the Boost testing framework. Or Google's open source testing framework. Engineers I work with that do front-end development use Playwright (and I'm sure some other existing framework for unit tests).
Can't you do your own thing? Sure. You'd be solving a lot of problems that other people have already solved. Over time the thing you did on your own will require more work. You need to weigh all of that against using something off the shelf. Nine times out of ten, most people should use tooling that exists and focus on the actual product they want to build.
That said I work for a large company where we build a lot of additional in-house tooling. It makes sense because: a) it's a specialized domain - there's nothing off the shelf that addresses our specific needs. b) we are very large and so building custom tools is an investment that can be justified over the entire engineering team. We still use a ton of stuff off the shelf.
I don't see what the parent was describing. I think most of the time people choose to use existing bits for good reasons. When I started my career you pretty much had to do most stuff yourself. These days you can almost always find a good off-the-shelf option for most "standard" things you're trying to do. If you want to write your own testing framework for fun, go for it. If you're trying to get something else done (for business or other reasons) it's not something that usually makes sense. That said, it's not like we have a shortage of people trying to do new things or revisit old things, for fun or profit. We have more than ever (simply because we have more people doing software in general than ever).
It just feels different in software development because things have moved very fast, I'd say especially when GitHub rose to prominence. The amount of software developers on the market has also increased exponentially since then, so the amount of (relatively) junior developers is much higher than those of 15, 20+ years of experience.
The number of software developers has maybe doubled in the last 20 years. The number of senior developers has "always" been low, because the field suffers from unusually high attrition. Many people find that software is not for them, many switch fields after losing their jobs in an economic downturn, some move to management, and some make too much money to bother continuing until retirement age.
I'm reasonably sure that this estimate is far off the mark. The numbers I've seen suggest that the number of new software developers entering the industry has doubled every five years since at least the mid 90s. That's not the same metric as total number of developers, but it may as well be, and it definitely doesn't add up to a mere doubling of the total in twenty years.
Has there actually been attrition? Exponential growth is enough to explain "many more juniors than seniors" at any time in the past, present or future.
Also for attrition to be the cause, you'd need a lot more seniors dropping off than juniors.
None of my friends who graduated with me are still software developers and I’m several years from retirement age.
There’s a bunch of filters. Many people quickly realize they don’t enjoy development; next come openings in management. One of the big ones is that at ~40 you’re debt free, have a sizable nest egg, and start thinking: do you really want to do this for the next 20 years?
A part of this is the job just keeps getting easier over time. Good developers like a challenge, but realize that the best code is boring. Tooling is just more robust when you’re doing exactly the same things as everyone else using it, and people can more easily debug and maintain straightforward code. So a project that might seem crazy difficult at 30 starts to just feel like a slog through well worn ground.
Having significant experience in something also becomes a trap as you get rewarded for staying in that bubble until eventually the industry moves on to something else.
I recently hit thirty years of professional software development, in companies large and small, for-profit and non-profit, proprietary and FOSS. I have led teams of forty and sat in a corner as the only developer, and one thing I know in my bones: I love making software, and money just means I get to code what I want instead of what The Man wants.
In fact I already have my retirement planned - a small flat in Argostolli, a walk down to the coffee bars on the harbour, and a few hours adding code and documentation to a FOSS project of my choice before heading to the beach with the grandkids.
Now, affording retirement might be interesting, but not having coding in it would be like not having reading and writing.
You're probably from a privileged environment such as working in the US (probably in a top location) and probably from a top university or you were there at the right time to join a top company as it grew rapidly.
The first paragraph probably applies to 1-10% of developers worldwide...
The only part of that that applies to my friends is living in the US. Programming pays well relative to the local area just about anywhere, even if the absolute numbers are less extreme.
I also don’t mean early retirement. Still, combining minimal schooling, high demand, reasonable pay, and the basic financial literacy that comes from working with complex systems adds up over time.
The exponential growth has been something like 3-4%/year, or 2x in 20 years. Though it's hard to find useful statistics that take different job titles and changing nature of the industry properly into account.
If you had asked me in 2010, I would have said that the median software developer lasts 5-10 years in the industry. A lot of people left the field after the dot-com bubble burst. The same happened again on a smaller scale in the late 2000s, at least in countries where the financial crisis was a real-world event (and not just something you heard about in the news). But now there has been ~15 years of sustained growth, and the ratio of mid-career developers to juniors may even be higher than usual.
Heavy use of macros could be why C went mainstream.
Macros gave C efficient inline functions without anything having to be done in the compiler.
Doing things like "#define velocity(p) (p)->velocity" would instantly give a rudimentary C compiler with no inline functions a performance advantage over a crappy Pascal or Modula compiler with no inline functions, while keeping the code abstract.
And of course #if and #ifdef greatly help with situations where C does not live up to its poorly deserved portability reputation. In languages without #ifdef, you would have to clone an entire source file and write it differently for another platform, which would cause a proliferation due to minor differences (e.g. among Unixes).
Ah, speaking of which; C's #ifdef allowed everyone to have their own incompatible flavor of Unix with its own different API's and header files, yet get the same programs working.
An operating system based on a language without preprocessing would have hopelessly fragmented if treated the same way, or else stagnated due to discouraging local development.
Thanks in part to macros, Lisp people were similarly able to use each other's code (or at least ideas) in spite of working on different dialects at different sites.
You're quite right in that early C compilers were primitive, and adding a macro processor was a cheap and easy way to add power.
Using the macro preprocessor to work around some fundamental issues with the language is not what I meant.
I meant devising one's own language using macros. The canonical example:
#define BEGIN {
#define END }
We laugh about that today, but people in the '80s actually did that. Today's macros are often just more complicated attempts at the same thing.
The tales I hear about Lisp is that a team's code is not portable to another team, because they each invent their own macro language in order to be able to use Lisp at all.
To be fair, I'd rather type BEGIN instead of ??< or whatever the trigraph is supposed to be. We tend to forget that a lot of computers didn't have the keys to type "mathematical" symbols.
EBCDIC was already dead in the 1980s. Nobody ever used the trigraphs except for one company that hung on for years until even the C++ community decided enough was enough and dumped them.
No because even if I could identify a benefit to these macros (which I can't in the contexts in which I work) there's a cost to using them.
Macros which simply transliterate tokens to other tokens without performing a code transformation do not have a compelling technical benefit - only a non-technical benefit to a peculiar minority of users.
In terms of cost, the readability and writability are fine. What's not fine is that the macros will confuse tooling which processes C code without necessarily expanding it through the preprocessor - tooling like text editing modes, identifier cross-referencers, and whatnot.
I've used C macros to extend a language with constructs like exception handling. These have a syntax that harmonizes with the language, making them compatible with all the tooling I use.
There's a benefit because the macro expansions are too verbose and detailed to correctly repeat by hand, not to mention to correctly update if the implementation is adjusted.
Rust was started in 2006 and launched publicly, I believe, in 2009, the same year as Go. The point stands that these are still fairly new, but it’s not nearly that new.
Rust 1.0 was released in 2015 making it almost ten years old.
Rust, unlike Go, was largely developed in public. It also changed significantly between its initial design and 1.0, so it feels like "cheating" to count pre-release versions.
That's right. One of the knocks on those early versions was that every new release broke previous code in significant ways. Which is one reason that v. 1.0 was so important to the community. They could finally commit code using a stable language.
Weird, cargo and crates.io is why I ended up deciding on Rust for developing finl rather than C++. The lack of standardized build/dependency management in C++ was a major pain point.
Just because there is an absolute shitshow for C/C++ build systems doesn't automatically make Cargo & Crates.io good.
There is a fundamental philosophical disagreement I have with the NPM style of package management and this method of handling dependencies. Like NPM, Crates.io is a chaotic wasteland, destined for a world of security & license problems and transitive dependency bloat.
But honestly I'm sick of having this out on this forum. You're welcome to your opinion. After 25 years of working, with various styles of build and dependency management: I have mine.
I wasn't disagreeing with you. My comment was implying that cargo (and arguably rust itself to some extent) was kind of a knee-jerk response to the insane parts of C/C++, for better and also for worse.
Well "elders" are the people who have been there for the most amount of time, so if the industry has >30 year veterans wandering around then the elders will have around that much experience. But the learning in an industry is generally logarithmic where most of the lessons get picked up in the first 1-3 years and then after that there are only occasional new things to pick up.
If anything software seems to be unusually favourable to experience because the first 5 years of learning how to think like a computer is so punishing.
I've been there, on both sides, with homebrew ideas pushed from up and down, some that worked nicely, and some that were complete disasters...
And I agree with you. The problems with third party dependencies are way worse than any in-house complete disaster.
But that happens almost certainly because everybody is severely biased into adding dependencies. Make people biased into NIH again, and the homebrew systems will become the largest problems again.
In the last two projects I have worked on, I have been lucky to work with great younger developers who neither invent things from scratch nor insist on pulling in exotic dependencies.
We have used mainstream technologies like Quarkus or Spring Boot, and plain React with TypeScript, with the absolute bare minimum of dependencies.
I have worked with a number of good devs over the years, but it is amazing how productive these teams have been. (Should probably also mention that we were also lucky to have great non-technical people on those teams.)
"because everybody is severely biased into adding dependencies"
When I ask ChatGPT to show me an example of something with JavaScript and Node, it always brings me a solution that needs an external library. So I have to add "without external dependencies" - then it presents me a nice and clean solution without all the garbage I don't need.
So apparently adding yet another library seems normal to many people, otherwise this behavior would not replicate in an LLM.
Same. My standing system prompt for Claude is “do not suggest any 3rd party libraries unless I ask for other options.”
Python perhaps isn’t quite as bad as JS in this regard, but people still have a tendency to pull in numpy for trivial problems that stdlib can easily solve, and requests to make a couple of simple HTTP calls.
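To make that concrete, here's the sort of trivial case I mean, stdlib only (the numbers are made up, obviously):

    import statistics

    readings = [2.5, 3.1, 2.8, 3.4, 2.9]

    # No numpy needed for basic descriptive statistics
    print(statistics.fmean(readings))   # arithmetic mean (Python 3.8+)
    print(statistics.stdev(readings))   # sample standard deviation
    print(statistics.median(readings))  # median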
Fascinating. What sort of web requests are you doing that it's not just easier to use requests? I use requests pathologically, though I use things in the stdlib when it comes to it. Personally I'm more surprised that it isn't in the stdlib by this point.
It's not a matter of easier, it's that I'm against adding dependencies when they're not meaningfully adding value. It's not that hard to use urllib.request if you just need to pull down some file in a script, or fire a web hook, etc.
If you need connection pooling, keep-alive, or any of the other features that requests adds, then sure, add it.
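For anyone wondering what the urllib.request version of those two cases looks like, a sketch (placeholder URLs, obviously):

    import json
    import urllib.request

    # Pull down a file: one call for a one-off download.
    urllib.request.urlretrieve("https://example.com/data.csv", "data.csv")

    # Fire a webhook: a POST with a JSON body is only a few more lines.
    req = urllib.request.Request(
        "https://example.com/hook",
        data=json.dumps({"event": "deploy_finished"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status)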
My point was, that in the js/node universe - the extreme stances seem to be the default already.
(Unless the LLM is heavily biased towards certain code, maybe because blog entries promoting their library got too much weight. But looking at a random JS source of a webpage, they do mostly ship tons of garbage.)
I have seen this argument made many times, but none of the examples used illustrated it properly. Past a certain point, one becomes suspicious of the argument itself.
Have you ever wondered why padStart is part of the standard library?
You are unaware of a core part of JavaScript history, which is why you don't understand why "I'm not importing a library to do left pad" is not only a proper example, but THE BEST example.
The left-pad incident was a problem with the build toolchain, not a problem with using a dependency. String padding is one of those fiddly things that you have to spend a couple of minutes on, and write 4–5 tests for, lest you get an off-by-one error. It makes perfect sense to bring in a dependency for it, if it's not available in the standard library, just as I might bring in a dependency for backprop (15 lines: https://github.com/albertwujj/genprop/blob/master/backprop.p...). My personal style is to reimplement this, but that doesn't mean it's foolish or unjustified to bring in a dependency.
It is, however, almost never justified to bring in a dependency for something that's in the standard library. The correct solution for that, in JavaScript-for-the-web, is a shim. left-pad is not a suitable example.
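To make the "couple of minutes and 4-5 tests" concrete, here's the whole job sketched in Python (illustrative only, since the actual discussion is about JavaScript's padStart):

    def left_pad(s: str, width: int, fill: str = " ") -> str:
        """Pad s on the left with fill until it is width characters long."""
        if len(s) >= width or not fill:
            return s
        pad_len = width - len(s)
        # Repeat the fill string, then cut it to exactly the pad length.
        return (fill * pad_len)[:pad_len] + s

    # The handful of tests that catch the off-by-one traps:
    assert left_pad("7", 3, "0") == "007"
    assert left_pad("abc", 3) == "abc"       # already at width
    assert left_pad("abcd", 3) == "abcd"     # longer than width
    assert left_pad("5", 4, "ab") == "aba5"  # multi-char fill truncated
    assert left_pad("x", 3, "") == "x"       # empty fill is a no-op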
This depends on the language. For C/C++ without a package manager, people hesitate to add external dependencies (especially when high performance is needed, e.g. game engines).
It surely does, but I think in most languages it is a bad idea to add an external dependency where it isn't needed. In the cases I mentioned, it was just standard stuff, already covered by the browser's standard libraries.
So when I just need a simple connection to a WebSocket, I don't need that external baggage. But for high-performance graphics, yeah, I won't write all the WebGL code by hand, but use Pixi (with the option of writing my own shader where needed).
I think this depends a lot on whether you're already using high level languages and lots of external libraries vs doing lower level programming using something like C/C++. I managed a large dev team in a Microsoft shop and it would never have occurred to anyone to ever create their own compiler. Even the most experienced programmers would have just continued to brute force things atop .NET's compiler until it eventually "worked". The result, combined with esoteric and poorly understood business requirements, was fragile spaghetti code few could parse for bugs or updates, but it was still several layers above the compiler.
This attitude is by far the most common among "enterprise developers", and it's one of the big differences between people building things from preexisting building blocks and (as witnessed during my 8 years at Google) people who think they're smart enough to build everything from the ground up, and do so, using primitive blocks and custom compilers created by similarly hubristic engineers who came before them.
Ymmv, but this has been my experience over the past 25 years.
To be fair, expression trees offer a nice capability: write your mini-DSL, map it to expressions, and then compile it.
It’s just an uncommon attitude in most enterprise teams; it has less to do with the language and more to do with the part of the industry. I wish more teams knew the tools they already have at their disposal.
It's people trying to generalise some rule over the wrong thing. The right thing is that, in both directions, how the project goes is simply a skill question.
You have unskilled, sloppy developers? The homebrew project AND the third-party integration will turn out a mess.
I've spent most of my career in the infrastructure space and I agree with this so much. These days prevailing wisdom is just to use 20 off the shelf open source components and spend your entire day debugging YAML integrations. I think we've lost our minds a bit because of this prevailing wisdom that building a simple wheel that does the 10% of this you actually need is somehow self-indulgence or negligent or both.
Strong agree here. I tend to try as hard as I can to write as much as I can in house so that when shit hits the fan, I have a great chance of being able to do something about it.
Shelling out to an AST parsing library that happens to be slow? Well, shit, that sucks. Guess your compiler's just slow now.
The argument is not between NIH and external deps. The argument is over needless complexity and brittle unreliable bits (which can come through either channel) vs keeping things simple.
In my experience, younger developers will push both (in-house and external) directions at once, actually. Building out complex edifices with sharp corners over a maze of transitive dependencies that few understand.
It's the same thing: A fantasy that a framework will solve the problem, combined with a fantasy that they can develop said framework. It's an urge we all suffer from but some of us have learned the hard way to be careful about. (And others who are great at self-promotion have been rewarded for it by naive investors and managers.)
This kind of thing admittedly isn't as pervasive in the last decade as it was the two before, so if you've only been a dev since 2014 you may not have seen it. The old people like me will get it tho.
Yes. And one of those vaunted differences between 'senior' and 'medior' is knowing the difference.
Because I can confirm what you said: Both experiences are real.
Brewing it up yourself can blow up in your face.
Reaching for external deps that solve the problem can blow up in your face.
Knowing which choice to make is _tricky_ and is hard to confirm. It doesn't sit well with programmers; either solution will _work_ (you can't write a unit test that 'fails' if you made the wrong choice here), and even if you're willing to accept highly suspect, Goodhart's law-susceptible metrics such as LOC, you still can't get anywhere because it's trading off more code you have to write and maintain without help from a larger community against having fewer lines _in total_ as part of the system.
I do not know of any way to do it right other than to apply a ton of experience. And it's really hard to keep yourself honest. Even if you're willing to wait 5 years and then spend some time looking back, how do you really know?
Anybody with a bunch of experience has seen enough homebrew stuff asplode in their face to be able to paint a picture with how utterly badly that choice could go. If you chose the 'build it on external deps' route you can easily tell yourself you did it right by painting a terrible picture of how it would have gone if you made the other choice.
But the reverse is just as true.
I think I'm really good at it. But, writing about it here, I don't have any real basis to make that claim. I look around at other dev shops that make products of similar complexity and it feels like they need 10x to 100x more resources, have more downtime, and have far larger dev teams. But no doubt bias is creeping in there too, and no 2 software products are 100% comparable in this sense.
I naturally trust homebrewers more because they tend to understand complex technical things better. Someone who can just glue libraries together is lost when I ask them to fire up a debugger and figure out why some interaction is not working. A hopeless NIH sufferer needs to be 'supervised' and their choices about what to write needs to be questioned, but, that's doable with supervision. "Just git gud and be technically proficient" not so much. But then maybe that's bias too - that leads to a codebase that is easier navigated when you're familiar with debuggers and reading code to understand it. Reaching for third party deps a lot leads to a codebase that is easier navigated when you're familiar with docs and tutorials. These are self fulfilling prophecies.
Been in software 25 years and was board design for telecom before that.
Vaunted is a vanity word.
I get what you mean based upon experience and I still prefer custom systems that eschew layers of syntax sugar. EE was way harder than managing a code base. The syntax patterns are of a finite set of values.
I won’t re-roll encryption libs, but there’s a lot of “tooling” packages that just add syntax sugar to parse and cart around, which came about in prior eras of sneakerware software and are baked into development habits that no longer make sense.
I for one am excited about using ML to streamline the code base that comprised my preferred Linux system yet still builds to the usual runtime system. There’s a lot of duplication in code that models can help remove and a few tools can help unpack into machine state. EE brain informs me there’s no “code” in a running system. Just electricity. There’s way too much syntax sugar in the software ecosystem that’s just for parsing/marshaling/transpiling between syntax sugars. Bleh. It’s a big dumb monolith of glyph art that needs to be whacked back like a prairie that needs a controlled fire.
Well, foreign projects communicating with each other is always ground for a mess, but this is not an either-or question.
Also, your mileage may vary based on the niche you are working on - in the case of, say, Java, the initial setup of the build system may not be "fun", but it will just work from then on.
The worst trash fires were the homebrewed systems, but maybe that's because I could dig in and see how bad they were.
But I'd actually agree with you - as bad as those were, I'd rather them than a shitty 3rd party something. At least I can theoretically do something about the in-house one, and, all the ones I've seen were smaller in scope than any SaaS product.
There’s a flip side to this, which is building on top of something which “solves” your problem but isn’t actually suitable. It “works”, but often at the expense of someone else. A great example is Homebrew and GitHub. Or making a shim between you and something that solves 80% of your problem, rather than solving the problem yourself.
The mark of a 10x engineer, IMO, is getting the build vs. buy question right consistently. My experience is that teams often get it wrong in both directions.
> even when there are better things that already exist
That's a "big if".
Lots of times what's there is a nightmarish tangle of technical debt left by previous greenfield devs. The dev who gets to maintain and evolve this dreck is the sucker, scapegoated for ever slower development.
Canonical example: on-call AWS engineers working hellish overtime to close tickets on one of AWS's many terribad fragile codebases.
I have only seen this problem in elder devs. Some people simply seem to believe they are selected by god to hand out their frameworks for poor juniors to be forced to work inside. Sometimes they are just founding engineers who were the only devs in a startup, or one of five.
These senior devs often quit or are fired and leave the rest of the developers with their “good ideas”.
I have never seen a junior taking on something new that is inherently huge and complex. I have seen them go overboard with refactoring, because someone tricked them into thinking the Boy Scout rule is good, or that DRY is important, or that they need to think ahead and abstract/generalize for the future. Inevitably that is something they were “taught” by senior colleagues or teachers.
A corollary to this is the pandemic-level phobia of NIH. A lot of developers really seem to prefer janky, undermaintained third-party libraries with huge APIs over a quick home-made hack that solves exactly the problem your team has, and that you can maintain and test and just know everything there is to know about. Building your own stuff is good. It is the business we are in.
I think what you're calling "elder devs" is actually the "intermediate" devs. They're not junior in any sense if they're capable of (and allowed to be) creating these huge balls of mud we're referring to. And the elders normally have seen way too much to fall into that trap... and definitely don't quit often like you're describing (my experience is that the younger you are, the more often you change jobs - which is good for you, as it's been shown this is the best way to get a good paycheck, but bad for employers, of course). They're tired of that constant churn and have had more than enough time to find a place where they're comfortable.

The OP is likely talking about those devs as well, but from the perspective of someone who probably is truly senior and has been doing this at least since the 90's. Basically, they're talking about the devs who know just enough to be dangerous (some will enter this stage at around 3 years of experience, others may stay in it from 5 to 20 years, so it's difficult to pin them down as a neat group), just as you're doing - but to you, those devs look senior as well.
> A lot of developers really seem to prefer janky, undermaintained third party libraries with huge APIs over a quick home made hack to solve exactly the problem your team has
Sometimes it’s not even that they’re janky and undermaintained, it’s just the huge and unnecessary API. A good example is watching for file changes. inotify has been around forever, and is easy to reason about. The Python library inotify_simple [0] just wraps that. That’s it. It works extremely well, has no dependencies of its own, and provides nothing else. I once needed this functionality for a project, and had another teammate argue we should use watchdog [1] instead, because it had more stars, and more frequent commits. It took me longer than I thought it would to explain that sometimes, projects are complete and don’t need commits, and that we didn’t need or want any of the additional complexity provided by watchdog.
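For reference, this is more or less the entire usage surface of inotify_simple (Linux-only, since it's a thin wrapper over inotify):

    from inotify_simple import INotify, flags

    inotify = INotify()
    wd = inotify.add_watch("/tmp", flags.CREATE | flags.MODIFY | flags.DELETE)

    # Blocks until events arrive, or until the timeout (in ms) expires.
    for event in inotify.read(timeout=5000):
        print(event.name, [f.name for f in flags.from_mask(event.mask)])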
Another example is UUID generation. Python doesn’t yet natively do UUIDv7 generation, but if you read their source code and the RFC for UUIDv7, it’s fairly easy to write your own implementation. This was met with “please don’t write your own UUID implementation; use a library.” Baffling.
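For the curious, here's roughly what I mean - a minimal sketch straight from the RFC 9562 field layout (monotonicity tweaks that a library might add are omitted):

    import os
    import time
    import uuid

    def uuid7() -> uuid.UUID:
        """Minimal UUIDv7: 48-bit ms timestamp, version, variant, 74 random bits."""
        unix_ts_ms = time.time_ns() // 1_000_000
        value = (unix_ts_ms & ((1 << 48) - 1)) << 80                     # timestamp
        value |= 0x7 << 76                                               # version 7
        value |= (int.from_bytes(os.urandom(2), "big") & 0xFFF) << 64    # rand_a, 12 bits
        value |= 0b10 << 62                                              # RFC variant
        value |= int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # rand_b, 62 bits
        return uuid.UUID(int=value)

    print(uuid7())  # time-ordered: IDs generated later sort later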
> This was met with “please don’t write your own UUID implementation; use a library.”
I agree with this one and I’d push back on it too.
In my experience, what will happen is you write your own UUIDv7 implementation because the language or stdlib doesn’t support it yet. Then the language eventually supports it, but this project is still using your homegrown implementation that is slightly different than the language’s implementation in an incompatible way. Lots of code is bound to your custom implementation and now it would be too much effort or cost to swap out your custom implementation for the standard one.
So now this project has a caveat: yes, it uses UUIDv7 but not the official UUIDv7 implementation, and you have to work around this land mine for the rest of the life of the project. All of these small “innocent” caveats like this add up and make working in projects like this a miserable experience.
That’s a fair point, but it’s also not a complex class, and could be extended to be compatible, such that swapping out for stdlib is a single import statement.
For larger or more complex problems, especially those that don’t have an RFC behind them, I can definitely see the unwillingness to do so.
To drop an anecdatum into the fray, I am an older dev and I tend to see this from younger devs because, frankly, the elder devs are too tired and semi-burnt out from trying to stop the younger devs creating nonsensically pure houses of glass built on the Wise Words of The Bloggers using Technique du Jour where everything takes 4x as long and results in the most fragile and complex sugar-spun castles which collapse into unmaintainable slop after the first contact with enemy (customer) fire.
(I may or may not be bitter about this from previous jobs)
> Lots of developers will find it much more interesting, challenging, rewarding and just plain fun to develop something from scratch, even when there are better things that already exist.
This is true; but after enough years in the industry you learn to correlate success with laziness. This is well-discussed and arguably obvious, but on an emotional level it takes a long time to fully sink in. We were all once developers with outsized ambitions and the awareness that we could flee to greener pastures.
On the flip side there are many engineers who are so afraid to build anything that seems even a little bit difficult you end up with a million dependencies for things that could have been 100 lines of code.
A common thing to hear in frontend these days is “you’re just rebuilding X” or “you mean you want to rebuild X”, where X is some trivial flavor-of-the-month library.
All of the famous tools, databases, and frameworks you see today were built by someone who said they could do it better, and then they built a community around it.
I'm working with SBOMs. One fun side effect is that you can scan SBOMs for vulnerabilities. Suddenly hackers, your customers, and your competitors start to do this, and you need to make sure your third-party dependencies are updated.
This reveals the cost of dependencies (which is often ignored).
I hope that in the future we will have a more nuanced discussion on when it's okay to add a dependency and when you should write from scratch.
> In software development it's pretty important to know when to build "on top" of something else, and when to start from scratch.
Building some brownfield CRUD for a run of the mill org? Starting from scratch will almost always go horribly, just pick whatever enterprise'y solution fits the task at hand and be done with it.
Working for one of the big orgs on something interesting, and have the backing needed for being able to throw person-years at a problem until it crumbles under the collective engineering effort? Building from scratch might be a good choice sometimes.
Personal learning projects, side projects and the like? If you won't have to maintain it long term or at least don't think you'll have significant amounts of time or effort you can spare for that, then from scratch is okay (your own game engine to learn about the internals? your own implementation of something S3 compatible? maybe your own CMS for the hell of it?), otherwise consider treating it as a brownfield project (e.g. if you want to make and finish a game, or just store some files, or maybe just run a blog where the focus is on the content not how you made the thing it's running on).
What's my reasoning for this? Code is typically written to solve a particular problem. In a business context, that typically means finishing some Jira issues and having deliverables. In large enough open source projects, that typically also means having instructions on how to run and administer the thing, and proper test coverage, given the larger number of people working on it. Thus the bus factor becomes larger, and it won't be as much of a miserable experience of code archaeology as when the dev who wrote some custom CMS for a project at work leaves, and literally only some code is left behind - without even proper CI/CD, no proper comments, no ADRs, no code examples that aren't coupled to the logic, no documentation or even a summary of the project, maybe an empty template for a README, and no decoupling between the technical bits and the business rules (or just tight coupling in general), because again, they only wanted to ship.

And even if they had better intentions, there were still deadlines, and they were still one person (or a small team) that can't compete with any of the large multi-year projects out there.
What you said in the last paragraph is often what happens due to bad management, too. A good developer can be given a task that they barely have time to get done, and as a result the unit tests, the documentation, and even the architecture suffer, or get omitted.
Often in shops where just cranking out new features and/or bug fixing is the goal of management, the software continues to degrade endlessly due to all the things you mentioned, because spending time in those areas isn't something the boss finds a justifiable expenditure of developer time. Once all the developers who originally wrote the code have left or been fired, code quality can deteriorate rapidly until some kind of "cleanup" effort is undertaken, where ZERO new features are created and things are just cleaned up.
In projects with millions of lines of spaghetti code sometimes this cleanup is completely impossible, because a total rewrite would be easier.
> boss won't want to risk losing them so they'll agree to the escapade.
That's precisely it. A motivated engineer is almost always going to outperform a bored engineer/one who quits. Morale is miles more important than chasing after efficiency.
"Boss" here is likely making the correct decision after weighing the ups and downs.
Because at way too many places there are seven junior eng and one senior eng. If they’re lucky.
I’ve been that senior eng and you spend 130% of your time trying to find terrible decisions about to be made before it’s too late and reviewing 1,400-line PRs only to discover (and try to teach) that it could have been 40 lines. Enough junior devs without sufficient supervision can literally crank out endless quantities of negative-value work. And it’s a battle you’re constantly losing.
Lots of times it's just ordinary office politics, or the boss likes one person more than another, or isn't "technical" enough to know when he's being manipulated. Because often managers aren't developers themselves, so they don't know which developer is telling them the best advice, when two developers disagree.
And to be fair, often it is against the senior's judgement, but if it's off the critical path it can be a decent gamble. The hubris of junior engineers accomplishes a lot.
Yes, it's true great developers can 'reinvent' things and do a great job of it, but the problem is that every line of code in a project is an efficiency drag forever moving forward. It always has to be maintained, updated, and managed by someone.
Developers should be measured by how many lines of great code they can delete, not how many lines of great code they create (<-- but don't take this literally of course, it's just making a point)
Right. My "to be fair" is largely similar to "devil's advocate."
The caveat of, "off critical path" is a heavy lift, too.
My view was to give projects a form of risk budget. If possible, do things same way as last time. Any deviation is a risk. Can have rewards, sure. But if there was a known way to do it already, be budgeted to pivot back.
It usually doesn't happen for me, but when it does, it's because the seniors are out of options.
The safe, tried-and-true way to build typical web crap is to stick the one, blessed database in the middle, and then dangle all the dependencies off that. Everything is synchronous because it's simpler, and it's what the seniors grew up with.
And then one day you won't be able to run your new history feature, because it's locking up the database for too long and new transactions are timing out.
The juniors only get to run the show and introduce exotic, non-boring technology (asynchronicity, event-sourcing, eventual consistency, CQRS etc.) after the seniors have admitted defeat.
> The juniors only get to run the show and introduce exotic, non-boring technology (asynchronicity, event-sourcing, eventual consistency, CQRS etc.)
Those are decade-old technologies. So are most of the functional niceties which are somehow finally hyped nowadays, by the way.
The idea that juniors are somehow more capable is as old as the field, and every junior ends up seeing the light at some point.
It generally goes like this: a junior thinks they are smart and have a good solution for an apparently new complex problem. They build it, and it fails because of some unexpected edge case. They reluctantly go to see the senior expert with the problem, who immediately understands the issue, tells them not to do what they just did - the seniors tried this apparently good idea ten years ago - and proceeds to give them a solution which will work. The junior dev finally realises that seniors are just people who have been there long enough to have already made the obvious mistakes, and that they can gain considerable time by just learning from them.
Obviously it only works if your seniors are actually senior, and not slightly older juniors with ego-boosting titles.
> The juniors only get to run the show and introduce exotic, non-boring technology (asynchronicity, event-sourcing, eventual consistency, CQRS etc.) after the seniors have admitted defeat.
lol. This is definitely not my experience. Most problems are pretty boring and can be solved with a judicious use of boring tech. It isn't until you get to a sufficiently large scale where you need to introduce these techniques. And only if they actually further the goal.
Many times the existing system is doing something in a really dumb way because the original authors were organically growing it as they explored the problem domain, and that puts the system into a box of thinking (similar to your LLM spitting out trash because the context got polluted), or there's simply inertia. So the exotic tech is introduced to treat a symptom of the underlying issue, without the crucial step of reconsidering the box the system sits in. The requirements at scale are now fundamentally different, and therefore the solution should be reconsidered.
If the seniors involved completely miss this step then I question the breadth of their experience because this is common when crossing a scale threshold. Being old or having worked at the same company for 10 years doesn't automatically mean someone is truly skilled and could actually mean their experience is extremely limited.
Which part doesn't vibe with you? Are the seniors not stuck in their way of doing things? Or they are stuck in their way of doing things, but never admit defeat?
> So the exotic tech is introduced to solve a symptom to the underlying issue without the crucial step of reconsidering the box the system sits in.
What does this even mean?
> Being old or having worked at the same company for 10 years doesn't automatically mean someone is truly skilled and could actually mean their experience is extremely limited.
Right. And it's really hard to have a discussion with them, because they'll bring the discussion back to seniority, or popularity, or boringness of the technology, or say that the scale issue probably won't happen, or the race condition probably won't happen.
I've seen it other places as well. Film/video post production has phases of the hot editor/colorist/director/etc. Then a new young hotness comes along because people feel the gray beards are too long in the tooth and impossible for them to be hip. Then the gray beards watch the newbie make the same mistakes over and over. It's called getting old. It's the young thinking they're invincible and finding it impossible that the olds could possibly know anything. It's human nature.
I'm in that boat except the ages are reversed and the older guys are constantly trying to build things from scratch. They just refuse to spend any time looking for off-the-shelf solutions and only build on what they know, so we waste time and end up with a crappy result.
Example: For a parallel data processing pipeline they wanted to build a REST interface for submitting "jobs" to a cluster which would parallelize with MPI, instead of just using xarray+dask.
Another example: they wanted to store tabular data product metadata in Postgres, with URIs pointing to NetCDF files on disk, instead of just putting everything inside the NetCDF files.
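For comparison, the xarray+dask version of that kind of pipeline is nearly trivial (file pattern and variable name are made up):

    import xarray as xr

    # Lazily open many NetCDF files as one dataset; dask (needs to be
    # installed) parallelizes across cores with zero MPI or REST plumbing.
    ds = xr.open_mfdataset("data/*.nc", parallel=True, chunks={"time": 100})

    # Nothing computes until .compute(); then it runs in parallel.
    monthly_mean = ds["temperature"].groupby("time.month").mean().compute()
    print(monthly_mean)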
I like to build from scratch, but I prefer to maintain something built on top of something else - less code to maintain, since I can delegate part of my code to the technology it's built on.
For that matter, so that my professional work has a healthy long-term life, I usually select the on-top development style, and sometimes, if I have really good reasons, from scratch is the option. Of course, for learning and personal projects, from scratch is always a very fun choice!
+1 on this. The number one source of bugs at a recent job was a homebrew TLS / HTTP load balancer. First chance I got, I replaced it with nginx and the bug count dropped immediately. With tools like Apache, nginx, HAProxy, and Caddy available, it was pure madness to reinvent that wheel... But the dev wanted open source CV padding...
I've seen this a lot when someone wants to add "workflow automation" or "scripting" to their app. The most success I'd had is embedding either Lua or Javascript (preferably Lua) with objects/functions from the business domain available to the user's script. This is what games do too. I think it's a great way to dodge most of the work. For free you can support flow control, arbitrary boolean expressions, math, etc.
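A sketch of how little it takes, assuming Python and the lupa binding for Lua (the domain function is hypothetical):

    from lupa import LuaRuntime  # pip install lupa

    lua = LuaRuntime(register_eval=False)  # keep the sandbox smaller

    def get_order_total(order_id):
        return 42.50  # stand-in for a real lookup

    # Expose only the business-domain functions you want scripts to see.
    lua.globals().get_order_total = get_order_total

    # The user's "workflow automation" gets flow control, booleans,
    # and math for free - none of it implemented by us.
    script = """
    local total = get_order_total("A-1001")
    if total > 40 then
      return "apply_discount"
    end
    return "no_action"
    """
    print(lua.execute(script))  # -> apply_discount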
I find that the #1 reason people add those "simpler than Lua" homegrown languages in Enterprise is to allow non-programmers to program. This not only has a tremendous cost to develop (compared to something like embedding Lua) but it also creates the worst kind of spaghetti.
One of the most unhinged pieces of software I have ever seen was the one from a fintech I worked with. Visual programming, used by business specialists. Zero abstraction support, so lots of forced repetition. No synchronous function call, so lots of duplications or partitioning to simulate it. Since there were two failed versions, there are three incompatible versions of this system running in parallel and migration from one to the other must be done manually.
The problem is about 90% of the business rules were encoded into this system, because business people were in a hurry. People wanted a report but didn't want to wait for Business Intelligence? Let's add "tags" to records so they appear on certain screens, and then remove them when they shouldn't anymore.
In the end the solution was adding "experts" to use it, but the ones who actually knew or learned any programming would just end up escaping to other companies.
One pitfall that is so obvious it hurts (but I have seen people fall into it), goes a bit like this:
1. We have a python application
2. We need a configuration format, we pick one of the usual (ini/toml/yaml/...)
3. We want to allow more than usual to be done in this config, so let's build some more complex stuff based on special strings etc.
Now the thing they should have considered in step 3 is: why not just use a Python file for configuration? Sure, this comes with pitfalls, as you now allow people who write the config to do similar things to the application itself, but you are already using a programming language - why not just use it for your overly complex configuration? For in-house stuff this could certainly be more viable than writing your own parser.
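A minimal sketch of what I mean (file names arbitrary). The config file is just Python:

    # config.py - the configuration *is* Python, so derived values are free
    base_dir = "/srv/app"
    workers = 4
    log_file = f"{base_dir}/logs/app.log"   # no homegrown templating needed
    queues = [f"queue-{i}" for i in range(workers)]

and the application loads it like any other module:

    # app.py
    import importlib.util

    def load_config(path):
        spec = importlib.util.spec_from_file_location("config", path)
        cfg = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(cfg)
        return cfg

    cfg = load_config("config.py")
    print(cfg.log_file, cfg.queues)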
Because now, anything that wants to read that config has to be written in Python. You've chained yourself to a stack just for a dynamic config. I ran into this issue at a previous job, but with a service that leaned heavily on hundreds of Django models. It made it impossible to use those models as a source of truth for anything unless you used Python and imported a heavyweight framework. It was the biggest blocker for a C++ rewrite of the service, which was really bad because we were having performance issues and were already reaching our scaling limits.
Declarative configs are preferable for keeping your options open as to who or what consumes them. For cases where config as code is truly necessary, the best option is to pick something that's built for exactly that, like Lua (or some other embedded scripting language+runtime with bindings for every language).
I would tread carefully around this (although you know the specifics!).
Simply being tied to one language is rarely a bad thing: at a certain point in a company's growth, having a common language and set of tools (logging, database wrappers, etc.) acts as a force multiplier that goes beyond individual team leads' preferences.
I would be interested in exactly what scaling issues you hit, but if I were financing the company I would ask whether overcoming the scaling problems in Python would cost less and lead to better cadence than a migration to C++.
I’ve worked in several Python shops, and now work with Rust. Python’s performance can be a real cost problem at scale. Where this bit us in the past was with the sheer number of containers and nodes we had to spin up in k8s to support comparatively moderate traffic in a relatively simple web application.
It’s been a while, so take the numbers with a grain of salt, but where we might have needed 10 pods across several nodes to process a measly 100 req/s, we can easily handle that with a single pod running a web application written in Rust, with plenty of room to spare. I suspect some of it is due to the GIL: you need to scale instances rather than threads to get more performance in Python.
Anyway, at some point the cost of all those extra nodes adds up, or your database can’t handle the absurd number of concurrent connections all your pods are establishing, or whatever.
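For anyone who wants to see the GIL effect directly, a rough sketch: the same CPU-bound work barely speeds up on threads, while processes scale with cores.

    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def burn(n):
        total = 0
        for i in range(n):
            total += i * i
        return total

    def timed(executor_cls):
        start = time.perf_counter()
        with executor_cls(max_workers=4) as ex:
            list(ex.map(burn, [5_000_000] * 4))
        return time.perf_counter() - start

    if __name__ == "__main__":
        print("threads:  ", timed(ThreadPoolExecutor))   # roughly serial under the GIL
        print("processes:", timed(ProcessPoolExecutor))  # roughly 4x on four cores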
This can sometimes be a good idea. But it isn't without downsides. Now your config file is Python and capable of doing anything Python can do (which isn't necessarily a good idea), it's no longer safe, you now have to deal with shitty Python tooling, you might have to debug crashes/lockups in your config files, you can no longer switch implementation languages, etc. etc.
Not that this is a magic solution, or even a good one; I just wanted to mention that sometimes you already have the thing you are looking for directly under your nose.
I've never had a project where a TOML config wasn't enough.
One improvement, though, is using Starlark instead of Python directly, since it offers a more lightweight runtime and is designed for safe parallel evaluation.
I've done both. I embedded VBScript/JScript in an app via Microsoft's Active Scripting[1] interfaces and wrote a template language that grew to contain typical programming language constructs.
Looking back it was the VBScript/JScript functionality that caused me the most problems. Especially when I migrated the whole app from C++ to .Net.
Reminds me of the pain of intentionally building a Java 2 (subset) to MIPS compiler by writing out each AST node class by hand. And I did it twice: once in C++03 with bison and flex, and again in Java 2 with CUP and JFlex... each was developed to build and run portably as a host across Solaris (SPARC), Linux (x86), HP-UX (68k), SGI (MIPS), and Windows (x86), with the compiled targets run on the SPIM emulator. It did have dead code, dead string, and dead variable elimination, but that was as far as my optimization passes went. I recall the only build tool I used for each was the portable subset of make, without GNU extensions.
Speaking of reinventing the wheel: in 1998 I built a flexible almost-framework for a "portable" generic installer using Java 2, AWT (native GUI controls), and JNI on Windows to create a program group and desktop shortcut icon. The hilarious part was shipping a full JRE on a CD. It took forever to load, but the additional time seemed impressive for expensive, niche software, in a way similar to the now-fake "loading..." delayed progress bar.
If you have a user flow that requires interaction, hits an API that returns instantly, and then changes something fundamentally, it is better UX to make the user wait for a couple seconds with an animation and then some success state indication. If it all happens at once it can be a bit jarring or confusing.
I get the solution for this and I know what all the terms mean. But I don't understand the problem. Whether it's facetious or hyperbole or whatever, I just don't get who or what circumstances this is addressing.
This is written like a Jeopardy answer. I just don't know what the question is.
Is it just me, or would "single page applications" fit easily in the "examples" section there? Sounds kind of trollish when I write it out, but honestly, it fits pretty well, right? We threw away all the things the browser gives you for free and re-implement the back button, history, etc in JavaScript. (and it's somewhat fractal, as within the frameworks we use to do this you'll frequently see people re-implementing things the framework already does).
> "Our editor is written in C++ and cross-compiled to JavaScript using the emscripten cross-compiler... Pulling this off was really hard; we've basically ended up building a browser inside a browser."
Now sure, Figma's an exception, but it's an illustrative one. For most single-page apps, it's an interesting question. Is the web browser a monolithic platform where if you reimplement any of its layout engines etc. you're reinventing the wheel? Or, is it a set of libraries that can be chosen from at will, that of course happen to all work together to provide sane defaults, but by no means are required or expected to all be put into use simultaneously?
I tend to think of the web platform as the latter. Just because there's something in the "standard library" so-to-speak doesn't mean I'm forced to use it - the real question is whether it's something stable that won't force me and the team to yak-shave to maintain it. Mature JS/TS libraries are no worse than the browser in this regard!
It was probably the right tradeoff to make from 2007 to 2017.
Doing anything in HTTP/1 would make you reload your whole page slowly. Then AJAX came along and allowed you to build things that were more interactive and responsive. Gmail was a game-changer.
Since then HTTP/2 came along, and I feel like the industry has blindly continued on the HTTP/1->AJAX trajectory, without stepping back and re-evaluating how much HTTP/2 (and later) can do for us.
I came to that conclusion recently. The prevailing logic was to ask: how can I maximize readability?
This at first had me rip out everything I used to be proud of. Now I have single documents that do only one thing. I put these in folders that, like the files, have names that describe what they do.
The whole thing looks like I started coding a week ago.
It all just works. It will continue to work. If something ever breaks, it will be only that page. You will be able to see what is wrong instantly. If you paste the page into an LLM, it will most likely guess correctly what is wrong.
I'd mostly agree. SPAs can do things hard to do with URLs and form inputs; for instance, chess would be harder to program in the browser without JS (though I remember an opening explorer which worked that way). Or if you think about popping up a modal or a toast. But a lot of the functionality is duplicated.
This gets even more extreme now that you can have wasm on a canvas... The language that you're compiling from doesn't understand the semantics of a back button either!
SPAs are an inner platform within an inner platform, given that the web environment already is one. Canvas-based web apps are an inner platform within an inner platform within an inner platform.
It's true that some of the Web platform's downsides, rooted in its split identity between being a document library and being an operating system, are kind of similar to this antipattern, if you squint a bit, although they tend not to be as bad because the outer platform is much more robustly engineered than the average enterprise app.
The key difference is that, in the Web platform's case, there's not actually a better alternative on offer. Even with these awkwardnesses, it's still a better app delivery platform than desktop or mobile OSes, because it's dramatically less fragmented, has a more convenient "installation" story (https://xkcd.com/1367/), and has a better security model (at least compared to desktop OSes). So people need to write rich web apps with arbitrary behavior in it, which requires it to be arbitrarily customizable.
Contrast an enterprise app, where the lesson of the "inner-platform effect" idea is that code changes to the outer platform aren't as costly as you think, compared to unmaintainable configuration that interacts in complex ways with the platform primitives. So it's best to allow only customization simple enough to not pose maintainability challenges, and eat the cost of an outer-platform code change whenever you need anything more complicated. But Web developers don't have the option of getting browsers to add new code every time they want to add a complex new feature to their app, so browsers need to support a rich enough set of primitives that those features are already possible.
The other way to resolve the tension would be to get rid of the document-library features and instead double down on being an operating system, perhaps based on WebAssembly and <canvas> instead of HTML+CSS+JavaScript, like Flutter for Web uses. But of course people are using the document library, and in some cases it's the easiest way to do something, even at the cost of a little bit of redundancy at intermediate levels of customization.
What SPA critics typically want, of course, is for most sites to be satisfied with less feature-richness so as to fit more easily into the document-library model. But the platform has to support everyone's use cases, not just those of people who like HN's minimalist style. (I can't find it now, but there was a great comment on HN awhile ago that said something like: "A lot of HN users basically wish the internet was like how it was in the 90s, except with broadband. But in this respect, we're unusual; most users like features and slick UIs.")
> Web developers don't have the option of getting browsers to add new code every time they want to add a complex new feature to their app
We lack a mechanism for picking sane new features. Browsers add new stuff all the time. Most of it is horrible. [Say] Adding a JS assembly has to be the most stubborn way of tolerating new languages. You may do it, as long as the new language is JS!? You can have butter as long as it is yogurt.
I don't like Python. I wouldn't be upset by <script type="python">; add some of the DOM tools to it and people will have a ton of fun. Might even be useful.
That makes total sense. I have been tempted to do this in the past. Fortunately, time and resource constraints kept me to mostly sane, maintainable, performant configurations, until I learned that I would never create the system I wanted, and that it was probably better that I didn't anyway. I guess I've been lucky and didn't even know it.
When writing programs that take other programs as inputs, and/or produce other programs as outputs, it's tempting to treat the program as only slightly more structured than its textual representation.
The problem is that unless your use case is very limited and is guaranteed to stay that way, supporting more and more language constructs will quickly turn your code into a mess.
Compiler design as we learn it (lex/parse, syntax tree, semantic checks, transforms, lowering to codegen) is _the_ solution to the problem of dealing with computer programs as inputs and outputs. Trying to do something less is like solving a dynamic programming problem without knowing dynamic programming: it will only work for a restricted set of inputs.
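Even a toy version of that pipeline keeps the stages separate and testable. A sketch handling only integer + and *:

    import re

    TOKEN = re.compile(r"\s*(\d+|[+*()])")

    def lex(src):
        pos, out = 0, []
        src = src.rstrip()
        while pos < len(src):
            m = TOKEN.match(src, pos)
            if not m:
                raise SyntaxError(f"bad input at {pos}")
            out.append(m.group(1))
            pos = m.end()
        return out

    # expr := term ('+' term)* ; term := atom ('*' atom)* ; atom := INT | '(' expr ')'
    def parse(tokens):
        def expr(i):
            node, i = term(i)
            while i < len(tokens) and tokens[i] == "+":
                rhs, i = term(i + 1)
                node = ("add", node, rhs)
            return node, i
        def term(i):
            node, i = atom(i)
            while i < len(tokens) and tokens[i] == "*":
                rhs, i = atom(i + 1)
                node = ("mul", node, rhs)
            return node, i
        def atom(i):
            if tokens[i] == "(":
                node, i = expr(i + 1)
                return node, i + 1  # skip ')'
            return ("int", int(tokens[i])), i + 1
        node, i = expr(0)
        if i != len(tokens):
            raise SyntaxError("trailing tokens")
        return node

    def evaluate(node):  # the "transform" stage; codegen would walk the same tree
        if node[0] == "int":
            return node[1]
        lhs, rhs = evaluate(node[1]), evaluate(node[2])
        return lhs + rhs if node[0] == "add" else lhs * rhs

    print(evaluate(parse(lex("2 * (3 + 4)"))))  # 14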
I've come very close to being the "sir" mentioned in the article (for hobby stuff only, though); luckily I always caught myself and was able to stop before it was too late. I decided at some point that I do not want to build a compiler/transpiler, and if I do some day, I want it to be a conscious decision and not an accident like in the article.
It starts innocently, e.g. doing some template files and replacing some simple values, then you start to have to do more replacements and more "smart" parsing and then at some point it's too late, as the article suggests.
TBF, I did put together a transpiler from PHP to JS, but I didn't build it, just found the different pieces that luckily fit together and hacked around it enough that it could run in the browser.
Indeed, the safer thing is to actually build a few toy compilers on the side so you can get a sense for what they are good for, and what level of effort is required to build and maintain one.
Keeping them locked up in the "scary CS" closet only ends up stunting your growth.
I like to write toy compilers or interpreters as an exercise when learning a new language. Usually for a Forth or Lisp or one of Turing Tarpit languages. It requires some of the most common bits of programming: I/O, lexing, parsing (both of source and of arguments to the compiler), file handling, and some common algorithms & data structures (can't have an AST without a tree).
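The Forth end of that spectrum is remarkably small. A sketch of the core loop in Python, with just four words defined:

    def forth(src):
        stack = []
        words = {
            "+":   lambda s: s.append(s.pop() + s.pop()),
            "*":   lambda s: s.append(s.pop() * s.pop()),
            "dup": lambda s: s.append(s[-1]),
            ".":   lambda s: print(s.pop()),
        }
        for tok in src.split():
            if tok in words:
                words[tok](stack)
            else:
                stack.append(int(tok))  # anything unknown is a number literal
        return stack

    forth("3 4 + dup * .")  # prints 49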
I'd say the opposite. Building a compiler incrementally, driven by clear needs at each step, as described in the article, tends to work out better than trying to build a compiler from day 1.
I've written 7 compilers. If you don't know you're writing a compiler from the beginning and you've never written one before, it is basically guaranteed to be a hot mess in the end, unless you accidentally make exactly the right call at every one of a dozen critical steps. It's extremely unlikely to just stumble on all the good architectural ideas from a field that is 70+ years old without any planning.
Granted, if you've got a really simple language then maybe there's an upper bound on how bad it can be. But in practice people tend to also want to grow the language as they go.
Don't get me wrong; there's a lot that can go wrong if you over-engineer the compiler from the beginning, including ending up with a hot mess anyway.
BTW if you can't write down, even in three sentences, the requirements for said compiler from the beginning, congratulations, you have no idea what you are doing.
Oh no. What the article describes ends up with a real mess. If you set out to support a certain language you'll have a global view of how things should fit together, and what the implementation should do.
Is there a meta-library someone somewhere has already written for when I just want to write 20% of a compiler, one that more or less takes care of 80% of the common compiler-building things I’m likely to need to do?
LLVM doesn't prevent you from shooting yourself in the foot via a dozen different language details completely divorced from backend implementation. It's only a small, but very important and difficult, aspect of compiler writing. I think languages like Go have demonstrated well that this is of limited benefit to the success of the language.
Racket at least provides most of the tooling required (e.g. dealing with the details of namespacing, syntax, the runtime/compile-time/evaluate-time distinction, continuations, and garbage collection), but if you've ever tried to introduce it into an organization of significant size, this very same yak-shaving result will be considered a liability.
Your criticism is an apples vs. oranges strawman. LLVM IR and libraries provide powerful plumbing. It's not meant to be a turn-key or cookie-cutter solution, or an inherently memory-safe VM or compiler generator. It's also not formally verified, and it's written in C++, which adds project configuration complexity and C++ pitfalls. It's possible to plug in a custom GC using their strategy model, their way, or manage GC yourself, as Pony does because it's built around ORCA.
And you can step up a level in the same ecosystem as well: my current project is written in Scala and directly shares its data structures with the Graal Python interpreter that I've embedded inside it, so other people can write the stuff they want in Python.
Having recently built 90% of a compiler by mistake, I felt like this post was written specifically about me. Hilarious writing, congrats to the author.
There is also the opposite, Ed Kmett is on record as saying that he had a million ideas for his personal programming language, and all of the cool stuff it was going to do, but then he ran into Haskell and said “Let’s be real, whatever amazing language I make isn't gonna be half as put together as this and I have this one already in front of me...”
Don't build a worse duplicate of what already exists, except as a brief experiment or for learning. For real use, it will just cost money and time to fix and support for no reason.
I don't think building compilers is that bad, tbh. It's very difficult to do this without realizing it.
I've written a dozen different programs that might be considered compilers; some very simple, others very complex and whose life continued once I left the organization. Writing a functional compiler that meets the needs of the organization where existing tooling doesn't takes discipline and focus on what you actually want to accomplish. I don't know what "defining a struct inside a loop" might mean, and this strikes me as, very obviously, having no clue what you actually want to build.
Perhaps the issue is not building a compiler but rather the lack of focus to begin with.
Back in the earlier days of AI, not that early, but the late 80's I was the lead developer for an AI research program being jointly conducted by 3 business professors from MIT, Harvard, and Boston University. We were working on "frame based knowledge representation" - frame of reference based node links between nodes containing something: a number, a word, a sentence, or a "function that combines linked nodes into a new frame of reference".
Long story short, we thought we were making a new type of N-dimensional spreadsheet, but after 3 semesters of work one of the advisors at MIT told us we need to meet his colleague, and that guy informed us we had a working compiler for a hybrid of Lisp and C.
There is also the related joke [I think by Jamie W. Zawinski of Netscape Navigator fame] that any sufficiently complex program eventually will be enabled to send email, and its sister joke that any sufficiently complex program eventually will have an embedded LISP interpreter (Emacs is an example of both phenomena).
Hardware people tend to say "Everything is a machine".
Compiler people tend to say "Everything is a compiler".
Database people tend to say "Everything is a database/relational database management system".
Operating system people tend to say "Everything is an operating system".
There are many broadly-applicable paradigms in modern computer science.
Linguistic abstraction, i.e. the definition of a Domain Specific Language (DSL), is a very powerful technique that is often the right choice (but don't apply it to read an *.ini file! That is called overkill/distraction). Abelson and Sussman's "Structure and Interpretation of Computer Programs" book (SICP) is the gold standard book to teach you the various forms of abstraction.
I've discovered that deploying this quote as a compliment to someone's ad hoc, informally-specified, bug-ridden, slow, half-of-Common-Lisp code makes them increasingly nervous until you get to the punchline.
I know of someone that did this for a bespoke form definition language to drive onboarding. Tens of thousands of lines, months of delays, and a bus factor of 1 later it was all eventually ripped out and replaced with plain old page templates. When your 10 question onboarding flow has a back-end class named “PredicateEvaluator” something is wrong.
So, we should build more compilers. The only limiting factor, I think, is that it's damn hard to design a good DSL that wraps your domain well and is neither too flexible (increasing boilerplate) nor too rigid (increasing workarounds and escape hatch usage).
But generically, a compiler is the exact kind of thing you want when you're doing "Take this data structure and transform it into this other data structure". In a traditional compiler, we usually deserialize the first data structure from a string (parsing), call that data structure a CST, validate the data structure (syntax & type checking), do the transform, then serialize the output.
This kind of validate and transform pattern is all over programming though. And it's pretty easy to test with things like property tests. So yeah, we should build little compilers more as abstraction boundaries in our code.
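For example, a round-trip property over a toy serialize/parse pair, sketched with the third-party hypothesis package:

    from hypothesis import given, strategies as st

    def serialize(pairs):  # toy "backend": dict -> "k=v;k=v" text
        return ";".join(f"{k}={v}" for k, v in pairs.items())

    def parse(text):       # toy "frontend": text -> dict
        if not text:
            return {}
        return dict(item.split("=", 1) for item in text.split(";"))

    keys = st.text(alphabet="abcdef", min_size=1)
    vals = st.text(alphabet="0123456789", min_size=1)

    @given(st.dictionaries(keys, vals))
    def test_round_trip(pairs):
        assert parse(serialize(pairs)) == pairs

    test_round_trip()  # hypothesis drives this with many generated inputs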
I had this kind of risk in mind when I wrote a server-side "HTML template" feature for Racket.
The template language intentionally only handles static chunks of HTML, escaping of values, and a few small safety guards (a loose Python analog is sketched at the end of this comment).
Everything else (including the usual template language behavior like iterating over a collection/stream, such as from a database query result) is done with arbitrary normal Racket language, which the template feature's implementation doesn't have to know about nor handle specially.
More recently (for employability reasons, or under-resourced startup pragmatics), doing Python with Flask, JavaScript with SvelteKit, and Swift with SwiftUI, I still miss the clean simplicity and available power that I had with Scheme/Racket.
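A loose Python analog of that design (not the actual Racket API), just to show how small the "template language" itself can stay:

    from html import escape

    HOLE = object()  # marker for a value slot between static chunks

    def render(chunks, values):
        # The template knows exactly two things: static text, and escaped
        # holes. No loops, no conditionals, no expressions.
        values = iter(values)
        return "".join(escape(str(next(values))) if c is HOLE else c
                       for c in chunks)

    # Iteration and data access stay in ordinary host-language code:
    names = ["Ada", "<script>alert(1)</script>"]
    items = "".join(render(["<li>", HOLE, "</li>"], [n]) for n in names)
    print("<ul>" + items + "</ul>")  # the injection attempt comes out escaped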
Right? It generates scripts which are then executed to do builds.
Debugging broken Yocto builds can be a nightmare.
You end up stepping into the build-directory environment and manually running the generated code. You try some fixes in it, and then guess which fragment in which bitbake file to backport that fix into.
I am not clear on why reaching for an existing compiler's AST would ever be top of the list?
Don't get me wrong. I think many language design points should be used more. But starting from scratch makes a ton of sense. Skip the parsing stage and build up your own supported AST-style constructs.
Done simply, this is basically the command pattern: keep execution separate from declaration and you should be fine? (See the sketch at the end of this comment.)
Sure, you may want a parser for a dedicated serialization language some day. Hard to think you need to start there?
But starting with the full AST of an existing language feels like a terrible idea. In any world.
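Concretely, something like this (a sketch; all the command names are invented for illustration):

    from dataclasses import dataclass

    @dataclass
    class SetVar:
        name: str
        value: object

    @dataclass
    class Log:
        template: str

    @dataclass
    class Repeat:
        times: int
        body: list  # nested commands

    def run(commands, env):
        # Execution is one walk over plain data; declaration stays separate.
        for cmd in commands:
            if isinstance(cmd, SetVar):
                env[cmd.name] = cmd.value
            elif isinstance(cmd, Log):
                print(cmd.template.format(**env))
            elif isinstance(cmd, Repeat):
                for _ in range(cmd.times):
                    run(cmd.body, env)

    # Declared up front, executed later; no parser until you actually need one.
    program = [SetVar("user", "ada"), Repeat(2, [Log("hello {user}")])]
    run(program, {})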
So what do you use to know whether you need to build it yourself or whether there is already something out there? Not being able to find a tool for the problem does not mean it doesn't exist, just that you haven't found it. Especially when you lack the familiarity with the problem to know the correct keywords.
I find ChatGPT to be of great help to explore the area and find relevant keywords or the name of the research domain. Sometimes you really need to know exactly what you are looking for before you can find the link to that one super helpful GitHub library that solves your problem. Then of course the next step is figuring out whether you want to take on the dependency or not...
I have wasted hours searching for an (analytical) inverse kinematics library for robotic arms. There are tons of slow non-analytical libraries out there, and some horrible ones like ikfast, which is effectively a code generator that spits out C that can be compiled with Python bindings. I eventually did find https://github.com/Jmeyer1292/opw_kinematics, which someone ported to Rust (for which it was easy to create Python bindings).
A point to note here is that even if you're working on a software system that already _is_ a compiler, you might still find you're building a small, different compiler somewhere else within that project.
Many older .NET applications saved programmers from this by providing "C# scripts". The framework includes the compiler and then it's trivial to use the compiled artifact. You can still do it by including the Roslyn libraries. I don't see it as much anymore, or it's some half-baked Python or Lua interface.
The new Roslyn incremental generator API is pretty good these days but not well documented yet. I’ve been using it with json-schema to save a lot of boilerplate and provide a more intuitive declarative framework in a large side project.
It's presented like a magic fact of life, but in reality people do what they are taught and are quite impotent without that knowledge. Most universities have some sort of compiler course, probably using the Dragon Book or a derivative, and these students proceed to go out into the real world and write more or less the same implementation they saw in school, with the same mistakes.
Compilers are interesting, but there is literally no proof that they are optimal for any of their popular applications. Which is what I think you are trying to imply by this narrative you have constructed of people constantly reinventing compilers. This is just the same propagandist argument lispweenies make to claim that their language is special.
Every config parser is a compiler. If platforms (e.g. programming languages) made run-time plugins easier, we wouldn't even have config files.
Imagine a config file with type checking and control flow. You have it already: it's your programming language. You just need to load the code at runtime, like Erlang.
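Sketched in Python (stdlib importlib does the loading; the rules module is invented for the example):

    import importlib, pathlib, sys

    # Stand-in for a "config" that's really code; in practice this file
    # would live in your repo rather than be written here.
    pathlib.Path("rules.py").write_text(
        "RULES = [{'route': f'/v{i}', 'enabled': i > 1} for i in range(3)]\n"
    )
    sys.path.insert(0, ".")

    rules = importlib.import_module("rules")
    print(rules.RULES)       # control flow and type checking, for free
    importlib.reload(rules)  # and you can hot-reload it at runtime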
You might want to check out Dhall[0], a configuration file format that allows a safe-ish subset of functional programming idioms. Needless to say, it requires a full compiler :)
I have seen this happen countless times at companies large and small. The article is brilliant in humorously highlighting the denial (usually) or the lack of knowledge (sometimes) that leads engineers down this path again and again.
This was a fun read! It has a link at the bottom to "if architects had to work like software engineers" which sounds fun, but the link no longer works, and searching doesn't bring anything up.
I do not understand this rant. If you have the vaguest pretensions of being an actual software engineer, and your file format isn't brain-dead simple, the way to parse it is tokenize -> grammar-based parser -> AST binding phase. ASTs are simple recursive data structures; if you handle them correctly, it doesn't matter whether they contain 50 or 5000 nodes or how they nest, as long as the code is correct.
SSA is a nice-ish format for representing program code, but it's not the only choice and may or may not be appropriate for your domain. For example, if your language describes data instead of control flow, IMO SSA is a bad choice.
I have done this and if you take care to do things right, you won't need to bother with these hacky corner cases.
I enjoyed the article, but the unintentional Easter egg at the end left me in stitches: the link to “If Architects had to work like Programmers” just 404s, which feels spot on.
> Do not worry at this time about acquiring the resources to build the house itself. Your first priority is to develop detailed plans and specifications. Once I approve these plans, I would expect the house to be under roof within 48 hours.
..nowadays would be more like:
> The MVP should be move-in-ready ASAP at which point we shall move into the house and live there while you complete the remaining requirements.
Many complaints are about the choice of YAML as a config file format. It is being used for nearly everything, and in many cases fine details of its syntax matter, especially around multiline strings and text blocks[0]. In theory, one can always retreat to JSON, but that's barely better.
Then it goes on: you need additional tools like Helm or Kustomize to generate your YAML files. I am not aware of anybody ever having tried to generate Kubernetes config files from Dhall[1] input files.
Probably has to do with becoming mired in duplicated abstractions. I don't think this is the same thing, though, kubernetes composes what it is intended to replace with a different set of goals.
You start with the notion that Kubernetes is too complicated. So you build your own deploy and hosting system, add features to it as they come up, and soon you've reinvented Kubernetes, poorly.