Systems Past: software innovations we actually use (2014) (davidad.github.io)
148 points by ingve on Dec 28, 2016 | 110 comments



Read the section "Conclusion" at the end, if nothing else. It is extremely well written. The author throws out some potentially controversial ideas, but he (like many people I have seen) is hoping for a common outcome: dramatically "rethinking" the concept of the OS and programming language while really embracing ideas that are as old as time. I'll highlight one section:

"Most of all, let’s rethink the received wisdom that you should teach your computer to do things in a programming language and run the resulting program on an operating system. A righteous operating system should be a programming language. And for goodness’ sake, let’s not use the entire network stack just to talk to another process on the same machine which is responsible for managing a database using the filesystem stack. At least let’s use shared memory (with transactional semantics, naturally – which Intel’s latest CPUs support in hardware). But if we believe in the future – if we believe in ourselves – let’s dare to ask why, anyway, does the operating system give you this “filesystem” thing that’s no good as a database and expect you to just accept that “stuff on computers goes in folders, lah”? Any decent software environment ought to have a fully featured database, built in, and no need for a 'filesystem'."


What programming language should the OS be? That's the problem. What he's saying will never happen, because nobody can design a programming language that will fit all use cases. Unix is polyglot by design; that's a feature and not a bug.

Everybody wants the OS to be easier for THEIR use case, underestimating the diversity of computing. "Why can't I just have a whole OS in node.js, e.g. https://node-os.com/ ? That's all I need."

Well, some people need to run linear algebra on computers and don't have any need for that stuff. Likewise, you don't have any need for Fortran.

The "language as OS" thing has been tried with Lisp and Smalltalk, and failed to gain adoption for good reason. They are both just languages now.

Microsoft already tried and abandoned the second idea (WinFS). File systems are complex, but databases are an order of magnitude more complex. OSes evolve on a slower time scale; databases and languages on a relatively faster time scale.

Multiple databases are also a feature. Both SQLite and Postgres/MySQL work on top of the file system. The file system supports the minimum API you need to write a database. (It's somewhat bad, but the way to fix the API isn't to replace it with a database.)

If you had bundled a relational database with the OS, then you wouldn't have had the right abstractions to make distributed databases like Dynamo and BigTable and whatnot. A database just has more design parameters than a file system, so you can't write one that will fit all applications. Again, the problem is underestimating the diversity of use cases.

So yeah I think both of these ideas are badly mistaken.

EDIT: I also don't think the shared memory idea is a good one. You can have IPC without the network stack using Unix domain sockets, and that's what people actually use to connect to databases on the same machine. It has the benefit that you can move the database to another machine with minimal changes.
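
For example, at the socket level the difference is just this (a rough sketch; the PostgreSQL socket path is an assumption and varies by system):

  import socket

  # Over TCP, even to 127.0.0.1, you go through the network stack:
  # tcp = socket.create_connection(("127.0.0.1", 5432))

  # A Unix domain socket is plain local IPC; e.g. PostgreSQL's default
  # local socket on many Linux setups (the path varies by distro/config):
  s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
  s.connect("/var/run/postgresql/.s.PGSQL.5432")
  s.close()

Swapping the Unix socket for a TCP connection is a one-line change, which is exactly why moving the database to another machine is cheap.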

A shared memory interface between an application and a database sounds like a bad idea.


Which leads us to ask, "Where are the revolutionaries?"

It used to be that the younger generation, around college age, would rethink their parents' systems: political, business, etc. These kids would look at the buildings around them and imagine what could be accomplished by razing those buildings and creating a new foundation. I don't see that anymore. I see college-aged kids simply wanting to add a new floor on top of a very rickety shack that should have been discarded ages ago.

Ironically, the people writing these types of manifestos for Computer Science are almost exclusively older. They often wrote that first foundational layer and know that it needs to be overhauled.

By the way, I would phrase it more that "any righteous programming language is also an operating system," and that "no righteous operating system operates on files, so no righteous programming language is coded in files." Data that is not stored or accessed as data eventually will be.

It's true we've tried many variations on these themes in the past, but if we do the logical syllogism in our heads, we invariably come to the conclusion that this is how the future should be. We know these things. So why give up? What does failure mean other than that we need to continue trying?

It infuriates me that we don't try -- when people say "we've tried everything and there's no way forward." No, we're just lazy. Charles Duell never actually said in 1899 that "everything has already been invented," yet, oh my God, we can't stop saying it now. So I'm going to ask again, "Where are the revolutionaries?"


It absolutely does need to be overhauled, but the problem is that people don't agree on the direction. I think most of your ideas are profoundly mistaken, and have been disproven by history. I'm sure you think my ideas are wrong too.

I think that people tend to dismiss the success or failure of certain systems as luck or even marketing, rather than deeply reflecting on the intrinsic properties of those systems that led them to be adopted or not adopted.

Unix and the web have lots of flaws, but they're not an accident of history. They have fundamental architectural properties that made them succeed. (In particular, REST is the same idea architecturally as Unix/Plan 9 style file systems. Long story, because it's basically avoiding O(m*n) problems in the ecosystem.)

You also have to accept evolution as a fundamental force in computing (as Linus Torvalds does). To ignore it is to set yourself up for failure and frustration. After all, humans are by no means the optimal living being either, but they're the best that evolution came up with.

And as I said in my other post, "revolutionary" ideas tend to just add to the pile. They fail to meet their expectations and then Unix subsumes them. It's almost an economic inevitability.

And Linux WAS revolutionary, if you compare it to Microsoft. The first couple decades of my life were DOS and Windows based, so from that perspective the revolution happened. Microsoft is porting all their stuff to Linux now.

http://www.revolution-os.com/

If you want a single language OS, check out Mirage: https://mirage.io/ . I know somebody is going to say "I don't like OCaml". Now you understand why Unix dominates: it supports heterogeneity. Even Mirage can't really get rid of Unix, because it runs on Xen, which typically runs on Linux.


I recognize that we disagree and I respect your opinion; I would just like to add a bit more:

> "have been disproven by history."

I guess that's my point. I don't think we've had enough history to be conclusive yet. I keep hearing this, over and over, and it's the equivalent of "heavier than air objects can't fly -- it's been proven in attempt after attempt."

In my opinion, the reason there are thousands of programming languages, for example, is not because we can't agree on them; it's because the problem hasn't been solved yet. No language is good enough, so we keep trying. In the late 1990s there were seemingly hundreds of search engines. Google/PageRank came along and all the other search engines went away almost overnight.

> "revolutionary" ideas tend to just add to the pile

They do until they don't. All those crazy attempts at flight were truly crazy, until they led to flight.

We're at the beginning of history, not the end of it.


Search engines are comparable in terms of how successful they are at their unique task.

Languages? What metric are you going to use that is equally applicable to all tasks?

Think of it as vehicles and their different purposes. Would you support condensing all vehicles to one, so that one vehicle has to both haul and mix concrete and take your kids to get ice cream?


I think systems programming would continue to be separate. I left it off because it amounts to a small fraction of total code written (and frankly I consider it something that can be grouped in with hardware).

The rest is application programming, and yes, when it's right, it will absolutely consolidate: There will be one "language". Sure, the current conventional wisdom is that languages are the "different tools in the toolbox." I know. My statement wasn't borne out of ignorance. What I'm saying is that kind of thinking is what's holding us back.

What is a program? Is it expressive text? Is it, "Shall I compare thee to a summer's day?" No, of course not. There's no narrative. We use text flow for instruction flow, but that's a coincidence. We are acutely aware that programs are discrete instructions. Code is data.

I made a statement above. I'll repeat it because it's important. It's a law: Data that is not stored or accessed as data eventually will be. Any programming language that uses text files will eventually go away. There will be one "language" eventually because the natural inclination of data is one representation. There will always be a market for new widgets to manipulate data or find insight in data, of course, but there will be one language -- and it will be data. It's the law.

Now I should admit, it's not that we haven't tried. We've tried many times and failed. What I'm saying is that if we come to a conclusion for a destination then there is no other choice but to make that the target.


I'm not fully grokking the theory/philosophy you're trying to express here, and I'd really like to, so I'll ask for forgiveness in advance if I seem too dense.

When I think of language and getting a message across (be it to someone else, or to a computer as code), I think of the components that are required for communication:

  - Emitter (myself) (EM; mostly obvious)
  - Recipient (someone else/the computer) (RE; also obvious)
  - Message (what I want to communicate) (MS)
  - Medium (what carries the message) (MD)
  - Protocol (base rules that both sides agree on) (PR)
If I'm trying to help my mother-in-law with a computer problem, I'll tell her to click on such-and-such over phone lines using English. (MD=phone line; MS=instructions; PR=English grammar)

If I'm writing a script to unzip a bunch of files at once and distribute the contents into different folders based on the file type, I'd probably whip up a Python script. (MD=text file; MS=task description; PR=Python)

My script in turn will communicate with the file system and the operating system to complete my task. (MD=bits; MS=instructions; PR=Python-to-Machine-Language-Interpreter)

If I understand you correctly, you're saying that all application programs should distill down to just the message, as the one representation of what I want to communicate. I'm hesitant to accept that, since it tends to assume that both emitter and recipient can independently determine and adapt to the correct protocol when all they have is the message.

For example, my Python script will probably use the file extension as a heuristic for the file type, but will make an error when picture.txt actually is a badly named JPEG. If I want to increase correctness, I can change the script so that it uses the file's Magic Number, but that may be overhead I don't need if the files are correctly named. Choosing one technique over the other requires that I craft the message differently, because Python doesn't make a decision on how best to determine file type.
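
In code, the two heuristics I'm choosing between look roughly like this (just a sketch; the signatures are a few well-known magic numbers):

  from pathlib import Path

  def type_by_extension(path):
      # trusts the name: picture.txt comes back as "txt" even if it's a JPEG
      return Path(path).suffix.lower().lstrip(".") or "unknown"

  def type_by_magic(path):
      # trusts the bytes: checks the first few bytes against known signatures
      with open(path, "rb") as f:
          head = f.read(4)
      if head.startswith(b"\xff\xd8\xff"):
          return "jpeg"
      if head.startswith(b"\x89PNG"):
          return "png"
      if head.startswith(b"PK\x03\x04"):
          return "zip"
      return "unknown"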

TL;DR: Data needs a protocol to be used as information, and languages provide that. Doing away with languages means we need to define and agree on a non-ambiguous way of reading data, which is a tall order.


You raise an interesting point. Data, as we use it today, is often bare-bones and without meta-data. How you interpret it is very much up to you. It's as if, in English, we said "... apple ... tree" and then leave it to the user to decide if that's "I'm planting an apple tree" or "You can't get an apple from a walnut tree."

You're definitely describing how we work in today's terms -- there's nothing wrong with that, I'm just challenging it. I'm saying that we can describe the protocol better using data; or rather, I'm saying that it is data, and that any other way of describing it is sub-optimal and for that reason will eventually die out.

> Doing away with languages means we need to define and agree on a non-ambiguous way of reading data

Actually I'm saying that language is more ambiguous than data -- at all times and in all cases. Anything you can get language to do, you can get data to do better (in this context). But it's not a hard case to prove because actually we're using language as data. We're doing the equivalent in programming languages of typing in "three hundred" into a text file and having it read from the text file and converted to a single data unit of 300.0 when we could just be operating directly with the data.

But I didn't clarify how this would happen, so I can see that it would seem a bit abstruse. I need to better communicate and clarify that we need to expand how we're viewing data and how we work with data -- that we need to incorporate more meta-data as an ancillary part of the data that both comes with it and yet is still secondary.


Great reply David and my sentiments match exactly. BTW took a peek at your profile and it led me to Kayia, and this bit of text:

"No instruction is stored in its textual form, but instead it is interpreted as you type and stored directly in its AST form and exposed in the same way as the data. This allows you to query code identically as you would query data, and look at it in various aspects and layers."

This is interesting to me because I designed something identical about 10 years ago (and who knows who else has) when I was designing my first proprietary compiler. I wanted to be able to manipulate tokens with the same ease that you might in a spreadsheet (which was probably an arbitrary goal but seemed cool at the time).
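
For anyone who hasn't played with the idea, Python's ast module gives a rough flavor of querying code as data (nothing to do with Kayia's actual implementation, just a sketch):

  import ast

  # query code the way you'd query data: parse into an AST and ask it questions
  source = "def add(a, b):\n    return a + b\n\ndef shout(s):\n    print(s.upper())\n"
  tree = ast.parse(source)

  names = [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
  calls = [n.func.attr if isinstance(n.func, ast.Attribute) else n.func.id
           for n in ast.walk(tree) if isinstance(n, ast.Call)]
  print(names)  # ['add', 'shout']
  print(calls)  # ['print', 'upper'] (order depends on the walk)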


I'm so happy to hear that. I'll reach out to you and let you know what's coming next for Kayia.


Sounds good - gavanw@gmail.com


Clearly TempleOS/HolyC is the answer we've been looking for.

http://www.templeos.org


The Smalltalk and LISP approaches were successful in the commercial space. They still are at the software level, and LISP machines did it at the hardware level with incredible benefits for a while. The reasons for failure seem social or economic rather than technical so far. I'm not dismissing them for customizable OSes yet, as we might see a comeback of such things in new form.


Some of the technical problems of Lisp Machines:

  * relatively complex code
  * grown system with lots of legacy code
  * depended on certain hardware or its emulation -> not portable to other architectures
  * almost no security story
  * optimized for single-user GUI workstation use, other types (headless servers, ...) not so
    well supported
  * not multi-user capable
  * weak support for terminals
  * would have needed more bug fixing
  * more difficult to use with standard keyboards
etc etc


Was the lisp machine OS fundamentally non-portable to other hardware? It's my impression that it could have been, with similar amount of effort to making the first port of any OS.

Unix is intended to be portable, but creating and maintaining a port (including compilers and drivers) is a large effort, and somehow a community never materialized around supporting Lisp Machine on PCs, the way Unix on PCs did.


The MIT Lisp Machine had been developed further and ported to a bunch of CPUs. LMI and Symbolics started with the original CPU. Symbolics then developed a 36-bit machine and later a 40-bit machine. TI developed 32-bit machines. LMI and Symbolics were working on a new generation, which did not reach the market.

All these released CPUs were basically stack-based architectures with a bunch of Lisp support and even some esoteric stuff. Early CPUs had writeable microcode, so that special instruction sets could be developed and improved.

The main compiler/runtime was never (AFAIK) ported to support conventional CPUs of CISC or RISC types with mostly fixed instruction sets. Symbolics seems to have been working on a portable Common Lisp (for Unix etc) for a short period of time, but I have not heard of a portable OS on 'conventional' CISC/RISC hardware. Symbolics developed an embedded version of their OS, but still for Symbolics hardware.

I can't remember that any of the competitors were developing a Lisp OS on top of something like SUN/IBM/Apollo/SGI/DEC/... hardware. Xerox had their Lisp OS ported to SUNs as emulation, and you would run it on top of some SunOS / Solaris. Symbolics ported theirs as emulation on top of DEC Alpha / Unix.

For companies like SUN, DEC, IBM, SGI, etc. it was possible to license some core OS and develop from there. But there was no portable core Lisp OS to license. One could license the MIT Lisp OS, but the code you got from them was for a special Lisp hardware.

Symbolics and TI were able to use some standard chips for some interface functions in some of their systems. It's not just the CPU, which needs to be supported, but also the hardware for serial interfaces, ethernet, graphics, disks, wireless, ...


That they ended up emulating Genera instead of porting it should tell you something. Maybe. Also, look at the trouble people go through to do that:

http://fare.tunes.org/LispM.html

Then there's this that makes it look easy but also shows the keyboard problem LispM was probably referring to:

http://www.loomcom.com/genera/genera-install.html

What I find on getting them running without LISP hardware is somewhere between tough and "wow, I feel for that person." Maybe things have improved. There just seems to be a huge mismatch between many aspects of LISP machines and modern machines that means it's probably easier to do a clean-slate LISP machine that builds on modern primitives & interfaces better.


Stallman was a noted Lisp hacker, yet chose to develop a C compiler and clone the Unix user space for his free software project. Stallman killed Lisp; yeah, that's it. ;)

I mention that because there is Unix on PC's that isn't free, and heavily indebted to GNU.

Proprietary Unix on PC hardware is an insignificant blip in computing history.


A writeup on Xenix indicated it had huge impact in getting UNIX into more universities and created tons of market demand for PC UNIX. If that's true, then it's not so much an insignificant blip as a huge part of the reason for FOSS UNIX's success if they benefited from contributors from those universities or demand they generated.

http://www.softpanorama.org/People/Torvalds/Finland_period/x...

Then it becomes a relic of history with lasting influence from there. Unless you count the Linux distros people paid for. Two of those are still leading, with a more usable one mostly happening due to the paid-support model. Seems like proprietary UNIX on PCs just shed its skin and took a new form that dominates UNIX on PCs to this day, albeit with more benefit to users. ;)


I was thinking in terms of the ability to do the OS with LISP with some of its advantages in development, maintenance, or customization per user. Except done on modern systems in ways users might actually use.

I still thank you for the response since I like learning about LISP machines and reasons for failure. This is a nice list of stuff to avoid in the next one where applicable.


> Unix is polyglot by design; that's a feature and not a bug.

No it is not. Unix was always built around C with other languages as an afterthought, even late in its development. Here are some quotes from my copy of the UNIX time-sharing system: UNIX Programmer's Manual revised edition from 1983:

"System calls are entries into the UNIX supervisor. Every system call has one or more C language interfaces..."

"An assortment of subroutines is available... The functions are described in terms of C, but most will work with Fortran as well."

"The three principal languages in UNIX are provided by the C compiler cc(1), the Fortran compiler f77(1), and the assembler as(1)."

Saying that Unix is a "polyglot" operating system makes as little sense as saying that Symbolics machines were "polyglot." Symbolics had better C and Fortran development environments than Unix but it does not change the fact that it was a Lisp machine. Unix is a time-sharing system with a C standard library, C memory management conventions, C calling conventions, C stack layout, C process memory layout, C linking and loading. Working against any of these conventions is possible but very awkward.


Yes, you're right that C is special in Unix. There's a bias toward writing applications in C because the system interface is provided as C headers, and the ABI is architecture-dependent.

But C turned out to be a great language for writing programming languages as well as kernels. The JVM is written in C, Python/Perl/Ruby/v8 etc. are written in C or C++. So in practice you do get an ecosystem with multiple languages. It's certainly nicer to write languages in C than in assembly.

Treating everything as byte streams is another big reason that it is polyglot. This has obvious downsides, but if you stored everything as s-expressions, or C structs, then it would privilege one language over another. Traditional Unix utilities don't use architecture-dependent binary formats, and this is one of the main reasons why.

But honestly, the only other solution would have been to define an RPC-like architecture-independent interface -- what operating systems do that? I don't know of any.


> The JVM is written in C, Python/Perl/Ruby/v8 etc. are written in C or C++.

This is disingenuous - that's exactly like saying that ZetaC on the Lisp Machine was written in Lisp and not in microcode, so Symbolics is a polyglot machine. The reason that a lot of dynamic programming language runtimes (like the JVM, SBCL, etc) use C is exactly because the only alternative on Unix is to hand-code system calls in assembly. The code generators are almost always self-hosting in these languages.

> It's certainly nicer to write languages in C than in assembly.

C is a bad language to target compilers and transpilers to if you need non-standard control flow because it forces you into a stack discipline. Doing things like continuations or restartable exceptions requires Rube Goldberg machine-level workarounds.

> Treating everything as byte streams is another big reason that it is polyglot. This has obvious downsides, but if you stored everything as s-expressions, or C structs, then it would privilege one language over another. Traditional Unix utilities don't use architecture-dependent binary formats, and this is one of the main reasons why.

Even the idea of byte streams is a C-ism. Bytes on the PDP-10 would "naturally" come out to 6 bits (out of a 36-bit word) - the instruction set was based around flexible bit fields.

> But honestly, the only other solution would have been to define an RPC-like architecture-independent interface -- what operating systems do that? I don't know of any.

That sounds like a microkernel. But I think the real takeaway here is that any operating system is going to come with baggage for language implementors, and it is important to recognize and make explicit what these assumptions are and why they were made (like for example I don't think many people ever think about the non-8-bit-bytes thing, but it is important if you want to provide a nice interface for bit-banging: http://clhs.lisp.se/Body/f_ldb.htm#ldb)
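
For the curious, a rough Python analogue of LDB (a hypothetical helper, not CL's actual API):

  def ldb(size, position, n):
      # like CL's (ldb (byte size position) n): extract `size` bits
      # starting `position` bits from the low end of the integer
      return (n >> position) & ((1 << size) - 1)

  word = 0o123456701234          # a 36-bit word, PDP-10 style
  print(oct(ldb(6, 30, word)))   # the top 6-bit "byte": 0o12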


C doesn't have calling conventions; those are dictated by ABI's. A Pascal compiler can easily use the same ABI as a C compiler.

The layout of a process doesn't really do anything that helps C.

The C-specific considerations creep into API's when clients are required to prepare, or to parse, memory described as a C structure.

C helps here by dictating that the members of a struct may not be reordered. The rules that compilers use for aligning structure members tend to be very similar and straightforward.
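
You can see both points from Python's ctypes, which mirrors the platform ABI (the sizes and offsets below are what you typically get on common ABIs, not a guarantee):

  from ctypes import Structure, c_char, c_int, sizeof

  class Record(Structure):        # members stay in declared order
      _fields_ = [("tag", c_char), ("value", c_int)]

  print(sizeof(Record))           # typically 8: 1-byte tag + 3 bytes padding + 4-byte int
  print(Record.value.offset)      # typically 4, because c_int is 4-byte aligned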


I agree with all of your points, I just wonder (hypothetically) if things could be improved, or are we at the peak already? I think certain things, like the web browser, could be made much better if we rethought and simplified them. Of course, we are fighting a huge battle against legacy, but big things have been killed in the past (see Flash).


I agree about the web browser, but unfortunately there's a tradeoff between features and stability. A browser from 2016 has to do way more than a browser from 2006 or 1996. It's supporting much more of the world's interaction now, in a very literal sense. So we have gained a lot, but it comes at a cost.

Both Firefox and Chrome have had severe stability issues, and I hope that when this period of rapid web evolution subsides, there will be a rethinking and consolidation.

I believe Douglas Crockford said that the period of stasis after IE6 and before Firefox was one of the best things for JavaScript, and I think there is something to that argument.

Although I would ask you -- improved along what dimension? Everything is a tradeoff.

I agree that software is much too buggy and slow. As far as being "bloated" and complex, I used to be part of the church of simplicity, but I've come to realize that the use cases for computing are just very diverse. Though I would like to get all these features without the associated instability and performance drop (i.e. paying for what you don't use).

Bjarne Stroustrup expresses this well; he says he hears this all the time: "C++ should be a much smaller and simpler language. Please add this tiny feature I'm missing." I think there's a lot of truth to that.

I'm working on a shell [1] because it's a language that connects other languages. It's the glue that makes some of the complexity manageable. And it's also the language closest to the OS other than C -- you might call it the "second language" that boots up on any system.

Shell is a horrible legacy mess too. It's an instance of failure by success, much like PHP.

I think the reason for the mess is that piling new crap on top is easier than fixing and replacing old stuff. It's much easier to write a new language than to replace an old language, and probably more fun. A lot of my blog is basically software archaeology.

If people want things to be better, they have to roll up their sleeves and dig into the systems that people ACTUALLY USE and fix them, rather than just proposing "revolutionary stuff" that only adds to the pile.

[1]: http://www.oilshell.org/blog/


Everything I've seen about revolutionizing software points to the likelihood of it succeeding being zero. There's a lifecycle to it, and what we preserve are data and protocols, but not the incidental complexity of the systems themselves. At every point of the continuum, computer technology is more complex than it has to be in an academic sense ("why have a computer at all if you can do equations in your head?"), but it solves enough problems that it stays alive.

Or to see it from a different light: once it's programmable, you've doomed it to die, the more so the more programmable it is. And this is borne out by how fast we burn through hardware. Our code survives the best where it's more driven towards a known destination format - e.g. an old TeX source document is more likely to be rebuildable into a rendered artifact than equivalent C code into usable software.

The Lisp or Smalltalk attitude to this - which is to remove code/data boundaries altogether - mostly seems to add further uncertainty and less room for curated archival. Either the whole thing runs or it doesn't.


Good points - see my comment below ("With regards to standing on the shoulders of giants...").

I wonder if there is some sort of enforcement of rules you could apply that would result in a programmable thing maintaining a relatively clean / non-bloated state. Of course, once you introduce more rules, you reduce freedom...


These are good comments.

>A righteous operating system should be a programming language.

Like the Commodore 64, VIC 20, TRS 80 and many early home computers.

>Any decent software environment ought to have a fully featured database, built in, and no need for a 'filesystem'.

Microsoft tried to have MSSQL as a file system in Windows called WinFS.

https://en.wikipedia.org/wiki/WinFS

If he's referring to non-relational, then most filesystems as we know them are key/value databases. You put in a key: c:\files\myfile.txt (/usr/root/myfile) and you get a byte array. It's more sophisticated than simple numeric tokens because it adds a hierarchical structure in the key.
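
In key/value terms it's roughly this (a toy sketch with Unix-style paths):

  from pathlib import Path

  def get(key):                      # key in, byte array out
      return Path(key).read_bytes()

  def put(key, value):
      Path(key).write_bytes(value)

  put("/tmp/myfile.txt", b"hello")
  print(get("/tmp/myfile.txt"))      # b'hello'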

Part of the reason for these seemingly odd things is we stand on the shoulders of giants that came before us. Because of this, iterative technologies must be backwards compatible to survive.


With regards to standing on the shoulders of giants, the rational part of me wholeheartedly agrees, but the kid in me just wants to burn it all to the ground and rebuild it from scratch. :) Of course, it is nearly impossible to rewrite things better because inevitably you hit one of these traps: not enough time/money, the bureaucracy that is attached to money, the "too many chefs" problem, or the inadequate supply of experienced/battle-hardened minds to put on such a project - i.e. people that have written compilers, operating systems, GUIs, etc - enough times over to know what they are doing.


Whereas the strategy worked for AS/400's. Helped them be more consistent, reliable, and easily managed. I'm more for pluggable storage with filesystems, object systems, and RDBMS's being options to choose from. Microsoft's failure doesn't invalidate the concept, though, so much as their use-case and solution didn't work out.


A database as a file system is fundamentally a good idea. But so much software expects a regular file system, and it's surprisingly hard to implement an efficient regular file system on top of a relational DB.


I'm all for making system calls and IPC cheaper. I'm also all for making system calls indistinguishable from IPC for all practical purposes. I'm in support of those ideas even if one does not take them to their logical destination at a microkernel.

But I just don't think filesystems should be as powerful as databases. At a minimum, no computer exists alone, and sharing database structures is a much more complex action than sharing opinionated byte streams.


I like the built-in database functionality he describes. Wasn't that a feature of the BeOS file system?


No idea, but VMS had something like that, and probably other minicomputers of the era did too.

I'm only 40 so I'm too young to know :-)

I think that Unix' "everything is a stream" concept won because it's simple and powerful, but it's a good example of the failure of our educational systems that only old geezers have a clue that there are alternatives.

I feel like 99 % of the things I do at work, some guy at IBM should have made a generic solution for in 1979. Actually someone probably did.


Probably not the earliest implementation, but IBM mainframes did indeed have record oriented files in the 1960's.

https://en.wikipedia.org/wiki/Data_set_(IBM_mainframe)


This is actually one of the main reasons why I'm interested in non-PC computers and non-Unixish/non-NTish operating systems :)


Maybe. You can query BFS to match metadata entries. But for me it's more important that file systems be transactional. So instead of doing a stupid write-to-tmp-file and then move-into-place-on-same-physical-volume-hoping-it's-atomic dance, you could just say open('filename').then_write(buf).then_commit() or whatever, and it would do the work as a transaction, or fail. But you wouldn't end up with corrupt files with half-written chunks.
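
For reference, the dance I mean looks roughly like this today (a sketch; os.replace is the atomic rename on POSIX, and even this only covers single-file replacement):

  import os, tempfile

  def atomic_write(path, data):
      # write to a temp file in the same directory (same volume), flush it,
      # then rename into place; readers see the old file or the new one,
      # never a half-written mix
      d = os.path.dirname(path) or "."
      fd, tmp = tempfile.mkstemp(dir=d)
      try:
          with os.fdopen(fd, "wb") as f:
              f.write(data)
              f.flush()
              os.fsync(f.fileno())
          os.replace(tmp, path)
      except BaseException:
          os.unlink(tmp)
          raise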


Not sure, but it does sound interesting. :)


It's not easy to make such a comprehensive list over such a broad category, so cheers to the author for trying. I think it's a good list, but to challenge myself, I tried to think of some things that he missed.

1. Encryption

2. Compression

These might fit into another broad category like OS or Networking, but they are both distinct from those I believe.


If you read the footnotes, he states that he was tempted to include those, but then it would have been problematic to decide which algorithms fit his criteria, so he decided against including any.


Something just occurred to me; encryption predates software so it's not a software innovation in and of itself.


Similarly, compression can be seen as a logical product of Information Theory. But I'm not sure it would go anywhere beyond theory without computers.


Shorthand / Stenography seem like compression without computers.


Encryption: RSA

Compression: JPEG or MP3


Oh, and the FFT should have been included in the article as well.


> RSA

No thanks.


Why? RSA is one of the greatest breakthroughs of the last half century in my opinion.

It was the first implementation of an asymmetric crypto scheme. Merkle came up with the idea of public key cryptography, but he never provided a mathematical technique to achieve that.

Imagine a world where RSA was never invented. We would not be able to exchange secrets over an insecure channel, so naturally, HTTPS/TLS would never have existed - this alone is huge. We would not be able to exchange secure email in a scalable way (PGP). We would not be able to securely sign messages for authentication. And there's a ton more. This is without counting the schemes and techniques designed on top of RSA!
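
For anyone who hasn't seen it, the core idea fits in a few lines of toy code (textbook RSA with tiny numbers, no padding, utterly insecure -- just to show encrypt-with-public, decrypt-with-private):

  p, q = 61, 53
  n = p * q                   # 3233, the public modulus
  phi = (p - 1) * (q - 1)     # 3120
  e = 17                      # public exponent
  d = pow(e, -1, phi)         # 2753, private exponent (Python 3.8+)
  m = 65                      # the "secret" message
  c = pow(m, e, n)            # encrypt with the public key -> 2790
  assert pow(c, d, n) == m    # decrypt with the private key recovers it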


There exist other asymmetric encryption schemes; indeed there exists one that is basically non-interactive Diffie-Hellman key exchange, and DH was discovered before RSA was.


I was under the impression that RSA was first, but I guess I was wrong.

DH is a key exchange algorithm, not a general public-key encryption algorithm. So, as far as I understand, TLS would work, but the others I listed would not.


RSA was the first PKE scheme, but ElGamal, another PKE, is essentially a small modification of DH. ElGamal was published in 1985, before encryption really hit the mainstream.


I see. I'm not too familiar with crypto, so thanks for explaining. I guess that the argument is now: would ElGamal have been invented had RSA never existed?


I think so; as I said it's essentially a small modification to DH.


How about the very concept of having paradigms of programming (not the specific paradigms like OO or functional, but the idea that you can create a little Turing complete utopia and live inside that utopia never coming out, and, crazily, it actually works in multiple parallel universes of separate paradigms!)

Data structures. Including Codd Normal form or maybe Codd is important enough to mention. Sort of a very large scale GIGO. Or maybe the more important aspect of data structures or structured data or Codd is GIGO itself as a debugging concept.

The idea of the importance of the algorithm: that it's not merely a mathematical amusement that there are different ways to calculate something, but that it's now of critical financial importance how efficient an algo is in time or space or power ... or patents.

Binary. What stopped man from flying until last century was millennia of trying to copy birds and the assumption that failure implied inaccurate copying of birds. What stopped man from computing in the 1600s, 1700s, and 1800s was a fierce dedication to building computation hardware out of decimal. Once we gave up on birds and decimal, like a decade later we're flying and computing like crazy. Henry Ford didn't build a mechanical oil-burning robotic horse, cool as that would have been; he built a better horse by building a horseless carriage that revolutionized the world for better or worse. Sometimes the most important breakthrough is deciding what to give up on.

Virtual memory was kinda influential. So... adders and lookups are so fast and the advantages are so huge that it's worth sticking an MMU in the way of the address lines... would not have guessed that from first principles...

Automata theory is kinda important as a theory of computation, like "can you even do it".

Imaginary property or intellectual property or whatever you want to call it. The concept of legally owning an integer. The concept of owning a piece of media but not the license to apply what's stored on it. Business method patents along the lines of "... on the internet". Patented data formats that are separate from the copyright of the data stored in the patented form. Copyleft. FOSS. GPL and its less free competitors BSD and others. Domain squatting. Blockchain financial instruments, or properties or currencies or whatever they are. For better or worse imaginary property has driven a lot of "innovation" and "economic activity", some of it even useful.


These are all important things, but each either falls under one of the broad categories or isn't a software innovation:

Paradigms of programming, data structures, importance of the algorithm : Falls under programming although I'm not sure why GC wouldn't fall under that as well.

Codd : programming (data structures) and transactions. He explicitly references databases in the transaction section.

Binary: A hardware innovation.

Virtual memory: Falls under OS.

Automata: Falls under hardware or programming and possibly sculpture?

IP: Falls under copyright / patent law.

An aside: I've always known GIGO to stand for Garbage In, Garbage Out.


See also http://danluu.com/butler-lampson-1999/

Here are some more that come to mind:

Sandboxing+permissions+isolation. Being able to easily install, safely use and cleanly uninstall untrusted software totally changed the way most people consume software, on both the web and mobile. (Virtualization is neither sufficient (because I do want to give some apps access to some hardware) nor necessary (see NaCl))

Types. The static-typing wars may still be raging in other domains, but the vast majority [1] of our infrastructure is written in statically-typed languages, and many popular dynamic languages are growing gradual-typing tumours.

LLVM. Writing a compiled language used to be a major effort. Codegen quality was a huge incumbent advantage. Five years ago, C and C++ were pretty much it for compiled languages, with some heroic efforts like OCaml and GHC on the sidelines. Nowadays pretty much all the language and database research that I see involves LLVM.

Packaging. Both OS and library package managers have come a long way since 1950, and to a first approximation 100% of programmers use one or the other.

[1] http://danluu.com/boring-languages/


Trying to take your categories and find where they started.

Re isolation. From high-assurance security, I believe the first secure kernel that was general-purpose was this one:

http://csrc.nist.gov/publications/history/schi75.pdf

Types. Guess that would be ALGOL.

LLVM. BCPL seems to have gotten the idea started with a two-pass compiler that went from complicated stuff to byte-code (O-code) and then to assembly. That got ported to the PDP-11 in the form of B & then C. Wirth further developed it & popularized it with P-code in Pascal/P. It helped non-compiler experts port it to 70 architectures in 2 years, whereas compiler experts could target something easy. He did it at the hardware level with Modula-2 & M-code assembly. So, either Richards of BCPL or Wirth of P-code gets credit for such a concept. Maybe both.

re packaging. I got nothing. Anyone know what the first package managers were that had something inspiring modern functionality? Those ahead of their time.


I believe the first package manager was "pkg" in System V (early 80s).

The modern notion of package-manager-with-builtin-download-and-upgrades only came in the mid-to-late 90s, when internet bandwidth grew enough to make that feasible.

The first one of these might've been FreeBSD ports.


I tried to dig it up. I think the terms "version control" and "package management" overlap a lot in how people classify their software or do the histories. They're different with some overlap but it's all muddied up. Interestingly, I found no history in Google of package management from early to modern stuff. There's an interesting project right there for some CompSci student interested in computer archaeology. :)

I did find VC history where the first thing to track software seemed to be SCCS on UNIX:

https://en.wikipedia.org/wiki/Source_Code_Control_System

The rest, from CVS to mainframe stuff to Apollo Aegis's, all showed up around the mid-1980s in a similar time frame per Wikipedia descriptions. pkg seems a strong contender for first package manager if even version control of source didn't hit the others until after its creation.


There were versioning file systems in the 1960s (MIT's ITS among them, and I think Univac's EXEC 8, though I can't find a reference) [1]. Often, new writes to a file bumped a version (or cycle) count in the file system. I don't know, but I wouldn't be surprised if there were tools to do diffing and patching much like modern VCS systems.

[1] https://en.wikipedia.org/wiki/Versioning_file_system


Appreciate feedback on pkg. Far as other thing, I was hesitant to say that given mainframes or minicomputers connected over leased lines might have had something built-in or from 3rd parties. Some tool to easily package and push stuff from HQ to branch offices, warehouses, stores, etc. Just speculating. Strange if nobody automated the process.


Not so strange. It was way less common to push out software the way we do now. Software releases were generally pretty major events. Leased lines were often slow and very expensive. Even into the mid-eighties, getting a software distribution or update via magnetic tape was the most common method. Even software from user groups and other non-commercial sources was distributed on tape. (Source: I did Unix and Tandem programming in the early 80s.)

So if we look at methods of tape distribution, even the very first edition of Unix in ~1971 had tap (an early ancestor of tar) [1] which saved & restored filenames, mtimes, modes, and owner info. You could argue that a tp/tar archive is a primitive packaging system.

[1] http://man.cat-v.org/unix-1st/1/tap

The other instance I can think of is the shar format [2] that was used to distribute software over Usenet from the early 1980s. Shar was basically a fancy text file that you piped through a shell (yeah, I know). Because binary compatibility was rare, most shar files resulted in a directory of source code, which you'd then compile -- basically, like a 'modern' tarball.

[2] https://en.wikipedia.org/wiki/Shar
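
To make the tp/tar point concrete, a tiny sketch with Python's tarfile (my analogy, not a claim about the 1971 tools):

  import tarfile

  with open("hello.py", "w") as f:
      f.write("print('hello')\n")

  with tarfile.open("pkg.tar", "w") as t:    # bundle the file plus its metadata
      t.add("hello.py")

  with tarfile.open("pkg.tar") as t:         # names, modes, mtimes, ownership travel along
      for m in t.getmembers():
          print(m.name, oct(m.mode), int(m.mtime), m.uname or m.uid)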


"even the very first edition of Unix in ~1971 had tap (an early ancestor of tar) [1] which saved & restored filenames, mtimes, modes, and owner info. You could argue that a tp/tar archive is a primitive packaging system"

I'd definitely count that as an early packaging system if it had all that metadata with it. So, that pushes it back to 1971 unless a non-UNIX machine had an equivalent before that. Likely a business mainframe or academic machine.


Types are probably subsumed in his definition of programming languages. 1955 FORTRAN had static types, at least distinguishing INT and REAL. I think CHAR came later. 1958 LISP had dynamic types (list, symbol, int).


Filesystems, while they always had rather strange semantics (derived from ease of implementation, not ease of use), also made a leap with the introduction of journaling / logging in the early 90s. IBM JFS was the first one here.


The only 3 real-life innovations we actually use:

- Vehicles. Invented in Mesopotamia about 7,500 years ago.

- Buildings. Invented by Neanderthalers about 44,000 years ago.

- Tools. Invented by early humans some 2,500,000 years ago.

I'm mostly kidding - but I do think the 'innovations' listed in the article are rather broad.


Vehicles and buildings are clearly members of the "tool" category. Especially where they intersect. For example, the bikeshed.


Interesting way of thinking.

I'd add Agriculture and Cooking, too. Maybe also Medicine.

Edit: and writing?


Did Neanderthals build dwellings or similar structures? I don't think I've heard that before.



I'm sure it will be debatable but I think Burroughs B5000 should be on this list somewhere.

http://www.smecc.org/The%20Architecture%20%20of%20the%20Burr...

If not for OS or interactivity, I think it should be on there for being the first machine with OS written in high-level language whose CPU was designed to safely execute it. Entry would go something like this:

"5. High-level, safe execution of software."

Benefits: Stuff crashed less. Hackers had harder time. Easier to expand or maintain.

Drawbacks: Cost some performance due to higher level and some extra money for extra hardware.

Exemplars: Ada language + secure, Ada chips. Java processors + OS's. SAFE and CHERI architectures. Maybe NonStop with hardware/software architecture applied to fault-tolerance. Erlang.


I found this essay to be fairly comprehensive, accurate, and informative; some of the items mentioned I have little experience or knowledge of - or didn't even know existed (BGP, for instance, was new to me).

I think the author's heart is in the right place, and I would love to hope to see some of the ideas espoused in the essay come to fruition - but I think that the momentum of history may be difficult to overcome (much like tiling window managers are now popular with some people - though IIRC, Windows 1.0 was a tiling system, then for some reason switched to overlapping windows - perhaps as a result of singular lower-resolution displays being the norm). The author mentions LISP and some other programming languages trying to be an OS, but failing to gain hold commercially for various reasons (though in LISP's case, one could argue Symbolics did succeed to an extent?).

This essay certainly is "food for thought", and I plan to re-read it and think about it; I'd love to see it expanded to book form (or some other format to explore the subject in a deeper manner).

Kudos to the author.



As Clubber said, I'd add a few things to the list that should absolutely be there.

- I'm not sure why he left out machine learning, deep learning and other AI concepts

- The GUI (from Xerox)


Not that I agree or disagree - the idea of the GUI can be considered a logical extension of hypermedia and interactivity.


I think there is a difference between "GUI" in general and the Xerox PARC GUI specifically. Sketchpad and GRAIL clearly had GUIs of sorts, but Xerox developed UI concepts that were highly generalizable. And just because they became popular doesn't mean they were "obvious". Without Xerox PARC, all the computers we use right now could have drastically different interfaces.


I'd love to hear feedback about this (even if it's mean).

Based on the notes toward the end of the article on the limitations of a traditional OS with filesystem, I had this thought.

The OS clearly seems to be going in the direction of Smalltalk or a Lisp machine as an environment. What would it look like, or what would be the limitations of a merged OS/programming environment with something like a single SQLite DB as the filesystem?
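
To make the question concrete, here's the kind of thing I have in mind (a toy sketch of my own, not anyone's actual design):

  import sqlite3, time

  db = sqlite3.connect("toyfs.db")
  db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, data BLOB, mtime REAL)")

  def write(path, data):
      with db:   # every write is a transaction
          db.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                     (path, data, time.time()))

  def read(path):
      row = db.execute("SELECT data FROM files WHERE path = ?", (path,)).fetchone()
      if row is None:
          raise FileNotFoundError(path)
      return row[0]

  write("/notes/hello.txt", b"hello")
  print(read("/notes/hello.txt"))            # b'hello'
  # and you get ad hoc queries for free, e.g. everything touched in the last hour:
  # db.execute("SELECT path FROM files WHERE mtime > ?", (time.time() - 3600,))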


> The OS clearly seems to be going in the direction of Smalltalk or a Lisp machine as an environment. What would it look like, or what would be the limitations of a merged OS/programming environment with something like a single SQLite DB as the filesystem?

MUMPS, most 4GLs, RPG on the AS/400. I have not programmed in any of these but have heard a lot of bad things about them. Also, I think it is a mistake to say that the OS is going in the direction of Lisp/Smalltalk. If anything, Linux containers/Docker is more like the IBM VM mainframe operating systems.


I meant the OS that the author is pointing toward.


It would look a lot like emacs, one of those guys with 100+ buffers simultaneously open who does everything in emacs from git to IRC to controlling their music player. They get a lot of humor involuntarily applied to them, they also get a lot done.

If you get the "privilege" of working on COBOL it has native record/DB storage ability which can make life get really weird. Not until Perl in the 90s did we have a DB API so closely and popularly tied to the language.


How many of these still have substantial unrealised potential?

For example, thinking back through various organisational Wikis I've encountered over the years leaves me thinking that practical Hypermedia needs a little bit more work yet...


I think that Operating Systems could do a lot better in representing content as more than just a list of files in a folder, using metadata to package and present the content into something that behaves more like an object you can interact with in different ways, and is more user-friendly, dividable (take a smaller piece of the content on a flash drive for portability, or divide content into two pieces and don't lose the metadata associated with it), automatically versioned, and with a variety of presentation modes (instead of just icons, perhaps it could show 'box covers' or 'banners' or the first few words of the text as a generated picture, etc).

I make all sorts of different types of content, and keeping everything organized is just so difficult, especially if copies exist on different drives and computers. I can't effectively put gigs of video into Git either, as far as I know, so most of it you have to take care of manually (or maybe I should set up a Perforce server, but most end-users and less tech-savvy content creators aren't going to go through with that).


> instead of just icons, perhaps it could show 'box covers' or 'banners' or the first few words of the text as a generated picture, etc

Dolphin does that. The Windows explorer tries to do that as well, but in a much more limited fashion.

Dividing metadata (with the respective privacy issues) and versioning large files are incredibly hard problems. And not the kind of "hard" where a genius in academia writes a paper and it's solved, but the kind where no two people want things to behave the same way, and getting to a usable shared set of assumptions may be even impossible.


Not disagreeing with you, it is very difficult, and people would have difficulty coming to a consensus for sure. But I still think it could and should be attempted, at least with one of the operating systems out there.


A large part of my mind agrees with you 100%.

There's a little voice in the back of my head saying "isn't that effectively what Lotus Notes was trying to do"?

(...which doesn't mean it's a bad idea...)


Imagine if Notes could have been merged with the Smalltalk and HyperCard environments. Have the Notes/workflow idea with the live Smalltalk environment, and the linking of hypercards (hyperNotes).


In case someone is wondering what HyperCard is:

- http://www.loper-os.org/?p=568 A short article demonstrating how to build a calculator app.

- https://vimeo.com/95380430 A 10-min talk showing the full potential of the language.


For a different look at the topic of key software innovations, look at my paper "The Most Important Software Innovations" http://www.dwheeler.com/innovation/innovation.html


I think that relational databases made a huge leap forward. They're built on top of a solid foundation, presented and accessed using a 4GL language that describes intent.

But I guess as a database, it's just a continuation of the hierarchical databases, general storage, and cards and index cards before that...


> FORTRAN’s conflation of functions (an algebraic concept) and subroutines (a programming construct) persists to this day in nearly every piece of software, and causes no end of problems

I'm not really sure what the author's point is here. I think "subroutine" is a more descriptive name than the more commonly-used "function," but I'm not aware of any problems this has caused.


Functions have properties which are completely defined by their inputs and outputs.

Subroutines (I prefer the term procedure) are just a sequence of commands. They may use arguments and return a value but they can also do anything else.

Functions, real functions, can be reasoned about, composed, mapped over collections, and otherwise trusted to behave themselves. Very few programming languages provide strict functions. One may write them, of course, if one is careful, but that is doing work a compiler could be doing for you.


There's also a disconnect between using a PL's concept of a function (procedure, calling convention) and the simple intellectual concept (composition), whether or not you're concerned with the mathematical concept (purity). Language compilers or runtimes often perform poorly because, as a matter of course, every function call adds to a stack (failing to perform obvious inline or tail call optimizations), involves a dispatch that might be more expensive than the function itself (just in case inheriting code overloaded it, or because every procedure lives in a run-time mutable table), or causes potentially large copies of arguments (to pretend they are immutable to the called code) -- paying costs which in many cases could be avoided.


Functions can be memoized, procedures cannot. This means that you cannot use anything from mathematical logic for compiler optimizations or program checking/linting or verification. It basically means that you go from at least the possibility of proving something about your operating system/compiler/distributed system correct to it being impossible to prove the most trivial things.
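
Concretely, in Python terms (just a sketch):

  from functools import lru_cache
  import random

  @lru_cache(maxsize=None)
  def fib(n):
      # a function: the result depends only on n, so caching is always safe
      return n if n < 2 else fib(n - 1) + fib(n - 2)

  def roll():
      # a procedure: hidden state / side effects, so caching would change behavior
      return random.randint(1, 6)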


I feel the premise of "the OS as a Language" is missing a very critical point: our common definition of what "a computer is" has changed drastically in the intervening years.

Take a close look at those early computers. It is easy to ask "why do we even need an OS" when we don't have extra things like USB, hard drives, or RAM (it's all just storage anyway). What was once called "a computer" is now known as "a processor."

Even within the CPU itself we find layers of abstraction: caches, interpreters, even out-of-order execution! Just because you write assembler today, doesn't mean the processor is going to do exactly what you specified.

Some of these abstractions should be questioned and challenged, such as the CISC instruction set I alluded to above. But we should remember _why_ these abstractions are there in the first place, instead of blindly questioning (or accepting) how things work.


I'd like to see something about dependency and package management here.


I think it is no exaggeration to say that until Eelco Dolstra's 2006 PhD dissertation there is nothing worthwhile to mention: http://nixos.org/~eelco/pubs/phd-thesis.pdf


All programming languages were derived from Fortran? Really?


This point and the few exceptions to it are addressed in the footnotes. I was skeptical too, particularly of the claim that the lisp family tree is rooted in Fortran, but the claims seem to be pretty well researched.


Maybe, but there are plenty of languages I would say are completely independent -- APL, Prolog, and Forth don't seem to resemble Fortran in any major way. At least Lisp shares the same notion of functions=subroutines.


I would add virtual files and filesystems to the list. We take them for granted, but without files we would probably be stuck with storing everything in a giant database a la the Windows registry. That would make a lot of things complicated and some things practically impossible.


Slightly relevant:

- MIT Technology Review Emerging Technologies Top (2002-2016): https://www.technologyreview.com/lists/technologies/2016/

- Gartner Hype Cycle history since (2000-2016): http://imgur.com/gallery/noBKI

- Google N-gram viewer (using technology as a keyword): https://books.google.com/ngrams/graph?content=xerox%2Cfax%2C...


This info would be a lot easier to process if the images were juxtaposed and you had a slider to move through the years.

That way you could visually track how a particular item evolved.


This is interesting, and can be sort of hard to really understand until the original examples are really internalized. In the end, we can say that everything is change over some dimension, all systems have input and output. Everything is energy or matter, just like it is data or operations. I think these concepts and patterns are natural logical generalizations, and it's good to remember these patterns when beginning to analyze something, but ultimately it is moot and not helpful for any one specific task we are trying to solve. I can't play Angry birds on ENIAC, but that is what I may want to do, now.


So many amazing ideas were produced in 1955 & 1956.


Makes you wonder what those ideas would have looked like with more powerful hardware. Give Engelbart and his NLS team, or Kay and his Smalltalk team, gigabytes of RAM and processors with powerful networking, and see what they would have come up with. It would be akin to taking the most innovative people today and starting over from scratch.


Under the second item, Operating Systems, the author states:

"Well-specified interfaces are great semantically for maintainability. But when it comes to what the machine is actually doing, why not just run one ordinary program and teach it new functions over time? "

What is he referring to there? What is the "alternative" to a time-sharing system that the author is alluding to? Does anyone know?


Likely alluding to the Forth development experience, where you start with writing a word (function) that toggles a single bit on an IO port and a couple dozen layers of abstraction later you have an assembly line or telescope. But just like Feynman and the turtles, it's turtles all the way down, or Forth words all the way down, or whatever.

Whereas OS style, you'd pick some layer of device abstraction, probably extremely arbitrary, and on this side it's all assembly or C and on that side it's all ... something else, Perl or shell scripts, who knows, and generally people don't ever cross the arbitrarily placed abstraction barrier. It's an extremely hard, arbitrary barrier.

A classic example of randomly tossing down a barrier over the decades: should the RS232 API be bit-banging a TX bit with your own timing routines on the application side, with the OS merely arbitrating access, or, way on the other side, should the OS implement a full TCP/IP stack so your application talks to the RS232 API at the TCP session layer while the OS does all the rest -- maybe the OS even implements an NFS server talking UDP in the kernel and your API is being an NFS client? I've lived through both extremes and everything in between. It is totally arbitrary and mostly just gets in the way.

Back in the old days when the unix kernel was written in C and so were all the apps... Well no one wants to hear this, but those were good days...


I'm not sure I really understood this, but it sounds wildly fantastic if it involves Feynman and Forth :)


forgot porn, ok maybe its not a software innovation. please downvote me



