On compiling 34 year old C code

mrspeaker · on Jan 14, 2014

That's so cool! Actually, it also highlights one of the things that concerns me about the proliferation of build tools, frameworks, and transpilers etc in the web sphere at the moment. The other day I was able to run and start modifying a "DHTML" side-scrolling shooter game I wrote in 2004 by double clicking the index.html file.

To run games I'm making now, I have to have the correct versions of Grunt, and all the related dependencies - working CoffeeScript converters etc. In 2024 it's going to be tough to set up a working environment to modify and run 10-year-old code: certainly a lot harder than just double-clicking an html file.

reidrac · on Jan 14, 2014

You're lucky, I wrote a "DHTML" game in 1998 and it won't work today (although I didn't do anything fancy, just DOM manipulation).

I was thinking about exactly that a couple of days ago trying some WebGL. The web is a moving target and you're not guaranteed that anything advanced today will last :(

On the other hand, I wrote a shoot'em up in C and SDL in 2003 and it still compiles and works as well as it did back then.

phillmv · on Jan 14, 2014

Yeah but comparing C/SDL to JS/WebGL is unfair in terms of maturity of the platform.

JS will soon be 20 years old, but in terms of a language community, outside of jQuery plugins it's only ~5 years old.

You're not wrong, though. I can only imagine the nightmare of finding copies of all the library dependencies will be five or ten years down the road ("oh this needs jquery 1.9 but we're at jquery 5 now, and what was underscore?") not to mention the actual execution environments ("Works best in 20 versions of Chrome ago").

The web at least is a tractable problem. Did you ever have a game that you really liked that only worked in iOS 4? Tough shit.

reidrac · on Jan 14, 2014

C and SDL have changed in the last 10+ years, but the old stuff still works. I invested a lot of time into getting autotools right and it looks like it really paid off :)

I don't know why the web is different, but it is, and it makes using the newest stuff a little bit uncertain.

mnx · on Jan 14, 2014

I think the web is different because of how easy it is to roll out new features - most of the new stuff is only guaranteed to work on browsers up to a couple of months old, but that's enough since people update often.

bmj · on Jan 14, 2014

I wonder how many of the sites/web apps built with the latest crop of stacks and tool chains will be around in 10 years? I suspect these sites will either "disappear" (that is, no longer be maintained), or will be constantly updated to use the latest, greatest stack/tool chain.

The only situation where I could see this being problematic would be enterprise-type applications written on something like the Java stack or the .NET stack, where you do get tied, at least somewhat, to whatever version of the framework you are using at that time.

bkmartin · on Jan 14, 2014

This is already a problem in the enterprise, and has been for years. There are still companies out there stuck with VB6 code. I'm sure there are some that still have code older than that lying around. Any time you get a non-compatible jump in a framework/technology you are going to get these legacy situations. Some small businesses don't have much of a choice but to scrap it and move forward with something else. The problem is that the business processes are built around this code, and in some cases the processes might be very efficient for that particular niche or market. This isn't a new problem, and it won't go away anytime soon.

BSousa · on Jan 14, 2014

I wonder why people keep picking on .net/java with these kinds of assumptions. Isn't GitHub running Rails 2.3 or something? Why just 'attack' the .net/java stacks?

bmj · on Jan 14, 2014

I wasn't "attacking" .NET/Java stacks (I'm a .NET developer who has to maintain a .NET 2.0 code base). I was just wondering if the teams that use CoffeeScript, Node, Grunt, etc. tend to adopt bleeding edge stacks, and therefore, will have constantly evolving stacks for their products.

I also wonder if Rails is more analogous to say, the LAMP stack (I'm not directly comparing Ruby to PHP, mind you)?

iagooar · on Jan 14, 2014

As far as I know, GitHub also uses their own Ruby fork and have built an asset pipeline on top of their current system.

I don't think that the reason why they don't upgrade is because it's too hard. It's just that their current system works fine as is.

On the other hand, I have upgraded dozens of Rails apps from 2.3 to 3.0, from 3.0 to 3.1 and 3.2 to 4.0. Sometimes it was extremely easy, sometimes it required a little bit more effort, but I would not consider it as being difficult or painful.

Arnor · on Jan 14, 2014

My first Rails project was 2.3 -> 3.2 (with intermediary steps if I recall). It took me a couple weeks, and I ascribed most of that to how unfamiliar I was with the stack at the time. Since then, I've become rather candid in my opinion that Rails upgrades are rather trivial. Do you know of any specific red flags that would indicate a more painful upgrade process?

iagooar · on Jan 14, 2014

Well, the most difficult project I had to upgrade was a 2.3 project with a custom Asset Pipeline, with some additional features that Sprockets does not support.

Sometimes the upgrading problems arise due to gem dependencies, as a lot of gems set hard constraints on other gems' versions so it can be a little tricky to update everything at once.

All the other upgrades were rather easy, but of course it depends heavily on the size of the project, the test coverage or the underlying server infrastructure.

steveklabnik · on Jan 14, 2014

2.3 -> 3.0 was the worst transition. Ever since then, we've been taking steps to make upgrades even more trivial. 4.0 -> 4.1 is almost drop-in, for comparison.

steveklabnik · on Jan 14, 2014

dotcom is running 2.3, but my understanding (I don't work at GitHub) is that they're very, very SOA, and use quite a bit of Sinatra.

watwut · on Jan 14, 2014

First question: How is Java code more tied to whatever framework it is using than JavaScript code?

Second question: are you sure JavaScript code owners will constantly refactor their old finished but functional projects to a new tool chain instead of letting them live while starting new projects?

golergka · on Jan 14, 2014

I think it has to do with the complexity of the software you're working on. The games you're working on right now are likely much more complex than that 2004 HTML game. And as someone who recently tried to get the original Doom source code to run (still unsuccessfully), this is just as true for older code.

72deluxe · on Jan 14, 2014

That is a very good point. The years of web progress have just been new ways to get a result from a server. Repeated again and again, and renamed each time!

4ad · on Jan 14, 2014

> the storage pointed to by the argument will be modified, and the string literal that is passed is not guaranteed to be placed in a modifiable data region. I’m surprised this worked in Version 7 Unix.

PDP-11 had only 8 8kB "pages" (segments really) available for a program. Pages could be marked read-only, but making one read-only was a huge expense. A common trick was to smash code and data that were only used when initialising the program.

simias · on Jan 14, 2014

Wouldn't the rodata and text segments be shared across instances of the same program though?

That would seem like a straightforward optimisation, but then the code presented in the article would break subtly if you were to invoke ed twice at the same time since both instances would try to modify and use the /tmp/eXXXXX string at the same time.

4ad · on Jan 14, 2014

The text segment was shared, the data segment was not shared. There was a single data segment containing both the initialised and uninitialised data, there was no rodata. Rodata came with ELF, it was never in a.out.

Athas · on Jan 14, 2014

Given that Unix was primarily a multiuser system in those days, that would be a rarely useful optimisation. Especially if you expected most people to run ed at all times, since it was, after all, the standard editor.

simias · on Jan 14, 2014

Quite the opposite actually, this optimisation is only useful if you intend to run several instances of the same program. Otherwise each instance has its own copy of the code and rodata, therefore wasting memory.

It's one of the advantages of using shared libraries as well, several programs can share the same code and read-only data, saving in storage and memory space.

dded · on Jan 14, 2014

Would this code not have run on the VAX 11-780? I think the 780 had a 512-byte page size (and therefore I think it ought to have offered write protection at that granularity).

4ad · on Jan 14, 2014

Can't remember the VAX page size, but it doesn't matter (see my other comments) because Unix of that era didn't have a rodata section. There was a single data section (eventually a.out separated data and bss, but data was still writable) that obviously couldn't have been read-only.

Rodata sections are a modern idea, the a.out binary format never had it. A.out explicitly documents the data section as writable. Rodata came with ELF.

FigBug · on Jan 14, 2014

In a somewhat similar vein, a while back I was putting some code I wrote in University (1995-2001) on GitHub. One of the C projects just needed 1 linker flag changed. The Prolog didn't need any changes. I was pleasantly surprised.

I was also trying to put an OS X app from 2009 on GitHub, but gave up before I could get it to compile with the latest Xcode. Wasn't worth the effort.

binarycrusader · on Jan 14, 2014

> In the old days it was possible to access members of structs inside unions without having to name the intermediate struct. For example the code in the sed implementation uses rep->ad1 instead of rep->reptr1.ad1. That’s no longer possible (I’m pretty sure this shortcut was already out of fashion by the time K&R was published in 1978, but I don’t have a copy to hand).

Wrong; it didn't just work in the "old days". The functionality described here is actually known as "anonymous struct" and is now a part of C11; but unfortunately not a part of C++11:

http://stackoverflow.com/questions/8622459/why-does-c11-not-...

gcc and other compilers have also supported this functionality for a long time, although as a non-standard extension. This technique is also still widely used in systems-level programming and is hardly "ghastly" although I'll readily admit that there are better ways to accomplish what this particular case was doing.

0x09 · on Jan 14, 2014

This is not an anonymous structure in the modern sense (if you look at the code they are in fact named), it's a side-effect of pre-ANSI C namespace rules. Any struct field could be used to calculate an offset from any other object. From http://cm.bell-labs.com/who/dmr/chist.html :

> Beguiled by the example of PL/I, early C did not tie structure pointers firmly to the structures they pointed to, and permitted programmers to write pointer->member almost without regard to the type of pointer; such an expression was taken uncritically as a reference to a region of memory designated by the pointer, while the member name specified only an offset and a type.

binarycrusader · on Jan 15, 2014

I was responding to the author's general claim that "In the old days it was possible to access members of structs inside unions without having to name the intermediate struct.".

However, I stand corrected in two regards. The first is that as of gcc 4.6+, the sed example the author provided is no longer allowed even if the structs are unnamed. (I was testing with gcc 3.4.) I verified that gcc 3.4 would happily allow me to use the provided sed example as long as I removed the names from the structs.

Second, the C11 standard specifically does not allow for the sed example provided because the structs in the unions contain duplicate members.

I had seen reasoning that hinted at what you're saying in one of the comments posted on the author's blog entry.

Regardless, thanks for encouraging me to dig deeper.

eliteraspberrie · on Jan 14, 2014

Great hack!

Amusingly, version 7 Unix didn't even have a header file that declared malloc().

I noticed something similar with the old f2c code. It is about a decade newer than Version 7, and came with its own malloc implementation. [1] Apparently some Unixes still didn't provide a good implementation.

[1] http://www.netlib.org/f2c/src/malloc.c

Edmond · on Jan 14, 2014

C is actually quite impressive (or maybe gcc?) in this regard. A few years ago I dug up a 15 yr old C implementation of some image based search techniques by Daniel Huttenlocher, I was able to compile and run it with minimal effort:

http://pugoob.blogspot.com/2008/01/pugoob-image-search-tool....

geocar · on Jan 14, 2014

A while back, I did something similar to v6-ed: https://github.com/geocar/ed-v6

malkia · on Jan 14, 2014

I've learned something about the old days of "C" from the first commenter - e.g. the "struct { int integ; } - and then any pointer that does ptr->integ (from anything) would access it as an integer".

jhallenworld · on Jan 14, 2014

A similar trick: eliminate the need for '.' by declaring things inside an array of size 1: struct { int n; } z[1]; Now you can say z->n. This is useful because structs are usually passed by reference. Why should code accessing members of a local variable struct look any different than code dealing with a pass-by-reference struct?

CUViper · on Jan 15, 2014

Cute! However, if you really want complete equivalence between locals and struct* parameters, you might prefer something like "struct { int n; } _z, * const z = &_z;". There's a slight difference of types when using your array version, so things like sizeof(z) and &z don't mean the same thing as when used on a struct* parameter.

nly · on Jan 14, 2014

That's one reason why C++ has actual references in addition to pointers.

NAFV_P · on Jan 14, 2014

I was thinking of modernising some 25 year old code, being pre-ANSI it has some of the same problems, like:

  char *malloc();

Of course I am referring to Robert Morris' Worm.

_3u10 · on Jan 14, 2014

The funny thing is the struct hack with regard to variable access is back in Go.

known · on Jan 14, 2014

hackers delight