Hacker News
High C Compiler – A C language extension ahead of its time (cohost.org)
271 points by stefanos82 6 months ago | hide | past | favorite | 95 comments



I love named parameters. Whenever I use Python or Swift, I hate having to come back to languages like C or Javascript that don't have them.

It just seems like such a clear win to me, that and writing long numbers like 1_000_000. So much more readable.

Why aren't these features more common? I'm a bit surprised Typescript never added them. Are they too difficult to implement in compilers/interpreters? Do they cause too many issues for interop?

I'm honestly curious. Is it just that not that many people find these features appealing?

PS: I know you can emulate named parameters using objects in JS/TS. I don't love the performance consequences of this; it's good in some situations, but terrible in others.


In C99 and later you can use a workaround similar to the JS/TS one: designated initialization, for functions which take a struct by value or pointer.

E.g. for a function:

    void my_func(const bla_t* bla);
   
It would look like this:

    my_func(&(bla_t){ .x = 123, .y = 542 });
Designators can be in any order and can also be optional (missing items are zero-initialized, so the function should be aware of that and replace them with defaults).
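A callee-side sketch of that default-substitution pattern (the struct, field names, and the default value 60 are made up for illustration):

```c
/* Hypothetical struct, just to illustrate the pattern. */
typedef struct { int x; int y; } bla_t;

/* Zero means "not provided", so the callee substitutes a default.
   This only works when zero is never a meaningful value for the field. */
static int my_func(const bla_t* bla) {
    int y = (bla->y != 0) ? bla->y : 60;  /* assumed default for y */
    return bla->x + y;
}
```

Called as my_func(&(bla_t){ .x = 1 }), the missing .y arrives zero-initialized and gets the default.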

C++20 has a much more restricted designated init feature (basically useless for complex structs), but one nice thing in C++ is that default values can be declared right in the struct declaration.


This is one thing I really appreciate about your API design in C/Zig -- the mindfulness of being idiomatic to use by passing (possibly nested) designated initializers.


I hope the C++ committee will fix their designated init feature (at least for POD structs), because this single feature makes the code 10x more readable.


> Are they too difficult to implement in compilers/interpreters?

No, not really. You just have to be aware of them, decide that you want to implement them, then sit down and implement them.

For numeric literals with underscores the implementation is rather trivial, you fiddle a bit with the lexer to make it skip the underscores during the text consumption and value conversion.

For named arguments the main problem is not in the implementation itself, it's deciding on what semantics you want when you mix named and positional parameters: e.g. if you have f(a,b,c) then what does f(1, 2, a=3) or f(c=1, 2, 3) or f(b=2, 3, a=1) mean?
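C itself already had to answer an analogous design question once, for initializers: a positional initializer following a designator simply continues from the element the designator just set. A small illustration:

```c
/* After [1] = 10, the undesignated 20 continues at index 2;
   [0] = 1 then jumps back. Unmentioned elements are zero. */
static const int a[4] = { [1] = 10, 20, [0] = 1 };

static int elem(int i) { return a[i]; }
```

Any named-argument proposal has to pick similar rules for where a positional argument "resumes" after a named one, or forbid the mix outright.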


C23 has C++14-compatible separators in numbers, as in 1'000'000. (Not sure why they didn’t use underscores.) Unfortunately, strtol and friends haven’t been extended accordingly.
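A quick illustration of the strtol gap (the helper name is made up): a source literal like 1'000'000 is valid C23, but the runtime parser stops at the first quote.

```c
#include <stdlib.h>

/* strtol predates digit separators and treats the quote as a terminator. */
static long parse_long(const char* s, const char** rest) {
    char* end;
    long v = strtol(s, &end, 10);
    *rest = end;
    return v;
}
```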


The original C++ proposal [1] discusses some alternative syntaxes, including underscores. Underscores would conflict with C++ user defined literals, for example in 0xAA_BB it is unclear whether _BB is a literal suffix for 0xAA since it’s a valid identifier.

[1]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n34...


In Python, the int() string parser was quietly changed to accept underscores. This has bitten me very badly in production. IMO this was a very bad idea.


Yeah, literal int representation in source is one thing; changing the int-parsing string methods is something else and should probably require explicit opt-in.


How did this cause problems for you?


One system checked if it was a valid int by running `int(x)`, deciding that it was, and passing it to the other system. The second system ran a python version BEFORE this change, while the first system was a newer version. BOOM.

That was the first issue. But this made me extremely fearful in the general case, as parsing a number incorrectly can have quite dire consequences when dealing with money, for example. So suddenly accepting any random string of digits interspersed at random places with underscores is a Bad Thing, and not a tiny little implementation detail to push through the most popular programming language in the world without any discussion.


it seems like, the way you designed your system, literally any backward-compatible change to integer syntax in python would cause the same parser-differential denial of service vulnerability?


C23 features won't help many C users because it's a community that takes an uncommonly long time to move to new versions. Linux only recently moved from C89 to C11.


TBF it makes little sense to change the language standard of an existing project, because then you end up with large areas of code in the old standard and some areas in the new standard. And bringing the entire code base to the new standard is just pointless busywork that might actually introduce new bugs.

As long as Linux doesn't prevent user space code from being written in more recent C standards (which it obviously doesn't), it's fine.

Apart from that, there's also the Visual Studio compiler team to blame because they didn't manage to catch up with C99 until around 2015 (and even then only an incomplete implementation), and only in 2019 decided to 'officially' support the latest C standards again.

In general it makes a lot more sense today to start new C projects in more recent standard versions than it did 5-10 years ago.


Strong agree on all points. The gains from bumping e.g. a C++ version are much higher.


The cost of bumping a C++ version is also much higher. For C, you lose support for older compilers and minor platforms that haven't caught up, but if this isn't a problem, then only very minor (or no) modifications to the source should be required. Newer features can then be used gradually where it makes sense.


> I'm a bit surprised Typescript never added them

TypeScript's pretty insistent on not adding features to the language that can't be "compiled" just by stripping types away. There was a little bit of slack around this early on (enums, most notably) but these days something like named parameters would probably be a non-starter until/unless JS implemented them first.


Also it’s easy enough to pass objects with named properties and destructure the object in the function parameters. So there’s no real need for named parameters anymore and it would just add another duplicative way of writing the same code.


JavaScript has named parameters, in sort of the same way that C has named parameters.

Get used to writing your functions using objects for the arguments:

  function myFunc({ foo, bar }) {}
Then you can call

  myFunc({ foo: 1, bar: "x" })
Similarly in C/C++

    struct MyFuncArgs { int foo; char bar; };
    void myFunc(struct MyFuncArgs args) { /* ... */ }
    myFunc((struct MyFuncArgs){ .foo = 1, .bar = 'x' });


You can use objects to imitate named parameters in JS/TS. I think that's a widely used convention, and most of my APIs use one object parameter instead of multiple separate parameters exactly because named parameters are awesome. With TS it looks clunky with all those type declarations, but I can live with that.

As to your question, I share this feeling. Named parameters should be a standard feature in every language. The absolute majority of functions would benefit from the more verbose calling syntax.


One issue with named parameters is that your public function argument names become part of the public API, and thus any change is a breaking change. It's manageable, obviously, but pretty obnoxious, because otherwise parameter names stay inside the black box.


Depending on your language design combinations of named and optional parameters can lead to some real problems, especially if you want to link to a different version of a library than the one you built with.


D has allowed underscores in numbers for as long as I can remember. They're implementing named parameters right now.


I’m glad named args aren’t in JS/TS.

Objects are good enough and it would add complexity for minimal reward.


Passing objects is less efficient than passing individual arguments in multiple ABIs, though, even if the straight byte count would be exactly the same.


ABI is not a concern for JavaScript.


Even if JS does not have a "public" ABI, I'm pretty sure passing an integer will be faster than passing a new temporary object in at least multiple JS implementations.


All lisps, from Scheme to Clojure, have them under the name "keyword parameters"


Everyone knows lisps are superior, the trick is not to rub it in.


scheme (at least as of r5rs) doesn't have keyword parameters




The High C Compiler was designed and built by the geniuses at MetaWare: Frank DeRemer (MIT) and Tom Pennello (UCSC), the pair who invented LALR2 parsing. I worked there in the early 90s; great memories of Tom editing compiler code in his windows terminal so quickly that the screen couldn't update fast enough to keep up with him, and he continued editing 'in the blind' on unseen code. Those two are/were seriously gifted computer scientists, and they gave me my start in the software industry.



On what platform did development take place?


We all had windows machines on our desks, but MetaWare made cross-compilers for the complete cross product of every flavor of CPU and OS. So lots of SunOS/Solaris, BSD, IBM 360, Irix, etc. servers on the network.


Thanks, I was thinking of early development, which I'm assuming was well before the existence of Windows. However, I understand you may not have awareness of this period of development.


Likely on UCSC's VAXen running BSD4.2.


I worked using the High C compiler in my first post-university job back in '88-'89, working for PacTel Teletrac, an early street mapping and vehicle tracking division of PacTel. I was formerly one of the beta developers for the Mac, and my job was creating the GUI subsystem for the Teletrac application.

The High C Compiler was fast, I remember that, as I was used to getting a cup of coffee when compiling anything decent-sized back in the late 80's, and High C compiled too fast for a cup. As for the non-standard C features available in the High C Compiler: we were not allowed to use any of them, as they were seen as proprietary lock-in that would render our application unable to change compilers if needed.

Anyone remember Ted Gruber's FastGraph?


Generators in C in 1989?

Seems to me some programmer from the future has journeyed to the past to design High C.

Gives me a slight urge to write some High C, an urge I’ll never follow through on.


Generators were introduced in 1975, in the CLU language, along with other things such as exceptions (as they are commonly understood today), variant types, and bounds on generic parameters.


Case ranges have been in gcc for a million years. And also nested functions. I used nested functions a lot, but they were never ported to clang, which makes them a bit of a problem for portability; also, nowadays the linker complains about having to make the stack executable, which is another problem, so I had to stop using them.

I also don't understand why the "C standard" hasn't evolved further -- there was a discussion lately about the committee being out of touch. Some of these extensions make life so much easier. Instead we get stupid stuff nobody needs, or stuff that has been in a header for 200 years (ie, <pthread.h>, whoohoo, I'm delighted this standard header is now a... standard!)

For nested/sub functions, it makes so much sense it's not even funny. Having micro-callbacks right there before the call to qsort() (or any other) is SO much better than having to farm out a context, yet-another-static-function, etc...
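For readers who haven't seen it, the GNU C nested-function extension being described looks roughly like this (GCC only; Clang never adopted it, and the executable-stack trampoline is only needed when the nested function's address is taken, not when it is called directly as here):

```c
/* GNU C extension: a function defined inside another function can
   read and write the enclosing function's locals. GCC only. */
static int sum_scaled(const int* xs, int n, int factor) {
    int total = 0;
    void add(int v) { total += v * factor; }  /* captures total, factor */
    for (int i = 0; i < n; i++)
        add(xs[i]);
    return total;
}
```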


> (ie, <pthread.h>, whoohoo, I'm delighted this standard header is now a... standard!

<pthreads.h> is POSIX, it has never been - and still don't is! - part of the C standard. In particular, MSVC only supports it through third party libraries.

However, C11 added cross-platform thread support with <threads.h>: https://en.cppreference.com/w/c/thread. I wouldn't call that "stupid stuff nobody needs".


> and still don't is!

"isn't," not "don't is"

You're right, though


d'oh


> I'm delighted this standard header is now a... standard!

The POSIX standard isn't the C standard though (for better or worse).

Especially on Windows with MSVC "obvious" things like this are traditionally a royal PITA (which also means that a lot of C code coming from the UNIX world simply doesn't build on MSVC even though MSVC is now a decent standard-C compiler again).

But yeah, the C standard is just the minimal common subset of what C compilers offer. The good stuff is all in Clang and GCC extensions.


> But yeah, the C standard is just the minimal common subset of what C compilers offer.

Why is this? Most other standards — for languages or otherwise — seem to run ahead of implementation, as meetings of major players agreeing on what they'll all implement next, with the standard constraining the implementations on how to do that.

But the C standard is seemingly instead just an external, descriptivist report on "what you can get away with writing in C while keeping it completely portable."


Don't know; the most likely reason is that the standard was always an afterthought and 'reactive' in the C world (e.g. the language already existed for nearly two decades before the first standard was released).

TBH I think the general idea of "let's only standardise what has already been implemented in real world extensions and proven its worth" isn't too bad, especially when looking at the mess that the C++ standard committee made of C++ in the last decade.

But even with this pragmatic approach I imagine it's a highly political balancing act to get multiple powerful compiler vendors to actually agree on the same thing.

See for instance the C23 auto proposal which contains a special exception for Clang's behaviour which clashes with GCC's behaviour:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3007.htm


Problem is, that approach forces the committee to roll in stuff from the C++ spec -- stuff that would have been shot down in flames if you had hinted at that sort of thing as a proposed C extension.

For example, the [[keywords]] were rolled in because they were already out there -- imagine proposing that out of the blue and watching people self-propel into orbit...

On the other hand, the case A ... B: syntax has been out there for 30+ years; well, there's a proposal for it, but it won't be THAT, because, well, it wasn't invented here really, so we'll make up a different syntax for it instead...


One problem is that the important compilers are today mostly controlled by C++ people, and smaller C compilers and C users are underrepresented. Every change in C that is different to what C++ has already is an uphill battle.


Well, there is a reason for that. The A ... B syntax requires spaces between A, the three dots, and B, or else it will be lexed as a single number. C is supposed to ignore whitespace; it wouldn't make sense to add a feature that requires whitespace.


It doesn't require a space; you could modify the lexer to recognize '..' as a prefix and solve that problem. It's just a choice they made, refusing to accept that it could be implemented with a one-character-lookahead lexer.

It's not because it is implemented like that in gcc (requiring a space, possibly so they didn't have to modify the lexer) that it ought to be specced like that.


All C lexers are greedy, i.e. they extend the current token as far as they can. So "0..9" is split as a token "0." (numeric literal) followed by token "." (operator) followed by token "9" (numeric literal).

To do what you suggest, the lexers would need a major redesign, not just a small change.
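The C library's own number parsing shows the same greediness (the helper name is made up): given "0..9", strtod consumes "0." as one maximal numeric token and stops at the second dot.

```c
#include <stdlib.h>

/* Greedy "maximal munch": "0." is consumed as one floating-point token. */
static double first_number(const char* s, const char** rest) {
    char* end;
    double v = strtod(s, &end);
    *rest = end;
    return v;
}
```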


C still supports EBCDIC, and other potential non-ASCII character sets one might invent in the future.

It promises e.g. that '0'..'9' are contiguous, but e.g. "case 'A' ... 'Z'" would behave differently.


That has clearly prevented dozens of other languages from supporting that feature... because no parser would be able to tell you that the range isn't valid at compile time?


In EBCDIC, ‘Z’ > ‘A’, so the range 'A' ... 'Z' is valid, but may not do what the programmer intended.

Most other languages never supported EBCDIC or dropped it long ago.

(Also: 'A' ... 'Z' may not do what the programmer intended in most other languages, too. Even ignoring Unicode, quite a few encodings have accented letters that are outside of that range)


> the C23 auto proposal which contains a special exception for Clang's behaviour which clashes with GCC's behaviour

Can you explain what that is? There’s supreme awkwardness around allowing `auto int i;` at block scope to work as before (with `auto` being ignored), but I’ve stared at that proposal a fair bit and haven’t noticed such an exception there.


The details are buried in the proposal and make for an interesting read, but the gist is that Clang's C frontend has a "C++ style" auto keyword where a pointer can be declared both with "auto" and "auto*" (and this was already baked into the existing __auto_type extension before C23 came along).

For instance this is valid Clang C23 code (and entirely C23 standard-compliant):

    int i = 0;
    auto p0 = &i;
    auto* p1 = &i;
...both p0 and p1 have the same type, int*.

In GCC this code doesn't compile because GCC already had a simpler "C style" __auto_type extension which doesn't allow the form "auto*", but this behaviour is also entirely valid C23 ¯\_(ツ)_/¯.

(personally I prefer GCC's version as it makes more sense for C)

This will cause some head-scratching when code that builds on Clang doesn't build on GCC, especially since Clang usually tries to emulate GCC behaviour. But OTOH I think/hope that C's auto will only be used when absolutely necessary in "type-generic" macros, with none of the "almost always auto" nonsense that was fashionable in the C++ world for a while.


I'm in WG14, and this is not true. The later versions of the C standard are not dependable and diverge quite a bit from implementations. There are lots of things in the C standard text that you should not depend on. There are in fact extensions that are more commonly implemented than standard features, and some standard features that have no known implementations.

I hope to work on a guide for developers who want a "dependable C" where they can guarantee it will compile and run correctly in as many implementations as possible.


I agree. Nested functions are very useful and often lead to substantially better code. With GCC 14 you can also create heap trampolines, so an executable stack is no longer required when you take the address of a nested function (if you don't take the address, it isn't required anyway).


“Nested functions can even goto back into their parent function, allowing for nonlocal exits to break out of nested functions like Smalltalk blocks, allowing control flow-like functions to be built using them.”

Wild!

Makes one wonder how something could have evolved if it took C and made it more “functional” and/or smalltalk-ish, without going the dynamic way of ObjectiveC.

I don’t exactly love C++ but I find it very useful. Maybe I could have loved that other never conceived C offspring.


Note that this functionality is already in C — in the form of the setjmp/longjmp functions, which indeed allow you to have a parent function that contains "global" error-handling, and then a hierarchy of child functions designed to only be executed from that parent, that can "jump out" to the parent's error handler. (But where "parent" and "child" here are just a conceptual relationship the programmer is keeping track of, not anything the compiler knows about.)

The difference is that High C gives you lexically-nested function definitions, thus making labels into things like local variables that have scope + shadowing. And so it becomes sensible — necessary, really, to have a coherent semantics — to extend the (runtime, dynamic) longjmp, into a compile-time-targeted, lexical, syntax-sugared version that jumps to in-scope labels (incl. ones defined outside the current lexically-nested function definition.)

If you didn't, then you'd have to decide on what a `goto` to an in-scope but out-of-function label would do instead — compiler error, maybe? — and whatever you choose, it would probably be just as hard to implement as just making it work. (Most of the effort is in extending the CFG to allow some analysis pass to notice that this is happening, after all.)
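A minimal sketch of that setjmp/longjmp pattern (the names are hypothetical): the parent installs a handler, and a child designed to run only under it jumps straight out on error.

```c
#include <setjmp.h>

static jmp_buf on_error;  /* shared by convention, not by lexical scope */

/* "Child" only ever runs under a parent that has set up on_error. */
static void child(int v) {
    if (v < 0)
        longjmp(on_error, 1);  /* nonlocal exit to the parent's handler */
}

static int parent(int v) {
    if (setjmp(on_error))
        return -1;  /* reached via longjmp: the error path */
    child(v);
    return 0;  /* normal return */
}
```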


> A function nested in a generator can capture the yield operation from the outer generator, and the nested function can call itself recursively to traverse a tree or other recursive data structure, yield-ing at each level to produce values for the generator. I don't think you can do that in Python or in many other mainstream languages with generator coroutines.

Automatic yield flattening is a scary feature. Python has explicit control via `yield from`.


Lots of good memories and insights in this article.

The 80s and 90s were a time of interesting experimentation in the Japanese PC market. Sadly, it eventually converged into the grey goo of baseline Wintel, just as the contemporaneous and exciting world of keitai ended up as the same boring smartphones as everywhere else.

Standardization of things like POSIX and languages ironically has the effect of reducing extensions -- most people write to the standard for maximum portability. I do it too. I'm not a fan of single-implementation languages* (Python, Perl, Go, Rust, etc.), but at least they do hark back to a time when languages really were written with greater programmer affordances (just as CISC machines had all sorts of affordances for writing applications in assembly code, such as BCD instructions, string manipulation, etc.).

* Yes, I know about gccgo and rust-gcc but both were written to be compatible with the original implementation rather than to an independent standard. So de facto those are single-implementation languages.


All of them apart from generators look like they were taken straight from Ada.


D has _ in integers and that did come straight from Ada. I always liked Ada.


As I was reading I was thinking High C looks a lot like D!

Though D still has no named parameters :D (I know it's in the works).


Dennis Korpel has nearly finished the named parameters. Yes, the rest does look like D, such as the nested functions! C still doesn't have nested functions. Once you get accustomed to them, they are really useful.


Looking forward to named parameters; they help so much when functions take more than 2 or 3 parameters. By the way, I've been doing some D lately, just as a hobby, and it's been really nice! I had no idea before that D had so many features (I would say it even has too many - but still quite nice).


D does in some ways have too many features. We'd like to prune out some of them.

I agree that named parameters can be very helpful. For example, if there are three bool arguments, it makes calls like:

    test(true, false, false);
into:

    test(format: true, output: false, skip: false);
which makes it much more readable.


Pascal predates Ada and already had ranges; CLU and Modula-2 also predate Ada and had generators.


i know icon had generators, but i don't think it predated ada. did snobol4, icon's very different predecessor, have generators?


Quite easy to find out, Ada released to the world in 1983, Icon released to the world in 1977.


it's a bit more complex than that; ironman was already written in january 01977, for example https://dl.acm.org/doi/pdf/10.1145/234286.1057816 §5.3.4 p.197 (25/60)

i'm not sure if ironman listed generators as a requirement or not; i suspect they were added later

this paper doesn't mention clu, modula-2, or icon as influences, for whatever that's worth


Since we are doing archeology,

https://dl.acm.org/doi/10.1145/154766.155376

We come to https://link.springer.com/book/10.1007/BFb0021415 from the references, with an interesting table of contents.

A similar approach can be taken for the other quoted references, if one is really keen on who did it first and how much the Ada folks were aware of them.

Naturally since not all of them are scanned, it is going to be a lot of work.


Archive.org has two scans of "Design and implementation of programming languages : proceedings of a DoD sponsored workshop, October, 1976, Ithaca". One is at https://archive.org/details/designimplementa0054unse_u4m7/mo...


Thanks for hunting it down.


fantastic!


thanks! i think i actually have the book for the first one in the other room


The concept of C is still the king or queen of all languages for most programs [I don't know about FORTRAN]. Even though I'm working with Python, Java & Swift, I love the C language most among all modern programming languages. C is very straightforward. C is a really great design. [I hate Rust's annoying syntax]


You might like Zig.


It's like Ruby's grandfather


Yeah, I wonder if Matz had this manual?


No clue what FlexOS is but High C v1.7 seems to be there: https://archive.org/details/mwhc17flexos

Maybe it can be "ported" to some other kind of OS.


It is (or was) a real time protected mode OS for 286 processors.

I actually used that compiler over 25 years ago; it generates Intel OMF files for 8086/80186/80286 - as I recall, one can select the target.

So post compiling, one should be able to link the result to target a DOS machine, VM, or even DOSBox.

I'd suggest not porting it, but rather simply simulating the FlexOS SVCs, since it doesn't really need any of the RT capabilities, merely the file access, memory allocation/free, and possibly execution (COMMAND SVC).

The package contains two forms of the compiler, a single executable version, and an overlaid version. The non overlaid version would be easiest to get working in such a simulated approach, as one would not have to implement the OVERLAY SVC.


Nice details! Do you have a contact in case someone wants to pick that up as a side project?


Contact details for myself, if that is what you meant, should be irrelevant.

Most (if not all) the needed information will be available in the documents under this directory:

    https://bitsavers.org/pdf/digitalResearch/flexos/
There are also copies of the OS available here:

    https://bitsavers.org/bits/DigitalResearch/FlexOS/


If we had switch ranges in C many other languages wouldn't be...


Well, good news, there is a new proposal [1] arguing for their inclusion in the language. You can express your support for it to the committee.

[1]: https://open-std.org/JTC1/SC22/WG14/www/docs/n3194.htm


I can't read "Fujitsu" without immediately thinking of a public-private partnership post office software debacle where a software error is exploited by unsupervised private prosecutors incentivized by financial reasons (finder's commissions) to wrongly accuse, extort, and imprison hundreds of subpostmasters.


Well, they weren't 'Fujitsu' at the time though; they were still the British company 'ICL' ('International Computers Limited') in 1999.

The post hoc insistence on using its rebrand helps with the 'othering' of making it sound like a foreign company doing all the dodgy shit.


First, what is wrong with you? You're weaponizing a false racism accusation to silence someone making a point while denigrating and disrespecting actual racism.

Second, you're wrong because that was until 2002 when they were acquired and kept doing "dodgy shit" together with the Post Office until 2015. The Post Office recorded the profits, the private prosecutors got a cut of false reclaimed losses, and the software contractor enabled them.


you may be overgeneralizing a bit

there are 18 sovereign countries in https://en.wikipedia.org/wiki/List_of_countries_and_dependen... with smaller populations than fujitsu, and 31 with smaller economies than fujitsu https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nomi...


Reminds me of Martha Wells' sci-fi novel series Murderbot Diaries, and the corporations in the Corporation Rim


What are you talking about? Did you watch the drama or the documentary that finally spurred the UK gov to partially redress this absurd scandal?


no, of course not, watching documentaries is a waste of time that makes you actively less informed than you were before watching them; your irrational knee-jerk response to the word 'fujitsu' is a perfect example of this



