C Is Not a Low-level Language (2018) (acm.org)
293 points by bmer 11 months ago | 396 comments



C is low level for at least one reason: manual memory management. Especially with modern hardware, memory management is at the center of programming. For example, Rust prides itself on being memory safe without a garbage collector; memory management is more or less the entire reason for Rust to exist. Why is C fast? Memory. Why is C unsafe? Mostly memory. One of the big reasons parallel computing is hard? Concurrent memory access. Functional programming is often surrounded by plenty of mathematical concepts, but a good part of it is pretending that objects are immutable while, behind the scenes, the compiler works with mutable memory.

In C, every call to the allocator is explicit (that is, if you are using an allocator at all). Compare that to old-school C++, with new/delete and raw pointers, where you may call the allocator explicitly, but a lot still happens in destructors, automatically. Modern C++, with smart pointers, is essentially like a garbage-collected language in the sense that allocation and deallocation all happen automatically.


> C is low level for at least one reason: manual memory management. Especially with modern hardware, memory management is at the center of programming.

Ok, but even with C we can't actually manage memory at the low level the processor does. You can't tell the processor what to keep or not keep at each level of cache, what to send to virtual memory, etc. It's lower level than, say, Python, but I don't think it is low-level memory management in the way PDP-11 C was.


A lot of that is just a property of modern OSs, with good reason, intentionally not exposing these features to userspace processes. It's not really a function of the language itself.


Hmm, true for virtual memory, didn't think of that, but CPU caches are inside the processor, can even the kernel control it at all?


It's not really a question of whether or not it's feasible, but rather: if the hardware supports such operations, would they be expressible within the language? If they are, then we should point at the accessibility of the functionality rather than at C.

In this case, I could definitely see `set_cache_behavior(ptr, len, options)` being perfectly reasonable, so I'd argue that, again, the fact that we can't do it is more a property of the environment than the language itself.
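For concreteness, a purely hypothetical sketch of what such an interface could look like in C; nothing like this exists in the standard or in POSIX, the names just illustrate the idea above.

    /* Purely hypothetical sketch; no such API exists in standard C or POSIX. */
    #include <stddef.h>

    enum cache_opt { CACHE_KEEP_L1, CACHE_KEEP_L2, CACHE_BYPASS };

    /* Hypothetical: ask the platform to apply a caching policy to [ptr, ptr + len). */
    int set_cache_behavior(const void *ptr, size_t len, enum cache_opt options);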


The reference point when we talk about low-level languages is not the transistor but rather machine code. In this respect we say that C is low level because one pointer dereference in C translates directly into a memory load in machine code.

The fact that modern memory load operation involves cache, protection, memory mapping, etc.. is not a property of language, but rather of the environment (CPU + OS).
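For illustration, a minimal example of that direct translation (the commented assembly is typical x86-64 output at -O2, shown only as an example, not a guarantee):

    /* A pointer dereference corresponds to (roughly) a single machine load. */
    int load(const int *p)
    {
        return *p;   /* e.g. "mov eax, DWORD PTR [rdi]" followed by "ret" */
    }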


That's fair enough, but then it doesn't seem to be the most useful definition it could be. Low level, to me, could mean that we control all the details. But modern processors have microcode between the transistors and the x86 machine code, and we can't control all that memory stuff.

But those aren't abstractions that we can treat as black boxes: we need to know them and write code that takes them into account, without actually having control inside the black box.


You can control what goes into cache if you want to. See the effort to make an open-source BIOS do this in order to have working memory before the DRAM controllers are initialized. [0]

0. https://www.coreboot.org/data/yhlu/cache_as_ram_lb_09142006....


> Ok, but even with C we can't actually manage memory at the low level the processor does. You can't tell the processor what to keep or not keep at each level of cache, what to send to virtual memory, etc.

Neither can assembly, so it's a useless distinction.


There are CPU instructions to pull memory into cache, send cache back to main memory, and mark things in cache as not worth writing out to memory. All are hard to use from C; the last is basically impossible.


> pull memory into cache, send cache back to main memory

I haven't had much of an issue with intrinsics.
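For example, a minimal sketch of the kind of cache-control intrinsics being discussed (x86-specific, via <immintrin.h>; these are hints to the hardware, not guarantees):

    #include <immintrin.h>

    void cache_hints(int *p, int value)
    {
        _mm_prefetch((const char *)p, _MM_HINT_T0); /* pull the line toward L1 */
        _mm_stream_si32(p, value);                  /* non-temporal store: bypass the cache */
        _mm_clflush(p);                             /* flush the line back to memory */
    }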

> Mark things in cache as not worth writing out to memory

Can you give an example of this in a surviving architecture?



That isn't really marking cache items as not being worth writing out to memory. It's a feature for communicating with non-coherent devices, to tell the cache that any lines it might have speculatively loaded might have changed out from under it.

"Not being worth" reads like cache perf management, as opposed to uses of DC CIVAC, which are strict correctness issues.


Intrinsics are not part of C though. C is the abstract machine defined in various specs and its syntax and semantics.


C is as much its implementations as it is its spec. And the spec came much later than the implementations, and really was designed as a common subset.


Any language can have intrinsics; C is not a special snowflake.

So if intrinsics count, then any language goes.


Any of those instructions could be wrapped in a C function or intrinsic if they were valuable enough


Wrapping other languages does not count when describing how high- or low-level a language is. You can embed machine code in almost any language, even in spoken languages.

It would be like 'English is not easier than Chinese, since I can quote a Chinese sentence in English.'


There's a bit of a difference in that C has ways to closely integrate assembly as needed. Most languages don't.


They aren't part of C, they are compiler specific extensions.

Any language can have compiler specific extensions.

Playing a fair game, all non-toy languages can load modules written in Assembly.

In fact that was the only way in K&R C, compiler extensions came later.


As the article notes, that's because CPUs are designed to run existing C code fast. You could create an instruction set that provided this control, but it might be a tough sell in a world full of C code.


"Memory" itself is an abstraction around a much more complicated model (virtual memory / pages) that most programmers remain ignorant of. (Unless you're working on a microcontroller class system, or other system without an MMU but that's a whole other kettle of fish(.

Even Rust developers like myself labour within the fantasy that a pointer is, y'know, like an address to memory, a real "physical" thing. Rust (and to some extent C++) introduces some management abstractions in front of this in the form of references and borrowing, but the main concept is still there.

In reality the kernel of your operating system has put a giant layer between you and the physical memory, and the "address" and "pointer" are really just handles behind which the OS and MMU do all sorts of shenanigans.

"Raw pointers" really aren't raw. They're handles to offsets within pages, which can be all over the place. It would be entirely possible to walk away from the libc & C model entirely and work in a world of pure references interacting directly with VM subsystem pages as some kind of "object handles" and be much closer to the actual operation of the underlying system.


> Unless you're working on a microcontroller class system, or other system without an MMU but that's a whole other kettle of fish

So C can do actual memory management; your OS or hardware just won't let you. I've done programming for audio effects gear where memory is directly accessible by real address, often with different memory chips with different performance characteristics (for cost reasons) corresponding to different pointer value ranges. Just because your machine won't let you do it doesn't mean C isn't capable of it.
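A minimal embedded-style sketch of what that looks like (the addresses and names are made up for illustration; on an MMU-less part a pointer value really is the physical address of a specific chip):

    #include <stdint.h>

    #define FAST_SRAM ((volatile uint8_t *)0x20000000u)  /* hypothetical on-chip SRAM */
    #define SLOW_DRAM ((volatile uint8_t *)0x80000000u)  /* hypothetical external DRAM */

    void copy_to_fast(uint32_t n)
    {
        for (uint32_t i = 0; i < n; i++)
            FAST_SRAM[i] = SLOW_DRAM[i];   /* move hot data into the faster chip */
    }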


Raw pointers are how you communicate with your CPU. They are "raw" in the sense that they're just an integer number (not really on the C language level, but they have an integer representation on any actual target like x86) and that you have to synchronize these pointers with the lifetimes of "actual" objects, which are only an abstract concept that your computer doesn't understand.

Meanwhile, virtual memory is as close as you can get down to the physical hardware in terms of normal CPU instructions (i.e., not VM management code). VM as a concept is orthogonal to raw pointers, which can be either virtual or physical.

Raw pointers are nothing like handles. They need to be manually "synchronized" properly with VM management (which happens completely behind the scenes for 99.99% of userspace code) to make sense, but it's not like there is bookkeeping overhead in copying or offsetting a pointer, like there would be for a "handle".

The point of a handle is that it's used to hold objects, to keep them alive. Raw pointers don't do that.


Would such a model be generally useful, though?


I have thought so for a long time. It could open up execution of functional languages on a truly distributed runtime. Something like the fabled Tao operating system I guess.


Definitely useful in some systems context, especially e.g. database page buffer management.


> It would be entirely possible to walk away from the libc & C model entirely and work in a world of pure references interacting directly with VM subsystem pages

Is this possible in Ring 3? Or would everyone be running in kernel mode at that point?

Even if you do away with that layer, then there may still be a hypervisor lying to the kernel about memory.


C's memory management is its own abstraction. malloc and free are library functions. They're an abstraction not just over the hardware (which doesn't have anything byte-wise allocated like that); they even abstract away the way operating systems allocate memory.

You don't get direct access to the stack in C either. Stack frames are abstracted away, and you only get setjmp/longjmp.

If you pay attention to Undefined Behavior and strict aliasing, you don't even get that much access to poking around memory.
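To illustrate the first point, a minimal sketch (assuming a Linux/BSD-style mmap) of the layer malloc itself sits on: even "manual" allocation in C is an abstraction over OS primitives.

    #include <stddef.h>
    #include <sys/mman.h>

    static void *os_alloc(size_t n)
    {
        /* Ask the kernel for fresh, page-granular memory. A real allocator
         * would pool, split and reuse such regions to hand out byte-sized chunks. */
        void *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }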


BASIC can also do manual memory management. Not only that, it had a whole computer generation to itself, on computers not able to host a full ISO C implementation.


So does Python (via ctypes) and pretty much every language we consider "high level". But in BASIC your default approach is "DIM names$(count)", which "magically" manages your memory for you, which is why we consider it higher level than C.


That is hardly different from malloc(count * size); REDIM exists (aka realloc()), and many BASICs offer the free variant as well.

In fact, there is hardly any difference between VMS BASIC and VMS C in terms of what is possible, if we want to take the discussion outside of 8 bit versions.


> Why is C unsafe? Mostly memory.

C can't even do all of integral arithmetic safely. It's a language that goes really out of its way to add unsafety.


I'm pretty sure that this is one of the unsafeties that rust borrows from c, even as it attempts to eliminate all the others. Checking every addition adds a massive slowdown, without giving much useful protection against vulns or corruption.


> I'm pretty sure that this is one of the unsafeties that rust borrows from c

But integer arithmetic is safe in terms of Rust.

> Checking every addition adds a massive slowdown

It only does so for debug mode. In release mode, it uses modular arithmetic.


> It only does so for debug mode. In release mode, it uses modular arithmetic.

So it's still treated as an error, just one that has a predictable fallback. I'm really not sure how that's much different from `-fsanitize=undefined`. Broken code is broken, even if it breaks in a predictable manner.

Now if the modular arithmetic had been enshrined as the expected behavior without being treated like an error to be caught, it'd be another matter.


An error does not rewrite your entire code on the assumption that it cannot happen.

Signed overflow is not an error in C.


I meant error "in the code".

A chunk of C that causes a signed overflow has an error in it. Seemingly, so does Rust code according to the behavior described in the post I was replying to.

My point is that I question how big the value gain is from having a predictable fallback when we are already within the realm of "this code is considered wrong". This isn't unlike the various arguments against the value of compiler warnings.

That being said, I agree that it's preferable in general, but the difference seems rather marginal to me. That is, within the context of what I'm replying to. I wouldn't be surprised if Rust had a few additional tricks up its sleeve to address this.


You really mean the usual C reasoning of "if this program has an error, what difference does it make if it returns the wrong value or formats the main disk" (with an implicit "I see none" added at the end)?

Because a caught static error, a runtime error, a wrong value, and C's UB are completely different beasts.


I think modular behavior at run-time is actively dangerous. It is not memory-unsafe, but still unsafe. Having it trap would be better. For C, you can tell the compiler to trap on signed overflow.


IMO, that's a job for the type system. But if you can only have one option, clearly an error is the best one.

Anyway, none of those are anywhere near as damaging as C's UB. All of them are reasonable, in the literal sense that you can reason about them, anticipate what your program may do, and defend against the problem (or shrug it off and claim "it doesn't matter here"). You can do neither with by-the-spec C.


I do not think C's UB is damaging. As I said, you instruct the compiler to insert a trap and then it is not unsafe.

Example: https://godbolt.org/z/Kvrrx19Pa

The UB in the spec is exactly what makes safe use possible without enforcing it everywhere, which is not feasible for C.


That should just be a matter of a compiler knob, no? Such as -fsanitize=undefined (which is the sledge hammer, but there could be more fine grained ones).


It's not. There is no sanitizer on embedded platforms and it turns out, I only use C on embedded platforms, which for me means the UB sanitizer doesn't exist.


Also, that flag enables only unreliable detection of the issue. It won't catch it 100% of the time. :D It also interferes with other flags and checkers like Valgrind and adds bloat to the executable as a bonus.


> But integer arithmetic is safe in terms of Rust.

To expand on this: integer overflow is not UB, it is unspecified. It can result in clamping, wrapping or a panic, depending on configuration at compile time.

> In release mode, it uses modular arithmetic.

And I believe that to have been a mistake. Android enables overflow checks by default and there is no measurable performance impact.


Lol, when C does it, it's "unsafe". When Rust does it, it's "modular"!


For anyone wondering about the term "modular":

> In mathematics, modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" when reaching a certain value, called the modulus. The modern approach to modular arithmetic was developed by Carl Friedrich Gauss in his book Disquisitiones Arithmeticae, published in 1801.

https://en.wikipedia.org/wiki/Modular_arithmetic


Modular arithmetic (and also two's complement) is a natural feature of most CPUs. It comes naturally from the way logic counters work in hardware. C enforces that by not touching that logic. Rust is the same in that respect. Python, on the other hand, treats integers as objects with a virtually unlimited number of digits. That said, float/double precision and logic is still CPU dependent and is used as-is in most languages.


It's astonishing the number of people defending the C "ideals" who demonstrate ignorance about what C actually does. (Is it artificial, in order to willingly miss the point?)

Only some of the integral types in C are modular. If they all were, it wouldn't be a problem.
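A small illustration of that distinction:

    /* Unsigned arithmetic is defined to wrap modulo 2^N; signed overflow is UB. */
    unsigned int uadd(unsigned int x) { return x + 1u; } /* UINT_MAX + 1u == 0, well defined */
    int          sadd(int x)          { return x + 1;  } /* INT_MAX + 1 is undefined behaviour */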


No, what's astonishing is that now that every CPU worth using has 2's complement for signed integers, the compiler writers are still embracing undefined behavior in the name of piddly optimizations.

I was tempted to specify "unsigned" to ward off the obnoxious pedants, and I see that I should have. C really should be a portable assembly language by now. A very small non-breaking change to the standard, and C's arithmetic would be the same as Rust's.


> No, what's astonishing is that now that every CPU worth using has 2's complement for signed integers, the compiler writers are still embracing undefined behavior in the name of piddly optimizations.

Yeah, that's also astonishing.

Anyway, I've stopped blaming the C developers by now. I just assume they have the goal of killing the language and moving people into a more ergonomic alternative. I don't know their true intentions, but this has been a very predictive assumption.

(I guess any definition of any UB would be non-breaking, so yeah, they could fix all of the language.)


> I just assume they have the goal of killing the language

Sadly, that's my conclusion too. I really wish there was a good "portable assembly language", and maybe that'll be something that targets WASM.


then use -fwrapv if that's what you want...


C and Rust do very different things in this case. C defines overflow of signed integers to be Undefined Behaviour. Whereas in Rust it either wraps (release mode) or panics (debug mode).


Any specific C compiler could do the same in complete agreement with the C standard.

There isn't a guarantee that any given standards-conforming compiler will, but it seems that with Rust there isn't a guarantee what behaviour you get either (it depends on the compile settings). In either language, you can't write code that does signed overflow in a meaningful way (at least not if you use Debug).


I agree with your point, but note that Rust does have an ugly way to do it:

https://doc.rust-lang.org/std/primitive.i64.html#method.wrap...

I'd rather just put `-fwrapv` on the command line than clutter my code with crap like that though.


> I'd rather just put `-fwrapv` on the command line than clutter my code with crap like that though.

The advantage of Rust's way is that it lets you customise addition on a per-operation basis. So you can mix and match wrapping addition with saturating addition, etc.


Yes, I can have functions/methods that do math differently from the default. I could just as easily say that's C's way and mix and match those:

    int64_t x = add_i64_with_abort_on_debug_and_wrap_on_release(y, z);
    uint8_t t = add_u8_with_saturation(u, v);
I'd prefer the default for the math operators be what all the CPUs currently do, and neither C or Rust promises that.


Yes, so fix the C standard... But the compiler guys won't let anyone fix it, because optimizations.

So either A) the optimizations using UB are important, and therefore C is/will-always-be faster than Rust which doesn't have them. Or B) the optimizations using UB are not important and the compiler writers for gcc and clang are wrong.

You pick.

And of course some insane people advocate for adding undefined behavior to Rust in the name of optimizations. Gross.


> But integer arithmetic is safe in terms of Rust.

It's "defined safety": If a >= 0 and b >= 0 then a + b > = 0. True according to most schoolchildren but not true according to the Rust spec. It breaks the principle of least astonishment and has and will lead to security vulnerabilities.


For C, I tell my compiler to make it trap. Then it is also safe.


But 53 years later, it has added the `<stdckdint.h>` header, offering `ckd_add()` and friends. :D Better late than never!
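A minimal sketch of how the C23 checked-arithmetic macros are used (requires a C23-capable toolchain):

    #include <stdckdint.h>
    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int sum;
        if (ckd_add(&sum, INT_MAX, 1))   /* returns true if the result overflowed */
            puts("overflow detected");
        else
            printf("sum = %d\n", sum);
        return 0;
    }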


Integers are an abstraction on top of words; words are perfectly safe.


[flagged]


Post might have signed integer overflow => undefined behaviour => delete all the code in mind. Which can be avoided if you remember that hazard is there.

What's integer divide by zero in C? Would you consider that safe?


> Which can be avoided if you remember that hazard is there.

This is true, but it could be argued that it's harder to avoid signed integer overflow in C/C++ than it is to avoid buffer overflows: you often don't know what types you're working with, due to the usual arithmetic conversions and integer promotions, and checking whether an overflow will occur without causing an overflow is difficult in itself. It's kind of uniquely awful in this respect, basically every other popular language has a more practical integer programming model.


I like integer overflow being UB. Makes it easier to check for by enabling a sanitizer. With it being defined as wrapping, it would be illegal for overflow to cause a runtime trap. Of course, rust mandates the trap on debug builds, which is a fine approach too.


> What's integer divide by zero in C?

It's explicitly left undefined.

> Would you consider that safe?

Yes, because, as all C newbies can easily explain to you, the general rule of thumb is that undefined behavior should be treated as a fault and thus should be handled as a bug.

Hence, your question reads as "Would you consider a bug to be safe?".

In case of integer division, you simply need to check that the divisor is not zero prior to executing the division. Done.
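A minimal sketch of that "check before dividing" discipline (the helper name is made up; INT_MIN / -1 is also rejected, since that quotient is unrepresentable and likewise undefined):

    #include <limits.h>
    #include <stdbool.h>

    static bool checked_div(int num, int den, int *out)
    {
        if (den == 0 || (num == INT_MIN && den == -1))
            return false;          /* caller decides how to handle the error */
        *out = num / den;
        return true;
    }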


People new to C hear about UB, resolve to not do that, and that seems fine.

People not new to C have noticed that so many constructs are UB as to make it infeasible in practice that any given codebase will be free of UB.

What makes C a fundamentally unsafe language is that conceptually minor errors have unspecified consequences of unbounded magnitude, that there is limited to no compile-time detection of said errors, and that even builtins like + are specified to compile successfully into nonsense in some contexts.

Integer operations can be defined to be safe. Whatever integers you pass to your maths operation, you get an unsurprising integer back.

What the language does in the presence of your bug is definitely part of the safety properties of the language.


> What makes C a fundamentally unsafe language is that conceptually minor errors have conceptually unspecified consequences of unbounded magnitude.

There, fixed that for you.

Everybody knows that these scary stories can happen (even though almost nobody has seen them happen in the wild). But for the most part they should be seen as a combination of (typically, obviously) buggy code and compiler optimizer defects, rather than fundamental defects of the language.

> Integer operations can be defined to be safe. Whatever integers you pass to your maths operation, you get an unsurprising integer back.

There is at least 1 arithmetics teacher disagreeing with you.

There is no point, and let me scream again NO FR**G POINT, for a zero divide to return zero. I want it to crash.

And I consider it a compiler defect if the compiler proves that a zero division is happening and proceeds to do a strange optimization instead of reporting it.

It's a fine line to walk though, since there is also the case of legitimately assuming that it doesn't happen, and not emitting the code that triggers the crash. There probably should be compiler knobs to tune the behaviour.


[flagged]


Decent chance integer divide by zero will kill your process. Might even call it a floating point exception. Maybe that qualifies as safe to you, seems not-safe to me.

Compelling alternative would be for it to return zero. Which is safe. But C doesn't do that. So you have to remember to never write '/', and instead call my_divide, which has a branch in it.


> Compelling alternative would be for it to return zero. Which is safe.

Is it? Or is it just another opportunity for a bug to hide?


>Compelling alternative would be for it to return zero.

That's actually way less safe than crashing. If your code doesn't handle the case where the denominator is zero, it is likely that the logic around your division doesn't consider it either. The behavior you suggest would take a rapidly increasing number and instantaneously set it to 0, then silently pass it into the logic that was humming along up to that point.

Since there's no symbol for NaN in integers, there is no safe way to represent `x / 0`, and thus the best way to handle it is to fault. Even better would be if the compiler caught it and warned you.


> Even better would be if the compiler caught it and warned you

Which is probably what the big compilers do (seems reasonable to expect it) -- I can't really know though because the last time I've written code that divides by 0 in an easily provable way is probably a long time ago.


Accurate comment of the day.

C intentionally leaves behavior undefined to allow flexibility in implementation, which allows implementations to take advantage of optimizations that other languages can't because they are over-constrained for the target architecture.

What's the order of evaluation for a sequence of expressions added together? Do they evaluate left to right or right to left? The answer is, very intentionally, "any order is allowed, and the order may change between executions of a line." This makes it possible to make C implementations equivalently fast on various architectures where, for example, stacks and stack operation ordering make it more efficient to do one or the other. Hell, the language is even usable on an architecture where each of those expressions could be run purely in parallel. Of course, the downside is that you definitely can't assume side effects caused by the left side of an expression occurred before the right side.
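A classic illustration of that unspecified evaluation order (my own example, not from the comment above):

    #include <stdio.h>

    static int f(void) { puts("f"); return 1; }
    static int g(void) { puts("g"); return 2; }

    int main(void)
    {
        int x = f() + g();   /* may print "f" then "g", or "g" then "f" */
        printf("%d\n", x);
        return 0;
    }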

But the trade-off is that you do have to be extremely sensitive to making assumptions that the specification doesn't actually make, or you are going to trip over an error in your assumptions that matters on your implementation. The actual "this is undefined behavior so the program deleted your whole hard drive" scenario is incredibly unlikely on a modern desktop or server architecture, but the language's flexibility does mean that you're perpetually one missed trick away from incredibly perplexing behavior.

This is a feature of the language not a bug. It's how the language gets you as close as it possibly can to the speed you would be able to get hand writing the assembly at every step of the process.


It's not a bug only in the sense that it's intentional. It's certainly not "good" or a feature any other language should ever copy again because it prevents you from saying anything formally about the properties of the system at time T, when UB has occurred or will occur at any point along the execution path. As a result, you're always stuck with the qualification that x is true in the absence of UB, something that no automated checker can ever verify and careful review is insufficient to validate in the real world.

Whether or not this is a common issue on commodity hardware is basically irrelevant given C's remaining niche. It's mainly the language of kernels, drivers, and firmware nowadays, not primitive CLI tools that can crash whenever they feel like it. Ensuring system state with high reliability is the raison d'etre of modern C, so the lack of ways to do that without significant caveats is incredibly problematic.


That's the tricky part. As the amount of parallelism required to eke out more performance approaches infinity, the amount we can say about the properties of the system at time T approaches zero. Determinism is actually the enemy here.

> It's mainly the language of kernels, drivers, and firmware nowadays, not primitive CLI tools that can crash whenever they feel like it

On this we completely agree; I would encourage pushing implementation of command-line tools in C as close to zero as possible. It isn't necessary to get maximum performance (either because those tools don't demand it or because we have other languages that make better safety-performance tradeoffs to get us, on average, as fast as C without sacrificing things like memory-access safety).


If we can't say anything about the system at time T, that's a failure of the execution model. I'm not trying to implicate Rice's theorem here or other undecidable problems. C makes it difficult to correctly implement any code (rather than all code) that has a property we want and show that it holds solely with the tools available to mortals. Divine intervention is helpful, but notoriously unreliable.


This is probably my bias just from all of the distributed work I've had to do, but in general being able to say anything about the state of the system at a discrete time T is an expensive luxury in my ecosystem, if not actually infeasible.

You can say things about the expected input and output, you can reason about how it's supposed to get there, and you can do traces of a discrete path that was taken in hindsight. But being able to pause the universe and say "What is the current state of the machine" isn't possible when the 'machine' is spread across data centers in multiple geographic locations and it's barely, if we turn our head and squint and lie to ourselves, possible on a modern CPU architecture with SIMD instructions and branch prediction.


Totally agree, but it's a bit outside C. The C execution model is essentially built around the idea of a single-thread computer with a flat memory space, with very minor considerations for exceptions hidden in UB throughout the standard. Most of the difficult and expensive consistency stuff is punted outside the language or to other standards like Cilk and UPC, though recent versions have at least recognized that threads and parallel processors exist.


Other languages aren't much different in this regard (other than often being less well specified), because it's the reasonable thing to do. You want to give threads a lot of independence for performance reasons. But people just love to shit on C.


Yes, many popular languages are even worse. Thankfully most of those languages (C++ excepted) aren't used for systems programming like C is and hopefully we can agree that the safety story with C++ is mixed at best. Those that have some history as real systems languages (e.g. Java) tend to have pretty decent execution semantics. But just to provide a real counterargument, Ada/SPARK meets the bars I set above. Rust is also a significant improvement over C in practical terms, even if there isn't an official standard you can point to in the same way.


> C can't even do all of integral arithmetic safely.

Your comment reads like nonsense. Are you able to provide what you feel is the best example that substantiates your claim?


When comparing a signed and an unsigned integer, the signed integer is converted to unsigned.

So if you have int a = -1 and unsigned b = 1000 and compare the two, a > b is actually true.
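A short demonstration (the conversion turns -1 into UINT_MAX before the comparison):

    #include <stdio.h>

    int main(void)
    {
        int a = -1;
        unsigned int b = 1000;
        if (a > b)                     /* a is converted to 4294967295u here */
            puts("-1 > 1000 after the conversion");
        return 0;
    }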


Signed integer overflow is UB in C.


Can't we force the compiler to define the behavior with flags like -fwrapv? I use them as a matter of course. Also use -fno-strict-aliasing.


Yes, you can tell the compiler to generate predictably-behaving output for non-compliant code. However, that code is not valid C anymore, but a dialect.

Since this is a discussion about what C is and is not, I think it's fair to limit ourselves to the actual language as-is.


> that code is not valid C anymore, but a dialect

> I think it's fair to limit ourselves to the actual language as-is

I don't agree. Standard C only ever seems to be discussed when some asinine undefined behavior starts causing problems that we have to work around. No one cares about it otherwise.

It's better to redefine C as whatever the compilers accept. Now we can actually move forward and actually fix problems such as "signed integer overflow is undefined" -- just tell the compilers to start dealing with it.


But they do deal with it -- by assuming it doesn't happen. Wrap or trap don't necessarily change that.

At its heart, behaviour that's undefined like this is a source of optimisation opportunities, because we tend (and especially the preprocessor tends) to write code that assumes it won't happen. C does not lend itself well to iterator patterns that elide the range check entirely, and so it is valuable to the optimiser to be able to assume that a variable that steadily increments will not suddenly turn out to have wrapped. So we may (for example) unroll memory accesses[0], secure in our understanding that when we add 1 three times we will get three consecutive numbers, and (where we would normally add another 1) go ahead and add four at the end of the unrolled loop.

If we trap at that point, the behaviour is different from if we accessed each memory location in turn and trapped when we actually wrapped around. In the C model, the behaviour is still classically undefined, but we've added a trap to hopefully catch that it happened before running too much further. We still can't assert anything about the state of the program after the trap, to potentially recover from it.

People writing performance-sensitive code get frustrated when a compiler trades performance for safety, so we're probably never going to get "safe" C in that sense. In practice though? This form of undefined behaviour only kicks in at runtime. Make sure your software is bug-free and you need never worry about it.

[0]: Imagine you have a 16 bit signed int that you're using as input into computing an index into an array -- you may know that it's never going to overflow, but how do you tell the compiler?
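An illustrative example of the optimisation latitude being described (not from the comment above): because signed overflow is UB, compilers typically fold this comparison to a constant.

    int never_wraps(int x)
    {
        return x + 1 > x;   /* commonly compiled as "return 1;" at -O2 */
    }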


Compilers are typically benchmarked with ubsan and friends turned off, so when we say "C is fast" [with undefined behavior enabled] and "C is reasonably safe" [with undefined behavior defined] we aren't talking about the same language.


Well, what I really want is a trap on overflow, but since C doesn't support that, modern architectures don't have the capability; you can do it in software (like with ubsan), but you pay a performance penalty.


I mentioned the -fwrapv option to make it wrap around. There's also a -ftrapv option that makes the compiler generate traps.


Most popular C compilers support trapping on integer overflow (i.e. https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#ind...), but it does have some overhead as it replaces the native arithmetic instruction with a call to a library function (i.e. https://gcc.gnu.org/onlinedocs/gccint/Integer-library-routin...)


C is a low-level language, so when the programmer writes "+" they get an "ADD" opcode. That's what "low level" means. If one wants "+" to add and then do range checks, they can use a higher-level programming language (or a special function, perhaps an intrinsic, in C).


There are plenty of cases in C where the use of a + operator doesn't result in any form of add instruction being emitted.


> C is low level for at least one reason: manual memory management.

Manual memory management isn't that much faster than a modern GC, sometimes even slower. I'd argue that C programs are typically fast because there is just less rope to hang yourself with, leaving aside memory safety.

The anemic abstractions provided in the language and the tiny stdlib means it takes a lot of work to achieve something, so developers simply do less. There isn't even a Hashmap (or a proper String), while in Kotlin, you can perform a deep copy of the object graph and convert it to json in parallel in a single line if you so wish.


This is a common misconception that stems from a misunderstanding of why manual memory management is "fast". It has nothing to do with the actual process by which you request and release memory. Manual memory management being faster than GC is a function of controlling memory layout, which is one of the (very) few low hanging optimization fruits that mortals like you or I can do on contemporary CPUs. It also has to do with the fact that dynamic allocation is slow, and being able to get it out of the way with a single allocation in the first moments of a process's life is an immensely important tool in the optimizer's toolbelt.

> The anemic abstractions provided in the language and the tiny stdlib means it takes a lot of work to achieve something

Which has the additional effect of forcing you to be a bit smarter about how you do things, to be less wasteful. It forces you to contend with everything you want to do, to consider it and the cost associated with it. Built-in, general-case abstractions are nice when under time constraint and hacking something together, but it doesn't make for good software. Not only is it almost guaranteed to be slower than a properly constructed purpose-built solution, but it also removes your view from thinking about the cost of every single thing you're doing. It makes it easier and attractive to overuse abstractions, to over-engineer solutions, and to approach problems from a standpoint where you simply throw the kitchen sink at the problem because that's the only thing you can think of.


>It has nothing to do with the actual process by which you request and release memory. Manual memory management being faster than GC is a function of controlling memory layout, which is one of the (very) few low hanging optimization fruits that mortals like you or I can do on contemporary CPUs.

That is kind of misleading. The difference is that C and Rust support stack allocation, which is essentially an arena style allocator integrated into the language. What the fancy pointer bumping GC runtimes do, the stack does by default. The problem is that escape analysis is difficult and it is difficult to prove that an access to memory on the stack is safe without fundamentally changing the language like Rust does. It gets worse on the heap, where you can have runtime determined ownership.

C programmers like their doubly linked lists, but when you think about it, it is actually kind of a difficult problem to formalize and analyze in its full generality.


It's not about the call stack (which I don't think is that special), but about control over layout of data structures. Languages like Java introduce lots of indirection and "object overhead".


> Manual memory management being faster than GC is a function of controlling memory layout

Control over memory layout and manually allocating and freeing memory are orthogonal issues.

I can optimize memory layout in Java too, by using primitive data types instead of pointer-chasing objects, or structs-of-arrays vs arrays-of-structs type of things, in order to improve access patterns. I can't control alignment and padding, except indirectly, that's true, but that is not what people mean when they say "manual memory management". Rust gives you control over memory layout, but has "automatic" memory management.

> forcing you to be a bit smarter about how you do things, to be less wasteful

Yes this is what I meant.


Love your words. 1000 upvotes if I could.

For balance, the faster machines get, the more problems are most effectively solved by throwing the kitchen sink at them.


Sometimes wasting a perfectly good kitchen sink on a small problem gives you two bigger problems.


There are some performance advantages a garbage collector can have over manual memory management. If you're just calling malloc/free or, in C++, calling new/delete in constructors/destructors (or using a class that does so, like std::vector), and nothing special, the garbage collector is probably allocating memory faster.

> controlling memory layout

Garbage collectors can compact active memory into one contiguous location and adjust the active pointers to point there instead. You can't do this in a language like C, because you can have arbitrary pointers to anything, and there's no runtime indication of what's a pointer or just an integer. You simply have to prevent memory fragmentation in the first place, which also complicates the logic of the program.

For faster allocation in C, arena allocation based on object lifetimes can be used [1]; in generational garbage collectors, you get similar benefits, but it's just done automatically. In fact, in that linked paper, they found that lifetime-based arena allocation improved the speed of their program (a C compiler) at the cost of increased memory allocation compared to naïve malloc() and free(), which is exactly what garbage collection does.

As a result of compaction, memory allocation with garbage collection is just a pointer bump in the best case, whereas allocation with just malloc usually requires searching a free list or a tree.

[1]: https://www.cs.princeton.edu/techreports/1988/191.pdf
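For reference, a minimal arena ("pointer bump") sketch of the pattern described above; the names are illustrative. Allocation is an alignment round-up, a bounds check and an add, which is why bump allocation is so cheap.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint8_t *base;   /* start of the arena's backing memory */
        size_t   used;   /* bytes handed out so far */
        size_t   cap;    /* total capacity of the arena */
    } arena_t;

    static void *arena_alloc(arena_t *a, size_t n)
    {
        n = (n + 15u) & ~(size_t)15u;    /* keep 16-byte alignment */
        if (a->cap - a->used < n)
            return NULL;                 /* arena exhausted */
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }

    /* "Freeing" is wholesale: reset `used` to 0 when the whole lifetime ends. */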


Cache invalidation is hard :) almost as hard as semantically naming things in a way that is clear now, and in the future.


... the famous Two Hard Things, together with off-by-one errors.


concurrency With 3 it's Things Hard.


Concurrency is hard because it's difficult to know if the value you have cached in memory is valid or invalid. As soon as a value is read it's cached.

Therefore Concurrency reduces to Cache invalidation and there are again only two hard problems :)


very interesting observation. never thought about how memory is central to all those concepts and technologies.


In my lifetime the memory hierarchy has grown from 3 levels (register, main memory, disk) to at least 8 levels (register, L1 Cache, L2 Cache, L3 Cache, VM, main memory, disk, web). A lot of people like the guy who wrote this paper have no patience for a language that can still run most 1971 C programs!

Every year, people who do not understand why C is so successful try to make a name for themselves by breaking what makes the language great (such as the completely agnostic set of control structures). C has successfully remained portable and performant for 50+ years because of its flat memory model (with a few tweaks such as "volatile").


C is neither fast nor low-level... none of these descriptors have any meaning.

It's a pointless discussion when you don't care to explain how you use the words that obviously have many related but different ways to interpret them.


Friendly local C programmer and compiler writer here to remind you that C definitely is a low level language for those who understand it and use it professionally. If you’re looking for a low level language, then C (and its relatives) are your best bet.

If you’re new to the language and want to understand how to use it like a pro then ignore this post - it will only confuse you and reduce your ability to use C effectively.


C is a low-level language... but it's the wrong low-level language. It gives you low-level access to a machine that your real machine actually has to somewhat laboriously emulate. The dangly bits and bodges that have been added over the years to give access to the real machine are relatively foreign bodies in C.

I would agree the title is a bit rhetorically rough, though, because being the wrong low-level language doesn't make it a high-level language. WASM would similarly be "wrong" if I claimed it was a direct mapping to modern hardware, but that doesn't make it "high level".

(Although what really frustrates me about C isn't that it's a bad mapping per se. It's from the 1970s, what do you expect? And it is obviously still quite useful for many cases. What frustrates me is that it continues to a large degree to dictate language design and heavily color how language designers see hardware, so too much modern language design is still just reshuffling bits of C around, rather than building languages that work with the hardware well.)


What is C really? A concise syntax to define structs and functions, with a usable expression syntax. There isn't all that much to it; I've always found it ridiculous for people to claim it's holding hardware back.

I don't think I've ever really seen a good argument for what developments were prevented by the existence of C as an important compiled language. The one claim I can remember I find ridiculous: that today's CPUs execute instructions in parallel, not serially. Well, for one, C's semantics aren't that serial; there is a large degree of freedom for compilers and CPUs in how to schedule the execution of C expressions and statements. Then, there are SIMD instructions exploiting those capabilities explicitly. But also, the rest of the code gets automatically pipelined by the CPU, according to a specific CPU's capabilities. Even though that stuff happens in parallel, any instruction encoding is by necessity serial. Or is anyone proposing we should switch to higher-dimensional code (and address spaces)?


To my aesthetic C is the wrong abstraction because while all those things are possible, the language exposes them via a syntax that makes you think you're writing an embarrassingly sequential program and then tries to hide all of the parallelization that improves performance in the undefined behavior.

I liken it to doing imperative UI development on top of the DOM abstraction in a browser. Yes, under the hood, the browser is choosing when to re-evaluate and repaint interface elements, but you can't touch any of that; you're instead rearranging things in the DOM and memorizing the heuristics browsers use, trying to trick them into efficiently matching changes to the DOM to visual changes in the browser UI.

It may very well be time for low-level languages to encourage us to think about programming as "arranging independent blocks of code that can be executed in parallel, with only a handful of sequencing operations enforcing some kind of dependency order. Apart from honoring those sequencing requirements, order of execution or whether execution happens in parallel is undefined."


> "arranging independent blocks of code that can be executed in parallel, with only a handful of sequencing operations enforcing some kind of dependency order. Apart from honoring those sequencing requirements, order of execution or whether execution happens in parallel is undefined."

Isn't that exactly what is happening?


More or less, but nothing about the design of the language puts that front and center. Instead, the language is designed to make the developer think they're operating on an embarrassingly sequential machine, and only the vast amount of undefined behavior in the language spec allows the compiled output to be parallel.

It's the wrong abstraction for the job and properly using C in a way that takes advantage of it requires unlearning most of what people think they know about how the language works. I'd like to see more languages that start from a place of "Your code can execute whenever the computer thinks is most efficient; don't ever think you know the execution order" and then treat extremely-sequential, deterministic computing as a special case.


I think you're just making wrong assumptions. Any C programmer worth their salt knows that that both compilers as well as the CPU introduce a lot of reordering / instruction-level parallelism as optimizations.

You can SIMD / multi-thread explicitly as much as you feel like, but you'll soon find your productivity diminishing, which is not a language fault.


I don't want to SIMD / multi-thread explicitly.

I want my language to have low-level abstractions like "pack data into an array, map across array. Reduce array to a value." Those are abstractions a programmer can look at and go "Oh, the compiler will probably SIMD that, I should use it instead of a for loop." In contrast, C will auto-unroll loops. Unless it doesn't. Go memorize this pile of heuristics on popular architectures so you can guess at whether your code will be fast.

I want my language to have low-level abstractions like Orc's parallel and sequential combinators, so that when I need some operations sequenced I can force them to be, when I don't I can let the compiler "float" it and assemble the operations into whatever sequence is fastest on the underlying architecture; I don't have to memorize a bunch of heuristics like "the language allows arbitrary ordering of execution for either side of a '+' operator in an expression, but statements are executed sequentially, unless they aren't it depends on the contents of the statement."

In short, I want my language to ask me to think in terms of parallelism from the start so that my mind is always in the head-space of "This program will be executed in nondeterministic order and I shouldn't assume otherwise."


> pack data into an array, map across array. Reduce array to a value."

These are abstractions that you've been able to enjoy for a long time, by using higher-level languages like C++ or Rust. So C didn't prevent the feature, after all.

You could argue now that C has prevented CPUs from implementing these abstraction (because arguably, C cannot express them), but I would like to ask first how you think it should be done, and why it's not a good idea to implement it on the language/compiler level how it's currently done?

If there comes up a new way that lets CPUs understand type theory and magically multi-thread your variably-sized loops by creating a new set of execution units out of thin air, you'll have a point. For the time being, there doesn't seem to exist such a thing, and I can't imagine that the reason why not is C. Rather, if such a thing is nearing practicability, C will have to adapt or slowly die out.

> In short, I want my language to ask me to think in terms of parallelism from the start so that my mind is always in the head-space of "This program will be executed in nondeterministic order and I shouldn't assume otherwise."

This works only in a very limited way in practice. To solve practical problems, you need to combine logic/arithmetic instructions serially to achieve the intended effect. Seems to me that it turned out that most degrees of freedom are more accidental than structured, and it's not practical to manually specify them, when the majority of them are easily recoverable in an automated way.

So that's how you end up with that instruction-level parallelism that is worked out by the compiler and the CPU.


> You could argue now that C has prevented CPUs from implementing these abstraction (because arguably, C cannot express them), but I would like to ask first how you think it should be done, and why it's not a good idea to implement it on the language/compiler level how it's currently done?

As I said top-thread, it's to my aesthetic. These are all Turing-complete languages and you can, in theory, do whatever in any of them. But map-reduce-fold-etc make it much clearer, to my eye, that I'm operating on a blob of data with the same pattern, and it's easier to map that in my brain to the idea "The compiler should be able to SIMD this." Contrast with loops requiring me to look at a sequential operation and go "I'll trust the compiler will optimize this by unrolling and then deciding to SIMD some of this operation." The end-result is (handwaving implementation) the same, but the aesthetic differs.

As you've noted, I'm not unable to do this in C or C++ or Rust (in fact, C++ is especially clever in how it can use templates to inline static implementations recursively so that the end result of, say, the dot product of two N-dimensional vectors is "a1 x b1 + a2 x b2 + a3 x b3" for arbitrary dimension, allowing the compiler to see that as one expression and maximize the chances it'll choose to use SIMD to compute it). But getting there is so many layers of abstraction away (I had to stare at a lot of Boost code to learn that little fact about vector math) that the language gets in the way of predicting the parallelism.

> If there comes up a new way that lets CPUs understand type theory

CPUs don't understand type theory. Compilers do and they can take advantage of that additional data to do things like unroll and SIMD my loops right now. My annoyance isn't that it's impossible, it's that I'd rather the abstraction-to-concrete model be "parallel, except sometimes serial if the CPU doesn't have parallel instructions or we hit capacity on the pipelines," not the current model of "serial, and maybe the compiler can figure out how to parallelize it for you."

> To solve practical problems, you need to combine logic/arithmetic instructions serially to achieve the intended effect... Seems to me that it turned out that most degrees of freedom are more accidental than structured, and it's not practical to manually specify them

I agree... Eventually. There's a lot of parallelism allowed under-the-hood in the space between where most programmers think about their code, as evidenced by C's undefined behavior for expression resolution with operators of the same precedence.

Whether degrees of freedom evolved by accident is irrelevant to whether a new language could specify those parts of the system (sequential vs. intentionally-undefined ordering) explicitly. C, for example, has lots of undefined behavior around memory management; Rust constrains it. It's up to the language designer what is bound and what is allowed to be an arbitrary implementation detail, intentionally left undefined to give flexibility to compilers.

Even the modern x86 instruction set is a bit of a lie; under the hood, modern CPUs emulate it by taking chunks of instruction and data and breaking them down for simultaneous execution on multiple parallel pipelines (including some execution that never goes anywhere and is thrown away as a predictive miss). CPUs wouldn't be nearly as fast as they are if they couldn't do that.

I'm not advocating for breaking the x86 abstraction; that's a bit too ambitious. But I'd like to see a language take off that abandons the PDP-11 embarrassingly-serial era of mental model in favor of a parallel model.


Yes, in Haskell.


> A concise syntax to define structs and functions, with a usable expression syntax. [...] I've always found it ridiculous for people to claim it's holding hardware back.

You just looked in your fish tank and declared what the weather is going to be like in the Atlantic ocean... Like... these things have nothing to do with each other. The fact that C has functions or structs has nothing to do with it being an awful influence on hardware design.

Here are some reasons why C is awful.

* It believes that volatile storage is uniform in terms of latency and throughput. This results in operating systems written with the same stupid idea: they only give you one system call to ask for memory, and you cannot tell what kind of memory you want. This in turn results in hardware being designed in such a way that an operating system can create the worthless "abstraction" of uniform random-access memory. And then you have swap, pmem, GPU memory, etc., and none of that has any good interface. And these are the products that, despite the archaic and irrelevant concept of how computers are built, have succeeded to a degree... Imagine all those which didn't. Imagine those that weren't even conceived of because the authors dismissed the very notion before giving the idea any kind of thinking.

* It has no concept of parallelism. In its newer iterations it added atomics, but this is just a reflection of how hardware was coping with C's lack of any way to deal with parallel code execution. C "imagines" a computer to have a CPU with a single core running a single thread, and that's where the program is executed. This notion pushes hardware designers towards the pretension that computers are single-threaded. No matter how many components your computer has that can actually compute, whenever you write your program in C, you implicitly understand that it's going to run on this one and only CPU. (And then e.g. CUDA struggles with its idea of loading code to be executed elsewhere, which it has to do in some very cumbersome and hard-to-understand way, which definitely doesn't rely on any of C's own mechanisms.)


> It believes that volatile storage is uniform in terms of latency and throughput.

It doesn't, I don't think it even mentions terms like latency and throughput.

> they only give you one system call to ask for memory, and you cannot tell what kind of memory you want

What?

> Imagine those that weren't even conceived of because the authors dismissed the very notion before giving the idea any kind of thinking.

Such as?

> It has no concept of parallelism.

C can function with instruction-level parallelism, CPU-level parallelism, process/thread-level parallelism just fine.

> C "imagines" a computer to have a CPU with a single core running a single thread, and that's where program is executed.

Given that a memory model was introduced in C11, and that there were other significant highly concurrent codebases before that, I'm having doubts how correct and/or meaningful that statement is.

For sure, the only case where it's easy to understand the possible outcomes of running a piece of code is when running it in a single thread (it doesn't matter on how many CPUs though, apart from performance). That is just the nature of multi-threading: it's hard to understand.

> This notion pushes hardware designers towards pretending that computers are single-threaded.

How do they pretend so? My computer is currently running thousands of threads just fine. It has a huge number of components, from memory to CPU to controllers to buses to I/O devices, that are executing in parallel.


> It doesn't, I don't think it even mentions terms like latency and throughput.

It only has one group of functions to allocate memory, and neither of them can be configured with respect to what storage to allocate memory from, definitely not in terms of that storage's latency or throughput, which would be very important in systems with non-uniform memory access.

Compare this to, e.g., the concept of "memory arenas" that explicitly exists in Ada; many languages have libraries implementing the same idea -- in those situations, instead of using the language's allocator, you'd be using something like APR's memory pools <https://apr.apache.org/docs/apr/trunk/group__apr__pools.html>.
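
To make it concrete, here's a minimal bump-allocator arena sketch in plain C (not APR's actual API; the names here are made up) -- the point is that the arena, not the language, decides where the memory comes from and when it is all released:

  #include <stdlib.h>
  #include <stddef.h>

  /* Minimal arena: one big block, bump-allocated, freed all at once. */
  typedef struct {
      char  *base;
      size_t used;
      size_t cap;
  } arena;

  static arena arena_create(size_t cap) {
      arena a = { malloc(cap), 0, cap };
      return a;
  }

  static void *arena_alloc(arena *a, size_t n) {
      n = (n + 15) & ~(size_t)15;            /* keep 16-byte alignment */
      if (a->base == NULL || a->used + n > a->cap) return NULL;
      void *p = a->base + a->used;
      a->used += n;
      return p;
  }

  static void arena_destroy(arena *a) {
      free(a->base);                         /* every allocation released at once */
      a->base = NULL;
  }

Nothing in ISO C lets you say "this allocation should live in fast scratchpad / slow DRAM / device memory"; an arena-style interface is the natural place to hang that kind of request.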

> and that there were other significant highly concurrent codebases before that

You are again confused... there weren't such codebases in C because C is bad for this. It's not because nobody wanted it. Highly concurrent codebases existed in Erlang since forever, for example.

> How do they pretend so? My computer is currently running thousands of threads just fine.

Threads aren't part of C language. They exist as a coping mechanism. Their authors are coping with the lack of parallelism in C, which is exactly the point I'm making. Threads exist to fix the bad design (not that they are a good fix, especially since they are designed by people who believe that there's nothing major wrong with C, and the thousand and first patch will definitely fix the remaining problems).

So not only is this a counter-argument to the point you are trying to make, it's also yet another illustration of how using C prevents designers from seeking more adequate solutions.

> from memory to CPU to controllers to buses to I/O devices, that are executing in parallel.

The point is not that they cannot run in parallel... The point is that C doesn't give you tools to program them to run in parallel.


> It only has one group of functions to allocate memory, and neither of them can be configured

Seriously, have you done any non-trivial C programming? Because those are blatant falsehoods. You must be talking about uni level introduction to C programming, using malloc/free and thinking that's how you "allocate".

> You are again confused... there weren't such codebases in C because C is bad for this. It's not because nobody wanted it. Highly concurrent codebases existed in Erlang since forever, for example.

Just one example I know, take the Linux kernel which had a good amount of SMP support way before C11. I believe they still haven't switched over to the C11 memory model.

> The point is not that they cannot run in parallel... The point is that C doesn't give you tools to program them to run in parallel.

How come then, that my computer is running so many things, many of them written in C, in parallel?


> Threads exist to fix the bad design (not that they are a good fix, especially since they are designed by people who believe that there's nothing major wrong with C, and the thousand and first patch will definitely fix the remaining problems).

The thing that needs fixing is mostly people like you, purporting falsehoods while lacking deeper understanding how it works / how it's used.

Threads are a concept that exists independently from any language. They are the unit of execution that is scheduled by the OS. If a program should be multi-threaded with parallel execution (instead of only concurrent execution), by necessity you need to create multiple threads. (Or run the program multiple times in parallel and share the necessary resources, but that's much less convenient and lacks some guarantees).
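
(For what it's worth, threads did eventually get pulled into the language: C11 added an optional <threads.h>, essentially a thin veneer over what pthreads and friends already provided. A minimal sketch, assuming an implementation that actually ships the header:)

  #include <stdio.h>
  #include <threads.h>

  static int worker(void *arg) {
      printf("hello from worker %d\n", *(int *)arg);
      return 0;
  }

  int main(void) {
      thrd_t t[4];
      int ids[4];
      for (int i = 0; i < 4; i++) {
          ids[i] = i;
          if (thrd_create(&t[i], worker, &ids[i]) != thrd_success)
              return 1;
      }
      for (int i = 0; i < 4; i++)
          thrd_join(t[i], NULL);     /* wait for each worker to finish */
      return 0;
  }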


> > It believes that volatile storage is uniform in terms of latency and throughput.

> It doesn't, I don't think it even mentions terms like latency and throughput.

Yes, that's the whole point.


No it isn't. Not mentioning the differences isn't the same as acting like they don't exist. Those things are only treated as out of scope.

Not every concept must be expressed in language syntax / runtime objects, nor is it necessarily a good idea to do so. In many cases, it's a bad idea because it leads to fragmentation and compatibility issues. At some point, one has to stop making distinctions and treat a set of things uniformly, even though they still have differences.

CPUs have various load and store instructions that all work with arbitrary pointer addresses. Whether the address is a good/bad/valid/invalid one will only turn out at run time. There would be little point in making a separate copy of these instruction sub-sets for each kind of memory (however you'd categorize your memories). The intent as well as the user interface are the same.

I think that's basic software architecture 101. (Once you've left uni and left behind that OOP thinking where every object of thought must have a runtime representation).

Btw. C compilers allow you to put a number of annotations on pointers as well as data objects. For example pointer alignment to influence instruction selection, or hints to the linker...
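
For instance (GCC/Clang extensions plus C99 restrict; a sketch, not portable ISO C):

  /* restrict: promise the buffers don't alias, so the compiler may reorder
     and vectorize the loads/stores. */
  void scale(float *restrict dst, const float *restrict src, float k, int n) {
      /* __builtin_assume_aligned is a GCC/Clang builtin: promise 64-byte
         alignment so aligned/vector loads can be selected. */
      src = __builtin_assume_aligned(src, 64);
      dst = __builtin_assume_aligned(dst, 64);
      for (int i = 0; i < n; i++)
          dst[i] = k * src[i];
  }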


At least for the first point: C has been used extensively with non-uniform storage. Back in the DOS days when we had memory models (large, small, huge, etc...), and today, when programming all sorts of small microcontrollers. A common one I occasionally use is AVR, which has distinct address spaces for code and data memory - which means a function to print a string variable is very different from the one used to print a string constant. This makes programs rather ugly, but things generally work.
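
Roughly what that looks like with avr-gcc/avr-libc (PROGMEM and pgm_read_byte are avr-libc's names; uart_putc is a hypothetical byte-output routine):

  #include <avr/pgmspace.h>

  extern void uart_putc(char c);     /* hypothetical byte-output routine */

  /* Lives in flash (the code address space), not in the tiny SRAM. */
  static const char banner[] PROGMEM = "hello from flash";

  /* A print routine for RAM strings can't read this directly; you need the
     pgm_read_* accessors, hence all the duplicated *_P function variants. */
  static void uart_puts_P(const char *p) {
      char c;
      while ((c = pgm_read_byte(p++)) != '\0')
          uart_putc(c);
  }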

As for your parallelism idea... well, every computer so far has a fixed number of execution units; even your latest 16384-core GPU still has every core perform sequential operations. And that's roughly what C's model is, it programs execution units. And it definitely hasn't stopped designers from innovating - completely different execution models like FPGAs exist, and there is constant innovation in programming languages.


> At least for the first point: C has been used extensively with non-uniform storage

And the results are awful. You are confused between doing something and doing it well. The fact that plenty of people cook frozen pizza at home doesn't make frozen pizza a good pizza.

> And it definitely hasn't stopped designers from innovating

And this is where you are absolutely wrong. We have hardware designs twisted beyond belief only so that they would be usable with C's concept of a computer, while obviously simpler and more robust solutions are discarded as non-viable. Just look at the examples I gave. CUDA developers had to write their own compiler to be able to work around the lack of necessary tools in C. We also got OpenMP and MPI because C sucks so much that the language needs to be extended to deal with parallelism.

And it wasn't some sort of a hindsight where at the time of writing things like different memory providers were inconceivable. Ada came out with the concept of non-uniform memory access baked in. Similarly, Ada came out with the concept for concurrency baked-in. It was obvious then already that these are the essential bits of system programming.

C was written by people who were lazy, uninterested to learn from peers and overly self-confident. And now we've got this huge pile of trash of legacy code that's very hard to replace and people like you who are so used to this trash, that they will resist its removal.


You are very confidently making some wild statements that seem to be based on the assumption that only because something isn't specified in a given place, it couldn't be specified somewhere else. That assumption is wrong.


I don’t think it’s fair to blame C for the flat random access memory model. Arguably it goes back to Von Neumann. There was a big push to extend the model in the 1960s through hardware like Atlas and Titan (10 years before C) and operating systems like Multics. And there’s all the computer science algorithms analysis that assumes the same model.


At the time C rose to prominence there was already understanding that memory access isn't going to be uniform, and less and less so as hardware evolves and becomes more complex. Ada came out with this idea from the get go.

Von Neumann created a model of computation. It's a convenient mathematical device to deal with some problems. He never promised that this is going to be a device to deal with all problems, nor did he promise that this is going to be the most useful or the most universal one etc.


You’re echoing my point back at me, though to be fair I should have been more explicit that my examples from the 1960s were about caches and virtual memory and other causes of nonuniform access hidden under a random access veneer.

But we can go 15 years earlier: Von Neumann wrote in 1946: “We are therefore forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible.” https://www.ias.edu/sites/default/files/library/Prelim_Disc_...


>Well, for one, C's semantics aren't that serial, there is a large degree of freedom for compilers and CPUs how to schedule the execution of C expressions and statements.

I thought about the implications of a "parallel" statement, where everything is assumed to execute in parallel, and oh boy are the implications big. C's semantics are serial but they contain implicit parallelism. The equivalent is that the parallel statement contains implicit sequentialism that the compiler can exploit to reduce the amount of bookkeeping needed by the CPU to schedule thousands of instructions at the same time. E.g. instead of having an explicit ready signal and blocking on it, the compiler can simply decide to split the parallel statement into two parallel statements, one executed after the other. Implicit sequentialism! A parallel statement implies that no aliasing writes are allowed to be performed. I don't know what the analysis for that would look like, but in many common cases I would expect the parallel statement to be autovectorized quite reliably.

>Even though that stuff happens in parallel, any instruction encoding is by necessity serial. Or is anyone proposing we should switch to higher-dimensional code (and address spaces)?

Uh, you know we can just encode the program as a graph? Graph reduction machines are a thing, you know.


> we can just encode the program as a graph

What is the output medium for the encoded representation? A linear address space, like a file, or virtual memory.


“instruction encoding is by necessity serial. Or is anyone proposing we should switch to higher-dimensional code”

That is sort of a thing: https://en.m.wikipedia.org/wiki/Very_long_instruction_word

If you have multiple instructions grouped together like this you could think of it as being a 2D array of instructions


I understand your point: Modern hardware tries REALLY hard to pretend it is a simple set of instructions executing one after another. For all the on the fly clever caching, micro-op translation, branch prediction, speculative execution, register renaming, and whatever else, it consistently presents a sane model to single threaded programs. It's difficult to even see the magic under the hood if you tried, and it mostly shows up in unexpected performance discrepancies or race conditions for multi threaded programs. It's all a huge charade...

However, before dismissing this all as a bad mapping to an outdated 1970s model of computation, I'd like to see a good alternative. CUDA has clearly shown that there's an acceptable model for massively parallel data sets, but that doesn't handle branch heavy code very well at all. And FPGAs have a different approach for a completely different kind of problem, but I don't know how you would expose what Apple, AMD, or Intel chips are doing under the hood and have it be at all manageable to the programmer. How is someone supposed to indicate what's next when a pipeline stalls waiting on the previous operation or a cache miss? Is the programmer going to toss micro ops into separate execution units and wait for the results to come out the other side in arbitrary order? Is this an async/await model for every addition or memory fetch? I think it would be complete spaghetti to even try, but I'd love to be shown I'm wrong.

People get all excited trash talking Itanium, but I think it's a lesson that if you try to expose any alternative to the 1970s model they'll just bitch about how there are no sufficiently smart compilers. And of course it got scooped by AMD64 pretending to execute one instruction after another.

And if there isn't a good alternative, I think C (or Rust, or WASM) are a pretty good fit for what you've actually got to work with at the low level.


"I'd like to see a good alternative"

Me too. See my other reply below.

That said, "This is a good match" does not logically follow from "This is a bad match but it's the best match we have." It's still a bad match.


Itanium was the wrong design not because of the reasons you suggest, but because it assumed that good performance is something that can be statically baked into the object code, and therefore that there is such a thing as a sufficiently smart compiler for an explicitly parallel processor running general purpose code. But evidently the designers were wrong.

Which is not to say that explicit parallelism is bad, it’s clearly useful for GPUs and vector code (and compiling to SVE is not too different from itanic). But it doesn’t work as well as dynamically discovered parallelism for non-vector code.


It seems to me there's some uncharted territory between "massively parallel" (GPU) and "unpredictable branching" (CPU), and the corpse of Itanium is laying there as a warning to anyone who would go exploring in that area. Maybe it's just a desert, but I doubt it.


What language(s) in your opinion have the right low-level where the access to the real machine doesn't feel foreign?


Assembly is the right one. You have direct access to the machine ISA, including the weirder status/control registers and whatever trap/syscall corresponds to. Assemblers are somewhat powerful - can define data layouts somewhat like structs, abstract some things behind macros, add pseudo-instructions to put friendlier names on some things. Maybe the ISA expects you to build constant integers out of arithmetic, the assembler can give you a 'const' instruction which expands to said arithmetic.

I have a pet theory that lisp macros over an assembler is the right high level language for systems programming but that hasn't made it off the whiteboard yet.


The problem is that assembly is CPU dependent. The benefit of a high-level language is that it's CPU architecture independent.

For smaller CPUs that can't support all of C's assumptions natively anyway, like the 6502, which can't multiply or do floating point arithmetic, something like what you describe would likely be best. It reminds me of the COMFY 6502 compiler: https://dl.acm.org/doi/pdf/10.1145/270941.270947


Therein lies the interesting design space, yeah. Control flow, data layout, semantics of basic blocks are sometimes target agnostic and sometimes not. Sometimes a div instruction needs to turn into a runtime call, sometimes it doesn't. Sometimes you want explicit control of registers, sometimes any gpr is fine.

Which I suppose yields the other language choice. Instead of C or assembly, write in something very like a compiler IR. Ymmv persuading non-compiler devs to code in SSA form directly.


But even the exposed machine ISA for x86 is way different these days than the underlying hardware.


> I have a pet theory that lisp macros over an assembler is the right high level language for systems programming but that hasn't made it off the whiteboard yet.

I'm having a little trouble visualising this. Don't many assemblers provide macro-instructions already?


Assemblers come with text substitution macros. Lisp comes with program rewriting macros. Same basic idea that it's all expanded away by runtime, but using something like scheme as the compile time metaprogram that emits the machine specific assembly. There are a few s-expression based assemblers out there so probably nothing novel.


Well Forth is possibly the most minimal VM over a platform, as evidenced by openfirmware.

It does have problems scaling though, in that if you've seen one Forth, you've seen one Forth, i.e. the variations required to fit a platform make them semi-incompatible. Also, only global scope, no types and no built-in threadsafe constructs are limiting.

That's not to say that a more lispy Forth wouldn't be useful though, in that a concatenative syntax allows us to pass custom data structures around like APL, and CPS (delimited continuations with lexically scoped dynamic binding) would come from the lisp side (see https://github.com/manuel/wat-js).

Memory management in Forth can handle multiple memory types eg. https://flashforth.com/ so adding something like ref counting (https://github.com/zigalenarcic/minilisp/blob/main/main.c) to handle the dynamic list side of things might mesh well.

In any case, if you're looking for a self hosting lisp that runs on bare metal, https://github.com/attila-lendvai/maru has been out for a few years.


Assembly, or what ESPOL was already doing in 1961, a decade before C was even an idea: compiler intrinsics.

So taking out Assembly, any language can have hardware capabilities exposed as compiler intrinsics; there is nothing special about C in that regard, it is only the one many people are commonly aware of because they don't want to be educated in compilers.


The only one I can think of would be Assembly, but I don't do much low-level work, I code in much higher-level languages. Genuinely curious what the answer is.


For portability, by far the vast majority will say C. In my experience the C compiler's optimizer will do a lot with -O2 or -O3, but it can't always infer correct SIMD optimization for some operations, and on occasion you have to drop down into x86_64 assembly. The idea is to do most things in C and use __asm__ to write custom assembly instructions. With #defines around the assembly for each processor you plan on supporting, you get the benefit of both C optimizers and portability across different CPUs, as well as any future updates to the compiler. But the compiler writers will say to use intrinsics and extended assembly rather than raw assembly, because when you write raw assembly your code becomes a black box to the compiler and it can't infer optimizations for your surrounding code that interfaces with the assembly. I think C with extended asm is likely the most sane combination if you don't mind the slightly ugly syntax and the fact that there could be differences between compilers. That being said, C with compiler intrinsics seems to be a happy compromise for those that don't want to shift around registers and deal with the stack.

https://gcc.gnu.org/wiki/DontUseInlineAsm

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
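
A tiny illustration of the two options (GCC/Clang syntax, x86-64 assumed):

  #include <xmmintrin.h>   /* SSE intrinsics */

  /* Intrinsic version: the compiler still sees the data flow and can
     schedule and inline around it. */
  void add4(float *dst, const float *a, const float *b) {
      _mm_storeu_ps(dst, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
  }

  /* Extended-asm version: correct, but a black box to the optimizer,
     which is what the GCC wiki page above warns about. */
  static inline unsigned long long read_tsc(void) {
      unsigned int lo, hi;
      __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
      return ((unsigned long long)hi << 32) | lo;
  }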

I don't use Rust so I can't comment on it, but it also has compiler intrinsics + a memory safety model. Its compiler was really dog slow last time I used it, so I hope that has improved, but nobody is really killing C any time soon, even if there's enthusiasm for memory safety. Sooner or later you have to delve down into the depths of Narnia, and you may as well get comfortable dealing with memory.

My likely favorite combination is Python + C (for the speed stuff) + Intrinsics (for the really speed stuff).


I don’t think there is one, and if there was, it would run the same risk of becoming anachronistic as C itself has.

Besides, C and its compilers have very much influenced CPU designs and optimisations, so it’s a world with a feedback loop.

Maybe the loop will weaken somewhat in the new LLM craze.


Per my last paragraph, I am not convinced about any of them.

One of these days I really need to post my "ideas for languages" that I've got banging around on my hard drive, but one of them is "a language that deals with the increasingly heterogeneous nature of the computer". You've got the CPU, the GPU, efficiency cores, who knows what else in the future (NN cores), and it's only a small hop from there to consider other computers as resources too.

Full disclosure: I have no idea whatsoever what this looks like. Especially in light of the fact that you need to build not just for the exact machine you're developing on but for machines in the future as well. Some sort of model of what is being computed and some guestimate at the costs? (Something like an SQL query builder where you declare your goal and it does the computation about what resources to compute it with?) It's also possible that the huge gulfs in performance between all these parts are just too large to bridge and manual scheduling of all these resources is just the only choice.

Even just within a CPU it's rather annoyingly difficult to use vector-based code in modern languages. Perhaps something like an array-based language, but one that discards that field's bizarre love affair with single-character (if not outright Unicode) operators and can be read by a normal human, and just affords writing code in a style that SIMD becomes a sensible default rather than something the optimizer laboriously reverse engineers from your conventional imperative code. (Array based programming could really use a "for humans" version of those languages in general.)

To some extent, just sitting down for a year to learn modern assembler and starting from the very, very bottom once again to build a high level language, rather than starting with C and building "C, but ..." which is pretty much every modern language being developed, would be an interesting exercise if nothing else.

Another little example is I think Jai was supporting structures-of-arrays instead of arrays-of-structures, though I don't know if they kept it. I'd like to see a language where the language-level data structures are explicitly viewed through the lens of "how I serialize these into memory", rather than the data structure implicitly creating such a specification by how it is defined, so for instance you could swap out a SoA to an AoS by swapping only the way the compiler serializes to RAM and not any of the rest of the code. Obviously you provide defaults that look like modern languages, but with this you could directly implement things like tagged unions with custom bit layouts, or theoretically, directly accessing gzip'd data by specifying that this data structure can only be accessed sequentially but as long as that's what you do you don't need to directly unzip it, etc. This doesn't directly answer "how do you utilize modern hardware correctly" but gives you tools to potentially create a better match than what compilers give by default.
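
To make the AoS/SoA point concrete, here are the two layouts in plain C; today switching between them means rewriting every access site, whereas the idea above is that only the declared layout would change:

  /* Array of structures: one particle's fields sit together in memory. */
  struct particle { float x, y, z, mass; };
  struct particle world_aos[1024];

  /* Structure of arrays: each field is contiguous, which is what SIMD wants. */
  struct world { float x[1024], y[1024], z[1024], mass[1024]; };
  struct world world_soa;

  /* Same logical loop, different access syntax -- that's the code you'd
     have to rewrite by hand when changing layouts. */
  void step_aos(float dt) { for (int i = 0; i < 1024; i++) world_aos[i].x += dt; }
  void step_soa(float dt) { for (int i = 0; i < 1024; i++) world_soa.x[i] += dt; }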

Again, to be clear, this is crazy pie-in-the-sky far out ideas that I do not have an implementation in mind for, but it's the sort of thing I'd like to see more experimentation with on the fringes of language dev. (And I only wish I had time to do it myself. Unfortunately, I simply do not.)

(And, as the sibling comments point out, yeah, assembler technically, but that's kind of a cop out.)


> To some extent, just sitting down for a year to learn modern assembler and starting from the very, very bottom once again to build a high level language, rather than starting with C and building "C, but ..."

Right so build a 'union' of what's available and somehow try to fit it in a unified model. I was hoping there was at least some theoretical PL answer of a unified model. But can't be because all manufacturers and industry sub-groups are doing their own thing.

> assembler

Yes that's definitely a cop out.


TAOS was a 1990s system for heterogeneous computing https://en.m.wikipedia.org/wiki/Tao_Group

but it predated the rise of GPGPUs and vector units so it didn’t tackle data parallelism and array processing.


Yeah, I don't want to pretend this is a totally new idea. There's not a lot of totally new ideas.

But there was a lot of things tried in the 1960s and 1970s whose only fault was that they were simply too early. For example, people were researching neural nets back then. They basically got nowhere. In hindsight, they never could have, simply because it was too soon and the requisite power wasn't there yet.

A phone is more heterogeneous today than I think even supercomputers were in the 1990s, and the trend is only increasing diversity, with neural processors on the near horizon and quantum on the far horizon (as it seems quantum processors are far more likely to end up functioning as a sort of fancy "accelerator card" than their own CPUs). & honestly even CPUs are almost viewable as their C subset and their vector processing subset, and even "within" the same CPU the two don't always cross particularly gracefully.


Keep in mind C is just basically assembly but encapsulated in a pretty package. If you create small executables and dump out the generated unoptimized assembly you'll be surprised just how simple it is. It pretty much just encapsulates the ideas of System V binary compatibility and then keeps going. So developing a language from scratch and skipping the use of C would really likely be just causing yourself more pain than you need as you're going to have to replicate all of the things C does anyways, so why not reuse what the experts have already done. And you get a lot of cross CPU and cross compiler portability.

What you want is the idea of bootstrapping. Write your compiler in C, then as your language specification is developed enough, dogfood your own compiler. Write your compiler in your new language, then compile itself. This is called bootstrapping and is how many languages are developed. Once you are fully bootstrapped you can drop C altogether.


> It gives you low-level access to a machine that your real machine actually has to somewhat laboriously emulate

Isn't C the language (x86_64) processors are designed to be fast for? Sure they added a large amount of abstractions but since they were made for C is there any language where the processor doesn't have to laboriously emulate?


> Isn't C the language (x86_64) processors are designed to be fast for?

Yup

I mean they also optimize for Java and JS and .NET and probably Swift and Rust.

But C still takes precedence, I bet


> Isn't C the language (x86_64) processors are designed to be fast for?

Nope. They compete on performance in C++ (games mostly), Java (enterprise SKUs, but same core architecture), and JavaScript (browser benchmarks even though raw JS performance is a very small part of browser responsiveness...)


Nothing added to machines since the invention of C is foreign to C. In fact, C is hardware's most favored customer. Chip designers tend to favor tuning for traces of instructions generated by C compilers. Some architectures, like RISC-V, are so overtuned for C and nothing but C that they forgot to add some instructions (like add with overflow check).


>they forgot to add some instructions (like add with overflow check).

If you actually read the spec, you would have found that they didn't "forget" these.

They carefully studied them and judged the encoding space is better used elsewhere.


I did read the spec. They did forget them.

The “studies” failed to consider non-C languages. These people had no clue how widespread overflow checking is and how much more widespread it’s set to become because of the security upside.


Multiprocessing. Atomics. Vectors. GPGPUs. All foreign to C when they were introduced.


I don't think any of those are foreign to C since:

- All of them were designed with C in mind, so much so that in many cases the C implementation of those features was the first implementation of them. The first SMPs were programmed in C with C APIs. The first time I did atomics was in C. When vector APIs are introduced, they're usually exposed to C first. Etc.

- All of those features fit more elegantly into C than any other language. C runs on GPUs quite naturally, while most other languages don't run on GPUs at all. So, the things you list are examples of features that are more native to C than they are foreign.


Your friendly wg14 member here. It is a low-level language, but it is not a portable assembler. If you think that what you write will have a one-to-one relationship to assembler, you will run into trouble. If you want a deeper dive into how these things can trip you up, watch: https://youtu.be/w3_e9vZj7D8


C programmer and fan of yours.

I agree with you, but if you could convince WG14 to remove a lot of the stupid UB, that would be closer to the case.

(I know you're trying from your "One Word Broke C" article. Which, by the way, is putting up a server error right now.)


> it is not a portable assembler

And it never was!

Just keeping this point in mind would reduce the plethora of discussions about undefined behaviour to the essential, i.e. the useful discussions, i.e. the 0.1%.


Opinion is divided on this. My best guess is that ISO C was never a portable assembler, but the C programming language before standardisation broadly was, and that's how people hold both positions as self evidently true. Different definition of "C".


Programmers are supposed to trust official interfaces rather than undocumented implementation details ;-P


C would have been great as a portable assembler. E.g. if a syntactic + mapped to the hardware `add` instruction, that's pretty predictable! But it doesn't; it maps to the hardware `add` modulo compiler optimizations (like folding and strength reduction, which are done assuming overflow and other tricky parts are UB). Basically, everywhere UB is permitted by the spec, it's so compilers don't have to handle the tricky cases, don't have to give semantics for buggy programs or even help in debugging, and can make what would be unsound optimizations if the operations truly represented the target CPU's "weird" add semantics.
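
The classic example of what that buys the optimizer (and costs the programmer) -- most compilers will do this at -O2, though nothing obliges them to:

  #include <limits.h>

  /* Signed overflow is UB, so the compiler may assume it never happens and
     fold this to "return 1" -- even though on a two's complement machine
     x == INT_MAX would make x + 1 wrap around to INT_MIN. */
  int always_bigger(int x)      { return x + 1 > x; }

  /* Unsigned arithmetic is defined to wrap, so the comparison must stay. */
  int maybe_bigger(unsigned x)  { return x + 1 > x; }   /* 0 when x == UINT_MAX */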


Just toss enough compiler flags at clang and make sure to occasionally use inline asm snippets to throw off the compiler's optimizations.

Then you're GTG


Depends on what you mean by "portable assembler". It is exactly that in a lot of ways, but exactly not that in others.

I think it's more useful to say that C is a portable assembler, than it is to say that it isn't, considering how it's used in practice and the sort of nasty things C compilers do in order to make that possible.


The author is playing a semantic game.

I don't think the author's point is that "C is not a good language for systems programming." You are not going to have an equivalent to volatile int *dma_register = SCATTER_GATHER_BASE; in Haskell.

The author's point is that the drive to make C and other "model the von Neumann machine" languages execute quickly has made the compiler very complicated (the author is implying that "low level requires simple compiler") and that processors built to make such code run quickly are also very complicated. And those complications carry costs.

In many ways this is a "call to programming model action" and cites GPU as illustrating the potential when "new programming model" and "silicon to support it" are done in concert.


"Low-level" is a word with multiple meanings.

The original one is the one the article uses: low-level languages are non-portable and tied to the hardware on which they run, and high-level languages can target multiple platforms. Under this definition, C is absolutely a high-level language.

My complaint would not exactly be that the author is playing semantic games; it would be that they are clinging to archaic terminology in a way that does more to confuse than enlighten. The "generations" taxonomy is generally more descriptive.

  1st: Machine
  2nd: Assembly
  3rd: General-purpose
  4th: Application-specific
The 3rd/4th distinction gets a bit muddied sometimes, and back in the 80s and 90s people talked about a 5th generation that never really took off. But a couple (I think) clear examples of 4GLs are SQL, HyperCard, and Mathematica.

What I like about that approach is that it mostly breaks languages up according to fairly clear distinctions about when you would use them. And then we can use "high/low-level" as a relative term, where higher-level languages tend to do more to abstract away the details of what the computer is actually doing. That does mean that higher-generation languages tend to be higher-level; all we lose in doing it that way is the ability to have silly arguments about where to place a completely arbitrary (and, frankly, useless) dividing line.

I also like that this way we can recognize .NET IL, WebAssembly, and Java bytecode as very high-level 2nd generation languages, which, at the very least, is fun.

Oh, and Forth is a 3rd generation language. Fight me, Chuck.


5th generation was the label under which the Japanese government threw a lot of money at Prolog and expert systems. It wasn’t a technically-driven distinction from the 4th generation, but rather a wish about what would happen if the project succeeded. 5GLs came about from language designers bidding for research money, saying, try our language, it’s better than Prolog!


>use it professionally

I think this post goes way, way, way above the boringness of day-to-day jobs.

Yea, this post is not about how to use a hammer, but more a curious consideration of whether using hammers everywhere is limiting us (C's design)


> Yea, this post is not about how to use a hammer, but more a curious consideration of whether using hammers everywhere is limiting us (C's design)

Maybe it [EDIT: the post] is, but the title is obviously nowhere near accurate - if C is not a portable low-level language, what on earth is?

[1] It gets reposted everywhere so often I have read it multiple times, and the one thing in common I see is how every know-it-all crawls out of the woodwork to comment on the title, as if the title was something new, deep, profound or even correct.


C is only portable between systems which emulate a PDP-11 at the hardware level, and only if you don't use any compiler-specific extensions.

If you use syscalls or work across different breeds of operating systems (UNIX, POSIX and Windows are not compatible with each other), you need to rewrite or wrap the relevant parts, or write them beforehand inside ifdefs, to be able to "port" the code between systems.

The gist of the piece is that hardware is evolving to please C's programming model, hiding all the complexities C is not aware of, and behaving like a PDP-11 on steroids. This is why we have a truckload of side-channel attacks on x86 to begin with: to "emulate" PDP-11s faster and faster.


It's not that faithful to the PDP-11, either. The PDP-11 has a unified integer division/modulo instruction (and it operates in double width: it takes a 32-bit dividend and a 16-bit divisor and produces a 16-bit quotient and remainder, just like x86), it has double-width integer multiplication (again, just like x86), it has instructions for addition/subtraction with carry — none of that is available from (standard) C, and it's quite a pity. And also, while the PDP-11 has built-in support for post-increment and pre-decrement addressing for pointers, it doesn't have built-in pre-increment or post-decrement.
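
You can reach some of this through compiler extensions rather than the language itself, which is rather the point -- e.g. GCC/Clang's overflow builtins and __int128 (the latter only on 64-bit targets):

  #include <stdbool.h>
  #include <stdint.h>

  /* Carry-flag-style addition: the builtin reports the wrap instead of the
     standard's "UB for signed, silent wrap for unsigned". */
  bool add_u64(uint64_t a, uint64_t b, uint64_t *out) {
      return __builtin_add_overflow(a, b, out);   /* true if it wrapped */
  }

  /* Double-width multiply, like the hardware's widening MUL. */
  void mul_64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
      unsigned __int128 p = (unsigned __int128)a * b;
      *lo = (uint64_t)p;
      *hi = (uint64_t)(p >> 64);
  }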


I think we'd have the side-channel attacks on x86 even if we wrote in assembler - unless we wrote the assembler specifically with an eye to preventing (the known kinds of) side-channel attacks.

Put differently, I don't think the side-channel attacks would disappear if we wrote in Rust or Haskell or Agda.


The side-channel attacks are not a result of programming in C, but of the design of the hardware, which preserves the view of the system that C compilers expect.

All programming languages, regardless of their type (imperative, functional) or interfacing method with the system (JIT, interpreted, compiled) are not immune from these attacks, because it's the hardware which is designed to emulate PDP-11.

In other words, all programming languages target a modern PDP-11 at the end of the day. If hardware showed all of its tricks (esp. cache management, invalidation, explicit prefetching, etc.) and lacked speculative, out-of-order execution, these problems would go away, but getting the highest performance would become much harder and more complicated, and even impossible in some cases.

Intel tried this with IA64, with a "No tricks, compiler shall optimize" approach, and it tanked to put it mildly (esp. after AMD64 came out).


Let's say we have two chips. Chip A requires the programmer to handle all the "magic" stuff. Chip B is like current chips; it hides that stuff. Chip B is subject to side channel attacks. Chip A likely is also unless the programmer is very careful.

Which chip would have sold more? I assert that chip B would have, by a massive volume, because it didn't require the programmer to mess with all that stuff.

So I don't think that it's fair to say that the chip is trying to look like a PDP-11 because of C. I think it's trying to look like a simpler chip, so that mere mortals can program it and still get most of the maximum performance.


I think it depends on the toolchain. Itanium didn't sink because of the optimization it needed, but because of the toolchain, which couldn't do all that optimization.

So, if a complex processor comes with a toolchain which does all the tuning by itself, I think it can sell equally well, because the burden will not be reflected on the developer, again.

So, I think popularity of the language itself has a great impact on hardware design.

The AMD Athlon XP had an "Optimized for Windows XP" badge on it. GPUs are built upon the programming model OpenGL and DirectX put forward. Modern processors are made to please C and its descendants, because it's the most prominent programming model.

Lisp even tried to change this with "Lisp Machines", and they failed, because Lisp was not mature/popular enough at that point.

So we can say programming model drives hardware very much.


I believe that the point is the processor was designed to please C (by emulating PDP-11). And this design complicates things immensely, which is how we end up with side-channel attacks on our processors.


>if C is not a portable low-level language, what on earth is?

This question doesn't have to have an answer. The author of TFA apparently believes that a low-level language is one that effectively and clearly exposes the execution model of the hardware to the programmer. Under this definition, no widespread language (except assembly) is truly low-level, and possibly none are.

Which, for what it's worth, is also what I was taught in school. C was consistently described as a high-level language by my professors, even if it is "lower-level" than almost everything else.


The real question is whether you would even want to use a language that effectively and clearly exposes the execution model of the hardware. Not even most assemblers do that, as architectures give stronger guarantees than would be implied by the microarch execution model.

Some machines do expose the microarchitecture (or rather, there is no architecture other than what is implemented in hardware by a specific revision) and rely on install-time or even JIT code specialization. But especially on these machines it would be insane to try to manually target them, as you would have to rewrite your code for every revision.

So, targeting the effective execution model of the machine is out of question. You need an abstraction. The question is whether C is the correct abstraction.


The post argues that there are no portable low-level languages, including C.

i.e. truly low-level languages can't be portable and are bound to the architecture.


It's plausible that a language could expose some general logic behind instruction-level parallelism and cache management — even register renaming — without being explicitly tied to the way one particular architecture does that. I have no idea how to design such a language, but from 10000 meters I think it could be done.

I think the author oversteps his case by suggesting that ILP is an abomination that exists to preserve the availability of C-like languages. In my experience, many algorithms seem to naturally lend themselves to ILP, and I often find myself wondering whether I have typed them in so that these five lines will in fact run simultaneously. One common flaw in critiques of the common C compiler model is that they all seem afflicted by a nostalgia for Lisp machines, when the space of unexplored possibilities is so much larger.


Only when taking into account language extensions that are compiler specific and not part of ISO C.

Also a reminder that any language can have toolchains with extensions exposing low level features.


Funny how the top comment on "hacker" news is an *unsubstantial* comment about how, actually, TFA is wrong.

Even worse, adding a comment on how actually you shouldn’t be curious and understand how things really work.


There's a lot of moaning and crowing here, but no real substance. If one were to design a CPU and its ISA from scratch, what would you do? Instructions, control flow, memory, out-of-order execution, caches, hierarchies, branch prediction, you'd probably end up with all of it down there anyway. I don't get the point about GPUs. Real applications aren't matrix multiplies and embarrassingly parallel numeric algorithms, they run general purpose PLs.

Which basically then boils down to ISA design. If you could design an ISA from scratch for the hardware you design from scratch, what would you do? Well, there aren't that many options. Stack machine, dataflow machine, VLIW machine. All of those have been tried and the modern superscalar CPUs kick their butts on every metric except power.

The whole article kind of misses the point anyway. We should probably be running higher level languages for most things anyway, which shouldn't be overly constrained by hardware design. For everything else, 100% serious, there is WebAssembly, and hardware ISAs will fade below this level of abstraction in the fullness of time.


Can you elaborate?


I disagree with the author's point that CPU instruction sets should expose more of the CPU's implementation. This has been tried in the past and failed to work long-term. One example of this is branch delay slots from some RISC processors (such as MIPS and SuperH) designed in the late 80s and early 90s. For those unfamiliar with the concept, it basically means that the instruction after a branch instruction will get run regardless of if the branch was taken or not. This was a short-term benefit, as it meant the job of avoiding pipeline stalls after a branch was left to the programmer, so the processor could be simpler and cheaper than designs without them. However, as time went on, the processor designs evolved with more complex pipelines, so the single instruction wasn't enough to cover the branch delay. Instead, it became a legacy issue that future processors had to deal with for compatibility reasons and made their branch prediction and pipeline logic more complex.


I don't think he's saying "expose random implementation details". Exposing the wrong details would obviously be bad. He's just saying c's model has significant shortcomings in the world of modern CPUs.


> he's saying c's model just doesn't work well anymore.

The author argues that C's model does not fit a model he defined himself and then claims is the same model used by everyone.

After going through the article, I'm left with the impression that the author's thesis is flawed and relies on a series of strawman arguments. Among the strawmen we find:

* arguing that speculative execution "were added to let C programmers continue to believe they were programming in a low-level language".

* claiming that "modern processors are trying to emulate "the same abstract machine as a PDP-11"

* "Creating a new thread is a library operation known to be expensive, so processors wishing to keep their execution units busy running C code rely on ILP (instruction-level parallelism)."

* etc etc etc.

I don't think this opinion piece is grounded on reality, let alone is an objective take.


He doesn't define a model. He just discusses the gap between c's model and a few details of a modern CPU and talks about a few other models.

In your opinion, why was speculative execution added? It doesn't seem off base to suggest it was to enable programmers to continue writing single threaded applications while increasing execution speed.

In your opinion, what is wrong with the statement that modern processors are trying to emulate an abstract machine like PDP-11? To me it seems largely right.


Counterpoint: Allowing explicit control over prefetching or providing an additional "engine" which brings in data to caches in a pattern you like is esp. beneficial in real-time and latency sensitive applications.

I listened to a talk where developers used such a subsystem in the given processor. Without it, they would spend 95% of their time window just copying the data; by requesting the data ahead of time via that engine, they only used 10% of their time window to get the data, and accomplished what they wanted in ~50% of the time window they had, leaving tons of time for further features and improvements.

If x86 had such a feature, I'd use it in my Ph.D. to request the matrix data I'm accessing ahead of time, because the pattern I use is not linear but well defined. Now, if I want to accelerate that code further, I need to reorder my matrices to make the prefetcher happy, and refactor the whole codebase from top to bottom.


PREFETCHh: Prefetch Data Into Caches

https://c9x.me/x86/html/file_module_x86_id_252.html


Generic x86 does not, but the SSE extension (present on all modern CPUs) does have it! And you can naturally use C to call it via intrinsics, because C is low level after all...

https://stackoverflow.com/questions/48994494/how-to-properly...
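
For example (__builtin_prefetch is the GCC/Clang spelling; _mm_prefetch from <xmmintrin.h> is the Intel one). The hint may be ignored, and on big cores with good hardware prefetchers it often doesn't help:

  struct node { struct node *next; int payload; };

  /* Pointer chasing is exactly the pattern hardware prefetchers can't guess,
     so hint the node after next while working on the current one. */
  long sum_list(const struct node *n) {
      long s = 0;
      while (n) {
          if (n->next)
              __builtin_prefetch(n->next->next, 0, 3);  /* read, high locality */
          s += n->payload;
          n = n->next;
      }
      return s;
  }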


C# also has the same intrinsics, I guess C# is low level after all...


And it's mandated to be available on all x86-64.


Is the pattern linear for some stretches or something like that? I’m wondering if reordering your matrices is the right strategy anyway—you want to load a whole cache line at a time I guess, being able to program the prefetcher wouldn’t get you that, right?


x86 does have a prefetch instruction, and it’s available as a compiler intrinsic in gcc and clang. But it’s really difficult to get a performance improvement from explicit prefetching on modern big CPUs because their dynamic prefetchers are very clever. Your data access pattern needs to be something the prefetcher does not know how to match, and you need to be able to prefetch addresses before the speculator would get to them by itself. Unlikely for matrices unless they have a very weird shape.


You may like the x86 instruction called 'prefetch'.


Wasn't this also similar for itanium? Where the branch burden would be on the compiler?


> One example of this is branch delay slots

That's the only example I'm aware of. Are there others? (I'm sure you could do it poorly if you wanted to, but how much history is there to extrapolate from?)


Vector instruction sets expose too much about the size of the CPU. They have got bigger over the years, 128 - 256 - 512 bits, 8 - 16 - 32 registers, and now Intel is struggling to fit them comfortably into their small efficiency cores and retain binary compatibility with their big performance cores.


I'm unimpressed that x64 assumes function calls write the return address to stack memory instead of passing it in a register. It means you have to use the stack memory in order to benefit from the call/ret prediction hardware, even in functions which don't otherwise need to allocate any stack memory.

In general leaking microarch weirdness matters less if you don't have backwards compatibility.


I feel, low to high level is a spectrum, not a binary. C is arguably in the lowest third of languages, exposing you to a lot of machine primitives like memory and thread management. It may not be as low level as assembly, but it is arguably lower level than Java or Go, and definitely nowhere near the Pythons and JS of this world.


> exposing you to a lot of machine primitives like memory and thread management

Except it doesn't really, the standard leaves most of the really machine-dependent parts undefined; only very few things are left implementation-defined.

Plus, of course, C is quite unsuitable for any platform that uses segmented memory/non-flat addresses (which are things that are trying to come back in vogue, but C's widespread use really, really hinders that).


> Except it doesn't really, the standard leaves most of the really machine-dependent parts undefined

Well that's because it is low level and, especially, simple, and doesn't try to abstract things.


It's a certain kind of low level - specifically a PDP-11 kind of low level.

If your hardware is significantly different, it only looks low level. In reality plenty of mapping and conversion goes on behind the scenes - sometimes with hilarious consequences.


> and doesn't try to abstract things.

The C standard is a description of an abstract machine. You get UB and unexpected miscompilations, because the optimizer is not evaluating how your code runs on the machine you're compiling for, but simulates running your code on the weirdly abstract C machine, one that can't overflow signed integers.

And C abstracts away almost everything about stack, stack frames, and all the complexities of memory and cache hierarchies. They are abstracted to be uniform linear address space.


Can you or someone expand further on that? Which platforms are trying to use segmented addressing, and what benefits does it have?


CHERI project [0]. Look at figure 2.1: it's an improvement and further development of the segments of yore but the origins are quite visible.

[0] https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-941.pdf


C supports CHERI quite well. It’s modern languages used to flat address spaces and 64-bit pointers that are having a hard time, because they defined a lot of “obvious” behaviors that the standard leaves flexible.


We did ok with the 8086 processors, no?


Yes, juggling near and far pointers was somewhat annoying but then Intel, as a part of the 32-bit transition, modified their ISA to be a more pleasant target for C implementations.

Incidentally, C never really became popular on 6502 because, arguably, that ISA is somewhat hostile towards efficient implementations of higher-level languages.


True. The code generated by https://cc65.github.io/ is pretty decent but there are a few places where hand-rolled assembler will perform much better when you need it. Although I've made things for 6502-based systems in C with this handy compiler (thanks cc65 contributors!).

Is there something intrinsic to how C handles addressing that makes segmented architectures more painful than they ought to be? Or maybe is there a language where segmented addressing is easier?

I hadn't really thought about it in a while. :)


Lisp is fine on 6502 though :-)


By that do you mean exposing a non-uniform memory hierarchy as separate addressable spaces (but with coherent views from each hardware thread) or something like thread-local scratch pads?

Either way, C is equipped OK for that - at least as well as most systems languages (C++, Rust, etc.) - simply because dealing with allocation and raw addressing (at least raw within the process memory space) is a fundamental part of the experience. Throw in a few compiler extensions (because you'll need to change the compiler to make use of this anyway) for things like where to locate static allocations, and use library functions that add dynamic allocation in specific spaces. It will get hairy, but it's at least possible with some very careful programming.
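
For the static side, the usual shape of those extensions today is something like this (GCC/Clang section attribute; the section name and the allocator are made up here, and a linker script has to map the section onto the actual scratchpad/TCM region):

  /* Ask the toolchain to place this buffer in a specific memory region. */
  static unsigned char dma_buf[4096] __attribute__((section(".tcm_data")));

  /* Dynamic allocation from a special region is then a library affair:
     a region-specific allocator handed the bounds of that section. */
  extern void *tcm_alloc(unsigned long n);    /* hypothetical */
  extern void  tcm_free(void *p);             /* hypothetical */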


Honest question - is there any language at all between C and Assembly? Because if there is, I haven't heard of it. For that reason alone, my mental model has always been "C is the lowest you can go before hitting direct instructions to the processor."


I would say Forth is lower level than C. The mental model is a two stack machine plus memory, rather than a PDP-11.

And it is very reasonable if you are under 50 years of age, that you haven't heard of it.


Started programming in '81, so technically I could have known it

https://www.timexsinclair.com/product/zx-forth/


Conceptually, you can consider LLVM IR to be such language, and there are people who use it that way.


There are some cleverer assemblers that allow you to program in assembly while still being able to do stuff like loops without too much effort https://en.wikipedia.org/wiki/Assembly_language#Support_for_...


While staying portable across architectures, probably not. But you can make a little language that's nicer to use than assembly for a particular CPU.

COMFY-65 is a compiler for a small Lisp language that provides all non-branching operations of the 6502 processor as primitives (e.g. tests for carries, overflows, zero, and negative; set decimal arithmetic mode; etc.). However, programs still consist of subroutines, loops, and tests, with no "go to label" construct provided. It's surprisingly simple and, I would say, elegant.

Here's the PDF that outlines it: https://dl.acm.org/doi/pdf/10.1145/270941.270947


C makes lots of things undefined behavior which are perfectly fine in assembler — reading a stack location without first writing to it, doing signed integer arithmetic that overflows, treating the same memory location as different types.

Also there is quite a lot in modern assembler that you can’t really get to from C, like prefetch and cache flushing instructions.


That last bit is really interesting. Do newer languages make use of those features? Sorry I’m fairly ignorant of this level of the stack.


Newer languages don't make use of any of those features, to my knowledge. They're only available in assembly, and only on modern processors. Because C was made on and designed for hardware that was crappy even in the 1970s, which didn't have a cache or do out of order execution, C doesn't fully reflect the capabilities of modern processors. That's what they're implying with the last sentence, I'm pretty sure.


> is there any language at all between C and Assembly?

LLVM or QBE, for example.


Yes, WebAssembly is higher-level than machine code but lower than C.


1990 is calling:

"C does not behave as a typical ‘high-level’ language, because it offers a number of features which are more normally associated with ‘low-level’ languages such as assembly language. These include the ability to write data to and from particular memory addresses, facilities for operations on the contents of memory locations, and instructions for incrementing and decrementing integer variables ... Thus C allows the programmer the flexibility and efficiency of working at low level with the advantages of working at high-level, for example the more advanced data structures and program flow controls typical of today’s computer languages. For this reason, C is sometimes described as a ‘high-level low-level language’ or as a ‘low-level high-level language’." - https://archive.org/details/computerprogramm0000ford/page/13...


You have high expectations for accuracy in an article titled "C Is Not a Low-level Language".


I think this statement at the end of the article - 'There is a common myth in software development that parallel programming is hard.' - is misleading. Granted the author denotes explicit situations where it is not hard, but if it's applicable in general, then it is hard. Not a common myth.

Is parallel programming hard? Without any further details or specifics, yes it is. It is far harder to conceptualize code instructions executing simultaneously, than one-at-a-time in a sequential order.


> Is parallel programming hard? Without any further details or specifics, yes it is. It is far harder to conceptualize code instructions executing simultaneously, than one-at-a-time in a sequential order.

If I program (map inc [0 1 2 3]) is it really any more difficult to conceptualize the (inc ) function performing on each element sequentially than in parallel?

I think the difficulty of parallel programming is less innate and more two fold:

1) languages often default to sequential so to do async requires introducing additional primitives to the programmer

2) knowing when to effectively use parallel programming

When I have a list or stream that I know has independent elements that require wholly independent calculations, then parallel programming is straightforward.
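
For instance, a minimal sketch in C with OpenMP (assuming the elements really are independent — roughly the imperative version of the (map inc ...) example above):

    void inc_all(int *xs, int n)
    {
        /* Every iteration touches a different element, so the loop can be
           split across threads without any coordination. Compile with -fopenmp. */
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            xs[i] += 1;
    }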

Where people get hung up is trying to shoe horn async where it is either unnecessary (performance is equal or worse than sequential) or introduces breaking behavior (the computations are in fact interdependent).


Most problems are not embarrassingly parallel.

(Fun fact: I once had someone call HR on me because they didn't know embarrassingly parallel was a technical term, and they thought I was belittling them)


Prefix scan is not embarrassingly parallel. Yet OP's statement still works when you change it to scanl (+) 0 [0 1 2 3]


That requires + to be associative. And scan is one of the core parallel skeletons, so obviously if you express everything as parallel skeletons, parallel programming remains manageable.
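
As a concrete version of that, here's the simpler cousin of scan — a reduction — sketched in C with OpenMP. The associativity of + is exactly what lets the runtime split the work into private partial sums and combine them in any order (a sketch, not a recommendation):

    long sum(const int *xs, int n)
    {
        long s = 0;
        /* reduction(+:s): each thread accumulates a private partial sum, and the
           partials are combined at the end — only valid because + is associative. */
        #pragma omp parallel for reduction(+:s)
        for (int i = 0; i < n; i++)
            s += xs[i];
        return s;
    }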


I agree that if we define the individual instructions to always be wholly independent, then sure, it is more straightforward.

While I'd probably argue that it is still more difficult to conceptualize, the statement we're discussing is presented as broad and general. I'd call it far less misleading if it said something like:

There is a common myth in software development that parallel programming *has* to be hard.


The whole reason async is even a thing is due to slow, side-effect producing operations. Of course pure functions are easy to parallelize.

I don't think folks so much "shoe horn async where it is unnecessary" as the red/blue problem causes async code in most languages to spread.

Or by "async" do you just mean concurrent code? I'm reading "async" to mean lightweight coroutines or similar.


> Or by "async" do you just mean concurrent code? I'm reading "async" to mean lightweight coroutines or similar.

Yeah, my bad, I was utilizing a colloquial definition of a term that has a technical definition in a technical conversation. A lamentation on the lossiness of language.

I guess I assumed we were talking about something other than in terms of red/blue because I'd argue red/blue's "hard"ness transcends myth to mathematical fact.


I don't think this is right. Thinking about operations on matrices is not complex. Defining how a single agent should act on its environment is not complex

When you say "without further details or specifics" you're saying "using my default framework of a C / C-descendant world"

The author's point is that sequential programming is one type of simple programming, but it's not the only type, and it doesn't map easily to modern hardware


The author's article generally focuses on C (and possibly descendant languages), but the phrase I am critical of, does not. Furthermore, I explicitly consider a very broad selection of programming languages (many not C-derived) in my opinion. The author's phrasing, I'd argue, paints the entire concept of parallel programming as not hard.

There's some irony to the fact that you re-interpret my opinion as being very specific to C and (indirectly) posit that - in that specific case - parallel programming is hard, and then yourself go on to select a very specific case where parallel programming is not hard, because some matrix operations are independent.

I agree that there are languages that are explicitly built to make parallel programming easy. But in general, and not just related to c or c descendant languages, parallel programming is hard.


My point (and I think the points of others responding to you) is that parallel programming is not always hard. That's also what the author is saying.

The common myth - you're doing parallel programming? That sounds hard

It's not always hard. It really isn't! You don't need to be a genius or an expert to write parallel code.

Maybe where we're getting caught up is Cassie K's comment on ml engineering. You don't need to know how to build a microwave to use a microwave. In the same way, you don't need to be a genius or some deep expert in distributed systems to use abstractions that parallelize your programs

To write a parallel program does not require that you know what a mutex is. It just needs you to understand some simple algebraic (6-8th grade) properties about your functions (and, in fact, for library functions, they can be annotated as associative)

There is a broad spectrum of parallel programs. Somebody using a web server implementation? They've made a parallel application

Somebody running tensorflow or pytorch? Also parallel! Even for simple stuff!

You could be a beginner programmer and be taught to make parallel programs without understanding distributed systems. It's not always hard. It's not generally hard. The complex bits are hard. The simple bits use 8th grade math.


> My point (and I think the points of others responding to you) is that parallel programming is not always hard.

Sure, and even more people commenting appear to be of the mind that it is generally hard.

> That's also what the author is saying.

It's not what author is explicitly saying in the statement I'm addressing if you re-read my original comment. There, the author isn't saying that it's not always hard, they're implying that it 'in general' isn't hard.

From your arguments, it would seem you think anything that actually runs in parallel (regardless of whether it was programmed as such) can be considered 'parallel programming' and from that perspective, sure, it is super easy. But with that kind of reasoning, you can argue that anyone who only knows how to drive cars with automatic gears is actually a gear-shifting expert and shifting gears is really easy, because it happens automatically for them.


Agreed. The potential state space in parallel processing is a lot larger, which makes it more complex, which makes it harder.

That Erlang exists and people use it successfully does not mean the harder things aren't hard.


Concurrent programming - doing lots of different things at once - is hard. It is hard to use concurrent programming infrastructure (processes, threads) to implement parallel algorithms. Parallel programming - using lots of processing elements to work on the same thing at once - is much easier if you have the right abstractions.


I wouldn't say that it's hard to conceptualize instructions executing in parallel, but it's hard to coordinate those parallel subtasks in an efficient and correct way - except in some use cases, like eg matrix multiplication.


Isn't it distant from how humans work? We can't really do parallel, can we? And programming is translating human instructions to computer instructions, and translation is harder between more distant languages.


A factory is parallel

Or do you mean an individual can't do things in parallel?

Like.... Pushing all of those grocery carts in a long line is moving them in parallel

Or do you mean processing? Like thinking?


This article is correct that your computer is not a fast PDP-11 but wrong that this has anything to do with C. Eg, "another core part of the C abstract machine's memory model: flat memory. This hasn't been true for more than two decades."

This has nothing to do with C. The hardware insists on this abstraction. And it's a good job too, otherwise your programs would stop working when moved to a machine with a different cache.


The article argues that the hardware’s insistence on this abstraction is in large part _because_ of C’s dominance.


If only that were true. Lots of languages that have nothing to do with C also did it. It's just much easier to program with a unified memory model, that's all there is to it.


If by unified memory model you mean "flat address space" then no, it's not. The moment you need two (or more) dynamically-sized arrays, you need to implement realloc with memmove in the unfortunate case. In a world where each array could have its own segment, this problem doesn't arise because they simply cannot intersect, and realloc boils down to increasing/decreasing a segment's extent.
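
A small illustration of that flat-address-space cost (just a sketch): growing one of the arrays may force the allocator to move it and copy the contents behind your back.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *a = malloc(16 * sizeof *a);
        if (!a) return 1;
        uintptr_t before = (uintptr_t)a;   /* remember the old address as an integer */

        int *b = realloc(a, 1u << 20);     /* growing may collide with a neighbour... */
        if (!b) { free(a); return 1; }

        /* ...in which case realloc has already memmove'd the contents elsewhere. */
        printf("moved: %s\n", (uintptr_t)b == before ? "no" : "yes");
        free(b);
        return 0;
    }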


In the good[1] old days of x86 segments, you still couldn't assign a segment to each array, as segments were a limited resource. Something still has to do the mapping of the segments to physical, linearly addressable memory.

[1] spoiler alert: they were not good.


Wouldn't that just move the "magic" down to the kernel or the MMU which has to do the realloc and memmov hidden from the programmer instead of the programmer being able to chose when and how to do it?


Yes, but here's the thing: the virtual memory system already does this kind of "magic", maintaining the mapping of virtual addresses onto the physical ones, and it doesn't need to actually move data between physical pages at all since the mapping is discontinuous.


Many of those languages indirectly have lots to do with C – even if you ignore the obvious problems like "C is the only ABI supported by most OSes, C FFI is the only cross-language interface supported by most languages and thus most libraries", there are more subtle influences: copying e.g. the (very expensive to implement in hardware) cache semantics of C usually "costs" languages nothing, because the hardware is already there, due to C. Not copying them happens, if both language and hardware get developed at the same time, but it's much rarer.

You see similar problems with things like vectorization – Rust was in a good position to define semantics more amenable to ARM SVE / Risc-V VE, but all existing SIMD libraries are written for C and x86 semantics, so that's what Rust is currently stuck with, as are most other languages.


What “C semantics” are baked into Rust’s SIMD libraries?


Look at the github/reddit/etc discussions around getting variable-length vector extensions like SVE/RVE properly supported; all the libraries are designed for x86 semantics, which are the way they are to make them easily implementable in C.


I’m still confused. What are “x86 semantics”? What does this have to do with C? I understand if SIMD libraries were designed for e.g. SSE or AVX first but I’m not sure how that relates here.


But we have/had architectures that expose parallelism at the instruction set level, e.g. Itanium and Graphcore. And the PS2 made cache management the programmer's problem. I don't think any of these experiments proved successful in the long run.


Hence the observation that every architecture eventually converges to cache-coherent NUMA if physically possible.


People have written C code that dealt with more complex memory schemes.

The language matters less than the fact that there's a lot of existing code around. That code needs to keep working.


Yeah. A lot of the things that make C not low level in the terms of this article happened on IBM mainframes decades before x86:

* tiered memory hierarchy pretending to be flat RAM

* CPUs that are much bigger than the ISA suggests, and which have out-of-order and speculative execution so code can make good use of their resources

* optimizing compilers that further decouple the program as written from its execution

IBM was working on this stuff in the 1970s, well before the rise of C. It’s fair to criticize the model and seek out alternatives, but it isn’t fair to blame C.


Flat memory is a bad thing for performance. Especially cache-coherent flat memory. It is convenient for programmers.


Yeah, agreed. My comment should really have been, "I'm glad modern ISAs are high level because low level ones would be a massive burden". And, "It isn't C's fault that low level ISAs are a massive burden".


This is now five years old, and while obviously the premise is more correct than ever (computers don't look much like a PDP-11 architecturally), the conclusion ("imagining a non-C processor") seems less strong. We are seeing (and were seeing, even in 2018) a strong separation between linear and highly-parallel code, most obviously in the rise of Python for machine learning and scientific computing. It is still very convenient, when performance isn't paramount, to write in a single-threaded style and to a flat memory model. When performance is important, it's then appropriate to switch to a language better suited to parallel programming -- one of the computational-graph languages in something like Pytorch, some other set of primitives on top of CUDA, or even something more experimental like Futhark. Performance-critical code has always had its domain-specific languages, and they seem to be becoming more common, not less, and the hardware is being built to match -- as the CPU+GPU combination common to desktop PCs, as vector extensions to x86 (which have their own primitives making, essentially, a DSL of their own), or things like the M1, which bolt a GPU to a CPU to give both high-speed access to the same system memory.

In other words, perhaps what's really out of date is not C, but the concept of a general-purpose language which is equally well-suited to any type of task.


If the sophistication of modern CPUs makes C no longer a "low level" language, then the same applies to assembly language .. things like out of order execution and register renaming apply there too.

I guess the sophistication of compilers in recent decades adds to the argument since even the assembler (object code) the C compiler generates isn't going to be as expected due to hoisting things out of loops, common subexpression elimination, etc, etc.

Still, I think the notion of C being a "low level" language is a useful label ... if not, we need to retire this designation altogether.


Assembly is just a bit of lazy macro expansion, and the computer's memory addresses are a made-up concept.

That's indeed an abstraction over the real computer, but it's a lot fewer things piled up on your virtual computer's model than C. Current assembly is about on the same level as C was when it was created. Current C is so high-level that it doesn't provide any functionality you can't get with a better, more modern language.

But yeah, I do agree that "low" and "high" level aren't useful names nowadays.


I feel like the article advances on two different lines of argument that are difficult to reconcile. The first is that C is not a low-level language, and gives examples like struct padding and signed overflow being undefined behavior. That part makes sense to me, and the argument seems constructive: it seems to propose language features for a hypothetical "real low-level" language.

The second argument is that, because of the dominance of C, CPU designers have had to bend over backwards to create something that runs C naturally. Here there are examples like register renaming, flat memory, caching, etc. This argument also makes sense to me, but in the context of the first argument, and the title of the article, I'm not sure how it relates. Taken at face value, this seems to imply that it isn't even possible to create a low-level language on modern hardware, and even machine code is "high-level". This seems to argue that we would have to create a new generation of hardware that exposes much more complexity to the instruction set architecture, and only then could we design a low-level language to take advantage of that.

I think both of these arguments have merit, but it's a little disconcerting to put both of them in the same article, and to make the title "C is not a Low-Level Language". I suppose the first argument could go here, and the second argument could have been done in a follow-up article entitled "Machine code is not a Low-Level Language Either".


Intel's IA-64 supposedly exposed lower levels of the processor to machine code, but I hear it took ages to compile, and compilers never really got to the optimization levels they were expecting (and not being compatible with x86 also didn't help adoption)


Reminds me of VLIW. As per Wikipedia, from the Itanium page:

> One VLIW instruction word can contain several independent instructions, which can be executed in parallel without having to evaluate them for independence. A compiler must attempt to find valid combinations of instructions that can be executed at the same time, effectively performing the instruction scheduling that conventional superscalar processors must do in hardware at runtime.

If your CPU exposes the single-stream parallelism at the interface, you can do it at compile time or even decide it with in-line assembler.

I wonder if it hasn't caught on due strictly to the business dynamics of the industry, or are there technical reasons this isn't really a good strategy?


Well, IIRC it didn't catch on mostly because of a) compilers weren't really that good at that kind of instruction scheduling (and by the time they improved, Itanium had already sunk), b) conventional ISAs (that is, x86) got quite good at doing this in hardware, at runtime, and actually deliver slightly better results than static scheduling precisely because they do it at runtime, when profiling data is available.

I believe Linus has a good rant, even if only tangentially related to this exact topic, at [0]: "While the RISC people were off trying to optimize their compilers to generate loops that used all 32 registers efficiently, the x86 implementors instead made the chip run fast on varied loads and used tons of register renaming hardware (and looking at _memory_ renaming too)."

[0] https://yarchive.net/comp/linux/x86.html


Static scheduling, even with profiling, can never be as good as dynamic scheduling for general-purpose workloads. VLIW/EPIC can do well for HPC-style number crunching, but that isn't everything. https://news.ycombinator.com/context?id=37900987


One can move complexity back and forth between compiler, runtime and processor implementation to some extent. VLIW works really well in some niches. It's harder to program than single instructions that execute in sequence, either by hand or by compiler, but it simplifies the scheduling for the hardware. Works better if the bundled instructions have similar latency.

The key design puzzle at present seems to be that memory access takes many more cycles than arithmetic. Bundling a few cycles of arithmetic with a few hundred cycles of memory load is kind of pointless. So VLIW works well if you know memory access is going to be fast, which roughly means knowing it'll hit in L1 cache or equivalent. I think that's part of why it suits DSP style systems.

Exposed pipelines are an interesting quirk of some of these systems. One instruction in a VLIW bundle writes to a register and subsequent instructions that read from that same register will see the previous value for N subsequent cycles, after which the write becomes visible. They're really confusing to program by hand but compilers can deal with that sort of scheduling.


Because static scheduling is terrible for non-DSP and non-HPC loads like the typical server or desktop application where the control and data flow is very input dependent. Until recently DSP and HPC were a tiny fraction of the market so architectures capable of dynamic scheduling dominated even those markets as they had more investment.

With GPUs of course things have changed and in fact GPUs relied more on static scheduling, but even there as they expand to more varied loads, they are acquiring more dynamism.


See my other comment for why VLIW was technically flawed

https://news.ycombinator.com/context?id=37900987


I'm reading that TeraScale (AMD) works this way. Itanium was a major attempt to ship it in a CPU. I guess AMD64 and ARM rule the day, but maybe in the future we'll see it again.


TeraScale was a VLIW; it worked well as far as I know. The current amdgpu architectures aren't – those are multiple execution port systems, reminiscent of the x64 setup.

Qualcomm's Hexagon is a VLIW, and I think that's contemporary. Graphcore's IPU is two instructions per word.


Are you asking why VLIW hasn't caught on? There are DSPs that use VLIW concepts. But for general purpose computing, look at Itanium and its failure.


My sense is that this is really a communication issue (when is it not?)

On a relative scale, C is very low level compared to how we program today if you think about levels of abstraction.

If “low level” means “runs on the CPU almost literally as written.” then no it’s not.


It's more than that. C-the-language just doesn't have low-level concepts such as machine addresses, and its facilities for dealing with the types that the abstract machine ascribes to all objects are quite limited.

Ada has System.Address to model machine addresses:

http://ada-auth.org/standards/rm12_w_tc1/html/RM-13-7.html#p...

C++ has std::less specializations for pointer types which provide a strict total order (one aspect of machine addresses):

https://en.cppreference.com/w/cpp/utility/functional/less

There is also placement new and std::launder for more explicit control of typed memory:

https://en.cppreference.com/w/cpp/language/new https://en.cppreference.com/w/cpp/utility/launder

These days, even Java tries to model machine addresses:

https://docs.oracle.com/en/java/javase/21/core/foreign-funct...


Yep, this is a linguistic problem, not a technical one. "C is not a low level language" implies that the hi/lo boundary lies below C. What's below C? IR, Asm, and opcodes.

IRs like LLVMIR and various bytecodes. Well, those don't map to the hardware 1:1, not even close. So IR must be HLL.

Sure Asm has to be architecture specific, but even then we are getting pretty good at transpilation. And those codes get translated to opcodes anyways on most modern chips.

Basically, unless you are assembling on an ancient system or embedded processor, you aren't writing in a "low level language". Very few folks nowadays do this, so the term "LLL" doesn't occupy much mindshare in semantic space. That leads folks to populate it with what they perceive as low level - the lowest language on the abstraction tree they are likely to encounter - C.

This divide is only going to expand so I say we just accept the definition of low level language has shifted, and call anything where it does closely match... something else, I don't have a good term. Maybe "hardware level language".


> If “low level” means “runs on the CPU almost literally as written.” then no it’s not.

But doesn't this still depend on what CPU you're talking about? Your C code will map much more closely to the instructions of the machine code of an 8051 or even an M4 than it will to an x86.

Thus any general-purpose language is more or less "low level" depending on the CPU it's running on. This seems like a poor definition.


This is one of the most interesting programming articles I've read in a while. And it's well written and easy to read! Don't stop at the (inflammatory?) title.

* We all agree that c gives you a lot of control to write efficient sequential code

* Modern processors aren't merely sequential processors

* Optimizing c code for a modern processor is hard because c is over-specified - in order to allow humans to manually optimize their programs (given the c memory model etc), it's hard for compilers to make assumptions about what optimizations they can make

It doesn't seem like this is a fundamental problem, though, and c could provide symbols that denote "use a less strict model here" (or even a compiler flag, although I bet incremental is the way to go)


This is a great article (worth reading if interested in performance/parallel computing) but the complications it gets into are mostly in the CPU architecture/hardware to which compilers add additional complexity. Even without the compiler optimizations there's still branch prediction and associated parallel execution of serial machine code.

To anyone debating whether C is low/not-low level language note that this discussion is at a much lower level so 'low' has a lower than common meaning.


I think the title that the authors decided to give this article was unnecessarily provocative in a distracting manner. I’m pretty sure there is a technical definition of low level language they are referencing that excludes C, and pretty much only includes assembly as a low level language. Ok, fine, whatever.

Their bigger point seems to be that C is no longer very mechanically sympathetic to huge modern cores, because the abstraction pretends there’s only one instruction in flight at a time. Is anyone aware of a language that fits the hardware better? Maybe Intel needs to release a “CUDA of CPUs” type language.


It doesn't even include assembly.


> On a modern high-end core, the register rename engine is one of the largest consumers of die area and power.

Another red herring. Register rename isn't the result of some PDP fetishizing. It is a direct result of using more hardware resources than are exposed in the architectural model. Even if it were a stack machine or a dataflow graph architecture, register renaming is what you do when you have more dynamic names for storage than static names in the ISA.


> Consider another core part of the C abstract machine's memory model: flat memory.

The C abstract machine only has a flat memory model within a given malloc allocation (and within each local or static object). Relational pointer comparison between different allocations is UB (see e.g. https://stackoverflow.com/a/34973704).

So C is perfectly fine with a non-flat memory model as long as each object is confined within a flat memory region (by virtue of being allowed to alias it as a char array). You can imagine a C runtime library that provides functions to obtain pointers to different types of memory that don’t share a flat address space.

The only restriction is that pointers must carry enough information to compare unequal if they point to different objects. Of course, you might be able to construct a virtual flat memory model from the bit representation of void* or char*, but that’s not quite the same as imposing an actual flat memory model.
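
To make that concrete, a small sketch of what the abstract machine does and doesn't promise:

    #include <stdio.h>

    int main(void)
    {
        int a[4], b[4];

        /* Equality between pointers into different objects is well defined... */
        printf("%d\n", (void *)&a[0] == (void *)&b[0]);   /* 0: distinct objects compare unequal */

        /* ...but relational comparison across objects is undefined behaviour;
           the abstract machine never promised one flat ordering of all memory. */
        /* printf("%d\n", &a[0] < &b[0]); */               /* UB if uncommented */

        /* Within a single object (or allocation), ordering is well defined. */
        printf("%d\n", &a[0] < &a[3]);                     /* 1 */
        return 0;
    }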


"Low-level" is not a perfectly well-defined technical term, and does mean (slightly) different things to different people.

I feel that the article does explain well enough, how the author defines "low-level" for the sake of this article - and the definition being used seems just as fine as any other. And sticking with this specific definition, the conclusions of the article do seem to check out. (But I'm no expert on the subject matter, so I might be wrong about that).

I feel that the "value" of the article lies in challenging certain conceptions about C.

To me, it doesn't really matter if the article is (completely) right or not - the somewhat indignant response I see happening to the title of the article, and the discussion I see about what "low-level" actually means, seems to prove that some dogmatic beliefs about C are pretty deep-seated.

I feel it's always worthwhile to question such dogmatic beliefs.


> The root cause of the Spectre and Meltdown vulnerabilities was that processor architects were trying to build not just fast processors, but fast processors that expose the same abstract machine as a PDP-11.

No, Spectre is the direct result of processors speculatively executing code without respecting the conditions that guard the code. Hands down, processors hallucinate conditions in code. It has nothing to do with the particular computational model, but would happen in any system that speculates conditions.

And not just one branch, but a whole series of them. In fact, the processor is usually running with a whole buffer full of instructions that are executing in parallel, having been loaded into the reorder engine using nothing more than (normally highly accurate) statistical predictions.
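
For reference, the bounds-check-bypass gadget from the original Spectre paper is roughly this shape (reproduced from memory, so treat it as a sketch): the branch is architecturally respected, but a mispredicting core may run the body speculatively with an out-of-bounds x, leaving a cache-timing footprint.

    #include <stddef.h>
    #include <stdint.h>

    uint8_t array1[16];
    size_t  array1_size = 16;
    uint8_t array2[256 * 512];
    volatile uint8_t sink;               /* keeps the load from being optimised away */

    void victim_function(size_t x)
    {
        if (x < array1_size)                 /* the guard the speculating core runs past */
            sink = array2[array1[x] * 512];  /* secret-dependent cache line gets touched */
    }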


Apparently there is plenty disagreement about what “low level” means. Historically assembler was considered low level, and languages like C with named variables and functions and nested expressions were considered high level. I have also seen C described as mid-level to indicate it is higher level than assembler but “closer to the metal” than say Java. And apparently it is now called low-level by some - wonder what assembler is then?

In any case, at this point, low level and high level are only meaningful relative to other languages.

The article is questioning how “close to the metal” C actually is, but some of the arguments also applies to assembler, which is not that close to the metal either these days.


Yeah-- my intro C Programming class 20+ years ago began with the professor saying "This class is about C, a high-level programming language used for most UNIX system tools because trying to write all of those things in assembly would make you want to burn your keyboard."

It seems like the distinction between C and Assembly these days is less important than the distinction between C and say... Javascript. Which is fine by me- English is descriptive and the people who work in Assembly aren't going to get confused by it.


To make an article about how C maps to the processor and fail to make any distinction between application programming and embedded programming seems strange to me. After all, C is by far the most common language for programs running on micro-controllers, and it actually does map well to many micro-controller architectures in use today.

I'm clearly not the target audience for this article, but I still feel like the author would be well advised to put a little note at the top that says "we're talking about CISC and high-end microprocessors rather than microcontrollers."

I'm also not seeing suggestions for languages that do map well to modern microprocessors.


I've been programming since the mid 80s, started with the C=64. People have been having the argument that C is low-level vs c is not since at least then.

Why do so many smart people waste their friggin' time on such nonsense?


Computation is only a small part of computing, addressed by languages such as OpenCL and by no means simple, observe constant GameReady driver releases from Nvidia to support each new major game. C is still pretty good at many other parts of low level computing, such as managing state of hardware or allocation of system memory to different tasks. Such tasks are not well suited to parallelism, as they must maintain a globally consistent state.

It is perhaps true that CPUs and compilers should execute C code mostly as it is, with only local optimizations to spare programmer of having to decide whether x + x, or x * 2 or x << 1 is faster for example. This would improve system security and reliability while freeing up time to work on great compute languages for vectorizable computations.

But, at the end of the day, CPU makers and compiler writers are humans motivated by both career success and less tangible bragging rights. So OF COURSE they will chase benchmarks at the expense of everything else, even when benchmarks have little to do with real life performance in an average case. I have a 13 year old 17 inch MacBook pro I use for some favorite old games. When I fire it up, I don't see any differences in my computing experience vs a 2023 laptop. So whatever advances in CPU/compiler design were made since do not seem to help with tasks I am actually interested in.


Assembly is not the lowest level language you can work in. I've programmed in raw binary opcodes before, that is the lowest level. (though there is a valid argument that microcode is even lower level - I disagree but still acknowledge the argument is valid) Often a single assembly language instruction can be one of more than 30 different opcodes as registers are often encoded in the opcode. Of course at this level you have to have your CPU instruction manual as they are all different.
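
For anyone who hasn't seen it, here is roughly what "registers encoded in the opcode" looks like on 32-bit x86 (the 6502 does the same thing in spirit, with a separate opcode per register and addressing mode):

    /* "mov r32, imm32" is the opcode family 0xB8 + register number,
       so the destination register lives in the opcode byte itself. */
    unsigned char mov_eax_42[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00 };  /* mov eax, 42 (reg 0) */
    unsigned char mov_ebx_42[] = { 0xBB, 0x2A, 0x00, 0x00, 0x00 };  /* mov ebx, 42 (reg 3) */
    unsigned char ret_insn[]   = { 0xC3 };                          /* ret */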


agreed


Other than assembly, which barely qualifies as a language, what programming language is lower than C?


It does not need to be a relative statement in order to be correct.

The statement "C is not close to the instruction set of a modern CPU" does not need to be validated by specifying examples of languages that are closer.


But if you're going to say that "C is not a low-level language", then yeah you kinda do need other languages beneath it.


Well, firstly, I'm not saying it.

But no. That is what I meant when I said this is not a relative statement.

If the title said "C is not the lowest-level language" then your objection would be valid... but it doesn't and it's not saying that.

But before I go into some lengthy explanation: have you read the article, or are you responding to the title alone?


In general terms, any language aiming to be lower-level than C should

- have an "abstract" machine that is more concrete than C (and by extension less portable)

- be easier to lower into optimal assembly (especially loop ops)

- give you strong and precise compile-time guarantees about memory layout (padding, bitfields), variable sizes, register spilling, stack usage, etc.


Plenty to choose from since 1958's introduction of JOVIAL, when one cares to research what has happened in the world of systems programming outside Bell Labs, and UNIX/C taking over the server room.


Forth/joy maybe?


LLVM IR


Fortran, maybe?


B


Or T3X9, if you prefer Algol-style syntax.


I think there is an argument that Brainfuck [0], et al, is lower than C, given that it eschews variables and functions.

[0] https://esolangs.org/wiki/Brainfuck


Low level means close to the processor, not small in scope.

You could argue brainfuck is machine language for a theoretical infinite tape machine, but such a machine can only exist when implemented in high-level software.


This article begins by victim-blaming the software engineers in full-throated support of the hardware engineers. If, and I do mean if, anyone should be exalted, it is the software engineers, who have been coping with C as a stable-but-difficult programming language specifically for the benefit of the hardware engineers' desire to have a stable target. The fact that the specification is ambiguous at all is so that hardware manufacturers can port a reasonably small, expressive, and powerful language to their hardware. And, no, making a new language that targets the platform for the ease of hardware development and exploitation of system-specific benefits is not the answer. In fact, it's the literal reason why C is still as popular as it is.

Nobody wants to learn your programming language, write thousands-to-millions of dollars worth of software, just to have it become obsolete two days after the new-hotness processor comes out. Been there, done that.

Alternatively, perhaps, we can place the blame on hardware manufacturers who were looking to cut corners for improved performance and produced insecure machines because they lied to us non-expert hardware users about how fast their systems could go and what we were getting for our money.


Yes, C is a set of abstractions like any other language (even assembly), which attempt to mimic a machine of far less complexity.

Unfortunately it's also the wrong set of abstractions for the contemporary era.

That said, if you're working in low-level embedded microcontroller world, C's memory model and program structure does in fact look a lot more like those systems.


What would you say is the right set of abstractions for the contemporary era? Especially for writing things like OSes and device drivers?


I've been working on making games for the Playdate (https://play.date) over the past few weeks, using their C SDK. it's my first time using C in a decade, since I first learned it in college, and I'm having a surprisingly great time with it. sure, there's tons of weird quirks that take some getting used to—but there's a lot that I've been surprised to find that I missed about it! it's fun to write code that does what you tell it to do, without having to worry about object ownership or any higher-level concerns like that—you just manage the memory yourself, so you know where everything is, and write functions that operate on it straightforwardly. if it's been awhile since you've touched C, I highly recommend giving it a try for a small game project.


“The abstract machine C assumes no longer resembles modern architectures” implies that it might be nice to have a language that maps more directly to what is really happening under the hood. I agree. It would be nice to take the guesswork out of, “How should I write this so that the compiled code has fewer cache misses?”

Maybe there is a sweet-spot level of abstraction that allows for more fine-grained control of the modern machine, in the sense that compiled code more or less reflects written code, but not so fine-grained as to be unwieldy or non-portable.

Vectorized code that is native to the language could be done with either map functions or Python / NumPy / PyTorch style slicing, which is fairly intuitive. Multithreaded OTOH I’m not sure there is an easy answer.
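
Something in that sweet spot arguably already exists as a compiler extension. A hedged sketch using GCC/Clang vector types, where element-wise code is written directly in the language and lowered to whatever SIMD the target has:

    typedef float v4f __attribute__((vector_size(16)));  /* four packed floats */

    v4f axpy(float a, v4f x, v4f y)
    {
        v4f av = { a, a, a, a };   /* broadcast the scalar */
        return av * x + y;         /* element-wise across all four lanes */
    }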


Of course it isn't, but what's the alternative?


My favourite article about C in years.

To answer your question off the top of my head, answering different bits of the issue, from the perspective of the era of active programming language R&D not themes on themes on themes as we have now...

Limbo, Occam (Occam-pi, etc.), APL (I/J, Aplus, etc.), Oberon (Oberon 2, Oberon 07, Active Oberon, Zennon)...


I'm not familiar with these languages, but which of them is closer to the actual modern hardware than C, while still being abstract enough to be portable?


In what way did I imply that any of them were in any way closer?

That was not my intention at all.

You asked what alternatives there were. C is a systems implementation language, designed to be compiled to object code that will run on the bare metal.

I offered some examples of alternatives to that role, as I thought you asked. I did say that they explored different aspects of the problem.

As I said to someone else upthread:

It does not need to be a relative statement in order to be correct.

The statement "C is not close to the instruction set of a modern CPU" does not need to be validated by specifying examples of languages that are closer.


I think it is somewhat fair to take "what is the alternative" to be in relation to the headline claim. So, "what is the alternative that is low level?"

I think I agree that it is fair to push back on the very idea of a "low level" language. But that feels somewhat banal. We get it, there are abstractions even at the lower levels nowadays that simply didn't exist back when.

Similarly, if someone claims that Haskell isn't a "high level" language, what does that mean?

And to be fair, we have screwed up terms so much it is embarrassing. I see arguments on whether or not LISP is a functional language quite often. There was an amusing discourse not long ago on whether SQL was declarative. Turns out, taxonomies are tough and strict taxonomies are near useless.


It's not my headline claim, and if one wants to enter into debate about what the article means, then step 1 is reading the article.

As for the rest: well, yes, fair enough, but somewhat tangential to this subthread, I feel.


I meant my first claim as a charitable reading of the question: that it was written in response to the headline. Even with what comes in the article. Which, frankly, doesn't change much?

Indeed, the core claim at the start of the article is "The features that led to these vulnerabilities, along with several others, were added to let C programmers continue to believe they were programming in a low-level language, when this hasn't been the case for decades." But, no they weren't. They were added to allow the CPUs to maintain resource utilization while executing code that they are taking a probabilistic stab at.

There is some odd appeal to GPU programming, ignoring that the main reason GPUs can do what they do is that it is a foregone assumption that you will have to do the same operation across the entire visible scene.

So, back to the question at hand, what is this "lower level language" that is being talked about? Best I can see from this article, it is "c" but with vectors and no aliasing? And many more core instructions? I know of basically no languages that make it clear that sqrt could be a CPU level instruction. And that one is somewhat trivial to name. It wasn't too long ago that we saw discussion of popcnt instruction. Is that a "primitive" part of any non-assembly language?

It is a neat assumption to challenge, that C may be limiting what we can do. But, with how often C and C with inline assembly dominate most any performance category, it is a steep hill to climb to show that that is what is limiting us.

I also find the closing remarks about how "There is a common myth in software development that parallel programming is hard" to be kind of flippantly insulting. Would be like claiming any sport is easy because "look, you can teach grade school kids to play." Especially as I have seen plenty of bugs in actor-model languages to know it is no panacea. I agree it is easy to specify parallel activities. It gets a lot harder as you start adding in all of the deadlines and other work handoffs that are necessary for fast execution. Again, sports make a good example. Hitting a ball is easy. Running cross court to return a fast shot from an opponent is, essentially, the same thing. Far far harder, though.


I don't really know what you're talking about, TBH.

You are bringing GPU programming into it, which I never mentioned at all.

Again: saying "X is not big" is not a relative claim. Saying "X is the biggest" is a relative claim but nobody's saying that.

"Low level" means "close to the metal". The way C is often described is as "a portable assembly language". The article is saying that is not true. That the model of computation, of processor operation, that C represents is a 1970s model of how computers work and it barely fits onto modern machines at all.

I can't name anything closer to the metal or lower level, but it doesn't matter; it is irrelevant to the discussion.


I brought the article into it, as you seemed to deflect the misunderstanding of "low level" to what the article was pushing. And the article goes into several examples and has a "[this abstraction] is conspicuously absent on GPUs ..."

Granted, I can kind of see how the entire point of that rant from the article was that register renaming is some sort of sin of processor design. Problem is, of course, that you lose plenty of other speed tricks on GPU by making that tradeoff. More, my point was that that tradeoff comes from the natural unit of work for GPU, which would be operating over scenes of data. This isn't being opportunistic in looking for ways to go wide on operations, it is literally the reason those units were built. (And I'll ignore that CUDA programming looks a lot like a C program.)

Back to the idea of "closer to the metal," per the article. My further point was how close are we talking about? I know of literally no language that exposes all intrinsic operations of a machine to the end user. Excepting anything that allows inlined assembly? Such that anyone asking "what else is there" is almost certainly asking for those that do.

I'm ultimately open to the idea that there is no "close to the metal" language anymore. Largely for good reasons. To wit, it would be near impossible to code preemptively multitasked programs without something like register renaming. Yes, you could do it in software, but hard to see how that would dodge any of the complaints of the article with regards to the idea.

All of which is to say that there not being an answer to the question that literally started this thread is a bit of the point? I'm sympathetic to the idea that you were answering an easier question. I'm just pressing on the idea that you answered a different question.


While I'm reading about Limbo and Occam, what do you think APL and Oberon can express that C cannot? Talking about low-level electronics benefits (APL array idioms are superb for sure).


I recommend Sophie Wilson's talk on CPU architectures for some interesting insight into this.

https://www.youtube.com/watch?v=6lOnpQgn-9s

It's worth the time, IMHO, and I dislike video presentations. This one is different.

She designed the ARM processor (and BBC BASIC before that).


Bounds checking by default.

Actors, more precisely active objects in Active Oberon, the only one from the Oberon lineage still actively being developed at ETHZ.


but that's a high level feature. When people talk about C not being a low level language they mean you can't control/reflect the hardware enough, right? Or maybe I'm misguided


>> what do you think apl and oberon can express that C cannot ?

> Bounds checking by default.

That's odd - I've written a C container library that checks bounds by default.

Are you sure that C doesn't allow you to check bounds?


Absolutely unless you're using a compiler with language extensions for pointer management.

A library isn't the language that is described by the ISO C standard document.


> A library isn't the language that is described by the ISO C standard document.

Sure, but the poster didn't ask "what comes with apl and oberon that doesn't come with C", they asked "what do you think apl and oberon can express that C cannot?"

And you absolutely, positively can EXPRESS bounds checking in C. I'm not sure where you heard that this is impossible, but it's probable you misunderstood or that source is wrong.
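
For what it's worth, a minimal sketch of what "expressing it" means here (the names are made up); whether a library convention like this counts as the language having bounds checking is, of course, exactly the disagreement in this thread:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        size_t len;
        int   *data;
    } int_array;

    int checked_get(const int_array *a, size_t i)
    {
        if (i >= a->len) {    /* the bounds check, written in plain ISO C */
            fprintf(stderr, "index %zu out of bounds (len %zu)\n", i, a->len);
            abort();
        }
        return a->data[i];
    }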


First of all, your library doesn't come with C, otherwise it would be defined in the PDF I can buy from ISO.

Secondly, using if statements and conditional expressions isn't what bounds checking in a programming language is about.

Here is some education material,

https://en.wikipedia.org/wiki/Bounds_checking

> Many programming languages, such as C, never perform automatic bounds checking to raise speed. However, this leaves many off-by-one errors and buffer overflows uncaught. Many programmers believe these languages sacrifice too much for rapid execution.[1] In his 1980 Turing Award lecture, C. A. R. Hoare described his experience in the design of ALGOL 60, a language that included bounds checking, saying:

Feel free to update the Wikipedia page and convince Wikipedia of your reasoning.


What does any of that have to do with whether or not you can EXPRESS bounds checking in a C program?

You were misinformed; one can certainly express bounds checking in a C program, independent of libraries or compiler extensions.


Please educate us, we are all curious to learn how.

Only the ISO C language is allowed: declare a C array and then show us how you validate the accesses with the index operator.

As a second exercise, show us how a function call using pointer + length validates that the given length is a valid length for the memory region's total size.


> Only the ISO C language is allowed, declare C array and then show us how do you validate the accesses with the index operator.

Who said anything about arrays?

Let me refresh what was said, and what you claimed.

What was said:

> what do you think apl and oberon can express that C cannot ?

What you claimed

> Bounds checking by default.

Are you seriously saying that you did not claim that bounds checking cannot be expressed in C?

Because that is all this boils down to - my reading of that was that you claimed that bounds checking is an example of a thing that "apl and oberon can express that C cannot ? "

> Only the ISO C language is allowed, declare C array and then show us how do you validate the accesses with the index operator.

No one made this claim so there is no point in doing what you asked.

> As second exercise, show us how a function call using pointer + length, validates that the lengh into the pointer region is a valid length for the memory region total size.

No one claimed this either. The specific claim is that it is possible to express bounds checking in C.


Too many words and very few facts.

Care to provide your library for the security folks to have a go at your bounds checking implementation in C?


> Care to provide your library for the security folks to have a go at your bounds checking implementation in C.

Once again, I have to ask - what does that have to do with your claim that C is unable to express bounds checking?


Slightly branching out, I wonder if recent languages like Zig allow (or will allow) customized array language features. They seem to be more flexible about compile-time vs runtime and also allocation mechanisms.


Hmm, good question, but I think my question is located halfway. What you describe is basically Turing completeness: C allows you to write more on top, but it won't be integrated into the base constructs of the language. I admit that this comment too is fuzzy :)

I hope things don't go angry in here


One example given in the article is the Erlang VM, which maps a lot better to modern processors.

We currently have a problem where we can't have thousands of cores because, even today, so much code is designed to be fast on one core.

We really have to move to asynchronous programming because synchronizing async hardware is both complex and inefficient.

RISC V is probably going to help since it allows for a lot of experimentation.


The article does mention a few areas of interest:

- Languages with "better" (=more modern hardware friendly) loop constraints are easier to parallelize (Fortran, Erlang, …)

- CPU architectures with better programmable vectorization (ARM SVE, Risc-V VE) are much easier to work with, if the language primitives allow it (see above)

Porting software over to fortran/erlang on aarch64 is something you can already do today, if you want to. Rust/Zig/etc. and RISC-V could have a good opportunity here to figure out better ergonomics for vectorization and more hardware friendly cache coherency policies, too, but no clue if anyone on the relevant standards committees cares.

In terms of "but what can I easily use as a drop-in replacement?" Yeah, we're kinda stuck with C and languages that inherit its problems (current Rust/Zig/etc. included).


Rust/Zig do not have enough portability; there are errors trying to compile to s390x:

https://github.com/wekan/wekan-node20#trying-to-compile-llvm...

C89 compiles to 30+ CPU/OS:

https://github.com/xet7/darkesthour


GNU assembler or nasm, if you really need to go low level; usually you don't.


isn't assembler a high level language nowadays?


Why would you think so? Assembler is what it has always been, i.e., mnemonics for machine instructions. Unless you are thinking about microcode, which is nothing new and I am not sure it should count as a "level" from the perspective of a programmer anyway.


By the standards of this article assembler on most architectures suffers from many of the same things that make C “not low level”: in particular it offers little control over cache hierarchy or coherency (modulo hint instructions like x86’s PREFETCH), nor instruction level parallelism. Of course, these aspects are entirely due to the fact that the dominant lingua-franca (C) has no ability to support these semantics.

In large part the article argues that in most cases the abstract machines that ISAs describe differ so fundamentally from the reality of how code is executed on the underlying machine to make a truly low level language impossible to achieve.


Your question seems provocative... but that's a very good question. I've always liked assembly programming, and I got very puzzled when I discovered the processor's metal has gone very far from the x86 instruction set I was writing. It felt like the magic was gone.

Indeed, there is no direct match anymore between instructions and gate combinations on the processor die. There is a microcode layer translating x86 instructions into whatever electronics are below. Change this microcode, and you could have your processor speak a different binary code (matched to a different assembly language).


Rewrite the world in Rust /s

The real answer is: none. There are two problems, the first is you have to rewrite the world with the new language and hardware.

The second is, unfortunately, that language enthusiasts who are willing to rewrite the world AND can get the job done want a language that targets a sequential abstract machine (i.e. looks like C).


C++ is serving me well staying away from C as much as possible, since 1993.


When compared to assembler, I’d agree.

I grew up with 6502 and 68k. To me, back in the early 90s, C (Mac MPW C to be precise) was an abstract assembler. The code-gen was perfectly readable.

Compared to the likes of Python, it most certainly is low-level. These types of language allow developers to rapidly get something going and not just because of the libraries.

I’d find it very hard to justify a business position where C has any other role than binding and breaking out into something more abstract. Be that Go or C++, for an example.

An argument I used to hear was “performance” from C. I’m not entirely convinced as in a higher language your algorithm may well be better as you can deal with the abstraction.

But… people make money coding C.


Even before that, this is ultimately about that fact that an ISA for a general-purpose computer can be seen as a way to abstract away parallelism. Even in your favorite assembly language, the effects are largely supposed to happen one after another.

That abstraction is leaky, but the alternative is VLIW machines - even in that case, you probably end up using a compiler so that you don't have to worry about parallelism. Reasoning about parallel things is hard, that's why we spend so much time trying to avoid it ¯\_(ツ)_/¯


If, like me until yesterday, you have never heard of the PDP-11 before (I should probably be banned from HN for this), it is really something worth learning about. There is an awesome project for a PDP-11 front panel replica running an emulator on a Raspberry Pi (the whole thing is called PiDP-11, haha). Here is more information:

https://obsolescence.wixsite.com/obsolescence/pidp-11


Mandatory xkcd: https://xkcd.com/1053/


I am neither a compiler writer nor an OS guru, just an old C programmer. In this article it looks like the entire CPU instruction set was designed to emulate the PDP-11 to ensure C compatibility. So my naive question: what is stopping a microprocessor manufacturer from having two instruction sets, one that is compatible and one that allows us to fully utilise the modern CPU but with a different programming paradigm? Is that too expensive or hard to do? I genuinely don't know.


I think there just isn't any commercial incentive, as we have a ton of legacy C code and the CPU will have to run in "legacy" mode anyway to run your OS, be it Linux, Windows or MacOS.


This article is I guess thought-provoking for people who have never really considered what goes on at the CPU level. But C is not a low-level language, because it hides abstractions like caches from the programmer? So what is a low-level language? CUDA bytecode? CPU microcode? It seems like this article is saying that there is no such thing as a low-level language.


If you accept the premise of the article, you also need to accept that assembly is not a low level language, and that it is impossible to program any CPU currently for sale in a low level language.

The abstraction CPUs give you is more or less a fast pdp11 with some vector registers bolted on.

The implementation internally is not.


This is pretty much my issue with this article. By its criteria assembly isn’t low level, which makes the claim pretty uninteresting. You could argue there are/were processors that had low level ISAs (VLIW, no OoO exec, no memory reordering, no caches, no branch prediction, etc.) but they are all niche (usually low power embedded DSPs) or failed to capture the market (Itanium) because they failed to deliver performance comparable to the highly abstracted CPU supposedly “designed” to run C (also a questionable claim, I think the reality is it has more to do with sequential execution being the way humans think, a point Jim Keller has made in several interviews).


If we ever wonder why more people don’t get into low level programming, this article and the responses are an excellent case study. We’re allowed to make what we know accessible to newcomers and many of us should tone down our arrogance when we have really deep technical conversations.


The paper talks about how C is designed for a PDP architecture and that's the problem. Is there any language that is not that way and can handle parallelism and all the things mentioned in the paper?

Yes, I do see Erlang mentioned but I don't think it was considered a solution.


Interesting take, but I think it goes out of its way to prove the definition of low-level wrong, while missing that the definition it gives, and claims is wrong, is in itself very flexible.

What is irrelevant? To a data scientist, TypeScript is low-level: you're required to think about structure and compile stuff!

To a web developer, C# and Java are low-level because you need to think about the execution platform.

To an IT developer, C and C++ are low level because you need to think about memory.

To a game developer assembly is low level because you need to think about everything.

To electronics engineers, everything is high level. To accountants, VBA in Excel is low level. To a product manager, a Word document with any sort of technical words is too low level.

If you need to optimize your software to the point where some CPU-specific instructions are required, C is too high level because it's hiding stuff that is not irrelevant.


With the same argument you could even argue that the x86 ISA is a high-level language, since under the hood it's decomposed to micro-ops which are scheduled on a superscalar infrastructure and run out of order.


I am really surprised that such a bad take has gotten so much airtime, and almost as surprised that such a gifted developer came up with it.

The only way that the title is true is one that is not mentioned in the article: when C became popular, anything that was not assembly was a "high level language". Heck, even some Macro assemblers were considered high level, IIRC.

The factors that are mentioned in the article fall roughly into two categories:

1. The machine now works differently.

This may be true, but it does so almost entirely invisibly, and the exact same arguments given in the article apply in the same way not just to assembly language, but even to raw machine language.

I have a hard time seeing how machine level is not low level. But I guess opinions can differ. What seems inarguable is that machine language is the lowest level available. And if the lowest available level does not qualify as "low" in your taxonomy, then maybe you need to rethink your taxonomy.

2. C compilers do crazy shit now

This is also true, but it is true exactly because C is a low-level language. As a low-level language, it ties execution semantics to the hardware, resulting in lots of undefined (and implementation-defined) behavior that makes a lot of optimisations that some people really, really want to do (but which are far less useful than they claim) really, really hard.

So C compiler engineers have defined a new language, C', which has semantics that are much more amenable to optimisation. Nowadays they try to infer a C' program from the C source code and then optimize that program. And they manhandle the C standard, which is intentionally somewhat loose, in order to make C'' (a language that looks like C but maps to C') the official C language.
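
To make the C'-vs-C point concrete, here is a small example of my own (not from the article); the exact behaviour depends on the compiler and optimisation flags:

    #include <limits.h>
    #include <stdio.h>

    /* The standard leaves signed overflow undefined, so an optimizer
       reasoning in "C'" may assume x + 1 never wraps and fold this
       whole function to "return 1". */
    int always_greater(int x) {
        return x + 1 > x;
    }

    int main(void) {
        /* A strictly "low level", two's-complement reading of the
           hardware would suggest 0 for INT_MAX; with optimisations on,
           many compilers print 1 here. */
        printf("%d\n", always_greater(INT_MAX));
        return 0;
    }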

Since they were moderately successful, it can now be argued that C has morphed or been turned into a language that is no longer low level. However, the shenanigans that were and continue to be necessary to accomplish this make it pretty obvious that it is not the case that this "is" C.

Because, once again, those shenanigans were only necessary because C is a low-level language that isn't really suited to these kinds of optimisations. Oh, and of course there are the rationale document(s) for the original ANSI C standard, which explicitly state that C should be suitable as a "portable assembly language".

But then again we already established that assembly is no longer a low level language...so whatever.


>take a property described by a multidimensional value

>project it into a single dimension

>split it in the middle, thus inventing two useless artificial categories ("low level", "high level")

>get a bunch of highly functioning hackernews 0.1xers to argue endlessly about said useless categories

>submit weekly articles "thing X is NOT in my imaginary category Y!!!"

>profit

Arguing whether or not C is a low level language is about as useful as arguing whether dog-headed men have souls

Next up: IO is not a Monad, x86 machine code is not a low level language, RISC-V is not actually RISC, GPL is not actually open source and so on


The taxonomy is not the point of the article. The point of the article is about the interaction between language and hardware development, and whether we are locked into a paradigm in which our reliance on C prevents us from taking advantage of hardware innovations and, in turn, from creating languages which properly make use of such hardware.


I see that point, but I still think the article is completely wrong.

My disagreement with the article (aside from the flamebait title) is that many of the things the author calls C problems are actually general computing issues. The reason highly threaded processors are not the norm is not that C can't take advantage of them (it does so just as well as 90% of other languages). The problem is that most workloads, outside of specialized domains, are either highly sequential or require too much synchronization.

Regarding the immutable memory model example - C does not place any limitations at all. Just declare that modifying such an immutable object is undefined behaviour and let programmers figure it out. Memory already has its complexities with NUMA and such; C programmers have no issue taking advantage of these features.
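
C already does something very close to that with const: modifying an object that was defined const is undefined behaviour, which is enough for the compiler to put it in read-only memory or fold loads of it. A tiny sketch (my own example, not from the article):

    const int table[4] = {1, 2, 3, 4};

    void clobber(void) {
        int *p = (int *)table;   /* casting away const is well-formed... */
        p[0] = 42;               /* ...but writing through it is undefined
                                    behaviour: it may trap, appear to "work",
                                    or be optimised away entirely */
    }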

Or maybe take TSX as an example - I'm fairly sure the PDP-11 did not have anything remotely close to Intel TSX, and yet it is easy to use from C. Include <magic.h>, write __magicXYZ() and it just works.
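
For reference, the real "<magic.h>" is roughly <immintrin.h>; a minimal sketch of what it looks like (assuming an RTM-capable CPU and a compiler flag like -mrtm):

    #include <immintrin.h>
    #include <stdatomic.h>

    static _Atomic long counter;   /* atomic so the fallback path is safe */

    void increment(void) {
        unsigned status = _xbegin();            /* try to start a hardware transaction */
        if (status == _XBEGIN_STARTED) {
            long v = atomic_load_explicit(&counter, memory_order_relaxed);
            atomic_store_explicit(&counter, v + 1, memory_order_relaxed);
            _xend();                            /* commit the whole transaction atomically */
        } else {
            /* transaction aborted (or RTM unavailable): plain atomic fallback */
            atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
        }
    }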

Sure, existing C programs will run slowly on the author's imagined new processor architecture, but so will programs written in any language except maybe some highly restrictive, very high level language (like GLSL on GPUs, etc.). But new programs that are written with such hardware in mind will not in any way be limited by C semantics, and if they are (like with mistakes in the standard such as errno for math functions), that will be one compilation switch away from being fixed.


Wasn't it the idea of RISC to have a simpler CPU and push the optimization responsibility towards the programmer and the compiler?


I once read that C is the new assembly, because every CPU has a C compiler.

I then decided to make a language that compiles to C; it's just about adding strings, lists and tuples. I have almost finished the parser, and the "translator" will take more time (I encourage anybody to try lexy as a parser combinator). Basically it will reuse a lot of C semantics and even pass C compiler errors through, so it will save me a lot of work.

Of course I am very scared that I will run into awful problems, but that will be fun anyways.


Everyone knows that CPU stands for "C" processing Unit...

:-)


I just write English these days and have my LLM compile it to Python, so...


On the flip side, maybe CPUs are trying to be too general purpose.


(2018)


PDP-11 is a fast machine?


2017


(2018)


I don't think it was ever claimed that C was a low level language. In fact, I have always heard it cited as the canonical example of a high level language. I will admit that in this day and age C feels like a low level language.

Lower level is something that maps more directly to machine operation (assembly, maybe Forth).

Higher level is something that has its own semantics of operation and needs to be converted into machine operations; the more conversion, the higher the level.


It is somewhat common to describe C as “low level” in introductory programming or CS classes (before the student would know what an abstract machine is). Lots of people carry misunderstandings from that early simplification forwards in their careers, especially if their only interaction with C is academic and not professional.


When C was declared to be "high level", there did not exist a level higher than C. Now that there are more levels above C, it is not the highest level language out there, which makes it low level compared to them. It is not the lowest level, but it's not a misunderstanding to call it low level. The playing field is not the same as it was 50 years ago, so relative terms like "low" and "high" have naturally changed referents.


Huh? Tons of languages at levels "higher than C" existed at the time C was created, and they were popular too.

LISP (1960), Smalltalk (1972), BASIC (1963), FORTRAN (1957), COBOL (1959) and countless others. Heck, ALGOL (1958, 1968) was much higher level than C too.


Yes it did, outside Bell Labs.


It's also that the definition of high vs low level has shifted in the past decade.

Nowadays a "high level language" is one where the person using it doesn't necessarily have to think about memory usage and allocation, since that's the task of a garbage collector - you accept a small amount of inefficiency in order to get a program that works "good enough" in 99.9% of all cases (since we're not on ancient devices anymore and most programmers don't write code that upsets the garbage collector in novel ways). By this criteria, Java, C#, Python, JavaScript, Ruby and so on are "high level languages" in that the programmer rarely has to think about this sort of thing; the underlying GC takes care of memory concerns. There's a reason you see these languages used more for end-user tools like webdev, scripting and desktop applications - the penalty is considered worth it (since it often ends up only shaving off milliseconds at most).

By contrast, a low level language basically makes the programmer an active participant in memory management, with all the footguns that come with it. C and Rust are two extremes of this - C just lets you do whatever, any form of memory control is up to you, segfaults included. Meanwhile Rust deliberately prevents you from doing anything that could possibly cause segfaults through its borrow checker. In some ways C can give you a lot more freedom to be efficient in how you allocate/deallocate your memory (or, in the case of Rust, write code that is always memory safe), but you do trade things for it (in C you basically have to be really meticulous about free()-ing memory, while in Rust you have to eat a lot of complexity upfront to not upset the borrow checker).

Also, in contrast to high level languages, the modern domain of lower level languages tends to be things like drivers, kernels, RDBMSes and the like, rather than conventional user-facing applications (which they were also used for in the past, since most of the previously mentioned languages are either pretty young or took quite some time to mature). Still useful, just a different set of expectations, since those are the components that have to be fast so the rest doesn't have to be as hyperefficient.


> in C you basically have to be really meticulous about free()-ing memory

Only if you malloc()/free() for every allocation/deallocation. If you use any other allocation strategy, this is never an issue.

for example: see the "Rewriting the memory management" section in this article: https://phoboslab.org/log/2023/08/rewriting-wipeout

> I'm not sure what the original PSX version did, but the PC version had a lot of malloc() and little fewer free() calls scattered around. Now I can assure you that the game doesn't leak any memory, because it never calls malloc().

> Instead, there's a fixed size statically allocated uint8_t hunk[MEM_HUNK_BYTES]; of 4mb that is used from both sides:

> A bump allocator takes bytes from the front of the hunk. This is used for everything that persists for many frames. When the game starts, it loads a bunch of assets that are needed everywhere (UI graphics, ship models and textures etc.) into this bump allocater and then remembers the high water mark of it. When you load a race track, it loads all assets needed on top. After finishing a race, the bump allocator is reset to the previous high water mark.

> On the other side, a temp allocator takes bytes from the end of the hunk. Temporary allocated objects need to be explicitly released again. This is used when loading a file into memory. The file is read at once and unpacked onto the bump allocated side. When done, the temp memory for the file is released again.

> Temporary objects are not allowed to persist over multiple frame. So each frame ends with a check to ensure that the temp allocator is empty.

> Somewhat related, the OpenGL renderer does the same with the textures: It bumps up texture memory (more precisely space in the texture atlas) and resets it to the previous level when a race ends.

If you use a system like this (either malloc() just once, or a few times, at the start of your program and then never manually free(), or just use statically allocated arrays), then you never have to worry about "meticulous free()ing". I'm not sure why this never seems to be taught in early CS courses that teach C; it seems that basically everyone comes away thinking malloc()/free() OCD is the only way to manage memory with C, and is thus undesirable compared to the ease of use of garbage collection.
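
For anyone who hasn't seen the technique, here is a minimal sketch of that two-ended hunk scheme (my own simplification, not the actual Wipeout code; only hunk/MEM_HUNK_BYTES are borrowed from the quote, the function names are made up, and alignment handling is omitted):

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MEM_HUNK_BYTES (4 * 1024 * 1024)

    static uint8_t hunk[MEM_HUNK_BYTES];
    static size_t bump_pos = 0;                /* grows up from the front */
    static size_t temp_pos = MEM_HUNK_BYTES;   /* grows down from the back */

    /* Long-lived allocations; "freed" in bulk by resetting to a saved mark. */
    void *bump_alloc(size_t size) {
        assert(size <= temp_pos - bump_pos);   /* front and back must not meet */
        void *p = &hunk[bump_pos];
        bump_pos += size;
        return p;
    }

    size_t bump_mark(void)    { return bump_pos; }  /* e.g. after loading base assets */
    void bump_reset(size_t m) { bump_pos = m; }     /* e.g. after finishing a race */

    /* Short-lived allocations from the back; must all be released each frame. */
    void *temp_alloc(size_t size) {
        assert(size <= temp_pos - bump_pos);
        temp_pos -= size;
        return &hunk[temp_pos];
    }

    void temp_reset_all(void) { temp_pos = MEM_HUNK_BYTES; }
    int  temp_is_empty(void)  { return temp_pos == MEM_HUNK_BYTES; }  /* end-of-frame check */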


> I don't think it was ever claimed that C was a low level language.

When I was introduced to C during high school, my teacher presented C as a low-level language compared to what we previously studied (which was Ruby).

And I just ate that up because C looked less readable than Ruby. Today (10 years later) I have to disagree with my teacher: C is not a low-level language. It has access to the lower level parts, sure, but it is a high level language!


> I don't think it was ever claimed that C was a low level language.

It was introduced to me as "glorified PDP11 Assembly Language". So the claim has been made at least once.

Granted, there are people here commenting that maybe assembly language is not "low-level". I'm lost for words.


I'm curious - what is it about Forth that makes you consider it to map more closely to the machine?

I've done a handful of Forth projects as part of a code dojo years ago. I wouldn't have considered it low-level.


Forth is strange: it requires a virtual Forth machine to run (I have heard of hardware that runs Forth directly, but it is exotic), which should automatically exclude it from the low level camp. However, this machine ends up almost trivial to write and is very simple, so once you start writing Forth it feels low level, like there is very little between you and the CPU. As a consequence, just like assembly, Forth people tend to reinvent the world.

Note that I am not far down the Forth rabbit hole at all; any interest I may show is incidental, a side effect of my interest in PostScript, which is very much a high level language.
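
To give an idea of how small "almost trivial to write" really is, here is a toy stack-machine inner loop (nowhere near a real Forth: no dictionary, no return stack, no compiler, just the core idea):

    #include <stdio.h>

    /* A toy stack machine in the spirit of Forth's inner interpreter:
       a data stack plus a handful of primitive "words". */
    enum op { PUSH, ADD, MUL, DUP, PRINT, HALT };

    struct instr { enum op op; long arg; };

    void run(const struct instr *p) {
        long stack[64];
        int sp = 0;                                       /* next free slot */
        for (;; p++) {
            switch (p->op) {
            case PUSH:  stack[sp++] = p->arg;             break;
            case ADD:   sp--; stack[sp-1] += stack[sp];   break;
            case MUL:   sp--; stack[sp-1] *= stack[sp];   break;
            case DUP:   stack[sp] = stack[sp-1]; sp++;    break;
            case PRINT: printf("%ld\n", stack[--sp]);     break;
            case HALT:  return;
            }
        }
    }

    int main(void) {
        /* "3 4 + dup * ." in Forth-ish terms: prints 49 */
        struct instr prog[] = {
            {PUSH, 3}, {PUSH, 4}, {ADD, 0}, {DUP, 0}, {MUL, 0}, {PRINT, 0}, {HALT, 0}
        };
        run(prog);
        return 0;
    }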


Forth is easy to make custom hardware for, even if it's a poor fit for the commonly available hardware architectures. RPN plus a stack lends itself to a very simple implementation (no registers needed, easy layout, etc.).


It's Fortran, not Forth.


They're talking about Forth, not Fortran.


Someone should make a Forthran (I wonder what it would look like though).



