C Is Not a Low-level Language (2018) (acm.org)
293 points by bmer 11 months ago | 396 comments



C is low level for at least one reason: manual memory management. Especially with modern hardware, memory management is at the center of programming. For example, Rust prides itself on being memory safe without a garbage collector; memory management is more or less the entire reason for Rust to exist. Why is C fast? Memory. Why is C unsafe? Mostly memory. One of the big reasons parallel computing is hard? Concurrent memory access. Functional programming is often surrounded by plenty of mathematical concepts, but a good part of it is pretending that objects are immutable while, behind the scenes, the compiler works with mutable memory.

In C, every call to the allocator is explicit (that is, if you are using an allocator at all). Compare that to old-school C++, with new/delete and raw pointers, where you may call the allocator explicitly, but a lot still happens in destructors, automatically. Modern C++, with smart pointers, is essentially like a garbage-collected language in the sense that allocation and deallocation all happen automatically.


> C is low level for at least one reason: manual memory management. Especially with modern hardware, memory management is at the center of programming.

Ok, but even with C we can't actually manage memory at the low level the processor does. You can't tell the processor what to keep or not keep at each level of cache, what to send to virtual memory, etc. It's lower level than, say, Python, but I don't think it is low-level memory management in the way PDP-11 C was.


A lot of that is just a property of modern OSs, with good reason, intentionally not exposing these features to userspace processes. It's not really a function of the language itself.


Hmm, true for virtual memory, didn't think of that, but CPU caches are inside the processor, can even the kernel control it at all?


It's not really a question of whether or not it's feasible, but rather: if the hardware supports such operations, would they be expressible within the language? If they are, then we should point at the accessibility of the functionality rather than at C.

In this case, I could definitely see `set_cache_behavior(ptr, len, options)` being perfectly reasonable, so I'd argue that, again, the fact that we can't do it is more a property of the environment than the language itself.
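For concreteness, a purely hypothetical sketch of what such an interface could look like in C; nothing like this exists in the standard or in POSIX, the names just illustrate the idea above.

    /* Purely hypothetical sketch; no such API exists in standard C or POSIX. */
    #include <stddef.h>

    enum cache_opt { CACHE_KEEP_L1, CACHE_KEEP_L2, CACHE_BYPASS };

    /* Hypothetical: ask the platform to apply a caching policy to [ptr, ptr + len). */
    int set_cache_behavior(const void *ptr, size_t len, enum cache_opt options);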


The reference point when we talk about low-level languages is not the transistor but rather machine code. In this respect we say that C is low level because one pointer dereference in C translates directly into a memory load in machine code.

The fact that modern memory load operation involves cache, protection, memory mapping, etc.. is not a property of language, but rather of the environment (CPU + OS).
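For illustration, a minimal example of that direct translation (the commented assembly is typical x86-64 output at -O2, shown only as an example, not a guarantee):

    /* A pointer dereference corresponds to (roughly) a single machine load. */
    int load(const int *p)
    {
        return *p;   /* e.g. "mov eax, DWORD PTR [rdi]" followed by "ret" */
    }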


That's fair enough, but then it doesn't seem to be the most useful definition it could be. Low level, to me, could mean that we control all the details. But modern processors have microcode between the transistors and the x86 machine code, and we can't control all that memory stuff.

But those aren't abstractions that we can treat as black boxes: we need to know them and write code that takes them into account, without actually having control inside the black box.


You can control what goes into cache if you want to. See the effort to make an open-source BIOS do this in order to have working memory before the DRAM controllers are initialized. [0]

0. https://www.coreboot.org/data/yhlu/cache_as_ram_lb_09142006....


> Ok, but even with C we can't actually manage memory at the low level the processor does. You can't tell the processor what to keep or not keep at each level of cache, what to send to virtual memory, etc.

Neither can assembly, so it's a useless distinction.


There are CPU instructions to pull memory into cache, send cache back to main memory, and mark things in cache as not worth writing out to memory. All are hard to use from C; the last is basically impossible.


> pull memory into cache, send cache back to main memory

I haven't had much of an issue with intrinsics.
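For example, a minimal sketch of the kind of cache-control intrinsics being discussed (x86-specific, via <immintrin.h>; these are hints to the hardware, not guarantees):

    #include <immintrin.h>

    void cache_hints(int *p, int value)
    {
        _mm_prefetch((const char *)p, _MM_HINT_T0); /* pull the line toward L1 */
        _mm_stream_si32(p, value);                  /* non-temporal store: bypass the cache */
        _mm_clflush(p);                             /* flush the line back to memory */
    }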

> Mark things in cache as not worth writing out to memory

Can you give an example of this in a surviving architecture?



That isn't really marking cache items as not being worth writing out to memory. It's a feature for communicating with non-coherent devices, to tell the cache that any lines it might have speculatively loaded might have changed out from under it.

"Not being worth" reads like cache perf management, as opposed to uses of DC CIVAC, which are strict correctness issues.


Intrinsics are not part of C though. C is the abstract machine defined in various specs and its syntax and semantics.


C is as much its implementations as it is its spec. And the spec came much later than the implementations, and really was designed as a common subset.


Any language can have intrinsics; C is not a special snowflake.

So if intrinsics count, then any language goes.


Any of those instructions could be wrapped in a C function or intrinsic if they were valuable enough


Wrapping other languages does not count when describing how high- or low-level a language is. You can embed machine code in almost any language, even in spoken languages.

It would be like 'English is not easier than Chinese, since I can quote a Chinese sentence in English.'


There's a bit of a difference in that C has ways to closely integrate assembly as needed. Most languages don't.


They aren't part of C, they are compiler specific extensions.

Any language can have compiler specific extensions.

Playing a fair game, all non-toy languages can load modules written in Assembly.

In fact that was the only way in K&R C, compiler extensions came later.


As the article notes, that's because CPUs are designed to run existing C code fast. You could create an instruction set that provided this control, but it might be a tough sell in a world full of C code.


"Memory" itself is an abstraction around a much more complicated model (virtual memory / pages) that most programmers remain ignorant of. (Unless you're working on a microcontroller class system, or other system without an MMU but that's a whole other kettle of fish(.

Even Rust developers like myself labour within the fantasy that a pointer is, y'know, like an address to memory, a real "physical" thing. Rust (and to some extent C++) introduces some management abstractions in front of this in the form of references and borrowing, but the main concept is still there.

In reality the kernel of your operating system has put a giant layer between you and the physical memory, and the "address" and "pointer" are really just handles behind which the OS and MMU do all sorts of shenanigans.

"Raw pointers" really aren't raw. They're handles to offsets within pages, which can be all over the place. It would be entirely possible to walk away from the libc & C model entirely and work in a world of pure references interacting directly with VM subsystem pages as some kind of "object handles" and be much closer to the actual operation of the underlying system.


> Unless you're working on a microcontroller class system, or other system without an MMU but that's a whole other kettle of fish

So C can do actual memory management; your OS or hardware just won't let you. I've done programming for audio effects gear where memory is directly accessible by real address, often with different memory chips with different performance characteristics (for cost reasons) corresponding to different pointer value ranges. Just because your machine won't let you do it doesn't mean C isn't capable of it.
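A minimal embedded-style sketch of what that looks like (the addresses and names are made up for illustration; on an MMU-less part a pointer value really is the physical address of a specific chip):

    #include <stdint.h>

    #define FAST_SRAM ((volatile uint8_t *)0x20000000u)  /* hypothetical on-chip SRAM */
    #define SLOW_DRAM ((volatile uint8_t *)0x80000000u)  /* hypothetical external DRAM */

    void copy_to_fast(uint32_t n)
    {
        for (uint32_t i = 0; i < n; i++)
            FAST_SRAM[i] = SLOW_DRAM[i];   /* move hot data into the faster chip */
    }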


Raw pointers are how you communicate with your CPU. They are "raw" in the sense that they're just an integer number (not really on the C language level, but they have an integer representation on any actual target like x86) and that you have to synchronize these pointers with the lifetimes of "actual" objects, which are only an abstract concept that your computer doesn't understand.

Meanwhile, virtual memory is as close as you can get down to the physical hardware in terms of normal CPU instructions (i.e., not VM management code). VM as a concept is orthogonal to raw pointers, which can be either virtual or physical.

Raw pointers are nothing like handles. They need to be manually "synchronized" properly with VM management (which happens completely behind the scenes for 99.99% of userspace code) to make sense, but it's not like there is bookkeeping overhead in copying or offsetting a pointer, like there would be for a "handle".

The point of a handle is that it's used to hold objects, to keep them alive. Raw pointers don't do that.


Would such a model be generally useful, though?


I have thought so for a long time. It could open up execution of functional languages on a truly distributed runtime. Something like the fabled Tao operating system I guess.


Definitely useful in some systems context, especially e.g. database page buffer management.


> It would be entirely possible to walk away from the libc & C model entirely and work in a world of pure references interacting directly with VM subsystem pages

Is this possible in Ring 3? Or would everyone be running in kernel mode at that point?

Even if you do away with that layer, then there may still be a hypervisor lying to the kernel about memory.


C's memory management is its own abstraction. malloc and free are library functions. They're an abstraction not just over the hardware (which doesn't have anything byte-wise allocated like that); they even abstract away the way operating systems allocate memory.

You don't get direct access to the stack in C either. Stack frames are abstracted away, and you only get setjmp/longjmp.

If you pay attention to Undefined Behavior and strict aliasing, you don't even get that much access to poking around memory.
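To illustrate the first point, a minimal sketch (assuming a Linux/BSD-style mmap) of the layer malloc itself sits on: even "manual" allocation in C is an abstraction over OS primitives.

    #include <stddef.h>
    #include <sys/mman.h>

    static void *os_alloc(size_t n)
    {
        /* Ask the kernel for fresh, page-granular memory. A real allocator
         * would pool, split and reuse such regions to hand out byte-sized chunks. */
        void *p = mmap(NULL, n, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }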


BASIC can also do manual memory management. Not only that, it had a whole computer generation to itself, on computers not able to host a full ISO C implementation.


So does Python (via ctypes) and pretty much every language we consider "high level". But in BASIC your default approach is "DIM names$(count)", which "magically" manages your memory for you, which is why we consider it higher level than C.


That is hardly different from malloc(count * size); REDIM exists (aka realloc()), and many BASICs offer the free variant as well.

In fact, there is hardly any difference between VMS BASIC and VMS C in terms of what is possible, if we want to take the discussion outside of 8 bit versions.


> Why is C unsafe? Mostly memory.

C can't even do all of integral arithmetic safely. It's a language that goes really out of its way to add unsafety.


I'm pretty sure that this is one of the unsafeties that rust borrows from c, even as it attempts to eliminate all the others. Checking every addition adds a massive slowdown, without giving much useful protection against vulns or corruption.


> I'm pretty sure that this is one of the unsafeties that rust borrows from c

But integer arithmetic is safe in terms of Rust.

> Checking every addition adds a massive slowdown

It only does so for debug mode. In release mode, it uses modular arithmetic.


> It only does so for debug mode. In release mode, it uses modular arithmetic.

So it's still treated as an error, just one that has a predictable fallback. I'm really not sure how that's much different from `-fsanitize=undefined`. Broken code is broken, even if it breaks in a predictable manner.

Now if the modular arithmetic had been enshrined as the expected behavior without being treated like an error to be caught, it'd be another matter.


An error does not rewrite your entire code on the assumption that it cannot happen.

Signed overflow is not an error in C.


I meant error "in the code".

A chunk of C that causes a signed overflow has an error in it. Seemingly, so does Rust code according to the behavior described in the post I was replying to.

My point is that I question how big the value gain is from having a predictable fallback when we are already within the realm of "this code is considered wrong". This isn't unlike the various arguments against the value of compiler warnings.

That being said, I agree that it's preferable in general, but the difference seems rather marginal to me. That is, within the context of what I'm replying to. I wouldn't be surprised if Rust had a few additional tricks up its sleeve to address this.


You really mean the usual C reasoning of "if this program has an error, what difference does it make if it returns the wrong value or formats the main disk" (with an implicit "I see none" added at the end)?

Because a caught static error, a runtime error, a wrong value, and C's UB are completely different beasts.


I think modular behavior at run-time is actively dangerous. It is not memory-unsafe, but still unsafe. Having it trap would be better. For C, you can tell the compiler to trap on signed overflow.


IMO, that's a job for the type system. But if you can only have one option, clearly an error is the best one.

Anyway, none of those are anywhere near as damaging as C's UB. All of them are reasonable, in the literal sense that you can reason about them, anticipate what your program may do, and defend against the problem (or shrug it off and claim "it doesn't matter here"). You can do neither with by-the-spec C.


I do not think C's UB is damaging. As I said, you instruct the compiler to insert a trap and then it is not unsafe.

Example: https://godbolt.org/z/Kvrrx19Pa

The UB in the spec is exactly what makes safe use possible without enforcing it everywhere, which is not feasible for C.


That should just be a matter of a compiler knob, no? Such as -fsanitize=undefined (which is the sledge hammer, but there could be more fine grained ones).


It's not. There is no sanitizer on embedded platforms and it turns out, I only use C on embedded platforms, which for me means the UB sanitizer doesn't exist.


Also, that flag enables only unreliable detection of the issue. It won't catch it 100% of the time. :D It also interferes with other flags and checkers like Valgrind and adds bloat to the executable as a bonus.


> But integer arithmetic is safe in terms of Rust.

To expand on this: integer overflow is not UB, it is unspecified. It can result in clamping, wrapping or a panic, depending on configuration at compile time.

> In release mode, it uses modular arithmetic.

And I believe that to have been a mistake. Android enables overflow checks by default and there is no measurable performance impact.


Lol, when C does it, it's "unsafe". When Rust does it, it's "modular"!


For anyone wondering about the term "modular":

> In mathematics, modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" when reaching a certain value, called the modulus. The modern approach to modular arithmetic was developed by Carl Friedrich Gauss in his book Disquisitiones Arithmeticae, published in 1801.

https://en.wikipedia.org/wiki/Modular_arithmetic


Modular arithmetic (and also two's complement) is a natural feature of most CPUs. It comes naturally from the way logic counters work in hardware. C enforces that by not touching that logic. Rust is the same in that respect. Python, on the other hand, treats integers as objects with a virtually unlimited number of digits. That said, float/double precision and logic is still CPU dependent and is used as-is in most languages.


It's astonishing the number of people defending the C "ideals" who demonstrate ignorance about what C actually does. (Is it artificial, in order to willingly miss the point?)

Only some of the integral types in C are modular. If they all were, it wouldn't be a problem.
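A small illustration of that distinction:

    /* Unsigned arithmetic is defined to wrap modulo 2^N; signed overflow is UB. */
    unsigned int uadd(unsigned int x) { return x + 1u; } /* UINT_MAX + 1u == 0, well defined */
    int          sadd(int x)          { return x + 1;  } /* INT_MAX + 1 is undefined behaviour */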


No, what's astonishing is that now that every CPU worth using has 2's complement for signed integers, the compiler writers are still embracing undefined behavior in the name of piddly optimizations.

I was tempted to specify "unsigned" to ward off the obnoxious pedants, and I see that I should have. C really should be a portable assembly language by now. A very small non-breaking change to the standard, and C's arithmetic would be the same as Rust's.


> No, what's astonishing is that now that every CPU worth using has 2's complement for signed integers, the compiler writers are still embracing undefined behavior in the name of piddly optimizations.

Yeah, that's also astonishing.

Anyway, I've stopped blaming the C developers by now. I just assume they have the goal of killing the language and moving people into a more ergonomic alternative. I don't know their true intentions, but this has been a very predictive assumption.

(I guess any definition of any UB would be non-breaking, so yeah, they could fix all of the language.)


> I just assume they have the goal of killing the language

Sadly, that's my conclusion too. I really wish there was a good "portable assembly language", and maybe that'll be something that targets WASM.


then use -fwrapv if that's what you want...


C and Rust do very different things in this case. C defines overflow of signed integers to be Undefined Behaviour. Whereas in Rust it either wraps (release mode) or panics (debug mode).


Any specific C compiler could do the same in complete agreement with the C standard.

There isn't a guarantee that any given standards-conforming compiler will, but it seems that with Rust there isn't a guarantee what behaviour you get either (it depends on the compile settings). In either language, you can't write code that does signed overflow in a meaningful way (at least not if you use Debug).


I agree with your point, but note that Rust does have an ugly way to do it:

https://doc.rust-lang.org/std/primitive.i64.html#method.wrap...

I'd rather just put `-fwrapv` on the command line than clutter my code with crap like that though.


> I'd rather just put `-fwrapv` on the command line than clutter my code with crap like that though.

The advantage of Rust's way is that it lets you customise addition on a per-operation basis. So you can mix and match wrapping addition with saturating addition, etc.


Yes, I can have functions/methods that do math differently from the default. I could just as easily say that's C's way and mix and match those:

    int64_t x = add_i64_with_abort_on_debug_and_wrap_on_release(y, z);
    uint8_t t = add_u8_with_saturation(u, v);
I'd prefer the default for the math operators be what all the CPUs currently do, and neither C or Rust promises that.


Yes, so fix the C standard... But the compiler guys won't let anyone fix it, because optimizations.

So either A) the optimizations using UB are important, and therefore C is/will-always-be faster than Rust which doesn't have them. Or B) the optimizations using UB are not important and the compiler writers for gcc and clang are wrong.

You pick.

And of course some insane people advocate for adding undefined behavior to Rust in the name of optimizations. Gross.


> But integer arithmetic is safe in terms of Rust.

It's "defined safety": If a >= 0 and b >= 0 then a + b > = 0. True according to most schoolchildren but not true according to the Rust spec. It breaks the principle of least astonishment and has and will lead to security vulnerabilities.


For C, I tell my compiler to make it trap. Then it is also safe.


But 53 years later, it has added the `<stdckdint.h>` header, offering `ckd_add()` and friends. :D Better late than never!
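A minimal sketch of how the C23 checked-arithmetic macros are used (requires a C23-capable toolchain):

    #include <stdckdint.h>
    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int sum;
        if (ckd_add(&sum, INT_MAX, 1))   /* returns true if the result overflowed */
            puts("overflow detected");
        else
            printf("sum = %d\n", sum);
        return 0;
    }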


Integers are an abstraction on top of words; words are perfectly safe.


[flagged]


Post might have signed integer overflow => undefined behaviour => delete all the code in mind. Which can be avoided if you remember that hazard is there.

What's integer divide by zero in C? Would you consider that safe?


> Which can be avoided if you remember that hazard is there.

This is true, but it could be argued that it's harder to avoid signed integer overflow in C/C++ than it is to avoid buffer overflows: you often don't know what types you're working with, due to the usual arithmetic conversions and integer promotions, and checking whether an overflow will occur without causing an overflow is difficult in itself. It's kind of uniquely awful in this respect, basically every other popular language has a more practical integer programming model.


I like integer overflow being UB. Makes it easier to check for by enabling a sanitizer. With it being defined as wrapping, it would be illegal for overflow to cause a runtime trap. Of course, rust mandates the trap on debug builds, which is a fine approach too.


> What's integer divide by zero in C?

It's explicitly left undefined.

> Would you consider that safe?

Yes, because, as all C newbies can easily explain to you, the general rule of thumb is that undefined behavior should be treated as a fault and thus should be handled as a bug.

Hence, your question reads as "Would you consider a bug to be safe?".

In case of integer division, you simply need to check that the divisor is not zero prior to executing the division. Done.
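A minimal sketch of that "check before dividing" discipline (the helper name is made up; INT_MIN / -1 is also rejected, since that quotient is unrepresentable and likewise undefined):

    #include <limits.h>
    #include <stdbool.h>

    static bool checked_div(int num, int den, int *out)
    {
        if (den == 0 || (num == INT_MIN && den == -1))
            return false;          /* caller decides how to handle the error */
        *out = num / den;
        return true;
    }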


People new to C hear about UB, resolve to not do that, and that seems fine.

People not new to C have noticed that so many constructs are UB as to make it infeasible in practice that any given codebase will be free of UB.

What makes C a fundamentally unsafe language is that conceptually minor errors have unspecified consequences of unbounded magnitude, that there is limited to no compile-time detection of said errors, and that even builtins like + are specified to compile successfully into nonsense in some contexts.

Integer operations can be defined to be safe. Whatever integers you pass to your maths operation, you get an unsurprising integer back.

What the language does in the presence of your bug is definitely part of the safety properties of the language.


> What makes C a fundamentally unsafe language is that conceptually minor errors have conceptually unspecified consequences of unbounded magnitude.

There, fixed that for you.

Everybody knows that these scary stories can happen (even though almost nobody has seen them happen in the wild). But for the most part they should be seen as a combination of (typically, obviously) buggy code and compiler optimizer defects, rather than fundamental defects of the language.

> Integer operations can be defined to be safe. Whatever integers you pass to your maths operation, you get an unsurprising integer back.

There is at least 1 arithmetics teacher disagreeing with you.

There is no point, and let me scream again NO FR**G POINT, for a zero divide to return zero. I want it to crash.

And I consider it a compiler defect if the compiler proves that a zero division is happening and proceeds to do a strange optimization instead of reporting it.

It's a fine line to walk though, since there is also the case of legitimately assuming that it doesn't happen, and not emitting the code that triggers the crash. There probably should be compiler knobs to tune the behaviour.


[flagged]


Decent chance integer divide by zero will kill your process. Might even call it a floating point exception. Maybe that qualifies as safe to you, seems not-safe to me.

Compelling alternative would be for it to return zero. Which is safe. But C doesn't do that. So you have to remember to never write '/', and instead call my_divide, which has a branch in it.


> Compelling alternative would be for it to return zero. Which is safe.

Is it? Or is it just another opportunity for a bug to hide?


>Compelling alternative would be for it to return zero.

That's actually way less safe than crashing. If your code doesn't handle the case where the denominator is zero, it is likely that the logic around your division doesn't consider it either. The behavior you suggest would take a rapidly increasing number and instantaneously set it to 0, then silently pass it into the logic that was humming along up to that point.

Since there's no symbol for NaN in integers, there is no safe way to represent `x / 0`, and thus the best way to handle it is to fault. Even better would be if the compiler caught it and warned you.


> Even better would be if the compiler caught it and warned you

Which is probably what the big compilers do (seems reasonable to expect it) -- I can't really know though because the last time I've written code that divides by 0 in an easily provable way is probably a long time ago.


Accurate comment of the day.

C intentionally leaves behavior undefined to allow flexibility in implementation, which allows implementations to take advantage of optimizations that other languages can't because they are over-constrained for the target architecture.

What's the order of evaluation for a sequence of expressions added together? Do they evaluate left to right or right to left? The answer is, very intentionally, "any order is allowed, and the order may change between executions of a line." This makes it possible to make C implementations equivalently fast on various architectures where, for example, stacks and stack operation ordering make it more efficient to do one or the other. Hell, the language is even usable on an architecture where each of those expressions could be run purely in parallel. Of course, the downside is that you definitely can't assume side effects caused by the left side of an expression occurred before the right side.
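A classic illustration of that unspecified evaluation order (my own example, not from the comment above):

    #include <stdio.h>

    static int f(void) { puts("f"); return 1; }
    static int g(void) { puts("g"); return 2; }

    int main(void)
    {
        int x = f() + g();   /* may print "f" then "g", or "g" then "f" */
        printf("%d\n", x);
        return 0;
    }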

But the trade-off is that you do have to be extremely sensitive to making assumptions that the specification doesn't actually make, or you are going to trip over an error in your assumptions that matters on your implementation. The actual "this is undefined behavior so the program deleted your whole hard drive" scenario is incredibly unlikely on a modern desktop or server architecture, but the language's flexibility does mean that you're perpetually one missed trick away from incredibly perplexing behavior.

This is a feature of the language not a bug. It's how the language gets you as close as it possibly can to the speed you would be able to get hand writing the assembly at every step of the process.


It's not a bug only in the sense that it's intentional. It's certainly not "good" or a feature any other language should ever copy again because it prevents you from saying anything formally about the properties of the system at time T, when UB has occurred or will occur at any point along the execution path. As a result, you're always stuck with the qualification that x is true in the absence of UB, something that no automated checker can ever verify and careful review is insufficient to validate in the real world.

Whether or not this is a common issue on commodity hardware is basically irrelevant given C's remaining niche. It's mainly the language of kernels, drivers, and firmware nowadays, not primitive CLI tools that can crash whenever they feel like it. Ensuring system state with high reliability is the raison d'etre of modern C, so the lack of ways to do that without significant caveats is incredibly problematic.


That's the tricky part. As the amount of parallelism required to eke out more performance approaches infinity, the amount we can say about the properties of the system at time T approaches zero. Determinism is actually the enemy here.

> It's mainly the language of kernels, drivers, and firmware nowadays, not primitive CLI tools that can crash whenever they feel like it

On this we completely agree; I would encourage pushing implementation of command-line tools in C as close to zero as possible. It isn't necessary to get maximum performance (either because those tools don't demand it or because we have other languages that make better safety-performance tradeoffs to get us, on average, as fast as C without sacrificing things like memory-access safety).


If we can't say anything about the system at time T, that's a failure of the execution model. I'm not trying to implicate Rice's theorem here or other undecidable problems. C makes it difficult to correctly implement any code (rather than all code) that has a property we want and show that it holds solely with the tools available to mortals. Divine intervention is helpful, but notoriously unreliable.


This is probably my bias just from all of the distributed work I've had to do, but in general being able to say anything about the state of the system at a discrete time T is an expensive luxury in my ecosystem, if not actually infeasible.

You can say things about the expected input and output, you can reason about how it's supposed to get there, and you can do traces of a discrete path that was taken in hindsight. But being able to pause the universe and say "What is the current state of the machine" isn't possible when the 'machine' is spread across data centers in multiple geographic locations and it's barely, if we turn our head and squint and lie to ourselves, possible on a modern CPU architecture with SIMD instructions and branch prediction.


Totally agree, but it's a bit outside C. The C execution model is essentially built around the idea of a single-thread computer with a flat memory space, with very minor considerations for exceptions hidden in UB throughout the standard. Most of the difficult and expensive consistency stuff is punted outside the language or to other standards like Cilk and UPC, though recent versions have at least recognized that threads and parallel processors exist.


Other languages aren't much different in this regard (other than often being less well specified), because it's the reasonable thing to do. You want to give threads a lot of independence for performance reasons. But people just love to shit on C.


Yes, many popular languages are even worse. Thankfully most of those languages (C++ excepted) aren't used for systems programming like C is and hopefully we can agree that the safety story with C++ is mixed at best. Those that have some history as real systems languages (e.g. Java) tend to have pretty decent execution semantics. But just to provide a real counterargument, Ada/SPARK meets the bars I set above. Rust is also a significant improvement over C in practical terms, even if there isn't an official standard you can point to in the same way.


> C can't even do all of integral arithmetic safely.

Your comment reads like nonsense. Are you able to provide what you feel is the best example that substantiates your claim?


When comparing a signed and an unsigned integer, the signed integer is converted to unsigned.

So if you have int a = -1 and unsigned b = 1000 and compare the two, a > b is actually true.
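A short demonstration (the conversion turns -1 into UINT_MAX before the comparison):

    #include <stdio.h>

    int main(void)
    {
        int a = -1;
        unsigned int b = 1000;
        if (a > b)                     /* a is converted to 4294967295u here */
            puts("-1 > 1000 after the conversion");
        return 0;
    }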


Signed integer overflow is UB in C.


Can't we force the compiler to define the behavior with flags like -fwrapv? I use them as a matter of course. Also use -fno-strict-aliasing.


Yes, you can tell the compiler to generate predictably-behaving output for non-compliant code. However, that code is not valid C anymore, but a dialect.

Since this is a discussion about what C is and is not, I think it's fair to limit ourselves to the actual language as-is.


> that code is not valid C anymore, but a dialect

> I think it's fair to limit ourselves to the actual language as-is

I don't agree. Standard C only ever seems to be discussed when some asinine undefined behavior starts causing problems that we have to work around. No one cares about it otherwise.

It's better to redefine C as whatever the compilers accept. Now we can actually move forward and actually fix problems such as "signed integer overflow is undefined" -- just tell the compilers to start dealing with it.


But they do deal with it -- by assuming it doesn't happen. Wrap or trap don't necessarily change that.

At its heart, behaviour that's undefined like this is a source of optimisation opportunities, because we tend (and especially the preprocessor tends) to write code that assumes it won't happen. C does not lend itself well to iterator patterns that elide the range check entirely, and so it is valuable to the optimiser to be able to assume that a variable that steadily increments will not suddenly turn out to have wrapped. So we may (for example) unroll memory accesses[0], secure in our understanding that when we add 1 three times we will get three consecutive numbers, and (where we would normally add another 1) go ahead and add four at the end of the unrolled loop.

If we trap at that point, the behaviour is different from if we accessed each memory location in turn and trapped when we actually wrapped around. In the C model, the behaviour is still classically undefined, but we've added a trap to hopefully catch that it happened before running too much further. We still can't assert anything about the state of the program after the trap, to potentially recover from it.

People writing performance-sensitive code get frustrated when a compiler trades performance for safety, so we're probably never going to get "safe" C in that sense. In practice though? This form of undefined behaviour only kicks in at runtime. Make sure your software is bug-free and you need never worry about it.

[0]: Imagine you have a 16 bit signed int that you're using as input into computing an index into an array -- you may know that it's never going to overflow, but how do you tell the compiler?
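An illustrative example of the optimisation latitude being described (not from the comment above): because signed overflow is UB, compilers typically fold this comparison to a constant.

    int never_wraps(int x)
    {
        return x + 1 > x;   /* commonly compiled as "return 1;" at -O2 */
    }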


Compilers are typically benchmarked with ubsan and friends turned off, so when we say "C is fast" [with undefined behavior enabled] and "C is reasonably safe" [with undefined behavior defined] we aren't talking about the same language.


Well, what I really want is a trap on overflow, but since C doesn't support that, modern architectures don't have the capability; you can do it in software (like with ubsan), but you pay a performance penalty.


I mentioned the -fwrapv option to make it wrap around. There's also a -ftrapv option that makes the compiler generate traps.


Most popular C compilers support trapping on integer overflow (i.e. https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#ind...), but it does have some overhead as it replaces the native arithmetic instruction with a call to a library function (i.e. https://gcc.gnu.org/onlinedocs/gccint/Integer-library-routin...)


C is a low-level language, so when the programmer writes "+" they get an "ADD" opcode. That's what "low level" means. If one wants "+" to add and then do range checks, they can use a higher-level programming language (or a special function, perhaps an intrinsic, in C).


There are plenty of cases in C where the use of a + operator doesn't result in any form of add instruction being emitted.


> C is low level for at least one reason: manual memory management.

Manual memory management isn't that much faster than a modern GC, sometimes even slower. I'd argue that C programs are typically fast because there is just less rope to hang yourself with, leaving aside memory safety.

The anemic abstractions provided in the language and the tiny stdlib means it takes a lot of work to achieve something, so developers simply do less. There isn't even a Hashmap (or a proper String), while in Kotlin, you can perform a deep copy of the object graph and convert it to json in parallel in a single line if you so wish.


This is a common misconception that stems from a misunderstanding of why manual memory management is "fast". It has nothing to do with the actual process by which you request and release memory. Manual memory management being faster than GC is a function of controlling memory layout, which is one of the (very) few low hanging optimization fruits that mortals like you or I can do on contemporary CPUs. It also has to do with the fact that dynamic allocation is slow, and being able to get it out of the way with a single allocation in the first moments of a process's life is an immensely important tool in the optimizer's toolbelt.

> The anemic abstractions provided in the language and the tiny stdlib means it takes a lot of work to achieve something

Which has the additional effect of forcing you to be a bit smarter about how you do things, to be less wasteful. It forces you to contend with everything you want to do, to consider it and the cost associated with it. Built-in, general-case abstractions are nice when under time constraint and hacking something together, but it doesn't make for good software. Not only is it almost guaranteed to be slower than a properly constructed purpose-built solution, but it also removes your view from thinking about the cost of every single thing you're doing. It makes it easier and attractive to overuse abstractions, to over-engineer solutions, and to approach problems from a standpoint where you simply throw the kitchen sink at the problem because that's the only thing you can think of.


>It has nothing to do with the actual process by which you request and release memory. Manual memory management being faster than GC is a function of controlling memory layout, which is one of the (very) few low hanging optimization fruits that mortals like you or I can do on contemporary CPUs.

That is kind of misleading. The difference is that C and Rust support stack allocation, which is essentially an arena style allocator integrated into the language. What the fancy pointer bumping GC runtimes do, the stack does by default. The problem is that escape analysis is difficult and it is difficult to prove that an access to memory on the stack is safe without fundamentally changing the language like Rust does. It gets worse on the heap, where you can have runtime determined ownership.

C programmers like their doubly linked lists, but when you think about it, it is actually kind of a difficult problem to formalize and analyze in its full generality.


It's not about the call stack (which I don't think is that special), but about control over layout of data structures. Languages like Java introduce lots of indirection and "object overhead".


> Manual memory management being faster than GC is a function of controlling memory layout

Control over memory layout and manually allocating and freeing memory are orthogonal issues.

I can optimize memory layout in Java too, by using primitive data types instead of pointer-chasing objects, or structs-of-arrays vs arrays-of-structs type of things, in order to improve access patterns. I can't control alignment and padding, except indirectly, that's true, but that is not what people mean when they say "manual memory management". Rust gives you control over memory layout, but has "automatic" memory management.

> forcing you to be a bit smarter about how you do things, to be less wasteful

Yes this is what I meant.


Love your words. 1000 upvotes if I could.

For balance, the faster machines get, the more problems are most effectively solved by throwing the kitchen sink at them.


Sometimes wasting a perfectly good kitchen sink on a small problem gives you two bigger problems.


There are some performance advantages a garbage collector can have over manual memory management. If you're just calling malloc/free or, in C++, calling new/delete in constructors/destructors (or using a class that does so, like std::vector), and nothing special, the garbage collector is probably allocating memory faster.

> controlling memory layout

Garbage collectors can compact active memory into one contiguous location and adjust the active pointers to point there instead. You can't do this in a language like C, because you can have arbitrary pointers to anything, and there's no runtime indication of what's a pointer or just an integer. You simply have to prevent memory fragmentation in the first place, which also complicates the logic of the program.

For faster allocation in C, arena allocation based on object lifetimes can be used [1]; in generational garbage collectors, you get similar benefits, but it's just done automatically. In fact, in that linked paper, they found that lifetime-based arena allocation improved the speed of their program (a C compiler) at the cost of increased memory allocation compared to naïve malloc() and free(), which is exactly what garbage collection does.

As a result of compaction, memory allocation with garbage collection is just a pointer bump in the best case, whereas allocation with just malloc usually requires searching a free list or a tree.

[1]: https://www.cs.princeton.edu/techreports/1988/191.pdf
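For reference, a minimal arena ("pointer bump") sketch of the pattern described above; the names are illustrative. Allocation is an alignment round-up, a bounds check and an add, which is why bump allocation is so cheap.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint8_t *base;   /* start of the arena's backing memory */
        size_t   used;   /* bytes handed out so far */
        size_t   cap;    /* total capacity of the arena */
    } arena_t;

    static void *arena_alloc(arena_t *a, size_t n)
    {
        n = (n + 15u) & ~(size_t)15u;    /* keep 16-byte alignment */
        if (a->cap - a->used < n)
            return NULL;                 /* arena exhausted */
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }

    /* "Freeing" is wholesale: reset `used` to 0 when the whole lifetime ends. */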


Cache invalidation is hard :) almost as hard as semantically naming things in a way that is clear now, and in the future.


... the famous Two Hard Things, together with off-by-one errors.


concurrency With 3 it's Things Hard.


Concurrency is hard because it's difficult to know if the value you have cached in memory is valid or invalid. As soon as a value is read it's cached.

Therefore Concurrency reduces to Cache invalidation and there are again only two hard problems :)


very interesting observation. never thought about how memory is central to all those concepts and technologies.


In my lifetime the memory hierarchy has grown from 3 levels (register, main memory, disk) to at least 8 levels (register, L1 Cache, L2 Cache, L3 Cache, VM, main memory, disk, web). A lot of people like the guy who wrote this paper have no patience for a language that can still run most 1971 C programs!

Every year, people who do not understand why C is so successful try to make a name for themselves by breaking what makes the language great (such as the completely agnostic set of control structures). C has successfully remained portable and performant for 50+ years because of its flat memory model (with a few tweaks such as "volatile").


C is neither fast nor low-level... none of these descriptors have any meaning.

It's a pointless discussion when you don't care to explain how you use the words that obviously have many related but different ways to interpret them.


Friendly local C programmer and compiler writer here to remind you that C definitely is a low level language for those who understand it and use it professionally. If you’re looking for a low level language, then C (and its relatives) are your best bet.

If you’re new to the language and want to understand how to use it like a pro then ignore this post - it will only confuse you and reduce your ability to use C effectively.


C is a low-level language... but it's the wrong low-level language. It gives you low-level access to a machine that your real machine actually has to somewhat laboriously emulate. The dangly bits and bodges that have been added over the years to give access to the real machine are relatively foreign bodies in C.

I would agree the title is a bit rhetorically rough, though, because being the wrong low-level language doesn't make it a high-level language. WASM would similarly be "wrong" if I claimed it was a direct mapping to modern hardware, but that doesn't make it "high level".

(Although what really frustrates me about C isn't that it's a bad mapping per se. It's from the 1970s, what do you expect? And it is obviously still quite useful for many cases. What frustrates me is that it continues to a large degree to dictate language design and heavily color how language designers see hardware, so too much modern language design is still just reshuffling bits of C around, rather than building languages that work with the hardware well.)


What is C really? A concise syntax to define structs and functions, with a usable expression syntax. There isn't all that much to it; I've always found it ridiculous for people to claim it's holding hardware back.

I don't think I've ever really seen a good argument for what developments were prevented by the existence of C as an important compiled language. The one claim I can remember I find ridiculous: that today's CPUs execute instructions in parallel, not serially. Well, for one, C's semantics aren't that serial; there is a large degree of freedom for compilers and CPUs in how to schedule the execution of C expressions and statements. Then, there are SIMD instructions exploiting those capabilities explicitly. But also, the rest of the code gets automatically pipelined by the CPU, according to a specific CPU's capabilities. Even though that stuff happens in parallel, any instruction encoding is by necessity serial. Or is anyone proposing we should switch to higher-dimensional code (and address spaces)?


To my aesthetic C is the wrong abstraction because while all those things are possible, the language exposes them via a syntax that makes you think you're writing an embarrassingly sequential program and then tries to hide all of the parallelization that improves performance in the undefined behavior.

I liken it to doing imperative UI development on top of the DOM abstraction in a browser. Yes, under the hood, the browser is choosing when to re-evaluate and repaint interface elements, but you can't touch any of that; you're instead rearranging things in the DOM and memorizing the heuristics browsers use, trying to trick them into efficiently matching changes to the DOM to visual changes in the browser UI.

It may very well be time for low-level languages to encourage us to think about programming as "arranging independent blocks of code that can be executed in parallel, with only a handful of sequencing operations enforcing some kind of dependency order. Apart from honoring those sequencing requirements, order of execution or whether execution happens in parallel is undefined."


> "arranging independent blocks of code that can be executed in parallel, with only a handful of sequencing operations enforcing some kind of dependency order. Apart from honoring those sequencing requirements, order of execution or whether execution happens in parallel is undefined."

Isn't that exactly what is happening?


More or less, but nothing about the design of the language puts that front and center. Instead, the language is designed to make the developer think they're operating on an embarrassingly sequential machine, and only the vast amount of undefined behavior in the language spec allows the compiled output to be parallel.

It's the wrong abstraction for the job and properly using C in a way that takes advantage of it requires unlearning most of what people think they know about how the language works. I'd like to see more languages that start from a place of "Your code can execute whenever the computer thinks is most efficient; don't ever think you know the execution order" and then treat extremely-sequential, deterministic computing as a special case.


I think you're just making wrong assumptions. Any C programmer worth their salt knows that that both compilers as well as the CPU introduce a lot of reordering / instruction-level parallelism as optimizations.

You can SIMD / multi-thread explicitly as much as you feel like, but you'll soon find your productivity diminishing, which is not a language fault.


I don't want to SIMD / multi-thread explicitly.

I want my language to have low-level abstractions like "pack data into an array, map across array. Reduce array to a value." Those are abstractions a programmer can look at and go "Oh, the compiler will probably SIMD that, I should use it instead of a for loop." In contrast, C will auto-unroll loops. Unless it doesn't. Go memorize this pile of heuristics on popular architectures so you can guess at whether your code will be fast.

I want my language to have low-level abstractions like Orc's parallel and sequential combinators, so that when I need some operations sequenced I can force them to be, when I don't I can let the compiler "float" it and assemble the operations into whatever sequence is fastest on the underlying architecture; I don't have to memorize a bunch of heuristics like "the language allows arbitrary ordering of execution for either side of a '+' operator in an expression, but statements are executed sequentially, unless they aren't it depends on the contents of the statement."

In short, I want my language to ask me to think in terms of parallelism from the start so that my mind is always in the head-space of "This program will be executed in nondeterministic order and I shouldn't assume otherwise."


> pack data into an array, map across array. Reduce array to a value."

These are abstractions that you've been able to enjoy for a long time, by using higher-level languages like C++ or Rust. So C didn't prevent the feature, after all.

You could argue now that C has prevented CPUs from implementing these abstraction (because arguably, C cannot express them), but I would like to ask first how you think it should be done, and why it's not a good idea to implement it on the language/compiler level how it's currently done?

If there comes up a new way that lets CPUs understand type theory and magically multi-thread your variably-sized loops by creating a new set of execution units out of thin air, you'll have a point. For the time being, there doesn't seem to exist such a thing, and I can't imagine that the reason why not is C. Rather, if such a thing is nearing practicability, C will have to adapt or slowly die out.

> In short, I want my language to ask me to think in terms of parallelism from the start so that my mind is always in the head-space of "This program will be executed in nondeterministic order and I shouldn't assume otherwise."

This works only in a very limited way in practice. To solve practical problems, you need to combine logic/arithmetic instructions serially to achieve the intended effect. Seems to me that it turned out that most degrees of freedom are more accidental than structured, and it's not practical to manually specify them, when the majority of them are easily recoverable in an automated way.

So that's how you end up with that instruction-level parallelism that is worked out by the compiler and the CPU.


> You could argue now that C has prevented CPUs from implementing these abstraction (because arguably, C cannot express them), but I would like to ask first how you think it should be done, and why it's not a good idea to implement it on the language/compiler level how it's currently done?

As I said top-thread, it's to my aesthetic. These are all Turing-complete languages and you can, in theory, do whatever in any of them. But map-reduce-fold-etc make it much clearer, to my eye, that I'm operating on a blob of data with the same pattern, and it's easier to map that in my brain to the idea "The compiler should be able to SIMD this." Contrast with loops requiring me to look at a sequential operation and go "I'll trust the compiler will optimize this by unrolling and then deciding to SIMD some of this operation." The end-result is (handwaving implementation) the same, but the aesthetic differs.

As you've noted, I'm not unable to do this in C or C++ or Rust (in fact, C++ is especially clever in how it can use templates to inline static implementations recursively so that the end result of, say, the dot product of two N-dimensional vectors is "a1 x b1 + a2 x b2 + a3 x b3" for arbitrary dimension, allowing the compiler to see that as one expression and maximize the chances it'll choose to use SIMD to compute it). But getting there is so many layers of abstraction away (I had to stare at a lot of Boost code to learn that little fact about vector math) that the language gets in the way of predicting the parallelism.

> If there comes up a new way that lets CPUs understand type theory

CPUs don't understand type theory. Compilers do and they can take advantage of that additional data to do things like unroll and SIMD my loops right now. My annoyance isn't that it's impossible, it's that I'd rather the abstraction-to-concrete model be "parallel, except sometimes serial if the CPU doesn't have parallel instructions or we hit capacity on the pipelines," not the current model of "serial, and maybe the compiler can figure out how to parallelize it for you."

> To solve practical problems, you need to combine logic/arithmetic instructions serially to achieve the intended effect... Seems to me that it turned out that most degrees of freedom are more accidental than structured, and it's not practical to manually specify them

I agree... Eventually. There's a lot of parallelism allowed under-the-hood in the space between where most programmers think about their code, as evidenced by C's undefined behavior for expression resolution with operators of the same precedence.

Whether degrees of freedom evolved by accident is irrelevant to whether a new language could specify those parts of the system (sequential vs. intentionally-undefined ordering) explicitly. C, for example, has lots of undefined behavior around memory management; Rust constrains it. It's up to the language designer what is bound and what is allowed to be an arbitrary implementation detail, intentionally left undefined to give flexibility to compilers.

Even the modern x86 instruction set is a bit of a lie; under the hood, modern CPUs emulate it by taking chunks of instruction and data and breaking them down for simultaneous execution on multiple parallel pipelines (including some execution that never goes anywhere and is thrown away as a predictive miss). CPUs wouldn't be nearly as fast as they are if they couldn't do that.

I'm not advocating for breaking the x86 abstraction; that's a bit too ambitious. But I'd like to see a language take off that abandons the PDP-11 embarrassingly-serial era of mental model in favor of a parallel model.


Yes, in Haskell.


> A concise syntax to define structs and functions, with a usable expression syntax. [...] I've always found it ridiculous for people to claim it's holding hardware back.

You just looked in your fish tank and declared what the weather is going to be like in the Atlantic ocean... Like... these things have nothing to do with each other. The fact that C has functions or structs has nothing to do with it being an awful influence on hardware design.

Here are some reasons why C is awful.

* It believes that volatile storage is uniform in terms of latency and throughput. This results in operating systems written with the same stupid idea: they only give you one system call to ask for memory, and you cannot tell what kind of memory you want. This in turn results in hardware being designed in such a way that an operating system can create the worthless "abstraction" of uniform random-access memory. And then you have swap, pmem, GPU memory, etc., and none of that has any good interface. And these are the products that, despite the archaic and irrelevant concept of how computers are built, have succeeded to a degree... Imagine all those which didn't. Imagine those that weren't even conceived of because the authors dismissed the very notion before giving the idea any kind of thinking.

* It has no concept of parallelism. In its newer iterations it added atomics, but this is just a reflection of how hardware was coping with C's lack of any way to deal with parallel code execution. C "imagines" a computer to have a CPU with a single core running a single thread, and that's where the program is executed. This notion pushes hardware designers towards the pretension that computers are single-threaded. No matter how many components your computer has that can actually compute, whenever you write your program in C, you implicitly understand that it's going to run on this one and only CPU. (And then e.g. CUDA struggles with its idea of loading code to be executed elsewhere, which it has to do in some very cumbersome and hard-to-understand way, which definitely doesn't rely on any of C's own mechanisms.)


> It believes that volatile storage is uniform in terms of latency and throughput.

It doesn't, I don't think it even mentions terms like latency and throughput.

> they only give you one system call to ask for memory, and you cannot tell what kind of memory you want

What?

> Imagine those that weren't even conceived of because the authors dismissed the very notion before giving the idea any kind of thinking.

Such as?

> It has no concept of parallelism.

C can function with instruction-level parallelism, CPU-level parallelism, process/thread-level parallelism just fine.

> C "imagines" a computer to have a CPU with a single core running a single thread, and that's where program is executed.

Given that a memory model was introduced in C11, and that there were other significant highly concurrent codebases before that, I'm having doubts how correct and/or meaningful that statement is.

For sure, the only case where it's easy to understand the possible outcomes of running a piece of code is when running it in a single thread (it doesn't matter on how many CPUs though, apart from performance). That is just the nature of multi-threading: it's hard to understand.

> This notion pushes hardware designers towards pretending that computers are single-threaded.

How do they pretend so? My computer is currently running thousands of threads just fine. It has a huge number of components, from memory to CPU to controllers to buses to I/O devices, that are executing in parallel.


> It doesn't, I don't think it even mentions terms like latency and throughput.

It only has one group of functions to allocate memory, and neither of them can be configured with respect to what storage to allocate memory from, definitely not in terms of that storage's latency or throughput, which would be very important in systems with non-uniform memory access.

Compare this to, e.g., the concept of "memory arenas" that explicitly exists in Ada; many languages have libraries implementing the same idea -- in those situations, instead of using the language's allocator, you'd be using something like APR's memory pools <https://apr.apache.org/docs/apr/trunk/group__apr__pools.html>.
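
To make it concrete, here's a minimal bump-allocator arena sketch in plain C (not APR's actual API; the names here are made up) -- the point is that the arena, not the language, decides where the memory comes from and when it is all released:

  #include <stdlib.h>
  #include <stddef.h>

  /* Minimal arena: one big block, bump-allocated, freed all at once. */
  typedef struct {
      char  *base;
      size_t used;
      size_t cap;
  } arena;

  static arena arena_create(size_t cap) {
      arena a = { malloc(cap), 0, cap };
      return a;
  }

  static void *arena_alloc(arena *a, size_t n) {
      n = (n + 15) & ~(size_t)15;            /* keep 16-byte alignment */
      if (a->base == NULL || a->used + n > a->cap) return NULL;
      void *p = a->base + a->used;
      a->used += n;
      return p;
  }

  static void arena_destroy(arena *a) {
      free(a->base);                         /* every allocation released at once */
      a->base = NULL;
  }

Nothing in ISO C lets you say "this allocation should live in fast scratchpad / slow DRAM / device memory"; an arena-style interface is the natural place to hang that kind of request.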

> and that there were other significant highly concurrent codebases before that

You are again confused... there weren't such codebases in C because C is bad for this. It's not because nobody wanted it. Highly concurrent codebases existed in Erlang since forever, for example.

> How do they pretend so? My computer is currently running thousands of threads just fine.

Threads aren't part of C language. They exist as a coping mechanism. Their authors are coping with the lack of parallelism in C, which is exactly the point I'm making. Threads exist to fix the bad design (not that they are a good fix, especially since they are designed by people who believe that there's nothing major wrong with C, and the thousand and first patch will definitely fix the remaining problems).

So not only is this a counter-argument to the point you are trying to make, it's also yet another illustration of how using C prevents designers from seeking more adequate solutions.

> from memory to CPU to controllers to buses to I/O devices, that are executing in parallel.

The point is not that they cannot run in parallel... The point is that C doesn't give you tools to program them to run in parallel.


> It only has one group of functions to allocate memory, and neither of them can be configured

Seriously, have you done any non-trivial C programming? Because those are blatant falsehoods. You must be talking about uni level introduction to C programming, using malloc/free and thinking that's how you "allocate".

> You are again confused... there weren't such codebases in C because C is bad for this. It's not because nobody wanted it. Highly concurrent codebases existed in Erlang since forever, for example.

Just one example I know, take the Linux kernel which had a good amount of SMP support way before C11. I believe they still haven't switched over to the C11 memory model.

> The point is not that they cannot run in parallel... The point is that C doesn't give you tools to program them to run in parallel.

How come then, that my computer is running so many things, many of them written in C, in parallel?


> Threads exist to fix the bad design (not that they are a good fix, especially since they are designed by people who believe that there's nothing major wrong with C, and the thousand and first patch will definitely fix the remaining problems).

The thing that needs fixing is mostly people like you, purporting falsehoods while lacking deeper understanding how it works / how it's used.

Threads are a concept that exists independently from any language. They are the unit of execution that is scheduled by the OS. If a program should be multi-threaded with parallel execution (instead of only concurrent execution), by necessity you need to create multiple threads. (Or run the program multiple times in parallel and share the necessary resources, but that's much less convenient and lacks some guarantees).
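
(For what it's worth, threads did eventually get pulled into the language: C11 added an optional <threads.h>, essentially a thin veneer over what pthreads and friends already provided. A minimal sketch, assuming an implementation that actually ships the header:)

  #include <stdio.h>
  #include <threads.h>

  static int worker(void *arg) {
      printf("hello from worker %d\n", *(int *)arg);
      return 0;
  }

  int main(void) {
      thrd_t t[4];
      int ids[4];
      for (int i = 0; i < 4; i++) {
          ids[i] = i;
          if (thrd_create(&t[i], worker, &ids[i]) != thrd_success)
              return 1;
      }
      for (int i = 0; i < 4; i++)
          thrd_join(t[i], NULL);     /* wait for each worker to finish */
      return 0;
  }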


> > It believes that volatile storage is uniform in terms of latency and throughput.

> It doesn't, I don't think it even mentions terms like latency and throughput.

Yes, that's the whole point.


No it isn't. Not mentioning the differences isn't the same as acting like they don't exist. Those things are only treated as out of scope.

Not every concept must be expressed in language syntax / runtime objects, nor is it necessarily a good idea to do so. In many cases, it's a bad idea because it leads to fragmentation and compatibility issues. At some point, one has to stop making distinctions and treat a set of things uniformly, even though they still have differences.

CPUs have various load and store instructions that all work with arbitrary pointer addresses. Whether the address is a good/bad/valid/invalid one will only turn out at run time. There would be little point in making a separate copy of these instruction sub-sets for each kind of memory (however you'd categorize your memories). The intent as well as the user interface are the same.

I think that's basic software architecture 101. (Once you've left uni and left behind that OOP thinking where every object of thought must have a runtime representation).

Btw. C compilers allow you to put a number of annotations on pointers as well as data objects. For example pointer alignment to influence instruction selection, or hints to the linker...
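
For instance (GCC/Clang extensions plus C99 restrict; a sketch, not portable ISO C):

  /* restrict: promise the buffers don't alias, so the compiler may reorder
     and vectorize the loads/stores. */
  void scale(float *restrict dst, const float *restrict src, float k, int n) {
      /* __builtin_assume_aligned is a GCC/Clang builtin: promise 64-byte
         alignment so aligned/vector loads can be selected. */
      src = __builtin_assume_aligned(src, 64);
      dst = __builtin_assume_aligned(dst, 64);
      for (int i = 0; i < n; i++)
          dst[i] = k * src[i];
  }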


At least for the first point: C has been used extensively with non-uniform storage. Back in the DOS days when we had memory models (large, small, huge, etc...), and today, when programming all sorts of small microcontrollers. A common one I occasionally use is AVR, which has distinct address spaces for code and data memory - which means a function to print a string variable is very different from the one used to print a string constant. This makes programs rather ugly, but things generally work.
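
Roughly what that looks like with avr-gcc/avr-libc (PROGMEM and pgm_read_byte are avr-libc's names; uart_putc is a hypothetical byte-output routine):

  #include <avr/pgmspace.h>

  extern void uart_putc(char c);     /* hypothetical byte-output routine */

  /* Lives in flash (the code address space), not in the tiny SRAM. */
  static const char banner[] PROGMEM = "hello from flash";

  /* A print routine for RAM strings can't read this directly; you need the
     pgm_read_* accessors, hence all the duplicated *_P function variants. */
  static void uart_puts_P(const char *p) {
      char c;
      while ((c = pgm_read_byte(p++)) != '\0')
          uart_putc(c);
  }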

As for your parallelism idea... well, every computer so far has a fixed number of execution units; even your latest 16384-core GPU still has every core perform sequential operations. And that's roughly what C's model is, it programs execution units. And it definitely hasn't stopped designers from innovating - completely different execution models like FPGAs exist, and there is constant innovation in programming languages.


> At least for the first point: C has been used extensively with non-uniform storage

And the results are awful. You are confused between doing something and doing it well. The fact that plenty of people cook frozen pizza at home doesn't make frozen pizza a good pizza.

> And it definitely hasn't stopped designers from innovating

And this is where you are absolutely wrong. We have hardware designs twisted beyond belief only so that they would be usable with C's concept of a computer, while obviously simpler and more robust solutions are discarded as non-viable. Just look at the examples I gave. CUDA developers had to write their own compiler to be able to work around the lack of necessary tools in C. We also got OpenMP and MPI because C sucks so much that the language needs to be extended to deal with parallelism.

And it wasn't some sort of a hindsight where at the time of writing things like different memory providers were inconceivable. Ada came out with the concept of non-uniform memory access baked in. Similarly, Ada came out with the concept for concurrency baked-in. It was obvious then already that these are the essential bits of system programming.

C was written by people who were lazy, uninterested to learn from peers and overly self-confident. And now we've got this huge pile of trash of legacy code that's very hard to replace and people like you who are so used to this trash, that they will resist its removal.


You are very confidently making some wild statements that seem to be based on the assumption that only because something isn't specified in a given place, it couldn't be specified somewhere else. That assumption is wrong.


I don’t think it’s fair to blame C for the flat random access memory model. Arguably it goes back to Von Neumann. There was a big push to extend the model in the 1960s through hardware like Atlas and Titan (10 years before C) and operating systems like Multics. And there’s all the computer science algorithms analysis that assumes the same model.


At the time C rose to prominence there was already understanding that memory access isn't going to be uniform, and less and less so as hardware evolves and becomes more complex. Ada came out with this idea from the get go.

Von Neumann created a model of computation. It's a convenient mathematical device to deal with some problems. He never promised that this is going to be a device to deal with all problems, nor did he promise that this is going to be the most useful or the most universal one etc.


You’re echoing my point back at me, though to be fair I should have been more explicit that my examples from the 1960s were about caches and virtual memory and other causes of nonuniform access hidden under a random access veneer.

But we can go 15 years earlier: Von Neumann wrote in 1946: “We are therefore forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible.” https://www.ias.edu/sites/default/files/library/Prelim_Disc_...


>Well, for one, C's semantics aren't that serial, there is a large degree of freedom for compilers and CPUs how to schedule the execution of C expressions and statements.

I thought about the implications of a "parallel" statement, where everything is assumed to execute in parallel, and oh boy are the implications big. C's semantics are serial but they contain implicit parallelism. The equivalent is that the parallel statement contains implicit sequentialism that the compiler can exploit to reduce the amount of bookkeeping needed by the CPU to schedule thousands of instructions at the same time. E.g. instead of having an explicit ready signal and blocking on it, the compiler can simply decide to split the parallel statement into two parallel statements, one executed after the other. Implicit sequentialism! A parallel statement implies that no aliasing writes are allowed to be performed. I don't know what the analysis for that would look like, but in many common cases I would expect the parallel statement to be autovectorized quite reliably.

>Even though that stuff happens in parallel, any instruction encoding is by necessity serial. Or is anyone proposing we should switch to higher-dimensional code (and address spaces)?

Uh, you know we can just encode the program as a graph? Graph reduction machines are a thing, you know.


> we can just encode the program as a graph

What is the output medium for the encoded representation? A linear address space, like a file, or virtual memory.


“instruction encoding is by necessity serial. Or is anyone proposing we should switch to higher-dimensional code”

That is sort of a thing: https://en.m.wikipedia.org/wiki/Very_long_instruction_word

If you have multiple instructions grouped together like this you could think of it as being a 2D array of instructions


I understand your point: Modern hardware tries REALLY hard to pretend it is a simple set of instructions executing one after another. For all the on the fly clever caching, micro-op translation, branch prediction, speculative execution, register renaming, and whatever else, it consistently presents a sane model to single threaded programs. It's difficult to even see the magic under the hood if you tried, and it mostly shows up in unexpected performance discrepancies or race conditions for multi threaded programs. It's all a huge charade...

However, before dismissing this all as a bad mapping to an outdated 1970s model of computation, I'd like to see a good alternative. CUDA has clearly shown that there's an acceptable model for massively parallel data sets, but that doesn't handle branch heavy code very well at all. And FPGAs have a different approach for a completely different kind of problem, but I don't know how you would expose what Apple, AMD, or Intel chips are doing under the hood and have it be at all manageable to the programmer. How is someone supposed to indicate what's next when a pipeline stalls waiting on the previous operation or a cache miss? Is the programmer going to toss micro ops into separate execution units and wait for the results to come out the other side in arbitrary order? Is this an async/await model for every addition or memory fetch? I think it would be complete spaghetti to even try, but I'd love to be shown I'm wrong.

People get all excited trash talking Itanium, but I think it's a lesson that if you try to expose any alternative to the 1970s model they'll just bitch about how there are no sufficiently smart compilers. And of course it got scooped by AMD64 pretending to execute one instruction after another.

And if there isn't a good alternative, I think C (or Rust, or WASM) are a pretty good fit for what you've actually got to work with at the low level.


"I'd like to see a good alternative"

Me too. See my other reply below.

That said, "This is a good match" does not logically follow from "This is a bad match but it's the best match we have." It's still a bad match.


Itanium was the wrong design not because of the reasons you suggest, but because it assumed that good performance is something that can be statically baked into the object code, and therefore that there is such a thing as a sufficiently smart compiler for an explicitly parallel processor running general purpose code. But evidently the designers were wrong.

Which is not to say that explicit parallelism is bad, it’s clearly useful for GPUs and vector code (and compiling to SVE is not too different from itanic). But it doesn’t work as well as dynamically discovered parallelism for non-vector code.


It seems to me there's some uncharted territory between "massively parallel" (GPU) and "unpredictable branching" (CPU), and the corpse of Itanium is laying there as a warning to anyone who would go exploring in that area. Maybe it's just a desert, but I doubt it.


What language(s) in your opinion have the right low-level where the access to the real machine doesn't feel foreign?


Assembly is the right one. You have direct access to the machine ISA, including the weirder status/control registers and whatever trap/syscall corresponds to. Assemblers are somewhat powerful - can define data layouts somewhat like structs, abstract some things behind macros, add pseudo-instructions to put friendlier names on some things. Maybe the ISA expects you to build constant integers out of arithmetic, the assembler can give you a 'const' instruction which expands to said arithmetic.

I have a pet theory that lisp macros over an assembler is the right high level language for systems programming but that hasn't made it off the whiteboard yet.


The problem is that assembly is CPU dependent. The benefit of a high-level language is that it's CPU architecture independent.

For smaller CPUs that can't support all of C's assumptions natively anyway, like the 6502, which can't multiply or do floating point arithmetic, something like what you describe would likely be best. It reminds me of the COMFY 6502 compiler: https://dl.acm.org/doi/pdf/10.1145/270941.270947


Therein lies the interesting design space, yeah. Control flow, data layout, semantics of basic blocks are sometimes target agnostic and sometimes not. Sometimes a div instruction needs to turn into a runtime call, sometimes it doesn't. Sometimes you want explicit control of registers, sometimes any gpr is fine.

Which I suppose yields the other language choice. Instead of C or assembly, write in something very like a compiler IR. Ymmv persuading non-compiler devs to code in SSA form directly.


But even the exposed machine ISA for x86 is way different these days than the underlying hardware.


> I have a pet theory that lisp macros over an assembler is the right high level language for systems programming but that hasn't made it off the whiteboard yet.

I'm having a little trouble visualising this. Don't many assemblers provide macro-instructions already?


Assemblers come with text substitution macros. Lisp comes with program rewriting macros. Same basic idea that it's all expanded away by runtime, but using something like scheme as the compile time metaprogram that emits the machine specific assembly. There are a few s-expression based assemblers out there so probably nothing novel.


Well Forth is possibly the most minimal VM over a platform, as evidenced by openfirmware.

It does have problems scaling though, in that if you've seen one Forth, you've seen one Forth, i.e. the variations required to fit a platform make them semi-incompatible. Also, only global scope, no types and no built-in threadsafe constructs are limiting.

That's not to say that a more lispy Forth wouldn't be useful though, in that a concatenative syntax allows us to pass custom data structures around like APL, and CPS (delimited continuations with lexically scoped dynamic binding) would come from the lisp side (see https://github.com/manuel/wat-js).

Memory management in Forth can handle multiple memory types eg. https://flashforth.com/ so adding something like ref counting (https://github.com/zigalenarcic/minilisp/blob/main/main.c) to handle the dynamic list side of things might mesh well.

In any case, if you're looking for a self hosting lisp that runs on bare metal, https://github.com/attila-lendvai/maru has been out for a few years.


Assembly, or what ESPOL was already doing in 1961, a decade before C was even an idea: compiler intrinsics.

So taking out Assembly, any language can have hardware capabilities exposed as compiler intrinsics; there is nothing special about C in that regard, it is only the one many people are commonly aware of because they don't want to be educated in compilers.


The only one I can think of would be Assembly, but I don't do much low-level work, I code in much higher-level languages. Genuinely curious what the answer is.


For portability, by far the vast majority will say C. In my experience the C compiler's optimizer will do a lot with -O2 or -O3, but it can't always infer correct SIMD optimization for some operations, and on occasion you have to drop down into x86_64 assembly. The idea is to do most things in C and use __asm__ to write custom assembly instructions. With #defines around the assembly for each processor you plan on supporting, you get the benefit of both C optimizers and portability across different CPUs, as well as any future updates to the compiler. But the compiler writers will say to use intrinsics and extended assembly rather than raw assembly, because when you write raw assembly your code becomes a black box to the compiler and it can't infer optimizations for your surrounding code that interfaces with the assembly. I think C with extended asm is likely the most sane combination if you don't mind the slightly ugly syntax and the fact that there could be differences between compilers. That being said, C with compiler intrinsics seems to be a happy compromise for those that don't want to shift around registers and deal with the stack.

https://gcc.gnu.org/wiki/DontUseInlineAsm

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
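
A tiny illustration of the two options (GCC/Clang syntax, x86-64 assumed):

  #include <xmmintrin.h>   /* SSE intrinsics */

  /* Intrinsic version: the compiler still sees the data flow and can
     schedule and inline around it. */
  void add4(float *dst, const float *a, const float *b) {
      _mm_storeu_ps(dst, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
  }

  /* Extended-asm version: correct, but a black box to the optimizer,
     which is what the GCC wiki page above warns about. */
  static inline unsigned long long read_tsc(void) {
      unsigned int lo, hi;
      __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
      return ((unsigned long long)hi << 32) | lo;
  }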

I don't use Rust so I can't comment on it, but it also has compiler intrinsics + a memory safety model. Its compiler was really dog slow last time I used it, so I hope that has improved, but nobody is really killing C any time soon, even if there's enthusiasm for memory safety. Sooner or later you have to delve down into the depths of Narnia, and you may as well get comfortable dealing with memory.

My likely favorite combination is Python + C (for the speed stuff) + Intrinsics (for the really speed stuff).


I don’t think there is one, and if there was, it would run the same risk of becoming anachronistic as C itself has.

Besides, C and its compilers have very much influenced CPU designs and optimisations, so it’s a world with a feedback loop.

Maybe the loop will weaken somewhat in the new LLM craze.


Per my last paragraph, I am not convinced about any of them.

One of these days I really need to post my "ideas for languages" that I've got banging around on my hard drive, but one of them is "a language that deals with the increasingly heterogeneous nature of the computer". You've got the CPU, the GPU, efficiency cores, who knows what else in the future (NN cores), and it's only a small hop from there to consider other computers as resources too.

Full disclosure: I have no idea whatsoever what this looks like. Especially in light of the fact that you need to build not just for the exact machine you're developing on but for machines in the future as well. Some sort of model of what is being computed and some guestimate at the costs? (Something like an SQL query builder where you declare your goal and it does the computation about what resources to compute it with?) It's also possible that the huge gulfs in performance between all these parts are just too large to bridge and manual scheduling of all these resources is just the only choice.

Even just within a CPU it's rather annoyingly difficult to use vector-based code in modern languages. Perhaps something like an array-based language, but one that discards that field's bizarre love affair with single-character (if not outright Unicode) operators and can be read by a normal human, and just affords writing code in a style that SIMD becomes a sensible default rather than something the optimizer laboriously reverse engineers from your conventional imperative code. (Array based programming could really use a "for humans" version of those languages in general.)

To some extent, just sitting down for a year to learn modern assembler and starting from the very, very bottom once again to build a high level language, rather than starting with C and building "C, but ..." which is pretty much every modern language being developed, would be an interesting exercise if nothing else.

Another little example is I think Jai was supporting structures-of-arrays instead of arrays-of-structures, though I don't know if they kept it. I'd like to see a language where the language-level data structures are explicitly viewed through the lens of "how I serialize these into memory", rather than the data structure implicitly creating such a specification by how it is defined, so for instance you could swap out a SoA to an AoS by swapping only the way the compiler serializes to RAM and not any of the rest of the code. Obviously you provide defaults that look like modern languages, but with this you could directly implement things like tagged unions with custom bit layouts, or theoretically, directly accessing gzip'd data by specifying that this data structure can only be accessed sequentially but as long as that's what you do you don't need to directly unzip it, etc. This doesn't directly answer "how do you utilize modern hardware correctly" but gives you tools to potentially create a better match than what compilers give by default.
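
To make the AoS/SoA point concrete, here are the two layouts in plain C; today switching between them means rewriting every access site, whereas the idea above is that only the declared layout would change:

  /* Array of structures: one particle's fields sit together in memory. */
  struct particle { float x, y, z, mass; };
  struct particle world_aos[1024];

  /* Structure of arrays: each field is contiguous, which is what SIMD wants. */
  struct world { float x[1024], y[1024], z[1024], mass[1024]; };
  struct world world_soa;

  /* Same logical loop, different access syntax -- that's the code you'd
     have to rewrite by hand when changing layouts. */
  void step_aos(float dt) { for (int i = 0; i < 1024; i++) world_aos[i].x += dt; }
  void step_soa(float dt) { for (int i = 0; i < 1024; i++) world_soa.x[i] += dt; }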

Again, to be clear, this is crazy pie-in-the-sky far out ideas that I do not have an implementation in mind for, but it's the sort of thing I'd like to see more experimentation with on the fringes of language dev. (And I only wish I had time to do it myself. Unfortunately, I simply do not.)

(And, as the sibling comments point out, yeah, assembler technically, but that's kind of a cop out.)


> To some extent, just sitting down for a year to learn modern assembler and starting from the very, very bottom once again to build a high level language, rather than starting with C and building "C, but ..."

Right so build a 'union' of what's available and somehow try to fit it in a unified model. I was hoping there was at least some theoretical PL answer of a unified model. But can't be because all manufacturers and industry sub-groups are doing their own thing.

> assembler

Yes that's definitely a cop out.


TAOS was a 1990s system for heterogeneous computing https://en.m.wikipedia.org/wiki/Tao_Group

but it predated the rise of GPGPUs and vector units so it didn’t tackle data parallelism and array processing.


Yeah, I don't want to pretend this is a totally new idea. There's not a lot of totally new ideas.

But there was a lot of things tried in the 1960s and 1970s whose only fault was that they were simply too early. For example, people were researching neural nets back then. They basically got nowhere. In hindsight, they never could have, simply because it was too soon and the requisite power wasn't there yet.

A phone is more heterogeneous today than I think even supercomputers were in the 1990s, and the trend is only increasing diversity, with neural processors on the near horizon and quantum on the far horizon (as it seems quantum processors are far more likely to end up functioning as a sort of fancy "accelerator card" than their own CPUs). & honestly even CPUs are almost viewable as their C subset and their vector processing subset, and even "within" the same CPU the two don't always cross particularly gracefully.


Keep in mind C is just basically assembly but encapsulated in a pretty package. If you create small executables and dump out the generated unoptimized assembly you'll be surprised just how simple it is. It pretty much just encapsulates the ideas of System V binary compatibility and then keeps going. So developing a language from scratch and skipping the use of C would really likely be just causing yourself more pain than you need as you're going to have to replicate all of the things C does anyways, so why not reuse what the experts have already done. And you get a lot of cross CPU and cross compiler portability.

What you want is the idea of bootstrapping. Write your compiler in C, then as your language specification is developed enough, dogfood your own compiler. Write your compiler in your new language, then compile itself. This is called bootstrapping and is how many languages are developed. Once you are fully bootstrapped you can drop C altogether.


> It gives you low-level access to a machine that your real machine actually has to somewhat laboriously emulate

Isn't C the language (x86_64) processors are designed to be fast for? Sure they added a large amount of abstractions but since they were made for C is there any language where the processor doesn't have to laboriously emulate?


> Isn't C the language (x86_64) processors are designed to be fast for?

Yup

I mean they also optimize for Java and JS and .NET and probably Swift and Rust.

But C still takes precedence, I bet


> Isn't C the language (x86_64) processors are designed to be fast for?

Nope. They compete on performance in C++ (games mostly), Java (enterprise SKUs, but same core architecture), and JavaScript (browser benchmarks even though raw JS performance is a very small part of browser responsiveness...)


Nothing added to machines since the invention of C is foreign to C. In fact, C is hardware's most favored customer. Chip designers tend to favor tuning for traces of instructions generated by C compilers. Some architectures, like RISC-V, are so overtuned for C and nothing but C that they forgot to add some instructions (like add with overflow check).


>they forgot to add some instructions (like add with overflow check).

If you actually read the spec, you would have found that they didn't "forget" these.

They carefully studied them and judged the encoding space is better used elsewhere.


I did read the spec. They did forget them.

The “studies” failed to consider non-C languages. These people had no clue how widespread overflow checking is and how much more widespread it’s set to become because of the security upside.


Multiprocessing. Atomics. Vectors. GPGPUs. All foreign to C when they were introduced.


I don't think any of those are foreign to C since:

- All of them were designed with C in mind, so much so that in many cases the C implementation of those features was the first implementation of them. The first SMPs were programmed in C with C APIs. The first time I did atomics was in C. When vector APIs are introduced, they're usually exposed to C first. Etc.

- All of those features fit more elegantly into C than any other language. C runs on GPUs quite naturally, while most other languages don't run on GPUs at all. So, the things you list are examples of features that are more native to C than they are foreign.


Your friendly wg14 member here. It is a low-level language, but it is not a portable assembler. If you think that what you write will have a one-to-one relationship to assembler, you will run into trouble. If you want a deeper dive into how these things can trip you up, watch: https://youtu.be/w3_e9vZj7D8


C programmer and fan of yours.

I agree with you, but if you could convince WG14 to remove a lot of the stupid UB, that would be closer to the case.

(I know you're trying from your "One Word Broke C" article. Which, by the way, is putting up a server error right now.)


> it is not a portable assembler

And it never was!

Just keeping this point in mind would reduce the plethora of discussions about undefined behaviour to the essential, i.e. the useful discussions, i.e. the 0.1%.


Opinion is divided on this. My best guess is that ISO C was never a portable assembler, but the C programming language before standardisation broadly was, and that's how people hold both positions as self evidently true. Different definition of "C".


Programmers are supposed to trust official interfaces rather than undocumented implementation details ;-P


C would have been great as a portable assembler. E.g. if a syntactic + mapped to the hardware `add` instruction, that's pretty predictable! But it doesn't; it maps to the hardware `add` modulo compiler optimizations (like folding and strength reduction, which are done assuming overflow and other tricky parts are UB). Basically, everywhere UB is permitted by the spec, it's so compilers don't have to handle the tricky cases, don't have to give semantics for buggy programs or even help in debugging, and can make what would be unsound optimizations if the operations truly represented the target CPU's "weird" add semantics.
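
The classic example of what that buys the optimizer (and costs the programmer) -- most compilers will do this at -O2, though nothing obliges them to:

  #include <limits.h>

  /* Signed overflow is UB, so the compiler may assume it never happens and
     fold this to "return 1" -- even though on a two's complement machine
     x == INT_MAX would make x + 1 wrap around to INT_MIN. */
  int always_bigger(int x)      { return x + 1 > x; }

  /* Unsigned arithmetic is defined to wrap, so the comparison must stay. */
  int maybe_bigger(unsigned x)  { return x + 1 > x; }   /* 0 when x == UINT_MAX */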


Just toss enough compiler flags at clang and make sure to occasionally use inline asm snippets to throw off the compiler's optimizations.

Then you're GTG


Depends on what you mean by "portable assembler". It is exactly that in a lot of ways, but exactly not that in others.

I think it's more useful to say that C is a portable assembler, than it is to say that it isn't, considering how it's used in practice and the sort of nasty things C compilers do in order to make that possible.


The author is playing a semantic game.

I don't think the author's point is that "C is not a good language for systems programming." You are not going to have an equivalent to volatile int *dma_register = SCATTER_GATHER_BASE; in Haskell.

The author's point is that the drive to make C and other "model the von Neumann machine" languages execute quickly has made the compiler very complicated (the author is implying that "low level requires simple compiler") and that processors built to make such code run quickly are also very complicated. And those complications carry costs.

In many ways this is a "call to programming model action" and cites GPU as illustrating the potential when "new programming model" and "silicon to support it" are done in concert.


"Low-level" is a word with multiple meanings.

The original one is the one the article uses: low-level languages are non-portable and tied to the hardware on which they run, and high-level languages can target multiple platforms. Under this definition, C is absolutely a high-level language.

My complaint would not exactly be that the author is playing semantic games; it would be that they are clinging to archaic terminology in a way that does more to confuse than enlighten. The "generations" taxonomy is generally more descriptive.

  1st: Machine
  2nd: Assembly
  3rd: General-purpose
  4th: Application-specific
The 3rd/4th distinction gets a bit muddied sometimes, and back in the 80s and 90s people talked about a 5th generation that never really took off. But a couple (I think) clear examples of 4GLs are SQL, HyperCard, and Mathematica.

What I like about that approach is that it mostly breaks languages up according to fairly clear distinctions about when you would use them. And then we can use "high/low-level" as a relative term, where higher-level languages tend to do more to abstract away the details of what the computer is actually doing. That does mean that higher-generation languages tend to be higher-level; all we lose in doing it that way is the ability to have silly arguments about where to place a completely arbitrary (and, frankly, useless) dividing line.

I also like that this way we can recognize .NET IL, WebAssembly, and Java bytecode as very high-level 2nd generation languages, which, at the very least, is fun.

Oh, and Forth is a 3rd generation language. Fight me, Chuck.


5th generation was the label under which the Japanese government threw a lot of money at Prolog and expert systems. It wasn’t a technically-driven distinction from the 4th generation, but rather a wish about what would happen if the project succeeded. 5GLs came about from language designers bidding for research money, saying, try our language, it’s better than Prolog!


>use it professionally

I think this post goes way, way, way above the boringness of day-to-day jobs.

Yea, this post is not about how to use a hammer, but more a curious consideration of whether using hammers everywhere is limiting us (C's design)


> Yea, this post is not about how to use a hammer, but more a curious consideration of whether using hammers everywhere is limiting us (C's design)

Maybe it [EDIT: the post] is, but the title is obviously nowhere near accurate - if C is not a portable low-level language, what on earth is?

[1] It gets reposted everywhere so often I have read it multiple times, and the one thing in common I see is how every know-it-all crawls out of the woodwork to comment on the title, as if the title was something new, deep, profound or even correct.


C is only portable between systems which emulate a PDP-11 at the hardware level, and only if you don't use any compiler-specific extensions.

If you use syscalls or work across different breeds of operating systems (UNIX, POSIX and Windows are not compatible with each other), you need to rewrite or wrap the relevant parts, or write them beforehand inside ifdefs, to be able to "port" the code between systems.

The gist of the piece is that hardware is evolving to please C's programming model, hiding all the complexities C is not aware of, and behaving like a PDP-11 on steroids. This is why we have a truckload of side-channel attacks on x86 to begin with: to "emulate" PDP-11s faster and faster.


It's not that faithful to the PDP-11, either. The PDP-11 has a unified integer division/modulo instruction (and it operates in double width: it takes a 32-bit dividend and a 16-bit divisor and produces a 16-bit quotient and remainder, just like x86), it has double-width integer multiplication (again, just like x86), it has instructions for addition/subtraction with carry — none of that is available from (standard) C, and it's quite a pity. And also, while the PDP-11 has built-in support for post-increment and pre-decrement addressing for pointers, it doesn't have built-in pre-increment or post-decrement.
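
You can reach some of this through compiler extensions rather than the language itself, which is rather the point -- e.g. GCC/Clang's overflow builtins and __int128 (the latter only on 64-bit targets):

  #include <stdbool.h>
  #include <stdint.h>

  /* Carry-flag-style addition: the builtin reports the wrap instead of the
     standard's "UB for signed, silent wrap for unsigned". */
  bool add_u64(uint64_t a, uint64_t b, uint64_t *out) {
      return __builtin_add_overflow(a, b, out);   /* true if it wrapped */
  }

  /* Double-width multiply, like the hardware's widening MUL. */
  void mul_64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
      unsigned __int128 p = (unsigned __int128)a * b;
      *lo = (uint64_t)p;
      *hi = (uint64_t)(p >> 64);
  }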


I think we'd have the side-channel attacks on x86 even if we wrote in assembler - unless we wrote the assembler specifically with an eye to preventing (the known kinds of) side-channel attacks.

Put differently, I don't think the side-channel attacks would disappear if we wrote in Rust or Haskell or Agda.


The side-channel attacks are not a result of programming in C, but of the design of the hardware, which preserves the view of the system that C compilers expect.

All programming languages, regardless of their type (imperative, functional) or interfacing method with the system (JIT, interpreted, compiled) are not immune from these attacks, because it's the hardware which is designed to emulate PDP-11.

In other words, all programming languages target a modern PDP-11 at the end of the day. If hardware showed all of its tricks (esp. cache management, invalidation, explicit prefetching, etc.) and lacked speculative, out-of-order execution, these problems would go away, but getting the highest performance would become much harder and more complicated, and even impossible in some cases.

Intel tried this with IA64, with a "No tricks, compiler shall optimize" approach, and it tanked to put it mildly (esp. after AMD64 came out).


Let's say we have two chips. Chip A requires the programmer to handle all the "magic" stuff. Chip B is like current chips; it hides that stuff. Chip B is subject to side channel attacks. Chip A likely is also unless the programmer is very careful.

Which chip would have sold more? I assert that chip B would have, by a massive volume, because it didn't require the programmer to mess with all that stuff.

So I don't think that it's fair to say that the chip is trying to look like a PDP-11 because of C. I think it's trying to look like a simpler chip, so that mere mortals can program it and still get most of the maximum performance.


I think it depends on the toolchain. Itanium didn't sink because of the optimization it needed, but because of the toolchain, which couldn't do all that optimization.

So, if a complex processor comes with a toolchain which does all the tuning by itself, I think it can sell equally well, because the burden will not be reflected on the developer, again.

So, I think popularity of the language itself has a great impact on hardware design.

The AMD Athlon XP had an "Optimized for Windows XP" badge on it. GPUs are built upon the programming model OpenGL and DirectX put forward. Modern processors are made to please C and its descendants, because it's the most prominent programming model.

Lisp even tried to change this with "Lisp Machines", and they failed, because Lisp was not mature/popular enough at that point.

So we can say programming model drives hardware very much.


I believe that the point is the processor was designed to please C (by emulating PDP-11). And this design complicates things immensely, which is how we end up with side-channel attacks on our processors.


>if C is not a portable low-level language, what on earth is?

This question doesn't have to have an answer. The author of TFA apparently believes that a low-level language is one that effectively and clearly exposes the execution model of the hardware to the programmer. Under this definition, no widespread language (except assembly) is truly low-level, and possibly none are.

Which, for what it's worth, is also what I was taught in school. C was consistently described as a high-level language by my professors, even if it is "lower-level" than almost everything else.


The real question is whether you would even want to use a language that effectively and clearly exposes the execution model of the hardware. Not even most assemblers do that, as architectures give stronger guarantees than would be implied by the microarch execution model.

Some machines do expose the microarchitecture (or rather, there is no architecture other than what is implemented in hardware by a specific revision) and rely on install-time or even JIT code specialization. But especially on these machines it would be insane to try to manually target them, as you would have to rewrite your code for every revision.

So, targeting the effective execution model of the machine is out of question. You need an abstraction. The question is whether C is the correct abstraction.


The post argues that there are no portable low-level languages, including C.

i.e. truly low-level languages can't be portable and are bound to the architecture.


It's plausible that a language could expose some general logic behind instruction-level parallelism and cache management — even register renaming — without being explicitly tied to the way one particular architecture does that. I have no idea how to design such a language, but from 10000 meters I think it could be done.

I think the author oversteps his case by suggesting that ILP is an abomination that exists to preserve the availability of C-like languages. In my experience, many algorithms seem to naturally lend themselves to ILP, and I often find myself wondering whether I have typed them in so that these five lines will in fact run simultaneously. One common flaw in critiques of the common C compiler model is that they all seem afflicted by a nostalgia for Lisp machines, when the space of unexplored possibilities is so much larger.


Only when taking into account language extensions that are compiler specific and not part of ISO C.

Also a reminder that any language can have toolchains with extensions exposing low level features.


Funny how the top comment on "hacker" news is an *unsubstantial* comment about how, actually, TFA is wrong.

Even worse, adding a comment on how actually you shouldn’t be curious and understand how things really work.


There's a lot of moaning and crowing here, but no real substance. If one were to design a CPU and its ISA from scratch, what would you do? Instructions, control flow, memory, out-of-order execution, caches, hierarchies, branch prediction, you'd probably end up with all of it down there anyway. I don't get the point about GPUs. Real applications aren't matrix multiplies and embarrassingly parallel numeric algorithms, they run general purpose PLs.

Which basically then boils down to ISA design. If you could design an ISA from scratch for the hardware you design from scratch, what would you do? Well, there aren't that many options. Stack machine, dataflow machine, VLIW machine. All of those have been tried and the modern superscalar CPUs kick their butts on every metric except power.

The whole article kind of misses the point anyway. We should probably be running higher level languages for most things anyway, which shouldn't be overly constrained by hardware design. For everything else, 100% serious, there is WebAssembly, and hardware ISAs will fade below this level of abstraction in the fullness of time.


Can you elaborate?


I disagree with the author's point that CPU instruction sets should expose more of the CPU's implementation. This has been tried in the past and failed to work long-term. One example of this is branch delay slots from some RISC processors (such as MIPS and SuperH) designed in the late 80s and early 90s. For those unfamiliar with the concept, it basically means that the instruction after a branch instruction will get run regardless of if the branch was taken or not. This was a short-term benefit, as it meant the job of avoiding pipeline stalls after a branch was left to the programmer, so the processor could be simpler and cheaper than designs without them. However, as time went on, the processor designs evolved with more complex pipelines, so the single instruction wasn't enough to cover the branch delay. Instead, it became a legacy issue that future processors had to deal with for compatibility reasons and made their branch prediction and pipeline logic more complex.


I don't think he's saying "expose random implementation details". Exposing the wrong details would obviously be bad. He's just saying c's model has significant shortcomings in the world of modern CPUs.


> he's saying c's model just doesn't work well anymore.

The author argues that C's model does not fit a model he defined himself and then claims is the same model used by everyone.

After going through the article, I'm left with the impression that the author's thesis is flawed and relies on a series of strawman arguments. Among the strawmen we find:

* arguing that speculative execution "were added to let C programmers continue to believe they were programming in a low-level language".

* claiming that "modern processors are trying to emulate "the same abstract machine as a PDP-11"

* "Creating a new thread is a library operation known to be expensive, so processors wishing to keep their execution units busy running C code rely on ILP (instruction-level parallelism)."

* etc etc etc.

I don't think this opinion piece is grounded on reality, let alone is an objective take.


He doesn't define a model. He just discusses the gap between c's model and a few details of a modern CPU and talks about a few other models.

In your opinion, why was speculative execution added? It doesn't seem off base to suggest it was to enable programmers to continue writing single threaded applications while increasing execution speed.

In your opinion, what is wrong with the statement that modern processors are trying to emulate an abstract machine like PDP-11? To me it seems largely right.


Counterpoint: Allowing explicit control over prefetching or providing an additional "engine" which brings in data to caches in a pattern you like is esp. beneficial in real-time and latency sensitive applications.

I listened to a talk where developers used such a subsystem in the given processor. Without it, they would spend 95% of their time window just copying the data; by requesting the data ahead of time via that engine, they only used 10% of their time window to get the data, and accomplished what they wanted in ~50% of the time window they had, leaving tons of time for further features and improvements.

If x86 had such a feature, I'd use it in my Ph.D. to request the matrix data I'm accessing ahead of time, because the pattern I use is not linear but well defined. Now, if I want to accelerate that code further, I need to reorder my matrices to make the prefetcher happy, and refactor the whole codebase from top to bottom.


PREFETCHh: Prefetch Data Into Caches

https://c9x.me/x86/html/file_module_x86_id_252.html


Generic x86 does not, but the SSE extension (present on all modern CPUs) does have it! And you can naturally use C to call it via intrinsics, because C is low level after all...

https://stackoverflow.com/questions/48994494/how-to-properly...
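
For example (__builtin_prefetch is the GCC/Clang spelling; _mm_prefetch from <xmmintrin.h> is the Intel one). The hint may be ignored, and on big cores with good hardware prefetchers it often doesn't help:

  struct node { struct node *next; int payload; };

  /* Pointer chasing is exactly the pattern hardware prefetchers can't guess,
     so hint the node after next while working on the current one. */
  long sum_list(const struct node *n) {
      long s = 0;
      while (n) {
          if (n->next)
              __builtin_prefetch(n->next->next, 0, 3);  /* read, high locality */
          s += n->payload;
          n = n->next;
      }
      return s;
  }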


C# also has the same intrinsics, I guess C# is low level after all...


And it's mandated to be available on all x86-64.


Is the pattern linear for some stretches or something like that? I’m wondering if reordering your matrices is the right strategy anyway—you want to load a whole cache line at a time I guess, being able to program the prefetcher wouldn’t get you that, right?


x86 does have a prefetch instruction, and it’s available as a compiler intrinsic in gcc and clang. But it’s really difficult to get a performance improvement from explicit prefetching on modern big CPUs because their dynamic prefetchers are very clever. Your data access pattern needs to be something the prefetcher does not know how to match, and you need to be able to prefetch addresses before the speculator would get to them by itself. Unlikely for matrices unless they have a very weird shape.


You may like the x86 instruction called 'prefetch'.


Wasn't this also similar for itanium? Where the branch burden would be on the compiler?


> One example of this is branch delay slots

That's the only example I'm aware of. Are there others? (I'm sure you could do it poorly if you wanted to, but how much history is there to extrapolate from?)


Vector instruction sets expose too much about the size of the CPU. They have got bigger over the years, 128 - 256 - 512 bits, 8 - 16 - 32 registers, and now Intel is struggling to fit them comfortably into their small efficiency cores and retain binary compatibility with their big performance cores.


I'm unimpressed that x64 assumes function calls write the return address to stack memory instead of passing it in a register. It means you have to use the stack memory in order to benefit from the call/ret prediction hardware, even in functions which don't otherwise need to allocate any stack memory.

In general leaking microarch weirdness matters less if you don't have backwards compatibility.


I feel, low to high level is a spectrum, not a binary. C is arguably in the lowest third of languages, exposing you to a lot of machine primitives like memory and thread management. It may not be as low level as assembly, but it is arguably lower level than Java or Go, and definitely nowhere near the Pythons and JS of this world.


> exposing you to a lot of machine primitives like memory and thread management

Except it doesn't really, the standard leaves most of the really machine-dependent parts undefined; only very few things are left implementation-defined.

Plus, of course, C is quite unsuitable for any platform that uses segmented memory/non-flat addresses (which are things that are trying to come back in vogue, but C's widespread use really, really hinders that).


> Except it doesn't really, the standard leaves most of the really machine-dependent parts undefined

Well that's because it is low level and, especially, simple, and doesn't try to abstract things.


It's a certain kind of low level - specifically a PDP-11 kind of low level.

If your hardware is significantly different, it only looks low level. In reality plenty of mapping and conversion goes on behind the scenes - sometimes with hilarious consequences.


> and doesn't try to abstract things.

The C standard is a description of an abstract machine. You get UB and unexpected miscompilations, because the optimizer is not evaluating how your code runs on the machine you're compiling for, but simulates running your code on the weirdly abstract C machine, one that can't overflow signed integers.

And C abstracts away almost everything about stack, stack frames, and all the complexities of memory and cache hierarchies. They are abstracted to be uniform linear address space.


Can you or someone expand further on that? Which platforms are trying to use segmented addressing, and what benefits does it have?


CHERI project [0]. Look at figure 2.1: it's an improvement and further development of the segments of yore but the origins are quite visible.

[0] https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-941.pdf


C supports CHERI quite well. It’s modern languages used to flat address spaces and 64-bit pointers that are having a hard time, because they defined a lot of “obvious” behaviors that the standard leaves flexible.


We did ok with the 8086 processors, no?


Yes, juggling near and far pointers was somewhat annoying but then Intel, as a part of the 32-bit transition, modified their ISA to be a more pleasant target for C implementations.

Incidentally, C never really became popular on 6502 because, arguably, that ISA is somewhat hostile towards efficient implementations of higher-level languages.


True. The code generated by https://cc65.github.io/ is pretty decent but there are a few places where hand-rolled assembler will perform much better when you need it. Although I've made things for 6502-based systems in C with this handy compiler (thanks cc65 contributors!).

Is there something intrinsic to how C handles addressing that makes segmented architectures more painful than they ought to be? Or maybe is there a language where segmented addressing is easier?

I hadn't really thought about it in a while. :)


Lisp is fine on 6502 though :-)


By that do you mean exposing a non-uniform memory hierarchy as separate addressable spaces (but with coherent views from each hardware thread) or something like thread-local scratch pads?

Either way, C is equipped OK for that - at least as well as most systems languages (C++, Rust, etc.) - simply because dealing with allocation and raw addressing (at least raw within the process memory space) is a fundamental part of the experience. Throw in a few compiler extensions (because you'll need to change the compiler to make use of this anyway) for things like where to locate static allocations, and use library functions that add dynamic allocation in specific spaces. It will get hairy, but it's at least possible with some very careful programming.
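
For the static side, the usual shape of those extensions today is something like this (GCC/Clang section attribute; the section name and the allocator are made up here, and a linker script has to map the section onto the actual scratchpad/TCM region):

  /* Ask the toolchain to place this buffer in a specific memory region. */
  static unsigned char dma_buf[4096] __attribute__((section(".tcm_data")));

  /* Dynamic allocation from a special region is then a library affair:
     a region-specific allocator handed the bounds of that section. */
  extern void *tcm_alloc(unsigned long n);    /* hypothetical */
  extern void  tcm_free(void *p);             /* hypothetical */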


Honest question - is there any language at all between C and Assembly? Because if there is, I haven't heard of it. For that reason alone, my mental model has always been "C is the lowest you can go before hitting direct instructions to the processor."


I would say Forth is lower level than C. The mental model is a two stack machine plus memory, rather than a PDP-11.

And it is very reasonable if you are under 50 years of age, that you haven't heard of it.


Started programming in '81, so technically I could have known it

https://www.timexsinclair.com/product/zx-forth/


Conceptually, you can consider LLVM IR to be such language, and there are people who use it that way.


There are some cleverer assemblers that allow you to program in assembly while still being able to do stuff like loops without too much effort https://en.wikipedia.org/wiki/Assembly_language#Support_for_...


While staying portable across architectures, probably not. But you can make a little language that's nicer to use than assembly for a particular CPU.

COMFY-65 is a compiler for a small Lisp language that provides all non-branching operations of the 6502 processor as primitives (e.g. tests for carries, overflows, zero, and negative; set decimal arithmetic mode; etc.). However, programs still consist of subroutines, loops, and tests, with no "go to label" construct provided. It's surprisingly simple and, I would say, elegant.

Here's the PDF that outlines it: https://dl.acm.org/doi/pdf/10.1145/270941.270947


C makes lots of things undefined behavior which are perfectly fine in assembler — reading a stack location without first writing to it, doing signed integer arithmetic that overflows, treating the same memory location as different types.

Also there is quite a lot in modern assembler that you can’t really get to from C, like prefetch and cache flushing instructions.


That last bit is really interesting. Do newer languages make use of those features? Sorry I’m fairly ignorant of this level of the stack.


Newer languages don't make use of any of those features, to my knowledge. They're only available in assembly, and only on modern processors. Because C was made on and designed for hardware that was crappy even in the 1970s, which didn't have a cache or do out of order execution, C doesn't fully reflect the capabilities of modern processors. That's what they're implying with the last sentence, I'm pretty sure.


> is there any language at all between C and Assembly?

LLVM or QBE, for example.


Yes, WebAssembly is higher-level than machine code but lower than C.


1990 is calling:

"C does not behave as a typical ‘high-level’ language, because it offers a number of features which are more normally associated with ‘low-level’ languages such as assembly language. These include the ability to write data to and from particular memory addresses, facilities for operations on the contents of memory locations, and instructions for incrementing and decrementing integer variables ... Thus C allows the programmer the flexibility and efficiency of working at low level with the advantages of working at high-level, for example the more advanced data structures and program flow controls typical of today’s computer languages. For this reason, C is sometimes described as a ‘high-level low-level language’ or as a ‘low-level high-level language’." - https://archive.org/details/computerprogramm0000ford/page/13...


You have high expectations for accuracy in an article titled "C Is Not a Low-level Language".


I think this statement at the end of the article - 'There is a common myth in software development that parallel programming is hard.' - is misleading. Granted the author denotes explicit situations where it is not hard, but if it's applicable in general, then it is hard. Not a common myth.

Is parallel programming hard? Without any further details or specifics, yes it is. It is far harder to conceptualize code instructions executing simultaneously, than one-at-a-time in a sequential order.


> Is parallel programming hard? Without any further details or specifics, yes it is. It is far harder to conceptualize code instructions executing simultaneously, than one-at-a-time in a sequential order.

If I program (map inc [0 1 2 3]) is it really any more difficult to conceptualize the (inc ) function performing on each element sequentially than in parallel?

I think the difficulty of parallel programming is less innate and more two fold:

1) languages often default to sequential so to do async requires introducing additional primitives to the programmer

2) knowing when to effectively use parallel programming

When I have a list or stream that I know has independent elements that require wholly independent calculations, then parallel programming is straightforward.
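
For instance, a minimal sketch in C with OpenMP (assuming the elements really are independent — roughly the imperative version of the (map inc ...) example above):

    void inc_all(int *xs, int n)
    {
        /* Every iteration touches a different element, so the loop can be
           split across threads without any coordination. Compile with -fopenmp. */
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            xs[i] += 1;
    }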

Where people get hung up is trying to shoe horn async where it is either unnecessary (performance is equal or worse than sequential) or introduces breaking behavior (the computations are in fact interdependent).


Most problems are not embarrassingly parallel.

(Fun fact: I once had someone call HR on me because they didn't know embarrassingly parallel was a technical term, and they thought I was belittling them)


Prefix scan is not embarrassingly parallel. Yet OP's statement still works when you change it to scanl (+) 0 [0 1 2 3]


That requires + to be associative. And scan is one of the core parallel skeletons, so obviously if you express everything as parallel skeletons, parallel programming remains manageable.
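
As a concrete version of that, here's the simpler cousin of scan — a reduction — sketched in C with OpenMP. The associativity of + is exactly what lets the runtime split the work into private partial sums and combine them in any order (a sketch, not a recommendation):

    long sum(const int *xs, int n)
    {
        long s = 0;
        /* reduction(+:s): each thread accumulates a private partial sum, and the
           partials are combined at the end — only valid because + is associative. */
        #pragma omp parallel for reduction(+:s)
        for (int i = 0; i < n; i++)
            s += xs[i];
        return s;
    }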


I agree that if we define the individual instructions to always be wholly independent, then sure, it is more straightforward.

While I'd probably argue that it is still more difficult to conceptualize, the statement we're discussing is presented as broad and general. I'd call it far less misleading if it said something like:

There is a common myth in software development that parallel programming *has* to be hard.


The whole reason async is even a thing is due to slow, side-effect producing operations. Of course pure functions are easy to parallelize.

I don't think folks so much "shoe horn async where it is unnecessary" as the red/blue problem causes async code in most languages to spread.

Or by "async" do you just mean concurrent code? I'm reading "async" to mean lightweight coroutines or similar.


> Or by "async" do you just mean concurrent code? I'm reading "async" to mean lightweight coroutines or similar.

Yeah, my bad, I was utilizing a colloquial definition of a term that has a technical definition in a technical conversation. A lamentation on the lossiness of language.

I guess I assumed we were talking about something other than in terms of red/blue because I'd argue red/blue's "hard"ness transcends myth to mathematical fact.


I don't think this is right. Thinking about operations on matrices is not complex. Defining how a single agent should act on its environment is not complex

When you say "without further details or specifics" you're saying "using my default framework of a C / C-descendant world"

The author's point is that sequential programming is one type of simple programming, but it's not the only type, and it doesn't map easily to modern hardware


The author's article generally focuses on C (and possibly descendant languages), but the phrase I am critical of, does not. Furthermore, I explicitly consider a very broad selection of programming languages (many not C-derived) in my opinion. The author's phrasing, I'd argue, paints the entire concept of parallel programming as not hard.

There's some irony to the fact that you re-interpret my opinion as being very specific to C and (indirectly) posit that - in that specific case - parallel programming is hard, and then yourself go on to select a very specific case where parallel programming is not hard, because some matrix operations are independent.

I agree that there are languages that are explicitly built to make parallel programming easy. But in general, and not just related to c or c descendant languages, parallel programming is hard.


My point (and I think the points of others responding to you) is that parallel programming is not always hard. That's also what the author is saying.

The common myth - you're doing parallel programming? That sounds hard

It's not always hard. It really isn't! You don't need to be a genius or an expert to write parallel code.

Maybe where we're getting caught up is Cassie K's comment on ml engineering. You don't need to know how to build a microwave to use a microwave. In the same way, you don't need to be a genius or some deep expert in distributed systems to use abstractions that parallelize your programs

To write a parallel program does not require that you know what a mutex is. It just needs you to understand some simple algebraic (6-8th grade) properties about your functions (and, in fact, for library functions, they can be annotated as associative)

There is a broad spectrum of parallel programs. Somebody using a web server implementation? They've made a parallel application

Somebody running tensorflow or pytorch? Also parallel! Even for simple stuff!

You could be a beginner programmer and be taught to make parallel programs without understanding distributed systems. It's not always hard. It's not generally hard. The complex bits are hard. The simple bits use 8th grade math.


> My point (and I think the points of others responding to you) is that parallel programming is not always hard.

Sure, and even more people commenting appear to be of the mind that it is generally hard.

> That's also what the author is saying.

It's not what author is explicitly saying in the statement I'm addressing if you re-read my original comment. There, the author isn't saying that it's not always hard, they're implying that it 'in general' isn't hard.

From your arguments, it would seem you think anything that actually runs in parallel (regardless of whether it was programmed as such) can be considered 'parallel programming' and from that perspective, sure, it is super easy. But with that kind of reasoning, you can argue that anyone who only knows how to drive cars with automatic gears is actually a gear-shifting expert and shifting gears is really easy, because it happens automatically for them.


Agreed. The potential state space in parallel processing is a lot larger, which makes it more complex, which makes it harder.

That Erlang exists and people use it successfully does not mean the harder things aren't hard.


Concurrent programming - doing lots of different things at once - is hard. It is hard to use concurrent programming infrastructure (processes, threads) to implement parallel algorithms. Parallel programming - using lots of processing elements to work on the same thing at once - is much easier if you have the right abstractions.


I wouldn't say that it's hard to conceptualize instructions executing in parallel, but it's hard to coordinate those parallel subtasks in an efficient and correct way - except in some use cases, like eg matrix multiplication.


Isn't it distant from how humans work? We can't really do parallel, can we? And programming is translating human instructions to computer instructions, and translation is harder between more distant languages.


A factory is parallel

Or do you mean an individual can't do things in parallel?

Like.... Pushing all of those grocery carts in a long line is moving them in parallel

Or do you mean processing? Like thinking?


This article is correct that your computer is not a fast PDP-11 but wrong that this has anything to do with C. Eg, "another core part of the C abstract machine's memory model: flat memory. This hasn't been true for more than two decades."

This has nothing to do with C. The hardware insists on this abstraction. And it's a good job too, otherwise your programs would stop working when moved to a machine with a different cache.


The article argues that the hardware’s insistence on this abstraction is in large part _because_ of C’s dominance.


If only that were true. Lots of languages that have nothing to do with C also did it. It's just much easier to program with a unified memory model, that's all there is to it.


If by unified memory model you mean "flat address space" then no, it's not. The moment you need two (or more) dynamically-sized arrays, you need to implement realloc with memmove in the unfortunate case. In a world where each array could have its own segment, this problem doesn't arise because they simply cannot intersect, and realloc boils down to increasing/decreasing a segment's extent.
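
A small illustration of that flat-address-space cost (just a sketch): growing one of the arrays may force the allocator to move it and copy the contents behind your back.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *a = malloc(16 * sizeof *a);
        if (!a) return 1;
        uintptr_t before = (uintptr_t)a;   /* remember the old address as an integer */

        int *b = realloc(a, 1u << 20);     /* growing may collide with a neighbour... */
        if (!b) { free(a); return 1; }

        /* ...in which case realloc has already memmove'd the contents elsewhere. */
        printf("moved: %s\n", (uintptr_t)b == before ? "no" : "yes");
        free(b);
        return 0;
    }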


In the good[1] old days of x86 segments, you still couldn't assign a segment to each array, as segments were a limited resource. Something still has to do the mapping of the segments to physical, linearly addressable memory.

[1] spoiler alert: they were not good.


Wouldn't that just move the "magic" down to the kernel or the MMU which has to do the realloc and memmov hidden from the programmer instead of the programmer being able to chose when and how to do it?


Yes, but here's the thing: the virtual memory system already does this kind of "magic", maintaining the mapping of virtual addresses onto the physical ones, and it doesn't need to actually move data between physical pages at all since the mapping is discontinuous.


Many of those languages indirectly have lots to do with C – even if you ignore the obvious problems like "C is the only ABI supported by most OSes, C FFI is the only cross-language interface supported by most languages and thus most libraries", there are more subtle influences: copying e.g. the (very expensive to implement in hardware) cache semantics of C usually "costs" languages nothing, because the hardware is already there, due to C. Not copying them happens, if both language and hardware get developed at the same time, but it's much rarer.

You see similar problems with things like vectorization – Rust was in a good position to define semantics more amenable to ARM SVE / Risc-V VE, but all existing SIMD libraries are written for C and x86 semantics, so that's what Rust is currently stuck with, as are most other languages.


What “C semantics” are baked into Rust’s SIMD libraries?


Look at the github/reddit/etc discussions around getting variable-length vector extensions like SVE/RVE properly supported; all the libraries are designed for x86 semantics, which are the way they are to make them easily implementable in C.


I’m still confused. What are “x86 semantics”? What does this have to do with C? I understand if SIMD libraries were designed for e.g. SSE or AVX first but I’m not sure how that relates here.


But we have/had architectures that expose parallelism at the instruction set level, e.g. Itanium and Graphcore. And the PS2 made cache management the programmer's problem. I don't think any of these experiments proved successful in the long run.


Hence the observation that every architecture eventually converges to cache-coherent NUMA if physically possible.


People have written C code that dealt with more complex memory schemes.

The language matters less than the fact that there's a lot of existing code around. That code needs to keep working.


Yeah. A lot of the things that make C not low level in the terms of this article happened on IBM mainframes decades before x86:

* tiered memory hierarchy pretending to be flat RAM

* CPUs that are much bigger than the ISA suggests, and which have out-of-order and speculative execution so code can make good use of their resources

* optimizing compilers that further decouple the program as written from its execution

IBM was working on this stuff in the 1970s, well before the rise of C. It’s fair to criticize the model and seek out alternatives, but it isn’t fair to blame C.


Flat memory is a bad thing for performance. Especially cache-coherent flat memory. It is convenient for programmers.


Yeah, agreed. My comment should really have been, "I'm glad modern ISAs are high level because low level ones would be a massive burden". And, "It isn't C's fault that low level ISAs are a massive burden".


This is now five years old, and while obviously the premise is more correct than ever (computers don't look much like a PDP-11 architecturally), the conclusion ("imagining a non-C processor") seems less strong. We are seeing (and were seeing, even in 2018) a strong separation between linear and highly-parallel code, most obviously in the rise of Python for machine learning and scientific computing. It is still very convenient, when performance isn't paramount, to write in a single-threaded style and to a flat memory model. When performance is important, it's then appropriate to switch to a language better suited to parallel programming -- one of the computational-graph languages in something like Pytorch, some other set of primitives on top of CUDA, or even something more experimental like Futhark. Performance-critical code has always had its domain-specific languages, and they seem to be becoming more common, not less, and the hardware is being built to match -- as the CPU+GPU combination common to desktop PCs, as vector extensions to x86 (which have their own primitives making, essentially, a DSL of their own), or things like the M1, which bolt a GPU to a CPU to give both high-speed access to the same system memory.

In other words, perhaps what's really out of date is not C, but the concept of a general-purpose language which is equally well-suited to any type of task.


If the sophistication of modern CPUs makes C no longer a "low level" language, then the same applies to assembly language .. things like out of order execution and register renaming apply there too.

I guess the sophistication of compilers in recent decades adds to the argument since even the assembler (object code) the C compiler generates isn't going to be as expected due to hoisting things out of loops, common subexpression elimination, etc, etc.

Still, I think the notion of C being a "low level" language is a useful label ... if not, we need to retire this designation altogether.


Assembly is just a bit of lazy macro expansion, and the computer's memory addresses are a made-up concept.

That's indeed an abstraction over the real computer, but it's a lot fewer things piled up on your virtual computer's model than C. Current assembly is about on the same level as C was when it was created. Current C is so high-level that it doesn't provide any functionality you can't get with a better, more modern language.

But yeah, I do agree that "low" and "high" level aren't useful names nowadays.


I feel like the article advances on two different lines of argument that are difficult to reconcile. The first is that C is not a low-level language, and gives examples like struct padding and signed overflow being undefined behavior. That part makes sense to me, and the argument seems constructive: it seems to propose language features for a hypothetical "real low-level" language.

The second argument is that, because of the dominance of C, CPU designers have had to bend over backwards to create something that runs C naturally. Here there are examples like register renaming, flat memory, caching, etc. This argument also makes sense to me, but in the context of the first argument, and the title of the article, I'm not sure how it relates. Taken at face value, this seems to imply that it isn't even possible to create a low-level language on modern hardware, and even machine code is "high-level". This seems to argue that we would have to create a new generation of hardware that exposes much more complexity to the instruction set architecture, and only then could we design a low-level language to take advantage of that.

I think both of these arguments have merit, but it's a little disconcerting to put both of them in the same article, and to make the title "C is not a Low-Level Language". I suppose the first argument could go here, and the second argument could have been done in a follow-up article entitled "Machine code is not a Low-Level Language Either".


Intel's IA-64 supposedly exposed lower levels of the processor to machine code, but I hear it took ages to compile, and compilers never really got to the optimization levels they were expecting (and not being compatible with x86 also didn't help adoption)


Reminds me of VLIW. As per Wikipedia, from the Itanium page:

> One VLIW instruction word can contain several independent instructions, which can be executed in parallel without having to evaluate them for independence. A compiler must attempt to find valid combinations of instructions that can be executed at the same time, effectively performing the instruction scheduling that conventional superscalar processors must do in hardware at runtime.

If your CPU exposes the single-stream parallelism at the interface, you can do it at compile time or even decide it with in-line assembler.

I wonder if it hasn't caught on due strictly to the business dynamics of the industry, or are there technical reasons this isn't really a good strategy?


Well, IIRC it didn't catch on mostly because of a) compilers weren't really that good at that kind of instruction scheduling (and by the time they improved, Itanium had already sunk), b) conventional ISAs (that is, x86) got quite good at doing this in hardware, at runtime, and actually deliver slightly better results than static scheduling precisely because they do it at runtime, when profiling data is available.

I believe Linus has a good rant, even if only tangentially related to this exact topic, at [0]: "While the RISC people were off trying to optimize their compilers to generate loops that used all 32 registers efficiently, the x86 implementors instead made the chip run fast on varied loads and used tons of register renaming hardware (and looking at _memory_ renaming too)."

[0] https://yarchive.net/comp/linux/x86.html


Static scheduling, even with profiling, can never be as good as dynamic scheduling for general-purpose workloads. VLIW/EPIC can do well for HPC-style number crunching, but that isn't everything. https://news.ycombinator.com/context?id=37900987


One can move complexity back and forth between compiler, runtime and processor implementation to some extent. VLIW works really well in some niches. It's harder to program than single instructions that execute in sequence, either by hand or by compiler, but it simplifies the scheduling for the hardware. Works better if the bundled instructions have similar latency.

The key design puzzle at present seems to be that memory access takes many more cycles than arithmetic. Bundling a few cycles of arithmetic with a few hundred cycles of memory load is kind of pointless. So VLIW works well if you know memory access is going to be fast, which roughly means knowing it'll hit in L1 cache or equivalent. I think that's part of why it suits DSP style systems.

Exposed pipelines are an interesting quirk of some of these systems. One instruction in a VLIW bundle writes to a register and subsequent instructions that read from that same register will see the previous value for N subsequent cycles, after which the write becomes visible. They're really confusing to program by hand but compilers can deal with that sort of scheduling.


Because static scheduling is terrible for non-DSP and non-HPC loads like the typical server or desktop application where the control and data flow is very input dependent. Until recently DSP and HPC were a tiny fraction of the market so architectures capable of dynamic scheduling dominated even those markets as they had more investment.

With GPUs of course things have changed and in fact GPUs relied more on static scheduling, but even there as they expand to more varied loads, they are acquiring more dynamism.


See my other comment for why VLIW was technically flawed

https://news.ycombinator.com/context?id=37900987


I'm reading that TeraScale (AMD) works this way. Itanium was a major attempt to ship it in a CPU. I guess AMD64 and ARM rule the day, but maybe in the future we'll see it again.


TeraScale was a VLIW; it worked well as far as I know. The current amdgpu architectures aren't – those are multiple execution port systems, reminiscent of the x64 setup.

Qualcomm's Hexagon is a VLIW, and I think that's contemporary. Graphcore's IPU is two instructions per word.


Are you asking why VLIW hasn't caught on? There are DSPs that use VLIW concepts. But for general purpose computing, look at Itanium and its failure.


My sense is that this is really a communication issue (when is it not?)

On a relative scale, C is very low level compared to how we program today if you think about levels of abstraction.

If “low level” means “runs on the CPU almost literally as written.” then no it’s not.


It's more than that. C-the-language just doesn't have low-level concepts such as machine addresses, and its facilities for dealing with the types that the abstract machine ascribes to all objects are quite limited.

Ada has System.Address to model machine addresses:

http://ada-auth.org/standards/rm12_w_tc1/html/RM-13-7.html#p...

C++ has std::less specializations for pointer types which provide a strict total order (one aspect of machine addresses):

https://en.cppreference.com/w/cpp/utility/functional/less

There is also placement new and std::launder for more explicit control of typed memory:

https://en.cppreference.com/w/cpp/language/new https://en.cppreference.com/w/cpp/utility/launder

These days, even Java tries to model machine addresses:

https://docs.oracle.com/en/java/javase/21/core/foreign-funct...


Yep, this is a linguistic problem, not a technical one. "C is not a low level language" implies that the hi/lo boundary lies below C. What's below C? IR, Asm, and opcodes.

IRs like LLVMIR and various bytecodes. Well, those don't map to the hardware 1:1, not even close. So IR must be HLL.

Sure Asm has to be architecture specific, but even then we are getting pretty good at transpilation. And those codes get translated to opcodes anyways on most modern chips.

Basically, unless you are assembling on an ancient system or embedded processor, you aren't writing in a "low level language". Very few folks nowadays do this, so the term "LLL" doesn't occupy much mindshare in semantic space. That leads folks to populate it with what they perceive as low level - the lowest language on the abstraction tree they are likely to encounter - C.

This divide is only going to expand so I say we just accept the definition of low level language has shifted, and call anything where it does closely match... something else, I don't have a good term. Maybe "hardware level language".


> If “low level” means “runs on the CPU almost literally as written.” then no it’s not.

But doesn't this still depend on what CPU you're talking about? Your C code will map much more closely to the instructions of the machine code of an 8051 or even an M4 than it will to an x86.

Thus any general-purpose language is more or less "low level" depending on the CPU it's running on. This seems like a poor definition.


This is one of the most interesting programming articles I've read in a while. And it's well written and easy to read! Don't stop at the (inflammatory?) title.

* We all agree that c gives you a lot of control to write efficient sequential code

* Modern processors aren't merely sequential processors

* Optimizing c code for a modern processor is hard because c is over-specified - in order to allow humans to manually optimize their programs (given the c memory model etc), it's hard for compilers to make assumptions about what optimizations they can make

It doesn't seem like this is a fundamental problem, though, and c could provide symbols that denote "use a less strict model here" (or even a compiler flag, although I bet incremental is the way to go)


This is a great article (worth reading if interested in performance/parallel computing) but the complications it gets into are mostly in the CPU architecture/hardware to which compilers add additional complexity. Even without the compiler optimizations there's still branch prediction and associated parallel execution of serial machine code.

To anyone debating whether C is low/not-low level language note that this discussion is at a much lower level so 'low' has a lower than common meaning.


I think the title that the authors decided to give this article was unnecessarily provocative in a distracting manner. I’m pretty sure there is a technical definition of low level language they are referencing that excludes C, and pretty much only includes assembly as a low level language. Ok, fine, whatever.

Their bigger point seems to be that C is no longer very mechanically sympathetic to huge modern cores, because the abstraction pretends there’s only one instruction in flight at a time. Is anyone aware of a language that fits the hardware better? Maybe Intel needs to release a “CUDA of CPUs” type language.


It doesn't even include assembly.


> On a modern high-end core, the register rename engine is one of the largest consumers of die area and power.

Another red herring. Register rename isn't the result of some PDP fetishizing. It is a direct result of using more hardware resources than are exposed in the architectural model. Even if it were a stack machine or a dataflow graph architecture, register renaming is what you do when you have more dynamic names for storage than static names in the ISA.


> Consider another core part of the C abstract machine's memory model: flat memory.

The C abstract machine only has a flat memory model within a given malloc allocation (and within each local or static object). Relational pointer comparison between different allocations is UB (see e.g. https://stackoverflow.com/a/34973704).

So C is perfectly fine with a non-flat memory model as long as each object is confined within a flat memory region (by virtue of being allowed to alias it as a char array). You can imagine a C runtime library that provides functions to obtain pointers to different types of memory that don’t share a flat address space.

The only restriction is that pointers must carry enough information to compare unequal if they point to different objects. Of course, you might be able to construct a virtual flat memory model from the bit representation of void* or char*, but that’s not quite the same as imposing an actual flat memory model.
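
To make that concrete, a small sketch of what the abstract machine does and doesn't promise:

    #include <stdio.h>

    int main(void)
    {
        int a[4], b[4];

        /* Equality between pointers into different objects is well defined... */
        printf("%d\n", (void *)&a[0] == (void *)&b[0]);   /* 0: distinct objects compare unequal */

        /* ...but relational comparison across objects is undefined behaviour;
           the abstract machine never promised one flat ordering of all memory. */
        /* printf("%d\n", &a[0] < &b[0]); */               /* UB if uncommented */

        /* Within a single object (or allocation), ordering is well defined. */
        printf("%d\n", &a[0] < &a[3]);                     /* 1 */
        return 0;
    }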


"Low-level" is not a perfectly well-defined technical term, and does mean (slightly) different things to different people.

I feel that the article does explain well enough, how the author defines "low-level" for the sake of this article - and the definition being used seems just as fine as any other. And sticking with this specific definition, the conclusions of the article do seem to check out. (But I'm no expert on the subject matter, so I might be wrong about that).

I feel that the "value" of the article lies in challenging certain conceptions about C.

To me, it doesn't really matter if the article is (completely) right or not - the somewhat indignant response I see happening to the title of the article, and the discussion I see about what "low-level" actually means, seems to prove that some dogmatic beliefs about C are pretty deep-seated.

I feel it's always worthwhile to question such dogmatic beliefs.


> The root cause of the Spectre and Meltdown vulnerabilities was that processor architects were trying to build not just fast processors, but fast processors that expose the same abstract machine as a PDP-11.

No, Spectre is the direct result of processors speculatively executing code without respecting the conditions that guard the code. Hands down, processors hallucinate conditions in code. It has nothing to do with the particular computational model, but would happen in any system that speculates conditions.

And not just one branch, but a whole series of them. In fact, the processor is usually running with a whole buffer full of instructions that are executing in parallel, having been loaded into the reorder engine using nothing more than (normally highly accurate) statistical predictions.
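
For reference, the bounds-check-bypass gadget from the original Spectre paper is roughly this shape (reproduced from memory, so treat it as a sketch): the branch is architecturally respected, but a mispredicting core may run the body speculatively with an out-of-bounds x, leaving a cache-timing footprint.

    #include <stddef.h>
    #include <stdint.h>

    uint8_t array1[16];
    size_t  array1_size = 16;
    uint8_t array2[256 * 512];
    volatile uint8_t sink;               /* keeps the load from being optimised away */

    void victim_function(size_t x)
    {
        if (x < array1_size)                 /* the guard the speculating core runs past */
            sink = array2[array1[x] * 512];  /* secret-dependent cache line gets touched */
    }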


Apparently there is plenty disagreement about what “low level” means. Historically assembler was considered low level, and languages like C with named variables and functions and nested expressions were considered high level. I have also seen C described as mid-level to indicate it is higher level than assembler but “closer to the metal” than say Java. And apparently it is now called low-level by some - wonder what assembler is then?

In any case, at this point, low level and high level are only meaningful relative to other languages.

The article is questioning how “close to the metal” C actually is, but some of the arguments also applies to assembler, which is not that close to the metal either these days.


Yeah-- my intro C Programming class 20+ years ago began with the professor saying "This class is about C, a high-level programming language used for most UNIX system tools because trying to write all of those things in assembly would make you want to burn your keyboard."

It seems like the distinction between C and Assembly these days is less important than the distinction between C and say... Javascript. Which is fine by me- English is descriptive and the people who work in Assembly aren't going to get confused by it.


To make an article about how C maps to the processor and fail to make any distinction between application programming and embedded programming seems strange to me. After all, C is by far the most common language for programs running on micro-controllers, and it actually does map well to many micro-controller architectures in use today.

I'm clearly not the target audience for this article, but I still feel like the author would be well advised to put a little note at the top that says "we're talking about CISC and high-end microprocessors rather than microcontrollers."

I'm also not seeing suggestions for languages that do map well to modern microprocessors.


I've been programming since the mid 80s, started with the C=64. People have been having the argument that C is low-level vs c is not since at least then.

Why do so many smart people waste their friggin' time on such nonsense?


Computation is only a small part of computing, addressed by languages such as OpenCL and by no means simple, observe constant GameReady driver releases from Nvidia to support each new major game. C is still pretty good at many other parts of low level computing, such as managing state of hardware or allocation of system memory to different tasks. Such tasks are not well suited to parallelism, as they must maintain a globally consistent state.

It is perhaps true that CPUs and compilers should execute C code mostly as it is, with only local optimizations to spare programmer of having to decide whether x + x, or x * 2 or x << 1 is faster for example. This would improve system security and reliability while freeing up time to work on great compute languages for vectorizable computations.

But, at the end of the day, CPU makers and compiler writers are humans motivated by both career success and less tangible bragging rights. So OF COURSE they will chase benchmarks at the expense of everything else, even when benchmarks have little to do with real life performance in an average case. I have a 13 year old 17 inch MacBook pro I use for some favorite old games. When I fire it up, I don't see any differences in my computing experience vs a 2023 laptop. So whatever advances in CPU/compiler design were made since do not seem to help with tasks I am actually interested in.


Assembly is not the lowest level language you can work in. I've programmed in raw binary opcodes before, that is the lowest level. (though there is a valid argument that microcode is even lower level - I disagree but still acknowledge the argument is valid) Often a single assembly language instruction can be one of more than 30 different opcodes as registers are often encoded in the opcode. Of course at this level you have to have your CPU instruction manual as they are all different.
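
For anyone who hasn't seen it, here is roughly what "registers encoded in the opcode" looks like on 32-bit x86 (the 6502 does the same thing in spirit, with a separate opcode per register and addressing mode):

    /* "mov r32, imm32" is the opcode family 0xB8 + register number,
       so the destination register lives in the opcode byte itself. */
    unsigned char mov_eax_42[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00 };  /* mov eax, 42 (reg 0) */
    unsigned char mov_ebx_42[] = { 0xBB, 0x2A, 0x00, 0x00, 0x00 };  /* mov ebx, 42 (reg 3) */
    unsigned char ret_insn[]   = { 0xC3 };                          /* ret */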


agreed


Other than assembly, which barely qualifies as a language, what programming language is lower than C?


It does not need to be a relative statement in order to be correct.

The statement "C is not close to the instruction set of a modern CPU" does not need to be validated by specifying examples of languages that are closer.


But if you're going to say that "C is not a low-level language", then yeah you kinda do need other languages beneath it.


Well, firstly, I'm not saying it.

But no. That is what I meant when I said this is not a relative statement.

If the title said "C is not the lowest-level language" then your objection would be valid... but it doesn't and it's not saying that.

But before I go into some lengthy explanation: have you read the article, or are you responding to the title alone?


In general terms, any language aiming to be lower-level than C should

- have an "abstract" machine that is more concrete than C (and by extension less portable)

- be easier to lower into optimal assembly (especially loop ops)

- give you strong and precise compile-time guarantees about memory layout (padding, bitfields), variable sizes, register spilling, stack usage, etc.


Plenty to choose from since 1958's introduction of JOVIAL, when one cares to research what has happened in the world of systems programming outside Bell Labs, and UNIX/C taking over the server room.


Forth/joy maybe?


LLVM IR


Fortran, maybe?


B


Or T3X9, if you prefer Algol-style syntax.


I think there is an argument that Brainfuck [0], et al, is lower than C, given that it eschews variables and functions.

[0] https://esolangs.org/wiki/Brainfuck


Low level means close to the processor, not small in scope.

You could argue brainfuck is machine language for a theoretical infinite tape machine, but such a machine can only exist when implemented in high-level software.


This article begins by victim-blaming the software engineers in full-throated support of the hardware engineers. If, and I do mean if, anyone should be exalted, it is the software engineers, who have been coping with C as a stable-but-difficult programming language specifically for the benefit of the hardware engineers' desire to have a stable target. The fact that the specification is ambiguous at all is so that hardware manufacturers can port a reasonably small, expressive, and powerful language to their hardware. And, no, making a new language that targets the platform for the ease of hardware development and exploitation of system-specific benefits is not the answer. In fact, it's the literal reason why C is still as popular as it is.

Nobody wants to learn your programming language, write thousands-to-millions of dollars worth of software, just to have it become obsolete two days after the new-hotness processor comes out. Been there, done that.

Alternatively, perhaps, we can place the blame on hardware manufacturers who were looking to cut corners for improved performance and produced insecure machines because they lied to us non-expert hardware users about how fast their systems could go and what we were getting for our money.


Yes, C is a set of abstractions like any other language (even assembly), which attempt to mimic a machine of far less complexity.

Unfortunately it's also the wrong set of abstractions for the contemporary era.

That said, if you're working in low-level embedded microcontroller world, C's memory model and program structure does in fact look a lot more like those systems.


What would you say is the right set of abstractions for the contemporary era? Especially for writing things like OSes and device drivers?


I've been working on making games for the Playdate (https://play.date) over the past few weeks, using their C SDK. it's my first time using C in a decade, since I first learned it in college, and I'm having a surprisingly great time with it. sure, there's tons of weird quirks that take some getting used to—but there's a lot that I've been surprised to find that I missed about it! it's fun to write code that does what you tell it to do, without having to worry about object ownership or any higher-level concerns like that—you just manage the memory yourself, so you know where everything is, and write functions that operate on it straightforwardly. if it's been awhile since you've touched C, I highly recommend giving it a try for a small game project.


“The abstract machine C assumes no longer resembles modern architectures” implies that it might be nice to have a language that maps more directly to what is really happening under the hood. I agree. It would be nice to take the guesswork out of, “How should I write this so that the compiled code has fewer cache misses?”

Maybe there is a sweet-spot level of abstraction that allows for more fine-grained control of the modern machine, in the sense that compiled code more or less reflects written code, but not so fine-grained as to be unwieldy or non-portable.

Vectorized code that is native to the language could be done with either map functions or Python / NumPy / PyTorch style slicing, which is fairly intuitive. Multithreaded OTOH I’m not sure there is an easy answer.
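
Something in that sweet spot arguably already exists as a compiler extension. A hedged sketch using GCC/Clang vector types, where element-wise code is written directly in the language and lowered to whatever SIMD the target has:

    typedef float v4f __attribute__((vector_size(16)));  /* four packed floats */

    v4f axpy(float a, v4f x, v4f y)
    {
        v4f av = { a, a, a, a };   /* broadcast the scalar */
        return av * x + y;         /* element-wise across all four lanes */
    }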


Of course it isn't, but what's the alternative?


My favourite article about C in years.

To answer your question off the top of my head, answering different bits of the issue, from the perspective of the era of active programming language R&D not themes on themes on themes as we have now...

Limbo, Occam (Occam-pi, etc.), APL (I/J, Aplus, etc.), Oberon (Oberon 2, Oberon 07, Active Oberon, Zennon)...


I'm not familiar with these languages, but which of them is closer to the actual modern hardware than C, while still being abstract enough to be portable?


In what way did I imply that any of them were in any way closer?

That was not my intention at all.

You asked what alternatives there were. C is a systems implementation language, designed to be compiled to object code that will run on the bare metal.

I offered some examples of alternatives to that role, as I thought you asked. I did say that they explored different aspects of the problem.

As I said to someone else upthread:

It does not need to be a relative statement in order to be correct.

The statement "C is not close to the instruction set of a modern CPU" does not need to be validated by specifying examples of languages that are closer.


I think it is somewhat fair to take "what is the alternative" to be in relation to the headline claim. So, "what is the alternative that is low level?"

I think I agree that it is fair to push back on the very idea of a "low level" language. But that feels somewhat banal. We get it, there are abstractions even at the lower levels nowadays that simply didn't exist back when.

Similarly, if someone claims that Haskell isn't a "high level" language, what does that mean?

And to be fair, we have screwed up terms so much it is embarrassing. I see arguments on whether or not LISP is a functional language quite often. There was an amusing discourse not long ago on whether SQL was declarative. Turns out, taxonomies are tough and strict taxonomies are near useless.


It's not my headline claim, and if one wants to enter into debate about what the article means, then step 1 is reading the article.

As for the rest: well, yes, fair enough, but somewhat tangential to this subthread, I feel.


I meant my first claim as a charitable reading of the question: that it was written in response to the headline. Even with what comes in the article. Which, frankly, doesn't change much?

Indeed, the core claim at the start of the article is "The features that led to these vulnerabilities, along with several others, were added to let C programmers continue to believe they were programming in a low-level language, when this hasn't been the case for decades." But, no they weren't. They were added to allow the CPUs to maintain resource utilization while executing code that they are taking a probabilistic stab at.

There is some odd appeal to GPU programming, ignoring that the main reason GPUs can do what they do is that it is a foregone assumption that you will have to do the same operation across the entire visible scene.

So, back to the question at hand, what is this "lower level language" that is being talked about? Best I can see from this article, it is "c" but with vectors and no aliasing? And many more core instructions? I know of basically no languages that make it clear that sqrt could be a CPU level instruction. And that one is somewhat trivial to name. It wasn't too long ago that we saw discussion of popcnt instruction. Is that a "primitive" part of any non-assembly language?

It is a neat assumption to challenge, that C may be limiting what we can do. But, with how often C and C with inline assembly dominate most any performance category, it is a steep hill to climb to show that that is what is limiting us.

I also find the closing remarks about how "There is a common myth in software development that parallel programming is hard" to be kind of flippantly insulting. Would be like claiming any sport is easy because "look, you can teach grade school kids to play." Especially as I have seen plenty of bugs in actor-model languages to know it is no panacea. I agree it is easy to specify parallel activities. It gets a lot harder as you start adding in all of the deadlines and other work handoffs that are necessary for fast execution. Again, sports make a good example. Hitting a ball is easy. Running cross court to return a fast shot from an opponent is, essentially, the same thing. Far far harder, though.


I don't really know what you're talking about, TBH.

You are bringing GPU programming into it, which I never mentioned at all.

Again: saying "X is not big" is not a relative claim. Saying "X is the biggest" is a relative claim but nobody's saying that.

"Low level" means "close to the metal". The way C is often described is as "a portable assembly language". The article is saying that is not true. That the model of computation, of processor operation, that C represents is a 1970s model of how computers work and it barely fits onto modern machines at all.

I can't name anything closer to the metal or lower level, but it doesn't matter; it is irrelevant to the discussion.


I brought the article into it, as you seemed to deflect the misunderstanding of "low level" to what the article was pushing. And the article goes into several examples and has a "[this abstraction] is conspicuously absent on GPUs ..."

Granted, I can kind of see how the entire point of that rant from the article was that register renaming is some sort of sin of processor design. Problem is, of course, that you lose plenty of other speed tricks on GPU by making that tradeoff. More, my point was that that tradeoff comes from the natural unit of work for GPU, which would be operating over scenes of data. This isn't being opportunistic in looking for ways to go wide on operations, it is literally the reason those units were built. (And I'll ignore that CUDA programming looks a lot like a C program.)

Back to the idea of "closer to the metal," per the article. My further point was how close are we talking about? I know of literally no language that exposes all intrinsic operations of a machine to the end user. Excepting anything that allows inlined assembly? Such that anyone asking "what else is there" is almost certainly asking for those that do.

I'm ultimately open to the idea that there is no "close to the metal" language anymore. Largely for good reasons. To wit, it would be near impossible to code preemptively multitasked programs without something like register renaming. Yes, you could do it in software, but hard to see how that would dodge any of the complaints of the article with regards to the idea.

All of which is to say that there not being an answer to the question that literally started this thread is a bit of the point? I'm sympathetic to the idea that you were answering an easier question. I'm just pressing on the idea that you answered a different question.


While I'm reading about Limbo and Occam, what do you think APL and Oberon can express that C cannot? Talking about low-level electronics benefits (APL array idioms are superb for sure).


I recommend Sophie Wilson's talk on CPU architectures for some interesting insight into this.

https://www.youtube.com/watch?v=6lOnpQgn-9s

It's worth the time, IMHO, and I dislike video presentations. This one is different.

She designed the ARM processor (and BBC BASIC before that).


Bounds checking by default.

Actors, more precisely active objects in Active Oberon, the only one from the Oberon lineage still actively being developed at ETHZ.


but that's a high level feature. When people talk about C not being a low level language they mean you can't control/reflect the hardware enough, right? Or maybe I'm misguided


>> what do you think apl and oberon can express that C cannot ?

> Bounds checking by default.

That's odd - I've written a C container library that checks bounds by default.

Are you sure that C doesn't allow you to check bounds?


Absolutely unless you're using a compiler with language extensions for pointer management.

A library isn't the language that is described by the ISO C standard document.


> A library isn't the language that is described by the ISO C standard document.

Sure, but the poster didn't ask "what comes with apl and oberon that doesn't come with C", they asked "what do you think apl and oberon can express that C cannot?"

And you absolutely, positively can EXPRESS bounds checking in C. I'm not sure where you heard that this is impossible, but it's probable you misunderstood or that source is wrong.
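
For what it's worth, a minimal sketch of what "expressing it" means here (the names are made up); whether a library convention like this counts as the language having bounds checking is, of course, exactly the disagreement in this thread:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        size_t len;
        int   *data;
    } int_array;

    int checked_get(const int_array *a, size_t i)
    {
        if (i >= a->len) {    /* the bounds check, written in plain ISO C */
            fprintf(stderr, "index %zu out of bounds (len %zu)\n", i, a->len);
            abort();
        }
        return a->data[i];
    }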


First of all, your library doesn't come with C, otherwise it would be defined in the PDF I can buy from ISO.

Secondly, using if statements and conditional expressions isn't what bounds checking in a programming language is about.

Here is some education material,

https://en.wikipedia.org/wiki/Bounds_checking

> Many programming languages, such as C, never perform automatic bounds checking to raise speed. However, this leaves many off-by-one errors and buffer overflows uncaught. Many programmers believe these languages sacrifice too much for rapid execution.[1] In his 1980 Turing Award lecture, C. A. R. Hoare described his experience in the design of ALGOL 60, a language that included bounds checking, saying:

Feel free to update the Wikipedia page and convince Wikipedia of your reasoning.


What does any of that have to do with whether or not you can EXPRESS bounds checking in a C program?

You were misinformed; one can certainly express bounds checking in a C program, independent of libraries or compiler extensions.


Please educate us, we are all curious to learn how.

Only the ISO C language is allowed: declare a C array and then show us how you validate the accesses with the index operator.

As a second exercise, show us how a function call using pointer + length validates that the given length is a valid length for the memory region's total size.


> Only the ISO C language is allowed, declare C array and then show us how do you validate the accesses with the index operator.

Who said anything about arrays?

Let me refresh what was said, and what you claimed.

What was said:

> what do you think apl and oberon can express that C cannot ?

What you claimed

> Bounds checking by default.

Are you seriously saying that you did not claim that bounds checking cannot be expressed in C?

Because that is all this boils down to - my reading of that was that you claimed that bounds checking is an example of a thing that "apl and oberon can express that C cannot ? "

> Only the ISO C language is allowed, declare C array and then show us how do you validate the accesses with the index operator.

No one made this claim so there is no point in doing what you asked.

> As second exercise, show us how a function call using pointer + length, validates that the lengh into the pointer region is a valid length for the memory region total size.

No one claimed this either. The specific claim is that it is possible to express bounds checking in C.


Too many words and very few facts.

Care to provide your library for the security folks to have a go at your bounds checking implementation in C?


> Care to provide your library for the security folks to have a go at your bounds checking implementation in C.

Once again, I have to ask - what does that have to do with your claim that C is unable to express bounds checking?


Slightly branching out, I wonder if recent languages like Zig allow (or will allow) customized array language features. They seem to be more flexible about compile-time vs runtime and also allocation mechanisms.


Hmm, good question, but I think my question is located halfway. What you describe is basically Turing completeness: C allows you to write more on top, but it won't be integrated into the base constructs of the language. I admit that this comment too is fuzzy :)

I hope things don't go angry in here


One example given in the article is the Erlang VM, which maps a lot better to modern processors.

We currently have a problem where we can't have thousands of cores because, even today, so much code is designed to be fast on one core.

We really have to move to asynchronous programming because synchronizing async hardware is both complex and inefficient.

RISC V is probably going to help since it allows for a lot of experimentation.


The article does mention a few areas of interest:

- Languages with "better" (=more modern hardware friendly) loop constraints are easier to parallelize (Fortran, Erlang, …)

- CPU architectures with better programmable vectorization (ARM SVE, Risc-V VE) are much easier to work with, if the language primitives allow it (see above)

Porting software over to fortran/erlang on aarch64 is something you can already do today, if you want to. Rust/Zig/etc. and RISC-V could have a good opportunity here to figure out better ergonomics for vectorization and more hardware friendly cache coherency policies, too, but no clue if anyone on the relevant standards committees cares.

In terms of "but what can I easily use as a drop-in replacement?" Yeah, we're kinda stuck with C and languages that inherit its problems (current Rust/Zig/etc. included).


Rust/Zig do not have enough portability; there are errors trying to compile to s390x:

https://github.com/wekan/wekan-node20#trying-to-compile-llvm...

C89 compiles to 30+ CPU/OS:

https://github.com/xet7/darkesthour


GNU assembler or nasm, if you really need to go low level; usually you don't.


isn't assembler a high level language nowadays?


Why would you think so? Assembler is what it has always been, i.e., mnemonics for machine instructions. Unless you are thinking about microcode, which is nothing new and I am not sure it should count as a "level" from the perspective of a programmer anyway.


By the standards of this article assembler on most architectures suffers from many of the same things that make C “not low level”: in particular it offers little control over cache hierarchy or coherency (modulo hint instructions like x86’s PREFETCH), nor instruction level parallelism. Of course, these aspects are entirely due to the fact that the dominant lingua-franca (C) has no ability to support these semantics.

In large part the article argues that in most cases the abstract machines that ISAs describe differ so fundamentally from the reality of how code is executed on the underlying machine to make a truly low level language impossible to achieve.


Your question seems provocative... but that's a very good question. I've always liked assembly programming, and I got very puzzled when I discovered the processor's metal has gone very far from the x86 instruction set I was writing. It felt like the magic was gone.

Indeed, there is no direct match anymore between instructions and gate combinations on the processor die. There is a microcode layer translating x86 instructions into whatever electronics are below. Change this microcode, and you could have your processor speak a different binary code (matched to a different assembly language).


Rewrite the world in Rust /s

The real answer is: none. There are two problems, the first is you have to rewrite the world with the new language and hardware.

The second is, unfortunately, that language enthusiasts who are willing to rewrite the world AND can get the job done want a language that targets a sequential abstract machine (i.e. looks like C).


C++ is serving me well staying away from C as much as possible, since 1993.


When compared to assembler, I’d agree.

I grew up with 6502 and 68k. To me, back in the early 90s, C (Mac MPW C to be precise) was an abstract assembler. The code-gen was perfectly readable.

Compared to the likes of Python, it most certainly is low-level. These types of language allow developers to rapidly get something going and not just because of the libraries.

I’d find it very hard to justify a business position where C has any other role than binding and breaking out into something more abstract. Be that Go or C++, for an example.

An argument I used to hear was “performance” from C. I’m not entirely convinced as in a higher language your algorithm may well be better as you can deal with the abstraction.

But… people make money coding C.


Even before that, this is ultimately about that fact that an ISA for a general-purpose computer can be seen as a way to abstract away parallelism. Even in your favorite assembly language, the effects are largely supposed to happen one after another.

That abstraction is leaky, but the alternative is VLIW machines - even in that case, you probably end up using a compiler so that you don't have to worry about parallelism. Reasoning about parallel things is hard, that's why we spend so much time trying to avoid it ¯\_(ツ)_/¯


If, like me until yesterday, you have never heard of the PDP-11 before (I should probably be banned from HN for this), it is really something worth learning about. There is an awesome project for a PDP-11 front panel replica running an emulator on a Raspberry Pi (the whole thing is called PiDP-11, haha). Here is more information:

https://obsolescence.wixsite.com/obsolescence/pidp-11


Mandatory xkcd: https://xkcd.com/1053/


I am neither a compiler writer nor an OS guru, just an old C programmer. In this article it looks like the entire CPU instruction set was designed to emulate the PDP-11 to ensure C compatibility. So my naive question: what is stopping a microprocessor manufacturer from having two instruction sets, one that is compatible and one that allows us to fully utilise the modern CPU but with a different programming paradigm? Is that too expensive or hard to do? I genuinely don't know.


I think there just isn't any commercial incentive, as we have a ton of legacy C code and the CPU will have to run in "legacy" mode anyway to run your OS, be it Linux, Windows or MacOS.


This article is I guess thought-provoking for people who have never really considered what goes on at the CPU level. But C is not a low-level language, because it hides abstractions like caches from the programmer? So what is a low-level language? CUDA bytecode? CPU microcode? It seems like this article is saying that there is no such thing as a low-level language.


If you accept the premise of the article, you also need to accept that assembly is not a low level language, and that it is impossible to program any CPU currently for sale in a low level language.

The abstraction CPUs give you is more or less a fast pdp11 with some vector registers bolted on.

The implementation internally is not.


This is pretty much my issue with this article. By its criteria assembly isn’t low level, which makes the claim pretty uninteresting. You could argue there are/were processors that had low level ISAs (VLIW, no OoO exec, no memory reordering, no caches, no branch prediction, etc.) but they are all niche (usually low power embedded DSPs) or failed to capture the market (Itanium) because they failed to deliver performance comparable to the highly abstracted CPU supposedly “designed” to run C (also a questionable claim, I think the reality is it has more to do with sequential execution being the way humans think, a point Jim Keller has made in several interviews).


If we ever wonder why more people don’t get into low level programming, this article and the responses are an excellent case study. We’re allowed to make what we know accessible to newcomers and many of us should tone down our arrogance when we have really deep technical conversations.


The paper talks about how C is designed for a PDP architecture and that's the problem. Is there any language that is not that way and can handle parallelism and all the things mentioned in the paper?

Yes, I do see Erlang mentioned but I don't think it was considered a solution.


Interesting take, but I think it goes out of its way to prove the definition of low-level wrong, while missing that the definition it gives, and claims is wrong, is in itself very flexible.

What is irrelevant? To a data scientist, TypeScript is low-level: you're required to think about structure and compile stuff!

To a web developer, C# and Java are low-level because you need to think about the execution platform.

To an IT developer, C and C++ are low level because you need to think about memory.

To a game developer assembly is low level because you need to think about everything.

To electronics engineers, everything is high level. To accountants, VBA in Excel is low level. To a product manager, a Word document with any sort of technical words is too low level.

If you need to optimize your software to the point where some CPU-specific instructions are required, C is too high level because it's hiding stuff that is not irrelevant.


With the same argument you could even argue that the x86 ISA is a high-level language, since under the hood it's decomposed to micro-ops which are scheduled on a superscalar infrastructure and run out of order.


I am really surprised that such a bad take has gotten so much airtime, and almost as surprised that such a gifted developer came up with it.

The only way that the title is true is one that is not mentioned in the article: when C became popular, anything that was not assembly was a "high level language". Heck, even some Macro assemblers were considered high level, IIRC.

The factors that are mentioned in the article fall roughly into two categories:

1. The machine now works differently.

This may be true, but it does so almost entirely invisibly, and the exact same arguments given in the article apply in the same way not just to assembly language, but even to raw machine language.

I have a hard time seeing how machine level is not low level. But I guess opinions can differ. What seems inarguable is that machine language is the lowest level available. And if the lowest available level does not qualify as "low" in your taxonomy, then maybe you need to rethink your taxonomy.

2. C compilers do crazy shit now

This is also true, but it is true exactly because C is a low-level language. As a low-level language, it ties execution semantics to the hardware, resulting in lots of undefined (and implementation-defined) behavior that makes a lot of optimisations that some people really, really want to do (but which are far less useful than they claim) really, really hard.

So C compiler engineers have defined a new language, C', which has semantics that are much more amenable to optimisation. Nowadays they try to infer a C' program from the C source code and then optimize that program. And they manhandle the C standard, which is intentionally somewhat loose, in order to make C'' (a language that looks like C but maps to C') the official C language.
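
To make the C'-vs-C point concrete, here is a small example of my own (not from the article); the exact behaviour depends on the compiler and optimisation flags:

    #include <limits.h>
    #include <stdio.h>

    /* The standard leaves signed overflow undefined, so an optimizer
       reasoning in "C'" may assume x + 1 never wraps and fold this
       whole function to "return 1". */
    int always_greater(int x) {
        return x + 1 > x;
    }

    int main(void) {
        /* A strictly "low level", two's-complement reading of the
           hardware would suggest 0 for INT_MAX; with optimisations on,
           many compilers print 1 here. */
        printf("%d\n", always_greater(INT_MAX));
        return 0;
    }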

Since they were moderately successful, it can now be argued that C has morphed or been turned into a language that is no longer low level. However, the shenanigans that were and continue to be necessary to accomplish this make it pretty obvious that it is not the case that this "is" C.

Because, once again, those shenanigans were only necessary because C is a low-level language that isn't really suited to these kinds of optimisations. Oh, and of course there are the rationale document(s) for the original ANSI C standard, which explicitly state that C should be suitable as a "portable assembly language".

But then again we already established that assembly is no longer a low level language...so whatever.


>take a property described by a multidimensional value

>project it into a single dimension

>split it in the middle, thus inventing two useless artificial categories ("low level", "high level")

>get a bunch of highly functioning hackernews 0.1xers to argue endlessly about said useless categories

>submit weekly articles "thing X is NOT in my imaginary category Y!!!"

>profit

Arguing whether or not C is a low level language is about as useful as arguing whether dog-headed men have souls

Next up: IO is not a Monad, x86 machine code is not a low level language, RISC-V is not actually RISC, GPL is not actually open source and so on


The taxonomy is not the point of the article. The point of the article is about the interaction between language and hardware development, and whether we are locked into a paradigm in which our reliance on C prevents us from taking advantage of hardware innovations and, in turn, from creating languages which properly make use of such hardware.


I see that point, but I still think the article is completely wrong.

My disagreement with the article (aside from the flamebait title) is that many of the things the author calls C problems are actually general computing issues. The reason highly threaded processors are not the norm is not that C can't take advantage of them (it does so just as well as 90% of other languages). The problem is that most workloads, outside of specialized domains, are either highly sequential or require too much synchronization.

Regarding the immutable memory model example - C does not place any limitations at all. Just declare that modifying such an immutable object is undefined behaviour and let programmers figure it out. Memory already has its complexities with NUMA and such; C programmers have no issue taking advantage of these features.
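
C already does something very close to that with const: modifying an object that was defined const is undefined behaviour, which is enough for the compiler to put it in read-only memory or fold loads of it. A tiny sketch (my own example, not from the article):

    const int table[4] = {1, 2, 3, 4};

    void clobber(void) {
        int *p = (int *)table;   /* casting away const is well-formed... */
        p[0] = 42;               /* ...but writing through it is undefined
                                    behaviour: it may trap, appear to "work",
                                    or be optimised away entirely */
    }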

Or maybe take TSX as an example - I'm fairly sure the PDP-11 did not have anything remotely close to Intel TSX, and yet it is easy to use from C. Include <magic.h>, write __magicXYZ() and it just works.
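
For reference, the real "<magic.h>" is roughly <immintrin.h>; a minimal sketch of what it looks like (assuming an RTM-capable CPU and a compiler flag like -mrtm):

    #include <immintrin.h>
    #include <stdatomic.h>

    static _Atomic long counter;   /* atomic so the fallback path is safe */

    void increment(void) {
        unsigned status = _xbegin();            /* try to start a hardware transaction */
        if (status == _XBEGIN_STARTED) {
            long v = atomic_load_explicit(&counter, memory_order_relaxed);
            atomic_store_explicit(&counter, v + 1, memory_order_relaxed);
            _xend();                            /* commit the whole transaction atomically */
        } else {
            /* transaction aborted (or RTM unavailable): plain atomic fallback */
            atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
        }
    }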

Sure, existing C programs will run slowly on the author's imagined new processor architecture, but so will programs written in any language except maybe some highly restrictive, very high level language (like GLSL on GPUs, etc.). But new programs that are written with such hardware in mind will not in any way be limited by C semantics, and if they are (like with mistakes in the standard such as errno for math functions), that will be one compilation switch away from being fixed.


Wasn't it the idea of RISC to have a simpler CPU and push the optimization responsibility towards the programmer and the compiler?


I once read that C is the new assembly, because every CPU has a C compiler.

I then decided to make a language that compiles to C; it's just about adding strings, lists and tuples. I have almost finished the parser, and the "translator" will take more time (I encourage anybody to try lexy as a parser combinator). Basically it will reuse a lot of C semantics and even pass C compiler errors through, so it will save me a lot of work.

Of course I am very scared that I will run into awful problems, but that will be fun anyways.


Everyone knows that CPU stands for "C" processing Unit...

:-)


I just write English these days and have my LLM compile it to Python, so...


On the flip side, maybe CPUs are trying to be too general purpose.


(2018)


PDP-11 is a fast machine?


2017


(2018)


I don't think it was ever claimed that C was a low level language. In fact, I have always heard it cited as the canonical example of a high level language. I will admit that in this day and age C feels like a low level language.

Lower level is something that maps more directly to machine operation (assembly, maybe Forth).

Higher level is something that has its own semantics of operation and needs to be converted into machine operations; the more conversion, the higher the level.


It is somewhat common to describe C as “low level” in introductory programming or CS classes (before the student would know what an abstract machine is). Lots of people carry misunderstandings from that early simplification forwards in their careers, especially if their only interaction with C is academic and not professional.


When C was declared to be "high level", there did not exist a level higher than C. Now that there are more levels above C, it is not the highest level language out there, which makes it low level compared to them. It is not the lowest level, but it's not a misunderstanding to call it low level. The playing field is not the same as it was 50 years ago, so relative terms like "low" and "high" have naturally changed referents.


Huh? Tons of languages at levels "higher than C" existed at the time C was created, and they were popular too.

LISP (1960), Smalltalk (1972), BASIC (1963), FORTRAN (1957), COBOL (1959) and countless others. Heck, ALGOL (1958, 1968) was much higher level than C too.


Yes it did, outside Bell Labs.


It's also that the definition of high vs low level has shifted in the past decade.

Nowadays a "high level language" is one where the person using it doesn't necessarily have to think about memory usage and allocation, since that's the task of a garbage collector - you accept a small amount of inefficiency in order to get a program that works "good enough" in 99.9% of all cases (since we're not on ancient devices anymore and most programmers don't write code that upsets the garbage collector in novel ways). By this criteria, Java, C#, Python, JavaScript, Ruby and so on are "high level languages" in that the programmer rarely has to think about this sort of thing; the underlying GC takes care of memory concerns. There's a reason you see these languages used more for end-user tools like webdev, scripting and desktop applications - the penalty is considered worth it (since it often ends up only shaving off milliseconds at most).

By contrast, a low level language basically makes the programmer an active participant in memory management, with all the footguns that come with it. C and Rust are two extremes of this - C just lets you do whatever, any form of memory control is up to you, segfaults included. Meanwhile Rust deliberately prevents you from doing anything that could possibly cause segfaults through its borrow checker. In some ways C can give you a lot more freedom to be efficient in how you allocate/deallocate your memory (or, in the case of Rust, write code that is always memory safe), but you do trade things for it (in C you basically have to be really meticulous about free()-ing memory, while in Rust you have to eat a lot of complexity upfront to not upset the borrow checker).

Also, in contrast to high level languages, the modern domain of lower level languages tends to be things like drivers, kernels, RDBMSes and the like, rather than conventional user-facing applications (which they were also used for in the past, since most of the previously mentioned languages are either pretty young or took quite some time to mature). Still useful, just a different set of expectations, since those are the components that have to be fast so the rest doesn't have to be as hyperefficient.


> in C you basically have to be really meticulous about free()-ing memory

Only if you malloc()/free() for every allocation/deallocation. If you use any other allocation strategy, this is never an issue.

for example: see the "Rewriting the memory management" section in this article: https://phoboslab.org/log/2023/08/rewriting-wipeout

> I'm not sure what the original PSX version did, but the PC version had a lot of malloc() and little fewer free() calls scattered around. Now I can assure you that the game doesn't leak any memory, because it never calls malloc().

> Instead, there's a fixed size statically allocated uint8_t hunk[MEM_HUNK_BYTES]; of 4mb that is used from both sides:

> A bump allocator takes bytes from the front of the hunk. This is used for everything that persists for many frames. When the game starts, it loads a bunch of assets that are needed everywhere (UI graphics, ship models and textures etc.) into this bump allocater and then remembers the high water mark of it. When you load a race track, it loads all assets needed on top. After finishing a race, the bump allocator is reset to the previous high water mark.

> On the other side, a temp allocator takes bytes from the end of the hunk. Temporary allocated objects need to be explicitly released again. This is used when loading a file into memory. The file is read at once and unpacked onto the bump allocated side. When done, the temp memory for the file is released again.

> Temporary objects are not allowed to persist over multiple frame. So each frame ends with a check to ensure that the temp allocator is empty.

> Somewhat related, the OpenGL renderer does the same with the textures: It bumps up texture memory (more precisely space in the texture atlas) and resets it to the previous level when a race ends.

If you use a system like this (either malloc() just once, or a few times, at the start of your program and then never manually free(), or just use statically allocated arrays), then you never have to worry about "meticulous free()ing". I'm not sure why this never seems to be taught in early CS courses that teach C; it seems that basically everyone comes away thinking malloc()/free() OCD is the only way to manage memory with C, and is thus undesirable compared to the ease of use of garbage collection.
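
For anyone who hasn't seen the technique, here is a minimal sketch of that two-ended hunk scheme (my own simplification, not the actual Wipeout code; only hunk/MEM_HUNK_BYTES are borrowed from the quote, the function names are made up, and alignment handling is omitted):

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MEM_HUNK_BYTES (4 * 1024 * 1024)

    static uint8_t hunk[MEM_HUNK_BYTES];
    static size_t bump_pos = 0;                /* grows up from the front */
    static size_t temp_pos = MEM_HUNK_BYTES;   /* grows down from the back */

    /* Long-lived allocations; "freed" in bulk by resetting to a saved mark. */
    void *bump_alloc(size_t size) {
        assert(size <= temp_pos - bump_pos);   /* front and back must not meet */
        void *p = &hunk[bump_pos];
        bump_pos += size;
        return p;
    }

    size_t bump_mark(void)    { return bump_pos; }  /* e.g. after loading base assets */
    void bump_reset(size_t m) { bump_pos = m; }     /* e.g. after finishing a race */

    /* Short-lived allocations from the back; must all be released each frame. */
    void *temp_alloc(size_t size) {
        assert(size <= temp_pos - bump_pos);
        temp_pos -= size;
        return &hunk[temp_pos];
    }

    void temp_reset_all(void) { temp_pos = MEM_HUNK_BYTES; }
    int  temp_is_empty(void)  { return temp_pos == MEM_HUNK_BYTES; }  /* end-of-frame check */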


> I don't think it was ever claimed that C was a low level language.

When I was introduced to C during high school, my teacher presented C as a low-level language compared to what we previously studied (which was Ruby).

And I just ate that up because C looked less readable than Ruby. Today (10 years later) I have to disagree with my teacher: C is not a low-level language. It has access to the lower level parts, sure, but it is a high level language!


> I don't think it was ever claimed that C was a low level language.

It was introduced to me as "glorified PDP11 Assembly Language". So the claim has been made at least once.

Granted, there are people here commenting that maybe assembly language is not "low-level". I'm lost for words.


I'm curious - what is it about Forth that makes you consider it to map more closely to the machine?

I've done a handful of Forth projects as part of a code dojo years ago. I wouldn't have considered it low-level.


Forth is strange: it requires a virtual Forth machine to run (I have heard of hardware that runs Forth directly, but it is exotic), which should automatically exclude it from the low level camp. However, this machine ends up almost trivial to write and is very simple, so once you start writing Forth it feels low level, like there is very little between you and the CPU. As a consequence, just like assembly, Forth people tend to reinvent the world.

Note that I am not far down the Forth rabbit hole at all; any interest I may show is incidental, a side effect of my interest in PostScript, which is very much a high level language.
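
To give an idea of how small "almost trivial to write" really is, here is a toy stack-machine inner loop (nowhere near a real Forth: no dictionary, no return stack, no compiler, just the core idea):

    #include <stdio.h>

    /* A toy stack machine in the spirit of Forth's inner interpreter:
       a data stack plus a handful of primitive "words". */
    enum op { PUSH, ADD, MUL, DUP, PRINT, HALT };

    struct instr { enum op op; long arg; };

    void run(const struct instr *p) {
        long stack[64];
        int sp = 0;                                       /* next free slot */
        for (;; p++) {
            switch (p->op) {
            case PUSH:  stack[sp++] = p->arg;             break;
            case ADD:   sp--; stack[sp-1] += stack[sp];   break;
            case MUL:   sp--; stack[sp-1] *= stack[sp];   break;
            case DUP:   stack[sp] = stack[sp-1]; sp++;    break;
            case PRINT: printf("%ld\n", stack[--sp]);     break;
            case HALT:  return;
            }
        }
    }

    int main(void) {
        /* "3 4 + dup * ." in Forth-ish terms: prints 49 */
        struct instr prog[] = {
            {PUSH, 3}, {PUSH, 4}, {ADD, 0}, {DUP, 0}, {MUL, 0}, {PRINT, 0}, {HALT, 0}
        };
        run(prog);
        return 0;
    }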


Forth is easy to make custom hardware for, even if it's a poor fit for the commonly available hardware architectures. RPN plus a stack lends itself to a very simple implementation (no registers needed, easy layout, etc.).


It's Fortran, not Forth.


They're talking about Forth, not Fortran.


Someone should make a Forthran (I wonder what it would look like though).



