
What is C really? A concise syntax to define structs and functions, with a usable expression syntax. There isn't all that much to it; I've always found it ridiculous for people to claim it's holding hardware back.

I don't think I've ever really seen a good argument for what developments were prevented by the existence of C as an important compiled language. The one claim I can remember I find ridiculous: that today's CPUs execute instructions in parallel, not serially. Well, for one, C's semantics aren't that serial, there is a large degree of freedom for compilers and CPUs in how they schedule the execution of C expressions and statements. Then, there are SIMD instructions exploiting those capabilities explicitly. But also, the rest of the code gets automatically pipelined by the CPU, according to a specific CPU's capabilities. Even though that stuff happens in parallel, any instruction encoding is by necessity serial. Or is anyone proposing we should switch to higher-dimensional code (and address spaces)?




To my aesthetic, C is the wrong abstraction because while all those things are possible, the language exposes them via a syntax that makes you think you're writing an embarrassingly sequential program, and then hides all of the parallelization that improves performance behind undefined behavior.

I liken it to doing imperative UI development on top of the DOM abstraction in a browser. Yes, under the hood, the browser is choosing when to re-evaluate and repaint interface elements, but you can't touch any of that; you're instead rearranging things in the DOM and memorizing the heuristics browsers use, trying to trick them into efficiently mapping your DOM changes to visual changes in the browser UI.

It may very well be time for a low-level language to encourage us to think about programming as "arranging independent blocks of code that can be executed in parallel, with only a handful of sequencing operations enforcing some kind of dependency order. Apart from honoring those sequencing requirements, order of execution or whether execution happens in parallel is undefined."


> "arranging independent blocks of code that can be executed in parallel, with only a handful of sequencing operations enforcing some kind of dependency order. Apart from honoring those sequencing requirements, order of execution or whether execution happens in parallel is undefined."

Isn't that exactly what is happening?


More or less, but nothing about the design of the language puts that frontmost. Instead, the language is designed to make the developer think they're operating on an embarrassingly sequential machine, and only the vast amount of undefined behavior in the language spec allows the compiled output to be parallel.

It's the wrong abstraction for the job and properly using C in a way that takes advantage of it requires unlearning most of what people think they know about how the language works. I'd like to see more languages that start from a place of "Your code can execute whenever the computer thinks is most efficient; don't ever think you know the execution order" and then treat extremely-sequential, deterministic computing as a special case.


I think you're just making wrong assumptions. Any C programmer worth their salt knows that both compilers and the CPU introduce a lot of reordering / instruction-level parallelism as optimizations.

You can SIMD / multi-thread explicitly as much as you feel like, but you'll soon find your productivity diminishing, which is not the language's fault.


I don't want to SIMD / multi-thread explicitly.

I want my language to have low-level abstractions like "pack data into an array, map across array. Reduce array to a value." Those are abstractions a programmer can look at and go "Oh, the compiler will probably SIMD that, I should use it instead of a for loop." In contrast, C will auto-unroll loops. Unless it doesn't. Go memorize this pile of heuristics on popular architectures so you can guess at whether your code will be fast.
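Roughly the contrast I mean, sketched in C with OpenMP's simd pragma standing in for an explicit data-parallel construct (the pragma is an extension, not ISO C, and needs e.g. -fopenmp-simd; the function names are made up for illustration):

    #include <stddef.h>

    /* Plain loop: whether this gets vectorized is up to the optimizer's
       heuristics. */
    float sum_maybe_vectorized(const float *a, size_t n) {
        float s = 0.0f;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Same reduction, but the data-parallel intent is stated explicitly. */
    float sum_explicitly_simd(const float *a, size_t n) {
        float s = 0.0f;
        #pragma omp simd reduction(+:s)
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }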

I want my language to have low-level abstractions like Orc's parallel and sequential combinators, so that when I need some operations sequenced I can force them to be, and when I don't, I can let the compiler "float" it and assemble the operations into whatever sequence is fastest on the underlying architecture; I don't have to memorize a bunch of heuristics like "the language allows arbitrary ordering of execution for either side of a '+' operator in an expression, but statements are executed sequentially, unless they aren't; it depends on the contents of the statement."

In short, I want my language to ask me to think in terms of parallelism from the start so that my mind is always in the head-space of "This program will be executed in nondeterministic order and I shouldn't assume otherwise."


> "pack data into an array, map across array. Reduce array to a value."

These are abstractions that you've been able to enjoy for a long time, by using higher-level languages like C++ or Rust. So C didn't prevent the feature, after all.

You could argue now that C has prevented CPUs from implementing these abstractions (because arguably, C cannot express them), but I would like to ask first how you think it should be done, and why it's not a good idea to implement it at the language/compiler level, the way it's currently done?

If a new way comes up that lets CPUs understand type theory and magically multi-thread your variably-sized loops by creating a new set of execution units out of thin air, you'll have a point. For the time being, no such thing seems to exist, and I can't imagine that the reason is C. Rather, if such a thing is nearing practicability, C will have to adapt or slowly die out.

> In short, I want my language to ask me to think in terms of parallelism from the start so that my mind is always in the head-space of "This program will be executed in nondeterministic order and I shouldn't assume otherwise."

This works only in a very limited way in practice. To solve practical problems, you need to combine logic/arithmetic instructions serially to achieve the intended effect. It seems to me that most degrees of freedom turned out to be more accidental than structured, and it's not practical to specify them manually when the majority of them are easily recoverable in an automated way.

So that's how you end up with that instruction-level parallelism that is worked out by the compiler and the CPU.


> You could argue now that C has prevented CPUs from implementing these abstractions (because arguably, C cannot express them), but I would like to ask first how you think it should be done, and why it's not a good idea to implement it at the language/compiler level, the way it's currently done?

As I said top-thread, it's to my aesthetic. These are all Turing-complete languages and you can, in theory, do whatever in any of them. But map/reduce/fold etc. make it much clearer, to my eye, that I'm operating on a blob of data with the same pattern, and it's easier to map that in my brain to the idea "The compiler should be able to SIMD this." Contrast with loops requiring me to look at a sequential operation and go "I'll trust the compiler will optimize this by unrolling and then deciding to SIMD some of this operation." The end result is (handwaving implementation) the same, but the aesthetic differs.

As you've noted, I'm not unable to do this in C or C++ or Rust (in fact, C++ is especially clever in how it can use templates to inline static implementations recursively so that the end result of, say, the dot product of two N-dimensional vectors is "a1 x b1 + a2 x b2 + a3 x b3" for arbitrary dimension, allowing the compiler to see that as one expression and maximize the chances it'll choose to use SIMD to compute it). But getting there is so many layers of abstraction away (I had to stare at a lot of Boost code to learn that little fact about vector math) that the language gets in the way of predicting the parallelism.
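For illustration, roughly the flat expression such an expansion bottoms out in for a small fixed dimension (hand-written here purely as a sketch of the shape the template machinery produces):

    /* Sketch: a fully inlined fixed-size dot product reduces to one flat
       expression that the compiler is free to evaluate with SIMD. */
    static inline float dot3(const float a[3], const float b[3]) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }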

> If a new way comes up that lets CPUs understand type theory

CPUs don't understand type theory. Compilers do and they can take advantage of that additional data to do things like unroll and SIMD my loops right now. My annoyance isn't that it's impossible, it's that I'd rather the abstraction-to-concrete model be "parallel, except sometimes serial if the CPU doesn't have parallel instructions or we hit capacity on the pipelines," not the current model of "serial, and maybe the compiler can figure out how to parallelize it for you."

> To solve practical problems, you need to combine logic/arithmetic instructions serially to achieve the intended effect... It seems to me that most degrees of freedom turned out to be more accidental than structured, and it's not practical to specify them manually

I agree... Eventually. There's a lot of parallelism allowed under the hood in the space between where most programmers think about their code, as evidenced by C leaving the evaluation order of operands unspecified within an expression (for example, around operators of the same precedence).
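A tiny illustration of that latitude (trivial definitions, purely illustrative):

    /* Both calls happen, but C leaves their order unspecified, so the
       compiler (and CPU) may evaluate and schedule them either way. */
    int f(void) { return 1; }
    int g(void) { return 2; }

    int combined(void) {
        return f() + g();
    }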

Whether degrees of freedom evolved by accident is irrelevant to whether a new language could specify those parts of the system (sequential vs. intentionally-undefined ordering) explicitly. C, for example, has lots of undefined behavior around memory management; Rust constrains it. It's up to the language designer what is bound and what is allowed to be an arbitrary implementation detail, intentionally left undefined to give flexibility to compilers.

Even the modern x86 instruction set is a bit of a lie; under the hood, modern CPUs emulate it by taking chunks of instructions and data and breaking them down for simultaneous execution on multiple parallel pipelines (including some execution that never goes anywhere and is thrown away as a mispredicted branch). CPUs wouldn't be nearly as fast as they are if they couldn't do that.

I'm not advocating for breaking the x86 abstraction; that's a bit too ambitious. But I'd like to see a language take off that abandons the PDP-11-era, embarrassingly serial mental model in favor of a parallel one.


Yes, in Haskell.


> A concise syntax to define structs and functions, with a usable expression syntax. [...] I've always found it ridiculous for people to claim it's holding hardware back.

You just looked in your fish tank and declared what the weather is going to be like in the Atlantic ocean... Like... these things have nothing to do with each other. The fact that C has functions or structs has nothing to do with it being an awful influence on hardware design.

Here are some reasons why C is awful.

* It believes that volatile storage is uniform in terms of latency and throughput. This results in operating systems written with the same stupid idea: they only give you one system call to ask for memory, and you cannot tell what kind of memory you want. This in turn results in hardware being designed in such a way that an operating system can create the worthless "abstraction" of uniform random-access memory. And then you have swap, pmem, GPU memory, etc. And none of that has any good interface. And these are the products that, despite the archaic and irrelevant concept of how computers are built, have succeeded to a degree... Imagine all those which didn't. Imagine those that weren't even conceived of because the authors dismissed the very notion before giving the idea any kind of thinking.

* It has no concept of parallelism. In its newer iterations it added atomics, but this is just a reflection of how hardware was coping with C's lack of any way to deal with parallel code execution. C "imagines" a computer to have a CPU with a single core running a single thread, and that's where the program is executed. This notion pushes hardware designers towards the pretense that computers are single-threaded. No matter how many components your computer has that can actually compute, whenever you write your program in C, you implicitly understand that it's going to run on this one and only CPU. (And then e.g. CUDA struggles with its idea of loading code to be executed elsewhere, which it has to do in a very cumbersome and hard-to-understand way, one that definitely doesn't rely on any of C's own mechanisms.)


> It believes that volatile storage is uniform in terms of latency and throughput.

It doesn't, I don't think it even mentions terms like latency and throughput.

> they only give you one system call to ask for memory, and you cannot tell what kind of memory you want

What?

> Imagine those that weren't even conceived of because the authors dismissed the very notion before giving the idea any kind of thinking.

Such as?

> It has no concept of parallelism.

C can function with instruction-level parallelism, CPU-level parallelism, process/thread-level parallelism just fine.

> C "imagines" a computer to have a CPU with a single core running a single thread, and that's where program is executed.

Given that a memory model was introduced in C11, and that there were other significant highly concurrent codebases before that, I have doubts about how correct and/or meaningful that statement is.

For sure, the easiest way to understand the possible outcomes of running a piece of code is when it runs in a single thread (it doesn't matter on how many CPUs, apart from performance). That is just the nature of multi-threading: it's hard to understand.
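For reference, a minimal sketch of the kind of thing the C11 memory model standardized (assuming <stdatomic.h> support):

    #include <stdatomic.h>

    /* A counter that multiple threads may increment concurrently, with
       explicit memory-ordering choices. */
    static atomic_int hits;

    void record_hit(void) {
        atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
    }

    int read_hits(void) {
        return atomic_load_explicit(&hits, memory_order_acquire);
    }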

> This notion pushes hardware designers towards the pretense that computers are single-threaded.

How do they pretend so? My computer is currently running thousands of threads just fine. It has a huge number of components, from memory to CPU to controllers to buses to I/O devices, that are executing in parallel.


> It doesn't, I don't think it even mentions terms like latency and throughput.

It only has one group of functions to allocate memory, and none of them can be configured with regard to what storage to allocate memory from, definitely not in terms of that storage's latency or throughput, which would be very important in systems with non-uniform memory access.

Compare this to e.g. the concept of "memory arenas" that exists explicitly in e.g. Ada; many languages have libraries to implement this idea -- in these situations, instead of using the language's allocator, you'd be using something like APR's memory pools <https://apr.apache.org/docs/apr/trunk/group__apr__pools.html>.
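For concreteness, a minimal arena ("pool") sketch in plain C, in the spirit of APR's pools; the names and the fixed 16-byte alignment are made up, and there's no growth or error handling beyond returning NULL:

    #include <stddef.h>
    #include <stdlib.h>

    typedef struct {
        char  *base;
        size_t used;
        size_t cap;
    } arena;

    int arena_init(arena *a, size_t cap) {
        a->base = malloc(cap);
        a->used = 0;
        a->cap  = cap;
        return a->base != NULL;
    }

    /* Hand out memory by bumping a pointer; individual frees don't exist. */
    void *arena_alloc(arena *a, size_t n) {
        n = (n + 15) & ~(size_t)15;      /* keep allocations 16-byte aligned */
        if (a->used + n > a->cap)
            return NULL;
        void *p = a->base + a->used;
        a->used += n;
        return p;
    }

    /* Release every allocation at once. */
    void arena_release(arena *a) {
        free(a->base);
        a->base = NULL;
        a->used = a->cap = 0;
    }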

> and that there were other significant highly concurrent codebases before that

You are again confused... there weren't such codebases in C because C is bad for this. It's not because nobody wanted it. Highly concurrent codebases existed in Erlang since forever, for example.

> How do they pretend so? My computer is currently running thousands of threads just fine.

Threads aren't part of the C language. They exist as a coping mechanism. Their authors are coping with the lack of parallelism in C, which is exactly the point I'm making. Threads exist to fix the bad design (not that they are a good fix, especially since they are designed by people who believe that there's nothing major wrong with C, and the thousand and first patch will definitely fix the remaining problems).

So not only is this a counter-argument to the point you are trying to make, it's also yet another illustration of how using C prevents designers from seeking more adequate solutions.

> from memory to CPU to controllers to buses to I/O devices, that are executing in parallel.

The point is not that they cannot run in parallel... The point is that C doesn't give you tools to program them to run in parallel.


> It only has one group of functions to allocate memory, and none of them can be configured

Seriously, have you done any non-trivial C programming? Because those are blatant falsehoods. You must be talking about a uni-level introduction to C programming, using malloc/free and thinking that's how you "allocate".
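For one concrete (Linux-specific) counterexample, a sketch of asking the OS for a particular kind of memory rather than just "memory"; the function name is made up and error handling is minimal:

    #include <stddef.h>
    #include <sys/mman.h>

    /* mmap() flags select the kind of backing (here: anonymous memory
       backed by huge pages); madvise() hints the access pattern. */
    void *alloc_hugepage_buffer(size_t len) {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        madvise(p, len, MADV_WILLNEED);   /* hint: we will touch this soon */
        return p;
    }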

> You are again confused... there weren't such codebases in C because C is bad for this. It's not because nobody wanted it. Highly concurrent codebases existed in Erlang since forever, for example.

Just one example I know of: take the Linux kernel, which had a good amount of SMP support way before C11. I believe they still haven't switched over to the C11 memory model.

> The point is not that they cannot run in parallel... The point is that C doesn't give you tools to program them to run in parallel.

How come then, that my computer is running so many things, many of them written in C, in parallel?


> Threads exist to fix the bad design (not that they are a good fix, especially since they are designed by people who believe that there's nothing major wrong with C, and the thousand and first patch will definitely fix the remaining problems).

The thing that needs fixing is mostly people like you, purporting falsehoods while lacking a deeper understanding of how it works / how it's used.

Threads are a concept that exists independently from any language. They are the unit of execution that is scheduled by the OS. If a program should be multi-threaded with parallel execution (instead of only concurrent execution), by necessity you need to create multiple threads. (Or run the program multiple times in parallel and share the necessary resources, but that's much less convenient and lacks some guarantees).
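As a point of reference, a minimal sketch of spawning OS-scheduled threads from standard C11 (<threads.h> is an optional feature of the standard; pthreads is the usual fallback where it's missing):

    #include <stdio.h>
    #include <threads.h>

    /* Each thread runs this function; the argument identifies it. */
    int worker(void *arg) {
        printf("hello from thread %d\n", *(int *)arg);
        return 0;
    }

    int main(void) {
        thrd_t t[2];
        int ids[2] = {0, 1};
        for (int i = 0; i < 2; i++)
            thrd_create(&t[i], worker, &ids[i]);
        for (int i = 0; i < 2; i++)
            thrd_join(t[i], NULL);
        return 0;
    }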


> > It believes that volatile storage is uniform in terms of latency and throughput.

> It doesn't, I don't think it even mentions terms like latency and throughput.

Yes, that's the whole point.


No it isn't. Not mentioning the differences isn't the same as acting like they don't exist. Those things are only treated as out of scope.

Not every concept must be expressed in language syntax / runtime objects, nor is it necessarily a good idea to do so. In many cases, it's a bad idea because it leads to fragmentation and compatibility issues. At some point, one has to stop making distinctions and treat a set of things uniformly, even though they still have differences.

CPUs have various load and store instructions that all work with arbitrary pointer addresses. Whether the address is a good/bad/valid/invalid one will only turn out at run time. There would be little point in making a separate copy of these instruction subsets for each kind of memory (however you'd categorize your memories). The intent as well as the user interface are the same.

I think that's basic software architecture 101. (Once you've left uni and left behind that OOP thinking where every object of thought must have a runtime representation).

Btw., C compilers allow you to put a number of annotations on pointers as well as data objects, for example pointer alignment to influence instruction selection, or hints to the linker...
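A small sketch of the kinds of annotations I mean (alignas and restrict are ISO C; the attribute syntax and the ".fast_data" section name are GCC/Clang-style extensions, used here purely for illustration):

    #include <stdalign.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Over-aligned object: lets the compiler pick aligned load/store forms. */
    alignas(64) static float lut[1024];

    /* restrict: a promise that the pointers don't alias, enabling more
       aggressive reordering and vectorization. */
    void scale(size_t n, float *restrict dst, const float *restrict src) {
        for (size_t i = 0; i < n; i++)
            dst[i] = 2.0f * src[i];
    }

    /* Placement hint to the linker (compiler extension). */
    __attribute__((section(".fast_data"))) static uint8_t scratch[256];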


At least for the first point: C has been used extensively with non-uniform storage. Back in the DOS days when we had memory models (large, small, huge, etc...), and today, when programming all sorts of small microcontrollers. A common one I occasionally use is AVR, which has distinct address spaces for code and data memory - which means a function to print a string variable is very different from the one used to print a string constant. This makes programs rather ugly, but things generally work.
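A small avr-libc-flavored sketch of what that looks like in practice (PROGMEM and the pgm_read_* accessors are avr-libc; the copy function below is made up for illustration):

    #include <avr/pgmspace.h>

    /* The string constant lives in flash, a separate address space. */
    static const char banner[] PROGMEM = "hello from flash";

    /* Reading it requires explicit pgm_read_* accesses instead of the
       plain pointer dereferences you'd use for a string in RAM. */
    void copy_from_flash(char *dst, const char *src_in_flash) {
        char c;
        do {
            c = (char)pgm_read_byte(src_in_flash++);
            *dst++ = c;
        } while (c != '\0');
    }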

As for your parallelism idea... well, every computer so far has a fixed number of execution units; even your latest 16384-core GPU still has every core perform sequential operations. And that's roughly what C's model is: it programs execution units. And it definitely hasn't stopped designers from innovating - completely different execution models like FPGAs exist, and there is constant innovation in programming languages.


> At least for the first point: C has been used extensively with non-uniform storage

And the results are awful. You are confusing doing something with doing it well. The fact that plenty of people cook frozen pizza at home doesn't make frozen pizza a good pizza.

> And it definitely hasn't stopped designers from innovating

And this is where you are absolutely wrong. We have hardware designs twisted beyond belief only so that they would be usable with C's concept of a computer, while obviously simpler and more robust solutions are discarded as non-viable. Just look at the examples I gave. CUDA developers had to write their own compiler to be able to work around the lack of necessary tools in C. We also got OpenMP and MPI because C sucks so much that the language needs to be extended to deal with parallelism.

And it wasn't some sort of hindsight, where at the time of writing things like different memory providers were inconceivable. Ada came out with the concept of non-uniform memory access baked in. Similarly, Ada came out with the concept of concurrency baked in. It was obvious then already that these are the essential bits of systems programming.

C was written by people who were lazy, uninterested in learning from their peers, and overly self-confident. And now we've got this huge pile of trash of legacy code that's very hard to replace, and people like you who are so used to this trash that they will resist its removal.


You are very confidently making some wild statements that seem to be based on the assumption that just because something isn't specified in a given place, it can't be specified somewhere else. That assumption is wrong.


I don't think it's fair to blame C for the flat random-access memory model. Arguably it goes back to Von Neumann. There was a big push to extend the model in the 1960s, through hardware like Atlas and Titan (10 years before C) and operating systems like Multics. And there's all the computer-science algorithm analysis that assumes the same model.


At the time C rose to prominence, there was already an understanding that memory access wasn't going to be uniform, and would be less and less so as hardware evolved and became more complex. Ada came out with this idea from the get-go.

Von Neumann created a model of computation. It's a convenient mathematical device to deal with some problems. He never promised that this is going to be a device to deal with all problems, nor did he promise that this is going to be the most useful or the most universal one etc.


You’re echoing my point back at me, though to be fair I should have been more explicit that my examples from the 1960s were about caches and virtual memory and other causes of nonuniform access hidden under a random access veneer.

But we can go 15 years earlier: Von Neumann wrote in 1946: “We are therefore forced to recognize the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible.” https://www.ias.edu/sites/default/files/library/Prelim_Disc_...


>Well, for one, C's semantics aren't that serial, there is a large degree of freedom for compilers and CPUs in how they schedule the execution of C expressions and statements.

I thought about the implications of a "parallel" statement, where everything is assumed to execute in parallel, and oh boy, are the implications big. C's semantics are serial but they contain implicit parallelism. The equivalent is that the parallel statement contains implicit sequentialism that the compiler can exploit to reduce the amount of bookkeeping needed by the CPU to schedule thousands of instructions at the same time. E.g. instead of having an explicit ready signal and blocking on it, the compiler can simply decide to split the parallel statement into two parallel statements, one executed after the other. Implicit sequentialism! A parallel statement implies that no aliasing writes are allowed to be performed. I don't know what the analysis for that would look like, but in many common cases I would expect the parallel statement to be autovectorized quite reliably.
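Arguably the closest thing C has today to "no aliasing writes inside the block" is restrict; a small sketch (illustrative function, not the parallel statement itself):

    #include <stddef.h>

    /* Promising that x and y don't overlap lets the compiler vectorize
       this loop without runtime alias checks. */
    void axpy(size_t n, float a,
              const float *restrict x, float *restrict y) {
        for (size_t i = 0; i < n; i++)
            y[i] += a * x[i];
    }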

>Even though that stuff happens in parallel, any instruction encoding is by necessity serial. Or is anyone proposing we should switch to higher-dimensional code (and address spaces)?

Uh, you know we can just encode the program as a graph? Graph reduction machines are a thing, you know.


> we can just encode the program as a graph

What is the output medium for the encoded representation? A linear address space, like a file, or virtual memory.


“instruction encoding is by necessity serial. Or is anyone proposing we should switch to higher-dimensional code”

That is sort of a thing: https://en.m.wikipedia.org/wiki/Very_long_instruction_word

If you have multiple instructions grouped together like this, you could think of it as a 2D array of instructions.



