Using Zig's translate-C to understand weird C code

WithinReason · on Nov 3, 2021

The author translates 1 line of C mess to 7 lines of Zig mess which are then manually simplified through transforms. Why not just apply transforms to the original C code? Or is this only for people who know Zig syntax but not C?

kristoff_it · on Nov 3, 2021

> Or is this only for people who know Zig syntax but not C?

It's a blog post hosted on zig.news (so intended for an audience that knows Zig). I think we can safely assume that it's not meant to be a universal guide for understanding C code, but rather an account of a (somewhat minor) experience lived by the author. It's even stated upfront:

> @rep_stosq_void on Twitter posted this strange sample of C code, and I wanted to show my process of understanding this contrived C code.

"I was presented with this code, and this is what I did to understand it"

sobeston · on Nov 3, 2021

Author here, I'm just showing how I did it. It took a few minutes to do it for myself - figured it would be worth the 15 minutes or so it'd take to write it down. Just sharing my process, there's nothing too serious about this post.

I've happened to have written a lot of C, but some areas (like this one) I'm not as confident in. This way of casting, and working with types is very poor syntax in my opinion (I mean below in this thread you have people arguing about the spiral rule and such, it's obviously a common confusion).

Zig's way of expressing types and pointers is far superior, these transformations I made were done quickly and I didn't feel like there was any ambiguity or confusion in anything. Just a series of simple reductions, there are no "tricks" or easy mistakes to make in the code. It feels like a trivial proof.

Obviously I am biased, this was posted to zig.news. I thought it was a neat showcase of translate-c, and how zig does some of these things nicer. I'm not telling everyone that they should do what I did, but this works for me and I'm happy to share.

mannykannot · on Nov 3, 2021

As someone unaware of Zig and its translate-C feature, I read the article as being about those things, not C.

rightbyte · on Nov 3, 2021

Or like use the cdecl cli tool ...

abaracadabi · on Nov 3, 2021

[flagged]

dnautics · on Nov 3, 2021

You do understand that zig started out by removing things from C and then "adding rich constexprs" and that's it? It's arguably simpler than C, C is actually two languages (C and C preprocessor), three if you add make, five if you add cmake and autoconf.

drfuchs · on Nov 3, 2021

What's the motivation for changing the visually appealing C syntax "ptr->field" to the whack "ptr.*.field"?

sobeston · on Nov 3, 2021

ptr.field works.

10000truths · on Nov 3, 2021

make, cmake and autoconf are not part of the C language; they are build tools.

dnautics · on Nov 3, 2021

In practice you can't be productive in a large project in C without make, increasingly cmake (at the very least for reading someone else's c project). And it's part of the zig agenda to unify build tools, too. We are now in the 2nd decade of the 21st century, it is not strange for programming languages to be opinionated about build tools, ruby has rake, rust has cargo, js has, well, js is a mess, don't copy js, elixir has mix, etc (and those are just the languages I know).

10000truths · on Nov 4, 2021

That doesn’t make those tools part of the C standard, though. The standard does not mandate the use of any particular build tool for C projects beyond a conformant C compiler. There is nothing stopping me from using meson, ninja, bazel, a Frankenstein shell script, etc. to build a C project. Hell, there is nothing stopping me from using make to build non-C projects - it was designed as a generic dependency resolution tool.

nicoburns · on Nov 4, 2021

> There is nothing stopping me from using meson, ninja, bazel, a Frankenstein shell script, etc. to build a C project

There is if you intend to use libraries as those libraries may well have chosen a build system other than the one you have chosen meaning you'd have to rewrite their build scripts. This is far from the experience of using libraries in languages with unified build systems which is typically as simple as adding a line with the version of the library you want to a manifest file.

dnautics · on Nov 4, 2021

> That doesn’t make those tools part of the C standard, though

So? I'm talking about the C experience, not the C standard.

sedatk · on Nov 3, 2021

You somehow managed to criticize Zig for being something that C isn't, lacking macros, preprocessor and whatnot, and yet praise Go and Python for being opinionated about their features, which don't have preprocessors or macros.

abaracadabi · on Nov 3, 2021

Not really. I praise Python for simple syntax and Go for simplicity of language features.

And I criticize Zig for trying to be another kitchen sink (despite having no macros or preprocessor).

AtomToast · on Nov 3, 2021

Having few features that are small and knowable is pretty much the idea behind Zig though. It's supposed to be more of a C replacement than another C++/Rust. If you know C you can just read through the Zig language spec and see how stuff correlates and you just learned zig.

Removing a whole secondary language for macros actually just makes the language simpler

xigoi · on Nov 3, 2021

Since you like Python's syntax, what's your opinion about Nim?

abaracadabi · on Nov 3, 2021

Honestly, I’m not sure about Nim. I’d probably reach for Go before Nim because I hate layers in-between things and too many language paradigms.

Main thing is that all these hip new languages will be dead in a couple years because they don’t do anything all that innovative. Making “another C” or “another C target” versus Go which focuses on channels, slices, and go routines. These other languages that “make a better X” simply will be sidelined in production deployments because no matter the language there will always be warts.

Really folks keep reinventing the wart in the name of something “better”.

hansvm · on Nov 3, 2021

Maybe it's still not the language for you or for your projects, but for anyone else stumbling across this I don't think it's a fair characterization. Would you mind elaborating on where you think I'm off-base in the following?

> head-strong nerds inventing things

Let people have their fun.

> preprocessor allows you to add some logic before compilation

Zig's comptime feature does address that need. For most use cases where comptime and the preprocessor would both suffice, comptime has tons of strong advantages (type checked, errors traced back to the right lines, ability to write arbitrary turing-complete code without unholy syntactic black magic, no double evaluation, no need to wrap everything in extra parentheses, ....). It explicitly does not address use cases where you would actually want a textual preprocessor (like making the language look like Fortran), the argument being that the burden on reading unfamiliar code would be too high. The ease with which you can write arbitrary comptime code also enables you to do things like embed lookup tables in the binary, which in C you would ordinarily do by copy pasting from another tool (or writing constants by hand) or adding yet another preprocessor to the mix.

Comptime isn't a clear win over preprocessing in all cases, but having written a lot of C and a little Zig, if I had to choose one for a hypothetical new project I'd be tempted to use Zig just for access to that one feature.
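To make two of the preprocessor pitfalls listed above concrete (double evaluation and the need for extra parentheses), here is a minimal C sketch. The macro and function names (`MAX`, `DOUBLE_UNSAFE`, `DOUBLE_SAFE`, `next_value`, `calls`) are made up for illustration and do not come from the thread.

```c
#include <assert.h>

/* Illustrative macros; all names here are hypothetical. */
#define DOUBLE_UNSAFE(x) x + x                /* body not parenthesized */
#define DOUBLE_SAFE(x) ((x) + (x))            /* the usual defensive form */
#define MAX(a, b) ((a) > (b) ? (a) : (b))     /* evaluates the larger argument twice */

int calls = 0;

/* Counts how many times it is invoked. */
int next_value(void) {
    calls++;
    return 5;
}

/* Returns how many times next_value() ran inside a single MAX() use. */
int max_call_count(void) {
    calls = 0;
    int m = MAX(next_value(), 1);   /* textual expansion contains two calls */
    (void)m;
    return calls;
}

/* 3 * DOUBLE_UNSAFE(2) expands to 3 * 2 + 2, not 3 * (2 + 2). */
int precedence_surprise(void) {
    return 3 * DOUBLE_UNSAFE(2);
}

/* With the parenthesized macro the grouping is the intended one. */
int precedence_fixed(void) {
    return 3 * DOUBLE_SAFE(2);
}
```

A function (or a compile-time evaluated function in a language that has them) would evaluate its argument once and would not be affected by the surrounding operators, which is the contrast the comment draws.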

> Syntax

Zig does feel a little clunky to write to me right now, but I think that's mainly due to how often I'm using @someBuiltin() in Zig when I would be using an operator or an additional language feature in something like Python.

That said, the syntax itself is incredibly simple. You can check that yourself for Zig [0], Python [1], and Go [2]. Zig has far fewer syntactic quirks and abilities than either of the other two. You have functions, operators, code blocks, types, reserved characters/keywords for builtin stuff, a few kinds of literals, and a little bit of syntactic sugar for working with structs.

> Don't reinvent C++...for instance, I even consider C better than Zig just because the features are small and knowable

To each their own. FWIW, Zig is explicitly and actively avoiding becoming a kitchen sink language like Rust or C++. They've added a few new features, and I personally find it easier to keep track of those than of all the different kinds of undefined behavior I might stumble across in C, especially since most of those (defer, errdefer, labeled break, ...) behave exactly as your intuition suggests they would.

[0] https://ziglang.org/documentation/master/#Grammar

[1] https://docs.python.org/3/reference/grammar.html

[2] https://golang.org/ref/spec

dosshell · on Nov 3, 2021

Or like use https://cppinsights.io/ ...

michaelmior · on Nov 3, 2021

That gives the following output which I don't think is much more readable.

  void f(int * a)
  {
    void * p = reinterpret_cast<void *>(&a);
    ***reinterpret_cast<int *(*)[]>(p) = 1;
  }

cies · on Nov 3, 2021

Maybe after several steps of manual processing it is :)

(not sure what steps that should be)

michaelmior · on Nov 3, 2021

Sure but I don't think C++ Insights really helped.

MauranKilom · on Nov 3, 2021

The only particularly messy part in the C code there is the (int*(*)[]) type cast.

My intuition (because I don't usually have to deal with this kind of nonsense) is "cast to a pointer to an array of int pointers". cdecl confirms that: https://cdecl.org/?q=int*%28*x%29%5B%5D

So we cast a (pointer to pointer to int) to (pointer to array of int pointers) [we can ignore the detour through void*] and then immediately dereference through all three layers. Which gives us back the only int in the program.

Excellent example of how to be a Three Star Programmer I guess: https://wiki.c2.com/?ThreeStarProgrammer
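For reference, the C function being discussed can be reconstructed from the C++ Insights output quoted earlier in the thread. This is a sketch rather than a copy of the original tweet; `g` and `same_effect` are names added here for illustration.

```c
#include <assert.h>

/* Reconstructed from the C++ Insights output quoted in the thread. */
void f(int *a) {
    void *p = &a;              /* p holds the address of the parameter a */
    ***(int *(*)[])p = 1;      /* cast, then dereference through three layers */
}

/* The plain version the explanation above reduces it to. */
void g(int *a) {
    *a = 1;
}

/* Hypothetical check: both functions leave 1 in the pointed-to int. */
int same_effect(void) {
    int x = 0, y = 0;
    f(&x);
    g(&y);
    return x == 1 && y == 1;
}
```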

pmarreck · on Nov 3, 2021

TIL about cdecl.org

I don't code in C day to day but this will help for the odd time I need to understand C code!

bruce343434 · on Nov 3, 2021

That line is honestly trivial for anyone familiar with the spiral rule.

Interpret p as a pointer to an array of pointers to integer. Since arrays decay to pointers, you triple dereference it to get

1. the array

2. the first element of that array

3. whatever int that element is pointing to

Then set it to 1.

It took me about 5 seconds. I have no idea what that Zig is supposed to do on the other hand.

Note: the programmer who wrote that C code was being very obtuse. This is essentially just `*a = 1`, the whole p thing is some form of baroque code.
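The three steps listed above can be written out with one named intermediate per dereference. This is a sketch; `f_steps`, `pa`, `first`, and `elem` are names chosen here, not taken from the original code.

```c
#include <assert.h>

void f_steps(int *a) {
    void *p = &a;

    int *(*pa)[] = (int *(*)[])p;  /* p viewed as pointer to array of pointers to int */
    int **first  = *pa;            /* 1. the array, which decays to a pointer to its first element */
    int *elem    = *first;         /* 2. the first element of that array: the original a */
    *elem        = 1;              /* 3. the int that element points to */
}
```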

palotasb · on Nov 3, 2021

Any time the spiral rule comes up, I like to point out that it's wrong. It is instructive in a way because one learns more about C declaration syntax, but it is even more instructive to recognize why it is wrong.

The spiral rule works only if there is no pointer to pointer or array of array in the type. But take this for example:

        +----------------------------+
        | +-----------------------+  |
        | | +------------------+  |  |
        | | | +-------------+  |  |  |
        | | | | +--------+  |  |  |  |
        | | | | |  +--+  |  |  |  |  |
        | | | | |  ^  |  |  |  |  |  |
    int * * ¦ ¦ ¦ xxx[1][2][3] |  |  |
     ^  | | | | |     |  |  |  |  |  |
     |  | | | | +-----+  |  |  |  |  |
     |  | | | +----------+  |  |  |  |
     |  | | +---------------+  |  |  |
     |  | +--------------------+  |  |
     |  +-------------------------+  |
     +-------------------------------+

The type of xxx is a [1-element] array of [2-element] array of [3-element] array of pointer to pointer to ints. I drew a spiral that passes through each specifier in the correct order.

Notice that to make the spiral correct it has to skip the pointer specifiers in the first three loops. This is marked by ¦. This is not mentioned in the original spiral rules and one could be forgiven to parse the expression as xxx -> [1] -> pointer -> [2] -> etc. following a spiral that doesn't skip the pointers.

The Right-Left Rule is quoted less frequently on HN but it's a correct algorithm for deciphering C types: http://cseweb.ucsd.edu/~ricko/rt_lt.rule.html

The spiral rule can be modified to process all array specifiers before all pointer specifiers, but then you'd have to specify that the order to do so is right and then left. At that point it's just the Right-Left Rule.
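One way to check the reading of `xxx` given above is to let the compiler confirm it. The sketch below assumes a C11 compiler (for `_Static_assert`) and compares `sizeof` at each level; the function name `read_through` and the local variables are additions made for illustration.

```c
#include <assert.h>

int read_through(void) {
    int **xxx[1][2][3] = {0};

    /* Each index peels off one array dimension, left to right. */
    _Static_assert(sizeof(xxx)          == 1 * 2 * 3 * sizeof(int **), "array of 1");
    _Static_assert(sizeof(xxx[0])       == 2 * 3 * sizeof(int **),     "array of 2");
    _Static_assert(sizeof(xxx[0][0])    == 3 * sizeof(int **),         "array of 3");
    _Static_assert(sizeof(xxx[0][0][0]) == sizeof(int **),             "pointer to pointer to int");

    /* Only after all three indexes do the two pointer levels apply. */
    int value = 42;
    int *inner = &value;
    xxx[0][1][2] = &inner;
    return **xxx[0][1][2];
}
```

If the pointers were interleaved with the array specifiers, as a naive spiral would suggest, the element type would not be `int **` and the static assertions would fail to hold.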

userbinator · on Nov 3, 2021

The summary is "array and function has higher precedence than pointer, and parentheses override all". This explains why

    int *f();

is a function returning pointer to int, while

    int (*f)()

is a pointer to function returning int. Likewise,

    int *a[]

is an array of pointer to int, while

    int (*a)[]

is a pointer to array of int.
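A short C sketch that exercises all four forms may help; it uses the same declarator shapes as above, with hypothetical names (`value`, `returns_ptr`, `returns_int`, `call_both`, `arrays`) and fixed array sizes added so the code is complete.

```c
#include <assert.h>

static int value = 7;

int *returns_ptr(void) { return &value; }   /* function returning pointer to int */
int returns_int(void) { return value; }

int call_both(void) {
    int (*fp)(void) = returns_int;          /* pointer to function returning int */
    return *returns_ptr() + fp();           /* call binds tighter than *, so this is 7 + 7 */
}

int arrays(void) {
    int x = 1, y = 2;
    int *a[2] = { &x, &y };                 /* array of pointer to int */
    int b[2] = { 10, 20 };
    int (*pa)[2] = &b;                      /* pointer to array of int */
    return *a[1] + (*pa)[1];                /* index first, then deref; deref first, then index */
}
```

The expressions in the `return` statements mirror the declarations: `*a[1]` needs no parentheses because `[]` already wins, while `(*pa)[1]` needs them for the same reason the declaration does.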

nicoburns · on Nov 4, 2021

Argh! All of these examples seem backwards to me. Why on earth isn't "a pointer to an array of int" something like:

    *(int[])

userbinator · on Nov 4, 2021

Because of how precedence works --- you start from the inside and work outwards, the same as when evaluating ordinary maths expressions.

"Pointer to array of int" means you dereference it first to get an array of ints, then index to get an int.

pmarreck · on Nov 3, 2021

between things like this and the pre- and post-increment/decrement operators, it's shocking that C code generates so many security holes ;)

nicoburns · on Nov 3, 2021

> That line is honestly trivial for anyone familiar with the spiral rule.

Perhaps, but the spiral rule is one of the most complicated parts of C. Most things are trivial to someone who knows how to do the thing.

dnautics · on Nov 3, 2021

I thought the spiral rule was a tongue-in-cheek "rule" that was just made to point out a problem with C.

klyrs · on Nov 3, 2021

If it's not wrong...

bruce343434 · on Nov 3, 2021

You're right of course. Every language has warts, and declaration, especially involving pointers, is C's. But once you internalize that declaration mirrors usage, together with the spiral rule, it will all immediately become clear. There is some method to the madness.

rightbyte · on Nov 3, 2021

The confusing part for beginners is maybe knowing where to start the spiral (or right-left scheme) in a cast, since there is no identifier.

steerablesafe · on Nov 3, 2021

Yeah, because the "spiral-rule" is dumb, it's not a spiral at all. It's just regular operator precedence.

bruce343434 · on Nov 3, 2021

That's a fair point. In C, you always start at the identifier. In case there is none, type declarations can contain parentheses*, and just like in math, parens resolve first, so it's from inner to outer. So in this case one starts with the `(*)`.

* the tricky part is that `()` are also used to denote functions. So yeah, it's not always readable. `(*)()` would be a pointer to a function returning int (the default type) and taking an unspecified amount of arguments.

simias · on Nov 3, 2021

Do the parens in `int * (*) []` do anything? When I saw them I immediately assumed that function pointers would be involved, but it doesn't seem to be the case and now I'm confused.

EDIT: Uh, apparently you can add parens to casts but not declarations.

This doesn't compile:

    int (*) p = (int *)&i;

This does:

    int *p = (int (*))&i;

I can't quite justify this behaviour.

foxfluff · on Nov 3, 2021

Yes they do something. They are used to override precedence, just as you would in a math expression. Array indexing has higher precedence than pointer dereferencing, and declarations follow the same precedence that you have in expressions.

   int **p[123]; // p is array(123) of pointer to pointer to int

   int *(*p)[123]; // p is a pointer to array(123) of pointer to int

Sometimes people find casts confusing because there is no identifier inside. But you can easily read it if you know where the identifier would be in an equivalent declaration.

Precedence is usually documented in a man page called operator. http://man.openbsd.org/operator
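To illustrate the point about reading casts, the sketch below writes the same type twice: once in a declaration and once as the type name inside a cast, which is the declaration with the identifier removed. The names (`demo_type_names`, `slot`, `erased`, `q`) are hypothetical, and `_Static_assert` assumes a C11 compiler.

```c
#include <assert.h>

int demo_type_names(void) {
    int n = 5;
    int *slot[123] = { &n };        /* slot is array(123) of pointer to int */

    int *(*p)[123] = &slot;         /* declaration: p is a pointer to array(123) of pointer to int */
    void *erased = p;

    /* Type name in the cast: the declaration of p with "p" removed. */
    int *(*q)[123] = (int *(*)[123])erased;

    /* sizeof accepts the same identifier-free type names. */
    _Static_assert(sizeof(int **[123]) == 123 * sizeof(int **), "array of 123 pointers");
    _Static_assert(sizeof(int *(*)[123]) == sizeof q, "a single pointer");

    return *(*q)[0];                /* deref q, index the array, deref the element: n */
}
```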

MauranKilom · on Nov 3, 2021

I think this is just about where the identifier has to go in a declaration. Otherwise the spiral rule [or rather right-left rule, as pointed out elsethread] doesn't start at the right place (to over-simplify it).

You can in fact have parentheses in declarations, but the identifier must be on the inside, not just to the right of everything: https://godbolt.org/z/vKzcYMdvK
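A minimal sketch in the same spirit (written for this note, not copied from the linked Godbolt snippet) shows the accepted and rejected placements side by side. The function name `parens_in_declaration` is hypothetical.

```c
#include <assert.h>

int parens_in_declaration(void) {
    int i = 3;
    int (*p) = (int *)&i;     /* identifier inside the parentheses: same as int *p */
    int *q = (int (*))&i;     /* a cast has no identifier, so (*) stands alone */
    /* int (*) r = &i; */     /* rejected: the identifier must sit inside the parentheses */
    return *p + *q;
}
```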

abainbridge · on Nov 3, 2021

I've always thought that the "obfuscated C competition" is proof that it is quite hard to obfuscate C. There'd be no challenge in most other languages. (I haven't got enough knowledge of Zig to know if it is as easy to understand as C in general).

tyingq · on Nov 3, 2021

A fair amount of obfuscated C is also the macro processor.

bruce343434 · on Nov 3, 2021

I feel lately every IOCCC submission just has an obligatory "replace some part of the code with arbitrary identifiers which the preprocessor will search and replace back". Honestly a cheap gimmick, but it leads to a bigger "whut" factor when first seeing the code.

bruce343434 · on Nov 3, 2021

Yes, the IOCCC is always so interesting. Because the language is so spartan, there's often only one or two ways to do something, because to solve problems on a higher level you have to build the scaffolding yourself. Whereas in the usual dynamic scripting language you can reuse variables as different types, and conjure up a really convoluted approach in just a few statements.

userbinator · on Nov 3, 2021

The spiral rule is incorrect in the general case as other comments have already pointed out (the correct way is to read in precedence order) but I agree that it was trivial to decipher. I guess the opposite could be true for an experienced Zig programmer.

Joker_vD · on Nov 3, 2021

But &a gives you "int * *", and then it's dereferenced three times. How?

cecilpl2 · on Nov 3, 2021

The int** is not dereferenced three times.

The cast says "treat this int* as an array of int*", then the first dereference says "give me the first element of that array", which gets you back the original int*.

Joker_vD · on Nov 3, 2021

See, this would be a better, shorter, and more insightful explanation than an article full of mechanical transformations performed by hand.

"We start with a pointer-to-int called "a", take pointer to it and name it "p" (it's a pointer-to-pointer-to-int, although we store it in a pointer-to-whatever variable), then cast it to some weird type, dereference it thrice and store 1 into the resulting target. Two questions remain: a) what is that weird type? b) we have two levels of indirection but three dereferences, how does it work? The answer to the first question is that weird type is "pointer-to-array-of-pointers-to-int", and it helps us to answer the second question: dereferencing a value of that type is a no-op in arithmetical sense (but has a type-casting effect)."

eMSF · on Nov 3, 2021

Unless I'm mistaken, this is almost, but not quite correct. (Firstly, what's being cast is `p` which is a `void-pointer`.) More importantly, the latter cast says to treat p as a `pointer to an array of int-pointers`, which means that the first dereference actually gives you the array. Expressions with an array type in most contexts (including this one) decay into pointers to their first elements – thus the second dereference gets you the first element of the array, an int-pointer, and the third one gets you the integer being pointed to.

Also, because this array type is incomplete (it has no size), I don't think you could use it in any context where it didn't "decay" into a pointer to its first element (you can test what your compiler says if you try to measure the array's size with the sizeof operator when only using one dereference).
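The decay described above can be observed directly: dereferencing the pointer-to-array changes the type but not the address. The following is a sketch with hypothetical names (`deref_keeps_address`, `pa`, `first`); the `sizeof` line is left commented out because, as noted, it should be rejected for an incomplete array type.

```c
#include <assert.h>

int deref_keeps_address(int *a) {
    void *p = &a;
    int *(*pa)[] = (int *(*)[])p;   /* pointer to array (unknown size) of int pointers */

    /* sizeof *pa; */               /* would not compile: *pa has incomplete type int *[] */

    int **first = *pa;              /* the array decays to a pointer to its first element */

    /* Same address at every step; only the static type differs. */
    return (void *)pa == (void *)first
        && (void *)first == (void *)&a;
}
```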

nyc_pizzadev · on Nov 3, 2021

There is an array cast [] in there. So dereferencing that using a plain * takes you back to the original double pointer.

caslon · on Nov 3, 2021

Am I the only one who thinks that Zig's syntax looks incredibly ugly? I've never seen a snippet that didn't make my eyes glaze over from the syntax. What am I not getting here?

kristoff_it · on Nov 3, 2021

This particular example is machine-generated code created by translate-c, it's meant to be semantically equivalent to the C code and even uses language features that you're normally not even supposed to use (c pointers).

Reasonable Zig code looks more like this:

https://github.com/riverwm/river/blob/master/riverctl/main.z...

That said I think it's fine if you don't like the syntax. I think that some complaints are honestly too superficial to be legitimate (like complaining about builtins being prefixed with @), but at the same time Zig is often times prioritizing explicitness over "good looking".

I personally consider Swift a very good looking language, but then I look at all the new features that got added since I used it last, remember that I value simplicity over aesthetics, and go back to Zig.

dragonelite · on Nov 3, 2021

Ooh that looks a bit like Rust, having quickly scrolled by. I have been keeping an eye on Zig but I'm waiting till the package manager stuff has been finalized and implemented.

capableweb · on Nov 3, 2021

> Am I the only one who thinks that Zig's syntax looks incredibly ugly?

It's very subjective what is beautiful or ugly, of course. It'd be more interesting if you can offer specific critique rather than just calling it ugly.

You're probably not the only one, I for one would call any non-lisp "ugly", but again, highly subjective, as many others find some C-like code beautiful but other C-like code ugly.

jcelerier · on Nov 3, 2021

> It's very subjective what is beautiful or ugly, of course.

if it was, something like trypophobia would not exist

capableweb · on Nov 3, 2021

What? Are you arguing that because a phobia (that not everyone has) exists, beauty is not subjective but absolute? I'd love to see how you measure beauty if so, including the "beauty of code".

Talanes · on Nov 3, 2021

Trypophobia proves beauty subjective, not the other way around. I can find an image beautiful that someone with trypophobia finds deeply unsettling.

jcelerier · on Nov 7, 2021

Objective / subjective does not mean that things cannot be experienced relatively. Someone colourblind isn't able to distinguish all colours, yet it does not mean that colours as a frequency of electromagnetic radiation are subjective, only their experience is.

cies · on Nov 3, 2021

It is common to make stuff you don't generally want in your code (but still need to be able to do because the language is sufficiently powerful) look ugly.

Like unsafePerformIO in Haskell.

deaddodo · on Nov 3, 2021

As someone who likes both C and Zig for their simplicity and explicitness, I don’t find it “ugly”.

I do find it practical, with an aesthetically barebones approach.

pjmlp · on Nov 3, 2021

You are not alone, if I miss @ everywhere I will just start an Objective-C project.

Then using modules with JavaScript AMD pattern is also not appealing.

AnIdiotOnTheNet · on Nov 3, 2021

If using '@' to prefix builtins[0] is the biggest complaint[1] people have about Zig, then Zig must be doing a pretty good job.

[0] or really any statement at all about aesthetics

[1] I know for a fact pjmlp will also complain about lack of built-in gc or ref-count pointer safety.

pjmlp · on Nov 3, 2021

Paying attention it seems.

matheusmoreira · on Nov 3, 2021

Yeah. I think I'm so used to C by now that I just can't handle anything that doesn't look like C. It's like my brain just ignores text when it can't recognize the C code patterns.

Semantically Zig is really interesting though.

skocznymroczny · on Nov 3, 2021

I feel similar when it comes to non-OOP code. I don't like free functions and global variables just hanging there.

kristoff_it · on Nov 3, 2021

OOP might have helped popularize usage of the dot notation, but namespaces are a different thing.

Zig has no inheritance but everything is namespaced, including declaring functions inside struct definitions so that you can use them as if they were methods.

matheusmoreira · on Nov 3, 2021

Yeah, me neither. Names must belong to some namespace, it really bothers me when code starts binding common nouns in a global context. C lacks namespaces so I use prefixes instead. At least this solution doesn't screw up the ABI like in C++.

I hate global variables so much it's one of the reasons I got rid of libc. Freestanding C turned out to be a superior language just because it lacks all the libc cruft.

I ultimately dropped Ruby because of global state. It's such a wonderful language but it has one fatal flaw: lack of proper modules. The require method just executes Ruby source files, modifying the global state of the interpreter. It ceased to be a beautiful language once I realized this. Python's modules are superior, and the Javascript approach is the best one: just a normal function that returns a normal object containing exported data and functions. Javascript modules are completely reified.

foxfluff · on Nov 3, 2021

Yeah, the syntax doesn't really excite me too much. Which is a shame, because I would like to see a modernized "better C" that isn't more verbose than C.

pornel · on Nov 3, 2021

Out of curiosity I've checked what c2rust.com thinks about it:

    **(*(p as *mut [*mut libc::c_int; 0])).as_mut_ptr() = 1 as libc::c_int;

which is still needlessly complicated, and not even quite accurate due to giving the array a 0 size (the as_mut_ptr() converts the array back to a C pointer).

Arnavion · on Nov 3, 2021

>and not even quite accurate due to giving the array a 0 size (the as_mut_ptr() converts the array back to a C pointer).

It doesn't seem inaccurate to me, more like the best choice at hand. If the C array has a known length, the Rust code has it too. Only if the C code has an array of unknown length does the Rust code use a 0-length array. Furthermore, if the C code indexes the array of unknown length, the Rust code uses .as_mut_ptr().offset(...) instead of directly indexing the array. So the fact that it represents C arrays of unknown length with Rust arrays of 0 length does not cause any problem, because the generated code is consistent.

    char foo(void* p) {
        char (*arr)[] = (char (*)[])p;
        return (*arr)[1];
    }

    char bar(void* p) {
        char (*arr)[3] = (char (*)[3])p;
        return (*arr)[1];
    }

... translates to:

    pub unsafe extern "C" fn foo(mut p: *mut libc::c_void) -> libc::c_char {
        let mut arr: *mut [libc::c_char; 0] = p as *mut [libc::c_char; 0];
        return *(*arr).as_mut_ptr().offset(1 as libc::c_int as isize);
    }

    pub unsafe extern "C" fn bar(mut p: *mut libc::c_void) -> libc::c_char {
        let mut arr: *mut [libc::c_char; 3] = p as *mut [libc::c_char; 3];
        return (*arr)[1 as libc::c_int as usize];
    }

tyingq · on Nov 3, 2021

For those mystified by certain bits of Perl, the built-in Deparse module is nice.

perl -MO=Deparse /some/script

Outputs a (usually) more readable equivalent, and also works for one-liners that use -e "somesnippet".

Joker_vD · on Nov 3, 2021

TL;DR:

    void f(int *a) {
        int **p = &a;
        **p = 1;  // the same as *a = 1;
    }

Making a concise explanation of how exactly the third dereferencing disappeared is left as a further exercise for the reader.

_pmf_ · on Nov 3, 2021

Why is taking the address of a parameter legal? Doesn't this depend on the ABI and could be a register?

ximeng · on Nov 3, 2021

https://stackoverflow.com/questions/34519318/c-address-of-fu... suggests it's guaranteed by the standard to be OK.

In the standard at https://web.archive.org/web/20181230041359if_/http://www.ope...

    6.5.3.2 Address and indirection operators
    Constraints
    1 The operand of the unary & operator shall be either a 
    function designator, the result of a [] or unary
    * operator, or an lvalue that designates an object that
    is not a bit-field and is not declared with the
    register storage-class specifier.

So as the parameter is an lvalue it is guaranteed to work with the & operator.

jcranmer · on Nov 3, 2021

The parameter is copied into a stack variable so that you can take its address in such cases.
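A small C sketch of what the quoted constraint permits: a parameter is an ordinary object local to the call, so `&` applies to it, and writes through that address do not reach the caller's variable. The function names (`bump_copy`, `caller_unchanged`) are hypothetical, and the `register` case is shown only as a comment since it should be rejected by the compiler.

```c
#include <assert.h>

/* Taking the address of a parameter is allowed; n belongs to this call only. */
int bump_copy(int n) {
    int *p = &n;
    *p += 1;
    return n;
}

/* The caller's variable is untouched because the argument was passed by value. */
int caller_unchanged(void) {
    int x = 10;
    int r = bump_copy(x);
    return x == 10 && r == 11;
}

/* With the register storage class the constraint quoted above applies:
   int bad(register int n) { return *&n; }    -- should not compile */
```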