The "Clockwise/Spiral Rule" in C

tptacek · on Jan 18, 2013

In practice, nobody uses a "clockwise/spiral rule" to read C declarations; they typedef complex declarations into bite-sized components.

C declarations are hardest to understand when they involve (1) "arrays"† of "arrays" of "arrays", (2) function pointers, and (3) multiple layers of indirection.

In systems and networking code and in most application code, (1) "multi-level arrays" are uncommon, and they're uncommon in roughly the same way tuples of tuples of tuples are uncommon in Python: they're a symptom that you're missing a level of abstraction.

Function pointer declarations (2) are common and annoying to read. However, in virtually all cases, the core of what you're trying to express with a function pointer declaration is "what arguments does this function take" and "what does it return". More importantly, most idiomatic C code typedefs function pointers, so you're not looking at prototype decls with nested complex function pointer decls inside of them.

So instead of

    void (*signal(int sig, void (*func)(int)))(int);

a modern C programmer would expect

     typedef void (*sig_t) (int);

     sig_t signal(int sig, sig_t func);

The qsort(3) man page is another example of an archaic declaration (its authors presumably think this is the clearest way to convey qsort to an experienced programmer); check out the man page for pcap_loop(3) for a better example.

In most C code, multiple layers of indirection (3) are one-step pointers-to-pointers used to work around call-by-value in C. If you store in your head the notion that a pointer to a pointer is just there to provide a return-value argument, that's probably all you'd need to remember about it. Pointers to pointers to pointers are rare.

† C pedants: I'm using "array" in an intentionally vague sense.

daeken · on Jan 18, 2013

I make a rule of always typedefing function pointer types. I've never seen a single case where it didn't make things cleaner and more readable, not to mention that it's significantly easier to change the declarations in the future, since you rarely have a single place where you define and use a function pointer.

DannoHung · on Jan 18, 2013

I like types. I do have a question though: Why do you call it "sig_t" as opposed to "sig_h" or something. Based on its positional appearance, it's pretty clear that it's a type. Or does t stand for something else?

wrl · on Jan 18, 2013

It's a POSIX convention to indicate that it's a typedef.

stephencanon · on Jan 18, 2013

User code built against POSIX should never use the _t suffix; POSIX explicitly puts *_t into the reserved namespace (http://pubs.opengroup.org/onlinepubs/007904975/functions/xsh...), meaning that future revisions may add arbitrary type names of that form, which will break your code if you use the same name and include POSIX headers.

tptacek · on Jan 18, 2013

Thbthbthbthbthbththbthtbhtbhthbt! I say to POSIX. This convention is practically universal.

stephencanon · on Jan 18, 2013

Just so long as I don't see you filing any bug reports when POSIX changes a header and your code stops compiling. =)

spc476 · on Jan 19, 2013

That's why I use *__t (two underscores).

obiterdictum · on Jan 19, 2013

That's not ideal either.

http://www.doc.ic.ac.uk/lab/cplus/c++.rules/chap5.html

"The use of two underscores (`__') in identifiers is reserved for the compiler's internal use according to the ANSI-C standard."

Someone · on Jan 19, 2013

Are you sure that statement is correct? In my memory, it's only double underscores at the start of identifiers. My final draft of ISO C 11 seems to confirm that:

"7.1.3 Reserved identifiers 1 Each header declares or defines all identifiers listed in its associated subclause, and optionally declares or defines identifiers listed in its associated future library directions subclause and identifiers which are always reserved either for any use or for use as file scope identifiers. — All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use. — All identifiers that begin with an underscore are always reserved for use as identifiers with file scope in both the ordinary and tag name spaces. — Each macro name in any of the following subclauses (including the future library directions) is reserved for use as specified if any of its associated headers is included; unless explicitly stated otherwise (see 7.1.4). — All identifiers with external linkage in any of the following subclauses (including the future library directions) and errno are always reserved for use as identifiers with external linkage.184) — Each identifier with file scope listed in any of the following subclauses (including the future library directions) is reserved for use as a macro name and as an identifier with file scope in the same name space if any of its associated headers is included. 2 No other identifiers are reserved. If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), or defines a reserved identifier as a macro name, the behavior is undefined."

obiterdictum · on Jan 21, 2013

You are right, it appears to be C++-specific and looks like a typo in the link.

C++98, 17.4.3.1.2 Global names

- Each name that contains a double underscore (__) [...] is reserved to the implementation for any use.

stephencanon · on Jan 18, 2013

The "spiral rule" makes for pretty pictures, but unfortunately it isn't correct. One simple example:

    int *foo[3][4];

The spiral rule would have us read this as "foo is an array [3] of pointers to array [4] of int". What it actually is is "foo is an array [3] of arrays [4] of pointers to int".

The correct rule* for parsing declarations is "Start from the thing being declared, and read right until you hit a closing parenthesis, then read left until you reach the matching paren, then go back to reading right, and always skip over any tokens you've already read." This is sometimes called the "right-left rule"; it's not as pretty as "spiral", but it actually works.

(*) the really correct rule is "follow the syntax and semantics specified in the language standard", but for some reason that confuses people.

tedunangst · on Jan 18, 2013

I think it works if you treat all the [] as a single token. foo is a 3x4 two dimensional array of pointers to int.

stephencanon · on Jan 18, 2013

You can construct a rule along those lines that would work (though just taking all the [] at once doesn't suffice), but it ends up being precisely equivalent to the right-left rule, so I don't really see the point.

tedunangst · on Jan 18, 2013

I think the point is that the spiral rule is just a way to remember/implement the right left rule.

revelation · on Jan 18, 2013

But its not, there are no two dimensional arrays. There are arrays of arrays.

kzrdude · on Jan 18, 2013

Yes, but does it matter? We can call that a 2D array.

tedunangst · on Jan 18, 2013

The sizeof operator begs to differ.

sltkr · on Jan 18, 2013

How does it differ? It just gives the size of the whole object -- that doesn't mean it's not an array of arrays...

jwhite · on Jan 19, 2013

To elaborate on what tedunangst said, for those less experienced with C:

To some extent it's a matter of nomenclature, but one can draw a distinction in C between arrays of arrays and 2D arrays. It does matter. The following is an array of arrays of ints (a "ragged array" as tedunangst mentioned), if I malloc an array of pointers to ints and then add a loop that mallocs an array of ints for each a[i]:

  int **a;

  a = malloc(N * sizeof(a[0]));
  int i;
  for(i = 0; i < N; i++)
  {
      a[i] = malloc(M * sizeof(a[i][0]));
  }

Then a[i][j] picks out a single int from the array of arrays. The layout in memory is then like this:

      +--+--+--+-       -+--+
  a:  |  |  |  |   ...   |  |   (array of pointers to arrays of ints)
      +--+--+--+-       -+--+
        |   |
        |   |
        |  +--+--+--+-       -+--+
  a[0]: |->| 1| 2| 3|   ...   |  |  (array of ints)
           +--+--+--+-       -+--+
            |
            |  +--+--+--+-       -+--+
  a[1]:     |->|41|42|43|   ...   |  |  (array of ints)
               +--+--+--+-       -+--+
    .
    .	
    .

Note that you cannot rely on these arrays to have any relationship with each other in respect to their positions in memory. That's entirely up to malloc(3).

However, the following is a 2D array of ints:

  int a[3][4];

The same syntax as before can be used to pick out one of the ints in the 2D array: a[i][j]. But this time the C standard gives us that this object is laid out in memory in row-major order. So if you initialize it like this:

  int a[3][4] = { {1, 2, 3, 4}, { 5, 6, 7, 8 }, { 9, 10, 11, 12 } };

then its memory layout looks like this:

      +--+--+--+--+--+--+--+--+--+--+--+--+
  a:  | 1| 2| 3| 4| 5| 6| 7| 8| 9|10|11|12|
      +--+--+--+--+--+--+--+--+--+--+--+--+
        ^        ^  ^        ^  ^        ^
        |__a[0]__|  |__a[1]__|  |__a[2]__|

This means you can index it manually like this:

  int x = *((int*)a + i * M + j);

where M is the length of each row, in this case M==4. You can't do that with the ragged array-of-arrays.

Exercise: try taking sizeof(a) in both cases, as suggested in the grandparent.

Exercise: 3D arrays?

tedunangst · on Jan 18, 2013

Hmm, I'm not sure what to say. A two dimensional array is an array of arrays. I'm objecting to the idea that two dimensional arrays are some mythical construction.

For instance, the only way to construct a ragged array is with pointers, and then sizeof doesn't "work" to give the complete size. Also, the [][] syntax is one of the places where pointers and arrays are less equivalent than usual. You can't pass a [][] array via star star.

ralph · on Jan 19, 2013

Yes, the Right-Left rule is the one to use; I first read of it in Anderson & Anderson's _Advanced C: Tips & Techniques_. Having that simple rule, there's no need to struggle over qsort()'s declaration and others, or create unnecessary, one-use, typedefs.

malkia · on Jan 18, 2013

http://cdecl.org/

binarycrusader · on Jan 18, 2013

Cool, but it doesn't handle the last example:

  void (*signal(int, void (*fp)(int)))(int)

...claims there's a syntax error, and there might be, but I don't know off-hand.

cnvogel · on Jan 18, 2013

It's confused by the parameter name fp.

    cdecl> explain void (*signal(int, void (*)(int)))(int) ;

    declare signal as function (int, pointer to function (int) returning void) returning pointer to function (int) returning void

malkia · on Jan 18, 2013

This works:

  void (*signal(int, void (*)(int)))(int)

tantalor · on Jan 18, 2013

It says "C gibberish ↔ English", but does it really support English to C?

stephencanon · on Jan 18, 2013

Yes, for some definition of "English": http://cdecl.ridiculousfish.com/?q=declare+bar+as+volatile+p...

stephencanon · on Jan 18, 2013

cdecl is a great resource, though I do wish that it knew about the restrict keyword.

caf · on Jan 19, 2013

It misspells it "noalias".

hobbyist · on Jan 18, 2013

there is a cdecl package in all linux distributions, which can be used inside terminal to expand such C declrations. It was really helpful when I started C programming.

rwmj · on Jan 19, 2013

Unfortunately the license is not clearly free. This was why we had to drop it from Fedora.

I have tried to contact the original authors, and have got clarifications from all of them except Graham Ross.

So if anyone here knows how to get hold of him, please let me know! (rjones at redhat dot com)

lomendil · on Jan 18, 2013

I can't imagine using this for real. Perception is so much more parallel than that. If I see

  char *(*fp)(int)

I see that it matches the pattern

  returntype *(*function)(args)

It's the "* (* )()" that one sees all at once, and that signifies "this is a function pointer".

This actually reminds me of some research where kids are asked to trace the direction of interlocked gears. They start by looking at each one, but soon they move to more general strategies. Edit - this is the one, in case anyone is curious: Dixon, J. A., & Bangert, A. S. (2002). The prehistory of discovery: precursors of representational change in solving gear system problems. Developmental Psychology, 38(6), 918.

yshklarov · on Jan 19, 2013

But before the kids could use general strategies, they had to work out the details.

You're right, the spiral rule is probably unwieldy in practice. But until a programmer can recognize the patterns, the rule can be used as a learning aid.

PeterisP · on Jan 18, 2013

This is why most programmers are more productive with newer languages. Syntax and readability matter.

tptacek · on Jan 18, 2013

"Readability" has virtually nothing to do with why people are more productive in higher level languages. Pointer declarations address a problem most high level languages don't even allow developers to engage.

tedunangst · on Jan 18, 2013

This is great. Functions like signal are rare, so I just memorize what they do without reading the decl every time. Also, typedefs for function pointers go a LONG way towards readability.

The point at the end about const is good too. I see a lot of people screwing that up and consting the wrong piece.

dryman · on Jan 19, 2013

Great post. But it's lack of explanations in type qualifiers and storage specifiers. C declaration is not easy for beginners but surely is one of the most important subjects towards optimisations. Correct usage of const and restrict keywords let compiler optimise your code super fast for free. Check my post for example usage of type qualifiers and how the syntax is structured in C: http://www.idryman.org/blog/2012/10/29/type-qualifiers/

abcd_f · on Jan 19, 2013

"signal"... pfft... entry level kindergarten stuff. Try parsing array of pointers to array of function pointers taking pointer array of arrays with some const and __restrict sprinkled on. And that'd be a warm up. Then name some of the function pointer arguments _const, _restrict and _ - and now we are talking. Spiral out of that!

* This is loosely based on an interview question I got few years ago. I did answer it, then asked similar clusterfuck of a question back. Needless to say it didn't go anywhere.

ajwerner · on Jan 19, 2013

http://ieng9.ucsd.edu/~cs30x/rt_lt.rule.html

deltasquared · on Jan 18, 2013

This is why I like S-Expressions

Symmetry · on Jan 18, 2013

Yes, but also why Go does it's declarations the way it does.

ajross · on Jan 19, 2013

S-expressions don't declare types, though. A sexp grammar for the same thing would be basically isomorphic to a fully-parenthesized C type expression.

C has some glitches: the fact that the pointer type decoration goes on the left while the array decoration goes on the right is one. The precedence rule that requires parentheses for function pointers is unfortunate. And the putting the decoration on the variable and not the type is just simply a bug IMHO.

But the complexity here is inherent. Complicated types require complicated expressions no matter what grammar you use.

deltasquared · on Jan 19, 2013

I think the gist of your argument is this: both C languages and s-expression based languages are touring complete and thus isomorphic.

For any given C compiler I can write a lisp program that gives the same answers for input. Likewise for any Lisp program I can make input for a c compiler that yields the same behavior. I know this because both c and lisp are touring complete. Yay formal languages !!!

But touring completeness and isomorphism are not exactly the same thing. When you run into undefined behavior in your c/c++ program, there does not exist exactly one s-expression to cover the possible interpretations of every compiler. You may need to write several compiler specific s-expressions to cover all the behaviors from the same c expression.

These compiler specific difference break the mappings required for isomorphism.

Lisps also have two different ways of evaluating input. Lazy and strict. So one s-expression may require several c expressions to deal with edge cases. Nothing is perfect.

Sadly, either way you look at it, c expressions are not quite isomorphic to s-expressions.

I agree with you about left to right vs right to left precedence. That's actually why I prefer s-expressions to order of operations and infix notation.

Also some complexity is inherent, some is self imposed. Sometimes static typing is nice and catches errors. Sometimes dynamic typing can save a lot of keystrokes.

bproctor · on Jan 18, 2013

This is supposed to be a joke, right?