The Clockwise/Spiral Rule of C declarations

stephencanon · on Oct 23, 2016

This has been posted before[1], and the "spiral rule" is a load of hooey.

The correct rule is "follow the C grammar". An easier to remember and also correct rule is "start at the identifier being declared; work outwards from that point, reading right until you hit a closing parenthesis, then left until you hit the corresponding open parenthesis, then resume reading right..." (this is sometimes called the "right-left rule"[2]).

The "spiral rule" dances around the truth without actually being precise enough to be useful.

[1] https://news.ycombinator.com/item?id=5079787 [2] http://ieng9.ucsd.edu/~cs30x/rt_lt.rule.html
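
A minimal sketch of the right-left rule in C, for anyone who wants to try it. The names `table`, `row_of_three`, `pick`, `storage`, `nth` and `right_left_demo` are made up for this illustration; the comments walk each declaration outward from its identifier.

```c
#include <assert.h>
#include <stddef.h>

static int storage[3] = {10, 20, 30};

/* Start at "table"; read right: array of 3; read left: pointers to int.
   => table is an array of 3 pointers to int. */
static int *table[3];

/* Start at "row_of_three"; a ')' follows, so read left: pointer;
   leave the parentheses and read right: to an array of 3; then left: ints.
   => row_of_three is a pointer to an array of 3 ints. */
static int (*row_of_three)[3];

/* Start at "pick"; ')' follows, so read left: pointer;
   read right: to a function taking int; read left: returning pointer to int. */
static int *(*pick)(int);

/* Hypothetical helper whose type matches what pick points to. */
static int *nth(int i) { return &storage[i]; }

/* The sizes agree with the readings above. */
_Static_assert(sizeof table == 3 * sizeof(int *), "array of 3 pointers");
_Static_assert(sizeof *row_of_three == 3 * sizeof(int), "points to 3 ints");

int right_left_demo(void)
{
    table[0] = &storage[0];
    row_of_three = &storage;
    pick = nth;
    /* Each use mirrors its declarator: 10 + 20 + 30. */
    return *table[0] + (*row_of_three)[1] + *pick(2);
}
```

Calling `right_left_demo()` should yield 60 if the three readings are right.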

thinkMOAR · on Oct 24, 2016

I used to complain about re-posts too. But this is the first time I've seen a reference to this website, which seems interesting to me. So I'm happy to have a new URL on my to-read list.

Thanks!

colanderman · on Oct 23, 2016

Way simpler: from inside out, read any subpart of the type as an expression. (Arrays have precedence over pointers, as usual.) The type that remains is that expression's type. So e.g. given the type:

    const char *foo[][50]

the following expressions have the following types:

     foo       -> const char *[][50]
     foo[0]    -> const char *  [50]
     foo[0][0] -> const char *
    *foo[0][0] -> const char

Another example:

    int (*const bar)[restrict]
    
      bar     -> int (*const)[restrict]
     *bar     -> int         [restrict]
    (*bar)[0] -> int

One more:

    int (*(*f)(int))(void)
    
        f        -> int (*(*)(int))(void)
      (*f)       -> int (*   (int))(void)
      (*f)(0)    -> int (*        )(void)
     *(*f)(0)    -> int            (void)
    (*(*f)(0))() -> int

(In this case, the last expression is more readily written as f(0)(), since pointers to functions and functions are called using the same syntax.)
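
A small sketch to check the third example with a compiler. The functions `forty_two`, `choose` and `call_both_ways` are hypothetical, invented only so that `f` has something to point at.

```c
#include <assert.h>

/* Hypothetical target: a function taking void and returning int. */
static int forty_two(void) { return 42; }

/* choose is a function taking int and returning a pointer to a
   function (void) returning int. */
static int (*choose(int unused))(void)
{
    (void)unused;
    return forty_two;
}

int call_both_ways(void)
{
    int (*(*f)(int))(void) = choose;

    int long_form  = (*(*f)(0))();  /* the last expression from the table above */
    int short_form = f(0)();        /* same call, using the shorter syntax */

    return long_form == short_form ? long_form : -1;
}
```

Both spellings call the same function, so `call_both_ways()` returns 42 rather than -1.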

ericye16 · on Oct 24, 2016

What happened to the const in the second example?

kr7 · on Oct 24, 2016

The pointer itself is const; not what it points to.

    bar  -> const pointer to mutable array of ints
    *bar -> mutable array of ints

And const pointers are dereferenced with * , not (* const), so the rule needs an exception for const pointers (as well as volatile pointers).

colanderman · on Oct 24, 2016

That const applies only to the bar symbol itself, not to anything it points to. So once bar is dereferenced, the const doesn't matter. The beauty of this method is that it predicts that correctly without having to think about it.

cyphar · on Oct 24, 2016

Yeah, I think they confused

  const *

with

  * const

colanderman · on Oct 24, 2016

Nope, * const means that the identifier (i.e. thing to the right of the star) is const. That is, in this example, the symbol "bar" is const, not anything that it points to. So once you dereference it, the const no longer matters.

cyphar · on Oct 24, 2016

You're right, I got confused. const is read as "the thing to the right of me is const". const * means const pointer, * const means pointer to const.

wruza · on Oct 24, 2016

Vice versa. Const pointer:

    int *const p;

Pointer to const value:

    const int *p
    int const *p

>const is read as "the thing to the right of me is const"

const is a type qualifier and is read at its position in the declaration, not just "to the right".
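
A short sketch of the two cases, with the lines the compiler would reject left as comments. The function name `const_demo` and the variables are made up for this illustration.

```c
#include <assert.h>

int const_demo(void)
{
    int a = 1, b = 2;

    int *const fixed = &a;     /* const pointer: cannot be pointed elsewhere */
    *fixed = 10;               /* fine: the int it points to is mutable */
    /* fixed = &b; */          /* would not compile: fixed itself is const */

    const int *view = &a;      /* pointer to const int (same as int const *) */
    view = &b;                 /* fine: the pointer itself is mutable */
    /* *view = 20; */          /* would not compile: no writes through view */

    return a + *view;          /* 10 + 2 */
}
```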

cyphar · on Oct 24, 2016

Wow. I literally was staring at my test case when I wrote it and I _still_ got it wrong. I think I need to get some more sleep...

jcheng · on Oct 24, 2016

This would make a great CLI tool!

kazinator · on Oct 24, 2016

The real rule is that the type construction operators mirror the unary and postfix family of operators (declaration follows use). For instance unary * declares a pointer, mimicking the dereference operator, and postfix [] and () declare arrays and functions, mimicking array indexing and function call.

To follow the declaration you make use of the fact that postfix operators have a higher precedence than unary, and that of course unary operators are right-associative, whereas postfix are left-associative (necessarily so, since both have to "bind" with their operand).

So given

   int ***p[3][4][5];

we follow the higher precedence, in left-to-right order: [3], [4], [5]. Then we run out of that, and follow the lower-precedence * * * in right-to-left order.

If there are parentheses present, they split this process. We go through the postfixes, and then the unaries within the parens. Then we do the same outside those parens (perhaps inside the next level of parens):

   int ****(***p[3][4][5])[6][7];
               1 2  3  4
            765
                           8  9
       1111
       3210

Start at p, follow postfixes, then unaries within parens. Then the postfixes outside the parens and remaining unaries.

The result is in fact a spiral just from going root postfix unary out postfix unary out. We just don't have to focus on the spiral aspect of it.
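
One way to check that reading is to let the compiler confirm each layer with `sizeof`. This is only a sketch assuming a C11 compiler; `innermost_count` is a made-up helper, and nothing is dereferenced at run time because `sizeof` does not evaluate its operand here.

```c
#include <assert.h>
#include <stddef.h>

/* The declaration from the comment above. */
static int ****(***p[3][4][5])[6][7];

/* Steps 1-4: p is an array of 3 arrays of 4 arrays of 5 ... */
_Static_assert(sizeof p / sizeof p[0] == 3, "outer dimension");
_Static_assert(sizeof p[0] / sizeof p[0][0] == 4, "middle dimension");
_Static_assert(sizeof p[0][0] / sizeof p[0][0][0] == 5, "inner dimension");

/* Steps 5-7: ... pointers to pointers to pointers ...
   Steps 8-9: ... to an array of 6 arrays of 7 ... */
_Static_assert(sizeof ***p[0][0][0] / sizeof (***p[0][0][0])[0] == 6,
               "first dimension outside the parentheses");
_Static_assert(sizeof (***p[0][0][0])[0] / sizeof (***p[0][0][0])[0][0] == 7,
               "second dimension outside the parentheses");

/* Steps 10-13: ... of pointer to pointer to pointer to pointer to int. */
_Static_assert(sizeof (***p[0][0][0])[0][0] == sizeof(int ****),
               "element is int ****");

/* Number of int **** elements in the pointed-to [6][7] array. */
size_t innermost_count(void)
{
    return sizeof ***p[0][0][0] / sizeof(int ****);
}
```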

robertelder · on Oct 23, 2016

I used to think of the spiral rule as being a good guide, but then a commenter on HN showed me otherwise:

https://news.ycombinator.com/item?id=12053206

userbinator · on Oct 24, 2016

The real credit goes to Linus Torvalds, as I linked in that post, but I'll repeat the link again here:

https://plus.google.com/+gregkroahhartman/posts/1ZhdNwbjcYF

_ij0r · on Oct 23, 2016

The rule is misleading in cases like:

    int* arr[][10];

Spiral rule would state "arr is an array of pointers to arrays of 10 ints", where actually it would be "arr is an array of array of 10 pointers to int".

Instead, when you write declarations, do it from right-to-left, e.g.:

   char const* argv[];

"argv is an array of pointers to constant characters"

It doesn't help with reading, unfortunately.

clusmore · on Oct 23, 2016

I think the advice that helped me the most was "Declaration follows usage".

  int* arr[][10];

If you index twice into arr and then dereference, you'll get an int. So arr must be an array of array of pointer to int.
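
A minimal sketch of that reasoning, with made-up variables `one` and `two` and a made-up function name. The expression used to read a value has the same shape as the declarator.

```c
#include <assert.h>

static int one = 1, two = 2;

/* An array of arrays of 10 pointers to int. The initializer supplies
   two rows, so the unspecified outer dimension becomes 2. */
static int *arr[][10] = {
    { &one },          /* remaining pointers in this row are null */
    { &one, &two },
};

/* Each row holds 10 pointers, not 10 ints. */
_Static_assert(sizeof arr[0] == 10 * sizeof(int *), "row is 10 pointers");
_Static_assert(sizeof arr / sizeof arr[0] == 2, "two rows");

int use_mirrors_declaration(void)
{
    /* Index twice, then dereference: the result is an int. */
    int value = *arr[1][1];
    return value;
}
```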

convales · on Oct 23, 2016

Declaration follows usage is much easier to follow than the artificial spiral rules. IMHO, with a few typedefs, declaration follows usage can make things pretty simple.

dllthomas · on Oct 24, 2016

To my mind that's one part of the reason the * belongs next to arr rather than next to int.

The other part is `int* x, y`.
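
For anyone who has not been bitten by that yet, here is a small sketch using C11 `_Generic` to ask the compiler what each variable is. The function name is made up for this illustration.

```c
#include <assert.h>

int star_binds_to_declarator(void)
{
    int* x, y;   /* looks like two pointers; x is int *, y is a plain int */

    y = 5;
    x = &y;

    /* _Generic selects a branch from the static type of its operand. */
    int x_is_pointer = _Generic(x, int *: 1, default: 0);
    int y_is_int     = _Generic(y, int: 1, default: 0);

    return x_is_pointer && y_is_int && *x == 5;
}
```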

vostok · on Oct 23, 2016

For this reason I strongly prefer writing

    char const

rather than

    const char

Is there a reason to prefer the second version? It's a lot more popular in my experience.

Spivak · on Oct 23, 2016

It's the difference between "declare a constant integer" and "declare an integer constant" and to me the former more accurately represents what you're doing since `const` is modifying `int`, `int` isn't modifying `const`.

kccqzy · on Oct 24, 2016

Putting const on the right makes more sense when you have pointers or references. Then you just always read from right to left: `int const *` is a pointer to constant integer whereas `int * const` is a constant pointer to integer.

Also your argument about which modifies which is strongly anglocentric: there are plenty of people whose native language puts modifiers after the things they modify.

gtrubetskoy · on Oct 23, 2016

Contrast the first example with Golang:

  str [10]*byte

which reads exactly as it is declared: "str is an array of length 10 of pointers to byte" (byte is Go equivalent of C char (mostly)).

iopq · on Oct 24, 2016

Or Rust:

    let string: [&u8; 10];

string is an array of references to unsigned integers of 8 bits of length ten

Ericson2314 · on Oct 24, 2016

Actually it's a semicolon.

iopq · on Oct 24, 2016

Whoops, fixed

kccqzy · on Oct 24, 2016

It can also be simpler if you use idiomatic modern C++ with std::array<T, n> and std::function<R(T1, T2)>.

userbinator · on Oct 24, 2016

Looking at "idiomatic modern C++", I am often at a loss for words at what lengths they've gone to in order to reinvent things while greatly obfuscating them in the process. Is there a std::pointer_to<T> too? I don't know, but something like this

    std::array<std::pointer_to<byte>, 10> str;

certainly does not look any more readable to me than

    byte *str[10];

. (Disclaimer: I mainly work with C, but find some C++ features genuinely useful, although the majority of the time they seem more like absurd complexity for the sake of complexity.)

physguy1123 · on Oct 24, 2016

I've never seen nor heard of pointer_to ever being used to declare a pointer to something. I believe it's used inside of custom allocators for a generic type that might not use a normal pointer as the pointer type, but would never be used for normal declarations like this.

std::array is useful because it avoids array-to-pointer decay, has value semantics, and actually carries the array length as type information in a function parameter.

detrino · on Oct 24, 2016

std::array does not exist because it is easier to read. It exists because C arrays behave strangely. Two examples: decay to pointer and no value semantics.
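
Both behaviours can be seen in plain C. The sketch below is only an illustration: `struct int10` is a hypothetical wrapper that loosely plays the role std::array plays in C++, and the function names are invented.

```c
#include <assert.h>

/* Decay: despite the [10], this parameter really has type int *. */
int param_is_pointer(int a[10])
{
    return _Generic(a, int *: 1, default: 0);
}

/* Hypothetical wrapper: a struct can be copied, a bare array cannot. */
struct int10 { int v[10]; };

int wrapper_copies(void)
{
    struct int10 first = { {1, 2, 3} };
    struct int10 second = first;   /* the whole array is copied by value */
    second.v[0] = 99;

    /* With bare arrays, "int b[10]; b = first.v;" would not compile. */
    return first.v[0];             /* the original is unchanged */
}
```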

Jach · on Oct 24, 2016

Or Nim:

    var str: array[10, ptr byte]

(and much richer types)

Edit: and while I'm here, Nim has other sensible syntax for this low level stuff...

    var b: byte = 10
    str[0] = addr b
    echo $str[0][]

userbinator · on Oct 24, 2016

I am not familiar with Go, and have heard many praises of its declaration syntax, but is its dereference operator postfix? That would make sense in such a case.

On the other hand, IMHO the whole "make declarations read left-to-right" idea is misguided --- plenty of other constructs exist in programming languages which simply can't be read left-to-right, but are nested according to precedence. I mean, you might as well make 3+4*3 evaluate to 21 if you want to try making everything consistently left-to-right, but I don't really see anyone complaining about not being able to understand operator precedence...

dsymonds · on Oct 24, 2016

Go's dereference operator `*` is prefix, like in C.

The point here is that type declarations are regular to read, and those tend to be the tricky ones. Expressions tend not to be so difficult, and are more commonly factored if they become complex. For various reasons, type declarations are not so practically factorable.

vorg · on Oct 24, 2016

When I came to Go, I hadn't used C or C++ for over a decade, only Java and C# in between. Using explicitly written pointers came flooding back, but the new "C for expressions, Pascal for declarations" syntax still takes getting used to.

Declaring `v * T` means we can write `* v` as an expression, so the use of token * is synchronized for both these uses, but I must vocalize the * in my head differently:

  `*T` vocalizes as "pointer to something of type T"
  `*v` vocalizes as "that pointed to by variable v"
  `&v` vocalizes as "pointer to variable v"

So my thought process when I see * goes: If it's in a type, say "pointer to", otherwise say the opposite of "pointer to", i.e. "that pointed to by". It feels like an inconsistent use of * whenever I'm writing Go code -- even though I know it's a natural result of Go using Pascal-style declaration syntax but C-style tokens.

Peaker · on Oct 24, 2016

But then you lose C's nice property that declaration and use are the same syntax.

For example, D also uses a similar type syntax, so in D if you declare:

  int[10][20] x;
  x[19][9] // is legal

In C:

  int x[10][20];
  x[9][19] // is legal

I think the correct solution would have been to make pointer syntax post-fix like the arrays and functions, so that you get the best of both worlds. Go-like declarations and C-like matchup between use and declarations.

fazkan · on Oct 24, 2016

I read a paper somewhere from Dennis Ritchie, where he explained the development of the C language, the pros and cons. In it he mentioned that reading complex declarations is a problem in C, and that if the * operator had been postfix rather than prefix, complex declarations would have been easier to write and understand.

(PS. Golang has the right idea, since it's developed by the guys who contributed to C)...

hilop · on Oct 24, 2016

Go fixes a lot of C language design bugs, and the only costs you pay are garbage collection (sometimes important) and the loss of extreme memory layout control.

fazkan · on Oct 25, 2016

I guess that's a given with the amount of ease and fast prototyping that it provides; it had to have made a lot of decisions beforehand for you...

nine_k · on Oct 23, 2016

What makes me wonder is why C ended up with such a syntax. That is, its contemporary, Pascal, has a very straightforward, unambiguous syntax.

colanderman · on Oct 24, 2016

The type syntax exactly matches the expression syntax used to destruct values of the type. It is very intuitive once you realize this.

The alternative would be for the type syntax to mirror the expression syntax used to construct values of the type. Functional languages tend to do this, particularly ones which prefer pattern matching over destructors.

jeltz · on Oct 24, 2016

Yeah, the rule is intuitive when I am writing the code, but I think the type declarations in other languages like Go are easier to read correctly when I am skimming through the code even though I am much more used to C. I am not sure how useful this mirroring of usage is in practice.

jwatte · on Oct 23, 2016

The C syntax is also unambiguous, and if you actually "get" C, it's what you'd naturally expect it to be.

pjmlp · on Oct 24, 2016

unambiguous?

Have you ever had to write a C parser?

Peaker · on Oct 24, 2016

Yes, it is unambiguous, even if it is context-sensitive.

If you don't have the available type names then it becomes ambiguous.
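
A small sketch of that context sensitivity, with made-up function names. The tokens `T * x` mean different things depending on whether `T` currently names a variable or a type.

```c
#include <assert.h>

int as_expression(void)
{
    int T = 6, x = 7;
    int product = T * x;   /* T is a variable here: this is multiplication */
    return product;
}

int as_declaration(void)
{
    typedef int T;
    int target = 5;
    T * x;                 /* T is a type here: this declares x as int * */
    x = &target;
    return *x;
}
```

A parser has to track which identifiers are typedef names to tell the two apart.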

ajarmst · on Oct 24, 2016

Seems overly complex. The way I learned it, and now teach, is to read the type backwards (int const * is 'pointer to const int') for const correctness, but anything that requires more complex parsing by a human should just be typedef'd into submission.

jacquesm · on Oct 24, 2016

One problem with the web is that it will remember wrong information just as well as it will remember correct information.

generic_user · on Oct 24, 2016

While cdecl was probably written before some of you were born, it still does precisely one thing and does it well.

Cdecl (and c++decl) is a program for encoding and decoding C (or C++) type declarations.

http://linuxcommand.org/man_pages/cdecl1.html

userbinator · on Oct 24, 2016

Notably, one of the exercises in K&R (with a solution provided) is to write a mostly complete version of cdecl, which I think is great for dispelling much of the "magic" and increasing the understanding of how declarations are actually parsed.

generic_user · on Oct 24, 2016

That exercise was and probably still is above my pay grade.

oneofthose · on Oct 24, 2016

And online: http://cdecl.org/

sugarfactory · on Oct 24, 2016

Once I tried to figure out how to parse complex C declarations just by reading the specification (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), that is, without consulting guides for laymen like this. But I gave up. I looked at what seemed to be a BNF-like description of the C grammar but I had no idea what it said about the parsing rules. So I ended up using this guide: http://ieng9.ucsd.edu/~cs30x/rt_lt.rule.html With this I managed to implement an imitation of cdecl.

Chinjut · on Oct 24, 2016

What is the value of bending ourselves to fit a confusingly designed old language, rather than bending the language to fit us? The very fact that articles like this have to be written indicates a failure of user interface design, which we needn't forever perpetuate.

lisper · on Oct 23, 2016

Things break down utterly in the presence of typedefs. What is this?

    foo(*baz(bing,boff(*bratz)(biff)))(buff);

evincarofautumn · on Oct 24, 2016

The “spiral rule” is just an approximation of the actual rule as defined in the standard: declaration follows usage.

Even with typedefs, that declaration means “when you call baz with a bing and a pointer (named bratz) to a function of type boff(biff), then you get back a pointer to a function of type foo(buff).”

It’s an extremely concise notation for expressing type information without (much) special type syntax, and I think it’s quite elegant in that way.

lisper · on Oct 26, 2016

I'm impressed. How did you know where to start?

evincarofautumn · on Oct 30, 2016

In C, the statement “type declarator;” is an assertion that “declarator” has the type “type”. In other words, if you read “declarator” as an expression (more or less), then it should have the type “type”. So here:

    foo (*baz(bing, boff (*bratz)(biff)))(buff);

“foo” is the type, and the rest is the declarator. Then you just break it down according to the usual precedence rules:

    baz(…)

“baz” is a function…

    baz(bing, …)

…which takes a “bing”, and…

    *bratz

…a pointer (arbitrarily named “bratz”)…

    (*bratz)(biff)

…to a function which takes a “biff”…

    boff(*bratz)(biff)

…and returns a “boff”…

    *baz(…)

…and “baz” returns a pointer…

    (*baz(…))(buff)

…to a function taking a “buff”…

    foo (*baz(…))(buff)

…and returning a “foo”.

With typedefs for function pointer types:

    typedef boff (*bratz_t)(biff);
    typedef foo (*baz_ret_t)(buff);

    baz_ret_t baz(bing, bratz_t);

Or for function types:

    typedef boff bratz_t(biff);
    typedef foo baz_ret_t(buff);

    baz_ret_t *baz(bing, bratz_t *);
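
To check that the typedef'd version really declares the same thing, one can give the placeholder names concrete types and let the compiler compare the two declarations. Everything below is a sketch: the stand-in types are arbitrary, and `to_letter`, `widen` and `call_baz` are hypothetical helpers.

```c
#include <assert.h>

/* Stand-in types, chosen arbitrarily so the declarations compile. */
typedef long   foo;
typedef int    bing;
typedef char   boff;
typedef double biff;
typedef float  buff;

/* The original one-line declaration. */
foo (*baz(bing, boff (*bratz)(biff)))(buff);

/* The same declaration spelled with function pointer typedefs. The
   compiler accepts this redeclaration only if the types are compatible. */
typedef boff (*bratz_t)(biff);
typedef foo (*baz_ret_t)(buff);
baz_ret_t baz(bing, bratz_t);

/* Hypothetical definitions so that baz can actually be called. */
static boff to_letter(biff d) { return (boff)('a' + (int)d); }
static foo  widen(buff f)     { return (foo)f * 2; }

baz_ret_t baz(bing n, bratz_t callback)
{
    (void)n;
    (void)callback;
    return widen;
}

long call_baz(void)
{
    /* Call baz, then call the function pointer it returns. */
    return baz(1, to_letter)(21.0f);
}
```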

hilop · on Oct 24, 2016

The first red flag is that the rule says "clockwise" where there is clearly no way to distinguish clockwise from anticlockwise inside the code. Only the completely arbitrary choice of up/down direction of the drawing affects clockwiseness.

It's been 20 years(!) Why is this incorrect advice still up at c-faq?

mrcactu5 · on Oct 24, 2016

Strings with equal numbers of open and close parentheses, properly nested, are counted by the Catalan numbers

    (())()(()())(())

these count different arrangement of parentheses for function application. this guy is describing something like contour integration for computer programs

faehnrich · on Oct 24, 2016

Or, as the book Expert C Programming says, declarations in C are read boustrophedonically.

Buge · on Oct 24, 2016

I always heard of the right-left rule, which seems simpler and more accurate to me.

http://ieng9.ucsd.edu/~cs30x/rt_lt.rule.html

mtrycz · on Oct 24, 2016

I'd take the striking simplicity of a lisp anyday compared to this mess.

Yeah, I know, I'm not good enough, I didn't study enough, I'm not enlightened enough. But why make things so overly complex in the first place?

marcv81 · on Oct 24, 2016

This is the reason why golang declares the identifier before the type and the return value after the function parameters. This allows parsing any declaration from left to right.

xyzzy4 · on Oct 24, 2016

Just don't make complex declarations in C, it's almost never useful and won't help anyone out. It'll confuse people and make your code write-only. Just put in a couple extra lines of code somewhere if you have to. It won't be the end of the world.

userbinator · on Oct 24, 2016

It'll certainly confuse people, but only those who aren't qualified to be doing anything with the code anyway.

"complex" is subjective. It reminds me of stupid "rules" like "don't use the ternary operator", "every function must be less than 20 lines" (I am not exaggerating --- this was on a Java project, however); and you could easily extend that to "every statement must have a maximum of one operator", "you must not use parentheses", "you must not use more than one level of indirection", etc. Where do you stop? To borrow a saying from UI, "if you write code that even an idiot can understand, only idiots will want to work on it." I don't think we should be forcing programmers to dumb-down code at all.

That said, I'm not advocating for overly complex solutions, and will definitely prefer a simpler solution, but you should know and use the language fully to your benefit.

yitchelle · on Oct 24, 2016

>> It'll certainly confuse people, but only those who aren't qualified to be doing anything with the code anyway.

If the complexity can be avoided, why not avoid it? Removing complexity is not the same as dumbing down code. It will improve readability and maintainability.

This mindset is definitely applicable to declarations as well as code constructs.

edit: clarity.

userbinator · on Oct 24, 2016

Note the last sentence of my comment. I am not advocating unwarranted complexity at all, but just saying that there are cases where an increase in local complexity can reduce overall complexity of the system, and you should not be afraid of using the language to the best of your ability.

CamperBob2 · on Oct 24, 2016

It'll certainly confuse people, but only those who aren't qualified to be doing anything with the code anyway.

And people wonder why there are so many broken C programs out there...

fnj · on Oct 24, 2016

"if you write code that even an idiot can understand, only idiots will want to work on it."

To me, that makes about as much sense as when Ricky Bobby in Talladega Nights says "If you ain't first, you're last."