The equivalent in C# always made more sense to me (not comparing the memory or allocation models of the two languages, just how the syntax reads to a person):
int[] arr = new int[5]; // C#
int arr[5]; // C
The fact that the brackets go on the datatype always made more sense to me; after all, I want to refer to memory of a certain cell size (as indicated by int). I realize that there is a lot of stuff going on when using new, but I believe even in C it should read this way, because you effectively change the datatype (to be of type pointer, rather than int):
int[5] arr;
This is the same reason why it feels wrong to write:
int *arr;
You're not changing the identifier, you're trying to change the datatype.
In summary, I believe that the syntax for fiddling with pointers in C is very misleading, and on this I fully agree with the article, but as many people are accustomed to this notation, I will now duck and get far away from the internet, in fear of all the hateful comments explaining to me how I am wrong, and apparently just don't understand the superior beauty of complicated C syntax. I will now go and check my garbage collected privilege.
> I will now duck and get far away from the internet, in fear of all the hateful comments explaining to me how I am wrong, and apparently just don't understand the superior beauty of complicated C syntax.
I am going to be that person :-)
One phrase – "declarations mirror use". In a declaration, you use the same set of operators around the declared object that you would use in a normal expression. All of these operators (asterisk, `[]` and `()`) have exactly the same precedence and associativity as in the rest of the language. The type specifier(s) on the left give the final type of the expression that you get after applying all of the operators in the correct order per the precedence/associativity rules.
So when you see:
char *arr[X]
You identify the identifier first: `arr`. Then, because `[]` takes precedence over the asterisk, you say that `arr` is an array (of size X) of pointers to `char`. In other words, the expression `*arr[some_index]` is of type `char`.
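Concretely, a minimal sketch (with X fixed at 3; my own example):

char *arr[3] = { "a", "b", "c" };  /* [] binds tighter than *, so: array of 3 pointers to char */
char c = *arr[0];                  /* *arr[i] has type char, mirroring the declaration: c == 'a' */
char (*pa)[3];                     /* contrast: parentheses make pa a pointer to an array of 3 char */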
To be fair, while declaration mirrors use, the use contains a few traps for the beginner.
If you have written some assembler, you will see where C is coming from. It just occurred to me that most early C programmers were probably proficient in assembly.
I don't know if "declarations mirror use" gets you all the way there. You can use:
*some_index[arr]
(though obviously not saying you should), but you can't declare:
char *5[arr];
Obviously there's good reasons for that, but then you have "declarations mirror use, except when there's good reason not to", which basically brings you back to the original question.
Finding an exception doesn't invalidate the explanation. Being able to switch the index and array is more an accident of how C implements arrays than an actual use case.
I assume you're right, though I would love to see some of the original discussion around it. That said, while I don't think it's a catastrophic point against "declarations mirror use", I do think it's a strike against it.
Edit: I guess I'll also note here just for fun that while
I have been writing C and C++ for a little more than 10 years now and this is the first time I have heard "declarations mirror use". I understood that it was the case, but why is the phrase important?
To me, understanding the actual types involved is more important than getting the declaration to look some specific way. (So I always jammed the star next to the type and did whatever else I thought would most ease expressing types.)
I am also the kind of developer who would have declared it as:
std::array<std::string, X> arr;
or
std::vector<std::string> arr;
depending on whether X was known at compile time. Because I want to give the compiler as many chances to call out my mistakes as possible.
I think if all type-information were kept together, it might read simpler:
char*[X] arr;
You read strictly left to right: char pointer array of size X called arr. Basically, it takes a simple type (char in this case) and for each thing to the right, wraps it in something. You could read it as: given a char, we have a pointer to it, and an array of size X of these pointers.
It would be interesting if the use did "mirror" declaration:
pa: ptr(int[3]); // int (*pa)[3]
fp: fptr(): int; // int (*fp)(void)
arr: int[5]; // int arr[5]
p: ptr(int); // int *p
pp: ptr(ptr(int)); // int **pp
arrp: ptr[5](int); // int *arrp[5]
You basically have the identifier first, and then a recursive expression which uses mostly function-call-like expressions in order to nest recursive types.
Surely it would still mirror use if the operators were applied to the type, rather than the identifier?
Preferring `char *arr[X]` over `*char[X] arr` seems arbitrary to me. I see no reason the 'declarations mirror use' principle can differentiate between the two.
Personally I prefer the latter since it makes it easy to separate the type from the identifier.
> Surely it would still mirror use if the operators were applied to the type, rather than the identifier?
The "type" is actually a list of storage class specifiers (static, extern, auto, register, typedef, _Thread_local), type qualifiers (const, volatile, restrict) and type specifiers (int, char, float, double, signed, unsigned, long, short, void). Imagine the soup of keywords that would have to be at the deepest level of the expression:
HN converts asterisks to italics and I too couldn't find a way to escape them, except when surrounded by backticks `*`. Not a C-friendly discussion forum :-)
I've heard this phrase - "declarations mirror use" many times before but it just doesn't click for me for some reason.
Whose use? The compiler's or the programmer's? Can you elaborate? Sorry if this is a silly question, but I've scratched my head enough on hearing that phrase that I thought I would ask. Cheers.
The meaning of "declarations mirror use" is that the programmer can use the declared variable in the exact same way as being declared (by applying the exact same operators):
int **p; // declares "p" as a pointer to a pointer to int
int x = **p; // the "**p" expression mirrors the declaration and its type is "int"
This is true even for functions:
int *f(int a); // declares a function which takes an int and returns a pointer to int
int res = *f(3); // the "*f(3)" expression mirrors the declaration and its type is "int"
Arrays are obvious when looked that way:
int arr[5][5];
int elem = arr[1][2]; // the "arr[1][2]" expression mirrors the declaration
The advantage is that you don't introduce any new operators or syntactic forms which would only be used in declarations. This means a smaller set of tokens and a smaller set of syntactic forms. Being able to parse an expression like
(*arr[i])(42)
means that you can use pretty much the same machinery to parse the corresponding declaration, e.g. `int (*arr[N])(int);`.
Thank you for clarifying. I have never heard of "declaration mirrors use" (DMU) before. Also thank you for explaining it in a non-hateful way (which isn't always the tone when touching religious matters). Next up: Tabs vs. Spaces ;)
This sounds really odd to me. I can see, and others have pointed this out in the comments too, that DMU might make it easier to build a compiler for such a language, since it reuses several operators. However, I am not sure whether I like this goal from a person-centric perspective.
Experts might be used to matching declaration and use structures in their code, maybe to make it more 'symmetric', but I suspect this matching is actually quite 'expensive' when developing code with many people of different skill levels.
Use and declaration are different concepts, and as such should have different representations (i.e., syntactically); otherwise you might introduce synonym defects (i.e., ambiguities, where it becomes hard to understand what somebody means). I found that some of my colleagues and students had a hard time learning what pointers are about (and I believe it is a fairly common phenomenon that people find pointers in C hard to grasp), because the asterisk is used both in declaration and in dereferencing. I found that some of my students got the concept more easily after I introduced a couple of macros:
#include "stdio.h"
#define IntPointer int*
#define value_of(ptr) *ptr
#define address_of(v) &v
int main() {
int a = 42;
IntPointer p = address_of(a);
printf("%i\n", value_of(p));
return 42;
}
The above code has no other purpose than to separate the concepts verbally, so that they can be reasoned about explicitly. This can be achieved in other ways, for example, in C++ I tend to use templates or classes to abstract the pointer syntax away. Of course, I am aware that using classes introduces overhead and this technique is not necessarily feasible when you need as much performance as you can get.
This also underlines my original point about placing the asterisk on the datatype rather than on the identifier, because the #define wouldn't make sense otherwise.
In summary, I was unaware that DMU was a design goal, but I suspect it makes the language more difficult to learn and code harder to read, although this effect might not be an issue for experts.
I am not trying to convince anybody that DMU is "bad", but I am interested in its actual properties. Everybody making claims about readability owes the community an empirical evaluation. Disclaimer: I am currently working on a research project about program comprehension, including some eye-tracking and fMRI work, providing exactly such evaluations. :)
I can see `int[5] arr;` being a more readable syntax. The problem is that if you later have:
int[7] arr2;
It would make sense that arr and arr2 are different types (as the thing on the left is different) while they are the same type and you should be able (hopefully!) to use them as arguments to the same function.
But that's because they wanted it to be like that.
To me, this code is very unclear:
int a, *b, c, d;
because everywhere else, multiple declarations in one statement all have the exact same type, but for some reason pointers get special treatment so that you can declare variables of type X and variables that are pointers to type X in the same statement, which just seems odd to me.
I'd be much happier if they were all strictly different and that
int* a, b, c, d;
did, in fact, declare all four variables to be pointer to int.
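For the record, what C actually does with that line (a two-line refresher):

int* a, b, c, d;   /* only a is int*; b, c and d are plain int */
int *e, *f, *g;    /* each declarator needs its own star to make every variable a pointer */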
> [...] everywhere else, multiple declarations in one statement all have the exact same type, but for some reason pointers get special treatment so that you can declare variables of type X and variables that are pointers to type X in the same statement, which just seems odd to me.
I mean, int[5] and int[7] would be different types, and lots of bugs happen from passing something besides the appropriate array type to a function.
That being said, most languages which disambiguate between int[5] and int[7] provide some kind of polymorphism (and usually store it as a struct of size + data, to enable that).
For example: you can define a first function that goes from t[N]->t pretty easily, and it would operate on both int[5] and int[7] (returning an int).
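In C++, for instance, a sketch of such a function (the names are mine) that is generic over the length:

#include <cstddef>

template <typename T, std::size_t N>
T first(T (&arr)[N]) { return arr[0]; }   // N is deduced from the argument's array type

int a5[5] = {1};
int a7[7] = {2};
int x = first(a5);   // works for int[5]
int y = first(a7);   // and for int[7]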
Right, in a language with dependent types there is a type-level difference between int[5] and int[7], but C is not such a language, therefore using a syntax that encourages the mistaken notion that there is a type-level difference between int[5] and int[7] would be misleading.
I don't know Rust, but I read a bit about fixed size arrays in it, and the fact that 32 is the largest fixed size array makes me suspicious that it works like a pair. You can do this without depending on a value, because the length is encoded in the type.
Like, you can have a type (a, b), and b can, of course, be of type (a, b), b again having type (a, b), and so on. Then you always carry around the length encoded in the type, and it can be checked like above.
GHC has a limit on tuple sizes, and Haskell makes no type distinction between [a] based on the number of elements in it, and it isn't dependently typed.
> I read a bit about fixed size arrays in it and the fact that 32 is the largest fixed size array
This is exceptionally mistaken; where did you read it? Arrays in Rust top out at the maximum value of a platform-sized pointer, which is either 2^32 or 2^64 depending on the platform.
"Arrays of sizes from 0 to 32 (inclusive) implement the following traits if the element type allows it:
Clone (only if T: Copy)
Debug
IntoIterator (implemented for &[T; N] and &mut [T; N])
PartialEq, PartialOrd, Eq, Ord
Hash
AsRef, AsMut
Borrow, BorrowMut
Default
This limitation on the size N exists because Rust does not yet support code that is generic over the size of an array type. [Foo; 3] and [Bar; 3] are instances of same generic type [T; 3], but [Foo; 3] and [Foo; 5] are entirely different types. As a stopgap, trait implementations are statically generated up to size 32."
I sort of leapt to the conclusion that these traits couldn't be implemented generically because the type system requires them to be implemented for each size N, and that they provide the first 32 as a nicety. Between the similarity to the situation with tuples in Haskell (http://stackoverflow.com/questions/2978389/haskell-tuple-siz...) and the fact that Rust doesn't have dependent types, so a type couldn't depend on a value, I just kind of guessed at a possible reason.
Ah I see, yes, the explanation there is correct. The types exist up to ginormous sizes, but the standard library only implements certain convenience traits for certain sizes (though using newtypes, you can implement those traits yourself for any array size you want). The specific feature we're lacking is type-level numerals, which is a step towards, but not anywhere close to, a dependent type system, AIUI.
Saying int[5] and int[7] are the same type because the compiler doesn't enforce the difference is like saying JavaScript is untyped because there is no compiler to enforce types at all.
Regardless of what the spec or the compiler says, you the programmer absolutely need to treat them separately. Especially in a language like C, where arrays aren't even self-describing at runtime.
No it isn't. JavaScript has types; they exist only at runtime, but they're there. It's patently false to say it's untyped. But type differences codify certain classes of differences, and it really isn't that crazy or unusual to say that the length of a vector, array, list, whatever you want to call it, isn't a difference of type. To say that there's a type difference even when the type checker disagrees just means you and the type checker are using different type systems.
> It would make sense that arr and arr2 are different types (as the thing on the left is different) while they are the same type and you should be able (hopefully!) to use them as arguments to the same function.
arr and arr2 are indeed distinct types. They are of type int[5] and int[7]. You can see this by checking that they have different sizes, or that `printf("%s\n", std::is_same<decltype(arr), decltype(arr2)>::value ? "true" : "false");` will output `false` on your screen.
Both of them decay into a pointer to int; that is why people commonly confuse them with pointers. But they are not pointer types; they are distinct types in their own right.
Unfortunately either clang or GCC (I can't remember) decided the obvious behavior was wrong and changed it so that _Generic behaved as-if array expressions decayed to pointers. The C11 specification for _Generic was insufficiently precise, and for various reasons both vendors and (IIRC) the C committee are going to go with the least common denominator approach (just treat them like pointers) for consistency.
So newer versions of clang and GCC print out all false.
But another way of showing that arrays are real types is with `_Generic` over the address of the array: `&arr` does not decay, so selecting on `int (*)[5]` vs. `int (*)[7]` distinguishes the two on all versions of clang and GCC, and should on any other conformant C compiler. Although I would think that the simple sizeof proof should suffice to show that arrays are real types, notwithstanding that their evaluation rules are peculiar.
Alas, the disaster with _Generic and array expressions only proves that the situation is less than ideal. Although part of the problem is that _Generic was a novel language feature that didn't fit neatly into the historical translation phases. IMO C++ gets a lot of things wrong about C semantics, but apparently they got decltype right (presuming the behavior is a product of a clearer specification, and that behavior is consistent across implementations).
To be fair, although inelegant, the compromise behavior for _Generic makes some sense. The principal use for _Generic is to implement crude function overloading. Because arrays always decay to pointers when passed to functions, it's convenient that _Generic would capture array expressions as pointers. OTOH, it makes some useful behaviors impossible. And the convenient behavior could have been had by manually coercing arrays to pointers with a trick like `_Generic((x) + 0, ...)`, since adding 0 forces the array-to-pointer conversion.
Well, had C been really strongly-typed then the hypothetical 'int[5]' and 'int[7]' would have been different types, and 'int[]' would be the size-agnostic type for an array.
Then you could easily get creature comforts such as arrays that carry their length and bounds-checked indexing.
Unfortunately this is differently wrong in C# (and many other modern languages). arr here is not an array at all, but is actually a reference to an array, and it has different semantics from the corresponding C declaration.
This is not just being pedantic: using the same syntax for values and references does lead to confusion. At least C# has actual value types and 'ref' (but not value type arrays AFAIK), in Java this is still being worked on.
I wouldn't agree to call it "wrong", since we're talking about languages that are trying to discourage you from doing your own memory management (C#, that is, but I'd say this applies to any language running in a VM with GC, really). Thus, in such languages, there shouldn't be a conceptual or syntactical difference between values and references. The fact that C# allows you to differentiate them from each other is a leaky abstraction imho, built in to satisfy people from a C/C++ background.
I believe that whether this is 'wrong' depends on the problem you're trying to solve. In my day to day work the difference usually doesn't matter, and I am totally happy passing around objects that others would call heavy. I am really happy that I don't have to care about references or values when doing scientific python stuff (pandas, numpy, etc), whereas I am also really happy that this stuff is important to people implementing these libraries.
> Thus, in such languages, there shouldn't be a conceptual or syntactical difference between values and references. The fact that C# allows you to differentiate them from each other is a leaky abstraction imho, built in to satisfy people from a C/C++ background.
It's not a leaky abstraction, it's a very specific design choice for performance reasons. Java also has this dichotomy between types, and for the same reason, although I believe they only delineate between primitive/non-primitive.
Having first class references has very little to do with manual memory management. As you said, confusing values with references is a very leaky abstraction, especially when you have mutable data.
"The Design and Evolution of C++" by Bjarne Stroustrup has a section on alternate declaration syntax that he considered. Compatibility won out but you might find the suggestions interesting.
I prefer that as well, but because the other option is possible and sometimes used, a few prefer to put the star close to the name so as to avoid mistakes.
Java also gets this right, probably in direct response to the crufty C syntax.
> (to be of type pointer, rather than int)
No, the array type is not a pointer, although implicit conversions to pointer occur, e.g. when passing arrays as function arguments. The practical difference is ... well, I don't know.
This gets weird when you compare it to how structs work in C. Both are complex data types, so I sometimes forget that array semantics are totally different to struct/union semantics. Unlike arrays, in ANSI C, structs are real value types. You can pass them by-value to functions, return them by-value from functions and assign them by-value to variables of the same type. Also, structs never work like pointers to themselves. You have to dereference the pointer or use the -> syntactic sugar to access a member of a pointer to a struct.
Value type semantics enable some neat things, for example, you can zero all the members of a struct by assigning a compound literal to it. You can also make simple structs, like three uint8_ts for an RGB colour, and use them just like they were primitive types. In comparison, it seems almost archaic that you have to break out memset() and memcpy() to zero and move arrays.
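A quick sketch of those value semantics (assuming C99 for the compound literal; my own example):

#include <stdint.h>

struct rgb { uint8_t r, g, b; };

void demo(void) {
    struct rgb c1 = {255, 0, 0};
    struct rgb c2 = c1;        /* copied by value; modifying c2 leaves c1 alone */
    c1 = (struct rgb){0};      /* zero every member by assigning a compound literal */
    (void)c2;
}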
Worth noting that you can always wrap an array in a structure and kick it around in your program happily, as if it's how the language should've been.
struct { int a[5]; } arr5 = {{4, 3, 2}};
What's a bit weird, though, is that while initialization like the above is possible, assignment of a constant is not as smooth in C: you will have to typedef your structure and use it in a typecast (compound literal) expression:
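Presumably something along these lines (my reconstruction, not the original snippet):

typedef struct { int a[5]; } arr5_t;

void demo(void) {
    arr5_t arr5 = {{4, 3, 2}};
    arr5 = (arr5_t){{9, 8, 7}};   /* assignment needs the typedef'd name in the compound literal */
}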
You can actually drop the extra brackets (at least in C++, but IIRC also in C).
C++11 has std::array<T, N>, which has proper value semantics. It is implemented exactly like your structure above, so no overhead, and it also has a proper operator== and assignment via initializer list.
I suppose this article is tongue-in-cheek but it doesn't really demonstrate lies in the C language. It does point out some of the quirks of arrays in C but not calling them real arrays is a matter of interpretation of terms. C defined what the term array meant for a lot of the languages that followed it. That today's languages have diverged from C's definition of array is not surprising.
> Of course, if you’ve read this far you’ll (hopefully) realise that this post should have been taken in jest. Arrays aren’t really a lie (any more than any of C’s constructs are). Despite all the ‘trickery’ C’s arrays work well for many, many programming tasks. They are – as the title of this article suggests – a very convenient set of untruths.
These facts are not lies. They are inconveniences that one sees when looking at C from the perspective of a higher-level language. But if you learn to program in assembler before studying C, then all of these facts look like obvious and convenient syntactic sugar.
Maybe in C++ these facts become inconvenient, because C++ pretends to be a higher-level language than a cross-platform assembler. But if that is a problem, it is not a problem of C; it is a problem of C++.
None of those 'lies' have anything to do with assembler. C deviated from regularity for the sake of convenience in some cases, and now we are stuck with those bad decisions forever.
> None of those 'lies' have anything to do with assembler.
They do. Let's take a look at the very first one: "Array name is just a pointer".
An assembler "array" is the name of a label. So it's just a named address in memory. Or, alternatively, we can say that an assembler array is a constant pointer.
`sizeof` returns the size of an array in bytes? Hmm... maybe that's because the main C abstraction for memory is the assembler one: memory is a contiguous sequence of bytes. `sizeof` was meant to be used with functions like malloc or memcpy, not with operator `new`. When we use some dynamic memory allocation in assembler, we get a pointer to an untyped memory chunk; compare with C:
void* malloc(size_t size);
If you wish, I can show you the connections of the other 'C lies' with assembler abstractions. I'm too lazy to write about all of them, but I could write about one more if you ask. Just pick the one you like most.
> now we are stuck with those bad decisions forever
Yes, you are right. We're stuck with that. And it is bad. But it doesn't make my point wrong. C is a cross-platform assembler, and these decisions look pretty good from the perspective of assembler. They give the programmer low-level control over the generated machine code while keeping the code portable, and that's very useful in some cases, for example when developing an OS kernel.
From an assembler point of view, a structure of N elements of type T and an array T[N] have exactly the same layout and are accessed in exactly the same way [1], but in C they have wildly different semantics.
Sizeof behaves exactly the same way for structs and arrays, so it is one of the few things in C that treat arrays "correctly".
[1] although usually the offset is constant for a struct field access.
In addition to being UB, the example doesn't illustrate the issue: arrays in C are not first class, as they can't be passed by value and can't be assigned. The decay-to-pointer thing that prevents this regularity has nothing to do with asm.
Yes, it's UB. But I'm not persuading you to use this UB in real code: in real C code, use offsetof from stddef.h. The only thing I want to say is: this code would work everywhere (if you pay attention to alignment). And it's no coincidence: C mimics asm, because C needs to be 100% predictable to the coder. Asm uses the simplest and most obvious abstractions, with predictable runtime costs, and C goes the same way. So it's inevitable that my code works. With some precautions, but it would work everywhere.
> the example doesn't illustrate the issue...
Yes, I suggested it, and I asked you for some illustrative example, because I can't understand your reasoning from "arrays are not first class" to "nothing to do with asm". I see it the other way: "arrays are not first class" is "asm mode".
> this code would work everywhere
it does not, it will be miscompiled by modern compilers.
> I asked you for some illustrative example,
foo(T x) { x[0] = 1; }
T x = {0};
foo(x);
assert(x[0] == 0);
The assertion fails for T = char[1], but succeeds for T = std::array<char, 1>. You could construct a similar example in pure C.
std::array and C arrays compile down to the exact same code for access, have the exact same layout, etc., but C arrays are not copyable or assignable, and they implicitly convert to pointers without any good reason. This has nothing to do with assembler whatsoever.
> it does not, it will be miscompiled by modern compilers.
Sorry, due to a formatting bug I overlooked this.
Can you show me an example of such a modern compiler? I suspect that you mean some C++ compiler, and they probably would `miscompile' my example, because they treat structs in a manner similar to classes, with vtables and all that other stuff. But we are speaking about C, not C++. If I'm mistaken in my assumptions, I'd like to know about a modern C compiler that proves me wrong. Such a proof would help me understand modern C much better.
It is hard for compilers to miscompile this specific example as it doesn't do much at all.
The idea is that a write to pfoo[1] couldn't possibly alias with any write to foo, so the compiler should be free to reorder accesses if profitable. This is the same in C and C++ and has nothing to do with vtables.
For what it's worth, I couldn't get gcc, clang or icc to miscompile [¹] a slightly changed example, so either it is not actually UB or compilers still refrain from making this kind of optimization, as it would break way too much code.
[¹] i.e. they elect to reload from the struct after writing to the array and vice versa even when it would be profitable not to do so.
OK... Now there's just one thing I can't understand: how did you jump to the conclusion in your last sentence? If you use asm and try to pass an array into a function, you will pass the address of the array, not a copy of the array on the stack. Looks similar to C's behaviour, doesn't it?
Whether you copy or pass by reference has everything to do with the language semantics, ABI and calling convention, and nothing to do with asm.
For example, if you look at the generated asm, C on amd64 will happily pass a struct by copy in registers, but will pass an array by address.
The designers of C decided to give arrays pass-by-reference semantics and structs pass-by-value [1]; this was done because it is convenient: you often want to iterate through arrays, and pointers are the most generic way, but it does make arrays not first class.
[1] admittedly traditional C couldn't pass structs at all.
I once proposed a backwards-compatible way out of this: "Safe Arrays for C"[1]. The fundamental problem with arrays in C is that the compiler has no idea how big they are. My proposal was to replace
int read(int fd, char buf[n], size_t n);
with a safe form
int read(int n; int fd, char (&buf)[n], size_t n);
This says that the size of "buf" is "n", which comes in as another parameter. There are no array descriptors; the generated code for a call is the same. Thus, this is backwards-compatible, allowing mixing of "regular C" and "safe C" modules.
The programmer has to know how big the array is, after all. There must be some way to compute the array size from other variables or constants, or the program has no hope of working. All C needs is a way to allow the programmer to say that in the language. Then subscript checking is possible. Buffer overflows can be eliminated.
The required changes to C are minor. The big one is adding C++ references. Instead of passing a pointer to the first element of an array, you pass a reference to the array. Same object code, but now arrays are first-class objects.
This was discussed at length on the C standards digest back in 2012. After many revisions, the conclusion was that it was technically feasible, but too difficult politically.
> The fundamental problem with arrays in C is that the compiler has no idea how big they are
Isn't that the compiler's choice, to not know how big they are? You could write a standards compliant implementation of C that did track how big arrays were if you wanted to couldn't you?
That's been done, with "fat pointers". GCC used to have an option for that, but it wasn't used much. The overhead is all at run time and is substantial.[1]
> Why is this [array assignment] failing? Because the array’s name is a lie! Using a variable as an expression normally yields its value, but in the case of arrays the array name yields a pointer (to the first element; which is at least reasonable)
Sorry, no; the assignment fails because an array isn't a modifiable lvalue. Array assignment simply isn't supported.
If array assignment were supported, the array-to-pointer conversion ("decay") could be suppressed in that case to make it work, just like it is suppressed when an array is the operand of sizeof or & (address-of).
Assignment of arrays is supported when they are struct/union members:
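For example (a small sketch of my own):

struct wrap { int a[5]; };

void demo(void) {
    struct wrap x = {{1, 2, 3, 4, 5}};
    struct wrap y;
    y = x;       /* legal: struct assignment copies the member array wholesale */
    /* x.a = y.a; would still be an error: the array itself is not assignable */
}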
Brings back fond memories of the first time I learned C, when I really had to dig into what the difference was between storage durations (auto/stack, dynamic/heap, static, thread local). It makes it increasingly important to think about where an object is going to be stored, and for how long. To me this is still a really useful concept that most high-level languages seem to have all disregarded in favor of extremely eager GC or refcounting, with the exception of Rust.
C has a really simple model with regard to storage durations, but the usage/omission of the respective keywords (static, extern, auto) is what makes it hard for beginners, IMO.
`static` means different things at file scope and at block scope, `extern` is redundant most of the time (except when linking to an object from another unit), `auto` is 100% redundant and is a leftover from the days when `int` was implied for every declaration.
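A small illustration of the two meanings of `static` (my own example):

static int file_counter;        /* file scope: internal linkage, invisible to other translation units */

int next_id(void) {
    static int block_counter;   /* block scope: static storage duration, value persists across calls */
    return ++block_counter;
}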
> most high-level languages seem to have all disregarded in favor of extremely eager GC or refcounting, with the exception of Rust.
Because Rust does a good job of hiding the fact it isn't a high level language.
Sure you have ML type checking, pointer safety rules, multiple returns. But if you can see past the syntax sugar you realize it is just C (with guardrails).
The twist with GC or refcounting is that you actually still need to think about memory management; in many ways, unfortunately, it becomes harder to control. An advantage of user-managed memory is that you always need to think about it, with every line you write, so if there is an issue it will become apparent rather quickly. Buffer overruns, however, are never much fun, and they are largely avoided with automatic memory management, so it's clearly useful in most cases. At least with most of the boring software that I write.
>Brings back fond memories of the first time I learned C, when I really had to dig into what the difference was between storage durations (auto/stack, dynamic/heap, static, thread local)
If those are your fond memories, I'd hate to hear your traumatic ones. :P
Arrays and pointers are never equivalent. One is a clump of like-sized objects allocated consecutively; the other is a referential type indicating the location of an object or function.
I don't think that's true. While I don't have my copy of K&R handy, I don't recall it covering all the subtleties of how assignment and increment operators and sizeof will work differently for something declared as an array vs. something declared as a pointer. At least in the edition I've had, it just had the same "pointers and arrays are equivalent" which is misleading in exactly the way this article describes. Did that get added in your much-later edition?
I'd go so far as to say that any C programmer at all (whether or not they've read K&R) will know everything in this article. However, it's probably interesting or useful for people who are either in the process of learning C, or people who have to read and write it occasionally, but never truly learned the language. These are definitely all pain points for people coming to C from higher-level languages.
It's not that convenient an untruth seeing as these are probably some of the first things you learn in C, and some of the first gotchas that'll getcha.
One of the first things that people learn in C is the fallacy that "arrays and pointers are equivalent". An array is a series of contiguously laid-out objects whose size is known at compile time (except C99 VLAs). A pointer, on the other hand, is merely a "single cell" that is supposed to contain an address; it can have values added to or subtracted from it, and it can be dereferenced.
The truth is that array names decay to pointers except when the array is an operand to the `sizeof` or `&` operator.
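In code (a minimal sketch):

#include <stddef.h>

int a[5];
size_t n = sizeof a;   /* whole-array size, 5 * sizeof(int): no decay under sizeof */
int (*pa)[5] = &a;     /* &a has type int (*)[5]: decay is suppressed under & too */
int *p = a;            /* everywhere else, a decays to a pointer to its first element */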
Right, the fact that arrays aren't pointers is academic when they decay to pointers at the drop of a hat. In fact it's so easy to decay the array to a pointer that it is generally best to always treat it as a pointer lest you get burned later on during a code refactor. This mostly means never using sizeof() to get the size of an array.
Even if we simplify it this way, arrays are still different because they cannot be assigned to, and their decayed pointer can never be NULL (they always have backing storage provided by the compiler):
int arr[5];
int *ptr;
// "arr" has a fixed backing storage of sizeof(int) * 5 bytes
// "ptr" may point to anything or be NULL, depending on control flow
Best to check for NULL anyway though, because anything can happen once you let it decay to a pointer. Never assigning them is a good idea though, maybe it's best to think of them as constant pointers? But that's more confusing terminology for a C programmer, so maybe not. People get really wrapped around the axle when differentiating a constant pointer from a pointer to a constant.
I think it's more likely the reaction to someone who uses magic languages that perform complicated actions over a simple assignment. If you think of C as a high-level assembly language whose assignment instruction is converted into a single machine instruction, this type of behavior is not so surprising.
> I think it's more likely the reaction to someone who uses magic languages that perform complicated actions over a simple assignment. If you think of C as a high-level assembly language whose assignment instruction is converted into a single machine instruction, this type of behavior is not so surprising.
C assignment isn't that simple. Consider this program:
#include <stdio.h>
struct point { int x; int y; };
int main(void) {
    struct point p1 = { 0, 0 };
    struct point p2 = p1;
    ++p2.x;
    printf("(%d, %d), (%d, %d)\n", p1.x, p1.y, p2.x, p2.y);
}
It prints "(0, 0), (1, 0)", as most people would expect. C isn't as transparent a layer over assembler as some people like to imagine; it just doesn't have first class arrays.
That's just doing a copy of contiguous memory though? Sure, C has simple custom data types.
I like to think of it as one small step above assembly languages; it's certainly not a "portable assembly", but it's about as low as a high-level language can possibly be.
I wish that were the case. As for the "first things you learn in C": unfortunately, in most cases I come across it's not, and many C programmers know maybe the very basics (int *ptr = arr;) but not much beyond that.
I know they're called static array indices, and they're called that because of the use of the keyword `static`, but they don't have to be compile time constant at all (and checks aren't performed at compile time, IIRC). You can have the following:
void foo(size_t len, int arr[static len]);
Which is really useful in asserting that you won't pass in the null pointer at runtime (so you can remove any `if (arr) {}` checks). Taken further, this exact method is what makes the restrict keyword usable in practice. One of the main problems of the restrict keyword is that you shouldn't be aliasing pointers. By ensuring that your passed-in array isn't actually a null pointer using the above syntax, you avoid one of the biggest problems of aliasing: passing in two null pointers. Consider:
void foo2(size_t len1, int arr1[restrict static len1], size_t len2, int arr2[restrict static len2]);
The second is more verbose, but you have a stronger guarantee that this procedure won't be called with pointers aliased to NULL. The compiler can (and I believe in the case of GCC, will) take advantage of this.
> Taken further, this exact method is what makes the restrict keyword usable in practice. One of the main problems of the restrict keyword is that you shouldn't be aliasing pointers. [..] one of the biggest problems of aliasing: passing in two null pointers
You shouldn't be dereferencing NULL pointers. Aliasing them is perfectly fine.
The non-aliasing requirements of restrict kick in only when you are actually accessing (and modifying!) the object referenced by an lvalue based on an expression of the restrict-qualified pointer. So NULL pointers don't matter because, first, they do not point to an object, and second, if you dereference them, you're already in UB land anyway. Correct code will not dereference NULL pointers, therefore the restrict qualification means absolutely nothing in code that opts not to access anything through NULL pointers.
EDIT:
N1256 6.7.3.1p4 under Formal definition of restrict (emphasis mine):
> During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified. Every other lvalue used to access the value of X shall also have its address based on P. Every access that modifies X shall be considered also to modify P, for the purposes of this subclause. If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment. If these requirements are not met, then the behavior is undefined.
Just to add, from the N1256 draft of C99, 6.7.5.3:
A declaration of a parameter as ‘‘array of type’’ shall be adjusted to ‘‘qualified pointer to type’’, where the type qualifiers (if any) are those specified within the [ and ] of the array type derivation. If the keyword static also appears within the [ and ] of the array type derivation, then for each call to the function, the value of the corresponding actual argument shall provide access to the first element of an array with at least as many elements as specified by the size expression.
I don't know how this is a lie or untruth, even in jest. In a language that exposes memory management directly of course you can manually traverse an array.
It's just a different way of looking at C that might resonate with someone new to the language and the idea of exposed memory. There's no "lying" but by framing it like a story or an evil conspiracy it might make it interesting or fun enough to stick when a clinical description might not for many students.
C has no more real arrays than assembly: for the CPU, it's just differently indexed pointers in the end. But arrays in C do have some notable differences from pointers in C; please read a good detailed description here: http://eli.thegreenplace.net/2009/10/21/are-pointers-and-arr...
No, C (and C++) is used as much as ever. HN echo chamber aside, Rust and/or Go haven't made much of a dent.
No, we've had such articles for decades.
No, it's just an article that points out some issues with C, like those that exist for every language and environment (e.g. tons of articles on JS shortcomings). There is no correlation whatsoever between such an article and the language falling out of mainstream use.
No, this is a bizarro question. It's an article by a single person, not some general trend.
Is this really true, especially for C? Lots of things that used to be done in C are today done in C++, and lots of things that used to be done in C++ are today done in Java or C#.
In the embedded world, the default language is still C by a wide margin. You have to argue hard to have C++ considered, and languages like Rust & Go just aren't on the radar.
Only in HN world is C considered legacy. For the rest of the world, it's the well-known workhorse of the software world.
I can confirm that. I write C for embedded systems every day, and for better or worse, I don't see any replacement for it in the foreseeable future. C++, Python, Java etc. might be used at higher levels (GUI, for example), but all the guts are still good, old, plain C. Rust is still in its infancy so it's hard to tell, and even if it succeeds, it will be evolution, not revolution, and will take decades to replace C fully.
HN is a bit of an echo chamber, as mostly SaaS, web and other high-level application developers are here. For them C might as well be dead, but if you have anything to do with hardware and systems programming, C is still the tool.
> For the rest of the world, it's the well-known workhorse of the software world.
But it's a workhorse that's continually being replaced. I'm not talking about Rust & Go; I'm talking about C++/Java/C#. Thinking about C projects I saw 15-20 years ago, hardly any of them would be written in C if they were started today. And even in the embedded world C++ is becoming more and more of a thing.
Is C used? Of course. Is C going away? Of course not. Is C "used as much as ever"? I just don't see it.
(And I'd say most C projects are not as likely to go to GitHub, compared to JavaScript ones. And C programmers are not exactly the type to ask questions on SO, compared e.g. to some language where one can be an "eternal newbie".)
>Lots of things that used to be done in C are today done in C++ and lots of things that used to be done in C++ are today done in Java or C#.
Lots of things that are done in Java or C# were done by other languages back in the day too: Visual Basic for enterprise apps that are now a Java/C# web frontend, Delphi, 4GL platforms, Clipper, Visual FoxPro, etc. Even games were written in assembler for most of the eighties.
True, C++, especially with C++11/14, is growing, but based on all the recent studies (by people like embedded.com) C is still a long way ahead of C++ in the embedded space.
React (a javascript library, sometimes referred to as reactJS or react.js) and more specifically its most popular module, Reagent, which is a full, lazily-loaded preemptive operating system that can run concurrent Java, Pythonjs, Rubyjs programs all from your browser while allowing cooperative suspend, load and save to network or local storage, intertab cooperative process management, etc.
Basically, if you're not working in an add-on to a framework library written in javascript running in a web browser, you might as well be using punch cards. /s
I made the part about Reagent up, but you know you believed it.
We're so far from the metal we might as well be sending a telegram with our requirements.
In the five seconds it takes this crap to load and show you a still loading page, your CPU cores have done 40,000,000,000 sixty-four bit operations.
You laugh, but Odoo's hand-rolled, Backbone.js-based frontend framework includes an interpreter for a subset of Python. They call it py.js. I'm not fucking with you.
Just so that you can experience the full horror: Yes, the Python server ships XML templates with raw embedded Python code to the client. The client then parses the XML and interprets the Python using this py.js thing.
If the embedded Python needs to access the database (and it almost always does), the client makes calls to a JSONRPC interface on the server.
(If you ever think of using Odoo. Don't. Just... don't.)
If you're serious, or for those who may indeed be in a bit of an echo chamber, the Tiobe index, while it may have much to criticize, is at least approximately correct: http://www.tiobe.com/tiobe-index/ Which yields Java being larger than C and C++ combined, which are the next two. Then Python. Then probably a long list of things whose order should not be taken too literally. All I'm trying to show here is that, yes, Java, C, and C++ are still the dominant languages. This is shown by a lot of other metrics too.
I'd be careful about assuming that C code run on a human interpreter behaves similarly to C code compiled by a modern compiler. For example, string literals probably don't actually have to exist in memory anywhere, but could arise implicitly from the control flow of your program; a compiler could probably turn an access to one into a computed constant.
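For instance, a minimal sketch of the kind of folding the as-if rule permits (my own example):

char f(void) {
    return "hello"[1];   /* may be compiled to: return 'e'; the literal need never exist in memory */
}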
Maybe when your program thinks it's accessing the 42nd element of that array, it's actually accessing some function of (the number of clock cycles in the CPU's counter, n unrelated code segments XOR'ing into a memory location, the executable's exact binary output), and the compiler has conspired to make these calculate to what that string's value would've been in an imaginary virtual machine, to save 2 bytes (or because they're cached).
Sounds like a good DRM scheme actually.
And who says pointers are to RAM addresses? Maybe the compiler statically notices that the pointer's target stays strictly between 'a' and 'z', and decides to use a simple 26-value counter. Depending on how you debug a program compiled by a sufficiently smart compiler, pointers could point to RAM addresses only when you're looking at them.
That for loop you thought you wrote? Well, your program accesses different parts of the result at different times, so the compiler scattered it all over your program so it's lazily computed. The loop counter or pointer never actually exists or takes on any value.
You can't be sure any of it exists unless you add logging or inspection. The whole program could be a lie, cleverly calculated to mimic the one you really intended.
> The whole program could be a lie, cleverly calculated to mimic the one you really intended.
The standard has a similarly convoluted way of saying that, in a nutshell, C compilers are permitted all optimizations under the as-if rule. (§ 5.1.2.3)
The "canonical" mental memory model of C (globals in data/BSS, locals on the stack, memory is a big array, code and data are the only artifacts) may be utterly useless when reasoning about the performance of a program, but helps the programmer greatly when approaching a problem.
The language was designed with these things in mind, no matter how much more sophisticated compilers and hardware have become, and no matter how much language lawyer fetishists frown on you for saying "this is on the stack" instead of "this has automatic storage duration".
You're describing a general problem in general terms...
Does a program that has no side effects even exist?? ooOOOoh, spooky....
BTW, yes I know what you're getting at... optimisers are allowed to perform any transformation as long as they're semantically equivalent. This applies to all languages.
Sure, but it's common for C programmers to think C is a low-level language with concepts that map straightforwardly to the target machine. You can't simultaneously think that C is a low-level language "close to the machine" and that your program can be freely rewritten into an eldritch horror. That's the only reason it would be a good "lie".
Contrast that with something like Perl, where people accept that an array is whatever Larry Wall wants it to be.
C programmers don't think that C is a low-level language. ;-)
Assembly language is a low-level language because it is not portable between architectures. C is a high-level language because it is portable. That means that one C statement can be translated into many assembly statements, hence C has a higher level of abstraction than assembly language.
C is low level for everyone using a programming language other than C (excluding assembly). That even applies to languages created in the early 60s, prior to C.
C is one of the first high-level languages. If someone is not well educated, that is their problem. It's possible to do low-level stuff in C, e.g. via inline assembler, but that does not make C low level. Low-level languages lack abstractions, i.e. they are tied to the machine, while high-level languages are not.
> Why is this failing? Because the array’s name is a lie! Using a variable as an expression normally yields its value, but in the case of arrays the array name yields a pointer (to the first element; which is at least reasonable)
Ahh, not quite. sizeof(array) was using the variable as an expression - not an evaluated expression, but an expression nonetheless - and it's clearly not giving us the same result as sizeof(array+0). In C++, you can even construct references, which can be abused in conjunction with templates to create a 'safe' array size check, relying on the array maintaining its array typing:
int (&r)[5] = array;
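The template trick alluded to looks roughly like this (a common idiom, sketched from memory rather than the poster's exact code):

#include <cstddef>

template <typename T, std::size_t N>
constexpr std::size_t array_size(T (&)[N]) { return N; }   // N is deduced from the array type

int array[5];
static_assert(array_size(array) == 5, "size recovered from the type");
// array_size on a pointer fails to compile: a pointer carries no length in its type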
Now, arrays implicitly convert to pointers if you so much as sneeze in the same room as them, but there are instances (namely arrays of arrays and the like) where you can fuck up your pointer math if you assume that simply using the array name yields a pointer, or that the array 'is' a pointer - because that is the lie! If I'm feeling particularly explicit, I'll write something like (assuming a and b are arrays, in C++ again):
std::copy(a+0, a+N, b+0);
Where the +0s ensure I'm actually dealing with pointers. This avoids any compiler errors from having mixed types for 'a' (array) and 'a+N' (pointer), which, while rare (the former typically converts to a pointer at some point), has happened to me at least once.
The real reason "this" (array init and assignment) is failing is that C decided arrays weren't copyable and assignable like this. That's all. Really! Now, one can think of plenty of rationale that made sense at the time (memcpy is more explicit, simplifies the implementation to only implement copy/assignment for simpler types, etc.) but it ultimately boils down to the choice of the implementors.
The thing to keep in mind is: never (never) use array syntax in your function arguments. It implies something that you can't rely on. More on this: https://lkml.org/lkml/2015/9/3/428
See https://news.ycombinator.com/item?id=13237674 and my corresponding reply, where you _should_ use array syntax for function arguments, but you need to do it using the `static` index syntax.
I was lucky to learn C with pointers first, and then arrays. When you think about it as just chunks of memory, it all makes sense and it's easier to reason about what the CPU will do. This is another example of a "simplifying abstraction" that is more misleading than simplifying.
These are no "lies", just misunderstanding on the part of those that believe the untruths. Those that have basic understanding of C know most of the things listed in the article.
Regarding #3, it's worth noting that even though the elements of the two-dimensional array form a contiguous block of integers, you cannot treat them as such [1].
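That is, something like this is formally out of bounds even though the storage is contiguous (my own illustration):

int m[2][3] = {{1, 2, 3}, {4, 5, 6}};
int *p = &m[0][0];   /* points into the first row, an int[3] */
/* p[5] lands on m[1][2] in memory, but indexing past p[2] is formally
   undefined behavior: the pointer arithmetic is bounded by the inner array */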
You're confusing things a little bit. 'char s[] = "foo"' is equivalent to 'char s[] = { 'f', 'o', 'o', '\0' }' and gives you a normal mutable array (or you can make it const and then it's const; no surprises); it is okay to mutate that array. 'char * s = "foo"' gives you a pointer to a string literal that might be placed in read-only storage; in C++ it is a const char * (so modern compilers should complain about that initialisation), while in C it is a char * where it's UB if you modify it (compilers may warn about storing it in a non-const char *; many don't).
> Isn't the proper type of a string literal the following?
No, it's a const char pointer. Your declaration actually makes a copy of the "Hello world" string into a new char array, distinct from the string literal itself:
The type of a string literal is array of char, and like other arrays, they decay to pointers to the first element when used in expression context, with three exceptions: sizeof, taking the address with &, and when used as a string initializer. "test2" in your example is not a const char *, it's an initializer, since this is one of the exceptions to array-to-pointer decay.
Short answer: for the user, yes; for the type system, no. To keep track of the length of the array, you can pass the number of elements as a second parameter to the function.
Long answer: For arrays with a declared length, the length is included in its type (6.2.5.20, page 42, [1]). Therefore, the type of "int a[5]" is "array of 5 integers". The type of "int*" is "pointer to integer". For arrays without a length, the type is considered 'incomplete' (6.2.5.22).
So the C typing system considers these 2 different types.
"Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is
converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue." (6.3.2.1.3)
sizeof is essentially an exception.
"The sizeof operator yields the size (in bytes) of its operand, which may be an
expression or the parenthesized name of a type. The size is determined from the type of
the operand. The result is an integer." (6.5.3.4.2)
And that's how sizeof is defined. Because it uses the type to compute the size, and the type of an array includes its length, and the type of an array doesn't change in sizeof expressions, sizeof will return the total number of bytes of all the elements of the array.
From my own experience and what I've seen from major US universities, this does not seem to be the case.
For intro (1st year-ish) CS, it looks like most places are teaching Python and C++, with some institutions (such as my own) using Java. An ACM article from 2014 actually has some numbers here. [0]
I graduated relatively recently with a bachelor's degree, majoring in CS and Computer Engineering. I had only one course which actually used C, and that wasn't for my CS major. I've spent a fair amount of time since then doing low-level work on ARM micros, but definitely wasn't taught this in school.
Worth noting: since everyone on HN is clearly an expert on C[1], we're just as clearly not the audience for this post. It's obviously written for people who haven't learned this yet, who might still be fooled by the superficial similarity of arrays in C to arrays elsewhere (or to what any rational non-lazy person not implementing their first compiler might expect). That doesn't make it a bad article, so stop being so gratuitously negative. It's actually a pretty good explanation, for somebody at that level, of how C arrays can trip you up. I might use it myself, as a reference for some of the people I mentor. Pedagogy matters.
[1] Or any other topic. Just ask any one of us. Apparently we all sprang fully formed from Athena's brow, already endowed with every bit of knowledge we'll ever need.