
Why is

  sizeof('x')
equal to 4 if

  char letter = 'x';

  sizeof(letter)
is equal to 1, just like `sizeof(char)`? If `'x'` is represented as an `int` in C, shouldn't `letter` in this example also be represented as an `int`?


No. The type of 'x' is int. It so happens on your platform (and most available systems today) sizeof(int) == 4.

The type of letter was explicitly char, and sizeof(char) == 1 by definition in C.

`char letter = 'x';` involves an implicit conversion. The literal is an int with the value 120 (in ASCII), and it is then converted to fit the char type, which is an integer type that is 8 bits wide on virtually every current platform (it might be signed, might not; that doesn't matter in this case).
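A minimal sketch of the difference, assuming a C11 compiler (for `static_assert`) and compiling as C, not C++. The helper names `literal_size` and `object_size` are made up for illustration:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* In C, a character constant such as 'x' has type int. */
static_assert(sizeof('x') == sizeof(int), "character constants are ints in C");

/* Hypothetical helper: size of the literal's type (typically 4). */
size_t literal_size(void) {
    return sizeof('x');
}

/* Hypothetical helper: size of a char object initialized from that literal. */
size_t object_size(void) {
    char letter = 'x';      /* the int value is converted to char here */
    return sizeof(letter);  /* always 1, whatever the initializer was */
}
```

The initializer only supplies the value; the declared type of `letter` decides its size.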


People often forget that multi-character constants such as 'ab' or '1ert' are allowed in C. They are almost unusable because they are highly non-portable: their value is implementation-defined, and endianness differences between the front-end and the back-end get in the way.


Once in a while this is kind of useful, data layout issue aside, for stuff like a FourCC.

Rust has e.g. u32.to_le_bytes() to turn an integer into some (little endian) bytes, but I don't know if there's a trivial way to write the opposite and turn b"1ert" (which is an array of four bytes) into a native integer.

Edited to add: oh yeah, it has u32::from_le_bytes(*b"1ert"). I should have checked.
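For comparison, the same packing can be written in C with explicit shifts, which avoids depending on the implementation-defined value of a constant like '1ert'. This is only a sketch; `fourcc_le` is a hypothetical helper name, not a standard function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack four bytes into a 32-bit integer with the
   first byte least significant (the layout of Rust's from_le_bytes). */
uint32_t fourcc_le(const char *s) {
    assert(s != NULL);
    /* Go through unsigned char so a signed char cannot sign-extend. */
    return  (uint32_t)(unsigned char)s[0]
         | ((uint32_t)(unsigned char)s[1] << 8)
         | ((uint32_t)(unsigned char)s[2] << 16)
         | ((uint32_t)(unsigned char)s[3] << 24);
}
```

Because the byte order is spelled out in the shifts, the result does not depend on the host's endianness.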


Does this mean that `word` in

  char *word = "xyz";
is a pointer to an array of four `int`s, `'x'`, `'y'`, `'z'`, and `'\0'`? When I evaluate

  sizeof(*word)
I do get 1 instead of 4, even though `*word` is pointing to `'x'`. Where are the remaining 3 bytes in memory?


A char is 1 byte by definition. But the type of a character literal (the 'x' syntax) is not a char, but an int instead.

C applies implicit conversions in so many contexts that the exact type of an expression rarely makes a visible difference (sizeof is the most notable exception to that rule), which obscures this fact.


Not at all. There are no character literals in "xyz", this is a string literal and it's unrelated to what your parent was saying.


word is of type char*, a pointer to a (single) object of type char.

The initializer means that the char object it points to happens to be the first (0th) element of an array containing 4 elements with values 'x', 'y', 'z', and '\0'.

Most manipulation of arrays in C is done via pointers to the individual elements, and arithmetic on those pointers. (Incrementing a pointer value yields a pointer to the next element in the array.)

For example, `sizeof word` gives you the size of the pointer object, but `strlen(word)` yields 3, because it calls a library function that performs pointer arithmetic to find the trailing '\0' that marks the end of the string. (A "string" in C is a data layout, not a data type.)
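Here is a rough sketch of that distinction, assuming C11. `my_strlen` is a hypothetical stand-in for the library's `strlen`, written out only to show the pointer arithmetic:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

const char *word = "xyz";   /* points at the first of 4 chars */

static_assert(sizeof word == sizeof(char *), "size of the pointer object");
static_assert(sizeof *word == 1, "size of the single char it points to");

/* Hypothetical re-implementation of strlen: walk until the '\0'. */
size_t my_strlen(const char *s) {
    const char *p = s;
    while (*p != '\0') {
        p++;                  /* step to the next array element */
    }
    return (size_t)(p - s);   /* number of chars before the terminator */
}
```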


If you specifically type it as char * then it's a pointer to chars, each of which has size 1.


You'll have to understand the 'x' syntax and the "xyz" syntax as two different things. Different quotes: single quotes make a character constant, double quotes make a string literal.


I know. But my understanding was that `"xyz"` is an array of characters so that these two would have the same representation in memory:

  char word[] = {'x', 'y', 'z', '\0'};  // sizeof(word) = 4, sizeof(*word) = 1
  char word[] = "xyz";                  // sizeof(word) = 4, sizeof(*word) = 1
What I did not realize was that the above two are not the same as this:

  char *word = "xyz";  // sizeof(word) = 8, sizeof(*word) = 1


The representation of an object is determined by how the object itself is defined.

An initializer doesn't change that. It only affects the value stored in the object when it's created.

A special case is that an array object defined with empty square brackets gets its length from the initializer, so

    char word[] = "xyz";
is a shorthand for, and is exactly equivalent to:

    char word[4] = "xyz";
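To make that concrete, a small sketch (assuming C11; all variable and function names here are mine, chosen for illustration):

```c
#include <assert.h>   /* static_assert (C11) */
#include <string.h>

char from_chars[]    = {'x', 'y', 'z', '\0'};
char from_string[]   = "xyz";   /* length 4 taken from the initializer */
char explicit_len[4] = "xyz";   /* same thing with the length spelled out */
const char *pointer  = "xyz";   /* a pointer object, not an array */

static_assert(sizeof from_chars == 4, "four chars");
static_assert(sizeof from_string == 4, "three letters plus the terminator");
static_assert(sizeof explicit_len == 4, "declared length");
static_assert(sizeof pointer == sizeof(char *), "pointer size, often 8");

/* Returns 1 if all three arrays hold identical bytes. */
int arrays_match(void) {
    return memcmp(from_chars, from_string, sizeof from_chars) == 0
        && memcmp(from_chars, explicit_len, sizeof from_chars) == 0;
}
```

The three arrays differ only in how the initializer is spelled; the pointer is a different kind of object altogether.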


What I see there is that you seem to highlight the difference between using sizeof with an array and sizeof with a pointer, which makes a difference, even if array-decays-to-pointer is a rule in most other contexts.


Right, I am mixing up two things here. You are right that bringing up pointers here is a mistake.

But apart from that, I would expect `{'x', 'y', 'z', '\0'}` to have size 16 rather than size 4 because it consists of four character literals which each have size 4 on my machine.


Maybe do not overthink it. 'x' is called a character literal, but it has the type int.

`{'x', 'y', 'z', '\0'}` does not have a type by itself, but it's valid syntax to use it to initialize various structs and arrays - some of those will have the size you are looking for, depending on which type of array or struct you choose to initialize with that: https://gcc.godbolt.org/z/Tqjq3xzKo
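Independently of the linked example, here is a short sketch of the same idea, assuming C11. The names `as_chars`, `as_ints`, `four_chars` and `ints_per_char_array` are invented for illustration:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* The same brace-enclosed list initializes objects of different types. */
char as_chars[] = {'x', 'y', 'z', '\0'};
int  as_ints[]  = {'x', 'y', 'z', '\0'};

struct four_chars { char a, b, c, d; };
struct four_chars as_struct = {'x', 'y', 'z', '\0'};

static_assert(sizeof as_chars == 4, "four 1-byte elements");
static_assert(sizeof as_ints == 4 * sizeof(int), "four int elements, often 16");
static_assert(sizeof as_struct >= 4, "four char members (padding is allowed)");

/* Ratio of the two array sizes, which equals sizeof(int). */
size_t ints_per_char_array(void) {
    return sizeof as_ints / sizeof as_chars;
}
```

The size comes from the element type of the object being initialized, not from the type of the literals in the list.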


Thank you for the explanation and the Godbolt example! I appreciate it. Apologies for fumbling around in confusion.


sizeof() returns the number of "units" that something -- an expression or a type -- takes up. What do you think those units are?

They are literally defined as "characters". sizeof(char) is always 1.

Your confusion (besides the pointer thing) is that 'x' is a funny way to write an int, not a char.
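A quick sketch of what those units mean in practice, assuming C11. The array `values` and the helper `element_count` are illustrative names only:

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>

int values[] = {10, 20, 30};

static_assert(sizeof(char) == 1, "the unit of sizeof is one char");
static_assert(CHAR_BIT >= 8, "a char (one byte in C terms) has at least 8 bits");
static_assert(sizeof values == 3 * sizeof(int), "three ints, measured in chars");

/* Common idiom: total size divided by the size of one element. */
size_t element_count(void) {
    return sizeof values / sizeof values[0];
}
```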


It seems to me that `sizeof` returns the number of bytes that the thing takes up in memory. For example:

  int numbers[] = {1, 2, 3};  // sizeof(numbers) = 12
> Your confusion (besides the pointer thing) is that 'x' is a funny way to write an int, not a char.

Yes, this might be it. So the way to get a `char` value that contains "c" is to use type coercion and write it as `(char) 'c'`. This changes the representation in memory so that it now takes up only one byte rather than four, right?


`(char)'c'` is an expression of type char.

Its size is one byte -- but the size of an expression isn't really relevant, since it's (conceptually) not stored in memory.

You can assign the value of an expression to an object, and that object's size depends on its declared type, not on the value assigned to it. The cast is very probably not necessary.

    char c1 = 'c'; // The object is one byte; 'c' is converted from int to char
    int  c2 = 'c'; // The object is typically 4 bytes (sizeof (int))
The fact that character constants are of type int is admittedly confusing -- but given the number of contexts in which implicit conversions are applied, it rarely matters. If you assign the value 'c' to an object of type char, there is conceptually an implicit conversion from int to char, but the generated code is likely to just use a 1-byte move operation.
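A sketch of those points, assuming C11; `same_value` is a hypothetical helper:

```c
#include <assert.h>   /* static_assert (C11) */

static_assert(sizeof('c') == sizeof(int), "the constant itself is an int");
static_assert(sizeof((char)'c') == 1, "the cast expression has type char");

/* Hypothetical helper: both objects hold the same value, whatever their size. */
int same_value(void) {
    char c1 = 'c';     /* one byte; 'c' is converted from int to char */
    int  c2 = 'c';     /* sizeof(int) bytes; no conversion needed */
    return c1 == c2;   /* c1 is promoted to int for the comparison */
}
```

Note the extra parentheses in `sizeof((char)'c')`: without them, `sizeof (char)'c'` would be parsed as `sizeof(char)` followed by a stray constant.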


In the declaration

    char letter = 'x';
the initialization expression 'x', which for historical reasons is of type int in C, is implicitly converted to the type of the object. `letter` is a `char` because you defined it that way.

If you had written

    int letter = 'x';
that would be perfectly valid, and the conversion would be trivial (int to int).

It's just like:

    double x = 42;
`sizeof 42` might be 4 (sizeof (int)), but `sizeof x` will be the same as `sizeof (double)` (perhaps 8).
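The same thing as a compile-time check, assuming C11; `stored_value` is an illustrative helper name:

```c
#include <assert.h>   /* static_assert (C11) */

double x = 42;        /* the int constant 42 is converted to double */

static_assert(sizeof 42 == sizeof(int), "an unsuffixed 42 has type int");
static_assert(sizeof x == sizeof(double), "the object keeps its declared type");

/* Hypothetical helper: the value survives the conversion unchanged. */
double stored_value(void) {
    return x;
}
```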


The type of the expression 'x' is int, not char (in C). The type of an expression consisting of a variable name is the type of the variable (as far as sizeof is concerned).



