F() vs. F(void) in C vs. C++ (nickdesaulniers.github.io)
194 points by headalgorithm on May 12, 2019 | 67 comments



Prefer T foo(void) in C because that's the declaration that actually introduces, into the scope, information about the parameter list.

T foo() is a deprecated old style from before ANSI C 1989 which declares foo in a way that is mum about how many arguments there are and what their types are.

(void) is actually an ANSI C invention. C++ adopted it for better C compatibility. (Yes, C++ did that once upon a time!) Then some C++ coders started doing silly things like myclass::myclass(void), which would never be processed by a C compiler anyway.

(void) is ugly; it would be a productive change in C to do away with K&R declarations and make () equivalent to (void) like C++ did some three decades ago.
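A minimal sketch of the practical difference (g and h are made-up names): the call through the prototyped declaration is rejected at compile time, while the call through the empty one is quietly accepted.

  void g(void);   /* prototype: the compiler checks every call against it */
  void h();       /* old style: says nothing about the parameters */

  int main(void) {
      g(1);       /* error: too many arguments to g */
      h(1, 2);    /* accepted; undefined behavior if h's definition disagrees */
      return 0;
  }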


> it would be a productive change in C to do away with K&R declarations and make () equivalent to (void) like C++ did some three decades ago.

But this would mean you no longer have a nice way of specifying a function that takes parameters that you can't list :(


How often do you need to do that? I feel like I never have, and it's a rare enough use that it shouldn't be the default behavior, but maybe I haven't written enough C.


I have actually run across code in the wild along the lines of:

  int foo()
    int x;
    int y;
  { return x + y; }
Perfectly legal C, if an archaic and seldom-used syntax. I saw it in some proprietary code, so, no, I cannot provide a link. It was odd, but worked for what it was trying to do.

It pays to know the language you work in: not to write things like this, but to understand them on the off chance you encounter them in the wild.

Edit: I may be wrong on the syntax; it might be:

  int foo(x, y)
    int x;
    int y;
  { return x + y; }
I don't have the ISO spec in front of me, but one of these should work.


That's K&R C, predating ANSI C. It's not that uncommon if you're looking at long-lived codebases.

Rather it should look like this:

    int foo(x, y)
      int x;
      int y;
    { return x + y; }
x and y would be assumed to be int in the absence of the type specifiers, so in this case they're optional.


To add to this, int is also the default return type.

   foo(x, y) { return x + y; }


I use it sometimes for callbacks that take similar but not exactly the same arguments, e.g. containers of pointers that can have an optional "release" function pointer, which is called to release every pointer in the container. It can be assigned to "free" for raw data, or to object-specific destructors like "free_bitmap", "free_sound", or even "free_container" (so a container can hold other containers that may have their own "release" functions).
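A rough sketch of that pattern (container, release_fn, and friends are made-up names, error handling is omitted, and it relies on the pre-C23 meaning of an empty parameter list):

  #include <stdlib.h>

  typedef void (*release_fn)();   /* unprototyped: can point at free, free_bitmap, ... */

  struct container {
      void **items;
      size_t count;
      release_fn release;         /* optional; NULL means the container doesn't own the items */
  };

  void container_destroy(struct container *c) {
      if (c->release)
          for (size_t i = 0; i < c->count; i++)
              c->release(c->items[i]);   /* call whatever destructor was registered */
      free(c->items);
  }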


It shouldn't be the default, but I use it occasionally when you have a function pointer and need to call it with a set of arguments that you get at runtime.


There isn't any way to do this in ISO C. Libraries like libffcall and libffi exist for dynamically building a C argument list and calling it.

If we stick to standard C, all calls are statically determined, so your only option is to switch among different call expressions, like:

  switch (nargs) {
  case 0: return f();
  case 1: return f(a[0]);
  case 2: return f(a[0], a[1]);
  ...
  }
Here, if the f pointer is under-declared, it saves you some casting.

You can use a union:

  struct fun {
    int nargs;
    union {
      int (*ptr0)(void);
      int (*ptr1)(valtype);
      int (*ptr2)(valtype, valtype);
      ...
    } fn;
  };
Then:

  switch (f->nargs) {
  case 0: return f->fn.ptr0();
  case 1: return f->fn.ptr1(arg[0]);
  ... 
  }
Now there is a modicum of type checking. When you create the "struct fun" object, you may be able to take the address of a function that is compatible with one of the ptrs, and assign it to the correct union member without having to use a cast. If you record the number of arguments correctly, it won't be misused.


Your idea to use a union of function pointers also allows a much broader range of function parameter types to be used. When you call a function that was declared K&R-style, the arguments are subject to the default promotions, e.g. short to int, so you can't call "int f(short)" if it is only declared "int f()". Your idea doesn't have this problem.
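Roughly this situation, with hypothetical file names:

  /* other.c */
  int f(short x) { return x + 1; }

  /* main.c */
  int f();             /* no prototype: short arguments get promoted to int */

  int main(void) {
      short s = 2;
      return f(s);     /* undefined behavior: the caller passes an int, the callee expects a short */
  }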


Function pointers in a union? Now that's a first (for me).

One of my coworkers could defeat any idea for a new optimization with "well, what about unions?" which would typically complicate things quickly...


Here, the union is being used as a space-saving structure. We only need to store one pointer, but it can be of different types. We access the same one that we most recently stored.

Most of the code surrounding this won't contain assignments into the union, so it won't impact optimization.

In an interpreted language I wrote, this kind of union is initialized when a function object comes to life, and then not mutated again.

The union is necessary because a struct would blow up the size of the object significantly (and then it wouldn't fit into the GC heap cell size, requiring an additional piece of malloced memory).


> One of my coworkers could defeat any idea for a new optimization with "well, what about unions?" which would typically complicate things quickly...

I'm not sure I follow: they would thwart your attempts at optimizing things by talking about unions?


Compilers that allow type punning through a union basically have to consider an access to one member of the union to have an effect on every other member. It's conceivable that there are edge cases where that consideration could hamper the optimization of code which uses unions without perpetrating any sort of type punning. Hmm, like what?

Say we have a really contrived function that works with two pointers A and B to the same union type. The compiler cannot prove that A and B are distinct. The function evaluates A->x = expr, and also B->y in several places. Since A and B might be the same pointer, the assignment has to be regarded as clobbering B->y, which interferes with CSE of B->y and register caching.

As of C99, a possible solution here would be to declare the pointers restrict; then the compiler assumes they don't overlap: A->x has nothing to do with B->y.

If a struct is used, then x and y are different members and have nothing to do with each other for that reason, needless to say, even if A and B are the same pointer.
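A contrived sketch of that scenario: with restrict the compiler may keep b->y in a register across the loop; without it, the store through a has to be assumed to clobber it.

  union u { int x; int y; };

  int sum(union u *restrict a, union u *restrict b, int n) {
      int total = 0;
      for (int i = 0; i < n; i++) {
          a->x = i;        /* restrict promises this cannot alias *b */
          total += b->y;   /* so b->y need not be reloaded every iteration */
      }
      return total;
  }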


Given that this is C, can the compiler prove in any case that two pointers are distinct? When you're dealing with a structure, it could still be the case that A = B + sizeof(structure.x) (+/- padding).


Yes, I know this: I'm relying on POSIX's guarantees and not what ISO C mandates (also, truth be told, I usually do this in Objective-C, but that doesn't actually change anything significantly). Unfortunately I get a function pointer and need to forward essentially anything, so I'm not going to have any sort of type safety at all…


What guarantees does POSIX make about calling functions that are not in ISO C? I'm curious.

POSIX generally has very little to say about C matters; for the most part it defers to ISO C by normative reference.

Off the top of my head: because POSIX specifies dlopen and dlsym, it pretty much requires function pointers to have a representation that can be converted to void * and back.
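Something like this sketch, where the library and symbol names are invented and error handling is omitted:

  #include <dlfcn.h>

  int call_foo(int arg) {
      void *handle = dlopen("libfoo.so", RTLD_NOW);
      /* ISO C doesn't define converting void * to a function pointer;
         POSIX effectively requires it to work so that dlsym is usable. */
      int (*fn)(int) = (int (*)(int))dlsym(handle, "foo");
      return fn(arg);
  }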


int main() { return 0; }

main takes argc and argv, but it's idiomatic to omit them if unused.


This is irrelevant; this is the function definition, which means the compiler knows this function takes zero arguments.

It works because with the way the C ABI works on all platforms I'm aware of, extra arguments passed to the function can be completely ignored by the function with no ill effects, so the fact that _start actually invokes main with argc and argv can be ignored by main.

Which is to say, you could also write this as

  int main(void) { return 0; }
and that would work equally as well.


Not with stdcall, i.e. the Win32 API.


I've never done Windows programming but AFAICT you don't use `int main()` on Win32 anyway, you write something like

  int WINAPI WinMain(HINSTANCE hInst, HINSTANCE hPrevInst, LPSTR, int)
(where WINAPI is just a macro for __stdcall)

I also found a reference saying you can in fact use `int main()` if you want to, in a GUI subsystem Windows app, by using the Microsoft linker options /subsystem:windows /ENTRY:mainCRTStartup, but in that case you're not writing a stdcall function anymore.

EDIT: I guess I should clarify, my previous comment was talking about cdecl functions, i.e. the default calling convention in C.


That's a special case though and unrelated to (). Like it's also explicitly allowed to omit a return statement in main altogether, which implies a return 0.


That was added in C99; previously, "falling off" the end of main without a return statement resulted in an indeterminate termination status. Countless sloppily written C programs had garbage termination statuses because of this, making it impossible for higher level programs (e.g. scripts) to reliably test for successful execution.


Why not just use int main(void)?


I'm confused. main doesn't take void; it takes two arguments. The f() syntax means that the definition doesn't specify how many arguments the function will be called with, while f(void) means it takes no arguments.


As per the standard, main() can take no arguments [i.e., main(void)], 2 arguments [main(int, char **) or equivalent], or be defined in some other implementation-defined manner. See 5.1.2.2.1 in the C2x working draft[0].

[0] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2346.pdf


Wow, it is painful to see that they still haven't accepted case ranges. They are supported by gcc, clang, and icc. Like so:

case 123 ... 456:

Switching on strings is another thing people have been wanting for half a century. It would speed up many programs, because most programmers don't bother to generate a perfect hash or a carefully balanced tree of "if".
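What people write today instead is a linear strcmp chain, which a built-in string switch could turn into a hash or search tree behind the scenes; dispatch and the do_* functions here are made-up names:

  #include <string.h>

  void do_open(void), do_close(void), do_quit(void), do_unknown(void);

  void dispatch(const char *cmd) {
      if      (strcmp(cmd, "open")  == 0) do_open();
      else if (strcmp(cmd, "close") == 0) do_close();
      else if (strcmp(cmd, "quit")  == 0) do_quit();
      else                                do_unknown();
  }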


I would just be happy with a change that allowed me to not keep repeating the type identifier all the time. Something like:

  int foo(int x, y, z, float a, b, c) { ...
instead of

  int foo(int x, int y, int z, float a, float b, float c) { ...
Yes, you can pack things like that into a struct sometimes, but not always. From talking to folks who were on the ANSI committee at the time, there was some reason why the first example wouldn't work. Some parsing ambiguity, but I forget the details.

It's distressing to see new languages (like D) adopt this verbose style as if it were some well thought out idea. It wasn't, it was just a consequence of the new declaration style that the ANSI C committee adopted.


Can't you do that already with f(...)?


As other comments have noted:

varargs functions require at least one non-vararg parameter.

(As int_19h said:) On some architectures, vararg functions have their own distinct ABI, so (...) as a declaration is not compatible with any definition that doesn't also use (...).


Not in C! All variadic functions must have at least one non-variadic parameter.


Could introduce something new, foo(?) or such to make the intent clearer.


There's already the "..." token to signify a variadic function, which is effectively what "takes parameters that you can't list" means, although I believe it's currently defined only for at least one non-variadic argument.


Furthermore, on some architectures, vararg functions have their own distinct ABI, so (...) as a declaration is not compatible with any definition that doesn't also use (...).


The construct would be a pointless throwback to poorer type safety. Every C function call is statically typed, and so is a function definition. The declaration of what is being called may tell you nothing about the argument list, but the call expression which uses that identifier will assume a static type for it, based on how it is called. The call will be compiled assuming it has a certain type, which may or may not match what is being called. There is no run-time check; if it's wrong, it's undefined behavior.
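To illustrate with a made-up function f: the compiler compiles the call from nothing but the argument types it sees, and nothing ever checks that against the definition.

  int f();            /* says nothing about the parameters */

  int main(void) {
      return f(1.0);  /* compiled as if f took a double; if f is really
                         int f(int), this is undefined behavior with no
                         run-time check to catch it */
  }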


What I would like in C is a way to specify an argument list. And be able to slop them around and call functions with them later.

So perhaps foo(args) means foo takes a list of arguments unspecified.


You can approximate an argument list using compound literals as long as 0/NULL is an acceptable default value. It's a bit noisy, but it works something like this if I remember right:

    result = foo(&(struct foo_args){.arg1 = 4, .arg7 = "2"});
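Filled out, the pattern might look roughly like this (struct foo_args and its fields are hypothetical; members not named in the compound literal are zero/NULL):

    struct foo_args {
        int arg1;
        const char *arg7;
    };

    int foo(const struct foo_args *args);

    int call_it(void) {
        return foo(&(struct foo_args){ .arg1 = 4, .arg7 = "2" });
    }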



There are libraries for this like GNU ffcall and libffi.


Looked at those. The problem with libffi is that it wasn't ported to 32-bit ARM, and the interface is 'clunky'.

I think what I want is LLVM's blocks, which aren't available in gcc.


Like va_list in vprintf?


Yup, would totally be fine with that.
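For what it's worth, a minimal example of that existing pattern (log_msg is a made-up wrapper name):

  #include <stdarg.h>
  #include <stdio.h>

  void log_msg(const char *fmt, ...) {
      va_list ap;
      va_start(ap, fmt);          /* capture the caller's arguments once */
      vfprintf(stderr, fmt, ap);  /* hand them to a v* function that knows how to consume them */
      va_end(ap);
  }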


For those who are still wondering the actual reason for the extra instruction after reading all that, it has to do with the calling convention: when calling a variadic function in SysV AMD64, AL holds the number of vector registers used for parameters. I believe the Microsoft x64 one doesn't do that.

Also, a xor r32, r32 is 2 bytes, not 1.


Yep: https://godbolt.org/z/BMjD0Y

sorry, I must've misread the output from godbolt from too many window splits. Will update the article. Thanks for pointing it out. :)


The reason is that there’s no space between the instruction address and the first byte of the instruction, but there are spaces between later bytes.


> a xor r32, r32 is 2 bytes, not 1

Does the article imply otherwise?


Yes.

> So Clang can potentially save you a single instruction (xorl %eax, %eax) whose encoding is only 1B, per function call to functions declared in the style f(), but only IF the definition is in the same translation unit and doesn’t differ from the declaration, and you happen to be targeting x86_64.


I couldn't seem to find that bit, thanks for pointing it out to me.


I just updated the article, too, sorry if that caused confusion.


Yes.


> Is an error in C, but surprisingly C++ is less strict here, not only allowing it but also taking the semantics of the definition. (Spooky)

Aren't these just declaring one overload of foo and defining a different overload?


Yeah, it's an overload as others have replied. The difference between C and C++ here, is that you should get a linker error if you try and invoke the undefined overload in C++ (C doesn't allow overloading, nor does it require a forward declaration).

This is how we "deleted" functions pre-C++11, especially otherwise auto-generated ones such as the default or copy constructor and the assignment operator.


Yeah, there's nothing spooky there. C++ supports overloads and C doesn't.


Ah! That's it, thanks! edited the article


Came here to say that I suspected operator overloading. I read your comment, pressed refresh, and it was already fixed.


I feel like basically any "explanation" of this fails to sufficiently explain `f()`.

`void f(){}` is just a special case of the old style of defining functions. Another case of this is `void f(a, b) int a, b; {}`.

Using the old style, functions are defined without "prototypes"; that is, the type of the function does not specify its arguments.

Since changing the meaning of `void f(){}` would have broken backwards compatibility, they added `void f(void){}` as the way of specifying "no arguments" in the new style.

Edit: to further demonstrate, this is a perfectly valid C program:

  void f(a, b) int a, b; {}

  int main(void) { if(0) f("oops"); }

As the type of `f` does not specify its arguments, there is no constraint violation (compilation error), and as the call is never actually performed (`if(0) ..`), the program does not invoke the undefined behaviour that would result from performing the invalid call.


Thank you to this guy for not burying the lead. I hope more people put a simple conclusion at the beginning when writing such a blog post.

Also, any one else notice that the favicon is a gif (nyan cat)? I didn't know that was supported.



I've always wondered why EAX was getting cleared; I see it all the time in disassembly. Very cool stuff!


In the conclusion the author says he considers T f() to be prettier than T f(void). I personally prefer the look of T f(void) to T f(); the former is explicit to both the programmer and the compiler that this function is not supposed to take any arguments. If I can cause the compiler to not be helpfully clever when it doesn't need to be, as well as make it clear to my future selves, or any other poor soul condemned to read my code what my actual intention was, I see that as a Good Thing.


> If we change foo2 to a declaration (such as would be the case if it was defined in an external translation unit, and its declaration included via header), then Clang can no longer observe whether foo2 definition differs or not from the declaration.

I tried enabling link-time optimization on clang-8; it does see through this and eliminates the redundant xor instruction in foo2(). gcc fails to do that, however. I inspected the result by calling objdump on the generated object code.


Great! Thanks for confirming; I suspected LTO would be able to do this optimization as well. I should add that to the post.


That should have been disallowed in C11, but apparently it wasn't. You do have to have function prototypes now, but they can still be empty.


I found another difference between C and C++ recently. C++ compiles the code in the following post ok, but C doesn't.

https://old.reddit.com/r/C_Programming/comments/bn04w0/confu...

In short, for "typedef struct bar bar;", C views "bar" as an incomplete type, but C++ doesn't.


That doesn't sound right. I think the difference is that C++ doesn't treat that as an array declaration so it doesn't require a complete type.


What is the root cause for the difference?


Ah, the days of void main(void)



