F() vs. F(void) in C vs. C++ (nickdesaulniers.github.io)
194 points by headalgorithm on May 12, 2019 | 67 comments



Prefer T foo(void) in C because that's the declaration that actually introduces, into the scope, information about the parameter list.

T foo() is a deprecated old style from before ANSI C 1989 which declares foo in a way that is mum about how many arguments there are and what their types are.

(void) is actually an ANSI C invention. C++ adopted it for better C compatibility. (Yes, C++ did that once upon a time!) Then some C++ coders started doing silly things like myclass::myclass(void), which would never be processed by a C compiler anyway.

(void) is ugly; it would be a productive change in C to do away with K&R declarations and make () equivalent to (void) like C++ did some three decades ago.
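A minimal sketch of the practical difference (g and h are made-up names): the call through the prototyped declaration is rejected at compile time, while the call through the empty one is quietly accepted.

  void g(void);   /* prototype: the compiler checks every call against it */
  void h();       /* old style: says nothing about the parameters */

  int main(void) {
      g(1);       /* error: too many arguments to g */
      h(1, 2);    /* accepted; undefined behavior if h's definition disagrees */
      return 0;
  }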


> it would be a productive change in C to do away with K&R declarations and make () equivalent to (void) like C++ did some three decades ago.

But this would mean you no longer have a nice way of specifying a function that takes parameters that you can't list :(


How often do you need to do that? I feel like I never have, and it's a rare enough use that it shouldn't be the default behavior, but maybe I haven't written enough C.


I have actually run across code in the wild along the lines of:

  int foo()
    int x;
    int y;
  { return x + y; }
Perfectly legal C, if an archaic and seldom-used syntax. I saw it in some proprietary code, so, no, I cannot provide a link. It was odd, but worked for what it was trying to do.

It pays to know the language you work in: not to write things like this, but to understand them on the off chance you encounter them in the wild.

Edit: I may be wrong on the syntax; it might be:

  int foo(x, y)
    int x;
    int y;
  { return x + y; }
I don't have the ISO spec in front of me, but one of these should work.


That's K&R C, predating ANSI C. It's not that uncommon if you're looking at long-lived codebases.

Rather it should look like this:

    int foo(x, y)
      int x;
      int y;
    { return x + y; }
x and y would be assumed to be int in the absence of the type specifiers, so in this case they're optional.


To add to this, int is also the default return type.

   foo(x, y) { return x + y; }


I use it sometimes for callbacks that take similar but not exactly the same arguments, e.g. containers of pointers that can have an optional "release" function pointer, which is called to release every pointer in the container. It can be assigned to "free" for raw data, or to object-specific destructors like "free_bitmap", "free_sound", or even "free_container" (so a container can hold other containers that may have their own "release" functions).
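A rough sketch of that pattern (container, release_fn, and friends are made-up names, error handling is omitted, and it relies on the pre-C23 meaning of an empty parameter list):

  #include <stdlib.h>

  typedef void (*release_fn)();   /* unprototyped: can point at free, free_bitmap, ... */

  struct container {
      void **items;
      size_t count;
      release_fn release;         /* optional; NULL means the container doesn't own the items */
  };

  void container_destroy(struct container *c) {
      if (c->release)
          for (size_t i = 0; i < c->count; i++)
              c->release(c->items[i]);   /* call whatever destructor was registered */
      free(c->items);
  }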


It shouldn't be the default, but I use it occasionally when you have a function pointer and need to call it with a set of arguments that you get at runtime.


There isn't any way to do this in ISO C. Libraries like libffcall and libffi exist for dynamically building a C argument list and calling it.

If we stick to standard C, all calls are statically determined, so your only option is to switch among different call expressions, like:

  switch (nargs) {
  case 0: return f();
  case 1: return f(a[0]);
  case 2: return f(a[0], a[1]);
  ...
  }
Here, if the f pointer is under-declared, it saves you some casting.

You can use a union:

  struct fun {
    int nargs;
    union {
      int (*ptr0)(void);
      int (*ptr1)(valtype);
      int (*ptr2)(valtype, valtype);
      ...
    } fn;
  };
Then:

  switch (f->nargs) {
  case 0: return f->fn.ptr0();
  case 1: return f->fn.ptr1(arg[0]);
  ... 
  }
Now there is a modicum of type checking. When you create the "struct fun" object, you may be able to take the address of a function that is compatible with one of the ptrs, and assign it to the correct union member without having to use a cast. If you record the number of arguments correctly, it won't be misused.


Your idea to use a union of function pointers also allows a much broader range of function parameter types to be used. When you call a function that was declared K&R-style, the arguments are subject to the default promotions, e.g. short to int, so you can't call "int f(short)" if it is only declared "int f()". Your idea doesn't have this problem.
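Roughly this situation, with hypothetical file names:

  /* other.c */
  int f(short x) { return x + 1; }

  /* main.c */
  int f();             /* no prototype: short arguments get promoted to int */

  int main(void) {
      short s = 2;
      return f(s);     /* undefined behavior: the caller passes an int, the callee expects a short */
  }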


Function pointers in a union? Now that's a first (for me).

One of my coworkers could defeat any idea for a new optimization with "well, what about unions?" which would typically complicate things quickly...


Here, the union is being used as a space-saving structure. We only need to store one pointer, but it can be of different types. We access the same one that we most recently stored.

Most of the code surrounding this won't contain assignments into the union, so it won't impact optimization.

In an interpreted language I wrote, this kind of union is initialized when a function object comes to life, and then not mutated again.

The union is necessary because a struct would blow up the size of the object significantly (and then it wouldn't fit into the GC heap cell size, requiring an additional piece of malloced memory).


> One of my coworkers could defeat any idea for a new optimization with "well, what about unions?" which would typically complicate things quickly...

I'm not sure I follow: they would thwart your attempts at optimizing things by talking about unions?


Compilers that allow type punning through a union basically have to consider an access to one member of the union to have an effect on every other member. It's conceivable that there are edge cases where that consideration could hamper the optimization of code which uses unions without perpetrating any sort of type punning. Hmm, like what?

Say we have a really contrived function that works with two pointers A and B to the same union type. The compiler cannot prove that A and B are distinct. The function evaluates A->x = expr, and also B->y in several places. Since A and B might be the same pointer, the assignment has to be regarded as clobbering B->y, which interferes with CSE of B->y and register caching.

As of C99, a possible solution here would be to declare the pointers restrict; then the compiler assumes they don't overlap: A->x has nothing to do with B->y.

If a struct is used, then x and y are different members and have nothing to do with each other for that reason, needless to say, even if A and B are the same pointer.
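A contrived sketch of that scenario: with restrict the compiler may keep b->y in a register across the loop; without it, the store through a has to be assumed to clobber it.

  union u { int x; int y; };

  int sum(union u *restrict a, union u *restrict b, int n) {
      int total = 0;
      for (int i = 0; i < n; i++) {
          a->x = i;        /* restrict promises this cannot alias *b */
          total += b->y;   /* so b->y need not be reloaded every iteration */
      }
      return total;
  }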


Given that this is C, can the compiler prove in any case that two pointers are distinct? When you're dealing with a structure, it could still be the case that A = B + sizeof(structure.x) (+/- padding).


Yes, I know this: I'm relying on POSIX's guarantees and not what ISO C mandates (also, truth be told, I usually do this in Objective-C, but that doesn't actually change anything significantly). Unfortunately I get a function pointer and need to forward essentially anything, so I'm not going to have any sort of type safety at all…


What guarantees does POSIX make about calling functions that are not in ISO C? I'm curious.

POSIX generally has very little to say about C matters; for the most part it defers to ISO C by normative reference.

Off the top of my head: because POSIX specifies dlopen and dlsym, it pretty much requires function pointers to have a representation that can be converted to void * and back.
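Something like this sketch, where the library and symbol names are invented and error handling is omitted:

  #include <dlfcn.h>

  int call_foo(int arg) {
      void *handle = dlopen("libfoo.so", RTLD_NOW);
      /* ISO C doesn't define converting void * to a function pointer;
         POSIX effectively requires it to work so that dlsym is usable. */
      int (*fn)(int) = (int (*)(int))dlsym(handle, "foo");
      return fn(arg);
  }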


int main() { return 0; }

main takes argc and argv, but it's idiomatic to omit them if unused.


This is irrelevant; this is the function definition, which means the compiler knows this function takes zero arguments.

It works because with the way the C ABI works on all platforms I'm aware of, extra arguments passed to the function can be completely ignored by the function with no ill effects, so the fact that _start actually invokes main with argc and argv can be ignored by main.

Which is to say, you could also write this as

  int main(void) { return 0; }
and that would work equally as well.


Not with stdcall, i.e. the Win32 API.


I've never done Windows programming but AFAICT you don't use `int main()` on Win32 anyway, you write something like

  int WINAPI WinMain(HINSTANCE hInst, HINSTANCE hPrevInst, LPSTR, int)
(where WINAPI is just a macro for __stdcall)

I also found a reference saying you can in fact use `int main()` if you want to, in a GUI subsystem Windows app, by using the Microsoft linker options /subsystem:windows /ENTRY:mainCRTStartup, but in that case you're not writing a stdcall function anymore.

EDIT: I guess I should clarify, my previous comment was talking about cdecl functions, i.e. the default calling convention in C.


That's a special case though and unrelated to (). Like it's also explicitly allowed to omit a return statement in main altogether, which implies a return 0.


That was added in C99; previously, "falling off" the end of main without a return statement resulted in an indeterminate termination status. Countless sloppily written C programs had garbage termination statuses because of this, making it impossible for higher level programs (e.g. scripts) to reliably test for successful execution.


Why not just use int main(void)?


I'm confused. main doesn't take void; it takes two arguments. The f() syntax means that the definition doesn't specify how many arguments the function will be called with, while f(void) means it takes no arguments.


As per the standard, main() can take no arguments [i.e., main(void)], 2 arguments [main(int, char **) or equivalent], or be defined in some other implementation-defined manner. See 5.1.2.2.1 in the C2x working draft[0].

[0] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2346.pdf


Wow, it is painful to see that they still haven't accepted case ranges. They are supported by gcc, clang, and icc. Like so:

case 123 ... 456:

Switching on strings is another thing people have been wanting for half a century. It would speed up many programs, because most programmers don't bother to generate a perfect hash or a carefully balanced tree of "if".
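What people write today instead is a linear strcmp chain, which a built-in string switch could turn into a hash or search tree behind the scenes; dispatch and the do_* functions here are made-up names:

  #include <string.h>

  void do_open(void), do_close(void), do_quit(void), do_unknown(void);

  void dispatch(const char *cmd) {
      if      (strcmp(cmd, "open")  == 0) do_open();
      else if (strcmp(cmd, "close") == 0) do_close();
      else if (strcmp(cmd, "quit")  == 0) do_quit();
      else                                do_unknown();
  }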


I would just be happy with a change that allowed me to not keep repeating the type identifier all the time. Something like:

  int foo(int x, y, z, float a, b, c) { ...
instead of

  int foo(int x, int y, int z, float a, float b, float c) { ...
Yes, you can pack things like that into a struct sometimes, but not always. From talking to folks who were on the ANSI committee at the time, there was some reason why the first example wouldn't work. Some parsing ambiguity, but I forget the details.

It's distressing to see new languages (like D) adopt this verbose style as if it were some well thought out idea. It wasn't, it was just a consequence of the new declaration style that the ANSI C committee adopted.


Can't you do that already with f(...)?


As other comments have noted:

varargs functions require at least one non-vararg parameter.

(As int_19h said:) On some architectures, vararg functions have their own distinct ABI, so (...) as a declaration is not compatible with any definition that doesn't also use (...).


Not in C! All variadic functions must have at least one non-variadic parameter.


Could introduce something new, foo(?) or such to make the intent clearer.


There's already the "..." token to signify a variadic function, which is effectively what "takes parameters that you can't list" means, although I believe it's currently defined only for at least one non-variadic argument.


Furthermore, on some architectures, vararg functions have their own distinct ABI, so (...) as a declaration is not compatible with any definition that doesn't also use (...).


The construct would be a pointless throwback to poorer type safety. Every C function call is statically typed, and so is a function definition. The declaration of what is being called may tell you nothing about the argument list, but the call expression which uses that identifier will assume a static type for it, based on how it is called. The call will be compiled assuming it has a certain type, which may or may not match what is being called. There is no run-time check; if it's wrong, it's undefined behavior.
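To illustrate with a made-up function f: the compiler compiles the call from nothing but the argument types it sees, and nothing ever checks that against the definition.

  int f();            /* says nothing about the parameters */

  int main(void) {
      return f(1.0);  /* compiled as if f took a double; if f is really
                         int f(int), this is undefined behavior with no
                         run-time check to catch it */
  }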


What I would like in C is a way to specify an argument list. And be able to slop them around and call functions with them later.

So perhaps foo(args) means foo takes a list of arguments unspecified.


You can approximate an argument list using compound literals as long as 0/NULL is an acceptable default value. It's a bit noisy, but it works something like this if I remember right:

    result = foo(&(struct foo_args){.arg1 = 4, .arg7 = "2"});
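Filled out, the pattern might look roughly like this (struct foo_args and its fields are hypothetical; members not named in the compound literal are zero/NULL):

    struct foo_args {
        int arg1;
        const char *arg7;
    };

    int foo(const struct foo_args *args);

    int call_it(void) {
        return foo(&(struct foo_args){ .arg1 = 4, .arg7 = "2" });
    }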



There are libraries for this like GNU ffcall and libffi.


Looked at those. The problem with libffi is that it wasn't ported to 32-bit ARM, and the interface is 'clunky'.

I think what I want is LLVM's blocks, which aren't available in gcc.


Like va_list in vprintf?


Yup, would totally be fine with that.
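For what it's worth, a minimal example of that existing pattern (log_msg is a made-up wrapper name):

  #include <stdarg.h>
  #include <stdio.h>

  void log_msg(const char *fmt, ...) {
      va_list ap;
      va_start(ap, fmt);          /* capture the caller's arguments once */
      vfprintf(stderr, fmt, ap);  /* hand them to a v* function that knows how to consume them */
      va_end(ap);
  }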


For those who are still wondering the actual reason for the extra instruction after reading all that, it has to do with the calling convention: when calling a variadic function in SysV AMD64, AL holds the number of vector registers used for parameters. I believe the Microsoft x64 one doesn't do that.

Also, a xor r32, r32 is 2 bytes, not 1.


Yep: https://godbolt.org/z/BMjD0Y

sorry, I must've misread the output from godbolt from too many window splits. Will update the article. Thanks for pointing it out. :)


The reason is that there’s no space between the instruction address and the first byte of the instruction, but there are spaces between later bytes.


> a xor r32, r32 is 2 bytes, not 1

Does the article imply otherwise?


Yes.

> So Clang can potentially save you a single instruction (xorl %eax, %eax) whose encoding is only 1B, per function call to functions declared in the style f(), but only IF the definition is in the same translation unit and doesn’t differ from the declaration, and you happen to be targeting x86_64.


I couldn't seem to find that bit, thanks for pointing it out to me.


I just updated the article, too, sorry if that caused confusion.


Yes.


> Is an error in C, but surprisingly C++ is less strict here, not only allowing it but also taking the semantics of the definition. (Spooky)

Aren't these just declaring one overload of foo and defining a different overload?


Yeah, it's an overload as others have replied. The difference between C and C++ here, is that you should get a linker error if you try and invoke the undefined overload in C++ (C doesn't allow overloading, nor does it require a forward declaration).

This is how we "deleted" functions pre-C++11, especially otherwise auto-generated ones such as the default or copy constructor and the assignment operator.


Yeah, there's nothing spooky there. C++ supports overloads and C doesn't.


Ah! That's it, thanks! edited the article


Came here to say that I suspected operator overloading. I read your comment, pressed refresh, and it was already fixed.


I feel like basically any "explanation" of this fails to sufficiently explain `f()`.

`void f(){}` is just a special case of the old style of defining functions. Another case of this is `void f(a, b) int a, b; {}`.

Using the old style, functions are defined without "prototypes"; that is, the type of the function does not specify its arguments.

Since changing the meaning of `void f(){}` would have broken backwards compatibility, they added `void f(void){}` as the way of specifying "no arguments" in the new style.

Edit: to further demonstrate, this is a perfectly valid C program:

  void f(a, b) int a, b; {}

  int main(void) { if(0) f("oops"); }

As the type of `f` does not specify its arguments, there is no constraint violation (compilation error), and as the call is never actually performed (`if(0) ..`), the program does not invoke the undefined behaviour that would result from performing the invalid call.


Thank you to this guy for not burying the lead. I hope more people put a simple conclusion at the beginning when writing such a blog post.

Also, any one else notice that the favicon is a gif (nyan cat)? I didn't know that was supported.



I've always wondered why EAX was getting cleared; I see it all the time in disassembly. Very cool stuff!


In the conclusion the author says he considers T f() to be prettier than T f(void). I personally prefer the look of T f(void) to T f(); the former is explicit to both the programmer and the compiler that this function is not supposed to take any arguments. If I can cause the compiler to not be helpfully clever when it doesn't need to be, as well as make it clear to my future selves, or any other poor soul condemned to read my code what my actual intention was, I see that as a Good Thing.


> If we change foo2 to a declaration (such as would be the case if it was defined in an external translation unit, and its declaration included via header), then Clang can no longer observe whether foo2 definition differs or not from the declaration.

I tried enabling link-time optimization on clang-8; it does see through this and eliminates the redundant xor instruction in foo2(). gcc fails to do that, however. I inspected the result by calling objdump on the generated object code.


Great! Thanks for confirming; I suspected LTO would be able to do this optimization as well. I should add that to the post.


That should have been disallowed in C11, but apparently it wasn't. You do have to have function prototypes now, but they can still be empty.


I found another difference between C and C++ recently. C++ compiles the code in the following post ok, but C doesn't.

https://old.reddit.com/r/C_Programming/comments/bn04w0/confu...

In short, for "typedef struct bar bar;", C views "bar" as an incomplete type, but C++ doesn't.


That doesn't sound right. I think the difference is that C++ doesn't treat that as an array declaration so it doesn't require a complete type.


What is the root cause for the difference?


Ah, the days of void main(void)



