Where the Printf Rubber Meets the Road (2010) (hostilefork.com)
80 points by striking on Oct 4, 2014 | 43 comments



So for the record, I wrote this four years ago. No idea why this is being linked to from here now.

Let the record show, also, that I in no way stand by this as being a sane approach. I just walked through it, trying to answer the question. I wanted to show the path to syscall and it was way wackier than I thought it would be. That's why it ended up a blog entry.

(Though when my blog went down one day, someone copied the content into the answer:

http://stackoverflow.com/revisions/2444508/3

...and they did so ignoring the license clause difference that my blog is CC-BY-NC-SA instead of CC-BY-SA. There's been a bit of a tussle over that distinction lately, with people selling StackOverflow books:

http://meta.stackoverflow.com/questions/272768/is-this-site-...

...and I'll leave it to those with more interest in the issue to decide if that is worth worrying about, because I don't actually care.)

Anyway, I'm sure the people involved in writing this had their reasons for doing it this way, and it had to do with code legacy and evolution. I don't want anyone to mistake my trace through it as endorsement. It's just what it is.


You did miss one fun rabbit hole with printf in glibc:

Search for register_printf_function, and realize that printf is now a function that can have whatever side-effects you like (which really sucks for optimization around logging code)


Glibc may have register_printf_function, but GCC does not have register_printf_attribute, so this is not as practical as it could be. That is:

    __attribute__ ((format (printf, 1, 2)))

attaches to a function declaration to indicate that the function works like printf, so GCC can warn if a (compile time) format string does not match the variadic arguments. But there seems to be no way to extend this attribute, nor to make entirely new attributes in the same spirit.

So you can register new format specifiers for printf, but it seems you'll then have to disable warnings about bad format specifiers during compilation. Those warnings have very few false positives and catch real bugs in real code.
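For anyone who has not used the attribute, here is a minimal sketch of how it attaches to a user-defined function. The name format_message is made up for illustration, and the attribute syntax is a GCC/Clang extension, so this assumes one of those compilers.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper that formats into buf the way snprintf does.
 * The attribute tells the compiler that argument 3 is a printf-style
 * format string and that the arguments to check start at argument 4. */
__attribute__((format(printf, 3, 4)))
int format_message(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

With -Wformat enabled (it is part of -Wall), a call such as format_message(buf, sizeof buf, "%d", "text") should draw a warning at compile time. The compiler checks against the standard specifiers only, which is why a specifier added at run time through register_printf_function gets flagged as unknown.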


"But there seems to be no way to extend this attribute, nor to make entirely new attributes in the same spirit."

Yes, GCC knows, and we're good with that :)

The glibc extension here serves no earthly purpose (and on the gcc side, we've considered pretending it doesn't exist before so we can actually optimize better).

It sounds like it may be a good idea, except:

1. It modifies functions that have a standard purpose, standard set of format specifiers, and are, as you point out, often error checked to make sure they match those :)

2. Unless you see the printf handler registration, which may be in a library, linked to your program, or whatever, you'll have no idea that the code is wrong.


The Plan 9 C compilers have #pragma varargck (argpos|type|flag):

http://plan9.bell-labs.com/sys/doc/compiler.html (3.8)


So what is the reason for all the rest of the indirection and macros? Why __printf and then aliased to printf? Why is outchar a macro instead of being factored into a function? If you asked me to implement printf I wouldn't immediately do all of this - so what would my implementation be missing that makes these necessary?


I can speculate: __printf(), having two leading underscores, is a name reserved for "the implementation," meaning the compiler and standard library. This enables at least one useful thing, which is that the standard library (or a third-party library I guess) can use this function when it needs to print things, and not worry that printf() has been redefined by the application.

You might think, but who in their right mind would redefine printf()? But C has an infinite number of people using it all the time, so every possible weird thing has been done a few times by now.
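A rough sketch of the "one body, two names" idea, using the GCC/Clang alias attribute. This is not glibc's actual macro machinery, the function names are hypothetical, and the attribute is only supported on targets such as ELF/Linux (not macOS).

```c
#include <stdio.h>

/* Internal name. A real libc would spell this with a leading "__"
 * so it lives in the namespace reserved for the implementation. */
int impl_demo_puts(const char *s)
{
    return fputs(s, stdout) < 0 ? -1 : 0;
}

/* Public name: a second symbol that refers to the same function body. */
int demo_puts(const char *s) __attribute__((alias("impl_demo_puts")));
```

Library code can keep calling impl_demo_puts() directly, so it still reaches the original even if an application supplies its own demo_puts().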


> You might think, but who in their right mind would redefine printf()?

One cannot discount those in their wrong mind either.

I had a header dedicated to #undef ing things ruby.h redirected so I could include it in C++ contexts without breaking the SC++L. While I don't see printf on the list, read, write, close, fclose, sleep, and many more are.


"and not worry that printf() has been redefined by the application."

Note that GCC assumes it isn't redefined unless you use -fno-builtin-printf.

It will replace printf calls with puts and do other fun things.


This is interesting and timely since just recently I attempted to make the same journey in the name of creating a tiny obfuscated C program[0]. However, right at the part where the author says "you might start thinking that you no longer care how printf works" is exactly where I stopped caring.

In the end, I did find out a way to write to stdout without calling putchar or including stdio directly, but there's still some mystery in the call to write:

    // Printing to stdout without stdio or putchar
    // Originally adapted from here:
    // http://stackoverflow.com/a/14296280
    #include <unistd.h>  // declares write()

    // Write the character `item` to stdout `len` times, one byte per call.
    void print_char(char item, int len)
    {
        for (; len > 0; --len)
        {
            write(1, &item, 1);
        }
    }

[0] - https://github.com/lelandbatey/tiny_tree_printer


How is that function mysterious? It just writes the same character to stdout len times. The only mystery is how anyone thought that would be efficient.
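For comparison, a sketch of a variant that produces the same output with far fewer system calls. The name print_char_batched and the 256-byte chunk size are arbitrary choices for illustration, and it takes the descriptor as a parameter instead of hard-coding 1.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical variant of print_char: fill a small stack buffer once,
 * then hand it to write() in chunks instead of one byte at a time. */
int print_char_batched(int fd, char item, int len)
{
    char buf[256];
    memset(buf, item, sizeof buf);
    while (len > 0) {
        size_t chunk = (size_t)len < sizeof buf ? (size_t)len : sizeof buf;
        ssize_t n = write(fd, buf, chunk);
        if (n <= 0)
            return -1;      /* error; the caller can inspect errno */
        len -= (int)n;      /* write() may accept fewer bytes than asked */
    }
    return 0;
}
```

The trade-off is source size: this is clearly longer than the one-byte loop, which matters if the goal is the smallest possible program.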


What I mean by mysterious is that the syscall is hidden in the write statement, vs hidden in the printf statement. Also, I wrote the above function (actually a much smaller version) as part of a program that would print a binary tree of any given height[0]. It wasn't meant to be efficient, it was meant to be small, since the total size of the program was 777 bytes (later reduced to 505 bytes).

[0] - http://lelandbatey.com/posts/2014/09/binary-tree-printer/


I had a similar question to this once, and what I mainly found out was that FreeBSD's (and thus OSX's) libc is much, much, more readable than glibc.


Any other C library is more readable than glibc. Even on Linux, the Musl C library is far far far more readable.

The OpenBSD C library is the one I usually look at when I want to understand how a specific function works. It doesn't have insane optimizations or bloat like glibc, but it's clean and portable.


The only thing you can begin to understand by reading glibc is the terrible genius of Ulrich Drepper.


You're correct.


The original Stack Overflow question seems to have some fundamentally broken assumptions about implementation in assembly. The design decisions of one random compiler don't determine how all implementations work. And even for that compiler, a lack of inline assembly doesn't mean the standard library can't use assembly; it just has to use separate assembly files. Beyond that, the compiler itself generates machine code, and it can (and does) do so as part of the implementation of varargs.

Regarding the site hosting the article: ugh, it's bad enough to capture the left arrow key and have it go to a previous article, but pay attention to the modifiers to avoid breaking alt-left as a keyboard shortcut for "back". (Browsers shouldn't even allow unprivileged pages to override shortcuts like alt-left.)


But I'll patch the alt-left thing if it makes you happy. :-/ Still, if you want to give feedback, write an email and make it friendly. It doesn't make people feel good with "ugh". There's so much "ugh"-worthy stuff out there and I don't think I deserve that.


I appreciate you taking the time to fix the alt-left problem. I've run into too many sites doing similar things lately, including Google Blogger-based sites; this was just the most recent one I've run into. I was attempting to express a very mild annoyance; sorry if it came across as excessively snarky.


It's all right, I just try kind of hard to be the least ad-having, most license friendly, site I can host... so it gets my goat a bit given what I think are better "ugh" targets by far.

I'm traveling and it's not convenient to fix it tonight, but I will do it as soon as I can.


Not the original commenter, but you clearly haven't hung around HN for very long :)


Indeed, not my generation, not my thing. Back in my day, if you wanted to say something to someone you said it to their face. Gloves were thrown on the ground. Duels were had. It doesn't end until someone gets shot in the face.

Well maybe these modernizations are good. You shoot people in the face virtually.

https://www.youtube.com/watch?v=PYQhvW-tjNM


Of all the things, of all the sites on the Internet, loading all the plugins and advertising weighing down your browser that you can handle... you complain that left and right cursor keys let you navigate entries? Your compass is set so straight, buddy. Thanks for making my day. I hate it when stuff I write gets picked up by sites like this where people like you offer your genius unto the world.


...didn't realize you disliked this site so much. Sorry for putting you on the spot, I guess? I just thought your article was interesting.


Well I was in an irritated mood (it happens, often). And someone texted me as "you're on hackernews"...which is another one of these "we don't deign to comment on your blog but in invisible space you can't moderate and wouldn't even know about if someone didn't tell you" sites.

It irritates me that a low-tech site run by loudmouth self-important non-philosopher startup "We're Big Business" people is worshipped so mindlessly by the madding JavaScript crowds. I guess they don't know any better. They're kids.

And the logo... it's a Y. Must have taken a long time to make that! Can I trade my "points" here for anything besides meaningless points on something that is only slightly lower tech than Reddit?

It's not your fault that these people--this rep system--this fraud exists. But sadness for me and my heart, because I care about right answers and beautiful things. "Startups" and "entrepreneurs" who don't have a soul or an original idea in their head aren't part of my world. Show me something amazing, or come to me with a question. Not the junk from these clowns.

So I'm back to answering homework questions for poor kids in India who don't know that Turbo C is long discontinued. I want to improve the world.

I wish computers hadn't gotten cool enough to get on the radar of MBA wannabe mental lightweights.

http://api.ning.com/files/7p9j8xEwRdVx32byvos40sX-7FeEGhQJ6Z...


>Can I trade my "points" here for anything besides meaningless points on something that is only slightly lower tech than Reddit?

Get some, and you can change the color of the header bar. Get about 500 and you can downvote people. That's about it.


Oooh. Zero points and gray. That's bad, is it? O noes.

Business, business business. Numbers. Is this working?

https://www.youtube.com/watch?v=CFIqby7f5B4


Consider using noscript by default. It fixes this and many other problems.


In my more naive days I used to genuinely believe that macros were a cool language feature. Now after spending some time in C/C++ I think macros are the enemy of code readability, and I think this article summarises WHY nicely.

I'm sure there is some political reason why ldbl_strong_alias was used to rename printf to __printf, but frankly I don't really care, there are macros like that all over the C/C++ libraries that make finding the source for anything pure hell.

The only acceptable macro to me now is a block which only runs in the DEBUG build and is skipped over by the compiler in RELEASE.
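A minimal sketch of that kind of debug-only macro. DEBUG_LOG and checked_divide are hypothetical names, and the variadic macro syntax assumes C99 or later.

```c
#include <stdio.h>

/* Build with -DDEBUG to get the logging; without it the macro
 * expands to a no-op and its arguments are never evaluated. */
#ifdef DEBUG
#define DEBUG_LOG(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_LOG(...) ((void)0)
#endif

int checked_divide(int a, int b)
{
    DEBUG_LOG("checked_divide(%d, %d)\n", a, b);
    return b == 0 ? 0 : a / b;   /* 0 on division by zero, for the demo */
}
```

One thing to watch: because the arguments vanish in a release build, anything with side effects inside DEBUG_LOG(...) vanishes with them.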

PS - Yes this is why I don't code in C/C++ very often. I'm currently playing with Go which has no concept of macros.


the very few times you want the source to something to debug it etc. you'll have a hard time. the 99% of the time when you want not the implementation details but some semantically meaningful mapping of one token to one concept, macros are wonderful.

another huge benefit is that they take care of fiddly bookkeeping for you - if you have to remember some incantation that involves doing several steps in sequence and making sure that if you change place A you need to change B and C in certain ways, just write a macro and be happy.
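One common way to get that bookkeeping benefit is the "X-macro" pattern, sketched here with made-up names. There is a single list to edit, and the enum and the matching string table are both generated from it, so they cannot drift apart.

```c
/* The one place to add or remove an entry. */
#define COLOR_LIST(X)        \
    X(COLOR_RED,   "red")    \
    X(COLOR_GREEN, "green")  \
    X(COLOR_BLUE,  "blue")

#define AS_ENUM(id, name) id,
#define AS_NAME(id, name) name,

/* Expands to COLOR_RED, COLOR_GREEN, COLOR_BLUE, COLOR_COUNT. */
enum color { COLOR_LIST(AS_ENUM) COLOR_COUNT };

/* Expands to "red", "green", "blue" in the same order. */
static const char *const color_names[] = { COLOR_LIST(AS_NAME) };

const char *color_name(enum color c)
{
    return (unsigned)c < (unsigned)COLOR_COUNT ? color_names[c] : "unknown";
}
```

The cost is the one raised upthread: anyone reading enum color for the first time has to expand the macros in their head to see what it contains.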


I messed around with an MSP430 microcontroller for a while. There, I didn't want to use a "standard" (third-party but popular) library. I ended up grabbing and modifying a much more direct version of printf, and I got to the syscall part much quicker. When you only have 256 bytes of RAM and 64 KB ROM, you don't have such luxury. You also don't need to support C++.


I think printf() is one of the more interesting functions in the standard C library to examine, since it is both so often-used and has the very visible effect of producing output. It also shows some of the ways in which data can be interpreted. Variants or limited-functionality versions of it can make good assignments for programming courses.

Printing floating-point numbers is also another topic that is subtly more difficult than it looks at first glance... here's one of the better articles about it: http://www.ampl.com/REFS/rounding.pdf


When I was actively using Turbo/Borland Pascal I always thought that "Write"/"Writeln" (forgive me if misspelled) were magic functions :) - since they took an arbitrary number of arguments, but there was no preprocessor, or "..." as in C...


There is a very quick overview of all this here: http://en.wikipedia.org/wiki/Write_(system_call)#Higher_leve...


I think the question comes from a lack of understanding of varargs, not a desire to see a bunch of crappy glue code inside glibc.

And for that type of C newbie who hasn't seen varargs... I'd say the answer is pretty intuitive if you know how to look. First of all the answer is staring at you in the manpage:

   int printf(const char *s, ...);
   int vprintf(const char *s, va_list ap);

OK, so there is the set of ordinary C functions that do it. Look up this va_list thing and an implementation becomes obvious... printf sets up a va_list and calls vprintf... vprintf scans the string for '%' and uses va_arg...

But the important thing is none of this is magic and you can totally guess it just by looking at documentation. I don't think it ever confused me when I learned C. I don't really get why this is so shocking to people who read the manpages.
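The printf/vprintf split described above can be sketched in a few dozen lines. This is a toy under stated assumptions, not how any real libc does it: the names mini_snprintf and mini_vsnprintf are invented, only %d, %s, %c and %% are handled, and it writes into a caller-supplied buffer instead of stdout so the result is easy to inspect.

```c
#include <stdarg.h>
#include <stddef.h>

/* Store one character if there is room (keeping space for the
 * terminator); always count it, the way snprintf does. */
static void put(char *out, size_t cap, size_t *len, char c)
{
    if (*len + 1 < cap)
        out[*len] = c;
    (*len)++;
}

/* The va_list worker: scan the format for '%' and pull arguments
 * with va_arg according to the conversion character. */
int mini_vsnprintf(char *out, size_t cap, const char *fmt, va_list ap)
{
    size_t len = 0;
    for (; *fmt; fmt++) {
        if (*fmt != '%') { put(out, cap, &len, *fmt); continue; }
        fmt++;
        if (*fmt == '\0')
            break;                      /* lone '%' at end of string */
        switch (*fmt) {
        case 'd': {
            int v = va_arg(ap, int);
            unsigned u = v < 0 ? 0u - (unsigned)v : (unsigned)v;
            char digits[12];
            int n = 0;
            do { digits[n++] = (char)('0' + u % 10); u /= 10; } while (u);
            if (v < 0) put(out, cap, &len, '-');
            while (n) put(out, cap, &len, digits[--n]);
            break;
        }
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s) put(out, cap, &len, *s++);
            break;
        }
        case 'c':                       /* char is promoted to int */
            put(out, cap, &len, (char)va_arg(ap, int));
            break;
        default:                        /* "%%" and unknown: emit as-is */
            put(out, cap, &len, *fmt);
        }
    }
    if (cap)
        out[len < cap ? len : cap - 1] = '\0';
    return (int)len;                    /* length before any truncation */
}

/* The "..." front end: set up a va_list and delegate. */
int mini_snprintf(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = mini_vsnprintf(out, cap, fmt, ap);
    va_end(ap);
    return n;
}
```

Calling mini_snprintf(buf, sizeof buf, "%s=%d", "x", -42) leaves "x=-42" in buf. A real vprintf would hand each character to the stdio buffer instead of a fixed array, and that is where the path to write() begins.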


what? the question had nothing to do with varargs; the poster quite rightly noted that at some point in the implementation of printf, when you needed to send an actual character to an actual output device, you would need something conceptually lower-level than C. they were just confused about the specifics of interfacing between C and the machine; i think one of the comment-level answers put its finger on the nub:

> Visual Studio x64 doesn't support inline assembler. That doesn't mean you can't have assembler code. You can still have assembler, just not inline.

and in the answer section

> The system calls are (on platforms that I know of) internally implemented by a call to a function with inline asm that puts a system call number and parameters in CPU registers and triggers an interrupt that the kernel then processes.
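A small Linux-flavoured sketch of that last hop, assuming glibc's syscall() function and the SYS_write constant from <sys/syscall.h>. The name raw_write is invented, and none of this is portable to non-Unix platforms.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Skip the libc write() wrapper and request the system call by number.
 * syscall() is the piece of assembly-backed glue that places the
 * number and arguments in registers and traps into the kernel. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

raw_write(1, "hi\n", 3) should behave like write(1, "hi\n", 3). The C code never contains assembly itself; the assembly lives inside the library function it links against.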


I have a hard time seeing this as a magical part even in the eyes of a newbie. How common is it to hear about stdio but not also know about write()? I am pretty sure one man page refers to another. My point, just as it is with varargs, is you should be curious enough to find these answers yourself, run some experiments, make some good guesses.

Since Visual Studio is mentioned, I'll say that despite some extra layers (CRT, kernel32, ntdll) the concepts are identical and when I was a kid learning C I was able to dig through header files and documentation to conceptually figure it out.


it's not a question of anything being magical; the op just got confused by the difference between writing inline assembly in a c program and calling from c code into compiled and linked in assembly code.


I'll say, somebody is a little confused.

I still say, they don't understand varargs, don't understand the difference between stdio and write(), and most importantly, they are not curious enough to read documentation and make smart guesses. And I am saying, the bullshit obfuscation involved in this article, sharing so many meaningless implementation details about a very specific libc, will not help answer this question or gain insight more than RTFMing and being curious about the world would.

So... If you are even asking questions like... "Does printf use assembly?" I'd argue you are not thinking it through. You need to get yourself to a place where you can answer that yourself. It will be a very obvious answer when you do. Writing frivolous blog posts is not the way to get there.


Just noting that the article is not the source of the bullshit obfuscation, nor does it condone it. I'm part of the Red project.

http://www.red-lang.org/p/contributions.html

We're trying to undo this stuff. I was just answering a question.


The article struck me as such because the important point is not how glibc layers things, it's the conceptual detail that stdio buffers first and then flushes with write(). [You happened to pick a really bad libc to pick apart because a much simpler implementation is possible.] If someone saw your post having no idea they would have to parse it out of what you are saying, and probably get lost, because you did not say it plainly.
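Said plainly in code, the "buffer first, then flush with write()" idea looks roughly like this. It is a toy with invented names (mini_stream, mini_putc, mini_flush) and a deliberately tiny 64-byte buffer, assuming a POSIX system; real stdio adds line buffering, locking, and much more.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical miniature of a stdio output stream. */
struct mini_stream {
    int fd;         /* where flushed bytes go */
    size_t used;    /* bytes currently waiting in buf */
    char buf[64];
};

/* Push buffered bytes to the kernel with write(); returns 0 or -1. */
int mini_flush(struct mini_stream *s)
{
    size_t off = 0;
    while (off < s->used) {
        ssize_t n = write(s->fd, s->buf + off, s->used - off);
        if (n <= 0) {
            /* Keep only the unsent tail so a retry does not resend. */
            memmove(s->buf, s->buf + off, s->used - off);
            s->used -= off;
            return -1;
        }
        off += (size_t)n;
    }
    s->used = 0;
    return 0;
}

/* Store one character; only a full buffer triggers a system call. */
int mini_putc(struct mini_stream *s, char c)
{
    if (s->used == sizeof s->buf && mini_flush(s) != 0)
        return -1;
    s->buf[s->used++] = c;
    return 0;
}
```

Seventy calls to mini_putc() cost one write() when the buffer fills, plus one more at the final mini_flush(), instead of seventy system calls.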


red looks awesome, and has come a lot further than I expected! (i peeked at it a few years ago and there didn't seem to be much). I love that small executable sizes are an explicit design goal.


The next release will be able to directly build and sign .APK files, with no JDK or JarSigner required to be installed. Still under 1MB for that same compiler executable on all platforms...that can also build PE (Windows), Mach-O (OS/X), and ELF (Linux etc.)

It's moving slower than we'd like, but definitely moving.



