>But regardless of what our computer is doing, if it takes less than 100ms to execute our program, we simply won't be able to tell the difference between executing 1 instruction or all 10 billion.
>All of the example programs except for "Python 3 (PyPy)" are basically within this threshold. No user will notice the difference.
This is a contentious point. 100ms is very noticeable for a human being. I'd say most humans probably hit the "simultaneous" range at 33ms (30fps). I know, personally, I can very much tell when there's an input latency of 100ms.
Regardless, for Drew's somewhat contrived benchmark, I can't imagine a situation where the difference between starting a program in 33ms versus 100ms would matter, but if that program is running in a loop in a bash script? Then those milliseconds very much do matter. Spending an extra 90ms per iteration is an extra 9 seconds for every 100 items.
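(To make that arithmetic concrete, here's a rough sketch in Go of the "loop in a script" scenario: spawn an external program N times and time the whole loop. "./hello" is just a placeholder for whatever binary you're measuring; the per-run figure is where a 90ms startup difference shows up.)

    package main

    import (
        "fmt"
        "os/exec"
        "time"
    )

    func main() {
        const runs = 100
        start := time.Now()
        for i := 0; i < runs; i++ {
            // Each iteration pays the full process-startup cost of "./hello".
            if err := exec.Command("./hello").Run(); err != nil {
                fmt.Println("run failed:", err)
                return
            }
        }
        elapsed := time.Since(start)
        fmt.Printf("%d runs in %v (%v per run)\n", runs, elapsed, elapsed/runs)
    }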
This is covered in the article: he even does a benchmark of 25k iterations, explains why Go is slower, creates a more apples-to-apples comparison where Go is only slightly slower, and then optimizes the syscalls to produce a Go program many times faster than the assembly version.
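(For reference, the buffering optimization in question looks roughly like this in Go; this is my own sketch, not the article's exact code, and the 64 KiB buffer size is arbitrary. Writing through a bufio.Writer means the 25k lines go out in a handful of write syscalls instead of one per line.)

    package main

    import (
        "bufio"
        "os"
    )

    func main() {
        // Buffer stdout so many small writes are coalesced into few syscalls.
        w := bufio.NewWriterSize(os.Stdout, 64*1024)
        defer w.Flush() // flush whatever remains in the buffer on exit
        for i := 0; i < 25000; i++ {
            w.WriteString("hello world\n") // stays in the buffer; no syscall per line
        }
    }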
I would argue if you're doing performance sensitive work in that fashion, you're doing it wrong.
I'd also probably argue that if you're doing it that way, you're probably already not that concerned with performance to the degree where losing a few tens of seconds is a big deal.
I certainly wouldn't include any tight bash loops that execute external code in any service code, for example. Performance there shouldn't be impacting either the sync or async path.
Much like the article's point, it's a question of optimising for the right things.
I do agree that people can perceive values faster than 100ms.
Several years back I was interviewing for a sysadmin job at a futures trading company. Partway through the interview, one of the traders complained to the interviewer that latency on the connection to the exchange was increasing, and sure enough it was up 20ms from normal (I forget the normal, but it was very low, maybe on the order of 10-20ms), and monitoring was just about to page the interviewer.
In addition to 100ms being very noticeable, the problem with such "it isn't noticeable" arguments is that n such implementations get layered, and soon you have a pretty egregious situation.
As a lazy example, I happen to have Excel open on macOS -- whenever I try to format a cell the format screen takes just shy of 4 seconds to appear. How is that even remotely possible? Tens of billions of instructions per second...leaves me waiting whenever I set a single cell to display as a number.
Memory utilization has the same problem. Those many allocations are meaningless individually, but pretty soon we have an idle and unnecessary "Adobe Desktop Service" sitting in the background soaking up almost 2GB.
There's a difference between perception of display FPS and action-reaction cycle, like pressing button and seeing result. Check out this video: https://youtu.be/eLK-L10zRFA
The guy is a super-fast gun artist, honing his action-reaction loop every day for hours. But for the first 120ms he is just standing there while nerve signals travel from his eyes to his brain and then to his arms and fingers.
The author makes a benchmark that compares a Go program that explicitly buffers many "hello world" strings against an assembly program that does not, then goes on to argue that this shows that high-level languages can be faster. This can certainly be true, but this particular example is not a great way to show it, because the same optimization can be done in an obvious way in assembly (just allocate enough space with brk or mmap2, write your strings into it, and pass the right memory address to the write syscall). This would of course be harder to do in assembly than in Go, but it is still quite straightforward to see what the optimal way to do it is. Furthermore, this would probably still outperform the optimized Go version by a wide margin.
A better example would be one where a compiler picks an obscure but faster parallelization strategy, unrolls a loop appropriately in a way that is both faster and unlikely to be written by a competent human, or handles a complex memory-management scenario, etc.
I think this is not the point of the original article though. I think we all understand that abstractions can in theory bring great benefits, but we do need to scrutinize the cost they add. The hello world example shows that even with the simplest program we can imagine, the result is extremely far from optimal in popular programming environments. If this is the case, why should we assume that these same compilers are doing an excellent job in situations that are actually hard?
He called this out, saying that the added complexity overhead of Go allowed the abstraction to be implemented much more simply than it would be in assembly. This is just a tradeoff that is worthwhile in most cases.
The gripes about boilerplate overhead in Hello World miss the point that the runtimes involved are themselves making tradeoffs about what to optimize for. Go explicitly trades off ultra-efficient binary size for ultra-fast compilation and mostly-static artifacts. Go is not designed to make the most efficient possible Hello World binary, nor should we want it to be. The fact that you can optimize Hello World better by hand than the Go compiler does tells us nothing interesting. How well can you optimize Docker, Consul, or Kubernetes by hand?
> By some estimates human beings perceive two events as instantaneous at around 100ms.
This is not at all what the linked article says. The article says human reaction time — the time it takes to receive a stimulus, process it, and perform a physical act in response — is about 100ms.
Human time perception is much faster than that. A software button that takes 100ms to switch to its pressed appearance feels dramatically different than one that does so in 10ms. As the linked article notes, game player performance degrades as latency increases from 13ms.
Original observation: programs do a lot of things you didn't ask for.
This response: well, actually™ you really want all these things you didn't ask for. And don't worry about that bloat, the memory is cheap and the CPUs are fast.
So, the points of the original article still stand (this "response" does not address them in any way).
The claim about 100ms latency being imperceptible is the cherry on the top. Try delaying a drum track by 100ms and listening to it without pain (or try singing karaoke into a mic with that much latency).
But that's beside the point, which is that all these programs are doing a lot of things you didn't ask for and take megabytes to do that.
I don't think this is a fair summary of what I said.
These were my points:
1. Performance improvements of interactive programs below a certain threshold are imperceptible and therefore of negligible value.
2. A minimalistic program does not always lead to better performance.
3. Compilation involves trade-offs and binary size and the number of syscalls executed are not the only things that matter.
4. The syscalls emitted by these programming languages are there for a reason and provide significant value to developers and users.
I nowhere stated that memory was cheap, but I agree that CPUs are fast.
As for latency, quibble with the numbers, but I don't think a user will ever notice the few hundred microseconds difference between the Go version and the ASM version.
> But that's beside the point, which is that all these programs are doing a lot of things you didn't ask for and take megabytes to do that.
Yes. And it both doesn't matter, and optimizing those things away has drawbacks.
Theoretically the points in the original article could matter - writing a very simple tool with Electron for example. But the example given wasn't anything like that.
> As for latency, quibble with the numbers, but I don't think a user will ever notice the few hundred microseconds difference between the Go version and the ASM version.
You do realize there's a huge difference between a few hundred microseconds and the 100 milliseconds you were talking about?
The point wasn't about startup time. The point s/he was making was that complexity can be a good thing in certain cases.
No one would ever expect or claim that a JEE application would have a fast startup time. That's not the level of complexity JEE is trying to optimize for.
do you agree that there are tradeoffs in system design?
A plane could weigh a lot less if we removed all the computers and sensors that it carries. However, as logical beings we make the trade off because speed isn’t as important as safety.
> However, as logical beings we make the trade off because speed isn’t as important as safety.
umm, isn't it more like we make the trade off because we currently can't do better? who knows what amazing progress we will make?
we do really want both speed and safety. bringing the subject back home, golang on their homepage picked 3: simple, reliable & efficient.
the original article is right for talking about it at all. i, for example, had no clue. and now will keep all that in mind. the response is great for providing an explanation. and OP's comment here is right on spot.
It's a fixed cost. The percentage of the resulting program that is "bloat" appears to be crazy high when you're writing a trivial "hello world" program... but for an actual, real-world program the percentage of this fixed cost compared to the total size of the program will be negligible.
> Computers are fast - a lot faster than we can possibly perceive. By some estimates human beings perceive two events as instantaneous at around 100ms. To Drew's point, this is an eternity for a computer. In that time a modern CPU could execute 10 billion instructions.
This type of thinking is why we can’t have nice things, and why technology and software seem stuck on an eternal treadmill. A system with lower latency feels amazing - compare a high and low latency terminal or text editor sometime and feel the difference. A touchscreen with 1ms latency (input to output, which requires a 1000Hz display) feels like a real physical object - way different from a touchscreen at 16ms or (god forbid) 100ms. We are never going to get there if people keep assuming that latencies lower than 100ms (or 30ms) are imperceptible.
100ms is the minimum viable performance. If an operation takes longer than that, you’ve failed for interactive purposes. But just because that’s the minimum requirement doesn’t mean we shouldn’t try our dang hardest to improve on that!
I strace'd Rust's hello world and the vast majority of the syscalls are just ld.so setting up dynamically loaded libraries. Presumably if you really cared you could statically link against musl to eliminate those. The remaining syscalls are needed for stack guards to work, which you actually want for security, even with hello world.
I would go so far as to say that publishing benchmarks that encourages languages to skimp on important security features like guard pages is irresponsible.
It's still weird that it has significantly more syscalls than the C program dynamically linked to glibc, which presumably has the same overhead wrt ld.so and guard pages. (BTW, the guard page for the main thread, IIRC, is set up by the kernel.)
A fair number of the syscalls are to ensure that stack overflow is reported like a Rust panic (with a stack trace), not like a segv in unsafe code. I think this is worth it.
> A full-program, aggressive optimization step might lead to smaller binaries, but it would do so at a great cost to projects with many dependencies. Incremental compilation is a boon to developer productivity which is also important.
Isn't the traditional way to solve this problem to have two different modes for the compiler? You can have a "development" mode that compiles quickly and supports incremental compilation, and that doesn't stop you from also having a "release" mode that produces a Hello World executable that fits on a floppy disk.
And then if you're Google and you have infinite disk space and memory but limited time to spend waiting for builds, you can just run all your builds in "development" mode. (But if you also ship smartphone apps, maybe you compile those in release mode.)
The Go Way ain't the traditional way. I'd argue you should want the same binary everywhere, though there's always C or you could come up with your own scheme to break production.
I don't especially care about the binary (beyond wanting it to be as small and fast as possible on end-users' machines), but I do want the behavior to be the same everywhere.
In C, this is difficult because code often accidentally relies on "undefined behavior" that changes depending on the compiler and optimizations you use. But is that still an issue in newer languages that don't have as much undefined behavior? (Is this something Rust programmers struggle with, for example?)
Safe language implementations such as Go and Rust are specifically designed to avoid such inconsistencies. If your builds differ, you never know 100% though.
On an “etch a sketch” you go up one dial then down a half then right half a dial then down half a dial then right half a dial and then right half a dial and then left half a dial and up one dial and then right half a dial and then down half a dial and then...bugger stuck
I believe the function of this feature was to get the executable's path before the binary could be deleted off the disk, which presumably could happen in the time between when the binary was executed and when the call to readlink was made.
I've looked into it some more; I had assumed /proc/*/exe acted like a symlink whose value never changed, but actually it's more complicated[1]. As a result (only considering Linux here), there actually is a race condition in what they're doing: /proc/self/exe will have " (deleted)" appended if the executable is deleted. In an apparent attempt to make the problem more subtle, they're doing the readlink on every program startup so that if the value is needed, it will be correct--unless the file was deleted between when it was exec'd and when it got to the readlink call.
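(A minimal Go sketch of the pattern being described, assuming Linux; this is my reconstruction, not the actual library code. Reading /proc/self/exe once at startup caches the path, and checking for the " (deleted)" suffix is how you would detect the race where the binary was unlinked between exec and the readlink call.)

    package main

    import (
        "fmt"
        "os"
        "strings"
    )

    func executablePath() (string, error) {
        // On Linux, /proc/self/exe is a magic symlink to the running binary.
        p, err := os.Readlink("/proc/self/exe")
        if err != nil {
            return "", err
        }
        // If the binary was deleted after exec, the link target gains a suffix.
        if strings.HasSuffix(p, " (deleted)") {
            return "", fmt.Errorf("executable was deleted after exec: %q", p)
        }
        return p, nil
    }

    func main() {
        // Do the readlink once, at startup, before the file can disappear.
        if p, err := executablePath(); err == nil {
            fmt.Println("running from:", p)
        } else {
            fmt.Fprintln(os.Stderr, err)
        }
    }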
> By some estimates human beings perceive two events as instantaneous at around 100ms.
Would be fun to design a realtime audio app using that supposition.
Also, a question to troll fans of the phrase "Gell-Mann amnesia effect": how flippantly should I now reject the other paragraphs that are outside of my expertise?