
Back in the day, all you needed for perfect performance was C and proper algorithms. It was easy.

Nowadays you need vector operations, you need to utilise the GPU, you need to utilise various accelerators. To me it's black magic.




I think you're trying too hard.

Something that you do a lot? Fine, write it in C/C++/Rust.

Is it something that costs thousands/millions of dollars of compute? OK, maybe it's worth spending a month on it: put your robe on and start chanting in Latin.


Bryan Cantrill has a video out there where he rewrote some C code in Rust and the benchmark was so much faster that he couldn't make sense of it. After much digging, it turned out that Rust was using a better data structure to represent the data, one that's difficult to get right in C.

In the end his test was comparing algorithms, not compilers, but there is still something to that: we always make algorithmic compromises based on what is robust and what is brittle in our language of choice. The speed limits don't matter if only a madman would ever drive that fast.
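The talk doesn't walk through the exact code, but as a rough sketch of the point: Rust's standard ordered map is a B-tree you get for free, whereas in C the path of least resistance is usually a pointer-chasing binary tree or a hand-rolled hash table. (The keys and counts below are made up purely for illustration.)

    use std::collections::BTreeMap;

    fn main() {
        // std's ordered map is a B-tree: each node holds many keys in a
        // contiguous chunk, which is friendlier to the cache than the
        // one-node-per-key binary trees you'd likely hand-roll in C.
        let mut counts: BTreeMap<u64, u64> = BTreeMap::new();

        for ts in [15u64, 3, 42, 3, 7] {
            *counts.entry(ts).or_insert(0) += 1;
        }

        // Ordered iteration and range queries come with the structure.
        for (ts, n) in counts.range(..=15u64) {
            println!("{ts}: {n}");
        }
    }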


But perfect performance isn't even the benchmark; the benchmark is "not ridiculously slow". That's what is meant by "computers are fast, but you don't know it": you don't even realize how ludicrously fast computers are because so much software is so insanely slow.

They're so fast that, in the vast majority of cases, you don't even need optimization; you just need non-pessimization: https://youtu.be/pgoetgxecw8
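Not from the video, just a toy illustration of the idea in Rust: the "fast" version isn't clever at all, it simply avoids doing obviously wasteful work.

    // Pessimized: builds a fresh String for every line and reallocates
    // the accumulator over and over.
    fn join_pessimized(lines: &[&str]) -> String {
        let mut out = String::new();
        for line in lines {
            out = out + &format!("{}\n", line); // new allocation each pass
        }
        out
    }

    // Non-pessimized: reserve once, append in place. No algorithmic
    // tricks, just not throwing work away.
    fn join_plain(lines: &[&str]) -> String {
        let mut out = String::with_capacity(lines.iter().map(|l| l.len() + 1).sum());
        for line in lines {
            out.push_str(line);
            out.push('\n');
        }
        out
    }

    fn main() {
        let lines = ["computers", "are", "fast"];
        assert_eq!(join_pessimized(&lines), join_plain(&lines));
    }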


To be really fast, yes. Those are optimizations that allow you to go beyond the speed of just C and proper algorithms.

But C and proper algorithms are still fast. Moore's law is going wider, yes, and single-threaded advancements aren't as impressive as they used to be, but solid C code and proper algorithms are still faster than they were before!

What's not fast is when, instead of using a hashmap where you should have used a B-tree, you store half the data in a relational database from one microservice and the other half on the blockchain, and query it using a zero-code platform provided by a third vendor.


These things only net you one or two orders of magnitude (and give you very little or even a negative power-efficiency gain), or maybe three for the GPU.

This pales in comparison to the 4-6 orders of magnitude induced by thoughtless patterns, excessive abstraction, bloat, and user-hostile network round trips (that last one is more like 10 orders of magnitude).

Write good clean code in a way that your compiler can easily reason about so it can insert suitable vector operations (a little easier in C++, Rust, Zig, etc. than in C), and that's perfect performance in my book even if it isn't saturating all the cores.
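As an illustration (mine, not the parent's), this is the kind of loop an optimising Rust build can usually auto-vectorise on its own: plain slices, fixed stride, no branches, no aliasing surprises. Whether SIMD actually gets emitted still depends on the target and flags (e.g. a release build, possibly with -C target-cpu=native).

    // Element-wise multiply-add over plain slices; the shape of the loop
    // leaves the compiler free to use vector instructions.
    fn scale_add(out: &mut [f32], a: &[f32], b: &[f32], k: f32) {
        for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
            *o = x * k + y; // out[i] = a[i] * k + b[i]
        }
    }

    fn main() {
        let a = vec![1.0_f32; 1024];
        let b = vec![2.0_f32; 1024];
        let mut out = vec![0.0_f32; 1024];
        scale_add(&mut out, &a, &b, 3.0);
        assert!(out.iter().all(|&v| v == 5.0));
    }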





