Build Your Own Redis with C/C++ (build-your-own.org)
184 points by ibobev on March 18, 2023 | 65 comments



I haven't created Redis itself, but a Redis client library. It all started with a chat server I wanted to write on top of Boost.Beast (and, by consequence, Boost.Asio), and at the time none of the available clients met my needs and wishes:

- Be asynchronous and based on Boost.Asio.

- Perform automatic command pipelining [1]; a rough sketch of the idea follows this list. All the C++ libraries I looked at at the time would open new connections for that, which results in unacceptable performance losses.

- Parse Redis responses directly into their final data structures, avoiding extra copies. This is useful, for example, for storing JSON strings in Redis and reading them back efficiently.
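
To make the pipelining point concrete, here is a rough sketch over a plain blocking TCP socket (only an illustration, not the actual client, which is asynchronous and built on Boost.Asio): several RESP-encoded commands go out in one batch and all replies are read back afterwards, instead of paying one network round trip per command.

    // Rough sketch of command pipelining over a plain blocking TCP socket.
    #include <string>
    #include <sys/types.h>
    #include <sys/socket.h>

    // Send several RESP-encoded commands in a single write, then read all
    // the replies back in one pass, instead of one write/read round trip each.
    void pipelined_ping(int fd, int n) {
        std::string batch;
        for (int i = 0; i < n; ++i)
            batch += "*1\r\n$4\r\nPING\r\n";        // RESP encoding of PING

        ::send(fd, batch.data(), batch.size(), 0);  // one batched write

        char buf[4096];
        ssize_t got = 0, want = n * 7;              // each reply is "+PONG\r\n" (7 bytes)
        while (got < want) {
            ssize_t r = ::recv(fd, buf, sizeof buf, 0);
            if (r <= 0) break;                      // error or connection closed
            got += r;                               // replies arrive back to back
        }
    }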

Over time I added more performance features:

- Full duplex communication.

- Support for RESP3 and server pushes on the same connection that is being used for request/response.

- Event demultiplexing: it can serve thousands of requests (e.g. WebSocket sessions) on a single connection to Redis, with back pressure. This is important to keep the number of connections to Redis low and avoid the latency introduced by countermeasures like [3].

This client was proposed and accepted into Boost (but not yet integrated); interested readers can find links to the review here [2].

[1] https://redis.io/docs/manual/pipelining/

[2] https://github.com/boostorg/redis/issues/53

[3] https://github.com/twitter/twemproxy


I will still trust antirez to write code more than I trust myself. For expected behavior, for performance, for simplicity, I can't expect to do any of these things better than him.

In fact, I find that fitting my code into patterns that work with Redis actively improves these aspects of my own code.

If I could throw a monkey wrench into Redis (beyond Lua), I imagine one or more of these aspects would rapidly deteriorate.


I did something similar a few years ago, while attending a networking course at university:

https://github.com/gabrieledarrigo/ducky

I chose UDP for the sake of simplicity and implemented a very naive protocol, using select to handle I/O. I also remember that I tried various benchmarks, but the hash implementation I used failed after 500k inserts or so. Now, I started programming quite late (after age 25), and I mostly do web development, so even though I'm a complete C noob, it was a super fun project! And if someone would like to take a look at the code and share an opinion with me, that would be great!
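
For readers who haven't used select before, the pattern looks roughly like this (a minimal sketch of a UDP echo loop, not the code from the linked repository):

    // Minimal select()-based UDP echo loop; an illustration of the pattern,
    // not the actual code from the linked repository.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    int main() {
        int fd = ::socket(AF_INET, SOCK_DGRAM, 0);
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(6380);               // arbitrary example port
        ::bind(fd, (sockaddr*)&addr, sizeof addr);

        for (;;) {
            fd_set readable;
            FD_ZERO(&readable);
            FD_SET(fd, &readable);
            // Block until the socket has a datagram waiting.
            if (::select(fd + 1, &readable, nullptr, nullptr, nullptr) <= 0)
                continue;

            char buf[1500];
            sockaddr_in peer{};
            socklen_t len = sizeof peer;
            ssize_t n = ::recvfrom(fd, buf, sizeof buf, 0, (sockaddr*)&peer, &len);
            if (n > 0)                             // echo the datagram back
                ::sendto(fd, buf, n, 0, (sockaddr*)&peer, len);
        }
    }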


For the full "Build Your Own X" list: https://github.com/codecrafters-io/build-your-own-x


Does the book itself have more prose than what's found on the webpages linked from the table of contents? Those webpages are very sparse on actual explanation; they are mostly code dumps. I would expect a book on a key-value store to be a lot more theory-focused. You could write a whole book on hash tables alone, but the hash tables "chapter" is only a few paragraphs of actual explanation [1].

[1] https://build-your-own.org/redis/08_hashtables


Codecrafters has a similar exercise:

https://app.codecrafters.io/courses/redis?track=c


No C++? Hard pass from me.


The book should start by explaining what Redis is.

It is a DB first and foremost, so "Learn network programming [by writing Redis]" is kind of questionable.



From the very first chapter with code (chapter 2, if we keep the numbering):

> There are several Linux system calls we need to learn before we can start socket programming.

Have we reached the point where Linux is, by default, the new Unix/new POSIX? Isn't that alarming for, say, the FreeBSD folks?


> C/C++

there is NO SUCH THING


My first thought on reading the title, and confirmed by looking at the article.

Looks like they meant C.


It is odd that they are building it in C when it is already written in C.


The point of the exercise seems to be to learn C by writing C. If it were to understand how Redis works, then you might as well write it in Python and get it done in ~200 (or whatever) LoC instead of ~1200 LoC. You can't learn C by writing some other language.


Is it schizophrenic?

So is it C or C++?


This seems to be a common phrasing left over from maybe two or three decades ago. Whenever I see it, I assume the code is going to be written in terrible, non-idiomatic C++. Whatever it is, saying C/C++ does not inspire confidence that I'm going to want to learn in that style.


I usually argue for using "C/C++" when the intent is to talk about C and C++; after all, we used to have "The C/C++ Users Journal".

However, programming a Redis clone in C/C++ is definitely not something that happens; it is either C or C++, and if it is written in the common subset compiled by compilers of both languages, there are definitely better ways to express that.


It seems to be a nice project, but they should remove "C++" from the name, since the two languages have nothing in common anymore: compare the use of smart pointers and Boost.Asio with the "malloc and socket" style used in the book.


I'm starting to understand that there are clearly two different worlds here, depending on whether you live in one where C++ assumes the use of smart pointers, or especially Boost.

In the C++ world where I live and work, Boost is avoided like the plague. I always wondered how Boost continues to exist, but it seems that in certain communities it's held up as a standard to strive for.


I think in the even more modern world, there is also the use of post-C++17 smart pointers while still avoiding Boost if at all possible. Personally, I have grown a little softer on it now that adding Boost no longer turns an application that took a few minutes to build into one that takes an hour.


More than half a dozen Boost libraries have made their way into the standard.

Boost continues to exist because there is a large overlap with the standards committee, and they use it as a staging ground to try out new ideas before they are standardized.


You can still use raw pointers and sockets in C++, which is what the code examples provided in the book do.


I agree, but no one calls this C++ anymore, especially since C++14/17, and I fail to see why you would use a C++ compiler for such code.


> I agree but no one calls this C++ anymore especially since C++14/17

I do. Who's to say that if you use C++, thou shalt always use smart pointers, move semantics, OOP, templates, and more? Also, you're discounting a very large and very real industry of people writing code for embedded hardware that doesn't necessarily have the luxurious memory resources modern devices have.

Why would I ever want to use C++ if I don't want to use smart pointers and OOP and all that other stuff? Well, generic type safety is very nice. Templates don't exist in C. Constexpr and references are also quite nice; they allow me to annotate my code to make my intentions clearer. Operator overloading is also insanely helpful at times: matrix/vector math is so much nicer to write and read using mathematical operators instead of function calls everywhere. This also isn't possible in C.
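
As a hypothetical illustration of that last point (names invented for the example, nothing from the book):

    #include <cstdio>

    // Tiny 3-component vector with overloaded operators: a hypothetical
    // example of arithmetic that reads like the math it represents, which
    // C would force into nested function calls (add(scale(a, 2), b)).
    struct Vec3 { float x, y, z; };

    constexpr Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
    constexpr Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }

    int main() {
        constexpr Vec3 a{1, 2, 3}, b{4, 5, 6};
        constexpr Vec3 c = 2.0f * a + b;            // evaluated at compile time
        std::printf("%g %g %g\n", c.x, c.y, c.z);   // prints 6 9 12
    }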

Anyways, don't go making claims that if somebody doesn't use a language exactly the same way you do, then they're using it wrong. Especially when it's a language as massively overengineered as C++. There are plenty of reasons to prefer a subset of C++ to C, and plenty of valid reasons to prefer to use C instead of dealing with the complexities of C++.


The article doesn't use any C++ features, though. It's straight C code. Pretty sure that's what they meant.


Well, if you are implementing something like ASIO (and of course C++ is a good language for that), you'll be dealing with sockets directly.


I sort of want, one day, to read the same article targeted at Delphi or Turbo Pascal. lol.


All styles of programming for implementing Redis are supported in Delphi. It'll work just fine. As for Turbo Pascal: replace it with FreePascal/Lazarus, which together comprise an open-source version of Delphi.


I've heard ChatGPT is a great programming language translator.


Lua. In fact, I think I could do this in Turbo.LUA in about a day.


> lol

What’s the joke? I don’t get it.


For giggles I was watching this thread, waiting to see how long it would take for the word "rust" to appear. Didn't take long :^)

Someone on Reddit joked that Rust is becoming the new CrossFit meme:

"How can you tell when someone programs in Rust?" "Don't worry, they'll tell you."


I wrote my own Arch kernel, busybox and drivers in Rust, while doing crossfit but I studied law.


You know what? If I was more confident in my coding abilities, or some seriously good static analysis tool was available for C or C++, I would program in one of those languages.

I'm really not a big fan of Rust, but I love that it's always looking over my shoulder for stupid mistakes.


I love C and Rust; I've been using Rust full time for the past couple of years.

But if I wanted to let loose these days and write some code without worrying about the safety that Rust has to offer, I'd reach for Zig. It seems to solve a lot of C's problems while being a much cleaner, better-thought-through language for low-level code. I'm a fan!


It’s OK. The truth is that even seasoned C/C++ developers can’t write safe C/C++ code. Otherwise exploits wouldn’t exist and static analysis wouldn’t be necessary.

What are you building with Rust?


> The truth is that even seasoned C/C++ developers can’t write safe C/C++ code

I maintain a very large C++ project used to run airlines around the world that hasn't had a bug in production for 5+ years. No crashes. No memory leaks. No production shutdowns. Not even once. So, no, it most definitely is possible to write solid code in C++. However, I do agree that it is hard :) I managed to do it by having 9000+ tests covering all aspects of the code.

Another (more public) example is SQLite. I am sure there are more examples out there. Airbus avionics software is written in C, for example.


Airbus avionics software is written in a C subset, subject to high-integrity computing certification processes and standards.

List of SQLite exploits:

https://www.cvedetails.com/vulnerability-list/vendor_id-9237...


An unfortunate typo, either that or it must be hard to sit at a desk.


I’ve been programming professionally in C since 2005 and still short myself in the foot all the time.

In Rust I've done various side projects: an all-in-one Pi-hole-like DNS server, a distributed key-value store based on Raft, and an OpenWrt module.

But nothing interesting or revolutionary enough to share.

How about you?


> The truth is that even seasoned C/C++ developers can’t write safe C/C++ code.

Except DJB, of course.... ;-)


No, even DJB: https://lwn.net/Articles/820969/

> "This claim is denied. Nobody gives gigabytes of memory to each qmail-smtpd process, so there is no problem with qmail's assumption that allocated array lengths fit comfortably into 32 bits."

Years later it turns out that, well, "nobody" was lots of people. Not DJB, of course; DJB can claim that DJB's software works exactly the way DJB intended, but for everybody else, not so much.


I would really love to take a look at this book, but I'm not really interested in C/C++. Has anyone tried to follow along in another language? How did you find it?


If you're interested in Rust, this might help. It's intended as an introduction to the Tokio async runtime for Rust by having you implement a minimal Redis.

https://tokio.rs/tokio/tutorial/setup


I have done something similar with Ruby; this is a great book: https://workingwithruby.com/downloads/Working%20With%20TCP%2...

I would recommend doing C at some point in your career; nothing comes close in terms of forcing you to understand the hardware you are using.


That was true in the '70s, but C doesn't require that you understand much about modern hardware. https://queue.acm.org/detail.cfm?id=3212479


It is still more low-level than any other language currently in use (that can run on multiple architectures), which I think is what the OP was saying. If you compile C without compiler optimizations, it will generate code that is exactly what you wrote. The argument the paper makes is that the instructions available in modern processors, and the way we use them to optimize code (such as instruction-level parallelism), are a consequence of C being so important despite being made for much older and much simpler machines. Because if the binaries generated by a C compiler without optimizations aren't low-level, then you might as well say the same about x86 assembly, and then you're basically out of options.


But it's not. It has become a high-level language. The idea that C is still a low-level language is no longer true. It is highly abstracted from modern hardware.

Edit: I just saw your edited comment. Yeah, the point is that you really don't need to know much about the hardware to write C. It doesn't force you to understand what's actually going on behind the scenes.


My take is that both C and the hardware are targeting the same abstract machine. It seems, too, that attempts to shift things in one direction or the other haven't been successful. Itanium, which tried to force the compiler to deal with instruction scheduling and parallelism, failed. Things like Lisp machines also didn't do well.

So thinking in terms of the abstract machine is valid for now. The exceptions mostly have to do with caching, and with understanding that the processor can and does execute short sequences of instructions in parallel.


Yeah, sorry about the ninja edit. Very often I reply to a comment without actually finishing and then edit it. I thought Hacker News was delaying my comments by one minute, though, according to my settings? Anyway, I agree that you don't really need a deep understanding of the hardware nowadays to write C, but my point is that it's still useful to learn because you need more understanding of the hardware than you do to write something such as Ruby.

The point the OP was making is that it still forces you to learn more about the hardware than other languages do. Arguably, it also forces you to learn more about the software itself, with things being much more explicit than in a modern, complex language such as Rust. I'm also just not aware of _any_ language that actually maps well to what the CPU is doing nowadays, since CPUs are such complex pieces of silicon.


But does it really force you to learn more about the hardware versus other languages? I am challenging that assumption or belief.

C has memory pointers, but those are pointers in a flat memory space abstracted over the physical memory hierarchy.

So yes, you do need to understand that the hardware has memory addresses and that pointers can reference memory addresses. Aside from that, I don't see much difference from Java or Python in terms of requiring a deeper understanding. Even Python has bitwise operators.


> Aside from that I don't see much difference to Java or Python in terms of requiring a deeper understanding.

The big difference would be that it requires you to understand the difference between stack and heap allocation.
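
A minimal illustration of that distinction, in the common C/C++ subset (a generic sketch, not anything from the article):

    #include <stdlib.h>

    /* Stack vs. heap: a distinction C (and C-style C++) makes you manage
       yourself, rather than hiding it behind a garbage collector. */
    void example(void) {
        int on_stack[64];               /* automatic storage: fixed size, gone
                                           as soon as the function returns    */
        int *on_heap = (int *)malloc(64 * sizeof *on_heap);
                                        /* dynamic storage: size can be chosen
                                           at run time, lives until freed     */
        if (on_heap == NULL)
            return;                     /* heap allocation can fail           */

        on_stack[0] = 1;
        on_heap[0] = 1;

        free(on_heap);                  /* forget this and you leak; free it
                                           twice and you corrupt the heap     */
    }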


This whole article seems to be picking a lot of nits. I didn't read it too deeply, so feel free to correct me if I'm wrong, but the biggest complaints highlighted in the article are:

1. Modern CPUs use instruction level parallelism (ILP)

2. Memory isn't linear (you have separate L1, L2, and L3 caches, plus main memory)

If you've ever debugged a C or C++ program in release mode, you've quickly found out about ILP. The code still maps relatively closely to the hardware; it just won't run in the sequential order you provided it in. Many C and C++ programmers know this and try to make the implicit assumptions in their code explicit, to allow the compiler to reorder instructions more easily and reduce memory dependency chains [0].
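
One common instance of that (a hypothetical sketch, not from the linked post): summing an array with several independent accumulators so the additions don't form a single serial dependency chain the CPU has to wait on.

    #include <cstddef>

    // Summing with one accumulator makes every addition wait for the previous
    // one (a serial dependency chain). Independent accumulators let the CPU
    // keep several additions in flight at once, and this form is also easier
    // for the compiler to vectorize.
    double sum4(const double *v, std::size_t n) {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            s0 += v[i + 0];
            s1 += v[i + 1];
            s2 += v[i + 2];
            s3 += v[i + 3];
        }
        for (; i < n; ++i)              // leftover tail
            s0 += v[i];
        return (s0 + s1) + (s2 + s3);
    }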

And I'm sure you've heard several proponents over the past few years (Mike Acton comes to mind) espousing data-oriented design. This is an entire code methodology intended to help the CPU caches, and it shows an implicit understanding that memory is not linear. Heck, most C/C++ programmers realize that they're using virtual memory all the time, and that the memory locations they get aren't necessarily backed by physical memory until the OS sorts it out. This is especially transparent if you've ever done any sort of memory-mapping with files or played with virtual memory.

Anyway, C doesn't necessarily map directly to the hardware, but it's a heck of a lot easier to intuit what a C program will end up doing on the CPU than what a Python program will do. And most C/C++ programmers realize this and actively write code to exploit the fact that C does not map directly to hardware.

[0]: https://johnnysswlab.com/instruction-level-parallelism-in-pr...


Yeah. Also, learning C is still probably the best way to learn how raw pointers work. And pointers underpin everything, even if you spend your life in Python or Java.

When I was teaching programming, it was always a rite of passage for my students to implement a linked list in C. Once it clicked for them, the world of programming opened up.
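
For the curious, the exercise looks roughly like this (a generic sketch, not any particular course material; it compiles in the common C/C++ subset):

    #include <stdio.h>
    #include <stdlib.h>

    /* The classic exercise: a singly linked list, built and walked by hand. */
    struct node {
        int value;
        struct node *next;
    };

    /* Push a new node onto the front of the list; *head is updated in place. */
    static void push(struct node **head, int value) {
        struct node *n = (struct node *)malloc(sizeof *n);
        if (n == NULL)
            return;
        n->value = value;
        n->next = *head;
        *head = n;
    }

    int main(void) {
        struct node *head = NULL;
        for (int i = 0; i < 5; i++)
            push(&head, i);

        for (struct node *p = head; p != NULL; p = p->next)
            printf("%d ", p->value);    /* prints: 4 3 2 1 0 */
        putchar('\n');

        while (head != NULL) {          /* walk the list again to free it */
            struct node *next = head->next;
            free(head);
            head = next;
        }
        return 0;
    }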

C is also still the universal glue language for FFI. Wanna call Swift from Rust? You can always reach for C.


Likewise, it is possible to write optimally performing Java code if you know what's going on behind the scenes.

https://lmax-exchange.github.io/disruptor/disruptor.html


I'd recommend assembly language to really achieve this. The further away you get from x86, the more enjoyable it seems to be for a human. Skilldrick's easy6502 is a really good start.


C/C++ is two languages: C and C++.


"Build your own Redis with Rust/Fortran"

If the author actually can't understand the difference between C and C++, there's a pretty low chance of any good code being in the book. If they do, they should keep an eye on the editor next time they go into empty marketing mode for the title.


"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

"Please don't pick the most provocative thing in an article or post to complain about in the thread. Find something interesting to respond to instead."

https://news.ycombinator.com/newsguidelines.html


I kid you not, I once had a recruiter asking for someone with experience in C/C+/C++.

Not a typo for C#. He really wanted someone with experience in C+.


I've lost count of the number of recruiters who shorten JavaScript to Java. Those are not the same thing! Arghhhhh


Benefit of the doubt: C with Classes?


Maybe. Who knows what his client asked for. To me, he confirmed they were looking for C+…


> The code is written as direct and straightforwardly as the author could. It’s mostly plain C with minimal C++ features.

Sounds fair to me


This is one of my biggest pet peeves too.



