Build Your Own Redis with C/C++

mzimbres · on March 18, 2023

I haven't created Redis itself but a Redis client library. It all started with a chat server I wanted to write on top of Boost.Beast (and by consequence Boost.Asio) and at the time none of the clients available would fill my needs and wishes

- Be asynchronous and based on Boost.Asio.

- Perform automatic command pipelining [1]. All C++ libraries I looked at the time would open new connections for that, which results in unacceptable performance losses.

- Parse Redis responses directly in their final data structures avoiding extra copies. This is useful for example to store json strings in Redis and read them back efficiently.

With time I built more performance features

- Full duplex communication.

- Support for RESP3 and server pushes on the same connection that is being used for request/response.

- Event demultiplexing: It can server thousands of requests (e.g. websocket sessions) on a single connection to Redis with back pressure. This is important to keep the number of connections to Redis low and avoid latency introduced by countermeasures like [3].

This client was proposed and accepted in Boost (but not yet integrated), interested readers can find links to the review here [2].

[1] https://redis.io/docs/manual/pipelining/

[2] https://github.com/boostorg/redis/issues/53

[3] https://github.com/twitter/twemproxy

elif · on March 19, 2023

I will still trust antirez to write code more than myself. For expected behavior, for performance, for simplicity, I can't expect myself to do any of these things better than him.

In fact, I find that fitting my code into patterns that work with redis actively improve these aspects of my own code.

If I had a monkey wrench into redis (beyond lua), I imagine one or more of these aspects would rapidly deteriorate.

gabrieledarrigo · on March 19, 2023

I did something similar few years ago, while attending the Networking course at the university:

https://github.com/gabrieledarrigo/ducky

I did choose UDP for the sake of simplicity, and I implemented a very naive protocol, using select to handle I/O. I also remember that In tried various benchmarks, but the hash implementation that I used failed after 500k inserts or something. Now, I started to program quite late (after 25 years old), and I mostly do web development, and even if I'm a complete C noob, it was a super funny project! And if someone would like to take a look at the code and share with me an opinion, it would be great!

valbaca · on March 19, 2023

For the full "Build Your Own X" list: https://github.com/codecrafters-io/build-your-own-x

hgs3 · on March 19, 2023

Does the book itself have more prose than what's found on the webpages linked from the table of contents? Those webpages are very sparse on actual explanation; they are mostly code dumps. I would expect a book on a key-value store to be a lot more theory focused. You could write a whole book on hash tables alone, but the hash tables "chapter" is only a few paragraphs of actual explanation [1].

[1] https://build-your-own.org/redis/08_hashtables

sbmthakur · on March 18, 2023

Codecrafters has a similar exercise:

https://app.codecrafters.io/courses/redis?track=c

intelVISA · on March 18, 2023

No C++? Hard pass from me.

c-smile · on March 19, 2023

The book should start from explaining what Redis is.

It is a DB in the first place so "Learn network programming [by writing Redis]" is kind of questionable.

heretoo · on March 19, 2023

Original version was written TCL:

https://gist.github.com/antirez/6ca04dd191bdb82aad9fb241013e...

CoolCold · on March 19, 2023

from the very first chapter with code (chapter 2 if keep numbers):

> There are several Linux system calls we need to learn before we can start socket programming.

Have we reached the state where Linux by default is New Unix/New POSIX ? Isn't it alarming for say FreeBSD guys?

heywhatupboys · on March 18, 2023

> C/C++

there is NO SUCH THING

Mesopropithecus · on March 19, 2023

My fist thought on reading the title and confirmed by looking at the article.

Looks like they meant C.

spullara · on March 19, 2023

It is odd that they are building it in C when it is already written in C.

actually_a_dog · on March 19, 2023

The point of the exercise seems to be to learn C by writing C. If it were to understand how Redis works, then you might as well write it in Python and get it done in ~200 (or whatever) LoC instead of ~1200 LoC. You can't learn C by writing some other language.

pjmlp · on March 19, 2023

It is schizophrenic?

So is it C or C++?

tejohnso · on March 19, 2023

This seems to be a common phrasing left over from maybe two or three decades ago. Whenever I see it I assume it's going to be written in terrible non idiomatic c++. Whatever it is, saying c/c++ doesn't not inspire confidence that I'm going to want to learn in that style.

pjmlp · on March 19, 2023

I usually argue for using C/C++ when the intent is talking about C and C++, after all we used to have "The C/C++ Users Journal".

However programming a Redis clone in C/C++ is definitly not something that happens, it is either C or C++, and if it is in the common subset compiled by compilers of both languages, there are definitly better ways to express it.

JTyQZSnP3cQGa8B · on March 18, 2023

It seems to be a nice project but they should remove "C++" from the name since both languages have nothing in common anymore, especially the use of smart pointers and Boost.Asio compared to "malloc and socket" that is used in the book.

Arelius · on March 18, 2023

I'm starting to understand there are clearly 2 different worlds, if you live in one where C++ assumes the use of smart pointers, or especially Boost.

In the C++ world I live/work Boost is avoided like the plague. I always wondered how Boost continues to exist, but it seems in certain communities it's held up as a standard to strive for.

mianos · on March 18, 2023

I think in the even more modern world, there is also the use of post C++17 smart pointers and still avoiding boost if at all possible. Personally I have grown a little softer on it now that adding boost does not take an application that took a few minutes to build to one that takes an hour.

stonemetal · on March 19, 2023

More than half a dozen Boost libraries have made their way to the standard.

Boost continues to exist because there is large overlap with the standards committee and they use it as a staging ground to try out new ideas before they are standardized.

hewlett · on March 18, 2023

You can still use raw pointers and sockets in C++, which the code examples provided in the book do

JTyQZSnP3cQGa8B · on March 18, 2023

I agree but no one calls this C++ anymore especially since C++14/17, and I fail to see why you would use a C++ compiler for such a code.

_gabe_ · on March 18, 2023

> I agree but no one calls this C++ anymore especially since C++14/17

I do. Who's to say if you use C++, thou shalt always use: smart pointers, move semantics, OOP, templates, and more. Also, you're discounting a very large and very real industry of people writing code for embedded hardware that doesn't necessarily have the luxurious memory resources modern devices have.

Why could I ever possibly want to use C++ if I don't want to use smart pointers and OOP and all that other stuff? Well, generic type safety is very nice. Templates don't exist in C. Constexpr and references are also quite nice. They allow me to annotate my code to make my intentions clearer. Operator overloading is also insanely helpful at times. Matrix/vector math is so much nicer to write/read using mathematical operators instead of function calls everywhere. This also isn't possible in C.

Anyways, don't go making claims that if somebody doesn't use a language exactly the same way you do, then they're using it wrong. Especially when it's a language as massively overengineered as C++. There are plenty of reasons to prefer a subset of C++ to C, and plenty of valid reasons to prefer to use C instead of dealing with the complexities of C++.

batty_alex · on March 19, 2023

The article doesn’t use any C++ features, though. It’s straight C code. Pretty sure that’s what they meant

gpderetta · on March 18, 2023

Well, if you are implementing something like ASIO, and of course c++ is a good language for that, you'll be dealing with sockets directly.

tylerhannan · on March 18, 2023

I sort of want, one day, to read the same article targeted at Delphi or Turbo Pascal. lol.

FpUser · on March 18, 2023

All styles of programming for implementing Redis are supported in Delphi. It'll work just fine. As for Turbo Pascal - replace it with FreePascal/Lazarus which together comprise opensource version of Delphi.

password4321 · on March 19, 2023

I've heard ChatGPT is a great programming language translator.

boffinAudio · on March 20, 2023

Lua. In fact, I think I could do this in Turbo.LUA in about a day.

andsoitis · on March 18, 2023

> lol

What’s the joke? I don’t get it.

TheChaplain · on March 18, 2023

For giggles I was watching this thread waiting to see how long until the word "rust" appeared. Didn't take long :^)

Someone on reddit joked that Rust is becoming the new Crossfit meme;

"How can you tell when someone programs in Rust?" "Don't worry, they'll tell you."

pfoof · on March 19, 2023

I wrote my own Arch kernel, busybox and drivers in Rust, while doing crossfit but I studied law.

noncoml · on March 18, 2023

You know what? If I was more confident in my coding abilities, or some seriously good static analysis tool was available for C or C++, I would program in one of those languages.

I’m really not big fan of Rust, but I love that’s it’s always looking over my shoulder for stupid mistakes.

josephg · on March 18, 2023

I love C and Rust - I’ve been using rust full time for the past couple of years.

But if I wanted to let loose these days and write some code without worrying about the safety that rust has to offer, I’d reach for Zig. It seems to solve a lot of C’s problems while being a much cleaner, well thought through language for low level code. Im a fan!

throwaway5959 · on March 18, 2023

It’s OK. The truth is that even seasoned C/C++ developers can’t write safe C/C++ code. Otherwise exploits wouldn’t exist and static analysis wouldn’t be necessary.

What are you building with Rust?

deterministic · on March 19, 2023

> The truth is that even seasoned C/C++ developers can’t write safe C/C++ code

I maintain a very large C++ project used to run airlines around the world that hasn’t had a bug in production for 5+ years. No crashes. No memory leaks. No production shutdown. Not even once. So no it most definitely is possible to write solid code in C++. However I do agree that it is hard :) I managed to do it by having 9000+ tests testing all aspects of the code.

Another (more public) example is SQLite. I am sure there are more examples out there. Airbus avionics software is written in C for example.

pjmlp · on March 19, 2023

Airbus avionics software is written in a C subset, subject to high integrity computing certification processes, and standards.

List of SQLLite exploits,

https://www.cvedetails.com/vulnerability-list/vendor_id-9237...

dougSF70 · on March 19, 2023

An unfortunate typo, either that or it must be hard to sit at a desk.

noncoml · on March 18, 2023

I’ve been programming professionally in C since 2005 and still short myself in the foot all the time.

I’m Rust I’m found various side projects. An all in one PiHole like dns server, a distributed key value store based on raft and an openwrt module.

But nothing interesting or revolutionary enough to share.

How about you?

1024core · on March 18, 2023

> The truth is that even seasoned C/C++ developers can’t write safe C/C++ code.

Except DJB, of course.... ;-)

tialaramex · on March 18, 2023

No, even DJB: https://lwn.net/Articles/820969/

> "This claim is denied. Nobody gives gigabytes of memory to each qmail-smtpd process, so there is no problem with qmail's assumption that allocated array lengths fit comfortably into 32 bits."

Years later it turns out that well, "nobody" was lots of people. Not DJB of course, DJB can claim that DJB's software works exactly the way DJB intended, but for everybody else not so much.

mattrighetti · on March 18, 2023

I would really love to take a look at this book but I'm not really interested in C/C++, has anyone tried to follow along but with another language? How did you find it?

iudqnolq · on March 18, 2023

if you're interested in Rust this might help. It's intended to be an introduction to the tokio async runtime for Rust by having you implement a minimal redis.

https://tokio.rs/tokio/tutorial/setup

jvans · on March 18, 2023

I have done something similar with ruby, this is a great book: https://workingwithruby.com/downloads/Working%20With%20TCP%2...

I would recommend doing C at some point in your career, nothing comes close in terms of forcing you to understand the hardware you are using.

Mike_12345 · on March 18, 2023

That was true in the 70's but C doesn't require that you understand much about modern hardware. https://queue.acm.org/detail.cfm?id=3212479

doodlesdev · on March 18, 2023

It still is more low level than any other language currently (that can be run on multiple architectures) which I think is what the OP was saying. If you compile C without compiler optimizations, it will generate code that is exactly what you wrote. The argument the paper makes is that the instructions available in modern processors and the way we use them to optimize code (such as instruction level parallelism) is a consequence of C being so important despite being made for much older and much simpler machines. Because if the generated binaries from any C compiler without optimizations isn't low level then you might as well say the same about x86 assembly, and then you're basically out of options.

Mike_12345 · on March 18, 2023

But it's not. It has become a high level language. The idea that C is still a low level language is no longer true. It is highly abstracted from modern hardware.

Edit: I just saw your edited comment. Yeah the point is you really don't need to know much about the hardware to write C. It doesn't force you to understand what's actually going on behind the scenes.

Gibbon1 · on March 18, 2023

My take is both C and the hardware are targeting the same abstract machine. It seems too that attempts shift things one direction or another haven't been successful. Itainium which tries to force the compiler to deal with instruction scheduling and parallelism, failed. Things like LISP machines also didn't do well.

So thinking in terms of the abstract machine are valid for now. The exceptions mostly have to do with caching and understanding that processor can and does execute short sequences of instructions in parallel.

doodlesdev · on March 18, 2023

Yeah sorry about the ninja edit. Very often I reply to a comment without actually finishing and then edit it. I thought Hacker News was delaying my comments by 1 minute though according to my configs? Anyways, I agree that you don't really need a deep understanding of the hardware nowadays to write C, but my point is that it's still useful to learn because you still need more understanding of the hardware than you need to write something such as Ruby.

The point the OP was making is that it still forces you to learn more about the hardware than other languages. Arguably, it also forces you to learn more about the software itself with things being much more explicit than in a modern complex language such as Rust. I'm also just not aware of _any_ language that actually maps well to what the CPU is doing nowadays since they are such complex pieces of silicon.

Mike_12345 · on March 18, 2023

But does it really force you to learn more about the hardware versus other languages? I am challenging that assumption or belief.

C has memory pointers, but those are pointers in a flat memory space abstracted over the physical memory hierarchy.

So yes, you do need to understand that the hardware has memory addresses and that pointers can reference memory addresses. Aside from that I don't see much difference to Java or Python in terms of requiring a deeper understanding. Even Python has bitwise operators.

Kamq · on March 18, 2023

> Aside from that I don't see much difference to Java or Python in terms of requiring a deeper understanding.

The big difference would be that it requires you to understand the difference between stack vs heap allocation.

_gabe_ · on March 18, 2023

This whole article seems to be picking a lot of nits. I didn't read it too deeply, so feel free to correct me if I'm wrong, but the biggest complaints highlighted in the article are:

1. Modern CPUs use instruction level parallelism (ILP)

2. Memory isn't linear (you have separate caches, L1, L2, L3 and main memory)

If you've ever debugged a C or C++ program in release, you've quickly found out about ILP. The code still maps relatively close to the hardware, it just won't run in the sequential order you've provided it in. Many C and C++ programmers know this and try to make the implicit assumptions in their code explicit to allow the compiler to reorder instructions more easily/reduce memory dependency chains[0].

And I'm sure you've heard several proponents over the past few years (Mike Acton comes to mind) espousing data oriented design. This is an entire code methodology intended to help the CPU caches, and it shows an implicit understanding that memory is not linear. Heck, most C/C++ programmers realize that they're using VRAM all the time, and the memory locations they get aren't necessarily backed by physical memory until the OS sorts it out. This is especially transparent if you've ever done any sort of memmapping with files or played with virtual memory.

Anyways, C doesn't necessarily map directly to the hardware, but it's a heck of a lot easier to intuit what a C program will end up doing on the CPU vs what a Python program will do. And most C/C++ programmers realize this fact, and actively write code to utilize the fact that C does not map directly to hardware.

[0]: https://johnnysswlab.com/instruction-level-parallelism-in-pr...

josephg · on March 18, 2023

Yeah. Also, learning C is still probably the best way to learn how raw pointers work. And pointers underpin everything - even if you spend your life in Python or Java.

When I was teaching programming, it was always a rite of passage for my students to implement a linked list in C. Once it clicked for them, the world of programming opened up.

C is also still the universal glue language for FFI. Wanna call Swift from Rust? You can always reach for C.

Mike_12345 · on March 20, 2023

Likewise it is possible to write optimally performing Java code if you know how what what's going on behind the scenes.

https://lmax-exchange.github.io/disruptor/disruptor.html

sitzkrieg · on March 18, 2023

id recommend assembly language to really achieve this. the further away you get from x86 the more enjoyable it seems to be fore a human. skilldricks easy6502 is a really good start

pjmlp · on March 19, 2023

C/C++ is two languages, C and C++.

user5534762135 · on March 18, 2023

"Build your own Redis with Rust/Fortran"

If the author actually can't understand the difference between C and C++, there's a pretty low chance of any good code being in the book. If they do, they should keep an eye on the editor next time they go into empty marketing mode for the title.

dang · on March 18, 2023

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

"Please don't pick the most provocative thing in an article or post to complain about in the thread. Find something interesting to respond to instead."

https://news.ycombinator.com/newsguidelines.html

noncoml · on March 18, 2023

I kid you not I once had a recruiter asking someone with experience in C/C+/C++

Not a typo for C#. He really wanted someone with experience in C+

josephg · on March 18, 2023

I’ve lost count of the number of recruiters who shorten Javascript to Java. Those are not the same thing! Arghhhhh

heywhatupboys · on March 18, 2023

benefit of the doubt: C with classes?

noncoml · on March 18, 2023

Maybe. Who knows what his client asked. To me he confirmed they were looking for C+…

matttb · on March 18, 2023

> The code is written as direct and straightforwardly as the author could. It’s mostly plain C with minimal C++ features.

Sounds fair to me

sitzkrieg · on March 18, 2023

this is also one of my biggest pet peeves too