Show HN: Mako – a full Bitcoin implementation in C

zackham · on Nov 15, 2021

This looks like a herculean effort over just the past few months. Congratulations on hitting this milestone and hopefully you can take a breather and tie up loose ends at a more comfortable pace. As much as HN has changed over the years I still think this is a place where there are a lot of people who can appreciate how you feel right now. Nicely done!

chjj · on Nov 15, 2021

Thanks. This was definitely the hardest project I've worked on in my life. I fear I may have shaved a few years off my life with all of the all-nighters I pulled with this, so it's good to hear recognition for the work. :)

spuz · on Nov 15, 2021

Very impressive work. Was there a deadline that you were trying to meet? What was the reason for the all-nighters?

chjj · on Nov 15, 2021

No deadline. I just have a tendency towards obsession when I'm working on a project I'm very interested in. I tend to get into a kind of manic phase where I can't sleep even if I wanted to. There's no real telling when that phase will end. So I end up coding for an unhealthy amount of time. I don't recommend it to anyone.

HappySweeney · on Nov 16, 2021

I have the same thing, self-diagnosed as monomania. I had to train myself to stop thinking about my obsession when not at it, and the condition eventually subsided enough that it doesn't completely screw up my life anymore.

nb: looking it up on wikipedia, it seems that monomania was a 19th century psychiatric diagnosis, and is no longer considered a real condition.

johnnyApplePRNG · on Nov 15, 2021

When you really love working on something, sometimes you just can't sleep.

It's the most productive you can ever be imho.

bertr4nd · on Nov 16, 2021

Truly an amazing output! Did you really do all this in 2.5 months? Seems I need to dramatically up my coding game!

b20000 · on Nov 16, 2021

congrats on your hard work. why did you choose C89 instead of writing this in C++11 or 17? just curious, no criticism.

debo_ · on Nov 15, 2021

Agreed! I'm confused by the number of comments that suggest this was a waste of time because other potentially similar implementations exist. There could be a hundred of these and I'd still be interested in looking at them.

0des · on Nov 15, 2021

Disregard those comments. There will always be critics wondering why you don't just use floppies taped to a carrier pigeon instead of email. We are nerds.

anonporridge · on Nov 16, 2021

When I was young, I was taught that Benjamin Franklin taught himself to write well by reading a work, and then trying to rewrite it from memory.

I imagine a similar principle is in play here.

There's value in rewriting what already exists and works, even if it will only ever be useful to yourself.

cantrevealname · on Nov 16, 2021

Call me old school, but I like looking at the number of lines of code to get a feel for how big of a project something is. Mako[1] is 265,618 lines of code. The most widely accepted Bitcoin implementation[2] is 639,074 lines of code -- that's 2.5x bigger and written in a slew of languages. Mako looks like a super-impressive amount of work (and by a single person no less).

[1] https://github.com/chjj/mako

[2] https://github.com/bitcoin/bitcoin

[3] How I calculated: find <source-directory> -type f | sed 's/.*/"&"/' | xargs wc -l

EDIT: Also of interest is this "from-scratch tour of Bitcoin in Python" that was discussed here on HN a few months ago: https://news.ycombinator.com/item?id=27593772

bluejekyll · on Nov 16, 2021

Tokei is my go to for counting lines of code in a project: https://github.com/XAMPPRocky/tokei

Jach · on Nov 16, 2021

Huh, I feel somewhat old: https://github.com/AlDanial/cloc

slmjkdbtl · on Nov 16, 2021

I usually count line by line by hand; you get a good count and learn the source along the way.

rweichler · on Nov 16, 2021

Thx for [3]. I usually just do

    wc -l *.c */*.c */*/*.c */*/*/*.c */*/*/*/*.c */*/*/*/*/*.c */*/*/*/*/*/*.c

kazinator · on Nov 16, 2021

  wc $(git ls-files '*.c')

This avoids counting things like generated code that is not checked in (lex.yy.c, y.tab.c or what have you), and any local test code or other junk you have lying around in the tree.

thangalin · on Nov 16, 2021

    wc -l $(find . -type f -name "*.c")

parhamn · on Nov 16, 2021

In bash/zsh and many other shells '*/*.c' would be similar.

kazinator · on Nov 16, 2021

I don't think so. In Bash you can do this:

  bash$ shopt -s globstar

Then you can use double star syntax like this:

  bash$ wc **/*.c

The **/ will match any number of directory components, including zero; I think it's like *.c */*.c */*/*.c and so on, like what OP wrote.

Somewhat sadly, the Glibc glob function does not have an equivalent GLOB_ option for this; it's just in Bash.

rweichler · on Nov 16, 2021

    -bash: shopt: globstar: invalid shell option name

kazinator · on Nov 17, 2021

Bash's NEWS file says that globstar was added between bash-3.2 and bash-4.0.

In the git repo, there is a 2009-dated commit 3185942a5234e26ab13fa02f9c51d340cec514f8 where the material appears, as a snapshot import.

Make sure you're using "shopt -s globstar" and not "shopt -o globstar".

marto1 · on Nov 16, 2021

this looks hilarious and I guess it works. nice one!

latchkey · on Nov 16, 2021

https://github.com/boyter/scc

b20000 · on Nov 16, 2021

last week i got attacked here on HN for suggesting it is not bad to stick to one language and get BETTER over time rather than screwing around with new languages and fads every year. i think this sort of project illustrates what i meant.

oars · on Nov 16, 2021

This is a great discussion of how to count lines of code. Thanks.

sizzzzlerz · on Nov 15, 2021

Cloned and built on my Mac without any issues. Everything compiled and linked without warnings. A huge plus in my opinion. Other than stating that it builds two executables, there doesn't seem to be any documentation on using it. Did I miss it?

chjj · on Nov 15, 2021

Still experimental and documentation is lacking, but right now the CLI behavior is almost identical to `bitcoind` and `bitcoin-cli`.

So, for example, `$ makod -datadir=foobar -chain=testnet` will create a data directory in `./foobar` and start syncing testnet.

`$ mako getblock 100 2 -chain=testnet` will return block #100 to you serialized as json.

In other words, it's something akin to this:

https://man.archlinux.org/man/bitcoind.1

https://man.archlinux.org/man/community/bitcoin-cli/bitcoin-...

Please let me know how things go on Apple. I have yet to test it on a Mac. It's possible Mac has some issues since the event loop backend is using poll(2). Apple has been known to break poll every now and then.

inter_netuser · on Nov 16, 2021

How did you write this so fast?

did you leverage anything from core or other impls at all, or entirely from scratch?

tmaly · on Nov 15, 2021

This looks super interesting.

One thing that kinda bothers me is that most of the tests are just

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include "lib/tests.h"

    int main(void) { return 0; }

chjj · on Nov 15, 2021

Yeah, I'm embarrassed about that. It's way behind on tests.

The past 2 months of limited sleep were a massive scramble just trying to get everything implemented and trying to get the damn thing to sync properly. Things seem to work in practice, but it's very scary not having high test coverage, especially with a project like this. Now that everything has solidified and my sleep schedule has normalized a bit, it's my intention to take the time to write proper tests.

SavantIdiot · on Nov 15, 2021

Cool to study. Disappointed there are zero comments and the tersest possible variable names. Almost like it was js-minified.

chjj · on Nov 15, 2021

Some of the terse variable names are the result of my adherence to a GMP-like naming convention, which I find easy to read and aesthetically pleasing.

The GMP naming convention is something like:

- Pointer/Data - single letter followed by a "p"

- Size/Length - single letter followed by an "n"

So a function declaration might look like:

    static void
    process_bytes(uint8_t *zp, const uint8_t *xp, size_t xn);

The above function would do some processing on `xn` bytes at `xp` and store the result at `zp` (assuming `xn` bytes are also allocated there).

A function which accepts more inputs might have `yp` and `yn` also, so conceptually: `zp = func(xp, xn, yp, yn);` or to simplify: `z = func(x, y)`.
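As a rough illustration of the two-input shape described above, here is a minimal sketch of a GMP-style limb addition. The function name `limbs_add`, the 32-bit limb size, and the requirement that `xn >= yn` are assumptions made for this example; it is not code from mako or GMP.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example, not taken from mako or GMP.
 *
 * Adds the yn-limb number at yp to the xn-limb number at xp
 * (least significant limb first, xn >= yn) and stores xn limbs
 * at zp. Returns the carry out of the top limb (0 or 1).
 */
static uint32_t
limbs_add(uint32_t *zp, const uint32_t *xp, size_t xn,
                        const uint32_t *yp, size_t yn) {
  uint64_t c = 0; /* running sum; the carry lives in the high bits */
  size_t i;

  /* Limbs present in both inputs. */
  for (i = 0; i < yn; i++) {
    c += (uint64_t)xp[i] + yp[i];
    zp[i] = (uint32_t)c;
    c >>= 32;
  }

  /* Remaining limbs of x: only the carry is propagated. */
  for (; i < xn; i++) {
    c += xp[i];
    zp[i] = (uint32_t)c;
    c >>= 32;
  }

  return (uint32_t)c;
}
```

Conceptually this is still `z = x + y`; the `p` and `n` suffixes only distinguish each operand's pointer from its length.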

sanderjd · on Nov 15, 2021

This is a bad convention. Instead of `x` and `z`, you should describe what those pointers are meant to represent. I get that everything is subjective, but some things are actually just bad due to illegibility, and I think it is worth being frank about this.

basedbase · on Nov 16, 2021

Quality C code is descriptive in the function name and simply organized, the functions are usually doing one or two things and fairly obvious without many values being passed; you're going to glean a lot more from the function name than the variables. The measure for good C code is extremely different than higher level languages you may be more used to writing.

sanderjd · on Nov 16, 2021

I believe that this is a cop out, an excuse for a culture of poor conventions. In all languages, good functions should be named well, simply organized, and doing only one or two things, and without many arguments passed in. Also in all languages, parameters and variables should be named expressively. There's no reason C should be exempt from this.

defgeneric · on Nov 16, 2021

Not really. It's because the interfaces in C tend to follow a common pattern that verbose variable names can get in the way. If we have

  int alter_struct(some_struct_t *s, void *xp, size_t xn);

then it's pretty obvious that the data pointer xp with size xn is going to modify whatever struct object we have at pointer s.

The variable names can be shortened because the function interface follows a common convention.

The really important thing here is a good function name, not a good name for the data pointer argument (a descriptive name could even incorrectly suggest it has a specific type rather than simply point to bytes).

sanderjd · on Nov 16, 2021

"The data pointer is going to modify the struct" is not the only information that is useful.

Again, it is true in every language that there are some basic patterns to what interfaces look like. That doesn't tell you anything about what the application-specific logic implementing those interfaces is intended to do. For that, it is useful to name things such that they describe what they represent in the domain of the application or library.

If it is a generic function, there are good names for that as well (for instance, some kind of copy function would still have meaningful names like "from" and "to").

But the example given by the author here was not that, it specifically acted on transactions. In that case, the input bytes were meant to represent a "raw transaction" and the output struct is meant to represent a "transaction".

b20000 · on Nov 16, 2021

if someone needs to review this code and only has a few days to do so, it will be impossible without comments.

chjj · on Nov 15, 2021

I disagree. The function name gives context as to what they're meant to represent if you understand the convention. One of the conventions in mako is something like:

    int btc_tx_import(btc_tx_t *z, const uint8_t *xp, size_t xn);

This function deserializes a raw transaction of `xn` bytes at `xp` and stores the result in the transaction `z`. Zero is returned on failure.
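To make the convention concrete, here is a minimal sketch of an import function with the same shape: output first, then the input pointer and length, with zero returned on failure. The type `demo_rec_t`, the helper `read32le`, and the 5-byte wire layout are invented for this example and are not part of mako.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type, standing in for something like btc_tx_t. */
typedef struct demo_rec_s {
  uint32_t version;
  uint8_t flag;
} demo_rec_t;

/* Reads a little-endian 32-bit integer from the four bytes at xp. */
static uint32_t
read32le(const uint8_t *xp) {
  return ((uint32_t)xp[0] <<  0)
       | ((uint32_t)xp[1] <<  8)
       | ((uint32_t)xp[2] << 16)
       | ((uint32_t)xp[3] << 24);
}

/*
 * Deserializes the xn bytes at xp into z.
 * Returns 1 on success and 0 on failure.
 *
 * The layout (4-byte version, 1-byte flag) is made up for this sketch.
 */
static int
demo_rec_import(demo_rec_t *z, const uint8_t *xp, size_t xn) {
  if (xn != 5)
    return 0; /* wrong length: z is left untouched */

  z->version = read32le(xp);
  z->flag = xp[4];

  return 1;
}
```

A caller would then write `if (!demo_rec_import(&rec, data, len)) return 0;`, which is the pattern that shows up elsewhere in this thread.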

What would be the alternative here? I suppose I could rename `xp` to `data`, `transaction_data`, `raw_tx_data`, or something like that? I don't think it adds any value and it just takes up extra space, making the code less readable.

sanderjd · on Nov 16, 2021

Your English-language description of it gives some good clues to the alternative:

  int btc_tx_import(btc_tx_t *transaction, const uint8_t *raw_transaction, size_t raw_transaction_size);

Or since clearly `tx` is already a convention for "transaction", it could be `tx`, `raw_tx`, and `raw_tx_size`. And sure, I have no problem with the `p` and `n` stuff, so it could be `txp`, `raw_txp`, `raw_txn`.

But from your description, the input is a "raw transaction" and the output is a "transaction". Using `x` to mean "raw transaction" and `z` to mean "transaction" is obtuse. You know that the input is a "raw transaction" and the output is a "transaction", but I as a fresh reader, don't, and your code does not help me understand.

chjj · on Nov 16, 2021

You make a good point. I will consider changing the names for the import/export functions (but maybe not the MPI code). That or explain the inputs/outputs in detail in docs.

nothrowaways · on Nov 16, 2021

Pythonic

sanderjd · on Nov 16, 2021

Self documenting code is good in every language.

freemint · on Nov 16, 2021

Self documenting code is not good in BrainFuck.

sanderjd · on Nov 16, 2021

It would be, if it were possible. That it isn't is why that is a bad language. There are others in that same boat.

b20000 · on Nov 16, 2021

agree with raw_tx etc.

defgeneric · on Nov 16, 2021

> I don't think it adds any value and it just takes up extra space, making the code less readable.

I agree here. What they want is to be able to get a superficial understanding at a glance of what the code is doing--in other words they want to give the absolute minimum effort in terms of reading. But when you actually read/write the code and understand it, the shorter names are an advantage. IMO it comes down to who the names are really important for--the reader/reviewer who will likely move on to something else in the next hour, or the person who has actually given some attention to the meaning of the code?

I think the short names also have the advantage of making the logic of the function body understandable at a glance.

optymizer · on Nov 16, 2021

Which code is clearer?

   void f(int c) {
       l.c = c;
       ...
   }

or

   void setColor(int color) {
       label.color = color;
       ...
   }

Names are more important than matching data types and sizes because they convey intent and meaning better than types.

matvore · on Nov 16, 2021

I just did this grep for single-character function names on the Mako codebase:

    grep -r '\<[a-z](' .

And there were no matches. There were some one-character macros but they were macros repeated dozens of times in a localized area.

It seems like what's being proposed is descriptive function names with simple variable names. If the function bodies are short enough (e.g. fit on a single screen) then this seems like a good trade-off to me. IOW, the variable names are symbolic but the function names are descriptive.

The short variable names should be clear enough if you understand the purpose of the function.

smt88 · on Nov 16, 2021

> The short variable names should be clear enough if you understand the purpose of the function.

I shouldn't have to read the function implementation to understand its purpose. Code is buggy! If the function has no name or comment explaining what it's supposed to do, I only have the (often buggy) implementation to go by.

There is no reason to use terse, non-descriptive names in 2021. It's an awful practice that guarantees easy-to-avoid bugs.

matvore · on Nov 16, 2021

I am saying you indeed should have descriptive function names.

I also agree that if the function's name leaves something to be desired then it should be commented.

You are conflating function names--global and relatively non-contextual--with variable names--which have limited scope and rely on the function name for their meaning.

In the setColor example, I would use setColor for the function name and c for the parameter name (with the caveat that C language doesn't have method names, so my reasoning about context has limited applicability to non-C languages)

jrop · on Nov 16, 2021

"Clean code reads like prose" (Uncle Bob C. Martin)

This is quickly becoming my go-to standard for measuring how clean my code is, and in my case this means ultra-descriptive variable names. I usually code in two passes: first rough things out using single-character/short names, and then go back and use LSP features to rename the variables using the language-aware tools in any modern code editor.

gregschlom · on Nov 15, 2021

This is silly. The variable types are already giving you exactly this information. Why would you add a "p" when you already have the "*"? Likewise, size_t tells you that this is a size, no need for the "n".

Instead, the variable names should be used to convey information that the types alone can't convey.

chjj · on Nov 16, 2021

> Likewise, size_t tells you that this is a size, no need for the "n".

How do you differentiate the two input lengths? If I were to rename `xp` to `x` and `xn` to `n`, what should `yn` be renamed to? At the very least, there's going to need to be a `yn` somewhere.

It's very common for code to include the type when there are two inputs to a function (even when written more verbosely): e.g. `thing_len`, and `other_thing_len`.

The `p`-suffix convention can also save you in a situation like this:

    int x = 1;
    int *xp = &x;
    int y = 1;
    int *yp = &y;

It avoids naming collisions, and further down in the function, you'll be able to differentiate the pointer and the value. I find it very useful.

If you write multi-precision integer code in C[1] without this convention, you will end up with an unreadable mess. I certainly wish Torbjörn Granlund were here to testify to this.

[1] https://github.com/chjj/mako/blob/master/src/mpi.c

ZephyrBlu · on Nov 16, 2021

I don't understand why you chose to prefix your variables with x and y rather than something more descriptive. It seems like x and y are completely arbitrary, which is confusing.

ancode · on Nov 16, 2021

It's called C

X6S1x6Okd1st · on Nov 16, 2021

> Contrary to what some people might tell you, multiple implementations of a protocol are a good thing. In bitcoin's case, they are necessary to mitigate the harm of developer centralization.

Important point that I didn't really get until I read more on this topic coming out of Ethereum community

jazzyjackson · on Nov 15, 2021

I’m loving the well-commented, standalone implementations of crypto primitives; will definitely be studying these.

One question, in src/crypto/rand.c:25 you have a standalone RNG but comments say it is not used internally? what is it used for?

chjj · on Nov 15, 2021

Most of the crypto is from my more general crypto library libtorsion: https://github.com/bcoin-org/libtorsion

I originally wanted to vendor my libtorsion code and link to it, but it felt clunky since libtorsion pulls in a ton of crypto that bitcoin doesn't need. Also, since I was focusing on just a few algorithms, it gave me the opportunity to optimize a lot of them (in particular, the ECC backend was optimized for secp256k1 whereas in libtorsion it supports all kinds of curves).

Because of all of this, there are probably some leftover comments. That comment isn't true anymore. rand.c is definitely used internally for libmako, just not libtorsion.

edit: fixed link.

37ef_ced3 · on Nov 16, 2021

I've been writing C for two decades.

But in recent years I have used Go as a replacement for C whenever possible. I think of Go as an improved C, with slightly different performance characteristics.

In your opinion, what is the advantage of writing a program like Mako in C rather than Go?

thinkmassive · on Nov 16, 2021

btcd is an existing implementation in go https://github.com/btcsuite/btcd

cfferry · on Nov 15, 2021

This looks cool! Great job, congrats.

What are you using as data storage? I understand btcd uses leveldb, are you using something similar?

chjj · on Nov 15, 2021

I considered using leveldb initially, but that would require a C++ compiler and it also requires linking to libstdc++. So at that point, you might as well just write the project in C++, which defeats the point of this project.

Mako uses LMDB. Aside from Berkeley DB, it's pretty much the only key-value store in town if you want a pure C project.

In the end, it worked out well because I really like LMDB. It has a very intuitive API, and it's very small, which sort of matches the spirit of this project.

The zero-copy on reads feature is also amazing. When reading a UTXO from the database, we do no copying and no allocation whatsoever. The UTXO is simply parsed from the pointer LMDB returns. This is something LevelDB cannot do at all.

avar · on Nov 16, 2021

There's always SQLite, which can also be used as a key-value store.

VHRanger · on Nov 16, 2021

And is similarly pure C, self-contained

cfferry · on Nov 16, 2021

I am really impressed with your work! If you ever need or look for C++ collabs, let me know!

Also, consider opening a discord server to talk more about this project!

andrewqu · on Nov 16, 2021

Great work! Curious what you used as a reference when writing?

chjj · on Nov 16, 2021

My first bitcoin reimplementation was written in node.js and called bcoin[1]. So this is my second time reimplementing the bitcoin protocol, albeit in a very different language.

Bcoin was frequently used as a reference along with bitcoin core v0.8.0-v0.11.0 when I felt like double checking consensus functions (among other things).

As an aside, I personally think bitcoin core v0.8.0 is the best version of core if you want to learn bitcoin from it. It's a lot more straightforward than later versions. I personally don't enjoy reading any version beyond v0.11.0.

This is also the reason mako doesn't support taproot yet. That code is very new and isn't present in upstream bcoin. I could try to implement it from the BIPs alone, but I won't know what intricacies are present in the actual bitcoin core code until I actually read it.

[1] https://github.com/bcoin-org/bcoin

AlexanderTheGr8 · on Nov 16, 2021

When you say "reading bitcoin core v0.8.0", do you mean reading the source code? I want to learn more about the fundamentals (technical part) of bitcoin, but am not sure if diving into the 600k+ lines of src code is worth it. Do you know any way to understand bitcoin (or cryptocurrencies in general)?

timobile · on Nov 16, 2021

I highly recommend the books "Mastering Bitcoin" by Andreas Antonopoulos (technical, for beginners) and "Programming Bitcoin" by Jimmy Song (very technical, advanced).

yardstick · on Nov 15, 2021

Anything interesting on your choice of project name?

chjj · on Nov 15, 2021

Its placeholder name was originally "libsatoshi", but I was worried that would cause confusion between this project and bitcoin core.

I spent the past few days thinking of names. Mako came to mind because I had recently been playing through the original FF7 (maybe the first time in ~10 years). It's short, memorable, looks cool. It checked all the boxes. On top of that, pretty much every name involving the words "btc", "coin", etc. is already taken when it comes to bitcoin projects.

That said, apparently there is a wayland notifier also named "mako", so I may have to think up a different name here. =/

debo_ · on Nov 15, 2021

Mako was the thing that Shinra almost destroyed the planet trying to extract, so you may end up opening the project up to a lot of Barret+Bitcoin memes. That could be a win!

chjj · on Nov 15, 2021

Not to get in a pedantic FF7 debate, but I don't believe Mako was destroying the planet: Shinra's Mako reactors were destroying the planet. Mako energy was a tool, and it could be used for good or evil, just like cryptocurrency. In the case of bitcoin, I believe it is being used for good.

debo_ · on Nov 15, 2021

I don't mind pedantic FF7 debates! I think Mako was Shinra's branding on the energy they obtained by mining the Lifestream via the Mako reactors; I don't think it was some natural force.

Edit: Hmmm.

"Mako (Japanese: 魔晄, Makō; literally meaning "magic light") is a liquid substance featured throughout Final Fantasy VII. It is the condensed form of lifestream and the primary source of energy used by human beings throughout the world. It is comparable to nuclear energy. Lifestream can condense into Mako via both natural and artificial processes. The terms "Mako" and "Lifestream" are often used interchangeably because one is a derivative of the other."

From here: https://finalfantasywiki.com/wiki/Mako

chjj · on Nov 15, 2021

Hmm, not sure. I'd have to read up on the lore more, or maybe pay more attention to the game.

The thing that catches my eye there is that they say lifestream can condense into mako through natural processes as well as artificial ones.

janzer · on Nov 15, 2021

Mako is also a fairly popular python template library.

https://www.makotemplates.org/

atarian · on Nov 15, 2021

can something like this actually participate in the real bitcoin network or does everyone have to run the same client?

rasengan · on Nov 15, 2021

Bitcoin is a protocol much in the same way HTTP is - so just like there are multiple implementations of web browsers, there are also multiple bitcoin implementations. This is the first in C.

jlrubin · on Nov 15, 2021

nicely done jj!

ejanus · on Nov 16, 2021

Great job. I would take a look.

vegai_ · on Nov 16, 2021

Oh jeez. What could go wrong?

gvv · on Nov 16, 2021

yes but when moon sir?

noxer · on Nov 15, 2021

[flagged]

isitdopamine · on Nov 15, 2021

Hacker News values intellectual curiosity, so why dismiss something so interesting as a waste of time?

Moreover, the code is beautifully written, the project is useful and the rules of HN ask you not to dismiss projects the way you just did.

noxer · on Nov 16, 2021

I said it's "incredibly impressive" in case you missed that. Other than that, my post represents only my opinion, and my opinion is that the result of this project is not worth its time.

Feel free to have another opinion, but don't tell me I dismissed something when I did not. You did that with my opinion for no reason. Have your own opinion, express your disagreement, do whatever, but don't dismiss mine and come around with some BS rules I didn't break.

isitdopamine · on Nov 16, 2021

You have the right to your opinion.

You do not have the right to express it here, since it goes against the rules of this website.

I suggest you read them very carefully!

noxer · on Nov 17, 2021

If you think so, report it to the HN authority and they'll remove it. Good luck.

isitdopamine · on Nov 17, 2021

To be honest I don't care that much. I would find it much more useful if you understood the rules of a community you are taking part in.

beebmam · on Nov 15, 2021

Some of this C code looks extremely obfuscated. Not a fan.

dang · on Nov 15, 2021

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

Would you mind reviewing the Show HN rules too? Your comment broke them as well, and we're especially trying to avoid the culture of shallow putdowns when people are sharing their own work.

https://news.ycombinator.com/showhn.html

chjj · on Nov 15, 2021

Do you mind elaborating? I obsess over clean code, so any criticism is appreciated.

MonkeyClub · on Nov 15, 2021

> I obsess over clean code

Yep, it shows.

I got curious over the GP's comment, and I spot checked json.c[1] and json.h[2].

They're cleanly written, your data structures reflect your use and I can get an idea of what does what through the code alone.

Trying to figure out what could be considered obscure. For example would

    if (!btc_amount_import(&x, obj->u.string.ptr))
        return 0;

be considered obscure? Cause if you know the structure of a json_value, then you get what the phrase says.

Well thought out and cleanly written code, I'd say.

[1] https://github.com/chjj/mako/blob/master/src/json.c

[2] https://github.com/chjj/mako/blob/master/include/mako/json.h

chjj · on Nov 15, 2021

Not sure. I suppose I could add a style or coding convention guide somewhere in the repo to explain code that may seem confusing initially.

Basic rules are: return values on the left side of params, and functions generally return a boolean for success or failure.

The function you mention is parsing a fixed-point integer string (from a json string) and returning an int64_t. It will return 0 if it's not a syntactically valid integer, or if there is some kind of overflow, etc.
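For readers unfamiliar with the pattern, here is a minimal sketch of a fixed-point amount parser following the same rules: the result goes in the first parameter and the function returns 1 on success or 0 on failure. The name `demo_amount_import`, the exact grammar accepted (digits, optionally followed by a dot and one to eight digits), and the handling of edge cases are assumptions for this example. It is not mako's actual `btc_amount_import`.

```c
#include <stdint.h>

#define DEMO_PRECISION 8 /* decimal places; bitcoin amounts use 8 */

/*
 * Hypothetical example, not mako's implementation.
 *
 * Parses a decimal string such as "1.5" into an integer count of
 * the smallest unit (150000000 for "1.5"). Returns 1 on success,
 * 0 on a syntax error or overflow. *z is written only on success.
 */
static int
demo_amount_import(int64_t *z, const char *xp) {
  int64_t v = 0;  /* value accumulated so far */
  int digits = 0; /* digits seen before the dot */
  int frac = 0;   /* digits seen after the dot */
  int d;

  /* Integer part: at least one digit is required. */
  for (; *xp >= '0' && *xp <= '9'; xp++) {
    d = *xp - '0';

    if (v > (INT64_MAX - d) / 10)
      return 0; /* overflow */

    v = v * 10 + d;
    digits++;
  }

  if (digits == 0)
    return 0;

  /* Optional fractional part: one to DEMO_PRECISION digits. */
  if (*xp == '.') {
    xp++;

    for (; *xp >= '0' && *xp <= '9'; xp++) {
      d = *xp - '0';

      if (frac == DEMO_PRECISION)
        return 0; /* too many decimal places */

      if (v > (INT64_MAX - d) / 10)
        return 0; /* overflow */

      v = v * 10 + d;
      frac++;
    }

    if (frac == 0)
      return 0; /* "1." is rejected */
  }

  if (*xp != '\0')
    return 0; /* trailing garbage */

  /* Scale up to fill the unused decimal places. */
  for (; frac < DEMO_PRECISION; frac++) {
    if (v > INT64_MAX / 10)
      return 0; /* overflow */

    v *= 10;
  }

  *z = v;

  return 1;
}
```

Working in integers throughout avoids the rounding problems that parsing through a `double` would introduce, which is presumably why a fixed-point representation is used for amounts in the first place.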

arcticbull · on Nov 15, 2021

It looks super clean to me, the only exception being for the love of all that's good and holy always brace your one-line if statements and loops.

You don't want your own "goto fail" do you? ;) [1]

I think you did a great job.

[1] https://nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-got...

chjj · on Nov 15, 2021

> It looks super clean to me, the only exception being for the love of all that's good and holy always brace your one-line if statements and loops.

> You don't want your own "goto fail" do you? ;) [1]

Ah, that was my style 7+ years ago. Then I started regularly contributing to an open source project which did not add curly braces on one-liners. I wanted to match the style of the project despite it feeling unnatural to me. Within a few weeks, my style was changed forever (for better or for worse).

> I think you did a great job.

Thank you.

kazinator · on Nov 15, 2021

Won't necessarily work:

  if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
  {
    goto fail;
  }
  {
    goto fail;  /* MISTAKE! THIS BLOCK SHOULD NOT BE HERE */
  }

Copy and paste bugs can play out in any number of ways.

Newer GCC will catch the goto fail by noting that the indentation is wrong.

  $ gcc-11 -Wall gotofail.c 
  gotofail.c: In function ‘main’:
  gotofail.c:5:3: warning: this ‘if’ clause does not guard... [-Wmisleading-indentation]
      5 |   if (argc > 42)
        |   ^~
  gotofail.c:7:5: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘if’
      7 |     goto fail;
        |     ^~~~

  $ cat gotofail.c 
  #include <stdlib.h>
  
  int main(int argc, char **argv)
  {
    if (argc > 42)
      goto fail;
      goto fail;

    return 0;
  fail:
    return EXIT_FAILURE;
  }

Of course, it could be that the indentation is not wrong. If someone runs the code through some automatic formatting, the problem will not then be diagnosable that way.

Sadly, if I fix the indentation then:

  $ gcc-11 -Wall -Wunreachable-code -O3 gotofail.c 
  $ # silence

even though the return 0; is not reachable.

Still, I think I'm going to stick with "brace the else part if the then part is braced, and vice versa".

Jach · on Nov 15, 2021

I wasn't originally going to look at it but this claim got me intrigued. I'd say compared to a lot of C code I've seen, yeah, you aren't lying, the average cleanliness is so clean it's sick ;) Well done!

Perhaps the GP means the general crypto nature of the project lends itself to opaqueness. For example, I randomly clicked into https://github.com/chjj/mako/blob/master/src/crypto/chacha20... and one could ask where the magic numbers on lines 41-44 come from. Maybe it's explained in the references, or just specified without explanation as part of the protocol. I looked at the reference implementation at https://cr.yp.to/streamciphers/timings/estreambench/submissi... which has the equivalent line at L66. After finding the implementation of U8TO32_LITTLE which does some bitwise-or'ing and shifting of the first 4 items ("expa" for either sigma or tau), in Lisp I quickly verified:

       (format nil "0x~x"
               (logior (char-code #\e) (ash (char-code #\x) 8) (ash (char-code #\p) 16) (ash (char-code #\a) 24)))
    ;-> "0x61707865"

So perhaps an argument could be made that your implementation could be slightly cleaner by using those sigma/tau constants instead of the magic numbers for the first four entries in the initial state, which would show how they're related, but that doesn't really take away the magic-ness of them, just moves the question elsewhere to why those magic strings. And it wouldn't surprise me if the answer is just the strings were arbitrarily chosen; the sigma/tau naming in the reference suggests a common domain convention to me (I'm not a crypto guy though so I don't know).

Still overall this is a minor point, it looks like a clean piece of code (especially given what I'm used to seeing from browsing other bits of crypto code here and there) and random sampling of other parts of the program are at least as nice and nicer, especially when they're not doing anything complicated, which I've seen plenty of C code make a mess of. While it might not always be clear why something is done, it's at least clear what's going on and where to jump for more context, what the interfaces are, and things are well-named such that I could probably go find references for some of the whys if they actually needed to be answered. e.g. in mempool.c's btc_mempool_verify, I have no clue what "Annoying process known as sigops counting" refers to, but even without that comment, the simple if condition's pieces are well-named so I could go search for more info on sigops, which I wouldn't even expect to be part of the code (at least here) anyway.

(Edit: It also occurs to me that a low ratio of comments and explanation to code could also be what is meant by obfuscation. To me that's not a large part of the cleanliness concept, though it does factor into other qualities. I consider the sqlite codebase to be some of the most beautiful C code on the planet, wonderfully documented and tested, and as a whole its beauty more than makes up for some stylistic or organizational quirks I don't exactly like. Not to say that style is totally unimportant, but neither this nor sqlite are exemplars of ugly C with lots of insane typedefs, super macros, inconsistencies, syntax abuses everywhere, and mngld_nms like vowels cost $100 a pop. For a project I consider "not so good" C code, I reference Enlightenment Foundation Libraries...)

kazinator · on Nov 15, 2021

The numbers instantly say to me, "we are lower case ASCII" due to the 0x6N and 0x7N values.

One way to do this, however, is:

  #define FOURCC(a, b, c, d) ... insert shifting-oring expression here ...

Then:

  ctx->state[0] = FOURCC('a', 'p', 'x', 'e');

That doesn't tell you where those characters came from but at least unmasks the ASCII connection.

However, the code now breaks on EBCDIC compilers, where 'a' won't have the value 0x61.
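One possible way to fill in that macro is sketched below. It assumes the first argument lands in the most significant byte, which is what the argument order in the call above implies; first_constant is a hypothetical wrapper added only so the value can be inspected. unsigned long is used instead of a fixed-width type to stay within C89.

```c
/* Pack four characters into a 32-bit value, first argument highest.
   The unsigned char casts guard against sign extension of plain char. */
#define FOURCC(a, b, c, d)                      \
  (((unsigned long)(unsigned char)(a) << 24) |  \
   ((unsigned long)(unsigned char)(b) << 16) |  \
   ((unsigned long)(unsigned char)(c) << 8) |   \
   ((unsigned long)(unsigned char)(d)))

/* Hypothetical wrapper: the first word of the initial state. */
unsigned long first_constant(void)
{
  return FOURCC('a', 'p', 'x', 'e');
}
```

On an ASCII compiler first_constant() returns 0x61707865. On an EBCDIC compiler the character literals have different values, so the result differs, which is the portability problem noted above.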

defgeneric · on Nov 15, 2021

I found it to be quite neat. The separation of compilation units alone shows the author knows what he's doing. Perhaps you have trouble with C idioms.

isaiahg · on Nov 15, 2021

As someone who's teaching themselves C could you elaborate?

chj · on Nov 16, 2021

What do you mean "obfuscated"? This is the cleanest C code I've read in quite a while.

nynx · on Nov 15, 2021

This is very impressive, but I’ll be honest, I wouldn’t want to expose any new software written in C to the internet.

kahlonel · on Nov 16, 2021

Yeah it’s not like the browser you are using, the OS you have installed, or even the firmware of the keyboard you’re typing this on is written in C.

__vim__ · on Nov 16, 2021

I feel this comment is misguided. Plenty of new code is written in C, particularly in the scientific computing space. If the code in question doesn't do lots of memory management and expects the user to provide a memory block to be populated with values, then I see no issues. C code tends to be faster than other implementations even without optimizations. Moreover, lots of interpreted languages have good interoperability with C, so compiled C code can be called from many languages using simple high-level interfaces.
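A minimal sketch of that caller-provides-the-buffer style, assuming a made-up numeric routine: fill_linspace and its parameters are hypothetical and not taken from any particular library. The function never allocates; it only checks the capacity it was given and writes into the caller's block.

```c
#include <stddef.h>

/* Hypothetical routine: writes n evenly spaced samples of [start, stop]
   into a caller-supplied buffer of capacity cap. Returns 0 on success,
   -1 if the buffer is missing or too small. No allocation happens here. */
int fill_linspace(double *out, size_t cap, size_t n, double start, double stop)
{
  size_t i;

  if (out == NULL || n > cap)
    return -1;

  for (i = 0; i < n; i++) {
    /* Fraction of the way from start to stop; 0 when there is one sample. */
    double t = (n > 1) ? (double)i / (double)(n - 1) : 0.0;
    out[i] = start + t * (stop - start);
  }

  return 0;
}
```

A caller would typically pass a stack array or a block it owns, for example a double buf[5] with cap and n both set to 5, and check the return value before using the contents.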

0xFFFFFFFFF · on Nov 15, 2021

Can I ask you why? I mean, I know about memory safety and stuff, but is it that bad?

pizza234 · on Nov 15, 2021

Independent research (Microsoft/Mozilla) showed that around 70% of security vulnerabilities are caused by memory safety bugs. That's one heck of a "memory safety and stuff" :)

Here's one reference: https://msrc-blog.microsoft.com/2019/07/18/we-need-a-safer-s...

postalrat · on Nov 15, 2021

"memory safety bugs" sounds pretty broad. What sort of vulnerability doesn't involve memory access?

UncleMeat · on Nov 16, 2021

Something like SQL injection is a vulnerability that is not exploiting a memory error.

nynx · on Nov 16, 2021

it's “memory safety”, not “accesses memory”

blown_gasket · on Nov 16, 2021

That reference and the original blog post where the 70% figure was given only deal with security vulnerabilities in Microsoft's software, not everyone's.

So while other software that isn't Microsoft's most certainly has memory safety bugs, this blog doesn't speak for those, only Microsoft's.

The only place Mozilla is mentioned is in reference to Rust.

syvolt · on Nov 16, 2021

If one of the largest software companies around can't get memory safety right, it does make a guy start to think that maybe most people should avoid having to handle it if they can.

blown_gasket · on Nov 16, 2021

I don't have a fully developed opinion on that, so I won't try to come up with one on the spot. My comment is purely because I felt the parent left out details about the blog post that should have been pointed out.

saagarjha · on Nov 16, 2021

Google has done similar research on their codebases and found results in line with Microsoft’s.

roca · on Nov 15, 2021

Yes it is.

jazzyjackson · on Nov 15, 2021

I wouldn’t be too worried about a bitcoin node since the whole point is to stay in consensus with the rest of the network

amelius · on Nov 15, 2021

What is the official reference implementation written in? (I hope a purely functional language like Haskell.)

Kranar · on Nov 15, 2021

C++

https://github.com/bitcoin/bitcoin

fabianfabian · on Nov 15, 2021

There is no official implementation but http://github.com/bitcoin/bitcoin is most widely used to achieve consensus

l- · on Nov 15, 2021

Not to belittle the work, but it is always prudent to research what the other implementations are before possibly doing duplicate work, and to provide that comparison along with the new work as justification for it. For example, https://en.bitcoin.it/wiki/Software#C shows picocoin (libccoin), which Github also lists as being C. What makes this different, or what is it intended to do differently, from the alternative(s)? It certainly seems a non-trivial matter to have completely reimplemented the necessary GNU Multiple Precision Arithmetic Library or similar dependencies such as specific compiler extensions.

chjj · on Nov 15, 2021

Right. Perhaps I should explain this in the readme.

To be clear:

Libbtc is a small bitcoin library meant for everyday things: signing transactions, maintaining an SPV wallet, etc. It's good at what it does, but you wouldn't be able to actually sync and validate an entire blockchain with it.

I suppose picocoin is the most similar to mako, but I think it falls short of being a "full node" in that it doesn't provide a mempool, miner, or RPC (from what I can tell). I'm also not clear on how it is storing UTXOs. It looks incomplete right now, but maybe jgarzik could give more insight on that.

Mako is a _full_ reimplementation of bitcoin. It's an alternative to bitcoin core.

auggierose · on Nov 15, 2021

It's not a paper, and no academic who is not interested in it anyway is going to make stupid comments about related work. No need to justify making what you wanted to make. Well done.

traceddd · on Nov 16, 2021

In my opinion, given the application is cryptocurrency, reimplementations are pretty valuable. It’s important to avoid systemic failure due to a large percentage of nodes having the same exploit or bug. A fully featured reimplementation like this is really great for Bitcoin!

vmception · on Nov 15, 2021

The point wasn’t for this to be desired by anyone

It is just a show and tell

And now we also have a new client on the network!