This looks like a herculean effort over just the past few months. Congratulations on hitting this milestone and hopefully you can take a breather and tie up loose ends at a more comfortable pace. As much as HN has changed over the years I still think this is a place where there are a lot of people who can appreciate how you feel right now. Nicely done!
Thanks. This was definitely the hardest project I've worked on in my life. I fear I may have shaved a few years off my life with all of the all-nighters I pulled with this, so it's good to hear recognition for the work. :)
No deadline. I just have a tendency towards obsession when I'm working on a project I'm very interested in. I tend to get into a kind of manic phase where I can't sleep even if I wanted to. There's no real telling when that phase will end. So I end up coding for an unhealthy amount of time. I don't recommend it to anyone.
I have the same thing, self-diagnosed as monomania. I had to train myself to stop thinking about my obsession when not at it, and the condition eventually subsided enough that it doesn't completely screw up my life anymore.
nb: looking it up on wikipedia, it seems that monomania was a 19th century psychiatric diagnosis, and is no longer considered a real condition.
Agreed! I'm confused by the number of comments that suggest this was a waste of time because other potentially similar implementations exist. There could be a hundred of these and I'd still be interested in looking at them.
Disregard those comments. There will always be critics wondering why you don't just use floppies taped to a carrier pigeon instead of email. We are nerds.
Call me old school, but I like looking at the number of lines of code to get a feel for how big of a project something is. Mako[1] is 265,618 lines of code. The most widely accepted Bitcoin implementation[2] is 639,074 lines of code -- that's about 2.4x bigger and written in a slew of languages. Mako looks like a super-impressive amount of work (and by a single person no less).
This avoids counting things like generated code that is not checked in (lex.yy.c, y.tab.c or what have you), and any local test code or other junk you have lying around in the tree.
Last week I got attacked here on HN for suggesting that it is not bad to stick to one language and get BETTER over time rather than screwing around with new languages and fads every year. I think this sort of project illustrates what I meant.
Cloned and built on my Mac without any issues. Everything compiled and linked without warnings. A huge plus in my opinion. Other than stating that it builds two executables, there doesn't seem to be any documentation on using it. Did I miss it?
Please let me know how things go on Apple. I have yet to test it on a Mac. It's possible Mac has some issues since the event loop backend is using poll(2). Apple has been known to break poll every now and then.
Yeah, I'm embarrassed about that. It's way behind on tests.
The past 2 months of limited sleep were a massive scramble just trying to get everything implemented and trying to get the damn thing to sync properly. Things seem to work in practice, but it's very scary not having high test coverage, especially with a project like this. Now that everything has solidified and my sleep schedule has normalized a bit, it's my intention to take the time to write proper tests.
This is a bad convention. Instead of `x` and `z`, you should describe what those pointers are meant to represent. I get that everything is subjective, but some things are actually just bad due to illegibility, and I think it is worth being frank about this.
Quality C code is descriptive in the function name and simply organized, the functions are usually doing one or two things and fairly obvious without many values being passed; you're going to glean a lot more from the function name than the variables. The measure for good C code is extremely different than higher level languages you may be more used to writing.
I believe that this is a cop out, an excuse for a culture of poor conventions. In all languages, good functions should be named well, simply organized, and doing only one or two things, and without many arguments passed in. Also in all languages, parameters and variables should be named expressively. There's no reason C should be exempt from this.
Not really. It's because the interfaces in C tend to follow a common pattern that verbose variable names can get in the way. If we have
int alter_struct(some_struct_t *s, void *xp, size_t xn);
then it's pretty obvious that the data pointer xp with size xn is going to modify whatever struct object we have at pointer s.
The variable names can be shortened because the function interface follows a common convention.
The really important thing here is a good function name, not a good name for the data pointer argument (a descriptive name could even incorrectly suggest it has a specific type rather than simply point to bytes).
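To make the pointer-plus-length convention concrete, here is a minimal sketch. The `checksum_t` type and `checksum_update` function are made up for illustration and don't come from mako or any real library; the only point is that `s` is the object being altered while `xp`/`xn` describe the input bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical running-checksum state, invented for this example. */
typedef struct {
  uint32_t sum;  /* rotating XOR checksum */
  size_t total;  /* number of bytes folded in so far */
} checksum_t;

/* Fold xn bytes at xp into the state s.
 * Returns 1 on success, 0 if xp is NULL while xn is nonzero. */
int checksum_update(checksum_t *s, const void *xp, size_t xn) {
  const unsigned char *p = (const unsigned char *)xp;
  size_t i;

  if (xp == NULL && xn != 0)
    return 0;

  for (i = 0; i < xn; i++)
    s->sum = ((s->sum << 1) | (s->sum >> 31)) ^ p[i]; /* rotate left by 1, then XOR */

  s->total += xn;

  return 1;
}
```

Note that `xp` is a `void *` here: the function only cares that it points at bytes, which is the situation where a more specific parameter name could mislead.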
"The data pointer is going to modify the struct" is not the only information that is useful.
Again, it is true in every language that there are some basic patterns to what interfaces look like. That doesn't tell you anything about what the application-specific logic implementing those interfaces is intended to do. For that, it is useful to name things such that they describe what they represent in the domain of the application or library.
If it is a generic function, there are good names for that as well (for instance, some kind of copy function would still have meaningful names like "from" and "to").
But the example given by the author here was not that, it specifically acted on transactions. In that case, the input bytes were meant to represent a "raw transaction" and the output struct is meant to represent a "transaction".
I disagree. The function name gives context as to what they're meant to represent if you understand the convention. One of the conventions in mako is something like:
int btc_tx_import(btc_tx_t *z, const uint8_t *xp, size_t xn);
This function deserializes a raw transaction of `xn` bytes at `xp` and stores the result in the transaction `z`. Zero is returned on failure.
What would be the alternative here? I suppose I could rename `xp` to `data`, `transaction_data`, `raw_tx_data`, or something like that? I don't think it adds any value and it just takes up extra space, making the code less readable.
Your english language description of it gives some good clues to the alternative:
int btc_tx_import(btc_tx_t *transaction, const uint8_t *raw_transaction, size_t raw_transaction_size);
Or since clearly `tx` is already a convention for "transaction", it could be `tx`, `raw_tx`, and `raw_tx_size`. And sure, I have no problem with the `p` and `n` stuff, so it could be `txp`, `raw_txp`, `raw_txn`.
But from your description, the input is a "raw transaction" and the output is a "transaction". Using `x` to mean "raw transaction" and `z` to mean "transaction" is obtuse. You know that the input is a "raw transaction" and the output is a "transaction", but I as a fresh reader, don't, and your code does not help me understand.
You make a good point. I will consider changing the names for the import/export functions (but maybe not the MPI code). That or explain the inputs/outputs in detail in docs.
> I don't think it adds any value and it just takes up extra space, making the code less readable.
I agree here. What they want is to be able to get a superficial understanding at a glance of what the code is doing--in other words they want to give the absolute minimum effort in terms of reading. But when you actually read/write the code and understand it, the shorter names are an advantage. IMO it comes down to who the names are really important for--the reader/reviewer who will likely move on to something else in the next hour, or the person who has actually given some attention to the meaning of the code?
I think the short names also have the advantage of making the logic of the function body understandable at a glance.
I just did this grep for single-character function names on the Mako codebase:
grep -r '\<[a-z](' .
And there were no matches. There were some one-character macros but they were macros repeated dozens of times in a localized area.
It seems like what's being proposed is descriptive function names with simple variable names. If the function bodies are short enough (e.g. fit on a single screen) then this seems like a good trade-off to me. IOW, the variable names are symbolic but the function names are descriptive.
The short variable names should be clear enough if you understand the purpose of the function.
> The short variable names should be clear enough if you understand the purpose of the function.
I shouldn't have to read the function implementation to understand its purpose. Code is buggy! If the function has no name or comment explaining what it's supposed to do, I only have the (often buggy) implementation to go by.
There is no reason to use terse, non-descriptive names in 2021. It's an awful practice that guarantees easy-to-avoid bugs.
I am saying you indeed should have descriptive function names.
I also agree that if the function's name leaves something to be desired then it should be commented.
You are conflating function names--global and relatively non-contextual--with variable names--which have limited scope and rely on the function name for their meaning.
In the setColor example, I would use setColor for the function name and c for the parameter name (with the caveat that C language doesn't have method names, so my reasoning about context has limited applicability to non-C languages)
"Clean code reads like prose" (Uncle Bob C. Martin)
This is quickly becoming my go-to standard for measuring how clean my code is, and in my case this means ultra-descriptive variable names. I usually code in two passes: first I rough things out using single-character/short names, and then I go back and rename the variables using the language-aware (LSP) rename tools in any modern code editor.
This is silly. The variable types are already giving you exactly this information. Why would you add a "p" when you already have the "*"? Likewise, size_t tells you that this is a size, no need for the "n".
Instead, the variable names should be used to convey information that the types alone can't convey.
> Likewise, size_t tells you that this is a size, no need for the "n".
How do you differentiate the two input lengths? If I were to rename `xp` to `x` and `xn` to `n`, what should `yn` be renamed to? At the very least, there's going to need to be a `yn` somewhere.
It's very common for code to include the type when there are two inputs to a function (even when written more verbosely): e.g. `thing_len`, and `other_thing_len`.
The `p`-suffix convention can also save you in a situation like this:
int x = 1;
int *xp = &x;
int y = 1;
int *yp = &y;
It avoids naming collisions, and further down in the function, you'll be able to differentiate the pointer and the value. I find it very useful.
If you write multi-precision integer code in C[1] without this convention, you will end up with an unreadable mess. I certainly wish Torbjörn Granlund were here to testify to this.
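For anyone who hasn't seen this style, here is a toy sketch of what the `xp`/`xn`/`yp`/`yn` convention looks like in multi-precision code. `toy_mp_add` is a made-up function for illustration only (it is not GMP's or mako's API), and it assumes 32-bit limbs stored least-significant first with `xn >= yn`.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy multi-precision add (hypothetical, for illustration): z = x + y.
 * Limbs are stored least-significant first. Requires xn >= yn, and zp
 * must have room for xn limbs. Returns the carry out (0 or 1). */
uint32_t toy_mp_add(uint32_t *zp,
                    const uint32_t *xp, size_t xn,
                    const uint32_t *yp, size_t yn) {
  uint64_t carry = 0;
  size_t i;

  /* Add the limbs both operands have in common. */
  for (i = 0; i < yn; i++) {
    uint64_t t = (uint64_t)xp[i] + yp[i] + carry;
    zp[i] = (uint32_t)t;
    carry = t >> 32;
  }

  /* Propagate the carry through the remaining limbs of x. */
  for (; i < xn; i++) {
    uint64_t t = (uint64_t)xp[i] + carry;
    zp[i] = (uint32_t)t;
    carry = t >> 32;
  }

  return (uint32_t)carry;
}
```

With two operands, each with its own pointer and length, the one-letter prefix plus `p`/`n` suffix keeps the four names visually paired, which is the argument being made above.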
I don't understand why you chose to prefix your variables with x and y rather than something more descriptive. It seems like x and y are completely arbitrary, which is confusing.
> Contrary to what some people might tell you, multiple implementations of a protocol are a good thing. In bitcoin's case, they are necessary to mitigate the harm of developer centralization.
Important point that I didn't really get until I read more on this topic coming out of the Ethereum community.
I originally wanted to vendor my libtorsion code and link to it, but it felt clunky since libtorsion pulls in a ton of crypto that bitcoin doesn't need. Also, since I was focusing on just a few algorithms, it gave me the opportunity to optimize a lot of them (in particular, the ECC backend was optimized for secp256k1 whereas in libtorsion it supports all kinds of curves).
Because of all of this, there's probably some leftover comments. That comment isn't true anymore. rand.c is definitely used internally for libmako, just not libtorsion.
But in recent years I have used Go as a replacement for C whenever possible. I think of Go as an improved C, with slightly different performance characteristics.
In your opinion, what is the advantage of writing a program like Mako in C rather than Go?
I considered using leveldb initially, but that would require a C++ compiler and it also requires linking to libstdc++. So at that point, you might as well just write the project in C++, which defeats the point of this project.
Mako uses LMDB. Aside from Berkeley DB, it's pretty much the only key-value store in town if you want a pure C project.
In the end, it worked out well because I really like LMDB. It has a very intuitive API, and it's very small, which sort of matches the spirit of this project.
The zero-copy on reads feature is also amazing. When reading a UTXO from the database, we do no copying and no allocation whatsoever. The UTXO is simply parsed from the pointer LMDB returns. This is something LevelDB cannot do at all.
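A rough sketch of what "parsed from the pointer" can look like, under stated assumptions: the record layout below (8-byte little-endian value, 1-byte script length, script bytes) and the `utxo_view_t`/`utxo_view_parse` names are invented for this example and are not mako's actual format or API. With LMDB the pointer and length would come from the `mv_data` and `mv_size` fields of the `MDB_val` filled in by `mdb_get`; here it is just any read-only buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view over a serialized record. The script is NOT copied:
 * the view just points into the caller's buffer. */
typedef struct {
  uint64_t value;
  const uint8_t *script;
  size_t script_len;
} utxo_view_t;

/* Parse xn bytes at xp into the view z without allocating or copying.
 * Assumed layout: 8-byte LE value | 1-byte script length | script bytes.
 * Returns 1 on success, 0 if the buffer is too short. */
int utxo_view_parse(utxo_view_t *z, const uint8_t *xp, size_t xn) {
  size_t i;

  if (xn < 9)
    return 0;

  z->value = 0;

  for (i = 0; i < 8; i++)
    z->value |= (uint64_t)xp[i] << (8 * i);

  z->script_len = xp[8];

  if (xn - 9 < z->script_len)
    return 0;

  z->script = xp + 9; /* borrowed pointer, valid only as long as xp is */

  return 1;
}
```

The trade-off is lifetime: as far as I understand LMDB, the returned memory is only valid until the transaction ends or is written to, so a view like this must not outlive the read transaction.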
My first bitcoin reimplementation was written in node.js and called bcoin[1]. So this is my second time reimplementing the bitcoin protocol, albeit in a very different language.
Bcoin was frequently used as a reference along with bitcoin core v0.8.0-v0.11.0 when I felt like double checking consensus functions (among other things).
As an aside, I personally think bitcoin core v0.8.0 is the best version of core if you want to learn bitcoin from it. It's a lot more straightforward than later versions. I personally don't enjoy reading any version beyond v0.11.0.
This is also the reason mako doesn't support taproot yet. That code is very new and isn't present in upstream bcoin. I could try to implement it from the BIPs alone, but I won't know what intricacies are present in the actual bitcoin core code until I actually read it.
When you say "reading bitcoin core v0.8.0", do you mean reading the source code? I want to learn more about the fundamentals (technical part) of bitcoin, but am not sure if diving into the 600k+ lines of src code is worth it. Do you know any way to understand bitcoin (or cryptocurrencies in general)?
I highly recommend the books "Mastering Bitcoin" by Andreas Antonopoulos (technical, for beginners) and "Programming Bitcoin" by Jimmy Song (very technical, advanced).
Its placeholder name was originally "libsatoshi", but I was worried that would cause confusion between this project and bitcoin core.
I spent the past few days thinking of names. Mako came to mind because I had recently been playing through the original FF7 (maybe the first time in ~10 years). It's short, memorable, looks cool. It checked all the boxes. On top of that, pretty much every name involving the words "btc", "coin", etc. is already taken when it comes to bitcoin projects.
That said, apparently there is a wayland notifier also named "mako", so I may have to think up a different name here. =/
Mako was the thing that Shinra almost destroyed the planet trying to extract, so you may end up opening the project up to a lot of Barret+Bitcoin memes. That could be a win!
Not to get in a pedantic FF7 debate, but I don't believe Mako was destroying the planet: Shinra's Mako reactors were destroying the planet. Mako energy was a tool, and it could be used for good or evil, just like cryptocurrency. In the case of bitcoin, I believe it is being used for good.
I don't mind pedantic FF7 debates! I think Mako was Shinra's branding on the energy they obtained by mining the Lifestream via the Mako reactors; I don't think it was some natural force.
Edit: Hmmm.
"Mako (Japanese: 魔晄, Makō; literally meaning "magic light") is a liquid substance featured throughout Final Fantasy VII. It is the condensed form of lifestream and the primary source of energy used by human beings throughout the world. It is comparable to nuclear energy. Lifestream can condense into Mako via both natural and artificial processes. The terms "Mako" and "Lifestream" are often used interchangeably because one is a derivative of the other."
Bitcoin is a protocol much in the same way HTTP is - so just like there are multiple implementations of web browsers, there are also multiple bitcoin implementations. This is the first in C.
I said it's "incredibly impressive" in case you missed that.
Other than that, my post represents only my opinion, and my opinion is that the result of this project is not worth its time.
Feel free to have another opinion, but don't tell me I dismissed something when I did not. You did with my opinion for no reason. Have your own opinion, express your disagreement, do whatever, but don't dismiss mine and come around with some BS rules I didn't break.
Would you mind reviewing the Show HN rules too? Your comment broke them as well, and we're especially trying to avoid the culture of shallow putdowns when people are sharing their own work.
Not sure. I suppose I could add a style or coding convention guide somewhere in the repo to explain code that may seem confusing initially.
Basic rules are: return values on the left side of params, and functions generally return a boolean for success or failure.
The function you mention is parsing a fixed-point integer string (from a json string) and returning an int64_t. It will return 0 if it's not a syntactically valid integer, or if there is some kind of overflow, etc.
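As an illustration of those two rules (output on the left, boolean return), here is a minimal sketch of a fixed-point parser. `amount_parse` is a hypothetical stand-in, not the actual mako function; it assumes 8 decimal places (1 BTC = 100,000,000 satoshis), takes an explicit length, and does not handle signs or whitespace.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parse a decimal string such as "1.5" into an
 * integer count of 1e-8 units. Output first, then input.
 * Returns 1 on success, 0 on bad syntax or overflow. */
int amount_parse(int64_t *z, const char *xp, size_t xn) {
  int64_t n = 0;
  int frac = -1;  /* fractional digits seen; -1 until the point is seen */
  int digits = 0; /* total digits seen */
  int k;
  size_t i;

  for (i = 0; i < xn; i++) {
    int ch = xp[i];

    if (ch == '.') {
      if (frac >= 0 || digits == 0)
        return 0; /* second point, or no digit before the point */
      frac = 0;
      continue;
    }

    if (ch < '0' || ch > '9')
      return 0; /* not a digit */

    if (frac >= 0 && ++frac > 8)
      return 0; /* more than 8 fractional digits */

    if (n > (INT64_MAX - (ch - '0')) / 10)
      return 0; /* n * 10 + digit would overflow */

    n = n * 10 + (ch - '0');
    digits++;
  }

  if (digits == 0 || frac == 0)
    return 0; /* empty string, or nothing after the point */

  /* Scale up for the fractional digits that were not written out. */
  for (k = (frac < 0 ? 0 : frac); k < 8; k++) {
    if (n > INT64_MAX / 10)
      return 0;
    n *= 10;
  }

  *z = n;

  return 1;
}
```

A caller then reads naturally as `if (!amount_parse(&value, str, len)) goto fail;`, and `*z` is left untouched on failure.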
> It looks super clean to me, the only exception being for the love of all that's good and holy always brace your one-line if statements and loops.
> You don't want your own "goto fail" do you? ;) [1]
Ah, that was my style 7+ years ago. Then I started regularly contributing to an open source project which did not add curly braces on one-liners. I wanted to match the style of the project despite it feeling unnatural to me. Within a few weeks, my style was changed forever (for better or for worse).
  if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
  {
      goto fail;
  }
  {
      goto fail; /* MISTAKE! THIS BLOCK SHOULD NOT BE HERE */
  }
Copy and paste bugs can play out in any number of ways.
Newer GCC will catch the goto fail by noting that the indentation is wrong.
  $ gcc-11 -Wall gotofail.c
  gotofail.c: In function ‘main’:
  gotofail.c:5:3: warning: this ‘if’ clause does not guard... [-Wmisleading-indentation]
      5 |   if (argc > 42)
        |   ^~
  gotofail.c:7:5: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘if’
      7 |     goto fail;
        |     ^~~~
  $ cat gotofail.c
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
    if (argc > 42)
      goto fail;
      goto fail;
    return 0;
  fail:
    return EXIT_FAILURE;
  }
Of course, it could be that the indentation is not wrong. If someone runs the code through an automatic formatter, the problem will no longer be diagnosable that way.
I wasn't originally going to look at it but this claim got me intrigued. I'd say compared to a lot of C code I've seen, yeah, you aren't lying, the average cleanliness is so clean it's sick ;) Well done!
Perhaps the GP means the general crypto nature of the project lends to opaqueness. For example, I randomly clicked into https://github.com/chjj/mako/blob/master/src/crypto/chacha20... and one could ask where the magic numbers on lines 41-44 come from. Maybe it's explained in the references, or just specified without explanation as part of the protocol. I looked at the reference implementation at https://cr.yp.to/streamciphers/timings/estreambench/submissi... which has the equivalent line at L66. After finding the implementation of U8TO32_LITTLE which does some bitwise-or'ing and shifting of the first 4 items ("expa" for either sigma or tau), in Lisp I quickly verified:
So perhaps an argument could be made that your implementation could be slightly cleaner by using those sigma/tau constants instead of the magic numbers for the first four entries in the initial state, which would show how they're related, but that doesn't really take away the magic-ness of them, just moves the question elsewhere to why those magic strings. And it wouldn't surprise me if the answer is just the strings were arbitrarily chosen; the sigma/tau naming in the reference suggests a common domain convention to me (I'm not a crypto guy though so I don't know).
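For reference, the relationship between the string and the numbers is easy to check in C. This is a small sketch, assuming the magic numbers in question are the standard ChaCha20 constants (0x61707865, 0x3320646e, 0x79622d32, 0x6b206574); `load32_le` and `chacha_sigma` are names I made up for the example.

```c
#include <stdint.h>

/* The 16-byte "sigma" string used with 256-bit keys. */
static const char chacha_sigma[] = "expand 32-byte k";

/* Load 4 bytes at xp as a little-endian 32-bit word. */
uint32_t load32_le(const char *xp) {
  const unsigned char *p = (const unsigned char *)xp;

  return (uint32_t)p[0]
       | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16)
       | ((uint32_t)p[3] << 24);
}
```

Loading the string four bytes at a time gives exactly those four words: "expa" is the bytes 65 78 70 61, which read little-endian is 0x61707865, and so on for "nd 3", "2-by" and "te k".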
Still overall this is a minor point, it looks like a clean piece of code (especially given what I'm used to seeing from browsing other bits of crypto code here and there) and random sampling of other parts of the program are at least as nice and nicer, especially when they're not doing anything complicated, which I've seen plenty of C code make a mess of. While it might not always be clear why something is done, it's at least clear what's going on and where to jump for more context, what the interfaces are, and things are well-named such that I could probably go find references for some of the whys if they actually needed to be answered. e.g. in mempool.c's btc_mempool_verify, I have no clue what "Annoying process known as sigops counting" refers to, but even without that comment, the simple if condition's pieces are well-named so I could go search for more info on sigops, which I wouldn't even expect to be part of the code (at least here) anyway.
(Edit: It also occurs to me that a low ratio of comments and explanation to code could also be what is meant by obfuscation. To me that's not a large part of the cleanliness concept, though it does factor into other qualities. I consider the sqlite codebase to be some of the most beautiful C code on the planet, wonderfully documented and tested, and as a whole its beauty more than makes up for some stylistic or organizational quirks I don't exactly like. Not to say that style is totally unimportant, but neither this nor sqlite are exemplars of ugly C with lots of insane typedefs, super macros, inconsistencies, syntax abuses everywhere, and mngld_nms like vowels cost $100 a pop. For a project I consider "not so good" C code, I reference Enlightenment Foundation Libraries...)
I feel this comment is misguided. Plenty of new code is written in C, particularly in the scientific computing space. If the code in question doesn't do a lot of memory management and expects the user to provide a memory block to be populated with values, then I see no issues. C code tends to be faster than other implementations even without optimizations. Moreover, lots of interpreted languages have good interoperability with C, so compiled C code can be called from many languages using simple high-level interfaces.
Independent research (Microsoft/Mozilla) has shown that around 70% of security vulnerabilities are caused by memory safety bugs. That's one heck of a "memory safety and stuff" :)
That reference and the original blog post where 70% was indicated is only dealing with security vulnerabilities in Microsoft's software, not everyone's.
So while other software that isn't Microsoft most certainly has memory safety bugs, this blog doesn't speak for those, only Microsoft's.
The only place Mozilla is mentioned is in reference to Rust.
If one of the largest software companies around can't get memory safety right, it does make a guy start to think that maybe most people should avoid having to handle it if they can.
I don't have a fully developed opinion on that, so I won't try to come up with one on the spot. My comment is purely because I felt the parent left out details about the blog post that should have been pointed out.
Not to belittle the work, but it is always prudent to research what other implementations exist before possibly doing duplicate work, and to provide that comparison along with the new work as justification for it. For example, https://en.bitcoin.it/wiki/Software#C shows picocoin (libccoin), which GitHub also lists as being C. What makes this different, or what is it intended to do differently, from the alternative(s)? It certainly seems a non-trivial matter to have completely reimplemented the necessary GNU Multiple Precision Arithmetic Library or similar dependencies such as specific compiler extensions.
Right. Perhaps I should explain this in the readme.
To be clear:
Libbtc is a small bitcoin library meant for everyday things: signing transactions, maintaining an SPV wallet, etc. It's good at what it does, but you wouldn't be able to actually sync and validate an entire blockchain with it.
I suppose picocoin is the most similar to mako, but I think it falls short of being a "full node" in that it doesn't provide a mempool, miner, or RPC (from what I can tell). I'm also not clear on how it is storing UTXOs. It looks incomplete right now, but maybe jgarzik could give more insight on that.
Mako is a _full_ reimplementation of bitcoin. It's an alternative to bitcoin core.
It's not a paper, and no academic who is not interested in it anyway is going to make stupid comments about related work. No need to justify making what you wanted to make. Well done.
In my opinion, given the application is cryptocurrency, reimplementations are pretty valuable. It’s important to avoid systemic failure due to a large percentage of nodes having the same exploit or bug. A fully featured reimplementation like this is really great for Bitcoin!