Rolling Your Own Crypto (loup-vaillant.fr)
192 points by svenfaw on Dec 20, 2016 | 150 comments



Forget about creating your own crypto. I have had endless troubles just using crypto. Obscure options that aren't explained. Examples that don't actually work.

C# had an option that I finally got to work, but the documentation specifically said not to use it; the other choices didn't work.

I had to encrypt something in PHP then decrypt it in ColdFusion. Despite using the same algorithm, key, etc., it didn't work. Strangely, the exact same data in the other direction (CF->PHP) worked just fine.

I'd just like the libraries I use to have a simple choice: Encrypt(data,key) and Decrypt(data,key) and be done.


I found the issue to be more insidious.

Don’t roll your own crypto. Ok, that makes sense.

Don’t implement existing crypto algorithms. Ok, arguments sound good.

This leaves the options of using existing libraries. Cool. So let’s go find an existing library to deal with validating an SSL certificate chain in Python.

PyCrypto (https://www.dlitz.net/software/pycrypto/api/current/) - hmm, so basically the OpenSSL API, probably a minefield unless you are really up on crypto.

M2Crypto (https://gitlab.com/m2crypto/m2crypto) - umm, so no real docs. The recommendation is to go read a book about network security with OpenSSL. So after reading a network security book, I should be good to wire together some OpenSSL, right?

cryptography (https://cryptography.io/en/latest/) - sweet, we have docs! Hmm, so for humans we have Fernet and the ability to look at X.509 certificates. But nothing about validating. Oh, but there is a stalled PR (https://github.com/pyca/cryptography/pull/2460) from 14 months ago to verify certificate signatures.

Which then leaves the typical programmer to – hack something together?

So it is (practically) 2017, and using one of the most popular programming languages, you can’t verify an SSL certificate in a sane way without becoming a journeyman cryptographer.

I like crypto, but using cryptography to integrate with existing protocols/standards sucks. Can we really not have an end-user focused implementation of RFC 5280 with a few knobs to turn?

Perhaps people would stop "rolling their own" crypto if there were half-decent, maintained, documented solutions out there. Maybe some day!


> Perhaps people would stop "rolling their own" crypto if there were half-decent, maintained, documented solutions out there. Maybe some day!

For transport security, HTTPS or SSH. Why go lower level?

Cert-based stuff sucks, and cert management sucks. All of these suck harder because of backwards compatibility. Slowly, people are coming round to the fact that we need to just deprecate stuff and get on with our lives. But the most sucky thing is OpenSSL.

Sometimes, I think OpenSSL has actually done more damage than good. You accept a shitty library because everybody uses it so it must be secure, right? Wrong. And the API is so hostile, the docs are awful. Most things that use it or try to replace it are awful (M2Crypto, PyCrypto, PyOpenSSL, even urllib3 [0]), as if the awfulness of OpenSSL seeps into your thinking - I know it's happened to me once or twice, just interfacing with OpenSSL. Even cryptography suffers from being based on OpenSSL.

Every time I have to use OpenSSL on a new project, I can't wait to see it die.

[0] https://github.com/shazow/urllib3/blob/master/urllib3/util/s...


> For transport security, HTTPS or SSH. Why go lower level?

Because not everything on this green earth uses HTTPS and SSH.

Examples:

(1) I want to implement a SAML consumer, and per the spec, I need to verify signatures. Crypto.

(2) I want to use a client-side cookie so that users can remain authenticated in the current browser session. Crypto.

(3) I want to issue a URL with a signed assertion that the owner of the content has granted permission to access it. Crypto.

Asymmetric encryption is not the be-all and end-all of cryptography.


I'm not going lower level – in fact, I'm not even dealing with a communication protocol. Certificates are also used for document and executable signing.


Yeah, fair enough - it was more of a general assertion that it's easy to forget that high-level solutions exist. It's NIH.

Does using certs still suck though? I get that it's a hard problem, but cert revocation and distribution, that's really hard even for OSes.


Look at this code from urllib3 (requests bundles this code): https://github.com/shazow/urllib3/blob/master/urllib3/contri...

If you install the dependencies, requests uses the package cryptography to use openssl to validate certificates, by running that code.

Note that openssl does not validate certificates the way Mozilla Firefox does, even if they use the exact same root CA bundle, because some of the validation logic is code in NSS instead of data (e.g. certificate transparency requirements). Ryan Sleevi wrote on Twitter that Red Hat is working on something to let applications using openssl use Firefox's full cert validation logic.


> Look at this code from urllib3 (requests bundles this code): https://github.com/shazow/urllib3/blob/master/urllib3/contri....

This only underscores/validates wbond's critique, though. Apparently the advice on how to figure this stuff out isn't even "RTFM", docs, or an easy-to-use library; it's "go look at how this other library, which depends on the dependency you want to use, does it". And then copy-paste that and hope you don't get anything wrong along the way.


(Just to note, this wasn't me looking for help in how to accomplish that task.)

That said, does the proverbial "you" know all of the chain validation features that are NOT implemented by OpenSSL when using code such as in urllib3? What if you actually care more about revocation than utmost in connection performance? How does one trust a custom CA root? What if you want to verify a cert chain for something other than a TLS connection?
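For the TLS case only, Python's standard-library `ssl` module does expose OpenSSL's chain verification, including trusting a custom root. A sketch (the helper names are made up; it does nothing for document signing, OCSP, or non-TLS chains, which is exactly the gap being described):

```python
import socket
import ssl

def make_client_context(cafile=None, crlfile=None):
    """TLS client context that verifies the certificate chain and the hostname."""
    # create_default_context() enables CERT_REQUIRED and check_hostname. With
    # cafile=None it loads the platform trust store; with a path, it trusts
    # only the CA certificates in that PEM file (e.g. your own root).
    ctx = ssl.create_default_context(cafile=cafile)
    if crlfile is not None:
        # OpenSSL checks revocation only if given CRLs and asked to use them.
        ctx.load_verify_locations(cafile=crlfile)
        ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    return ctx

def peer_certificate(host, port=443, cafile=None):
    """Connect, let OpenSSL validate the chain, and return the parsed leaf cert."""
    ctx = make_client_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

A failed validation surfaces as `ssl.SSLCertVerificationError` (Python 3.7+) during the handshake.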

My main point is: here we are with people needing crypto, and (it seems) no one has taken it upon themselves to write good crypto libraries for (some of) the types of tasks that are fairly common. The obvious exception being NaCl. However, that has issues with pragmatic things like distribution due to CPU optimizations, hence libsodium.


Well, this is a very particular case, and it's only so bad because you're dealing with SSL and X509. X509 is a clusterfuck of its own. If you use OpenSSL or something OpenSSL-esque (like LibreSSL or BoringSSL), this essentially just boils down to X509_verify_cert; you really don't need to think much about the crypto.

You might need to care a whole lot about X509 when dealing with loading your certificates and the complete chain though. But overall if you have a nice python wrapper that shouldn't be too bad. Not pleasant, but not too bad.

We do have several quite pleasant cryptography libraries like libsodium, SJCL, etc, but none of these deal with x509.


Or PKCS#12, or PKCS#8, or ECDSA (OpenSSL has determinism now, but different from RFC 6979), or ASN.1. Basically anything where you are using crypto as part of another protocol, not simply for encrypt/decrypt, sign/verify.

So using crypto code will be good once we rid the world of contemporary code and move into the future where all protocols are implemented on top of poly1305, ed25519 and chacha20. Considering how long it took to get rid of SSLv2 and v3, I just think perhaps some effort should go into making good, usable solutions for crypto that needs to be used now.


"but none of these deal with x509"

Thus you have proven his point.


His point was that crypto wasn't pleasant. In reality it's that x509 isn't pleasant.


Serious question: what are the common, practical situations you've run into or can think of where you need to be validating certificate chains yourself in your application code?


Document signing is what pushed me down that path. It uses the same ecosystem and technologies as TLS, but cares more about revocation.


Ah I see. Seems a bit off the beaten path already, though. Were they some sort of existing documents, generated by something not easily available server side?


Did you require online (not cached) revocation check? That would require buying more HSMs.


No, I'm just implementing code that signs and verifies PDF digital signatures for compliance with EU regulations. I'm not dealing with HSMs, or even the PKIX infrastructure other than verifying a certificate is valid according to Adobe/ETSI and generating/checking signatures. The certificate issuers need to provide the CRLs and OCSP responders/responses.


Why is this downvoted? Could the downvoter please comment?


Online revocation check doesn't require buying HSMs. HSMs also usually operate on a much lower level / aren't really connected to the internet.


If you can't use a cached OCSP response and instead need a fresh one (or the cache is valid for a short period only), that means the CA must sign a new OCSP response much more often.

The key the CA uses to sign OCSP must be held safely because it is important, even if that key can't sign new certificates. I think it belongs in an HSM.

I've read that Let's Encrypt spends ~98% of its intermediate signing keys' time on OCSP, not on new certs. If OCSP responses were good for only an hour instead of a week, they would need to perform many more signatures per unit of time, which would require more hardware.
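The signing load scales inversely with how long each response stays valid. A back-of-the-envelope sketch, assuming every active certificate's response is re-signed once per validity period; the 10 million certificate count is a made-up round number, not Let's Encrypt's real figure:

```python
def ocsp_signatures_per_second(active_certs: int, validity_hours: float) -> float:
    """Signing rate needed to keep one fresh OCSP response per active cert."""
    return active_certs / (validity_hours * 3600)

CERTS = 10_000_000  # hypothetical number of active certificates
weekly = ocsp_signatures_per_second(CERTS, 7 * 24)
hourly = ocsp_signatures_per_second(CERTS, 1)
print(f"{weekly:.1f}/s vs {hourly:.1f}/s ({hourly / weekly:.0f}x)")
```

Whatever the real certificate count, going from a week to an hour multiplies the signing rate by 7 × 24 = 168.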


But that's an implementation detail. Sure, some HSMs will make signing faster, some won't. You don't need to use HSMs which will have a considerable markup too. You can use more servers if you prefer.

And anyway, that's the server case. OP just wanted to check the revocation as a client as far as I can tell, which definitely doesn't require any extra hardware.


In some cases revocation info will be live, in others it will be stored inside of the document.

My use case involves embedding it, but also being able to verify it (Adobe distributes trusted CA certs through a signed PDF). Never would I need to run my own PKIX infrastructure.


I want to implement SAML (as a consumer).

(To be fair, it can be done without certificate chains, but it could also be done with chains.)


Yep, I saw your more general list further upthread as well. I'd argue that much of the time these things are bad and essentially a kind of unnecessary, self-inflicted cryptography. In most cases the same things can be accomplished with a secure authenticated channel, someone hanging on to the relevant state and a cryptographically secure random id. A prime example is all the 'state-carrying authenticated/encrypted session cookie' infrastructure typically built into ruby and python web frameworks.
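A sketch of that alternative: the cookie carries only an unguessable id from `secrets.token_urlsafe`, and all state lives on the server, so nothing needs signing or encrypting. The in-memory dict, the TTL, and the helper names are placeholders for illustration:

```python
import secrets
import time

SESSION_TTL = 3600  # seconds; illustrative value

# Hypothetical in-memory store; a real app would use a database or Redis.
_sessions: dict = {}

def create_session(user_id: str) -> str:
    """Keep the state server-side; give the browser only a random id."""
    sid = secrets.token_urlsafe(32)  # 32 random bytes, about 43 URL-safe chars
    _sessions[sid] = {"user_id": user_id, "expires": time.time() + SESSION_TTL}
    return sid

def load_session(sid: str):
    """Return the session state, or None if unknown or expired."""
    state = _sessions.get(sid)
    if state is None or state["expires"] < time.time():
        _sessions.pop(sid, None)
        return None
    return state

def destroy_session(sid: str) -> None:
    """Logout is just deleting the entry; there is no token to revoke."""
    _sessions.pop(sid, None)
```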


And you used Python as the example, which probably has the easiest cryptography libraries to use.

C/C++, Ruby, JS, etc. are much, much worse in usability.


The Python eco-system is a ghetto. I don't understand why, it really is one of the most popular languages.


Crypto APIs being hard to correctly use was the principal motivation behind Daniel Bernstein's nacl [1]. As others have said, nacl and its reimplementation libsodium try to adhere to this philosophy.

[1] https://cr.yp.to/talks/2012.08.08/slides.pdf



And Javascript:

https://tweetnacl.js.org/

And Python:

https://github.com/pyca/pynacl

This is one of my projects using these libraries, a PGP replacement based entirely on the NaCl API:

https://github.com/Spark-Innovations/SC4


I used that in a project and found that the Java and the Javascript implementations I was using did things slightly differently, so I had to modify the Java version to behave like the Javascript version.

Simpler than Openssl or whatever, but still not ideal.


I've run into the same issue.

One of the main frustrations is you don't know if the output is correct until you manage to reproduce some known key/document exactly.

When you're debugging anything else, you can sort of see what's wrong as you come closer to a correct solution. A crypto scheme is only correct (you've put the right things in the right places) when it's exactly right. Until then, you have a jumble of characters.

It took me ages to match ECDSA on Java with Python. In the end, it was because of some fine print where I needed to hash one version but not the other before signing.
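One way to shorten that debugging loop is to check each building block against published known-answer vectors before wiring anything together, so a mismatch points at a specific layer. A minimal sketch using the FIPS 180-2 SHA-256 "abc" vector and RFC 4231 test case 2 for HMAC-SHA-256; for ECDSA the same idea applies with the vectors in RFC 6979 or your library's test suite:

```python
import hashlib
import hmac

# (name, computation, published expected hex digest)
KATS = [
    ("sha256", lambda: hashlib.sha256(b"abc").hexdigest(),
     "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"),
    ("hmac-sha256", lambda: hmac.new(b"Jefe", b"what do ya want for nothing?",
                                     hashlib.sha256).hexdigest(),
     "5bdcc146bf60754e6a042426089575c75a003f089d2739839dec58b964ec3843"),
]

def run_kats():
    """Return the names of any vectors that do not match."""
    return [name for name, compute, expected in KATS if compute() != expected]

print(run_kats() or "all known-answer tests passed")
```

Matching vectors shows two implementations agree; it says nothing about whether the construction built on top of them is secure.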


It's even worse than that, unfortunately :-( You can absolutely get the correct answer for a bunch of test cases and _still_ get insecure crypto; for example: small subgroup attacks on FFDH with p-1 smooth, small subgroup/off-curve attacks in ECDH, static/predictable k for ECDSA, MtE/M&E instead of EtM... Lots of passing unit tests, no good crypto.


What's especially unfortunate about this is that experts would caution you against using ECDSA at all.


Tell me more?

I was doing some bitcoin calculations and needed it.


If you have to use ECDSA you have to use it. Bitcoin is an example of where you have to use it.

Otherwise, though, it's an inferior and outmoded signature system. It's got the worst random nonce dependency of any modern crypto primitive: if it's even biased, just a little bit, you can recover keys from groups of signatures (a full repeat instantaneously destroys security with a single pair of signatures). It's weak against simultaneous attacks on multiple signatures, so it minimizes the effort attackers need to spend. Meanwhile, it's inefficient compared to the alternatives, so it tends to maximize the effort you have to spend. It's also hard to implement without side channels.

Modern cryptography engineers would recommend something like EdDSA instead.


Do you believe CFRG took so long to standardize EdDSA because of natural bikeshedding, or because the NSA worked to slow them down?


It's obviously bikeshedding.

People have been misconstruing the bikeshedding and cliquishness of the IETF as enemy action for decades.

John Gilmore has a story about how, during IPSEC standardization, someone was pushing for a CBC chained-IV construction hard, and that he was both confident it could only be enemy action and had sources suggesting it was. This came out right after the Snowden leaks, so everyone took it seriously.

But if you look at it in context, I'm pretty sure the people he was talking about were Perry Metzger and Bill Simpson† (both clearly not NSA plants). They were arguing with Phil Rogaway --- calling one of the most famous and prolific cryptographers a "so-called" cryptographer when he cautioned them not to do dumb things like chaining IVs.

There's a message thread you can look up on the Internet where this happened. Rogaway even got a petition put together from a bunch of other cryptographers, including Rivest. No luck! The IPSEC standards committee ignored them.

A decade or so later (earlier, really, but nobody took Bard's paper seriously) we discovered chained CBC IVs lead to the BEAST attack on TLS.
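Why chained IVs hurt: if an attacker knows the IV of the next CBC block before choosing its plaintext, they can confirm guesses about an earlier block. A sketch of that check (the core idea behind BEAST, minus its byte-at-a-time tricks), with truncated HMAC-SHA256 standing in for the block cipher so it runs on the standard library; the secret and helper names are hypothetical:

```python
import hashlib
import hmac
import secrets

BLOCK = 16
KEY = secrets.token_bytes(32)

def E(block: bytes) -> bytes:
    """Stand-in for the block cipher's encrypt direction: keyed, deterministic,
    16 bytes out (truncated HMAC-SHA256). The demo never needs to decrypt."""
    return hmac.new(KEY, block, hashlib.sha256).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_first_block(iv: bytes, plaintext_block: bytes) -> bytes:
    """CBC: C1 = E(IV xor P1)."""
    return E(xor(iv, plaintext_block))

# The victim encrypts a secret block under an IV the attacker observes.
iv1 = secrets.token_bytes(BLOCK)
secret = b"PIN=4821;pad...."  # hypothetical 16-byte secret
c1 = cbc_first_block(iv1, secret)
# Chained IVs: the next record's IV is the last ciphertext block, already public.
next_iv = c1

def guess_is_right(guess: bytes) -> bool:
    """Attacker injects P' = guess xor iv1 xor next_iv into the next record.
    Its ciphertext is E(guess xor iv1), which equals c1 exactly when guess == secret."""
    injected = xor(xor(guess, iv1), next_iv)
    return cbc_first_block(next_iv, injected) == c1

print(guess_is_right(b"PIN=0000;pad...."), guess_is_right(b"PIN=4821;pad...."))
```

With a fresh, unpredictable IV per record, the attacker cannot cancel the IV out, and the guess check stops working.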

Enemy action? No. Crypto standards groups don't need enemy action. They are intrinsically evil, and need to be avoided.

† I think this is the case, but I haven't confirmed it with Gilmore; maybe he's talking about a different controversy during IPSEC standardization. But these are the ones where the details fit from what I can tell.


lordnacho, as tptacek wrote (this also applies to DSA):

>a full repeat instantaneously destroys security with a single pair of signatures

Roughly, assume ECDSA parameters (H, K, E, q, G), where H is a hash function and E is an elliptic curve over the finite field K with a point G of prime order q. Suppose two different messages m and m' have been signed with private key x using the same (non-ephemeral) random nonce k.

Under ECDSA signing, m and m' yield signatures (r, s) and (r', s'), where:

  r = r' = (kG)_x  mod q   (the x-coordinate of the point kG),

  s = (H(m) + x*r)/k   mod q,

  s' = (H(m') + x*r)/k  mod q.

Observe that:

  (H(m) + x*r)/s = k = (H(m') + x*r)/s'  mod q.

Cross-multiplying and rearranging:

  x*r*(s' - s) = s*H(m') - s'*H(m)  mod q,

which allows us to recover the private key x:

  x = (s*H(m') - s'*H(m)) / (r*(s' - s))  mod q.
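The derivation can be checked numerically. A sketch of textbook ECDSA over secp256k1 (the Bitcoin curve) in pure Python that deliberately signs two messages with the same nonce and then recovers the private key with the formula above. The key, nonce, and messages are made up, the point arithmetic is not constant-time, and it needs Python 3.8+ for `pow(a, -1, m)`:

```python
import hashlib

# secp256k1: y^2 = x^3 + 7 over F_P, base point G of prime order N (q above).
P = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(p1, p2):
    """Affine point addition; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def scalar_mult(k, point):
    """Double-and-add. Not constant time: demo only."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

def h(msg: bytes) -> int:
    """H(m): SHA-256 digest as an integer mod q."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % N

def sign(x: int, msg: bytes, k: int):
    """Textbook ECDSA with a caller-supplied nonce k (which is the bug here)."""
    r = scalar_mult(k, G)[0] % N
    s = (h(msg) + x * r) * pow(k, -1, N) % N
    return r, s

def recover_key(m1, sig1, m2, sig2) -> int:
    """x = (s*H(m') - s'*H(m)) / (r*(s' - s)) mod q."""
    (r, s), (r2, s2) = sig1, sig2
    if r != r2:
        raise ValueError("nonce was not reused")
    return (s * h(m2) - s2 * h(m1)) * pow(r * (s2 - s), -1, N) % N

secret_x = 0x1234567890ABCDEF1234567890ABCDEF  # hypothetical private key
reused_k = 0xC0FFEE1234                        # the mistake: same k twice
m1, m2 = b"pay Alice 1 BTC", b"pay Bob 2 BTC"
sig1, sig2 = sign(secret_x, m1, reused_k), sign(secret_x, m2, reused_k)
print(recover_key(m1, sig1, m2, sig2) == secret_x)
```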


The problem is really much worse than this. You don't merely need a non-repeating nonce (the way you can get away with a GCM nonce that increments by 1 every session): you need an unbiased nonce.
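One easy way to end up with a biased nonce is reducing fixed-width random output modulo q. A toy-sized illustration (q = 200 and one random byte, so the effect is visible): counting over all 256 byte values shows the low residues come up twice as often. `secrets.randbelow` samples uniformly instead; deterministic nonces per RFC 6979, or EdDSA, sidestep the problem entirely:

```python
import secrets
from collections import Counter

Q = 200  # toy group order; real ECDSA q is about 256 bits

def biased_nonce_counts() -> Counter:
    """Reduce every possible random byte (0..255) mod Q and count the outcomes."""
    return Counter(b % Q for b in range(256))

def uniform_nonce() -> int:
    """Uniform in [1, Q): randbelow rejection-samples rather than reducing."""
    return 1 + secrets.randbelow(Q - 1)

counts = biased_nonce_counts()
print(counts[10], counts[100])  # prints: 2 1 (residues below 256 - 200 = 56 appear twice)
```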


I believe a natural segue here is to remind people about cryptopals (especially set 8). I.e., I don't have the chops and wouldn't attempt to write up EC/DSA nonce bias and partial key exposure attacks better than you all, not to mention the challenges regarding GCM. Cheers.


> I had to encrypt something in PHP then decrypt it in ColdFusion.

That still ends up being 'rolling your own crypto' and mostly the kind of crypto the article is talking about. The author is describing putting together cryptographic constructs from the sort of low-level pieces provided by your typical runtime crypto library. This is probably the far bigger problem with the admonition - it's vague enough that someone might think that since they're not actually implementing AES themselves, they're not rolling their own crypto. Stackoverflow is full of questions just like your PHP/Coldfusion problem.


Tell me about it. I spent WAY too much time just yesterday trying to use the Bouncy Castle OpenPGP stuff to encrypt a message such that I could turn around and decrypt it with GnuPG. In the end I got it to work, but what a rigmarole. Crypto APIs are, as you said, so full of obscure options and bizarre permutations of how components can be put together that it's ridiculous. And they also seem to change rapidly, meaning documentation and even Stack Overflow answers are often out-of-date and useless.

The thing that saved me finally turned out to be that Bouncy Castle actually do package some example programs in with the source. So once I cloned the git repo and dug into those, I was finally able to find an example of doing exactly what I needed. But trying to piece it together from the javadocs and the other online documentation? Hell no... I'd have been working on that until the heat death of the universe. :-(


Out of interest; why do you care about GnuPG decryption specifically? (I write cryptographic software. I care about usability. I might want to provide people with GnuPG-compatible crypto, but that's not necessarily where I'd start.)


I'm working on a SaaS offering where I might need to be able to email sensitive information to customers. Or even just make it available for download. In either case, I'd like the data to be encrypted. So I'm planning to ask users to upload a PGP/GPG compatible public key so that we can encrypt this stuff for them. I'm focused on GnuPG specifically just because it's freely available, well known, available on all the important platforms, etc.


Just a word of caution: If your customers are on Windows, using Gpg4win might not be a very smooth experience. I just helped someone to install that package and I found the UX to be really suboptimal. For one, it’s GTK-based, which makes it stick out like a sore thumb in a Windows environment.


Yep, that's part of the risk of this. I thought about that, but decided to give it a go on the basis of thinking that the people using this service will be pretty tech savvy, and will be OK with installing and using some variant of GPG or the commercial PGP. If that proves not to be the case we'll come up with a different approach.


When I tried this (.NET PGP), I could only get it working in one direction, using this abandoned open-source library:

https://crypter.codeplex.com/


With SpongyCastle on Android, I didn't even get the example programs to work. Maybe in trying to modify it from using disk files (inconvenient when working with data from an EditText) to InputStreams, I messed something up, but man was that frustrating.


If it's something sparkling new you probably want to use NaCl or for more involved use cases a Noise implementation.


It's the opposite, I think: Noise is good for less-involved cases, the kinds of problems you would ordinarily solve by just using TLS. Nacl is better for when you need something customized to an application.

Incidentally: Noise is great, but it's also fine to use TLS, if you're just a little careful.

See also:

https://gist.github.com/tqbf/be58d2d39690c3b366ad


A bit of Googling reveals this to be a reply to this blog entry by Colin Percival, for those interested:

http://www.daemonology.net/blog/2009-06-11-cryptographic-rig...


The "Avoid /dev/random" recommendation, can you elaborate on that?



You should post these reasons in your gist. As I thought, there are a few reasons one might want to use /dev/random.

Blanket "don't use X" statements are much more useful with context and information about why not to use X (and when using X may be required, useful, or a better choice).


There's no reason to pull bytes out of /dev/random. In the rare circumstance that you're worried about it being early in the first boot and getrandom doesn't exist, you might throw away a single byte from /dev/random to make sure it's ready. But then you're done, and use urandom. And your program almost certainly doesn't need to worry about that circumstance.
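In Python, following that advice needs no device files at all. A short sketch (variable names are illustrative):

```python
import os
import secrets

# On Linux, Python 3.6+ implements os.urandom() with getrandom(2) in blocking
# mode: it waits only until the kernel pool is initialized at boot, then never
# blocks again. That covers the early-boot caveat without touching /dev/random.
key = os.urandom(32)

# secrets is the higher-level stdlib interface to the same OS CSPRNG.
session_token = secrets.token_urlsafe(32)
api_nonce = secrets.token_bytes(24)
```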


Unless no programs ever need to worry about that case (which is untrue), why not mention it in a security gist purporting to be a list of crypto implementation suggestions?

Why not be complete and list the caveat, instead of incomplete and possibly lead that one in a thousand down the wrong path?


Far more than one in a thousand will use it as an excuse to do the wrong thing.

And lots of articles about best practice leave out very rare edge cases. No advice is always always right.


> lots of articles about best practice leave out very rare edge cases

I don't think that's an argument in support of leaving out edge cases, and even if you disagree in general, I think security articles should be more diligent than random best-practices articles.

> Far more than one in a thousand will use it as an excuse to do the wrong thing.

If an article says "always do X" and someone does X and the implementation is subtly wrong or insecure, that's an argument to improve the article (which is what I'm suggesting).

But if an article says "almost always do X, and be aware of this edge case" and someone gets it wrong by not following instructions, then I don't think the article is at fault.

I would personally want my security advice to cover any edge case I could think of, even if only in the footnotes.


No, there aren't. Don't use /dev/random.


The reasons I know about for using /dev/random are quoted in the article you yourself just posted here...

https://sockpuppet.org/blog/2014/02/25/safely-generate-rando...


Since I wrote that article, I'm pretty confident that it doesn't recommend ever using /dev/random.


I said the article mentioned reasons one might want to use /dev/random (or be careful about blindly using /dev/urandom), not that the article recommends using /dev/random. But since you wrote the article you should realize this.

Just because the article recommends seeding over using /dev/random early at boot time doesn't mean the original gist shouldn't mention these issues so that readers are aware of them.

Just saying "use /dev/urandom all the time" without mentioning the caveats we're discussing here means someone might read your article and in a (misguided) appeal to authority implement something solely on your recommendation without doing their research properly first.

I would have thought someone writing security articles would want to avoid misleading someone into implementing something that's accidentally insecure. I hope I'm not wrong.



> Here's another article you can argue with:
> http://www.2uo.de/myths-about-urandom/

Which backs up my point that using /dev/urandom blindly without knowing about some of the edge cases isn't a good idea, and lists those edge cases and what to do about them (which is what I suggested you do too).

> And another:
> http://security.stackexchange.com/questions/3936/is-a-rand-f....

Which also lists the edge cases I'm talking about.

Hey look if you just want to say "I'm aware of the edge cases and don't want to put them in my gist for others to see" that's fine with me, but dodging the issue by claiming there are no edge cases (and then listing 3 articles which all mention the edge cases) isn't the right reply I think.

Don't be surprised if someone suggests that a gist listing security best practices list some edge cases that go along with the blind advice too. Feel free to disagree, but at least disagree honestly.


I honestly disagree.


Appreciated.


Have you looked at Libnacl or Libsodium?


> Encrypt(data,key) and Decrypt(data,key) and be done.

That's the easy part. You can find a library that does exactly that, and only that. Will probably be just one file, not even a library [1].

But then you'll also need padding, MAC, signatures, key distribution, web of trust, etc.

[1] Example: https://github.com/dimview/speck_cipher


The linked repo is not a solution, single-file or otherwise. Speck is not only a weird and unusual block cipher to pick, it is more importantly only a block cipher. It does not let you encrypt more than a block size at a time, and it does not give you authentication! Both of those things are anything but trivial.

Furthermore, Speck has a variable block size between 32 and 128 bits. If you pick wrong, safe-looking things like CTR+HMAC-256 become unsafe (it's not hard to overflow a 32-bit counter).
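To put numbers on that: with a b-bit block, a counter that fills the whole block wraps (and the keystream repeats) after 2^b blocks, and the birthday bound, where the security guarantees of CTR and CBC run out, arrives at roughly 2^(b/2) blocks. A quick calculator over Speck's block sizes, assuming the whole block is counter (layouts that reserve bits for a nonce wrap sooner):

```python
def fmt_bytes(n: float) -> str:
    """Human-readable binary units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB"):
        if n < 1024:
            return f"{n:.0f} {unit}"
        n /= 1024
    return f"{n:.0f} YiB"

def ctr_limits(block_bits: int):
    """(bytes before a full-block counter wraps, bytes at the 2^(b/2) birthday bound)."""
    block_bytes = block_bits // 8
    wrap = 2 ** block_bits * block_bytes
    birthday = 2 ** (block_bits // 2) * block_bytes
    return wrap, birthday

for bits in (32, 48, 64, 96, 128):
    wrap, birthday = ctr_limits(bits)
    print(f"{bits:>3}-bit block: wraps after {fmt_bytes(wrap)}, "
          f"birthday bound ~{fmt_bytes(birthday)}")
```

For Speck's 32-bit block that is 16 GiB to wrap and only about 256 KiB to the birthday bound; the 64-bit case's ~32 GiB bound is the Sweet32 territory that retired 3DES.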


Well, it does Encrypt() and Decrypt(), which is what was asked. And it's not particularly difficult to pick 128 bit block size (after all, bigger is better).

The problem is, it's not nearly enough to just do Encrypt() and Decrypt(). Only when you see that do the benefits of a real crypto library like NaCl become apparent.


It has things called "encrypt" and "decrypt", but it does not do anything a programmer could reasonably consider "encrypt" and "decrypt". If you don't authenticate your ciphertext, there's a decent chance your ciphertext won't remain secret. It doesn't give you an option for randomized encryption. Compare with Fernet: you get the same API, but you get to encrypt any message, and it's securely authenticated such that it stays secret. (secretbox is fine too; but it makes you pick a nonce).

Additionally, "bigger is better" is not a reasonable way to pick primitives. Defaults matter. Perf matters. From experience and stats, people pick according to those two vectors a lot more often than they pick along the "bigger is better" axis. When's the last time you saw an RSA-16384 signature? AES-256 might have twice as big a number as AES-128, but how much faith do you have in that extended key expansion? You use AES-128 a lot more often than you use AES-256, and you definitely use RSA-2048 a lot more often than RSA-16384.

I don't think you can responsibly insist that this was an answer to the question. It is not a secure way to encrypt messages.


Well, that's kind of the point - the default Encrypt(data,key) should also implement padding, combine blocks for arbitrary length messages, and include MAC; so that it's reasonably secure out of the box with the default settings. Key distribution is a different issue, but doing the abovementioned things would be sufficient to ensure that you can just run Encrypt/Decrypt(data, key) and expect most of the common tampering attacks to be impossible.
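To make that concrete, here is a toy sketch of what an `Encrypt(data, key)` has to hide behind its two arguments: a fresh random nonce per message (stored in front of the ciphertext), a stream construction that handles any length without padding, encrypt-then-MAC, and a constant-time tag check before decrypting. It uses HMAC-SHA256 both as the CTR-style keystream PRF and as the MAC so it runs on the standard library alone; it illustrates the shape and is not a vetted scheme. In real code, NaCl's secretbox or Fernet package the same ideas:

```python
import hashlib
import hmac
import secrets

NONCE_LEN, TAG_LEN = 16, 32

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """CTR-style keystream: PRF(key, nonce || counter) for counter = 0, 1, ..."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def _subkeys(key: bytes):
    """Derive separate encryption and MAC keys from one master key."""
    return (hmac.new(key, b"enc", hashlib.sha256).digest(),
            hmac.new(key, b"mac", hashlib.sha256).digest())

def encrypt(data: bytes, key: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(NONCE_LEN)  # fresh per message: randomized output
    ct = bytes(a ^ b for a, b in zip(data, _keystream(enc_key, nonce, len(data))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()  # encrypt-then-MAC
    return nonce + ct + tag  # the nonce simply travels with the ciphertext

def decrypt(blob: bytes, key: bytes) -> bytes:
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("ciphertext too short")
    enc_key, mac_key = _subkeys(key)
    nonce, ct, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # authenticate before decrypting
        raise ValueError("authentication failed")
    return bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, nonce, len(ct))))
```

Padding only enters the picture for block modes such as CBC; a stream construction like this avoids it.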


Why would you suggest someone investigating crypto for the first time use the NSA's lightweight cipher? That's a really weird recommendation.


The point I was trying to make is that complexity is not in encrypt() and decrypt(). Those things can be very simple, yet cryptographically strong.

Complexity begins when you try doing something more involved than just symmetric block encryption.


If by "symmetric block encryption" you mean "use the block permutation of the cipher directly", then nobody ever does that; the people who make the mistake of using the cipher's "Encrypt" and "Decrypt" are the ones who end up discovering that the default mode of a cipher is ECB.
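The ECB leak is easy to see: applying the block function to each block independently maps equal plaintext blocks to equal ciphertext blocks. A sketch with truncated HMAC-SHA256 standing in for the cipher (an assumption that keeps it stdlib-only; truncated HMAC is not invertible, but only determinism matters for this demo):

```python
import hashlib
import hmac
import secrets

KEY = secrets.token_bytes(32)
BLOCK = 16

def E(block: bytes) -> bytes:
    """Stand-in for a raw block cipher call: keyed and deterministic."""
    return hmac.new(KEY, block, hashlib.sha256).digest()[:BLOCK]

def ecb_encrypt(data: bytes) -> list:
    """'Just call Encrypt on each block': what the raw block API amounts to."""
    if len(data) % BLOCK:
        raise ValueError("raw block API: caller must pad to the block size")
    return [E(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

msg = b"YELLOW SUBMARINE" * 3 + b"different block!"
blocks = ecb_encrypt(msg)
print(len(blocks), len(set(blocks)))  # prints: 4 2 (the repetition shows through)
```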

I think I sort of see your point, but SPECK is a weird recommendation.


Forget that.

How is the IV stored, for example?


Block ciphers in ECB mode don't use an IV. If you were going to use this random crypto code on GitHub (bad idea for various reasons), you'd want to wrap it in GCM or something similar (then you get to deal with IVs, tags, etc.). I suppose part of the issue is that people don't want to deal with (or don't see the need for) the added complexity of IVs and authentication.


You can't just "wrap" SPECK in GCM-mode! In many of its configurations, it has a block too short to safely run GCM in.

Don't roll your own crypto.


By many configurations, do you mean with a block size ≠ 128 bits? I should've specified that when I posted (notwithstanding issues with rolling your own crypto)


Umm, this is a bit off-topic, but I don't know how else to reach out. I haven't been able to access Starfighter due to a certificate failure; could you please check it out or point me to whom I should contact?


The founders wound it down. Thomas is threatening to actually do a write-up about the experience sometime (poke, poke).


I really am doing it (I also really am publishing a bunch more challenges). I just had dinner with Patrick last night and talked about it a bit. I've got a lot on my plate at the moment, though.


Cool - thanks to all three of you for Starfighter. I'm sad it didn't work out as envisioned. Looking forward to the eventual write-up when it fits into your time.


We will for sure be taking another whack at this down the road, and in the meantime I'm pretty psyched to apply the same ideas again to a hiring practice I actually own (we'll surely be doing the same thing at Latacora); at least I'll get another finger-wagging blog post out of it!


API is Encrypt(data,key) and Decrypt(data,key). There's no IV. It's just a block cipher.

Once you start adding things like CTR mode, MAC, etc. the API is no longer as simple.


I just found out this past week about this 3-year-old .NET crypto best-practices book/library:

http://securitydriven.net/inferno/


> I had to encrypt something in PHP then decrypt it in ColdFusion. Despite being the same algorithm, key, etc, it didn't work. Strangely, the exact same data in the other direction (CF->PHP) worked just fine.

Padding issues most likely.

PHP might auto detect padding, while CF might not.
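A plausible version of that mismatch, shown with the two padding schemes most often involved. Which scheme each PHP or ColdFusion function uses depends on the API and its settings, so treat this as a hypothesis to check against their docs, not a diagnosis:

```python
BLOCK = 16

def pkcs7_pad(data: bytes, block: int = BLOCK) -> bytes:
    """Append n copies of the byte n (1..block); always adds at least one byte."""
    n = block - len(data) % block
    return data + bytes([n]) * n

def pkcs7_unpad(data: bytes, block: int = BLOCK) -> bytes:
    """Strict PKCS#7 removal (real code should verify a MAC before this step)."""
    n = data[-1] if data else 0
    if not 1 <= n <= block or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-n]

def zero_pad(data: bytes, block: int = BLOCK) -> bytes:
    """Pad with NUL bytes to a block boundary (adds nothing if already aligned)."""
    return data + b"\x00" * (-len(data) % block)

msg = b"hello"
print(pkcs7_pad(msg)[-3:])  # b'\x0b\x0b\x0b'
print(zero_pad(msg)[-3:])   # b'\x00\x00\x00'
```

A decryptor expecting PKCS#7 rejects zero-padded plaintext outright, while a lenient one that only strips trailing NULs leaves PKCS#7 bytes in the output, which is one way a pipeline can work in one direction and fail in the other.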


You have to applaud him for trying. Building anything as a learning exercise is an investment in yourself, and I'm encouraged when others are ambitious about it.

We've all had "don't roll your own crypto" pounded into us, but nothing teaches a lesson as soundly as trying something yourself. I've tried, failed, and learned a lesson from far less ambitious endeavors.


a great rule would be "try to roll your own crypto, but only as a learning tool, don't try to use it"


Roll your own crypto, but for the love of god don't release it.


Quoting an older post of mine:

The 'don't roll your own crypto' argument is mostly just shorthand to 'defer to the opinion of experts, use ready-made constructs when possible, and if not, then exercise caution when hooking crypto primitives together in unproven ways'. [1]

Crypto code, like other library code, is a question of trust.

Do I trust Daniel Bernstein? Do I trust Joan Daemen, who is half of the AES team and a quarter of the Keccak team [2]? In practical matters, do I trust tptacek [3]? Yes, I trust them, until people more educated in cryptography than me cast enough doubt or prove otherwise, but you might have a different model of who you trust. But ultimately, you're the one who answers for your systems.

It's also a bit like science, where we come up with a hypothesis (this seems to work...) and then try very hard to disprove it, so our knowledge evolves as we go. It's important to understand that there was a time when it was best practice to use certain primitives that are now considered broken, and this is okay, assuming we all upgraded our systems since. For cases where that assumption can't hold true, we need to account for that risk.

This is also why it's wise to go with cryptosystems that receive a good amount of peer scrutiny. Your homegrown secret sauce might indeed be super secure, but few will publish papers on how they can obtain collisions on a round-reduced version. It's network effects, it's 'given enough eyes' all over again.

[1] https://news.ycombinator.com/item?id=12400040

[2] https://news.ycombinator.com/item?id=12766941

[3] https://gist.github.com/tqbf/be58d2d39690c3b366ad


Actually, it's more a question of history; history shows that it's too easy even for very, very smart people to think they've come up with something tough to crack that's actually quite easy to crack by an avenue they hadn't thought of. But there are exceptions, and Thomas Jefferson was one. He rolled his own and it was very, very good. Yes, the authorities are fallible, and frequently these days deliberately corrupted. Now I'm contemplating whether I should say more. Stopping here for now.


you can trust their skills and ability, but trusting their intentions shouldn't necessarily follow.


If I trust someone's cryptographic abilities but don't trust their intentions, then I should still use their cryptosystems over my own.

If I use e.g. Bernstein's cryptosystem and he's evil, then he and whoever hires him can read my data.

If I use my own or your cryptosystem, then either the cryptosystem or implementation definitely (with a much, much greater confidence than any trust issues) is horribly broken due to some bug, oversight or side channel, and I just haven't noticed yet. So the end result is that everyone can read my data - sticking with someone evil would have been more secure than rolling my own.


Would anybody else put the same trust in whatever authority the external auditors or pentesters display?


Most cryptanalysis is public; there's little reason for public researchers to withhold cryptographic breaks. The only reason that'd happen is if you're an intelligence agency and you'd like to keep a vuln to yourself (those are less likely in crypto than in software in general).

In that light, it'd be perhaps better to phrase trusting AES as "not distrusting the union of everyone who's tried to cryptanalyse AES", rather than trusting the Rijndael team specifically. In many cases, especially djb-brand crypto, there's even less need for trust: the way you derive the X25519 curve is extremely well defined (or, as djb puts it, "rigid").


All this is far and away from my point. Paranoia & authority aren't the only two options here.


I am interested enough that I would read a long-form article on this kind of stuff, and found this fascinating. But the end left me feeling kind of dejected, as the article, I guess, is just a parody. Which is fine, but I probably wouldn't have spent as much time trying to take in what I presumed the article was trying to teach. I would really enjoy an article like this that actually goes step-by-step so I could learn something like this start to finish (though I would never implement).

From the article, "Last Step in Crypto:"

Get a PhD in cryptography. You need to be an expert yourself if you ever hope to invent a primitive that works. Publish your shiny new primitive. It may withstand the merciless assaults of cryptanalists, and be vetted by the community. Wait. Your primitive needs to stand the test of time.


This may only be sort of relevant, but I write a math/programming blog on which I implemented a (very inefficient) elliptic curve crypto library from first principles. This includes all the math background, and implementations of a few working protocols. It might be a good starting point if you're looking for resources like this (exposition-wise, not in terms of a practical crypto solution).

https://jeremykun.com/2014/02/08/introducing-elliptic-curves...


I really like this series, as I do everything on your blog, but there are two significant concerns I have with it that are relevant to the thread:

1. It proposes that ECDH is secure, as a protocol, so long as the curve parameter is carefully chosen. But this just isn't true, or at least, it's true only given a technicality that moots the point. For instance: when accepting a point from a counterparty in ECDH, you have to carefully validate that the point is valid on the curve you expect to be working on, or else your own computation might both be confined to an unexpectedly weak curve and disclose information about the results. This is one of Sean's cryptopals set 8 challenges, and it's one of the better and more surprising exercises that project came up with.

2. It suggests that it's reasonable for people designing cryptography to come up with their own curves. But in reality, nobody ever does this! We're increasingly confident about the structure of curves we want to be using (you want curves for which the math rules are consistent and don't require special cases, for which it's easy to convert between equivalent curve structures for signatures and key exchange, with prime structure that makes the curve math fast). Once you find a good curve there (25519 is the best-known example, for its security level), there's practically nothing to be gained from using any other curve.
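Regarding point 1, a minimal sketch of the most basic check: before using a peer's point in ECDH, confirm that its coordinates are canonical and that it satisfies the curve equation. This uses a toy short-Weierstrass curve over a small prime field; the curve y^2 = x^3 + 2x + 3 mod 97 in the usage below is an arbitrary illustrative choice, and the names are hypothetical. Real implementations also have to handle the point at infinity and, on curves with a cofactor, subgroup membership.

```cpp
#include <cassert>
#include <cstdint>

// Toy short-Weierstrass curve y^2 = x^3 + a*x + b over GF(p), small p only.
struct Curve { int64_t a, b, p; };
struct Point { int64_t x, y; };  // affine coordinates; no point at infinity

// Accept a peer's point only if it is canonical and lies on *our* curve.
bool on_curve(const Curve &c, const Point &P) {
    // Keep p < 2^31 and a, b reduced so every product below fits in int64_t.
    assert(c.p > 3 && c.p < (int64_t{1} << 31));
    assert(0 <= c.a && c.a < c.p && 0 <= c.b && c.b < c.p);
    if (P.x < 0 || P.x >= c.p || P.y < 0 || P.y >= c.p) return false;
    int64_t lhs = P.y * P.y % c.p;
    int64_t x3 = P.x * P.x % c.p * P.x % c.p;
    int64_t rhs = (x3 + c.a * P.x % c.p + c.b) % c.p;
    return lhs == rhs;
}
```

For example, with `Curve c{2, 3, 97}`, the point (3, 6) passes (27 + 6 + 3 = 36 = 6^2), while (3, 7) and the non-canonical (3, 103) are rejected.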

I get why you walk people through picking a new curve! It's a great exercise; playing with very "small" curves in code is probably the best way to get a feel for how elliptic curves work. But this is the kind of place where people rolling their own stuff can get into a lot of trouble.


Yup! My goal was more to explain the math than anything else. I was planning to extend this for 25519 and discuss the alternate standard forms, but instead I'm trying to finish up that darned book.


Wow, great blog! Just what I was looking for to deepen my knowledge in EC crypto.


If you are interested in more than an article, this course is very good.

https://www.coursera.org/learn/crypto

Sorry to spoil it, but the conclusion will basically be the same as the article, as in "just don't".


Yes, but arriving at the conclusion is very fun. Although, don't hold your breath for Cryptography II. I've been waiting for it for years.



yes, it says it in the very article


I had to admit I screwed up a little… I didn't expect merely writing about crypto would be so hard. Good thing I put this up for peer review.

I have since revised my article. It should be less dangerous now.


> Poly1305 is one of the few primitives out there that are provably secure.

That doesn't sound right. I guess this should read something like »is provably as hard to break as the underlying block cipher«, in the same way that a hash function built using the Merkle-Damgård construction is provably as hard to break as the compression function used. The article easily gives the impression that Poly1305 is provably secure in the same way as a one-time pad.


I have thought about it and re-read the original paper. I stand by my claim.

The truth of the matter is, poly1305, while provably secure, is almost as impractical as a one-time pad: it relies on a shared random authentication key. Where is that key supposed to come from? In practice, you'd derive it from a session key (and so rely on a symmetric cipher such as AES or Chacha), or from a key exchange scheme such as Diffie-Hellman, and you rely on that.

I left my phrasing as it was, because it is (I think) closer to the truth, and not dangerous. I do reckon the AEAD scheme I recommend only has a reduction proof, not a safety proof. I could be more precise, but I don't like clutter.
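To illustrate why a one-time authenticator needs a fresh random key, here is a toy polynomial MAC in the same Carter-Wegman spirit as Poly1305: the message blocks are evaluated as a polynomial in a secret r modulo a prime, and a secret s is added. This is not Poly1305. The prime 2^61 - 1, the block encoding, the names, and the use of `unsigned __int128` (a GCC/Clang extension) are assumptions for illustration, and it assumes every message has the same number of blocks (Poly1305 handles variable lengths by appending a 1 bit to each block). The guarantee only holds if (r, s) is never reused.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy one-time polynomial MAC modulo the Mersenne prime 2^61 - 1.
// Illustrative only: NOT Poly1305 and not for real use.
constexpr uint64_t P61 = (uint64_t{1} << 61) - 1;

uint64_t mulmod(uint64_t a, uint64_t b) {
    return static_cast<uint64_t>((static_cast<unsigned __int128>(a) * b) % P61);
}

// Horner evaluation: tag = m1*r^n + m2*r^(n-1) + ... + mn*r + s  (mod p).
uint64_t poly_mac(uint64_t r, uint64_t s, const std::vector<uint64_t> &blocks) {
    assert(r < P61 && s < P61);
    uint64_t h = 0;
    for (uint64_t m : blocks) {
        assert(m < P61);               // each block must be a field element
        h = mulmod((h + m) % P61, r);  // h = (h + m) * r
    }
    return (h + s) % P61;
}
```

If the same (r, s) authenticated two different messages, subtracting the two tags would give a polynomial equation in r with known coefficients, and its roots would reveal r. That is the practical reason the key has to come from a fresh derivation.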


The only reason I'm aware of for rolling custom crypto would be fighting against automated attacks. Even ridiculously simple protection can prevent a script from automatically collecting the data. If there's a lot of data streams and each requires human interaction to decrypt it, it can significantly raise the cost of breaking such data. In that case it's probably best to make it just an additional layer over some well-tested method.


I'm not sure I understand the rationale. Automated attacks can't beat real crypto either -- so why even bother rolling your own? (And automated attacks can certainly see non-cryptographic encodings like Base64, Base32, ROT13...)


As one might expect[1], I have some thoughts.

There are a few "layers" of rolling your own crypto. Often, people think they're not rolling their own crypto because they're using AES instead of some hand-rolled bizarro cipher. This is not that: the primitives the post references are solid (e.g. BLAKE2b, ChaCha20, X25519).

There are definitely challenges with explaining crypto to programmers in a way that is non-scary and at the same time factually accurate; I walk that tightrope constantly with Crypto 101. There are significant factual inaccuracies in this post that matter and aren't just a temporary educational tool so you can get a concept across quickly. Someone on Reddit[2] already pointed out most of the ones that were glaring at me after a cursory review. To the author's credit, he's put that link up by the start of the post.

There are at least two kinds of crypto education for programmers. One is practical advice or libs to help people build better applications with crypto. This means building better tools. Don't "pick Ed25519" -- pick a library like libsodium that did signatures right for you. Saying "compose ChaCha20 and append a MAC" isn't the best level of advice we can strive for. Instead, "use Fernet" and possibly "use libsodium's secretbox" is (although I think having to specify a nonce may not be the greatest default -- I'm working on fixing that). The other kind of crypto education for programmers is to help people break things. That can help you become an expert, but I don't think we should expect any meaningful percentage of programmers will spend a bunch of time finishing all 8 sets of Cryptopals. [3] It is, however, a reasonable assumption that a (super)majority of programmers will at some point touch some crypto: the question is what they find when they do that. Will they have secure password storage and not even notice because that was the right default in the software they were using, or will they have to cobble together their own? I for one am hoping it's the former.

Fundamentally, there are two reasons why you don't want to roll your own crypto. One is because there's a good chance you'll mess it up. The other is because that's effort, and you are not going to do better than the programmers that already have done the hard work for you. This is sometimes true for primitives (don't generate your own FFDH prime/implement your own FFDH -- you will probably have small subgroups!), but especially true for high-level recipes at varying levels of complexity from authenticated encryption (you won't be better than OCB or secretbox) all the way to cryptographic ratchets.
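A minimal illustration of the small-subgroup issue with toy numbers: assuming a safe prime p = 2q + 1, a peer's public value y should be checked to lie in the subgroup of prime order q, i.e. 1 < y < p - 1 and y^q mod p == 1. The values p = 23, q = 11 and the function names are illustrative assumptions; real FFDH uses standardized groups of 2048 bits or more and a bignum library.

```cpp
#include <cassert>
#include <cstdint>

// Square-and-multiply modular exponentiation; m < 2^32 keeps products in 64 bits.
uint64_t modpow(uint64_t base, uint64_t exp, uint64_t m) {
    assert(m > 1 && m < (uint64_t{1} << 32));
    uint64_t result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}

// Accept a peer's DH public value only if it lies in the order-q subgroup
// of Z_p^*, where p = 2q + 1 is a safe prime.
bool valid_public_value(uint64_t y, uint64_t p, uint64_t q) {
    assert(p == 2 * q + 1);
    if (y <= 1 || y >= p - 1) return false;  // rejects 0, 1 and p - 1 (order 2)
    return modpow(y, q, p) == 1;             // y must have order q
}
```

With p = 23, the values 4 and 2 pass, while 22 (order 2) and 5 (which generates the whole group) are rejected. Accepting such values is what lets a malicious peer confine the shared secret to a tiny set of possibilities.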

Those reasons put together form my main criticism for this article. Sure, it has fundamental inaccuracies, they're not trivial, and that's not OK. But even if it was technically perfect, it doesn't help the next piece of software be safer.

[1]: I am a cryptographer. I did https://www.crypto101.io. I am Latacora's crypto nerd.

[2]: https://www.reddit.com/r/programming/comments/5iv1ti/rolling...

[3]: http://cryptopals.com


Not rolling your own crypto makes sense given that one is unlikely to fully comprehend the potential failures. But then does that not also mean that rolling one's own crypto (with full intention to never use it on anything 'real') and learning at least a little more about potential failures will aid understanding when using crypto?

For example, naive me once wanted to use BCrypt as a signature algorithm for a web-based game when exchanging game state information, to prevent cheating by allowing the server to verify game state transitions. For the purposes at hand, it seemed like a strong enough cryptographic hash (the server handled the signing between clients). The only problem was that BCrypt ignores everything after the first 72 bytes of its input. Since the headers alone filled those 72 bytes, any game state after them was signed as valid. Everything.
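The failure is easy to reproduce without bcrypt itself. In this sketch, a hypothetical `truncating_tag` mimics bcrypt's 72-byte input limit, with `std::hash` standing in for the real function. `std::hash` is not cryptographic and is used here only to show that inputs differing after byte 72 produce the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// bcrypt only uses the first 72 bytes of its input.
constexpr std::size_t kBcryptMaxInput = 72;

// HYPOTHETICAL stand-in for "sign the game state with bcrypt": only the first
// 72 bytes influence the result. std::hash is NOT cryptographic; it merely
// demonstrates the effect of the truncation.
std::size_t truncating_tag(const std::string &headers_and_state) {
    return std::hash<std::string>{}(headers_and_state.substr(0, kBcryptMaxInput));
}
```

With 80 bytes of headers, `"score=10"` and `"score=999999"` get identical tags. For authenticating messages, a MAC such as HMAC computed over the full message is the tool designed for the job.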


Rolling your own crypto for learning is different from rolling your own crypto for deployment. The former is good; the latter is a most-definite no-no unless you have training.


Thank you. I have since updated my article in the hope of making it less dangerous. I think it addresses most of your objections. Could you review it again?

Thanks.


edit: too late to delete?


I feel like no discussion of rolling one's own crypto is complete without a link to Renaissance artist Pieter Bruegel the Elder's painting of the subject (c. 1562): http://classicprogrammerpaintings.com/post/148027314949/we-r...


I wrote my own crypto, here it is: https://github.com/gioblu/Cape - can you break it? I do think it is necessary to write your own crypto to understand what "crypto" is.


I'd have one objection: the Readme doesn't say anything about the underlying primitives (are you using something standard or did you invent your own cipher?), and doesn't say anything about external validation (are you alone so far, or did your code get reviewed?). If it's a learning exercise, warning wannabe users would be nice.

That said, your code looks too simple not to be easily breakable. I'm no expert, but that piqued my interest. I'll take a look. If I break it, I'll write an article about it.


Okay, I think I have cracked it. Haven't tested it, but a qualitative reasoning should be enough to get you going.

I concentrated my efforts on your Cape::encrypt() method. Inlining Cape::crypt() and simplifying the resulting code gives something like this (it appears some computations cancelled each other):

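  void Cape::encrypt(char *source, char *destination, uint8_t length)
  {
      uint8_t iv = this->generate_IV();

      // Reordering the key is the same as using a different key:
      // one (reordered) key byte per message byte.
      char real_key[255];
      for (uint16_t i = 0; i < length; i++)
          real_key[i] = _key[(i ^ _reduced_key) % _key_length];

      // Keystream byte = iv ^ index ^ key byte.
      for (uint16_t i = 0; i < length; i++)
          destination[i] = source[i] ^ iv ^ i ^ real_key[i];

      // The IV is appended, masked only by _reduced_key,
      // so destination needs room for length + 1 bytes.
      destination[length] = iv ^ _reduced_key;
  }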

Now the weaknesses of your algorithm are clear.

First, I don't believe your IV is truly randomly generated. I don't know of a real CSPRNG on Arduino, so… probably not your fault.

Second, your IV is only one byte. You cannot securely send more than 256 messages with the same key.

Third, the attacker can get rid of the IV anyway: since it is given at the end of the ciphertext, I can just XOR it with the rest of the ciphertext before trying to crack it. This makes your IV irrelevant. You are now limited to one message per key. In practical terms, you don't do better than a one time pad.

Fourth, the way you access the key in a weird order doesn't matter. It's the same as using a different key. Assuming your key is properly unpredictable in the first place, there is no need to access it in a weird order. I made this clear by reordering the key in a real_key.

Fifth, XORing the message with the index doesn't buy you anything: the attacker will just reverse it before trying to crack the rest.

Sixth, if your key is shorter than the message, you have a repeating key. Repeating keys have been known to be broken for a long time. The reason is that XORing parts of the ciphertext encrypted with the same key reveals the XOR of the corresponding parts of the plaintext, and that is easily broken in practice. So you're forced to use keys as long as the message.
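The weaknesses above can be checked concretely. Below is a hypothetical free-function port of the simplified encrypt; the names `toy_encrypt` and `strip_iv_and_index` are made up for illustration, `rk` stands for `_reduced_key`, and the key is assumed to repeat when shorter than the message. Stripping the trailing byte and the index leaves m[i] ^ key[i % n] ^ rk, so two messages under the same key leak m1 ^ m2, and a key shorter than the message leaks m[i] ^ m[i + n].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Toy model of the simplified Cape::encrypt (hypothetical free function).
// Output: c[i] = m[i] ^ iv ^ i ^ key[i % n], followed by one byte iv ^ rk.
Bytes toy_encrypt(const Bytes &m, const Bytes &key, uint8_t iv, uint8_t rk) {
    assert(!key.empty() && m.size() <= 255);
    Bytes c;
    for (std::size_t i = 0; i < m.size(); i++)
        c.push_back(static_cast<uint8_t>(m[i] ^ iv ^ i ^ key[i % key.size()]));
    c.push_back(static_cast<uint8_t>(iv ^ rk));
    return c;
}

// Attacker: XOR the trailing byte into the others (the IV cancels out),
// then undo the public index. Result: out[i] == m[i] ^ key[i % n] ^ rk.
Bytes strip_iv_and_index(const Bytes &c) {
    assert(!c.empty());
    uint8_t tail = c.back();
    Bytes out;
    for (std::size_t i = 0; i + 1 < c.size(); i++)
        out.push_back(static_cast<uint8_t>(c[i] ^ tail ^ i));
    return out;
}
```

Encrypting "attack!!" and "retreat!" under the same 4-byte key with different IVs, the stripped outputs XOR to exactly the XOR of the two plaintexts, and within one message bytes i and i + 4 XOR to the XOR of the corresponding plaintext bytes.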

Conclusion: you have a fancy one time pad. It is secure if your key is as long as the message, and you use it only once. I have to say your home-made crypto is not completely broken. But… a simple one time pad would have achieved the same results more simply.

A word of (dangerous) advice: you won't accomplish much with XOR alone. Modern ciphers also tend to rearrange the order of the bits. If I may, we have a real cryptographer here, and he has written a course: https://www.crypto101.io/


SC4 is a PGP replacement designed to be simple and easy to use:

https://github.com/Spark-Innovations/SC4

Some (but not all) of the code has been audited. Based on DJB's TweetNaCl.


I like how the author organizes 'levels' of rolling cryptography, and generally agree with his assessment of how those strata should be organized.


The idea that anyone would even want to roll their own crypto makes no sense to me. It would be like rolling your own linear algebra routines instead of using BLAS, or rolling your own numeric optimizer instead of using BFGS (or whatever other optimizer makes sense). Crypto is no less esoteric and no less mathematical.


Rolling your own crypto - how about "don't"


> Erratum: Apparently, I don't know what I'm doing either. I committed a number of approximations, misdirections, and even errors in there. Some of them where pointed out by /u/sacundim on Reddit. If you thought you could trust me, read that comment. It's sobering.

LOL brilliant. That was fast.


A lot of those criticisms are sort of unfair. For example, pointing out length extension attacks in a fabricated MAC construction not found in the article, especially when the OP explicitly recommends blake2b, which is not vulnerable to length extension (and also disclaims "How that magical MAC is built is beyond the scope of this paragraph.").

Another example: criticizing that he mentions appending a MAC (you should choose an AEAD!)... when later he recommends using an AEAD.
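For readers unfamiliar with the length extension attack mentioned above: with a Merkle-Damgård hash such as SHA-256 (BLAKE2b is not built this way), the naive MAC H(key || msg) can be extended without the key, because the tag is the hash's entire internal state. The sketch below shows the structure with a deliberately tiny, insecure toy hash; all function names are hypothetical, and the attacker is assumed to know the key's length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Deliberately insecure toy compression function: absorbs one 8-byte block
// into a 64-bit state. It only has the Merkle-Damgard *shape*.
uint64_t compress(uint64_t h, const std::string &block) {
    for (unsigned char ch : block) {
        h ^= ch;
        h *= 0x100000001b3ULL;
        h = (h << 13) | (h >> 51);
    }
    return h;
}

// MD strengthening: 0x80, zero bytes, then the 8-byte big-endian total length
// in bytes, so that the padded message is a multiple of 8 bytes.
std::string md_padding(std::size_t total_len) {
    std::string pad(1, '\x80');
    while ((total_len + pad.size()) % 8 != 0) pad.push_back('\0');
    for (int i = 7; i >= 0; i--)
        pad.push_back(static_cast<char>((static_cast<uint64_t>(total_len) >> (8 * i)) & 0xFF));
    return pad;
}

// Continue hashing `data` from state h, given that prefix_len bytes (a multiple
// of 8) were already absorbed. With h = IV and prefix_len = 0 this is plain hashing.
uint64_t toy_md_from(uint64_t h, std::size_t prefix_len, const std::string &data) {
    assert(prefix_len % 8 == 0);
    std::string padded = data + md_padding(prefix_len + data.size());
    for (std::size_t i = 0; i < padded.size(); i += 8)
        h = compress(h, padded.substr(i, 8));
    return h;
}

constexpr uint64_t kToyIV = 0xcbf29ce484222325ULL;

// Naive secret-prefix MAC: tag = H(key || msg). Broken for MD-style hashes.
uint64_t naive_mac(const std::string &key, const std::string &msg) {
    return toy_md_from(kToyIV, 0, key + msg);
}

// Attacker knows msg, its tag and the key *length*, but not the key.
// Returns (suffix, tag) such that tag == naive_mac(key, msg + suffix).
std::pair<std::string, uint64_t> extend(const std::string &msg, uint64_t tag,
                                        std::size_t key_len, const std::string &ext) {
    std::string glue = md_padding(key_len + msg.size());
    std::size_t absorbed = key_len + msg.size() + glue.size();  // multiple of 8
    return {glue + ext, toy_md_from(tag, absorbed, ext)};
}
```

The forged message is msg followed by the padding "glue" and the attacker's extension. A hash that does not expose its full state as output, or a proper MAC construction such as HMAC, does not allow this.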


AEAD normally only means actual AEAD constructions (think AES-GCM/OCB, ChaCha20-Poly1305), not combinations of primitives providing authentication (= EtM, e.g. first encrypt with AES-CTR, then MAC with an HMAC).

But yeah, some of the criticism reads kinda nit-picky / "I'm going to find false things in here". But on the other hand he provides good hints how the article could be improved.


The article recommends Chacha20 Poly1305, not some combination of primitives.


What do you mean by this then?

> Another example: criticizing that he mentions appending a MAC (you should choose an AEAD!)... when later he recommends using an AEAD.


There is a section called "integrity: not optional", where he uses MACs (which I would consider to be conceptually simpler to understand than AEADs) to explain how integrity can be implemented.

Later, in a section called "Level 2: choosing crypto", he recommends using ChaCha20 Poly1305.

Anyway, this isn't really my crusade, I just think that if you read the informal blog post as an informal blog post, it's fine.


I think the criticism is still valid: combining a MAC and cipher into an AE scheme (especially an AEAD scheme) is anything but trivial. For example, I don't think it's reasonable to suggest someone would, given those two tools, end up with the moral equivalent of secretbox but with ChaCha20 replacing XSalsa20. (At least there's not a padding timing differential for MtE, but still.)


> A lot of those criticisms are sort of unfair.

Possible, but my readers aren't fair either. If I phrase stuff the wrong way, someone might go off and do something stupid.

In a similar vein, someone who stops reading at level 1 will miss my AEAD recommendation at level 2. Were they asking for it, or should I have put the recommendation earlier? Tough call.


There is no fairness in cryptography, only math.


I'm glad he put that up at the top instead of trying to hide it. The moral of the story appears to be indeed: Don't roll your own crypto.


I think rolling your own crypto, as a learning exercise, can be very helpful when choosing which trusted crypto solutions to use for a particular use case.


Breaking crypto is certainly an excellent way to learn many kinds of crypto. The problem is that writing your own typically isn't enough by itself -- a lot of horribly broken crypto looks fine after casual inspection. You probably won't find bugs like CBC padding oracles or Bleichenbacher attacks all by yourself. You can, however, absolutely _implement_ them yourself. I recommend cryptopals, together with my own free book, Crypto 101 if you prefer a little prose versus trying to figure it out yourself.


I draw a different conclusion: don't roll crypto on your own. https://www.reddit.com/r/programming/comments/5iv1ti/rolling...

I have updated my article since, and also made clear you can still screw up even if you had help.


> Erratum: [...] Some of them where pointed out [...]

And Muphry's Law[0] strikes again!

[0] https://en.wikipedia.org/wiki/Muphry's_law


If nothing else, this was a hell of a lesson in just how hard rolling your own crypto really is.


The four most dangerous words in CS.

But it's good that people look into it -> for fun and learning. Just not for realsies.


There's serious pushback against rolling your own crypto. Your attitude towards it is good, but the hivemind-y attitude is to never do it under any circumstances. I'm sure that attitude prevents many people from experimenting and becoming crypto experts in their own right.

Roll your own crypto, but be aware of the stakes, and be prepared to drop it the second that the stakes become real.


>There's serious pushback against rolling your own crypto. Your attitude towards it is good, but the hivemind-y attitude is to never do it under any circumstances. I'm sure that attitude prevents many people from experimenting and becoming crypto experts in their own right.

It's not a hivemind-y attitude, it's Software Engineering.

If you cannot show that a bridge will support a given weight, you do not build the bridge. Period. You learn that in your first Engineering course, freshman year. It's part of the responsibility of being an Engineer.

You can experiment with new bridge designs all you want, but until you can prove mathematically that the bridge is sound, you do not build the bridge. If you really want to roll your own bridge over the ditch in your backyard, go for it, but think three times before inviting your friend to drive their truck over it, especially if you're not already an expert on bridges.

Cryptography is not a small or simple subject. If you are not an expert in the field, it is incredibly unlikely that you will know enough to critique your own work. Instead of starting out by rolling your own, start by studying.


That's a reductionist ("Period.") and a strawman position. It's unfortunate that you invoked the responsibility of the engineer in your argument: I went to engineering school. I have two degrees in mechanical engineering from a top school, I've published peer reviewed papers, and have helped design hybrid cars for companies you'd recognize. I qualify as an engineer under every definition you can think of, understand firsthand the responsibilities of an engineer, have gone through rigorous safety training in order to test drive the cars that I've personally designed and built.

And yet I still say that nothing you said here is at odds with my statement: know the stakes. But play anyway.

You think engineers don't build wacky and dangerous shit for their friends to play with? Prototypes that stretch the limits of design and would be irresponsible to mass produce as-is? We do it all the time. It's part of the learning process. I've been physically hurt, shocked, burnt both by heat and chemicals, while working with other engineers' experiments. AND IT'S OK! It's ok to try your own solutions on your friends and peers while the stakes are low. Studying is only one half of learning. The other half is bringing textbook knowledge to the real world by building and prototyping and getting your peer group to throw hammers at it and see what happens. Engineering is not theory. Engineering is theory applied to the real world. You can't be an engineer if you don't have one foot on each side. Study on one side, build and play on the other.

It definitely is a hivemind attitude, because the literal hundreds of other engineers that I know -- people who design actual bridges, and buildings, and medical devices, and cars, and oil rigs, and theater sets, things that hundreds of millions of people rely on to be safe -- don't share this attitude. Their attitude is: know the stakes. Most software projects go nowhere and are usually only used by a handful of people close to the developer. That's the definition of low stakes. And it's ok to experiment, and to roll your own crypto, when the stakes are low. I think it's damaging, and probably hubristic, to think that the stakes are high all the time. They're not.


The thing is, rolling your own crypto can seem reasonable, and you won't know how it's broken, until someone breaks it.

You don't become a "crypto expert" by experimenting with crypto; it's not an API or a library. You learn crypto by studying it. People have already made mistakes in the past, so why not learn from them instead of repeating them? Especially in the high-stakes situation that is causing you to use crypto?


The entire crux of my argument is "know the stakes". And you're forcing the argument away from the general case to the high-stakes case. Not all crypto is used in high stakes situations, and you don't need a high stakes situation in order to use it.


Uhm, you should roll your own crypto. Just make sure it's not the only layer of crypto -- i.e., that there's a standard layer too.

Not going to rehash the arguments, but this was discussed recently: https://news.ycombinator.com/item?id=13199471


Yup. That's the answer - that's what XOR is for, so that you don't have to encrypt twice (or more) serially, but in parallel. Note too, that encrypting a pad can be done in advance, with spare cycles, at both ends, for many methods, then XOR with the pad extremely quickly to communicate. In the Vietnam war the North Vietnamese army used both a one-time pad layer and a book code layer. But without computers it was too cumbersome a method and most messages therefore went out in the clear.


It's actually not what I was thinking of (I was in fact thinking of encrypting sequentially and hashing+XOR in parallel), but that's a good point, that might work too!

I'd need to think about it a bit more to convince myself it doesn't have any significant downsides though; it's less obvious to me. For one thing, requiring an OTP approach seems to constrain the set of encryption schemes you can use. (e.g. imagine an algorithm that reverses all the bits of every block before doing anything else.) Furthermore, if your custom encryption scheme happens to leave any kind of "watermark" on the ciphertext that makes it obvious it wasn't something standard like AES, applying the standard layer last will prevent the attacker from realizing you have a custom layer at all, until the standard layer is broken. Whereas if you use the OTP+XOR scheme, it might become obvious something else is going on too.

That said, this approach might actually be stronger than the layered approach, so it might actually be better. Need to think about it more :)


I'll have to look back to see if I in fact have added to the fundamental meaning of what others have called "layering." The pad doesn't absolutely have to be an OTP, particularly when rolling your own layer, if the pad isn't the only key, and the key can change. Watermarks are extinguished when XORed with any genuinely decent (random-imitating, not frequency-bustable) encryption, true. And you're right, at least one layer should not have a watermark. Order doesn't matter, though: there really isn't an order for the layers, since they're merged with one another. A not-very-good roll-your-own encryption scheme, if it introduces a very large number of permutations (such as starting pad, key), can be very helpful as a layer even though it would be a huge fail as stand-alone encryption. Mechanisms to increase or adjust the size (of all but one layer, anyway) are advisable.



