What does "don't build your own crypto" even mean any more?
I originally thought it meant "don't implement AES/RSA/etc algorithms yourself"
But now it seems to mean "pay auth0 for your sign in solution or else you'll definitely mess something up"
As an example, we have a signing server. Upload a binary blob, get a signed blob back. Some blobs were huge (multiple GB), so the "upload" step was taking forever for some people. I wrote a new signing server that just requires you to pass a hash digest to the server, and you get the signature block back which you can append to the blob. The end result is identical (i.e. if you signed the same blob with both services, the results would be indistinguishable). I used openssl for basically everything. Did I roll my own crypto? What should I have done instead?
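A minimal sketch of the client side of that change: hash the multi-GB blob locally in fixed-size chunks, so only the digest travels to the server. It assumes SHA-256 (the post doesn't say which hash the signing service expects), and `digest_stream`/`digest_file` are hypothetical helper names.

```python
import hashlib
from typing import BinaryIO


def digest_stream(f: BinaryIO, chunk_size: int = 1 << 20) -> bytes:
    """SHA-256 of a binary stream, read 1 MiB at a time so large blobs never sit in memory."""
    h = hashlib.sha256()
    # iter(callable, sentinel) keeps calling f.read until it returns b"" (EOF).
    for chunk in iter(lambda: f.read(chunk_size), b""):
        h.update(chunk)
    return h.digest()


def digest_file(path: str) -> bytes:
    with open(path, "rb") as f:
        return digest_stream(f)
```

Whether signing this digest gives a byte-identical result to signing the whole blob depends on the signature scheme: it holds only when the scheme itself signs a hash of the message computed with that same hash function.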
It used to mean "use AES instead of rolling your own form of symmetric encryption." Then it became "use a library for AES instead of writing the code." It has now reached the obvious conclusion of "don't do anything vaguely security-related at all unless you are an expert."
No it hasn't. The subtext of "don't roll your own crypto" is that "AES" doesn't do, by itself, what most developers think it does; there's a whole literature of how to make AES do anything but transform 16 unstructured bytes into 16 different bytes indistinguishable from random noise; anything past that, including getting AES to encrypt a string, is literally outside of AES's scope.
The shorter way of saying this is that you should not use libraries that expose "AES", but instead things like Sodium that expose "boxes" --- and, if you need to do things that Sodium doesn't directly expose, you need an expert.
Contra what other people on this thread have suggested, reasonable security engineers do not in fact believe that ordinary developers aren't qualified to build their own password forms or permissions systems; that's a straw man argument.
Reasonable developers are qualified to do those things. But to build a full-featured authentication subsystem for their webapp? If it's something that holds any kind of reasonably private info, I'm not so sure.
Sure, a reasonable developer will use some sort of PBKDF to hash passwords. But when users need a password reset over email, will they know not to store unhashed reset tokens directly in the database? Will they know to invalidate existing sessions when the user's password is reset? Will they reset a browser-stored session ID at login, preventing fixation? And on and on and on. The answer to some of these questions will be yes, but most developers will have a few for which the answer is no. Hell, I've probably built more auth systems than most (and have reported/fixed a few vulnerabilities on well-known open-source auth systems to boot) and I'm honestly not sure I'd trust myself to do it 100% correctly for a system that really mattered.
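As one concrete instance of the first question above, here is a minimal sketch of storing only a hash of a password-reset token. The in-memory `RESET_TOKENS` dict stands in for a database table, and the function names and 15-minute lifetime are assumptions for illustration. A fast hash is acceptable here (unlike for passwords) because the token is 256 bits of CSPRNG output, so there is nothing to guess.

```python
import hashlib
import secrets
import time
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for a DB table: sha256(token) -> (user_id, expiry timestamp)
RESET_TOKENS: Dict[bytes, Tuple[str, float]] = {}


def issue_reset_token(user_id: str, ttl_seconds: int = 900) -> str:
    token = secrets.token_urlsafe(32)  # 256 random bits; emailed to the user, never stored
    token_hash = hashlib.sha256(token.encode("utf-8")).digest()
    RESET_TOKENS[token_hash] = (user_id, time.time() + ttl_seconds)
    return token


def redeem_reset_token(token: str) -> Optional[str]:
    """Return the user_id if the token is valid; each token works at most once."""
    token_hash = hashlib.sha256(token.encode("utf-8")).digest()
    entry = RESET_TOKENS.pop(token_hash, None)  # pop: single use, even if expired
    if entry is None:
        return None
    user_id, expires_at = entry
    if time.time() > expires_at:
        return None
    return user_id
```

A leaked copy of the table then yields hashes that can't be turned back into working reset links. On success, the caller still has to set the new password and, per the second question, invalidate the user's existing sessions.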
Even outside of "holding the crypto wrong", these things have sharp edges and the more you offload to an existing, well-vetted library the more likely you are to succeed.
Having worked at a competitor, I’ll simply say I know nothing specifically of Okta’s hiring practices, but it’s bold of you to assume they’re hiring any security engineers whatsoever. Still, this was twelve years ago, when I was at the competitor in question, so all the companies involved were much smaller than they are today. But quite literally zero people at that particular competitor knew or even particularly cared about security.
Or the companies have figured out it's more expensive to hire a full time professional cryptographer to vet the process, than to deal with the PR for now more or less normalized data breaches.
There are different levels of security. The problem with the "confidently incorrect" level is that it's worse than nothing. Having plaintext tokens (or even passwords) in a DB has its risks, but those risks are (a lot more) obvious; they are not shadowy bugs lying in wait under heaps of code.
In short, in at least one variation, the attacker is able to smuggle a known (unauthenticated) session token into the victim's browser. Once the victim logs in, the session token is authenticated and known to the attacker.
The easy countermeasure is to renew the session token on login and not reuse a previously unauthenticated session token. Or your application has no session at all before login.
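The countermeasure is small enough to sketch. This assumes a server-side session store (the `SESSIONS` dict is a hypothetical stand-in) and that `login` is called only after the password has already been verified:

```python
import secrets
from typing import Dict, Optional

# Hypothetical server-side session store: session ID -> session data
SESSIONS: Dict[str, dict] = {}


def new_session(data: Optional[dict] = None) -> str:
    sid = secrets.token_urlsafe(32)  # unguessable, freshly generated ID
    SESSIONS[sid] = dict(data or {})
    return sid


def login(presented_sid: Optional[str], user_id: str) -> str:
    """Call after credentials check out; returns the ID to set in the session cookie."""
    # Never promote the ID the browser arrived with: an attacker may have planted it.
    if presented_sid is not None:
        SESSIONS.pop(presented_sid, None)
    return new_session({"user_id": user_id})
```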
I am not a cryptography expert, but even I knew it was wrong that a company I worked at stored passwords in plaintext. My boss told me it was because customers call in to ask for their password and support happily gives it to them.
When I rolled my eyes, I was soon terminated.
Reasonable security engineers take a reasonable position on this. Many other developers (usually uninformed ones) believe this now means "don't make an auth system." Like it or not, this has become a sort of adage that people cargo-cult.
If you can't keep the line between "ordinary developers shouldn't be working with AES" and "ordinary developers can write login systems" clear, you're doing people a disservice, because the two assertions are not alike in how fundamentally true they are.
There's nothing personal about any of this, but I'm watching both of the insane poles of this argument ("I'll be just fine inventing my own block cipher modes" guy, "Only a certified security engineer can write a permissions check" guy) go at each other, and wondering why we need to pay any attention to either of them.
I haven't so much seen the latter kind of guy, the one saying you need a certified professional to safely output-filter HTML or whatever, but I see "lol block cipher modes whatever" people on HN all the time, on almost every thread about cryptography, dunking on anyone who says "don't roll your own cryptography".
FWIW I see a lot more of the "lol I rolled my own block cipher" crowd in articles here, but a lot more of the "only experts should verify a signature" crowd in the professional world (mostly startup-biased recently).
I sort of assume that the latter is a random sampling of software engineers but the former is sampled specifically for idiots who YOLO their crypto.
Only experts should verify signatures! That's actually hard.
If you're using minisign or Sigstore or whatever to do something like verify upstream dependencies, sure. But if you're building a system that is about some novel problem domain involving signatures: you should get an expert to verify your system. The trail of dead bodies here is long and bloody.
Yeah, coming up with your own signature scheme to verify authenticity with certain properties is very difficult, but also very different in kind from running a pre-designed scheme using a library built by someone else.
I think you have a predefined notion of what I meant by the word "signature" (perhaps something about software signing or SSL certificates). I literally meant "checking that HMAC-SHA-2 of some bytes is what you expect" (using a library for the hash). Incidentally, that's how you authenticate an HMAC-signed JWT.
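For the HMAC case, here is a minimal stdlib sketch of checking an HS256 JWT's signature. It is illustrative only: it assumes the server pins the algorithm to HS256, and it skips claim checks such as `exp` and `aud`, which a maintained JWT library would also handle. The comparison uses `hmac.compare_digest` so that it doesn't leak timing information.

```python
import base64
import hashlib
import hmac
import json


def _b64url_decode(segment: str) -> bytes:
    # JWTs use unpadded base64url; restore the padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_hs256(token: str, key: bytes) -> dict:
    """Return the payload if the HS256 signature checks out; raise ValueError otherwise."""
    header_b64, payload_b64, sig_b64 = token.split(".")  # ValueError if not 3 parts
    header = json.loads(_b64url_decode(header_b64))
    # Pin the algorithm server-side; never let the token choose it (e.g. "none").
    if header.get("alg") != "HS256":
        raise ValueError("unexpected alg")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(payload_b64))
```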
I recently came across a signature check that was (correctly) checking the signature against a public key... The issue was the public key itself was unauthenticated, and provided by the (signed) ciphertext itself... Meaning the crypto was fine, but it wasn't checking anything meaningful, as any rogue message would just include its own public key in the message!
It's not only about the raw operation of checking bytes are equal (hopefully in a constant time manner, if applicable), but also about ensuring the desired security properties are actually present in the application!
> If you're using minisign or Sigstore or whatever to do something like verify upstream dependencies, sure.
No, not "sure". minisign and Sigstore are completely different things, and one needs to understand how those work and what the corresponding signatures mean: A minisign signature says "the owner of that key signed this". Whoever the owner might be at the moment of signature. Whenever that might have happened.
A sigstore signature says "Google/Github/... says that this OpenID account generated that one key that the sigstore CA attests and that signed that blob at that time". Time/ordering verification is better than in minisign, because it is there, even if through a trusted party. But identity verification relies on a trusted third and fourth party, some of whom do have a history of botching their OpenID auth.
Those are not equivalent, and knowledge is needed to not mix them up. You don't need to be an expert, but you should try to understand as much as you can, identify what you don't understand and where you must trust an expert, implementation, company, recommendation or standard to be correct and trustworthy. And then decide if that is ok with you.
eh, as someone who rolls their own login systems and depends on checking salted password hashes against those in a database: assuming I block brute-force attempts after 3 or 4 tries, is there anything I need to worry about here besides someone getting access to the database or the salt? I think your point was that people building logins off one-way hashes aren't really doing anything outrageously wrong?
You'll want to look at your hashing algorithm and number of iterations to make sure they are up to snuff, and you'll want to make sure that your salts are unique and properly random. (I recently learned that the PS3 private keys were recoverable because Sony's random number function returned a fixed result.)
Proper password hashing protects against a different threat than blocking brute-force login attempts (which isn't really an encryption issue). Password hashing is more about protecting your users' passwords if your database is breached, so that you limit the attacker's ability to reuse those credentials on other services where the user may have reused them.
I mean to start with if you are literally using "salted hashes" you are boned. There's a whole literature of "password hashes" ("password KDFs"), and that literature exists for a reason; the papers are a record of the weaknesses found in "salted hashes", a construction dating back to the late 1970s.
But once you adopt something like BCrypt or scrypt or PBKDF2 or Argon2 (literally throw a dart at a dartboard), you're into the space of things where you should, as a competent engineer, be expected to figure out a sound system on your own. The only domain-specific knowledge you really need given to you by a cryptographer is "don't just use salted hashes".
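Taking the scrypt dart as an example, here is a minimal sketch using Python's built-in `hashlib.scrypt`. The cost parameters (N=2**14, r=8, p=1) and the function names are assumptions to tune for your own hardware, not values anyone in this thread prescribed.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Assumed scrypt cost parameters (about 16 MiB of memory per hash); tune for your latency budget.
N, R, P = 2**14, 8, 1


def hash_password(password: str) -> Tuple[bytes, bytes]:
    salt = secrets.token_bytes(16)  # unique, CSPRNG-generated salt per password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=32)
    return salt, key


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=32)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison
```

In practice you would also record which algorithm and parameters produced each hash so they can be raised later; PHP's `password_hash()`, mentioned further down, does that for you.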
Mostly, here in 2025 the question is: "Why are you using shared secrets?" - which is why this discussion comes up. These are never about a PAKE or the PIN for a Yubikey or something, it's always shared secrets when people say "passwords".
This is a terrible stopgap solution, in 1975 it's reasonable to say we have other priorities right now, we'll get to that later. But it's very silly that in 2025 you're telling people to try PBKDF2 when really they shouldn't even be in this mess in the first place.
Remember when Unix login still hadn't solved this? 1995. That was last century. And yet, here we are, still people are writing "Password" prompts and thinking they're doing a good job.
Incidentally, a presentation at the NIST crypto reading group recently pointed out some of the weaknesses of current password KDFs: Argon2i has a viable attack, bcrypt truncates input, PBKDF2 needs an utterly insane number of rounds, and scrypt and argon2d/argon2id have data-dependent computation time and seem to be vulnerable to side-channel attacks. It turns out this is a really hard problem.
In all seriousness, can you please stop recommending bcrypt for new systems? The input length is hard limited to a frighteningly short string and it is not a KDF since it has a fixed output length. Its internal state is small enough that it can be efficiently cracked with FPGAs (although the structure is unsuitable for GPU-based crackers).
The distinction between a KDF and a hash function is that a KDF has an output length that is configurable so you can directly use its output to generate a cryptographic key with 100% of the entropy of the input. A KDF theoretically needs to have arbitrary input length and arbitrary output length. Bcrypt has neither.
Argon2d is great, scrypt is great, PBKDF2 with 1000000+ rounds works fine if you want FIPS, but bcrypt is somehow still on many people's list as though it doesn't have these flaws. It's not at the point where it needs to be imminently replaced in a legacy system, but if you're building something new, you should use something better.
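To make the KDF point concrete with the FIPS-friendly option: Python's `hashlib.pbkdf2_hmac` takes both an iteration count and an output length (`dklen`). A minimal sketch, where `derive_key` is a hypothetical helper and 1,000,000 iterations mirrors the figure above:

```python
import hashlib


def derive_key(password: str, salt: bytes, length: int, iterations: int = 1_000_000) -> bytes:
    """PBKDF2-HMAC-SHA256 with a caller-chosen output length in bytes."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=length)
```

One PBKDF2 quirk worth knowing: output is produced in 32-byte blocks (for SHA-256), each costing the full iteration count, and a shorter output is simply a prefix of a longer one. Asking for more than 32 bytes therefore raises the defender's cost without raising an attacker's, since checking the first block is enough.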
I'm still in the dark about the problem with storing a salted hash, assuming a decent hash function and a decently long random salt.
Simply replying "everyone knows what the problem is and has known since the 70s" is not as helpful as "here is the problem in three sentences".
You can assume a base level of knowledge in your answer, as I have worked in encryption for a little bit, having written the 3DES implementation for some firmware, and done EMV development.
The very short version is that decent hash functions are very, very fast, take almost no storage to compute, and can thus easily be run in parallel. If you store `(salt, HMAC-SHA2-256(salt, password))` in your database and it leaks, an attacker can check hundreds of millions of possible passwords every second. The password ‘open sesame’ will get found pretty early, but so will ‘0p3n#s3s4m3$’. Pretty much anything which is not of the form ‘62Jm1QyJKGIZUpb5zARRLM’ will get found pretty quickly.
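A small demonstration of why the salt doesn't save a fast hash: given one leaked `(salt, hash)` row, the attacker simply reuses that row's salt for every guess. This is a toy sketch; the three-word list and the `mangle` rules are made up for illustration, whereas real cracking tools run enormous wordlists and rule sets.

```python
import hashlib
import hmac
from typing import Iterable, List, Optional


def fast_salted_hash(salt: bytes, password: str) -> bytes:
    # The construction described above: HMAC-SHA2-256 keyed with the salt. Fast by design.
    return hmac.new(salt, password.encode("utf-8"), hashlib.sha256).digest()


def mangle(word: str) -> List[str]:
    # Toy "leetspeak" rules: spaces -> '#', o/e/a -> 0/3/4, optional trailing symbol.
    leet = word.replace(" ", "#").translate(str.maketrans("oea", "034"))
    return [word, leet, leet + "$", leet + "!"]


def crack(salt: bytes, target: bytes, guesses: Iterable[str]) -> Optional[str]:
    """What an attacker does with a leaked row: hash each guess with that row's salt."""
    for guess in guesses:
        if hmac.compare_digest(fast_salted_hash(salt, guess), target):
            return guess
    return None
```

Swapping `fast_salted_hash` for one of the deliberately slow functions discussed next multiplies the cost of every single guess, which is the whole point.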
You want something that is slow and takes a lot of resources to run. PBKDF2 was an early attempt which uses lots and lots of CPU, but it doesn’t use a lot of space. scrypt, bcrypt and Argon2 use lots of CPU and also lots of space in an attempt to make it more expensive to run (which is the point: you want the expected cost of finding a password to be more than the expected value of knowing that password).
That’s one level of issue.
The next level of issue is that when you run a simple system like that, the user shares his password with you so you can check that it’s valid. This sounds fine: you trust yourself, right? But you really shouldn’t: you make mistakes, after all. And you may have employees who make mistakes or who are malicious. They might log the password the user sent. They might steal the user’s password. Better systems use a challenge-response protocol in which you issue a one-time challenge to the user, the user performs some operation on the challenge and then responds with a proof that he knows his secret.
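To show the shape of the idea, here is a toy challenge-response exchange built only from HMAC. It is a sketch under loud assumptions: both sides derive the same key from the password with PBKDF2 (`derive_shared_key` is a hypothetical helper), and the server keeps a set of outstanding nonces. It also illustrates one of the "own issues" mentioned next: the password never crosses the wire, but the server's stored key is password-equivalent, so a database leak still lets an attacker log in. Real systems use a PAKE or public-key credentials rather than this.

```python
import hashlib
import hmac
import secrets
from typing import Set


def derive_shared_key(password: str, salt: bytes) -> bytes:
    # Hypothetical: client and server derive the same key; the server stores only this.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class Server:
    def __init__(self, stored_key: bytes) -> None:
        self.stored_key = stored_key
        self.pending: Set[bytes] = set()

    def challenge(self) -> bytes:
        nonce = secrets.token_bytes(32)  # fresh, single-use challenge
        self.pending.add(nonce)
        return nonce

    def verify(self, nonce: bytes, response: bytes) -> bool:
        if nonce not in self.pending:  # unknown or already used: rejects replays
            return False
        self.pending.discard(nonce)
        expected = hmac.new(self.stored_key, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


def client_respond(password: str, salt: bytes, nonce: bytes) -> bytes:
    # Proof of knowledge of the password-derived key; the password itself is never sent.
    key = derive_shared_key(password, salt)
    return hmac.new(key, nonce, hashlib.sha256).digest()
```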
But those have their own issues! And then there are issues with online systems and timing attacks and more. It’s a very difficult problem, and it all matters what you are protecting and what your threat model is.
> But those have their own issues! It’s a very difficult problem, and it all matters what you are protecting and what your threat model is.
it's all a trade-off - those challenge/response systems are better, but they also have more moving parts. There are more bits in the system to go wrong.
When there are only two pieces of the system (compare salted hash of user-provided password against stored salted hash), there's very little room for errors to creep in. Your auditing will all be focused on ensuring that the user-provided password is not leaked/stored/etc. after an HTTP sign-in request is rxed.
When using challenge/response, there is some pre-shared information that must be synchronised between the systems (algorithm in use, key length, etc., depending on the specific challenge/response system chosen). That's a great many more points of attack for malicious actors.
And then, of course, you need to add versioning to the system (algorithms may be upgraded over time, key lengths may be expanded, etc.) that all present even more points of attack.
Compare to the simple "compare salted hash of password to stored salted hash": even with upgrading the algorithms and the key lengths, there still remains only one point of attack - downloading the salted hashes!
It doesn't matter how much more secure a competing system is if it introduces, in practice anyway, more points of failure.
My takeaway after doing some cryptography for some parts of my career is that choosing a hash function that is expensive in computational power and expensive in space, while keeping the "user enters a password and we verify it" model, is still going to have fewer points of attack than "synchronising pre-shared information between user and authenticator as the security is upgraded over the years".
Basically, the trade-off is between "we chose a quick hash function, but can at least upgrade everything without the client software noticing" and the digital equivalent of "It's an older code, sir, but it checks out" problems.
You keep returning to salted hashes. Please read the first two paragraphs of what I wrote for why that is a mistake. If you are going to use a shared secret system, do not use salted hashes. There is almost never a good reason to use salted hashes in 2025.
> you need to add versioning to the system
You need this with salted hashes, too! And of course with any password-based system.
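For password hashes, that versioning usually lives in the stored record rather than in the client. A minimal sketch, assuming a hypothetical `pbkdf2_sha256$iterations$salt$hash` record format (similar in spirit to what PHP's `password_hash()` and `password_needs_rehash()` manage for you): the algorithm and cost travel with each hash, and records are upgraded at the next successful login.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

SCHEME = "pbkdf2_sha256"
CURRENT_ITERATIONS = 1_000_000  # assumed current policy; older records may carry less


def hash_password(password: str, iterations: int = CURRENT_ITERATIONS) -> str:
    salt = secrets.token_bytes(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Self-describing record: scheme$cost$salt$hash
    return f"{SCHEME}${iterations}${salt.hex()}${dk.hex()}"


def verify_and_upgrade(password: str, record: str) -> Tuple[bool, str]:
    """Return (ok, record_to_store); on success, re-hash if the record's cost is outdated."""
    scheme, iters, salt_hex, dk_hex = record.split("$")
    if scheme != SCHEME:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    dk = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iters))
    if not hmac.compare_digest(dk, bytes.fromhex(dk_hex)):
        return False, record
    if int(iters) < CURRENT_ITERATIONS:
        # The upgrade happens entirely server-side; the client still just sends the password.
        return True, hash_password(password)
    return True, record
```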
> Please read the first two paragraphs of what I wrote for why that is a mistake.
Okay, I read it, and then re-read it. I still don't get why (for example) `bcrypt` (a salted hash function) is a bad idea.
I fully accept that I am missing something here, but I really would like to know why using `bcrypt` is a problem.
>> you need to add versioning to the system
> You need this with salted hashes, too! And of course with any password-based system.
Not in the client software, you don't. The pre-shared information in a password-based system is generally stored in the user's head.
The pre-shared information in the challenge/response system means both the submitting software (interacting with the user and rxing the challenge) as well as the receiving software (txing the challenge and rxing the response) need to be synchronised.
Now, once again, I fully accept that I might be missing something here, but AFAIK, that synchronisation contains extra points of attacks; points of attacks that don't exist in the password/salted-hash system.
And since absolutely no system ever discards existing mechanisms completely when upgrading, the deprecated-but-still-supported-for-a-few-more-months mechanism adds even more points of attack.
Once again, I am trying to understand, not be contentious, and I want to fully understand:
a) The problem with salted hashes like `bcrypt`
b) What changes need to be made to client software when upgrading algorithms and key lengths in a password-based system.
bcrypt is not a "salted hash function". It's a password hash construction (at the time of its invention, it was called an "adaptive hash"; today, we'd call it a "password KDF"). If you're using bcrypt, you're fine.
What you can't do is use SHA2 with a random salt and call it a day.
My understanding of bcrypt math is that the input to the algorithm is a random salt and a message, with the output being a hash.
I believe the actual implementation gives two output fields as a single value, with that value containing the salt and the hash.
This might be why we appear to be talking past each other - I consider bcrypt to be a salted hash because it takes a salt in the inputs and produces a hash in the output.
The fact that the output also contains the salt is, in my mind, an implementation detail.
Yes. We aren't talking past each other anymore. If you aren't composing a "salted hash" out of a cryptographic hash function and a salt, but are instead using bcrypt, scrypt, PBKDF2, or Argon2, you have nothing to worry about. Just to complete your understanding: the salt --- randomizing the hash --- has very little to do with what makes bcrypt a good password hash.
You'll avoid this confusion in the future if you don't refer to bcrypt as a "salted hash". Salted hashes are the technology bcrypt was invented to replace.
OP here: I am using bcrypt to hash passwords, but I struggle to understand how not salting i.e. not randomizing it against a stored secret would still be safe for storing short strings. Be that as it may, I'm glad to hear I'm not doing anything horribly wrong when building login systems.
Thanks, yes, using bcrypt. Usually just with something like PHP's password_hash() and the default algorithm. I sort of expected to trigger people with my post, and it's enlightening, but I feel pretty safe with what I'm doing. I mean, I like to know exactly how everything in my engine works. But there's a point at which I can't get paid to reinvent the wheel anymore, nor do I want to try.
There's even bogus "But it's encrypted" crypto in the new software I'm working with in my day job this year, 2025. There are so many other "must fix" user-visible problems that I actually forgot to mention "Also, the cryptography is nonsense; either fix it or remove it" in the one-hour review meeting last week, but I should add that.
I've made one of these goofs myself when I was much younger (32 byte tokens, attacker can snap them into two 16-byte values, replace either half from another token and that "works" meaning the attacker can take "Becky is an Admin" and "Jimmy is a Customer" glue "y is an Admin" to "Jimm" and make "Jimmy is an Admin") which got caught by someone more experienced before it shipped to end users, but yeah, don't do that.
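The general fix for that class of bug is to authenticate the whole token as one unit, so no piece of it can be swapped independently. A minimal sketch with HMAC-SHA-256 (the key and function names are hypothetical); it provides integrity only, so the claim stays readable, and an AEAD would be the tool if confidentiality is also needed:

```python
import hashlib
import hmac
import secrets
from typing import Optional

KEY = secrets.token_bytes(32)  # hypothetical server-side secret
TAG_LEN = 32                   # HMAC-SHA-256 output size in bytes


def make_token(claim: bytes) -> bytes:
    # Append a MAC computed over the entire claim.
    return claim + hmac.new(KEY, claim, hashlib.sha256).digest()


def check_token(token: bytes) -> Optional[bytes]:
    """Return the claim if the tag matches, else None."""
    claim, tag = token[:-TAG_LEN], token[-TAG_LEN:]
    expected = hmac.new(KEY, claim, hashlib.sha256).digest()
    return claim if hmac.compare_digest(tag, expected) else None
```

Gluing `Jimm` from one token onto `y is an Admin` plus the tag from another produces a claim whose tag no longer matches, so `check_token` returns `None`.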
The number of foot guns in an auth flow is high. Implementing the login/password form is just a small piece.
Making sure there are no account enumeration possibilities is hard. Making sure 2FA flow is correct and cannot be bypassed is hard. Making proper account recovery flow has its own foot guns.
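As one small example of the enumeration point, here is a sketch of a login check that gives the same answer and does the same KDF work whether or not the username exists. The user table, iteration count, and names are hypothetical, and this only covers the login form: registration and account-recovery flows can leak account existence in their own ways.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Tuple

ITERATIONS = 200_000  # assumed cost, identical on every code path
GENERIC_ERROR = "Invalid username or password."


def _kdf(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


# Hypothetical user table: username -> (salt, derived key)
USERS: Dict[str, Tuple[bytes, bytes]] = {}
# Dummy record so unknown usernames cost the same KDF work as real ones.
_DUMMY_SALT = secrets.token_bytes(16)
_DUMMY_KEY = secrets.token_bytes(32)


def register(username: str, password: str) -> None:
    salt = secrets.token_bytes(16)
    USERS[username] = (salt, _kdf(password, salt))


def login(username: str, password: str) -> bool:
    """True on success. On False, show GENERIC_ERROR whatever the cause."""
    salt, key = USERS.get(username, (_DUMMY_SALT, _DUMMY_KEY))
    matches = hmac.compare_digest(_kdf(password, salt), key)  # KDF runs either way
    return matches and username in USERS
```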
If you can use an off-the-shelf solution where someone already knows about all of those, the advice still stands: don’t roll your own.
And the consequence is that people get banged on the head if they use something existing (because they will be using it wrong) or build something on their own (because that's obviously bad), or they get fed up and don't use anything.
The issue with security researchers, as much as I admire them, is that their main focus is on breaking things and then berating people for having done it wrong. Great, but what should they have done instead? Decided which of the 10 existing solutions is the correct one, with 9 being obvious crap if you ask any security researcher? How should the user know? And typically none of the existing solutions matched the use case exactly. Now what?
It's so easy to criticize people left and right. Often justifiably so. But people need to get their shit done and then move on. Isn't that understandable as well?
> The issue with security researchers, as much as I admire them, is that their main focus is on breaking things and then berating people for having done it wrong.
> Great, but what should they have done instead? Decided which of the 10 existing solutions is the correct one, with 9 being obvious crap if you ask any security researcher?
There's 10 existing solutions? What is your exact problem space, then?
I'm also working in all of my spare time on designing a solution to one of the hard problems with cryptographic tooling, as I alluded to in the blog post.
> How should the user know? And typically none of the existing solutions matched the use case exactly. Now what?
First, describe your use case in as much detail as possible. The closer you can get to the platonic ideal of a system architecture doc with a formal threat model, the better, but even a list of user stories helps.
Then, talk to a cryptography expert.
We don't keep the list of experts close to our chest: Any IACR-affiliated conference hosts several of them. We talk to each other! If we're not familiar with your specific technology, there's bound to be someone who is.
This isn't presently a problem you can just ask a search engine or generative AI model and get the correct and secure answer for your exact use case 100% of the time with no human involvement.
Finding a trusted expert in this field is pretty easy, and most cryptography experts are humble enough to admit when something is out of their depth.
And if you're out of better options, this sort of high-level guidance is something I do offer in a timeboxed setting (up to one hour) for a flat rate: https://soatok.com/critiques
> I've literally blogged about tool recommendations before
Do you happen to know of a similar resource applicable to common HN deployment scenarios, like regular client-server auth?
For example, in your Beyond Bcrypt blog post[0] you seem to propose hand-writing a wrapper around bcrypt as the best option for regular password hashing. Are there any vetted cross-language libraries which take care of this? If one isn't available, should I risk writing my own wrapper, or stick with your proposed scrypt/argon2 parameters[1] instead? Should I perhaps be using some kind of PAKE to authenticate users?
The internet is filled with terrible advice ("hash passwords, you can use md5"), outdated advice ("hash passwords, use SHA with a salt"), and incomplete advice ("just use bcrypt"), followed up by people telling you what not to do ("don't use bcrypt - it suffers from truncation and opens you up to DDOS"). But to me as an average programmer, that just leaves behind a huge void. Where are the well-vetted, batteries-included solutions I can just deploy without having to worry about it?
Your tool recommendations are actively harmful and dangerous in places.
You whole-heartedly recommend sigstore, a trusted-third-party system which plainly trusts the auth flows of the likes of Google or Github. It is basically a signature by OpenID-Login. This is no better than just viewing everything from github.com/someuser as trusted. The danger of key theft is replaced by the far higher danger of account theft, password loss and the usual numerous auth-flow problems with OpenID.
Why should I take those recommendations seriously?
I feel like there is a huge subset of people who just do trial and error programming. The other day, I watched a whole team of 8 people spend a whole work day swapping out random components on Zoom to diagnose a problem because not a single person considered attaching a debugger to see the exact error in 5 minutes.
I feel like you have to tell people not to roll their own whatever because there are so many of these types of people.
I was so surprised the first time I saw this. I mostly work alone, and my workflow is usually: study an example (docs or project), collect information (books and articles), and iteratively build a prototype while learning the domain. Then I saw a colleague literally copying and pasting stuff, hoping that the errors would go away in his project. After he asked for my help, I told him to describe how he planned to solve the problem, and he couldn't. For him, it was just endless tweaking until he got something working. And he could have understood the (smallish) problem space by watching one or two videos on YouTube.
Those same people are utterly incapable of reading logs. I’ve had devs send me error messages that say precisely what the problem is, and yet they’re asking me what to do.
The form of this that bothers me the most is in infra (the space I work in). K8s is challenging when things go sideways, because it’s a lot of abstractions. It’s far more difficult when you don’t understand how the components underpinning it work, or even basic Linux administration. So there are now a ton of bullshit AI products that are just shipping return codes and error logs out to OpenAI, and sending it back rephrased, with emoji. I know this is gatekeeping, and I do not care: if you can’t run a K8s cluster without an AI tool, you are not qualified to run a K8s cluster. I’m not saying don’t try it; quite the opposite: try it on your own, without AI help, and learn by reading docs and making mistakes (ideally not in prod).
The struggle was real in the early web with getting companies to do proper password storage and authentication, but the fact that seasoned professionals turn to auth0 or okta (and have been bitten by this reliance!) nowadays strikes me as a little embarrassing.
It never meant "don't implement AES/RSA yourself". Nobody sane does that; there has never been a need to convince people not to write their own implementations of AES. Ironically, doing your own AES is one of the less-scary freelancing projects you can undertake. Don't do it, but the vulnerabilities you'll introduce are lower-tier than the ones you get using OpenSSL's AES.
It has always meant "don't try to compose new systems using things like AES and RSA as your primitives". The serious vulnerabilities in cryptosystems are, and always have been, in the joinery (you can probably search the word "joinery" on HN to get a bunch of different fun crypto vulnerabilities here I've commented on).
Yes: in the example provided, you rolled your own cryptography. The track record of people building cryptosystems on top of basic signature constructions is awful.
I assume you've never seen AES written in vbscript? Generally, the thought process goes: This thing needs AES, I want to talk to it, I know $language, and wikipedia has the algorithm. The idea that AES is there for a reason ( like security) never enters the thought process.
Someone is building a shed, not a whole building, and stopped listening to real builders with their nitpicky rules long ago. It works great, until the shed has grown into a skyscraper without anybody noticing, and an unexpected and painful lesson about 'load bearing' appears.
I ran the Cryptopals challenges. I have been sent AES implemented in 4 different assemblies, Julia before it was launched, pure Excel spreadsheet formulae, and a Postscript file.
There must be beauties in there;-). Even so, the fact that it's called Cryptopals indicates a public having some basic level of care about crypto. The non-it person hacking an excel macro together to get some job done has a very different attitude, and they do run their stuff in production.
The public should have a basic level of understanding and care of crypto. The logic of those challenges is that encouraging people to break cryptography is always prosocial; building it is a little more complicated, in the same sense as surgery.
I want to roll my own variant of AES (I know, I know!) CTR mode for appendable (append-only) files that does not require a key change or reencrypting any full AES block. Big caveat, this design doesn't have a MAC, with all the associated problems (it's competing against alternatives like AES-XTS, not authenticated modes).
Partial blocks are represented by stealing 4 bits of counter space to represent the length mod block size. This restricts us to 2^28 blocks or about 4GB, but that's an ok limitation for this use.
So say you initially write a file of 33 bytes: two full blocks A and B, and a partial block C. A and B get counter values 0 (len) || 0 (ctr) and 0 (len) || 1 (ctr). C is encrypted by XORing the plaintext with AES(k, IV || 1 (len) || 2 (ctr)).
You can append a byte to get a length of 34 bytes. Encrypted A/B don't change. C_2 is encrypted by XORing plaintext_2 with AES(k, IV || 2 (len) || 2 (ctr)). Since the output of AES on different inputs is essentially a PRF, this seems... ok?
Finally, if you append enough bytes to fill C, it gets to len=0 mod 16. So the long and short of it is: no partial or full block will ever reuse the same k+iv+len+ctr, even rewriting it for an append.
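To make the proposed layout concrete, here is a sketch that builds only the 16-byte counter blocks and checks the "no reuse" claim by brute-force simulation. It assumes a 96-bit IV followed by a 32-bit big-endian word holding 4 bits of length-mod-16 and 28 bits of block counter (the widths are inferred from the 2^28-block figure; the exact encoding is an assumption). It performs no encryption and is not an endorsement of the design.

```python
from typing import Dict, Iterator, Tuple

BLOCK = 16


def counter_block(iv: bytes, length_mod: int, ctr: int) -> bytes:
    """Assumed layout: 12-byte IV || 4-bit (length mod 16) || 28-bit block counter."""
    if len(iv) != 12 or not 0 <= length_mod < 16 or not 0 <= ctr < 2**28:
        raise ValueError("field out of range")
    return iv + ((length_mod << 28) | ctr).to_bytes(4, "big")


def blocks_for_length(iv: bytes, file_len: int) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (counter block, block index, bytes covered) for a file of file_len bytes."""
    full, partial = divmod(file_len, BLOCK)
    for i in range(full):
        yield counter_block(iv, 0, i), i, BLOCK
    if partial:
        yield counter_block(iv, partial, full), full, partial


def no_keystream_reuse(iv: bytes, max_len: int) -> bool:
    """Grow a file one byte at a time; each counter block must always cover the same bytes."""
    seen: Dict[bytes, Tuple[int, int]] = {}
    for n in range(max_len + 1):
        for cb, idx, covered in blocks_for_length(iv, n):
            # Append-only: same (index, bytes covered) means same plaintext, so reuse is harmless.
            if seen.setdefault(cb, (idx, covered)) != (idx, covered):
                return False
    return True
```

Passing this check speaks only to counter uniqueness within one file under one key and IV; it says nothing about integrity, which the design deliberately gives up.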
> I want to roll my own variant of AES (I know, I know!) CTR mode for appendable (append-only) files that does not require a key change or reencrypting any full AES block.
I want the contents to be unreadable to an attacker who is able to steal a disk. I want to be able to append to files but not overwrite them in place, and do partial reads without decrypting the entire file.
So... you've pointed out the problems, what are the solutions?
What am I allowed to use as primitives to compose systems that require cryptographic functionality? If I'm writing medical device software and the hospitals I'm selling to say I can't store files in plaintext on disk, but also some security expert on HN says I shouldn't use AES as a piece of the solution because that's "rolling my own crypto" and too dangerous, what should I do? Mandate that postgres configured with encryption be used (even if the application is simple and doesn't require a full db)? That will almost certainly harm the prospect of the sale because having to get hospital IT involved introduces a lot of friction. Or are you saying "use sodium to encrypt the file, don't try to do it yourself with AES"?
There's a subtext here of "what do I do when the high-level libraries like Sodium don't do exactly what I need", and the frank answer to that is: "well, you're in trouble, because consulting expertise for this work is extraordinarily expensive".
We have an SCW episode coming out† next week about cryptographic remote filesystems (think: Dropbox, but E2E) with a research team that reviewed 6 different projects, several of them with substantial resources. The vulnerabilities were pretty surprising, and in some cases pretty dire. Complaining that it's hard to solve these problems is a little like being irritated that brain surgery is expensive. I mean, it's good to want things.
> There's a subtext here of "what do I do when the high-level libraries like Sodium don't do exactly what I need", and the frank answer to that is: "well, you're in trouble, because consulting expertise for this work is extraordinarily expensive".
...which is exactly why people roll their own crypto. Security folks don't seem to realize/care that money is a real constraint at most companies (esp. startups) and security can easily become a money furnace. When the only two options are: "use sodium" and "extraordinarily expensive consultant" then devs turn to the third option: https://pkg.go.dev/crypto/aes
Boom, file encrypted, requirement fulfilled, money saved, sale made. Perfectly secure? Probably not. The docs even say "The AES operations in this package are not implemented using constant-time algorithms". But maybe that's an acceptable tradeoff for your target risk profile.
Second-class Sodium is still a hell of a lot better than setting up your own. For Go in particular, writing your own wrapper is a reasonable option. It's a handful of lines of code per function used.
I don't have a full guide for identifying a proper equivalent, but "constant time" is a requirement.
It might be in many cases, but it seems not every language has explicit Sodium support. Go, for example, has this not-widely-used wrapper (https://github.com/jamesruan/sodium) maintained by some random guy, which might not be desirable to use for a variety of reasons.
Want a specific solution for your specific use case? Talk to an expert to guide you through the design and/or implementation of a tool for solving your specific problem. But don't expect everyone writing for a general audience (read: Hacker News comments) to roll out a bespoke solution for your specific use case.
I feel threat modelling is the really difficult part in gluing together known-good crypto parts into a solution.
I've glued together crypto library calls a few times, and I've implemented RFCs when I've done so, like HKDF[1].
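For reference, HKDF as specified in RFC 5869 is small enough to sketch with Python's standard `hmac` and `hashlib` modules. This is an illustrative implementation, not the commenter's code. In real projects a vetted library implementation (for example the HKDF class in the `cryptography` package) is the safer choice.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes, hash_name: str = "sha256") -> bytes:
    """RFC 5869 section 2.2: PRK = HMAC-Hash(salt, IKM)."""
    if not salt:
        # An absent salt is replaced by HashLen zero bytes
        salt = bytes(hashlib.new(hash_name).digest_size)
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int, hash_name: str = "sha256") -> bytes:
    """RFC 5869 section 2.3: T(i) = HMAC-Hash(PRK, T(i-1) || info || i)."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("HKDF-Expand can produce at most 255 * HashLen bytes")
    okm, t = b"", b""
    counter = 1
    while len(okm) < length:
        t = hmac.new(prk, t + info + bytes([counter]), hash_name).digest()
        okm += t
        counter += 1
    return okm[:length]


def hkdf(salt: bytes, ikm: bytes, info: bytes, length: int, hash_name: str = "sha256") -> bytes:
    """Extract-then-expand, the usual one-call form."""
    return hkdf_expand(hkdf_extract(salt, ikm, hash_name), info, length, hash_name)
```

With the RFC 5869 test case 1 inputs (22 bytes of 0x0b as IKM, salt 0x00..0x0c, info 0xf0..0xf9, L = 42), this should reproduce the published PRK and OKM. Running those vectors is the minimum check worth doing when implementing an RFC yourself.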
But that isn't enough if the solution I've chosen can easily be thwarted by some aspect I didn't even consider. No point in having a secure door lock if there's an open window in the back.
The sad truth is that even libsodium can be misused (and the article explicitly mentions that twice). Even pretty high-level constructions like libsodium's crypto_secretbox can be misused, and most of the constructs that libsodium exposes are quite low-level: crypto_sign_detached, crypto_kdf_hkdf, crypto_kx, crypto_generichash.
I think we should understand and teach "do not roll your own crypto" in the context of what constructs are used for what purposes.
AES is considered secure and "military-grade" if you need to encrypt a block of data that is EXACTLY 128 bits (16 bytes) in size. If you want to encrypt more (or less) data with the same key, make sure that the data is not tampered with, or even just trust encrypted data that passes through an insecure channel, then all bets are off. AES is not designed to help you with that; it's just a box that does a one-off encryption of 128 bits. If you want to do anything beyond that, you enter the complex realm of chaining (or streaming) modes, padding, MACs, nonces, HKDFs and other complex things. The moment you need to combine more than two things (like AES, CBC, PKCS#7 padding and HMAC-SHA256) to achieve a single purpose (encrypting a message that cannot be tampered with), you've rolled your own crypto.
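As an illustration of one of those extra pieces, here is PKCS#7 padding in plain Python. It exists only because the block cipher accepts exactly 16 bytes at a time. This is a minimal sketch, and the function names are hypothetical.

```python
BLOCK = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block: int = BLOCK) -> bytes:
    """Append n bytes of value n (1 <= n <= block); a whole extra block if already aligned."""
    n = block - len(data) % block
    return data + bytes([n]) * n


def pkcs7_unpad(padded: bytes, block: int = BLOCK) -> bytes:
    """Strip PKCS#7 padding, rejecting anything malformed."""
    if not padded or len(padded) % block:
        raise ValueError("invalid padding")
    n = padded[-1]
    if not 1 <= n <= block or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]
```

The error branch in `pkcs7_unpad` is itself dangerous if an attacker can tell a padding failure apart from a MAC failure (a padding oracle). That is one reason encrypt-then-MAC designs verify the MAC before touching the padding.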
Libsodium's crypto_secretbox is safe for encrypting small or medium-sized messages that will be decrypted by a trusted party that you share a securely generated and securely managed key with. It's far more useful than plain AES, but it will not make every use case that involves "encryption" safe. If you've decided to generate a deterministic nonce for AES, or derive a deterministic key (the article links a good example[1]), then libsodium will not protect you. If you attempt to encrypt large files with crypto_secretbox by breaking them into chunks, you will probably introduce some vulnerabilities. If you try to build an entire encrypted communication protocol based on crypto_secretbox, then you are also likely to fail.
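The nonce problem is easy to demonstrate without any real cipher. In this sketch the keystream is modelled as random bytes. These stand in for what a stream cipher (such as the XSalsa20 inside crypto_secretbox, or AES-CTR) would derive from one key and nonce. Reusing that pair for two messages lets anyone cancel the keystream out.

```python
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR, truncated to the shorter input."""
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-in for the keystream a stream cipher derives from a single (key, nonce) pair
keystream = secrets.token_bytes(32)

p1 = b"attack at dawn, bring the maps!!"
p2 = b"retreat at noon, leave the maps!"
c1 = xor(p1, keystream)  # first message
c2 = xor(p2, keystream)  # second message, same key and nonce (the mistake)

# The keystream cancels: an eavesdropper gets p1 XOR p2 without ever seeing the key,
# and anyone who knows or guesses p1 recovers p2 outright
leak = xor(c1, c2)
assert leak == xor(p1, p2)
assert xor(leak, p1) == p2
```

Real crypto_secretbox also derives its Poly1305 authentication key from the same keystream. A repeated nonce therefore weakens authentication as well as confidentiality.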
The best guideline non-experts can follow is the series of Best Cryptographic Answers guides (the latest one I believe is Latacora 2018[1]). If you have a need that is addressed there, and you only need to use the answer given without combining it with anything else, you're probably safe.
Notice that the answer for Diffie-Hellman is "Probably nothing", since you're highly unlikely to be able to use key exchange safely in isolation. I did roll key exchange combined with encryption once, and I still regret it (I can't prove that thing is safe).
The "You care about this" part explains the purpose of each class of cryptographic constructs that has a recommendation. For instance, for asymmetric encryption it says "You care about this if: you need to encrypt the same kind of message to many different people, some of them strangers, and they need to be able to accept the message asynchronously, like it was store-and-forward email, and then decrypt it offline. It’s a pretty narrow use case."
So this tells you that you probably should not be using libsodium's crypto_box for sending end-to-end encrypted messages in your messaging app.
With a signing server, the level of interaction is the thing that really matters. I would say that if you implemented the algorithm doing the signing operation yourself, that might be questionable. But if you implemented the key storage fully yourself, it is even more questionable: designing your own HSM module, or just writing the keys to disk in plaintext or minimally protected...
Also, who designed and implemented the checking of the appended blob? That's crypto too...
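For the hash-then-sign workflow from the original question, the client side can be as simple as streaming the blob through SHA-256 and sending only the 32-byte digest. This is a hypothetical sketch; `stream_digest` is not from the original post. The points above still apply: whoever can submit a digest gets it signed, and verifiers must recompute the hash over the actual blob rather than trust a digest shipped alongside it.

```python
import hashlib
from typing import BinaryIO


def stream_digest(f: BinaryIO, chunk_size: int = 1 << 20) -> bytes:
    """SHA-256 of a binary stream, read in 1 MiB chunks so multi-GB blobs never sit in memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: f.read(chunk_size), b""):
        h.update(chunk)
    return h.digest()


# Hypothetical client usage: hash locally, send only the 32-byte digest to the signer.
# with open("firmware.bin", "rb") as f:
#     digest = stream_digest(f)
```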
There are loads of places where you can shoot yourself in the foot, e.g. not using constant-time comparison functions, which leaves you open to timing attacks.
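In Python, for example, the standard library's `hmac.compare_digest` is the constant-time option. Here is a minimal sketch of verifying an HMAC-SHA256 tag, with hypothetical helper names, contrasting it with a plain `==`.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 tag over message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_naive(key: bytes, message: bytes, tag: bytes) -> bool:
    # Risky: == may stop at the first differing byte, so response timing can
    # hint at how many leading bytes of a forged tag are already correct
    return make_tag(key, message) == tag


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    # compare_digest avoids content-based short-circuiting, so timing does not
    # reveal where the tags differ (it may still reveal a length mismatch)
    return hmac.compare_digest(make_tag(key, message), tag)
```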
Glueing together secure pieces too often does not result in a secure whole. People spend careers coming up with secure protocols out of secure pieces.
My take on not rolling your own crypto is to apply as close to zero cleverness as possible when it comes to crypto. Take ready-made boxes, use them as instructed, and assume that anything clever I try to build with them is likely not secure, in some way or another.
Makes me think of the pass-the-hash vulnerability in Windows NTLM: if someone grabs the hash, they don't need to know the password anymore, because they can just pass the hash.
In the same way, you can still use AES or RSA incorrectly, so that whatever you build on top of them is vulnerable if you don't have the experience.
OK, TFA explains it all; I have the same view on the topic as the author.
I honestly think it's a meme born of propaganda. "Don't roll your own" is good advice a lot of the time when there are hardened libraries at your disposal, but who does it serve if fewer people know how to build useful cryptography? Three-letter agencies, mostly. I had no trouble implementing useful algorithms in college as part of my study of it, or understanding different potential weaknesses like timing or known-plaintext attacks. We didn't cover side channels like power analysis, but how often does that actually matter outside of a lab?
The fact that you're referring to useful algorithms here is kind of a "tell", for what it's worth. "Known plaintext attacks" are not a broad class of vulnerability in cryptosystems; they're a setting for attacks. It's like saying "I've studied the browser security model including weaknesses such as the same origin policy".
What do you think an algorithm is? This weird criticism is kind of a "tell", for what it's worth. Ciphers are algorithms. This is unnecessarily combative. I'm disappointed by your posts in this thread, because they're low on substance and high on posturing.
Learning crypto isn't just a box-ticking exercise where you learn it once and are then qualified to "do crypto" for the rest of your career. I also studied this in university, but I know that I don't spend anywhere near enough time on it day to day to avoid, well, any of the pitfalls of rolling my own crypto.
I currently work on our company’s Auth systems and half my team still side-eyes our own work, despite having worked on this area for years now.
All I'm reading here is more of the same scare-mongering. I don't casually implement cryptosystems to be used in critical situations, and I wouldn't attempt it without a serious amount of research. But "never do it yourself" is a poor substitute for actually spreading knowledge about how to do it and open discussion of pitfalls. It rubs me the wrong way.
You do understand “never do it yourself” implicitly applies to a commercial/production context, right? Nobody is stopping you learning or choosing to enter the field (to make it your sole professional focus).
It certainly comes across like people are discouraging others from even learning about it.
Just read through this thread and see how few people are saying anything like "you should still learn how cryptographic primitives work though". It's almost nothing but negativity and discouragement.
The blog post in question (which I wrote) literally opens with a link to https://www.cryptofails.com/post/75204435608/write-crypto-co... that encourages people to learn about it! Yes, even if that type of learning means (privately) writing crypto code.
Why would you expect the comments about it to argue in favor of "still learning" when the blog post never advocated against learning? They aren't making this point because I didn't make the contrary one, so there's nothing to argue with.
That you're only seeing negativity and discouragement is a cognitive distortion. I'm not qualified to help people with those, but I hope pointing that out might help you see that the world isn't against you.
That software developers presently and generally are bad at understanding cryptography doesn't mean it's impossible to learn, or that you should be discouraged from learning.
Alright that's fair, in this thread there's not much reason for people to encourage learning it so it wasn't a good example to use. Thank you for helping people to learn more about it.
That being said, the sentiment around rolling your own crypto that I see in the rest of the internet still strongly discourages any kind of curiosity, intentionally or not.
In other places where it's discussed and in the comments on this thread which are not a direct response to aspects of the blog, you get mostly the same kind of response.
> Just read through this thread and see how few people are saying anything like "you should still learn how cryptographic primitives work though". It's almost nothing but negativity and discouragement.
That’s what the “implied” part of my comment means.
It wasn't 'an algorithms class' but grad-level cryptography. We did implement boxes as a project and were graded on them. Where do you think people learn this stuff? Super-secret cryptography school? I even used it: I previously built a secondary-layer protocol using Diffie-Hellman that runs on top of TLS, so that in the event TLS is broken there's still some level of secrecy, and I'm very confident that it's better than nothing. Defense in depth, you know?
It isn't arrogant or elitist to suggest that no one person should be rolling their own crypto, even if they have taken a grad-level cryptography class.
For instance, you don't expect an aviation engineer to build a brand new plane on their own. They would be missing decades of cumulative knowledge, battle testing, perspectives and knowledge outside of their own. These systems are complex, to the point where taking a grad course is not enough.
I have worked with actual experts on cryptography throughout my big tech career - people that have actually written parts of common crypto libraries suggested in this thread - and even they themselves are not interested in writing crypto code. It is an incredibly involved group effort between experienced experts and your first iteration will almost certainly be broken. There is almost never a reason to do this.
If you are so confident in your ability to do so, you may simply be a crypto prodigy, and I apologize. You should post your DHE implementation here. If it's secure and useful, there shouldn't be an issue, and surely the community would benefit from it.
Again, you're mistaking my point. I know enough to know what I don't know, and the risk involved in doing any serious crypto work; I know I can't single-handedly build libraries to protect financial transactions or whatever willy-nilly. It's obviously a ton of work to get right.
What I'd advocate for instead of "never roll your own crypto" is more like "never use your own crypto in prod". People are better off knowing more than less and getting their hands a little dirty. I think the former, common message is more like "don't even try to understand it," which is a joke.
It makes more sense if you understand that gatekeeping like this also provides job security for security consultants telling you to buy their services instead.
No, instead you said you learned about how to understand "known plaintext attacks", counting on the average reader of this thread not to know that's a content-free claim. One time, in a "graduate level" cryptography class, you even built a protocol using Diffie-Hellman on top of TLS. OK!
You provided it as a proof-point of some sort, but I've led lab exercises of freshmen CS students doing the same exercise, so I'm unclear what it's proof of.
Anybody can build a Diffie-Hellman protocol. You can practically do it with a calculator; in fact, we did that as an exercise at a talk, with a big audience, using frisbees to "exchange" the keys. But the talk was about how, as a pentester, to trivially break these systems, because making a system that uses a DH key exchange safely, as opposed to one that simply appears correct in unit tests, is treacherously hard.
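A calculator-sized exchange really is just modular exponentiation. This toy sketch uses the textbook parameters p = 23 and g = 5, chosen for illustration. Real deployments use vetted groups of 2048 bits or more, or X25519.

```python
# Toy Diffie-Hellman with textbook-sized parameters. NOT secure: tiny prime,
# no authentication, no validation of the other side's public value.
P, G = 23, 5


def public_value(secret: int) -> int:
    return pow(G, secret, P)


def shared_secret(their_public: int, my_secret: int) -> int:
    return pow(their_public, my_secret, P)


a, b = 6, 15                              # Alice's and Bob's private exponents
A, B = public_value(a), public_value(b)   # sent in the clear: 8 and 19
assert shared_secret(B, a) == shared_secret(A, b) == 2   # both sides agree

# A "peer" (or man-in-the-middle) that sends 1 forces the shared secret to 1
assert shared_secret(1, a) == 1
```

The two sides agree, so a unit test passes. Yet nothing here authenticates the peer: a man-in-the-middle can run one exchange with each side. Nothing rejects degenerate public values either. Those gaps, not the arithmetic, are where such protocols break.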
I'm still curious about what you could have possibly meant by learning about "weaknesses" like "known-plaintext attacks". Can you say more?
It's been years, but I recall that, for example, when you know every piece of plaintext starts with "https://www", or perhaps know the full contents of particular messages, it may become some degree easier to brute force your way to a key. I don't think it's a concern for standards in broad use, more like something you would worry about if you were cooking up your own cipher.
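That matches how known-plaintext attacks hit homemade ciphers. In this sketch the "cipher" is a hypothetical XOR with a short repeating key. Knowing that the URL starts with `https://www` hands the attacker the first 11 keystream bytes, which here cover the whole key. Modern ciphers such as AES are designed to resist exactly this.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Hypothetical homemade cipher: XOR with a short repeating key (encrypt == decrypt)."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def recover_key(ciphertext: bytes, known_prefix: bytes, key_len: int) -> bytes:
    """Known-plaintext attack: ciphertext XOR plaintext = key bytes."""
    if key_len > len(known_prefix):
        raise ValueError("need at least key_len bytes of known plaintext")
    return bytes(c ^ p for c, p in zip(ciphertext[:key_len], known_prefix))


secret_key = b"s3cr3t!"                       # 7-byte key, unknown to the attacker
plaintext = b"https://www.example.com/login?session=42"
ciphertext = xor_cipher(plaintext, secret_key)

# The attacker only knows that URLs start with "https://www"
guess = recover_key(ciphertext, b"https://www", len(secret_key))
assert guess == secret_key
assert xor_cipher(ciphertext, guess) == plaintext
```

An attacker who doesn't know the key length can simply try every length up to the length of the known prefix.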
Building your own crypto is so simple a 5-year-old can do it: just roll a die and write down the numbers; those numbers are the key. Then convert the text to numbers, and for each letter add the corresponding key number (wrapping around after Z); the sums are the encrypted text. To decrypt, you subtract the corresponding key number.
The problem is that modern crypto is so complicated one will easily make an error implementing it.
It's unbreakable; google "one-time pad" encryption.
The problem with cracking it is that if you just change one letter in an English word it becomes another English word.
Here I have encrypted "hackernews" with the following key: 2,3,1,6,2,1,2,2,1,3. Good luck cracking it without the key: JDDQGSPGXV
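The scheme described above fits in a short Python sketch, which reproduces the ciphertext. The A=0 ... Z=25 numbering and the wrap-around after Z are assumptions; the comment does not specify them.

```python
import string

ALPHABET = string.ascii_uppercase  # assumed numbering: A=0 ... Z=25


def encrypt(plaintext: str, key: list[int]) -> str:
    """Shift each letter forward by its key number, wrapping around after Z."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + k) % 26]
                   for ch, k in zip(plaintext.upper(), key))


def decrypt(ciphertext: str, key: list[int]) -> str:
    """Shift each letter back by its key number."""
    return "".join(ALPHABET[(ALPHABET.index(ch) - k) % 26]
                   for ch, k in zip(ciphertext.upper(), key))


key = [2, 3, 1, 6, 2, 1, 2, 2, 1, 3]          # the die rolls from the comment
print(encrypt("hackernews", key))            # JDDQGSPGXV
```

Because each key number is only 1 to 6, every ciphertext letter sits 1 to 6 places after its plaintext letter; J must come from D through I, and so on. The replies below explain why that is far from a one-time pad.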
As was said above, don't just add a number from 1 to 6. You want your ciphertext to be indistinguishable from random, so in this case (assuming only letters, with no punctuation or spaces), you need to add a number from 0 to 25. And you need a uniform distribution, or again, statistical analysis will screw you.
Now, one does not simply generate a uniform random distribution from a d6. You can't throw the die 5 times and add the results, for instance. You'll get a number between 5 and 30: that happens to be 26 possible values, but the distribution is bell-shaped, heavily skewed towards the middle numbers (5 and 30 are very improbable in comparison).
My recommendation here would be to throw the die once, subtract 1, multiply by 6, throw it a second time, add it, and subtract 1. This will give you a uniform distribution between 0 and 35 (assuming your die is perfectly fair, which, spoiler alert, it isn't). Think of it as a 2-digit base-6 number. Assign a number to each letter of the alphabet (and now you can even support spaces and punctuation, or digits), add the result of your die throws modulo 36, do not translate the result to an encoding with fewer than 36 symbols, and now you have a proper one-time pad.
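The two-roll procedure and the modulo-36 pad can be sketched in Python as follows. The 36-symbol alphabet (A-Z then 0-9) is an assumed choice, and `secrets.randbelow` stands in for a fair physical die.

```python
import secrets

# Assumed 36-symbol alphabet: 26 letters followed by 10 digits
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def rolls_to_digit(first: int, second: int) -> int:
    """Two d6 rolls (1..6) read as a two-digit base-6 number in 0..35."""
    return (first - 1) * 6 + (second - 1)


def roll_key(n: int) -> list[int]:
    """n pad values; secrets.randbelow(6) + 1 stands in for a fair physical die."""
    return [rolls_to_digit(secrets.randbelow(6) + 1, secrets.randbelow(6) + 1)
            for _ in range(n)]


def otp(text: str, key: list[int], decrypt: bool = False) -> str:
    """Add (or subtract) each key value modulo 36, the size of the alphabet."""
    sign = -1 if decrypt else 1
    return "".join(SYMBOLS[(SYMBOLS.index(ch) + sign * k) % 36]
                   for ch, k in zip(text, key))
```

Enumerating all 36 ordered pairs of rolls confirms each value from 0 to 35 appears exactly once, which is why the result is uniform for a fair die. As with any one-time pad, the key must be as long as the message and never reused.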
One thing that's crucial when attempting anything cryptographic: make sure you've got the maths down. Ideally you should be able to construct a rigorous (even machine verified) proof that your stuff works as intended. For the one time pad it's relatively easy. But you need to do it, that's how you'll notice that if you encrypt "aaaaaaaaa" your result will only yield letters between B and G, which you'll agree doesn't hide your plaintext nearly as well as you intended.
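For the mod-36 pad described above, that proof is two lines. Take a key K uniform over Z_36 and C = (M + K) mod 36; the ciphertext distribution then does not depend on the message. With the naive d6 key (K in {1, ..., 6}, arithmetic mod 26), it clearly does.

```latex
\Pr[C = c \mid M = m] = \Pr[K \equiv c - m \pmod{36}] = \frac{1}{36}
\qquad \text{for all } m, c \in \mathbb{Z}_{36}

\Pr[C = c \mid M = m] =
\begin{cases}
  \frac{1}{6} & \text{if } (c - m) \bmod 26 \in \{1, \dots, 6\} \\
  0 & \text{otherwise}
\end{cases}
```

For m = a (0), the second case gives nonzero probability only to c in {B, ..., G}, which is exactly the leak described above.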