RFC 8959: The “secret-token” URI Scheme (rfc-editor.org)
174 points by gebt on Jan 31, 2021 | 71 comments


This solves a very real problem that some services like GitHub [0] have started to address. Auth tokens are being committed to public repos at an alarming rate. Detecting this, and ideally preventing it as early as possible, is key to avoiding account compromise. There are two components to this: identification of a secret and attribution. Identification is non-trivial and requires determining whether some text really is a secret and not just a random hash, UUID, or other high-entropy string. Most tokens today are generic, alphanumeric patterns; false positives abound. Attribution is tricky too, currently relying on parsing the variable name (`AWS_SECRET_KEY=XYZ`), commit message, file name, or some other metadata. In rare cases, a service will have designed its auth tokens with this in mind, prepending a unique, static prefix to them.

The URI scheme proposed in the linked RFC squarely solves the first problem. It will allow for highly accurate CI scanners and pre-commit hooks (sketched below). The scheme doesn't appear to address attribution, assuming all service providers use the same `secret-token` scheme. However, attribution is a nice-to-have, allowing for automated revocation once the secret has gone public. If done right, identification alone could prevent most of the token leakage that occurs today.

[0] https://docs.github.com/en/developers/overview/secret-scanni...
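
A rough illustration (my own sketch, not anything defined in the RFC beyond the prefix itself): once issuers adopt the scheme, a pre-commit scanner only needs a single unambiguous pattern.

  # minimal pre-commit scanner sketch, assuming token issuers adopt the scheme;
  # file handling and exit codes are illustrative, not part of the RFC
  import re
  import sys

  # "secret-token:" followed by one or more pchar-ish characters
  # (percent-encoding is approximated here for brevity)
  SECRET_TOKEN = re.compile(r"secret-token:[A-Za-z0-9\-._~!$&'()*+,;=:@%]+")

  def scan(paths):
      leaked = False
      for path in paths:
          with open(path, encoding="utf-8", errors="ignore") as f:
              for lineno, line in enumerate(f, 1):
                  if SECRET_TOKEN.search(line):
                      print(f"{path}:{lineno}: possible secret token about to be committed")
                      leaked = True
      return leaked

  if __name__ == "__main__":
      sys.exit(1 if scan(sys.argv[1:]) else 0)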


I hope it works out. What I do is create a file, usually called something like "DoNotCheckThisIntoSourceControl.swift", and then put that in a directory called "DoNotCheckThisIntoSourceControl". I then add "DoNotCheckThisIntoSourceControl" to my .gitignore.

Clunky, but it works. I add things like server secrets and whatnot there. I keep the file small, and usually add the contents to a secure note in 1Password, so there is version control of a sort.


You might be interested in:

git config --global core.excludesfile ~/.gitignore

You can have a system-wide (but local-only) .gitignore. It doesn't help other people who clone your repo, but it can be useful in some situations.


No need to change the config; the default global ignore file path is ~/.config/git/ignore


Ah, thank you.


What I do is not put secrets in files in my source tree. If secrets have to go in files, they go in files somewhere well clear of any source control tool.


I built an e2e encrypted cloud service for secrets in case you’re interested in trying it: https://cloudenv.com


You might like git-crypt then, to add actual version control for your secrets.


I do something similar, by having a pretty global exclude for folders called donotbackup in my backup tools. Quite useful.


As far as backup is concerned, a well-supported way of excluding directories (Borg, restic, and others honour it) is to put a file in them that conforms to the CACHEDIR.TAG standard.

https://bford.info/cachedir/


Thanks! I wasn't aware of this. Mostly it's for Apple's Time Machine, which doesn't support it, but this is neat to hear about.


I like this idea a lot and I am slightly annoyed that I didn’t think of it myself. Thanks for the contribution.


If people are looking for a way to put encrypted files into git, you can use LockGit https://github.com/jswidler/lockgit.


I agree. It would have been interesting to do something like secret-token:example.com/abcdef with the option of secret-token:example.com/auth/abcdef (where auth is an arbitrary token type picked by example.com)


Well, you can create your tokens with the structure "domain/authtype/code". I think that's a good idea, and is plainly allowed by the standard.

Yeah, it would be better if it was standardized too, but I didn't think about all the corner cases. Maybe it can't be standardized.


> I think that's a good idea, and is plainly allowed by the standard.

Watch out, "secret-token:domain/authtype/code" is not a valid secret-token by the standard!

The standard's grammar is:

  [RFC8959]
  secret-token-URI    = secret-token-scheme ":" token
  secret-token-scheme = "secret-token"
  token               = 1*pchar

  [RFC3986]
  pchar               = unreserved / pct-encoded / sub-delims / ":" / "@"
  pct-encoded         = "%" HEXDIG HEXDIG
  unreserved          = ALPHA / DIGIT / "-" / "." / "_" / "~"
  sub-delims          = "!" / "$" / "&" / "'" / "(" / ")"
                        / "*" / "+" / "," / ";" / "="
That means "/" cannot be part of a secret-token, and a strictly standard-compliant scanner will not pattern-match on "secret-token:domain/authtype/code".

I think that may be a design mistake as people will inevitably build tokens with those characters (using the same reasoning you did), and they won't show up in some scanners (any that are strictly compliant). "/" is allowed in query strings despite being a path delimiter before the query string; allowing it in secret-token would make sense too.

Fortunately it's not a security problem as long as "/"-delimited paths in tokens don't start with "/", because the preceding characters will be enough to match anyway. However, if you have a scanner where you whitelist some strings after being shown matches, the fact it doesn't match the security part of the token introduces a risk of mistakenly whitelisting too broadly (just the domain in this example), and of course there's a chance someone may use a path starting with "/" without realising this is a problem.


Ouch. I didn't check the grammar in the other RFC this one points to. I just assumed it was sane (all the more because the inline text explaining it makes no mention of the slash).

I really didn't expect somebody to define a URI scheme whose entire body is a single opaque part. Now my opinion is that this RFC is ill-conceived.


You could add a bunch of %2F to stand in for slashes, but that's pretty clunky.
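
For example (just illustrating the workaround, with a made-up structured token), percent-encoding the separators keeps the whole value inside the grammar:

  from urllib.parse import quote, unquote

  token = "sendgrid.com/authtype/wQsmLe_tnOiByt8JUO4Kro"  # hypothetical structured token
  encoded = "secret-token:" + quote(token, safe="")
  print(encoded)           # secret-token:sendgrid.com%2Fauthtype%2FwQsmLe_tnOiByt8JUO4Kro
  print(unquote(encoded))  # decodes back to the slash-delimited form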


Is it? The spec (https://www.rfc-editor.org/rfc/rfc3986.txt, section 2.2) states that "/" is reserved and should be encoded, vs. "-", which is unreserved.

Thinking further: I don't want a secret to identify what it's for. That increases the odds it gets used when leaked accidentally.


The very real problem is that anyone who checks a secret into source control should be held responsible for any resulting data loss, up to and including fines and jail time. Poor security practices that are obviously avoidable are inexcusable.


You think that one errant command line entry should subject an individual to personal liability that could run into the billions of dollars?

First, control inputs that can cause catastrophic damage require purpose-built, system-wide, multi-layered controls to prevent them from being accidentally applied.

Second, it is never the responsibility of a single individual (even the CEO) to ensure this engineering requirement is identified, scoped, budgeted, funded, fulfilled, and regularly tested.

Third, absent actual malice — specific intent to cause damage — the personal liability for a simple mistake caused by a single person should always be dramatically reduced relative to the damages, particularly when the accident was only possible due to lack of proper safety engineering, or an actual cascade of failures.

Fourth, if software developers are somehow supposed to shoulder personal civil liability for potentially billions of dollars of damages due to a single mis-typed command, the simple truth is that nobody would knowingly and willingly accept that job.


> You think that one errant command line entry should subject an individual to personal liability that could run into the billions of dollars?

Setting aside the hyperbolic dollar amount you’ve suggested (in North America, stories I’ve found about specific engineers being fined for structure collapses have been in the five-figure range[0][1][2])… sure, why not?

If a civil engineer accidentally writes down one load calculation incorrectly, doesn’t follow well-known safe design practices which would’ve caught the error, and it causes the structure to collapse, they do have personal liability. Why should software engineers have special immunity?

This industry has a terrible track record of self-policing when it comes to security, so maybe some added liability would help—and, to the OP’s point, there really is no way that a secret token should be finding its way into a public code repository except by failing to follow safe design practices.

[0] https://www.ehstoday.com/archive/article/21914808/engineers-...

[1] https://www.lexology.com/library/detail.aspx?g=3812f88d-8670...

[2] https://www.denenapoints.com/engineer-fined-errors-dallas-co...


> If a civil engineer accidentally writes down one load calculation incorrectly, doesn’t follow well-known safe design practices which would’ve caught the error, and it causes the structure to collapse, they do have personal liability. Why should software engineers have special immunity?

The government regulates civil engineering, because when civil engineers screw up, people die.

The same is true for certain categories of software – aviation, motor vehicles, medical devices, etc. You can argue about whether the regulations in those areas are good enough (incidents like the 737 MAX suggest not), but those flaws are arguably best addressed by improvements in those industry-specific regulatory processes rather than trying to regulate software engineering as a whole.

With your run-of-the-mill blog hosting software or SaaS app, software bugs don't kill people. Privacy violations and financial damages, yes, but actual deaths, no. Anyone who wants to propose increased government regulation of software engineering – to turn software engineering into a regulated/licensed profession like civil engineering or medicine, with the kind of personal liability attached that those professions have – is going to get a lot of pushback from businesses arguing that it adds expense without any great benefit. And I think it is going to be hard to find enough political capital anywhere to overcome that pushback. Financial damage is insurable, and voters value their life and health far more than their privacy.


I don't remember being taught "never commit secrets into source control" at university. I'm sure that calculating loads is part of the civil engineering syllabus at most universities. Civil engineers are also required to be qualified and registered professionals, unlike software engineers.

There are myriad ways that data breaches can occur. Which ones am I liable for as a software developer? If I'm on the receiving end of a 0day exploit, am I still liable? Or if I'm targeted by the Russian/Chinese/North Korean/{{currentEnemy}} government?


To update an old teaching for modern times: He that is without security lapses among you, let him first impose sanction on those who have.

Or to put it more bluntly - all users have flaws, and if your system relies on users not having flaws in order to be secure, it isn't.


I'm surprised they are not identifying the service in the token. A lot of services are scanning GitHub for committed tokens, and being able to tell if that token is a fake/testing one or a real one for a specific company would make those scans more useful.

For example, SendGrid tokens start with "SG", Amazon tokens with "AKIA", etc. Why not build that into the URL scheme?

This scheme will have services sticking to their own well-known, unformatted prefixes in the token itself, and it prevents small actors from being notified by GitHub and other security scanners when their tokens leak.

See also https://news.ycombinator.com/item?id=25016838


I guess because then you need a registry to keep those strings unique?


Using a domain name would have been enough, e.g.

secret-token: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

vs

secret-token:sendgrid.com/wQsmLe_tnOiByt8JUO4Kro.S_kyH0U2yBC54LHXoUoI4ppe3T5SFsiNmcWiu4xaVt0

Then you could even standardize that scanners should use sendgrid.com/.well-known/revoke-token to notify of secret tokens found in the wild.
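
A scanner-side sketch of that hypothetical convention (the domain prefix and the .well-known path are this comment's proposal, not anything in the RFC):

  # hypothetical: derive a revocation endpoint from a domain-prefixed token
  def revocation_url(token: str) -> str:
      value = token.removeprefix("secret-token:")
      domain, _, _secret = value.partition("/")
      return f"https://{domain}/.well-known/revoke-token"

  leaked = "secret-token:sendgrid.com/wQsmLe_tnOiByt8JUO4Kro.S_kyH0U2yBC54LHXoUoI4ppe3T5SFsiNmcWiu4xaVt0"
  print(revocation_url(leaked))  # https://sendgrid.com/.well-known/revoke-token
  # a real scanner would then POST the finding (repo, commit, match) to that URL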


The stated purpose is to allow detection of strings that should not be publicly disclosed. If such a string also plainly identified what it is good for, that would essentially defeat the purpose. I.e., the authors want to encourage the creation of automated tools that prevent disclosure of such tokens, not the creation of automated tools that harvest such tokens from publicly accessible data.


The purpose is to detect secret strings in source code. If you have the source code, the purpose of the secret is easy to figure out.


Yes, but there's also value in services automatically revoking leaked credentials, which is made easier by including such an identifier.


Couldn’t you protect against that with something like com-amazon-secret-token?




It's not friendly to mobile and too grey for my taste. Reading on this site is a really hard task.


Yes, right ;)



A little embarrassed that I didn’t know about this site before. Thanks!


Tool, not site. However, let me try to add a little value: if, like me, you didn’t know the IETF’s site offered a tool for web-friendly RFC viewing, check out the full (maybe?) list of their tools. Some of the links don’t work or are insecure, but there are some neat scripts available, too[0].

Linked URLs and cross-referenced articles are why I prefer sites like man7[1] for online command reference.

[0] https://tools.ietf.org/

[1] https://www.man7.org/linux/man-pages/index.html


This looks neat, and simple enough to very easily adopt if creating a new service.

How does the IETF RFC process work? I note this is labeled "Category: Informational"; is it a standard already, or is it to become one? Where can one follow discussions around this (I'm guessing there's a mailing list somewhere)?


Informational is outside the standards track. In this context it means, more or less, "The authors would like to inform you that this is a thing now (e.g. the URI scheme has been registered); you might want to consider using it", but it's not something where a process to develop it further has happened or is expected (which would be the case for a standard), partially because it's just so simple.


I am the author of an Internet Draft (security.txt) that is going through a similar process to Mark Nottingham's RFC above, so I might be able to help.

This is an "Informational" specification which means it went via the "Non-Standards Track". You can read more about this process here: https://www.ietf.org/standards/process/informational-vs-expe....

The Internet Draft (ID) was presented at the DISPATCH meeting session at IETF 103 which includes some discussion at the end of the presentation: https://youtu.be/OAKv4Sc0jhM?t=1183. The "Informational" vs "Standards" track topic is actually brought up briefly here: https://youtu.be/OAKv4Sc0jhM?t=1660.

There were some discussions surrounding the ID in the ART mailing lists: https://mailarchive.ietf.org/arch/msg/art/52wxNQ4KI-Z_9tFuFL....

You can also see comments by the IESG review board here: https://datatracker.ietf.org/doc/rfc8959/ballot/.

Hope that clears up any confusion. :)


There are (roughly) two separate ways RFCs can be created, and there are also both "standards track" and non-standards-track RFCs. Both of these distinctions are emergent properties rather than having been consciously decided upon at the outset.

This was an Independent Submission and it was also not Standards Track.

Here's an RFC about the Independent Submission process.

https://tools.ietf.org/html/rfc4846

The IETF takes the pragmatic view that whether something is in fact "a standard" is an observable fact, rather than something they, lacking any enforcement powers, have control over. As a result, for most RFCs on the Standards Track the end state is "Proposed Standard", although some may eventually get labelled "Internet Standard" when it turns out that's what they are.

To the extent you can really "follow discussions" for an independent submission they'll have taken place in the mailing list for the ART (Applications and Real Time) area https://www.ietf.org/mailman/listinfo/art

If you were developing something a bit more substantial, and wanted a community of people to work on that, you'd form an IETF Working Group to do so. The products from such Working Groups are the other major source of RFCs, and some (but far from all) will be Proposed Standards.

The decision about what to do, especially if you aren't an experienced participant, is a problem for the IETF group named DISPATCH (or, if it's a security problem, SECDISPATCH). DISPATCH can figure out whether it's a good fit for the IETF ("What is a sandwich?" is not), whether it's something where useful consensus can be achieved ("Should web pages use Javascript?" is not) and what best to do with what you've got.

For example: Maybe you want to propose a new protocol for Internet-connected cat flaps. DISPATCH might suggest there's a Working Group already doing related stuff for home pet networking, they could "adopt" your idea, you should write up a draft, submit that, and then get their chairs to have you present your idea to the group. You should probably also join in their discussions meanwhile, you might realise their existing "Pet toy related protocols" are almost what you need anyway and abandon your idea. Or they might say you seem to have a complete working proposal, we can't find anybody else who is interested in this at all, so just submit it as it is.

Independent submission is also good where you are in fact documenting a real product that exists regardless. If Apple decided to allow every platform to use iMessage, they could write up an Independent Submission explaining the protocol. Microsoft did this for various protocols years ago. Clearly if it wasn't already a "standard" when Windows did it, it won't be a "standard" just because you wrote an RFC, but at least now we can all agree how it's supposed to work.


I am curious if someone can elaborate on the scenario in which one is accidentally committing secrets to source control.

For us, anything that would be a "secret" is stored in a SQLite database that sits next to the executable. In many cases, these secrets would additionally be encrypted with platform-specific techniques such as DPAPI. None of these databases are under source control and would be ignored upon commit.

If we want to load a secret into our application, we have to use an administrative web interface exposed by the application in each required environment. We view the management of secrets and other parameters as an important feature for our product and have built tooling up accordingly. Non-developers are able to manage secrets in our software, and we are actually trying to make this mandatory for compliance reasons.


It's common (for novices, or experts who are in a rush) to accidentally commit AWS keys for dev environments, or other API keys (e.g. Mailchimp) or other secrets (2FA, SAML certs etc).

We all know how to do it right, but particularly early on in a project it's easy to test something with a hardcoded API key, then forget and commit it.

Picking a framework that makes this a less obvious choice (e.g. using a .env file that's already in .gitignore by default) helps a lot here.


So, what's to prevent a developer from also skipping proper application of this hypothetical URI scheme if they are in such a rush? It seems like a similar level of pedantry that would be disregarded if one were in a hurry.


AWS, for example, could start issuing their tokens with this prefix. The hypothetical inexperienced/rushed developer is a consumer of the service, not an issuer of tokens.


The idea is that people issuing tokens (e.g. AWS, Mailchimp, etc.) in my example would follow this.

This would then allow other tools (e.g. git, GitHub, pre-commit linters, frameworks) to flag this as a problem, either with a warning or an error, depending on the tool.

The idea is that the novice or rushing expert is then prevented from making this easy-to-make mistake.


Cloud services and other third party software starting to generate and require bearer tokens in this format.

It will be easiest for developers in a hurry or without knowledge to simply copy and paste these third party strings.

(Pedantry is only required when writing your own software to generate bearer tokens without even using a library or framework, and developers rarely do that, especially for the kinds of things written in a rush.)


A common situation for this is when people put a Github "personal access token" in a dotfile and then store all their dotfiles on Github.

https://docs.github.com/en/github/authenticating-to-github/c...


Unfortunately, there is tooling, and a subculture that builds tooling, that uses configuration files in the same directory as the source code.

Say you're deploying with cool-deploy-tool. cool-deploy-tool finds the password for the server by looking for a file called "Coolfile" in the project root. You have to create that file to use it. Now you have a footgun lying around.

I associate this stuff with the Ruby community, but it's spread far and wide, notably to the Node and Go communities, which I think have large Ruby emigrant populations.

As a prototypical example, consider dotenv, which began in Ruby:

https://github.com/bkeepers/dotenv

And has travelled to Node:

https://github.com/motdotla/dotenv

And Go:

https://github.com/joho/godotenv
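
For completeness, the same pattern exists in Python via python-dotenv (the analogous port; the variable name below is made up):

  # the .env file lives in the project root next to the code -- hence the footgun
  import os
  from dotenv import load_dotenv  # pip install python-dotenv

  load_dotenv()  # reads KEY=value pairs from ./.env into the process environment
  api_key = os.getenv("MAILCHIMP_API_KEY")  # hypothetical key name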


Yeah, I'm not sure why this is such a problem. At work we tried out using detect-secrets[1] for a while to make sure we didn't accidentally commit anything important.

It was a huge pain; it picked up nothing but false positives.

After a few months we decided that our pre-existing practices (pass secrets to the app in environment variables set during deployment, and use fake secrets for test and dev) solved the problem in a much less painful way.
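
Roughly what that looks like in application code (names invented for illustration):

  import os

  # the real value is injected by deployment tooling as an environment variable;
  # dev and test fall back to an obviously fake placeholder, so nothing secret
  # ever lives in the repo
  PAYMENT_API_KEY = os.environ.get("PAYMENT_API_KEY", "fake-dev-key")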

Is it so hard to keep real secrets separate from the source code?

[1] https://github.com/Yelp/detect-secrets


This is a good idea! It's slightly ironic though. Usually, if you have some kind of secret, you don't want to announce it. All things being equal, you would prefer not to have a "wallet.dat" sitting around in an easy-to-find location.

secret-token embraces the opposite theory. Let us carefully delineate our secrets and make them quite easy to find, in the hope that the systems most likely to find our secrets will altruistically notify us rather than do anything nefarious.


So, what do I do if multiple secrets are needed? How do I know which secret goes with what? This is just using a protocol as a stand-in for a header variable, and it is decidedly not a URI.


You continue to do whatever it is you do now.

This idea is just that the actual value of every secret token should be prefixed with a recognisable string.

So instead of your config file looking like:

  payment.api-key=sF1VGYGsJVnnb23a
  logging.api-key=yunHagrya
It looks like:

  payment.api-key=secret-token:sF1VGYGsJVnnb23a
  logging.api-key=secret-token:yunHagrya
Your code doesn't even need to change. It's just a convention used by parties issuing secret tokens, to make them easier to spot.


The first thing that comes to mind is a mindless "let's factor out the cruft" approach somewhere, in the form `getToken() { return "secret-token:" + getMySecret() }`.


This is a neat idea. I've thrown together a small Python library for working with these: https://github.com/Lexicality/secret-token


Doesn't this defeat the whole purpose of the RFC? As I understand it, the RFC is meant to enable automatic detection of committed secrets by basically finding the secret-token URI. If one uses your library, which automatically creates this URI by taking a secret and prepending "secret-token:", then the "secret-token" part won't be part of the committed secret and thus won't be detected by something that looks for the URI.


No, this library is for either encoding secrets immediately after generating them, or decoding them in-memory should you need to do that.

The main use case I created this library for is that I look after a lot of 3rd-party API keys and I want to encode all of them at rest using this RFC, but I'll need to decode them again in order to use them.


The idea might not be too bad, but it is ridiculous to have to prepend something as long and ugly as the full name "secret-token:".

Why not just: "Secret:" or "token:" or "ost:" for example?


What problem does that solve? This isn't something that a human will ever have to type. Explicit long names are much better.


Typing isn't everything. Humans won't have to type "secret-token:", but they will have to read it, and it looks ugly :-)

I would probably have voted for "secret:[..]".

But maybe ugly is good.

Secret tokens shouldn't survive for long in source code or files you read, or .bash_history files, or your editor's buffers. They should be in secret config files that you don't read often or even open in an editor.

Maybe it's better that they are annoying to see, as a reminder to move them somewhere you don't look at.


The ugliness is a feature. I wish it were all caps. You should see that line in a PR and the hair on the back of your neck should stand up, and you're off to rotate that secret.


Seems like all of those would be much more susceptible to false positives.


Why would you link a secret token to an original sound track header?


OAuth secret token.


Brevity for common things is useful.

More useful would be keeping the value (the part to the right of 'secret-token:') concise. Some of the systems that grant tokens generate really lengthy base64-encoded strings that one must then schlep around with every request. A few have solved this with UUID-like hash values, thankfully.


Why wasn't this made a URN?


I don’t see that URN gets you anything. You’d be registering a namespace identifier for it, so it’d end up just being the difference between urn:secret-token:* and secret-token:*. Does the extra “urn:” help with anything? Doesn’t seem so to me.


Do I get this right?

We're trying to solve something that looks like a git weakness with a web RFC?


Right, my spidey senses were tingling, too. So I looked into the RFC's author, and noted his position as one of the champions of QUIC as HTTP/3, and his promotion of encrypted DNS (specifically DoH), and then an interesting article that on the surface sounds like it advocates for ensuring end users retain control of their DNS requests, but which actually advocates for burying the implementation of DNS in the OS and only exposing a control layer to the end user (e.g. an on/off toggle). [0] I can't pretend to have enumerated all the possible ramifications of this, but without more than a couple of paragraphs in an RFC which is essentially one particular example use case, I consider this a fairly suspicious addition to the URI spec specifically, when viewed in the context of how this might be used to obscure URLs (a subset of URI) from end users (and software running on end user machines).

[0] https://www.mnot.net/blog/2019/06/11/endpoint_control



