This is a really common web app flaw as well; there's almost always something yo...

jasonkester · on Oct 22, 2010

You can use it to your advantage too. Start your IDs off at a big number, and your first few savvy customers that see their ID think you're a lot bigger than you really are.

So instead of knowing they're user #8, they think they're user #14221, and are a lot more likely to trust you with their money.

It works well for throwing off the competition too. "Holy Crap! They have 150k users and they're only 3 months old! We're screwed!!!"

thibaut_barrere · on Oct 22, 2010

When I started freelancing 5 years ago, my accountant advised me to start with a large invoice number (eg: 20850) and increment for each new one :)

eru · on Oct 22, 2010

Decrementing from there would be more fun.

thibaut_barrere · on Oct 23, 2010

Fun indeed! Unfortunately illegal in France afaik.

mcculley · on Oct 22, 2010

When I started consulting, I initially made the date part of the invoice number, e.g., 04100501 was the first invoice sent on October 5th of 2004. This sidestepped the problem of showing customers how many invoices I had already sent. But it did show them how many I sent that day, which was usually not many.

I eventually switched to a sequential monotonic invoice numbering scheme for other reasons, mostly because it made it easier on the webapp I use for invoicing.

InclinedPlane · on Oct 23, 2010

yymmdd<seq-#> is monotonically increasing. Though you could also just combine the first 6 or so digits of unix time and tack on a sequence number to the end of that.
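
A minimal sketch of the date-prefixed scheme, assuming a two-digit per-day sequence number as in the 04100501 example above. The function name `invoice_number` is hypothetical.

```python
from datetime import date

def invoice_number(day: date, seq: int) -> str:
    """Build a yymmdd<seq> invoice number; seq is the per-day counter (assumed 1..99)."""
    if not 1 <= seq <= 99:
        raise ValueError("seq must fit in two digits")
    return f"{day:%y%m%d}{seq:02d}"

first = invoice_number(date(2004, 10, 5), 1)
later = invoice_number(date(2004, 10, 6), 1)
print(first)  # 04100501
```

Because the width is fixed, comparing these strings gives chronological order, at least until the two-digit year wraps around.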

there · on Oct 22, 2010

i've heard of new companies doing that when ordering checks.

though strangely i've also seen other companies refuse checks below a certain number, as some sort of measure of credit worthiness.

chadgeidel · on Oct 22, 2010

I had always understood it to be the other way around. Companies won't accept checks below a certain number, so you just start your check run at an "acceptably high" number.

sudont · on Oct 22, 2010

Yep. The bank started my first checking account at #1500.

gregpilling · on Oct 22, 2010

I have always done this with checks and with invoice numbers. I generally start at some number just over 10,000.

codebaobab · on Oct 22, 2010

Perhaps your comment was meant to be light-hearted and I am just being pedantic, but all an interested party has to do is come back a few more times over the course of a few days and note the values of new IDs. At that point they know the rate at which you are (adding new customers/adding new content/whatever) and can project backwards from there (completely ignoring the absolute value of the initial ID.)
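
A rough sketch of that back-projection, assuming the signup rate is constant. The dates and IDs below are made up for illustration, and both function names are hypothetical.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400

def ids_per_day(t1: datetime, id1: int, t2: datetime, id2: int) -> float:
    """Rate of new IDs between two observations of the newest ID."""
    return (id2 - id1) / ((t2 - t1).total_seconds() / SECONDS_PER_DAY)

def estimated_total(rate: float, launch: datetime, now: datetime) -> float:
    """Project the rate back to launch; the absolute ID value is never used."""
    return rate * (now - launch).total_seconds() / SECONDS_PER_DAY

# Hypothetical observations: newest ID was 14221, then 14251 three days later.
rate = ids_per_day(datetime(2010, 10, 1), 14221, datetime(2010, 10, 4), 14251)
total = estimated_total(rate, datetime(2010, 7, 1), datetime(2010, 10, 4))
print(rate, total)  # 10.0 950.0
```

Under these assumed numbers the site has roughly 950 users, whatever offset the IDs started from.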

snprbob86 · on Oct 22, 2010

Would a repeat customer really care what your rate is? If this is about impressions, the fact that they are returning indicates that you made a good enough impression, which should far overshadow your invoicing rate.

codebaobab · on Oct 22, 2010

I'm not thinking about repeat customers. I am thinking about competitors and people looking to gather industry information.

The point, as stated by tptacek, is that sites quite often leak information.

citricsquid · on Oct 22, 2010

What a coincidence! Just today I used the sequential ID numbers of a company's deployment to work out their user numbers; it's interesting to compare things like that with reported figures. I wonder what the best approach is when protecting those numbers. Maybe something like Twitter, where they would take a chunk of numbers and then stagger deployment through those numbers (if I remember correctly; I'm not all that familiar with their system).

pauldino · on Oct 22, 2010

One thought I've had is encrypting sequential IDs with DES. The block size is 64 bits which conveniently maps to int types and the token you get is half the size of a UUID.

JoachimSchipper · on Oct 22, 2010

> DES

Almost anything involving that word, including your proposal, is a bad idea.

tbrownaw · on Oct 22, 2010

Would you care to enlighten us as to how?

JoachimSchipper · on Oct 22, 2010

DES' key is short enough to brute-force (see below for time/cost). It's not tremendously difficult to obtain some output samples: make the first account, you now know DES(key, 0), where key is the actual key used by the site. Then run through all 2^56 keys k until DES(k, 0) = DES(key, 0); for some extra assurance, you could also check DES(k, 1) = DES(key, 1) by making another account.

Once you have key, the proposed scheme is no better than sequential account numbers.

This gets slightly more difficult if you don't know your account id; in that case, simply create a couple of accounts immediately after each other (script it), and check whether DES(k, i) = DES(key, counter), DES(k, i + 1) = DES(key, counter + 1) etcetera, where again key is the real key, counter is the real counter at the time of creation of the first account. You now have to bruteforce counter (i) as well as key (k), but that's still doable.
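
To make the shape of the attack concrete, here is a toy sketch. It does not use DES: `toy_encrypt` is a made-up 4-round Feistel cipher with a 16-bit key, chosen so the whole key space can be searched in well under a second. The names and the key value are hypothetical; only the structure (known plaintexts 0 and 1, try every key) mirrors the description above.

```python
import hashlib

KEY_BITS = 16  # toy key size; real DES has 56 key bits

def toy_encrypt(key: int, block: int) -> int:
    """Toy 4-round Feistel cipher on 32-bit blocks. A stand-in for DES, not DES."""
    left, right = block >> 16, block & 0xFFFF
    for rnd in range(4):
        digest = hashlib.sha256(f"{key}:{rnd}:{right}".encode()).digest()
        left, right = right, left ^ int.from_bytes(digest[:2], "big")
    return (left << 16) | right

def brute_force(token0: int, token1: int):
    """Try every key until the tokens for account ids 0 and 1 both match."""
    for k in range(2 ** KEY_BITS):
        if toy_encrypt(k, 0) == token0 and toy_encrypt(k, 1) == token1:
            return k
    return None

site_key = 0xBEEF  # the secret the site would hold (hypothetical)
recovered = brute_force(toy_encrypt(site_key, 0), toy_encrypt(site_key, 1))
print(hex(recovered))
```

With a real 56-bit key the loop is the same, just 2^40 times longer, which is where dedicated hardware comes in.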

Brute-forcing DES is not easy on a desktop, but http://www.sciengines.com/copacobana/faq.html offers a $10,000 off-the-shelf solution that can break DES in 8.7 days (source: their presentation at CHES 2006). Note that this uses components from 2006 and that it's easy to trade cost for speed (i.e. just buy half as many boards).

The above is the most obvious attack; I'm in no way saying there are no others. It's not impossible to make schemes like the above work (e.g. using the pair (AES(key1, id), HMAC(key2, AES(key1, id))) and a good library); but even the scheme proposed is more complex than just picking random account numbers.
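
For comparison, a minimal sketch of "just picking random account numbers", assuming 64-bit IDs and an in-memory set standing in for whatever uniqueness check the real data store provides. `new_account_id` is a hypothetical helper.

```python
import secrets

def new_account_id(existing: set[int], bits: int = 64) -> int:
    """Pick an unused random id; retry on the (unlikely) collision."""
    while True:
        candidate = secrets.randbits(bits)
        if candidate not in existing:
            existing.add(candidate)
            return candidate

ids: set[int] = set()
a = new_account_id(ids)
b = new_account_id(ids)
print(a, b)
```

There is no key to recover here, so nothing about the number of accounts can be derived from any single ID.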

Poiesis · on Oct 22, 2010

Surely for a startup this sort of information is not worth $10000? It just seems like you've got so many more important things to worry about than to worry if someone's going to spend thousands of dollars figuring out how many users you have (which is likely an effect, not a cause, of any competitive advantage).

JoachimSchipper · on Oct 22, 2010

If Moore's law holds and you're willing to wait a month instead of ~9 days, it costs <$1500 in hardware, and you can keep/resell that. Or, again, rent time. Or just use a distributed.net-esque approach and bruteforce using (a lot of) standard PCs. Or...
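
A back-of-the-envelope check of that figure. The $10,000 and 8.7 days come from the comment above; the two-year doubling period and the linear cost/speed trade-off are assumptions made for this sketch.

```python
BASE_COST_USD = 10_000   # from the thread: 2006 hardware
BASE_DAYS = 8.7          # from the thread: time to break DES
YEARS_ELAPSED = 4        # 2006 -> 2010
DOUBLING_YEARS = 2.0     # assumption: price/performance doubles every two years
TARGET_DAYS = 30         # "a month"

# Same speed on newer hardware, then fewer boards because we can wait longer.
cost_same_speed = BASE_COST_USD / 2 ** (YEARS_ELAPSED / DOUBLING_YEARS)
cost_month = cost_same_speed * BASE_DAYS / TARGET_DAYS
print(round(cost_month))  # 725
```

Under these assumptions the estimate lands below the $1500 quoted; a shorter doubling period would lower it further.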

Seriously, "people aren't going to hack us" is the new "premature optimization is the root of all evil": a convenient excuse to do it horribly wrong.

blaines · on Oct 22, 2010

A good solution for obfuscating/encoding ids: http://github.com/kylebragger/tiny

InclinedPlane · on Oct 22, 2010

I wouldn't call that a "good" solution per se, it's highly vulnerable to pre-computed table cracking (or merely discovering and making use of the algorithm).

pmjordan · on Oct 22, 2010

You're pretty much stuck with doing that for invoicing, at least in Europe, where invoice numbers must be sequential. OK, you'll have to spend some money to receive an invoice. But then, at least in the UK, limited companies' accounts are effectively public anyway.

slantyyz · on Oct 22, 2010

Even if not required by law, I'm sure you'll be subject to more scrutiny by any taxation agency if you don't serialize your invoice numbers.

As far as they're concerned, you're throwing away invoices to pocket additional income.

InclinedPlane · on Oct 23, 2010

Invoice numbers must be sequential in Europe? What's the reasoning behind that?

codebaobab · on Oct 22, 2010

I had an idea for an online business a few years back that I never really pursued, but I did work out a scheme to assign key objects random ids so that when those ids appeared in a URL, you couldn't tell how many of those had been created. I keep meaning to do a write up, but, until now, I've never heard anyone else notice the potential to leak information, so I just assumed that no one else cared.

yummyfajitas · on Oct 22, 2010

The django-extensions project provides several simple ways to avoid this - just make URLs reference UUIDs or slugs. I imagine RoR has something similar.

http://github.com/django-extensions/django-extensions/blob/m...

gaelian · on Oct 22, 2010

For RoR, you can override to_param on the model in question, something like:

  class User < ActiveRecord::Base
    # Rails calls to_param when building URLs; return the login instead of the numeric id.
    def to_param
      login
    end
    # ...
  end

Assuming self.login is reliably unique. Although I've found this can still have some follow on effects that could need to be handled in a situation similar to the above. There are also some heavier weight, more feature filled options of course.

yummyfajitas · on Oct 22, 2010

Some of the heavyweight ones are nice as far as human readability goes. Consider the AutoSlugField (in Django). Say you have an object, Item(name='Phillip Lim Pleated printed silk-chiffon dress', description='...').

Which link is less scary?

http://secret-project/item/phillip-lim-pleated-printed-silk-...

or

http://secret-project/item/47a61d7c-dde1-11df-ba19-0026c72a6...
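
A minimal sketch of how a slug like the first link could be derived from the item name. This is not AutoSlugField's implementation, just a plain regex-based `slugify` written for illustration, and it ignores uniqueness handling.

```python
import re

def slugify(name: str) -> str:
    """Lowercase the name and collapse every run of non-alphanumerics to one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

slug = slugify("Phillip Lim Pleated printed silk-chiffon dress")
print(slug)  # phillip-lim-pleated-printed-silk-chiffon-dress
```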

lincolnq · on Oct 22, 2010

Funny! I've come to associate slug-type URLs (like the former, that have no numbers or anything in the URL) with content-free websites that have been SEO'd. The latter is also scary though, because it's too MANY random-ass numbers. I have sort of come to expect a combination of a numeric identifier and readable words for reputable sites.

isb · on Oct 23, 2010

The second link, although ugly, is less "guessable" if you want to skimp on the authorization check. That aside, sometimes an entity might not have a natural key that can be slugified (e.g. an invoice), in which case a UUID is better. Also, generating/looking up a UUID might be faster?
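
A small sketch of generating such a token with Python's standard `uuid` module. It assumes a random (version 4) UUID; time-based version 1 UUIDs embed a timestamp and node identifier, so they are a weaker fit for the "less guessable" argument. The `secret-project` host is the placeholder from the links above.

```python
import uuid

token = uuid.uuid4()  # 122 random bits
url = f"http://secret-project/item/{token}"
print(token.version, url)
```

An unguessable URL still does not replace the authorization check; it only makes enumeration impractical.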

lsc · on Oct 22, 2010

on the other hand, in many cases this information is revealed, to no apparent ill effect.

Look at Linode's display of how many servers they have available. They are giving away how much business they do in a day, but is that hurting them? If anything, I think it helps them.

several other popular companies here on HN post their revenue numbers publicly.

So yeah, while you should be aware of such things, and make a conscious choice, well, for many of us, this isn't data that really needs to be kept secret.

InclinedPlane · on Oct 22, 2010

Worse yet, this sort of behavior can make security flaws much worse. Imagine a comparatively minor security flaw that allows an attacker to view otherwise secret information for customers, orders, members, etc. If you send plain-text sequential ids around then it becomes trivial to exploit that security hole to gain access to a lot of juicy data. However, if you hash or obfuscate every identifier then the problem for the attacker becomes much, much harder, since now the search space is vastly larger.

Defense in depth, belt and suspenders, and all that.
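
To put a number on "vastly larger", a worked example under stated assumptions: 150,000 valid IDs (a figure borrowed from earlier in the thread) drawn uniformly from a 64-bit space. With sequential IDs, every guess between 1 and 150,000 hits; with random ones:

```python
from fractions import Fraction

VALID_IDS = 150_000   # hypothetical number of records
ID_SPACE = 2 ** 64    # assumption: ids drawn uniformly from 64 bits

hit_probability = Fraction(VALID_IDS, ID_SPACE)  # chance one blind guess is valid
expected_guesses = 1 / hit_probability           # mean guesses per valid id found
print(f"{float(expected_guesses):.3e}")
```

That is on the order of 10^14 requests per record found, compared with one request per record for sequential IDs.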

nandemo · on Oct 22, 2010

Would you care to explain how that is a flaw?

werrett · on Oct 22, 2010

Obviously the company and nature of a site will dictate whether leaking that information is considered a "flaw" and how bad a one.

Depending on how user sessions are tracked, being able to predict other valid "user ids" based on your own is an important first step to attacking other accounts.

It isn't unusual to find other flaws that will enable you to pull more (potentially sensitive) information about users or even "impersonate" users when armed with knowledge of someone else's valid user id.

Non-public companies certainly don't have too many obligations to publish information on the amounts of customers, numbers of transactions, etc they are doing. Even public ones won't break a lot of that out.

One of tptacek's strangers (competitors?) being able to tell how many paying online subscribers a newspaper has signed up would probably make someone in management squirm.

Likewise with being able to tell how many transactions an Internet Banking application is pumping.

Both of those are real examples.

brianpan · on Oct 22, 2010

Competitive intelligence - http://en.wikipedia.org/wiki/Competitive_intelligence - competitors can gather information about your business, make estimates about the health of your business, etc.