This is a really common web app flaw as well; there's almost always something your app does that reveals how many customers you have (or how much business you're doing) to any stranger with an account.
You can use it to your advantage too. Start your IDs off at a big number, and your first few savvy customers that see their ID think you're a lot bigger than you really are.
So instead of knowing they're user #8, they think they're user #14221, and are a lot more likely to trust you with their money.
It works well for throwing off the competition too. "Holy Crap! They have 150k users and they're only 3 months old! We're screwed!!!"
When I started consulting, I initially made the date part of the invoice number, e.g., 04100501 was the first invoice sent on October 5th of 2004. This sidestepped the problem of showing customers how many invoices I had already sent. But it did show them how many I sent that day, which was usually not many.
I eventually switched to a sequential monotonic invoice numbering scheme for other reasons, mostly because it made it easier on the webapp I use for invoicing.
yymmdd<seq-#> is monotonically increasing. Though you could also just combine the first 6 or so digits of unix time and tack on a sequence number to the end of that.
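A minimal sketch of the yymmdd<seq-#> scheme, assuming a two-digit per-day sequence as in the 04100501 example above. The function name `invoice_number` is made up for illustration.

```python
from datetime import date


def invoice_number(day: date, seq: int) -> str:
    """Build a yymmdd<seq> invoice number such as 04100501."""
    # Assumption: at most 99 invoices per day, so seq fits in two digits.
    if not 1 <= seq <= 99:
        raise ValueError("seq must be between 1 and 99")
    return f"{day:%y%m%d}{seq:02d}"


first = invoice_number(date(2004, 10, 5), 1)
second = invoice_number(date(2004, 10, 5), 2)
next_day = invoice_number(date(2004, 10, 6), 1)
```

Because every number has the same width, plain string comparison orders them by date and then by sequence, at least within one century of two-digit years.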
I had always understood it to be the other way around. Companies won't accept checks below a certain number, so you just start your check run at an "acceptably high" number.
Perhaps your comment was meant to be light-hearted and I am just being pedantic, but all an interested party has to do is come back a few more times over the course of a few days and note the values of new IDs. At that point they know the rate at which you are (adding new customers/adding new content/whatever) and can project backwards from there (completely ignoring the absolute value of the initial ID.)
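As a rough illustration of that projection, here is a sketch that assumes a constant signup rate and a known launch date. The sample IDs and dates are invented.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400


def signup_rate(samples):
    """samples: (timestamp, id) pairs, oldest first. Returns new IDs per day."""
    (t0, id0), (t1, id1) = samples[0], samples[-1]
    elapsed_days = (t1 - t0).total_seconds() / SECONDS_PER_DAY
    return (id1 - id0) / elapsed_days


def estimated_total(samples, launch):
    """Project the observed rate back to the launch date.

    The absolute ID values are ignored; only their differences matter.
    """
    age_days = (samples[-1][0] - launch).total_seconds() / SECONDS_PER_DAY
    return signup_rate(samples) * age_days


# Hypothetical observations: two IDs seen three days apart.
samples = [
    (datetime(2010, 3, 1), 14221),
    (datetime(2010, 3, 4), 14236),
]
rate = signup_rate(samples)
total = estimated_total(samples, launch=datetime(2009, 12, 1))
```

With these made-up numbers the padded starting ID of 14221 does not help: the observer sees about 5 new IDs per day and estimates a few hundred users, not fourteen thousand.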
Would a repeat customer really care what your rate is? If this is about impressions, the fact that they are returning indicates that you made a good enough impression that should far overshadow your invoicing rate.
What a coincidence! Just today I used the sequential ID numbers of a company's deployment to work out their user numbers. It's interesting to compare things like that with reported figures. I wonder what the best approach is when protecting those numbers. Maybe something like Twitter, where they would take a chunk of numbers and then stagger deployment through those numbers (if I remember correctly; I'm not all that familiar with their system).
One thought I've had is encrypting sequential IDs with DES. The block size is 64 bits which conveniently maps to int types and the token you get is half the size of a UUID.
DES' key is short enough to brute-force (see below for time/cost). It's not tremendously difficult to obtain some output samples: make the first account, you now know DES(key, 0), where key is the actual key used by the site. Then run through all 2^56 keys k until DES(k, 0) = DES(key, 0); for some extra assurance, you could also check DES(k, 1) = DES(key, 1) by making another account.
Once you have key, the proposed scheme is no better than sequential account numbers.
This gets slightly more difficult if you don't know your account id; in that case, simply create a couple of accounts immediately after each other (script it), and check whether DES(k, i) = DES(key, counter), DES(k, i + 1) = DES(key, counter + 1) etcetera, where again key is the real key, counter is the real counter at the time of creation of the first account. You now have to bruteforce counter (i) as well as key (k), but that's still doable.
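To make the simple known-ID case concrete, here is a toy sketch. It does NOT use DES: the cipher is a made-up 4-round Feistel network with a 16-bit key, chosen only so that the exhaustive search finishes in about a second with the standard library. All names here are hypothetical.

```python
import hashlib

KEY_BITS = 16  # toy key size; real DES has a 56-bit key


def _round(key: int, rnd: int, half: int) -> int:
    """Keyed round function: 16 bits derived from SHA-256."""
    data = key.to_bytes(2, "big") + bytes([rnd]) + half.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big")


def toy_encrypt(key: int, block: int) -> int:
    """4-round Feistel network over a 32-bit block (a stand-in, not DES)."""
    left, right = block >> 16, block & 0xFFFF
    for rnd in range(4):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << 16) | right


def brute_force(samples):
    """samples maps known account id -> observed token.

    Returns every key that reproduces all observed tokens.
    """
    return [
        k
        for k in range(2 ** KEY_BITS)
        if all(toy_encrypt(k, i) == tok for i, tok in samples.items())
    ]


SECRET_KEY = 0xBEEF  # known only to the "site"
# The attacker creates accounts 0 and 1 and records the tokens shown.
observed = {0: toy_encrypt(SECRET_KEY, 0), 1: toy_encrypt(SECRET_KEY, 1)}
candidates = brute_force(observed)
```

The second sample plays the role of the "extra assurance" check: a wrong key that happens to match on id 0 is very unlikely to also match on id 1.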
Brute-forcing DES is not easy on a desktop, but http://www.sciengines.com/copacobana/faq.html offers a $10,000 off-the-shelf solution that can break DES in 8.7 days (source: their presentation at CHES 2006). Note that this uses components from 2006 and that it's easy to trade cost for speed (i.e. just buy half as many boards).
The above is the most obvious attack; I'm in no way saying there are no others. It's not impossible to make schemes like the above work (e.g. using the pair (AES(key1, id), HMAC(key2, AES(key1, id))) and a good library); but even the scheme proposed is more complex than just picking random account numbers.
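For comparison, "just picking random account numbers" can be as small as the sketch below. It assumes 63-bit IDs (so they fit a signed 64-bit column) and uses an in-memory set where a real system would rely on a unique constraint in the database; `new_account_id` is a made-up name.

```python
import secrets


def new_account_id(existing: set) -> int:
    """Draw a random 63-bit id, retrying on the (very unlikely) collision."""
    while True:
        candidate = secrets.randbits(63)
        if candidate not in existing:
            existing.add(candidate)
            return candidate


ids = set()
a = new_account_id(ids)
b = new_account_id(ids)
```

There is no key to steal or brute-force here, and consecutive accounts get unrelated numbers.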
Surely for a startup this sort of information is not worth $10000? It just seems like you've got so many more important things to worry about than whether someone's going to spend thousands of dollars figuring out how many users you have (which is likely an effect, not a cause, of any competitive advantage).
If Moore's law holds and you're willing to wait a month instead of ~9 days, it costs <$1500 in hardware, and you can keep/resell that. Or, again, rent time. Or just use a distributed.net-esque approach and bruteforce using (a lot of) standard PCs. Or...
Seriously, "people aren't going to hack us" is the new "premature optimization is the root of all evil": a convenient excuse to do it horribly wrong.
I wouldn't call that a "good" solution per se, it's highly vulnerable to pre-computed table cracking (or merely discovering and making use of the algorithm).
You're pretty much stuck with doing that for invoicing, at least in Europe, where invoice numbers must be sequential. OK, you'll have to spend some money to receive an invoice. But then, at least in the UK, limited companies' accounts are effectively public anyway.
I had an idea for an online business a few years back that I never really pursued, but I did work out a scheme to assign key objects random ids so that when those ids appeared in a URL, you couldn't tell how many of those had been created. I keep meaning to do a write up, but, until now, I've never heard anyone else notice the potential to leak information, so I just assumed that no one else cared.
The django-extensions project provides several simple ways to avoid this - just make URLs reference UUIDs or slugs. I imagine RoR has something similar.
For RoR, you can override to_param on the model in question, something like:
    class User < ActiveRecord::Base
      def to_param
        self.login
      end
      ...
    end
Assuming self.login is reliably unique. Although I've found this can still have some follow on effects that could need to be handled in a situation similar to the above. There are also some heavier weight, more feature filled options of course.
Some of the heavyweight ones are nice as far as human readability goes. Consider the AutoSlugField (in Django). Say you have an object, Item(name='Phillip Lim Pleated printed silk-chiffon dress', description='...').
Funny! I've come to associate slug-type URLs (like the former, that have no numbers or anything in the URL) with content-free websites that have been SEO'd. The latter is also scary though, because it's too MANY random-ass numbers. I have sort of come to expect a combination of a numeric identifier and readable words for reputable sites.
The second link, although ugly, is less "guessable" if you want to skimp on the authorization check. That aside, sometimes an entity might not have a natural key that can be slugified (e.g. an invoice), in which case a UUID is better. Also, generating or looking up a UUID might be faster?
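To show the two URL styles being compared, here is a small standard-library sketch. This is not how django-extensions' AutoSlugField is implemented; `slugify` below is a simplified stand-in and the `/items/` path is hypothetical.

```python
import re
import uuid


def slugify(text: str) -> str:
    """Lowercase, turn runs of non-alphanumerics into '-', trim stray dashes."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


name = "Phillip Lim Pleated printed silk-chiffon dress"

# Readable URL derived from a natural key (must be unique to be usable).
slug_url = f"/items/{slugify(name)}/"

# Opaque URL: random version-4 UUID, no natural key required.
uuid_url = f"/items/{uuid.uuid4()}/"
```

Neither form reveals how many items exist, but only the slug depends on the object having a name worth showing.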
On the other hand, in many cases this information is revealed, to no apparent ill effect.
Look at how Linode displays how many servers they have available. They are giving away how much business they do in a day, but is that hurting them? If anything, I think it helps them.
Several other popular companies here on HN post their revenue numbers publicly.
So yeah, while you should be aware of such things, and make a conscious choice, well, for many of us, this isn't data that really needs to be kept secret.
Worse yet, this sort of behavior can make security flaws much worse. Imagine a comparatively minor security flaw that allows an attacker to view otherwise secret information for customers, orders, members, etc. If you send plain-text sequential ids around then it becomes trivial to exploit that security hole to gain access to a lot of juicy data. However, if you hash or obfuscate every identifier then the problem for the attacker becomes much, much harder, since now the search space is vastly larger.
Defense in depth, belt and suspenders, and all that.
Obviously the company and nature of a site will dictate whether leaking that information is considered a "flaw" and how bad a one.
Depending on how user sessions are tracked, being able to predict other valid "user ids" based on your own is an important first step to attacking other accounts.
It isn't unusual to find other flaws that will enable you to pull more (potentially sensitive) information about users or even "impersonate" users when armed with knowledge of someone else's valid user id.
Non-public companies certainly don't have too many obligations to publish information on the number of customers, number of transactions, etc. they are handling. Even public ones won't break a lot of that out.
One of tptacek's strangers (competitors?) being able to tell how many paying online subscribers a newspaper has signed up would probably make someone in management squirm.
Likewise with being able to tell how many transactions an Internet Banking application is pumping.