Keeping our free tier sustainable by preventing abuse (geocod.io)
106 points by thecodemonkey 64 days ago | 46 comments



Thanks for this writeup. Whenever people complain about some service removing or making it harder to try out a free tier, I think they don't realize the amount of abuse that needs to be managed by the service providers.

"Why do things suck?" Because parasites ruined it for the rest of us.

> We have to accept a certain amount of abuse. It is a far better use of our time to use it improving Geocodio for legitimate users rather than trying to squash everyone who might create a handful of accounts

Reminds me of Patrick McKenzie's "The optimal amount of fraud is non-zero" [1] (wrt banking systems)

Also, your abuse-scoring system sounds a bit like Bayesian spam filtering, where you have a bunch of signals (Disposable Email, IP from Risky Source, Rate of signup...) that you correlate, no?

[1] https://www.bitsaboutmoney.com/archive/optimal-amount-of-fra...


Co-Founder of Geocodio here who designed the scoring system :)

I suppose you could call it inspired by Bayesian inference, since we're using multiple pieces of independent evidence to calculate a score, though that makes it sound a bit fancier than it is and we aren't actually using Bayes' theorem. But it's possible I had that in the back of my head from a game theory class I took long ago.

But for the fun of it, let's model it that way:

P(spam | disposable email domain, IP address, etc.) = [P(disposable email domain, IP address, etc. | spam) × P(spam)] / P(disposable email domain, IP address, etc.)

Or something like that.
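For the curious, here's what that model looks like in code: a naive-Bayes-style combination of independent signup signals into a spam probability. All signal names, conditional probabilities, and the prior below are made up for illustration; they are not Geocodio's actual values.

```python
# Naive-Bayes-style scoring of signup signals.
# Every probability here is invented for illustration.

SIGNALS = {
    # signal name: (P(signal | spam), P(signal | legitimate))
    "disposable_email": (0.60, 0.02),
    "risky_ip":         (0.40, 0.05),
    "rapid_signups":    (0.50, 0.01),
}

PRIOR_SPAM = 0.05  # assumed base rate of abusive signups

def spam_probability(observed: set[str]) -> float:
    """P(spam | observed signals), assuming the signals are independent."""
    p_spam, p_legit = PRIOR_SPAM, 1.0 - PRIOR_SPAM
    for name, (p_given_spam, p_given_legit) in SIGNALS.items():
        if name in observed:
            p_spam *= p_given_spam
            p_legit *= p_given_legit
        else:
            p_spam *= 1.0 - p_given_spam
            p_legit *= 1.0 - p_given_legit
    # Normalize: the denominator is P(signals), summed over both hypotheses.
    return p_spam / (p_spam + p_legit)
```

A signup hitting all three signals scores near 1.0, while a clean signup scores near the prior; each additional piece of evidence multiplies the odds rather than just adding a fixed amount.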

Also — it's a delight to have one of Patrick's articles mentioned in connection with this!


> "The optimal amount of fraud is non-zero" [1] (wrt banking systems)

It's a bit like how each 9 of runtime is an order of magnitude (ish) more expensive to achieve, and most use cases don't care if it's 99.999% or 99.9999%.


Free tier and free trial abuse is a huge problem, but also a huge opportunity.

We have seen customers where free-tier abusers created 80k+ accounts in a day and cost millions of dollars. We have also seen businesses, like OddsJam, add significant revenue by prompting abusers to pay.

The psychology of abuse is also quite interesting: even what appear to be serious abusers (think fake credit cards, new email accounts, etc.) will refuse a discount and pay full price if they feel they 'got caught'.


I’d love to hear more about the idea that somebody making a fraudulent signup with a stolen credit card is potentially going to pay full price if they “get caught”


There are obviously people doing free trial abuse for commercial gain, e.g. signing up 1k accounts to get test credit cards or to resell accounts. They are not going to convert (although sometimes you can successfully convert them into affiliates).

We have seen individuals just trying to get free accounts week after week, who, when nudged once, immediately pay thousands of dollars even after using fake, stolen, or empty cards.

These individuals think they are being cheeky and when they are 'caught' they revert to doing the right thing.


> We have seen individuals just trying to get free accounts week after week, who when nudged once pay immediately

This pattern is everywhere. It was foreign to me for a long time because I'm the type of person who likes to play within the rules. There are a lot of people who get a kick out of gaming the system to their advantage, even to the point of breaking the law.

Many people have zero qualms about stealing things when they imagine it's a faceless corporation on the other side. They might even rationalize it with mental gymnastics until they think they're doing the right thing. You see it most often when the topic of media piracy or sharing Netflix logins comes up.

This mindset is very common in startup communities. I've heard so many stories from founders gloating about how they abused some system or used a loophole to avoid paying for something they could clearly afford. It's like a badge of honor to some people. I know one guy who bought an EV but hasn't installed a charger because he drives it to a business down the street and uses the EV charger they installed for their employees every night. Another guy used to brag about sneaking into a cafeteria for another organization and stealing lunch every day. A while ago I talked to a guy who liked to "dine and dash" without paying his tab, even though he could easily afford it. For them, it's all about getting away with it and winning a game.

As soon as you make it obvious that someone is watching them, they cave. They don't want to be the type of person who abuses actual people. They only like to abuse what they see as faceless systems.


You got called out, responded, but didn’t really address the point. Looks like the original claim was overstated.


I was referring to generated or disposable card numbers rather than stolen ones. Maybe that is the confusion?

A concrete example of converting a user using these types of cards for free trial abuse: a user signed up 8 weeks in a row using different emails, names, IPs, and cards. Nudging was enabled, and on trying to sign up for their 9th trial they immediately switched back to their original account and converted at full price.


I’m not in that industry, so maybe you’re not defining “stolen” as criminal activity?

I think many people would infer it to mean there’s a victim involved, someone’s personal credit card was stolen. If that’s the case it’s especially bizarre and not a customer you’d want to convert.


I imagine an amateur who wants the problem to go away as quickly as possible and with minimum fuss, to the point of overcompensating from anxiety.


100%! Abusing the system used to be easy; now the path to the thing they want, the service, is frustrating, and the easiest route is to pay.


Great writeup. Simple heuristics very often work wonders. The fraudsters are out there, trying to poke holes in your shield. Some time ago we were running a mobile service provider and had issues with fraudulent postpaid subscribers; however, the cost of using background-checking services was substantial. We solved it quite effectively by turning the background checks on whenever the level of fraud went over a certain threshold, which made the fraudsters go away for a few weeks. We kept this on-and-off pattern going for a very long time with great success, since it significantly lowered signup friction whenever the checks were off.


When sites use an AI generated image like this and don't bother to spend 10 seconds looking to make sure it looks okay (UIGN SIGN UPP? AISK ANACIS?) it makes me question whether that same level of care was put into writing the article.


Isn't it nice to have just a little bit of an illustration instead of just text? Obviously an AI-generated image is going to spit out some nonsense text as part of the graphic, but we're not really trying to hide that it's AI generated.


For things that require high credibility and have a learned readership, I think it'd be better not to use a careless image, even at the cost of a cool one. I wouldn't mind an almost-right image in an advert for Kleenex or an intranet holiday-reminder email, but I would be very concerned if it was used as part of an EU directive.


I get why they don't want to share their detection mechanics for potential fraudulent signups, but that is a very interesting topic to learn and discuss.


I would love to do a more in-depth talk about this at some point, with some more concrete examples.


Apple's Hide My Email creates disposable addresses on the icloud.com domain. It's not as hassle-free and can't be automated, but it could definitely be used to create a lot of free accounts. I don't see them banning that domain, though?


We are mainly B2B so we don't really see signups using Apple's email relay. That said, it could be something we might have to consider blocking in the future if it becomes a problem.

For paying customers, it probably doesn't make a lot of sense to use an anonymous email address, since we ask for your name and billing address either way (have to stay compliant with sales taxes!)


How does an address API get its info? Presumably addresses don't change often, right? When they do, how does a service like this update its records?


Actually, I found this very same company has a previous blog post addressing this question:

https://www.geocod.io/code-and-coordinates/2025-01-13-how-ge...


This is cool! They mention they aggregate from over 3000 sources. What are the usual sources for something like this? Is this public government data from cities?


yep! Local governments create some of the source data we use for tax purposes.

We have more about our data sources here: https://www.geocod.io/data-sources/


Makes me wonder how easy / hard it is to turn this kind of feature into a standalone product?

I.e., send email, IP, browser user agent, and perhaps a few other datapoints to a service, and get back a "fraudulent" rating?


This is basically what Google's reCAPTCHA v3 does: https://developers.google.com/recaptcha/docs/v3

The other versions of recaptcha show the annoying captchas, but v3 just monitors various signals and gives a score indicating the likelihood that it's a bot.

We use this to reduce spam in some parts of our app, and I think there's an opportunity to make a better version, but it'd be tough for it to be better enough that people would pay for it since Google's solution is decent and free.
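To make the scoring flow concrete: your server posts the client's token to Google's `siteverify` endpoint and gets back JSON with a `score` between 0.0 and 1.0; you pick your own cutoff. This is a minimal sketch; the 0.5 threshold is an assumption for illustration, not a recommendation from Google.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def verify_token(secret: str, token: str) -> dict:
    """POST the client's reCAPTCHA token to Google and return the parsed JSON verdict."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(VERIFY_URL, data=data) as resp:
        return json.load(resp)

def allow_request(result: dict, threshold: float = 0.5) -> bool:
    """Treat the request as human if verification succeeded and the score clears the threshold."""
    return bool(result.get("success")) and result.get("score", 0.0) >= threshold
```

In practice you'd tune the threshold per action (a stricter cutoff for signups than for comments, say) rather than hard-blocking at one global value.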


Also called DaaS, "discrimination as a service"


Not sure if this was a slight but yes, payment providers and other services need to discriminate valid uses of their service from fraudulent.


I'm thinking along the lines of "let's ban all the Chinese" and "let's ban all the Russians", because that's where the abuse comes from. That's often what those models, both simple and advanced, boil down to.

American stores could prevent most shoplifting by banning people of a certain skin color from entering. The US doesn't let them do this, even though it would most definitely work. They're not allowed to do it for a very good reason, but those reasons seem to be lost to internet companies, who seemingly push so hard for diversity, equity and inclusion.


Except stores aren’t banning the customers from browsing or building a cart. It’s only when someone goes to pay does the fraud detection run and block the transaction. What the US does allow companies to do, like Walmart, is run “background checks” on you before allowing you to cash a check. Over the years I’ve known many with problematic banking history or bad credit who would get denied from this.

I agree blanket bans like you bring up would be problematic and wrong, but I see nuance in using, say, the country of origin as one of the factors in their risk assessment.


There's nothing wrong with trying to discriminate against bots.

If your setup makes you look like a bot, that's YOUR problem. Stop doing things that make you look like a bot.

I get that you want privacy, but so do bots.


Very cool, I wasn't expecting to find this so interesting. Yesterday, for the first time, I thought about the "abuse the free tier" actors. I was trying to use a batch-job service which limited free-tier batch sizes to 5, which was so low that it defeated the point of using the automated job in the first place. I think the little info box explained that they keep the limit low to prevent abuse, and I started thinking about other ways they could prevent that abuse. Your post was very topical. Thanks for sharing!


Thanks for giving it a read!


Where can we get a blocklist of those throwaway email domains?

Or perhaps a really big whitelist of good ones? That would be extremely helpful!


Neither is a viable option, otherwise all the big players would've done this a long time ago. Nothing is stopping you from creating a throwaway account on Gmail while someone using a custom domain might be your new B2B lead. There's no realistic way to tell which it is simply from the domain.


I think they were referring to actual throwaway email providers. Companies that specifically provide that as a service.


There are a couple of great open-source projects[1][2][3] that try to keep up-to-date lists of domains belonging to disposable email providers.

I would probably not recommend implementing a whitelist for blocking purposes. But perhaps domains on a whitelist could get a slight scoring bump.

[1] https://github.com/disposable-email-domains/disposable-email... [2] https://github.com/disposable/disposable [3] https://github.com/unkn0w/disposable-email-domain-list
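Using one of those lists typically boils down to a membership check on the email's domain, folded into a score. A sketch, with made-up domains and weights (real lists from the projects above contain thousands of entries):

```python
# Tiny illustrative lists; load the real ones from the projects linked above.
DISPOSABLE_DOMAINS = {"mailinator.com", "10minutemail.com", "guerrillamail.com"}
TRUSTED_DOMAINS = {"gmail.com", "outlook.com"}  # whitelist: a small bump, never a hard gate

def email_domain(address: str) -> str:
    """Extract the domain part of an email address, lowercased."""
    return address.rsplit("@", 1)[-1].lower()

def domain_score(address: str) -> int:
    """Positive means more likely abusive; negative is a slight trust bump."""
    domain = email_domain(address)
    if domain in DISPOSABLE_DOMAINS:
        return 50   # strong abuse signal
    if domain in TRUSTED_DOMAINS:
        return -5   # mild trust bump
    return 0        # unknown domains stay neutral rather than blocked
```

Keeping unknown domains neutral is the key design choice: a custom domain might be your next B2B lead, so only known-disposable domains should count against a signup.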


As for abuse, I made myself a tool to give myself quintillions of email addresses (not using plus addressing) on gmail.com

I use this to sign up for a service with a unique email that is basically my junk box, but the email is its own unique entry in my password manager
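Assuming this is the classic Gmail dot trick (Gmail ignores dots in the local part, so every dotted spelling of a username delivers to the same inbox), a username of n characters has 2^(n-1) variants. A sketch of enumerating them; the function name is mine, not the commenter's tool:

```python
from itertools import product

def dot_variants(local: str, domain: str = "gmail.com"):
    """Yield every dotted spelling of a Gmail local part; all reach the same inbox."""
    if len(local) < 2:
        yield f"{local}@{domain}"
        return
    # A dot may or may not appear between each adjacent pair of characters.
    for mask in product(("", "."), repeat=len(local) - 1):
        spelled = local[0] + "".join(sep + ch for sep, ch in zip(mask, local[1:]))
        yield f"{spelled}@{domain}"
```

A 3-character name yields 4 addresses; past 60 characters the count crosses into the quintillions, which is presumably the arithmetic behind the claim above.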


I don't see how you could know what everyone's personal domain is to whitelist.


So you implemented some sort of machine learning?


Not at this time. Some simple heuristics go a long way and also make the logic very easy to test and debug.
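A weighted-rules score like this can be as small as a table of heuristics and a threshold. The signal names, weights, and cutoff below are invented for illustration, not Geocodio's actual rules:

```python
# Hypothetical signals and weights; a real system tunes these against observed abuse.
RULES = [
    ("disposable_email", 40),
    ("risky_ip",         25),
    ("rapid_signups",    30),
    ("burner_card",      35),
]
BLOCK_THRESHOLD = 60

def abuse_score(signals: set[str]) -> int:
    """Sum the weights of every rule the signup triggered."""
    return sum(weight for name, weight in RULES if name in signals)

def should_block(signals: set[str]) -> bool:
    """Flag the signup once the combined score clears the threshold."""
    return abuse_score(signals) >= BLOCK_THRESHOLD
```

The appeal is exactly what the thread describes: any single weak signal stays under the threshold, two or three together trip it, and every decision can be explained by pointing at the rule table.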


I’ve seen fraud detection used in a SaaS product, and the great thing about a weighted-rules approach is that professional services can understand it well enough to adjust it without help from engineering or data science. They can explain to customers how it produced the results it did in a particular case, and the tradeoffs of adjusting the weights or thresholds, and the customers can understand it too. A machine learning model, by contrast, is much harder to understand and adjust, so issues are much more likely to be escalated back to engineering.

(This isn’t protecting the SaaS vendor against abusive signups, it is a feature of the SaaS product to help its customers detect fraud committed against themselves within the SaaS product’s scope.)


I once did a machine learning project at Intel. The end result was that it was no better than simple statistics; but the statistics were easier to understand and explain.

I realized the machine learning project was a "solution in search of a problem," and left.


Career hack: skip the machine learning and implement the simple statistics, then call it machine learning and refuse to explain it.


Statistical regression is also machine learning.


hack v2: call it AI



