"Basically, we were in a race to develop new anti-fraud techniques and they were...

sillysaurus3 · on March 18, 2014

The quote was so interesting that I went and found the source: http://ecorner.stanford.edu/1028.ect

Now there's this gem:

"it was basically became clear that we either figure out how to beat the fraudsters or the fraudsters will take us under. And the company more or less refocused itself as a research entity towards figuring out innovative technological ways of destroying fraud in the Internet. And that alone could be the subject of an entire class or a series of classes so I will completely skip over all the cool technology we developed. Some of you might have seen stuff in the news, words like EOR or ELIA, all these tools that we've built. They're as cool as they sound. I could never tell you about them because they're very secret and they're still in use. But maybe if you want to hang out afterwards, I can tell you a little bit. But they're really cool and we did really figured out how to kill fraud.

...

The submerged part of PayPal is this massive and very, very numerically-driven risk management system which allows us to instantaneously tell when you're moving money to someone else, with a very high degree of certainty whether the money you're moving is yours or you got it illegally and we might be on the hook later on to help the authorities investigate or retrieve the money, et cetera, et cetera."

I simply must know. I'll never find out how Paypal killed fraud unless I ask HN right now, and I've got to know. So, with apologies for branching into a completely unrelated topic:

What magic did Paypal invent in order to do all of those amazing things described above? How'd they kill fraud? What are the details of how the technology works? Is it still a closely guarded secret? Do they use some advanced statistical mathematics to detect fraud, or do they bruteforce the detection by providing massive quantities of data to the otherwise-dumb detector algorithm?

What fascinates me is that they've come up with a way to detect fraud automatically, with no human input. There are false positives, which everyone here has probably had some terrible experiences with, but... Still, how did they train a computer to detect and freeze the very sophisticated and very subtle covert techniques of Russian mobster money launderers?

Any info would be very much appreciated, especially citations / references for further reading on this topic.

patio11 · on March 18, 2014

I didn't work in risk analysis at Paypal, but I have a passing interest in the field. A lot of it is much simpler than you think it is.

Want a not-so-secret anti-fraud technique? If the billing address is Kansas and the IP is China, you probably shouldn't let that transaction go through. Really obnoxious for those of us who live overseas. Stupendously valuable, though.
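
For the curious, a minimal sketch of that check in Python. It assumes the client IP has already been resolved to a country code by some GeoIP lookup (not shown), and the `Transaction` type and its field names are made up for illustration. This is not anyone's production rule, just the shape of the idea:

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    billing_country: str  # country code taken from the billing address (hypothetical field)
    ip_country: str       # country the client IP resolved to; lookup not shown (hypothetical field)


def geo_mismatch(txn: Transaction) -> bool:
    """Return True when the billing country and the IP country differ."""
    billing = txn.billing_country.strip().upper()
    ip = txn.ip_country.strip().upper()
    return billing != ip


kansas_card_from_china = Transaction(billing_country="US", ip_country="CN")
kansas_card_from_home = Transaction(billing_country="US", ip_country="us")

print(geo_mismatch(kansas_card_from_china))  # True
print(geo_mismatch(kansas_card_from_home))   # False
```

Given how many legitimate customers live overseas, a flag like this is probably better used to hold a charge for review than to reject it outright.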

There exists a particular anti-fraud heuristic at a YC company. They shared it with me in confidence, because as soon as you know it exists, you can trivially avoid it. I mean trivially. It's apparently insanely effective, though, half because it has a really good handle on who it wants to frustrate and the other half because it's not in the literature at all and, as a consequence, the bad guys don't even know they have to avoid defenses in that class of algorithms. ("But that's security through obscurity!" Good recitation of the dogma, but can I point out to you "This was implemented, in actual computer code, and does in fact actually work?")

There exist more complicated things you can do with machine learning. There also exist more complicated things you can do with heuristics. There also exist more complicated things you can do with live fraud teams. There is non-zero value to finding fraud even after it has happened, because shutting down the fraudulent accounts means you can either keep some horses in the barn or even, potentially, call some of the stolen ones back.

An underappreciated angle of this is that you don't have to outrun the tiger, you just have to outrun your friend. You start getting dedicated adversarial interest as you approach the most lucrative weak link in the entire financial system. That was Paypal back in the day -- with a bullet. Even though no startup/bank/etc ships with perfect security, investing sufficiently in security means the guy that gets victimized is someone other than you. They either go bankrupt or have their problems burned away in cleansing fire. Then the cycle repeats. (One reason Bitcoin companies keep getting looted is that if you plot out "Amount we could conveniently steal" versus "Resources spent on defense in the last 6 months" for all companies in the financial sector there are a lot of dots representing Bitcoin companies which are isolated islands, and the sea is filled with sharks.)

It's a fun field, for a certain perverse and high-stress definition of "fun." I'd probably have made a career in it, but found making and selling software (with just a wee bit of risk management/security/etc thrown in) to be more fulfilling.

sillysaurus3 · on March 18, 2014

> There exists a particular anti-fraud heuristic at a YC company. They shared it with me in confidence, because as soon as you know it exists, you can trivially avoid it. I mean trivially. It's apparently insanely effective, though, half because it has a really good handle on who it wants to frustrate and the other half because it's not in the literature at all and, as a consequence, the bad guys don't even know they have to avoid defenses in that class of algorithms.

That moment when I realize I should've come up with a way of demonstrating my own ability to keep other people's secrets, so that I can be let in on this one and satisfy this burning desire to figure out what the heck that particular fraud detection technique could possibly be...

Well, clearly I'm going to be thinking all day about fraud detection methods and about guessing your technique. Hmm. I may as well guess your technique openly:

Your technique must be something unexpected, and seemingly unrelated to "actions that fraudsters would normally be careful not to give themselves away with." So I'm going to guess that the technique is to analyze the linguistic traits of each user's password. Most people reuse passwords, and if the fraudsters share a similar cultural background then the password has a decent chance of giving away the fact that a particular user is from that same cultural background, e.g. if the password contains a foreign dictionary word in their native tongue yet their purported home address is in a completely different country. That sort of thing.

Or if the same group of fraudsters keep reusing the same passwords, or even parts of their previous passwords, then that would probably be enough to accurately detect them. Most people who don't use a password manager tend to reuse substrings of their prior passwords, so they end up leaving a pretty distinctive identity "footprint" by the type of passwords they choose. Their passwords are likely to share a similar structure, such as always consisting of two words followed by two numbers, for example, or always starting their passwords with two symbols like %#.
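
To make the "similar structure" idea concrete, here's a rough sketch in Python. It's pure speculation on my part, not a known technique from any real system: `password_shape` is a made-up helper that collapses a password into runs of character classes, and it assumes you can compute the shape at signup time, before the password is hashed.

```python
def password_shape(password: str) -> str:
    """Collapse a password into runs of character classes.

    L = letters, D = digits, S = anything else (symbols, spaces).
    Consecutive characters of the same class are merged into one code.
    """
    classes = []
    for ch in password:
        if ch.isalpha():
            cls = "L"
        elif ch.isdigit():
            cls = "D"
        else:
            cls = "S"
        # Only record a class when it differs from the previous run.
        if not classes or classes[-1] != cls:
            classes.append(cls)
    return "".join(classes)


print(password_shape("%#bluehorse42"))  # SLD
print(password_shape("%#redtiger77"))   # SLD
print(password_shape("hunter2"))        # LD
```

Two accounts that both come out as `SLD` prove nothing on their own, but a burst of new accounts sharing an unusual shape might be worth a second look.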

And of course, if the fraudsters became aware that their passwords were giving them away, they'd trivially dodge the detector in the future.

Hmm. I'll just have to invent my own fraud detection techniques. This is a lot of fun.

toomuchtodo · on March 18, 2014

> There exists a particular anti-fraud heuristic at a YC company. They shared it with me in confidence, because as soon as you know it exists, you can trivially avoid it. I mean trivially. It's apparently insanely effective, though, half because it has a really good handle on who it wants to frustrate and the other half because it's not in the literature at all and, as a consequence, the bad guys don't even know they have to avoid defenses in that class of algorithms.

Please don't tell me it's capitalization of the cardholder name...

patio11 · on March 18, 2014

On a different note, I'm happy to disclose the one heuristic BCC has since I'm the only one who can get negatively affected by it. I have several thousand names of people who have previously bought BCC, and they are a pretty diverse sampling of a slice of the American experience. If you hit the trifecta of a) a first name I've never seen before, b) you use a free email provider and c) you've never actually made a bingo card yet, your purchase causes my phone to light up so that I can refund it prior to the actual cardholder hitting me with a chargeback. I also add it to a spreadsheet that I periodically bug Paypal's fraud team with.
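
Roughly, in Python, and only as a sketch: the function name, the argument names, and the short free-email list below are all invented for illustration, and the set of known first names is assumed to be stored lowercased.

```python
# Illustrative only; a real list of free email providers would be much longer.
FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}


def looks_like_card_testing(first_name, email, cards_made, known_first_names):
    """Flag a purchase only when all three conditions of the trifecta hold."""
    unseen_name = first_name.strip().lower() not in known_first_names
    free_email = email.rsplit("@", 1)[-1].lower() in FREE_EMAIL_DOMAINS
    never_used_product = cards_made == 0
    return unseen_name and free_email and never_used_product


known = {"patrick", "susan", "maria"}  # stand-in for the few thousand real names

print(looks_like_card_testing("Zxqv", "zxqv@gmail.com", 0, known))    # True
print(looks_like_card_testing("Susan", "susan@gmail.com", 0, known))  # False
print(looks_like_card_testing("Zxqv", "zxqv@gmail.com", 3, known))    # False
```

Requiring all three conditions at once is what keeps the false positives down: any one of them alone would flag plenty of legitimate customers.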

This heuristic has literally perfect detection and recall [+] against a particular carder ring, which was using BCC to test cards (via our Paypal account, sadly, so I have limited pre-charge options to fix it) this summer.

Paypal, to their credit, has apparently shut down this carder ring, since I can't remember that heuristic firing in 2014. (Edit: Poor phrasing. The heuristic not firing doesn't mean I'm safe. Having a historical level of chargebacks, which is less than one a quarter, rather than the carder-ring-induced "20+ in a week," suggests that I'm safe.)

[+] Edit: It's 2 AM and I can't remember if this is the right circumstance for that jargon. What I'm trying to say is, of the universe of charges that I can classify as good or bad at N months after the charge, 100% of the ones this heuristic flags are in fact caused by a carder ring and 100% of the ones this heuristic fails to flag are, to the limit of my current understanding, not caused by that carder ring.
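
For reference, the usual jargon for those two properties is precision (of the charges flagged, what fraction were really fraud) and recall (of the charges that were really fraud, what fraction got flagged). A small Python sketch with made-up charge IDs; treating an empty denominator as 1.0 is just a convention chosen here:

```python
def precision_recall(flagged, actually_fraud):
    """Return (precision, recall) for a set of flagged IDs versus true fraud IDs."""
    flagged = set(flagged)
    actually_fraud = set(actually_fraud)
    true_positives = len(flagged & actually_fraud)
    # Convention chosen for this sketch: an empty denominator counts as a perfect score.
    precision = true_positives / len(flagged) if flagged else 1.0
    recall = true_positives / len(actually_fraud) if actually_fraud else 1.0
    return precision, recall


# Every flagged charge was fraud and no fraud was missed: the case described above.
print(precision_recall({"c1", "c2", "c3"}, {"c1", "c2", "c3"}))  # (1.0, 1.0)

# One false alarm (c2) and one missed fraud (c3).
print(precision_recall({"c1", "c2"}, {"c1", "c3"}))  # (0.5, 0.5)
```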

wikwocket · on March 18, 2014

Very interesting heuristic! I can see why data mining could be so useful for e.g. Paypal: analyzing data on known-fraudulent and known-valid transactions, looking for unexpected correlations...

Although in this particular case I wonder if the most benefit comes from simply flagging people who buy the product without trying it... and that is a heuristic that almost any SaaS could trivially implement.

toomuchtodo · on March 18, 2014

Thank you for sharing, Patrick. Informative as usual!

euroclydon · on March 18, 2014

> a first name I've never seen before

You mean you just see if the first name is not already represented in BCC's user database? Under what conditions are your users required to give you their first name? For the trial or just for the purchase?

patio11 · on March 19, 2014

Paypal passes a first and last name with purchases, and some customers provide theirs in Settings (it's not a requirement), so I have about 5,000 or so lying around.

svenkatesh · on March 19, 2014

Why would he be refunding someone for a free trial? ...

euroclydon · on March 19, 2014

You're missing the point. He refunds a purchase if he's not seen the name before plus the other two conditions. My question is: what exactly does he mean by "I haven't seen the name before"?

enraged_camel · on March 18, 2014

>>Want a not-so-secret anti-fraud technique? If the billing address is Kansas and the IP is China, you probably shouldn't let that transaction go through. Really obnoxious for those of us who live overseas. Stupendously valuable, though.

It's also a great way to completely fuck over foreign students who use their parents' credit cards to be able to do things like buy textbooks, pay tuition, and so on.

toomuchtodo · on March 18, 2014

Two factor auth (text/push notification) would alleviate a lot of this pain. American Express is moving in this direction with real-time fraud push notifications/transaction acceptance.

aestra · on March 18, 2014

I don't know about the specifics of PayPal but....

Automatic fraud detection is common and it is based on data analysis. See http://en.wikipedia.org/wiki/Data_analysis_techniques_for_fr... for more info.

>Fraud management is a knowledge-intensive activity. The main AI techniques used for fraud management include:

>Data mining to classify, cluster, and segment the data and automatically find associations and rules in the data that may signify interesting patterns, including those related to fraud.

>Expert systems to encode expertise for detecting fraud in the form of rules.

>Pattern recognition to detect approximate classes, clusters, or patterns of suspicious behavior either automatically (unsupervised) or to match given inputs.

>Machine learning techniques to automatically identify characteristics of fraud.

>Neural networks that can learn suspicious patterns from samples and be used later to detect them.

>Other techniques such as link analysis, Bayesian networks, decision theory, and sequence matching are also used for fraud detection.[3]

It is also used by insurance companies to detect fraudulent claims. From what I understand, the software is very very good at flagging possible fraud and there is probably a human reviewer of the flagged claims.

Paypal wants to keep it a secret because if the fraudsters know what Paypal is looking for, they can change their behavior to bypass the fraud detection. Also, it is a threat to their business if competitors use it as well.