Mimicking a device is becoming almost impossible (multilogin.com)
290 points by zdw on June 28, 2021 | 212 comments



Browser automation is an arms race between fraud detection and criminals, with users and some legitimate use cases (cough science) caught in between.

I find it's very useful to think of the problem not as governed by technical possibility, but rather by costs.

Good spoofing detection does not prevent fraud, it increases its costs to fraudsters (hoping to render fraud uneconomical).

Bad spoofing detection harms legitimate users and is inefficient against capable actors.

All of this gets much more interesting and complex when we consider privacy (and privacy compliance) to be part of the puzzle - now fraud prevention incurs a cost to the defender as well.

Finally, it's important to consider asymmetric risk/benefit counts. Are you defending against a single major heist, or against billions of tiny events (click fraud)? The trade-off will look different.


> Bad spoofing detection harms legitimate users and is inefficient against capable actors.

On a somewhat related note: I used to write sneaker buying bots and captcha was one of the best things to happen to the bot industry.

Captcha was easily "bypassed" by services that had humans sitting at computers generating tokens. Recaptcha tokens are valid for two minutes by default, so you'd be able to generate tokens up to two minutes in advance and have dozens of tokens available when the product in question became available for purchase. Real buyers had to wait for drop and then spend 15s+ filling out the captcha. Bots would take most of the stock in less than a second. I always felt like this was a fantastic example of a bot prevention mechanism that actually actively harmed "legitimate" users.
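The timing asymmetry described above can be put into rough numbers. This is only a sketch: the two-minute validity and the 15-second solve time come from the comment, while the function names and the 0.5-second submit latency are assumptions made up for illustration, not anything measured.

```python
from typing import Optional

TOKEN_VALIDITY_S = 120.0   # "valid for two minutes by default" (from the comment)
HUMAN_SOLVE_S = 15.0       # "15s+ filling out the captcha" (from the comment)
SUBMIT_LATENCY_S = 0.5     # assumed time needed to submit the checkout request


def token_is_valid(solved_at: float, used_at: float) -> bool:
    """A token counts only if it is used within the validity window after solving."""
    return 0.0 <= used_at - solved_at <= TOKEN_VALIDITY_S


def checkout_time(solved_at: float, drop_at: float = 0.0) -> Optional[float]:
    """Seconds after the drop at which checkout completes, or None if the token expired.

    solved_at is relative to the drop: a negative value means solved in advance.
    """
    # Checkout cannot start before the drop or before the captcha is solved.
    used_at = max(drop_at, solved_at)
    if not token_is_valid(solved_at, used_at):
        return None
    return used_at - drop_at + SUBMIT_LATENCY_S


# A buyer who can only start the captcha once the product appears.
human = checkout_time(solved_at=HUMAN_SOLVE_S)
# A token solved 90 seconds before the drop and held in reserve.
presolved = checkout_time(solved_at=-90.0)
# A token solved too early has already expired when the drop happens.
stale = checkout_time(solved_at=-150.0)

print(human, presolved, stale)
```

Under these assumptions the only party that pays the full solve time after the drop is the buyer who could not start early, which is the point the comment makes about the mechanism penalizing real users.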


Yeah, I find that very interesting. On many sites the anti-bot software prevents humans from checking out at all. All the Foot Locker sites make it impossible for people to check out. Yeezy Supply is basically impossible as well. The only thing that somewhat works is the Apple Pay shortcut for sites with Apple Pay enabled.

Can I ask why you quit the Bot industry? It seems super lucrative right now.


It was definitely lucrative (sort of, I never broke $100k/yr but I have plenty of friends who did) but it was never particularly satisfying. I'm a competitive person but I love the collaboration of open source software. I tried creating a few open source bot projects but the economics of the bot industry just doesn't have room for projects like that. You also can't do well in the sneaker industry without making it your entire life. At the end of the day, money alone can't satisfy the soul. And I didn't care enough about sneakers. C'est la vie.


Thanks, this is an excellent example.

You could actually call this a symbiotic lock-in. I once read an excellent post on how SEO (despite being a gray area for search engines) entrenches Google's market share because many SEO practitioners specialize in it.

If your ecosystem has become so specialized that most well-adapted users are bots, you have a problem (or not, if you can monetize bots).


Fascinating. You'd think the captcha would have implemented a check to ensure it was completed on the same device.


The captcha-solving services reverse-proxy requests from the human-for-hire through the botmaster's headless browser. In other words, all the javascript runs on the botmaster's machine. Only the images (or screenshots) are sent to the human-for-hire, and clicks/screenshots are sent back from the human-for-hire to the botmaster.

This is the whole point of this article: device detection is generally a false sense of security.


These sorts of things increase the cost (you had to pay humans to do captchas and to set up the system), which can help.

It really depends on what the benefits of bypassing the detection are and if those benefits can be had elsewhere at lower cost. For ticket sales or limited run product sales or similar things, the benefits of buying (and presumably reselling) the limited item are high, and there's a limit to what you can do to detect humans (and as you've described it can be counterproductive). For spam prevention, making it cost more to spam means spammers are encouraged to find somewhere else to spam, which is good for your users (while using your product anyway). But it won't stop everything: some people are going to manually type their spams, and some people will build robots to tap out their spams more like a person. And if you ruin the experience for users (especially new users), that doesn't work either; a network with zero messages is spam free, but doesn't help anyone.


If you know where to look, the human labour is almost free compared to the profit you could be making.


indeed. ~$0.15 per captcha for >$200 sneakers is nothing. it may be shady but people are making a living filling out captchas all day. "Want easy stable income? All you need is a computer!"


If the landing page generates unique captchas that are random, how does this help unless the hashes are valid globally?


Presumably it’s something like N headless browsers all waiting to click “checkout” on a cart with a decoy item in it, then when the drop happens they all update their carts with the new item and then post checkout with your precreated captcha token.


That’s one way, yes; that’s called precart. It’s a common but less sophisticated method than what’s available.


Cool! Tell us about the other ways.


first rule of sneaker bot club..


Some captchas have this vulnerability. In general, bots solve captchas faster than humans now.


Even things like hCaptcha?


I think balko bot can solve in 2-3 sec now. It absolutely destroys Shopify.


Recaptcha sends the same image to multiple browsers at the same time and it gets solved by consensus I think.


You’re thinking of the original reCaptcha (“v1”). That was when it was about digitizing books. One distorted photo of a word had a known spelling, and the other didn’t. If you spelled the known one right, it let you through. If enough people spelled the unknown one the same way, it would “learn” how to OCR that word (of sorts). If you had time to spare/waste, you could spell the first one right, and put garbage for the second one. About half the time, it would let you through.

Not sure how reCaptcha “v2” (the photos of streets one) works. It could work the same way, but just with more known and unknown (unclassified) pictures? It’s also a lot more harsh if you get it wrong: artificial delays, switching to the “click until no more remain” mode (with more delays), and more. Sometimes, it just refuses to let you through no matter what: https://youtu.be/zGW7TRtcDeQ It’s extremely frustrating.
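The book-digitizing scheme described above (one word with a known spelling, one without, and agreement between users deciding the unknown one) can be sketched roughly as follows. This is a toy model, not the real service: the class name, the image ids, and the threshold of three matching answers are all assumptions for illustration.

```python
from collections import Counter, defaultdict

CONSENSUS_VOTES = 3  # assumed number of matching answers needed to accept a spelling


class WordPairCaptcha:
    """Toy model of a two-word captcha: one control word, one unverified word."""

    def __init__(self, known_words):
        # image id -> verified spelling (the "control" words)
        self.known = dict(known_words)
        # image id -> Counter of spellings submitted for unverified words
        self.votes = defaultdict(Counter)

    def submit(self, known_id, known_answer, unknown_id, unknown_answer):
        """Return True if the user passes; only the control word decides that."""
        if known_answer.strip().lower() != self.known[known_id].lower():
            return False
        # The user passed the control word, so their answer for the
        # unverified word is recorded as one vote.
        guess = unknown_answer.strip().lower()
        self.votes[unknown_id][guess] += 1
        # Once enough users agree, promote the word to the verified set.
        if self.votes[unknown_id][guess] >= CONSENSUS_VOTES:
            self.known[unknown_id] = guess
            del self.votes[unknown_id]
        return True


captcha = WordPairCaptcha({"img-1": "morning"})
for _ in range(3):
    captcha.submit("img-1", "morning", "img-2", "overlooks")

# Garbage for the unverified word still passes, as the comment describes.
passed = captcha.submit("img-1", "morning", "img-3", "asdf")
# A wrong control word fails, and the accompanying guess is not counted.
failed = captcha.submit("img-1", "wrong", "img-3", "asdf")
```

In this sketch the caller says which word is the control. A real user does not know which of the two is the known one, which would explain the "about half the time" success rate when typing garbage for one of them.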


I don’t understand why programmatic access is being equated with fraud. Device identification is never a benefit for end users. Lack of privacy, waste of resources, hostile ux… we’re basically talking about DRM.


Then start a website that doesn't try to identify and or block bots and rake in the cash from this user benefit.

My own experience briefly running sites - bot traffic is 100% trash on almost every metric. Actual conversions, spam, click fraud, other fraud (spam talking about work from home), privacy violations (trying to scrape all user profiles, capturing deleted user content etc).


I’ve run plenty of large sites and apps, and yeah I always try to represent the users in product planning, and for that reason not once have I allowed captchas, fingerprinting… to be implemented.

The real reasons people implement these draconian measures range from inept cargo culting to nefarious business models. If you have a problem with spam, add user moderation and call it a day. That’s just a justification though, there is never a pro-user reason to destroy ux and privacy.


" there is never a pro-user reason to destroy ux and privacy"

Huh?

I require users to register (little to no privacy) and pay a bit - this results in a both better user experience and is a pro-user reason to destroy privacy. You can also blacklist most of the anon email providers / do SMS verification etc


Destroying user privacy is absolutely anti-user. Why on earth would I want you, the website owner, to know who I am? I would like to remain anonymous, thus would never even consider using your site since it’s so hostile.

I’m also curious what your site is. Will you say, or do you wish to remain anonymous?


Yeah, HTTP should be a protocol that users can use freely (as long as they don't use it for Denial of Service). I don't see why I shouldn't be able to fetch a website, scrape it for information and reassemble that information in the way I want it to be presented.


> reassemble that information in the way I want it to be presented

site owners want the advertising that was embedded to be seen and thus make revenue from it.


Not all of them, some just want to sell stuff, but make it positively terrible to search for things. Some site owners provide you service, like banks, or chat services like Slack, or Jira, but the user interface is again a turd. Some just publish info on a blog, but you can expect it to go away, so why not save it for later.

Slack limits the amount of searchable messages to 10k, so if you save them regularly to local DB, you can search without limits.

All kinds of owners don't sell/show ads, or care about scraping.


It's because programmatic access on websites with anything social, or with user generated content, is very strongly correlated with spam, and spam is quite often linked with fraud of various kinds. Like 99%+ of the time if something is posted by a bot outside of any officially provided APIs, it will be unwanted by other users.


User moderation solves this. You’ll know if something is or is not wanted by users, as they will moderate the content itself.

There is no need to destroy privacy, waste resources, create terrible ux, gate content…


Users have better things to do than constantly click "report spam" buttons on minor variants of the same content a million times in a row. People see spam reporting as equivalent to filing a complaint i.e. the company failed and they're informing it that it needs to do better.

Automation detection and privacy are orthogonal. The ideal bot detector yields a single bit of information: bot or not. It doesn't need to try to de-anonymize users or anything like that. As for UX, well, spam is a bad UX.


There are a million creative ways to solve this that don’t involve dark patterns, poor ux and wasted resources. This comment does a better job than I’ve done explaining:

https://news.ycombinator.com/item?id=27664604


That comment doesn't explain anything. It just says: spam? Remove the incentive.

The incentive is money. You can't just wave a magic wand and get rid of the incentive to spam things. Asserting that you can is wishful thinking.


Bots will then generate millions of fake users, upvote their own content and downvote all other competing content.


You realize there’s no captchas or fingerprinting on this site right?


This is an absolutely tiny site.

Things that work for small sites fall apart once you get 10~100 million+ people.

Look, I help do the actual user moderation for a large (10m+) subreddit, and it is a god damn nightmare. Plus, on top of that, we have to do automated bot detection anyways, because otherwise it's just neverending bot spam.


I worked on a site with 200M active monthly users. No captchas. It’s totally doable.

Reddit is actually a good example of having millions of users with no captcha. They “fixed” their problem with user moderation.

You chose to become a moderator, so it must not be that much of a nightmare.

EDIT: Reddit does have a captcha now. My guess is it came along with the other user-hostile changes in the new design.


Reddit does require a captcha and has for a long time.

edit: They have for at least the last 6 years based on online questions. Long before the new design.


I see you are correct. They did not use to, though.


Reddit repost bots execute with full (though unintended) support of moderators and legitimate users.


Users definitely don’t want to be exposed to spam much less have to take action against it.


The government is giving out COVID vaccines. Appointments are free, but still difficult to come by due to limited quantities. Do you really want bots reserving all the appointments?


How does this make any money for the bot operators? Wouldn't it be easy to see who is advertising vaccination timeslots on Craigslist?


You could hire a bot service to snag a vaccine appointment sooner, while others have to wait.


Another legitimate use case is to wade through bullshit, dark patterns, ads & co in an automated way. See Invidious, Nitter, Teddit, etc.


While I love those attempts (kind of like ublock origin on a server), you can easily see how they are stuck between a rock and a hard place. They have all the vulnerability of fraudsters, and none of the benefits of authentic users.

That's why most of them seem to be perpetually banned.

[edit] Now, if someone would build an onion router that ran on people's end devices and served as a proxy network for these front ends, that would be interesting ... but would probably end up getting lots of people blocked, so please don't take it seriously.


> but would probably end up getting lots of people blocked

These scum companies rely on "engagement", so they can only block people if it's a minority. If _everyone_ participates, they have no choice but to accept it (or maybe change their product so there's no incentive to run these unofficial front-ends in the first place).


The recent Microsoft signed rootkit has specific requests for collecting CPU-ID. That says something about the state of device fingerprinting, but I'm not sure exactly what that is.

https://www.bleepingcomputer.com/news/security/microsoft-adm...


CPUID is just a bunch of flags used to discover the capabilities of your CPU: https://en.wikipedia.org/wiki/CPUID
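To make "a bunch of flags" concrete, here is a small sketch that decodes a few feature bits from the EDX register returned by CPUID leaf 1 on x86. The bit positions listed are the documented ones for those features; the register value itself is hypothetical, since Python cannot execute the instruction directly, and the function name is made up for this example.

```python
from typing import List

# Selected feature bits of CPUID leaf 1, register EDX (x86).
EDX_FEATURES = {
    0: "fpu",    # on-chip floating point unit
    4: "tsc",    # time stamp counter
    23: "mmx",
    25: "sse",
    26: "sse2",
}


def decode_features(edx: int) -> List[str]:
    """Return the names of the known feature bits that are set in edx."""
    return [name for bit, name in sorted(EDX_FEATURES.items()) if (edx >> bit) & 1]


# Hypothetical register value with FPU, TSC, SSE and SSE2 set, MMX clear.
sample_edx = (1 << 0) | (1 << 4) | (1 << 25) | (1 << 26)
features = decode_features(sample_edx)
print(features)
```

Flags like these are shared by every chip of the same model, so on their own they narrow a machine down to a CPU type rather than to an individual device.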


Some other genuinely legitimate use cases are monitoring (synthetic users) and performance testing (though usually in a separate dedicated environment).


If it is your site then you will be able to bypass the fraud detection, and if it is not your site then the site owner probably does not consider these legitimate use cases to be concerned about.


Right, I guess it's important to have some automated way of testing the billions of configuration possibilities and network/device conditions that you could expect, because they are impossible to systematically test in human-driven QA.

How else would you test your SPA against millions of different content-modifying browser plugins, for example.


> How else would you test your SPA against millions of different content-modifying browser plugins, for example.

You don't.


I think we are trying to solve a social problem with a technical solution. Can we make an unbeatable spoof detector? I think we probably could, but we shouldn't. Locking down things takes the fun out of them.

Ideally, you could get rid of every captcha and just have an open API for everything. Somebody wants to create lots of email addresses to send spam? Remove the incentive to send spam.

Somebody posts thousands of manipulative tweets? Maybe the "feed" model of social media is wrong. Don't assume every random account is equally important. Better just have a simple social network where you can connect with your friends. Or even better, design your society such that its decision-making is more resilient and can't be gamed.

The really hard but worthwhile problem is not in tech, it is how we evolve our society along with the tech.


This was the original attitude of the Old School Internet.

Unfortunately a new generation has come along that believes advertising-supported business models are their birthright.


I’d honestly rather pay a (higher) fee to a service I like and is transparent. A service where my money is paying for the uptime and development and I give consent, and not where some “evil” AI is tracking my every step and then has the audacity to recommend me ads of things totally irrelevant to me.

I respect that services have running costs and I’m happy to cover my usage, but I have to understand that the average consumer doesn’t give a crap, or is simply ignorant and doesn’t know.


> Remove the incentive to send spam.

Good luck. Literally nobody has figured out a viable way to do that. At this point, we need to consider spam a fact of nature, like viruses.

> Locking down things takes the fun out of them.

Locking down things so we create a space that actually functions and doesn't succumb to abuse. What you call fun, others would call trolling they'd like to get rid of.


People hurt each other? Just remove their reasons. Not enough food? Just stop being hungry.

But how?


lol


It seems to me that one way through this challenge is just automating the browser itself. Internet Explorer was particularly friendly to this but since it's out of support maybe browser extensions would be the way to go.


Scraping user-generated data should be made a legal right. It’s not like the companies paid for the data they collected from users, and the whole thing is anti-competitive as well, and hurts the economy.


UGC can range from a Facebook post about tacos to intimate details about one’s private life or medical history. Blanket statements like yours do nothing to help the conversation.


You can only scrape data you are allowed to access. If your eyes can see it, you should be able to scrape it. The analog hole is always there.


This gets very tricky when you think about JavaScript.

What if you only pop up important information via an onBlur event in a dynamically created div element that is overlayed on the page, only after asynchronously querying a REST endpoint to get information from a database?

How would a crawler even know that this element is shown when you leave a specific input field? How would it know to wait a second while the database is getting the data?

Like it or not, we're well past the days of complete HTML documents being served for a request.
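The scenario above boils down to two things a crawler has to do beyond fetching the document: trigger the right event, and wait for content that arrives later. The sketch below simulates that without a real browser. `FakePage` is a made-up stand-in for the page, the 0.1-second backend delay is an assumption, and `wait_until` is a generic polling helper rather than any particular automation library's API.

```python
import threading
import time


class FakePage:
    """Stand-in for a page whose content appears only after a blur event and a slow request."""

    def __init__(self, backend_delay_s: float):
        self.backend_delay_s = backend_delay_s
        self.overlay_text = None  # the dynamically created element, absent at first

    def blur_input(self):
        # Simulates the onBlur handler kicking off an asynchronous request.
        threading.Timer(self.backend_delay_s, self._render_overlay).start()

    def _render_overlay(self):
        # Simulates the response arriving and the overlay being added to the page.
        self.overlay_text = "important information"


def wait_until(predicate, timeout_s=2.0, poll_s=0.01):
    """Poll until predicate() returns something truthy or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(poll_s)
    return None


page = FakePage(backend_delay_s=0.1)
before = page.overlay_text   # a crawler reading only the initial document sees nothing
page.blur_input()            # the crawler has to know to trigger this event
after = wait_until(lambda: page.overlay_text)
print(before, after)
```

The waiting part is easy to automate. Knowing which of the many possible events to fire is the part that does not generalize, which is the difficulty the comment is pointing at.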


I'm using a RasPi running Debian as my daily driver at home. It works pretty well except that I have to solve way more captchas with it than with any other computer on my home network.

Switching to a more common user agent recently helped only a little bit. At least that is what I believe, unfortunately I haven't done any measurements. It might well be that it helped not at all.

The irony is also that I'm permanently signed in to more services on this computer than on any other one in my household and that it is the only computer where cookies don't get purged regularly.

Also, I never used the RasPi for any automated tasks or anything else that could be interpreted as bot traffic.

The worst offender is Yandex, which is pretty much unusable because it makes me solve a captcha every few mouse clicks.

Any ideas how to make a RasPi usable as a work computer?


i run non default browser settings (disable canvas and force my fonts)

i see captcha everywhere. and try to avoid when i can. basically if it is a commercial site, bye. but gov sites are using it too now.

from time to time i even get into captcha hellban. which should be criminal since they are used in gov sites. hellban is especially cruel as your entire IP is served the same 10 rounds of images infinitely.


I wonder how hard it would be to write a bot that would reliably hellban an IP. Then one could just drive around and get every open WiFi hellbanned.


I use an allow-only JavaScript setup. I've found that for the web browsing I do that cuts out essentially all extraneous junk. YMMV.


Log into Google, Yandex, Facebook, etc. These could very well be throwaway accounts (with real profiles locked into Firefox Container Tabs for example). This seems to work even with VPNs and non-residential IPv6, so hopefully it helps.


Not sure if the parent comment has been edited, but they did say they're already "permanently logged in" to a bunch of the services in question.


That's the strange thing about it. Being logged into regularly used, long-standing, healthy, real accounts that have only ever been used totally legitimately doesn't seem to help at all. Quite to the contrary I fear to damage the reputation of my decade old accounts by using an unusual device.


That's pretty odd, I doubt it has anything to do with the architecture and more to do with whether or not you're logged into a Google account.

I don't have a google account and use a desktop Linux-based OS on my phone. I have to solve the same captchas there as on my desktop.


Do you have IPv6 enabled? That seems to be a major contributor to the captchas I run into.


Why are you using yandex? Just curious, never heard of anyone outside of Russia using it, and even many Russians try to avoid it apparently.


Why not? Diversify your techno-overlord portfolio beyond DARPA.


Also using yandex quite a bit. Especially around heavily censored topics like covid-19.


Can you give some examples of things you've been able to find out using Yandex that were censored on Google?


Try searching for "possible negative long term effect of mrna vaccines". Google will show lists of fact-check sites, WHO approved all pro-vaccine results. Yandex will show alternative sites like childrenshealthdefense.org, www.naturalnews.com with a contrarian view. I am not saying these are trustworthy sites but I want to see various viewpoints and not have the search engine be the judge. I think mRNA vaccines could be a game changer, but I also want an open discussion about the risks, especially long term, as in say negative effects on newborns from mRNA vaccinated women.


Interesting. Didn’t think about that… are you not worried about things getting censored in the other direction?


Absolutely, always consider potential bias and motivation when searching and judging content. I probably censor search results myself in the form of selective link-result clicking and phrasing of search queries :-)


Isn't DDG somehow related to yandex, hence all the Cyrillic results?


While proving you are a human becomes more and more difficult. And I am really tired of having to recognize cars, boats and bikes for every website I visit.


You'd think that with the number of 'click all squares with traffic lights' type challenges we've had to endure, Waymo would be running entirely on camera feeds by now.


I take these tests very seriously, I don't want a car to crash because I missed a traffic light.


I intentionally mislabel one or two every time. It's not my job and not my responsibility to help someone recognize these, and they shouldn't be relying on internet strangers correctly labeling stuff for the security of our road systems.


I remember when Google was using it to digitize and publish freely available books and it felt like a useful, clever thing to do. Now it just feels like being an unpaid worker.


I remember when google was doing that and there were people exploiting it for both fun and profit.

On the "fun" side, a bunch of 4chaners were intentionally poisoning the results with swear words.


It's exactly how I feel whenever I am forced to use a self checkout line.


I don't agree on that front. Checking out of the store is not automatically someone else's job - it's work that needs to be done for the relationship to be ethical. Just like people clean after themselves, they can also mark their own items and pay what is owed.


I've seen a lot of comments along the lines that if you're not stealing when using self checkout lanes then you're being scammed. I'm a little swayed by the opinion, to be honest. It didn't used to be that there were only two lanes (out of 12) open during prime time shopping in all grocery stores, but it is now. Grocery stores found out they could just offload that part of the job onto customers for free.


You're compensated with the time you would have had to spend waiting in line.


Versus the time I have to spend scanning and bagging my own goods, waiting for the attendant to come over and punch in some magic code every time the machine whacks out over the weight, getting a manager to come over and provide an override for coupons, or to purchase alcohol or tobacco? Yeah no, self-checkout lines are a special kind of hell.

Maybe the average HN'er doesn't do any serious grocery shopping, say to feed a large family, but there's no way I am taking a cart full of groceries to a self checkout line, I'd rather wait. Sure, for buying 10 items or less, they are OK.

I prefer cashiers who are paid well to do their jobs, rather than subsidizing companies that refuse to pay living wages and just threw in the towel. Try checking out at an ALDI sometime to see how truly blissful being rung up at their checkout lines is.


I would never attempt to ring up a $500 cart of groceries for feeding a family on my own. And I do buy those. Also, I generally prefer the cashier experience.

However, I do occasionally need to get just an item or two. If I'm at a store that has a self-checkout, it seems to be faster for that.

Also, the checkout at ALDI is an impressive demonstration of efficiency. CostCo seems to have a similar philosophy.


Not to be a party pooper, but that’s part of their threat model. They expect some % of users to mislabel, intentionally or not.

To really cause problems, you would have to incentivize a large number of other web users to also mislabel the same images.
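A quick way to see why scattered mislabeling does little harm is to compute how often a majority of labelers ends up wrong. The sketch below assumes each image is shown to several independent labelers and decided by simple majority; that aggregation rule and the 10% mislabel rate are assumptions for illustration, not a description of how any real captcha service works.

```python
from math import comb


def majority_wrong_probability(mislabel_rate: float, labelers: int) -> float:
    """Probability that more than half of the independent labelers give the wrong label."""
    needed = labelers // 2 + 1  # smallest number of wrong votes that forms a majority
    return sum(
        comb(labelers, k) * mislabel_rate**k * (1 - mislabel_rate) ** (labelers - k)
        for k in range(needed, labelers + 1)
    )


# With an assumed 10% of answers wrong, more labelers per image means fewer wrong majorities.
for n in (1, 3, 5, 9):
    print(n, majority_wrong_probability(0.1, n))
```

The calculation relies on labelers being independent. Many users coordinating to mislabel the same images breaks that assumption, which is why the comment says it would take an organized effort to cause real problems.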


I'm under no false impressions - these dark patterns are used because they work. I still don't think it's good, I don't feel compelled to cooperate, and I certainly don't think we should consider that this is an acceptable way to conduct this type of work/research.


I'm always happy when my intentionally-mislabeled solution is accepted. A lot get rejected.


Myself included. I'll mislabel things that look like they could be, but are obviously not. If I'm asked to “Tag motorcycles” I'll tag bikes as well.

Also, I don't want self-driving cars. Not until all the social issues are resolved.


Imagine if they waited for all social issues to be resolved before putting cars on the road in the first place, or if Uber waited for better laws, etc. It would have never happened. Putting something out there and getting people used to it before curmudgeonly bureaucrats and other naysayers can catch up is the best way to do things in these cases.


> Imagine [...] if Uber waited for better laws, etc. It would have never happened.

Ah… thanks for this second of imagination, that was some quality time.


Uber is one of the best things to happen to transportation in a long time, hemming and hawing about the pay or their practices aside.


my man, you're bragging about being one of those social issues


systems also need noisy labels..


That’s a great bit :)



Is the main purpose of these really to prove that you are human, or to improve their ML datasets?


Neither. Nowadays it is only one part of a large scoring system using a bunch of device and user fingerprinting, and its primary use seems to be tarpitting bots who score low on the other metrics (as well as any users unfortunate or privacy-conscious enough to be misidentified as a bot).


I'm pretty sure that's sufficiently shown by the fact that captcha v3 doesn't even have any inputs.


Yeah, it's actually for their dataset.


Whichever saves/makes them more money.


These days most sites do it by requiring a phone number. Since most countries require ID for a phone number and functional burner numbers are really hard to get, the system works well.


I tried to set up a web account with the IRS (the American tax bureau) the other day. My perfectly legitimate mobile number, which works with every other two-factor authentication, was not sufficient for the IRS. Apparently you have to have a number with a certain set of major cell phone service providers like ATT and Verizon, not a minor service provider. They also require half of a credit card number or some other financial identifier like a mortgage. I rather wonder if the cell phone service provider requirement has to do with adding more surveillance capabilities to the IRS' already impressive financial surveillance ability.


A few things ...

I had to set up a web account with the IRS in the recent past and I just skipped the entire mobile validation by having them send the auth code to my mailing address. It arrived in two days and worked perfectly. YMMV.

I am, however, interested in what mobile carrier you use that had a "real" (not VOIP) mobile number and can receive SMS from "short codes", but did not work with the IRS ?

That's unexpected - basically every mobile number in the US is one of the big three or an MVNO operating with one of the big three networks ... can you share just a bit more about your setup ?


Republic Wireless mobile numbers often look like VoIP because it's an MVNO run by Bandwidth.com (a large wholesale VoIP carrier). It's apparently in the process of being sold to Dish Wireless though, so that may change.

I think the IRS may have access to customer names associated with mobile phone accounts to confirm identity, but not all carriers have identity information for their customers, and I'd guess smaller carriers (or privacy focused carriers, if any exist) may not provide that access.


It's Xfinity mobile and can send and receive SMS from short codes including my banks, Amazon, etc. Also, I cannot use my mailing address for precisely the reason that I wanted to set up a web account: my accountant put the wrong city and zip code on all my tax returns (several EIN/TINs), and the IRS uses the address on the most recent filing for verification purposes.


Do countries really require ID to get a phone number?

I can buy a prepaid sim card in our Hofer (=Aldi) for 2eur, no verification, no nothing,... they even have the same barcode on, so even Aldi does not know which is mine, and I can pay in cash. I can buy a refill there with cash too, but just to receive an sms, I don't need one, because receiving messages is free.


Lots of countries do: https://blog.telegeography.com/liberty-vs-security-the-battl...

You also have to remember carriers will track the approximate position of a phone with an anonymously-purchased SIM-card, so better not take it home.


2€/account is still decent spam protection, and fair/affordable for a lot of the world population.

It's one thing the whole web can take out of the cryptocurrency transaction model. Costs can also be explicitly monetary and not just time-based through captchas.


If you went to a German Aldi on the other hand, they would let you buy a SIM, but before you can activate it you need to provide your personal information including a verification check that involves your Personalausweis in one way or another.


Does the ID get registered or do you just show it like when buying alcohol?


It is registered. The account is fully bound to you.


I had to provide my Passport and submit to a check in order to buy a simcard while on vacation in Berlin.

It was a bit of a surprise as I flew through Heathrow and saw simcard packs in vending machines.


I know in Greece and many European countries you must show ID to get a SIM, nationals and foreigners alike.


Does the ID get recorded though?


> functional burner numbers are really hard to get

For SMS verification at least there are quite a few sites dedicated to verify you for a few cents.

AFAIK they buy real prepaid sim cards and allow reuse by different customers on different sites.


But then anyone can reset your account’s password by using the same service?


Technically yes, practically there should be no way to know which account has which phone number.

The bigger problem is if the site requires you to re-verify your phone for whatever reason (for ex. paypal does when you access it from a strange IP or sth) and the number you used may not be available anymore.


> works well

For everyone except the user who now has no privacy, is trivially hacked by SMS interception, can’t create multiple accounts (e.g., to segregate their activities on a chat platform), ... .


>can’t create multiple accounts

This is usually the intent of sms verification. I hate it just as much as you but I admit that there is no better method for preventing ban evasion. When accounts are trivial to create, moderators have no power.


SMS validation costs the spammer about 30c per validation, maybe less in bulk, so it depends on how valuable the account is. A Nike SNKRS account is worth $1-2, so it's still quite profitable.
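
A rough sketch of that arithmetic, using the figures above converted to cents. The function name is made up for illustration, and proxies, labor and bans are not modeled.

```python
def margin_cents(account_value_cents: int, sms_cost_cents: int) -> int:
    """Return what is left per account after paying for one SMS validation."""
    return account_value_cents - sms_cost_cents


SMS_COST_CENTS = 30  # "about 30c per validation"

# An account worth $1-2 leaves roughly 70c to $1.70 before any other costs.
low_margin = margin_cents(100, SMS_COST_CENTS)
high_margin = margin_cents(200, SMS_COST_CENTS)
print(low_margin, high_margin)
```

The check only stops paying off once the account is worth less than the validation itself.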


> When accounts are trivial to create, moderators have no power.

Sounds ideal! Moderators are just power tripping gatekeepers. Community ranking of content is sufficient to get rid of trolls, spam…

Taking away people’s right to privacy just to empower moderators is quite dystopian.


^This. When I run into a site that wants my phone number, I simply don't bother anymore.


The sites requiring a phone number are also now excluding VoIP known numbers such as Google Voice.

Problem. I don't have any other type of number. So while I added my older GV number to my Steam account, I can't update my number to my newer GV number because Valve won't accept non-LL or Cellular numbers for 2FA. So I have to keep a number I've changed from everywhere else... or not have a number on my account anymore.


How do we get all these spam calls in North America then?


The phone company lets them put whatever number they want in their caller ID when they call you. It's a zero-security string. Receiving calls and texts doesn't work the same way.


VoIP


> And I am really tired of having to recognize cars, boats and bikes for every website I visit.

And you are performing work without being compensated for it. In many countries this seems to be illegal - you have to pay at least a minimum wage.


I wonder how such countries' laws define "work". If my interactions with a website are used in an A/B test, am I an unpaid tester? If I sit in a restaurant in winter and the thermal radiation of my body saves them some of their heating costs, is that work that I should be paid for?


I hate hCaptcha. A bot can solve it in 2-3 seconds; it takes me 5-6. I can't keep up.


I'm curious why they collect fire hydrants. Is it important for self driving?


I work in performance testing, and in the web space you essentially have two different main approaches:

* Protocol Testing - where you use servers to generate lots of HTTP/S traffic that is correctly structured to simulate user traffic. Generally, this is focused on capacity and server response times.

* Browser Based - where you need the complex logic present in SPAs and Javascript to accurately create test traffic. This also allows for as-close-as-possible real user response times. This is essentially "headless browsers" of varying types. This requires more performance test compute to process - so often a combination of both types is used.

I find that testing application security is one of the most technically challenging aspects of performance testing. Often some parts of the security infrastructure have to be disabled to allow testing to occur. For example, rate limiting by source IP, any form of CAPTCHA, and 3rd party services (OpenID etc.) have to be disabled, which increases the risk to application availability because some components haven't been tested exactly the way they will work for actual users.

Luckily most of the 3rd party services we use are already significantly tested by their vendors - but it is something that I worry about.


Is there really a ton of effort being put into anti-spoofing by websites?

The examples of people trying to spoof are:

> People or bots who want to get more elements specific to certain devices, or who want to break out of so-called ‘device ghettoes’ (eg they don’t want to have restricted possibilities due to being a mobile device)

OK, but does the website owner really care if a tiny fraction of people do this? Restrictions by device are usually for performance and ease of use reasons.

(And when content is limited to certain devices for legal reasons, like HDCP, this is accomplished with cryptography, not with device detection.)

> Likewise, some threat actors want to take advantage of the fact that some security measures are not as tight for some devices.

Seems like it would be better to patch the security hole instead? Or else deprecate support for old devices (e.g. stop serving HTTP, only HTTPS). Anti-spoofing seems like a terrible solution to security.

> Who can stop people from utilizing device spoofing if a website cannot show captcha to mobile devices even if some rate limits are exceeded...

Since when do CAPTCHAs not work on mobile devices? And if yours doesn't... switch to one that does?

> or if a company offers specific discounts or products only to some types of devices?

That's kind of a dark pattern anyways.

I mean, the article's interesting, and device detection is (sadly) super-necessary for progressive enhancement, as feature detection doesn't work in every case -- but you can assume honest users in that case. If they spoof their user agent and the site breaks, then the problem's on them.

But it seems a little bizarre to me to put development effort into anti-spoofing measures rather than addressing your actual problem directly. Is there a use case I'm missing where anti-spoofing really is the best or only possible approach?


It’s not my area of expertise, but it seems like spoofed devices might be used for fraudulent ad clicks, and companies like Google are going to devote substantial resources to defeating them.

Also fake reviews.

Who uses multilogin anyway?


Yeah, would be interesting to see more examples of how spoofed devices are used nefariously.

I'd bet people try to get around quotas or rate limits by spoofing different devices. Maybe ad fraud as well?


OP is yet to discover Chinese simulators designed to spoof all kinds of devices.

Or even better, real device bot farms https://www.youtube.com/watch?v=X_pRsSM_sXQ


Google has secretly installed a trojan that sent cellphone tower IDs to them.

Remember this: https://qz.com/1131515/google-collects-android-users-locatio... ?

A passing birdie told me that it was an internal antibotting sting.


Being pretentious without googling results in Rumi being announced as a Turkish poet.

https://en.wikipedia.org/wiki/Rumi


Big botters simply run the browser in a VM.

You can't do anything about that. And yes, even WebGL fingerprinting was defeated nearly completely.


Not that you can’t do “anything” about it, but you need to analyze the actual behavior instead of just user agent. Soon it will become an AI arms race between behavior analysis and behavior synthesis.


Or just accept that user agents are robots acting on the user's behalf and build your stuff so that it can tolerate robots.


> For example, Netflix supports hundreds of different video formats in various resolutions for each device, from mobile devices to smart TVs. Without device detection, how would that be possible?

Device detection is different from content negotiation. This statement is similar to a statement that ignores the HTTP Accept-Language, and claims that location access is necessary to build internationalized websites.
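
To make the comparison concrete, here is a minimal sketch of server-side negotiation on Accept-Language alone. The function names and the supported-language list are made up for illustration, and a production parser would handle more of the header grammar than this.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language header into (tag, quality) pairs, best first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        entries.append((tag.strip().lower(), quality))
    # sorted() is stable, so tags with equal quality keep the client's order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(header: str, supported: list[str], default: str) -> str:
    """Pick the best supported language using only what the client sent."""
    by_lower = {language.lower(): language for language in supported}
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]  # fall back from "de-at" to "de"
        if primary in by_lower:
            return by_lower[primary]
    return default


chosen = negotiate("de-AT,de;q=0.9,en;q=0.8", ["en", "de"], default="en")
print(chosen)
```

No location lookup or device fingerprint is involved; the client states its preference and the server picks from what it has.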


Ultimately, no matter what you do on the client side, the server only knows what the client tells it. You don't need advanced emulation of JavaScript features to spoof an agent, you just need to mimic the requests a normal agent would make.


Down to TCP handshake, TLS handshake and ciphers selected, timings, HTTP headers (case, order, whitespace, which ones are sent), HTTP session characteristics (concurrent stream count, behaviour of QUIC windowing, behaviour of closing the underlying connection, does the client read all of the response), etc.

At some point the only way to not be spotted as a spoof is to run the real thing.

If you think people aren't detecting spoofs like this then you are mistaken. From ad-spoofing detection, to e-commerce bot detection, this is very routine for companies to look at, it's not new it's just becoming more available to everyone.

Spoofers would do better running the real things and learning tricks from poker bots by reading the video output and controlling computer inputs. This is fine by a lot of people as it has increased the cost on the spoofer side.
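
As a toy illustration of just one signal from the list above (header order and case), here is a sketch that hashes the header names exactly as they appeared on the wire. The function name and both sample requests are hypothetical; real detection combines this with many other signals.

```python
import hashlib


def header_fingerprint(raw_request: bytes) -> str:
    """Hash the header names in wire order, preserving their original case."""
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    header_lines = head.split(b"\r\n")[1:]  # skip the request line
    names = [line.split(b":", 1)[0] for line in header_lines if b":" in line]
    return hashlib.sha256(b"\n".join(names)).hexdigest()


# Hypothetical requests: same header values, different order and case.
browser_like = (
    b"GET / HTTP/1.1\r\nHost: example.com\r\n"
    b"User-Agent: X\r\nAccept: */*\r\n\r\n"
)
script_like = (
    b"GET / HTTP/1.1\r\nuser-agent: X\r\n"
    b"host: example.com\r\naccept: */*\r\n\r\n"
)

print(header_fingerprint(browser_like) == header_fingerprint(script_like))
```

A client that claims to be a particular browser in its User-Agent but produces a fingerprint that browser never produces stands out, whatever the header values say.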


It increases the costs, sure, but is the increase significant? It seems to be fairly common to write bots for various online games that only rely on video input. The cost of development is higher, and the hardware requirements somewhat steeper, but how much more expensive is it really? I doubt it's even close to doubling the operational costs.


Costs change the dynamic and allow it to be a more focused problem.

For example spoofing as part of DDoS is now cost-prohibitive as you either cannot achieve the scale needed or you are too slow to be effective... which makes the market for booter service less viable at their low cost.

For ad-click stuff it wipes out the bottom of the market and forces the fraud on higher value adverts where it is more visible.

For e-commerce bots trying to buy the latest sneakers in sneaker drops this cost is irrelevant as the benefits are huge, but... with enough of the other fraud reduced companies that provide services to protect here can focus more resources here to make it harder.

Similar to how the most effective spam I now see on sites I operate is actually now human generated by cheap human labour (effective meaning "gets past layers of detection that stops it early")... the spam problem for me is effectively solved as it's been reduced so much and humans are slow and inefficient.


It's a very significant increase indeed. The increase is (or was) large enough to entirely wipe out most adversaries and restructure the battlefield in ways very advantageous to those playing defense. At least, in the social web space. Video games is a different world.

This stuff is something of a secret weapon to those who know about it. Because so many developers assume it can't work the companies that master it have a large competitive advantage.

Source: About a decade ago I created Google's main "device detection" platform, as this article calls it (not Picasso, the thing that executes Picasso). It's actually more like an automation detection platform, as it's not a fingerprinting or device tracker, it just tries to separate human operated from automated clients. These days I'm told there's a large-ish team that maintains it full time and has ported the concepts to other platforms like Android.

It started as a 20% project because at that time almost nobody at Google took the idea seriously. Fortunately, my manager was happy to support my experiments. People had the same common (but incorrect) intuition you're displaying here, that any sort of client integrity technique is so easy to work around it's hardly worth the bother. Actually even I believed this to a large extent, just less so than the others. This turned out to be wrong for some not entirely obvious reasons related to the structure of the spam industry:

1. Most spammers are either not programmers at all, or are extremely poor programmers compared to a typical tech firm employee. They can in fact be out-coded.

2. This is because spamming is usually not all that profitable, so programmers who get good can find better and steadier money in the white market. The ones who remain are typically those who live in places without any local software opportunities (e.g. developing countries).

3. Because of this, mounting even a not very strong defense is sufficient to corral your adversaries into a shallow economic pyramid, in which a small number of "skilled" people produce tools and services they sell to the others, who then run the individual campaigns. This means you are probably not fighting as many people as you think you are. Screwing with the supply chain is an excellent way to wreak havoc on spammers.

When we first deployed the system we spent several months tuning it in what was effectively a running battle with the major Google account sellers. We discovered that the sellers were in turn buying their account creation bots from other people, and some sellers were actually re-sellers. One of the sellers had been using a "raw" bot that didn't embed a browser engine, and thus was knocked out of the market for months as they waited for a new bot to be written from scratch. When that came online there were mistakes in its browser automation that we were able to detect. The developer of the bot couldn't de-obfuscate the JavaScript we used (too hard for them) so treated the platform as a black box, just trying random things in the hope it'd work. We could watch this evolution in real time and block new versions as they were released. After a few rounds of this the seller got sick of it and switched to a new bot supplier. This new bot also took months to complete, and when it arrived it had fixed the bug we were using to spot the first bot, but introduced new bugs the other didn't have, meaning even then it was detectable.

At that point the seller gave up, as presumably paying for the development of all these bots was quite expensive relative to the margins involved. This in turn nuked all the resellers that had been relying on that guy, and blew a hole in the entire Google-oriented spam ecosystem. Spammers had to start phone verifying accounts en-masse, and for most of them it just wasn't worth it (a few switched to using stolen accounts instead of creating them). I haven't been there for years so don't know what the current state of play is, but you do still see public threads crop up from time to time where spammers say they tried to beat the system and couldn't, like this one:

https://github.com/BitTheByte/YouTubeShop/issues/14

If you want some insights into the minds of the typical newbie spammer when faced with this system, try this search and flick through some of the results:

https://www.google.com/search?q=site%3Ablackhatworld.com+bot...

NB: Sometimes people claim they've "cracked" this system but usually they mean they did a bit of reverse engineering out of curiosity. Going further and making a real spam bot that can reliably beat it is a much harder thing, especially if you want that bot to be working with HTTP directly for performance. We never saw anyone attempt to build an HTTP level bot that worked against it in the time I was there. Probably there have been some attempts in the years since.


It makes sense that it would be super difficult to out-code Google. The market for these services is pretty small compared to how much Google can spend to prevent and constantly combat them. Google also has way better data than the attackers, so the attackers are at a significant disadvantage.


E-commerce bot detection is a joke. The botters are always 2 steps ahead; they often use the detection to their advantage by bypassing it and getting ahead of legit users. See the comments from one of the top posts.


The stealth plugin for Puppeteer Extra gives a pretty good idea of what you need to cover today. Maybe it's not rocket science, but it's not trivial either.

https://github.com/berstend/puppeteer-extra/tree/master/pack...


In theory, perhaps. But in practice, that's too simple: What, for example about certificate pinning? If you have a safe certificate on the client, spoofing becomes (prohibitively) hard.

Try, for example, to disassemble Facebook's APK or disable pinning via FRIDA (https://github.com/frida/frida). It's not exactly easy, and with frequent releases, it's a moving target.
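
For anyone unfamiliar with pinning: the client ships with a hash of the certificate it expects and refuses to talk to anything else. A minimal sketch, assuming the pin is a SHA-256 hash of the whole DER-encoded certificate (many real deployments pin the public key instead). The function name and the sample bytes are made up.

```python
import hashlib
import hmac


def matches_pin(cert_der: bytes, pinned_sha256_hex: str) -> bool:
    """Return True if the presented certificate hashes to the pinned value."""
    actual = hashlib.sha256(cert_der).hexdigest()
    # compare_digest avoids leaking how many leading characters matched
    return hmac.compare_digest(actual, pinned_sha256_hex.lower())


# Stand-in bytes; a real client would use the server's DER certificate.
expected_cert = b"hypothetical certificate bytes"
pin_baked_into_app = hashlib.sha256(expected_cert).hexdigest()

print(matches_pin(expected_cert, pin_baked_into_app))
print(matches_pin(b"certificate presented by a proxy", pin_baked_into_app))
```

In a live client the DER bytes could come from `ssl.SSLSocket.getpeercert(binary_form=True)`. An intercepting proxy presents its own certificate, the comparison fails, and the app never sends the traffic you wanted to observe.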


You can never trust anything on the client though. You must assume everything the client says is a lie, the browser itself, the OS, everything.


The problem is that the 'normal agent' has a fantastically wide range of behaviours it will happily carry out and report to the server. If you want to appear like a normal user you need to mimic this: if your agent (whether used by a bot or a privacy-focused human) blocks this fingerprinting it's very likely to be blocked or tarpitted by the server.


I'm currently typing this on a build of Chromium 93 (official Chrome is at v91) running on Mac OS X Mavericks (released in 2013, incompatible with Chrome 69+), running on a self-built PC with an Intel 4790K. I replaced Mavericks's default AppleColorEmoji font with an updated version from Big Sur, but my system font is of course Lucida Grande. I have a working copy of Widevine.

I would expect most websites to assume that my machine can't exist. But I don't have any problems with captchas.


> If you have ever used a mobile browser (of course you have), you know you cannot resize the browser window. It’s always opened, maximized, covering the whole screen.

Not true. What about splitscreen?


Yeah, I'd bet most people with widescreens, and nearly all with ultrawides, don't stay full-screened all the time. I certainly don't.


> mobile browser

I think he refers to this kind of UI feature on smartphones: https://www.samsung.com/au/support/mobile-devices/using-spli...



There’s a paper that addresses this fundamental problem:

https://www.freehaven.net/anonbib/cache/oakland2013-parrot.p...

  @inproceedings{oakland2013-parrot,
  title = {The Parrot is Dead: Observing Unobservable Network Communications}, 
  author = {Amir Houmansadr and Chad Brubaker and Vitaly Shmatikov}, 
  booktitle = {Proceedings of the 2013 IEEE Symposium on Security and Privacy}, 
  year = {2013}, 
  month = {May}, 
  www_pdf_url = {http://www.cs.utexas.edu/~amir/papers/parrot.pdf}, 
  www_tags = {selected}, 
  www_section = {Communications Censorship}, 
}


Back in 2018 I crawled 200 million URLs of a single site using just freely available proxies on the internet to evade rate limits. I do not think it can be repeated today.

CDNs are blocking even my genuine requests.


Sadly, it's easier than ever. Multiple botnets create open proxies which are easily discoverable, if you know where to look. Also, cloud providers give you more IPs than you might ever need.


Cloud provider IP ranges are publicly known. They are the first ones to get banned.
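
The check itself is cheap. A minimal sketch, assuming the site already has a list of provider ranges; the CIDR blocks below are reserved documentation ranges standing in for a real published list, and the names are made up.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), not real provider blocks.
DATACENTER_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_datacenter_ip(address: str) -> bool:
    """Return True if the address falls inside any listed provider range."""
    ip = ipaddress.ip_address(address)
    # An IPv4 address is never "in" an IPv6 network, and vice versa.
    return any(ip in network for network in DATACENTER_RANGES)


print(is_datacenter_ip("192.0.2.17"))
print(is_datacenter_ip("203.0.113.5"))
```

With lists this easy to match against, a crawler on cloud IPs is flagged before it sends its second request.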


These guys are awesome. Why? Because they get it:

   Options for Paypal and credit card payments (you can choose
   one of the alternatives):  

    * Send us a copy of an ID, issued by your Government, which clearly
      shows your name and picture. The file will be completely deleted
      after the verification process

    * Pass a video interview with our customer support representative

   Options for Bitcoin: 

    * We don’t ask to verify Bitcoin payments


I don't get this. I would have thought it would be the other way around. Paypal and credit card companies do a lot of KYC and other checks on their users. Bitcoin does none. If you need to verify a user's identity for some reason, you need to do it more if they pay by Bitcoin!


The point is not the KYC, the point is to eliminate chargebacks.


Why not 3DS with pre-charge $1 with random description for the customer to later verify?

Sorry but they're not getting my ID. Strikethrough.


Think that’s sort of the point. They don’t want your ID, they want you to use Bitcoin


Since a lot of their customers will be on the grey side of legitimacy, I imagine credit card fraud would be common which can get their seller account shut down.


You've got it all backwards.

Bitcoin doesn't need identity to prevent fraudulent payments. It uses math for that.

The legacy banking system has no real, complete solution to fraudulent payments. So instead they bodge on this identity-checking nonsense, which maybe sorta works sometimes, with very high overhead. Privacy is collateral damage here.

KYC for cryptocurrencies is like horse-buggy manufacturers requiring a whip and manure-scooper in every automobile. The horse-buggy industry is very desperately trying to convince you that this requirement is for your own good. And that without it, the terrorists will win.


You have it all mistaken. It's for AML, not for fraud.


If this business is concerned about AML, why do they accept anonymous bitcoin payments? They care about chargebacks, not AML.


I don't care about this business. I'm responding to the rant of poster above me.


“I’m not accusing Microsoft of having spied on users or abused its data-gathering capabilities.”

I am. The changes to Windows over the last year are designed to do this. For starters, if you install Windows without a Microsoft account (which is only possible if you lie to Cortana during setup and click “I don’t have internet access”), the modals and update flows that pop up after you complete installation represent a dark UX pattern designed to make you create an account anyway. Windows has also been updated several times over the last year to default to Edge over your preferred browser (going so far as to actually force the Edge icon onto your task bar AND desktop AND force you to go through a “set default browser as edge” flow). Most recently I was auto signed up for a news and weather widget (with ads) on my taskbar. MacOS isn’t much better these days because even a brand new Mac is loaded with Apple Bloatware (do you want to pay for iCloud? Apple News? Apple TV+? Fitness? All music? All together in one package? Also here’s a new Finder format which defaults you to save to iCloud and hides your OneDrive).


> instead of using his thumb, index and middle fingers (as Germans do), he uses his ring, middle and index finger.

I'm not German but I also find the "thumb, index and middle" method to be the most natural, curious why people in the Anglo world have a different way for doing this.


I'm Romanian, so hardly in the anglosphere, but extending the thumb is not natural for me. I would always start with the index finger for 1, then continue with the long finger, ring finger, and pinky for 2, 3, 4, and extend the thumb only for 5. Sticking out just the thumb for 1 would look like an OK kind of sign to me, not at all as 1.

I guess this is all just culturally acquired, no deep reasons necessarily. Still, as someone else was pointing out, sticking out all fingers except the pinky for the "German" 4 seems hard to do.


I find that if you lead with the thumb-as-1 - then I can't use my little finger held down for 4.

I naturally use my thumb to hold down the other fingers leading with index-as-1.

I'm not sure what the thumb-as-1 does for 4 though.

Since the thumb is the first digit on the hand (depending on the way you go) I can see it being a logical choice - but not necessarily the most convenient once you get to 4.


I just tested different ways of displaying numbers with fingers and the most comfortable way is to start with the pinky finger and then go in order from there. But I guess the downside to that is a raised pinky isn't as clear as a raised index finger or thumb due to it being so short.


I feel like it's important whether you're tallying things, or presenting a quantity by hand gestures, for example to a bartender. Then 1 is a single index finger raised, with the thumb and other fingers pinched together. 2 the same, but also long finger. For 3, I would use the ring finger, and similarly for 4, using the pinky.

However, if I was tallying, I'd do thumb, index, long, ring, pinky. I have a Chinese colleague, and he always goes pinky, ring, long, thumb.


If I count things, I tend to extend the thumb first, then the index finger, then the long finger, etc. This I think naturally leads you to make 3 by extending the thumb, index and long finger.


That's OK, but can you then extend the ring finger but not the pinky to make a 4? For me it's almost impossible.


Fairly easily yeah, on either hand


For that reason I always count as \.... \\... \\|.. .\|// \\|//, if that makes any sense to you.


I find it physically difficult to fully extend my middle finger while my ring finger is curled up. If I put my thumb on it I have no issues, but otherwise it feels uncomfortable.


>I find it physically difficult to fully extend my middle finger while my ring finger is curled up.

Definitely sounds like a cultural thing. In Australia, this is an extremely common gesture. :^)


It works fine if my thumb even taps my ring finger, but without it the ring finger doesn't like staying down.


I taught myself to count from the pinky, because each of the gestures for 1-5 is quite natural: the thumb retains the bent fingers and releases them one by one.

It's, er, culturally variant, but not in a way that raises eyebrows.



More and more the web is turning into cable TV.

The other day YouTube was showing me ads, without exaggerating, every ~20 seconds. This would happen for 3-4 minute stretches before they got less frequent.

This may not sound like much but when you're watching a 15 minute video they add up.

I looked and looked for ways of blocking youtube ads on my iphone.

Morality aside, what struck me when I looked into this wasn't that I specifically could not find software to block ads (which was disappointing), but rather the larger point that came across as I browsed forums looking for a solution - how hard it is to hack your devices - even android devices.

Most articles ended up with having to root your device, which is fine, but even then the solutions were unreliable.

Curious to see what ends up happening in the long term.


Adguard will block 95% of YouTube ads on your iPhone for the time being, sometimes you’ll need to reload the page for the video to play, but for me at least that’s a better experience than 2 30sec ads


I think that works in the browser not the app, and the browser app is pretty nerfed.

That is probably going to be my best bet though.


Yea, I haven’t used the app in years


On Android, people have reverse engineered the youtube app to eliminate ads -- https://vancedapp.com/


Starting to wonder if the techniques like those outlined in this article are why I'm constantly presented with a captcha asking if I'm a bot. I suppose the future is now.


I haven't noticed any significant differences in how popular websites and CAPTCHAs perceive my identity when I am surfing the web from cloud VMs.


Then your regular identity must be tainted as well, because the difference is night and day.


> a website cannot show captcha to mobile devices

Wait, what?


Well, the actual quote is:

> Who can stop people from utilizing device spoofing if a website cannot show captcha to mobile devices even if some rate limits are exceeded, or if a company offers specific discounts or products only to some types of devices?


Okay but I'm still confused by the part I quoted.

Aren't CAPTCHAs shown to mobile web browser users all the time? Is there some law against showing them CAPTCHAs in the Peoples' Republic of WestArctica?


I think that makes sense in the context of the whole sentence:

> Who can stop people from utilizing device spoofing if a website cannot show captcha to mobile devices even if some rate limits are exceeded, or if a company offers specific discounts or products only to some types of devices?

The "cannot" is not meant in technical, but rather organizational sense - somebody decided that they don't want to show captcha to mobile users ( one reason might be that the user experience was deemed too bad). The same way they decided to offer specific discounts only to some other class of device.


I couldn't figure out how the emoji image hashing works. Could somebody enlighten me?


Emoji are drawn on a canvas which is then hashed. Here's a demo: https://jsbin.com/xunirujipa/edit?js,output
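
The hashing half can be sketched without a browser. This assumes the rendered canvas is available as a raw RGBA pixel buffer; the function name and the two tiny "renderings" are invented, and the hash function a real script uses may differ.

```python
import hashlib


def fingerprint_pixels(width: int, height: int, rgba: bytes) -> str:
    """Hash a raw RGBA pixel buffer together with its dimensions."""
    if len(rgba) != width * height * 4:
        raise ValueError("expected 4 bytes (RGBA) per pixel")
    header = f"{width}x{height}:".encode("ascii")
    return hashlib.sha256(header + rgba).hexdigest()


# Two hypothetical 2x2 renderings of the same glyph on different platforms.
# They differ by a single colour channel in a single pixel.
render_a = bytes([255, 204, 0, 255] * 4)
render_b = bytes([255, 204, 0, 255] * 3 + [254, 204, 0, 255])

print(fingerprint_pixels(2, 2, render_a))
print(fingerprint_pixels(2, 2, render_b))
```

Font files, anti-aliasing and GPU differences change a few pixels, so the same emoji yields a different hash per platform, and that hash can be compared with what the claimed device should produce.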


I’m guessing they render the emoji somehow, capture the image, and hash it.


"Although he wears a Nazi uniform and speaks German well, he gives himself by a minor detail: his fingers."

I thought perhaps the author was going to discuss comparison to real world fingerprints. Here are a few questions for readers. First, how much can a "device fingerprint" be used to identify a person? Is it identifying a device, or only the person who is using it? How do we know who that person is? Second, is a "device fingerprint" like a real world one, where someone can chop off someone else's finger and, as seen in popular TV/film entertainment, use it to gain entry into some highly restricted area? Third, assuming the answer is yes, what stops the collector of a "device fingerprint" from misusing it? As long as she can make network connections appear to be coming from a plausibly genuine IP address, how would anyone distinguish a fraudulent user of the "device fingerprint" from an "authentic" one? (For example, the collector of the fingerprint could use it to impersonate the true owner of the device.) At least real world fingerprints are physically attached to our person. Neither copying and re-using them nor stealing someone's finger is trivial. And generally online advertising firms are not in the practice of collecting real world fingerprints; those taking real world prints are often government agencies. We may have certain protections under the law against the government. We cannot make the same claims about "device fingerprints".

As a user, I have not found that very many websites/endpoints that I use require any sort of complex fingerprint. I can retrieve the data I want without using a bloated graphical browser, running Javascript or sending a bunch of gratuitous HTTP headers. I send only two: Host and Connection. I cannot remember the last time this did not work. I never see any ads. As such, I struggle to understand all the fuss about "device fingerprints". This is voluntary data transfer to advertisers. If we send all sorts of data to websites/endpoints every time we make a simple HTTP request, then obviously that data is going to be used for something. The user generally has no legal/contractual control over how the data will be used. For example, the user may see ads as a result. Whereas if we keep requests brief and do not send heaps of gratuitous data, it stands to reason we would see less advertising, and certainly less targeted advertising.
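
For the curious, a request of that shape looks roughly like the sketch below. The `Connection: close` value, the function names and the default port are assumptions made for the example, not a description of any particular tool.

```python
import socket
import ssl


def build_minimal_request(host: str, path: str = "/") -> bytes:
    """Build a GET request carrying only Host and Connection headers."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 443) -> bytes:
    """Send the minimal request over TLS and return the raw response bytes."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_minimal_request(host, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:  # server closed the connection
                    break
                chunks.append(data)
    return b"".join(chunks)


print(build_minimal_request("example.com").decode("ascii"))
```

No User-Agent, no cookies, no Accept headers: the request carries the hostname and little else.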


No answers just a couple of downvotes.

If cookies can be stolen, so can cookies that contain "device fingerprints". "Identity theft" keeps getting easier through no fault of the consumer.



