
If you tweet the same text hundreds of times at accounts that don't follow you, you're a bot.



Except when Twitter makes that a heuristic for detecting bots; then the bots will soon stop tweeting the same text twice.


> Except when Twitter makes that a heuristic for detecting bots

There are several dozen. Maybe not turned on in prod, but that's a whole other story. In fact, within three months of joining, way back in 2012, one of my very first tasks was to write a standard datamining job that would compute the difference between the GPS location during office hours (9am-5pm) and the GPS location during home hours (7pm-7am) of everyone who tweets. A histogram of those differences would tell you about the commute distance of the average American who tweets. You could then bucket by region and say interesting things like the average NY tweeter commutes 25 miles more than the average CA tweeter.

Looking at the results we got, it was clear there was a substantial percentage of bots, because their location varied so widely, minute to minute and hour to hour. The haversine distance between GPS fixes will be reasonably stable for a real user, because your IP maps to the GPS location (we used the standard MaxMind GeoIP2 API) and those IPs are relatively stable... except if you are a bot and switching IPs willy-nilly. This was just one instance, but there were several such projects. Usually interns and new employees would work on these to get their feet wet, and then move on to more substantial projects.
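
For the curious, a minimal sketch of that kind of job might look like the following. This is not the original code: the function names, the (hour, lat, lon) input shape, and the 500 km / 50% thresholds are all assumptions made for illustration, and it takes coordinates as given rather than doing any IP lookup. Distances are in kilometres.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def is_work_hour(hour):
    return 9 <= hour < 17  # 9am-5pm


def is_home_hour(hour):
    return hour >= 19 or hour < 7  # 7pm-7am


def centroid(points):
    """Naive mean of (lat, lon) pairs; crude, and wrong near the antimeridian."""
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return lat, lon


def commute_km(tweets):
    """tweets: list of (hour_of_day, lat, lon). Returns None if data is missing."""
    work = [(lat, lon) for h, lat, lon in tweets if is_work_hour(h)]
    home = [(lat, lon) for h, lat, lon in tweets if is_home_hour(h)]
    if not work or not home:
        return None
    return haversine_km(*centroid(work), *centroid(home))


def looks_like_bot(tweets, max_jump_km=500.0, max_jumpy_fraction=0.5):
    """Hypothetical rule: flag accounts whose consecutive tweets mostly jump far.

    tweets must be in chronological order; thresholds are illustrative only.
    """
    points = [(lat, lon) for _, lat, lon in tweets]
    jumps = [haversine_km(*a, *b) for a, b in zip(points, points[1:])]
    if not jumps:
        return False
    jumpy = sum(1 for d in jumps if d > max_jump_km)
    return jumpy / len(jumps) > max_jumpy_fraction
```

Run commute_km over every account and histogram the results to get the commute picture; looks_like_bot is one crude way to pick out the accounts whose locations hop around too much to be a person.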


VPNs would make this kind of analysis flag regular users too, yes?


Provenance of data was not in scope; 'twas more of a standard datamining "see if you can dig up something interesting" project. Like I said, there were scores of these. One of my colleagues wrote the famous soda vs pop thingy, which once again put location stats to good use: http://blog.echen.me/2012/07/06/soda-vs-pop-with-twitter/


Quants have ruled the business landscape for long enough; bring on the Quals.


Hear, hear!

With quantitative analytics being used and abused by ever more businesses, the advantage will go to those that can apply qualitative checks to their assumptions, and scale those checks.


Yes, they'll need an adaptive system. This is a (mostly) solved problem already - just look at your Gmail spam folder.


Are you suggesting getting a Gmail account, turning on the setting that makes Twitter email you notifications, marking tweets as spam in Gmail, and then using Google's spam detection as an existence proof of a solution if it's successful?


No, he's saying people by and large can't defeat Gmail's spam detection.

The same will likely happen at Twitter.


> The same will likely happen at Twitter.

It took them years to let users upload images (when there was clear demand and parasite services grew out of it), and there are obvious spam accounts that could've been shadow banned with very simple heuristics.

So I don't think this is going to happen.


I suspect their spam/abuse trouble started getting a bit more attention after Disney dropped an acquisition over it in October.

https://www.bloomberg.com/news/articles/2016-10-17/disney-sa...


Will "Congrats on your 250 like tweet!" count under this?



