
We've seen some fairly aggressive voting rings organized by publications with well known names. Only a few are actually banned though. Usually we just take away the voting ring members' ability to vote.



I deal with so many folks trying to scam our search engine every day I can't imagine the hordes at that gate here.

Reddit's response seems to be to take away the cheese in order to make the site less appealing to the rats. It's one strategy.

So far I've found that making it seem to the bad actors that nothing has changed is most effective at keeping them from getting worse. The first time we tried returning a straight error code for a robot search we got to see how fast a robot could send requests (pretty damn fast!) but we can send them the same 10 bogus serps again and again and again and they will chew their robot cud all day.
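A minimal sketch of that "nothing has changed" response, assuming clients have already been flagged as robots by some other means. The names `DECOY_RESULTS`, `real_search` and `handle_search` are hypothetical and only illustrate the idea; this is not the actual search engine code.

```python
# Hypothetical canned results; a real system would use plausible-looking pages.
DECOY_RESULTS = [f"https://example.com/decoy/{i}" for i in range(10)]


def real_search(query):
    """Stand-in for the real index lookup (hypothetical)."""
    return [f"https://example.org/result?q={query}&rank={i}" for i in range(10)]


def handle_search(query, client_id, flagged_clients):
    """Always answer 200 with ten results; flagged robots get the same decoys."""
    if client_id in flagged_clients:
        # Same status, same shape, same count: nothing tells the robot it was caught.
        return 200, list(DECOY_RESULTS)
    return 200, real_search(query)
```

The point of the design is that the flagged client never sees an error code it could react to, so it has no signal to change its behavior.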


So far I've found that making it seem to the bad actors that nothing has changed is most effective at keeping them from getting worse.

Right. The quality of feedback information is one of the biggest factors in development cost. The bugs you can't reliably recreate are the biggest problems, sometimes by one or two orders of magnitude.

This should be used as a weapon in the security battle. Seems like not enough people use it.

In all likelihood, the scam initiator is not the same person as the scam implementor, so "making it seem to the bad actors that nothing has changed" is likely to inflate their costs tenfold. This also works when the scam is entirely the work of one person, but it's especially effective when multiple parties are involved.


    Seems like not enough people use it.
Well, since this is a tactic that works best if the bad actor never finds out about it, isn't it possible lots of people do it without ever talking about it?

Relevant Coding Horror http://www.codinghorror.com/blog/2011/06/suspension-ban-or-h...


isn't it possible lots of people do it without ever talking about it?

I hope it's the case.


Some following this thread might find Tarpits interesting if they don't already know about them: http://en.wikipedia.org/wiki/Tarpit_(networking)

Do you guys know of any other techniques that waste the malicious actor's time making it think everything is a-okay?


Disciplined users - (http://meatballwiki.org/wiki/UsAndThem)

This usually fails because someone isn't strong enough to not respond. Or someone engages because it's a fun conversation for them. Examples of this on HN are some of the political discussions (eg Palestine) where people really should just flag and ignore but often people engage.


>So far I've found that making it seem to the bad actors that nothing has changed is most effective at keeping them from getting worse.

Reddit does do this, through shadowbans. Apparently, they felt that they weren't effective enough in discouraging antisocial behaviour.


As far as I know, shadow/silent bans only apply to users, not domains.

Does anyone know if reddit has tried extending silent bans to domains?


I once submitted a link from my blog to Reddit. It didn't complain, but nobody else ever saw the submission: it was totally absent from Reddit, and I could see it only while logged in.

I then talked to the moderators about it and they said that my link submission history contained multiple items from my blog, so the automatic spam filtering kicked in and my domain was blacklisted. Because the submission was good and on-topic, one moderator proceeded to whitelist my domain again.

I don't know if the key for this silent ban was just the domain name or the combination of (username, domain name), but it was a silent ban.


That isn't a shadowban. That's just the regular spam filter. A shadow ban is a slightly different approach. A shadowbanned user, when logged in can see his posts and comments as if they were there normally, but only he can see them. Other users don't even know about the existence of them. (Mods can see shadowbanned user comments and posts in their subreddits, but normally don't approve them because each post/comment has to be approved manually.)
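The visibility rule described above can be sketched roughly like this. It is only an illustration of the rule as stated in this comment, not reddit's code; `Post`, `is_visible` and the two lookup arguments are made-up names.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    subreddit: str
    approved: bool = False  # set when a mod manually approves the item


def is_visible(post, viewer, shadowbanned, mods_by_subreddit):
    """Decide whether `viewer` can see `post` under the rule described above."""
    if post.author not in shadowbanned or post.approved:
        return True  # normal posts, or banned posts a mod approved by hand
    if viewer == post.author:
        return True  # the banned user sees their own posts as if nothing happened
    # Mods can see shadowbanned items in their own subreddits only.
    return viewer in mods_by_subreddit.get(post.subreddit, set())
```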


Yeah, but I was able to see my submission while logged in, as if nothing happened.


How is a robot used to scam a search engine? Are they stealing your results on a per-search basis?


Sometimes. Sometimes people try to build databases of pages about people: you know, search for "Kennedy, Alice" and get all the pages we have, then "Kennedy, Bob" ... "Kennedy, Charles" etc. People then try to offer "Find out about <name>!" types of services on their web sites. Or some WordPress theme will have a SQL injection vulnerability and we'll get robots trying to find every site we know about that runs that particular theme. There is lots of stuff you can do when you have a copy of the Internet, not all of it 'good.'

Now we do sell API access to our index, but not for stuff like this.


Thanks. Interesting.


Trying to scam Blekko alone, or all search engines?


All.

When I was at Google there was a new hire class that talked a bit about security and 'bad' queries. There was an admonishment in the class notes to 'not' experiment with those queries unless you were a member of SecOps. They claimed they would notice, and they would track you down.

So to answer the meta question, all search engines have this problem. I know this is true because of the new hire class at Google (largest search engine) and because we (Blekko) offer our index through API access to a number of smaller brands you may have tried and by doing that we see what they see in terms of search queries.

Generally there is someone in an operations role at those partners with whom I can share notes and pass along the latest fashion in terms of scams.


By bad queries, you mean if you query "how to buy links without getting noticed", Google will track you down, find out who you are, and try to get your sites penalized?


No, I mean queries that look for things like exposed credit card numbers or passwords, or vulnerable software. Web crawlers don't discriminate in what they index and inexperienced or inept web administrators sometimes put things out there that they shouldn't.

Criminals attempt to exploit that, using search engine results to figure out likely targets.

There are also people who try to exploit keyword searches and the advertisers who buy advertisements on them but that is less obviously criminal.


Interesting, so Google has an "Internal Affairs" department that audits its own staff, i.e. checks whether someone looks up celebrities' details internally, either out of curiosity or to sell to the tabloid press.

I know that big telcos have internal security groups, and British Telecom had/has a ferocious rep.

BT got even stricter after someone got a temp job and looked up the ex-directory numbers for one of the Queen's residences.

After that they started doing positive vetting for team leads on major projects - that is the equivalent of TS clearance in the USA.


I've spotted The Atlantic doing some cheesy things, and as much as I like some of their content, it really dims my opinion of them. For instance, take a look at these two submissions on the same day:

http://news.ycombinator.com/item?id=4074345

-- and --

http://news.ycombinator.com/item?id=4074230

These point, respectively, to the following URLs. Notice the slight URL difference in what is actually the same article.

http://www.theatlantic.com/business/archive/2012/06/what-cou...

http://www.theatlantic.com/business/archive/2012/06/what-cou...


This will really blow your mind, then:

www.theatlantic.com/business/archive/2012/06/this-part-actually-does-not-matter/258139/

Feel free to change the next-to-last part to whatever you like.


It's not just the meaningless stubs that I found fishy. It was the same article, posted twice on the same day to two different URLs, by the same person - the author of the article.


It's called A/B testing.


That is not A/B testing, that is circumventing the duplicate submission filter.


Or, it's A/B testing.

The Atlantic is in the content business, and in case you hadn't heard, that space is having its lunch eaten. It's part of the job to produce content and figure out how to have it read. That involves playing around with things.

I don't really find it reprehensible that people write and then promote their own content, whether that's music, movies or the written word. What else should be done? Establish centralized "distribution" centres? Wait a minute...


Promoting your articles on Twitter, Facebook, Google News and other channels expressly designed for that kind of purpose is fine.

Submitting your own articles once to a place like HN or reddit is pretty shady in the first place; since that's not the intent of these sites. The intent of these sites is sharing stuff you found, not stuff you created.

Submitting your own article twice is just a dick move.


>"Submitting your own articles once to a place like HN or reddit is pretty shady in the first place"

I just disagree. These sites operate pretty well as markets. If your stuff sucks, I likely won't stumble on it.

>"since that's not the intent of these sites"

Ironic, since we have a "Show HN" topic here, the sole purpose of which is to do what you're saying shouldn't be done.

>"The intent of these sites is sharing stuff you found, not stuff you created."

I want to read good content; I don't care who submits it. I'd rather risk people submitting their own crap sometimes than never getting a chance to read something good that nobody "found".

Spamming is one thing, and obnoxious. But I find it a tad hypocritical to accept A/B testing colours on a button to squeeze another buck from someone, but reject someone A/B testing content to find out what combination of words in the title creates the most page views. It's the same business.


> Spamming is one thing, and obnoxious. But I find it a tad hypocritical to accept A/B testing colours on a button to squeeze another buck from someone, but reject someone A/B testing content to find out what combination of words in the title creates the most page views. It's the same business.

Oh come on. It's clearly not the changing of the URL in and of itself that I was objecting to. It's the doing it for the express purpose of submitting it twice to a site which has a filter set up in order to prevent that behavior.

I can't believe you didn't understand that from the get go, so I can only conclude that you are being deliberately obtuse which is annoying.


This actually isn't all that unusual, and there's nothing sinister about it. Amazon does the same thing; try changing

  http://www.amazon.com/Cryptonomicon-ebook/dp/B000FC11A6/
to

  http://www.amazon.com/Snow-Crash/dp/B000FC11A6/
People like seeing friendly, human-readable URLs. But it's bad practice to depend on them--if the title changes for some reason, you can break old links. That's why these URIs include both a human-readable description (which is ignored) and the unique ID of the resource, which is what's really used to render the relevant page.
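As a rough sketch of that scheme: the lookup key is pulled out of the path and the descriptive part is thrown away. The `/dp/<id>` pattern is taken from the two example URLs above; the function name is hypothetical and this is not Amazon's actual routing code.

```python
import re
from urllib.parse import urlparse

# Assumes the ID is the path segment that follows "/dp/", as in the examples above.
ID_PATTERN = re.compile(r"/dp/([^/]+)")


def resource_id(url):
    """Return the resource ID from the URL path, ignoring the readable slug."""
    match = ID_PATTERN.search(urlparse(url).path)
    return match.group(1) if match else None
```

Both example URLs map to the same ID, which is why swapping the human-readable part still lands on the same page.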


I don't think there's anything surprising about it. Lots of sites (including Stack Overflow) make the slug useless so they (or users) can change the title whenever they want.


Actually I think the feature is primarily for SEO purposes, since the alternative is to have the article ID only.


SEO-friendly URLs are fine, but meaningless dynamic slugs are a really Bad Idea.

Even if you remember to implement a canonical tag pointing to the "real" page, you risk people linking to these dummy pages (producing a minor loss of link value according to Cutts) or weird mangled versions coming into existence (think escaped referrer logs that become public and get found by crawlers, etc), and then bots have to take the time to request incorrect URLs and find out they're junk. Better to return a 404, or a 301 to the correct page (if you know what it should be).

See, this is actually the kind of sh*t that SEO is about: usability for bots.


Yes exactly.

If you're going to have slug-type URLs, they should at least be unique, and if they do change, the old version should at least show a canonical link and/or 301 to the new one.

Usually this means keeping a record of every variation the URL has ever had for a piece of unique content.
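A minimal sketch of that bookkeeping, assuming URLs of the form `/<slug>/<article_id>/`. The function and the two dictionaries are hypothetical names used only for illustration.

```python
def resolve(article_id, slug, current_slug, slug_history):
    """Return (status, location) for a request to /<slug>/<article_id>/.

    current_slug maps article_id -> the canonical slug.
    slug_history maps article_id -> set of slugs the article used in the past.
    """
    if article_id not in current_slug:
        return 404, None
    canonical = current_slug[article_id]
    if slug == canonical:
        return 200, None
    if slug in slug_history.get(article_id, set()):
        # A slug we really used once: send readers and bots to the current URL.
        return 301, f"/{canonical}/{article_id}/"
    # A slug that never existed for this article is junk, so do not serve it.
    return 404, None
```

With this in place, a made-up slug such as "this-part-actually-does-not-matter" gets a 404 instead of a second live copy of the article.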


A feature like that is a tremendous boon to karma whores. The Atlantic probably isn't the site to accomplish this, but such a feature could be used to effectively "outsource" your site's spamming to karma whoring cabals.


The thing I don't understand is why people are actually seeking karma by these means. Sooner or later they have to get caught, no? And once you are, all that precious karma will just evaporate, no?

I can understand spammers, kind of and even trolls in their own logic. But karma whores, in my understanding WANT to belong. So such a move would be quite, well, stupid.


That's vicious! But they really thought they would get away with it on a site called HACKER news???


Is he only posting links to his own articles at his employer? He should, really, make that a bit clearer in his profile.

I think I'd be a bit more worried if someone posted a variety of link-baity slugs to the same story.


It's funny because if you look at the HN poster's submissions, @mattobrian, his submit history is all Atlantic, Atlantic, Atlantic, Atlantic, etc., until 7 hours ago for the last 2 months when, boom, he posts a link to Slate. Feeling a change in the wind direction?


The problem is that The Atlantic tends to post things worth reading (and their articles tend to be better than most blog posts), so really, if you ban them, HN suffers.


If it's really worth reading someone else will submit it.


The entire domain is banned, not the users submitting the link.


At Reddit. Not here. Though, personally, I think we'd benefit from losing VentureWire and ITWorld.


and Extreme Tech, PandoDaily, TechCrunch, New York Times, Wired, GigaOM, CNN, CNBC, and so on. It's not like HN would be worse without them.


We'd be worse off without the NYT datavis posts.

The rest, I agree! Pando and TechCrunch in particular.

Don't hold your breath, though. Banning those sites would do a disservice to YC's portfolio; one of the tangible benefits of being in YC is easier access to the trade press. I really don't expect to see 'pg do anything to alienate anybody who could potentially write a useful story about a YC company.

(I'm fine with that as a cost of "not having to run a site like HN myself".)


Well, I hope the NYT articles (even the non datavis ones) provide a benefit other than normalizing relations between hackers and the general public. If the world needs to be better informed about the affairs of hackers, the hackers also benefit from knowing more about the world.


I personally disagree with regard to Wired. And not only because I submit quite regularly from them (I only just found out about this, so don't be too harsh!). They sometimes have some interesting stuff. For me, and again this is only my personal opinion, as long as an article has the potential to start interesting discussions AND isn't just bait, I submit it.

But banning what one could call mainstream media from HN wouldn't do it any favours. My impression is that I'm not the only non-hacker around here, so non-tech issues have their justification as far as I'm concerned. And if some content shows up that just doesn't fit, well, simply don't vote for it or simply flag it. For my part, I don't care who submitted an article or which source it came from, as long as it's a good read or a good discussion. Perfect if it's both. Just my 5 cents.

Disclaimer: Not working for any of the above mentioned news outlets, and I don't even have a wired subscription. :-)


I'd vote for keeping NYT and Wired - however it's hard to algorithmically enforce the "only post original, substantive articles" rule. (I guess that's what voting and flagging are for though.)


I'd support banning TC and PandoDaily from HN


I think you need to say why TC and Pando are deserving, and no, disliking MA is not a valid reason.


That depends on whether you value HN as a set of links, or as a place to discuss topics of the day. If it's the latter, then which particular articles are put up is of less importance than the quality of the comments and moderation system here on HN. I wonder how many people actually click the links and visit articles before commenting? The numbers might surprise us.


Phys.org and sciencedaily.com can go too, but they rarely get posted here.

They really clog up the science subreddits with wildly hyperbolic titles that get debunked in the first comment.


Why is that a bad thing? If these outlets are spouting off lies/bad science then having this behaviour noted is a good thing, indeed it's a great thing IMO.


You forgot TheNextWeb.


TheVerge too. +1 for dropping TechCrunch unless the post is by Mark Suster, in which case, whitelist that...


Why?

Isn't the whole point that people can submit what they find of interest and what they believe will interest others? All it takes to "ban" them is to not vote them up.


No, because the end result of that policy is cat pictures.


Which apparently people like?


So instead of seeing "theatlantic.com" in a link, we'll see "mypersonalop-edblog.wordspot.com". Cool.


Even if it wasn't temporary, it's not like people don't know where to find articles from The Atlantic.


It's temporary.


[deleted]


This worries me. I'd love to hear an official statement on this from Reddit.


I'm not sure what the deleted comment said, but there are plenty of official reddit admins commenting in the linked thread

http://www.reddit.com/r/TheoryOfReddit/comments/v03qc/physor...


The comment looked at the relationships of the banned sites and their competitive nature with the other Condé Nast properties, and how they are in fact some of the biggest competitors to these said properties.


But reddit isn't owned by Condé Nast anymore: http://blog.reddit.com/2011/09/independence.html


reddit Inc. is now owned by Advance Publications (which also owns Condé Nast)


Do you really think they are that out of touch with their capricious user base?

They would have to be barking mad to think they would benefit from trying to control the site in favor of Advance.


I've been collecting examples of people who at-a-glance submit mostly their own stuff or stuff from their employer. It's not short.

https://gist.github.com/461a1a7ab5cc82df02d5

I don't personally think it's a bad thing to submit one's own stuff (who doesn't like DanielBMarkham's agile posts or Jeff Barr's AWS stuff?); I just like to keep lists.


I'm kind of unnerved by this.

What if I want to submit one of my own links one day--will I wind up on your list? And if someone out of the loop happens to stumble upon your list, will they think that I am a self-promoting spammer due to my inclusion?

You should probably change this list a little. Adding a comment or two about its purpose would be welcome, and including statistics such as submission ratios for each username would be nice and would help to avoid the confusion I have mentioned.

Finally, how does someone get 'off' your list?


I don't think it's always a bad thing, either, but there are better and worse ways of doing it. A regular HN user who sometimes submits something they wrote, when they think it would be of interest to HN, is quite a bit different from an account that exists solely to auto-submit every post from a blog to HN, without otherwise participating.


The problem with such policies is that the definitions are squishy.

How do you define "regular" and "sometimes"? At what point do you decide someone is "participating"?

If someone submits, say, 10 links to other (presumably interesting) sites for every 1 link to his own site, is that kosher? What if it's 4:1? 1:1?

How interesting do the other sites need to be? At one extreme is neat stuff that everyone likes and makes it to the home page (which benefits the "my own stuff" not at all, but certainly helps with karma). At the other is junk thrown in as filler to make that 10:1 ratio.

These are not idle questions. Especially when I think everyone in the HN community is aware of journalists' and bloggers' need to attract attention to their articles (which they, at least, believe are relevant). The question is how one can be an accepted member of the community _and_ also let people know what he created.


I think this is an area that could still use improvement. It seems to have reached a point where the chances for an average HNer to get exposure and feedback on projects in Show/Ask HN type posts are really low. It's a shame because HN is a top-notch community, but it just doesn't feel accessible to people without pre-existing clout in the industry, popularity on the site, or the facility and willingness to use aggressive promotion tactics.

I'm not sure what the solution is, but I think innovation on this front could help to preserve and enhance HN's vibrancy.


Do they know that their ability to vote has been taken away? Or do they have to figure it out for themselves? I would guess that it would be more effective if they thought their votes still counted.


That is typically how "shadow banning" on reddit works. You have no idea you've been banned, and everything appears to work as normal.


So do they see the vote counter going up as well, while for everyone else it remains the same?


I cannot answer about reddit, but the more I read about this issue, the more similar reddit's voting system looks to the one HN has. AFAIR Reddit was a PG startup (?) so similarities make sense...

When you are hellbanned on HN, here is what happens: nothing. You are certainly not informed about it in any way: no popup, no messages, nothing. You think everything is normal and you can still write comments, but they appear only to you, which is confusing because you assume others can see them too. You can upvote (and as usual the arrow disappears) but your vote does not count. Another thing is that you get thrown onto some sort of non-cached, low-priority server that is as slow as hell. It then takes about 10-15 seconds to open any comments page (I can confirm this, because once my account was resurrected, everything started working as fast as my internet connection allows, at the same moment and with no changes on my end). Another thing I noticed: after your account is resurrected, your score can go up and down like before, but the average score does not change (mine has stayed at .65 for a while now).

During the dead period I wrote some long and, in my opinion, interesting comments that were not displayed (I spent some time on this one in particular: http://news.ycombinator.com/item?id=4068798 ). A nice fellow from England pointed out how silly I was to keep participating in the HN community while really being a ghost. I emailed info@ a couple of times and told them that I hardly ever get massively downvoted and that my karma has stayed positive for a long time. I don't spam or troll. Nothing happened. After a couple of days I started emailing PG, and about 12 hours later I got an email from someone else apologizing for my account being killed, and my account was resurrected.

I hope that PG/HN/team will add some features so that someone killed can know about it.

edit: sorry, I didn't answer the question: as for HN, no, the system is not that hellish. The score does not go up just for you (to make you assume everything is fine).


> I hope that PG/HN/team will add some features so that someone killed can know about it.

Every once in a while, right click the link to one of your comments and "open link in incognito window".

Adding features that tell you you're hellbanned defeats the purpose of hellbanning people.


Telling a driver that he got a ticket defeats the purpose of ticketing drivers? Sorry for my biased logic, but don't you think I may want to learn a lesson and become a better "HNer"? Hellbanning is like the ultimate death penalty. No warnings, no tickets, no first or second degree, nothing, just "we don't want you here anymore".

If I would get some sort of warning of what I am doing wrong, I may be given a chance to learn. Isn't that the purpose behind coming to HN? To learn?


On HN, downvotes and flags are the warnings and tickets.

Hellbans are only intended for spam accounts and trolls, neither of which is going to change their behavior. Sometimes people that aren't spammers or trolls get hellbanned, but that's an unfortunate side-effect of imperfect algorithms and moderators, not intentional.


A professional/habitual troll will know about hellbanning, and techniques like your "open in incognito window". Making it hard to know that you're hellbanned hurts innocents who were misidentified far more than it hurts actual spammers.


"PG company" seems like a bit of a stretch. Reddit was funded by Y Combinator, but Paul Graham wasn't a founder or anything.


No, not entirely. The count acts as if it's being upvoted, both for the banned account and for everyone else. It just doesn't actually count behind the scenes. If all you needed was 2 reddit accounts to detect whether you're banned, it wouldn't be a very sound system ;)

Reddit uses an "upvote fuzzing" algorithm. All upvote counts you see anywhere on reddit are approximations influenced by the shadowban system. Only after a story has become old and locked do the upvote totals truly stabilize at their actual levels.

The story's position on the site is based on the actual numbers. The numbers next to the story are just approximate though, to help fool banned accounts.

This is why things like Reddit Enhancement Suite that show upvote/downvote totals for comments are largely incorrect, and nothing more than a novelty.


On reddit the vote counts are fuzzed so nobody ever really knows if their individual vote counted. Even on stories with as few as 5 votes, the actual count displayed will fluctuate between 3 and 7. The only way you can really tell if you've been shadow banned is if your whole network of puppet accounts no longer has an effect.
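To make the idea concrete, here is a toy sketch of fuzzing. reddit's real algorithm isn't described in this thread, so the uniform jitter, the `spread` of 2 (chosen so that 5 votes shows as 3 to 7) and the function names are all assumptions.

```python
import random


def displayed_score(actual, rng, spread=2):
    """Return a jittered score for display; never below zero."""
    return max(0, actual + rng.randint(-spread, spread))


def rank(stories):
    """Order stories by their true scores, ignoring what is displayed."""
    return sorted(stories, key=lambda story: story["actual"], reverse=True)
```

Because ranking only ever reads the true score, the jitter changes what a voter sees next to the story but not where the story ends up.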


Interesting. Is this implemented similarly to "hellban"? Which is to say, does it appear that the votes are valid to the voting rings, or are they shown something else entirely?


Which seems a much better approach than outright banning/censoring domains.


It seems to me that if a publication really wants to beat the system, banning users can only work to a certain degree. If you really want to stop them from trying, banning a whole domain seems like a good way to persuade them as long as the bans are temporary.

Disclaimer: I have no idea how effective the user banning methods are, but I assume they can be outsmarted more easily than a domain ban.


HN doesn't have nearly the mass and concomitant crowd problems of Reddit, though.

Not that banning domains is ideal, but effective implementation is often more important.


Then I would also like to see the government sponsored ones to be banned - guess that might be impossible.

One of the worst examples of that is the so-called Global Fund to fight AIDS etc - actually burying any criticism of a scandal in which billions of your taxpayers' money earmarked for development aid disappeared with some of the worst criminals on this planet - and tens of millions continue to die from that.

There is no perfect world with social networks - you have to take everything happening there with a "grain of salt"


According to a Reddit admin they contact the offenders before banning.

This type of action would merit some type of direct contact with the individuals or company who run the domain.

http://www.reddit.com/r/TheoryOfReddit/comments/v03qc/physor...


Any advice on good papers to read about identifying voting rings?


How hard would it be just to expose the information on who up-voted an article? That would leave it open to easy mining to figure out up-voting trends.
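If that voting data were exposed, one simple thing to mine for would be pairs of accounts whose upvotes overlap almost completely. The sketch below is only an illustration of that idea, not how HN or reddit detect rings; the `votes` mapping, the function name and both thresholds are made up for the example.

```python
from itertools import combinations


def covoting_pairs(votes, min_shared=3, min_jaccard=0.8):
    """Return account pairs whose upvote sets overlap heavily.

    votes maps username -> set of story ids that user upvoted.
    """
    suspicious = []
    for a, b in combinations(sorted(votes), 2):
        shared = votes[a] & votes[b]
        union = votes[a] | votes[b]
        if not union:
            continue  # neither account has voted on anything
        # Jaccard similarity: size of the overlap divided by size of the union.
        if len(shared) >= min_shared and len(shared) / len(union) >= min_jaccard:
            suspicious.append((a, b))
    return suspicious
```

Pairwise overlap only catches the crudest rings, and ordinary users with similar tastes can overlap too, so any hits would need a human look before acting on them.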


How do you detect these "voting rings"?



