
This is very common in high-frequency trading. There's only so much you can do to capture the number 1 spot, and the number 1 spot means x% of market share, so a perfectly legitimate strategy is to duplicate your entire hardware 5x, since now you're not 1/10th of the leading edge, you're 5/14 instead. Especially when the costs of spinning up a new instance are low, but you're gaining a lot via a small increase in share.



Day one of "writing a reasonable search engine" would be to kill sitespam, no? Any time you see N sites with the same or very similar content, you should immediately drop the rank of all but a small slice at the top. You can choose to keep the one with the most trusted incoming links (regular PageRank), but you could also just trust the oldest one in your index!
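
A rough sketch of the "keep the oldest copy" idea, assuming near-duplicates are detected by comparing word shingles with Jaccard similarity. That technique, the `Page` fields, the `first_seen` ordering and the 0.8 threshold are all my own assumptions for illustration, not how any real search engine is known to do it:

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    first_seen: int  # hypothetical: when the index first saw the page (lower = older)


def shingles(text, k=3):
    """Return the set of k-word shingles in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def keep_oldest(pages, threshold=0.8):
    """Walk pages oldest-first; keep one only if it is not a
    near-duplicate of a page that was already kept."""
    kept = []  # list of (page, shingle set) pairs
    for page in sorted(pages, key=lambda p: p.first_seen):
        sig = shingles(page.text)
        if all(jaccard(sig, other) < threshold for _, other in kept):
            kept.append((page, sig))
    return [page for page, _ in kept]
```

The pairwise comparison is quadratic, so at index scale you'd want something like MinHash or SimHash instead; this only shows the ranking rule itself.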

For example: there is now an epidemic of StackOverflow clone sites. They just post the SO answers with their own ads. But I don't want that site. So how on earth can Google show the clone sites on top of the true StackOverflow?

You'd think they have systems in place with hundreds of thousands of canary queries and "known expected rankings" such that IF one of the fraud sites manages to trick their system, they can just swiftly patch it to restore order and bury the clone sites after page 100. But no.
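
To make the canary idea concrete, here is a minimal sketch, assuming a `search` callable that returns result domains in ranked order. Both `search` and the canary table format are hypothetical stand-ins; I have no idea what Google actually runs internally:

```python
def check_canaries(canaries, search):
    """Return the canary queries whose top result is not the expected domain.

    canaries: dict mapping query -> domain expected at rank 1
    search:   hypothetical callable, query -> list of result domains in rank order
    """
    failures = []
    for query, expected in canaries.items():
        results = search(query)
        top = results[0] if results else None
        if top != expected:
            failures.append((query, expected, top))
    return failures
```

Any non-empty return value would be the signal that a clone has outranked the real site for a known query.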


Do you think it could be in Google's interest to show those clones, at least in the short term? How does the advertising game work for those sites?

It just seems unlikely that they can't do anything to fix it.


> there is now an epidemic of StackOverflow clone sites

I never saw a single StackOverflow clone site in my life.


These days I see more clone results than actual SO results for some queries.

It’s been a thing for years. See e.g. https://news.ycombinator.com/item?id=10103545

But lately (last 6 months maybe) it exploded.


All of the clone sites you linked are dead links now.


Yes, that was a five-year-old HN post, so it was the crop of SO clones from 5 years ago. They constantly change.

Example from now:

https://coderedirect.com/questions/190779/changing-the-color...

They even have the same URL format and everything.

Actual result

https://stackoverflow.com/questions/11862315/changing-the-co...

Here I took a random SO post and searched for a sentence in an answer. This result is in the top 5 results (in this case the real SO result was above it, but quite often the impostor sites rank higher).


I think it depends on location, browser config and other things. As another comment said, I generally get SO pages at the top, but these pages with the same title appear next. I thought they were different forums with different answers, but they were just SO clones.


That would explain why I cannot reproduce most of the problems here.


I just did a search for a python problem and got a stackoverflow clone on the front page. From a few different searches I found a clone with most of them. They weren't at the top, usually after 5th place. I've also seen sites that have just copied github issue posts.


Can I have the exact query and the links to the clones?


Not OP, but you can probably replicate it very well by just copying any sentence from an SO question and pasting it into Google. You will find duplicates. If not, try to find a more unique sentence in that question/answer, especially one with a weird way of speaking. In my example below, the tell was that legitimate NLP rewrites of the question didn't include "and do what ever you want", so including it found lots of clones.

They are being intelligent now and using NLP to mix up the content, but it's very much the same question or answers, or just the answers, or some variation of the two, or made to look like a forum with SO comments as forum replies, etc. Most of it is nonsensical if you try to understand it.

Example:

"What you can do is set the FormBorderStyle property to None and do what ever you want with the form using GDI" from https://stackoverflow.com/questions/11862315/changing-the-co...

Gives you:

    https://pretagteam.com/question/changing-the-color-of-the-title-bar-in-winform
    https://www.xsprogram.com/content/changing-the-color-of-the-title-bar-in-winform.html
    http://62.234.115.194/ask/111862315.html

There were a couple that I had no intention of clicking to confirm, though. And that IP one above I probably shouldn't have either.

    http://www.apes.today/post/11862315/1
    https://geek-qa.imtqy.com/questions/193507/index.html
    https://www.itdaan.com/blog/2012/08/07/b72deb841dfd594210520ce0edc22516.html
    https://stackqna.com/questions/11862315/changing-the-color-of-the-title-bar-in-winform
    https://www.extutorial.com/en/share/497449
    https://code-examples.net/en/q/b5012b
    https://csharp.developreference.com/article/24268263/Changing+the+color+of+the+title+bar+in+WinForm

That was just what I could glean from the previews, and all of it was on the first page of Google hits.

I 100% guarantee that if you whacked 95% of the above domains and permanently banned whoever registered them from the web, you'd make the web a better place.


Googling "What you can do is set the FormBorderStyle property to None and do what ever you want with the form using GDI" gives me only 5 clones on the first page and 4 of them are blocked by uBlacklist. The stackoverflow result is above the clones.

First google page when using uBlacklist:

https://social.msdn.microsoft.com/Forums/vstudio/en-US/e8f5a...

https://stackoverflow.com/questions/11862315/changing-the-co...

clone: http://62.234.115.194/ask/111862315.html

https://books.google.de/books?id=rLCy1mCqChEC&pg=PT189&lpg=P...

https://github.com/xv/xrails-login-ui

https://link.springer.com/content/pdf/bbm%3A978-1-4302-2550-...



It’s more profitable to run identical clones of the same strategy N times?


Yes, it costs you $5 to run a website, so just run tonnes of them and hope 1 wins. Or it costs $500 to run a trading system, but the difference between payoff × N/T and payoff × (N+1)/(T+1) is greater than 500. This assumes, obviously, that you have a trading system that is more or less as good as everyone else’s, where N is the number of systems you’re running and T is the number of systems in the market.
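
Spelling that arithmetic out as a small sketch. It assumes every one of the T systems is equally likely to win, and the payoff of 11000 in the usage note is a made-up number chosen to keep the fractions tidy:

```python
from fractions import Fraction


def marginal_gain(payoff, n, t):
    """Extra expected payoff from running one more instance:
    payoff * ((n + 1) / (t + 1) - n / t)."""
    return payoff * (Fraction(n + 1, t + 1) - Fraction(n, t))


def worth_adding(payoff, n, t, cost):
    """True if one more instance earns more than it costs to run."""
    return marginal_gain(payoff, n, t) > cost
```

With payoff 11000 and cost 500, going from 1 of 10 to 2 of 11 gains 900, so it pays. Going from 5 of 14 to 6 of 15 gains about 471, so it no longer does: the returns shrink as your share grows.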



