Or how GeeksForGeeks shows up higher than the pages for the official Python docs.
Or how searching for anything seems to find an auto-generated page with that phrase that outranks any useful info.
The authentic source of the content should be the first hit, not someone talking about it, or a clone, or an SEO page linking to the authentic content. That is a colossal failure.
It's actually surprising that for all Google's talents, they can't tell that site X is just cloning StackExchange or that official documentation should probably rank higher than some random site. Either they can't or they don't care.
These sites are easily detected as clones. How Google reacts is part of the adversarial game theory.
For StackExchange clones, their tactic seems to be to push them to a secondary index: hellban them, but keep them visible to their creators. That way you never start over and try again with smarter duplicate evasion. You just see your site wither to insignificance, with sometimes a temporary bump to confuse you, annoy you, or keep you uncertain about which changes helped. But sometimes this strategy can make it seem as if Google can't detect clones, especially when you search for very specific keywords only found on StackExchange: there just may not be a better 18 pages than a duplicate page with a different "related questions" section.
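To give a feel for why a clone with a swapped "related questions" section is still easy to spot, here is a minimal sketch using word shingles and Jaccard similarity. This is a textbook near-duplicate technique, not a claim about what Google actually runs; the function names, the shingle size, and the sample texts are all made up for illustration.

```python
import re

def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical page texts: the clone copies the Q&A body verbatim and only
# changes the "related questions" tail.
body = (
    "How do I reverse a list in Python without modifying the original list? "
    "You can use slicing with a step of minus one, or call the built-in "
    "reversed function and wrap the result in list."
)
original = body + " Related questions: how to sort a dictionary by value"
clone = body + " Related questions: what is the difference between append and extend"
unrelated = (
    "The quarterly report shows revenue growth across all regions, "
    "driven mainly by new subscriptions."
)

clone_score = jaccard(shingles(original), shingles(clone))
unrelated_score = jaccard(shingles(original), shingles(unrelated))
print(f"original vs clone:     {clone_score:.2f}")
print(f"original vs unrelated: {unrelated_score:.2f}")
```

The clone scores far higher than the unrelated page because only the handful of shingles touching the tail differ. At web scale you would not compare raw sets pairwise; something like MinHash or SimHash is the usual way to approximate this, but the idea is the same.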
Official documentation ranking below some random site is nearly always a temporary anomaly (or makes some sense, in the case of very verbose documentation, like W3C docs). If structural, nudge Google along with some reports. If malicious, these are the sites that Google is likely to completely nuke. All authority and investment gone. Any part of the spammer's link network contributing to artificial authority is also exposed. It makes more economic sense to pick softer targets and spend more energy on staying hidden and not overdoing it, never sure of the threshold.