1. A table of contents section
2. About 30 MB of AI-generated filler ("Why should you foo?", "How does foo affect your dog?", "Can you foo a foo?", "Why you should foo twice a day?")
3. The same content plagiarized from each other's sites, just slightly reformatted/edited
4. NO ACTUAL INFORMATION ON HOW TO BARBAZ THE FOO
It's 2022, and there's still no way to filter this garbage out?
I'm sure there is, but there is an enormous corpus of data, and any filtering you add can end up impacting the legitimate sites you're trying to get to.
SEO spam constantly adapts to changes in filtering. If you start filtering out sites that have a table of contents, SEO spammers will remove their ToC in no time, but authentic bloggers probably won't.
Real content producers don't have the time to chase every change to Google's algorithm, but SEO spammers do. How do you filter out the spammers who adapt to changes without affecting the real content producers who can't afford to?
I promise you, the problem is harder than you realize. And sure as shit a lot harder than "just add a Bayesian filter and these 3 hard-coded rules I just came up with."
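To make the hard-coded-rule problem concrete, here's a toy sketch. Everything in it is made up for illustration (the `Page` type, the `toc_filter` rule, the example URLs); it is not how any real search engine works. It just shows what happens to a "drop pages with a ToC" rule once the spammers notice it.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Page:
    url: str
    is_spam: bool   # ground truth; the filter never sees this
    has_toc: bool

def toc_filter(pages):
    """Hypothetical hard-coded rule: drop every page that has a table of contents."""
    return [p for p in pages if not p.has_toc]

def spam_adapts(pages):
    """Assumed reaction: spammers strip their ToC, legitimate authors change nothing."""
    return [replace(p, has_toc=False) if p.is_spam else p for p in pages]

pages = [
    Page("spam-1.example", True, True),
    Page("spam-2.example", True, True),
    Page("blog.example", False, True),    # real blog that happens to have a ToC
    Page("docs.example", False, False),
]

before = toc_filter(pages)               # rule looks great at first
after = toc_filter(spam_adapts(pages))   # then the spam adapts

print([p.url for p in before])
print([p.url for p in after])
```

Once the spam adapts, both spam pages pass the rule again, and the only page the rule still blocks is the legitimate blog.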
I hate it when pages DON'T have a ToC. I wanna see the headings with links so I can jump to the part of the page I care about, not scroll through hoping to find it...
Or just exclude all basic pages (e.g. recipes) that insist on having a table of contents.
I know that when I'm searching for answers and the page has a ToC, 9 times out of 10 I close it out.