>"Here's 100PiB of unlabeled neural net weights. Knock yourselves out."
You need to give the user an explanation of why you blocked their account, but if Google is kind enough to throw the secret neural network on top, some people would be happy to have a look at it and find even more garbage in it.
Which is itself detected by an 80PiB neural network, based on the 60TB of new rules that another neural network spits out every week based on the temperature outside the corner office and the taste of Sundar's coffee that morning.
The coffee roast temperature and grind are decided every day by yet another ML algorithm, as Google effectively has an unlimited army of ML researchers and infinite computing power. A rogue PhD on a cocaine binge unfortunately tuned the parameters too high once, and the results have been getting worse ever since, as a result of Sundar being increasingly disappointed by the coffee but not being able to do anything about it because "it's the algorithm."
Same at FB as far as I could tell while I was there. "The algorithm" is a misnomer, popularized by the press but really kind of silly. There are really thousands of pipelines and models developed by different people running on different slices of the data available. Some are reasonably transparent. Others, based on training, are utterly opaque to humans. Then the weights are all combined to yield what users see. And it all changes every day if not every hour. Even if it could all be explained in a useful way, that explanation would be out of date as soon as it was received.
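To make the "thousands of models blended together" point concrete, here's a toy sketch. Everything in it (the model names, the weights, the features) is invented for illustration; the real systems are obviously nothing this simple. The point it demonstrates is the transitivity of opacity mentioned below: one uninterpretable component makes the whole weighted blend uninterpretable.

```python
# Toy sketch: many independent "models" each score an item, and the final
# result is a weighted blend. All names and numbers here are hypothetical.

def combined_score(item_features, models, blend_weights):
    """Each 'model' is a callable returning a score for the item. The final
    score is a weighted sum, so one opaque model makes the blend opaque."""
    return sum(blend_weights[name] * model(item_features)
               for name, model in models.items())

models = {
    "engagement": lambda f: f["clicks"] / max(f["impressions"], 1),
    "freshness":  lambda f: 1.0 / (1 + f["age_days"]),
    "opaque_nn":  lambda f: 0.37,  # stand-in for an uninterpretable model
}
blend_weights = {"engagement": 0.5, "freshness": 0.3, "opaque_nn": 0.2}

item = {"clicks": 40, "impressions": 100, "age_days": 3}
score = combined_score(item, models, blend_weights)
```

Even in this three-line toy, "explain why this item got this score" bottoms out at "the opaque model said 0.37" for part of the answer; and in the real thing the weights and models change constantly.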
I'm not saying that to defend anyone BTW. This complexity and opacity (which is transitive in the sense that a combined result including even one opaque part itself becomes opaque) is very much the problem. What I'm saying is that it's likely impossible for the companies to comply without making fundamental changes ... which might well be the intent, but if that's the case it should be more explicit.
What needs to be shared is a high-level architecture, not nuts and bolts.
At a broad level:
What are the input sources (IP address, clicks on other websites, etc.) that feed the model?
What is the overall system optimized for? Some combination of engagement, view time, etc.? Just listing them, ideally in order of preference, is good enough.
Alternatively, what does your human management measure and monitor as the business metrics of success?
I want to know what behaviors are used (not necessarily how), and what the feed is trying to optimize for: more engagement, more view time, etc.
This is not adversarial; knowing this helps us modify our behavior to make the model work better for us.
Users already have some sense of this and work around it blindly. For example, YouTube puts heavy emphasis on recent views and searches. I (and I'm sure others) would use a signed-out session to watch content way outside my interest area so my feed isn't polluted with poor recommendations. I may have watched thousands of hours of educational content, but Google would still think that some how-to video I watched once means I need to see only that kind of content.
Sure, Google knows it's me even when I'm signed out, but they don't use that to change my feed. That's the important part, and knowing it helps me improve my user experience.
They haven't talked much detail since Matt Cutts left, but over time they did sort of outline the basics: that the core ranking is still some evolution of PageRank, weighting scored page attributes/metadata and flowing it down/through inbound links as well, but then altered via various waves of ML, like Vince (authority/brand power), Panda (content quality), Penguin (inbound link quality), and many others that targeted other attributes (page layout, ad placement, etc.).
Even if some of that is off, the premise of a chain of some ML, and some not ML, processors means they probably can't really tell you exactly why anything ranks where it does.
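A minimal sketch of that "chain of processors" shape, under the assumptions above: a classic iterative PageRank pass produces base scores, and then a sequence of later "waves" adjusts them. The graph, the damping factor usage, and especially the adjustment functions are all made up for illustration; the named waves (Panda etc.) are only loosely imitated as multipliers.

```python
# Sketch: base PageRank scores, then a chain of post-processing "waves"
# (stand-ins for things like Panda/Penguin). All numbers are invented.

def pagerank(links, damping=0.85, iters=50):
    """links: {page: [outbound pages]}. Classic iterative PageRank."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:
                continue
            share = damping * rank[p] / len(outs)
            for q in outs:
                new[q] += share
        rank = new
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
base = pagerank(links)

# Hypothetical adjustment waves, each multiplying the running score:
waves = [
    lambda p, s: s * (0.5 if p == "b" else 1.0),  # e.g. a quality demotion
    lambda p, s: s * (1.2 if p == "c" else 1.0),  # e.g. an authority boost
]
final = dict(base)
for wave in waves:
    final = {p: wave(p, s) for p, s in final.items()}
```

Even here, "why does page c outrank page b?" has no single answer: it's the base link graph times every wave that touched it, which is the commenter's point about explainability.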
It's clear the public and lawmakers like the idea of knowing how the algorithm works, but what you posted is about as deep as people can reasonably understand at a high level. I don't think they realize how complex a system built over 20 years that's a trillion-dollar company's raison d'être can be.
Those sound like awesome potential features. Allow users to assign 0-100% weights to each of those scoring adjustments during search, and show them the calcs (if you can).
Supposedly there are thousands of different features that are scored, and those are just the rolled-up categories that needed their own separate ML pipeline step.
Like, maybe, for example, a feature is "this site has a favicon.ico that is unique and not used elsewhere" (page quality). Or "this page has ads, but they are below the fold" (page layout). Or "this site has > X amount of inbound links from a hand curated list of 'legitimate branded sites'" (page/site authority).
Google then picks starting weights for all these things and has human reviewers score the quality of the results, the order of ranking, etc., based on a Google-written how-to-score document. Then it tweaks the weights, re-runs the ML pipeline, and has the humans score again, in some iterative loop until the results seem good.
There's a never-acted-on FTC report[1] that describes how they used this system to rank their competition (comparison shopping sites) lower in the search results.
Edit: Note that a lot of detail is missing here. Like topic relevance, where a site may rank well for some niche category it specializes in. But that it wouldn't necessarily rank well for a completely different topic, even with good content, since it has no established signals it should.
I doubt it; they should know what the various algorithms are, especially the most important ones that drive most of the ranking. But their competitive advantage would be on the line.
Google manually adjusts its results for censorship reasons. This is probably why google has gotten so much worse, they don't want information to be freely accessible, they only want things they approve of to be seen.
I reckon you're right, but I doubt that it's manual or under Google's control. Google is too important a tool of control to be left in the hands of Silly Valley idealists.
I've always wondered why Sergey Brin and Larry Page retired when they did; it coincides almost exactly with the beginning of the SERP quality decline. I wonder what sort of conversation they had with intelligence agencies to quietly walk out the door, cash out, and say nothing about the company since.
What happened was they got what they wanted: full control of running the business. Then they quickly learned that was actually a lot of work and not very much fun, made some fairly unpopular decisions (business, product and policy) with a fair amount of public backlash, put Sundar in charge and backed away.