Google rewrites many page titles (zyppy.com)
547 points by cyrusshepard on Jan 26, 2022 | 250 comments


> Many site owners find that the titles they carefully craft almost all get rewritten.

Yeah, I'm with Google on this one. I don't see many reasons why a site owner would spend extraordinary amounts of time to "carefully craft" page titles other than SEO and optimizing for clickbaitiness. As a user, I'm fine with Google counteracting this.


I think that is the worst reason for them to rewrite titles. If they left the title as-is, then I would be able to see in the search results that it was a spammy site and ignore it. Instead Google is helping to launder their SEO and present it as a more legitimate site. If Google thinks a site is gaming their algorithms they should de-prioritize it, not rewrite it.


I think you've got this wrong - they should heavily editorialize the titles.

Honest titles of search results:

* Five pages of flowery text and images before two lines of instruction on how to boil rice

* A bunch of tantalizing pictures of exactly what you're looking for but zero further information about it

* Product reviews machine-generated from public review sources with affiliate links. Top-rated product has the best affiliate revenue.

* You won't care about this solution to a problem you don't have.

etc...


Ah, rice. The quintessential Asian grain, now consumed by billions around the world. When I was a child, my mixed-race family used to eat rice every day! Even today, the subtle aroma of rice wafting up from the kitchen brings a sense of nostalgia. It's a sure sign that dinner is approaching, my favorite meal of the day...

[5 pages later]

1. Put rice and water in rice cooker

2. Press "start"


Isn't Google one major reason that recipe sites act like this? They've long favoured sites with a lot of textual content (which authors then break up with images) and penalised sites that people tend to bounce out of quickly. A long story fits that pattern because most people need to read down for the content, rather than get their instant answer and immediately retreat.

I find it annoying too, but it often feels like people ridicule the authors when they wouldn't get any traffic if it weren't for that approach. I don't think I've ever searched for a recipe and come across a barebones Gantt-chart-style engineer-thinking recipe plan.


It's not Google per se but a practical impossibility: search has to rank somehow, hopefully the way a human who knew the answer would. They could theoretically hire humans to do the ranking, but the dataset is far too vast, so they use software. Software is still far from human-level reasoning, so they use metrics. Metrics can and will be discovered and gamed, regardless of what kind they are.


As an AdSense user, if you don't use the maximum allowable number of ads (regardless of content length), Google literally emails you to suggest you add more ads. Their documentation warns you to maintain a reasonable ratio of ads to content at risk of being shut down, which pushes out page length. They push for unique content (so writers differentiate with personal stories), they measure time on page (longer details, pictures), etc.


> I don't think I've ever searched for a recipe and come across a barebones Gantt-chart-style engineer-thinking recipe plan.

https://clovegarden.com/recipes/index.html


Sorry, I might not have been clear enough. I know they exist. I'm saying that I've never searched for a recipe for something and a leading result has been in that sort of format. Google has created the environment in which the maligned 'epic story and photo album finished by actual recipe' formula wins through, yet the recipe creators get the ridicule.


> Google has created the environment in which the maligned 'epic story and photo album finished by actual recipe' formula wins through, yet the recipe creators get the ridicule.

They still deserve it, IMO. Willingly making a clown, a pawn of an ad-spamming corporation, out of oneself by doing one's darnedest to "win through" at some perverse game rigged by the aforementioned scourge of the Internet, is neither a natural human right nor a divine command. Not playing that game is still a valid move, and AFAICS the only honourable -- i.e. the only non-ridicule-worthy -- one.


So if you're super-keen on food and trying to establish a career as a recipe creator or food photographer, it's dishonourable to put in a lot of effort custom-writing supporting material and taking quality photos of the dish you're pitching to people? Sounds like these things traumatise you! :)

I'd agree that misleading made-for-AdSense sites that purport to, but then don't, answer a question, farm out writing to $5/page content squads and intersperse stock photos - that's shoddy. But all the recipe sites I find in my searches and have to scroll down for the content, they always seem like genuine, personal efforts. If I'm getting the content for free, scrolling a little bit as a price isn't too laborious and a stretch to think of it as dishonourable work IMO.


> So if you're super-keen on food and trying to establish a career

IOW, you're trying to earn money. Sure, go ahead -- but then you get to pay the price. If the method you're trying to earn money by is going to involve playing along in the game of clickbait, then the price you get to pay is going to be, to be seen as a purveyor of clickbait. Which I, and I suspect quite a few others with me, see as distinctly less than honourable.

It's a free choice: Nobody is forcing anybody to "establish a career as a recipe creator or food photographer" on the ad-financed Internet. If they choose to play the clickbait clown/scum game, they're making themselves into -- so, in the end, are -- clickbait clown/scum. I sure didn't tell them to do that, so I'm perfectly free to see them as such for doing it.

They, OTOH, are perfectly free to try it some other way: publishing printed cookbooks instead of Internet clickbait, or something adjacent, like running cooking classes or starting a restaurant or catering business... Or doing something else altogether.

They could always go into the deeply honourable (/s) business of software engineering, which nowadays seems to consist to about 45% of running ad-spam networks, to about 45% of writing SEO crap to get your ads onto those networks, and about 10% other development... :-( What, me cynic? Bah, geroffmylawn!


I have dozens of cookbooks. Almost every single one is absolutely packed with personal details about the chef and background information on the recipe (who taught them the recipe, their beloved Nanna's method, the history of nut x in remote tribal desert y, etc).

I've been to cooking schools in multiple countries. All have gone into detail about the background of the chef and each recipe.

Same with restaurants. Many restaurants and certainly almost every fine-dining restaurant pushes the profile of the head chef.


I can't help that you paid God knows how much extra for this unnecessary fluff. If I were to get any of those, I'd look around for the least extraneous-fluff-y offer I could find. :-)

More seriously: At least the classes and restaurants already push that stuff in their marketing, don't they? So I get all that already while doing my comparison shopping, and therefore would probably actually (at least to some extent) resent the time wasted on repeating it. And the few cookbooks I (or we, my wife and I) have are also of the matter-of-fact, recipes-and-nothing-more kind... I am probably just much less of a "foodie" than you. I think my preference pattern is that of the overwhelming majority.

Note that Clovegarden has "the history of nut x in remote tribal desert y, etc" too -- but on pages separate from the recipes. (As I recall Mr Grygus started the site in preparation for starting a business of selling foodie stuff online after winding up his computing and automation consultancy business -- but that still seems to linger on, and he is nearing [or, probably, well past?] normal retiring age, so I don't know if that new business will ever materialise. But as long as he is up to updating Clovegarden every now and then it remains my favourite site for food-related stuff.)

[Edit: Ttypo.]


Woah now, shouldn't step 1 be broken up into 2 steps? Each with their own heading and a paragraph explaining how to do it?


What kind of rice? Do you rinse the rice first? How much rice?

How much water? Do you salt the water?


Reminds me of Plain Old Recipe, a website that strips the fluff from big recipe websites. You give it a link to a recipe, and it renders just the recipe, straight to the point. I thought the site had closed, but it's apparently still live!

https://plainoldrecipe.com/ https://news.ycombinator.com/item?id=23648864 (Thank you HN :))
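Tools like this typically work by reading the machine-readable recipe data that many sites already embed for search engines (schema.org Recipe JSON-LD). A minimal sketch of that idea, not Plain Old Recipe's actual implementation; the sample page and field handling are simplified assumptions:

```python
# Hedged sketch: many recipe pages carry a schema.org Recipe object as
# JSON-LD so search engines can skip the prose; a de-fluffer can read
# the same block and ignore the five pages of story.
import json
import re

SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@type": "Recipe",
 "name": "Plain Rice",
 "recipeIngredient": ["1 cup rice", "2 cups water"],
 "recipeInstructions": [
   {"@type": "HowToStep", "text": "Put rice and water in rice cooker"},
   {"@type": "HowToStep", "text": "Press start"}]}
</script>
</head><body>Five pages of flowery text about childhood rice...</body></html>
"""

def extract_recipe(html: str) -> dict:
    """Pull the first JSON-LD Recipe object out of a page, ignoring the prose."""
    for block in re.findall(
        r'<script type="application/ld\+json">(.*?)</script>', html, re.S
    ):
        data = json.loads(block)
        if data.get("@type") == "Recipe":
            return {
                "name": data["name"],
                "ingredients": data["recipeIngredient"],
                "steps": [step["text"] for step in data["recipeInstructions"]],
            }
    raise ValueError("no Recipe JSON-LD found")

recipe = extract_recipe(SAMPLE_PAGE)
```

Real pages complicate this (lists of JSON-LD objects, `@graph` wrappers, instructions as plain strings), but the structured block is why such extractors can exist at all.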


and "rice cooker" is an affiliate link


Let’s also not forget the 55 auto-playing video ads that I need to vault over to get to Step 1. Each one determined to hijack my mouse as I scroll/hurry past and cause a click! It’s like the world’s least fun platformer game.


You forgot the part where there's a pseudo-recipe after the story that catches your eye but doesn't have any measured amounts, and then the actual recipe later.


I got instantly annoyed by the first few words of this comment, thinking you’d gone off on some tangent about rice… until I saw the last part. Well played!


This sounds like it would make a very entertaining Chrome extension.


Would also be nice if they edited things to actually be true, e.g.

* e-bike with 10 miles of actual range even though they advertise 30 miles

* laptop with 2 hours of battery life at 100% CPU usage even though they advertise 10 hours

* median $450 flight even though they advertise it as $199


> laptop with 2 hours of battery life at 100% CPU usage

Is there any laptop on the market that lives up to this? Even top-specced MBPs I've gotten from work fall down when you actually use the CPU with compilers and VMs.


My basic M1 MBP with 16GB seems to last almost 2 hours when hammering the CPU. I haven't actually timed it, but I find it astonishing compared to the Dell mess I had to deal with before.


Oh just an example. Hammer it at 100% CPU usage and report battery life based on that.

Or a (min,max) based on idle and 100% CPU.


You're never going to guarantee some kind of range on an e-bike. What's the temperature of the battery? Is it mostly uphill or down hill? How much are you going to brake?

And advertising laptop battery life based on the CPU getting pegged to 100% gives meaningless information, as it's rare for people to actually run their device at 100% load anyway.


> You're never going to guarantee some kind of range on an e-bike. What's the temperature of the battery? Is it mostly uphill or down hill? How much are you going to brake?

Yeah but testing the e-bike on a track and telling the public it has 30 miles of range based on that is disingenuous.

Instead, go to a city with an average amount of hills, stop lights, and cold weather, give it a go, and tell that number to the public. If the bike beats that number in their actual city, they'll only be pleasantly surprised. Right now you strand a shitton of people because they think they have 30 miles.


That depends on Google being both honest and accurate. Perhaps they have been so far, but my concern would be that a re-written title would cause quality content to get passed over by many viewers as undesirable/irrelevant because some algorithm misunderstood/misinterpreted what it was looking at, or because google wanted to subtly discourage people from content that competes or disagrees with whatever Google is attempting to promote.

In a better world, algorithms would be perfect, there would be a lot of healthy competition in search engines, and Google would be incentivized to provide users with the best possible results. In our current world, Google's algorithm can't identify obvious spam well enough to keep it out of their results, and there are no major search engines that haven't been lifting results from Google, directly or indirectly, and repackaging them as their own. So Google has no pressure to do anything but promote whatever is in their own best interests, rather than keep their results accurate and free of spam.


Imagine if your CLI tools did this.


"Gaming their algorithm" sounds like a fancy way of saying SEO. If Google can produce for me a more accurate (or concise) title, it should only help me find what I'm looking for.

Forcing folks to trudge through inaccurate titles – or hoping people know the tells of a "spammy site" title – does not seem a better alternative.


> "Gaming their algorithm" sounds like a fancy way of saying SEO

It's quite the opposite, "Search Engine Optimization" is the fancy euphemism for gaming the algorithm.


My favorite is when the title sounds like what you’re looking for only to discover it’s a page full of ads and keywords. The original title doesn’t even match.

That causes me to lose faith in google not a better experience.


If that actually happens, I'm surprised the article doesn't cover it. I've never experienced that.


I’ve found it most on the 2nd or 3rd page when googling specific but not common error messages.


I think what HN and the SWE community at large have missed about Google over the last 10 years is that the product is being built for the masses. Most people would prefer you just rewrote the title to what the page actually is, rather than having to take on the cognitive load of understanding what SEO even is.


AMEN to that


+1


> I don't see many reasons why a site owner would spend extraordinary amounts of time to "carefully craft" page titles

Because I want the title to be concise, but still help people explicitly understand what my writing is about? Because I've already spent a lot of time on the content, to then just slap 'Lou's Wednesday Website Update' as a title? Because, historically, a title is an introduction to my writing?

Any of those.


Regarding one of these examples:

  How to Fix a Broken iPhone Screen [Tested by Experts] - Phone Fixer

->

  How to Fix a Broken iPhone Screen - Phone Fixer

[Tested by Experts] is obviously clickbait; nobody's going to say [Tested by novices].

Same for things like [Updated 2022] - there are tons of websites that superimpose [updated <currentyear>] even if the article content wasn't updated.
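The kind of cleanup being described here could plausibly be a simple heuristic. A crude sketch (my guess at one such rule, not Google's actual logic):

```python
# Hedged sketch (not Google's actual rewriting logic): strip bracketed
# boilerplate like "[Tested by Experts]" or "[Updated 2022]" from a
# <title>, then tidy up the leftover whitespace.
import re

def clean_title(title: str) -> str:
    # Drop any bracketed segment, including the space before it
    title = re.sub(r"\s*\[[^\]]*\]", "", title)
    # Collapse any doubled-up whitespace the removal left behind
    return re.sub(r"\s{2,}", " ", title).strip()

cleaned = clean_title(
    "How to Fix a Broken iPhone Screen [Tested by Experts] - Phone Fixer"
)
# -> "How to Fix a Broken iPhone Screen - Phone Fixer"
```

Of course, a rule this blunt would also strip brackets that carry real information, which is exactly the trade-off the thread is arguing about.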


If Google believes the site is being disingenuous by writing a click bait headline, then they should punish the site by decreasing their ranking, not reward it by keeping it high and rewriting a more fitting headline.


But if the title is spam, and the content is good (this is a big 'if'), the best solution would be to rewrite the title so that it's useful and keep the page at its original rank, based on the content. Ideally, Google would be able to handle all these different cases and just give me the best search results. Now, we all know that's increasingly less true, but in theory that's how it should work.


But “for 2022” is a guarantee that the content is bad if it hasn’t changed in 2022.

And yet, I don’t see how Google can automate checking this. It’s possible to add a couple of sentences about how you’ve not seen anything to change your mind about last year’s recommendations. That may well be true. Or false. How can Google know? It just sees content that has changed. So it has been updated in 2022.

The bigger issue is brand trust (as a reviewer brand). The NYT bought Wirecutter, I think, because it had established itself as a trustworthy brand. That’s in direct line with the reputation the NYT wants to have as a whole.


I hate how true your second paragraph is. Google should punish sites that change the date without updating the content, but all the SEO spam is just going to automate changing content when it changes the date. And then what does Google do? Figure out how to make an AI that can understand all the indexed content and accurately determine if it's truthful?

That seems fundamentally impossible without defining trusted sources. But then that means that you're trusting that Google's trusted sources are good. And if you do think they're good, then why not just check those sources directly?

The only answer I have is to find your own sources that you trust and go to them first.


> But if the title is spam, and the content is good

Then the title would not need to be spam for the page to rank highly.

Not if Google just cared about content quality.

So in this scenario, where only quality counts for rankings, all a spammy title shows is the desire to bypass legitimate rankings.

Thus, it should be downranked.

Again, this is if Google legitimately wanted to rank good content high.


I'm not convinced, in general I don't like this additional layer of "fiddling around" with the original contents.

What about the opposite, the title being great but the contents not really? Shall Google serve its own "improved"/"summarized"/whatever version?

Meh... - this reminds me of the snippets of text extracted by some websites that are sometimes shown directly in Google's results, which in my case were sometimes wrong because they didn't take into account the context of what was written in the original contents.


It should do both.


Wouldn't it be better for the users to penalize the site's ranking instead of hiding the fact that the result is your usual clickbait drivel? Rewriting the titles just hides that the results Google found are low-quality garbage.


Maybe Google does both?


> Tested by Experts is obviously clickbait

If we're going to start filtering all "obvious clickbait" then the search results are going to change fairly dramatically...


> If we're going to start filtering all "obvious clickbait" then the search results are going to change fairly dramatically

Isn’t this the intended effect?


> Isn’t this the intended effect?

I hate clickbait as much as the next user, but using that technique to get users to click appears to have even become part of the core business model of previously prestigious outlets.

Picking on the WaPo for no real reason:

How the Washington Post pulled off the hardest trick in journalism https://www.cjr.org/public_editor/washington-post-fluff-news...

An Open Letter to the Washington Post: Please Stop Doing Clickbait https://thedailybanter.com/2016/05/letter-to-the-washington-...


As a subscriber to several newspapers, it's always interesting to see how different the headlines are between the dead tree editions, and the online versions — even for the same story.

The dead tree headlines are almost always very factual and to the point. I don't think I've ever seen anything close to something like "Here's four awesome tricks to get China to admit to the Tiananmen Square massacre" as a headline in actual print.


The easiest fix for clickbait would be to penalize them for it.


More importantly, if the content is actually relevant to the user's search, does it matter whether the title is clickbait or not?

Clickbait pisses me off when it's used to waste my time, but a good search engine wouldn't give me results that waste my time.

In other words, it could give me a relevant result with a clickbait title. I guess that'd be a little annoying, but I don't know if I'd want Google to be the judge of what's clickbait or not, and even then I don't feel it's their place to override titles. I wouldn't want useful pages downranked just for having a poor title.


A poor title reduces the quality of the resource, though. I think it’s reasonable that there is some penalty imposed for poor titles, and that could include clickbait. If the result is the best one for the search, sure, surface it. But if it’s not clear, though, “clickbait title” is a signal that the result is not the best.

I do agree it’s not really Google’s place to be rewriting titles, though. That feels very suspect.


> A poor title reduces the quality of the resource, though

Is there an objective way to assess quality?

A click-bait title on a page full of ads and text that keep the visitor's attention but don't deliver on the title... ?

Then having held the visitor on your site for a minute or two, but managing to leave them unsatisfied, how about ending the page with a big fat block of even more visual click-bait content at the bottom (Taboola, I'm looking at you).

Don't advertisers and publishers love this stuff? Great metrics.


It would be a great feature if they tracked the date when the content actually changed... significantly. I guess that could still be gamed.
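One naive way to approximate "changed significantly" is to compare old and new body text and only count the page as updated when the similarity drops enough. A sketch with an arbitrary threshold (my illustration, not anything Google is known to do):

```python
# Hedged sketch: treat a page as "significantly changed" only when the
# edit replaced a meaningful fraction of the text. The 0.9 threshold is
# an arbitrary assumption for illustration.
from difflib import SequenceMatcher

def significantly_changed(old: str, new: str, threshold: float = 0.9) -> bool:
    """True when less than `threshold` of the old text survives the edit."""
    return SequenceMatcher(None, old, new).ratio() < threshold

# A date-only tweak keeps the similarity ratio near 1.0, so it would
# not count as a real update:
old = "Our 2021 picks: widget A, widget B, widget C."
new = "Our 2022 picks: widget A, widget B, widget C."
```

And as the parent says, even this is gameable: shuffle a few paragraphs programmatically and the ratio drops below any threshold you pick.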


Not obviously. If true, adds credibility.

"Phone Fixer" sounds more scammy to me, lol


> Tested by Experts is obviously clickbait; nobody's going to say [Tested by novices].

Nobody would write [Tested by novices] into their headline, but leaving out the bracketed part would leave open whether it was tested by experts or novices. So in this case the removed bit does provide some information.


>>> Because I want the title to be concise, but still help people explicitly understand what my writing is about?

And yet from TFA:

>> In fact, we found that matching your H1 to your title typically dropped the degree of rewriting across the board, often dramatically.

Users don't look much at titles; they end up in the browser tab or somewhere like that. If a title doesn't match the H1 heading, it's often to cram in more stuff for SEO. OTOH, short titles might be useful when they show up in a tab with limited space. Maybe Google shouldn't lengthen them for that reason.


Can't say I agree.

Google should be a neutral middle man providing the results as they are found. If they feel the title is not of their version of quality they should rank it lower.

I'd prefer the version of title of several hundred million individuals rather than Google's aggregated version.

They used to 'borrow' DMOZ titles before DMOZ became defunct. At least in that case it's another point of view on top of their own (and the site author)


Google can't be a neutral middleman because everybody is trying to manipulate the search results. If everybody is clickbaiting their page titles, and Google just displays them as is, it makes their product worse.


The solution is not to re-title, the solution is to de-rank clickbait.


Well, nowadays a lot of well-known websites use clickbait regularly, e.g., the WSJ and NYTimes. Many times they are unwilling to summarize the news in the title even when the news itself is not that complicated.


I'm sure they'd change to better headlines to avoid getting downranked.


That's assuming there aren't click bait false-positives based on page title.


You step away from neutral as soon as you introduce "version of quality". There will always be an introduction of bias and judgement calls that need to be made to get useful results, especially because bad actors on the web are part of the geography that aren't going away. Just like the press trying to force a neutral "view from nowhere" leads to confused and problematic journalism that can be exploited by bad actors.

https://pressthink.org/2010/11/the-view-from-nowhere-questio...


Indeed: quality, bias, judgement. I wouldn't argue about that wrt 'stepping away from neutral'. I just meant that if a decision is to be made, either de-value the page or show it in the top results; either way, don't tinker with the information as it was laid out.


I agree in theory for SEO mills... but it can apparently go a bit overboard!

Concrete personal example:

- Title shown by Google: "Policymaking Beyond Corporate CEOs and Partisan Pressure"

- Original title: "Towards Platform Democracy: Policymaking Beyond Corporate CEOs and Partisan Pressure"

Rather large difference!

More details in another comment: https://news.ycombinator.com/item?id=30087485 , but search term is just "platform democracy" (2nd result)


For the same reason they spend extraordinary amounts of time to "carefully craft" the content of the page? And the images, and the citations, and the links, etc. For the sake of quality.


I think I see where you're coming from, but come to a different conclusion.

If you are, rightly, disappointed about low quality results in SERPs, then why not direct your frustration at Google's search algorithm? But ultimately once the algorithm has decided what to return, I don't want any of it to be tampered with. Maybe there's an argument that once you're using a black box, it might as well be the best black box it can be, but I don't agree.

I wonder whether there is a case for legal action here. Google would not have wasted time developing this rewrite engine unless it had an effect on clicks. Whether that effect is positive or negative, only they truly know. What if it were found to be applied to the results of their competitors, but not to Google's own sites, for example?


Google isn't doing this for the user. They are doing it so ads are more clickable than organic search results; they want people clicking on ads. I can guarantee they won't rewrite the clickbait ads written by marketers who are paying for space. The result is that ads are more likely to be clicked.

100% of the above the fold content is now ads on many search terms, Google is doing everything they can to squeeze more ad clicks, not provide the best information to their users


"How to growth hack your old website after reaching market saturation"


Some titles of the past before they were optimized for clickbaitiness:

Omelas, bye-bye (The Ones Who Walk Away from Omelas)

Things are looking up (Great Expectations)

A crying cop (Flow, my tears, the policeman said)

The one that got away (The Old Man and The Sea)

on edit: I expect someone will point out those are the names of works of literary fiction, not webpages. But if we assume that webpages do not deserve the kind of respect we would give a creative work in book form, and that their titles may be changed whenever it suits our needs, then we should not spend all our time complaining that the content of the web is just lousy stuff that nobody would care about if an algorithm changed it.


> As a user, I'm fine with Google counteracting this.

The problem there is that "optimizing for clickbaitness" means "making the titles as appealing to click on as possible when they're displayed in search pages". Google deliberately making them less appealing to click on means Google are reducing the effectiveness of organic search results, and that favors adverts instead.

In other words, what you are saying is that you believe it's valid for Google to rewrite website content to make search page adverts more appealing than the actual search results.

That is very hard to justify. If Google wanted to 'punish' sites for being too clickbaity then they should drop that site's position in the search rankings. Ranking it highly but rewriting the title to be something worse (or 'less clickbaity') is a massive abuse of their search market position to favor their ad business.


Especially when the article ends with:

> Want to optimize your titles for increased traffic?

> We built a title optimizer to take advantage of the outsized role titles play in SEO. Free to try.

Definitely SEO gaming.


If this were being done by a person, I might agree with you.

But it's not. It's being done by an algorithm which was carefully crafted to improve someone's chance of getting a promotion. It won't be maintained long term, yet it will continue to punish articles based on wholly arbitrary, biased, and opaque logic.


> If this were being done by a person, I might agree with you. But it's not. It's being done by an algorithm which was carefully crafted [by a person] to improve someone's chance of getting a promotion.

I made a little change there. Algorithms don't just magically appear like leprechauns and unicorns.


Google search is one area of Google in which this big-company problem actually doesn't happen that much. Changes in the algorithm are never implemented by fiat: Google employs raters and performs blind experiments to test whether a change to search actually improves user satisfaction before rolling it out to everyone. So at least they must have some data that it increases user satisfaction, both on the metrics they measure and in subjective rater satisfaction.


I don't get this, why are you ok with bots changing your content, even if it's to be displayed on Google SERP?

Why stop at the titles?

I have an idea: let's have bots rewrite the content in a compact tl;dr format and have it displayed directly on the Google SERP. As a user, the fewer actions I take the better, right? You don't even need to leave the SERP.

Why can't I just choose what title I want in my blog to be indexed, and if Google wants to penalize it, so be it?


> let's have bots rewrite the content in a compact tl;dr format and have it be directly displayed on Google SERP, as user, the less actions I take the better right?

Fuck yeah! I’d pay monthly for a search engine that does this consistently. Google already does this for the articles that are easy to parse, but I’d love to see what newer methods based on language models can do.

Btw this article is talking about the <title> tag which is mostly used for SEO since users don’t see it on the page. I don’t think search engines have ever cared about it all that much.


And every site has different motivations.

How many times do we bicker about titles that make no sense / are deceptive on HN...

The whole situation is a mess.


"I'm fine with Google counteracting this."

The ministry of truth. Google shall own all truth.


They should de-rank clickbait websites, as many of them qualify as webspam.


> As a user, I'm fine with Google counteracting this.

Would you be fine with Google changing the work of all authors? Maybe "The Brothers Karamazov" doesn't get enough clicks and Google decides it needs a better title. Or "A Portrait of the Artist as a Young Man" doesn't quite convey what Google thinks it should...

How is that different?


To be fair, The Karamazov Brothers is arguably a more natural English translation.


It's perfectly cromulent English.


Should Google adjust it then?



While the Zyppy article is interesting in that it has statistics about the title rewriting, the Google guide on writing proper titles is more relevant to all of us who maintain websites affected by this. Thanks for linking it.

The ideal article would be something like "Google rewrites your titles in search results because your titles suck."

The Google guide does well to explain why some titles are rewritten, such as having duplicate titles across multiple pages, making it impossible to differentiate between pages that show up in the same set of search results.


In other words, Google's policy is that the search result isn't showing the page title, it's showing Google's short description of the page. If Google thinks your page title is an adequate description it might use that, otherwise it will write its own.

(edit: and Google has enough self-importance to advise you to write your title as if it were a short description, to make their job easier)


TIL that indexing and crawling are different, and robots.txt prevents Google from crawling but not indexing.
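Concretely, the two mechanisms pull in opposite directions (paths here are hypothetical): robots.txt controls fetching, while a noindex directive controls indexing, and noindex only works if the crawler is allowed to fetch the page and see it.

```
# robots.txt: stops Google from FETCHING these pages, but a disallowed
# URL can still appear in the index (e.g. via links from other sites):
User-agent: *
Disallow: /private/

# To keep a page OUT of the index, the page itself must be crawlable
# and carry a noindex directive, e.g. in its HTML <head>:
#   <meta name="robots" content="noindex">
```

So blocking a page in robots.txt while also adding noindex to it is self-defeating: Google never fetches the page, so it never sees the noindex.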


> Takeaway: to dramatically decrease the chance of Google rewriting your title, matching the H1 to the title tag seems to be an effective strategy.

Of course, it should be mentioned this won't last if it becomes popular. Historically, every time an SEO trick gets popular, the rules are adjusted. Even having this article on the front page of HN might be enough to see Google react by rethinking how (or whether) tags in titles affect the title rewrites.
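For anyone wanting to check their own pages against that takeaway, a small stdlib-only sketch comparing a page's `<title>` to its first `<h1>` (the sample page is made up; a real audit would fetch live pages and normalise more carefully):

```python
# Hedged sketch: compare a page's <title> tag to its first <h1> using
# only the stdlib HTMLParser, per the "match your H1 to your title"
# takeaway quoted above.
from html.parser import HTMLParser

class TitleH1Parser(HTMLParser):
    def __init__(self):
        super().__init__()
        self._current = None  # tag whose text we are currently collecting
        self.title = ""
        self.h1 = ""

    def handle_starttag(self, tag, attrs):
        # Collect the <title>, and only the FIRST <h1> on the page
        if tag == "title" or (tag == "h1" and not self.h1):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1 += data

def title_matches_h1(html: str) -> bool:
    parser = TitleH1Parser()
    parser.feed(html)
    return parser.title.strip().casefold() == parser.h1.strip().casefold()

page = ("<html><head><title>Platform Democracy</title></head>"
        "<body><h1>Platform Democracy</h1></body></html>")
# title_matches_h1(page) -> True
```

An exact case-insensitive match is the strictest reading of the takeaway; the article's data suggests even close matches reduce rewriting.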


I wonder if Google is going to try out "AI"-generated titles that are directly summarized from the page content by machine, treating the page title and headings as inputs to the model.


Next step, an AI to regenerate the contents according to what the AI thinks I should have said. </s>


Problem solved WRT copyright issues relating to news articles. If the AI derived content (a la GitHub copilot) is deemed as original "unlicensed" content, no reason to force users to visit the website. (it's been a while since the news media and Google had their legal battles, and I'm unsure what the end resolution was then)


Artificial intelligences (or at least their corporate puppet masters) are fighting for copyright-law protection on insights the AI derives from reading copyrighted pages and content on the internet.

a robots.txt can keep you safe
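Worth noting (as a commenter above points out) that robots.txt only blocks crawling, not indexing; a page can still land in the index from links alone. A minimal sketch of both directives — the `Disallow` path is just a placeholder:

```
# robots.txt — tells crawlers not to fetch these paths
User-agent: *
Disallow: /private/
```

To keep an already-crawlable page out of the index, the page itself has to serve `<meta name="robots" content="noindex">` (or an `X-Robots-Tag` response header), which the crawler must be allowed to fetch in order to see.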


I would be surprised if that's not the case already!


One thing I don't see getting discussed in the pros and cons is the simple fact that you can't even tell which titles have been rewritten. Google gives no information in the search results to tell you what is original and what they've rewritten. This matches other trends, like how it's become ever harder to discern sponsored ads from organic search results.

I used to love Google for how it presented relevant results and made it easy to discern sponsored ads. Today, I avoid Google products like the plague. (I can't escape all of them, but I'm about 90% off.)


I had this problem recently, was hoping there was a reasonable fix but it appears not... (the H1 already contains the title)

I don't think about SEO, and just focus on useful writing / societal impact. However, I recently discovered by accident that I ended up with a top 2 search result for "platform democracy": https://google.com/search?q=platform+democracy .

But the title is missing the first 3 words—including the key words "Platform Democracy" — so that if I was a random person aiming to learn about the concept, I would likely skip over the result! (I almost did even though I wrote the piece!) This seems not ideal for either users or Google, and also an interesting exploration of AI/NLP impacts, so I tried to dig a bit deeper.

I had a brief exchange with Danny Sullivan, Google's public @searchliaison on it on Twitter (https://twitter.com/metaviv/status/1484636387366289413) which linked to two guides from Google on this. Sadly neither were particularly helpful, but will share them here in case they are helpful to others:

- https://developers.google.com/search/docs/advanced/appearanc...

- https://developers.google.com/search/blog/2021/09/more-info-...

(Also plausibly relevant: I have http://platformdemocracy.com/ redirect to the piece. I imagine this might impact search ranking, but I would be surprised if it impacts the title rewriting.)


Author here. Frustrating situation. As the title is long at 84 characters, we know that Google is definitely going to rewrite it. The simplest fix is to break it into parts and cut it down to the shortest version that still makes sense.

So maybe take

'Towards Platform Democracy: Policymaking Beyond Corporate CEOs and Partisan Pressure'

And 1) condense it and 2) lose the colon

'Platform Democracy is Policymaking Beyond CEOs & Partisanship' (60 characters)

If that is too condensed, you could try a short title in the <title> and a longer title in the copy.
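The trimming heuristic described above is easy to automate. A rough sketch — the 60-character budget is this thread's rule of thumb, not a documented Google limit:

```python
def shorten_title(title: str, limit: int = 60) -> str:
    """Trim a long title: first try dropping everything after a colon
    (as suggested above), then fall back to cutting at the last word
    boundary that fits within the limit."""
    if len(title) <= limit:
        return title
    head = title.split(":")[0].strip()
    if head and len(head) <= limit:
        return head
    return title[:limit].rsplit(" ", 1)[0]
```

Applied to the title under discussion, this would keep just the pre-colon part, "Towards Platform Democracy" — losing the subtitle, which is why condensing by hand (as above) usually beats mechanical truncation.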


Your Google result is actually "Towards Platform Democracy: Policymaking Beyond Corporate CEOs and Partisan Pressure"


Zyppy's content marketing efforts aside, this wouldn't be so much of an issue if Google was any good at it

But as with its meta description rewrites, they're often worse than what was there to start with, and in some cases completely change the meaning, to the detriment of searcher experience


Thinly veiled content marketing for Zyppy, complete with CTA at the bottom, and mentions of themselves throughout, including: "Fortunately, here at Zyppy, we have a large database of titles thanks to our title tag analysis tool. Armed with this data, we set out to determine how often Google rewrites titles and the scenarios which trigger this behavior."

Furthermore, "HTLM" instead of "HTML"? Needs proofreading. Lol.


Pieces from Cloudflare and even Google themselves are posted here all the time, and have CTAs at the bottom.


And you bet I flag them.


Your point about proofreading seems fair.

Pretty much any company producing blog content is engaging in content marketing though. I’m not sure I understand the criticism. Perhaps this particular piece was overly self-promotional?

Sure, there’s a balance to be struck, but I thought the article had some decent takeaways.


Are you implying that what they're saying isn't important or invalid? Or, what is your point?


I'm saying that we should expect better from HN, and the people who frequent it. Otherwise, it's just a poor alternative to Reddit. If I wanted content marketing that nobody could even be arsed to have proofread then I'd go elsewhere. The ever-decreasing standard continues.


Quoting the last paragraph of the guidelines, which has a bunch of supporting hyperlinks:

> Please don't post comments saying that HN is turning into Reddit. It's a semi-noob illusion, as old as the hills.

I dislike that they’re an SEO company, but I don’t object to self-serving posts, as long as they’re also curious and interesting to me. Seeing a comprehensive list of how Google rewrites page titles is interesting to me, because I’m fascinated with headline writing.

I’m sorry that you find self-serving content and calls-to-action to be problematic, but I would warn you to expect more of them on HN over time (as neither are violations of the site guidelines). There’s no need to claim it’s going to turn us into Worse Than Reddit. It’s been this way for HN’s entire life, or at least the chunk I’ve been present for. HN’s just the same as it always was. I respect your outrage but you chose an invalid expression of it.


This isn't outrage. This is my opinion. I'm outraged by many things, some bullshit on HackerNews and your opinion of my opinion does not bother me. Have a nice day.


> Otherwise, it's just a poor alternative to Reddit.

There are plenty of great subreddits. Every time I hear reddit used as a put-down it feels very snobbish.


Sure. Likewise there's far more terribly moderated ones than good ones. OK, "the average badly moderated sub-Reddit". My opinion remains.


"SEO" is pure marketing whose sole purpose is to sell you a worthless product. The article says nothing you will ever need to know and it, and all other "SEO" "content", should be blocked as spam


It's quite funny to me that people are running these kinds of almost-scientific experiments on a fully human-generated and, in principle, knowable system. The reasons are understandable, of course, but it does seem like a waste of human energy.


You see this a lot in gaming too. There are entire sites devoted to figuring out what various weapon attachments actually do in Call of Duty. If you poke around the Minecraft wiki you'll find the same thing - people working out exactly how fast you can move with different potion effects or how scaling works when leveling up enchantments.

Theoretically, all of this information could be found in the source code, but without it gamers are left to an endless research project.


I'm not convinced that having the source code is necessarily a perfect shortcut to accurate results. Video games, in particular, seem to be subject to a decent amount of emergent behaviors such that scientifically measuring things is honestly probably a better option than trying to read the source code to find out what the developer thinks should happen.


At least in the case of gaming, I think (some) people actually enjoy this aspect. It's a waste in a lot of systems, but in an "art", I think it can elevate the experience, at least for certain games and genres.

An interesting inverse of the norm is the Roguelike ADOM. Most similar games from the same time period like Angband, Nethack, and DCSS were open source, while ADOM was a free, but proprietary game. The other games' secrets were open-book, with no real secrets to speak of as the source code is scoured by players. ADOM remains sort of interesting to me as there are red herrings in even the machine code to throw off reverse engineering, and genuine secrets that open source games simply can't have. I've always appreciated that you can't simply look at the source to know everything, anyway.


You are certainly right, there's a certain appeal to the mystery!

I remember reading an article on the Minecraft wiki about how to achieve the slowest possible movement, which is of course a totally useless thing to do in game, but you could see someone had put a ton of thought into working out how to do it! And who's to say that your slow machine is less an expression of artistry than playing the game "right" and building castles!


Minecraft at least is effectively open source; however, many of the quantities being measured are indirect consequences of the physics engine which would be difficult to derive from the constants in the code.


Agreed. Then you realize that the entire SEO industry is basically based around the idea that a company has an algorithm only they know, and the industry is trying to reverse-engineer that algorithm. If they released a whitepaper describing exactly how it works, the entire industry would have to change its ways to consulting already-public information instead of experiments like this.


If they released a white paper explaining how it works, the search results would have even more spam than they already do.


Interestingly, they are likely to find things that the developers themselves don't yet know.

These systems are large and complicated and time is finite. When it comes to analysis of a written system, there's a lot more time free-floating in the global network of users than there is in the group of a dozen, maybe a hundred, developers who wrote the engine (many of which have immediately been re-tasked to write something else).



Great systems view; that's the general basis of cooperation vs. competition: we keep some things secret, stimulating other people to expend energy and think creatively instead of doing it for them. It becomes wasteful when the energy required to produce new information and techniques is impossible to obtain, e.g. in massive inequality: a homeless person just can't gain the skills to obtain a job, or an oppressed population can't overcome the excitation energy needed to free themselves. It's also the reason we outlawed monopoly in the U.S., only to reach the local minimum of duopoly.


OP is an SEO company. Wasting human energy is what they do.


Haven’t we been doing similar thing with for example stock market analysis when we analyze a company’s earnings/management etc?


it's like calling adversarial ML a waste of energy. we use this approach for the problems where we want to preserve a lot of variety in solutions


This is tangential, but it could similarly be used to describe many support teams which are staffed by non-technical folks, or are cut off from engineering for cultural, political, etc. reasons. It's a complete waste of energy, but for various reasons people get put in these situations and experiment instead of talking to an authority in another department, or getting an expert on their team. It can be sustained for a surprisingly long amount of time as well before someone gets called out on inaccuracies.


Similarly, there's also "research" being done to decipher and understand Apple's hardware and software. It does seem like a waste of human brain cycles.


Ironically, they failed to organize the world's information and make it universally accessible and useful.


The opaqueness of human systems is a real issue. It basically describes 99% of issues in the workplace, and those are systems in the small.


The waste is the point. One might as well wonder why I keep my password secret and force hackers to break it.


> but it does seem like a waste of human energy

Legal processes are enormous wastes of human energy on what are usually negative-sum games.

If only humans could cooperate.


> Legal processes are enormous wastes of human energy on what are usually negative-sum games

I don’t know if this is true. Private legal disputes can be purely antagonistic.

They can also be a form of adaptation to new information: adversarial in the short term, adaptive in the long term. Court cases, on the other hand, produce precedent (irrespective of the legal system). That, too, helps guide a society through novel circumstances.


Google has done this for practically as long as I can remember. If you remember when dmoz was still a thing, Google would favour the title from that, rather than the site's actual title because it perceived it as more useful to the user as it was moderated. By now I would expect that Google has used this and real moderators to train their machine learning model to rewrite titles, perhaps as a way to, you know, hopefully make the product more useful.


Speaking of rewriting titles... I noticed that HN rewrote a self post title a few days ago. [0]

Why is HN editing self post titles?

[0] https://news.ycombinator.com/item?id=30053890


Mods frequently rewrite submitted titles, either cos it's not the same title used in the article, or because there's a better wording for the HN crowd ¯\_(ツ)_/¯

Check dang's (HN mod) comments:

https://news.ycombinator.com/threads?id=dang


This title edit was not for an article, which is what confused me.

A user submitted a self post, basically a rant, which got popular, and the title was edited hours later.


It's totally ordinary for a post to go front-page and attract hundreds of comments on the basis of a title that's later deemed too interesting/editorialized/whatever, and you later see some dry uninformative title like "Google account security" or whatever it was occupying a top slot.

I can't say I've noticed it often before for text posts, but I do generally think this pattern of closing the barn door after the horse has bolted is pretty silly in general.


It's just wrong to change the title of a self post imho.

Without making it publicly known at least... you're changing what the poster intended to say.

Editing sensationalised headlines back to sanity makes perfect sense though.


This article has been really badly proofread (or probably not at all)

- "HTLM"

- "includ"

(unmatched parentheses


https://ghostarchive.org/archive/54mNm Archived this a few days ago since I knew they'd fix the typos quickly.


(An unmatched left parenthesis creates an unresolved tension that will stay with you all day.

https://xkcd.com/859/


\(\\\\

Escaped left parenthesis and two backslashes, or a cross section of a phalanx?


)


(⁽₍⟮⦅⸨﴾﹙(⦅


POV: you're about to learn the most esoteric LISP yet


A LISP-like language that only used various left-brackets sounds even worse than the whitespace-based programming language.


Common Lisp isn't that esoteric!

    (defun fun-reader (stream arg)
      (declare (ignore arg))
      (read-delimited-list #\⸨ stream t))
    
    (set-macro-character #\⸨ (get-macro-character #\) nil))
    (set-macro-character #\( #'fun-reader)
    
    (defun square (x⸨
      (* x x⸨⸨
    
    (square 10⸨
    ; => 100


"Hello World" sample in some esoteric language?! :-)


These characteristics all seem like Google attempting to combat SEO-oriented spam titles.


...by making the titles more enticing for users to click on. Doesn't seem like that great of a solution. I would rather they went the other way by prepending "[possibly spam]" or "[possibly clickbait]" to the title.

There's an arrogance to this whole process that amuses me. I can picture the dev team responsible for this code sitting in a meeting with the lead dev saying "it appears that some idiot web authors are using titles that have extra information placed in brackets! Idiots. Well, let's brainstorm possible solutions to this problem so we can protect our idiot users from this obvious menace..."

Leaving them alone and giving us back pagination of search results would solve this problem for me. Or they could demote these sites that they think need their titles rewritten in their search algorithm.


The issue isn't that Google search results exclude stuff that was in the page title.

The issue is that Google search results insert stuff into the page title that wasn't there.

So the issue isn't that the overly long + pipe

    <title>Which programming language is fastest? | Computer Language Benchmarks Game</title>
is abbreviated. The issue is that the domain of the hosting service is inserted, which gives the misleading impression that this is a project in-some-way approved and promoted by the Debian organisation:

    "Which programming language is fastest? - Debian"
:when it would be better just to snip:

    "Which programming language is fastest?"


HN rewrites page titles too.


Google 20+ years ago also used the meta description tag from a page instead of page-text snippets. We're many decades past blindly accepting page-author-provided content as being the most useful thing to display. People keep thinking of Google as a search engine that greps pages to find matching text. That is old/obsolete thinking; any Google-like service has evolved toward directly returning the information/answer you seek instead of returning a page that may contain the information/answer you seek.


What I find fascinating is I’ve seen a small, but increasing, subset of results where:

- the result title is clearly not original, usually derived from content on the page

- the original title is known to be generated

- the original generated title is as close to harmless as any web content could be

- the result title is actively harmful and misleading

- the original title is demonstrably better

- this idiosyncrasy is applied to very high trust hosts (eg GitHub)

- it’s not applied to the same content from obvious scraped content/spam/scam sites with obvious tells


Good. Google's interests more closely align with my own than page authors'. I'm glad to have Google as an agent working for me to make page titles more useful.


Surely the misconception is the belief that what Google displays is the page title. Google displays a link to a page, with a short description of what you will find there. Likewise, when I link to a page from my page, I don't use the title of the page: I use some text that I chose. This is a non-story, as far as "rewriting titles" goes. What is interesting is that Google has an automated way to briefly summarize a page.


Every day my desire to be able to rate sites' relevance after a search increases. And I'd love to be able to choose whether the original or the Google-generated title was the most relevant. (C'mon, there is some machine-learning training potential in that.)

Rather that than ditching Google search completely, which is getting closer every day.


On modern browsers, the page title is almost completely obscured. It's not a thing users generally see, and in the few views that expose page titles outside of developer tools, the title is more often than not cut short.

I don’t see why google has to use the page title as a headline for a link result.


Something has to be fishy with this, because I get tons of "Untitled" results now which lead directly to spam. This sucks big time because I usually got really good results, since I search for a lot of coding-related things, and now I cannot use this account for searching anymore.


https://web.archive.org/web/20220126145329/https://zyppy.com...

Since people are reporting failure to load.


This data is based on what's seen in the wild, right? So if they see text in brackets removed more often than text in parentheses, that could reflect what sort of text people tend to put in brackets vs parentheses rather than (or in addition to) how google treats those characters.


I’m no search engine expert. Is this standard practice at some level across other search engines? Is “retitler” just part of every search engine stack (e.g. DDG, Bing, BraveSearch, etc)?

Or is this unique to the “I’m Feeling Lucky” folks?

Honestly curious.


Google literally turns the internet into a garbage dump. There are so many spam news sites that can come to the fore thanks to their seo nonsense that the sites that provide real news are not even seen lol


I hate those dropdown things that say "How to change gamma values in gimp" and they lead to a YouTube tutorial.

Please stop serving me YouTube tutorials; they all suck.


Ok... owning a couple dozen sites, all submitted and fully indexed, I have never seen even a single URL rewritten. Does this really happen?


Google claimed that the original <title> is used "more than 80% of the time" when announcing the change[0].

Combining this rate with the rate seen by the article (rewritten 61% of the time, on the subset of 81,000 URLs they were interested in), I'd guess that some websites see a lot of rewrites and many other websites see none at all.

[0] https://developers.google.com/search/blog/2021/08/update-to-...


So an interesting distinction here is required! When Google says they use the title 80% of the time, they mean they use the title 80% of the time to create their search result title, which they may or may not modify. The other 20% of the time they use an H1 or other elements on the page.


Yes, happens 61% of the time.


Unless the website is public domain or licensed to freely remix, isn't Google violating copyright law by creating a derivative work?


Hackernews rewrites many post titles


Including this one, ironically.


Oh I missed the memo where we added Google to the list of things Hacker News loves to hate on.


Has anyone else started seeing results with titles as “Untitled”?


tldr: Google sometimes uses headings instead of titles. Match them to prevent title rewrite; stop using long, verbose titles


From a pure HTTP perspective, isn't the point of page titles to be how a page is referenced? It would be an error if a library reported the title of "A Tale of Two Cities" as "It Was The Best of Times".

> stop using long, verbose titles

This is good advice, but if Google wants to penalize bad titles it should dock their rank, not misreport them.


> isn't the point of page titles to be how a page is referenced

It is, but what would you do if all titles across pages just said "ACME Corp."? That happens often if the developer just displays SITE_NAME in the title.

In those cases it makes sense to present the person searching the web with additional information from an H1 tag, which probably has more information, like "Contact us"


No thanks, as a Google user, I’m happy that Google is descriptive.

Ideally, Google tells me what the page actually contains. I.e. if you title the page “Top TVs of 2022” and you’re reviewing cars, then it titles it appropriately. Google can’t do that right now, but every step closer is a good thing for me.


There's lots of "isn't the point of..." in HTML that actual users have broken. Google (and other crawlers and intermediaries) have to adapt their algorithms to account for that.


> Google (and other crawlers and intermediaries) have to adapt their algorithms to account for that.

As I see it, Google's in a prime position to algorithmically reward actual users for better HTML discipline by ranking them above users who can't be bothered.


The Search Console is great at knowing if your site could use some improvements. They could easily[a] add a mark for “bad titles”.

[a]: “easily” because they already have logic to determine something needs rewriting


HTTP actually has nothing to do with page titles. I think web browsers should probably display the titles verbatim, but there may be use cases where they don't, a common one being where there isn't enough space so the title is truncated in the UI.

As for what search engines should do with page titles, it's really up to the individual search engine, I'd say. Whatever serves their users best.


As a search engine developer I totally get why. HTML in the wild is not well behaved in the slightest. People use title and heading tags in all manner of weird ways. I've seen <title>-tags in the <body>-tag used as headings. I've seen documents where every line was a <h1>-tag.

You kinda need to make the most of what you're given.
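To make the "weird HTML" problem concrete, here's a hedged sketch (illustrative only, not any real engine's code) of the kind of fallback an indexer needs: prefer `<title>` wherever it appears — even misplaced in the body — fall back to the first `<h1>`, and tolerate missing close tags, using only Python's stdlib parser:

```python
from html.parser import HTMLParser

class TitleGuesser(HTMLParser):
    """Collect the first <title> and first <h1> text, tolerating
    tags in odd places and skipped close tags."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.h1 = ""
        self._open = []  # stack of currently open tags

    def handle_starttag(self, tag, attrs):
        self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            # pop back to the matching tag; sloppy HTML may skip closes
            while self._open and self._open.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if "title" in self._open and not self.title:
            self.title = text
        elif "h1" in self._open and not self.h1:
            self.h1 = text

def best_label(html: str) -> str:
    """Pick the best available label for a possibly messy document."""
    p = TitleGuesser()
    p.feed(html)
    return p.title or p.h1 or "Untitled"
```

Even this toy handles a `<title>` stranded inside `<body>`; real pipelines then have to decide whether the recovered text is worth showing at all, which is where the rewriting starts.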


HTML5 has been around for long enough that we should be able to punish sites that use completely bonkers markup at this point, right? Since Google effectively has historical archives of the internet they could pretty trivially grandfather in legitimately old content (things they tracked before some date) and just start down-ranking sites that continue to misbehave with markup but skate by with browsers running in compatibility mode. Something like abusing <h1> tags is legal, if obnoxious, HTML and so it shouldn't really fall under this... but it's been long enough that we can start punishing completely incorrect syntax, right?


That would be a massive loss, though. A lot of content isn't in HTML5, and a lot of that pre-HTML5 content is precious and valuable.

Google has sadly already tossed a lot of that by the wayside, since it often isn't served with HTTPS. I think something like 80% of the sites my crawler is aware of serve pages over plain HTTP.

In general, attempts at shaping the web through search engine indexing requirements seems to mostly serve to filter out content made by humans and select for search engine marketing.


Not so sure older content (like the stuff I wrote in the late 90s to mid 00s) would be negatively impacted, so long as search providers pay careful attention to the <!DOCTYPE> tag (or lack thereof). I wouldn't characterize holding people to at least a bare minimum of standards (e.g., title in the head and nowhere else, which has been the rule since at least HTML 2.0 in 1994) as "punishment", any more than dinging them for unclosed parens and other typos. Language is how we communicate understanding, and markup is how we frame presentations on the web (mostly). People need to be prepared for the consequences of making it up as they go along rather than educating themselves on the standard (whether spelling, grammar or markup language).


That really doesn't seem to be what I'm seeing, having built a search engine specialized in this type of content and finding almost nothing but gems in the refuse.

If anything, it seems like the single best predictor of whether a website is a content mill is strict adherence to modern web standards and other "google rules".


I think it'd be a pretty good idea to let in historical stuff on grace - and just start penalizing new content. Google absolutely has the tools to do this the right way and the internet archive could allow most other folks to accomplish the same thing.

Enabling HTTPS is easy on most platforms. Folks that have rolled their own platform or got unlucky and are using a CMS that fell out of favor do tend to get screwed over by this - but I think it's fair to de-prioritize content that fails to adhere to good practices. The HTTP vs HTTPS debate in particular can be a real security concern - with tags it's more about paying down the tech debt in our browser technology.


I really wish browsers would stop shrugging their shoulders at bad markup and display blank pages with errors in the consoles or even visible in the rendered page. It would force devs to clean up their act. But as long as 1 browser vendor doesn't do it, the end users will all just assume the strict browser is broken since there is another browser that does "work".


On the website of the company I work for, the title is "tagline | company name" but in the search results it shows up as "company name: tagline". That style doesn't appear on the company website anywhere.

I imagine it's Google trying to normalize how things are shown but it's quite annoying. It could potentially break some company's branding.


Shrug. The almost-religious belief in the necessity for ultra-consistent branding within some companies is nearly comical so long as you're on the outside.


Sadly agreed. Definitely not comical when I have multiple times had our marketing department blame/throw fits at the dev team for the site not showing up in Google's search results exactly how they want it to.


To be fair to the devs, that's an education gap. The response should be "You want us to develop a solution to a third party's whims? Maybe you should try writing them a nice letter about how their representation of our company affects our image; it'll have as much impact. Possibly more."

In real corporations, of course, that's not how it works because the tech people are "wizards" and Google is "part of the wizard stuff," but this isn't a technical problem (and maybe marketing needs to stop trying to control another company; that's no more likely to succeed than Coke yelling at Amazon that they don't always put Coke products at the top of every search result).


It's relatively minor for most businesses, but sometimes it isn't. Inconsistent messaging makes it a lot easier for someone to set up a phishing attack against your customers. My bank uses several different URLs, email sending addresses, and taglines for its services. It's not always easy to tell if an email is actually from the bank.

Google adding more permutations into the mix doesn't help.


Google changing the way title tags are formatted on their SERP is not the reason that your bank's customers are falling for phishing attacks.


Of course not. I didn't say it is. It's a whole bunch of things. Google changing things is one very small factor.

But it is a factor...


Related, but I dislike when I'm bookmarking a page and the title is one word - the name of the product or the company. It makes it hard to search for it later.


Specifically, make your title and H1 match exactly and aim for a character length of around 51-60.


Not everything needs a length of 51-60 characters. Instead of "Home" use "This is the starting page for this website" ;)


Site refuses to load for me, hug of death?


Oh, now I understand.


Fascinating. Such a great study!


I totally get this. Back in the day when I was a kid, we went to the local library and read about the world. When the librarians weren’t serving me by “checking out books” to me, they were busily putting new and improved titles on the books in receiving.

/s

Seriously. Google is starting to feel less like the librarian of the net (we index the world) and more like the Truman show: we craft your reality.


It’s the ads. The way Brin and Page phrased it in their 1998 paper, they considered ad-oriented search engines to be lower quality. They were going to be more academic. They thought that there was lots of user data to mine in search…for academic purposes. Then innovation #2 at the actual startup was the ad auctions and that was the beginning of the end, all the way back at the beginning.

I’ve recently read a lot about hedge funds, and it’s astounding how many scientists literally say, “I don’t think hedge funds add value to society, I wouldn’t work there.” And then the firm slides this check across the table, and they didn’t even realize a single check could have that many zeroes, and they join the firm and stay forever. That’s what happened with Google and all the rest.


Agreed. The industrialization of ad tech has been a loss for humanity. It’s a runaway mechanization at this point.

What I don’t understand, is why we don’t tax it. If an industry generates lots of wealth, but has a questionable impact on society, the “f(r)ee market” west’s response has usually been to throw a stiff vice tax on it. It doesn’t make the vice go away, but it puts a governor on its excess and redirects some of the spoils for projects which hopefully are net positive.


Doesn't the tax usually come when the consensus about the societal harm carries more weight than the money produced by that harm?

Or at least enough weight to be competitive. Sin taxes have a way of permanently tying the sin (at some reduced level) to the general budget.

I don't think we're there yet. People can get plenty mad at "tech" without connecting the ad-tech dots.


Well, you should try to establish the societal cost of the negative externality and then tax at that level. The idea isn't to destroy the thing but to make its price reflect its actual cost.

Edit: "then cost" => "then tax"



It's the exact same thing with almost every "technology" company out there today.

We're sinking our best and brightest (and also plenty of perfectly useful and adequate) talent into getting people to look here, buy something they don't need, or press button.

It's comical to contrast that with the same people who pretend climate change is an existential crisis. Meanwhile, so many scientists and engineers idealistically interested in that leave for software-related subjects where they'll make 10X the money making the problem worse.


One obvious solution is to pay them more.

If you're not the principal investigator, an NIH grant will pay someone with a PhD + 7 years of experience...$65,292, with pretty weak guarantees on job security (etc).


"Then innovation #2 at the actual startup was the ad auctions..."

I don't think Google quite invented those, GoTo/Overture invented ad auctions and pay-per-click, but missed out on patenting them. Google did improve on the idea, with the second-price auctions.

https://slate.com/business/2013/10/googles-big-break-how-bil...
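For anyone unfamiliar with the mechanism, the second-price idea mentioned above is easy to sketch. Here's a toy single-slot version in Python (real ad auctions run a generalized variant with quality scores, so this is only illustrative):

```python
# Toy single-slot second-price (Vickrey) auction:
# the highest bidder wins but pays the runner-up's bid,
# which removes the incentive to shade your bid below your true value.
def second_price_auction(bids):
    """bids: dict mapping bidder -> bid amount. Returns (winner, price)."""
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # winner pays the second-highest bid
    return winner, price

winner, price = second_price_auction({"a": 2.50, "b": 1.75, "c": 0.90})
print(winner, price)  # a pays 1.75, not their own 2.50
```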


> we craft your reality

As mentioned above. It's also the AI.

Ads are not the fundamental problem. The fundamental problem is tracking. More on that here and about search: https://www.mojeek.com/support/ads/


Ads are a fundamental problem. They skew the incentives.

The search engine could, for example, give semi-poor results, making the person search again, increasing ad impressions.

An ad-supported search engine would also prioritize pages with ads that are also conveniently sold by the search engine.

As a user, I want a search engine to give me the best page with the fewest searches. An ad-supported search engine wants me to view more ads and click on them. Those are, if not orthogonal, often in conflict.


>An ad-supported search engine wants me to view more ads and click on them

Is this really the case? Assuming a pay-per-click model and rational, competent advertisers: more clicks would increase their costs and reduce the value generated per click. The advertisers would limit their maximum cost per click. This would limit the revenue of the ad-supported search engine to the previous level (from before introducing bad search results).

It is possible that more clicks (generated by tricks and bad search results) produce more revenue for the advertisers. This would (slightly) benefit both the advertisers and the search engine.

In the end, and in the long run, the incentives of ad-supported search engines are aligned with their customers (advertisers) if the above assumptions are met.


Hedge funds are totally over. What are you talking about?

https://www.investopedia.com/managing-wealth/hedge-fund-over...


So you are talking about a different source than the one you linked?

Because this is the summary of the article.

"Is the hedge fund over? It's difficult to say."


The idea that ads affect Google's search ranking just isn't true. There are purposeful barriers between ads and search at Google to prevent this, such that the ads team can't even file bugs with search.


> the ads team can't even file bugs with search

I don't think it would happen at the low level of engineers filing bugs. It happens at the highest levels of management, where the main concern is corporate profits.

Even if there's no explicit cooperation or algorithmic link between the ads and search divisions, everyone on the search management team knows that search is a huge, expensive operation that makes no money on its own. Advertising is what pays for their salaries, bonuses, operating expenses, etc., and you can bet that they make their executive decisions accordingly.


There’s a grain of truth here, in a bit of a tangent: librarians classify all books using a system like Dewey Decimal or Library of Congress Classification.

While not adjusting titles, librarians do have some influence on how a book is classified and thus filed/organised within the library. Check out the wiki article on Dewey[1] for the various options for homosexuality, which has been assigned numbers under areas including mental illness! Depending on the library system's leanings you may still find it there, or in the section for sexual disorders, or hopefully in the sexual relations area. (Disclaimer: I just used this as an easy example because it's on Wikipedia)

1. https://en.m.wikipedia.org/wiki/Dewey_Decimal_Classification


The Dewey Decimal classification system is ridiculously flawed, and no self-respecting library uses it these days (unless it always has, and hasn't got around to re-organising). Even my school's little one-room library didn't, something I found annoying at first, but came to appreciate.


Disagree that it’s ridiculously flawed. It has issues like any system, but it still works well the majority of the time.

> no self-respecting library uses it these days (unless it always has, and hasn't got around to re-organising).

The vast majority of library systems have been around long enough that Dewey (or LCC) was the de facto choice. Just checked a few, like the British Library and the French National Library, plus all the other libraries I've looked up in London: all Dewey.


"Libraries in the United States generally use either the Library of Congress Classification System (LC) or the Dewey Decimal Classification System to organize their books. Most academic libraries use LC, and most public libraries and K-12 school libraries use Dewey." [1]

[1] https://www.usg.edu/galileo/skills/unit03/libraries03_04.pht...


What are some alternative systems? I'd expect that any categorization system for content needs to make subjective choices.


Library of Congress is the standard for academic and professional institutions, at least in the US.


Those numbers were added as a consequence of the books that needed to be classified in the 1930s, and now that there are books that don't belong in the category there are new numbers.


It still comes down to librarian interpretation. Sometimes they will just defer to another source, like the national library of their country, or the publishers recommendations, but at least in my experience working part time in a library many years ago, the librarians out back doing the processing and cataloguing would refer to Dewey index guides and also make judgements based on the nature of the book (eg mostly practical vs theoretical nature would be the difference between a 6xx filing and somewhere completely different).


> Check out… the various options for homosexuality, which has numbers for it including under areas including mental illness!

Classifying a new technology is another major area where the original taxonomies need to be extended in order to successfully index material. The internet, for example, didn't exist when LC/Dewey were originally defined.


Last time I was in a public library books about the Internet were next to books about UFOs!


Yes and no. Anecdotally, most of the SEOs and people who do SEO "part time" (e.g., ecomm store owners) still don't understand the foundation of modern SEO.

1. Google doesn't care about the sites. The sites aren't Google's customer.

2. More importantly, the person doing the search is the customer.

Unfortunately, most sites believe SEO is about them. They can improve how they present themselves but the "transaction" is not about them.

Google, serving ads aside, needs to maximize customer satisfaction or run the risk of losing a customer.

It's worth repeating: Google doesn't care about the sites.

If Google believes a site's content is a good fit for maximizing customer satisfaction, but the title isn't optimal then it makes perfect sense Google would want to optimize the title, if the title is the "gateway" to a happy customer.

Whether that's right or wrong, IDK. Whether it actually helps, again IDK. But from a pure relationship / business perspective it makes sense.


1. Most SEOs don't care about what Google cares about. The sites are their customers.

2. SEO is about the sites and Google and other sources are just that: means to an end.

It's all a matter of perspective.


No actually it's not. You're (gravely) mistaken.

The customer is the person doing the search. The sites aren't viewing the ads. The sites aren't clicking on the ads. *That* is Google's #1 source of revenue. Full stop.

And thus, as originally stated and supported, too many practitioners of SEO, with that wrong lens, continue to misunderstand Google.

Put another way, Google doesn't change the title for the benefit of the sites. There's simply no biz model / source of revenue to support that idea. None.


Given how site owners habitually attempt to distort reality with tag stuffing and other bullshit metadata, what do you expect? Reality is not what is printed on the tin.


Should I ask Walmart to kindly start relabeling products on their shelves because what’s on the tin is rarely as good for me as what the maker purports?

Maybe that’s what we need. An FDA metadata label for every website served, kinda like the fav-icon, but useful.

- Readable word count (protein)

- Ad count (fats)

- Image count (carbs)

- Embedded script size (the list of nasty sounding chemicals it contains)

- Average data transmitted (sugars)

- etc

Must be shown in black text on white background with a black border. Sorry dark mode guys.
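Half-joking, but the label itself would be trivial to serve. A minimal sketch in Python of what a site might publish alongside its favicon (every field name here is hypothetical, invented for illustration):

```python
import json

# Hypothetical "FDA-style" metadata label for a web page.
# All field names are made up; no standard like this actually exists.
def page_label(word_count, ad_count, image_count, script_bytes, transfer_bytes):
    """Return a JSON label summarizing a page's 'nutritional' content."""
    return json.dumps({
        "readable_words": word_count,             # protein
        "ads": ad_count,                          # fats
        "images": image_count,                    # carbs
        "script_bytes": script_bytes,             # nasty-sounding chemicals
        "avg_bytes_transferred": transfer_bytes,  # sugars
    }, indent=2)

print(page_label(850, 12, 30, 1_400_000, 5_200_000))
```

A crawler (or browser extension) could compute these from the rendered page instead of trusting the site, which sidesteps the tag-stuffing problem the parent comment complains about.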


This might actually be the killer app for AR. Reviews of products as you look at them on the shelf.


Rather than showing the actual reviews, just lower the color saturation for lower-reviewed products. So highly reviewed products would pop in a sea of grayscale items.

Sounds like something out of Black Mirror, but could be interesting.


I can’t wait for “This product is awesome 5/5 btw I don’t own it” and “My favorite 1/5” in AR


Vivino kinda does that but for wine only. You can scan any bottle with it and it shows you its rating based on user reviews.


Sounds like it might either decrease sales, or increase the manufacturing of fraudulent or shill reviews.


I'd expect Google to downrank sites that are trying to manipulate the system. Not rewrite them.


How could that work out? Low quality sites usually have more juicy ad spots?

More seriously: incentives are stacked against search quality these days. Poor results means more trips into ad laden wastelands, and more returns to the ad laden search results page.

Giving people the result up front and center would directly affect quarterly profit I am afraid.

At least this is the model that makes most sense to me.

The next most probable explanation is that machine learning is already out of control and the people who created it have left.

Edit: wild speculation of course.


>...they were busily putting new and improved titles on the books in receiving.

The book titles are unchanged (when you visit the site) - this is just the Librarians adding synonyms and/or simplifying titles in their catalog so that it is "Dr. Strangelove" rather than "Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb".


That's kind of the fundamental insight here though, isn't it? Carnegie developed his libraries as a philanthropic endeavor, with the aim of supporting meritocracy in society. Google developed their library with the aim of making a shit-ton of money from advertising.

Google never was a suitable candidate for the world's authoritative librarian. Unfortunately, we'll probably need another Carnegie to displace them.


>and more like the Truman show: we craft your reality.

It should be more like an assistant: here are some boring tasks/questions; find out everything about them, summarize it for me, and present me my options. I don't really want to search or find something. I want to get things done, questions answered.


> Seriously. Google is starting to feel less like the librarian of the net (we index the world) and more like the Truman show: we craft your reality.

"Feels"? This has been the reality for many years.


Advertising corrupts. Ad-tech corrupts absolutely.

The reason is the cumulative impact of two things:

1. Algorithmic optimization of results for ad click-through rates, and

2. The scarcity of space for organic results on results pages for queries with commercial intent (because of the large amount of space given to ads). The high value of the clicks on those pages (sometimes $100+) drives marketers to focus disproportionate resources on SEO tricks and gaming to show in one of the few spaces left on the front page.

A search engine with ads and ad-tech tracking cannot work well for consumers in the long term. Google is now an ad-tech company, not a search company. It employs 3x as many people in advertising as in search.

[Edit for clarity] It makes sense in this context to programmatically re-write titles to optimize conversion, rather than consumer experience.


Why was the title edited here on HN? The original title is much better, and had more information. A bit ironic given the subject matter.

Dang, was this your doing? If so, can we please have an open discussion on this? It's happened a few times and it's annoying and seemingly randomly enforced. The guidelines state not to editorialize headers but this rule gets ignored a lot. What was deficient about the original title?


My guess is because it was self-submitted, it was held to a higher standard. Fair enough.

And in some universes the title could be considered click bait, although it is accurate in this case.

Moderation is a tough job. You never win.

That said, the revised HN title seems like it was written by bad AI. The point seemed to be to drive it off the homepage. In that, the HN title succeeded.

Regardless, I'm happy the article generated a lot of interesting discussion before manually being deemed unfit.

It is curious how at the same time the title changed, all the top comments (which were generally supportive) got pushed to the bottom. And now, including yours. Assuming this article touched a nerve at the same time someone was having a bad day.



