> Many site owners find that the titles they carefully craft almost all get rewritten.
Yeah, I'm with Google on this one. I don't see many reasons why a site owner would spend extraordinary amounts of time to "carefully craft" page titles other than SEO and optimizing for clickbaitiness. As a user, I'm fine with Google counteracting this.
I think that is the worst reason for them to rewrite titles. If they left the title as-is, then I would be able to see in the search results that it was a spammy site and ignore it. Instead Google is helping to launder their SEO and present it as a more legitimate site. If Google thinks a site is gaming their algorithms they should de-prioritize it, not rewrite it.
Ah, rice. The quintessential Asian grain, now consumed by billions around the world. When I was a child, my mixed-race family used to eat rice every day! Even today, the subtle aroma of rice wafting up from the kitchen brings a sense of nostalgia. It's a sure sign that dinner is approaching, my favorite meal of the day...
Isn't Google one major reason that recipe sites act like this? They've long favoured sites with a lot of textual content (which authors then break up with images) and penalised sites that people tend to back out of quickly. A long story fits that pattern, because the majority of people need to read down for the content rather than get their instant answer and immediately retreat.
I find it annoying too, but it often feels like people ridicule the authors when they wouldn't get any traffic if it weren't for that approach. I don't think I've ever searched for a recipe and come across a barebones Gantt-chart-style engineer-thinking recipe plan.
It's not Google per se but a practical impossibility: it has to rank somehow, hopefully as a human knowing the answer would. They could theoretically hire humans to do it but they won't because it's obviously impossible due to how vast the dataset is, so they use software. Software is still far from human level reasoning so they use metrics. Metrics can and will be discovered and gamed, regardless of what kind they are.
As an AdSense user, if you don't use the maximum allowable number of ads (regardless of content length), Google literally emails you to suggest you add more ads. Their documentation encourages you to maintain a reasonable ratio of ads to content at risk of being shut down, which pushes out page length. They push for unique content (so writers differentiate with personal stories), they measure time on page (longer details, pictures), etc.
Sorry, I might not have been clear enough. I know they exist. I'm saying that I've never searched for a recipe for something and a leading result has been in that sort of format. Google has created the environment in which the maligned 'epic story and photo album finished by actual recipe' formula wins through, yet the recipe creators get the ridicule.
> Google has created the environment in which the maligned 'epic story and photo album finished by actual recipe' formula wins through, yet the recipe creators get the ridicule.
They still deserve it, IMO. Willingly making a clown, a pawn of an ad-spamming corporation, out of oneself by doing one's darnedest to "win through" at some perverse game rigged by the aforementioned scourge of the Internet, is neither a natural human right nor a divine command. Not playing that game is still a valid move, and AFAICS the only honourable -- i.e. the only non-ridicule-worthy -- one.
So if you're super-keen on food and trying to establish a career as a recipe creator or food photographer, it's dishonourable to: put in a lot of effort custom-writing supporting material and taking quality photos of the dish you're pitching to people? Sounds like these things traumatise you! :)
I'd agree that misleading made-for-AdSense sites that purport to, but then don't, answer a question, farm out writing to $5/page content squads, and intersperse stock photos - that's shoddy. But the recipe sites I find in my searches, the ones where I have to scroll down for the content, always seem like genuine, personal efforts. If I'm getting the content for free, a little scrolling isn't too laborious a price, and it's a stretch to think of the work as dishonourable IMO.
> So if you're super-keen on food and trying to establish a career
IOW, you're trying to earn money. Sure, go ahead -- but then you get to pay the price. If the method you're trying to earn money by is going to involve playing along in the game of clickbait, then the price you get to pay is going to be, to be seen as a purveyor of clickbait. Which I, and I suspect quite a few others with me, see as distinctly less than honourable.
It's a free choice: Nobody is forcing anybody to "establish a career as a recipe creator or food photographer" on the ad-financed Internet. If they choose to play the clickbait clown/scum game, they're making themselves into -- so, in the end, are -- clickbait clown/scum. I sure didn't tell them to do that, so I'm perfectly free to see them as such for doing it.
They, OTOH, are perfectly free to try it some other way: publishing printed cookbooks instead of Internet clickbait; or something adjacent, like running cooking classes, starting a restaurant or catering business... Or doing something else altogether.
They could always go into the deeply honourable (/s) business of software engineering, which nowadays seems to consist to about 45% of running ad-spam networks, to about 45% of writing SEO crap to get your ads onto those networks, and about 10% other development... :-( What, me cynic? Bah, geroffmylawn!
I have dozens of cookbooks. Almost every single one is absolutely packed with personal details about the chef and background information on the recipe (who taught them the recipe, their beloved Nanna's method, the history of nut x in remote tribal desert y, etc).
I've been to cooking schools in multiple countries. All have gone into detail about the background of the chef and each recipe.
Same with restaurants. Many restaurants and certainly almost every fine-dining restaurant pushes the profile of the head chef.
I can't help that you paid God knows how much extra for this unnecessary fluff. If I were to get any of those, I'd look around for the least extraneous-fluff-y offer I could find. :-)
More seriously: At least the classes and restaurants already push that stuff in their marketing, don't they? So I get all that already doing my comparison shopping, and therefore would probably actually (at least to some extent) resent the time wasted on repeating it. And the few cookbooks I (or we, my wife and I) have are also of the matter-of-fact, recipes-and-nothing-more kind... I am probably just much less of a "foodie" than you. I think my preference pattern is that of the overwhelming majority.
Note that Clovegarden has "the history of nut x in remote tribal desert y, etc" too -- but on pages separate from the recipes. (As I recall Mr Grygus started the site in preparation for starting a business of selling foodie stuff online after winding up his computing and automation consultancy business -- but that still seems to linger on, and he is nearing [or, probably, well past?] normal retiring age, so I don't know if that new business will ever materialise. But as long as he is up to updating Clovegarden every now and then it remains my favourite site for food-related stuff.)
Reminds me of Plain Old Recipe, a website that strips out fluff from big recipe websites. You provide a link to a recipe, it makes it to the point. I thought the site had closed but it's apparently still live!
Let’s also not forget the 55 auto-playing video ads that I need to vault over to get to Step 1. Each one determined to hijack my mouse as I scroll/hurry past and cause a click! It’s like the world’s least fun platformer game.
You forgot the part where there's a pseudo-recipe after the story that catches your eye but doesn't have any measured amounts, and then the actual recipe later.
I got instantly annoyed by the first few words of this comment, thinking you’d gone off on some tangent about rice… until I saw the last part. Well played!
> laptop with 2 hours of battery life at 100% CPU usage
Is there any laptop on the market that lives up to this? Even top-specced MBPs I've gotten from work fall down when you actually use the CPU with compilers and VMs.
My simple M1 MBP with 16GB seems to last almost 2 hours when hammering the CPU. Haven't timed it exactly, but I find it astonishing compared to the Dell mess I've had to deal with before.
You're never going to guarantee some kind of range on an e-bike. What's the temperature of the battery? Is it mostly uphill or down hill? How much are you going to brake?
And advertising laptop battery life based on the CPU getting pegged to 100% gives meaningless information, as it's rare for people to actually have their device running at 100% load anyway.
> You're never going to guarantee some kind of range on an e-bike. What's the temperature of the battery? Is it mostly uphill or down hill? How much are you going to brake?
Yeah but testing the e-bike on a track and telling the public it has 30 miles of range based on that is disingenuous.
Instead, go to a city with an average amount of hills, stop lights, and cold weather, give it a go, and tell that number to the public. If the bike beats it in their actual city, they'll only be pleasantly surprised. Right now you strand a shitton of people because they think they have 30 miles.
That depends on Google being both honest and accurate. Perhaps they have been so far, but my concern would be that a re-written title would cause quality content to get passed over by many viewers as undesirable/irrelevant because some algorithm misunderstood/misinterpreted what it was looking at, or because google wanted to subtly discourage people from content that competes or disagrees with whatever Google is attempting to promote.
In a better world, algorithms would be perfect, there would be a lot of healthy competition in search engines, and Google would be incentivized to provide users with the best possible results. In our current world, Google's algorithm can't identify obvious spam well enough to keep it out of their results, and there are no major search engines that haven't been lifting results from Google directly or indirectly and repackaging them as their own, so Google has no pressure to do anything but promote whatever is in their own best interests, and little incentive to keep their results accurate and free of spam.
"Gaming their algorithm" sounds like a fancy way of saying SEO. If Google can produce for me a more accurate (or concise) title, it should only help me find what I'm looking for.
Forcing folks to trudge through inaccurate titles – or hoping people know the tells of a "spammy site" title – does not seem a better alternative.
My favorite is when the title sounds like what you’re looking for only to discover it’s a page full of ads and keywords. The original title doesn’t even match.
That causes me to lose faith in google not a better experience.
I think what HN and the SWE community at large have just missed about Google over the last 10 years is that the product is being built for the masses. Most people would prefer it if you just rewrote the title to reflect what the page actually is, rather than having to take on the cognitive load of understanding what SEO even is.
>I don't see many reasons why a site owner would spend extraordinary amounts of time to "carefully craft" page titles
Because I want the title to be concise, but still help people explicitly understand what my writing is about? Because I've already spent a lot of time on the content, so why would I then just slap 'Lou's Wednesday Website Update' on it as a title? Because, historically, a title is an introduction to my writing?
If Google believes the site is being disingenuous by writing a click bait headline, then they should punish the site by decreasing their ranking, not reward it by keeping it high and rewriting a more fitting headline.
But if the title is spam, and the content is good (this is a big 'if'), the best solution would be to rewrite the title so that it's useful and keep the page at its original rank, based on the content. Ideally, Google would be able to handle all these different cases and just give me the best search results. Now, we all know that's increasingly less true, but in theory that's how it should work.
But “for 2022” is a guarantee that the content is bad if it hasn’t changed in 2022.
And yet, I don’t see how Google can automate checking this. It’s possible to add a couple of sentences about how you’ve not seen anything to change your mind about last year’s recommendations. That may well be true. Or false. How can Google know? It just sees content that has changed. So it has been updated in 2022.
The bigger issue is brand trust (as a reviewer brand). The NYT bought Wirecutter, I think, because it had established itself as a trustworthy brand. That’s in direct line with the reputation the NYT wants to have as a whole.
I hate how true your second paragraph is. Google should punish sites that change the date without updating the content, but all the SEO spam is just going to automate changing content when it changes the date. And then what does Google do? Figure out how to make an AI that can understand all the indexed content and accurately determine if it's truthful?
That seems fundamentally impossible without defining trusted sources. But then that means that you're trusting that Google's trusted sources are good. And if you do think they're good, then why not just check those sources directly?
The only answer I have is to find your own sources that you trust and go to them first.
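To be fair, the laziest version of this (bump the date, change nothing else) is mechanically detectable. A minimal, purely illustrative sketch of a date-insensitive fingerprint in Python; the year-stripping regex is a made-up heuristic, and real spam would mutate more than the year:

    import hashlib
    import re

    def fingerprint(page_text: str) -> str:
        # Strip 4-digit years so a date-only edit doesn't change the hash.
        text = re.sub(r"\b(19|20)\d{2}\b", "", page_text)
        text = re.sub(r"\s+", " ", text).strip().lower()
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    old = fingerprint("The 10 Best Routers (Updated for 2021) ...same reviews...")
    new = fingerprint("The 10 Best Routers (Updated for 2022) ...same reviews...")
    print(old == new)  # True: the "update" was cosmetic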
I'm not convinced, in general I don't like this additional layer of "fiddling around" with the original contents.
What about the opposite, the title being great but the contents not really? Shall Google serve its own "improved"/"summarized"/whatever version?
Meh... this reminds me of the snippets of text extracted from some websites that are sometimes shown directly in Google's results, which in my case were sometimes wrong because they didn't take into account the context of what was written in the original contents.
Wouldn't it be better for the users to penalize the site's ranking instead of hiding the fact that the result is your usual clickbait drivel? Rewriting the titles just hides that the results Google found are low-quality garbage.
I hate clickbait as much as the next user, but using that technique to get users to click appears to have even become part of the core business model of previously prestigious outlets.
As a subscriber to several newspapers, it's always interesting to see how different the headlines are between the dead tree editions, and the online versions — even for the same story.
The dead tree headlines are almost always very factual and to the point. I don't think I've ever seen anything close to something like "Here's four awesome tricks to get China to admit to the Tiananmen Square massacre" as a headline in actual print.
More importantly, if the content is actually relevant to the user's search, does it matter whether the title is clickbait or not?
Clickbait pisses me off when it's used to waste my time, but a good search engine wouldn't give me results that waste my time.
In other words, it could give me a relevant result with a clickbait title.. I guess that'd be a little annoying but I don't know if I would want Google to be the judge on what's clickbait or not, and even then I don't feel like it's their place to override titles. I wouldn't want useful pages be downranked just for having a poor title.
A poor title reduces the quality of the resource, though. I think it’s reasonable that there is some penalty imposed for poor titles, and that could include clickbait. If the result is the best one for the search, sure, surface it. But if it’s not clear, though, “clickbait title” is a signal that the result is not the best.
I do agree it’s not really Google’s place to be rewriting titles, though. That feels very suspect.
> A poor title reduces the quality of the resource, though
Is there an objective way to assess quality?
A click-bait title on a page full of ads and text that keep the visitor's attention but don't deliver on the title... ?
Then having held the visitor on your site for a minute or two, but managing to leave them unsatisfied, how about ending the page with a big fat block of even more visual click-bait content at the bottom (Taboola, I'm looking at you).
Don't advertisers and publishers love this stuff? Great metrics.
> Tested by Experts is obviously clickbait; nobody's going to say [Tested by novices].
Nobody would write [Tested by novices] into their headline, but leaving out the part in the brackets would leave it open if it was tested by experts or novices. So in this case the removed bit does provide some information.
>>> Because I want the title to be concise, but still help people explicitly understand what my writing is about?
And yet from TFA:
>> In fact, we found that matching your H1 to your title typically dropped the degree of rewriting across the board, often dramatically.
Users don't look much at titles - they end up in the browser tab or somewhere like that. If a title doesn't match the H1 heading it's often to get more stuff in for SEO. OTOH short titles might be useful when they show up in a tab where there is limited space. Maybe they shouldn't lengthen them for that reason.
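For anyone curious whether their own pages pass that test, here's a minimal sketch of the check, assuming Python with BeautifulSoup; the exact case-insensitive comparison is a simplification of whatever Google actually compares:

    # pip install beautifulsoup4
    from bs4 import BeautifulSoup

    def title_matches_h1(html: str) -> bool:
        soup = BeautifulSoup(html, "html.parser")
        title = soup.title.get_text(strip=True) if soup.title else ""
        h1 = soup.find("h1")
        h1_text = h1.get_text(strip=True) if h1 else ""
        return bool(title) and title.casefold() == h1_text.casefold()

    page = "<title>Why Lisp?</title><h1>Why Lisp?</h1>"
    print(title_matches_h1(page))  # True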
Google should be a neutral middle man providing the results as they are found. If they feel the title is not of their version of quality they should rank it lower.
I'd prefer the version of title of several hundred million individuals rather than Google's aggregated version.
They used to 'borrow' DMOZ titles before DMOZ became defunct. At least in that case it's another point of view on top of their own (and the site author)
Google can't be a neutral middleman because everybody is trying to manipulate the search results. If everybody is clickbaiting their page titles, and Google just displays them as is, it makes their product worse.
Well, nowadays a lot of well-known websites use clickbait regularly, e.g., the WSJ and NYTimes. Many times, they are unwilling to summarize the news in the title even when the news itself is not that complicated.
You step away from neutral as soon as you introduce "version of quality". There will always be an introduction of bias and judgement calls that need to be made to get useful results, especially because bad actors on the web are part of the geography that aren't going away. Just like the press trying to force a neutral "view from nowhere" leads to confused and problematic journalism that can be exploited by bad actors.
Indeed, quality/bias/judgement - I wouldn't argue about it wrt 'going away from neutral', I just meant that if a decision is to be made, either de-value it or show it in the top results, either way don't tinker with the information as it was laid out.
For the same reason they spend extraordinary amounts of time to "carefully craft" the content of the page? And the images, and the citations, and the links, etc.
For the sake of quality.
I think I see where you're coming from, but come to a different conclusion.
If you are, rightly, disappointed about low quality results in SERPs, then why not direct your frustration at Google's search algorithm? But ultimately once the algorithm has decided what to return, I don't want any of it to be tampered with. Maybe there's an argument that once you're using a black box, it might as well be the best black box it can be, but I don't agree.
I wonder whether there is a case for legal action here. Google would not have wasted time developing this rewrite engine unless it had an effect on clicks. Whether that is positive or negative, only they truly know. What if it was found that it was, or wasn't, being applied consistently to the results of their competitors, but not their own sites, for example?
Google isn't doing this for the user, they are doing it so Ads are more clickable than organic search, they want people clicking on Ads. I can guarantee they won't rewrite the clickbait ads written by marketers who are paying for space. The result is ads are more likely to be clicked
100% of the above the fold content is now ads on many search terms, Google is doing everything they can to squeeze more ad clicks, not provide the best information to their users
Some titles of the past before they were optimized for clickbaitiness:
Omelas, bye-bye (The Ones Who Walk Away from Omelas)
Things are looking up (Great Expectations)
A crying cop (Flow, my tears, the policeman said)
The one that got away (The Old Man and The Sea)
on edit: I expect someone will point out those are the names of works of literary fiction, not webpages. But if we assume that webpages do not deserve the kind of respect we would give a creative work in book form - the respect of not changing the title because it suits our needs - then we should not spend all our time complaining that the content of the web is just lousy stuff that nobody would care about changing with an algorithm.
> As a user, I'm fine with Google counteracting this.
The problem there is that "optimizing for clickbaitness" means "making the titles as appealing to click on as possible when they're displayed in search pages". Google deliberately making them less appealing to click on means Google are reducing the effectiveness of organic search results, and that favors adverts instead.
In other words, what you are saying is that you believe it's valid for Google to rewrite website content to make search page adverts more appealing than the actual search results.
That is very hard to justify. If Google wanted to 'punish' sites for being too clickbaity then they should drop that site's position in the search rankings. Ranking it highly but rewriting the title to be something worse (or 'less clickbaity') is a massive abuse of their search market position to favor their ad business.
If this were being done by a person, I might agree with you.
But it's not. It's being done by an algorithm which was carefully crafted to improve someone's chance for getting a promotion. It won't be maintained long term, yet it will continue to punish articles based on wholly arbitrary, biased, and opaque logic.
If this were being done by a person, I might agree with you.
But it's not. It's being done by an algorithm which was carefully crafted [by a person] to improve someone's chance for getting a promotion.
I made a little change there. Algorithms don't just magically appear like leprechauns and unicorns.
Google search is one area of Google where this big-company problem actually doesn't happen that much. Changes in the algorithm are never implemented by fiat: Google employs raters and performs blind experiments to test whether a change to search actually improves user satisfaction before rolling it out to everyone. So at least they must have some data that it increases user satisfaction, both in the metrics of the signal they measure and in subjective rater satisfaction.
I don't get this, why are you ok with bots changing your content, even if it's to be displayed on Google SERP?
Why stop at the titles?
I have an idea, let's have bots rewrite the content in a compact tl;dr format and have it be directly displayed on Google SERP, as user, the less actions I take the better right? You don't even need to leave the SERP.
Why can't I just choose what title I want in my blog to be indexed, and if Google wants to penalize it, so be it?
> let's have bots rewrite the content in a compact tl;dr format and have it be directly displayed on Google SERP, as user, the less actions I take the better right?
Fuck yeah! I’d pay monthly for a search engine that does this consistently. Google already does this for the articles that are easy to parse, but I’d love to see what newer methods based on language models can do.
Btw this article is talking about the <title> tag which is mostly used for SEO since users don’t see it on the page. I don’t think search engines have ever cared about it all that much.
>As a user, I'm fine with Google counteracting this.
Would you be fine with Google changing the work of all authors? Maybe "The Brothers Karamazov" doesn't get enough clicks and Google decides it needs a better title. Or "A Portrait of the Artist as a Young Man" doesn't quite convey what Google thinks it should...
While the Zyppy article is interesting in that it has statistics about the title rewriting, the Google guide on writing proper titles is more relevant to all of us who maintain websites affected by this. Thanks for linking it.
The ideal article would be something like "Google rewrites your titles in search results because your titles suck."
The Google guide does well to explain why some titles are rewritten, such as having duplicate titles across multiple pages, making it impossible to differentiate between pages that show up in the same set of search results.
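If you maintain a site, that particular trigger is easy to self-audit with a duplicate check over your extracted titles. A minimal sketch; the URLs and titles are invented for illustration:

    from collections import Counter

    # url -> <title> text, however you've extracted it
    titles = {
        "/": "ACME Corp.",
        "/contact": "ACME Corp.",
        "/pricing": "Pricing | ACME Corp.",
    }

    counts = Counter(titles.values())
    dupes = [url for url, t in titles.items() if counts[t] > 1]
    print(dupes)  # ['/', '/contact'] -- candidates for a Google rewrite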
In other words, Google's policy is that the search result isn't showing the page title, it's showing Google's short description of the page. If Google thinks your page title is an adequate description it might use that, otherwise it will write its own.
(edit: and Google has enough self-importance to advise you to write your title as if it were a short description, to make their job easier)
> Takeaway: to dramatically decrease the chance of Google rewriting your title, matching the H1 to the title tag seems to be an effective strategy.
Of course it should be mentioned this won't last if it becomes popular. Historically, every time an SEO trick gets popular, the rules are adjusted. Even having this article on the front page of HN might be enough to see Google react by rethinking how (or whether) tags in titles affect the title rewrites.
I wonder if Google is going to try out "AI"-generated titles that are directly summarized from the page content by machine, treating the page title and headings as inputs to the model.
Problem solved WRT copyright issues relating to news articles. If the AI-derived content (a la GitHub Copilot) is deemed original "unlicensed" content, there's no reason to force users to visit the website. (It's been a while since the news media and Google had their legal battles, and I'm unsure what the end resolution was.)
AI systems (or at least their corporate puppet masters) are fighting for copyright-law protection on insights the AI derives from reading copyrighted pages and content on the internet.
One thing I don't see getting discussed in the pros and cons is the simple fact that you can't even tell which titles have been rewritten. Google gives no indication in the search results of what is original and what they've rewritten. This matches other trends, like how it's become ever harder to discern sponsored ads from organic search results.
I used to love Google for how it presented relevant results and made it easy to discern sponsored ads. Today, I avoid Google products like the plague. (I can't escape all of them, but I'm about 90% off.)
I had this problem recently, was hoping there was a reasonable fix but it appears not... (the H1 already contains the title)
I don't think about SEO, and just focus on useful writing / societal impact. However, I recently discovered by accident that I ended up with a top 2 search result for "platform democracy": https://google.com/search?q=platform+democracy .
But the title is missing the first 3 words—including the key words "Platform Democracy" — so that if I was a random person aiming to learn about the concept, I would likely skip over the result! (I almost did even though I wrote the piece!) This seems not ideal for either users or Google, and also an interesting exploration of AI/NLP impacts, so I tried to dig a bit deeper.
I had a brief exchange with Danny Sullivan, Google's public @searchliaison, about it on Twitter (https://twitter.com/metaviv/status/1484636387366289413), which linked to two guides from Google on this. Sadly neither was particularly helpful, but I will share them here in case they are helpful to others:
(Also plausibly relevant: I have http://platformdemocracy.com/ redirect to the piece. I imagine this might impact search ranking, but I would be surprised if it impacts the title rewriting.)
Author here. Frustrating situation. As the title is long at 84 characters, we know that Google is definitely going to rewrite it. The simplest way is to break it into parts and get rid of the shortest part that still makes sense.
So maybe take
'Towards Platform Democracy: Policymaking Beyond Corporate CEOs and Partisan Pressure'
And 1) condense it and 2) lose the colon
'Platform Democracy is Policymaking Beyond CEOs & Partisanship' (60 characters)
If that is too condensed, you could try a short title in the <title> and a longer title in the copy.
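A trivial length check along those lines, as a sketch only: the 60-character figure is a rule of thumb, and Google actually truncates by pixel width rather than character count:

    def check_title(title: str, limit: int = 60) -> str:
        if len(title) <= limit:
            return f"OK ({len(title)} chars)"
        return f"Likely to be rewritten or truncated ({len(title)} chars)"

    print(check_title("Towards Platform Democracy: Policymaking Beyond "
                      "Corporate CEOs and Partisan Pressure"))
    # Likely to be rewritten or truncated (84 chars)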
Zyppy's content-marketing efforts aside, this wouldn't be so much of an issue if Google was any good at it.
But as with its meta-description rewrites, the rewritten titles are often worse than what was there to start with, and in some cases completely change the meaning, to the detriment of the searcher experience.
Thinly veiled content marketing for Zyppy, complete with CTA at the bottom, and mentions of themselves throughout, including: "Fortunately, here at Zyppy, we have a large database of titles thanks to our title tag analysis tool. Armed with this data, we set out to determine how often Google rewrites titles and the scenarios which trigger this behavior."
Furthermore, "HTLM" instead of "HTML"? Needs proofreading. Lol.
Pretty much any company producing blog content is engaging in content marketing though. I’m not sure I understand the criticism. Perhaps this particular piece was overly self-promotional?
Sure, there’s a balance to be struck, but I thought the article had some decent takeaways.
I'm saying that we should expect better from HN, and the people who frequent it. Otherwise, it's just a poor alternative to Reddit. If I wanted content marketing that nobody could even be arsed to have proofread then I'd go elsewhere. The ever-decreasing standard continues.
Quoting the last paragraph of the guidelines, which has a bunch of supporting hyperlinks:
> Please don't post comments saying that HN is turning into Reddit. It's a semi-noob illusion, as old as the hills.
I dislike that they’re an SEO company, but I don’t object to self-serving posts, as long as they’re also curious and interesting to me. Seeing a comprehensive list of how Google rewrites page titles is interesting to me, because I’m fascinated with headline writing.
I’m sorry that you find self-serving content and calls-to-action to be problematic, but I would warn you to expect more of them on HN over time (as neither are violations of the site guidelines). There’s no need to claim it’s going to turn us into Worse Than Reddit. It’s been this way for HN’s entire life, or at least the chunk I’ve been present for. HN’s just the same as it always was. I respect your outrage but you chose an invalid expression of it.
This isn't outrage. This is my opinion. I'm outraged by many things, some bullshit on HackerNews and your opinion of my opinion does not bother me. Have a nice day.
"SEO" is pure marketing whose sole purpose is to sell you a worthless product. The article says nothing you will ever need to know and it, and all other "SEO" "content", should be blocked as spam
It's quite funny to me that people are running these kinds of almost-scientific experiments on a fully human-generated and in-principle knowable system. The reasons are understandable, of course, but it does seem like a waste of human energy.
You see this a lot in gaming too. There are entire sites devoted to figuring out what various weapon attachments actually do in Call of Duty. If you poke around the Minecraft wiki you'll find the same thing - people working out exactly how fast you can move with different potion effects or how scaling works when leveling up enchantments.
Theoretically, all of this information could be found in the source code, but without it gamers are left to an endless research project.
I'm not convinced that having the source code is necessarily a perfect shortcut to accurate results. Video games, in particular, seem to be subject to a decent amount of emergent behaviors such that scientifically measuring things is honestly probably a better option than trying to read the source code to find out what the developer thinks should happen.
At least in the case of gaming, I think (some) people actually enjoy this aspect. It's a waste in a lot of systems, but in an "art", I think it can elevate the experience, at least for certain games and genres.
An interesting inverse of the norm is the Roguelike ADOM. Most similar games from the same time period like Angband, Nethack, and DCSS were open source, while ADOM was a free, but proprietary game. The other games' secrets were open-book, with no real secrets to speak of as the source code is scoured by players. ADOM remains sort of interesting to me as there are red herrings in even the machine code to throw off reverse engineering, and genuine secrets that open source games simply can't have. I've always appreciated that you can't simply look at the source to know everything, anyway.
You are certainly right, there's a certain appeal to the mystery!
I remember reading an article on the Minecraft wiki about how to achieve the slowest possible movement, which is of course a totally useless thing to do in game, but you could see someone had put a ton of thought into working out how to do it! And who's to say that your slow machine is less an expression of artistry than playing the game "right" and building castles!
Minecraft at least is effectively open source; however, many of the quantities being measured are indirect consequences of the physics engine which would be difficult to derive from the constants in the code.
Agreed. Then consider that the entire SEO industry is basically based around the idea that a company has an algorithm only they know, and the industry is trying to reverse-engineer that algorithm. If they released a whitepaper describing exactly how it works, the entire industry would have to change its ways to consulting already-public information instead of running experiments like this.
Interestingly, they are likely to find things that the developers themselves don't yet know.
These systems are large and complicated and time is finite. When it comes to analysis of a written system, there's a lot more time free-floating in the global network of users than there is in the group of a dozen, maybe a hundred, developers who wrote the engine (many of which have immediately been re-tasked to write something else).
Great systems view; that's the general basis of cooperation vs competition. We keep some things secret, stimulating other people to expend energy and think creatively instead of doing it for them. It becomes wasteful when the energy required to produce new information and techniques is impossible to obtain, e.g. under massive inequality: a homeless person just can't gain the skills to obtain a job, or an oppressed population can't overcome the excitation energy needed to free themselves. It's also the reason we outlawed monopoly in the U.S., only to reach the local minimum of duopoly.
This is tangential, but it could similarly describe many support teams that are staffed by non-technical folks or are cut off from engineering for cultural, political, etc. reasons. It's a complete waste of energy, but for various reasons people get put in these situations and experiment instead of talking to an authority in another department or getting an expert on their team. It can be sustained for a surprisingly long time, too, before someone gets called out on inaccuracies.
Similarly, there's also "research" being done to decipher and understand Apple's hardware and software. It does seem like a waste of human brain cycles.
> Legal processes are enormous wastes of human energy on what are usually negative-sum games
I don’t know if this is true. Private legal disputes can be purely antagonistic.
They can also be adversarial in the short term while serving as long-term adaptation to new information. Court cases, on the other hand, produce precedent (irrespective of the legal system). That, too, helps guide a society through novel circumstances.
Google has done this for practically as long as I can remember. If you remember when dmoz was still a thing, Google would favour the title from that, rather than the site's actual title because it perceived it as more useful to the user as it was moderated. By now I would expect that Google has used this and real moderators to train their machine learning model to rewrite titles, perhaps as a way to, you know, hopefully make the product more useful.
Mods frequently rewrite submitted titles, either cos it's not the same title used in the article, or because there's a better wording for the HN crowd ¯\_(ツ)_/¯
It's totally ordinary for a post to go front-page and attract hundreds of comments on the basis of a title that's later deemed too interesting/editorialized/whatever, and you later see some dry uninformative title like "Google account security" or whatever it was occupying a top slot.
I can't say I've noticed it often before for text posts, but I do generally think this pattern of closing the barn door after the horse has bolted is pretty silly in general.
...by making the titles more enticing for users to click on. Doesn't seem like that great of a solution. I would rather they went the other way by prepending "[possibly spam]" or "[possibly clickbait]" to the title.
There's an arrogance to this whole process that amuses me. I can picture the dev team responsible for this code sitting in a meeting with the lead dev saying "it appears that some idiot web authors are using titles that have extra information placed in brackets! Idiots. Well, let's brainstorm possible solutions to this problem so we can protect our idiot users from this obvious menace..."
Leaving them alone and giving us back pagination of search results would solve this problem for me. Or they could demote these sites that they think need their titles rewritten in their search algorithm.
The issue isn't that Google search results exclude stuff that was in the page title.
The issue is that Google search results insert stuff into the page title that wasn't there.
So the issue isn't that the overly long + pipe
<title>Which programming language is fastest? | Computer Language Benchmarks Game</title>
is abbreviated. The issue is that the domain of the hosting service is inserted, which gives the misleading impression that this is a project in some way approved and promoted by the Debian organisation.
20+ years ago Google also used the meta description tag from a page instead of page-text snippets. We're many years past blindly accepting page-author-provided content as the most useful thing to display. People keep thinking of Google as a search engine that greps pages to find matching text. That is old/obsolete thinking; any Google-like service has evolved toward directly returning the information/answer you seek instead of returning a page that may contain it.
Good. Google's interests more closely align with my own than page authors. I'm glad to have Google as an agent working for me to make more useful page titles.
Surely the misconception is the belief that what google displays is the page title. Google displays a link to a page, with a short description of what you will find there. Likewise, when I link to a page from my page, I don't use the title of the page: I use some text that I chose. This a non-story, as far as "rewriting titles" goes. What is interesting is that Google has an automated way to briefly summarize a page.
Every day my desire to be able to rate sites' relevance after a search increases.
And I'd love to be able to choose whether the original or the Google-generated title was more relevant. (C'mon, there is some machine-learning training potential in that.)
Rather that than ditching Google search completely, which is getting closer every day.
On modern browsers, the page title is almost completely obscured. It's not a thing users generally see, and in the few views where users have access to page titles that are not some sort of developer tool, the title is more often than not cut short.
I don’t see why google has to use the page title as a headline for a link result.
Something has to be fishy with this, because I get tons of "Untitled" results now which lead directly to spam. This sucks big time because I usually got really good results (I search for a lot of coding-related things), and now I cannot use this account for searching anymore.
This data is based on what's seen in the wild, right? So if they see text in brackets removed more often than text in parentheses, that could reflect what sort of text people tend to put in brackets vs parentheses rather than (or in addition to) how google treats those characters.
I’m no search engine expert. Is this standard practice at some level across other search engines? Is “retitler” just part of every search engine stack (e.g. DDG, Bing, BraveSearch, etc)?
Or is this unique to the “I’m Feeling Lucky” folks?
Google is turning the internet into a garbage dump. So many spam news sites come to the fore thanks to their SEO nonsense that the sites that provide real news are not even seen, lol.
Google claimed that the original <title> is used "more than 80% of the time" when announcing the change[0].
Combining this rate with the rate seen by the article (rewritten 61% of the time, on the subset of 81,000 URLs they were interested in), I'd guess that some websites see a lot of rewrites and many other websites see none at all.
So an interesting distinction here is required! When Google says they use the title 80% of the time, they mean they use the title 80% of the time to create their search result title, which they may or may not modify. The other 20% of the time they use an H1 or other elements on the page.
From a pure HTTP perspective, isn't the point of page titles to be how a page is referenced? It would be an error if a library reported the title of "A Tale of Two Cities" as "It Was The Best of Times".
> stop using long, verbose titles
This is good advice, but if Google wants to penalize bad titles it should dock their rank, not misreport them.
> isn't the point of page titles to be how a page is referenced
It is, but what would you do if all titles across pages just said "ACME Corp."? That happens often if the developer just displays SITE_NAME in the title.
In those cases it makes sense to present the person searching the web with additional information from an H1 tag, which probably has more useful information, like "Contact us".
No thanks, as a Google user, I’m happy that Google is descriptive.
Ideally, Google tells me what the page actually contains. I.e. if you title the page “Top TVs of 2022” and you’re reviewing cars, then it titles it appropriately. Google can’t do that right now, but every step closer is a good thing for me.
There's lots of "isn't the point of..." in HTML that actual users have broken. Google (and other crawlers and intermediaries) have to adapt their algorithms to account for that.
> Google (and other crawlers and intermediaries) have to adapt their algorithms to account for that.
As I see it, Google's in a prime position to algorithmically reward actual users for better HTML discipline by ranking them above users who can't be bothered.
HTTP actually has nothing to do with page titles. I think web browsers should probably display the titles verbatim, but there may be use cases where they don't, a common one being where there isn't enough space so the title is truncated in the UI.
As for what search engines should do with page titles, it's really up to the individual search engine, I'd say. Whatever serves their users best.
As a search engine developer I totally get why. HTML in the wild is not well behaved in the slightest. People use title and heading tags in all manner of weird ways. I've seen <title>-tags in the <body>-tag used as headings. I've seen documents where every line was a <h1>-tag.
You kinda need to make the most of what you're given.
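The fallback chain ends up looking roughly like this. A toy sketch in Python with BeautifulSoup; a real engine does far more (length sanity checks, anchor-text signals, boilerplate detection):

    from urllib.parse import urlparse
    from bs4 import BeautifulSoup

    def display_title(html: str, url: str) -> str:
        soup = BeautifulSoup(html, "html.parser")
        # First <title> anywhere in the document, even inside <body>
        if soup.title and soup.title.get_text(strip=True):
            return soup.title.get_text(strip=True)
        # Fall back to the first heading that actually has text
        for h1 in soup.find_all("h1"):
            if h1.get_text(strip=True):
                return h1.get_text(strip=True)
        # Last resort: the domain name
        return urlparse(url).netloc

    print(display_title("<h1></h1><h1>Actual topic</h1>", "https://example.com/x"))
    # Actual topic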
HTML5 has been around long enough that we should be able to punish sites that use completely bonkers markup at this point, right? Since Google effectively has historical archives of the internet, they could pretty trivially grandfather in legitimately old content (things they tracked before some date) and just start down-ranking sites that continue to misbehave with markup but skate by because browsers run in compatibility mode. Something like abusing <h1> tags is legal, if obnoxious, HTML, so it shouldn't really fall under this... but it's been long enough that we can start punishing completely incorrect syntax, right?
That would be a massive loss, though. A lot of content isn't in HTML5, and a lot of that pre-HTML5 content is precious and valuable.
Google has sadly already tossed a lot of that by the wayside, since it often isn't served with HTTPS. I think something like 80% of the sites my crawler is aware of serve pages over plain HTTP.
In general, attempts at shaping the web through search engine indexing requirements seems to mostly serve to filter out content made by humans and select for search engine marketing.
Not so sure older content (like the stuff I wrote in the late 90s to mid 00s) would be negatively impacted, so long as search providers pay careful attention to the <!DOCTYPE> tag (or lack thereof). I wouldn't characterize holding people to at least a bare minimum of standards (e.g., title in the head and nowhere else, which has been the rule since at least HTML 2.0 in 1994) as "punishment", any more than dinging them for unclosed parens and other typos. Language is how we communicate understanding, and markup is how we frame presentations on the web (mostly). People need to be prepared for the consequences of making it up as they go along rather than educating themselves on the standard (whether spelling, grammar or markup language).
That really doesn't seem to be what I'm seeing, having built a search engine specialized in this type of content and finding almost nothing but gems in the refuse.
If anything, it seems like the single best predictor of whether a website is a content mill is strict adherence to modern web standards and other "google rules".
I think it'd be a pretty good approach to let in historical stuff on grace and just start penalizing new content. Google absolutely has the tools to do this the right way, and the Internet Archive could allow most other folks to accomplish the same thing.
Enabling HTTPS is easy on most platforms. Folks who have rolled their own platform, or got unlucky and are using a CMS that fell out of favor, do tend to get screwed over by this - but I think it's fair to de-prioritize content that fails to adhere to good practices. The HTTP vs HTTPS issue in particular is a real security concern; with tags it's more about paying down the tech debt in our browser technology.
I really wish browsers would stop shrugging their shoulders at bad markup and display blank pages with errors in the consoles or even visible in the rendered page. It would force devs to clean up their act. But as long as 1 browser vendor doesn't do it, the end users will all just assume the strict browser is broken since there is another browser that does "work".
On the website of the company I work for, the title is "tagline | company name" but in the search results it shows up as "company name: tagline". That style doesn't appear on the company website anywhere.
I imagine it's Google trying to normalize how things are shown but it's quite annoying. It could potentially break some company's branding.
Shrug. The almost-religious belief in the necessity for ultra-consistent branding within some companies is nearly comical so long as you're on the outside.
Sadly agreed. Definitely not comical when I have multiple times had our marketing department blame/throw fits at the dev team for the site not showing up in Google's search results exactly how they want it to.
To be fair to the devs, that's an education gap. The response should be "You want us to develop a solution to a third party's whims? Maybe you should try writing them a nice letter about how their representation of our company affects our image; it'll have as much impact. Possibly more."
In real corporations, of course, that's not how it works because the tech people are "wizards" and Google is "part of the wizard stuff," but this isn't a technical problem (and maybe marketing needs to stop trying to control another company; that's no more likely to succeed than Coke yelling at Amazon that they don't always put Coke products at the top of every search result).
It's relatively minor for most businesses, but sometimes it isn't. Inconsistent messaging makes it a lot easier for someone to set up a phishing attack against your customers. My bank uses several different URLs, email sending addresses, and taglines for its services. It's not always easy to tell if an email is actually from the bank.
Google adding more permutations into the mix doesn't help.
Related, but I dislike when I'm bookmarking a page and the title is one word - the name of the product or the company. It makes it hard to search for it later.
I totally get this. Back in the day when I was a kid, we went to the local library and read about the world. When the librarians weren’t serving me by “checking out books” to me, they were busily putting new and improved titles on the books in receiving.
/s
Seriously. Google is starting to feel less like the librarian of the net (we index the world) and more like the Truman show: we craft your reality.
It’s the ads. The way Brin and Page phrased it in their 1998 paper, they considered ad-oriented search engines to be lower quality. They were going to be more academic. They thought that there was lots of user data to mine in search…for academic purposes. Then innovation #2 at the actual startup was the ad auctions and that was the beginning of the end, all the way back at the beginning.
I’ve recently read a lot about hedge funds, and it’s astounding how many scientists literally say, “I don’t think hedge funds add value to society, I wouldn’t work there.” And then the firm slides this check across the table, and they didn’t even realize a single check could have that many zeroes, and they join the firm and stay forever. That’s what happened with Google and all the rest.
Agreed. The industrialization of ad tech has been a loss for humanity. It’s a runaway mechanization at this point.
What I don’t understand, is why we don’t tax it. If an industry generates lots of wealth, but has a questionable impact on society, the “f(r)ee market” west’s response has usually been to throw a stiff vice tax on it. It doesn’t make the vice go away, but it puts a governor on its excess and redirects some of the spoils for projects which hopefully are net positive.
Well, you should try to establish the societal cost of the negative externality and then tax at that level. The idea isn't to destroy the thing but to make its price reflect its actual cost.
It's the exact same thing with almost every "technology" company out there today.
We're sinking our best and brightest (and also plenty of perfectly useful and adequate) talent into getting people to look here, buy something they don't need, or press a button.
It's comical to contrast that with the same people who pretend climate change is an existential crisis. Meanwhile, so many scientists and engineers idealistically interested in that leave for software-related subjects where they'll make 10X the money making the problem worse.
If you're not the principal investigator, an NIH grant will pay someone with a PhD + 7 years of experience...$65,292, with pretty weak guarantees on job security (etc).
"Then innovation #2 at the actual startup was the ad auctions..."
I don't think Google quite invented those, GoTo/Overture invented ad auctions and pay-per-click, but missed out on patenting them. Google did improve on the idea, with the second-price auctions.
Ads are a fundamental problem. They skew the incentives.
The search engine could, for example, give semi-poor results, making the person search again, increasing ad impressions.
An ad-supported search engine would also prioritize pages with ads that are also conveniently sold by the search engine.
As a user, I want a search engine to give me the best page with the fewest searches. An ad-supported search engine wants me to view more ads and click on them. Those are, if not orthogonal, often in conflict.
>An ad-supported search engine wants me to view more ads and click on them
Is this really the case? Assuming a pay-per-click model and rational, competent advertisers: more clicks would increase their costs and reduce the generated value per click. The advertisers would limit their maximum cost per click. This would limit the revenue of the ad-supported search engine to the previous level (from before introducing bad search results).
It is possible that more clicks (generated by tricks and bad search results) produce more revenue for the advertisers. This would (slightly) benefit both the advertisers and the search engine.
In the end, and in the long run, the incentives of ad-supported search engines are aligned with their customers (advertisers), if the above assumptions are met.
The idea that ads affect Google's search ranking just isn't true. There are purposeful barriers between ads and search at Google to prevent this, such that the ads team can't even file bugs with search.
I don't think it would happen at the low level of engineers filing bugs. It happens at the highest levels of management, where the main concern is corporate profits.
Even if there's no explicit cooperation or algorithmic link between the ads and search divisions, everyone on the search management team knows that search is a huge, expensive operation that makes no money on its own. Advertising is what pays for their salaries, bonuses, operating expenses, etc., and you can bet that they make their executive decisions accordingly.
There’s a grain of truth here, in a bit of a tangent: librarians classify all books using a system like Dewey Decimal or Library of Congress Classification.
While not adjusting titles, librarians do have some influence on how a book is classified and thus filed/organised within the library. Check out the wiki article on Dewey[1] for the various placements of homosexuality, which has been assigned numbers under areas including mental illness! Depending on the library system's leanings you may still find it there, or in the section for sexual disorders, or hopefully in the sexual relations area. (Disclaimer: I just used this as an easy example because it's on Wikipedia.)
The Dewey Decimal classification system is ridiculously flawed, and no self-respecting library uses it these days (unless it always has, and hasn't got around to re-organising). Even my school's little one-room library didn't, something I found annoying at first, but came to appreciate.
Disagree that it’s ridiculously flawed. It has issues like any system, but it still works well the majority of the time.
> no self-respecting library uses it these days (unless it always has, and hasn't got around to re-organising).
The vast majority of library systems have been around long enough that Dewey (or LCC) was the de facto choice. Just checked a few, like the British Library and the French National Library, plus all the other libraries I've looked up in London: all Dewey.
"Libraries in the United States generally use either the Library of Congress Classification System (LC) or the Dewey Decimal Classification System to organize their books. Most academic libraries use LC, and most public libraries and K-12 school libraries use Dewey." [1]
Those numbers were added as a consequence of the books that needed to be classified in the 1930s, and now that there are books that don't belong in the category there are new numbers.
It still comes down to librarian interpretation. Sometimes they will just defer to another source, like the national library of their country, or the publishers recommendations, but at least in my experience working part time in a library many years ago, the librarians out back doing the processing and cataloguing would refer to Dewey index guides and also make judgements based on the nature of the book (eg mostly practical vs theoretical nature would be the difference between a 6xx filing and somewhere completely different).
> Check out… the various options for homosexuality, which has numbers for it including under areas including mental illness!
Classifying a new technology is another major area where the original taxonomies need to be extended in order to successfully index material. The internet, for example, didn't exist when LC/Dewey were originally defined.
Yes and no. Anecdotally, most SEOs and people who do SEO "part time" (e.g., an ecomm store owner) still don't understand the foundation of modern SEO.
1. Google doesn't care about the sites. The sites aren't Google's customer.
2. More importantly, the person doing the search is the customer.
Unfortunately, most sites believe SEO is about them. They can improve how they present themselves but the "transaction" is not about them.
Google, serving ads aside, needs to maximize customer satisfaction or run the risk of losing a customer.
It's worth repeating: Google doesn't care about the sites.
If Google believes a site's content is a good fit for maximizing customer satisfaction but the title isn't optimal, then it makes perfect sense that Google would want to optimize the title, since the title is the "gateway" to a happy customer.
Whether that's right or wrong, IDK. Whether it actually helps, again IDK. But from a pure relationship / business perspective it makes sense.
The customer is the person doing the search. The sites aren't viewing the ads. The sites aren't clicking on the ads. *That* is Google's #1 source of revenue. Full stop.
And thus, as originally stated and supported, too many practicers of SEO, with that wrong lens, continue to misunderstand Google.
Put another way, Google doesn't change the title for the benefit of the sites. There's simply no biz model / source of revenue to support that idea. None.
Given how site owners habitually attempt to distort reality with tag stuffing and other bullshit metadata, what do you expect? Reality is not what is printed on the tin.
Should I ask Walmart to kindly start relabeling products on their shelves because what’s on the tin is rarely as good for me as what the maker purports?
Maybe that’s what we need. An FDA metadata label for every website served, kinda like the fav-icon, but useful.
- Readable word count (protein)
- Ad count (fats)
- Image count (carbs)
- Embedded script size (the list of nasty sounding chemicals it contains)
- Average data transmitted (sugars)
- etc
Must be shown in black text on white background with a black border. Sorry dark mode guys.
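Half-joking, but most of those counts are mechanically computable. A rough sketch; the iframe-as-ad heuristic is obviously crude, and "data transmitted" would need real network measurement:

    from bs4 import BeautifulSoup

    def page_label(html: str) -> dict:
        soup = BeautifulSoup(html, "html.parser")
        script_bytes = sum(len(s.get_text()) for s in soup.find_all("script"))
        ad_slots = len(soup.find_all("iframe"))  # crude proxy for ad units
        images = len(soup.find_all("img"))
        for tag in soup(["script", "style"]):
            tag.extract()  # don't count code as readable words
        return {
            "readable_words": len(soup.get_text(" ", strip=True).split()),
            "ad_slots": ad_slots,
            "images": images,
            "inline_script_bytes": script_bytes,
        }

    print(page_label("<h1>Recipe</h1><img src=x.jpg><script>var a=1;</script>"))
    # {'readable_words': 1, 'ad_slots': 0, 'images': 1, 'inline_script_bytes': 8}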
Rather than showing the actual reviews, just lower the color saturation for lower reviewed products. So high reviewed products would pop in a sea of gray scaled items.
Sounds like something out of Black Mirror, but could be interesting.
How could that work out? Low quality sites usually have more juicy ad spots?
More seriously: incentives are stacked against search quality these days. Poor results means more trips into ad laden wastelands, and more returns to the ad laden search results page.
Giving people the result up front and center would directly affect quarterly profit I am afraid.
At least this is the model that makes most sense to me.
The next most probable is that machine learning is already out of control and the people who created it have left.
>...they were busily putting new and improved titles on the books in receiving.
The book titles are unchanged (when you visit the site) - this is just the Librarians adding synonyms and/or simplifying titles in their catalog so that it is "Dr. Strangelove" rather than "Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb".
That's kind of the fundamental insight here though, isn't it? Carnegie developed his libraries as a philanthropic endeavor, with the aim of supporting meritocracy in society. Google developed their library with the aim of making a shit-ton of money from advertising.
Google never was a suitable candidate for the world's authoritative librarian. Unfortunately, we'll probably need another Carnegie to displace them.
>and more like the Truman show: we craft your reality.
It should be more like an assistant: here are some boring tasks/questions; find out everything about them, summarize it, and present me my options. I don't really want to search or find something. I want to get things done and questions answered.
The reason is the cumulative impact of two things:
1. Algorithmic optimization of results for ad click-through rates, and
2. The scarcity of space for organic results on results pages for queries with commercial intent (because of the large amount of space given to ads). The high value of the clicks on those pages (sometimes $100+) drives marketers to focus disproportionate resources on SEO tricks and gaming to show in one of the few spaces left on the front page.
A search engine with ads and ad-tech tracking cannot work well for consumers in the long term. Google is now an ad-tech company, not a search company. It employs 3x as many people on advertising as search.
[Edit for clarity] It makes sense in this context to programmatically re-write titles to optimize conversion, rather than consumer experience.
Why was the title edited here on HN? The original title is much better, and had more information. A bit ironic given the subject matter.
Dang, was this your doing? If so, can we please have an open discussion on this? It's happened a few times and it's annoying and seemingly randomly enforced. The guidelines state not to editorialize headers but this rule gets ignored a lot. What was deficient about the original title?
My guess is that because it was self-submitted, it was held to a higher standard. Fair enough.
And in some universes the title could be considered click bait, although it is accurate in this case.
Moderation is a tough job. You never win.
That said, the revised HN title seems like it was written by bad AI. The point seemed to be to drive it off the homepage. In that, the HN title succeeded.
Regardless, I'm happy the article generated a lot of interesting discussion before manually being deemed unfit.
It is curious how at the same time the title changed, all the top comments (which were generally supportive) got pushed to the bottom. And now, including yours. Assuming this article touched a nerve at the same time someone was having a bad day.