
This was actually my primary role at Google from 2006 to 2010.

One of my first test cases was a certain date range of the Wall Street Journal's archives of their Chinese language pages, where all of the actual text was in a JavaScript string literal, and before my changes, Google thought all of these pages had identical content... just the navigation boilerplate. Since the WSJ didn't do this for its English language pages, my best guess is that they weren't trying to hide content from search engines, but rather trying to work around some old browser bug that incorrectly rendered (or made ugly) Chinese text, but somehow rendering text via JavaScript avoided the bug.
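
Roughly, the pattern looked something like this (a hypothetical reconstruction in TypeScript-flavored DOM code, not the WSJ's actual markup): the article text exists only inside a JavaScript string literal, so a crawler that doesn't execute scripts sees nothing but the surrounding navigation boilerplate.

    // Hypothetical reconstruction; the element id and variable name are invented.
    const articleBody = "\u4e2d\u6587\u6587\u7ae0\u5185\u5bb9..."; // the actual Chinese article text
    document.getElementById("article")!.textContent = articleBody;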

The really interesting parts were (1) trying to make sure that rendering was deterministic (so that identical pages always looked identical to Google for duplicate elimination purposes), (2) detecting when we deviated significantly from real browser behavior (so we didn't generate too many nonsense URLs for the crawler or too many bogus redirects), and (3) making the emulated browser look a bit like IE and Firefox (and later Chrome) at the same time, so we didn't get tons of pages that said "come back using IE" or "please download Firefox".

I ended up modifying SpiderMonkey's bytecode dispatch to help detect when the simulated browser had gone off into the weeds and was likely generating nonsense.
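
One way to do that kind of detection is an instruction budget checked at dispatch time. A minimal sketch of the general idea (in TypeScript with invented names, not SpiderMonkey's actual C++ dispatch loop):

    // Count every bytecode dispatched and bail out once a budget is exceeded,
    // treating the page's JS-derived links and redirects as untrustworthy.
    type Opcode = { run: (vm: VM) => void };

    class BudgetExceededError extends Error {}

    class VM {
      dispatched = 0;
      constructor(private budget: number) {}

      execute(program: Opcode[]): void {
        for (const op of program) {
          if (++this.dispatched > this.budget) {
            throw new BudgetExceededError(`exceeded ${this.budget} bytecodes`);
          }
          op.run(this);
        }
      }
    }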

I went to a lot of trouble figuring out the order in which different JavaScript events fired in IE, Firefox, and Chrome. It turns out that some pages actually fire events in different orders between a freshly loaded page and a page reloaded via the refresh button. (This is when I learned about holding down shift while hitting the browser's reload button to make it act like it was a fresh page fetch.)

At some point, some SEO figured out that random() was always returning 0.5. I'm not sure if anyone figured out that JavaScript always saw the date as sometime in the Summer of 2006, but I presume that has changed. I hope they now set the random seed and the date using a keyed cryptographic hash of all of the loaded javascript and page text, so it's deterministic but very difficult to game. (You can make the date deterministic for a month and have the dates of different pages jump forward at different times by adding an HMAC of the page content (mod the number of seconds in a month) to the current time, rounding that time down to a month boundary, and then subtracting back the value you added earlier. This prevents excessive index churn from switching all dates at once, and yet gives each page a unique date.)
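
A minimal sketch of that date trick in TypeScript, assuming a SHA-256 HMAC and a 30-day "month" (the key and function names are illustrative, not any actual implementation):

    import { createHmac } from "node:crypto";

    const SECONDS_PER_MONTH = 30 * 24 * 60 * 60;

    function deterministicDate(pageContent: string, key: string, nowSeconds: number): number {
      // Per-page offset: keyed hash of the page content, reduced mod one month.
      const digest = createHmac("sha256", key).update(pageContent).digest();
      const offset = digest.readUInt32BE(0) % SECONDS_PER_MONTH;

      // Shift by the offset, round down to a month boundary, then shift back.
      // The result is constant for roughly a month per page, and different
      // pages roll over to a new date at different moments, so the index
      // never sees every page's date change at once.
      const shifted = nowSeconds + offset;
      const rounded = Math.floor(shifted / SECONDS_PER_MONTH) * SECONDS_PER_MONTH;
      return rounded - offset;
    }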




> (This is when I learned about holding down shift while hitting the browser's reload button to make it act like it was a fresh page fetch.)

Most useful aside of all time.


I used to use this a lot. My experience is that for some reason, a couple years ago it stopped working reliably as a fresh page fetch. Some items were still coming up cached. Now I use incognito or private browsing windows instead.


If you're running Chrom(e|ium) with developer tools open then you can right-click the refresh button and it gives you a few refresh options (e.g. clear cache and reload).

That tends to be my fallback whenever I'm specifically fussed about the "freshness" of a page. That, or curl.


Thanks, never tried right-clicking that before. There's also a checkbox in dev tools settings to "Disable cache while dev tools is open."


I used to go through a lot of head-scratching when doing manual testing, before discovering the joys of cmd-shift-r.


> At some point, some SEO figured out that random() was always returning 0.5. I'm not sure if anyone figured out that JavaScript always saw the date as sometime in the Summer of 2006, but I presume that has changed. I hope they now set the random seed and the date using a keyed cryptographic hash of all of the loaded javascript and page text, so it's deterministic but very difficult to game.

I don't get why the rendering had to be deterministic. Server-side rendered HTML documents can also contain random data and it doesn't seem to prevent Google from doing "duplicate elimination".


Byte-for-byte de-duping of search results is perfect and fairly cheap. Fuzzy de-duping is more expensive and imperfect. Users get really annoyed when a single query gives them several results that seem like near copies of the same page.

Tons of pages have minor modifications made by JavaScript, but only a very small percentage have JavaScript modifications that, once analyzed, actually improve search results.

So, if JavaScript analysis isn't deterministic, it has a small negative effect on the search results of many pages that offsets the positive effect it has on a small number of pages.
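
A minimal sketch (not Google's pipeline) of the cheap byte-for-byte check: exact de-duplication keys on a hash of the rendered bytes, so a single random() or current-date value leaking into the output gives each copy a different hash and splits true duplicates apart.

    import { createHash } from "node:crypto";

    // Group URLs whose rendered output is byte-identical.
    function dedupeRendered(pages: { url: string; renderedHtml: string }[]): Map<string, string[]> {
      const groups = new Map<string, string[]>();
      for (const page of pages) {
        const key = createHash("sha256").update(page.renderedHtml).digest("hex");
        const urls = groups.get(key) ?? [];
        urls.push(page.url);
        groups.set(key, urls);
      }
      return groups; // every group with more than one URL is an exact duplicate set
    }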


Great thread. I'm one of those people that was poking around and trying to figure this out a few years back.

Obviously, there's a lot I know you can't say, but I'd love to know your general thoughts on how far off we were: http://ipullrank.com/googlebot-is-chrome https://moz.com/blog/just-how-smart-are-search-robots


" Since the WSJ didn't do this for its English language pages, my best guess is that they weren't trying to hide content from search engines, but rather trying to work around some old browser bug that incorrectly rendered (or made ugly) Chinese text, but somehow rendering text via JavaScript avoided the bug."

Or maybe they were trying to get past the great firewall of China?


> Or maybe they were trying to get past the great firewall of China?

Possible, but at that time the only affected pages were for a certain date range in their archives, not the most recent pages. I also think the Great Firewall of China did simple context-free regex searches that would have caught the text in the JavaScript literals.


Did you load in Ajax content? I've got a client that runs a site that loads its HTML in separately. They've been paying for a third-party service to run PhantomJS and save HTML snapshots to serve to Googlebot - is that no longer needed?

(I'm not thrilled about rendering this way, but it makes development a lot easier.)


In practice, and from experience... content changes driven by JS tend to lag a few days behind changes made via direct output... If you're doing client-side rendering, couldn't you refactor to use Node, or something similar, for your output rendering?

If you aren't heavily reliant on conversions from search traffic, you can probably get away with being JS driven. I'd suggest sticking with anchor tags for direct navigation with JS overrides, assuming you are supporting full URL changes... otherwise you need to support the hashbang alternate paths, which is/was a pain when I did it 3-4 years ago.
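
A minimal sketch of the "anchor tags with JS overrides" pattern (the data-spa-link attribute and renderClientSide function are invented for illustration):

    // Hypothetical placeholder for the app's client-side renderer.
    declare function renderClientSide(url: string): void;

    // Links carry real hrefs that crawlers (and users without JS) can follow;
    // script intercepts clicks and takes over navigation client-side.
    document.querySelectorAll<HTMLAnchorElement>("a[data-spa-link]").forEach((link) => {
      link.addEventListener("click", (event) => {
        event.preventDefault();                // skip the full page load
        history.pushState({}, "", link.href);  // keep a real, crawlable URL
        renderClientSide(link.href);
      });
    });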


As an aside, did you work on the indexing team at Google? I was on the indexing team from 2005-2007, and I remember that Javascript execution was being worked on then, but I don't remember who was doing it (was a long time ago ;) ). My name is my username.


I was always in the New York office (before and after the move from Times Square to Chelsea), on the Rich Content Team sub-team of Indexing. My username is the same as my old Google username.

I was working on the lightweight, high-performance JavaScript interpretation system that sandboxed pretty much just a JS engine and a DOM implementation, which we could run on every web page in the index. Most of my work was trying to improve the fidelity of the system. My code analyzed every web page in the index.

Towards the end of my time there, there was someone in Mountain View working on a heavier, higher-fidelity system that sandboxed much more of a browser, and they were trying to improve performance so they could use it on a higher percentage of the index.


Ah, okay, cool. Never visited the NY office. That's probably why I just remember the general idea that "JS execution was being worked on."



