SEO in JavaScript Web Apps (alexmaccaw.com)
81 points by maccman on July 22, 2013 | 51 comments



Another option is to make a website that works without JavaScript. Only dynamically fetch pages when JavaScript is enabled. Progressive enhancement rocks.

Visiting http://monocle.io/posts/how-yield-will-transform-node without JavaScript support yields a blank page.

All page titles are <title>Monocle</title>, which isn't very descriptive, and the meta description is the same for every page. Only the escaped_fragment version gets descriptive page titles; for JavaScript users every page is just titled "Monocle".

There is no unique article content to rank #1 for; the articles are all found on other sites. From a quick glance, I don't see Monocle ranking #2 much either. Those spots are reserved for other aggregator sites.

The Google guidelines say:

  Make pages primarily for users, not for search engines.
  "Does this help my users? Would I do this if search 
  engines didn't exist?"
I'd extend that to JavaScript apps. Why build an escaped_fragment version especially for search engines, and then forget to offer that same functionality to human users too?


Honestly, with every worthwhile browser (both desktop and mobile) supporting JavaScript and having it enabled by default, it doesn't make much sense to support browsers with it disabled. The couple million people with NoScript installed who enable it only on specific sites know to enable it if a site doesn't work.


It does make sense to me, because:

- Not all search engines can handle JavaScript; Google is leading the way, but even Google isn't perfect.

- Those NoScript users could be potential clients or users of your site. I love it when my conversion rate goes up 1%, and that becomes harder when you ignore 1-5% of users.

- Screenreaders generally do not support JavaScript [not true, see comment]. If only one blind user gets to access the content/design I created, then that is worth it to me. I am thinking as a front-end engineer here, not as a business owner, where time is money. (Also, depending on jurisdiction, it may be against the law to be inaccessible.)

- NoScript users will likely bounce in large numbers when they see just a blank page. Simply adding a <noscript> tag explaining why you need JavaScript goes a long way (a minimal example follows this list).
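
For instance, a minimal sketch of such a fallback message (the wording is only illustrative):

  <!-- Shown only when JavaScript is disabled -->
  <noscript>
    <p>
      This site loads its content with JavaScript. Please enable
      JavaScript (or whitelist this domain in NoScript) to continue.
    </p>
  </noscript>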

As a front-end engineer I go for maximum content accessibility. I don't meddle in the politics of things ("If we don't drop IE6 support, the web won't move forward!").

I totally understand the new landscape, where a lot of people have JavaScript on. Some web apps cannot use progressive enhancement, because the JavaScript is core to the app. But for "static" content websites like these, it is certainly possible to make a website that is usable by most users, human and robot.


"Screenreaders in generally do not support JavaScript."

Please stop saying that. It hasn't been true for a good 5 years or so.

* http://www.brucelawson.co.uk/2011/javascript-and-screenreade...

* http://webaim.org/projects/screenreadersurvey4/#javascript

* http://www.w3.org/TR/WCAG20-TECHS/client-side-script.html


Thank you! I've been behind the times. I'll start viewing it as a general accessibility issue, not one specific to screen readers. But as Steve Klabnik commented, and per http://www.w3.org/WAI/intro/aria.php , there are a few more steps to take to make JavaScript-heavy apps play nice with screen readers:

"WAI-ARIA addresses these accessibility challenges by defining how information about this functionality can be provided to assistive technology. With WAI-ARIA, an advanced Web application can be made accessible and usable to people with disabilities."


But WAI-ARIA has nothing to do with JavaScript per se; rather, it's the application of semantic roles and attributes to make non-native controls interpretable by screen readers.


Thank you, now I have more things to point people at than just my blog post.


I'm not saying that you shouldn't ensure that your website is crawlable by search engines. You absolutely should. But we're at a point today where you simply don't need to worry about whether or not your site's functionality works without JavaScript. Even Mozilla is removing the option to disable JavaScript from the browser's UI (you can still use the advanced about:config or extensions).

As mentioned elsewhere, screen readers fully support Javascript and have for a very long time (which you've updated your post to acknowledge). That was one of the big reasons I used to ensure sites worked without Javascript years ago.

NoScript users know enough to turn it on for a site for it to work. NoScript users only make up 0.08% of internet users worldwide, a far cry from 1-5%. Basically, you can safely give them a significantly reduced site experience or just a message to turn JavaScript on. They're not really worth the effort in most cases. Hardly anyone disables JavaScript anymore, as most sites simply won't work right without it.

If you want to expend the extra effort, more power to you. It's just that for most sites nowadays, it's not worth the time/money anymore.


I'm not going to say that screen readers don't support JS - in many cases they can - but there are still caveats: http://words.steveklabnik.com/emberjs-and-accessibility


It's not a question of having JavaScript enabled/disabled. It's a question of whether the JavaScript gets delivered from your server to the client's browser properly and fully in a state that it can be executed successfully. Progressive enhancement is about robustness, being adaptable when the network fails to be perfect.

Look at it this way: this __escaped_fragment__ is only supported by Google. No other search engine supports it. Whereas if you build it with progressive enhancement first, and then enhance it with JavaScript, you get content that's indexable by any search engine, not just Google.

The workload is the same, the complexity is the same; the only difference is focusing on progressive enhancement first rather than trying to bolt on a clearly less optimal solution later.


At least Bing supports the hash fragment as well, but it looks like you have to enable it manually http://www.bing.com/blogs/webmaster/f/12248/p/671232/9669509...


You have to support non-JS browsers if you want to support search engines crawling your sites. The hash fragment is a non-JS method that only works with search engines. Why not use something that works with noscript browsers as well, and incidentally with search engines that don't support the hash fragment?


Don't forget that JavaScript performance on mobile is abysmal. If your use of JavaScript is much more than light fluff, it's going to seriously annoy mobile users.

http://sealedabstract.com/rants/mobile-web-apps-are-slow/


> Honestly, with every worthwhile browser (both desktop and mobile) supporting JavaScript and having it enabled by default, it doesn't make much sense to support browsers with it disabled.

Well, JavaScript support in w3m and lynx is still pretty poor. It also makes it harder to grab content for offline reading using wget/curl.

It does depend on what kind of site/app you're making. I'd say that if your main content is text, then requiring JS makes no sense (especially if you're publishing blog posts on configuring server software -- I might want to download that article to a headless server).

I think most apps also benefit from an old-school REST architecture, so that it is possible to, e.g., script your todo app with curl to create a new todo item without having to go through 3 pages of API specs.
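
As a rough, hypothetical illustration of that last point (an Express-style server; the /todos endpoint and port are made up), a plain REST endpoint is trivially scriptable with curl:

  // Hypothetical todo endpoint (Node/Express). Because it accepts a
  // plain urlencoded POST, it can be scripted without any client JS:
  //   curl -d 'title=Buy milk' http://localhost:3000/todos
  var express = require('express');
  var app = express();
  app.use(express.urlencoded({ extended: false }));

  var todos = [];

  app.post('/todos', function (req, res) {
    var todo = { id: todos.length + 1, title: req.body.title };
    todos.push(todo);
    res.status(201).json(todo);   // machine-friendly response
  });

  app.get('/todos', function (req, res) {
    res.json(todos);              // curl-able listing
  });

  app.listen(3000);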


>it doesn't make much sense to support browsers with it disabled

But that is exactly what you are doing anyway; you are just limiting it to Googlebot for no reason.


Hey, thanks for taking the time to comment.

I certainly could make the titles more useful; in fact, due to your suggestion I've just committed a fix for that.

I'm interested though, why do you think it's useful for users to read meta descriptions? Or the raw text used by spiders?

How does making this a pure JavaScript web app degrade the end-user's experience? I think I can effectively argue the opposite - that JS and client-side rendering makes for a much better experience.


It's slower to get to a first loaded, usable application on the client side.

If we were to take two web apps, one using a rendr-style-render-on-server approach ( https://github.com/airbnb/rendr ), and one using a blank-html-bootstrap-through-js approach, the rendr-style app will win out for time-to-first interaction.

To take a specific example, loading the monocle home page gives a base html time of 355ms for me. setup.js takes another 570ms.

All told, it's an initial load of 355ms vs 970ms. Or "close to instant" vs "is something wrong? oh no, it's good".


>why do you think it's useful for users to read meta descriptions?

It isn't, really. I made a mistake and thought that every page (served to users and search engines) had the same title and description. A user is unlikely to read a meta description, unless he/she is on a search results page.

>How does making this a pure JavaScript web app degrade the end-user's experience?

NoScript users and special need users are unable to access your website's content.

> JS and client-side rendering makes for a much better experience.

Agreed. Progressive enhancement makes this possible and gets the best of both worlds: accessible content for NoScript users, and a spiffy JS-rendered one-page app for JavaScript users. RMS can even download your pages through Lynx.

If you go pure JavaScript, you can at least add a <noscript> where you explain why you need JavaScript to enjoy this site.


> "special needs users are unable to access your sites content" Sorry I have to call you out on this, ignoring the use of the term "special needs", how could this possibly affect someone's use of JavaScript?


I don't understand your question. Is "special needs" a term I shouldn't use? I got it from Wikipedia/accessibility.

  [Accessibility] often focuses on people with disabilities or special needs.


Having the same meta description on every page could lead to incorrect SERP snippets. See also https://support.google.com/webmasters/answer/35624?hl=en

Though such an obvious error is hopefully always ignored.


For an article that aims to "put misconceptions to rest", it's pretty damn short.

My companies do website/CMS services for small businesses, so SEO is WAY more important to us than it is to an app or a content aggregator like Monocle. If our clients suspect SEO sucks for our product, they will leave. Conversely, if our product has a reputation for good SEO, it can drive a lot of business to us.

Also, Monocle.io isn't setting the title tag for any of their URLs, so that's a pretty poor example to use wrt SEO.

I'd like to see a real, in-depth article that discusses the following:

- At this point, should I use hash-fragments or pushState?

- Which front-end JS framework (backbone, ember, angular, etc) has the best support for SEO features out of the box?

- Is Rails 4 + Turbolinks SEO-friendly?

- I'd love to see some kind of experiment/example showing that a JS/hash_fragment based site can actually rank well when competing against basic HTML sites. I know that SEO comes down to content and links (more or less), so experiments like that are hard/impossible. I just used to do a lot of SEO for Flash sites back in the day. In the end, you could only do so much, and I worry that doing SEO for JS sites is similar.

Just because Google provides the hash-fragment feature doesn't mean they don't give such sites less weight when ranking.


Monocle.io is not getting indexed correctly.

see:

https://www.google.com/search?q=%22Node+has+had+great+succes...

Notice how the monocle.io link is totally useless. Over time, this will get you killed by Google as they realize your domain is returning useless results.


Yeah, it's only failing because I list the summary of the posts on the index page, and Google isn't picking up the real link.

It'll take a bit more time for Google to index the rest of the site, but I've removed the summary from the index page so it shouldn't get picked up by this.

[1] - http://monocle.io/?_escaped_fragment_=


Any idea why it's failing here? The crawler is successfully fetching the content, but not associating it with the correct URL. Maybe Google's crawler isn't pushState-aware?


So he starts with a JavaScript-only application, then retrofits a progressive-enhancement layer behind a Google-specified query string parameter.

He could just as easily have done the core experience first with HTML, got a URL structure that is friendlier and RESTful, and then enhanced it with the JavaScript he needs to create the perception of a one-page website.

The bonus of doing it this way is clean URLs for each piece of indexable content.

Because, really, what's the advantage of http://example.com/?_escaped_fragment_=about-us over http://example.com/about-us ?


I will save you a lot of pain: just don't do it. The __escaped_fragment__ is the most idiotic recommendation Google has ever given. Basically you have to render two views: the user-facing JS view and a server-side rendered basic HTML __escaped_fragment__ view. Oh yeah, the __escaped_fragment__ view will never be visited by your users, nor by yourself - just by Googlebot.

And now guess: which view will be less maintained, out of date and regularly broken?

Why? When there is no direct feedback, there is no direct feedback.

If you want to do SEO and also go down the "but it's faster with JS" road, just do progressive enhancement and history.pushState. The __escaped_fragment__ spec is a leftover from the ajaxy Web 2.0 times, and even then it was a bad idea.
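
A minimal sketch of that progressive enhancement + history.pushState approach (the #content container, the X-Partial header and the use of fetch are just illustrative choices of mine): the server renders full HTML at normal URLs, and this script only speeds navigation up for capable browsers.

  // Plain <a href="/posts/123"> links work everywhere; capable
  // browsers get an enhanced, partial-reload navigation instead.
  if (window.history && history.pushState) {
    document.addEventListener('click', function (e) {
      var link = e.target.closest && e.target.closest('a[href^="/"]');
      if (!link) return;
      e.preventDefault();
      fetch(link.getAttribute('href'), { headers: { 'X-Partial': '1' } })
        .then(function (res) { return res.text(); })
        .then(function (html) {
          // #content is a placeholder for whatever container you re-render
          document.querySelector('#content').innerHTML = html;
          history.pushState({}, '', link.getAttribute('href'));
        });
    });

    // Back/forward buttons: the simplest possible handling is a full reload.
    window.addEventListener('popstate', function () {
      location.reload();
    });
  }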


I find it a little odd that this article focuses on the hash fragment approach and only mentions the HTML5 pushState in passing, and how to avoid it. There are a few scenarios where the hash fragment is more useful (states that don't map well to URLs), but pushState has the huge benefit of looking natural AND working in non-JS browsers in general.

I think it would be good to mention Sinatra in the title.


I focus on pushState throughout the article. Unfortunately, to spider a website that uses pushState you have to use a Google spec that was originally designed for the hash fragment.
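
For reference, that spec works by adding a meta tag to pushState pages; the crawler then re-requests the page with the _escaped_fragment_ query parameter (example.com is a placeholder):

  <!-- Opt a pushState page into Google's AJAX crawling scheme -->
  <meta name="fragment" content="!">
  <!-- Googlebot then requests: http://example.com/posts/123?_escaped_fragment_= -->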


No, that's wrong. You just use links to /the/normal/looking/URL and let the browser use JS instead when it can. And obviously you serve the real content at /the/normal/looking/URL instead of behind the obscure hash fragment method.


People should keep in mind that you do not always have the benefit of green-field development; sometimes you have a project already written that does not have much of a server-side component, and has no budget or time left for adding real pages / doing graceful degradation (much less progressive enhancement). In these cases, spidering your own site is pretty much your only choice. Some notes/recommendations:

- I'd recommend PhantomJS (there are some other packages built on top of it, but for my custom needs, using the original was better)

- If you spider a whole site, especially if it's somewhat complicated, log what you're spidering and see if and where it hangs. I started getting some PhantomJS hangs after ~100 URLs. In this case, it can be a good idea to do multiple spidering runs using different entry points (I use command line options to exclude certain URL patterns I know were spidered during previous runs)

- If you're spidering sites using script loaders (like require.js), pay careful attention to console errors; if you notice things aren't loading, you may have to tweak your load timeout to compensate. Using a "gnomon" (indicator) CSS selector is very helpful here (see the sketch at the end of this comment).

- Add a link back to your static version for no-JS people in case Google/Bing serves up the static version. This only seemed to be a problem shortly after spidering, but it's worth doing regardless (later, search engines seemed to start serving the real version)

- For those wondering how to keep the static version up-to-date, use a cron job, then cp/rsync/whatever your latest version to your "static" directory.

One thing I'd like to add is that I wish PhantomJS would support more of the things that node.js does (since some of its API is modeled after it), particularly synchronous versions of many functions. That aside, it's an incredibly useful piece of software.
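
For anyone curious, here is a rough sketch of such a snapshot script (the output path, selector and timeout are all made up, and it assumes a snapshots/ directory exists; the "gnomon" selector is just an element that only appears once your JS has finished rendering):

  // snapshot.js - run with: phantomjs snapshot.js http://example.com/some/page
  var fs     = require('fs');
  var system = require('system');
  var page   = require('webpage').create();

  var url      = system.args[1];
  var gnomon   = '#content .post';   // placeholder "rendering finished" selector
  var deadline = Date.now() + 10000; // give up after 10s

  page.open(url, function (status) {
    if (status !== 'success') { phantom.exit(1); }

    (function poll() {
      var ready = page.evaluate(function (sel) {
        return !!document.querySelector(sel);
      }, gnomon);

      if (ready) {
        // Save the fully rendered DOM as a static HTML snapshot.
        fs.write('snapshots/' + encodeURIComponent(url) + '.html',
                 page.content, 'w');
        phantom.exit(0);
      } else if (Date.now() > deadline) {
        console.error('Timed out waiting for ' + gnomon + ' on ' + url);
        phantom.exit(2);
      } else {
        setTimeout(poll, 200);
      }
    })();
  });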


I made a web service that will do javascript SEO for you. Check it out at BromBone.com.

We render your entire page for you and save it as HTML on a CDN. Then you can just do the simple routing described in this article, but send Google the page from our CDN instead. That way Google sees the exact same thing as your users, but you don't have to code it again.
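
The routing itself can be a small piece of middleware along these lines (an Express-style sketch of my own; the CDN URL is a placeholder, not BromBone's actual API):

  // If the request carries _escaped_fragment_, serve the pre-rendered
  // snapshot instead of the JS bootstrap page. Everything else gets
  // the normal client-side app.
  var express = require('express');
  var request = require('request');   // any HTTP client would do
  var app = express();

  app.use(function (req, res, next) {
    if (req.query._escaped_fragment_ !== undefined) {
      // Hypothetical snapshot location: your own store or a service's CDN.
      var snapshotUrl = 'https://cdn.example.com/snapshots' + req.path;
      request(snapshotUrl).pipe(res);
    } else {
      next();
    }
  });

  app.get('*', function (req, res) {
    res.sendFile(__dirname + '/public/index.html');   // JS app shell
  });

  app.listen(3000);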


> BromBone uses a real web browser to download your web pages. We run all that fancy javascript, make all your AJAX calls, and save the result.

Makes me wonder why Google itself doesn't do it like that. Maybe Google should buy BromBone...


> Makes me wonder why Google itself doesn't do it like that.

I do believe Google does do it like that.

At least, when I had a Twitter search widget on my site, Google indexed the content in that widget.


Does it matter that the JS-constructed HTML does not look anything like the spider-friendly version?

We're about to deploy an AngularJS application that is using PhantomJS to generate the spider-friendly content on the server. I'd much prefer to do this simpler method if it works just as well.


We're building a small-ish site now with Backbone where we do something similar with PhantomJS. In our `grunt build` task, PhantomJS saves static versions of subpages. Each of those pages loads up the same Backbone app, but the user (and search engine) sees the content they would expect to see no matter where they land – without waiting for Backbone to load and run through its router. And it didn't take much time to set up.

Progressive enhancement is "The Right Way", for sure, but there are some projects where we just aren't concerned with targeting users without Javascript. That said, hopefully this will let us cater to many of those users, reap the SEO benefits, and provide a better first-landing experience for those not hitting the home page.

Edit: This is the grunt plugin we are using for this: https://github.com/cburgdorf/grunt-html-snapshot


Google's spiders prefer that you not try to 'detect' them and serve different content up to them. However, AFAIK they're fine with this technique (I mean, they even provide a spec for doing so). See their content guidelines on 'Hidden text and links' [1]

> However, not all hidden text is considered deceptive. For example, if your site includes technologies that search engines have difficulty accessing, like JavaScript...

[1] - https://support.google.com/webmasters/answer/66353?hl=en


I don't believe so. I did the same as you (except with a Backbone app), and used an .htaccess to detect the fragment and serve up pages generated with PhantomJS.


For AngularJS apps, you are welcome to try (and improve!) AngularSEO https://github.com/steeve/angular-seo (based on PhantomJS)



So, basically, the article says that in order to have good SEO with full-JS apps, you must also expose a non-JS app?

Like, you have to implement 2 sets of templates - one "classic", HTML-only, and one for JS?


https://www.cityblis.com/ I implemented the <noscript> solution, which actually shows the same content (without dynamic positioning) to the users and allows them to go to non-JavaScript versions of the pages. With JS on, I serve dynamically positioned content with scrolling.

I also have an implementation for the search, but it's not pushed yet. It doesn't paint the first page, but only provides pagination for the users without JS.

What do you think of this kind of a solution?


As a NoScript user, I'd like to see an explanation of why I should enable JavaScript on your site. If you put a link to an explanatory page along with the "please enable JavaScript for a better experience" header, I think that would be useful.

Of course, getting the explanation down is going to take some effort: you don't want to be condescending, and you do want to give a meaningful explanation - perhaps with pictures, animated gifs? - but you don't want to make it into a dissertation either.


Would just adding a line like "This site uses AJAX for a large part of its functionality" work or would you prefer a more in-depth description?


I would want to know what I was getting in trade for the increased risk of enabling javascript. Your proposed message isn't substantially different from the current message.

NoScript users are going to be more savvy than average users, but they are still users, so if you are going to make any effort to inform them, make sure it's from the perspective of a user rather than a developer. Developers care about AJAX; users only care about what they see on the screen.


But I'm a developer, so I don't know what the best wording for this is. The point is, some of the functionality of the site is broken when not using JavaScript, because it's AJAXed and there is no <form> fallback with POSTs and so forth (that's not in the development budget).


You charge extra to do it properly?


I only have so many hours in the day; I already got assigned to a different project, and they told me they don't have time to support less than 5% of users.


You ask for permission to do your job properly?


Personally, I only serve client-side rendered content to browsers that are in my whitelist; everything else gets the server-rendered content.

http://chadscira.com/ (set your user-agent to blank)
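
A crude sketch of that kind of whitelisting (the regex, route and view name are placeholders, it assumes a configured view engine, and user-agent sniffing is notoriously brittle):

  // Serve the client-rendered app only to whitelisted browsers;
  // everything else (blank or unknown UAs, bots) gets server-rendered HTML.
  var express = require('express');
  var app = express();

  var MODERN_UA = /Chrome|Firefox|Safari/i;   // illustrative whitelist

  app.get('/posts/:id', function (req, res) {
    var ua = req.headers['user-agent'] || '';
    if (MODERN_UA.test(ua)) {
      res.sendFile(__dirname + '/public/index.html');   // JS bootstrap shell
    } else {
      res.render('post', { id: req.params.id });        // server-rendered view
    }
  });

  app.listen(3000);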


I wrote about using the same technique with a headless browser a while ago.

http://backbonetutorials.com/seo-for-single-page-apps/

Works nicely...



