I've actually written a Ruby library modeled on Readability (quite closely modeled - I read the source code of the bookmarklet and based my library on what I learned) that is excellent for screen scraping - like Readability, it is pretty good at finding the element on a site that holds the content, and once you have that, it's trivial to pull the content out.
If that sounds useful to you, let me know, I can probably open-source it.
I did something similar in Java, though it was for a company so I can't open source it. FYI, I noticed that the script changed drastically between (roughly) December and January. The new script works a lot more reliably in my experience. It now has a multi-pass algorithm that relaxes various criteria if it can't find anything with the strictest settings. It also looks for content DOM nodes and assigns points to parent and grandparent DOM nodes. It used to only assign points to the parent, which would give the wrong results in some cases.
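For anyone who wants to play with the idea, here is a rough Python sketch of the two things described above: points going to both the parent and the grandparent of each content paragraph, and several passes that relax the criteria when the strict pass finds nothing. This is not the bookmarklet's actual code. The point values, the half credit for the grandparent, the length thresholds, and the function names `score_candidates` and `find_content` are all assumptions made up for illustration, and it uses `xml.etree.ElementTree`, so it only handles well-formed markup.

```python
import xml.etree.ElementTree as ET

def score_candidates(root, min_text_len):
    """Score containers by the paragraphs they hold (weights are made up)."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    scores = {}
    for p in root.iter("p"):
        text = "".join(p.itertext()).strip()
        if len(text) < min_text_len:
            continue  # too short to count as content on this pass
        # Hypothetical scoring: base point, one per comma, bonus for length.
        points = 1 + text.count(",") + min(len(text) // 100, 3)
        parent = parent_of.get(p)
        if parent is None:
            continue
        scores[parent] = scores.get(parent, 0) + points
        grandparent = parent_of.get(parent)
        if grandparent is not None:
            # Assumed weighting: the grandparent gets half credit.
            scores[grandparent] = scores.get(grandparent, 0) + points / 2
    return scores

def find_content(root, thresholds=(80, 25, 0)):
    """Try the strictest threshold first, then relax until something scores."""
    for min_text_len in thresholds:
        scores = score_candidates(root, min_text_len)
        if scores:
            return max(scores, key=scores.get)
    return None

# A page with a nav block and a nested article, and a page with only short text.
long_p = "<p>" + "word, " * 30 + "</p>"
page = ET.fromstring(
    '<html><body><div id="nav"><p>Home</p></div>'
    '<div id="wrap"><div id="article">' + long_p + long_p + "</div></div>"
    "</body></html>"
)
short_page = ET.fromstring(
    '<html><body><div id="only"><p>Short, but real.</p></div></body></html>'
)
print(find_content(page).get("id"))
print(find_content(short_page).get("id"))
```

On the first page the strict pass already picks the inner article div over its wrapper, since the wrapper only gets half credit. The second page only yields a result once the threshold has been relaxed all the way down.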
In any case, I was just thinking that I would really like to get a Python library that does the Readability thing for a personal Google App Engine project I have in mind. If anyone knows of anything, I'd love to save some time. Otherwise, I'll probably start from Beautiful Soup and try porting Readability on top.
Your Ruby code might also be useful if you end up open-sourcing it.
I use it so often that I have it bound to a Quicksilver hotkey. I must use it at least 5 times a day.
Even in the cases where the layout isn't particularly bad, the consistency of reading things formatted in a way you're used to makes stuff easier to read.
Until today I always used to CTRL+Scroll and highlight text... NO MORE :)
I have it on my FF bar now... would love a feature to change the style while looking at the page. Different pages need different reading styles for me.
There are so many websites with tiny text and colours with really poor contrast, especially people's blogs about their business/start-up experience, I've found.
Most of them are just about readable on my 24" iMac, but on my tiny little netbook, if it wasn't for Readability I'd miss out on them.
Of course, in an ideal world we wouldn't need Readability as much because people would consider small screen sizes and poor eyesight when picking the site design.
I'm trying to apply these usability lessons (based on my multiple-times-daily use of Readability + Instapaper) to my apps.
There's a lot to be said for clear, large, readable, high-contrast text, either where there's a big block of text, or a critical label used for skimming the layout of the page. If you want your users to be able to quickly find the element on a page that's important to them, give it a big fat text label.
I sometimes give advice to the web editor at a consumer magazine. Their body text is small, tightly spaced, and low contrast. As a result, their average time-on-page and bounce rates are quite depressing. Unfortunately, as is the case in many large orgs, making the content (remember, they are in the content business) more accessible to the site's visitors is not necessarily high on the larger organization's priority list (they seem to be interested only in pageviews, which boost their ad inventory so they can qualify for larger ad networks).
I saved the bookmarklet with the keyword 'read', so when I'm on a site that could benefit from it (which happens a lot) it's a simple Ctrl-L read <return> away.
I use this as well, but I wish someone would do a Chrome extension that applies Readability without needing an additional click (for example, it could display a tiny clickable icon after every link, which would open the article in Readability mode).
Question: Do you know if Readability can be used programmatically? I would like to have a script that would automatically save a Readability'd version of a web page.
IMHO it would be pretty burdensome to have an additional element after every link, but the aesthetics probably depend on how often you're using Readability.
As for your question, you can inject <script> tags pointing to Readability into the DOM. If you want to do it without the browser, WebKit is your friend.
A couple of months ago I coded a server-side version of the Readability bookmarklet. I built it so that you can link to a page and have it converted by the script. http://cold-sunrise-39.heroku.com/
There's also a bookmarklet there so you can easily process URLs.
Most browsers let you control font size, and unless the site design is frozen with terrible margins, you can fix that by resizing the window. It still offers some help with the font and color scheme, but I guess I need to play with it more to understand the value.
A similar little project I undertook recently: http://purepistos.net/thankful-eyes Still needs polish, but I use it often instead of browser zooming or Readability. Feedback or patches welcome.
Maybe not everyone is aware so I want to point out these issues:
1. Don't use it if the page is over SSL (it'll include external JS over HTTP, which means that you are vulnerable to MITM)
2. Don't use it if the website carries "sessionid"s in the URL
3. Keep in mind that arc90's JS can actually read your cookies (I'm not saying they do, but they can). That means if someone hacks into their systems, they can get access to your cookies on the websites where you use it (think XSS). Obviously, by using it you trust the Instapaper guys with your account on the active website.
The developers of Readability should point out these security issues clearly on their website.
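If you want a quick sanity check for the first two points before running any third-party bookmarklet, something like this Python sketch could work. It is only a heuristic: the function name `bookmarklet_risks` and the list of session parameter names are my own guesses, not anything Readability provides, and it cannot tell you anything about point 3.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed list of common session parameter names; extend it for your own sites.
SESSION_PARAM_HINTS = ("sessionid", "session_id", "sid", "phpsessid", "jsessionid")

def bookmarklet_risks(url):
    """Return a list of warnings for running an HTTP-loaded script on this URL."""
    parts = urlsplit(url)
    risks = []
    if parts.scheme.lower() == "https":
        # Point 1: the injected script would arrive over plain HTTP.
        risks.append("page is over SSL but the script is fetched over HTTP")
    names = [name.lower() for name, _ in parse_qsl(parts.query, keep_blank_values=True)]
    # Some servers put the session in the path instead: /page;jsessionid=...
    in_path = ";jsessionid=" in parts.path.lower()
    if in_path or any(name in SESSION_PARAM_HINTS for name in names):
        # Point 2: the URL itself would leak the session.
        risks.append("URL appears to carry a session identifier")
    return risks

for candidate in (
    "http://example.com/article",
    "https://bank.example/account?sessionid=abc123",
    "http://example.com/page;jsessionid=XYZ",
):
    print(candidate, bookmarklet_risks(candidate))
```

An empty list only means neither of these two patterns matched, not that the page is safe.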