The web is now twenty and browsers are still incapable of something as basic and commonplace as hyphenation and justification. It’s a real shame that this problem has to be solved with JavaScript in 2010.
I've been asking for years why browsers do not have this. The only reply I've gotten cites performance considerations, which is a bad answer for several reasons.
Internet Explorer actually has this through the (almost standardized) text-justify CSS property. It still doesn't do hyphenation, but Hyphenator.js (http://code.google.com/p/hyphenator/) fills that gap pretty nicely.
Performance isn't a good argument in my opinion. The algorithm isn't that expensive. The most expensive part right now is retrieving all the text metrics, but you would get those a lot more cheaply inside the browser's rendering engine.
I briefly looked at hacking it into WebKit, but then gave up due to a lack of time.
I also looked into it recently, and Damon from the GNOME project attempted it in 2002 or so. Also, Adobe and Google are trying to get it into WebKit. The problem (as far as I can tell) is that people have tried to do it all at once, and pushing such a large chunk of code upstream is very hard.
I suspect that if you take it in pieces: first get a decent hyphenation algo into Pango, then get that into FF and WebKit, then work on the line-breaker, and then get a new CSS rule approved by the W3C... well, maybe you could get it done in 3 or 4 years.
One other thing. This algorithm, unless I'm missing something, doesn't handle situations in which the available widths are different for different lines in the paragraph, and in particular in which the available width for a line depends on the precise break positions and vertical alignment results of all the earlier lines in the paragraph. Handling this is required to correctly handle CSS floats. Greedy line-breaking does this by the simple expedient of fully laying out all previous lines in the paragraph before considering the next line.
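To make the greedy approach concrete, here is a minimal sketch, not taken from any browser. It assumes word widths are already measured, and `availableWidth` is a hypothetical callback standing in for the layout engine's float-aware query; it is handed every finished line before the next line is filled.

```javascript
// Greedy (first-fit) line breaking with a per-line available width.
// wordWidths: measured width of each word; spaceWidth: width of one space.
// availableWidth(finishedLines): hypothetical callback returning the width
// available to the line currently being filled, given all earlier lines.
function greedyBreak(wordWidths, spaceWidth, availableWidth) {
  const lines = [];   // each finished line is an array of word indices
  let current = [];   // word indices on the line being filled
  let used = 0;       // width consumed so far on the current line
  for (let i = 0; i < wordWidths.length; i++) {
    const w = wordWidths[i];
    const available = availableWidth(lines);
    const needed = current.length === 0 ? w : used + spaceWidth + w;
    if (current.length > 0 && needed > available) {
      // The word does not fit: finish this line and start the next with it.
      lines.push(current);
      current = [i];
      used = w;
    } else {
      // The word fits (or is the first word, which is placed even if too wide).
      current.push(i);
      used = needed;
    }
  }
  if (current.length > 0) lines.push(current);
  return lines;
}

// Example: a float narrows the first two lines to 70 units; later lines get 110.
const floatAware = (finished) => (finished.length < 2 ? 70 : 110);
console.log(greedyBreak([30, 30, 30, 30, 30, 30, 30], 10, floatAware));
// → [ [ 0, 1 ], [ 2, 3 ], [ 4, 5, 6 ] ]
```

Because each line is final before the next one starts, the callback always sees settled geometry. An optimizing breaker has no such guarantee, since changing an early break can change the widths of every later line.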
It's quadratic in the length of the paragraph, no? Not a problem for most reasonable text chunks, but browsers have to deal with unreasonable text too. In particular, O(N^2) algorithms in browser layout are generally unacceptable...
But it would only apply when the web page author "opted in" with the appropriate CSS, no? Doesn't seem like it should affect performance on pages that don't use the feature.
If you made it an opt-in, that might be doable... though there would still be the danger of pages cargo-culting into the opt-in.
But at that point you're also asking browsers to maintain two separate line-wrapping codepaths, of which one is not used anywhere to a first approximation. Browser vendors seem to be somewhat resistant to doing that sort of thing.
It could maybe be switched off (even if requested) once a paragraph hits a certain threshold size. Of course I suppose that could get complicated as the paragraph gets mutated by JavaScript... you don't want to be turning it on and off all the time.
English hyphenation rules are very simple and relaxed by continental standards. It is about the only language where such a simple and straightforward algorithm as the one in TeX (it doesn't even contain a full list of word stems and a rule engine for the (de)construction of composite words) can work.
>> English hyphenation rules are very simple and relaxed by continental standards.
Really? From my experience, ESL students often don't understand the logic behind English syllables.
My point, though, is that if different languages hyphenate differently and we're talking about sites with international user-generated content (Facebook or Orkut come to mind), then it's not exactly trivial to hyphenate correctly.
Zooming is problematic, and I haven't quite figured out a reliable way of fixing it yet. The problem is most apparent in WebKit-based browsers; Firefox seems to handle it much better (though there are still some small problems, most of which can be fixed).
Yes, this is an application of dynamic programming. The computation is actually quite fast; most of the time is spent retrieving the text metrics (put each word in a span, retrieve its width, and move on to the next word). If that cost can somehow be reduced, it would become a feasible solution.
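One possible way to reduce the measuring cost, offered only as a sketch and not as what the demo does, is to measure each distinct word once and reuse the result. `makeCachedMeasurer` and `measureWord` are hypothetical names; in a browser, `measureWord` would be the span-based measurement described above.

```javascript
// Wrap an expensive per-word measuring function so that every distinct
// word is measured only once. measureWord is supplied by the caller.
// In a browser it might look like (an assumption, not the demo's code):
//   (word) => { span.textContent = word; return span.offsetWidth; }
function makeCachedMeasurer(measureWord) {
  const cache = new Map(); // word -> measured width
  return function measure(word) {
    if (!cache.has(word)) {
      cache.set(word, measureWord(word));
    }
    return cache.get(word);
  };
}

// Example with a stand-in measurer that charges 7 units per character
// and counts how often it is really called.
let calls = 0;
const measure = makeCachedMeasurer((word) => { calls += 1; return word.length * 7; });
const widths = ["the", "cat", "saw", "the", "cat"].map(measure);
console.log(widths, calls); // five widths, but only three real measurements
```

The cache is only valid for a single font, size and zoom level, so it would have to be discarded whenever any of those change.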
I agree with you that browsers should support this natively (Internet Explorer actually does.) If you are interested, I've written a bit on this subject in this Typophile thread: http://typophile.com/node/71247
Oh, I was wondering if it was just me. A few lines seem to have hanging right margins. This is probably because my morning lean-back browsing is done at +2 zoom-factor in Safari.
I recall that the first interesting use of dynamic programming my professor showed us was this one (well, he described it rather than taught it). He basically just said that all you do is take the sum of the squared or cubed extra space on each line and minimize it using a DP. I thought it was pretty cool that this yields great-looking text.
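For anyone curious, the idea fits in a few lines. This is a simplified sketch of the squared-slack formulation, not the full Knuth-Plass algorithm: it assumes fixed-width spaces, a single line width, no hyphenation, and (a common convention, assumed here) no penalty for the last line. The function name is made up.

```javascript
// Choose line breaks that minimize the sum of squared leftover space.
// wordWidths: measured word widths; spaceWidth: width of one space;
// lineWidth: target width shared by every line.
function breakLines(wordWidths, spaceWidth, lineWidth) {
  const n = wordWidths.length;
  // best[i]: minimal cost of setting words i..n-1, starting a line at word i.
  // next[i]: index of the word that starts the following line.
  const best = new Array(n + 1).fill(Infinity);
  const next = new Array(n + 1).fill(n);
  best[n] = 0;
  for (let i = n - 1; i >= 0; i--) {
    let used = 0;
    for (let j = i; j < n; j++) {
      used += wordWidths[j] + (j > i ? spaceWidth : 0);
      if (used > lineWidth && j > i) break; // words i..j no longer fit
      const slack = Math.max(lineWidth - used, 0);
      const lineCost = j === n - 1 ? 0 : slack * slack; // last line is free
      const cost = lineCost + best[j + 1];
      if (cost < best[i]) {
        best[i] = cost;
        next[i] = j + 1;
      }
    }
  }
  // Walk the recorded choices to recover [start, end) word ranges per line.
  const lines = [];
  for (let i = 0; i < n; i = next[i]) lines.push([i, next[i]]);
  return { cost: best[0], lines };
}

// "aaa bb cc ddddd" at width 6: greedy gives "aaa bb" / "cc" / "ddddd"
// (cost 16), while the DP prefers "aaa" / "bb cc" / "ddddd" (cost 10).
console.log(breakLines([3, 2, 2, 5], 1, 6));
// → { cost: 10, lines: [ [ 0, 1 ], [ 1, 3 ], [ 3, 4 ] ] }
```

The two nested loops are where the worst-case quadratic behaviour comes from, although the early `break` keeps the inner loop short whenever lines hold only a handful of words.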
I have the same Chrome version, but I'm running Ubuntu. Perhaps it is a font issue. I will check it out later when I have access to a Mac. Thanks for reporting.
Great work! The output is beautiful. Looking forward to this being a standard library in the future. Do you know how well it might handle cases of text that already has some formatting?
Also, it blows up in IE9 for me: many of the lines go on for quite a ways.
I would also highly recommend "Digital Typography" by Knuth. It contains a more detailed description of the algorithm as well as many interesting historical and technical chapters on TeX, MetaFont and computer typesetting in general.
It looks really good in Firefox! However, I just tried printing the page and though it still looks decent, the justification is a lot worse than when rendered on-screen.
I honestly hadn't thought about printing it out yet. I had a quick look, and I think the issues you are seeing can probably be fixed. My initial plan was to render PDFs server-side, instead of relying on the browser's print mode.
As far as I know, this is the only implementation in JavaScript. It might be possible to turn this into a jQuery plugin at some point, but there are probably quite a few bug fixes and changes needed to turn this from a tech demo into a drop-in plugin.
How old is TeX again?