Is there a reason browsers don't have decent line breaking algorithms yet that make paragraph text look better than the current "left" and "justified" CSS alignments? I miss the way LaTeX lays out text.
Also, maybe I missed it, but are these algorithms trying to find the best line-breaking solution? How much more efficient can you make this if you only look for "good enough"?
The reason efficient DP algorithms exist is that the evaluation function you're trying to minimize decomposes over the decisions (of where to put each line break). Rivers, on the other hand, can only be evaluated once all the spaces and breaks have been placed.
Similar issues come up in machine learning for sequence tagging or natural language parsing, where the loss function does not always decompose over the sequence of decisions. The statistical models used there (which replace Viterbi-like DP algorithms) try to compensate with reinforcement learning or decomposition tricks, and it works quite well.
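To make the decomposability point concrete, here's a minimal sketch in Python, with a made-up badness measure: because the total cost is a sum of independent per-line terms, the cheapest way to set words[i:] depends only on where the current line ends, which is exactly what memoization exploits. A river penalty has no such structure.

```python
from functools import lru_cache

WIDTH = 40  # target line width in characters (illustrative)

def badness(words, i, j):
    """Cost of setting words[i:j] as one line (a made-up measure)."""
    length = sum(len(w) for w in words[i:j]) + (j - i - 1)  # words + spaces
    return float("inf") if length > WIDTH else (WIDTH - length) ** 2

def min_total_badness(words):
    @lru_cache(maxsize=None)
    def best(i):                      # cheapest way to set words[i:]
        if i == len(words):
            return 0
        return min(badness(words, i, j) + best(j)   # per-line terms just add up
                   for j in range(i + 1, len(words) + 1))
    return best(0)

# def river_penalty(finished_layout): ...
# No per-line decomposition exists: whether spaces align vertically depends
# on every break and every glue width at once, so DP doesn't apply directly.
```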
> there's no need to look for good enough, there's already a nice n log n solution.
How efficient is that under local changes? E.g., when the user (or JavaScript) changes one word in a long paragraph, does the algorithm require re-analyzing the whole paragraph?
In the worst case, yes, since the edit might change the number of words that fit on that line, which would mean exchanging words with the line before or after it, leading to a cascading effect.
However, you might find that many of the old line breaks can stay as they are, so I'd expect reflowing to be faster in practice.
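As a sketch of that intuition, using a simple first-fit breaker rather than the optimal algorithm (the function names and the greedy strategy here are mine, not anything a browser actually ships): greedy breaking is memoryless, so as soon as a recomputed break lands on an old break past the edit, the entire old suffix can be reused.

```python
def greedy_breaks(words, width, start=0):
    """Indices of the words that start each new line, first-fit from `start`."""
    breaks, line = [], 0
    for i in range(start, len(words)):
        w = len(words[i])
        if line and line + 1 + w > width:  # word doesn't fit: break before it
            breaks.append(i)
            line = w
        else:
            line += w + 1 if line else w
    return breaks

def reflow_after_edit(words, width, old_breaks, edited_word):
    """Recompute only from the line containing the edit; splice the old
    suffix back on once a new break coincides with an old one."""
    start = max((b for b in old_breaks if b <= edited_word), default=0)
    prefix = [b for b in old_breaks if 0 < b <= start]
    old = set(old_breaks)
    new = greedy_breaks(words, width, start)
    for k, b in enumerate(new):
        if b > edited_word and b in old:   # breaks have re-converged
            return prefix + new[:k + 1] + [x for x in old_breaks if x > b]
    return prefix + new                    # worst case: cascaded to the end
```

With the reverse-pass memo table of the optimal algorithm the same idea is even cleaner: if only word e changed (and the width is uniform), the table entries for indices after e are still valid and can be reused outright.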
I'm thinking the algorithm then won't be able to provide a responsive UI in all cases, so perhaps browsers should allow a temporarily suboptimal rendering.
Browsers have to worry about things like floats, so that the available width for a line can vary depending on the line-breaking choices made for the lines above it.
You say this as if browsers invented floats. Pretty sure TeX has some notion of floats; I'd have to double-check, as I'm used to LaTeX. These are not new terms in publishing.
Edit: Saw in a later post that TeX doesn't have direct primitives for what are commonly called floats. \midinsert and the like aren't quite the same, and while \parshape is hella powerful, it isn't the same either. I'm still skeptical that these aren't powerful enough, and I'm still fairly certain LaTeX does have floats, but I was wrong about TeX.
> Is there a reason browsers don't have decent line breaking algorithms yet that make paragraph text look better than the current "left" and "justified" CSS alignments? I miss the way LaTeX lays out text.
The CSS model supports floats, while TeX doesn't. Floats make the problem much more difficult, because floats can have ceilings anywhere within a block, and placing a float reduces the amount of space available for an arbitrary number of lines below it. To make matters worse, the spec requires that floats be placed as high as possible, so browsers aren't free to move the floats around to try to minimize raggedness.
Parshape isn't as expressive as the CSS float model, because the entire shape of the paragraph has to be specified up front. CSS floats, though, effectively change the shape of the block as the block is laid out.
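A sketch of why that complicates the dynamic program (made-up geometry; width_at below is a stand-in for the real CSS float rules): once the available width depends on which line you're on, the memo state has to carry the line number (really, the y-offset) as well as the word index.

```python
from functools import lru_cache

def width_at(line_no):
    """Available width per line: a hypothetical float intrudes on the
    first three lines (a stand-in for the real CSS float geometry)."""
    return 25 if line_no < 3 else 40

def min_cost(words):
    @lru_cache(maxsize=None)
    def best(i, line_no):             # state: word index AND line number
        if i == len(words):
            return 0
        w, out = width_at(line_no), float("inf")
        for j in range(i + 1, len(words) + 1):
            length = sum(map(len, words[i:j])) + (j - i - 1)
            if length > w:
                break                 # adding words only overflows more
            # (a single over-long word yields inf here; a real engine
            # must still place it somewhere)
            out = min(out, (w - length) ** 2 + best(j, line_no + 1))
        return out
    return best(0, 0)
```

And in real layout the y-offset depends on line heights, which depend on content, so the state space grows quickly; on top of that, the "floats as high as possible" rule fixes float positions before raggedness can even be considered.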
I'd argue it is just different. CSS floats still really only deal with rectangular shapes. (I could be behind on this point; not something I've cared to do in... well, ever. :( )
Quickly googling for the TeXbook's example of text set in a circle doesn't turn up anything similar for HTML.
Rectangle vs. circle isn't the point of this thread, as I understand it. It's that the calculated width of the text determines the width of a float when the float has no specified width. Here is a fiddle demonstrating what I understood people were talking about regarding floats, although it shows a different thing than the other jsfiddle in this thread: https://jsfiddle.net/1wj3efkm/
The width of the float not being fixed seems like a massive shifting of the goalposts. Unnecessarily so, honestly.
That said, this problem doesn't seem that much harder, honestly. Each float has a maximum size; floats are only smaller if they contain less text.
And see the example in the TeXbook: basically a circular cutout, with text set around a circle, both parts fully hyphenated to fill the outline. I'm on my phone, but I'll try to find an example if it would help.
It's true that there's no feature that directly corresponds to parshape. (This is what the ill-fated CSS Regions spec was for; the problem is that it was far too expressive, to the point of being nearly unimplementable.)
However, I believe you can simulate parshape by stacking line-sized floats, using clear to force each onto its own line. That should give you just as much expressivity as parshape.
I think the advantage TeX has here is only over HTML alone; specifically, that it is a full programming language. Well, mostly full. Add in JavaScript, and I don't see how the web couldn't do these things. Some of the abstractions will make it harder, of course. But some will make it easier, too.
One thing that makes this hard is Unicode and OpenType. Decent line breaking requires hyphenation, which, with complex shaping, requires entire words to be re-shaped for each combination of possible hyphenation points if we want optimal breaks.
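A sketch of that cost, with a stand-in shape function (a real engine would be something like HarfBuzz; shape here is purely hypothetical): because shaping is contextual, the width of "frag-" is not width("frag") plus width("-"), so each candidate hyphenation point forces fresh shaping calls.

```python
def shape(text):
    """Stand-in for a real shaper: returns an advance width. With ligatures
    and contextual forms, the result is NOT a sum of per-character widths."""
    return len(text)  # placeholder metric

def hyphenated_widths(word, points):
    """Widths the line breaker needs for every candidate split of `word`."""
    widths = {}
    for p in points:                          # each possible break point
        pre, suf = word[:p] + "-", word[p:]
        widths[p] = (shape(pre), shape(suf))  # two shaping calls per point
    return widths

# e.g. hyphenated_widths("hyphenation", [2, 5, 7]) shapes three
# prefix/suffix pairs for a single word.
```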
You need more than that. Take a typical LaTeX document and compile it with and without \usepackage{microtype}: the difference is enormous. You need character expansion (small expansions of the characters themselves) to do it.
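A toy illustration of why expansion helps, with made-up numbers (TeX's actual badness is roughly 100·r³ in the adjustment ratio r): letting glyphs stretch a couple of percent adds stretchability beyond the interword glue, so r, and hence badness, drops sharply.

```python
def adjustment_ratio(natural, target, glue_stretch, glyph_width, expand=0.0):
    """How far the line's stretchability must be used to hit the target."""
    stretch = glue_stretch + glyph_width * expand  # expansion adds capacity
    return (target - natural) / stretch

natural, target = 388.0, 400.0      # line widths in points (illustrative)
glue_stretch, glyph_width = 6.0, 360.0

r_plain = adjustment_ratio(natural, target, glue_stretch, glyph_width)
r_expanded = adjustment_ratio(natural, target, glue_stretch, glyph_width, 0.02)
print(r_plain, r_expanded)          # 2.0 vs ~0.91: far less stretched-out glue
```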
Isn't line breaking an NP-complete problem, akin to bin packing? I seem to recall a presentation about it in the context of auto-formatters for code. For example, auto-formatting Java code, which is typically verbose, relies on heuristics to avoid computational snares.
It depends on what bells and whistles you want, but line breaking is "theoretically" quadratic and, for most use cases, basically linear. A real misfortune is that the Knuth-Plass algorithm was published before we had a good grasp of how to present algorithms well. The actual algorithm is basically just a single reverse pass along with a memo table. (This is usually described as "dynamic programming", but I find that term less than useless.)
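Here's a minimal version of that reverse pass with a memo table (simplified: character counts instead of glue, a squared-slack penalty instead of TeX's demerits): the inner loop stops as soon as a line overflows, which is why it's effectively linear on ordinary text.

```python
def break_lines(words, width):
    n = len(words)
    best = [0.0] * (n + 1)          # best[n] = 0: nothing left to set
    nxt = [n] * (n + 1)             # nxt[i] = first word of the line after i
    for i in range(n - 1, -1, -1):  # the reverse pass
        best[i], length = float("inf"), -1
        for j in range(i + 1, n + 1):          # candidate line = words[i:j]
            length += len(words[j - 1]) + 1    # word plus one space
            if length > width and j > i + 1:
                break                          # longer lines only get worse
            penalty = 0.0 if j == n else (width - length) ** 2
            if penalty + best[j] < best[i]:
                best[i], nxt[i] = penalty + best[j], j
    lines, i = [], 0
    while i < n:                    # follow the memoized choices forward
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines

print(*break_lines("the quick brown fox jumps over the lazy dog".split(), 16),
      sep="\n")
```

(The real algorithm also tracks hyphenation points, stretch/shrink glue, and fitness classes to avoid adjacent tight/loose lines, but the shape is the same.)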
Even if it is NP-complete, that would really just limit the size of text for which you could find the absolute best line breaking.
Consider: the traveling salesman problem over the capitals of the US is easily solvable, due to the small size of the problem space. Similarly, I'd imagine the actual problem space for line breaking isn't that large.
Specifically, since line breaking is limited to the paragraph you are looking at, this probably isn't an issue in most cases. Degenerate cases could be detected and the processing capped so it doesn't try too much.
It doesn't actually matter, unless you need an exact solution to the problem.
There are often a number of relatively close and efficient solutions when "close enough" is what you're aiming for.
The independent variables are the width of the line in characters (shown on the horizontal axis) and the length of the text in words (the "depth" axis). The dependent variable, shown on the vertical axis, is the time the algorithm takes to run.
The orientation of the graphs doesn't change, but the scale does, pretty massively, and that seems to indicate the rough efficiency of each algorithm (the time is always in the range of 0.0-1.0 seconds, so it's clear that brute force is many times slower than the more advanced algorithms).