Is there a reason browsers don't have decent line breaking algorithms yet that m...

bitcoinboi9 · on March 3, 2018

It's not just line breaking that makes LaTeX beautiful. It's also space placing too.

Sometimes you can adjust the distance between letters in a word or distance between words to not break to many lines.

The algorithm is similar.

The algorithms do find the best breakage but as you can see, there's no need to look for good enough, there's already a nice n log n solution.

These algorithms work so well that then you get the problem of rivers. https://en.wikipedia.org/wiki/River_(typography)

The reason why efficient DP algorithms exist is because the evaluation function you're trying to minimize is decomposable over the decisions (of where to put a line break). Rivers on the other hand can be optimally evaluated only after you've put your spaces and breaks.

Similar issues are encountered in machine learning for sequence tagging or natural language parsing, where the loss function does not always decompose over sequence of decisions. The statistical models used there (that replace Viterbi-like DP algorithm) try to compensate by reinforcement learning or decomposition tricks. It works quite well.

amelius · on March 4, 2018

> there's no need to look for good enough, there's already a nice n log n solution.

How efficient is that under local changes? E.g., when the user (or javascript) changes one word in a long paragraph, does the algorithm require a re-analysis of the whole paragraph?

yorwba · on March 4, 2018

In the worst case yes, since that might change the number of words that fit in that line, and that would mean exchanging words with the line before or after that, leading to a cascading effect.

However, you might find that many of the old linebreaks can stay as they are, so I'd expect reflowing to be faster in practice.

amelius · on March 4, 2018

I'm thinking the algorithm will then not be able to provide a responsive UI in all cases, so perhaps browsers should allow for a temporary suboptimal rendering.

coretx · on March 3, 2018

The word you're looking for is "Kerning". And no, i'm neither a native english speaker nor a designer.

Tomte · on March 4, 2018

Kerning is intra-word spacing, but LaTeX goes beyond that and (I think preferably) adjusts inter-word spacing, as well.

mjw1007 · on March 3, 2018

Browsers have to worry about things like floats, so that the available width for a line can vary depending on the line-breaking choices made for the lines above it.

There's some discussion at https://bugzilla.mozilla.org/show_bug.cgi?id=630181

taeric · on March 4, 2018

You say this as if browsers invented floats. Pretty sure TeX has knowledge of floats. Would have to double check, as I am used to LaTeX. These are not new terms to publishing.

Edit: Saw in a following post that TeX doesn't have direct primitives for what are commonly called floats. \midinsert and the like aren't quite the same, and while \parshape is hella powerful, it isn't the same either. I'm still skeptical that these aren't powerful enough. And I'm still fairly certain LaTeX does have floats, but I was wrong on TeX.

jacoblambda · on March 4, 2018

Can confirm that LaTeX has floats. They can be a bit finicky to work with but they are there.

taeric · on March 4, 2018

Floats and positioning are easily the most misunderstood parts of the web. So, not sure it matters that they are finicky.

pcwalton · on March 4, 2018

> Is there a reason browsers don't have decent line breaking algorithms yet that make paragraph text look better than the current "left" and "justified" CSS alignments? I miss the way LaTeX lays out text.

The CSS model supports floats, while TeX doesn't. Floats make the problem much more difficult, because floats can have ceilings anywhere within a block, and placing a float reduces the amount of space available for an arbitrary number of lines below it. To make matters worse, the spec requires that floats be placed as high as possible, so browsers aren't free to move the floats around to try to minimize raggedness.

pgtan · on March 4, 2018

FWIW TeX has \parshape

pcwalton · on March 4, 2018

Parshape isn't as expressive as the CSS float model, because the entire shape of the paragraph has to be specified up front. CSS floats, though, effectively change the shape of the block as the block is laid out.

taeric · on March 4, 2018

I'd argue it is just different. CSS floats really only deal with square shapes, still. (I could be behind on this point, not something I've cared to do in... well, ever. :( )

Quickly googling for the example from the TeX-book of the text in a circle doesn't give anything similar for html.

alanh · on March 4, 2018

Rectangle vs circle isn't the point of this thread, as I understand it. It’s that the calculated width of text determines the width of a float when the float has no specified width. Here is a fiddle demonstrating what I understood people were talking about regarding floats, although it is a different thing than the other jsfiddle in this thread is showing: https://jsfiddle.net/1wj3efkm/

Also, everything on the web is a rectangle, but that hasn't stopped people from hacking around this (https://meyerweb.com/eric/css/edge/slantastic/demo.html), and there is even a draft web standard to give shape to text (https://drafts.csswg.org/css-exclusions-1/), but only Microsoft has implemented it (https://caniuse.com/#feat=css-exclusions).

taeric · on March 4, 2018

The width of the float not being fixed seems like a massive goal post shift. Unnecessarily so, honestly.

That said, this problem doesn't seem that much harder, honestly. Each float has a max size. They are only smaller if they had less text.

And see the example in the TeX-book. Basically a circular cutout for text aligned on a circle. Both parts fully hyphenated to fill the outline. I'm on my phone, but will try and find an example, if it would help.

pcwalton · on March 4, 2018

It's true that there's no feature that directly corresponds to parshape. (This is what the ill-fated CSS Regions spec was for; the problem with it is that it was far too expressive to the point of becoming near-unimplementable.)

However, I believe you can simulate parshape by just stacking line-sized floats on top of one another and using clear to stack them vertically. This should give you just as much expressivity as parshape.

See this jsfiddle I threw together quickly (floats in blue to show how the technique works): https://jsfiddle.net/vve89j06/4/

taeric · on March 4, 2018

I think the advantage TeX has here is only over HTML. Specifically, that it is a full programming language. Well, mostly full. Adding in JavaScript and I don't know how the web couldn't do these things. Some of the abstractions will make it harder, of course. But some will make it easier, too.

jahewson · on March 4, 2018

One thing that makes this hard is Unicode & OpenType. Decent line breaking requires hyphenation which with complex shaping requires entire words to be re-shaped for each combination of possible hyphenation points if we want optimal breaks.

ComputerGuru · on March 4, 2018

I think hyphenation is more of a hack that good line breaking can generally avoid.

kccqzy · on March 4, 2018

You need more than that. Find a typical LaTeX document and compile with and without \usepackage{microtype}. The difference is enormous. You need character expansion (small expansion to the characters themselves) to do it.