IMHO a text-based browser isn't exactly in the "challenging" category, as it bas...

zerf · on Dec 21, 2020

To this list I would add "Web Browser Engineering" [0] which is a textbook / browser engine that is currently being written by Dr. Pavel Panchekha at the University of Utah. The code for the book and browser is available on GitHub [1] and a more current bleeding edge draft is also published [2].

The book guides the reader in implementing a graphical web browser, starting with HTTP and HTML, then moving on to layout, the box model, CSS, browser chrome, forms, and scripts.

[0] https://browser.engineering

[1] https://github.com/pavpanchekha/emberfox

[2] https://browser.engineering/draft/

azhenley · on Dec 21, 2020

Thanks, I will add that book to the post! It looks really good.

jmnicolas · on Dec 21, 2020

> IMHO a text-based browser isn't exactly in the "challenging" category, as it basically amounts to [...]

All my projects start with me thinking like that; then, many hours, days, or months later, I find myself thinking "hey, it was more complex than I thought".

For 2021 I want to build a personal finance app for myself. The usual me thinks it will take a couple months. The realist me wonders if it will be finished in this decade :)

ZephyrBlu · on Dec 21, 2020

There's a difference between scope creep and difficulty.

Forge36 · on Dec 21, 2020

It looks straightforward until you hit a couple of edge cases. Examples:

test <1 becomes test 1

Test< 2 becomes test 2

Test <a becomes test

Test < b becomes test b

(From memory)

What about: Test <fakeTag>?

Per tests I did, "test " was expected; however, "test <fakeTag>" was what showed up in the plaintext version, which suggests there is a list of valid tags filtering the behavior.

userbinator · on Dec 21, 2020

That's because '<' needs to be followed by [!/?a-zA-Z] to be recognised as a tag start. Otherwise it is a literal '<'.

The full details are in here somewhere: https://www.w3.org/TR/2011/WD-html5-20110113/tokenization.ht...
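
A minimal Python sketch of that rule, for illustration only: `strip_tags` and `TAG_START_CHARS` are hypothetical names, and this is not the spec's tokenizer. It assumes a tag runs to the next '>' and ignores comments, quoted attribute values, and script content. It follows the rule as stated, so a stray '<' is kept as literal text; it does not try to reproduce the from-memory outputs listed earlier in the thread.

```python
import string

# Characters that may follow '<' for it to count as a tag start,
# per the rule described above: [!/?a-zA-Z].
TAG_START_CHARS = set("!/?") | set(string.ascii_letters)


def strip_tags(html: str) -> str:
    """Drop anything that looks like a tag; keep a stray '<' as literal text."""
    out = []
    i = 0
    while i < len(html):
        ch = html[i]
        if ch == "<" and i + 1 < len(html) and html[i + 1] in TAG_START_CHARS:
            # Assumption: the tag ends at the next '>' (no quoted '>' handling).
            end = html.find(">", i + 1)
            if end == -1:
                break  # unterminated tag: discard the rest of the input
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Under these assumptions, `strip_tags("Test < b")` leaves the input unchanged, while `strip_tags("Test <fakeTag>")` returns `"Test "`, because the sketch has no list of known tag names.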

benibela · on Dec 21, 2020

I have been stuck on these edge cases for almost 15 years while building my own HTML parser.

It always works on all the HTML files I have, but then people make new HTML files with other issues.

throwaway201103 · on Dec 21, 2020

Doing proper table layouts (including rowspans, colspans) is a little more than stripping and replacing tags.
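
To give a rough idea of why spans complicate things, here is a simplified Python sketch that only assigns cells to grid slots. `place_cells` and its input shape (rows of `(text, rowspan, colspan)` tuples) are hypothetical, and it is not the HTML specification's table-forming algorithm: it merely skips slots already claimed by a rowspan from an earlier row and does not handle overlapping spans, column widths, or rendering.

```python
def place_cells(rows):
    """Map each (row, col) grid slot to the text of the cell covering it.

    rows: list of rows, each a list of (text, rowspan, colspan) tuples.
    """
    grid = {}
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip slots already taken by a cell spanning down from above.
            while (r, c) in grid:
                c += 1
            # Claim every slot the cell covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    return grid


# "A" spans two rows, "B" spans two columns.
example = [
    [("A", 2, 1), ("B", 1, 2)],
    [("C", 1, 1), ("D", 1, 1)],
]
grid = place_cells(example)
```

In this example the second row's first cell, "C", lands in column 1 rather than column 0, because "A" still occupies column 0. A text renderer has to know that before it can line up the columns.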