IMHO a text-based browser isn't exactly in the "challenging" category, as it basically amounts to stripping all the HTML tags out and doing some very simple transformations (like replacing <br> tags with newlines). Then again, one of the things I've been working on intermittently for the past few years is a graphical (CSS2+) browser, which is definitely in the challenging category. There are some other public efforts too:
To this list I would add "Web Browser Engineering" [0], a textbook and browser engine currently being written by Dr. Pavel Panchekha at the University of Utah. The code for the book and browser is available on GitHub [1], and a more current, bleeding-edge draft is also published [2].
The book guides the reader through implementing a graphical web browser, starting with HTTP and HTML, then moving on to layout, the box model, CSS, browser chrome, forms, and scripts.
> IMHO a text-based browser isn't exactly in the "challenging" category, as it basically amounts to [...]
All my projects start with me thinking like that; then, many hours, days, or months later, I find myself thinking "hey, it was more complex than I thought".
For 2021 I want to build a personal finance app for myself. The usual me thinks it will take a couple months. The realist me wonders if it will be finished in this decade :)
It looks straightforward until you hit a couple of edge cases.
Examples:
test <1 becomes test 1
test< 2 becomes test 2
test <a becomes test
test < b becomes test b
(From memory)
What about:
Test <fakeTag>?
Per tests I did, "test " was expected; however, "test <fakeTag>" came out as the plaintext version, suggesting there's a list of valid tags that filters the behavior.
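For what it's worth, here's a minimal Python sketch of the naive "strip the tags and turn <br> into newlines" approach from the top of the thread. The tag rule is my own assumption, loosely modeled on how HTML tokenizers treat a stray "<": it only opens a tag when followed by a letter (or "/" plus a letter), and the tag runs to the next ">". strip_tags, TAG and BR are hypothetical names, not from any of the libraries mentioned here, and this won't reproduce the outputs listed above, which came from a different tool.

```python
import re

# Assumption: a tag is "<", an optional "/", an ASCII letter, then anything
# up to the next ">". A "<" followed by a digit or a space is left as text.
TAG = re.compile(r"</?[A-Za-z][^>]*>")

# Plain <br>, <br/> and <br /> become newlines before other tags are dropped.
BR = re.compile(r"<br\s*/?>", re.IGNORECASE)


def strip_tags(html: str) -> str:
    """Return html with line breaks converted and all other tags removed."""
    text = BR.sub("\n", html)
    return TAG.sub("", text)


if __name__ == "__main__":
    for sample in ["<p>Hello <b>world</b></p>", "a<br>b", "test <1", "Test <fakeTag>?"]:
        print(repr(sample), "->", repr(strip_tags(sample)))
```

With this rule "test <1" passes through unchanged, while "Test <fakeTag>?" loses the unknown tag, because nothing here checks against a list of valid tags. Each of those choices is exactly the kind of edge case that has to be decided one way or the other.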
https://github.com/lexborisov/Modest
https://github.com/litehtml
https://github.com/ArthurHub/HTML-Renderer
Along the same lines, some other challenging projects I recommend are to write decoders/renderers for existing formats like MP3, MP4, PDF, etc.