I don't know. It kind of feels like they are replicating real user (developer) behavior by producing lots and lots of weird, low-quality, and not-to-spec code that a parser will likely have to deal with. By doing so they are simply exposing bugs that real users (bad developers) would have triggered anyway. Seems like a totally legit way to test a complex product. No assumptions. Just lots of randomized nonsense that reflects reality.
As a developer I would love to have a browser that strictly follows specs and doesn't deal with any historical compatibility issues. I would focus on making sure my web app works best there, which _should_ give the best compatibility across a wide range of browsers.
I kind of don't buy that argument. The web is not fundamentally different from other programming environments, say Python or Java. It might sometimes be practical to have a Python interpreter accept syntactically invalid input because it kinda knows what you mean anyway, but most programming languages don't work that way, because it makes things harder in the long run and the benefits are pretty minuscule.
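To make the contrast concrete, here's a tiny sketch (plain CPython, nothing beyond the standard library, and the exact error wording varies by version): Python simply refuses malformed source, whereas an HTML5 parser is required to recover and keep going.

```python
# Minimal sketch: Python's compiler rejects malformed source outright,
# rather than guessing what the author "probably meant".
malformed = "print('hello'"  # missing closing parenthesis

try:
    compile(malformed, "<example>", "exec")
except SyntaxError as err:
    print("Rejected:", err.msg)  # e.g. "'(' was never closed" on recent CPython
```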
The problem is that this kind of philosophy is fundamentally incompatible with HTML5.
There was an attempt at a "strict-mode" HTML (XHTML, i.e. HTML as XML), but it failed on the web for various reasons (including IE). HTML5 instead specifies exactly what every browser must do upon encountering tag soup, which is useful because real-world HTML has been tag soup for a very long time.
I guess the strictest thing you can do is to die upon encountering "validation errors", but I don't think this would do much to simplify your job. (Maybe you could drop the adoption agency algorithm?) But now your parser chokes on a lot of websites, most likely on hand-written HTML, which has a greater potential for validation errors but also typically simpler layout.
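To illustrate what that spec-mandated recovery looks like, here's a small sketch assuming the third-party html5lib package is installed (it implements the HTML5 parsing algorithm, adoption agency step included):

```python
# Sketch: feed misnested "tag soup" to an HTML5-compliant parser and
# inspect the recovered tree it is required to build.
import xml.etree.ElementTree as ET
import html5lib

# Misnested formatting elements: </b> arrives while <i> is still open.
soup = "<p><b>bold <i>bold italic</b> just italic</i> plain</p>"

tree = html5lib.parse(soup, namespaceHTMLElements=False)
print(ET.tostring(tree, encoding="unicode"))
# Roughly: ...<p><b>bold <i>bold italic</i></b><i> just italic</i> plain</p>...
```

Every compliant parser has to build essentially that same recovered tree from that input; a strict parser that just bails out isn't a simpler version of the same algorithm, it's a different contract.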
And HTML parsing is still the easy part of writing a browser! Layout is much harder to do, partly because layout is hard, but also because it's under-specified. Implement "undefined behavior" in a way that other browsers don't, and your browser won't work on a lot of pages.
(There have been improvements, but HTML is still miles ahead; e.g., CSS 2 has no automatic table layout algorithm, and AFAICT the CSS 3 version is still "not yet ready for implementation".)
Why would you want a web browser which can't open Facebook, X, or half of the other top websites?
And why would they bother to "fix" their websites when they work fine in Chrome, Edge and Firefox, but not in your very unpopular but super-strict browser?
> The web is not fundamentally different from other programming environments, say Python or Java
To me what makes the web completely different from any programming environment is the very blurry line separating code from data. The very same web page can produce totally different code two hours later just because of a few new articles with links, graphics, media and advertising. The web is that place where data is also code and code is also data; this must come at a price.
I think of the web like I think about Windows. Decades of backwards compatibility. Dubious choices that get dragged along because it is useful for people who can't or won't let go of stuff that works for them. It's a for better or for worse situation.
I'm not talking about the fuzzing but the design approach. As in, can you make a real browser by starting with a kind of 'happy path' implementation and then retrofitting it to be a real browser? That part I'm somewhat skeptical of. It's a totally sensible way to learn to make a real browser, no doubt.
"real browser" is doing a lot of work in your comment.
It's not doing nearly as much work as real browsers do!
After all what is a browser other than something that browses? What other characteristics make it "real"?
A real browser is a browser that aspires to be a web browser that can reasonably be used by a (let's say even fairly technical) user to browse the real web. That means handling outright adversarial inputs, and my point is that this is so central to a real browser that it might be hard to retrofit in later.
I gave one example with the null thing; another would be the section on how the JS API can break the assumptions made by the DOM parser. It similarly sounds like a bug that's really a whole bug class, one a real browser would need a systemic/architectural fix for.
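For the JS-during-parsing point, here's a deliberately toy sketch (not how any real engine is structured) of the kind of assumption that a document.write-style API can invalidate mid-parse:

```python
# Toy sketch (not real browser code): a naive parser snapshots the input
# length up front; a document.write-style hook then mutates the input
# mid-parse, silently invalidating that assumption.
class ToyParser:
    def __init__(self, html: str):
        self.buf = list(html)
        self.pos = 0
        self.end = len(self.buf)  # baked-in assumption: input length never changes

    def write(self, more: str) -> None:
        # Roughly analogous to document.write() running from a script
        # while parsing is still in progress: new input appears at the cursor.
        self.buf[self.pos:self.pos] = list(more)

    def parse(self) -> str:
        out = []
        while self.pos < self.end:  # stale `end`: anything past it is dropped
            out.append(self.buf[self.pos])
            self.pos += 1
        return "".join(out)

p = ToyParser("<p>hi</p>")
p.write("<b>injected</b>")
print(p.parse())  # prints "<b>inject" -- the rest of the input is silently lost
```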
You might as well be describing Safari, Chrome, or Firefox. All are heaping piles of complexity that are tortured into becoming usable somehow. Such is the nature of software. We shoot lightning into rocks and somehow it does useful stuff for us. There's nothing inherently "right" or "wrong" about how we do it. We just do whatever works.
I would say that a "real browser" — which I think is being used here to mean a "production-quality" browser, in contrast to a "toy" browser — would be a robust and efficient browser with a maintainable codebase.
We're well past absurdity on this line of argument.
Given:
A = a goal of implementing just the latest and most important specs
B = shipping something they want people to use
There is no browser team, Ladybird or otherwise, that is A and not B, or A and B.
For clarity's sake: Ladybird doesn't claim A.
Let's pretend they do, since I think it'll be hard for people arguing in this thread to accept that they don't.
Then, we know they most certainly aren't claiming B. The landing page says it's too unstable to provide builds for. Beyond that, it's well understood that it's not "shipping" or intended to be seen as such.
What a weird comment on their progress and transparency. Better to have a demo working and iterate on it, right? By your logic, how would anyone ever finish anything?
The spec is so complex at this point that I'm not sure you can go the other way. It would also force you to implement weird things nobody will ever use before letting people work with a basic page.
I'd love someone to prove me wrong, but I feel like you'd end up with "you can't display a paragraph of basic text, because we're not even done implementing the JS interface to conic gradients in HSL space in a fully compliant way".