Fuzzing Ladybird with tools from Google Project Zero (awesomekling.substack.com)
522 points by awesomekling 9 months ago | 60 comments



And thus demonstrated is the value of lots of different implementations of a spec. Already one hole found in the spec in just this article, and I'm sure there will be/were more.


Yes indeed! We've found and reported lots of issues in the various HTML, CSS and JS specs.

Multiple independent implementations are crucial for the long-term health of the web platform, so we're trying to do our part! :)


> Multiple independent implementations are crucial for the long-term health of the web platform, so we're trying to do our part! :)

It's really great that you're doing this work. This principle also applies to many other specs. I've implemented a few and found multiple issues with real-world impact.


Awesome! Thank you for being the change you want to see. Inspiring to say the least, great work!


Why couldn't the fuzzer be used to discover the bug in the popular browsers?


A bug in the spec doesn't necessarily mean there will be a noticeable bug in the browsers; e.g. a crash.

The browsers may have been written to "work" / not crash over adhering strictly to the spec.


https://github.com/google/clusterfuzz

At least Chromium has integrated multiple different fuzzers into their regular development workflow and found lots of bugs even before going public.


> And thus demonstrated is the value of lots of different implementations of a spec. Already one hole found in the spec

That's a bit of a... non-sequitur. Imagine if you had tweeted "eggplants are my favorite vegetables", someone corrected you "actually they're fruits", and then you declared: "And thus demonstrated is the value of Twitter! Someone already made me a better-informed citizen in response to my tweet." This feels kind of similar.

This isn't to say what they're doing isn't valuable, or that there isn't value in having lots of implementations of a spec. Just saying that implication isn't there (yet) with this particular example.


I love that this project keeps showing how possible it is for a small group to make something amazing. This would be very hard to do in a company with stakeholders.


The project is cool, but this post makes me wonder whether this particular approach (starting with something that "does an okay job with well-formed web content" and then working backwards to fix spec and de facto browser behaviour and potential security issues) can actually result in a production browser. Which is fine: one can always go back and redo things, especially in a hobby project. But it's hard to escape the vague feeling that some of this stuff might need to be architected in from the get-go.


I don't know. It kind of feels like they are replicating real user (developer) behavior by producing lots and lots of weird, low-quality, and not-to-spec code that a parser will likely have to deal with. By doing so they are simply exposing bugs that real users (bad developers) would have done anyway. Seems like a totally legit way to test a complex product. No assumptions. Just lots of randomized nonsense that shows reality.


As a developer I would love to have a browser that strictly follows specs and doesn’t deal with any historic compatibility issues. I would focus on making sure my web app works best there which _should_ give best compatibility across a wide range of browsers.


These days, a lot of the historic compatibility issues are either baked directly into the spec (eg https://dom.spec.whatwg.org/#concept-document-quirks) or hard-coded to only apply on specific websites (eg https://github.com/WebKit/WebKit/blob/main/Source/WebCore/pa...). Unless you work for a company that's too big to fail, you're unlikely to encounter the latter.


This is a great link directly to the source code (`Quirks.cpp`). Here's a fun little snippet from near the top:

  #if PLATFORM(IOS_FAMILY)
  static inline bool isYahooMail(Document& document)
  {
      auto host = document.topDocument().url().host();
      return host.startsWith("mail."_s) && topPrivatelyControlledDomain(host.toString()).startsWith("yahoo."_s);
  }
  #endif


ABSOLUTELY.

But, and this is the crucial part, AS A USER YOU WOULD NOT because a large portion of the web is broken.

We don't live in a perfect, sanitary world, and the software we build and use reflects that.


I kind of don't buy that argument. The web is not fundamentally different from other programming environments, say Python or Java. It might sometimes be practical to have a Python interpreter accept syntactically invalid input because it kinda knows what you mean anyway, but most programming languages don't work that way because it makes things harder in the long run, and the benefits are pretty minuscule.


The problem is that this kind of philosophy is fundamentally incompatible with HTML5.

There was an attempt for a "strict-mode" HTML, it was XML, but it failed (on the web) for various reasons (including IE). HTML5 specifies the exact behavior of what every browser must do upon encountering tag-soup, which is useful because real-world HTML has been tag-soup for a very long time.

I guess the strictest thing you can do is to die upon encountering "validation errors", but I don't think this would help much to simplify your job. (Maybe you can drop the adoption agency?) But now your parser chokes on a lot of websites - likely on hand-written HTML, which has a greater potential for validation errors but also typically simpler layout.

And HTML parsing is still the easy part of writing a browser! Layout is much harder to do, partly because layout is hard, but also because it's under-specified. Implement "undefined behavior" in a way that other browsers don't, and your browser won't work on a lot of pages.

(There have been improvements, but HTML is still miles ahead. e.g. CSS 2 has no automatic table layout algorithm, and AFAICT the CSS 3 version is still "not yet ready for implementation".)


Why would you want a web browser which can't open Facebook, X, or half of the other top websites?

And why would they bother to "fix" their websites when they work fine in Chrome, Edge and Firefox, but not in your very unpopular but super-strict browser?


> The web is not fundamentally different from other programming environments, say Python or Java

To me what makes the web completely different from any programming environment is the very blurry line separating code from data. The very same web page can produce totally different code two hours later just because of a few new articles with links, graphics, media and advertising. The web is that place where data is also code and code is also data; this must come at a price.


I think of the web like I think about Windows. Decades of backwards compatibility. Dubious choices that get dragged along because it is useful for people who can't or won't let go of stuff that works for them. It's a for better or for worse situation.


I'm not talking about the fuzzing but the design approach. As in, can you make a real browser by starting with a kind of 'happy path' implementation and then retrofitting it to be a real browser? That part I'm somewhat skeptical of. It's a totally sensible way to learn to make a real browser, no doubt.


"real browser" is doing a lot of work in your comment. Feels like you're about to make a no true scotsman argument.

After all what is a browser other than something that browses? What other characteristics make it "real"?

If Ladybird browses, then it must be a browser.


> "real browser" is doing a lot of work in your comment.

It's not doing nearly as much work as real browsers do!

> After all what is a browser other than something that browses? What other characteristics make it "real"?

A real browser is a browser that aspires to be a web browser that can reasonably be used by a (let's say even fairly technical) user to browse the real web. That means handling outright adversarial inputs, and my point is that this is so central to a real browser that it seems it might be hard to retrofit later.

I gave one example with the null thing, another one would be the section on how the JS API can break the assumptions made by the DOM parser - it similarly sounds like a bug that's really a bug class and a real browser would need a systemic/architecture fix for.


You might as well be describing Safari, Chrome, or Firefox. All are heaping piles of complexity that are tortured into becoming usable somehow. Such is the nature of software. We shoot lightning into rocks and somehow it does useful stuff for us. There's nothing inherently "right" or "wrong" about how we do it. We just do whatever works.


I'm afraid I don't follow how this is responsive to what I wrote.


I would say that a "real browser" — which I think is being used here to mean a "production-quality" browser, in contrast to a "toy" browser — would be a robust and efficient browser with a maintainable codebase.


> robust and efficient browser with a maintainable codebase.

I would say neither Chrome nor Firefox scores particularly high on any of these.


We're well past absurdity on this line of argument.

Given:

A = a goal of implementing just the latest and most important specs

B = shipping something they want people to use

There is no browser team, Ladybird or otherwise, that is A and not B, or, A and B.

For clarity's sake: Ladybird doesn't claim A.

Let's pretend they do, as I think it'll be hard for people arguing in this thread to accept that they don't.

Then, we know they most certainly aren't claiming B. Landing page says it's too unstable to provide builds for. Outside of that, we well-understand it's not "shipping" or intended to be seen as such.


What a weird comment on their progress and on being transparent. Better to have a working demo and iterate on it, right? Your way, how would one even finish anything?


The spec is so complex at this point, that I'm not sure you can go the other way. It would also force you to implement weird things nobody will ever use before letting people work with a basic page.

I'd love someone to prove me wrong, but I feel like you'd end up with "you can't display a paragraph of basic text, because we're not even done implementing JS interface to conic gradients in HSL space in a fully compliant way".


> it's hard to escape the vague feeling some of this stuff might need to be architected in from the get go.

When I'm developing something, work or otherwise, I find that I often write my worst code when I'm writing something bottom-up i.e. designed, because it usually turns out that the user of that particular code has completely different needs, and the point of integration becomes a point of refactor. I think the top-down approach applied at the project level is much nicer because it allows you to _start from somewhere_ and then iteratively improve things.

That is not to say you shouldn't take precautions. In Ladybird, stuff like image decoding and webpage rendering/JS execution are isolated to their own processes, with OpenBSD style pledge/unveil sandboxing. They aren't perfect of course, but it allows for the kind of development that Ladybird has without much worry about those aspects.


I'm not really suggesting Ladybird is doing something "wrong" or should do something else. Reading something like:

> The fix is to make Document::window() return a nullable value, and then handle null in a bajillion places.

makes me think you're going to find something like this and do this kind of fix maybe once, twice, five times and then probably decide you need a more fundamental fix of some sort. Another way of thinking about it is 'What would, say, the Google Chrome team, wish they could do were they starting from scratch?' i.e. aiming for the state of the art, rather than trying to catch up to it later which may turn out to be overwhelming.
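To make the quoted fix concrete, here is a minimal sketch of what "return a nullable value, then handle null" tends to look like in C++. The `Window` and `Document` types below are hypothetical stand-ins, not Ladybird's real classes, and `describe_window` is just an invented caller for illustration.

```cpp
#include <string>

// Hypothetical stand-ins for illustration; not Ladybird's real classes.
struct Window {
    std::string name;
};

class Document {
public:
    explicit Document(Window* window = nullptr)
        : m_window(window)
    {
    }

    // Returning a pointer instead of a reference makes "no window"
    // representable, so every caller has to consider the null case.
    Window* window() const { return m_window; }

    // Models a document losing its window (an assumed scenario).
    void detach_from_window() { m_window = nullptr; }

private:
    Window* m_window { nullptr };
};

// An invented caller that previously could have assumed a window always exists.
std::string describe_window(Document const& document)
{
    Window* window = document.window();
    if (!window)
        return "(no window)";
    return window->name;
}
```

Every call site needs a check like the one in `describe_window`, which is why a fix of this shape spreads across the codebase.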


Even if they did 'something else' and produced a bullet-proof implementation they are still dealing with a buggy spec in the first place.

If someone thought their dev chops were 100% infallible why would they bother to fuzz the spec?


I think you're misunderstanding my point, it's not about implementation or spec bugs but design. Forget Ladybird for a moment and think of Firefox. Its core design was something along the lines of 'x-platform toolkit for making enterprise groupware apps' where one of the apps was a web browser. Kind of neat for 1998, by 2008 it was clear that's no longer a good fit for making a browser. Despite heroic efforts and many advances, Firefox has never really been able to close the gap to more recent browsers. And (statistically) nobody makes new browsers based on Firefox, it's effectively a design dead end.

It can be hard to retrofit 'complicated but decent parser with a js runtime attached' to something like 'safe parser of arbitrarily adversarial inputs connected to an open RCE' (i.e. something akin to a modern browser) if the latter wasn't a fundamental design goal to start with.


Who said the goal was to create a production browser?

This seems like a pure passion project: to return to the pleasure of building something just for the sake of it. Design and explore. Hack.

Not every endeavour has to become a product. As soon as you get users, you get obligations, and this tends to destroy these feelings.


Nobody said that. It's an interesting conversation, not an adjudication.


They've implemented SVG? This project is coming along faster than I thought. I watch enraptured


Yes, we have implemented a decent chunk of the SVG specification, although lots of things are still missing (animations is a big one) :)


I'm curious how you handle the things that are between SVG specs 1.1 and 2. Because AFAICT both Chrome and Firefox decided not to implement SVG 2. Yet both have grabbed a common selection of changes from SVG 2 and implemented them.

E.g., myRect.style.x = '50px' will work in both Chrome and Firefox, even though SVG 1.1 doesn't allow for this because "x" isn't a presentation attribute (and only presentation attributes are supposed to have corresponding CSS properties).

Relevant to animations-- the fact that Chrome and Firefox allow most (all?) SVG attributes as css props lets the user do a nice end run around SVG animations. They can just treat the SVG objects as if they were HTML and use the web animations API to animate them.


We're working based on SVG 2 and basically ignoring SVG 1.1.

I was unsure about the best approach here, so I asked Nikolas Zimmermann (original author of SVG support in WebKit) and his advice was to do exactly this. :)


That makes sense.

I was going to ask if you were prioritizing the SVG 2 features that are already implemented in Chrome and Firefox. But it appears the W3C has removed a lot of the new ones I remember from the spec (path data bearings, mesh gradients), and that both Chrome and Firefox have implemented a good amount of the existing spec like tabindex and friends.

(Ok, here's one-- "inline-size" and others for doing auto-wrapping text in SVG. Looks to be unimplemented anywhere.)


For issue #3, it might also be a good idea to have a maxdepth mechanism in gradients that point to other gradients; this would be a defense in depth control vs some error or limitation in your “have I seen this reference before” logic. I’m not familiar with SVG gradients; maybe there is a reason to have reference chains of these 1000 links long, but I’d bet that if you ever encounter this in the wild then it’s an attack or a fuzzer.
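Roughly what I have in mind, as a sketch only: the table type, the function name, and the cap of 32 are all made up for illustration and are not taken from Ladybird or the SVG spec. It keeps the "have I seen this reference before" set and adds a hard depth limit on top of it.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical model: each gradient id maps to the id it references,
// or std::nullopt if it references nothing.
using GradientTable = std::unordered_map<std::string, std::optional<std::string>>;

// Illustrative cap; an assumption, not a value from any spec.
constexpr std::size_t kMaxGradientChainDepth = 32;

// Follows the reference chain from `start` and returns the id of the last
// gradient in it, or std::nullopt if the chain is dangling, cyclic, or too deep.
std::optional<std::string> resolve_gradient_chain(GradientTable const& table, std::string const& start)
{
    std::unordered_set<std::string> seen;
    std::string current = start;
    for (std::size_t depth = 0; depth < kMaxGradientChainDepth; ++depth) {
        auto it = table.find(current);
        if (it == table.end())
            return std::nullopt; // dangling reference
        if (!seen.insert(current).second)
            return std::nullopt; // already visited: cycle
        if (!it->second.has_value())
            return current; // end of the chain
        current = *it->second;
    }
    return std::nullopt; // depth cap hit, even if no cycle was detected
}
```

The depth cap is redundant whenever the visited set works, which is the point: it still bounds the walk if the cycle check has a bug or gets bypassed.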


Btw in the anti malware space I saw this type of structure abuse all the time and I never saw a legitimate case more than 5 units deep.


This comment is being left from Ladybird. Hacker News works in Ladybird now. I use Ladybird for the few minutes a day that I surf sites like Hacker News and OSnews.

It is slow. It is fragile. But it works. That alone is amazing given how young the project is and how they have written literally everything from scratch.

I am really looking forward to Ladybird maturing.


Now that's what I call early adoptering! :^)


Interesting, thanks. What bothers me, though, is that almost all developers do exactly what you see in issue #1: "We found it! Fix committed, done!" Nope. You should understand exactly what went wrong: assuming parents must exist. Now search the entire codebase for the same kind of mistake. Use your creative brain to figure out where else the same thing can happen. It will never be in just one place. All modern software is an unreliable, bug-ridden nightmare, mostly because of the constraints of capitalism, yes... but it is possible to do better.


Will Ladybird make an appearance in Web Engines Hackfest this year?


A little off topic: what happened to the hacking videos on YouTube? Used to look forward to them but I haven’t seen a new one in a while.


To be perfectly honest, after uploading well over 1000 videos, I got a little tired of it. I still post monthly update videos, but it's been months since the last hacking video.

I'm still working on Ladybird every day, and I also manage two full time engineers now, thanks to the generous sponsorships we got from Shopify & others last year. :)


Glad to hear you're doing the right thing by yourself. I regularly go back and watch some of the mini series, or porting videos. I refer many graduate engineers to them so they can learn from the clarity and pragmatism you constantly display.

If the hacking comes back some day, I'll be delighted, but just wanted to say thanks for the fact that we have such a wonderful backlog thanks to your long term efforts.


Fair enough, and that totally makes sense. I guess I just miss the “Well, hello friends…” :)


That is totally understandable.

That said, I think those videos are a significant contributor to the project's success. I hope they do not go away completely.

In fact, I think the videos are as important a contribution as the project itself. I remember seeing a quote once from a musician that said he was inspired by both the Beatles and The Rolling Stones. The Beatles showed him what a band could be. The Rolling Stones made him feel like he could do it too. I see that in Linux and Serenity. Your videos make me feel like I could solve any problem by just starting it and breaking it down into smaller, more solvable chunks. They are inspiration and I am not surprised SerenityOS has attracted people to contribute other ambitious aspects. The PDF browser, the GPU stack, and the RISC ports are examples of amazing projects in their own right. I think one of the reasons we see such ambitious contributions in such a young project is the inspiration provided by your leadership and the example set in those videos.

Regardless, thank you for the contribution so far. With the recent improvements to HTMLInputElement, I was able to use Ladybird to leave a comment on the OSnews site recently and it gave me a huge thrill.


Thank you for all the videos! I particularly enjoyed the porting and profiling/optimization videos and I still occasionally rewatch them to this day. :)

Your overall pragmatism and no nonsense C++ style is something more developers should aim to replicate imho.


Add me as another vote that misses them. I totally understand you need a break and other obligations take more time, but I hope you can still find the time to do them occasionally. :)


I absolutely loved the JIT series, but fair enough!


Me too. I'm currently watching the emulator hacking playlist https://www.youtube.com/playlist?list=PLMOpZvQB55bfk92aBKZ8p...


"fuzzing ladybird" is such a delightfully barbaric combination of words


Like some vaguely un-PC insult from an alternate-reality Scotland


I am secretly hopeful Ladybird can take over the world some day. Don't tell anyone.


[flagged]


Agreed; make sure that when you publish it you submit it to HN.



