
> It is built by a former contributor to both Chromium and Firefox, and is built out of personal opinion on how Web Browsers should try to understand the Semantic Web.

Could you share more about this vision?

> writing my own SGML parser

How did you land on SGML?

What do you think of a browser/mode that parses markdown, so we can have a "markdown web" with less complex clients?



> Could you share more about this vision?

Phew, tough question. I got into web development when XHTML 1.1 Strict was the "cool shit", so I came to value the web as a way of acquiring and distributing knowledge. Not only for me, but also for publishing and other forms of media (e.g. by offering print stylesheets), for screen readers, and for semantic extraction of that kind of knowledge.

(I was also working on project(s) that used DAISY to automatically convert websites into audible formats so they could be consumed by blind people.)

Somehow from then (around 2000ish) to now, everything went to shit and nobody cares about that aspect anymore. News websites are too busy displaying ads and pushing subscription dialogs in my face (before I read a single line of their article) - rather than being readable or consumable.

And I kind of disagree with that. I want to make the web an automatable tool for acquiring knowledge easily, and I hope I can do that programming-free. Currently, programmers can easily build scrapers, but imagine the possibilities once any person or kid can do that with a few mouse clicks.

I know there are a lot of proprietary Scrapy-based solutions out there already, but honestly I think they're crappy. They see the web as a DOM and not as a statistical model that a neural network "could" learn once you have a different way of rendering/parsing/modelling things.
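
To make that concrete, this is roughly what the "web as DOM" approach looks like (a made-up sketch, not any particular product): the scraper is welded to one site's exact element structure and class names, and breaks the moment the publisher reshuffles their markup.

    # Hypothetical "web as DOM" scraper: the selector encodes one site's
    # exact markup, so any redesign silently breaks it.
    import requests
    from bs4 import BeautifulSoup

    html = requests.get("https://example.com/article").text
    dom = BeautifulSoup(html, "html.parser")

    # Welded to a specific div/class layout instead of to the meaning.
    headline = dom.select_one("div.article > h1.title")
    print(headline.get_text(strip=True) if headline else "selector no longer matches")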

> How did you land on SGML?

The reason I'm currently building my HTML(5)-compatible parser with SGML ideas is that nobody closes tags. The spec is very complicated (especially while keeping an eye on what can be abused in the XSS sense, or related security issues with CORS), so currently I'm looking at a lot of parsers out there and trying to find my own way of turning this into a statistical model, so that in the future my neural net adapters can optimize old HTML code into new, clean HTML5 code.
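
Just as a frame of reference (this is not my parser, only the standard WHATWG recovery rules as implemented by e.g. the html5lib package): tag soup without closing tags can already be repaired into a well-formed tree and re-serialized, which is the baseline any SGML-flavoured approach has to match.

    # HTML5 error recovery: html5lib closes the unclosed <li>/<p> elements,
    # then the repaired tree is serialized back out as well-formed markup.
    import html5lib
    from xml.etree import ElementTree as ET

    soup = "<ul><li>one<li>two<p>never closed"
    tree = html5lib.parse(soup, namespaceHTMLElements=False)
    print(ET.tostring(tree, method="html", encoding="unicode"))
    # Roughly: <html><head></head><body><ul><li>one</li><li>two<p>never closed</p></li></ul></body></html>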

> What do you think of a browser/mode that parses markdown, so we can have a "markdown web" with less complex clients?

Actually this was my first idea for building this. I wanted to convert all HTML to Markdown and back, so that it's easier and cleaner. The issue I realized is that most of the markup and meta information that comes with a website is lost in Markdown (or CommonMark), and layout sometimes implies structure too, due to how websites in WordPress (or any user-friendly CMS) are built.

Code-wise you usually cannot infer meaning by only looking at the HTML, sadly. That's why I switched to a "filtering proxy" approach, where the browser UI simply receives the upgraded, clean HTML and CSS (and webfonts or other assets).
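
A quick way to see that loss (using the off-the-shelf markdownify package purely as an illustration, not my actual pipeline): the schema.org roles, machine-readable dates and the article/aside distinction all disappear from the Markdown output.

    # Round-tripping semantic HTML through Markdown keeps only the visible
    # text; itemprop, datetime and element roles are all dropped.
    from markdownify import markdownify as md

    html = (
        '<article itemscope itemtype="https://schema.org/NewsArticle">'
        '<h1 itemprop="headline">Title</h1>'
        '<time itemprop="datePublished" datetime="2020-05-01">May 1</time>'
        '<aside class="related">Related links</aside>'
        '</article>'
    )
    print(md(html))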


This is a subject I've been fascinated with recently. The web isn't nearly as good as it could be at gathering, networking, and assimilating information.

I feel that one key aspect of something like this would be the ability to annotate anything on any page you stumbled across, and to navigate between all your annotations in a cohesive manner.

I'm excited to see what you make!


Hypothesis has been working on web annotation: https://web.hypothes.is/about/


Thanks for the detailed response.

> (I was also working on project(s) that were using DAISY to automatically convert websites into hearable formats to be consumable by blind people.) Somehow from then (around 2000ish) to now, everything went to shit and nobody cares about that aspect anymore.

Yes, it's tragic that you could seamlessly compose streaming audio, video & text from multiple servers using an SMIL _text file_ in the early 2000s, but it's all gone now.

Yet we now have large markets of broadband-connected humans with countless hours spent in front of streaming media (including video conferences) that they cannot annotate, inspect or compose. Then people wonder why they are "exhausted" after hours of Zoom meetings via powerless blackbox client apps.

There's still a tiny bit of standards activity on sync of A/V content with web text, part of the upcoming fusion of epub & the web, aligned with Google's "Web Packaging" that will enable a fully-offline internet with signed content (can of AMP worms).

https://www.w3.org/AudioVideo/Activity https://www.w3.org/community/sync-media-pub/

> so that in future my neural net adapters can optimize old HTML code into new, clean, HTML5 code.

This is exciting work. Apple has a powerful ML/AI chip on recent iPhones, likely to be used for image processing and augmented reality annotation of live video. It would be nice to apply this silicon power to the semantic ambiguity in real-world human use of markup languages.

We need an alternate timeline fork of the security aesthetic of CSS "user" vs "publisher" stylesheets, which at least tried to formalize the inherent social/power/finance conflicts between stakeholders in the web content rendering pipeline. Of course, we've since added identity, device fingerprinting, keystroke timing and countless other minutiae to the arms race. But the fundamental need for separation of powers will never go away.

Many users have powerful silicon on their devices, but today it is rarely employed in defense of "user" stylesheet/reality parsers. The proxy architecture you are developing could be combined with fully-private "user" datastores, of the kind harvested today without consent, but instead customized by the user for their own objectives, with data always in their physical control. With local personalization and ML-powered disambiguation, the unfair playing fields could be tilted a little towards local autonomy.


> But the fundamental need for separation of powers will never go away.

... and I think that this was actually the job of web browser engineers, and they failed to do it. I kind of like where Brave is going, to be honest, though I do not think that an optional approach will make a difference. We've been there a lot of times, and nothing will change if we don't force the industry to.

Honestly, the only browser currently doing the right thing when it comes to third-party cookie privacy is WebKit/Safari [1] [2] [3], as Apple has the leverage to enforce it via their iOS market share.

Firefox/Mozilla is currently too concerned about breaking things, and Chromium is a bad privacy joke outside of Ungoogled Chromium.

> The proxy architecture you are developing could be combined with fully-private "user" datastores, of the kind harvested today without consent, but instead customized by the user for their own objectives.

Exactly ;) I can't talk about this more (for now, as my startup idea has to stay under the radar until Q3 this year), but I think you've figured out what I want to do with this concept.

- [1] https://webkit.org/tracking-prevention-policy/

- [2] https://webkit.org/blog/8311/intelligent-tracking-prevention...

- [3] https://webkit.org/blog/10218/full-third-party-cookie-blocki...


Thanks for the discussion, looking forward to using your work! Brave on iOS is interesting because it combines underlying Safari browser code with Brave's policy UX (e.g. per-site JS controls).


> we can have a "markdown web" with less complex clients

You might want to check out the Gemini protocol[1].

[1] https://gemini.circumlunar.space/



