
I tried to use Dillo on some news websites (like bbc.co.uk or rte.ie) and they didn't render very well, which is not so much a criticism of Dillo as of the fact that even websites whose main job is to display simple textual content and maybe some images have managed to complicate their UIs enough that lightweight browsers struggle to display them.

A common problem seems to be pages that display different content snippets in containers side-by-side, rather than in a more traditional list format (which Dillo seems to handle better). Another one is hidden menus (behind a hamburger button), which many lightweight browsers expand by default, in list format, so that at the top of every page there is a long list down the left-hand side (or in the centre) that you have to scroll past to reach the actual content. I have this problem with elinks as well.

HN looks great though!




I honestly don't throw this phrase out there lightly, so please don't roll your eyes, but this would be an ideal application for AI. "See all this neatly formatted text? Yeah, do your deep learning thing and make this ugly text look like this neatly formatted text. Bonus if you make all the ads look like blurry blobs but always show images that are part of the article."


I could see that working if news sites were considerate enough to ship little enough text to fit in the LLM context window. Does anybody support 10M+ tokens yet?


I didn't mean it as a "Hey GPT: format the following text neatly"; rather, it could learn to produce a static program that takes ugly-looking pages and formats them to look professionally published. It wouldn't really be operating on text, but instead changing font size, column width, and other CSS-type variables.
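A minimal sketch of what such a static re-formatter might look like, assuming the "program" boils down to stripping the site's own styling and imposing readable defaults (the CSS values here are illustrative placeholders, not anything learned):

    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    # Hand-picked defaults for the sketch; a learned version would tune these.
    READABLE_CSS = """
    body { max-width: 40em; margin: 0 auto; font: 18px/1.6 Georgia, serif; }
    img  { max-width: 100%; height: auto; }
    nav, aside, footer { display: none; }
    """

    def reformat(html: str) -> str:
        soup = BeautifulSoup(html, "html.parser")
        # Drop the site's own <style>, <script>, and <link> tags.
        for tag in soup.find_all(["style", "script", "link"]):
            tag.decompose()
        # Drop inline style attributes too.
        for tag in soup.find_all(style=True):
            del tag["style"]
        # Inject plain, readable defaults instead.
        style = soup.new_tag("style")
        style.string = READABLE_CSS
        (soup.head or soup).insert(0, style)
        return str(soup)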


That latter thing is a much harder cat-and-mouse game though. Is the proposal to use the rendered output as feedback to help the LLM converge faster than the people already making de-shittified UIs? If you can't use the actual content of the page to assist the AI then that seems like a very hard problem.


It would train AI to recognize an article's text, and discard the ads. Then format the text neatly, like a newspaper used to be.
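For the extraction half, heuristic libraries already do a decent approximation of this without any training; a hedged sketch using readability-lxml (the library choice is just an example):

    from readability import Document  # pip install readability-lxml

    def extract_article(html: str) -> tuple[str, str]:
        doc = Document(html)
        # html_partial=True returns just the cleaned article markup,
        # with ads, navigation, and boilerplate heuristically discarded.
        return doc.title(), doc.summary(html_partial=True)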


How do you deal with the text being distributed across many JS files, half of it only delivered over the wire after you click a button, "text" being displayed as nested divs to generate some sort of formatting (especially when the HTML for such a monstrosity doesn't fit in a single context window), ...?


You could just render it as if it were being shown to a person. That's where the fuzzy logic of AI would come in. It could be trained to identify what an article looks like, based on the layout and size and other inferences (that it figures out, hence AI).
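The "render it as a person would see it" step is easy to prototype today; a sketch using Playwright to produce the screenshot a vision model could then classify (the classification itself is omitted, and the function name is made up):

    from playwright.sync_api import sync_playwright  # pip install playwright

    def screenshot(url: str, path: str = "page.png") -> None:
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto(url)
            # Capture the full rendered page, exactly as a person would see it.
            page.screenshot(path=path, full_page=True)
            browser.close()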

For a while now, I've thought of the idea of outsourcing this to a 3rd world sweatshop. Basically pay people to click on the scummiest ad-loaded pages all day, saving copies of just the content, and re-hosting them as, say, web 1.0 content: text and pictures, nothing more. Whether they used copy-paste, or just "save page" and then piped it through another program, who cares; just extract the content and host it as web 1.0 pages that would load super fast but maybe keep the same font or formatting as the original.
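The "pipe it through another program" part can be sketched end to end without the sweatshop, assuming the same heuristic extraction as above (the URL handling and output filename are just examples):

    import requests  # pip install requests readability-lxml
    from readability import Document

    def rehost(url: str, out: str = "article.html") -> None:
        html = requests.get(url, timeout=30).text
        doc = Document(html)
        body = doc.summary(html_partial=True)  # just the article markup
        # Write a static, web-1.0-style page: text and pictures, nothing more.
        with open(out, "w", encoding="utf-8") as f:
            f.write(f"<html><head><title>{doc.title()}</title></head>"
                    f"<body>{body}</body></html>")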



