
I tried to use Dillo on some news websites (like bbc.co.uk or rte.ie) and they didn't render very well, which is not so much a criticism of Dillo as of the fact that even websites whose main job is to display simple textual content and maybe some images have managed to complicate their UIs enough that lightweight browsers struggle to display them.

A common problem seems to be pages that display different content snippets in containers side-by-side, rather than in a more traditional list format (which Dillo seems to handle better). Another one is hidden menus (behind a hamburger button), which many lightweight browsers expand by default, in list format, so that at the top of every page there is a long list down the left-hand side (or in the centre) that you have to scroll past to reach the actual content. I have this problem with elinks as well.

HN looks great though!




I honestly don't throw this phrase out there lightly, so please don't roll your eyes, but this would be an ideal application for AI. "See all this neatly formatted text? Yeah, do your deep learning thing and make this ugly text look like this neatly formatted text. Bonus if you make all the ads look like blurry blobs but always show images that are part of the article."


I could see that working if news sites were considerate enough to ship little enough text to fit in the LLM context window. Does anybody support 10M+ tokens yet?


I didn't mean it as a "Hey GPT: format the following text neatly"; rather, it could learn to produce a static program that takes ugly-looking pages and formats them to look professionally published. It wouldn't really be operating on text, but instead changing font size, column width, and other CSS-type variables.
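A minimal sketch of what such a static re-formatter might look like, assuming the "program" boils down to stripping the site's own styling and imposing readable defaults (the CSS values here are illustrative placeholders, not anything learned):

    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    # Hand-picked defaults for the sketch; a learned version would tune these.
    READABLE_CSS = """
    body { max-width: 40em; margin: 0 auto; font: 18px/1.6 Georgia, serif; }
    img  { max-width: 100%; height: auto; }
    nav, aside, footer { display: none; }
    """

    def reformat(html: str) -> str:
        soup = BeautifulSoup(html, "html.parser")
        # Drop the site's own <style>, <script>, and <link> tags.
        for tag in soup.find_all(["style", "script", "link"]):
            tag.decompose()
        # Drop inline style attributes too.
        for tag in soup.find_all(style=True):
            del tag["style"]
        # Inject plain, readable defaults instead.
        style = soup.new_tag("style")
        style.string = READABLE_CSS
        (soup.head or soup).insert(0, style)
        return str(soup)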


That latter thing is a much harder cat-and-mouse game though. Is the proposal to use the rendered output as feedback to help the LLM converge faster than the people already making de-shittified UIs? If you can't use the actual content of the page to assist the AI then that seems like a very hard problem.


It would train AI to recognize an article's text, and discard the ads. Then format the text neatly, like a newspaper used to be.
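For the extraction half, heuristic libraries already do a decent approximation of this without any training; a hedged sketch using readability-lxml (the library choice is just an example):

    from readability import Document  # pip install readability-lxml

    def extract_article(html: str) -> tuple[str, str]:
        doc = Document(html)
        # html_partial=True returns just the cleaned article markup,
        # with ads, navigation, and boilerplate heuristically discarded.
        return doc.title(), doc.summary(html_partial=True)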


How do you deal with the text being distributed across many JS files, half of it only delivered over the wire after you click a button, "text" being displayed as nested divs to generate some sort of formatting (especially when the HTML for such a monstrosity doesn't fit in a single context window), ...?


You could just render it as if it were being shown to a person. That's where the fuzzy logic of AI would come in. It could be trained to identify what an article looks like, based on the layout and size and other inferences (that it figures out, hence AI).
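The "render it as a person would see it" step is easy to prototype today; a sketch using Playwright to produce the screenshot a vision model could then classify (the classification itself is omitted, and the function name is made up):

    from playwright.sync_api import sync_playwright  # pip install playwright

    def screenshot(url: str, path: str = "page.png") -> None:
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page()
            page.goto(url)
            # Capture the full rendered page, exactly as a person would see it.
            page.screenshot(path=path, full_page=True)
            browser.close()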

For a while now, I've thought of the idea of outsourcing this to a 3rd world sweatshop. Basically pay people to click on the scummiest ad-loaded pages all day, saving copies of just the content, and re-hosting them as, say, web 1.0 content: text and pictures, nothing more. Whether they used copy-paste, or just "save page" and then piped it through another program, who cares; just extract the content and host it as web 1.0 pages that would load super fast but maybe keep the same font or formatting as the original.
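The "pipe it through another program" part can be sketched end to end without the sweatshop, assuming the same heuristic extraction as above (the URL handling and output filename are just examples):

    import requests  # pip install requests readability-lxml
    from readability import Document

    def rehost(url: str, out: str = "article.html") -> None:
        html = requests.get(url, timeout=30).text
        doc = Document(html)
        body = doc.summary(html_partial=True)  # just the article markup
        # Write a static, web-1.0-style page: text and pictures, nothing more.
        with open(out, "w", encoding="utf-8") as f:
            f.write(f"<html><head><title>{doc.title()}</title></head>"
                    f"<body>{body}</body></html>")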



