I didn't mean it as "Hey GPT: format the following text neatly"; rather, it could learn to produce a static program that takes ugly-looking pages and formats them to look professionally published. It wouldn't really be operating on the text, but instead changing font size, column width, and other CSS-type variables.
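To make the "static program" idea concrete, here's a minimal sketch (my own hypothetical names, not from the thread): a function that never touches the article text, and only emits a CSS override normalizing the variables mentioned above.

```python
# Hypothetical sketch of the kind of static program the model might
# learn to emit: it produces a CSS override that reflows any page into
# a readable column, without ever reading the page's text.

def readable_css(base_font_px=18, column_width_ch=70, line_height=1.6):
    """Emit a stylesheet that normalizes font size and column width."""
    return f"""
body {{
  font-size: {base_font_px}px;
  line-height: {line_height};
  max-width: {column_width_ch}ch;
  margin: 0 auto;
  font-family: Georgia, serif;
}}
/* hide the usual clutter; selectors here are guesses, not a real list */
aside, nav, iframe, [class*="promo"], [id*="ad-"] {{ display: none; }}
img {{ max-width: 100%; height: auto; }}
""".strip()

print(readable_css())
```

The point of the sketch is that the model's output is a reusable artifact (a stylesheet), not a per-page text transformation.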
That latter thing is a much harder cat-and-mouse game, though. Is the proposal to use the rendered output as feedback to help the LLM converge faster than the people already making de-shittified UIs? If you can't use the actual content of the page to assist the AI, then that seems like a very hard problem.
How do you deal with the text being distributed across many JS files, half of it only delivered over the wire after you click a button, "text" being displayed as nested divs to generate some sort of formatting (especially when the HTML for such a monstrosity doesn't fit in a single context window), ...?
You could just render it as if it were being shown to a person. That's where the fuzzy logic of AI would come in. It could be trained to identify what an article looks like, based on the layout, size, and other inferences (that it figures out, hence AI).
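A trained model would learn these layout features itself, but a crude stand-in for "what an article looks like" can be sketched with plain text density — the element that accumulates the most raw text is probably the article. This is my own toy heuristic, not anyone's actual pipeline:

```python
from html.parser import HTMLParser

class DensestBlock(HTMLParser):
    """Toy 'article detector': score each tag path by how much raw
    text lands under it. A real model would replace this score with
    learned layout features (position, font size, link density, ...)."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.scores = {}

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # pop back to (and including) the matching open tag
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            path = "/".join(self.stack)
            self.scores[path] = self.scores.get(path, 0) + len(text)

def guess_article(html):
    p = DensestBlock()
    p.feed(html)
    return max(p.scores, key=p.scores.get) if p.scores else None

page = """
<html><body>
  <nav><a href="/">Home</a><a href="/x">More</a></nav>
  <div class="article"><p>Long paragraph of real content, the kind a
  reader actually came for, with enough words to dominate.</p></div>
  <footer>tiny footer</footer>
</body></html>
"""
print(guess_article(page))  # -> html/body/div/p
```

It works here because the article paragraph dwarfs the nav links and footer; real pages defeat this instantly, which is exactly why you'd want learned inference instead.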
For a while now, I've thought about outsourcing this to a third-world sweatshop. Basically, pay people to click on the scummiest ad-loaded pages all day, saving copies of just the content and re-hosting it as, say, web 1.0 content: text and pictures, nothing more. Whether they used copy-paste or just "save page" and then piped it through another program, who cares. Just extract the content and host it as web 1.0 that would load super fast, but maybe keep the same font and formatting as the original.
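The "pipe it through another program" step can be sketched in a few lines: re-emit a saved page as bare web 1.0 HTML, keeping only text-bearing tags and images and dropping scripts and scaffolding. Tag lists here are illustrative guesses, and the whitespace handling is deliberately crude:

```python
from html.parser import HTMLParser

# Illustrative tag lists, not a definitive readability algorithm.
KEEP = {"p", "h1", "h2", "h3", "li", "ul", "ol", "blockquote", "a", "img"}
DROP = {"script", "style", "nav", "aside", "footer", "iframe"}

class Web10(HTMLParser):
    """Re-emit a page as plain 'web 1.0' HTML: text and pictures,
    nothing more. Everything under a DROP tag is skipped entirely."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip = 0  # depth inside dropped subtrees

    def handle_starttag(self, tag, attrs):
        if tag in DROP:
            self.skip += 1
        elif self.skip == 0 and tag in KEEP:
            a = dict(attrs)
            if tag == "img":
                self.out.append(f'<img src="{a.get("src", "")}">')
            elif tag == "a":
                self.out.append(f'<a href="{a.get("href", "")}">')
            else:
                self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP and self.skip:
            self.skip -= 1
        elif self.skip == 0 and tag in KEEP and tag != "img":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip == 0 and data.strip():
            self.out.append(data.strip())

def simplify(html):
    p = Web10()
    p.feed(html)
    return "<html><body>" + "".join(p.out) + "</body></html>"
```

The output loads fast precisely because nothing interactive survives; whether a human or a script did the clicking upstream doesn't matter to this step.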