
> something HTML-based

Instead of HTML you can use XML to write structured documents. With XML you can define whatever custom tags you need with schema validation if desired. The advantage of this approach is you have full control over the input schema and output, but the downside is XML is much more syntactically noisy compared to Markdown or RST and you will need a script to parse & convert the XML to your preferred output format(s).
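To give a feel for the "script to parse & convert" part, here is a minimal sketch using only Python's standard library. The tag names (`doc`, `title`, `para`, `em`, `warning`) are made up for illustration and the Markdown mapping is just one possible choice, not a standard of any kind.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom vocabulary: doc, title, para, em, warning.
SOURCE = """<doc>
  <title>Release notes</title>
  <para>Parsing is <em>much</em> faster.</para>
  <warning>Back up your data first.</warning>
</doc>"""


def inline(elem):
    """Render an element's mixed content (text plus inline children)."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "em":
            parts.append("*" + inline(child) + "*")
        else:
            # Unknown inline tags: keep their text, drop the markup.
            parts.append(inline(child))
        parts.append(child.tail or "")
    return "".join(parts)


def to_markdown(xml_text):
    """Convert the custom XML document to Markdown, one block per child."""
    root = ET.fromstring(xml_text)
    blocks = []
    for elem in root:
        body = inline(elem).strip()
        if elem.tag == "title":
            blocks.append("# " + body)
        elif elem.tag == "warning":
            blocks.append("> **Warning:** " + body)
        else:
            blocks.append(body)
    return "\n\n".join(blocks)


print(to_markdown(SOURCE))
```

The same tree walk could emit HTML or plain text instead; the point is only that the input vocabulary and the output are both under your control.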



> Markdown or RST and you will need a script to parse & convert the XML to your preferred output format(s).

XSLT will happily output any number of formats from an XML doc. There are also a number of existing and very capable XML schemas for documentation, like DocBook. You can even go from a lightweight markup like RST or Markdown to DocBook and then have an XSLT pipeline that delivers that intermediary to any number of formats. There are also very capable graphical editors that will spit out DocBook directly.

I find the aversion to XML very strange. It was definitely applied poorly in some places by products with "Enterprise" somewhere in the name, but the language itself is very useful. The aversion to XML has led to a lot of work re-learning the lessons that led to some XML features in the first place. It's also led to XML features being implemented poorly in other languages.


You're not alone. I find XML to be massively more readable and flexible. I work in the macOS/iOS IT space, and our bread and butter is the Property List, which tends to either actually be XML underneath or is at the very least rendered like XML, all just conforming to the plist DTD (which is pretty short). These plists that conform to this fairly simple DTD are enough to describe every init task and background process to the OS, every preference in applications that use UserDefaults and CFPrefs (which is most), and I use them for storing structured data in and out of scripts that I throw together.
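For anyone who hasn't seen one, here is a rough sketch of round-tripping a plist with Python's standard `plistlib`. The job label and the values are invented for illustration; the keys are the sort you would see in a launchd job description.

```python
import plistlib

# Hypothetical launchd-style job description; the label is made up.
job = {
    "Label": "com.example.cleanup",
    "ProgramArguments": ["/usr/bin/true"],
    "RunAtLoad": True,
    "StartInterval": 3600,
}

# Serialize to the XML flavour of the plist format.
xml_bytes = plistlib.dumps(job, fmt=plistlib.FMT_XML)

# Parse it back; the format is detected automatically.
loaded = plistlib.loads(xml_bytes)

print(xml_bytes.decode("utf-8"))
```

The printed document is a flat run of `<key>` elements each followed by a typed value (`<string>`, `<array>`, `<true/>`, `<integer>`), which is the shape described above.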

Loads of other systems I work with still output XML, and I have fast, powerful tools to work with them pre-installed on any Mac I work on, and it doesn't even matter that Apple hasn't updated them in a while since XML hasn't really been changed either. I can look at an XML doc and have so much information about the data inside just telegraphed into my head, just by the way the things are shaped. I don't have to read the tags like a book, I can just gloss over them and still get enough to understand the strings/numbers inside. Yet, they're still on the page, out of focus, in my periphery, providing anchoring and structure and navigation, I can find my place. Maybe I'm just a freak, or that comes from 20 years of staring at them. Whatever reason, I just can't do that with JSON as easily or as fast.

I don't hate on any of the other formats (save for YAML, not interested), and I really appreciate JSON's brevity sometimes for simple data and small responses, but that brevity quickly turns into wisps of formatting and text seemingly floating out in the middle of nowhere.


> I don't hate on any of the other formats (save for YAML, not interested), and I really appreciate JSON's brevity sometimes for simple data and small responses, but that brevity quickly turns into wisps of formatting and text seemingly floating out in the middle of nowhere.

Same. Every back end project I work on I recommend XML as the transport and every time I get a reflexive "ugh, XML". Then I spend an inordinate amount of time making Swagger endpoints so a consumer can figure out how the JSON output is supposed to behave.


> the language itself is very useful.

It really isn't. Both namespaces and schema cause far more problems than they solve. And the theoretical advantages of having a declarative language for expressing transformations (and schema) are outweighed by how awful it actually is to write XSLT (and schema).


People hate namespaces wherever they are used, but they are one of the keys to "programming in the large". The first year after Java came out the official docs for namespaces weren't complete and I had to go to a web page at NASA to understand exactly how they work, particularly in the strange case where you don't include a package statement -- people got used to it though. (I'll make the case that Java was a software reuse revolution not because it was OO or had a particular implementation of OO but because of getting a bunch of "little" features right such as namespaces and memory management.)

If we had namespaces for classes and id's in CSS we'd be able to transclude content from one HTML page to another easily but we don't so we can't. (Shadow DOM helps a bit though.) The "hygienic macro problem" is one of the many barriers people face doing metaprogramming. In the RDF world it's refreshing to be able to import facts from multiple systems into multiple namespaces so you can put data you got from different sources into the same database and "it just works".


A certain amount of namespacing is useful, but the XML model where you can't even write a hello world xpath query until you've set up all your namespaces and used them in your query is something that clearly does more harm than good. Java is I think widely recognised as having gone too far on the namespacing front; most post-Java languages have a much less strictly hierarchical namespacing model and I don't think any other package registry has adopted the fully nested style that Maven does. And even Java's heavyweight namespacing is much easier to work with than XML's thanks to things like auto-import in IDEs.
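A small sketch of the "hello world" friction, using Python's standard `xml.etree.ElementTree` (which supports only a subset of XPath). The feed content is invented and the `atom` prefix is an arbitrary choice; only the Atom namespace URI is real.

```python
import xml.etree.ElementTree as ET

# A tiny document with a default namespace, as most real-world XML has.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Hello world</title>
</feed>"""

root = ET.fromstring(FEED)

# The obvious query finds nothing: an unprefixed name means "no namespace".
naive = root.find("title")

# You have to declare a prefix mapping and use it in the query.
ns = {"atom": "http://www.w3.org/2005/Atom"}
qualified = root.find("atom:title", ns)

# Python 3.8+ also accepts a namespace wildcard as an escape hatch.
wildcard = root.find("{*}title")

print(naive, qualified.text, wildcard.text)
```

The first lookup silently returning nothing, rather than raising an error, is a large part of why this trips people up.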

I don't think namespaces solve the macro hygiene problem, because if you're going to have macros then they need to be able to interact with and manipulate your namespaces, and sooner or later you need your macro to e.g. generate a fresh identifier for use within an existing namespace and then you're right back where you started.

Being able to disambiguate when you have one document including another would be useful. But the cost/benefit on XML-style big namespacing up-front just doesn't stack up.


Yeah I always liked XML as a data format and combined with XSLT it is quite powerful.


What I really like with XSLT is what you can do with user-defined functions. Not only are they helpful in implementing transforms but you can also write stateful functions that insert rows into a database or something.


Pro-tip: use SGML which has it all:

- is a superset of XML (XML is derived from SGML as a simplified, proper subset)

- can parse HTML precisely with all bells and whistles such as tag omission, enumerated attributes and other shortforms (which are based on SGML after all)

- can define markdown syntax as an SGML SHORTREF customization (note that SGML cannot cover all of markdown syntax; for example, defining reference links somewhere at the end of a document and expecting SGML to pull a href URL into the place where the link is referenced won't work with SGML SHORTREF)

The combination of markdown-as-sgml and inline HTML or HTML blocks is particularly nice and predictable, and also informs further customization such as using entities (text variables), custom elements, and other advanced SGML stuff with markdown that comes up frequently as a requirement.


Once you get that far, XSLT is waiting for you if you need to render XML to HTML.


Yes, waiting in the dark, with teeth.

(I kid, mostly. It was a mild pain in the projects I used it for, and I've heard much worse horror stories than mine.)


My take is that XSLT comes from a different universe where Prolog became a mainstream programming language.

That is, XSLT is based on pattern matching rules that most people find strange and unfamiliar. It wouldn't be so strange to people if this kind of system was more widespread, but as an island that's different from everything else I think people struggle to wrap their heads around it.
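To make the rule-based model concrete, here is a loose Python analogy, not XSLT itself and not how any XSLT processor is implemented. Rules are registered per tag name (real XSLT match patterns are far richer), and the names `template`, `apply_templates` and `default_rule` are my own, chosen to echo `xsl:template` and `xsl:apply-templates`.

```python
import xml.etree.ElementTree as ET

TEMPLATES = {}  # tag name -> rule function


def template(match):
    """Register a rule for elements whose tag equals `match`."""
    def register(func):
        TEMPLATES[match] = func
        return func
    return register


def apply_templates(elem):
    """Process children in document order, each with whichever rule matches."""
    out = [elem.text or ""]
    for child in elem:
        rule = TEMPLATES.get(child.tag, default_rule)
        out.append(rule(child))
        out.append(child.tail or "")
    return "".join(out)


def default_rule(elem):
    # Fallback in the spirit of XSLT's built-in rule: drop the tag, keep going.
    return apply_templates(elem)


@template("para")
def para(elem):
    return "<p>" + apply_templates(elem) + "</p>"


@template("em")
def em(elem):
    return "<i>" + apply_templates(elem) + "</i>"


def transform(xml_text):
    # Sketch only: text is not HTML-escaped.
    root = ET.fromstring(xml_text)
    return TEMPLATES.get(root.tag, default_rule)(root)


print(transform("<doc><para>Hi <em>there</em>!</para><note>kept</note></doc>"))
```

Nothing here says "loop over the paragraphs"; you only declare what to do when a given kind of node turns up, and the recursion drives the rest. That inversion is the part that feels foreign if you have never used a rule-based language.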


I don't know. I liked the pattern matching, it was more the XML aspects that ended up not being worth it.


Same. I've used it for small projects, like styling an RSS feed. I imagine it gets tricky fast in anything larger.



