
XML is a horrendous technology.

JSON was a much better choice. YAML would have been better still (it's more human readable), but both beat XML hands down in any situation.



XML was horrible for data interchange, configuration, and all the other crap people used it for.

It's one of the best tools for marking up content. It's a Markup Language! If you've never used XSLT you really should look at what it can do. Because I can transform an XML resume into something gorgeous without writing too much programming logic.

It's the reason so many technical book publishers use XML. Easy to transform content while also describing that content.


It is horrible for markup too. It is far, far, far too complex and far more verbose than it needs to be. These are the hallmarks of a bad language (or metalanguage).

It is absurdly popular in bureaucratic environments. Bureaucrats adore its complex rules, ability to create arcane standards and illusion of openness. It's also good for job security. However, none of that bodes well for the rest of us.

I have used XSLT and I am absolutely appalled that something like that can even exist. It's actually worse than XML. It is a Turing-complete language (this means that you can make it as complex as you like, and boy oh boy does THAT happen), but its readability is AWFUL.

There is a reason we have other Turing-complete general-purpose languages like Python or JavaScript or even Java: while they may have some warts, they have had a lot more thought put into making them consistent, well structured and readable than committee-designed, thrown-together languages like XSLT.

I can use a normal programming language to transform a JSON resume into something gorgeous and I GUARANTEE it will be a lot easier to read and modify than your XSLT equivalent.
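For what it's worth, here is a minimal sketch of that kind of transform in Python. The resume shape, the field names, and the `render_resume` helper are all made up for illustration; nothing in this thread defines a resume schema, and whether this reads better than the XSLT equivalent is left to the reader.

```python
import json
from html import escape

# Hypothetical resume document; the field names are assumptions.
RESUME_JSON = """
{
  "name": "Jane Doe",
  "experience": [
    {"title": "Manager", "company": "Veridian Dynamics",
     "description": "Responsible for doing all manner of things."}
  ]
}
"""


def render_resume(resume: dict) -> str:
    """Render a resume dict as a small HTML fragment."""
    parts = [f"<h1>{escape(resume['name'])}</h1>"]
    for job in resume.get("experience", []):
        # escape() keeps characters like & and < from breaking the markup.
        parts.append(f"<h2>{escape(job['title'])}, {escape(job['company'])}</h2>")
        parts.append(f"<p>{escape(job['description'])}</p>")
    return "\n".join(parts)


html_output = render_resume(json.loads(RESUME_JSON))
print(html_output)
```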

Sorry to rain on your parade, but it's true.


> It is horrible for markup too. It is far, far, far too complex and far more verbose than it needs to be.

Oh stop. XML syntax is freaking simple. <tag>content</tag>, <tag><child>...</child>...</tag>, or <tag attribute="value" ...></tag>. Toss in namespaces for composing other formats _if need be_ (and you don't have to use them). And... you're done for 99.9% of any use case you can think of. You have to repeat yourself when you close a tag. Whoopdedoo. Any decent editor will close them for you anyway. I think the "XML sucks" crowd are conflating very complex XML _formats_ with XML _itself_ and deciding that XML is ridiculously complex and verbose because e.g., SOAP is complex and verbose.

Just curious, do you hate HTML for similar reasons as well?


You've forgotten:

* DOCTYPE

* Namespaces and namespace collision handling.

* The likelihood that you will receive badly formed XML and be forced to parse it anyway.

* The multiple different ways of escaping characters.

* Language identification

...I could go on.
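To make the escaping point concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The `<t>` element is a made-up example; the four spellings below are all standard XML ways to write the same text, and a parser has to understand every one of them.

```python
import xml.etree.ElementTree as ET

# Four ways of writing the text "a < b" inside an element.
variants = [
    "<t>a &lt; b</t>",           # predefined entity
    "<t>a &#60; b</t>",          # decimal character reference
    "<t>a &#x3C; b</t>",         # hexadecimal character reference
    "<t><![CDATA[a < b]]></t>",  # CDATA section
]

texts = [ET.fromstring(doc).text for doc in variants]
print(texts)
```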

Oh, and your parser has to handle ALL of this. It's no good assuming that "since the basics are fairly basic it doesn't matter". It doesn't matter only when you're generating XML, but somebody SOMEWHERE is going to have to consume it.

Wanna know why it's typically possible to blow up an XML parser with a memory leak or a buffer overflow but not a JSON parser? This complexity is why. Just using it creates an increased attack surface for your internet facing API.
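One example of the extra machinery, as a hedged sketch: an XML document can declare its own entities in a DOCTYPE, and the parser expands them while reading. The document below is made up, and this says nothing by itself about memory leaks or buffer overflows in any particular parser; it only shows a feature that JSON has no counterpart for. Nested entity declarations of this kind are what "entity expansion" attacks abuse.

```python
import json
import xml.etree.ElementTree as ET

# The document defines an entity and then uses it in its own content.
XML_DOC = '<!DOCTYPE r [<!ENTITY who "world">]><r>hello &who;</r>'
expanded = ET.fromstring(XML_DOC).text

# JSON has no macro or entity mechanism: the string comes back as written.
literal = json.loads('"hello &who;"')

print(expanded)
print(literal)
```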

>Just curious, do you hate HTML for similar reasons as well?

I'm not its biggest fan, and I feel like it could have been done a lot better. However, A) there isn't much alternative so bitching about it is somewhat pointless and B) at least I never have to parse it (want an exercise in frustration some time? try doing THAT...).


> You've forgotten:

I was mentioning the "99.9%" use case of XML for e.g., a resume format. You don't have to specify DOCTYPEs, namespaces, etc.

> The likelihood that you will receive badly formed XML and be forced to parse it anyway.

You're supposed to fail on badly-formed XML!

> Oh, and your parser has to handle ALL of this.

So? I don't have to write an XML parser because there are numerous well-tested ones already written. I wouldn't write a JSON parser for the same reason.

> Wanna know why it's typically possible to blow up an XML parser with a memory leak or a buffer overflow but not a JSON parser?

First of all, you shouldn't speak in absolutes. You certainly could write a JSON parser that blows up for the same reasons. There's nothing inherent in JSON that makes it impossible to write a bad parser.

> at least I never have to parse it (want an exercise in frustration some time? try doing THAT...).

For anything more than a trivial format you can automatically have the parser bind the content to a POJO (or your language's "plain old object" analogue).
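A hand-rolled sketch of what that binding amounts to, in Python rather than Java. The `Experience` class and `bind_experience` function are hypothetical names; real binding libraries generate this mapping for you instead of making you write it out.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Experience:
    """Plain object mirroring an <experience> element (field names assumed)."""
    title: str
    company: str
    description: str


def bind_experience(xml_text: str) -> Experience:
    """Parse one <experience> element and copy its children into the dataclass."""
    root = ET.fromstring(xml_text)
    if root.tag != "experience":
        raise ValueError(f"expected <experience>, got <{root.tag}>")
    # findtext returns the default when a child element is missing.
    return Experience(
        title=root.findtext("title", default="").strip(),
        company=root.findtext("company", default="").strip(),
        description=root.findtext("description", default="").strip(),
    )


job = bind_experience(
    "<experience> <title>Manager</title> <company>Veridian Dynamics</company> "
    "<description> Responsible for doing all manner of things. </description> "
    "</experience>"
)
print(job)
```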


I have to agree. XML can be complicated, or can at least look complicated when using lots of namespaces and what-not, but for most purposes it's dead simple and about as easy to understand as anything else.

Truth be told, I find XML easier to read than JSON in many cases. shrug


Sorry to rain on yours, but XML isn't bad. The way people used it is bad.

Just because you were forced to use bad schemas in enterprisey situations doesn't mean XML is bad. SOAP is not XML. Spring MVC is not XML. Those things use XML.

Docbook's pretty easy to follow. That, in my opinion, is a great example of XML+XSLT.

With XML I can do this:

  <experience>
    <title>Manager</title>
    <company>Veridian Dynamics</company>
    <description>
      Responsible for doing all manner of things.
    </description>
  </experience>

It describes exactly what we're looking for in the same way as the JSON variant does. My editor will auto-close those tags the same way my editor will auto-close the closing curly braces in the JSON version.

Of course, we could make it suck if we just go ahead and add all sorts of extra rules and hierarchy and turn it into a monstrosity.

But I'd rather not.


No, it IS bad and the complaint that 'people use it badly' is simply an excuse to cover for its deficiencies. In many ways, its deficiencies (which YAML/JSON do not have) are an incitement to use it badly, but that's far from the only problem with it.

I am not complaining about being forced to use bad schemas in enterprisey situations, either. That is merely the icing on the cake of the badness that is XML.

I am complaining simply because there is not one instance I have ever come across in my entire programming career (12 years now) where XML would have served me better than an alternative (assuming I had a choice). Zero.

For example:

>With XML I can do this:
>
><experience> <title>Manager</title> <company>Veridian Dynamics</company> <description> Responsible for doing all manner of things. </description> </experience>

Yes, you can, but so what? You can do that in YAML and JSON too. Furthermore, your YAML/JSON parser is 10x less complex, faster, less likely to have buffer overflow security issues and less likely to have memory leaks.
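The "you can do that in JSON too" part is easy to sketch in Python. The JSON layout below is one assumed way of writing the same record, and the snippet only shows that both documents carry the same data; it doesn't measure parser complexity, speed, or security.

```python
import json
import xml.etree.ElementTree as ET

XML_DOC = """
<experience>
  <title>Manager</title>
  <company>Veridian Dynamics</company>
  <description> Responsible for doing all manner of things. </description>
</experience>
"""

# One possible JSON spelling of the same record (layout is an assumption).
JSON_DOC = """
{
  "title": "Manager",
  "company": "Veridian Dynamics",
  "description": "Responsible for doing all manner of things."
}
"""

# Flatten the XML children into a dict keyed by tag name.
from_xml = {
    child.tag: (child.text or "").strip()
    for child in ET.fromstring(XML_DOC.strip())
}
from_json = json.loads(JSON_DOC)

print(from_xml == from_json)
```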

Furthermore, while my app that consumes this markup will probably have to treat <experience><title>Manager<company>Veridian</experience> as valid, because consuming badly formed XML is standard and expected (your product manager will probably treat not parsing it as a bug), my app won't EVER have to treat invalid JSON as valid.

>Of course, we could make it suck

Yes, and there's a 90% chance somebody will. Whereas with JSON and YAML the chances of it being made to suck are far lower, because so many of the unnecessary avenues which are used to make XML suck are closed off in YAML.


I don't have to do anything with XML that I don't want to. I don't have to accept XML that doesn't conform to my schema. You're talking web services again and I'm talking about resumes. You're talking general and I'm talking about a specific place, publishing content, where XML shines. See previous comments - I use XML daily in book publishing. It is not anything like what you're making it out to be.

XML is a markup language. JSON is a serialization language. YAML is a serialization language too. I want to semantically mark up my content. And this is very different than application programming with XML.

You want to write a book with Docbook's XML? Better give it exactly what it wants, or it will reject it. It's not SOAP.


The way I see it, we actually want to serialize a data structure. We have many potential use cases, not just presentation. If I download a thousand resumes I'm not as interested in viewing them as documents as I am in consuming them as data structures, doing filtering / machine learning / ranking / ...
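A rough sketch of that "consume them as data" use case in Python. The resumes, the `years` field, and the `total_years` helper are invented for illustration; a real pipeline would load the documents from wherever they were downloaded.

```python
import json

# Hypothetical batch of resumes; names and fields are made up.
RESUMES_JSON = """
[
  {"name": "Alice", "experience": [{"title": "Manager", "years": 5}]},
  {"name": "Bob", "experience": [{"title": "Engineer", "years": 3}]},
  {"name": "Carol", "experience": [{"title": "Senior Manager", "years": 8}]}
]
"""


def total_years(resume: dict, keyword: str) -> int:
    """Sum the years spent in jobs whose title contains the keyword."""
    return sum(
        job["years"]
        for job in resume["experience"]
        if keyword.lower() in job["title"].lower()
    )


resumes = json.loads(RESUMES_JSON)

# Filter to people with management experience, then rank by years of it.
ranked = sorted(
    (r for r in resumes if total_years(r, "manager") > 0),
    key=lambda r: total_years(r, "manager"),
    reverse=True,
)
names = [r["name"] for r in ranked]
print(names)
```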

The way I look at it, this is as much (and probably way more) a platform to write services on top of as it is another hole to throw resume.xml down. And for that, JSON is undeniably better suited.

Nobody is trying to convince you to publish books in json.


Give it time. As long as JSON is "the answer", people will build more layers of stuff on top of it.

Just like XML.


>I don't have to accept XML that doesn't conform to my schema.

Yes you do, because otherwise people will treat your application as buggy. People have tried doing this with browsers and it never, EVER, ever works. If one app accepts malformed markup, they expect every other one to do it as well and no amount of preaching about the sanctity of well formed markup is going to change their mind.

>You're talking web services again and I'm talking about resumes.

If resumes are going to be done in markup it's so they can be consumed by web services. There is no point in having a machine readable language if it's not going to be read by machines.

>You're talking general and I'm talking about a specific place, publishing content, where XML shines.

I would rather use LaTeX for publishing content. It's complex, but it can handle layout beautifully.

Docbook is complex and it can't: http://www.docbook.org/tdg/en/html/part2.html

For simple markup, I would rather use markdown. Markdown in YAML would probably be a better solution than docbook, but docbook's the standard now, so not much you can really do about that.


Hard to not see you're way too stubborn.

> Yes you do, because otherwise people will treat your application as buggy

You are mistaking HTML/Browser rendering and XML parsing. An application that parses XML for book rendering, or whatever the hell the purpose of the application is, will fail on invalid input, whether it is because of invalid tag nesting or schema invalidity.

That is NORMAL, and you would do the same with JSON, so that's not because of history of HTML and IE/Netscape/whatever that you need to change your mind about XML.

> If resumes are going to be done in markup it's so they can be consumed by web services

So if a web service was to accept a specific type of XML, it would certainly need to be made valid, if a client does not conform why the fuck do you want your service to be permissive? Seriously.

You can make a service that accepts the resume schema directly, or embed it in a CDATA if you want to.

> I would rather use LaTeX for publishing content. It's complex, but it can handle layout beautifully

Talking about apple and oranges. XML and docbook are not the same.


>You are mistaking HTML/Browser rendering and XML parsing.

I am not mistaking, I am comparing.

>An application that parses XML for book rendering, or whatever the hell the purpose of the application is, will fail on invalid input

Which will be treated as a bug if other applications DON'T fail (which some undoubtedly will).

>That is NORMAL, and you would do the same with JSON

In my experience the OPPOSITE is normal. If XML is invalid but still makes sense, you treat it as valid, because if you don't, your competitors will.

If the customer is getting XML from output X and feeding into your app, he doesn't care that output X is outputting invalid XML if your competitors' apps accept it. He'll just treat your app as the broken one, not buy it and move on with his life and you've just lost a sale. I've seen it happen MANY times. This phenomenon does NOT just apply to browsers, it's just more aptly demonstrated with browsers. NOBODY wants a browser that doesn't accept invalid HTML. They never did and they never will.

And what's different about web pages to any other kind of XML? There's more of them, that's all, and people want ALL of them to work. Permissiveness in the consumers is inevitable with ANY kind of XML document that becomes popular for exactly the same reason as it is with browsers.

>that's not because of history of HTML and IE/Netscape/whatever that you need to change your mind about XML.

Yes it is because this sort-of-invalid-but-not-quite problem is a unique one to XML. JSON doesn't have it.

>So if a web service was to accept a specific type of XML, it would certainly need to be made valid

If you don't think somebody would throw invalid XML at your app and expect it to work then you've clearly never worked with XML seriously.

If you think you can just tell the customer to go fuck themselves because the XML they got from another service doesn't validate, see my previous point.

>if a client does not conform why the fuck do you want your service to be permissive? Seriously.

Because it will make your service work where an impermissive one won't, and when your service works and competitors don't, that's called gaining marketshare.

Seriously.

>You can make a service that accepts the resume schema directly, or embed it in a CDATA if you want to.

Or you could just use JSON or YAML.

>Talking about apple and oranges. XML and docbook are not the same.

docbook IS XML.


> I am not mistaking, I am comparing.

Yes, you are mistaking it. HTML is "relaxed" about various parsing errors. XML absolutely is not. If the input document so much as has a closing tag missing, parsing will fail, end of story. If it doesn't match your schema, parsing will fail, end of story. I'm not sure why you keep going on about this point. You're woefully misinformed if you think XML parsers accept malformed input.
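Here is a quick way to check the well-formedness half of that claim with Python's standard library parsers. The `accepts` helper is a made-up name, and the malformed sample is the one quoted earlier in this thread. Note the limits of the sketch: `xml.etree.ElementTree` only checks well-formedness, so schema validation would be a separate step that isn't shown here, and an application is still free to wrap a strict parser in its own recovery logic.

```python
import json
import xml.etree.ElementTree as ET


def accepts(parse, text: str) -> bool:
    """Return True if the parser accepts the text, False if it raises."""
    try:
        parse(text)
    except (ET.ParseError, json.JSONDecodeError):
        return False
    return True


MALFORMED_XML = "<experience><title>Manager<company>Veridian</experience>"
WELL_FORMED_XML = (
    "<experience><title>Manager</title>"
    "<company>Veridian</company></experience>"
)
TRUNCATED_JSON = '{"title": "Manager", "company": "Veridian"'

results = {
    "malformed xml": accepts(ET.fromstring, MALFORMED_XML),
    "well-formed xml": accepts(ET.fromstring, WELL_FORMED_XML),
    "truncated json": accepts(json.loads, TRUNCATED_JSON),
}
print(results)
```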

> Which will be treated as a bug if other applications DON'T fail (which some undoubtedly will).

Are you talking about "my app accepts schema A and another app accepts schema A.1, therefore if I don't accept schema A.1 people will think my app sucks"? I'm not sure why you think this is an XML-specific issue.


> I am not mistaking, I am comparing.

Yes apples and oranges. They taste different, because they are.

> In my experience the OPPOSITE is normal. If XML is invalid but still makes sense, you treat it as valid, because if you don't, your competitors will.

If this is a business requirement you would do the same for JSON.

> If you don't think somebody would throw invalid XML at your app and expect it to work then you've clearly never worked with XML seriously.

Nope. If you have a customer sending you XML which should validate against a predefined schema, no, hell no. I've worked for the European office of publications, and the web services were receiving tons of XML on a daily basis. And you know what, invalid XML was rejected, it's as simple as that. If you don't respect the supplied schema, retry when it's correct.

> docbook IS XML.

Docbook USES XML.

Last but not least, HTML is not XML, HTML is not a subset of XML. Permissiveness has nothing to do with web services acceptance of invalid schemas.

If you call a Google service whether it uses SOAP (so XML) or REST, if the request body is invalid, you'll be rejected either way. The SOAP one won't try to parse more or resolve issues just because it is XML based.


I have to agree here. XSLT failed when they added <xsl:if>. Get your flow control out of my document format. That's what code is for. Adding <xsl:if> makes it code, horrible, horrible code.


> XSLT failed when they added <xsl:if>. Get your flow control out of my document format. That's what code is for.

XSLT isn't a document format, it's by design a domain-specific programming language for transforming XML documents to other formats.

> Adding <xsl:if> makes it code, horrible, horrible code.

It's supposed to be code. Now, if you want to argue that it's horrible code, that's another issue, but complaining that XSLT is code when that's the whole point of XSLT is, well, missing the point.


Oh, I don't know, it's kinda handy when you need it. How often have you wished for that in CSS?



