The Juniper libxo [1] library is an interesting approach: it preserves the traditional text output while also offering an option to emit structured data:
The libxo library allows an application to generate text, XML, JSON, and HTML output using a common set of function calls. The application decides at run time which output style should be produced.
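For instance, a converted utility can pick its output format at run time. This is a hypothetical sketch (wc is the example the libxo manual itself uses; the exact flag spelling and output shape depend on how the utility was converted):

```
# Same binary, same data; the output style is chosen at run time.
$ wc --libxo=json,pretty /etc/motd
{
  "wc": {
    "file": [
      {"lines": 25, "words": 165, "characters": 980, "filename": "/etc/motd"}
    ]
  }
}
```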
There is ongoing work to convert FreeBSD utilities to support libxo [2], and there is a long thread about it on freebsd-arch [3].
I've just started working with the new AWS CLI, and while in the long term I will probably switch to using boto directly from Python, for now I process the CLI's JSON output in bash scripts. Using structured data in this way is really quite interesting. Because it isn't sensitive to whitespace or newline mangling, the JSON output can safely be stored in an environment variable a la Perl (e.g., ```VAR=$(aws ... --output json)```). Rather than hacking up some rudimentary parsing using sed/awk/etc., I can pipe it to an existing parser (e.g., jshon), which will extract the data I want with minimal effort. I think that if I had an objection to XML, it isn't because it's structured but because it's so verbose relative to simpler serialization formats like JSON or YAML. Then again, I don't know why we even bother with those when we're just a tiny bit of syntax away from sexps. :)
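For instance (a rough sketch; the jshon flags are real, but the response path here is illustrative rather than the exact AWS shape):

```
# Capture the CLI's JSON in a variable, then drill into it with jshon:
# -e extracts an element by key or array index, -u prints a bare string.
INSTANCES=$(aws ec2 describe-instances --output json)
echo "$INSTANCES" | jshon -e Reservations -e 0 -e Instances -e 0 -e InstanceId -u
```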
I've also found the utility jq [1] to be a great help when working with JSON data on the CLI, such as output from the AWS CLI. It's great at parsing and selecting output, or performing simple transformations.
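A minimal sketch of the kind of selection it makes easy (the field paths are illustrative):

```
# Print each instance's ID and state as tab-separated text.
aws ec2 describe-instances --output json |
  jq -r '.Reservations[].Instances[] | "\(.InstanceId)\t\(.State.Name)"'
```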
If you're looking to work with structured, hierarchical data from a CLI, this is where PowerShell really shines. Its OO model is a really excellent fit for Windows, and all of the common utilities in PowerShell are designed around this model. And it solves this problem much more elegantly than trying to massage XML through UNIX-style text processing utilities.
On the one hand, I think it would be interesting to see a truly OO shell for UNIX, but on the other hand I don't know if that's really necessary for a UNIX system.
Unfortunately, in practice I've found PowerShell hasn't been much of a panacea. You give up all the problems of parsing text, but you now have new problems: dealing with tools that don't output objects, and getting them to do so.
I used PowerShell for some moderately complex workflow & reporting stuff, but at the end I was kind of on the fence about whether I should have just scrapped it all and done it with traditional tools. It really felt like PowerShell made hard things easy but easy things hard. I don't think I would use PowerShell again unless there was absolutely no other choice.
I was all about PowerShell, and then a coworker called it a gimped Python. I realized nearly everything I'd ever 'solved' in PowerShell was a triviality in Python, and nearly every API and utility that was good in PowerShell would have been just as well off with a Python API. Not to fault Microsoft, it was a huge step forward, but I'd love it if they made bindings easy and deprecated it, now that they've embraced open source.
Sadly, deploying Python (or Ruby, Perl, etc.) at scale is a horrific pain.
If you are automating a large organization, PowerShell has one massive edge: it's already deployed and able to be managed sanely via the domain management tools, with a permission model around it.
Some large (30k+ machine) organizations actually have fairly decent-sized research projects on the best ways to automate deployments and system management. It ends up being far trickier than it appears on the surface, and any tool blessed by MS and shipped by default has a huge edge. PowerShell might not be great -- but it is already on the machine, and the competition is batch files.
Wow, that's a great way to describe it. Funny, because all the workflow and reporting stuff I've done since then has been in Python, and it has been an absolute joy & breeze to build compared to PowerShell.
PowerShell is nice if you are a sysadmin--it beats the pants off of cmd.exe or clicking around wizards. It's very nice that it has real data structures like arrays and hashmaps, and I found JSON processing to be much better than using bash+jq.
That said, I'm not sure I would ever want to implement much in it other than typical systems automation stuff.
I find that the more I use PowerShell, the more it makes sense. If you take the time to do things the PowerShell way and get over the fact that it isn't UNIX, then it's insanely productive.
That being said, I'm with you on your last statement. Jeff Snover has always said that Windows benefits from PS being OO because its configuration is API-oriented. PS brings the kind of consistency to Windows that has typically been available in UNIX.
Where are the UML diagrams and how many sprints will it take to implement all the tools? Has an architect approved all the design patterns that will be used to implement them?
On a serious note though, wouldn't XML sed and awk just be XSLT transforms?
XSLT is rather verbose compared to typical sed/awk syntax.
While it's not quite sed/awk (there isn't a programming language per se, just simple edit/select commands), xmlstarlet can do some of the things to an XML document that you might do to a text document with sed, and it uses XSLT internally.
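A small sketch of the idea, assuming a hypothetical books.xml with /catalog/book elements:

```
# grep-like selection: print the titles of books priced over 30.
xmlstarlet sel -t -v '/catalog/book[price>30]/title' -n books.xml

# sed-like in-place edit: set every book's lang attribute to "en".
xmlstarlet ed -L -u '/catalog/book/@lang' -v 'en' books.xml
```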
If Unix's commandline were critiqued as a REPL (and its commands as functions/procedures), problems like impoverished datatypes would be more commonly considered.
Isn't this pushing the shell prompt a bit hard? The shell's intention is not to be a REPL or a programming environment; it's intended for commanding basic I/O operations (at least as of when it was invented). It has programming-like constructs because that helps while automating.
And because of this, unices come with at least one accompanying higher-level scripting language like Perl or Python.
My problem with the shell is that the output of commands is often irregular. You never know what you're going to get. Newlines? Maybe... CRLF? Tab delimited? And it often varies between operating systems and environments. I'd much rather every command dumped XML or JSON that I could parse.
It's hard to consider clear text to be the best option when you're trying to parse file names. What tool will properly generate your delimiters? How do you get your tool (or bash loop) to parse on that delimiter?
Filenames with spaces (or quotes, or Unicode characters, or control characters, or...) have wrecked many a seemingly ideal program which needs to iterate over files.
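The usual defensive idiom is to make NUL the delimiter, since it's the one byte a POSIX filename cannot contain. A bash sketch:

```
# find emits NUL-terminated names; read -d '' consumes them intact,
# so spaces, quotes, and even embedded newlines survive the trip.
while IFS= read -r -d '' f; do
    printf 'processing: %s\n' "$f"
done < <(find . -type f -print0)
```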
The shell itself is a programming language; it has constructs and everything because, well, it is a programming language.
I am not a fan of the Bourne shell syntax, but you can write fairly large programs in it, and the idea itself of using other programs and piping them together is at the core and heart of UNIX.
Perl ships with any modern UNIX for historical reasons, but it has been criticized for not really following the principles of UNIX very well.
Yes, heavily-adopted things are often pressured to grow to their logical conclusions. One logical conclusion of a commandline+commands is REPL+procedures.
One critique of the worse-is-better [1] philosophy (Unix) is that the design starts cracking more as it's pushed to scale to its logical conclusion.
Of course, maybe the worse-is-better essay is inapplicable (and the author himself argued both sides under a pseudonym or two). And good design is a matter of degree; we could all critique the-right-thing representatives. But here I think it's a useful model to describe what's going on.
In this case, we should also ask: should software evolve into any of its logical conclusions?
I'm a CLI ninja (at least by the measure of the many tasks I need to do daily), even though I almost never need its expanded use cases like arrays or process substitution. For example, the recent Shellshock bug was the result of an extended use case whose implications were not well thought out.
"Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can." - Look at the samples: Emacs, MATLAB, Mozilla, Opera, Trillian, and Drupal. Really?
I wouldn't say it's a non-problem, but I agree that there is a certain elegance to line-oriented text processing. It's the same design problem as choosing a data structure; sometimes you can get by with a list, but some problems benefit from non-linear data structures.
The problem is not XML, JSON, or any other non-linear data structure. It's trying to use the shell to manipulate data it's not intended for.
The shell will always have problems handling bloated/heavyweight data structures. In those situations, using an intermediary higher-level language would help.
PS: I'm sorry; I just realized my first post unintentionally goes against the new rule.
I would say it's a non-problem. It's not hard at all to grab columns using awk, do replacements using sed, and find using grep. Programs that output plain structured text are simple and versatile.
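e.g., the kind of one-liner the parent is describing (the columns here are ps's usual USER and COMMAND fields):

```
# Who is running which command, excluding root:
ps aux | grep -v '^root' | awk '{print $1, $11}' | sed 's/^/user: /'
```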
No, it's a solution to a quite big problem (having structured data) that people sidestep by dealing with subpar tools and text-mangling, mainly because it would take tons of coordination, effort, and work to change the situation.
>It was not a problem for nearly 45 years since epoch.
What I mean is, this sentence you wrote is content-free. What does it mean, "it was not a problem"? That they managed to do things and be productive without it?
In the same sense, not having computers was not a problem for nearly two millennia since Christ. They managed to get by without them.
Not having something is not a problem in general, when you don't know what you're missing. You only see the problem when you compare the faster and more productive workflow and the possibilities you'd get if you had something compared to coping without it.
Which is how inventions are made, compared to "well, what we got works for us".
>Shell is not intended for high level data processing.
That's a limitation of the shell, that could very well be bypassed, not some physical law.
One could imagine shells very well intended for high level data processing (and in fact people have built some of those -- heck, even the Lisp Machines qualify).
Agreed. If you want to iterate over blobs of Unicode text which contains a lot of various formatting (such as file names), then you absolutely need something other than tabs, commas, and spaces to delimit elements.
Even -0 fails with some file contents, due to the null bytes embedded in some Unicode encodings (e.g., UTF-16).
It's really too bad that there isn't a standard binary data structure format that we could process the same way we process stdin/stdout/stderr. I stumbled onto BSON the other day, which is what MongoDB uses to store binary JSON.
The main issue is that it doesn’t have arbitrary precision like, say, OpenTNL. That could probably be overcome somewhat with compressed streams.
It would also be nice to have a leaf-first format, or hints for depth-first or breadth-first traversal so that the stream could be processed as it’s received.
These are really old ideas so maybe I just haven’t stumbled onto a solution.
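For what it's worth, BSON can already be pulled back into the line-oriented world. A sketch, assuming MongoDB's bsondump tool and a hypothetical collection.bson (the field name is made up):

```
# bsondump prints one JSON document per line, which ordinary
# shell tools (or jq) can then process as usual.
bsondump collection.bson | jq '.name'
```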
That was my first thought, too. Why do people forget about good old standards?
There are so many attempts to do binary JSON, binary XML, binary YAML, ... whatever. Followed by language-specific serialization formats like Python's pickle, PHP's serialize, and so on.
What a huge mess!
If any binary format wants to become a universal competitor to those, it:
- should offer some big advantages over the already established ASN.1, and
- should be significantly smaller and faster than compressed human-readable formats (e.g. json+gzip or xml+xz); see the quick check below.
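A quick way to measure that second point on your own data (data.json is a placeholder for whatever payload you care about):

```
# Compare the raw and gzipped sizes, in bytes.
wc -c < data.json
gzip -c data.json | wc -c
```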
XML, as a medium of transmission of structured data, pales when compared to basically any other commonly used alternative in 2015, in ease of use, readability and available tooling.
Now, I enthusiastically agree with the underlying ideal of having programs communicate on the shell through data structures instead of through eight-bit bytes. Let's do that, be it XML or anything else.
I am not the one to promote XML, far from it, but the tooling for it is superb. Recently, for example, we had to connect our (Java) app to the NetSuite web service. All it took was pointing at their web service endpoint, and the tool created strongly-typed (Java) client classes and proxies.
That is in Java, and it works with C#, too (Visual Studio). Provide the endpoint, VS reads the WSDL exposed by the service, and generates all the required classes and proxies.
And at all times, you see what functions the web service is exposing, and what parameters you need to send, and what kind of response you will get.
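On the Java side this is a single JDK command. A sketch; the URL is a placeholder, not NetSuite's actual endpoint:

```
# wsimport reads the service's WSDL and generates strongly-typed
# client classes; -keep keeps the sources, -p sets the package.
wsimport -keep -p com.example.client 'https://example.com/service?wsdl'
```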
Your complaint is about people who don't know what XML is or how to use it, and about broken tools, not about XML. XML does not cause any of the problems in your examples.
I'm genuinely (non-snarkily) interested in the other commonly used alternatives you're mentioning. In addition to JSON, is there anything else you'd recommend looking into?
I'm really digging YAML as a serialization format. I haven't ever used it with web applications, so I don't know what kind of tools might be available for that.
TOML and YAML are nice as human-readable configuration formats; Protocol Buffers/Cap'n Proto are really good if the schema of your data is strongly defined.
[1] https://juniper.github.io/libxo/libxo-manual.html
[2] https://wiki.freebsd.org/LibXo
[3] https://lists.freebsd.org/pipermail/freebsd-arch/2014-July/0...