The Juniper libxo [1] library is an interesting approach: it preserves the traditional text output while also offering an option to emit structured data:
The libxo library allows an application to generate text, XML, JSON, and HTML output using a common set of function calls. The application decides at run time which output style should be produced.
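For instance, a converted utility can pick its output format at run time. This is a hypothetical sketch (wc is the example the libxo manual itself uses; the exact flag spelling and output shape depend on how the utility was converted):

```
# Same binary, same data; the output style is chosen at run time.
$ wc --libxo=json,pretty /etc/motd
{
  "wc": {
    "file": [
      {"lines": 25, "words": 165, "characters": 980, "filename": "/etc/motd"}
    ]
  }
}
```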
There is ongoing work to convert FreeBSD utilities to support libxo [2], and there is a long thread about it on freebsd-arch [3].
I've just started working with the new AWS CLI, and while in the long term I will probably switch to using boto directly from Python, for now I process the CLI's JSON output in bash scripts. Using structured data in this way is really quite interesting. Because it isn't sensitive to whitespace or newline mangling, the JSON output can safely be stored in an environment variable a la Perl (e.g., ```VAR=$(aws ... --output json)```). Rather than hacking up some rudimentary parsing using sed/awk/etc., I can pipe it to an existing parser (e.g., jshon), which will extract the data I want with minimal effort. I think that if I had an objection to XML, it isn't because it's structured but because it's so verbose relative to simpler serialization formats like JSON or YAML. Then again, I don't know why we even bother with those when we're just a tiny bit of syntax away from sexps. :)
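For instance (a rough sketch; the jshon flags are real, but the response path here is illustrative rather than the exact AWS shape):

```
# Capture the CLI's JSON in a variable, then drill into it with jshon:
# -e extracts an element by key or array index, -u prints a bare string.
INSTANCES=$(aws ec2 describe-instances --output json)
echo "$INSTANCES" | jshon -e Reservations -e 0 -e Instances -e 0 -e InstanceId -u
```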
I've also found the utility jq [1] to be a great help when working with JSON data on the CLI, such as output from the AWS CLI. It's great at parsing and selecting output, or performing simple transformations.
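A minimal sketch of the kind of selection it makes easy (the field paths are illustrative):

```
# Print each instance's ID and state as tab-separated text.
aws ec2 describe-instances --output json |
  jq -r '.Reservations[].Instances[] | "\(.InstanceId)\t\(.State.Name)"'
```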
If you're looking to work with structured, hierarchical data from a CLI, this is where PowerShell really shines. Its OO model is a really excellent fit for Windows, and all of the common utilities in PowerShell are designed around this model. And it solves this problem much more elegantly than trying to massage XML through UNIX-style text processing utilities.
On the one hand, I think it would be interesting to see a truly OO shell for UNIX, but on the other hand I don't know if that's really necessary for a UNIX system.
Unfortunately, in practice I've found PowerShell hasn't been much of a panacea. You give up all the problems of parsing text, but you now have new problems: dealing with tools that don't output objects, and getting them to do so.
I used PowerShell for some moderately complex workflow & reporting stuff, but at the end I was kind of on the fence about whether I should have just scrapped it all and done it with traditional tools. It really felt like PowerShell made hard things easy but easy things hard. I don't think I would use PowerShell again unless there was absolutely no other choice.
I was all about PowerShell, and then a coworker called it a gimped Python. I realized nearly everything I'd ever 'solved' in PowerShell was a triviality in Python, and nearly every API and utility that was good in PowerShell would have been just as well off with a Python API. Not to fault Microsoft, it was a huge step forward, but I'd love it if they made bindings easy and deprecated it, now that they've embraced open source.
Sadly, deploying Python (or Ruby, Perl, etc.) at scale is a horrific pain.
If you are automating a large organization, PowerShell has one massive edge: it's already deployed and able to be managed sanely via the domain management tools, with a permission model around it.
Some large (30k+ machine) organizations actually have fairly decent-sized research projects on the best ways to automate deployments and system management. It ends up being far trickier than it appears on the surface, and any tool blessed by MS and shipped by default has a huge edge. PowerShell might not be great -- but it is already on the machine, and the competition is batch files.
Wow, that's a great way to describe it. Funny, because all the workflow and reporting stuff I've done since then has been in Python, and it has been an absolute joy & breeze to build compared to PowerShell.
PowerShell is nice if you are a sysadmin--it beats the pants off of cmd.exe or clicking around wizards. It's very nice that it has real data structures like arrays and hashmaps, and I found JSON processing to be much better than using bash+jq.
That said, I'm not sure I would ever want to implement much in it other than typical systems automation stuff.
I find that the more I use PowerShell, the more it makes sense. If you take the time to do things the PowerShell way and get over the fact that it isn't UNIX, then it's insanely productive.
That being said, I'm with you on your last statement. Jeff Snover has always said that Windows benefits from PS being OO because its configuration is API-oriented. PS brings the kind of consistency to Windows that has typically been available in UNIX.
Where are the UML diagrams and how many sprints will it take to implement all the tools? Has an architect approved all the design patterns that will be used to implement them?
On a serious note though, wouldn't XML sed and awk just be XSLT transforms?
XSLT is rather verbose compared to typical sed/awk syntax.
While it's not quite sed/awk (there isn't a programming language per se, just simple edit/select commands), xmlstarlet can do some of the things to an XML document that you might do to a text document with sed, and it uses XSLT internally.
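A small sketch of the idea, assuming a hypothetical books.xml with /catalog/book elements:

```
# grep-like selection: print the titles of books priced over 30.
xmlstarlet sel -t -v '/catalog/book[price>30]/title' -n books.xml

# sed-like in-place edit: set every book's lang attribute to "en".
xmlstarlet ed -L -u '/catalog/book/@lang' -v 'en' books.xml
```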
If Unix's commandline were critiqued as a REPL (and its commands as functions/procedures), problems like impoverished datatypes would be more commonly considered.
Isn't this pushing the shell prompt a bit hard? The shell's intention is not to be a REPL or a programming environment; it's intended for commanding basic I/O operations (at least as of when it was invented). It has programming-like constructs because that helps while automating.
And because of this, unices come with at least one accompanying higher-level scripting language like Perl or Python.
My problem with the shell is that the output of commands is often irregular. You never know what you're going to get. Newlines? Maybe... CRLF? Tab delimited? And it often varies between operating systems and environments. I'd much rather every command dumped XML or JSON that I could parse.
It's hard to consider clear text to be the best option when you're trying to parse file names. What tool will properly generate your delimiters? How do you get your tool (or bash loop) to parse on that delimiter?
Filenames with spaces (or quotes, or Unicode characters, or control characters, or...) have wrecked many a seemingly ideal program which needs to iterate over files.
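The usual defensive idiom is to make NUL the delimiter, since it's the one byte a POSIX filename cannot contain. A bash sketch:

```
# find emits NUL-terminated names; read -d '' consumes them intact,
# so spaces, quotes, and even embedded newlines survive the trip.
while IFS= read -r -d '' f; do
    printf 'processing: %s\n' "$f"
done < <(find . -type f -print0)
```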
The shell itself is a programming language; it has constructs and everything because, well, it is a programming language.
I am not a fan of the Bourne shell syntax, but you can write fairly large programs in it, and the idea itself of using other programs and piping them together is at the core and heart of UNIX.
Perl ships with any modern UNIX for historical reasons, but it has been criticized for not really following the principles of UNIX very well.
Yes, heavily-adopted things are often pressured to grow to their logical conclusions. One logical conclusion of a commandline+commands is REPL+procedures.
One critique of the worse-is-better [1] philosophy (Unix) is that the design starts cracking more as it's pushed to scale to its logical conclusion.
Of course, maybe the worse-is-better essay is inapplicable (and the author himself argued both sides under a pseudonym or two). And good design is a matter of degree; we could all critique the-right-thing representatives. But here I think it's a useful model to describe what's going on.
In this case, we should also ask: should software evolve into any of its logical conclusions?
I'm a CLI ninja (at least by the measure of the many tasks I need to do daily), even though I almost never need its expanded use cases like arrays or process substitution. For example, the recent Shellshock bug was the result of an extended use case whose implications were not well thought out.
"Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can." - Look at the samples: Emacs, MATLAB, Mozilla, Opera, Trillian, and Drupal. Really?
I wouldn't say it's a non-problem, but I agree that there is a certain elegance to line-oriented text processing. It's the same design problem as choosing a data structure; sometimes you can get by with a list, but some problems benefit from non-linear data structures.
The problem is not XML, JSON, or any other non-linear data structure. It's trying to use the shell to manipulate data it's not intended for.
The shell will always have problems handling bloated/heavyweight data structures. In those situations, using an intermediary higher-level language would help.
PS: I'm sorry; I just realized my first post unintentionally goes against the new rule.
I would say it's a non-problem. It's not hard at all to grab columns using awk, do replacements using sed, and find using grep. Programs that output plain structured text are simple and versatile.
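e.g., the kind of one-liner the parent is describing (the columns here are ps's usual USER and COMMAND fields):

```
# Who is running which command, excluding root:
ps aux | grep -v '^root' | awk '{print $1, $11}' | sed 's/^/user: /'
```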
No, it's a solution to a quite big problem (having structured data) that people sidestep by dealing with subpar tools and text-mangling, mainly because it would take tons of coordination, effort, and work to change the situation.
>It was not a problem for nearly 45 years since epoch.
What I mean is, this sentence you wrote is content-free. What does it mean, "it was not a problem"? That they managed to do things and be productive without it?
In the same sense, not having computers was not a problem for nearly two millennia since Christ. They managed to get by without them.
Not having something is not a problem in general, when you don't know what you're missing. You only see the problem when you compare the faster and more productive workflow and the possibilities you'd get if you had something compared to coping without it.
Which is how inventions are made, compared to "well, what we got works for us".
>Shell is not intended for high level data processing.
That's a limitation of the shell, that could very well be bypassed, not some physical law.
One could imagine shells very well intended for high level data processing (and in fact people have built some of those -- heck, even the Lisp Machines qualify).
Agreed. If you want to iterate over blobs of Unicode text which contains a lot of various formatting (such as file names), then you absolutely need something other than tabs, commas, and spaces to delimit elements.
Even -0 fails with some file contents, due to the null bytes embedded in some Unicode encodings (e.g., UTF-16).
It's really too bad that there isn't a standard binary data structure format that we could process the same way we process stdin/stdout/stderr. I stumbled onto BSON the other day, which is what MongoDB uses to store binary JSON.
The main issue is that it doesn’t have arbitrary precision like, say, OpenTNL. That could probably be overcome somewhat with compressed streams.
It would also be nice to have a leaf-first format, or hints for depth-first or breadth-first traversal so that the stream could be processed as it’s received.
These are really old ideas so maybe I just haven’t stumbled onto a solution.
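For what it's worth, BSON can already be pulled back into the line-oriented world. A sketch, assuming MongoDB's bsondump tool and a hypothetical collection.bson (the field name is made up):

```
# bsondump prints one JSON document per line, which ordinary
# shell tools (or jq) can then process as usual.
bsondump collection.bson | jq '.name'
```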
That was my first thought, too. Why do people forget about good old standards?
There are so many attempts to do binary JSON, binary XML, binary YAML, ... whatever. Followed by language-specific serialization formats like Python's pickle, PHP's serialize, and so on.
What a huge mess!
If any binary format wants to become a universal competitor to those, it:
- should offer some big advantages over the already established ASN.1, and
- should be significantly smaller and faster than compressed human-readable formats (e.g. json+gzip or xml+xz); see the quick check below.
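A quick way to measure that second point on your own data (data.json is a placeholder for whatever payload you care about):

```
# Compare the raw and gzipped sizes, in bytes.
wc -c < data.json
gzip -c data.json | wc -c
```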
XML, as a medium of transmission of structured data, pales when compared to basically any other commonly used alternative in 2015, in ease of use, readability and available tooling.
Now, I enthusiastically agree with the underlying ideal of having programs communicate on the shell through data structures instead of through eight-bit bytes. Let's do that, be it XML or anything else.
I am not the one to promote XML, far from it, but the tooling for it is superb. Recently, for example, we had to connect our (Java) app to the NetSuite web service. All it took was pointing at their web service endpoint, and the tool created strongly-typed (Java) client classes and proxies.
That is in Java, and it works with C#, too (Visual Studio). Provide the endpoint, VS reads the WSDL exposed by the service, and generates all the required classes and proxies.
And at all times, you see what functions the web service is exposing, and what parameters you need to send, and what kind of response you will get.
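On the Java side this is a single JDK command. A sketch; the URL is a placeholder, not NetSuite's actual endpoint:

```
# wsimport reads the service's WSDL and generates strongly-typed
# client classes; -keep keeps the sources, -p sets the package.
wsimport -keep -p com.example.client 'https://example.com/service?wsdl'
```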
Your complaint is about people who don't know what XML is or how to use it, and about broken tools, not about XML. XML does not cause any of the problems in your examples.
I'm genuinely (non-snarkily) interested in the other commonly used alternatives you're mentioning. In addition to JSON, is there anything else you'd recommend looking into?
I'm really digging YAML as a serialization format. I haven't ever used it with web applications, so I don't know what kind of tools might be available for that.
TOML and YAML are nice as human-readable configuration formats; Protocol Buffers/Cap'n Proto are really good if the schema of your data is strongly defined.
[1] https://juniper.github.io/libxo/libxo-manual.html
[2] https://wiki.freebsd.org/LibXo
[3] https://lists.freebsd.org/pipermail/freebsd-arch/2014-July/0...