The Rule of Silence (2006)

borplk · on Dec 13, 2016

To play devil's advocate, part of the reason things like The Rule of Silence are talked about is because of the messy unix philosophy of treating everything like plain text.

If structured data was embraced we would have developed appropriate tooling to interact with it in the way that we prefer.

This runs very deep in unix and a lot of people are too "brainwashed" to think of other ways. Instead they develop other exotic ways of dealing with the problem.

Oh you don't like that output? Easy! pipe that crap into sed then awk then perl then cut then wc and you're golden!

When you get tot that point you have to understand that you have chosen to ignore the fact that the data you are dealing with must be represented in something much closer to a relational database than lines of ASCII text.

Logging is another area you see the consequences of this. A log is not a line of text. Repeat after me a log entry is not a damn line of text.

"Oh but isn't it neat you can pipe it to grep?" NO! No it's not neat, maybe it was neat 20 years ago. Today I want that damn data in a structure. Then you can still print it out in one line and pipe it to grep all you want.

Another area that you see the unfortunate side effects of this philosophy is with the mess of file-based software configuration.

Yes I get it, you like your SSH session and Emacs/Vim blah blah but that's short-sighted.

I want my software configuration stored in a database not in a bunch of fragile files with made up syntax that are always one typo or syntax error away from being potentially silently ignored.

The fetish for easily-editable ASCII files and escaping from structure is holding us back. Structured data does not automatically imply hidden and inaccessible, that's a matter of developing appropriate tooling.

skrebbel · on Dec 13, 2016

In 1993, someone made exactly your argument a in Redmond board room, and so many people agreed that what you describe could be adequately called "The Windows Philosophy".

All settings in a database and not text files (the registry); a command-line that pipes data, not text (PowerShell). Tailored UIs to change settings, not magic invocations and obsure text file syntaxes.

I guess most developers on HN are also aware of the downsides of this philosophy. If not, try configuring IIS.

rtpg · on Dec 13, 2016

Surely you can reconcile structured representations and something like the Unix command line.

Imagine if the default wasn't bash, but something like Ruby + pipes (or some other terse language).

What is the argument for shell scripts not working on typed objects? How much time has been lost, how many bugs have been created because every single interaction between shell scripts has to include its own parser. How many versions of " get file created timestamp from ls" do we need?

Something Windows does get right is the clipboard. You place a thing on the clipboard, and when you paste, the receiving program can decide on the best representation. This is why copy-pasting images has worked so magically.

I could see an alternative system where such a mechanism exists for shell programs.

pjungwir · on Dec 13, 2016

Wow, the clipboard is really a thought-provoking comparison. I'm not sure if many people are quite aware of what you said, unless they've done desktop programming: when an application puts something on the clipboard, it can put multiple formats, so that when something else wants to retrieve it, it can use whichever format it prefers. This is how you get such good copy/paste interoperability between programs.

What if pipes worked the same way? What if we added stdout++, stderr++, and stdin++, and when you write to stdout/err++, you can say which format you're writing to, and you can write as many formats as you like. And then you can query stdin++ for which formats are available, and read whichever you like. And if stdin++ is empty, you could even automatically offer it with a single "text" format, that is just stdin(legacy).

The appeal of the Unix text-based approach is a kind of "worse is better". It is so simple and easy, compared to Powershell. The clipboard idea seems like it has a similarly low barrier to entry, and is even kind of backwards-compatible. It seems like something you could add gradually to existing tools, which would solve the marketplace-like chicken-and-egg problem.

You could even start to add new bash syntax, e.g. `structify my.log || filter ip_addr || sort response_size`. (Too bad that `||` already means something else....) Someone should write a thesis about this! :-)

buzzybee · on Dec 13, 2016

Didn't the Amiga do something like this? (I'm not actually familiar with its OS, I've just seen allusions to how it handled file formats)

anexprogrammer · on Dec 15, 2016

Amiga had datatypes.library, and also IFF filetypes (that were mostly lovely) designed be EA for DPaint. It's the way the world should work.

Let's say you're MS writing word for the Amiga.

They provide a datatypes description for doc files. this gives ability to read and write the format, and fingerprint it (not based on extension).

Now any program, old or new, that wants to read or write doc files can do. It's just there.

chei0iaV · on Dec 13, 2016

The problem is that now simply providing user-friendly output is not enough. For every program or script you throw together you need to provide the text output for the user, and then the type object stream for piping. And then, the user would need to read documentation to see how to access each piece of data, what data type it is, what data type the other command takes, and maybe consider how to convert one to the other.

... at which point you basically have a scripting language, so you could just as well use an existing one (e.g. Ruby).

emodendroket · on Dec 13, 2016

> The problem is that now simply providing user-friendly output is not enough. For every program or script you throw together you need to provide the text output for the user, and then the type object stream for piping.

In PowerShell if the result of a command is just an object the object is pretty-printed to the console in practice ends up looking pretty much like what a Unix command would have given you.

chei0iaV · on Dec 13, 2016

With generic pretty-printing, your program output becomes generic.

Compare the output of "df -h" vs the PowerShell equivalent "gdr -psprovider filesystem", for example. One provides the data in dense (easy to follow) rows, while the other spaces it out across the whole screen, leaving large gaps of empty space around some columns while also cutting off data in others. The difference is especially noticeable of you have network shares with long paths.

PowerShell is probably nice for scripting, but I wouldn't want to have it as my shell.

emodendroket · on Dec 13, 2016

You can pretty easily pass it to select and get just the properties you care about, or you could output to a different format with the various out- commands. I find it pretty good as a shell.

__jal · on Dec 13, 2016

Options to select output? Sounds like advice from the Rule of Silence.

But.

- People above were just complaining about having to use sed to tweak output. I don't see why they would prefer a built-in filter to an external one. The external filter is far more flexible, and if that isn't enough, you can replace it.

- I'm generally not a fan of applications that tailor their output to what they think the human wants. Unless there is a deeply compelling reason, I want to see the same output on pts/0 as something down the pipeline. The reason for this is that picking up environmental hints to serve as implicit configuration is hacky, subject to error and and can later be the cause of really difficult-to-find bugs.

Perhaps I'm just irredeemably brainwashed. If you like a typed command line, Redmond has your back. For me, wanting types is a signal that I should start considering whatever little shell hack I'm working on complex enough to take it to a language that wasn't designed for ad hoc interactive use.

And at the same time, I really, very much do not want my command line to look like C#.

I get that a lot of folks these days are mostly GUI users who maybe type a little at git or run test suites from the command line, and not much else. I get why things like Powershell are appealing to such folks[1]. But when the command line is your primary interface, strong typing and OO hoop-jumping is huge waste of cognitive energy.

I do feel that Unix, to a first approximation, got the balance somewhere close to right. Loose coupling with lot of shared commonalities instead of a rigid type system and nonexplicit magic works really well for me, and if tighter coupling is a good idea, then I'll build it.

[1] Why they want to radically change the command line instead of using their favorite language to do systems stuff from the comfort of wherever they spend most of their time, I do not get.

emodendroket · on Dec 14, 2016

I feel compelled to point out, re-reading this, that you've misunderstood. You pass the output to the Select command and indicate the object properties you want; it's not a feature that has to be built in into each command.

nailer · on Dec 14, 2016

> People above were just complaining about having to use sed to tweak output. I don't see why they would prefer a built-in filter to an external one.

You don't see why someone would prefer 'select' to scraping with grep or awk?

emodendroket · on Dec 13, 2016

Yeah, wow, all I have to do is write a bunch of regular expressions tailored to the unique output of this command instead of passing in a list of property names. It's so easy.

dahauns · on Dec 13, 2016

To add to that: a list of property names you don't even have to exactly know beforehand, since the shell can easily deliver you the exact data structure and metadata of any object (cmdlets, parameters, results,...) usable on the shell.

I find the idea baffling that basically having to have the whole syntax tree memorized instead of the shell providing it for me is somehow less of a "waste of cognitive energy".

emodendroket · on Dec 14, 2016

I really think people are mistaking "I'm not familiar with it" with "it's bad and poorly designed" because they've forgotten what it was like to first use the Unix command line.

nailer · on Dec 14, 2016

Slightly more readable posh version:

    Get-PSDrive | Where Provider -like '*FileSystem'

michaelcampbell · on Dec 13, 2016

> in practice ends up looking pretty much like what a Unix command would have given you.

So... no real improvement, then?

emodendroket · on Dec 13, 2016

The improvement is that piping commands together doesn't require tedious text-munging while the console output looks basically the same. Which I think you could have figured out if you thought a little harder before jumping to snark.

romaniv · on Dec 13, 2016

>What is the argument for shell scripts not working on typed objects?

Typed objects can make it harder to pipe commands together. How do you grep a tree when tree is an actual data structure and grep expects a list of items as input? You would need to have converters. Either specific converter between tree and list, or a generic one: tree->text->list.

>Something Windows does get right is the clipboard.

It useful, but the actual implementation is pretty bad. Opaque, prone to security issues, holds only single item, cannot be automated.

JadeNB · on Dec 13, 2016

> Typed objects can make it harder to pipe commands together. How do you grep a tree when tree is an actual data structure and grep expects a list of items as input? You would need to have converters. Either specific converter between tree and list, or a generic one: tree->text->list.

To be fair, untyped objects also require converters, but at every boundary. That is, instead of having some pipes of the form `program -> mutually agreeable data structure -> program` and some pipes of the form `program -> unacceptable data structure -> parser -> program` (as happens with a typed language), you are guaranteed by a text-based interface always to have pipes of the form `program -> deparser -> text -> parser -> program`.

rtpg · on Dec 14, 2016

grepping a tree would not be hard. You simply go down to the leaves and "grep" on the properties themselves. Inversely, structured data lets you easily get to a property.

For example, if you grep the output ls -l for a file named "1", you'll also get files with 1 in their timestamp. In text land, you have to edit the ls command to get simpler output. In structured land, you could edit your filter: ls -l | grep name~=1

You could imagine various structured data filter tools that could be built that wouldn't require modifying the input.

Though in this example, you can easily use awk to select the column, wouldn't it be nice to not have to worry about things like escape characters when parsing text?

patmcguire · on Dec 13, 2016

How about protocol buffers and content negotiation? Your pipe figures out a proto that program a can emit and program b can consume.

ls -> repeated FileDescriptor files; | repeated string names;

Where FileDescriptor is whatever it needs to be, has all the info ls -l does. You have a heirarchy of outputs: if the next takes FileDescriptors, you give it FileDescriptors, if it doesn't you give it strings.

What would go to stdout goes through a toString filter.

Kalium · on Dec 13, 2016

You are absolutely, completely correct. The two can certainly be reconciled!

There is one possible complication, though. The two would need to be reconciled in the same way by everyone who wants to write a shell tool. Given that even fairly simple standards (RSS, HTML, etc.) cause lots of failures to comply, what are the odds of near-universal compliance in a larger and more diverse ecosystem like shell utilities?

fulafel · on Dec 13, 2016

Image processing works pretty well in shell pipelines because image formats are generally self-identifying.

cyphar · on Dec 13, 2016

> What is the argument for shell scripts not working on typed objects? How much time has been lost, how many bugs have been created because every single interaction between shell scripts has to include its own parser. How many versions of " get file created timestamp from ls" do we need?

Aside: that's what the stat command is for. My big concern with types is how would you make sure that the output of a command will always have the right types? Otherwise you'll have runtime type errors which would be just as bad as runtime parsing errors.

junke · on Dec 13, 2016

> Otherwise you'll have runtime type errors which would be just as bad as runtime parsing errors.

Parsing errors in script shells can easily go unnoticed (until you realize your data is corrupted).

tejtm · on Dec 13, 2016

> How many versions of " get file created timestamp from ls" do we need?

none? 'ls' is for humans. 'stat' is for scripts.

mattbee · on Dec 13, 2016

What you get from text + availability of source code = a set of practical documentation that shipped right where you needed it.

Microsoft forgot (and still forget) the documentation for Windows - if you want a nice A-Z reference for the bootloader, or kernel, or shell or IIS's configuration file or half the command-line tools, you're usually out of luck. The official place is often an inaccessible, badly-written knowledgebase article written from a task-based, usually GUI-based perspective.

It never mattered that they'd implemented a more coherent system full of better ideas, because the only way they'd tell you about it is through the GUI.

The MSDN CDs from 20 years ago were really good for a complete programmer's reference, but 1) I'm not sure how well they kept that up and 2) I could never find anything as comprehensive for sysadmins.

JdeBP · on Dec 13, 2016

You should have kept looking at the MSDN and TechNet, because the idea that there's no documentation because it has been forgotten, or indeed the idea that there is no documentation, is utter nonsense.

Windows Server 2012 Command-line reference:

* https://technet.microsoft.com/en-gb/library/cc754340(v=ws.11...

Windows XP A-Z Command-line reference, that is actually named that:

* https://technet.microsoft.com/en-gb/library/bb490890.aspx

Doco for the wdsutil and md commands, picked at random:

* https://technet.microsoft.com/en-gb/library/cc771206(v=ws.11...

* https://technet.microsoft.com/en-gb/library/cc754711(v=ws.11...

bcdedit command references:

* https://technet.microsoft.com/en-gb/library/cc731662(v=ws.11...

* https://msdn.microsoft.com/en-gb/library/ff542205(VS.85).asp...

And so forth. There's a huge amount of doco, including reference doco.

emodendroket · on Dec 13, 2016

This perspective sounds a bit out-of-date when you consider how heavily they're pushing to get PowerShell into sysadmins' workflow. The online MSDN docs are also often pretty good.

emodendroket · on Dec 13, 2016

The Windows way of working is only "self-evidently" horrible if you're used to the Unix way of doing things. There are real defects that are legacy baggage (running as admin by default is not good, the registry probably shouldn't be a single database or at least should have better segregation between apps) but having a UI and having real objects in the shell isn't one of them. And I hardly think the registry and PowerShell, which came out nowhere near the same time, were conceived of at the same time.

skrebbel · on Dec 13, 2016

> And I hardly think the registry and PowerShell, which came out nowhere near the same time, were conceived of at the same time.

I believe that the registry, in a way, caused PowerShell.

PowerShell works the way it does because Windows is structured data all the way down. A text-based shell a-la bash would not be very useful for Windows sysadmins. If you want to do Windows automation (e.g. on a cluster of windows servers or whatnot), you need to process and manipulate structured data (server settings, user permissions, AD groups, whatever). Hence, a shell and scripting language for doing just that.

If Microsoft hadn't moved from .ini files to the registry between Windows 3.1 and 95, I don't think PowerShell would've had the same design goals as it does now.

emodendroket · on Dec 13, 2016

Perhaps, but PowerShell wouldn't look the way it does without the CLR being ready to hook into either. And I could easily see just having applications that interact with the structured data but return text -- that's what all the old stuff did.

numbsafari · on Dec 13, 2016

S/PowerShell/Windows Scripting Host/ ... which was later than the registry, but not much.

emodendroket · on Dec 13, 2016

I'd say PowerShell and the WSH are fairly different but I've worked very little with the latter.

Pxtl · on Dec 13, 2016

The simple fact that the Registry doesn't have comments is ludicrous.

ygra · on Dec 13, 2016

What database has?

To clarify: The registry is a database for OS or application stuff (caches, settings, etc.). It's not meant to be user-editable and outside arcane trouble-shooting stuff you're unlikely to ever have to venture in there.

mr_overalls · on Dec 13, 2016

Oracle, for one, has had it for a long time.

"Use the COMMENT statement to add a comment about a table, view, materialized view, or column into the data dictionary."

https://docs.oracle.com/cd/B19306_01/server.102/b14200/state...

teddyh · on Dec 13, 2016

Also PostgreSQL (since 7.0, released in 2000):

https://www.postgresql.org/docs/current/static/sql-comment.h...

Pxtl · on Dec 14, 2016

Also mssql, the extended property ms_description can be applied to objects and columns and appears in SSMS as comments.

zajd · on Dec 13, 2016

> It's not meant to be user-editable

I think that's kind of a stretch since regedit has been a part of windows as long as I can remember.

emodendroket · on Dec 13, 2016

I'd say regedit is not really the preferred method for anyone to interact with the registry though.

chungy · on Dec 13, 2016

I'd say that regedit is integral to day-to-day use of Windows.

emodendroket · on Dec 13, 2016

Well I don't know what in the world you're doing but I almost never use it.

gayprogrammer · on Dec 13, 2016

> arcane troubleshooting stuff

... is exactly the reason to have useful comments in there.

Pxtl · on Dec 13, 2016

Config files do, which is what the Registry replaces.

brusch64 · on Dec 13, 2016

Adding to that my favourite pet peeve - translated UIs for server processes.

Every time I come around a German installation of IIS or SQL Server I'm cringing. Googling the right solution and then trying to figure out how they've translated this option is something I can't stand.

TheCoelacanth · on Dec 13, 2016

For the best of both worlds, why not something that is both text and structured, like JSON or sexps?

willismichael · on Dec 13, 2016

Yes! I don't know how many times I've had to learn yet another one-off configuration syntax, logging format, or template language and wondered why oh why doesn't this use s-expressions (or even json?)... I fail to understand why the same people who are so allergic to lisp-like syntax will happily work on some project that makes you context switch between about a dozen different syntaxes once you count in all of the configuration, templating, and expression languages, some of which are embedded in string literals with absolutely no tooling or syntax highlighting.

redcalx · on Dec 13, 2016

I think IIS is a bad example, many of it's config settings are a complete mess of partially obsoleted legacy settings. And some of it has no definitive reference, e.g. the threading and concurrency settings which seem to change with each IIS version, but no documentation is available that defines what settings are available in each version.

BurningFrog · on Dec 13, 2016

The philosophy sounds great.

Maybe the downsides are because of the execution?

sp332 · on Dec 13, 2016

Yeah, the registry has been solid for years now, which pretty clearly shows that the troubles were in implementation and not the philosophy. And powershell's object-passing has been pretty great from the start, so I'm not even sure how that's an argument against.

Chris2048 · on Dec 13, 2016

The windows registry is not too bad. You can easily store this stuff in a SQLite file though: https://www.sqlite.org/appfileformat.html

PowerShell' object-oriented nature is great btw!

Pxtl · on Dec 13, 2016

Around the office when teaching PowerShell we say it takes about two Googles per line. That's not a compliment.

setq · on Dec 13, 2016

This is in line with my experiences with powershell (or powershit as we refer to it).

The canonical example of the complete failure and total friction is in the simple case of obtaining a file from a web server and sticking it on disk. This is the steps:

1. Try various built in cmdlets. Eventually find one that works.

2. It's a big file and gets entirely buffered in RAM and takes the machine out. You don't get the privilege of finding this out until you try dragging a 20GiB vhdx image from a build server to plonk on your SCVMM platform. That conks it completely.

3. So you think you'll be clever and just use curl. Oh no there's a curl alias!

4. Every damn machine gets sent a reconfig to remove this stupid alias.

5. Then when you do get there you find that the curl output pipe just pukes multibyte garbage out. 20GiB of it that took a long time to come across the wire.

6. You tell curl to save the file directly. Might as well have used cmd.exe by now!

So 8 Googlings and 3 hours later you got a fucking file. Every time I use powershell that is my life.

ygra · on Dec 13, 2016

    man iwr

Oh, there's an OutFile parameter. Let's see what it does

    man iwr -param outfile

How nice, it writes the response directly to a file instead of to the pipeline. The curl alias actually points to the very same cmdlet.

Not to excuse bad examples on the internet (there are lots of people who fail to grasp PowerShell and still try to write articles and how-tos), but PowerShell having built-in documentation for commands and parameters is actually fairly easy to figure out from within the shell. Admittedly, once you learn the basic half dozen or so cmdlets you tend to use all the time.

You're free to ask me (or on SO, I tend to answer PowerShell questions there) if you're having trouble. Figuring above out took literally just 20 seconds. Really. I'm kinda sick of people cursing and blaming tools just because they're different from what they're used to. If I did the same and complained about bash and Unix (which I rarely find the need or time to learn) I'd be tarred and feathered in seconds ...

setq · on Dec 13, 2016

It doesn't work for large files. Try iwr with outfile on a 16Gb file on a 4/8Gb machine.

Knowing all the edge cases, exceptions and places where reality breaks down is the problem.

Where software should indeed work with the principle of least surprise, Microsoft have patented the principle of most inconvenient surprise.

Also SO has one small comment about this which didn't exist when I discovered it. I had to use windbg to dump the CLR heap and find out that there were a crap ton of buffers...

And that's every day using powershell. It's even worse if you trip over someone else's cmdlets which aren't aware of the correct harry potter style incantations to issue that don't cause the universe to implode.

The whole thing is a joke. A bad one.

ygra · on Dec 13, 2016

Unless you also write things to the pipeline via -PassThru specifying -OutFile will read in 10K chunks and write them to the file. No memory apart from that buffer is used. Look at the source, it's public. My PowerShell instance uses 38 MiB of memory the whole time during the download, regardless of the file size.

setq · on Dec 13, 2016

What's in your $PSVersionTable?

I'm doing this on windows 2008 server.

ygra · on Dec 13, 2016

5.1 on Windows 10. Admittedly, I never really tried iwr on older versions. Heck, the last time I tried downloading something with PowerShell to a file I went the WebClient.DownloadFile route, so it's been a while.

You can probably install a more recent version of WinRM on that machine, though (or by now probably the open-source version of PowerShell).

xenophonf · on Dec 13, 2016

To be fair there are many, many examples of downloading files using System.Net.WebClient. In fact that is how the first several search results for `powershell download file` tell you to do it despite Invoke-WebRequest being included in PowerShell since 2012. `get-help download` doesn't return anything useful. Contrast that with `apropos download`, which at least on FreeBSD includes wget in the search results.

TeMPOraL · on Dec 13, 2016

Well, for many years I had that experience every time I tried to use UNIX command line. I think it's just the issue of familiarity.

ygra · on Dec 13, 2016

How many Googles per line do Unix shell scripts need, assuming equal familiarity with both? As for me, I understand PowerShell fairly well, but hate having to deal with Unix utilities and for me the amount I have to google to get stuff done on a Unix command-line is significantly higher than with PowerShell.

It's just that once you learn how to use a tool you don't have to think about how to approach a problem anymore, you just do it. And then, when learning a completely different tool you have to learn again. Surprise.

Pxtl · on Dec 13, 2016

I should've explained the background - these are C# coders coming to Powershell, not Unix admins.

The problem with Powershell is that it tried to have its cake and eat it too - it wants the lightweight IDE of Unix-philosophy tools, but the detailed structure of C# objects. The problem is that the former is tightly tied to the simple common api of raw text, and the latter is inherently dependent on an intellisense-oriented IDE and static typing that heavily hints the names and parameters of useful actions.

Powershell manages to combine the worst of both worlds - objects means more complex APIs, but without the powerful IDE guiding you around those APIs.

justinlaster · on Dec 13, 2016

I'm sorry what? Powershell has to be one of the better documented systems I've used in quite some time, and the sytnax lends itself to a certain "learning velocity" if you will. I also find that the general standard for tooling is a bit higher than say, bash.

It's also been really neat to see how well Powershell lends itself to extending other applications.

norswap · on Dec 13, 2016

For people who don't know the standard Unix tools, I expect much the same.

emodendroket · on Dec 13, 2016

Well, sure, if you don't know PowerShell or use it often you'll need a lot of help. That just sounds like an argument for slavish adherence to existing conventions till the end of time -- in which case why bother with a new language?

GrinningFool · on Dec 13, 2016

Yep, it's really ground breaking stuff[1]

It actually is really good, but I do find it interesting that the first and last loser of the desktop wars got there a good long time ago.

[1] https://en.wikipedia.org/wiki/Workplace_Shell , https://en.wikipedia.org/wiki/Object_REXX

ajuc · on Dec 13, 2016

Powershell was much later invention. Agreed otherways.

lolpowershell · on Dec 13, 2016

lol at powershell in 1993

skrebbel · on Dec 13, 2016

Tell me that somehow writing this comment was worth making a new HN account for.

Ianvdl · on Dec 13, 2016

> To play devil's advocate, part of the reason things like The Rule of Silence are talked about is because of the messy unix philosophy of treating everything like plain text.

The amount of tooling that surrounds text is vast and has evolved over decades. You cannot replace that with a single database and call it better.

I can place the majority of my config files in git and version them. I can easily perform a full text search on any log regardless of its syntax. I can effortlessly relocate and distribute logs to different filesystems.

> I want my software configuration stored in a database not in a bunch of fragile files with made up syntax that are always one typo or syntax error away from being potentially silently ignored.

So you would like your configuration stored as a bunch of made up data structures? Databases do not make you immune from typos and syntax errors, ask anyone who has ever written a delete without a where clause.

And what happens when the giant, all-knowing database your system depends on has a bug or vulnerability? When something on my linux box breaks I can drop to a recovery mode or lower runlevel and fix it with a kernel and a shell as dependencies.

I think you would be a lot happier with a configuration management system (puppet, ansible et al) and some log infrastructure without having to completely redo the unix philosophy and the years of experience and fine-tuning that comes with it.

Chris2048 · on Dec 13, 2016

> The amount of tooling

Or the amount of cruft.

> git

Any structured data can still be serialized and diff'd, but it isn't always the clearest. Where is the contrast here?

> made up data structures?

so standardize the non-text format

> Databases do not make you immune from typos

Depends on the constraints. There are few on text files, possibly excluding sudoers.

If you aren't sticking to good practice you can just as easily rm a text file.

> has a bug or vulnerability?

What happens when the kernel has a bug or vulnerability? There are quite a few mature db systems. Plus, all text files depend on the file-system, which is why you store root on something stable like ext (still depends on hdd drivers though, unless you have some some ramfs on boot).

> years of experience and fine-tuning

Can you describe specifically what that experience is, and what the "fine-tuning" is?

laumars · on Dec 13, 2016

You can rm a database as well. In multiple different ways in fact. You can also put constraints on text files by forcing editing via a helper program (much like visudo and crontab do). In that regard the text format isn't much different from a database format aside the encoding of the data (it's probably also worth mentioning that you can - and some people do - store a database as flat text files if you wanted. They don't necessarily have to be binary blobs).

> "What happens when the kernel has a bug or vulnerability? There are quite a few mature db systems. Plus, all text files depend on the file-system, which is why you store root on something stable like ext (still depends on hdd drivers though, unless you have some some ramfs on boot)."

I'm not sure I get your point. What do kernel bugs have to do with text vs binary formats? Or the argument for or against centralised databases? Databases still need to store files on persistent storage so if your text files are compromised from a kernel vulnerability or instability in the file system then your binary database files will also be.

Chris2048 · on Dec 13, 2016

I'm not sure using a helper program has as many assurances around it, and manipulating data in a db often won't involve any 'rm' command, though copy/replacing a text file might.

> What do kernel bugs have to do with text vs binary formats?

you said "what if the db has a bug or vulnerability", my point is you have to rely on something, even the kernel. The difference is how stable these things are, and databases can be very stable.

> if your text files are compromised from a kernel vulnerability

not all kernel vulnerabilities will put the db at risk, it depends on the exposure to parts of the kernel. You can restrict the type and "fanciness" of the file-system a database will use if you know you don't need those additions, in the same way you use a stable fs for system files. You need a basic set of binaries one way or the other to access this data.

laumars · on Dec 13, 2016

> I'm not sure using a helper program has as many assurances around it, and manipulating data in a db often won't involve any 'rm' command, though copy/replacing a text file might.

DBs still have files, they can be rm'ed. DBs also have other delete commands like 'DELETE FROM x'

My point is it's just as easy to "accidentally" delete data in a database as it is in text files.

> you said "what if the db has a bug or vulnerability", my point is you have to rely on something, even the kernel. The difference is how stable these things are, and databases can be very stable.

Someone else said that. I think the whole stability point is moot.

> not all kernel vulnerabilities will put the db at risk, it depends on the exposure to parts of the kernel. You can restrict the type and "fanciness" of the file-system a database will use if you know you don't need those additions, in the same way you use a stable fs for system files. You need a basic set of binaries one way or the other to access this data.

If a software vulnerability exposes text files like /etc/passwd then it can expose the database disk files in exactly the same way. Having a database format won't magically stop files from being read remotely.

It's also worth mentioning that most of the time it's not kernel vulnerabilities you need to be worried about (not that I'm saying they're not bad); any bug in software (e.g. Wordpress vulnerability) that allows an attacker to specify the source file to be read would put both your database config concept and the existing UNIX config layout at risk.

xenophonf · on Dec 13, 2016

  The fetish for easily-editable ASCII files and escaping
  from structure is holding us back. Structured data does not
  automatically imply hidden and inaccessible, that's a
  matter of developing appropriate tooling.

Speaking as someone who's in the process of automating configuration management on Windows, I'll say that this is _much_ easier said than done. Imagine something like Active Directory Federation Services, which stores its configuration in a database (SQL Server) and offers a good API for altering configuration data (Microsoft.Adfs.PowerShell). Instead of using a generic configuration file template---something supported by just about every configuration management system, using a wide variety of templating mechanims---I must instead write a custom interface between the configuration management system and the AD FS configuration management API. Contrast that with Shibboleth, which stores its configuration in a small collection of XML files (i.e., still strongly typed configuration data). These I can manage relatively easily using my configuration management system's existing file templating mechanism---no special adapter required. I can easily keep a log of all changes by storing these XML files in Git. I can put human-readable comments into them using the XML comment syntax. The same goes for configuration files that use Java properties or JSON or YAML or even ini-style syntax, not to mention all the apps that have configuration files that amount to executable code loaded directly into the run-time environment (e.g., amavisd-new's config file is Perl code, SimpleSAMLphp's is PHP, portmaster's is Bourne shell, portupgrade's is Ruby, and so forth).

In short, your configuration database scheme is like an executable, whereas text config files are like source code (literally, in some cases). I'd much rather work with source code, as it remains human readable while at the same time amenable to a variety of text manipulation tools. Databases and APIs are more difficult to worth with, especially from the perspective of enterprise configuration management.

Edit: See also http://catb.org/jargon/html/S/SMOP.html.

Rotareti · on Dec 13, 2016

I was on Windows for about 20 years and moved to Linux about a year ago. One thing I happen to like the most in Linux is file-based configuration. If you add a settings-database to Linux you would end up with the same mess that you find in Windows. In Windows you have the Registry (a giant database for OS and app settings) AND you have the config files. Great... settings are spread over a gigantic db AND the file-system. IMHO: a nightmare. If you add a db for settings there will still be config files, they won't go away and you end up with an additional (huge) layer of complexity..

GrinningFool · on Dec 13, 2016

Don't forget group policies (2 or 3 layers of those, iirc)!

yason · on Dec 13, 2016

> If structured data was embraced we would have developed appropriate tooling to interact with it in the way that we prefer.

What kind of tooling might work with ad-hoc structured data and still getting all the tools to talk with each other like in Unix? How would it work without having to write input parsing rules, data processing rules, and output formatting/filtering/composing rules for each tool?

I suspect that the reason it's not very popular to pass around structured data is that it's damn difficult to make various tools understand arbitrary data streams. Conversely, the power of text is that the tools understand text, only text, and do not understand the context, i.e. the user decides what the text means and how it needs to be processed. Then the tools become generic and the user can apply them in any number of contexts.

tim333 · on Dec 13, 2016

>it's not very popular to pass around structured data

There's an awful lot of JSON that gets passed around. That seems a reasonable compromise between readable text and some sort of structure.

falcolas · on Dec 13, 2016

JSON is an OK serialization format, and a terrible format for outputting human readable data. Let's take, for example, the output of 'ls -al' and imagine it were presented via JSON:

The keys 'mode', 'number of links', 'owner', 'group', 'size', 'last modified' would be repeated over and over again, stacked vertically.

Mode would remain an arbitrarily formatted string, or (worse) be broken into its own object for every attribute represented by the string.

A reasonably populated directory would fill multiple screens with cruft.

The formatting of the timestamp in the last modified field would still be an arbitrary string.

Comparing two files would require piping through another utility to extract just the appropriate fields and display them unadorned.

Sure, it might be moderately easier to consume in another program since you can simply iterate over a list and reference specific keys, but it's not really that hard to iterate over a list of lines and extract a specific field by number.

    ls -al | awk '{print $5,$9}'

vs.

    ls -al | jq '.[] | [.size,.name|tostring] | join(" ")'

yason · on Dec 13, 2016

So, how do you parse, read, and process JSON without additional instructions of how to interpret the data and what to look for?

All right, maybe you do have a common tool that implements a query language so that you can filter out certain paths and objects from the JSON data into a thinner data set with known formatting expected by the next command in the pipeline. Then you need to write that command and you need more instructions, possibly again in another language, to describe what you actually want to do with the data now that you know where to find it.

At this point you typically write a separate script file to do this because it's easier to express in full-blown programming language what you want to do with the tree of hashes and lists and values. On the other hand, programs for lines of text are quite short and fit on the command line.

I don't see an immediate value in structured data, and especially none that would outweight loss in general applicability and usability in comparison to text based data processing.

Don't get me wrong: I would love to see a good prototype or sketch of how such a thing would work, and then try to imagine how I might be able to apply it to similar things for which I use Unix command line today. But I'm sceptical of "how" and also quite sceptical of "why".

jslabovitz · on Dec 13, 2016

I've started using jq (https://stedolan.github.io/jq/) in pipelines to parse/transform JSON output from commands that support it. It has some XPath-like notations for specifying elements within the JSON data tree. It's not perfect, but it's a good start and useful right now.

Slackwise · on Dec 13, 2016

YAML is a superset of JSON, but is much more human readable:

http://yaml.org

This means you can give it JSON when you have JSON, but otherwise use the nicer format when possible.

aloisdg · on Dec 13, 2016

TOML is far better

https://github.com/toml-lang/toml

emodendroket · on Dec 13, 2016

But the programs don't really just "understand" text; it has to be munged into exactly the format they expect.

module0000 · on Dec 13, 2016

> Yes I get it, you like your SSH session and Emacs/Vim blah blah but that's short-sighted.

How so? I'm one of those people that like my SSH session, and in my case vim and "blah blah blah". I've contributed to countless open source software packages that you likely use with this method, and so have tons of other developers. Nothing is broken here, things are working great for everyone who reads the manual and follows it.

> I want my software configuration stored in a database not in a bunch of fragile files with made up syntax that are always one typo or syntax error away from being potentially silently ignored.

apache, postfix, haproxy, even vim are certainly not prone to silently ignore anything, just to name a few.

> The fetish for easily-editable ASCII files and escaping from structure is holding us back.

Holding us back from what?

I am both a developer and an administrator, and I've had all the fun with solaris/aix configurations that are often not stored in plain text that I care to have. If you also have this experience, and still feel the way you do, then I'd love to hear more. Otherwise, your rant comes off as "your way is hard, and I don't want to learn it!"

Look at all the available structures available for that plain text you speak of... XML, JSON, YAML, the list goes on. You are free to use one of those, then you have that structure you crave. There are plenty of areas that could use revolution, but UNIX-like configuration files are not one of them. There is no problem here. If you are making typos or mis-configuring your software, then you have a problem of your own creation.

borplk · on Dec 13, 2016

> How so?

As I mentioned in another comment in my view the problem is that something like the configuration is shared both by the humans and the computers. Because of this we settle on something that is not optimal for either group.

We end up with something that is hostile to both the humans and the computers, just in different ways.

In fact the argument of people like you for ASCII config files exactly demonstrates my point. You are fighting for your human-convenience against the machines.

> Holding us back from what?

By embracing and acknowledging that the humans and computers are not meant to share a language we free ourselves from this push-pull tension between human vs machine convenience.

We can develop formats and tooling that respects its human audience, that doesn't punish the human for making small superficial syntax or typo errors and so on.

And we can finally step the hell out of the way of computers and let them use what is suitable for them.

And at that point you could still have your SSH session and Vim/Emacs and blah blah blah and you could still view and interact with stuff as plaintext if you wanted to.

> apache, postfix, haproxy, even vim are certainly not prone to silently ignore anything, just to name a few.

It's not always a matter of silently ignoring something but due to the nature of the task it is certainly very easy to shoot yourself in the foot doing something that isn't technically an error but wasn't your intention.

For example you can silently break your cron jobs by leaving windows-newlines in them.

Perfect example of humans and computers sharing a language that is hostile to the human.

BAD BAD human! You stupid human why do you use bad Windows invisible characters? Use good linux invisible characters instead that are more tasty for your almighty lord Linux.

module0000 · on Dec 13, 2016

> It's not always a matter of silently ignoring something but due to the nature of the task it is certainly very easy to shoot yourself in the foot doing something that isn't technically an error but wasn't your intention.

I agree with you here, it's easy to write perfectly valid configurations that don't do what you intended. But throwing it all out seems like the baby going out with the bathwater to me.

In all seriousness, what if it was XML all around? I hate writing XML by hand, but that's part of the problem you are describing(ie human editing of raw config). XML parses very nicely, so difficulty of coding tools to speak XML is almost non-existent.

All in all, it's just a big change you are proposing. Us UNIX people are incredibly change-averse. We keep getting burned :)

borplk · on Dec 13, 2016

> All in all, it's just a big change you are proposing. Us UNIX people are incredibly change-averse. We keep getting burned :)

Well that's exactly my point. I don't propose that we deploy it tomorrow but I want to see people do it and think about it and talk about it so our grand children can have better computing. Instead of sticking to the way of our fathers for all eternity.

XML as you identified has the same problem.

You need a human-dedicated interface that captures the intent of the human and you then convert that to something that the computer likes and tell them here ... this is what the human has ordered.

When we share the same raw input file with the computers that's when we set ourselves up for trouble.

Do I have the ultimate solution? No. Can I still complain about it being a problem? Yes.

One "solution" is a graphical rich interface with things like auto completion and validation and all that so it's much more capable of capturing the true intent of the humans. Basically the same way we interact with other web sites.

Imagine your bank told you to append a new text line to the end of "transactions.txt" file if you wanted to transfer money.

Now I put solutions in quotation marks because I know a GUI has other practical limitations and problems but you get the idea.

My point is as humans eventually we have to learn to graduate from sharing a rudimentary text language with the computer for the sake of short-term convenience.

wallacoloo · on Dec 13, 2016

A lot of config files are already trivial for the computer to parse (by design). Most config files even follow a loose standard of some sort (e.g. INI, which is a sort of key-value system).

Would it not be easier to keep this format, and then add some integration into your favorite text editor that prevents you from accidentally editing the INI keys, or warns if you create a syntax error, etc? You could even make a generic INI gui editor that replaces appropriate values with checkboxes, sliders, etc.

I guess I don't see much wrong with the formats for most config files, but if you insist that there is a problem, and the problems you've outlined are caused only when editing them, then why not change the editor and preserve the format?

(That cron problem does sound bad though)

jstanley · on Dec 13, 2016

> "Oh but isn't it neat you can pipe it to grep?" NO! No it's not neat, maybe it was neat 20 years ago. Today I want that damn data in a structure. Then you can still print it out in one line and pipe it to grep all you want.

The systemd journal works how you describe, and it is very painful to interact with. I'll take plaintext logfiles any day of the week.

It's fine if you want to interact with the log in ways that have been designed into it. But:

- it's harder to work out what you can delete to free up space in an emergency

- it's harder to get logrotate to do what you want

- it's harder to use "tac" to search through the log from the bottom up

> I want my software configuration stored in a database

So now you can't put comments in your config, you can't (as easily) deploy config with puppet, or in RPMs. You can't easily diff separate configs.

borplk · on Dec 13, 2016

I get it.

All the things that you mention can in theory be fixed over time.

The stuff I'm talking about is not for the next 6 months. It's not very meaningful to compare it against the current tools and landscape.

I can almost imagine a similar conversation in the past.

Someone saying "MAYBE ONE DAY WE CAN FLY!" and everyone's like "BUT OUR HORSES CAN ONLY JUMP 2 METERS HIGH! It would never work."

I understand your comment from a pragmatic point of view but none of those problems are big or important enough that we couldn't fix them in other ways.

Throwing away a rich structured piece of data and trading that for a dumb line of characters that needs to be re-parsed just so that it's easier to use logrotate and tac with them and so on is a losing trade.

digi_owl · on Dec 15, 2016

> You can't easily diff separate configs.

Reminding me of one of the little niceties of Gobolinux.

You have a pristine pr version config copy sitting in the main dir, and a package-wide settings dir. It also provides a command (implemented as a shell script, as is most of Gobolinux tools) that gets run upon installing a new package version.

If said command detects a difference between existing and new config files it gives you various options. You can have it retain all the old files, replace them with the new files, or even bring up a merged file that give you the new lines as comments next to the old ones.

pif · on Dec 13, 2016

> ... the messy unix philosophy of treating everything like plain text. If structured data ...

You are forgetting a crucial point: plain text is very well defined. Actually, it was already defined when the first Unix tools were being written. Using plain text means that you can use grep to search the logs of your program, even if your program was written yesterday and grep was written 40 years ago.

Structured data? In which format? Who will define the UNIQUE format to be used from now on for every tool? The same people who chose Javascript as the web programming language?

Do you realize that choosing plain text prevented any IE6-like monstrosity from happening?

devdas · on Dec 14, 2016

Clearly, you want to avoid the overloading caused by implementing a half baked version of Common LISP into every utility. Just use s-expressions.

Everyone can just boot into emacs :P.

michrassena · on Dec 13, 2016

You would think plain text was well-defined, but along came Unicode and even that's not true anymore.

verinus · on Dec 13, 2016

I'm with you when it comes to structured data, but plz no more data bases. these config files do not need to be centralized. I am thinking more in the direction of a parser that could check the validity of a configuration...

riskneural · on Dec 13, 2016

So, horribly unreadable xml config files?

TeMPOraL · on Dec 13, 2016

Like S-expressions don't exist. Or even goddamn JSON. Come on, we don't have to jump from one stupidity (unstructured text) to another (using XML as a data representation format).

nailer · on Dec 13, 2016

CSON or .desktop / .service or something similar is immediately understandable to most people and doesn't waste time with unnecessary tokens like XML does.

stevedonovan · on Dec 13, 2016

Not necessarily. JSON is not bad, _if_ you allow comments. Even plain-jane key/value config files can be sanity-checked. I suspect part of the problem is that anything fancy like that is awkward to do in C, so people take the lazy way out.

karl42 · on Dec 13, 2016

Maybe many tiny sqlite databases? Then you would not need to to centralize your data in a single database, but still do queries across different config files.

TeMPOraL · on Dec 13, 2016

SQLite is a huge win. All the power of SQL, and you don't have to introduce system-wide and non-local dependencies.

flukus · on Dec 14, 2016

How is this a win? Instead of opening a text file we have to write update statements to change any setting. Sql is itself a verbose language that should have been discarded in the 80's.

ptero · on Dec 13, 2016

Your complaints are, IMO, orthogonal to the UNIX philosophy (which, IMO, is also pretty orthogonal to the rule of science; now I have upset both sides :) ).

UNIX philosophy is, as written in the article, is based on programs that do one thing well and work with other programs. The second part, "work with other programs" is the one that encourages (but not requires) simple, text based I/O.

If A, B and C write programs and independently design some custom, structured, binary I/O the chance of them being compatible is nil. If they output text, the UNIX glue of pipes and text conversions makes them cooperate quickly and efficiently. Not elegant? Sure. But working well in no time.

TeMPOraL · on Dec 13, 2016

That's my primary issue with UNIX culture. It took a huge step backwards by deciding to work with unstructured text. It wasn't a wrong turn, mind you. It was backtracking on known and understood best practices all the way and then picking a wrong turn. And only now people seem to rediscover what was in common use in the era before UNIX - the virtues of structured text.

I guess our industry is meant to run in circles, only changing the type of brackets on each loop (from parens to curly on this iteration).

taneq · on Dec 13, 2016

It's not like unstructured piped text is the only possible way to work. It's widely used precisely because it's so expedient. If you use structured data, then every program in the sequence has to understand the structure. If you just smash everything flat into a stream of text, you can then massage it into whatever form you need.

It's not always the best way to approach a problem, but it's not meant to be. It's duct tape. You use it where it's good enough.

digi_owl · on Dec 13, 2016

The way i see it, shell scripting and piping allows non-programmers to get their feet wet, one command at a time.

You run a command, look at the output, now you know what the next command in the pipeline will see, and can add adjustments as needed.

Powershell etc seems to be more programmer oriented in that one keep thinking in terms of variables and structures that gets passed around.

And this seems to be the curse of recent years. More and more code is written by programmers for programmers. Likely because everyone has their head in the cloud farms and only the "front end" people has to deal with the users.

UNIX came when you risked having admins and users sitting terminal to terminal in the same university room. Thus things got made that allowed said users to deal with things on their own without having to nag the admins all the time.

TeMPOraL · on Dec 13, 2016

UNIX actually came when users were programmers at the same time. There was an expectation in the past that using computers involved knowing your way around your OS and being able to configure and script things.

TeMPOraL · on Dec 13, 2016

If you use structured data in a standard format, you can have a single system-wide implementation of a parser and then have each program process the data it needs, already structured in a semantically-meaningful way.

In current, unstructured text reality, each program has to have its own (usually buggy, half-assed) shotgun parser, and it has to introduce semantic meaning back to the data all by itself. And then it destroys all that meaning by outputting its own data in unstructured text.

It works somewhat ok until some updated tool changes its output, or you try and move your script to a different *nix system.

taneq · on Dec 14, 2016

But if your data is structured in a semantically meaningful way, then your receiving program needs to understand those semantics. Maybe you could introduce new command line tools to refactor streams of data so as to adapt one program to another, but I can't see it being simpler and quicker (in terms of piping input from one place to another) than the current approach.

I do like the idea of a standard parser to avoid the ad-hoc implementation of parsers into everything.

Your last comment gives a hint at the real problem, which is people using command line hackery in 'permanent' solutions. It's duct tape. You don't build important system components out of duct tape. Well, you shouldn't, anyway.

ArkyBeagle · on Dec 13, 2016

I can't disagree there. I've used SNMP in anger before. Underlying SNMP is the MIB.

People ran away screaming from it :)

qwertyuiop924 · on Dec 13, 2016

Structured text is good. Very good, in fact. It might even be idea. structured binary data, less so, at least as a storage format.

I want to be able to look at your file format using tools that haven't been specialized to the task. Is that so wrong?

TeMPOraL · on Dec 13, 2016

Well, I'm advocating for structured text, not binary. Mostly because I haven't seen a future-proof binary format yet, and editing binary formats indeed would require special tooling. I think - for a data exchange protocol meant to be used between many applications - going structured text instead of binary is a worthwhile tradeoff of little lower efficiency vs. much better accessibility.

EDIT: Some comments here are slowly making me like the idea of a standard structured binary more and more.

ryandrake · on Dec 13, 2016

Let's say I want to know how many days since an asset on my web server has been modified. With bash + some standard unix tools, from the top of my head I have to do something like this:

    curl -svo /dev/null http://example.com/file 2>&1 | grep Last-Modified | cut -d ' ' -f 3-

And that's just to get the last modified date in text form. Now I'm writing a script that parses that date and gets today's date, convert them to days, and subtract. YUCK!

Wouldn't it be nice if your shell could do this?

    curl(http://example.com/file).response_headers.Last-Modified.subtract(date().now).days

I think it would be nice.

qwertyuiop924 · on Dec 13, 2016

Yes, but as it stands, most of those would have to be builtins for that to work. I would rather have:

   curl -j example.com|select headers.Last-Modified|time before now|time to days

Where the commands send and receive JSON.

bandrami · on Dec 13, 2016

Can you name me a single UNIX configuration file whose format is not "structured"?

TeMPOraL · on Dec 13, 2016

Config files are less of a problem. The issue is with programs, which you want to use with pipes. Each has its own undocumented, arbitrary pseudo-structure with often inconsistent semantics, optimized for viewing by user.

bandrami · on Dec 13, 2016

What programs are you thinking of? Maybe this is my sysadmin bias but about 90% of my UNIX tools usage is on config files...

JdeBP · on Dec 13, 2016

Let me suggest an incendiary example: systemd

systemctl status wibble.service : Displays human-readable information, with colours, line drawings, and variable-length free-form text; that is exceedingly hard for a program to parse reliably.

Contrast with

systemctl show wibble.service : Outputs machine-readable information, in systemd's favourite form of a Windows INI file.

bandrami · on Dec 13, 2016

Yikes. Makes me glad I run Slackware. Though I seem to recall from The Init Wars that it was precisely this quality of SystemD that made people lob the charge that it violated the "Unix philosophy"

TeMPOraL · on Dec 13, 2016

ls, ps, du, df, ... pretty much all CLI tools. The kind you use in scripts.

bandrami · on Dec 13, 2016

All four of which users are enjoined, over and over again, not to try to parse the output of (particularly ls). That is, those are the tools specifically not meant to be connected by pipelines, but merely used for operator convenience.

qwertyuiop924 · on Dec 13, 2016

...But how do you extract that data otherwise?

bandrami · on Dec 14, 2016

Instead of ls, find. Instead of ps, you parse the nodes in the /proc filesystem that ps itself parses. Ditto the /sys filesystem and du/df.

qwertyuiop924 · on Dec 14, 2016

When those tools were written, /sys and /proc didn't exist.

borplk · on Dec 13, 2016

It's only "specialized" because we haven't been doing it so it's considered special.

At some point you have to admit that what's meant for the computer is not always byte by byte the same as what's meant for the human.

We try to shove these two together and we screw up both of them.

Empower the computer to be the best that it can be by taking the human out.

Empower and respect the human by giving him/her their own representation.

The "I just want to read the bytes that are on disk" philosophy is inherently limiting and broken when the audience are two very different things (humans vs computers).

My argument is that instead of fighting that we must embrace it.

xg15 · on Dec 13, 2016

Yes, because it's: a) inefficient b) invites a lot of wrong assumptions about the storage format, e.g. about the underlying grammar, the maximum length of tokens or the possible characters than could occur in the file. c) requires you to solve the same problems over and over again (how to represent nested lists, graphs, binary blobs, escaped special characters, etc) d) encourages authors to roll their own format and not research if there maybe is an existing format that would solve their case.

I agree with you, the other extreme - treat binary files as sort of opaque black boxes that you can only access with specialized tools beloniging to your application - is even worse. But I don't see why we can't reach some middle ground: have a well-documented, open, binary format that encodes a very generic data structure (maybe a graph or a SQLite-style database) and a simple schema language that lets you annotate that data structure. Then you can develop generic tools to view/edit the binary format even though every application can use it in their own way.

dagw · on Dec 13, 2016

Parsing structured text is slow and inefficient. Also reading out just one part of a data structure stored as text often requires either walking through the file character by character or first slurping the whole thing into memory.

qwertyuiop924 · on Dec 13, 2016

...But when your system crashes, having all that data in an easily accessible manner (regardless of what tools you have on hand) is a major win.

michrassena · on Dec 13, 2016

Let's not forget that when your system crashes, all of these easy-to-read text files are actually stored in a binary format, sometimes scattered in pieces, and require special tools to extract.

qwertyuiop924 · on Dec 13, 2016

True, but they're less fragile: even if the text is garbled, you might be able to get some information out of it.

imtringued · on Dec 13, 2016

A text editor is specialised to the task of viewing and modifying text in a certain format.

qwertyuiop924 · on Dec 13, 2016

Wrong. It's specialized to viewing and modifiying any text, regardless of format. That's a huge difference.

gaga666 · on Dec 13, 2016

First, rule of silence has nothing to do with plain text. It applies to any human-machine interface including GUI and physical knobs.

Second, you are right about structured data and all. The only thing is that it's either impossible or extremely hard to achieve. Many have tried, all of them failed. Windows now has a mix of registry, file and database configs which is a nightmare and is much worse than any unix. AIX has smitty and other config management solutions which are a bitch to work with if you want something non-trivial. Solaris is heading this direction (actually it's heading to the grave but it's another story) and it's also not nice. There are a lot of other OSes and programs which tried to do it but failed.

This is much like with democracy: it's a terrible form of ruling, too bad we have nothing better. This is exactly what's up with unix configs and data formats. It is possible to make some fancy format and tools which will achieve it's goal for like 80% of the time. But it will cause huge amount of pain in the ass in the rest 20% and this is where it will be ignored and you'll end up with a mix of two words which is worse than one.

michrassena · on Dec 13, 2016

The thing about today's Unix Philosophy, epitomized by both the BSDs and Linux is that it's backward-looking. There's a Garden of Eden myth and everything. Look at the reaction to systemd.

I remember getting fairly excited when Apple OSX first came out and quite a few of the configuration files were XML-based. Finally, a consistent format, but it wasn't pervasive enough. Even Apple couldn't see fit to break with the past.

I've even contemplated rewriting some core utils as an experiment to spit out XML (because I didn't know about typed objects at the time), but I lack the skillset.

I know we can't (and maybe we shouldn't) change Unix and its derivatives. There's too much invested in the way things work to pull the rug out. But, when a new OS comes along that wants to do something interesting, I hope the authors will take a look at designing the interface between programs rather than just spitting out or consuming whatever semi-structured ball of text that felt right at the time.

Wouldn't it be neat, for example, if instead of 'ls' spitting out lines of text, which sometimes fit the terminal window, sometimes not, which contain a date field which is in the local user's locale format, which is in a certain column which is dependent on the switches passed to the command, instead you get structured, typed information, ISO-formatted date and time, etc. On the presentation side, you can make it look like any old listing from ls if you like, rather than mashing the data and presentation layer together. I'd like to imagine such a system would be more robust than one where we could never change the column the date was in for fear of breaking a billion scripts.

bluetomcat · on Dec 13, 2016

I would be so excited to see new operating systems which depart from POSIX completely and introduce new abstractions to replace the dated notions of hierarchical filesystems, terminals, shells, shared libraries, IPC, etc. The sad truth is that everyone targets POSIX because there is so much software that can be ported in order to make the system usable.

michrassena · on Dec 13, 2016

I agree. The "Unix compatibility layer" has killed so many interesting projects over the years. If systems research were still an academic pursuit, maybe there would be some interest in bringing a system like this to fruition. It would require a long-term investment in the design of the system foremost.

koralatov · on Dec 13, 2016

I don't disagree with you, but how do we decide which structured data format to use as a replacement for plaintext? I have the sneaking feeling that a large part of why we still use plaintext is because it's established already as the standard, for worse or for better, and replacing it with a standard everyone could agree on proved impossible.

TeMPOraL · on Dec 13, 2016

Use whatever and stick to it. The sane structured data formats differ little but by shape of the braces and some punctuation. Were UNIX developers thinking about this back in the days, they'd probably chose s-expressions or something else from that era. Now it may as well be JSON. The thing is, one format should have been picked, and a parser for it should have been available in a system library. Then we wouldn't have to agree on anything, we'd have one system standard to use.

borplk · on Dec 13, 2016

Plaintext is the lazy way out. It's not even remotely a standard because everyone does it differently.

It's the shortest path from thinking "I need to persist this crap" to getting something working. Write the bytes to a file, sprinkle some separators, read and parse it back.

GrinningFool · on Dec 13, 2016

> When you get tot that point you have to understand that you have chosen to ignore the fact that the data you are dealing with must be represented in something much closer to a relational database than lines of ASCII text.

This is true. But when your needs aren't that complex, basic textual output sure is nice.

> The fetish for easily-editable ASCII files and escaping from structure is holding us back. Structured data does not automatically imply hidden and inaccessible, that's a matter of developing appropriate tooling.

Good plan. I'll set up a schema by which people can exchange data, and wse'll get it standardized. Given the complexity of the relationships involved - and the fact that I really don't know how my data will be used downstream of me - I'd better make it something really robust and extensible. Maybe some kind of markup language?

Then we can ensure that everyone follows the same standard. We can write a compatibility layer that wraps the existing text-only commands and transforms the data into this new extensible markup language (what to call it thought? MLX?). Then anyone who has the basic text tools can download the wrappers, learn the schema , and start processing output.

Then again, I could just do that grep | cut. The only thing I have to learn is the shape of the thing I'm grepping for, and the way to use cut - the basics take a few seconds, and no additional tooling is required. Best of all, chances are high that it'll work the same way 20 years from now (though likely with expanded options not current available).

There's a lot to be said for having simple tools that accept simple input and produce simple output.

This doesn't mean it's the only approach - databases and structured data absolutely have their places in modern CLI tooling - but that has no bearing on the value of an ASCII pipeline.

Vogtinator · on Dec 13, 2016

You can just develop alternative tools and pipe JSON. Text and data in one.

Pipes can transfer arbitrary data, so it's just the tools that you don't like, not the underlying mechanism.

TeMPOraL · on Dec 13, 2016

Yeah, but that's the point. Pipes are fine. The tools suck, though. UNIX would be infinitely better if it defaulted to piping structured text instead of making each tool have to implement its own shotgun parser.

qwertyuiop924 · on Dec 13, 2016

...So you're talking about having text-serialized key-value objects (or any other kind of object), with standard deserializers and tools for manipulation?

That's actually a great idea. Better yet, it's actually viable now, unlike many proposals for "fixing" unix.

TeMPOraL · on Dec 13, 2016

> ...So you're talking about having text-serialized key-value objects (or any other kind of object), with standard deserializers and tools for manipulation?

Yes, basically. Someone in power should just pick any format - modified JSON (without the integers are really IEEE754 floats stupidity), s-expressions, whatever - and make standard deserializers part of the system library.

mbrock · on Dec 13, 2016

Heh heh, "someone in power." Who would that be?

There's no authority to complain to about all this...

Structured text formats come and go: S-expressions, SGML, XML, JSON, "modified JSON", etc etc etc...

Fortunately the Unix tools work the same with any byte stream, pretty much, so they've survived gloriously since the 1970s.

That's kind of the whole deal with "worse is better."

TeMPOraL · on Dec 13, 2016

> There's no authority to complain to about all this...

Whoever accepts patches to base UNIX system libraries would be a good start.

> Fortunately the Unix tools work the same with any byte stream, pretty much, so they've survived gloriously since the 1970s.

So would a structured text format, except you wouldn't be discarding and then recreating semantic information at every junction of a pipe.

module0000 · on Dec 13, 2016

> Whoever accepts patches to base UNIX system libraries would be a good start.

That can be you! Get to work, or pay someone else to get to work if the skills required aren't in your set.

pwdisswordfish · on Dec 13, 2016

But then when you put structured data through pipes, why bother with making it text at all (over the wire, at least)?

stevedonovan · on Dec 13, 2016

Modern tools should offer the option to write out JSON (with a proper spec, please). I can definitely see the value of a 'ls' variant that can do this, and I remember people discussing JSON-based shells. But not either/or! For example, the Rust compiler can now be persuaded to write out error messages in JSON.

jslabovitz · on Dec 13, 2016

Modern tools are offering the option to write out JSON. Grepping /usr/share/man and /usr/local/share/man on my macOS system reveals a good dozen or so commands that take JSON as input or generate it as output. (Their man pages generally document the schema, too.) Most of the tools I've written myself also have JSON I/O.

JdeBP · on Dec 13, 2016

* http://jdebp.eu./Softwares/nosh/guide/service-show.html

Jos Backus asked for the JSON option in 2013.

chrisweekly · on Dec 13, 2016

Great points, esp about logfiles. So grateful to have discovered lnav --http://lnav.org -- an amazing little CLI tool with embedded sqlite engine and regex support. It solved all my logfile parsing / querying problems, and then some.

upofadown · on Dec 13, 2016

Our civilization has been using unstructured text for a very long time now...

I totally agree that Unix sux. We need a better philosophy. But you eventually have to come up with something that actually works in practice. I am still waiting.

I remember my excitement at the idea that things like CP/M and MSDOS running on personal computers were going to free us all from the tyranny of mainframe computers running things like Unix. We all know how that turned out. Everyone eventually just gave up and started emulating the top of a desk.

So Unix is good at messing with unstructured text? Good. Get back to me when you have something better that actually works.

jokoon · on Dec 13, 2016

I'm split, because there was this article saying the opposite, that actually text is a pretty good format because it's simple, compact, and easily debug-able. On the other hand text is awful if you want to serialize data because parsing and searching text is one of the most CPU intensive thing.

Now, it's true that we should recognize where text is really inadequate, especially when indexing and searching is needed. Webpages, for example, should not be plain text.

I think the problem resides in programmers not being able to properly use and understand how a database works. Databases and their engines are black boxes, so it's normal if fewer developers want to to use DB like you say. Meanwhile dictionaries and B-tree are not very sophisticated algorithms, yet I see almost no programmers using them consciously. The less a programmer know about the tools he has in his hands, the less he will get benefits from it and thus he will start using easier things.

So really my thought is that the tools are not accessible enough. The concepts of file and database are so distant that it's completely impossible to work with both, but to me it should.

imagist · on Dec 13, 2016

I can see your argument for consistent structure--I definitely don't love vimscript or nginx's custom configuration language. But that doesn't require we jump out of text--JSON and XML are viable, provided they remain geared toward hand editing.

It may be that given proper tooling for database-driven configuration it could be visible and accessible, but the fact is, I haven't seen timing that pulled that off.

mbrock · on Dec 13, 2016

JSON and XML are both horrible in their own ways, especially for use as configuration languages.

imagist · on Dec 13, 2016

True, but my point is that the alternatives being proposed are vaporware. JSON works now. IMHO JSON is the least shitty of the shitty solutions that exist.

mbrock · on Dec 14, 2016

I guess my point is that if JSON is the least shitty alternative then I can't really agree that Unix-type environments would be better off using it for everything, because JSON is a pain in the ass to type and edit with a regular text editor... the same with XML... and the same with S-expressions... YAML is a monster, CSV isn't really structured, and so on...

Basically there is no universal structural language that's ergonomic and nice for all uses, so it makes sense that the Unix hackers of yore preferred to create tiny custom languages for everything. It's also because "worse is better".

https://www.jwz.org/doc/worse-is-better.html

betenoire · on Dec 13, 2016

It doesn't matter if it is text or not, and if it is structured or not.

You missed the whole point about signal vs noise.

When I'm ALWAYS presented with a blob of something to decipher, it requires a context switch.

Nothing IS something, and it's a structured something.

ArkyBeagle · on Dec 13, 2016

Structured data may or may not imply "hidden and inaccessible", but there's a lot of correlation between the two.

dalore · on Dec 13, 2016

Agree with log lines. We now output our logs as json format (one message per line but in json format). Often with lots of metadata. Now you can still use something like jq to parse/grep it. But you can also process it into something like elasticsearch and use all the lovely metadata.

77pt77 · on Dec 13, 2016

> This runs very deep in unix and a lot of people are too "brainwashed" to think of other ways. Instead they develop other exotic ways of dealing with the problem.

Well, powershell did solve that problem...

closeparen · on Dec 13, 2016

A somewhat decent middle ground, which my employer uses a lot, is CLIs that output JSON. You pipe them to JQ to do "selects" on the data structures in a reasonably clear and compact way.

digi_owl · on Dec 13, 2016

Seem to me that while shell scripts can be used by anyone given a bit of time, wysiwyg style, powershell is made by programmers for programmers.

emodendroket · on Dec 13, 2016

That's why I think PowerShell is cool.

pwdisswordfish · on Dec 13, 2016

The text-oriented nature of Unix may be a mistake, but the Rule of Silence is not concerned with it. It is about not overwhelming the user with information they are not interested in. GUIs can follow or violate the Rule of Silence just as well.

StRoy · on Dec 13, 2016

Talking about the rule of silence in GUIs, I wish I could slap the eurocrats who decided to force websites to show the cookie warning on a first visit of all websites. Did they even understand the consequence and wasted time of what they were doing? Having to click all the time to get this dumb warning off?

Of course some websites had to do it in an even dumber way than the law asks for. Like slashdot: http://i.imgur.com/5Fp0nmo.png

This is what greets the French every time slashdot decides to forget you agreed to let them put cookies on your computer and you need to click continue before you can get to the actual website.

The law actually made it worse for the people it's supposed to protect (those who might refuse cookies for privacy?) because those warning then will stick around like glue if they can't give you a cookie to remember you accepted their existence.

pwdisswordfish · on Dec 13, 2016

The 'best' part is that the only way to remember the fact that the warning was shown (and not display it any more) is to use a cookie or something functionally equivalent to one.[0] So instead of empowering people who wish to protect their privacy, these warnings push people even further to keep cookies enabled.

[0] Storing it server-side, per IP address, is obviously impractical.

restalis · on Dec 13, 2016

"GUIs can follow or violate the Rule of Silence just as well."

It would have been nice if this was permitted to be violated only by GUIs. Ask about the first-time *nix experiences before the GUI-embracing era and one of the few things they were noticing was the continuous text-spitting. For instance, that was happening on boot and OS loading sequence (and still happens, now only being hidden by default with splash-screens), a lot of reporting about all the things that were performed successfully. It's funny in this regard seeing Unix' Rule of Silence being respected more... outside Unix, where is just common sense, with no need to be formulated as rule.

pwdisswordfish · on Dec 13, 2016

Booting up is a bit of a special case that I'd actually be willing to carve out an exception for. When the boot-up goes wrong, it's often due to causes so deep that they leave you with very few means of diagnosing it or showing an error message. If the booting process hangs, having a rough idea at what stage it happened can be very helpful. Without these messages, discovering the cause of failure would be much harder.

The boot-up splash screen hardly even counts as "GUI" anyway.

SixSigma · on Dec 13, 2016

Your comment was saved. Click OK to continue.

kulu2002 · on Dec 13, 2016

Precisely!

xg15 · on Dec 13, 2016

Note that the "rule of silence" (combined with the habit of writing documentation like longform essays) is also one factor that makes unix-like systems newbie-unfriendly. (Famous example: trying to exit vi)

I think the rule makes sense within the specific constraints *nix programs are usually expected to work in (two output channels with no structure except the one informally defined by the program and the convention that the output should be human- and machine-readable at the same time) but I don't see it as a general rule if better ways to filter the output are available.

vog · on Dec 13, 2016

> Famous example: trying to exit vi

To be fair, this has been fixed a long time ago. At least Vim (which is the Vi installed on most systems) shows the following message on startup:

  ~                     VIM - Vi IMproved                       
  ~                                                             
  ~                      version 7.4.1829                       
  ~                  by Bram Moolenaar et al.                   
  ~                          [...]
  ~        Vim is open source and freely distributable          
  ~                                                             
  ~               Help poor children in Uganda!                 
  ~       type  :help iccf<Enter>       for information         
  ~                                                             
  ~       type  :q<Enter>               to exit                 
  ~       type  :help<Enter>  or  <F1>  for on-line help        
  ~       type  :help version7<Enter>   for version info

On the other hand, it doesn't show this message when you call "vi" with a filename. But at least a beginner running "vi" for the first time should be taken care by this.

ygra · on Dec 13, 2016

Someone unfamiliar might just be dumped into vi because some other program thought it'd be great to open a text editor, e.g. for a commit message. You're unlikely to run vi intentionally unless you also know you want to run vi and how to exit, I guess.

013 · on Dec 13, 2016

Programs that put you into Vi, /should/ be calling $EDITOR, if $EDITOR is to to Vi(m), you should know how to use it.

Also, if you ^C in Vi, you get the message:

  Type  :quit<Enter>  to exit Vim

palunon · on Dec 13, 2016

And what if $EDITOR is not set ? Vi is the standard POSIX editor, calling it makes sense.

JdeBP · on Dec 13, 2016

Actually, defaulting from $VISUAL to vi and from $EDITOR to ex (or ed) is what -- strictly speaking -- makes sense.

Remember what the difference between $VISUAL and $EDITOR was intended to be. There's a whole range of places in Unix where there was, and even still is, a distinction between a line editor and a full screen editor. Consider, for just one example, the ~v and ~e commands in BSD Mail.

Now enjoy the discussion at https://news.ycombinator.com/item?id=13113556 .

marcosdumay · on Dec 13, 2016

I do think distros should change their default $EDITOR to nano.

chungy · on Dec 13, 2016

Debian has done exactly that, years ago.

It tends to annoy me, but it's easy enough to change :)

tim333 · on Dec 13, 2016

Yeah happened to me. It's probably standard for anyone trying to learn GIT.

return0 · on Dec 13, 2016

to be fair thats not enough either. someone might accidentally have pressed 'a' , and now good luck understanding how to get out of it without having a crash course on vim

babuskov · on Dec 13, 2016

Yeah, whenever I run some program, I just press a random key like "a" to get started. ;)

ygra · on Dec 13, 2016

It's not unreasonable to start typing in a text editor and you're likely to hit a or i sooner or later ...

deadbunny · on Dec 13, 2016

^z

pkill vi

madeofpalk · on Dec 13, 2016

Actually I found the colon 'syntax' at the beginning of the command to be quite confusing - I wasn't aware that youre supposed to literally type shift + semicolon. That's completely unintuitive compared to 'regular' shortcuts you see in nano like ctrl + c.

crisnoble · on Dec 13, 2016

I consider myself a major vim beginner, only using it for commit messages, very light file creations, and checking out config settings files when sshed into a server. I have never seen this screen.

dom0 · on Dec 13, 2016

> Note that the "rule of silence" (combined with the habit of writing documentation like longform essays) is also one factor that makes unix-like systems newbie-unfriendly. (Famous example: trying to exit vi)

    $ man foo
    *scroll to the end with the EXAMPLES section*

There should be an option for that. man --take-me-to-the-examples foo

vog · on Dec 13, 2016

(NOTE: The parent comment was edited. This response applies to the original parent comment that contained just the command line ("man foo" + EXAMPLE section) and nothing else.)

Wow, I haven't seen such a blunt and unhelpful RFTM comment for a while. This comment is inappropriate in so many ways:

1) The unix systems have an inconsistent documentation mix of man pages, info pages, "-h", "-help", "--help", HTML docs, separate manuals (e.g. Debian Administrator's Handbook) and so on.

2) "man foo" leads to: "No manual entry for foo"

3) "man vi", as well as "man vim" both lead to a manpage that has no EXAMPLES section at all (see https://www.freebsd.org/cgi/man.cgi?query=vi, https://www.freebsd.org/cgi/man.cgi?query=vim)

4) The Vi(m) manpages explain only the command line arguments, not the editor commands. The latter are available by typing ":help" in the editor.

swiley · on Dec 13, 2016

You don't really need an "examples" section for vim, 90% of the time you just type "vim file" which the usage info probably covers well enough. You only need an "examples" section for something like zip.

vog · on Dec 13, 2016

> you just type "vim file" which the usage info probably covers well enough

This is wrong. If you just type "vim file", you don't get any usage info, not even in the status line. See also: https://news.ycombinator.com/item?id=13165795

setrofim_ · on Dec 13, 2016

You might find this intersting:

https://github.com/tldr-pages/tldr

teddyh · on Dec 13, 2016

That project is a symptom of manual pages not having good “EXAMPLES” sections. The examples on that web page should be contributed upstream to the manuals pages of the software that they are for.

setrofim_ · on Dec 13, 2016

The issue isn't just the lack of EXAMPLES, but also with how man pages tend to be structured. They tend to be very "encyclopedic". There is a set ordering for sections, with a lot of them very verbose, and examples, when present, near the end. Options are often listed in alphabetic order, which doesn't usually correspond to how often they are used or useful.

Man pages are OK when you're first learning how to use something; but if you're already familiar with a command and just need to remind yourself of a the specific sequence of options to achieve a desired result, they're not the most convenient.

I think it's useful to have a tool that fulfills the latter purpose without worrying about the former.

JdeBP · on Dec 13, 2016

Microsoft documentation was mentioned earlier in this discussion. One of the things that MSDN and TechNet doco does is have both "X reference" and "using X" sections. Manual pages are reference doco, in this way of organizing things.

The FreeBSD, TrueOS, and related worlds put the "using" doco into what are often called "handbooks" or "guides".

* NetBSD Guide: https://netbsd.org/docs/guide/en/

* FreeBSD Handbook: https://freebsd.org/doc/handbook/book.html

* DragonFlyBSD Handbook: https://www.dragonflybsd.org/docs/handbook/

* TrueOS User Guide: https://www.trueos.org/handbook/trueos.html

* PC-BSD User Guide: http://web.pcbsd.org/doc-archive/10.1.2/html/pcbsd.html (viewable off-line directly in both PDF and HTML forms in /usr/local/share/pcbsd/doc/)

Some parts of the Linux world do the same. upstart had the Upstart Cookbook for example:

* http://upstart.ubuntu.com/cookbook/

The Linux Documentation Project was supposed to contain a wealth of this stuff, but large parts of it are seemingly moribund, and incomplete after decades or woefully outdated. Wikibooks tried to take up the slack with an "anyone can edit" Guide to Unix and a Linux Guide:

* https://en.wikibooks.org/wiki/Guide_to_Unix

* https://en.wikibooks.org/wiki/Linux_Guide

If you want examples and doco that works from the basis of what you usually want to do, then these handbooks and guides are the places to go, not reference manuals.