If an API has a problem, you fix the API. If necessary, you release a free library to back-port the improved API to older OS versions as well. You come up with a fixed version, make it as convenient as possible, call it the “new standard”, officially deprecate the alternatives, and then write a blog post. Except the blog post would only need two lines of sample code showing how easy it is to work around the problem now.
Not this. Frankly, few developers will even know about the need for careful coding such as this, and even fewer will actually do it because it will muck up each and every program with dozens of lines of extra stuff to work around a deficient part of the PLATFORM.
I'm absolutely amazed that Windows doesn't offer a process spawning API that takes an array of strings as arguments[0], if only because that's exactly how a C program expects them anyway.
The problem is that parsing the command line isn't done by the operating system at all, but delegated to each individual console application. If you made a new CreateProcess API that took an array of command line arguments, the operating system would need to serialize the array into a single string that could be passed to legacy commands. Unfortunately, there's no way to tell what weird parsing is buried inside the commands, so there will always be gaps in what you can express in the serialized command line.
For example, suppose the console app thinks that apostrophes should be treated as quotes. I pass "'x", "x'" into the new API, and that gets serialized to "'x x'" (two strings to most applications), but this particular app interprets that as one string that says "x x". The OS can't even escape the apostrophes to avoid this, because it doesn't know what language the console application speaks.
PowerShell had to deal with this problem because native command lines need to be rehydrated from its AST before they can be passed to CreateProcess, and (IIRC) they ultimately had to add an operator that means "everything after this point in the command line should be passed to the command verbatim" to cover all of the corner cases from this.
These APIs do exist, they just don't work that way:
> These functions appear to be precisely what we need: they take an arbitrary number of distinct command line arguments and promise to launch a subprocess. Unfortunately and counter-intuitively, these functions do not quote or process these arguments: instead, they’re all concatenated into a single string, with arguments separated by spaces
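To make the quoted pitfall concrete, here is a minimal sketch (child.exe is a hypothetical program that just prints its argv; nothing here is an endorsement of these functions):

#include <process.h>

int main(void)
{
    // Intended: the child sees argv[1] == "hello world".
    // Actual:   the CRT builds the command line "child.exe hello world" with
    //           no quoting, so a conventional child sees argv[1] == "hello"
    //           and argv[2] == "world".
    return (int)_spawnl(_P_WAIT, "child.exe", "child.exe", "hello world", (char *)NULL);
}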
When you get to the Windows kernel, the command line is a single PWSTR. Full stop.
Any API or C program main() running on Windows that suggests anything else is a fiction maintained by the C runtime, which parses the single string the kernel gave it into a string array on one side, or concatenates an array back into a single string on the other.
Well, it's actually a UNICODE_STRING. ;-) The limit on the length of the command line comes from the range of the Length field of the UNICODE_STRING structure. (NT uses Pascal-style strings internally.)
NT's native process creation functionality is powerful, but baroque: see [1]. There's a ton of stuff that processes can be passed in addition to the command-line. One trick that's not well-known is that CreateProcess allows parent processes to pass an opaque binary blob to subprocesses via the lpReserved2 member of the STARTUPINFO structure. Cygwin uses this blob to pass information about file descriptors, ttys, and other POSIX context; this information block bootstraps Cygwin's fork implementation. The Microsoft C runtime uses it for a vaguely similar purpose: it's how file descriptor inheritance works when neither NT nor Win32 know anything about file descriptors (which are private to libc).
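For the curious, a minimal sketch of the mechanism described above (error handling trimmed; how the blob is laid out is purely a private contract between parent and child, as with Cygwin and the CRT):

#include <windows.h>

BOOL spawn_with_blob(wchar_t *cmdline, const void *blob, WORD blob_size)
{
    STARTUPINFOW si;
    PROCESS_INFORMATION pi;

    ZeroMemory(&si, sizeof si);
    si.cb = sizeof si;
    si.cbReserved2 = blob_size;        // size of the opaque blob
    si.lpReserved2 = (LPBYTE)blob;     // pointer to the opaque blob

    if (!CreateProcessW(NULL, cmdline, NULL, NULL,
                        TRUE,          // inherit handles, as Cygwin and the CRT do
                        0, NULL, NULL, &si, &pi))
        return FALSE;

    // The child can fetch the same STARTUPINFO with GetStartupInfoW()
    // and read the blob back out of lpReserved2/cbReserved2.
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return TRUE;
}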
I was a dev in Windows from 2008 to 2011, so I'm not sure you're aware that you're replying to someone who is already a big fan of the NT native API (and not so much a fan of the crude hack that is Windows CRT file descriptors, like most things in the MS CRT...). I did mean to type PWSTR, in full awareness that I'm using it as a figure of speech for UNICODE_STRING.
The terrible thing is that there already is an API here that everyone else on UNIX is using successfully - spawn a process with an argv of null-terminated strings and they turn up in the argv of the spawned process.
Microsoft have just chosen not to make it work like that because that would involve admitting they were wrong.
Exactly. While they can't fix the old functions without potentially breaking software that relied on the old behavior, they can introduce a new set of process-launching functions that do the right thing: handle arguments exactly as written, no splitting, no quoting, no metacharacter interpretation.
Then, for new OSes, write a compatibility layer to implement the old functions on top of the new ones, and for old OSes, write a compatibility library to implement the new functions by quoting and passing to the old ones.
Mark the old functions as "deprecated, do not use in new code", and point to the new ones.
This is something that is important. The steps to go about it:
1. Fix the old solution
2. Implement an alternative that is feature-complete, and document it
3. Write compatibility layers
If people followed this, life as a software developer would be easier on the blood pressure. I remember fondly working with many APIs that were deprecated but had no alternative, and the devs admitted it.
They even copied that broken system into PowerShell "functions" (proper functions are expected to return only explicit results; these return every piece of output they gather, a step backwards from structured programming).
I've never understood the Win32 platform team's resistance to adding an ArgvToCommandLineW function to mirror the longstanding CommandLineToArgvW[1] function.
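For what it's worth, here is roughly what the per-argument half of such a function could look like. To be clear, this is a sketch of a hypothetical helper, not a real Win32 export; it follows the parsing rules of CommandLineToArgvW and the Microsoft C runtime, so a target program that parses its command line some other way, or cmd.exe metacharacters, would still need separate handling. Buffer sizing is left to the caller; *out is advanced past the text written so the caller can append a space and the next argument.

#include <wchar.h>

static void quote_one_arg(const wchar_t *arg, wchar_t **out)
{
    wchar_t *p = *out;

    // Arguments with no whitespace or quotes can be passed through as-is.
    if (*arg != L'\0' && wcspbrk(arg, L" \t\n\v\"") == NULL) {
        wcscpy(p, arg);
        *out = p + wcslen(arg);
        return;
    }

    *p++ = L'"';
    for (const wchar_t *s = arg; ; ++s) {
        size_t backslashes = 0;
        while (*s == L'\\') { ++s; ++backslashes; }

        if (*s == L'\0') {
            // Double trailing backslashes so the closing quote stays a quote.
            for (size_t i = 0; i < backslashes * 2; ++i) *p++ = L'\\';
            break;
        } else if (*s == L'"') {
            // Double the backslashes, then backslash-escape the quote itself.
            for (size_t i = 0; i < backslashes * 2 + 1; ++i) *p++ = L'\\';
            *p++ = L'"';
        } else {
            // Backslashes not followed by a quote are literal: copy unchanged.
            for (size_t i = 0; i < backslashes; ++i) *p++ = L'\\';
            *p++ = *s;
        }
    }
    *p++ = L'"';
    *p = L'\0';
    *out = p;
}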
That is a bit more difficult than usual in this case, because it's not just one API, it's two working in unison: the one used by the calling process to pass arguments, and the one used by the called process to receive them.
At present, they're both "everything in one big string"; if an alternative "array of strings" API were added at both ends, you'd need to come up with shims for when the caller passes a string & the callee expects an array, and vice-versa. It's not immediately obvious how you'd do that in a way that works reliably in all cases, especially considering the callee currently has total freedom to parse its command-line however it likes.
Looking now at how Windows handles console application arguments, it sure looks broken. But you have to put your mindset back in circa 1990 and think about what Windows applications looked like back then, and what model Microsoft was betting on. Arguments were passed via DDE[0], and then later all the bets were on OLE[1] and finally COM[2]. System components were accessed all the time via in-process DLLs communicating with services over LRPC[3]. In this world, the command line, the pipe philosophy and the 'less is more' mindset were not only not welcome, they were the adversary.
Even when it was finally acknowledged that the command shell needs some love too, the answer was PowerShell, which yet again defined an object interface between cmdlets[4].
You know, in all these years I've never bothered to find out how DDE worked, having used COM instead, and now I find: https://msdn.microsoft.com/en-us/library/ms648774.aspx and it's a terrifying abomination built on wparam/lparam.
> In this world, the command line, the pipe philosophy and the 'less is more' mindset were not only not welcome, they were the adversary.
I agree that this is Microsoft's greatest heresy and also an effective tool against interop.
> promoting overcomplicated bad ideas over simple correct ideas.
Encode the program name together with all arguments as one big string, discarding all type safety; spawn a process running your favorite shell; let that shell process the string in order to spawn one or more processes; in each process have the standard library split the string back into an array called argv; and have your average process call out to yet another library to parse those argument strings into flags and parameters, printing a non-standardized, potentially localized string if an error occurs, and, if the error is fatal, exiting with a semi-standardized return code. The calling program then tries to make sense of the output of the called program, often by fuzzy matching against known output.
That's the standard way it has been done since the beginning of Unix. Some APIs skip the shell, but that's a minor detail in all this. If this sounds simple and like the obviously best solution, then congratulations. The Windows developers disagree and tried (and continue to try) to find a better way. They have mostly failed so far, but I think we should thank them for at least trying to innovate.
If you use some API other than system(3) - even if you literally use a shell-script - it's:
1) serialize data structure to semi-typed array of strings
2) pass array of strings directly to target program
3) have target program parse the array of strings to flags & parameters
With no shell or C library touching the command line arguments at all.
The only way this could be more direct would be if you used JSON instead of arrays of strings, web-API style.
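Roughly, in code - a minimal sketch of the three steps above using posix_spawnp (the "ls -l" invocation is just an arbitrary example):

#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>

extern char **environ;

int main(void)
{
    // Step 1: the "serialization" is just building an array of strings.
    char *argv[] = { "ls", "-l", "my file with spaces.txt", NULL };

    // Step 2: pass the array directly; nothing re-parses or re-quotes it.
    pid_t pid;
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0) {
        fprintf(stderr, "posix_spawnp failed: %d\n", err);
        return 1;
    }

    // Step 3 happens in the child: it receives exactly these argv strings.
    int status;
    waitpid(pid, &status, 0);
    return 0;
}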
Trust me: it's not deliberate. There is no conspiracy. Everyone I met on Windows is at least as well-intentioned as anyone working in the POSIX world. The reason Windows has some bad APIs is the same reason Unix has bad APIs: someone bootstraps a system quickly and doesn't see the problems that can arise from their choice of APIs; the system becomes wildly successful; and now everyone has to support these ill-conceived APIs.
Sure, Windows command-line argument passing is bad, but have you ever tried using wait/waitpid/wait4/waitid/etc.? That's a nightmare in the POSIX world; Windows has nice, clean process handles, not the garbage /proc stuff that makes it fundamentally impossible to write a safe pkill(1).
If you're writing a brand-new system, for the love of God, do a good job of designing the APIs. You will not have a chance to go back and fix the APIs later.
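To illustrate the "clean process handles" point, a minimal sketch (not a full error-handling example): the parent gets a handle it can wait on like any other waitable object, and reads back a full 32-bit exit code.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    STARTUPINFOW si = { sizeof si };
    PROCESS_INFORMATION pi;
    wchar_t cmdline[] = L"cmd.exe /c exit 42";

    if (!CreateProcessW(NULL, cmdline, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi)) {
        fprintf(stderr, "CreateProcessW failed: %lu\n", GetLastError());
        return 1;
    }

    WaitForSingleObject(pi.hProcess, INFINITE);  // no SIGCHLD, no reaping races

    DWORD code = 0;
    GetExitCodeProcess(pi.hProcess, &code);      // full 32-bit exit status
    printf("child exited with %lu\n", code);

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}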
Uh, so much truth here. If I added music, this would be like the ballad of software. Once the cement has hardened on an application's design, reinforced with industrial strength needy customers who only want the next feature...that is game, set, match.
To list quickly from memory (and my game machine experiences) - all from Windows 10 Pro.
* New install with MS Office: I run autoruns.exe and see that I have 70+ things being run on startup. Yikes. Default install.
* Registry: multiple things, the biggest for me being that you can't easily move part (or all) of it between machines, because it's tied to a specific machine. Imagine if, on Unix, you couldn't move the whole /etc to a newly installed separate machine.
* Still no decent package management. Check Chocolatey (they do try to do some work here): among the dozens of problems they have, they still can't (and probably never will be able to) answer a simple 'list all files of a given package'. I don't even want to start on appx, Microsoft's newest invention: it doesn't return even HALF of the apps installed by default on a new laptop.
* Logging. During the 1607 update the process got stuck (6 hours in, and still 'please wait'): there is no simple place (a log) to analyze what's happening (and no, the event log is not such a place). Neither the daemons nor Windows itself do sane logging, with enough information constantly written to log files to analyze a problem (this is big, and deliberate).
* File system mess: e.g. system drivers running with kernel permissions installed in 'Program Files' (new Dell XPS from 2016), and ordinary everyday programs being installed in C:\Windows, while Windows happily allows it - again, a clean new install :)
* Naming of services/technologies and Microsoft's general approach to architecture design (boundaries and namespaces): this is a mess. One example among hundreds: check what the short name is of the background transfer service (the BITS one) that you need to restart if Windows Update (sic!) stops working - no, it's not bits or bts :)
These are only high-level things written from my phone. There are comprehensive lists on the net of 500+ things that could have been fixed.
Everyone does this "wrong" because every app does this differently. The core reason Windows command line options aren't (or at least shouldn't be) used to pass complicated data is that, way back when, DOS simply provided a single command line string to the executed program and let it parse the string itself. So no two command line parsers are the same. The glitch with escaping here is merely one symptom of a broader problem.
Unix got this right by forcing the shell to provide the kernel a pre-parsed list of strings, so the only insanity the tool integrator needs to understand is the shell's quoting syntax. Which is still insane. But it's only insane in one particular way.
On Unix, most tools don't really need to use the shell at all; it's enough to treat argument lists as lists internally and pass them to exec or posix_spawn (which of course, unlike the Windows _exec and _spawn, aren't broken for arguments with spaces!).
However, at minimum it is still useful to shell-escape arguments when displaying them to the user (for ease of copy+paste), so it's unfortunate that many languages don't have any standard library function to do this - including C on POSIX, and Python before version 3.
Alas, no. Bourne shell syntax allows for double quotes with variable interpolation and some other fancy syntax (including backslash escaping of literal double-quote characters), and a single-quote syntax for "raw" strings with no fancy syntax INCLUDING backslash escaping.
So your rule won't work. You can't single-quote a string that itself contains single quotes, which makes for some fun when you have arbitrary strings (file names are the big frustration) that need to be substituted into a parseable command line.
But like I said above: Bourne syntax[1] is only one kind of insanity, which is still much better than the DOS/Windows world of a separate parser for every app.
Their rule does work. The single quotes are transformed into '\''.
The first ' closes the current single quotes string. The \' adds a single quote character outside a quoted string (and outside single quotes you can use backslash escapes). Finally, the last ' reopens the single quoted string that we closed before.
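Since, as noted a few comments up, C on POSIX ships no standard function for this, here is a minimal hand-rolled sketch of exactly that '\'' transformation (sh_single_quote and its caller-supplied buffer are made up for illustration, not a library API; the output buffer must hold at least 4 * strlen(in) + 3 bytes):

#include <string.h>

static void sh_single_quote(const char *in, char *out)
{
    char *p = out;
    *p++ = '\'';
    for (; *in != '\0'; ++in) {
        if (*in == '\'') {
            // close the quoted string, emit an escaped quote, reopen it
            memcpy(p, "'\\''", 4);
            p += 4;
        } else {
            *p++ = *in;
        }
    }
    *p++ = '\'';
    *p = '\0';
}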
_exec and _spawn are not Windows; they are functions in the MS Visual C redistributable run-time.
(Well, they are also in a system DLL called MSVCRT.DLL. That is an internal library which is undocumented and considered by Microsoft to be off-limits to applications.)
Unix got something right in that you can unambiguously pass a list of separate strings to launched processes. However, it does nothing to ensure unambiguous meaning of those strings.
This is for example why you should avoid giving your files such cute names as '-rf'.
Firstly, its IEEE standard (1003.1 or "POSIX") specifies the -- convention for separating option arguments from non-option arguments. The tiny handful of utilities like "echo" which do not implement it are also documented that way.
Secondly, Unix provides the POSIX standard getopt C library function, and getopts command. Programs and scripts which use these standard functions for processing options will implicitly support the -- convention.
Developers of new command line programs can ignore the documentation and standard functions, of course, developing their own non-conforming parsing from scratch. But at least users have something to point to if they report that as a problem: look, your program isn't supporting --, meaning that you ignored both the POSIX standard convention and the library function which enforces it.
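For anyone who hasn't used it, a minimal sketch of the standard getopt() loop; because getopt() stops at "--", running it as ./a.out -v -- -rf reports one option and one operand named "-rf":

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int opt;
    while ((opt = getopt(argc, argv, "vf:")) != -1) {
        switch (opt) {
        case 'v': printf("option -v\n"); break;
        case 'f': printf("option -f with value %s\n", optarg); break;
        default:  return 1;   // getopt already printed a diagnostic
        }
    }
    // Everything from optind onward is an operand, even if it starts with '-'.
    for (int i = optind; i < argc; i++)
        printf("operand: %s\n", argv[i]);
    return 0;
}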
Yes there is. You want the filesystem to be flexible. If the shell doesn't like those characters, use a different shell that doesn't care. It's brain-dead to create a filesystem that prevents flexibility in user interfaces.
Flexibility is only a good thing if the benefits outweigh the costs. I insist that there are no legitimate (i.e., no better option) use cases for control characters in file names. The filesystem being "flexible" is not a good thing if flexibility causes real problems.
That's not a problem because the UTF-8 encoding of U+3707 will absolutely not contain any USASCII control characters, or any special shell or filesystem characters. It will all be bytes in the range 0x80-0xFF.
There are other encodings than UTF-8 though. Which is kind of my point. If you have your file system set to UTF-16 (doesn't NTFS do this?) then 0x07 will be present.
I also believe that filesystems should require that all filenames be fully normalized UTF-8. I don't think the benefits (slight, IMHO) of allowing filenames to be arbitrary byte strings outweigh the costs of code complexity and security problems.
The operating system could address it by having a separate argument list and option list at the kernel level, creating an unambiguous interface for calling a program, giving it a list of options and non-option arguments.
Ambiguity would remain in how a given shell parses input to determine what are options and what are arguments: but this would at least be out of the control of individual programs. Notably, the shell would be the tool which parses the -- convention. Programs wouldn't see the -- delimiter which separates options from non-options, so it would be impossible for a program to neglect to implement support for --.
Yes, programs are free to interpret arguments any way they want. (See dd(1).) But in practice, almost all programs interpret a leading dash in an argument word to mean "here be options". By banning filenames with leading dashes, we close a large number of security holes at minimal cost. Of course it's not a total solution, but from a pragmatic perspective, it's the right thing to do, because it goes a long way toward solving a real problem.
It's harm reduction. Yes, everyone should be escaping input. Yes, everyone should be using "./.foo" instead of just ".foo". But people don't, and they're not going to start. If we ban leading dashes, we stop these bugs from turning into security vulnerabilities.
Your stance is like being against ASLR because developers just shouldn't have buffer overflow vulnerabilities in their code.
What does it mean to say that an argument's meaning could be unambiguous regardless of the program it is passed into? That's a logical impossibility.
Still, it would be right and proper if Unix programs had a little type-safety in their arguments, for example by requiring that ALL arguments be flags, as in this hypothetical smart_rm: "/bin/smart_rm -rf --pattern foo/bar"
This depends on the application recognizing the -- convention, and also on all the little scripts in your system remembering to use the --.
Even if the OS kernel provided a process launching API with separated options and arguments, that would not remove the need for the -- syntax to remove the ambiguity at the shell level, and hence your need to use that in scripts.
It would remove the problem of programs all being required to implement --.
This is quite a good example of the kind of problem you run into when you follow the philosophy of representing all data in informally-specified ad-hoc text formats. Everyone thinks they can just roll their own parser/serialiser, which they then neglect to test thoroughly enough, creating subtle bugs when the serialisation side forgets to escape data somewhere, or the parsing side doesn't even provide any way to escape grammar-significant characters.
No, the problem is the lack of a standard encoder function to go with the standard decoder function. It's not "everyone thinks they can", it's "everyone has to".
The problem is also YAGNI and validation-thru-testing, instead of up-front design. The "ad-hoc" and "text" parts aren't what's important, it's the whole approach of not doing any more than the bare minimum of up-front work. Which seems to historically give overall better results, even if it does come with interesting bugs that need fixing later.
Well, the real problem here is that there's not really a 'standard' you can rely on in the first place.
And while it's true that text isn't a necessary part of the general problem, in my experience text-based formats seem especially prone to it. How many times have you seen people attempt to use regular expressions to parse HTML/validate e-mail addresses/whatever?
I think this indicates a huge difference between Microsoft/UNIX mindsets. Microsoft allowed
rename *.txt *.bak
To do this, the "rename" command had to understand how to parse the asterisk character while being familiar with the contents of the directory.
However, this makes creating a new replacement "rename" command difficult, as is creating any new command that must parse wildcards itself.
In the Unix environment, the shell expands the asterisk to all files that matches that pattern, and then passes these files to the "rename" command, who never sees the asterisk.
Therefore it's trivial to create a new "rename" utility, because it doesn't need to parse wildcards. However, renaming all .txt to .bak is awkward on a UNIX system.
It may be awkward, but there's no point in ruining a perfect system for just one use case. For what it's worth, zsh provides a capable renamer tool called zmv. An example:
zmv -W '*.txt' '*.markdown'
And of course there are tools like rename(1) that work regardless of the shell used.
That's not at all user-friendly, but these are supposed to be programmer tools... If you want user-friendly file operations on UNIX command line, use midnight commander. It can do mass rename, etc.
I don't see how the problems you're listing have anything to do with text formats; you have these kind of issues with any "informally-specified" format.
That's about Windows, but many uses of system() that involve non-static strings also probably get quoting wrong.
Of course, avoiding the shell is the best way to avoid the problem. Sometimes, you can't avoid the shell.
My preferred shell quoting method, for unix, is to wrap each parameter in single quotes. Then only single quotes inside a parameter are a problem. They can be replaced with '"'"'
Probably a lot of things use double quotes and perhaps try to escape $ and ' and " but miss details to do with \ and perhaps other characters that some shells treat specially.
Another way is to pass the filename in the environment: system("rm -rf \"$DIR\"")
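A minimal sketch of that environment trick, for the record (remove_path is a hypothetical helper; I've also added "--" so a value starting with a dash isn't taken as an option, which the quoting alone doesn't prevent):

#include <stdlib.h>

int remove_path(const char *untrusted_path)
{
    // The untrusted string never appears in the command text, so the shell's
    // parser never sees its quotes or metacharacters; "$DIR" is expanded
    // after parsing and is not word-split or glob-expanded inside "...".
    if (setenv("DIR", untrusted_path, 1) != 0)
        return -1;
    return system("rm -rf -- \"$DIR\"");
}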
My favorite awkward environment to cite is running a command remotely over ssh. As far as I've been able to tell from casual testing, without having read the source code yet, ssh does something very similar to what Windows does here and just glues everything together with spaces and passes it to the remote shell for interpretation, so you have to deal with the shell and provide your own quoting.
That's correct - AIUI, the ssh utility has to smush the command & arguments into a single string, because that's all the protocol's "exec" request can handle:
And I guess it can't really do any quoting/escaping, because there's no guarantees in the protocol as to how such things will be interpreted server-side - the command line could be interpreted by cmd.exe on the server for all `ssh` knows ;)
With that in mind, I've made it a habit to always quote the command-and-parameters part of `ssh` lines into a single argument - I figure, that's what it's doing anyway, so it's best to be explicit about it. And when I know I'm working with bash on both ends, my pattern of choice is
> My favorite awkward environment to cite is running a command remotely over ssh.
Dude, if you run remote commands by calling them through SSH, you didn't just
get things backward: you fucked things up heavily. SSH was never ever designed
as a batch, unsupervised tool, despite many people using it as such.
Remote code that is parametrized should be run exactly as that: as a remote
procedure call, a technique known for over thirty years now. One of the
reasons is quoting (because for a non-interactive SSH call the command needs to
be quoted exactly twice if run via a shell, and exactly once when run from exec()),
but there are problems with distributing keys, maintaining usable home
directories, and disabling unnecessary things that are enabled by default
(port forwarding, among the others), and that doesn't exhaust the list of
issues.
A proper RPC protocol, like XML-RPC (which was released twenty years ago and
is still usable while being quite simple), covers quoting -- or actually,
serializing data -- without programmers worrying if they got their list of
metacharacters right and did enough passes for things to work correctly.
On the other hand, I'm not surprised that people do this through SSH (and
a variant of this stupidity: adding the apache user to sudoers, so a web panel
can add firewall rules). After all, I've never seen an easy-to-use RPC server
that has all the procedures passed in its configuration. I needed to write
such a thing myself (once in Perl, as xmlrpcd, and recently in Python, with
a custom protocol that can do a little more, as harpd of the HarpCaller project).
And yet, sometimes you're working in an environment that already has a good ssh configuration for other reasons, and you're very low on engineering time that you can invest into something, and ssh is good enough for a first pass implementation. Alternately, you may be working on some kind of ad-hoc data collection or maintenance task that's not going to become part of any long-term infrastructure (or will be replaced by something better), and you don't yet have any better systems in place to run ad-hoc programs across the cluster.
I completely agree with you that good RPC is a much better foundation to build reliable systems on.
[edited to add]: HarpCaller looks like a pretty interesting project, and similar to several things I've considered building in the past. Nice work.
> sometimes you're working in an environment that already has a good ssh configuration for other reasons, and you're very low on engineering time that you can invest into something, and ssh is good enough for a first pass implementation.
I would have agreed, personally, before I wrote xmlrpcd. After writing it, I, its
author, have no excuses for using SSH as an RPC protocol. Though I'm not good
on the marketing side, so I understand that people just don't know about such
tools.
> Alternately, you may be working on some kind of ad-hoc data collection or maintenance task that's not going to become part of any long-term infrastructure (or will be replaced by something better), and you don't yet have any better systems in place to run ad-hoc programs across the cluster.
Honestly, this is yet another matter.
To properly manage a set of servers, one needs three different services[&],
each for a different thing. One service is for running predefined procedures
(that can possibly be parametrized) -- this is what HarpCaller and earlier
xmlrpcd are for. Another service is for managing configuration and scheduled
jobs -- this is a place for CFEngine and Puppet. Then there is what you just
said: a tool for running commands defined in an ad-hoc manner and collecting
their output synchronously. Of the three, the first and second don't match
how SSH works and is used, but for the last one it actually makes sense.
[&] It doesn't have to be three services, but we don't have one that would
cover all three in a uniform way.
> [edited to add]: HarpCaller looks like a pretty interesting project, and similar to several things I've considered building in the past. Nice work.
Thank you. I'm quite proud of how it turned out, and the middle part of it was
an excellent pretext for me to write something for production use in Erlang.
In the case of XML-RPC, the authentication mechanism is quite obvious: HTTP
authentication. Permissions are hardly a problem, even less so than in the
case of SSH, because you don't give the caller full shell, only a small set of
well-defined operations. And the definition of the call interface does not magically
go away when you move to SSH, so I don't know where you came up with
this argument.
That being said, SSH has a plethora of problems as an RPC mechanism. Host key
distribution sucks heavily if you have more than a handful of servers. User
key distribution is even worse, unless you incorporate external mechanisms.
You need to maintain a usable home directory for a service, which otherwise
wouldn't need such a thing. SSH has plenty of obscure functions, like port
forwarding in three flavours, X11 forwarding, VPN baked in, and others. And to
use SSH-as-RPC for a service you need to disable them. Are you sure you
have covered every single one of them? And then there is also mixing
a debugging channel with regular operations. Break just one and you cannot
recover (and it's easy to break a debugging channel, as you want it
reconfigured and limited to only allowed accounts and whatnot). Those two
should be separated.
The only thing that SSH does better than XML-RPC is streaming a response. But
first, it's a rarely used function for a setting where you need to execute
a remote operation, and second, because it sometimes is actually useful,
my HarpCaller (RPC daemon) needed a custom protocol.
SSH is very, very far from being an excellent protocol for running
predefined remote operations, even if it were only for issues with quoting,
which are difficult on their own.
Oh, quite the contrary. They do need it; otherwise you have a system that is
just waiting to break apart.
However, I agree that quoting for ad-hoc synchronous commands to be run
through SSH is very troublesome and tiring, especially when one doesn't
understand how the commands are executed through non-interactive SSH (and most
people don't).
You know what the difference is between XML-RPC and SSH with regard to payload
encryption and client authentication? Only the fact that SSH has the two
covered by mandatory parts of the protocol, and XML-RPC has this part
optional (HTTPS and either HTTP authentication or client certificates).
Nothing prevents you from exposing procedures only through HTTPS and only to
authenticated clients. In fact, my xmlrpcd works exactly this way.
Security-wise, your advice to use something that makes building a correct
system virtually impossible (because of quoting issues, unnecessary features
enabled by default, and others) is simply stupid and dangerous.
As well as running things over ssh (which IIRC requires double-shell-escaping), su -c and similar take a single parameter containing the command to run.
I'm going to use this opportunity to ask a question I've been thinking about for a long time. Why do we have both environment variables and command line arguments? They are the same thing, except one is key-to-value and one is positional and often needs to be parsed by hand in an ad-hoc fashion. I don't think that people should use command line arguments when environment variables are an option, and I'm not aware of any use cases where they are not an option.
Quickly, I can think of one reason - command line arguments override environment variables whenever conflicting options are present (programs should be written so that this is true). This gives you the flexibility to run a one-off command with specific options without having to set and then reset environment variables (of course, you can also deal with this by launching new shells that have the one-off environment variable settings without tampering with parent shells).
There's a longer debate about this topic on Stackoverflow titled "Argument passing strategy - environment variables vs. command line". [1]
— It is possible to “see” a command line (in "ps" output, etc.) in ways that you won’t see environment settings, so environment variables can be useful when passing information to a sub-process that you need to keep private.
— It may be that you are configuring your sub-process in a place that is “far away” from the point that actually executes the command. Rather than have to thread an extra command-line argument through your code to make sure it is part of the final command invocation, it can be quite convenient to just set a variable. I have often used this to enable debugging features or test experimental features, or even to disable entire features when unexpected problems arise.
— In a similar way, your program may use multiple languages or otherwise be difficult to manage in any common way without environment variables.
— In a cross-platform scenario, environment variable names might be far easier to keep constant across UNIX, Windows, etc. than command-line syntax.
The main argument, to me, for command-line arguments is that they're not automatically inherited by child processes like environment variables are, so you don't have to rely on every process tidying up its environment before executing anything else. To me, that just seems like a recipe for heisenbugs and spooky-action-at-a-distance.
The POSIX shell has nothing to do with the syntax of the command. You can write a shell script that parses
cp from=foo to=bar
if you wanted to. The shell expands metacharacters and variables, and sets up STDIN/STDOUT. But = is not a metacharacter, so it is passed to the command unchanged. Several UNIX commands use that sort of syntax - like dd(1).
With the "env" command it’s possible to set environment using key=value for anything (it just means you have to say "env a=x b=y cmd -arg1 -arg2" instead of expecting "cmd a=x b=y -arg1 -arg2" to be valid).
You may already be aware of this, but it's trivial to read a process's environment variables, for any process running as the same user, or when root. They're exposed as a null-delimited text file in /proc/$pid/environ. You can even get ps to print the environment variables for you, if you use the 'e' flag (no leading dash). Depending on your actual security constraints, this may be important to be aware of. Of course, there are a variety of options for reading a process's arbitrary memory locations, so for actual security you need to control access to the host; but if you're worried about 'ps' leaking command line arguments, you should be similarly aware of 'ps' showing environment variables.
Yep, that's why I mentioned "as the same user". It's slightly less of a risk of data exposure, but it's worth being aware of when evaluating your threat model.
A more interesting question is, why do we pass in command arguments and environment variables to a subprocess, but only get an integer status code back? The original reason comes from the way fork/exec was implemented in PDP-11 UNIX. But that was a while ago. At some point, the subprocess concept should have been extended to handle return values, like all other forms of function call. "exit" should have an argc/argv, which get passed back to the caller.
It's always bugged me that the POSIX exit status can only really communicate seven bits of information reliably. (The other half of the byte is overloaded with signal-exit information from the shell.) Windows does it better: there, you at least get 32 bits, which is enough for an HRESULT.
Also, I've never understood where this exit(-1) idiom comes from. It's nonsense.
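To put the last two points in code - a minimal POSIX sketch: waitpid() hands back one packed int, the W* macros split normal exit from death-by-signal, and only the low 8 bits of the exit code survive, which is exactly why exit(-1) shows up as 255 on the other side:

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(-1);                    // child: -1 is truncated to 255

    int status;
    if (waitpid(pid, &status, 0) == pid) {
        if (WIFEXITED(status))
            printf("exited with %d\n", WEXITSTATUS(status));   // prints 255
        else if (WIFSIGNALED(status))
            printf("killed by signal %d\n", WTERMSIG(status));
    }
    return 0;
}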
The reason is of course that returning a variably-sized value to an already-existing address space is annoying. IOW that's what standard output (and pipes on /proc/self/fd/N) is for.
Interesting idea, and would make it more flexible, like Python and other languages which can return multiple values (from a called function to a calling function or the module scope, not to the OS, AFAIK).
Think of it like nested context in a "normal" programming language. The environment variables are state or data from the outside or encompassing context. Whereas command-line parameters are just that, parameters passed down to a function based on some logic held within the parent context.
Within that explanation, we all know that global/shared variables are a code-smell for the most part. Say you want to call the same command with different logic, or multiple times even.
greeting = "hello"; greetee = "world";
result1 = func(); // How do we even know that func uses the greeting and greetee variables?!
greetee = "another world";
result2 = func(); // Did func change my greeting variable? I don't know.
So let's assume that greeting and greetee are actual important variables. You are essentially then sharing your "state" with the func in order to alter its behavior. I think in some shells, the functions themselves can alter global environment variables, so it would be a giant mess making sure that functions are idempotent and don't have artifacts.
Environment variables affect all instances; command-line arguments affect only one program instance. Set your defaults with environment variables (or a config file), then override them as needed with command-line flags.
In the Unix world, environment variables are passed to ALL processes spawned by the parent, including sub-processes. For example, if you log onto a computer and your HOME variable is set, then every single process you launch will know your home directory, including processes that launch other processes. It's automatic UNLESS a process explicitly changes this value. This does not use any sort of global registry. I used to be an admin of a VAX computer that had 50 simultaneous users logged onto the server, and each user had a different HOME directory.
Environment variables also made shell scripts reusable by other users. The file $HOME/special would refer to the "special" file in the user's home directory.
Command line arguments are only passed to the one single child process. And if that process wants to launch a new process, it must create its own command line arguments.
Environment variables are inherited to child processes by default, so you can think of them as arguments to a whole process group, not just to the program you invoke at the top level.
Parameters are lexically scoped; environment variables are dynamically scoped. Today dynamic scoping is frowned upon as an instance of spooky action at a distance, but in the 70s I guess it wasn't that obvious (and environment variables probably predate Unix).
Also dynamic scoping can be very powerful when stitching together pieces separately designed. To this day emacs lisp is still dynamically scoped by default and arguably it derives some of its power from it.
> *.txt gets expanded by bash into a list of arguments. I don't know if this is the case in Windows.
I don't think it is the case in Windows, and this seems not to have changed since DOS days, when some programs would be able to handle wildcards (internally) while others could not, because it was done by the individual programs, not the shell (COMMAND.COM, or nowadays CMD.EXE).
A quick test:
$ python -V
Python 3.5.2
$ type test_arg_list.py
import sys
print(sys.argv)
$ python test_arg_list.py a t* b
['test_arg_list.py', 'a', 't*', 'b']
So wildcards are not expanded. I'm sure there are Windows calls to expand them (there have been since the DOS days, like FindFirst and FindNext - an awkward approach, IMO), but your program has to actually use them for the expansion to work.
In fact, that is what I did, via the Python glob module, in this recent post:
Simple directory lister with multiple wildcard arguments:
Whereas, in Unix, the shell (at least sh / bash) does it automatically for all arguments for all command-line programs, before the program even sees the arguments. This is one of the (many) key benefits of the shell. In fact, all metacharacters are interpreted by the shell and/or the kernel, acting together. This includes redirections, piping, the many special symbols that start with $, backquotes, and many others.
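For completeness, this is roughly what a Windows console program has to do itself if it wants *.txt expanded (a minimal sketch using FindFirstFileW and FindNextFileW; on Unix the shell has already done this before main() runs):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    WIN32_FIND_DATAW fd;
    HANDLE h = FindFirstFileW(L"*.txt", &fd);

    if (h == INVALID_HANDLE_VALUE) {
        wprintf(L"no matches\n");
        return 1;
    }
    do {
        wprintf(L"%ls\n", fd.cFileName);   // each match, one per line
    } while (FindNextFileW(h, &fd));
    FindClose(h);
    return 0;
}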
I think I may have the answer (at least for Unix). IIRC I read about this somewhere a while ago, and had also thought about the issue myself earlier, so it's a combination of reading the reason somewhere and (maybe) figuring it out. Anyway, here it is:
It is because it allows 3 different ways of setting options for commands: rc files, environment variables (env. vars from now) and command line arguments, with each subsequent one able to override the previous one. The logic being that they go, in order, from more permanent to less permanent (as settings). rc files (rc stands for run command, a term I think I read Unix inherited from some previous OS) are config files for commands, like .exrc and .vimrc for vi/vim, .bashrc for bash, .netrc and many more. Any command can create or require users to create its own rc file, and can use it if present to read settings. The setting in a file is less easy to change on the fly than an env. var (not really difficult, of course, just that you have to go edit that file in an editor - or use sed etc.), and an env. var in turn is (a bit) less easy to change than a command line option, when we are talking about multiple different invocations of a command, in which you want the values for that option to be different in some of the invocations.
Let's take the example of a setting for a port (for a network server or client):
First, put the most common and permanent setting for the option, say, PORT=8080, in the rc file, say .foorc (for command foo - whether foo is built-in or written by you).
Second, for times when you want to change it for say today's work, set (i.e. change) it via an env. var, like:
export PORT=8181
foo args ...
# this setting will remain in effect until you change the var or you logout/reboot, and as long as it is present, will override any PORT value in .foorc each time you run foo.
It can also be shortened to:
PORT=8181 foo args ...
# but this is now a one-time setting of the env. var, so will override any PORT value in .foorc for this run only.
# In both the above variants, the args will not include PORT, since the foo command will be written to check for an env. var called PORT internally (and similarly checks for a PORT setting in .foorc before checking for an env. var called PORT, with the latter overriding the former if both are present).
And third, for the time(s) when you want to change the PORT setting on the fly, maybe just once for today, do:
foo --port 8282 args
which will override the settings for port (if any) in both the rc file and the env. var.
So the order is: command line option overrides env. var. and env. var overrides rc file setting.
This is what I read/figured out. It gives a lot of flexibility. Many Unix commands work that way. If you want your own to work that way, you have to write the code for it, like checking for presence of the rc file and for the setting in it, checking for the env. var with getenv() and finally checking for the command line option.
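A minimal sketch of that precedence for a hypothetical foo command (the rc-file reading is stubbed out; none of this is a standard library facility, it's just the pattern described above):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int port_from_rcfile(void)
{
    // Pretend we parsed PORT=8080 out of ~/.foorc; return -1 if absent.
    return 8080;
}

int main(int argc, char *argv[])
{
    int port = 80;                     // built-in default

    int rc_port = port_from_rcfile();  // 1. rc file
    if (rc_port > 0)
        port = rc_port;

    const char *env = getenv("PORT");  // 2. environment variable overrides rc file
    if (env && *env)
        port = atoi(env);

    for (int i = 1; i < argc - 1; i++) // 3. command line option overrides both
        if (strcmp(argv[i], "--port") == 0)
            port = atoi(argv[i + 1]);

    printf("using port %d\n", port);
    return 0;
}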
Actually, Bash does have some issues with command line arguments when variable expansion is involved. For example, for Java apps it is fairly difficult to pass -DsomeParameter="something with a space in it" via variable expansion.
I'm trying to find the exact situation but basically if you have something like:
PROPS="-Dprop1=foo bar -Dprop2=bar foo"
java $PROPS SomeClass.class
You can try to escape with backslashes and whatnot, but it becomes fairly hard, if not impossible, to expand multiple parameters correctly from a single variable. I believe one solution is to use arrays, and another is to just not use arguments and instead rely on other configuration mechanisms (env variables, files, etc).
You see this often rear its ugly head with daemon scripts.
(The quotes around `${PROPS[@]}` are important.) Do note, there is an unfortunate edge case here: if PROPS is empty, you'll get a spurious `""` arg passed to java. There's a less pleasant syntax that avoids that issue but I don't recall it off-hand.
(edit: Please read the replies to my post, I didn't think about the fact that this syntax is bash specific. Thanks to those who pointed it out)
Oh, I am sure there are ways around it, but the big issue is that almost none of the daemon scripts out there do it. That is, you can't just set some configuration in /etc/default/some_daemon, because the script will try to concatenate the command.
I tried to find a failsafe solution once while rewriting a daemon script and just gave up.
Sadly not: only quotes that appeared literally in the command-as-written affect word-splitting or are subject to quote removal, not those resulting from an expansion step (variable substitution, globbing, etc); so you'd end up with four elements in argv (or one if you double-quoted the variable reference), with literal single-quote characters in them. You'd have to do something unpleasant with "eval" to make that work.
It's a sensible rule, when you think about it - otherwise, for example, an expansion that introduced mismatched quotes would cause total chaos.
Yeah, that's the very behaviour I'm talking about - it just happens to be a problem in this particular case: the single-quotes in hk__2's PROPS variable would be passed to the executed command, not interpreted by the shell, but we wanted the latter here.
I think the parent misattributed where the fsckup was. Most probably it wasn't
bash or any other shell as such; it was probably a broken /usr/bin/java script
(yes, it used to be a shell script) that looked for the Java bytecode interpreter
in various places. Like most scripts that come from a big company, its style was
terrible.
A friend of mine had to fix this recently for PowerShell, which had a regression that caused arguments you passed to programs to be incorrectly escaped to executed commands.
It is extremely common to get this wrong. Apache Portable Runtime even gets this wrong :/. (I haven't submitted a patch for this yet, but I intend to: I ran into it a couple months ago and then got distracted after working around it in my program by predicting what incorrect escaping might be performed by APR and compensating by adding quotes and escape characters to my input to their open process function... my build is statically linked so I don't feel bad about this temporary hack ;P.)
This should be "quoting command line arguments the right way for passage into applications developed using Microsoft Visual C, and linked to its C run-time library that parses the command line string and calls main or wmain".
There is no general correct way to quote command line arguments in Windows, because every application receives just a character string which it parses however it wants.
There is no single specification for the syntax by which arguments are delimited within the command string.
I don't really know what to say to this without risking being downvoted. But this coming from a Microsoft blog is a little... awkward. DOS heritage plus really bad shell implementations, well... I avoid the hell out of using the command line on any Windows. Luckily there is mintty.
High-level APIs could easily provide this. E.g., the C#
Process.Start(executable, args);
takes a single string as args, and has no overload taking an array of args that would format them safely and correctly so that the receiving process sees the same strings in its args vector.
So while it seems to be pretty easily fixable, it hasn't been.
The right way to do it would be to change the OS API to accept an array of strings, and turn this function into an overlay that has its parameters parsed and broken apart in user space.
But yes, if MS ever fixes this, it's more likely that they'll go the route you described. I can't wait to see how many bug reports with crazy descriptions it will create, and how many more overlays will be written to fix the bugs while preserving backward compatibility.
IIRC, Microsoft C (quite some years ago - I had worked on a product using it) had different variants of functions to spawn or execute a process from another one - like the exec family: execlp, execvp, execvpe, execlpe, etc. - which varied in things like fixed vs. variable number of args, checking the environment vs. not, etc. I also remember reading about it in a Waite Publications book, The Microsoft C Bible, by Nabajyoti Barkakati. Not sure if the issues mentioned in the OP could be solved if those functions were present - need to check.
Edit: and the DOS / Windows exec family of functions was likely derived from Unix's exec().
What I find frustrating is how many MS tools freak out at the sight of a quoted path, when quoted paths are what "copy filename with path" (or whatever the context-menu command is called) gives you.
I recall seeing different behavior between the C runtime and CommandLineToArgvW. I don't remember what the difference was, but I remember it driving me nuts.
Oh. I was confused until I realized the publishing site was Microsoft. Apparently "Everyone" only refers to Windows programmers, and Unix/Mac/whatever programmers do not exist in this universe.
Looks like the title has been fixed but I was also confused before this thread filled up with comments confirming my suspicions that it was a Windows-specific issue.
It does seem rather misleading to say "everyone" does it wrong when it's specifically a problem with Windows APIs (though not terribly surprising from the Windows team).