
Unix got something right in that you can unambiguously pass a list of separate strings to launched processes. However, it does nothing to ensure unambiguous meaning of those strings.

This is for example why you should avoid giving your files such cute names as '-rf'.
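To make the failure mode concrete, here is a minimal sketch in Python. It only roughly mimics shell `*` expansion, and the helper names `expand_star` and `looks_like_option` are made up for illustration. The point is that the launched program receives plain strings, with nothing marking which ones started life as file names.

```python
import os
import tempfile


def expand_star(directory):
    """Roughly mimic a shell's `*`: sorted names, dotfiles skipped."""
    return sorted(n for n in os.listdir(directory) if not n.startswith("."))


def looks_like_option(arg):
    """Common convention: a leading dash (other than a lone "-") means option."""
    return arg.startswith("-") and arg != "-"


# Build a scratch directory containing an innocently named file and "-rf".
with tempfile.TemporaryDirectory() as d:
    for name in ("-rf", "notes.txt"):
        open(os.path.join(d, name), "w").close()
    # This is the argument vector that `rm *` would hand to rm.
    argv = ["rm"] + expand_star(d)

# The file name "-rf" is now indistinguishable from a pair of flags.
suspicious = [a for a in argv[1:] if looks_like_option(a)]
```

Here `argv` ends up as `['rm', '-rf', 'notes.txt']`, so the file name is read as options unless the caller writes `rm -- *` or `rm ./*`.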




Unix, in fact, does something for this.

Firstly, its IEEE standard (1003.1 or "POSIX") specifies the -- convention for separating option arguments from non-option arguments. The tiny handful of utilities like "echo" which do not implement it are also documented that way.

Secondly, Unix provides the POSIX standard getopt C library function, and getopts command. Programs and scripts which use these standard functions for processing options will implicitly support the -- convention.
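As a rough illustration of that implicit support, Python's standard `getopt` module follows the same conventions as the C function, so a sketch like the following gets `--` handling for free. The wrapper `parse` and the `-r`/`-f` option set are assumptions chosen for the example.

```python
import getopt


def parse(argv):
    """Parse -r and -f flags; anything after "--" is treated as an operand."""
    opts, operands = getopt.getopt(argv, "rf")
    flags = sorted(name for name, _ in opts)
    return flags, operands


# Without "--", the word "-rf" is consumed as two options.
without_sep = parse(["-rf", "file.txt"])

# With "--", the same word passes through untouched as a file name.
with_sep = parse(["--", "-rf", "file.txt"])
```

The program itself contains no code that mentions `--`; the library does the work.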

Developers of new command line programs can ignore the documentation and standard functions, of course, developing their own non-conforming parsing from scratch. But at least users have something to point to if they report that as a problem: look, your program isn't supporting --, meaning that you ignored both the POSIX standard convention and the library function which enforces it.


> This is for example why you should avoid giving your files such cute names as '-rf'.

The kernel should ban these names. I'm a big fan of dwheeler's proposal for fixing filenames: see http://www.dwheeler.com/essays/fixing-unix-linux-filenames.h...

There is no god damn reason why a filename should be able to contain, say, LF, DEL, or BEL. None whatsoever.


Yes there is. You want the filesystem to be flexible. If the shell doesn't like those characters, use a different shell that doesn't care. It's brain-dead to create a filesystem that prevents flexibility in user interfaces.


Flexibility is only a good thing if the benefits outweigh the costs. I insist that there are no legitimate (i.e., no better option) use cases for control characters in file names. The filesystem being "flexible" is not a good thing if flexibility causes real problems.


> There is no god damn reason why a filename should be able to contain, say, LF, DEL, or BEL. None whatsoever.

OK you want ASCII 0x07 to be disallowed. Should a filename be allowed to contain "㜇"? (U+3707)


That's not a problem because the UTF-8 encoding of U+3707 will absolutely not contain any US-ASCII control characters, or any special shell or filesystem characters. It encodes as the three bytes 0xE3 0x9C 0x87, all in the range 0x80-0xFF.
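This is easy to check. A small sketch (the helper name `utf8_bytes` is made up for the example):

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of a string as a list of byte values."""
    return list(ch.encode("utf-8"))


encoded = utf8_bytes("\u3707")

# Every byte of a multi-byte UTF-8 sequence has its high bit set,
# so none of them can collide with an ASCII control character.
all_high = all(b >= 0x80 for b in encoded)
```

`encoded` is `[0xE3, 0x9C, 0x87]`. This is a general property of UTF-8 and not specific to this character: bytes below 0x80 only ever encode the ASCII characters themselves.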


There are other encodings than UTF-8 though. Which is kind of my point. If you have your file system set to UTF-16 (doesn't NTFS do this?) then the byte 0x07 will be present, since U+3707 is stored as the bytes 0x37 and 0x07.
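A quick sketch of the UTF-16 case, for comparison. Whether the 0x07 byte matters depends on whether the consumer looks at individual bytes or at whole 16-bit code units; the variable names are illustrative only.

```python
ch = "\u3707"

be = ch.encode("utf-16-be")  # bytes 0x37 0x07
le = ch.encode("utf-16-le")  # bytes 0x07 0x37

# Viewed byte by byte, a 0x07 is present in both byte orders.
contains_bel_byte = 0x07 in be and 0x07 in le

# Viewed as one 16-bit code unit, the value is 0x3707, not 0x0007.
code_unit = int.from_bytes(be, "big")
```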


I also believe that filesystems should require that all filenames be fully normalized UTF-8. I don't think the benefits (slight, IMHO) of allowing filenames to be arbitrary byte strings outweigh the costs of code complexity and security problems.
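For concreteness, a sketch of what such a check might look like. The exact rule set is an assumption: this version requires valid UTF-8, rejects ASCII control characters, and picks NFC as the normalization form, which the comment above does not specify. `is_acceptable_name` is a hypothetical name, not an existing API.

```python
import unicodedata


def is_acceptable_name(raw: bytes) -> bool:
    """Hypothetical filename policy: valid UTF-8, no controls, NFC-normalized."""
    try:
        text = raw.decode("utf-8")  # strict decoding rejects malformed bytes
    except UnicodeDecodeError:
        return False
    if not text:
        return False
    # Reject C0 control characters (including NUL, LF, BEL) and DEL.
    if any(ord(c) < 0x20 or ord(c) == 0x7F for c in text):
        return False
    # Assumed normalization form: NFC.
    return unicodedata.normalize("NFC", text) == text
```

Under these assumptions `b"a\x07b"` and `b"\xff"` are rejected, and so is "e" followed by a combining acute accent (U+0301), because its NFC form is the single character U+00E9.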


That's not how UTF-8 works.


It is how UTF-16 (NTFS) works, though.


That doesn't count. Windows doesn't allow the 16-bit word 0x0007 to appear in filenames.


What? '-rf' is a specific set of flags for a specific program. You can't ban all possible flags for all programs in file names.


The operating system could address it by having a separate argument list and option list at the kernel level, creating an unambiguous interface for calling a program, giving it a list of options and non-option arguments.

Ambiguity would remain in how a given shell parses input to determine what are options and what are arguments: but this would at least be out of the control of individual programs. Notably, the shell would be the tool which parses the -- convention. Programs wouldn't see the -- delimiter which separates options from non-options, so it would be impossible for a program to neglect to implement support for --.
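A sketch of the shell-side split this would imply. Everything here is hypothetical: no such kernel interface exists, `split_command_line` is an invented name, and the rule "leading dash means option" is just one policy a shell could pick. It also ignores options that take a separate value (such as `-o file`), which a shell could not recognize without knowing the program.

```python
def split_command_line(words):
    """Split words into (options, operands) as the hypothetical shell might."""
    options, operands = [], []
    only_operands = False
    for w in words:
        if only_operands:
            operands.append(w)        # after "--", everything is an operand
        elif w == "--":
            only_operands = True      # consumed here; the program never sees it
        elif w.startswith("-") and w != "-":
            options.append(w)
        else:
            operands.append(w)
    return options, operands


# `rm -r -- -rf x` would launch rm with these two separate lists.
options, operands = split_command_line(["-r", "--", "-rf", "x"])
```

The program would then receive `['-r']` as options and `['-rf', 'x']` as operands, with no way to confuse the two.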


Yes, programs are free to interpret arguments any way they want. (See dd(1).) But in practice, almost all programs interpret a leading dash in an argument word to mean "here be options". By banning filenames with leading dashes, we close a large number of security holes at minimal cost. Of course it's not a total solution, but from a pragmatic perspective, it's the right thing to do, because it goes a long way toward solving a real problem.


Close what security holes? If someone isn't escaping input they are still screwed if you ban dashes.

It's like suggesting we don't allow SQL to store quotes so we can use quotes to enclose data.


It's harm reduction. Yes, everyone should be escaping input. Yes, everyone should be using "./-foo" instead of just "-foo". But people don't, and they're not going to start. If we ban leading dashes, we stop these bugs from turning into security vulnerabilities.

Your stance is like being against ASLR because developers just shouldn't have buffer overflow vulnerabilities in their code.


What does it mean to say that an argument's meaning could be unambiguous regardless of the program it is passed into? That's a logical impossibility.

Still, it would be right and proper if Unix programs had a little type safety in their arguments, for example by requiring that ALL arguments be flags, as in this hypothetical smart_rm: "/bin/smart_rm -rf --pattern foo/bar"
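A sketch of how such a smart_rm might declare its interface, using Python's standard `argparse`. The program and its `--pattern` flag are hypothetical, taken from the example above; nothing is actually deleted here.

```python
import argparse


def build_parser():
    """Interface for the hypothetical smart_rm: flags only, no positionals."""
    p = argparse.ArgumentParser(prog="smart_rm")
    p.add_argument("-r", action="store_true", help="recurse into directories")
    p.add_argument("-f", action="store_true", help="do not prompt")
    # The thing to remove must be named explicitly, never passed bare.
    p.add_argument("--pattern", required=True, help="path or pattern to remove")
    return p


args = build_parser().parse_args(["-rf", "--pattern", "foo/bar"])
```

Because no positional arguments are declared, a bare word on the command line is rejected with a usage error instead of being guessed at.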


You can escape file names that look like options with '--'.


This depends on the application recognizing the -- convention and also depends on having all the little scripts in your system remembering to use the --.


Even if the OS kernel provided a process launching API with separated options and arguments, that would not remove the need for the -- syntax to remove the ambiguity at the shell level, and hence your need to use that in scripts.

It would remove the problem of programs all being required to implement --.



