Hacker News
Escaping strings in Bash using !:q (simonwillison.net)
482 points by goranmoomin on Oct 2, 2020 | 81 comments



If only more languages supported quote operators like Perl. They really do make certain things like this so much easier.

Non interpolating quotes:

  q/I wasn't surprised when he said "boo" and gave me $5/;
  q@I wasn't surprised when he said "boo" and gave me $5@;
  q!I wasn't surprised when he said "boo" and gave me $5!;
  q(I wasn't surprised when he said "boo" and gave me $5);
  q[I wasn't surprised when he said "boo" and gave me $5];
Interpolating quotes:

  qq/Hello, $name!/;
  qq!Hello, $name\!!;
  qq@Hello, $name!@;
  qq(Hello, $name!);
  qq[Hello, $name!];
  qq'Hello, $name!';
Rule of thumb: a single q for a single-quote string (normally non-interpolating), and two q's for a double-quote string (normally interpolating).


Lifted directly from sed, one of Perl's inspirations:

  s/replace #/with $/
  s#replace /#with _#
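The same delimiter choice is easy to try from a shell. A small sketch, assuming any POSIX sed; the path is just sample input. With `/` as the delimiter every slash in the pattern needs a backslash, while with `#` it is an ordinary character.

```bash
#!/usr/bin/env bash
# Default delimiter: the "/" inside the pattern must be escaped.
printf '%s\n' 'path/to/file' | sed 's/\//_/g'
# "#" as the delimiter: the slash needs no escaping.
printf '%s\n' 'path/to/file' | sed 's#/#_#g'
```

Both commands print `path_to_file`.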


I think it was in ed already. The POSIX ed spec says: "Any character other than <space> or <newline> can be used instead of a slash to delimit the RE and the replacement."


And, AFAIR, in Perl space and newline could be delimiters too! I think I've used it in some Perl golf task which didn't count whitespace.


And my memory tricked me: whitespace doesn't work as a regex or quoting delimiter.


Works in Ruby:

  irb(main):001:1/ puts(%r
  irb(main):002:1* a+
  irb(main):003:0> .match? %q aaa )
  true


Raw string literals in C++ allow this as well:

  R"/(I wasn't surprised when he said "boo" and gave me $5)/"
  R"@(I wasn't surprised when he said "boo" and gave me $5)@"
  R"|(I wasn't surprised when he said "boo" and gave me $5)|"
https://en.cppreference.com/w/cpp/language/string_literal


I also like how Raku has extended this with the Q operator, allowing one to specify what should be interpolated (scalars, escape sequences, closures, etc.), also with arbitrary delimiters:

   Q/I got $5 \sigh /   # literal
   Q:s/Hello $name/     # just scalars
   Q:b/hello world\n/   # just backslash escape sequences
https://docs.raku.org/syntax/Q


That may be a good candidate for something to port back when Perl 7 gets to the stage of adding new features.


It does look nice, and I kind of miss the feature in other programming languages. I also maintained cperl-mode for many years, and honestly the editor support for this functionality never quite worked right. It is hard to parse, and even harder when you are doing a heuristic-based incremental parse. (The challenge with text editors is that you have to provide correct syntax highlighting even when the document contains parse errors that the compiler would have bailed out on.)

Because editor tooling is important to me, I'm willing to see an ugly "\"quoted\" string" from time to time, if it means that I get better tools as a result.


I do generally prefer Perl's q and qq, though Python's triple-quote and its raw form are often good enough. Python also inherited C's (mis-)feature of concatenating consecutive string literals automatically, which comes in handy in some cases.



Unfortunately D doesn't allow substitutions like Perl.


I'm not sure if Ruby borrowed the full range of q and qq operators, but it offers Percent strings: https://ruby-doc.org/core-2.7.1/doc/syntax/literals_rdoc.htm...


This is just string interpolation.


%q allows different delimiters to represent single-quoted strings, which don't have string interpolation. For example:

    $ ruby -e 'a=5; puts "[#{a}]"'
    [5]
    $ ruby -e 'a=5; puts %q/[#{a}]/'
    [#{a}]


%Q or just % does interpolation:

    $ ruby -e 'a=5; puts %Q/[#{a}]/'
    [5]
    $ ruby -e 'a=5; puts %/[#{a}]/'
    [5]


  %q/ruby/
  %q@is@
  %q!strongly!
  %q(inspired)
  %q[by]
  %q<perl>
  %Q~#{"interpolating quotes"}~
  %&interpolating by default&


q and qq are fantastic.

And qw too:

    # ['this', 'is', 'a', 'list']
    qw/this is a list/;


... but boy am I glad I don’t have to read stuff like that.


  ['but', 'boy', 'am', 'I', 'glad', 'I', 'don\'t', 'have', 'to', 'read', 'stuff', 'like', 'that']

  qw/but boy am I glad I don’t have to read stuff like that/
But why?


Or as it is written in almost any other language:

    split("but boy am I glad I don't have to read stuff like that")
But I guess you saved three bytes, so that's nice.


Split works at run time, while qw// is just shorthand for array initialization syntax, so it works at compile time.


Split works at run time, but it could, in principle, evaluate at compile time when it has a constant argument.

For example, the ord() Perl function is evaluated at compile time if the argument is a constant string.

(It's not worth it because split() syntax isn't as clean as qw() for this purpose. split's first arg, if present, is the pattern to split on.)


If splitting a string is an important optimization point in your program, neither language is a good choice.

You fail to explain why other languages do not grow these vestiges. I can only assume the implicit explanation is that you believe them to be incompetently designed, whereas I believe the exact complement: it looks like a good idea, but it isn’t.


There are many good ideas that aren't in one programming language or another.

Does that actually mean they are bad ideas? No. It only means that particular language didn't copy that particular idea.

There may be bad ideas that are in many languages, that doesn't somehow make them good.

---

Quote-words are used regularly in Perl because they are clear and useful.

    for (qw' alpha beta charlie ') {
      say
    }
    for (split '', 'alpha beta charlie') {
      say
    }
    for ( 'alpha', 'beta', 'charlie' ) {
      say
    }
The `qw` emphasizes that we are dealing with `alpha`, `beta`, and `charlie`. The other ones have extra noise that is only there to satisfy the compiler.

Now imagine you have to add `delta` to the list.

With the `qw` you only have to press the spacebar and the letters `d e l t a`. You don't have to worry if you accidentally left off a `'`, because you didn't need to add one.

To be fair, the `split` would have the same benefit. But it is still more error-prone. For example, I wonder how many people didn't notice that the first argument to `split` was an empty string when it should have been a string with one space in it.

I didn't even notice it, and I've been programming in Perl for decades.


Such things are called "syntactic sugar". They are common in other languages. Quote: «A construct in a language is called "syntactic sugar" if it can be removed from the language without any effect on what the language can do: functionality and expressive power will remain the same.»[0]

Another quote: «Data types with core syntactic support are said to be "sugared types." Common examples include quote-delimited strings, curly braces for object and record types, and square brackets for Arrays.»

[0]: https://en.wikipedia.org/wiki/Syntactic_sugar


Wouldn't that fall over for common usages of the shell (presuming it acts like sed)?

Suppose you have an executable "queue".

you'd generally run it as

  ./queue
With this rule, wouldn't that be parsed as

  ./q'e'e

?


It's the idea, rather than the implementation, which I'm advocating for. For a shell, it would likely be best to trigger with a reserved character (or two) that's much less likely to be encountered in normal usage.


Understood.

Finding a reserved character (or even sequence) that doesn't have collisions or unintended behavior beyond what the semi-reserved ' and " can do will be hard (but not necessarily impossible :)! ).


In addition to what kbenson said, word characters [a-zA-Z0-9] usually aren't eligible as quotes anyway, otherwise how would qq'this has a q in it' parse (versus q/'this has a /).


Powershell has "here-strings". For example:

    @"
    For help, type "get-help"
    "@
https://docs.microsoft.com/en-us/powershell/module/microsoft...


Bash too. Example:

    SILLY_VAR="
    My silly username is \"$USER\"
    "
    echo "${SILLY_VAR}"
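A closer bash analogue to PowerShell's here-strings is probably a heredoc with a quoted delimiter: quoting `EOF` turns off expansion inside the body, so both quote characters and `$` come through literally. A sketch, assuming bash; `text` is just an illustrative name. `read -d ''` reads to end of input and returns non-zero when it hits EOF, hence the `|| true`.

```bash
#!/usr/bin/env bash
# 'EOF' (quoted) disables parameter expansion, command substitution and
# backslash escapes in the body; IFS= and -r keep the text untouched.
IFS= read -r -d '' text <<'EOF' || true
For help, type "get-help"; it's free and costs $0.
EOF
text=${text%$'\n'}   # drop the newline that ends the heredoc body
printf '%s\n' "$text"
```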


Cool trick!

Adding an additional :p modifier prevents the complaint about command not found:

  $ # This string 'has single' "and double" quotes and a $
  $ !:q
  '# This string '\''has single'\'' "and double" quotes and a $'
  # This string 'has single' "and double" quotes and a $: command not found
  $

  $ # This string 'has single' "and double" quotes and a $
  $ !:q:p
  '# This string '\''has single'\'' "and double" quotes and a $'
  $


!:q and then pressing M-^ (probably Alt-Shift-6) to do the history expansion is probably more useful; then you can immediately edit the line.


Or one can use `bind space:magic-space` in .bashrc to activate the auto-expansion of the history on space


Or `echo !:q`


I'm not at a computer right now, but this also seems like it should work:

    : !:q
The appeal (to me) would be less clutter in the output.


Even better -- use this script, paste anything into stdin. It will come back out quotable and pasteable into a shell.

  #!/bin/bash
  
  printf '%q' "$(cat)"
  echo


Even better, put this in your ~/.bashrc

    function bashquote () {
        printf '%q' "$(cat)"
        echo
    }


Golfed one-liner

    bashquote() { printf '%q\n' "$(cat)" ; }


You should contact your administrator if you are still paying per-line for your .bashrc


I think he's just saving on keystrokes because those Model M's are getting hard to find.


Can't we get rid of cat somehow?


Yes,

    bashquote() { printf '%q\n' "$(</dev/stdin)"; }
Bash treats the string "/dev/stdin" as magic, so that works even if /dev/stdin doesn't exist. However, bash (unlike ksh) spawns a subshell for $(</dev/stdin), so using cat is actually lighter. (Also, it's not clear to me why $(<&0) doesn't work in bash.)
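Whichever variant you use, the useful property of `%q` is that its output can be read back by the shell. A sketch, assuming bash: `printf -v` stores the result in a variable instead of printing it, and the sample string is arbitrary.

```bash
#!/usr/bin/env bash
original=$'This string \'has single\' "and double" quotes and a $'
# %q produces a form bash can re-parse; -v puts it in $quoted.
printf -v quoted '%q' "$original"
printf '%s\n' "$quoted"
# Evaluating the quoted form gives back the original string.
eval "roundtrip=$quoted"
[ "$roundtrip" = "$original" ] && echo 'round trip OK'
```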


Is the echo there just for the newline? Why not slap a \n in there and make it a one liner?


Or on Mac, just run this script and it will modify your clipboard

  #!/bin/bash
  printf '%q' "$(pbpaste)" | pbcopy


  $ echo "text with spaces" | bash -c 'printf "%q" "$(cat)"'; echo
  text\ with\ spaces
For me, this escapes spaces with backslashes; the example in the article escapes them by quoting the whole string. Is this a difference in bash versions, or do these uses differ somehow?

(I replaced the single quotes with double so i could pass this to bash -c, but i get the same result if i use your code verbatim as a script)


    text\ with\ spaces
is same as

    'text with spaces'
See https://mywiki.wooledge.org/Quotes
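The two spellings are different source text for the same value, which is quick to check (a small sketch; plain POSIX quoting):

```bash
#!/usr/bin/env bash
escaped=text\ with\ spaces   # spaces escaped with backslashes
quoted='text with spaces'    # whole string in single quotes
# After quote removal both variables hold the identical string.
if [ "$escaped" = "$quoted" ]; then
    echo 'same value'
fi
```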


It's plainly not the same, you can see that by looking at it. It encodes the same text, but it encodes it a different way.

And the difference is not negligible. With quotes, i can type or paste more into the middle, and the string is still valid. With escapes, i have to be careful to escape the added text. It's significantly more ergonomic to use quoted text.


Yes, the example in the article and parent comments ought to be referred to as escaping, not quoting. Most of the time, quoting with single quotes leads to something far easier on the eyes.

Here's an example function for quoting from stdin:

    function bashquotesingle() { 
        printf "'"; 
        sed "s/'/'\\\\''/g"; 
        printf "'"; 
    }
Though it does have a bug, at least on macOS, where sed adds a trailing newline. (See https://stackoverflow.com/questions/13325138/why-does-sed-ad...)
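One way around the trailing-newline problem is to let command substitution strip it before the closing quote is added. A sketch, assuming bash; note that it also drops any trailing newlines that were genuinely part of the input.

```bash
#!/usr/bin/env bash
bashquotesingle() {
    local input escaped
    # $(...) strips trailing newlines, whether from the input or from sed.
    input=$(cat)
    # Replace each ' with '\'' (close quote, escaped quote, reopen quote).
    escaped=$(printf '%s' "$input" | sed "s/'/'\\\\''/g")
    printf "'%s'\n" "$escaped"
}

printf '%s\n' "it's a \"test\" with \$HOME" | bashquotesingle
# prints: 'it'\''s a "test" with $HOME'
```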


ksh's %q prefers quotes; for your example it prints 'text with spaces', and for the article's example,

    $'This string \'has single\' "and double" quotes and a $'
vs bash's %q,

    This\ string\ \'has\ single\'\ \"and\ double\"\ quotes\ and\ a\ \$


If you intend to type or paste quotes, you still need some additional escaping. And on the other hand, most substrings of (non-quoted) escaped text are themselves properly escaped.


I learned recently that escaping characters in zsh/bash also works for parameter expansion:

  # zsh using flags ${(flags)name}
  % string="This is a string with \"\"\" and ''' and \"\" again ''. Also such stuff as & % # ;"
  % echo $string
  This is a string with """ and ''' and "" again ''. Also such stuff as & % # ;
  % echo ${(q)string}
  This\ is\ a\ string\ with\ \"\"\"\ and\ \'\'\'\ and\ \"\"\ again\ \'\'.\ Also\ such\ stuff\ as\ \&\ %\ \#\ \;

  # bash using operators ${name@operator}
  % string="This is a string with \"\"\" and ''' and \"\" again ''. Also such stuff as & % # ;"
  % echo $string
  This is a string with """ and ''' and "" again ''. Also such stuff as & % # ;
  % echo ${string@Q}
  'This is a string with """ and '\'''\'''\'' and "" again '\'''\''. Also such stuff as & % # ;'
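For the awkward-filename case, `${var@Q}` (bash 4.4 or later) gives a form that can be pasted or eval'd back. A sketch; the file names are made up and live in a throwaway directory.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
# Made-up awkward names: a space, a single quote, double quotes.
touch "$dir/with space" "$dir/it's" "$dir/say \"hi\""
count=0
for f in "$dir"/*; do            # the glob keeps each name intact
    name=${f##*/}
    printf '%s\n' "${name@Q}"    # quoted form, reusable as shell input
    eval "again=${name@Q}"       # reading it back restores the name
    [ "$again" = "$name" ] && count=$((count + 1))
done
rm -r -- "$dir"
```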

Wow, so easy then. I remember struggling so many times in the past when, e.g., iterating over filenames with unusual characters.

/edit: fix markup


One thing I've figured out is that in bash you can use $'These kinds of strings', which don't do any variable expansion but give you essentially the same backslash escapes as string literals in most programming languages. Example:

    $ echo $'hey there, it\'s "double quotes", \'single quotes\', and some \\, \', ", $ chars'
    hey there, it's "double quotes", 'single quotes', and some \, ', ", $ chars
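A few more of the ANSI-C escapes that `$'...'` understands, as a bash sketch (the strings are arbitrary):

```bash
#!/usr/bin/env bash
s=$'it\'s'      # \' is an escaped single quote
tab=$'a\tb'     # \t becomes a real tab character
hex=$'\x41'     # \xHH gives the byte 0x41, i.e. "A"
printf '%s|%s|%s\n' "$s" "$tab" "$hex"
```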


Those strings also support things like \0 to get a null byte, and \uxxxx to get a Unicode character. This is useful for working with filenames and other things with spaces, quotes, and so on using find, xargs, etc. E.g.

  find ... -print0 | while read -d $'\0' f; do ...; done


"read" also supports the empty string to mean the null byte in this case:

    find ... -print0 | while read -d '' f; do ...; done
And since Bash 4.4, also "readarray"/"mapfile" can specify a delimiter, which is great to read files safely into an array:

    readarray -d '' -a arr < <(find ... -print0)
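Put together, a complete version of the null-delimited loop might look like this. It is a sketch assuming bash and a `find` with `-print0` (GNU and BSD both have it); the file names are invented. `IFS=` and `-r` stop `read` from trimming whitespace or eating backslashes, and feeding the loop with `< <(...)` instead of a pipe keeps `found` in the current shell rather than a subshell.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
touch "$dir/a b" "$dir/back\\slash" "$dir/ lead"
found=0
while IFS= read -r -d '' f; do
    printf '[%s]\n' "${f##*/}"
    found=$((found + 1))
done < <(find "$dir" -type f -print0)
rm -r -- "$dir"
```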


>Those strings also support things like \0 to get a null byte

WARNING: this is not true in bash!

You can have exactly one null byte in a bash string: the terminating null byte. Try this:

  echo $'foo\0bar'
It prints “foo”.

So practically you can’t have null bytes in bash strings, as it will be mistaken for the terminating null of the underlying C string.

In your example read -d '' would work just the same; actually that's the idiomatic way to iterate on zero-delimited input (or xargs -0). Why does the empty string work? Because -d takes the first char of the string as the delimiter, which for empty C strings is the terminating \0 - this is how bugs become features.
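The truncation is easy to see by measuring the string (a small bash sketch; `${#s}` is the length):

```bash
#!/usr/bin/env bash
s=$'foo\0bar'
# bash keeps only what precedes the NUL, so this prints 3.
printf '%s\n' "${#s}"
```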


zsh has quote-line by default bound to alt-' which will escape your current command line:

  quote-line (ESC-') (unbound) (unbound)

    Quote the current line; that is, put a ‘'’ character at the beginning and the end, and convert all ‘'’ characters to ‘'\''’. 
http://zsh.sourceforge.net/Doc/Release/Zsh-Line-Editor.html


After being disappointed that the `!:q:p` trick doesn't work, I'm even more delighted to see that esc-' does it inline! Thanks, zsh!


You can do the same with bash:

!:q<ESC>^

(<ESC>^ is Escape, Shift-6 on my keyboard)


Note that you can precede the !:q with `echo` to see the result as well as the invocation without an error.

  $ echo !:q
  echo '# This string '\''has single'\'' "and double" quotes and a $'
  # This string 'has single' "and double" quotes and a $


Seeing the result of a modified history expansion is (or was) kind of a desired feature, so it's built-in just like the ability to escape stuff with :q

    $ # here be quote ' " chars

  (use :p to print but not execute)
    $ !:q:p
    '# here be quote '\'' " chars'

  (use :s to modify – no :q here because we modify the already quoted text !!, unquoted is now !-2)
    $ !:s/be/are/:p
    '# here are quote '\'' " chars'


This is very useful for saving off a one-liner as a .sh file once you've finally gotten it working. I will have to remember this!

    echo !:q > doagain.sh


Is anyone else annoyed by the amount of built-in magic character strings in bash? There are times when I know a task is possible in a bash script but write it in Python (with no dependencies required and compatible with 2 and 3) because it's easier than looking up every random gotcha and running into issues later on when someone runs it in a directory with a space.


You must think about the teletype era. One character commands are economical, even if you have to consult a manual to remember them all. Anything printed on the paper was expensive. I remember stealing color ribbons from the account reserved for manual typists and secretaries. It was also customary to use the paper roll all 4 ways, when editing.


I turned off histexpand because i was fed up with exclamation marks doing random things i never wanted, so this doesn't work for me :(

Still, i can use ctur's script approach.


You can always take advantage of what I believe is the portable property of consecutive strings (without space) being concatenated together. Then you never need to escape anything in your scripts.

For instance, to produce a double quote inside single quotes, you can do this

echo "'"'"'"'"

That's three quoted strings next to each other that produce

'"'


You can get proper code formatting on HN by indenting two characters [1]:

  echo "'"'"'"'"
  '"'
[1] https://news.ycombinator.com/formatdoc
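The same trick works for assignments, since adjacent quoted pieces form one word. A sketch using plain POSIX quoting; `name` and `greeting` are illustrative:

```bash
#!/usr/bin/env bash
# "'"  '"'  "'"  are three adjacent quoted words glued into one.
dq_in_sq="'"'"'"'"
# Single quotes keep the $ literal; double quotes allow expansion.
name=world
greeting='Cost: $5, '"hello $name"
printf '%s\n' "$dq_in_sq" "$greeting"
```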


I thought "!:q " was what you typed to quit vi.


:q!

: - Open command prompt.

q - Quit.

! - Yes, I'm sure.


This works too:

    % cat /tmp/sh
    var=variables
    x=$(
    cat <<EOT
    This string has 'single' and "double" quotes and can interpolate '$var'
    EOT
    )
    echo $x

    % bash /tmp/sh
    This string has 'single' and "double" quotes and can interpolate 'variables'


I wrote a bash function that leverages 'set -x' to capture the quoting of "$@" into a single bash variable, say $job, or into a temp file. I use it from time to time; I'm pretty sure it's not perfect, but it works well enough to be useful. Using it usually involves an 'eval'.


    ## key step: out=$( (set -o xtrace; : "$@") 2>&1 )  # xtrace option shows us quoting for our args

    ## Cleanup $out
    #  only 1 of the next 4 'left trims' is expected to change $out:
    out=${out#+++++ }
      # if $FUNCNAME is eval'd three times there are five "+" chars
    out=${out#++++ }
      # if $FUNCNAME is eval'd twice there are four "+" chars
    out=${out#+++ }
      # if $FUNCNAME is eval'd there are three "+" chars
    out=${out#++ }
      # xtrace prepends the '++ '; ': ' is from our code shown above.
      # We strip these 5 left most chars.

    # Would be nice to support any level of evals, (hence any no.
    # of '+' chars) [...]
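If the goal is only to capture "$@" as reusable shell text, `printf '%q '` may be a simpler route than parsing xtrace output, since it needs no cleanup for eval depth. A sketch assuming bash; `save_args` and `job` are illustrative names, not the commenter's code.

```bash
#!/usr/bin/env bash
# Hypothetical helper: store its arguments in $job as shell-quoted text.
save_args() {
    printf -v job '%q ' "$@"
    job=${job% }    # drop the trailing separator
}

save_args echo "two words" "it's" '$HOME'
printf '%s\n' "$job"
eval "$job"         # prints: two words it's $HOME
```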


Other than regex mastery, the only thing standing between me and true greybeard status is knowing how and when to properly quote things in my command-line incantations.


This is hands down the coolest thing I've learned this year. Thank you


Oh I thought like, end a string without a quote...


Try doing it with :#s///


Could someone explain why this was downvoted? Is this dangerous, or does it not perform the same operation?


It is not dangerous, and does not perform the same operation (it is actually incorrect and incomplete: ":" and "#" swapped, and missing substitution words). I guess it was downvoted because it didn't explain? The bash manual does, though, under "HISTORY EXPANSION". Try this:

Type a command, but don't press enter, e.g.

  echo 'hello';
Type

  !#:s/hello/world/
Press enter. !# means the whole current command so far, and the s/a/b/ modifier replaces the first instance of "a" with "b".

Press up to get the command that ran:

  echo 'hello'; echo 'world';
To see such history expansion things before they are run, press M-^ (probably Alt-Shift-6).


Just another reminder bash scripts are unnecessarily complicated, use python instead.



