Hacker News
Shellcheck: a static analysis tool for shell scripts (shellcheck.net)
102 points by pyotrgalois on Feb 5, 2015 | 46 comments



Just spent some time sending my scripts to this site for it to analyse and see what it does. While it wasn't able to suggest more efficient code to achieve my goal (wasn't really hoping for that), it did spot 1) one-liners where some commands are not needed, 2) variables which are not used, 3) places where I should use double quotes to prevent word splitting, and 4) lines where my ssh was eating up my stdin.

What a great sanity check for the days when I'm writing something on my own without a second pair of eyes to proof-read it.


Combine it with the Sublime Text plugin (https://github.com/SublimeLinter/SublimeLinter-shellcheck) and you get real-time static analysis while writing your shell scripts!


Shellcheck is great. The Vim Syntastic plugin already knows about Shellcheck so if you use Syntastic and install Shellcheck you'll automatically start getting warnings on your code.

BTW, it can be a little hard to figure this out, but if Shellcheck gives you a warning that you want to ignore (because you intended to trigger that behavior), you can put the following comment above the offending line:

  # shellcheck disable=SC1234
where "SC1234" is replaced with the actual error code that Shellcheck gives.
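To make that concrete, here is a hypothetical snippet: the unquoted `$flags` would normally draw ShellCheck's SC2086 word-splitting warning, but the splitting is deliberate here, so the directive silences that one warning for the next line only (the variable names are made up for illustration).

```shell
# SC2086 would warn about the unquoted $flags, but we *want* the two
# flags to be split into separate arguments here.
flags="-1 -a"
# shellcheck disable=SC2086
listing=$(ls $flags)   # "-1" and "-a" are passed as two arguments
```

The directive applies only to the line immediately below it, so other occurrences of the same issue elsewhere in the script still get flagged.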


This looks very helpful: bash scripts are notoriously difficult to get right. I wish it'd suggest best practices like `set -e` and the like though.


The jury's still out on whether `set -e` is worth it.

On paper it sounds like it's equivalent to `on error goto 0`, making a script fail-fast -- which would have been awesome.

Instead, it makes a script fail sometimes, for things that are sometimes errors. The rules for how and when are unexpected and unintuitive; several weird cases are described at http://mywiki.wooledge.org/BashFAQ/105

If enabling it just granted a free 50% chance of stopping on any given error, it would have been worth it, but it triggers on false positives as well.
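One of the classic false negatives from that FAQ can be sketched in a few lines: a command that fails inside an `if` condition does not abort the script under `set -e`, even when the failure is buried in a function (the function name here is made up for illustration).

```shell
# A failing command inside an `if` condition does NOT trip `set -e`,
# even transitively through a function call.
out=$(bash -c '
set -e
check() {
  false           # would abort the script at top level under set -e
  echo "reached"  # still runs: errexit is suspended in an if condition
}
if check; then echo "ok"; fi
echo "script continues"
')
printf '%s\n' "$out"
```

The `false` is silently ignored, `check` returns the exit status of its last `echo`, and the script carries on, which is exactly the kind of "sometimes fails, sometimes doesn't" behavior the parent describes.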


Well, dealing with horrible exception handling is still better than just silently running off the edge on errors.

Much like any language, you need to read and understand it to be able to truly write, I think. bash is deceptive in this regard IMO, due to how low its barrier to entry is.

Now if you want to see some truly horrible code, implement a shell script that runs in bash and zsh and does exception printing in both.


The real problem with shell scripts is that they usually tie together a few external commands and tend to pass information around using the filesystem.

This, together with poor error handling, is a recipe for disaster: problems with permissions, insufficient disk space, etc. Instead of stopping when encountering an error, most shell scripts will happily continue and break at some other point in time (or worse, destroy valuable data).


The Emacs interface is very nice as well, using Flycheck.


I ran it against a deploy script generated by Mina [1]. Only a few deprecation warnings and notes about using find instead of ls to better handle non-alphanumeric filenames. I've learned a lot about error handling by reading Mina-generated scripts.

1. http://nadarei.co/mina/
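The "find instead of ls" note the parent mentions is easy to demonstrate: a minimal sketch (hypothetical filename) showing why iterating over `$(ls ...)` breaks on names with spaces, while a plain glob handles them fine.

```shell
# `for f in $(ls ...)` word-splits "my file.txt" into two items;
# a glob iterates over filenames without ever parsing ls output.
dir=$(mktemp -d)
touch "$dir/my file.txt"

split_count=0
for f in $(ls "$dir"); do split_count=$((split_count + 1)); done  # 2: name was split

glob_count=0
for f in "$dir"/*; do glob_count=$((glob_count + 1)); done        # 1: name intact
```

ShellCheck flags the first loop; the glob (or `find -print0`, for recursion) is the usual fix.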


Isn't this a pretty good argument against shell scripts? I feel like we've advanced far enough in PL research to think of something a bit safer.


Time for advertisement! You might like elvish https://github.com/elves/elvish which proudly has optional typing and much more well-defined semantics than, say, bash. This is a work in progress though.

Advertisement aside, there is some inherent unsafety in shell scripts that cannot be easily resolved, namely the unsafety involved in interacting with external commands.

Compared to other scripting languages, the greatest advantage of shell languages is the convenience of interacting with external programs. However, at least in Unix, there are few static constraints you can apply to them. All we know is that the program will (probably) parse something in argv which is just bytes, (probably) take something from stdin which is just bytes, and (probably) put something to stdout which is again just bytes; there is no universal method to check that the command's arguments are well-formed, or the input format is correct, or the output format conforms to a certain schema without running the actual program. A solution would be to define some kind of static protocols for external programs so that their invocations can be statically checked, but it's already too late.


Interesting! On first glance, this might be the most appealing attempt to improve on the shell I've seen yet. It's a really hard design space. Some questions:

Why is set necessary? Once you have declaration with var, can't mutation be done without set?

What's your thinking behind making var mandatory for declarations? Safety is obvious, but it seems like terseness is a really big goal for shell programming, especially interactive use.

Also, documentation wise, I don't see how/if you do variable expansion in strings. Same as sh?


var is for declaration, set for assignment. This is an important contrast that some dynamic languages miss; ironically JavaScript got it right. Contrast this

    var $x = "foo"; if $true { set $x = "bar" }; echo $x # outputs "bar"
with

    var $x = "foo"; if $true { var $x = "bar" }; echo $x # outputs "foo"
The declaration/assignment contrast is very important when it comes to closures (and there are closures in elvish). In python 2, for instance, there is no way (!) to assign to outer variables in closures since `=` declares and assigns at the same time in a `def` block.
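The same contrast has a rough analog in bash itself (this is not elvish semantics, just an illustrative sketch with made-up function names): `local` inside a function declares a new, shadowing variable, while a plain assignment reaches the existing outer one.

```shell
# `local x=...` declares a fresh function-scoped x (like elvish `var`);
# a bare assignment mutates the existing outer x (like elvish `set`).
x="outer"
shadow() { local x="inner"; }   # shadows, then vanishes on return
mutate() { x="inner"; }         # assigns to the outer x

shadow; first="$x"    # still "outer"
mutate; second="$x"   # now "inner"
```

Bash conflates the two when you omit `local`, which is exactly the ambiguity the declaration/assignment split avoids.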

There are no variable expansions, but strings are concatenated implicitly when they run together. In sh:

    echo "hello $name, welcome!"
In elvish:

    echo "hello "$name", welcome!"
Implicit concatenation can read a bit weird at first, but it's actually conceptually much simpler and only slightly more cumbersome than string interpolation. It also makes the syntax much simpler.


JavaScript didn't really get it right:

  var a = 1;

  function four() {
    if (true) {
      var a = 4;
    }

    alert(a); // alerts '4', not the global value of '1'
  }
Also, if you omit 'var', the code is still legal (except in strict mode), and the variable winds up in the global scope, which is a recipe for disaster.

Still, it's nice that 'var' exists in JS at all. The idea that variable declarations are unnecessary noise and should be elided -- an idea that dates back at least to BASIC -- is, in my opinion, one of the worst seductive ideas in programming language design. Unless your language has only a single global scope (like BASIC), it always causes problems -- and we know that block scope is important for nontrivial programs.

Elvish sounds interesting; I will have to check it out.


I don't think we know that. We know some notion of lexical scope is valuable, but function-only vs block scope seems like an issue of familiarity/style.


Okay, but that doesn't change my point that a single global scope is archaic, and for good reason.


So I see why you want var, but can't you just treat "x = 5" as "set x = 5"?

You can also remove the need for var (at the expense of no safety for typos) by prohibiting shadowing, like coffeescript does.


The need for `set` has to do with syntax. In shells the first word of a statement is always considered to be the command, so "x = 5" will not work - it reads "execute command 'x' with arguments '=' and '5'". The traditional solution is to treat the command as an assignment when it contains '=', so you write "x=5" and you are prohibited from adding any spaces, which I find aesthetically very unpleasant.
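The traditional behavior is easy to check (assuming no command named `x` happens to be on your PATH):

```shell
# "x=5" is an assignment; "x = 5" tries to run a command named "x"
# with arguments "=" and "5", and fails if no such command exists.
val=$(bash -c 'x=5; echo "$x"')   # assignment works: val is "5"
if bash -c 'x = 5' 2>/dev/null; then assigned=yes; else assigned=no; fi
```

The no-spaces rule is why `x =5`, `x= 5`, and `x = 5` are all silently something other than an assignment in sh.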

I was not aware of the CoffeeScript approach towards shadowing before. I will look into it, but it seems to be a very controversial design choice of CoffeeScript.


Re coffeescript and shadowing: I don't like it because I dislike the mismatch in semantics with JavaScript and am used to Python, but beyond that, I'm not sure it's wrong.

Re set: it would slightly complicate your grammar, but I don't think detecting "word space* = ..." would create any ambiguities.

Partially this issue is just about how much you value explicitness/regularity vs. concision.


@hyperpape we seem to have hit the critical level for flame war and now I cannot reply to you :) it's said I will be able to reply after some cooldown time, but here is my reply:

re "word space* = ...": should this echo an equal sign and $ip, or assign $ip to $echo?

    echo =$ip


You're right. I knew at some point I'd make a bad assumption based on my own (limited) forays into writing a shell language.

In my case, I'm treating "=" as not able to be included in an unquoted string literal, and I'm requiring variables to start with $.

So in my shell, this would be a syntax error.

    echo =$ip
This works.

    $echo = $ip
If you relaxed the variable naming idea, it would set echo to $ip, but that's probably a bad idea.


No, this doesn't work either, unless you sacrifice the functionality of using variables as commands. Suppose $echo is equal to "echo", what is the most reasonable thing one would expect `$echo = $ip` to do?

Also consider the following snippet:

    : ${SSH:=ssh}
    $SSH $host1 'command1'
    $SSH $host2 'command2'
This has at least two use cases: 1) The user may direct the script to use an ssh that is installed somewhere not in PATH by overriding SSH; 2) The user may supply extra flags to ssh by overriding SSH.
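The `: ${SSH:=ssh}` idiom above works because `${VAR:=default}` assigns the default only when the variable is unset or empty; a quick sketch (the variable values are illustrative):

```shell
# ${VAR:=default} assigns "default" only if VAR is unset or empty,
# so a caller can override it from the environment before running.
unset SSH
: "${SSH:=ssh}"
default_val="$SSH"        # falls back to plain "ssh"

SSH="ssh -p 2222"
: "${SSH:=ssh}"
override_val="$SSH"       # the caller's value wins
```

The leading `:` is the no-op command; it exists only so the expansion (and its side-effecting assignment) gets evaluated.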

That is for the traditional shell part, which is still true in elvish (although due to stricter word splitting semantics use case 2 is different in elvish). Also since in elvish closures are first-class values, it's very intuitive to just call them directly:

    var $f = {|$x $who| echo "Hello, "$who"!" }
    $f = world # outputs "Hello, world!"
From the design perspective, it is possible to let `$echo = $ip` stand for assignment and still retain the ability to use variables as commands. If you give special meaning to "=" when it's the second word and alone and introduce a "call" command:

    var $f = {|$x $who| echo "Hello, "$who"!" }
    call $f = world # outputs "Hello, world!"
    $f = world      # assigns "world" to $f
But this introduces quite some ugliness to the language, and I decided that just requiring assignments to use "set" is the best solution.

A relevant note: Coming up with syntax for a shell language is actually very difficult due to the existence of bare words, which greatly limits your inventory of potential operators. For instance, one would very likely expect "echo user@example.com" to just echo "user@example.com", so if you give special semantics to "@" or "." it causes confusion and inconvenience. The only safe place you can introduce new semantics is the command, and this forces the language to have a prefix structure. If this reminds you of Lisp, you are correct - due to the lack of infix operators Lisp has very few restrictions on variable names; but the situation in shell languages is the opposite: due to the liberal use of bare words it is very difficult to introduce new infix operators to a shell language.


I agree with all that, and you're right about the syntax being difficult.

My one caveat: you can have a little more freedom if you're willing to give up some of the patterns of existing shells, at the cost of unfamiliarity. Of course, you also need to be as expressive as shell so far as possible.

Anyway, "set" isn't a high cost to bear. I don't like it, but that's personal preference, and it clearly helps with the syntax of what you're doing.


I think a decent solution is to have a shell alternative that defines interfaces for known programs (much like autocomplete scripts do now).

I know ls returns a list of files, so I should be able to use that. I don't know foo, so it's basically string -> string or whatever, but if an enterprising spirit does know about foo, he could write an abstraction layer for it.

The trick is making a simple interface for that


Yes, one thing that can be done (in future) with elvish is writing wrappers for external commands that run the commands and convert their (bytes) output to strongly typed values.

For instance, `ls` outputs a bunch of lines where each line is supposedly a single file name, but this breaks when some file name contains a `\n`. It is possible to use `ls -b` to escape special characters, but now you have to un-escape the filenames when you pass them to other commands. With elvish it is possible to write a wrapper around `ls` that actually outputs a list (yes there are lists in elvish) of strings and each member of the list can be passed around without un-escaping.


Also there's a question about whether you can properly deal with option hell. Parsing every possible output of ls reliably in the face of malicious filenames sounds...fun.

Edit: It's really impossible to avoid edge cases. Take find: you can't parse it, because it's just a list of filenames separated by \n. But filenames can contain just about any character. How do you handle /home/bar\n/tmp?

Maybe you just ignore pathological input, but now you're regressing towards the state of bash.
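The `/home/bar\n/tmp` case above is worth seeing concretely; a minimal sketch (hypothetical filename, on a filesystem that allows newlines in names) showing line-based parsing miscounting while NUL-delimited output stays unambiguous:

```shell
# One file whose name contains a newline fools any line-based parse of
# find's output, while -print0 emits exactly one NUL-terminated record.
dir=$(mktemp -d)
touch "$dir/"$'bar\ntmp'   # a single file named "bar<newline>tmp"

lines=$(find "$dir" -type f | wc -l)                          # 2 lines, 1 file
records=$(find "$dir" -type f -print0 | tr -cd '\0' | wc -c)  # 1 record
```

NUL is the one byte a Unix filename can never contain, which is why `-print0` sidesteps the pathological cases entirely.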


Option hell is indeed a problem.

The problem with `find` happens to have a solution (-print0). However, it is a PITA to deal with \0-separated strings in traditional shells, unless you pipe it to another command that happens to recognize \0-separated strings.

With elvish you can parse the \0-separated strings output by `find ... -print0` into a genuine list - not lines (which are \n-separated strings) or \0-separated strings, but a real list that supports indexing, iteration, etc., and there is absolutely no chance that two consecutive items will run together or one item will be treated as two. Imagine how fantastic it is to deal with that :)
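For comparison, the closest bash gets to that "real list" is collecting the NUL-delimited output into an array with a `read -d ''` loop; a sketch (hypothetical filenames):

```shell
# Bash can collect NUL-delimited find output into an array; each
# element survives intact no matter what bytes the name contains.
dir=$(mktemp -d)
touch "$dir/plain" "$dir/with space"

files=()
while IFS= read -r -d '' f; do
  files+=("$f")
done < <(find "$dir" -type f -print0)
echo "${#files[@]}"   # 2
```

It works, but the ceremony (IFS=, -r, -d '', process substitution) illustrates exactly the PITA the parent is describing.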


Well, there's plenty of legacy code lying around that may be easier to improve and fix than to completely rewrite. In fact, I just presented this tool to my co-workers as a suggestion for cleaning up over 150KLOC of shell scripts we have lying around.


It'd be interesting to run this checker on the default scripts that come with major UNIX OSes like MacOSX and Ubuntu and Fedora and the like - sounds like a great janitorial project...



I've been using this and it's pretty nice. See a real world example in this pull request:

https://github.com/Gabriel439/Haskell-Turtle-Library/commit/...


The Xerox PARC and ETHZ answer to that would be REPL instead of shell.


A shell is a REPL. The problem is that the shell is sloppy in that it only deals with bytes and cannot process any complex data structure.

Again, advertisement for my side project https://github.com/elves/elvish, a Unix shell with true data structures. Still a WIP though.


There is already a well-developed shell with rich data structures and a fairly reasonable programming language: Microsoft's PowerShell. Sadly it is not a Unix shell. You're probably aware of it, but if not, check it out for design inspiration.


Of course! PowerShell definitely has a lot of brilliant ideas. Sadly it is overengineered and has quite a few design mistakes. Nevertheless it has served as a great source of inspiration for me - I had actually gone through several PowerShell manuals before I started elvish.


As someone who loves PowerShell and uses it daily, may I ask for specifics for design mistakes and overengineering? You may also answer per mail if you want.

Don't get me wrong, I realise it has its flaws and warts, but for me, and comparing to cmd or bash I still think it's very, very much an improvement.

Off the top of my head actual mistakes (the sort that tends to bite many people) include handling of [ and ] in -Path arguments (necessitating -LiteralPath arguments in later versions), and the constant wondering whether something returns a scalar or an array (and an array of one element being unwrapped into a scalar automatically). During my time working on Pash I also noted a few weirdnesses on source code side, most recently and notably LanguagePrimitives.Convert which has a dependency on the currently-executing runspace (which is stored in a thread-local field).


> A shell is a REPL.

Only when it allows the same expressive power over the OS as Lisp Machines, Interlisp-D, Cedar, Oberon have over the running environment.

The only mainstream modern shell that approaches that is Powershell.

Edit: forgot to say good luck for your project


You don't need support for complex data types for a shell to be described as a "REPL".

A REPL is just a Turing-complete real-time interpreter. Which means even the VBA "Immediate" panel (as seen in MS Office) is a REPL. And it means Bash is a REPL too.

The question you're raising is whether all REPLs are equal. Lisp machines definitely had more control over the host than VBA does. But that doesn't mean VBA's Immediate panel isn't a REPL just because a more powerful example exists.

As for Bash, that's a bit of a weird one because Bash wouldn't be much without the accompanying GNU / POSIX userland. But if you're willing to include a UNIX / Linux userland into scope then Bash has just as much control over the host as Lisp did on Lisp machines. But even without the aid of forking additional executables, Bash can still modify the state of the kernel directly. eg

    echo 0 > /proc/sys/vm/swappiness
    echo 3 > /proc/sys/vm/drop_caches
(For those who may not have been aware, echo is a built in command in Bash)


You are missing the part about manipulating other applications or controlling GUI elements, like PowerShell kind of allows via DLL interop and OLE Automation.

As for /proc/sys, not all UNIXes have such features.

In Oberon I could pipe selected text from any application into any command that had a GUI aware type signature, for example.


> You are missing the part about manipulating other applications or controlling GUI elements, like PowerShell kind of allows via DLL interop and OLE Automation.

There are lots of command line hooks for GUIs. Want to copy data to the clipboard from the command line? xclip. Want to pop up a notification in your desktop environment's notification bar? notify-send "hello world!" etc

> As for /proc/sys, not all UNIXes have such features.

That example of mine was clearly taken from Linux - so it goes without saying that most UNIXes would behave differently in that specific regard. Even so, they'd still have command line tools for doing the same thing (and to be fair, Linux does too, even with a vaguely Plan 9-ish virtual file system).

> In Oberon I could pipe selected text from any application into any command that had a GUI aware type signature, for example.

Well, like I said, I'm not trying to say that all REPLs are equal, but most of what you're describing is still possible in at least Linux. I'm not saying it's as intuitive or "pretty" as it would have been on Oberon, but it's certainly possible.

To be quite honest, most of what you've been posting on this topic has really just been elitism. And I do actually sympathise with your point as working in Bash can be a complete hateful mess at times (even without comparing it to the old Lisp machines). But that doesn't change the fact that Bash is a REPL environment.


> it only deals with bytes and cannot process any complex data structure.

Just FYI. Microsoft tried to address that problem in Windows a long time ago when it introduced Powershell.


Does PowerShell provide a 'datatyping facility' for the contents of files? Or is it just values returned from PowerShell-defined objects or methods?


Contents of files are, depending on how you read them, either a byte[], a string, or a list of strings (lines). You can run them through parsers for JSON, XML, CSV or whatever else is handy to get actual objects. In the CSV case there are even cmdlets that work directly with files (Import-Csv, Export-Csv); for XML I usually use [xml](gc file), and more generally there are the ConvertFrom-* and ConvertTo-* cmdlets, e.g. for JSON and CSV.


See my reply to omaranto.


As someone who regularly answers batch file questions on Stack Overflow, I think this would be invaluable for all the mistakes people make there too.


This is awesome. As a beginner when it comes to writing shellscripts, this is my new jshint equivalent.



