Note to author: the part with actual shell examples is buried too deep. I didn't think it was capable of actual shell things (it seemed more like a scripting language) until I came across the 'Shell Capabilities' chapter. It seems like an obvious title, but I completely glossed over it while browsing until I did a keyword search for 'redirection', and I imagine many people will do the same.
Consider putting everything on a single page like oil shell does [http://www.oilshell.org/release/latest/doc/oil-language-tour...], so that simply scrolling down will show the shell examples without having to manually click click click the next button.
I agree. For a language which intends to replace shell scripts, I would first show that the syntax is pretty much the same in Hush. I would put this page first: https://hush-shell.github.io/cmd/index.html
To add to this, an example using pipes would be great too. I assume it's done the normal Bash way within command blocks. But I have no idea whether it's supported; after all, the point is to use a different, more maintainable language.
But otherwise I think it's great: it has all the basics I'd expect from a modern language and still seems simple enough for quick scripts. I'll try it for the better syntax alone, and for the (IMHO neat) idea of returning stdout and stderr separately in a dict.
> Traditional shell scripting languages are notoriously limited<snip>
I feel like people looking to replace shells and shell languages need to really think deep and hard about this if it's something they believe. Shell scripts are really anything but limited, and in fact most replacements are more limited (either by design or by accident), often imposing awkward control flow on you or making things that should be simple much much more complicated.
Error prone? Sure. Difficult to maintain? Yep, often. But when the thing you're doing is working with external processes and hooking them together into some kind of routine, the reason shell scripts have endured is precisely because they are extremely good at that, and provide immense amounts of flexibility and power to how you go about it in a concise way.
I couldn't agree with this statement more. As someone who has developed their own shell over the last decade I've been keeping a close eye on what other people have been building too. So often I see people writing shells that are inspired by programming languages so their syntax looks amazing in documents. But they always strike me as being hugely tedious for repetitive and often quite dull tasks. The kind of 5 minute jobs that you drop into an interactive REPL to solve an immediate problem. And then maybe save as a shell script later if you find you're doing that task frequently.
People look at the problems with error handling and maintainability (which, as you highlighted yourself, are very real) but they end up throwing the baby out with the bath water when trying to solve those problems.
Ultimately it doesn't matter how good a shell is for scripting if it still creates more trouble as an interactive shell compared to (for example) Bash. I say that because we already have Python, Ruby, Perl, node.js, Lua, and a plethora of other languages that can be used for scripting. We don't need more scripting languages. We need better shells.
Obviously. I don't think this is the most important missing part, though. I would say it differently: we need way, way better REPLs.
IPython is an example of a REPL that's passable as a shell. It can run in a terminal and has a GUI version based on Qt, which allows displaying images inline. You can drop into a "real" shell with a single `!` character (you get pipes, output capture, and (Python) variable interpolation), and it even has some syntactic shortcuts for the parts where Python's own syntax is irritatingly verbose. If you like Python, then IPython can be your day-to-day shell right now. You just need to remember not to start ncurses programs from within qtconsole (works ok in terminal). I used it for a few years when I was forced to work on Windows. Before my time, I heard it was popular to use tclsh as a shell on Windows.
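For anyone who hasn't tried it, here's roughly the shape of an IPython session (a sketch, not a real transcript; the file names are made up). The `!` prefix shells out, `{}` interpolates Python values into the command, and captured output comes back as a list-like object with helpers such as .grep():

    pattern = '*.log'
    files = !ls {pattern}     # run a shell command, capture stdout as a list of lines
    files.grep('nginx')       # filter the captured lines with Python-side helpers
    !wc -l {files[0]}         # feed a Python value back into a shell command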
I think that it proves that almost any language can be used as a shell, as long as its REPL is rich and featureful enough. Since you can use Python as a shell, which as a language is not exactly the epitome of terseness and expressiveness, you could definitely make do with almost any other interpreted language, too. The problem is that very, very few languages have REPLs that are anywhere near IPython. It's so bad sometimes that you're advised to use `rlwrap` just to get basic line editing and history!
I've been working on a new shell based on GNU Smalltalk[1]. I really like the syntax - or lack thereof - and being able to dump an image (ie. full state of the VM) at any time (and load it later) seemed like a good idea. The only change I needed was to add the `|>` pseudo-operator, which puts what's on the left into parens. Being able to introspect the running session was my primary motivation: I wanted to make the shell and the whole environment as discoverable as possible. I wrote some code for that and then realized that the default REPL uses readline from C, so it freezes the entire VM when waiting for input (including all background threads). My workaround was to set up a socket server and connect to it via rlwrapped telnet...
Anyway, I think "do we need a new shell" is the wrong question; instead, we should focus on improving REPLs to the point where a separate shell becomes unnecessary.
The REPL is the shell. I even used the term "REPL" (and a couple of synonyms too) in my comment. So I do agree it's critical, but that doesn't make the language irrelevant. Your point about how Python shells have had to create syntactic sugar for REPL usage is a good illustration of my point about how it matters a lot.
Also you can render images in quite a few terminal emulators already. Some shells (mine included) ship with hooks to autodetect which terminal emulator you're using and find the best method for rendering those images. eg https://github.com/lmorg/murex/blob/master/config/defaults/p...
There's definitely room for improvement in the whole TTY / shell / terminal emulator integration space though. But that's not going to happen without breaking support for existing CLI tools. Which basically means it's never going to happen given it wouldn't ever gain traction.
> Your point about how Python shells have had to create syntactic sugar for REPL usage is a good illustration of my point about how it matters a lot.
Of course it matters. The fact that Python is hardly suited for one-liners means that, if you want to write one-liners in it, you'll have to first extend or change its syntax and stdlib, pouring time and effort into creating a DSL for the interactive use-case. Some languages basically are DSLs for one-liners, like AWK, jq, TCL, Perl and Raku; some require minimal effort, like Lisp, Smalltalk, Ruby or Groovy; and some require a lot of work to make the syntax you want work, like Python, Lua, or JavaScript (or Java). So yes, the choice of language definitely matters, but in principle every language with a REPL can be made into a shell with more or less effort.
The problem is that even when you do make that effort, you're stuck with a shell that's nice for scripting, but with the line-editing capabilities of raw readline at best. I would like to see a framework for creating rich REPLs that would be language agnostic, so that I could get a state of the art auto-completion dialog no matter which language I decided to make into a shell.
> Some shells (mine included) ship with hooks to autodetect which terminal emulator you're using and find the best method for rendering those images.
That's an interesting bit of functionality, I will take a look. Thanks!
> I would like to see a framework for creating rich REPLs that would be language agnostic, so that I could get a state of the art auto-completion dialog no matter which language I decided to make into a shell.
It's doable with existing tools. You have LSP to provide the syntactical framework and there's no shortage of alternatives to readline (I'd written my own[1] to use in murex[2], and open sourced that).
The problem you still face is that a good shell will offer autocompletion suggestions for strings that aren't language keywords or function names. eg
- file names; and there's a lot of hidden logic in how to do this. Do you build in fzf-like support, include fzf wholesale and increase your dependency tree, or go for basic path completion? Do you check metadata (eg hidden files and system files on Windows), include dot-prefixed files on Linux / UNIX, etc.? How do you know when to return paths, or paths and files, or when not to return disk items at all? (see next point)
- flags for existing CLI tools (assuming you want compatibility with existing tools). Fish and murex will parse man pages to populate suggestions; others rely entirely on the community to write autocompletion scripts.
- Are you including variables in your completion of strings? And if so, are you reading the variables to spot whether the value is a path and then following that path? eg `cd $HOME/[tab]` should return items inside your home directory even though you've not actually typed your home directory as a string. That means the shell needs to expand the variables to see if the result is a valid path. But that's a shell decision rather than a language feature (a rough sketch of that expansion step is below).
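To make that last point concrete, here is a rough sketch in Python (not murex's actual implementation, just the idea) of the "expand the variables, then decide whether to offer path completion" step:

    import os

    def complete_path(prefix):
        # e.g. 'cd $HOME/Do<tab>': expand '$HOME/Do' to '/home/me/Do' before treating it as a path
        expanded = os.path.expandvars(prefix)
        dirname, partial = os.path.split(expanded)
        if not os.path.isdir(dirname or '.'):
            return []  # not a valid path after expansion: fall back to other completers
        return sorted(entry for entry in os.listdir(dirname or '.')
                      if entry.startswith(partial))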
Some of these lists might take a while to populate, so you then have another problem. Do you delay the autocompletion list (bad UX because it slows the user down) or provide the autocompletion sooner? And if the latter, how do you do that without:
1. changing the items underneath what the user is about to select, causing them to accidentally select the wrong option
2. communicating clearly that there are updates
3. ensuring the UI stays consistent when slower-loading entries might not fit the same dimensions as the space allocated for the list (if you dynamically size your completions to fit the screen real estate)
4. ensuring that there's still something present while you're lazy-loading the rest of the suggestions, and that those early entries in the completion list are worthwhile and accurate
5. and what about sorting the list? Alphabetical? By feature? etc.
The REPL in murex was inspired by IDEs so I've spent a lot of time trying to consider how to provide the best UX around autocompletion. One thing I've learnt is that it's a lot harder to get right than it seems on the surface.
> Also you can render images in quite a few terminal emulators already. Some shells (mine included) ship with hooks to autodetect which terminal emulator you're using and find the best method for rendering those images.
Same word, different concepts. In Smalltalk an 'image' is a memory dump of the VM state, suitable for reloading and resuming, and it typically includes a complete development environment.
TBH I used the word "image" in both contexts: once as a synonym for picture, and the other time meaning a Smalltalk image. I think GP was responding to the former use in the quoted text.
But, it's actually worth highlighting: GNU Smalltalk is image-based, but has NO GUI by default. So while the image does contain the state of your whole program at the point of dumping it, it does not include the whole IDE together with it. GNU Smalltalk is such a wonder, it's so sad it's not being actively developed :(
> GNU Smalltalk is image-based, but has NO GUI by default. […]it does not include the whole IDE together with it. […] it's so sad it's not being actively developed :(
I suspect that, given the general emphasis in the Smalltalk community on images that are "batteries included", there is a cause-and-effect relationship between those two facts.
That model (image with batteries included) of Smalltalk development does not work in reality in an open source setting. It still works if there's a company behind the implementation, but with typical open source development, where everyone goes on to write things they are interested in at the moment, without looking back on the rest of the ecosystem all that much, this model breaks. It's like maintaining a Linux distro, but with a few orders of magnitude less people and with tooling that never reaches any kind of stability (because it's replaced before it has the chance).
Distributions in the vein of Cuis and GST - small, with carefully curated and selected content - are the better way forward, unless you're willing to fork out some $1k for a commercial license for VisualWorks or ST/X.
GST is also a Smalltalk that can be embedded in other applications. Try doing that with Pharo.
The lack of GUI by default - it exists as a loadable module, BTW - was not the thing that derailed GST development. It was a small, niche project and when some of the contributors lost interest in it, it stagnated and, after a while, fell behind other implementations. If only the JIT in GST was finished, and a few compatibility layers were added on top of its "Smalltalk in Smalltalk", GST could well resurrect. Not that it's going to happen...
In the meantime, if you want to play with Smalltalk, I'd suggest using either the open-source ST/X or VW personal use license. Morphic is nice to experience once - use Cuis or Squeak for that.
(I'm writing this comment right after the rant about Pharo so I'm probably repeating myself here, but wanted to note that Pharo is the perfect example of how badly the traditional Smalltalk development model fits the modern reality. It simply does not, at all.)
Whelp, didn't make it in time and I can't edit anymore... Here's what I wanted to write:
I don't want to talk about Pharo. Please, don't make me talk about Pharo at length. Even a short summary will be painful... But I'll do it anyway if only to have something to link to in the future.
In short: Pharo is the worst Smalltalk out there, with a community that is small, elitist, and entirely unconcerned with user experience. I once engaged with the community on Discord to say that they could maybe care a tiny little bit more about backward compatibility. I learned that "they do care a lot" and - oh wonder! - they are introducing a framework (brand new, of course) to help fix compatibility issues (something based on pragmas)! I then said, "look, I just opened an image of Pharo 7, went to package manager, and tried to install PetitParser, only to get debugger window pop up multiple times." Stock image. Stock package manager. It worked perfectly fine in Pharo 5. Doesn't work at all now. The response? "Well, that package manager is old and was abandoned," and they are currently (again) rolling out an entirely new way of doing things as fundamental as installing packages. The new tool was hard to find, buried five feet under in menus; the broken, old, abandoned project was left where it was, without a single comment about the deprecation. From my perspective - someone returning to Pharo after a few years apart - something that used to work quietly stopped working. I then heard that I should follow the development closely, and anyway, that's not an example of breaking backward compatibility (in user experience) - it's just... something else, and it "just happens" sometimes. I was then chastised for daring to suggest that a mouse is not, in fact, a requirement for having an interactive, introspective, personal system (which is obvious - you can run Emacs in a terminal, and you'll get the exact thing). I left afterward, not wanting any more of the "support" from the community...
In a tiny world of Smalltalk, Pharo rose to prominence because it was free, and it was used by a few flagship projects as a reference platform (while they were being developed - most of the projects had already died - usually shortly after authors got their PhDs). Unfortunately, these were just accidents of history, yet Pharo's community somehow took them for granted and became even more elitist than Smug Lisp Weenies ever were.
The problem with the user experience was mirrored in the developer experience. Over the years, the churn in frameworks was considerably worse in the Pharo ecosystem than even in the JavaScript world. In such a small ecosystem, and over just a few years, most core frameworks were replaced, sometimes more than once, with the "next versions," "new incarnations," and "reboot projects." Due to Pharo not supporting namespaces, keeping multiple kinds or major versions of a framework in the same image is problematic and can lead to bugs. Without doing that, though, you lose all compatibility with tools, utilities, and libraries written for the older versions. In JavaScript, the number of active programmers lets you expect someone to update and fix those tools so that you don't have to. In a tiny ecosystem of Pharo, you can't expect that - the original author most likely moved on years ago, plus the collaboration tools in Smalltalk traditionally sucked a lot. If you depend on something, you should expect to be the one to maintain it - against a collection of ever-changing APIs all around you. In other words, the Pharo community worked hard to dump everyone back into the "dependency hell" period of the early Linux.
Most of that came from the misguided efforts of the community, but some problems ran deeper than that. Take Morphic as an example. It's a nice - if a little simplistic - UI library, and the reactivity of all the UI elements under it is eye-opening. At the same time, it's relatively inflexible (most suited to animating spinning rectangles) and definitely shows its age. This led to attempt after attempt to extend it with features people expect from modern UIs. Those attempts were hard to implement, hard to use, and mostly unsuccessful - all due to the underlying model's limitations. The user experience suffers because of them, too. For example, I - and probably 99% of other users - want my windows to look like all the other windows on my desktop. Instead, all Pharo windows live in a single OS-level window. Another example: I like how my (tiling) window manager does things and would like Pharo's windows to also be managed by it. No such luck. Problems that have long been solved elsewhere are being solved again in Pharo, in a Pharo-specific way. Scaling an interface so that the labels still fit in their containers when I increase the font size is something that Pharo still has trouble with. It took ages to get 4k support. Customizing the key bindings was done ad hoc in Morphs - last time I checked, there was a new framework in the works that would fix that. That was in 2018.
This case might not be purely due to Morphic; there's a pinch of elitism hidden here: because real Smalltalkers use the mouse! Mouse and mouse only! Using a keyboard is what the guys writing Java do, so it's a "Bad Thing" to do. All the poor sheeple, trying to hold to their home rows: Smalltalk (meaning: just Pharo, there's no other spoon) exists to liberate the ignorant you from the tyranny of keyboards!
I'd write more, but unfortunately, after Pharo 7, the installer broke for me on Linux, and I haven't been able to run it since. The 4th incarnation of the FFI framework (which itself needs an obsolete version of libffi) is used to load a libgit so old that I can't get it without compiling it from source. Neither FFI nor Git should be critical to the whole system, but the image simply fails to run when libgit is not loaded correctly. That, in turn, is because the 5th incarnation of the source code version management tool, finally, begrudgingly, switched from custom code to Git. That's good in itself, but you should either bundle what's needed with the VM or make the tool optional. But no - for all the talk of caring about backward compatibility, Pharo moves fast and breaks things with impunity. It doesn't matter that users will lose interest in both Pharo and Smalltalk at the same time when faced with such an experience.
The funniest thing in all this is that you don't even need Morphic to directly interact with widgets in a system. Multiple other Smalltalks demonstrate that it's possible to seamlessly inspect and modify widgets during runtime in a system running in OS-level windows and using native widgets.
If you want to experience Smalltalk properly, get a personal copy of VisualWorks from Cincom. On Linux, you can also use Smalltalk/X (the "jv branch") - it's still being developed, albeit slower. Both implementations can effortlessly run Smalltalk code written in the 90s. Pharo has problems (but the Next Great Framework, when done, will help with that!) running code from 2 years ago. Both, of course, provide headless modes of operation and had been doing so literally decades before Pharo. Even if you want to witness Morphic and classical Smalltalk experience, don't use Pharo - go for Cuis Smalltalk or Squeak instead.
Obsessed with the new and shiny, with things bolted on top of utterly inadequate infrastructure, rediscovering problem domains that the rest of the world has already solved, Pharo is a trap that is not, and likely never will be, user friendly and stable. On the other hand, it is capable - if you are lucky enough for it to work for you.
TL;DR: There's way more to Smalltalk than just Pharo, and I would recommend basically any other implementation to people wanting to learn it. VisualWorks, Smalltalk/X, Squeak, and Cuis are all excellent choices, depending on your needs. I learned Smalltalk with Pharo, starting with version 1.4. There was a brief period when everything worked, and a lot of effort was put into documentation, most notably the two Pharo books and some follow-ups. After that, everything went downhill, and for a long time, I thought that's just how Smalltalk development is in general. Seeing and working with other implementations showed me a very different picture, and it caused me to lose all faith in the Pharo project.
Cuis is a really interesting incarnation. It's quite stripped down, and Juan, the creator, has done a lot of work to simplify things as much as possible. He has even stripped down Morphic and has an early working version of its "successor," which is vector based.
The dual problem with the open source Smalltalks is that there aren't enough people working on them, and the community that is working on them is fractured.
> He has even stripped down Morphic and has an early working version of its "successor," which is vector based.
That's very interesting, actually, I didn't know, thanks. Morphic is not a bad idea in itself - the problem is that its implementations are ~30 years behind the times, and slapping framework after framework on it won't make the pig fly. But, rethinking and modernizing the ideas behind the base implementation might be a way to make all those frameworks work properly, finally. I will pay attention to Cuis more.
> The dual problem with the open source Smalltalks is that there aren't enough people working on them, and the community that is working on them is fractured.
No, the problem is that the open source Smalltalk community comes from academia mostly, plus some hobbyists, and they are by and large simply poor software engineers. Smalltalk is capable enough to allow them to get what they want more-or-less working, and quickly, but the code is a mess. The authors then go on to write their dissertations or publish some papers, which generates a bit of interest in the project. But then two things happen, almost every time: a) the author gets the promotion or title they wanted and leaves the project; and b) the people who got interested in the project are met with a steaming pile of garbage code that modifies half of the Kernel category for no reason at all, has methods with 100+ lines of code, has complex hierarchies that were abandoned halfway but left in the code, has no or almost no comments, uses incomprehensible abbreviations for identifiers, and so on. Coupled with Pharo's "move fast, break things", this results in the project stagnating and then quickly ceasing to work altogether, other than with the original image (if you're lucky and the author actually posted the image).
Diving into the world of commercial Smalltalks showed me that it's actually possible to write solid, extensible, well documented, tested, and stable Smalltalk code. Later, looking at Cuis showed me that it is in fact possible to do the same with an open source implementation, but you need strong leadership, a vision, and skills that most of the Pharo community is simply not interested in having.
Note: this is the first time ever that I've knowingly, willingly, and somewhat brutally attacked or complained about a particular community. I'm really tolerant. I can deal with a lot. But half of the Discord channel shouting at me and calling me names like "masochist" just because I want Ctrl+A and Ctrl+E to work as Home and End in the editor? That was too much, even for me.
I love so much about Smalltalk but also felt many frustrations with implementations and community issues.
A couple of my own rants on that include this one from 2007:
"[Edu-sig] Comments on Kay's Reinvention of Programming proposal (was Re: More Pipeline News)"
https://mail.python.org/pipermail/edu-sig/2007-March/007822....
"There is once again the common criticism leveled at Smalltalk of being
too self-contained. Compare this proposal with one that suggested making
tools that could be used like telescope or a microscope for relating to
code packages in other languages -- to use them as best possible on
their own terms (or perhaps virtualized internally). Consider how the
proposal suggests scripting all the way down -- yet how are the
scripting tools built in Squeak? Certainly not with the scripting
language. And consider there are always barriers to any system -- where
you hit the OS, or CPU microcode, or proprietary hardware
specifications, or even unknowns in quantum physics, and so on. :-) So
every system has limits. But by pretending this one will not, this
project may miss out on the whole issue of interfacing to systems beyond
those limits in a coherent way. For example -- OpenGL works, and
represents an API that has matured over a quarter century, and is linked
with lots of hardware and software implementations; why not build on it
instead of "bitblt"?. Or consider how Jython can use Java Swing
libraries and other such Java libraries easily. Consider this example of
proposed failure: the proposal's decision to punt on printing -- no
focus there in difficulties of interfacing with a diversity of OS
services as a guest -- of relating to prior art. This is where Python
often shines as a system (language plus libraries plus community) -- and
Python is based on a different design philosophy or perhaps different
design ethic. Python has also prioritized "modularity" from the
beginning, which has made a lot of these issues more manageable; Kay's
proposal talks a lot about internal integration, but the word
"modularity" is not even in the document. In this sense, I think Kay's
group are repeating mistakes of the past and also dodging another very
difficult issue -- one that Python somehow has more-or-less gotten right
in its own way. It's their right to do their own thing; just pointing
out a limitation here."
And later from 2010:
"[fonc] On inventing the computing microscope/telescope for the dynamic semantic web"
https://www.mail-archive.com/fonc@vpri.org/msg01445.html
"As I said at the end of my second post linked above: "It's taken a while for me to see this, but, with JavaScript, essentially each web page can be seen like a Smalltalk ObjectMemory (or text-based image like PataPata writes out). While I work towards using the Pointrel System to add triples in a declarative way, in practice, the web of calling cgi scripts at URLs is a lot like message passing (just more like the earlier Smalltalk-72 way without well-defined syntax). So, essentially, a web of HTML pages with JavaScript and CGI on servers is like the Smalltalk system written large. :-) Just in a very ad hoc and inelegant way. :-)""
Scripting without a shell is just scripting. The "shell" is the UI component. Shells don't even need to be CLI based either (eg explorer.exe, web shells, etc).
I want a shell language that wonderfully encompasses both interactive usage and the creation of easy, safe, effective shell scripts.
I've come to the conclusion I can want that all I want, but it's not possible. Shell scripts and interactive usage have too many fundamentally opposing forces in them for one language to bridge the gap. You can be great at one or the other, and you can be bad at both, but you can't be great at both. There simply isn't such a thing, even in theory.
The closest you could get is a language that has two very closely related dialects, but the "dialects" aren't going to be "the same thing, but with a couple of little changes"; there are going to be a lot of differences, enough that people are going to be arguing seriously and with some merit that it's just two languages tied together at the hip.
My latest discovery - Raku - does try, and with not bad results. Also, Scala and Ammonite work quite well. Clojure and Babashka, too.
> There simply isn't such a thing, even in theory.
That's a very strong claim. What makes you think it's impossible to design such a language? The fact that none exists currently doesn't mean that it can't exist.
> arguing seriously and with some merit that it's just two languages tied together at the hip.
Just use Racket :-) It's a bundle of tens of languages anyway, so nobody will complain if you add one more. https://rash-lang.org does this. Due to how Racket works, you can write scripts for that shell in anything from Scheme to Typed Racket to Datalog to Brainf*ck (and more, obviously).
"What makes you think it's impossible to design such a language?"
1. An interactive shell bases its error handling on the human being right there to handle the error. A programming language bases its error handling on a human not being right there to handle it. This is a fundamental difference, and a language must privilege one or the other. Shell languages currently privilege the first, which is a non-trivial element of why they're dangerous to program in.
2. An interactive shell prioritizes keystrokes, because a human is right there in the moment tapping the keyboard trying to do something. A programming language won't necessarily ignore this, but it's definitely down the list of priorities.
3. An interactive shell assumes the state the user has in their head comes from the live session state and the user's experience. A programming language assumes that state comes from the parameters to the functions and the rest of the language environment. "cd" is really quite dangerous in shell scripts, especially once you start feeding it variables, but generally safe in interactive use because you see where you are (unless your PS1 is too bare, but you should fix that). Yes, obviously, we've all still messed that up at some point, but I don't think it's necessarily above the baseline for any other human messup.
4. An interactive shell expects you to base your understanding of what's going to happen on the contents of the line at the moment you hit enter. Again, yes, there are human messups, but that's the model. You don't wrap a whole bunch of defensive programming around every command line you type; you just use your eyes and your own understanding. A programming language does expect defensive programming. Defaults that make sense in one case are incredibly annoying or dangerous in the other. In my shell scripts, I want an empty (or perhaps "undefined") variable to be an error. In interactive use, I might like a button to expand them all in place so I can see what I'm actually running, but I don't want it to necessarily be an error. And that's just one example of the sort of differences I'd expect. Another is string handling; there is a reason that almost all programming languages require all strings to be delimited with quotes (and the ones that don't generally consider it a mistake), and almost all shells do not. I wouldn't be surprised if this criterion alone could be used as a razor to separate the two categories with very high accuracy.
I suspect that were I to take the time to examine the 6 things you cite as putatively both good scripting languages and good shell languages, I could tell you which category I put them in quite easily and why I wouldn't consider using them for the other thing, but that's rather more than I can take on for an HN post.
The weakness of the shell is that it did not emerge as an LR-parsed language that could easily be expressed with a concise yacc grammar. The "one true awk," for example, bundled a yacc grammar as part of the build until relatively recently. The complex grammar of the POSIX shell is ambiguous in the extreme in many situations.
The shell language was further constrained by POSIX, which removed much Korn shell functionality even though a public domain implementation of Korn existed. This was done to allow the shell parser to be very small, and Debian's Almquist shell compiles to under 100k on i386 (unlike bash, which also trails it in speed).
This new Hush shell looks a bit too wordy and reminiscent of javascript.
Anyone designing a new shell grammar should also design in such a way that Busybox could bundle this new interpreter, should popularity justify it.
While being extremely small is a worthy goal, I suppose the aim of Hush is to make writing larger shell scripts easier and less error-prone. It's more for the niche of Perl of old than of minimal shells like ash.
For a very limited device, a very limited shell like that in Busybox is sufficient, because it likely does not need large shell scripts, or a lot of interactive work.
Looking at [1], current Hush is under 700k, which is still way smaller than Python or Perl, with much of its expressiveness.
The whole of Busybox is a megabyte. If Hush is 700k, then there are many, many places that it cannot go. As such, it cannot be a standard default shell.
Use on smaller embedded platforms will not be possible. That is the trade.
The requirement, as set by POSIX, was to run the shell in Xenix on an 80286, which only supported a 64k text segment.
There are few tools that can meet this requirement, and the limit is as relevant now as the day it was first published. Any aspiring shell must run within this limit to fully pervade POSIX. Ignoring this requirement greatly reduces applicable scope.
Korn '88 could do it, but the code was not pretty.
Any POSIX shell replacement has to hit that target.
"Indeed, in 2014 -- 42 years after Intel’s rollout of the 8008 -- the godfather of microcontrollers accounted for 39.7% of overall sales revenues in the MCU market... That’s more than 32-bit devices (38.5%) and 16-bitters (21.8%), which grabbed most of the headlines."
1. How many people connect to shells on those microcontrollers?
2. How many people actually develop for those microcontrollers?
The number of devices doesn't matter. Exaggerating a bit: if you have 100 folks developing for 50 billion devices and 10 million developing for 10 billion devices, we will always optimize for the development experience of the 10 million developers, not the 100.
I'm quite convinced (based on job board statistics) that embedded developers are a minority.
And that's not even counting sysadmins, devops, testers, etc, in which case the community outside of embedded completely dwarfs the embedded community.
Agreed. The problem with bash & Co. (if that's a problem) is that it's the result of a process of adding clunk after clunk of stuff in the only few ways that wouldn't break the language. So we have things like i=$(( i + 1 )) and [ $? -eq 0 ] and [ "$?" = "0" ]. Add to that the differences between "$@", "$*" and $*, and a number of similar things. Apart from that it's very good at what it does. Hush seems to start from a normal language like Python or Ruby and add the pipelining / redirection stuff, which is usually very verbose and error prone to do in those kinds of languages.
Yes, Python comes with 100% overhead compared to shell. I have to manage packages with either the system or virtual environment tools. Honestly I err toward sh/bash because I write it, ship it, and forget it!
Yes, exactly. I have a few slogans for this, but one is that the shell language should compose with the rest of the operating system, which is "processes and files". It's a situated language.
So I have been making the point that the "narrow waist" of shell is still processes and files. Whereas the narrow waist of Python and Lua is functions and compound data structures like dicts/lists (tables).
One thing that is not quite obvious until you design AND USE a shell language is that if you have these conflicting narrow waists, you have problems of composition, and you will offload this complexity onto your users.
From a very quick look at the hush docs, it is Lua with some shell-like stuff grafted on. I believe that will cause such problems in programs. They will be longer and harder to write. You will have more edge cases.
----
I hit this issue many times in Oil, and I wrote many posts tagged #software-architecture about it:
- Should shells have two tiers?
- Both external processes and internal "functions"?
- Both pipelines of bytes and pipelines of structured data?
- Another one I'm working on now: Both exit codes and Python-like exceptions?
In Oil the answer is yes for the first 2, but a subtle point is that the former are PRIMARY and still the narrow waist; the latter are helpers. Dicts and Lists exist to help you deal with files (and produce JSON and TSV); but they're not the primary structures of the program.
I've also discussed this issue pretty extensively with the authors of the NGS and Elvish shells.
Unsigned vs. signed integers and sync vs. async functions (what color is your function?) are similar problems of composition in language design. I call them "Perlis-Thompson" problems.
There is no easy answer but IMO it's vital to explicitly think about and address these problems. C has a flawed solution to the combinatorial explosion (implicit conversions), and it's caused problems and been lamented for ~50 years. There's also a good quote in that thread about why unsigned sizes in STL were a language design mistake.
Is this really a shell scripting language? Hush isn't an interactive shell, nor does it compile to a common shell script. The only way to run these scripts is to install the hush interpreter and run the script through it.
Isn't that just a normal scripting language? What's the real benefit of using this over Node or Python? I suppose the syntax is more aesthetically similar to shell scripts... but I don't exactly see that as a plus.
System level stuff sucks in Python. Dealing with files, I/O, permissions, etc is a real pain. It easily takes 5x as long and as many loc to do the same thing as in bash.
I can see the benefit of dropping to a command block to, say, run a command and filter the output with some | grep | awk | sort or whatever, and then seamlessly come back up to a more fully featured language to deal with that data.
This is very true, particularly if your script is just an imperative list of commands to run.
Python scripts win when you actually need to handle errors, or non trivial output parsing, or when the concept of “list of things” doesn’t fit neatly into the “lines of text / pipe symbol” paradigm.
I try to think of it like Donkey Kong. Every time you do something like this:
var=$(command | awk -v ID="$ID" '$6 ~ /foo/ && $7 == ID {print $1 " " $2}' | head -3)
…you get hit by a barrel. Three hits and you turn it into a Python module with a __main__.py or an if __name__ == "__main__" at the bottom.
If it’s a truism that most shell scripts are better off written in Python, it’s just as true that all Python scripts are better off as reusable modules (that may also be standalone scripts.)
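In case it's not obvious what that looks like, the skeleton of the "reusable module that is also a standalone script" pattern is just this (a sketch; main() and the argument handling are placeholders):

    import sys

    def main(argv):
        # the real work lives here, importable and testable from other modules
        print('processing %d file(s)' % len(argv))
        return 0

    if __name__ == '__main__':
        sys.exit(main(sys.argv[1:]))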
My experience is different. For me, Python has a good balance of readability, access to system functions, and consistent, predictable interfaces. I'm not saying you're wrong, just that "it depends". I'm pretty adequate at writing BASH, but it's not my "daily driver" programming language, so I still have to try things out, google a lot of things that aren't obvious, and resort to excessive trial and error. But as a Java programmer I also know that Java, with all of its abstractions and verbosity, is a complete non-starter for shell scripting. For me, Python has often been a helpful compromise.
Part of this may also be that I've never learned to do a great job of organizing a BASH shell "project". So those often become unwieldy maintenance nightmares. Plenty of times my peers and I have agreed: when your BASH script exceeds [some low number of] lines, find another language.
Again, it matters a lot what your team's culture and comforts are. If that's BASH for you, then r.e.s.p.e.c.t.
You’re comparing Python and Bash as programming languages, not as shells. Python’s great as a programming language, but it’s a terrible shell. Likewise, Bash is a terrible programming language, but as a shell is pretty good and is probably the most used shell. The point parent was making is that it’s far more difficult in Python to run a process, capture the output, and grep something from it, than it is in Bash. In Bash this is a one liner with extremely minimal syntax. In Python, this is a big multi-line affair invoking several modules and multiple functions. Python isn’t a shell purely because you can’t execute an executable file by typing the bare name of the file. Python also doesn’t pass the cut & paste test - you normally can’t paste snippets from somewhere into a Python repl due to indentation issues, this alone makes it very difficult to use as an interactive shell. And Python is not meant to be an interactive shell, so there isn’t much of a problem.
It might take a few more lines of code to do stuff in a real programming language like Python (I would recommend Deno actually) but at least there's a decent chance it will actually work reliably.
> then seamlessly come back up to a more fully featured language to deal with that data
That's the thing - I don't see how you can do that. The command blocks return nil or error, but not the output, so that you'd have to dump that to a temporary file, scrape the file separately, remember to delete it, and so on. I can make it work, but it doesn't live up to the idea of seamlessness.
The point is that you can do things like create external processes, pipe them together, redirect output to files, with the same ultra-lightweight syntax of bash etc. Compare that to all the nonsense you have to do in Node or Python to pipe two processes together!
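For comparison, a minimal sketch of what "ps aux | grep python" costs in plain Python with subprocess (the standard Popen-chaining idiom), versus one character in a shell:

    import subprocess

    ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE)
    grep = subprocess.Popen(['grep', 'python'], stdin=ps.stdout,
                            stdout=subprocess.PIPE, text=True)
    ps.stdout.close()               # so ps gets SIGPIPE if grep exits early
    output, _ = grep.communicate()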
You’re right in that writing Python using shell paradigms is awful.
Writing Python that shells out to do stuff — but using Python paradigms — isn’t so bad. Particularly if you are used to calving off your shell stuff into shell functions that run something and massage the output. In Python you end up doing something pretty similar: yield the text you want and use that as arguments to the next call to subprocess.run.
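Something along these lines, say (a hypothetical sketch; the git/black commands and the .py filter are just stand-ins):

    import subprocess

    def changed_python_files():
        # run a command, capture stdout, and massage it with Python instead of awk/sed
        result = subprocess.run(['git', 'diff', '--name-only'],
                                capture_output=True, text=True, check=True)
        return [line for line in result.stdout.splitlines() if line.endswith('.py')]

    # feed the massaged output to the next command as real argv entries
    subprocess.run(['black', '--check', *changed_python_files()], check=True)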
The purpose of OP's project kind of reminded me of shell.js (shx) [1] which is a nodejs library that wraps all kinds of common UNIX commands to their own synchronously executed methods.
I guess that most shell projects start off wanting to be a cross-platform solution for other operating systems, but somewhere along the way either escalate into being their own programming language (like all the PowerShell revamps) or end up trying to reinvent the backwards-compatibility approach and/or POSIX standards (e.g. oil shell or zsh).
What I miss among all these new shell projects is a common standardization effort like sh/dash/bash/etc did back in the day. Without creating something like POSIX that also works on Windows and macOS, all these shell efforts remain only toy projects of developers, without the possibility that they could actually replace the native shells of other operating systems.
Most projects in the node.js area I've seen migrate their build scripts at some point to node.js, because maintaining packages and runtimes on Windows is a major shitshow. node.js has the benefit (compared to other environments) that it's a single .exe that you have to copy somewhere and then you're set to go.
When I compare that with python, for example, it is super hard to integrate. All the anaconda- or python-based bundles for ML engineers are pretty messed up environments on Windows; and nobody actually knows where their site-packages/libraries are really coming from and how to even update them correctly with upstream.
I love Python, but if all I need to do is grab one part of one line of something, and put it into some other command, then I'm just going to use some unholy mixture of sed, awk, or whatever. If I did the same thing in Python, I'd end up with thirty lines or more.
Thirty lines? Unholy one liners are not limited to shell scripting:
from subprocess import check_output
sh = lambda script: check_output(script, shell=True, text=True)
ips = set(line.split()[-1]
          for line in sh('last -a').splitlines()
          if line and 'tmux' not in line)
The first two lines are pure overhead.
The last line is the equivalent of a shell script one liner but now has all the advantages of a language that supports "\N{clown face}".
In 200 bytes half of which is overhead. That's not a great start.
> The last line is the equivalent of a shell script one liner
It's slower and requires much much more typing, and anything I want to add isn't going to go in the right place. It has a gross hack to support "tmux", and produces other spurious output (bugs). I don't want my X sessions or other ptys; I have reverse DNS disabled, and I really just want IP addresses.
This is what I would write in shell:
last -a|sed -e 's/.* \([0-9a-f]*[:.][0-9a-f.:]*\)$/\1/;t;d'|sort -u
> now has all the advantages of a language that supports "\N{clown face}".
This is a joke right? I have twenty cores, this is going to use three. Python is going to use one. 30% less typing, less bugs, faster. Those things are important. A fucking emoji is not.
For those of us with not as much grey in our beards (+1 this response if you get the joke), the python example is a ton more readable to me. Now, that probably doesn't matter if the utility is only for you. I've a number of helper programs/scripts/etc. that I wrote and that are only for my own consumption.
Re the parallel stuff and Python: it's easy to import multiprocessing and take advantage of all cores. I think where Python wins is how easy it is to handle errors and organize things, as it's a full-fat programming language vs a shell scripting language like Bash plus the GNU userland.
> the python example is a ton more readable to me.
The python example is also wrong, as in it produces the wrong output: Maybe if you are only worried about your own consumption you can ignore all those poor souls running literally any other application besides ssh and tmux, but at some point you're going to have to stop admiring how "readable" it is and fix it, and then what?
I don't buy that conservative coder mentality of keeping the code clean and readable, because one drop of shit is going to be impossible to remove later. In fact, I think that's bananas. Get correct first, then improve.
> Re the parallel stuff and Python: it's easy to import multiprocessing and take advantage of all cores
You jest.
> I think where python wins is how easy it is to handle errors
Pray tell what errors do you think we need to handle?
Your shell thing is honestly fine. The emoji thing was a joke, sorry that wasn’t clear.
The thing is, with 1 IP lookup in N addresses a grep is also fine, but with M IP lookups the set gives O(1) per lookup instead of O(N), and that makes a difference.
It’s not just theoretical. Bash slowness adds up fast and you will need to upgrade to a real language anyway.
Example: IPv6 addresses need to be truncated to /64, but last prints them with no truncation. Good luck doing that in bash. You'll need a fully fledged IPv6 address parser if it's code that has to survive.
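For the record, in Python that truncation is a one-liner with the stdlib ipaddress module (a quick sketch, using a made-up documentation-prefix address), which is rather the point about upgrading to a real language:

    import ipaddress

    addr = '2001:db8:abcd:12:aaaa:bbbb:cccc:dddd'
    print(ipaddress.ip_interface(addr + '/64').network)   # prints 2001:db8:abcd:12::/64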
You make a fine point, honestly, about what you can get away with in shell scripting. The bigger picture is that literally nothing ever survives the sticky fingers of many engineers over time if it is a shell script. It gets worse and worse until it is ported, at which point it begins a new life as maintainable code.
> Unholy one liners are not limited to shell scripting
You’re mostly proving the parent’s point here. Yours isn’t a one liner, and even in Bash people don’t write multi-line for loops in one line. I don’t know what you mean about the first two lines being overhead; this doesn’t work without them, and you have magical options “shell” and “text” in there that matter. Your script doesn’t pass the result to another process, so you need another line. This also splits output on spaces, it should be more generic. Doing that involves another module (regex), another function call (or more likely several), and several more lines.
Compared to “last -a | grep tmux | process”, your script is much closer on a log scale to 30 lines than 1, even if the parent’s 30 is a little exaggerated, and even if we use the 5 lines you showed and not the ~10-12 lines it really would be.
I don't have exact rules, but if it's (1) quick, (2) simple to read/write in shell, (3) separate from my app's business logic, then I'll use a shell script.
For example, to take a line off of the end of a file, sed is pretty easy:
sed -i '' -e '$ d' file.txt
After a few pipes, I will reach for Python. Regarding line count, after an `if __name__ == '__main__':` block and some helpful docstrings, it ends up being about thirty lines or so. "\N{man shrugging}"
Check out marcel (https://marceltheshell.org). Marcel is a shell that allows the use of Python functions, and pipes Python values between commands. E.g. remove the files/directories listed in victims.txt (one per line):
read victims.txt | args [f: rm -rf (f)]
Parens delimit Python expressions, so (f) just returns the value of f, one of the values read from victims.txt.
Marcel also provides a Python module so that Python can be used for scripting much more conveniently than in straight Python (i.e. without the marcel module). E.g., print the names of recently changed files in the current directory:
import os
from marcel.api import *
for file in (ls(os.getcwd(), file=True, recursive=True) |
             select(lambda f: now() - f.mtime < days(1))):
    print(file)
Using your editor to drive a shell is a huge win because it: (a) flips the development bit in your brain, and (b) creates a shared history across all your machines.
Combine that with literate programming and you have a `duck-talking` history of every development or outage response tied to what you actually did.
I'll make the mandatory HN comment since I have not seen it yet: a website introducing a new language should show said language on the first page. It's like introducing a car by saying
- safe
- can ride on 4 wheels
- 2 car seats included
- highway compatible
Yes, this is the first thing that struck me. I don't know why people invest so much time inventing new languages and then fail at the last mile.
If the front page of a new lang site doesn't have language examples showing not just hello world but core differentiating use-cases, why should I bother?
> 1 == 1.0, # false, int and float are always distinct.
Uh?? If ints and floats are always distinct, then you should disallow int-float comparison (panic) instead of allowing it and always returning false!
That said, it is probably a good idea to disable equality comparison for floats altogether: maybe == would be int/bool/string only, and for floats there would be ~= (approximate relative equality). But of course the issue is that not everybody will agree on the correct tolerance for equality: 0.1%? Less?
The language looks really quite interesting. I could see myself using it for quick scripts.
I think I'd prefer bash's noclobber behaviour to be the default redirection style. The explicitness of being forced to use >| always feels like a nice safety feature to me, which would tie in nicely with their other defaults for safer scripting.
Also, not sure I'm keen on their minor change to the redirection syntax¹. It suggests that "2>1" and "2> 1" have very different results, which feels like an odd change to make when you're keeping the other aspects of that syntax. Probably says more about me though, I want 100% match or a complete break to force me to think.
I think it is interesting to compare the approach of oil², which has overlapping goals but is attacking the problem of improving shell scripting while attempting to keep the syntax.
It's really the same interpreter with a few parse time and runtime options. There were surprisingly few compromises (probably the biggest one is the shell-like string literal syntax, which I mention.)
The biggest changes are that we take over ( ) and { }, so you can write stuff like:
if (x > 0) {
  echo 'positive' | tac
}
Rather than
if test "$x" -gt 0; then
  echo 'positive' | tac
fi
We add a Python-like expression language between ( ), and surprisingly few things need to change to make it feel like a "new language".
Frankly, I think that only, say, three things are worth keeping from traditional shells: pipes, redirections, and $-variables.
Most of the rest of them consists of design choices dictated by highly constrained machine resources and, most of all, incremental evolution. I'd say that Bash as a language is larger than Hush, while being conceptually much less clean, and in many regards less ergonomic.
Ick, you're right. I was thinking solely about the change with their version of the "2>&1" vs "2>& 1" case.
Although making such an error in my example doesn't seem to invalidate my point. It takes an already odd syntax and makes it subtly different, instead of tightening it up.
The unfortunate truth of all such projects: if it isn't pre-installed on all reasonably complete Linux distros, it can never truly succeed, no matter how awesome and cool it is...
... which is why I try to write as much stuff as POSIX sh as possible, and sometimes I have to throw in a Bashism to get by.
And I say this as someone that wishes sh was better, semi-lovingly.
I think part of the problem is that we still don't have a standard /bin/sh even though there is a written standard. It feels fair to me to suggest that you can call dash the working standard on Linux, but you can't trust that beyond Linux systems(and I'm ignoring installs that are busybox-based here too).
And that also feels like the problem with extending a script with a few bashisms too, as you're suddenly trying to remember when that feature appeared. Off the top of my head, I'm wondering what things won't be available to Mac users because macOS ships with a pre-GPLv3 bash, for example.
> I think part of the problem is that we still don't have a standard /bin/sh even though there is a written standard. It feels fair to me to suggest that you can call dash the working standard on Linux, but you can't trust that beyond Linux systems(and I'm ignoring installs that are busybox-based here too).
Why do we need a one true shell that implements the standard? Am I misunderstanding you…? The whole point of a standard is that everyone implements it so the different implementations don’t matter (and everyone does implement it, in the case of POSIX shell, minus a couple things like [ -t ] and some bash defaults).
Given developers testing against shells that implement the spec, a solid spec, and a variety of implementations that correctly implement that spec, I'm in full agreement with you. However, the moment you add caveats to that description you have, well… caveats to that description.
The situation - as I see it - is that we have a spec, and we have systems that ship a /bin/sh that implements something like it. If people are targeting their system's /bin/sh, then they're writing scripts for their /bin/sh implementation, not POSIX-compliant shells. For example, my install uses dash, which has features beyond the POSIX spec that I have to eschew to produce portable scripts.
As a sibling comment points out, shellcheck will point out a lot of problems, but it requires people choosing to use it. You only have to dive around /usr/bin for a little while to see its usage isn't universal ;)
Of course, if the differences in implementation don't affect your usecase then it simply doesn't matter.
This isn't really true at all. We have an ungodly amount of operational process and knowledge tied up in bash scripts deployed across our fleet, but we could easily deploy a different shell and use that. We don't use bash because it's the least common denominator on our systems --- in fact, I think in an era of AMIs and Dockerfiles, few people do. We use it because of path dependence.
It wasn't that long ago that a fresh Unix install of any flavor didn't come with Perl, Python, zsh, or other common shell or scripting languages. And there's already a trend away from installing those by default. Since adding them now takes mere seconds, the barrier is low.
I dunno, I think the future of interesting shell scripting languages can either be a.) the basis for a new distro, which will decide to use it for the system scripts, b.) be intended for use with containers in some manner and can simply be added as another dependency, or c.) be primarily intended to be used interactively in some unique way, so it'll just be installed by individuals.
There will always be a place for the universal shells and text editors. Bash and vim aren't going away any time soon. But there are plenty of places for non-default choices. Vim's existence doesn't mean Sublime doesn't have a place.
First, this is a pretty nice language, which achieves much with very few, very general constructs. It's distinctly higher level than Bash, while using far fewer concepts. Here I applaud.
Its shell capabilities are also minimalist, but should be ergonomic enough to avoid nearly all of the ceremony that Python's `popen()` usually requires.
OTOH I'm certain that prefixing all standard functions with `std` will get tiresome soon; I'd go with prefixing them with a `@` or `:` or another short special-case glyph. Also, the fact that `1 != 1.0` rubs me the wrong way; I'd rather have an integer be equal to a float when the float's fractional part is exactly zero. It does make mathematical sense, and is cheap to implement.
In general, Hush looks like a great language for cases when writing Bash becomes tiresome, and you'd rather reach for Python, Ruby, or Perl, but know that each incurs its own pain when used as a shell language.
Hush would also make a very nice embeddable language, much like Lua on which it is based.
If this wants to sell itself as a shell scripting language, it should very quickly advertise what it is that makes it superior to, say, bash for typical shell scripting tasks.
Shell scripts with bash are painful to the point that if I find myself writing more than around 10 lines of shell, I tend to stop and switch to Perl instead. But Perl hasn't been too popular lately and isn't ideal either, so I'm very much up for something better.
Here's some features I want from a shell scripting language:
* 100% reliable argument passing. That is, when I run `system("git", "clone", $url);` in Perl, I know with exact precision what arguments Git is going to get, and that no matter what weirdness $url contains, it'll be passed down as a single argument. Heck, make that mandatory.
* 100% reliable file iteration. I want to do a "for each file in this directory" in a manner that doesn't ever run into trouble with spaces, newlines or unusual characters.
* No length limits. If I'm processing 10K files, I don't want to run into the problem that the command line is too long.
* Excellent path parsing. Such as filename, basename, canonicalization, finding the file extension and "find the relative path between A and B".
* Good error handling and reporting
* Easy capture of stdout and stderr, at the same time. Either together or individually, as needed.
* Excellent process management. We're in 2022, FFS. We have 128 core CPUs. A modern shell scripting language should make it trivial to do something like: take these 50000 files, and feed them all through imagemagick, using every core available, while being able to report progress, record each failure, and abort the entire thing if needed.
* Excellent error reporting. I don't want things failing with "Command failed, aborted". I want things to fail with "Command 'git checkout https://....' exited with return code 3, and here's for good measure the stdout and stderr even if I redirected them somewhere".
* Give me helpers for common situations. Eg, "Recurse through this directory, while ignoring .git and vim backup files". Read this file into an array, splitting by newline, in a single line of code. It's tiresome to implement that kind of thing in every script I write. At the very least it should be simple and comfortable.
That's the kind of thing I care about for shell scripting. A better syntax is nice, but actually getting stuff done without having to work around gotchas and issues is what I'm looking for.
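For concreteness, this is roughly what the first two items look like when done carefully in today's bash (a sketch, not tied to any particular tool; the URL is made up):
url='https://example.com/repos/has spaces & quotes.git'   # deliberately nasty value
git clone "$url"                       # quoted, so git receives exactly one argument, whatever it contains
args=(clone --depth 1 "$url")          # or build argv in an array first
git "${args[@]}"
for f in ./*; do                       # iterate with a glob instead of parsing ls output
  [ -e "$f" ] || continue              # an unmatched glob stays literal; skip that case
  printf 'processing: %s\n' "$f"
done
The point of the list above is that none of this should require such care in the first place.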
Most of those features are in PowerShell, especially version 7.
It uses strict ('reliable') argument passing, strong typing, etc..
It has a bazillion independent streams such as "Debug", "Verbose", "Warning", "Error", "Progress", "Output", "Information", etc...
It has "foreach -parallel", which is a fun way to make the CPU fan spin up.
> "Recurse through this directory, while ignoring .git and vim backup files". Read this file into an array, splitting by newline, in a single line of code. It's tiresome to implement that kind of thing in every script I write. At the very least it should be simple and comfortable.
$lines = dir -Recurse -Exclude *.git, *.vim | Get-Content
Not exactly rocket science in most shells? You can make this a modular function that takes in a list of file names, but PowerShell's "dir" doesn't output file names; it outputs file objects, so it is just one more step to filter on something like extension (or length, or whatever).
In PowerShell you can create functions that integrate with the object-oriented pipeline. You can pipe full objects into a function, and objects out. Object attributes can be mapped automatically to function arguments too. You can have begin/process/end blocks so you can do things like open a DB connection, pipe in data, then close the connection neatly, without needing to do it all manually each time.
PowerShell is indeed excellent, but there's nothing wrong with it having a bit of competition.
Some of it is also less than ideal, though rather less so than bash. Issues in PowerShell I can think of:
* It leaks environment variables. Set an env var, and it propagates to the shell it was called from!
* It can't comfortably import environment from batch files. Working with vcvars is an annoyance. You'd think somebody at Microsoft would have made built-in support for that one.
* Argument passing in Windows is a bloody horror. PowerShell accepts commands encoded in base64 because it sucks so much!
* For some reason you can't simply quote the command itself; a quoted path is treated as a string unless you invoke it with the call operator `&`.
Unlike sane architectures, on Windows WinMain gets the entire command line as a single string, which means it's each individual process that breaks it into separate arguments.
Well... there is still no leak, as there is no new shell spawned. I personally find this behavior way more natural than that of bash. If you don't want it, use variables. I can imagine that coming from a bash background this irritates you.
> WinMain gets the entire command line as a single string
This has nothing to do with PowerShell, which imposes its own parameter parsing standard so that individual scripts don't have to do that themselves. Also, while we are at it, I guess this "unsane" architecture is more flexible.
> Well... there is still no leak, as there is no new shell spawned. I personally find this behavior way more natural than that of bash. If you don't want it, use variables. I can imagine that coming from a bash background this irritates you.
It's a serious annoyance when you're doing build scripts that do stuff like setting $PATH. Suddenly, stuff breaks randomly depending on what you ran in that particular powershell window before. And there's a length limit too, so if you append to a variable you may find stuff breaks on the 5th invocation of the script.
It also has the annoying "feature" of leaving you in the last directory the script changed to. Most annoying for debugging stuff.
> This has nothing to do with PowerShell, which imposes its own parameter parsing standard so that individual scripts don't have to do that themselves.
It's not specifically powershell related, but if you're running on Windows, the Windows design problems apply to everything, including powershell.
> Also, when we are at it, I guess this "unsane" architecture is more flexible.
And more failure prone. It may mean there's no way whatsoever to pass a given path to a program. If a program internally doesn't handle quoting and spaces right, you're screwed.
It can also mean security issues. No matter how well you write your code, a malicious party can give you a filename like "data.txt /deleteeverything ", and the program you're calling may choose to interpret that as an argument, regardless of any quoting you might do.
For that matter, if you want to have the same flexibility on Unix, you can just concatenate all of argv, and then do your own parsing.
> It's a serious annoyance when you're doing build scripts that do stuff like setting $PATH.
Indeed, I was bitten by this too. Honestly, it's probably best not to touch PATH and friends and to resolve the problem some other way. Or you could use modules.
> It also has the annoying "feature" of leaving you in the last directory the script changed to.
Yeah, that might be a problem. I used to use cd instead of cpush because of this. However, using a good framework fixes this. Check out Invoke-Build for your build stuff and you will forget about all of that. It's totally epic.
> It may mean there's no way whatsoever to pass a given path to a program. If a program internally doesn't handle quoting and spaces right, you're screwed.
Regarding spaces, this is easily solvable by using the short syntax. Regarding quoting, this seems like a less important case - it's very rare to have quotes in file names, in my experience.
Powershell improves on Unix commands being based on text streams, and makes them based on objects. Which means you're pretty much never extracting stuff with cut and awk, and instead can just ask for whatever field you want directly from the object.
One of the nice things about it is that you can add stuff easily. You don't have to worry that every script that parses the output of your command will now break because you added an extra column or added extra functionality.
It depends. In a world where commands produce, expect and consume well defined and highly structured data streams, it is actually great. It works well in Windows, but only because scripting as a concept is relatively new to the Windows world.
In UNIX, however, it is usually a mess and data extraction using the PowerShell approach would almost never work due to spurious characters appearing in the output of a command (for any reason, really) as the UNIX style is precisely this: «cobble things together, use a sledgehammer to make everything work and move on. If it ain't broken then don't fix it». This is why running the output through a «sed» and searching for stable string patterns to cut interesting parts out and then (optionally) running them through cut/awk/et al is the Swiss army knife.
Life has become somewhat easier recently with the advent and more widespread use of JSON, YAML (and to a certain extent XML before) as we now have jq, yq, dasel, mlr (Miller), xmlto etc – to capture, for instance, the JSON formatted output and do something with it just in the same way it is possible in PowerShell whilst also retaining the extensibility (see below) without having to rely on the availability of the source code of the producing utility/app.
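A hedged illustration of that route (assumes an iproute2 built with JSON support plus jq; the field names are whatever `ip -json` emits, and eth0 is just an example interface):
ip -json addr show | jq -r '.[].ifname'                                              # every interface name, no cut/awk
ip -json addr show | jq -r '.[] | select(.ifname == "eth0") | .addr_info[].local'    # that interface's addresses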
> One of the nice things about it is that you can add stuff easily. You don't have to worry that every script that parses the output of your command will now break because you added an extra column or added extra functionality.
You can only add stuff easily if you control (own) the producer of the data stream. If the producer is a third party provided script/app you don't have the source code for, I believe you still have the same breakage problem, however PowerShell experts might want to chime in and correct me.
> In UNIX, however, it is usually a mess and data extraction using the PowerShell approach would almost never work due to spurious characters appearing in the output of a command
Where would they come from?
> (for any reason, really) as the UNIX style is precisely this: «cobble things together, use a sledgehammer to make everything work and move on. If it ain't broken then don't fix it». This is why running the output through a «sed» and searching for stable string patterns to cut interesting parts out and then (optionally) running them through cut/awk/et al is the Swiss army knife.
If you're doing that a lot, the code tends to be fragile. If you use cut for instance, it breaks the second the data you're working with changes. Program decided column needs to be 5 characters wider? Now the stuff you're looking for is not there anymore.
That's how you end up with "ifconfig is old, everyone switch to ip now". At some point a program's output may be parsed by so much stuff that any change risks breaking something, and it forces it to remain static for eternity.
> You can only add stuff easily if you control (own) the producer of the data stream. If the producer is a third party provided script/app you don't have the source code for, I believe you still have the same breakage problem, however PowerShell experts might want to chime in and correct me.
No, my point is that the producer is free to improve without risking the consumers. If your command that produced IPv4 addresses adds support for IPv6, it doesn't suddenly break every script that relies on precise lengths, line numbers and columns.
You can also take somebody else's returned data and add extra stuff to it if you want to, just like you could take a bunch of JSON and modify it.
> If the producer is a third party provided script/app you don't have the source code for, I believe you still have the same breakage problem
There are number of ways to prevent that in PowerShell, and by default you usually have to do nothing. But if we imagine that vendor removes some param you relied on in newer versions, you can make it work with old scripts using it by providing so called "proxy function" that returns it back by just adding new stuff on top of existing stuff of underlying function. The similar can be done with objects and properties.
> That's how you end up with "ifconfig is old, everyone switch to ip now".
Let’s not confuse Unix and Linux, shall we? (I agree that any given use of cut -c is probably wrong, but this is a weird conclusion. People just use awk.)
Logging components, being the worst offenders, immediately spring to mind. Especially the ones that receive data points over a network in heterogeneous environments. syslog running on flavour ABC of UNIX receives input from a locally running app that has a buffer overrun; the app has accepted a longer-than-permitted input and dumped the actual log entry plus all the trailing garbage, up to the first ASCII NUL, into syslog. syslog does not care about the correctness of the received entry and, hence, is not affected; it, say, diligently dumps it into a locally stored log file. The log parser is now screwed. I can think of similar examples outside log parsers, too, such as interoperability issues between different systems.
Granted, it has become less of a problem in recent years due to the number of UNIX varieties having gone extinct and languages and frameworks considerably improving in overall quality, but it has not completely disappeared. Less than a couple of years ago, a sloppy developer was dumping PDF file content (in binary!) into the log file. The logger survived, but the log parser had a severe case of indigestion.
> If you're doing that a lot, the code tends to be fragile. If you use cut for instance, it breaks the second the data you're working with changes. Program decided column needs to be 5 characters wider? Now the stuff you're looking for is not there anymore.
You are absolutely correct. This is why I do not use «cut» and instead treat all columns as variable-length patterns that can be matched using a regular expression in «sed». That is immune to column width changes as long as the column delimiters used as start and stop characters are known. «cut» is only useful when parsing fixed-length formats, such as SWIFT MT940/MT942, where the column width is guaranteed to remain fixed. Otherwise «cut» just overcomplicates everything and makes scripts prone to unpleasant breakages.
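To illustrate with /etc/passwd (the sed pattern is anchored on the ':' delimiters, so field widths are irrelevant):
cut -c1-8 /etc/passwd                                               # "the first 8 characters" is not "the user name"
sed -n 's/^\([^:]*\):[^:]*:\([^:]*\):.*/\1 uid=\2/p' /etc/passwd    # name and uid, whatever their widths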
> That's how you end up with "ifconfig is old, everyone switch to ip now". At some point a program's output may be parsed by so much stuff that any change risks breaking something, and it forces it to remain static for eternity.
The cited reason to switch to «ip» was unrelated to parsing, if I recall correctly. But otherwise you are correct: the community has a proven track record of resisting changes in output formats due to the risk of breaking gazillions of cobbled-together and band-aided shell scripts.
> No, my point is that the producer is free to improve without risking the consumers.
This is not guaranteed. If the producer changes the content of the structure or merely extends an existing structure, then consumers will continue to consume. However, if the producer decides to change the structure of the output content itself, the breakage problem still persists. Changes to the content structure not infrequently occur when person A hands a piece XYZ they have been working on over to person B, and person B has a different way of doing the same thing.
> If your command that produced IPv4 addresses adds support for IPv6, it doesn't suddenly break every script that relies on precise lengths, line numbers and columns.
Anything that is tightly coupled to precisely defined things is going to break; there is no scripting solution possible, I am afraid. E.g. if the script author relies on the maximum IPv4 address length (AAA.BBB.CCC.DDD) never exceeding 15 characters, or on a specific format of IPv4 addresses, the added support for IPv6 and IPv6 addresses appearing in the output will certainly break the script. Again, one possible solution is to treat all values as variable-length patterns that are enclosed within delimiters and not try to interpret the column content.
> The log parser is now screwed. I can think of similar examples outside log parsers, too, such as interoperability issues between different systems
Use a sane system, like journald that can dump logs in JSON and doesn't require you to parse dates by hand. It can also deal with binary content fine, and can store stuff like crash dumps in the log if you want to, and provides functionality to make log parsing easy.
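For instance, something along these lines (a sketch; nginx.service is just an example unit):
journalctl -u nginx.service -o json --since today | jq -r '.MESSAGE'
# one JSON object per entry; a message containing raw bytes is emitted as an array of numbers
# rather than corrupting the stream, and timestamps, priorities etc. are separate fields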
In any case I don't think such a problem should happen in the Powershell model. If you have a log object, then your $log.Message can contain any arbitrary garbage you want, and so you can just go and put that in a database and have that work with no trouble.
> This is not guaranteed.
Of course, but there are better and worse ways of doing things. With the object way, you can extend stuff quite easily with a minimum of care. As opposed to the unix model where you have to think whether somebody, somewhere might be using cut on this and fixing a typo in a descriptive text that adds a character might break something.
> Anything that is tightly coupled to precisely defined things is going to break; there is no scripting solution possible, I am afraid.
Not the best example, I admit, but I mean that in the case of something like Get-NetIPAddress in PowerShell, so long as the user either looks for the specific thing they want or ignores stuff they don't recognize, you could very much add yet another addressing scheme without running into trouble.
Good design helps a lot there. If you make it clear that stuff has types, and that new types may appear in the future, it's easy for the end user to write code that either ignores anything it doesn't understand or gives a good error message when it runs into it.
> Powershell improves on Unix commands being based on text streams, and makes them based on objects
I assume this means that PowerShell is deeply integrated with the .NET ecosystem? I don't like .NET very much, so that's a downside for me.
One of the reasons I use Bash is the decades-old ecosystem of various utilities people have written. Can PowerShell use the same CLI utilities as Bash, and do the plain-old-text-stream processing pipelines like Bash, or is it .NET objects only?
I don't think I need to "put some logic behind" not liking .NET, same as I don't need to put some logic behind not liking Java. It just sux and we know better now :)
Also good luck with your emacs pinky. :) :) :) :D :D :D :) :) :^)
It's not awesome, however, to write Where-Object and ForEach-Object instead of ? and %. It really puts one off. I can understand fanboyism for not using other aliases, but those 2 are special, and not using them paints the wrong picture of pwsh.
If I wasn’t on mobile I would dig up some excellent articles on the topic, so please excuse the too brief summary:
The core concept of the UNIX philosophy is that each tool should do one thing, and then you can compose them.
In practice this doesn’t work because of the physical constraints of the small machines used back in the day when UNIX was developed.
The original shell concept used plain byte or char streams, which are too primitive.
The end result is that every tool has to do parsing and serialisation. Worse, they pretty print by default, losing the structure and making subsequent steps fragile.
So for example, Visual Studio crashes if your Git version is “wrong” because it’s trying and failing to parse the output text.
Similarly if you ask UNIX people to solve simple problems like “stop all processes run by ‘Ash’”, they’ll reach for grep and accidentally stop every bash process also.
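The text-matching version of that task, lowercased to "ash" so the accidental matches are visible:
ps aux | grep ash | awk '{print $2}' | xargs kill   # also matches every "bash", anything with "ash" in its command line, and the grep itself
pkill -u ash                                        # the dedicated tool that sidesteps the parsing entirely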
In PowerShell the commands don’t parse. They don’t serialise. They don’t pretty print. They don’t sort, filter, group, export, or import.
Instead there are dedicated, single-purpose commands for those functions that compose elegantly.
This is the UNIX philosophy, and PowerShell does it more than bash.
Oh my god, your first point. I had to pass arbitrary arguments through a couple layers of shell and I couldn't do it. Gave up and rewrote it all in Python, even though it's annoying in other ways, and I'm decent at shell scripting. It's a nightmare.
> 100% reliable argument passing. That is, when I run `system("git", "clone", $url);` in Perl, I know with exact precision what arguments Git is going to get, and that no matter what weirdness $url contains, it'll be passed down as a single argument. Heck, make that mandatory.
Variables are parsed as tokens so they're passed to the parameters whole, regardless of whether they contain white space or not. So you can still use the Bash terseness of parameters (eg `git clone $url`) but $url works in the same way as `system("git", "clone", $url);` in Perl.
> 100% reliable file iteration. I want to do a "for each file in this directory" in a manner that doesn't ever run into trouble with spaces, newlines or unusual characters.
Murex is inspired by Perl in that $ is a scalar and @ is an array. So if you use the `f +f` builtin (to return a list of files), it returns a JSON array. From there you can use @ to expand that array with each value becoming a new parameter, eg
rm -v @{ f +f } # delete all files
or use $ to pass the entire array as a JSON string. Eg
echo ${ f +f }
# you could just run `f +f` without `echo` to get the list.
# This is just a contrived example of passing the array as a string.
Additionally there are lots of tools that are natively aware of arrays and will operate on them. Much like how `sed`, `grep`, `sort` et al treat documents as lists, murex will support lists as JSON arrays. Or arrays in YAML, TOML, S-expressions, etc.
So you can have code that looks like this:
f +f | foreach file { echo $file }
> No length limits. If I'm processing 10K files, I don't want to run into the problem that the command line is too long.
This is a kernel limit. I don't think there is any way to overcome this without using iteration instead (and suffering the performance impact from that).
> Excellent path parsing. Such as filename, basename, canonicalization, finding the file extension and "find the relative path between A and B".
There are a number of ways to query files and paths in murex:
- f: This returns files based on metadata. So it will pull files, or directories, or symlinks, etc depending on the flags you pass. eg `+f` would include files, `+d` would include directories, and `-s` would exclude symlinks.
- g: Globbing. Basically `*` and `?`. This can run as a function rather than being auto-expanded. eg `g *.txt` or `rm -v @{ g *.txt }`
- rx: Like globbing but using regexp. eg `rx '\.txt$'` or `rm -v @{ rx '\.txt$' }` - this example looks terrible but using rx does sometimes come in handy if you have more complex patterns than a standard glob could support. eg `rm -v @{rx '\.(txt|rtf|md|doc|docx)$'}`
The interactive shell also has an fzf-like integration built in. So you can hit ctrl+f and then type a regexp pattern to filter the results. This means if you need to navigate through a complex source tree (for example) to a specific file (eg ./src/modules/example/main.c) you could just type `vi ^fex.*main` and you'd automatically filter to that result. Or even just type `main` and only see the files in that tree with `main` in their name.
> Good error handling and reporting / Excellent error reporting. I don't want things failing with "Command failed, aborted". I want things to fail with "Command 'git checkout https://....' exited with return code 3, and here's for good measure the stdout and stderr even if I redirected them somewhere".
A lot of work here.
- `try` / `trypipe` blocks supported
- `if` and `while` blocks check the exit code. So you can do
if { which foobar } else {
err "foobar does not exist!"
}
# 'then' and 'else' are optional keywords for readability in scripting. So a one liner could read:
# !if {which foobar} {err foobar does not exist!}
- built in support for unit tests
- built in support for watches (IDE feature where you can watch the state of a variable)
- unset variables error by default
- empty arrays will error by default when passed as parameters in the next release (hopefully coming this week)
- STDERR is highlighted red by default (can be disabled if that's not to your tastes) so you can clearly see any errors if they're muddled inside STDOUT.
- non zero exit numbers automatically raise an error. All errors return a line and cell number so you can find the exact source of the error in any scripts
Plus lots of other stuff that's referenced in the documents
> Easy capture of stdout and stderr, at the same time. Either together or individually, as needed.
Murex handles named pipes a little differently: they're passed as parameters inside angle brackets, <>. STDERR is referenced with a preceding exclamation mark, so internally a normal command looks something like:
command <out> <!err> parameter1 parameter2
(You don't need to specify <out> nor <!err> for normal operation.)
So if you want to send STDERR off somewhere for later processing you could create a new named pipe. eg
pipe example # creates a new pipe called "example"
command1 <!example> parameter1 parameter2 | command2 <!null> parameter1 parameter2
This would say "capture the STDERR of the first command1 and send it to a new pipe, but dump the STDERR of command2".
You can then later query the named pipe:
<example> | grep "error message"
> Give me helpers for common situations. Eg, "Recurse through this directory, while ignoring .git and vim backup files". Read this file into an array, splitting by newline, in a single line of code. It's tiresome to implement that kind of thing in every script I write. At the very least it should be simple and comfortable.
This is where the typed pipelines of murex come into their own. It's like using `jq` but built into the shell itself, and it works transparently with multiple different document types. There's also an extensive library of builtins for common problems, eg `jsplit` will read STDIN and output an array split based on a regexp pattern. So your example would be:
cat file | jsplit \n
> That's the kind of thing I care about for shell scripting. A better syntax is nice, but actually getting stuff done without having to work around gotchas and issues is what I'm looking for.
I completely agree. I expect this shell I've created to be pretty niche and not to everyone's tastes. But I'd written it because I wanted to be a more productive sysadmin ~10 years ago and since I've moved into DevOps I've found it invaluable. It's been my primary shell for around 5 years and every time I run into a situation where I'm like "I wish I could do this easily" I just add it. The fact that it's a typed pipeline makes it really easy to add context aware features, like SQL query support against CSV files.
Bash is definitely painful to write - I write shell scripts a lot and I enjoy it (in a masochistic kind of way), and I wholeheartedly agree with most of the criticisms you've written.
However, the greatest strength of Bash is its ubiquity - it is available on almost every modern Unix-like environment - and when combined with standard POSIX tools it can provide a somewhat-less-painful environment for writing reliable shell scripts.
In order to prove it, I will do my best to provide some kind of solution to each of the problems you have mentioned. Note that they are in no way perfect solutions, and all of them are just hacks that work around Bash's inherent clumsiness.
---
> 100% reliable argument passing
Wrapping a variable in double-quotes does this:
var="1 2 3 4"
for x in "$var"; do echo $x; done
For double-parsing (e.g. SSH commands) you can wrap the entire expression in single-quotes to maintain the original string until the second shell:
ssh user@host 'var="1 2 3 4"; for x in "$var"; do echo "$x"; done'
> 100% reliable file iteration
The "find" command does exactly this, and it works for whatever weird characters in the filename you have.
find ./directory/ -type f -exec SOME_COMMAND {} \;
> No length limits. If I'm processing 10K files, I don't want to run into the problem that the command line is too long.
Do not evaluate arguments directly in shell, use xargs to feed the arguments as standard input. E.g.:
find ./directory/ -type f -print0 | xargs -0 SOME_COMMAND   # -print0/-0 keep names with spaces or newlines intact (widespread extensions)
> Excellent path parsing. Such as filename, basename, canonicalization, finding the file extension and "find the relative path between A and B".
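Bash plus coreutils covers most of this, though it's scattered across parameter expansion and separate tools (a sketch; realpath's -m and --relative-to options assume GNU coreutils):
p=/some/dir/archive.tar.gz
dirname "$p"                            # /some/dir
basename "$p"                           # archive.tar.gz
echo "${p##*.}"                         # gz  (text after the last dot)
echo "${p%.*}"                          # /some/dir/archive.tar  (strip one suffix)
realpath -m "$p"                        # canonicalise, resolving symlinks, "." and ".."
realpath --relative-to=/some "$p"       # dir/archive.tar.gz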
> Excellent process management. We're in 2022, FFS. We have 128 core CPUs. A modern shell scripting language should make it trivial to do something like: take these 50000 files, and feed them all through imagemagick, using every core available, while being able to report progress, record each failure, and abort the entire thing if needed.
While bash is truly terrible when it comes to concurrency, I find GNU Parallel to be pretty satisfying for most concurrent shell-scripting:
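Something along these lines (hedged: assumes GNU Parallel and ImageMagick's convert are installed; by default it runs one job per core):
find . -name '*.png' -print0 |
  parallel -0 --bar --joblog convert.log --halt soon,fail=1 convert {} {.}.jpg
# --bar reports progress, --joblog records every job's exit code,
# and --halt soon,fail=1 aborts the whole run after the first failure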
For POSIX purity, xargs can also be used with the "-L 1" argument, which consumes a single line per command invocation. For parallelism there is also the -P/--max-procs option (the long form is GNU; -P is widely supported).
> Excellent error reporting. I don't want things failing with "Command failed, aborted". I want things to fail with "Command 'git checkout https://....' exited with return code 3, and here's for good measure the stdout and stderr even if I redirected them somewhere".
I find that adding "set -x" to the top of your shell, which prints each command with "+" prefix as it is expanded and executed, very useful for error tracking.
> Give me helpers for common situations. Eg, "Recurse through this directory, while ignoring .git and vim backup files". Read this file into an array, splitting by newline, in a single line of code. It's tiresome to implement that kind of thing in every script I write. At the very least it should be simple and comfortable.
There are helpers out there for most common situations - it's just that they are implemented as CLI tools, and not officially part of the shell. And in the scenario where you can't expect the availability of those unofficial tools, you can always write your own small library of commonly used Bash functions, and just copy-paste them into your script whenever you need them. It's ugly, but still possible.
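Two of the situations mentioned, as they look with stock tools (mapfile needs bash 4+; the backup/swap patterns are just the common vim defaults):
find . -path ./.git -prune -o -type f ! -name '*~' ! -name '.*.swp' -print   # recurse, skipping .git and vim backup/swap files
mapfile -t lines < ./somefile                                                # file into an array, one line per element
echo "${#lines[@]} lines read"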
> 100% reliable file iteration. I want to do a "for each file in this directory" in a manner that doesn't ever run into trouble with spaces, newlines or unusual characters.
In UNIX file names there are exactly two characters that can never appear: ASCII NUL ('\0') and '/', the path separator. Every other character is usual, including spaces, newlines, tabs and other historical ASCII control characters.
------
> No length limits. If I'm processing 10K files, I don't want to run into the problem that the command line is too long.
Shell does not impose length limits; the UNIX kernel does, by defining a limit on how much can be passed to the «execve» syscall which UNIX shells use to create new processes. You can find the limit on your system by running «getconf ARG_MAX» in the shell, as it varies across different systems (it varies even across different versions of the same system; in Linux, it is now reportedly 1/4 of the stack size limit, ulimit -s). Roughly speaking, the limit is consumed by the argument strings, the environment strings, and the pointer arrays that reference them.
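In practice you check the budget and let xargs batch under it (the -print0/-0 pair is a widespread extension rather than strict POSIX):
getconf ARG_MAX                                              # the per-execve budget on this system
find ./directory/ -type f -print0 | xargs -0 SOME_COMMAND    # xargs splits the work into as many invocations as needed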
> Excellent path parsing. […] finding the file extension […]
File names in UNIX do not have extensions, they simply have names. Is «.bashrc»: 1) a full and complete file name or 2) an empty file name with the «bashrc» extension? It is (1). Moreover, any valid character can be used as a separator and its interpretation is either left out (almost always), or the interpretation is left up to the semantically aware app. One is free to use a comma or the Javanese wasana pada as part of the extension; that is, «file,exec» and «file꧅ ꦆ ꧅exec» are both valid and both have the extension of «exec» – as long as the file system supports the appropriate character set. This is also why «basename», to work correctly, requires the separator as part of the imaginary «extension», i.e. «basename myfile,exec ,exec» will always give «myfile» as the result.
DOS style extensions are a made-up convention that neither the kernel, nor the shell, nor file processing utilities enforce, as they are file extension unaware. It is better to think of UNIX file names as being made up of a prefix and an optional, arbitrary-length suffix (but not an extension).
------
> Excellent process management. We're in 2022, FFS. We have 128 core CPUs. A modern shell scripting language should make it trivial to do something like: take these 50000 files, and feed them all through imagemagick, using every core available
With respect to «every core available»: you almost certainly don't want exactly one process per core; you should overprovision the number of running processes relative to the number of cores on your system.
UNIX has supported multiprocessor systems for a very long time. Despite massive improvements in the hardware performance, disks and networks are still the slowest moving (or still) parts. They were even slower when UNIX was in its relative infancy when the CPU time was also very expensive. Therefore, the CPU time had to utilised efficiently whilst waiting for a disk to return a string of bytes.
File processing tasks spend their time between: 1) waiting for I/O (in the blocked state) and 2) actually processing (the running state). Since the disk is still very slow compared to the speed of a modern CPU, the UNIX process scheduler blocks the process until I/O completes and checks the process run queue in the kernel to see if there is another process ready to compute something (i.e. in the running state). This inherent interleaving of «blocked for I/O» and «running» process states can be used to an advantage depending on a few factors.
The smaller the unit of data an app processes and the larger the total size of the input (i.e. the input and output files), the more time the app spends in the «blocked for I/O» state, and most of the CPU time is simply wasted unless there is something else to do. But if we know such specifics (the size of the unit of work and the size of the input), we can overprovision the number of processes, thereby utilising the CPU compute time more efficiently whilst the disk controller is transferring bytes into memory via direct memory access (I am oversimplifying a few bits here). This is the reason why «make -j12» will almost always compile faster than «make -j8» on an 8-core system on projects with a large number of small(-er) files – because of the I/O overhead. Whilst there is no universal formula, a 1.5x process overprovisioning factor is a decent starting point. For simpler daily file processing tasks, GNU parallel is good enough to spare oneself the headache of such computations, though.
Therefore, the «We're in 2022, FFS. We have 128 core CPUs. A modern shell scripting language should make it trivial to do something like: take these 50000 files, and feed them all through imagemagick, using every core available […]» does not make sense in the context of UNIX shell languages and the process scheduling in UNIX. In fact, you will underutilise your 128 CPU cores, sometimes pretty heavily, unless you correctly account for the I/O factor.
Only if the app/process is aware of how to efficiently parallelise its own workload (because it knows its workload better than anyone else), then and only then does it need to know how many cores are available at its disposal. No scripting language / shell can solve this problem.
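In shell terms the rule of thumb above ends up looking like this (nproc is GNU coreutils; GNU parallel accepts a percentage of the core count):
make -j"$(( $(nproc) * 3 / 2 ))"                     # ~1.5 jobs per core for I/O-heavy builds
parallel --jobs 150% convert {} {.}.jpg ::: *.png    # overprovisioned rather than one job per core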
------
> Excellent error reporting. I don't want things failing with "Command failed, aborted". I want things to fail with "Command 'git checkout https://....' exited with return code 3, and here's for good measure the stdout and stderr even if I redirected them somewhere".
The excellent error reporting has existed in UNIX since day 1. It is called the process status code; anything that is not a zero status code indicates an error. The semantic interpretation of each specific numeric status code, however, is entirely decoupled from the thing that might fail and is documented in the man page. The status code of 3 in «mv» and in «git» will mean two completely different things, therefore the specific status code processing is localised to the process invocation point in shell scripts. Most of the time, though, I personally don't want my shell script to explode with an error message from a random failed command unless it is something of extreme importance to me; checking the process exit code and acting upon it accordingly is sufficient and good enough.
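A sketch of what "localised to the process invocation point" looks like in practice (the URL is made up; the meaning of any particular status code is up to the command being run):
url='https://example.com/repo.git'
git clone "$url"
status=$?
if [ "$status" -ne 0 ]; then
  echo "git clone $url exited with status $status" >&2
  exit "$status"
fi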
Whether such an approach is a good thing or not is a matter of debate. Global lists of errors and/or error messages require an official register that is both diligently maintained and whose updates are centrally coordinated, which I don't think could work with open source.
Other operating systems have attempted to mandate error codes with complex structures and a well defined (and sometimes written by a professional technical writer!) error message. Yet they have had limited success. Yes, OS/400 running on an AS/400 could inspect a failed process' error code and automatically dispatch a message to an IBM service centre to order a spare part for a specific failing piece of hardware, whose SKU would have been deduced from a specific part of the error code without requiring human intervention, but that is somewhat of an extreme and extravagant example and is certainly not mainstream.
I've changed my mind about this sort of thing recently. I disagree with the widespread belief that one should part with shell scripts as soon as possible.
Yeah, there are definitely scripts that should eventually be rewritten in another language, but we now just have too many people who have no idea what getopts is, too many people who think bash is available everywhere, too many people who think it's a good and safe idea to put all of your program arguments in environment variables, too many people who have no idea where the POSIX specifications are and why they're useful, too many people who don't know how to simply reach for manual pages and do all their learning exclusively off Stack Overflow and Medium, etc.
You don't need advanced knowledge of all of this stuff, but when you're outright afraid of it, there's a professional problem involved. I'm not asking people to have a comprehensive knowledge of awk, but if you've avoided every opportunity to simply understand what 'set -e', 'if' and 'test'/'[' are, you're intentionally atrophying your professional development.
As you can probably guess from my username, I am among the last people who would disagree on the usefulness of common shell utils, and the benefit of knowing them.
But I still hold that one should rewrite into a proper programming language as soon as a shell script becomes too large, and that point is usually around the 100 LOC mark. Yes, I can do amazing things just with sh/bash and the coreutils. But a small Go program or Python script is just a lot more readable, at least imho, because they impose STRUCTURE, which the shell lacks on purpose: it's a CLI first, and a programming language second.
I can read through several hundred lines of Go in one sitting. When tasked with analyzing some old bash script, and `wc -l` tells me something along the lines of 400+, I grab the coffee machine and bring it to my desk.
The reality is that blaming and shaming devs for not learning something isn't effective. No matter your personal beliefs, nobody is getting fired for not knowing the intricacies of sed or bash or awk. The only options are keeping archaic artifacts around and having the elders maintain them, or improving things for everyone. I have tremendous respect for our unix/posix legacy, but at the same time they didn't get everything right. How could they?
That said, it's harder to build consensus today. Currently people are replacing shell scripts with all kinds of stuff, and that's not good either. We need something good enough and ubiquitous, imo.
This seems worth a try; some interesting design choices here!
Definitely put the shell scripting examples up front if you want to call it a shell scripting language. I had to click through ten pages before finding out how to execute a command.
I wonder how usable it would be as an interactive shell with some conveniences. For example, the prompt could pre-type a "{ }" for you and put the cursor between them, so you can immediately enter commands, or decide to backspace over the "{" if you want to write more complex expressions.
A few things I commonly do in bash where it isn't clear (or at least wasn't immediately clear from the documentation) whether you can do them in Hush:
Pass environment variables into a single command, à la `NO_COLOR=1 ./foo` in standard sh. The only workaround I see is std.export to set it, and then std.export again to unset it.
The section on expansion doesn't mention std.glob - since there are times when you want to handle globs in variables it might be worth mentioning it there.
This looks nice. I'm slightly surprised by the error handling though:
if std.type() == "error" then
It seems (a) a bit verbose and (b) a bit hacky to compare the type against the string "error". I wonder if more ergonomic error handling is something the author is planning.
The section on the try operator¹ includes an example showing the syntactic sugar to reduce the verbosity. It does feel like it might be better to introduce it earlier on, as I was feeling the same way as you until I saw that.
I think hush is going in the wrong direction. The majority of shell automation is associated with running IaC and container images / orchestration tools.
Shell scripts don't need to follow functional programming or OOP. They need to be container-oriented / VM-oriented / image-oriented, conforming to YAML notation, as this is already the adopted norm for containers and IaC.
Side note: The most straightforward way to incorporate YAML as an elevated text "type" would be to define a new type of pipeable file descriptor on all commands, specifically meant for replacing shell script options / flags.
(That way options aren't mixed with stdin)
Edit: and you get typing validation for free by running your yaml options through a yaml schema validator should you choose to do so
Really cool concept. I like the idea of scripting in a "safe" language and then dropping into a command block to do all the system level stuff. It's the best of bash and python, in a way.
I'm not sure when I would use something like this. The verbosity definitely means I wouldn't use it as my actual shell, and I don't think that is the intention. Maybe something like a big service / orchestration script where I need to do all sorts of file manipulation, command processing, etc but there's enough logic and complexity to make it annoying to do in bash.
This looks neat. The command blocks are an interesting solution to the question of how to impose more structure on things without imposing a lot of additional syntax where it isn't wanted.
For me the reason for pursuing scripting languages over sh is because of multi-platform projects that I've had to support in the past where every little shell nuance, every tool nuance has introduced bugs in one platform or the other. It's too easy to create problems and too time consuming to find them. You need special resources to test/debug and so on.
The installer and the build system were the areas that caused the most pain - e.g. trying to debug build errors because some system has a slightly different version of some tool like sed.....
I felt that including a portable language in the installer would mostly eliminate this problem since one could eliminate the use of these hard-to-standardise tools. I don't care about POSIX standard command-line tools - IMO they are an example of how to encourage the proliferation of variants by blocking progress.
Some installers include Java which is a nightmare - big and not helpful. IIRC InstallJammer included TCL which was not bad.
I think GNU make (even though it is old school, it's very commonly used) needs a shell that can work on Windows AND UNIX operating systems equally and is fast to load (i.e. not cygwin). The problem with e.g. using Python as the make shell is that it has a bigger startup time than e.g. bash, which is OK for most use cases but does matter in a build.
So I sort of like the idea of hush but I need to find out how portable it is and what's the startup time.
You might love autosetup, a Tcl-based autoconf tool chain replacement.
Not just Tcl-based, but designed to work with a simple Tcl subset shell called Jim, which compiles from one source code file. So you can bootstrap your project in a mostly-platform-independent way with a standard C compiler as the only prerequisite.
Thanks for the resource. I have been using Clojure and Babashka to bring the feeling of functional programming to the shell but have not been happy so far.
Wasn't Hush already the name of the HUmbleSHell? (Hush is a Bourne/POSIX-style shell that was originally part of BusyBox): https://github.com/sheumann/hush
Yes, I find it odd how there's a massive blind spot when it comes to powershell.
Proponents of strong typing everywhere in their programming languages are put off by a shell that dares pipe objects rather than having everything as strings.
I get that Unix has a long tradition and it's hard to change, but powershell is genuinely a modern shell that while uncomfortably verbose without aliases is extremely powerful and feels modern.
The typical verbosity argument is nonsense - people should use aliases. With a cross-platform shell, for the first time we have a reason not to use them: to make scripts more portable. Besides, this verbosity is a form of documentation - you really must consider that any bash script comes with an invisible man page that you usually have to check even after years of usage. When you factor that in, even fully verbose PowerShell is a joy.
Most of the Unix haters seem to be bigots. I have never seen a good argument against PowerShell, apart from the fact that it could be faster for specific use cases.
It seems like PowerShell occupies a different niche than sh/bash, etc. It seems more oriented towards manipulating system properties than sorting and grepping. At least, the examples are always about system objects and properties. I wouldn't even know how to integrate normal unix commands in PowerShell. Things like this look pretty counter-intuitive: https://devblogs.microsoft.com/commandline/integrate-linux-c...
Sorting and grepping in PowerShell are an every-day trivial thing.
I even use it as a data analytics tool to query moderate amounts of data, because it's just simpler than using anything else (given that ConvertFrom-/ConvertTo- Xml, Json, Csv etc. are integrated, standard, and don't require a bunch of other tools like jq, miller etc.), and I can send it to anybody along with the data to reproduce or tweak it without having to install anything or learn any new language.
> At least, the examples are always about system objects and properties
Not sure where you get that idea from. Third party examples are not important at all. Majority of usage has nothing to do with the system and its properties.
Not sure why you mentioned that particular method of invoking WSL stuff. If anything, it shows how powerful PowerShell is.
But you can - `cat` is an alias for Get-Content (one of several).
Better, use `gc`. It's shorter than `cat` and almost immediately guessable what it does if you know that aliases in pwsh are also named by a standard. "cat", on the other hand, has almost nothing to do with the original intent (I am not concatenating stuff, I am getting content), so the problem with that name is the same as with "magic numbers" in general programming.
This is first level ignorance. Which is super amusing when coming from folks that otherwise pride themselves on reading long and dry manuals.
PowerShell has aliases. And it comes with a ton of them already defined, they're even listed in the command help page. Get-Content is gc. Heck, Get-Content is even cat.
I think a lot of it boils down to knee-jerk reactions about MS products. When I speak of powershell with colleagues, their first reaction is usually "I couldn't use a non-free shell!"... Even though powershell has been MIT licensed and open source for years now.
As for the verbosity, this isn't a good argument. In my mind there are two "modes" of powershell:
* Day to day shell use, which should definitely use aliases. Nobody wants to type Get-ChildItem to get the list of files in a folder ten times in a row.
* Script writing, which should use the long form of commands. Any good text editor (e.g. vscode does it) should be able to translate aliases into long form commands.
I think they really hit the sweet spot when considering these two aspects. Long scripts are readable without referring to manpages all the time, while day to day shell is quick and easy.
This looks really cool! I think I found a few errors in the docs:
1. On https://hush-shell.github.io/intro/basic-constructs.html you say that `let x = array[5]` should panic, but the array has enough items. Perhaps you wrote that, then added more items to the array and forgot to change it?
Wouldn't the value at the fifth index be evaluated there, which is false and would hence fail?
Not sure either though
> safe_div_mod and sometimes safe_division
Aren't those different things? Safe division wouldn't use mod by default, right?
The function for safe division is missing though, but it might just be a built-in
Reading the Capture section, it seems if you want to do a command substitution you’ll always capture both stdout and stderr. Which is a showstopper if stderr is used to print progress, interactive prompts, etc.
On one hand we have been at it since 2013, so there is "more", but I would actually like to highlight the difference in OOP and functional approaches: types, inheritance, multiple dispatch.
There's no interactive shell from what I can see. How does this qualify as shell scripting?
The point of a shell is that it's how I interact with the system. The point of scripting that is to automate my interactions by storing the exact same commands I type and sometimes adding a bit of logic around them. Hush is not that.
This looks excellent - I outlined wanting something like a cross between Lua and Bash back in 2016, and at first glance, it would seem that this project by and large fits the bill: https://github.com/stuartpb/lash
> I have designed and implemented a Unix shell called scsh that is embedded inside Scheme. I had the following design goals and non-goals:
> The general systems architecture of Unix is cooperating computational agents that are realised as processes running in separate, protected address spaces, communicating via byte streams. The point of a shell language is to act as the glue to connect up these computational agents. That is the goal of scsh. I resisted the temptation to delve into other programming models. Perhaps cooperating lightweight threads communicating through shared memory is a better way to live, but it is not Unix. The goal here was not to come up with a better systems architecture, but simply to provide a better way to drive Unix. {Note Agenda}
> I wanted a programming language, not a command language, and I was unwilling to compromise the quality of the programming language to make it a better command language. I was not trying to replace use of the shell as an interactive command language. I was trying to provide a better alternative for writing shell scripts. So I did not focus on issues that might be important for a command language, such as job control, command history, or command-line editing. There are no write-only notational conveniences. I made no effort to hide the base Scheme syntax, even though an interactive user might find all the necessary parentheses irritating. (However, see section 12.)
> I wanted the result to fit naturally within Scheme. For example, this ruled out complex non-standard control-flow paradigms, such as awk's or sed's.
This is a dynamically typed language. Null makes sense there.
It also makes sense in a statically typed language with union types, think `string | undefined` in TypeScript. The extra layer added by Some/None is not fundamentally necessary for safety.
I wonder if the name of this language is a reference to Goodnight Moon. The language is inspired by Lua, and one of the occupants of the room in Goodnight Moon is a quiet old lady whispering "hush".