My biggest concern with this is that there are a lot of really terrible bash script writers out there on the internet, and the quality of code between two different answers will vary wildly.
I'm not going to claim to be a great bash script writer, but I'm better than 80% of what I see out there. Throwing together random fragments is going to cause nearly as many problems as it's meant to solve.
This is very true, and it has happened to me. Luckily I don't write anything critical in bash; there are usually better-suited people for that. I am just a big crybaby without a compiler :(
Out of curiosity, what do you currently dislike about the shell, if anything?
I'm not setting up a "you hate the shell" statement here - hashing out gripes and dislikes can sometimes be a great way to jump straight into the middle of an iterative positive feedback loop of learning.
(Or at least I think so, anyway. I suspect that doing that myself would surface the (sometimes unintuitive) blocks and glitches I have with things more quickly than I'd otherwise figure them out.)
Oh, also - note that there is a lot I outright hate about the shell myself, but which is due to its, shall we say, overly organic development over the last ~20 years.
I think it's mostly that I need to write it so rarely that I never get to become proficient "enough" with it. There is little motivation to invest hours to learn it properly, because
a) I get to write bash scripts so rarely that
b) I would have forgotten half of it by the next time I use it.
I think I'm basically stuck on the worst part of the learning curve. Btw, I use the console for pretty much everything, from the git shell to vim. But when I have to fork two processes to run in parallel, pipe X from one of them into a third that has to repeatedly curl some service and check the HTTP response, there are ten things I need to google: how to fork a process, how to pipe between processes, how to write an IF in bash, how to write a LOOP in bash, etc. The worst thing is, I've googled all of this ten times before, but I never remember the syntax.
I see. Okay, well, here's how I'd respond to what you've said.
The fact that you're already in the console puts you in a pretty good position: now you just need to stay there whenever you perform complex tasks. I have very few bash scripts myself (the little code I actually have saved is scattered everywhere; most of my scripts are on some disks that are offline at the moment). Rather, I primarily use the shell as an interactive way to let me hop to wherever it is I need to go as a sequence of steps.
Ultimately using Haskell (or Scala) will also definitely get the job done, but with that you have to create a file, hammer the code out (and write whatever boilerplate for forking and execing and piping), then save it and run it. With bash you just type something and hit enter. See what happens, hit the up arrow, edit, repeat. So it's a standard REPL, but this REPL can launch and pipe and fork in far fewer characters than other languages can.
I'm very curious about the use-case you loosely described. There are a few ways I could interpret what you said; depending on which you mean, what you describe might genuinely not be easy in bash.
If you need to fork two processes and merge their output, that is definitely not easy to do in any shell, AFAIK - you can do (proc1 & proc2) and hope for the best (see the sketch just below), but the output might get mis-buffered, and partial lines from one process can get merged with partial lines from the other (e.g., a read(4096) from one might return "hello world", a read(4096) from the other might return "...blah\nblah\nbl", and you end up with "blah\nblah\nblhello world" - I've seen this happen once when two processes were fighting over a terminal). AFAIK this is "not supposed to" occur :) but doing your own line buffering (i.e., writing a program, like you're already doing) would be critical if you needed this kind of merging.
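For concreteness, here's a minimal sketch of that "hope for the best" approach - gen1 and gen2 are hypothetical stand-ins for whatever two programs you're running, and stdbuf is the GNU coreutils tool (line buffering only makes torn lines less likely, it doesn't guarantee anything):

    # Naive merge: both processes share one stdout, so partial lines can interleave.
    ( gen1 & gen2 & wait )

    # Somewhat safer: ask each process to flush whole lines at a time (works for
    # stdio-based programs); you still want a real program if the merge must be exact.
    ( stdbuf -oL gen1 & stdbuf -oL gen2 & wait )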
If you're just doing simple things like running a program that outputs a list of URLs, doing some simple testing on that list, and then doing something else based on the results, that's quite easy.
Here's a highly-contrived script that
- gets a series of URLs
- directly pings all HTTPS URLs via curl and collects the output in output1.txt
- pings all HTTP URLs against https://example.com and collects the output in output2.txt
- parses the contents of output1.txt as a series of comma-delimited lines, then looks for lines that contain "URL" after the 2nd comma, and looks for a URL after the 1st comma on the next immediate line
- runs 50 copies of wget in parallel (xargs will start new wgets as old ones finish; sketched after this list) on the contents of output2.txt (warning: this will result in a thoroughly messed-up terminal if you ever do this, but it works great!)
- the sed command first grabs HTTPS URLs (\,^https://,{...}), prints them directly (p), then jumps to the :1 label at the end (b1); if the {} block doesn't fire, the ,^https://, match didn't work, so the (w /dev/stderr) runs, which does what you'd think (GNU sed special-cases writes to /dev/stderr; on other UNIXes that path - or /proc/self/fd/2 - needs to exist for this to work). A minimal sketch of the pattern is below the list.
- 2> >( ... ) is a bash process-substitution thingy that spits all of stderr (fd 2) into a subshell. (A related gotcha: in "command | while ...; do x=1; done" situations you'll lose your "x=1" on the other side of the pipe, because the while runs in a subshell. Doing "while ...; do x=1; done < <(command)" keeps your "x=1" accessible! Both are sketched after this list.)
- < <(...) is a way to redirect the output of a subshell into the input of another process - the mirror image of the 2> >(...) form above.
- The "2> >(...) | while ... done" structure is simply the way I've expressed pushing stderr into the subshell while pushing stdout into a pipe.
- "while IFS=$',' read" is the way you get `read` to use a different internal field separator. As an aside, I often use "IFS=$'\n' x=($(ls -1))" as a way to fix the "everything in the array is one of the words in the line" problem, as well as similar "unfixable" issues when bash is parsing input with tabs and spaces in it, or when you're using "read -a" to split things into arrays.
- The "if (u == 1) ... done; if (this_is_a_url) u=1" is a standard structure you've probably used elsewhere that implements the "look for URLs on the next line" behavior.
- Nobody's checked the git access logs, but I'm personally pretty sure bash's regex functionality was checked in from a cave full of dragons :) - besides the \/\"\/ picket-fence issues it often has its own ideas about matches. You can't single- or double-quote the pattern either: quoted parts get matched as literal strings rather than as a regex (:D :D). On bad days just pipe everything through "awk '/^https:\/\//{ print $1 $2 $3 ...; }'" or similar (also sketched below - yes that command needs editing, but you get the idea).
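Since the bullets above are hard to picture without code, here are a few tiny sketches. First, the sed split - this is not the original script, just the pattern the bullet describes (GNU sed assumed; some_url_source is a hypothetical command that prints one URL per line):

    # Lines matching ^https:// are printed (p) and we branch past the rest (b);
    # everything else falls through to the w command, which appends to /dev/stderr.
    some_url_source | sed -n '\,^https://,{p;b}; w /dev/stderr'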
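Next, the redirection plumbing: 2> >(...) to push stderr into a subshell, and why < <(...) keeps your variables where a plain pipe would not (make_noise is a placeholder for any command that writes to both streams):

    # stderr goes to a subshell consumer while stdout goes down a normal pipe.
    make_noise 2> >(sed 's/^/ERR: /' >&2) | sed 's/^/OUT: /'

    # Variables set on the right-hand side of a pipe vanish: the while loop runs
    # in a subshell, so count is still 0 afterwards.
    count=0
    printf 'a\nb\n' | while read -r line; do count=$((count + 1)); done
    echo "$count"   # prints 0

    # Feeding the loop from a process substitution keeps it in the current shell.
    count=0
    while read -r line; do count=$((count + 1)); done < <(printf 'a\nb\n')
    echo "$count"   # prints 2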
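Then the comma-splitting and the "URL is on the next line" flag, reading the output1.txt from the bullets above (the field names f1..f3 are just placeholders):

    # Split each line on commas; when field 3 says URL, remember it and treat
    # field 2 of the *next* line as the URL we're after.
    u=0
    while IFS=',' read -r f1 f2 f3 rest; do
      if [ "$u" = 1 ]; then
        echo "found url: $f2"
        u=0
      fi
      if [ "$f3" = "URL" ]; then
        u=1
      fi
    done < output1.txt

    # Related trick: split command output into an array by lines instead of words
    # (note this leaves IFS changed for the rest of the script).
    IFS=$'\n' files=($(ls -1))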
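And finally the parallel fetching plus the regex escape hatch (GNU xargs and awk assumed; output1.txt and output2.txt are the files from the bullets above):

    # Run up to 50 wgets at once; xargs starts a new one whenever an old one exits.
    xargs -n 1 -P 50 wget < output2.txt

    # bash's =~ wants an *unquoted* extended regex; quoted parts match literally.
    line='https://example.com/x'
    if [[ $line =~ ^https:// ]]; then
      echo "looks like an https URL"
    fi

    # When =~ gets annoying, awk is often the easier hammer.
    awk -F, '/^https:\/\//{ print $1, $2 }' output1.txt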
If you want to explain the use-case you described in more detail, I could probably take a crack at figuring it out. I kind of went off on a very random tangent here, but it was fun.
Well it's 51 minutes later, and I got to the third bullet point :D
EDIT: I tried to focus on readability here. Things could be quite a bit more concise, but that requires a firmer grasp of the interactions between >>=, <$>, and point-free functions.
Here's what I have so far:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Prelude hiding (appendFile)
import Data.Monoid ((<>))
import Network.Wreq
import Control.Lens
import Control.Concurrent.Async
import Data.ByteString.Lazy (ByteString, appendFile)
main = do
  urls <- promptUserURLs
  responses <- mapConcurrently get urls
  let responseBodys = fmap (view responseBody) responses
      responseInfos = fmap (\(url,body) -> (take 5 url, body)) (zip urls responseBodys)
      httpUrls = filter (\(url,_) -> url == "http:") responseInfos
      httpsUrls = filter (\(url,_) -> url == "https") responseInfos
  mapM_ (appendFile "output1.txt") httpUrls
  exampleSiteResponses <- mapConcurrently get httpUrls
  let exampleSiteResponses' = fmap (view responseBody) exampleSiteResponses
  mapM_ (appendFile "output2.txt") exampleSiteResponses

promptUserURLs :: IO [String]
promptUserURLs = go []
  where go :: [String] -> IO [String]
        go urls = do
          line <- getLine
          if line == ("q" :: String)
            then return urls
            else go (urls <> [line]) -- lol, this is a very bad way to snoc
Huh. So Haskell is the language of the missing apostrophe instead of the missing semicolon. (In all fairness, so is Lisp, but it puts the apostrophes on the other side.)
I'm curious about "(zip urls response" though. Does that extend to the "responseInfos)" on the next line, or am I reading it wrong?
(I noticed the "fmap fst" and "fmap snd".)
Wonders where to start to figure out how this works
Yep, I've missed apostrophes quite a few times. It was a common cause of infinite loops for me in the past, but I suppose I've trained myself to spot them more quickly.
Perhaps zip can be more simply explained by a simple example: `zip [1,2,3] "abc"` gives `[(1,'a'),(2,'b'),(3,'c')]` - it pairs up elements from the two lists and stops at the end of the shorter one.
Hopefully the ticked version is a different type than the unticked version and would be caught by the typechecker. Failing that, hopefully the ticked version is now unused and will be caught by the linter. But yeah, this is definitely a thing that can happen.
If you're doing a lot of successive transformations of data of the same type, putting it in State will prevent you from messing up the ordering (or needing to come up with fresh names).
And surprisingly, I can actually follow it. (I have that thing where people say "yeah I can read this but don't expect me to write it" right now though...)
This is very interesting. Does Haskell not have reasonably easy-to-use EOF indication?
I'm reasonably impressed this is as short as it is; being able to shove everything into maps (and get concurrency for free) is nice.
I can understand why Haskell is less widely used though; this is... OH, "take 5" is substring, that's why you did "http:" and "https"!
I made a more compact version closer to what I might write in a hurry.
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Prelude hiding (appendFile)
import Data.Monoid ((<>))
import Network.Wreq
import Control.Lens
import Control.Concurrent.Async
import Data.ByteString.Lazy (ByteString, appendFile)
import Data.List
main = do
  urls <- promptUserURLs
  responses <- mapConcurrently get urls
  let (httpInfo, httpsInfo) = partition ((== "http:") . take 5 . fst) (zip urls responses)
  mapM_ (appendFile "output1.txt" . view responseBody . snd) httpsInfo
  mapConcurrently (get . fst) httpInfo >>= mapM_ (appendFile "output2.txt" . view responseBody)

promptUserURLs :: IO [String]
promptUserURLs = go []
  where go :: [String] -> IO [String]
        go urls = do
          line <- getLine
          if line == ("q" :: String)
            then return urls
            else go (urls <> [line]) -- lol, this is a very bad way to snoc
NOTE: To make this faster I keep a custom library that re-exports these common pieces, so for me those 7 lines of imports collapse into one. It also already has something like promptUserURLs in it. In that case it looks like:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import QuickScriptLibrary
main = do
  urls <- promptUserURLs
  responses <- mapConcurrently get urls
  let (httpInfo, httpsInfo) = partition ((== "http:") . take 5 . fst) (zip urls responses)
  mapM_ (appendFile "output1.txt" . view responseBody . snd) httpsInfo
  mapConcurrently (get . fst) httpInfo >>= mapM_ (appendFile "output2.txt" . view responseBody)
OverloadedStrings is an extension that basically puts `fromString` in front of every string literal. This means that instead of having type `String`, string literals have type `IsString a => a` (meaning "this literal can stand in for any type that implements `IsString`"). It's used here so that `"http:"` can be used as a `ByteString` without cluttering the code up with an explicit `fromString`.
I definitely have no excuse; I need to go find a decent Haskell reference-tutorial-manual. Know of any good ones for people with no prior experience with functional programming (and ADHD :P)? I'm aware of Learn You a Haskell, just wondered what your experiences were.
Oh, okay. And no problem :) I'm very happy to hear that I helped a bit ^^
(Bash won't solve every problem (particularly tooling and things that might evolve over time; see also GNU autoconf :D), but it is awesome when it can be used to speed interactive tasks up.)