My biggest concern with this is that there are a lot of really terrible bash script writers out there on the internet, and the quality of code between two different answers will vary wildly.
I'm not going to claim to be a great bash script writer, but I'm better than 80% of what I see out there. Throwing together random fragments is going to cause nearly as many problems as it's meant to solve.
This is very true, and it has happened to me. Luckily I don't write anything critical in bash; there are usually better-suited people for that. I am just a big crybaby without a compiler :(
Out of curiosity, what do you currently dislike about the shell, if anything?
I'm not setting up a "you hate the shell" statement here - hashing out gripes and dislikes can sometimes be a great way to jump straight into the middle of an iterative positive feedback loop of learning.
(Or at least I think so, anyway. I suspect that doing that myself would surface the (sometimes unintuitive) blocks and glitches I have with things more quickly than I'd otherwise figure them out.)
Oh, also - note that there is a lot I outright hate about the shell myself, but which is due to its, shall we say, overly organic development over the last ~20 years.
I think it's mostly that I need to write it so rarely that I never get to become proficient "enough" with it. There is little motivation to invest hours to learn it properly, because
a) I get to write bash scripts so rarely that
b) I would have forgotten half of it by the next time I use it.
I think I'm basically stuck on the worst part of the learning curve. Btw, I use the console for pretty much everything, from the git shell to vim. But when I have to fork two processes to run in parallel, pipe X from one of them into a third that has to repeatedly curl some service and check the HTTP response, there are ten things I need to google: how to fork a process, how to pipe between processes, how to write an IF in bash, how to write a LOOP in bash, etc. The worst thing is, I've googled all of this ten times before, but I never remember the syntax.
I see. Okay, well, here's how I'd respond to what you've said.
The fact that you're already in the console puts you in a pretty good position: now you just need to stay there whenever you perform complex tasks. I have very few bash scripts myself (the little code I actually have saved is scattered everywhere; most of my scripts are on some disks that are offline at the moment). Rather, I primarily use the shell as an interactive way to let me hop to wherever it is I need to go as a sequence of steps.
Ultimately using Haskell (or Scala) will also definitely get the job done, but with that you have to create a file, hammer the code out (and write whatever boilerplate for forking and execing and piping), then save it and run it. With bash you just type something and hit enter. See what happens, hit the up arrow, edit, repeat. So it's a standard REPL, but this REPL can launch and pipe and fork in far fewer characters than other languages can.
I'm very curious about the use-case you loosely described. There are a few ways I could interpret what you said; depending on which you mean, what you describe might genuinely not be easy in bash.
If you need to fork two processes and merge their output, that is definitely not easy to do in any shell, AFAIK - you can do (proc1 & proc2) and hope for the best (see the sketch just below), but the output might get mis-buffered, and partial lines from one process can get merged with partial lines from the other (e.g., a read(4096) from one might return "hello world", a read(4096) from the other might return "...blah\nblah\nbl", and you end up with "blah\nblah\nblhello world" - I've seen this happen once when two processes were fighting over a terminal). AFAIK this is "not supposed to" occur :) but doing your own line buffering (i.e., writing a program, like you're already doing) would be critical if you needed this kind of merging.
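For concreteness, here's a minimal sketch of that "hope for the best" approach - gen1 and gen2 are hypothetical stand-ins for whatever two programs you're running, and stdbuf is the GNU coreutils tool (line buffering only makes torn lines less likely, it doesn't guarantee anything):

    # Naive merge: both processes share one stdout, so partial lines can interleave.
    ( gen1 & gen2 & wait )

    # Somewhat safer: ask each process to flush whole lines at a time (works for
    # stdio-based programs); you still want a real program if the merge must be exact.
    ( stdbuf -oL gen1 & stdbuf -oL gen2 & wait )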
If you're just doing simple things like running a program that outputs a list of URLs, doing some simple testing on that list, and then doing something else based on the results, that's quite easy.
Here's a highly-contrived script that
- gets a series of URLs
- directly pings all HTTPS URLs via curl and collects the output in output1.txt
- pings all HTTP URLs against https://example.com and collects the output in output2.txt
- parses the contents of output1.txt as a series of comma-delimited lines, then looks for lines that contain "URL" after the 2nd comma, and looks for a URL after the 1st comma on the next immediate line
- runs 50 copies of wget in parallel (xargs will start new wgets as old ones finish; sketched after this list) on the contents of output2.txt (warning: this will result in a thoroughly messed-up terminal if you ever do this, but it works great!)
- the sed command first grabs HTTPS URLs (\,^https://,{...}), prints them directly (p), then jumps to the :1 label at the end (b1); if the {} block doesn't fire, the ,^https://, match didn't work, so the (w /dev/stderr) runs, which does what you'd think (GNU sed special-cases writes to /dev/stderr; on other UNIXes that path - or /proc/self/fd/2 - needs to exist for this to work). A minimal sketch of the pattern is below the list.
- 2> >( ... ) is a bash process-substitution thingy that spits all of stderr (fd 2) into a subshell. (A related gotcha: in "command | while ...; do x=1; done" situations you'll lose your "x=1" on the other side of the pipe, because the while runs in a subshell. Doing "while ...; do x=1; done < <(command)" keeps your "x=1" accessible! Both are sketched after this list.)
- < <(...) is a way to redirect the output of a subshell into the input of another process - the mirror image of the 2> >(...) form above.
- The "2> >(...) | while ... done" structure is simply the way I've expressed pushing stderr into the subshell while pushing stdout into a pipe.
- "while IFS=$',' read" is the way you get `read` to use a different internal field separator. As an aside, I often use "IFS=$'\n' x=($(ls -1))" as a way to fix the "everything in the array is one of the words in the line" problem, as well as similar "unfixable" issues when bash is parsing input with tabs and spaces in it, or when you're using "read -a" to split things into arrays.
- The "if (u == 1) ... done; if (this_is_a_url) u=1" is a standard structure you've probably used elsewhere that implements the "look for URLs on the next line" behavior.
- Nobody's checked the git access logs, but I'm personally pretty sure bash's regex functionality was checked in from a cave full of dragons :) - besides the \/\"\/ picket-fence issues it often has its own ideas about matches. You can't single- or double-quote the pattern either: quoted parts get matched as literal strings rather than as a regex (:D :D). On bad days just pipe everything through "awk '/^https:\/\//{ print $1 $2 $3 ...; }'" or similar (also sketched below - yes that command needs editing, but you get the idea).
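Since the bullets above are hard to picture without code, here are a few tiny sketches. First, the sed split - this is not the original script, just the pattern the bullet describes (GNU sed assumed; some_url_source is a hypothetical command that prints one URL per line):

    # Lines matching ^https:// are printed (p) and we branch past the rest (b);
    # everything else falls through to the w command, which appends to /dev/stderr.
    some_url_source | sed -n '\,^https://,{p;b}; w /dev/stderr'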
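Next, the redirection plumbing: 2> >(...) to push stderr into a subshell, and why < <(...) keeps your variables where a plain pipe would not (make_noise is a placeholder for any command that writes to both streams):

    # stderr goes to a subshell consumer while stdout goes down a normal pipe.
    make_noise 2> >(sed 's/^/ERR: /' >&2) | sed 's/^/OUT: /'

    # Variables set on the right-hand side of a pipe vanish: the while loop runs
    # in a subshell, so count is still 0 afterwards.
    count=0
    printf 'a\nb\n' | while read -r line; do count=$((count + 1)); done
    echo "$count"   # prints 0

    # Feeding the loop from a process substitution keeps it in the current shell.
    count=0
    while read -r line; do count=$((count + 1)); done < <(printf 'a\nb\n')
    echo "$count"   # prints 2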
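Then the comma-splitting and the "URL is on the next line" flag, reading the output1.txt from the bullets above (the field names f1..f3 are just placeholders):

    # Split each line on commas; when field 3 says URL, remember it and treat
    # field 2 of the *next* line as the URL we're after.
    u=0
    while IFS=',' read -r f1 f2 f3 rest; do
      if [ "$u" = 1 ]; then
        echo "found url: $f2"
        u=0
      fi
      if [ "$f3" = "URL" ]; then
        u=1
      fi
    done < output1.txt

    # Related trick: split command output into an array by lines instead of words
    # (note this leaves IFS changed for the rest of the script).
    IFS=$'\n' files=($(ls -1))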
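And finally the parallel fetching plus the regex escape hatch (GNU xargs and awk assumed; output1.txt and output2.txt are the files from the bullets above):

    # Run up to 50 wgets at once; xargs starts a new one whenever an old one exits.
    xargs -n 1 -P 50 wget < output2.txt

    # bash's =~ wants an *unquoted* extended regex; quoted parts match literally.
    line='https://example.com/x'
    if [[ $line =~ ^https:// ]]; then
      echo "looks like an https URL"
    fi

    # When =~ gets annoying, awk is often the easier hammer.
    awk -F, '/^https:\/\//{ print $1, $2 }' output1.txt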
If you want to explain the use-case you described in more detail, I could probably take a crack at figuring it out. I kind of went off on a very random tangent here, but it was fun.
Well it's 51 minutes later, and I got to the third bullet point :D
EDIT: I tried to focus on readability here. Things could be quite a bit more concise, but that requires a firmer grasp of the interactions between >>=, <$>, and point-free functions.
Here's what I have so far:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Prelude hiding (appendFile)
import Data.Monoid ((<>))
import Network.Wreq
import Control.Lens
import Control.Concurrent.Async
import Data.ByteString.Lazy (ByteString, appendFile)
main = do
  urls <- promptUserURLs
  responses <- mapConcurrently get urls
  let responseBodys = fmap (view responseBody) responses
      responseInfos = fmap (\(url,body) -> (take 5 url, body)) (zip urls responseBodys)
      httpUrls = filter (\(url,_) -> url == "http:") responseInfos
      httpsUrls = filter (\(url,_) -> url == "https") responseInfos
  mapM_ (appendFile "output1.txt") httpUrls
  exampleSiteResponses <- mapConcurrently get httpUrls
  let exampleSiteResponses' = fmap (view responseBody) exampleSiteResponses
  mapM_ (appendFile "output2.txt") exampleSiteResponses

promptUserURLs :: IO [String]
promptUserURLs = go []
  where go :: [String] -> IO [String]
        go urls = do
          line <- getLine
          if line == ("q" :: String)
            then return urls
            else go (urls <> [line]) -- lol, this is a very bad way to snoc
Huh. So Haskell is the language of the missing apostrophe instead of the missing semicolon. (In all fairness, so is Lisp, but it puts the apostrophes on the other side.)
I'm curious about "(zip urls response" though. Does that extend to the "responseInfos)" on the next line, or am I reading it wrong?
(I noticed the "fmap fst" and "fmap snd".)
Wonders where to start to figure out how this works
Yep, I've missed apostrophes quite a few times. It was a common cause of infinite loops for me in the past, but I suppose I've trained myself to spot them more quickly.
Perhaps zip can be more simply explained by a simple example: `zip [1,2,3] "abc"` gives `[(1,'a'),(2,'b'),(3,'c')]` - it pairs up elements from the two lists and stops at the end of the shorter one.
Hopefully the ticked version is a different type than the unticked version and would be caught by the typechecker. Failing that, hopefully the ticked version is now unused and will be caught by the linter. But yeah, this is definitely a thing that can happen.
If you're doing a lot of successive transformations of data of the same type, putting it in State will prevent you from messing up the ordering (or needing to come up with fresh names).
And surprisingly, I can actually follow it. (I have that thing where people say "yeah I can read this but don't expect me to write it" right now though...)
This is very interesting. Does Haskell not have reasonably easy-to-use EOF indication?
I'm reasonably impressed this is as short as it is; being able to shove everything into maps (and get concurrency for free) is nice.
I can understand why Haskell is less widely used though; this is... OH, "take 5" is substring, that's why you did "http:" and "https"!
I made a more compact version closer to what I might write in a hurry.
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Prelude hiding (appendFile)
import Data.Monoid ((<>))
import Network.Wreq
import Control.Lens
import Control.Concurrent.Async
import Data.ByteString.Lazy (ByteString, appendFile)
import Data.List
main = do
  urls <- promptUserURLs
  responses <- mapConcurrently get urls
  let (httpInfo, httpsInfo) = partition ((== "http:") . take 5 . fst) (zip urls responses)
  mapM_ (appendFile "output1.txt" . view responseBody . snd) httpsInfo
  mapConcurrently (get . fst) httpInfo >>= mapM_ (appendFile "output2.txt" . view responseBody)

promptUserURLs :: IO [String]
promptUserURLs = go []
  where go :: [String] -> IO [String]
        go urls = do
          line <- getLine
          if line == ("q" :: String)
            then return urls
            else go (urls <> [line]) -- lol, this is a very bad way to snoc
NOTE: To make this faster I keep a custom library that re-exports these common pieces, so for me those 7 lines of imports collapse into one. It also already has something like promptUserURLs in it. In that case it looks like:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import QuickScriptLibrary
main = do
  urls <- promptUserURLs
  responses <- mapConcurrently get urls
  let (httpInfo, httpsInfo) = partition ((== "http:") . take 5 . fst) (zip urls responses)
  mapM_ (appendFile "output1.txt" . view responseBody . snd) httpsInfo
  mapConcurrently (get . fst) httpInfo >>= mapM_ (appendFile "output2.txt" . view responseBody)
OverloadedStrings is an extension that basically puts `fromString` in front of every string literal. This means that instead of having type `String`, string literals have type `IsString a => a` (meaning "this literal can stand in for any type that implements `IsString`"). It's used here so that `"http:"` can be used as a `ByteString` without cluttering the code up with an explicit `fromString`.
I definitely have no excuse; I need to go find a decent Haskell reference-tutorial-manual. Know of any good ones for people with no prior experience with functional programming (and ADHD :P)? I'm aware of Learn You a Haskell, just wondered what your experiences were.
Oh, okay. And no problem :) I'm very happy to hear that I helped a bit ^^
(Bash won't solve every problem (particularly tooling and things that might evolve over time; see also GNU autoconf :D), but it is awesome when it can be used to speed interactive tasks up.)