
This is an interesting way of writing a quick and dirty script, completely different from how I would have written it (I would definitely have used a lot more subshells, temporary variables, escaped characters in the commands, "#" instead of "/" as the sed delimiter, and a lot less sed magic).

In case anyone is curious, since it took me a while to decipher (and I consider myself quite familiar with shell scripting!):

  read X Y
Read a line from stdin into X and Y. I'm honestly not sure why Y is read here, as it's unset and redefined on the next line.

  unset Y;Y=${X##*/};  
Unset Y, then use parameter expansion to match the regex "*/" greedily, delete the match from the string, and assign the result to Y. Also I believe this form of expansion is a "bashism", so it would not work in a strict POSIX shell :)

  echo "$X" \
Echo X so it is piped to the stdin of the next command

  |sed 's/^/url=/' \
Use sed to prepend url= to the string

  |curl -4sK- \  
Connect over IPv4 (-4), suppress the progress meter and error messages (-s, silent) and read the config from stdin (-K-). Apparently curl supports reading what to fetch from a config file; I was not aware of that.
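A minimal POSIX sh sketch of what the first two stages produce: a one-line curl config that curl then reads from stdin. The helper names `to_curl_config` and `fetch_page` and the sample URL are hypothetical. `fetch_page` is only defined, not run, since it needs network access; for URLs without whitespace it should behave roughly like `curl -4s "$1"`.

```sh
# Hypothetical helper: turn a URL into a one-line curl config ("url=..."),
# exactly as `sed 's/^/url=/'` does in the pipeline.
to_curl_config() {
  printf '%s\n' "$1" | sed 's/^/url=/'
}

# Hypothetical helper: fetch quietly over IPv4, with curl reading its
# options from stdin (-K-). Defined here but not called.
fetch_page() {
  to_curl_config "$1" | curl -4sK-
}

to_curl_config 'https://www.docdroid.net/abc123/example-pdf'
# prints: url=https://www.docdroid.net/abc123/example-pdf
```

The unquoted `url=` form only works while the URL has no whitespace; per the curl docs, a config value containing whitespace has to be enclosed in double quotes.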

   |tr -d '\134' \
Delete all backslashes from the input. "\<number>" is the octal ASCII code of a character (octal 134 is 92, the backslash).
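Assuming the page embeds JSON in which "/" is escaped as "\/" and "&" as "\u0026" (an assumption about docdroid's markup, suggested by the "u0026" the next stage looks for), deleting every backslash turns "\u0026" into a bare "u0026". The helper name `strip_backslashes` and the input fragment below are invented for illustration.

```sh
# Hypothetical helper: delete every backslash (octal 134), as `tr -d '\134'` does.
strip_backslashes() {
  printf '%s\n' "$1" | tr -d '\134'
}

# Invented JSON-escaped fragment; single quotes keep the backslashes literal.
strip_backslashes '"uri":"https:\/\/cdn.example.com\/f.pdf?a=1\u0026b=2"'
# prints: "uri":"https://cdn.example.com/f.pdf?a=1u0026b=2"
```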

   |sed -n 's/u0026/\&/g;s/.*\"application\/pdf\",\"uri\":\"//;s/\".*//;s/https:/url=&/p' \
Several substitutions, without printing the output automatically (-n). Replace every "u0026" with "&" (U+0026 is the Unicode code point for the ampersand; not sure why, but docdroid uses that in their page). Select the line that matches '.*"application/pdf","uri":"'. Match "https:", prepend "url=" to it, and print it.

  |curl -4o "$Y".pdf -K- 
As above, read the config from stdin (-K-), this time saving the download to $Y.pdf (-o), over IPv4.



   read X Y
"Read a line from stdin into X and Y. I'm honestly not sure why Y is read here, as it's unset and redefined on the next line."

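You can drop the Y, but then if you unintentionally have anything after the URL on stdin, the script will break. read puts the first whitespace-separated field in X and everything left over in the last variable, so Y soaks up any trailing junk and X stays a clean URL.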
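A minimal POSIX sh sketch of that behaviour. The helper names `first_field` and `whole_line` and the sample input are made up for illustration.

```sh
# Hypothetical helper: print what `read X Y` stores in X (the first field).
first_field() {
  printf '%s\n' "$1" | {
    read X Y
    printf '%s\n' "$X"
  }
}

# Hypothetical helper: print what plain `read X` stores in X (the whole line).
whole_line() {
  printf '%s\n' "$1" | {
    read X
    printf '%s\n' "$X"
  }
}

# Invented input: a URL followed by accidental trailing text.
line='https://www.docdroid.net/abc123/example-pdf oops extra'
first_field "$line"   # prints: https://www.docdroid.net/abc123/example-pdf
whole_line "$line"    # prints: https://www.docdroid.net/abc123/example-pdf oops extra
```

Without the spare variable, the trailing text would end up in X and from there in the "url=" line handed to curl.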

   unset Y;Y=${X##*/}; 
"Unset Y, then use parameter expansion to match the regex "*/" greedily, delete the match from the string, and assign the result to Y. Also I believe this form of expansion is a "bashism", so it would not work in a strict POSIX shell :)"

There is no regex. This is globbing (shell pattern matching) inside what is called "Parameter Expansion", a POSIX shell feature. ${X##*/} removes the longest prefix matching */, i.e., everything up to and including the last "/".
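A small sketch contrasting the longest-prefix and shortest-prefix forms. The helper names `url_basename` and `strip_to_first_slash` and the URL are hypothetical; both expansions are standard POSIX, so they should behave the same in ash, dash and bash.

```sh
# Hypothetical helper: ${1##*/} removes the longest prefix matching the glob */,
# i.e. everything up to and including the last "/".
url_basename() { printf '%s\n' "${1##*/}"; }

# Hypothetical helper: ${1#*/} removes the shortest such prefix (up to the first "/").
strip_to_first_slash() { printf '%s\n' "${1#*/}"; }

X='https://www.docdroid.net/abc123/example-pdf'   # invented URL
url_basename "$X"          # prints: example-pdf
strip_to_first_slash "$X"  # prints: /www.docdroid.net/abc123/example-pdf
```

One consequence: if the URL ends in "/", ${X##*/} is empty and the download would be saved as just ".pdf".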

I am not a bash user. I use NetBSD ash as both the interactive and scripting shell.

   |curl -4sK- \ 
"Connect over IPv4 (-4), suppress the progress meter and error messages (-s, silent) and read the config from stdin (-K-). Apparently curl supports reading what to fetch from a config file; I was not aware of that."

I only use curl in HN examples. I do not use it otherwise, so the -K- option is just a stupid hack to make curl behave more like the programs I actually use: yy025, nc and so on.

    |sed -n 's/u0026/\&/g;s/.*\"application\/pdf\",\"uri\":\"//;s/\".*//;s/https:/url=&/p' \
"Several substitutions, without printing the output automatically (-n). Replace every "u0026" with "&" (U+0026 is the Unicode code point for the ampersand; not sure why, but docdroid uses that in their page). Select the line that matches '.*"application/pdf","uri":"'. Match "https:", prepend "url=" to it, and print it."

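This does not select the line that matches the pattern. It is applied to every line of the input, deleting everything up to and including "application/pdf","uri":" wherever that occurs. It then deletes everything from the first double quote onward, again on every line. Because of -n, only lines where the final s/https:/url=&/ substitution succeeds are printed.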
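A sketch that runs the same sed program over a few invented lines, to show the behaviour just described. The function name `extract_pdf_config` and the input are hypothetical, and the unneeded \" escapes are dropped.

```sh
# Hypothetical helper: the thread's sed program, reading page text on stdin
# and printing "url=..." config lines for curl.
extract_pdf_config() {
  sed -n 's/u0026/\&/g;s/.*"application\/pdf","uri":"//;s/".*//;s/https:/url=&/p'
}

# Invented page text, as it might look after tr has removed the backslashes.
printf '%s\n' \
  '<html>' \
  '<a href="https://www.example.com/">home</a>' \
  'var doc = {"type":"application/pdf","uri":"https://cdn.example.com/f.pdf?a=1u0026b=2","size":1};' \
  | extract_pdf_config
# prints: url=https://cdn.example.com/f.pdf?a=1&b=2
```

The <a href=...> line shows the quote deletion at work on a non-matching line: it is cut down to `<a href=`, no longer contains "https:", and is not printed. The flip side is that a line containing "https:" before any double quote, such as `plain https://x.example/ text`, would be printed as `plain url=https://x.example/ text`.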

Yes, I escaped the forward slashes. That adds some characters. Sometimes when dealing with URLs as input I will use a character that is not permitted in URLs as a separator, such as "<" or ">". As a matter of course, I do not use "#" as a separator because, for me, it makes inline sed comments prefixed with "#" more difficult to distinguish. I also superfluously escaped the double quotes out of habit. No need in this case.
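A hedged sketch of the "<" delimiter idea applied to the same sed program. POSIX sed accepts any character other than backslash or newline as the s delimiter, and "<" cannot appear unencoded in a URL, so the slash in "application/pdf" no longer needs escaping. The function name is hypothetical.

```sh
# Hypothetical helper: same program as above, with "<" as the s delimiter.
extract_pdf_config_lt() {
  sed -n 's<u0026<\&<g;s<.*"application/pdf","uri":"<<;s<".*<<;s<https:<url=&<p'
}

printf '%s\n' '{"type":"application/pdf","uri":"https://cdn.example.com/f.pdf","size":1}' \
  | extract_pdf_config_lt
# prints: url=https://cdn.example.com/f.pdf
```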



