This is an interesting way of writing a quick and dirty script, completely different from how I would have written it (I would definitely have used a lot more subshells and temporary variables, escaped the characters in the commands, used "#" in sed instead of "/", and relied on a lot less sed magic).
In case anyone is curious, here is a breakdown, since it took me a bit to decipher (and I consider myself quite familiar with shell scripting!):
read X Y
Read stdin into X and Y. I'm honestly not sure why Y is read here, as it's unset and redefined on the next line?
unset Y;Y=${X##*/};
Unset Y, use parameter expansion to match the regex "/" greedily and delete it from the string, then assign the result to Y. Also, I believe that using "" here is a "bashism", so it would not work on a strict POSIX shell :)
echo "$X" \
Echo X to be piped as stdin in the next command
|sed 's/^/url=/' \
Use sed to prepend url= to the string
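To make this step concrete, here is a minimal sketch of what the sed call produces. The URL is a made-up placeholder, not one from the original script. The resulting line has the `url=...` form that curl accepts in a config file.

```shell
# Hypothetical URL, used only for illustration.
X='https://example.com/d/abc123'

# Prepend "url=" to the start of the line, giving one line of curl config syntax.
config=$(echo "$X" | sed 's/^/url=/')

echo "$config"
```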
|curl -4sK- \
Connect over IPv4 (-4), don't show the progress meter or error messages (-s, silent), and read the config from stdin (-K-). Apparently curl supports reading what to fetch from a config file; I was not aware of that.
|tr -d '\134' \
Delete all backslashes from the input. "\<number>" is the octal ASCII code of the character.
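A small sketch of the backslash removal, assuming the page embeds URLs with JSON-style escaped slashes. The input string is invented for illustration.

```shell
# Hypothetical input: a URL whose slashes are escaped with backslashes.
escaped='https:\/\/example.com\/a.pdf'

# \134 is the octal code of the backslash, so tr -d removes every backslash.
clean=$(printf '%s\n' "$escaped" | tr -d '\134')

printf '%s\n' "$clean"
```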
Several substitutions, and don't echo the output automatically (-n).
Replace all occurrences of "u0026" with "&" (U+0026 is the Unicode code point of the ampersand; not sure why, but docdroid uses that in their page).
Select the line that matches '.*"application/pdf","uri":"'
Match "https:" and prepend it with "url=", and print it
curl -4o "$Y".pdf -K-
As above, read the config from stdin (-K-) and download the output to $Y.pdf (-o) over IPv4.
"Read stdin into X and Y. I'm honestly not sure why Y is read here, as it's unset and redefined on the next line?"
You can drop the Y, but then if you unintentionally have anything after the URL on stdin, the script will break.
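The effect of the extra variable can be sketched as follows. The input line is a hypothetical example with stray words after the URL.

```shell
# Hypothetical input line: a URL followed by stray trailing words.
input='https://example.com/d/abc123 stray words'

# With two variables, X gets the first field and Y absorbs everything else.
read X Y <<EOF
$input
EOF
two_var_X=$X

# With one variable, X receives the whole line, trailing words included.
read X <<EOF
$input
EOF
one_var_X=$X

printf '%s\n' "$two_var_X" "$one_var_X"
```

In the second case the stray words would end up inside the URL handed to curl, which is the breakage described above.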
unset Y;Y=${X##*/};
"Unset Y, use parameter expansion to match the regex "/" greedily and delete it from the string, then assign the result to Y. Also, I believe that using "" here is a "bashism", so it would not work on a strict POSIX shell :)"
There is no regex. This is globbing. It is a shell feature sometimes called "Parameter Expansion". This will delete everything up to and including the last "/".
I am not a bash user. I use NetBSD ash as both the interactive and scripting shell.
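For anyone following along, a minimal sketch of the expansion in question. The URL is a placeholder. Both `#` and `##` prefix removal are specified by POSIX for the shell, so this should run in ash, dash and bash alike.

```shell
# Hypothetical URL, used only for illustration.
X='https://example.com/d/abc123'

# "##" removes the longest prefix matching the glob "*/",
# i.e. everything up to and including the last slash.
Y=${X##*/}

# "#" removes the shortest matching prefix instead,
# i.e. everything up to and including the first slash.
Z=${X#*/}

printf '%s\n' "$Y" "$Z"
```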
|curl -4sK- \
"Connect over IPv4 (-4), don't show the progress meter or error messages (-s, silent), and read the config from stdin (-K-). Apparently curl supports reading what to fetch from a config file; I was not aware of that."
I only use curl in HN examples. I do not use it otherwise, so the -K- option is just a stupid hack to make curl behave more like the programs I actually use: yy025, nc and so on.
"Several substitutions, and don't echo the output automatically (-n). Replace "u0026" with "&" (u0026 is the unicode number of ampersand - not sure why but docdroid use that in their page), all occurrences. Select the line that matches '.*"application/pdf","uri":"' Match "https:" and prepend it with "url=", and print it"
This does not select the line that matches the pattern; it deletes everything up to that pattern in all lines of the input. It then deletes everything after double quotes from all lines of the input.
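The behaviour described here can be sketched as below. This is an illustrative reconstruction, not the sed command from the original script: the sample page content, the variable names and the exact substitutions are all assumptions chosen to match the description.

```shell
# Hypothetical page content: one irrelevant line and one line holding the PDF entry.
page='<title>hypothetical page</title>
{"type":"application/pdf","uri":"https://example.com/a.pdf?x=1u0026y=2","size":3}'

# Assumed substitutions, applied to every line; -n suppresses automatic printing.
#   1. turn each "u0026" back into a literal "&"
#   2. delete everything up to and including the pattern
#   3. delete the next double quote and everything after it
#   4. prefix lines starting with "https:" with "url=" and print only those
extracted=$(printf '%s\n' "$page" | sed -n \
  -e 's/u0026/\&/g' \
  -e 's/.*"application\/pdf","uri":"//' \
  -e 's/".*//' \
  -e 's/^https:/url=&/p')

printf '%s\n' "$extracted"
```

Only the final substitution carries the p flag, so lines that never end up starting with "https:" are processed but not printed.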
Yes, I escaped the forward slashes. That adds some characters. Sometimes when dealing with URLs as input I will use a character that is not permitted in URLs as a separator, such as "<" or ">". As a matter of course, I do not use "#" as a separator because, for me, it makes inline sed comments prefixed with "#" more difficult to distinguish. I also superfluously escaped the double quotes out of habit. No need in this case.