A simple web server written in Awk (github.com/crossbowerbt)
204 points by keepamovin on Sept 20, 2023 | 69 comments



If you enjoy stretching what you can get done with limited environments, I recently discovered BusyBox includes not only an AWK implementation but an HTTP server[0] that supports CGI as well. I spent a weekend setting up a really simple web-app using recutils [1] as the database and BusyBox AWK as the programming language.

If you enjoy bricolage and you feel like wasting a weekend, I would recommend giving it a try!

[0] https://openwrt.org/docs/guide-user/services/webserver/http....

[1] https://www.gnu.org/software/recutils/
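
For anyone who hasn't written CGI before, here is a rough sketch of what a handler looks like. With CGI the server passes request metadata in environment variables such as QUERY_STRING and expects header lines, a blank line, then the body on stdout. It is in Python rather than BusyBox AWK purely for illustration; the function name `cgi_response` and the `name` query parameter are made up for this example and are not part of the setup described above.

```python
import os
import sys
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build a CGI response (header, blank line, body) from CGI environment variables."""
    # QUERY_STRING is the standard CGI variable holding everything after '?'.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # 'name' is a hypothetical parameter used only for this example.
    name = query.get("name", ["world"])[0]
    body = f"Hello, {name}!\n"
    return "Content-Type: text/plain\r\n\r\n" + body


if __name__ == "__main__":
    # The web server captures stdout and relays it to the client.
    sys.stdout.write(cgi_response(os.environ))
```

The same shape carries over to an AWK script: read the environment, print the header, print a blank line, print the body.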


That looks so featureful (thinking of attack surface here), you might as well install a regular web server at that point. What's the benefit of using busybox for this?

Edit: ah is this file size? Or does this refer to RAM? Either way, sounds like that's the benefit

> BB httpd can be compiled with only basic features like CGI and ETag and will have only 8Kb


BusyBox provides a complete set of POSIX utilities and more in a single minimal binary; there are lots of embedded systems with pretty much just BusyBox installed, although I'm not sure how common it is to compile it with httpd. GP linked an OpenWRT article, so I'm guessing (the site seems to be down) it's included in that OS.


I have found that busybox awk:

1. Does not implement the GNU networking extensions.

2. Does not implement the GNU array sorting extensions.

3. Does implement the mktime() and strftime() extensions.

I see that there is a call to stat() in the code; I didn't know that GNU added this.


Busybox is certainly a gem.


> gsub(/\/\.\./, "/", request_filename) # avoid directory trasversal

Hmmmm

http://localhost:8888/..../..../..../..../..../..../.../.......


damn, you beat me to it.

Was gonna write:

http://localhost:8888/..../..../..../..../..../..../etc/host...

mypc

These regex substitutions are so easy to bypass :)
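
A small sketch of why the quoted substitution fails, assuming the regex is applied once over the path the way awk's gsub does (a single left-to-right pass over non-overlapping matches). It uses Python's `re.sub` as a stand-in for gsub; `naive_sanitize` and the `/etc/passwd` target are illustrative choices, not taken from the server's code.

```python
import re


def naive_sanitize(path):
    # Stand-in for the quoted awk gsub: replace every "/.." with "/" in one pass.
    return re.sub(r"/\.\.", "/", path)


# Each "/...." loses its first "/.." and keeps the trailing "..",
# so the "cleaned" path still climbs out of the document root.
print(naive_sanitize("/..../..../etc/passwd"))  # -> /../../etc/passwd
```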


Always fun :D


All this tells me is that preventing directory traversals can only be done by checking absolute file paths are within a bounded range, and nothing else.
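
A minimal sketch of that bounded check, in Python for illustration: resolve the requested path to an absolute one, then verify it is still under the document root. The name `resolve_within` and the example root are hypothetical.

```python
import os


def resolve_within(root, request_path):
    """Return the resolved path for request_path if it stays inside root, else None."""
    root_real = os.path.realpath(root)
    # Treat the request path as relative to root, then resolve "..", "." and symlinks.
    candidate = os.path.realpath(os.path.join(root_real, request_path.lstrip("/")))
    # commonpath compares whole components, so "/srv/www-evil" is not inside "/srv/www".
    if os.path.commonpath([root_real, candidate]) != root_real:
        return None
    return candidate


print(resolve_within("/tmp/docroot-example", "/../../etc/passwd"))  # -> None
```

The point is that the decision is made on the final resolved path, not on what the request string looks like, so no clever spelling of `..` matters.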


Running the server as a service account that can only read its own directories, running it in a chroot, running it in a mount and pid namespace, using SELinux to further restrict what files it can read even in principle.

Of course, if you're trying to go superminimal anyway, it's not that big a deal to create a server that doesn't even have sensitive data on it. You can make init simply mount a root filesystem that only has busybox and whatever files you want to serve and starts up the httpd process and nothing else. Turn Linux into a unikernel basically. If you compile busybox yourself, you're also able to remove all the subcommands you don't actually need.


change that to:

> gsub(/\/\.\.+\/?/, "/", request_filename) # avoid directory traversal

source: am "regex expert" >..< (and know how to spell)


U sure? echo -e "GET /../.. HTTP/1.0\r\n\r\n" | nc localhost 8888
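
For anyone following along without a server running, the suggested pattern can be checked offline. This sketch uses Python's `re.sub` as a stand-in for awk's gsub (assuming a single pass over non-overlapping matches); `patched_sanitize` is a made-up name.

```python
import re


def patched_sanitize(path):
    # Stand-in for the suggested awk pattern: "/..", any extra dots, optional "/".
    return re.sub(r"/\.\.+/?", "/", path)


# The first match swallows "/../" including the slash that the second ".."
# would have needed, so the trailing ".." survives untouched.
print(patched_sanitize("/../.."))  # -> /..
```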


Hah. I know the web frameworks have solved this issue but it seems like a fun puzzle to figure out without peeking


For those who like simple shell tools taken to extremes, this sed script implements "Super Mario Level 1 for the NES"

https://github.com/chebykinn/sedmario


crazy. lol


If I got this term correctly, these days it would rather be a "Serverless webserver in awk". Also sounds more catchy.


Ah, I had posted a silly golfed gawk webserver here some time ago. It uses gawk's built-in TCP support.

https://news.ycombinator.com/item?id=22085459

Pretty printed: https://gist.github.com/tyingq/4e568425e2e68e6390f3105e58878...


neat! i was aware of bash's built-in tcp client at /dev/tcp, but gawk being able to function as a tcp server is way cooler. Thanks for that bit of knowledge :>


Actually, bash now has a loadable builtin to listen on a TCP socket. I wrote a script and the patch: https://github.com/dzove855/Bash-web-server

The patch is now included in bash 5.2


Excellent! Looking forward to your pure bash TLS implementation next :)


I'm already thinking about creating a binding to something like libressl or rustssl, but it will be much more complicated.


It's funny how closely this resembles nginx.conf.



Marek's achieves it without socat, using only cat and sh built-ins. Badass OG.


A small nitpick: it uses inetd, which actually handles the network stuff, allowing you to work on stdin/stdout. At least that's what I remember from 2003(?) when I wrote an identd for conntracked connections in bash.


What does the loop and sleep 1 do? Is that to respawn upon crashes (why'd it crash?) or does it exit socat after handling a request?

I recently made a netcat webserver returning only one static response just to have a tiny info page for my new email server, that needs a while true loop but no sleep. I then benchmarked the performance and was very surprised to find that a slow VPS manages [spoiler answer] https://lgms.nl/p/cau/?b64&bY%2FBTsQwDER%2FZbivViDxAxw58Q1pO... Performance graph: https://snipboard.io/Vtn0MO.jpg


It's a one-shot awk script. Handle one request and exit.


Huh okay, then why the sleep?! That kills performance but why'd you do that


The sleep is probably to prevent endless revival of the script when you try to cancel it with Ctrl-C. But yeah, it's not ideal considering you can't make more than one request within a second.


You could use

    if [ -f /tmp/dontrespawn ]; then break; fi
instead of sleeping, and create that file when you want the loop to stop. But perhaps that'd imply that someone would genuinely use this rather than only developing it as a toy.


No idea, not my script. Socat does support this kind of syntax, which would seem better suited in this case.

  socat -v -v TCP-LISTEN:8080,reuseaddr,fork SYSTEM:/some/script
It also has a settable max-children so you can put some bounds on it.

I did post another comment where I shared my gawk webserver, but I'm not using socat.


I love when people do projects with awk! I received my awk 2nd edition book just yesterday! I am so excited about reading it!


I've heard of Awk for years and (probably like many) only used it for single-line snippets for the vast majority of that time.

Imagine my surprise when I just decided to look into it one day (after finding slightly more complex Awk scripts that did a lot with very little code which piqued my curiosity) and finding this very nice line-oriented DSL that has aged SHOCKINGLY well given how old it is.

I just wish it had interrupt handling of some sort without running a custom fork/patched version


Cool news -- I liked the first edition a lot back in the 90s.


Awwwwk such a cute web server :)


Then on top of that you can run werc, a web framework written entirely in rc, save for a few of the filters, which I think are awk files (like the markdown2html converter).


I thought you could listen() accept() in modern awks.


Sort of. Gawk supports magic file names like "/inet/tcp/8080/0/0" to be a tcp client or server, but you don't get control of accept()...it's all mooshed together. So you can make a slow, single threaded webserver with it.

Which is too bad, because the usually shipped gawk extensions also give you fork. If they had separated listen() and accept(), you could actually make a reasonable webserver. See my other comment for an example of what you're limited to.
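
To make the listen()/accept() point concrete, here is a rough Python sketch (not gawk) of what separating the two buys you: the socket is bound and put into listening mode once, and each accepted connection is handed off so the accept loop is free again immediately. Threads stand in for fork here; `make_listener`, `serve`, `handle` and the fixed `RESPONSE` are made-up names for illustration.

```python
import socket
import threading

RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello\n"


def handle(conn):
    """Read (and ignore) the request, send a fixed response, close the connection."""
    with conn:
        conn.recv(4096)
        conn.sendall(RESPONSE)


def make_listener(port=0):
    """bind() and listen() happen once; port 0 asks the OS for any free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen()
    return listener


def serve(listener, max_requests):
    """Accept on an already-listening socket, one worker thread per connection."""
    for _ in range(max_requests):
        conn, _addr = listener.accept()  # a separate step from listen()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
```

Calling `serve(make_listener(8080), 100)` would answer up to 100 connections without one slow client blocking the rest, which is the part the magic file names don't let you express.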


I know this seems the antithesis, but I've always wondered if you could fold and mangle cURL to actually act as a web server?


I don't think so. Poking around the source code, the only thing exposed in the cli client around listen() and accept() is related to FTP, because of the way FTP works. It does mean, though, that libcurl has functions like Curl_conn_tcp_listen_set() and Curl_conn_tcp_accepted_set() that could be used for what you're describing. It's just that they are only used for FTP now.


You can achieve the same in bash. With no socat.


Could you post a bash string or script that does this?


It must be a very awk-ward service to deploy.


Interesting, but what is a TCP wrapper?


Something that handles the TCP connections and passes them to the wrapped program’s stdin and stdout. Back in the day, lots of network daemons were written as command line tools and executed via inetd.
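
As a sketch of what such a wrapped program looks like, here is a handler in Python that only ever touches two byte streams and contains no socket code at all. The function name `handle_request` and the echoed response body are assumptions for the example; the demo at the bottom feeds it a canned request through in-memory streams instead of a real connection.

```python
import io


def handle_request(stdin, stdout):
    """Serve one HTTP request using only the given byte streams."""
    request_line = stdin.readline().decode("latin-1").strip()
    # Skip the remaining request headers up to the blank line (or end of input).
    while stdin.readline() not in (b"\r\n", b"\n", b""):
        pass
    body = f"You sent: {request_line}\n".encode("latin-1")
    stdout.write(b"HTTP/1.0 200 OK\r\n")
    stdout.write(b"Content-Type: text/plain\r\n")
    stdout.write(b"Content-Length: %d\r\n\r\n" % len(body))
    stdout.write(body)
    stdout.flush()


# Demo without any network: a canned request in, the response out.
out = io.BytesIO()
handle_request(io.BytesIO(b"GET /index.html HTTP/1.0\r\nHost: localhost\r\n\r\n"), out)
print(out.getvalue().decode("latin-1"))
```

Under inetd (or socat's EXEC/SYSTEM addresses) you would pass `sys.stdin.buffer` and `sys.stdout.buffer` instead, since the wrapper connects those to the client.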


Is there a modern solution for this that isn't some variant of FastCGI? I know it's inadvisable but sometimes I would like to write a proof of concept that doesn't require me to code a reliable, long-running process.


Compile a CGI program in any language to WASI, then use https://github.com/deislabs/wagi to run it.


inetd still exists. I’d probably just use that.

Edit: There's a CGI plugin for Caddy that looks easy to set up: https://github.com/aksdb/caddy-cgi


"requires socat"


For tcp. There's another one which uses GAwk's internal tcp stack, but I wasn't sure if that would work in normal awk, and I'm a purist like that so you got this one.


I’d like to see both, please


Socat is a trivial stream converter, like a pipe fitting with different thread sizes on two ends.

The interesting part is the logic and the actual string processing.


True, but you can go the other way and do it with only socat: socat -v tcp-listen:8888,reuseaddr,fork exec:'cat RESPONSE' is a (Very Static) entire web server, if RESPONSE has 2 header lines, a blank, and an html body. (A little too late in the morning for me to do the golfing needed to eliminate the "cat" and do the cr-nl from whatever shell you're already running, though.)
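
In case the layout of that RESPONSE file isn't obvious, here is a small Python sketch that builds one: a status line and a Content-Type header (the two header lines), a blank line, then the HTML, with CRLF line endings in the head. The helper name `build_response` and the specific header values are my assumptions, not something from the socat docs.

```python
def build_response(html):
    """Return bytes for a static response: status line, one header, blank line, body."""
    body = html.encode("utf-8")
    head = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


print(build_response("<h1>It works</h1>\n"))
```

Writing the returned bytes to a file named RESPONSE (opened in binary mode so the CRLFs are preserved) gives the file that the `cat RESPONSE` in the command above would serve.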


(scrolls down further in the socat manpage) Ohhh, under "EXAMPLES": you can combine a crlf option to TCP-LISTEN, SYSTEM instead of EXEC, and "echo -e", and there's actually a slightly useful http server already done :-)


My first interesting question was, "How do I open a port in awk?!"


GNU Awk has an entire separate manual for TCP/IP internetworking: https://www.gnu.org/software/gawk/manual/gawkinet/


Or cd to the html directory and run this:

  python -m http.server 8080
which does not require anything else like socat.


I think this is more of a “ain’t this interesting?” sort of thing.


Not a nawk on awk (pun intended) or socat, just a reply to the parent's comment.


This is functional and uninteresting.


For those who were drawn to ''awk'' in the title it is probably uninteresting. For those who were drawn to ''simple web server'', it is likely of some interest.


There are 1000 simple web servers. That part was never interesting to anyone interesting themselves.

If it had been "process some records with awk" that too would have been uninteresting even though it said awk.


"requires python"


Which in a way is somehow worse :P


in many ways


Yes, for some that would be a show-stopper. It is a nice one-line web service string, but should probably be run thusly:

  nohup python -m http.server 8080 2>/dev/null &


Nothing is interesting about overrated Python though


Ubiquitous, omnipresent, consistently rated in the top 2 or 3 most used computer languages. You don't like it, fine.


Apparently not fine. Are you about to cry?



