darkhttpd is cool, and it's pretty featureful for a small program, but it's also 2500 lines of code.
If you liked it, you might also like my server httpdito: http://canonical.org/~kragen/sw/dev3/server.s which has documentation in http://canonical.org/~kragen/sw/dev3/httpdito-readme. It's also an HTTP server in a single source file, but it's i386 assembly, for Linux, that doesn't use libc. It's only about 700 lines of code, and the executable is up to 2060 bytes now that I've added CSS and PDF support to it.
httpdito is, to my surprise, practically useful on occasion as an alternative to things like `python -m SimpleHTTPServer`.
Unlike the server in Koshkin's excellent comment https://news.ycombinator.com/item?id=26672683 httpdito does send CRLFs as the HTTP standard demands. Unfortunately it also requires them in the request. Also—and this is a key point on which most software falls down—httpdito's documentation contains a section entitled "Are you insane?", which really is a question every programmer ought to answer in their documentation.
It doesn't have most of the things darkhttpd has, though. It doesn't support IPv6, HEAD, directory listings, byte ranges, conditional requests, keepalive, name-based virtual hosting, sendfile, logging, chroot, or privsep, and doesn't run on Solaris, SPARC, or ARM. But if you're looking for features like "small memory footprint" and "no installation needed" and "no messing around with config files" it may actually be better than darkhttpd. Due in large part to improvements in modern Linux and its small memory footprint (normally 5 pages) it's efficient enough to saturate a gigabit connection.
It’s funny, this is the exact kind of thing I’d want on a non-x86 system. Raspberry Pico for example would be a great place for a 2KB HTTP server, but with no support for HTTPS, IPv6, or any non-x86 architecture, it’s a bit of a non-starter for my use cases. Still, very cool project!
The CPU architecture is actually the least of your concerns there—I'm pretty sure qemu-user can run httpdito on ARM with less than an order of magnitude performance overhead. There are a lot of embedded systems where an HTTP transaction per second per MHz would be more than sufficient.
The bigger problem is that the Raspberry Pico is a dual-core Cortex-M0+, which doesn't have an MMU, so it can't run Linux and especially can't handle fork(). But httpdito is basically scripting the Linux system call interface in assembly language—it needs to run on top of a filesystem, an implementation of multitasking that provides allocation of different memory to different tasks, and a TCP/IP stack. Any one of these is probably a larger amount of complexity than the 296 CPU instructions in httpdito.
The smallest TCP/IP stack I know of is Adam Dunkels's uIP. Running `sloccount .` in uip/uip cloned from https://github.com/adamdunkels/uip gives a count of 2796 lines of source code ("generated using David A. Wheeler's 'SLOCCount'."). uIP can run successfully on systems with as little as 2KiB of RAM, as long as you have somewhere else to put the code, but for most uses lwIP is a better choice; it minimally needs 10KiB or so. uIP is part of Dunkels's Contiki, which includes a fairly full-featured web server and a somewhat less-full-featured browser. I think he got both the server and the browser to run in 16KiB of RAM on a Commodore PET, but not at the same time.
(twIP http://dunkels.com/adam/twip.html is only 139 bytes of C source but doesn't support TCP or any physical-layer protocol such as Ethernet, PPP, or SLIP.)
However, Adam Dunkels has also written Miniweb http://dunkels.com/adam/miniweb/, which implements HTTP and enough of TCP and IP to support it, in 400 lines of C. It needs at least 30 bytes of RAM. Like twIP, it doesn't provide a physical layer. But that's solvable.
You can build mainline Linux without an MMU, and there are even pretty crazy setups where you can run it on an ARM Cortex-M (though usually an M4). It is not a standard system, though; very little software will run without modification. The biggest issue for such processors is usually actually lack of memory (they have relatively little built in, and most have no external memory buses. There's at least one project where the external memory is bitbanged through GPIO!).
> Not having MMU means there's no virtual memory and instructions refer to physical memory addresses, cmiiw?
Pretty much, yeah.
> You say Linux won't work without MMU, it can't handle physical addresses? Moreover, why won't fork() work without MMU?
When httpdito fork()s two child processes, each of them starts receiving the HTTP request into the request buffer at `buf`. This works because the semantics of fork() give those two children two different buffers at the same memory address, one in each process's address space. The Linux userland relies relatively heavily on these semantics. It was a major obstacle to getting an SSH server running on cisco IOS, for example.
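In C, the shape of that pattern is roughly the following. This is only a hedged sketch of fork-per-connection with a static buffer (error handling omitted), not httpdito's actual code, which is i386 assembly:

    /* Sketch of fork-per-connection with a static buffer; each child gets its
     * own copy of buf at the same virtual address. Not httpdito's real code. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static char buf[1024];              /* same address in every child */

    int main(void) {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(8080);    /* INADDR_ANY is the zeroed default */
        bind(s, (struct sockaddr *)&addr, sizeof addr);
        listen(s, 128);

        int live = 0;
        for (;;) {
            if (live >= 2048) { wait(NULL); live--; }      /* block at the cap */
            while (waitpid(-1, NULL, WNOHANG) > 0) live--; /* reap finished kids */

            int c = accept(s, NULL, NULL);
            if (c < 0) continue;
            pid_t pid = fork();
            if (pid == 0) {             /* child: works on its own copy of buf */
                alarm(32);              /* crude whole-transaction timeout */
                read(c, buf, sizeof buf);  /* ...parse request, open file... */
                write(c, "HTTP/1.0 200 OK\r\n\r\nhello\r\n", 26);
                _exit(0);
            }
            if (pid > 0) live++;
            close(c);                   /* parent keeps only the listener */
        }
    }

Without an MMU, both children would be scribbling on the same physical buf, which is why this approach needs real fork() semantics in the first place.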
An event-driven server like darkhttpd is a much better fit for an MMUless system. Implementing multithreading is easy (it's half a page of assembly) but implementing memory mapping without an MMU requires some kind of interpreter.
(Actually you can implement fork() without virtual memory and without an MMU, for example with PDP-11-style segmentation, but the Cortex-M0+ doesn't have any of those facilities either.)
>"The Linux userland relies relatively heavily on these semantics. It was a major obstacle to getting an SSH server running on cisco IOS, for example."
Can you elaborate on this? Hasn't Cisco IOS at various times run on MIPS and X86 processors?
The original Cisco IOS ran on 68000 series processors which lacked an MMU. Even the later 68K models used an "embedded" version of a processor which did not have an MMU. For example, the Cisco 2500 used a 68EC030. Regular 68030s had MMUs, but the "EC" model did not. Later versions did run on MIPS though.
Unfortunately I'm just reporting secondhand rumors from people who worked at cisco, and I probably should have made that clear. So I don't know how IOS works at the machine-instruction level, just the command line.
Without an MMU, you can't do paging. That means fork() cannot do the normal copy-on-write business, because there's no page table to copy the entries in.
You also have no inter-process security, so everything can crash everything else including the kernel, and no swap.
I'm pretty sure Linux ELF has always allowed you to specify the initial load address. When I first wrote StoneKnifeForth https://github.com/kragen/stoneknifeforth its load address was 0x1000, but at some point Linux stopped allowing load addresses lower than 0x10000 by default (vm.mmap_min_addr). I originally wrote it in 02008, using the lower load address, and fixed it in 02017. It's still not using 0x8048000 like normal executables but 0x20000. ASLR does not affect this.
Maybe you mean that before ELF support, Linux a.out executables had to be loaded at a fixed virtual address? That's possible—I started using Linux daily in 01995, at which point a.out was already only supported for backward compatibility.
For your website, it seems the files aren’t being encoded right because Firefox on iOS is rendering them as something else. I’m getting mojibake with apostrophes and the like:
> ### setup_socket: there’s little interesting to say here that’s not
> ### in, say, Beej’s Socket Guide.
Thanks! Yeah, what happened was that Apache wasn't sending a content-type header until I tweaked the .htaccess file, which was about 14 minutes before you posted your comment. I had to ctrl-shift-r in Firefox to get it to notice the changed content-type.
No, I didn't write it to solve use cases at all, but rather because I thought it would be awesome to write a web server in assembly. As the readme says, we all have moments where we do things we regret. Youthful† lapses in judgment.
If you git clone http://canonical.org/~kragen/sw/dev3/.git you will see that I wrote most of it one weekend in 02013, removed the dependency on libc and added forking the next weekend, and tweaked it slightly over the next month and a half. Then in 02019 I added CSS and PDF to the mime-types list. So it's not, like, an ongoing project. It's just a program I wrote once, like most things in that directory.
I did just fix the documentation, though, because it was still claiming it was under 2000 bytes.
Thanks. I thought I'd done so when I wrote it in 02013! I've now corrected the omission by adding a Creative Commons Public Domain dedication. Thank you for pointing that out!
One of the 2004 IOCCC winners (2004/hibachi [1]) was a 154-line-long CGI- and vhost-capable HTTP server [2]. It is one of the few winning entries that ever came with its own ./configure, and it violated plenty of guidelines but not a single rule, which made the judges pretty upset. Its obfuscation was pretty basic even at the time, and it won solely on the novelty factor.
I wish I could dig it up now, but years ago I ran a QA team that needed to validate that our equivalent of man pages could be downloaded on the fly with a command. The guy who owned it came up with a test plan that included setting up and maintaining a web server just for this purpose, and was dragging the project on for weeks until I sat through lunch one day and wrote him a bare-bones HTTP server in PowerShell that we could just start locally and talk to at run-time. I'm pretty sure the first cut of it was only like 50 lines of code.
It wasn't obfuscated at all. The guidelines don't explicitly dislike such entries, but as a side effect of not being obfuscated it implicitly violated several guidelines at once (e.g. it was quite "longer than [it] need[s] to be"). Indeed, today's nginx source code [1] would probably be slightly more obfuscated than this code, modulo perhaps name changes (and those can be considered a minimal effort to get the entry within the size limit). Its only obfuscation was reusing a pre-initialized buffer from getenv() (referred to as "getenv() == putenv()" in the hint).
Sounds like you are describing Python, Go and whatever is the next programming language du jour. Folks "showing off minimal coding skill" proclaim "X in Y lines of code" for these "batteries included" languages all the time. Yet I never see any snarky comments pointing out the size, age or authors of the libraries they are using. All the parent comment did was demonstrate using the openssl binary, which is ubiquitous.
The GP referred to using nc. Assuming the original from 1995, that is not 10s of 1000s of LOC and was not written over three decades. It is the work of one person.
Yes, it is true. Every time we use the OpenSSL library we are using 10s of 1000s of LOC written by a changing team of developers over several decades, a project with a long legacy of ad hoc development and mistakes. I doubt the parent is trying to take credit for any of that "heavy lifting".
Well, in that case I'll one-up you with an HTTPS server command that doesn't require write privileges: just add the command as an alias in your POSIX shell. You could even put the port argument last to make it easier to use.
I wrote filed [0], another single-file HTTP server. It's very fast!
I wrote it because I was hosting containers (using LXC) on a laptop whose only mass storage was a drive connected over USB 1.1 (12 Mbps). I was serving video streams out to the various Roku devices in my house from this container, but other webservers would do things very slowly (such as opening the file to serve out, which would require doing a lot of slow I/O), causing periodic interruptions. filed doesn't do those things.
It also supports Range requests (required to meaningfully serve media), and a few other things. If you want a static HTTP server, filed can work out quite well.
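For what it's worth, the core of single-range support is tiny. Here's a hedged sketch (not filed's actual code) of parsing a simple Range: bytes=START-END header:

    /* Hypothetical single-range parser: handles "bytes=START-END" and
     * "bytes=START-", but not suffix ranges ("bytes=-N") or multiple ranges.
     * On success the response would be 206 Partial Content with
     * Content-Range: bytes START-END/FILESIZE. */
    #include <stdio.h>

    static int parse_range(const char *value, long filesize,
                           long *start, long *end) {
        long s = 0, e = filesize - 1;
        if (sscanf(value, "bytes=%ld-%ld", &s, &e) < 1) return 0;
        if (s < 0 || s >= filesize || e < s) return 0;
        if (e >= filesize) e = filesize - 1;
        *start = s;
        *end = e;
        return 1;   /* serve bytes s..e inclusive */
    }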
I agree, jamming a program in a single source file is not the best metric ever.
To be fair, that's just a line in the project's Readme, which is far from the best and doesn't even provide a concise description of what the project does.
Yep. darkhttpd is "Written in C - efficient and portable."
Portable, psh. How about a single binary that runs on Linux, macOS, Windows, FreeBSD, OpenBSD, and NetBSD, also contains all the files you want it to serve, and lets you add files without recompiling.
I wonder how small you could make a TLS 1.3 (or 1.2) implementation if you only supported the bare minimum: a single ciphersuite, no negotiation, no fancy features, no weird cert formats, but without skimping on the crypto parts?
althttpd is another single-binary webserver [1]. It's from the creator of SQLite. Fossil SCM, also a single binary by the same author, internally implements a webserver (which is multiplatform, by the way).
Very cool! It talks about using stunnel for HTTPS, but for TLS termination I could just as easily put Caddy or HAProxy in front of the standard (non-HTTPS) althttpd, right?
When I look at this code it becomes clear that after using design patterns, dependency injection and other "modern" stuff for too long I have totally lost the ability to write short, concise code :-(
What are the performance implications of something like this (and other similar ones posted here) for a really light server (let’s say a small chat server). Would getting down closer to the metal matter much?
darkhttpd and things like `python -m SimpleHTTPServer` only support static content, so you can't run a chat server on them.
A lot depends on your context. I haven't benchmarked darkhttpd, but it's probably significantly better than my httpdito (linked above), which can spew out 1.8 gigabits per second and at least 20000 requests per second on my 8-year-old 4-core 2.3-GHz amd64 laptop. An event-driven server like darkhttpd would be a much better basis for adding Comet functionality like chat—that's why, at KnowNow in 02000, we contracted Robert Thau to move our Comet functionality from Apache with Perl CGI (IIRC ≈64 concurrent connections and ≈64 chat messages per second on a 1GHz server with 1GiB of RAM) to his select()-driven thttpd (≈8192). This work, now mostly of archaeological interest, has been open-sourced as mod_pubsub, which also includes a compatible select()-driven Python server I wrote the next year. It was also able to handle thousands of messages per second and thousands of concurrent clients.
There are at least four axes to your problem:
- How many concurrent connections do you mean when you say "a small chat server"? This could be anywhere from 2 to 2048.
- How many messages per second is it processing? This could be anywhere from 128 per connection (for something like Mumble) to 1/2048 per connection (if almost all clients are idle almost all the time).
- How big are these messages on average? This could be anywhere from 64 bytes to 64 mebibytes.
- What kind of hardware are you running it on? This could be anything from a 1MHz Commodore PET with an 8-bit 6502 and 16KiB of RAM to a 3.7GHz 12-core Ryzen 9 5900X with 256GiB of RAM. (I don't think anyone's written a chat server on Contiki, but it would be easy to do. It just wouldn't scale to very high loads.)
So, the answer to your question may vary by a factor of about 2⁷³, 22 decimal orders of magnitude. Can you give more detail on what you're thinking about?
The performance problems of HTTP servers are architectural, not usually about things like instructions per cycle. For example common topics are whether accepting connections will tend to starve serving established connections, whether events are distributed optimally among threads or CPUs, how the server behaves if a client sends one byte per frame, very slowly, or if the client receives the response very slowly, whether the process will run out of memory if a number of events happen suddenly, etc. A web server that's actually equipped to survive as a public service will be much, much longer than 2500 lines of code.
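For concreteness, the skeleton such servers are built around looks something like the following. This is a hedged sketch of a poll()-based event loop, not darkhttpd's or any particular server's code, with all of the hard parts listed above reduced to comments:

    /* Minimal poll()-based event loop sketch; timeout handling, buffering,
     * and per-connection state machines are all elided. */
    #include <poll.h>
    #include <sys/socket.h>

    #define MAXCONN 1024

    void serve(int listener) {
        struct pollfd fds[MAXCONN + 1];
        int nfds = 1;
        fds[0].fd = listener;
        fds[0].events = POLLIN;

        for (;;) {
            if (poll(fds, nfds, 30000) < 0) continue;  /* 30 s tick for timeouts */

            if ((fds[0].revents & POLLIN) && nfds < MAXCONN + 1) {
                int c = accept(listener, 0, 0);        /* new connection */
                if (c >= 0) {
                    fds[nfds].fd = c;
                    fds[nfds].events = POLLIN;
                    nfds++;
                }
            }
            for (int i = 1; i < nfds; i++) {
                if (!(fds[i].revents & (POLLIN | POLLOUT))) continue;
                /* read whatever is available, advance this connection's state
                 * machine a little, never block; close and compact on EOF */
            }
        }
    }

Every one of the questions above (starvation, slow clients, memory growth) shows up as a decision about what goes inside that inner loop.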
While mostly I agree with your comment, I suspect both darkhttpd and httpdito (linked above) are actually equipped to survive as a public service. httpdito is about 700 lines of code (in assembly); it handles these issues as follows:
1. Accepting connections starving established connections: this is mostly a problem for event-driven servers. httpdito runs each connection in its own child process and has a compiled-in limit of 2048 concurrent connections. Whenever it has 2048 live children, it blocks in waitpid() until at least one of them dies. Modern Linux does a fair enough job of scheduling that this seems to be sufficient in practice to keep the parent from hogging the whole machine and preventing the children from making progress, even if most of your incoming connections just close without sending an HTTP request or any bytes at all. Darkhttpd might or might not have a problem with this; I haven't tried, but it seems to rely on OS ulimits to limit its number of concurrent connections. My guess is that with the default ulimit -n 1024 darkhttpd won't have any problem with this at all.
2. Whether events are distributed optimally among threads or CPUs: the real issue here is not optimality but stability, and again, this is mostly an issue for event-driven servers (or servers made of thread pools, etc.) The essential property for production is usually not that you're using the machine to within 0.1% or 10% of its capacity, but that you're not wasting 99+% of its capacity uselessly thrashing—or, at the other extreme, responding so fairly that you provide some service to every client, but not enough to actually ever complete a whole request successfully. Modern Linux does a good enough job of this that httpdito's very stupid approach of forking a new child for each request works reasonably well. darkhttpd avoids the problem in a different way by not having threads and not using multiple CPUs, which is clearly not optimal but also totally avoids any related stability concerns.
3. How the server behaves if a client sends one byte per frame, very slowly: in overload conditions, which is when this matters "to survive as a public service", Linux's TCP stack does a reasonably efficient job of consolidating the bytes until a particular server child gets a chance to run. httpdito does a super stupid O(N²) thing in these cases where it looks for the CRLFCRLF at every position in the buffer after every received chunk (potentially a single frame), but this is ameliorated not only by Linux's TCP stack but also by the fact that the request buffer size is only 1024 bytes, so in the worst case we spend 4084 instructions on an incoming packet the last time we run this loop for a given client, and only half that (2042 instructions) on average. This is still fast enough to handle somewhere in excess of 4 million packets per second per core, and probably faster than Linux's TCP stack. (A linear-time alternative to the rescan is sketched just after this list.)
The flip side of that is that by sending data slowly in this way, or by not sending any data at all, a client can monopolize all 2048 concurrent connections, preventing other clients from connecting. This is a problem that cannot be solved in TCP/IP, only ameliorated. httpdito ameliorates it with a 32-second timeout, counted from when it accepts, so in the steady state it always accepts an average of at least 64 new connections per second, regardless of how fast or slowly the requests come in.
I haven't tested darkhttpd under these conditions but it has a 30-second idle timeout and a 4000-byte max request length. It kind of looks like it does the same O(N²) thing, but I've only glanced at the code. I don't know how long it allows for the client to slowloris the initial requests in one byte at a time.
4. How the server behaves if the client receives the response very slowly: I think the above covers this. However, extremely large files or extremely slow connections will whack into the 32-second timeout. At 9600 baud, for example, responses of more than about 30 kilobytes will get truncated. There's an unavoidable tradeoff between handling slow connections successfully (which requires many memory buffers), handling fast connections efficiently (which requires large memory buffers), and withstanding DoS attacks (which requires, among other things, limiting memory use).
5. Whether the process will run out of memory if a number of events happen suddenly: httpdito does all of its within-process memory allocation at compile time, so this isn't a thing that can happen within a process. Its limit of 2048 children works out to about 16 mebibytes of RAM typically, so it's not a problem systemwide either, unless you're running it on a Rio Receiver or something. darkhttpd does allocate memory, but at first glance it seems to be limited to a couple of allocations of less than 4000 bytes per concurrent connection, so I don't see a problem here either.
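As promised in point 3, the quadratic rescan is avoidable. Here's a hedged sketch of the incremental search (not httpdito's or darkhttpd's actual code) that only re-examines bytes that could complete a CRLFCRLF straddling the old end of the buffer:

    /* Incremental search for the end of the request headers. len is the total
     * bytes buffered so far, newbytes is how many just arrived. Returns the
     * offset just past CRLFCRLF, or -1 if the headers aren't complete yet. */
    #include <string.h>

    static long header_end(const char *buf, long len, long newbytes) {
        long from = len - newbytes - 3;     /* a CRLFCRLF may straddle the seam */
        if (from < 0) from = 0;
        for (long i = from; i + 4 <= len; i++)
            if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
                return i + 4;
        return -1;
    }

Each buffered byte gets examined a bounded number of times, so the total work is linear in the request size rather than quadratic.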
It's common for event-loop servers to handle overload events like these much more gracefully than multitasking servers; httpdito is kind of an outlier here because it uses about a thousandth the RAM of a normal multitasking server and runs on an OS whose task scheduling is less shitty than is typical, which allows it to get away with a lot. (The original slowloris README describes it as a DoS tool for threaded web servers.) By contrast, the usual overload problem event-loop systems have trouble with is where some event handler that usually takes a millisecond or a microsecond suddenly starts taking a second or a kilosecond, locking out everything else until it's done. Maybe you have an O(N³) algorithm in your HTTP content negotiation, but you never noticed because normal browsers send an Accept: header with five content types and so N³ means 125 iterations, and then some guy in Turkey with a screwed up browser configuration sends you an Accept: header with 1296 content-types and your whole server freezes for several seconds. Every three minutes.
You could hypothetically make an event-loop system immune to this kind of thing by doing a rigorous WCET analysis on every single event handler, as if it were jet engine control firmware. But that's a huge pain in the ass for any program of more than about 1000 machine instructions, so I've never heard of anybody doing this with an event-loop-driven network server. I've been toying with ideas for solving this problem over the years, including approaches like BPF and software transactional memory. Some of my notes on this are at https://dercuano.github.io/topics/transactions.html, while others are in Derctuo and Dernocua.
That said, I haven't tested darkhttpd at all, and I'm not running any public services on httpdito. It wouldn't surprise me if either or both had some kind of stability bug under overload conditions. But it would surprise me very much if fixing those bugs required adding thousands of lines of code.
Hmm, I was just looking at httpdito and I see that I was wrong about the timeout when it's sending: it disables the timeout once it starts sending the response, on the theory that sooner or later TCP will time out if you stop acknowledging. Well, it will, but it might take a lot longer than 32 seconds. And I don't know for sure but I suspect you can hold open a TCP connection forever by periodically resending the same ACK packet, which would mean you could totally DoS httpdito just by requesting some large web page and then never reading any of it, 2048 times. It would probably need to be big enough that it won't fit in the kernel's TCP send buffer in order to keep the child process from exiting.
At least that "solves" the 9600 baud problem, I guess.
I don't think that's the kind of thing that's likely to happen by accident, but it's probably an easy enough attack to mount intentionally.
I can see you've spent a lot of time thinking about things like slowloris. What do you think of the approach that redbean takes? https://justine.lol/redbean/index.html It still needs more unit tests, but basically what it has is a meltdown mode so that when fork() starts to fail it sends SIGUSR2 to the process group to EINTR recv() calls so that lingering processes can exit gracefully. Its parser also isn't cubic, so even for a fragmented messages, it only needs to consider each character a single time: https://github.com/jart/cosmopolitan/blob/master/net/http/pa...
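For readers who haven't seen redbean: the meltdown mechanism described there boils down to roughly the following idea. This is a hedged sketch of the concept, not redbean's actual code:

    /* Sketch of the meltdown idea: when fork() fails, signal the whole process
     * group so blocked recv() calls return EINTR and children can wind down. */
    #include <signal.h>
    #include <string.h>

    static volatile sig_atomic_t meltdown;

    static void on_usr2(int sig) { (void)sig; meltdown = 1; }

    void install_meltdown_handler(void) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_usr2;        /* deliberately no SA_RESTART, so a
                                           blocked recv() gets EINTR */
        sigaction(SIGUSR2, &sa, NULL);
    }

    void on_fork_failure(void) {
        kill(0, SIGUSR2);               /* pid 0 means our own process group */
    }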
This also doesn't correctly implement HTTP/1.1 (or /1.0). (It doesn't really claim to, either: the README really seems to hint that it is implementing a subset of HTTP that might work in a pinch. I don't think I'd put a production load on that, though.)
The IETF version of QUIC isn't really that bad on layering. HTTP/3 is fully decoupled from the QUIC transport layer, and thereby e.g. stream multiplexing/flow-control is also fully separated from HTTP semantics (unlike in HTTP/2).
Handshake/TLS is also mostly decoupled from QUIC, as are flow controllers (recovery).
Compared to TCP, it certainly has multiplexing as part of the main layer, and uses mandatory crypto. But that's more or less the point of QUIC.
> rwasa is our full-featured, high performance, scalable web server designed to compete with the likes of nginx. It has been built from the ground-up with no external library dependencies entirely in x86_64 assembly language, and is the result of many years' experience with high volume web environments. In addition to all of the common things you'd expect a modern web server to do, we also include assembly language function hooks ready-made to facilitate Rapid Web Application Server (in Assembler) development.
I love the fact that it uses flat assembler! It is worth checking out their other stuff: https://2ton.com.au/Products/
For example https://2ton.com.au/HeavyThing/ supports Curve25519/Ed25519, SHA3/Keccak, SHA512, SHA384, SHA256, SHA160, MD5 HMAC, PBKDF2, scrypt, HMAC_DRBG, and Poly1305. It is quite impressive.
I use it for most of my local dev. Handles only GET requests, serving files below the directory in which the server is started. Does Content-Type inference based on file extensions (e.g. html, js, png, jpg).
Currently disables any type of caching, as I use it mostly for local development and want to avoid versioning mistakes.
Also has fledgling SSL support that has occasionally worked.
> ./http_load -p 10 -f 100000 test.url
100000 fetches, 10 max parallel, 1.024e+09 bytes, in 34.9176 seconds
10240 mean bytes/connection
2863.88 fetches/sec, 2.93262e+07 bytes/sec
msecs/connect: 0.0995282 mean, 2.717 max, 0.037 min
msecs/first-response: 3.18724 mean, 202.23 max, 0.315 min
HTTP response codes:
code 200 -- 100000
It's a simple server, yes, but it fails to correctly implement HTTP. But that's the trouble with all these "look, a 'simple' X in N LoC" projects; yes, a great many things get simpler when you ignore the requirements of the standard.
It also has what I would call some security issues, e.g., you can ask for a file outside of the www directory.
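The usual fix is to canonicalize the requested path and refuse anything that resolves outside the document root. A hedged sketch (assuming docroot has itself already been run through realpath()):

    /* Reject requests whose resolved path escapes the document root.
     * realpath() also resolves symlinks, so links pointing outside the
     * docroot are caught too. */
    #include <limits.h>
    #include <stdlib.h>
    #include <string.h>

    int path_is_safe(const char *docroot, const char *requested) {
        char resolved[PATH_MAX];
        if (!realpath(requested, resolved)) return 0;  /* missing file or error */
        size_t n = strlen(docroot);
        return strncmp(resolved, docroot, n) == 0 &&
               (resolved[n] == '/' || resolved[n] == '\0');
    }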
Without stealing any thunder from OP, there are free/open CDNs that will host an entire blog. All you need is a device/machine to generate the static content and push it to one of these free CDNs (e.g. GitHub Pages or Hostry).
If you want to clone a git repo from your laptop onto your cellphone (or vice versa), or test your CSS for iPhone 6 compatibility while you're actively editing it on your laptop, it's probably a lot less clunky to run a web server locally.
darkhttpd's own site appears to be hosted on Apache. It's also configured to force HTTPS, even if the Upgrade-Insecure-Requests header isn't present. That's one of my pet peeves.