For those who want to get started in assembly language programming, here's a suggestion. Write a simple skeleton in un-optimized C first. Next, take a look at the generated code and replace or fill in the missing parts with assembly language.
When programming in assembly language, one starts with a blank slate. Every abstraction you want to construct, say a data structure or even a function call, you build from scratch. By starting with C you eliminate many dead-ends and false starts.
By adopting C's conventions for allocating local variables on the stack, pushing parameters on the stack, and passing return values back to callers, you put yourself on firm ground rather than floundering around trying to reinvent the basic linkage between caller and called function.
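To make the suggestion concrete, here is a minimal sketch of such a skeleton. The file name skeleton.c and the functions sum_array and sum_three are made up for illustration. Compiling with something like "gcc -O0 -S skeleton.c" should write a skeleton.s listing where you can look for the stack slots of the locals, the argument setup before the call, and the return-value handoff. Whether parameters end up pushed on the stack or placed in registers depends on the target and its ABI, so treat the comments as hints about what to look for, not as a description of any particular output.

```c
/* skeleton.c: hypothetical example file, names are illustrative only.
 * Generate an assembly listing with:  gcc -O0 -S skeleton.c
 */
#include <stddef.h>

/* Sum the first n elements of an array. */
long sum_array(const int *values, size_t n)
{
    long total = 0;                  /* local variable: look for its storage in the listing */
    for (size_t i = 0; i < n; i++) { /* loop: look for the compare and the backward branch */
        total += values[i];
    }
    return total;                    /* return value: handed back as the platform ABI dictates */
}

/* A caller, so the listing also shows argument setup and the call itself. */
long sum_three(int a, int b, int c)
{
    int tmp[3] = { a, b, c };        /* small local array, also given storage by the compiler */
    return sum_array(tmp, 3);
}
```

From there, one way to proceed is to keep sum_three in C and replace only the body of sum_array with your own assembly, following the same linkage the compiler used.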
When learning MIPS, the lessons I followed recommended something similar to this. We didn't look at the compiled assembly but used C as a form of pseudocode for assembly. It was a valuable experience - especially in showing how close to the machine C is, when it had looked much higher level to me before that.
I tested to see if writing the code first in Scheme or CL offered the same benefits. It did not - it was far easier to translate solutions from C to assembly. My only guess is that a lot of the work in C is similar to what I had to do in assembly, whereas in Scheme there is no such correspondence. So C only took away handling the stack for procedures and whatnot, whereas Scheme took away that and the rest of what I would need to do in assembly. Long story short, the solution in C was closer to assembly than the solution in Scheme.
I disagree. A human writes assembly language in a very different way from an HLL compiler. Taking the compiler as a reference, you get all of its disadvantages and none of its advantages.
I would suggest simply taking some good-quality, human-written assembly language source and trying to modify it to fit your needs. Or start writing a small assembly language program from scratch and ask on the asm forums for help.
> This is an engine for web based message board (forum)
> implemented entirely in assembly language,
> using FastCGI interface and SQLite database as a storage.
Whatever acceleration is gained by writing the middleware in assembly seems like it would be lost by relying on a database written in C, even SQLite.
To be fair, if I were to write an assembly project in this day and age, it would be to keep my chops up, not for performance (usually). Optimizers are pretty good at writing assembly these days.
It is a work in progress. I needed a good transactional database in order to handle multiple simultaneous connections, and SQLite fit this goal pretty well. But later, if I find (or write) an assembly language database, I can always change it. Combined with a good assembly-written web server (already available: https://2ton.com.au/rwasa/) and an assembly-written OS (like MenuetOS or KolibriOS), we will have the full stack for assembly language based web hosting. :)
"gcc -S sqlite3.c" will give you an assembly language database engine. :-)
OK, maybe you meant "hand-written" assembly language. But on the other hand, SQLite claims to be written in C, yet a fair amount of that C code is automatically generated using other scripts and programs. Does that mean SQLite is not really written in C?
FWIW, we actually use the assembly language sqlite3.s file (as generated above) during testing. We have scripts that go through and punch out individual opcodes, then assemble the result and verify that the test suite detects the error. This is a test of the SQLite test suite more than a test of SQLite, but a strong test suite makes for a strong product, so it still helps.
"gcc -S sqlite3.c" will give me the result of the C code compilation. The fact it is in form of "assembly code" changes nothing - it still is HLL code.
Humans write assembly language differently, because they are not limited by the HLL rules, only by the hardware resources.
This is very impressive. I wonder what the server costs to run a website like Reddit would be if they hadn't used Python and instead used assembly, C or C++.
The benchmarks game [1] suggests that C++ can have a speedup of 10x to 100x for many toy benchmark programs and use far less memory. A highly optimized assembly version could add another factor.
While code closer to the metal would require higher investments in coding, it would be drastically easier to scale. And in the long run, server costs will dominate, so it would be beneficial to reduce those.
Who knows if Reddit would be cash positive if their codebase were written in a different language?
Two stories. Years ago I wrote code to parse email messages in C. And by "parse email" I mean parse the headers into usable C structures. Looking back through git history, it appears it took me about a week's worth of coding to get all defined email headers parsed using 1375 lines of code. And I never did find an acceptable structure to store the contents of, say, the To: header (can't be an array since you can include named groups).
More recently, I re-approached the same problem, but this time using Lua (and LPEG). It's easily half the code (658 lines) and this time, it was rather easy to store the contents of the To: header (given the nature of Lua tables). And while I suspect the C version is faster (honestly, I don't know---I haven't actually measured it) the Lua/LPEG version probably does give it a run for its money (LPEG compiles into its own parsing VM). It took a day's worth of work (plus a few bug fixes here and there).
I can't imagine doing this type of project in assembly language (and I did nothing but assembly work for ten years, mid-80s to mid-90s). It'll be largely the same as C (read byte by byte, jump tables etc.) but you also have the overhead of dealing with calling conventions---you either juggle parameters into registers, or spend half your time pushing data onto the stack. It becomes real tedious real fast.
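For anyone who hasn't seen the "read byte by byte, jump tables" style, here is a rough sketch of the idea in C. It is not the email parser described above. The names (struct counts, scan_header, the on_* handlers) are invented for illustration, and all it does is count byte classes in a header line. Each input byte indexes a 256-entry table of function pointers, which is the C-level analogue of an indirect jump through a table in assembly.

```c
#include <stddef.h>

/* Hypothetical tally of byte classes seen in one header line. */
struct counts { size_t colons, spaces, other; };

typedef void (*handler_fn)(struct counts *);

static void on_colon(struct counts *c) { c->colons++; }
static void on_space(struct counts *c) { c->spaces++; }
static void on_other(struct counts *c) { c->other++; }

/* One handler per possible byte value: the "jump table". */
static handler_fn table[256];

static void init_table(void)
{
    for (int i = 0; i < 256; i++)
        table[i] = on_other;     /* default handler for every byte */
    table[':']  = on_colon;
    table[' ']  = on_space;
    table['\t'] = on_space;
}

/* Walk the line one byte at a time and dispatch through the table. */
struct counts scan_header(const char *line)
{
    struct counts c = { 0, 0, 0 };
    init_table();                /* rebuilt on every call to keep the sketch simple */
    for (const unsigned char *p = (const unsigned char *)line; *p != '\0'; p++)
        table[*p](&c);
    return c;
}
```

In assembly the same loop turns into a load, an indexed jump, and whatever register or stack juggling the handlers need, which is where the tedium mentioned above comes from.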
Second story. At work, I was told to look into this SIP stuff. So I did. And I wrote code in Lua/LPEG to parse SIP messages (given that it's the same base format as email messages, why not use something easy to write to play around with this stuff?). Pretty soon, it became the product (sigh---the prototype is the product, but that's a rant for another time) but it turned out to be fast enough to handle actual telephone network traffic levels. It also received an unexpected load test as fewer instances were running than expected, and a routing issue caused about 3x the expected traffic. All that and I've yet to profile the code [1].
So, would Reddit have more money with a different language? Well, it was rewritten once from what I understand, but I think that was from a language only a few have experience with to one that has a larger coding base. I don't think you'll find as many programmers capable of out-writing a compiler in assembly as you will programmers with PHP, Java, Python or Ruby experience, never mind the negative responses you'll receive from the anti-C crowd.
[1] To be fair, the code is only taking SIP messages, parsing for some expected information, repacking that up into another (proprietary) format to call the business logic component, which is written in C++ [2], taking the results and sending a new SIP message. So it's not doing that much overall.
[2] Because the developer thought he was programming this for a 1MHz 6502 with 16K of RAM, even though it's running on multiple 64-bit SPARC [3] boxes with 24 or 32 cores and gobs of RAM, but again, that's another rant.
[3] Because of telecom requirements for power; I suspect if we could have found x86 boxes that met the requirements, we would have used them.
It is very fast, but how much of that is due to it being written in assembly? Isn't the real bottleneck for an application like this going to be network speed?
Hard to say, but some details about the hosting may help to estimate. The site is hosted on the cheapest possible shared hosting for 2.5€ per month: 15GB space, 150GB monthly transfer, CPU usage 30 minutes per 24 hours, 1GB RAM, max 60 processes.