Creating a Fibonacci Generator in Assembly

emersonrsantos · on Sept 10, 2018

In Forth language would be:

  : fib 2dup + ;

2dup word duplicates n and n-1 elements and put it on the top of the stack.

+ word takes n and n-1 elements from the stack and sum them. The result goes on top of the stack.

Just start with 1 and 1 at the top of stack.

willvk · on Sept 10, 2018

That's some very concise code golf! It would be interesting in seeing how Forth compiles this under the hood.

pjc50 · on Sept 10, 2018

Normally Forth is interpreted. Compiled Forth code is unlikely to look particularly clean or elegant.

(If you like languages with a very high density of functionality to symbol count, you should look at APL and its successor J)

userbinator · on Sept 10, 2018

As someone with considerable experience in using x86 Asm (over a decade), it's always nice to see more articles on it, but there are a few things I found odd about this one.

The basis of the algorithm to use was decided to be a space-optimised method using a dynamic programming approach. This effectively means using an array on the stack to hold our values while giving O(n) time and space complexity.

That isn't "space-optimised" at all --- when I think of a Fibonacci example I think of O(1) space. Unless it was chosen specifically to demonstrate array indexing (in which case it would be better to use an algorithm that actually requires an array, like sorting), the choice of algorithm is unusual. The same applies to finding the length of argv[1] before doing the equivalent of an atoi() on it --- if you're going to read the bytes of the string through, might as well accumulate and convert, and oddly enough the length appears to be unused too.

The unexplained and useless nop is also puzzling. Why is it there? What is it for? The same goes for the many other instructions which aren't actually contributing to the calculation, and initially made me think it was compiler output when I first glanced over the code. Writing a program that performs the given task is relatively straightforward even in Asm, but your solution is unnecessarily complex.

Making things easier to understand using function calls

I consider this a premature abstraction. A comment or even label will do just fine to delimit the blocks of functionality, as your "functions" are called only once. Your task has 3 parts [basically atoi(), fib(), itoa()+output] so you will have 3 blocks of code. It cannot be any simpler than that. Better is "making things easier to understand by avoiding unnecessary code".

Here is how I would write the first 2 blocks; I wrote this without testing at all so it could have errors, but it is really quite simple:

        mov esi, [esp+8]  ; get argv[1]
        xor eax, eax      ; clear the upper bits
        xor ecx, ecx      ; will hold converted value
    atoiloop:
        lodsb             ; get a character
        cmp al, '0'
        jb fibinit
        cmp al, '9'
        ja fibinit
        imul ecx, ecx, 10
        lea ecx, [ecx+eax-'0']
        jmps atoiloop
    fibinit:
        jecxz invalid       ; can't do fib(0)
        xor eax, eax
        inc eax             ; eax = 1
        cdq                 ; edx = 0
    fibloop:
        xadd eax, edx
        loop fibloop
        ; now eax and edx will contain fib(n) and fib(n-1)

Observe that I chose the specific registers for a reason: esi/al will be used by lodsb, ecx by the looping instructions, and edx for xadd. This is one of the things that compilers are not very good at, and also why carefully written Asm can easily beat them.

If you want to learn more Asm I recommend any assembler besides GAS, and looking at some examples from advanced users: http://www.hugi.scene.org/compo/compoold.htm The demoscene in its smaller-size competitions has lots of particularly knowledgeable users too.

willvk · on Sept 10, 2018

That's some awesome feedback. Thanks so much. As I'm teaching myself asm I'm learning as I go. I'll incorporate it all into a revised version soon.

pkilgore · on Sept 10, 2018

Quick suggestion to massively improve your writing for general consumption: strip out the passive voice where not absolutely necessary, e.g., "was decided". Not only is the active voice "I/we decided" generally much more interesting to read, it usually forces you to adopt a sentence structure with fewer words (easier to read). Your writing later in the article is much more engaging as you start to use the active voice.

And thanks for the article, I love reading interesting excursions in low level programming!

willvk · on Sept 11, 2018

Thanks for reading, and thanks for the feedback! That's a good tip on the passive/active voice. I did actually think about standardising that but let it (unfortunately) slide as I thought I would go from my initial thinking to more instructional but that's probably not as effective and engaging to the reader the whole way through. Will remember that tip for my next post! Thanks again!

exikyut · on Sept 10, 2018

I mentioned this elsewhere, but just to make sure you see it, I also stumbled on https://codegolf.stackexchange.com/a/135618/15675, which may also be useful.

willvk · on Sept 10, 2018

Thanks for that. Yep, I've seen that too.

pjc50 · on Sept 10, 2018

Your article is actually pretty handy in terms of setting up the "workflow" for compilation and debugging from scratch. It's the sort of thing genuine beginners need to avoid getting tripped up.

willvk · on Sept 11, 2018

Thank-you for your kind words about the post. That was what I was trying to do with it - to break down some of the nebulous fear that beginners have with assembly. Cheers!

saagarjha · on Sept 10, 2018

> The unexplained and useless nop is also puzzling. Why is it there? What is it for? The same goes for the many other instructions which aren't actually contributing to the calculation, and initially made me think it was compiler output when I first glanced over the code. Writing a program that performs the given task is relatively straightforward even in Asm, but your solution is unnecessarily complex.

It is explained:

> Line 5 is a no-operation (or noop), which means that it is a filler instruction and doesn’t actually do anything but take up a CPU clock cycle. Older versions of GDB require this proceeding _start, and so I have left it in for these examples.

userbinator · on Sept 10, 2018

So it is. It would've been much clearer to mention that at the beginning when it's first introduced, since it's not commented like the other instructions and I first thought it was a left-over placeholder.

That said, it must be a very old GDB since I've otherwise not ever seen a NOP as the very first instruction of any nontrivial program in all these years... maybe two NOPs in the early days for patching, but that's about it.

willvk · on Sept 10, 2018

I agree the structure of some of my explanations could have been a bit better. The book I was using as a basis was from the mid-90s so it would be a version of GDB from 20 years ago that requires the NOP.

exikyut · on Sept 10, 2018

As someone interested in understanding/getting into assembly language (actually for years), I have a small torrent of naive and hopefully not too annoying questions. :)

- Why and how is argv[1] at [esp+8]? I realize this is a Linux-specific implementation detail but would like to understand.

- argv[1] is "gotten" by reading from esi. You then proceed to poke eax and ecx. TIL that SI, AX and CX were implicitly linked. What specifically is going on here?

- Continuing along the implicit linked-ness train, how is accessing AL reading the value set in SI?

- How does the LEA usage here work? You're putting the address of "[ecx+eax-'0']" into ecx. First question, how is ecx not clobbered? Secondly, how does that dereference work? I see similar semantics used in the 2nd and 3rd instructions, it seems AX and CX are linked in some way (in this situation).

- So the jecxz is bailing out (to an assumed label, that's okay) if ecx is 0. Cool. But why do you now zero and then increment eax?

- At the end you have a loop around a single xadd instruction, (which, if I understand what's happening, is the equivalent of "edx += eax" here?). The only way I can interpret this is that the atoiloop routine was actually building up a stack (in the LEA bit) and this is ticker-tape-ing its way through that?

I appreciate any insight and your patience :) if I'm learning something unfamiliar enough to be completely disorientating, reading through insights others have written down (ie, the questions others asked and documented the answers to) doesn't seem to help. I have to ask a thousand questions of my own in order to get my bearings.

This could be the fault of the assembly "tutorials" out there though. I can't remember the number of articles I've eye-glazedly read through; I understand binary, the stack, the basics, etc, incredibly incredibly well, but (as you can see) have tons of blocking issues and holes in my understanding. Some I'm aware of - for example, I know I have no mental model of how x86 memory management is done, and I've never been able to find anything concise that just deals with that - but unfortunately some of the holes are so big all I know is that something's missing and that my eyes glaze over and I don't know why. In any case, learning difficulties FTL. Been wanting to understand asm since 2006.

NB. While responding to another answer I stumbled on https://codegolf.stackexchange.com/a/135618/15675, a(n exhaustingly) exhaustive exploration of Fibonacci in x86, also for Linux.

saagarjha · on Sept 10, 2018

> Why and how is argv[1] at [esp+8]? I realize this is a Linux-specific implementation detail but would like to understand.

The kernel puts them there when the program starts; I believe this is part of the System V calling convention so it's not Linux-specific.

> argv[1] is "gotten" by reading from esi. You then proceed to poke eax and ecx. TIL that SI, AX and CX were implicitly linked. What specifically is going on here?

They are not linked, the lodsb instruction loads a byte from the address pointed at by esi into ax.

> How does the LEA usage here work? You're putting the address of "[ecx+eax-'0']" into ecx. First question, how is ecx not clobbered? Secondly, how does that dereference work? I see similar semantics used in the 2nd and 3rd instructions, it seems AX and CX are linked in some way (in this situation).

lea just does math, it doesn't dereference anything. ecx is being clobbered, but that's what we want: the lea is doing the equivalent of ecx += eax - '0'.

> So the jecxz is bailing out (to an assumed label, that's okay) if ecx is 0. Cool. But why do you now zero and then increment eax?

This will be the return value of main and the exit code of the program. Since you errored, you want to return a nonzero value.

exikyut · on Sept 10, 2018

>> argv[1] is "gotten" by reading from esi. You then proceed to poke eax and ecx. TIL that SI, AX and CX were implicitly linked. What specifically is going on here?

> Where do you see this?

Here:

  mov esi, [esp+8]  ; get argv[1]                value written to esi; okay
  xor eax, eax      ; clear the upper bits       \ how is writing to eax
  xor ecx, ecx      ; will hold converted value  / and ecx now relevant?!

> lea just does math, it doesn't dereference anything.

Right; I said "LEA usage"; the instruction itself I understand, but "[ecx+eax-'0']" is throwing me for a loop.

T-hawk · on Sept 10, 2018

LEA does math, but with a particular structure to the inputs: a pointer plus an index plus a constant. That's meant to be used as a C compiler would, where the pointer would be to a struct array, the index would be the position within the array, and the constant would be the byte offset within the struct.

Calculating a memory address in that way is a common operation, so x86 processors give it special support with hardware that can calculate that sum all in one step.

Most often the computed address is used in a MOV instruction. LEA provides access to the address-computation hardware without doing the actual data move. That's useful because some math operations can be done faster through that hardware than with general-purpose math instructions. LEA requires that pattern of register + register*shift + constant (we aren't using the shift here).

    imul ecx, ecx, 10
    lea ecx, [ecx+eax-'0']

Those two instructions multiply ecx by 10 and then add the next digit (stored as ASCII in eax) to it. Subtracting '0' converts that digit from ASCII to its numerical value. It just so happens that LEA can execute the add and conversion fast and together in one step, because that combined operation matches the pattern of computing an address.

willvk · on Sept 10, 2018

I didn't know you could do that with lea. That's pretty cool!

saagarjha · on Sept 10, 2018

Sorry, I'm actively editing my answer as I get a better sense of your questions and I can answer more of them. Please see my edits; also, the specific lines you pointed out are not related to reading in argv[1]. They're just zeroing out eax and ecx; eax because we will only be writing to the lower bits with the lodsb and don't want out arithmetic to be fouled by anything left over in the higher bits, and ecx because we are continuously adding to it to in atoi, so it needs to start off at zero.

userbinator · on Sept 10, 2018

They are not linked, the lodsb instruction loads a byte from the address pointed at by esi into ax.

lodsb (b for Byte) loads into al, not ax. It's lodsw (Word) which uses ax, and of course there's lodsd for eax and even lodsq for rax (64-bit only).

This will be the return value of main and the exit code of the program. Since you errored, you want to return a nonzero value.

Actually that's the non-error path; eax is set to 1 and edx to 0 as initial values for computing fib(n) in the fibloop.

saagarjha · on Sept 10, 2018

Oops, you're right. I should have read that more carefully…

exikyut · on Sept 11, 2018

NB Addendum: https://i.imgur.com/u6jtIAZ.png

exikyut · on Sept 10, 2018

FWIW, the article mentions a possible followup "using ... MMX, XMM or SIMD". (To clarify, XMM is SSE.) SIMD is noted last as though it's the newest development, when that would arguably be is at least AVX, and AVX-512 to be specific.

willvk · on Sept 10, 2018

Thanks for that. I didn't intend for it to be a chronological list but will add AVX to it. Cheers.

exikyut · on Sept 10, 2018

Oh, okay, my bad!

And wow, fibonacci using hand-rolled AVX :) that would definitely be a sight to see.

...although, I should explicitly point out, AVX is a set of vector streaming/processing instructions. Fibonacci is inherently a single-step operation, with no lookahead.

Okay, so I googled "fibonacci avx". The results were quite a mess but some careful wading found me http://www.insideloop.io/blog/2014/10/14/arrays-part-iii-vec..., which states: "A a consequence, the following code that computes the sequence of Fibonacci can’t get vectorized."

So, forgive my own (bullish!) naivete. AVX, and _possibly_ SIMD, SSE, etc _may not_ be applicable to/usable for Fibonacci computation.

userbinator · on Sept 10, 2018

Vector instructions are certainly usable for scalar computation, although it doesn't make much sense --- you basically just use one piece of one vector and throw away all the other results.

Fibonacci is a toy example but illustrates where the vector instructions do become useful: if you want to compute the n'th term of various Fibonacci-like sequences with different starting conditions but the same recurrence, then you could do them in parallel using those instructions.

leghifla · on Sept 10, 2018

Unrolling the computation of the series, it is possible to compute f(n) using only f(n-4) and f(n-8):

f(n) = 7*f(n-4)-f(n-8)

So, using 4-vector operations, you can compute 4 numbers in one step.

pq0ak2nnd · on Sept 10, 2018

every programmer needs to do this at some point in their life. it is too easy to take take modern languages for granted. but i'm a little disappointed that you dogged `make`. i think the programming landscape would be a lot more effective if people took the time to learn about what already exists (not just make but autoconf/automake/configure) instead of reinventing the wheel every three years.

willvk · on Sept 10, 2018

Thanks for reading and thanks for the feedback. It wasn't my intention to 'dog' make, I just felt that for an entry level post that introducing the make syntax may detract from the focus on assembly (although make is an integral part of gas asm) and add extra complexity and length to the (already very long) article. Anything more complex, such as multiple .s files would definitely warrant a description of make. Thanks again.

emilfihlman · on Sept 10, 2018

  movl $1, %eax	movl	$1	%eax	copy 0 into register eax
  movl $0, %ebx	movl	$0	%ebx	copy 1 into register ebx

Surely the explanations are flipped?

willvk · on Sept 10, 2018

Yep. It's a typo. Thanks for the feedback.

saagarjha · on Sept 10, 2018

I think you might be missing the NULL that terminates argv in your stack diagram.

willvk · on Sept 10, 2018

Thanks for that. I've amended the diagram accordingly.

sdfsdfsdff · on Sept 10, 2018

It's also a problem in 7 Billion Humans or Human Resource Machine (I forgot which - they are both great anyway), fun programming games that also use a kind of assembly language.