This is correct, from the snippet given - what the other commentators are failing to say is that they're assuming the registers are initialised "offstage" by some other part of the program.
If all functions took (rax, rbx, rcx, ...) and you had loop statement that defrements rcx and jumps unless zero, then how would you write fib-function’s body?
In a single-core cpu, the operating system's scheduler manages the register state for each process: basically, when switching from one context to another, it dumps the old process' registers to memory, and loads the register state to the new one. From the user's point of view, the register state will appear unaffected by different processes: your loop register will not changed by other processes and threads. There is no parallelism, so only one program and register state is active at once, but there is concurrency (if the OS supports it). On a multi-core processor, each core has its own set of registers, so the scheduler could theoretically run a multiple processes uninterupted on one core per process.
Answer is you don’t have to initialize rcx, since it is already a prerequisite for a looping subroutine. There is no for (i=0; i<n; i++). Argument n is a counter and it is rcx itself. It is ‘discharged’ by a loop. If caller wants to save it, then it does that by his own means. This contradicts all regular languages’ rules, but this is low-level, far below all the safety.
On your second question about different loops. First, you can loop through any register via (dec <reg>; jnz <label>), thus having nested loops. Second, all valuable registers are pushed onto the stack and popped before/after a call or an interrupt, so their contents don’t change. Some registers have to be saved by the caller and some by the callee, depending on a calling convention and an actual usage. The stack is always available (rsp points to its top), so you can offload temporary values and concentrate on current evaluation, loading values on demand later.
I think this is the biggest problem with learning x86 assembly (or ARM or anything else) on modern systems (or more specifically modern operating systems).
It’s sometimes difficult to think about the assembly code in situ when you start to think about the operating system doing a ton of context switching and paging etc. in the background, which can distract your thought process from what’s right in front of you (as well as the operating system’s software interrupts / system calls on top of the basic ISA, which is another abstraction!)
Older systems had the currently running program as the entire context of the system at that point in time - in a similar way to embedded programming, which is imho a much easier realm to learn assembly in once you’ve got a bit of basic electronics under your belt!
The whole point of how interrupt handling works is that it returns back to the same state of the program already in progress when finished. The abstraction is such that the interrupted program doesn't need to care.
Even in those "old" systems of single address spaces and no protection, you're constantly getting timer interrupts, interrupts for I/O, etc., which your application might not have installed its own handler for.
I agree - my main point is that an OS is ‘just a program’ as well
I suspect we’re both making a similar point in a roundabout way - the operating system is both another layer of abstraction on top of the Instruction Set, while also making the programming process for that chipset somewhat easier (providing software interrupts etc. at the expense of bare metal understanding).
My argument is loosely that modern (x64) assembly is not so much targeting hardware as it is programming into a software abstraction (the operating system).
The code above I believe is the point to jump to when the register has been assigned some value for x. Think along the same lines as a function call once the pre-amble is out of the way (or a ‘goto’ by someone who knows when to use it!)