Was watching someone get an Altair to run BASIC via a teletype and paper tape. The amount of bit flipping just to get the initial bootloader ready made me appreciate the work Woz and the like did to make later systems boot straight to a prompt on a CRT.
On a different note, those old systems had RAM and CPU clocks that were basically synced. Thus my understanding is that one could "cycle count" one's way to high-performance code.
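To make the "cycle counting" idea concrete, here is a rough sketch of how you would time a simple delay loop by hand. The state counts are the commonly quoted Intel 8080 figures (MVI 7, DCR 5, JNZ 10) and the 2 MHz clock is what an Altair 8800 ran at; treat both as assumptions to check against the datasheet, and note that the helper names are made up for illustration. It also assumes memory adds no wait states.

```python
# Hand cycle-counting for an 8080-style delay loop:
#
#         MVI B, n      ; load loop counter
#   loop: DCR B         ; decrement
#         JNZ loop      ; repeat until B == 0
#
# State counts below are assumed from commonly quoted 8080 timings.
CLOCK_HZ = 2_000_000  # assumed 2 MHz clock, zero wait states

STATES = {"MVI": 7, "DCR": 5, "JNZ": 10}


def delay_loop_states(n):
    """Total clock states for the loop above, with n in 1..255."""
    return STATES["MVI"] + n * (STATES["DCR"] + STATES["JNZ"])


def states_to_us(states, clock_hz=CLOCK_HZ):
    """Convert a state count to microseconds at the given clock."""
    return states * 1_000_000 / clock_hz


if __name__ == "__main__":
    total = delay_loop_states(100)
    print(total, "states,", states_to_us(total), "us")
```

Because every instruction takes a fixed number of states, the total is exact rather than an estimate, which is what made this style of optimization practical.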
The CPU was faster than the memory by the mid-to-late '70s (though the two were roughly equal at the start of the decade). You would insert a number of "wait states" with an external counter to ensure your memory's timing was not violated.
At that time RAM was asynchronous: you put an address on the address bus, and after the specified delay you would have your result on the output. Later on, memories were pipelined, and that is when they got their own clock.
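As a back-of-the-envelope sketch of the wait-state arithmetic: if the CPU expects data within some window after it puts out the address, and the RAM's access time is longer than that window, you stall for enough whole clock periods to cover the shortfall. The function name and the numbers in the example are hypothetical, not taken from any particular CPU or RAM datasheet.

```python
def wait_states(access_ns, window_ns, clock_ns):
    """Whole clock periods to add so the RAM's access time is met.

    access_ns: RAM access time (address valid to data valid)
    window_ns: time the CPU allows with zero wait states
    clock_ns:  CPU clock period
    All values are integer nanoseconds.
    """
    shortfall = access_ns - window_ns
    if shortfall <= 0:
        return 0  # memory is fast enough, no stall needed
    # Ceiling division: a partial period still costs a full wait state.
    return -(-shortfall // clock_ns)


if __name__ == "__main__":
    # Hypothetical: 450 ns RAM, 375 ns window, 250 ns clock (4 MHz)
    print(wait_states(450, 375, 250))
```

Each wait state stretches that memory cycle by one full clock period, so cycle counts for code running out of slow memory have to include them.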