Ask HN: What are some examples of beautiful x86 assembly code?

jbaiter · on April 27, 2018

I bought a nice little booklet of small x86 gems a few years back[1][2] and this one is my favorite:

  .loop:
      xadd     rax,rdx
      loop     .loop

Fibonacci with two instructions.

[1] https://www.amazon.com/dp/1502958082

[2] https://www.xorpd.net/pages/xchg_rax/snip_00.html

zero_one_one · on April 27, 2018

I’d like to add to this comment to say that the author has a fantastic assembly course available on udemy - very thorough and practical (teaching the basics in a win32 environment)

https://www.udemy.com/x86-asm-foundations/

vortico · on April 27, 2018

Is that series endorsed by xorpd or an account just called xorpd? Udemy is known for allowing anyone to steal, upload, and profit from video series.

EDIT: Looks like it is. https://www.xorpd.net/pages/x86_adventures.html

xorpd · on April 27, 2018

Yes, I made the videos a few years ago. This video series is mostly for beginners as it goes pretty slowly.

The exercises are open source and can be found here: https://github.com/xorpd/asm_prog_ex I think that more experienced developers can take the instruction set and bash through the exercises.

zero_one_one · on April 27, 2018

I think it’s created by the man himself - having been through it it’s a very thorough coverage of the basic instruction set without delving too much into the windows specifics (it uses FASM syntax from what I recall)

giancarlostoro · on April 27, 2018

Having barely any assembly experience, and looking up what xadd means[0] it really is quite a nice little snippet of code.

[0] https://stackoverflow.com/a/30131285

neilsimp1 · on April 27, 2018

Can someone more versed in x86 explain to me how this works?

It's been a long time since I've done any assembly, and that was MIPS, but I don't see how there's any sort of exit condition or anything. I'm guessing there's more to xadd than `x + y = z`?

giancarlostoro · on April 27, 2018

I don't know much Assembly, but at what point does Fibonacci end? Shouldn't it just keep going?

booblik · on April 27, 2018

No, loop decrements rcx until it reaches zero and then stops

abhishekjha · on April 27, 2018

But rcx has not been initialised.

pjc50 · on April 27, 2018

This is correct, from the snippet given - what the other commentators are failing to say is that they're assuming the registers are initialised "offstage" by some other part of the program.

wruza · on April 27, 2018

If all functions took (rax, rbx, rcx, ...) and you had loop statement that defrements rcx and jumps unless zero, then how would you write fib-function’s body?

abhishekjha · on April 27, 2018

You raise a good question. At one time you would have only one loop running through the system as per my understanding. I am a little lost here.

dangerbird2 · on April 27, 2018

In a single-core cpu, the operating system's scheduler manages the register state for each process: basically, when switching from one context to another, it dumps the old process' registers to memory, and loads the register state to the new one. From the user's point of view, the register state will appear unaffected by different processes: your loop register will not changed by other processes and threads. There is no parallelism, so only one program and register state is active at once, but there is concurrency (if the OS supports it). On a multi-core processor, each core has its own set of registers, so the scheduler could theoretically run a multiple processes uninterupted on one core per process.

wruza · on April 28, 2018

Answer is you don’t have to initialize rcx, since it is already a prerequisite for a looping subroutine. There is no for (i=0; i<n; i++). Argument n is a counter and it is rcx itself. It is ‘discharged’ by a loop. If caller wants to save it, then it does that by his own means. This contradicts all regular languages’ rules, but this is low-level, far below all the safety.

On your second question about different loops. First, you can loop through any register via (dec <reg>; jnz <label>), thus having nested loops. Second, all valuable registers are pushed onto the stack and popped before/after a call or an interrupt, so their contents don’t change. Some registers have to be saved by the caller and some by the callee, depending on a calling convention and an actual usage. The stack is always available (rsp points to its top), so you can offload temporary values and concentrate on current evaluation, loading values on demand later.

zero_one_one · on April 27, 2018

I think this is the biggest problem with learning x86 assembly (or ARM or anything else) on modern systems (or more specifically modern operating systems).

It’s sometimes difficult to think about the assembly code in situ when you start to think about the operating system doing a ton of context switching and paging etc. in the background, which can distract your thought process from what’s right in front of you (as well as the operating system’s software interrupts / system calls on top of the basic ISA, which is another abstraction!)

Older systems had the currently running program as the entire context of the system at that point in time - in a similar way to embedded programming, which is imho a much easier realm to learn assembly in once you’ve got a bit of basic electronics under your belt!

asveikau · on April 28, 2018

The whole point of how interrupt handling works is that it returns back to the same state of the program already in progress when finished. The abstraction is such that the interrupted program doesn't need to care.

Even in those "old" systems of single address spaces and no protection, you're constantly getting timer interrupts, interrupts for I/O, etc., which your application might not have installed its own handler for.

zero_one_one · on April 28, 2018

I agree - my main point is that an OS is ‘just a program’ as well

I suspect we’re both making a similar point in a roundabout way - the operating system is both another layer of abstraction on top of the Instruction Set, while also making the programming process for that chipset somewhat easier (providing software interrupts etc. at the expense of bare metal understanding).

My argument is loosely that modern (x64) assembly is not so much targeting hardware as it is programming into a software abstraction (the operating system).

zero_one_one · on April 27, 2018

The code above I believe is the point to jump to when the register has been assigned some value for x. Think along the same lines as a function call once the pre-amble is out of the way (or a ‘goto’ by someone who knows when to use it!)

dangerbird2 · on April 27, 2018

It's a hardware register, so it has always has a value (in 64-bit mode, at least), whether or not the program explicitly sets it.

boomlinde · on April 27, 2018

But it has a value regardless

bonzini · on April 27, 2018

"xadd r1,r2" sets r2=r1 and r1=r1+r2 at the same time.

psyc · on April 27, 2018

I was confused until I noticed the "x" in front of "add".

partycoder · on April 27, 2018

interesting but i prefer Binet's formula

varjag · on April 27, 2018

My interest, save for occasional tweak in ARM boot code, is mostly historical. Grew up on Z-80 assembly but really enjoy PDP-11 code for didactic purposes. You can easily see that the rise of optimising C compilers was helped by divergence from PDP instruction set. Not ever since C idioms map down so nicely.

strlen():

  LEN:   MOV #-1, R0
  1⊙     INC R0
         TSTB (R1)+
         BNE 1⊙

or take strcpy():

  COPY:  MOVB (R1)+, (R0)+
         BNE COPY

versus the C version:

  void strcpy(char *s, char *t) 
  {
    while (*s++ = *t++)
      ;
  }

It translates to the optimal machine code verbatim. It does not require any smarts of the compiler, at the expense of understanding on the side of the programmer. This is made possible by orthogonality of the instruction set, which in less well thought out designs was hacked around with special instructions (like LDIR on Z80). The price for that is you have to make compiler optimize these situations.

vardump · on April 27, 2018

That's remarkably close to 68k str-functions.

Page 133 http://www.atarimania.com/documents/Asm_Lang_Prog_68K_Family...

  * STRLEN - RETURNS LENGTH OP NULL TERMINATED STRING IN D0
  * A0 -> STRING
  STRLEN: MOVE.L A0,-(SP) SAVE REG
          CLR.L  D0       INITIALIZE
  STRLENI:TST.B  (A0)+    NULL?
          BEQ    STRLENR  YES, RETURN
          ADDQ.L #1, D0   BUMB COUNT
          BRA    STRLENI  LOOP
  STRLENR:MOVE.L (SP)+,A0 RESTORE REG
          RTS

  We might also want to copy a string:
  
  * STRCPY - COPY A NULL TERMINATED STRING
  * A0 -> SOURCE STRING
  * A1 -> DESTINATION STRING
  STRCPY: MOVEM.L A0-A1,-(SP) SAVE REGS
  STRCPY1:MOVE.B  (A0)+,(A1)+ MOVE A BYTE
          BNE     STRCPY1     GET ANOTHER IF NOT NULL
          MOVEM.L (SP)+/A0-A1 RESTORE REGS
          RTS
  
  Next, we will want to compare two strings:
  
  * STRCMP - COMPARE TWO NULL TERMINATED STRINGS
  * A0 -> STRING 1
  * A1 -> STRING 2
  STRCMP: MOVEM.L A0-A1,-(SP)  SAVE REGS
  STRCMP1:CMPM.B  (A0)+,(A1)+  COMPARE BYTES
          BNE     STRRET       RETURN IF DIFFERENT
          TST.B   -1(A0)       HAVE WE HIT A NULL?
          BNE     STRCMP1      NOW MORE BYTES LEFT
  STRRET: MOVEM.L (SP)+,A0-A1  RESTORE REGS
          RTS

Although I guess I'd implement strlen more like this (not sure if it works, been a long time since I last wrote anything for 68k):

  strlen:   movem.l a0-a1,-(sp)
            move.l  a0, a1   ; copy a0 to a1
  slenloop: tst.b   (a0)+
            bne     slenloop
            addq.l  #1, a1
            sub.l   a0, a1
            move.l  a1, d0
            movem.l (sp)+,a0-a1
            rts

Just two instructions in the inner loop. But not sure whether address registers supported addq etc.

varjag · on April 27, 2018

Wow it's close indeed, even stack manipulation looks the same. Would be

  MOV  R0, -(SP)
  ....
  MOV  (SP)+, R0

for the PDP code.

vardump · on April 27, 2018

My faster (?, see [0]) strlen seems to compile (using MIT 68k syntax):

  > cat > test.asm
  strlen:   movem.l %a0-%a1,-(%sp)
            movl    %a0, %a1     ;# copy a0 to a1
  slenloop: tst.b   (%a0)+
            bne.b   slenloop
            addq.l  #1, %a1
            sub.l   %a0, %a1
            move.l  %a1, %d0
            movem.l (%sp)+,%a0-%a1
            rts
  ^D

  > m68k-linux-gnu-as test.asm && m68k-linux-gnu-objdump -d a.out
  a.out:     file format elf32-m68k
  
  Disassembly of section .text:
  
  00000000 <strlen>:
     0:	48e7 00c0      	moveml %a0-%a1,%sp@-
     4:	2248           	moveal %a0,%a1
  
  00000006 <slenloop>:
     6:	4a18           	tstb %a0@+
     8:	66fc           	bnes 6 <slenloop>
     a:	5289           	addql #1,%a1
     c:	93c8           	subal %a0,%a1
     e:	2009           	movel %a1,%d0
    10:	4cdf 0300      	moveml %sp@+,%a0-%a1
    14:	4e75           	rts

I wonder whether it actually works... I think same idea should be translatable to PDP-11 as well.

[0]: It runs slower for short strings, though. Not sure where the break even point is.

vardump · on April 27, 2018

Previous one didn't work, because I mixed up a0 and a1 pointers when subtracting.

But that gave me a new idea: copy pointer to d0 and complement the difference instead of addq, saving a1 register (+stack operations) and a move-instruction:

  strlen:   move.l  %a0,-(%sp)
            movl    %a0, %d0   ;# d0 = a0;
  slenloop: tst.b   (%a0)+     ;# test *a0, post incr
            bneb    slenloop   ;# loop if non-zero
            sub.l   %a0, %d0   ;# d0 = d0 - a0;
            not.l   %d0        ;# d0 = ~d0;
            move.l  (%sp)+,%a0
            rts                ;# d0 is the return value

Equivalent C-code (http://cpp.sh/2oecd):

  #include <stdio.h>
  #include <stdint.h>
  
  size_t strlen68k(char* a0) {
      int zero_flag;
      uintptr_t d0 = (uintptr_t) a0;
      do {
          // tstb (%a0)+
          if (*a0 == 0) 
              zero_flag = 1;
          else
              zero_flag = 0;
          a0++; // (%a0)+ (post increment)
      } while(!zero_flag); // bneb slenloop
      d0 = d0 - (uintptr_t) a0; // sub.l %a0, %d0
      d0 = ~d0;  // not.l %d0
      return d0;
  }
  
  void strlen_test(char* str) {
      printf("str \"%s\" length is %lu\n", str, strlen68k(str));
  }
  
  int main(void) {
      strlen_test("");
      strlen_test("a");
      strlen_test("abcd");
      return 0;
  }

Again, untested, but I think it probably works. I guess this is also faster for all string lengths >0 as well.

This idea might not work on PDP-11 anymore, depending on how binary arithmetic is implemented.

oh5nxo · on April 27, 2018

68xx is another case that got the conditions which set flags _just_ right. IMHO, obv

sedatk · on April 27, 2018

The source code of AGC (Apollo Guidance Computer) written by Margaret Hamilton and her team. Clean and obviously it worked well. Comments are also interesting to read. https://github.com/chrislgarry/Apollo-11

quickthrower2 · on April 27, 2018

That's certainly not x86 code, but loved the link anyway, thanks.

Something1234 · on April 28, 2018

What are some particularly interesting parts of this poke around in?

pcwalton · on April 27, 2018

Here's a wiki dedicated to very small x86 programs: http://www.sizecoding.org/wiki/Main_Page

Lots of fun stuff there, like a scrolling Matrix screen saver in 8 bytes (!!) that jumps into the middle of an instruction: http://www.sizecoding.org/wiki/M8trix_8b

basementcat · on April 27, 2018

That is a great resource. Another great example is a 16 byte paint program http://www.sizecoding.org/wiki/Paint16b

xem · on April 27, 2018

If you're interested in demoscene, this explains how to make x86 programs in 256b or less: http://sizecoding.org/

...

and yes, I consider it beautiful x86 code :)

panic · on April 27, 2018

Jones Forth: http://git.annexia.org/?p=jonesforth.git;a=blob_plain;f=jone...

There was some discussion of it here a while ago: https://news.ycombinator.com/item?id=942684 (though unfortunately the site is now dead)

q845712 · on April 27, 2018

In my work book club we've been reading the book "zero bugs" by Kate Thompson.

The first half is some fun writing with stock good advice for software development, and the second half is a bunch of code examples the author finds interesting, many of which are in various flavors of assembly.

(I have no affiliation with the author, publisher, nor any other ulterior motive. Office consensus is that this is the best book we've done a book club around - highly recommended as just plain fun to read, if you're already writing software.)

imron · on April 27, 2018

It's out of date now, but back in the day Michael Abrash's 8-cycle per pixel texture mapper was considered a thing of magic:

http://www.jagregory.com/abrash-black-book/#thats-nicebut-it...

frostirosti · on April 27, 2018

Come on, the movfuscator has to be the best x86 I've seen:

https://github.com/xoreaxeaxeax/movfuscator

harel · on April 27, 2018

I was going to ask why? But then I read the single faq at the end: because I thought it would be funny...

stochastic_monk · on April 27, 2018

In addition to being funny, it's appealing to me to have the option of generating assembly to perform a task in such a way that a human looking at the assembly would have enormous difficulty in determining what is performed. As suggested by its name, it serves as a nice obfuscator.

It also fully avoids all branches.

kardos · on April 27, 2018

No movfuscator post is complete without mentioning the demovfuscator [1]

https://github.com/kirschju/demovfuscator

trentnelson · on April 27, 2018

This injection thunk I wrote in order to do facilitate remote code and context (file handles, event handles, shared memory maps etc) injection: https://github.com/tpn/tracer/blob/master/Asm/InjectionThunk...

msravi · on April 27, 2018

IBM PC x86 BIOS code:

https://github.com/kaneton/appendix-bios

OT: This brings back memories of tinkering with the MS-DOS boot process. Back then, the BIOS would read the MBR and copy its contents to 0x7C00 and start execution from there. So you could assemble your own code (using no less than MS-DOS debug) and plonk it into the MBR. I remember doing things like fooling the boot loader into thinking there's less ram than there actually was (639kB instead of 640kB) and using the unaccounted 1kB for placing your own code that could be triggered by a captured interrupt... Fun times!

mschaef · on April 27, 2018

What was the advantage of this over 'Terminate and Stay Resident'?

msravi · on April 27, 2018

So yes, this was a TSR, except that it resided in unaccounted for memory. Also, since you put it into the MBR, it would get loaded into ram automatically with every boot, and no fiddling with autoexec.bat or anything where it could be easily discovered. All this of course, in trying to understand the fascinating world of viruses.

linuxlizard · on April 27, 2018

Maybe stealthiness? TSRs could be viewed with the right tools, IIRC. This seems like a technique to hide code?

leni536 · on April 27, 2018

Well, not written in assembly but uses x86 intrinsics:

https://github.com/leni536/fast_hilbert_curve

I'm pretty proud of it. It calculates the coordinates of the nth point on the hilbert curve. No loops, no branches. It uses the pext, pdep and popcount intrinsics creatively.

keldaris · on April 27, 2018

Interesting. Have you tried generalizing that for higher dimensions?

dexen · on April 27, 2018

Duff's device, as emitted by GCC[0], is a bit on the verbose side but still quite neat. In particular the single-instruction computed goto that uses a look-up table made up of 8 quad-words, filled in by the linker.

Note the '.section .rodata' directive which actually places the quads pointers, seemingly interleaved with code, in a read-only data section.

Note also the dec/test/jle instructions implementing the while loop occur before the last of the eight copy operations, and interleaved with the next-to-last copy operation.

  duff:
  .LFB0:
          .cfi_startproc
          lea     eax, [rdi+7]
          mov     r8d, 8
          mov     rcx, rdx
          cdq
          idiv    r8d
          mov     r9d, eax
          mov     eax, edi
          cdq
          idiv    r8d
          cmp     edx, 7
          ja      .L2
          mov     edx, edx
          jmp     [QWORD PTR .L4[0+rdx*8]]
          .section        .rodata
          .align 8
          .align 4
  .L4:
          .quad   .L3
          .quad   .L5
          .quad   .L6
          .quad   .L7
          .quad   .L8
          .quad   .L9
          .quad   .L10
          .quad   .L11
          .text
  .L11:
          mov     al, BYTE PTR [rsi]
          inc     rsi
          mov     BYTE PTR [rcx], al
  .L10:
          mov     al, BYTE PTR [rsi]
          inc     rsi
          mov     BYTE PTR [rcx], al
  .L9:
          mov     al, BYTE PTR [rsi]
          inc     rsi
          mov     BYTE PTR [rcx], al
  .L8:
          mov     al, BYTE PTR [rsi]
          inc     rsi
          mov     BYTE PTR [rcx], al
  .L7:
          mov     al, BYTE PTR [rsi]
          inc     rsi
          mov     BYTE PTR [rcx], al
  .L6:
          mov     al, BYTE PTR [rsi]
          inc     rsi
          mov     BYTE PTR [rcx], al
  .L5:
          mov     al, BYTE PTR [rsi]
          dec     r9d
          inc     rsi
          test    r9d, r9d
          mov     BYTE PTR [rcx], al
          jle     .L2
  .L3:
          mov     al, BYTE PTR [rsi]
          inc     rsi
          mov     BYTE PTR [rcx], al
          jmp     .L11
  .L2:
          ret
          .cfi_endproc

___

edit: formatting, sigh

[0] v 7.3.0 64bit; gcc -S -Os -masm=intel

zeusk · on April 27, 2018

> Note the '.section .rodata' directive which places ..., seemingly interleaved with code, in a read-only data section.

It only specifies which section that part of "code" goes into, the linker pools it all up in a binary image and fills in the address for the tags when linking as directed by the linker script[0].

[0]: https://sourceware.org/binutils/docs/ld/Simple-Example.html

_o_ · on April 27, 2018

I think that Duff device is more interesting in C :)

https://en.wikipedia.org/wiki/Duff%27s_device

emily-c · on April 27, 2018

I've always found this 16 byte bubble sort implementation to be absolutely beautiful.

https://gist.github.com/jibsen/8afc36995aadb896b649

indescions_2018 · on April 27, 2018

Pokémon! GBA (ARM7) is even being used in some undergrad level OS classes. Although you may get more mileage out of RP3 development.

disassembly of Pokémon Red/Blue

https://github.com/pret/pokered

TONC GBA Programming Principles

https://www.coranac.com/tonc/text/toc.htm

cosarara · on April 27, 2018

Red and blue are gbc (z80-like, not ARM).

pcwalton · on April 27, 2018

Well, Z80 shares a common ancestor with x86 and resembles it quite a bit [1]. The assembly syntax is just different because Zilog was afraid of lawsuits…

[1]: https://en.wikipedia.org/wiki/Zilog_Z80#Datapoint_2200_and_I...

danmg · on April 27, 2018

GB was Sharp LR35902. Similar but not quite an z80 or x80.

https://realboyemulator.wordpress.com/2013/01/02/the-nintend...

dahart · on April 27, 2018

There's the ray-caster in 128 bytes.

http://finalpatch.blogspot.com/2014/06/dissecting-128-byte-r...

https://news.ycombinator.com/item?id=7940212

Cieplak · on April 27, 2018

https://2ton.com.au/HeavyThing/#appsources

mirashii · on April 27, 2018

[xchg rax,rax] has a collection of interesting snippets. You can either try to reason what they're doing yourself, or there are a variety of writeups out there for each of the snippets.

kulu2002 · on April 27, 2018

This is not x86 but certainly an optimised code https://www.pagetable.com/?p=774

chatmasta · on April 27, 2018

Pretty incredible that such world changing code fits on one page.

It’s ineresting that when tools are new, like ASM in 1978, they give high leverage to the first to use them. Microsoft was able to leverage a small amount of code into a world changing platform. Now it would be nearly impossible to do the same with a team the same size.

But in 2018, the nascent state of ML tools looks similar to the nascent state of programming tools in 1978. And indeed we are seeing entire companies built around relatively basic AI in the scheme of things. As first movers these companies have the same kind of leverage with respect to AI that Microsoft did to Software in the 1980s.

Perhaps in 2058 someone will share a link to a Tensorflow script and we will all marvel at its terseness and apparent simplicity.

msl · on April 27, 2018

> Pretty incredible that such world changing code fits on one page.

What code are you looking at? I see "Microsoft BASIC for 6502 Original Source Code" which runs for 6955 lines. By my reasoning that is more than a hundred pages.

chatmasta · on April 27, 2018

Ok, “one page” may not be a technically fair description (though it does fit on a web page). Still, I think most programmers would agree that nowadays, few world changing technologies can be expressed in 6955 lines of code. That’s what I mean by high leverage.

linker3000 · on April 27, 2018

The current time on the command line in words - inspired by the original by Jim Button. Nothing fancy, but I remember at the time it was a personal challenge to shave as many bytes from the code as possible:

https://github.com/linker3000/Historic-code-PC-Pascal-and-AS...

inetsee · on April 28, 2018

The very first programming book I read was a programmed instruction text on Machine Language. This would have been in 1965, and the book probably dated from a few years earlier. It was another four years before I could actually run a program on a real computer (an Algol program on a Burroughs B5500, input via punched cards).

I tried to find the book a few years back, for nostalgic reasons, but couldn't.

davibu · on April 27, 2018

Here's a mix between C++ and asm. https://www.codeproject.com/Articles/36907/How-to-develop-yo...

mrcarruthers · on April 27, 2018

Someone got the source code for Super Mario Bros and commented it up: https://gist.github.com/1wErt3r/4048722 (not x86 though...)

p0nce · on April 27, 2018

My favourite was a function to swap either a byte, word or dword with a single ret. The function would have three entry points, calling itself recursively like hanoi-towers.

tiffanyh · on April 27, 2018

Isn't LuaJIT assembler DYNASM often referred to as poetry

https://luajit.org/dynasm.html

gesman · on April 27, 2018

I remember I wrote a 6 bytes code to destroy information on the whole hard drive.

Kind of opposite of what you're asking for :)

hackermailman · on April 27, 2018

This Defcon 18 talk called 'Trolling with Math' about trying to defeat reverse engineering with assembly tricks https://youtu.be/y124L75ZKAc

_o_ · on April 27, 2018

Well what we were doing for pe encryptors (will try to dig out some code from tapes when I get back into civilization :D) was generating a code that was driving execution flow using SEH (structured exception handling), sometimes the code executed as intended, sometimes the exception was trigered (with faulty code, again generated) to continue at completely different part of code. This was the complete pain in the "neck" to debug (timing the code execution is a great way to tell if someone is debugging it, and you just continue into some other generated code that is missleading the reverser), and completely impossible to solve on paper (well, with huge amount of knowlidge and a lot of time, you could do it :D). That's why I am laughing with todays overuse of try/catch/throws, we were using it to obfuscate execution flow and today it is used in normal coding with same devastating effect :D

raldu · on April 27, 2018

Check out these rather old examples for inspiration,

http://www.win32assembly.programminghorizon.com/source.html

utopcell · on April 27, 2018

The following snippet swaps two values (rax, rcx) only if they are out of order (i.e. rax > rcx). It does so without using any branches.

  mov rdx, rax
  cmp rax, rcx
  cmovg rax, rcx
  cmovg rcx, rdx

mtve · on April 27, 2018

not quite the assembly, but you may look at small intros at https://www.pouet.net/prodlist.php of sizes 32 bytes and so on.

many intros are provided with source codes, and others could easily be viewed in disassemblers.

especially look at intros by such brilliant people like Digimind and Řrřola.

jl2718 · on April 27, 2018

I was quite impressed with the Wolf3D rendering engine when somebody released the code years back.

0x7f800000 · on April 27, 2018

Michael Abrash's assembly optimizations for the Quake engine.

spati112 · on April 27, 2018

how du a hack

_o_ · on April 27, 2018

I really love this one, boot sector chess :)

https://gist.githubusercontent.com/jwieder/7e7e643cc71c81f63...

neengineering · on April 27, 2018

There were some pretty good Win32 API tutorials back in the day. It was similar to this, if not it:

http://win32assembly.programminghorizon.com/

It was quite enlightening to see common program constructs done only in assembly.

_o_ · on April 27, 2018

What about something here: http://z0mbie.daemonlab.org/

This guy was a genius (those were the times you didn't do it for profit, it was a game). Google for zmist...

_o_ · on April 27, 2018

The coolest one liner is xor eax,eax :) 1 cycle for setting register to 0.

thecompilr · on April 27, 2018

Yeah, but with the side effect of clearing the flag registers. So can't be used between operations that rely on flags. Alternatively it is great to break false flag dependency.

guitarbill · on April 27, 2018

Truly beautiful, I was so ecstatic when this clicked with me when I was first learning assembly. So much depth in one "simple" instruction.

senatorobama · on April 27, 2018

Linux boot.S