Hacker News new | past | comments | ask | show | jobs | submit login
The Solid-State Register Allocator (mattkeeter.com)
44 points by jmillikin on Oct 4, 2022 | hide | past | favorite | 11 comments



Uh? This *is* the LuaJIT register allocator. Period.

Code published 2009. Description published here: https://lua-users.org/lists/lua-l/2009-11/msg00089.html (ignore the TLS cert error).

Coming up with a silly markting name, writing a naive implementation and then claiming it's their invention is impertinent. Especially since they mention LuaJIT itself in the text ...


Hi Mike,

I promise that this was a genuine case of parallel evolution – I didn't read any LuaJIT source or documentation while writing the code or article.

Afterwards, I searched for "reverse linear scan register allocation", discovered the blog post which referenced LuaJIT, and linked it in the text.

I've updated the article to move the LuaJIT reference to the top, and adjusted the phrasing to be clear that this is not an invention, it's simply an interesting implementation. What you call "naive", others may consider didactic :)

Let me know if you have other feedback or suggested changes, either here or via email (in my profile).


You may want to clarify that in the GitHub repo, too. See my issue there.

If you want to go the didactic route, then consider documenting the improvements over the naive implementation: register hinting, register priorities (PHI), two-headed register picking, fixed register picking, optimized register picking for 2-operand instructions (x86/x64), register pair picking, ABI calling-conventions, weak allocations, cost heuristics, eviction heuristics, lazy/eager spill/restore, rematerialization, register shuffling (PHI) with cycle breaking, register renaming, etc. That's all in ~2000 lines of lj_asm.c.


Done, I've added the same disclaimer to the Github repo.

Given that your Github issue [1] was originally titled "Take Down Notice", I'm now hesitant to read any LuaJIT code.

If I read lj_asm.c and learn from it, will you try to take down any register allocation code that I write in the future?

[1] https://github.com/mkeeter/ssra/issues/1


I had already changed the title after your reply. The objection is about the naming, which implies an invention claim without further explanation. It's not about the code.


Wait,

Are you demanding that independent implementations of an idea you explicitly dedicated to the public domain must then describe the various ways your version is “superior”?

Who hurt you?


This is such an amazingly uncharitable response.

> I cannot possibly know all of the literature

Maybe this applies to other people too?


I think he gets dead code elimination for free. When you write to a variable that has no register allocated, it must be dead. Exception is whatever values leave the program or instructions with a side effect.


Scanning basic blocks in reverse is how you calculate liveness. This is ancient, documentedin texts like the Dragon Book, 1988 (from whose perspective it is already old hat).

Liveness extends from definition to use, so you can think about it in the ordinary forward direction. Like when you read code in a high level language, you see some variable definitions near the top of the scope and then look for uses below.

But one problem is that scanning in the forward direction, you don't know that a definition is actually used. A definition of the temporary isn't what makes it live; the uses of the definition is what makes it live. Liveness flows backwards from uses to definitions. The uses say "looking backwards, there is the definition I'm using."

Another problem is, where does the lifetime end? It ends with the last use. Scanning forward, we don't know whether any given use we have seen so far is the last use.

In reverse, the events are in the right order. See a use, the temporary is live there and in prior instructions. If it was live already, nothing changes. See a definition, and it is dead prior to that definition.

This is directly relevant to assigning the temporaries to a smaller set of physical registers.

For that matter, you can use a backward scan to simply reduce the temporaries to a minimal set. So that is to say, take some code that keeps using new temporaries throughout, and do this backward allocation algorithm, using unlimited temporaries (no need to spill). You end up with a version of the code using a smaller set of temporaries.

  t1 = a(b, c)
  t2 = d(t1)
  t3 = e(t1)
  t4 = f(t2)
  t5 = g(t4)
We want to reduce the t's, but are not constrained to a fixed set of machine registers; we can allocate as many t's as we want. So working backwards:

  t5 = g(t4)
t5 is defined. It has no next use; we have not seen it in our backward scan. So we assign it to t'1 and immediately free t'1, so our free stack holds (t'1). t4 is a use; that is live: we assign it to t4 == t'2. t'2 is live, t'1 isn't:

  t'1 = g(t'2)
now we see t4 = f(f2). That means t4 is before this point, which means t'2 is dead. We push t'2 onto our free stack so it holds (t'2 t'1). t2 is newly live. What can we map it to? Why the t'2 we just freed. So t2 == t'2. We generate

  t'2 = f(t'2)
  t'1 = g(t'2)
Next we have t3 = e(t1). t3 isn't in the use list (hasn't been seen before) and is here dying. We can allocate it to the t'1 we we freed earlier, emptying our free stack to (). Then since it is dead, we immediately free it, so the free stack is (t'1). The used t1 is new, and we allocate it t'1, emptying the free stack:

  t'1 = e(t'1)
  t'2 = f(t'2)
  t'1 = g(t'2)
Next is t2 = d(t1). Aha, here t1 is still live which we mapped to t'1. And t2 is being defined; we have that live as t'2. t'2 is freed now making our free stack (t'2)"

  t'2 = d(t'1)
  t'1 = e(t'1)
  t'2 = f(t'2)
  t'1 = g(t'2)
Finally t1 = a(b, c) is where t1 is defined, mapped to t'1. t'1 goes dead so our free stack is (t'1 t'2).

  t'1 = a(b, c)
  t'2 = d(t'1)
  t'1 = e(t'1)
  t'2 = f(t'2)
  t'1 = g(t'2)
  
Now we drop the apostrophe:

  t1 = a(b, c)
  t2 = d(t1)
  t1 = e(t1)
  t2 = f(t2)
  t1 = g(t2)
Look, everything is done with just two temporaries now instead of 5.


Just skimmed, central idea is to linear scan in reverse?


Yep. Allocate in opposite order to execution, assume no branches / control flow / phi nodes. Single pass.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: