This Advanced Data Structures course [1], while not accounting for any particular hardware, had some interesting "cache-oblivious" algorithms (i.e. designed to make the best use of your cache no matter the cache sizes). Is that the type of work you are thinking about?
[1] https://courses.csail.mit.edu/6.851/fall17/lectures/
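
To give a feel for what "cache-oblivious" means in practice, here is a rough sketch of a recursive matrix transpose, a commonly cited example of the idea. It is not taken from the course materials; the function name `transpose` and the `base` cutoff are my own illustrative choices. The point is that the recursion keeps halving the larger dimension, so at some depth the sub-blocks fit in whatever cache you have, without the code ever knowing the cache size. In Python with nested lists the memory layout is not contiguous, so treat this as showing the access pattern only, not as something that will measurably speed anything up.

```python
def transpose(src, dst, r0, r1, c0, c1, base=16):
    """Write the transpose of src[r0:r1][c0:c1] into dst, recursively.

    `base` is only a cutoff to limit recursion overhead; it is a fixed
    constant, not a parameter tuned to any particular cache size.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small block: transpose it directly.
        for i in range(r0, r1):
            for j in range(c0, c1):
                dst[j][i] = src[i][j]
    elif rows >= cols:
        # Split the taller dimension in half and recurse on each part.
        mid = r0 + rows // 2
        transpose(src, dst, r0, mid, c0, c1, base)
        transpose(src, dst, mid, r1, c0, c1, base)
    else:
        # Split the wider dimension in half and recurse on each part.
        mid = c0 + cols // 2
        transpose(src, dst, r0, r1, c0, mid, base)
        transpose(src, dst, r0, r1, mid, c1, base)


# Usage: transpose a 5x7 matrix into a preallocated 7x5 matrix.
src = [[i * 7 + j for j in range(7)] for i in range(5)]
dst = [[0] * 5 for _ in range(7)]
transpose(src, dst, 0, 5, 0, 7, base=2)
```

Contrast this with a "cache-aware" version, which would pick an explicit tile size to match a known cache; the recursive form gets a similar blocking effect at every level of the memory hierarchy at once.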