vrighter 21 days ago | on: Training LLMs to Reason in a Continuous Latent Spa...

There isn't any memory of how it got to where it did because all weights are evaluated all the time. It got there through the entirety of the network. There is no logic, just (mostly) a bunch of multiply-accumulates.
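
To make that concrete, here is a minimal sketch, assuming a plain dense feed-forward network with ReLU activations. The layer sizes, weight values, and the names dense_forward, network_forward, and LAYERS are all made up for illustration and are not taken from any real model. It counts the multiply-accumulates in a forward pass and shows that the count is the same for any input, because every weight is used every time.

```python
def dense_forward(x, weights, biases):
    """One dense layer: out[j] = relu(biases[j] + sum_i x[i] * weights[i][j])."""
    out, macs = [], 0
    for j, b in enumerate(biases):
        acc = b
        for i, xi in enumerate(x):
            acc += xi * weights[i][j]  # one multiply-accumulate
            macs += 1
        out.append(max(acc, 0.0))  # ReLU: the "(mostly)" part that is not a MAC
    return out, macs


def network_forward(x, layers):
    """Run x through every layer in order and total the multiply-accumulates."""
    total = 0
    for weights, biases in layers:
        x, macs = dense_forward(x, weights, biases)
        total += macs
    return x, total


# Hypothetical toy network: 2 inputs -> 3 hidden units -> 1 output (9 weights).
LAYERS = [
    ([[0.5, -1.0, 0.25], [1.5, 0.5, -0.75]], [0.0, 0.1, -0.2]),
    ([[1.0], [-0.5], [2.0]], [0.05]),
]

out_a, macs_a = network_forward([1.0, 2.0], LAYERS)
out_b, macs_b = network_forward([-3.0, 0.5], LAYERS)
print(macs_a, macs_b)  # 9 9
```

Both inputs cost exactly 9 multiply-accumulates, one per weight. Nothing in the loop branches on the data, so there is no recorded path through the network to look back at afterwards.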