Hacker News

As someone with a background in quantum chemistry and some types of machine learning (but not neural networks so much) it was a bit striking while watching this video to see the parallels between the transformer model and quantum mechanics.

In quantum mechanics, the state of your entire physical system is encoded as a very high dimensional normalized vector (i.e., a ray in a Hilbert space). The evolution of this vector through time is given by the time-translation operator for the system, which can loosely be thought of as a unitary matrix U (i.e., a probability preserving linear transformation) equal to exp(-iHt), where H is the Hamiltonian matrix of the system that captures its “energy dynamics”.
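A minimal numerical sketch of this, in case it helps anyone follow along (toy 4-dimensional Hilbert space with a random Hermitian H, not any physical system): build U = exp(-iHt) from the eigendecomposition of H and check that it is unitary, i.e. probability preserving.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Hermitian Hamiltonian on a 4-dimensional Hilbert space
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2

# U = exp(-iHt) via the eigendecomposition H = V diag(w) V^dagger
t = 0.7
w, V = np.linalg.eigh(H)
U = V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T

# U is unitary, so it preserves the norm of any state vector
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
psi_t = U @ psi
print(np.linalg.norm(psi_t))  # ~1.0
```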

From the video, the author states that the prediction of the next token in the sequence is determined by computing the next context-aware embedding vector from the last context-aware embedding vector alone. Our prediction is therefore the result of a linear state function applied to a high dimensional vector. This seems a lot to me like we have produced a Hamiltonian of our overall system (generated offline via the training data), then we reparameterize our particular subsystem (the context window) to put it into an appropriate basis congruent with the Hamiltonian of the system, then we apply a one step time translation, and finally transform the resulting vector back into its original basis.
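To make the "linear readout" half of the analogy concrete, here is a toy sketch (the name `W_unembed` and all dimensions are made up for illustration, not taken from the video): the next-token logits come from a single matrix applied to the last context-aware embedding, followed by a softmax.

```python
import numpy as np

rng = np.random.default_rng(1)

d_model, vocab = 8, 20              # toy sizes, not any real model
h_last = rng.normal(size=d_model)   # last context-aware embedding vector
W_unembed = rng.normal(size=(vocab, d_model))  # hypothetical unembedding matrix

logits = W_unembed @ h_last         # the purely linear step of the analogy
probs = np.exp(logits - logits.max())
probs /= probs.sum()                # softmax -> next-token distribution
```

Note that the final softmax is non-linear (and nothing here is unitary), which is one place the analogy gets loose.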

IDK, when your background involves research in a certain field, every problem looks like a nail for that particular hammer. Does anyone else see parallels here or is this a bit of a stretch?



I don't think the analogy holds: even if you forget all the preceding non-linear steps, you are still left with just a linear dynamical system. It's neither complex nor unitary, which are two fundamental characteristics of quantum mechanics.


I think you're just describing a state machine, no? The fact that you encode the state as a vector and the steps as matrices is an implementation detail...?


Perhaps a probabilistic FSM describes the actual computational process better, since we don’t have a concept equivalent to superposition with transformers (I think?), but the framework of an FSM alone doesn’t seem to capture the specifics of where the model/machine comes from (what I’m calling the Hamiltonian), nor how a given context window (the subsystem) relates to it. The change of basis involved in the attention mechanism (to achieve context-awareness) seems to align better with existing concepts in QM.

One might model the human brain as an FSM as well, but I’m not sure I’d call the predictive ability of the brain an implementation detail.


| context window

I actually just asked a question on the Physics Stack Exchange that is semi-relevant to this. https://physics.stackexchange.com/questions/810429/functiona...

In my question I was asking about a hypothetical time-evolution operator that includes an analog of a light cone, which you could think of as a context window. If a quantum state were evolved through time by this operator, then I think the speed of light could be seen as a byproduct of the width of the context window of the operator that advances the quantum state forward by some time interval.

Note I am very much hobbyist-tier with physics so I could also be way off base and this could all be nonsense.


I’m way out of my depth here, but wouldn’t such a function have to encode an amount of information/state orders of magnitude larger than the definition of the function itself?

If this turns out to be possible, we will have found the solution to the Sloot mystery :D

https://en.m.wikipedia.org/wiki/Sloot_Digital_Coding_System


The article references patent “1009908C2”, but I can’t find it on the Dutch patent site or in Google Patents.

The rest of the article has “crank” written all over it; almost certainly investor fraud too - it’d be straightforward to fake the claimed smartcard video thing for a nontechnical observer - though not quite as egregious as Steorn Orbo or Theranos.


How can I not have heard of this before?! Sounds like the plot for a thriller movie.


Not who you asked (and I don't quite understand everything) but I think that's about right, except in the continuous world. You pick an encoding scheme (either the Lagrangian or the Hamiltonian) to go from state -> vector. You have a "rules" matrix, very roughly similar to a Markov matrix, H, and (stretching the limit of my knowledge here) exp(-iHt) very roughly "translates" from the discrete stepwise world to the continuous world. I'm sure that last part made more knowledgeable people cringe, but it's roughly in the right direction. The part I don't understand at all is the -i factor: exp(-it) just circles back on itself after t=2pi, so it feels like exp(-iHt) should be a periodic function?


Yes, exp(-iHt) means the state vector rotates as time passes, and it rotates faster when the Hamiltonian (energy) is bigger. This rotation gives the wave-like behavior. Slightly related: there is an old video of Feynman where he tries to teach quantum mechanics to some art students, and he explains this complex rotation and its effects without any reference to math.
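The periodicity worry upthread is easy to check numerically: in the eigenbasis of H, each component just picks up a phase exp(-iE_k t), so each component is periodic with period 2π/E_k, but the full state is only periodic if the energies are commensurate. A toy sketch with made-up energies:

```python
import numpy as np

# Two energy eigenstates with incommensurate energies (arbitrary values)
E = np.array([1.0, np.sqrt(2)])
psi0 = np.array([1.0, 1.0]) / np.sqrt(2)

def psi(t):
    # In the eigenbasis, exp(-iHt) is just a diagonal phase rotation
    return np.exp(-1j * E * t) * psi0

# A single component IS periodic, with period 2*pi / E_k
t0 = 2 * np.pi / E[0]
print(np.isclose(psi(t0)[0], psi0[0]))   # True

# But the overall state never returns (1 and sqrt(2) are incommensurate),
# even though its norm stays exactly 1 at all times
print(np.allclose(psi(t0), psi0))        # False
```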


I've been thinking about this a bit lately. If time is non-continuous, could you model the time evolution of the universe as some operator recursively applied to the quantum state of the universe? If each application of the operator progresses the state of the universe by a single Planck time, could we even observe a difference between that and a universe where time is continuous?
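For a time-independent Hamiltonian, at least, the two pictures agree exactly at the tick times: n discrete applications of exp(-iH·dt) compose into exp(-iH·n·dt), so tick-by-tick evolution is indistinguishable from the continuous answer sampled at those times. A quick NumPy check with a toy 3-dimensional Hamiltonian (random, nothing physical):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
H = (A + A.conj().T) / 2            # toy Hermitian Hamiltonian

w, V = np.linalg.eigh(H)
def U(t):                           # U(t) = exp(-iHt)
    return V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T

dt, n = 1e-3, 1000                  # 1000 tiny "ticks"
U_discrete = np.linalg.matrix_power(U(dt), n)
U_continuous = U(dt * n)

print(np.allclose(U_discrete, U_continuous))  # True
```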


So one of the most "out there" non-fiction books I've read recently is called "Alien Information Theory". It's a wild ride and there's a lot of flat-out crazy stuff in it but it's a really engaging read. It's written by a computational neuroscientist who's obsessed with DMT. The DMT parts are pretty wild, but the computational neuroscience stuff is intriguing.

In one part he talks about a thought experiment modeling the universe as a multidimensional cellular automaton, where fundamental particles are nothing more than the information they contain, and particles colliding is a computation that tells that node and the adjacent nodes how to update their state.

Way out there, and I'm not saying there's any truth to it. But it was a really interesting and fun concept to chew on.
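For anyone who wants to poke at the "universe as cellular automaton" idea concretely, the classic toy version is Conway's Game of Life, where each cell updates from its neighbors' states alone. A minimal NumPy sketch (with periodic boundaries, which you could read as a toroidal universe):

```python
import numpy as np

def life_step(grid):
    """One tick of Conway's Game of Life on a toroidal grid."""
    # Sum the 8 neighbors of every cell via shifted copies of the grid
    n = sum(np.roll(np.roll(grid, i, axis=0), j, axis=1)
            for i in (-1, 0, 1) for j in (-1, 0, 1) if (i, j) != (0, 0))
    # Birth on exactly 3 neighbors; survival on 2 or 3
    return ((n == 3) | ((grid == 1) & (n == 2))).astype(int)

# A "blinker": the simplest oscillator, with period 2
grid = np.zeros((5, 5), dtype=int)
grid[2, 1:4] = 1                    # horizontal bar of three live cells
print(np.array_equal(life_step(life_step(grid)), grid))  # True
```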


Definitely way out there and later chapters are what I can only describe as wild conjecture, but I also found it to be full of extremely accessible foundational chapters on brain structure and function.


I'm working on a model to do just that :) The Game of Life is not too far off either.


You might enjoy his next book: Reality Switch.


I think Wolfram made news proposing something roughly along these lines.

Either way, I find Planck time/energy to be a very spooky concept.

https://wolframphysics.org/


This sounds like Bohmian pilot wave theory (which is a global formulation of QM). ... Which might not be that crazy, since spooky action at a distance is already a given. And in cosmology (or quantum gravity), some models describe a region of space based only on its surface. So in some sense the universe is much less information-dense than we think.

https://en.m.wikipedia.org/wiki/Holographic_principle


Not a direct comment on the question, but I once had a math PhD as an intern. One of his comments was that all this high-dimensional linear algebra was super advanced stuff back in the 1900s, and that it has plenty of room for new CS discoveries.

Didn’t make the “what was going on in physics then” connection until now.


So what you are saying is that we've reached the point where our own most sophisticated computer models are starting to approach the same algorithms that define the universe we live in? AKA, the simulation is showing again?


I only understand half of it, but it sounds very interesting. I've always wondered if the principle of stationary action could be of any help with machine learning, e.g. provide an alternative point of view / formulation.
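For reference, the principle being alluded to: the physical trajectory is the one that extremizes the action, and demanding δS = 0 yields the Euler–Lagrange equations of motion.

```latex
S[q] = \int_{t_0}^{t_1} L(q, \dot{q}, t)\, dt,
\qquad
\delta S = 0
\;\Longrightarrow\;
\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = 0
```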



