Sincere question inspired purely by the headline: how many important ML architectures aren't in some way based on some proposed model of how something works in the brain?
(Not intended as a flippant remark, I know Quanta Magazine articles can generally safely be assumed to be quality content, and that this is about how a language model unexpectedly seems to have relevance for understanding spatial awareness)
- Some families of ML techniques (SVMs, random forests, Gaussian processes) got their inspiration elsewhere and never claimed to be really related to how brains do stuff.
- Among NNs, even if an idea takes loose inspiration from neuroscience (e.g. the visual system does have a bunch of layers, and the first ones really are pulling out 'simple' features like an edge near an area), I think it's relatively uncommon to go back and compare specifically what's happening in the brain with a given ML architecture. And a lot of the inspiration isn't about human-specific cognitive abilities (like language), but is really a generic description of neurons which is equally true of much less intelligent animals.
> I think it's relatively uncommon to go back and compare specifically what's happening in the brain with a given ML architecture.
Less common but not unheard of. Here's one example, primarily focused on vision: http://www.brain-score.org/
DeepMind has also published works comparing RL architectures like IQN to dopaminergic neurons.
The challenge is that it's very cross-disciplinary, and most DL labs don't have a reason to explore the neuroscience side while most neuro labs don't have the expertise in DL.
Is it necessary to simulate the quantum chemistry of a biological neural network in order to functionally approximate a BNN with an ANN?
A biological systems and fields model for cognition:
Spreading activation in a dynamic graph with cycles and magnitudes ("activation potentials") that change as neurally regulated, heart-generated electron potentials reverberate fluidically along intersecting paths; plus a partially extra-cerebral induced field which nonlinearly affects the original signal source through local feedback: representational shift.
M-theory (string theory) is also 11D, but IIUC they're not the same dimensions.
Diffusion suggests fluids, which in physics and chaos theory suggests Bernoulli's fluid models (and other non-differentiable compact descriptions like Navier-Stokes), which are part of SQG (Superfluid Quantum Gravity) postulates.
Can e.g. ONNX or RDF with or without bnodes represent a complete connectome image/map?
> > New research from a Google team proposes replacing the self-attention sublayers with simple linear transformations that “mix” input tokens to significantly speed up the transformer encoder with limited accuracy cost. Even more surprisingly, the team discovers that replacing the self-attention sublayer with a standard, unparameterized Fourier Transform achieves 92 percent of the accuracy of BERT on the GLUE benchmark, with training times that are seven times faster on GPUs and twice as fast on TPUs.
> > Would Transformers (with self-attention) make what things better? Maybe QFT? There are quantum chemical interactions in the brain. Are they necessary or relevant for what fidelity of emulation of a non-discrete brain?
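For concreteness, here is a minimal NumPy sketch of the Fourier "mixing" sublayer the first quote describes: my own toy illustration of the FNet idea (a 2D DFT over the sequence and hidden dimensions, keeping only the real part), not the authors' code.

```python
import numpy as np

def fourier_mixing(x):
    """Parameter-free token mixing in the FNet style: a 2D DFT over the
    sequence and hidden dimensions, keeping only the real part."""
    return np.real(np.fft.fft2(x))  # x: (seq_len, hidden_dim) -> same shape

# Toy usage: after mixing, every output position depends on every input token,
# which is the role self-attention normally plays in the encoder.
tokens = np.random.randn(8, 16)     # 8 tokens, hidden width 16
mixed = fourier_mixing(tokens)
print(mixed.shape)                  # (8, 16)
```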
AlphaGo is a pretty good example here. It uses a neural net for evaluation, and that's vaguely inspired by the brain, sure. But it employs a Monte Carlo based game tree search which is probably very different from how humans think.
In addition, it learns by iterated amplification and distillation: it plays games against itself, where one player gets more time and hence will be a stronger player (amplification). The weaker player then uses this strength differential as a fitness function to learn (distillation). Rinse and repeat. That's really nothing like how humans learn these games. While playing stronger players and evaluating is a huge part of becoming stronger, there's also a lot of targeted exercises, opening/endgame theory, etc. Humans can't really do that type of training at all.
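Here is a runnable toy analogy of that amplification/distillation loop (my own sketch: a three-armed bandit instead of Go and best-of-N sampling instead of MCTS, so only the loop structure is faithful to what's described above).

```python
import random

WIN_PROB = [0.2, 0.5, 0.8]          # a toy "game": arm 2 is objectively best

def play(arm):
    return 1 if random.random() < WIN_PROB[arm] else 0

def amplified_choice(policy, simulations=50):
    """The 'player with more time': raw policy plus extra simulated games."""
    scores = [policy[a] + sum(play(a) for _ in range(simulations))
              for a in range(3)]
    return max(range(3), key=lambda a: scores[a])

def distill(policy, chosen, lr=0.3):
    """The 'weaker player' nudges its raw policy toward the amplified choice."""
    return [p + lr * ((1.0 if a == chosen else 0.0) - p)
            for a, p in enumerate(policy)]

policy = [1 / 3] * 3
for _ in range(20):                  # rinse and repeat
    policy = distill(policy, amplified_choice(policy))
print(policy)                        # probability mass concentrates on the best arm
```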
> But it employs a Monte Carlo based game tree search which is probably very different from how humans think.
It's not _that_ different from how humans play. We have pattern matching that points out likely places; we read out what happens and try to evaluate the result. Humans are just less methodical at it, really.
Parent said “different from how humans think”, not play, which seems key. Your description is very broad.
These machines don’t seem to carry narratives or plans yet (if they would benefit from them or be encumbered by them seems to be an open question).
Watching the machines play they have zero inertia. If the next opportunity means a completely inverted play strategy has a marginally better chance of winning, they will switch their entire approach.
Humans don’t typically do this, although having learned from machines that it can produce better outcomes perhaps we will start moving away from this local maximum.
> Watching the machines play they have zero inertia. If the next opportunity means a completely inverted play strategy has a marginally better chance of winning, they will switch their entire approach.
In Go, especially at the high level, this isn't that far outside of the norm. In particular, you see players play in other areas (tenuki) at what to a weaker player would look like pretty random times, depending on what's most urgent or biggest.
Computer go players aren't too chaotic. They're just _very_ good at some things that are already high-level-player traits. A computer will just give you what you want, but suddenly it's just not actually that good. It feels like Bruce Lee's flow/adaptation based fighting style applied to a go board.
I mean I guess you could argue some calculation kind of looks like a type of random walk (with intuited moves) based search. But that's kind of all AlphaGo does, and it does it so efficiently that's all it really needs to do.
I'm not a go player, but at least in chess, which is game-theoretically very similar modulo branching factor, human thinking is much more of a mishmash of different search methods, different ways of picking moves, and strategic ideas (which I like to think of as employing something more akin to A* or Dijkstra).
I.e. there's a rough algorithm like this happening:
1. Assess the opponent's last move, using some sort of abductive reasoning to figure out what the intent was and whether there's a concrete threat. If so, try to refute the threat (this can sometimes be a method-of-elimination search, best node search being a similar algorithm, if the candidate moves are few enough, or a more general one if not), find counterplay, find the lesser evil, or resign.
2. If not, do you want to stop their plan or is it just a bad plan?
3. If you do, how?
4. If not, do you have any tactical ideas? Search all the forcing moves in some intuitive order of plausibility and play the strongest one you find.
5. If not, what is your plan? If you had a plan before, does it still make sense?
6. If not, find a new plan
7. Once you have a plan, how do you accomplish it? Break it into subgoals like "I want to get a knight to e5".
8. Find the shortest route for a knight to get to e5 (pathfinding while ignoring the opponent; see the sketch after this list).
9. Is there a tactical issue with that route?
10. Rinse and repeat until you find the shortest route that works tactically.
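As a toy version of steps 7-8 (my own illustration, not how an engine or a human actually does it): breadth-first search finds the shortest knight route while completely ignoring the opponent, which is the "pathfinding" half before you check the route tactically.

```python
from collections import deque

def knight_path(start, goal):
    """Shortest knight route on an empty board, ignoring the opponent,
    e.g. knight_path('g1', 'e5') -> ['g1', 'f3', 'e5']."""
    def to_xy(sq):                       # 'e5' -> (4, 4), zero-indexed file/rank
        return ord(sq[0]) - ord('a'), int(sq[1]) - 1
    def to_sq(xy):
        return chr(ord('a') + xy[0]) + str(xy[1] + 1)

    jumps = [(1, 2), (2, 1), (-1, 2), (-2, 1),
             (1, -2), (2, -1), (-1, -2), (-2, -1)]
    frontier = deque([[to_xy(start)]])   # BFS over paths guarantees the shortest route
    seen = {to_xy(start)}
    while frontier:
        path = frontier.popleft()
        if path[-1] == to_xy(goal):
            return [to_sq(p) for p in path]
        x, y = path[-1]
        for dx, dy in jumps:
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < 8 and 0 <= nxt[1] < 8 and nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])

print(knight_path('g1', 'e5'))           # checking it against tactics is step 9
```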
I could probably elaborate this list for hours, getting longer and longer. But you probably get the idea at this point.
You are definitely right that computer players are missing some kind of narrative-based reasoning for their own moves and for their opponents' moves. In go it doesn't feel that extreme, though. We're taught not to hold too hard to our plans anyway, and most good moves from the opponent will have more than one intention. So you can't get that far relying just on reading what their goal is.
How computers think isn't exactly how we do, for go, but it's close enough to rhyme pretty heavily imo.
Saying that neural networks are "similar to" or that they "mimic" the human brain can be misleading. Today's architectures are the byproduct of years of research and countless GPU-hours dedicated to training and testing architecture variants. Many neuroscience-based architectures that mimic the brain better than transformers end up performing much worse.
The Quanta article is overall pretty reasonable, but I've unfortunately seen other news outlets regurgitate this kind of blanket statement for the better part of a decade. The very first models were perhaps 100% inspired by the brain, but today's ML research more or less follows a "whatever works best" principle.
The convolution kernels in the first layers of AlexNet and all its DL image-processing descendants converge to Gabor filters (or some variation thereof), which are the response functions of the neurons in the first layer of the visual cortex. About 15 years before AlexNet there were works showing that this type of filter is a kind of mathematically optimal encoding for feature-based image processing. (So, theoretically, one could have just pre-generated the first layers of the net and used them fixed, cutting significant time/effort from training; I myself wanted to do it 20 years ago, yet just didn't get to it :)
I'm pretty sure that for middle layers in image DL, as well as for transformers in language, we have a kind of similar optimality, i.e. something like a maximum-entropy filter (separator/aggregator at higher levels?) at a given level of granularity/scale, just like Gabors at the first feature level.
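To make the "pre-generate the first layer" idea concrete, here is a small NumPy sketch (my own, with arbitrary parameter choices) that builds a bank of fixed Gabor kernels of the kind one could freeze in as a first conv layer:

```python
import numpy as np

def gabor_kernel(size=11, sigma=2.5, theta=0.0, wavelength=5.0, psi=0.0, gamma=0.5):
    """2D Gabor filter: a Gaussian envelope modulating an oriented sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_r = x * np.cos(theta) + y * np.sin(theta)     # rotate into the filter's frame
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r**2 + (gamma * y_r)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_r / wavelength + psi)
    return envelope * carrier

# A fixed "first layer": 8 orientations x 3 wavelengths = 24 oriented edge/stripe
# detectors, analogous to what AlexNet's first layer ends up learning.
bank = np.stack([gabor_kernel(theta=t, wavelength=w)
                 for t in np.linspace(0, np.pi, 8, endpoint=False)
                 for w in (3.0, 5.0, 8.0)])
print(bank.shape)   # (24, 11, 11)
```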
> the first layers of AlexNet and all its DL image processing descendants converge to the Gabor filters (or some variation of)
and
> 15 years before AlexNet there were works showing that such type of filter is kind of mathematically optimally encoding for the feature based image processing
For the first, you can just look at the original AlexNet paper. The kernels are strikingly Gabor-like. Some differences, like the cross-color kernels, raise possibly interesting questions: is it an improvement or a deficiency (i.e. something more training would correct) relative to biology? Or maybe it is just a real-valued projection of the [plausible] fact that the optimum is complex-valued?
>?
I don't have the specific reference I had in mind, which was published 15-20 years ago, but you can trace that line of thought through works like these, for example (there were a bunch of them in the 1990s and into the 2000s):
Transformer networks have deeper connections to dense associative memory. For example, the update rule to minimize the energy functional of these Hopfield networks converges in a single iteration and coincides with the attention mechanism [1].
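A compact NumPy way to see the coincidence described in [1] (my notation): one step of the modern dense-Hopfield retrieval update, xi_new = X softmax(beta X^T xi), is single-query softmax attention with the stored patterns acting as both keys and values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hopfield_update(xi, X, beta):
    """One modern (dense) Hopfield update step:
    xi_new = X @ softmax(beta * X.T @ xi).
    With query = xi, keys = values = X, and beta = 1/sqrt(d),
    this is exactly single-query softmax attention."""
    return X @ softmax(beta * (X.T @ xi))

d, n = 16, 5                                  # pattern dimension, number of stored patterns
X = np.random.randn(d, n)                     # stored patterns as columns
xi = X[:, 2] + 0.1 * np.random.randn(d)       # a noisy query near pattern 2
retrieved = hopfield_update(xi, X, beta=8.0)
print(np.argmax(X.T @ retrieved))             # with large beta, retrieval (usually) snaps to 2 in one step
```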
I think at some point similarities will naturally emerge: smart moves in design space. That being said, these similar designs will probably be minuscule compared to the overall architecture.
It is my opinion that pretty much all architectures already exist in the brain for some use or another. Otherwise we wouldn’t be able to reason about them
Still outside. It's our thoughts about them, models, and concepts that are in the brain.
Plus, even if it were true, this wasn't the parent's point (that things exist in the brain while we're reasoning about them). His point (also wrong) was that different architectures must exist as structures in the brain (not as concepts we think about, memories, etc., but as part of the brain's material organization and wiring) for us to be able to reason about them.
On the contrary, I view it as a counterexample to "all architectures already exist in the brain for some use or another," which disproves your point. Let's not make the mistake of a fallacy fallacy here!
Perhaps you would like to expand on or clarify your point to rule out the edge case of cars not existing in 5000 BC, while the models to derive cars suddenly came into being 5000 years later?