There has been some work, but the problem is that it's such a massive search space. Philosophically speaking, if you look at how humans came into existence, you could argue that the evolution from basic lifeforms amounts to one giant compute step per minute across all of Earth, where genetic selection happens and computation proceeds to the next minute. That's a fuckload of compute.
In more practical terms, you would imagine that an advanced model contains some semblance of a CPU to be able to truly reason. Given that a CPU can be built entirely from NAND gates (each of which takes 2 neurons to represent), and is structured in a recurrent way, you fundamentally have to rethink how to train such a network, because backprop obviously won't work to capture things like binary decision points.
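As a toy illustration (assuming a simple step-activation threshold unit; with this encoding a single unit actually suffices for NAND, though other neuron models may need two), here's a NAND neuron, and also why backprop chokes on it: the step function's gradient is zero almost everywhere.

```python
def step(x):
    # Hard threshold: the "binary decision point" -- its gradient is 0
    # almost everywhere, so backprop gets no learning signal through it.
    return 1 if x >= 0 else 0

def nand_neuron(a, b):
    # One threshold unit computing NAND: fires unless both inputs are 1.
    return step(-1.0 * a + -1.0 * b + 1.5)

print([nand_neuron(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# → [1, 1, 1, 0]
```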
I thought the whole point of neural networks was that they were good at searching through these spaces. I'm pretty sure OpenAI is pruning their models behind the scenes to reduce their costs, because that's the only way they can keep reducing the cost per token. So their secret sauce at this point is whatever pruning AI they're using to whittle the large computation graphs down into more cost-efficient consumer products.
When you train a neural network, it isn't search; it's descent along a curve.
If you were to search over billions of parameters by brute force, you literally could not finish in the lifespan of the universe.
A neural network is differentiable, meaning you can take its derivative. You train the parameters by finding the gradient of the loss with respect to each parameter and stepping in the opposite direction. Hence the name of the popular algorithm: gradient descent.
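A minimal sketch of that loop (a toy 1-D example; the function and learning rate are mine, not from the thread):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    # Repeatedly step opposite the gradient.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 6))  # converges to 3.0
```

The key requirement is that `grad` exists at all, which is exactly what a hard binary decision point breaks.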
A biological neural network is certainly not differentiable. If the thing we want to build is not realizable with this technique, why can't we move on from it?
Gradient descent isn't the only way to do this. Evolutionary techniques can explore impossibly large, non-linear problem spaces.
Being able to define any kind of fitness function you want is sort of like a superpower. You don't have to think in such constrained ways down this path.
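A minimal sketch of that freedom (a naive hill-climbing evolutionary loop; the fitness function here is arbitrary and mine, not from the thread):

```python
import random

def evolve(fitness, dim=5, candidates=50, gens=200, sigma=0.3):
    # Keep a single best genome; mutate it and accept any improvement.
    # `fitness` can be ANY function of the genome -- no gradients needed.
    random.seed(0)  # deterministic for the example
    best = [random.uniform(-1, 1) for _ in range(dim)]
    for _ in range(gens):
        for _ in range(candidates):
            cand = [g + random.gauss(0, sigma) for g in best]
            if fitness(cand) > fitness(best):
                best = cand
    return best

# Example fitness: maximize -(sum of squares); optimum is the zero vector.
sol = evolve(lambda g: -sum(x * x for x in g))
```

Swap in any fitness you like (including non-differentiable, binary-valued ones) and the loop still runs; the catch, as the rest of the thread points out, is how many evaluations it burns.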
The issue is that it's still a massive search space.
You can try this yourself: go play nandgame and beat it, at which point you should be able to make a CPU out of NAND gates. Then set up an RNN with as many layers as the total depth of your NAND-gate design and as wide as all the inputs, with every output fed back into the first input. Then run PSO or GA on all the weights and see how long it takes you to get a fully functioning CPU.
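At a vastly smaller scale than the full-CPU challenge above, this is the flavor of that experiment: PSO over the weights of a single threshold neuron, scored against the NAND truth table (my own toy setup with standard PSO constants; the fitness is a flat integer count with no gradient to follow, which is the point).

```python
import random

TRUTH = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}  # NAND truth table

def neuron(w, a, b):
    # Threshold unit: w = [w_a, w_b, bias].
    return 1 if w[0] * a + w[1] * b + w[2] >= 0 else 0

def errors(w):
    # Fitness to minimize: number of wrong truth-table rows (0..4).
    return sum(neuron(w, a, b) != out for (a, b), out in TRUTH.items())

def pso(particles=40, iters=300):
    random.seed(0)  # deterministic for the example
    pos = [[random.uniform(-2, 2) for _ in range(3)] for _ in range(particles)]
    vel = [[0.0, 0.0, 0.0] for _ in range(particles)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=errors)
    for _ in range(iters):
        for i in range(particles):
            for d in range(3):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (0.7 * vel[i][d]                        # inertia
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])  # cognitive pull
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))    # social pull
                pos[i][d] += vel[i][d]
            if errors(pos[i]) < errors(pbest[i]):
                pbest[i] = pos[i][:]
                if errors(pbest[i]) < errors(gbest):
                    gbest = pbest[i][:]
    return gbest

w = pso()
```

On this 3-weight problem the zero-error region is large, so the swarm usually lands in it quickly; scaling the same idea to a full CPU's worth of recurrent weights is exactly where the search-space explosion kicks in.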
>A biological neural network is certainly not differentiable.
Biology is biology and has its constraints. Doesn't necessarily mean a biologically plausible optimizer would be the most efficient or correct way in silicon.
>If the thing we want to build is not realizable with this technique, why can't we move on from it?
All the biologically plausible optimizers we've fiddled with (and we've fiddled with quite a lot) just work (results-wise) like gradient descent, but worse. We've not "moved on" because gradient descent is, and continues to be, better.
>Evolutionary techniques can explore impossibly large, non-linear problem spaces.
Sure, with billions of years (and millions of concurrent experiments) on the table.
The search space is also far too wide, difficult to parameterize, and there is a wide gap between effective and ineffective architectures, i.e. a very small change can make a network effectively DOA.
Notably, architecture search was popular for small vision nets, where the cost of many training runs was low enough. I suspect some of the train-then-prune approaches will come back, but even there only for the best-funded teams.
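For reference, the simplest train-then-prune step is magnitude pruning: after training, zero out the smallest-magnitude fraction of weights and keep the rest (a toy sketch with plain lists standing in for a trained weight matrix):

```python
def magnitude_prune(weights, sparsity):
    # Zero out the smallest-magnitude `sparsity` fraction of the weights.
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)
    threshold = flat[k] if k < len(flat) else float("inf")
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]

W = [[0.9, -0.02, 0.4], [0.01, -0.7, 0.05]]
print(magnitude_prune(W, 0.5))
# → [[0.9, 0.0, 0.4], [0.0, -0.7, 0.0]]
```

Real systems prune iteratively with retraining in between, but the core operation really is this blunt.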