also known as "game semantics", due to Hintikka, where the two players are "me" and "Nature", and the two players read the elements of a quantified mathematical proposition alternately.
Whenever Nature finds a counterexample, she wins; if "me" is able to select the correct OR branches and to find witnesses for the \exists quantifiers, "me" wins.
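A tiny sketch of that evaluation game, just to make the two roles concrete (the domain and formula below are my own illustrative choices, not from any source):

    # Evaluation game for: forall x in D, exists y in D: y >= x
    # Nature picks x (trying to refute), "me" picks y (trying to verify).
    D = range(5)

    def me_has_witness(x):
        # "me" wins this round if some y in D satisfies the body of the formula
        return any(y >= x for y in D)

    # Nature wins if she can find an x with no witness; otherwise "me" wins
    counterexample = next((x for x in D if not me_has_witness(x)), None)
    print("me wins" if counterexample is None else f"Nature wins with x = {counterexample}")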
I recently ran into the concept of a "variational inequality". A variational inequality states the optimality conditions for a problem in terms of an operator F: it says that x is the optimal solution if, for every feasible y, dot(F(x), y - x) >= 0.
Regular convex optimization (setting the gradient equal to zero) is a special case of a variational inequality, if you define F(x) to be the gradient of the objective. But one can also define F(x) for a saddle-point problem, or even, in some cases, for a general-sum Nash equilibrium.
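A rough sketch of those two special cases, assuming an unconstrained problem so the VI condition reduces to F(x) = 0 at the solution (the concrete functions are just illustrative):

    import numpy as np

    # Case 1: convex minimization of f(x) = ||x - a||^2; take F = grad f.
    a = np.array([1.0, -2.0])
    F = lambda x: 2.0 * (x - a)            # gradient of the objective
    x_star = a                             # the minimizer
    # Unconstrained, <F(x), y - x> >= 0 for all y forces F(x) = 0:
    assert np.allclose(F(x_star), 0.0)

    # Case 2: saddle-point problem min_x max_y g(x, y) = x * y.
    # Stack the gradient in x with the *negative* gradient in y:
    #   F(x, y) = (dg/dx, -dg/dy) = (y, -x)
    F_saddle = lambda z: np.array([z[1], -z[0]])
    z_star = np.array([0.0, 0.0])          # the saddle point of x * y
    assert np.allclose(F_saddle(z_star), 0.0)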
Anyway, I don't know very much about this topic, but it was a mind-expanding idea for me to run into, in terms of generalizing the connections between optimization and game-theoretic equilibria, and might be of interest to people who also find this work interesting.
They claim that it allows feasible computation of PCA (principal component analysis) on very large datasets. The key idea, as I understand it, is that instead of using some global technique to solve PCA (SVD), which cannot take advantage of the parallelism of GPUs / TPUs, they formulate PCA as a multi-agent problem where each agent individually optimizes its own "goal" (maximally explaining variance while staying orthogonal to the other "players"). There are two key non-trivialities: one is that the Nash equilibrium of this formulation is achieved at the solution of "classical" PCA, and the other is that iterative, independent gradient ascent converges to this equilibrium.
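A hedged sketch of what such a formulation can look like (an illustrative reimplementation, not necessarily the paper's exact utilities or update rule; the data and hyperparameters are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.1])
    C = X.T @ X / len(X)                   # sample covariance
    k, steps, lr = 3, 2000, 0.1

    V = rng.normal(size=(5, k))
    V /= np.linalg.norm(V, axis=0)         # each player starts on the unit sphere

    for _ in range(steps):
        for i in range(k):                 # each player ascends its own utility
            v = V[:, i]
            # gradient of v^T C v, penalized for aligning with players j < i
            grad = 2.0 * C @ v
            for j in range(i):
                u = V[:, j]
                grad -= 2.0 * (v @ C @ u) / (u @ C @ u) * (C @ u)
            v = v + lr * grad
            V[:, i] = v / np.linalg.norm(v)   # retract back to the unit sphere

    # Compare against a direct eigendecomposition of the covariance
    w, U = np.linalg.eigh(C)
    # dot products close to 1.0 if the players found the top principal directions
    print(np.abs(np.sum(U[:, ::-1][:, :k] * V, axis=0)))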
This is my understanding as well. The subsection "A decentralized algorithm" of Section 3 of their paper discusses the computational approach:
"In practice we can assign each eigenvector update to its own device (e.g. a GPU or TPU). Systems with fast interconnects may facilitate tens, hundreds or thousands of accelerators to be used. In such settings, the overhead of broadcast(vˆi) is minimal. We can also specify that the data stream is co-located with the update so vˆi updates with respect to its own Xi,t. This is a standard paradigm for e.g. data-parallel distributed neural network training. We provide further details in Section [6]."
So it's a general methodology that maps the analysis onto "embarrassingly parallel" computational models.
Tangent: Given the provenance of PCA (Karl Pearson), this is all a bit ironic...
> some global technique to solve PCA (SVD), which cannot take advantage of the parallelism of GPU / TPU
My reading of https://developer.download.nvidia.com/video/gputechconf/gtc/... suggests that the limiting factor is just the usual modern SVD algorithm, in particular the QR factorization step, and that with some thought there are ways to do better.
Randomized SVD already allows this. Typically the calculation of PCA on a dataset around the size they use (1-100 TB) is I/O bound. I work in genomics, where this is a common problem. Their solution is interesting, but I'm not really sure whether it would be more efficient due to parallelism. Better accuracy is important, but it's also unclear how much that affects the major PCs.
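For reference, a minimal sketch of the kind of randomized SVD being referred to (Halko-et-al.-style; the oversampling and power-iteration choices are just illustrative defaults):

    import numpy as np

    def randomized_svd(A, k, oversample=10, n_iter=1, seed=0):
        rng = np.random.default_rng(seed)
        # Sketch the range of A with a random test matrix (one pass over A)
        Omega = rng.normal(size=(A.shape[1], k + oversample))
        Y = A @ Omega
        for _ in range(n_iter):             # optional power iterations for accuracy
            Y = A @ (A.T @ Y)
        Q, _ = np.linalg.qr(Y)              # orthonormal basis for the sketch
        B = Q.T @ A                         # small (k + oversample) x n matrix
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ Ub)[:, :k], s[:k], Vt[:k]

    # PCA of a centered data matrix: the top principal directions are the rows of Vt
    A = np.random.default_rng(1).normal(size=(2000, 100))
    U, s, Vt = randomized_svd(A - A.mean(axis=0), k=5)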
The idea of using a multi-agent game, rather than minimizing a single loss function, is already used in GANs - and there it seems to be a very powerful generalization of optimization. Personally I would say the idea is extremely interesting.
GANs have lots of applications, and PCA is useful for various tasks in data-analysis: compression, feature selection, reduced-dimensional modelling. I doubt finding applications will be a problem.
Reliably finding solutions (Nash equilibria) is much harder than optimizing for a minimum loss, however. So I see these being much harder to train than loss-based models.
I should say this is the first CS paper I've ever read that evoked a mild sense of dread in me, although the positive applications can be, and no doubt will be, substantial.
I've been worrying about tech for a while now, but mostly on the cybersecurity side, since I figured AI was still too far away. That more or less changed when I read a post by Gwern on GPT-3, where he makes a compelling case that "there is a hardware overhang" or, in other words, that as research in AI advances, we're going to find that we do not need substantially more compute to achieve the capabilities of advanced AI. I think this was the post where he talked about it:
"GPT-3 could have been done decades ago with global computing resources & scientific budgets; what could be done with today’s hardware & budgets that we just don’t know or care to do? There is a hardware overhang."
And thinking about it more, this multi-agent method should work in the offensive cybersecurity world if one could figure out the reward functions the way they did for PCA. I think the core insight they found was a hierarchy of agents. If one could formulate the reward functions for the different agents intelligently enough, it could allow layered privilege escalation to achieve RCE without random thrashing.