
The folks at Symbolica are also applying topos theory, category theory, etc. to AI, backed by serious folks such as Khosla and Wolfram. https://finance.yahoo.com/news/vinod-khosla-betting-former-t...



Wolfram is serious about his software, but, to my mind, he has been closer to a crackpot scientist than a real researcher for the last few decades. His backing something diminishes my subjective probability that it is something "serious".


I was sure this was going to be a case of spraying jargon at investors, but they've actually produced an interesting paper in this direction (in collaboration with some DeepMind people): https://arxiv.org/abs/2402.15332


This paper is not good: it's easy to describe things in a categorical language (which this paper does), but not as easy to draw insights from that framework (which this paper does not do).


Classic application of abstract math to something sexy at the time. I once saw a paper describing a trading strategy using stochastic calculus. It turned out to boil down to buying when the price went below one indicator variable and selling when it went above another.


Stochastic calculus is the only available tool for proving things about continuous-time stochastic processes. There aren't any alternatives, save guessing at criteria and backtesting them.


Yes, it's certainly useful for the appropriate mathematics. My point is that the trading strategy was a simple heuristic wrapped in overly complicated definitions and proofs. The complicated mathematics added exactly nothing to the application.


I think the point was they (probably) used the abstract math to prove some desirable properties about the trading strategy?


Yes, it was something like that. But "desirable" here means something very different to mathematicians than to the traders actually applying the strategy (i.e., the traders don't care at all, and neither does anyone else working in finance).


<Doob rolls over in his grave>ugh</s>


Huge difference between stochastic (processes, ODEs, PDEs, etc.) and category theory. One makes money every day and the other is only good for writing papers.


Don't get me wrong, stochastic calculus is very useful for options pricing, for example. It was totally useless in the case I was describing.
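To make the options-pricing point concrete, here's a minimal sketch (purely illustrative, all parameters made up) of Monte Carlo pricing of a European call under geometric Brownian motion, where Itô's lemma actually earns its keep by giving the terminal distribution in closed form:

    # Minimal Monte Carlo pricer for a European call under geometric
    # Brownian motion dS = r*S dt + sigma*S dW (risk-neutral dynamics).
    # Ito's lemma gives the exact solution
    #   S_T = S0 * exp((r - sigma^2/2)*T + sigma*sqrt(T)*Z),
    # so no time-stepping is needed.  All parameters are illustrative.
    import numpy as np

    def mc_call_price(s0=100.0, strike=105.0, r=0.02, sigma=0.2,
                      T=1.0, n_paths=100_000, seed=0):
        rng = np.random.default_rng(seed)
        z = rng.standard_normal(n_paths)
        s_T = s0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
        payoff = np.maximum(s_T - strike, 0.0)
        return np.exp(-r * T) * payoff.mean()

    print(f"European call, MC estimate: {mc_call_price():.2f}")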


It is rarely within reach to draw new insights from applied category theory, in particular because the Yoneda lemma lets you fall back on the greater familiarity of sets and functions, and also because, as algebraic objects, categories have very few properties.


Bare categories do not give much insight in pure mathematics either; they are just a common language. The interesting things are categories with lots of extra structure: toposes, derived categories, infinity-categories, and so on.


I was definitely in a group chat of ML/category theory people roasting this paper when it came out; I'm sure I wasn't the only one.


It's not that easy to do it well =) I'm interested in your objections if you would be willing to elaborate.


Ehh, I'm an ML scientist with a PhD in category theory, and I really don't see it going anywhere. The comparison to geometric deep learning is especially misplaced, because I think you'll find that Bronstein's school sets things up as a group action, etc., but then does a lot of really hard math to actually tease out properties of how information flows through a GNN. Here they just do the Applied Category Thing and say they've drawn a picture, so QED.


Would you be willing to elaborate a bit on that? My understanding is that they are using monad algebras to model the kinds of constraints you might care about (symmetry invariance, translation invariance, etc.), and then, by instantiating these generic monads over the category of vector spaces and working through the diagrams, you recover the usual constraints on the weights in your neural net. I don't think it's supposed to work around the fact that you have to do a lot of hard math in that step, but it gives you a blueprint for doing it that generalises symmetry constraints. So you could come up with some interesting idea for an NN layer/block and use this schematic to direct your derivation of the corresponding constraints on your neural net, I guess?
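To be concrete about what I mean by "the usual constraints on the weights": over vector spaces, equivariance under a group action is the condition W·ρ(g) = ρ'(g)·W for all g. A toy sketch (my own illustration, not from the paper, with arbitrary numbers): for the permutation action on R^n this forces a linear layer into the shared-weight form a·I + b·J, which is easy to check numerically.

    # Toy check of the equivariance constraint W @ rho(g) == rho'(g) @ W
    # for the permutation action on R^n.  Linear maps commuting with every
    # permutation are exactly W = a*I + b*J (J the all-ones matrix), i.e.
    # the familiar weight-sharing pattern; a, b, n below are arbitrary.
    import numpy as np

    n = 5
    a, b = 1.7, -0.3
    W = a * np.eye(n) + b * np.ones((n, n))

    rng = np.random.default_rng(0)
    P = np.eye(n)[rng.permutation(n)]      # a random permutation matrix

    assert np.allclose(W @ P, P @ W)       # W commutes with the group action
    print("a*I + b*J commutes with every permutation matrix")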

So it seems valuable to me in that respect, especially if they can achieve what they want (logical-inference-rule-invariant NN blocks, PL-semantics-invariant stuff, etc.).

Anyway, I'm interested in your perspective/objections! (if it's technical that's fine too, I have a lot of maths background)


Oh, I mean, the most obvious problem with that idea is that there's no way to ensure that, whenever you update your weights, they will still satisfy those constraints.

This idea has been well studied in mathematical physics, going back to Poincaré, where you work with Lie groups and Lie group actions on your action space. The reason this works, however, is that you get a Lie algebra/Lie algebra action that more or less behaves like the tangent bundle, so the same basic theory around optimization works.

The main problem is that they're generalizing in the wrong direction. Everything still works when you move to Lie groupoids/Lie algebroids: you still get something like a tangent bundle, so ideas like gradient descent or the Euler-Lagrange equations still make sense. That's not the case with a generic monad; in fact, the authors don't seem to acknowledge that there is some work to do regarding compatibility between the monad and the derivative to ensure that gradient-based optimization will still make sense.

So, basically, anyone who is familiar with the basics of optimization on manifolds or Lie groups will immediately recognize this approach as hopelessly naive. All they’ve _really_ managed to do is draw some diagrams and say “wouldn’t it be cool if these things were preserved by gradient descent.”
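To spell out why the tangent structure matters, here is a toy sketch (arbitrary loss and sizes, purely illustrative, not from the paper): a plain gradient step pushes a weight matrix off the orthogonal group, while projecting the gradient onto the tangent space (the skew-symmetric part, i.e. the Lie algebra) and retracting with a QR factorization keeps it on the group. That tangent-plus-retraction structure is exactly what a generic monad does not hand you.

    # Keeping a weight matrix on the orthogonal group O(n) during descent.
    # A naive Euclidean step leaves the group; the Riemannian recipe
    # (project the gradient to the tangent space at W, then retract via QR)
    # stays on it.  The quadratic loss and sizes are arbitrary toy choices.
    import numpy as np

    rng = np.random.default_rng(0)
    n, lr = 4, 0.1
    W, _ = np.linalg.qr(rng.standard_normal((n, n)))   # start on O(n)
    target = rng.standard_normal((n, n))
    grad = W - target                      # gradient of 0.5*||W - target||^2

    def orth_error(M):
        return np.linalg.norm(M.T @ M - np.eye(n))

    # Naive Euclidean step: the constraint is not preserved.
    W_naive = W - lr * grad

    # Riemannian step: project onto the tangent space {W*A : A skew}, then retract.
    A = W.T @ grad
    riem_grad = W @ ((A - A.T) / 2)        # skew-symmetric (Lie algebra) direction
    W_riem, _ = np.linalg.qr(W - lr * riem_grad)       # QR retraction onto O(n)

    print("orthogonality error, naive step:     ", orth_error(W_naive))
    print("orthogonality error, riemannian step:", orth_error(W_riem))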


Ah, I see, so they've basically found a nice way to express the easy part (finding the constraints), but the devil's in the optimisation, of course.

Thanks for taking the time to reply. Coincidentally, I'm doing a project with Arnold's classical mechanics book at the moment, so that all makes perfect sense to me.


Right, basically the generalization goes: Lie groups -> Lie group actions -> Lie groupoids. This is not a new observation (in fact, Arnold's fluid mechanics can be rephrased using Lie groupoids, https://tspace.library.utoronto.ca/bitstream/1807/91859/1/Fu..., and big names like Alan Weinstein have worked in that area). I don't think the authors actually understand that story, so they very naively went groups -> group actions -> monads. If it went that way, someone in mathematical physics or optimization would have stumbled onto other concrete examples by now. But they haven't, because it doesn't.


“A new kind of deep learning.”

Edit: I need to think more before I comment; "A new kind of data science" was right there.



