Hacker News | antixk's comments

Hi, mind sharing the links of your talks or slides? Thanks!


Not sure if you have heard of PyTorch Mobile, but it is very possible [0].

[0] https://pytorch.org/tutorials/beginner/deeplabv3_on_ios.html


The big thing that PyTorch Mobile is lacking compared to TF Lite is on-device accelerator support (GPU/DSP/etc.) (there's experimental support for NNAPI https://pytorch.org/tutorials/prototype/nnapi_mobilenetv2.ht..., but this is a hack).


Patchouli is of Tamil origin as well. From pachai - green, ilai - leaf.


Learnt something new today! I never imagined that to have originated from Tamil (I am a native speaker of the language) and always thought it had some esoteric European origin. From the phonetics, this definitely sounds quite plausible that it originated in Tamil. Thanks for the pointer.


Because unless you have a certain (high) level of sparsity, sparse formats are in fact inefficient for storage. There are cases where sparse formats take more memory than storing dense tensors.
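To make that trade-off concrete, here's a rough numpy sketch (the 1000x1000 shape, float32 values, and int32 indices are my own assumptions) comparing a dense matrix against a CSR-style layout at roughly 60% density, where CSR clearly loses:

```python
import numpy as np

# Sketch: storage for a 1000x1000 float32 matrix with ~60% nonzeros,
# dense vs a CSR-style layout (one value + one int32 column index per
# nonzero, plus one int32 row pointer per row).
rng = np.random.default_rng(0)
dense = rng.random((1000, 1000)).astype(np.float32)
dense[dense < 0.4] = 0.0          # zero out ~40% of the entries

nnz = int(np.count_nonzero(dense))
dense_bytes = dense.nbytes                        # 4 bytes per entry
csr_bytes = 4 * nnz + 4 * nnz + 4 * (1000 + 1)    # values + indices + indptr

print(csr_bytes > dense_bytes)  # → True: CSR loses below ~50% sparsity
```

The break-even point here is about 50% sparsity, since CSR pays roughly 8 bytes per nonzero against the dense format's flat 4 bytes per entry.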


Sure. It’s still one of the methods to compress models. You might be surprised how often the weights are really quite sparse.


Did you study at JAIST? Sorry to ask, but when you said "deep in the mountains outside of Kanazawa", that's the only university that popped into my mind that was more open to foreigners (which I assume you are/were).


It was Kindai (Kanazawa Daigaku). They moved from the castle inside Kanazawa to a new campus outside the city. It might be a stretch to say "deep in the mountains"; it was probably a 20-minute bus ride from downtown Kanazawa. I haven't been there in 25 years and I imagine it has all changed significantly.


I guess OP meant that each tensor core in the latest NVIDIA GPUs fundamentally performs 4x4 matrix multiplication [0].

[0] https://nla-group.org/2020/07/21/numerical-behaviour-of-tens...
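For what that primitive looks like, here's a numpy emulation of the fused D = A·B + C tile operation (the 4x4 shape and fp16-in/fp32-accumulate precisions are assumptions based on the linked post; real hardware exposes this through CUDA's wmma/mma instructions, not Python):

```python
import numpy as np

# Emulated tensor-core tile op: half-precision inputs,
# single-precision accumulation, D = A @ B + C on a 4x4 tile.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)).astype(np.float16)
B = rng.standard_normal((4, 4)).astype(np.float16)
C = rng.standard_normal((4, 4)).astype(np.float32)

# Products come from fp16 inputs but are accumulated in fp32,
# which is where tensor cores get accuracy back over pure fp16.
D = A.astype(np.float32) @ B.astype(np.float32) + C
print(D.shape)  # → (4, 4)
```

Larger matmuls are tiled into many of these small fused multiply-accumulate operations, which is why the tile size matters much less than the throughput per tile.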


That's also not true. They do 4x4, 8x8, and 16x16.


Which is fine for the point being made here. There's no issue with n^3 for small matrices.


As a student of Information Geometry, let me provide some context on why this is such an exciting field. Usually geometry reminds people of triangles and other shapes that we see in our immediate surroundings, which is called Euclidean or just flat geometry, i.e., a space where Euclid's axioms are valid. But there is a lot more to the story: we can bend, reformulate, or even exclude certain axioms to conjure up new spaces, such as hyperbolic geometry or, in general, some complex curved geometry. It turns out these have wide applications in the real world.

Now what's all this gotta do with information? Usually, information is represented in terms of statistical distributions, following Shannon's information theory. What the early founders of IG observed is that these statistical distributions can be represented as points on a curved space called a statistical manifold. All the terms used in information theory can then be reinterpreted in terms of geometry.

So, why is it so exciting? Well, in deep learning people predominantly work with statistical distributions, some without even realising it. Our optimizations reduce the distance between statistical distributions, such as the distribution of the data and the distribution that the neural network is trying to model. The gradient-based optimisations we all know and love use only local approximations of the geometry: the gradient (local slope) and the Hessian (local quadratic approximation of curvature). Doing the optimization on the statistical manifold instead accounts for the exact curvature, and is therefore more efficient. This method is called the Natural Gradient.
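As a toy illustration (my own minimal sketch, not from any particular IG reference): fit the mean of a Gaussian with a large fixed sigma. The Fisher information for the mean is 1/sigma², so preconditioning by its inverse (the natural gradient) undoes the 1/sigma² scaling that cripples the ordinary gradient step:

```python
import numpy as np

# Fit the mean mu of a Gaussian with fixed sigma by minimizing the
# average negative log-likelihood of some sampled data.
rng = np.random.default_rng(0)
sigma = 10.0
data = rng.normal(loc=3.0, scale=sigma, size=1000)

def grad_nll(mu):
    # d/dmu of the average negative log-likelihood:
    # (mu - mean(data)) / sigma^2
    return (mu - data.mean()) / sigma**2

# Fisher information for the mean of a Gaussian is 1/sigma^2, so the
# natural gradient is F^{-1} * grad = sigma^2 * grad.
fisher = 1.0 / sigma**2

mu_plain, mu_natural = 0.0, 0.0
lr = 0.5
for _ in range(100):
    mu_plain -= lr * grad_nll(mu_plain)               # ordinary gradient step
    mu_natural -= lr * grad_nll(mu_natural) / fisher  # natural gradient step

# The natural-gradient iterate reaches the MLE (the sample mean) far
# faster, because the Fisher metric cancels the 1/sigma^2 scaling.
print(abs(mu_natural - data.mean()) < abs(mu_plain - data.mean()))  # → True
```

In this one-parameter case the natural gradient is just a rescaling, but for multi-parameter models the Fisher matrix also accounts for correlations between parameters, which a plain gradient step ignores.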

Hope this helps.


That's indeed mathematically very exciting and reason enough to study IG.

But does IG allow us to reason about Neural Nets in new ways that could move the needle on open questions about information representation in ANNs or, even better, BNNs?


Good question. Most people are focussing on the natural gradient and making it as efficient as SGD. But some have been exploring whether we can introduce inductive bias in the function space rather than the weight space, using IG. It is still quite a new field, though.


BNNs?


They're distinguishing between artificial neural networks (ANNs) and biological neural networks (BNNs).


Perhaps one day this could be used to reduce distance between mathematical proofs in "proof space"? :)


So it's the geometry of the set of probability measures on a pre-measure space? If so, that sounds like something that might be interesting for point processes as well.


But it also amounted to gradient descent without any mention of information geometry.


Thanks! I just wanted to dabble with Electron (and p5.js, being totally new to JS). It gave me the confidence to work on a bit more serious project.

