
Pruning (both structured and unstructured) is an extremely active area of research and development. Since the Ampere generation, Nvidia GPUs have had hardware support for sparsified networks.
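To make the structured/unstructured distinction concrete, here is a minimal pure-Python sketch of magnitude pruning. The function names and the toy weight row are made up for illustration. The second function uses the 2:4 pattern (at most two nonzeros in every group of four), which is, as far as I know, the sparsity layout Ampere's sparse tensor cores accelerate. Real workflows would go through a framework and usually fine-tune after pruning.

```python
def prune_unstructured(weights, sparsity):
    """Zero the smallest-magnitude fraction `sparsity` of all weights."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune get dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def prune_2_of_4(weights):
    """Keep the 2 largest-magnitude weights in every contiguous group of 4."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Positions (0..3) of the two largest magnitudes within this group.
        ranked = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
        keep = set(ranked[:2])
        out.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return out


# Hypothetical weight row, chosen so the two schemes disagree.
row = [0.9, -0.6, 0.5, -0.7, 0.2, 0.3, -0.1, 0.01]

print(prune_unstructured(row, 0.5))  # [0.9, -0.6, 0.5, -0.7, 0.0, 0.0, 0.0, 0.0]
print(prune_2_of_4(row))             # [0.9, 0.0, 0.0, -0.7, 0.2, 0.3, 0.0, 0.0]
```

Both results are 50% sparse, but the unstructured version is free to wipe out the whole second group, while the 2:4 version spreads the zeros evenly. That regular layout is what lets hardware skip the zeros cheaply.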

WRT overparameterized models, even the largest ones, such as Google’s PaLM 540B, have several orders of magnitude fewer connections than the human brain. And remembering a set of observations doesn’t forbid deducing a general pattern. Imagine trying to understand why the Pythagorean Theorem is true without first memorizing what it states! Children often benefit from seeing and even memorizing many concrete examples before being taught the abstract principles that explain them all. Kepler deduced his Three Laws of Planetary Motion by analyzing four decades of careful astronomical observations compiled by Tycho Brahe. Implicit and explicit regularization biases modern learning algorithms toward generally simpler solutions in “wide flat basins of the loss surface.” For instance, the humble skip connection helps smooth out the loss surface.


