Machine learning spots natural selection at work in human genome (nature.com)
119 points by benryon on Nov 2, 2018 | 34 comments



> To find these patterns, a growing number of geneticists are turning to a form of machine learning called deep learning. Proponents of the approach say that deep-learning algorithms incorporate fewer explicit assumptions about what the genetic signatures of natural selection should look like than do conventional statistical methods.

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. “What are you doing?”, asked Minsky. “I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied. “Why is the net wired randomly?”, asked Minsky. “I do not want it to have any preconceptions of how to play”, Sussman said. Minsky then shut his eyes. “Why do you close your eyes?”, Sussman asked his teacher. “So that the room will be empty.” At that moment, Sussman was enlightened.


I think AlphaGo vs AlphaZero is a strong argument against this. AlphaGo used the best of human knowledge to tune its play, mixing human expertise and centuries of master-level games with the strength of deep learning systems. That seems like it would be ideal, particularly per your analogy, and Google certainly believed so: it's the system they directed their resources towards developing and then very publicly demonstrating.

AlphaZero was likely a curious aside at one point. The idea with AlphaZero is to throw everything out the window. Even the game itself is generalized into a generic position -> transition -> terminal-state system into which most games that are 'simple' in terms of rules can easily be slotted. There were no human heuristics and no human games to learn from; it relied entirely on self-play. And of course AlphaZero ended up vastly stronger, vastly faster, than AlphaGo ever managed.


It was shocking to me too that adding the prior of human games limited the potential of the Go-playing computer.


Humans are so bad that fitting the network to our games trapped it in a region of local minima that weren't even close to the ideal, I guess.


AlphaZero heavily reused the architecture developed for AlphaGo, and in this sense there is some prior embedded in the architecture; but the human prior was useful to jump-start development of that fine-tuned architecture. I.e., the path from nothing to AlphaZero would have been more difficult than nothing -> AlphaGo -> AlphaZero, imho.


I increasingly feel that deep learning needs to incorporate more ideas from evolution, not just for parameter optimization but for architecture discovery itself.

Imagine pitting neural networks against each other in an adversarial environment (just like the real world). Under competitive pressure for limited food (computational resources), evolved neural networks could start approaching optimal architectures that do the job but carry no superfluous, preconceived notions of the (modeled) world. In fact, such evolved architectures could encode relevant notions of the world directly, which we could then learn about by reverse-engineering them (a toy sketch of the basic loop is below).

This is closely related to ideas from predictive processing, which ties survival tightly to prediction (to predict your future states is to avoid getting dissipated). So I anticipate evolution and survival notions coming up in a big way in ML/deep learning in the future.
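
To make the idea concrete (nothing novel, just an illustration): a toy neuroevolution loop with a population of tiny nets, weight mutation, selection by fitness, and a crude per-neuron cost standing in for "limited food". The task (XOR), the population sizes and all the numbers are arbitrary, and architecture mutation is omitted for brevity.

```python
# Toy neuroevolution sketch: evolve tiny one-hidden-layer nets on XOR,
# penalising hidden-layer size as a crude stand-in for a compute/"food" budget.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

def make_net(hidden):
    return {"W1": rng.normal(0, 1, (2, hidden)),
            "b1": np.zeros(hidden),
            "W2": rng.normal(0, 1, hidden),
            "b2": 0.0}

def forward(net, X):
    h = np.tanh(X @ net["W1"] + net["b1"])
    return 1 / (1 + np.exp(-(h @ net["W2"] + net["b2"])))

def fitness(net):
    err = np.mean((forward(net, X) - y) ** 2)
    cost = 0.01 * net["W1"].shape[1]        # "food" penalty per hidden unit
    return -(err + cost)

def mutate(net):
    # Perturb every parameter with small Gaussian noise.
    return {k: v + rng.normal(0, 0.1, np.shape(v)) for k, v in net.items()}

population = [make_net(rng.integers(2, 6)) for _ in range(50)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                             # selection
    population = survivors + [mutate(survivors[rng.integers(len(survivors))])
                              for _ in range(40)]           # reproduction with mutation

best = max(population, key=fitness)
print(np.round(forward(best, X), 2))  # outputs should drift towards [0, 1, 1, 0]
```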


This is already a very prolific and successful line of research; you don't have to imagine anything.

Latest popular paper on the specific application of evolution you’re talking about: https://arxiv.org/abs/1808.00193


Do you mean something like a GA over NNs? Pretty sure that has been done many times.


> Imagine pitting neural networks in an adversarial environment (just like the real world). Under competitive pressure for limited food (computation resources), evolved neural network could start approaching optimal architectures that do the job but have no superfluous, pre-conceived notions of the (modeled) world.

Sounds almost like these robots [0], which learned to lie to each other. They didn't use a neural net, though.

[0] http://discovermagazine.com/2008/jan/robots-evolve-and-learn...


>approaching optimal architectures that do the job but have no superfluous, pre-conceived notions of the (modeled) world

Usually, modelling efficiency is related to the amount of priors. The more (correct) priors, the faster a model learns, and from less data.

A prior-poor model just spends data learning those correct priors in the first place.

There are countless examples I could give to support the intuition here. There isn't, AFAIK, any theoretical work yet.

That the priors are correctly aligned with reality is vital. A model with fixed bad priors cannot unlearn them. If the priors are bad but not fixed, you're just spending time and data to correct those priors.
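
For the intuition in its simplest form, here is a toy Bayesian illustration (not deep learning; the priors and numbers are made up): a prior roughly aligned with reality reaches a good estimate from very little data, a flat prior needs more, and a confidently wrong prior spends its data just getting un-wrong.

```python
# Estimating a coin's bias p = 0.7 under different Beta priors.
import numpy as np

rng = np.random.default_rng(1)
true_p = 0.7
flips = rng.random(1000) < true_p          # simulated coin flips

priors = {"flat (uninformative)": (1, 1),
          "correct-ish":          (14, 6),   # prior mean 0.7, mildly confident
          "confidently wrong":    (2, 18)}   # prior mean 0.1

for name, (a, b) in priors.items():
    for n in (5, 20, 100, 1000):
        heads = flips[:n].sum()
        post_mean = (a + heads) / (a + b + n)   # Beta posterior mean
        print(f"{name:22s} n={n:4d}  estimate={post_mean:.2f}")
```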


Yes, and evolution builds priors into organisms.


We know all of the rules of tic-tac-toe. We have no idea how much we don't know about genetics.

The things we do know can still be incorporated into deep learning in various ways, but we avoid having to make as many obviously-not-universally-applicable assumptions as we would with less complex statistical methods.


> we avoid having to make as many obviously not universally applicable assumptions

The point of the above koan is that there are always assumptions (priors), whether you like them or not. You can either make them explicit, or leave them random and unknown. This is the crux of the bias-variance tradeoff [1]. Whether deep NNs manage to circumvent that in any meaningful way (especially outside the domain of spatial/temporal analog signals) is an open question.

[1]: https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff


Exactly. There is simply no way for a human to understand this stuff without (some layers of) abstraction, and each abstraction introduces bias and/or error.


How could I embed knowledge or assumptions in deep learning neurons? In a high-level programming language it would be easy, but tweaking the neurons' parameters to embed that knowledge? Sounds more difficult than writing machine code.


Any deviation from a series of fully connected layers represents some assumption being made, usually to reduce the size of the parameter space to a subset that is considered more promising.

Convolutions are one example: they assume that proximity correlates with a logical connection.

Note that this is a very useful assumption. Just shuffle the pixels in a photo and try to discern what they show to see how much we rely on that assumption. In fact I'm having trouble coming up with an obvious counterexample[0].
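
To put rough numbers on "reduce the size of the parameter space" (the layer shapes here are arbitrary, purely for illustration):

```python
# How the locality/weight-sharing assumption shrinks the parameter space.
import torch.nn as nn

img_pixels = 3 * 224 * 224   # an RGB image, flattened

dense = nn.Linear(img_pixels, 64)                   # no spatial assumption at all
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)   # "nearby pixels are related"

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"fully connected: {count(dense):,} parameters")   # ~9.6 million
print(f"convolution:     {count(conv):,} parameters")    # 1,792

# The conv layer also doesn't care *where* in the image a pattern appears
# (translation invariance) -- shuffling the pixels destroys exactly the
# structure this assumption relies on.
```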

So let's not fall into the trap of these armchair scientists with the big spliff, staring into the distance and intoning trivialities with the air of revelation: "Man.... you're just a slave to your assumptions. What if, like, space and time are one and the same?"

In fact, one could argue that all of AI is an endeavour to find abstract rules defining what's "trivially obvious" to us. You don't have to explain to children that objects in the distance look smaller than when they are close.

Once you succeed with that, it's possible that ML can find a sort of post-modern reality. One that we are blinded to for cultural reasons and the structure of our perception: what if God, for example, appears in the form of seemingly random "pixel errors"? You would easily miss her constant presence due to all the error correction in the pathway of your perception (and also your camera sensors).

But that's the future. Just as art often flourishes within the confines of (often arbitrary) limitations, so do we. And embracing these limitations is done not out of ignorance, but for expedience.


Depends. Many standard layers express a form of prior knowledge. A CNN layer embeds the assumption of spatial translation invariance; an RNN does the same for temporal translation; graph neural nets have permutation invariance. Assumptions can also be expressed as regularisation terms added to the loss function. One common practice is to initialise a net with the weights of another net trained on a related task - usually CNNs trained on ImageNet, and word embeddings for NLP (though lately it is possible to use deep neural nets such as BERT, ELMo, ULMFiT and the OpenAI transformer, pre-trained on large text corpora).
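
A quick sketch of that last practice with torchvision (the exact API details depend on your framework version, and num_classes is a placeholder for whatever the downstream task needs):

```python
# Express "prior knowledge" by starting from ImageNet weights and only
# replacing the task-specific head.
import torch.nn as nn
import torchvision.models as models

num_classes = 10   # placeholder

model = models.resnet18(pretrained=True)                 # ImageNet prior
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new, randomly initialised head

# Optionally freeze the pretrained body so only the head trains at first:
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False
```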


This is basically the biggest research question the field of ML has: how to express useful prior knowledge, and how to embed it in a network?

The current answer mostly is "network architectures can express priors". That's what matt's convolutional net example is about.

This is clearly insufficient to solve the many problems that deep learning still can't crack, so the search is still on for more mechanisms.


The point is that even a randomly wired network has an implicit bias. NNs are initialized with random weights, so even an untrained network is biased.
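
A trivial way to see it (toy example, nothing more): two untrained nets with identical architecture but different random draws already disagree about the same inputs, before seeing any data at all.

```python
# Untrained nets already "prefer" some answers purely because of their random init.
import torch
import torch.nn as nn

x = torch.randn(5, 10)   # five arbitrary inputs

def fresh_net(seed):
    torch.manual_seed(seed)
    return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

with torch.no_grad():
    preds_a = fresh_net(0)(x).argmax(dim=1)
    preds_b = fresh_net(1)(x).argmax(dim=1)

print(preds_a, preds_b)  # typically differ: the bias comes purely from the random draw
```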


It's not just black-box modelling of variants to hunt for mutations in a population anymore, but full functional design of RNA molecules for regulation and expression ;)

Learning to Design RNA

https://openreview.net/forum?id=ByfyHh05tQ


Hm, please correct me if I'm wrong, but isn't the Zuker algorithm inherently a dynamic programming algorithm? If so, I would have liked to see this paper compare to directly optimizing the input space using a differentiable form of the Zuker algorithm, e.g., using ideas related to https://arxiv.org/abs/1802.03676.

IMO it's unnatural to rely on deep reinforcement learning when we know so much about the mapping between input space and reward space (here, because Zuker's algorithm is not a black box). To me, deep RL is primarily for the domain where the mapping is highly complex and largely black box. Deep RL might be more useful if the authors didn't use Zuker's approximate algorithm but instead performed molecular simulations to determine secondary structure folding that is more faithful to reality.
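
For what I mean by "a differentiable form": the trick in the linked paper is to replace the hard max/min inside the DP recursion with a smoothed operator, so gradients flow through the whole table. A toy sketch of that idea on a DTW-style grid recursion (not the Zuker recursion itself; gamma and the grid are arbitrary):

```python
# Smoothed-min differentiable dynamic programming on a toy grid-path problem.
import torch

def softmin(values, gamma=0.1):
    # Smooth, differentiable stand-in for min(): -gamma * logsumexp(-x / gamma)
    return -gamma * torch.logsumexp(-torch.stack(values) / gamma, dim=0)

def soft_dp(cost, gamma=0.1):
    # Soft minimal path cost from top-left to bottom-right.
    n, m = cost.shape
    D = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            preds = []
            if i > 0: preds.append(D[i - 1][j])
            if j > 0: preds.append(D[i][j - 1])
            if i > 0 and j > 0: preds.append(D[i - 1][j - 1])
            D[i][j] = cost[i, j] + (softmin(preds, gamma) if preds else 0.0)
    return D[n - 1][m - 1]

# Because everything is smooth, we can differentiate the DP value w.r.t. its inputs.
cost = torch.rand(6, 6, requires_grad=True)
value = soft_dp(cost)
value.backward()
print(cost.grad)   # soft "path occupancy" -- usable as a training signal
```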


No, just no. It is not enlightenment to retreat into pure mathematical rules; that's an automaton. Integration with a living, active world is not optional. Read Minsky yourself, minus the hero-worship... a very similar response applies to Kurzweil and a few others, btw.


I don't see how that applies at all in this situation.


A) Things don’t happen in a void.

B) If you try hard enough, you will always find a statistic that correlates with something.

C) In that specific paper, “deep learning” could be swapped for random clustering, followed by trying to come up with a reason after the fact.


I see you read the Jargon File...


Found a great phrase about the failure modes of deep-learning-based authentication systems:

"Security through obscurity"

If no one knows how to make a system fail, not even the creator, then there is no deterministic way to beat the authenticator except brute force.

source: computerphile


There was a very fun preprint a few weeks ago on biorxiv: https://www.biorxiv.org/content/early/2018/10/22/336073

The same trick as DeepVariant: if you encode your genetic variants as images and give that to a CNN, you get reasonably good results without doing much extra work!


DeepVariant didn't actually explicitly encode the variants (nor the raw data) as images. Press reports (including from Google Research itself) suggested that this was the case, but nothing in the original publication said so, and the researchers themselves have disputed it.

It just so happens that the tensor representation lends itself well to visualisation as (multi-channel) images. But that wasn’t the intent, it’s just a nice side-effect. In reality the data is laid out in tensors that follow naturally from how they were generated (i.e. via alignment of many stochastically generated DNA fragments to a reference sequence).


Interesting!! I did not know that, so I was wrong - but why does the GitHub repo then talk of pileup images?

https://github.com/google/deepvariant

Edit: I think I get it now, this notebook shows it: https://github.com/google/deepvariant/blob/r0.7/docs/visuali... The channels are easily representable as images, but the network doesn't use the images directly.


Yeah, the way they talk about this is admittedly confusing. The best way for me to think about it (from [1]) is to mentally add quotation marks around the word “image” whenever the methods mention it. Mathematically the data is stored in higher-dimensional tensors with one dimension representing genomic coordinates, another dimension representing the sequence depth (i.e. one row per sequence read), and additional dimensions representing features of the sequence (such as nucleoside identity, read direction, error probability, match/mismatch with the reference, etc). I use almost the same kinds of tensors for work that has no tangible relation to images or visualisation. It’s simply the most straightforward way of representing this data as a tensor.

But the similarity to images is so tantalising that even the paper's methods [2] fall prey to this, even though the dimensionality and minor details are wrong. Furthermore, the term “pileup image” refers to a common way of visualising genome–read alignments [3]. The DeepVariant tensor is not a pileup image, but it is very close. And the tensor can be converted into an image [4], though as mentioned this requires some transformations (splitting the channels and rescaling the values). A rough, purely illustrative sketch of such a tensor layout follows the footnotes.

[1] https://bioinformatics.stackexchange.com/q/4098/29

[2] I initially claimed the paper didn’t mention this. That’s wrong; apologies.

[3] https://www.google.com/search?tbm=isch&q=genome+pileup

[4] https://github.com/google/deepvariant/blob/r0.7/docs/visuali...
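
To make the "reads x positions x channels" description concrete, here is a purely hypothetical sketch: the window size, channel set and encodings are all made up, and this is not DeepVariant's actual layout.

```python
# Hypothetical reads x positions x channels tensor for a candidate variant site.
import numpy as np

window = 21          # genomic positions around the candidate variant
max_reads = 100      # one row per aligned read, zero-padded if fewer
channels = ["base", "base_quality", "strand", "matches_reference"]

tensor = np.zeros((max_reads, window, len(channels)), dtype=np.float32)

# Fill one (made-up) read covering positions 5..14 of the window:
read_bases = "ACGTACGTAC"
base_codes = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}   # arbitrary encoding
for offset, base in enumerate(read_bases):
    pos = 5 + offset
    tensor[0, pos, 0] = base_codes[base]   # which nucleotide
    tensor[0, pos, 1] = 30 / 40            # Phred base quality, arbitrarily rescaled
    tensor[0, pos, 2] = 1.0                # forward strand
    tensor[0, pos, 3] = 1.0                # matches the reference here

print(tensor.shape)   # (100, 21, 4) -- slice any channel and it *looks* like an image
```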


Thank you for the explanation, it makes perfect sense. Just as an RGB picture is represented as a three-dimensional array of numbers for red, green and blue, you're not constrained to those channels - you might as well throw anything in there.


Great paper by good scientists. Also, cool that they put it on biorxiv.


I would be interested to hear more about how you overcome these cases when you do not have enough samples to train on. This seems like a useful area to improve upon.


So is the University of Oregon in Portland or Eugene?



