Another great blog post on impressive research from the DeepMind team, who are simultaneously releasing a new dataset for long-range language modeling.
The post is worth reading in its entirety.
If I may summarize, the authors propose a transformer augmented with a short-term memory mechanism (analogous to Transformer-XL) as well as a new long-term memory mechanism that learns to 'compress and memorize' embeddings from the short-term memory. The model is trained on book-length samples (!!!!), and seems to perform significantly better than prior models at generating language with long-range contexts. To my eyes, text generated by the trained model is virtually indistinguishable from human output, and qualitatively superior to GPT-2 samples.
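To make the mechanism concrete, here is a rough sketch (PyTorch; all names and defaults are mine, not the authors' code) of the kind of memory update being described: activations that age out of the Transformer-XL-style short-term memory are compressed, e.g. by a strided 1D convolution (one of the compression functions the paper considers), and appended to a long-term compressive memory.

    import torch
    import torch.nn as nn

    class CompressiveMemory(nn.Module):
        # Illustrative sketch only, not the authors' implementation.
        def __init__(self, d_model, mem_len=512, cmem_len=512, rate=3):
            super().__init__()
            self.mem_len = mem_len
            self.cmem_len = cmem_len
            # Strided conv folds `rate` old memory slots into one compressed slot.
            self.compress = nn.Conv1d(d_model, d_model, kernel_size=rate, stride=rate)

        def update(self, memory, cmemory, new_hidden):
            # memory:     [mem_len, batch, d_model]  short-term (Transformer-XL style)
            # cmemory:    [cmem_len, batch, d_model] long-term compressed memory
            # new_hidden: [seq_len, batch, d_model]  activations from the current segment
            memory = torch.cat([memory, new_hidden], dim=0)
            overflow = memory.size(0) - self.mem_len
            if overflow > 0:
                # Oldest activations fall out of the short-term memory...
                old, memory = memory[:overflow], memory[overflow:]
                # ...and are compressed; Conv1d expects [batch, channels, length]
                # (assumes the overflow is at least `rate` steps long).
                compressed = self.compress(old.permute(1, 2, 0)).permute(2, 0, 1)
                cmemory = torch.cat([cmemory, compressed], dim=0)[-self.cmem_len:]
            return memory, cmemory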
Agreed that the generated sample is superior to similar outputs from GPT-2.
Looking at the additional samples in the publication, my first thought is that the model cannot easily stray from or modify the context. Once a fact is stored within the compressed memory, it seems the model cannot easily generate sentences contradictory to that fact.
This is problematic because frequent changes to relational information (e.g. the location where a character is standing) are fundamental to storytelling.
I believe pre-trained Transformer-XL weights can also be downloaded, providing long-term memory functionality similar to the Compressive Transformer's. I don't have a direct link, but they're available via huggingface: https://huggingface.co/transformers/pretrained_models.html
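For reference, something along these lines should load it (the usual checkpoint is 'transfo-xl-wt103'; exact class names may differ between library versions):

    # Rough example of loading pre-trained Transformer-XL via huggingface transformers.
    from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

    tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
    model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

    # Generate a short continuation from a prompt.
    inputs = tokenizer("The compressive transformer builds on", return_tensors="pt")
    outputs = model.generate(inputs["input_ids"], max_length=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))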
Yeah. I didn't mention Transformer-XL because I'm not sure how much of a long-range dependency it actually learns to handle. The only papers I've seen on recurrence indicate that recurrent models tend to learn very short-range dependencies, while something like Reformer, with direct access to thousands of timesteps, seems more likely to actually make use of them.
Will look into releasing some pre-trained weights, but the model trained on PG-19 is not really intended to be a general-purpose language generation model, so I'd prefer it not be picked up for downstream applications the way GPT-2 & BERT have been. The text from these old books contains some historical bias, etc.
Hopefully the model can be useful for people wanting to model long sequences generally, or build on other compressive memory ideas.
From the research paper's description of how they compress the memory, it sounds like a form of meta-learning.
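If I'm reading the paper right, the compression network isn't trained by the main language-modelling loss but by a separate attention-reconstruction objective: attending over the compressed memories should retrieve roughly what attending over the raw old memories would have. Very roughly (all names mine, just to convey the idea):

    import torch
    import torch.nn.functional as F

    def attention_reconstruction_loss(hidden, old_mem, compressed_mem, attend):
        # `attend` is any attention module mapping (queries, keys/values) -> outputs.
        with torch.no_grad():
            target = attend(hidden, old_mem)            # retrieval from uncompressed memories
        reconstruction = attend(hidden, compressed_mem)  # retrieval from compressed memories
        return F.mse_loss(reconstruction, target)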
Perhaps a network like this would be interested in reading the same books more than once. Perhaps it could find favorite books it wanted to read many times.
Thank you, this just made a huge connection for me between sleep, memory, and decision making (via the "consolidated episodic memories" link):
I was suffering from sleep apnea at this time last year and was on call one out of every three weeks, so I wasn't defragging my brain's hard drive. I got decision fatigue, my productivity fell to 10%, and I ended up unable to work for several months.