A new model and dataset for long-range memory (deepmind.com)
166 points by atg_abhishek on Feb 10, 2020 | 13 comments



Another great blog post on great research from DeepMind, who are simultaneously releasing a new dataset for long-range language modeling.

The post is worth reading in its entirety.

If I may summarize, the authors propose a transformer augmented with a short-term memory mechanism (analogous to TransformerXL) as well as a long-term memory mechanism (new) that learns to 'compress and memorize' embeddings from the short-term memory. The model is trained on book-length samples (!!!!), and seems to perform significantly better than prior models at generating language with long-range contexts. To my eyes, text generated by the trained model is virtually indistinguishable from human output, and qualitatively superior to GPT2 samples.
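
For anyone curious about the mechanics: here's a rough sketch of the memory bookkeeping as I understand it. This is my own toy code, not DeepMind's, and simple mean pooling stands in for the learned compression function the paper actually trains; the class and parameter names are made up for illustration.

    import numpy as np

    class CompressiveMemory:
        """Toy version of the two-tier memory: a FIFO short-term memory of
        recent hidden states, with evicted states compressed (here by mean
        pooling over blocks of `rate` states) into a long-term memory."""

        def __init__(self, mem_len=4, cmem_len=4, rate=2, d_model=8):
            self.mem = np.zeros((0, d_model))    # short-term (uncompressed) memory
            self.cmem = np.zeros((0, d_model))   # long-term (compressed) memory
            self.mem_len, self.cmem_len, self.rate = mem_len, cmem_len, rate

        def update(self, hiddens):
            """Append new hidden states; compress whatever falls off the end."""
            self.mem = np.concatenate([self.mem, hiddens], axis=0)
            overflow = len(self.mem) - self.mem_len
            if overflow > 0:
                old, self.mem = self.mem[:overflow], self.mem[overflow:]
                # Squash every `rate` evicted states into one compressed slot
                # (assumes segment lengths are multiples of the rate).
                n = (len(old) // self.rate) * self.rate
                compressed = old[:n].reshape(-1, self.rate, old.shape[-1]).mean(axis=1)
                self.cmem = np.concatenate([self.cmem, compressed], axis=0)[-self.cmem_len:]
            # Attention then runs over [cmem ; mem ; current segment].
            return self.cmem, self.mem

The actual model learns the compression (and an auxiliary reconstruction loss to train it), but the overall flow is this kind of cascade: new states push old ones out of short-term memory, and the old ones get compressed rather than discarded.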


Agreed that the generated sample is superior to similar outputs from GPT-2. Looking at the additional samples in the publication, my first thought is that the model cannot easily stray from or modify the context. Once a fact is stored within the compressed memory, it seems the model cannot easily generate sentences contradicting that fact. This is problematic because frequent changes to relational information (e.g. where a character is standing) are fundamental to storytelling.


In what way is it superior? I can spot several logical contradictions, just like with samples from GPT-2.


So is there a place to download the trained model? I don't see anything but the dataset available.


They probably won't since DM doesn't open-source most of its work. The authors claimed way back in November that they'd at least open-source the code (https://openreview.net/forum?id=SylKikSYDH) but nothing yet. (The model isn't so big that open-sourcing it is all that important. It's no Turing-NLG https://www.microsoft.com/en-us/research/blog/turing-nlg-a-1... that's for sure!)

In the meantime, there's always Reformer, which has Trax and PyTorch implementations.


I believe the Transformer-XL pre-trained model can also be downloaded, to provide long-term memory functionality similar to the Compressive Transformer's. I don't have a direct link, but it's available via huggingface https://huggingface.co/transformers/pretrained_models.html
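
If memory serves, something like this should pull down the WikiText-103 checkpoint and generate a continuation (assuming the `transformers` library still hosts the weights under the `transfo-xl-wt103` name; treat this as a sketch, not gospel):

    # Load the pre-trained Transformer-XL (WikiText-103) checkpoint via
    # Hugging Face and sample a continuation from a short prompt.
    from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

    tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
    model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

    input_ids = tokenizer.encode("The ship sailed out of the harbour",
                                 return_tensors="pt")
    output = model.generate(input_ids, max_length=60)
    print(tokenizer.decode(output[0]))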


Yeah. I didn't mention Transformer-XL because I'm not sure how much of a long-range dependency it actually learns to handle. The only papers I've seen on recurrence indicate that such models tend to learn very short-range dependencies, while something like Reformer, with direct access to thousands of timesteps, seems more likely to actually be making use of them.


Wow, that's a lot of models. Thanks for pointing that out.


Hi, I have open-sourced the TensorFlow model in the Sonnet package:

https://github.com/deepmind/sonnet/blob/cd5b5fa48e15e4d020f7...

Will look into releasing some pre-trained weights, but the model trained on PG-19 is not really intended to be a general-purpose language generation model, so I'd prefer that it not be picked up for downstream applications like GPT-2 and BERT. The text from these old books contains some historical bias, etc.

Hopefully the model can be useful for people wanting to model long sequences generally, or build on other compressive memory ideas.


From the research paper's description of how they compress the memory, it sounds like a form of meta-learning.

Perhaps a network like this would be interested in reading the same books more than once. Perhaps it could find favorite books it wanted to read many times.


So the simple version is that it links parts of memory together, just like human memory, trying to keep the most relevant parts of it together?


In principle, that's exactly what compression does. Lots of potential in this space.


Thank you, this just made a huge connection for me between sleep, memory, and decision making (in the "consolidated episodic memories" link):

https://www.ncbi.nlm.nih.gov/pubmed/28641107

I was suffering from sleep apnea at this time last year and was on call one out of every three weeks, so I was not defragging my brain's hard drive. I got decision fatigue and my productivity fell to 10%, which left me unable to work for several months.



