
OT: I want to get into transformers for NLP, what's the best way?

About me: Mostly done TS for the last few years. Dipped into Python, a bit of pandas, a bit of numpy, a bit of Kaggle over the last 3-4 weeks.

Why I ask: It's so easy to get lost; this field is wide. E.g. I spent days with spaCy, CoreNLP, etc. before I learned that transformer-based stuff exists and outperforms the former.




I've just recently been on a journey to understand transformers in and out; here are the resources I found that managed to drill it into my head:

1. Chapters 7, 9, 10

https://web.stanford.edu/~jurafsky/slp3/

This was really useful for building up to the concept of attention (although the actual attention section is still brief).

2. https://jalammar.github.io/visualizing-neural-machine-transl...

This was great for visualization and for understanding that attention wasn't exclusive to transformers and was actually used with RNNs first.

3. https://jalammar.github.io/illustrated-transformer/

Getting to understand how transformers actually work visually.

4. https://www.youtube.com/watch?v=S27pHKBEp30

This lecture by Leo Dirac was extremely helpful to finish off with, not only because it actually includes some pseudocode, but also because it revisits some key topics and covers why transformers are needed.

One of the big confusion points for me was that the concepts of ATTENTION and SELF-ATTENTION are not the same thing.
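
A toy sketch of the difference, leaving out the learned query/key/value projections and multiple heads that real layers have:

    import numpy as np

    def attention(queries, keys, values):
        # Scaled dot-product attention: compare each query against every key,
        # softmax the scores, and take a weighted sum of the values.
        scores = queries @ keys.T / np.sqrt(keys.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ values

    # ATTENTION in the RNN seq2seq sense: decoder states (queries) attend over
    # encoder states (keys and values) -- two different sequences.
    encoder_states = np.random.randn(10, 64)   # source sentence, 10 tokens
    decoder_states = np.random.randn(7, 64)    # target prefix, 7 tokens
    context = attention(decoder_states, encoder_states, encoder_states)

    # SELF-ATTENTION as used inside a transformer layer: the same sequence
    # supplies queries, keys and values, so every token attends to every token.
    x = np.random.randn(10, 64)
    self_attended = attention(x, x, x)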

Hope this helps.


These look like great resources for transformers, thanks!


Everyone has a different way of approaching this. I like video walk-throughs of papers. If you're the same, you may find Yannic Kilcher's videos [0, 1] helpful for the theory bit.

On the implementation side, the best project out there is [2], with good documentation (minimal example after the links below). Also, spaCy has a transformers package [3] (I personally haven't tried it), so maybe it will be easier for you to jump in if you have prior experience with spaCy.

[0] https://www.youtube.com/watch?v=iDulhoQ2pro

[1] https://www.youtube.com/watch?v=-9evrZnBorM

[2] https://huggingface.co/transformers/

[3] https://explosion.ai/blog/spacy-transformers
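
To give a feel for [2], here's about the smallest thing you can do with its pipeline API (it picks a default pretrained model, which downloads on the first run):

    from transformers import pipeline

    # Downloads and caches a default pretrained model the first time it runs.
    classifier = pipeline("sentiment-analysis")
    print(classifier("Transformers are easier to use than I expected."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]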


Every time I research transformers it seems so hand-wavy. Is there a simple description, maybe a bit of pseudocode?

Or, at the other extreme, they dump me into formula land without explaining what all the letters in the formulas represent.


This is quite a good explanation of transformers that gets shared a lot: http://jalammar.github.io/illustrated-transformer/

And here's a super simple implementation of GPT by Andrej Karpathy: https://github.com/karpathy/minGPT/blob/master/mingpt/model....
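
If it helps, here's roughly what one block of a minGPT-style model boils down to (a simplified sketch; the real thing adds causal masking, dropout and careful initialisation):

    import torch
    import torch.nn as nn

    class TransformerBlock(nn.Module):
        def __init__(self, d_model=64, n_heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ln1 = nn.LayerNorm(d_model)
            self.ln2 = nn.LayerNorm(d_model)
            self.mlp = nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )

        def forward(self, x):
            # self-attention with a residual connection
            h = self.ln1(x)
            a, _ = self.attn(h, h, h)
            x = x + a
            # position-wise feed-forward with a residual connection
            x = x + self.mlp(self.ln2(x))
            return x

    x = torch.randn(1, 10, 64)           # (batch, tokens, features)
    print(TransformerBlock()(x).shape)   # torch.Size([1, 10, 64])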


Transformers are kinda similar to state vectors. They are tracking the current state of the world. The input becomes the output, which is the input to the next iteration. The transformer transforms input into output ad infinitum until a stop token is reached.
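
In pseudocode, that loop is just (model.predict_next is a placeholder here, not a real library call):

    def generate(model, tokens, stop_token_id, max_len=100):
        # Feed the sequence in, get one more token out, append it, repeat.
        while len(tokens) < max_len:
            next_token = model.predict_next(tokens)
            tokens.append(next_token)
            if next_token == stop_token_id:
                break
        return tokens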


Re spacy-transformers: I really wouldn't recommend it. I tried using it but it was a nightmare. It had a dependency on a previous major version of Thinc (spaCy's NN backend), but the documentation for that version had been removed. I wasted a week trying to deal with it until I gave up and went pure PyTorch.

spaCy v3 seems to have integrated the package's functionality, so I'd go for the nightly release instead of this.


Sorry you lost time on this!

We took a long time to get Thinc documented and stable, because there was a long period where I wasn't sure where I wanted the library to go. The deep learning ecosystem in 2018 was pretty hard to predict, and we didn't want to encourage spaCy users to adopt Thinc for their machine learning code if we weren't sure what its status would be. So we actually never really got Thinc v7 stabilised and documented.

This actually became a real issue in the previous version of spacy-transformers. It meant we were pushed into a design for spacy-transformers that really didn't work well. The library wasn't flexible enough, because there was no good way to interact with the transformers at the modelling level.

Pretrained transformers are interesting from an API perspective because you really don't want to put the neural network in a box behind a higher-level API. You can use the intermediate representations in many different ways, so long as you can backprop to them. So you want to expose the neural networking.
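
For instance, with the transformers library you can take the raw hidden states and train whatever head you like on top, with gradients flowing back into the encoder (a rough sketch; the model name and the linear head are just illustrative):

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encoder = AutoModel.from_pretrained("bert-base-uncased")
    head = torch.nn.Linear(encoder.config.hidden_size, 2)  # your own task head

    inputs = tokenizer("spaCy and transformers", return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state        # (1, tokens, hidden)
    logits = head(hidden[:, 0])                         # use the first token's vector
    loss = torch.nn.functional.cross_entropy(logits, torch.tensor([1]))
    loss.backward()                                     # gradients reach the encoder too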

Thinc v8 was redesigned and finally documented earlier this year: https://thinc.ai . We now have a clear vision for the library: you can write your models in the library of your choice and easily wrap them in Thinc, so spaCy isn't limited to one particular library. For spaCy's own models, we try to implement them in "pure Thinc" rather than a library like PyTorch or Tensorflow, to keep spaCy itself lightweight (and to stop you from having to juggle competing libraries at the same time).

So, it's not quite true that we removed the docs for Thinc v7. We actually didn't have a good solution to do the things you needed to do in the previous spacy-transformers, which prompted a big redesign.


Hey thanks for the super detailed response!

Yeah, I was trying to do something that didn't quite fit with the spacy-transformers API at the time. I did get a bit of a headache trying to use Thinc at the time, which was just when you guys did the redesign I think, so the docs were different from what I was seeing. I might not have searched enough though.

I haven't tried it yet, but it seems that transformers got added to spaCy v3 with first-class support.

I did gain something from rummaging through the spaCy source though! NN layers were composed into module-like pieces, then added to this REGISTRY variable through a decorator. That way some things could be defined at runtime. It was super elegant.

I nicked the concept of that for my data preprocessing pipeline. Saved me a lot of time when trying new things.
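
A stripped-down version of the idea, as I reused it (my own re-implementation for preprocessing, not spaCy's actual code):

    REGISTRY = {}

    def register(name):
        # Decorator that records a function under a name, so pipelines
        # can be assembled from config strings at runtime.
        def decorator(fn):
            REGISTRY[name] = fn
            return fn
        return decorator

    @register("lowercase")
    def lowercase(text):
        return text.lower()

    @register("strip_punct")
    def strip_punct(text):
        return "".join(ch for ch in text if ch.isalnum() or ch.isspace())

    def run_pipeline(text, steps):
        for step in steps:
            text = REGISTRY[step](text)
        return text

    print(run_pipeline("Hello, Transformers!", ["lowercase", "strip_punct"]))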


No worries, and glad it wasn't a total loss! Yeah the registry solution is something we've been very happy with.


What would I miss if I went all transformers without spaCy? I don't get the idea of a wrapper API through spaCy.

I'd like to be as close as possible to the core transformers API without any intermediate layers. Nothing against spaCy, but looking at Hugging Face's side and all the pre-trained models... it feels like nobody talks about/uses spaCy if they're already using transformers.


I think spaCy offers a lot of things to connect the models to the rest of your application.

spaCy's Doc object is pretty helpful for using the outputs, for instance you can iterate over the sentences and then iterate over the entities within each sentence, and look at the tokens within them, or get the dependency children of the words in the entity. The Doc object is backed by Cython data structures, so it's more memory efficient and faster than Python equivalents you'd likely write yourself.
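
Something like this, for example (assumes a pipeline with a parser and NER, e.g. en_core_web_sm, which you'd need to download first):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Hugging Face is based in New York. spaCy was built by Explosion.")

    for sent in doc.sents:                 # sentences
        for ent in sent.ents:              # entities within the sentence
            print(ent.text, ent.label_)
            for token in ent:              # tokens within the entity
                print("  ", token.text, [child.text for child in token.children])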

I also think our pipeline stuff is a bit more mature than the one in transformers. The transformers pipeline class is relatively new, so I do think our Language object offers a better developer experience.

I think the new training config and improved train command will also be appealing to people, especially with the projects workflow.

The improved transformers support in v3 is very new, it's only just released in beta form. I do hope people find it useful, but of course no library or solution is ideal for every use-case, so I definitely encourage people to pick the mix of libraries that seems right to them.


Missed this news, thanks! OP, if you wish to use spaCy, try v3.

https://explosion.ai/blog/spacy-v3-nightly


Yannic Kilcher just released a new video[0] on the Performers paper. It will be useful to watch it after going through the above videos on transformers.

[0] https://youtu.be/xJrKIPwVwGM


+1 for Yannic Kilcher videos!


Shameless plug - I’m teaching an intense NLP training course over 4 half-days, which covers transformers and KNN in the latter 2 days. https://opensourceconnections.com/training/natural-language-...


Spacy v3.0 nightly is out, which has integration with transformer models. So if you already have some familiarity with the package it might be worth a look.

It should be very similar to normal spaCy usage; just instead of downloading "en_core_web_sm" etc., it's "en_trf_foo_bar".


If you know TensorFlow, Hugging Face is the best way to get started. It's got easy ways to transfer-learn from the big models (rough sketch below).
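
Something like this is about the minimum to fine-tune one of the pretrained models with Keras (a sketch; the model name, data and hyperparameters are placeholders, and details vary a bit between library versions):

    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = TFAutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    enc = tokenizer(["great movie", "terrible movie"], padding=True, return_tensors="tf")
    labels = tf.constant([1, 0])

    # The model is a regular Keras model, so the usual compile/fit workflow applies.
    model.compile(optimizer=tf.keras.optimizers.Adam(3e-5),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    model.fit(dict(enc), labels, epochs=1)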


Hugging Face is the best, with either TensorFlow or PyTorch.


Surprised no one's mentioned fairseq, which is probably the easiest way to train and use a transformer model. With Hugging Face etc. you still have to write a bit of code for preprocessing input, training scheduling, batch inference and multi-GPU, but fairseq has all that covered with built-in scripts.


If you have some background in neural networks, McCormick's post is a good start.

https://mccormickml.com/2019/11/11/bert-research-ep-1-key-co...


Try the spaCy 3.0 alpha; it integrates the https://github.com/huggingface/transformers library. You should almost always use XLNet-large in order to achieve the best accuracy.



