
OT: I want to get into transformers for NLP, what's the best way?

About me: Mostly done TS for the last few years. Dipped into Python, a bit of pandas, a bit of numpy, a bit of Kaggle over the last 3-4 weeks.

Why I ask: It's so easy to get lost; this field is wide. E.g. I spent days with spaCy, CoreNLP, etc. before I learned that transformer-based stuff exists and outperforms the former.




I've just recently been on a journey to understand transformers in and out; here are the resources I found that managed to drill it into my head:

1. Chapters 7, 9, 10

https://web.stanford.edu/~jurafsky/slp3/

This was really useful for building up to the concept of attention (although the actual attention section is still brief).

2. https://jalammar.github.io/visualizing-neural-machine-transl...

This was great for visualization and for understanding that attention wasn't exclusive to transformers and was actually used with RNNs first.

3. https://jalammar.github.io/illustrated-transformer/

Getting to understand how transformers actually work visually.

4. https://www.youtube.com/watch?v=S27pHKBEp30

This lecture by Leo Dirac was extremely helpful to finish off with, not only because it actually includes some pseudocode, but also because it revisits some key topics and covers why transformers are needed.

One of the big confusion points for me was that the concepts of ATTENTION and SELF-ATTENTION are not the same thing.
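
A toy sketch of the difference, leaving out the learned query/key/value projections and multiple heads that real layers have:

    import numpy as np

    def attention(queries, keys, values):
        # Scaled dot-product attention: compare each query against every key,
        # softmax the scores, and take a weighted sum of the values.
        scores = queries @ keys.T / np.sqrt(keys.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ values

    # ATTENTION in the RNN seq2seq sense: decoder states (queries) attend over
    # encoder states (keys and values) -- two different sequences.
    encoder_states = np.random.randn(10, 64)   # source sentence, 10 tokens
    decoder_states = np.random.randn(7, 64)    # target prefix, 7 tokens
    context = attention(decoder_states, encoder_states, encoder_states)

    # SELF-ATTENTION as used inside a transformer layer: the same sequence
    # supplies queries, keys and values, so every token attends to every token.
    x = np.random.randn(10, 64)
    self_attended = attention(x, x, x)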

Hope this helps.


These look like great resources for transformers, thanks!


Everyone has a different way of approaching this. I like video walk-throughs of papers. If you're the same, you may find Yannic Kilcher's videos [0, 1] helpful for the theory bit.

On the implementation side, the best project out there is [2], with good documentation (minimal example after the links below). Also, spaCy has a transformers package [3] (I personally haven't tried it), so maybe it will be easier for you to jump in if you have prior experience with spaCy.

[0] https://www.youtube.com/watch?v=iDulhoQ2pro

[1] https://www.youtube.com/watch?v=-9evrZnBorM

[2] https://huggingface.co/transformers/

[3] https://explosion.ai/blog/spacy-transformers
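
To give a feel for [2], here's about the smallest thing you can do with its pipeline API (it picks a default pretrained model, which downloads on the first run):

    from transformers import pipeline

    # Downloads and caches a default pretrained model the first time it runs.
    classifier = pipeline("sentiment-analysis")
    print(classifier("Transformers are easier to use than I expected."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]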


Every time I research transformers it seems so hand-wavy. Is there a simple description, maybe a bit of pseudocode?

Or, at the other extreme, they dump me into formula land without explaining what all the letters in the formulas represent.


This is quite a good explanation of transformers that gets shared a lot: http://jalammar.github.io/illustrated-transformer/

And here's a super simple implementation of GPT by Andrej Karpathy: https://github.com/karpathy/minGPT/blob/master/mingpt/model....
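
If it helps, here's roughly what one block of a minGPT-style model boils down to (a simplified sketch; the real thing adds causal masking, dropout and careful initialisation):

    import torch
    import torch.nn as nn

    class TransformerBlock(nn.Module):
        def __init__(self, d_model=64, n_heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ln1 = nn.LayerNorm(d_model)
            self.ln2 = nn.LayerNorm(d_model)
            self.mlp = nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )

        def forward(self, x):
            # self-attention with a residual connection
            h = self.ln1(x)
            a, _ = self.attn(h, h, h)
            x = x + a
            # position-wise feed-forward with a residual connection
            x = x + self.mlp(self.ln2(x))
            return x

    x = torch.randn(1, 10, 64)           # (batch, tokens, features)
    print(TransformerBlock()(x).shape)   # torch.Size([1, 10, 64])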


Transformers are kinda similar to state vectors. They are tracking the current state of the world. The input becomes the output, which is the input to the next iteration. The transformer transforms input into output ad infinitum until a stop token is reached.
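
In pseudocode, that loop is just (model.predict_next is a placeholder here, not a real library call):

    def generate(model, tokens, stop_token_id, max_len=100):
        # Feed the sequence in, get one more token out, append it, repeat.
        while len(tokens) < max_len:
            next_token = model.predict_next(tokens)
            tokens.append(next_token)
            if next_token == stop_token_id:
                break
        return tokens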


Re spacy-transformers: I really wouldn't recommend it. I tried using it but it was a nightmare. It had a dependency on a previous major version of Thinc (spaCy's NN backend), but the documentation for that version had been removed. I wasted a week trying to deal with it until I gave up and went pure PyTorch.

spaCy v3 seems to have integrated the package's functionality, so I'd go for the nightly release instead of this.


Sorry you lost time on this!

We took a long time to get Thinc documented and stable, because there was a long period where I wasn't sure where I wanted the library to go. The deep learning ecosystem in 2018 was pretty hard to predict, and we didn't want to encourage spaCy users to adopt Thinc for their machine learning code if we weren't sure what its status would be. So we actually never really got Thinc v7 stabilised and documented.

This actually became a real issue in the previous version of spacy-transformers. It meant we were pushed into a design for spacy-transformers that really didn't work well. The library wasn't flexible enough, because there was no good way to interact with the transformers at the modelling level.

Pretrained transformers are interesting from an API perspective because you really don't want to put the neural network in a box behind a higher-level API. You can use the intermediate representations in many different ways, so long as you can backprop to them. So you want to expose the neural networking.
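
For instance, with the transformers library you can take the raw hidden states and train whatever head you like on top, with gradients flowing back into the encoder (a rough sketch; the model name and the linear head are just illustrative):

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    encoder = AutoModel.from_pretrained("bert-base-uncased")
    head = torch.nn.Linear(encoder.config.hidden_size, 2)  # your own task head

    inputs = tokenizer("spaCy and transformers", return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state        # (1, tokens, hidden)
    logits = head(hidden[:, 0])                         # use the first token's vector
    loss = torch.nn.functional.cross_entropy(logits, torch.tensor([1]))
    loss.backward()                                     # gradients reach the encoder too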

Thinc v8 was redesigned and finally documented earlier this year: https://thinc.ai . We now have a clear vision for the library: you can write your models in the library of your choice and easily wrap them in Thinc, so spaCy isn't limited to one particular library. For spaCy's own models, we try to implement them in "pure Thinc" rather than a library like PyTorch or Tensorflow, to keep spaCy itself lightweight (and to stop you from having to juggle competing libraries at the same time).

So, it's not quite true that we removed the docs for Thinc v7. We actually didn't have a good solution to do the things you needed to do in the previous spacy-transformers, which prompted a big redesign.


Hey thanks for the super detailed response!

Yeah, I was trying to do something that didn't quite fit with the spacy-transformers API at the time. I did get a bit of a headache trying to use Thinc at the time, which was just when you guys did the redesign I think, so the docs were different from what I was seeing. I might not have searched enough though.

I haven't tried it yet, but it seems that transformers got added to spaCy v3 with first-class support.

I did gain something from rummaging through the spaCy source though! NN layers were composed into module-like pieces, then added to this REGISTRY variable through a decorator. That way some things could be defined at runtime. It was super elegant.

I nicked the concept of that for my data preprocessing pipeline. Saved me a lot of time when trying new things.
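
A stripped-down version of the idea, as I reused it (my own re-implementation for preprocessing, not spaCy's actual code):

    REGISTRY = {}

    def register(name):
        # Decorator that records a function under a name, so pipelines
        # can be assembled from config strings at runtime.
        def decorator(fn):
            REGISTRY[name] = fn
            return fn
        return decorator

    @register("lowercase")
    def lowercase(text):
        return text.lower()

    @register("strip_punct")
    def strip_punct(text):
        return "".join(ch for ch in text if ch.isalnum() or ch.isspace())

    def run_pipeline(text, steps):
        for step in steps:
            text = REGISTRY[step](text)
        return text

    print(run_pipeline("Hello, Transformers!", ["lowercase", "strip_punct"]))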


No worries, and glad it wasn't a total loss! Yeah the registry solution is something we've been very happy with.


What would I miss if I went all transformers without spaCy? I don't get the idea of a wrapper API through spaCy.

I'd like to be as close as possible to the core transformers API without any intermediate layers. Nothing against spaCy, but looking at Hugging Face's side and all the pre-trained models... it feels like nobody talks about/uses spaCy if they're already using transformers.


I think spaCy offers a lot of things to connect the models to the rest of your application.

spaCy's Doc object is pretty helpful for using the outputs, for instance you can iterate over the sentences and then iterate over the entities within each sentence, and look at the tokens within them, or get the dependency children of the words in the entity. The Doc object is backed by Cython data structures, so it's more memory efficient and faster than Python equivalents you'd likely write yourself.
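
Something like this, for example (assumes a pipeline with a parser and NER, e.g. en_core_web_sm, which you'd need to download first):

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Hugging Face is based in New York. spaCy was built by Explosion.")

    for sent in doc.sents:                 # sentences
        for ent in sent.ents:              # entities within the sentence
            print(ent.text, ent.label_)
            for token in ent:              # tokens within the entity
                print("  ", token.text, [child.text for child in token.children])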

I also think our pipeline stuff is a bit more mature than the one in transformers. The transformers pipeline class is relatively new, so I do think our Language object offers a better developer experience.

I think the new training config and improved train command will also be appealing to people, especially with the projects workflow.

The improved transformers support in v3 is very new, it's only just released in beta form. I do hope people find it useful, but of course no library or solution is ideal for every use-case, so I definitely encourage people to pick the mix of libraries that seems right to them.


Missed this news, thanks! OP, if you wish to use spaCy, try v3.

https://explosion.ai/blog/spacy-v3-nightly


Yannic Kilcher just released a new video[0] on the Performers paper. It will be useful to watch it after going through the above videos on transformers.

[0] https://youtu.be/xJrKIPwVwGM


+1 for Yannic Kilcher videos!


Shameless plug - I’m teaching an intense NLP training course over 4 half-days, which covers transformers and KNN in the latter 2 days. https://opensourceconnections.com/training/natural-language-...


Spacy v3.0 nightly is out, which has integration with transformer models. So if you already have some familiarity with the package it might be worth a look.

It should be very similar to normal spaCy usage; just instead of downloading "en_core_web_sm" etc., it's "en_trf_foo_bar".


If you know TensorFlow, Hugging Face is the best way to get started. It's got easy ways to transfer-learn from the big models (rough sketch below).
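
Something like this is about the minimum to fine-tune one of the pretrained models with Keras (a sketch; the model name, data and hyperparameters are placeholders, and details vary a bit between library versions):

    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = TFAutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    enc = tokenizer(["great movie", "terrible movie"], padding=True, return_tensors="tf")
    labels = tf.constant([1, 0])

    # The model is a regular Keras model, so the usual compile/fit workflow applies.
    model.compile(optimizer=tf.keras.optimizers.Adam(3e-5),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    model.fit(dict(enc), labels, epochs=1)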


Hugging Face is the best, with either TensorFlow or PyTorch.


Surprised no one's mentioned fairseq, which is probably the easiest way to train and use a transformer model. With Hugging Face etc. you still have to write a bit of code for preprocessing input, training scheduling, batch inference and multi-GPU, but fairseq has all that covered with built-in scripts.


If you have some background in neural networks, McCormick's post is a good start.

https://mccormickml.com/2019/11/11/bert-research-ep-1-key-co...


Try the spaCy 3.0 alpha; it integrates the https://github.com/huggingface/transformers library. You should almost always use XLNet-large in order to achieve the best accuracy.



