FYI... I work in deep learning, and lucidrains is becoming a legend in my line of work. He seems to be someone who is obsessed with transformers (the deep learning ones, and rightly so, they are amazing). To the author (if you are reading this on HN): I want to thank you for the amazing work you have done!
For the non-DL crowd: Transformers have been a tsunami in deep learning for the past few years. They are topping benchmarks in many subfields. I do research professionally, and this work is amazingly useful for people like me.
Actually... we're not sure! While it's true that Transformers work amazingly well on more domains than they have any right to, it's unclear whether they truly obsolete multi-layer perceptrons/fully-connected networks.
You may have seen the recent splash of MLP-Mixer for image classification using perceptrons (https://arxiv.org/abs/2105.01601#google) but FC-heavy or FC-only nets have popped up here and there, never quite going away, and slowly getting better initializations & residual layers making them ever more trainable. I have a bibliography of some links I've noted over the years about them: https://www.gwern.net/notes/FC
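(For the curious: a Mixer block really is nothing but LayerNorms, transposes, and two ordinary MLPs with residual connections. A rough sketch from memory of the paper - my own simplified paraphrase in PyTorch, not the official code - looks something like this:)

import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    # simplified sketch of one MLP-Mixer block: mix across patches ("tokens"),
    # then across channels, using nothing but plain MLPs + LayerNorm + residuals
    def __init__(self, num_patches, dim, expansion=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, num_patches * expansion),
            nn.GELU(),
            nn.Linear(num_patches * expansion, num_patches),
        )
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, dim * expansion),
            nn.GELU(),
            nn.Linear(dim * expansion, dim),
        )

    def forward(self, x):                      # x: (batch, patches, dim)
        y = self.norm1(x).transpose(1, 2)      # (batch, dim, patches)
        x = x + self.token_mlp(y).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))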
I've wondered if MLPs will be another example of the Bitter Lesson - the additional flexibility and power was their undoing until data/compute caught up and 'grad student descent' figured out how to use them right... If in the next 5 years everyone gets on the "MLP Is All You Need" train, I will be only mildly surprised.
A perceptron is essentially a vanilla single-layer linear-activation neural network, like a lone Lego brick. Neural networks are Lego sets, and Transformers are the 1254-piece Millennium Falcon.
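In code, the lone Lego brick is roughly one line (a hedged sketch; the classic perceptron would also apply a step/threshold function on top of this):

import numpy as np

def perceptron(x, w, b):
    # the single Lego brick: a weighted sum of the inputs plus a bias
    return w @ x + b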
As others have mentioned, anything obscure like this should literally come with a Wikipedia (or other such) link to explain what it is and what it does. This is the primary problem with small project READMEs, imo. They assume you're already familiar with them and know what the hell they are. Like, take Ironhide:
https://github.com/MrMEEE/ironhide
Optimus Support for Linux Through VirtualGL - PPA version also available
That's... great. So it's doing something with GL, and it's running on Linux, but uhhh.
my branch of the original bumblebee project..
What is Optimus? What is Bumblebee? The kicker is that it links to a blog where neither of these terms is ever explained. Maybe it's just there to look impressive on someone's CV? How could I even tell the difference?
Likewise for this project, all you need in the README is one line that's like:
X-Transformers is a re-implementation of Machine Learning Transformers that has been built based on experimental Arxiv papers
It's a one-line fix, but it'll stop people like me from being confused about whether or not you're implementing a new HTTP header.
This. The first line needs to give at least a hint as to which of the hundred-odd different meanings of "transformer" is meant. I had never heard of this use. That one line would have told me that nothing I already knew would be useful for understanding the topic.
Sure, they have some audience in mind, but not necessarily us. There are a lot of documents that are public but are meant for a specialized audience. If they were meant for the general public, they'd be written very differently.
Sharing a link on Hacker News is effectively taking it out of context and sometimes it's up to us to add that context back in. The author doesn't owe it to us.
There are like 6 different definitions of what a "transformer" is, and none of them fall under "machine learning". If you have a README, that implies it's intended for others to read, and so it should be written with that in mind. If I happened to be interested in machine learning, this would be a project I would bookmark for later use -- except I wouldn't know to bookmark it, because they didn't bother to include a total of two words ("machine learning") that give context for how to parse the word "transformer".
The fact that the owner didn't provide that context isn't excused by the fact that they only set out to implement a handful of research papers. It absolutely is the responsibility of the authors of projects, and the authors of README files, to provide context about what something does. The entire point of a README is to clear up ambiguity about what something is. You might as well ask: why have a README at all? People who are using the project already know how to run it!
If you had written this up on the issues for the project you would have helped your cause a bit better.
I personally don't think volunteers owe you anything at all, much less a README; we should be thankful for all they do to help as it is.
But for real, go post an issue! Not trying to play devil's advocate, honestly! Your voice is not heard when you complain anywhere but the repository itself. This is open source software and everyone should have a voice!
Normally people will find a Github repo in a way that gives them more context about what it’s about. For example, someone might link to this repo in an article about machine learning.
There is always context and sometimes you don’t have it. Do you expect everything to be written in English for your benefit?
But sure, it would be nice to clarify things for random passers-by so they know it’s not the droid they’re looking for. It’s just not required to do this in every Github repo.
> Do you expect everything to be written in English for your benefit?
I'm not even sure what you mean to say here. Is asking that the README contain a single two-word reference to the subject matter the same as expecting everything in the world to be written in English?
Quite literally all I've stated is that if you're making something that's clearly intended to be public-facing, it's worth taking the exact ½ of a second needed to let people who can't read your mind understand what context they should read it in.
I’m saying that “public-facing” isn’t a simple binary. The public is diverse. Writing something so everyone can read it is actually pretty hard. Adding two lines so that a random Hacker News reader can understand the README wouldn’t be enough for a nontechnical audience to understand it.
You seem not to understand that people have different comprehension levels, and the one you’re at isn’t necessarily the most important one to the author.
> Adding two lines so that a random Hacker News reader can understand the README wouldn’t be enough for a nontechnical audience to understand it.
I disagree; it would be enough for a non-technical audience, since it would give them enough information to do their own *research* and find out more.
I sympathize with not knowing about Transformers in an ML sense, but there's plenty of context in the readme. Especially considering the direct links to relevant papers.
Some of lucidrains' other projects include the line "This is an implementation of thing with link", which I feel you'd prefer.
Like others, I too encourage you to open an issue to do the same here instead of leaving it till "after the fold".
> This is BS. How many obscure security, devops, etc tools get posted here without so much complaint?
From me? Or are you lumping me in with all the "nay-sayers", despite the fact that I'm trying to offer criticism of a problem that's pretty common in computer science and outside it?
> X-Transformers is a re-implementation of Machine Learning Transformers that has been built based on experimental Arxiv papers.
Make a PR with this in the README.md. You still should - in the worst case it's rejected, but it starts a dialogue with the maintainer that can be referenced later, unlike this comment, which will likely be buried and tough for others to find in the future.
lucidrains and Ice Cream are my reference points when it comes to research, knowledge, and productivity.
Phil was always available to guide me and hear me out. One time I told him about some under-the-radar research published in another language, and he was kind enough to check whether it had any merit.
As for X-Transformers, it is a great piece of engineering that implements almost all of the proposed improvements to transformers. But in my experience, and according to Phil himself, only the feedforward GLU and RoPE (Rotary Positional Embeddings) really work (or, to be fair, are the ones that show improvements in more general use cases).
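For anyone wondering what the feedforward GLU looks like, here is a rough sketch of the GEGLU flavor in PyTorch (my own paraphrase with made-up names, not the repo's actual code): the hidden projection is split in two, and one half gates the other.

import torch.nn as nn
import torch.nn.functional as F

class GLUFeedForward(nn.Module):
    # sketch of a GLU-style feedforward block (GEGLU variant)
    def __init__(self, dim, mult=4):
        super().__init__()
        self.proj_in = nn.Linear(dim, dim * mult * 2)
        self.proj_out = nn.Linear(dim * mult, dim)

    def forward(self, x):
        a, b = self.proj_in(x).chunk(2, dim=-1)   # split the hidden projection
        return self.proj_out(a * F.gelu(b))       # one half gates the other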
Same. Can you imagine some kind of epic, liberated Transformers trademark tech which makes use of a GitHub account, and is called "X-Transformers"? Well I just did for a few seconds... Actually maybe I'm not done yet
Transformers suffer from a quadratic bottleneck when calculating attention. Much work has been done investigating where memory can be saved by being more selective about which attention values to calculate. This repo implements transformers with many of these noted improvements.
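To make the quadratic part concrete, here is a minimal sketch of plain scaled dot-product attention (just the textbook version, not this repo's code): for a sequence of length n, the score matrix is n × n, which is where the memory blows up.

import torch

def naive_attention(q, k, v):
    # q, k, v: (batch, seq_len, dim); scores is (batch, seq_len, seq_len),
    # so memory and compute grow quadratically with sequence length
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v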
import numpy as np

def f(x, W, b, g=np.tanh):
    # each "layer" is a matrix multiplication, a vector addition, and a non-linearity g
    for _ in range(3):
        x = g(W @ x + b)
    return x
Essentially it is a matrix multiplication, a vector addition, and a non-linearity.
Transformers are a modification of that architecture - using different multiplications, additions, and non-linearities. Both are general in the sense that, if you have enough of them, they can approximate any function. The ones used in transformers empirically do well on a lot of machine learning problems, particularly where the data has a sequential nature.
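As a concrete (hedged) illustration of that modification - made-up names, not this repo's implementation - a minimal transformer block interleaves self-attention with the same kind of MLP, plus normalization and residual connections:

import torch.nn as nn

class TinyTransformerBlock(nn.Module):
    # rough sketch: same multiply/add/non-linearity ingredients, but the
    # attention weights are computed from the data rather than being fixed
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, dim),
        )

    def forward(self, x):                      # x: (batch, seq_len, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))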