FYI... I work in deep learning, and lucidrains is becoming a legend in my line of work. He seems to be someone who is obsessed with transformers (the deep learning ones, and rightly so, they are amazing). To the author (if you are reading this on HN): I want to thank you for the amazing work you have done!
For the non-DL crowd: Transformers have been a tsunami in deep learning for the past few years. They are topping benchmarks in many subfields. I do research professionally, and this work is amazingly useful for people like me.
Actually... we're not sure! While it's true that Transformers work amazingly well on more domains than they have any right to, it's unclear whether they truly obsolete multi-layer perceptrons/fully-connected networks.
You may have seen the recent splash of MLP-Mixer for image classification using perceptrons (https://arxiv.org/abs/2105.01601#google) but FC-heavy or FC-only nets have popped up here and there, never quite going away, and slowly getting better initializations & residual layers making them ever more trainable. I have a bibliography of some links I've noted over the years about them: https://www.gwern.net/notes/FC
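(For the curious: a Mixer block really is nothing but LayerNorms, transposes, and two ordinary MLPs with residual connections. A rough sketch from memory of the paper - my own simplified paraphrase in PyTorch, not the official code - looks something like this:)

import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    # simplified sketch of one MLP-Mixer block: mix across patches ("tokens"),
    # then across channels, using nothing but plain MLPs + LayerNorm + residuals
    def __init__(self, num_patches, dim, expansion=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, num_patches * expansion),
            nn.GELU(),
            nn.Linear(num_patches * expansion, num_patches),
        )
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, dim * expansion),
            nn.GELU(),
            nn.Linear(dim * expansion, dim),
        )

    def forward(self, x):                      # x: (batch, patches, dim)
        y = self.norm1(x).transpose(1, 2)      # (batch, dim, patches)
        x = x + self.token_mlp(y).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))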
I've wondered if MLPs will be another example of the Bitter Lesson - the additional flexibility and power was their undoing until data/compute caught up and 'grad student descent' figured out how to use them right... If in the next 5 years everyone gets on the "MLP Is All You Need" train, I will be only mildly surprised.
A perceptron is essentially a vanilla single-layer linear-activation neural network, like a lone Lego brick. Neural networks are Lego sets, and Transformers are the 1254-piece Millennium Falcon.
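In code, the lone Lego brick is roughly one line (a hedged sketch; the classic perceptron would also apply a step/threshold function on top of this):

import numpy as np

def perceptron(x, w, b):
    # the single Lego brick: a weighted sum of the inputs plus a bias
    return w @ x + b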
As others have mentioned, anything obscure like this should literally come with a Wikipedia (or other such) link to explain what it is and what it does. This is the primary problem with small project READMEs, imo. They assume you're already familiar with them and know what the hell they are. Like, take Ironhide:
https://github.com/MrMEEE/ironhide
Optimus Support for Linux Through VirtualGL - PPA version also available
That's... great. So it's doing something with GL, and it's running on Linux, but uhhh.
my branch of the original bumblebee project..
What is Optimus? What is Bumblebee? The kicker is that it links to a blog where neither of these terms is ever explained. Maybe it's just there to look impressive on someone's CV? How could I even tell the difference?
Likewise for this project, all you need in the README is one line that's like:
X-Transformers is a re-implementation of Machine Learning Transformers that has been built based on experimental Arxiv papers
It's a one-line fix, but it'll stop people like me from being confused about whether or not you're implementing a new HTTP header.
This. The first line needs to give at least a hint as to which of the hundred-odd different meanings of "transformer" is meant. I had never heard of this use. That one line would have told me that nothing I already knew would be useful for understanding the topic.
Sure, they have some audience in mind, but not necessarily us. There are a lot of documents that are public but are meant for a specialized audience. If they were meant for the general public, they'd be written very differently.
Sharing a link on Hacker News is effectively taking it out of context and sometimes it's up to us to add that context back in. The author doesn't owe it to us.
There are like 6 different definitions of what a "transformer" is, and none of them fall under "machine learning". If you have a README, that implies it's intended for others to read, and so it should be written with that in mind. If I happened to be interested in machine learning, this would be a project I would bookmark for later use -- except I wouldn't know to bookmark it, because they didn't bother to include a total of two words ("machine learning") that give context for how to parse the word "transformer".
The fact that the owner didn't provide that context isn't excused by the fact that they only set out to implement a handful of research papers. It absolutely is the responsibility of the authors of projects, and the authors of README files, to provide context about what something does. The entire point of a README is to clear up ambiguity about what something is. You might as well ask: why have a README at all? People who are using the project already know how to run it!
If you had written this up on the issues for the project you would have helped your cause a bit better.
I personally don't think volunteers owe you anything at all, much less a README; we should be thankful for all they do to help as it is.
But for real, go post an issue! Not trying to play devil's advocate, honestly! Your voice is not heard when you complain anywhere but the repository itself. This is open source software and everyone should have a voice!
Normally people will find a Github repo in a way that gives them more context about what it’s about. For example, someone might link to this repo in an article about machine learning.
There is always context and sometimes you don’t have it. Do you expect everything to be written in English for your benefit?
But sure, it would be nice to clarify things for random passers-by so they know it’s not the droid they’re looking for. It’s just not required to do this in every Github repo.
> Do you expect everything to be written in English for your benefit?
I'm not even sure what you mean to say here. Is asking that the README contain a single two-word reference to the subject matter the same as expecting everything in the world to be written in English?
Quite literally all I've stated is that if you're making something that's clearly intended to be public-facing, it's worth taking the exact ½ of a second needed to let people who can't read your mind understand what context they should read it in.
I’m saying that “public-facing” isn’t a simple binary. The public is diverse. Writing something so everyone can read it is actually pretty hard. Adding two lines so that a random Hacker News reader can understand the README wouldn’t be enough for a nontechnical audience to understand it.
You seem not to understand that people have different comprehension levels, and the one you’re at isn’t necessarily the most important one to the author.
> Adding two lines so that a random Hacker News reader can understand the README wouldn’t be enough for a nontechnical audience to understand it.
I disagree; it would be enough for a non-technical audience, since it would give them enough information to do their own *research* and find out more.
I sympathize with not knowing about Transformers in an ML sense, but there's plenty of context in the readme. Especially considering the direct links to relevant papers.
Some of lucidrains' other projects include the line "This is an implementation of thing with link", which I feel you'd prefer.
Like others, I too encourage you to open an issue to do the same here instead of leaving it till "after the fold".
> This is BS. How many obscure security, devops, etc tools get posted here without so much complaint?
From me? Or are you lumping me in with all the "nay-sayers", despite the fact that I'm trying to offer criticism of a problem that's pretty common in computer science and outside it?
> X-Transformers is a re-implementation of Machine Learning Transformers that has been built based on experimental Arxiv papers.
Make a PR with this in the README.md. You still should - in the worst case it's rejected, but it starts a dialogue with the maintainer that can be referenced later, unlike this comment, which will likely be buried and tough for others to find in the future.
lucidrains and Ice Cream are my reference points when it comes to research, knowledge, and productivity.
Phil was always available to guide me and hear me out. One time I told him about some under-the-radar research published in another language, and he was kind enough to check whether it had any merit.
As for X-Transformers, it is a great piece of engineering that implements almost all of the proposed improvements to transformers. But in my experience, and according to Phil himself, only the feedforward GLU and RoPE (Rotary Positional Embeddings) really work (or, to be fair, are the ones that show improvements in more general use cases).
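For anyone wondering what the feedforward GLU looks like, here is a rough sketch of the GEGLU flavor in PyTorch (my own paraphrase with made-up names, not the repo's actual code): the hidden projection is split in two, and one half gates the other.

import torch.nn as nn
import torch.nn.functional as F

class GLUFeedForward(nn.Module):
    # sketch of a GLU-style feedforward block (GEGLU variant)
    def __init__(self, dim, mult=4):
        super().__init__()
        self.proj_in = nn.Linear(dim, dim * mult * 2)
        self.proj_out = nn.Linear(dim * mult, dim)

    def forward(self, x):
        a, b = self.proj_in(x).chunk(2, dim=-1)   # split the hidden projection
        return self.proj_out(a * F.gelu(b))       # one half gates the other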
Same. Can you imagine some kind of epic, liberated Transformers trademark tech which makes use of a GitHub account, and is called "X-Transformers"? Well I just did for a few seconds... Actually maybe I'm not done yet
Transformers suffer from a quadratic bottleneck when calculating attention. Much work has been done investigating where memory can be saved by being more selective about which attention values to calculate. This repo implements transformers with many of these noted improvements.
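To make the quadratic part concrete, here is a minimal sketch of plain scaled dot-product attention (just the textbook version, not this repo's code): for a sequence of length n, the score matrix is n × n, which is where the memory blows up.

import torch

def naive_attention(q, k, v):
    # q, k, v: (batch, seq_len, dim); scores is (batch, seq_len, seq_len),
    # so memory and compute grow quadratically with sequence length
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v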
import numpy as np

def f(x, W, b, g=np.tanh):
    # each "layer" is a matrix multiplication, a vector addition, and a non-linearity g
    for _ in range(3):
        x = g(W @ x + b)
    return x
Essentially it is a matrix multiplication, a vector addition, and a non-linearity.
Transformers are a modification of that architecture - using different multiplications, additions, and non-linearities. Both are general in the sense that, if you have enough of them, they can approximate any function. The ones used in transformers empirically do well on a lot of machine learning problems, particularly where the data has a sequential nature.
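As a concrete (hedged) illustration of that modification - made-up names, not this repo's implementation - a minimal transformer block interleaves self-attention with the same kind of MLP, plus normalization and residual connections:

import torch.nn as nn

class TinyTransformerBlock(nn.Module):
    # rough sketch: same multiply/add/non-linearity ingredients, but the
    # attention weights are computed from the data rather than being fixed
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, dim),
        )

    def forward(self, x):                      # x: (batch, seq_len, dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))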