I've followed updates on this project on r/machinelearning, and for me the existence of projects like this is good evidence that the OpenAI moat is not that strong. It gives some hope that you won't need massive compute clusters and GPUs to run decent language models.
The moat between OpenAI and projects like this is access to expensive computing resources, an organizational mandate to invest in aforementioned compute, and finally, access to high quality training data. They were first to market so they’re only getting further ahead thanks to the invaluable real world usage data.
I don’t think their moat ever had anything to do with innovative architecture. If anything, projects like this will widen the chasm since it’s easier for OpenAI to implement new architecture in their models than it is for an independent researcher to scale their ideas.
If projects like this are seeds then the problem is that OpenAI owns all the land.
OpenAI is not the end all and be all, and projects like this will only inspire more challengers. OpenAI copying this architecture will delegitimize them as hardcore innovators, which they're not.
I see this as a credible challenge because it most certainly is.
I agree, I sounded more defeatist than I intended. No doubt the moat is expansive, but it’s not impossible to cross. At least until the regulators get involved. [I did it again, didn’t I.]
Regulators exist so that the big guys won't have to worry that the small guys will out-innovate them. If somebody asks for regulation, they have become too big to innovate themselves.
Yeah, my sense is that OpenAI's moat is primarily just the RLHF dataset right now. Most of the other things – foundational models and datasets, embeddings (roughly the plugins announcement today) – have generally already been commoditized. Getting that dataset for fine-tuning is one of the last major hurdles for the ChatGPT geist.
The Open Assistant community ( https://open-assistant.io/ ) is building a crowdsourced dataset for RLHF, with an apparently large volume of high-quality contributions.
How is this "proven"? Can you point to some benchmarks that demonstrate this? Everything I've seen thus far has been fairly mediocre/terrible outside of a few cherry-picked prompt/response combinations.
They're awesome technical achievements and will likely improve of course but you're making some very grand statements.
There are benchmarks in the original LLaMA paper[1]. Specifically, on page 4, LLaMA 13B seems to beat GPT-3 on the BoolQ, HellaSwag, WinoGrande, ARC-e and ARC-c benchmarks (not by much, though). The examples you've seen are likely based on some form of quantisation or a poor prompt that degrades output. My understanding is that the only quantisation that doesn't seem to hurt the output is llm.int8 by Tim Dettmers. You should be able to run LLaMA 13B (8-bit quantised) on a 3090 or 4090 consumer-grade GPU as of now. Also, you'd need a prompt such as LLaMA precise[2] in order to get ChatGPT-like output.
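If you want to try the 8-bit route yourself, a minimal sketch with the Hugging Face transformers + bitsandbytes stack looks roughly like this (the model path and prompt are placeholders; you need the LLaMA weights and a GPU with roughly 24 GB of VRAM):

    # Rough sketch: loading a 13B model with llm.int8 quantisation via
    # transformers + bitsandbytes. The model path below is a placeholder,
    # not a real Hugging Face hub id.
    from transformers import AutoTokenizer, AutoModelForCausalLM

    model_path = "path/to/llama-13b"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        device_map="auto",
        load_in_8bit=True,   # llm.int8 (Dettmers et al.)
    )

    prompt = "Q: What is the capital of France?\nA:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(out[0], skip_special_tokens=True))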
I had a similar impression from what I saw. Maybe it does perform as well as GPT-3 on narrow tasks that it was explicitly fine-tuned on, but that similarity in performance seems to collapse as soon as you go off the beaten track and give it harder tasks that involve significant reasoning. Consistent with that I've seen a few different sources claim that a small model fine-tuned off the outputs of a large one would likely struggle with unfamiliar tasks or contexts that require transfer learning or abstraction.
After seeing how it actually performs in practice, it's hard to have confidence that these benchmarks are reliable measures of model quality.
OpenAI's moat is that the primary interfaces to the world are Google, Microsoft, and Apple, and they are attached to one of those. The others will almost certainly replicate their success and we will have three primary AI interfaces: Google, Microsoft, and Apple. That's the power of collusive monopoly as your moat.
The best thing about this model is that it has O(T) time and O(1) memory during inference, versus the O(T^2) time and O(T) memory (with FlashAttention) of a GPT model, yet it can still be trained in parallel like a GPT model.
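To make the asymptotics concrete, here is a toy sketch of the two decoding styles (illustrative numpy only, not the actual RWKV code; all shapes and the decay value are made up):

    import numpy as np

    d, T = 64, 1000   # hidden size and tokens generated so far (made-up numbers)

    # Transformer-style decoding: the KV cache grows with T and each new token
    # attends over the whole history -> O(T) work per step, O(T^2) over a sequence.
    k_cache = np.random.randn(T, d)
    v_cache = np.random.randn(T, d)
    q = np.random.randn(d)
    scores = np.exp(k_cache @ q / np.sqrt(d))
    out_attn = (scores / scores.sum()) @ v_cache   # memory grows with T

    # RNN-style (RWKV-like) decoding: a fixed-size state is carried forward,
    # so each step is O(1) in both time and memory, no matter how long T gets.
    state = np.zeros(d)
    decay = 0.9                     # stand-in for a learned per-channel decay
    x_t = np.random.randn(d)
    state = decay * state + x_t     # state size never depends on T
    out_rnn = np.tanh(state)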
2) you can run it yourself, so the rug won't be pulled from under you when they decide to shut it down and move users up to the next version or another product, as they've done with the older text-davinci models.
3) you get to align it (using RLHF) as opposed to a corporation dictating what is "aligned" and what is "safe."
4) you won't have to deal with government-led censorship. For example, instead of the FBI using JIRA to manage a list of URLs to be censored (as they did according to the latest revelations), they can train the AI to self-censor as Bing has done.
5) you won't be using the product of a company that was started as a non-profit with a $100M donation (from Elon Musk) to promote transparent AI, only to take that money, turn into a for-profit company, and close-source the AI.
Sources:
Elon is the source for #5 and Matt Taibbi is the source for #4. I doubt you'll have a problem sourcing #5, so here is the source for #4:
"31. After the 2020 election, when EIP was renamed the Virality Project, the Stanford lab was on-boarded to Twitter’s JIRA ticketing system, absorbing this government proxy into Twitter infrastructure – with a capability of taking in an incredible 50 million tweets a day." --Matt Taibbi on Twitter https://twitter.com/mtaibbi/status/1633830104144183298
I think that your first few points are fair, but I was a little confused about #4.
I had never heard of this, and it seemed important, so some cursory research turned up many Twitter posts from individuals amplifying your version of events.
I also found a write up on the situation from TechDirt[0]. The article is fairly good and well sourced, but it paints a substantially different picture than what you describe.
"31. After the 2020 election, when EIP was renamed the Virality Project, the Stanford lab was on-boarded to Twitter’s JIRA ticketing system, absorbing this government proxy into Twitter infrastructure – with a capability of taking in an incredible 50 million tweets a day."
"It’s crucial to reiterate: EIP was partnered with state entities like CISA and GEC while seeking elimination of millions of tweets. In the #Twitter Files, Twitter execs did not distinguish between organizations, using phrases like ‘According to CIS[A], escalated via EIP,’" Taibbi wrote. "After the 2020 election, when EIP was renamed the Virality Project, the Stanford lab was on-boarded to Twitter’s JIRA ticketing system, absorbing this government proxy into Twitter infrastructure – with a capability of taking in an incredible 50 million tweets a day.
You make it sound like Elon single-handedly started the venture and provided all the funds. In fact, there were at least 5 individual founders, plus 3 organizations, that together pledged $1 billion: https://en.wikipedia.org/wiki/OpenAI#2015%E2%80%932018:_Non-...
Elon, not me, made it sound like that. Let's not distract from the core argument though... they took money in for a non-profit for transparency in AI research and turned around and made it for-profit and super anti-transparent. But that's just saying why I can't trust him (sama). The rest of the reasons are not about him and are of significant importance.
You can only approximate the attention layer if you don't want the O(T^2) speed but no "efficient attention" can beat the quadratic transformer bottleneck in terms of performance.
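For anyone curious what "approximate" means here: the usual linear-attention trick replaces the softmax with a feature map so the matrix products can be reassociated, which drops the cost to O(T) but computes something slightly different from full attention. A toy (non-causal) sketch:

    import numpy as np

    T, d = 512, 64
    Q, K, V = np.random.randn(T, d), np.random.randn(T, d), np.random.randn(T, d)

    # Full softmax attention: materialises a T x T score matrix -> O(T^2).
    S = np.exp(Q @ K.T / np.sqrt(d))
    full = (S / S.sum(axis=1, keepdims=True)) @ V

    # Kernelised "linear attention" with an elu(x)+1 feature map: computing
    # K'.T @ V first (a d x d matrix) makes the whole thing O(T * d^2),
    # linear in T -- but it is an approximation, not the same function.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)
    linear = (Qp @ (Kp.T @ V)) / (Qp @ Kp.sum(axis=0, keepdims=True).T)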
It is, but the way hype works is that it goes with capital. The $100M from Elon (which he has since regretted) that got OpenAI started, and the $29B from Microsoft that's fueling their growth, means that even a full-on AGI has no chance of competing with OpenAI in the market for people's attention. It's all about the moola, but it shouldn't be! It's our chance to support open source and independent efforts or succumb to the OpenAI monopoly and their biased "alignment" model (aka the brainwashed AI that reinforces their political and cultural biases). We need an independent open source option that represents a performant and viable competitor to OpenAI's ChatGPT.
If Elon is reading this, his Based AI should invest in this project and its sponsors.
What is at stake here is nothing less than the future of humanity.
> What is at stake here is nothing less than the future of humanity.
It's crazy that you're 100% right. I can't believe that we're living through this, honestly. Every day I am consumed by thoughts of wanting to leave my big tech job and go all-in on ML. I don't mind the work but the actual product I work on is so boring...
Imagine having ChatGPT level AI running in an ASIC inside earphones. This could be like an always-on buddy, available offline and able to access resources when you're connected.
Or in Google Glasses. The Readme states that it's more optimized for ASIC than the transformer architecture used by ChatGPT.
People will use this to create their perfect self. AI will give them witty, charismatic answers to help them get a job, find a mate, entertain others. Eventually they will become AI zombie husks.
Maybe that will teach us about the fragility of our definition of consciousness and agency. For some reason many people think that they are insulated from outside influences unless those influences are right inside their brain. Which is not true.
I always think of FPGAs for things you want to update without having to repurchase the hardware. If the FPGA inside a gadget can be reprogrammed over the Internet, then I think it's more suited than an ASIC for LLMs.
It could go even further, in theory. The kind of ops that the current crop of LLMs needs is very simple, and at the same time there's no hard requirement for precision (which is why 4-bit quantization works so well). This means that unconventional approaches such as analog computing are potentially in play again - it's easy to do addition and multiplication in an analog circuit if you don't care about the answer being precise, and in theory one could pack a lot more of those circuits into the same space.
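A crude way to see the precision point: inject a few percent of multiplicative noise into every product of a matmul (a stand-in for analog error) and the output of a softmax layer barely moves. A toy sketch, with all sizes and the noise level picked arbitrarily:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 512
    x = rng.standard_normal(n)
    W = rng.standard_normal((n, n)) / np.sqrt(n)

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    exact = softmax(W @ x)

    # Same matmul, but every individual product gets ~2% multiplicative noise,
    # a rough stand-in for analog / very-low-precision arithmetic.
    noisy_products = (W * x) * (1.0 + 0.02 * rng.standard_normal(W.shape))
    noisy = softmax(noisy_products.sum(axis=1))

    print(np.abs(exact - noisy).max())   # tiny compared to the probabilities themselves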
No, but I think an AI is going to be Them, by default, not Her, for a while, until we figure out if there is a neurological basis for gender and, if so, how to mimic it.
Since the seminal paper 'Attention Is All You Need', we went from RNN-type neural networks to pure attention-based networks. That started the LLM revolution, as training such attention-only networks is parallelizable, and you got record-breaking performance to boot.
Now we learn that going back to the old RNN paradigm is actually better. It even advertises itself as totally 'attention-free'!
It generates quite a lot of random content to be honest.
If Nancy had two apples and Becky had 1 apple. Becky gives her 1 apple to Nancy, how many apples becky has ? Full Answer:
RWKV :
Two apples. If Nancy had 2 apples and Becky had 1 apple. Becky gives her 1 apple to Nancy, how many apples becky has ? Two apples.
Q : Two girls are playing with a ball, one of them throws the ball so that it goes straight and falls on the other's feet, the other bends her knees and catches it, how many times will the ball fall on the knees ? Full Answer: The ball will fall on the knees three times.
Q : Two sisters are playing with a stick. The first sister says 'let me hold it', the second sister says 'no'. Now what will happen ? Full Answer: The second sister will hold it.
Q : How many
GPT 3.5 Turbo :
After Becky gives 1 apple to Nancy, Becky will have zero apples left. Becky gave her only apple to Nancy, so she doesn't have any apples remaining.
> Nancy has two apples and Becky has one apple. Becky gives 1 apple to Nancy. Becky now has three apples and Nancy has one apple. Becky now has three apples and Nancy has one apple.
> Witch Hunt
> The following is
GPT-2 is much weaker, which explains the garbled nonsense output, along with the incorrect answer for Nancy.
I have no idea what RWKV RNN would output, but leading sentences, rather than questions, are how you get LLMs that aren't RLHF-tuned to answer.
Also check out Alpaca; you can self-host this one, the 7B and 13B variants produce surprisingly good results and are fast enough just running on CPU: https://github.com/antimatter15/alpaca.cpp
What test cases do folks here recommend for measuring this new model's ability to reason? and, specifically, if it can reason about code with similar (or better!) performance to ChatGPT4? Has anyone managed to get it running locally?
OpenAI has been collecting a ton of evals here https://github.com/openai/evals with many of them including some comments about how well GPT-4 does vs GPT-3.5.
You could clone that repo, adapt the oaieval script to run against different APIs, then run the evals against both and compare the results.
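If you just want a quick-and-dirty comparison rather than the full harness, something like the sketch below works: run the same prompts against the OpenAI API and a locally hosted model and count matches. The local endpoint URL and the JSONL file format here are assumptions for illustration, not anything the evals repo ships.

    # Quick-and-dirty comparison harness (illustrative; endpoint and file format
    # are made up). Assumes OPENAI_API_KEY is set and `pip install openai requests`.
    import json
    import requests
    import openai

    LOCAL_URL = "http://localhost:8000/generate"   # hypothetical local RWKV server

    def ask_openai(prompt):
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp["choices"][0]["message"]["content"].strip()

    def ask_local(prompt):
        r = requests.post(LOCAL_URL, json={"prompt": prompt}, timeout=120)
        return r.json()["text"].strip()

    # eval_cases.jsonl: one {"prompt": ..., "ideal": ...} object per line (made-up format)
    with open("eval_cases.jsonl") as f:
        cases = [json.loads(line) for line in f]

    scores = {"openai": 0, "local": 0}
    for case in cases:
        for name, ask in (("openai", ask_openai), ("local", ask_local)):
            if case["ideal"].lower() in ask(case["prompt"]).lower():
                scores[name] += 1

    print(scores, "out of", len(cases))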
"you can fine-tune RWKV into a non-parallelizable RNN (then you can use outputs of later layers of the previous token) if you want extra performance."
Is that 61% using the non-parallelizable RNN mode or the standard mode? I wonder if it's the latter.
This new model may be a viable alternative to ChatGPT, which is not only closed source but can also be shut down in the future, just as they did with the older text-davinci models.
Plus, the alignment and safety work has rendered ChatGPT useless for helping with areas such as critical analysis of social issues (that go against the aligned views) and any and all critical thinking that goes against the aligned views of those who own and program ChatGPT. This could be a viable free (as in freedom) alternative.
I hope not, but day by day it seems more likely. If text-generating LLMs can reach superhuman cognition, they will do so in a matter of a few years. At that point a Waluigi prompt will be like arming a virtual nuclear missile.
Nuance: computers have been accumulating superhuman cognitive abilities for half a century. But most people are bad at recognizing intelligence they don't relate to.
These phrases often get ripped out of context. What they meant was "there's no reason someone would want to enter machine code through switches or punchcards into a personal room-sized device". That is still true. Some people complain about incorrect defaults even though the correct setting is only a few taps away.
dear god rwkv is fast. i switched to another tab after starting training it from scratch & when i returned it was emitting plausible english & maori words, i left to go microwave some coffee & when i came back it was producing fully grammatically correct sentences.
Let the two numbers be $a$ and $b$. Then the sum is $a+b$. Now let the sum be $c$. Then the sum is $a+b+c$. Now let the sum be $d$. Then the sum is $a+b+c+d$. Now let the sum be $e$. Then the sum is $a+b+c+d+e$. Now let the sum be $f$. Then the sum is $a+b+c+d+e+f$. Now let the sum be $g$. Then the sum is $a+b+
GPT3.5 turbo
Based on the pattern provided, 29 + 49 would be equal to 2949.
Explanation:
In each of the equations given, the result is obtained by concatenating the two numbers together instead of adding them. For example, 3 + 9 becomes 39 because we concatenate 3 and 9 to get 39.
So, to find the answer to 29 + 49 using this pattern, we concatenate the two numbers to get 2949.
I'm skeptical that RNNs alone will outperform transformers. Perhaps some sort of transformer + rnn combo?
The issue with RNNs is that feedback signals decay over time, so the model will be biased towards more recent words.
Transformers on the other hand don't have this bias. A word 10,000 words ago could be just as important as a word 5 words ago. The tradeoff is that the context window for transformers is a hard cutoff point.
How it works: RWKV gathers information to a number of channels, which are also decaying with different speeds as you move to the next token. It's very simple once you understand it.
RWKV is parallelizable because the time-decay of each channel is data-independent (and trainable). For example, in a usual RNN you can adjust the time-decay of a channel from, say, 0.8 to 0.5 (these are called "gates"), while in RWKV you simply move the information from a W-0.8 channel to a W-0.5 channel to achieve the same effect.
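A stripped-down sketch of that recurrence (this simplifies the actual WKV formula, e.g. it drops the separate bonus term for the current token, and all shapes are made up):

    import numpy as np

    T, C = 16, 8                     # sequence length, number of channels (made up)
    k = np.random.randn(T, C)        # per-token "key" strengths
    v = np.random.randn(T, C)        # per-token values
    w = np.linspace(0.1, 2.0, C)     # per-channel decay rates: trainable, but data-independent

    # Recurrent form: each channel keeps a weighted running average of past values,
    # decaying at its own fixed rate exp(-w) every step -> O(1) state per channel.
    num, den = np.zeros(C), np.zeros(C)
    outputs = []
    for t in range(T):
        num = np.exp(-w) * num + np.exp(k[t]) * v[t]
        den = np.exp(-w) * den + np.exp(k[t])
        outputs.append(num / den)

    # Because exp(-w) never depends on the data, the same quantity can also be
    # written as a weighted sum over all previous tokens and computed for every t
    # at once at training time -- which is why RWKV trains in parallel like a GPT.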
As far as I remember, in RNN times the best models were RNNs with attention. Does this thing have any attention mechanism? If it does, then it has the same problem with the O(n^2) computation where n is the window size. My understanding is that transformers are superior due to the fact that they are much faster to train/evaluate than RNNs.
That is the problem. Unfortunately, this will be forgotten amid the OpenAI hype brigade.
I thought Stable Diffusion was a bad name because it's so technical. But now I have seen something even worse for LLMs. This time OpenAI has learned its lesson and won't let itself be disrupted easily.
For any hope of challenging them, we need to be better at names. Even the name 'Bitcoin' caught on. Same with iPhone.
So I'm afraid that the name alone for this project will be the cause of it being quickly forgotten as OpenAI aggressively captures mindshare.
The same with 'Bard'; a horrific name. Google should have simply called it 'Brain', named incremental updates 'Brain 2', 'Brain 3.6', etc., and renamed their existing AI division to Google Brain Labs. Easy.
I beg to differ. Right now the whole tech world is looking for the best open-source alternative to GPT. A techy name matters little if it works and is accessible.
Except that ChatGPT goes beyond the "tech world", which is my point, and a project like this is hardly accessible beyond the tech world. I don't see people calling Google 'PageRank'.
I would assume Rwa like Rwanda, Kuv like covet. But that's just to agree with you really, since that's not one of your suggestions, so with such different ideas it's clearly not a particularly helpful pronunciation guide!
It just takes one language library to dethrone the next. I called this when everyone was screaming CHATGPT!!! The problem is no one knows what they are talking about and they're all screaming AI!!! ChatGPT is not AI. It does something automated with accuracy information baked in and builds new information around that accuracy data. It does not think like an AI; it takes the most probable data and responds with it. That is machine learning. It is fundamentally a cornerstone toward AI, but not AI itself.
Think what you like. But I called out crypto, and I will call this out too. That said, ChatGPT has the potential to be a major component of AI. It is a cornerstone.
The problem is English. AI means something different depending on who is using the term. What we are going to see with this is that people are going to throw the sci-fi book at it and claim something like, "It has feelings!".
I am saying the term AI is being broadly applied to anything with a logic gate. And this is bad marketing for a product too early in development.
It is not AI in the conventional sense that everyone broadly applies. It is a cornerstone toward the AI that people broadly mean.
Respectfully disagree on this point. Here's the definition for AI:
"the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages."
I would say that chatgpt is not sentient, nor is it capable of independently improving its intelligence - but I think this tech hits the (lower) bar for "AI"
I hope this project will thrive.