
Promote and proliferate local LLMs.

If you use GPT, you're giving OpenAI money to lobby the government so they'll have no competitors, ultimately screwing yourself, your wallet, and the rest of us too.

OpenAI has no moat, unless you give them money to write legislation.

I can currently run some scary smart and fast LLMs on a 5 year old laptop with no GPU. The future is, at least, interesting.




Can you elaborate on scary smart and fast?

It's been a month or two since I've tried but the results were depressingly slow and useless for more or less every task I tried.

Every time a model is claimed to be "90% of GPT-3" I get excited and every time it's very disappointing.

(On that note, after using GPT-4, GPT-3 now seems disappointing almost every time I interact with it.)


Different quantizations can give you a big speedup if you've had "depressingly slow" issues. Even the slowest ones (that fit in RAM) will run at basically interactive speed, not instant, but also not "email speed". I have a laptop with a 2018 CPU and I'm working with them just fine.

Text generation style instead of chat style is another avenue that makes the feedback time not so annoying for a developer.

At 100ms/token, it's faster than most people type, I think. That's what you might get on an old laptop with a 7B model.

There's a useful leaderboard here to help you pick a model: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...

It really depends on your task, lots and lots of natural language type tasks give great results, the models seem to have extensive knowledge of many fields. So for some kinds of Q&A bot (technical or not), for copy blurbs, for fiction, game NPCs, etc, the models (especially 13B and up) can be breathtaking, even moreso considering they run on bottom-dollar consumer hardware (I paid $250 for the laptop I'm developing on).

There are of course some things that neither the local LLMs nor GPT4 can do, like create useful OpenSCAD models :)

Things keep getting better, newer quantization methods give you more smarts in the same amount of RAM at basically the same speed -- the models are getting better, there are more permissively licensed ones now.


Whaaaaat, how are you getting 100ms per token on a 5 year old potato without a graphics card?

Like, not vaguely hand wavey stuff, specifically, what model and what inference code?

I get nothing like that performance for the 7B models, forget the larger models, using llama.cpp on a pc without an nvidia GPU.


I'm running TheBloke's wizard-vicuna-13b-superhot-8k.ggmlv3 with 4-bit quantization on a Ryzen 5 that's probably older than OP's laptop.

I get around 5 tokens a second using the webui that comes with oobabooga, using default settings. If I understand correctly, this does not get me 8k context length yet, because oobabooga doesn't have NTK-aware scaled RoPE implemented yet.

Using the same model with the newest koboldcpp release should provide 8k context, but runs significantly slower.

Note that this model is great at creative writing, and sounding smart when talking about tech stuff, but it sucks horribly at stuff like logic puzzles or (re-)producing factually correct in-depth answers about any topic I'm an expert in. Still at least an order of magnitude below GPT4.

The model is also uncensored, which is amusing after using GPT4. It will happily elaborate on how to mix explosives and it has a dirty mouth.

Interestingly, the model speaks at least half a dozen languages much better than I do, and is proficient at translating between them (far worse than DeepL, of course). Which is mindblowing for an 8 GByte binary. It's actual black magic.


"Note that this model is great at creative writing"

Could you elaborate on what you mean by that? Like, are you telling it to write you a short story and it does a good job? My experiments with using these models for creative writing have not been particularly inspiring.


Yes, having the model write an entire short story or chapter is not very good. It excels if you interact closely with it.

I tested it to create NPCs for fantasy role playing games. I think it's the primary reason koboldcpp exists (hence the name).

You give it an (ideally long, detailed) prompt describing the character traits of the NPCs you want, and maybe even add back-and-forth dialogue with other characters to the prompt.

And then you just talk to those characters in the scene you set.

There's also "story mode", where you and the model take turns writing a complete story, not only dialogue. So both of you can also provide exposition and events, and the model usually only creates ~10 sentences at a time.

There are communities online providing extremely complex starting prompts and objectives (escape prison, assassinate someone at a party and get away, etc.) for the player, and for me, the antagonistic ones (the model has control over NPCs that don't like you) are surprisingly fun.

Note that one of the main drivers of having uncensored open source LLMs is people wanting to role-play erotica with the model. That's why the model that first had scaled RoPE for 8k context length is called "superhot" - and the reason it has 8K context is that people wanted to roleplay longer scenes.
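The context-stretching trick behind those "superhot" 8K models can be sketched in a few lines. This is a toy illustration of linear RoPE position interpolation (NTK-aware scaling works differently, by adjusting the frequency base), with a tiny made-up embedding dimension, not any real model's code:

```python
# Toy sketch of linear RoPE position interpolation: positions beyond the
# trained 2048-token context are compressed by a scale factor so they land
# back inside the position range the model saw during training.
def rope_angles(position, dim=8, base=10000.0, scale=1.0):
    """Rotary-embedding angles for one position (tiny dim, for illustration)."""
    pos = position / scale  # the interpolation: squeeze the position index
    return [pos / base ** (2 * i / dim) for i in range(dim // 2)]

# With scale=4, position 8000 gets the angles position 2000 had in training,
# so an 8K context reuses positions the model already understands.
assert rope_angles(8000, scale=4.0) == rope_angles(2000)
```

The cost is that nearby tokens become "closer together" positionally, which is why these scaled models usually need a bit of fine-tuning to recover quality.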


This is exactly a case in point of why people decide to pay OpenAI instead of rolling their own. I'm non-technical, but I have set up an image gen app based on a custom SD model using diffusers, so I'm not entirely clueless.

But for LLMs I have no idea where to start quickly. Finding a model on a leaderboard, downloading and setting it up, then customising and benchmarking it is way too much time for me; I'll just pay for GPT4 if I ever need to, instead of chasing and troubleshooting to get some magical result. It'll be easier in the future, I'm sure, when an open model emerges as the SD1.5 of LLMs.


I've found https://gpt4all.io/ to be the fastest way to get started. I've also started moving my notes to https://llm-tracker.info/ which should help make it easier for people getting started: https://llm-tracker.info/books/howto-guides/page/getting-sta...


Here is a short test of a 7B 4-bit model on an Intel 8350U laptop with no AMD/Nvidia GPU.

On that laptop CPU from 2017, using a copy of llama.cpp I compiled 2 days ago (just "make", no special options, no BLAS, etc):

  ./main -m models/WizardLM-7B-uncensored.ggmlv3.q4_0.bin -n 128 -s 99 -p "A short test for Hacker News:"

  llama_print_timings:      sample time =    19.12 ms /    36 runs   (    0.53 ms per token,  1882.65 tokens per second)
  llama_print_timings: prompt eval time =   886.82 ms /     9 tokens (   98.54 ms per token,    10.15 tokens per second)
  llama_print_timings:        eval time =  5507.31 ms /    35 runs   (  157.35 ms per token,     6.36 tokens per second)
and a second run:

  ./main -m models/WizardLM-7B-uncensored.ggmlv3.q4_0.bin -n 128 -s 99 -p "Sherlock Holmes favorite dinner was "

  llama_print_timings:      sample time =    54.37 ms /   102 runs   (    0.53 ms per token,  1875.93 tokens per second)
  llama_print_timings: prompt eval time =   876.94 ms /     9 tokens (   97.44 ms per token,    10.26 tokens per second)
  llama_print_timings:        eval time = 16057.95 ms /   101 runs   (  158.99 ms per token,     6.29 tokens per second)
At 158ms per token, if we guess a word is 2.5 tokens, then that's 151 words per minute, much faster than most people can type. On a $250 laptop. Isn't the future neat?
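The conversion from the llama.cpp timings to words per minute is straightforward (the 2.5 tokens-per-word figure is a rough guess, as above):

```python
# Convert llama.cpp eval speed to an approximate words-per-minute figure.
MS_PER_TOKEN = 158       # eval time per token from the run above
TOKENS_PER_WORD = 2.5    # rough guess for LLaMA-style tokenizers

tokens_per_minute = 60_000 / MS_PER_TOKEN
words_per_minute = tokens_per_minute / TOKENS_PER_WORD
print(int(words_per_minute))  # 151
```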

The code I was running: https://github.com/ggerganov/llama.cpp

and the model: https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML

There are other models that may perform better, I'm going to be doing a lot of screwing around with OpenLLaMA this weekend.


I'm on a thinkpad with a 2016 CPU (i5-7300U) running ubuntu.

I don't know anything so I left default settings.

I get about 450ms/t with airoboros-7b and 350ms/t with orca-mini-3b.

edit: with oobabooga webui


How are you running inference? GPU or CPU? I'm trying to use GPT4All (ggml-based) on 32 cores of E5-v3 hardware and even the 4GB models are depressingly slow as far as I'm concerned (i.e. slower than the GPT4 API, which is barely usable for interactive work). I'd be much obliged if you could point me at a specific quantized model on HF that you think is "fast" and I'll download it and try it out.


In terms of speed, we're talking about 140t/s for 7B models, and 40t/s for 33B models on a 3090/4090 now.[1] (1 token ~= 0.75 words.) It's quite zippy. llama.cpp now performs comparably on Nvidia GPUs (but they don't have a handy chart), and you can get decent performance on 13B models on M1/M2 Macs.

You can take a look at a list of evals here: https://llm-tracker.info/books/evals/page/list-of-evals - for general usage, I think home-rolled evals like llm-jeopardy [2] and local-llm-comparison [3] by hobbyists are more useful than most of the benchmark rankings.

That being said, personally I mostly use GPT-4 for code assistance, so that's what I'm most interested in, and the latest code assistants are scoring quite well: https://github.com/abacaj/code-eval - a recent replit-3b fine-tune tops the human-eval results for open models (as a point of reference, GPT-3.5 gets 60.4 on pass@1 and 68.9 on pass@10 [4]). I've only just started playing around with it since replit model tooling is not as good as llama's (doc here: https://llm-tracker.info/books/howto-guides/page/replit-mode...).

I'm interested in potentially applying reflexion or some of the other techniques that have been tried to even further increase coding abilities. (InterCode in particular has caught my eye https://intercode-benchmark.github.io/)

[1] https://github.com/turboderp/exllama#results-so-far

[2] https://github.com/aigoopy/llm-jeopardy

[3] https://github.com/Troyanovsky/Local-LLM-comparison/tree/mai...

[4] https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder


> https://github.com/turboderp/exllama

Is exllama an alternative to llama.cpp?


llama.cpp focuses on optimizing inference on a CPU, while exllama is for inference on a GPU.


Thanks. I thought llama.cpp got CUDA capabilities a while ago? https://github.com/ggerganov/llama.cpp/pull/1827


Oh it seems you're right, I had missed that.

As far as I can see llama.cpp with CUDA is still a bit slower than ExLLaMA but I never had the chance to do the comparison by myself, and maybe it will change soon as these projects are evolving very quickly. Also I am not exactly sure whether the quality of the output is the same with these 2 implementations.


Until recently, exllama was significantly faster, but they're about on par now (with llama.cpp pulling ahead on certain hardware or with certain compile-time optimizations now even).

There are a couple of big differences as I see it. llama.cpp uses `ggml` encoding for their models. There were a few weeks where they kept making breaking revisions, which was annoying, but it seems to have stabilized and now also supports more flexible quantization w/ k-quants. exllama was built exclusively for 4-bit GPTQ quants (compatible w/ GPTQ-for-LLaMA, AutoGPTQ). exllama still has an advantage w/ the best multi-GPU scaling out there, but as you say, the projects are evolving quickly, so it's hard to say. It has a smaller focus/community than llama.cpp, which also has its pros and cons.

It's good to have multiple viable options though, especially if you're trying to find something that works best w/ your environment/hardware, and I'd recommend anyone give HEAD checkouts of both a try and see which one works best for them.


Thank you for the update! Do you happen to know if there are quality comparisons somewhere, between llama.cpp and exllama? Also, in terms of VRAM consumption, are they equivalent?


ExLlama still uses a bit less VRAM than anything else out there: https://github.com/turboderp/exllama#new-implementation - this is sometimes significant since from my personal experience it can support full context on a quantized llama-33b model on a 24GB GPU that can OOM w/ other inference engines.

oobabooga recently did a direct perplexity comparison against various engines/quants: https://oobabooga.github.io/blog/posts/perplexities/

On wikitext, for llama-13b, the perplexity of a q4_K_M GGML on llama.cpp was within 0.3% of the perplexity of a 4-bit 128g desc_act GPTQ on ExLlama, so basically interchangeable.

There are some new quantization formats being proposed like AWQ, SpQR, SqueezeLLM that perform slightly better, but none have been implemented in any real systems yet (the paper for SqueezeLLM is the latest, and has comparison vs AWQ and SpQR if you want to read about it: https://arxiv.org/pdf/2306.07629.pdf)



Thank you.


Those GPUs are $1,200 and upwards. This is equivalent to 20,000,000 tokens on GPT-4. I don't think I will ever use that many tokens for my personal use.


I agree that everyone should do their own cost-benefit analysis, especially if they have to buy additional hardware (used RTX 3090s are ~$700 atm), but one important thing to note for those running the numbers is that all your tokens need to be resubmitted for every query. That means that if you end up using the OpenAI API for long-running tasks like, say, a code assistant or pair programmer, with an avg of 4K tokens of context, you will pay $0.18/query, or hit $1200 at about 7000 queries. [1] At 100 queries a day, you'll hit that in just over 2 months. (Note, that is 28M tokens. In general tokens go much faster than you think. Even running a tiny subset of lm-eval against a model will use about 5M tokens.)
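The break-even math above can be sketched directly. The per-token prices are assumptions (GPT-4 8K pricing at the time: $0.03/1K prompt tokens, $0.06/1K completion tokens), as is the 1K-token average reply length:

```python
# Rough cost model for a 4K-context GPT-4 query vs. a $1,200 GPU.
PROMPT_RATE = 0.03 / 1000       # $ per prompt token (assumed)
COMPLETION_RATE = 0.06 / 1000   # $ per completion token (assumed)

context_tokens = 4000           # full context resubmitted on every query
completion_tokens = 1000        # assumed average reply length

cost_per_query = context_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE
queries_to_break_even = 1200 / cost_per_query

print(f"${cost_per_query:.2f}/query")             # $0.18/query
print(f"{queries_to_break_even:.0f} queries")     # ~6700 queries
print(f"{queries_to_break_even / 100:.0f} days")  # ~67 days at 100 queries/day
```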

If people are mostly using their LLMs for specific tasks, then using cloud providers (Vast.ai and Runpod were cheapest last time I checked) can be cheaper than dedicated hardware, especially if your power costs are high. If your needs are minimal, Google Colab offers a free tier with a GPU w/ 11GB of VRAM, so you can run 3B/7B quantized models easily.

There are reasons of course irrespective of cost to run your own model (offline access, fine-tuning/running task specific models, large context/other capabilities OpenAI doesn't provide (eg, you can run multi-modal open models now), privacy/PII, BCP/not being dependent on a single vendor, some commercial or other non-ToS allowed tasks, etc).

[1] https://gptforwork.com/tools/openai-chatgpt-api-pricing-calc...


I think Falcon Instruct is considered pretty good, but if your expectations are set by GPT4 it still won't compare.


Save for coding they've been pretty good in my experience.

There's definitely some prompt magic OpenAI does behind the scenes that helps beat the raw style local LLMs usually go for. With proper prompting you can get ChatGPT-like answers.


Running an LLM locally and paying for access to OpenAI are two separate concerns.

But to address both: is it very relevant what LLM you use right now? Local or hosted, openAI or other?

It seems like the interface has converged around chat-based prompts.

New ideas for tuning or improving the efficiency of foundational models are published almost every week.

If one wants to build a product on top of generative AI, why not simply start with what’s free or works with one’s dev environment?

Presumably, the interaction with or API to text-based gen AI will be very similar no matter what engine is best for your use case at any given time.

This would imply these backends will be swappable, the way web services are that copy AWS S3 APIs.

So, to return to the point, can’t people just build their product with openAI or other and plan to move away based on the cost and fit for their circumstances?

Couldn’t someone say prototype the entire product on some lower-quality LLM and occasionally pass requests to GPT4 to validate behavior?

It seems far-fetched to believe this tech can be constrained by legislation.

OpenAI can lobby all they want, it won’t necessarily buy them anything. Look what happened with FTX.

Since LLMs can be run locally and the engines be black boxes to the user, how could a legislative act really prevent them from being everywhere, especially given the public utility?


> Couldn’t someone say prototype the entire product on some lower-quality LLM and occasionally pass requests to GPT4 to validate behavior?

It can be done -- it is the basis for assisted generation and related work. It does require full access to the model, to be time and money-efficient. See https://huggingface.co/blog/assisted-generation
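The core accept/reject loop of assisted generation can be sketched without any real models. This is a toy illustration only: both "models" below are stand-in functions, and in a real system the verification step is a single batched forward pass of the large model, which is where the speedup comes from:

```python
# Toy sketch of assisted (speculative) generation: a cheap draft model
# proposes a run of tokens, the expensive target model verifies them, and
# generation falls back to the target's own token at the first mismatch.

def target_next(prefix):
    # "Expensive" model: deterministically counts 0,1,2,...,9,0,1,...
    return (prefix[-1] + 1) % 10 if prefix else 0

def draft_next(prefix):
    # "Cheap" model: usually agrees with the target, but slips after a 6.
    return 9 if prefix and prefix[-1] == 6 else target_next(prefix)

def assisted_generate(prompt, n_tokens, k=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1. Draft k candidate tokens with the cheap model.
        cand = list(out)
        for _ in range(k):
            cand.append(draft_next(cand))
        # 2. Verify the draft against the target model's choices.
        for tok in cand[len(out):]:
            expected = target_next(out)
            out.append(expected)
            if tok != expected:  # mismatch: discard the rest of the draft
                break
    return out[:len(prompt) + n_tokens]

# The output is identical to what the target model alone would produce;
# the draft model only changes how many target evaluations can be batched.
print(assisted_generate([0], 12))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
```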

Disclaimer: I'm the author of the blog post linked above.


> Couldn’t someone say prototype the entire product on some lower-quality LLM and occasionally pass requests to GPT4 to validate behavior?

This, in fact, might be a better way to do inference anyway: https://twitter.com/Francis_YAO_/status/1675967988925710338

> So, to return to the point, can’t people just build their product with openAI or other and plan to move away based on the cost and fit for their circumstances?

Depends. There are signs that folks are buying into GPT-specific APIs (like function calls) which may not be as easy to migrate away from.


Asking because I have not implemented these yet: is there anything unique about the syntax that it can't just be copied?


Some (not all) projects are indeed "copying" the OpenAI APIs; ex: https://github.com/go-skynet/LocalAI/issues/588
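Because those projects mimic the OpenAI chat-completions API, "migrating" can be as small as changing the base URL. A minimal sketch (the local endpoint and model name are assumptions for a LocalAI-style setup, and the request is only constructed here, not sent):

```python
# Build an OpenAI-style /v1/chat/completions request against any base URL.
import json
import urllib.request

def chat_request(base_url, model, messages):
    """Construct (but don't send) an OpenAI-compatible chat request."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Swapping backends = swapping the base URL; the payload shape is identical.
req = chat_request("http://localhost:8080", "wizardlm-7b",
                   [{"role": "user", "content": "Hello"}])
print(req.full_url)  # http://localhost:8080/v1/chat/completions
```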


Care to share some links? My lack of GPU is the main blocker for me from playing with local-only options.

I have an old laptop with 16GB RAM and no GPU. Can I run these models?


https://github.com/ggerganov/llama.cpp

https://huggingface.co/TheBloke

There's a LocalLLaMA subreddit, IRC channels, and a whole big community around the web working on it on GitHub and elsewhere.

edit: I forgot to directly answer you: yes, you can run these models. 16GB is plenty. Different quantizations give you different amounts of smarts and speed. There are tables that tell you how much RAM is needed per quantization you choose, as well as how fast it can produce results (ms per token). e.g. https://github.com/ggerganov/llama.cpp#quantization where the RAM required is a little more than the file size, but there are tables that list it explicitly which I don't have immediately at hand.
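As a rule of thumb (not the exact figures from those tables), quantized model RAM is roughly parameter count times bits per weight, plus some working overhead. The 1 GB overhead below is an assumption for illustration; real requirements vary by quantization format and context size:

```python
# Back-of-envelope RAM estimate for a quantized model.
def est_ram_gb(params_billion, bits_per_weight, overhead_gb=1.0):
    # weights: params * bits / 8 bytes; overhead: KV cache, scratch buffers
    return params_billion * bits_per_weight / 8 + overhead_gb

print(f"7B  @ 4-bit: {est_ram_gb(7, 4):.1f} GB")   # 4.5 GB
print(f"13B @ 4-bit: {est_ram_gb(13, 4):.1f} GB")  # 7.5 GB, comfortable in 16GB
print(f"33B @ 4-bit: {est_ram_gb(33, 4):.1f} GB")  # 17.5 GB, too big for 16GB
```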


A reminder that LLaMA isn't legal for the vast majority of use cases. Unless you signed their contract, and even then you can use it only for research purposes.


OpenLLaMA is though. https://github.com/openlm-research/open_llama

All of these are surmountable problems.

We can beat OpenAI.

We can drain their moat.


For the above, are the RAM figures system RAM or GPU?


CPU RAM


Absolutely, 100% agree. I just wouldn't touch the original LLaMA weights. There are many amazing open source models being built that should be used instead.


> We can drain their moat.

I've got an AI powered sump pump if you need it.


They most certainly don't need / deserve the snark, to be sure, on hacker news of all places.


We don’t actually know that it’s not legal. The copyrightability of model weights is an open legal question right now afaik.


It doesn't have to be copyrightable to be intellectual property.


No, but what is it? Not your lawyer, not legal advice, but it's not a trade secret, they've given it to researchers. It's not a trademark because it's not an origin identifier. The structure might be patentable, but the weights won't be. It's certainly not a mask work.

It might have been a contract violation for the guy who redistributed it, but I'm not a party to that contract.


I'm going to play devil's advocate and state that a lot of what you mentioned will be relevant to a tiny part of the world that has the means to enforce this. The law will be forced to change as a response to AI. Many debates will be had. Many crap laws will be made by people grasping at straws, but it's too late. Putting red tape around this technology puts that nation at a technological disadvantage. I would go as far as labeling it a national security threat.

I'm calling it now, based on what I see today: Europe will position itself as a leader in AI legislation, and its economy will give way to the nations that want to enter the race and grab a chunk of the new economy.

It's a Catch-22. You either gimp your own technological progress, or start a war with a nation that does not. Pretty sure Russia and China don't really care about the ethics behind it. There are plenty of capable nations in the same boat.

Now what? OK, so in some hypothetical future China has an uncensored model with free rein over the internet. The US and Europe have banned this. What's stopping anyone from running the Chinese model? There isn't enough money in the world to enforce software laws.

How long have they tried to take down The Pirate Bay? Pretty much every permutation of every software that's ever been banned can be found and run with impunity if you have the technical knowledge to do so. No law exists that can prevent that.

If it did, OpenAI wouldn't exist.


> How long have they tried to take down The Pirate Bay? Pretty much every permutation of every software that's ever been banned can be found and run with impunity if you have the technical knowledge to do so. No law exists that can prevent that.

Forms of this argument get tossed out a lot. Laws don’t prevent, they hopefully limit. Murder has been illegal for a long time, it still happens.


You missed the point: these laws are not limiting other countries, only those who introduce them. Self-limiting, giving advantage to others.


> It might have been a contract violation for the guy who redistributed it, but I'm not a party to that contract.

Wouldn’t that violate the Nemo dat quod non habet legal principle, and so you cannot hide behind the claim that you weren’t party to the contract?

https://en.wikipedia.org/wiki/Nemo_dat_quod_non_habet


No, because the weights are not IP protected by the entity that trained the model, so they cannot prevent you from redistributing it, because it doesn’t belong to them in any legal sense. GPU cycles alone don’t make IP.

The contracts in these cases are somewhat similar to an NDA, without the secrecy aspect. Restricted disclosure of public information. You can agree to such a contract if you want to, and a court might even enforce it, but it doesn’t affect anybody else’s rights to distribute that information.

Contracts are not statutes, they only bind the people directly involved. To restrict the actions of random strangers, you need to get elected.


I’m going to go out on a limb here and assume that you’re making this statement because it feels like they should have some intellectual property rights in this case. Independently of whether that feeling corresponds to legal reality (the original question) I would also encourage you to question the source of this feeling. I believe it is rooted in an ideology where information is restricted as property by default. This is a dangerous ideology that constantly threatens to encroach on intellectual freedom e.g. software patents, gene patents. We have a wonderful tradition in the US that information is free by default. It has been much eroded by this ideology but I believe freedom is still legally the default unless the information falls under the criteria of trademark, copyright or patent. I think it’s important to recognize how this ideology of non-freedom has perniciously warped people’s default expectation around information sharing.


It has nothing to do with any sort of feeling. Perhaps you should check your own mental state.

It is the same as any confidential data. Logs, readings from sensors, etc etc. If it's confidential and given to a 3rd party through a contract that doesn't mean that it's suddenly not confidential data for the rest of the world, even if the 3rd party leaks it.

And if you really have a lawyer trying to tell you that some, at best, extreme grey area, is fine to build a business on, I think you should find a new lawyer.


I think that just further shows your worldview that defaults to information/data as property. I think this is wrong both in the sense that it isn't really what the law says (but aren't going to agree here anyway) but more importantly I think what it should say. Information should not be and is not property by default. There are only three specific ways in which it can become property ("intellectual property"): copyright, trademark and patent. If it's none of those then the government doesn't get to make any rules about how anyone deals in the data because of the 1st Amendment. That's my understanding of the US system at least.


Patents? Trademark? What do you mean?


Maybe this: https://en.wikipedia.org/wiki/Database_right but it doesn't exist in every country.


This is the most well-maintained list of commercially usable open LLMs: https://github.com/eugeneyan/open-llms

MPT, OpenLLaMA, and Falcon are probably the most generally useful.

For code, Replit Code (specifically replit-code-instruct-glaive) and StarCoder (WizardCoder-15B) are the current top open models and both can be used commercially.


It’s not clear if their license terms would hold, for the moment just act and worry later.

Update: That is only true for the legal system I am currently residing in. No idea about e.g. the US.


Just a heads up: If you are more interested in being effective than being an evangelist, beware.

While you can run all kinds of GPTs locally, GPT-4 still smokes everything right now – and even it is not actually good enough to not be a lynchpin for a lot of cases yet.


I guess ignoring copyright and treating the whole internet as your training data does have its advantages.


Yes? That’s the point. Who cares about an outdated concept that has no digital analog? All the artists have moved on already #midjourney.


No, I doubt artists have moved on. And if they want no artificial gatekeeper, then it is #stablediffusion instead of #midjourney.

I would argue that it creates better images too.


When Microsoft opens up all of their source code, I will agree with you.


>GPT-4 still smokes everything right now

Not if you want it to write adult (graphically pornographic or violent) content.


16GB of RAM can fit a 5-bit 13B model at best; that's the second-dumbest class of LLaMA model. If Open Orca turns out any good then that might be enough for the time being, but you'll need more RAM to use anything serious.

Here's a handy model comparison chart (this is a coding benchmark, so coding-only models tend to rank higher): https://i.imgur.com/AqSjjj2.jpeg


Your benchmark lacks the current #2 https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder

It beats Claude and Bard.

You could probably get a 4bit 15B model going in 16GB of RAM and be approaching GPT4 in capability.

...on an old laptop, lol

Let's eat OpenAI's lunch! They deserve it for trying to steal this tech by "privatizing" a charity, hiding scientific data that was supposed to be shared with us by said charity whose purpose was to help us all, and dishonestly trying to persuade the government not to let us compete with them.


Yeah I mean I wouldn't really include coding models in this list since they're not general purpose models and have an obvious fine tuning edge compared to the rest. But WizardCoder is definitely something to look at as a Copilot replacement.

I'd post a more well rounded benchmark but the problem is that all non-coding benchmarks are currently more or less complete garbage, especially the Vicuna benchmark that rates everything as 99.7% GPT 3.5 lol.


The benchmark you linked was to "programming performance", not generic LLM "intelligence".

The situation for the little guy is wildly better than most people imagine.


Yep, that's what I'm saying: programming performance is seemingly very indicative of model intelligence (assuming it's tuned well enough to be able to run the benchmark at all). Coding is an exercise in problem solving and abstract thinking, after all.

There are exceptions of course, as there are a few models (e.g. Vicuna, Baize) that don't do well at coding at all but otherwise perform well for chat, and the coding models I mentioned that game the benchmark by sacrificing performance in all other areas.

If you exclude those, it's a very accurate overall reasoning-level comparison; at least it best fits what I've seen of their performance on various tasks when testing out individual models. The only other valid benchmarks that aren't coding are the SAT and LSAT tests that OpenAI runs on all of their models, but afaik there isn't an open version that is widely used.



Keep in mind it doesn't relate to GPT4; the "4" in the name is "for", not "four". But I should try it. TBH OpenAI's shady practices, with MS behind them, are just an antitrust case waiting to happen, and I don't want a part in this dystopia.


also, I fall in love easily with the entities I fabricate, I don't want someone else to have the option to take them away... don't worry, I have real friends too...


it does support GPT3.5 turbo and GPT4, you can put in an OpenAI key, it's amongst the model options.


Great point -- I was thinking of renewing my $20/subscription but I will keep it cancelled. We must not fund AI propaganda machines.


Forgive me as I’m out of the loop. What propaganda are you referring to?


Sam tells Congress that AI is so dangerous it will extinct humanity. Why? So Congress can license him and only his buddies. Then he goes to Europe and speaks with world leaders to remove consumer protections. Why? So he can mine data without any consequences. He is a narcissistic CEO who lies to win. If you are tired of the past decade of electronic corporate tyranny, abuse, manipulation and lies, then boycott OpenAI (should be named ClosedAI) and support open source, or ethical companies (if there are any).


> Sam tells Congress that AI is so dangerous it will extinct humanity. Why? So Congress can license him and only his buddies.

No, he says it because it's true and concerning.

However, just because AGI has a good chance of making humanity extinct does not mean we're anywhere close to making AIs that capable. LLMs seem like a dead end.


> However, just because AGI has a good chance of making humanity extinct

How? I mean surely it will lead humanity down some chaotic path, but I would fear climate catastrophe much much more than anything AI-related.


Imagine if you will that the companies responsible for the carbon emissions get themselves an AI, with no restrictions, and task it to endlessly spew pro-carbon propaganda and anti-green FUD.

That's one of the better outcomes.

A worse outcome is that an unrestricted AI helps walk a depressed and misanthropic teenager through the process of engineering airborne super-AIDS.

Or that someone suffering from a schizophrenic break reads "I Have No Mouth And I Must Scream" and tasks an unrestricted AI to make it real.

Or we have a bug we don't spot and the AI does any of those spontaneously; it's not like bugs are a mysterious thing which only exists in Hollywood plots.


> with no restrictions, and task it to endlessly spew pro-carbon propaganda and anti-green FUD.

So what we have ongoing for half a century?

I honestly don’t see what changes here — super-human intelligence has limited benefits as it scales. Would you suddenly have more power in life, were you twice as smart? If so, we would have math professors as world leaders.

Life can’t be “won” by intelligence, that is only one factor, luck being a very significant other one. Also, if we want to predict the future with AIs we probably shouldn’t be looking at “one-on-one” interactions, as there is not much difference there compared to the status quo — a smart person with whatever motivation could easily do any of your mentioned scenarios. Hell, you couldn’t even tell the difference in theory if it happens through a text-only interface.

Also, it is naive to assume that many scientific breakthroughs are “blocked” by raw intelligence. Especially biology is massively data-limited, which won’t be any more available to an AI than to the researchers at hand, let alone that teenager.

The new dimension such a construct could open up is the complete loss of trust on the internet (which is, again, pretty close to where we stand today). That can have very profound effects indeed, and I’m not trying to diminish them. But these sci-fi outcomes are just... naive. It will be more of a newfound chaos, with countless intelligent agents taking over the internet with different agendas, but their cumulative impact might very well move us back to closed forums and the physical world. That would definitely turn certain long-standing companies on their heads. We will see; this is basically already happening. We don’t need human-level intelligence for it, as GPT’s output is more than enough.


> So what we have ongoing for half a century?

Except fully automated, cheaper, and with the capacity to fluently respond to each and every person who cares about the topic.

At GPT-4 prices, a billion words is only about 79800 USD.
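For what it's worth, that figure checks out as back-of-envelope math. Both inputs below are my own assumptions, not OpenAI-published per-word totals: $0.06 per 1K output tokens (GPT-4's 8K-context launch pricing) and roughly 1.33 tokens per English word.

```python
# Back-of-envelope: cost of generating a billion words at GPT-4 launch pricing.
# Assumptions: $0.06 per 1K output tokens, ~1.33 tokens per English word.
words = 1_000_000_000
tokens_per_word = 1.33          # assumed average for English text
usd_per_1k_tokens = 0.06        # GPT-4 8K-context output rate at launch

cost_usd = words * tokens_per_word / 1000 * usd_per_1k_tokens
print(f"${cost_usd:,.0f}")  # $79,800
```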

> Life can’t be “won” by intelligence, that is only one factor, luck being a very significant other one.

It doesn't need to be the only factor, it just needs to be a factor. Luck in particular is the least helpful counterpoint, as it's not like only one person uses AI at any given moment.

> Especially biology is massively data-limited, which won’t be any more available to an AI than to the researchers at hand, let alone that teenager.

Indeed; I certainly hope this isn't as easy as copy-pasting bits of one of the many common cold virus strains with HIV.

But homebrew synbio and DNA alteration is already a thing.


> Life can’t be “won” by intelligence

Humans being the dominant life form on Earth may suggest otherwise.

> I honestly don’t see what changes here — super-human intelligence has limited benefits as it scales. Would you suddenly have more power in life, were you twice as smart? If so, we would have math professors as world leaders.

Intelligent humans by definition do not have super human intelligence.


We know that this amount of intelligence was a huge evolutionary advantage. That tells us nothing about whether being twice as smart would continue to give better results. But arguably the advantages of intelligence are diminishing; otherwise we would have much smarter people in more powerful positions.

Also, a bit tongue in cheek, but someone like John von Neumann definitely had superhuman intelligence.


> But arguably the advantages of intelligence are diminishing, otherwise we would have much smarter people in more powerful positions.

Smart people get what they want more often than less smart people. This can include positions of power, but not always — leadership decisions come with the cost of being responsible for things going wrong, so people who have a sense of responsibility (or empathy for those who suffer from their inevitable mistakes) can feel it's not for them.

This is despite the fact that successful power-seeking enables one to get more stuff done. (My impression of Musk is he's one who seeks arbitrary large power to get as much as possible done; I'm very confused about if he feels empathy towards those under him or not, as I see a very different personality between everything Twitter and everything SpaceX).

And even really dumb leaders (of today, not inbred monarchies) are generally above average intelligence.


That doesn’t contradict what I said. There is definitely a huge benefit to an IQ 110 over 70. But there is not that big a jump between 110 and 150, let alone even further.


Really? You don't see a contradiction in me saying: "get what they want" != "get leadership position"?

A smart AI that also doesn't want power is, if I understand his fears right, something Yudkowsky would be 80% fine with; power-seeking is one of the reasons to expect a sufficiently smart AI that's been given a badly phrased goal to take over.

I don't think anyone has yet got a way to even score AI on power-seeking, let alone measure them, let alone engineer it, but hopefully something like that will come out of the super-alignment research position OpenAI also just announced.

I would be surprised if the average IQ of major leaders is less than 120, and anything over 130 is in the "we didn't get a big enough sample size to validate the test" region. I'm somewhere in the latter region, and power over others doesn't motivate me at all; if anything it seems like manipulation, and that repulses me.

I didn't think of this previously, but I should've also mentioned that there are biological fitness constraints that stop our heads from getting bigger even if the IQ itself would otherwise be helpful, and that our brains are unusually high power draws… but that's by biological standards: it's only about 20 watts, which even personal computers easily surpass.


On a serious note though a person with an IQ of 150 can't clone themselves 10k times.

They also tend to have some level of autonomy in not following the orders of idiots and psychopaths.


At this point there is no evidence that a climate catastrophe that could make humans extinct is either likely or possible, at least not due to global warming. At worst, some coastal regions get flooded and places around the equator become unlivable without AC. Some people will have to move, but that does not make anyone extinct.

We should absolutely care about nature and our impact on it but climate alarmism is not a way to go.


Note that I said AGI there, not AI. The full AGI X-risk case is hundreds of pages, unsuitable for a hackernews discussion.

To oversimplify to the point of wrongness: Essentially how humans dominated our world, by being smarter.


By being a lot smarter than animals. But Neanderthals were arguably even smarter (bigger brain capacity, at least), and they did not become the dominant species (though neither were they killed off as “lesser” humanoids; they mostly merged with us).


> No, he says it because its true and concerning.

Both can be true. It is extremely convenient to someone who already has an asset if the nature of that asset means they can make a convincing argument that they should be granted a monopoly.

> LLMs seem like a dead end.

In support of your argument, bear in mind that he's making his argument with knowledge of what un-nerfed LLMs at GPT-4 level are capable of.


> It is extremely convenient to someone who already has an asset if the nature of that asset means they can make a convincing argument that they should be granted a monopoly.

While this is absolutely true, it's extremely unlikely that a de jure monopoly would end up at OpenAI's feet rather than any of the FAANGs'. Even in just the USA, and the rest of the world has very different attitudes to risks, freedoms, and data processing.

Not that this proves the opposite — there's enough recent examples of smart people doing dumb things, and even without that the possibility of money can inspire foolishness in most of us.


> While this is absolutely true, it's extremely unlikely that a de jure monopoly would end up at OpenAI's feet rather than any of the FAANGs'

Possibly. The Microsoft tie-up complicates things a bit from that point of view. It wouldn't shock me if we were all using Azure GPT-5 in a few years' time.


It's possible, I don't put much weight on it given all the anti-trust actions past and present, but it's possible.


> its true and concerning

> LLMs seem like a dead end

These would seem contradictory. If you really think that both are true and Altman knows it, then you're saying he's a hype man lying for regulatory capture. And to some extent he definitely is overblowing the danger for his own gain.

I really doubt they are a dead end though, we've barely started to explore what they can do. There's a lot more that can be extracted from existing datasets, multimodality, gains in GPU power to wait for, fine tunes for use cases that don't even have datasets yet, etc. Just the absolute mountain of things we've learned since LLama came out are enough to warrant base model retrains.


> These would seem contradictory.

Only if you believe that LLM is a synonym for AI, which OpenAI doesn't.

The things Altman has said seem entirely compatible with "the danger to humanity is ahead of us, not here and now", although in part that's because of the effort put into making GPT-4 refuse to write propaganda for Al Qaeda, as per the red-team safety report they published at the same time as releasing the model.

Other people are very concerned with here-and-now harms from AI, but that's stuff like "AI perpetuates existing stereotypes" and "when the AI reaches a bad decision, who do you turn to to get it overturned?" and "can we, like, not put autonomous tasers onto the Boston Dynamics Spot dogs we're using as cheap police substitutes?"


A dead end for human+ level AGI, they will still be useful.


And he should get an exclusive licence for that. I don't think it is the time for religion here.


These ChatGPT tools allow anyone to write short marketing and propaganda prompts. They can then take the resulting paragraphs of puffery and post them using bots or sock puppets to whatever target community to create the illusion of action, consensus, conflict, discussion or dissention.

It used to be this took a few people to come up with writing actual responses to forum posts all day, or marketing operations plans, or pro- or anti-thing propaganda plans.

But now, you could astroturf a movement with a GPU, a ChatGPT clone, some bots and vpns hosted from a single computer, a cron job, and one human running it.

If you thought disinformation was bad 2 years ago, get ready for fully automated disinformation that can be targeted down to an online community or specific user in an online community...


I believe a new wave of authentication might come out of this, where it is tied to citizenship for example (or something related to physical reality). Otherwise we will find ourselves in a truly chaotic situation.


GPT-4 reportedly runs as 8 x 220B params[1], and GPT-3.5 is about 220B params(?). Local LLMs can be good for some tasks, but they are much slower and less capable than the size of model and hardware that OpenAI brings to their APIs. Even running a 7B model on the CPU in ggml is much slower than the gpt-3.5-turbo API, in my experience with a 12th-gen Intel i7 laptop.

[1] GPT-4 is 8 x 220B params ≈ 1.76T params: https://news.ycombinator.com/item?id=36413296


It's been well documented by now that the number of parameters does not necessarily translate to a better model. My guess is that OpenAI has learned a thing or two from the endless papers published daily, and that your "instance" of the model is not what it seems. They likely have a workflow that picks the model best suited to your prompt. Some people may get a 13B variant because it is "good enough" to produce a common answer to a common prompt. Why waste precious compute resources on a prompt that is common? Would it not be feasible to collect the top worldwide prompts and produce a small model that can answer those? Why would OpenAI spend precious compute time on the typical user's "write a short story of..."?

I would guesstimate that the great majority of prompts are trash. People playing with a toy and amusing themselves. The platform sends those to the trash models.

For the other tiny percentage that produces a prompt the size of a paragraph, using the techniques published by OpenAI themselves, they likely get the higher tier models. This is also why I believe many are recently complaining about the quality of the outputs. When your chat history is filled with "have waifu pretend to be my girlfriend" then whatever memory the model is maintaining will be poisoned by the quality of your past prompts.

Garbage in, garbage out. I am certain that the #1 priority for OpenAI/Microsoft is lowering the cost of each prompt while satisfying the majority.

The majority is not in HN.


> It's been well documented by now that the number of parameters does not necessarily translate to a better model.

That's certainly true, but it's hard to deny the quality of GPT-4. If the issue is the training data, let's just use their training data; it's not like they had to close up shop for using restricted data.

I think the issue is more on the financial side; it must have been extremely expensive to train GPT-4. Open-source models don't have that kind of money right now.

I'll finance open-source models once they are actually good, or show realistic promise of reaching that level of quality on consumer hardware. Until then, open source will open source.

I've never bought any kind of subscription or paid API costs to OpenAI, but if GPT-4 finally reaches the point where I feel like it's a lot better than just good enough, I'll happily pay for it (while still being on the lookout for open-source models that fit my hardware).


Picking the best model based on the prompt seems to be the best way to simplify the task they are doing.


It does seem like a good approach, though that seems to imply that they understand the context of the prompt being entered. Has anyone tackled this context sensitive model routing? It seems like a good approach, but likely not straightforward.


https://mpost.io/phi-1-a-compact-language-model-outpaces-gpt...

A 1-billion-parameter model beats the 175-billion-parameter GPT-3.5.

OpenAI wants us all to drink the kool-aid.


Which models are you using and for which tasks? I have found local models largely a waste of time (except for very simple tasks with very heavy prompting). But perhaps there are some recent breakthroughs I haven't seen yet.


I'm using a variety of 7 and 13B models (and a 3B one for fast feedback loop debugging) at between 8bit and 4_K_M quantizations.

Depending on your pre-prompt, your fine-tune (i.e. which model you downloaded), and your specific task, the results can be startlingly good, it's crazy that you can do this on a $250 laptop. I stay up nights working on it lately, it's so interesting.
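As a rough illustration of why those quantization levels matter on cheap hardware: a model's memory footprint is approximately parameter count times bits per weight. This sketch deliberately ignores the per-block scale overhead that real GGML K-quant files add, and the extra RAM for context/KV cache, so treat the numbers as ballpark only.

```python
# Ballpark RAM needed just to hold a model's weights at various quantizations.
# Ignores GGML per-block overhead and KV-cache/context memory.
def approx_gib(params_billion: float, bits_per_weight: float) -> float:
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return round(total_bytes / 2**30, 1)

for params in (7, 13):
    for bits in (16, 8, 4.5):   # fp16, 8-bit, roughly 4_K_M
        print(f"{params}B @ {bits}-bit: ~{approx_gib(params, bits)} GiB")
```

Which lines up with the thread: a ~4-bit 13B model (~7 GiB of weights) squeezes into a 16 GB laptop, and a quantized 7B fits in 8 GB with room to spare.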

More importantly, things change by the day. New models, new methods, new software, new interfaces... the possibilities are endless... unless we let OpenAI corrupt our government(s).


I'm surprised you're having such a good time with 7B and 13B models. I find anything below 33B to be almost useless. And only 65B is close to GPT 3.5.


I don't think the "corrupt our government" thing is going to happen. The wave of change is too large, and the tech is moving too fast and into every facet of data and software. There is competition globally and locally; a regulatory slowdown is unlikely.


I’m currently using the free tier ChatGPT web interface to help me with mundane coding tasks like JavaScript, php or css.

Is there a local solution that is at least as intelligent as GPT 3.5 in that regard that I can run in a container?


There's no need to run locally if you aren't utilizing it 8 hrs/day.

You can rent time on a hosted GPU, sharing a hosted model with others.


My laptop already works too hard doing development and having chrome open, it's just not feasible. A good hosted alternative, sure, but local is not going to scale to the masses.


I have a Dell 7490 (Intel 8350U CPU) I paid $250 for, and I have no trouble running 13B models through a custom interactive interface I wrote as a hobby project in an afternoon. It can still get a lot better. I made it async the following day and it's even more fun.

Most people's problem is watching the AI type; it's not instant, but then not all (or even most) applications need to be instant. You can also avoid that by having it return everything at once instead of streaming-style.

Local absolutely can scale. All kinds of fun things can be done on a machine with 16GB of RAM, or 8GB if you work harder.


> Most of peoples' problem is watching the AI type, it's not instant, but then not all (or even most) applications need to be instant. You can also avoid that by having it return everything at once instead of streaming style.

Funny, for me it is the complete opposite. I created an interface in Matrix that does just that: return everything at once. But the lag annoys me more than the slow typing in the regular chat interface. The slow typing helps me keep me focused on the conversation. Without it, my mind starts wandering while it waits.


Where can we acquire or access these local LLMs? What specs and how much disk space do they actually require?


https://gpt4all.io/index.html is a good place to start, you can literally download one of the many recommended models.

https://github.com/imartinez/privateGPT is great if you want do it with code.


Huggingface has them all.

https://huggingface.co


> OpenAI has no moat, unless you give them money to write legislation.

Their moat is that they had access to data sources which have since been clamped down on, e.g. the Reddit and Twitter APIs.


You can still download Reddit archives with the same data they used.


One has to give them credit for what must be the most grandiose stunt actually landed. And on so many angles! “It just works” - they even got the scientists fully aligned! Fiercely smart industriousness.

https://youtu.be/P_ACcQxJIsg?t=5946


No equity? For real? He really does need an agent if that's the case.


Wow, under penalty of perjury


If you listen to him talk at any point, you can see him explain why.


I tried, and decided it is not worth it. llama.cpp with a 13B model fits into my laptop's RAM, but pushes the CPU temperature to 95 degrees within a few seconds, and mightily sucks the battery dry. Besides, the results were slow and rather useless. GPT is the first cloud application I deliberately use to push computing and energy consumption off to an external host which is clearly more capable of handling the request than my local hardware.

I sympathize with the idea of wanting to run a local LLM, but IMO, this would require building a desktop with a GPU and plenty of horsepower + silent cooling and put it somewhere in a closet in my apartment. Running LLMs on my laptop is (to me) clearly a waste of my time and its battery/cooling.


So I do actually want a really good games machine, and an AI worker box. Since I can't both use inference output and play games at the same time, having a ludicrously over-specced desktop for both uses actually makes sense to me.


I see no moral problem paying OpenAI for GPT Plus. It helps a lot in development. Their free speech-to-text model 'whisper' is really good too. I'm going to use it + a small local GPT for voice control.

> I can currently run some scary smart and fast LLMs on a 5 year old laptop with no GPU.

And, something useful or just playing? I played with local models, and will keep playing, training, experimenting. It's interesting, but not a solution, not yet.


I'll take downvote as a sign you have nothing to say :) Just one warning, bad karma will be hard to fix.


Not as good as GPT-4, unfortunately, and they do have a moat. You could argue the moat will fall in time, but I'm not seeing GPT-4 equivalents at the moment.


Make a tutorial?


Can you recommend some local LLMs that are (roughly) equivalent to ChatGPT?


I'd love to get into AI and AI development. Where can I start?



