
As an alternative for purely local LLMs, I've been having fun with this setup: https://github.com/oobabooga/text-generation-webui


Neat. Eventually I'd like to run a purely local LLM Emacs shell: https://github.com/xenodium/chatgpt-shell. For now it's ChatGPT only, but I'm working on making it more generic/reusable.


The oobabooga setup feels a lot more mature and has a larger community. Skimming OP's repo, there seems to be a lot of fiddling and faffing with JSON to get things running.

Nice stuff all the same.


Do you know how to find models compatible with oobabooga/text-generation-webui? I downloaded one with the included script and that worked, but when I try different ones there seem to be so many formats that I have no idea how to search Hugging Face or Google for the correct one. I'd like to try these new quantized LLaMA versions with the GUI; I can run them in the CLI on the CPU, but llama.cpp uses ggml formats.


https://rentry.org/nur779 (scroll down to the Huggingface ones)


Note that what they released are the delta weights from the og LLaMA model. To play around with it, you'll need to grab the original LLaMA 13B model and apply the changes.

  > We release Vicuna weights as delta weights to comply with the LLaMA model
  > license. You can add our delta to the original LLaMA weights to obtain
  > the Vicuna weights.
Edit: took me a while to find it, here's a direct link to the delta weights: https://huggingface.co/lmsys/vicuna-13b-delta-v0


That's what they say, but I just spent 10 minutes searching the git repo, reading the relevant .py files, and looking at their homepage, and the vicuna-7b-delta and vicuna-13b-delta-v0 files are nowhere to be found. Am I blind, or did they announce a release without actually releasing?


If you run this command from their instructions (https://github.com/lm-sys/FastChat#vicuna-13b), the delta will be automatically downloaded and applied to the base model:

    python3 -m fastchat.model.apply_delta \
        --base /path/to/llama-13b \
        --target /output/path/to/vicuna-13b \
        --delta lmsys/vicuna-13b-delta-v0


This can then be quantized to the llama.cpp/gpt4all format, right? Specifically, this only tweaks the existing weights slightly, without changing the structure?


I may have missed the detail, but it also expects the PyTorch conversion rather than the original LLaMA model.


Yes, you need to convert the original LLaMA model to the huggingface format, according to https://github.com/lm-sys/FastChat#vicuna-weights and https://huggingface.co/docs/transformers/main/model_doc/llam...
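If it helps, the conversion is done with a script that ships with transformers (flags per their LLaMA docs at the time; paths are placeholders):

    python src/transformers/models/llama/convert_llama_weights_to_hf.py \
        --input_dir /path/to/downloaded/llama/weights \
        --model_size 13B \
        --output_dir /output/path/llama-13b-hf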


You can use this command to apply the delta weights. (https://github.com/lm-sys/FastChat#vicuna-13b) The delta weights are hosted on huggingface and will be automatically downloaded.


Thanks! https://huggingface.co/lmsys/vicuna-13b-delta-v0

Edit, later: I found some instructive pages on how to use the vicuna weights with llama.cpp (https://lmsysvicuna.miraheze.org/wiki/How_to_use_Vicuna#Use_...) and pre-made ggml format compatible 4-bit quantized vicuna weights, https://huggingface.co/eachadea/ggml-vicuna-13b-4bit/tree/ma... (8GB ready to go, no 60+GB RAM steps needed)


I did try, but got:

    ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.


> Unfortunately there's a mismatch between the model generated by the delta patcher and the tokenizer (32001 vs 32000 tokens). There's a tool to fix this at llama-tools (https://github.com/Ronsor/llama-tools). Add 1 token (e.g. `C controltoken`), and then run the conversion script.


Just rename it in tokenizer_config.json.
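If it helps anyone else, I believe the rename is the tokenizer_class field (assuming the same mismatch; newer transformers versions expect the new casing). In tokenizer_config.json, change

    "tokenizer_class": "LLaMATokenizer"

to

    "tokenizer_class": "LlamaTokenizer"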


Thanks, that indeed worked!

This, and using conda in WSL2 instead of on bare Windows.


So there's an extra licensing issue to get around the original non-commercial license... this is just a research curiosity, is it not?


Seems that way, it would probably be a bad idea to use this for anything commercial at the very least.


Vicuna at huggingface.co? This keeps making me think of "facehuggers" from Aliens and Vecna from Stranger Things.

(I know a vicuna is a llama-like animal.)


Not a lawyer, but that still feels like dubious territory. I would still be on the hook for acquiring the original download, and Facebook has been launching DMCA takedown requests against the llama-dl project.


(I work on llama-dl.)

We’re fighting back against the DMCA requests on the basis that NN weights aren’t copyrightable. This thread has details: https://news.ycombinator.com/item?id=35393782

I don't think you have to worry about Facebook going after you. The worst that will happen is that they issue a DMCA, in which case your project gets knocked offline. I don’t think they’ll be going the RIAA route of suing individual hackers.

The DMCAs were also launched by a third party law firm, not Meta themselves, so there’s a bit of “left hand doesn’t know what the right hand is doing” in all of this.

I’ll keep everyone updated. For now, hack freely.


If they aren't copyrightable, couldn't they still be classed as a trade secret and still fall under IP law? Though I'm not sure if distributing the weights to people who sign a simple agreement not to redistribute would count as taking reasonable precautions in maintaining secrecy.


If facebook freely distributed their trade secrets, I'm not sure they'd have any legal defense.


I'm sure they wouldn't have any legal recourse on the trade secrets front if they distributed them to anyone who asked...


keep up god's work!


> god's work

creating sentient life?


That can't be his work, since he only picked up that hobby about 0.000625% of the universe's timespan ago.


For many humans, some "hobbies" involve "projects" which may involve seemingly infinite degrees of procrastination. (This certainly applies to me!)


You're not wrong - but for perspective this is equivalent to a 90 year old picking up a hobby 5 hours ago.
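(For the arithmetic: the universe is roughly 13.8 billion years old, so 0.000625% of its timespan is about 86,000 years; the same fraction of a 90-year life is 90 × 6.25e-6 ≈ 0.00056 years, i.e. about 5 hours.)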


Is it though? It could be a child picking up a hobby after being old enough to appreciate the hobby. There is so much more time left in the universe before heat death, so the 90y metaphor doesn't really describe the current point in time


Gotta do something in your old age. Better than crossword puzzles, I'll bet.


Lemme save whoever is donating the legal here the time: model weights are definitely copyrightable.


Usually, you don't know if something is "definitely" anything in the legal world unless it's been tested in court. You have any case you want to reference here? Or what makes you so certain?


> model weights are definitely copyrightable.

What legal theory or precedent makes this true?

IMHO, the weights are akin to the list of telephone numbers in a directory - which is definitely not copyrightable; only the layouts and expressive portion of a phone directory is copyrightable.

So to make the weights copyrightable, it needs to be argued that the 'layout' of the weights is a creative expression rather than a 'fact'. But the weights are matrices, which are not expressive or creative. Someone else could derive this exact same set of weights from scratch via the same algorithmic procedure, and therefore these weights cannot be a creative expression.


"Definitely" is too certain w.r.t. law, but it's pretty obvious how you'd argue these fall under copyright. The difficulty would really be the opposite, it'd be arguing the weights are not derived works of the copyrighted input data sets.

Firstly, weights are not merely a collection of facts like a telephone book is. If two companies train two LLMs they'll get different weights every time. The weights are fundamentally derived from the creative choices they make around hyperparameter selection, training data choices, algorithmic tweaks etc.

Secondly, weights can be considered software and software is copyrightable. You might consider it obvious that weights are not software, but to argue this you'd need an argument that also generalizes to other things that are commonly considered to be copyrightable like compiled binaries, application data files and so on. You'd also need to tackle the argument that weights have no value without the software that uses them (and thus are an extension of that software).

Finally, there's the practical argument. Weights should be copyrightable because they cost a lot of money to produce, society benefits from having large models exist, and this requires them to be treated as the private property of whoever creates them. This last one should in theory be more of a political matter, but copyright law is vague enough that it can come down to a social decision by judges.


I agree but I'd suggest that weights are less like the telephone numbers in a directory and much more like the proportional weights in a recipe.

Recipes, famously, are almost but not quite copyrightable | patentable.

eg:

https://copyrightalliance.org/are-recipes-cookbooks-protecte...

https://etheringtons.com.au/are-recipes-protected-by-copyrig...


> IMHO, the weights are akin to the list of telephone numbers in a directory - which is definitely not copyrightable

I would contest the analogy, but even if we accept it, it's still not clear whether phone directories (or other compilation of factual data) are definitely not copyrightable. The position is clear in the US, but in the UK and presumably other jurisdictions, I wouldn't be so sure.

You could claim we're just talking about US law here, but if you release something on github/huggingface without geo-restrictions, and your company does business in Europe, you might not only have to comply with US law...

eg. https://www.jstor.org/stable/24866738 , eg. https://books.google.com.hk/books?id=wHJBemWuPT4C&pg=PA114&l...


Ok. What if I train it for one micro step?


thanks zero comment bot account!


If NN weights aren't protected by IP law that could slow down progress quite a lot. That could be very good for people worried about alignment.


>If NN weights aren't protected by IP law that could slow down progress quite a lot.

What do you mean? IP law is overwhelmingly an impediment to progress; innovation happens faster when people are free to build on existing weights.


Yes, but there's less incentive for large companies to spend huge amounts of money training these systems when other companies can just take their work for free.

Removing IP protection would make it a lot easier to innovate at this level, but it would reduce the amount of money flowing into getting us to the next level.


Or development could shift out of the hands of these large corporations, which might be a good thing.

Somehow, though, I doubt they'll let the golden goose slip through their fingers, no matter what happens.


Not really. This model only made it to the public because Meta was offering it publicly.

This won't happen to GPT any time soon so they are safe, copyright or not.


I'm curious, do you not think this might have adverse effects? Namely, if NN weights aren't copyrightable, limited releases like Meta's might not be possible anymore, so they might just stop releasing entirely, ultimately leaving access to large models more restricted.


I think we already live in that era, unfortunately. Meta's model release is probably going to be the largest for some years.

There's more detail about the upsides/downsides in this thread: https://twitter.com/theshawwn/status/1641804013791215619


I honestly do not know which is worse of the three realistic alternatives:

1- to have large corporations and people with privileged access to them have these models exclusively and have them collaborate as a clique

2- to have those models openly released to everybody, or de-facto released to everybody as they leak in short order

3- to have the people who think releasing models is a bad thing simply not release them and work alone on their proprietary solutions, while the smaller companies and hobbyists collaborate

I say let them have a go at number 3 and see how that works for them - shades of "Microsoft Network" vs Internet all over again.


The llama-dl project actually helped you download the weights, whereas this just assumes you already have them. That feels like a pretty massive difference to me.


It's fairly similar to a ROM patch in the video game space, which has mostly stood the test of time.


With a ROM, you could at least make a claim that it was your backup copy. I have no such claims to Facebook’s model.


Researchers unaffiliated with Facebook are allowed to possess and use the original weights, though, and they can make use of these deltas.


Like that, but requiring 60GB of CPU RAM for some reason :-P

One has to wonder how they implemented the storage of those deltas to require that sort of RAM.


For perspective, that's about $200-$250 of RAM on a desktop computer. They might just not have cared.

Though I expect somebody to write a patch to make this more accessible to people on laptops.



Nobody at Facebook approved it? Given the attention it has received, it's hard to imagine it has slipped through the cracks; more likely a deliberate decision not to address it.


Very unlikely you'd face any legal action for usage of anything. If you share it, then it becomes less unlikely.

Edit: Also, judging by a comment from the team in the GitHub repository (https://github.com/lm-sys/FastChat/issues/86#issuecomment-14...), they seem to at least hint at having been in contact with the llama team.


You can calculate them yourself as well! huggingface has a great article on this: https://huggingface.co/blog/getting-started-with-embeddings

tl;dr, use: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v...
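The gist of that article, as a minimal sketch (assuming the sentence-transformers package is installed):

    from sentence_transformers import SentenceTransformer

    # Small, fast general-purpose embedding model (384-dim vectors)
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    embeddings = model.encode(["first document", "second document"])
    print(embeddings.shape)  # (2, 384)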


Thanks, but I already worked with this model and it was not good at all for my domain. Therefore, I wanted to fine-tune LLaMA for my domain and then use LLaMA for embeddings. Should I fine-tune this model, then?


(I want to focus more attention on that "tl;dr", which I will argue is carrying a lot of load in that response: the high-level answer to how one does this using the LLaMA weights is "you don't, as that isn't the right kind of model; you need to use a different model, of which there are many".)


I've run into similar issues and found that the `thiserror` crate (https://crates.io/crates/thiserror) combined w/ anyhow makes a lot of that pain go away.


For those reading this that aren't super familiar, common Rust advice is "use thiserror for libraries and anyhow for applications," as they make slightly different tradeoffs and so are useful, especially together.


`anyhow` + `?` make writing an application as smooth as butter. You won't miss exceptions.

Don't use `anyhow` for libraries, though. You want to provide your consumers the ability to `match`.
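A minimal sketch of that split (hypothetical error and function names; thiserror deriving a matchable error in the library, anyhow + `?` in the binary):

    // library crate: a concrete error type consumers can `match` on
    use thiserror::Error;

    #[derive(Debug, Error)]
    pub enum PortError {
        #[error("could not read {path}: {source}")]
        Read { path: String, source: std::io::Error },
        #[error("invalid port number: {0:?}")]
        Invalid(String),
    }

    pub fn parse_port(raw: &str) -> Result<u16, PortError> {
        raw.trim().parse().map_err(|_| PortError::Invalid(raw.to_string()))
    }

    // application: anyhow::Result absorbs any std::error::Error via `?`,
    // and .context() adds human-readable breadcrumbs to the chain
    use anyhow::Context;

    fn main() -> anyhow::Result<()> {
        let raw = std::fs::read_to_string("port.txt").context("reading port.txt")?;
        let port = parse_port(&raw)?; // PortError converts into anyhow::Error
        println!("listening on port {port}");
        Ok(())
    }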


here we go again!


I would add `snafu` (https://crates.io/crates/snafu) here as a good alternative to thiserror+anyhow.


Shameless self-plug: I've been building something similar that you can run locally as an app: https://github.com/a5huynh/spyglass

You can define some basic rules & it'll go out and crawl those particular sites. Or use one that someone else has built. It can also sync with your Chrome/Firefox bookmarks. Would love feedback from folks who get a chance to use it!


In addition to React/Vue, you have the ability to use Rust client-side frameworks such as Yew[0] or seed[1] for a truly full-stack Rust experience. I've been using Tauri + Yew to build a cross-platform app (shameless plug: [2]) and it's been a pleasure. There are some rough edges, but most of those have been due to platform-specific issues (notifications, Windows vs Unix paths, etc.) and not Tauri/Yew itself.

[0] https://yew.rs/

[1] https://github.com/seed-rs/seed

[2] https://github.com/a5huynh/spyglass - Create your own personal search engine by crawling & indexing files/docs/websites that you want.
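For a taste of what the Yew side looks like, here's a minimal counter component (a hypothetical example, assuming the yew 0.20-style function-component API):

    use yew::prelude::*;

    #[function_component(Counter)]
    fn counter() -> Html {
        // use_state gives a handle that re-renders the component on set()
        let count = use_state(|| 0);
        let onclick = {
            let count = count.clone();
            Callback::from(move |_| count.set(*count + 1))
        };
        html! {
            <button {onclick}>{ format!("Clicked {} times", *count) }</button>
        }
    }

    fn main() {
        yew::Renderer::<Counter>::new().render();
    }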


For those looking for an alternative to that, I've been building a self-hosted search engine that crawls what you want based on a basic set of rules. It can be a list of domains, a very specific list of URLs, and/or even some basic regexes.

https://github.com/a5huynh/spyglass


Great project! Given a local archive of Wikipedia and other sources, this can be very powerful.

Which raises the question: does archive.org offer their Wayback Machine index for download anywhere? Technically, why should anyone go through the trouble of crawling the web if archive.org has been doing it for years, and likely has one of the best indexes around? I've seen some 3rd-party downloaders for specific sites, but I'd like the full thing. Yes, I realize it's probably petabytes of data, but maybe it could be trimmed down to just the most recent crawls.

If there was a way of having that index locally, it would make a very powerful search engine with a tool like yours.


A bit different, but I've been building something similar that runs locally: https://github.com/a5huynh/spyglass

You create some rules for topics you want to index and it'll go out and crawl them. Searching through it is a global hotkey away.


Spyglass looks amazing! Thank you for this.


Thank you! Feel free to ping me on Github/Discord if you run into any issues, always looking to make it better :)


Thanks for the feedback! I'll keep that in mind as this is built out. Fortunately, the initial bootstrapping uses data from the Internet Archive, and the crawls afterwards are to check for updates (at a reasonable rate). The number of URLs being hit is much, much lower in the end than you would think.


Hey HN, I'm building an open source search platform that lives on your device, indexing what you want, exposing it to you in a super simple & super fast interface.

I took the idea of adding "site:reddit.com" to your Google searches and expanded on it with the idea of "lenses" to add context to your search query and give the crawler direction in terms of what to crawl & index. This means that all queries are run locally, it does not relay your search to any 3rd-party search engine. Think of it as your personal bookcase at home vs. the Library of Congress.

It's still in a super early state, but I'd love for people to start using it, provide some feedback, and see what sort of lenses people want to build and search through!

Some details about the stack for the interested:

    * All Rust w/ some HTML/CSS for the client.

    * Client is built w/ yew + tauri

    * Backend uses tantivy to index the web pages, sqlite3 to hold metadata / crawl queue (quick sketch below)
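For a flavor of the tantivy side, here's a minimal index-and-search sketch (tantivy's documented basic usage, not the actual spyglass code):

    use tantivy::collector::TopDocs;
    use tantivy::query::QueryParser;
    use tantivy::schema::{Schema, STORED, TEXT};
    use tantivy::{doc, Index};

    fn main() -> tantivy::Result<()> {
        // Define the document schema: a stored title and a searchable body.
        let mut builder = Schema::builder();
        let title = builder.add_text_field("title", TEXT | STORED);
        let body = builder.add_text_field("body", TEXT);
        let index = Index::create_in_ram(builder.build());

        // Index a crawled page (50 MB writer heap).
        let mut writer = index.writer(50_000_000)?;
        writer.add_document(doc!(
            title => "Spyglass",
            body => "A personal search engine that crawls what you want.",
        ))?;
        writer.commit()?;

        // Query it back.
        let searcher = index.reader()?.searcher();
        let query = QueryParser::for_index(&index, vec![title, body])
            .parse_query("personal search")?;
        let top = searcher.search(&query, &TopDocs::with_limit(10))?;
        println!("{} hit(s)", top.len());
        Ok(())
    }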
Thanks in advance!

