I was wondering if you have come across any models that can answer questions based on information from a single source and that source alone (a website or document)?
I asked GPT-4, Claude 1 and 2, and Bard questions about a single documentation page, and they all gave both incorrect and hallucinated answers. For example, I asked them to list all functions that accepted integers based on the provided documentation. Their responses included functions that didn't accept integers and functions never mentioned in the documentation.
> H2O LLM Studio requires a machine with Ubuntu 16.04+ and at least one recent Nvidia GPU with Nvidia drivers version >= 470.57.02. For larger models, we recommend at least 24GB of GPU memory.
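If you're not sure whether your box meets that, nvidia-smi will report both the driver version and the GPU memory (assuming the drivers are installed at all; the query fields below are standard nvidia-smi options):

```
# Report driver version and total memory for each GPU
nvidia-smi --query-gpu=driver_version,memory.total --format=csv
```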
Why is downloading a Docker image and using Docker from the command line less of a deterrent for you than downloading a model and using it from the command line?
Because it's impossible to get started as someone who has no background in ML. I tried earlier this year and also failed.
1. Obtain the Llama models. Apparently you have to sign up for access and get a download link? I didn't want to do that, so I found some public download links instead. OK, now I have a few hundred GB of model files.
2. Compile llama.cpp. Missing dependencies; it took an hour to figure out how to resolve them.
3. Quantize the model? What does that even mean?
4. Install PyTorch. Run the command in the README. Python exception.
5. Install the NVIDIA helper libraries. Doesn't work. Try installing the AMD helpers to run on CPU instead. No instructions for how to do this. Eventually figured it out.
6. Try running PyTorch again. Same exception.
I gave up after a full day of trying to make this work. The Docker image is for people like me.
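For reference, the llama.cpp part of what I was attempting looked roughly like this (reconstructed from memory, so the exact script and file names may differ depending on the llama.cpp version; the paths are placeholders):

```
# Build llama.cpp from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# Convert the downloaded weights into the format llama.cpp expects (f16)
python3 convert.py /path/to/llama-7b

# "Quantize" = shrink the weights to lower precision so they fit in RAM
./quantize /path/to/llama-7b/ggml-model-f16.gguf /path/to/llama-7b/ggml-model-q4_0.gguf q4_0

# Run a prompt against the quantized model
./main -m /path/to/llama-7b/ggml-model-q4_0.gguf -p "Hello"
```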
This isn't really fair. There are many manuals with rapidly changing or incomplete information, or information only months old that is no longer accurate.
This is a fast moving space and can definitely be confusing.
I agree in principle; however, you appear to be arguing in bad faith.
The person you responded to outlined their process (in enough detail that you can see they clearly made an effort) and stated they put in a full day. This is not "spoiled lazy child syndrome", although there are certainly people out there who fit that mold.
Did you ever figure out why? I've been using Ollama on my M2 Air for quite a few weeks now and have never had that issue. If it was your first time running it, it should have shown the model being downloaded.
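For what it's worth, a fresh install on my machine goes roughly like this (llama2 is just the default model I happened to try):

```
# First run pulls the model, then drops into an interactive prompt
ollama run llama2

# Or pull explicitly first to watch the download on its own
ollama pull llama2
```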
This will unlock a ton of enterprise use cases. I have met with a number of enterprises that have a hard no on sending much of their data into OpenAI, even on Azure. I think this will persist, as the brand faces many hurdles around IP in public perception.
You could go even lower with a smaller quantization if necessary. I personally wouldn't use anything smaller than 7B, and Mistral is already pushing it in terms of coherence. Overall it depends on your use case; not everyone needs smart models, or a large context that can sometimes take up half of the required memory.
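For example, with Ollama you can pick a specific quantization via the model tag instead of taking the default (the exact tag below is from memory, so check the model's tag list for what's actually published):

```
# Default tag, usually a 4-bit quantization
ollama run mistral

# Smaller quantization to save memory, at some cost in quality
ollama run mistral:7b-instruct-q3_K_M
```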
Codellama is also surprisingly good, even for non-coding tasks.
What is the simplest docker-compose.yml needed to get this running? (No, I don't want to run the command they show; I want to use Docker Compose and am not an expert yet.) Thanks.
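Untested, but something along these lines should be close; it just translates the docker run command into Compose (assuming the ollama/ollama image and its default port 11434, and leaving GPU passthrough out):

```
# Write a minimal compose file and start the service
cat > docker-compose.yml <<'EOF'
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
volumes:
  ollama:
EOF

docker compose up -d

# Then pull and run a model inside the container
docker compose exec ollama ollama run llama2
```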
I mean, at this point it should be trivial to turn this into a Helm chart or something, though I haven't tried setting up a cluster with GPU access (if that is desired).
He just explained that he’s using open source tools to take other people’s hard work and re-host it, using an account he set up seemingly just to promote this service. Looking at the website itself I don’t think it’s particularly offensive but I could see why people might not like it.
An AI summary of the article by definition contains information from the article; otherwise it would not be a summary. Whether that counts as rehosting is muddy, but some people might see it that way, which is all my comment was saying.
https://youtu.be/FIZt7MinpMY?si=fzcmKq5cw2bniH26