Ollama is now available as an official Docker image (ollama.ai)
197 points by alexzeitler on Oct 6, 2023 | 47 comments



Ollama is great. I recently integrated it into my text editor (Neovim), and now I have local AI assistance when coding and writing text.

https://youtu.be/FIZt7MinpMY?si=fzcmKq5cw2bniH26



I haven't touched LLM stuff because the barrier to entry felt pretty high. This reduces the barrier so much, if it works as promised.


I also highly recommend trying LMStudio. It’s a nice interface for downloading, using, and storing LLMs locally.


Have you come across any models that can answer questions based on information from a single source and that source alone (a website or document)?

I asked GPT-4, Claude 1 & 2, and Bard questions about a single documentation page, and they all gave both incorrect and hallucinated answers. For example, I asked them to list all functions that accepted integers based on the provided documentation. Their responses included functions that didn't accept integers and functions never mentioned in the documentation.


> H2O LLM Studio requires a machine with Ubuntu 16.04+ and at least one recent Nvidia GPU with Nvidia drivers version >= 470.57.02. For larger models, we recommend at least 24GB of GPU memory.


yes but it's closed source, no?


why is downloading a docker image and using docker from the command line less of a deterrent for you than downloading a model and using it from the command line?


Because it's impossible to get started as someone who has no background in ML. I tried earlier this year and also failed.

1. Obtain the Llama models. Apparently you have to sign up for access and get a download link? I didn't want to do that, so I found some public download links instead. OK, now I have a few hundred GBs of model files.

2. Compile llama.cpp. Missing dependencies, took an hour to figure out how to resolve.

3. Quantize the model? What does that even mean?

4. Install pytorch. Run the command in the README. Python exception.

5. Install NVIDIA helper libraries. Doesn't work. Try installing the AMD helpers to run on CPU instead. No instructions for how to do this. Eventually figured it out.

6. Try running pytorch again. Same exception.

I gave up after a full day of trying to make this work. The Docker image is for people like me.


Yeah, it's great. You can get this running in no time (even without a GPU, just to play with it):

  docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
  docker exec -it ollama ollama run llama2
And that's all!
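
Once it's running, you can also hit the HTTP API directly. A quick sanity check might look like this (assuming you've pulled llama2 as above; the response streams back as JSON lines):

    curl http://localhost:11434/api/generate -d '{
      "model": "llama2",
      "prompt": "Why is the sky blue?"
    }'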


And with a GPU, just do this for the first command:

    docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
You'll need to have the NVIDIA Container Toolkit installed.
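
On Ubuntu that's roughly the following (after adding NVIDIA's apt repository; see their install docs for the exact steps):

    sudo apt-get install -y nvidia-container-toolkit
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker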


Thank you!

For docker-compose, add this under the service:

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
Docs: https://docs.docker.com/compose/gpu-support/
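
In context that stanza goes under the Ollama service, so a complete file would look roughly like this (untested sketch; the volume name is arbitrary):

    services:
      ollama:
        image: ollama/ollama
        container_name: ollama
        ports:
          - "11434:11434"
        volumes:
          - ollama:/root/.ollama
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: 1
                  capabilities: [gpu]
    volumes:
      ollama: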


It's nice to have this option on Linux now. It wasn't much more complicated on macOS before:

    brew install ollama
    brew services start ollama
    ollama run llama2
Works great on my M1 MacBook Air, although it's definitely not as good as ChatGPT.


Btw, the quantized models are on Hugging Face, so step 1 and steps 3-6 can be avoided. (The link is about three quarters of the way into the llama.cpp README.)
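
That path looks roughly like this (repo and file names are illustrative; pick whichever pre-quantized GGUF you actually want):

    # grab a pre-quantized model from Hugging Face (names illustrative)
    wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
    # run it directly with llama.cpp, no conversion or quantization step needed
    ./main -m llama-2-7b-chat.Q4_K_M.gguf -p "Hello"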


Oh no, gotta RTFM

Why is everyone so afraid to get their hands dirty?


This isn't really fair. There are many manuals with rapidly changing or incomplete information, or information only months old that is no longer accurate.

This is a fast moving space and can definitely be confusing.


Regardless, people can do their research; they just don't. Yes, this is a fast-moving space, and personally I find it exhausting to keep up with.

But rather than pushing people to improve themselves or LEARN, we cater to spoiled lazy child syndrome.

Put in the work, reap the rewards. No shortcuts!


I agree in principle; however, you appear to be arguing in bad faith.

The person you responded to outlined their process (in enough detail that you can see they clearly made an effort) and stated they put a full day in. This is not "spoiled lazy child syndrome", although there are certainly people out there who fit that mold.


Got it, thanks for the explanation!

It helps that the command line actually works.


The Docker container allows you to limit the scope of what the application can access on your computer.
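
For example, something along these lines (an illustrative sketch rather than Ollama-specific guidance; the container may need additional writable paths):

    docker run -d --name ollama \
      --read-only --tmpfs /tmp \
      --cap-drop=ALL --security-opt no-new-privileges \
      --memory 12g --pids-limit 256 \
      -v ollama:/root/.ollama -p 11434:11434 \
      ollama/ollama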


sure, that's always true though


It does.


From https://github.com/jmorganca/ollama:

> Get up and running with large language models locally.

> To run and chat with Llama 2:

    ollama run llama2


It froze my MacBook Air M2 for a few minutes.


Did you ever figure out why? I've been using Ollama on my M2 Air for quite a few weeks now and have never had that issue. If it was your first time running it, it should have shown the output of downloading the model.


This will unlock a ton of enterprise use cases. I have met with a number of enterprises that have a hard no on sending much of their data to OpenAI, even on Azure. I think this will persist, as the brand faces many hurdles around IP in public perception.


Yes, we are currently doing something with T5 for those exact reasons. It could probably benefit a lot from throwing a bigger model at it.


Thanks, ollama


I'm on my phone and can't test this right now.

What are the system requirements for getting a decent model running in background?

Is it something I can run on a laptop with 8-12 GB of RAM and not a huge hard drive?


Mistral 7B: ~8 GiB

StableLM 3B: ~4 GiB

You could go even lower with smaller quantizations if necessary. Personally I wouldn't use anything smaller than 7B, and Mistral is already pushing it in terms of coherence. Overall it depends on your use case; not everyone needs smart models, or a large context, which sometimes takes half of the required memory.

Codellama is also surprisingly good even for non-coding tasks
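
If memory is tight you can also pull smaller quantizations explicitly; roughly like this (tag names are illustrative, check the Ollama model library for what's actually published):

    ollama run mistral               # ~7B at the default quantization
    ollama run llama2:7b-chat-q4_0   # explicitly pick a smaller quant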


What is the simplest docker-compose.yml needed to get this running? (No, I don't want to run the command they show; I want to use docker compose and I'm not an expert yet.) Thanks.


You can use https://www.composerize.com/ to see what a `docker run` command would look like as a docker-compose file


Wow I didn’t know! Thanks


  version: '3'

  services:
    ollama:
      image: ollama/ollama
      container_name: ollama
      ports:
        - "11434:11434"
      volumes:
        - ollama-volume:/root/.ollama

  volumes:
    ollama-volume:
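
Then, assuming the file above, it's just:

    docker compose up -d
    docker compose exec ollama ollama run llama2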


Perfect thank you


Does the docker container run on Windows though?

I still don't see any official documentation showing how to get Ollama running on Windows.


Use Ubuntu on WSL; for me it makes using Windows as a dev machine actually usable.


I'm running it on Windows using this Docker image without any issues.


Does this mean it works on Windows too?


what about k8s?


I mean, at this point it should be trivial to turn this into a Helm chart or something, though I haven't tried setting up a cluster with GPU access (if that is desired).
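
A bare-bones Deployment plus Service would look something like this (just a sketch; no GPU scheduling or persistent volume, and the names are arbitrary):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ollama
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: ollama
      template:
        metadata:
          labels:
            app: ollama
        spec:
          containers:
            - name: ollama
              image: ollama/ollama
              ports:
                - containerPort: 11434
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: ollama
    spec:
      selector:
        app: ollama
      ports:
        - port: 11434
          targetPort: 11434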


It's a docker image, you can run that in k8s.


I was referring to an example Helm deployment.


[dead]


why the downvotes?


He just explained that he’s using open source tools to take other people’s hard work and re-host it, using an account he set up seemingly just to promote this service. Looking at the website itself I don’t think it’s particularly offensive but I could see why people might not like it.


It doesn’t look like anything is being rehosted. The links go straight to the original blogs.


An AI summary of the article by definition contains information from the article; otherwise it would not be a summary. Whether or not you consider this rehosting is muddy but some people might, which is all my comment was saying.



