Ollama is now available as an official Docker image

davidkunz · on Oct 6, 2023

Ollama is great, I recently integrated it into my text editor (Neovim), now I have local AI assistance when coding/writing text.

https://youtu.be/FIZt7MinpMY?si=fzcmKq5cw2bniH26

davidkunz · on Oct 6, 2023

Here's the plugin: https://github.com/David-Kunz/gen.nvim

kevinlinxc · on Oct 6, 2023

I havent touched llm stuff because the barrier to entry felt pretty high. This has reduced the barrier so much if it works as promised.

ericskiff · on Oct 6, 2023

I also highly recommend trying LMStudio. It’s a nice interface for downloading, using, and holding LLMs locally

dr_kiszonka · on Oct 6, 2023

I was wondering if you have come across any models that can answer questions based on information from a single source and that source alone (a website or document)?

I asked GPT 4, Claude 1 & 2, and Bard questions about a single documentation page and they all gave both incorrect and hallucinated answers. For example, I asked them to list all functions that accepted integers based on provided documentation. Their responses included functions that didn't accept integers and functions never mentioned in the documentation.

thebruce87m · on Oct 6, 2023

> H2O LLM Studio requires a machine with Ubuntu 16.04+ and at least one recent Nvidia GPU with Nvidia drivers version >= 470.57.02. For larger models, we recommend at least 24GB of GPU memory.

ilteris · on Oct 18, 2023

yes but it's closed source, no?

yieldcrv · on Oct 6, 2023

why is downloading a docker image and using docker from the command line less of a deterrent for you than downloading a model and using it from the command line?

lopkeny12ko · on Oct 6, 2023

Because it's impossible to get started as someone who has no background in ML. I tried earlier this year and also failed.

1. Obtain the Llama models. Apparently you have to sign up for access and get a download link? I don't want to do that, found some public download links instead. Ok, now I have a few hundred GBs of model files.

2. Compile llama.cpp. Missing dependencies, took an hour to figure out how to resolve.

3. Quantize the model? What does that even mean?

4. Install pytorch. Run the command in the README. Python exception.

5. Install NVIDIA helper libraries. Doesn't work. Try installing the AMD helpers to run on CPU instead. No instructions for how to do this. Eventually figured it out.

6. Try running pytorch again. Same exception.

I gave up after a full day of trying to make this work. The Docker image is for people like me.

imp0cat · on Oct 6, 2023

Yeah it's great, you can get this running in no time (even without gpu, just to play with it):

  docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
  docker exec -it ollama ollama run llama2

And that's all!

politelemon · on Oct 6, 2023

And with GPU just do this for the first command

    docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

You'll need to have the NVidia container toolkit installed.

alexchantavy · on Oct 7, 2023

Thank you!

For docker-compose, do

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

Docs: https://docs.docker.com/compose/gpu-support/

stefandesu · on Oct 6, 2023

It's nice to have this possibility on Linux now. It wasn't much more complicated on macOS before:

    brew install ollama
    brew services start ollama
    ollama run llama2

Works great on my M1 MacBook Air, although it's definitely not as good as ChatGPT.

brandonasuncion · on Oct 6, 2023

Btw, the quantized models are on Huggingface, so step 1 and steps 3-6 can be avoided. (The link is 3/4ths into the llama.cpp README)

pdntspa · on Oct 6, 2023

Oh no, gotta RTFM

Why is everyone so afraid to get their hands dirty?

averageRoyalty · on Oct 12, 2023

This isn't really fair. There are many manuals with rapidly changing or incomplete information, or information only months old that is no longer accurate.

This is a fast moving space and can definitely be confusing.

pdntspa · on Oct 12, 2023

Regardless, people can do their research. They just don't. Yes this is a fast moving space, personally it's exhausting to keep up on.

But rather than pushing people to improve themselves or LEARN we cater to spoiled lazy child syndrome.

Put in the work, reap the rewards. No shortcuts!

averageRoyalty · on Oct 15, 2023

I agree in principle, however you appear to be arguing in bad faith.

The person you responded outlined their process (to a level you can look at and see they clearly made an effort) and stated they put a full day in. This is not "spoiled lazy child syndrome", although there are for sure those out there who fit that mold.

yieldcrv · on Oct 6, 2023

got it thanks for the explanation!

helps the command line actually work

TuringNYC · on Oct 6, 2023

The Docker container allows you to limit the scope of what the application can access on your computer.

yieldcrv · on Oct 6, 2023

sure, that's always true though

thatxliner · on Oct 6, 2023

It does.

rovr138 · on Oct 6, 2023

From, https://github.com/jmorganca/ollama

> Get up and running with large language models locally.

> To run and chat with Llama 2:

    ollama run llama2

55555 · on Oct 6, 2023

It froze my MacBook Air m2 for a few minutes

reustle · on Oct 6, 2023

Did you ever figure out why? I've been using Ollama on my M2 air for quite a few weeks now and have never had that issue. If it was your first time running, it should show the output of it downloading the model.

shaburn · on Oct 6, 2023

This will unlock a ton of Enterprise use cases. I have met with a number of enterprises that have a hard no on sending much of their data into OpenAI, even on Azure. I think this will persist as the brand faces many hurdles with respect for IP, in public perception.

nottheengineer · on Oct 6, 2023

Yes, we are currently doing something with T5 for those exact reasons. It could probably benefit a lot from throwing a bigger model at it.

backflippinbozo · on Oct 6, 2023

Thanks, ollama

kramerger · on Oct 6, 2023

I'm on phone and can't test this right now.

What are the system requirements for getting a decent model running in background?

Is it something I can run on a laptop with 8-12GB of RAM and not a huge harddrive?

keonix · on Oct 6, 2023

Mistral 7B ~ 8 GiB

StableLM 3B ~4 GiB

You could go even lower with smaller quantization if necessary. I personally wouldn't use anything smaller than 7B and Mistral already pushing it in coherence. Overall it depends on your use case, not everyone needs smart models, or large context that sometimes takes half of required memory.

Codellama is also surprisingly good even for non-coding tasks

ssss11 · on Oct 6, 2023

What is the most simple docker-compose.yml needed to get this running? (No I don’t want to run that command they show I want to use docker compose and am not an expert yet) - thanks

westernpopular · on Oct 6, 2023

You can use https://www.composerize.com/ to see what a `docker run` command would look like as a docker-compose file

ssss11 · on Oct 6, 2023

Wow I didn’t know! Thanks

kwertyops · on Oct 6, 2023

  version: '3'
  
  services:
    ollama:
      image: ollama/ollama
      container_name: ollama
      ports:
        - "11434:11434"
      volumes:
        - ollama-volume:/root/.ollama

ssss11 · on Oct 6, 2023

Perfect thank you

remmargorp64 · on Oct 6, 2023

Does the docker container run on Windows though?

I still don't see any official documentation showing how to get Ollama running on Windows?

alexchantavy · on Oct 7, 2023

Use Ubuntu on WSL - for me it makes using Windows as a dev machine actually useable

sakopov · on Oct 6, 2023

I'm running it on windows using this docker image without any issues.

dvngnt_ · on Oct 7, 2023

does this mean it works on windows too?

vanillax · on Oct 6, 2023

what about k8s?

kelvie · on Oct 6, 2023

I mean, at this point it should be trivial to turn this into a helm chart or something, though I haven't tried setting up a cluster with gpu access (if that is desired)

imp0cat · on Oct 6, 2023

It's a docker image, you can run that in k8s.

vanillax · on Oct 6, 2023

I was referring to a example helm deployment

on Oct 6, 2023

[dead]

nf17 · on Oct 6, 2023

why the downvotes?

subjectsigma · on Oct 6, 2023

He just explained that he’s using open source tools to take other people’s hard work and re-host it, using an account he set up seemingly just to promote this service. Looking at the website itself I don’t think it’s particularly offensive but I could see why people might not like it.

rafram · on Oct 6, 2023

It doesn’t look like anything is being rehosted. The links go straight to the original blogs.

subjectsigma · on Oct 6, 2023

An AI summary of the article by definition contains information from the article; otherwise it would not be a summary. Whether or not you consider this rehosting is muddy but some people might, which is all my comment was saying.