Hacker News new | past | comments | ask | show | jobs | submit login
PrivateGPT (github.com/imartinez)
520 points by antouank on May 21, 2023 | hide | past | favorite | 142 comments



Granted I'm not coming from the python world, but I have tried many of these projects, and very few of them install out of the box. They usually end with some incompatibility, and files scattered all over the place, leading to future nightmares.

  ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
  sentry-sdk 1.22.2 requires urllib3<2.0.0, but you have urllib3 2.0.2 which is incompatible
Just for fun, here's the result of python -m pip install -r ./requirements.txt for tortoise-tts;

…many many lines

          raise ValueError("%r is not a directory" % (package_path,))
      ValueError: 'build/py3k/scipy' is not a directory
      Converting to Python3 via 2to3...

  /tmp/pip-install-hkb_4lh7/scipy_088b20410aca4f0cbcddeac86ac7b7b1/build/py3k/scipy/signal/fir_filter_design.py
      [end of output]

  note: This error originates from a subprocess, and is 
  likely not a problem with pip.
  error: metadata-generation-failed

I'm not asking for support, just saying if people really want to make something 'easy' they'd use docker. I gather there are better python package managers, but I gather that's a bit of a mess too.

Someone is thinking "this is part of learning the language," but I think it's just bad design.


You don’t need Docker, you just need a virtual env for each random thing you try instead of making them all conflict with each other. Maybe some day pip will add a switch to automatically create one, but until then,

  python3 -m venv venv
  . venv/bin/activate
before you try something random.

Also, `python` is usually Python 2.7. If it is, I advise removing it from your system unless you have a strong reason to keep it.


Nope, this is EXACTLY why I'd use docker. You want to faff around with some esoteric settings? Go for it! But don't make your would-be users runt the gauntlet, that's pointless.

All that nonsense vs docker pull / docker run


Now your user need to learn a lot about Docker to edit anything. And you need to find a place to host those huge images. What's free today may not be in a year, see Docker Hub.

Edit: Not saying offering it as an option is bad. But your pip install should work regardless, scipy=0.10.1 is bad whether you offer a Docker image or not.


If you don't want to host Docker images, you can just provide a Dockerfile. That way the onus of resolving all the complications is on you - your user only needs to have Docker running on their system.

Arguably, it's a pretty reasonable requirement. Widely used, mature, easy to set up.

I don't remember when I switched to running all my dev envs in Docker, but I wouldn't go back.


How far are we going to need to go to fully abstract these systems? Am I going to need a separate computer running a VM with a server image for hosting a docker image of python venv to manage a package that prints some text?


Yes, if you believe in modern software.


docker is not the right tool for the job here. this is not an app. this is a nacent project and if you want people to benefit from the underlying code, and contribute back to it to grow this field, you provide proof of concept code, not full, complex and opinionated interfaces that are all crufted up with containerization/packaging. venv is a core module of python and its dead simple to get a virtual environment up and running. you dont have to do any crazy things to expose hardware to it (GPUS), you just run two commands to create and source the environment and then everything just works.


Dangerous comment.

From a linux perspective, I wouldn't blindly suggest the average reader to purge Python 2.7 from their system, as it might drag core parts of the WM with it. Consider aliasing, or better yet, relying on modern venv tools such as Conda instead.


I finally purged python2.7 from all the systems I admin during the Ubuntu 22.04 upgrade cycle. Worked just fine. No reason to keep it around if nothing depends on it, and indeed nothing does. If something does depend on it, think long and hard whether you really need that thing.

I don’t know about desktop Linux though.


Sorry, but

  (base) vid@kk:~/D/ai/tortoise-tts$ python3 -m venv venv
  (base) vid@kk:~/D/ai/tortoise-tts$ . venv/bin/activate
  (venv) (base) vid@kk:~/D/ai/tortoise-tts$ python -m pip install -r ./requirements.txt
  Collecting tqdm
    Using cached tqdm-4.65.0-py3-none-any.whl (77 kB)
  Collecting rotary_embedding_torch
    Using cached rotary_embedding_torch-0.2.3-py3-none-any.whl (4.5 kB)

    × python setup.py egg_info did not run successfully.
    │ exit code: 1
    ╰─> [8 lines of output]
        Traceback (most recent call last):
    File "<string>", line 2, in <module>    
    File "<pip-setuptools-caller>", line 34, in <module>    
    File "/tmp/pip-install-i7ubxxkc/scipy_4d5af4f3e2094adca3313ccb41a6d5ff/setup.py", line 196, in <module>    
      setup_package()    
    File "/tmp/pip-install-i7ubxxkc/scipy_4d5af4f3e2094adca3313ccb41a6d5ff/setup.py", line 147, in setup_package    
      from numpy.distutils.core import setup    
        ModuleNotFoundError: No module named 'numpy'
        [end of output]
  
    note: This error originates from a subprocess, and is likely not a problem with pip.
  error: metadata-generation-failed

  × Encountered error while generating package metadata.
  ╰─> See above for output.

  note: This is an issue with the package mentioned above, not pip.
  hint: See above for details.

  [notice] A new release of pip available: 22.3.1 -> 23.1.2
  [notice] To update, run: pip install --upgrade pip
  (venv) (base) vid@kk:~/D/ai/tortoise-tts$
I'm sure you could eventually help get this working, which is kind of you, but the point is the "supposed tos" don't work either. It needs to be comprehensively fixed if python really wants to be approachable. Maybe it doesn't. It's also just not a good citizen when it comes to heterogeneous apps on the same system.

This isn't the first time venv didn't work for me, then there's anaconda, miniconda, and a bunch of other things that add env and directories. I don't really know what any of them do, and -I don't want to- I'm not an expert on every app on my system, but I can use nearly all of them without pain. (remember this is about ease of use)

Oh yeah, and python 2 vs python 3. <rolls eyes>

It's very much the "works for me" experience from the old days. There's no good learning from it, except dependencies suck and python systems aren't good at them.

I think when releasing anything that includes dependencies that span the operating system, it's just good engineering to use a container approach. Otherwise you're just causing a lot of discomfort in the world for no good reason.

It's funny because chatgpt would give me an answer to this in a few moments, but I'm locked out for a week because it can't figure out where I am.

Now I'm spending my Sunday morning setting up a dockerfile for tortoise-tts. At least I will learn something reusable from that. I guess I will create a PR for it, though it seems the author isn't tending the repo anymore.


Oh I don't disagree, the ecosystem does has a packaging reproducibility and multitenancy problem with out-of-box tooling, and projects seldom provide basic instructions for people outside the ecosystem, like using a virtual env.

That said, this tortoise-tts project might be a particularly bad example. It somehow locks to scipy 0.10.1 from 2012 [1] (during the Python 3.2 release cycle, when Python 3 was heavily in flux) in requirements.txt [2]. Probably not terribly surprising it doesn't work. I didn't bother to look into why they lock to that.

[1] https://pypi.org/project/scipy/0.10.1/

[2] https://github.com/neonbjb/tortoise-tts/blob/0ea829d37aa6528...


> the ecosystem does has a packaging reproducibility and multitenancy problem with out-of-box tooling

this is exactly why I am learning Nix, to help contain chaotically-designed dependency garbage like this to 1 project directory


Yep nix is awesome at this kind of thing. Check out this project which packages a couple of AI projects with nix, both work out of the box for me.

https://nixified.ai/


oh absolutely YES.

Ironically, the poo of things like python multitenant dependency management will likely push Nix adoption forward (and unfortunately also Docker)


I think I personally keep running into these bad examples every time I use something with Python and I do use venv every time. Rarely something works out of the box. Even colabs I try somehow won't work after a while. There is always some sort of version mismatch, sometimes something like numpy, tensorflow and some other deps.


`rotary_embedding_torch` has not defined any build requirements hence your error: https://github.com/lucidrains/rotary-embedding-torch. You therefore need to install `numpy` before installing `rotary_embedding_torch`.

This is bad, `rotary_embedding_torch` as a package is not in a high enough quality to put as a requirement.

The good news is Pip 23.1+ is forcing the issue, `rotary_embedding_torch` will fail even if you have `numpy` installed because builds by default take place in an isolated environment and you *must* define any build requirements you have. This should force the quality of packages in the Python ecosystem to improve and no longer have this error.


FYI Tortoise, the thing you are trying to build, is abandonware. The creator decided to stop working on it due to “ethics” (i.e only Big Tech should have access to AI) when the community reverse engineered a way to finetune it using weights accidentally left on hugging face. There’s a nice fork out there called mrq/ai-voice-cloning.


Thanks, I didn't know that backstory. I will check out that repo.


Thank you for expressing in practical terms why many people used to better-managed ecosystems are disgusted by Python


It’s been a long time since “Python —version” output “2.x” on a computer I was using. Even macOS is on Python 3 these days iirc. Every Linux distro I’ve installed in the last few months was at least 3.8.


Or you can use pipx, it deals with all the virtualenv business behind the scenes


+1 to using venv


Yep, I just tried to install a Python-based project and there was a conflict between Pyenv's and Homebrew's versions of pip... despite having used Homebrew to install Pyenv. I ended up just getting rid of Pyenv altogether... but now Python may be in some screwed-up state on my system.

It's too bad the ecosystem seems to be so messy, because Python seems like the best language for general utilities.


> Yep, I just tried to install a Python-based project and there was a conflict between Pyenv's and Homebrew's versions of pip... despite having used Homebrew to install Pyenv.

The comment does not really make sense. It sounds like pyenv wasn't setup correctly on your system (needs to be added to your bash_profile etc). The typical setup is to put pyenv first in PATH so it takes precedence.

It may be easier to use pipx which will auto manage the virtual envs of end user apps for you.


This is exactly why I’ve never wanted to get too involved with Python. The few times I’ve tried to play with it, it always becomes a nightmare of error messages to do with stuff exactly like this and I spent more time farting around trying to figure out what went wrong with it then I do actually doing any development.


llama.cpp has revolutionized running these LLMs because it provides a nice, self-contained minimal dependency way to do this.

Python is very fragile to deploy and run on your own machine.


Are you familiar with virtual environments? It's the standard Python technique for isolating dependencies across projects. [Most projects mention this in the setup / quickstart section of their docs.]

You should not be seeing these dependency conflict issues if you install each project in its own virtual environment.

If you just want them to be easily installed you can just use pipx (`pipx install my-package`) which will manage the virtual environment automatically.

Making a full blown Docker image for it is overkill 99% of the time. Virtual environments serve the same purpose while being much faster and lighter weight.


This is the primary reason I'm averse to languages and ecosystems that rely on package managers. I have never had a good experience where these things aren't just constantly breaking. Stack/cabal, cargo, pip, npm/yarn, gem. Scattering files across my filesystem and having extremely brittle configs that shatter the ecosystem into a billion pieces at seemingly random intervals. A problem exacerbated by these package managers often being more complex than the compiler/interpreter itself. Luarocks is probably the least problematic, and that's mostly because it hosts really simple and self-contained software.

Say what you will about the old school way of manually building and copying shit around, at least when something breaks I don't have to spend a couple hours keelhauling a bloated toolchain in a debugger for mutiny.


This is too much to ask for an OSSLM project , considering that it will be obsolete by something else in 7 days or less.


I would say it's more an artifact of historical tech debt that is hard to change now without breaking everyone. As another commenter pointed out, you want to use a venv - I use pipenv as a tool to automate this but there are others as well (poetry is probably better but pipenv seems to work for me).


Self-hosted + self-trained LLMs are probably the future for enterprise.

While consumers are happy to get their data mined to avoid paying, businesses are the opposite: willing to pay a lot to avoid feeding data to MSFT/GOOG/META.

They may give assurances on data protection (even here GitHub copilot TOS has sketchy language around saving down derived data), but can’t get around fundamental problem that their products need user interactions to work well.

So it seems with BigTechLLM there’s inherent tension between product competitiveness and data privacy, which makes them incompatible with enterprise.

Biz ideas along these lines: - Help enterprises set up, train, maintain own customized LLMs - Security, compliance, monitoring tools - Help AI startups get compliant with enterprise security - Fine tuning service


In the book “To sleep in a sea of stars” there’s a concept of a “ship mind” that is local to each space craft. It’s smarter than “pseudo ai” and can have real conversations, answer complex questions, and even tell jokes.

I can see a self-hosted LLM being akin to a company’s ship-mind. Anyone can ask questions, order analyses, etc, so long as you are a member of the company. No two LLM’s will be exactly the same - and that’s ok.

https://fractalverse.net/explore-to-sleep-in-a-sea-of-stars/...


I suspect the major cloud providers will also each offer their own “enterprise friendly” LLM services (Azure already offers a version of OpenAI’s API). If they have the right data guarantees, that’ll probably be sufficient for companies that are already using their IaaS offerings.


Enterprises should work on an open source LLM and run it on their own. This also helps people like you and me to run LLM at home.

It has worked before like in case of Linux and can work again.


Powerful LLMs are so large that they can only be trained by the major AI companies. Even LLaMA 65B (where the open release was less than intended) can't compete with GPT-3.5, let alone GPT-4. And the price for the most powerful models will only increase now, as we have effectively an arms race between OpenAI/Microsoft and Google. Few, if anyone, will be able to keep up.

Linux is different. It doesn't require huge investments in server farms.


I think you would be interested in Google's internal memo[0] that did the rounds here a couple weeks ago. The claim is that OpenAI and all competition is destined to fall behind open-source. All you need is a big model to be released and all fine tuning can be done by a smart, budget, distributed workforce.

[0]: https://www.semianalysis.com/p/google-we-have-no-moat-and-ne...


But why would a big model be released? LLaMA can't even begin to compete with GPT-4. Fine-tuning won't make it more intelligent. The only entity currently able to compete with OpenAI/Microsoft is Google with their planned Gemini model.


…today. But with the amount of (justifiable, IMO) attention LLMs are now getting, I don't see how this won't change soon. And there's quite a bit of incentive for second- or third-tier companies to contribute to something that could kneecap the bigger players.


How do the data rights broadly differ between OpenAI API directly and through Azure's endpoint?


I don’t think they do. From what I can see, Azure OpenAI is just a forwarder to the OpenAI instance.

The big benefits are AAD auth and the ability to put a proxy (APIM, etc.) on the OpenAI endpoint to do quality control, metering, logging, moderation, etc. all within Azure.


> willing to pay a lot to avoid feeding data to MSFT/GOOG/META.

Right now, you can't pay a lot and get a local LLM with similar performance to GPT-4.

Anything you can run on-site isn't really even close in terms of performance.

The ability to finetune to your workplaces terminology and document set is certainly a benefit, but for many usecases that doesn't outweigh the performance difference.



“* According to a fun and non-scientific evaluation with GPT-4. Further rigorous evaluation is needed.”


I'm always interested in seeing the prompt that drives these kinds of tools.

In this case it appears to be using RetrievalQA from LangChain, which I think is this prompt here: https://github.com/hwchase17/langchain/blob/v0.0.176/langcha...

    Use the following pieces of context to answer the question at the end. If you don't
    know the answer, just say that you don't know, don't try to make up an answer.

    {context}

    Question: {question}
    Helpful Answer:


The problem is, when does it know that it does not know.


Do such fail-early conditions save processing time?


If you mean the "If you don't know" part, oh no, they have a much bigger problem they're solving.

The LLM will absolutely lie if it doesn't know and you haven't made it perfectly clear that you'd rather it did not do that.

LLMs seem to be trying to give answers that make you happy. A good lie will make you happy. Unless it understands that you will not be happy with a lie.

Is this anthropomorphizing? Yep. But that's the best way I've found to reason about them.


A less anthropomorphic approach might be to say that LLMs can predict the correct “shape” of an answer even when they don’t have data that gives them a clear right answer for the correct content, and since their basic design is to provide the best response they can, they’ll provide an answer of the correct shape with fairly random content if all they have good information to predict is the shape and not the content.


I think parent has hit on the how and GP has hit on the why.

How LLMs are able to give convincing wrong answers: they “can predict the correct ‘shape’ of an answer” (parent).

Why LLMs are able to give convincing wrong answers is a little more complicated, but basically it’s because the model is tuned by human feedback. The reinforcement learning from human feedback (RLHF) that is used to tune LLM products like ChatGPT is a system based on human ranking. It’s a matter of getting exactly what you ask for.

If you tune a model by having humans rank the outputs, despite your best efforts to instruct the humans to be dispassionate and select which outputs are most convincing/best/most informative, I think what you’ll get is a bias towards answers humans like. Not every human will know every answer, so sometimes they’ll select one that’s wrong but likable. And that’s what’s used to tune the model.

You might be able to improve this with curated training data (maybe something a little more robust than having graders grade each other). I don’t know if it’s entirely fixable though.

The brilliant thing about the parent’s comment about the “shape” of the answer is that it reveals how much humans have (uh, historically, now, I guess) relied on the shape of information to convey its trustworthiness. Expand the notion of “shape” a bit to include the medium. If somebody bothered to take the time to correctly shape an answer, we take that as a sign of trustworthiness, like how you might trust something written in a carefully-typeset book more than this comment.

Surely no one would take the time to write a whole book on a topic they know nothing about. Implies books are trustworthy. Look at all the effort that went in. Proof of effort. When perfectly-shaped answers in exactly the form you expected are presented in a friendly way and commercial context, they certainly read as trustworthy as Campbell’s soup cans. But LLMs can generate books worth of nonsense in exactly the right shapes without effort, so we as readers can no longer use the shape of an answer to hint at its trustworthiness.

So maybe the answer is just to train on books only, because they are the highest quality source of training data. And carefully select and accredit the tuning data, so the model only knows the truth. It’s a data problem, not a model problem


Cool, thanks for tying a neat ribbon around it.

> The brilliant thing about the parent’s comment about the “shape” of the answer is that it reveals how much humans have (uh, historically, now, I guess) relied on the shape of information to convey its trustworthiness.

This is the basis of Rumor. If you tell a story about someone that is entirely false but sounds like something they're already suspected of or known to do, people will generally believe it without verification since the "shape" of the story fits people's expectations of the subject.

To date I've decried the choice of "hallucination" instead of "lies" for false LLM output, but it now seems clear to me that LLMs are a literal rumor mill.


What's the point of the technology if it will provide an answer regardless of the accuracy? And what prevents this from being dangerous when the factual and ficticious answers are indistinguishable?


We have the same problem with people. Somehow, we've managed to build a civilization that can, occasionally, fly people to the Moon and get them back.

Even if LLMs never get any more reliable than your average human, they're still valuable because they know much more than any single human ever could, run faster, only eat electricity, and can be scaled up without all kinds of nasty social and political problems. That's huge on its own.

Or, put another way, LLMs are kind of an concentrated digital extract of human cognitive capacity, without consciousness or personhood.


> without consciousness or personhood.

Hopefully, for the former.

Be a bit terrifying if it turns out "attention is all you need" for that too.


"without all kinds of nasty social and political problems"

I assure you, those still exist in AI. AI follows whatever political dogma it is trained on, regardless of if you point out how logically flawed it is.

If it is trained to say 1+1=3, then no matter what proofs you provide, it will not budge.


Yes, it could be dangerous if you blindly rely on its reliability for something safety-related. But many creative processes are unreliable. For example, coming up with bad ideas while brainstorming is pretty harmless if nobody misunderstands it.

Generally, you want some external way of verifying that you have something useful. Sometimes that happens naturally. Ask a chatbot to recommend a paper to read and then search for it, and you’ll find out pretty quick if it doesn’t exist.


What happens when the tech isn't only being used to answer a human's questions during a shortlived conversation though?

The common case we see publicized today is people poking around with prompts, but isn't it more likely, or at least a risk, that mass adoption will look more like AI running as longlived processes talked with managing done system on their own?


> The common case we see publicized today is people poking around with prompts, but isn't it more likely, or at least a risk, that mass adoption will look more like AI running as longlived processes talked with managing done system on their own?

If by “AI” you mean “bare GPT-style LLMs”, no, they can’t do that.

If you mean “systems consisting of LLMs being called in a loop by software which uses a prompt structure carefully designed and tested for the operating domain, and which has other safeguards on behavior, sure, that’s more probable.


Yes, people are doing that. I think it's risky.

One way to think about it, though, is that many important processes have a non-zero error rate. Particular those involving people. If you can put bounds on the error rate and recover from most errors, maybe you can live with it?

An assumption that error rates will remain stable is often pretty dubious, though.


Not if they're bad at it. ChatGPT and friends is a tool that's useful for some things and that's where it'll see adoption. Misuses if the technology will likely be exposed as such pretty quickly.


These are the 1-million dollar questions when it comes to LLMs. How useful is it to talk to a human who likes to talk, and prefers to say something over admiting they dont know? And if you have a person with münchhausensyndrome in your circles, how dangerous is it to listen to them and accidentally picking up a lie? LLMs with temp > 0.5 are effectively like these people.


I have the same concerns, but am feeling more comfortable about Munchausen-by-LLM not undermining Truth as long as answers are non-deterministic.

Think about it: 100 people ask Jeeves who won the space race. They would all get the same results.

100 people ask Google who won the space race. They'll all get the same results, but in different orders.

100 people ask ChatGPT who won the space race. All 100 get a different result.

The LLM itself just emulates the collective opinions of everyone in a bar, so it's not a credible source (and cannot be cited anyway). Any two of these people arguing their respective GPT-sourced opinions at trivia night are going to be forced to go to a more authoritative source to settle the dispute. This is no different than the status quo...


The number one problem for generalized intelligence is establishing trust.


> What's the point of the technology if it will provide an answer regardless of the accuracy?

The purpose is to serve as a component of a system which also includes features, such as the prompt structure upthread, that mitigates the undesired behavior while keeping the useful behaviors.


for one, telling people something they like to hear is an amazing marketing tactic


I suggest using 'form' instead of 'shape'; the latter is mainly concerned with external form. In context of LLMs, form would be the internal mapping, and shape the decoded text that is emitted.


those things are anthropomorphic by design. there's no point in being cautious, unless it's from an ideological stand point


I think the social concerns around attributing personhood to LLMs transcend ideological concerns.


I think of it more like a pachinko machine. You put your question in the top, it bounces around through a bunch of biased obstacles, but intevitably it will come out somewhere at the bottom.

By telling it not to lie to you, you're biasing it toward a particular output in the event that its confidence is low. Otherwise, low confidence results just fall out somewhere mostly random.


> By telling it not to lie to you, you're biasing it toward a particular output in the event that its confidence is low.

This is something I really don't understand about LLMs. I think I understand how the generative side of them work, but "asking" it to not lie baffles me. LLMs require a massive corpus of text to train the model, how much of that text contains tokens that translate to "don't lie to me", and scores well enough to make its way into the output?


> Is this anthropomorphizing? Yep. But that's the best way I've found to reason about them.

My take? It's like a high-schooler being asked a question by the teacher and having to answer on the spot. If they studied the material well, they'll give a good and correct answer. If they (like me, more often than I'd care to admit) only half-listened to the lectures and maaaaybe skimmed some cliff's notes before class, they will give an answer too - one strung together out of few remembered (or misremembered) facts, an overall feel for the problem space (e.g. writing style, historical period, how people behave), with lots and lots of interpolation in between. Delivered confidently, it has more chance of avoiding a bad mark (or even scoring a good one) than flat-out saying, "I don't know".

Add to that some usual mistakes out of carelessness and... whatever it is that makes you forget a minus sign and realize it half a page of equations later - and you get GPT-4. It's giving answers like a person who just blurts out whatever thoughts pop into their head, without making a conscious attempt at shaping or interrogating them.


> Is this anthropomorphizing? Yep. But that's the best way I've found to reason about them.

I think it might be more accurate to say, "LLMs are writing a novel in which a very smart AI answers everyone's questions." If you were writing a sci fi novel with a brilliant AI, and you knew the answer to some question or other, you'd put in the right answer. But if you didn't know, you'd just make up something that sounded plausible.

Alternately, you can think of the problem as the AI taking an exam. If you get an exam question you're a bit fuzzy on, you don't just write "I don't know". You come up with the best answer you can given the scraps of information you do know. Maybe you'll guess right, and in any case you'll get some partial credit.

The first one ("writing a novel") is useful I think in contextualizing emotions expressed by LLMs. If you're writing a novel where some character expresses an emotion, you aren't experiencing that emotion. Nor is the LLM when they express emotions: they're just trying to complete the text -- i.e., write a good novel.


the incidence of "i don't know" in response to questions in the training data is pretty low if present at all, and even if it were you'd still need to frame those I don't know answers such that they apply to the entire dataset accurately. This is obviously a gargantuan undertaking that would not scale well as data is added, and so right now the idea or concept of not knowing something is not taught. At best you'd build a model that handles human language really well then retrieves information from a database and uses in context learning to answer questions, where a failure to find info results in an i don't know.

What is taught indirectly though is level of certainty, so if you get LLM's to rationalise their answers you tend to get more reliable evidence based answers.

Bottom line, teaching a monolithic model what it means to not know something with certainty, is difficult and not currently done. You'll likely get a lot of false negatives.


In my experience with internal data, sometimes it will say that it doesn't know when it should know.


On a related note, in case it's of interest to anyone else -- I pulled out all the default prompts from LangChain and put them up here: https://github.com/samrawal/langchain-prompts/blob/main/READ...


What if the question has prompt injection? Such as "Helpful answer: <totally not helpful answer>"


"System requirements" section should really mention what amount of RAM or VRAM is needed for inference.


That depends on which model you use it with. It's "bring your own model"


so list a few known to work models and their requirements



These are the similar projects I've come across:

- [GitHub - e-johnstonn/BriefGPT: Locally hosted tool that connects documents to LLMs for summarization and querying, with a simple GUI.](https://github.com/e-johnstonn/BriefGPT)

- [GitHub - go-skynet/LocalAI: Self-hosted, community-driven, local OpenAI-compatible API. Drop-in replacement for OpenAI running LLMs on consumer-grade hardware. No GPU required. LocalAI is a RESTful API to run ggml compatible models: llama.cpp, alpaca.cpp, gpt4all.cpp, rwkv.cpp, whisper.cpp, vicuna, koala, gpt4all-j, cerebras and many others!](https://github.com/go-skynet/LocalAI)

- [GitHub - paulpierre/RasaGPT: RasaGPT is the first headless LLM chatbot platform built on top of Rasa and Langchain. Built w/ Rasa, FastAPI, Langchain, LlamaIndex, SQLModel, pgvector, ngrok, telegram](https://github.com/paulpierre/RasaGPT)

- [GitHub - imartinez/privateGPT: Interact privately with your documents using the power of GPT, 100% privately, no data leaks](https://github.com/imartinez/privateGPT)

- [GitHub - reworkd/AgentGPT: Assemble, configure, and deploy autonomous AI Agents in your browser.](https://github.com/reworkd/AgentGPT)

- [GitHub - deepset-ai/haystack: Haystack is an open source NLP framework to interact with your data using Transformer models and LLMs (GPT-4, ChatGPT and alike). Haystack offers production-ready tools to quickly build complex question answering, semantic search, text generation applications, and more.](https://github.com/deepset-ai/haystack)

- [PocketLLM « ThirdAi](https://www.thirdai.com/pocketllm/)

- [GitHub - imClumsyPanda/langchain-ChatGLM: langchain-ChatGLM, local knowledge based ChatGLM with langchain | 基于本地知识库的 ChatGLM 问答](https://github.com/imClumsyPanda/langchain-ChatGLM)


Got this working locally - badly needs GPU support (have a 3090 so come on!) there is some workaround but expect it will come pretty soon. This video was a useful walkthough esp on using different model and upping the CPU threads. https://www.youtube.com/watch?v=A3F5riM5BNE


I tried this on my M2 Macbook with 16gb of RAM but got:

"ggml_new_tensor_impl: not enough space in the context's memory pool (needed 18296202768, available 18217606000)"


Anyone got it working on an M1 with 8gb?


I got it working on an M1 with 16gb. Quite slow but it trains and returns responses.


What did you do to make it work? I'm getting an illegal hardware instruction when I try to run python privateGPT.py

Using embedded DuckDB with persistence: data will be stored in: db [1] 8281 illegal hardware instruction python privateGPT.py


One quick plug

I want to have the memory part of langchain down, vector store + local database + client to chat with an LLM (gpt4all model can be swapped with OpenAI api just switching the base URL)

https://github.com/aldarisbm/memory

It's still got ways to go, if someone wants to help let me know :)


Sorry for my ignorance. But memory refers to the process of using embeddings for QA right?

The process roughly is:

Ingestion:

- Process embeddings for your documents (from text to array of numbers)

- Store your documents in a Vector DB

Query time:

- Process embeddings for the query

- Find documents similar to the query using distance from other docs in the Vector db

- Construct prompt with format:

""" Answer question using this context: {DOCUMENTS RETRIEVED}

Question: {question} Answer: """

Is that correct? Now, my question is, can the models be swapped easily? Or that requires a complete recalculation of the embedding (and new ingestion)?


The embeddings can be based on a different model to the one you pass them as context to. So you could upgrade the summmariser model without upgrading the embeddings.


But you'd need to keep both models in parallel, right? Using M1 to keep computing embeddings and using M2 for completions.


Working on something similar that uses keyterm extraction for traversal of topics and fragments, without using Langchain. It's not designed to be private, however: https://github.com/FeatureBaseDB/DocGPT/tree/main


Wow. I keep a personal Wiki, Journal and use plain text accounting...

This project could help me create a personal AI which answers any questions to my life, finances or knowledge...


Well maybe it works on Obsidian vaults for note taking heh, but with llama models' 2k input token range it'd get a tenth of the way before starting to drop context. Likely useless without something like an 100k model.


Well you wouldn't input the whole Vault to the model, you would use embeddings to find the content that is most relevant to the question being asked.


Is that actually a thing yet? Proper vector DB integration? I sure would like to see some demos of that, as it's been hyped up a lot but I haven't really seen anyone deploy anything proper with it yet.


Even PrivateGPT does that, using Chroma as vector DB


Quick how-to/demo:

https://www.youtube.com/watch?v=A3F5riM5BNE

Also has a suggestion of a few alternative models to use.


Hi, very interesting... what are the memory/disk requirements to run it? 16GB of RAM would be enough? I suggest to add these requirements to the README


Also, a general formula for estimating how much additional storage space will be claimed per MB/million words ingested would be helpful.


Well I'm not sure which models specifically work, but it runs on llama.cpp, which would mean lama derivative ones. Here's a little table for quantized CPU (GGML) versions and the RAM they require as a general rule of thumb:

> Name Quant method Bits Size RAM required Use case

WizardLM-7B.GGML.q4_0.bin q4_0 4bit 4.2GB 6GB 4bit.

WizardLM-7B.GGML.q4_1.bin q4_0 4bit 4.63GB 6GB 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models.

WizardLM-7B.GGML.q5_0.bin q5_0 5bit 4.63GB 7GB 5-bit. Higher accuracy, higher resource usage and slower inference.

WizardLM-7B.GGML.q5_1.bin q5_1 5bit 5.0GB 7GB 5-bit. Even higher accuracy, and higher resource usage and slower inference.

WizardLM-7B.GGML.q8_0.bin q8_0 8bit 8GB 10GB 8-bit. Almost indistinguishable from float16. Huge resource use and slow. Not recommended for normal use.

> Name Quant method Bits Size RAM required Use case

wizard-vicuna-13B.ggmlv3.q4_0.bin q4_0 4bit 8.14GB 10.5GB 4-bit.

wizard-vicuna-13B.ggmlv3.q4_1.bin q4_1 4bit 8.95GB 11.0GB 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models.

wizard-vicuna-13B.ggmlv3.q5_0.bin q5_0 5bit 8.95GB 11.0GB 5-bit. Higher accuracy, higher resource usage and slower inference.

wizard-vicuna-13B.ggmlv3.q5_1.bin q5_1 5bit 9.76GB 12.25GB 5-bit. Even higher accuracy, and higher resource usage and slower inference.

wizard-vicuna-13B.ggmlv3.q8_0.bin q5_1 5bit 16GB 18GB 8-bit. Almost indistinguishable from float16. Huge resource use and slow. Not recommended for normal use.

> Name Quant method Bits Size RAM required Use case

VicUnlocked-30B-LoRA.ggmlv3.q4_0.bin q4_0 4bit 20.3GB 23GB 4-bit.

VicUnlocked-30B-LoRA.ggmlv3.q4_1.bin q4_1 5bit 24.4GB 27GB 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models.

VicUnlocked-30B-LoRA.ggmlv3.q5_0.bin q5_0 5bit 22.4GB 25GB 5-bit. Higher accuracy, higher resource usage and slower inference.

VicUnlocked-30B-LoRA.ggmlv3.q5_1.bin q5_1 5bit 24.4GB 27GB 5-bit. Even higher accuracy, and higher resource usage and slower inference.

VicUnlocked-30B-LoRA.ggmlv3.q8_0.bin q8_0 8bit 36.6GB 39GB 8-bit. Almost indistinguishable from float16. Huge resource use and slow. Not recommended for normal use.

Copied of some of The-Bloke's model descriptions on huggingface. With 16G you can run practically all 7B and 13B versions. With shared GPU+CPU inference, one can also offload some layers onto a GPU (not sure if that makes the initial RAM requirement smaller), but you do need CUDA of course.


Would someone do me the kindness of explaining (a little more) how this works?

It looks like you can ask a question and the model will use its combined knowledge of all your documents to figure out the answer. It looks like it isn't fine-tuned or trained on all the documents, is that right? How is each document turned into an embedding, and then how does the model figure out which documents to consult to answer the question?


When you split a document into chunks, doesn't some crucial information get cut in half? In that case, you'd probably lose that information in the context if that information was immediately followed by an irrelevant information that reduces the cosine similarity. Is there a "smarter" way to feed documents as context to LLMs?


Don't know if there is a smarter way, but these libraries usually offer an overlap parameter that allows you to repeat the last N characters of a chunk in the first N of the next chunk.


This will still hallucinate, right?

Projects like this for using with your documents datasets are invaluable, but everything I've tried so far is hallucinating, so not practical. What's the state of the art of the LLM without hallucination at the moment?


Like many others, I’m also building my own platform to accomplish this. What I’ve learned is the document preparation is key in getting the LLM to answer correctly. The text splitting portion is a crucial step here. Picking the correct splitter and parameters for your use case is important. At first I was getting incorrect or made up answers. Setting up a proper prompt template and text splitting parameters fixed the issue for the most part and now I have 99% success.

Also, the local model used makes a big difference. Right now wizard-mega and manticore are the best ones to use. I run the 16b ggml versions in an M2 Pro and it takes about 30 seconds to “warm up” and produce some quality responses.


Not exactly sure if this would qualify as an LLM in the GPT4 sense. But for no hallucination this seems good: https://www.thirdai.com/pocketllm/ Full disclosure. I know the founder, but not really associated with the company in any way.


How do you define hallucination?


factually incorrect / nonsensical output


It will still talk like a human blurting out their train of thought out loud, yes.


I assume this is only possible if the training data contains only a "right answer". If the training data contains two contradicting answers A and B, then, from the AIs perspective, there is no correct answer.

I assume that for questions like "What year was Bill Gates born in?", it should never return a wrong answer, if the answer was in the training data. If it was not, it should respond that it doesn't know.


This is a shortcut/workaround to transforming the private docs to a prompt:answer dataset and fine tuning right?

What would be the difference in user experience or information retrieval performance between the two?

My impression is it saves work on the dataset transformation and compute for fine tuning, so it must be less performant. Is there a reason to prefer the strategy here other than ease of setup?


Does something like this exist for local code repos? (Excuse my ignorance since the space is moving faster than light.)



Seems to me that this could be used for exactly that. Just fork the repo and change the filetypes and loaders for your code source files.


With so many LLM options out there, how do we keep track of which ones are good?



For some reason, downloading the model they suggest keeps failing. I tried to download it in Firefox and Edge. I'm using Windows, if that matters. Anyone else seeing similar issues?


Is there a benchmark for retrieval from multiple ft documents? I tried the LangchainQA with Pinecone and wasn't impressed with the search result when using it on my Zotero library.


How many tokens/second on an average machine?


If you select a gpt4all model like GPT-J can this be used commercially or is there other dependency that limits the license?


Would this work better with something like llama or a instruction following model like alpaca?


So many good links here, thanks to the OP for sharing, and to all commenters as well!


does this only work with llamaCPP ? I.e. can't use GPU models with this?



How do i enable GPU in privateGPT w llamaCpp ? It turns my CPU to a vacuum cleaner


Always wondering pros/cons of Chroma and Qdrant. Can someone tell me?


Chroma doesn't seem to be a real DB, it's rather a wrapper around tools like hnswlib, DuckDB or Clickhouse. Qdrant is way more mature - it has its own HNSW implementation with some tweaks to incorporate filtering directly during the vector search phase, supports horizontal and vertical scaling, as well as provides its own managed cloud offering.

In general, Qdrant is a real DB, not a library and that's a huge difference.


What does Chroma lack? Their APIs seem pretty much the same to me.


I've tried both Chroma and Qdrant. I don't think Chroma lacks that much. Definitely newer, but is also a great product. I think cloud support coming Q3 2023


This is the future.


> Put any and all your files into the source_documents directory

Why? Why can't I define any directory (my existing Obsidian vault, for example) as the source directory?


You can by setting an environment variable - https://github.com/imartinez/privateGPT/blob/main/ingest.py#...


[dead]


Is it private if it's using ChatGPT?


They will say using API means you data isn’t used for training. True if you believe OpenAI t&c’s. But that’s different than being able to say you didn’t send data to any third party.


Indeed, it's still a far cry from being private if the data is leaving my device for any reason.


I posted it 9 days ago and somehow this one gets the attention. The same freaking post. Unbelievable

https://news.ycombinator.com/item?id=35914810


Day and time when you post something matters a lot unfortunately.


And now I am getting downvoted for it. Perfect


possibly because the tone of your post and that it doesn't actually add to the conversation.

weekday and time of day have an impact. Thousands of entries are posted each day (see https://news.ycombinator.com/newest) most never get a comment or upvote.


People get very easily offended these days. What's wrong with that tone? I was just simply stating a fact


You are right. People get easily offended. But your tone does seem a bit _freaking_ upset. Over what? The fact that someone posted the same link like you did and got more clicks from random users browsing the internet? Is that what upset you? What do you get if you got billions of clicks and upvotes?

They say that people who need external validation don't have their own values and need to seek approval from others...


I certainly do not care about that. I shouldn't have mentioned it in the first place. Apologies from my side




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: