Hacker News

I have collected so much information in text files on my computer that it has become unmanageable to find anything. Now, with local AI solutions, I wondered if I could build a smart search engine that answers questions from my personal data.

My questions are:

1 - Even if there is so much data that I can no longer find stuff, how much text data is needed to train an LLM that works OK? I'm not after an AI that can answer general questions, only one that can answer with what I already know exists in the data.

2 - I understand that the more structured the data is, the better, but how important is structure when training an LLM? Does it mostly just figure things out on its own anyway?

3 - Any recommendations on where to start: how to run an LLM locally and train it on your own data?




/r/localllama is probably the place where you want to ask your questions. They are very up to date, and there are lots of good recommendations there.
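For what it's worth, a common recommendation for this use case is to skip training entirely and do retrieval over your files instead, since you only want answers grounded in data you already have. As a rough illustration of the retrieval half (not any specific product's method), here is a minimal sketch that indexes a folder of `.txt` files with TF-IDF and returns the most relevant files for a query; it assumes Python with scikit-learn installed, and a local LLM could then be prompted with the retrieved text.

```python
# Minimal sketch: rank personal text files by relevance to a query
# using TF-IDF, with no model training involved. Assumes scikit-learn
# is installed; the function names here are illustrative.
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_index(notes_dir):
    """Read every .txt file in notes_dir and build a TF-IDF index."""
    paths = sorted(Path(notes_dir).glob("*.txt"))
    texts = [p.read_text(encoding="utf-8", errors="ignore") for p in paths]
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(texts)
    return paths, vectorizer, matrix


def search(query, paths, vectorizer, matrix, top_k=3):
    """Return the top_k (filename, score) pairs most similar to the query."""
    q = vectorizer.transform([query])
    scores = cosine_similarity(q, matrix).ravel()
    ranked = scores.argsort()[::-1][:top_k]
    return [(paths[i].name, float(scores[i])) for i in ranked]
```

In practice you would split large files into smaller chunks before indexing, and many people swap TF-IDF for embedding vectors, but the overall shape (index once, retrieve per query, feed the hits to a local model) stays the same.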



