I’m struggling to understand the point of this. It appears to be a more simplifi...

operator-name · 2024-02-13T17:43:56 1707846236

This is a tech demo for TensorRT, which is ment to greatly improve inference time for compatible models.

brucethemoose2 · 2024-02-13T23:53:04 1707868384

> the more technical users will leverage llama.cpp to run whatever models they are interested in.

Llama.cpp is much slower, and does not have built-in RAG.

TRT-LLM is a finicky deployment grade framework, and TBH having it packaged into a one click install with llama index is very cool. The RAG in particular is beyond what most local LLM UIs do out-of-the-box.

dkarras · 2024-02-13T18:54:25 1707850465

>It appears to be a more simplified way of getting a local LLM running on your machine

No, it answers questions from the documents you provide. Off the shelf local LLMs don't do this by default. You need a RAG stack on top of it or fine tune with your own content.

westurner · 2024-02-13T21:31:45 1707859905

From "Artificial intelligence is ineffective and potentially harmful for fact checking" (2023) https://news.ycombinator.com/item?id=37226233 : pdfgpt, knowledge_gpt, elasticsearch :

> Are LLM tools better or worse than e.g. meilisearch or elasticsearch for searching with snippets over a set of document resources?

> How does search compare to generating things with citations?

pdfGPT: https://github.com/bhaskatripathi/pdfGPT :

> PDF GPT allows you to chat with the contents of your PDF file by using GPT capabilities.

GH "pdfgpt" topic: https://github.com/topics/pdfgpt

knowledge_gpt: https://github.com/mmz-001/knowledge_gpt

From https://news.ycombinator.com/item?id=39112014 : paperai

neuml/paperai: https://github.com/neuml/paperai :

> Semantic search and workflows for medical/scientific papers

RAG: https://news.ycombinator.com/item?id=38370452

Google Desktop (2004-2011): https://en.wikipedia.org/wiki/Google_Desktop :

> Google Desktop was a computer program with desktop search capabilities, created by Google for Linux, Apple Mac OS X, and Microsoft Windows systems. It allowed text searches of a user's email messages, computer files, music, photos, chats, Web pages viewed, and the ability to display "Google Gadgets" on the user's desktop in a Sidebar

GNOME/tracker-miners: https://gitlab.gnome.org/GNOME/tracker-miners

src/miners/fs: https://gitlab.gnome.org/GNOME/tracker-miners/-/tree/master/...

SPARQL + SQLite: https://gitlab.gnome.org/GNOME/tracker-miners/-/blob/master/...

https://news.ycombinator.com/item?id=38355385 : LocalAI, braintrust-proxy; promptfoo, chainforge, mixtral

qdequelen · 2024-02-15T05:59:02 1707976742

> Are LLM tools better or worse than e.g. meilisearch or elasticsearch for searching with snippets over a set of document resources?

Absolutely worse, LLM are not made for it at all.

fortran77 · 2024-02-13T17:25:02 1707845102

It seems really clear to me! I downloaded it, pointed it to my documents folder, and started running it. It's nothing like the "AI built into Windows" and it's much easier than dealing with rolling my own.

SirMaster · 2024-02-13T17:02:38 1707843758

This lets you run Mistral or Llama 2, so whomever has an RTX card and wants to run either of those models?

And perhaps they will add more models in the future?

pquki4 · 2024-02-13T21:54:52 1707861292

I don't think your comment answers the question? Basically, those who bother to know underlying model's name can already run their model without this tool from nvidia?

ls612 · 2024-02-14T05:51:02 1707889862

It will run a lot faster by using the tensor (Ray Tracing) cores than the standard CUDA cores.

McAtNite · 2024-02-13T17:09:23 1707844163

I suppose I’m just struggling to see the value add. Ollama already makes it dead simple to get a local LLM running, and this appears to be a more limited vendor locked equivalent.

From my point of view the only person who would be likely to use this would be the small slice of people who are willing to purchase an expensive GPU, know enough about LLMs to not want to use CoPilot, but don’t know enough about them to know of the already existing solutions.

kkielhofner · 2024-02-13T17:28:55 1707845335

With all due respect this comment has fairly strong (and infamous) HN Dropbox thread vibes.

It's an Nvidia "product", published and promoted via their usual channels. This is co-sign/official support from Nvidia vs "Here's an obscure name from a dizzying array of indistinguishable implementations pointing to some random open source project website and Github repo where your eyes will glaze over in seconds".

Completely different but wider and significantly less sophisticated audience. The story link is on The Verge and because this is Nvidia it will also get immediately featured in every other tech publication, website, subreddit, forum, twitter account, youtube channel, etc.

This will get more installs and usage in the next 72 hours than the entire Llama/open LLM ecosystem has had in its history.

McAtNite · 2024-02-13T17:41:15 1707846075

Unfortunately I’m not aware of the reference to the HN Dropbox thread.

I suppose my counter point is only that the user base that relies on simplified solutions is largely already addressed with the wide number of cloud offerings from OpenAi, Microsoft, Google, whatever other random company has popped up. Realistically I don’t know if the people who don’t want to use those, but also don’t want to look at GitHub pages is really that wide of an audience.

You could be right though. I could be out of touch with reality on this one, and people will rush to use the latest software packaged by a well known vendor.

thecal · 2024-02-13T17:54:09 1707846849

It is probably the most famous HN comment ever made and comes up often. It is a dismissive response to Dropbox years ago:

https://news.ycombinator.com/item?id=9224

McAtNite · 2024-02-13T18:22:42 1707848562

Thanks for the explanation. I guess my only hope for not looking like I had a bad opinion is people’s intertia to move beyond CoPilot.

anonymousab · 2024-02-13T18:22:06 1707848526

> the user base that relies on simplified solutions is largely already addressed

There is a wide spectrum of users for which a more white-labelled locally-runnable solution might be exactly what they're looking for. There's much more than just the two camps of "doesn't know what they're doing" and "technically inclined and knows exactly what to do" with LLMs.

pquki4 · 2024-02-13T22:00:19 1707861619

Anyone who bothers to distinguish a product from Microsoft/nvidia/meta/someone else already know what they are doing.

Most users don't care whether whether the model is run, online or local. They go to ChatGPT or Bing/Copilot to get answers, as long as they are free. Well, if it becomes a (mandatory) subscription, they are more likely to pay for it rather than figure out how to run a local LLM.

Sounds like you are the one who's not getting the message.

So basically the only people who runs a local LLM are those who are interested enough in this. Any why would brand name matter? What matters is whether a model is good, whether it can run on a specific machine and how fast it is etc, and there are objectives for it. People who run local LLM don't automatically choose Nvidia's product over something just because nvidia is famous.

kkielhofner · 2024-02-14T15:17:04 1707923824

I'll try again.

Have you ever tried to use ChatGPT alone to work with documents? In terms of the free/ready to use product it's very painful. Give it a URL to a PDF (or something) and assuming it can load it (often can't) you can "chat" with it. One document at a time...

This is for the (BIG) world of Nvidia Windows desktop users (most of whom are fanboys who will install anything Nvidia announces that sounds cool) who don't know what an LLM is. They certainly wouldn't know/have the inclination to wander into /r/LocalLLaMA or some place to try to sort through a bunch of random projects with obscure names that are peppered with jargon and references to various models they've also never heard of or know the difference between. Then the next issue is figuring out the RAG aspects, which is an entirely different challenge.

This is a Windows desktop installer that picks one of two models automatically depending on how much VRAM you have, loads them to run on your GPU using one of the fastest engines out there, and then allows you to load your own local content and interact with it in a UI that just pops up after you double-click the installer. It's green and peppered with Nvidia branding everywhere. They love it.

What the Nvidia Windows desktop users will be able to understand is "WOW, look it's using my own GPU for everything according to my process manager. I just made my own ChatGPT and can even chat with my own local documents. Nvidia is amazing!"

> why would brand name matter?

Do you know anything about humans? Brands make a HUGE difference.

> People who run local LLM don't automatically choose Nvidia's product over something just because nvidia is famous.

/r/LocalLLaMA is currently filled with people ranting and raving about this even though it's inferior (other than ease of use and brand halo) to much of the technology that has been discussed there since forever.

Again - humans spend many billions and billions of dollars choosing products that are inferior solely because of the name/brand.

Capricorn2481 · 2024-02-13T18:21:26 1707848486

I have no idea what you're talking about and am waiting for an answer to OPs question. Downloading text-generation-webui takes a minute, let's you use any model and get going. I don't really understand what this Nvidia thing adds? It seems even more complicated than the open source offerings.

I don't really care how many installs it gets, does it do anything differently or better?

kkielhofner · 2024-02-14T15:34:42 1707924882

> Downloading text-generation-webui takes a minute, let's you use any model and get going.

What you're missing here is you're already in this area deep enough to know what ooogoababagababa text-generation-webui is. Let's back out to the "average Windows desktop user who knows they have an Nvidia card" level. Assuming they even know how to find it:

1) Go to https://github.com/oobabooga/text-generation-webui?tab=readm...

2) See a bunch of instructions opening a terminal window and running random batch/powershell scripts. Powershell, etc will likely prompt you with a scary warning. Then you start wondering who ooobabagagagaba is...

3) Assuming you get this far (many users won't even get to step 1) you're greeted with a web interface[0] FILLED to the brim with technical jargon and extremely overwhelming options just to get a model loaded, which is another mind warp because you get to try to select between a bunch of random models with no clear meaning and non-sensical/joke sounding names from someone called "TheBloke". Ok... Oh yeah, what's a "model"? GGUF? GPTQ? AWQ? Exllama? Prompt format? Transformers? Tokens? Temperature? Repeat for dozens of things you're familiar with but are meaningless to them.

Let's say you somehow braved this gauntlet and get this far now you get to chat with it. Ok, what about my local documents? text-generation-webui itself has nothing for that. Repeat this process over the 10 random open source projects from a bunch of names you've never heard of in an attempt to accomplish that.

This is "I saw this thing from Nvidia explode all over media, twitter, youtube, etc. I downloaded it from Nvidia, double-clicked, pointed it at a folder with documents, and it works".

That's the difference and it's very significant.

[0] - https://raw.githubusercontent.com/oobabooga/screenshots/main...

tracerbulletx · 2024-02-13T19:01:12 1707850872

It's a different inference engine with different capabilities. It should be a lot faster on Nvidia cards. I don't have comp benchmarks for llama.cpp but if you find some compare them to this.

https://nvidia.github.io/TensorRT-LLM/performance.html https://github.com/lapp0/lm-inference-engines/

sevagh · 2024-02-13T18:39:22 1707849562

It brings more authority than "oh just use <string of gibberish from the frontpage of hn>"

Capricorn2481 · 2024-02-13T23:15:57 1707866157

That tells you how it might affect people's perception of it, not whether it's better in any way.

sevagh · 2024-02-13T23:19:31 1707866371

Sure, it's just disingenuous to pretend that authority doesn't matter.

Capricorn2481 · 2024-02-14T01:23:03 1707873783

Disingenuous to what? I'm asking what it brings someone who can already use an open source solution. I feel like you're just trying to argue for the sake of it.

SirMaster · 2024-02-13T17:13:10 1707844390

I just looked up Ollama and it doesn't look like it supports Windows. (At least not yet)

McAtNite · 2024-02-13T17:16:52 1707844612

Oh my apologies for the wild goose chase. I thought they had added support for Windows already. Should be possible to run it through WSL, but I suppose that’s a solid point for Nvidia in this discussion.

SirMaster · 2024-02-13T17:22:56 1707844976

I think there's a market for a user who is not very computer savvy who at least understands how to use LLMs and would potentially run a chat one on their GPU especially if it's just a few clicks to turn on.

dist-epoch · 2024-02-13T17:44:21 1707846261

There are developers which fail to install Ollama/CUDA/Python/create-venv/download-models on their computer after many hours of trying.

You think a regular user has any chance?

McAtNite · 2024-02-13T17:47:33 1707846453

Not really. I expect those users will just use copilot.

se4u · 2024-02-13T17:39:39 1707845979

You are forgetting about developers who may want to develop on top of something stable and with long term support. That's a big market.

McAtNite · 2024-02-13T17:43:50 1707846230

Would they not prefer to develop for CoPilot? In comparison this seems niche.

imtringued · 2024-02-14T08:22:39 1707898959

>people who are willing to purchase an expensive GPU,

Codeword for people who have hardware specialized and suitable for AI.

ribosometronome · 2024-02-13T18:43:17 1707849797

Gamers who bought an expensive card and see this advertised to them in Nvidia's Geforce app?

papichulo2023 · 2024-02-13T16:31:51 1707841911

Does windows uses the pc's gpu or just cpu or cloud?

robotnikman · 2024-02-13T16:54:01 1707843241

If they are talking about the Bing AI, just using whatever OpenAI has in the cloud

McAtNite · 2024-02-13T17:02:23 1707843743

I’m referring to CoPilot which for your average non technical user who doesn’t care whether something is local or not has the huge benefit of not requiring the purchase an expensive GPU.

zamadatix · 2024-02-13T17:09:31 1707844171

Never underestimate people's interest in running something which lets them generate crass jokes about their friends or smutty conversation when hosted solutions like CoPilot could never allow such non-puritan morals. If this delivers on being the easiest way to run local models quickly then many people will be interested.

joenot443 · 2024-02-13T19:04:55 1707851095

The immediate value prop here is the ability to load up documents to train your model on the fly. 6mos ago I was looking for a tool to do exactly this and ended up deciding to wait. Amazing how fast this wave of innovation is happening.

seydor · 2024-02-13T16:54:31 1707843271

Windows users who haven't bought an Nvidia card yet