Improving recommendation systems and search in the age of LLMs

x1xx · 2025-03-23T10:41:03 1742726463

> Spotify saw a 9% increase in exploratory intent queries, a 30% rise in maximum query length per user, and a 10% increase in average query length—this suggests the query recommendation updates helped users express more complex intents

To me it's not clear that it should be interpreted as an improvement: what I read in this summary is that users had to search more and to enter longer queries to get to what they needed.

rorytbyrne · 2025-03-23T11:04:02 1742727842

We would need to normalise query length by the success rate to draw any informative conclusions here. The rate of immediate follow-up queries could be a decent proxy for this.

Traubenfuchs · 2025-03-23T11:22:35 1742728955

> a 9% increase in exploratory intent queries

Users struggle to find the right stuff or stuff that‘s so good they don‘t need do do more queries.

> a 30% rise in maximum query length per user, and a 10% increase in average query length

Users need to execute more complex queries to find what they are looking for.

RicoElectrico · 2025-03-23T13:08:34 1742735314

I can understand tracking metrics for performance (as in speed, server load) or revenue. But I don't see how anyone could make such conclusions as they did with a straight face, apart from achieving some OKR for promotion reasons. There's no substitute for user research, focused mindset and good taste.

I can imagine that's why today's apps suck so much as most of the pain points won't be easily caught by user behavior metrics.

One thing Alex from Organic Maps taught me is how important it is to just listen to your users. Many of the UX improvements were driven by addressing complaints from e-mail feedback.

RamblingCTO · 2025-03-24T09:18:15 1742807895

100%. I've switched over to Apple Music because you really feel that they are pushing public playlists. Search terms maximize their playlists vs mine. I now had to go to my library to find my playlists because they wouldn't even show up.

singron · 2025-03-24T15:53:40 1742831620

This is a hard problem. We had similar issues evaluating success with real users. In the literature, there is "abandonment" (i.e. I couldn't find what I wanted and gave up) and "positive abandonment" (I got what I wanted from the SERP and didn't click on anything). A flurry of requests might be a series of positive abandonment, a natural fruitful process of refining the request, or rage querying where the user repeatedly fails to correct a model that is incapable of understanding the query. It's especially devious if they rage query for a while before switching to an easier task and succeeding (e.g. clicking a result) since you might count that whole interaction as positive when it was really quite negative.

barrenko · 2025-03-24T07:20:06 1742800806

People are just more and more used to interacting with an LLM / GPT, I think that's the why of the long questions + yes, people are not finding what they need.

1oooqooq · 2025-03-23T12:33:40 1742733220

that's what you get when you have a "search pm".

braiamp · 2025-03-23T19:29:34 1742758174

Yeah, this should be evaluated in a multivariate/bivariate model. Of the successful queries, how the length changed before and after interventions.

wildrhythms · 2025-03-23T17:14:05 1742750045

No you don't understand, more queries = more engagement!

MostlyStable · 2025-03-23T19:35:18 1742758518

It's relatively easy to construct a scenario where more search is in fact indicative of better search. To stick with Spotify: let's imagine they have an amazing search tool that consistently finds new, interesting music that the user genuinely likes. I can imagine that in that situation, users are going to search more, because doing so consistently gets them new, enjoyable music.

But the opposite is equally possible: a terrible search tool could regularly fail to find what the user is looking for or produce music that they enjoy. In this situation, I can also imagine users searching more, because it takes more search effort to find something they like.

They key is why are users searching. In Spotify's case I imagine that you could try and connect number of searches per listen, or how often a search results in a listen and how often those listens result in a positive rating. There are probably more options, but there needs to be some way of connecting the amount of search with how the user feels about those search results.

And yeah, using nothing other than search volume is probably a bad way to go about it

cco · 2025-03-24T00:04:04 1742774644

Or more saves and thumbs up on signs resulting from a search is because users are desperate to save a song they like because they have no faith that they'll be able to find it again with search.

The only way is to use the product yourself and honestly engage with it. Stats can't answer this question.

genewitch · 2025-03-24T02:02:56 1742781776

Contextually unique searches versus contextually similar searches.

Nin, NIN, nine inch nails, Trent Reznor

VS

Nin, pantera, nail bomb, muse

This should be easy to differentiate, with a "[someone's name] distance algorithm" or such, right?

jimmyl02 · 2025-03-24T00:49:42 1742777382

I feel like understanding this difference is what a good product manager should be responsible for. Not just optimizing any metric that is available but understanding the meaning behind them and choosing the push them the right direction.

whatevertrevor · 2025-03-24T00:28:48 1742776128

But isn't that actually the point? That measuring query volume tells you nothing?

MostlyStable · 2025-03-24T01:47:41 1742780861

Yes, I was agreeing and expanding on the point.

porridgeraisin · 2025-03-24T10:01:49 1742810509

A conclusion true to the concerned executive's MBA.

novia · 2025-03-23T14:45:38 1742741138

I started listening to this article (using a text to speech model) shortly after waking up.

I thought it was very heavy on jargon. Like, it was written in a way that makes the author appear very intelligent without necessarily effectively conveying information to the audience. This is something that I've often seen authors do in academic papers, and my one published research paper (not first author) is no exception.

I'm by no means an expert in the field of ML, so perhaps I am just not the intended audience. I'm curious if other people here felt the same way when reading though.

Hopefully this observation / opinion isn't too negative.

curious_cat_163 · 2025-03-23T15:02:29 1742742149

To me, it reads like a survey paper intended for (and maybe written by) a researcher about to start a new project. I am not a researcher in this space but I have dabbled elsewhere, so it is somewhat accessible. The degree to which one leverages existing jargon in their writing is a choice, of course.

I am curious -- what would have made it more effective at conveying information to you? Different people learn differently but I wonder how people get beyond the hurdles of jargon.

novia · 2025-03-23T15:14:46 1742742886

Yeah I'm not sure if it's just me and my learning style or if researchers purposefully use terminology that's obstructive to understanding to maintain walled gardens. I don't think my reading comprehension level is particularly low!

Usually the best way to learn about things like this for me is to see some actual code or to write things myself, but the lack of coding examples in the text isn't the thing that I find troubling. I don't know, it's just.. like, excessively pointer heavy?

Maybe if you've been in the field long enough, reading a particular term will instantly conjure up an idea of a corresponding algorithm or code block or something and that's what I'm missing.

7d7n · 2025-03-23T17:04:12 1742749452

Thank you for the feedback! I'm sorry you found it jargony/less accessible than you'd like.

The intended audience was my team and fellow practitioners; assuming some understanding of the jargon allowed me to skip the basics and write more concisely.

LZ_Khan · 2025-03-23T23:49:26 1742773766

I work in the field. The amount of jargon is indeed large but it's not out of the ordinary. It's simply how things are referred to. If the author explained what everything is the content would span a textbook.

That being said I do find the content difficult to understand, and I think reading the actual papers would be much more enlightening. But it's a great survey of all the things people have done.

softwaredoug · 2025-03-23T17:35:36 1742751336

A lot of teams can do a lot with search with just LLMs in the loop on query and index side doing enrichment that used to be months-long projects. Even with smaller, self hosted models and fairly naive prompts you can turn a search string into a more structured query - and cache the hell out of it. Or classify documents into a taxonomy. All backed by boring old lexical or vector search engine. In fact I’d say if you’re NOT doing this you’re making a mistake.

syndacks · 2025-03-23T18:43:59 1742755439

Can you share more, or at least point me in the right direction?

ntonozzi · 2025-03-23T22:24:55 1742768695

One place to explore more would be Doc2Query: https://arxiv.org/abs/1904.08375.

It’s not the latest and hottest but super simple to do with LLMs these days and can improve a lexical search engine quite a lot.

jamesblonde · 2025-03-23T07:19:30 1742714370

It is very interesting that Eugene does this work and publishes it so soon after conferences. Traditionally this would be a literature survey by a PhD student and would take 12 months to come out as some obscure journal behind a walled garden. I wonder if it is an outlier (Eugene is good!) or a sign of things to come?

drodgers · 2025-03-23T11:26:08 1742729168

> a sign of things to come

Isn't this, like, a sign of what's been happening for the last 20+ years (arxiv, blogs etc.)?

jamesblonde · 2025-03-23T18:23:45 1742754225

To some extent. But it's hard to find quality. Eugene's stuff is quality. For example, i'm in distributed systems, databases, and MLOps. Murat Demirbas (Uni Buffalo) has been the best in dist systems. Andy Pavlo (CMU) for databases. Stanford (Matei) have been doing the best summarizing in MLOps.

tullie · 2025-03-23T13:29:07 1742736547

The other direction that isn’t explicitly mentioned in this post is the variants of SASRec and Bert4Rec that are still trained on ID-Tokens but showing scaling laws much like LLMs. E.g. Meta’s approach https://arxiv.org/abs/2402.17152 (paper write up here: https://www.shaped.ai/blog/is-this-the-chatgpt-moment-for-re...)

anon8764352 · 2025-03-24T06:12:06 1742796726

@7d7n Eugene / others experienced in recommendation systems: for someone who is new to recommendation systems and uses variants of collaborative filtering for recommendations, what non-LLM approach would you suggest to start looking into? The cheaper the compute (ideally without using GPUs in the first place) the better, while also maximizing the performance of the system :)

mhuffman · 2025-03-24T14:00:35 1742824835

IMHO it depends on the types of things you are recommending. If you have a good way of accurately and specifically textually classifying items it is hard to beat the performance of good old-fashioned embeddings and vector search/ANN. There are plenty of embeddings that do not need GPU like the newer LLM-based ones all crave. Word2Vec, GloVe, and FastText are all high-performance and you wouldn't need GPUs. There are plenty of vector-search libraries that are high-performance and predate the vector-db popularity of late, so also would not depend on GPUs to be high-performance. Most are memory-hungry however, so something to keep in mind. That performance, especially with the embeddings, will come at the cost of loss of some context. No free lunch.

thaumiel · 2025-03-23T12:07:08 1742731628

ah this explains why my spotify experience has gotten worse over time.

UrineSqueegee · 2025-03-23T14:00:10 1742738410

I have the exact opposite experience, recently when a playlist I have is over, I find that every recommended track that plays after, I love so much I end up putting in my playlist

thaumiel · 2025-03-23T14:57:20 1742741840

My taste in music is apparently so varied, that if I want to keep the "daily" Spotify list as I want them, I have to limit myself in variation in what I listen to, otherwise they will get too mixed up and I will not enjoy them anymore. So I use other peoples recommendations or music review sites instead to find new music/bands/artists. I tried the spotify AI dj service a couple of times, but it has not been a good experience, when it tries to push in a new direction it has never really gotten it right for me.

appleorchard46 · 2025-03-23T14:17:39 1742739459

I liked when you could make a playlist radio and do that manually. That's been removed now of course.

Melatonic · 2025-03-23T15:10:09 1742742609

On desktop I believe you can still take any of your playlists and tell it to generate a "similar" playlist. Works really well.

appleorchard46 · 2025-03-27T22:17:00 1743113820

Oops, I'm four days late but - no, it appears to have been removed there too. If I'm missing it please let me know.

whatever1 · 2025-03-23T06:16:57 1742710617

Why we don’t have an LLM based search tool for our pc / smartphones?

Specially for the smartphones all of your data is on the cloud anyway, instead of just scraping it for advertising and the FBI they could also do something useful for the user?

rudedogg · 2025-03-23T06:39:08 1742711948

This is roughly what Apple Intelligence was supposed to deliver but has yet to.

curious_cat_163 · 2025-03-23T14:45:23 1742741123

> Why we don’t have an LLM based search tool for our pc / smartphones?

I'll offer my take as an outside observer. If someone has better insights, feel free to share as well.

In market terms, I think it is because Google, Microsoft and Apple are all still trying with varied success. It has to be them because that's where a big bulk of the users are. They are all also public companies with impatient investors wanting the stock to go up into the right. So, they are both cautious about what ship to billions of devices (brand protection) and cautious about "opening up" their OS beyond that they have already done (fear of disruption).

In technical terms, it is taking a while because if the tool is going to use LLMs, then they need to solve for 99.999% of the reliability problems (brand protection) that come with that tech. They need to solve for power consumption (either on edge or in the data centers) due to their sheer scale.

So, their choices are ship fast (which Google has been trying to do more) and iterate in public; or partner with other product companies by investing in them (which Microsoft has been doing with Open AI and Google is doing with Anthropic, etc.).

Apple is taking some middle path but they just fired the person who was heading up the initiative [1] so let's see how that goes.

My two cents.

[1] https://www.reuters.com/technology/artificial-intelligence/a...

visarga · 2025-03-23T07:01:10 1742713270

I found that ChatGPT or Claude are really good at music and shopping suggestions. Just chat with them about your tastes for a while, then ask for suggestions. Compared to old recommender systems this method allows much better user guidance.

josephg · 2025-03-23T08:10:57 1742717457

Yeah, Claude helped me decide what to get my girlfriend for her birthday a few weeks ago. It suggested some great gift ideas I hadn’t thought of - and my girlfriend loved them.

Workaccount2 · 2025-03-23T13:56:15 1742738175

I think we can expect this to be rapidly monetized.

KoftaBob · 2025-03-23T14:13:55 1742739235

For shopping suggestions, I've had the best experience with Perplexity.

GraemeMeyer · 2025-03-23T11:15:13 1742728513

It's coming for PCs soon: https://www.theregister.com/2025/01/20/microsoft_unveils_win...

And to a certain extent for the Microsoft cloud experience as well: https://www.theverge.com/2024/10/8/24265312/microsoft-onedri...

wildrhythms · 2025-03-24T23:05:03 1742857503

Pixels already have the Screenshots app that indexes screenshots and makes them searchable. My assumption is the context window size is still too small for all of your data to go into it.

dmbche · 2025-03-23T06:22:13 1742710933

It doesn't solve any problem, you can just search your files using your prefered file explorer (crtl-f)

I'd assume most people organise their files so that they know where things are as well.

nine_k · 2025-03-23T06:34:06 1742711646

> you can just search your files using your prefered file explorer

This only work if you remember specific substrings. An LLM (or some other language model) can summarize and interpolate. It can be asked to find that file that mentions a transaction for buying candy, and it has a fair chance to find it, even if none of the words "transaction", "buying" or "candy" are present in the file, e.g. it says "shelled out $17 for a huge pack of gobstoppers".

> I'd assume most people organise their files

You'll be shocked, but...

dmbche · 2025-03-23T09:32:05 1742722325

But isn't that candy example non-sensical? In what situation do you need some information without any of the context(or without knowing any of the context)?

i really believe that this is not an actual problem in need of solving, but instead creating a tool (personal ai assistant) and trying to find a usecase

Edit0: note to self, rambling - assuming there exist valuable information that one needa to access in their files, but one doesn't know where it is, when it was made, it's name or other information about it(as you could find said file right away with this information).

Say you need an information for some documentation like the C standard - you need precise information on some process. Is it not much simpler to just open the doc and use the index? Then again for you to be aeare of the C standard makes the query useless.

If it's from something less well organised, say you want letters you wrote to your significant other, maybe the assistant could help. But then again, what are you asking? How hard is it to keep your letters in a folder? Or even simply know what you've done (I surely can't imagine forgetting things I've created but somehow finding use in a llm that finds it for me).

Like asking it "what is my opinion on x" or "what's a good compliment I wrote" is nonsensical to me, but asking it about external ressources makes the idea of training it on your own data pointless. "How did I write X API" - just open your file, no? You know where it is, you made it.

Like saying "get me that picture of unle tony in Florida" might save you 10 seconds instead of going into your files and thinking about when you got that picture, but it's not solving a real issue or making things more efficient. (Edit1: if you don't know Tony, when you got the picture or of what it's a picture of, why are you querying? What's the usecase for this information, is it just to prove it can be done? It feels like the user needs to contorts themselves in a small niche for this product to be useful)

Either it's used for non valuable work (menial search) or you already know how to get the answer you need.

I cannot imagine a query that would be useful compared to simply being aware of what's in your computer. And if you're not aware of it, how do you search for it?

mjlee · 2025-03-23T11:28:56 1742729336

I think your brain may just work differently to mine, and I don't think I'm unique.

> "get me that picture of unle tony in Florida" might save you 10 seconds instead of going into your files and thinking about when you got that picture

I don't have a memory for time, and I can't picture things in my mind. Thinking about when I took a picture does nothing for me, I could be out by years. Having some unified natural language search engine would be amazing for me. I might remember it was a sunny day and that we got ice cream, and that's what I want to search on.

The "small niche" use case for me is often my daughter wants to see a photo of a family member I'm talking about, or I want to remember some other aspect of the day and the photo triggers that for me.

dmbche · 2025-03-23T12:27:34 1742732854

Makes a lot of sense. Thanks for the response, enjoy your day!

IanCal · 2025-03-24T10:37:54 1742812674

> But isn't that candy example non-sensical? In what situation do you need some information without any of the context(or without knowing any of the context)?

I know the context and the content but not the specific substrings in an email I received several years ago.

Here's one of the first things that gemini in gmail actually helped with. I wanted to check when I bought a car seat for my kids, which one it was and how much it cost.

So I knew the rough time it was when I bought it, I know it's a receipt I'm looking for, it's for a child seat, and roughly when. I know the context here.

What I struggled with was finding the exact text that would be in that. There are hundreds or more emails with invoice/receipt/order in. I didn't recall exactly who I bought it from, and there are large numbers of more advertising emails with kids seats in.

I couldn't easily find it, because the actual email I wanted did not say child seat in it. It had a brand and other information, but nothing in the text had a substring I was searching for. I might have found it with "booster seat" but I didn't think of that exact phrase at the time.

Instead I asked gemini to find it. That can then trawl through a bunch of emails and find things that mean but do not say child seat.

dmbche · 2025-03-24T13:47:55 1742824075

Makes total sense, I'm not entirely sure why but I had assumed we were talking about an AI assistant being device specific, which I understood as "based on my offline data and files" - I'm saying I'm not sure why because the parent comment is specifically mentionning the cloud. Anyhoot.

Enjoy your day.

pizza · 2025-03-23T21:05:43 1742763943

Here’s an example of a type of feature I want: I’m looking at a menu from a popular restaurant and it has hundreds of choices. I start to feel some analysis paralysis. I say to my computer, “hey computer, I’m open to any suggestions, so long as it’s well-seasoned, spicy, salty, has some protein and fiber, easy to digest, rich in nutrients, not too dry, not too oily, pairs well with <whatever I have in my fridge>, etc..” Basically, property-oriented search queries whose answers can be verified, without having to trudge through them myself, where I don’t really care about correctness, just satisficing.

ozim · 2025-03-23T08:00:12 1742716812

I think the same, people are not organized - even with things that make them money and being organized could earn them much more.

whatever1 · 2025-03-23T06:31:29 1742711489

But file explorer does not read the actual files and build context. Even for pure text files that sometimes search functions can also access, I need to remember exactly the string of characters I am looking for.

I was hoping an LLM would have a context of all of my content (text and visual) and for the first time use my computers data as a knowledge base.

Queries like “what was my design file for that x service” ? Today it’s impossible to answer unless you have organized your data your self.

Why do we still have to organize our data manually?

pests · 2025-03-23T07:05:58 1742713558

The photos apps do this well now. Can search Apple/Google photos with questions about the content of images and videos and get useful results.

ozim · 2025-03-23T06:52:21 1742712741

I think you are really wrong.

Most people I see at work and outside don’t care and they want stupid machine to deal with it.

That is why smartphones and tablets move away from providing „file system” access.

It is super annoying for me but most people want to save their tax form or their baby photo not even understanding each is different file type - because they couldn’t care less about file types let alone making folder structure to keep them organized.

acchow · 2025-03-23T10:21:30 1742725290

Curiously, the things I search most often are not located in files: calendar, photo content/location, email, ChatGPT history, Spotify library, iMessage/whatsapp history, contacts, notes, Amazon order history

anthk · 2025-03-23T10:06:54 1742724414

Use 'Recoll' and learn to use search strings. For Windows users, older Recoll releases are standalone and have all the dependencies bundled, so you can search into PDF's, ODT/DOCX and tons more.

stuaxo · 2025-03-23T11:28:03 1742729283

Off topic - but I think joining recommendation systems and forums (aka all the social media that isn't bsky or fedi) has been a complete disaster for society.

anonymousDan · 2025-03-23T06:59:40 1742713180

It's interesting that none of these papers seem to be coming out of academic labs....

pizza · 2025-03-23T08:48:30 1742719710

Checking if a recommendation system is actually good in practice is kind of tough to do without owning a whole internet media platform as well. At best, you'll get the table scraps from these corporations (in the form of toy datasets/models made available), and you still will struggle to make your dev loop productive enough without throwing similar amounts of compute that the ~FAANGs do so as to validate whether that 0.2% improvement you got really meant anything or not. Oh, and also, the nature of recommendations is that they get very stale very quickly, so be prepared to check that your method still works when you do yet another huge training run on a weekly/daily cadence.

bradly · 2025-03-23T20:57:09 1742763429

> you still will struggle to make your dev loop productive enough without throwing similar amounts of compute that the ~FAANGs do so as to validate whether that 0.2% improvement you got really meant anything or not

And do not forget the incredible of number of actual humans FAANG pays every day to evaluate any changes in result sets for top x,000 queries.

lmeyerov · 2025-03-23T14:15:04 1742739304

As someone whose customers do this stuff, I'm 100% for most academics chasing harder and more important problems

Most of these papers are specialized increments on high baselines for a primarily commercial problem. Likewise, they focus on optimizing phenomena that occur in their product, which may not occur in others. Eg, Netflix sliding window is neato to see the result of, but I rather students user their freedom to explore bigger ideas like mamba, and leave sliding windows to a masters student who is experimenting with intentionally narrowly scoped tweaks.

lmeyerov · 2025-03-23T14:21:12 1742739672

As someone whose customers do this stuff, I'm 100% for most academics chasing harder and more important problems.

Most of these papers are specialized increments on high baselines for a primarily commercial problem. Likewise, they focus on optimizing phenomena that occur in their product, which may not occur in others. Eg, Netflix sliding window is neato to see the result of, but I rather students user their freedom to explore bigger ideas like mamba, and leave sliding windows to a masters student who is experimenting with intentionally narrowly scoped tweaks. At that point, to top PhD grads at industrial labs will probably win.

That said, recsys is a general formulation with applications beyond shopping carts and social feeds, and bigger ideas do come out, where I'd expect competitive labs to do projects on. GNN for recsys was a big bet a couple years ago, and LLMs now, and it is curious to me those bigger shifts are industrial labs papers as you say. Maybe the statement there is recsys is one of the areas that industry hires a lot of PhDs on, as it is so core to revenue lift: academia has regular representation, while industry is overrepresented.

memhole · 2025-03-23T14:29:22 1742740162

It looks like a great overview of recommendation systems. I think my main takeaways are:

1. Latency is a major issue.

2. Fine tuning can lead to major improvements and I think reduce latency. If I didn’t misread.

3. There’s some threshold or problems where prompting or fine tuning should be used.

a_bonobo · 2025-03-24T05:58:51 1742795931

Elicit has a nice new feature where given a research question, it seems to give the question to an LLM with the prompt to improve the question. It's a neat trick.

As an example, I gave it 'What is the impact of LLMs on search engines?' and it suggested three alternative searches under keywords, the keyword 'Specificity' has the suggested question 'How do large language models (LLMs) impact the accuracy and relevance of search engine results compared to traditional search algorithms?'

It's a really cool trick that doesn't take much to implement.

bookofjoe · 2025-03-23T14:18:38 1742739518

Perplexity Pro suggested several portable car battery chargers, which led me to search online reviews, whose consensus (five or so review sites) highest-rated chargers were the first two on Perplexity's recommendation list. In other words, the AI was an helpful guide to focused deeper search.

thorum · 2025-03-23T08:19:27 1742717967

In the age of local LLMs I’d like to see a personal recommendation system that doesn’t care about being scalable and efficient. Why can’t I write a prompt that describes exactly what I’m looking for in detail and then let my GPU run for a week until it finds something that matches?

osmarks · 2025-03-23T09:01:50 1742720510

You could just run a local LLM over every document and ask it "is this related to this query". I don't think you actually want to wait a week (and holding all the documents you might ever want to search would run to petabytes).

(the reasonable way is embedding search, which runs much faster with some precomputation, but you still have to store things)

amelius · 2025-03-23T13:54:37 1742738077

A better way would be to ask the LLM to generate keywords (or queries). And then use old school techniques to find a set of documents, and then filter those using another LLM.

brookst · 2025-03-23T14:30:43 1742740243

How is that better than embeddings? You’re using embeddings to get a finite list of keywords, throwing out the extra benefits of embeddings (support for every human language, for instance), using a conventional index, and then going back to embeddings space for the final LLM?

That whole thing can be simplified to: compute and store embeddings for docs, compute embeddings for query, find most similar docs.

amelius · 2025-03-23T14:44:17 1742741057

Yes, you can do the "old school search" part with embeddings.

brookst · 2025-03-23T16:13:21 1742746401

Ah, I had interpreted “old school search” to mean classic text indexing and Boolean style search. I’d argue that if it’s using embeddings and cosine similarity, it’s not old school. But that’s just semantics.

osmarks · 2025-03-23T20:19:11 1742761151

https://arxiv.org/abs/2212.10496

kortilla · 2025-03-23T09:57:50 1742723870

The entire library of Congress is like 10TB. You don’t need anything near petabytes until you get out of text into rich media.

osmarks · 2025-03-23T10:13:21 1742724801

Common Crawl is petabytes. Anna's Archive is about a petabyte, but it includes PDFs with images.

pizza · 2025-03-23T08:53:14 1742719994

It's worth pointing out that even with the largest models out there, coherence drops fast over length. In a local home ML setup, until somebody radically improves long-term coherence, models with < x memory may be a diametrically opposed constraint to something that still says the right thing after > y minutes of search.

whiplash451 · 2025-03-23T08:37:29 1742719049

Why would it take a week?

Is this because you want it to continuously watch for live data that could match your need?

mdp2021 · 2025-03-23T09:11:29 1742721089

Because thinking takes time.

r4ndomname · 2025-03-23T08:33:00 1742718780

This is exactly what I am hoping to get sometimes (but I would say, 1 week is maybe a little long).

If I go through my current tasks and see, that for some task I need a set of documents, emails, .., why cant I just prompt the system to get it in 30-ish minutes. But as someone already stated Apple Intelligence is supposed to fill this gap.

mdp2021 · 2025-03-23T09:13:30 1742721210

> maybe a little long

Many of us have ongoing problems pending for years - for just "a week", "where do I sign".

It really depends on the task.

bryanrasmussen · 2025-03-23T09:07:40 1742720860

this is sort of like a dream I had https://medium.com/luminasticity/the-county-map-of-the-world...

>The idea was that he could graft queries in this that he did not expect to finish quickly but which he could let run for hours or days and how freeing it was to do more advanced research this way.

fhe · 2025-03-23T08:56:47 1742720207

or it keeps monitoring the web and notify me whenever something that matches my interests shows up -- like a more sophisticated Google alert. I really would love that.

desdenova · 2025-03-23T11:26:52 1742729212

Why can't you?

Just run the biggest model you can find out of swap and wait a long time for it to finish.

You'll obviously see more focus on smaller models, because most people aren't willing to wait weeks for their slop, and also don't have server GPU clusters to run huge models.

HeatrayEnjoyer · 2025-03-23T15:15:43 1742742943

> Just run the biggest model you can find out of swap

This kills the SSD

onel · 2025-03-23T09:04:08 1742720648

Another amazing post from Eugene

anon373839 · 2025-03-23T06:59:32 1742713172

Terrific post. Just about everything Eugene writes about AI/ML is pure gold.

hackernewds · 2025-03-23T07:33:36 1742715216

haha this is some solid astroturfing Eugene :)

7d7n · 2025-03-23T17:05:07 1742749507

haha that wasn't me ;)

anon373839 · 2025-03-23T20:19:32 1742761172

Oops, sorry! Wasn’t trying to make you look bad. Just a fan of your writing.

7d7n · 2025-03-23T20:56:00 1742763360

Not at all! I appreciate the kind words. Thank you!