Sure!

So in many, if not most, contemporary Information Retrieval (IR) problems, the total document set is larger than could be explored on an interactive basis, so the data structures get laid out in such a way that, with probability somewhat better than a coin flip, you’ll find “better” documents in the “front” half. This is hand-waving a lot of detail away, so if you’d like me to go into some detail about multi-stage ranking, compact posting lists, and so on, I’m happy to do that in a subsequent comment.

But it’s a useful fiction as a model, and the key part is that there’s still “good” stuff in the “back” half: you’d like to consider everything if you had time.
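
To make the fiction concrete, here’s a toy Python version. Every name here is invented; a real engine would use something like impact-ordered posting lists rather than a sorted Python list, but the shape is the same: candidates are kept best-first by a cheap static prior, and you only ever score a prefix.

    import random

    random.seed(0)

    # Toy corpus: each doc carries a cheap static quality prior
    # (think of a PageRank-ish signal computed offline).
    docs = [{"id": i, "quality": random.random()} for i in range(100_000)]

    # The index keeps candidates ordered best-first by that prior,
    # so a prefix scan mostly sees the "better" documents.
    docs.sort(key=lambda d: d["quality"], reverse=True)

    def retrieve(depth):
        # Score only the first `depth` candidates; everything past
        # that is the "good stuff in the back half" we never reach.
        return docs[:depth]

    shallow = retrieve(1_000)    # cheap, still mostly good docs
    deep = retrieve(50_000)      # better coverage, but costs latency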

A PID controller (again oversimplifying a bit) is charged with one primary task: given some observed quantity (the temperature in a room) and some controlled quantity (how hard to run the AC), keep the observed quantity as close to a target as possible by manipulating the controlled quantity.

If you hook one of these things up to an IR/search system (web search, friend search, eligible ads, you name it) where the observed quantity is e.g. the p95 or p99.9 latency of the retrieval, and the controlled quantity is how “deep” to go into the candidate set, something magical happens: you always do something close to your best even as the macro load on the system varies.
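
To make that concrete, here’s a rough Python sketch of the hookup. Everything in it is invented for illustration: measure_p99_ms and set_retrieval_depth are hypothetical hooks into your telemetry and your engine, and the gains are untuned placeholders, so read it as pseudocode that happens to parse.

    import time

    class PID:
        # Textbook discrete-time PID: out = kp*e + ki*sum(e*dt) + kd*de/dt.
        def __init__(self, kp, ki, kd):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.prev_error = None

        def update(self, error, dt):
            self.integral += error * dt
            derivative = (0.0 if self.prev_error is None
                          else (error - self.prev_error) / dt)
            self.prev_error = error
            return (self.kp * error + self.ki * self.integral
                    + self.kd * derivative)

    TARGET_P99_MS = 250.0                 # latency budget to hold
    MIN_DEPTH, MAX_DEPTH = 1_000, 1_000_000
    pid = PID(kp=50.0, ki=5.0, kd=0.0)    # untuned placeholder gains
    depth = 10_000                        # how deep into the candidate set

    while True:
        observed = measure_p99_ms()       # hypothetical telemetry hook
        error = TARGET_P99_MS - observed  # positive means headroom to go deeper
        depth += int(pid.update(error, dt=1.0))
        depth = max(MIN_DEPTH, min(depth, MAX_DEPTH))
        set_retrieval_depth(depth)        # hypothetical knob on the engine
        time.sleep(1.0)

The clamp matters in practice: you never want the controller to wind depth down to nothing during an outage, or up to “everything” during a lull.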

That’s again a pretty oversimplified (to the point of minor technical inaccuracies) TLDR, but I think it makes the important point.

If you’d like more depth feel free to indicate that in a comment and I’ll do my best.




So basically you're using PID as a metaheuristic to guide an optimization process? (like particle swarm, simulated annealing, genetic algorithms, etc)

Or is it more like it's being used to fine-tune the parameters of another search procedure? And in principle you could use a neural net rather than PID (one that would encode a surface matching the latency vs. search depth profile that you want).

edit: just figured out those two are the same thing; the PID is optimizing the search depth for a given target latency


The problem can be stated as follows:

Searching over a larger number of documents (denoted N) consumes more resources and increases the overall latency of the search infrastructure. However, the amount of traffic ebbs and flows. During periods of lower traffic, we can likely increase N to provide good search results without violating latency constraints. Conversely, high-traffic periods likely require lower values of N.

Let's then approximate system strain by p99 (or p99.XXX) latency.

Solution:

Use a PID controller to set N as a function of the cluster's latency (p99, p99.5, etc.). This leads to the outcome where N decreases when p99 latency starts to spike (resource starvation) and increases when p99 is low.
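
Here's a toy simulation of that outcome (a made-up latency model and a PI-only controller, nothing like a real cluster, but it shows N shrinking through a traffic spike and recovering afterwards):

    # Fake model: p99 grows with both traffic and N; the controller
    # nudges N so p99 tracks a 200ms target. All numbers are invented.
    TARGET, KP, KI = 200.0, 40.0, 8.0
    n, integral = 50_000, 0.0
    for step in range(30):
        traffic = 2.0 if 10 <= step < 20 else 1.0  # load spike mid-run
        p99 = 50.0 + traffic * n / 400.0           # toy latency model
        error = TARGET - p99                       # positive means headroom
        integral += error
        n = int(max(5_000, min(100_000, n + KP * error + KI * integral)))
        print(f"step={step:2d} traffic={traffic} p99={p99:6.1f}ms N={n}")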


Really interested; I'd love it if you could go into more detail about multi-stage ranking, compact posting lists and stuff.


Ah, now I have the context. Thanks again for replying.



