For the case of load balancing
as in the OP, a statement of the
problem: over time, requests for
work arrive at an Internet computer
server farm. The requests differ in
what computing resources
at the server farm are needed
and, for each resource, in how much work.
E.g., a request may need 10 GB of main
memory, 40 GB of paged virtual
memory, 3 TB of I/O to
rotating disk, and four processor
cores, all for 50 minutes.
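As a concrete sketch, such a request could be represented as a simple record. The field names here are illustrative, not from the OP:

```python
from dataclasses import dataclass

@dataclass
class Request:
    """One request for work at the server farm (illustrative fields)."""
    main_memory_gb: float    # e.g., 10 GB of main memory
    paged_memory_gb: float   # e.g., 40 GB of paged virtual memory
    disk_io_tb: float        # e.g., 3 TB of I/O to rotating disk
    cores: int               # e.g., four processor cores
    minutes: float           # e.g., all for 50 minutes

# The example request from the text:
r = Request(main_memory_gb=10, paged_memory_gb=40,
            disk_io_tb=3, cores=4, minutes=50)
```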
So, at each arrival, we have to
assign the work to a server.
When the assignment is made,
we do know the state of
each of the servers,
say, what work has been assigned
and how busy the server is.
The goal of the work is in some
sense to get the best performance
from the server farm for the loads.
There are various candidate
definitions/measures of performance,
e.g., minimize the average
elapsed time
for the work of the requests to
be done, that is, minimize the
average response time
as seen by the users/customers.
So, the work of the load
balancing is just the
assignments, one
work request at a time,
to the servers. This
assignment is made under
uncertainty, maybe about
the details of each request
and, if we want to
plan ahead, also about what
the next requests will be.
E.g., if we had some idea
that soon we would get a
request that would need
all of one server, then
maybe we should think ahead
and reserve an empty
server for that request. If
we don't reserve such a
server, then to assign
this request we might have to
keep waiting until we have
an empty server and, unless
we make an effort to have
such an empty server,
the request could
go unassigned for a long
time, thus hurting the goal.
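As a toy illustration of this reserve-or-not trade-off (all numbers made up): suppose the next request needs a whole server with probability p, keeping a server idle costs idle_cost, and making the big request wait costs wait_cost. A one-step expected-cost comparison might look like:

```python
def expected_cost(reserve, p, idle_cost, wait_cost):
    """One-step expected cost of the reserve/don't-reserve decision.

    reserve:   whether we hold an empty server now
    p:         probability the next request needs a whole server
    idle_cost: cost of keeping a server idle for one step
    wait_cost: cost if the whole-server request arrives and must wait
    """
    if reserve:
        return idle_cost        # we pay for the idle server either way
    return p * wait_cost        # we pay the wait only if the big request arrives

# Reserving wins exactly when p * wait_cost > idle_cost.
p, idle_cost, wait_cost = 0.3, 1.0, 10.0
best = min(("reserve", "don't"),
           key=lambda a: expected_cost(a == "reserve", p, idle_cost, wait_cost))
```

With these made-up numbers, not reserving has expected cost 0.3 * 10 = 3, reserving costs 1, so `best` is `"reserve"`; in practice p and the costs would come from the production data.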
So, we have to make decisions
over time under uncertainty
to achieve a goal that is
an average of the results.
Now likely we have a problem in
the field of applied math called
stochastic optimal control.
In our case the control is the freedom
we have when we make the
assignments. Stochastic
means varying with uncertainty
over time; the stochastic part
is the uncertainty about the
next requests that arrive.
The optimal part is that
we want to do as well as possible,
likely on some average measure,
only an average because we
are working with uncertainty.
Sure, if we don't like the average,
we could use the median, etc.
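A minimal sketch of the machinery, in the Bellman value-iteration style, on a deliberately tiny made-up model (two farm states, two assignment policies) rather than anything from the IBM/Kushner work:

```python
def value_iteration(n_states, n_actions, transition, cost, gamma=0.9, iters=500):
    """Bellman value iteration for a tiny finite MDP (illustrative sketch).

    transition(s, a) -> [(prob, next_state), ...]
    cost(s, a)       -> immediate cost
    Minimizes long-run expected discounted cost, the 'average measure'
    in the text.
    """
    V = [0.0] * n_states
    for _ in range(iters):
        V = [min(cost(s, a) + gamma * sum(p * V[t] for p, t in transition(s, a))
                 for a in range(n_actions))
             for s in range(n_states)]
    return V

# Toy model: state 0 = farm lightly loaded, state 1 = heavily loaded.
# Action 0 ("careful assignment") keeps the farm lightly loaded more
# often than action 1 ("careless assignment"); heavy load costs 1/step.
def transition(s, a):
    return [(0.8, 0), (0.2, 1)] if a == 0 else [(0.5, 0), (0.5, 1)]

def cost(s, a):
    return float(s)

V = value_iteration(2, 2, transition, cost)
```

The point of the sketch: the control (which action to take in each state) is chosen to minimize an expected, i.e., average, cost over time. The curse is that a real server farm has astronomically many states, which is one plausible reading of why the computations in the literature were too large to be practical.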
IIRC there was some work on
this problem by some especially
diligent researchers at the IBM
Watson lab and based in part on
work of H. Kushner at the
Division of Applied Mathematics
at Brown University.
IIRC the good news was that those
researchers made good
progress. IIRC the bad news was
that the computations for
their solution were a bit
too large to be practical!
Also, from the references I
saw them using, the math
prerequisites for their
work were a bit much!
So, maybe good work for
a solution is in the literature
but too difficult to use.
Maybe.
My suggestion: Be practical.
As a first cut, assign the next
request for work to the least
busy server. If the server farm
is so busy that often all the
servers are too busy for more
work, then we have to leave requests
in a FIFO queue waiting until
some server is not too busy
for the first request in the queue.
Collect empirical, production
data on this control: the FIFO
queue, the response times, any
problems, etc. If we see no significant
problems, then declare the problem
solved for now. Else focus on where
the problems are and look
for a control that also handles
those problems.
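A minimal sketch of this first-cut control, assuming a single "busyness" number per server and a simple capacity threshold (the class and method names are mine, not from the OP):

```python
from collections import deque

class LeastBusyBalancer:
    """First-cut control: assign each request to the least busy server;
    if every server is too busy for it, hold requests in a FIFO queue."""

    def __init__(self, n_servers, capacity):
        self.load = [0.0] * n_servers   # current busyness of each server
        self.capacity = capacity        # "too busy" threshold
        self.queue = deque()            # FIFO of waiting (request_id, work)

    def submit(self, request_id, work):
        """A new request arrives; returns any (request_id, server) assignments."""
        self.queue.append((request_id, work))
        return self._drain()

    def finish(self, server, work):
        """A request of size `work` completes on `server`; retry the queue."""
        self.load[server] -= work
        return self._drain()

    def _drain(self):
        """Assign queued requests, in FIFO order, while some server has room."""
        assigned = []
        while self.queue:
            request_id, work = self.queue[0]
            server = min(range(len(self.load)), key=self.load.__getitem__)
            if self.load[server] + work > self.capacity:
                break    # even the least busy server is too busy; keep waiting
            self.queue.popleft()
            self.load[server] += work
            assigned.append((request_id, server))
        return assigned
```

E.g., with two servers of capacity 10, three requests of size 6 fill both servers and leave the third queued; when one server finishes, the queued request is assigned. Logging each assignment and queue wait would give the empirical, production data suggested above.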
This material about stochastic
optimal control was heavily from
R. Bellman. Other authors include
W. Fleming, D. Bertsekas, S. Shreve,
E. Dynkin, R. Rockafellar, and R.
Wets. The field has been of
interest to parts of applied math
(e.g., Brown), operations research,
and systems and electronic
engineering.
This load balancing at a server
farm contains as a special
case some of the topics
in queuing theory.
Computer science has long
been interested in the
problem of job scheduling
on batch computer systems
which, of course, is a
relatively small special
case of load balancing
as in the OP. IIRC the usual
result of the job scheduling
work was that
the problem was NP-complete
and otherwise in practice
more work to do than
the work being scheduled!
Maybe now computer science
will also be interested in
some stochastic
optimal control for
load balancing.