For the case of load balancing
as in the OP, a statement of the
problem: over time, requests for
work arrive at an Internet computer
server farm. The requests differ in
what computing resources
at the server farm are needed
and, for each resource, in how much work.
E.g., a request may need 10 GB of main
memory, 40 GB of paged virtual
memory, 3 TB of I/O to
rotating disk, and four processor
cores, all for 50 minutes.
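As a concrete sketch, such a request could be represented as a simple record. The field names here are illustrative, not from the OP:

```python
from dataclasses import dataclass

@dataclass
class Request:
    """One request for work at the server farm (illustrative fields)."""
    main_memory_gb: float    # e.g., 10 GB of main memory
    paged_memory_gb: float   # e.g., 40 GB of paged virtual memory
    disk_io_tb: float        # e.g., 3 TB of I/O to rotating disk
    cores: int               # e.g., four processor cores
    minutes: float           # e.g., all for 50 minutes

# The example request from the text:
r = Request(main_memory_gb=10, paged_memory_gb=40,
            disk_io_tb=3, cores=4, minutes=50)
```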
So, at each arrival, we have to
assign the work to a server.
When the assignment is made,
we do know the state of
each of the servers,
say, what work has been assigned
and how busy the server is.
The goal of the work is in some
sense to get the best performance
from the server farm for the loads.
There are various candidate
definitions/measures of performance,
e.g., minimize the average
elapsed time
for the work of the requests to
be done, that is, minimize the
average response time
as seen by the users/customers.
So, the work of the load
balancing is just the
assignments, one
work request at a time,
to the servers. This
assignment is made under
uncertainty, maybe about
the details of each request
and, if we want to
plan ahead, also about what
the next requests will be.
E.g., if we had some idea
that soon we would get a
request that would need
all of one server, then
maybe we should think ahead
and reserve an empty
server for that request. If
we don't reserve such a
server, then to assign
this request we might have to
keep waiting until we have
an empty server and, unless
we make an effort to have
such an empty server,
the request could
go unassigned for a long
time, thus hurting the goal.
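As a toy illustration of this reserve-or-not trade-off (all numbers made up): suppose the next request needs a whole server with probability p, keeping a server idle costs idle_cost, and making the big request wait costs wait_cost. A one-step expected-cost comparison might look like:

```python
def expected_cost(reserve, p, idle_cost, wait_cost):
    """One-step expected cost of the reserve/don't-reserve decision.

    reserve:   whether we hold an empty server now
    p:         probability the next request needs a whole server
    idle_cost: cost of keeping a server idle for one step
    wait_cost: cost if the whole-server request arrives and must wait
    """
    if reserve:
        return idle_cost        # we pay for the idle server either way
    return p * wait_cost        # we pay the wait only if the big request arrives

# Reserving wins exactly when p * wait_cost > idle_cost.
p, idle_cost, wait_cost = 0.3, 1.0, 10.0
best = min(("reserve", "don't"),
           key=lambda a: expected_cost(a == "reserve", p, idle_cost, wait_cost))
```

With these made-up numbers, not reserving has expected cost 0.3 * 10 = 3, reserving costs 1, so `best` is `"reserve"`; in practice p and the costs would come from the production data.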
So, we have to make decisions
over time under uncertainty
to achieve a goal that is
an average of the results.
Now likely we have a problem in
the field of applied math called
stochastic optimal control.
In our case the control is the freedom
we have when we make the
assignments. Stochastic
means varying with uncertainty
over time; the stochastic part
is the uncertainty about the
next requests that arrive.
The optimal part is that
we want to do as well as possible,
likely on some average measure,
only an average because we
are working with uncertainty.
Sure, if we don't like the average,
we could use the median, etc.
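A minimal sketch of the machinery, in the Bellman value-iteration style, on a deliberately tiny made-up model (two farm states, two assignment policies) rather than anything from the IBM/Kushner work:

```python
def value_iteration(n_states, n_actions, transition, cost, gamma=0.9, iters=500):
    """Bellman value iteration for a tiny finite MDP (illustrative sketch).

    transition(s, a) -> [(prob, next_state), ...]
    cost(s, a)       -> immediate cost
    Minimizes long-run expected discounted cost, the 'average measure'
    in the text.
    """
    V = [0.0] * n_states
    for _ in range(iters):
        V = [min(cost(s, a) + gamma * sum(p * V[t] for p, t in transition(s, a))
                 for a in range(n_actions))
             for s in range(n_states)]
    return V

# Toy model: state 0 = farm lightly loaded, state 1 = heavily loaded.
# Action 0 ("careful assignment") keeps the farm lightly loaded more
# often than action 1 ("careless assignment"); heavy load costs 1/step.
def transition(s, a):
    return [(0.8, 0), (0.2, 1)] if a == 0 else [(0.5, 0), (0.5, 1)]

def cost(s, a):
    return float(s)

V = value_iteration(2, 2, transition, cost)
```

The point of the sketch: the control (which action to take in each state) is chosen to minimize an expected, i.e., average, cost over time. The curse is that a real server farm has astronomically many states, which is one plausible reading of why the computations in the literature were too large to be practical.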
IIRC there was some work on
this problem by some especially
diligent researchers at the IBM
Watson lab and based in part on
work of H. Kushner at the
Division of Applied Mathematics
at Brown University.
IIRC the good news was that those
researchers made good
progress. IIRC the bad news was
that the computations for
their solution were a bit
too large to be practical!
Also, from the references I
saw them using, the math
prerequisites for their
work were a bit much!
So, maybe good work for
a solution is in the literature
but too difficult to use.
Maybe.
My suggestion: Be practical.
As a first cut, assign the next
request for work to the least
busy server. If the server farm
is so busy that often all the
servers are too busy for more
work, then we have to leave requests
in a FIFO queue waiting until
some server is not too busy
for the first request in the queue.
Collect empirical, production
data on this control: the FIFO
queue, the response times, any
problems, etc. If we see no significant
problems, then declare the problem
solved for now. Else focus on where
the problems are and look
for a control that also handles
those problems.
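A minimal sketch of this first-cut control, assuming a single "busyness" number per server and a simple capacity threshold (the class and method names are mine, not from the OP):

```python
from collections import deque

class LeastBusyBalancer:
    """First-cut control: assign each request to the least busy server;
    if every server is too busy for it, hold requests in a FIFO queue."""

    def __init__(self, n_servers, capacity):
        self.load = [0.0] * n_servers   # current busyness of each server
        self.capacity = capacity        # "too busy" threshold
        self.queue = deque()            # FIFO of waiting (request_id, work)

    def submit(self, request_id, work):
        """A new request arrives; returns any (request_id, server) assignments."""
        self.queue.append((request_id, work))
        return self._drain()

    def finish(self, server, work):
        """A request of size `work` completes on `server`; retry the queue."""
        self.load[server] -= work
        return self._drain()

    def _drain(self):
        """Assign queued requests, in FIFO order, while some server has room."""
        assigned = []
        while self.queue:
            request_id, work = self.queue[0]
            server = min(range(len(self.load)), key=self.load.__getitem__)
            if self.load[server] + work > self.capacity:
                break    # even the least busy server is too busy; keep waiting
            self.queue.popleft()
            self.load[server] += work
            assigned.append((request_id, server))
        return assigned
```

E.g., with two servers of capacity 10, three requests of size 6 fill both servers and leave the third queued; when one server finishes, the queued request is assigned. Logging each assignment and queue wait would give the empirical, production data suggested above.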
This material about stochastic
optimal control was heavily from
R. Bellman. Other authors include
W. Fleming, D. Bertsekas, S. Shreve,
E. Dynkin, R. Rockafellar, and R.
Wets. The field has been of
interest to parts of applied math
(e.g., Brown), operations research,
and systems and electronic
engineering.
This load balancing at a server
farm contains as a special
case some of the topics
in queuing theory.
Computer science has long
been interested in the
problem of job scheduling
on batch computer systems
which, of course, is a
relatively small special
case of load balancing
as in the OP. IIRC the usual
result of the job scheduling
work was that
the problem was NP-complete
and otherwise in practice
more work to do than
the work being scheduled!
Maybe now computer science
will also be interested in
some stochastic
optimal control for
load balancing.