Okay, at one point I looked into simulated
annealing, and here I just read all the
comments on this thread. So, apparently
what I have to contribute is not here yet.
Here I outline non-linear duality theory
and Lagrangian relaxation and report a
war story where this approach badly beat
some efforts with simulated annealing.
Basically non-linear duality theory has
some surprising properties commonly just
too powerful to ignore.
Basically we're considering a case of
optimization or mathematical
programming.
So, let R denote the set of real numbers,
and suppose positive integers m and n are
given. Let R^n be the usual n-dimensional
real vector space of n-tuples. To use
matrix notation, we regard each n-tuple x
in R^n as a matrix with n rows and one
column, that is, as n x 1.
Similarly for R^m.
We are given a set C, a subset of R^n, and
functions
f: R^n --> R
g: R^n --> R^m
We seek x in R^n to solve non-linear
program (NLP)
min z = f(x)
subject to
g(x) <= 0
x in C
where the 0 here is the m x 1 matrix of
zeros.
We let the feasible region F, a subset of
R^n, be
F = { x | x in C and g(x) <= 0 }
We regard g as m x 1 where for j = 1, 2,
..., m component j of g is
g_j: R^n --> R
where here g_j, borrowing TeX notation, is
g with subscript j.
Okay, given 1 x m l, we let the
Lagrangian
L( x, l ) = f(x) + l g(x)
Then for l >= 0 and x in F we have
l g(x) <= 0, so that
f(x) >= L( x, l )
and, taking x to be an optimal solution of NLP,
z = f(x) >= L( x, l ) >= min_{x in C} L( x, l )
The dual problem is to find l >= 0 to maximize
M( l ) = min_{x in C} L( x, l )
Then M( l ) <= z so that M( l ) is a lower
bound on z.
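For readers who want it all in one chain, here is the same weak-duality argument written compactly in LaTeX notation (writing lambda for l; this is only a restatement of the inequalities above):

  % Weak duality: for \lambda >= 0, M(\lambda) is a lower bound on z.
  \[
    M(\lambda) = \min_{x \in C} L(x, \lambda)
      \le \min_{x \in F} L(x, \lambda)
      = \min_{x \in F} \bigl( f(x) + \lambda\, g(x) \bigr)
      \le \min_{x \in F} f(x) = z ,
  \]
  % since F is a subset of C and, for x in F, \lambda >= 0 and
  % g(x) <= 0 give \lambda g(x) <= 0.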
In particular (H. Everett): suppose l >= 0,
x attains the minimum in
M( l ) = min_{x in C} L( x, l )
and g(x) = 0. Then
z >= M( l ) = L( x, l ) = f(x) + l g(x) = f(x)
and x in F, so that f(x) = z and x solves NLP.
A practical special case says, essentially:
if we spend all the productive resources
and do so optimally, then we are optimal.
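To make Everett's observation concrete, here is a tiny worked example of my own (not from the application below): two items with values 3 and 5, and we may pick at most one. In LaTeX notation:

  % Toy instance: C = {0,1}^2, f(x) = -3 x_1 - 5 x_2,
  % g(x) = x_1 + x_2 - 1, i.e., pick at most one item.
  \begin{align*}
    L(x, 5) &= -3 x_1 - 5 x_2 + 5 ( x_1 + x_2 - 1 ) = 2 x_1 - 5 , \\
    M(5)    &= \min_{x \in \{0,1\}^2} L(x, 5) = -5 ,
               \text{ attained at } x = (0, 1) .
  \end{align*}
  % With lambda = 5 >= 0 and g(0,1) = 0 + 1 - 1 = 0, Everett's result
  % says x = (0, 1) solves NLP, and indeed z = f(0, 1) = -5 = M(5).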
Of course, the usual notation used for l
is lambda, and for a while around DC
Everett ran Lambda Corporation which did
resource allocation problems for the US
DoD.
Theorem: M is concave.
Proof: For t in the interval [0,1] and 1 x m
l_1 and l_2, we have
M( t l_1 + (1 - t) l_2 )
= min_{x in C} L( x, t l_1 + (1 - t) l_2 )
= min_{x in C} f(x) + ( t l_1 + (1 - t) l_2 ) g(x)
= min_{x in C} ( t + (1 - t) ) f(x) + ( t l_1 + (1 - t) l_2 ) g(x)
= min_{x in C} t L( x, l_1 ) + (1 - t) L( x, l_2 )
>= t min_{x in C} L( x, l_1 ) + (1 - t) min_{x in C} L( x, l_2 )
= t M( l_1 ) + (1 - t) M( l_2 )
so that M is concave. Done.
Immediately we have supporting hyperplanes
for our concave M:
Suppose y in C and l are given with
M( l ) = L( y, l )
Then for 1 x m u,
M( l + u ) = min_{x in C} L( x, l + u )
= min_{x in C} ( L( x, l ) + u g(x) )
<= L( y, l ) + u g(y)
= M( l ) + u g(y)
so that for 1 x m u
s( u ) = M( l ) + u g(y)
is a supporting hyperplane for the concave
function M at l; that is,
M( l + u ) <= s( u ) = M( l ) + u g(y)
and s( 0 ) = M( l + 0 ) = M( l ).
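Since these two facts, concavity of M and the supporting hyperplane, do all the work in the algorithm below, here is a quick numerical sanity check of both on a tiny toy instance of my own (not from the post); it is only a sketch, with C small enough to enumerate by brute force:

  # Numerically check that M is concave and that s(u) = M(l) + u g(y)
  # supports M from above, on a small toy instance where C = {0,1}^n
  # can be enumerated.  Toy data only; nothing here is from the post.
  import itertools
  import numpy as np

  rng = np.random.default_rng(1)
  n, m = 8, 2
  c = rng.uniform(1.0, 5.0, n)          # f(x) = -c.x
  A = rng.uniform(0.0, 2.0, (m, n))     # g(x) = A x - b
  b = 0.4 * A.sum(axis=1)

  C = [np.array(x, dtype=float)
       for x in itertools.product([0, 1], repeat=n)]

  def f(x):
      return -c @ x

  def g(x):
      return A @ x - b

  def L(x, lam):
      return f(x) + lam @ g(x)

  def M(lam):
      # M(lam) = min over x in C of L(x, lam); also return a minimizer y.
      y = min(C, key=lambda x: L(x, lam))
      return L(y, lam), y

  l1 = rng.uniform(0.0, 3.0, m)
  l2 = rng.uniform(0.0, 3.0, m)

  # Concavity: M(t l1 + (1 - t) l2) >= t M(l1) + (1 - t) M(l2)
  for t in np.linspace(0.0, 1.0, 11):
      lhs, _ = M(t * l1 + (1.0 - t) * l2)
      rhs = t * M(l1)[0] + (1.0 - t) * M(l2)[0]
      assert lhs >= rhs - 1e-9

  # Supporting hyperplane: M(l + u) <= M(l) + u g(y), y attaining M(l)
  l = rng.uniform(0.0, 3.0, m)
  Ml, y = M(l)
  for _ in range(20):
      u = rng.uniform(-1.0, 1.0, m)
      assert M(l + u)[0] <= Ml + u @ g(y) + 1e-9

  print("concavity and supporting-hyperplane checks passed")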
So, here we have the basic theory of
non-linear duality as needed by the
iterative algorithmic approach of
primal-dual Lagrangian relaxation.
War Story:
Once some guys had a big resource
allocation problem having to do with
cross selling, which had suddenly become
legal for banks. They had formulated their problem
as a 0-1 integer linear program with
~40,000 constraints and 600,000 variables.
They had tried simulated annealing, ran
for days, and quit with no idea how close
they were to optimality.
I looked at their formulation and found
that, except for 16 of the constraints,
the problem was comparatively easy.
So, I wrote out
L( x, l ) = f(x) + l g(x)
where l was 1 x 16 and g represented the
16 troublesome constraints. The set C
represented the 0-1 constraints and the
rest of the 40,000 constraints.
Then for each l, finding x to solve
M( l ) = min_{x in C} L( x, l )
was easy. Then I just had to find l to
solve
max M( l )
subject to
l >= 0
and that was just to maximize a concave
function.
So, I did some primal-dual iterations:
Set
k = 1
Pick l^k >= 0
Step 1 (Primal):
Find x = x^k to solve
min_{x in C} L( x, l^k )
Then
M( l^k ) = L( x^k, l^k )
Step 2 (Termination):
If x^k in F then
M( l^k ) <= z <= f( x^k )
so that if
f( x^k ) - M( l^k )
is sufficiently small, then
terminate. Else if k is
too large, terminate.
Step 3 (Dual):
Find 1 x m l and w in R to solve
max w
subject to
w <= M( l^p ) + ( l - l^p ) g( x^p )
p = 1, 2, ..., k
l >= 0
Of course, this is just a linear
program. Set
l^(k + 1) = l
k = k + 1
Go to Step 1.
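Below is a minimal, self-contained Python sketch of these steps on a toy 0-1 problem of my own, not the bank problem. It assumes C = {0,1}^n with f and g linear, so Step 1 reduces to a coordinate-by-coordinate sign check, and it uses scipy.optimize.linprog for the Step 3 linear program; the box 0 <= l <= LAM_MAX is a safeguard I added so the early dual LPs stay bounded:

  # Primal-dual Lagrangian relaxation on a toy 0-1 problem.
  # f(x) = -c.x (maximize total value), g(x) = A x - b <= 0 are the
  # relaxed constraints, and C = {0,1}^n.  Toy data; illustration only.
  import numpy as np
  from scipy.optimize import linprog

  rng = np.random.default_rng(0)
  n, m = 30, 3
  c = rng.uniform(1.0, 10.0, n)
  A = rng.uniform(0.0, 5.0, (m, n))
  b = 0.3 * A.sum(axis=1)

  def f(x):
      return -c @ x

  def g(x):
      return A @ x - b

  def primal_step(lam):
      # Step 1: minimize L(x, lam) = f(x) + lam.g(x) over C = {0,1}^n.
      reduced = -c + lam @ A           # coefficient of x_j in L(., lam)
      x = (reduced < 0).astype(float)  # take x_j = 1 iff it lowers L
      return x, reduced @ x - lam @ b  # minimizer and M(lam)

  LAM_MAX = 100.0                      # box keeping the dual LP bounded
  lam = np.zeros(m)
  cuts = []                            # (lam^p, M(lam^p), g(x^p)) per iteration
  best = None                          # best feasible x seen so far

  for k in range(200):
      x, M = primal_step(lam)                  # Step 1 (Primal)
      gx = g(x)
      cuts.append((lam.copy(), M, gx))

      if np.all(gx <= 1e-9):                   # Step 2 (Termination)
          if best is None or f(x) < f(best):
              best = x
      if best is not None and f(best) - M < 1e-6:
          break                                # bound gap closed

      # Step 3 (Dual): max w s.t. w <= M(lam^p) + (lam - lam^p).g(x^p),
      # 0 <= lam <= LAM_MAX.  LP variables are (lam_1 .. lam_m, w).
      A_ub = np.array([np.append(-gp, 1.0) for (_, _, gp) in cuts])
      b_ub = np.array([Mp - lp @ gp for (lp, Mp, gp) in cuts])
      obj = np.zeros(m + 1)
      obj[-1] = -1.0                           # minimize -w, i.e., maximize w
      res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
                    bounds=[(0.0, LAM_MAX)] * m + [(None, None)],
                    method="highs")
      if not res.success:
          break
      lam = res.x[:m]

  lower = max(Mp for (_, Mp, _) in cuts)       # best lower bound on z
  print("lower bound M(lam):", lower)
  if best is not None:
      print("feasible value f(x):", f(best), "gap:", f(best) - lower)

For a 0-1 program the gap usually does not close all the way to zero, so in practice one stops when the gap is small relative to the bound or after a fixed number of iterations, as in Step 2 above.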
There are refinements: E.g., for the
linear program to find l^(k + 1), one can
hope to get better numerical performance
by using central cutting planes.
For the real problem, in 905 seconds on a
90 MHz processor, I ran 500 primal-dual
iterations and, then, had a feasible
solution, from the bound inequalities,
guaranteed to be within 0.025% of
optimality.
Worked fine! So, simulated annealing is
not the only tool in the tool box!
Sometimes Lagrangian relaxation can work
well, too!
Introducing Lagrangian relaxation quickly in a comment on a simulated annealing post is perhaps not the best way to convey a method. The machinery of mathematical programming you have to introduce quickly becomes an incomprehensible wall of text for most people. Better to write this up in a blog post somewhere and refer to that.
Indeed, there are often better solutions than simulated annealing if the problem you are facing has some nice properties. I've been doing optimization on a highly nonlinear mathematical problem, but the objective function is so nice it has only a single maximum in the region I'm looking at. So I just ran Nelder-Mead, which terminates much quicker.
For anyone interested in simulated annealing,
what I wrote could be of more interest to
them than simulated annealing and, really,
easier to read than a comparably detailed
description of simulated annealing.
It's some quite precisely done mathematics
with one theorem with a careful proof
and some smaller results also proved.
And there's some impressive computational
experience. We're talking a 0-1 integer
linear program with 40,000 constraints and
600,000 variables -- that is an instance of
an NP-complete problem class. That we could
do anything with such a problem is a bit
amazing.
That we can apply non-linear optimization
to a problem in discrete or combinatorial
optimization, that is, 0-1, is curious,
in this case powerful, and, thus,
generally welcome.
That we are making good progress with
Lagrange multipliers without
taking any derivatives, e.g., as in
the Kuhn-Tucker conditions,
is also
curious -- so, this is not your
father's Lagrange multipliers.
And I proved Everett's theorem: That's
a famous result, and quite useful,
much more useful than difficult to prove,
in
a wide range of resource allocation problems.
And Everett is a famous guy, a physics
Ph.D. under J. Wheeler (one of the
main guys in relativity theory, in the world)
at Princeton
and the first guy to expound on the
many worlds (parallel universes)
interpretation of quantum
mechanics. Everett thought that his
result was worth publishing, and what
I wrote is more general and more powerful.
Simulated annealing is famous;
that some non-linear duality theory
and Lagrangian relaxation totally blew
away simulated annealing on an impressive
practical problem should also be of
interest.
That it is possible to get better
performance using central cutting planes
is actually not very well known.
Also I covered the mathematics
in essentially full detail (everything
important and not obvious, proved)
starting with next to nothing,
gave a description of an algorithm
that is ready to code (it's basically
just what I did code from successfully)
and also
the impressive computational experience,
all in just that one post.
Simulated
annealing is rarely described so
carefully, and it doesn't provide bounds
on optimality, that is, we don't know
when to stop.
As I illustrated, my work
provides bounds which commonly in practice
provide a good stopping criterion.
Simulated annealing is sometimes regarded
as part of computer science; well, my post
shows something that sometimes is better
and from applied mathematics.
Actually, the mathematics for such
optimization problems is a well-developed
and deep field; try to do much in
optimization, and you really should consider
some of the mathematics, e.g., what
I posted.
Oh, by the way, my Ph.D. is in
stochastic optimal control, and
I've published peer-reviewed original
research on non-linear optimization,
e.g., some deep details about the
Kuhn-Tucker conditions.
My professors included some of the
best people in the world in optimization.
The Chair of my Ph.D. orals committee
is a Member, US National Academy of
Engineering and was a student of
A. Tucker of the Kuhn-Tucker conditions --
Kuhn and Tucker were both at Princeton.
You might let us know where you found the
part with the "opinion"?
Supposedly HN likes only really highly
relevant comments and viciously down votes
anything else, especially jokes; and once
they decide to down vote a user, they down
vote everything else by that user, no
matter how serious or relevant.
Here I've been viciously, bitterly attacked
by some people apparently totally pissed off
at me over next to nothing: a simple
follow-up contribution, a famous quote from
the movie Casablanca, posted as a follow up
to a post with the earlier part of that same
famous quote from Casablanca.
So, my post on Lagrangian relaxation, that took
much of an afternoon to write, and with careful
theorems and proofs, with some material not
so easy to find, especially in such a succinct,
careful, and complete
form, and that stands to be of high interest
to anyone interested in simulated annealing,
gets downvoted -- a joke about my post does not,
and my response to the joke post does.
HN is getting to be a very nasty place,
not nearly as nice as, say, Reddit. Sorry
HN.
Sam,
PG, you've got a problem. You need to look
at your log files and find who has decided
to hate me and down vote anything I post,
no matter what I write.
My post on Lagrangian relaxation is right up
there in quality with some of the best
expository writing in optimization, and I
know enough to know. That I get down voted
is the problem of HN, not my post. In the
past few days, the bitter, hostile, personal
attacks on me have cost me 70+ points,
no doubt to the great pleasure of
my attackers. I'm being bitterly attacked
personally, for no good reason, and now
independent of what I post.