
Okay, at one point I looked into simulated annealing, and here I just read all the comments on this thread. So, apparently what I have to contribute is not here yet.

Here I outline non-linear duality theory and Lagrangian relaxation and report a war story where this approach badly beat some efforts with simulated annealing.

Basically, non-linear duality theory has some surprising properties that are commonly just too powerful to ignore.

Basically we're considering a case of optimization or mathematical programming.

So, let R denote the set of real numbers, and suppose positive integers m and n are given. Let R^n be the usual n-dimensional real vector space of n-tuples. To use matrix notation, we regard each n-tuple x in R^n as a matrix with n rows and one column, that is, as n x 1.

Similarly for R^m.

We are given set C a subset of R^n and functions

f: R^n --> R

g: R^n --> R^m

We seek x in R^n to solve non-linear program (NLP)

min z = f(x)

subject to

g(x) <= 0

x in C

where the 0 here is the m x 1 matrix of zeros.

We let the feasible region F, a subset of R^n, be

F = { x | x in C and g(x) <= 0 }

We regard g as m x 1 where for j = 1, 2, ..., m component j of g is

g_j: R^n --> R

where here g_j, borrowing TeX notation, is g with subscript j.

Okay, given 1 x m l, we define the Lagrangian

L( x, l ) = f(x) + l g(x)

Then for l >= 0 and x in F we have l g(x) <= 0, so that

f(x) >= L( x, l )

and, minimizing over x in F,

z >= min_{x in F} L( x, l ) >= min_{x in C} L( x, l )

since F is a subset of C.

Then the dual problem is to find l >= 0 to maximize

M( l ) = min_{x in C} L( x, l )

Then M( l ) <= z, so that M( l ) is a lower bound on z.

In particular (H. Everett), suppose l >= 0, x attains the minimum defining M( l ), and g(x) = 0. Then x is in C with g(x) <= 0, so x is in F, and

z >= M( l ) = min_{x in C} L( x, l ) = f(x) + l g(x) = f(x) >= z

so that z = f(x) and x solves NLP.

A practical special case says, essentially, that if we spend all the productive resources and do so optimally, then we are optimal.
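
To make the definitions concrete, here is a tiny toy example of my own (made-up numbers, nothing to do with the war story below), with C small enough that M( l ) can be computed by brute-force enumeration; it shows the weak-duality bound M( l ) <= z and one l where Everett's condition fires. Everything here (the data, the names) is just illustration, in Python:

    # Toy data: f(x) = c.x over C = {0,1}^3, one relaxed constraint g(x) = a.x - b <= 0.
    import itertools
    import numpy as np

    c = np.array([-4.0, -3.0, -2.0])            # f(x) = c.x  (we minimize)
    a = np.array([3.0, 2.0, 2.0]); b = 5.0      # g(x) = a.x - b <= 0 is relaxed
    C = [np.array(x, dtype=float) for x in itertools.product([0, 1], repeat=3)]

    def f(x):       return c @ x
    def g(x):       return a @ x - b
    def L(x, lam):  return f(x) + lam * g(x)    # the Lagrangian (m = 1, so lam is a scalar)

    def M(lam):
        y = min(C, key=lambda x: L(x, lam))     # minimizer of L( . , lam ) over C
        return L(y, lam), y

    z = min(f(x) for x in C if g(x) <= 0)       # true optimum, by brute force

    for lam in (0.0, 1.2, 2.0):
        Ml, y = M(lam)
        note = "   <-- Everett: g(y) = 0, so y solves NLP" if g(y) == 0 else ""
        print(f"l = {lam:.1f}   M(l) = {Ml:.1f}   z = {z:.1f}{note}")

At l = 1.2 the minimizer of the Lagrangian happens to satisfy g(y) = 0, so by Everett's result it is optimal, and indeed M( 1.2 ) = z; at the other two multipliers M( l ) is just a lower bound.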

Of course, the usual notation used for l is lambda, and for a while around DC Everett ran Lambda Corporation which did resource allocation problems for the US DoD.

Theorem: M is concave.

Proof: For t in the interval [0,1] and 1 x m l_1 and l_2, we have

M( t l_1 + (1 - t) l_2 )

= min_{x in C} L( x, t l_1 + (1 - t) l_2 )

= min_{x in C} ( f(x) + ( t l_1 + (1 - t) l_2 ) g(x) )

= min_{x in C} ( ( t + (1 - t) ) f(x) + ( t l_1 + (1 - t) l_2 ) g(x) )

= min_{x in C} ( t L( x, l_1 ) + (1 - t) L( x, l_2 ) )

>= t min_{x in C} L( x, l_1 ) + (1 - t) min_{x in C} L( x, l_2 )

= t M( l_1 ) + (1 - t) M( l_2 )

so that M is concave. Done.

Immediately we have supporting hyperplanes for our concave M:

Suppose l and y in C are given with

M( l ) = L( y, l )

that is, y attains the minimum defining M( l ).

Then for 1 x m u,

M( l + u ) = min_{x in C} L( x, l + u )

= min_{x in C} ( L( x, l ) + u g(x) )

<= L( y, l ) + u g(y)

= M( l ) + u g(y)

so that for 1 x m u

s( u ) = M( l ) + u g(y)

is a supporting hyperplane for the concave function M; that is,

M( l + u ) <= s( u ) = M( l ) + u g(y)

and s( 0 ) = M( l + 0 ) = M( l ).
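
Again with toy numbers of my own (the same sort of tiny 0-1 example, not anything from the war story), here is a quick numerical check of both facts: M sampled on a grid of multipliers is midpoint-concave, and the hyperplane s( u ) = M( l ) + u g(y) lies on or above M( l + u ) everywhere, touching at u = 0:

    # Toy check that M is concave in l and that s(u) = M(l) + u g(y) supports M,
    # where y attains the minimum defining M(l).  Here m = 1, so l is a scalar.
    import itertools
    import numpy as np

    c = np.array([-4.0, -3.0, -2.0])            # f(x) = c.x
    a = np.array([3.0, 2.0, 2.0]); b = 5.0      # g(x) = a.x - b
    C = [np.array(x, dtype=float) for x in itertools.product([0, 1], repeat=3)]

    def M_and_argmin(lam):
        y = min(C, key=lambda x: c @ x + lam * (a @ x - b))
        return c @ y + lam * (a @ y - b), y

    grid = np.linspace(0.0, 3.0, 301)
    vals = np.array([M_and_argmin(l)[0] for l in grid])

    # Concavity: on a uniform grid, M at each point is at least the average of its neighbors.
    assert np.all(vals[1:-1] >= 0.5 * (vals[:-2] + vals[2:]) - 1e-9)

    # Supporting hyperplane at l = 1.0: M( l + u ) <= M( l ) + u g(y) for all u on the grid.
    l0 = 1.0
    M0, y = M_and_argmin(l0)
    gy = a @ y - b
    assert np.all(vals <= M0 + (grid - l0) * gy + 1e-9)
    print("concavity and supporting-hyperplane checks passed")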

So, here we have the basic theory of non-linear duality as needed by the iterative algorithmic approach of primal-dual Lagrangian relaxation.

War Story:

Once some guys had a big resource allocation problem having to do with a suddenly legal case of cross selling for banks. They had formulated their problem as a 0-1 integer linear program with ~40,000 constraints and 600,000 variables.

They had tried simulated annealing, ran for days, and quit with no idea how close they were to optimality.

I looked at their formulation and found that, except for 16 of the constraints, the problem was comparatively easy.

So, I wrote out

L( x, l ) = f(x) + l g(x)

where l was 1 x 16 and g represented the 16 troublesome constraints. The set C represented the 0-1 constraints and the rest of the 40,000 constraints.

Then for each l, finding x to solve

M( l ) = min_{x in C} L( x, l )

was easy. Then I just had to find l to solve

max M( l )

subject to

l >= 0

and that was just to maximize a concave function.
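
As an aside on why finding x for each l was easy, here is a sketch under a stronger simplification than the real problem (where C also held the other roughly 40,000 easy constraints, not just the 0-1 bounds): if C were just the 0-1 box and f and g linear, then min_{x in C} L( x, l ) separates by variable and reduces to a sign check on reduced costs. The names and numbers below are made up for illustration:

    # Sketch: with C = {0,1}^n and linear f(x) = c.x, g(x) = A x - b, we have
    # L(x, l) = (c + l A) x - l b, so the minimizer over C sets x_j = 1 exactly
    # when the reduced cost (c + l A)_j is negative -- no search needed.
    import numpy as np

    def primal_step_box(c, A, b, lam):
        reduced = c + lam @ A                   # one reduced cost per 0-1 variable
        x = (reduced < 0).astype(float)         # x_j = 1 iff it lowers the Lagrangian
        return x, reduced @ x - lam @ b         # minimizer and M(l) = L(x, l)

    # Tiny usage example with made-up numbers:
    c = np.array([-3.0, 1.0, -2.0, 0.5])
    A = np.array([[1.0, 2.0, 1.0, 1.0]])
    b = np.array([2.0])
    print(primal_step_box(c, A, b, np.array([0.7])))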

So, I did some primal-dual iterations:

          Set

               k = 1

          Pick l^k >= 0

     Step 1 (Primal):

          Find x = x^k to solve

               min_{x in C} L( x, l^k )

          Then

               M( l^k ) = L( x^k, l^k )

     Step 2 (Termination):

          If x^k in F, then

               M( l^k ) <= z <= f( x^k )

          so that if

               f( x^k ) - M( l^k )

          is sufficiently small, then
          terminate.  Else if k is
          too large, terminate.

     Step 3 (Dual):

          Find 1 x m l >= 0 and w in R to solve

               max w

               subject to

               w <= M( l^p ) + ( l - l^p ) g( x^p )

               p = 1, 2, ..., k

          Of course, this is just a linear
          program.  Set

               l^(k + 1) = l

               k = k + 1

          Go to Step 1.

There are refinements: e.g., for the linear program that finds l^(k + 1), one can hope to get better numerical performance by using central cutting planes.
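
Here is a minimal, self-contained sketch of the whole iteration on toy data of my own (nothing like the bank problem in size or structure): brute force over a small C for the primal step, and scipy's linprog for the dual cutting-plane LP. The box bound on l is just a crude safeguard to keep the early LPs bounded; handling that more gracefully is part of what the central cutting plane refinement is about. All names and numbers are illustrative, not from the real problem:

    # Lagrangian relaxation with a cutting-plane dual update, on made-up toy data.
    import itertools
    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    n, m = 10, 3                                    # ten 0-1 variables, three relaxed constraints
    c = rng.uniform(-1.0, 1.0, size=n)              # f(x) = c.x
    A = rng.uniform(0.0, 1.0, size=(m, n))
    b = 0.4 * A.sum(axis=1)                         # g(x) = A x - b <= 0; x = 0 is feasible
    C = [np.array(x, dtype=float) for x in itertools.product([0, 1], repeat=n)]

    def g(x):      return A @ x - b
    def L(x, lam): return c @ x + lam @ g(x)

    def primal_step(lam):
        """Step 1: x^k minimizing L( x, l^k ) over C, and M( l^k )."""
        xk = min(C, key=lambda x: L(x, lam))
        return xk, L(xk, lam)

    lam = np.zeros(m)
    cuts, best_lower, best_x = [], -np.inf, None

    for k in range(100):
        xk, Mk = primal_step(lam)                   # Step 1 (Primal)
        cuts.append((Mk, g(xk), lam.copy()))
        best_lower = max(best_lower, Mk)

        if np.all(g(xk) <= 1e-9):                   # Step 2 (Termination)
            if best_x is None or c @ xk < c @ best_x:
                best_x = xk
        if best_x is not None and c @ best_x - best_lower <= 1e-6:
            break

        # Step 3 (Dual): max w  s.t.  w <= M(l^p) + (l - l^p) g(x^p),  l >= 0.
        # Variables are (l_1 .. l_m, w); linprog minimizes, so the objective is -w.
        A_ub = np.array([np.concatenate([-gp, [1.0]]) for (_, gp, _) in cuts])
        b_ub = np.array([Mp - lp @ gp for (Mp, gp, lp) in cuts])
        res = linprog(np.concatenate([np.zeros(m), [-1.0]]),
                      A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0.0, 50.0)] * m + [(None, None)], method="highs")
        lam = res.x[:m]

    print("best lower bound max M(l):", best_lower)
    if best_x is not None:
        print("best feasible f(x):", c @ best_x, "  gap:", c @ best_x - best_lower)

The gap printed at the end is the same kind of bound that gave the 0.025% optimality guarantee on the real problem.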

For the real problem, in 905 seconds on a 90 MHz processor, I ran 500 primal-dual iterations and then had a feasible solution guaranteed, from the bound inequalities, to be within 0.025% of optimality.

Worked fine! So, simulated annealing is not the only tool in the tool box! Sometimes Lagrangian relaxation can work well, too!




Introducing Lagrangian relaxation quickly in a comment on a simulated annealing post is perhaps not the best way to convey a method. The machinery of mathematical programming you have to introduce quickly becomes an incomprehensible wall of text for most people. Better to write this up in a blog post somewhere and link to that.

Indeed, there are often better solutions than simulated annealing if the problem you are facing has some nice properties. I've been doing optimization on a highly nonlinear mathematical problem, but the objective function is so nice it has only a single maximum in the region I'm looking at. So I just ran Nelder-Mead, which terminates much more quickly.


For anyone interested in simulated annealing, what I wrote could be of more interest than simulated annealing itself and, really, easier to read than a comparably detailed description of it.


Pff. That's, like, your opinion man.


> opinion?

It's some quite precisely done mathematics, with one theorem given a careful proof and some smaller results also proved.

And there's some impressive computational experience. We're talking about a 0-1 integer linear program with 40,000 constraints and 600,000 variables -- and 0-1 integer linear programming is NP-complete. That we could do anything with such a problem is a bit amazing.

That we can apply non-linear optimization to a problem in discrete or combinatorial optimization, that is, with 0-1 variables, is curious, in this case powerful, and, thus, generally welcome.

That we are making good progress with Lagrange multipliers without taking any derivatives, e.g., as in the Kuhn-Tucker conditions, is also curious -- so, this is not your father's Lagrange multipliers.

And I proved Everett's theorem: That's a famous result, and quite useful, much more useful than difficult to prove, in a wide range of resource allocation problems. And Everett is a famous guy, a physics Ph.D. under J. Wheeler (one of the main guys in relativity theory, in the world) at Princeton and the first guy to expound on the many worlds (parallel universes) interpretation of quantum mechanics. Everett thought that his result was worth publishing, and what I wrote is more general and more powerful.

Simulated annealing is famous; that some non-linear duality theory and Lagrangian relaxation totally blew away simulated annealing on an impressive practical problem should also be of interest.

That it is possible to get better performance using central cutting planes is actually not very well known.

Also I covered the mathematics in essentially full detail (everything important and not obvious, proved) starting with next to nothing, gave a description of an algorithm that is ready to code (it's basically just what I coded from, successfully), and reported the impressive computational experience, all in just that one post.

Simulated annealing is rarely described so carefully, and it doesn't provide bounds on optimality, that is, we don't know when to stop.

As I illustrated, my work provides bounds which commonly in practice provide a good stopping criterion.

Simulated annealing is sometimes regarded as part of computer science; well, my post shows something that is sometimes better and comes from applied mathematics.

Actually, the mathematics for such optimization problems is a well-developed and deep field; to do much in optimization, one really should consider some of that mathematics, e.g., what I posted.

Oh, by the way, my Ph.D. is in stochastic optimal control, and I've published peer-reviewed original research on non-linear optimization, e.g., some deep details about the Kuhn-Tucker conditions. My professors included some of the best people in the world in optimization. The Chair of my Ph.D. orals committee is a Member, US National Academy of Engineering and was a student of A. Tucker of the Kuhn-Tucker conditions -- Kuhn and Tucker were both at Princeton.

You might let us know where you found the part with the "opinion"?


I believe it was a joke


Supposedly HN likes only really highly relevant comments and viciously downvotes anything else, especially jokes -- and then downvotes everything else, no matter how serious or relevant, from the user they have decided to downvote.

Here I've been viciously, bitterly attacked by some people apparently totally pissed off at me over next to nothing: a simple follow-up contribution, a famous quote from the movie Casablanca, posted as a follow-up to a comment with the earlier part of that same quote.

So, my post on Lagrangian relaxation, which took much of an afternoon to write, with careful theorems and proofs and some material not so easy to find, especially in such a succinct, careful, and complete form, and which stands to be of high interest to anyone interested in simulated annealing, gets downvoted -- while a joke about my post does not, and my response to the joke does.

HN is getting to be a very nasty place, not nearly as nice as, say, Reddit. Sorry HN.

Sam, PG, you've got a problem. You need to look at your log files and find who has decided to hate me and down vote anything I post, no matter what I write.

My post on Lagrangian relaxation is right up there in quality with some of the best expository writing in optimization, and I know enough to know. That I get down voted is the problem of HN, not my post. In the past few days, the bitter, hostile, personal attacks on me have cost me 70+ points, no doubt to the great pleasure of my attackers. I'm being bitterly attacked personally, for no good reason, and now independent of what I post.



