Notes of Jean-Bernard Lasserre’s lecture no. 3

1. Moments, sums of squares and optimization

1.1. Summary of previous lectures

Recall that for a polynomial optimization problem, I call the primal problem the minimization of {\int_{K}f\,d\mu} over the space of probability measures on {K}, and the dual problem the maximization over real numbers {\lambda} of {\lambda} subject to {f-\lambda \geq 0} on {K}.
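In symbols, writing {f^*} for the common optimal value (the first equality holds because one may always take {\mu} to be the Dirac mass at a minimizer):

\displaystyle  \begin{array}{rcl}  f^* :=\min_{x\in K}f(x)=\min_{\mu\in\mathcal{P}(K)}\int_{K}f\,d\mu =\max\{\lambda\in{\mathbb R}\,;\,f-\lambda \geq 0\textrm{ on }K\}. \end{array}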

Real algebraic geometry provides us with efficient characterizations of

  • finite Borel measures supported on {K};
  • polynomials positive on {K};

when {K} is basic semialgebraic.

Kindler: can one give dual proofs of these characterizations? Answer: not that I know of. The characterization of measures is functional analysis (spectral theory of self-adjoint operators); the characterization of polynomials positive on {K} is purely algebraic.

Relaxation of the primal problem. A real sequence {(y_{\alpha})} is the sequence of moments of a finite measure supported on {K} iff, for every {d}, the associated moment matrix {M_d (y)} and localizing matrices {M_d (g_i y)} are positive semidefinite. Therefore the global optimization problem is equivalent to the following SDP: if {K=\{g_i \geq 0\}},

minimize (over {y}) {L_y (f):=\sum_{\alpha}f_{\alpha}y_{\alpha}},

subject to constraints

{\forall d}, {M_d (y)} and {M_d (g_i y)} positive semidefinite, {L_y (1)=1}.

The {Q_d} relaxation corresponds to sticking to one value of {d}.
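To make this concrete, here is a minimal numerical sketch (an illustration added to these notes, not from the lecture) of the unconstrained relaxation {Q_2} for {f(x)=x^4 -x^2} in one variable, using Python with the cvxpy modeling package; the moment matrix {M_2 (y)} is indexed by the monomials {1,x,x^2}.

```python
import cvxpy as cp

# y[k] stands for the k-th moment of a probability measure on R, k = 0..4.
y = cp.Variable(5)
M2 = cp.bmat([[y[0], y[1], y[2]],
              [y[1], y[2], y[3]],
              [y[2], y[3], y[4]]])  # moment matrix M_2(y)

# L_y(f) for f(x) = x^4 - x^2 is y_4 - y_2.
prob = cp.Problem(cp.Minimize(y[4] - y[2]),
                  [M2 >> 0,      # positive semidefiniteness
                   y[0] == 1])   # L_y(1) = 1: mu is a probability measure
prob.solve()
print(prob.value)  # ~ -0.25, the global minimum of x^4 - x^2 on R
```

Here the relaxation is already exact, since nonnegative univariate polynomials are sums of squares.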

Relaxation of the dual problem.

maximize (over {\lambda} and polynomials {\sigma_j}) {\lambda},

subject to constraints

{f-\lambda=\sigma_0 +\sum_{k=1}^{m}\sigma_k g_k} where {\sigma_k} are sums of squares.

Note that there is a {1-1} correspondence between polynomials of degree {\leq 2d} which are sums of squares and positive semidefinite matrices of size {\begin{pmatrix}n+d\\ d\end{pmatrix}}, so this is indeed an SDP.
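For instance, in one variable with {d=1}: {\sigma(x)=a+bx+cx^2} is a sum of squares iff

\displaystyle  \begin{array}{rcl}  \sigma(x)=\begin{pmatrix}1&x\end{pmatrix}\begin{pmatrix}a&b/2\\ b/2&c\end{pmatrix}\begin{pmatrix}1\\ x\end{pmatrix}\quad\textrm{with}\quad\begin{pmatrix}a&b/2\\ b/2&c\end{pmatrix}\succeq 0, \end{array}

i.e. iff {a,c\geq 0} and {b^2 \leq 4ac}; factoring the matrix as {\sum_k v_k v_k^T} exhibits the squares.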

The {Q_d^*} relaxation corresponds to sticking to polynomials such that {\mathrm{degree}(\sigma_0 )\leq 2d} and {\mathrm{degree}(\sigma_k g_k )\leq 2d}.
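Continuing the one-variable sketch above (again an illustration added to these notes): the relaxation {Q_2^*} for {f(x)=x^4 -x^2} looks for a positive semidefinite Gram matrix {Q} with {f-\lambda =(1,x,x^2 )\,Q\,(1,x,x^2 )^T}.

```python
import cvxpy as cp

lam = cp.Variable()
Q = cp.Variable((3, 3), symmetric=True)  # Gram matrix in the basis 1, x, x^2

# Match the coefficients of x^4 - x^2 - lam with (1,x,x^2) Q (1,x,x^2)^T.
constraints = [Q >> 0,
               Q[0, 0] == -lam,              # constant term
               2 * Q[0, 1] == 0,             # coefficient of x
               2 * Q[0, 2] + Q[1, 1] == -1,  # coefficient of x^2
               2 * Q[1, 2] == 0,             # coefficient of x^3
               Q[2, 2] == 1]                 # coefficient of x^4
prob = cp.Problem(cp.Maximize(lam), constraints)
prob.solve()
print(lam.value)  # ~ -0.25: the same value as the primal, as in Theorem 1 below
```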

Theorem 1 (Lasserre) Assume that {M-|X|^2 \in Q(g)}. If {K} has nonempty interior, then there is no duality gap: {\max Q^*_d =\min Q_d}.

1.2. Detecting global optimality and extracting solutions

In the primal setting, the ranks of the semidefinite matrices tell us whether we can stop or must go to higher-degree polynomials (i.e. larger matrices).

Theorem 2 (Lasserre) Let {y} be an optimal solution of {Q_d} and let {v} satisfy {2v\geq \max_j \mathrm{degree}(g_j )}. If

\displaystyle  \begin{array}{rcl}  \mathrm{rank}\,M_d (y)=\mathrm{rank}\,M_{d-v}(y)\quad(=:s), \end{array}

then

  • {\min Q_d =\min_{K}f}.
  • One may extract {s} global minimizers (a numerical rank test is sketched below).
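Here is a sketch, added to these notes, of how one might check the rank condition numerically; `eps` is an arbitrary tolerance and `moment_matrix` a hypothetical helper returning {M_d (y)} as a numpy array.

```python
import numpy as np

def numerical_rank(M, eps=1e-6):
    """Number of singular values of M above the tolerance eps."""
    return int(np.sum(np.linalg.svd(M, compute_uv=False) > eps))

def is_flat(y, d, v, moment_matrix):
    """Check rank M_d(y) == rank M_{d-v}(y); if so, s minimizers can be extracted."""
    s = numerical_rank(moment_matrix(y, d - v))
    return numerical_rank(moment_matrix(y, d)) == s, s
```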

Advertisement for GloptiPoly 3, a software package devoted to optimization of polynomials, which detects global optimality and extracts solutions: http://www.laas.fr/~henrion/software

It does well up to 10 variables and degree 8.

1.3. {0/1} optimization

This is the case when {K=\{x\in\{ 0,1 \}^n \,;\,g_j (x)\geq 0\}}. Write

\displaystyle  \begin{array}{rcl}  K=\{x_i^2 -x_i \geq 0,\,x_i -x_i^2 \geq 0,\,\,g_j (x)\geq 0\}. \end{array}

The binary constraints {x_i^2 =x_i} yield the relations {y_{\alpha}=y_{\bar\alpha}} on moments, where {\bar\alpha_i =\min(\alpha_i ,1)}. This simplifies the moment matrix: only rows and columns corresponding to square-free monomials need to be taken into account. So {Q_d} stabilizes at {d=n+v} (often much earlier).
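As a sketch (added to these notes, with a hypothetical toy objective), the reduced moment matrix for binary variables can be indexed by subsets of variables, standing for square-free monomials:

```python
import itertools
import cvxpy as cp

n = 2  # binary variables x_0, x_1; square-free monomials <-> subsets
subsets = [frozenset(c) for d in range(n + 1)
           for c in itertools.combinations(range(n), d)]
y = {S: cp.Variable() for S in subsets}  # one moment per square-free monomial

# Since x_i^2 = x_i, the (S, T) entry of the moment matrix is y_{S union T}.
M = cp.bmat([[y[S | T] for T in subsets] for S in subsets])

# Toy objective f = x_0 + x_1 - 2 x_0 x_1, minimized over {0,1}^2.
obj = y[frozenset({0})] + y[frozenset({1})] - 2 * y[frozenset({0, 1})]
prob = cp.Problem(cp.Minimize(obj), [M >> 0, y[frozenset()] == 1])
prob.solve()
print(prob.value)  # ~ 0, the true minimum, attained at (0,0) and (1,1)
```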

1.4. Zeros of polynomials

Usually, to compute the zero set of an ideal of polynomials, one computes a Gröbner basis of the ideal and checks that it is {0}-dimensional. If so, there is a procedure to compute the complex zeros, after which one checks which of them are real. This may represent a huge waste of effort, and if the complex zero set is higher-dimensional, it is hopeless. The moment approach, by contrast, works directly over the reals.

Example 1 Solve {x_1^2 +x_2^2 =0} in {{\mathbb R}^2}.

Compute the moment matrix {M_1 (y)}, with rows and columns indexed by {1,x_1 ,x_2} and normalized so that {y_{00}=1}. The equation yields {y_{20}+y_{02}=0}; since {y_{20}} and {y_{02}} are diagonal entries of a positive semidefinite matrix, both vanish, and positive semidefiniteness then forces {y_{10}=y_{01}=y_{11}=0}. So the matrix equals the moment matrix of the Dirac measure at {(0,0)}, showing that the only solution is {(0,0)}.
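A quick numerical confirmation of this reasoning (sketch added to these notes, again with cvxpy): imposing {M_1 (y)\succeq 0} and {y_{20}+y_{02}=0} forces every other moment to vanish.

```python
import cvxpy as cp

y10, y01, y20, y11, y02 = (cp.Variable() for _ in range(5))
M1 = cp.bmat([[1,   y10, y01],
              [y10, y20, y11],
              [y01, y11, y02]])  # moment matrix M_1(y), normalized so y_00 = 1

constraints = [M1 >> 0, y20 + y02 == 0]  # moment relation from x_1^2 + x_2^2 = 0
for m in (y10, y01, y11, y20):
    lo = cp.Problem(cp.Minimize(m), constraints).solve()
    hi = cp.Problem(cp.Maximize(m), constraints).solve()
    print(lo, hi)  # both ~ 0: y must be the moment sequence of the Dirac at (0,0)
```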

1.5. Putinar versus Karush-Kuhn-Tucker

Theorem 3 (Karush, Kuhn, Tucker) For {x^* \in K} to be a local minimum, a necessary condition is that there exist nonnegative scalar multipliers {\lambda\in{\mathbb R}^m} such that

\displaystyle  \begin{array}{rcl}  \nabla f(x^*)-\sum_{j=1}^{m}\lambda_j \nabla g_j (x^*)=0,\quad \lambda_j g_j (x^*)=0,\quad \lambda_j \geq 0. \end{array}

The condition is also sufficient in the convex case.

Putinar’s theorem tells us that {\sigma_0 :=f-f^* -\sum_j \sigma_j g_j} is a sum of squares, so {\sigma_0 \geq 0} on all of {{\mathbb R}^n}. Evaluating at a global minimizer {x^*} gives {f(x^*)-f^* =0}, hence {\sigma_0 (x^*)+\sum_j \sigma_j (x^*)g_j (x^*)=0}; since every term is nonnegative, {\sigma_0 (x^*)=0} and {\sigma_j (x^*)g_j (x^*)=0} for all {j}. Let {\lambda_j =\sigma_j (x^*)\geq 0} and write {\sigma_j =\sum_{k} q_{jk}^2}. Differentiating gives

\displaystyle  \begin{array}{rcl}  \nabla\sigma_j (x^*)=\sum_{k}2q_{jk}(x^*)\nabla q_{jk}(x^*), \end{array}

\displaystyle  \begin{array}{rcl}  \nabla f (x^*)&=&\nabla\sigma_0 (x^*)+\sum_{j,k}2q_{jk}(x^*)\nabla q_{jk}(x^*)\,g_j (x^*)+\sum_{j}\lambda_j \nabla g_j (x^*)\\ &=&\sum_{j}\lambda_j \nabla g_j (x^*), \end{array}

since {\sigma_0 (x^*)=0} forces {q_{0k}(x^*)=0} for all {k} (hence {\nabla\sigma_0 (x^*)=0}), and {\sigma_j (x^*)g_j (x^*)=0} with {g_j (x^*)\geq 0} forces {q_{jk}(x^*)g_j (x^*)=0}. Moreover {\lambda_j g_j (x^*)=0} and {\lambda_j \geq 0}. So the Putinar representation implies the Karush-Kuhn-Tucker necessary condition. Constraints which are not active at {x^*} give rise to vanishing KKT scalar multipliers; Putinar replaces them by globally determined sum-of-squares polynomials, whose effect is felt not just locally at {x^*} but globally.

1.6. Example: convex envelope of a function

2. How to handle sparsity

2.1. No free lunch rule

The {r}-th SDP relaxation has {O(n^{2r})} variables and matrix size {O(n^{r})}. So very soon, SDP solvers are overwhelmed.

One should exploit symmetry. It seems that the algebraic method provides equivariant SDPs of much smaller size.

One should exploit sparsity. In general, each of the constraints involves a small number {\kappa} of variables, and the objective function is a sum of {p} polynomials, each depending on only {\kappa} variables. Kojima has developed a systematic procedure to detect these sparsity patterns (look for cliques in a chordal extension of a graph associated with the problem). The resulting {r}-th SDP relaxation has at most {p\,O(\kappa^{2r})} variables and matrix size {O(\kappa^r)}.
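An illustrative sketch of the clique-detection step (added to these notes, using the networkx graph library; the variable sets in `terms` are hypothetical, and this is not Kojima’s actual code):

```python
import networkx as nx

# Correlation graph: one node per variable, an edge whenever two variables
# appear together in a monomial of f or in some constraint g_k.
terms = [{0, 1}, {1, 2}, {2, 3}]  # hypothetical variable sets
G = nx.Graph()
G.add_nodes_from(range(4))
for T in terms:
    for u in T:
        for v in T:
            if u < v:
                G.add_edge(u, v)

# Complete G to a chordal graph; its maximal cliques give the sets I_k.
H, _ = nx.complete_to_chordal_graph(G)
print(list(nx.chordal_graph_cliques(H)))  # e.g. [{0, 1}, {1, 2}, {2, 3}]
```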

Theorem 4 (Lasserre) Let {I_k} denote the set of variables involved in the constraint {g_k}, and assume {f} is a sum of polynomials, each involving only variables from one of the sets {I_j}. Assume that for every {k=2,\ldots,p},

\displaystyle  \begin{array}{rcl}  I_k \cap(\bigcup_{j=1}^{k-1}I_j)\subseteq I_s \quad\textrm{for some}\quad s\leq k-1. \end{array}

(For instance, {I_1 =\{x_1 ,x_2 \}}, {I_2 =\{x_2 ,x_3 \}}, {I_3 =\{x_3 ,x_4 \}} satisfy this running intersection property.)

Assume that {f>0} on {K}. Then there exist sums of squares {\sigma_k} such that

\displaystyle  \begin{array}{rcl}  f=\sigma_0 +\sum_{k=1}^{m}\sigma_k g_k , \end{array}

where {\sigma_k} depends on the same set {I_k} of variables as {g_k}, and {\sigma_0} is a sum of sums of squares, each depending only on one of the variable sets {I_j}.

Kojima’s software gives good results with {\kappa\leq 6} and {n} up to {100}.
