Notes on Mathieu’s Lecture 1


1. Approximation algorithms

I am a problem solver: I design algorithms. An algorithm for optimization must produce output

  • that satisfies constraints (feasibility)
  • that optimizes (minimizes or maximizes) an objective function.

Such problems are often NP-hard. So we look for a solution that is not exact but close to optimum.

Example 1 TSP (Travelling Salesman Problem). The input is {n} cities with their pairwise distances. The output is a tour that visits each city exactly once. The objective is to minimize the total length.

This is NP-hard (the term “NP-complete” may only be applied to decision problems, whose output is “yes” or “no”).

The quality of an approximate solution is measured by relative error. Here’s the definition for minimization; the definition for maximization is similar.

Definition 1 A (c,s)-approximation, for a minimization problem, means that if {OPT\leq c}, then the output value is {\leq s}. An {\alpha}-approximation is short for a {(c,\alpha c)}-approximation for all {c}.

Example 2 TSP has a 2-approximation in metric spaces.

The idea is to use a double cover of a minimum spanning tree. More precisely, using a greedy algorithm, compute a minimum-cost tree spanning all cities (add edges in order of increasing distance, maintaining an acyclic structure). Double each edge; this gives an Eulerian graph (connected, with all degrees even). Find an Eulerian tour (one that traverses every edge exactly once), and to build a tour, follow the Eulerian tour, jumping ahead (shortcutting) whenever the Eulerian tour passes through an already-visited city.
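Here is a minimal sketch of this algorithm in Python (the function name and the representation of the input as a distance matrix are my choices, not the lecture's):

    # Sketch of the metric-TSP 2-approximation; dist is a symmetric n x n
    # matrix of distances satisfying the triangle inequality.
    def two_approx_tsp(dist):
        n = len(dist)
        # Kruskal's greedy MST: add edges by increasing length, staying acyclic.
        parent = list(range(n))
        def find(u):
            while parent[u] != u:
                parent[u] = parent[parent[u]]
                u = parent[u]
            return u
        tree = [[] for _ in range(n)]
        for d, u, v in sorted((dist[u][v], u, v)
                              for u in range(n) for v in range(u + 1, n)):
            ru, rv = find(u), find(v)
            if ru != rv:               # adding (u, v) keeps the graph acyclic
                parent[ru] = rv
                tree[u].append(v)
                tree[v].append(u)
        # Doubling every tree edge gives an Eulerian graph; following the
        # Eulerian tour and shortcutting repeats is a depth-first preorder walk.
        tour, seen, stack = [], [False] * n, [0]
        while stack:
            u = stack.pop()
            if not seen[u]:
                seen[u] = True
                tour.append(u)
                stack.extend(tree[u])
        return tour                    # close the cycle by returning to tour[0]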

Claim.

\displaystyle  \begin{array}{rcl}  length(OUTPUT)\leq 2\,length(OPT(TSP)). \end{array}

Proof of the claim. Deleting one edge from the optimal tour yields a spanning tree (in fact a path), so

\displaystyle  \begin{array}{rcl}  length(\textrm{minimum spanning tree})\leq length(OPT(TSP)), \end{array}

and

\displaystyle  \begin{array}{rcl}  length(OUTPUT)&\leq& length(\textrm{double cover of minimum spanning tree})\\ &=&2\, length(\textrm{minimum spanning tree}), \end{array}

since, by the triangle inequality, shortcutting can only decrease the length.

TSP is one of the 21 problems that Richard Karp proved to be NP-complete. So a {1}-approximation cannot be computed in polynomial time if P{\not=}NP. Our goal is to get as close to {1} as possible. The best we can hope for is to get arbitrarily close to {1}:

{\forall\epsilon>0}, {\exists} poly-time algorithm giving a {(1+\epsilon)} -approximation.

Such a family of algorithms is called an approximation scheme. In some sense, problems that admit an approximation scheme are the easiest among NP-hard combinatorial problems.

Example 3 KNAPSACK. Given objects, each with a value and a volume, you want to put as much value as possible in your knapsack without exceeding the knapsack’s capacity.

KNAPSACK admits an approximation scheme (in fact a fully polynomial-time one).
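Here is a sketch of the standard scheme (this rounding-then-dynamic-programming approach is the classical one, not taken verbatim from the lecture; I assume positive values and volumes, and all names are mine):

    # Sketch of the standard knapsack approximation scheme (FPTAS): round the
    # values down to multiples of a small unit, then run an exact dynamic
    # program on the rounded values.
    def knapsack_fptas(values, volumes, capacity, eps):
        n = len(values)
        mu = eps * max(values) / n               # rounding unit
        scaled = [int(v // mu) for v in values]  # each object loses < mu in value
        total = sum(scaled)
        INF = float('inf')
        # min_vol[V] = least volume achieving rounded value exactly V
        min_vol = [0.0] + [INF] * total
        for i in range(n):
            for V in range(total, scaled[i] - 1, -1):   # 0/1 items: go downwards
                if min_vol[V - scaled[i]] + volumes[i] < min_vol[V]:
                    min_vol[V] = min_vol[V - scaled[i]] + volumes[i]
        best = max(V for V in range(total + 1) if min_vol[V] <= capacity)
        return best * mu     # >= (1 - eps) * OPT, in time poly(n, 1/eps)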

Some problems do not admit approximation schemes (if P{\not=}NP).

Example 4 MAX CLIQUE. Input: a graph. Output: a clique (a complete subgraph of the given graph). Objective: size, i.e. the number of vertices of the clique.

MAX CLIQUE is NP-hard (Karp). Worse, if P{\not=}NP, {\forall \delta>0}, one cannot find a clique of size {OPT/n^{1-\delta}} in polynomial time.

In the 1970s, people explored hundreds of decision problems and classified them as belonging to P or NP-hard. These days, one deals with optimization problems and tries to classify them according to hardness of approximation.

2. A case study: approximation algorithms for SET COVER

Example 5 SET COVER. Cover a set with the cheapest subsets. Input: a finite set {U} of elements and a collection {\mathcal{C}} of subsets of {U}, each subset having a weight. Output: a covering of {U} by elements of {\mathcal{C}}. Objective: the sum of the weights of the chosen sets.

SET COVER is NP-hard. Let us give an approximation algorithm based on Linear Programming.

2.1. A linear programming relaxation for SET COVER

1. (Integer program.) Give a model for SET COVER as an integer program. Let {U=\{e_1 ,...,e_n\}}, {\mathcal{C}=\{S_1 ,...,S_m\}}, and let {w_j} be the weight of {S_j}. The output is a {\{ 0,1 \}} vector {x} ({x_j=1} iff we use {S_j} in the covering). Let {a_{ij} =1} iff {e_i \in S_j}. This {n\times m} matrix {A} encodes the {S_j}‘s. We assume that the {S_j}‘s cover {U}. The constraint that the chosen {S_j}‘s cover is {Ax\geq 1} (i.e. each entry of {Ax} is {\geq 1}). The goal is to minimize {w\cdot x}.

2. (Linear program.) We are stuck, since integer programming (solving the linear program for an integer-valued unknown vector {x}) is NP-hard. So we drop the integrality constraint and replace it with {0\leq x_j \leq 1}. The resulting LP can be solved in polynomial time, yielding a (fractional) solution {x}.
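A sketch of steps 1–2 with scipy's LP solver used as a black box (the matrix {A} and weights {w} are as in step 1; the function name is mine):

    # Sketch: solve the SET COVER LP relaxation.  A is the n x m 0-1 matrix
    # with A[i][j] = 1 iff e_i belongs to S_j, and w is the vector of weights.
    import numpy as np
    from scipy.optimize import linprog

    def solve_set_cover_lp(A, w):
        A = np.asarray(A, dtype=float)
        # linprog expects "<=" constraints, so rewrite A x >= 1 as -A x <= -1.
        res = linprog(c=w, A_ub=-A, b_ub=-np.ones(A.shape[0]),
                      bounds=[(0, 1)] * A.shape[1])
        return res.x     # fractional solution with value <= OPT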

2.2. A first algorithm using deterministic rounding

3. (Rounding.) If {x} is integral, done. If not, round {x_j} to the nearest integer, {\tilde{x}_j =0} or {1}.

(Analysis.) Does {\tilde{x}} cover, i.e. is {A\tilde{x}\geq 1}? Not in general. For instance, if every set appears in three identical copies with equal weights, the LP can split the weight evenly among the copies, so that all {x_j} are {\leq 1/3} and hence all {\tilde{x}_j =0}: nothing is covered.

Theorem 2 Let {f} denote the maximum, over all elements of {U}, of the number of sets of {\mathcal{C}} containing it. Then there is an {f}-approximation.

Proof: (Algorithm:) Modify third step as follows.

3. (Rounding) Choose a {\delta>0} and round {x_j} to {1} if {x_j \geq \delta}, to {0} otherwise.

(Analysis:) Check that the output covers every element: pick an element {e_i \in U}. The LP solution satisfies {row_i\cdot x\geq 1}. By assumption, {f} bounds the number of {1}‘s in that row, so at least one of the involved {x_j} is {\geq 1/f}. Hence, as long as {\delta\leq 1/f}, at least one of the involved {\tilde{x}_j} equals {1}, and {e_i} is covered.

The weight of the output is {\leq (1/\delta)\times(\textrm{weight of LP solution}) \leq (1/\delta)\, OPT}.

Conclusion. If one takes {\delta=1/f}, the rounded solution covers and its weight is {\leq f\times OPT}. \Box
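In code, this modified rounding step takes one line; a sketch continuing the previous one (computing {f} from the incidence matrix):

    # Sketch of the deterministic rounding with threshold delta = 1/f.
    import numpy as np

    def round_deterministic(A, x):
        f = int(np.asarray(A).sum(axis=1).max())        # max #sets containing one element
        return (np.asarray(x) >= 1.0 / f).astype(int)   # covers; weight <= f * OPT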

2.3. A second algorithm using randomized rounding

Modify third step as follows.

3. (Rounding) For all {j}, include {S_j} independently with probability {\min\{\beta x_j ,1\}}.

Theorem 3

\displaystyle  \begin{array}{rcl}  \mathop{\mathbb E}(\textrm{value of }OUTPUT)\leq \beta\times OPT \end{array}

and with probability {\geq 1-n\,e^{-\beta}}, every element is covered.

Proof:

\displaystyle  \begin{array}{rcl}  \mathop{\mathbb E}(\textrm{value of }OUTPUT) &=&\mathop{\mathbb E}(\sum_j w_j 1(S_j\textrm{ is chosen}))\\ &=&\sum_j w_j \mathop{\mathbb E}(1(S_j\textrm{ is chosen}))\\ &=&\sum_j w_j \mathop{\mathbb P}(S_j\textrm{ is chosen})\\ &\leq&\sum_j w_j \beta x_j \\ &=&\beta\times \textrm{value of LP solution}\\ &\leq& \beta\times OPT. \end{array}

\displaystyle  \begin{array}{rcl}  \mathop{\mathbb P}(\exists\textrm{ uncovered element}) &\leq&\sum_i \mathop{\mathbb P}(e_i\textrm{ is not covered})\\ &=&\sum_i \prod_{\{j\,|\,e_i \in S_j\}} \mathop{\mathbb P}(S_j\textrm{ is rejected})\qquad\textrm{(by independence)}\\ &=&\sum_i \prod_{\{j\,|\,e_i \in S_j\}} (1-\min\{\beta x_j ,1\}). \end{array}

Note that {1-\min\{\beta x_j ,1\}\leq (1-x_j)^{\beta}} (Bernoulli’s inequality). We know that {e_i} is (fractionally) covered by the LP solution, i.e. {\displaystyle \sum_{\{j\,|\,e_i \in S_j\}} x_j \geq 1}. Thus

\displaystyle  \begin{array}{rcl}  \prod_{\{j\,|\,e_i \in S_j\}}(1-x_j)^{\beta} &\leq& \prod_{\{j\,|\,e_i \in S_j\}}e^{-\beta x_j}\\ &=&\exp(-\beta(\sum_{\{j\,|\,e_i \in S_j\}} x_j ))\\ &\leq& e^{-\beta}. \end{array}

Therefore {\mathop{\mathbb P}(\exists\textrm{ uncovered element})\leq n\,e^{-\beta}}, and

\displaystyle  \begin{array}{rcl}  \mathop{\mathbb P}(\textrm{rounded solution covers})\geq 1-n\,e^{-\beta}. \end{array}

\Box
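A sketch of this rounding (the resampling loop is my addition: it repeats until the sample covers, which by the theorem fails with probability at most {n\,e^{-\beta}} per trial):

    # Sketch of randomized rounding: include S_j with probability min(beta*x_j, 1),
    # independently, and resample in the unlikely event that an element is uncovered.
    import numpy as np

    def round_randomized(A, x, beta, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        A, x = np.asarray(A), np.asarray(x)
        while True:
            chosen = rng.random(len(x)) < np.minimum(beta * x, 1.0)
            if (A @ chosen >= 1).all():       # does every element get covered?
                return chosen.astype(int)     # expected weight <= beta * OPT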

2.4. Epilogue

Take {\beta=c\log n} with {c} strictly greater than {1}: with high probability the rounded solution covers, and this gives an {O(\log n)}-approximation. Can we do better?

Theorem 4 If P{\not=}NP, there exists a constant {c>0} such that no polynomial algorithm can give a {(c\log n)}-approximation for SET COVER.

This is a result of Raz-Safra [RS], improving on Lund-Yannakakis [LY]. Later on, Uriel Feige (1998) was able to pin down the exact constant: he proved a threshold of {\ln n}, i.e. {c=1} for the natural logarithm, under a slight strengthening of P{\not=}NP [F].

To summarize, the technique goes as follows.

  1. Express the problem as an integer linear program.
  2. Solve an LP relaxation.
  3. Round the solution, deterministically or randomly, to make it integral while keeping it feasible.

3. Solving linear or semi-definite programs

One might skip this subsection and keep LP/SDP solvers as a black box. Claire would prefer to (she has not prepared this part well), but the audience is eager to know more…

Geometrically, the input of an LP problem amounts to a convex polytope (an intersection of the half-spaces defined by the linear inequalities, convex since an intersection of convex sets is convex) and a linear function. The goal is to maximize the linear function over the polytope.

3.1. The simplex method

The simplex algorithm is of the greedy type: walk from vertex to vertex, choosing the steepest ascending edge, until a local optimum is reached (which, by convexity, is a global optimum). Complexity can be exponential, since the number of vertices of the polytope can be exponential. However, it works well in practice, i.e. on data arising in applications. (Note that one must also take care of special cases such as the polytope being empty or unbounded.)

3.2. The ellipsoid method

Khachiyan (1979) gave the analysis. This is polynomial time ({O(n^4 \ell)}, where {\ell} is the number of digits needed to express the data).

Let us first assume that our goal is just to find a feasible solution (we shall see later on why this helps in solving the optimisation problem).

Assume that we are given a starting ellipsoid {\mathcal{E}} that contains the entire polytope {P}. An ellipsoid is defined by an inequality of the form {(x-x_0)^{\top}A(x-x_0)\leq 1}, with {A} symmetric positive definite. If the center {x_0 \in P}, we are done. If not, we find a constraint of the LP that {x_0} violates, i.e. a half-space {H} containing {x_0} which is disjoint from {P}. Now {\mathcal{E}\setminus H} has volume {\leq \frac{1}{2}volume(\mathcal{E})}. There is a new ellipsoid that contains {\mathcal{E}\setminus H} (and thus {P}), and its volume is {\leq (1-\epsilon)volume(\mathcal{E})}. Iterating, either some center falls in {P}, or after polynomially many steps the volume drops below a lower bound on {volume(P)}, proving that {P} is empty. For this, one needs a lower bound on {volume(P)} when {P} is nonempty. This involves some preprocessing: if one of the constraints is an equality constraint, we can use it to eliminate a variable and reduce the dimension.
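For concreteness, here is a sketch of one step, using the standard central-cut update formulas (these explicit formulas are supplied by me; the lecture only describes the step geometrically). Write the ellipsoid as {\{x : (x-x_0)^{\top}P^{-1}(x-x_0)\leq 1\}}, so {P=A^{-1}} in the notation above, and let {a} be the normal of the violated constraint:

    # Sketch of one central-cut ellipsoid update (standard formulas, not from
    # the lecture).  The half kept is { x : a.x <= a.x0 }.
    import numpy as np

    def ellipsoid_step(x0, P, a):
        n = len(x0)
        b = P @ a / np.sqrt(a @ P @ a)    # the cut direction, scaled to the ellipsoid
        x0_new = x0 - b / (n + 1)         # shifted center
        P_new = (n * n / (n * n - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(b, b))
        return x0_new, P_new   # volume shrinks by a factor <= exp(-1/(2(n+1)))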

HELP! Please complete…

In the end, the running time is polynomial in the dimension, the number of constraints, and the number of digits.

The optimisation problem is solved as follows. Start from a point of {P}. Add the constraint that the objective must improve on the current value (a half-space whose boundary hyperplane is a level set of the linear function). The previous algorithm gives a new point of {P} with a better value; iterate.

HELP! Please complete…

3.3. Generalisation

Note that all that is needed for the ellipsoid method is a separation oracle, which says whether a point belongs to {P} and, if not, provides a separating hyperplane. So the method sometimes applies to problems with an exponential number of constraints.

Example 6 MINIMUM SPANNING TREE. Input : a graph with edge weights. Output: a subset of edges connecting all the vertices. Objective: total weight.

This is equivalent to the following IP: let {x_e \in\{ 0,1 \}} according to whether we take edge {e} or not. The connectivity constraint is

\displaystyle  \begin{array}{rcl}  \forall \textrm{ nonempty proper subset }S\subset V,\quad \sum_{u\in S,\,v\notin S,\,uv\in E}x_{uv}\geq 1. \end{array}

Objective: minimize {\sum_e w_e x_e}.

There are exponentially many constraints, but there is a poly-time separation oracle for the LP relaxation (a sketch follows the two cases below). Indeed, use the max-flow/min-cut theorem: {\forall u\in V}, {\forall v\in V}, compute {min\textrm{-}cut(u,v)} in the graph where edge {e} has capacity {x_e}.

  • if some cut has value {\sum_{e~\mathrm{crossing\,cut}}x_e <1}, then it gives a violated constraint;
  • otherwise, i.e. if all cuts have value {\geq 1}, we have a feasible solution.
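A sketch of this oracle, using networkx's max-flow/min-cut routines (two assumptions of mine: the fractional solution is stored as a 'capacity' attribute on the edges, and one endpoint {u} is kept fixed, which suffices since every cut separates {u} from some {v}):

    # Sketch of the separation oracle for the spanning-tree LP relaxation.
    # G is an undirected networkx graph whose edge attribute 'capacity' holds x_e.
    import networkx as nx

    def separation_oracle(G):
        nodes = list(G)
        u = nodes[0]                # every cut separates u from some other vertex
        for v in nodes[1:]:
            value, (S, _) = nx.minimum_cut(G, u, v, capacity='capacity')
            if value < 1:           # sum of x_e crossing the cut is < 1
                return S            # the violated constraint is the cut around S
        return None                 # all cuts >= 1: x is feasible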

3.4. Semi-definite programming

In a semi-definite program, the unknowns are coded into a square matrix {X}, and the constraints include the fact that {X} is positive semi-definite (plus a few linear constraints).

Recall the four equivalent conditions for a real symmetric matrix {X} to be positive semi-definite: all eigenvalues are {\geq 0}; {z^{\top}Xz\geq 0} for every vector {z}; {X=V^{\top}V} for some matrix {V}; all principal minors are {\geq 0}. Depending on the algorithm and analysis, we will need one or the other of these characterizations.

Is there a separation oracle? Yes, using linear algebra. The question is: how do we test whether a symmetric matrix is positive semi-definite, and give a separating hyperplane if the answer is negative? One answer is to attempt a Cholesky decomposition {X=LL^{\top}} (using a version of Gaussian elimination); a failure yields a vector {z} with {z^{\top}Xz<0}, hence a violated linear constraint on {X}. Other answers are discussed at http://math.stackexchange.com/questions/13282/determining-whether-a-symmetric-matrix-is-positive-definite-algorithm.
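A sketch of such an oracle with numpy, using eigenvalues rather than Cholesky (the eigenvector {z} of the most negative eigenvalue gives the violated linear constraint {z^{\top}Xz\geq 0}):

    # Sketch: test positive semi-definiteness and, on failure, return a
    # separating hyperplane for the SDP.
    import numpy as np

    def psd_separation_oracle(X, tol=1e-9):
        lam, vecs = np.linalg.eigh(X)      # eigenvalues in ascending order
        if lam[0] >= -tol:
            return None                    # X is (numerically) PSD
        z = vecs[:, 0]                     # eigenvector of the smallest eigenvalue
        return np.outer(z, z)              # X violates the constraint z^T X z >= 0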

4. Using SDP for approximation algorithms: MAX CUT

Example 7 MAX CUT. Input: a graph with edge weights. Output: a partition {V=L\cup R} of the vertices. Objective: maximize {\sum_{u\in L,\,v\in R, \,uv \in E}w_{uv}}.

Remarks. 1. This is NP-hard. Note that if max is replaced by min, there is a polynomial-time algorithm.

2. A uniformly random partition cuts each edge with probability {1/2}, hence gives a {1/2}-approximation in expectation; using linear programs, we do not know how to improve on this.

3. Goemans and Williamson, 1994, [GW], have used an SDP to prove the following theorem.

Theorem 5 There is a 0.878..-approximation algorithm for MAX CUT.

Proof: Let us first try to turn this into an LP. Take as unknowns {\pm1}-valued functions on the vertices (equivalent to {0,1} integer variables, up to a linear transform). Then one must maximize {\sum_{uv}w_{uv}(1-x_u x_v)/2}. This is quadratic, not linear. Failure.

1. (SDP relaxation) Remember the following criterion for positive semi-definiteness:

\displaystyle  \begin{array}{rcl}  X\textrm{ positive semi-definite}\quad \Leftrightarrow\quad \exists\textrm{ matrix } V\textrm{ such that }X=V^{\top}V. \end{array}

If our unknowns are vectors {v_i} (the columns of {V}), then an SDP can accommodate an objective function and constraints which are linear in inner products, like

  • objective {\sum_{ij\in E}w_{ij}(1-v_i \cdot v_j)/2}, and
  • constraints {v_i \cdot v_i =1}.

The solution {X} of the SDP exactly solves MAX CUT iff its rank is equal to {1}, i.e. {V} has a single row and the {v_i} are scalars {\pm 1}. If not, the SDP solution produces a bunch of unit vectors in {{\mathbb R}^d} for some {d\leq n=} number of vertices.
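As an aside, a sketch of the relaxation itself with cvxpy (an assumption of mine: cvxpy with an SDP-capable solver is available; {W} is the symmetric weight matrix, and the function name is mine):

    # Sketch: the MAX CUT SDP relaxation.  W is the n x n symmetric weight
    # matrix; the /4 accounts for each edge being counted twice in the sum.
    import cvxpy as cp
    import numpy as np

    def solve_maxcut_sdp(W):
        n = W.shape[0]
        X = cp.Variable((n, n), PSD=True)                    # X = V^T V
        objective = cp.Maximize(cp.sum(cp.multiply(W, 1 - X)) / 4)
        cp.Problem(objective, [cp.diag(X) == 1]).solve()     # v_i . v_i = 1
        # Recover unit vectors v_i as columns of V via an eigendecomposition.
        lam, U = np.linalg.eigh(X.value)
        V = np.sqrt(np.maximum(lam, 0))[:, None] * U.T       # V^T V = X
        return V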

2. (Randomized rounding) The rounding procedure (partitioning {n} points on the unit sphere of {{\mathbb R}^d}) consists of picking a random hyperplane through the origin: vectors on one side go to {L}, vectors on the other side go to {R}. This is a cut.
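A sketch of this rounding (using the fact that the normal of a uniformly random hyperplane through the origin can be taken to be a standard Gaussian vector):

    # Sketch of the Goemans-Williamson hyperplane rounding.  V is the d x n
    # matrix whose columns are the unit vectors v_i produced by the SDP.
    import numpy as np

    def round_hyperplane(V, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        r = rng.standard_normal(V.shape[0])   # normal of a random hyperplane
        side = V.T @ r >= 0                   # which side each v_i falls on
        L = np.flatnonzero(side)
        R = np.flatnonzero(~side)
        return L, R                           # the rounded cut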

(Analysis) Let us evaluate the expectation of the objective. A uniformly random hyperplane {H} separates {v_i} from {v_j} with probability {\theta_{ij}/\pi}, so

\displaystyle  \begin{array}{rcl}  \mathop{\mathbb E}(\sum_{ij\in E}w_{ij}1_{H\textrm{ separates }v_i \textrm{ from }v_j}) =\sum_{ij\in E}\frac{1}{\pi}\theta_{ij}w_{ij}, \end{array}

where {\theta_{ij}} is the angle between {v_i} and {v_j}. On the other hand, the value of the SDP (which is {\geq OPT}) is

\displaystyle  \begin{array}{rcl}  \sum_{ij}w_{ij}\frac{1}{2}(1-v_i \cdot v_j) =\sum_{ij}w_{ij}\frac{1}{2}(1-\cos\theta_{ij}). \end{array}

Analysis gives that for all {\theta\in [0,\pi ]},

\displaystyle  \begin{array}{rcl}  \frac{1}{\pi}\theta\geq 0.878...\frac{1}{2}(1-\cos(\theta)), \end{array}

thus

\displaystyle  \begin{array}{rcl}  \mathop{\mathbb E}(\textrm{objective})\geq 0.878...\times OPT. \end{array}

\Box
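The constant can be checked numerically; a quick sketch (the minimum of the ratio is attained near {\theta\approx 2.33}):

    # Sketch: numerically verify the Goemans-Williamson constant
    #   min over theta in (0, pi] of (theta/pi) / ((1 - cos theta)/2).
    import numpy as np

    theta = np.linspace(1e-6, np.pi, 10**6)
    ratio = (theta / np.pi) / ((1 - np.cos(theta)) / 2)
    print(ratio.min())    # ~ 0.8785..., matching the 0.878.. of the theorem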

A striking, though conditional, result of S. Khot, G. Kindler, E. Mossel, R. O’Donnell 2007, [KKMO], asserts that the Goemans-Williamson approximability bound could be sharp.

Theorem 6 If the Unique Games Conjecture (UGC) holds, then no poly-time algorithm can {\alpha}-approximate MAX CUT if {\alpha>0.878..}.

Now you want to understand the UGC? Come on Friday, 3pm, at Jussieu, Atrium building, ground floor, in the large videoconference room, and listen to Subhash Khot. And attend Sanjeev Arora’s mini-course in Amphi Hermite the week after.

References

[F] Feige, Uriel; A threshold of {\ln n} for approximating set cover. J. ACM 45 (1998), no. 4, 634–652.

[GW] Goemans, Michel X.; Williamson, David P.; Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42 (1995), no. 6, 1115–1145.

[KKMO] Khot, Subhash; Kindler, Guy; Mossel, Elchanan; O’Donnell, Ryan; Optimal inapproximability results for MAX-CUT and other 2-variable CSPs? SIAM J. Comput. 37 (2007), no. 1, 319–357.

[LY] Lund, Carsten; Yannakakis, Mihalis; On the hardness of approximating minimization problems. J. ACM 41 (1994), no. 5, 969–981.

[RS] Raz, Ran; Safra, Shmuel; A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP. STOC ’97 (El Paso, TX), 475–484 (electronic), ACM, New York, 1999.

Karp, Richard M.; Reducibility among combinatorial problems. In R. E. Miller and J. W. Thatcher (editors), Complexity of Computer Computations, Plenum, New York, 1972, pp. 85–103.
