Scribe: Eric Colin de Verdière

Here is the plan for today:

- Duality in linear programming,
- duality in semidefinite programming,
- application to correlation clustering (work by Warren Schudy and Claire Mathieu, SODA’10).

**1. Duality**

As an example, let us consider the following linear program (taken from

Vašek Chvátal’s book):

subject to for each and:

To prove a lower bound on the value of this program, it suffices to

find a feasible solution: is feasible and has value 9, so

.

Can we prove an upper bound on ? Yes! The second equation,

multiplied by two, reads . And its LHS is

at least the value of the objective (since the ‘s are nonnegative).

In other words, the second constraint, multiplied by two, has its

coefficients larger than in the objective function. So .

We may try some other combinations. (2)+(3) gives a proof that .

More generally, consider a *nonnegative* linear combination of the

constraints: for nonnegative

‘s. The condition that all coefficients are larger than in the

objective function rewrites as:

As long as these constraints are satisfied, we have . So we want to minimize subject to these

constraints. This is a linear program in the ‘s, called the

*dual linear program*. Of course, taking the dual of the dual gives

the initial linear program.
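The certificate argument above is easy to check numerically. Below is a minimal sketch in Python on a small LP of my own (the coefficients of Chvátal’s example are not reproduced in these notes): any feasible primal point gives a lower bound on the optimal value, and any nonnegative combination of the constraints that dominates the objective gives an upper bound.

```python
# Weak LP duality on a toy example.  The coefficients below are
# illustrative (NOT the LP from Chvatal's book, whose data is omitted
# above):  maximize c.x  subject to  A x <= b,  x >= 0.
c = [3, 2]
A = [[1, 1],
     [1, 3]]
b = [4, 6]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def primal_feasible(x):
    return (all(xi >= 0 for xi in x)
            and all(dot(row, x) <= bi for row, bi in zip(A, b)))

def dual_feasible(y):
    # y >= 0 and A^T y >= c: the combination y of the constraints
    # dominates every coefficient of the objective function.
    cols = list(zip(*A))
    return (all(yi >= 0 for yi in y)
            and all(dot(col, y) >= cj for col, cj in zip(cols, c)))

x = [4, 0]  # feasible primal point: lower bound  c.x = 12
y = [3, 0]  # feasible dual multipliers: upper bound  b.y = 12
assert primal_feasible(x) and dual_feasible(y)
print(dot(c, x), dot(b, y))  # 12 12 -- the bounds meet, so both are optimal
```

When the two bounds coincide, as here, both certificates are optimal; strong duality says that matching certificates always exist when the values are finite.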

Theorem 1

If both the primal and the dual LP have finite values, then these
values are equal. If the primal is unbounded, then the dual is not
feasible. Dually, if the primal is not feasible, then the dual is
unbounded.

No case distinction is needed under the convention that an infeasible
maximization problem has value −∞ and an unbounded one has value +∞.

*Proof:*

One inequality (weak duality) is easy; we proved it on the example
above. The other direction is harder, and we will skip it.

**2. Semi-definite programming duality**

In general, in approximation algorithms, we follow these steps:

- Solve a linear programming relaxation (say a maximization problem);
- round the solution to obtain an integer solution;
- for the analysis, argue that the value of the output is at least some factor times the value of the linear program; this gives an approximation algorithm with that factor as its ratio.

Duality gives a new possibility for approximation algorithms: If we can

find one feasible solution in the dual, such that the value of the

output is at least times the value of this feasible solution, then we

know that is at most times the output.

SDPs also have duals, so this proof technique can in principle be used.

I will illustrate this by an example on correlation clustering, but SDP

duality potentially goes far beyond.

Let us consider the following SDP:

subject to: for every ,

and the SDP constraint: is positive semi-definite.

(This presentation differs from the one used in previous lectures, but it

is more convenient to demonstrate duality.)

We rewrite the SDP constraint as: For every , . This

constraint is linear in the ‘s. In this sense the SDP program is

just an LP with an infinite number of constraints.

We can write the dual as for a linear program. Eventually, we find that

, where is a vector, , and

depends on the parameters of the primal. We recognize that this is

equivalent to having positive semi-definite.

Interlude by Nati Linial: This works for convex programming also, via

polarity.

More precisely, let be a convex set containing the origin in its interior. Let
be the *polar*
of . Observe that if is the set of PSD matrices, then .

is PSD if and only if, for every PSD, we have . (This

is not entirely trivial but easy.)
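One direction of this equivalence (a PSD matrix has nonnegative inner product with every PSD matrix) can be sanity-checked on small examples; here is a quick pure-Python sketch for 2×2 matrices (the example matrices are my own):

```python
# Sanity check: tr(A B) >= 0 when A and B are 2x2 positive semi-definite.
# A symmetric 2x2 matrix [[a, b], [b, d]] is PSD iff a >= 0, d >= 0 and
# its determinant a*d - b*b is >= 0.

def is_psd_2x2(m):
    (a, b), (b2, d) = m
    return b == b2 and a >= 0 and d >= 0 and a * d - b * b >= 0

def trace_product(m, n):
    # tr(M N) = sum_{i,j} M[i][j] * N[j][i]
    return sum(m[i][j] * n[j][i] for i in range(2) for j in range(2))

A = [[2, 1], [1, 1]]    # PSD: det = 1
B = [[1, -1], [-1, 2]]  # PSD: det = 1
assert is_psd_2x2(A) and is_psd_2x2(B)
print(trace_product(A, B))  # 2 -- nonnegative, as claimed
```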

For each , is equivalent to: For each , we

have . This is the same phenomenon as LP duality.

**3. Correlation clustering**

Today we consider a graph with similarity or dissimilarity constraints, and

we want to partition the vertex set by minimizing the number of

disagreements. (This is more difficult than the previous clustering

problem we studied: the exact problem is the same, but in terms of

approximation, this is different.)

More precisely, given the complete graph where all edges are labeled either
+ (similar) or − (dissimilar), find a partition of the vertex set that
minimizes the number of inconsistent edges: an edge is inconsistent if it
is labeled + but its two vertices are in different pieces of the
partition, or it is labeled − but its two vertices are in the same piece
of the partition.

The problem is of course NP-hard, so let us look for an approximation
algorithm. If the optimum is zero, there exists a partition satisfying all
the constraints. In this case, the partition is easy to find using a simple
greedy algorithm: iteratively take a vertex and put together with it every
vertex joined to it by a + edge. That is one cluster; set it aside and
repeat.

A simple randomized variant provides a good approximation for the problem
(even if the optimum is nonzero): the vertex taken at each step is chosen
uniformly at random among the remaining vertices. Let us call this variant
“Randomized greedy”.
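A minimal sketch of “Randomized greedy” (the pivot algorithm of Ailon, Charikar and Newman; the graph encoding and function names here are my own): repeatedly pick a remaining vertex uniformly at random, cluster it with its remaining + neighbors, set the cluster aside and repeat.

```python
import random

def randomized_greedy(n, plus_edges, seed=None):
    """Cluster vertices 0..n-1 of a complete graph whose '+' edges are
    given as a set of frozensets {u, v}; every other edge is '-'."""
    rng = random.Random(seed)
    remaining = set(range(n))
    clusters = []
    while remaining:
        pivot = rng.choice(sorted(remaining))  # uniformly random pivot
        cluster = {pivot} | {v for v in remaining
                             if frozenset((pivot, v)) in plus_edges}
        clusters.append(cluster)
        remaining -= cluster                   # set the cluster aside, repeat
    return clusters

def disagreements(n, plus_edges, clusters):
    """Number of inconsistent edges of a clustering."""
    where = {v: i for i, c in enumerate(clusters) for v in c}
    return sum(1
               for u in range(n) for v in range(u + 1, n)
               if (frozenset((u, v)) in plus_edges) != (where[u] == where[v]))

# A perfectly clusterable instance: '+' cliques {0, 1, 2} and {3, 4}.
plus = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (3, 4)]}
cl = randomized_greedy(5, plus, seed=1)
print(disagreements(5, plus, cl))  # 0 -- every constraint is satisfied
```

On instances with optimum zero, any pivot recovers its whole cluster exactly, so the algorithm always outputs the perfect partition; the theorem below bounds its expected cost in general.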

Theorem 2

Randomized greedy is a 3-approximation for this problem.

Here is an example showing why randomization is necessary. Take a star:
one center vertex joined by + edges to every other vertex, all remaining
edges labeled −. If the center is the first vertex picked by the
algorithm, then we are really far from the optimum: all vertices end up in
one cluster, and the cost is quadratic in the number of vertices, whereas
the optimum is linear. On the other hand, if we use “Randomized greedy”,
then the center is picked first only with small probability, and in the
remaining cases the first cluster is a single + edge and the resulting
solution is near-optimal. On average, we obtain a solution with cost
within a constant factor of the optimum.

*Proof:*

The analysis uses the notion of “bad triangle”. A triangle is bad if
exactly one of its edges is labeled −: whatever the partition, at least
one of the three edges of a bad triangle is inconsistent. Certainly, the
optimum is at least the maximum number of edge-disjoint bad triangles.
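Bad triangles (two + edges and one − edge; no partition can satisfy all three) can be enumerated directly, which makes the lower bound concrete. A sketch on a star instance like the one discussed above (the encoding is mine):

```python
from itertools import combinations

def bad_triangles(n, plus_edges):
    """Triangles of the complete graph on 0..n-1 with exactly two '+'
    edges and one '-' edge; plus_edges is a set of frozensets {u, v},
    and every pair not listed there is a '-' edge."""
    bad = []
    for t in combinations(range(n), 3):
        n_plus = sum(frozenset(e) in plus_edges for e in combinations(t, 2))
        if n_plus == 2:
            bad.append(t)
    return bad

# Star instance: vertex 0 has '+' edges to 1..4, all other edges are '-'.
plus = {frozenset((0, v)) for v in range(1, 5)}
print(len(bad_triangles(5, plus)))  # 6 -- one bad triangle per pair of leaves
```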

More generally, to every bad triangle we associate a nonnegative value.
We consider combinations of bad triangles satisfying the following
condition: for each edge, the sum of the values of the bad triangles
containing that edge is at most one. The same argument shows that the
optimum is at least the total value of such a combination.

Analysis: For each bad triangle, let its value be the probability that,
during a random execution of “Randomized greedy”, one of the triangle’s
vertices is chosen before the triangle disappears. Then I claim that
these values, divided by 3, form a feasible solution of the dual.

Proof: Consider an edge, contained in a certain number of bad triangles,
and consider the first vertex of the union of these triangles picked by
the algorithm. If it is an endpoint of the edge, then the algorithm
destroys all these bad triangles at once; if it is one of the other
vertices, it destroys a single triangle. Thus the sum of the values,
over all bad triangles containing the edge, is at most 3.

The rest of the proof is left as an exercise.

Theorem 3

Consider now the complete graph where each edge carries a similarity
weight and a dissimilarity weight summing to one (instead of + and −
labels, we have weights). Then “Randomized greedy” (where you put a
neighbor of the picked vertex in the same cluster as that vertex iff its
similarity weight is at least 1/2) is a 5-approximation.

Kleinberg proved that the problem does not admit a PTAS (unless P=NP).

Which variant of the problem admits a PTAS? Let us change the model and

move away from the worst case to the “planted model”.\footnote{The name

comes from the original study of such models in the max-clique problem;

cliques were “planted”, imposed on a random graph. See Uriel Feige’s

work.}

**4. Other model**

Assume that we are given a (possibly non-complete) graph with + and −
labels on the edges. Assume that there is a perfect clustering, but that
the input is generated by a random perturbation: the label of every edge
is flipped independently with some fixed probability.
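The perturbation process is easy to simulate; here is a sketch (the encoding of clusterings and labels is my own) that builds the ideal labelling from a true clustering and then flips each edge label independently:

```python
import random
from itertools import combinations

def planted_instance(clusters, p, seed=None):
    """From a true clustering (a list of disjoint vertex sets), build
    the perfect labelling ('+' inside a cluster, '-' across clusters),
    then flip every edge label independently with probability p."""
    rng = random.Random(seed)
    where = {v: i for i, c in enumerate(clusters) for v in c}
    labels = {}
    for u, v in combinations(sorted(where), 2):
        plus = where[u] == where[v]
        if rng.random() < p:  # independent noise on each edge
            plus = not plus
        labels[frozenset((u, v))] = '+' if plus else '-'
    return labels

labels = planted_instance([{0, 1, 2}, {3, 4}], p=0.1, seed=0)
print(len(labels))  # 10 -- one label per edge of the complete graph on 5 vertices
```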

Theorem 4

There is an algorithm that achieves an -approximation in expectation.

This slightly improves the state of the art, but with very different

(SDP) techniques.

\note{From here on, refer to Claire’s slides also.}

View the solution as a set of vectors , such that is iff

the th vertex is in cluster . In this sense we want to minimize the

Hamming distance between the input and .

The algorithm is now very simple: Solve the dual SDP relaxation using the

ellipsoid method. This defines for every edge , which we

interpret as . This gives another clustering problem. We solve

it using the 5-approximation algorithm described before.

We assume here that the flip probability is a constant (say ).

Let be the “true” clustering before applying the noise, be the

input, and be the solution of the SDP.

Lemma 5

.

*Proof:*

Assume not. Imagine that we add to the initial SDP, called SDP1, the

constraint that . Let SDP2 be the resulting

SDP. We get a new solution that must have a *larger* value.

\note{To be filled…}

We obtain that a certain matrix
must be positive semi-definite, where the noise comes from the perturbation of the input.

For this step, we find an oracle (a mathematician) and ask about the
eigenvalues of random matrices. It turns out that the eigenvalues of the
noisy matrix are in expectation , so all eigenvalues are
nonnegative, and the matrix is positive semi-definite (with high
probability). So with high probability, the dual is feasible, and the
two SDPs are different.

In applications, the similarity and dissimilarity weights do not sum up
to one. This extension of the model has also been studied. The problem
becomes more difficult to approximate (best known is a -approximation
or worse).

If , it is just as if the edge were not there.

And if the input is not really a complete graph, things become much more

complicated.

References:

- 3-approximation and 5-approximation: Ailon, Charikar, Newman, 2005.
- Inapproximability: Jon Kleinberg, 1990s.

SDP duality has been used in problems related to the Unique Games

Conjecture. Maybe this will prove useful for other problems as well.

There are also recent algorithms for solving SDPs that take advantage of the

dual, by Arora and one of his PhD students.

Next course in March: detailed presentation of the subexponential algorithm

for UGC presented by Arora and Steurer.