## Notes of Assaf Naor’s talk, Jan. 17th

Approximate kernel clustering

Two papers with Subhash Khot, [KN1], [KN2], plus a recent update. Motivated by practitioners (Song, Smola, Gretton, Borgwardt), who came up with nice experiments and raised a beautiful question.

1. Statistical motivation

Let ${X_1 ,\ldots,X_n}$ be random variables and ${A=(a_{ij})=}$ their covariance matrix, called the kernel. We want to cluster ${A}$, i.e. find a partition ${S_1 ,\ldots,S_k}$ of ${[n]}$ and form the much smaller ${k\times k}$ matrix

$\displaystyle \begin{array}{rcl} C_{ij}=\sum_{p\in S_i ,\,q\in S_j} a_{pq}. \end{array}$

Given a small, ${k\times k}$, positive semidefinite matrix ${B}$, called the hypothesis matrix, find a clustering of ${A}$ such that ${trace(BC)}$ is maximal. Notation: ${Clust(A|B)=\max\{trace(BC)\,;\,\textrm{all partitions}\}}$.

This contains many problems, related for instance to locating low dimensional manifolds in large data spaces.
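For intuition, here is a brute-force sketch (a hypothetical illustration, not the algorithm of the talk) that computes ${Clust(A|B)}$ by enumerating all maps ${[n]\rightarrow[k]}$; it is of course exponential in ${n}$:

```python
import itertools
import numpy as np

def clust(A, B):
    """Brute-force Clust(A|B): maximize trace(B C) over all assignments
    [n] -> [k], where C[i][j] sums A[p][q] over p in S_i, q in S_j."""
    n, k = A.shape[0], B.shape[0]
    best = -np.inf
    for assignment in itertools.product(range(k), repeat=n):
        C = np.zeros((k, k))
        for p in range(n):
            for q in range(n):
                C[assignment[p], assignment[q]] += A[p, q]
        best = max(best, np.trace(B @ C))
    return best

# Tiny example: a 2x2 centered kernel with B = identity; splitting the
# two indices into separate clusters is optimal.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(clust(A, np.eye(2)))   # -> 2.0
```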

Question: is there a polytime algorithm to compute, or at least approximate, ${Clust(A|B)}$?

Theorem 1 Exact computation is NP-hard, but there exists a polytime algorithm which computes ${Clust(A|B)}$ up to the factor ${1+\frac{3\pi}{2}}$.

2. Towards optimal results ?

Say ${A}$ is centered if ${\sum_{ij}a_{ij}=0}$. In case ${A}$ is centered, we believe that our results are optimal.

Example 1 ${B=}$ identity matrix. Then the goal is to find a partition which maximizes the sum of correlations within each piece.

Example 2 ${B=\begin{pmatrix} 1&-1 \\ -1&1 \end{pmatrix}}$. Then the goal is the positive semidefinite Grothendieck problem.

There is a natural method, which consists in replacing the signs ${\pm 1}$ by unit vectors and products of signs by inner products, i.e. an SDP relaxation. Grothendieck showed that the integrality gap of this SDP relaxation is bounded, and ${\geq \pi/2}$. Later on, Rietz showed that it is equal to ${\pi/2}$. For general ${n\times n}$ matrices ${A}$, the best ${K}$ is ${\Theta(\log n)}$ (Nemirovski et al., Alon and the Makarychev brothers).
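A numerical sketch of the two sides of this relaxation in the positive semidefinite case (assumptions: a random PSD instance, and the relaxed maximum is only grid-searched over unit vectors on a circle, which lower-bounds the full SDP value but already contains the signs ${\pm 1}$):

```python
import itertools
import numpy as np

def sign_max(A):
    """Integer side: max of eps^T A eps over signs eps in {-1,+1}^n."""
    n = A.shape[0]
    return max(np.array(e) @ A @ np.array(e)
               for e in itertools.product([-1.0, 1.0], repeat=n))

def circle_max(A, steps=40):
    """Relaxed side restricted to unit vectors on the circle: grid-search
    the angles; the Gram matrix entries are cos(theta_i - theta_j)."""
    n = A.shape[0]
    thetas = np.linspace(0, 2 * np.pi, steps, endpoint=False)
    best = -np.inf
    for combo in itertools.product(thetas, repeat=n):
        t = np.array(combo)
        G = np.cos(t[:, None] - t[None, :])
        best = max(best, float(np.sum(A * G)))
    return best

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 3))
A = X @ X.T                        # a positive semidefinite instance
ratio = circle_max(A) / sign_max(A)
print(ratio, np.pi / 2)            # ratio lies in [1, pi/2] by Rietz
```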

Example 3 ${A=}$ Laplacian of a graph, i.e. ${a_{ii}=d_{i}}$, ${a_{ij}=-1}$ if ${ij\in E}$, ${=0}$ otherwise.

${trace(BC)=4E(S,S^c)}$, where ${E(S,S^c)}$ is the number of edges between ${S}$ and its complement, so ${Clust(A|\begin{pmatrix} 1&-1 \\ -1&1 \end{pmatrix})=4\,}$MAX CUT. So CLUST inherits an inapproximability threshold from MAX CUT.
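This identity is easy to check numerically; a small sketch on a hypothetical example graph:

```python
import numpy as np

# Path graph on 4 vertices: edges (0,1), (1,2), (2,3).
edges = [(0, 1), (1, 2), (2, 3)]
n = 4
A = np.zeros((n, n))                 # graph Laplacian as defined above
for i, j in edges:
    A[i, j] -= 1; A[j, i] -= 1
    A[i, i] += 1; A[j, j] += 1

B = np.array([[1.0, -1.0], [-1.0, 1.0]])

def trace_BC(A, S):
    """trace(B C) for the 2-part partition (S, S^c)."""
    parts = [S, [v for v in range(n) if v not in S]]
    C = np.array([[A[np.ix_(p, q)].sum() for q in parts] for p in parts])
    return float(np.trace(B @ C))

S = [0, 2]                           # this cut severs every edge of the path
cut = sum(1 for i, j in edges if (i in S) != (j in S))
print(trace_BC(A, S), 4 * cut)       # -> 12.0 12
```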

Theorem 2 Assume ${A}$ is centered and ${B}$ is centered and spherical (i.e. ${1}$‘s on the diagonal) or ${B=}$ identity matrix. Then there exists a polytime algorithm that computes ${Clust(A|B)}$ up to

• If ${k=2}$, ${\frac{\pi}{2}}$, sharp assuming UGC.
• If ${k=3}$, ${\frac{16\pi}{27}}$, sharp assuming UGC.
• If ${k\geq 4}$, ${\frac{8\pi}{9}(1-\frac{1}{k})}$.

The last value may be tight as well. It depends on a geometric conjecture which I discuss now.

3. A geometric puzzle

Puzzle: Partition the plane into 3 measurable sets ${A,B,C}$. Define the Gaussian moment ${x_A}$ of a measurable set ${A}$ as the vector whose components are the integrals of the coordinate functions over ${A}$ with respect to the Gaussian measure, i.e. ${x_A =\int_A x\,d\gamma(x)}$. For which partition of the plane is ${|x_A|^2 +|x_B|^2 +|x_C|^2}$ maximal?

Theorem 3 Maximum is attained at partitions into 3 equal sectors.
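A Monte Carlo sketch of the planar puzzle (a hypothetical check, not part of the talk): compare three equal ${120^\circ}$ sectors against, say, three vertical strips of equal Gaussian measure; the sectors should win, with value close to ${9/8\pi}$.

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.standard_normal((200_000, 2))   # samples of the planar Gaussian

def objective(labels, pts):
    """Monte Carlo estimate of the sum over pieces of |x_piece|^2, where
    x_piece = integral of the coordinate functions over the piece."""
    total = 0.0
    for c in np.unique(labels):
        moment = pts[labels == c].sum(axis=0) / len(pts)
        total += moment @ moment
    return total

# Partition 1: three equal 120-degree sectors.
angles = np.mod(np.arctan2(pts[:, 1], pts[:, 0]), 2 * np.pi)
sectors = (angles // (2 * np.pi / 3)).astype(int)

# Partition 2: three vertical strips of equal Gaussian measure
# (tertile cuts at roughly +-0.4307).
strips = np.digitize(pts[:, 0], [-0.4307, 0.4307])

print(objective(sectors, pts))   # close to 9/(8*pi) ~ 0.358
print(objective(strips, pts))    # strictly smaller, ~ 0.264
```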

Next puzzle: Partition ${k}$-space into ${k+1}$ measurable sets, same question. We do not know the answer, but we formulate the Propeller conjecture: only 3 pieces are nonempty, and they are products of ${2}$-dimensional sectors with complementary subspaces.

Theorem 4 In a partition of ${{\mathbb R}^{k-1}}$ which maximizes the sum of the squares of Gaussian moments, the nonempty pieces are products of vector subspaces and pieces of a simplicial and conical partition in lower dimension.

If we could prove that the simplices are regular, the Propeller Conjecture would follow.

A similar phenomenon (only a constant number of pieces are necessary) already showed up in work of Keith Ball.

4. Back to the algorithm for CLUST${(A|B)}$

Given a positive semidefinite matrix ${B}$, view it as a Gram matrix ${(v_i \cdot v_j)}$. Let ${R(B)}$ be the radius of the smallest Euclidean ball containing all the ${v_i}$‘s. Let ${C(B)}$ be the maximum of ${\sum_{ij}b_{ij}z_{i}\cdot z_{j}}$ over all vectors ${z_i}$ arising as Gaussian moments of the pieces of a partition of Euclidean space.

Theorem 5 There is a polytime algorithm which, given ${A}$, computes ${Clust(A|B)}$ up to the factor ${\frac{R(B)^2}{C(B)}}$. Assuming the UGC, no polytime algorithm can do better.

For ${B=B(c)=\begin{pmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&c \end{pmatrix}}$, the UGC threshold is

$\displaystyle \begin{array}{rcl} \frac{4\pi c (1+c)^2}{(1+2c)^3}&&\quad \textrm{if }c\geq\frac{1}{2},\\ \frac{\pi (1+c)^2}{2+4c}&&\quad \textrm{if }c\leq\frac{1}{2},\\ \end{array}$
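A quick sanity check of this threshold (taking the two formulas exactly as stated above): the branches agree at ${c=1/2}$, and ${c=1}$ recovers the ${k=3}$ constant ${16\pi/27}$ of Theorem 2, since ${B(1)}$ is the ${3\times 3}$ identity.

```python
import numpy as np

def threshold(c):
    """UGC threshold for B(c) = diag(1, 1, c), as stated above."""
    if c >= 0.5:
        return 4 * np.pi * c * (1 + c) ** 2 / (1 + 2 * c) ** 3
    return np.pi * (1 + c) ** 2 / (2 + 4 * c)

print(threshold(0.5), 9 * np.pi / 16)    # both branches give 9*pi/16
print(threshold(1.0), 16 * np.pi / 27)   # matches the k = 3 constant
```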

4.1. Connection with Grothendieck’s constant

Grothendieck’s inequality is

$\displaystyle \begin{array}{rcl} \max_{x_i \in S^{k-1}}\sum_{ij}a_{ij}x_i \cdot x_j \leq K\,\max_{\epsilon_i =\pm 1}\sum_{ij}a_{ij}\epsilon_i \epsilon_j . \end{array}$

Given ${k}$ vectors ${v_j}$ in ${{\mathbb R}^k}$, there is a generalization

$\displaystyle \begin{array}{rcl} \max_{x_i \in S^{k-1}}\sum_{ij}a_{ij}x_i \cdot x_j \leq K\,\max_{u_i \in \{v_1 ,\ldots,v_k \}}\sum_{ij}a_{ij}u_i \cdot u_j . \end{array}$

Theorem 6 The best ${K}$ in the generalized Grothendieck inequality above is ${1/C(B)}$ where ${B=(v_i \cdot v_j)_{ij}}$.

Proof: Project… $\Box$

4.2. Reduction from UGC to CLUST

Clustering amounts to maximizing over maps ${[n]\rightarrow[k]}$. The reduction amounts to assigning to each function ${f:[k]^n \rightarrow [k]}$ a number ${OBJ(f)}$. Assume that all coordinate functions have ${OBJ(x_i)\geq a}$, and that if ${f}$ has no influential coordinate, then ${OBJ(f)\leq b+o(1)}$. This implies that the UGC threshold is ${a/b}$.

References

[KN1] Khot, Subhash; Naor, Assaf; Approximate kernel clustering. Mathematika 55 (2009), no. 1-2, 129–165.

[KN2] Khot, Subhash; Naor, Assaf; Sharp kernel clustering algorithms and their associated Grothendieck inequalities. SODA 2010.