## Notes of Moses Charikar’s talk, jan. 17th

Dimension reduction in ${\ell_1}$

1. The problem

Problem: Given ${n}$ points in Banach space ${\ell_1}$, how many dimensions are needed to represent pairwise distances to within ${1\pm\epsilon}$ ?

In other words, given a finite subset ${F\subset \ell_{1}^{d}}$, we want a map ${f:\ell_{1}^{d}\rightarrow\ell_{1}^{d'}}$ such that for all ${x}$, ${y\in F}$,

$\displaystyle \begin{array}{rcl} \frac{1}{1+\epsilon}\|x-y|_{1}\leq\|f(x)-f(y)|_{1}\leq \|x-y|_{1}. \end{array}$

This works well in ${\ell_{2}}$, this is the Johnson-Lindenstrauss theorem: for an ${n}$-point set, ${d'=O(\epsilon^{-2}\log n)}$ dimensions suffice. If ${\epsilon=0}$, ${d'=n-1}$ are sufficient, and often necessary.

2. Old and new results

In ${\ell_1}$, exact reduction (i.e. ${\epsilon=0}$) requires ${\Theta(n^2)}$ dimensions.

For ${1+\epsilon}$ reduction, a result of Schechtman (later improved by Talagrand), states that ${O(\epsilon^{-2}n\log n)}$ dimensions suffice. Newman and Rabinovich improved this to ${O(\epsilon^{-2} n)}$ dimensions suffice. An unpublished result of Andoni, Naor and Newman says that ${O(n/D)}$ dimensions suffice for large ${D}$.

Lower bounds. With Brinkman, I proved that ${n^{\Omega(1/D^2)}}$ dimensions are necessary for distorsion ${D}$, [BC]. A simpler proof has been given since by James Lee and Assaf Naor, [LN].

Today, I focus on ${D}$ close to ${1}$. Previously, I proved that ${n^{\frac{1}{2}+\epsilon}}$ are needed. Alexandru Andoni, Ofer Neiman and Nguyen and I show

Theorem 1 For ${1+\epsilon}$ reduction, ${n^{1-O(1/\log(1/\epsilon))}}$ dimensions are needed.

3. The examples

The lower bound is based on a recursive ${2k}$-cycle graph. This means one starts with a cycle with two marked antipodal vertices ${s}$ and ${t}$. Then we replace each edge with a copy of the cycle, and so on. For ${k=2}$, this is called the diamond graph (basis of the ${n^{\Omega(1/D^2)}}$ lower bound). The diamond graph is isometrically embeddable in ${\sqrt{n}}$ dimensions, so cannot give a sharp lower bound.

The ${i}$-th graph ${G_i}$ has diameter ${k^{i}}$, ${(2k)^{i}}$ edges, roughly ${(2k)^{i}}$ vertices. It embeds isometrically in ${k^i}$ dimensions. Note that ${k^i \sim n^{1-1/\log k}}$, fitting with the Andoni, Naor and Newman upper bound.

4. From distorsion to stretch

We first show that the ${2k}$-cycle needs ${1/2\epsilon}$ dimensions for ${1=\epsilon}$ distorsion., if ${\epsilon\leq 1/k}$. To prove this, think of an ${\ell_1}$ embedding as a probabilistic line embedding. I.e.

$\displaystyle \begin{array}{rcl} |x-y|=\sum_{i=1}^{n}|x_i -y_i|=\mathop{\mathbb E}(d|x_i -y_i|). \end{array}$

Say a line embedding has stretch ${s}$ if

$\displaystyle \begin{array}{rcl} |f(x)-f(y)|\leq s.d(x,y). \end{array}$

Say a distribution over line embeddings has stretch ${s}$ if above inequality holds for all ${f}$ in the support.

Our key estimate is the following: a (low) ${d}$-dimensional embedding has stretch ${d}$. So it suffices to estimate stretch of line embeddings from below.

5. Analysis of embeddings of the ${2k}$-cycle

The canonical embedding of the ${2k}$-cycle uses ${k}$ dimensions. For instance, if ${k=4}$, 0000, 0001, 0011, 0111, 1111, 1110, 1100, 1000. When we view it as a distribution over line embedding, we rescale by ${k}$.

Let ${L}$ be the sum of antipodals, ${E}$ be the sum of edges. In the exact embedding case, ${L=k^2}$, ${E=2k}$. In the ${(1+\epsilon)}$-distorsion case, ${L\geq k^2 /1+\epsilon}$ and ${E\leq 2k}$, so that

$\displaystyle \begin{array}{rcl} 2L-k(1-2\epsilon)E \geq 2\frac{k^2}{1+\epsilon}-k(1-2\epsilon)2k=2\epsilon k^2 . \end{array}$

Now we show that low (${<1/2\epsilon}$) stretch implies ${2L-k(1-2\epsilon)E <2\epsilon k^2}$. When the cycle is mapped to a line, every two antipodal pairs ${(a,b)}$ and ${(a',b')}$ are mapped to non nested intervals, so

$\displaystyle \begin{array}{rcl} |f(a)-f(a')|+|f(b)-f(b')|\leq |f(a)-f(b)|+|f(a')-f(b')|. \end{array}$

This implies that nearby antipodal pairs are mapped to comparable lengths. This propagates to not that nearby antipodal pairs, and so on… This leads to

$\displaystyle \begin{array}{rcl} 2L\leq kE_0 +(k-2)E_1 +(k-4)E_2 +\cdots+2E_{k/2 -1}, \end{array}$

where ${E_i}$ are edge lengths on the line. Think of ${\epsilon}$ as ${i/k}$. Then

$\displaystyle \begin{array}{rcl} 2L-k(1-2\epsilon)E =2L-(k-2i)E \leq 2iE_0 +2(i-1)E_1 +2(i-2)E_2 +\cdots \end{array}$

Assume that every edge has stretch at most ${s}$. Then ${E_0 =2s}$ and other ${E_j =4s}$. This gives ${2L-k(1-2\epsilon)E\leq 4si^2 \leq 2\epsilon k^2}$ if stretch is ${\leq 1/2\epsilon}$.

6. General case

Now we turn to graph ${G_t}$. It has roughly ${(2k)^t}$ vertices. We show that for distorsion ${1+1/k}$, one needs stretch ${\geq (k/16)^t \simeq n^{1- 5/\log k}}$.

The canonical embedding of ${G_t}$ is built recursively, adding ${k}$ dimensions at each step. Edges have length ${1}$, antipodal pairs at level ${i}$ are mapped to distance ${k^{t-i+1}}$. There are ${2^{i-1}}$ such pairs.

Use a Poincaré inequality. Let ${L_i}$ be the sums of antipodals at level ${i}$ and ${E}$ the sum of edges at level ${t}$. Now consider

$\displaystyle \begin{array}{rcl} P=\sum_{i=1}^{t}2^{t+1-i}L_i -t(k-2)E. \end{array}$

One shows that low (${1+1/k}$) distorsion implies ${P\geq t(2k)^t}$, and low (${(k/16)^t}$) stretch implies ${P. For this, one adds up estimates for ${2L}$ for all cycles in the construction.

This style of analysis can be used to give a simple proof of our previous result. It was discovered using LP methods, including experiments.

Regev: connection with functional analysis ? Answer: this does not go to ${\ell_p}$ spaces. Dimension reduction in ${\ell_p}$ is widely open.

Question: how tight is the result ? role of aspect ratio ? Answer: there is room for slight improvements.

Linial: do things generically embed well ? these examples look extreme. Answer: there are other graphs for which we care about embeddings. The recursive nature of examples suggests is there a parallel repetition phenomenon.

References

[BC] Brinkman, B; Charikar, Moses; On the Impossibility of Dimension Reduction in l1. FOCS (2003).

[LN] a, Embedding the Diamond Graph in Lp and Dimension Reduction in L1. GAFA 14 (2004) 745–747.