Dimension reduction in
1. The problem
Problem: Given points in Banach space , how many dimensions are needed to represent pairwise distances to within ?
In other words, given a finite subset , we want a map such that for all , ,
This works well in , this is the Johnson-Lindenstrauss theorem: for an -point set, dimensions suffice. If , are sufficient, and often necessary.
2. Old and new results
In , exact reduction (i.e. ) requires dimensions.
For reduction, a result of Schechtman (later improved by Talagrand), states that dimensions suffice. Newman and Rabinovich improved this to dimensions suffice. An unpublished result of Andoni, Naor and Newman says that dimensions suffice for large .
Today, I focus on close to . Previously, I proved that are needed. Alexandru Andoni, Ofer Neiman and Nguyen and I show
Theorem 1 For reduction, dimensions are needed.
3. The examples
The lower bound is based on a recursive -cycle graph. This means one starts with a cycle with two marked antipodal vertices and . Then we replace each edge with a copy of the cycle, and so on. For , this is called the diamond graph (basis of the lower bound). The diamond graph is isometrically embeddable in dimensions, so cannot give a sharp lower bound.
The -th graph has diameter , edges, roughly vertices. It embeds isometrically in dimensions. Note that , fitting with the Andoni, Naor and Newman upper bound.
4. From distorsion to stretch
We first show that the -cycle needs dimensions for distorsion., if . To prove this, think of an embedding as a probabilistic line embedding. I.e.
Say a line embedding has stretch if
Say a distribution over line embeddings has stretch if above inequality holds for all in the support.
Our key estimate is the following: a (low) -dimensional embedding has stretch . So it suffices to estimate stretch of line embeddings from below.
5. Analysis of embeddings of the -cycle
The canonical embedding of the -cycle uses dimensions. For instance, if , 0000, 0001, 0011, 0111, 1111, 1110, 1100, 1000. When we view it as a distribution over line embedding, we rescale by .
Let be the sum of antipodals, be the sum of edges. In the exact embedding case, , . In the -distorsion case, and , so that
Now we show that low () stretch implies . When the cycle is mapped to a line, every two antipodal pairs and are mapped to non nested intervals, so
This implies that nearby antipodal pairs are mapped to comparable lengths. This propagates to not that nearby antipodal pairs, and so on… This leads to
where are edge lengths on the line. Think of as . Then
Assume that every edge has stretch at most . Then and other . This gives if stretch is .
6. General case
Now we turn to graph . It has roughly vertices. We show that for distorsion , one needs stretch .
The canonical embedding of is built recursively, adding dimensions at each step. Edges have length , antipodal pairs at level are mapped to distance . There are such pairs.
Use a Poincaré inequality. Let be the sums of antipodals at level and the sum of edges at level . Now consider
One shows that low () distorsion implies , and low () stretch implies . For this, one adds up estimates for for all cycles in the construction.
7. Comments and questions
This style of analysis can be used to give a simple proof of our previous result. It was discovered using LP methods, including experiments.
Regev: connection with functional analysis ? Answer: this does not go to spaces. Dimension reduction in is widely open.
Question: how tight is the result ? role of aspect ratio ? Answer: there is room for slight improvements.
Linial: do things generically embed well ? these examples look extreme. Answer: there are other graphs for which we care about embeddings. The recursive nature of examples suggests is there a parallel repetition phenomenon.