Notes of Gil Kalai’s lecture nr 3

Before I go on, let me come back to the discussion of the optimality of the KKL theorem. It is sharp only to the extent that the total influence {I(f)} is large: a small maximal influence forces a large total influence. One can be a bit more precise, even if one is merely interested in {\max_k I_k}.

Theorem 1 (case {t=\mathop{\mathbb P}(f=1)=1/2}) Let {f} be Boolean. If {\max_k I_k (f)\leq s}, then

\displaystyle  \begin{array}{rcl}  I(f)\geq c\log(\frac{1}{s}). \end{array}

But there is life beyond {\max_k I_k}. This is captured in Talagrand’s improvement.

Theorem 2 (Talagrand) Let {f} be Boolean with {\mathop{\mathbb P}(f=1)=t}. Then

\displaystyle  \begin{array}{rcl}  \sum_{k}\frac{I_k (f)}{\log 1/I_k (f)}\geq ct(1-t). \end{array}

This is sharp (up to a multiplicative constant) as a description of the whole vector of influences. Sharpness examples are variants of the tribes function with unequal tribe sizes.
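
To see the numbers behind both statements, here is a small numerical sketch (Python; exact formulas, not sampling — the equal-size tribes function rather than the unequal variants, and the tuning of {m} is my choice). With {m} tribes of size {w} ({n=mw}), a bit is pivotal iff the rest of its tribe is all ones and no other tribe is all ones, so {I_k =2^{-(w-1)}(1-2^{-w})^{m-1}}; choosing {m\approx 2^w \ln 2} keeps {\mathop{\mathbb P}(f=1)} near {1/2}. One sees {\max_k I_k} of order {\frac{\log n}{n}}, {I(f)} of order {\log n}, and the Talagrand sum staying bounded:

from math import log

# Tribes: n = m*w bits, f = 1 iff some tribe is all ones.
# A bit is pivotal iff the rest of its tribe is all ones
# and no other tribe is all ones.
def tribes_stats(m, w):
    n = m * w
    Ik = 2 ** (-(w - 1)) * (1 - 2 ** (-w)) ** (m - 1)  # influence of each bit
    return n, Ik, n * Ik, n * Ik / log(1 / Ik)

for w in (6, 10, 14, 18):
    m = round(2 ** w * log(2))          # tuned so P(f = 1) stays near 1/2
    n, Ik, total, tal = tribes_stats(m, w)
    print(f"n={n:>9}  max I_k={Ik:.2e}  I(f)={total:.2f}  Talagrand sum={tal:.2f}")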

1. Application to first passage percolation

What I discuss today in the special case of the square grid holds in much greater generality; see Raphaël Rossignol's work.

Theorem 3 (Benjamini, Kalai, Schramm) Let {D=d((0,0),(v,0))} in the random metric on the square grid, with edge-lengths chosen independently and uniformly at random in {\{1,2\}}. Then

\displaystyle  \begin{array}{rcl}  Var(D)\leq C\frac{v}{\log v}. \end{array}

Theorem 4 (Kesten, Talagrand) With {D} as above,

\displaystyle  \begin{array}{rcl}  \mathop{\mathbb P}(D\geq \mathrm{median}(D)+t\sqrt{v})\leq e^{-ct^2}. \end{array}

One might believe that {D} is asymptotically Gaussian. This is not believed to be the case.

Conjecture. {Var(D)} grows like {v^{2/3}}.
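
None of the following is from the lecture, but the quantities above are easy to probe numerically. Here is a hedged Monte Carlo sketch (Python): it runs Dijkstra on a finite strip of the square grid (a truncation of the full plane, and the strip width and function names are my choices) with i.i.d. uniform edge-lengths in {\{1,2\}}, and estimates the mean and variance of {D}:

import heapq, random, statistics

def fpp_distance(v, width, rng):
    # Shortest-path distance from (0,0) to (v,0) on the strip
    # [0,v] x [-width,width]; edge lengths i.i.d. uniform in {1,2},
    # sampled lazily as edges are first examined.
    lengths = {}
    def edge(p, q):
        key = (min(p, q), max(p, q))
        if key not in lengths:
            lengths[key] = rng.choice((1, 2))
        return lengths[key]
    dist = {(0, 0): 0}
    pq = [(0, (0, 0))]
    while pq:
        d, (x, y) = heapq.heappop(pq)
        if (x, y) == (v, 0):
            return d
        if d > dist[(x, y)]:
            continue
        for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= q[0] <= v and abs(q[1]) <= width:
                nd = d + edge((x, y), q)
                if nd < dist.get(q, float("inf")):
                    dist[q] = nd
                    heapq.heappush(pq, (nd, q))

rng = random.Random(0)
for v in (10, 20, 40, 80):
    sample = [fpp_distance(v, v // 2, rng) for _ in range(200)]
    print(v, statistics.mean(sample), statistics.variance(sample))

At these sizes one can already watch the variance grow much more slowly than {v}, though distinguishing {v/\log v} from {v^{2/3}} numerically is hopeless at this scale.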

1.1. Our approach to first passage percolation

{D} is {{\mathbb N}}-valued, but {D_k =D-D\circ\sigma_k} (where {\sigma_k} flips the length of the {k}-th edge) is {\{-1,0,1\}}-valued, so most of our previous calculations apply to {D}.

\displaystyle  \begin{array}{rcl}  Var(D)=\sum_{S\not=\emptyset}\hat{D}_S^2 \leq \sum_{S}|S|\hat{D}_S^2 = I(D)=\sum_{k}I_k (D). \end{array}
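
For a Boolean {f} these two identities can be checked mechanically. A small sketch (Python; I use the standard Walsh characters {\chi_S (x)=\prod_{i\in S}x_i} — a different sign convention from the one quoted later in these notes, but the squares agree), exercising them on an arbitrary function on {\Omega_4}:

import itertools, random
from math import prod

random.seed(0)
n = 4
cube = list(itertools.product([-1, 1], repeat=n))
f = {x: random.choice([-1, 1]) for x in cube}   # an arbitrary Boolean function

subsets = [S for r in range(n + 1) for S in itertools.combinations(range(n), r)]
fhat = {S: sum(f[x] * prod(x[i] for i in S) for x in cube) / 2 ** n
        for S in subsets}

def flip(x, k):
    y = list(x); y[k] = -y[k]; return tuple(y)

var = 1 - fhat[()] ** 2    # Var(f) for f with values +-1, since E[f^2] = 1
influences = [sum(f[x] != f[flip(x, k)] for x in cube) / 2 ** n
              for k in range(n)]

print(var, sum(fhat[S] ** 2 for S in subsets if S))                  # equal
print(sum(influences), sum(len(S) * fhat[S] ** 2 for S in subsets))  # equal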

Given {x=(x_1 ,\ldots,x_n)}, let {h(x)} be the number of neighbours {y} of {x} such that {D(y)\not=D(x)}. Then

\displaystyle  \begin{array}{rcl}  I(D)=\sum_{k}I_k (D)=\mathop{\mathbb E}(h(x)). \end{array}

Things would be simple if one could prove that {h\leq 100 v} with high probability. We cannot prove this directly. Here is a simple trick to circumvent the difficulty. Let us count separately the neighbours {y} which increase (resp. decrease) {D}, i.e.

\displaystyle  \begin{array}{rcl}  h^+ =|\{y\sim x \,;\, D(y)>D(x)\}|,\quad h^- =|\{y\sim x \,;\, D(y)<D(x)\}|. \end{array}

They have equal expectations (each pair {(x,y)} of neighbours with {D(x)\not=D(y)} contributes once to {h^+} and once to {h^-}), so

\displaystyle  \begin{array}{rcl}  I(D)=\mathop{\mathbb E}(h^+)+\mathop{\mathbb E}(h^-)=2\mathop{\mathbb E}(h^+). \end{array}

Claim. {h^+ \leq 2v}.

Indeed, the only way to increase {D} is to increase the length of an edge lying on every minimal path from {(0,0)} to {(v,0)}. Since {D\leq 2v} and each edge has length at least {1}, a minimal path has at most {2v} edges. The bound only improves if there are several minimal paths.

Now we know that {Var(D)\leq I(D)\leq 4v}. This is the proof of Kesten’s theorem.

Next we are concerned with the upper tail. Fix {s} and let {t=\mathop{\mathbb P}(D>s)}. Assume {t} is small. Put

\displaystyle  \begin{array}{rcl}  \bar{D}(x)=\max\{D(x),s\}. \end{array}

Then {\bar{D}} differs from the constant {s} only on a {t}-fraction of {\Omega_n}, so

\displaystyle  \begin{array}{rcl}  Var(\bar{D})\leq 4vt. \end{array}

This implies that {\mathop{\mathbb P}(D\geq \mathrm{median}(D)+t\sqrt{v})} decays exponentially in {t} (Kesten). Indeed, the variance of {D} conditioned on {\{D>s\}} is bounded above by {4v} as well, so by Chebyshev the probability is divided by {2} each time {s} is increased by a fixed multiple of {\sqrt{v}}.

1.2. Improvements

To improve exponential into Gaussian decay (and to get the {v/\log v} bound on the variance), we use hypercontractivity. We need to show that the Fourier transform of {D} (or equivalently of {D_k}) is mostly carried by sets of size at least {c\log n}. By our version of Harper's theorem (Theorem 2 of the previous lecture), it suffices to show that the support of {|D_k|} is small. It is enough to show this for all edges except those in balls containing {v^{1/100}} edges around {(0,0)} and {(v,0)}.

I believe the following Lemma is true.

Lemma 5 For every edge outside {B((0,0),v^{1/10})} and {B((v,0),v^{1/10})}, the probability to belong to a minimal path between {(0,0)} and {(v,0)} is {\leq v^{-1/10}}.

But I am unable to prove it. Here is how we circumvent it. Consider paths connecting vertical segments of length {v^{1/10}} at endpoints. This defines a new function {D'} which is very close to {D}, and the Lemma holds for {D'}.

To get Talagrand's Gaussian tail estimate, consider again {\bar{D}}. By our version of Harper's theorem, since {\bar{D}_k} has support of measure {\leq t}, the Fourier transform {\widehat{\bar{D}_k}} is concentrated on sets of size {\geq \log\frac{1}{t}}. This gives {Var(\bar{D})\leq I(\bar{D})/\log\frac{1}{t}\leq 4vt/\log\frac{1}{t}}. The extra factor {\log\frac{1}{t}} also gives Gaussian decay.

Last passage percolation refers to the longest path (paths are restricted to go north and east only). There are analogous results, often sharper ones.

2. Back to the density increasing argument

For many problems, the inductive projection argument gives the best known results, although these remain rather weak compared with what known examples suggest.

Let {A\subset\Omega_n} have measure {1-t}. The KKL theorem implies that there is a coordinate {k} such that if one projects {A} orthogonally to this coordinate, the measure of the projection is

\displaystyle  \begin{array}{rcl}  \mu(\textrm{projection of }A)\geq (1-t)(1+ct\frac{\log n}{n}). \end{array}

Indeed, the effect of projecting along coordinate {k} is to increase the measure by exactly half the influence {I_k}. Iterating, there is a set of {C\frac{n}{\log n}\log\frac{1}{\epsilon}} coordinates such that if we project orthogonally to these coordinates, {\mu(\textrm{projection of }A)\geq 1-\epsilon}.
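
As a sanity check on the "projection gains half the influence" step, here is a tiny sketch (Python; the set {A} and the coordinate {k} are arbitrary choices of mine). Projecting orthogonally to coordinate {k} amounts to replacing {A} by {A\cup\sigma_k A}, whose measure is {\mu(A)+\frac{1}{2}I_k (A)}:

import itertools, random

random.seed(1)
n = 4
cube = list(itertools.product([0, 1], repeat=n))
A = set(random.sample(cube, 7))      # an arbitrary subset, measure 7/16
k = 0

def flip(x):
    y = list(x); y[k] = 1 - y[k]; return tuple(y)

# A | sigma_k(A) inside Omega_n has the same measure as the
# projection image inside Omega_{n-1}.
proj = A | {flip(x) for x in A}
Ik = sum((x in A) != (flip(x) in A) for x in cube) / 2 ** n
print(len(proj) / 2 ** n, len(A) / 2 ** n + Ik / 2)   # the two agree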

The theorem even shows that there is a set of {n/10} coordinates such that if we project orthogonally to them, then {\mu(\textrm{projection of }A)\geq 1-n^{-1/5}}.

This is probably far from the true behaviour: measure of projection is probably exponentially close to {1}.

Conjecture (B. Chor). Suppose {A\subset\Omega_n} has measure {1/2}. There exist {o(n)} coordinates such that projecting {A} orthogonally to these coordinates yields measure {\geq 1-c^n}, for some constant {c<1}.

A similar difficulty arises in the Blum, Luby, Rubinfeld linearity testing theorem.

The weak point is the fact that one has to iterate. Here is a third situation where the same issue arises.

Let {A\subset ({\mathbb Z}/3{\mathbb Z})^n}. Assume that {A} does not contain triples of distinct vectors whose sum is zero.

Question. How large can {A} be?

Example: {A=\{0,1\}^n} has size {2^n}. One can do better: the current record is roughly {(2.2)^n}.
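
For very small {n} one can find the extremal sets by brute force. A sketch (Python; plain exhaustive search, only feasible for {n\leq 2}) that looks for the largest {A\subset({\mathbb Z}/3{\mathbb Z})^n} with no three distinct elements summing to zero, and finds sizes {2} and {4} for {n=1,2}:

import itertools

def largest_sum_free(n):
    # Largest A in (Z/3Z)^n with no three distinct vectors summing to 0;
    # scans subset sizes from the top down, so only usable for n <= 2.
    pts = list(itertools.product(range(3), repeat=n))
    def bad(x, y, z):
        return all((a + b + c) % 3 == 0 for a, b, c in zip(x, y, z))
    for r in range(len(pts), 0, -1):
        for A in itertools.combinations(pts, r):
            if not any(bad(*t) for t in itertools.combinations(A, 3)):
                return A

for n in (1, 2):
    A = largest_sum_free(n)
    print(n, len(A), A)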

Theorem 6 (Meshulam) {|A|\leq \frac{3^n}{n}}.

Recently, this was improved.

Theorem 7 (Bateman, Katz) {|A|\leq \frac{3^n}{n^{1+\epsilon}}} for some {\epsilon>0}.

Note that Fourier analysis over {({\mathbb Z}/3{\mathbb Z})^n} works quite similarly. Meshulam's argument goes as follows.

1. Give a Fourier-theoretic expression for the number of triples in {A} in terms of {f=1_A} (checked numerically after step 2 below):

\displaystyle  \begin{array}{rcl}  \sum_\omega \hat{f}_\omega^3 &=&\hat{f}_0^3 +\sum_{\omega\not=0}\hat{f}_\omega^3 \\ &\geq&\hat{f}_0^3 -\max_{\omega\not=0}|\hat{f}_\omega|\sum_{\omega\not=0}|\hat{f}_\omega|^2. \end{array}

If {A} has no triples, this estimate must be spoiled, which requires a large nonzero Fourier coefficient; a large Fourier coefficient in turn produces an affine hyperplane on which {A} has larger density.

2. Iterate.
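
Here is a numerical check of step 1 (Python; the normalization {\hat{f}_\omega =3^{-n}\sum_x f(x)e^{-2\pi i\langle\omega,x\rangle/3}} is my choice, under which {3^{2n}\sum_\omega \hat{f}_\omega^3} counts ordered triples {(x,y,z)\in A^3} with {x+y+z=0}, including the degenerate ones with {x=y=z} — which is why the combinatorial statement asks for distinct vectors):

import itertools, cmath

n = 2
pts = list(itertools.product(range(3), repeat=n))
A = {(0, 0), (0, 1), (1, 0), (1, 1)}        # an arbitrary A, here {0,1}^2
f = {x: 1.0 if x in A else 0.0 for x in pts}

def fhat(w):
    # \hat f_w = 3^{-n} sum_x f(x) exp(-2 pi i <w,x> / 3)
    return sum(f[x] * cmath.exp(-2j * cmath.pi *
                                sum(a * b for a, b in zip(w, x)) / 3)
               for x in pts) / 3 ** n

lhs = 3 ** (2 * n) * sum(fhat(w) ** 3 for w in pts)
rhs = sum(1 for x in A for y in A for z in A
          if all((a + b + c) % 3 == 0 for a, b, c in zip(x, y, z)))
print(round(lhs.real), rhs)   # ordered triples in A^3 with x+y+z = 0

For {A=\{0,1\}^2} both sides equal {4}: the only zero-sum triples in {A^3} are the degenerate ones.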

3. Threshold behaviour of random graphs

3.1. Monotone functions

Definition 8 A Boolean function {f:\Omega_n \rightarrow\{-1,1\}} is monotone if

\displaystyle  \begin{array}{rcl}  x_1 \geq y_1 ,\ldots, x_n \geq y_n \Rightarrow f(x)\geq f(y). \end{array}

For monotone functions, {I_k (f)} takes a simpler form: {I_k (f)=-\hat{f}_{\{k\}}}. This follows, for monotone {f}, from the general formula

\displaystyle  \begin{array}{rcl}  \hat{f}_S =\frac{1}{2^n}\sum_{x\in\Omega_n} (-1)^{|\{i\in S\,;\,x_i =1\}|}f(x). \end{array}

It follows, using Parseval, that {\sum_k I_k(f)^2 \leq 1}.
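
Here is a quick check of {I_k (f)=-\hat{f}_{\{k\}}} (Python; I use Majority on {5} bits as the monotone example, and the sign convention for {\hat{f}_S} displayed above):

import itertools

n = 5
cube = list(itertools.product([-1, 1], repeat=n))
f = lambda x: 1 if sum(x) > 0 else -1       # Majority, a monotone function

def fhat_k(k):
    # \hat f_{{k}} with the convention above: weight (-1)^{#{i in S : x_i = 1}}
    return sum((-1 if x[k] == 1 else 1) * f(x) for x in cube) / 2 ** n

def infl(k):
    flip = lambda x: x[:k] + (-x[k],) + x[k + 1:]
    return sum(f(x) != f(flip(x)) for x in cube) / 2 ** n

for k in range(n):
    print(k, infl(k), -fhat_k(k))           # columns agree: both 0.375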

We shall use non-uniform distributions on {\Omega_n}. For instance,

\displaystyle  \begin{array}{rcl}  \mu^{p_1 ,\ldots,p_n}=\mu^{p_1}\otimes \cdots\otimes \mu^{p_n}, \end{array}

and in particular {\mu^p =\mu^{p,\ldots,p}}.

Denote by {\mu^{p_1 ,\ldots,p_n}(f):=\mu^{p_1 ,\ldots,p_n}(\{x\,;\,f(x)=1\})}, {\mu^{p}(f):=\mu^{p}(\{x\,;\,f(x)=1\})}.

Claim. If {f} is monotone, {p\mapsto \mu^p (f)} is non decreasing.

Indeed, to simulate {\mu^q} for {q\geq p}, first simulate {\mu^p}, then flip each remaining {-1} bit to {1} independently with probability {\frac{q-p}{1-p}}. This can only increase the value of {f}.

Lemma 9 (Russo) If {f} is monotone,

\displaystyle  \begin{array}{rcl}  \frac{d\mu^p (f)}{dp}=I^p (f). \end{array}

Proof. {\mu^{p_1 ,\ldots,p_n}(f)} is linear in each variable {p_k} separately, and

\displaystyle  \begin{array}{rcl}  \frac{\partial\mu^{p_1 ,\ldots,p_n} (f)}{\partial p_k}=I_k^{p_1 ,\ldots,p_n} (f). \end{array}

Setting {p_1 =\cdots=p_n =p}, the chain rule gives {\frac{d\mu^p (f)}{dp}=\sum_k I_k^{p} (f)=I^p (f)}.
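
A one-line numerical check of Russo's formula (Python; Majority on {3} bits, where {\mu^p (f)=3p^2 -2p^3} and each bit is pivotal with probability {2p(1-p)}):

# Russo's lemma for Maj_3: d/dp mu^p(f) should equal I^p(f) = 3 * 2p(1-p).
mu = lambda p: 3 * p ** 2 - 2 * p ** 3   # P(at least two p-biased bits are 1)
I = lambda p: 3 * 2 * p * (1 - p)        # total influence under mu^p

p, eps = 0.3, 1e-6
print((mu(p + eps) - mu(p - eps)) / (2 * eps), I(p))   # both 1.26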

3.2. Sharp thresholds

Let us plot {p\mapsto \mu^p (f)}. For Dictators, the graph is a straight line (an instance of a coarse threshold). Many monotone functions exhibit sharp thresholds: {\mu^p (f)} jumps from nearly {0} to nearly {1} within a small window. This happens notably for properties of random graphs in the Erdős–Rényi model. The sharpest threshold is achieved by Majority.
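
To see the contrast numerically, here is a sketch (Python) comparing {\mu^p} for a Dictator (which is just {p}) with Majority on {101} bits, whose threshold window has width of order {1/\sqrt{n}}:

from math import comb

def mu_majority(n, p):
    # P(more than half of n i.i.d. p-biased bits equal 1), n odd
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range((n + 1) // 2, n + 1))

for p in (0.40, 0.45, 0.48, 0.50, 0.52, 0.55, 0.60):
    print(f"p={p:.2f}  Dictator: {p:.2f}  Maj_101: {mu_majority(101, p):.3f}")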
