## Notes of Gil Kalai’s lecture nr 2

1. Proof of the KKL Theorem

Harper’s theorem deals with the sum of influences. It is a bit stronger than the direct consequences of Fourier expansion described by Mossel yesterday. The KKL Theorem is even stronger.

1.1. Fourier expression for influences

Notation: ${\Omega_n =\{-1,1\}^n}$. It is a metric space of diameter ${n}$ (Hamming metric) and a graph. Let ${A}$ be a subset of ${\Omega_n}$. Let ${f=1_A}$ be the 0-1-valued characteristic function of ${A}$. Assume that ${\mathop{\mathbb P}(f=1)=t}$. Recall that

$\displaystyle \begin{array}{rcl} f=\sum_{S\subset[n]}\hat{f}_S \chi_S \end{array}$

is nicely written as a sum of multilinear polynomials. In particular,

$\displaystyle \begin{array}{rcl} \hat{f}_{\emptyset}=t,\quad \sum_S \hat{f}_S^2 =t,\quad \sum_{S\not=\emptyset} \hat{f}_S^2 =t-t^2 . \end{array}$

We are aiming at a Fourier expression of influences. Let ${\sigma_k :\Omega_N \rightarrow\Omega_n}$ flip the ${k}$-th coordinate. Then

$\displaystyle \begin{array}{rcl} I_k (t)=\mathop{\mathbb P}(f(\sigma_k (x))\not=f(x)),\quad I(f)=\sum_{k=1}^{n} I_k (f)=\frac{1}{2^{n-1}}|E(A,\bar{A})|, \end{array}$

where ${E(A,\bar{A})}$ is the set of edges joining ${A}$ to its complement ${\bar{A}}$. ${I_k (f)}$ corresponds to the subset of edges parallel the ${k}$-axis.

Note that ${f_k =f-f\circ\sigma_k}$ takes values ${0}$, ${1}$ or ${-1}$. ${I_k (f)}$ is the probability that ${f_k (x)=\pm}$, so

$\displaystyle \begin{array}{rcl} I_k (f)=||f_k||_2^2 . \end{array}$

The Fourier coefficients of ${f\circ\sigma_k}$ are

$\displaystyle \begin{array}{rcl} \widehat{f\circ\sigma_k}_S=\begin{cases} \hat{f}_k & \text{ if }k\notin S, \\ -\hat{f}_S & \text{otherwise}. \end{cases} \end{array}$

This yields

$\displaystyle \begin{array}{rcl} I_k (f)=4\sum_{k\in S}\hat{f}_S^2 , \quad I(f)=4\sum_{S}|S|\hat{f}_S^2 . \end{array}$

This implies

$\displaystyle \begin{array}{rcl} I(t)\geq 4 Var(f)=4t-4t^2 . \end{array}$

This is sharp for ${t=1/2}$.

Theorem 1 (Consequence of Harper’s theorem)

$\displaystyle \begin{array}{rcl} I(t)\geq 2t\log(\frac{1}{t}). \end{array}$

This is sharp for subcubes. Indeed, every vertex of the ${\ell}$-cube has ${\ell}$ neighbours in the subcube. Since ${\ell=\log_2 t}$, this explains the log term. There is a simple proof, using the method of canonical paths, see below.

Our main intermediate step towards KKL Theorem is the following result.

Theorem 2 Let ${f=1_A}$, ${||f||_2^2 =t}$. Then

$\displaystyle \begin{array}{rcl} \sum_{|S|\leq \log(\frac{1}{t})/5}\hat{f}_S^2 \leq t^{6/5}. \end{array}$

This implies that

$\displaystyle \begin{array}{rcl} I(f)\geq\frac{t}{6}\log(\frac{1}{t}). \end{array}$

This consequence is still weaker than Harper’s theorem as far as ${I(f)}$ is concerned, but the theorem is stronger. It says that the portion of the ${L_2}$-norm carried by small sets is negligible.

1.2. Hypercontractivity

Recall that our main aim is: If ${supp(f)}$ is small and ${f}$ is Boolean, then ${\hat{f}}$ is concentrated on high frequences, i.e. large sets. Note that for Boolean functions, ${||f||_p^p =||f_||q^q}$ for all ${p}$ and ${q}$. If ${||f||_p^p =t}$ is small, then for distinct ${p}$ and ${q}$, ${||f||_p}$ and ${||f_||q}$ are very different.

Recall Khintchine’s theorem. It says that if a function has only degree one Fourier coefficients, then all the ${L_p}$ norms of ${f}$ are comparable.

Theorem 3 If ${g:\Omega_n \rightarrow{\mathbb R}}$ is of degree one, ${g(x)=\sum_{k=1}^{n}\alpha_k x_k}$, then

$\displaystyle \begin{array}{rcl} c_{p,q}||g||_q \leq ||g||_p \leq ||g||_q. \end{array}$

We want to replace degree ${1}$ by small degree. Also, we cannot assume a Boolean function is entirely supported on low degrees. We can merely assume that it holds approximately. Fortunately for us, the relevant extension of Khintchine’s theorem exists in the litterature. Also, the best constant in Khintchine’s theorem was not given by Khintchine, but only several decades later.

Theorem 4 (Bonami, Gross, Beckner (Nelson,…)) Let ${f:\Omega_n \rightarrow{\mathbb R}}$, ${f=\sum_{S} \hat{f}_S \chi_S}$. Let

$\displaystyle \begin{array}{rcl} T_\epsilon f =\sum_{S} \hat{f}_S \epsilon^{|S|}\chi_S . \end{array}$

Then

$\displaystyle \begin{array}{rcl} ||T_\epsilon f||_2 \leq ||f||_{1+\epsilon^2}. \end{array}$

It is easy to derive the inequality for ${\Omega_n}$ from the ${1}$-dimensional case, by induction, due to the product structure. The ${1}$-dimensional case is a not quite easy exercise in calculus (only 3 parameters involved).

Aline Bonami was the first to prove this, although Ed Nelson considered similar issues motivated by mathematical physics earlier. Gross observed the connection with log-Sobolev inequalities.

1.3. Back to Harper’s Theorem

Now I state Harper’s theorem in its full force.

Theorem 5 (Harper) Use lexicographic ordering on ${\Omega_n}$. Then sets of the form ${\{x\in\Omega_n \,;\,x\leq x_0\}}$ minimize ${|E(A,\bar{A})|}$ among sets of the same size.

Here is a proof of the consequence alluded to above. Given ${x}$, ${y\in\Omega_n}$, the canonical path from ${x}$ to ${y}$ consists in flipping bits when necessary, from left to right. Consider all canonical paths from ${x\in A}$ to ${y\in\bar{A}}$. There are ${m(2^n -m)}$ such paths, ${m=|A|}$. Every such path contains at least one edge in ${E(A,\bar{A})}$. Conversely, given an edge ${e\in E(A,\bar{A})}$ which amounts to flipping the ${k}$-th coordinate, ${e}$ belongs to at most ${2^{k-1}2^{n-k}=2^{n-1}}$ such canonical paths. This implies that

$\displaystyle \begin{array}{rcl} |E(A,\bar{A})|\geq\frac{m(2^{n}-m)}{2^{n-1}}. \end{array}$

1.4. Proof of Theorem 2

Now I prove the Theorem above. Take ${\epsilon=1/2}$. Then

$\displaystyle \begin{array}{rcl} ||T_\epsilon f||_2^2 \leq ||f||_{5/4}^2 =t^{8/5}. \end{array}$

Spell it out

$\displaystyle \begin{array}{rcl} \sum_{S}\epsilon^{2|S|}\hat{f}_S^2 \leq t^{8/5}. \end{array}$

When one sticks to ${S}$ such that ${|S|\leq\frac{\log t}{5}}$, ${\epsilon^{2|S|}\geq t^{-2/5}}$. Since ${\frac{8}{5}-\frac{2}{5}=\frac{6}{5}}$, we get Theorem 2.

1.5. How KKL Theorem applies to collective coin flipping

Theorem 6 (KKL) Let ${f:\Omega_n \rightarrow\{-1,1\}}$. Assume ${\mathop{\mathbb P}(f=1)=t\leq\frac{1}{2}}$. Then

$\displaystyle \begin{array}{rcl} \max_k I_k (f)\geq ct(1-t)\frac{\log n}{n}, \end{array}$

where ${c}$ is a universal constant.

By applying it iteratively, one gets the following Corollary, which applies to collective coin flipping.

Corollary 7 (KKL) Let ${f\Omega_n \rightarrow\{-1,1\}}$. Assume ${\mathop{\mathbb P}(f=1)=t\leq\frac{1}{2}}$. For all ${\epsilon>0}$, there exists ${C(\epsilon)}$ such that there exists ${S}$ with ${|S|\leq C\frac{n}{\log n}}$ such that

$\displaystyle \begin{array}{rcl} I_S (f)\geq 1-\epsilon. \end{array}$

Proof: Project ${f}$ to some coordinate. Then ${\mathop{\mathbb P}(\textrm{projection}=1)=t(1+2I_k (f))}$.

1.6. Proof of KKL Theorem

Suppose that for all ${k}$, ${I_k (f)\leq \frac{1}{\sqrt{n}}}$. We will show that the sum ${I(f)\geq c\frac{\log n}{n}t(1-t)}$.

Recall that ${f_k =f-f\circ\sigma_k}$. ${||f_k||_2^2=4I_k (f)}$. Apply hypercontractivity.

$\displaystyle \begin{array}{rcl} \sum_{k=1}^n ||T_{1/2}f_k||_2^2 &\leq& \sum_{k=1}^n (||f_k||_{5/4}^{5/4})^{8/5}\\ &\leq&(\frac{1}{\sqrt{n}})^{3/5}(\sum_{k=1}^n I_k (f)). \end{array}$

On the other hand,

$\displaystyle \begin{array}{rcl} \sum_{k=1}^n ||T_{1/2}f_k||_2^2 &=&\sum_{k=1}^n \sum_S \widehat{f_k}_S^2 (\frac{1}{2})^|S|\\ &\geq&\sum_{|S|\leq\frac{\log n}{10}}|S|\hat{f}^2_S (\frac{1}{2})^|S|\\ &\geq& n^{-1/10}\sum_{|S|\leq\frac{\log n}{10}}\hat{f}^2_S , \end{array}$

which yields

$\displaystyle \begin{array}{rcl} \sum_{|S|\leq\frac{\log n}{10}}\hat{f}^2_S \leq n^{-2/7}. \end{array}$

In other words, all except ${o(1)}$ of the Fourier weights of ${f}$ are above ${\frac{\log n}{10}}$ level. This implies that ${I(f)\geq \frac{\log n}{10}}$.

KKL is sharp up to the universal constant. Regarding constants, the gap between the tribes example and the KKL estimate is not that large: a factor of ${\sqrt{2}}$. Note that we have applied hypercontractivity blindly to a Boolean function, without taking into account the fact it is Boolean. There may be some room for improvement, but I am unable to make it.

2. Application to first passage percolation

On the ${{\mathbb Z}^2}$ grid, change the length of edges: every edge is assigned a length ${1}$ or ${2}$ with probability ${1/2}$. What is the distance between ${(0,0)}$ and ${(m,0)}$ ?

It is at most ${2m}$, so we need not care of edges outside a square of size ${2m}$. Let ${n}$ be the number of relevant edges, let ${\Omega_n}$ record the random choices. Then distance is a function ${D:\Omega_n \rightarrow{\mathbb N}}$.

One can show that ${\mathop{\mathbb E}(D)\sim cn}$ where not much is known about the constant ${c}$. What is the variance of ${D}$ ?

Theorem 8 (Kesten) ${Var(D)=O(m)}$.

Later on, Talagrand gave Gaussian tail estimates on ${D}$.

I will show next time that Kesten’s estimate follows easily from ${Var(f)\leq I(f)}$ for Boolean functions.

Using hypercontractivity, I will show that ${Var(D)=O(\frac{m}{\log m})}$, and also how Gaussian tail estimates follow.