## Notes on Prasad Raghavendra’s lecture no. 2

1. Invariance principle

1.1. Some notation

The hypercube is ${\{\pm 1\}^R}$. The Fourier-Walsh decomposition of a real function ${f}$ on the hypercube is

$\displaystyle \begin{array}{rcl} f=\sum_{S\subset[R]}\hat{f}_S \chi_S , \end{array}$

where ${\chi_S (x)=\prod_{i\in S}x_i}$. Parseval's identity reads

$\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(f^2)=\sum_{S\subset[R]}|\hat{f}_S |^2 . \end{array}$
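To make the notation concrete, here is a small Python check by exhaustive enumeration (the function ${f}$ and the dimension ${R}$ are illustrative choices): it computes the Fourier coefficients, recovers ${f}$ from its expansion, and verifies Parseval's identity.

```python
import itertools

R = 4
cube = list(itertools.product([1, -1], repeat=R))

# An illustrative Boolean function: majority of the first three coordinates.
def f(x):
    return 1 if x[0] + x[1] + x[2] > 0 else -1

# chi_S(x) = prod_{i in S} x_i
def chi(S, x):
    out = 1
    for i in S:
        out *= x[i]
    return out

# Fourier coefficient: hat f_S = E[f(x) chi_S(x)] over the uniform cube.
subsets = [S for r in range(R + 1) for S in itertools.combinations(range(R), r)]
fhat = {S: sum(f(x) * chi(S, x) for x in cube) / 2**R for S in subsets}

# f is recovered from its expansion, and Parseval holds: E(f^2) = sum of squares.
assert all(abs(f(x) - sum(fhat[S] * chi(S, x) for S in subsets)) < 1e-9 for x in cube)
parseval = sum(c**2 for c in fhat.values())
print(parseval)  # = E(f^2) = 1.0, since f takes values in {+-1}
```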

The influence of the ${i}$-th coordinate on a Boolean function ${f}$ is

$\displaystyle \begin{array}{rcl} Inf_i (f)=\mathop{\mathbb P}_{x}(f(xe_i)\not=f(x)), \end{array}$

where ${x}$ is uniformly distributed on the hypercube and ${e_i}$ has all entries equal to ${1}$ except the ${i}$-th, which is ${-1}$ (so ${xe_i}$ flips the ${i}$-th coordinate of ${x}$).

Example 1 If ${f}$ is a dictator, ${f(x)=x_i}$, then ${Inf_i (f)=1}$, ${Inf_j (f)=0}$ if ${j\not=i}$.

If ${f(x)=Maj(x):=sign(\sum x_i)}$, then for all ${i}$, ${Inf_i (f)}$ is of order ${\frac{1}{\sqrt{R}}}$.

For non-Boolean functions, we define the influence as

$\displaystyle \begin{array}{rcl} Inf_i (f)=\frac{1}{4}\mathop{\mathbb E}_{x}|f(xe_i)-f(x)|^2 . \end{array}$

This has a nice Fourier expression

$\displaystyle \begin{array}{rcl} Inf_i (f)= \sum_{S\subset[R]\,;\,i\in S}|\hat{f}_S |^2 . \end{array}$

Lemma 1 If ${f}$ has degree at most ${d}$ (i.e. ${\hat{f}_S =0}$ whenever ${|S|>d}$), then

$\displaystyle \begin{array}{rcl} \sum_{i=1}^{R}Inf_i (f)\leq d\mathop{\mathbb E}|f|^2 . \end{array}$

Proof:

$\displaystyle \begin{array}{rcl} \sum_{i=1}^{R}Inf_i (f)&=& \sum_i \sum_{S\,;\,i\in S}\hat{f}_S^2 =\sum_{S}|S|\,\hat{f}_S^2 \\ &\leq& d\sum_{S}\hat{f}_S^2 =d\mathop{\mathbb E}|f|^2 . \end{array}$

$\Box$
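The two definitions of influence, the dictator and majority examples, and the total-influence bound can all be checked numerically; here is a sketch by exhaustive enumeration (the choice R = 5 is illustrative).

```python
import itertools

R = 5
cube = list(itertools.product([1, -1], repeat=R))

def maj(x):
    return 1 if sum(x) > 0 else -1

def flip(x, i):  # x e_i: flip the i-th coordinate of x
    y = list(x)
    y[i] = -y[i]
    return tuple(y)

# Influence as a flip probability (Boolean definition).
def inf_prob(f, i):
    return sum(1 for x in cube if f(flip(x, i)) != f(x)) / 2**R

# Influence via the Fourier formula: sum of hat f_S^2 over S containing i.
def chi(S, x):
    out = 1
    for j in S:
        out *= x[j]
    return out

subsets = [S for r in range(R + 1) for S in itertools.combinations(range(R), r)]

def inf_fourier(f, i):
    return sum((sum(f(x) * chi(S, x) for x in cube) / 2**R)**2
               for S in subsets if i in S)

# A dictator has influence 1 on its coordinate and 0 elsewhere (Example 1).
dictator = lambda x: x[0]
assert inf_prob(dictator, 0) == 1.0
assert all(inf_prob(dictator, j) == 0.0 for j in range(1, R))

# For Boolean functions the two definitions agree ...
assert all(abs(inf_prob(maj, i) - inf_fourier(maj, i)) < 1e-9 for i in range(R))
# ... and the total influence is consistent with Lemma 1 (E(maj^2) = 1).
total = sum(inf_prob(maj, i) for i in range(R))
print(total)  # each coordinate of Maj_5 has influence C(4,2)/2^4 = 0.375
```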

Equivalently, we shall consider multilinear polynomials on ${{\mathbb R}^n}$, i.e. functions of the form ${p(x)=\sum_{S\subset[n]}p_S \chi_S (x)}$. We keep the same notation.

1.2. Central limit theorem

The invariance principle is a nonlinear generalization of the Central Limit Theorem. Here is a baby version.

Theorem 2 Let ${X_i}$ be independent identically distributed random variables admitting a ${4}$-th moment. The distribution of their normalized sum converges to a Gaussian distribution.

Here is our interpretation. Consider the multilinear polynomial ${p(x)=x_1 +\cdots+x_n}$. If we feed it with a uniform Boolean vector ${x}$ or a standard Gaussian vector ${Y}$, or any random vector with i.i.d. coordinates of mean ${0}$ and variance ${1}$ admitting a ${4}$-th moment, the resulting distributions are similar.
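This can be checked in closed form for the normalized sum ${Q(x)=(x_1 +\cdots+x_n)/\sqrt{n}}$ and the test function ${\cos}$, since both sides are exactly computable (a small sketch; the normalization and the choice of test function are ours).

```python
import math

# Q(x) = (x_1 + ... + x_n)/sqrt(n), tested against Psi = cos; both sides exact:
# Boolean: E cos(Q(b)) = Re prod_j E exp(i b_j/sqrt(n)) = cos(1/sqrt(n))^n,
# Gaussian: Q(g) is standard Gaussian, so E cos(Q(g)) = exp(-1/2).
def gap(n):
    return abs(math.cos(1 / math.sqrt(n))**n - math.exp(-0.5))

for n in [10, 100, 1000]:
    print(n, gap(n))  # the gap shrinks as n grows
```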

Our generalization replaces ${p(x)=x_1 +\cdots+x_n}$ by an arbitrary degree ${d}$ multilinear polynomial. Note that the theorem cannot hold for dictators ${p(x)=x_i}$. So one must restrict to polynomials which are far from dictators, namely, have uniformly low influences. We shall also need a more quantitative version, which generalizes the Berry-Esseen Theorem in the linear case.

1.3. A baby invariance principle

Reference: Ryan O’Donnell’s course on Analysis of Boolean functions.

Theorem 3 Let ${Q}$ be a degree ${d}$ multilinear polynomial such that ${\mathop{\mathbb E}(Q(x)^2)\leq 1}$ and for all ${i}$, ${Inf_i (Q)\leq\tau}$. Let ${b}$ be uniformly distributed on the hypercube, and let ${g}$ be a standard Gaussian vector. Then the distributions of ${Q(b)}$ and ${Q(g)}$ are close in the following sense: for every ${C^4}$ function ${\Psi:{\mathbb R}\rightarrow{\mathbb R}}$ with ${\|\Psi\|_{C^4}\leq B}$,

$\displaystyle \begin{array}{rcl} |\mathop{\mathbb E}_b (\Psi(Q(b))) - \mathop{\mathbb E}_g (\Psi(Q(g)))|\leq d\,9^d B\tau . \end{array}$
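Before the proof, a numerical sanity check of the bound, with illustrative choices: a linear ${Q}$ (so ${d=1}$) with equal coefficients, and ${\Psi(t)=t^4}$, for which only the fourth derivative ${B=24}$ enters. The Boolean expectation is computed exactly by enumeration, and the Gaussian one is the fourth moment of a standard Gaussian.

```python
import itertools, math

# Illustrative setup: linear Q (d = 1), equal coefficients, E(Q^2) = 1.
n = 10
a = [1 / math.sqrt(n)] * n
tau = max(ai**2 for ai in a)        # Inf_i(Q) = a_i^2 for a linear Q

# Boolean side: exact enumeration of the 2^n points of the hypercube.
E_b = sum(sum(ai * xi for ai, xi in zip(a, x))**4
          for x in itertools.product([1, -1], repeat=n)) / 2**n

# Gaussian side: Q(g) is a standard Gaussian, whose 4th moment is 3.
E_g = 3.0

gap4 = abs(E_b - E_g)               # equals 2 * sum a_i^4 = 2/n here
bound = 1 * 9**1 * 24 * tau         # d * 9^d * B * tau, with B = sup|Psi''''| = 24
assert gap4 <= bound
print(gap4, bound)
```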

Proof: Hybridize: replace one variable at a time. Set

$\displaystyle \begin{array}{rcl} Z_{i}=Q(b_1 ,\ldots,b_{i-1},g_i ,\ldots,g_n). \end{array}$

Since ${Z_1 =Q(g)}$ and ${Z_{n+1}=Q(b)}$, it suffices to bound each

$\displaystyle \begin{array}{rcl} |\mathop{\mathbb E} (\Psi(Z_{i})) - \mathop{\mathbb E} (\Psi(Z_{i+1}))|. \end{array}$

Lemma 4

$\displaystyle \begin{array}{rcl} |\mathop{\mathbb E} (\Psi(Z_{i})) - \mathop{\mathbb E} (\Psi(Z_{i+1}))|\leq 9^d B \,Inf_i (Q)^2 . \end{array}$

Summing over ${i}$, and using ${\max_i Inf_i (Q)\leq\tau}$ together with ${\sum_i Inf_i (Q)\leq d\,\mathop{\mathbb E}(Q^2)\leq d}$ (Lemma 1), yields

$\displaystyle \begin{array}{rcl} |\mathop{\mathbb E}_b (\Psi(Q(b))) - \mathop{\mathbb E}_g (\Psi(Q(g)))|&\leq& \sum_{i}9^d B \,Inf_i (Q)^2 \\ &\leq& 9^d B \max_i Inf_i (Q)\sum_i Inf_i (Q) \\ &\leq& d\,9^d B\tau. \end{array}$

$\Box$

Proof of Lemma 4:

Write ${Q(x)=r(x)+x_i s(x)}$ where ${r}$ and ${s}$ are multilinear polynomials not depending on ${x_i}$. Then

$\displaystyle \begin{array}{rcl} Z_i =R+g_i S ,\quad Z_{i+1}=R+b_i S, \end{array}$

where ${R}$, ${S}$ are random variables which are independent of ${b_i}$ and ${g_i}$. Use the Taylor expansion

$\displaystyle \begin{array}{rcl} \Psi(Z_i)&=&\Psi(R)+g_i S \Psi'(R)+\cdots+\frac{1}{3!}(g_i S)^3 \Psi^{(3)}(R)+ \rho, \end{array}$

$\displaystyle \begin{array}{rcl} \Psi(Z_{i+1})&=&\Psi(R)+b_i S \Psi'(R)+\cdots+\frac{1}{3!}(b_i S)^3 \Psi^{(3)}(R)+ \rho', \end{array}$

where the remainders satisfy ${|\rho|\leq \frac{B}{4!}(g_i S)^4}$, ${|\rho'|\leq \frac{B}{4!}(b_i S)^4}$. By independence and linearity, the expectations of all terms in the expansion are the same for ${Z_{i}}$ and ${Z_{i+1}}$, since ${b_i}$ and ${g_i}$ have the same moments up to order ${3}$. Since ${\mathop{\mathbb E}(g_i^4)=3}$ and ${\mathop{\mathbb E}(b_i^4)=1}$, it follows that

$\displaystyle \begin{array}{rcl} |\mathop{\mathbb E}_b (\Psi(Z_{i})) - \mathop{\mathbb E}_g (\Psi(Z_{i+1}))|\leq B\mathop{\mathbb E}(S^4). \end{array}$

The proof ends with the following hypercontractivity lemma, applied to ${S}$, which has degree at most ${d-1}$. Indeed,

$\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(S^2)=Inf_i (Q). \end{array}$

$\Box$

Lemma 5 (Bonami) For a degree ${d}$ multilinear polynomial ${p}$,

$\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(p^4)\leq 9^d \mathop{\mathbb E}(p^2)^2 . \end{array}$

when the inputs are independent variables, either uniform Boolean or standard Gaussian.

Proof: By induction on the number ${n}$ of variables. Write ${p=r+x_n s}$ where ${r}$ and ${s}$ do not depend on ${x_n}$, so that ${r}$ has degree at most ${d}$ and ${s}$ degree at most ${d-1}$. The terms with odd powers of ${x_n}$ vanish since ${\mathop{\mathbb E}(x_n)=\mathop{\mathbb E}(x_n^3)=0}$, while ${\mathop{\mathbb E}(x_n^2)=1}$ and ${\mathop{\mathbb E}(x_n^4)\leq 9}$ in both cases. Then

$\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(p^4)&=&\mathop{\mathbb E}(r^4)+6\mathop{\mathbb E}(r^2 s^2)+\mathop{\mathbb E}(x_n^4)\mathop{\mathbb E}(s^4)\\ &\leq&\mathop{\mathbb E}(r^4) +6\sqrt{\mathop{\mathbb E}(r^4)\mathop{\mathbb E}(s^4)}+9\mathop{\mathbb E}(s^4)\\ &\leq&9^{d}\mathop{\mathbb E}(r^2)^2 +6\cdot 3^{d}3^{d-1}\mathop{\mathbb E}(r^2)\mathop{\mathbb E}(s^2)+9^{d}\mathop{\mathbb E}(s^2)^2\\ &\leq& 9^d (\mathop{\mathbb E}(r^2)+\mathop{\mathbb E}(s^2))^2 \\ &=& 9^d \mathop{\mathbb E}(p^2)^2 , \end{array}$

since ${\mathop{\mathbb E}(p^2)=\mathop{\mathbb E}(r^2)+\mathop{\mathbb E}(s^2)}$.

$\Box$
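The lemma is easy to illustrate numerically with Boolean inputs, by exact enumeration on a random multilinear polynomial of low degree (the parameters below are arbitrary choices).

```python
import itertools, math, random

random.seed(0)
n, d = 6, 2
cube = list(itertools.product([1, -1], repeat=n))

# A random multilinear polynomial of degree <= d.
subsets = [S for r in range(d + 1) for S in itertools.combinations(range(n), r)]
coeffs = {S: random.uniform(-1, 1) for S in subsets}

def p(x):
    return sum(c * math.prod(x[i] for i in S) for S, c in coeffs.items())

# Exact 2nd and 4th moments over the uniform hypercube.
E2 = sum(p(x)**2 for x in cube) / 2**n
E4 = sum(p(x)**4 for x in cube) / 2**n
assert E4 <= 9**d * E2**2   # Bonami's inequality for Boolean inputs
print(E4, 9**d * E2**2)
```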

1.4. A more general invariance principle

With some more work, one can prove the following vector-valued version of Theorem 3.

Theorem 6 Let ${Q}$ be an ${R}$-variable multilinear polynomial of degree ${d}$. Let ${X^{(1)},\ldots,X^{(k)}}$, ${Y^{(1)},\ldots,Y^{(k)}}$ be random variables. Assume that

1. For all ${i}$, ${\mathop{\mathbb E}(X^{(i)})=\mathop{\mathbb E}(Y^{(i)})}$. For all ${i}$ and ${j}$, ${\mathop{\mathbb E}(X^{(i)}X^{(j)})=\mathop{\mathbb E}(Y^{(i)}Y^{(j)})}$.
2. For all ${i}$, ${\mathop{\mathbb E}|X^{(i)}|^4 \leq q}$, ${\mathop{\mathbb E}|Y^{(i)}|^4 \leq q}$,
3. For all ${i}$, ${Inf_i (Q) \leq \tau}$.
4. ${\mathop{\mathbb E}(Q^2)\leq 1}$.

For each ${i}$, consider ${R}$ independent copies ${X_1^{(i)},\ldots,X_R^{(i)}}$ of ${X^{(i)}}$. Denote by ${Q(X)}$ the random vector whose ${i}$-th component is ${Q(X_1^{(i)},\ldots,X_R^{(i)})}$. Similarly for the ${Y}$‘s. Then for every smooth function ${\Psi:{\mathbb R}^k \rightarrow {\mathbb R}}$ with derivatives bounded by ${B}$,

$\displaystyle \begin{array}{rcl} |\mathop{\mathbb E}(\Psi(Q(X)))-\mathop{\mathbb E}(\Psi(Q(Y)))|\leq O_{d,k,B}(\tau^{1/d}). \end{array}$

1.5. Smoothing

Let ${T_{1-\epsilon}}$ act on real functions on the hypercube as follows

$\displaystyle \begin{array}{rcl} (T_{1-\epsilon}f)(x)=\mathop{\mathbb E}_z (f(xz)), \end{array}$

where ${z}$ has independent components, each equal to ${-1}$ with probability ${\epsilon/2}$ (so that ${\mathop{\mathbb E}(z_i)=1-\epsilon}$). In Fourier expansion,

$\displaystyle \begin{array}{rcl} T_{1-\epsilon}f =\sum_{S\subset[R]}(1-\epsilon)^{|S|}\hat{f}_S \chi_S . \end{array}$

It has a damping effect on high frequencies, which is why one can view it as a smoothing operator. Theorem 6 applies only to degree ${d}$ polynomials ${Q}$. A slight variant of it applies to ${Q=T_{1-\epsilon}f}$ for arbitrary ${f}$.
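The two descriptions of ${T_{1-\epsilon}}$ (averaging over noise, damping on the Fourier side) can be checked to agree by exhaustive enumeration; the sketch below uses the convention that ${z_i =-1}$ with probability ${\epsilon/2}$, so that ${\mathop{\mathbb E}(z_i)=1-\epsilon}$, which is what produces the factor ${(1-\epsilon)^{|S|}}$ (${R}$, ${\epsilon}$ and ${f}$ are illustrative).

```python
import itertools

R, eps = 4, 0.3
cube = list(itertools.product([1, -1], repeat=R))

def f(x):  # an arbitrary Boolean function for the check
    return 1 if x[0] + x[1] * x[2] + x[3] > 0 else -1

def chi(S, x):
    out = 1
    for i in S:
        out *= x[i]
    return out

subsets = [S for r in range(R + 1) for S in itertools.combinations(range(R), r)]
fhat = {S: sum(f(x) * chi(S, x) for x in cube) / 2**R for S in subsets}

# Averaging definition: (T_{1-eps} f)(x) = E_z f(xz), where each z_i = -1
# with probability eps/2 independently, so that E(z_i) = 1 - eps.
def T_avg(x):
    total = 0.0
    for z in cube:
        pz = 1.0
        for zi in z:
            pz *= eps / 2 if zi == -1 else 1 - eps / 2
        total += pz * f(tuple(xi * zi for xi, zi in zip(x, z)))
    return total

# Fourier-side definition: damp each coefficient hat f_S by (1-eps)^{|S|}.
def T_fourier(x):
    return sum((1 - eps)**len(S) * fhat[S] * chi(S, x) for S in subsets)

diff = max(abs(T_avg(x) - T_fourier(x)) for x in cube)
assert diff < 1e-9
```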

2. Dictatorship tests

2.1. What is this good for?

A dictatorship test is a gadget used in proving hardness of approximation results. It is used in reductions from Unique Games. Say we want to prove hardness of a Boolean CSP ${\Lambda}$. The corresponding gadget ${DICT_{\Lambda}}$ is an instance of ${\Lambda}$. Its variables live on a hypercube ${\{\pm 1\}^R}$, where ${R}$ is the label size of the Unique Games instance to reduce from. An assignment to ${DICT_{\Lambda}}$ is a Boolean function on the hypercube (a huge object). ${DICT_{\Lambda}}$ must distinguish dictator assignments from others, with the following guarantees.

• Completeness: Dictator assignments have value ${\geq c}$.
• Soundness: Assignments which are far from dictators (e.g. Boolean functions with low influences) have value ${\leq s}$.

From such a ${DICT_{\Lambda}}$, one mechanically gets a UGC-conditional hardness of approximation result for ${\Lambda}$ with threshold ${\frac{c}{s}}$. See Guy Kindler’s course.

2.2. Construction of ${DICT_{\Lambda}}$

We start from a hard instance ${I}$ of ${\Lambda}$, and construct a corresponding instance ${DICT_{\Lambda}}$. We shall see that completeness can be taken to be ${SDP(I)-\epsilon}$ and soundness ${OPT(I)+\epsilon}$. From the SDP solution, we get a probability distribution ${\mu_{P}}$ for each predicate ${P}$ of ${I}$. ${DICT_{\Lambda}}$ is a distribution on clauses whose variables are points of the hypercube ${\{\pm 1\}^R}$. It depends on a large parameter ${R}$ (this will be the label size of a Unique Games instance to reduce from). To sample a clause,

1. Sample a clause ${P}$ of ${I}$.
2. Sample a tuple of points of the hypercube, one per variable of ${P}$, according to ${\mu_{P}^{\otimes R}}$.
3. Introduce ${\epsilon}$-noise.

Example 2 For instance, for MAX CUT, instances are weighted graphs ${G}$. There is one predicate per edge, hence one probability distribution ${\mu_{ij}}$ on ${\{\pm 1\}^2}$ per edge ${ij}$.

Let us continue with this example (MAX CUT). ${DICT_{\Lambda}}$ is a weighted graph with vertex set the hypercube. To sample an edge,

1. Sample an edge ${e=ij}$ of ${G}$. Get a positive number ${|v_i -v_j|}$ from the SDP solution.
2. Sample a random pair of points of the hypercube according to ${\mu_{ij}^{\otimes R}}$, i.e. the ${R}$ coordinate pairs are drawn i.i.d. with law ${\mu_{ij}}$ (under this distribution, the expected distance between the two points is governed by ${|v_i -v_j|}$).
3. Introduce ${\epsilon}$-noise.
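The distribution ${\mu_{ij}}$ is not spelled out above. A natural candidate, given SDP unit vectors ${v_i ,v_j}$ with correlation ${\rho=\langle v_i ,v_j \rangle}$, is the distribution on ${\{\pm 1\}^2}$ with uniform marginals and correlation ${\rho}$; the sketch below checks its basic properties exactly (this specific ${\mu_{ij}}$ and the value of ${\rho}$ are our illustrative assumptions, not the lecture's).

```python
# Sketch for MAX CUT: given SDP unit vectors v_i, v_j with rho = <v_i, v_j>,
# take mu_ij on {+-1}^2 with uniform marginals and correlation rho:
#   mu_ij(a, b) = (1 + a*b*rho) / 4.
# A pair of hypercube points is then sampled by drawing the R coordinate
# pairs i.i.d. from mu_ij.
rho = -0.8  # a good MAX CUT edge wants anti-correlated endpoints

mu = {(a, b): (1 + a * b * rho) / 4 for a in (1, -1) for b in (1, -1)}

# Exact checks: probability distribution, uniform marginals, correlation rho.
assert abs(sum(mu.values()) - 1) < 1e-12
assert abs(sum(p for (a, b), p in mu.items() if a == 1) - 0.5) < 1e-12
corr = sum(a * b * p for (a, b), p in mu.items())
assert abs(corr - rho) < 1e-12

# Per coordinate, the endpoints disagree with probability (1 - rho)/2,
# which equals |v_i - v_j|^2 / 4: far-apart SDP vectors give far-apart points.
print((1 - rho) / 2)
```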

2.3. Completeness analysis

Dictators are essentially unaffected by the noise: a dictator ${f(x)=x_\ell}$ reads off the ${\ell}$-th coordinate, whose distribution is the one prescribed by the SDP solution. By symmetry, all dictators get the same value, at least ${SDP(I)-\epsilon}$.

2.4. Soundness analysis

First ignore the noise. Let us continue with MAX CUT. The invariance principle tells us that low influence cuts in the hypercube and in the sphere behave in the same way. In particular, the values of cuts in the graph ${DICT_{\Lambda}}$ are the same as in the symmetric graph on the sphere obtained by averaging rotated images of the SDP solution.

The role of the noise is to replace Boolean functions ${f}$ of arbitrary degree (cuts, in the case of MAX CUT) by ${T_{1-\epsilon}f}$, in order to apply the invariance principle.