Notes of Parikshit Gopalan’s talk

The pseudorandomness-learning nexus

We focus on a class of Boolean functions called half-spaces, i.e. signs of linear forms, and on the class ${TC_0}$ of depth-${2}$ circuits whose gates are half-spaces. We shall illustrate connections between learning and pseudorandomness on this class. Note that this class is not that easy.
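As a concrete anchor, here is a minimal sketch of a half-space in Python (the names `halfspace` and `maj3` are ours, for illustration):

```python
import itertools

def halfspace(w, theta):
    """Return the Boolean function sign(<w, x> - theta) on x in {-1, +1}^n."""
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1

# Majority on 3 bits is the half-space with weights (1, 1, 1) and threshold 0.
maj3 = halfspace([1, 1, 1], 0)
print([maj3(x) for x in itertools.product([-1, 1], repeat=3)])
# -> [-1, -1, -1, 1, -1, 1, 1, 1]
```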

Question [O’Neill, 1971]: How many half-spaces are needed to slice every edge of the hypercube?

This naive-looking question is still wide open.

1. Learning

Problem: Predict Boolean function ${f}$ from a sample drawn according to some distribution ${D}$.

1. Most general model is PAC-learning : ${D}$ is unknown.
2. Uniform distribution learning: Assume ${D}$ is uniform.
3. Query learning : can ask for ${f(x)}$ for certain ${x}$.
4. Agnostic learning: the unknown function need not be a half-space, look for a predictor in the family of half-spaces, do the best possible.

Half-spaces are PAC learnable. Indeed, this boils down to solving a system of linear inequalities (linear programming): find weights consistent with the labeled sample.
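A minimal sketch of finding a consistent hyperplane, using the perceptron algorithm as one concrete method for linearly separable data (the function name, the target half-space, and the data are ours, for illustration):

```python
import random

def perceptron(samples, max_iters=10000):
    """Find a weight vector consistent with linearly separable labeled samples.
    samples: list of (x, y) with x in {-1,+1}^n and y in {-1,+1}."""
    n = len(samples[0][0])
    w = [0.0] * n
    for _ in range(max_iters):
        mistakes = 0
        for x, y in samples:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]  # perceptron update
                mistakes += 1
        if mistakes == 0:
            return w
    return w

# Labels produced by the hidden half-space sign(3*x1 + x2 - x3).
random.seed(0)
target = [3, 1, -1]
data = [(x, 1 if sum(t * xi for t, xi in zip(target, x)) > 0 else -1)
        for x in [[random.choice([-1, 1]) for _ in range(3)] for _ in range(50)]]
w = perceptron(data)
assert all(y * sum(wi * xi for wi, xi in zip(w, x)) > 0 for x, y in data)
```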

2. Pseudorandomness

Theorem 1 (Peres 2001) Polynomials of degree ${O(\frac{1}{\epsilon^2})}$ suffice for an ${\epsilon}$-approximation of any half-space.
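Peres's theorem rests on the bound ${NS_\epsilon = O(\sqrt{\epsilon})}$ for the noise sensitivity of any half-space. One can check this scaling empirically; the following Monte Carlo sketch (our own illustration, not from the talk) estimates ${NS_\epsilon}$ for Majority:

```python
import random

def noise_sensitivity(f, n, eps, trials=20000):
    """Monte Carlo estimate of NS_eps(f) = Pr[f(x) != f(y)], where y flips
    each bit of a uniform x in {-1,+1}^n independently with probability eps."""
    rng = random.Random(1)
    diff = 0
    for _ in range(trials):
        x = [rng.choice([-1, 1]) for _ in range(n)]
        y = [-xi if rng.random() < eps else xi for xi in x]
        diff += f(x) != f(y)
    return diff / trials

maj = lambda x: 1 if sum(x) > 0 else -1
# Quadrupling eps should roughly double NS_eps, consistent with O(sqrt(eps)).
for eps in [0.01, 0.04, 0.16]:
    print(eps, noise_sensitivity(maj, 101, eps))
```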

Peres stated this in terms of noise sensitivity. See Ryan O’Donnell’s lecture notes.

This implies bounds on the learning time under the uniform distribution, via the low-degree algorithm.

2.1. Pseudorandom generators for half-spaces

Construct a small-support distribution ${D}$ that fools half-spaces. Candidate: a ${k}$-wise independent distribution. Indeed, Peres’ theorem can be interpreted in these terms (see Braverman’s talk this morning). But one must upgrade the approximation into a sandwiching approximation, and this cannot work: it follows from BGGP’s independence-sensitivity result for Majority of Majorities.

Our approach: replace the ${sign}$ function by a low-degree polynomial.

We expect a linear form in random signs to be distributed approximately like a Gaussian.

Berry-Esseen: quantitative comparison to a Gaussian. This requires that the coefficients of the linear form be uniformly small (regularity).

2.2. Sandwiching ${sign}$ by polynomials

Since Gaussian is concentrated, one must achieve excellent approximation near the origin. It turns out that degree ${O(\frac{1}{\epsilon^2})}$ suffices for an ${\epsilon}$ approximation of ${sign}$.

Example 1 ${2^n x_1 +2^{n-1}x_2 +\cdots+x_n}$.

Only the first two variables matter.

Example 2 ${\sqrt{n}\, x_1 + x_2 +\cdots+x_n}$.

This becomes regular once conditioned w.r.t. ${x_1}$.

Lemma 2 (Servedio 2006) Randomly setting the ${O(\frac{1}{\epsilon^2})}$ largest-coefficient (head) variables yields regular half-spaces or constants, up to ${\epsilon}$.
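The following sketch (our illustration; `restrict_head` and `regularity` are hypothetical names) shows the restriction step on Example 2's shape: fixing the few largest coordinates of an irregular half-space leaves a regular tail.

```python
import random, math

def restrict_head(w, k, rng):
    """Randomly fix the k largest-magnitude coordinates of the half-space
    sum_i w_i x_i >= 0; return the shift contributed by the fixed bits
    and the remaining tail weights."""
    order = sorted(range(len(w)), key=lambda i: -abs(w[i]))
    head, tail = order[:k], order[k:]
    shift = sum(w[i] * rng.choice([-1, 1]) for i in head)
    return shift, [w[i] for i in tail]

def regularity(w):
    """max_i |w_i| / ||w||_2 -- small means the half-space is regular."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return max(abs(wi) for wi in w) / norm

rng = random.Random(0)
w = [2.0 ** 10] + [1.0] * 1000          # one huge coefficient, then a flat tail
shift, tail = restrict_head(w, 1, rng)   # fixing the head leaves a regular tail
print(regularity(w), regularity(tail))
```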

Theorem 3 (Diakonikolas, Gopalan, Jaiswal, Servedio, Viola) Halfspaces are fooled by a pseudorandom generator with seed length ${O(\frac{1}{\epsilon^2}\log n)}$.

2.3. Sandwiching ${sign}$ by branching programs

Can one improve on the seed length? Replace polynomials by branching programs. Certain half-spaces require exponential width. Meka and Zuckerman show that sandwiching with small-width branching programs is nevertheless possible. Their generator has seed length ${\log n +(\log\frac{1}{\epsilon})^2}$.
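Why can the width blow up? A read-once branching program for a half-space can take its states after reading a prefix to be the reachable partial sums. The sketch below (our illustration, not the Meka-Zuckerman construction) shows that small weights give small width, while geometric weights force exponentially many states:

```python
def bp_layers(w):
    """Layer i of a read-once branching program for sign(<w, x>): the set of
    reachable partial sums after reading x_1..x_i.  Width = max layer size."""
    layers = [{0}]
    for wi in w:
        layers.append({s + wi * xi for s in layers[-1] for xi in (-1, 1)})
    return layers

# Equal weights: partial sums collide, width grows only linearly ...
small = bp_layers([1, 1, 1, 1, 1])
# ... while weights 2^i make every partial sum distinct.
big = bp_layers([1, 2, 4, 8, 16])
print(max(len(L) for L in small), max(len(L) for L in big))  # -> 6 32
```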

If queries are allowed, then learning small width branching programs is possible.

2.4. Application: counting Knapsack solutions

This amounts to counting ${|h^{-1}(1)|}$ for a given half-space ${h}$. Exact counting is #P-hard. Approximately? Related to pseudorandom generators.

Morris-Sinclair 1999 use an MCMC approach. We give a deterministic algorithm: we approximate the given half-space by a suitable branching program.
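As a warm-up to the branching-program viewpoint, here is the exact pseudo-polynomial counting DP over partial sums (our illustrative sketch, not the algorithm of the talk, which must handle large weights via approximation):

```python
from collections import Counter

def count_knapsack(w, theta):
    """Exactly count x in {0,1}^n with sum w_i x_i <= theta, by dynamic
    programming over partial sums -- pseudo-polynomial in sum(w)."""
    counts = Counter({0: 1})  # partial sum -> number of prefixes reaching it
    for wi in w:
        nxt = Counter()
        for s, c in counts.items():
            nxt[s] += c            # choose x_i = 0
            if s + wi <= theta:
                nxt[s + wi] += c   # choose x_i = 1
        counts = nxt
    return sum(counts.values())

print(count_knapsack([1, 2, 3], 3))  # -> 5
```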

3. Conclusion

Concerning half-spaces, our present state of knowledge is very similar on both sides, pseudorandomness and learning. This state of affairs might change someday if progress is made in cryptography.

The slides of this talk can be found here.