## Notes of Irit Dinur’s lecture #1

Probabilistically checkable proofs

Goal: prove the basic PCP theorem.

Today: introduction, connection between PCP’s and inapproximability. I will prove a simple but suggestive NP${\subset}$PCP(poly,1).

Tomorrow: Gap amplification.

Wednesday: Gap amplification and PCP composition.

1. A story

Suppose you want to check your telephone bill. The sum of a lot of numbers depends on all numbers: a little change of an input changes the output. Randomly selecting a few conversations does not help. So let us ask the phone company to provide more info that would help. Ask for all possible subsums. It is an exponentially large list. You can check it by comparing certain subsums (the sums for subsets ${A}$, ${B}$, ${A\cup B}$, ${A\cap B}$ are related). PCP will go like that: be demanding to the prover, verify just a few consistencies.

2. Proofs

NP is the class of all problems with checkable solutions. Equivalently, it is the class of all efficient proof systems.

Example 1 ${3}$-COLORING. View the graph as a theorem and a ${3}$-coloring as a proof.

NP-completeness means the proof system can encode all other proof systems. Here is an informal statement of the PCP theorem, [AS], [ALMSS].

Theorem 1 Every NP proof system can be encoded by a robust and locally testable proof system.

Locally testable means if there is a bug, it can be seen nearly in every small bit of the proof.

Formalizing this requires definitions.

Definition 2 A regular verifier for a language is a polynomial time algorithm which reads proofs ${\pi}$ and decides yes or no, and such that

• Yes If ${G\in L}$, then ${\exists \pi}$ such that verifiers says yes.
• No If ${G\notin L}$, then for all ${\pi}$, verifier says no.

Given two functions of ${n}$ (${n=}$input size), ${r}$ (number of random bits) and ${q}$ (number of queries), an ${(r,q)}$-verifier for ${L}$ is a polynomial time algorithm which reads given graph ${G}$, tosses ${r}$ coins, reads ${q}$ bits from ${\pi}$ and decides yes or no, with the following properties.

• Yes: If ${G\in L}$, then ${\exists \pi}$ such that ${\mathop{\mathbb P}(\textrm{verifiers says yes})=1}$.
• No: If ${G\notin L}$, then for all ${\pi}$, ${\mathop{\mathbb P}(\textrm{verifier says no})\geq 1/2}$. The class PCP${(r,q)}$ is the class of all languages which admit ${(r,q)}$-verifiers.

By definition, a language is in LP iff it admits a regular verifier. Here is the formal statement of the PCP Theorem.

Theorem 3 NP=PCP${(\log n,1)}$.

2.1. PCP and hardness of approximation

PCP is equivalent to a hardness of approximation result for a constraint satisfaction problem.

Theorem 4 PCP ${\Leftrightarrow}$ there exists a Karp reduction of every LP language to a ${(1,1/2)}$-CSP involving ${q}$-ary constraints on boolean variables.

Proof: Suppose GAP${(1,1/2)}$-CSP is NP-hard. So for any NP language ${L}$, there is a mapping ${x\mapsto \phi(x)}$ such that

• if ${x\in L}$, ${\phi(x)}$ is a satisfiable CSP instance.
• if ${x\notin L}$, ${\phi(x)}$ is a CSP instance only half of which constraints can be simultaneously satisfied.

This provides us with an ${(r,q)}$-verifier for ${L}$. Indeed, the verifier reads ${x}$, computes ${\phi(x)}$. It expects a proof in the form of an assignment of variables that satisfies all clauses of ${\phi(x)}$. It picks a constraint from ${\phi(x)}$ at random (${O(\log n)}$ coins suffice for that), reads the values of the variables of the proof corresponding to this constraint (there are only ${q=O(1)}$ of them), and says yes iff the values satisfy the constraint. If ${x\in L}$, there is a correct proof, on which the verifier always says yes. If ${x\notin L}$, no proof can satisfy more than half the constraints, so the probability that the verifier says no is ${\geq 1/2}$.

Conversely, given an ${(r\log n,q)}$-verifier for ${L}$, build a constraint satisfaction problem as follows. For each sequence ${R}$ of ${r\log n}$ bits and each proof ${\pi}$ that ${x\in L}$, let ${\phi(R,\pi)}$ be the boolean function whose variables are the bits of ${\pi}$ read by the ${(r\log n,q)}$-verifier and which takes value ${1}$ iff the ${(r\log n,q)}$-verifier says yes on ${\pi}$ when the tossed coins return ${R}$. The set of all ${\phi(R,\pi)}$, ${R\in \{ 0,1 \}^{r\log n}}$, constitutes the instance ${\phi(x)}$ of a constraint satisfaction problem. Constraints are ${q}$-ary since only ${q}$ bits of each proof are read.

By assumption, if ${x\in L}$, for every proof ${\pi}$ that ${x\in L}$, all ${\phi(R,\pi)}$ are satisfied by the relevant bits of ${\pi}$. Conversely, if ${x\notin L}$, for every proof ${\pi}$, at least half of the constraints ${\phi(R,\pi)}$ are not satisfied. $\Box$

In other words, the PCP theorem amounts to a transformation on graphs, which maps ${3}$-colorable graphs to ${3}$-colorable graphs, and maps non-${3}$-colorable graphs to graphs which are blatantly ${3}$-colorable, in the sense that every ${3}$-coloring fails on a definite fraction of edges. And this transformation is computable in polynomial time. Note that the algorithm cannot know wether the graph it is working on is ${3}$-colorable or not. All it can do is amplify the errors.

3. A baby example

Back to the telephone bill story. The phone company claims that the sum of a large set of numbers equals 200\$. I ask the company to provide a long proof, the set of all subsums. A mistake on one conversation affects half of the subsums, so I can’t miss it.

Let us be more formal. Let us work mod ${2}$, for simplicity. We are given ${n}$ numbers in ${\{ 0,1 \}}$. We want to check that ${\sum_{i=1}^{n}x_i =0}$ mod ${2}$. We require proofs to be vectors ${y=(x\cdot v)_{v\in\{ 0,1 \}^n}}$ (this is sometimes called the Hadamard coding of ${x\in\{ 0,1 \}^n}$).

1. Step 1. Check validity of encoding.
2. Step 2. Decode the desired value from ${y}$.

The first task is known as linearity testing. Here is how it can be done. Choose a random ${S\subset [n]}$, or equivalently, ${\alpha\in\{ 0,1 \}^n}$. Choose at random ${\beta\in\{ 0,1 \}^n}$. Check wether ${y_{\alpha}+y_{\beta}=y_{\alpha+\beta}}$.

Claim. ${\mathop{\mathbb P}(\textrm{test succeeds})=1 \Leftrightarrow y}$ is a legal encoding of some ${x}$.

Furthermore, this test is robust: If ${\mathop{\mathbb P}(\textrm{test succeeds})}$ is close to ${1}$ only if ${y}$ is close to a legal encoding of some ${x}$.

Drawback of method: the Hadamard encoding is costly (exponential). We shall see tomorrow how to do better.

Question: Is PCP${(\log n,1)\subset NP}$ ? Answer: Yes. This direction is easy.

Question: Why use the values ${1}$ and ${1/2}$ ? What do different values give ? Answer: yes, this is an important point. Other values relate to approximability issues.

References

[AS] Arora, Sanjeev; Safra, Shmuel; Probabilistic checking of proofs: a new characterization of NP. J. ACM 45 (1998), no. 1, 70–122.

[ALMSS] Arora, Sanjeev; Lund, Carsten; Motwani, Rajeev; Sudan, Madhu; Szegedy, Mario; Proof verification and the hardness of approximation problems. J. ACM 45 (1998), no. 3, 501–555.