Probabilistically checkable proofs
Goal: prove the basic PCP theorem.
Today: introduction, connection between PCP’s and inapproximability. I will prove a simple but suggestive NPPCP(poly,1).
Tomorrow: Gap amplification.
Wednesday: Gap amplification and PCP composition.
1. A story
Suppose you want to check your telephone bill. The sum of a lot of numbers depends on all numbers: a little change of an input changes the output. Randomly selecting a few conversations does not help. So let us ask the phone company to provide more info that would help. Ask for all possible subsums. It is an exponentially large list. You can check it by comparing certain subsums (the sums for subsets , , , are related). PCP will go like that: be demanding to the prover, verify just a few consistencies.
NP is the class of all problems with checkable solutions. Equivalently, it is the class of all efficient proof systems.
Example 1 -COLORING. View the graph as a theorem and a -coloring as a proof.
Theorem 1 Every NP proof system can be encoded by a robust and locally testable proof system.
Locally testable means if there is a bug, it can be seen nearly in every small bit of the proof.
Formalizing this requires definitions.
Definition 2 A regular verifier for a language is a polynomial time algorithm which reads proofs and decides yes or no, and such that
- Yes If , then such that verifiers says yes.
- No If , then for all , verifier says no.
Given two functions of (input size), (number of random bits) and (number of queries), an -verifier for is a polynomial time algorithm which reads given graph , tosses coins, reads bits from and decides yes or no, with the following properties.
- Yes: If , then such that .
- No: If , then for all , . The class PCP is the class of all languages which admit -verifiers.
By definition, a language is in LP iff it admits a regular verifier. Here is the formal statement of the PCP Theorem.
Theorem 3 NP=PCP.
2.1. PCP and hardness of approximation
PCP is equivalent to a hardness of approximation result for a constraint satisfaction problem.
Theorem 4 PCP there exists a Karp reduction of every LP language to a -CSP involving -ary constraints on boolean variables.
Proof: Suppose GAP-CSP is NP-hard. So for any NP language , there is a mapping such that
- if , is a satisfiable CSP instance.
- if , is a CSP instance only half of which constraints can be simultaneously satisfied.
This provides us with an -verifier for . Indeed, the verifier reads , computes . It expects a proof in the form of an assignment of variables that satisfies all clauses of . It picks a constraint from at random ( coins suffice for that), reads the values of the variables of the proof corresponding to this constraint (there are only of them), and says yes iff the values satisfy the constraint. If , there is a correct proof, on which the verifier always says yes. If , no proof can satisfy more than half the constraints, so the probability that the verifier says no is .
Conversely, given an -verifier for , build a constraint satisfaction problem as follows. For each sequence of bits and each proof that , let be the boolean function whose variables are the bits of read by the -verifier and which takes value iff the -verifier says yes on when the tossed coins return . The set of all , , constitutes the instance of a constraint satisfaction problem. Constraints are -ary since only bits of each proof are read.
By assumption, if , for every proof that , all are satisfied by the relevant bits of . Conversely, if , for every proof , at least half of the constraints are not satisfied.
In other words, the PCP theorem amounts to a transformation on graphs, which maps -colorable graphs to -colorable graphs, and maps non--colorable graphs to graphs which are blatantly -colorable, in the sense that every -coloring fails on a definite fraction of edges. And this transformation is computable in polynomial time. Note that the algorithm cannot know wether the graph it is working on is -colorable or not. All it can do is amplify the errors.
3. A baby example
Back to the telephone bill story. The phone company claims that the sum of a large set of numbers equals 200$. I ask the company to provide a long proof, the set of all subsums. A mistake on one conversation affects half of the subsums, so I can’t miss it.
Let us be more formal. Let us work mod , for simplicity. We are given numbers in . We want to check that mod . We require proofs to be vectors (this is sometimes called the Hadamard coding of ).
- Step 1. Check validity of encoding.
- Step 2. Decode the desired value from .
The first task is known as linearity testing. Here is how it can be done. Choose a random , or equivalently, . Choose at random . Check wether .
Claim. is a legal encoding of some .
Furthermore, this test is robust: If is close to only if is close to a legal encoding of some .
Drawback of method: the Hadamard encoding is costly (exponential). We shall see tomorrow how to do better.
4. Comments and questions
Question: Is PCP ? Answer: Yes. This direction is easy.
Question: Why use the values and ? What do different values give ? Answer: yes, this is an important point. Other values relate to approximability issues.