So far we have seen two concentration bounds for scalar random variables: Markov and Chernoff. For our sort of applications, these are by far the most useful. In most introductory probability courses, you are likely to see another inequality: Chebyshev’s inequality. Its strength lies between the Markov and Chernoff inequalities: the concentration bounds we get from Chebyshev are usually better than those from Markov but worse than those from Chernoff. On the other hand, Chebyshev requires stronger hypotheses than Markov but weaker hypotheses than Chernoff.
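We begin by reviewing variance, and other related notions, which should be familiar from an introductory probability course. The variance of a random variable $X$ is
$$\mathrm{Var}[X] = E\big[(X - E[X])^2\big] = E[X^2] - E[X]^2.$$
The covariance between two random variables $X$ and $Y$ is
$$\mathrm{Cov}[X,Y] = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]\,E[Y].$$
This gives some measure of the correlation between $X$ and $Y$.
Here are some properties of variance and covariance that follow from the definitions by simple calculations.
Claim 1 If $X$ and $Y$ are independent then $\mathrm{Cov}[X,Y] = 0$.
Claim 2 $\mathrm{Cov}[X+Y,\,Z] = \mathrm{Cov}[X,Z] + \mathrm{Cov}[Y,Z]$.
More generally, induction shows
Claim 3 $\mathrm{Cov}\big[\sum_i X_i,\ \sum_j Y_j\big] = \sum_i \sum_j \mathrm{Cov}[X_i, Y_j]$.
Claim 4 $\mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X,Y]$.
More generally, induction shows
Claim 5 $\mathrm{Var}\big[\sum_{i=1}^n X_i\big] = \sum_{i=1}^n \mathrm{Var}[X_i] + 2 \sum_{i<j} \mathrm{Cov}[X_i, X_j]$.
Claim 6 Let $X_1,\ldots,X_n$ be mutually independent random variables. Then $\mathrm{Var}\big[\sum_{i=1}^n X_i\big] = \sum_{i=1}^n \mathrm{Var}[X_i]$.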
2. Chebyshev’s Inequality
You have presumably also seen Chebyshev’s inequality before. It is a one-line consequence of Markov’s inequality.
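Theorem 7 For any $t > 0$,
$$\Pr\big[\,|X - E[X]| \geq t\,\big] \leq \frac{\mathrm{Var}[X]}{t^2}.$$
Proof:
$$\Pr\big[\,|X - E[X]| \geq t\,\big] = \Pr\big[\,(X - E[X])^2 \geq t^2\,\big] \leq \frac{E\big[(X - E[X])^2\big]}{t^2} = \frac{\mathrm{Var}[X]}{t^2},$$
where the inequality is by Markov’s inequality.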
As a quick example, suppose we independently flip $n$ fair coins. What’s the probability that we see at least heads? Let $X_i$ be the indicator random variable of the event “$i$th toss is heads”. Let $X = \sum_{i=1}^n X_i$. So we want to analyze .
Bound from Chebyshev: Note that $E[X] = n/2$ and, by Claim 6, $\mathrm{Var}[X] = \sum_{i=1}^n \mathrm{Var}[X_i] = n/4$.
By Chebyshev’s inequality
Bound from Chernoff: Chernoff’s inequality gives
This is better than the bound from Chebyshev for .
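To make the comparison concrete, here is a small numerical sketch. It rests on assumptions that are not taken from the text above: the threshold is taken to be $3n/4$ heads, and the Chernoff bound is taken in the common multiplicative form $\Pr[X \geq (1+\delta)\mu] \leq \exp(-\delta^2 \mu / 3)$ for $0 < \delta \leq 1$. The function names are hypothetical.

```python
from math import comb, exp

def exact_tail(n: int, k: int) -> float:
    """Exact Pr[X >= k] for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

def chebyshev_bound(n: int) -> float:
    # Assumed threshold 3n/4: X >= 3n/4 implies |X - n/2| >= n/4.
    # With Var[X] = n/4, Chebyshev gives (n/4) / (n/4)^2 = 4/n.
    return 4 / n

def chernoff_bound(n: int) -> float:
    # Assumed form: Pr[X >= (1 + d) * mu] <= exp(-d^2 * mu / 3),
    # here with mu = n/2 and d = 1/2, which gives exp(-n/24).
    return exp(-n / 24)

for n in (20, 40, 68, 100, 200):
    k = (3 * n + 3) // 4  # ceil(3n/4)
    print(n, exact_tail(n, k), chebyshev_bound(n), chernoff_bound(n))
```

Both bounds are valid upper bounds on the exact tail; which one is smaller depends on $n$ and on the particular form of the Chernoff bound used.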
So Chebyshev is weaker than Chernoff, at least for analyzing sums of independent Bernoulli trials. So why do we bother studying Chebyshev? One reason is that Chernoff is designed for analyzing sums of mutually independent random variables. That is quite a strong assumption. In some scenarios, our random variables are not mutually independent, or perhaps we deliberately choose them not to be mutually independent.
- For example, generating mutually independent random variables requires a lot of random bits and, as discussed last time, randomness is a “precious resource”. We will see that decreasing the number of random bits gives another method to derandomize algorithms.
- Another important example is in constructing hash functions, which are random-like functions. Generating a completely random function takes a huge number of random bits. So instead we often try to use hash functions involving less randomness.
3. $k$-wise independence
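A set of events $\mathcal{E}_1,\ldots,\mathcal{E}_n$ is called $k$-wise independent if for any set $I \subseteq \{1,\ldots,n\}$ with $|I| \leq k$ we have
$$\Pr\Big[\bigwedge_{i \in I} \mathcal{E}_i\Big] = \prod_{i \in I} \Pr[\mathcal{E}_i].$$
The term pairwise independence is a synonym for $2$-wise independence.
Similarly, a set of discrete random variables $X_1,\ldots,X_n$ is called $k$-wise independent if for any set $I \subseteq \{1,\ldots,n\}$ with $|I| \leq k$ and any values $x_i$ we have
$$\Pr\Big[\bigwedge_{i \in I} X_i = x_i\Big] = \prod_{i \in I} \Pr[X_i = x_i].$$
Claim 8 Suppose $X_1,\ldots,X_n$ are $k$-wise independent. Then $E\big[\prod_{i \in I} X_i\big] = \prod_{i \in I} E[X_i]$ for any set $I$ with $|I| \leq k$.
Proof: For notational simplicity, consider the case $I = \{1,\ldots,k\}$. Then
$$E\Big[\prod_{i=1}^k X_i\Big] = \sum_{x_1} \cdots \sum_{x_k} \Pr\Big[\bigwedge_{i=1}^k X_i = x_i\Big] \prod_{i=1}^k x_i = \sum_{x_1} \cdots \sum_{x_k} \prod_{i=1}^k \Pr[X_i = x_i]\, x_i = \prod_{i=1}^k \Big(\sum_{x_i} \Pr[X_i = x_i]\, x_i\Big) = \prod_{i=1}^k E[X_i].$$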
Example: To get a feel for pairwise independence, consider the following three Bernoulli random variables $X_1, X_2, X_3$ that are pairwise independent but not mutually independent. There are 4 possible outcomes of these three random variables. Each of these outcomes has probability $1/4$.
$$\begin{array}{ccc} X_1 & X_2 & X_3 \\ \hline 0 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{array}$$
They are certainly not mutually independent because the event $X_1 = X_2 = X_3 = 1$ has probability $0$, whereas $\prod_{i=1}^3 \Pr[X_i = 1] = (1/2)^3$. But, by checking all cases, one can verify that they are pairwise independent.
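The case check mentioned above is easy to automate. The sketch below assumes the standard sample space in which the third bit is the XOR of two fair bits, so that there are 4 equally likely outcomes; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed sample space: two fair bits and their XOR, each outcome with probability 1/4.
outcomes = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]
prob = Fraction(1, len(outcomes))

def pr(event) -> Fraction:
    """Probability of an event, given as a predicate on outcomes."""
    return sum((prob for w in outcomes if event(w)), Fraction(0))

def is_pairwise_independent() -> bool:
    # Check Pr[X_i = x and X_j = y] = Pr[X_i = x] * Pr[X_j = y] for every pair and every value.
    for i, j in combinations(range(3), 2):
        for x, y in product((0, 1), repeat=2):
            joint = pr(lambda w: w[i] == x and w[j] == y)
            if joint != pr(lambda w: w[i] == x) * pr(lambda w: w[j] == y):
                return False
    return True

def triple_factorizes() -> bool:
    # Mutual independence would additionally require the full joint distribution to factorize.
    for x in product((0, 1), repeat=3):
        joint = pr(lambda w: w == x)
        marginals = Fraction(1)
        for i in range(3):
            marginals *= pr(lambda w, i=i: w[i] == x[i])
        if joint != marginals:
            return False
    return True

print(is_pairwise_independent(), triple_factorizes())
```

Exact rational arithmetic via `Fraction` avoids any floating-point doubt when comparing probabilities.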
3.1. Constructing Pairwise Independent RVs
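Let $\mathbb{F}$ be a finite field and $q = |\mathbb{F}|$. We will construct RVs $\{ Y_u : u \in \mathbb{F} \}$ such that each $Y_u$ is uniform over $\mathbb{F}$ and the $Y_u$'s are pairwise independent. To do so, we need to generate only two independent RVs $X_1$ and $X_2$ that are uniformly distributed over $\mathbb{F}$. We then define
$$Y_u = X_1 + u \cdot X_2 \qquad \forall u \in \mathbb{F}. \qquad (1)$$
Claim 9 Each $Y_u$ is uniformly distributed on $\mathbb{F}$.
Proof: For $u = 0$ we have $Y_u = X_1$, which is uniform. For $u \neq 0$ and any $a \in \mathbb{F}$ we have
$$\Pr[Y_u = a] = \Pr[X_1 + u X_2 = a] = \sum_{x \in \mathbb{F}} \Pr[X_1 = x] \cdot \Pr\big[X_2 = (a - x)/u\big] = \frac{1}{q} \sum_{x \in \mathbb{F}} \Pr\big[X_2 = (a - x)/u\big] = \frac{1}{q} \sum_{y \in \mathbb{F}} \Pr[X_2 = y] = \frac{1}{q},$$
since as $x$ ranges through $\mathbb{F}$, $(a - x)/u$ also ranges through all of $\mathbb{F}$. (In other words, the map $x \mapsto (a - x)/u$ is a bijection of $\mathbb{F}$ to itself.) So $Y_u$ is uniform.
Claim 10 The $Y_u$'s are pairwise independent.
Proof: We wish to show that, for any distinct RVs $Y_u$ and $Y_v$ and any values $a, b \in \mathbb{F}$, we have
$$\Pr[Y_u = a \,\wedge\, Y_v = b] = \Pr[Y_u = a] \cdot \Pr[Y_v = b] = \frac{1}{q^2}.$$
This event is equivalent to $X_1 + u X_2 = a$ and $X_1 + v X_2 = b$. We can also rewrite that as:
$$\begin{pmatrix} 1 & u \\ 1 & v \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix}.$$
This holds precisely when
$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} 1 & u \\ 1 & v \end{pmatrix}^{-1} \begin{pmatrix} a \\ b \end{pmatrix} = \frac{1}{v - u} \begin{pmatrix} v & -u \\ -1 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix},$$
where the inverse exists because the determinant $v - u$ is non-zero. Since $X_1$ and $X_2$ are independent and uniform over $\mathbb{F}$, this event holds with probability $1/q^2$.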
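Corollary 11 Given $2b$ mutually independent, uniformly random bits, we can construct $2^b$ pairwise independent, uniformly random strings in $\{0,1\}^b$.
Proof: Apply the previous construction to the finite field $\mathbb{F}_{2^b}$. The $2b$ mutually independent random bits are used to construct $X_1$ and $X_2$. The random strings $Y_u$ are constructed as in (1).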
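The construction can be checked by brute force on a small field. The sketch below makes a simplifying assumption that is not in the text: it uses the prime field $\mathbb{Z}_p$ (integers modulo a prime $p$) rather than a general finite field, and it enumerates all $p^2$ equally likely seeds. The names `pairwise_family` and `is_uniform_and_pairwise_independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def pairwise_family(p: int) -> dict:
    """Map each seed (x1, x2) in Z_p^2 to the tuple (Y_0, ..., Y_{p-1}), where Y_u = x1 + u*x2 mod p."""
    return {
        (x1, x2): tuple((x1 + u * x2) % p for u in range(p))
        for x1, x2 in product(range(p), repeat=2)
    }

def is_uniform_and_pairwise_independent(p: int) -> bool:
    family = pairwise_family(p)
    total = len(family)  # p^2 equally likely seeds
    # Uniformity: Pr[Y_u = a] = 1/p for every index u and value a.
    for u, a in product(range(p), repeat=2):
        count = sum(1 for ys in family.values() if ys[u] == a)
        if Fraction(count, total) != Fraction(1, p):
            return False
    # Pairwise independence: Pr[Y_u = a and Y_v = b] = 1/p^2 for all u < v and all values a, b.
    for u in range(p):
        for v in range(u + 1, p):
            for a, b in product(range(p), repeat=2):
                count = sum(1 for ys in family.values() if ys[u] == a and ys[v] == b)
                if Fraction(count, total) != Fraction(1, p * p):
                    return False
    return True

print(is_uniform_and_pairwise_independent(5))
print(is_uniform_and_pairwise_independent(4))  # 4 is not prime, so Z_4 is not a field
```

The second call illustrates why a field is needed: over $\mathbb{Z}_4$ the difference of two indices need not be invertible, and the check fails.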
3.2. Example: Max Cut with pairwise independent RVs
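Once again let’s consider the Max Cut problem. We are given a graph $G = (V,E)$ where $V = \{1,\ldots,n\}$. We will choose $\{0,1\}$-valued random variables $Z_1,\ldots,Z_n$. If $Z_i = 1$ then we add vertex $i$ to $U$.
Our original algorithm chose $Z_1,\ldots,Z_n$ to be mutually independent and uniform. Instead we will pick $Z_1,\ldots,Z_n$ to be pairwise independent and uniform. Then
$$E\big[\,|\delta(U)|\,\big] = \sum_{uv \in E} \Pr[Z_u \neq Z_v] = \sum_{uv \in E} \Big( \Pr[Z_u = 1 \wedge Z_v = 0] + \Pr[Z_u = 0 \wedge Z_v = 1] \Big) = \sum_{uv \in E} \Big( \frac{1}{4} + \frac{1}{4} \Big) = \frac{|E|}{2},$$
where each probability equals $1/4$ because $Z_u$ and $Z_v$ are uniform and independent of each other.
So the original algorithm works just as well if we make pairwise independent decisions instead of mutually independent decisions for placing vertices in $U$. The following theorem shows the advantage of making pairwise independent decisions.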
Theorem 12 There is a deterministic, polynomial time algorithm to find a cut $\delta(U)$ with $|\delta(U)| \geq |E|/2$.
Proof: By Corollary 11, we only need $b = O(\log n)$ mutually independent, uniform random bits $R_1,\ldots,R_b$ in order to generate our pairwise independent, uniform random bits $Z_1,\ldots,Z_n$. We have just argued that these pairwise independent $Z_i$'s will give us
$$E\big[\,|\delta(U)|\,\big] = \frac{|E|}{2}.$$
So there must exist some particular bits $(r_1,\ldots,r_b)$ such that fixing $R_i = r_i$ for all $i$, we get $|\delta(U)| \geq |E|/2$. We can deterministically find such bits by exhaustive search in $2^b = 2^{O(\log n)} = \mathrm{poly}(n)$ trials. This gives a deterministic, polynomial time algorithm.
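The exhaustive search over seeds can be sketched as follows. To keep the code short, this sketch swaps in a different pairwise independent construction from the finite field one above: the bit for vertex $i$ is the parity of the bitwise AND of a $b$-bit seed with the label $i+1$ (an inner product over $\mathbb{F}_2$), which is pairwise independent and uniform because distinct non-zero labels are linearly independent over $\mathbb{F}_2$. Vertices are 0-indexed, the graph is assumed to have no self-loops, and all function names are hypothetical.

```python
from typing import List, Set, Tuple

def pairwise_bits(seed: int, n: int) -> List[int]:
    """Bit for vertex i is the parity of seed AND (i + 1), an inner product over F_2."""
    return [bin(seed & (i + 1)).count("1") % 2 for i in range(n)]

def cut_size(edges: List[Tuple[int, int]], bits: List[int]) -> int:
    """Number of edges whose endpoints received different bits."""
    return sum(1 for u, v in edges if bits[u] != bits[v])

def derandomized_max_cut(n: int, edges: List[Tuple[int, int]]) -> Tuple[Set[int], int]:
    """Try every seed and return the best cut; some seed must achieve at least |E|/2."""
    b = n.bit_length()  # smallest b with 2^b > n, so labels 1..n are distinct non-zero b-bit vectors
    best_bits: List[int] = []
    best_size = -1
    for seed in range(2 ** b):  # at most 2n seeds, so the search is polynomial in n
        bits = pairwise_bits(seed, n)
        size = cut_size(edges, bits)
        if size > best_size:
            best_size, best_bits = size, bits
    side_u = {i for i in range(n) if best_bits[i] == 1}
    return side_u, best_size

# Usage: the complete graph on 4 vertices has 6 edges, so the returned cut has at least 3 edges.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(derandomized_max_cut(4, k4))
```

Since the average cut size over all seeds equals $|E|/2$, taking the maximum over seeds can only do at least as well.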
4. Chebyshev with pairwise independent RVs
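One of the main benefits of pairwise independent RVs is that Chebyshev’s inequality still works beautifully. Suppose that $X_1,\ldots,X_n$ are pairwise independent. For any $i \neq j$,
$$\mathrm{Cov}[X_i, X_j] = E[X_i X_j] - E[X_i]\,E[X_j] = 0$$
by Claim 8. So
$$\mathrm{Var}\Big[\sum_{i=1}^n X_i\Big] = \sum_{i=1}^n \mathrm{Var}[X_i]$$
by Claim 5. So
$$\Pr\Big[\,\Big|\sum_{i=1}^n X_i - E\Big[\sum_{i=1}^n X_i\Big]\Big| \geq t\,\Big] \leq \frac{\sum_{i=1}^n \mathrm{Var}[X_i]}{t^2}.$$
This is exactly the same bound that we would get if the $X_i$'s were mutually independent.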
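As a worked instance, under the assumption (chosen here for illustration) that the $X_i$ are pairwise independent, uniform $\{0,1\}$-valued bits, the bound specializes as follows.

```latex
% Assumption: X_1, ..., X_n are pairwise independent and uniform on {0,1}.
\[
  E[X_i] = \tfrac12, \qquad
  \mathrm{Var}[X_i] = E[X_i^2] - E[X_i]^2 = \tfrac12 - \tfrac14 = \tfrac14, \qquad
  \mathrm{Var}\Big[\sum_{i=1}^n X_i\Big] = \frac{n}{4}.
\]
\[
  \Pr\Big[\,\Big|\sum_{i=1}^n X_i - \frac{n}{2}\Big| \ge t\,\Big] \le \frac{n}{4t^2},
  \qquad\text{so for } t = \frac{n}{4} \text{ the bound is } \frac{4}{n}.
\]
```

So a deviation of $n/4$ from the mean has probability at most $4/n$, even though only pairwise independence was assumed.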