So far we have seen two concentration bounds for scalar random variables: Markov and Chernoff. For our sort of applications, these are by far the most useful. In most introductory probability courses, you are likely to see another inequality: **Chebyshev’s inequality**. Its strength lies between the Markov and Chernoff inequalities: the concentration bounds we get from Chebyshev are usually better than those from Markov but worse than those from Chernoff. Correspondingly, Chebyshev requires stronger hypotheses than Markov but weaker hypotheses than Chernoff.

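**1. Variance**

We begin by reviewing **variance**, and other related notions, which should be familiar from an introductory probability course. The variance of a random variable $X$ is

$$\mathrm{Var}[X] = \mathrm{E}\big[(X - \mathrm{E}[X])^2\big] = \mathrm{E}[X^2] - \mathrm{E}[X]^2.$$

The **covariance** between two random variables $X$ and $Y$ is

$$\mathrm{Cov}[X,Y] = \mathrm{E}\big[(X - \mathrm{E}[X])(Y - \mathrm{E}[Y])\big] = \mathrm{E}[XY] - \mathrm{E}[X]\,\mathrm{E}[Y].$$

This gives some measure of the correlation between $X$ and $Y$.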

Here are some properties of variance and covariance that follow from the definitions by simple calculations.

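Claim 1 If $X$ and $Y$ are independent then $\mathrm{Cov}[X,Y] = 0$.

Claim 2 $\mathrm{Cov}[X+Y, Z] = \mathrm{Cov}[X,Z] + \mathrm{Cov}[Y,Z]$.

More generally, induction shows

Claim 3 $\mathrm{Cov}\big[\sum_i X_i, \sum_j Y_j\big] = \sum_i \sum_j \mathrm{Cov}[X_i, Y_j]$.

Claim 4 $\mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X,Y]$.

More generally, induction shows

Claim 5 $\mathrm{Var}\big[\sum_{i=1}^n X_i\big] = \sum_{i=1}^n \mathrm{Var}[X_i] + 2 \sum_{i<j} \mathrm{Cov}[X_i, X_j]$.

In particular,

Claim 6 Let $X_1,\ldots,X_n$ be mutually independent random variables. Then $\mathrm{Var}\big[\sum_{i=1}^n X_i\big] = \sum_{i=1}^n \mathrm{Var}[X_i]$.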
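As a sketch of one of these "simple calculations", the two-variable variance identity follows by expanding the square in the definition of variance and using linearity of expectation. The shorthand $\bar{X} = X - \mathrm{E}[X]$ and $\bar{Y} = Y - \mathrm{E}[Y]$ is introduced here only for this derivation.

$$
\begin{aligned}
\mathrm{Var}[X+Y] &= \mathrm{E}\big[(\bar{X} + \bar{Y})^2\big] \\
&= \mathrm{E}[\bar{X}^2] + \mathrm{E}[\bar{Y}^2] + 2\,\mathrm{E}[\bar{X}\bar{Y}] \\
&= \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X,Y].
\end{aligned}
$$

When $X$ and $Y$ are independent the covariance term vanishes, which is the two-variable case of the statement for mutually independent random variables.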

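**2. Chebyshev’s Inequality**

You have presumably seen Chebyshev’s inequality before as well. It is a one-line consequence of Markov’s inequality.

Theorem 7 For any $t > 0$,

$$\Pr\big[\,|X - \mathrm{E}[X]| \geq t\,\big] \leq \frac{\mathrm{Var}[X]}{t^2}.$$

*Proof:*

$$\Pr\big[\,|X - \mathrm{E}[X]| \geq t\,\big] = \Pr\big[(X - \mathrm{E}[X])^2 \geq t^2\big] \leq \frac{\mathrm{E}\big[(X - \mathrm{E}[X])^2\big]}{t^2} = \frac{\mathrm{Var}[X]}{t^2},$$

where the inequality is by Markov’s inequality applied to the non-negative random variable $(X - \mathrm{E}[X])^2$.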

As a quick example, suppose we independently flip $n$ fair coins. What’s the probability that we see at least heads? Let $X_i$ be the indicator random variable of the event “the $i$th toss is heads”. Let $X = \sum_{i=1}^n X_i$. So we want to analyze .

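**Bound from Chebyshev:** Note that

$$\mathrm{Var}[X_i] = \mathrm{E}[X_i^2] - \mathrm{E}[X_i]^2 = \tfrac{1}{2} - \tfrac{1}{4} = \tfrac{1}{4}.$$

By independence,

$$\mathrm{Var}[X] = \sum_{i=1}^n \mathrm{Var}[X_i] = \frac{n}{4}.$$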

By Chebyshev’s inequality

**Bound from Chernoff:** Chernoff’s inequality gives

This is better than the bound from Chebyshev for .
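To get a numerical feel for how loose Chebyshev can be on coin flips, here is a minimal Python sketch that compares the exact upper tail of a Binomial$(n, 1/2)$ with the Chebyshev bound. The threshold of $3n/4$ heads and the function names are assumptions chosen purely for illustration.

```python
from math import comb


def exact_upper_tail(n: int, k: int) -> float:
    """Exact Pr[X >= k] for X = number of heads in n fair coin flips."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


def chebyshev_bound(n: int, t: float) -> float:
    """Chebyshev bound Var[X] / t^2 on Pr[|X - n/2| >= t], using Var[X] = n/4."""
    return (n / 4) / t ** 2


if __name__ == "__main__":
    for n in (20, 100, 400):
        k = 3 * n // 4      # illustrative threshold (an assumption)
        t = k - n / 2       # deviation from the mean n/2
        print(n, exact_upper_tail(n, k), chebyshev_bound(n, t))
```

With this threshold the Chebyshev bound decays like $4/n$, whereas the exact tail shrinks much faster as $n$ grows.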

So Chebyshev is weaker than Chernoff, at least for analyzing sums of independent Bernoulli trials. So why do we bother studying Chebyshev? One reason is that Chernoff is designed for analyzing sums of **mutually independent** random variables. That is quite a strong assumption. In some scenarios, our random variables are not mutually independent, or perhaps we deliberately choose them not to be mutually independent.

- For example, generating mutually independent random variables requires a lot of random bits and, as discussed last time, randomness is a “precious resource”. We will see that decreasing the number of random bits gives another method to derandomize algorithms.
- Another important example is in constructing hash functions, which are random-like functions. Generating a completely random function takes a huge number of random bits. So instead we often try to use hash functions involving less randomness.

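**3. $k$-wise independence**

A set of events $\mathcal{E}_1,\ldots,\mathcal{E}_n$ are called **$k$-wise independent** if for any set $I \subseteq \{1,\ldots,n\}$ with $|I| \leq k$ we have

$$\Pr\Big[\bigwedge_{i \in I} \mathcal{E}_i\Big] = \prod_{i \in I} \Pr[\mathcal{E}_i].$$

The term **pairwise independence** is a synonym for $2$-wise independence.

Similarly, a set of discrete random variables $X_1,\ldots,X_n$ are called **$k$-wise independent** if for any set $I \subseteq \{1,\ldots,n\}$ with $|I| \leq k$ and any values $x_i$ we have

$$\Pr\Big[\bigwedge_{i \in I} X_i = x_i\Big] = \prod_{i \in I} \Pr[X_i = x_i].$$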

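Claim 8 Suppose $X_1,\ldots,X_n$ are $k$-wise independent. Then, for any set $I \subseteq \{1,\ldots,n\}$ with $|I| \leq k$,

$$\mathrm{E}\Big[\prod_{i \in I} X_i\Big] = \prod_{i \in I} \mathrm{E}[X_i].$$

*Proof:* For notational simplicity, consider the case $I = \{1,\ldots,k\}$. Then

$$
\begin{aligned}
\mathrm{E}\Big[\prod_{i=1}^k X_i\Big]
&= \sum_{x_1,\ldots,x_k} \Pr\Big[\bigwedge_{i=1}^k X_i = x_i\Big] \cdot \prod_{i=1}^k x_i \\
&= \sum_{x_1,\ldots,x_k} \prod_{i=1}^k \Pr[X_i = x_i] \cdot x_i \\
&= \prod_{i=1}^k \Big( \sum_{x_i} \Pr[X_i = x_i] \cdot x_i \Big)
= \prod_{i=1}^k \mathrm{E}[X_i].
\end{aligned}
$$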

**Example:** To get a feel for pairwise independence, consider the following three Bernoulli random variables that are pairwise independent but not mutually independent. There are 4 possible outcomes of these three random variables. Each of these outcomes has probability $1/4$.

They are certainly not mutually independent because the event has probability , whereas . But, by checking all cases, one can verify that they are pairwise independent.
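One standard instance of such a triple, assumed here for illustration, takes two independent fair bits $X_1, X_2$ and sets $X_3 = X_1 \oplus X_2$. The following Python sketch checks all cases exhaustively with exact arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed illustrative example: X1, X2 are fair bits and X3 = X1 XOR X2.
# This gives 4 equally likely outcomes (x1, x2, x3).
OUTCOMES = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]
P = Fraction(1, len(OUTCOMES))


def prob(event) -> Fraction:
    """Exact probability of the set of outcomes satisfying `event`."""
    return sum((P for w in OUTCOMES if event(w)), Fraction(0))


def pairwise_independent() -> bool:
    """Check Pr[Xi=a and Xj=b] = Pr[Xi=a] * Pr[Xj=b] for all pairs and values."""
    for i, j in ((0, 1), (0, 2), (1, 2)):
        for a, b in product((0, 1), repeat=2):
            joint = prob(lambda w: w[i] == a and w[j] == b)
            if joint != prob(lambda w: w[i] == a) * prob(lambda w: w[j] == b):
                return False
    return True


def mutually_independent() -> bool:
    """Check the product rule for all three variables simultaneously."""
    for a, b, c in product((0, 1), repeat=3):
        joint = prob(lambda w: w == (a, b, c))
        marginals = (prob(lambda w: w[0] == a)
                     * prob(lambda w: w[1] == b)
                     * prob(lambda w: w[2] == c))
        if joint != marginals:
            return False
    return True
```

In this assumed example, `pairwise_independent()` holds while `mutually_independent()` fails, since any two of the bits determine the third.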

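**3.1. Constructing Pairwise Independent RVs**

Let $\mathbb{F}$ be a finite field and $q = |\mathbb{F}|$. We will construct RVs $\{\, Y_u : u \in \mathbb{F} \,\}$ such that each $Y_u$ is uniform over $\mathbb{F}$ and the $Y_u$’s are pairwise independent. To do so, we need to generate only *two* independent RVs $X_1$ and $X_2$ that are uniformly distributed over $\mathbb{F}$. We then define

$$Y_u = X_1 + u \cdot X_2 \qquad \forall u \in \mathbb{F}. \qquad (1)$$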

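Claim 9 Each $Y_u$ is uniformly distributed on $\mathbb{F}$.

*Proof:* For $u = 0$ we have $Y_u = X_1$, which is uniform. For $u \neq 0$ and any $a \in \mathbb{F}$ we have

$$\Pr[Y_u = a] = \sum_{x \in \mathbb{F}} \Pr[X_2 = x] \cdot \Pr[X_1 = a - u x] = \frac{1}{q} \sum_{x \in \mathbb{F}} \Pr[X_1 = a - u x] = \frac{1}{q},$$

since as $x$ ranges through $\mathbb{F}$, $a - u x$ also ranges through all of $\mathbb{F}$. (In other words, the map $x \mapsto a - u x$ is a bijection of $\mathbb{F}$ to itself.) So $Y_u$ is uniform.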

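Claim 10 The $Y_u$’s are pairwise independent.

*Proof:* We wish to show that, for any distinct RVs $Y_u$ and $Y_v$ and any values $a, b \in \mathbb{F}$, we have

$$\Pr[Y_u = a \wedge Y_v = b] = \Pr[Y_u = a] \cdot \Pr[Y_v = b] = \frac{1}{q^2}.$$

This event is equivalent to $X_1 + u \cdot X_2 = a$ and $X_1 + v \cdot X_2 = b$. We can also rewrite that as:

$$\begin{pmatrix} 1 & u \\ 1 & v \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix}.$$

The matrix has determinant $v - u \neq 0$, so it is invertible. This holds precisely when

$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} 1 & u \\ 1 & v \end{pmatrix}^{-1} \begin{pmatrix} a \\ b \end{pmatrix} = \frac{1}{v - u} \begin{pmatrix} v a - u b \\ b - a \end{pmatrix}.$$

Since $X_1$ and $X_2$ are independent and uniform over $\mathbb{F}$, this event holds with probability $1/q^2$.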
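The construction can be checked by brute force on a small field. The sketch below assumes the field is $\mathbb{Z}_p$ for a prime $p$ (so that field arithmetic is just arithmetic mod $p$) and enumerates all $p^2$ seeds; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def family(p: int, x1: int, x2: int) -> list[int]:
    """Return [Y_0, ..., Y_{p-1}] with Y_u = x1 + u * x2 (mod p)."""
    return [(x1 + u * x2) % p for u in range(p)]


def is_pairwise_independent(p: int) -> bool:
    """Exhaustively check that every pair (Y_u, Y_v), u != v, is uniform
    over all p^2 value pairs when the seed (x1, x2) is uniform.
    Assumes arithmetic mod p; this is a field only when p is prime."""
    seeds = list(product(range(p), repeat=2))
    target = Fraction(1, p * p)
    for u in range(p):
        for v in range(u + 1, p):
            counts = Counter()
            for x1, x2 in seeds:
                ys = family(p, x1, x2)
                counts[(ys[u], ys[v])] += 1
            if len(counts) != p * p:
                return False  # some value pair (a, b) never occurs
            if any(Fraction(c, len(seeds)) != target for c in counts.values()):
                return False
    return True


if __name__ == "__main__":
    print(is_pairwise_independent(5))  # prime modulus
    print(is_pairwise_independent(4))  # composite modulus: Z_4 is not a field
```

The composite case illustrates why the field assumption matters: the proof needs $v - u$ to be invertible.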

Corollary 11 Given $2m$ mutually independent, uniformly random bits, we can construct $2^m$ pairwise independent, uniformly random strings in $\{0,1\}^m$.

*Proof:* Apply the previous construction to the finite field $\mathbb{F}_{2^m}$, whose elements can be identified with strings in $\{0,1\}^m$. The $2m$ mutually independent random bits are used to construct $X_1$ and $X_2$. The $2^m$ random strings $Y_u$ are constructed as in (1).

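**3.2. Example: Max Cut with pairwise independent RVs**

Once again let’s consider the Max Cut problem. We are given a graph $G = (V, E)$ where $V = \{1,\ldots,n\}$. We will choose $\{0,1\}$-valued random variables $X_1,\ldots,X_n$. If $X_i = 1$ then we add vertex $i$ to $U$. The cut $\delta(U)$ is the set of edges with exactly one endpoint in $U$.

Our original algorithm chose $X_1,\ldots,X_n$ to be mutually independent and uniform. Instead we will pick $X_1,\ldots,X_n$ to be *pairwise independent* and uniform. Then

$$
\begin{aligned}
\mathrm{E}\big[\,|\delta(U)|\,\big]
&= \sum_{ij \in E} \Pr[X_i \neq X_j] \\
&= \sum_{ij \in E} \Big( \Pr[X_i = 1 \wedge X_j = 0] + \Pr[X_i = 0 \wedge X_j = 1] \Big) \\
&= \sum_{ij \in E} \Big( \tfrac{1}{4} + \tfrac{1}{4} \Big) = \frac{|E|}{2},
\end{aligned}
$$

where the third equality uses pairwise independence.

So the original algorithm works just as well if we make pairwise independent decisions instead of mutually independent decisions for placing vertices in $U$. The following theorem shows the advantage of making pairwise independent decisions.

Theorem 12 There is a deterministic, polynomial time algorithm to find a cut $\delta(U)$ with $|\delta(U)| \geq |E|/2$.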

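*Proof:* By Corollary 11, we only need $b = O(\log n)$ mutually independent, uniform random bits $R_1,\ldots,R_b$ in order to generate our pairwise independent, uniform random bits $X_1,\ldots,X_n$. (Take $m = \lceil \log_2 n \rceil$, so $b = 2m$, and keep one bit of each of the $2^m \geq n$ strings.) We have just argued that these pairwise independent $X_i$’s will give us

$$\mathrm{E}\big[\,|\delta(U)|\,\big] = \frac{|E|}{2}.$$

So there must *exist* some particular bits $r_1,\ldots,r_b$ such that fixing $R_i = r_i$ for all $i$, we get $|\delta(U)| \geq |E|/2$. We can deterministically find such bits by exhaustive search in $2^b = O(n^2)$ trials. This gives a deterministic, polynomial time algorithm.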
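The exhaustive search over seeds can be sketched in a few lines of Python. To keep the code short, this sketch does not use the finite-field construction of Corollary 11. It swaps in a different standard construction of pairwise independent bits: vertex $v$ gets the nonzero label $v+1$, and its bit is the parity of `label & seed` for a uniformly random $m$-bit seed. Distinct nonzero labels are linearly independent over $\mathbb{F}_2$, so any two of these bits are uniform and independent. Vertices are numbered from $0$ here, and the function names are hypothetical.

```python
def parity(x: int) -> int:
    """Parity (0 or 1) of the number of 1 bits in x."""
    return bin(x).count("1") & 1


def derandomized_max_cut(n: int, edges: list[tuple[int, int]]):
    """Find a cut with at least len(edges)/2 edges by trying every seed.

    Assumed construction (not the one in Corollary 11): vertex v gets the
    nonzero label v + 1, and its side is parity(label & seed). Over a uniform
    m-bit seed these bits are pairwise independent and uniform, so the
    average cut size over all seeds is exactly len(edges) / 2.
    """
    m = max(1, n.bit_length())           # ensures 2^m - 1 >= n nonzero labels
    best_side, best_cut = None, -1
    for seed in range(2 ** m):           # at most 2n seeds
        side = [parity((v + 1) & seed) for v in range(n)]
        cut = sum(1 for i, j in edges if side[i] != side[j])
        if cut > best_cut:
            best_side, best_cut = side, cut
    return best_side, best_cut


if __name__ == "__main__":
    k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    print(derandomized_max_cut(4, k4))
```

Because the average over seeds equals $|E|/2$, the best seed found by the loop must achieve at least that many cut edges.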

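**4. Chebyshev with pairwise independent RVs**

One of the main benefits of pairwise independent RVs is that Chebyshev’s inequality still works beautifully. Suppose that $X_1,\ldots,X_n$ are pairwise independent. For any $i \neq j$,

$$\mathrm{E}[X_i X_j] = \mathrm{E}[X_i]\,\mathrm{E}[X_j], \quad \text{i.e.} \quad \mathrm{Cov}[X_i, X_j] = 0,$$

by Claim 8. So

$$\mathrm{Var}\Big[\sum_{i=1}^n X_i\Big] = \sum_{i=1}^n \mathrm{Var}[X_i]$$

by Claim 5. So, for any $t > 0$,

$$\Pr\Big[\,\Big|\sum_{i=1}^n X_i - \mathrm{E}\Big[\sum_{i=1}^n X_i\Big]\Big| \geq t\,\Big] \leq \frac{\sum_{i=1}^n \mathrm{Var}[X_i]}{t^2}.$$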

This is exactly the same bound that we would get if the $X_i$’s were mutually independent.
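As a sanity check on the variance calculation, the following Python sketch computes, with exact rational arithmetic, the variance of the sum of the pairwise independent variables $Y_u = X_1 + u X_2 \pmod p$ and compares it with the sum of the individual variances. The choice of $\mathbb{Z}_p$ with $p$ prime and the function names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product


def variance(values: list[int]) -> Fraction:
    """Exact variance of a RV that is uniform over the list `values`
    (repeated entries count with multiplicity)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / n


def variance_check(p: int) -> tuple[Fraction, Fraction]:
    """Return (Var[sum_u Y_u], sum_u Var[Y_u]) for Y_u = x1 + u*x2 mod p,
    where the seed (x1, x2) is uniform over Z_p x Z_p (p assumed prime)."""
    seeds = list(product(range(p), repeat=2))
    sums = [sum((x1 + u * x2) % p for u in range(p)) for x1, x2 in seeds]
    var_of_sum = variance(sums)
    sum_of_vars = sum(
        variance([(x1 + u * x2) % p for x1, x2 in seeds]) for u in range(p)
    )
    return var_of_sum, sum_of_vars


if __name__ == "__main__":
    print(variance_check(5))
```

The two quantities agree even though the $Y_u$’s are far from mutually independent: all $p$ of them are determined by just two field elements.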
