**1. Graph Sparsifiers**

Let $G = (V, E)$ be an undirected graph. How well can $G$ be approximated by a sparse graph? Such questions have been studied for various notions of approximation. Today we will look at **approximating the cuts** of the graph. As before, the cut defined by $U \subseteq V$ is

$$\delta(U) = \{\, uv \in E : u \in U,\ v \notin U \,\}.$$

Let $w : E \to \mathbb{R}$ be a weight function on the edges. The **weight** of a set $F \subseteq E$ is defined to be

$$w(F) = \sum_{e \in F} w_e.$$

So the weight of the cut defined by $U$ is $w(\delta(U))$.

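A **graph sparsifier** is a non-negative weight function $w$ such that

- $w$ has only a small number of non-zero weights, and
- for every $U \subseteq V$,

$$(1-\epsilon)\,|\delta(U)| \;\le\; w(\delta(U)) \;\le\; (1+\epsilon)\,|\delta(U)|. \qquad (1)$$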

We can think of any edge with weight zero as being deleted. So the goal is to find a sparse, but weighted, subgraph of $G$ such that the weight of every cut is preserved up to a multiplicative factor of $1 \pm \epsilon$.
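For tiny graphs the definition can be checked directly by brute force. The following is a minimal sketch rather than part of the lecture: the helper names `cut`, `weight` and `is_sparsifier` are hypothetical, the check assumes the two-sided bound $(1-\epsilon)|\delta(U)| \le w(\delta(U)) \le (1+\epsilon)|\delta(U)|$ for every cut, and it enumerates all vertex subsets, so it takes exponential time.

```python
from itertools import combinations

def cut(edges, U):
    """Return delta(U): the edges with exactly one endpoint in U."""
    U = set(U)
    return [(u, v) for (u, v) in edges if (u in U) != (v in U)]

def weight(w, F):
    """Return w(F); edges missing from w count as weight zero (deleted)."""
    return sum(w.get(e, 0.0) for e in F)

def is_sparsifier(vertices, edges, w, eps):
    """Check the cut condition for every non-empty proper subset U."""
    vertices = list(vertices)
    for r in range(1, len(vertices)):
        for U in combinations(vertices, r):
            d = cut(edges, U)
            if not ((1 - eps) * len(d) <= weight(w, d) <= (1 + eps) * len(d)):
                return False
    return True

# Triangle on {0, 1, 2}: delete edge (0, 2) and reweight the two remaining edges.
triangle = [(0, 1), (1, 2), (0, 2)]
w = {(0, 1): 1.5, (1, 2): 1.5}
print(is_sparsifier(range(3), triangle, w, eps=0.5))   # True: cut weights 1.5, 3, 1.5 vs. size 2
print(is_sparsifier(range(3), triangle, w, eps=0.25))  # False: the cut around vertex 1 weighs 3
```

Every cut of the triangle has size 2, so the reweighted two-edge subgraph is a sparsifier only when $\epsilon$ is at least $1/2$.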

How could one find a sparsifier? A natural idea is to sample the edges independently with some probability $p$. That works well if $G$ is the complete graph because it essentially amounts to constructing an Erdős-Rényi random graph, which is well-studied.

Unfortunately this approach falls apart when $G$ is quite different from the complete graph. One such graph is the “dumbbell graph”, which consists of two disjoint cliques, each on $n$ vertices, and a single edge in the middle connecting the cliques. We would like to get rid of most edges in the cliques, but we would need to keep the edge in the middle. This example tells us that we should not sample all edges with the same probability $p$.
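The failure on the dumbbell graph is easy to observe in a simulation. The sketch below is illustrative only: the function names are hypothetical, and reweighting each kept edge by $1/p$ is an assumption made here so that weights are preserved in expectation. Whenever the middle edge is dropped, the cut separating the two cliques has weight $0$, so no multiplicative guarantee can hold.

```python
import random

def dumbbell(n):
    """Two disjoint cliques on n vertices each, joined by the single edge (n-1, n)."""
    left = [(u, v) for u in range(n) for v in range(u + 1, n)]
    right = [(u + n, v + n) for (u, v) in left]
    bridge = (n - 1, n)
    return left + right + [bridge], bridge

def uniform_sample(edges, p, rng):
    """Keep each edge independently with probability p, reweighted by 1/p."""
    return {e: 1.0 / p for e in edges if rng.random() < p}

def bridge_survival_rate(n, p, trials, seed=0):
    """Fraction of independent trials in which the middle edge is kept."""
    rng = random.Random(seed)
    edges, bridge = dumbbell(n)
    kept = sum(bridge in uniform_sample(edges, p, rng) for _ in range(trials))
    return kept / trials

rate = bridge_survival_rate(n=10, p=0.1, trials=2000)
print(f"middle edge kept in {rate:.1%} of trials")
```

With $p = 0.1$ the middle edge survives in only about one trial in ten, which is exactly the problem: a sampling probability low enough to thin out the cliques is far too low for the one edge that matters.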

So now the question is: for each edge, how “important” is it? Should we sample it with low probability or high probability? The notion of **edge-connectivity**, which we defined in the previous lecture, seems quite useful. Recall that for an edge $e \in E$ we let $k_e$ be the minimum size of a cut containing $e$, i.e.,

$$k_e = \min \{\, |\delta(U)| : U \subseteq V,\ e \in \delta(U) \,\}.$$

By the Max-Flow Min-Cut theorem, $k_e$ equals the maximum amount of flow that can be sent between the endpoints of the edge $e$. So $k_e$ can be efficiently computed. Edges with high connectivity only appear in cuts with many other edges, so intuitively they are not terribly important. In the dumbbell example with cliques on $n$ vertices, the clique edges have connectivity $n-1$, whereas the single edge in the middle has connectivity $1$. So the connectivity values seem to do a good job at identifying important edges.
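On a small dumbbell graph the connectivities can be confirmed by enumerating every cut. This is a brute-force sketch for illustration (the function name `edge_connectivities` is hypothetical); an efficient implementation would instead run a max-flow computation between the endpoints of each edge.

```python
from itertools import combinations

def edge_connectivities(vertices, edges):
    """k_e = min |delta(U)| over all cuts delta(U) containing e (exponential time)."""
    vertices = list(vertices)
    k = {e: len(edges) for e in edges}   # no cut has more than |E| edges
    for r in range(1, len(vertices)):
        for U in map(set, combinations(vertices, r)):
            d = [(u, v) for (u, v) in edges if (u in U) != (v in U)]
            for e in d:
                k[e] = min(k[e], len(d))
    return k

# Dumbbell with two cliques on m = 4 vertices each and the middle edge (3, 4).
m = 4
left = [(u, v) for u in range(m) for v in range(u + 1, m)]
right = [(u + m, v + m) for (u, v) in left]
bridge = (m - 1, m)
k = edge_connectivities(range(2 * m), left + right + [bridge])
print(k[bridge], sorted(set(k[e] for e in left + right)))   # prints: 1 [3]
```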

**1.1. The sampling process**

Consider the following sampling process, where $\rho$ is a parameter that determines the number of “rounds” of sampling.

- Initially $w = 0$.
- Compute all edge connectivities $k_e$.
- For $i = 1, \ldots, \rho$
  - For each $e \in E$
    - With probability $1/k_e$, increase $w_e$ by $k_e / \rho$.
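The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the connectivities are given as a dictionary `k`, an edge is hit in each round with probability $1/k_e$, and each hit adds $k_e/\rho$ to its weight (the choice that keeps every weight equal to $1$ in expectation). The function name `sample_weights` and the example connectivities are hypothetical.

```python
import random

def sample_weights(k, rho, rng):
    """Run rho rounds; in each round edge e is hit with probability 1/k[e]
    and its weight then grows by k[e]/rho."""
    w = {e: 0.0 for e in k}
    for _ in range(rho):
        for e, ke in k.items():
            if rng.random() < 1.0 / ke:
                w[e] += ke / rho
    return w

# Hypothetical connectivities: one "important" edge and two well-connected ones.
k = {("a", "b"): 1, ("b", "c"): 5, ("c", "d"): 5}
w = sample_weights(k, rho=20, rng=random.Random(1))
print(w)
```

An edge with $k_e = 1$ is hit in every round and so always ends with weight $1$, while an edge with large $k_e$ is hit rarely but gains a large increment each time.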

One great feature of this sampling process is that all edge weights are preserved in expectation.

**Claim 1** For every edge $e$ we have $\mathrm{E}[w_e] = 1$.

*Proof:* The expected increase in $w_e$ in the $i$th iteration is $(1/k_e) \cdot (k_e/\rho) = 1/\rho$. By linearity of expectation, the expected increase over all $\rho$ iterations is $1$.
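The claim can be double-checked with exact rational arithmetic. The sketch below assumes the per-round sampling probability is $1/k_e$ and the increment is $k_e/\rho$, so that the number of hits is Binomial$(\rho, 1/k_e)$; the function name `weight_moments` is hypothetical.

```python
from fractions import Fraction
from math import comb

def weight_moments(k_e, rho):
    """Exact mean and variance of w_e when the number of hits is
    Binomial(rho, 1/k_e) and each hit adds k_e/rho."""
    p = Fraction(1, k_e)
    step = Fraction(k_e, rho)
    mean = second = Fraction(0)
    for t in range(rho + 1):
        prob = comb(rho, t) * p**t * (1 - p) ** (rho - t)
        mean += prob * (t * step)
        second += prob * (t * step) ** 2
    return mean, second - mean**2

for k_e in (1, 3, 10):
    mean, var = weight_moments(k_e, rho=8)
    print(k_e, mean, var)
```

The mean is always $1$, while the variance works out to $(k_e - 1)/\rho$. Highly connected edges are therefore noisy individually, which is why a concentration argument over whole cuts is still needed.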

Moreover, the weight of any set of edges is also preserved in expectation.

**Corollary 1** For any $F \subseteq E$, we have $\mathrm{E}[w(F)] = |F|$.

*Proof:* By linearity of expectation.

In particular, $\mathrm{E}[w(\delta(U))] = |\delta(U)|$ for every $U \subseteq V$. Unfortunately preserving cuts in expectation is not good enough. We would like to say that, with high probability, *every* cut’s weight is close to its expectation. This is a statement about concentration, so the Chernoff bound seems like a natural tool to try.

**1.2. An analysis that doesn’t work**

Consider some cut $\delta(U)$. The weight of that cut after sampling is the random variable

$$w(\delta(U)) = \sum_{i=1}^{\rho} \sum_{e \in \delta(U)} \frac{k_e}{\rho} \, X_{i,e},$$

where $X_{i,e}$ is a Bernoulli random variable indicating whether edge $e$ was sampled during the $i$th round of sampling. The Chernoff bound is designed for analyzing sums of independent Bernoullis, so it seems that we are in great shape.

But there is a problem: the coefficients $k_e/\rho$. The Chernoff bound works for any sum of independent random variables taking values in $[0,1]$. Unfortunately we have these coefficients that can be quite large (e.g., $k_e$ can be as large as $n-1$), so a straightforward application of Chernoff will not work.

Actually it is not really a problem that the coefficients are *big*, but it is a problem that they could be *wildly different*. Consider the example:

where the $X_i$'s are independent Bernoulli random variables. Even though there are these large coefficients, we can still analyze with the Chernoff bound because it is simply times the random variable , which has no coefficients. On the other hand, consider the example:

where are Bernoulli random variables that are with probability . Then , but . The Chernoff bound cannot directly give useful tail bounds on because it is designed to show that the probability of being times larger than the expectation is *exponentially decreasing* in , and that is simply not true for .
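To see numerically why wildly different coefficients are a problem, here is one hypothetical example of this kind, chosen for illustration and not necessarily the one used in the lecture: $Z = n X_0 + X_1 + \cdots + X_n$, where the $X_i$ are independent Bernoulli random variables that are $1$ with probability $1/n$. The sketch computes $\mathrm{E}[Z]$ and $\Pr[Z \ge n]$ exactly and compares the tail with an exponentially small quantity.

```python
from fractions import Fraction
from math import exp

def heavy_coefficient_example(n):
    """Z = n*X_0 + X_1 + ... + X_n with independent X_i ~ Bernoulli(1/n)."""
    p = Fraction(1, n)
    mean = n * p + n * p            # E[Z] = 1 + 1 = 2
    # Z >= n happens if X_0 = 1, or if X_0 = 0 and all of X_1..X_n equal 1.
    tail = p + (1 - p) * p**n
    return mean, tail

for n in (10, 100, 1000):
    mean, tail = heavy_coefficient_example(n)
    alpha = float(n / mean)         # the event Z >= n is Z >= alpha * E[Z]
    print(n, float(tail), exp(-alpha))
```

Here $Z$ exceeds $\alpha = n/2$ times its expectation with probability about $1/n$, which decays only polynomially in $\alpha$, so no tail bound that is exponentially small in $\alpha$ can hold for $Z$.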

**1.3. Analysis trick: group edges by connectivity**

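To handle the different weights, we will partition the edges into groups whose connectivities are roughly the same. Formally, we set $E = E_1 \cup E_2 \cup \cdots \cup E_{\log n}$ (logarithms are base 2, and we ignore rounding) where

$$E_i = \{\, e \in E : 2^{i-1} \le k_e < 2^i \,\}.$$

Now instead of proving concentration for $w(\delta(U))$, we will prove concentration for $w(\delta(U) \cap E_i)$, because all edges in $E_i$ have nearly the same coefficients $k_e/\rho$.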
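The grouping is simple to compute. The sketch below assumes the groups are the dyadic ranges $2^{i-1} \le k_e < 2^i$ and that the connectivities are positive integers; under that assumption the group index of an edge is exactly the bit length of $k_e$. The function name `group_by_connectivity` and the example edge labels are hypothetical.

```python
from collections import defaultdict

def group_by_connectivity(k):
    """Map i -> edges e with 2**(i-1) <= k[e] < 2**i (k[e] a positive int)."""
    groups = defaultdict(list)
    for e, ke in k.items():
        groups[ke.bit_length()].append(e)
    return dict(groups)

k = {"e1": 1, "e2": 2, "e3": 3, "e4": 4, "e5": 7, "e6": 8}
print(group_by_connectivity(k))
# {1: ['e1'], 2: ['e2', 'e3'], 3: ['e4', 'e5'], 4: ['e6']}
```

Within a group the connectivities differ by less than a factor of $2$, so the coefficients of the corresponding terms are comparable.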

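So let $F \subseteq E_i$ be some set of the form $\delta(U) \cap E_i$. We will call such a set a **cut-induced set**. Note that there could be some other cut $\delta(U')$ such that $F = \delta(U') \cap E_i$. We’re going to focus on the *smallest* such cut, because we want the sampling error of $F$ to be small relative to $|\delta(U)|$, and that is hardest when $|\delta(U)|$ is small. So define

$$q(F) = \min \{\, |\delta(U)| : U \subseteq V,\ \delta(U) \cap E_i = F \,\}.$$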

This definition is a bit hard to understand, so we illustrate it with the following example. The set is a cut-induced subset of because . But we also have . The smallest cut that induces is , so we have .
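A brute-force computation makes the definition concrete. The sketch below is illustrative only: it uses a small dumbbell graph (two cliques on 4 vertices joined by the edge $(3,4)$), takes the clique edges as a single group since they all have connectivity $3$, and records for each cut-induced set the size of the smallest cut inducing it. The function name `smallest_inducing_cuts` is hypothetical, and only non-empty proper vertex sets are enumerated.

```python
from itertools import combinations

def smallest_inducing_cuts(vertices, edges, group):
    """Map each cut-induced set F (delta(U) intersected with group) to the size
    of the smallest cut delta(U) inducing it. Brute force over all U."""
    vertices, group = list(vertices), set(group)
    q = {}
    for r in range(1, len(vertices)):
        for U in map(set, combinations(vertices, r)):
            d = [(u, v) for (u, v) in edges if (u in U) != (v in U)]
            F = frozenset(d) & group
            q[F] = min(q.get(F, len(d)), len(d))
    return q

m = 4
left = [(u, v) for u in range(m) for v in range(u + 1, m)]
right = [(u + m, v + m) for (u, v) in left]
bridge = (m - 1, m)
clique_edges = left + right          # all have connectivity 3, so one group
q = smallest_inducing_cuts(range(2 * m), clique_edges + [bridge], clique_edges)

F = frozenset({(0, 3), (1, 3), (2, 3)})   # induced by U = {3}, where |delta(U)| = 4
print(q[F])                               # prints: 3
```

The cut defined by the single vertex $3$ has four edges because it also contains the middle edge, but the cut defined by $\{3, 4, 5, 6, 7\}$ induces the same set of clique edges with only three edges, so the smallest inducing cut has size $3$.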

We can use Chernoff bounds to prove the following concentration bound for $w(F)$.

The proof is just a calculation, so we defer it to Section 2.

**1.4. The Main Theorem**

Our main theorem is:

**Theorem 2** Let $G = (V, E)$ be a graph with $n = |V|$. Then with probability at least , our sampling process will produce weights $w$ satisfying (1) and with only non-zero entries.

By a slightly more careful analysis one can improve the to . Instead of edge connectivities, if we use a slightly different quantity to determine the importance of an edge, the can be improved to . And by a completely different technique, the can be removed entirely!

To prove our theorem we need the following result which we stated last time, and which follows from a variant of the contraction algorithm.

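**Theorem 3** Let $G = (V, E)$ be a graph. Let $B \subseteq E$ be arbitrary and let $K \le \min \{\, k_e : e \in B \,\}$. Then, for every real $\alpha \ge 1$,

$$\big|\{\, \delta(U) \cap B : U \subseteq V,\ |\delta(U)| \le \alpha K \,\}\big| < n^{2\alpha}.$$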

We also need the following fact:

**Fact 1** $\sum_{e \in E} 1/k_e \le n - 1$.

*Proof:* (of Theorem 2). We will set .

The number of non-zeros is easy to analyze. Let $X_{i,e}$ be the indicator random variable that is $1$ if edge $e$ is sampled in round $i$, so $\mathrm{E}[X_{i,e}] = 1/k_e$. The number of non-zero weights in $w$ is at most $\sum_{i=1}^{\rho} \sum_{e \in E} X_{i,e}$. So the expected number of non-zero weights is at most $\rho \sum_{e \in E} 1/k_e \le \rho\,(n-1)$, by Fact 1. By Markov’s inequality, the number of non-zero weights is at most four times that with probability at least $3/4$.
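As a sanity check of this count, the sketch below evaluates $\rho \sum_e 1/k_e$ on a dumbbell graph and compares it with $\rho\,(n-1)$. It assumes that each edge is sampled with probability $1/k_e$ per round and that Fact 1 is the bound $\sum_e 1/k_e \le n-1$; the function name, the edge labels and the values of `m` and `rho` are hypothetical.

```python
from fractions import Fraction

def expected_samples_bound(k, rho):
    """rho * sum_e 1/k_e: the expected number of (round, edge) pairs sampled,
    which upper-bounds the expected number of non-zero weights."""
    return rho * sum(Fraction(1, ke) for ke in k.values())

# Dumbbell with cliques on m vertices: clique edges have k_e = m - 1,
# and the middle edge has k_e = 1.
m, rho = 50, 10
k = {("clique", j): m - 1 for j in range(m * (m - 1))}   # 2 * C(m, 2) clique edges
k["bridge"] = 1
n = 2 * m
bound = expected_samples_bound(k, rho)
print(len(k), bound, rho * (n - 1))   # prints: 2451 510 990
```

Although the graph has 2451 edges, the expected number of sampled pairs is only 510 for these parameters, comfortably below $\rho\,(n-1) = 990$.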

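Now we must show that the cuts are concentrated. We will show that, with high probability, every cut-induced set $F$ satisfies

$$\big|\, w(F) - \mathrm{E}[w(F)] \,\big| \;\le\; \frac{\epsilon \, q(F)}{\log n}. \qquad (2)$$

If this is true, then for every cut $\delta(U)$ we have

$$\big|\, w(\delta(U)) - |\delta(U)| \,\big| \;\le\; \sum_{i=1}^{\log n} \big|\, w(\delta(U) \cap E_i) - \mathrm{E}[w(\delta(U) \cap E_i)] \,\big| \;\le\; \sum_{i=1}^{\log n} \frac{\epsilon \, q(\delta(U) \cap E_i)}{\log n} \;\le\; \epsilon \, |\delta(U)|,$$

where the first inequality comes from the triangle inequality (using $\mathrm{E}[w(\delta(U))] = |\delta(U)|$), the second comes from (2), and the third follows since $q(\delta(U) \cap E_i)$ is the size of the smallest cut that induces $\delta(U) \cap E_i$, which is at most $|\delta(U)|$. This proves our desired inequality (1).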

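So it remains to show (2). Fix any $i$. Let $F^1, F^2, \ldots$ be all the cut-induced subsets of $E_i$, ordered such that $q(F^1) \le q(F^2) \le \cdots$. Define

$$p_j = \Pr\left[\, \big| w(F^j) - \mathrm{E}[w(F^j)] \big| > \frac{\epsilon \, q(F^j)}{\log n} \,\right],$$

the probability that $F^j$ violates (2), which is bounded by Claim 2. We will use a union bound to show that all of the $F^j$'s are concentrated. The trouble is that there can be exponentially many of them, so the argument needs to be a bit clever.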

Consider the first $n^2$ cut-induced sets $F^1, \ldots, F^{n^2}$. Since every edge $e \in F^j$ belongs to $E_i$ we have $k_e \ge 2^{i-1}$. This means that every cut containing that edge has size at least $2^{i-1}$, and therefore $q(F^j) \ge 2^{i-1}$. Plugging this into (3), we get

So .

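Now we consider the remaining cut-induced sets $F^j$ for $j > n^2$. We now apply Theorem 3 with $B = E_i$, which means we can take $K = 2^{i-1}$. The theorem states that, for any $\alpha \ge 1$,

$$\big|\{\, \delta(U) \cap E_i : U \subseteq V,\ |\delta(U)| \le \alpha \, 2^{i-1} \,\}\big| < n^{2\alpha}.$$

So for any $\alpha \ge 1$, the number of $F^j$ with $q(F^j) \le \alpha \, 2^{i-1}$ is less than $n^{2\alpha}$. By our ordering of the $F^j$'s, we must have $q(F^{n^{2\alpha}}) > \alpha \, 2^{i-1}$. Substituting $\alpha = \ln j / (2 \ln n)$, so that $j = n^{2\alpha}$, we get $q(F^j) > 2^{i-1} \ln j / (2 \ln n)$. Plugging into (3) we get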

Thus, summing over all $j > n^2$,

So, by a union bound, the probability that any cut-induced subset of $E_i$ violates (2) is at most

That analysis is for a particular $i$. Applying a union bound over all $i$, the total failure probability is at most .