1. Example: Balls and Bins
Last time we proved the Chernoff bound, the most useful analysis tool we will encounter in this course. Today we illustrate the power of the Chernoff bound by using it to analyze a fundamental balls-and-bins problem. But first we introduce the other (almost trivial) tool which is often used in conjunction with the Chernoff bound: the union bound.
Lemma 1 (Union Bound) Let $\mathcal{E}_1, \dots, \mathcal{E}_m$ be any events. Then $\Pr[\bigcup_{i=1}^m \mathcal{E}_i] \le \sum_{i=1}^m \Pr[\mathcal{E}_i]$.
Proof: Mitzenmacher and Upfal, Lemma 1.2.
Many interesting problems can be modeled using simple problems involving balls and bins. Today we are interested in the bin which has the most balls. Suppose there are $n$ bins. We repeat the following experiment $n$ times: throw a ball into a uniformly chosen bin. (The experiments are mutually independent.) Let $B_i$ be the number of balls in bin $i$. What is $\max_i B_i$?
Theorem 2 Let $\alpha = \frac{4 \ln n}{\ln \ln n}$. Then $\max_i B_i \le \alpha$ with probability tending to $1$ as $n \rightarrow \infty$.
This theorem is optimal up to constant factors. It is known that $\max_i B_i \ge \frac{\ln n}{\ln \ln n}$ with probability at least $1 - 1/n$. (See, e.g., Lemma 5.12 in Mitzenmacher-Upfal.)
Proof: Let us focus on the first bin. Let $X_1, \dots, X_n$ be indicator random variables where $X_j$ is $1$ if the $j$th ball lands in the first bin. Obviously $B_1 = \sum_{j=1}^n X_j$, and $\mathrm{E}[B_1] = 1$. What is the probability that this bin has more than $\alpha$ balls?
We will analyze this using the Chernoff bound. This is possible since $B_1$ is a sum of independent random variables, each of which takes values in $[0,1]$. Recall that the Chernoff bound says

$$\Pr[X \ge (1+\delta)\mu] \;\le\; \exp\big(-\mu \,((1+\delta)\ln(1+\delta) - \delta)\big) \qquad \forall \delta \ge 0,$$

where $X$ is the sum and $\mu = \mathrm{E}[X]$.
We apply this to $X = B_1$, setting $1+\delta = \alpha$ and using $\mu = \mathrm{E}[B_1] = 1$. We obtain

$$\Pr[B_1 \ge \alpha] \;\le\; \exp\big(-(\alpha \ln \alpha - \alpha + 1)\big).$$
Using our usual techniques from the Notes on Convexity Inequalities, one can show that

$$\alpha \ln \alpha - \alpha + 1 \;\ge\; 2 \ln n \qquad \text{for all sufficiently large } n.$$
Plugging that in,

$$\Pr[B_1 \ge \alpha] \;\le\; \exp(-2 \ln n) \;=\; n^{-2}.$$
So, let $\mathcal{E}_i$ be the event that $B_i \ge \alpha$; by symmetry, $\Pr[\mathcal{E}_i] \le n^{-2}$ for every $i$. By Lemma 1,

$$\Pr\Big[\bigcup_{i=1}^{n} \mathcal{E}_i\Big] \;\le\; \sum_{i=1}^{n} \Pr[\mathcal{E}_i] \;\le\; n \cdot n^{-2} \;=\; 1/n.$$
Thus, with probability at least $1 - 1/n$, all bins have fewer than $\alpha = \frac{4 \ln n}{\ln \ln n}$ balls.
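The theorem is easy to corroborate empirically. Here is a quick Monte Carlo sketch; the constant $4$ in the bound $\frac{4 \ln n}{\ln \ln n}$ is one concrete choice consistent with the calculation above, not a tight constant.

```python
import math
import random

def max_load(n, seed=0):
    """Throw n balls into n uniformly random bins; return the maximum load."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(n):
        bins[rng.randrange(n)] += 1
    return max(bins)

n = 10_000
alpha = 4 * math.log(n) / math.log(math.log(n))  # the bound from Theorem 2
loads = [max_load(n, seed=s) for s in range(20)]
print(max(loads), alpha)  # observed max loads sit well below the bound
```

In practice the maximum load concentrates near $\ln n / \ln \ln n$ itself, so the bound above is generous but of the right order.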
2. Congestion Minimization
One of the classically important areas in algorithm design and combinatorial optimization is network flows. A central problem in that area is the maximum flow problem. We now look at a generalization of this problem.
An instance of the problem consists of a directed graph $G = (V, A)$ and a sequence $(s_1, t_1), \dots, (s_k, t_k)$ of pairs of vertices. Let $n = |V|$. (It is not crucial that the graph be directed; the problem is equally interesting in undirected graphs. However in network flow problems it is often more convenient to look at directed graphs. Feel free to think about whichever variant you find easier.)
A natural question to ask is: do there exist paths $P_1, \dots, P_k$, where $P_i$ is a path from $s_i$ to $t_i$, such that these paths share no arcs? This is called the edge-disjoint paths problem. Quite remarkably, it is NP-hard even in the case $k = 2$, assuming the graph is directed. For undirected graphs, it is polynomial-time solvable if $k$ is a fixed constant, but NP-hard if $k$ is a function of the input size.
We will look at a variant of this problem called the congestion minimization problem. The idea is to allow each arc to be used in multiple paths, but not too many. The number of paths using a given arc is the “congestion” of that arc. We say that a solution has congestion $C$ if it is a collection of paths $P_1, \dots, P_k$, where $P_i$ is a path from $s_i$ to $t_i$ and each arc is contained in at most $C$ of the paths. The problem is to find the minimum value of $C$ such that there is a solution of congestion $C$. This problem is still NP-hard, since determining whether there is a solution with $C = 1$ is the edge-disjoint paths problem.
We will look at the congestion minimization problem from the point of view of approximation algorithms. Let $\mathrm{OPT}$ be the minimum congestion of any solution. We would like to give an algorithm which can produce a solution with congestion at most $\alpha \cdot \mathrm{OPT}$ for some factor $\alpha$. This factor $\alpha$ is called the approximation factor of the algorithm.
Theorem 3 There is an algorithm for the congestion minimization problem with approximation factor $O(\log n / \log \log n)$.
To design such an algorithm we will use linear programming. We write down an integer program (IP) which captures the problem exactly, relax that to a linear program (LP), then design a method for “rounding” solutions of the LP into solutions for the IP.
The Integer Program. Writing an IP formulation of an optimization problem is usually quite simple. That is indeed true for the congestion minimization problem. However, we will use an IP which you might find rather odd: our IP will have exponentially many variables. This will simplify our explanation of the rounding.
Let $\mathcal{P}_i$ be the set of all paths in $G$ from $s_i$ to $t_i$. (Note that $|\mathcal{P}_i|$ may be exponential in $n$.) For every path $P \in \bigcup_i \mathcal{P}_i$, we create a variable $x_P$. This variable will take values only in $\{0, 1\}$, and setting it to $1$ corresponds to including the path $P$ in our solution.
The integer program is as follows.

$$\begin{array}{lll}
\min & C & \\
\text{s.t.} & \sum_{P \in \mathcal{P}_i} x_P = 1 & \forall i = 1, \dots, k \\
& \sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i \,:\, a \in P} x_P \le C & \forall a \in A \\
& x_P \in \{0, 1\} & \forall P
\end{array}$$
The last constraint says that we must decide for every path whether or not to include it in the solution. The first constraint says that the solution must choose exactly one path between each pair $s_i$ and $t_i$. The second constraint ensures that the number of paths using each arc is at most $C$. The optimization objective is to find the minimum value of $C$ such that a feasible solution exists.
Every solution to the IP corresponds to a solution for the congestion minimization problem with congestion $C$, and vice-versa. Thus the optimum value of the IP is $\mathrm{OPT}$, which we previously defined to be the minimum congestion of any solution to the original problem.
The Linear Program. The integer program is NP-hard to solve, so we “relax” it into a linear program. This amounts to replacing the integrality constraints $x_P \in \{0,1\}$ with non-negativity constraints $x_P \ge 0$. The resulting linear program is:

$$\begin{array}{lll}
\min & C & \\
\text{s.t.} & \sum_{P \in \mathcal{P}_i} x_P = 1 & \forall i = 1, \dots, k \\
& \sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i \,:\, a \in P} x_P \le C & \forall a \in A \\
& x_P \ge 0 & \forall P
\end{array}$$
This LP can be solved in time polynomial in the size of $G$, even though its number of variables is exponential in the size of $G$. This can be done either by the ellipsoid method or by finding a “compact formulation” of the LP which uses fewer variables (much like the usual LP that you have probably seen for the ordinary maximum flow problem).
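Without going into those details, here is a sketch of what such a compact formulation can look like. The per-arc flow variables $f_i(a)$ (flow of commodity $i$ on arc $a$) and the notation $\delta^+(v), \delta^-(v)$ for arcs leaving and entering $v$ are illustrative assumptions, not notation from these notes; a standard path decomposition of the arc flows recovers fractional path variables $x_P$.

```latex
% Compact arc-based relaxation (sketch): polynomially many variables f_i(a).
\begin{array}{lll}
\min & C & \\
\text{s.t.}
 & \sum_{a \in \delta^+(v)} f_i(a) - \sum_{a \in \delta^-(v)} f_i(a) =
   \begin{cases} 1 & v = s_i \\ -1 & v = t_i \\ 0 & \text{otherwise} \end{cases}
   & \forall i,\ \forall v \in V \\
 & \sum_{i=1}^{k} f_i(a) \le C & \forall a \in A \\
 & f_i(a) \ge 0 & \forall i,\ \forall a \in A
\end{array}
```

This has $O(kn^2)$ variables and constraints, so it can be solved directly by any polynomial-time LP algorithm.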
So, without going into details, our algorithm will solve this LP and obtain an optimal solution in which the number of non-zero variables is only polynomial in the size of $G$. Let $C^*$ be the optimum value of the LP.

Claim 1 $C^* \le \mathrm{OPT}$.
Proof: The LP was obtained from the IP by relaxing constraints. Therefore any feasible solution for the IP is also feasible for the LP. In particular, the optimal solution for the IP is feasible for the LP. So the LP has a feasible solution with objective value equal to $\mathrm{OPT}$, and hence $C^* \le \mathrm{OPT}$.
The Rounding. Our algorithm will solve the LP and most likely obtain a “fractional” solution — a solution with some non-integral variables, which is therefore not feasible for the IP. The next step of the algorithm is to “round” that fractional solution into a solution which is feasible for the IP. In doing so, the congestion might increase, but we will ensure that it does not increase too much.
The technique we will use is called randomized rounding. For each $i$, we randomly choose exactly one path $P_i^* \in \mathcal{P}_i$ by setting $P_i^* = P$ with probability $x_P$. (The LP ensures that these are indeed probabilities: the values $\{\, x_P : P \in \mathcal{P}_i \,\}$ are non-negative and sum up to 1.) The algorithm outputs the chosen paths $P_1^*, \dots, P_k^*$.
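The rounding step can be sketched in a few lines. The fractional solution below is a hypothetical toy example (the vertex names and values are illustrative, not data from the notes); the only structural requirement is that each commodity's path weights sum to 1.

```python
import random

def randomized_rounding(fractional, seed=0):
    """For each commodity i, pick exactly one path P with probability x_P.

    `fractional` maps each commodity to a dict {path: x_P} whose values
    are non-negative and sum to 1 (the LP's first constraint)."""
    rng = random.Random(seed)
    chosen = {}
    for i, x in fractional.items():
        paths, probs = zip(*x.items())
        chosen[i] = rng.choices(paths, weights=probs, k=1)[0]
    return chosen

# Toy fractional solution for two commodities (hypothetical data).
fractional = {
    1: {("s1", "u", "t1"): 0.5, ("s1", "v", "t1"): 0.5},
    2: {("s2", "u", "t2"): 1.0},
}
paths = randomized_rounding(fractional)
print(paths)
```

Note that the rounding never looks at the graph again: all the structure it needs is already encoded in the non-zero path variables, which is why an LP solution with polynomially many non-zero variables suffices.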
Analysis. All that remains is to analyze the congestion of these paths. For each arc $a$ and each $i$, let $Y_i^a$ be the indicator random variable that is $1$ if $a \in P_i^*$ and $0$ otherwise. Let $X_a = \sum_{i=1}^k Y_i^a$ be the congestion on arc $a$. The expected value of $X_a$ is easy to analyze:

$$\mathrm{E}[X_a] \;=\; \sum_{i=1}^{k} \Pr[a \in P_i^*] \;=\; \sum_{i=1}^{k} \sum_{P \in \mathcal{P}_i \,:\, a \in P} x_P \;\le\; C^*,$$

where the inequality comes from the LP's second constraint. (Recall we assume that the fractional solution is optimal for the LP, and therefore satisfies that constraint with $C = C^*$.)
The Chernoff bound says that, if $X$ is a sum of independent random variables, each of which takes values in $[0,1]$, and $\mu$ is an upper bound on $\mathrm{E}[X]$, then

$$\Pr[X \ge (1+\delta)\mu] \;\le\; \exp\big(-\mu\,((1+\delta)\ln(1+\delta) - \delta)\big) \qquad \forall \delta \ge 0.$$
We apply this to $X_a$, taking $\mu = \max\{C^*, 1\}$ and $1+\delta = \frac{6 \ln n}{\ln \ln n}$. Following the argument in Section 1, $\Pr[X_a \ge (1+\delta)\mu] \le n^{-3}$ for all sufficiently large $n$.
We now use a union bound to analyze the probability of any arc having congestion greater than $(1+\delta)\mu$.

$$\Pr\big[\exists a \in A \,:\, X_a \ge (1+\delta)\mu\big] \;\le\; \sum_{a \in A} \Pr[X_a \ge (1+\delta)\mu] \;\le\; n^2 \cdot n^{-3} \;=\; 1/n,$$

since the graph has at most $n^2$ arcs. So, with probability at least $1 - 1/n$, the algorithm produces a solution for which every arc has congestion at most $\frac{6 \ln n}{\ln \ln n} \cdot \max\{C^*, 1\}$, which is at most $\frac{6 \ln n}{\ln \ln n} \cdot \mathrm{OPT}$ by Claim 1 (together with the trivial fact that $\mathrm{OPT} \ge 1$). So our algorithm has approximation factor $O(\log n / \log \log n)$.
Further Remarks. The rounding algorithm that we presented is actually optimal: there are graphs for which $\mathrm{OPT} = \Omega\big(\frac{\log n}{\log \log n}\big) \cdot C^*$. Consequently, every rounding algorithm which converts a fractional solution of the LP to an integral solution of the IP must necessarily incur an increase of $\Omega\big(\frac{\log n}{\log \log n}\big)$ in the congestion.
That statement does not rule out the possibility that there is a better algorithm which behaves completely differently (i.e., one which does not use the IP or LP at all). But sadly it turns out that there is no better algorithm (for the case of directed graphs). It is known that every efficient algorithm must have approximation factor $\Omega\big(\frac{\log n}{\log \log n}\big)$, assuming a reasonable complexity-theoretic conjecture ($\mathrm{NP} \not\subseteq \mathrm{ZPTIME}(n^{\mathrm{polylog}(n)})$). So the algorithm that we presented is optimal, up to constant factors.
3. The Negative Binomial Distribution
The negative binomial distribution is perhaps the probability distribution with the worst public relations. It shows up in many different randomized algorithms, but it is not taught or covered in textbooks as much as it should be.
There are a few ways to define this distribution. We adopt the following definition. There are two parameters, $k$ and $p$. Suppose we perform a sequence of independent Bernoulli trials, each succeeding with probability $p$. Let $Y$ be the number of trials performed until we see the $k$th success. Then $Y$ is said to have the negative binomial distribution with parameters $k$ and $p$.
Note that this is quite different from the usual binomial distribution. For example, if $X$ is a binomial random variable with parameters $n$ and $p$, then the value of $X$ is always at most $n$. In contrast, $Y$ has positive probability of taking any integer value greater than or equal to $k$. Nevertheless, there is a relationship between the tails of $X$ and $Y$. The following claim is quite useful, although often not stated explicitly in the literature.
Claim 2 Let $Y$ be a random variable distributed according to the negative binomial distribution with parameters $k$ and $p$. Let $X$ be a random variable distributed according to the binomial distribution with parameters $n$ and $p$. Then $\Pr[Y > n] = \Pr[X < k]$.
Informally, this is quite easy to see. The event $\{Y > n\}$ is the event that, after performing $n$ trials, we still have not seen $k$ successes. And that is also the event $\{X < k\}$. That argument is not completely formal because the sample spaces of $X$ and $Y$ are not the same. The following argument explains the connection in more detail.
Proof: Let $q = 1 - p$. We need the following facts about the sample space of the negative binomial distribution.
- Probability of an elementary event. The sample space underlying the negative binomial distribution consists of all finite sequences of successes and failures with exactly $k$ successes and any number of failures, where the last outcome must be a success. For any such sequence with $f$ failures, its probability in the sample space is $p^k q^f$. One can check that these probabilities sum up to $1$ using properties of binomial coefficients generalized to negative numbers.
- Probability of seeing a prefix. For any sequence $\sigma$ of $n$ outcomes with $j < k$ successes and $n - j$ failures, the probability that $\sigma$ gives the outcomes of the first $n$ trials is $p^j q^{n-j}$. The proof of this is very similar to the proof of the previous property.
The event $\{Y > n\}$ consists of all sequences with $k$ successes and more than $n - k$ failures, where the last outcome is a success. To compute the total probability of this event, we can partition these sequences into groups where the members of each group all have the same prefix of length $n$ (i.e., the same outcomes in the first $n$ trials). For any group with $j$ successes in the first $n$ trials, the total probability of that group is $p^j q^{n-j}$, by the second property given above. Since there are $\binom{n}{j}$ ways to choose the locations of the $j$ successes in the $n$ trials, we have

$$\Pr[Y > n] \;=\; \sum_{j=0}^{k-1} \binom{n}{j} p^j q^{n-j} \;=\; \Pr[X < k].$$
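The identity in Claim 2 can be checked numerically by summing both sides exactly; a small sketch (the parameter values $k = 5$, $p = 0.3$, $n = 30$ are arbitrary):

```python
from math import comb

def nb_tail(k, p, n, horizon=10_000):
    """Pr[Y > n] for Y ~ NegBin(k, p), from the exact pmf
    Pr[Y = m] = C(m-1, k-1) p^k (1-p)^(m-k), truncated at `horizon`."""
    q = 1 - p
    return sum(comb(m - 1, k - 1) * p**k * q**(m - k)
               for m in range(n + 1, horizon))

def bin_lower_tail(k, p, n):
    """Pr[X < k] for X ~ Bin(n, p)."""
    q = 1 - p
    return sum(comb(n, j) * p**j * q**(n - j) for j in range(k))

k, p, n = 5, 0.3, 30
print(nb_tail(k, p, n), bin_lower_tail(k, p, n))  # the two tails agree
```

The truncation at `horizon` is harmless because the negative binomial pmf decays geometrically in $m$.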
An important consequence of the previous claim is that Chernoff bounds give tail bounds on $Y$.

Claim 3 Let $Y$ have the negative binomial distribution with parameters $k$ and $p$. Then $\Pr[Y > 2k/p] \le e^{-k/4}$.
Proof: Let $X$ have the binomial distribution with parameters $n$ and $p$, where $n = 2k/p$. Note that $\mathrm{E}[X] = np = 2k$. By Claim 2,

$$\Pr[Y > 2k/p] \;=\; \Pr[X < k] \;=\; \Pr\big[X < \tfrac{1}{2}\mathrm{E}[X]\big] \;\le\; \exp\Big(-\tfrac{(1/2)^2 \, \mathrm{E}[X]}{2}\Big) \;=\; e^{-k/4},$$

where the inequality comes from the Chernoff bound (in its lower-tail form $\Pr[X \le (1-\delta)\mu] \le e^{-\delta^2 \mu / 2}$, with $\delta = 1/2$ and $\mu = \mathrm{E}[X] = 2k$).
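This tail bound on $Y$ (that $\Pr[Y > 2k/p]$ is exponentially small in $k$) can also be verified exactly, using Claim 2 to reduce the computation to a finite binomial sum; the values $k = 20$, $p = 0.5$ are an arbitrary illustration.

```python
from math import comb, exp

def nb_exceeds(k, p, n):
    """Pr[Y > n] for Y ~ NegBin(k, p), computed via Claim 2 as Pr[Bin(n, p) < k]."""
    q = 1 - p
    return sum(comb(n, j) * p**j * q**(n - j) for j in range(k))

k, p = 20, 0.5
n = int(2 * k / p)          # n = 2k/p = 80 trials
tail = nb_exceeds(k, p, n)  # exact Pr[Y > 2k/p]
bound = exp(-k / 4)         # the Chernoff-style bound
print(tail, bound)          # the exact tail sits below the bound
```

As usual with Chernoff-style estimates, the exact tail is substantially smaller than the bound; the bound's value is its clean exponential form, not its tightness.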