In this lecture we discuss the topic of derandomization — converting a randomized algorithm into a deterministic one.
1. Method of Conditional Expectations
One of the simplest methods for derandomizing an algorithm is the “method of conditional expectations”. In some contexts this is also called the “method of conditional probabilities”.
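Let us start with a simple example. Let $[k]$ denote $\{1,\ldots,k\}$. Suppose $X$ is a random variable taking values in $[k]$. Let $f : [k] \to \mathbb{R}$ be any function and suppose $\mathrm{E}[f(X)] \le \mu$. How can we find an $x \in [k]$ such that $f(x) \le \mu$? Well, the assumption $\mathrm{E}[f(X)] \le \mu$ guarantees that there exists $x \in [k]$ with $f(x) \le \mu$. So we can simply use exhaustive search to try all possible values for $x$ in only $O(k)$ time. The same idea can also be used to find an $x$ with $f(x) \ge \mu$ when $\mathrm{E}[f(X)] \ge \mu$.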
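Now let’s make the example a bit more complicated. Suppose $X_1,\ldots,X_n$ are independent random variables taking values in $[k]$. Let $f : [k]^n \to \mathbb{R}$ be any function and suppose $\mathrm{E}[f(X_1,\ldots,X_n)] \le \mu$. How can we find a vector $(x_1,\ldots,x_n) \in [k]^n$ with $f(x_1,\ldots,x_n) \le \mu$? Exhaustive search is again an option, but now it will take $O(k^n)$ time, which might be too much.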
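The method of conditional expectations gives a more efficient solution, under some additional assumptions. Suppose that for any numbers $(x_1,\ldots,x_i)$ we can efficiently evaluate
$$ \mathrm{E}[\, f(x_1,\ldots,x_i,X_{i+1},\ldots,X_n) \,]. $$
(If you prefer, you can think of this as $\mathrm{E}[\, f(X_1,\ldots,X_n) \mid X_1=x_1,\ldots,X_i=x_i \,]$, which is a conditional expectation of $f$. This is where the method gets its name.) Then the following algorithm will produce a point $(x_1,\ldots,x_n)$ with $f(x_1,\ldots,x_n) \le \mu$.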
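- For $i = 1,\ldots,n$
  - Set $x_i = 0$.
  - Repeat
    - Set $x_i = x_i + 1$.
  - Until $\mathrm{E}[f(x_1,\ldots,x_i,X_{i+1},\ldots,X_n)] \le \mathrm{E}[f(x_1,\ldots,x_{i-1},X_i,\ldots,X_n)]$
- End

First we claim that the algorithm will terminate (i.e., the repeat loop will eventually succeed). To see this, define
$$ g(y) = \mathrm{E}[\, f(x_1,\ldots,x_{i-1},y,X_{i+1},\ldots,X_n) \,]. $$
Because the $X_j$ are independent, $\mathrm{E}[g(X_i)] = \mathrm{E}[f(x_1,\ldots,x_{i-1},X_i,\ldots,X_n)]$. Just like in our simple example above, there exists an $x_i \in [k]$ with $g(x_i) \le \mathrm{E}[g(X_i)]$, so we can find such an $x_i$ by exhaustive search. That is exactly what the repeat loop is doing.

Second, the output is correct: chaining the “Until” conditions for $i = 1,\ldots,n$ gives $f(x_1,\ldots,x_n) \le \mathrm{E}[f(X_1,\ldots,X_n)] \le \mu$.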
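The loop above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `derandomize` and `make_cond_exp` are hypothetical, and the brute-force oracle assumes that each $X_i$ is uniform on $[k]$, which the text does not require. In a real application the oracle would be replaced by an efficient formula for the conditional expectation.

```python
from fractions import Fraction
from itertools import product


def make_cond_exp(f, n, k):
    """Brute-force oracle, ASSUMING each X_i is uniform on {1..k} and independent.

    Takes exponential time; it is here only so the sketch can be run end to end.
    """
    def cond_exp(prefix):
        rest = n - len(prefix)
        total = sum(Fraction(f(prefix + tail))
                    for tail in product(range(1, k + 1), repeat=rest))
        return total / k ** rest
    return cond_exp


def derandomize(n, k, cond_exp):
    """Fix x_1, ..., x_n one at a time without increasing the conditional expectation.

    cond_exp(prefix) must return E[f(x_1..x_i, X_{i+1}..X_n)] for prefix = (x_1..x_i).
    """
    prefix = ()
    for _ in range(n):
        target = cond_exp(prefix)          # E[f(x_1..x_{i-1}, X_i..X_n)]
        for y in range(1, k + 1):          # the "Repeat ... Until" loop
            if cond_exp(prefix + (y,)) <= target:
                prefix += (y,)
                break
        else:
            raise ValueError("oracle is not a consistent conditional expectation")
    return prefix


def f(x):
    """Toy objective, chosen only for illustration."""
    return (sum(x) - 5) ** 2


oracle = make_cond_exp(f, n=3, k=3)
x = derandomize(3, 3, oracle)
print(x, f(x), oracle(()))
```

Exact rational arithmetic (`Fraction`) is used so that the `<=` comparison is not affected by floating-point rounding; with a floating-point oracle one would typically pick the value of $y$ that minimizes the conditional expectation instead.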
1.1. Example: Max Cut
To illustrate this method, let us consider our algorithm for the Max Cut problem from Lecture 1. We are given a graph $G=(V,E)$. Recall that this algorithm generates a cut $\delta(U)$ simply by picking a set $U \subseteq V$ uniformly at random. Equivalently, for each vertex $v \in V$, the algorithm independently flips a fair coin to decide whether to put $v \in U$. We argued that $\mathrm{E}[|\delta(U)|] \ge |E|/2$.
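We will use the method of conditional expectations to derandomize this algorithm. Let the vertex set of the graph be $V = \{1,\ldots,n\}$. Let
$$ f(x_1,\ldots,x_n) = |\delta(U)| \quad\text{where}\quad U = \{\, i : x_i = 1 \,\}. $$
Let $X_1,\ldots,X_n$ be independent random variables where each $X_i$ is $0$ or $1$ with probability $1/2$. We identify the event “$X_i = 1$” with the event “vertex $i \in U$”. Then $\mathrm{E}[f(X_1,\ldots,X_n)] = \mathrm{E}[|\delta(U)|] \ge |E|/2$. We wish to deterministically find values $x_1,\ldots,x_n$ for which $f(x_1,\ldots,x_n) \ge |E|/2$.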
To apply the method of conditional probabilities we must be able to efficiently compute
$$ \mathrm{E}[\, f(x_1,\ldots,x_i,X_{i+1},\ldots,X_n) \,] $$
for any numbers $(x_1,\ldots,x_i)$. What is this quantity? It is the expected number of edges cut when we have already decided which vertices amongst $\{1,\ldots,i\}$ belong to $U$, and the remaining vertices $\{i+1,\ldots,n\}$ are placed in $U$ randomly (independently, with probability $1/2$). This expectation is easy to compute! For any edge with both endpoints in $\{1,\ldots,i\}$ we already know whether it will be cut or not. Every other edge has probability exactly $1/2$ of being cut. So we can compute that expected value in linear time.
In conclusion, the method of conditional expectations gives us a deterministic, polynomial time algorithm outputting a set $U$ with $|\delta(U)| \ge |E|/2$.
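As a concrete illustration, here is a minimal Python sketch of the derandomized Max Cut algorithm. The function names are hypothetical, vertices are numbered from $0$ rather than $1$, and the graph is assumed to have no self-loops. Since we want a large cut, each vertex is given the side with the larger conditional expectation; the larger of the two values is at least their average, which is the previous conditional expectation.

```python
from fractions import Fraction


def expected_cut(edges, assigned):
    """E[number of cut edges] when vertices in `assigned` are fixed (vertex -> 0/1)
    and every other vertex picks a side by an independent fair coin flip."""
    total = Fraction(0)
    for u, v in edges:
        if u in assigned and v in assigned:
            total += int(assigned[u] != assigned[v])   # already decided
        else:
            total += Fraction(1, 2)                    # cut with probability 1/2
    return total


def derandomized_max_cut(n, edges):
    """Return a dict vertex -> side that cuts at least len(edges)/2 edges."""
    assigned = {}
    for v in range(n):
        # Keep the side that does not decrease the conditional expectation.
        assigned[v] = max((0, 1),
                          key=lambda b: expected_cut(edges, {**assigned, v: b}))
    return assigned


# A 4-cycle with one diagonal (an arbitrary test graph).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
side = derandomized_max_cut(4, edges)
cut_size = sum(side[u] != side[v] for u, v in edges)
print(side, cut_size)
```

Each call to `expected_cut` takes time linear in the number of edges, so this sketch runs in $O(n \cdot |E|)$ time. Only the edges between vertex $v$ and already-decided vertices affect the comparison, so a more careful implementation could avoid recomputing the whole sum.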
2. Method of Pessimistic Estimators
So far we have derandomized our very simple Max Cut algorithm, which doesn’t use any sophisticated probabilistic tools. Next we will see what happens when we try to apply these ideas to algorithms that use the Chernoff bound.
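Let $X_1,\ldots,X_n$ be independent random variables in $[0,1]$. Define the function $f$ as follows:
$$ f(x_1,\ldots,x_n) = \begin{cases} 1 & \text{if } \sum_i x_i \ge \alpha \\ 0 & \text{otherwise,} \end{cases} $$
so that $\mathrm{E}[f(X_1,\ldots,X_n)] = \Pr[\sum_i X_i \ge \alpha]$, which is the typical sort of quantity to which one would apply a Chernoff bound.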
Can we apply the method of conditional expectations to this function $f$? For any numbers $(x_1,\ldots,x_i)$, we need to efficiently evaluate
$$ \mathrm{E}[\, f(x_1,\ldots,x_i,X_{i+1},\ldots,X_n) \,] = \Pr\Big[\, \sum_j X_j \ge \alpha \;\Big|\; X_1=x_1,\ldots,X_i=x_i \,\Big]. $$
Unfortunately, computing this is not so easy. If the $X_i$'s were i.i.d. Bernoullis then we could compute that probability by expanding it in terms of binomial coefficients. But in the non-i.i.d. or non-Bernoulli case, there does not seem to be an efficient way to compute this probability.
Here is the main idea of “pessimistic estimators”: instead of defining $f$ to be equal to that probability, we will define $f$ to be an easily-computable upper bound on that probability. Because $f$ is an upper bound on the probability of the bad event “$\sum_i X_i \ge \alpha$”, the function $f$ is called a pessimistic estimate of that probability. So what upper bound should we use? The Chernoff bound, of course! Recall the first step of its proof: for any $t > 0$,
$$ \Pr\Big[ \sum_i X_i \ge \alpha \Big] = \Pr\Big[ e^{t \sum_i X_i} \ge e^{t\alpha} \Big] \le e^{-t\alpha} \, \mathrm{E}\Big[ e^{t \sum_i X_i} \Big]. \qquad (1) $$
Important Remark: This step holds for any joint distribution on the $X_i$'s, including any non-independent or conditional distribution. This is because we have only used exponentiation and Markov’s inequality, which need no assumptions on the distribution.
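We will use the upper bound in (1) to define our function $f$. Specifically, define
$$ f(x_1,\ldots,x_n) = e^{-t\alpha} \cdot e^{t \sum_i x_i}. \qquad (2) $$
Since the $X_j$ are independent,
$$ \mathrm{E}[\, f(x_1,\ldots,x_i,X_{i+1},\ldots,X_n) \,] = e^{-t\alpha} \cdot e^{t \sum_{j \le i} x_j} \cdot \prod_{j > i} \mathrm{E}[e^{t X_j}]. $$
This expectation is easy to compute in linear time, assuming we know the distribution of each $X_j$ (i.e., we can evaluate each $\mathrm{E}[e^{tX_j}]$).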
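Applying the method of conditional expectations to the pessimistic estimator: Now we’ll see how to use this function $f$ to find $(x_1,\ldots,x_n)$ with $\sum_i x_i < \alpha$. Set $\mu = \sum_i \mathrm{E}[X_i]$, $\alpha = (1+\delta)\mu$ and $t = \ln(1+\delta)$. We have
$$ \Pr\Big[ \sum_i X_i \ge \alpha \Big] \le \mathrm{E}[f(X_1,\ldots,X_n)] \le \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu}, $$
where the first inequality is from (1) and the second inequality comes from the remainder of our Chernoff bound proof. Suppose $\mu$ and $\delta$ are such that this last quantity is strictly less than $1$. Then we know that there exists a vector $(x_1,\ldots,x_n)$ with $\sum_i x_i < \alpha$.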
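We now explain how to efficiently and deterministically find such a vector. The method of conditional expectations will give us a vector $(x_1,\ldots,x_n)$ for which $f(x_1,\ldots,x_n) \le \mathrm{E}[f(X_1,\ldots,X_n)] < 1$. We now apply the same argument as in (1) to a conditional distribution:
$$ \Pr\Big[ \sum_i X_i \ge \alpha \;\Big|\; X_1=x_1,\ldots,X_n=x_n \Big] \le e^{-t\alpha} \, \mathrm{E}\Big[ e^{t \sum_i X_i} \;\Big|\; X_1=x_1,\ldots,X_n=x_n \Big] = f(x_1,\ldots,x_n) < 1. $$
But, under the conditional distribution “$X_1=x_1,\ldots,X_n=x_n$”, there is no randomness remaining. The sum $\sum_i X_i$ is not a random variable; it is simply the number $\sum_i x_i$. Since the event “$\sum_i x_i \ge \alpha$” has probability less than $1$, it must have probability $0$. In other words, we must have $\sum_i x_i < \alpha$.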
This example is actually quite silly. If we want to achieve $\sum_i x_i < \alpha$, the best thing to do is obviously to set each $x_i$ to its smallest possible value. But the method is useful because we can apply it in more complicated scenarios that involve multiple Chernoff bounds.
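Silly or not, the single-bound procedure is short enough to write out. The sketch below is illustrative only: it assumes each $X_i$ is a Bernoulli variable with $\Pr[X_i = 1] = p_i$ (a special case of the $[0,1]$-valued setting above, chosen so that $\mathrm{E}[e^{tX_i}] = 1 - p_i + p_i e^t$ has a closed form), and the function names are hypothetical.

```python
import math


def estimator(prefix, p, t, alpha):
    """Pessimistic estimator with x_1..x_i fixed to `prefix`:
    e^{-t*alpha} * e^{t*sum(prefix)} * prod_{j>i} E[e^{t X_j}],
    ASSUMING X_j is Bernoulli with Pr[X_j = 1] = p[j]."""
    value = math.exp(t * (sum(prefix) - alpha))
    for pj in p[len(prefix):]:
        value *= 1 - pj + pj * math.exp(t)
    return value


def derandomize_chernoff(p, delta):
    """Return (x, alpha) with sum(x) < alpha, using the estimator as the objective."""
    mu = sum(p)
    alpha = (1 + delta) * mu
    t = math.log(1 + delta)
    if estimator([], p, t, alpha) >= 1:
        raise ValueError("estimator is not below 1; the argument does not apply")
    x = []
    for _ in p:
        # The smaller of the two conditional values is at most their average,
        # so the estimator never increases and stays below 1.
        x.append(min((0, 1), key=lambda b: estimator(x + [b], p, t, alpha)))
    return x, alpha


x, alpha = derandomize_chernoff([0.5] * 10, delta=1.0)
print(x, sum(x), alpha)
```

Picking the minimizing value, rather than the first value that does not increase the estimator, is a design choice that sidesteps floating-point ties; both variants keep the estimator from increasing.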
2.1. Congestion Minimization
In Lecture 3 we gave a randomized algorithm which gives an $O(\log n / \log\log n)$ approximation to the congestion minimization problem. We now get a deterministic algorithm by the method of pessimistic estimators.
Recall that an instance of the problem consists of a directed graph $G=(V,A)$ with $n = |V|$ and a sequence $(s_1,t_1),\ldots,(s_k,t_k)$ of pairs of vertices. We want to find $s_i$-$t_i$ paths such that each arc $a$ is contained in few paths. Let $\mathcal{P}_i$ be the set of all paths in $G$ from $s_i$ to $t_i$. For every path $P \in \mathcal{P}_i$, we create a variable $x_P^i$.
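We obtain a fractional solution to the problem by solving this LP.
$$ \begin{array}{lll}
\min & C & \\
\text{s.t.} & \sum_{P \in \mathcal{P}_i} x_P^i = 1 & \forall i = 1,\ldots,k \\
 & \sum_{i} \sum_{P \in \mathcal{P}_i \,:\, a \in P} x_P^i \le C & \forall a \in A \\
 & x_P^i \ge 0 & \forall i,\ \forall P \in \mathcal{P}_i
\end{array} $$
Let $C^*$ be the optimal value of the LP.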
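We showed how randomized rounding gives us an integer solution (i.e., an actual set of paths). The algorithm chooses exactly one path $P_i$ from $\mathcal{P}_i$ by setting $P_i = P$ with probability $x_P^i$. For every arc $a$ let $Y_i^a$ be the indicator of the event “$a \in P_i$”. Then the congestion on arc $a$ is $Y^a = \sum_i Y_i^a$. We showed that $\mathrm{E}[Y^a] \le C^*$. Let $\alpha$ be the $O(\log n / \log\log n)$ factor chosen in that analysis. We applied Chernoff bounds to every arc and a union bound to show that
$$ \Pr[\, \text{any } a \text{ has } Y^a > \alpha C^* \,] \le \sum_{a \in A} \Pr[\, Y^a > \alpha C^* \,] < 1. $$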
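We will derandomize that algorithm with the function
$$ f(P_1,\ldots,P_k) = \sum_{a \in A} e^{-t \alpha C^*} \cdot e^{t \sum_i Y_i^a}, $$
where $t > 0$ is the parameter used in the Chernoff bound and each $Y_i^a$ is determined by the path $P_i$. How did we obtain this function? For each arc we applied a Chernoff bound, so each arc has a pessimistic estimator as in (2). We add all of those functions to give us this function $f$.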
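Applying the method of conditional expectations, we can find a vector of paths $(p_1,\ldots,p_k)$ for which $f(p_1,\ldots,p_k) \le \mathrm{E}[f(P_1,\ldots,P_k)] < 1$. Thus,
$$ \Pr[\, \text{any } a \text{ has } Y^a > \alpha C^* \mid P_1=p_1,\ldots,P_k=p_k \,] \le f(p_1,\ldots,p_k) < 1. $$
Under that conditional distribution there is no randomness left, so the event “any $a$ has $Y^a > \alpha C^*$” must have probability $0$. So, if we choose the paths $p_1,\ldots,p_k$ then every arc has congestion at most $\alpha C^*$, as desired.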
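To make the rounding step concrete, here is a small Python sketch. It is a sketch under stated assumptions, not the lecture's implementation: the fractional LP solution is taken as given input (`options[i]` is a hypothetical list of `(path, probability)` pairs for pair $i$, with each path a tuple of arc names), and `t` and `threshold` (playing the role of $\alpha C^*$) are supplied by the caller. It uses the fact that $\mathrm{E}[e^{tY_i^a}] = 1 + (e^t - 1)\Pr[a \in P_i]$ for an indicator $Y_i^a$.

```python
import math


def estimator(chosen, options, arcs, t, threshold):
    """Sum over arcs of e^{-t*threshold} * E[e^{t * Y^a}], where the first
    len(chosen) paths are fixed and the rest are still random."""
    total = 0.0
    for a in arcs:
        value = math.exp(-t * threshold)
        for i, paths in enumerate(options):
            if i < len(chosen):
                q = 1.0 if a in chosen[i] else 0.0      # Y_i^a is already known
            else:
                q = sum(prob for path, prob in paths if a in path)
            value *= 1 + (math.exp(t) - 1) * q          # E[e^{t * Y_i^a}]
        total += value
    return total


def derandomized_rounding(options, arcs, t, threshold):
    """Pick one path per pair so that the estimator never increases."""
    chosen = []
    for paths in options:
        support = [path for path, prob in paths if prob > 0]
        best = min(support,
                   key=lambda path: estimator(chosen + [path], options,
                                              arcs, t, threshold))
        chosen.append(best)
    return chosen


# Toy instance (hypothetical): two pairs, each split evenly over two parallel arcs.
arcs = ["top", "bottom"]
options = [[(("top",), 0.5), (("bottom",), 0.5)] for _ in range(2)]
t, threshold = math.log(3), 2
print(estimator([], options, arcs, t, threshold))
print(derandomized_rounding(options, arcs, t, threshold))
```

If the initial estimator is below $1$, the final estimator is too, so every arc ends with congestion strictly below `threshold`; in the toy instance this forces the two pairs onto different arcs.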