Lecture 16: Derandomization: Method of Conditional Expectations, Method of Pessimistic Estimators

In this lecture we discuss the topic of derandomization — converting a randomized algorithm into a deterministic one.

1. Method of Conditional Expectations

One of the simplest methods for derandomizing an algorithm is the “method of conditional expectations”. In some contexts this is also called the “method of conditional probabilities”.

Let us start with a simple example. Let {[k]} denote {\{1,\ldots,k\}}. Suppose {X} is a random variable taking values in {[k]}. Let {f : [k] \rightarrow {\mathbb R}} be any function and suppose {{\mathrm E}[ f(X) ] \leq \mu}. How can we find an {x \in \{1,\ldots,k\}} such that {f(x) \leq \mu}? Well, the assumption {{\mathrm E}[ f(X) ] \leq \mu} guarantees that there exists {x \in \{1,\ldots,k\}} with {f(x) \leq \mu}. So we can simply use exhaustive search to try all possible values for {x} in only {O(k)} time. The same idea can also be used to find an {x} with {f(x) \geq \mu}.

Now let’s make the example a bit more complicated. Suppose {X_1,\ldots,X_n} are independent random variables taking values in {[k]}. Let {f : [k]^n \rightarrow {\mathbb R}} be any function and suppose {{\mathrm E}[ f(X_1,\ldots,X_n) ] \leq \mu}. How can we find a vector {(x_1,\ldots,x_n) \in [k]^n} with {f(x_1,\ldots,x_n) \leq \mu}? Exhaustive search is again an option, but now it will take {O(k^n)} time, which might be too much.

The method of conditional expectations gives a more efficient solution, under some additional assumptions. Suppose that for any numbers {(x_1,\ldots,x_i)} we can efficiently evaluate

\displaystyle {\mathrm E}_{X_{i+1},\ldots,X_n}[\: f(x_1,\ldots,x_i,X_{i+1},\ldots,X_n) \:].

(If you prefer, you can think of this as {{\mathrm E}[~ f(X_1,\ldots,X_n) ~|~ X_1=x_1,\ldots,X_i=x_i ~]}, which is a conditional expectation of {f}. This is where the method gets its name.) Then the following algorithm will produce a point {x_1,\ldots,x_n} with {f(x_1,\ldots,x_n) \leq \mu}.

  • For {i=1,\ldots,n}
  •   Set {x_i=0}.
  •   Repeat
  •     Set {x_i = x_i + 1}.
  •   Until {{\mathrm E}[\: f(x_1,\ldots,x_i,X_{i+1},\ldots,X_n) \:] \leq {\mathrm E}[\: f(x_1,\ldots,x_{i-1},X_i,\ldots,X_n) \:]}
  • End

First we claim that the algorithm will terminate (i.e., the repeat loop will eventually succeed). To see this, define

    \displaystyle g(y) ~=~ {\mathrm E}_{X_{i+1},\ldots,X_n}[\: f(x_1,\ldots,x_{i-1},y,X_{i+1},\ldots,X_n) \:].

Just like in our simple example above, there exists an {x_i} with {g(x_i) \leq {\mathrm E}_{X_i}[ g(X_i) ]}, so we can find such an {x_i} by exhaustive search. That is exactly what the repeat loop is doing. Moreover, since {{\mathrm E}_{X_i}[ g(X_i) ]} is precisely the conditional expectation at the end of iteration {i-1}, the quantity {{\mathrm E}[\: f(x_1,\ldots,x_i,X_{i+1},\ldots,X_n) \:]} never increases as the algorithm proceeds. It starts at {{\mathrm E}[ f(X_1,\ldots,X_n) ] \leq \mu} and ends at {f(x_1,\ldots,x_n)}, so indeed {f(x_1,\ldots,x_n) \leq \mu}.
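To make this concrete, here is a minimal Python sketch of the method (my own illustration, not from the original notes). It assumes we are handed a routine cond_exp(prefix) that evaluates the conditional expectation displayed above for a fixed prefix {(x_1,\ldots,x_i)}; the function name and interface are hypothetical.

def method_of_conditional_expectations(n, k, cond_exp):
    # cond_exp(prefix) must return E[ f(x_1,...,x_i, X_{i+1},...,X_n) ] for the
    # fixed prefix (x_1,...,x_i); cond_exp([]) is the unconditional E[f(X_1,...,X_n)].
    prefix = []
    for i in range(n):
        current = cond_exp(prefix)
        # Exhaustive search over the k possible values of x_i.  By averaging over
        # the distribution of X_i, some value does not increase the expectation.
        best = min(range(1, k + 1), key=lambda v: cond_exp(prefix + [v]))
        assert cond_exp(prefix + [best]) <= current + 1e-9
        prefix.append(best)
    # The conditional expectation never increased, so f(prefix) <= E[f(X_1,...,X_n)].
    return prefix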

    1.1. Example: Max Cut

    To illustrate this method, let us consider our algorithm for the Max Cut problem from Lecture 1. We are given a graph {G=(V,E)}. Recall that this algorithm generates a cut {\delta(U)} simply by picking a set {U \subseteq V} uniformly at random. Equivalently, for each vertex {v \in V}, the algorithm independently flips a fair coin to decide whether to put {v \in U}. We argued that {{\mathrm E}[|\delta(U)|] \geq |E|/2}.

    We will use the method of conditional expectations to derandomize this algorithm. Let the vertex set of the graph be {V = \{1,\ldots,n\}}. Let

    \displaystyle f(x_1,\ldots,x_n) = |\delta(U)| \qquad\text{where}\qquad U = \{\: i \::\: x_i=1 \}.

    Let {X_1,\ldots,X_n} be independent random variables where each {X_i} is {0} or {1} with probability {1/2}. We identify the event “{X_i=1}” with the event “vertex {i \in U}”. Then {{\mathrm E}[ f(X_1,\ldots,X_n) ] = {\mathrm E}[ |\delta(U)| ] = |E|/2}. We wish to deterministically find values {x_1,\ldots,x_n} for which {f(x_1,\ldots,x_n) \geq |E|/2}.

    To apply the method of conditional probabilities we must be able to efficiently compute

    \displaystyle {\mathrm E}_{X_{i+1},\ldots,X_n}[\: f(x_1,\ldots,x_i,X_{i+1},\ldots,X_n) \:],

    for any numbers {(x_1,\ldots,x_i)}. What is this quantity? It is the expected number of edges cut when we have already decided which vertices amongst {\{1,\ldots,i\}} belong to {U}, and the remaining vertices {\{i+1,\ldots,n\}} are placed in {U} randomly (independently, with probability {1/2}). This expectation is easy to compute! For any edge with both endpoints in {\{1,\ldots,i\}} we already know whether it will be cut or not. Every other edge has probability exactly {1/2} of being cut. So we can compute that expected value in linear time.
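Putting the pieces together, here is a short Python sketch of the resulting deterministic algorithm (my own illustration, not part of the original notes); vertices are {1,\ldots,n} and the graph is assumed to be given as a list of edges.

def derandomized_max_cut(n, edges):
    # edges is a list of pairs (u, v) with u, v in {1, ..., n}.
    # Returns a set U with |delta(U)| >= len(edges) / 2.

    def cond_exp(x):
        # Expected number of cut edges when the vertices appearing in x are decided
        # (x[v] = 1 means v is in U) and all other vertices are independent fair coins.
        total = 0.0
        for (u, v) in edges:
            if u in x and v in x:
                total += 1.0 if x[u] != x[v] else 0.0   # edge already decided
            else:
                total += 0.5                            # cut with probability exactly 1/2
        return total

    x = {}
    for i in range(1, n + 1):
        # Keep whichever choice for vertex i does not decrease the expectation.
        x[i] = 1
        exp_in = cond_exp(x)
        x[i] = 0
        exp_out = cond_exp(x)
        x[i] = 1 if exp_in >= exp_out else 0
    return {i for i in x if x[i] == 1}

Each of the {n} steps evaluates the conditional expectation in {O(|E|)} time, so the whole algorithm runs in {O(n|E|)} time; with a little care the expectation can even be updated incrementally.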

In conclusion, the method of conditional expectations gives us a deterministic, polynomial-time algorithm outputting a set {U} with {|\delta(U)| \geq |E|/2}.

    2. Method of Pessimistic Estimators

    So far we have derandomized our very simple Max Cut algorithm, which doesn’t use any sophisticated probabilistic tools. Next we will see what happens when we try to apply these ideas to algorithms that use the Chernoff bound.

    Let {X_1,\ldots,X_n} be independent random variables in {[0,1]}. Define the function {f} as follows:

\displaystyle f(x_1,\ldots,x_n) ~=~ \begin{cases} 1 &\qquad\mathrm{if}~ \sum_i x_i \geq \alpha \\ 0 &\qquad\mathrm{if}~ \sum_i x_i < \alpha \end{cases}.

    So

    \displaystyle {\mathrm E}[f(X_1,\ldots,X_n)] ~=~ {\mathrm{Pr}}[\textstyle \sum_i X_i \geq \alpha ],

    which is the typical sort of quantity to which one would apply a Chernoff bound.

    Can we apply the method of conditional expectations to this function {f}? For any numbers {(x_1,\ldots,x_i)}, we need to efficiently evaluate

    \displaystyle {\mathrm E}_{X_{i+1},\ldots,X_n}[\: f(x_1,\ldots,x_i,X_{i+1},\ldots,X_n) \:] ~=~ {\mathrm{Pr}}[~ \textstyle \sum_i X_i \geq \alpha ~|~ X_1=x_1,\ldots,X_i=x_i ~].

Unfortunately, computing this is not so easy. If the {X_i}'s were i.i.d. Bernoullis then we could compute that probability by expanding it in terms of binomial coefficients. But in the non-i.i.d. or non-Bernoulli case, there does not seem to be an efficient way to compute this probability.

    Here is the main idea of “pessimistic estimators”: instead of defining {f} to be equal to that probability, we will define {f} to be an easily-computable upper-bound on that probability. Because {f} is an upper bound on the probability of the bad event “{\sum_i X_i \geq \alpha}”, the function {f} is called a pessimistic estimate of that probability. So what upper bound should we use? The Chernoff bound, of course!

    For simplicity, suppose that {X_1,\ldots,X_n} are independent Bernoulli random variables. The first step of the Chernoff bound (exponentiation and Markov’s inequality) shows that, for any parameter {t>0},

    \displaystyle {\mathrm{Pr}}[~ \textstyle \sum_i X_i \geq \alpha ~] ~\leq~ {\mathrm E}\Big[ e^{-t \alpha} \prod_{j=1}^n e^{t X_j} \Big]. \ \ \ \ \ (1)
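To spell out this step: for any {t > 0} the map {y \mapsto e^{ty}} is increasing, so

\displaystyle {\mathrm{Pr}}[\textstyle \sum_i X_i \geq \alpha ] ~=~ {\mathrm{Pr}}[\: e^{t \sum_i X_i} \geq e^{t\alpha} \:] ~\leq~ e^{-t\alpha} \, {\mathrm E}\big[\: e^{t \sum_i X_i} \:\big] ~=~ {\mathrm E}\Big[ e^{-t\alpha} \prod_{j=1}^n e^{t X_j} \Big],

where the inequality is Markov's inequality applied to the non-negative random variable {e^{t \sum_i X_i}}.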

     

Important Remark: This step holds for any joint distribution on the {X_i}'s, including any non-independent or conditional distribution. This is because we have only used exponentiation and Markov's inequality, which need no assumptions on the distribution.

    We will use the upper bound in (1) to define our function {f}. Specifically, define

    \displaystyle f(x_1,\ldots,x_n) ~=~ e^{-t \alpha} \cdot \prod_{j=1}^n e^{t x_j}. \ \ \ \ \ (2)

     

    Let’s check that the conditional expectations are easy to compute with this new definition of {f}. Given any numbers {(x_1,\ldots,x_i)}, we have

    \displaystyle \begin{array}{rcl} {\mathrm E}_{X_{i+1},\ldots,X_n}[\: f(x_1,\ldots,x_i,X_{i+1},\ldots,X_n) \:] &=& e^{-t \alpha} \cdot \prod_{j=1}^i e^{t x_j} ~\cdot~ {\mathrm E}_{X_{i+1},\ldots,X_n}\Bigg[\: \prod_{j=i+1}^n e^{t X_j} \:\Bigg] \\ &=& e^{-t \alpha} \cdot \prod_{j=1}^i e^{t x_j} ~\cdot~ \prod_{j=i+1}^n {\mathrm E}_{X_j}\big[\: e^{t X_j} \:\big]. \end{array}

    This expectation is easy to compute in linear time, assuming we know the distribution of each {X_j} (i.e., we know that {{\mathrm{Pr}}[ X_j=1 ] = p_j}).

    Applying the method of conditional expectations to the pessimistic estimator: Now we’ll see how to use this function {f} to find {(x_1,\ldots,x_n)} with {\sum_i x_i < \alpha}. Set {\mu = \sum_i {\mathrm E}[X_i]}, {\alpha = (1+\delta) \mu} and {t = \ln(1+\delta)}. We have

    \displaystyle {\mathrm{Pr}}[ \textstyle \sum_i X_i > (1+\delta) \mu ] ~\leq~ {\mathrm E}[ f(X_1,\ldots,X_n) ] ~\leq~ \exp\Big( - \mu \big( (1+\delta) \ln(1+\delta) - \delta\big) \Big),

    where the first inequality is from (1) and the second inequality comes from the remainder of our Chernoff bound proof. Suppose {\mu} and {\delta} are such that this last quantity is strictly less than {1}. Then we know that there exists a vector {(x_1,\ldots,x_n)} with {\sum_i x_i < \alpha}.

We now explain how to efficiently and deterministically find such a vector. The method of conditional expectations will give us a vector {(x_1,\ldots,x_n)} for which {f(x_1,\ldots,x_n) < 1}. We now apply the same argument as in (1) to a conditional distribution:

    \displaystyle \begin{array}{rcl} {\mathrm{Pr}}[~ \textstyle \sum_i X_i \geq (1+\delta) \mu ~|~ X_1=x_1,\ldots,X_n=x_n ~] &\leq& {\mathrm E}\Big[ f(X_1,\ldots,X_n) ~|~ X_1=x_1,\ldots,X_n=x_n \Big] \\ &=& f(x_1,\ldots,x_n) ~<~ 1. \end{array}

    But, under the conditional distribution “{X_1=x_1,\ldots,X_n=x_n}”, there is no randomness remaining. The sum {\sum_i X_i} is not a random variable; it is simply the number {\sum_i x_i}. Since the event “{\sum_i x_i \geq (1+\delta) \mu}” has probability less than {1}, it must have probability {0}. In other words, we must have {\sum_i x_i < (1+\delta) \mu}.
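Here is a small Python sketch of this procedure (my own illustration; it assumes the Bernoulli probabilities {p_j} are known, and that {\mu > 0} and {\delta > 0} so that the Chernoff bound above is strictly less than {1}). It greedily fixes each {x_i} so that the pessimistic estimator (2) never increases.

import math

def derandomize_chernoff(p, delta):
    # Find x_1, ..., x_n in {0, 1} with sum(x) < (1 + delta) * mu, where mu = sum(p),
    # by applying the method of conditional expectations to the estimator (2).
    n = len(p)
    mu = sum(p)
    alpha = (1 + delta) * mu
    t = math.log(1 + delta)

    def estimator(x):
        # E[ f(x_1, ..., x_i, X_{i+1}, ..., X_n) ] for the fixed prefix x.
        val = math.exp(-t * alpha)
        for j in range(n):
            if j < len(x):
                val *= math.exp(t * x[j])
            else:
                val *= (1 - p[j]) + p[j] * math.exp(t)   # E[ e^{t X_j} ]
        return val

    x = []
    for i in range(n):
        # Keep whichever value of x_i does not increase the estimator.
        x.append(0 if estimator(x + [0]) <= estimator(x + [1]) else 1)
    # estimator(x) <= E[f(X_1, ..., X_n)] < 1, hence sum(x) < alpha.
    return x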

    This example is actually quite silly. If we want to achieve {\sum_i x_i < \alpha}, the best thing to do is obviously to set each {x_i = 0}. But the method is useful because we can apply it in more complicated scenarios that involve multiple Chernoff bounds.

    2.1. Congestion Minimization

In Lecture 3 we gave a randomized algorithm achieving an {O(\log n / \log \log n)} approximation to the congestion minimization problem. We now obtain a deterministic algorithm by the method of pessimistic estimators.

Recall that an instance of the problem consists of a directed graph {G=(V,A)} with {n = |V|} and a sequence {(s_1,t_1), \ldots, (s_k,t_k)} of pairs of vertices. We want to find {s_i}-{t_i} paths such that each arc {a} is contained in few paths. Let {{\mathcal P}_i} be the set of all paths in {G} from {s_i} to {t_i}. For every path {P \in {\mathcal P}_i}, we create a variable {x_P^i}.

    We obtain a fractional solution to the problem by solving this LP.

    \displaystyle \begin{array}{llll} \mathrm{min} & C && \\ \mathrm{s.t.} & {\displaystyle \sum_{P \in {\mathcal P}_i}} x_P^i &= 1 &\qquad\forall i=1,\ldots,k \\ & {\displaystyle \sum_i ~~ \sum_{P \in {\mathcal P}_i \mathrm{~with~} a \in P}} x_P^i &\leq C &\qquad\forall a \in A \\ & x_P^i &\geq 0 &\qquad\forall i=1,\ldots,k \mathrm{~and~} P \in {\mathcal P}_i \end{array}

    Let {C^*} be the optimal value of the LP.

    We showed how randomized rounding gives us an integer solution (i.e., an actual set of paths). The algorithm chooses exactly one path {P_i} from {{\mathcal P}_i} by setting {P_i = P} with probability {x^i_P}. For every arc {a} let {Y_i^a} be the indicator of the event “{a \in P_i}”. Then the congestion on arc {a} is {Y^a = \sum_i Y_i^a}. We showed that {{\mathrm E}[ Y^a ] \leq C^*}. Let {\alpha = 6 \log n / \log \log n}. We applied Chernoff bounds to every arc and a union bound to show that

    \displaystyle {\mathrm{Pr}}[~ \mathrm{any} ~a~ \mathrm{has}~ Y^a > \alpha C^* ~] ~\leq~ \sum_{a \in A} {\mathrm{Pr}}[~ Y^a > \alpha C^* ~] ~\leq~ 1/n.

    We will derandomize that algorithm with the function

\displaystyle f(P_1,\ldots,P_k) ~=~ \sum_{a \in A} e^{-t \alpha C^*} \cdot \prod_{j=1}^k e^{t Y_j^a}.

How did we obtain this function? For each arc {a} we applied a Chernoff bound, so each arc {a} has a pessimistic estimator as in (2), with threshold {\alpha C^*} in place of {\alpha}. We sum all of those estimators to obtain this function {f}.

    Note that

    \displaystyle \textstyle {\mathrm{Pr}}[~ \mathrm{any} ~a~ \mathrm{has}~ Y^a > \alpha C^* ~] ~\leq~ \sum_{a \in A} {\mathrm{Pr}}[~ Y^a > \alpha C^* ~] ~\leq~ {\mathrm E}[ f(P_1,\ldots,P_k) ] ~\leq~ 1/n.

    Applying the method of conditional expectations, we can find a vector of paths {(p_1,\ldots,p_k)} for which {f(p_1,\ldots,p_k) \leq 1/n}. Thus,

    \displaystyle \begin{array}{rcl} {\mathrm{Pr}}[~ \mathrm{any} ~a~ \mathrm{has}~ Y^a > \alpha C^* \:|\: P_1=p_1,\ldots,P_k=p_k ~] &\leq& {\mathrm E}[~ f(P_1,\ldots,P_k) \:|\: P_1=p_1,\ldots,P_k=p_k ~] \\ &=& f(p_1,\ldots,p_k) ~\leq~ 1/n. \end{array}

    Under that conditional distribution there is no randomness left, so the event “any {a} has {Y^a > \alpha C^*}” must have probability {0}. So, if we choose the paths {p_1,\ldots,p_k} then every arc {a} has congestion at most {\alpha C^*}, as desired.
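For completeness, here is what the selection loop might look like in Python (my own sketch, not from the lecture). It assumes the fractional LP solution is given explicitly: for each pair {i}, a list of (path, weight) entries whose weights sum to {1}, with each path represented as a set of arcs; the parameter threshold plays the role of {\alpha C^*}.

import math

def derandomized_congestion_rounding(arcs, fractional, t, threshold):
    # Pick one path per pair by applying the method of conditional expectations to
    # f(P_1, ..., P_k) = sum over arcs a of e^{-t*threshold} * prod_j e^{t * Y_j^a}.
    k = len(fractional)

    def estimator(chosen):
        # E[ f(P_1, ..., P_k) ] with the first len(chosen) paths fixed and the
        # remaining paths drawn independently from the fractional solution.
        total = 0.0
        for a in arcs:
            term = math.exp(-t * threshold)
            for i in range(k):
                if i < len(chosen):
                    term *= math.exp(t) if a in chosen[i] else 1.0
                else:
                    q = sum(w for (path, w) in fractional[i] if a in path)
                    term *= (1 - q) + q * math.exp(t)   # E[ e^{t Y_i^a} ]
            total += term
        return total

    chosen = []
    for i in range(k):
        # Choose the path for pair i that minimizes the conditional estimator.
        best = min((path for (path, _) in fractional[i]),
                   key=lambda path: estimator(chosen + [path]))
        chosen.append(best)
    # If the initial estimator was at most 1/n < 1, it still is, so every arc
    # has congestion at most threshold.
    return chosen

Evaluated naively each step recomputes the whole sum, but the per-arc products can be maintained incrementally so that the running time is polynomial in the size of the fractional solution.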

 
