## Lecture 12: Concentration for sums of random matrices

1. Concentration for sums of random matrices

Let ${X}$ be a random real matrix of size ${d \times d}$. In other words, we have some probability distribution on the space of all ${d \times d}$ matrices, and we let ${X}$ be a matrix obtained by sampling from that distribution. Alternatively, we can think of ${X}$ as a matrix whose entries are real-valued random variables (that are not necessarily independent).

As usual, the expectation of ${X}$ is simply the weighted average of the possible matrices that ${X}$ could be, i.e., ${{\mathrm E}[X] = \sum_A {\mathrm{Pr}}[ X=A ] \cdot A}$. Alternatively, we can think of ${{\mathrm E}[X]}$ as the matrix whose entries are the expectations of the entries of ${X}$.
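As a quick numeric illustration (not part of the notes themselves), the two descriptions of ${{\mathrm E}[X]}$ agree: the weighted average of the possible matrices equals the matrix of entrywise expectations. The finite distribution below is a made-up example, and the sketch assumes NumPy is available.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy finite distribution over 2x2 symmetric matrices: X = As[j] with prob ps[j].
As = [np.array([[1.0, 2.0], [2.0, 0.0]]),
      np.array([[0.0, -1.0], [-1.0, 3.0]])]
ps = [0.25, 0.75]

# View 1: E[X] as the weighted average of the possible matrices.
EX_weighted = sum(p * A for p, A in zip(ps, As))

# View 2: E[X] as the matrix of entrywise expectations, estimated by sampling.
idx = rng.choice(len(As), size=200_000, p=ps)
EX_sampled = np.mean([As[j] for j in idx], axis=0)

print(EX_weighted)
print(np.max(np.abs(EX_weighted - EX_sampled)))  # small sampling error
```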

Many concentration results are known for matrices whose entries are independent random variables from certain real-valued distributions (e.g., Gaussian, subgaussian, etc.). In fact, in Lecture 8 on Compressed Sensing, we proved concentration of the singular values of a matrix whose entries are independent Gaussians. In this lecture, we will look at random matrices whose entries are not independent, and we will obtain concentration results by summing multiple independent copies of those matrices.

1.1. The Ahlswede-Winter Inequality

The Chernoff bound is a very powerful tool for proving concentration for sums of independent, real-valued random variables. Today we will prove the Ahlswede-Winter inequality, which is a generalization of the Chernoff bound for proving concentration for sums of independent, matrix-valued random variables.

Let ${X_1, \ldots, X_k}$ be random, independent, symmetric matrices of size ${d \times d}$. Define the partial sums ${S_j = \sum_{i=1}^j X_i}$. We would like to analyze the probability that all eigenvalues of ${S_k}$ are at most ${t}$ (i.e., ${S_k \preceq tI}$). For any ${\lambda > 0}$, this is equivalent to all eigenvalues of ${e^{\lambda S_k}}$ being at most ${e^{\lambda t}}$ (i.e., ${e^{\lambda S_k} \preceq e^{\lambda t} I}$). If this event fails to hold then certainly ${\mathop{\mathrm{tr}}\,{e^{\lambda S_k}} > e^{\lambda t}}$, since all eigenvalues of ${e^{\lambda S_k}}$ are non-negative. Thus we have bounded the probability that some eigenvalue of ${S_k}$ is greater than ${t}$ as follows:

$\displaystyle {\mathrm{Pr}}[ S_k \not\preceq tI ] ~\leq~ {\mathrm{Pr}}[ \mathop{\mathrm{tr}}\,{e^{\lambda S_k}} > e^{\lambda t} ] ~\leq~ {\mathrm E}[ \mathop{\mathrm{tr}}\,{e^{\lambda S_k}} ] / e^{\lambda t}, \ \ \ \ \ (1)$

by Markov’s inequality.
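To see the implication concretely, here is a small check with a hypothetical matrix of my own choosing: for a symmetric ${S}$ with some eigenvalue above ${t}$, the quantity ${\mathop{\mathrm{tr}}\, e^{\lambda S}}$ indeed exceeds ${e^{\lambda t}}$, since the trace is a sum of positive terms ${e^{\lambda \cdot \mathrm{eig}}}$, one of which is already larger than ${e^{\lambda t}}$.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, t = 0.5, 2.0

# Build a symmetric S with a known spectrum; its top eigenvalue 3 exceeds t = 2.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random orthogonal basis
S = Q @ np.diag([3.0, 1.0, -0.5, 0.2]) @ Q.T

# For symmetric S, tr e^{lam S} = sum_i e^{lam * eig_i}.  Every term is positive,
# so the term for the top eigenvalue alone pushes the trace above e^{lam * t}.
eigs = np.linalg.eigvalsh(S)
trace_exp = np.sum(np.exp(lam * eigs))
print(trace_exp > np.exp(lam * t))  # True
```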

Now let us observe a useful property of the trace. Since it is linear, it commutes with expectation:

$\displaystyle \begin{array}{rcl} {\mathrm E}[ \mathop{\mathrm{tr}}\, X ] &=& \sum_{A} {\mathrm{Pr}}[ X=A ] \cdot \sum_i A_{i,i} ~=~ \sum_i \sum_{A} {\mathrm{Pr}}[ X=A ] \cdot A_{i,i} \\ &=& \sum_i \sum_{a} {\mathrm{Pr}}[ X_{i,i}=a ] \cdot a ~=~ \sum_i {\mathrm E}[ X_{i,i} ] ~=~ \mathop{\mathrm{tr}}\,\big( {\mathrm E}[X] \big). \end{array}$
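Since the identity holds by linearity alone, it holds exactly even for the empirical average of any finite set of samples. A small check, using an illustrative distribution of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)

# Random symmetric 3x3 matrices whose diagonal entries have means 1, 2, 3.
def sample_X():
    A = rng.standard_normal((3, 3)) + np.diag([1.0, 2.0, 3.0])
    return (A + A.T) / 2

Xs = np.array([sample_X() for _ in range(100_000)])
lhs = np.mean([np.trace(X) for X in Xs])  # E[tr X], estimated from samples
rhs = np.trace(np.mean(Xs, axis=0))       # tr(E[X]), from the same samples
print(lhs, rhs)  # equal up to float rounding; both near 1 + 2 + 3 = 6
```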

The proof of the Ahlswede-Winter inequality is very similar to the proof of the Chernoff bound; one just has to be a bit careful to do the matrix algebra properly. As in the proof of the Chernoff bound, the main technical step is to bound the expectation in (1) by a product of expectations that each involve a single ${X_i}$, because those individual expectations are much easier to analyze. This is where the Golden-Thompson inequality (Theorem 17 in the Notes on Symmetric Matrices) is needed.

$\displaystyle \begin{array}{rcll} {\mathrm E}[ \mathop{\mathrm{tr}}\,{e^{\lambda S_k}} ] &=& {\mathrm E}[ \mathop{\mathrm{tr}}\,{e^{\lambda X_k + \lambda S_{k-1}}} ] &(\text{since } S_k = X_k + S_{k-1})\\ &\leq& {\mathrm E}[ \mathop{\mathrm{tr}}\,(e^{\lambda X_k} \cdot e^{\lambda S_{k-1}}) ] &(\text{by Golden-Thompson})\\ &=& {\mathrm E}_{X_1,\ldots,X_{k-1}}\big[ {\mathrm E}_{X_k}[ \mathop{\mathrm{tr}}\,(e^{\lambda X_k} \cdot e^{\lambda S_{k-1}}) ] \big] &(\text{by independence}) \\ &=& {\mathrm E}_{X_1,\ldots,X_{k-1}}\Big[ \mathop{\mathrm{tr}}\,\big( {\mathrm E}_{X_k}[ e^{\lambda X_k} \cdot e^{\lambda S_{k-1}} ] \big) \Big] &(\text{trace and expectation commute})\\ &=& {\mathrm E}_{X_1,\ldots,X_{k-1}}\Big[ \mathop{\mathrm{tr}}\,\big( {\mathrm E}_{X_k}[ e^{\lambda X_k} ] \cdot e^{\lambda S_{k-1}} \big) \Big] &(X_k \text{ and } S_{k-1} \text{ are independent}) \\ &\leq& {\mathrm E}_{X_1,\ldots,X_{k-1}}\big[ \lVert {\mathrm E}_{X_k}[ e^{\lambda X_k} ] \rVert \cdot \mathop{\mathrm{tr}}\, e^{\lambda S_{k-1}} \big] \\ &=& \lVert {\mathrm E}_{X_k}[ e^{\lambda X_k} ] \rVert \cdot {\mathrm E}_{X_1,\ldots,X_{k-1}}[ \mathop{\mathrm{tr}}\, e^{\lambda S_{k-1}} ], \end{array}$

where the last inequality follows from Corollary 14 in the Notes on Symmetric Matrices. Applying this inequality inductively, we get
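Neither inequality is hard to check numerically. The sketch below (random symmetric matrices of my own choosing, NumPy only, with a small eigendecomposition-based matrix exponential) verifies Golden-Thompson, ${\mathop{\mathrm{tr}}\, e^{A+B} \leq \mathop{\mathrm{tr}}(e^A e^B)}$, and the final step, ${\mathop{\mathrm{tr}}(MN) \leq \lVert M \rVert \cdot \mathop{\mathrm{tr}}(N)}$ for symmetric ${M}$ and positive semidefinite ${N}$:

```python
import numpy as np

rng = np.random.default_rng(3)

def sym_expm(A):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    w, Q = np.linalg.eigh(A)
    return Q @ np.diag(np.exp(w)) @ Q.T

A0 = rng.standard_normal((4, 4)); A = (A0 + A0.T) / 2
B0 = rng.standard_normal((4, 4)); B = (B0 + B0.T) / 2

# Golden-Thompson: tr e^{A+B} <= tr(e^A e^B) for symmetric A and B.
print(np.trace(sym_expm(A + B)) <= np.trace(sym_expm(A) @ sym_expm(B)) + 1e-9)

# The last step above: tr(M N) <= ||M|| * tr(N) when N is positive semidefinite
# (here M = e^A and N = e^B, both positive definite).
M, N = sym_expm(A), sym_expm(B)
op_norm = np.max(np.abs(np.linalg.eigvalsh(M)))
print(np.trace(M @ N) <= op_norm * np.trace(N) + 1e-9)
```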

$\displaystyle {\mathrm E}[ \mathop{\mathrm{tr}}\,{e^{\lambda S_k}} ] ~\leq~ \prod_{i=1}^k \lVert {\mathrm E}[ e^{\lambda X_i} ] \rVert \cdot \mathop{\mathrm{tr}}\, e^{\lambda 0} ~=~ d \cdot \prod_{i=1}^k \lVert {\mathrm E}[ e^{\lambda X_i} ] \rVert,$

since ${e^{\lambda 0} = I}$ and ${\mathop{\mathrm{tr}}\, I = d}$. Combining this with (1), we obtain

$\displaystyle {\mathrm{Pr}}[ S_k \not \preceq tI ] ~\leq~ d e^{-\lambda t} \prod_{i=1}^k \lVert {\mathrm E}[ e^{\lambda X_i} ] \rVert.$

We can also bound the probability that some eigenvalue of ${S_k}$ is less than ${-t}$ by applying the same argument to ${-S_k}$. This shows that the probability that some eigenvalue of ${S_k}$ lies outside ${[-t,t]}$ is at most

$\displaystyle {\mathrm{Pr}}[ \lVert S_k \rVert > t ] ~\leq~ d e^{-\lambda t} \Bigg( \prod_{i=1}^k \lVert {\mathrm E}[ e^{\lambda X_i} ] \rVert + \prod_{i=1}^k \lVert {\mathrm E}[ e^{-\lambda X_i} ] \rVert \Bigg). \ \ \ \ \ (2)$

This is the basic inequality. Much like the Chernoff bound, there are many variations. We will see some next time.
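As a sanity check on (2), one can compare both sides by Monte Carlo for a concrete family of matrices. The choice below, ${X_i = \epsilon_i A_i}$ with independent Rademacher signs ${\epsilon_i}$ and fixed symmetric ${A_i}$ of unit spectral norm, is an illustrative example of my own; for it, ${\lVert {\mathrm E}[ e^{\pm\lambda X_i} ] \rVert = \lVert \cosh(\lambda A_i) \rVert = \cosh(\lambda)}$, so both products in (2) are simply ${\cosh(\lambda)^k}$.

```python
import numpy as np

rng = np.random.default_rng(4)
d, k = 3, 50

# Fixed symmetric matrices A_i, normalized so that ||A_i|| = 1.
As = []
for _ in range(k):
    B = rng.standard_normal((d, d)); B = (B + B.T) / 2
    As.append(B / np.max(np.abs(np.linalg.eigvalsh(B))))
As = np.stack(As)

lam, t = 0.31, 15.0

# Right-hand side of (2): both products equal cosh(lam)^k since cosh is even.
bound = d * np.exp(-lam * t) * 2 * np.cosh(lam) ** k

# Left-hand side of (2), estimated: Pr[ ||S_k|| > t ] for S_k = sum_i eps_i * A_i.
trials, hits = 5000, 0
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=k)
    Sk = np.tensordot(eps, As, axes=1)
    if np.max(np.abs(np.linalg.eigvalsh(Sk))) > t:
        hits += 1
print(hits / trials, "<=", bound)
```

Taking ${\lambda \approx 0.31}$ roughly optimizes the bound for these parameters (setting the derivative of ${-\lambda t + k \log\cosh\lambda}$ to zero gives ${\tanh\lambda = t/k}$); as with the scalar Chernoff bound, (2) is far from tight for any single ${\lambda}$, so the empirical tail probability sits well below the bound.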