Sums of random Hermitian matrices and an inequality by Rudelson

Roberto Imbuzeiro Oliveira

Introduction

This note mainly deals with estimates for the operator norm $\|Z_{n}\|$ of random sums $Z_{n}\equiv\sum_{i=1}^{n}\epsilon_{i}A_{i}$ of deterministic Hermitian matrices $A_{1},\dots,A_{n}$ multiplied by random coefficients $\epsilon_{1},\dots,\epsilon_{n}$. Recall that a Rademacher sequence is a sequence $\{\epsilon_{i}\}_{i=1}^{n}$ of i.i.d. random variables with $\epsilon_{1}$ uniform over $\{-1,+1\}$. A standard Gaussian sequence is a sequence of i.i.d. standard Gaussian random variables. Our main goal is to prove the following result.

for some universal $C>0$, whenever the RHS of the above inequality is at most $1$. This important result has been applied to several different problems, such as bringing a convex body to near-isotropic position; the analysis of low-rank approximations of matrices and of graph sparsification; estimating the singular values of matrices with independent rows; analysing compressive sensing; and related problems in Harmonic Analysis.

The key ingredient of the original proof of Theorem 1 is a non-commutative Khintchine inequality by Lust-Picard and Pisier . This states that there exists a universal $c>0$ such that for all $Z_{n}$ as in the Theorem, all $p\geq 1$ and all $d\times d$ matrices $\{B_{i},D_{i}\}_{i=1}^{n}$ with $B_{i}+D_{i}=A_{i}$ , $1\leq i\leq n$ ,

where $\|\cdot\|_{S^{p}}$ denotes the $p$ -th Schatten norm: $\|A\|^{p}_{S^{p}}\equiv{\rm Tr}[(A^{*}A)^{p/2}].$ Unfortunately, the proof of the Lust-Picard/Pisier inequality employs language and tools from non-commutative probability that are rather foreign to most potential users of (2).
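As an illustration of the Schatten norm, the sketch below (Python/NumPy; the helper names `schatten_norm` and `schatten_norm_svd` are ours, not the paper's) computes $\|A\|_{S^{p}}$ directly from the definition ${\rm Tr}[(A^{*}A)^{p/2}]$, forming $(A^{*}A)^{p/2}$ through an eigendecomposition of the positive-semidefinite matrix $A^{*}A$, and compares it with the $\ell^{p}$ norm of the singular values of $A$. The two agree because the eigenvalues of $A^{*}A$ are the squared singular values of $A$.

```python
import numpy as np

def schatten_norm(A, p):
    """Schatten p-norm from the definition ||A||_{S^p}^p = Tr[(A^* A)^{p/2}]."""
    AhA = A.conj().T @ A                       # Hermitian and positive semidefinite
    w, V = np.linalg.eigh(AhA)
    w = np.clip(w, 0.0, None)                  # remove tiny negative round-off
    power = (V * w ** (p / 2)) @ V.conj().T    # (A^* A)^{p/2} via spectral calculus
    return np.trace(power).real ** (1.0 / p)

def schatten_norm_svd(A, p):
    """The same quantity as the l^p norm of the singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(s ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
for p in (1, 2, 4):
    print(p, schatten_norm(A, p), schatten_norm_svd(A, p))
```

For $p=2$ this is the Frobenius norm and for $p=1$ the nuclear (trace) norm; as $p\to\infty$ it tends to the operator norm $\|A\|$.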

This note presents an elementary proof of Theorem 1 that bypasses the above inequality. Our argument is based on an improvement of the methodology created by Ahlswede and Winter to prove their operator Chernoff bound, which also has many other applications (the improvement is discussed in Section 3.1). This approach only requires elementary facts from Linear Algebra and Matrix Analysis. The most complicated result that we use is the Golden-Thompson inequality: for all $d\times d$ Hermitian matrices $A$ and $B$,

$${\rm Tr}\left(e^{A+B}\right)\leq{\rm Tr}\left(e^{A}e^{B}\right).\qquad(3)$$

The elementary proof of this classical inequality is sketched in Section 5 below.
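A numerical sanity check can make the Golden-Thompson inequality concrete. The sketch below (Python/NumPy; the helper names and the distribution of the random matrices are our own choices) computes matrix exponentials through the eigendecomposition $H=V\,{\rm diag}(w)\,V^{*}$, so that $e^{H}=V\,{\rm diag}(e^{w})\,V^{*}$, and compares ${\rm Tr}(e^{A+B})$ with ${\rm Tr}(e^{A}e^{B})$. Equality holds when $A$ and $B$ commute, for instance when both are diagonal.

```python
import numpy as np

def expm_herm(H):
    """e^H for Hermitian H, via H = V diag(w) V^*."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

def random_hermitian(d, rng):
    """A random d x d complex Hermitian matrix (illustrative distribution)."""
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (G + G.conj().T) / 2

def golden_thompson_sides(A, B):
    """Return (Tr e^{A+B}, Tr(e^A e^B)); both traces are real for Hermitian A, B."""
    lhs = np.trace(expm_herm(A + B)).real
    rhs = np.trace(expm_herm(A) @ expm_herm(B)).real
    return lhs, rhs

rng = np.random.default_rng(0)
for _ in range(5):
    A, B = random_hermitian(4, rng), random_hermitian(4, rng)
    lhs, rhs = golden_thompson_sides(A, B)
    print(f"{lhs:.4f} <= {rhs:.4f}")
```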

We have already noted that Rudelson’s bound (2) follows simply from Theorem 1; see [11, Section 3] for details. Here we prove a concentration lemma corresponding to that result under the stronger assumption that $|Y_{1}|$ is a.s. bounded. While similar results have appeared in other papers, our proof is simpler and gives explicit (albeit quite large) constants.

whenever $\epsilon(n,M)\leq 1$. A key feature of this Lemma is that the ambient dimension $d$ plays no direct role in the bound. In fact, the same result holds for $Y_{i}$ taking values in a separable Hilbert space (as in the last section of ).

To conclude the introduction, we present an open problem: is it possible to improve upon Rudelson’s bound under further assumptions? There is some evidence that the dependence on $\ln(d)$ in the Theorem, while necessary in general [12, Remark 3.4], can sometimes be removed. For instance, Adamczak et al. have improved upon Rudelson’s original application of Theorem 1 to convex bodies, obtaining exactly what one would expect in the absence of the $\sqrt{\ln(2d)}$ term. Another setting where our bound is a $\Theta\left(\sqrt{\ln d}\right)$ factor away from optimality is that of more classical random matrices (cf. the end of Section 3.1 below). It would be interesting if one could sharpen the proof of Theorem 1 in order to recover these results. [Related issues are raised by Vershynin.]

Preliminaries

Moreover, ${\rm Tr}(A)$ (the trace of $A$) is the sum of the eigenvalues of $A$.

By this we mean that the eigenvalues of $f(A)$ are the numbers $f(\lambda)$ with $\lambda\in{\rm spec}(A)$. Moreover, the multiplicity of $\xi\in{\rm spec}\,f(A)$ is the sum of the multiplicities of all preimages of $\xi$ under $f$ that lie in ${\rm spec}(A)$.
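Spectral mapping can be checked numerically without circularity by taking $f$ to be a polynomial, so that $f(A)$ is computed with ordinary matrix products rather than through an eigendecomposition. In the sketch below (Python/NumPy; the matrix, the polynomials and the helper `poly_of_matrix` are arbitrary choices of ours), $A$ has eigenvalues $-1,1,2$. For $f(x)=x^{2}$ the preimages $-1$ and $1$ of $\xi=1$ merge, so $1$ appears in ${\rm spec}\,f(A)$ with multiplicity $2$, as the multiplicity rule predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random orthogonal matrix
A = Q @ np.diag([-1.0, 1.0, 2.0]) @ Q.T             # symmetric, spec(A) = {-1, 1, 2}

def poly_of_matrix(A, coeffs):
    """Evaluate f(A) = sum_k coeffs[k] A^k using matrix products only."""
    result = np.zeros_like(A)
    power = np.eye(A.shape[0])
    for c in coeffs:
        result = result + c * power
        power = power @ A
    return result

eigs_A = np.linalg.eigvalsh(A)
for coeffs in ([0, 0, 1], [1, -2, 0, 1]):           # x^2 and x^3 - 2x + 1
    fA = poly_of_matrix(A, coeffs)
    lhs = np.sort(np.linalg.eigvalsh(fA))                             # spec f(A)
    rhs = np.sort(np.polynomial.polynomial.polyval(eigs_A, coeffs))   # f(spec A)
    print(coeffs, np.round(lhs, 6), np.round(rhs, 6))
```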

The positive-semidefinite order

Moreover, spectral mapping (4) implies that:

We will also need the following simple fact.

Proof: To prove this, assume the LHS and observe that the RHS is equivalent to ${\rm Tr}(C\Delta)\geq 0$, where $\Delta\equiv B-A$. By assumption, $\Delta\succeq 0$, hence it has a Hermitian square root $\Delta^{1/2}$. The cyclic property of the trace implies:

$${\rm Tr}(C\Delta)={\rm Tr}\left(C\Delta^{1/2}\Delta^{1/2}\right)={\rm Tr}\left(\Delta^{1/2}C\Delta^{1/2}\right).$$

Since the trace is the sum of the eigenvalues, we will be done once we show that $\Delta^{1/2}C\Delta^{1/2}\succeq 0$. But, since $\Delta^{1/2}$ is Hermitian and $C\succeq 0$, for every vector $v$,

$$v^{*}\Delta^{1/2}C\Delta^{1/2}v=\left(\Delta^{1/2}v\right)^{*}C\left(\Delta^{1/2}v\right)\geq 0,$$

which shows that $\Delta^{1/2}C\Delta^{1/2}\succeq 0$, as desired. $\Box$
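The steps of this proof can be traced numerically. In the sketch below (Python/NumPy; the construction of $A$, $B$, $C$ and the helper `psd_sqrt` are ours) we take $A$ Hermitian, $B=A+\Delta$ with $\Delta=GG^{*}\succeq 0$, and $C=HH^{*}\succeq 0$. We form the Hermitian square root $\Delta^{1/2}$ spectrally and check the cyclic-trace identity, the positivity of $\Delta^{1/2}C\Delta^{1/2}$, and the resulting inequality ${\rm Tr}(CA)\leq{\rm Tr}(CB)$.

```python
import numpy as np

def psd_sqrt(P):
    """Hermitian square root of a positive-semidefinite matrix P."""
    w, V = np.linalg.eigh(P)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

rng = np.random.default_rng(0)
d = 4
G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
H = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

A = (X + X.conj().T) / 2          # Hermitian
Delta = G @ G.conj().T            # Delta >= 0
B = A + Delta                     # hence A <= B
C = H @ H.conj().T                # C >= 0

R = psd_sqrt(Delta)
# Cyclic property: Tr(C Delta) = Tr(Delta^{1/2} C Delta^{1/2}).
print(np.trace(C @ Delta).real, np.trace(R @ C @ R).real)
# Delta^{1/2} C Delta^{1/2} is positive semidefinite ...
print(np.linalg.eigvalsh(R @ C @ R).min())
# ... so the trace is monotone: Tr(CA) <= Tr(CB).
print(np.trace(C @ A).real, "<=", np.trace(C @ B).real)
```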

Probability with matrices

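The definition of expectations implies that traces and expectations commute:

$$\mathbb{E}\,{\rm Tr}(X)={\rm Tr}\,\mathbb{E}(X).$$

Moreover, one can check that the usual product rule is satisfied: if $X$ and $Y$ are independent random matrices, then

$$\mathbb{E}(XY)=\mathbb{E}(X)\,\mathbb{E}(Y).$$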
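For random matrices taking finitely many values, expectations are finite averages, so both facts can be checked exactly. The sketch below (Python/NumPy; the matrices $A$ and $B$ are arbitrary choices of ours) takes independent Rademacher signs $\epsilon_{1},\epsilon_{2}$, sets $X=e^{\epsilon_{1}A}$ and $Y=e^{\epsilon_{2}B}$, and checks that $\mathbb{E}\,{\rm Tr}(X)={\rm Tr}\,\mathbb{E}(X)$ and, for this independent pair, $\mathbb{E}(XY)=\mathbb{E}(X)\mathbb{E}(Y)$. For a Rademacher sign, $\mathbb{E}\,e^{\epsilon A}=(e^{A}+e^{-A})/2=\cosh(A)$.

```python
import itertools
import numpy as np

def expm_herm(H):
    """e^H for Hermitian H, via H = V diag(w) V^*."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

A = np.array([[1.0, 2.0], [2.0, -1.0]])
B = np.array([[0.5, -1.0], [-1.0, 0.0]])

signs = (-1.0, 1.0)                     # each sign has probability 1/2
# X = e^{eps1 A} and Y = e^{eps2 B} with eps1, eps2 independent Rademacher.
E_X = sum(expm_herm(e * A) for e in signs) / 2
E_Y = sum(expm_herm(e * B) for e in signs) / 2
E_XY = sum(expm_herm(e1 * A) @ expm_herm(e2 * B)
           for e1, e2 in itertools.product(signs, repeat=2)) / 4
E_trX = sum(np.trace(expm_herm(e * A)) for e in signs) / 2

print(np.allclose(E_XY, E_X @ E_Y))      # product rule for independent factors
print(np.isclose(E_trX, np.trace(E_X)))  # trace and expectation commute
```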

Proof of Theorem 1

Proof: [of Theorem 1] We wish to control the tail behavior of:

However, $Z_{n}$ and $-Z_{n}$ have the same distribution, because the $\epsilon_{i}$ are independent and symmetric. It follows that:

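The usual Bernstein trick (Markov’s inequality applied to $e^{s\lambda_{\max}(Z_{n})}$) implies that for all $t\geq 0$ and $s\geq 0$,

$$\mathbb{P}\left(\lambda_{\max}(Z_{n})\geq t\right)\leq e^{-st}\,\mathbb{E}\left[e^{s\lambda_{\max}(Z_{n})}\right].$$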

The function “$x\mapsto e^{sx}$” is monotone non-decreasing and positive for all $s\geq 0$. It follows from the spectral mapping property (4) that for all $s\geq 0$, the largest eigenvalue of $e^{sZ_{n}}$ is $e^{s\lambda_{\max}(Z_{n})}$ and all eigenvalues of $e^{sZ_{n}}$ are non-negative. Combining this with the equality “trace $=$ sum of eigenvalues” gives, for all $s\geq 0$,

$$e^{s\lambda_{\max}(Z_{n})}=\lambda_{\max}\left(e^{sZ_{n}}\right)\leq{\rm Tr}\left(e^{sZ_{n}}\right).$$
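Taken together, the Bernstein trick and the eigenvalue bound give $\mathbb{P}(\lambda_{\max}(Z_{n})\geq t)\leq e^{-st}\,\mathbb{E}\,{\rm Tr}(e^{sZ_{n}})$ for every $s\geq 0$. For a Rademacher sequence and small $n$, both sides can be computed exactly by enumerating the $2^{n}$ equally likely sign patterns. The sketch below (Python/NumPy; the matrices $A_{i}$ and the helper names are our own choices) does this; it only checks the inequality on an example and is not a substitute for the proof.

```python
import itertools
import numpy as np

def expm_herm(H):
    """e^H for Hermitian H, via H = V diag(w) V^*."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

def exact_tail_and_trace_mgf(As, t, s):
    """Exact P(lambda_max(Z_n) >= t) and E Tr e^{s Z_n} for Rademacher signs,
    where Z_n = sum_i eps_i A_i, by enumerating all 2^n sign patterns."""
    n = len(As)
    tail, trace_mgf = 0.0, 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        Z = sum(e * A for e, A in zip(signs, As))
        if np.linalg.eigvalsh(Z)[-1] >= t:      # eigvalsh sorts ascending
            tail += 1.0 / 2**n
        trace_mgf += np.trace(expm_herm(s * Z)).real / 2**n
    return tail, trace_mgf

rng = np.random.default_rng(0)
d, n = 3, 8
As = []
for _ in range(n):
    G = rng.standard_normal((d, d))
    As.append((G + G.T) / 2)                     # real symmetric, hence Hermitian

t = 4.0
for s in (0.25, 0.5, 1.0):
    tail, mgf = exact_tail_and_trace_mgf(As, t, s)
    print(f"s={s}: P = {tail:.4f} <= {np.exp(-s * t) * mgf:.4f}")
```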

Up to now, our proof has followed Ahlswede and Winter’s argument. The next lemma, however, will require new ideas.

This lemma is proven below. We will now show how it implies Rudelson’s bound. Let

[The second inequality follows from $\sum_{i=1}^{n}A_{i}^{2}\succeq 0$ , which holds because of (5) and (6).] We note that:

where the equality is yet another application of spectral mapping (4) and the fact that “ $x\mapsto e^{s^{2}x/2}$ ” is monotone increasing. We deduce from the Lemma and (10) that:

Since $0\leq\|Z_{n}\|\leq\sqrt{2\ln(2d)}\sigma+(\|Z_{n}\|-\sqrt{2\ln(2d)}\sigma)_{+},$ this implies the $L^{p}$ estimate in the Theorem. The bound “ $C_{p}\leq c\sqrt{p}$ ” is standard and we omit its proof. $\Box$
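The passage from the tail bound to the $L^{p}$ estimate can be spelled out as follows. This is a sketch under two assumptions not reproduced in this excerpt: that the argument above yields $\mathbb{P}(\|Z_{n}\|\geq t)\leq 2d\,e^{-t^{2}/(2\sigma^{2})}$ for all $t\geq 0$ (taking $s=t/\sigma^{2}$ and using the symmetry of $Z_{n}$), and that the $L^{p}$ estimate has the form $(\mathbb{E}\|Z_{n}\|^{p})^{1/p}\leq(\sqrt{2\ln(2d)}+C_{p})\sigma$. The constant $C_{p}$ obtained here is one admissible choice, not necessarily the one intended in the Theorem. Write $a\equiv\sqrt{2\ln(2d)}\,\sigma$, so that $2d\,e^{-a^{2}/(2\sigma^{2})}=1$, and $W\equiv(\|Z_{n}\|-a)_{+}$. Using $(a+u)^{2}\geq a^{2}+u^{2}$ for $a,u\geq 0$, the layer-cake formula, and Minkowski’s inequality applied to $\|Z_{n}\|\leq a+W$:

```latex
\begin{align*}
\mathbb{P}(W\geq u) &= \mathbb{P}(\|Z_n\|\geq a+u)
  \leq 2d\,e^{-(a+u)^2/(2\sigma^2)}
  \leq 2d\,e^{-a^2/(2\sigma^2)}\,e^{-u^2/(2\sigma^2)}
  = e^{-u^2/(2\sigma^2)} \qquad (u>0),\\
\mathbb{E}\,W^p &= \int_0^\infty p\,u^{p-1}\,\mathbb{P}(W\geq u)\,du
  \leq \int_0^\infty p\,u^{p-1}e^{-u^2/(2\sigma^2)}\,du
  = p\,2^{p/2-1}\,\Gamma(p/2)\,\sigma^p,\\
\left(\mathbb{E}\|Z_n\|^p\right)^{1/p} &\leq a+\left(\mathbb{E}\,W^p\right)^{1/p}
  \leq \left(\sqrt{2\ln(2d)}+\left(p\,2^{p/2-1}\,\Gamma(p/2)\right)^{1/p}\right)\sigma .
\end{align*}
```

By Stirling’s formula, $\left(p\,2^{p/2-1}\Gamma(p/2)\right)^{1/p}=O(\sqrt{p})$, consistent with the bound $C_{p}\leq c\sqrt{p}$.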

Proof: [of Lemma 2] Define $D_{0}\equiv\sum_{i=1}^{n}s^{2}A_{i}^{2}/2$ and

We will prove that for all $1\leq j\leq n$ :

By the monotonicity of the trace (7) and the fact that $\exp\left(D_{j-1}\right)\succeq 0$ (which follows from (4)), we will be done once we show that:

The key fact is that $s\epsilon_{j}A_{j}$ and $-s^{2}A_{j}^{2}/2$ always commute (both are polynomials in $A_{j}$), hence the exponential of the sum is the product of the exponentials. Applying (9) and noting that $e^{-s^{2}A_{j}^{2}/2}$ is deterministic, we see that:

which implies that $f(A_{j})\preceq I$ . This proves (13) in this case and finishes the proof of (12) and of the Lemma. $\Box$
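Both ingredients of this proof can be checked numerically in the Rademacher case. The sketch below (Python/NumPy; matrices, parameters and helper names are arbitrary choices of ours) verifies (i) that $\mathbb{E}\,e^{s\epsilon A-s^{2}A^{2}/2}=\cosh(sA)\,e^{-s^{2}A^{2}/2}\preceq I$, whose eigenvalues are $\cosh(s\lambda)e^{-s^{2}\lambda^{2}/2}\leq 1$ because $\cosh x\leq e^{x^{2}/2}$; and (ii) the end result of chaining the inequalities (12) over $j$, namely $\mathbb{E}\,{\rm Tr}\,e^{sZ_{n}}\leq{\rm Tr}\,e^{D_{0}}\leq d\,e^{s^{2}\sigma^{2}/2}$ with $\sigma^{2}=\|\sum_{i}A_{i}^{2}\|$. Step (ii) assumes, as the proof suggests but the definition is not reproduced here, that $D_{n}=sZ_{n}$. Expectations are computed exactly by enumerating all sign patterns.

```python
import itertools
import numpy as np

def expm_herm(H):
    """e^H for Hermitian H, via H = V diag(w) V^*."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

def one_step_factor(A, s):
    """E[exp(s*eps*A - s^2 A^2 / 2)] for a single Rademacher sign eps."""
    S = A @ A
    return sum(expm_herm(e * s * A - s**2 * S / 2) for e in (-1.0, 1.0)) / 2

def lemma_sides(As, s):
    """Exact E Tr e^{s Z_n} (Rademacher signs), Tr e^{D_0} with
    D_0 = s^2 sum_i A_i^2 / 2, and d e^{s^2 sigma^2 / 2}, sigma^2 = ||sum_i A_i^2||."""
    n, d = len(As), As[0].shape[0]
    lhs = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        Z = sum(e * A for e, A in zip(signs, As))
        lhs += np.trace(expm_herm(s * Z)).real / 2**n
    sum_sq = sum(A @ A for A in As)
    middle = np.trace(expm_herm(s**2 * sum_sq / 2)).real
    sigma2 = np.linalg.eigvalsh(sum_sq)[-1]
    return lhs, middle, d * np.exp(s**2 * sigma2 / 2)

rng = np.random.default_rng(0)
As = []
for _ in range(6):
    G = rng.standard_normal((3, 3))
    As.append((G + G.T) / 2)

for s in (0.5, 1.0):
    print("max eigenvalue of one-step factor:", np.linalg.eigvalsh(one_step_factor(As[0], s))[-1])
    print("E Tr e^{sZ}, Tr e^{D_0}, d e^{s^2 sigma^2/2}:", lemma_sides(As, s))
```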

A direct adaptation of the original argument of Ahlswede and Winter would lead to an inequality of the form:

However, only the second inequality seems to be useful, as there is no obvious relationship between

which is what we would need to proceed with induction. [Note that the Golden-Thompson inequality (3) cannot be reversed, and its analogue for three summands fails.] The best one can do with the second inequality is:

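This would give a version of Theorem 1 with $\sum_{i=1}^{n}\|A_{i}\|^{2}$ replacing $\|\sum_{i=1}^{n}A_{i}^{2}\|$. Since $\|\sum_{i=1}^{n}A_{i}^{2}\|\leq\sum_{i=1}^{n}\|A_{i}\|^{2}$ by the triangle inequality and $\|A_{i}^{2}\|=\|A_{i}\|^{2}$, this modified result is never better than the actual Theorem, and it can be dramatically worse. For instance, consider the case of a Wigner matrix where: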

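with the $\epsilon_{ij}$ i.i.d. standard Gaussian and each $A_{ij}$ having ones at positions $(i,j)$ and $(j,i)$ and zeros elsewhere (we take $d=m$ and $n=\binom{m}{2}$ in this case). Direct calculation reveals that $\sum_{i<j}A_{ij}^{2}=(m-1)I$, so that $\|\sum_{i<j}A_{ij}^{2}\|=m-1$, whereas $\sum_{i<j}\|A_{ij}\|^{2}=\binom{m}{2}$.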

We note in passing that neither approach is sharp in this case, as $\|\sum_{ij}\epsilon_{ij}A_{ij}\|$ concentrates around $2\sqrt{m}$ .
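The comparison in this example is easy to reproduce. The sketch below (Python/NumPy; function names are ours) builds the matrices $A_{ij}$ for a small $m$ and evaluates both variance proxies: $\|\sum_{i<j}A_{ij}^{2}\|$, which enters Theorem 1, and $\sum_{i<j}\|A_{ij}\|^{2}$, which enters the Ahlswede-Winter-type variant. Since each $A_{ij}^{2}$ is the diagonal matrix with ones at $(i,i)$ and $(j,j)$, the first equals $m-1$ and the second $\binom{m}{2}$. It also samples one Gaussian Wigner matrix, built directly as a symmetric matrix with zero diagonal rather than as a sum of $\binom{m}{2}$ matrices, and reports $\|\sum\epsilon_{ij}A_{ij}\|/\sqrt{m}$, which should be close to $2$ for large $m$.

```python
import itertools
import numpy as np

def wigner_basis(m):
    """The matrices A_ij (i < j): ones at (i, j) and (j, i), zeros elsewhere."""
    mats = []
    for i, j in itertools.combinations(range(m), 2):
        A = np.zeros((m, m))
        A[i, j] = A[j, i] = 1.0
        mats.append(A)
    return mats

def variance_proxies(m):
    """Return (||sum A_ij^2||, sum ||A_ij||^2) for the Wigner example."""
    As = wigner_basis(m)
    theorem_proxy = np.linalg.eigvalsh(sum(A @ A for A in As))[-1]
    aw_proxy = sum(np.linalg.norm(A, 2) ** 2 for A in As)
    return theorem_proxy, aw_proxy

def gaussian_wigner_norm(m, rng):
    """||sum_{i<j} eps_ij A_ij|| with eps_ij i.i.d. N(0, 1), built directly."""
    W = np.triu(rng.standard_normal((m, m)), k=1)
    W = W + W.T                       # symmetric, zero diagonal
    return np.max(np.abs(np.linalg.eigvalsh(W)))

print(variance_proxies(10))           # approximately (m - 1, m(m - 1)/2) = (9, 45)
rng = np.random.default_rng(0)
m = 400
print(gaussian_wigner_norm(m, rng) / np.sqrt(m))
```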

Concentration for rank-one operators

By Jensen’s inequality, $\phi(2M^{2}s^{2}/n)\leq\phi(s)^{2M^{2}s/n}$ whenever $2M^{2}s/n\leq 1$, hence (14) implies:

and a few simple calculations. [Notice that $2M^{2}s/n\leq 1/2$ with this choice, hence $1/(1-2M^{2}s/n)\leq 2$ .]
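The Jensen step can be written out as follows, under the assumption (the definition of $\phi$ is not reproduced in this excerpt) that $\phi(s)=\mathbb{E}\,e^{sX}$ is the moment generating function of some real random variable $X$. For $s>0$ and $0<\lambda\leq 1$, the map $x\mapsto x^{\lambda}$ is concave on $(0,\infty)$, so

```latex
\phi(\lambda s)=\mathbb{E}\left[\left(e^{sX}\right)^{\lambda}\right]
\leq\left(\mathbb{E}\,e^{sX}\right)^{\lambda}=\phi(s)^{\lambda},
\qquad 0<\lambda\leq 1 .
```

Taking $\lambda=2M^{2}s/n$ gives $\phi(2M^{2}s^{2}/n)\leq\phi(s)^{2M^{2}s/n}$, which is where the condition $2M^{2}s/n\leq 1$ is needed.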

To prove (14), we begin with symmetrization (see e.g. ):

where $\{\epsilon_{i}\}_{i=1}^{n}$ is a Rademacher sequence independent of $Y_{1},\dots,Y_{n}$. Let $\mathcal{S}$ be the (random) span of $Y_{1},\dots,Y_{n}$ and let ${\rm Tr}_{\mathcal{S}}$ denote the trace of linear operators mapping $\mathcal{S}$ to itself. Following the proof of Theorem 1, we notice that:

using spectral mapping (4), the equality “trace $=$ sum of eigenvalues” and the fact that $\mathcal{S}$ has dimension $\leq n$ . A quick calculation shows that $0\preceq(Y_{i}Y_{i}^{*})^{2}=|Y_{i}|^{2}\,Y_{i}Y_{i}^{*}\preceq M^{2}Y_{i}Y_{i}^{*}$ , hence (5) implies:
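Two facts used in this section are easy to check numerically. For $|Y_{i}|\leq M$ one has $M^{2}Y_{i}Y_{i}^{*}-(Y_{i}Y_{i}^{*})^{2}=(M^{2}-|Y_{i}|^{2})\,Y_{i}Y_{i}^{*}\succeq 0$; and a matrix of the form $\sum_{i}\epsilon_{i}Y_{i}Y_{i}^{*}$ maps everything into $\mathcal{S}$, so it has rank at most $n$ even when $d$ is much larger. The sketch below (Python/NumPy; the dimensions, the bound $M$ and the rescaling of the $Y_{i}$ are our own choices) verifies both.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, M = 50, 5, 2.0

# Random vectors Y_i in C^d, rescaled so that |Y_i| <= M.
Ys = rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))
Ys *= (M * rng.uniform(0.1, 1.0, size=n) / np.linalg.norm(Ys, axis=1))[:, None]

for y in Ys:
    P = np.outer(y, y.conj())                     # Y Y^*
    gap = M**2 * P - P @ P                        # equals (M^2 - |Y|^2) Y Y^*
    print(np.linalg.eigvalsh(gap).min() >= -1e-10)

eps = rng.choice([-1.0, 1.0], size=n)             # Rademacher signs
S = sum(e * np.outer(y, y.conj()) for e, y in zip(eps, Ys))
print(np.linalg.matrix_rank(S), "<=", n)          # rank is at most n < d
```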

Proof sketch for Golden-Thompson inequality

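As promised in the Introduction, we sketch an elementary proof of inequality (3). We will need the Trotter-Lie formula, a simple consequence of the Taylor formula for $e^{X}$: for all $d\times d$ matrices $A$ and $B$,

$$e^{A+B}=\lim_{k\to+\infty}\left(e^{A/k}e^{B/k}\right)^{k}.$$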

Inequality (3) follows from letting $k\to+\infty$ , using (15) and noticing that ${\rm Tr}(\cdot)$ is continuous.
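Both limits involved can be observed numerically. The sketch below (Python/NumPy; the helper names and the restriction to powers $k=2^{m}$ are our choices) shows $(e^{A/k}e^{B/k})^{k}$ approaching $e^{A+B}$ as $k$ grows, while ${\rm Tr}\big((e^{A/k}e^{B/k})^{k}\big)$ stays below ${\rm Tr}(e^{A}e^{B})$ for these powers of two. The latter is an instance of the classical inequality ${\rm Tr}\big((XY)^{2^{m}}\big)\leq{\rm Tr}\big(X^{2^{m}}Y^{2^{m}}\big)$ for Hermitian $X,Y$, a standard ingredient of elementary proofs of (3); we do not claim it reproduces the exact steps of the sketch above.

```python
import numpy as np

def expm_herm(H):
    """e^H for Hermitian H, via H = V diag(w) V^*."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

def trotter_product(A, B, k):
    """(e^{A/k} e^{B/k})^k, which tends to e^{A+B} as k -> infinity."""
    return np.linalg.matrix_power(expm_herm(A / k) @ expm_herm(B / k), k)

rng = np.random.default_rng(0)
G1, G2 = rng.standard_normal((2, 4, 4))
A, B = (G1 + G1.T) / 2, (G2 + G2.T) / 2          # non-commuting real symmetric

target = expm_herm(A + B)
gt_rhs = np.trace(expm_herm(A) @ expm_herm(B))
for m in range(0, 11, 2):
    k = 2**m
    Tk = trotter_product(A, B, k)
    err = np.linalg.norm(Tk - target, 2)
    print(f"k={k:5d}  ||T_k - e^(A+B)|| = {err:.2e}  Tr T_k = {np.trace(Tk):.6f} <= {gt_rhs:.6f}")
```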