Sums of random Hermitian matrices and an inequality by Rudelson

Roberto Imbuzeiro Oliveira

Introduction

This note mainly deals with estimates for the operator norm Zn\|Z_{n}\| of random sums

of deterministic Hermitian matrices A1,,AnA_{1},\dots,A_{n} multiplied by random coefficients. Recall that a Rademacher sequence is a sequence {ϵi}i=1n\{\epsilon_{i}\}_{i=1}^{n} of i.i.d. random variables with ϵ1\epsilon_{1} uniform over {1,+1}\{-1,+1\}. A standard Gaussian sequence is a sequence i.i.d. standard Gaussian random variables. Our main goal is to prove the following result.

for some universal C>0C>0, whenever the RHS of the above inequality is at most 11. This important result has been applied to several different problems, such as bringing a convex body to near-isotropic position ; the analysis of for low-rank approximations of matrices and graph sparsification ; estimating of singular values of matrices with independent rows ; analysing compressive sensing ; and related problems in Harmonic Analysis .

The key ingredient of the original proof of Theorem 1 is a non-commutative Khintchine inequality by Lust-Picard and Pisier . This states that there exists a universal c>0c>0 such that for all ZnZ_{n} as in the Theorem, all p1p\geq 1 and all d×dd\times d matrices {Bi,Di}i=1n\{B_{i},D_{i}\}_{i=1}^{n} with Bi+Di=AiB_{i}+D_{i}=A_{i}, 1in1\leq i\leq n,

where Sp\|\cdot\|_{S^{p}} denotes the pp-th Schatten norm: ASppTr[(AA)p/2].\|A\|^{p}_{S^{p}}\equiv{\rm Tr}[(A^{*}A)^{p/2}]. Unfortunately, the proof of the Lust-Picard/Pisier inequality employs language and tools from non-commutative probability that are rather foreign to most potential users of (2).

This note presents an elementary proof of Theorem 1 that bypasses the above inequality. Our argument is based on an improvement of the methodology created by Ahlswede and Winter in order to prove their operator Chernoff bound, which also has many applications e.g. (the improvement is discussed in Section 3.1). This approach only requires elementary facts from Linear Algebra and Matrix Analysis. The most complicated result that we use is the Golden-Thompspon inequality :

The elementary proof of this classical inequality is sketched in Section 5 below.

We have already noted that Rudelson’s bound (2) follows simply from Theorem 1; see [11, Section 3] for detais. Here we prove a concentration lemma corresponding to that result under the stronger assumption that Y1|Y_{1}| is a.s. bounded. While similar results have appeared in other papers , our proof is simpler and gives explicit (albeit quite large) constants.

whenever ϵ(n,M)1\epsilon(n,M)\leq 1. A key feature both of this Lemma is that the ambient dimension dd plays no direct role in the bound. In fact, the same result holds for YiY_{i} taking values in a separable Hilbert space (as in the last section of ).

To conclude the introduction, we present an open problem: is it possible to improve upon Rudelson’s bound under further assumptions? There is some evidence that the dependence on ln(d)\ln(d) in the Theorem, while necessary in general [12, Remark 3.4], can sometimes be removed. For instance, Adamczak et al. have improved upon Rudelson’s original application of Theorem 1 to convex bodies, obtaining exactly what one would expect in the absence of the log(2d)\sqrt{\log(2d)} term. Another setting where our bound is a Θ(lnd)\Theta\left(\sqrt{\ln d}\right) factor away from optimality is that of more classical random matrices (cf. the end of Section 3.1 below). It would be interesting if one could sharpen the proof of Theorem 1 in order to reobtain these results. [Related issues are raised by Vershynin .]

Preliminaries

Moreover, Tr(A){\rm Tr}(A) (the trace of AA) is the sum of the eigenvalues of AA.

By this we mean that the eigenvalues of f(A)f(A) are the numbers f(λ)f(\lambda) with λspec(A)\lambda\in{\rm spec}(A). Moreover, the multiplicity of ξspecf(A)\xi\in{\rm spec}f(A) is the sum of the multiplicities of all preimages of ξ\xi under ff that lie in spec(A){\rm spec}(A).

2 The positive-semidefinite order

Moreover, spectral mapping (4) implies that:

We will also need the following simple fact.

Proof: To prove this, assume the LHS and observe that the RHS is equivalent to Tr(CΔ)0{\rm Tr}(C\Delta)\geq 0 where ΔBA\Delta\equiv B-A. By assumption, Δ0\Delta\succeq 0, hence it has a Hermitian square root Δ1/2\Delta^{1/2}. The cyclic property of the trace implies:

Since the trace is the sum of the eigenvalues, we will be done once we show that Δ1/2CΔ1/20\Delta^{1/2}C\Delta^{1/2}\succeq 0. But, since Δ1/2\Delta^{1/2} is Hermitian and C0C\succeq 0,

which shows that Δ1/2CΔ1/20\Delta^{1/2}C\Delta^{1/2}\succeq 0, as desired. \Box

3 Probability with matrices

The definition of expectations implies that traces and expectations commute:

Moreover, one can check that the usual product rule is satisfied:

Proof of Theorem 1

Proof: [of Theorem 1] We wish to control the tail behavior of:

However, ZnZ_{n} and Zn-Z_{n} have the same distribution. It follows that:

The usual Bernstein trick implies that for all t0t\geq 0,

The function “xesxx\mapsto e^{sx}” is monotone non-decreasing and positive for all s0s\geq 0. It follows from the spectral mapping property (4) that for all s0s\geq 0, the largest eigenvalue of esZne^{sZ_{n}} is esλmax(Zn)e^{s\lambda_{\max}(Z_{n})} and all eigenvalues of esZne^{sZ_{n}} are non-negative. Using the equality “trace == sum of eigenvalues” implies that for all s0s\geq 0,

Up to now, our proof has followed Ahlswede and Winter’s argument. The next lemma, however, will require new ideas.

This lemma is proven below. We will now show how it implies Rudelson’s bound. Let

[The second inequality follows from i=1nAi20\sum_{i=1}^{n}A_{i}^{2}\succeq 0, which holds because of (5) and (6).] We note that:

where the equality is yet another application of spectral mapping (4) and the fact that “xes2x/2x\mapsto e^{s^{2}x/2}” is monotone increasing. We deduce from the Lemma and (10) that:

Since 0Zn2ln(2d)σ+(Zn2ln(2d)σ)+,0\leq\|Z_{n}\|\leq\sqrt{2\ln(2d)}\sigma+(\|Z_{n}\|-\sqrt{2\ln(2d)}\sigma)_{+}, this implies the LpL^{p} estimate in the Theorem. The bound “CpcpC_{p}\leq c\sqrt{p}” is standard and we omit its proof. \Box

Proof: [of Lemma 2] Define D0i=1ns2Ai2/2D_{0}\equiv\sum_{i=1}^{n}s^{2}A_{i}^{2}/2 and

We will prove that for all 1jn1\leq j\leq n:

By the monotonicity of the trace (7) and the fact that exp(Dj1)0\exp\left(D_{j-1}\right)\succeq 0 (which follows from (4)), we will be done once we show that:

The key fact is that sϵjAjs\epsilon_{j}A_{j} and s2Aj2/2-s^{2}A_{j}^{2}/2 always commute, hence the exponential of the sum is the product of the exponentials. Applying (9) and noting that es2Aj2/2e^{-s^{2}A_{j}^{2}/2} is constant, we see that:

which implies that f(Aj)If(A_{j})\preceq I. This proves (13) in this case and finishes the proof of (12) and of the Lemma. \Box

A direct adaptation of the original argument of Ahlswede and Winter would lead to an inequality of the form:

However, only the second inequality seems to be useful, as there is no obvious relationship between

which is what we would need to proceed with induction. [Note that Golden-Thompson (3) cannot be undone and fails for three summands, .] The best one can do with the second inequality is:

This would give a version of Theorem 1 with i=1nAi2\sum_{i=1}^{n}\|A_{i}\|^{2} replacing i=1nAi2\|\sum_{i=1}^{n}A_{i}^{2}\|. This modified result is always worse than the actual Theorem, and can be dramatically so. For instance, consider the case of a Wigner matrix where:

with the ϵij\epsilon_{ij} i.i.d. standard Gaussian and each AijA_{ij} has ones at positions (i,j)(i,j) and (j,i)(j,i) and zeros elsewhere (we take d=md=m and n=(m2)n=\binom{m}{2} in this case). Direct calculation reveals:

We note in passing that neither approach is sharp in this case, as ijϵijAij\|\sum_{ij}\epsilon_{ij}A_{ij}\| concentrates around 2m2\sqrt{m} .

Concentration for rank-one operators

By Jensen’s inequality, ϕ(2Ms2/n)ϕ(s)2M2s/n\phi(2Ms^{2}/n)\leq\phi(s)^{2M^{2}s/n} whenever 2M2s/n12M^{2}s/n\leq 1, hence (14) implies:

and a few simple calculations. [Notice that 2M2s/n1/22M^{2}s/n\leq 1/2 with this choice, hence 1/(12M2s/n)21/(1-2M^{2}s/n)\leq 2.]

To prove (14), we begin with symmetrization (see e.g. ):

where {ϵi}i=1n\{\epsilon_{i}\}_{i=1}^{n} is a Rademacher sequence independent of Y1,,YnY_{1},\dots,Y_{n}. Let S\mathcal{S} be the (random) span of Y1,,YnY_{1},\dots,Y_{n} and TrS{\rm Tr}_{\mathcal{S}} denote the trace operation on linear operators mapping S\mathcal{S} to itself. Following the argument in Theorem 1, we notice that:

using spectral mapping (4), the equality “trace == sum of eigenvalues” and the fact that S\mathcal{S} has dimension n\leq n. A quick calculation shows that 0(YiYi)2=Yi2YiYiM2YiYi0\preceq(Y_{i}Y_{i}^{*})^{2}=|Y_{i}|^{2}\,Y_{i}Y_{i}^{*}\preceq M^{2}Y_{i}Y_{i}^{*}, hence (5) implies:

Proof sketch for Golden-Thompson inequality

As promised in the Introduction, we sketch an elementary proof of inequality (3). We will need the Trotter-Lie formula, a simple consequence of the Taylor formula for eXe^{X}:

Inequality (3) follows from letting k+k\to+\infty, using (15) and noticing that Tr(){\rm Tr}(\cdot) is continuous.

References