Sums of random Hermitian matrices and an inequality by Rudelson
Roberto Imbuzeiro Oliveira
Introduction
This note mainly deals with estimates for the operator norm of random sums
of deterministic Hermitian matrices multiplied by random coefficients. Recall that a Rademacher sequence is a sequence of i.i.d. random variables with uniform over . A standard Gaussian sequence is a sequence i.i.d. standard Gaussian random variables. Our main goal is to prove the following result.
for some universal , whenever the RHS of the above inequality is at most . This important result has been applied to several different problems, such as bringing a convex body to near-isotropic position ; the analysis of for low-rank approximations of matrices and graph sparsification ; estimating of singular values of matrices with independent rows ; analysing compressive sensing ; and related problems in Harmonic Analysis .
The key ingredient of the original proof of Theorem 1 is a non-commutative Khintchine inequality by Lust-Picard and Pisier . This states that there exists a universal such that for all as in the Theorem, all and all matrices with , ,
where denotes the -th Schatten norm: Unfortunately, the proof of the Lust-Picard/Pisier inequality employs language and tools from non-commutative probability that are rather foreign to most potential users of (2).
This note presents an elementary proof of Theorem 1 that bypasses the above inequality. Our argument is based on an improvement of the methodology created by Ahlswede and Winter in order to prove their operator Chernoff bound, which also has many applications e.g. (the improvement is discussed in Section 3.1). This approach only requires elementary facts from Linear Algebra and Matrix Analysis. The most complicated result that we use is the Golden-Thompspon inequality :
The elementary proof of this classical inequality is sketched in Section 5 below.
We have already noted that Rudelson’s bound (2) follows simply from Theorem 1; see [11, Section 3] for detais. Here we prove a concentration lemma corresponding to that result under the stronger assumption that is a.s. bounded. While similar results have appeared in other papers , our proof is simpler and gives explicit (albeit quite large) constants.
whenever . A key feature both of this Lemma is that the ambient dimension plays no direct role in the bound. In fact, the same result holds for taking values in a separable Hilbert space (as in the last section of ).
To conclude the introduction, we present an open problem: is it possible to improve upon Rudelson’s bound under further assumptions? There is some evidence that the dependence on in the Theorem, while necessary in general [12, Remark 3.4], can sometimes be removed. For instance, Adamczak et al. have improved upon Rudelson’s original application of Theorem 1 to convex bodies, obtaining exactly what one would expect in the absence of the term. Another setting where our bound is a factor away from optimality is that of more classical random matrices (cf. the end of Section 3.1 below). It would be interesting if one could sharpen the proof of Theorem 1 in order to reobtain these results. [Related issues are raised by Vershynin .]
Preliminaries
Moreover, (the trace of ) is the sum of the eigenvalues of .
By this we mean that the eigenvalues of are the numbers with . Moreover, the multiplicity of is the sum of the multiplicities of all preimages of under that lie in .
2 The positive-semidefinite order
Moreover, spectral mapping (4) implies that:
We will also need the following simple fact.
Proof: To prove this, assume the LHS and observe that the RHS is equivalent to where . By assumption, , hence it has a Hermitian square root . The cyclic property of the trace implies:
Since the trace is the sum of the eigenvalues, we will be done once we show that . But, since is Hermitian and ,
which shows that , as desired.
3 Probability with matrices
The definition of expectations implies that traces and expectations commute:
Moreover, one can check that the usual product rule is satisfied:
Proof of Theorem 1
Proof: [of Theorem 1] We wish to control the tail behavior of:
However, and have the same distribution. It follows that:
The usual Bernstein trick implies that for all ,
The function “” is monotone non-decreasing and positive for all . It follows from the spectral mapping property (4) that for all , the largest eigenvalue of is and all eigenvalues of are non-negative. Using the equality “trace sum of eigenvalues” implies that for all ,
Up to now, our proof has followed Ahlswede and Winter’s argument. The next lemma, however, will require new ideas.
This lemma is proven below. We will now show how it implies Rudelson’s bound. Let
[The second inequality follows from , which holds because of (5) and (6).] We note that:
where the equality is yet another application of spectral mapping (4) and the fact that “” is monotone increasing. We deduce from the Lemma and (10) that:
Since this implies the estimate in the Theorem. The bound “” is standard and we omit its proof.
Proof: [of Lemma 2] Define and
We will prove that for all :
By the monotonicity of the trace (7) and the fact that (which follows from (4)), we will be done once we show that:
The key fact is that and always commute, hence the exponential of the sum is the product of the exponentials. Applying (9) and noting that is constant, we see that:
which implies that . This proves (13) in this case and finishes the proof of (12) and of the Lemma.
A direct adaptation of the original argument of Ahlswede and Winter would lead to an inequality of the form:
However, only the second inequality seems to be useful, as there is no obvious relationship between
which is what we would need to proceed with induction. [Note that Golden-Thompson (3) cannot be undone and fails for three summands, .] The best one can do with the second inequality is:
This would give a version of Theorem 1 with replacing . This modified result is always worse than the actual Theorem, and can be dramatically so. For instance, consider the case of a Wigner matrix where:
with the i.i.d. standard Gaussian and each has ones at positions and and zeros elsewhere (we take and in this case). Direct calculation reveals:
We note in passing that neither approach is sharp in this case, as concentrates around .
Concentration for rank-one operators
By Jensen’s inequality, whenever , hence (14) implies:
and a few simple calculations. [Notice that with this choice, hence .]
To prove (14), we begin with symmetrization (see e.g. ):
where is a Rademacher sequence independent of . Let be the (random) span of and denote the trace operation on linear operators mapping to itself. Following the argument in Theorem 1, we notice that:
using spectral mapping (4), the equality “trace sum of eigenvalues” and the fact that has dimension . A quick calculation shows that , hence (5) implies:
Proof sketch for Golden-Thompson inequality
As promised in the Introduction, we sketch an elementary proof of inequality (3). We will need the Trotter-Lie formula, a simple consequence of the Taylor formula for :
Inequality (3) follows from letting , using (15) and noticing that is continuous.