Hanson-Wright inequality and sub-gaussian concentration
Mark Rudelson, Roman Vershynin
Hanson-Wright inequality
Hanson-Wright inequality is a general concentration result for quadratic forms in sub-gaussian random variables. A version of this theorem was first proved in , however with one weak point mentioned in Remark 1.2. In this article we give a modern proof of Hanson-Wright inequality, which automatically fixes the original weak point. We then deduce a useful concentration inequality for sub-gaussian random vectors, and illustrate it with two applications.
One of the aims of this note is to give a simple and self-contained proof of the Hanson–Wright inequality using only the standard toolkit of the large deviation theory. Several partial results and alternative proofs are scattered in the literature.
Improving upon an earlier result on Hanson-Wright , Wright established a slightly weaker version of Theorem 1.1. Instead of , both papers had in the right side. The latter norm can be much larger than the norm of , and it is often less easy to compute. This weak point went unnoticed in several later applications of Hanson-Wright inequality, however it was clear to experts that it could be fixed.
A proof for the case where are independent symmetric Bernoulli random variables appears in the lecture notes of Nelson . The moment inequality which essentially implies the result of can be also found in . A different approach to Hanson-Wright inequality, due to Rauhut and Tropp, can be found in [8, Proposition 8.13]. It is presented for diagonal-free matrices (however this assumption can be removed by treating the diagonal separately as is done below), and for independent symmetric Bernoulli random variables (but the proof can be extended to sub-gaussian random variables).
The paper contains an alternative short proof of Hanson–Wright inequality due to Latala for diagonal-free matrices. Like in the proof below, Latala’s argument uses decoupling of the order 2 chaos. However, unlike the current paper, which uses a simple decoupling argument of Bourgain , his proof uses a more general and more difficult decoupling theorem for U-statistics due to de la Peña and Montgomery-Smith . For an extensive discussion of modern decoupling methods see .
Large deviation inequalities for polynomials of higher degree, which extend the Hanson-Wright type inequalities, have been obtained by Latala and recently by Adamczak and Wolff .
By replacing with we can assume without loss of generality that . Let us first estimate
Let . By independence and zero mean of , we can represent
The problem reduces to estimating the diagonal and off-diagonal sums:
These standard bounds can be found in [18, Remark 5.18 and Lemma 5.14]. Then we can use a Bernstein-type inequality (see [18, Proposition 5.16]) and obtain
Step 2: decoupling. It remains to bound the off-diagonal sum
The argument will be based on estimating the moment generating function of by decoupling and reduction to normal random variables.
Let be a parameter whose value we will determine later. By Chebyshev’s inequality, we have
where denotes expectation with respect to both and . Consider the set of indices and express
Next, we use a standard estimate of the moment generating function of centered sub-gaussian random variables, see [18, Lemma 5.5]. It yields
Taking expectations of both sides with respect to , we obtain
Recall that this estimate holds for every fixed . It remains to estimate .
Step 3: reduction to normal random variables. Consider where are independent random variables. The rotation invariance of normal distribution implies that for each fixed and , we have
Rearranging the terms, we can write Z=\sum_{i\in\Lambda_{\delta}}X_{i}\Big{(}\sum_{j\in\Lambda_{\delta}^{c}}a_{ij}g_{j}\Big{)}. Then we can bound the moment generating function of in the same way we bounded the moment generating function of in Step 2, only now relying on the sub-gaussian properties of , . We obtain
Recall that this bound holds for each fixed . We have removed the original random variables from the problem, so it now becomes a problem about normal random variables .
Step 4: calculation for normal random variables. By the rotation invariance of the distribution of , the random variable is distributed identically with where denote the singular values of . Hence by independence,
Using the numeric inequality which is valid for all , we can simplify this as follows:
This is a uniform bound for all . Now we take expectation with respect to . Recalling (1.3) and (1.4), we obtain the following estimate on the moment generating function of :
Step 5: conclusion. Putting this estimate into the exponential Chebyshev’s inequality (1.2), we obtain
Optimizing over , we conclude that
Now we combine with a similar estimate (1.1) for and obtain
Sub-gaussian concentration
Hanson-Wright inequality has a useful consequence, a concentration inequality for random vectors with independent sub-gaussian coordinates.
A few special cases of Theorem 2.1 can be easily deduced from classical concentration inequalities. For Gaussian random variables , this result is a standard consequence of Gaussian concentration, see e.g. . For bounded random variables , it can be deduced in a similar way from Talagrand’s concentration for convex Lipschitz functions , see [16, Theorem 2.1.13]. For more general random variables, one can find versions of Theorem 2.1 with varying degrees of generality scattered in the literature (e.g. the appendix of ). However, we were unable to find Theorem 2.1 in the existing literature.
Using a standard symmetrization argument, we can deduce from Theorem 2.1 some bounds on small ball probabilities. The following result is due to Latala et al. [12, Theorem 2.5].
The components of the random vector have mean zero, variances bounded below by and sub-gaussian norms bounded above by . Thus we can apply Theorem 2.1 for and conclude that
Suppose this event occurs. Then by triangle inequality, . Combining this with the second inequality in (2.3), we obtain that
Two applications
Concentration results like Theorem 2.1 have many useful consequences. We include two applications in this article; the reader will certainly find more.
The first application is a concentration of distance from a random vector to a fixed subspace. For random vectors with bounded components, one can find a similar result in [16, Corollary 2.1.19], where it was deduced from Talagrand’s concentration inequality.
Our second application of Theorem 2.1 is for operator norms of random matrices. The result essentially states that an matrix obtained as a product of a deterministic matrix and a random matrix with independent sub-gaussian entries satisfies
with high probability. For random matrices with heavy-tailed rather than sub-gaussian components, this problem was studied in .
The last part of the proof is a standard covering argument. Let be an -net of in the Euclidean metric. We can choose this net so that , see [18, Lemma 5.2]. By a union bound, with probability at least
The same holds if is an matrix such that .
We formulated the results in Sections 2 and 3 for real matrices and real valued random variables. Using a standard complexification trick, one can easily obtain complex versions of these results. Let us show how to complexify Theorem 2.1; the other applications follow from it.
Suppose now that both and are complex. Let us assume that the components have independent real and imaginary parts, such that