Spectral norm of products of random and deterministic matrices
Roman Vershynin
Introduction
This paper grew out of an attempt to understand the class of random matrices with non-independent entries, but which can be factorized through random matrices with independent entries. Equivalently, we are interested in sample covariance matrices of a wide class of random vectors – the linear transformations of vectors with independent entries.
For random matrices with independent and identically distributed entries, the spectral norm is well studied. Let be an matrix whose entries are real independent and identically distributed random variables with mean zero, variance and finite fourth moment. Estimates of the type
are known to hold (and are sharp) in both the limit regime for dimensions increasing to infinity, and the non-limit regime where the dimensions are fixed. The meaning of (1.1) in the limit regime is that, for a family of matrices as above whose dimensions and increase to infinity and whose aspect ratio converges to a constant, the ratio converges to almost surely .
In the non-limit regime, i.e. for arbitrary dimensions and , variants of (1.1) were proved by Y. Seginer and R. Latala . If is an matrix whose entries are i.i.d. mean zero random variables, then denoting the rows of by and the columns by , the result of Y. Seginer states that
where is an absolute constant. This estimate is sharp because is obviously bounded below by the Euclidean norm of any row and any column of . Furthermore, if the entries of the matrix are not necessarily identically distributed, then R. Latala’s result states that
In particular, if is an matrix whose entries are independent random variables with mean zero and fourth moments bounded by , then one can deduce from either Y. Seginer’s or R. Latala’s result that
This is a variant of (1.1) in the non-limit regime.
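As a numerical illustration only (it plays no role in the proofs), the following sketch compares the spectral norm of a matrix with independent random sign entries to the sum of the square roots of its dimensions. The helper names `spectral_norm`, `random_sign_matrix` and `norm_ratio`, the choice of random signs, and the dimensions are all assumptions made for this example. The norm is estimated by power iteration, which can only underestimate the true value.

```python
import math
import random

def spectral_norm(rows, iters=200, seed=0):
    """Estimate the largest singular value of a dense matrix (list of rows)
    by power iteration on A^T A. The value returned is ||A x|| for a unit
    vector x, so it never exceeds the true norm."""
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])
    x = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(iters):
        # y = A x
        y = [sum(r[j] * x[j] for j in range(n_cols)) for r in rows]
        # z = A^T y, then normalize to get the next iterate
        z = [sum(rows[i][j] * y[i] for i in range(n_rows)) for j in range(n_cols)]
        scale = math.sqrt(sum(v * v for v in z))
        x = [v / scale for v in z]
    y = [sum(r[j] * x[j] for j in range(n_cols)) for r in rows]
    return math.sqrt(sum(v * v for v in y))

def random_sign_matrix(n_rows, n_cols, seed=0):
    """Matrix of independent +1/-1 entries (mean zero, unit variance)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n_cols)] for _ in range(n_rows)]

def norm_ratio(n_rows, n_cols, seed=0):
    """Ratio of the estimated norm to sqrt(n_rows) + sqrt(n_cols)."""
    a = random_sign_matrix(n_rows, n_cols, seed)
    return spectral_norm(a) / (math.sqrt(n_rows) + math.sqrt(n_cols))

if __name__ == "__main__":
    for shape in [(40, 40), (80, 40), (120, 30)]:
        print(shape, round(norm_ratio(*shape), 3))
```

In such experiments the ratio is expected to stay close to one across aspect ratios, in line with the estimates discussed above.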
The fourth moment hypothesis is known to be necessary. Consider again a family of matrices whose dimensions and increase to infinity, and whose aspect ratio converges to a constant. If the entries are independent and identically distributed random variables with mean zero and infinite fourth moment, then the upper limit of the ratio is infinite almost surely .
2. The main result
The main result of this paper is an extension of the optimal bound (1.2) to the class of random matrices with non-independent entries, but which can be factored through a matrix with independent entries.
Let and let be positive integers. Consider a random matrix , where is an random matrix whose entries are independent random variables with mean zero and -th moment bounded by , and is an non-random matrix such that . Then
where is a function that depends only on .
1. An important feature of this result is that its conclusion is independent of the dimension .
2. The proof of Theorem 1.1 yields the stronger estimate
4. Under the stronger subgaussian moment assumption (1.6) on the entries, Theorem 1.1 is easy to prove using standard concentration and an -net argument. In contrast, if only some finite moment is assumed, we do not know any simple proof.
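The independence of the inner dimension noted in the remarks can be probed numerically. The sketch below is an illustration under assumptions that are not taken from the paper: the deterministic factor is built from the first rows of a normalized Sylvester Hadamard matrix (so its rows are orthonormal and its spectral norm equals one), the random factor has independent random sign entries, and the names `hadamard_rows`, `matmul`, `spectral_norm` and `product_norm_ratio` are hypothetical helpers.

```python
import math
import random

def hadamard_rows(m, N):
    """First m rows of the N x N Sylvester Hadamard matrix scaled by 1/sqrt(N).
    N must be a power of two and m <= N. The rows are orthonormal."""
    s = 1.0 / math.sqrt(N)
    return [[s * (-1) ** bin(i & k).count("1") for k in range(N)] for i in range(m)]

def matmul(b, a):
    """Dense product b @ a for matrices stored as lists of rows."""
    cols = list(zip(*a))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in b]

def spectral_norm(rows, iters=200, seed=0):
    """Power iteration on A^T A; returns ||A x|| for a unit vector x."""
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])
    x = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(iters):
        y = [sum(r[j] * x[j] for j in range(n_cols)) for r in rows]
        z = [sum(rows[i][j] * y[i] for i in range(n_rows)) for j in range(n_cols)]
        scale = math.sqrt(sum(v * v for v in z))
        x = [v / scale for v in z]
    y = [sum(r[j] * x[j] for j in range(n_cols)) for r in rows]
    return math.sqrt(sum(v * v for v in y))

def product_norm_ratio(m, n, N, seed=0):
    """Norm of the m x n product (m x N deterministic) @ (N x n random signs),
    divided by sqrt(n) + sqrt(m)."""
    rng = random.Random(seed)
    a = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(N)]
    w = matmul(hadamard_rows(m, N), a)
    return spectral_norm(w) / (math.sqrt(n) + math.sqrt(m))

if __name__ == "__main__":
    for N in (32, 64, 128, 256):
        print(N, round(product_norm_ratio(16, 32, N), 3))
```

The entries of the product are not independent here, yet the ratio is expected to remain of order one as the inner dimension grows.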
3. The smallest singular value
Our main motivation for Theorem 1.1 was to complete the analysis of the smallest singular value of random rectangular matrices carried out by M. Rudelson and the author in . The smallest singular value of a matrix can be equivalently described as .
Analyzing the smallest singular value is generally harder than analyzing the largest one (the spectral norm). The analogue of (1.1) for the smallest singular value of random matrices (for ) is
The optimal limit version of this result proved in holds under exactly the same hypotheses as (1.1) – for i.i.d. entries with mean zero, variance and finite fourth moment.
Many papers addressed (1.5) for fixed dimensions , . Sufficiently tall matrices ( for sufficiently large ) were studied in ; extensions to genuinely rectangular matrices ( for some ) were studied in , with gradually improving dependence on . An optimal version of (1.5) for all dimensions was obtained in . All these works impose moment assumptions on the entries of the matrix that are somewhat stronger than the fourth moment. A convenient assumption is that the entries are subgaussian random variables. This means that all their moments are bounded by the corresponding moments of the standard normal random variable, i.e.
where is called the subgaussian moment. It was proved in that if the entries of are i.i.d. mean zero subgaussian random variables with unit variance, then for every one has
where depend only on the subgaussian moment . In particular, for such matrices we have
where depends only on the desired probability and the subgaussian moment. This result encompasses the case of square matrices where and hence (1.8) yields . For Gaussian square matrices this optimal bound was obtained in and ; for general square matrices a weaker bound was obtained in and the best bound as above in ; the estimate is shown to be optimal in .
Whether (1.8) holds under weaker moment assumptions was only known in the case of square matrices. It was proved in using (1.2) that (1.8) holds under the fourth moment assumption for square matrices, i.e. for . Whether the same is true for arbitrary rectangular matrices under the fourth moment assumption was left open in . The bottleneck of the argument occurred in [28, Proposition 7.3], where we needed a correct bound on the spectral norm of a product of a random matrix and a fixed orthogonal projection. Such a bound was easy to get only under the subgaussian hypothesis. Theorem 1.1 of the present paper extends the argument of for random matrices with bounded -th moment. It follows directly from the argument of and Theorem 1.1.
Let and be positive integers. Let be a random matrix whose entries are i.i.d. random variables with mean zero, unit variance and -th moment bounded by . Then, for every there exist and which depend only on , and , and such that
This result follows by the argument in , where one considers probability estimates conditional on the event that the norm of a product of a random matrix and a non-random orthogonal projection is small (see [28, Proposition 7.3]).
After this paper was written, two important related results appeared on the universality of the smallest singular value in two extreme regimes – for almost square matrices and for genuinely rectangular matrices. One of these results, by T. Tao and V. Vu, works for square and almost square matrices where the defect is constant. It is valid for matrices with i.i.d. entries with mean zero, unit variance and bounded -th moment where is a sufficiently large absolute constant. The result states that the smallest singular value of such matrices is asymptotically the same as that of the Gaussian matrix of the same dimensions and with i.i.d. standard normal entries. Specifically,
This universality result, combined with the known asymptotic estimates of the smallest singular value of Gaussian matrices allows one to obtain bounds sharper than in Corollary 1.2. However, the universality result of is only known in the almost square regime (and under stronger moment assumptions), while Corollary 1.2 is valid for all dimensions .
Another recent universality result was obtained by O. Feldheim and S. Sodin for genuinely rectangular matrices, i.e. with aspect ratio separated from by a constant, and with subgaussian i.i.d. entries. In particular they proved the inequality
Deviation inequalities (1.7) and (1.10) complement each other – the former is multiplicative (and is valid for arbitrary dimensions) while the latter is additive (and is applicable for genuinely rectangular matrices). Each of these two inequalities clearly has the regime where it is stronger.
4. Outline of the argument
This bound is already independent of the dimension , but is off by from being optimal. The logarithmic term is unfortunately a limitation of this method. This term comes from M. Rudelson’s result, Theorem 3.1 below, where it is needed in full generality. It would be useful to understand the situations where the logarithmic term can be removed from M. Rudelson’s theorem. So far, only one such situation is known from where the independent random vectors are uniformly distributed in a convex body.
The advantage of almost square matrices is that the magnitude of their entries is easy to control. A simple consequence of the -th moment hypothesis and Markov’s inequality yields that the entries of satisfy with high probability. Note that the same estimate holds for square matrices () under the fourth moment assumption. So, in regard to the magnitude of entries, almost square matrices are similar to exactly square matrices, for which the desired bound follows from R. Latala’s result (1.2).
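The control of the largest entry mentioned above is a one-line consequence of Markov's inequality and the union bound. The derivation below is a sketch in assumed notation: the random matrix is written $A = (a_{ij})$ with dimensions $N \times n$, and $p$ stands for the moment exponent in the hypothesis, with $\mathbb{E}|a_{ij}|^p \le 1$.

```latex
% Assumed notation: A = (a_{ij}) is N x n, and E|a_{ij}|^p <= 1 for the
% moment exponent p of the hypothesis.
\mathbb{P}\Big\{ \max_{i,j} |a_{ij}| > t \Big\}
  \le \sum_{i,j} \mathbb{P}\big\{ |a_{ij}| > t \big\}
  \le \sum_{i,j} \frac{\mathbb{E}|a_{ij}|^{p}}{t^{p}}
  \le \frac{Nn}{t^{p}}, \qquad t > 0.
% Consequently, for any \delta \in (0,1), the choice t = (Nn/\delta)^{1/p}
% gives \max_{i,j} |a_{ij}| \le (Nn/\delta)^{1/p} with probability at least 1 - \delta.
```

For a square matrix and $p = 4$ this yields entries of order $\sqrt{n}$ with high probability, which is the scale referred to in the text.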
This prompts us to construct the proof of Theorem 1.1 for almost square matrices similarly to R. Latala’s argument in , i.e. using fairly standard concentration of measure results in the Gauss space, coupled with delicate constructions of nets. We first decompose into a sum of matrices which contain entries of similar magnitude. As the magnitude increases, these matrices become sparser. This quickly reduces the problem to random sparse matrices, whose entries are i.i.d. random variables valued in . The spectral norm of random sparse matrices was studied in as a development of the work of Z. Furedi and J. Komlos . However, we need to bound the spectral norm of the matrix rather than . Independence of entries is not available for , which makes it difficult to use the known combinatorial methods based on bounding the trace of high powers of .
Acknowledgement
The author is grateful to the referee for careful reading of the manuscript, and for many suggestions which greatly improved the presentation.
Preliminaries
Throughout the paper, the results are stated and proved over the field of real numbers. They are easy to generalize to complex numbers.
We denote by positive absolute constants, and by positive quantities that may depend only on the parameter . Their values can change from line to line.
2. Concentration of measure
The method that we carry out in Section 4 uses concentration in the Gauss space in combination with constructions of ε-nets. Here we recall some basic facts we need.
where is an absolute constant.
As a very restrictive but useful example, Theorem 2.1 implies the following deviation inequality for sums of independent exponential random variables (which can also be derived by the more standard approach via moment generating functions).
Let be a vector of real numbers, and let be independent standard normal random variables. Then, for every we have
Another classical deviation inequality we will need is Bennett’s inequality, see e.g. [9, Theorem 2]:
Let be independent mean zero random variables such that for all . Consider the sum and let . Then, for every we have
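To make the shape of Bennett's inequality concrete, the following sketch evaluates one common textbook form of the bound and compares it with an exact tail probability. The form used here is an assumption: for independent mean zero variables bounded by K in absolute value, with total variance sigma^2, the tail of the sum is bounded by exp(-(sigma^2/K^2) h(Kt/sigma^2)), where h(u) = (1+u) log(1+u) - u. The function names and the random sign example are hypothetical.

```python
import math

def bennett_bound(t, variance_sum, K=1.0):
    """Assumed textbook form of Bennett's bound for P(S > t), where S is a sum
    of independent mean zero variables with |X_k| <= K and total variance
    `variance_sum`: exp(-(sigma^2 / K^2) * h(K t / sigma^2))."""
    u = K * t / variance_sum
    h = (1.0 + u) * math.log1p(u) - u
    return math.exp(-variance_sum / (K * K) * h)

def rademacher_tail(n, t):
    """Exact P(S >= t) for S a sum of n independent random signs."""
    # S = 2*H - n, where H ~ Binomial(n, 1/2) counts the +1 signs.
    return sum(math.comb(n, h) for h in range(n + 1) if 2 * h - n >= t) / 2 ** n

if __name__ == "__main__":
    n = 40  # each sign has variance 1, so the total variance is n
    for t in (2, 6, 12, 20, 30):
        print(t, rademacher_tail(n, t), bennett_bound(t, float(n)))
```

For small deviations the bound behaves like a Gaussian tail, while for large deviations it decays like a Poisson tail, which is what makes it useful for sparse sums.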
We will also need M. Talagrand’s concentration inequality for convex Lipschitz functions from [31, Theorem 6.6]; see also [18, Corollary 4.10] and the discussion below it.
3. Nets
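Consider a subset $K$ of a normed space $X$, and let $\varepsilon > 0$. Recall that an $\varepsilon$-net of $K$ is a subset $\mathcal{N}$ of $K$ such that the distance from any point of $K$ to $\mathcal{N}$ is at most $\varepsilon$. In other words, for every $x \in K$ there exists $y \in \mathcal{N}$ such that $\|x - y\| \le \varepsilon$.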
The following estimate follows by a volumetric argument, see e.g. the proof of Lemma 9.5 in .
When computing norms of linear operators, ε-nets provide a convenient discretization of the problem. We formalize it in the next proposition.
Let be a linear operator between normed spaces and , and let be an -net of either the unit sphere or the unit ball of for some . Then
We give the proof for an -net of the unit sphere; the case of the unit ball is similar. Every has the form , where and . Since , the triangle inequality yields
The last term in the right hand side is bounded by . Thus we have shown that
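The discretization principle can be checked by hand in the plane. The sketch below assumes the bound takes its standard form, namely that the operator norm is at most (1 - eps)^(-1) times the supremum over an eps-net of the unit sphere. It uses equally spaced points on the unit circle as the net and a 2 x 2 matrix whose norm is known in closed form. All function names are hypothetical.

```python
import math

def op_norm_2x2(a, b, c, d):
    """Exact spectral norm of [[a, b], [c, d]] via the eigenvalues of A^T A."""
    s = a * a + b * b + c * c + d * d   # trace of A^T A
    det = a * d - b * c                 # det(A^T A) = det^2
    return math.sqrt((s + math.sqrt(max(s * s - 4 * det * det, 0.0))) / 2)

def circle_net(k):
    """k equally spaced points on the unit circle. Every point of the circle is
    within angle pi/k of a net point, i.e. within chord length
    eps = 2*sin(pi/(2k)), so this is an eps-net of the circle."""
    pts = [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k))
           for i in range(k)]
    return pts, 2 * math.sin(math.pi / (2 * k))

def net_sup(mat, pts):
    """Largest Euclidean norm of A y over the net points y."""
    a, b, c, d = mat
    return max(math.hypot(a * x + b * y, c * x + d * y) for x, y in pts)

if __name__ == "__main__":
    mat = (2.0, 1.0, 0.0, 3.0)
    pts, eps = circle_net(12)
    lower = net_sup(mat, pts)
    print(lower, op_norm_2x2(*mat), lower / (1 - eps))
```

The true norm lies between the supremum over the net and that supremum divided by 1 - eps, and the gap closes as the net is refined.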
4. Symmetrization
We will use the standard symmetrization technique as was done in ; see more general inequalities in e.g. [19, Section 6.1]. To this end, let the matrices and be as in Theorem 1.1. Let be an independent copy of , and let be independent symmetric Bernoulli random variables. Then, by Jensen’s inequality,
Therefore, we can assume without loss of generality in Theorem 1.1 that are symmetric random variables. Furthermore, let be independent standard normal random variables. Then, again by Jensen’s inequality,
Conditioning on , we thus reduce the problem to random Gaussian matrices.
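For orientation, the two symmetrization steps can be written out as follows. This is a sketch of the standard argument in assumed notation ($A = (a_{ij})$, $A' = (a'_{ij})$ an independent copy, $\varepsilon_{ij}$ independent symmetric Bernoulli variables, $g_{ij}$ independent standard normal variables), and the constants may differ from those used in the paper.

```latex
% Step 1: Jensen's inequality in A', then symmetry of a_{ij} - a'_{ij}.
\begin{aligned}
\mathbb{E}\|BA\|
  &= \mathbb{E}\|B(A - \mathbb{E}A')\|
   \le \mathbb{E}\|B(A - A')\| \\
  &= \mathbb{E}\big\|B\big(\varepsilon_{ij}(a_{ij} - a'_{ij})\big)\big\|
   \le 2\,\mathbb{E}\big\|B(\varepsilon_{ij} a_{ij})\big\|.
\end{aligned}
% Step 2: E|g_{ij}| = \sqrt{2/\pi}, Jensen's inequality in g, and the fact
% that \varepsilon_{ij}|g_{ij}| has the same distribution as g_{ij}.
\begin{aligned}
\mathbb{E}\big\|B(\varepsilon_{ij} a_{ij})\big\|
  &= \sqrt{\pi/2}\;\mathbb{E}\big\|B\big(\varepsilon_{ij}\,\mathbb{E}|g_{ij}|\,a_{ij}\big)\big\| \\
  &\le \sqrt{\pi/2}\;\mathbb{E}\big\|B\big(\varepsilon_{ij}|g_{ij}|\,a_{ij}\big)\big\|
   = \sqrt{\pi/2}\;\mathbb{E}\big\|B(g_{ij} a_{ij})\big\|.
\end{aligned}
```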
We will use a similar symmetrization technique several times in our argument. In particular, in the proof of Lemma 3.8 we apply the following observation, which can be deduced from the standard symmetrization lemma ([19, Lemma 6.3]) and the contraction principle ([19, Theorem 4.4]). For the reader’s convenience we include a direct proof.
Consider independent mean zero random variables such that , independent symmetric Bernoulli random variables , and vectors in some Banach space, where both and range in some finite index sets. Then
To be specific, we can assume that both indices and range in the interval for some integer . Let denote an independent copy of the sequence of random variables . Then are symmetric random variables. We have
is a convex function. Therefore, on the compact convex set it attains its maximum on the extreme points, where all . By symmetry, the function takes the same value at each extreme point, which equals
5. Truncation and conditioning
We will need some elementary observations related to truncation and conditioning of random variables.
Let be a non-negative random variable, and let , . Then
We will also need two elementary conditioning lemmas. In Section 4, we will need to control the maximal magnitude of the entries of the random matrix . Conditioning on will unfortunately destroy the independence of the entries. So, we will instead condition on an event for fixed , which will clearly preserve the independence. This conditional argument used in the proof of Corollary 4.11 relies on the following two elementary lemmas.
Let be a random variable and be a real number. Then
Let , be non-negative random variables. Assume there exists such that one has for every :
Without loss of generality we can assume that by rescaling to . Thus we have for every :
By (2.3) and Hölder’s inequality, the first term is bounded as
Further terms can be estimated using the Cauchy-Schwarz inequality, (2.3), and the second inequality in (2.2). Indeed,
6. On the deterministic matrix B in Theorem 1.1
We start with two initial observations that will make our proof of Theorem 1.1 more transparent. By adding an appropriate number of zero rows to or zero columns to we can assume without loss of generality that , thus is an matrix.
where denotes the Hilbert-Schmidt norm. Throughout the argument, we will only have access to the matrix through inequalities (2.4). This explains Remark 2 following Theorem 1.1, which states that the range space of is irrelevant as long as we control the spectral and Hilbert-Schmidt norms of .
Approach via M. Rudelson’s theorem
In particular, for every , with probability at least one has
The first estimate is taken from [22, inequality (3.4)]. The second estimate can be easily derived from it using the following elementary lemma:
Suppose a non-negative random variable satisfies for some that
Next, if then, by choosing the absolute constant sufficiently small, the right hand side of (3.1) is larger than for a sufficiently small absolute constant . Therefore, for every one has
because if then the right hand side of (3.1) is larger than one, which makes the inequality trivial. This completes the proof. ∎
The next lemma is a consequence of M. Rudelson’s Theorem 3.1 and a standard symmetrization argument.
Let be independent symmetric Bernoulli random variables. By the triangle inequality, the standard symmetrization argument (see e.g. [19, Lemma 6.3]), and the assumption, we have
Now we take the expectation with respect to and use the Cauchy-Schwarz inequality to get
2. Theorem 1.1 up to a logarithmic term
We now state a version of Theorem 1.1 with a logarithmic factor.
Let be positive integers. Consider an random matrix whose entries are independent random variables with mean zero and -th moment bounded by . Let be an matrix such that . Then
The proof will need two auxiliary lemmas. Recall that denote the columns of the matrix .
The estimate on the expectation follows easily from (2.4):
To estimate the variance, we need to compute
By independence and the mean zero assumption, the only nonzero terms in this sum are those for which or or . Therefore
By the fourth moment assumption and using (2.4) we have
This result says that all columns of the matrix have norm with high probability. Since the spectral norm of a matrix is bounded below by the norm of any column, this result is a necessary step in proving our desired estimate .
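The expectation computation behind this column estimate is short enough to record. The sketch below uses assumed notation: $B_k$ denotes the $k$-th column of $B$, $A_j = (a_{kj})_k$ the $j$-th column of $A$, and the entries satisfy $\mathbb{E}a_{kj} = 0$ and $\mathbb{E}a_{kj}^2 \le 1$ (which follows from a fourth moment bounded by one).

```latex
% Cross terms vanish by independence and the mean zero assumption.
\mathbb{E}\,\|B A_j\|_2^2
  = \mathbb{E}\,\Big\| \sum_{k} a_{kj} B_k \Big\|_2^2
  = \sum_{k,l} \mathbb{E}(a_{kj} a_{lj})\,\langle B_k, B_l \rangle
  = \sum_{k} \mathbb{E}\,a_{kj}^2\,\|B_k\|_2^2
  \le \|B\|_{\mathrm{HS}}^2 .
```

Thus each column of the product has squared norm at most the squared Hilbert-Schmidt norm of $B$ on average, regardless of the inner dimension.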
Let us fix and use Lemma 3.5. This gives
Now we use Chebyshev’s inequality, which states that for a random variable with and for an arbitrary , one has
Let be arbitrary. Using Chebyshev’s inequality along with (3.5) for , , we obtain
Taking the union bound over all , we conclude that
This shows that condition (3.2) holds. Lemma 3.3 then gives
Estimating the maximum in the right hand side using Lemma 3.6, we conclude that
3. Tradeoff between the matrix norm and the magnitude of entries
We would like now to gain more control over the logarithmic factor than we have in Proposition 3.4. Our next result establishes a tradeoff between the logarithmic factor and the magnitude of the matrices , . It will be used in the proof of Theorem 3.9.
Let and be positive integers. Let be an matrix whose entries are independent random variables with mean zero and such that
Let be an matrix such that , and whose columns satisfy
The proof will again be based on M. Rudelson’s Theorem 3.1, although this time we use Rudelson’s theorem in a more delicate way:
Under the assumptions of Proposition 3.7, we have
Next, clearly , so
where denote independent symmetric Bernoulli random variables.
Let . By the second part of M. Rudelson’s Theorem 3.1 and taking the union bound over random variables, we conclude that, with probability at least , we have
The second estimate follows from (3.7) and since by the hypothesis.
Let be arbitrary. We apply the above estimate for chosen so that . This shows that, with probability at least , one has
Putting this into (3.9) and, together with (3.8), back into (3.6), we complete the proof. ∎
which does not depend on the random variables , has expectation
We condition on the random variables ; this fixes a value of .
Consider a -net of the unit Euclidean sphere of cardinality , which exists by Lemma 2.5. Using Proposition 2.6, we have
Fix . For every , the random variable
is a Gaussian random variable with mean zero and variance
(To obtain the first inequality, take the supremum over ). Therefore, by Corollary 2.2 with , we have for every :
Let be arbitrary. The previous estimate for gives
Taking the union bound over and using (3.11), we obtain
Finally, we take expectation with respect to the random variables and use (3.10) to conclude that
4. Theorem 1.1 for logarithmically small columns
Our next step is to combine Propositions 3.4 and 3.7 and obtain a weaker version of the main Theorem 1.1 – this time with the correct bound on the norm, but under the additional assumption that the columns of the matrix are logarithmically small.
Let and let be positive integers. Consider an random matrix whose entries are independent random variables with mean zero and -th moment bounded by . Let be an matrix such that , and whose columns satisfy for some that
By the symmetrization argument described in Section 2, we can assume without loss of generality that all entries of the matrix are symmetric random variables. Let
We decompose every entry of the matrix according to its absolute value as
The norm of can be bounded using Proposition 3.7, which we can apply with as above and . This gives
where the last inequality follows by our choice of and .
Putting the two estimates together, we conclude by the triangle inequality that
The factor in the conclusion of Theorem 3.9 can easily be improved to about by choosing in the proof and optimizing in . We will not need this improvement in our argument.
Approach via concentration
In this section, we develop an alternative way to bound the norm of , which rests on Gaussian concentration inequalities and an elaborate choice of ε-nets. The main technical result of this section is the following theorem, which, like Theorem 3.9, gives the correct bound under some boundedness assumptions on the entries of .
Let , and let be positive integers such that . Consider an random matrix whose entries are independent random variables with mean zero and such that
Let be an matrix such that . Then
where depends only on .
1. If the entries have bounded -th moment, it is easy to check that holds with high probability. Therefore, under the -th moment assumption, the hypotheses of Theorem 4.1 are satisfied for almost square matrices, i.e. those for which . This will quickly yield the main Theorem 1.1 for almost square matrices, see Corollary 4.11 below.
3. Using M. Talagrand’s concentration result, Theorem 2.4, one can also obtain tail bounds for the norm :
Under the assumptions of Theorem 4.1, one has for every :
In particular, one has for every :
Theorem 4.1 will follow from our analysis of sparse matrices. We will decompose the entries according to their magnitude. As the magnitude increases, the moment assumptions will ensure that there will be fewer such entries, i.e. the resulting matrix becomes sparser.
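The decomposition by magnitude can be sketched as follows. This is an illustration only: the dyadic thresholds, the number of layers, the heavy-tailed test distribution (symmetric, with tail probability (2t)^(-5), so that moments of order below five are finite) and the function names are assumptions made for the example, not the thresholds used in the proof.

```python
import random

def magnitude_layers(matrix, num_layers):
    """Split a matrix entrywise into `num_layers` matrices that sum to it.
    Layer 0 keeps entries with |a| <= 1; layer k (1 <= k <= num_layers - 2)
    keeps entries with 2^(k-1) < |a| <= 2^k; the last layer keeps every
    entry larger than 2^(num_layers - 2) in absolute value."""
    def level(a):
        k = 0
        while abs(a) > 2 ** k and k < num_layers - 1:
            k += 1
        return k
    return [[[a if level(a) == k else 0.0 for a in row] for row in matrix]
            for k in range(num_layers)]

def heavy_tailed_matrix(n_rows, n_cols, seed=0):
    """Independent symmetric entries with P(|a| > t) = (2t)^(-5) for t >= 1/2."""
    rng = random.Random(seed)
    def entry():
        u = 1.0 - rng.random()          # uniform on (0, 1]
        return rng.choice((-1.0, 1.0)) * 0.5 * u ** (-0.2)
    return [[entry() for _ in range(n_cols)] for _ in range(n_rows)]

def nonzero_counts(layers):
    """Number of nonzero entries in each layer."""
    return [sum(1 for row in layer for a in row if a != 0.0) for layer in layers]

if __name__ == "__main__":
    layers = magnitude_layers(heavy_tailed_matrix(100, 100), 4)
    print(nonzero_counts(layers))
```

Each entry lands in exactly one layer, and the layers holding larger entries contain far fewer nonzero entries, which is the sparsity effect described above.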
We start with an elementary lemma, which will help us analyze the magnitude of the rows and columns of the matrix when is a sparse matrix.
Let be positive integers. Consider independent random variables , , . Let , and suppose that
Let be an matrix such that , whose columns are denoted . Then
We will only prove inequality (4.2); the proof of inequality (4.1) is similar. By the assumptions, we have
Consider the sums of independent random variables
The above estimates show that for every we have
Taking the union bound over all , we conclude that
Now let be arbitrary, and use the last inequality for . We obtain
The estimates in Lemma 4.3 motivate us to consider the class of matrices whose entries satisfy the following inequalities for some parameters and :
2. Concentration for a fixed vector
Our goal will be to estimate the magnitude of for matrices of the form , where are independent standard normal random variables, and are fixed numbers that satisfy conditions (4.4). Such an estimate will be established in Proposition 4.8 below. By the standard symmetrization, the same estimate will hold true if is a random matrix with entries as in Lemma 4.3. This will be done in Corollary 4.9. Finally, Theorem 4.1 will be deduced from this by decomposing the entries of a random matrix according to their magnitude.
Our first step toward this goal is to check the magnitude of for a fixed vector .
Let be positive integers. Consider an random matrix where are independent standard normal random variables and are numbers that satisfy conditions (4.4). Let be an matrix such that . Then, for every vector we have
Denoting as usual the columns of by , we have
Since and using the last condition in (4.4), we have
We will now strengthen Lemma 4.4 into a deviation inequality for . This is a simple consequence of the Gaussian concentration, Theorem 2.1. This deviation inequality is universal in that it holds for any vector ; in the sequel we will need more delicate inequalities that depend on the distribution of the coordinates in .
Let and be matrices as in Lemma 4.4. Then, for every vector and every we have
where are the columns of the matrix . Therefore, the random vector is distributed identically with the random vector
and where are independent standard normal random variables. Since all by conditions (4.4), and by the assumptions, we have
Then the Gaussian concentration, Theorem 2.1, gives for every :
where . Since as we noted above, is distributed identically with , Lemma 4.4 completes the proof. ∎
3. Control of sparse vectors
Nevertheless, the bound in Lemma 4.5 can be made uniform over a set of sparse vectors, whose metric entropy is smaller than that of the whole sphere:
Let and be matrices as in Lemma 4.4. There exists an absolute constant such that the following holds. Consider the set of vectors
Let be a constant to be determined later, and let . Then
Using Proposition 2.6 and taking the union bound over all , we obtain
Since there are ways to choose the subset , by taking the union bound over all we conclude that
Finally, if the absolute constant in the definition of is chosen sufficiently small, we have . Thus the right hand side of (4.6) is at most
4. Control of spread vectors
Although we now have a good control of sparse vectors, they unfortunately comprise a small part of the unit ball . More common but harder to deal with are “spread vectors” – those having many coordinates that are not close to zero. The next result gains control of the spread vectors.
Let and be matrices as in Lemma 4.4 with . Let . Consider the set of vectors
This time we will need to work with multiple nets to account for different possible distributions of the magnitude of the coordinates of vectors . Since , without loss of generality we can assume that .
A standard calculation shows that is an -net of in the -norm, i.e. for every there exists such that . Therefore, by Proposition 2.6,
Fix . Since , the number of coordinates of that satisfy is at most , for every . Decomposing according to the coordinates whose absolute value is , we have by the triangle inequality that
Fix and assume that . Since , we have
To estimate the cardinality of , note that there are at most ways to choose ; there are ways to choose the support of ; and there are ways to choose the (signs of) nonzero coordinates of . Hence by Stirling’s approximation and using (4.8), we have
Step 2: control of a fixed vector. Fix and fix . As we saw in the proof of Lemma 4.5,
and where are independent standard normal random variables. Since , we have . This and the second condition in (4.4) yield
Repeating the estimate in the proof of Lemma 4.5, we bound the Lipschitz norm as
Then the Gaussian concentration, Theorem 2.1, gives for every :
where . Since as we noted above, is distributed identically with , Lemma 4.4 yields that
Let be arbitrary. Applying the above estimate for and using we conclude that
Step 3: union bound. Taking the union bound in (4.10) over all and using estimate (4.9) on the cardinality of , we have for all :
Let . We choose , where . Since and , , we obtain from the above estimate that
Putting this back in (4.7), we conclude that
5. Norms of sparse matrices, and proof of Theorem 4.1
Propositions 4.6 and 4.7 together handle all vectors in the unit ball, and yield the following norm estimate:
Let and be matrices as in Lemma 4.4 with . Then
Let be the absolute constant as in Proposition 4.6; we can clearly assume that . We define
Note that as required in Proposition 4.6.
Fix a vector . We decompose it according to the magnitude of the coordinates, as follows:
Clearly, , . By Markov’s inequality, we have
Then as in Proposition 4.6. On the other hand, by definition, so as in Proposition 4.7. Therefore, by Propositions 4.6 and 4.7 we have
Our choice of and the assumption complete the proof. ∎
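The splitting of a vector by coordinate magnitude used in this proof can be sketched in a few lines. The threshold `tau` and the function names below are hypothetical. The only facts used are elementary: a vector of Euclidean norm at most one has at most 1/tau^2 coordinates of absolute value at least tau, and the remaining part has all coordinates below tau.

```python
import math
import random

def split_by_magnitude(x, tau):
    """Write x = sparse_part + spread_part, where sparse_part keeps the
    coordinates with |x_i| >= tau and spread_part keeps the others."""
    sparse_part = [v if abs(v) >= tau else 0.0 for v in x]
    spread_part = [v if abs(v) < tau else 0.0 for v in x]
    return sparse_part, spread_part

def random_unit_vector(n, seed=0):
    """A random point on the unit Euclidean sphere in R^n."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def support_size(x):
    """Number of nonzero coordinates of x."""
    return sum(1 for v in x if v != 0.0)

if __name__ == "__main__":
    x = random_unit_vector(200)
    tau = 0.1
    sparse_part, spread_part = split_by_magnitude(x, tau)
    print(support_size(sparse_part), "<=", 1 / tau ** 2)
    print(max(abs(v) for v in spread_part), "<", tau)
```

The first part is handled by a union bound over few supports, and the second by the finer nets adapted to small coordinates.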
Finally, a standard symmetrization argument yields the following norm estimate, which we shall use for sparse random matrices.
Let and let be positive integers. Consider an random matrix whose entries are independent random variables with mean zero and such that
Let be an matrix such that . Then
It would be interesting to remove the logarithmic term from this estimate.
By Lemma 4.3, conditions (4.4) hold with some random parameter which only depends on the random variables and not on , and which satisfies
Condition on the random variables . Proposition 4.8 then yields
Therefore, when we remove the conditioning, we obtain by (4.12) that
Then we have a decomposition . This sum is actually finite because of the boundedness assumption on . Indeed, we have
where is the maximal integer such that
Using Corollary 4.9 for the matrix and , we obtain
where the last line follows because and by the hypothesis.
Now we fix . Using the -th moment assumption, we have by Markov’s inequality that
By the definition of and by (4.14), we have
Therefore, , so
Using (4.13) and the triangle inequality, then using (4.15) and (4.5), we conclude that
This completes the proof of Theorem 4.1. ∎
6. Almost square matrices
The main application of Theorem 4.1 is for almost square matrices – those for which . The next lemma verifies the hypotheses of Theorem 4.1 for such matrices.
Let and let be positive integers satisfying . Let be an random matrix whose entries are independent random variables with -th moment bounded by . Define the random variable by the equation
By Markov’s inequality, we have for every that
Taking the union bound over all random variables , we obtain
The assumption yields that
Therefore, since and , we have
We are now ready to state and prove a special case of Theorem 1.1 for almost square matrices.
Let and let be positive integers satisfying . Let be an random matrix whose entries are independent random variables with mean zero and -th moment bounded by . Let be an matrix such that . Then
Without loss of generality we may assume that by adding an appropriate number of zero rows to and zero columns to . Also, using the standard symmetrization, we can assume that the random variables are symmetric. Let be the random variable as in Lemma 4.10, and let . By the definition, is the product event. Therefore, conditioning on this event (i) preserves the independence of the entries of ; (ii) makes all these entries bounded as in (4.17); (iii) can only reduce their moments by Lemma 2.9, thus for all we have
Therefore, we can apply Corollary 4.2 conditionally, with and with replaced by , which gives
Completion of the proof of Theorem 1.1
By adding an appropriate number of zero rows to or zero columns to we can assume that , thus is an matrix. Consider the exponent
As usual, let be the columns of the matrix . Consider the subset of large columns defined as
Here we choose sufficiently large so that, by (2.4) and Markov’s inequality, we have
Denote by the submatrix of whose rows are in , by the submatrix of whose columns are in (and similarly for ). The decomposition implies by the triangle inequality that
This splits our problem into two subproblems, one for and one for . Of course, if or is empty then the corresponding matrix is zero and we can skip its estimation.
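The splitting of the product along the set of large columns can be sketched numerically. The threshold `tau`, the test matrices and the function names are assumptions made for this illustration. The sketch checks two elementary facts: the product equals the sum of the two partial products over the index set and its complement, and the number of columns with norm above `tau` is at most the squared Hilbert-Schmidt norm divided by tau^2.

```python
import math
import random

def column_norms(b):
    """Euclidean norms of the columns of a matrix stored as a list of rows."""
    return [math.sqrt(sum(row[k] ** 2 for row in b)) for k in range(len(b[0]))]

def split_product(b, a, tau):
    """Partition the inner indices by the size of the columns of b and return
    (J, W_large, W_small), where W_large uses the columns of b and rows of a
    indexed by J, W_small uses the complement, and b @ a = W_large + W_small."""
    norms = column_norms(b)
    J = [k for k, v in enumerate(norms) if v > tau]
    Jc = [k for k, v in enumerate(norms) if v <= tau]
    def partial(idx):
        return [[sum(row[k] * a[k][j] for k in idx) for j in range(len(a[0]))]
                for row in b]
    return J, partial(J), partial(Jc)

def random_matrices(m, N, n, seed=0):
    """Test data: b has columns of varying scale, a has random sign entries."""
    rng = random.Random(seed)
    scales = [rng.random() for _ in range(N)]
    b = [[scales[k] * rng.gauss(0.0, 1.0) for k in range(N)] for _ in range(m)]
    a = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(N)]
    return b, a

if __name__ == "__main__":
    b, a = random_matrices(5, 40, 6)
    J, w_large, w_small = split_product(b, a, tau=1.5)
    hs_squared = sum(v * v for v in column_norms(b))
    print(len(J), "<=", hs_squared / 1.5 ** 2)
```

The triangle inequality then bounds the norm of the product by the sum of the norms of the two partial products, which are treated by different methods.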
The matrices , are almost square, so Corollary 4.11 applies for them, giving
On the other hand, the columns of the matrix are small by the definition of :
Therefore, Theorem 3.9 applies to the matrices , , which gives
Putting estimates (5.2) and (5.3) into (5.1), we conclude that