Improved Sum-of-Squares Lower Bounds for Hidden Clique and Hidden Submatrix Problems

Yash Deshpande, Andrea Montanari

Introduction

Characterizing the computational complexity of statistical estimation and statistical learning problems is an outstanding challenge. On one hand, a large part of research in this area focuses on the analysis of specific polynomial-time algorithms, thus establishing upper bounds on the problem complexity. On the other, information-theoretic techniques are used to derive fundamental limits beyond which no algorithm can solve the statistical problem under study. While in some cases algorithmic and information-theoretic bounds match, in many other examples a large gap remains in which the problem is solvable assuming unbounded resources but simple algorithms fail. The hidden clique and hidden submatrix problems are prototypical examples of this category.

The entries of AA above the diagonal (Aij)i<j(A_{ij})_{i<j} are i.i.d. random variables with the same marginal law AijP0A_{ij}\sim P_{0}.

Given a (hidden) subset Q[n]{\sf Q}\subseteq[n] the entries (Aij)i<j(A_{ij})_{i<j} are independent with

Further, Q{\sf Q} is a uniformly random subset conditional on its size, that is fixed Q=k|{\sf Q}|=k.

Of interest is also the estimation version of this problem, whereby the special subset Q{\sf Q} is known to exist, and an algorithm is sought that identifies Q{\sf Q} with high probability.

This model encapsulates the basic computational challenges underlying a number of problems in which we need to estimate a matrix that is both sparse and low-rank. Such problems arise across genomics, signal processing, social network analysis, and machine learning [SWPN09, JL09, OJF+12].

In this case, the data matrix AA can be interpreted as the adjacency matrix of a graph GG over nn vertices (whereby Aij=+1A_{ij}=+1 encodes presence of edge {i,j}\{i,j\} in GG, and Aij=1A_{ij}=-1 its absence). Under hypothesis H1H_{1}, the set Q{\sf Q} induces a clique in the (otherwise) random graph GG. For the rest of this introduction, we shall focus on the hidden clique problem, referring to Section 2 for a formal statement of our general results.

The largest clique in a uniformly random graph has size 2log2n+o(logn)2\log_{2}n+o(\log n), with high probability [GM75]. Thus, allowing for exhaustive search, the hidden clique problem can be solved when k(2+ε)log2nk\geq(2+{\varepsilon})\log_{2}n. On the other hand, despite significant efforts [AKS98, AV11, DGGP11, FR10, DM14], no polynomial time algorithm is known to work when k=o(n)k=o(\sqrt{n}). As mentioned above, this is a prototypical case for which a large gap exists between performances of well-understood polynomial-time algorithms, and the ultimate information-theoretic (or statistical) limits. This remark motivated an ongoing quest for computational lower bounds.

Finding the maximum clique in a graph is a classical NP-hard problem [Kar72]. Even a very rough approximation to its size is hard to find [Has96, Kho01]. In particular, it is hard to detect the presence of a clique of size n1εn^{1-{\varepsilon}} in a graph with nn vertices.

Unfortunately, worst-case reductions do not imply computational lower bounds for distributions of random instances dictated by natural statistical models. Over the last two years there have been fascinating advances in crafting careful reductions that preserve the instances distribution in specific cases [BR13, MW13a, CX14, HWX14, CLR15]. This line of work typically establishes that several detection problems (sparse PCA, hidden submatrix, hidden community) are at least as hard as the hidden clique problem with k=o(n)k=o(\sqrt{n}). This approach has two limitations:

It yields conditional statements relying on the unproven assumption that the hidden clique problem is hard. In absence of any ‘completeness’ result, this is a strong assumption that calls for further scrutiny.

Reductions among instance distributions are somewhat fragile with respect changes in the distribution. For instance, it is not known whether the hidden submatrix problem with Gaussian distributions P0=N(0,1)P_{0}={\sf N}(0,1) and P1=N(μ,1)P_{1}={\sf N}(\mu,1) is at least as hard as the hidden clique problem, although a superficial look might suggest that they are very similar.

A complementary line of attack consists in proving unconditional lower bounds for broad classes of algorithms. In an early contribution, Jerrum [Jer92] established such a lower bound for a class of Markov Chain Monte Carlo methods. Feldman et al. [FGR+12] considered a query-based formulation of the problem and proved a similar result for ‘statistical algorithms.’ Closer to the present paper is the work of Feige and Krauthgamer [FK00], who analyzed the Lovász-Schrijver semidefinite programming (SDP) hierarchy. Remarkably, these authors proved that rr rounds of this hierarchy (with complexity nO(r)n^{O(r)}) fail to detect the hidden clique unless kn/2rk\gtrsim\sqrt{n}/2^{r}. (Here and below we write f(n,r,)g(n,r,)f(n,r,\dots)\gtrsim g(n,r,\dots) if there exists a constant CC such that f(n,r,)Cg(n,r,)f(n,r,\dots)\geq C\,g(n,r,\dots).)

While this failure of the Lovász-Schrijver hierarchy provides insightful evidence towards the hardness of the hidden-clique problem, an even stronger indication could be obtained by establishing an analogous result for the Sum of Squares (SOS) hierarchy [Sho87, Las01, Par03]. This SDP hierarchy unifies most convex relaxations developed for this and similar problems. Its close connection with the unique games conjecture has led to the idea that SOS might indeed be an ‘optimal’ algorithm for a broad class of problems [BS14]. Finally, many of the low-rank estimation problems mentioned above include naturally quadratic constraints, that are most naturally expressed within the SOS hierarchy.

The SOS hierarchy is formulated in terms of polynomial optimization problems. The level of a relaxation in the hierarchy corresponds to the largest degree dd of any monomial whose value is explicitly treated as a decision variable. Meka and Wigderson [MW13b] proposed a construction of a sequence of feasible solutions, or witnesses (one for each degree dd), that can be used to prove lower bounds for the hidden clique problem within the SOS hierarchy. The key technical step consisted in proving that a certain moment matrix is positive semidefinite: unfortunately this part of their proof contained a fatal flaw.

In the present paper we undertake the more modest task of analyzing the Meka-Wigderson witness for the level d=4d=4 of the SOS hierarchy. This is the first level at which the SOS hierarchy differs substantially from the baseline spectral algorithm of Alon, Krivelevich and Sudakov [AKS98], or from the Lovász-Schrijver hierarchy. We prove that this relaxation fails unless

Notice that the natural guess would be that the SOS hierarchy fails (for any bounded dd) whenever k=o(n)k=o(\sqrt{n}). While our result falls short of establishing this, an argument presented in [Bar14] shows that this is a limitation of the Meka-Wigderson construction. In other words, by refining our analysis it is impossible to improve the bound (3) except –possibly– by removing the logarithmic factor.

Apart from the lower bound on the hidden clique problem, our analysis provides two additional sets of results:

We apply a similar witness construction to the hidden submatrix problem with entries distributions P0=N(0,1)P_{0}={\sf N}(0,1), P1=N(μ,1)P_{1}={\sf N}(\mu,1). We define a polynomial-time computable statistical test that is based on a degree-44 SOS relaxation of a nearly optimal combinatorial test. We show that this fails unless kμ1n1/3/lognk\gtrsim\mu^{-1}n^{1/3}/\log n.

As mentioned above, the main technical contribution consists in proving that a certain random matrix is (with high probability) positive semidefinite. Abstractly, the random matrix in question is function of an underlying (Erdös-Renyi) random graph GG over nn vertices. The matrix has rows/columns indexed by subsets of size at most d/2=2d/2=2, and elements depending by the subgraphs of GG induced by those subsets. We shall loosely refer to this type of random matrix as to a random association scheme.

In order to prove that this witness is positive semidefinite, we decompose the linear space on which it acts into irreducible representation of the group of permutations over nn objects. We then use the moment method to characterize each submatrix defined by this decomposition, and paste together the results to obtain our final condition for positivity.

We believe that both the matrix definition and the proof technique are so natural that they are likely to be useful in related problems.

As an illustration of the last point, our analysis covers the case of Erdös-Renyi graphs with sublinear average degree (namely, with average degree of order n1an^{1-a}, a<1/12a<1/12). In particular, it is easy to derive sum-of-squares lower bounds for finding cliques in such graphs.

The rest of the paper is organized as follows. In Section 2 we state our main technical result, which concerns the spectrum of random association schemes. We then show that it implies lower bounds for the hidden clique and hidden submatrix problem. Section 3 presents a brief outline of the proof. Finally, Section 4 presents the proof of our main technical result.

While this paper was being written, we became aware through [Bar14] that –in still unpublished work– Meka, Potechin and Wigderson proved that the degree-dd SOS relaxation is unsuccessful unless kn1/dk\gtrsim n^{-1/d}. It would be interesting to compare the proof techniques.

Main results

In this section we present our results. Subsection 2.1 introduces a feasible random association scheme that is a slight generalization of the witness developed in [MW13b] (for the degree d=4d=4 SOS). We state conditions implying that this matrix is positive semidefinite with high probability. These conditions are in fact obtained by specializing a more general result stated in Proposition 4.1. We then derive implications for hidden cliques and hidden submatrices.

We shall often identify the collections of subsets of size one, ([n]1)={{i}:i[n]}\binom{[n]}{1}=\{\{i\}:\,i\in[n]\} with [n][n]. Also, we identify ([n]2){\binom{[n]}{2}} with the set of ordered pairs {(i,j):i,j[n],i<j}\{(i,j):i,j\in[n],i<j\}. If A={i,j}A=\{i,j\} with i<ji<j we call ii (jj) the head (respectively, tail) of AA denoted by h(A)h(A) (respectively, t(A)t(A)).

Given the graph GG and a set A[n]A\subseteq[n], we let GAG_{A} denote the subgraph of GG induced by AA. We define the indicator GA\mathcal{G}_{A}

Suppose α\underline{\alpha}, pp satisfy:

The proof of this theorem can be found in Section 4. As mentioned above, a more general set of conditions that imply M(G,α)0M(G,\underline{\alpha})\succeq 0 with high probability is given in Proposition 4.1. The proof of Theorem 1 consists in checking that the conditions of Proposition 4.1 hold and deriving the consequences.

2 A Sum of Squares lower bound for Hidden Clique

Denote by Val(G;d=4){\sf Val}(G;d=4) the value of this optimization problem for graph GG (which is obviously an upper bound on the size of the maximum clique in GG). We can then try to detect the clique (i.e. distinguish hypothesis H1H_{1} and H0H_{0} defined in the introduction), by using the test statistics

with cc_{*} a numerical constant. The rationale for this test is as follows: if we replace Val(G;4){\sf Val}(G;4) by the size of the largest clique, then the above test is essentially optimal, i.e. detects the clique with high probability as soon as klognk\gtrsim\log n (with c=1c_{*}=1).

We then have the following immediate consequence of Theorem 1.

Consider M(α,G)M(\underline{\alpha},G) from Theorem 1 (with p=1/2p=1/2). For M(α,G)M(\underline{\alpha},G) to be positive semidefinite with high probability, we set κ=c0n2/3/logn\kappa=c_{0}\,n^{-2/3}/\log n for some absolute constant c0c_{0}. It is easy to check that M(α,G)M(\underline{\alpha},G) is a feasible point for the optimization problem (8). Recalling that M{i},{i}=α1=κM_{\{i\},\{i\}}=\alpha_{1}=\kappa, we conclude that the objective function at this point is nκ=c0n1/3/lognn\kappa=c_{0}n^{1/3}/\log n, and the claim follows. ∎

We are now in position to derive a formal lower bound on the test (9).

Indeed we can always set to variables indexed by sets A[n]A\subseteq[n] with A⊈QcA\not\subseteq{\sf Q}^{c}. Hence, applying again Corollary 2.1, we deduce that, with probability 1(nk)11-(n-k)^{-1}, Val(G;4)C(nk)1/3/log(nk){\sf Val}(G;4)\geq C(n-k)^{1/3}/\log(n-k), which is larger than ckc_{*}k. Hence T(G)=1T(G)=1 with high probability. ∎

3 A Sum of Squares lower bound for Hidden Submatrix

In order to motivate our definition of an SOS-based statistical test, we begin by introducing a nearly-optimal combinatorial test, call it TcombT_{\rm comb}. This test essentially look for a principal submatrix of AA of dimension kk, with average value larger than μ/2\mu/2. Formally

A straightforward union-bound calculation shows that Tcomb()T_{\rm comb}(\,\cdot\,) succeeds with high probability provided kμ2lognk\gtrsim\mu^{-2}\log n.

As in the previous section, the degree-44 SOS relaxation of the set of binary vectors x{0,1}nx\in\{0,1\}^{n} consists in the following convex set of matrices

This suggests the following relaxation of the test Tcomb()T_{\rm comb}(\,\cdot\,):

We begin by stating a corollary of Theorem 1.

Assume AA is distributed according to hypothesis H0H_{0}, i.e. AijN(0,1)A_{ij}\sim{\sf N}(0,1) for all i,j[n]i,j\in[n]. Then, with probability at least 12n11-2n^{-1}, there exists XC4(n)X\in\mathcal{C}_{4}(n) such that

We choose X=M(G,α)X=M(G,\underline{\alpha}) a random association scheme, where α\underline{\alpha} is set according to Theorem 1, with

with cc a suitably small constant. This ensures that the conditions of Theorem 1 are satisfied, whence XC4(n)X\in\mathcal{C}_{4}(n) with high probability. Further, by definition

It remains to check that the second inequality in (15) hold. We have

Note that the random variables (AijGij)i<j(A_{ij}\mathcal{G}_{ij})_{i<j} are independent and subgaussian. By a standard concentration-of-measure argument we have, with probability at least 1n21-n^{-2}, for a suitably small constant cc^{\prime}, i<jAijGijcn2ϕ(λ)\sum_{i<j}A_{ij}\mathcal{G}_{ij}\geq c^{\prime}n^{2}\phi(\lambda) and hence

Consider the Hidden Submatrix problem with entries’ distributions P0=N(0,1)P_{0}={\sf N}(0,1), and P1=N(μ,1)P_{1}={\sf N}(\mu,1).

Then, the degree-44 Sum-of-Squares test defined in Eq. (14), fails to distinguish between hypotheses H0H_{0} and H1H_{1} if kμ1n1/3/lognk\lesssim\mu^{-1}n^{1/3}/\log n. In particular, T(A)=1T(A)=1 with high probability both under H0H_{0} and under H1H_{1}.

First consider AA distributed according to hypothesis H0H_{0}. Note that, if X0C4(n)X_{0}\in\mathcal{C}_{4}(n) and ss\in is a scaling factor, then sX0C4s\,X_{0}\in\mathcal{C}_{4}. Therefore (by choosing s=ckn1/3logns=ckn^{-1/3}\log n for a suitable constant cc) Corollary 2.2 implies that with high probability there exists XC4(n)X\in\mathcal{C}_{4}(n) such that

Therefore, for μkcn1/3/logn\mu\,k\leq c\,n^{1/3}/\log n with cc a sufficiently small constant, we have i<jAijX{i},{j}cμk2\sum_{i<j}A_{ij}X_{\{i\},\{j\}}\geq c_{*}\mu\,k^{2} and therefore T(A)=1T(A)=1 with high probability.

Consider next AA distributed according to hypothesis H1H_{1}. Note that A=μ1Q1QT+A~A=\mu\,{\sf 1}_{{\sf Q}}{\sf 1}_{{\sf Q}}^{{\sf T}}+\widetilde{A}, where 1Q{\sf 1}_{{\sf Q}} is the indicator vector of set Q{\sf Q}, and A~\widetilde{A} is distributed according to H0H_{0}. Since i<jAijX{i},{j}\sum_{i<j}A_{ij}X_{\{i\},\{j\}} is increasing in AA, we also have that T(A~)=1T(\widetilde{A})=1 implies T(A)=1T(A)=1. As shown above, for μkcn1/3/logn\mu\,k\leq c\,n^{1/3}/\log n, we have T(A~)=1T(\widetilde{A})=1 with high probability, and hence T(A)=1T(A)=1. ∎

Further definitions and proof strategy

Notice that MA,B=NA,BGAGBM_{A,B}=N_{A,B}\mathcal{G}_{A}\mathcal{G}_{B}, i.e. MM is obtained from NN by zeroing columns (rows) indexed by sets A,BA,B that do not induce cliques in GG. Thus, N    0N{\;\succeq\;}0 implies M    0M{\;\succeq\;}0.

Since H{H} is the Schur complement of NN, H0{H}\succeq 0 implies N0N\succeq 0 and hence M0M\succeq 0. The next section is devoted to prove H0{H}\succeq 0: here we sketch the main ingredients.

In the rest of this section we demonstrate the essentials of our strategy to show the weaker assertion H2,2    0{H}_{2,2}{\;\succeq\;}0. We will assume that pp is order one, for concreteness p=1/2p=1/2 which corresponds to the hidden clique problem. It suffices to show that

We are therefore left with the task of proving

The bulk of the proof is devoted to developing operator norm bounds for the matrices PaJ~η,νPb{\cal P}_{a}{\widetilde{J}}_{\eta,\nu}{\cal P}_{b} that hold with high probability. We then bound PaQPb{\cal P}_{a}Q{\cal P}_{b} using triangle inequality

The matrices J~4,1,J~3,ν,J~2,1,J~2,6{\widetilde{J}}_{4,1},{\widetilde{J}}_{3,\nu},{\widetilde{J}}_{2,1},{\widetilde{J}}_{2,6} turn out to have an approximate “Wigner”-like behavior, in the following sense. Note that these are symmetric matrices of size (n2)n2/2\binom{n}{2}\approx n^{2}/2 with random zero-mean entries bounded by α4\alpha_{4}. If their entries were independent, they would have operator norms of order α4n2/2κ4n\alpha_{4}\sqrt{n^{2}/2}\approx\kappa^{4}n [FK81]. Although the entries are actually not independent, the conclusion still holds for J~4,1,J~3,ν,J~2,1,J~2,6{\widetilde{J}}_{4,1},{\widetilde{J}}_{3,\nu},{\widetilde{J}}_{2,1},{\widetilde{J}}_{2,6} and they have operator norms of order κ4n\kappa^{4}n. Hence PaJ~η,νPb2J~η,ν2κ4n\|{\cal P}_{a}{\widetilde{J}}_{\eta,\nu}{\cal P}_{b}\|_{2}\leq\|{\widetilde{J}}_{\eta,\nu}\|_{2}\approx\kappa^{4}n for these cases.

We are now left with the cases (J~1,ν)1ν4({\widetilde{J}}_{1,\nu})_{1\leq\nu\leq 4} and (J~2,ν)2ν5({\widetilde{J}}_{2,\nu})_{2\leq\nu\leq 5}. These require more care, since their typical norms are significantly larger than nn. For instance consider J~1,ν{\widetilde{J}}_{1,\nu} where

with high probability. This suggests that J~1,ν2α4n3/2κ4n3/2\|{\widetilde{J}}_{1,\nu}\|_{2}\lesssim\alpha_{4}n^{3/2}\approx\kappa^{4}n^{3/2} with high probability. This turns out to be the correct order for all the matrices J~1,ν{\widetilde{J}}_{1,\nu} and J~2,ν{\widetilde{J}}_{2,\nu} under consideration.

This heuristic calculation shows the need to be careful with these terms. Indeed, a naive application of this results yields that PaQPb2    κ4n3/2\left\lVert{{\cal P}_{a}Q{\cal P}_{b}}\right\rVert_{2}{\;\lesssim\;}\kappa^{4}n^{3/2}. Recalling Eq. (35), this imposes that λ2κ4n3/2\lambda_{2}\gg\kappa^{4}n^{3/2}. Since we have λ2κ2\lambda_{2}\approx\kappa^{2}, we obtain the condition κn3/4\kappa\ll n^{-3/4}. The parameter κ\kappa turns out to be related to the size of the planted clique through knκk\approx n\kappa. Hence this argument can only prove that the SOS hierarchy fails to detect hidden cliques of size kn1/4k\ll n^{1/4}.

Using these observations and Eq. (36) we obtain that P2QP2    κ4n\left\lVert{{\cal P}_{2}Q{\cal P}_{2}}\right\rVert{\;\lesssim\;}\kappa^{4}n, while for any other pair (a,b){0,1,2}2(a,b)\in\{0,1,2\}^{2} we have that PaQPb    κ4n3/2\left\lVert{{\cal P}_{a}Q{\cal P}_{b}}\right\rVert{\;\lesssim\;}\kappa^{4}n^{3/2}. As noted before, since λ0n2κ4\lambda_{0}\approx n^{2}\kappa^{4}, λ1nκ3\lambda_{1}\approx n\kappa^{3} and λ1κ2\lambda_{1}\approx\kappa^{2} whence the condition in Eq. (35) reduces to:

The 2,22,2 entry of this matrix inequality yields that κ2κ4n0\kappa^{2}-\kappa^{4}n\gg 0 or κn1/2\kappa\ll n^{-1/2}. Considering the (1,1)(1,1) entry yields a similar condition. The key condition is that corresponding to the minor indexed by rows (and columns) 1,21,2:

This requires that nκ5n3κ8n\kappa^{5}\gg n^{3}\kappa^{8} or, equivalently κn2/3\kappa\ll n^{-2/3}. Translating this to clique size k=nκk=n\kappa, we obtain the condition kn1/3k\ll n^{1/3}. This calculation thus demonstrates the origin of the threshold of n1/3n^{1/3} beyond which the Meka-Wigderson witness fails to be positive semidefinite. The counterexample of [BS14] shows that our estimates are fairly tight (indeed, up to a logarithmic factor).

Proofs

Throughout the proof we denote the identity matrix in mm dimensions by Im{\rm I}_{m}, and the all-ones vector by 1m{\sf 1}_{m}. We let Qn=1n1nT/n{\cal Q}_{n}={\sf 1}_{n}{\sf 1}_{n}^{\sf T}/n be the projector onto the all ones vector 1n{\sf 1}_{n}, and Qn=InQn{\cal Q}_{n}^{\perp}={\rm I}_{n}-{\cal Q}_{n} its orthogonal complement.

As mentioned above, we write f(n,r,)g(n,r,)f(n,r,\dots)\gtrsim g(n,r,\dots) if there exists a constant CC such that f(n,r,)Cg(n,r,)f(n,r,\dots)\geq C\,g(n,r,\dots). Similarly we write f(n,r,)g(n,r,)f(n,r,\dots)\gg g(n,r,\dots) if, for any constant CC, we have f(n,r,)Cg(n,r,)f(n,r,\dots)\geq C\,g(n,r,\dots) for all nn large enough. These conditions are always understood to hold uniformly with respect to the extra arguments r,r,\dots, provided these belong to a range depending on nn, that will be clear from the context.

We finally use the shorthand nˉnlogn\bar{n}\equiv n\log n.

2 Main technical result and proof of Theorem 1

Assume the following conditions hold for a suitable constant CC:

Then with probability exceeding 1n11-n^{-1} all of the following are true:

The next two lemmas develop simplified expressions for matrices W{\overline{W}}, WW under the parameter choices of Theorem 1.

Setting (α,p)(\underline{\alpha},p) as in Theorem 1, there exists δn=δn(κ,p)\delta_{n}=\delta_{n}(\kappa,p) with δn(κ,p)0\delta_{n}(\kappa,p)\to 0 as nn\to\infty, such that

Setting (α,p)(\underline{\alpha},p) as in Theorem 1, there exists δn=δn(κ,p)\delta_{n}=\delta_{n}(\kappa,p) with δn(κ,p)0\delta_{n}(\kappa,p)\to 0 as nn\to\infty, such that, for some absolute constant CC,

With Proposition 4.1 and the auxiliary Lemmas 4.3, 4.2 in hand, the proof of Theorem 1 is straightforward.

As noted in Section 3 it suffices to prove that H0{H}\succeq 0. By taking the Schur complement with respect to H11{H}_{11}, we obtain that H0{H}\succeq 0 if and only if

Suppose that the conditions of Proposition 4.1 are verified under the values of α,p\underline{\alpha},p specified as in Theorem 1. Then we have H110{H}_{11}\succeq 0 by Eq. (55). Further by Eqs. (56) and (57), we have

We are now left to verify the conditions of Proposition 4.1. To begin, we verify that α1    2α2p+2α2nˉ1/2\alpha_{1}{\;\gtrsim\;}2\alpha_{2}p+2\alpha_{2}\bar{n}^{1/2}. This condition is satisfied if:

Since κn2/3\kappa\leq n^{-2/3}, this is true.

The condition α2pα120\alpha_{2}p-\alpha_{1}^{2}\geq 0 holds since α2pα12=2κ2κ2=κ2>0\alpha_{2}p-\alpha_{1}^{2}=2\kappa^{2}-\kappa^{2}=\kappa^{2}>0.

It remains to check that W    W{\overline{W}}{\;\succeq\;}W. By Sylvester’s criterion, we need to verify that:

It suffices to check the above values using the simplifications provided by Lemmas 4.2 and 4.3 respectively as follows. Throughout, we will assume that nn is large enough, and write δn\delta_{n} for a generic sequence such that δn0\delta_{n}\to 0 uniformly over κ[logn/n,c4n2/3/logn]\kappa\in[\log n/n,c^{-4}n^{-2/3}/\log n], p[c(κlogn)1/4n1/6,1]p\in[c(\kappa\log n)^{1/4}n^{1/6},1].

For Eq. (71), using Lemmas 4.2 and 4.3 we have that:

Hence , W00W00n2κ4/2p2>0{\overline{W}}_{00}-W_{00}\geq{n^{2}\kappa^{4}}/{2p^{2}}>0 for large enough nn.

The ratio of the two terms above is (up to a constant) given by p4/(κn1/2(logn)3/2)p^{4}/(\kappa n^{1/2}(\log n)^{3/2})\to\infty, hence for nn large enough we have W11W11nκ3/2p2{\overline{W}}_{11}-W_{11}\geq n\kappa^{3}/2p^{2}. Thus Eq. (72) holds if

However as we set p    (κlogn)1/4n1/6p{\;\gtrsim\;}(\kappa\log n)^{1/4}n^{1/6}, this is satisfied for nn large. Indeed this implies that:

Consider now Eq. (73). Expanding the determinant along the third column

We start by noting that, for all nn large enough,

Indeed, by Lemma 4.2 and 4.3, to prove this claim it is sufficient to show that

This is satisfied when we choose pc(κlogn)1/4n1/6p\geq c(\kappa\log n)^{1/4}n^{1/6} when we choose cc a large enough constant. Along with the argument for the second condition above, this implies that:

We now consider the second term. Let wCκ4nˉ3/2/p6w\equiv C\kappa^{4}\bar{n}^{3/2}/p^{6}. Then by Lemmas 4.2 and 4.3, for all nn large enough:

as n2κ4/p2>2wn^{2}\kappa^{4}/p^{2}>2w whenever p(logn)3/8n1/8p\geq(\log n)^{3/8}n^{-1/8}. As we have pn1/12p\geq n^{-1/12} this is satisfied.

The second term in the parentheses above dominates when pκ1/4(logn)3/8n1/8p\geq\kappa^{1/4}(\log n)^{3/8}n^{1/8} which holds as we keep pc(κlogn)1/4n1/6p\geq c(\kappa\log n)^{1/4}n^{1/6}. Hence:

Thus, using Eqs. (84), (86), (88), we conclude that Eq. (73) holds if

or, equivalently, p9c1n2κ3(logn)3p^{9}\geq c_{1}n^{2}\kappa^{3}(\log n)^{3} for an appropriate c1c_{1} large enough. This holds under the stated condition pc(κlogn)1/4n1/6p\geq c(\kappa\log n)^{1/4}n^{1/6} provided cc is large enough. This completes the proof of Theorem 1. ∎

The proofs of Lemma 4.3 and 4.2 follow by a simple calculation and are given in Section 4.3.

3 Proofs of Lemmas 4.3 and 4.2

Firstly, since pc(κlogn)1/4n1/6p\geq c(\kappa\log n)^{1/4}n^{1/6}, and nκlognn\kappa\geq\log n, we have that pn1/12p\geq n^{-1/12} asymptotically. Hence:

Hence the term (nnα3p2)2/n(α2pα1)(n\sqrt{n}\alpha_{3}p^{2})^{2}/n(\alpha_{2}p-\alpha_{1}) is dominant in W00W_{00} and the first claim of the lemma follows.

It suffices to check that Cα4nˉ3/2C\alpha_{4}\bar{n}^{3/2} is the dominant term. By the argument in W00W_{00} we already have that the first term is negligible. Further since, α3nˉ/nα2=κnlogn/p2=(κlogn)1/2n1/60\alpha_{3}\bar{n}/\sqrt{n}\alpha_{2}=\kappa\sqrt{n}\log n/p^{2}=(\kappa\log n)^{1/2}n^{1/6}\to 0, to prove that the third term is negligible, it suffices that

By the estimates in W00W_{00} the fourth term is negligible if:

This implies the claim for W01W_{01}. The calculation for W02W_{02} and W12W_{12} is similar.

As in W00W_{00}, the first term is negligible. For the third term, first we note that α3nˉ/α2=(κlogn)n/p2log2n\alpha_{3}\bar{n}/\alpha_{2}=(\kappa\log n)n/p^{2}\geq\log^{2}n\to\infty. Hence to prove that the third term is negligible, it suffices that:

The final term in W11W_{11} is negligible by the same argument, since n(α2pα12)=nκ2α1n(\alpha_{2}p-\alpha_{1}^{2})=n\kappa^{2}\geq\alpha_{1}.

Since n(α2pα12)=nκ2α1n(\alpha_{2}p-\alpha_{1}^{2})=n\kappa^{2}\geq\alpha_{1} it is easy to see that the third term dominates the fourth above. To see that the first dominates the second, it suffices that their ratio diverge i.e.

as κ1/n\kappa\geq 1/n. Thus we have that the first and third terms dominate the contribution for W22W_{22}. This completes the proof of the lemma. ∎

It is straightforward to check that the third and fourth terms dominates the sum above i.e.:

for some δn0\delta_{n}\to 0. The claim for W00{\overline{W}}_{00} then follows.

The claims for W11{\overline{W}}_{11} and W22{\overline{W}}_{22} follow in the same fashion as above where we instead use the following, adjusting δn\delta_{n} appropriately:

4 Graph definitions and moment method

In this section we define some family of graphs that will be useful in the moment calculations of Sections 4.6, 4.7 and 4.8. We then state and prove a moment method lemma, that will be our basic tool for controlling the norm of random matrices.

A cycle of length mm is a graph D=(V,E)D=(V,E) with vertices V={v1,vm}V=\{v_{1},\dots v_{m}\} and edges E={{vi,vi+1}:  i[m]}E=\{\{v_{i},v_{i+1}\}:\;i\in[m]\} where addition is taken modulo mm.

A couple is an ordered pair of vertices (u,v)(u,v) where we refer to the first vertex in the couple as the head and the second as the tail.

A bridge of length 2m2m is a graph B=(V,E)B=(V,E) with vertex set V={ui,vi,wi:  i[m]}V=\{u_{i},v_{i},w_{i}:\;i\in[m]\}, and edges E={{ui,vi},{ui,wi},{ui+1,vi},{ui+1,wi}:i[m]}E=\{\{u_{i},v_{i}\},\{u_{i},w_{i}\},\{u_{i+1},v_{i}\},\{u_{i+1},w_{i}\}:\,i\in[m]\} where addition above is modulo mm. We regard (vi,wi)(v_{i},w_{i}) for i[m]i\in[m] as couples in the bridge.

A ribbon of length mm is a graph R=(V,E)R=(V,E) with vertex set V={u1um,v1vm}V=\{u_{1}\dots u_{m},\allowbreak v_{1}\dots v_{m}\} and edge set E={{ui,ui+1},{ui,vi+1},{vi,ui+1},{vi,vi+1}:i[m]}E=\{\{u_{i},u_{i+1}\},\{u_{i},v_{i+1}\},\{v_{i},u_{i+1}\},\{v_{i},v_{i+1}\}:i\in[m]\} where addition is modulo mm. Further we call the subgraph induced by the 4-tuple (ui,vi,ui+1,vi+1)(u_{i},v_{i},u_{i+1},v_{i+1}) a face of the ribbon and we call the ordered pairs (ui,vi)(u_{i},v_{i}), i[m]i\in[m] couples of ribbon.

Each face of the ribbon has 44 edges, hence there are (4η)\binom{4}{\eta} ways to remove 4η4-\eta edges from the face. We define a ribbons of class η\eta, type ν\nu and length 2m2m as follows.

For 1η41\leq\eta\leq 4 and 1ν(4η)1\leq\nu\leq\binom{4}{\eta}, we define a ribbon of length 2m2m, class η\eta and type ν\nu to be the graph obtained from a ribbon of length 2m2m by keeping η\eta edges in each face of the ribbon, so that the following happens. The subgraphs induced by the tuples (u2i1,v2i1,u2i,v2i)(u_{2i-1},v_{2i-1},u_{2i},v_{2i}) and (u2i+1,v2i+1,u2i,v2i)(u_{2i+1},v_{2i+1},u_{2i},v_{2i}) for i1i\geq 1 are faces of class η\eta and type ν\nu as shown in Table 1.

For brevity, we write (η,ν)(\eta,\nu)-ribbon to denote a ribbon of class η\eta and type ν\nu.

A (η,ν)(\eta,\nu)-star ribbon S=(V,E)S=(V,E) of length 2m2m is a graph formed from a (η,ν)(\eta,\nu)-ribbon R(V,E)R(V^{\prime},E^{\prime}) of length 2m2m by the following process. For each face (ui,vi,ui+1,vi+1)(u_{i},v_{i},u_{i+1},v_{i+1}) we identify either the vertex pair (ui,ui+1)(u_{i},u_{i+1}) or the pair (vi,vi+1)(v_{i},v_{i+1}) and delete the self loop formed, if any, from the edge set. Note here that the choice of the pair identified can differ across faces of RR.

We let Sη,νm{\cal S}_{\eta,\nu}^{m} denote this collection of (η,ν)(\eta,\nu)-star ribbons.

Suppose FF is one of the graphs defined above and CC is a face of FF. We write, with slight abuse of notation, CFC\subseteq F to denote “a face CC of the graph FF”. Furthermore, to lighten notation, we will often write eFe\in F for an edge ee in the graph FF.

Let L(F){\mathfrak{L}}(F) denote the set of valid labelings of a graph F=(V,E)F=(V,E) and L2(F){\mathfrak{L}}_{2}(F) denote the set of contributing labelings. Further, we define

The following is a simple and general moment method lemma.

Then, for every nn large enough, with probability exceeding 1n(Γc2)/21-n^{-(\Gamma-c_{2})/2} we have that

By rescaling XX we can assume that c5=1c_{5}=1. Since Tr{(XTX)2r}=i(σi(X))2r{\sf{Tr}}\{(X^{\sf T}X)^{2r}\}=\sum_{i}(\sigma_{i}(X))^{2r} where σi(X)\sigma_{i}(X) are the singular values of XX ordered σ1(X)σ2(X)σN(X)\sigma_{1}(X)\geq\sigma_{2}(X)\dots\sigma_{N}(X), we have that:

Then, by Markov inequality and the given assumption:

Using (nk)(ne/k)k\binom{n}{k}\leq(ne/k)^{k} we have:

Setting r=(lognc2)/c1r=\lceil(\log n-c_{2})/c_{1}\rceil and using c2c4c_{2}\geq c_{4} we obtain the bound:

We can now set t={exp(Γ)n(logn)c3/c11}c1/2t=\left\{\exp(\Gamma)n(\log n)^{c_{3}/c_{1}-1}\right\}^{c_{1}/2} whereupon the bound on the right hand side is at most n(Γc2)/2n^{-(\Gamma-c_{2})/2} for every nn large enough. This yields the claim of the lemma. ∎

The next lemma specialized the previous one to the type of random matrices we will be interested in.

By rescaling XX it suffices to show the case β=1\beta=1. Taking expectations on either side of Eq. (125) we have that:

Consider the setting of Lemma 4.14. If we additionally have

where c32c1c_{3}\leq 2c_{1} then X2    βnˉc1/2\left\lVert{X}\right\rVert_{2}{\;\lesssim\;}\beta\bar{n}^{c_{1}/2} with probability at least 1n51-n^{-5}.

The proof follows by combining Lemmas 4.14 and 4.13. ∎

The above eigenvalues can also be computed using [MW13b] which relies on the theory of association schemes. We preferred to present a direct and self-contained derivation.

The following hold for all nn large enough:

For A([n]1)A\in\binom{[n]}{1} and B([n]2)B\in{\binom{[n]}{2}}:

This implies all but the second and the fourth claims immediately as V1P0=V1P2=0V_{1}{\cal P}_{0}=V_{1}{\cal P}_{2}=0, QnV1=0{\cal Q}_{n}V_{1}=0 and Qn1n=0{\cal Q}_{n}^{\perp}{\sf 1}_{n}=0. For the second claim, the above decomposition yields:

Since v1A,v1A=1/(n1)\langle v^{A}_{1},v^{A^{\prime}}_{1}\rangle=-1/(n-1) when AAA\neq A^{\prime} and 11 otherwise, we have that:

hence λmax(V1V1T)=n/(n1)\lambda_{\rm max}(V_{1}V_{1}^{\sf T})=n/(n-1). This implies that:

The block H11{H}_{11} is a linear combination of the identity and the adjacency matrix of GG. Hence, its spectral properties are well understood, since the seminal work of Füredi-Komlós [FK81]. While the nest proposition could be proved using these results, we present an self-contained proof for pedagogical reasons, as the same argument will be repeated several times later for more complex examples.

Suppose that α\underline{\alpha} satisfies:

Then with probability at least 1n51-n^{-5}:

for some constant CC. Under the condition α1/2α2p    α2nˉ1/2\alpha_{1}/2-\alpha_{2}p{\;\gtrsim\;}\alpha_{2}\bar{n}^{1/2} (with a sufficiently large constant which we suppress) we have that:

Inverting this inequality yields the claim for H111{H}_{11}^{-1}. This completes the proof of the proposition. ∎

The following proposition is the key result of this subsection.

With probability at least 125n51-25n^{-5} the following hold:

When AB=0\left\lvert{A\cap B}\right\rvert=0 (last case above) we can expand HA,B{H}_{A,B} as a sum of sixteen terms:

Compactly, we can represent the above summation as follows. Each term above is indexed by a pair (η,ν)(\eta,\nu) where 0η40\leq\eta\leq 4 denotes the number of variables g,g_{\cdot,\cdot} occurring in the product, and ν(4η)\nu\leq\binom{4}{\eta} determines exactly which η\eta-tuple of gg variables occur. For instance, when η=1\eta=1, we have (41)\binom{4}{1} terms α4p3gh(A)h(B),α4p3gh(A)t(B),α4p3gt(A)h(B),α4p3gt(A)t(B)\alpha_{4}p^{3}g_{h(A)h(B)},\alpha_{4}p^{3}g_{h(A)t(B)},\alpha_{4}p^{3}g_{t(A)h(B)},\alpha_{4}p^{3}g_{t(A)t(B)}. Equivalently, if RA,B(η,ν)R_{A,B}(\eta,\nu) is a labeled (η,ν)(\eta,\nu)-ribbon with exactly one face and vertices labeled h(A),t(A),h(B),t(B)h(A),t(A),h(B),t(B) in order, each term corresponds to one specific class and type of ribbon, i.e.

The exact mapping of the pair (η,ν)(\eta,\nu) to the choice of edges in RA,B(η,ν)R_{A,B}(\eta,\nu) is given in Table 1. With a slight abuse of terminology, we refer to η\eta as the class and ν\nu the type of the term. We define the matrices Jη,νJ_{\eta,\nu} (for η=1,2,3,4\eta=1,2,3,4 and ν=(4η)\nu=\binom{4}{\eta}) and KK as follows.

Here we ignore the constraint that A,BA,B do not intersect, and follow the convention that gii=0g_{ii}=0 for every i[n]i\in[n].

Thus, with Eq. (173) we arrive at the following expansion:

We now prove a sequence of lemmas regarding the spectral properties of the matrices K,Jη,νK,J_{\eta,\nu}. The first one concerns the case η=2\eta=2, ν=1,6\nu=1,6 and η=4\eta=4, ν=1\nu=1.

With probability at least 13n51-3n^{-5}, we have that:

We prove that with probability at least 1n51-n^{-5}

for (η,ν)=(2,1),(2,6),(4,1)(\eta,\nu)=(2,1),(2,6),(4,1). The claim then follows by a union bound.

Let R(η,ν,m)R(\eta,\nu,m) denote a (η,ν)(\eta,\nu)-ribbon of length 2m2m. Then, by expanding the product we have:

With probability at least 18n51-8n^{-5}, we have

By the triangle inequality, it suffices to show that for ν{1,,4}\nu\in\{1,\dots,4\}, with probability 1n(Γ2)/21-n^{-(\Gamma-2)/2}:

Then we have J~3,2=α4pP([n]2)QP([n]2)T{\widetilde{J}}_{3,2}=\alpha_{4}p{\cal P}_{{\binom{[n]}{2}}}Q{\cal P}_{{\binom{[n]}{2}}}^{{\sf T}} and, consequently, J~3,22α4pQ2\|{\widetilde{J}}_{3,2}\|_{2}\leq\alpha_{4}p\left\lVert{Q}\right\rVert_{2}. Therefore it suffices to bound the latter, which we do again by the moment method. Firstly we define:

where in the last step we used the fact that U2=D2\|U\|_{2}=\|D\|_{2} by symmetry. Since mm can be taken arbitrarily large, we conclude that Q2U2\|Q\|_{2}\leq\|U\|_{2} and we proceed to bound the latter.

Finally, similar to Proposition 4.19 we have that g    nˉ1/2\left\lVert{g}\right\rVert{\;\lesssim\;}\bar{n}^{1/2} with probability at least 1n51-n^{-5}, hence with the same probability:

Here, R(3,2,m)R(3,2,m) is a (3,2)(3,2)-ribbon of length 2m2m and L~(R(3,2,m)\widetilde{{\mathfrak{L}}}(R(3,2,m) is a collection of labelings of R(3,2,m)R(3,2,m) satisfying the following criteria

The second condition follows because the vertices of degree smaller than 4 already have non-unique labels due to condition 2 of the labeling set L~(R(3,2,m))\widetilde{{\mathfrak{L}}}(R(3,2,m)).

By Eq. (199), it follows that with probability at least 12n51-2n^{-5}, J3,22    α4pnˉ    α4nˉ\|J_{3,2}\|_{2}{\;\lesssim\;}\alpha_{4}p\bar{n}{\;\lesssim\;}\alpha_{4}\bar{n}. This completes the proof of the lemma. ∎

For the case η=1\eta=1 we prove the following

Recall from the definition of J~1,ν{\widetilde{J}}_{1,\nu} that

We prove the second claim –cf. Eq. (202)– by the moment method, similar to Lemma 4.21. Let R(1,ν,m)R({1,\nu},m) be a (1,ν)(1,\nu)-ribbon of length 2m2m. Then:

By Lemma 4.15 it suffices to prove that v(R(1,ν,m))=3m+2v_{*}(R({1,\nu},m))=3m+2. The claim then follows, using Lemma 4.15 and the union bound.

In a similar fashion, we bound the norm of the terms J~2,2{\widetilde{J}}_{2,2}, J~2,3{\widetilde{J}}_{2,3}, J~2,4{\widetilde{J}}_{2,4}, J~2,5{\widetilde{J}}_{2,5}:

Further with probability at least 12n41-2n^{-4}

We prove the claim on the spectral norm for J~2,2{\widetilde{J}}_{2,2}. The claim for J~2,4{\widetilde{J}}_{2,4} holds in an analogous fashion. Let R(2,2,m)R({2,2},m) be a (2,2)(2,2)-ribbon of length mm. Then:

Finally, we have to deal with the remainder terms (recall that matrix KK is defined in Eq. (177)).

We have with probability at least 1n51-n^{-5} that:

We compute Tr{(KTK)m}{\sf{Tr}}\left\{(K^{\sf T}K)^{m}\right\}. Note that:

Here we set Am+1A1A_{m+1}\equiv A_{1}. The second equality follows since KK is supported on entries A,BA,B such that A,BA,B share exactly one vertex. Recalling the definition of star ribbons, each term that does not vanish in the summation above corresponds a labeling of a star ribbon S(2,1,m)S2,1mS({2,1},m)\in{\cal S}_{2,1}^{m} formed from a (2,1)(2,1)-ribbon of length 2m2m, i.e. we have:

Finally, we deal with the differences Jη,νJ~η,νJ_{\eta,\nu}-{\widetilde{J}}_{\eta,\nu}. (Recall that Jη,νJ_{\eta,\nu} and J~η,ν{\widetilde{J}}_{\eta,\nu} are defined in Eqs. (176) and (178).)

With probability at least 16n51-6n^{-5}, for each η2\eta\leq 2 and ν(4η)\nu\leq\binom{4}{\eta}:

We first consider Tr{((J~η,νJη,ν)T(J~η,νJη,ν))m}{\sf{Tr}}\left\{(({\widetilde{J}}_{\eta,\nu}-J_{\eta,\nu})^{\sf T}({\widetilde{J}}_{\eta,\nu}-J_{\eta,\nu}))^{m}\right\}. Let R(η,ν,m)R({\eta,\nu},m) be a (η,ν)(\eta,\nu)-ribbon of length 2m2m. As in the previous lemmas, we can write Tr{((J~η,νJη,ν)T(J~η,νJη,ν))m}{\sf{Tr}}\left\{(({\widetilde{J}}_{\eta,\nu}-J_{\eta,\nu})^{\sf T}({\widetilde{J}}_{\eta,\nu}-J_{\eta,\nu}))^{m}\right\} as a sum over labelings of R(η,ν,m)R({\eta,\nu},m) as follows:

On taking expectations the only labelings that do not vanish satisfy the additional criterion that every labeled edge is repeated at least twice in R(η,ν,m)R({\eta,\nu},m). We call this set of labelings L~2(R(η,ν,m))\widetilde{{\mathfrak{L}}}_{2}(R({\eta,\nu},m)). As in Lemma 4.24 it suffices to show that L~2(R(η,ν,m))(n2m+2)(22m(2m+2)3m+2)\left\lvert{\widetilde{{\mathfrak{L}}}_{2}(R({\eta,\nu},m))}\right\rvert\leq\binom{n}{2m+2}(2^{2m}(2m+2)^{3m+2}). This follows from the same arguments as in Lemmas 4.24, 4.23 (for η=1,2\eta=1,2 respectively), with the additional caveat that the isolated vertices in R(η,ν,m)R({\eta,\nu},m) are not unconstrained as before. Indeed, once the labels of the connected component of R(η,ν,m)R({\eta,\nu},m) are decided, there are only 2m2^{m} possible ways of choosing the labels for the isolated vertices. Consequently, we have the bound:

Applying Lemma 4.13, union bound and the triangle inequality yields the final result. ∎

The intersection of high probability events of Lemmas 4.21, 4.22, 4.23, 4.24, 4.25 and 4.26 holds with probability at least 125n51-25n^{-5}. We will condition on this event for the proof of the proposition.

This proves Eq. (172) and hence finishes our proof of Proposition 4.20.

With probability at least 15n51-5n^{-5} the following are true.

We first prove two Lemmas on the spectral properties of the matrices Lη,νL_{\eta,\nu}

With probability at least 1n51-n^{-5}, we have that

Equivalently, letting B(m)B(m) be a bridge of length 2m2m we have:

By Lemma 4.15 it suffices to show that v(B(m))2m+1v_{*}(B(m))\leq 2m+1. This argument is already covered in Lemma 4.24 and the claim hence follows. ∎

With probability exceeding 12n51-2n^{-5} the following holds:

We prove the claim for L1,1L_{1,1}. The same argument applies for L1,2L_{1,2} with minor modifications.

The above a sum over labelings of a bridge B(m)B(m) of type 1 and class 1, of length 2m2m. This is union of a cycle D(m)D(m) of length 2m2m, and mm isolated vertices. The lemma follows from Lemma 4.15 if v(B(m))2m+1v_{*}(B(m))\leq 2m+1. But by the above decomposition v(B(m))v(D(m))+m=m+1+m=2m+1v_{*}(B(m))\leq v_{*}(D(m))+m=m+1+m=2m+1, as in Proposition 4.19. This completes the proof. ∎

The intersection of favorable events of lemmas 4.28, 4.29 probability at least 15n(Γ4)/21-5n^{(\Gamma-4)/2}. The required claim then follows from Lemmas 4.28, 4.29 and triangle inequality. ∎

9 Proof of Proposition 4.1

The intersection of high probability favorable events of Propositions 4.19, 4.20 and 4.27 holds with probability at least 130n51n41-30n^{-5}\geq 1-n^{-4} for large enough nn. By Proposition 4.19 we already have the required bounds on H11{H}_{11} and H111{H}_{11}^{-1}, cf. Eqs. (55) and (56). It remains to show that on the same event:

By expanding the Rayleigh quotient of each term in Eq. (236), and noting that Wab=0{\overline{W}}_{ab}=0 for aba\neq b, it is straightforward to see that Eq. (236) holds if

The first condition correspond to assumption (53). For the second one, we develop explicit expressions of W{\overline{W}}, WW as follows. For W{\overline{W}}, we use Proposition 4.16, that yields immediately Wa,b=0{\overline{W}}_{a,b}=0 for aba\neq b as claimed, and W0,0{\overline{W}}_{0,0}, W1,1{\overline{W}}_{1,1}, W2,2{\overline{W}}_{2,2} as in Eqs. (43), (44), (45).

In order to develop expressions for WW we note that it is sufficient to guarantee

Using the upper bounds in Propositions 4.18, 4.20, 4.27 we obtain the expressions in Eqs. (46) to (51). This completes the proof.

References