Restricted isometry property of matrices with independent columns and neighborly polytopes by random sampling

Radosław Adamczak, Alexander E. Litvak, Alain Pajor, Nicole Tomczak-Jaegermann

Introduction

The connection between the neighborliness of K(A)K(A) and sparse solutions of underdetermined linear equations was discovered in , Theorem 1, where it is proved that the following two statements are equivalent:

K(A)K(A) has 2N2N vertices and is mm-neighborly

whenever y=Azy=Az has a solution zz having at most mm non-zero coordinates (in other words zz is mm-sparse), then zz is the unique solution of the program:

Definition. Let MM be a n×Nn\times N matrix. For any 1mmin(n,N)1\leq m\leq\min(n,N), the isometry constant of MM is defined as the smallest number δm=δm(M)\delta_{m}=\delta_{m}(M) so that

The relevance of this parameter for the reconstruction property ii) is for instance revealed in ,, where it was shown that if δm(M)+δ2m(M)+δ3m(M)<1\delta_{m}(M)+\delta_{2m}(M)+\delta_{3m}(M)<1 then MM satisfies ii) (see also , , ). In the present paper, we shall use the following sufficient condition from : if a matrix MM satisfies

then i) and ii) are satisfied. In other words, if MM has RIP2m(21){}_{2m}(\sqrt{2}-1) then MM has the reconstruction property ii). This approach gives the strategy of our paper.

Recall that no general construction of centrally symmetric polytopes is known to produce polytopes with an optimal order of neighborliness. All known results are of randomized nature, namely, they show that for a certain probability on the space of n×Nn\times N matrices, a polytope K(A)K(A) is mm-neighborly with overwhelming probability, for (large) mm depending on nn and NN. Consequently, from now on, AA will be a random matrix in some Ensemble in the sense of Random Matrix Theory. Due to the normalization, we shall consider the isometry constant of A/n{A/\sqrt{n}}. The plan consists in specializing to some model of random matrices, the condition δ2m(An)<21\delta_{2m}\left({A\over\sqrt{n}}\right)<\sqrt{2}-1.

Linear forms obey a uniform sub-exponential decay, that is, for all 1iN1\leq i\leq N, all ySn1{y\in S^{n-1}}, and t>0t>0,

The Euclidean norms of X1,,XNX_{1},\dots,X_{N} are concentrated around their average:

Note that such a concentration inequality is clearly necessary in order to have RIP1((21)/2){}_{1}((\sqrt{2}-1)/2).

One of the main results of this paper, Theorem 4.3, claims that under these conditions, whenever

the random polytope K(A)K(A) is mm-centrally-neighborly with probability larger than 12λCexp(cn)1-2\lambda-C\exp(-c\sqrt{n}), where C,c>0C,c>0 are universal numerical constants. We will make it more precise in Section 4. This model includes the cases when

XiX_{i}’s are independent isotropic random vectors with a log-concave density;

the entries of the matrix are independent, centered with variance one and satisfy a sub-exponential tail inequality;

XiX_{i}’s are on the sphere of radius n\sqrt{n} and linear forms exhibit a uniform sub-exponential tail inequality.

These examples give rise to new classes of compressed sensing matrices. The class of i.i.d. entries with sub-exponential tail behavior (that is, entries being ψ1\psi_{1} random variables), contains a subclass of matrices with i.i.d. ψr\psi_{r} entries for 1<r21<r\leq 2 (see Definition 2.1 below of ψr\psi_{r} random variables). Since in this case the obtained bounds are better by a power of logarithm that may be essential in applications, we prove our results in full generality, for 1r21\leq r\leq 2.

Sub-gaussian matrices with independent ψ2\psi_{2} entries, which correspond to r=2r=2, are by now well understood. They include for instance the Gaussian case when the matrix AA is built with i.i.d. Gaussian N(0,1)N(0,1) random variables (see ,,); the case when the entries of AA are i.i.d. (±1)(\pm 1) Bernoulli random variables (, , ); a general case of i.i.d. sub-gaussian entries is treated in (,, also see for simpler proofs).

Results of this paper are based on concentration type inequalities for random matrices under consideration. The proof of the main technical result, Theorem 3.2, will employ methods from . A crucial new ingredient consists of an analysis of the quantity

Notation and preliminaries

It is well known that the ψr\psi_{r}-norm of a random variable may be estimated from the growth of the moments. More precisely if a random variable YY is such that for any p1p\geq 1, Ypp1/rK\|Y\|_{p}\leq p^{1/r}K, for some K>0K>0, then YψrcK\|Y\|_{\psi_{r}}\leq cK where c>0c>0 is a numerical constant.

Remark: The above notation of Xψr\|X\|_{\psi_{r}} for the weak ψr\,\psi_{r} norm of a random vector XX should not be confused with the standard convention in the probability theory that this notation stands for the ψr\psi_{r} norm of the random variable X|X|, i.e., Xψr\|\,|X|\,\|_{\psi_{r}}–this latter meaning will never be used in this paper.

We recall the well known Bernstein’s inequality which we shall use in the form of a ψ1\psi_{1} estimate ().

Let Y1,...,YnY_{1},...,Y_{n} be independent real random variables with zero mean such that for some ψ>0\psi>0 and every ii, Yiψ1ψ\|Y_{i}\|_{\psi_{1}}\leq\psi. Then, for any t>0t>0,

in other words, if XX is centered and its covariance matrix is the identity.

It is well known that if a measure has a log-concave density, then linear functionals exhibit a sub-exponential decay. More precisely, we have:

where c>0c>0 is a universal constant. As a consequence, if XX is an isotropic random vector with a log-concave density then Xψ1c\|X\|_{\psi_{1}}\leq c.

The Euclidean norm of an isotropic random vector with a log-concave density highly concentrates around its expectation, this translates geometrically to the concentration of mass of an isotropic convex body within a thin Euclidean shell (, see also ). We will use here the following result immediately derived from , Theorem 4.4.

Moreover, one can take c0=3.33c_{0}=3.33 and c1=0.33c_{1}=0.33.

Remark: It is conjectured that in the above theorem one can replace θ3.33n0.33\theta^{3.33}n^{0.33} by c(θ)n1/2c(\theta)n^{1/2}.

We shall also use the following result from as formulated in .

with probability at least 1exp(Kn)1-\exp(-K\sqrt{n}).

In this paper, different universal positive constants may be denoted by the same letters C,C0,C,c,c0,cC,C_{0},C^{\prime},c,c_{0},c^{\prime}, etc.

Isometry constant

We begin this section by formulating, in Theorem 3.3, a general estimate for the isometry constant of random matrices with independent ψr\psi_{r} columns. Then, in order to apply such an estimate, we introduce two sufficient conditions that determine large classes of random matrices. Finally, we give examples of important classes that satisfy the estimates from Theorem 3.3 and thus provide us with models: the Log-Concave Ensemble, matrices with i.i.d. ψr\psi_{r} entries, and matrices defined by independent ψr\psi_{r} vectors on a sphere.

Techniques of “compressed sensing” rely on properties of the sampling matrix, which should act almost isometrically on sparse vectors. This motivated the concept of Restricted Isometry Property (RIP) defined in . To quantify this property of the “sensing” matrix, the authors introduced the isometry constant defined in the introduction, that we recall here for the convenience of the reader.

Let MM be a n×Nn\times N matrices and let δ(0,1)\delta\in(0,1). For any 1mmin(n,N)1\leq m\leq\min(n,N), the isometry constant of MM is defined as the smallest number δm=δm(M)\delta_{m}=\delta_{m}(M) so that

Thus the isometry constant is controlled by quantity BmB_{m} and the second term, maxiNXi2n1.\max_{i\leq N}\left|{|X_{i}|^{2}\over n}-1\right|. We begin by estimating BmB_{m}.

Then setting ξ=ψK+K\xi=\psi K+K^{\prime}, the inequality

where C,cC,c are absolute positive constants.

We postpone the proof of Theorem 3.2 to the last section. Combining this theorem with inequality (3.3), relating the RIP, BmB_{m} and concentration of the Euclidean norm of the XiX_{i}’s, we immediately deduce an estimate for the isometry constant of a random matrix with independent ψr\psi_{r} columns.

Condition H1(r,ψ)H_{1}(r,\psi): Linear forms obey a uniform ψr\psi_{r} estimate:

Condition H2(λ)H_{2}(\lambda): Xi|X_{i}|’s are concentrated around their average:

As already mentioned in the Introduction, a condition such as H2(λ)H_{2}(\lambda) is necessary to have the RIP. Indeed, if the matrix A/nA/\sqrt{n} has RIP1((21)/2){}_{1}((\sqrt{2}-1)/2) with probability λ\lambda then H2(λ)H_{2}(\lambda) is satisfied.

2 Examples

We now specialize Theorem 3.3 to some specific classes of matrices.

Assume the above “log-concave setting”. There exist universal constants ψ,C,c>0\psi,C,c>0 such that conditions H1(1,ψ)H_{1}(1,\psi) and H2(Cexp(cnc1))H_{2}(C\exp(-cn^{c_{1}})) are satisfied whenever Nexp(cnc1)N\leq\exp{(cn^{c_{1}})}, where c1{c_{1}} is given in Lemma 2.6.

The proof is immediate from Lemmas 2.5 and 2.6.

Applying Theorem 3.3 (with r=1r=1) together with Lemmas 3.4 and 2.7 to the Log-Concave Ensemble, we get that for every Nexp(cnc1)N\leq\exp(cn^{c_{1}}),

where C,c>0C,c>0 are universal constants and c1c_{1} is given in Lemma 2.6.

It might be worthwhile to note that using directly Lemma 2.6 one can replace the second term in estimate (3.6) by a term tending to 0 when nn\to\infty, but this would require an adjustment in probability. For example 1/nc1/2c01/n^{c_{1}/2c_{0}} works with the probability estimate in which exp(cnc1)\exp(-cn^{c_{1}}) is replaced by exp(cnc1/2)\exp(-cn^{c_{1}/2}). (Here c0c_{0} is given in Lemma 2.6.)

Consider now the “ψr\psi_{r} setting”, where the entries aija_{ij} of the matrix AA are independent centered, with variance one, random ψr\psi_{r} variables (with rr\in). Set ψ=maxijaijψr\psi=\max_{ij}\|a_{ij}\|_{\psi_{r}}.

Assume the above “ψr\psi_{r} setting” with rr\in. Then conditions H1(r,Cψ)H_{1}(r,C\psi) and H2(2exp(cnr/2/ψ2r))H_{2}(2\exp(-cn^{r/2}/\psi^{2r})) are satisfied whenever Nexp(cnr/2/ψ2r)N\leq\exp(cn^{r/2}/\psi^{2r}), where C,cC,c are absolute positive constants.

The following Lemma is a combination of Corollaries 2.9 and 2.10 of .

where 1/r+1/r=11/r^{\ast}+1/r=1 and aq=(a1q++anq)1/q\|a\|_{q}=(|a_{1}|^{q}+\ldots+|a_{n}|^{q})^{1/q}, for 1q<1\leq q<\infty.

The behavior of general centered ψr\psi_{r} variables can be easily reduced to symmetric Weibull variables. The argument is quite standard, we sketch it here for the sake of completeness.

Assume thus that Z1,,ZnZ_{1},\ldots,Z_{n} are independent mean zero random variables with Ziψr1\|Z_{i}\|_{\psi_{r}}\leq 1. Let β=(log2)1/r\beta=(\log 2)^{1/r} and set Ui=(Ziβ)+U_{i}=(|Z_{i}|-\beta)_{+}. Let YiY_{i} be defined as in Lemma 3.6.

We will use the above observation together with symmetrization and the contraction principle to estimate moments of linear combinations of variables ZiZ_{i}. We have for p1p\geq 1,

where to get the last two inequalities we used Khinchine’s inequality, Lemma 3.6 and integration by parts to pass from tail to moment estimates.

We are now ready to prove condition H1(r,Cψ)H_{1}(r,C\psi). Fix ySn1y\in S^{n-1} and consider the linear combination i=1nyiaij\sum_{i=1}^{n}y_{i}a_{ij}. Since aijψrψ\|a_{ij}\|_{\psi_{r}}\leq\psi, we obtain by homogeneity

The proof of condition H2H_{2} goes along similar lines. Instead of Lemma 3.6 we will now use the following lemma, which is an easy consequence of Theorem 6.2 in and the observation that the pp-th moment of a Weibull variable with parameter ss is of order Csp1/sC_{s}p^{1/s}, where CsC_{s} remains bounded for ss away from .

Moreover, for s1/2s\geq 1/2, CsC_{s} is bounded by some absolute constant.

Using similar arguments as in the proof of condition H1H_{1} we can infer from the above lemma that if Z1,,ZnZ_{1},\ldots,Z_{n} are independent mean zero random variables with Ziψsb\|Z_{i}\|_{\psi_{s}}\leq b (s[1/2,1)s\in[1/2,1)), then for p2p\geq 2,

Therefore, for any p2p\geq 2 by the Chebyshev inequality in LpL_{p},

for some (new) universal constant CC or equivalently

(The additional constants appearing above stem from the fact that under the standard definition for s<1s<1, ψs\|\cdot\|_{\psi_{s}} is not a norm but only a quasi-norm and additionally 1ψr/21\|1\|_{\psi_{r/2}}\neq 1. One can modify the function xexr1x\mapsto e^{x^{r}}-1 so that it is convex. For rr away from zero, this modification changes the norm by an absolute constant). Therefore, applying (3.7) with t=εnt=\varepsilon n yields

For r=2r=2 the proof is similar, but uses Lemma 3.6 (which in this case reduces to Bernstein’s ψ1\psi_{1} inequality) instead of Lemma 3.7 (the argument is simpler since in this case the involved norms of the vector aa do not depend on pp and we get (3.7) directly).

The lemma follows now by the union bound.

Applying Theorem 3.3 together with Lemma 3.5 to the “ψr\psi_{r} setting”, we get that for every Nexp(cnr/2/ψ2r)N\leq\exp(cn^{r/2}/\psi^{2r}),

2.3 Vectors on a sphere

Another interesting case is when the vectors X1,,XNX_{1},\dots,X_{N} lie on a common sphere. To keep the same normalization as in the previous cases we assume that the sphere has the radius n\sqrt{n}. Then condition (3.5) becomes empty. Let 1r21\leq r\leq 2 and assume that the vectors are ψr\psi_{r} and let ψ=maxiNXiψr\psi=\max_{i\leq N}\|X_{i}\|_{\psi_{r}}. Let K1K\geq 1 and set ξ=ψK\xi=\psi K. Then Theorem 3.3 immediately gives that

The geometry of faces of random polytopes

In this Section we discuss the geometry of random polytopes. Let AA be an n×Nn\times N matrix. We denote by K+(A)K^{+}(A) (resp. K(A)K(A)) the convex hull (resp., the symmetric convex hull) of the NN columns of AA.

For an integer 1mn1\leq m\leq n, a polytope is called mm-neighborly if any set of less than mm vertices is the vertex set of a face. In the symmetric setting, a centrally symmetric convex polytope is mm-centrally-neighborly if any set of less than mm vertices containing no-opposite pairs is the vertex set of a face. We refer the reader to the books and for classical details on neighborly polytopes. (Some new quantitative invariants related to neighborliness were recently developed in .)

The relation between the problem of reconstruction and neighborly polytopes was discovered in .

(, Theorem 1) Let AA be a n×Nn\times N matrix, nNn\leq N. The following two assertions are equivalent.

The polytope K(A)K(A) has 2N2N vertices and is mm-centrally-neighborly.

Whenever y=Azy=Az has a solution zz having at most mm non-zero coordinates, zz is the unique solution of the optimization problem (P)(P):

We will also use the following result from (which could be replaced by a similar result from ).

We are now ready to state the main result on neighborly random polytopes.

Let 1mnN1\leq m\leq n\leq N be integers. Let 1r21\leq r\leq 2. Let ψ1\psi\geq 1 and λ(0,1/2)\lambda\in(0,1/2). Let X1,,XNX_{1},\dots,X_{N} be independent random vectors satisfying H1(r,ψ)H_{1}(r,\psi) with parameter ψ\psi and H2(λ)H_{2}(\lambda) with probability λ\lambda. Let AA be the n×Nn\times N matrix with X1,,XNX_{1},\dots,X_{N} as columns. Then, with probability larger than

the polytopes K+(A)K^{+}(A) and K(A)K(A) are mm-neighborly and mm-centrally-neighborly, respectively, whenever

Observe that the probability is positive for nn large enough provided that λ<1/2\lambda<1/2.

Proof. Theorem 3.3 and the definition of property H1(r,ψ)H_{1}(r,\psi) imply that for arbitrary θ(0,1)\theta^{\prime}\in(0,1), and K,K1K,K^{\prime}\geq 1, setting ξ=ψK+K\xi=\psi K+K^{\prime}, we have

In view of Lemma 4.2, we look for mm and θ\theta^{\prime} to ensure δ2m(A/n)<21\delta_{2m}(A/\sqrt{n})<\sqrt{2}-1. For instance, we let θ=(21)/2\theta^{\prime}=(\sqrt{2}-1)/2 and note that (3.5) implies

Now set m_{0}=[{c^{\prime}n\big{/}\psi^{4}\log^{2/r}(C^{\prime}\psi^{6}N/n)}] (for some new constants C,c>0C^{\prime},c^{\prime}>0). It is clearly sufficient to prove that the polytopes K+(A)K^{+}(A) and K(A)K(A) are m0m_{0}-neighborly and m0m_{0}-centrally-neighborly, respectively. Thus adjusting the constants C,c>0C^{\prime},c^{\prime}>0 and writing mm for m0m_{0}, we obtain

Combining this with the choice of θ\theta^{\prime}, passing from mm to 2m2m and adjusting the constants again if necessary, we conclude that δm(An)<21\delta_{m}\left({A\over\sqrt{n}}\right)<\sqrt{2}-1 with probability larger than

The last estimate follows from (4.1) by applying (3.5) and (4.2) to the last two terms, respectively; and where C,c>0C^{\prime\prime},c^{\prime\prime}>0 are again new constants.

2 Examples

We will now apply Theorem 4.3 in the three different settings introduced in the previous section.

Applying Lemma 3.4 and bound (3.6) we get the following:

Let 1mnN1\leq m\leq n\leq N be integers. Let X1,,XNX_{1},\dots,X_{N} be independent isotropic vectors with log-concave densities. This is for instance the case if X1,,XNX_{1},\dots,X_{N} are i.i.d. random vectors uniformly distributed on an isotropic convex body. Then, for any Nexp(cnc1/2)N\leq\exp(cn^{c_{1}/2}), with probability at least 1Cexp(cnc1/2)1-C\exp(-cn^{c_{1}/2}), the polytopes K+(A)K^{+}(A) and K(A)K(A) are mm-neighborly and mm-centrally-neighborly, respectively, whenever

where C,c>0C,c>0 are universal constants and c1{c_{1}} is given in Lemma 2.6.

In a similar way as above, Lemma 3.5 and bound (3.8) imply the following theorem (note that its conclusion becomes empty if Nexp(cnr/2/ψ2r)N\geq\exp(cn^{r/2}/\psi^{2r}) and ψ1\psi\geq 1).

Let AA be a matrix with entries that are independent centered variance one random variables. Let 1r21\leq r\leq 2 and assume that the ψr\psi_{r} norms of the entries are bounded by some constant ψ\psi. Then, for any Nexp(cnr/2/ψ2r)N\leq\exp(cn^{r/2}/\psi^{2r}), with probability at least 1Cexp(cnr/2/ψ2r)1-C\exp(-cn^{r/2}/\psi^{2r}), the polytopes K+(A)K^{+}(A) and K(A)K(A) are mm-neighborly and mm-centrally-neighborly, respectively, whenever 1mn1\leq m\leq n satisfies

2.3 Vectors on a sphere

Finally assume that the vectors are on a sphere of radius n\sqrt{n}. From bound (3.9) we obtain:

Let 1mnN1\leq m\leq n\leq N be integers. Let 1r21\leq r\leq 2. Let X1,,XNX_{1},\dots,X_{N} be independent vectors on a sphere of radius n\sqrt{n} and satisfying H1(r,ψ)H_{1}(r,\psi) for some parameter ψ>0\psi>0. Let K1K\geq 1 and set ξ=ψK\xi=\psi K. Then, with probability at least 1Cexp(Kn/ψ2)1-C\exp(-K\sqrt{n}/\psi^{2}), the polytopes K+(A)K^{+}(A) and K(A)K(A) are mm-neighborly and mm-centrally-neighborly, respectively, whenever

Remark: 1) For the matrix AA with i.i.d. Gaussian N(0,1)N(0,1) entries (the case considered in Section 3.2.2 above when r=2r=2), it is known that with overwhelming probability, K(A)K(A) is mm-centrally-neighborly, whenever 1mn1\leq m\leq n satisfies

where C,c>0C,c>0 are universal constants, (see ,,,). The precise asymptotic dependence of mm on nn and NN has been well studied in when n/Nδ(0,1)n/N\to\delta\in(0,1) and in when n/N0n/N\to 0.

Main technical result

and define the other two quantities as follows:

Given a real number ss, we will denote max(s,0)\max(s,0) by s+s_{+}.

The main purpose of this Section is to prove Theorem 3.2. In fact we will prove a stronger technical result, Theorem 5.1, from which Theorem 3.2 will follow.

where C0C_{0} is an absolute constant and q=max{1,1/r}q=\max\{1,1/r\}.

CC is an absolute constant and q=max{1,1/r}q=\max\{1,1/r\}.

Before starting the proof of the theorem we show how it implies Theorem 3.2, stated in Section 3.

Fix K11K_{1}\geq 1 and let KK1K\geq K_{1} be such that

By Theorem 5.1 with r1r\geq 1, and the condition on mm,

and cc and C0C_{0} are absolute positive constants. Thus, if CmK2nC_{m}\leq K_{2}\sqrt{n} for some K2K_{2}, then

where C1C_{1} is an absolute constant. This concludes the proof.

2 Proof of Theorem 5.1

The proof will use the same construction as in , which however requires some modifications. For completeness and the reader’s convenience we provide details of the argument.

We require the following two lemmas proved in with r=1r=1. Since the proofs for general rr repeat the same arguments, we leave them for the reader.

The following formula is well known and the proof is in its statement.

We are now ready to start the proof of Theorem 5.1.

Proof of Theorem 5.1. As in , the construction splits into two cases.

Now we split DxD_{x} according to the structure of xx. Namely we let

We first estimate DxD^{\prime}_{x}. By Lemma 5.4 we have

We now set q=max{1,1/r}q=\max\{1,1/r\} and apply Lemma 5.2 to each summand in the sum above with the parameters

We now pass to the estimate for DxD^{\prime\prime}_{x} which essentially follows the same lines.

Then Mk(θ)2B2N{\cal{M}}_{k}(\theta)\subset 2B_{2}^{N} and (similarly as in ) we can estimate the cardinality

where C2C_{2} is the an absolute constant.

Since Dx=Dx+DxD_{x}=D^{\prime}_{x}+D^{\prime\prime}_{x}, then

Moreover, xx is chosen to have the same support as zz, and thus w=zxw=z-x has the support suppwm|\mathop{\rm supp}w|\leq m.

It follows from the definitions of DzD_{z} and AA that

(here w(i)w(i), x(i)x(i) and z(i)z(i) denote the coordinates of ww, xx and zz, respectively). Thus

Thus, by (5.4) and using again AmBm2+Cm2Bm+CmA_{m}\leq\sqrt{B_{m}^{2}+C_{m}^{2}}\leq B_{m}+C_{m} we obtain

3 Optimality of estimates

We conclude this section by an example showing optimality, in a certain sense, of estimates in Theorem 3.2. We will limit ourselves to the ψ1\psi_{1} case, that is to r=1r=1. To this end we consider a special case when Xi=(Xij)j=1nX_{i}=(X_{ij})_{j=1}^{n} where XijX_{ij} are i.i.d. symmetric exponential variables with variance one. We begin by showing an optimal estimate for AmA_{m}.

First, from (Theorem 3.5) we have that for Nexp(cn)N\leq\exp(c\sqrt{n}) and any K1K\geq 1,

where C,c>0C,c>0 are numerical constants. In the other direction, we have the following

by general tail estimates for linear combinations of exponential variables with vector valued coefficients (see e.g. Corollary 1 in ), we get

where to simplify the notation we set Yi=Xi1Y_{i}=X_{i1}. On the right hand side we actually have i=1mYi\sum_{i=1}^{m}|Y_{i}^{\ast}|, where YiY_{i}^{\ast} is such a rearrangement of YiY_{i} that Y1Y2Yn|Y_{1}^{\ast}|\geq|Y_{2}^{\ast}|\geq\ldots\geq|Y_{n}^{\ast}|, which can be used to derive lower bounds on the expectation. We will however not rely on this representation, instead we will use a Sudakov type minoration principle for exponential variables proved in , Theorem 5.2.9, which we state here in a simplified version, adapted to our purposes.

References