A simple proof that random matrices are democratic

Mark A. Davenport, Jason N. Laska, Petros T. Boufounos, Richard G. Baraniuk

Introduction

where Φ\boldsymbol{\Phi} is an M×NM\times N matrix that models the measurement system. The hope is that we can design Φ\boldsymbol{\Phi} so that x{\mathbf{x}} can be accurately recovered even when MNM\ll N. In general this is not possible, but if x{\mathbf{x}} is KK-sparse, meaning that it has only KK nonzero entries then it is possible to design Φ\boldsymbol{\Phi} that preserve the information about x{\mathbf{x}} using only M=O(Klog(N/K))M=O(K\log(N/K)) measurements. The most commonly studied Φ\boldsymbol{\Phi} that satisfy this bound on MM are random, i.e., each entry of Φ\boldsymbol{\Phi} is drawn independently from some suitable distribution . We will focus our attention on such Φ\boldsymbol{\Phi}.

Among the advantages of random measurements is a property commonly referred to as democracy. While it is not usually rigorously defined in the literature, democracy is usually taken to mean that each measurement contributes a similar amount of information about the signal x{\mathbf{x}} to the compressed representation y{\mathbf{y}} .The original introduction of this term was with respect to quantization , i.e., a democratic quantizer would ensure that each bit is given “equal weight.” As the CS framework developed, it became empirically clear that CS systems exhibited this property with respect to compression . Others have described democracy to mean that each measurement is equally important (or unimportant) . Despite the fact that democracy is so frequently touted as an advantage of random measurements, it has received little analytical attention in the CS context. Perhaps more surprisingly, the property has not been explicitly exploited in applications until recently .

The fact that random measurements are democratic seems intuitive; when using random measurements, each measurement is a randomly weighted sum of a large fraction (or all) of the entries of x{\mathbf{x}}, and since the weights are chosen independently at random, no preference is given to any particular entries. More concretely, suppose that the measurements y1,y2,,yMy_{1},y_{2},\ldots,y_{M} are independent and identically distributed (i.i.d.) according to some distribution fYf_{Y}, as is the case for the Φ\boldsymbol{\Phi} considered in this report. Now suppose that we select M~<M\widetilde{M}<M of the yiy_{i} at random (or according to some procedure that is independent of y{\mathbf{y}}). Then clearly, we are left with a length-M~\widetilde{M} measurement vector y~\widetilde{{\mathbf{y}}} such that each y~ifY\widetilde{y}_{i}\sim f_{Y}. Stated another way, if we set D=MM~D=M-\widetilde{M}, then there is no difference between collecting M~\widetilde{M} measurements and collecting MM measurements and deleting DD of them, provided that this deletion is done independently of the actual values of y{\mathbf{y}}.

However, following this line of reasoning will ultimately lead to a rather weak definition of democracy. To see this, consider the case where the measurements are deleted by an adversary. By adaptively deleting the entries of y{\mathbf{y}} one can change the distribution of y~\widetilde{{\mathbf{y}}}. For example, the adversary can delete the DD largest elements of y{\mathbf{y}}, thereby skewing the distribution of y~\widetilde{{\mathbf{y}}}. In many cases, especially if the same matrix Φ\boldsymbol{\Phi} will be used repeatedly with different measurements being deleted each time, it would be far better to know that any M~\widetilde{M} measurements will be sufficient to reconstruct the signal. This is a significantly stronger requirement.

In order to formally define this stronger notion of democracy, we must first describe the properties that a matrix must satisfy to ensure stable reconstruction. Towards that end, we recall the definition of the restricted isometry property (RIP) for the matrix Φ\boldsymbol{\Phi} .

A matrix Φ\boldsymbol{\Phi} satisfies the RIP of order KK with constant δ(0,1)\delta\in(0,1) if

holds for all x{\mathbf{x}} such that x0K\|{\mathbf{x}}\|_{0}\leq K.

Much is known about matrices that satisfy the RIP, but for our purposes it suffices to note that if we draw a random M×NM\times N matrix Φ\boldsymbol{\Phi} whose entries ϕij\phi_{ij} are i.i.d. sub-Gaussian random variables, then provided that

we have that with high probability Φ\boldsymbol{\Phi} will satisfy the RIP of order KK with constant δ\delta .

The RIP also provides us with a way to quantify our notion of democracy. To do so, we first establish some notation that will prove useful throughout this report. Let Γ{1,2,,,M}\Gamma\subset\{1,2,,\ldots,M\}. By ΦΓ\boldsymbol{\Phi}^{\Gamma} we mean the Γ×M|\Gamma|\times M matrix obtained by selecting the rows of Φ\boldsymbol{\Phi} indexed by Γ\Gamma. Alternatively, if Λ{1,2,,N}\Lambda\subset\{1,2,\dots,N\}, then we use ΦΛ\boldsymbol{\Phi}_{\Lambda} to indicate the M×ΛM\times|\Lambda| matrix obtained by selecting the columns of Φ\boldsymbol{\Phi} indexed by Λ\Lambda. Following , we now formally define democracy as follows.

Let Φ\boldsymbol{\Phi} be and M×NM\times N matrix, and let M~M\widetilde{M}\leq M be given. We say that Φ\boldsymbol{\Phi} is (M~,K,δ)(\widetilde{M},K,\delta)-democratic if for all Γ\Gamma such that ΓM~|\Gamma|\geq\widetilde{M} the matrix ΦΓ\boldsymbol{\Phi}^{\Gamma} satisfies the RIP of order KK with constant δ\delta.

In Section 2 below we present a simple proof that Gaussian matrices are democratic and demonstrate how the proof can be extended to sub-Gaussian matrices. The core of this proof can be found in , but is included in full in this report. In Section 3 we discuss the implications of the result and alternative interpretations. Section 4 contains the additional theorems required by the proof.

Random matrices are democratic

We now demonstrate that certain randomly generated matrices are democratic. While the theorem actually holds (with different constants) for the more general class of sub-Gaussian matrices, for simplicity we restrict our attention to Gaussian matrices. We provide discussion of the sub-Gaussian case in Section 4.

Let Φ\boldsymbol{\Phi} by an M×NM\times N matrix with elements ϕij\phi_{ij} drawn according to N(0,1/M)\mathcal{N}(0,1/M) and let M~M\widetilde{M}\leq M, K<M~K<\widetilde{M}, and δ(0,1)\delta\in(0,1) be given. Define D=MM~D=M-\widetilde{M}. If

then with probability exceeding 13eC2M1-3e^{-C_{2}M} we have that Φ\boldsymbol{\Phi} is (M~,K,δ/(1δ))(\widetilde{M},K,\delta/(1-\delta))-democratic, where C1C_{1} is arbitrary and C2=(δ/8)2log(42e/δ)/C1.C_{2}=(\delta/8)^{2}-\log(42e/\delta)/C_{1}.

Our proof consists of two main steps. We begin by defining the M×(N+M)M\times(N+M) matrix A=[I Φ]{\mathbf{A}}=[{\mathbf{I}}~{}\boldsymbol{\Phi}] formed by appending Φ\boldsymbol{\Phi} to the M×MM\times M identity matrix. Theorem 2, also found in , demonstrates that under the assumptions in the theorem statement, with probability exceeding 13eC2M1-3e^{-C_{2}M} we have that A{\mathbf{A}} satisfies the RIP of order K+DK+D with constant δ\delta. The second step is to use this fact to show that all possible M~×N\widetilde{M}\times N submatrices of Φ\boldsymbol{\Phi} satisfy the RIP of order KK with constant δ/(1δ)\delta/(1-\delta).

Towards this end, we let Γ{1,2,,M}\Gamma\subset\{1,2,\ldots,M\} be an arbitrary subset of rows such that ΓM~|\Gamma|\geq\widetilde{M}. Define Λ={1,2,,M}Γ\Lambda=\{1,2,\ldots,M\}\setminus\Gamma and note that Λ=D|\Lambda|=D. Additionally, let

be the orthogonal projector onto R(AΛ)\mathcal{R}({\mathbf{A}}_{\Lambda}), i.e., the range, or column space, of AΛ{\mathbf{A}}_{\Lambda}.AΛ=(AΛTAΛ)1AΛT{\mathbf{A}}_{\Lambda}^{\dagger}=({\mathbf{A}}_{\Lambda}^{T}{\mathbf{A}}_{\Lambda})^{-1}{\mathbf{A}}_{\Lambda}^{T} denotes the Moore-Penrose pseudoinverse of AΛ{\mathbf{A}}_{\Lambda}. Furthermore, we define

as the orthogonal projector onto the orthogonal complement of R(AΛ)\mathcal{R}({\mathbf{A}}_{\Lambda}). In words, this projector nulls the columns of A{\mathbf{A}} corresponding to the index set Λ\Lambda. Now, note that Λ{1,2,,M}\Lambda\subset\{1,2,\ldots,M\}, so AΛ=IΛ{\mathbf{A}}_{\Lambda}={\mathbf{I}}_{\Lambda}. Thus,

where we use I(Λ){\mathbf{I}}(\Lambda) to denote the M×MM\times M matrix with all zeros except for ones on the diagonal entries corresponding to the columns indexed by Λ\Lambda. (We distinguish the M×MM\times M matrix I(Λ){\mathbf{I}}(\Lambda) from the M×DM\times D matrix IΛ{\mathbf{I}}_{\Lambda} — in the former case we replace columns not indexed by Λ\Lambda with zero columns, while in the latter we remove these columns to form a smaller matrix.) Similarly, we have

Thus, we observe that the matrix PΛA=I(Γ)A{\mathbf{P}}_{\Lambda}^{\perp}{\mathbf{A}}={\mathbf{I}}(\Gamma){\mathbf{A}} is simply the matrix A{\mathbf{A}} with zeros replacing all entries on any row ii such that iΓi\notin\Gamma, i.e., (PΛA)Γ=AΓ({\mathbf{P}}_{\Lambda}^{\perp}{\mathbf{A}})^{\Gamma}={\mathbf{A}}^{\Gamma} and (PΛA)Λ=0({\mathbf{P}}_{\Lambda}^{\perp}{\mathbf{A}})^{\Lambda}=\boldsymbol{0}. Furthermore, Theorem 3, also found in , states that for A{\mathbf{A}} satisfying the RIP of order K+DK+D with constant δ\delta, we have that

Discussion

Observe that we require roughly O(Dlog(N))O(D\log(N)) additional measurements to ensure that Φ\boldsymbol{\Phi} is (M~,K,δ)(\widetilde{M},K,\delta)-democratic compared to the number of measurements required to simply ensure that Φ\boldsymbol{\Phi} satisfies the RIP of order KK. This seems intuitive; if we wish to be robust to the loss of any DD measurements while retaining the RIP of order KK, then we should expect to take at least DD additional measurements. This is not unique to the CS framework. For instance, by oversampling, i.e., sampling faster than the minimum required Nyquist rate, uniform sampling systems can also improve robustness with respect to the loss of measurements. However, a benefit of the democratic CS system is that the number of additional measurements needed grows more slowly than in the Nyquist case. To see this, consider the case where we lose DD samples or measurements. For a fixed time period, suppose that sampling the signal at the Nyquist rate yields NN samples. To be robust to the loss of a contiguous block of DD samples, we must sample at D+1D+1 times the Nyquist rate, yielding DNDN additional samples. In contrast, the number of additional measurements needed for a CS measurement system to be democratic is O(Dlog(N))O(D\log(N)), given by (4). Thus, the number of additional samples required by a Nyqust sampler depends linearly on DD and NN while the number of additional measurements for democratic CS systems is still linear in DD but only logarithmic in NN. If NN is large, this can result in tremendous savings. Note also that for a fixed NN and KK, by driving MM higher a CS measurement system can be robust to the loss of a large fraction of the acquired measurements, whereas in Nyquist oversampling, the fraction of (consecutive) samples that can be dropped can never exceed 1/N1/N.

In some applications, this difference may have significant impact. For example, in finite dynamic range quantizers, the measurements saturate when their magnitude exceeds some level. Thus, when uniformly sampling with a low saturation level, if one sample saturates, then the likelihood that any of the neighboring samples will saturate is high, and significant oversampling may be required to ensure any benefit. However, in CS, if many adjacent measurements were to saturate, then for only a slight increase in the number of measurements we can mitigate this kind of error by simply rejecting the saturated measurements; the fact that Φ\boldsymbol{\Phi} is democratic ensures that this strategy will be effective.

In addition to robustness, Theorem 1 implies that reconstruction from a subset of CS measurements is stable to the loss of a potentially larger number of measurements than anticipated. To see this, suppose that and M×NM\times N matrix Φ\boldsymbol{\Phi} is (MD,K,δ)(M-D,K,\delta)-democratic, but consider the situation where D+D~D+\widetilde{D} measurements are dropped. It is clear from the proof of Theorem 1 that if D~<K\widetilde{D}<K, then the resulting matrix ΦΓ\boldsymbol{\Phi}^{\Gamma} will satisfy the RIP of order KD~K-\widetilde{D} with constant δ\delta. Thus, from , if we define K~=(KD~)/2\widetilde{K}=(K-\widetilde{D})/2, then the reconstruction error is then bounded by

where xK~{\mathbf{x}}_{\widetilde{K}} denotes the best K~\widetilde{K}-term approximation of x{\mathbf{x}} and C3C_{3} is an absolute constant depending on Φ\boldsymbol{\Phi} that can be bounded using the constants derived in Theorem 1. Thus, if D~\widetilde{D} is small then the additional error caused by dropping too many measurements will also be relatively small. In contrast, there is simply no analog to this kind of stability result for uniform sampling with linear reconstruction. When the number of dropped samples exceeds DD (where DD represents the oversampling factor described above), there is are no guarantees as to the accuracy of the reconstruction.

2 Numerical exploration

As discussed previously, the democracy property is a stronger condition than the RIP. To demonstrate this, we perform a numerical simulation which illustrates this point. Specifically, we would like to compare the case where the measurements are dropped at random versus the case where the dropped measurements are selected by an adversary. Ideally, we would like to know whether the resulting matrices satisfy the RIP. Of course, this experiment is impossible to perform for two reasons: first, determining if a matrix satisfies the RIP is computationally intractable as it would require checking all possible KK-dimensional sub-matrices of ΦΓ\boldsymbol{\Phi}^{\Gamma}. Moreover, in the adversarial setting one would also have to search for the worst possible Γ\Gamma as well, which is impossible for the same reason. Thus, we instead perform a far simpler experiment, which serves as a very rough proxy to the experiment we would like to perform.

The experiment proceeds over 100100 trials as follows. We fix the parameters N=2048N=2048 and K=13K=13 and vary MM in the range (0,380)(0,380). In each trial we draw a new matrix Φ\Phi with ϕijN(0,1/M)\phi_{ij}\sim\mathcal{N}(0,1/M) and a new signal with KK nonzero coefficients, also drawn from a Gaussian distribution, and then the signal is normalized x2=1\|{\mathbf{x}}\|_{2}=1. Over each set of trials we estimate two quantities:

the maximum DD such that we achieve exact reconstruction for a randomly selected (MD)×N(M-D)\times N submatrix of Φ\Phi on each of the 100100 trials;

the maximum DD such that we achieve exact reconstruction for R=300R=300 randomly selected (MD)×N(M-D)\times N submatrices of Φ\Phi on each of the 100100 trials..

Ideally, the second case should consider all (MD)×N(M-D)\times N submatrices of Φ\Phi rather than just 300 submatrices, but as this is not possible (for reasons discussed above) we simply perform a random sampling of the space of possible submatrices. Note also that exact recovery on one signal is also not proof that the matrix satisfies the RIP, although failure is proof that the matrix does not.

The results of this experiment are depicted in Figure 1. The circles denote data points with the empty circles corresponding to the random selection experiment and the solid circles corresponding to the democracy experiment. The lines denote the best linear fit for each data set where D>0D>0, with the dashed line corresponding to the random selection experiment and the solid line corresponding to democracy experiment.

The maximum DD corresponding to the random selection experiment grows linearly in MM (with coefficient 1) once the minimum number of measurements required for RIP, denoted by MM^{\prime}, is reached. This is because beyond this point at most D=MMD=M-M^{\prime} measurements can be discarded. As demonstrated by the plot, M90M^{\prime}\approx 90 for this experiment. For the democracy experiment M150M^{\prime}\approx 150, larger than for the RIP experiment. Furthermore, the maximum DD for democracy grows more slowly than for the random selection case, which indicates that to be robust to the loss of any DD measurements, CDCD additional measurements, with C>1C>1, are actually necessary.

Theorems

In this section, we prove the two supporting Theorems used in the proof of Theorem 1. We begin by demonstrating that the matrix A=[I Φ]{\mathbf{A}}=[{\mathbf{I}}~{}\boldsymbol{\Phi}] satisfies the RIP. To do so, we first establish the following lemma, that closely parallels the result in equation (4.3) of . The lemma demonstrates that for any u{\mathbf{u}}, if we draw Φ\boldsymbol{\Phi} at random, then Au2\|{\mathbf{A}}{\mathbf{u}}\|_{2} is concentrated around u2\|{\mathbf{u}}\|_{2}.

We first note that since Au=w+Φx{\mathbf{A}}{\mathbf{u}}={\mathbf{w}}+\boldsymbol{\Phi}{\mathbf{x}}, we have that

and since u22=w22+x22+\|{\mathbf{u}}\|_{2}^{2}=\|{\mathbf{w}}\|_{2}^{2}+\|{\mathbf{x}}\|_{2}^{2}+, this establishes (9).

We now turn to (10). Using the arguments in , one can show that

As noted above, 2wTΦxN(0,4w22x22/M)2{\mathbf{w}}^{T}\boldsymbol{\Phi}{\mathbf{x}}\sim\mathcal{N}\left(0,4\|{\mathbf{w}}\|_{2}^{2}\|{\mathbf{x}}\|_{2}^{2}/M\right). Hence, we have that

where Q()Q(\cdot) denotes the tail integral of the standard Gaussian distribution. From (13.48) of we have that

Thus, combining (12) and (13) we obtain that with probability at least 13eMη2/81-3e^{-M\eta^{2}/8} we have that both

Using (11), we can combine (14) and (15) to obtain

where the last inequality follows from the fact that w2x2u2u2\|{\mathbf{w}}\|_{2}\|{\mathbf{x}}\|_{2}\leq\|{\mathbf{u}}\|_{2}\|{\mathbf{u}}\|_{2}. Similarly, we also have that

We note that while the above proof assumes that the entries of Φ\boldsymbol{\Phi} are Gaussian, this proof holds with essentially no modifications for a wide class of sub-Gaussian distributions. A random variable XX is sub-Gaussian if there exists a constant C>0C>0 such that

Using Lemma 1, we now demonstrate that the matrix A{\mathbf{A}} satisfies the RIP provided that MM is sufficiently large.

Let Φ\boldsymbol{\Phi} be an M×NM\times N matrix with elements ϕij\phi_{ij} drawn according to N(0,1/M)\mathcal{N}(0,1/M) and let let A=[I Φ]{\mathbf{A}}=[{\mathbf{I}}~{}\boldsymbol{\Phi}]. If

then with probability exceeding 13eC2M1-3e^{-C_{2}M} we have that A{\mathbf{A}} satisfies the RIP of order (K+D)(K+D) with constant δ\delta, where C1C_{1} is arbitrary and C2=(δ/8)2log(42e/δ)/C1.C_{2}=(\delta/8)^{2}-\log(42e/\delta)/C_{1}.

First note that it is enough to prove (17) in the case x2=1\|\mathbf{x}\|_{2}=1, since A{\mathbf{A}} is linear. Next, fix an index set J{1,2,,N+M}J\subset\{1,2,\ldots,N+M\} with J=K+D|J|=K+D, and let XJX_{J} denote the (K+D)(K+D)-dimensional subspace spanned by the columns of A{\mathbf{A}} indexed by JJ. We choose a finite set of points SJS_{J} such that SJXJS_{J}\subseteq X_{J}, s21\|\mathbf{s}\|_{2}\leq 1 for all sSJ\mathbf{s}\in S_{J}, and for all xXJ\mathbf{x}\in X_{J} with x21\|\mathbf{x}\|_{2}\leq 1 we have

One can show (see Chapter 15 of ) that such a set SJS_{J} exists with SJ(3/ϵ)K+D|S_{J}|\leq(3/\epsilon)^{K+D}. We then repeat this process for each possible index set JJ, and collect all the sets SJS_{J} together

There are (N+MK+D)(eN+MK+D)K+D\binom{N+M}{K+D}\leq\left(e\frac{N+M}{K+D}\right)^{K+D} possible index sets JJ, and hence S(3eϵN+MK+D)K+D|S|\leq\left(\frac{3e}{\epsilon}\frac{N+M}{K+D}\right)^{K+D}. We now use the union bound to apply Lemma 1 to this set of points such that, with probability exceeding

We now define ΣK+D={x:x0K+D}\Sigma_{K+D}=\{{\mathbf{x}}:\|{\mathbf{x}}\|_{0}\leq K+D\}. We define BB as the smallest number such that

Our goal is to show that B1+δB\leq\sqrt{1+\delta}. For this, we recall that for any xΣK+D\mathbf{x}\in\Sigma_{K+D} with x21\|\mathbf{x}\|_{2}\leq 1, we can pick a sS\mathbf{s}\in S such that xs2ϵ\|\mathbf{x}-\mathbf{s}\|_{2}\leq\epsilon and such that xsΣK+D\mathbf{x}-\mathbf{s}\in\Sigma_{K+D} (since if xXJ\mathbf{x}\in X_{J}, we can pick sSJXJ\mathbf{s}\in S_{J}\subset X_{J} satisfying xs2ϵ\|\mathbf{x}-\mathbf{s}\|_{2}\leq\epsilon). In this case we have

Since by definition BB is the smallest number for which (21) holds, we obtain B1+2η+Bϵ\sqrt{B}\leq\sqrt{1+2\eta}+\sqrt{B}\epsilon, which upon rearranging yields B1+2η/(1ϵ)\sqrt{B}\leq\sqrt{1+2\eta}/(1-\epsilon). One can show that by setting ϵ=δ/14\epsilon=\delta/14 and η=δ/22\eta=\delta/2\sqrt{2}, we have that 1+2η/(1ϵ)1+δ\sqrt{1+2\eta}/(1-\epsilon)\leq\sqrt{1+\delta}, which establishes the upper inequality in (2). The lower inequality follows from this since

where the last inequality again holds with ϵ=δ/14\epsilon=\delta/14 and η=δ/22\eta=\delta/2\sqrt{2}. This establishes the theorem. To arrive at the formula for C2C_{2} we first bound the result in (20) using

and then we replace (K+D)log((N+M)/(K+D))(K+D)\log((N+M)/(K+D)) with M/C1M/C_{1}. After simplification, this yields C2=η2/8log(3e/ϵ)/C1C_{2}=\eta^{2}/8-\log(3e/\epsilon)/C_{1}. By substituting the values for ϵ\epsilon and η\eta, we obtain the desired result. ∎

In Theorem 3 below, we show that the matrix PΛA{\mathbf{P}}_{\Lambda}^{\perp}{\mathbf{A}} satisfies a modified version of the RIP. We begin with an elementary lemma that is a straightforward generalization of Lemma 2.1 of , and states that RIP operators approximately preserve inner products between sparse vectors.

We first assume that u2=v2=1\|{\mathbf{u}}\|_{2}=\|{\mathbf{v}}\|_{2}=1. From the fact that

and since A{\mathbf{A}} satisfies the RIP, we have that

From the parallelogram identity we obtain

Similarly, one can show that Au,Avu,vδ\langle{\mathbf{A}}{\mathbf{u}},{\mathbf{A}}{\mathbf{v}}\rangle\geq\langle{\mathbf{u}},{\mathbf{v}}\rangle-\delta, and thus Au,Avu,vδ|\langle{\mathbf{A}}{\mathbf{u}},{\mathbf{A}}{\mathbf{v}}\rangle-\langle{\mathbf{u}},{\mathbf{v}}\rangle|\leq\delta. The result follows for u{\mathbf{u}}, v{\mathbf{v}} with arbitrary norm from the bilinearity of the inner product. ∎

Suppose that A{\mathbf{A}} satisfies the RIP of order KK with isometry constant δ\delta, and let Λ{1,2,,N}\Lambda\subset\{1,2,\ldots,N\}. Define PΛ{\mathbf{P}}^{\perp}_{\Lambda} as in(6). If Λ<K|\Lambda|<K then

From the definition of PΛA{\mathbf{P}}^{\perp}_{\Lambda}{\mathbf{A}} in (5), we may decompose PΛAu{\mathbf{P}}^{\perp}_{\Lambda}{\mathbf{A}}{\mathbf{u}} as PΛAu=AuPΛAu{\mathbf{P}}^{\perp}_{\Lambda}{\mathbf{A}}{\mathbf{u}}={\mathbf{A}}{\mathbf{u}}-{\mathbf{P}}_{\Lambda}{\mathbf{A}}{\mathbf{u}}. Since PΛ{\mathbf{P}}_{\Lambda} is an orthogonal projection, we can write

Our goal is to show that Au2PΛAu2\|{\mathbf{A}}{\mathbf{u}}\|_{2}\approx\|{\mathbf{P}}^{\perp}_{\Lambda}{\mathbf{A}}{\mathbf{u}}\|_{2}, or equivalently, that PΛAu2\|{\mathbf{P}}_{\Lambda}{\mathbf{A}}{\mathbf{u}}\|_{2} is small. Towards this end, we note that since PΛAu{\mathbf{P}}_{\Lambda}{\mathbf{A}}{\mathbf{u}} is orthogonal to PΛAu{\mathbf{P}}^{\perp}_{\Lambda}{\mathbf{A}}{\mathbf{u}}, we have

Since we trivially have that PΛAu20\|{\mathbf{P}}_{\Lambda}{\mathbf{A}}{\mathbf{u}}\|_{2}\geq 0, we can combine this with (26) to obtain

Since u0K\|{\mathbf{u}}\|_{0}\leq K, we can use the RIP to obtain

References