Off-grid Direction of Arrival Estimation Using Sparse Bayesian Inference

Zai Yang, Lihua Xie, Cishen Zhang

Introduction

Recent advancements in array signal processing include compressive (CS-) MUSIC and subspace-augmented (SA-) MUSIC. They combine the conventional MUSIC technique with recent CS methods that have guaranteed support recovery performance, and they can outperform both MUSIC and standard CS approaches. Although existing CS-based approaches have improved DOA estimation, e.g., in the case of limited snapshots, difficulties remain in practical situations where the true DOAs are not on the sampling grid. On one hand, a dense sampling grid is necessary for accurate DOA estimation: the estimated DOAs are constrained to the grid, so a dense grid reduces the gap between a true DOA and its nearest grid point. On the other hand, a dense sampling grid leads to a highly coherent matrix, which violates the conditions for sparse signal recovery. We hereafter refer to the model adopted in standard CS methods as the on-grid model, in the sense that the estimated DOAs are constrained to the fixed grid.

An off-grid model for DOA estimation has been studied previously, in which the estimated DOAs are no longer constrained to the sampling grid. The model takes into account the basis mismatch in the measurement matrix caused by the off-grid DOAs. It has been shown that the sparse total least squares (STLS) solver can yield an MAP-optimal estimate if the matrix perturbation caused by the basis mismatch is Gaussian. However, we show in this paper that the Gaussian condition cannot be satisfied in the off-grid DOA estimation problem, and hence a new solver is needed.

The rest of the paper is organized as follows. Section 2 studies the off-grid model used in this paper. Section 3 introduces the proposed OGSBI and OGSBI-SVD algorithms. Section 4 presents our simulation results. Section 5 concludes this paper.

Off-grid DOA Estimation Model

Consider $K$ narrowband far-field sources $s_{k}(t)$ , $k=1,\cdots,K$ , impinging on an array of $M$ omnidirectional sensors from directions $\theta_{k}$ , $k=1,\cdots,K$ . Time delays at different sensors can be represented by simple phase shifts, leading to the observation model

$$\boldsymbol{y}(t)=\boldsymbol{A}(\boldsymbol{\theta})\boldsymbol{s}(t)+\boldsymbol{e}(t),\quad t=1,\cdots,T, \tag{1}$$

where $\boldsymbol{y}(t)=\left[y_{1}(t),\cdots,y_{M}(t)\right]^{T}$ , $\boldsymbol{\theta}=\left[\theta_{1},\cdots,\theta_{K}\right]^{T}$ , $\boldsymbol{s}(t)=\left[s_{1}(t),\cdots,s_{K}(t)\right]^{T}$ , $\boldsymbol{e}(t)=\left[e_{1}(t),\cdots,e_{M}(t)\right]^{T}$ , and $y_{m}(t)$ and $e_{m}(t)$ , $m=1,\cdots,M$ , are the output and measurement noise of the $m$ th sensor at time $t$ , respectively. The matrix $\boldsymbol{A}(\boldsymbol{\theta})=\left[\boldsymbol{a}\left(\theta_{1}\right),\cdots,\boldsymbol{a}\left(\theta_{K}\right)\right]$ is the array manifold matrix and $\boldsymbol{a}\left(\theta_{k}\right)$ is called the steering vector of the $k$ th source. The entry $\boldsymbol{a}_{m}\left(\theta_{k}\right)$ contains the delay information of the $k$ th source at the $m$ th sensor. In this paper, we assume that the number of sources $K$ is known. Readers are referred to a preprint version for discussions on the case of unknown $K$ . The goal is thus to find the unknown DOAs $\boldsymbol{\theta}$ given $K$ , $\boldsymbol{y}(t)$ and the mapping $\boldsymbol{\theta}\rightarrow\boldsymbol{A}(\boldsymbol{\theta})$ . In the following we re-derive the previously proposed off-grid model using a linear approximation and further show its relationship with the on-grid one.
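The linear approximation mentioned above can be sketched as follows. This is a reconstruction under stated assumptions, not a verbatim copy of the original derivation: the grid points $\tilde{\theta}_{1},\cdots,\tilde{\theta}_{N}$ , the index $n_{k}$ of the grid point nearest to $\theta_{k}$ , the derivative vector $\boldsymbol{b}$ and the matrix $\boldsymbol{B}$ are notation introduced here for illustration. They are chosen to agree with $\boldsymbol{\Phi}\left(\boldsymbol{0}\right)=\boldsymbol{A}$ and with the sparse vectors $\boldsymbol{x}(t)$ and offsets $\boldsymbol{\beta}$ used later in the text.

```latex
\begin{align*}
% First-order Taylor expansion of the steering vector around the nearest grid point
\boldsymbol{a}\left(\theta_{k}\right) &\approx \boldsymbol{a}\left(\tilde{\theta}_{n_{k}}\right)
  + \boldsymbol{b}\left(\tilde{\theta}_{n_{k}}\right)\left(\theta_{k}-\tilde{\theta}_{n_{k}}\right),
  \qquad \boldsymbol{b}\left(\theta\right)=\frac{\partial\boldsymbol{a}\left(\theta\right)}{\partial\theta},\\
% Stack the grid steering vectors and their derivatives
\boldsymbol{A} &= \left[\boldsymbol{a}\left(\tilde{\theta}_{1}\right),\cdots,\boldsymbol{a}\left(\tilde{\theta}_{N}\right)\right],
  \qquad
\boldsymbol{B} = \left[\boldsymbol{b}\left(\tilde{\theta}_{1}\right),\cdots,\boldsymbol{b}\left(\tilde{\theta}_{N}\right)\right],\\
% beta_n and x_n(t) are nonzero only at the K grid indices n_k
\beta_{n_{k}} &= \theta_{k}-\tilde{\theta}_{n_{k}},\qquad x_{n_{k}}(t)=s_{k}(t),\qquad
  \beta_{n}=0,\; x_{n}(t)=0 \text{ otherwise},\\
% Resulting first-order model; e(t) absorbs the approximation error
\boldsymbol{y}(t) &= \boldsymbol{\Phi}\left(\boldsymbol{\beta}\right)\boldsymbol{x}(t)+\boldsymbol{e}(t),
  \qquad \boldsymbol{\Phi}\left(\boldsymbol{\beta}\right)=\boldsymbol{A}+\boldsymbol{B}\,\text{diag}\left(\boldsymbol{\beta}\right).
\end{align*}
```

In this form, setting $\boldsymbol{\beta}=\boldsymbol{0}$ drops the derivative term and leaves the on-grid dictionary $\boldsymbol{A}$ .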

The model in (4) is the off-grid model used in this paper. It is empirically validated in Subsection 4.1 by showing that the total noise (approximation error plus measurement noise) follows a Gaussian distribution with high probability if the measurement noise is Gaussian.

It should be noted that the off-grid model in (4) is closely related to the on-grid one that can be obtained by setting $\boldsymbol{\beta}=\boldsymbol{0}$ in (4) ( $\boldsymbol{\Phi}\left(\boldsymbol{0}\right)=\boldsymbol{A}$ ). In fact, the off-grid model can be considered as the first order approximation of the true observation model while the on-grid one is the zeroth order approximation. As a result, the off-grid model has a much smaller modeling error than the on-grid one. Such an advantage is twofold. First, by adopting the same sampling grid the off-grid model results in higher accuracy, especially in the case of a low measurement noise where the modeling error is the dominant modeling uncertainty. Second, a coarser sampling grid can be adopted in the off-grid model to achieve a considerably reduced computational workload with a comparable modeling accuracy.
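The gap between the zeroth-order and first-order approximations can be checked numerically for a single steering vector. The sketch below assumes a half-wavelength uniform linear array with $a_{m}(\theta)=\exp\{j\pi m\sin\theta\}$ ; the array geometry, the function names and the numerical values are illustrative assumptions, not taken from the paper.

```python
import cmath
import math


def steering(theta, M):
    """Steering vector of an ASSUMED half-wavelength ULA (theta in radians)."""
    return [cmath.exp(1j * math.pi * m * math.sin(theta)) for m in range(M)]


def steering_derivative(theta, M):
    """Derivative of steering() with respect to theta."""
    return [1j * math.pi * m * math.cos(theta) * a
            for m, a in enumerate(steering(theta, M))]


def norm(v):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))


def approximation_errors(theta_true, theta_grid, M):
    """Return (on-grid error, off-grid error) for one steering vector."""
    beta = theta_true - theta_grid            # off-grid difference
    a_true = steering(theta_true, M)
    a_grid = steering(theta_grid, M)
    b_grid = steering_derivative(theta_grid, M)
    # Zeroth order: replace a(theta) by the grid steering vector.
    zeroth = norm([t - g for t, g in zip(a_true, a_grid)])
    # First order: add the derivative term scaled by beta.
    first = norm([t - (g + beta * b)
                  for t, g, b in zip(a_true, a_grid, b_grid)])
    return zeroth, first


# Illustrative values: 8 sensors, true DOA 0.8 degrees away from the grid point.
M = 8
theta_grid = math.radians(30.0)
theta_true = math.radians(30.8)
err_on, err_off = approximation_errors(theta_true, theta_grid, M)
print(f"on-grid error  = {err_on:.4f}")
print(f"off-grid error = {err_off:.4f}")
```

For this configuration the first-order error is several times smaller than the zeroth-order one, which is the effect the paragraph above describes.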

To estimate the DOAs $\boldsymbol{\theta}$ we need to find not only the support of the sparse signals $\boldsymbol{x}(t)$ , $t=1,\cdots,T$ , but also the off-grid difference $\boldsymbol{\beta}$ . In this paper, we formulate the problem based on a Bayesian perspective and develop an iterative algorithm to jointly estimate $\boldsymbol{x}(t)$ , $t=1,\cdots,T$ , and $\boldsymbol{\beta}$ in the following section.

OGSBI: Off-grid Sparse Bayesian Inference

We consider complex-valued signals throughout the paper since the matrix $\boldsymbol{\Phi}\left(\boldsymbol{\beta}\right)$ is complex-valued. We derive our algorithm in the MMV case; the SMV case is obtained by setting $T=1$ . Denote $\boldsymbol{Y}=\left[\boldsymbol{y}(1),\cdots,\boldsymbol{y}(T)\right]$ , $\boldsymbol{X}=\left[\boldsymbol{x}(1),\cdots,\boldsymbol{x}(T)\right]$ and $\boldsymbol{E}=\left[\boldsymbol{e}(1),\cdots,\boldsymbol{e}(T)\right]$ . The off-grid DOA estimation model in (4) becomes

$$\boldsymbol{Y}=\boldsymbol{\Phi}\left(\boldsymbol{\beta}\right)\boldsymbol{X}+\boldsymbol{E}.$$

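Under the assumption of white (circularly symmetric) complex Gaussian noise, we have

$$p\left(\boldsymbol{Y}|\boldsymbol{X},\alpha_{0},\boldsymbol{\beta}\right)=\prod_{t=1}^{T}\mathcal{CN}\left(\boldsymbol{y}(t)|\boldsymbol{\Phi}\left(\boldsymbol{\beta}\right)\boldsymbol{x}(t),\alpha_{0}^{-1}\boldsymbol{I}\right), \tag{8}$$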

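where $\alpha_{0}=\sigma^{-2}$ denotes the noise precision with $\sigma^{2}$ being the noise variance, and the probability density function (PDF) of a (circularly symmetric) complex Gaussian random vector $\boldsymbol{u}\sim\mathcal{CN}\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right)$ of dimension $D$ with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$ is

$$\mathcal{CN}\left(\boldsymbol{u}|\boldsymbol{\mu},\boldsymbol{\Sigma}\right)=\frac{1}{\pi^{D}\left|\boldsymbol{\Sigma}\right|}\exp\left\{-\left(\boldsymbol{u}-\boldsymbol{\mu}\right)^{H}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{u}-\boldsymbol{\mu}\right)\right\}.$$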

In this paper we assume that the noise precision $\alpha_{0}$ is unknown. A Gamma hyperprior is assumed for $\alpha_{0}$ since it is a conjugate prior for the precision of a Gaussian distribution:

$$p\left(\alpha_{0}\right)=\Gamma\left(\alpha_{0}|c,d\right), \tag{9}$$

where $\Gamma\left(\alpha_{0}|c,d\right)=\left[\Gamma\left(c\right)\right]^{-1}d^{c}{\alpha_{0}}^{c-1}\exp\left\{-d\alpha_{0}\right\}$ with $\Gamma\left(\cdot\right)$ being the Gamma function. We set $c,d\rightarrow 0$ to obtain a broad hyperprior.

3.1.2 Sparse signal model

It is easy to show that all columns of $\boldsymbol{X}$ are independent and share the same prior. As shown in prior work, for $t=1,\cdots,T$ both $\Re\left\{\boldsymbol{x}(t)\right\}$ and $\Im\left\{\boldsymbol{x}(t)\right\}$ are Laplace distributed and share the same PDF, which is strongly peaked at the origin. As a result, the two-stage hierarchical prior is a sparse prior that favors most rows of $\boldsymbol{X}$ being zero.

3.1.3 Off-grid distance model

We assume a uniform prior for $\boldsymbol{\beta}$ :

$$\boldsymbol{\beta}\sim U\left(\left[-\frac{1}{2}r,\frac{1}{2}r\right]^{N}\right). \tag{12}$$

The prior is noninformative in the sense that the only information about $\boldsymbol{\beta}$ we use is its boundedness.

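By combining the stages of the hierarchical Bayesian model, the joint PDF is

$$p\left(\boldsymbol{X},\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}\right)=p\left(\boldsymbol{Y}|\boldsymbol{X},\alpha_{0},\boldsymbol{\beta}\right)p\left(\boldsymbol{X}|\boldsymbol{\alpha}\right)p\left(\boldsymbol{\alpha}\right)p\left(\alpha_{0}\right)p\left(\boldsymbol{\beta}\right) \tag{13}$$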

with the distributions on the right hand side as defined by (8), (10), (11), (9) and (12) respectively.

3.2 Bayesian Inference

An evidence procedure is used to perform the Bayesian inference since the exact posterior distribution $p\left(\boldsymbol{X},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}|\boldsymbol{Y}\right)$ cannot be explicitly calculated. Similar approaches have been used in standard Bayesian CS methods. First, it is easy to show that the posterior distribution of $\boldsymbol{X}$ is a complex Gaussian distribution:
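Standard Gaussian conditioning suggests the following form for this posterior. It is a sketch under the assumption that the first-stage prior is $\boldsymbol{x}(t)\sim\mathcal{CN}\left(\boldsymbol{0},\boldsymbol{\Lambda}\right)$ with $\boldsymbol{\Lambda}=\text{diag}\left(\boldsymbol{\alpha}\right)$ ; the symbol $\boldsymbol{\Lambda}$ is introduced here for illustration. The mean agrees with the expression $\boldsymbol{\mathcal{U}}=\alpha_{0}\boldsymbol{\Sigma}\boldsymbol{\Phi}^{H}\boldsymbol{Y}$ used later, and the covariance agrees with $\gamma_{n}=1-\alpha_{n}^{-1}\Sigma_{nn}$ .

```latex
\begin{align*}
% Posterior factorizes over snapshots with a common covariance
p\left(\boldsymbol{X}|\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}\right)
  &= \prod_{t=1}^{T}\mathcal{CN}\left(\boldsymbol{x}(t)|\boldsymbol{\mu}(t),\boldsymbol{\Sigma}\right),\\
% Posterior mean of each snapshot
\boldsymbol{\mu}(t) &= \alpha_{0}\boldsymbol{\Sigma}\boldsymbol{\Phi}^{H}\boldsymbol{y}(t),\\
% Posterior covariance (assumes Lambda = diag(alpha) is the prior covariance)
\boldsymbol{\Sigma} &= \left(\alpha_{0}\boldsymbol{\Phi}^{H}\boldsymbol{\Phi}+\boldsymbol{\Lambda}^{-1}\right)^{-1},
\end{align*}
```

Here $\boldsymbol{\Phi}$ is shorthand for $\boldsymbol{\Phi}\left(\boldsymbol{\beta}\right)$ evaluated at the current estimate of $\boldsymbol{\beta}$ .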

Calculating $\boldsymbol{\Sigma}$ and $\boldsymbol{\mu}(t)$ , $t=1,\cdots,T$ , requires estimates of the hyperparameters $\alpha_{0}$ , $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ . In an evidence procedure, they are obtained as the MAP estimate that maximizes $p\left(\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}|\boldsymbol{Y}\right)$ . Maximizing $p\left(\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}|\boldsymbol{Y}\right)$ is equivalent to maximizing the joint PDF $p\left(\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}\right)=p\left(\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}|\boldsymbol{Y}\right)p\left(\boldsymbol{Y}\right)$ since $p\left(\boldsymbol{Y}\right)$ is independent of the hyperparameters. An expectation-maximization (EM) algorithm is implemented that treats $\boldsymbol{X}$ as a hidden variable and instead maximizes $E\left\{\log p\left(\boldsymbol{X},\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}\right)\right\}$ , where $p\left(\boldsymbol{X},\boldsymbol{Y},\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}\right)$ is given in (13) and $E\left\{\cdot\right\}$ denotes an expectation with respect to the posterior of $\boldsymbol{X}$ as given in (14) using the current estimates of the hyperparameters.

Denote $\boldsymbol{\mathcal{U}}=\left[\boldsymbol{\mu}(1),\cdots,\boldsymbol{\mu}(T)\right]=\alpha_{0}\boldsymbol{\Sigma}\boldsymbol{\Phi}^{H}\boldsymbol{Y}$ , $\underline{\boldsymbol{X}}=\boldsymbol{X}/\sqrt{T}$ , $\underline{\boldsymbol{Y}}=\boldsymbol{Y}/\sqrt{T}$ , $\underline{\boldsymbol{\mathcal{U}}}=\boldsymbol{\mathcal{U}}/\sqrt{T}$ and $\underline{\rho}=\rho/T$ . Following a procedure similar to that used in earlier work, it is easy to obtain the following updates of $\boldsymbol{\alpha}$ and $\alpha_{0}$ :

where $E\left\{\left\|\underline{\boldsymbol{X}}^{n}\right\|_{2}^{2}\right\}=\left\|\underline{\boldsymbol{\mathcal{U}}}^{n}\right\|_{2}^{2}+\Sigma_{nn}$ , $E\left\{\left\|\underline{\boldsymbol{Y}}-\boldsymbol{\Phi}\underline{\boldsymbol{X}}\right\|_{\text{F}}^{2}\right\}=\left\|\underline{\boldsymbol{Y}}-\boldsymbol{\Phi}\underline{\boldsymbol{\mathcal{U}}}\right\|_{\text{F}}^{2}+\alpha_{0}^{-1}\sum_{n=1}^{N}\gamma_{n}$ with $\gamma_{n}=1-\alpha_{n}^{-1}\Sigma_{nn}$ .
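One closed form consistent with the two expectations above can be sketched in a few lines. The sketch assumes the priors $\boldsymbol{x}(t)\sim\mathcal{CN}\left(\boldsymbol{0},\text{diag}\left(\boldsymbol{\alpha}\right)\right)$ and $\alpha_{n}\sim\Gamma\left(\alpha_{n}|1,\rho\right)$ together with the Gamma hyperprior on $\alpha_{0}$ , and sets the derivative of the expected log joint PDF to zero. The function names, argument names and default values of `c` and `d` are hypothetical, and the expressions should be checked against (17) and (18) of the original paper before use.

```python
import math


def update_alpha_n(row_energy, sigma_nn, rho_bar):
    """Update one alpha_n under the ASSUMED priors.

    row_energy : squared 2-norm of the n-th row of the normalized mean matrix
    sigma_nn   : n-th diagonal entry of the posterior covariance
    rho_bar    : rho / T (must be positive)

    Returns the positive root of rho_bar * a**2 + a - E = 0,
    where E = row_energy + sigma_nn.
    """
    if rho_bar <= 0:
        raise ValueError("rho_bar must be positive")
    e = row_energy + sigma_nn
    return (math.sqrt(1.0 + 4.0 * rho_bar * e) - 1.0) / (2.0 * rho_bar)


def update_alpha0(residual_energy, gammas, alpha0_old, M, T, c=1e-4, d=1e-4):
    """Update the noise precision alpha_0 under the ASSUMED Gamma(c, d) prior.

    residual_energy : squared Frobenius norm of (Y - Phi U) / sqrt(T)
    gammas          : list of gamma_n = 1 - sigma_nn / alpha_n
    alpha0_old      : current noise precision
    c, d            : hypothetical small values standing in for c, d -> 0
    """
    expected_residual = residual_energy + sum(gammas) / alpha0_old
    return (M + (c - 1.0) / T) / (expected_residual + d / T)


# Illustrative numbers only.
alpha_n = update_alpha_n(row_energy=2.0, sigma_nn=0.5, rho_bar=0.01)
alpha_0 = update_alpha0(residual_energy=0.8, gammas=[0.5, 0.5],
                        alpha0_old=10.0, M=8, T=200)
print(alpha_n, alpha_0)
```

As $\underline{\rho}\rightarrow 0$ the first update tends to $E\left\{\left\|\underline{\boldsymbol{X}}^{n}\right\|_{2}^{2}\right\}$ , so rows with little posterior energy are driven towards zero variance.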

For $\boldsymbol{\beta}$ , its estimate maximizes $E\left\{\log p\left(\boldsymbol{Y}|\boldsymbol{X},\alpha_{0},\boldsymbol{\beta}\right)p\left(\boldsymbol{\beta}\right)\right\}$ by (13) and thus minimizes

where $C$ is a constant term independent of $\boldsymbol{\beta}$ , $\boldsymbol{P}$ is a positive semi-definite matrix and

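The detailed derivation of (19) is deferred to the Appendix for simplicity of exposition. As a result, we have

$$\boldsymbol{\beta}^{new}=\arg\min_{\boldsymbol{\beta}\in\left[-\frac{1}{2}r,\frac{1}{2}r\right]^{N}}\left\{\boldsymbol{\beta}^{T}\boldsymbol{P}\boldsymbol{\beta}-2\boldsymbol{v}^{T}\boldsymbol{\beta}\right\}. \tag{22}$$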

Though an explicit expression of $\boldsymbol{\beta}^{new}$ cannot be given, by recognizing that $\boldsymbol{\beta}$ is jointly sparse with $\boldsymbol{x}$ , the dimension of $\boldsymbol{\beta}$ can be reduced to $K$ in the computation and hence $\boldsymbol{\beta}^{new}$ can be efficiently calculated. We provide the details in Subsection 3.5.

The proposed OGSBI algorithm is implemented as follows. After initializing the hyperparameters $\boldsymbol{\alpha}$ , $\alpha_{0}$ and $\boldsymbol{\beta}$ , we calculate $\boldsymbol{\Sigma}$ and $\boldsymbol{\mu}(t)$ , $t=1,\cdots,T$ , using the current values of the hyperparameters according to (16) and (15) respectively. Then we update $\boldsymbol{\alpha}$ , $\alpha_{0}$ and $\boldsymbol{\beta}$ according to (17), (18) and (22) respectively. The process is repeated until some convergence criterion is satisfied. We note that OGSBI has guaranteed convergence since the function $p\left(\alpha_{0},\boldsymbol{\alpha},\boldsymbol{\beta}|\boldsymbol{Y}\right)$ is guaranteed not to decrease at each iteration by the properties of the EM algorithm.

3.3 OGSBI-SVD

In (23), $\boldsymbol{Y}_{SV}$ , $\boldsymbol{X}_{SV}$ and $\boldsymbol{E}_{SV}$ can be viewed as the new matrices of sensor measurements, source signals and measurement noises respectively. The joint sparsity still holds in $\boldsymbol{X}_{SV}$ . We do not exploit possible correlations between columns of $\boldsymbol{X}_{SV}$ (or of $\boldsymbol{E}_{SV}$ ), i.e., we still assume that $\boldsymbol{X}_{SV}$ and $\boldsymbol{E}_{SV}$ have independent columns. Correlations between columns of the signal matrix ( $\boldsymbol{X}_{SV}$ in our case) have recently been studied elsewhere. It is then straightforward to apply the proposed OGSBI algorithm to estimate $\boldsymbol{X}_{SV}$ and $\boldsymbol{\beta}$ , and then the DOAs. We use OGSBI-SVD to refer to the resulting algorithm.

Based on the implementation details introduced in Subsection 3.5, it can be shown that OGSBI-SVD has a computational complexity of order $O\left(MN^{2}\right)$ per iteration while that of OGSBI is $O\left(\max\left(MN^{2},MNT\right)\right)$ per iteration. OGSBI-SVD requires an additional computational workload of order $O\left(\max\left(M^{2}T,MT^{2}\right)\right)$ for the SVD of $\boldsymbol{Y}$ . Since it is empirically found that OGSBI-SVD converges much faster than OGSBI, the overall computational workload of OGSBI-SVD is in general less than that of OGSBI. A possible exception is the case $T\gg N$ , where the SVD is computationally heavy. A modified approach in such a case is to first partition $\boldsymbol{Y}$ into blocks of about $N$ columns each, then apply the SVD to each block and keep the resulting signal subspaces, and finally apply another SVD to the new signal matrix composed of all signal subspaces. A model similar to (23) can then be cast.
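The orders quoted above can be compared for a concrete problem size with simple arithmetic. The sketch below drops all constant factors, so the numbers are relative operation counts rather than run times; the function names and the sizes $M=8$ , $N=90$ , $T=200$ are illustrative assumptions.

```python
def per_iteration_cost(M, N, T):
    """Order-of-magnitude per-iteration costs (constants dropped)."""
    return {
        "OGSBI": max(M * N**2, M * N * T),
        "OGSBI-SVD": M * N**2,
    }


def svd_cost(M, T):
    """Order-of-magnitude one-off cost of the SVD of the M x T matrix Y."""
    return max(M**2 * T, M * T**2)


# Hypothetical sizes: 8 sensors, 90 grid points, 200 snapshots.
M, N, T = 8, 90, 200
costs = per_iteration_cost(M, N, T)
print(costs)            # per-iteration counts for both algorithms
print(svd_cost(M, T))   # one-off SVD count, paid once by OGSBI-SVD
```

With these sizes $T>N$ , so the $MNT$ term dominates the OGSBI iteration, while the one-off SVD term $MT^{2}$ exceeds either per-iteration count. This illustrates why the SVD becomes the bottleneck as $T$ grows relative to $N$ .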

3.4 Source Power and DOA Estimation

3.5 Implementation Details

Since $\boldsymbol{\beta}$ is jointly sparse with $\boldsymbol{x}(t)$ , whose $K$ nonzero entries correspond to the locations of the $K$ sources, we calculate only the entries of $\boldsymbol{\beta}$ that correspond to the locations of the $K$ largest entries of $\boldsymbol{\alpha}$ and set the others to zero. As a result, $\boldsymbol{\beta}$ , $\boldsymbol{P}$ and $\boldsymbol{v}$ can be truncated to dimension $K$ or $K\times K$ . We still use $\boldsymbol{\beta}$ , $\boldsymbol{P}$ and $\boldsymbol{v}$ hereafter to denote their truncated versions for simplicity. By (22) and $\frac{\partial}{\partial\boldsymbol{\beta}}\left\{\boldsymbol{\beta}^{T}\boldsymbol{P}\boldsymbol{\beta}-2\boldsymbol{v}^{T}\boldsymbol{\beta}\right\}=2\left(\boldsymbol{P}\boldsymbol{\beta}-\boldsymbol{v}\right)$ we have $\boldsymbol{\beta}^{new}=\check{\boldsymbol{\beta}}$ if $\boldsymbol{P}$ is invertible and $\check{\boldsymbol{\beta}}=\boldsymbol{P}^{-1}\boldsymbol{v}\in\left[-\frac{1}{2}r,\frac{1}{2}r\right]^{K}$ . Otherwise, we update $\boldsymbol{\beta}$ elementwise, i.e., at each step we update one $\beta_{n}$ while fixing the other entries of $\boldsymbol{\beta}$ . For $n=1,\cdots,K$ , first we let

where $\boldsymbol{u}_{-n}$ is $\boldsymbol{u}$ without the $n$ th entry for a vector $\boldsymbol{u}$ . Then by constraining $\beta_{n}\in\left[-\frac{1}{2}r,\frac{1}{2}r\right]$ we have

It is easy to show that the objective function is guaranteed to decrease at each step with $\beta_{n}$ defined in (26).
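The elementwise update can be sketched as coordinate descent on $\boldsymbol{\beta}^{T}\boldsymbol{P}\boldsymbol{\beta}-2\boldsymbol{v}^{T}\boldsymbol{\beta}$ with each coordinate clipped to $\left[-\frac{1}{2}r,\frac{1}{2}r\right]$ . The unconstrained coordinate minimizer used below, $\left(v_{n}-\sum_{j\neq n}P_{nj}\beta_{j}\right)/P_{nn}$ , follows from setting the $n$ th component of the gradient $2\left(\boldsymbol{P}\boldsymbol{\beta}-\boldsymbol{v}\right)$ to zero; it is an assumed reading of the missing expressions, and the function names and test values are hypothetical.

```python
def objective(P, v, beta):
    """Evaluate beta^T P beta - 2 v^T beta for a real symmetric P."""
    K = len(beta)
    quad = sum(beta[i] * P[i][j] * beta[j] for i in range(K) for j in range(K))
    return quad - 2.0 * sum(v[i] * beta[i] for i in range(K))


def coordinate_update(P, v, beta, r, sweeps=1):
    """Update beta one entry at a time, keeping each entry in [-r/2, r/2]."""
    beta = list(beta)
    half = 0.5 * r
    K = len(beta)
    for _ in range(sweeps):
        for n in range(K):
            if P[n][n] <= 0.0:
                # No curvature along this coordinate: leave beta_n unchanged.
                continue
            # Contribution of the other (fixed) entries of beta.
            cross = sum(P[n][j] * beta[j] for j in range(K) if j != n)
            unconstrained = (v[n] - cross) / P[n][n]
            # Clip to the box; for a 1-D convex quadratic this is the
            # constrained minimizer, so the objective cannot increase.
            beta[n] = min(max(unconstrained, -half), half)
    return beta


# Illustrative 2 x 2 problem with grid interval r = 2 (box [-1, 1]).
P = [[2.0, 1.0], [1.0, 2.0]]
v = [1.0, 3.0]
beta_old = [0.0, 0.0]
beta_new = coordinate_update(P, v, beta_old, r=2.0)
print(beta_new, objective(P, v, beta_old), objective(P, v, beta_new))
```

In this example the second coordinate's unconstrained minimizer lies outside the box and is clipped to the boundary, while the objective still decreases relative to the starting point.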

Numerical simulations

4.2 Comparison with STLS

The off-grid model has recently been used for DOA estimation in earlier work, where a sparse total least-squares (STLS) approach is proposed. In the SMV case, STLS seeks to solve the nonconvex optimization problem

In our experiment, we consider two DOAs at $63.2^{\circ}$ and $90.3^{\circ}$ with $\text{SNR}=20$ dB. We consider $r=2^{\circ}$ and $4^{\circ}$ for both OGSBI and STLS. The parameter $\lambda$ in (27) is tuned as carefully as we could so that STLS achieves its smallest error. Table 2 presents the averaged MSEs and CPU times of STLS and OGSBI over $R=200$ trials. OGSBI obtains more accurate DOA estimates than STLS in both scenarios with remarkably less computation time. We also note that it is possible to accelerate STLS using state-of-the-art algorithms for CS.

4.3 Sensitivity to Measurement Outliers

The SVD procedure in OGSBI-SVD is related to principal component analysis (PCA). It is known that standard PCA is sensitive to outliers: even a single corrupted measurement can deteriorate the quality of the approximation. In this subsection we carry out experiments to check whether the proposed OGSBI-SVD is sensitive to measurement outliers due to the SVD. The experimental setup is similar to that in Subsection 4.1 but with $\text{SNR}=\infty$ . After acquiring the noiseless measurements, we randomly choose 3 out of the $MT=1600$ measurements, multiply each by a constant ratio $\kappa$ , and use the results as outliers. Besides the case of no outliers ( $\kappa=1$ ) we consider five other cases where $\kappa$ is set to 5, 10, 20, 50 and 100 respectively. Table 3 presents the resulting MSEs. It can be seen that the estimation accuracy of OGSBI-SVD can degrade significantly even with only about $0.2\%$ of the measurements being corrupted, due to the sensitivity of the SVD.
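The corruption step described above can be sketched as follows. The split $M=8$ , $T=200$ of the $MT=1600$ measurements, the random placeholder data and the function name are assumptions made for illustration; in the experiment the matrix would hold the noiseless array measurements.

```python
import random


def inject_outliers(Y, count, kappa, rng):
    """Return a copy of Y with `count` random entries scaled by kappa.

    Y is a list of M rows with T entries each. The flat indices of the
    corrupted entries are returned alongside the corrupted copy.
    """
    M, T = len(Y), len(Y[0])
    corrupted = [row[:] for row in Y]
    positions = rng.sample(range(M * T), count)  # distinct entries
    for p in positions:
        m, t = divmod(p, T)
        corrupted[m][t] *= kappa
    return corrupted, positions


rng = random.Random(0)
M, T = 8, 200  # hypothetical split of MT = 1600
# Placeholder complex data standing in for the noiseless measurements.
Y = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(T)]
     for _ in range(M)]
Y_out, positions = inject_outliers(Y, count=3, kappa=10, rng=rng)

fraction = len(positions) / (M * T)
print(f"{fraction:.4%} of the measurements are corrupted")
```

Three corrupted entries out of 1600 correspond to $3/1600=0.1875\%$ , which is the "about $0.2\%$ " figure quoted in the text.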

Note that the measurement matrix $\boldsymbol{Y}$ corrupted by the outliers is the sum of a low-rank matrix (the noiseless measurement matrix of rank $K$ ) and a sparse matrix (the outliers). A robust PCA technique has recently been proposed that can recover the original low-rank matrix in the presence of sparse outliers. It is therefore possible to combine this robust PCA technique with the proposed OGSBI-SVD to improve its robustness to outliers, which, however, is beyond the scope of this paper.

Conclusion

Appendix: Derivation of (19)

Denote $\boldsymbol{\Delta}=\text{diag}\left(\boldsymbol{\beta}\right)$ . Eq. (19) is based on the following two equalities:

where $C_{1}$ , $C_{2}$ are constants independent of $\boldsymbol{\beta}$ , and the equality