Global and Local Uncertainty Principles for Signals on Graphs

Nathanael Perraudin, Benjamin Ricaud, David Shuman, Pierre Vandergheynst

Introduction

The major research thrust to date in the emerging area of signal processing on graphs has been to design multiscale wavelet and vertex-frequency transforms -. Objectives of these transforms are to sparsely represent different classes of graph signals and/or efficiently reveal relevant structural properties of high-dimensional data on graphs. As we move forward, it is important to both test these transforms on myriad applications, as well as to develop additional theory to help answer the question of which transforms are best suited to which types of data.

Uncertainty principles such as the ones presented in - are an important tool in designing and evaluating linear transforms for processing “classical” signals such as audio signals, time series, and images residing on Euclidean domains. It is desirable that the dictionary atoms are jointly localized in time and frequency, and uncertainty principles characterize the resolution tradeoff between these two domains. Moreover, while “the uncertainty principle is [often] used to show that certain things are impossible,” Donoho and Stark present “examples where the generalized uncertainty principle shows something unexpected is possible; specifically, the recovery of a signal or image despite significant amounts of missing information.” In particular, uncertainty principles can provide guarantees that if a signal has a sparse decomposition in a dictionary of incoherent atoms, this is indeed a unique representation that can be recovered via optimization . This idea underlies the recent wave of sparse signal processing techniques, with applications such as denoising, source separation, inpainting, and compressive sensing. While there is still limited theory showing that different mathematical classes of graph signals are sparsely represented by the recently proposed transforms (see for one preliminary work along these lines), there is far more empirical work showing the potential of these transforms to sparsely represent graph signals in various applications.

Many of the multiscale transforms designed for graph signals attempt to leverage intuition from signal processing techniques designed for signals on Euclidean data domains by generalizing fundamental operators and transforms to the graph setting (e.g., by checking that they correspond on a ring graph). While some intuition, such as the notion of filtering with a Fourier basis of functions that oscillate at different rates (see, e.g., ) carries over to the graph setting, the irregular structure of the graph domain often restricts our ability to generalize ideas. One prime example is the lack of a shift-invariant notion of translation of a graph signal. As shown in and discussed in [23, Section 3.2], the concentration of the Fourier basis functions is another example where the intuition does not carry over directly. Complex exponentials, the basis functions for the classical Fourier transform, have global support across the real line. On the other hand, the eigenvectors of the combinatorial or normalized graph Laplacians, which are most commonly used as the basis functions for a graph Fourier transform, are sometimes localized to small regions of the graph. Because the incoherence between the Fourier basis functions and the standard normal basis underlies many uncertainty principles, we demonstrate this issue with a short example.

Below we discuss three different families of uncertainty principles, and their extensions to the graph setting, both in prior work and in this contribution.

The first family of uncertainty principles measure the spreading around some reference point, usually the mean position of the energy contained in the signal. The well-known Heisenberg uncertainty principle belongs to this family. It views the modulus square of the signal in both the time and Fourier domains as energy probability density functions, and takes the variance of those energy distributions as measures of the spreading in each domain. The uncertainty principle states that the product of variances in the time and in the Fourier domains cannot be arbitrarily small. The generalization of this uncertainty principle to the graph setting is complex since there does not exist a simple formula for the mean value or the variance of graph signals, in either the vertex or the graph spectral domains. For unweighted graphs, Agaskar and Lu also view the square modulus of the signal in the vertex domain as an energy probability density function and use the geodesic graph distance (shortest number of hops) to define the spread of a graph signal around a given center vertex. For the spread of a signal $f$ in the graph spectral domain, Agaskar and Lu use the normalized variation $\frac{f^{\top}\mathcal{L}f}{||f||_{2}^{2}}$ , which captures the smoothness of a signal. They then specify uncertainty curves that characterize the tradeoff between the smoothness of a graph signal and its localization in the vertex domain. This idea is generalized to weighted graphs in . As pointed out in , the tradeoff between smoothness and localization in the vertex domain is intuitive as a signal that is smooth with respect to the graph topology cannot feature values that decay too quickly from the peak value. However, as shown in Figure 1 (and subsequent examples in Table 1), graph signals can indeed be simultaneously highly localized or concentrated in both the vertex domain and the graph spectral domain. This discrepancy is because the normalized variation used as the spectral spread in is one method to measure the spread of the spectral representation around the eigenvalue 0, rather than around some mean of that signal in the graph spectral domain. In fact, using the notion of spectral spread presented in , the graph signal with the highest spectral spread on a graph $\mathcal{G}$ is the graph Laplacian eigenvector associated with the highest eigenvalue. The graph spectral representation of that signal is a Kronecker delta whose energy is completely localized at a single eigenvalue. One might argue that its spread should in fact be zero. So, in summary, while there does exist a tradeoff between the smoothness of a graph signal and its localization around any given center vertex in the vertex domain, the classical idea that a signal cannot be simultaneously localized in the time and frequency domains does not always carry over to the graph setting. While certainly an interesting avenue for continued investigation, we do not discuss uncertainty principles based on spreads in the vertex and graph spectral domains any further in this paper.

where $\mu$ is the maximum magnitude of the inner product between any vector in the first basis with any vector in the second basis. In Section 3.1, we apply (1) to graph signals by taking one basis to be the canonical basis of Kronecker delta functions in the graph vertex domain and the other to be a Fourier basis of graph Laplacian eigenvectors. We also apply other such finite dimensional uncertainty principles from , , and to the graph setting. In Section 3.2, we adapt the Hausdorff-Young inequality [43, Section IX.4], a classical result for infinite dimensional signals, to the graph setting. These results typically depend on the mutual coherence between the graph Laplacian eigenvectors and the canonical basis of deltas. For the special case of shift-invariant graphs with circulant graph Laplacians [44, Section 5.1], such as ring graphs, these bases are incoherent, and we can attain meaningful uncertainty bounds. However, for less homogeneous graphs (e.g., a graph with a vertex with a much higher or lower degree than other vertices), the two bases can be more coherent, leading to weaker bounds. Moreover, as we discuss in Section 2, the bounds are global bounds, so even if the majority of a graph is for example very homogenous, inhomogeneity in one small area can prevent the result from informing the behavior of graph signals across the rest of the graph.

In Section 5, we generalize Lieb’s uncertainty principle to the graph setting to provide upper bounds on the concentration of the transform coefficients of any graph signal under (i) any frame of dictionary atoms, and (ii) a special class of dictionaries called localized spectral graph filter frames, whose atoms are of the form $T_{i}g_{k}$ , where $T_{i}$ is a localization operator that centers on vertex $i$ a pattern described in the graph spectral domain by the kernel $\widehat{g_{k}}$ .

While the second family of uncertainty principles above yields global uncertainty principles, we can generalize the third family to the graph setting in a way that yields local uncertainty principles. In the classical Euclidean setting, the underlying domain is homogenous, and thus uncertainty principles apply to all signals equally, regardless of where on the real line they are concentrated. However, in the graph setting, the underlying domain is irregular, and a change in the graph structure in a single small region of the graph can drastically affect the uncertainty bounds. For instance, the second family of uncertainty principles all depend on the coherence between the graph Laplacian eigenvectors and the standard normal basis of Kronecker deltas, which is a global quantity in the sense that it incorporates local behavior from all regions of the graph. To see how this can limit the usefulness of such global uncertainty principles, we return to the motivating example from above.

In Section 6, we generalize the approach of Lieb to build a local uncertainty principle that bounds the concentration of the analysis coefficients of each atom of a localized graph spectral filter frame in terms of quantities that depend on the local structure of the graph around the center vertex of the given atom. Thus, atoms localized to different regions of the graph feature different concentration bounds. Such local uncertainty principles also have constructive applications, and we conclude with an example of non-uniform sampling for graph inpainting, where the varying uncertainty levels across the graph suggest a strategy of sampling more densely in areas of higher uncertainty. For example, if we were to take $M$ measurements of a smooth signal on manifold 2 in Figure 1, this method would lead to a higher probability of sampling signal values near the spike, and a lower probability of sampling signal values in the more homogenous flat parts of the manifold, where reconstruction of the missing signal values is inherently easier.

Notation and graph signal concentration

In this section, we introduce some notation and illustrate further how certain intuition from signal processing on Euclidean spaces does not carry over to the graph setting.

See for example for more details on spectral graph theory, and for more details on signal processing on graphs.

2 Concentration measures

In order to discuss uncertainty principles, we must first introduce some concentration/sparsity measures. Throughout the paper, we use the terms sparsity and concentration somewhat interchangeably, but we reserve the term spread to describe the spread of a function around some mean or center point, as discussed in the first family of uncertainty principles in Section 1. The first concentration measure is the support measure of $f$ , denoted $\|f\|_{0}$ , which counts the number of non-zero elements of $f$ . The second concentration measure is the Shannon entropy, which is used often in information theory and physics:

3 Concentration of the graph Laplacian eigenvectors

The spectrum of the graph Laplacian replaces the frequencies as coordinates in the Fourier domain. For the special case of shift-invariant graphs with circulant graph Laplacians [44, Section 5.1], the Fourier eigenvectors can still be viewed as pure oscillations. However, for more general graphs (i.e., all but the most highly structured), the oscillatory behavior of the Fourier eigenvectors must be interpreted more broadly. For example, [1, Fig. 3] displays the number of zero crossings of each eigenvector; that is, for each eigenvector, the number of pairs of connected vertices where the signs of the values of the eigenvector at the connected vertices are opposite. It is generally the case that the graph Laplacian eigenvectors associated with larger eigenvalues contain more zero crossings, yielding a notion of frequency to the graph Laplacian eigenvalues. However, despite this broader notion of frequency, the graph Laplacian eigenvectors are not always globally-supported, pure oscillations like the complex exponentials. In particular, they can feature sharp peaks, meaning that some of the Fourier basis elements can be much more similar to an element of the canonical basis of Kronecker deltas on the vertices of the graph. As we will see, uncertainty principles for signals on graphs are highly affected by this phenomenon.

One way to compare a graph Fourier basis to the canonical basis is to compute the coherence between these two representations.

This quantity measures the similarity between the two sets of vectors. If the sets possess a common vector, then $\mu_{\mathcal{G}}=1$ (the maximum possible value for $\mu_{\mathcal{G}}$ ). If the two sets are maximally incoherent, such as the canonical and Fourier bases in the standard discrete setting, then $\mu_{\mathcal{G}}=1/\sqrt{N}$ (the minimum possible value).

Because the graph Laplacian matrix encodes the weights of the edges of the graph, the coherence $\mu_{\mathcal{G}}$ clearly depends on the structure of the underlying graph. It remains an open question exactly how structural properties of weighted graphs such as the regularity, clustering, modularity, and other spectral properties can be linked to the concentration of the graph Laplacian eigenvectors. For certain classes of random graphs or large regular graphs , the eigenvectors have been shown to be non-localized, globally oscillating functions (i.e., $\mu_{\mathcal{G}}$ is low). Yet, empirical studies such as show that graph Laplacian eigenvectors can be highly concentrated (i.e., $\mu_{\mathcal{G}}$ can be close to 1), particularly when the degree of a vertex is much higher or lower than the degrees of other vertices in the graph. The following example illustrates how $\mu_{\mathcal{G}}$ can be influenced by the graph structure.

In this example, we discuss two classes of graphs that can have graph Fourier coherences. The first, called comet graphs, are studied in . They are composed of a star with $k$ vertices connected to a center vertex, and a single branch of length greater than one extending from one neighbor of the center vertex (see Figure 3, top). If we fix the length of the longer branch (it has length 10 in Figure 3), and increase $k$ , the number of neighbors of the center vertex, the graph Laplacian eigenvector associated with the largest eigenvalue approaches a Kronecker delta centered at the center vertex of the star. As a consequence, the coherence between the graph Fourier and the canonical bases approaches 1 as $k$ increases.

The second class are the modified path graphs, which we use several times in this contribution. We start with a standard path graph of 10 nodes equally spaced (all edge weights are equal to one) and we move the first node out to the left; i.e., we reduce the weight between the first two nodes (see Figure 3, bottom). The weight is related to the distance by $W_{12}=1/d(1,2)$ with $d(1,2)$ being the distance between nodes 1 and 2. When the weight between nodes 1 and 2 decreases, the eigenvector associated with the largest eigenvalue of the Laplacian becomes more concentrated, which increases the coherence $\mu_{\mathcal{G}}$ . These two examples of simple families of graphs illustrate that the topology of the graph can impact the graph Fourier coherence, and, in turn, uncertainty principles that depend on the coherence.

In Figure 4, we display the eigenvector associated with the largest graph Laplacian eigenvalue for a modified path graph of 100 nodes, for several values of the weight $W_{12}$ . Observe that the shape of the eigenvector has a sharp local change at node 1.

Example 1 demonstrates an important point to keep in mind. A small local change in the graph structure can greatly affect the behavior of one eigenvector, and, in turn, a global quantity such as $\mu_{\mathcal{G}}$ . However, intuitively, a small local change in the graph should not drastically change the processing of signal values far away, for example in a denoising or inpainting task. For this reason, in Section 6, we introduce a notion of local uncertainty that depicts how the graph is behaving locally.

Note that not only special classes of graphs or pathological graphs yield highly localized graph Laplacian eigenvectors. Rather, graphs arising in applications such as sensor or transportation networks, or graphs constructed from sampled manifolds (such as the graph sampled from manifold 2 in Figure 1) can also have graph Fourier coherences close to 1 (see, e.g., [23, Section 3.2] for further examples).

Global uncertainty principles relating the concentration of graph signals in two domains

In this section, we derive basic uncertainty principles using concentration measures and highlight the limitations of those uncertainty principles.

We start by applying three known uncertainty principles for discrete signals to the graph setting.

for any subset $\mathcal{V}_{S}$ of the vertices $\mathcal{V}$ in the graph $\mathcal{G}$ .

The following example illustrates the relation between the graph, the concentration of a specific graph signal, and one of the uncertainty principles from Theorem 1. We return to this example in Section 3.3 to discuss further the limitations of these uncertainty principles featuring $\mu_{\mathcal{G}}$ .

Figure 5 shows the computation of the quantities involved in (5), with $p=1$ and different $\mathcal{G}$ ’s taken to be the modified path graphs of Example 1, with different distances between the first two vertices. We show the lefthand side of (5) for two different Kronecker deltas, one centered at vertex 1, and one centered at vertex 10. We have seen in Figure 3 that as the distance between the first two vertices increases, the coherence increases, and therefore the lower bound on the right-hand side of (5) decreases. For $\delta_{1}$ , the uncertainty quantity on the left-hand side of (5) follows a similar pattern. The intuition behind this is that as the weight between the first two vertices decreases, a few of the eigenvectors start to have local jumps around the first vertex (see Figure 4). As a result, we can sparsely represent $\delta_{1}$ as a linear combination of those eigenvectors and $||\widehat{\delta_{1}}||_{1}$ is reduced. However, since there are not any eigenvectors that are localized around the last vertex in the path graph, we cannot find a sparse linear combination of the graph Laplacian eigenvectors to represent $\delta_{10}$ . Therefore, its uncertainty quantity on the left-hand side of (5) does not follow the behavior of the lower bound.

2 The Hausdorff-Young inequalities for signals on graphs

Conversely, for $2\leq p\leq\infty$ , we have

The proof of Theorem 2, given in Sec. 8.1, is an extension of the classical proof using the Riesz-Thorin interpolation theorem. In the classical (infinite dimensional) setting, the inequality only depends on $p$ and $q$ . On a graph, it depends on $\mu_{\mathcal{G}}$ and hence on the structure of the graph.

Dividing both sides of each inequality in Theorem 2 by $\|f\|_{2}$ leads to bounds on the concentrations (or sparsity levels) of a graph signal and its graph Fourier transform.

Theorem 2 and Corollary 1 assert that concentration or sparsity level of a graph signal in one domain (vertex or graph spectral) limits the concentration or sparsity level in the other domain. However, once again, if the coherence $\mu_{\mathcal{G}}$ is close to 1, the result is not particularly informative as $s_{p}(f)s_{q}(\hat{f})$ is trivially upper bounded by 1. The following numerical experiment illustrates the quantities involved in the Hausdorff-Young inequalities for graph signals. We again see that as the graph Fourier coherence increases, signals may be simultaneously concentrated in both the vertex domain and the graph spectral domain.

Continuing with the modified path graphs of Examples 1 and 2, we illustrate the bounds of the Hausdorff-Young inequalities for graph signals in Figure 6. For this example, we take the signal $f$ to be $\delta_{1}$ , a Kronecker delta centered on the first node of the modified path graph. As a consequence, $\|\delta_{1}\|_{p}=1$ for all $p$ , which makes it easier to compare the quantities involved in the inequalities. For this example, the bounds of Theorem 2 are fairly close to the actual values of $\|\hat{\delta_{1}}\|_{q}$ .

3 Limitations of global concentration-based uncertainty principles in the graph setting

The motivation for this section was twofold. First, we wanted to derive the uncertainty principles for graph signals analogous to some of those that are so fundamental for signal processing on Euclidean domains. However, we also want to highlight the limitations of this approach (the second family of uncertainty principles described in Section 1) in the graph setting. The graph Fourier coherence is a global parameter that depends on the topology of the entire graph. Hence, it may be greatly influenced by a small localized changes in the graph structure. For example, in the modified path graph examples above, a change in a single edge weight leads to an increased coherence, and in turn significantly weakens the uncertainty principles characterizing the concentrations of the graph signal in the vertex and spectral domains. Such examples call into question the ability of such global uncertainty principles for graph signals to accurately describe phenomena in inhomogeneous graphs. This is the primary motivation for our investigation into local uncertainty principles in Section 6. However, before getting there, we consider global uncertainty principles from the third family of uncertainty principles described in Section 1 that bound the concentration of the analysis coefficients of a graph signal in a time-frequency transform domain.

Graph signal processing operators and dictionaries

As mentioned in Section 1, uncertainty principles can inform dictionary design. In the next section, we present uncertainty principles characterizing the concentration of the analysis coefficients of graph signals in different transform domains. We focus on three different classes of dictionaries for graph signal analysis: (i) frames, (ii) localized spectral graph filter frames, and (iii) graph Gabor filter bank frames, where localized spectral graph filter frames are a subclass of frames, and graph Gabor filter bank frames are a subclass of localized spectral graph filter frames. In this section, we define these different classes of dictionaries, and highlight some of their mathematical properties. Note that our notation uses dictionary atoms that are double indexed by $i$ and $k$ , but these could be combined into a single index $j$ for the most general case.

If $A=B$ , the frame is said to be a tight frame.

For more properties of frames, see, e.g., . Most of the recently proposed dictionaries for graph signals are either orthogonal bases (e.g., ) , which are a subset of tight frames, or overcomplete frames (e.g., ).

In order to define localized spectral graph filter frames, we need to first recall one way to generalize the translation operator to the graph setting.

We localize (or translate) a kernel $\hat{g}$ to center vertex $i\in\{1,2,\ldots,N\}$ by applying the localization operator $T_{i}$ , whose action is defined as

Note that this generalized localization operator is a kernelized operator. It does not translate an arbitrary signal defined in the vertex domain to different regions of the graph, but rather localizes a pattern defined in the graph spectral domain to be centered at different regions of the graph. The smoothness of the kernel $\hat{g}(\cdot)$ to be localized can be used to bound the localization of the translated kernel around a center vertex $i$ ; i.e., if a smooth kernel $\hat{g}(\cdot)$ is localized to center vertex $i$ , then the magnitude of $T_{i}g(n)$ decays as the distance between $i$ and $n$ increases [13, Section 5.2], [23, Section 4.4]. Except for special cases such as when $\mathcal{G}$ is a circulant graph with $\mu_{\mathcal{G}}=\frac{1}{\sqrt{N}}$ and the Laplacian eigenvectors are the DFT basis, the generalized localization operator of Definition 3 is not isometric. Rather, we have

which yields the following upper bound on the operator norm of $T_{i}$ :

It is interesting to note that although the norm is not preserved when a kernel is localized on an arbitrary graph, it is preserved on average when translated to separately to every vertex on the graph:

The following example presents more precise insights on the interplay between the localization operator, the graph structure, and the concentration of localized functions.

where the second equality follows from [23, Corollary 1]. Thus, recalling that a large value for $s_{1}(T_{i}g)$ means that $T_{i}g$ is concentrated, we can combine (10) and (12) to derive an upper bound on the concentration of $T_{i}g$ :

Thus, $||T_{i}g||_{2}$ serves as a measure of concentration, and according to the numerical experiments of Figure 7, localized heat kernels centered on the relatively well-connected regions of a graph tend to be less concentrated than the ones centered on relatively less well-connected areas. Intuitively, the values of the localized heat kernels can be linked to the diffusion of a unit of energy from the center vertex to surrounding vertices over a fixed time. In the well-connected regions of the graph, energy diffuses faster, making the localized heat kernels less concentrated.

The main class of dictionaries for graph signals that we consider is localized spectral graph filter frames.

In practice, each filter $\widehat{g_{k}}(\cdot)$ is often defined as a continuous function over the interval $[0,\lambda_{\max}]$ and then applied to the discrete set of eigenvalues in $\sigma(\mathcal{L})$ . The following lemma characterizes the frame bounds for a localized spectral graph filter frame.

Examples of localized spectral graph filter frames include the spectral graph wavelets of , the Meyer-like tight graph wavelet frames of , the spectrum-adapted wavelets and vertex-frequency frames of , and the learned parametric dictionaries of . The dictionaries constructions in choose the filters so that their energies are localized in different spectral bands. Different choices of filters lead to different tilings of the vertex-frequency space, and can for example lead to wavelet-like frames or vertex-frequency frames (analogous to classical windowed Fourier frames). The frame condition that $G(\lambda)>0$ for all $\lambda\in\sigma(\mathcal{L})$ ensures that these filters cover the entire spectrum, so that no band of information is lost during analysis and reconstruction.

In this paper, in order to generalize classical windowed Fourier frames, we often use a localized graph spectral filter bank where the kernels are uniform translates, which we refer to as a graph Gabor filter bank.

When the $K$ kernels used to generate the localized graph spectral filter frame are uniform translates of each other, we refer to the resulting dictionary as a graph Gabor filter bank or a graph Gabor filter frame. If we use the warping technique of on these uniform translates, we refer to the resulting dictionary as a spectrum-adapted graph Gabor filter frame.

Graph Gabor filter banks are generalizations of the short time Fourier transform. When $\hat{g}$ is smooth, the atoms are localized in the vertex domain. In this contribution, for all graph Gabor filter frames, we use the following mother window: $\hat{g}(t)=\sin\left(0.5\pi\cos(\pi(t-0.5))^{2}\right),$ for $t\in[-0.5,0.5]$ and elsewhere. A few desirable properties of this choice of window are (a) it is perfectly localized in the spectral domain in $[-0.5,0.5]$ , (b) it is smooth enough to be approximated by a low order polynomial, and (c) the frame formed by uniform translates (with an even overlap) is tight.

Lieb’s uncertainty principle in the continuous one-dimensional setting states that the cross-ambiguity function of a signal cannot be too concentrated in the time-frequency plane. In the following, we transpose these statements to the discrete periodic setting, and then generalize them to frames and signals on graphs. The following discrete version of Lieb’s uncertainty principle is partially presented in [61, Proposition 2].

Define the discrete Fourier transform (DFT) as

and the discrete windowed Fourier transform (or discrete cross-ambiguity function) as (see, e.g., [35, Section 4.2.3])

For two discrete signals of period $N$ , we have for $2\leq p<\infty$

These inequalities are proven in Section 8.2.2 of the Appendix. Note that the minimizers of this uncertainty principle are the so-called "picket fence" signals, trains of regularly spaced diracs.

2 Generalization of Lieb’s uncertainty principle to frames

Combining (15) and (16), for any $p\in[1,\infty]$ , we have

When $\mathcal{D}$ is a tight frame with frame bound $A$ , (17) reduces to

A proof is included in Section 8.2.1 of the Appendix. The proof of Theorem 3 in Section 8.2.2 of the Appendix also demonstrates that this uncertainty principle is indeed a generalization of the discrete periodic variant of Lieb’s uncertainty principle.

3 Lieb’s uncertainty principle for localized spectral graph filter frames

Lemma 1 implies that $\max_{i,k}||T_{i}g_{k}||_{2}\leq\sqrt{N}\mu_{\mathcal{G}}\max_{k}\|\widehat{g_{k}}\|_{2}$ . Therefore the following is a corollary to Theorem 4 for the case of localized spectral graph filter frames.

where $A=\min_{\lambda\in\sigma(\mathcal{L})}G(\lambda)$ is the lower frame bound and $B=\max_{\lambda\in\sigma(\mathcal{L})}G(\lambda)$ is the upper frame bound. When $\mathcal{D}$ is a tight frame with frame bound $A$ , (18) reduces to

In the following example, we compute the first uncertainty bound in (19) for different types of graphs and filters. It provides some insight on the influence of the graph topology and filter bank design on the uncertainty bound.

We use the techniques of to construct four tight localized spectral graph filter frames for each of eight different graphs. Figure 8 shows an examples of the four sets of filters for a 64 node sensor network. For each graph, two of the sets of filters (b and d in Figure 8) are adapted via warping to the distribution of the graph Laplacian eigenvalues so that each filter contains an appropriate number of eigenvalues (roughly equal in the case of translates and roughly logarithmic in the case of wavelets). The warping avoids filters containing zero or very few eigenvalues at which the filter has a nonzero value. These tight frames are designed such that $A=N$ , and thus Theorem 5 yields

Table 1 displays the values of the first concentration bound $\max_{i,k}\|T_{i}g_{k}\|_{2}$ for each graph and frame pair. The uncertainty bound is largest when the graph is far from a regular lattice (ring or path). As expected, the worst cases are for highly inhomogeneous graphs like the comet graph or a modified path graph with one isolated vertex. The choice of the filter bank may also decrease or increase the bound, depending on the graph.

The uncertainty principle in Theorem 5 bounds the concentration of the graph Gabor transform coefficients. In the next example, we examine these coefficients for a series of signals with different vertex and spectral domain localization properties.

Local uncertainty principles for signals on graphs

In the previous section, we defined a global bound for the concentration of the localized spectral graph filter frame analysis coefficients. In the classical setting, such a global bound is also local in the sense that each part of the domain has the same structure, due to the regularity of the underlying domain. However, this is not the case for the graph setting where the domain is irregular. Example 1 shows that a “bad” structure (a weakly connected node) in a small region of the graph reduces the uncertainty bound even if the rest of the graph is well behaved. Functions localized near the weakly connected node can be highly concentrated in both the vertex and frequency domains, whereas functions localized away from it are barely impacted. Importantly, the worst case determines the global uncertainty bound. As another example, suppose one has two graphs $G_{1}$ and $G_{2}$ with two different structures, each of them having a different uncertainty bound. The uncertainty bound for the graph $\mathcal{G}$ that is the union of these two disconnected graphs is the minimum of the uncertainty bounds of the two disconnected graphs, which is suboptimal for one of the two graphs.

In this section, we ask the following questions. Where does this worse case happen? Can we find a local principle that more accurately characterizes the uncertainty in other parts of the graph? In order to answer this question, we investigate the concentration of the analysis coefficients of the frame atoms, which are localized signals in the vertex domain. This technique is used in the classical continuous case by Lieb , who defines the (cross-) ambiguity function, the STFT of a short-time Fourier atom. The result is a joint time-frequency uncertainty principle that does not depend on the localization in time or in frequency of the analyzed atom.

Thus, we start by generalizing to the graph setting the definition of ambiguity (or cross-ambiguity) functions from time-frequency analysis of one-dimensional signals.

The ambiguity function of a localized spectral frame $\mathcal{D}=\{g_{i,k}\}=\{T_{i}g_{k}\}$ is defined as:

In order to probe the local uncertainty of a graph, we take a set of localized kernels in the graph spectral domain and center them at different local regions of the graph in the vertex domain. The atoms resulting from this construction are jointly localized in both the vertex and graph spectral domains, where "localized" means that the values of the function are zero or close to zero away from some reference point. By ensuring that the atoms are localized or have support within a small region of the graph, we focus on the properties of the graph in that region. In order to get a local uncertainty principle, we apply the frame operator to these localized atoms, and analyze the concentration of the resulting coefficients. In doing so, we develop an uncertainty principle relating these concentrations to the local graph structure.

To prepare for the theorem, we first state a lemma that gives a hint to how the scalar product of two localized functions depends on the graph structure and properties. In the following, we multiply two kernels $\hat{g}$ and $\hat{h}$ in the graph spectral domain. For notation, we represent the product of these two kernels in vertex domain as $g\cdot h$ .

For two kernels $\hat{g}$ , $\hat{h}$ and two nodes $i,j$ , the localization operator satisfies

Equation (20) shows more clearly the conditions on the kernels and nodes under which the scalar product is small. Let us take two examples. First, suppose $\hat{g}$ and $\hat{h}$ have a compact support on the spectrum and do not overlap (kernels localized in different places), then $\hat{g}\cdot\hat{h}$ is zero everywhere on the spectrum, and therefore the scalar product on the left-hand side of (20) is also equal to zero. Second, assume $i$ and $j$ are distant from each other. Then $|T_{i}(g\cdot h)(j)|$ is small if $\hat{g}$ and $\hat{h})$ are reasonably smooth. In other words, the two atoms $T_{i}g$ and $T_{j}h$ must be localized both in the same area of graph in the vertex domain and the same spectral region in order for the scalar product to be large. This localization depends on the atoms, but also on the graph structure.

Let $\{T_{i}g\}_{\left\{i\in[1,N],k\in[0,M-1]\right\}}$ be a localized spectral graph filter frame with lower frame bound $A$ and upper frame bound $B$ . For any $i_{0}\in[1,N],k_{0}\in[0,M-1]$ such that $\|T_{i_{0}}g_{k_{0}}\|_{2}>0$ , the quantity

Finally, thanks to Hölder’s inequality, we have for $p\leq 2$ and $\frac{1}{p}+\frac{1}{q}=1$

Under the assumptions of Theorem 6 and, assuming additionally

$k_{0}=\operatorname*{arg\,max}_{k}\|T_{i_{0}}(g_{k}\cdot g_{k_{0}})\|_{\infty}$ , and

$i_{0}=\operatorname*{arg\,max}_{j}|T_{i_{0}}g_{k_{0}}^{2}(j)|$ ,

The proof follows directly from the two following equalities. For the denominators, since the frame is tight, we have:

where (6.1) and (6.1) follow from (20), (28) follows from the second hypothesis, and (29) follows from the third hypothesis. ∎

Under the assumptions of Theorem 6, we have

which is a lower bound on the concentration measure.

Additionally, because $\{T_{i}g_{k}\}_{i=1,2,\ldots,N;k=0,1,\ldots,M-1}$ is a frame, we have

Combining (32) and (33) yields the desired inequality in (31). ∎

2 Illustrative examples

In order to better understand this local uncertainty principle, we illustrate it with some examples.

Let us concentrate on the case where $p=\infty$ . Theorem 6 tells us that

In the next example, we compare the local and global uncertainty principles on a modified path graph.

On a $64$ node modified path graph (see Example 1 for details), we compute the graph Gabor transform of the signals $f_{1}=T_{1}g_{0}$ and $f_{2}=T_{64}g_{0}$ . In Figure 12, we show the evolution of the graph Gabor transforms of the two signals with respect to the distance $d=1/W_{12}$ from the first to the second vertex in the graph. As the first node is pulled away, a localized eigenvector appears centered on the isolated vertex. Because of this, as this distance increases, the signal $f_{1}$ becomes concentrated in both the vertex and graph spectral domains, leading to graph Gabor transform coefficients that are highly concentrated (see the top right plot in Fig. 12). However, since the graph modification is local, it does not drastically affect the graph Gabor transform coefficients of the signal $f_{2}$ (middle row of Fig. 12), whose energy is concentrated on the far end of the path graph.

In Figure 13, we plot the evolution of the uncertainty bounds as well as the concentration of the Gabor transform coefficients of $f_{1}$ and $f_{2}$ . The global uncertainty bound from Theorem 5 tells us that

The local uncertainty bound from Theorem 6 tells us that

Thus, we can view the global uncertainty bound as an upper bound on all of the local uncertainty bounds. In fact the bumps in the global uncertainty bound in Figure 13 correspond to the local bound with $i_{0}=1$ and different frequency bands $k_{0}$ . We plot the local bounds for $i_{0}=1$ and $k_{0}=0$ and $k_{0}=2$ .

3 Single kernel analysis

Let us focus on the case where we analyze a single kernel $\hat{g}$ . Such an analysis is relevant when we model the signal as a linear combination of different localizations of a single kernel:

This model has been proposed in different contributions , and has also been used as an interpolation model, e.g., in and [24, Section V.C]. In this case, we could ask the following question. If we measure the signal value at node $j$ , how much information do we get about $w_{j}$ ? We can answer this by looking at the overlap between the atom $T_{j}g$ and the other atoms. When $T_{j}g$ has a large overlap with the other atoms, the value of $f(j)$ does not tell us much about $w_{j}$ . However, in the case where $T_{j}g$ has a very small overlap with the other atoms (an isolated node for example), knowing $f(j)$ gives an excellent approximation for the value of $w_{j}$ . The following theorem uses the sparsity level of $g(\mathcal{L})T_{j}g$ to analyze the overlap between the atom $T_{j}g$ and the other atoms.

For a kernel $\hat{g}$ , the overlap between the atom localized to center vertex $j$ and the other atoms satisfies

This result follows directly from the application of (21) in Lemma 3. ∎

4 Application: non-uniform sampling

In order to motivate Theorem 7 from a practical signal processing point of view, we use it to optimize the sampling of a signal over a graph. To asses the quality of the sampling, we solve a small inpainting problem where only a part of a signal is measured and the goal is to reconstruct the entire signal. Assuming that the signal varies smoothly in the vertex domain, we can formulate the inverse problem as:

where $y$ is the observed signal, $M$ the inpainting masking operator and $x^{T}\mathcal{L}x$ the graph Tikhonov regularizer ( $\mathcal{L}$ being the Laplacian). In order to generate the original signal, we filter Gaussian noise on the graph with a low pass kernel $\hat{h}$ . The frequency content of the resulting signal will be close to the shape of the filter $\hat{h}$ . For this example, we use the low pass kernel $\hat{h}(x)=\frac{1}{1+\frac{100}{\lambda_{\rm max}}x}$ to generate the smooth signal.

For a given number of measurements, the traditional idea is to randomly sample the graph. Under that strategy, the measurements are distributed across the network. Alternatively, we can use our local uncertainty principles to create an adapted mask. The intuitive idea that nodes with less uncertainty (higher local sparsity values) should be sampled with higher probability because their value can be inferred less easily from other nodes. Another way to picture this fact is the following. Imagine that we want to infer a quantity over a random sensor network. In the more densely populated parts of the network, the measurements are more correlated and redundant. As result, a lower sampling rate is necessary. On the contrary, in the parts where there are fewer sensors, the information has less redundancy and a higher sampling rate is necessary. The heat kernel $\hat{g}(x)=e^{-\tau x}$ is a convenient choice to probe the local uncertainty of a graph, because $\widehat{g^{2}}(x)=e^{-2\tau x}$ is also a heat kernel, resulting in a sparsity level depending only on $\|T_{j}g^{2}\|_{2}$ . Indeed we have $\|T_{j}g^{2}\|_{1}=\sqrt{N}$ . The local uncertainty bound of Theorem 7 becomes:

Based on this measure, we design a second random sampled mask with a probability proportional to $\|T_{i}g^{2}\|_{2}$ ; that is, the higher the overlap level at vertex $j$ , the smaller the probability that vertex $j$ is chosen as a sampling point, and vice-versa. For each sampling ratio, we performed $100$ experiments and averaged the results. For each experiment, we also randomly generated new graphs. The experiment was carried out using open-source code: the UNLocBoX and the GSPBox . Figure 14 presents the result of this experiment for a sensor graph and a community graph. In the sensor graph, we observe that our local measure of uncertainty varies smoothly on the graph and is higher in the more dense part. Thus, the likelihood of sampling poorly connected vertices is higher than the likelihood of sampling well connected vertices. In the community graph, we observe that the uncertainty is highly related to the size of the community. The larger the community, the larger the uncertainty (or, equivalently, the smaller the local sparsity value). In both cases, the adapted, non-uniform random sampling performs better than random uniform sampling.

Other works are also starting to use uncertainty principles to develop sampling theory for signals on graphs. In , the cumulative coherence is used to optimize the sampling distribution. This can be seen as sampling proportionally to $\|T_{i}g\|_{2}^{2}$ , where $\hat{g}$ is a specific rectangular kernel, in order to minimize the cumulative coherence of band-limited signals. In , Tsitsvero et al. make a link between uncertainty and sampling to obtain a non-probabilistic sampling method. While non-uniform random sampling is only an illustrative example in this paper, we are currently working on a separate contribution that uses our uncertainty theory to optimize sampling.

Conclusion

The global uncertainty principles discussed in Section 3 may be less informative when applied to signals residing on inhomogeneous graphs, because the structure of a specific area of the graph can affect global quantities such as the coherence $\mu_{\mathcal{G}}$ , which play a key role in the uncertainty bounds. Our main contribution was to suggest a new way of considering uncertainty by incorporating a notion of locality; specifically, we focused on the concentration of the analysis coefficients under a linear transform whose dictionary atoms are generated by localizing kernels defined in the graph spectral domain to different areas of the graph. The equivalent physical approach would be to say that the uncertainty on the measurements depends on the medium where the particle is located. Comparing the first inequality in (23) from the local uncertainty Theorem 6 with the first inequality in (18) from the global uncertainty Theorem 5, we see that the latter global bound can be viewed as the maximum of the local bounds over all regions of the graph and all regions of the spectrum.The leading constants in the middle terms of (18) and (23) are equal for $p>2$ . When $p<2$ , there is a constant factor $\left(\frac{B}{A}\right)^{\frac{1}{2}-\frac{1}{p}}\geq 1$ between the two bounds. This factor is equal to $1$ in the case of a tight frame ( $A=B$ ). This supports our view that the benefit of the global uncertainty principle is restricted to the behavior in the region of the graph with the least favorable structure. The local uncertainty principle, on the other hand, provides information about each region of the graph separately.

The key quantities $\{||T_{i}g_{k}||_{2}\}_{i,k}$ appear in both the global and local uncertainty principles. While we know that smoother kernels $\widehat{g_{k}}$ lead to atoms of the form $T_{i}g_{k}$ being more concentrated in the vertex domain, further study of the norms of these atoms is merited, as they seem to carry some notions of both uncertainty and centrality.

Finally, we showed in Example 9 how this local notion of uncertainty can be used constructively in the context of a sampling and interpolation experiment. The uncertainty quantities suggest to sample non-uniformly, often with higher weight given to less connected vertices. We envision future work applying these local uncertainty principles to other signal processing tasks, as well as extending the notion of local uncertainty to other types of dictionaries for graph signals.

Acknowledgment

This work has been supported by the Swiss National Science Foundation research project Towards Signal Processing on Graphs, grant number: 2000_21/154350/1.

Appendix

To prove the Hausdorff-Young inequalities for graph signals, we start by restating the Riesz-Thorin interpolation theorem, which can be found in [43, Section IX.4]. This theorem is valid for any measure spaces with $\sigma$ -finite measures, and hence in the finite dimensional case.

We shall also need the following reverse form of the result:

or, equivalently, there exist constants $M_{p}$ and $M_{q}$ such that

If ${\mathcal{T}}$ is invertible and has a left-inverse ${\mathcal{T}}^{-1}$ that satisfies ${\mathcal{T}}^{-1}{\mathcal{T}}f=f$ for all $f$ , then the equivalence of (36) and (37) follows from taking $g={\mathcal{T}}f$ , $f={\mathcal{T}}^{-1}g$ , $M_{p}=N_{p}^{-1}$ , and $M_{q}=N_{q}^{-1}$ . The proof of (38) follows from the application of Theorem 8, with ${\mathcal{T}}$ replaced by ${\mathcal{T}}^{-1}$ and $f$ by ${\mathcal{T}}f$ . ∎

First, we have the Parseval equality $\|f\|_{2}^{2}=\|\hat{f}\|_{2}^{2}$ . Second, we have

Applying the Riesz-Thorin theorem with $p_{1}=2$ , $p_{2}=2$ , $M_{p}=1$ , $q_{1}=1$ , $q_{2}=\infty$ , $M_{q}=\mu_{\mathcal{G}}$ , $t=\frac{2}{q}$ , $r_{1}=p$ , and $r_{2}=q$ leads to the first inequality (8). The proof of the converse is similar, as we have

The graph Fourier transform is invertible, so (9) then follows from Corollary 4, with $p_{1}=\infty$ , $p_{2}=1$ , $M_{p}=\mu_{\mathcal{G}}^{-1}$ , $q_{1}=2$ , $q_{2}=2$ , $M_{q}=1$ , $t=\frac{2}{q}-1$ , $r_{1}=p$ , and $r_{2}=q$ . ∎

2 Variations of Lieb’s uncertainty principle

For any $f$ , the frame $\mathcal{D}$ satisfies

We combine (41) and (43) to obtain (39). The second inequality (40) is obtained using the following instance of Hölder’s inequality:

We then use Corollary 4, the converse of Riesz-Thorin, to interpolate between (44) and (41), and we find for $p\in$ :

Combining (45) with the second inequality in (41) yields (40). ∎

2.2 Discrete version of Lieb’s uncertainty principle

Theorem 3 is actually a particular case of Theorem 4. To see why, we need to understand the transformation between the graph framework used in this contribution and the classical discrete periodic case. The DFT basis vectors $\left\{u_{k}(n)=\frac{1}{\sqrt{N}}\exp\left(\frac{i2\pi kn}{N}\right)\right\}_{k=0,1,\ldots,N-1}$ can also be chosen as the eigenvectors of the graph Laplacian for a ring graph with $N$ vertices . The frequencies of the DFT, which correspond up to a sign to the inverse of the period of the eigenvectors, are not the same as the graph Laplacian eigenvalues on the ring graph, which are all positive. We can, however, form a bijection between the set $\sigma(\mathcal{L})$ of graph Laplacian eigenvalues and the set of $N$ frequencies of the DFT, by associating one member from each set sharing the same eigenvector. At this point, instead of considering graph filters as continuous functions evaluated on the Laplacian eigenvalues, we can define a graph filter as a mapping from each individual eigenvalue to a complex number. Note that an eigenvalue with multiplicity $2$ can have two different outputs (e.g., $\lambda_{3}=\lambda_{4}=1$ , but the filter has different values at $\lambda_{3}$ and $\lambda_{4}$ ). With this bijection and view of the graph spectral domain, we can recover the classical discrete periodic setting by forming a ring graph with $N$ vertices. Because the classical translation and modulation preserve 2-norms, the discrete windowed Fourier atoms of the form

all have the same norm $||g||_{2}$ . Together these $N^{2}$ atoms comprise a tight frame on the ring graph with frame bounds $A=B=N\|g\|_{2}^{2}$ . Inserting these values into (15) and (16) yields (13) and (14). ∎

For the case of $p\geq 2$ , we also provide an alternative direct proof following similar ideas to those used in Lieb’s proof for the continuous case . The arguments below follow the sketch of the proof of Proposition 2 in and supporting personal communication from Bruno Torrésani. We need two lemmas. The first one is a direct application of Theorem 2, where here $\mu_{\mathcal{G}}=1/\sqrt{N}$ .

Conversely, for $2\leq p\leq\infty$ , we have

The second lemma is an equivalent of Young’s inequality in the discrete case. We denote the circular convolution between two discrete signals $f,g$ by $f\ast g$ . The circular convolution satisfies $\widehat{f\ast g}=\hat{f}\cdot\hat{g}$ .

Let $f\in L^{p}$ , $g\in L^{q}$ , where $1\leq p,q,r\leq\infty$ satisfy $1+\frac{1}{r}=\frac{1}{p}+\frac{1}{q}$ . Then

The proof is based on the following inequalities [70, p. 174]

where $\frac{1}{p}+\frac{1}{p^{\prime}}=1$ . For a fixed function $g\in L^{q}$ , we define an operator ${\mathcal{T}}_{g}$ by $({\mathcal{T}}_{g}f)(n)=(f*g)(n)$ . Using (46) and (47), we observe that this operator is bounded from $L^{1}$ to $L^{1}$ by $\|g\|_{1}$ and from $L^{\infty}$ to $L^{\infty}$ by $\|g\|_{1}$ . Thus, we can apply the Riesz-Thorin theorem to this operator to get

Similarly, for a fixed function $f\in L^{p}$ , we define another operator $T_{f}$ by $(T_{f}g)(n)=(f*g)(n)$ . From (49) and (48), we observe that this new operator is bounded from $L^{1}$ to $L^{p}$ by $\|f\|_{p}$ and from $L^{p^{\prime}}$ to $L^{\infty}$ by $\|f\|_{p}$ . One more application of the Riesz-Thorin theorem leads to the desired result:

where $1+\frac{1}{r}=\frac{1}{p}+\frac{1}{q}$ . ∎

Suppose $p>2$ and let $\frac{1}{p}+\frac{1}{p^{\prime}}=1$ . We denote the DFT by $\mathcal{F}$ . Noting that $\frac{p}{p^{\prime}}>1$ , we have

for any $1\leq\alpha,\beta\leq\infty$ satisfying $\frac{1}{\alpha}+\frac{1}{\beta}=p^{\prime}$ . Equation (8.2.2) follows from the Hausdorff-Young inequality given in Lemma 4 and (8.2.2) follows from the Young inequality given in Lemma 5 with $r=\frac{p}{p^{\prime}}$ . Now we can perform a change variable $a=\alpha p^{\prime}$ and $b=\beta p^{\prime}$ so that $\frac{1}{a}+\frac{1}{b}=1$ , and (8.2.2) becomes

Finally, we take $a=b=2$ and take the $p^{th}$ root of (52) to show the first half of Theorem 3. Note that we cannot follow the same line of logic for the case $1\leq p\leq 2$ without a converse of the Young’s inequality in Lemma 5.