Multivariate Stein Factors for a Class of Strongly Log-concave Distributions
Lester Mackey, Jackson Gorham
Introduction
Here, represents the density of with respect to Lebesgue measure.
Next, one shows that, for every test function in a convergence-determining class , the Stein equation
admits a solution in a set of functions with uniformly bounded low-order derivatives. These uniform derivative bounds are commonly termed Stein factors.
Finally, one uses whatever tools are necessary to upper bound the Stein discrepancy (not to be confused with the “Stein discrepancy” of , which names an entirely different quantity)
which by construction upper bounds the reference metric .
To date, this recipe has been successfully used with the Langevin operator (1) to obtain explicit approximation error bounds for a wide variety of univariate targets [see, e.g., 7, 6]. (In the univariate setting, the operator (1) is commonly called Stein’s density operator.) The same operator has been used to analyze multivariate Gaussian approximation, but few other multivariate distributions have established Stein factors. To extend the reach of the multivariate literature, we derive uniform Stein factor bounds for a broad class of strongly log-concave target distributions in Theorem 2.1. The result covers common Bayesian target distributions, including Bayesian logistic regression posteriors under Gaussian priors, and explicitly relates the Stein discrepancy (3), and the practical Monte Carlo diagnostics based on it, to standard probability metrics such as the Wasserstein distance.
and we term a function -strongly log-concave if is -strongly concave. We finally let for all functions and define the Lipschitz constants
Stein factors for strongly log-concave distributions
solves the Stein equation (2) and satisfies
Theorem 2.1 implies that the Stein discrepancy (3) with set
To establish the second, we fix and and define the smoothed function
In the final equality we have used the fact that and are jointly normal with zero mean and covariance \Sigma=\begin{bmatrix}\|v\|_{2}^{2}&\langle v,w\rangle\\ \langle v,w\rangle&\|w\|_{2}^{2}\end{bmatrix}, so that the product has the distribution of the off-diagonal element of the Wishart distribution with scale and degree of freedom.
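The covariance structure invoked above can be checked numerically. The following minimal Python sketch assumes that the two jointly normal quantities are the inner products \langle v,Z\rangle and \langle w,Z\rangle for a standard Gaussian vector Z; this reading, the example vectors, and the sample size are assumptions made for illustration. Under that assumption, the mean of the product should be close to the off-diagonal entry \langle v,w\rangle of \Sigma.

```python
import numpy as np

# Hypothetical example vectors (not taken from the paper).
rng = np.random.default_rng(0)
v = np.array([1.0, 2.0, 0.0])
w = np.array([0.0, 1.0, -1.0])

# Assumption: Z is a standard Gaussian vector and the jointly normal pair
# is (<v, Z>, <w, Z>).
Z = rng.standard_normal((400_000, 3))
a = Z @ v
b = Z @ w

# Covariance matrix stated in the text.
Sigma = np.array([[v @ v, v @ w],
                  [v @ w, w @ w]])

# Monte Carlo estimates of the covariance and of the mean of the product.
Sigma_hat = np.cov(np.stack([a, b]))
product_mean = np.mean(a * b)

print("Sigma:\n", Sigma)
print("empirical covariance:\n", Sigma_hat)
print("mean of product:", product_mean, "vs <v, w> =", v @ w)
```

The check is purely empirical: agreement is only up to Monte Carlo error, which shrinks at the usual n^{-1/2} rate.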
We can now develop a bound for using our smoothed functions. Let
While Lemma 2.2 targets Lipschitz test functions, comparable results can be obtained for non-smooth functions, like the indicators of convex sets, by adapting the smoothing technique of [3, Lem. 2.1].
Before turning to the proof of Theorem 2.1, we illustrate a practical application to measuring the quality of Monte Carlo or cubature sample points in Bayesian inference. Consider the Bayesian logistic regression posterior density [see, e.g., 11]
Hence, Theorem 2.1 applies with k=1/\sigma^{2}, L_{3}=\frac{\sum_{l=1}^{L}\|v_{l}\|_{2}^{3}}{6\sqrt{3}}, and L_{4}=\frac{\sum_{l=1}^{L}\|v_{l}\|_{2}^{4}}{8}. We may now plug the associated Stein factors
into the non-uniform graph Stein discrepancy of to obtain a computable upper bound on or for any discrete probability measure .
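As a concrete illustration, the three constants k, L_{3}, and L_{4} stated above can be computed directly from the vectors v_{l}. The following minimal Python sketch assumes that the v_{l} are stored as the rows of a NumPy array `V` and that `sigma2` is the prior variance \sigma^{2}; the function name `logistic_stein_constants` and the example data are hypothetical.

```python
import numpy as np

def logistic_stein_constants(V, sigma2):
    """Return (k, L3, L4) from the formulas in the text.

    V      : array of shape (L, d); row l is assumed to hold the vector v_l.
    sigma2 : assumed to be the Gaussian prior variance sigma^2.
    """
    norms = np.linalg.norm(V, axis=1)            # ||v_l||_2 for each l
    k = 1.0 / sigma2                             # k = 1 / sigma^2
    L3 = np.sum(norms ** 3) / (6.0 * np.sqrt(3.0))
    L4 = np.sum(norms ** 4) / 8.0
    return k, L3, L4

# Hypothetical data: two vectors in R^2 with norms 5 and 2.
V = np.array([[3.0, 4.0],
              [0.0, 2.0]])
k, L3, L4 = logistic_stein_constants(V, sigma2=2.0)
print(k, L3, L4)
```

Both L_{3} and L_{4} grow with the number of vectors and with their norms, so rescaling the v_{l} changes the resulting Stein factors.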
Proof of Theorem 2.1
Before tackling the main proof, we will establish a series of useful lemmas. We will make regular use of the following well-known Lipschitz property:
Our first lemma enumerates several properties of the overdamped Langevin diffusion that will prove useful in the proofs to follow.
Consider the Lyapunov function V(x)=\|x\|_{2}^{2}+1. The strong log-concavity of , the Cauchy-Schwarz inequality, and the arithmetic-geometric mean inequality imply that
2 High-order weighted difference bounds
A second, technical lemma bounds the growth of weighted smooth function differences in terms of the proximity of function arguments. The result will be used to characterize the smoothness of as a function of the starting point (Lemma 3.5) and, ultimately, to establish the smoothness of (Theorem 2.1).
To establish the second-order difference bound (5), we first apply Taylor’s theorem with mean-value remainder to and to obtain
To derive the third-order difference bound (6), we apply Taylor’s theorem with mean-value remainder to , , , and to write
To bound the subsequent line, we note that Cauchy-Schwarz, the definition of the operator norm, and the Lipschitz property (4) imply that
Finally, Cauchy-Schwarz and the definition of the operator norm give
Bounding the third-order difference (7) in terms of these four estimates yields (6).
3 Synchronous coupling lemma
Our proof of Theorem 2.1 additionally rests upon a series of coupling inequalities which serve to characterize the smoothness of as a function of . The couplings espoused in the lemma to follow are termed synchronous, because the same Brownian motion is used to drive each process.
For each starting point of the form with , , and , consider an overdamped Langevin diffusion solving the stochastic differential equation
These coupled processes almost surely satisfy the synchronous coupling bounds,
the second-order differenced function bound,
and the third-order differenced function bound,
By Lemma 3.1, each process with , , and is well-defined for all times . The first-order bound (10) is well known, and a concise proof can be found in .
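The first-order contraction can be observed in simulation. The following minimal Python sketch rests on several assumptions that are stated here rather than drawn from the text: the diffusion is taken to have the form dZ_{t}=\frac{1}{2}\nabla\log p(Z_{t})\,dt+dW_{t}, the first-order bound (10) is taken to read \|Z_{t,x}-Z_{t,y}\|_{2}\le e^{-kt/2}\|x-y\|_{2}, the target is a zero-mean Gaussian with precision matrix `A` (so that k is the smallest eigenvalue of `A`), and the dynamics are discretized with an Euler-Maruyama scheme. The function name `coupled_langevin_gap` is hypothetical.

```python
import numpy as np

def coupled_langevin_gap(A, x, y, dt=1e-3, n_steps=2000, seed=0):
    """Euler-Maruyama simulation of two synchronously coupled diffusions.

    Assumed target: log p(z) = -z^T A z / 2 + const, so grad log p(z) = -A z.
    Returns the distances ||X_n - Y_n||_2 for n = 0, ..., n_steps.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x, dtype=float)
    y = np.array(y, dtype=float)
    gaps = [np.linalg.norm(x - y)]
    for _ in range(n_steps):
        # Synchronous coupling: one Brownian increment drives both processes.
        dW = np.sqrt(dt) * rng.standard_normal(x.shape[0])
        x = x + 0.5 * dt * (-A @ x) + dW
        y = y + 0.5 * dt * (-A @ y) + dW
        gaps.append(np.linalg.norm(x - y))
    return np.array(gaps)

A = np.diag([1.0, 3.0])   # precision matrix; smallest eigenvalue is k = 1
k = 1.0
dt = 1e-3
gaps = coupled_langevin_gap(A, [2.0, -1.0], [-1.0, 0.5], dt=dt)
times = dt * np.arange(len(gaps))
bound = np.exp(-k * times / 2.0) * gaps[0]

print("largest violation of the bound:", np.max(gaps - bound))
```

Because the noise is shared, it cancels in the difference of the two processes, and the gap evolves deterministically. For this Gaussian example each Euler step contracts the gap by a factor of at most 1-k\,dt/2\le e^{-k\,dt/2}, so the discretized paths respect the assumed bound up to floating-point error.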
To establish the second conclusion (11), we consider the Itô process of second-order differences
and apply Itô’s lemma to the mapping (t,w)\mapsto e^{kt/2}\|w\|_{2}. This yields
where, to achieve the second inequality, we used the -strong log-concavity of . Now we may derive the second-order synchronous coupling bound (11), since
Applying the synchronous coupling bound (11) to the estimate (16) finally delivers the second-order differenced function bound (13).
Third-order bounds
To establish the third conclusion (12), we consider the Itô process of third-order differences
and invoke Itô’s lemma once more for the mapping (t,w)\mapsto e^{kt/2}\|w\|_{2}. This produces
In the final line, we used the -strong log-concavity of . Our efforts now yield (12) via
The third-order differenced function bound (14) then follows by applying the third-order synchronous coupling bound (12) to the estimate (18).
4 Proof of Theorem 2.1
for a -dimensional Wiener process. In what follows, when considering the joint distribution of a finite collection of overdamped Langevin diffusions, we will assume that the diffusions are coupled in the manner of Lemma 3.5, so that each diffusion is driven by a shared -dimensional Wiener process .
To see that the integral representation of is well-defined, note that
The first relation uses the stationarity of , the second uses the Lipschitz relation (4), the third uses the first-order coupling inequality (10) of Lemma 3.5, and the last uses the fact that strongly log-concave distributions have subexponential tails and therefore finite moments of all orders [8, Lem. 1].
The second relation is an application of the Lipschitz relation (4), and the third applies the first-order coupling inequality (10) of Lemma 3.5.
To demonstrate that is differentiable with Lipschitz gradient, we first establish a weighted second-order difference inequality for .
We apply the Lemma 3.5 second-order function coupling inequality (13) to obtain
The desired bound follows by integrating the final expression.
Indeed, Lemma 3.7 implies that, for any integers ,
Hence, the sequence \left(\frac{u_{h}(x+v/m)-u_{h}(x)}{1/m}\right)_{m=1}^{\infty} is Cauchy, and the directional derivative (21) exists.
where the second inequality follows from Lemma 3.7. Since each directional derivative is Lipschitz continuous, we may conclude that is continuously differentiable with Lipschitz continuous gradient . Our Lipschitz function deduction (19) and the Lipschitz relation (4) additionally supply the uniform bound
To demonstrate that is differentiable with Lipschitz gradient, we begin by establishing a weighted third-order difference inequality for .
Introduce the shorthand and . We apply the Lemma 3.5 third-order function coupling inequality (14) to the thrice continuously differentiable function to obtain
Integrating this final expression yields the advertised bound.
Lemma 3.9 guarantees that, for any integers ,
Hence, the sequence \left(\frac{\nabla_{v}u_{h}(x+v^{\prime}/m)-\nabla_{v}u_{h}(x)}{1/m}\right)_{m=1}^{\infty} is Cauchy, and the directional derivative (24) exists.