Rectangular Bounding Process

Xuhui Fan, Bin Li, Scott Anthony Sisson

Introduction

Stochastic partition processesThe code of the paper is provided in https://github.com/xuhuifan/RBP on a product space have found many real-world applications, such as regression trees , relational modeling , and community detection . By tailoring a multi-dimensional space (or multi-dimensional array) into a number of rectangular regions, the partition model can fit data using these “blocks” such that the data within each block exhibit certain types of homogeneity. As one can choose an arbitrarily fine resolution of partition, the data can be fitted reasonably well.

The cost of finer data fitness is that the partition model may induce unnecessary dissections in sparse regions. Compared to the regular-grid partition process , the Mondrian process (MP) is more parsimonious for data fitting due to a hierarchical partition strategy; however, the strategy of recursively cutting the space still cannot largely avoid unnecessary dissections in sparse regions. Consider e.g. a regression tree on a multi-dimensional feature space: as data usually lie in some local regions of the entire space, a “regular-grid” or “hierarchical” (i.e. kkd-tree) partition model would inevitably produce too many cuts in regions where data points rarely locate when it tries to fit data in dense regions (see illustration in the left panel of Figure 1). It is accordingly challenging for a partition process to balance fitness and parsimony.

Instead of this cutting-based strategy, we propose a bounding-based partition process – the Rectangular Bounding Process (RBP) – to alleviate the above limitation. The RBP generates rectangular bounding boxes to enclose data points in a multi-dimensional space. In this way, significant regions of the space can be comprehensively modelled. Each bounding box can be efficiently constructed by an outer product of a number of step functions, each of which is defined on a dimension of the feature space, with a segment of value “1” in a particular interval and “0” otherwise. As bounding boxes are independently generated, the layout of a full partition can be quite flexible, allowing for a simple description of those regions with complicated patterns. As a result, the RBP is able to use fewer blocks (thereby providing a more parsimonious expression of the model) than those cutting-based partition models while achieving a similar modelling capability.

The RBP has several favourable properties that make it a powerful nonparametric partition prior: (1) Given a budget parameter and the domain size, the expected total volume of the generated bounding boxes is a constant. This is helpful to set the process hyperparameters to prefer few large bounding boxes or many small bounding boxes. (2) Each individual data point in the domain has equal probability to be covered by a bounding box. This gives an equal tendency towards all possible configurations of the data. (3) The process is self-consistent in the sense that, when restricting a domain to its sub-domain, the probability of the resulting partition on the sub-domain is the same as the probability of generating the same partition on the sub-domain directly. This important property enables the RBP to be extendable from a hypercube to the infinite multi-dimensional space according to the Kolmogorov extension theorem . This is extremely useful for the domain changing problem setting, such as regression problems over streaming data.

The RBP can be a beneficial tool in many applications. In this paper we specifically investigate (1) regression trees, where bounding boxes can be viewed as local regions of certain labels, and (2) relational modelling, where bounding boxes can be viewed as communities in a complex network. We develop practical and efficient MCMC methods to infer the model structures. The experimental results on a number of synthetic and real-world data sets demonstrate that the RBP can achieve parsimonious partitions with competitive performance compared to the state-of-the-art methods.

Related Work

Stochastic partition processes divide a multi-dimensional space (for continuous data) or a multi-dimensional array (for discrete data) into blocks such that the data within each region exhibit certain types of homogeneity. In terms of partitioning strategy, state-of-the-art stochastic partition processes can be roughly categorized into regular-grid partitions and flexible axis-aligned partitions (Figure 1).

A regular-grid stochastic partition process is constituted by DD separate partition processes on each dimension of the DD-dimensional array. The resulting orthogonal interactions between two dimensions produce regular grids. Typical regular-grid partition models include the infinite relational model (IRM) and mixed-membership stochastic blockmodels . Bayesian plaid models generate “plaid” like partitions, however they neglect the dependence between latent feature values and more importantly, they are restricted to discrete arrays (in contrast to our continuous space model). Regular-grid partition models are widely used in real-world applications for modeling graph data .

To our knowledge, only the Mondrian process (MP) and the rectangular tiling process (RTP) can produce flexible axis-aligned partitions on a product space. The MP recursively generates axis-aligned cuts on a unit hypercube and partitions the space in a hierarchical fashion known as a kkd-tree. Compared to the hierarchical partitioning strategy, the RTP generates a flat partition structure on a two-dimensional array by assigning each entry to an existing block or a new block in sequence, without violating the rectangular restriction on the blocks.

2 Bayesian Relational Models

Most stochastic partition processes are developed to target relational modelling . A stochastic partition process generates a partition on a multi-dimensional array, which serves as a prior for the underlying communities in the relational data (e.g. social networks in the 2-dimensional case). After generating the partition, a local model is allocated to each block (or polygon) of the partition to characterize the relation type distribution in that block. For example, the local model can be a Bernoulli distribution for link prediction in social networks or a discrete distribution for rating prediction in recommender systems. Finally, row and column indexes are sampled to locate a block in the partition and use the local model to further generate the relational data.

Compared to the existing stochastic partition processes in relational modelling, the RBP introduces a very different partition approach: the RBP adopts a bounding-based strategy while the others are based on cutting-based strategies. This unique feature enables the RBP to directly capture important communities without wasting model parameters on unimportant regions. In addition, the bounding boxes are independently generated by the RBP. This parallel strategy is much more efficient than the hierarchical strategy and the entry-wise growing strategy .

3 Bayesian Regression Trees

The ensemble of regression/decision trees is a popular tool in regression tasks due to its competitive and robust performance against other models. In the Bayesian framework, Bayesian additive regression trees (BART) and the Mondrian forest are two representative methods.

BART adopts an ensemble of trees to approximate the unknown function from the input space to the output label. Through prior regularization, BART can keep the effects of individual trees small and work as a “weak learner” in the boosting literature. BART has shown promise in nonparametric regression; and several variants of BART have been developed to focus on different scenarios, including heterogeneous BART allowing for various observational variance in the space, parallel BART enabling parallel computing for BART, and Dirichlet additive regression trees imposing a sparse Dirichlet prior on the dimensions to address issues of high-dimensionality.

The Mondrian forest (MF) is built on the idea of an ensemble of Mondrian processes (kkd-trees) to partition the space. The MF is distinct from BART in that the MF may be more suitable for streaming data scenarios, as the distribution of trees sampled from the MP stays invariant even if the data domain changes over time.

The Rectangular Bounding Process

Sample the number of bounding boxes Kτ\mboxPoisson(τd=1D[1+λL(d)]){K}_{\tau}\sim\mbox{Poisson}(\tau\prod_{d=1}^{D}\left[1+\lambda L^{(d)}\right]);

For k=1,,Kτk=1,\ldots,K_{\tau}, d=1,,Dd=1,\ldots,D, sample the initial position sk(d)s_{k}^{(d)} and the length lk(d)l_{k}^{(d)} of the kk-th bounding box (denoted as k\Box_{k}) in the dd-th dimension:

Sample the initial position sk(d)s_{k}^{(d)} as

Sample KτK_{\tau} i.i.d. time points uniformly in (0,τ](0,\tau] and index them to satisfy t1<<tKτt_{1}<\ldots<t_{K_{\tau}}. Set the cost of k\square_{k} as mk=tktk1m_{k}=t_{k}-t_{k-1} (t0=0t_{0}=0).

Given a domain XX, hyperparameters τ\tau and λ\lambda, a random partition sampled from the RBP can be represented as: X\mboxRBP(X,τ,λ)\boxplus_{X}\sim\mbox{RBP}(X,\tau,\lambda). We assume that the costs of bounding boxes are i.i.d. sampled from the same exponential distribution, which implies there exists a homogeneous Poisson process on the time (cost) line. The generating time of each bounding box is uniform in (0,τ](0,\tau] and the number of bounding boxes has a Poisson distribution. We represent a random partition as X:={k,mk}k=1KτΩX\boxplus_{X}:=\{\Box_{k},m_{k}\}_{k=1}^{K_{\tau}}\in\Omega_{X}.

2 Coverage Probability

For any data point xXx\in X (including the boundaries of XX), the probability of x(d)x^{(d)} falling in the interval of [sk(d),sk(d)+lk(d)][s_{k}^{(d)},s_{k}^{(d)}+l_{k}^{(d)}] on uk(d){u_{k}^{(d)}} is a constant 11+λL(d)\frac{1}{1+\lambda L^{(d)}} (and does not depend on x(d)x^{(d)}). As the step functions {uk(d)}k\{u_{k}^{(d)}\}_{k} for constructing the kk-th bounding box k\Box_{k} in X\boxplus_{X} are independent, xx is covered by k\Box_{k} with a constant probability.

The property of constant coverage probability is particularly suitable for regression problems. Proposition 2 implies there is no biased tendency to favour different data regions in XX. All data points have equal probability to be covered by a bounding box in a RBP partition X\boxplus_{X}.

Another interesting observation can be seen from this property: Although we have specified a direction for generating the dd-th dimension of k\Box_{k} in the generative process (i.e. the initial position sk(d)s_{k}^{(d)} and the terminal position vk(d)=sk(d)+lk(d)v_{k}^{(d)}=s_{k}^{(d)}+l_{k}^{(d)}), the probability of generating u(d)u^{(d)} is the same if we reverse the direction of the dd-th dimension, which is p(sk(d),vk(d))=eλlk(d)1+λL(d)λ1sk(d)>0+1vk(d)<Lp(s_{k}^{(d)},v_{k}^{(d)})=\frac{e^{-\lambda l_{k}^{(d)}}}{1+\lambda L^{(d)}}\cdot\lambda^{\boldsymbol{1}_{s_{k}^{(d)}>0}+\boldsymbol{1}_{v_{k}^{(d)}<L}}. It is obvious that the joint probability of the initial position and the terminal position is invariant if we reverse the direction of the dd-th dimension. Direction is therefore only defined for notation convenience – it will not affect the results of any analysis.

3 Self-Consistency

The self-consistency property can be verified in three steps: (1) the distribution of the number of bounding boxes is self-consistent; (2) the position distribution of a bounding box is self-consistent; (3) the RBP is self-consistent. In the following, we use πY,X\pi_{Y,X} to denote the projection that restricts YΩY\boxplus_{Y}\in\Omega_{Y} to XX by keeping Y\boxplus_{Y}’s partition in XX unchanged and removing the rest.

The time points of bounding boxes crossing into XX from YY follow the same Poisson process for generating the time points of bounding boxes in a \mboxRBP(X,τ,λ)\mbox{RBP}(X,\tau,\lambda). That is PKτ,{mk}k=1KτY(πY,X1(KτX,{mkX}k=1KτX))=PKτ,{mk}k=1KτX(KτX,{mkX}k=1KτX)P^{Y}_{K_{\tau},\{m_{k}\}_{k=1}^{K_{\tau}}}\left(\pi_{Y,X}^{-1}\left(K_{\tau}^{X},\{m_{k}^{X}\}_{k=1}^{K_{\tau}^{X}}\right)\right)=P^{X}_{K_{\tau},\{m_{k}\}_{k=1}^{K_{\tau}}}\left(K_{\tau}^{X},\{m_{k}^{X}\}_{k=1}^{K_{\tau}^{X}}\right).

The marginal probability of the pre-images of a bounding box X\Box^{X} in YY (given the bounding box in YY would cross into XX) equals the probability of X\Box^{X} being directly sampled from a \mboxRBP(X,τ,λ)\mbox{RBP}(X,\tau,\lambda). That is, P^{Y}_{\Box}\left(\pi_{Y,X}^{-1}(\Box^{X})\bigm{|}{\pi_{Y,X}(\Box^{Y})}\neq\emptyset\right)=P^{X}_{\Box}(\Box^{X}).

Combining 1 and 2 leads to the self-consistency of the RBP: PY(πY,X1(X))=PX(X)P^{Y}_{\boxplus}\left(\pi_{Y,X}^{-1}(\boxplus_{X})\right)=P^{X}_{\boxplus}(\boxplus_{X}).

Application to Regression Trees

The generative process for the RBP-regression tree is as follows:

2 Sampling for RBP-Regression Tree

The joint probability of the label data {yn}n=1N\{y_{n}\}_{n=1}^{N}, the number of bounding boxes KτK_{\tau}, the variables related to the bounding boxes {ωk,sk(1:D),lk(1:D)}k=1Kτ\{\omega_{k},s_{k}^{(1:D)},l_{k}^{(1:D)}\}_{k=1}^{K_{\tau}}, and the error variance σ2\sigma^{2} is

We adopt MCMC methods to iteratively sampling from the resulting posterior distribution.

Sample KτK_{\tau}: We use a similar strategy to for updating KτK_{\tau}. We accept the addition or removal of a bounding box with an acceptance probability of min(1,αadd)\min(1,\alpha_{\textrm{add}}) or min(1,αdel)\min(1,\alpha_{\textrm{del}}) respectively, where

λ=d[1+λL(d)]\lambda_{*}=\prod_{d}\left[1+\lambda L^{(d)}\right], PKτ(yn)=P(ynKτ,{ωk,sk(d),lk(d)}k,σ2,{xn}n)P_{K_{\tau}}(y_{n})=P(y_{n}|K_{\tau},\{\omega_{k},s_{k}^{(d)},l_{k}^{(d)}\}_{k},\sigma^{2},\{x_{n}\}_{n}), and P0=12P_{0}=\frac{1}{2} (or 1P01-P_{0}) denotes the probability of proposing to add (or remove) a bounding box.

Both σ2\sigma^{2} and {ωk}k\{\omega_{k}\}_{k} can be sampled through Gibbs sampling:

where μ=(σ2)(μωϵω2+n(ynkkωk1xk(xn))σ2)\mu^{*}=(\sigma^{2})^{*}\left(\frac{\mu_{\omega}}{\epsilon_{\omega}^{2}}+\frac{\sum_{n}\left(y_{n}-\sum_{k^{\prime}\neq k}\omega_{k^{\prime}}\cdot\boldsymbol{1}_{x\in\Box_{k^{\prime}}}(x_{n})\right)}{\sigma^{2}}\right), (σ2)=(1ϵω2+Nkσ2)1(\sigma^{2})^{*}=\left(\frac{1}{\epsilon_{\omega}^{2}}+\frac{N_{k}}{\sigma^{2}}\right)^{-1}, and NkN_{k} is the number of data points belonging to the kk-th bounding box.

To update uk(d)u_{k}^{(d)} we use the Metropolis-Hastings algorithm, generating the proposed sk(d),lk(d)s_{k}^{(d)},l_{k}^{(d)} (which determine uk(d)u_{k}^{(d)}) using the RBP generative process (Step 2.(a)(b)). Thus, the acceptance probability is based purely on the likelihoods for the proposed and current configurations.

3 Experiments

We empirically test the RBP regression tree (RBP-RT) for the regression task. We compare the RBP-RT with several state-of-the-art methods: (1) a Bayesian additive regression tree (BART) ; (2) a Mondrian forest (MF) ; (3) a random forest (RF) ; and (4) an extremely randomized tree (ERT) . For BART, we implement a particle MCMC strategy to infer the structure of each tree in the model; for the MF, we directly use the existing code provided by the authorhttps://github.com/balajiln/mondrianforest; for the RF and ERT, we use the implementations in the scikit-learn toolbox .

We first test the RBP-RT on a simple synthetic data set. A total of 77 bounding boxes are assigned to the unit square 2^{2}, each with its own mean intensity ωk\omega_{k}. From each bounding box 508050\sim 80 points are uniformly sampled, with the label data yiy_{i} generated from a normal distribution, with mean the sum of the intensities of the bounding boxes covering the data point, and standard deviation σ=0.1\sigma=0.1. In this way, a total of 400400 data points are generated in 2^{2}.

To implement the RBP-TR we set the total number of iterations to 500500, λ=2\lambda=2 (i.e. the expected box length is 13\frac{1}{3}) and τ=1\tau=1 (i.e. the expected number of bounding boxes is 9090). It is worth noting that 9090 is a relatively small number of blocks when compared to the other state-of-the-art methods. For instance, BART usually sets the number of trees to be at least 5050 and there are typically more than 1616 blocks in each tree (i.e. at least 800800 blocks in total). The right panel of Figure 3 shows a visualization of the data fitting result of the RBP regression-tree.

Real-world data:

We select several real-world data sets to compare the RBP-RT and the other state-of-the-art methods: Protein Structure (N=45,730,D=9N=45,730,D=9), Naval Plants (N=11,934,D=16N=11,934,D=16), Power Plants (N=9,569,D=4N=9,569,D=4), Concrete (N=1,030,D=8N=1,030,D=8), and Airfoil Self-Noise (N=1,503,D=8N=1,503,D=8). Here, we first use PCA to select the 44 largest components and then normalize them so that they lie in the unit hypercube for ease of implementation. As before, we set the total number of iterations to 500, λ=2\lambda=2 and this time set τ=2\tau=2 (i.e. the expected number of bounding boxes is 180180).

The resulting Residual Mean Absolute Errors (RMAE) are reported in Table 1. In general, the three Bayesian tree ensembles perform worse than the random forests. This may in part be due to the maturity of development of the RF algorithm toolbox. While the RBP-RT does not perform as well as the random forests, its performance is comparable to that of BART and MF (sometimes even better), but with many fewer bounding boxes (i.e. parameters) used in the model, clearly demonstrating its parsimonious construction.

Application to Relational Modeling

Another possible application of the RBP is in relational modeling. Given relational data as an asymmetric matrix R{0,1}N×NR\in\{0,1\}^{N\times N}, with RijR_{ij} indicating the relation from node ii to node jj, the bounding boxes {k}k\{\Box_{k}\}_{k} with rates {ωk}k\{\omega_{k}\}_{k} belonging to a partition \boxplus may be used to model communities with different intensities of relations.

Together, the RBP and the mapping function σ()\sigma(\cdot) play the role of the random function W()W(\cdot) defined in the graphon literature . Along with the uniformly generated coordinates for each node, the RBP relational model is expected to uncover homogeneous interactions in RR as compact boxes.

2 Sampling for RBP-Relational Model

The joint probability of the label data {Rij}i,j\{R_{ij}\}_{i,j}, the number of bounding boxes KτK_{\tau}, the variables related to the bounding boxes {ωk,sk(1:D),lk(1:D)}k=1Kτ\{\omega_{k},s_{k}^{(1:D)},l_{k}^{(1:D)}\}_{k=1}^{K_{\tau}}, and the coordinates {ξn,ηn}n\{\xi_{n},\eta_{n}\}_{n} for the nodes (with L(1)==L(D)=1L^{(1)}=\ldots=L^{(D)}=1 in the RBP relational model) is

We adopt MCMC methods to iteratively sample from the resulting posterior distribution.

Sample KτK_{\tau}: We use a similar strategy to for updating KτK_{\tau}. We accept the addition or removal of a bounding box with an acceptance probability of min(1,αadd)\min(1,\alpha_{\textrm{add}}) or min(1,αdel)\min(1,\alpha_{\textrm{del}}) respectively, where

where λ=(1+λ)2\lambda_{*}=(1+\lambda)^{2}, PKτ(Rn1,n2)=P(Rn1,n2Kτ,{ωk,sk(1:D),lk(1:D)}k,ξn1,ηn2)P_{K_{\tau}}(R_{n_{1},n_{2}})=P(R_{n_{1},n_{2}}|K_{\tau},\{\omega_{k},s_{k}^{(1:D)},l_{k}^{(1:D)}\}_{k},\xi_{n_{1}},\eta_{n_{2}}) and P0=12P_{0}=\frac{1}{2} (or 1P01-P_{0}) denotes the probability of proposing to add (or remove) a bounding box.

For the kk-th box, k{1,,Kτ}k\in\{1,\cdots,K_{\tau}\}, a new ωk\omega_{k}^{*} is sampled from the proposal distribution \mboxExp(1)\mbox{Exp}(1). We then accept ωk\omega_{k}^{*} with probability min(1,α)\min(1,\alpha), where

This update is the same as for the RBP-regression tree sampler (Section 4.2).

We propose new ξn1,ηn2\xi_{n_{1}},\eta_{n_{2}} values from the uniform prior distribution. Thus, the acceptance probability is purely based on the likelihoods of the proposed and current configurations.

3 Experiments

We empirically test the RBP relational model (RBP-RM) for link prediction. We compare the RBP-RM with four state-of-the-art relational models: (1) IRM (regular grids); (2) LFRM (plaid grids); (3) MP relational model (MP-RM) (hierarchical kkd-tree); (4) BSP-tree relational model (BSP-RM) ; (5) Matrix tile analysis relational model (MTA-RM) (noncontiguous tiles). For inference on the IRM and LFRM, we adopt collapsed Gibbs sampling algorithms, for MP-RM we use reversible-jump MCMC , for BSP-RM we use particle MCMC, and for MTA-RM we implement the iterative conditional modes algorithm used in .

Data Sets: Five social network data sets are used: Digg, Flickr , Gplus , Facebook and Twitter . We extract a subset of nodes (the top 1000 active nodes based on their interactions with others) from each data set for constructing the relational data matrix.

Experimental Setting: The hyper-parameters for each method are set as follows: In IRM, we let the concentration parameter α\alpha be sampled from a gamma Γ(1,1)\Gamma(1,1) prior, and the row and column partitions be sampled from two independent Dirichlet processes; In LFRM, we let α\alpha be sampled from a gamma Γ(2,1)\Gamma(2,1) prior. As the budget parameter for MP-RM and BSP-RM is hard to sample , we set it to 33, implying that around (3+1)×(3+1)(3+1)\times(3+1) blocks would be generated. For the parametric model MTA-RM, we simply set the number of tiles to 16; In RBP-RM, we set λ=0.99\lambda=0.99 and τ=3\tau=3, which leads to an expectation of 1212 boxes. The reported performance is averaged over 10 randomly selected hold-out test sets (Train:Test = 9:1).

Results: Table 2 reports the link prediction performance comparisons for each method and datasets. We see that the RBP-RM achieves competitive performance against the other methods. Even on the data sets it does not obtain the best score, its performance is comparable to the best. The overall results validate that the RBP-RM is effective in relational modelling due to its flexible and parsimonious construction, attaching bounding boxes to dense data regions.

Figure 4 visually illustrates the RBP-RM partitions patterns for each dataset. As is apparent, the bounding-based RBP-RM method indeed describing dense regions of relational data matrices with relatively few bounding boxes (i.e. parameters). An interesting observation from this partition format, is that the overlapping bounding boxes are very useful for describing inter-community interactions (e.g. overlapping bounding boxes in Digg, Flickr, and Gplus) and community-in-community interactions (e.g. upper-right corner in Flickr and Gplus). Thus, in addition to competitive and parsimonious performance, the RBP-RM also produces intuitively interpretable and meaningful partitions (Figure 4).

Conclusion

We have introduced a novel and parsimonious stochastic partition process – the Rectangular Bounding Process (RBP). Instead of the typical cutting-based strategy of existing partition models, we adopt a bounding-based strategy to attach rectangular bounding boxes to model dense data regions in a multi-dimensional space, which is able to avoid unnecessary dissections in sparse data regions. The RBP was demonstrated in two applications: regression trees and relational modelling. The experimental results on these two applications validate the merit of the RBP, that is, competitive performance against existing state-of-the-art methods, while using fewer bounding boxes (i.e. fewer parameters).

Acknowledgements

Xuhui Fan and Scott A. Sisson are supported by the Australian Research Council through the Australian Centre of Excellence in Mathematical and Statistical Frontiers (ACEMS, CE140100049), and Scott A. Sisson through the Discovery Project Scheme (DP160102544). Bin Li is supported by Fudan University Startup Research Grant and Shanghai Municipal Science & Technology Commission (16JC1420401).

Appendix A Calculation of the expected side length in Proposition 1

where the first term after =(1)\overset{(1)}{=} refers to the expected length of the interval when starting at and the second term refers to the expected length when starting from a point larger than . We need to use the equality d[(1λ2xλ)eλx]dx=xeλx\frac{d\left[(-\frac{1}{\lambda^{2}}-\frac{x}{\lambda})e^{-\lambda x}\right]}{dx}=xe^{-\lambda x} in the above derivation.

Appendix B Proof for Proposition 2 (Coverage Probability)

If x(d)x^{(d)} locates on the initial boundary, u(d)(0)=1u^{(d)}(0)=1 i.f.f. s(d)=0s^{(d)}=0, which is

If x(d)>0x^{(d)}>0, we have the corresponding probability as

Appendix C Proof for Proposition 3 (Self-Consistency)

According to the definition of Poisson process, the bounding boxes sampled from \mboxRBP(Y,τ,λ)\mbox{RBP}(Y,\tau,\lambda) (or \mboxRBP(X,τ,λ)\mbox{RBP}(X,\tau,\lambda)) follow a homogeneous Poisson process with intensity d(1+λLY(d))\prod_{d}(1+\lambda L_{Y}^{(d)}) (or d(1+λLX(d))\prod_{d}(1+\lambda L_{X}^{(d)})). Given the same budget τ\tau, the result holds if we can prove the following equality of the two Poisson process intensities

Due to the independence of dimensions, P(πY,X(Y))P\left({\pi_{Y,X}(\Box^{Y})}\neq\emptyset\right) can be rewritten as

where we use πY,X(uY(d))>0|\pi_{Y,X}(u_{Y}^{(d)})|>0 to denote the case that the step function uY(d)u_{Y}^{(d)} would take value “1” in an interval of the dd-th dimension of domain XX.

W.l.o.g, we assume that the two domains, XX and YY, have the same shape apart from the dd^{\prime}-th dimension where the length of dimension LY(d)L_{Y}^{(d^{\prime})} in YY is larger than that of in XX.

There are three cases to consider: (1) XX and YY share the terminal boundary in the dd^{\prime}-th dimension; (2) XX and YY share the initial boundary in the dd^{\prime}-th dimension; (3) XX and YY do not share the boundaries in the dd^{\prime}-th dimension. Proving case (1) and case (2) together would obviously lead to case (3). In either case, by independence of dimensions, we need to prove the following equation.

In case (1) where XX and YY share the terminal boundary in the dd^{\prime}th dimension, we have

where the first term after =(2)\overset{(2)}{=} corresponds to the probability that sY(d)s_{Y}^{(d^{\prime})} locates directly in the interval of (LY(d)LX(d),LY(d)](L_{Y}^{(d^{\prime})}-L_{X}^{(d^{\prime})},L_{Y}^{(d^{\prime})}], the second term corresponds to the probability that sY(d)s_{Y}^{(d^{\prime})} locates on the initial boundary and has the length larger than LY(d)LX(d)L_{Y}^{(d^{\prime})}-L_{X}^{(d^{\prime})}, and the third term corresponds to the probability that sY(d)s_{Y}^{(d^{\prime})} locates in the interval of (0,LY(d)LX(d)](0,L_{Y}^{(d^{\prime})}-L_{X}^{(d^{\prime})}] (excluding the initial boundary) and has the length larger than LY(d)LX(d)sY(d)L_{Y}^{(d^{\prime})}-L_{X}^{(d^{\prime})}-s_{Y}^{(d^{\prime})}.

In case (2) where XX and YY share the initial boundary in the dd^{\prime}th dimension, we have

since πY,X(uY(d))>0|\pi_{Y,X}(u^{(d^{\prime})}_{Y})|>0 requires the condition of sY(d)[0,LX(d)]s_{Y}^{(d^{\prime})}\in[0,L_{X}^{(d^{\prime})}] because sY(d)[0,LX(d)]s_{Y}^{(d^{\prime})}\notin[0,L_{X}^{(d^{\prime})}] would lead to the result that πY,X(uY(d))=0\pi_{Y,X}(u^{(d^{\prime})}_{Y})=0.

The conclusion can be derived as above. ∎

Because of the same Poisson process intensity Eq. (10), the following equality also holds

Proposition 3.2: The position distribution of a bounding box is self-consistent

W.l.o.g, we assume that the two domains, XX and YY, have the same shape apart from the dd^{\prime}th dimension where YY has additional space of length LL^{\prime} (L>0L^{\prime}>0) (the general case follows by induction). For dimensions ddd\neq d^{\prime}, it is obvious that the law of bounding boxes are consistent under projection because the projection is the identity. Given the same budget τ\tau, The result holds if we can prove the following equality

There are two cases to consider: (1) XX and YY share the initial boundary in the dd^{\prime}th dimension; (2) XX and YY share the terminal boundary in the dd^{\prime}th dimension. In each case, there are two cases (denoted as A & B in the following) regarding whether the terminal/initial (for Case 1/2, respectively) position locates on the boundary of XX. In total we have four cases to discuss as follows.

In case (1) where XX and YY share the initial boundary, according to Eq. (14), we have P(πY,X(uY(d))>0)=1+λLX(d)1+λLY(d)P\left(|\pi_{Y,X}(u_{Y}^{(d)})|>0\right)=\frac{1+\lambda L_{X}^{(d^{\prime})}}{1+\lambda L_{Y}^{(d^{\prime})}}.

For convenience of notation, we let \theta_{\dagger}=P^{Y}_{s}(s^{(d^{\prime})}_{Y}\bigm{|}s^{(d^{\prime})}_{Y}\in X), specifically θ=11+λLX(d)\theta_{\dagger}=\frac{1}{1+\lambda L^{(d^{\prime})}_{X}} if sY(d)=0s^{(d^{\prime})}_{Y}=0; θ=λ1+λLX(d)\theta_{\dagger}=\frac{\lambda}{1+\lambda L^{(d^{\prime})}_{X}} if sY(d)>0s^{(d^{\prime})}_{Y}>0.

[Case 1.A] For 0<sX(d)+lX(d)<LX(d)<LY(d)0<s^{(d^{\prime})}_{X}+l^{(d^{\prime})}_{X}<L_{X}^{(d^{\prime})}<L_{Y}^{(d^{\prime})},

[Case 1.B] For 0<sX(d)+lX(d)=LX(d)<LY(d)0<s^{(d^{\prime})}_{X}+l^{(d^{\prime})}_{X}=L_{X}^{(d^{\prime})}<L_{Y}^{(d^{\prime})},

In case (2) where XX and YY share the terminal boundary, according to Eq. (C), we have P(πY,X(uY(d))>0)=1+λLX(d)1+λLY(d)P\left(|\pi_{Y,X}(u_{Y}^{(d)})|>0\right)=\frac{1+\lambda L_{X}^{(d^{\prime})}}{1+\lambda L_{Y}^{(d^{\prime})}}.

[Case 2.A] For πY,X(sY(d))=0\pi_{Y,X}(s^{(d^{\prime})}_{Y})=0, we have sX(d)=0s^{(d^{\prime})}_{X}=0. What is more, we have

where θ=eλlX(d)\theta_{\ddagger}=e^{-\lambda l_{X}^{(d^{\prime})}} if πY,X(sY(d))+lX(d)=LX(d)\pi_{Y,X}(s^{(d^{\prime})}_{Y})+l^{(d^{\prime})}_{X}=L_{X}^{(d^{\prime})}; θ=λeλlX(d)\theta_{\ddagger}=\lambda e^{-\lambda l_{X}^{(d^{\prime})}} if πY,X(sY(d))+lX(d)<LX(d)\pi_{Y,X}(s^{(d^{\prime})}_{Y})+l^{(d^{\prime})}_{X}<L_{X}^{(d^{\prime})}.

[Case 2.B] For πY,X(sY(d))>0\pi_{Y,X}(s^{(d^{\prime})}_{Y})>0, we have sX(d)>0s^{(d^{\prime})}_{X}>0,

Consider all DD dimensions, for each case, we have P^{Y}_{\Box}\left(\pi_{Y,X}^{-1}(\Box^{X})\bigm{|}{\pi_{Y,X}(\Box^{Y})}\neq\emptyset\right)=P^{X}_{\Box}(\Box^{X}). ∎

Proposition 3.3: RBP is self-consistent

We can obtain Eq. (23) from Eq. (22) because

We can obtain Eq. (24) from Eq. (23) because of independence of bounding boxes. Eq. (25) is derived from Eq. (24) by applying Proposition 3.1 while Eq. (C) is derived from Eq. (25) by applying Proposition 3.2.

References