A Unified Framework for Approximating and Clustering Data
Dan Feldman, Michael Langberg
Introduction
In this work, we present a unified framework for the efficient construction of coresets for clustering problems corresponding to a given function set . Our coresets are obtained via a new and natural reduction to the well studied notion of -approximation from the theory of VC dimension [VC71]. The reduction from coresets to -approximations allows our framework to rely only on the combinatorial complexity of the input family of functions (i.e., the combinatorial complexity of the clustering problem at hand), and to use the vast literature on -approximation to obtain improved results (that are at times deterministic). For several function families for which coresets are known not to exist, or the corresponding (approximate) optimization problems are hard, our framework yields bicriteria approximation, or coresets that are large, but contained in a low-dimensional space.
In the body of the paper, we give an overview of the contributions of our work. We start by presenting, in Section 2, several concrete results that follow from our algorithmic paradigm, including a detailed comparison with corresponding previous work. We then present the main proof techniques and conceptual novelties in our approach in Section 3. Finally, in Section 4, we present a detailed overview of our algorithms for the construction of corestes and bicriteria approximation. The above discussion will take up the body of this extended abstract. All of the technical details of our results appear in the (self contained) appendix. A first application of our framework (for HD-image processing) already appeared in [FFS11].
Concrete Contributions
All the algorithms that are described in this section are randomized, and succeed with probability at least (or any other constant approaching ).
The construction time of the strong and weak coresets is . All our coresets and running times below are generalized to sum of distances to the power of , after replacing the term in the corresponding results by .
2 k𝑘k-Median and its generalizations
3 k𝑘k-Line median and its generalizations
4 Subspace approximation
For the case , Boutsidis et al. [BDMI11] provide randomized and deterministic CUR decompositions using columns. They also provide an updated reference for this long line of research. Mahoney and Drineas suggested a randomized algorithm that yields a -approximation for the case [MD09].
5 Projective clustering
Note that this result does not have the reduction property of weak coresets as defined in the beginning of this section. That is, even if we have an algorithm that computes the optimal set of -subspaces for any given set of points, it is not clear how to use it with in order to have a more efficient solution for the original problem. Similarly, it seems that this result can not be generalized for the streaming model when the subspace needs to be computed for a stream of points using less than space.
For these problems (where ), we suggest strong, weak, and streaming coresets contained in low-dimensional subspaces, and therefore take sub-linear space. Our coresets, referred to as -coresets, were described in Section 2.1, and are used as the first step for the construction of all the coresets presented in this section (including when or ).
Novelties in proof techniques
As specified in Section 2, our unified framework yields a number of improved results in the context of approximate clustering and shape fitting. In what follows, we briefly touch on the major new ideas used in our algorithms allowing theses improved results. Reduction to -approximation: The main reason that our framework is able to address a spectrum of clustering and approximation problems lies in our reduction from the inconsistent definition of coresets to the notion of -approximation. Using this reduction we can: (i) use a common ground in our analysis, thus removing the specialized (and sometimes tedious) analysis of the required sampling sizes used in many of the related works mentioned in Section 2. (ii) use smaller sample sizes that improve on those obtained in previous works, due to recent results taken from the context of Machine Learning [LLS00]. (iii) apply numerous results from the field of Computational Geometry, dated back to [HW86], regarding the study of VC-dimension and -approximations. For example: deterministic constructions [Mat95], for convex shapes (which have unbounded VC-dimension) [CEG+95], and in the streaming model [BCEG07].
Our reduction includes multiple stages and uses the new notions of robust approximation and robust corests as intermediate points. We elaborate on our reduction to -approximation (including our new notions) in the upcoming Section 4 which addresses a detailed overview of our framework. Functional representation of data elements and coresets: To study coresets over a wide range of objectives, we present an abstract framework in which the data points are considered as functions. Namely, for a center , the value represents the cost of clustering the data element corresponding to with . This representation is not superficial, and is in a sense crucial, as in our setting the coresets we construct are no longer “data elements” (as is common in the literature) but rather functions as well. Indeed, in some cases, our coresets will correspond to a subset of data elements, and thus their representation by functions will have no special meaning. However, in several cases the coreset consists of a small set of functions, that are closely related to the original data functions, however differ in certain behaviors.
For example, several of our coresets use functions corresponding to the data functions such that only if is smaller than a certain threshold; otherwise will be neglected and equal to zero. Another example includes the use of functions that correspond fully to data elements , but appear in the coreset as having negative weight. We extend and generalize results from [FMSW10] that had such properties. However, unlike in [FMSW10], a PTAS for the optimization problem can be computed from the coresets without using the original data.
One may argue that this skewed succinct representation of the original data violates the traditional line of thought in which a coreset consists of a subset of “real” data elements, and thus in many cases we make an effort in finding such “standard” coresets. However, when considering the computational objective in the construction of coresets, namely a tool to allow the efficient approximation of clustering problems, our notion of coresets plays a role equivalent to that of standard coresets. The flexibility in allowing our coresets to deviate from standard conception is a key point in our ability to obtain improved results. Generalized range spaces: In the vast literature on clustering, the notion of coresets is defined in several ways. Two common definitions include strong and weak coresets, which roughly speaking, address the combinatorial and computational aspects of clustering respectively. Namely, strong coresets require a similar behavior when compared to the data set for every set of centers, while weak coresets require “just enough” so that the coreset can be used in the design of efficient algorithms for approximate clustering.
In this work we unify the study of weak coresets that was used recently in [AHPV05, FMS07, FMSW10] with older results related to -approximation [CF90], called -frames. As our work reduces the study of coresets to that of -approximation in certain range spaces, this unification is captured by the development of a new notion: a generalized range space and a corresponding generalized dimension.
More specifically, in the standard study of range spaces, an -approximation captures the propertied of the original space with respect to any range in the space. This intuitively corresponds to the study of strong coresets. For the (more delicate) study of weak coresets, we enhance the standard definition of a range space, to obtain a generalized definition and theory. In our generalized view, an -approximation captures the propertied of the original space with respect to a subset of predetermined ranges in the space (and not necessarily all of the ranges). Choosing the predefined subsets carefully, one may capture the essence of weak coresets. The study of generalized range spaces enables us to use the same algorithms in our constructions of coresets, whether weak or strong, where the difference in the obtained results (in size and running time) is now easily traced back to the notion of the generalized dimension of the range space at hand.
Framework overview
We now review the concept of -approximations and -coresets followed by a detailed overview of our general framework.
For a multi-set of non-negative functions on a set , we say that is an -approximation for , if for every every and we have
where . For a set of non-negative functions on a set , we say that is an -coreset for , if for every we have
For each , let be defined as . Let consists of copies of , and let be an -approximation of the set . Then is an -coreset for . That is, for every ,
2 Bicriteria approximation
The first part of our framework yields a general paradigm for bicriteria approximations, that essentially reduces the task at hand to that of -approximations from the theory of Machine/PAC Learning and VC dimension [VC71, HW86]. Roughly speaking our reduction includes three steps. In the first step, we determine the combinatorial complexity of the clustering problem at hand by defining a corresponding generalized range space and studying its generalized VC-dimension (we elaborate on these notions shortly). We then show that an -approximation to the corresponding range space, yields a relaxed notion of bicriteria clustering we refer to as a robust median. Finally, we show how to use these robust medians in able to obtain a bicriteria solution. An outline of our framework follows. Generalized VC dimension: Given the clustering problem at hand (i.e., the function family ), one starts by defining a corresponding range space and by studying its combinatorial complexity (i.e., dimension).
Let be a finite set of functions from a set to . The dimension of is the dimension of the range space \big{(}F,\mathbf{ranges}(F)\big{)}, where is the range space of , that is defined as follows. For every and , let . Let the set be defined as . The dimension of is the minimum such that
To allow the unified study of both strong and weak coresets, we enhance the definition above to that of a generalized range space. In a generalized range space corresponding to , for every subset of functions one defines a corresponding subset of important ranges . In our context of clustering, the set will be defined by a subset of centers that are guaranteed to include a good center to be used in the clustering of . More precisely:
Let be a finite set of functions from a set to . Let be a function that maps every subset to a set of items . The pair is called a generalized function space, if for any it holds that . The dimension of is the smallest integer , such that
where .
For a generalized function space , we now seek small subsets that are -approximations to the range space . Loosely speaking, such sets will approximate the function set with respect to the centers in that are (by definition) of “importance” to the approximation of . Combining this with a proof that centers that approximate also approximate , will yield the weak coresets we desire. Notice that in the above definition we have required the function to be monotone. This allows us to obtain the following (immediate) connection between random sampling and -approximation (e.g., via [LLS01]).
Let be a function space of dimension from to . Let . Let be a sample of i.i.d functions from , where is a sufficiently large constant. Then, with probability at least , is an -approximation of the range space .
Let be a set of functions from a set to . Let , and . For every , let denote the \big{\lceil}\gamma n\big{\rceil} functions with the smallest value . Let , and let be the set of the functions with smallest value . The set is called a -median of , if and
Notice that a set of centers which are a -median are (by definition) an bicriteria approximation. Thus, one is interested in finding good robust medians for . We show that this is possible via -approximations to the function space . In the lemma below we use . We note that a similar lemma, for general , also holds, and appears in the appendix.
Let be a function space of dimension . Let , , , . Let be a random sample of i.i.d functions from , where is a sufficiently large constant. Suppose that is a -median of , and that . Then, with probability at least , is a -median of .
Once the connection between -approximation and robust medians is established, one can find robust medians for via an exhaustive (or sometimes more efficient) algorithm that addresses the -approximation . From robust medians to bicriteria. We are now ready to present our algorithm for bicriteria approximation. Before presenting our algorithm, we note that although an -bicriteria approximation is precisely a -median, we cannot use Lemma 4.6 above to obtain a bicriteria solution (as in Lemma 4.6, and there is a slackness in the reduction w.r.t. ).
Our algorithm Bicriteria for bicriteria approximation appears in Figure 1. The algorithm receives the function family and parameters and outputs a subset of centers of size logarithmic (in ) that act as a bicriteria approximation to the median problem on . The main recursive call for “-median” in Bicriteria is to the computation of a -median for which is essentially done via the connection to -approximation specified above. Namely, to compute a -median for the function set (defined in the algorithm), we take a random sample of , find a corresponding robust median for , and return it as a robust median for . Our main theorem in the context of bicriteria approximation follows.
is the time it takes to compute a -median for a set .
is the time it takes to compute an bicriteria for a set of size .
The size and running time are specified in Theorem 4.7 in an abstract manner as a function of , , , RobustMedian, ExhaustiveBicriteria, and implicitly - the generalized VC dimension of the function space . In Section 2, we presented some concrete examples in which the size and running time specified in Theorem 4.7 are computed for specific well studied clustering problems. More examples appear in the appendix of this work. As we show, our framework improves upon previously best known results.
3 From bicriteria to coresets
Once one has established an bicriteria approximation for the clustering problem at hand, we present a paradigm for obtaining coresets (both strong and weak as defined in Section 2).
This general algorithmic paradigm in itself is the basis of several coreset constructions that have been recently suggested, e.g., [Che06, FMSW10, FMS07, LS10]. However, the main novelty in our algorithm is in its second step, which essentially adds the bicriteria centers as additional elements in the coreset. Adding the bicriteria centers to the coreset, combined with a delicate weighting mechanism (that may assign negative weights), enables the proof of the following theorem. In what follows, we assume is an bicriteria approximation. This can be obtained from previous works (e.g., [Che06]) or by the use of our framework in an enhanced version of Theorem 4.7 (details appear in the appendix).
The main idea governing the proofs of Theorems 4.8 and 4.9 lies in the fact the the random sample of algorithm -Median-Coreset is an -approximation to (a slightly modified version of) the function family corresponding to -median clustering of . To obtain our succinct setting for , we perform a delicate analysis which determines the weights , and specified in -Median-Coreset. In the case of -median clustering, our coresets consist of points in the data set (as common in the study of coresets for approximate clustering). In the coresets to come, this will no longer be the case, and the functional representation of our data will be central.
However, as the reader may have noticed, the size of our coreset is larger than the set we started with, so where is the gain? The gain is in the structure of the coreset compared to the data set : it is (essentially) the union of a small set with a set that lies in a low dimensional space. Specifically, can be partitioned to sets, each consisting of points on a single line (from ). Thus, if is small (and using Theorem 4.7 it is logarithmic), we have conceptually reduced the problem of finding a coreset for to that of finding a coreset for , which can now be done via its specialized structure (e.g., via [FFS06]). The following theorem summarizes the quality of the resulting algorithm, which (a) first runs Metric-B-Coreset to obtain corresponding to and , (b) then uses [FFS06] and a few additional ideas to find a small set of points that are a good approximation to (including a corresponding weight function), and (c) returns a succinct function set corresponding to and .
The general setting: We now address the general setting in which we are given a general function family . As in the previous case, our algorithm first finds a -coreset, and only then may try to utilize the nature of the -coreset to obtain a standard coreset. Our algorithm B-Coreset for finding the -coreset is presented in Figure 4 and is phrased in an abstract manner that captures the previously defined coreset algorithms Metric-B-Coreset and -Median-Coreset.
We now turn to discuss the set returned as output by B-Coreset. Notice, that there is no use of random sampling in algorithm B-Coreset. Instead, to construct the set we use the more general notion of -approximation, again on a weighted and threshold defined variant of . To be precise, we could have used the notion of -approximation in the previously defined coreset algorithms as well, but instead represented them in terms of random sampling for ease of presentation.
All in all, algorithm B-Coreset returns two sets, the function set that corresponds to a threshold version of (which intuitively corresponds to a projected version of onto a given bicriteria solution), and the function set which corresponds to a small sized -approximation to (a threshold and weighted version) of the family . Our main theorem in the this general setting is now:
Some remarks are in place. Primarily, our presentation of Theorem 4.11 is very general and involves several parameters and function sets. From this presentation, both the the size and quality of our coreset is hard to decipher. The abstract nature of Theorem 4.11 allows us to apply it on several function families . In Section 2 we have presented a number of concrete algorithmic applications. These applications are proven in detail in the appendix.
Secondly, as discussed in Section 3, the output of algorithm B-Coreset is a new set of functions that may not be a subset of . Indeed, this is the case, however we stress that the set is essentially a subset of which differs only by our weights and threshold cut-off . Moreover, the function set and thus the set will be a set of functions that are typically easy to compute from a bicriteria of . As we have shown, in certain cases, such as the -median problem discussed previously, we are able to slightly modify our algorithm so that it returns a set of points as the desired coreset and not a function set that may have cut-off thresholds.
Acknowledgment
We wish to thank Christos Boutsidis, Michael Mahoney and Leonard Schulman for helpful discussions on this paper.
References
Appendix 5 Road map
The body of this extended abstract holds a detailed discussion of our results, without elaborating on the rigorous technical content. In this self contained appendix, we present the complete definitions and proofs of all our claims discussed in the body of this work. The appendix is organized as follows.
In Section 6, we review the notion of approximation for range spaces and define and analyze the new notion of -approximations for function families.
In Section 7, we define and analyze the notion of generalized range spaces and generalized dimension, including the connection between these notions and the classical notions of Section 6.
In Section 8, we show a connection between -approximations and a new relaxed notion of coresets we refer to as robust coresets.
In Section 9, we further study the notion of robust coresets and link them with the notion of a robust median discussed in the body of the paper. This connection ties the notion of robust medians with that of -approximations.
In Section 10 we define the notion of a centroid set to be used in the sections to come.
In Section 11 we tie the notion of robust coresets with that of bi-criteria approximation, a connection discussed in the body of this work.
In Section 12, we use the analysis of previous sections to obtain concrete results on the bicriteria approximation of several clustering problems, some of which were discusses in Section 2 in the body of the paper.
In Section 17 we study the -line median problem, and prove the results stated in Section 2.
In Section 18, we show how to apply our framework in order to construct (low-dimensional) -coresets and coresets for subspace approximation. We apologize to the reader, and note that we are currently still writing parts of this section, which will be uploaded to a future version on arXiv.
Appendix 6 ε𝜀\varepsilon-Approximations
In this section we will discuss the basic definitions of -approximation used throughout this work.
A range space is a pair where is a set, and is a set of subsets of . The dimension of the range space is the smallest integer , such that for every we have
The dimension of a range space relates (but is not equivalent) to a term known as the VC-dimension of a range space.
A set of functions is an -approximation of the range space , if for every we have
Usually , otherwise is called in the literature a weak -approximation.
The following well known theorem states that a random sampling from a set is also an -approximation of . See discussion in [HP09].
Let be a range space of dimension . Let . Let be a sample of
i.i.d items from , where is a sufficiently large constant. Then, with probability at least , is an -approximation of .
Let be a finite set of functions from a set to . The dimension of is the dimension of the range space \big{(}F,\mathbf{ranges}(F)\big{)}, where is the range space of , that is defined as follows. For every and , let . Let .
The following lemma follows directly from our definitions:
Let be a set of functions from to , and let . For every define a corresponding function such that , for every . Let be the union of these functions. Then
We now define the notion of an -approximation for a function set and tie it to an -approximation of the corresponding range space. This notion plays a central part in our work. Roughly speaking, an -approximation for a function set is a subset that approximates the average cost of ranges in the range space corresponding to . To allow invariance by constant multiplication, the quality of the approximation defined below is necessarily related to the parameter bounding the value of our functions in the range being considered.
Let be a set of functions from to , and let . An -approximation of is a set that satisfies
where .
We now show the connection between -approximations for range spaces and for function families.
Let be a set of functions from to , and let . Let be an -approximation of the range space of . Then is an -approximation of .
Proof. Let and . For every , let . Let denote the functions in , sorted by their value. Let , and . For every , , let . We define the partition of , where and, for ,
Let for every . Summing the last term of (3) over yields
Hence, summing (3) over yields
We now bound each term in the right hand side of the last equation. Using (5), we have
Since is an -approximation for , we have
Combining the last inequality in (10) bounds (8), as
Combining (9), (12) and the last inequality bounds the left hand side of (6), as
By plugging Theorem 6.3 in Theorem 6.8 we obtain the following corollary.
Let be a set of functions from to , and let . Let be a sample of
i.i.d items from , where is a sufficiently large constant. Then, with probability at least , is an -approximation of .
Appendix 7 ε𝜀\varepsilon-Approximations for High and Infinite Dimensional Spaces
Suppose that we have a range space of a high (maybe infinite) dimension . In this section we show that for several natural families of high dimensional range spaces, a small -approximation can be constructed that approximates (not all, but rather) a subset of the ranges in the range space. This weaker type of -approximation suffices to solve certain optimization problems in high dimensional space. Towards this end, we will define the notion of a generalized range space, the notion of a corresponding function space, and the notion of -approximation in this context. As before, these notions will play a major role in our analysis.
Let be a set. Let be a function that maps every subset to a set of subsets of . The pair is a generalized range space if for every two sets such that , we have . The dimension of a generalized range space is the smallest integer , such that
We now define the generalized dimension of a family of functions:
Let be a finite set of functions from a set to . Let be a function that maps every subset to a set of items . The pair is called a function space, if the pair is a generalized range space, where is defined as follows. For every and , let . For every , let . The dimension of the function space is the dimension of the generalized range space .
We note that it is not hard to verify that for it holds that . For a subset of , let be the function set which is defined by restricting the functions to inputs in . The following theorem is an immediate consequence of the proof in [LLS00] and can be seen as a corollary of Theorem 6.3.
Let be a function space of dimension from to . Let . Let be a sample of
i.i.d functions from , where is a sufficiently large constant. Then, with probability at least , is an -approximation of the range space .
The following is a simple corollary of Theorem 6.8 that connects between the notion of -approximation for range spaces and -approximation for function sets in the generalized setting.
Let be a function space of dimension . Let be an -approximation of the range space for some . Then is an -approximation of .
Using Corollary 7.4 with Theorem 7.3, we now conclude:
Let be a function space of dimension . Let , and let be a random sample of at least
i.i.d functions from , where is a sufficiently large constant. Then, with probability at least , is an -approximation of .
Appendix 8 From ε𝜀\varepsilon-approximations to (γ,ε)𝛾𝜀(\gamma,\varepsilon)-coresets
In this section we define and analyze the notion of -coresets: a relaxed notion of coresets (that we refer to as robust coresets) that we will use in our study of robust medians discussed in the Introduction. Roughly speaking, we show that -approximators for are also -coresets.
Let , and . Let and be two sets of functions from a set to . For every :
Let denote the \big{\lceil}\gamma|F|\big{\rceil} functions with the smallest value
Let denote the \big{\lceil}(1-\varepsilon)\gamma|S|\big{\rceil} functions with the smallest value
Let denote the \big{\lceil}(1-2\varepsilon)\gamma|F|\big{\rceil} functions with the smallest value
The set is -good for if
The set is a -coreset of if for every , and , we have that is -good for .
Our definition of robust coresets has the flavor of approximating with outliers. Namely, in our definition, we allow a portion of the functions in both and to be neglected when considering the quality of . In what follows, we show that an -approximation to a function set is also a robust coreset.
Let . Let be a set of functions from to , and let be an -approximation of the range space corresponding to . Suppose that . Let , and for every :
Let denote the functions with the smallest value
Let denote the functions with the smallest value
Proof. Let , and let be an -approximation to the range space corresponding to . By Theorem 6.8, is also an approximation to . Let denote the functions with the smallest value . Let , , and be defined as in the statement of the theorem. We will prove that
This suffices to prove the theorem for .
Indeed, for every and , we define . By our definitions,
Fix , and let , . We have
Let . Since , we have that
We now bound each of the terms in the right hand side of (18). Using the triangle inequality,
Combining the last two equations in (18) yields
By (15) we bound the first term in the right hand side of (19) by . Using (16) we bound the second term by . We thus obtain
We now bound the other terms in the right hand side of (19). By the definition of and , we have either , or (or both). Hence, or . By (16) we have
Using the last three equations and (17), we obtain
Since both and contain the functions with the smallest values , we have . Together with the previous equation, we obtain
Similarly, we bound the rightmost term in (19). As stated above, we have or . Using (21) with the last two inequations yields
where the last derivation follows from (17). We have . Together with the previous equation, we obtain
Combining (20), (22) and the last equation in (19) proves (14) as follows.
We are now ready to state the connection between -approximations and coresets.
Let , and . Let be a set of functions from a set to , and let be an -approximation of the range space corresponding to (and thus also of the function set ), such that . Then is a -coreset of .
Proof. Let and let be an -approximation of such that . We will prove that is -good for ; see Definition 8.1. By our definitions, is also an -approximation of , for every and . Hence, is -good for every and . This suffices to prove that is a -coreset by replacing with .
Indeed, let be the \big{\lceil}(1-6\varepsilon)\gamma|F|\big{\rceil} functions with the smallest value , and denote the \big{\lceil}(1-3\varepsilon)\gamma|S|\big{\rceil} functions with the smallest value . In order to prove that is -good for , we need to prove that
Fix , and let denote the \big{\lceil}\gamma(1-3\varepsilon)|F|\big{\rceil} functions with the smallest value . We first bound the right hand side of (23). By Theorem 8.2, we have
Since , we have
By the last equation and Markov’s inequality,
Let . Since , we have
Since is an -approximation of , we have
Since this theorem assumes , we have
Combining the last equation and (27) in (24) yields
Multiplying the last equation by bounds the right hand side of (23) as follows.
We now bound the left hand side of (23) in a similar way. Let denote the \big{\lceil}\gamma(1-6\varepsilon)|S|\big{\rceil} functions with the smallest value . Since , we have
By the last equation and Markov’s inequality,
Let . Since , we have . Since is an -approximation of , substituting in Definition 6.2 yields
That is, . Hence,
Since , we have
Combining the last three equations yields
Multiplying the last equation by yields
The last equation and (29) proves (23) as desired.
Using Theorems 6.3 and 8.3, we get the following corollary.
Let , and . Let be a set of functions from a set to . Let be a sample of at least
i.i.d functions from , where is a sufficiently large constant. Suppose . Then, with probability at least , is a -coreset of .
In this section we discuss the notion of robust medians stated in the Introduction and tie it to the notion of -coresets discussed in the last section. Roughly speaking, a robust median is a subset of points from that acts as a bi-criteria clustering of when considering outliers. More specifically, our robust medians will be parametrized by four parameters: and . The parameter (or to be precise ) will specify the fraction of outliers considered. The parameter is a slackness parameter crucial to the proof of our theorems to come. The parameter is the approximation ratio between the obtained clustering by and the optimal -median clustering. Finally, the parameter will denote the size of . In several cases, we will just take to be , and will remove the parameter from our notation.
Let be a set of functions from a set to . Let , and . For every , let denote the \big{\lceil}\gamma n\big{\rceil} functions with the smallest value . Let , and let be the set of the functions with smallest value . The set is called a -median of , if and
For simplicity of notation, a -median is a shorthand for a -median.
Let be a set of functions from to . In the previous section we proved that a small -coreset of can be constructed using algorithms that compute -approximation of . In particular, a random sample of is such a -coreset. In this section we prove that the -median of is also an -median of . In other words, if we have a (possibly inefficient) algorithm for computing the -median of a small coreset , then we can compute a similar median for the original set in time linear in .
Let be a set of functions from a set to . Let , . Suppose that is a -coreset of , and that . Let . Then a \big{(}(1-\varepsilon)\gamma,\varepsilon,\alpha)-median of is also a -median of .
Proof. For every , let denote the \big{\lceil}\gamma|F|\big{\rceil} functions with the smallest value .
Let be a \big{(}(1-\varepsilon)\gamma,\varepsilon,\alpha)-median for
Let denote the \big{\lceil}(1-4\varepsilon)\gamma|F|\big{\rceil} functions with the smallest value
Let denote the \big{\lceil}(1-3\varepsilon)\gamma|S|\big{\rceil} functions with the smallest value
Let denote the \big{\lceil}(1-\varepsilon)\gamma|S|\big{\rceil} functions with the smallest value
Since , we have . Using this, (31) and the fact that is a \big{(}(1-\varepsilon)\gamma,\varepsilon,\alpha\big{)}-median of , we have
Since is a -coreset of , it is -good for ; see Definition 8.1. By this, and since |G|\leq\big{\lceil}(1-2\varepsilon)\gamma|F|\big{\rceil}, and , we obtain
Since is a -coreset of , we have that
By (32) and the last two equations, we obtain
By the assumption of the theorem, we have , so . Hence,
Similarly, since ,
Hence, is a -median of as desired.
In the following (immediate) corollary, we use the same parameters as in Theorem 9.3.
Let be a set of size that contains a \big{(}(1-\varepsilon)\gamma,\varepsilon,\alpha)-median of . Then is a -median of .
Suppose that for a small subset from , we can compute a -median for . For , we showed in Lemma 9.3 that if is a robust coreset for then is a robust median for . Unfortunately, this does not hold for . However, if we use stronger assumptions on the set , the following theorem proves that is indeed a robust median in this case. More specifically, we will need to be an approximation to an enhanced version of the function set . The enhanced function set corresponding to is one which takes as input subsets (and naturally outputs the minimum evaluation over points in ). In a later section, will will use the theorem below to construct efficient bicriteria approximation algorithms from inefficient ones.
Let be an integer, , , and .
Let be a set of functions from to such that .
For every define as .
Let be a -coreset for , such that .
Let be a -median for .
Then is a -median for .
Proof. Let denote the functions with the smallest value . Let denote the functions with the smallest value . Since is a -coreset for , it is also -good for ; see Definition 8.1. Hence,
Since is a -coreset for , we have
Combining (34), (35), (36) and (37) yields
Since , we have
Since , we have
By plugging (40) and (39) in (38), we infer that
where in the last derivation we used the assumption of the theorem. This proves that is a -median of .
We conclude this section with a lemma (similar in nature to Theorem 9.3) that addresses generalized range spaces.
Let be a function space of dimension . Let , , , . Let be a random sample of
i.i.d functions from , where is a sufficiently large constant that is determined in the proof. Suppose that is a -median of , and that . Then, with probability at least , is a -median of .
Proof. Let be a -median of , and for all let . Notice that is a generalized range space as in Definition 7.2. The number of ranges in is larger by at most than the number of ranges in . Hence, . Hence, applying Theorem 6.3 and then Corollary 7.4 with large enough, we obtain that, with probability at least , is an -approximation of . Assume that this event indeed occurs. By Theorem 8.3, is also a -coreset of .
Since , we have that is a -median of . Using Theorem 9.3 with and , we obtain that is a -median of . Since , we infer that is a -median for .
In this section, we use the results of Section 8 to reduce the problem of computing the robust median for a set of points to easier problems on smaller (usually, of size independent of ) sets. We assume that sampling functions from uniformly can be done in time . Using Theorem 8.4, Theorem 9.3, and Corollary 9.4, we get the following corollary.
Let and . Let be a set of functions from to . Suppose that we have an algorithm that receives a set of size
and returns a set , that contains a \big{(}(1-\varepsilon)\gamma,\varepsilon,\alpha\big{)}-median of in time . Then a -median of can be computed, with probability at least , in time .
The reduction stated in the corollary above (approximately) preserves the quality of the median with respect to . In cases, it is useful to show a connection between medians for with and medians for which arbitrary . This point is addressed in the next corollary.
Let and . Let be a set of functions from a set to . Suppose that we have an algorithm that receives a set of size
and returns a -median of in time . Then a -median of can be computed, with probability at least , in time
Hence, is a for as desired.
We compute using exhaustive search over all possible subsets of size of . The proof now follows by applying Corollary 9.7 with .
Appendix 10 Centroid Sets
In this section we define and analyze the notion of a centroid set. Roughly speaking, a centroid set in a subset of the centers that includes a robust median for every subset . The notion of centroid sets will be later tied to that of weak coresets as outlined in the Introduction.
Recall that by Corollary 9.8, in order to compute a -median of for in time independent in , it suffices to compute a median for a small set in some finite time (even exponential in ).
Let be a set of functions from to . A -centroid set for is a set that contains as an element a -median of , for every . A -centroid set is a shorthand for a -centroid set.
We start with the following simple lemmas that follows directly by our definitions.
Let be a set of functions from to . Let be parameters. Then, for every two parameters a -median of is also a -median of .
Let be a set of non-negative functions, and . Then every -centroid set of is a -centroid set of .
Proof. Let be a -centroid set for . Let . We will show that includes a median for . Then using Lemma 10.2 and Definition 10.1, we can conclude our assertion. Let be a -median of . Let , and let denote the functions with the smallest value . By Definition 10.1 contains a -median for . Let denote the functions with the smallest value . Let denote the functions with the smallest value . Hence,
By denoting , and noting that , we have
where in the last deviation we used the assumption . By the previous equation and (41), we have that is a -median for . Using Lemma 10.2, is also a -median for . Since the proof holds for every , we conclude that is a -centroid set for .
For every -tuple , let
be a partition of into disjoint sets, each of size at most . Let . Then is a -centroid set of size for .
Proof. Let . Let be a -median for , and let be the corresponding functions in . Let be a partition of , such that for every . Fix , . Let be a -median for . Hence,
Let . Summing (42) over every yields
Hence, is a for . Since , we conclude that is a -centroid set for .
Let and be defined as in Lemma 10.4. Let , , . Let be a -centroid set for . Then there is which is a -median for .
Proof. Let be a -median for . Let denote the functions with the smallest value . Let . Let be a partition of , such that for every .
That is, is a -median for . Hence, is also a -median for .
Appendix 11 From (γ,ε,α,β)𝛾𝜀𝛼𝛽(\gamma,\varepsilon,\alpha,\beta)-medians to bicriteria approximations
Let be a set of functions from to . An -bicriteria approximation for is a -median of .
An algorithm that computes a robust-median for a given subset of ; see Definition 9.2
The second algorithm receives an input of size independent of , and thus can be inefficient. Algorithms for computing a robust-median of functions in time linear in are presented in Section 9.1.
Let be a set of functions from a set to , and let , . Let be the set that is returned by the algorithm ; see Fig. 5. Then is a -approximation for . That is, and
Let be the set that is returned by a call to the algorithm . We will prove that
We denote the functions in by , such that for every , where ties are broken arbitrarily. Let
During the first “while” iterations, an overall of functions were removed from . Hence,
By Lines 5 and 5 of the algorithm, we have
Let . Using the last three inequations, we obtain
Let, . We now prove that
and that for every integer such that , we have
Indeed, let be an integer such that , and assume . We have . Using the last equation and (46), we get
We have , where in the last deviation we use the assumption from the beginning of this proof. Hence,
Since , we have by Line 5 that . We thus have
Combining the last equation with (51) yields
We have , i.e, . Thus, substituting in (53) yields
which proves (49). If , we have by (53)
which contradicts the fact . Hence, the assumption implies . This proves (50).
We have . The set contains the functions with the smallest value . Hence, Equation (49) implies
Since , combining the previous equation in (54) yields
where in the last deviation we used (50). This proves (44) as desired.
In what follows we restate Theorem 4.7 and present its proof.
Let be a set of functions from a set to . Let , . Then a set of size can be computed such that, with probability at least ,
is the time it takes to compute, with probability at least , a -median for a set .
is the time it takes to compute a -median for a set of size .
Proof. We present a randomized implementation of the algorithm Bicriteria in Fig. 5. The implementation succeed with probability at least , and its running time is , as stated in the theorem. By Theorem 11.2, this proves the theorem.
Indeed, let denote the output of a call to Bicriteria. Put , . Suppose that we have an algorithm that computes, with probability at least , a -median for . Calling to in each of the times that Line 5 of the algorithm Bicriteria is executed, would yield an implementation for Bicriteria that succeeds with probability at least . However, in this implementation, we use that is dependent of .
The probability that is a -median of is at least the probability that one or more of the items contains a -median of . Hence, is a -median of with probability at least . By Theorem 11.2 there are at most iterations. Hence, the probability that the item would be a -median in the th iteration, for every , , is at least .
By the assumption of this theorem, Line 5 can be computed in time . We conclude the that the total running time of the above implementation for is as desired.
Appendix 12 Applications: Bicriteria for Projective Clustering
In this section we present several applications of the Theorems presented in Section 11 addressing bi-criteria approximation. Our applications are from the context of projective clustering. We consider several settings of parameters. For each setting we prove appropriate results. We start with some notation.
We start by showing how one can obtain an bi-criteria approximation in which the approximation ratio is rather large, and the resulting and running time are of size exponential in and . Our proof has the following structure.
and . Then
is of size , and can be computed in time.
is a -centroid set for .
Proof. (i) There are subsets of size at most of . For a fixed subset of points from , we use the QR decomposition in order to compute the flat that is spanned by them. This takes time. (ii) We prove the case . The case then follows from Lemma 6.5. Fix . For , let . Hence, . Therefore,
By our definitions, we obtain as desired.
(iii) Follows from Lemma 10.4 and Theorem 12.1.
Then, a -median for can be computed, with probability at least , in time .
Let . By applying Theorem 12.2 with , a -centroid set , , for can be computed in time . By applying Lemma 10.5 with , , , and there is a -median for . Applying Lemma 9.6 with the function space yields that with probability at least , is a -median of .
A -bicriteria approximation for can be computed, with probability at least , in time
Proof. By Lemma 12.3, a -median for a set can be computed, with probability at least , in time. Similarly, using and in the proof of Lemma 12.3, a -median for a set of size can be computed in time. The time it takes to compute the distance between a point to a set of -flats is . By applying Theorem 11.3 with , and , we infer that a -bicriteria approximation for can be computed, with probability at least , in time
3 α=1+ε𝛼1𝜀\alpha=1+\varepsilon, Small j𝑗j and k𝑘k
.
Given , can be computed in time.
A -centroid set for of size can be constructed in time.
We now present a technical lemma that we will use in our proofs to come.
Moreover, can be computed in time.
The following is a generalization of Theorem 12.2(i).
and . Then .
Since both and the flats of are contained in the -dimensional subspace , applying Lemma 12.6(i) with implies that . By definition of , we obtain
By (57), for every , and a set there is a corresponding distinct set: . Therefore,
Using the last equations with (58), we obtain
Taking the union over every possible choice of yields
By our definitions, we obtain as desired.
and . Then
is a (possibly infinite) -centroid set for .
Proof. (i) Follows from Lemma 12.8. (ii) Follows from Lemma 10.4 and Theorem 12.5.
The following centroid set that is constructed using the bound of Theorem 12.6 is similar to the larger and somewhat less general centroid set that is constructed in [DRVW06].
Moreover, , where is defined in Theorem 12.9.
Proof. Let be a random sample of i.i.d functions from , for some constant that will be determined later. Here, we assumed that . Otherwise, let . By Lemma 12.10, a -centroid set for can be computed in time, where
By Lemma 10.3, is also a -centroid set for . Using exhaustive search over , a -median of can be computed in time. Let be defined as in Theorem 12.9. By Theorem 12.9, is a -centroid set for , and . By Theorem 12.10, is contained in , so . By Theorem 9.6, for a large enough constant we have that, with probability at least , is a -median for .
Then a -bicriteria approximation for can be computed, with probability at least , in time
Proof. By applying Lemma 12.11 with and , a -median for a set can be computed, with probability at least , in time. For a set , , a -median of can be computed in time using exhaustive search on the centroid set in Lemma 12.10. By applying Theorem 11.3 with and , a -bicriteria approximation for can be computed, with probability at least , in time
4 α=1+ε𝛼1𝜀\alpha=1+\varepsilon, Large k𝑘k, Small j𝑗j
Then a -median for can be computed in time.
Proof. Let and . Let be a random sample of i.i.d functions from , for some constant . Here, we assumed that . Otherwise, let . Let . By applying Lemma 12.10 with , a -centroid set for , , can be computed in time. Applying Lemma 10.5 with , , yields that there is which is a -median for . Let . Applying Lemma 9.6 with the function space yields that with probability at least , is a -median of . Assume that this event indeed occurs.
and . Then a -bicriteria approximation for can be computed in time
Proof. Let . By applying Lemma 12.13 with and , a -median for a set can be computed, with probability at least , in time.
For a set , , a -centroid set for , , can be computed in time using Lemma 12.10. Applying Lemma 10.5 with , , and yields that there is which is a -median for . Hence, an arbitrary partition of to -tuples is a -median for that can be computed in time.
The time it takes to compute the distance between a point to a set of -flats is . By Theorem 11.3 a -bicriteria approximation for can thus be computed, with probability at least , in time
5 α=1+ε𝛼1𝜀\alpha=1+\varepsilon, Large j𝑗j and k𝑘k
i.i.d functions from , where is a sufficiently large constant that is determined in the proof. Let
Then, with probability at least , is a -median for .
Proof. Let be defined as in Theorem 12.9. By Theorem 12.9(ii), is a -centroid set for . Let and . By Lemma 10.3, is also a -centroid set for . Hence, there is a -median for . Since , we have that is a -median for .
For and , there is a -median for . By Theorems 12.9(i), we have . Using this with Theorem 9.6, we infer that there is a constant such that, with probability at least , is a -median for . Assume that this event indeed occurs. Since , we have that is a -median for .
Proof. By applying Lemma 12.15 with and , a -median of a set can be computed, with probability at least , such that all the -flats of are contained in an -flat. For a set of size , the span of (the points corresponding to) contains a -median of .
6 k𝑘k-Median in a Metric Space
A set of points can be computed in time such that, with probability at least ,
If , by applying Corollary 9.7 with and we can compute, with probability at least , a -median of in time . If , the set is a trivial median for .
Appendix 13 From bicriteria to B𝐵B-coresets
In this section we analyze the quality of the coresets obtained via algorithm B-Coreset (Figure 6). We present of analysis which will be used in sections to come when we derive results for specific clustering problems.
The first term in the right hand side is approximated by , up to an error of
Since is a -approximation of , by Lemma 6.8 we obtain
By Step 6 of our algorithm, for every , we have . By the assumption of the theorem, we thus obtain
Multiplying this equation by yields
Recall that . Together with the previous two inequalities, we obtain
We now present a few corollaries of Theorem 13.1 that will be used in the sections to come.
Let , , and be defined as in Theorem 13.1. Let . Suppose that for every and we have
Then for it holds that
Proof. Put . For every , we have
For every , we have
The Corollary follows by applying Theorem 13.1 using the last inequalities.
Let and be defined as in Theorem 13.1. Let and . Suppose that for all and for all it holds that
For every and assume and define
Then for it holds that
Proof. Put , , and . If , then using our definitions
Otherwise, . Thus , so, by (64), . Replacing with in Theorem 13.1 yields
For every and , let and . For every , let and . Then for it holds that:
By applying Theorem 13.1 with as , we infer that
In the above we use the fact that . We conclude that,
Appendix 14 From B-Coresets to Metric B-Coresets
We now turn to study algorithm B-Coreset when applied to functions corresponding to a metric space. Namely, we show an improved analysis when and the bi-criteria correspond to points in a given metric space. We will use the analysis stated in this section in deriving improved results for specific clustering problems.
Notice the close resemblance between the definition of in algorithm Metric-B-Coreset and the definition of . For every , we then define
Let and be two functions from a set to . Let , , , and . Let , , and suppose that
Proof. It suffices to prove that for , we have
By substituting and in (68), we obtain
Assume that . By taking the th root, we get . That is, . Using this with (66) yields
Combining the last two inequalities in (69) yields (67), as
where in the last two deviations we used the assumption .
for a function space . Then, with probability at least ,
Put . Let be the output of a call to where . Using (70), applying Corollary 13.3 yields
Let . By the construction of , we have that is a random sample of i.i.d functions from . By using a sufficiently large constant in Theorem 7.3, with probability at least , is thus an -approximation of .
Also, for defined in algorithm Metric-B-Coreset, notice that our definitions imply that
Here we use the fact that is defined in algorithm B-Coreset to take copies of each .
Suppose that was used in Line 4 of the above call to B-Coreset. Using the last equation and (72) with the construction of , yields
For every , we then define
for a function space where is a sufficiently large constant. For every , let
Then, with probability at least ,
Let be defined as , and . Let be the output of a call to ; see Fig. 6.
Let be the set that is defined in Line 6 of the above call to B-Coreset. Note that for every we have
We thus have , so . Let . By the construction of , we have that is a random sample of i.i.d functions from . By Theorem 7.3, with probability at least we have that is an -approximation of . Assume that this event indeed occurs, and suppose that was used in Line 4 of the above call to B-Coreset.
for a function space where is a sufficiently large constant. For every , let
Then, with probability at least ,
Let be defined as , and . Let be the output of a call to ; see Fig. 6.
Let be the set that is defined in Line 6 of the above call to B-Coreset. Note that for every we have
We thus have , so . Let . By the construction of , we have that is a random sample of i.i.d functions from . By Theorem 7.3, with probability at least we have that is an -approximation of . Assume that this event indeed occurs, and suppose that was used in Line 4 of the above call to B-Coreset.
Appendix 15 k𝑘k-Median in a Metric Space
We now present the results obtained by applying our framework on the -median problem in metric spaces. We start by presenting a constant factor approximation. We assume that the time to compute the distance between two points in the metric space is .
Fix . Using the triangle inequality,
Fix . Using the triangle inequality,
2 Strong Coresets for Metric k𝑘k-Median
The following is a generalization of Theorem (6.3), as appeared in [LLS00]. Although the original claim uses another definition of dimensionality (analogous to the VC-dimension), it can be easily verified that it also holds for our weaker definition of dimensionality.
We start by proving a technical lemma regarding the weights defined in algorithm -Median-Coreset, see Fig. 8.
Let be two finite sets of points in a metric space, and . Let be the constant from Theorem 15.2, and
Let be the pair that is returned from a call to the algorithm -Median-Coreset, see Fig. 8. Then, with probability at least , we have
Proof. Let and . Let be the sample that is constructed during the execution of Line 8 of the algorithm; see Fig. 8. Hence,
For every , define as
That is, \overline{s}(b)\leq\big{(}uv+\overline{f}(b)(1+u)\big{)}/(1-u). Since , we obtain
Since (by the definition of ), we obtain
For every , we have , so
Together with the fact that for every , we conclude that for every .
We are now ready to address strong coresets for metric -median.
where is a sufficiently large constant. Then a set , , with a weight function can be computed such that, with probability at least ,
The running time is .
Proof. By Theorem 15.1, a set of points can be computed in time such that, with probability at least ,
Consider the set of functions ; see Definition 14.4. Since , we have for the case . Using Lemma 6.5, for any .
Let be the output of a call to the algorithm -Median-Coreset. By Corollary 15.3, with probability at least , the weight function is non-negative. Assume that this event indeed occurs. Let be the output of a call to the algorithm Metric-B-Coreset. Since and have the same distribution, we assume w.l.o.g. that .
By Theorem 14.5, with probability at least ,
Combining the last inequality with (80) and (81) yields
Given , the set can be constructed in by taking multiple copies of each point and then use uniform random sampling; see Fig 6. By using (79) and a sufficiently large constant , this proves the theorem.
3 Strong Coreset for Metric k𝑘k-Means and Distances to the Power of z𝑧z
where is a sufficiently large constant. Then a set , , with a weight function can be computed such that, with probability at least ,
The running time is .
Proof. We construct a set and a weight function such that
with probability at least . Replacing and in the proof with and respectively, would then prove the theorem.
By Theorem 15.1, a set of points can be computed in time such that, with probability at least ,
if , and otherwise. Let . Let be the output of a call to , where for every ; see Fig. 6.
Let be the set that is defined in Line 6 of the above call to B-Coreset. Note that for every we have
We thus have , so . Let . By the construction of , we have that is a random sample of i.i.d functions from . By Theorem 7.3, with probability at least we have that is an -approximation of . Assume that this event indeed occurs, and suppose that was used in Line 4 of the above call to B-Coreset.
Using the last inequality in Lemma 14.2 yields
Summing (85) over yields
Using Corollary (15.3), we have that,In fact, we can use below and reduce the size of the resulting coreset if we are willing to have negative weights. In this case the term will be outside the parenthesis. If we want only positive weights, then the should be inside anyway. with probability at least , for every . In particular, by Line 8 of the algorithm -Median-Coreset (see Fig. 8), for every we have
Assume that the last inequality holds. Combining it with (87) yields
Combining the last inequality and (86) yields
Plugging (82) in the last inequality then proves the theorem.
Let , and . We have
We now bound and then .
where is a constant that depends only on , and equals to zero for all except terms of the summation. Equation (89) implies that there are two -dimensional vectors and , such that
where is a constant that depends only on and equals to zero for all except terms. Hence, there are two -dimensional vectors, and , such that
Suppose that . We now prove that
where the last deviation is by (93). By the last equation and (95),
Combining this with (95) yields . Using the last equation with (96) proves (94).
By (94), . Hence,
We now bound in a similar way. We have
Plugging the last equation and (97) in (88) yields
Since the last inequality holds for any , the dimension of is .
and let . Then .
Proof. We prove the lemma for the case . The case then follows from Lemma 6.5. Put . For every and , let
where is a constant that depends only on , and equals to zero for all except terms of the summation.
Equation (101) implies that there are two -dimensional vectors and , such that
Similarly, we can prove that there are two -dimensional vectors and , such that
and that there are two -dimensional vectors and .
By (105), . Hence,
Using the last equation with (106) yields
Since the last inequality holds for any , the dimension of is .
The construction time of is , where either one of the following holds:
and may be negative for some .
The proof is the same as the proof of Theorem 15.4, except for the computation of . In this case, we have instead of , as proved in Lemma 16.3.
Lemma 15.3 requires that for fixed and , and together with the bound on we need .
Appendix 17 k𝑘k-Line Median
Proof. Let . By Theorem 12.12, a set of ) lines that satisfies
can be computed, with probability at least , in time
Using the result from [FFS06], a set , , with a weight function can be constructed in time such that
Together with (107), (108) and (109) this proves the theorem as
Appendix 18 B𝐵B-Coresets for Projective Clustering
Let ,, , , and be defined as in Lemma 16.3. For every set and its corresponding set , let . Then .
Proof. Follows from the proof of Lemma 12.8 with , where the usage of Lemma 12.6(i) is replaced by Lemma 16.3. Notice that replaceing Lemma 12.6(i) by Lemma 16.3 adds a multiplicative factor of to the asserted dimension.
Proof. Put and . For each , let
and for each , let be defined as:
and be defined as:
Fix such that (110) holds, , , and let . We now prove that
Indeed, if then , so
By the last two inequalities and the assumption , we have
Together with the previous inequality and the assumptions , , and , we obtain
For every , let be defined as in Line 6 of a call to ; see Fig 6. Let , , and . By applying Lemma 18.2, we have . By its construction, is a random sample of i.i.d function from . By Theorem 7.5, with probability at least , is thus an -approximation of . Assume that this event indeed occurs, and let be the output of a call to using as an -approximation for in Line 6. By Theorem 13.1 we obtain We have
Combining the last two inequalities and (111) yields
We now prove that the right hand side of the last inequality is positive. By letting
Using (110), and the assumption of the lemma, we have
Combining the last inequality with (115) yields
By construction of , we have either (i) for some , or (ii) for some ; see Fig. 6. Let such that . In case (i), we have
In case (ii), for some . Hence, , and
We conclude that the lemma holds for both cases.
Our proof contains two conceptual steps. In the first step, we use Lemma 18.3 to iteratively prove the existence of a point for which
Combining the properties of , with the fact that is a coreset (via Theorem 14.3), will consist of the second part of our proof.
Notice that . If
then we are done, and have completed the first step of our proof (we set ).
Otherwise, we now present a procedure Improve, that for any integer , receives such that
and outputs . We show that iteratively applying Improve will result in the desired .
The procedure Improve returns which is the -tuple after replacing with . Notice that .
Suppose that we call to the procedure for until (119) does not hold. Fix and . We now prove that in at most calls of Improve the index was a “witness” that govern the construction of . Indeed, by contradiction assume that (120) holds for for the th time, . Applying (121) times yields
which contradicts the assumption that (120) holds.
Let be the output of the last call to Improve. Hence, (117) does not hold for , i.e,
By construction, every point in is spanned by at most points from . That is, . This concludes the first part of our proof.
By Theorem 14.3, with probability at least we have
which proves the theorem for a call to and a sufficiently large .
where is a sufficiently large constant. Then, a set of size , with a weight function , can be computed such that, with probability at least ,
Proof. By Theorem 15.1, a set of points can be computed in time such that, with probability at least ,
Assume that (125) indeed holds. Let be the output of a call to the algorithm -Median-Coreset
Consider the set of functions ; see Definition 14.4. For every , let . Using Lemma 18.2, we have . Similarly to the proof of Theorem 15.4, using the above definition of , we have with probability at least ,
The running time is . Let be a tuple of points that satisfies
as desired, for choosing a sufficiently large .
where is a sufficiently large constant. Then, a tuple of points can be computed in
time such that, with probability at least ,
Appendix 19 Subspace Approximation
points from . By applying Lemma 12.15 with , , and , the span of contains, with probability at least , a -median for . If then the span of trivially contains a -median for . Let an -dimensional subspace, and let be an matrix whose columns are mutually orthogonal unit vectors that span . The squared distance from a point to is then
The construction of from the set that spans takes time via SVD [GL96].
Using the observations from the previous paragraph we apply Theorem 11.3 with , , and to obtain a set , of -dimensional subspaces and a partition of such that, with probability at least ,
Since the last term is the bottleneck of our construction, we now suggest a construction which is faster for large values of .
Let denote a matrix whose columns are mutually orthogonal unit vectors that span the -subspace that is orthogonal to . Hence, the distance from to is . Let be a matrix whose entries are Gaussian unit vectors. Using the Johnson-Lindenstrauss lemma [DG03], we have, with probability at least
where the first inequality is by (126) and the fact that any subspace contains the origin.
For every , let be defined as in Line 6 of a call to ; see Fig 6. Let , and . Note that is a generalized range space; see Definition 7.2. By Theorem 12.9(i), we have . The number of ranges in is larger by at most than the number of ranges in . Hence, . See the proof of a similar argument in Lemma 9.6.
By its construction, is a random sample of i.i.d functions from . By Theorem 7.5, with probability at least , is thus an -approximation of . Assume that this event indeed occurs, and let be the output of such a call to using as an -approximation for in Line 6.
Put . By Corollary 13.2 and (131),
By the previous inequality and the construction of , we have
For every , , let denote an matrix whose set of rows is . The matrix can be constructed from and in time. Since has rank , there is a decomposition such that is an matrix whose columns are mutually orthogonal unit vectors, and is an matrix. and can be computed using the QR or SVD decomposition of in time. Hence, the overall time over all is .
By denoting as the Frobenius norm, we obtain
Let be an matrix whose rows are the union of rows in the matrices . Hence,
Let be the union of with the set of points which consists of the rows of . The size of is
Plugging (132) and (133) in (130) yields that for every ) we have
The constructing time of is dominated by (129).
where (137) and (19.1) holds by (135), inequality (138) is by (136), and inequality (139) is by the definition of . The overall running time is