Ranking with Fairness Constraints

L. Elisa Celis, Damian Straszak, Nisheeth K. Vishnoi

Introduction

Selecting and ranking a subset of data is a fundamental problem in information retrieval and at the core of ubiquitous applications including ordering search results such (e.g., Google), personalized social media feeds (e.g., Facebook, Twitter or Instagram), ecommerce websites (e.g., Amazon or eBay), and online media sites (e.g., Netflix or YouTube). The basic algorithmic problem that arises is as follows: There are $m$ items (e.g., webpages, images, or documents), and the goal is to output a list of $n\ll m$ items in the order that is most valuable to a given user or company. For each item $i\in[m]$ and a position $j\in[n]$ one is given a number $W_{ij}$ that captures the value that item $i$ contributes to the ranking if placed at position $j$ . These values can be tailored to a particular query or user and a significant effort has gone into developing models and mechanisms to learn these parameters [MRS+08]. In practice there are many ways one could arrive at $W_{ij}$ , each of which results in a slightly different metric for the value of a ranking – prevalent examples include versions of discounted cumulative gain (DCG) [JK02], Bradley-Terry [BT52] and Spearman’s rho [Spe04]. Note that for many of these metrics, one does not necessarily need $nm$ parameters to specify $W$ and typically $m$ “degrees of freedom” is enough (just specifying the “quality of each item”). Still, we choose to work with this general setting, and only abstract out the most important properties such a weight matrix $W$ satisfies. Generally, for such metrics, $W_{ij}$ is non-increasing in both $i$ and $j$ , and if we interpret $i_{1}<i_{2}$ to mean that $i_{1}$ has better quality than $i_{2}$ , then the value of the ranking can only increase by placing $i_{1}$ above $i_{2}$ in the ranking. Formally, such values satisfy the following property (known as monotonicity and the Monge condition)

for all $1\leq i_{1}<i_{2}\leq m$ and $1\leq j_{1}<j_{2}\leq n$ (see Appendix A). The ranking maximization problem is to find an assignment of the items to each of the $n$ positions such that the total value obtained is maximized. In this form, the problem is equivalent to finding the maximum weight matching in a complete $m\times n$ bipartite graph and has a well known solution – the Hungarian algorithm.

However, recent studies have shown that producing rankings in this manner can result in one type of content being overrepresented at the expense of another. This is a form of algorithmic bias and can lead to grave societal consequences – from search results that inadvertently promote stereotypes by over/under-representing sensitive attributes such as race and gender [KMM15, BCZ+16], to news feeds that can promote extremist ideology [CHRG16] and possibly even influence the results of elections [Bae16, BMA15]. For example, [ER15] demonstrated that by varying the ranking of a set of news articles the voting preferences of undecided voters can be manipulated. Towards ensuring that no type of content is overrepresented in the context of the ranking problem as defined above, we introduce the constrained ranking maximization problem that restricts allowable rankings to those in which no type of content dominates – i.e., to ensure the rankings are fair.

This problem is equivalent to finding a maximum weight matching of size $n$ that satisfies the given fairness constraints in a weighted complete $m\times n$ bipartite graph, and now becomes non-trivial – its complexity is the central object of study in this paper.

Beyond the fairness and ethical considerations, traditional diversification concerns in information retrieval such as query ambiguity (does “jaguar” refer to the car or the animal?) or user context (does the user want to see webpages, news articles, academic papers or images?) can also be cast in this framework. Towards this, a rich literature on diversifying rankings has emerged in information retrieval. On a high-level, several approaches redefine the objective function to incorporate a notion of diversity and leave the ranking maximization problem unconstrained. E.g., a common approach is to re-weight the $w_{ij}$ s to attempt to capture the amount of diversity item $i$ would introduce at position $k$ conditioned on the items that were placed at positions $1,\ldots,k-1$ (see [CG98, ZH08, CKC+08, ZCL03, ZMKL05]), or casting it directly as an (unconstrained) multi-objective optimization problem [YS17]. Alternate approaches mix together or aggregate different rankings, e.g., as generated by different interpretations of a query [RBCJ09, DKNS01]. Diversity has also been found to be desirable by users [CT17], and has been observed to arise inherently when the ranking is determined by user upvotes [CKK16] Despite these efforts and the fact that all major search engines now diversify their results, highly uniform content is often still displayed – e.g., certain image searches can display results that have almost entirely the same attributes [KMM15]. Further, [GS09] showed that no single diversification function can satisfy a set of natural axioms that one would want any fair ranking to have. In essence, there is a tension between relevance and fairness – if the $w_{ij}$ s for items that have a given property are much higher than the rest, the above approaches cannot correct for overrepresentation. Hence the reason to cast the problem as a constrained optimization problem: The objective is still determined by the values but the solution space is restricted by fairness constraints.

Our Model

We study the following constrained ranking maximization problem

We distinguish two important special cases of the problem: when only the upper-bound constraints are present, and when only the lower-bound constraints are present. These variants are referred to as the constrained ranking maximization problem (U) and the constrained ranking maximization problem (L) respectively, and to avoid confusion we sometimes add (LU) when talking about the general problem with both types of constraints present. Furthermore, most our results hold under the assumption that the weight function $W$ is monotone and satisfies the Monge property (1), whenever these assumptions are not necessary, we emphasize this fact by saying that general weights are allowed.

Our Results

In this section, we present an overview of our results. The statements of theorems here are informal for the ease of readability, formal statements are deferred to a later part.

of item $i$ be the set of properties that the item $i$ has. Our first result is an exact algorithm for solving the constrained ranking maximization problem whose running time is polynomial if the number of distinct $T_{i}$ s, denoted by $q$ , is constant; see Theorem 7.1.

There is an algorithm that solves the constrained ranking maximization problem (LU) in $O(pqn^{q}+pm)$ time when the values $W$ satisfy property (1).

This algorithm combines a geometric interpretation of our problem along with dynamic programming and proceeds by solving a sequence of $q-$ dimensional sub-problems. The proof of Theorem 3.1 is provided in Section 7. When $q$ is allowed to be large, the problem is $\mathbf{NP}$ -hard; see Theorem 3.5.

Generally, we may not be able to assume that $q$ is a constant and, even then, it would be desirable to have algorithms whose running time is close to $(m+n)p$ , the size of the input. Towards this we consider a natural parameter of the set of properties: The size of the largest $T_{i}$ , namely

The complexity of the constrained ranking maximization problem turns out to show interesting behavior with respect to $\Delta$ (note that $\Delta\leq p$ and typically $p\ll q$ ). The case when $\Delta=1$ corresponds to the simplest practical setting where there are $p$ disjoint properties, i.e., the properties partition the set of items. For instance, a set of images of humans could be partitioned based on the ethnicity or age of the individual. Note that even though $q=p$ for $\Delta=1$ , this $q$ could still be large and the previous theorem may have a prohibitively large running time.

When $\Delta=1$ we prove that the constrained ranking maximization problem (LU) is polynomial time solvable.

The constrained ranking maximization problem (LU) for $\Delta=1$ can be solved in $\widetilde{O}(n^{2}m)$ time when the values $W$ satisfy property (1).

The above is obtained by reducing this variant of the ranking maximization problem to the minimum cost flow problem, that can be solved efficiently (the network is acyclic). We note that even though the running time is polynomial in $m$ , it might be still not satisfactory for practical purposes. With the aim of designing faster – linear time algorithms, we focus on the case when only upper-bound constraints are present. For this case, we analyze a natural linear programming (LP) relaxation for the constrained ranking maximization problem (U). It reveals interesting structure of the problem and motivates a fast greedy algorithm. Formally, the relaxation considers the set $\Omega_{m,n}$ defined as

Observe that in the absence of fairness constraints, (3) represents the maximum weight bipartite matching problem – it is well known that the feasible region of its fractional relaxation has integral vertices and hence the optimal values of these two coincide. However, in the constrained setting, even for $\Delta=1$ , it can be shown that the feasible region is no longer integral – it can have fractional vertices (see Fact 8.2). For this reason, it is not true that maximizing any linear objective results in an integral solution. Surprisingly, we prove that for $\Delta=1$ the cost functions we consider are special and never yield optimal fractional (vertex) solutions.

Consider the linear programming relaxation (3) for the constrained ranking maximization problem (U) when $\Delta=1$ and the objective function satisfies (1). Then there exists an optimal solution with integral entries and hence the relaxation is exact. Further, there exists a greedy algorithm to find an optimal integral solution in $O(np+m)$ time.

The proof relies on a combinatorial argument on the structure of tight constraints that crucially uses the assumption that $\Delta=1$ and the property (1) of the objective function. Note that the result of Theorem 3.3 implies in particular that whenever the linear program (3) is feasible then there is also an integer solution – a feasible ranking. This can be also argued for the general (LU) variant of the problem and its corresponding LP relaxation. However, extending Theorem 3.3 to this case seems more challenging and is left as an open problem.

When trying to design algorithms for larger $\Delta$ , the difficulty is that the constrained ranking feasibility problem remains $\mathbf{NP}$ -hard (in fact, even hard to approximate when feasibility is guaranteed) for $\Delta\geq 3$ ; see Theorems 10.1 and 10.2. Together, these results imply that unless we restrict to feasible instances of the constrained ranking problem, it is impossible to obtain any reasonable approximation algorithm for this problem. In order to bypass this barrier, we focus on the (U) variant of the problem and present an algorithmically verifiable condition for feasibility and argue that it is natural in the context of information retrieval. For each $1\leq k\leq n,$ we consider the set

of all properties whose constraints increase by at least $1$ when going from the $(k-1)$ st to the $k$ th position. We observe that the following abundance of items condition is sufficient for feasibility:

We show that assuming condition (4), there is a linear-time algorithm that achieves an $(\Delta+2)$ -approximation, while only slightly violating the upper-bound constraints. This result does not need assumption (1), rather only that the $W_{ij}$ s are non-negative. This result is near-optimal; we provide an $\Omega\left(\frac{\Delta}{\log\Delta}\right)$ hardness of approximation result (see Theorem 10.2).

Δ2(\Delta+2)-approximation algorithm; see Theorem 9.1) For the constrained ranking maximization problem (U), under the assumption (4), there is an algorithm that in linear time outputs a ranking $x$ with value at least $\frac{1}{\Delta+2}$ times the optimal one, such that $x$ satisfies the upper-bound constraints with at most a twice multiplicative violation, i.e.,

One can construct artificial instances of the ranking problem, where the output of the algorithm indeed violates upper-bound constraints with a $2$ -multiplicative factor. However, these violations are caused by the presence of high-utility items with a large number of properties. Such items are unlikely to appear in real-life instances and thus we expect the practical performance of the algorithm to be better than the worst-case bound given in Theorem 3.4 suggests. Lastly we summarize our hardness results for the constrained ranking problem.

The following variants of the constrained ranking feasibility (U) and constrained ranking maximization (U) problem are $\mathbf{NP}$ -hard.

Deciding feasibility for the case of $\Delta\geq 3$ (Theorem 10.1).

Under the feasibility condition (4), approximating the optimal value of a ranking within a factor $O\left(\nicefrac{{\Delta}}{{\log\Delta}}\right)$ , for any $\Delta\geq 3$ (Theorem 10.2).

Deciding feasibility when only the number of items $m$ , number of positions $n$ , and upper-bounds $u$ are given as input; the properties are fixed for every $m$ (Theorem 10.3).

For every constant $c$ , deciding between whether there exists a feasible solution or every solution violates some constraint by a factor of $c$ (Theorem 10.4).

Other Related Work

Information retrieval, which focuses on selecting and ranking subsets of data, has a rich history in computer science, and is a well-established subfield in and of itself; see, e.g., the foundational work by [SB88]. The probability ranking principle (PRP) forms the foundation of information retrieval research [MK60, Rob77]; in our context it states that a system’s ranking should order items by decreasing value. Our problem formulation and solutions are in line with this – subject to satisfying the diversity constraints.

A related problem is diverse data summarization in which a subset of items with varied properties must be selected from a large set [PDSAT12, CDKV16], or similarly, voting with diversity constraints in which a subset of items of people with varied properties or attributes must be selected via a voting procedure [Mon95, CHV57]. However, the formulation of these problem is considerably different as there is no need to produce a ranking of the selected items, and hence the analogous notion of constraints is more relaxed. Extending work on fairness in classification problems [ZWS+13], the fair ranking problem has also been studied as an (unconstrained) multi-objective optimization problem, and various fairness metrics of a ranking have been proposed [YS17].

Combining the learning of values along with the ranking of items has also been studied [RKJ08, SRG13]; in each round an algorithm chooses an ordered list of $k$ documents as a function of the estimated values $W_{ij}$ and can receive a click on one of them. These clicks are used to update the estimate of the $W_{ij}$ s, and bounds on the regret (i.e., learning rate) can be given using a bandit framework. In this problem, while there are different types of items that can affect the click probabilities, there are no constraints on how they should be displayed.

Recent work has shown that, in many settings, there are impossibility results that prevent us from attaining both property and item fairness [KMR17]. Indeed, our work focuses on ensuring property fairness (i.e., no property is overrepresented), however this comes at an expense of item fairness (i.e., depending on which properties an item has, it may have much higher / lower probability of being displayed than another item with the same value). In our motivating application we deal with the ranking of documents or webpages, and hence are satisfied with this trade-off. However, further consideration may be required if, e.g., we wish to rank people as this would give individuals different likelihoods of being near the top of the list based on their properties rather than solely on their value.

In Section 4 we discuss other related work. Section 5 contains an overview of the proofs of our main results. Section 6 contains the proof of Theorem 6 – a polynomial time algorithm for the $\Delta=1$ case. The proof of Theorem 3.1 on the exact algorithm for general $\Delta$ is presented in Section 7. Section 8 contains the proof of Theorem 3.3 which shows that there exists an integral solution to (3); it also provides a simple and fast greedy algorithm for the ranking maximization problem (U) for $\Delta=1$ . Section 9 contains the proof of Theorem 3.4, which gives our approximate result on the ranking maximization problem (U) for general $\Delta$ . Our hardness results are presented in Section 10. In Section 11 we provide a discussion of possible directions for future work and open problems. Finally in Appendix A we give a brief overview of some common ranking metrics and explain how they can be captured by values $W_{ij}$ that satisfy (1).

Proof Overviews

Let $Q:=\{t_{i}:i\in[m]\}$ be the set of all the different property vectors $t_{i}$ that appear for items $i\in[m]$ , and let us denote its elements by $v_{1},v_{2},\ldots,v_{q}$ . A simple but important observation is that whenever two items $i_{1},i_{2}\in[m]$ (with say $i_{1}<i_{2}$ ) have the same property vector: $t_{i_{1}}=t_{i_{2}}$ , then in every optimal solution either $i_{1}$ will be ranked above $i_{2}$ , only $i_{1}$ is ranked, or neither is used. This follows from the assumption that the weight matrix is monotone in $i$ and $j$ and satisfies the property as stated in (1).

Overview of the proof of Theorem 3.2. The main idea is to reduce ranking maximization to the minimum cost flow problem and then observe several structural properties of the resulting instance which allow one to solve it efficiently (in $\widetilde{O}(n^{2}m)$ time).

The instance of the minimum cost flow problem we construct has $O(np)$ vertices and $O(nm)$ edges and is acyclic, which allows to replace the application of the Bellman-Ford algorithm in the first phase of the Successive Shortest Path algorithm by a linear-time procedure. This then easily leads to an implementation in $O(n^{2}m\log m)$ time.

Combinatorially, the directions we consider correspond to $4$ -cycles in the underlying complete bipartite graph, such that the weight of the matching can be improved by swapping edges along the cycle. The argument that shows the existence of such a cycle makes use of the special structure of the constraints in this family of instances.

The second part of Theorem 3.3 is an algorithm for solving the constrained ranking maximization problem for $\Delta=1$ in optimal (in the input size) running time of $O(np+m)$ . We show that a natural greedy algorithm can be used. More precisely, one iteratively fills in ranking positions by always selecting the highest value item that is still available and does not lead to a constraint violation. An inductive argument based that relies on property 1 and the $\Delta=1$ assumption gives the correctness of such a procedure.

Overview of the proof of Theorem 3.4. Let $\Delta>1$ be arbitrary. The most important part of our algorithm is a greedy procedure that finds a large weight solution to a slightly relaxed problem in which not all positions in the ranking have to be occupied. It processes pairs $(i,j)\in[m]\times[n]$ in non-increasing order of weights $W_{ij}$ and puts item $i$ in position $j$ whenever this does not lead to constraint violation.

is the set of independent sets in this (laminar) matroid. In the work [Jen76] it is shown that the greedy algorithm run on an intersection of $K$ matroids yields $K$ -approximation, hence $(p+2)$ -approximation of our algorithm follows.

To obtain a better – $(\Delta+2)$ -approximation bound, a more careful analysis is required. The proof is based on the fact that, roughly, if a new element is added to a feasible solution $S$ , then at most $\Delta+2$ elements need to be removed from $S$ to make it again feasible. Thus adding greedily one element can cost us absence of $\Delta+2$ other elements of weight at most the one we have added. This idea can be formalized and used to prove the $(\Delta+2)$ -approximation of the greedy algorithm. This is akin to the framework of $K$ -extendible systems by [Mes06] in which this greedy procedure can be alternatively analyzed. Finally, we observe that since the problem solved was a relaxation of the original ranking maximization problem, the approximation ratio we obtain with respect to the original problem is still $(\Delta+2)$ .

It remains to complete the ranking by filling in any gaps that may have been left by the above procedure. This can be achieved in a greedy manner that only increases the value of the solution, and violates the constraints by at most a multiplicative factor of $2$ .

Overview of the proof of Theorem 3.5. Our hardness results are based on a general observation that one can encode various types of packing constraints using instances of the constrained ranking maximization (U) and feasibility (U) problem. The first result (Theorem 10.1) – $\mathbf{NP}$ -hardness of the feasibility problem (for $\Delta\geq 3$ ) is established by a reduction from the hypergraph matching problem. Given an instance of the hypergraph matching problem one can think of its hyperedges as items and its vertices as properties. Degree constraints on vertices can then be encoded by upper-bound constraints on the number of items that have a certain property in the ranking. The inapproximability result (Theorem 10.2) is also established by a reduction from the hypergraph matching problem, however in this case one needs to be more careful as the reduction is required to output instances that are feasible.

A Polynomial Time algorithm for Δ=1Δ1\Delta=1

Proof of Theorem 3.2: We show how to solve the constrained ranking maximization problem for $\Delta=1$ by reformulating it as a minimum cost flow problem in a directed network. For clarity, we begin with the case when only upper-bound constraints are present and then modify the construction to deal with lower-bounds as well.

Given an instance of the constrained ranking maximization problem we build a directed, weighted, capacitated graph $G=(V,E)$ such that the solution of the min cost flow problem on $G$ allows us to solve the instance of the ranking problem.

where we define $W_{i,n+1}:=0$ for every item $i\in[m]$ .

We pick the edges on this path so that all of them correspond to the item $i$ – this is possible by our observation above. The cost contributed by this path is

The fact that these are the edges with smallest cost follows from the Monge property.

To solve the constructed instance of minimum cost flow, note first that the network $G$ does not contain any negative-cost cycle (in fact it is acyclic), which is a sufficient condition for the problem to be polynomial-time solvable.

One can use the Successive Shortest (augmenting) Paths algorithm (see e.g. [AMO93]) to solve this problem efficiently. For the initial computation of the node potentials one does not need to use the Bellman-Ford algorithm (which has running time $O(|V|\cdot|E|)=O(pmn^{2})$ ) but take advantage of the fact that $G$ is acyclic and compute single source shortest paths from $s$ by sorting the vertices topologically and running a simple dynamic programming procedure. After that, the algorithm needs to augment the flow $n$ times, each augmentation requires to run Dijkstra’s algorithm once, i.e., takes $O(|E|\log|V|)=O(mn\log m)$ time. Thus the total running time of the algorithm is $O(nm+n\cdot mn\log m)=O(n^{2}m\log m)$ .

Dynamic Programming-based Exact Algorithm

Recall that for an instance of constrained ranking maximization, every item $i\in[m]$ has a type $T_{i}$ assigned to it, which is the set of all properties item $i$ has. In this section, we present an exact dynamic programming algorithm for solving the constrained ranking maximization problem which is efficient when the number of distinct types $q$ in the instance is small. We start by providing a geometric viewpoint of the problem, which (arguably) makes it easier to visualize and provides us with convenient notation under which the dynamic programming algorithm is simpler to state and understand.

Note that every ranking can be described either by a binary matrix $x\in\{0,1\}^{m\times n}$ such that $x_{ij}=1$ if and only if item $i$ is ranked at position $j$ , or alternatively by a one-to-one function $\pi:[n]\to[m]$ such that $\pi(j)$ is the item ranked at position $j$ , for every $j\in[n]$ . Using the latter convention we can encode the fairness condition as

where $L_{k}=(L_{k1},L_{k2},\ldots,L_{kp})^{\top}$ and $U_{k}=(U_{k1},U_{k2},\ldots,U_{kp})^{\top}$ are the vectors of lower and upper-bounds at position $k$ . In other words, a ranking is feasible if and only if the $k$ th partial sum of all $t_{i}$ vectors of items at top- $k$ positions belongs to the rectangle $[L_{k1},U_{k1}]\times[L_{k2},U_{k2}]\times\ldots\times[L_{kp},U_{kp}]$ , for every $k\in[n]$ .

2 The dynamic programming algorithm

There is an algorithm that solves the constrained ranking maximization problem (LU) when the objective function $w$ satisfies property (1) in $O(mp+pqn^{q})$ time (where $q$ is the number of different types of items).

This is positive due to the (strict) property (1). Hence the swap can only increase the weight of the solution. A similar reasoning shows that it is beneficial to swap a ranked item $i_{2}$ with an unranked item $i_{1}$ , whenever $i_{1}<i_{2}$ .

The above allows us to reduce the number of candidate rankings which one has to check to roughly $q^{n}$ . However, this number is still prohibitively large. As $q\ll n$ in many scenarios of interest, we construct a dynamic programming algorithm with a much fewer states $O(n^{q})$ .

The total number of sub-problems is at most $O(n^{q})$ . Hence, the above algorithm can be implemented in time $O(m\cdot p+p\cdot q\cdot n^{q})$ , where $m\cdot p$ time is required to read the input and construct the list of elements of every given type. The second term $O(p\cdot q\cdot n^{q})$ appears because there are $n^{q}$ subproblems, each such sub-problem considers $q$ cases, and every case has a feasibility check that takes $O(p)$ time.

Algorithms for Δ=1Δ1\Delta=1 and Upper-Bound Constraints

For simplicity, assume that the matrix $w$ satisfies the strict variant of property (1); i.e., when the inequalities in (1) are strict. This can be achieved by a small perturbation of the weights without changing the optimal ranking.

Our proof consist of two phases. In the first phase, we show that every optimal solution satisfies a certain property on its support. In the second phase we show that no optimal solution that has this property can have fractional entries. Let us state the property of a feasible solution $x\in\Omega_{m,m}$ that we would like to establish:

Similarly, for every rank position $j\in[m]$ , we have $\sum_{i=1}^{m}x^{\prime}_{i,j}=\sum_{i=1}^{m}x_{i,j}=1$ . Hence, $x^{\prime}\in\Omega_{m,m}$ .

Therefore $x^{\prime}$ is feasible for (3). Furthermore, because of the (strict) property (1), we have:

As this is a feasible solution with a strictly better objective value, we conclude that $x$ was not optimal. Hence, every optimal solution necessarily satisfies (3).

Let us again consider a new candidate solution using the indices defined above:

In contrast to the above theorem, some vertices of the feasible region might be non-integral.

There exists an instance of the ranking maximization problem for $\Delta=1$ , such that the feasible region of (3) has fractional vertices.

Proof: Let $n=m=4$ and suppose there is only one property $P_{1}=\{1,2\}$ and the constraints are $U_{21}=1$ and $U_{k1}=\infty$ for $k\neq 2$ . In other words, we only constrain the ranking to have at most $1$ element of property $1$ in the top-2 entries.

Clearly $x$ is feasible. Observe that the support of $x$ has $2n$ elements and there are exactly that many linearly independent tight constraints at that point. Indeed the doubly-stochastic constraints give us $2n$ constraints out of which $2n-1$ are linearly independent, and the remaining one is

Therefore $x$ is a (non-integral) vertex of the feasible region of (3).

2 Fast greedy algorithm

Due to the special structure for $\Delta=1$ , we are able to find a fast simple algorithm for the ranking maximization problem in this case.

There exists an algorithm which, given an instance of the constrained ranking maximization problem (U) with $\Delta=1$ and objective function that satisfies property (1), outputs an optimal ranking in $O(m+n\cdot p)$ time.

Proof: For simplicity, assume that $w$ satisfies the strict variant of property (1) (with strict inequalities in the definition). This can be assumed without loss of generality by slightly perturbing $w$ . Consider the following greedy algorithm that iteratively constructs a ranking $\pi:[n]\to[m]$ (i.e., $\pi(j)$ is the item ranked at position $j$ , for all $j\in[n]$ ).This alternate notation makes the exposition in this section cleaner – see also the notation and problem formulation in Section 7.1.

Let $i\in[m]$ be the smallest index of an item which was not yet picked and can be added at position $j$ without violating any constraint. If there is no such $i$ , output INFEASIBLE.

It is clear that if the above algorithm outputs a ranking $\pi:[n]\to[m]$ then $\pi$ is feasible. Assume now that it indeed outputs a ranking. We will show that it is optimal.

Further, the above observation allows us to now prove optimality of the greedy strategy. Take the largest number $k$ such that $\pi$ and $\pi^{\star}$ agree on $[k-1]$ , i.e., $\pi(j)=\pi^{\star}(j)$ for $j=1,2,\ldots,k-1$ . If $k-1=n$ then there is nothing to prove. Let us then assume that $k-1<n$ and $\pi(k)=i_{A}\neq i_{O}=\pi^{\star}(k)$ . There are two cases: either $i_{A}$ is ranked in $\pi^{\star}$ or it is not.

Hence, if $k-1<n$ , this contradicts the optimality of $\pi^{\star}$ . Thus, $\pi=\pi^{\star}$ and $\pi$ is the optimal ranking. By the same argument, one can show that if the instance is feasible, the greedy algorithm will never fail to output a solution (i.e., report infeasibility).

A (Δ+2)Δ2(\Delta+2)-Approximation Algorithm

Δ2(\Delta+2)-Approximation Algorithm Theorem 9.1 There exists a linear time algorithm which given an instance of the constrained ranking maximization problem (U) satisfying condition (4), outputs a ranking $x\in\{0,1\}^{m\times n}$ whose weight is at least $\frac{1}{\Delta+2}$ times the optimal one and satisfies all fairness constraints up to a factor of $2$ , i.e.,

Proof: The algorithm can be divided into two phases: First we construct a partial ranking that may leave some positions empty, and then we refine it to yield a complete ranking.

The first phase is finding a (close to optimal) solution to the following integer program:

where $\widetilde{\Omega}_{m,n}$ is the set of all $x\in\{0,1\}^{m\times n}$ such that for all $i\in[m]$ and $j\in[n]$

Note that this is a relaxation of our problem as it does not require every position in the ranking to be filled. In the proof we will often refer to a pair $(i,j)\in[m]\times[n]$ as a cell, since we treat them as entries in a 2-dimensional array. Consider the following greedy algorithm which can be used to find a (likely non-optimal) integral solution $z\in\{0,1\}^{m\times n}$ to the problem (8).

Order the cells $(i,j)$ according to non-increasing values of $W_{ij}$ .

Set $z_{ij}=0$ for all $(i,j)\in[m]\times[n]$ .

if adding $(i,j)$ to the solution does not cause any constraint violation, set $z_{ij}=1$ ,

We claim that the solution $z\in\{0,1\}^{m\times n}$ given by this algorithm has value at least $\frac{1}{\Delta+2}$ times the optimal solution. In other words,

The last equality follows from the fact that $S_{N}$ is a maximal solution (it cannot be extended), hence indeed

Let us now show how to construct a full ranking out of it. Let

The approximation guarantee follows because $y$ has a nonnegative contribution to the total weight and the guarantee on $x$ is with respect to a relaxed problem (8) whose optimal value is an upper-bound on the optimal value of the ranking maximization problem. Hence it remains to find $y$ with properties as above.

We can construct $y$ using another greedy procedure. Let $y=0$ . We process the elements of $J$ one at a time in increasing order. When considering a given $j$ pick any $i\in I$ such that adding $(i,j)$ to the solution $y$ does not introduce fairness constraint violation. It is clear that if the above algorithm succeeds to find a suitable $i\in I$ at every step, then it succeeds to construct a solution with the required properties. It remains to show that this is indeed the case. To this end, fix $j\in J$ and look at the step in which $j$ is considered. From condition (4), we know that there are at least $n$ elements in $[m]$ that can be placed at position $j$ without violating any constraint. Out of these $n$ elements, some may have been already taken by the above algorithm (i.e., they are not in $I$ ) and some of them could have been added to the new solution $y$ . However, as there were $n$ of them to begin with, at least one remains and can be selected. This concludes the proof of correctness of the above procedure. It remains to observe that both phases of the algorithm (after simple preprocessing) can be implemented in linear time, i.e., $O((m+n)\cdot p)$ .

Hardness Results

In this section we state and prove our hardness results regarding the constrained ranking feasibility and maximization problems for the (U) variant.

The name $\Delta$ above is intended to be suggestive – recall that in the context of the constrained ranking feasibility (or maximization) problem, the parameter $\Delta$ is the maximum number of properties any given item has. The first hardness result states that even for small $\Delta$ the constrained ranking feasibility is hard.

The constrained ranking feasibility problem (U) is $\mathbf{NP}$ -hard for any $\Delta\geq 3$ .

One possibility is that it is the feasibility of this problem that makes it hard. However, this is not the case. Our next Theorem says that even if we guarantee feasibility via assumption (4), the problem remains hard to approximate.

A feasible constrained ranking maximization problem (U) that satisfies condition (4) and has weights that satisfy property (1) is $\mathbf{NP}$ -hard to approximate within a factor of $O(\nicefrac{{\Delta}}{{\log\Delta}})$ .

Proof: In the proof we will use the fact that the maximum $\Delta$ -hypergraph matching problem is hard to approximate within a factor of $O(\nicefrac{{\Delta}}{{\log\Delta}})$ [HSS03]. We present an approximation-preserving reduction from the maximum $\Delta$ -hypergraph matching problem. The proof is similar to that of of Theorem 10.1, however we must now ensure that the resulting instance is feasible and satisfies condition (4).

Note that such a matrix satisfies the property (1).

Every matching $M\subseteq E$ of cardinality $k$ in $H$ corresponds to a feasible ranking of total weight $k$ in our instance of ranking maximization (simply take the $k$ items corresponding to edges in the matching and add any $(m-k)$ improper items). Similarly, every ranking of weight $k$ can be transformed into a matching of size $k$ . As the reduction is approximation preserving, we conclude that the constrained ranking maximization problem is $\mathbf{NP}$ -hard to approximate within a factor better than $O(\nicefrac{{\Delta}}{{\log\Delta}})$ .

Our next result says that the constrained ranking feasibility is hard even when we fix the structure of the set of properties; i.e., the only input to the problem are the upper-bound constraints.

Proof: Let the family of properties $\mathcal{P}_{m}$ be all 2-element subsets of $[m]$ , i.e.,

We reduce the independent set problem to the above problem of constrained ranking maximization with a fixed property set.

Clearly feasible rankings are in one-to-one correspondence with independent sets of size $n$ in $G$ . Indeed, if we place items (vertices) from an independent set of size $n$ in the ranking in any order, then the resulting ranking is feasible, as the only essential constraints which are imposed on the ranking are that for every edge $e=\{i_{1},i_{2}\}\in E$ at most one of $i_{1},i_{2}$ is present in the ranking. Conversely, if these constraints are satisfied, the set of $n$ items placed in the ranking forms an independent set of size $n$ . As the independent set problem is $\mathbf{NP}$ -hard, via the above reduction we obtain hardness of the constrained ranking maximization problem with fixed properties.

Lastly, one could hope that it is still possible solve the constrained ranking maximization problem by allowing constraints to be violated by a small amount. However, this also remains hard. The result states that finding a solution which violates all the constraints at most a factor of $c$ (for any constant $c$ ) is $\mathbf{NP}$ -hard. Interestingly our proof relies on a certain bound from Ramsey theory.

For every constant $c>0$ , the following violation gap variant of the constrained ranking feasibility problem (U) is $\mathbf{NP}$ -hard.

Output YES if the input instance is satisfiable.

Output NO if there is no solution which violates every upper-bound constraint at most $c$ times.

Proof: The reduction is inspired by an idea developed for the approximation hardness of packing integer programs by [CK05]. We use the inapproximabality of independent set ([Has96, Zuc07]) to prove the theorem. It states that approximating the cardinality of the maximum independent set in an undirected graph $G=(V,E)$ to within a factor of $|V|^{1-\varepsilon}$ is $\mathbf{NP}$ -hard, for every constant $\varepsilon>0$ .

If there is a ranking that violates all of the constraints at most a factor of $c$ , then there is an independent set of cardinality $\Omega\left(n^{1/(c+1)}\right)$ in $G$ .

If there is no feasible ranking then there is no independent set of cardinality $n$ in $G$ .

Note that proving the above suffices to obtain the desired result: If there was a procedure to solve the constrained ranking feasibility problem for $c$ , then one could approximate in polynomial time the cardinality of the maximum independent set in a graph with an approximation ratio of $|V|^{1-\nicefrac{{1}}{{(c+1)}}}$ , which is not possible unless $P=NP$ [Has96, Zuc07]. Hence it remains to establish the claim.

To establish the second claim, note that every independent set $S\subseteq V$ of size $n$ gives a feasible solution to the constrained ranking instance by placing the items corresponding to $S$ in the ranking in any order. To establish the first claim, suppose that there is a solution that violates every constraint by at most a factor of $c$ . This means that we have a subset $S$ containing $n$ vertices in $G$ that does not contain any $(c+1)$ -clique. Using a standard upper-bound on the Ramsey number $R\left(n^{\frac{1}{c+1}},c\right)$ it follows that in the subgraph induced by $n$ in $G$ there exists an independent set of cardinality $\Omega\left(n^{1/(c+1)}\right)$ .

Discussion and Future Work

In this paper, motivated by controlling and alleviating algorithmic bias in information retrieval, we initiate the study of the complexity of a natural constrained optimization problem concerning rankings. Our results indicate that the constrained ranking maximization problem, which is a generalization of the classic bipartite matching problem, shows fine-grained complexity. Both the structure of the constraints and the numbers appearing in upper-bounds play a role in determining its complexity. Moreover, this problem generalizes several hypergraph matching/packing problems. Our algorithmic results bypass the obstacles implicit in the past theory work by leveraging on the structural properties of the constraints and common objective functions from information retrieval. More generally, our results not only contribute to the growing set of algorithms to counter algorithmic bias for fundamental problems [DHP+12, BS15, ZVGRG15, CDKV16, CKS+18, CDK+17, Kir16, CV17], the structural insights obtained may find use in other algorithmic settings related to the rather broad scope of ranking problems.

In this work we consider linear objective functions for the the ranking optimization problem, i.e., the objective is an independent sum of profits for individual item placements. While this model might be appropriate in certain settings, there may be cases where one would prefer to measure the quality of a ranking as a whole, and in particular the utility of placing a given item at the $k$ th position should also depend on what items were placed above it [ZCL03]. Thus defining and studying a suitable variant of the problem for a class of objectives on rankings that satisfy some version of the diminishing returns principle (submodularity), is of practical interest.

A related question that deserves independent exploration is to study the complexity of sampling a constrained ranking from the probability distribution induced by the objective (rather than outputting the ranking that maximizes its value, output a ranking with probability proportional to its value). Finally, extending our results to the online setting seems like an important technical challenge which is also likely to have practical consequences.

We would like to thank Mohit Singh and Rico Zenklusen for valuable discussions on the problem.

References

In information retrieval, when proposing a new ranking method, it is naturally important to evaluate the quality of the resulting output. Towards this, a variety of ranking metrics have been developed. Ideally, one would use such metrics to compare how “good” a constrained ranking is (with respect to quality) in comparison to an unconstrained ranking. Putting it another way, we would like to maximize the quality of the constrained ranking as measured by such metrics. In this section we briefly survey three important classes of ranking metrics, translated to our setting, and show that our problem formulation can capture metrics by defining the values $W_{ij}$ appropriately.

Such metrics are often being defined with respect to the item quality (often referred to as relevance in the IR literature). Without loss of generality, we relabel the items so that $a_{1}\geq a_{2}\geq\cdots\geq a_{m}$ denote the qualities for $i\in\{1,\ldots,m\}$ . For ease of presentation, we introduce simple forms of these metrics below; often in practice they are normalized for better comparison across rankings – the two are equivalent with respect to optimization. We consider integral rankings $x\in\overline{\Omega}_{m,n}$ where $x\in\{0,1\}^{m\times n}$ if and only if for every $i\in[m]$ and every $j\in[n]$

The quality of the ranking is considered to be the sum of the quality of its items where the quality of the item at position $j$ is discounted by a value $f(j)$ that is non-increasing in $j$ ; in other words, getting the order correct at the top of the list is more important than getting it correct at the bottom. Formally, given a ranking $x\in\overline{\Omega}_{m,n}$ , a rank-1 metric is defined as

A.2 Bradley-Terry Metrics:

A.3 Alignment Metrics:

One can also consider metrics that depend only on the difference in the item positions between two rankings. Let $x^{\star}\in\overline{\Omega}_{m,m}$ be the optimal unconstrained ranking (this corresponds to sorting $i$ s in non-increasing order of $a_{i}$ ). Alignment metrics are defined with respect to the positions of the items in $x^{\star}$ , and we give some examples below.

Given a ranking $x\in\overline{\Omega}_{m,n}$ , in our setting Spearman’s footrule [Spe06] corresponds to

and similarly Spearman’s rho [Spe04] corresponds to

These vary slightly from the usual definitions as it is for a partial rather than complete ranking, and we measure similarity rather than difference. The former is in line with partial ranking versions of Spearman’s footrule and Spearman’s rho as in [FKS03, FKM+06], and for the latter we add a value to maintain non-negativity and monotonicity.

The above metrics can be naturally captured in our formulation by letting $W_{ij}:=(2m-i-j)-|j-j^{\star}|$ or $W_{ij}:=(2m-i-j)^{2}-(j-j^{\star})^{2}$ respectively where $j^{\star}$ is such that $x^{\star}_{ij^{\star}}=1$ . Here, for any two $1\leq i_{1}<i_{2}\leq m$ and $1\leq j_{1}<j_{2}\leq n$ it is clear that $W_{i_{1}j_{1}}\geq W_{i_{2}j_{1}}$ and $W_{i_{1}j_{1}}\geq W_{i_{1}j_{2}}$ as desired. Furthermore, it is not difficult to show via a simple but tedious case analysis of the possible orderings of $j_{1},j_{2},j^{\star}_{1},j^{\star}_{2}$ that the desired property $(W_{i_{1}j_{1}}+W_{i_{2}j_{2}})-(W_{i_{1}j_{2}}+W_{i_{2}j_{1}})\geq 0$ holds. One can also think of weighted versions of these metrics, as introduced in [KV10], for which these observations generalize.