Computing Approximate Nash Equilibria in Polymatrix Games

Argyrios Deligkas, John Fearnley, Rahul Savani, Paul Spirakis

Introduction

Nash equilibria are the central solution concept in game theory. Since it is known that computing an exact Nash equilibrium DGP ; CDT is unlikely to be achievable in polynomial time, a line of work has arisen that studies the computational aspects of approximate Nash equilibria. The most widely studied notion is of an ϵ\epsilon-approximate Nash equilibrium (ϵ\epsilon-Nash), which requires that all players have an expected payoff that is within ϵ\epsilon of a best response. This is an additive notion of approximate equilibrium; the problem of computing approximate equilibria of bimatrix games using a relative notion of approximation is known to be PPAD\mathtt{PPAD}-hard even for constant approximations Das13 .

So far, ϵ\epsilon-Nash equilibria have mainly been studied in the context of two-player bimatrix games. A line of work DMP ; Progress ; BBM10 has investigated the best ϵ\epsilon that can be guaranteed in polynomial time for bimatrix games. The current best result, due to Tsaknakis and Spirakis TS , is a polynomial-time algorithm that finds a 0.3393-Nash equilibrium of a bimatrix game with all payoffs in $$.

In this paper, we study ϵ\epsilon-Nash equilibria in the context of many-player games, a topic that has received much less attention. A simple approximation algorithm for many-player games can be obtained by generalising the algorithm of Daskalakis, Mehta and Papadimitriou DMP from the two-player setting to the nn-player setting, which provides a guarantee of ϵ=11n\epsilon=1-\frac{1}{n}. This has since been improved independently by three sets of authors BGR08 ; HRS08 ; BBM10 . They provide a method that converts a polynomial-time algorithm that for finding ϵ\epsilon-Nash equilibria in (n1)(n-1)-player games into an algorithm that finds a 12ϵ\frac{1}{2-\epsilon}-Nash equilibrium in nn-player games. Using the polynomial-time 0.33930.3393 algorithm of Tsaknakis and Spirakis TS for 22-player games as the base case for this recursion, this allows us to provide polynomial-time algorithms with approximation guarantees of 0.60220.6022 in 33-player games, and 0.71530.7153 in 44-player games. These guarantees tend to 11 as nn increases, and so far, no constant ϵ<1\epsilon<1 is known such that, for all nn, an ϵ\epsilon-Nash equilibrium of an nn-player game can be computed in polynomial time.

For nn-player games, we have lower bounds for ϵ\epsilon-Nash equilibria. More precisely, Rubinstein has shown that when nn is not a constant there exists a constant but very small ϵ\epsilon such that it is PPAD\mathtt{PPAD}-hard to compute an ϵ\epsilon-Nash equilibrium Rub14b . This is quite different from the bimatrix game setting, where the existence of a quasi-polynomial time approximation scheme rules out such a lower bound, unless all of PPAD\mathtt{PPAD} can be solved in quasi-polynomial time LMM03 .

Polymatrix games.

In this paper, we focus on a particular class of many-player games called polymatrix games. In a polymatrix game, the interaction between the players is specified by an nn vertex graph, where each vertex represents one of the players. Each edge of the graph specifies a bimatrix game that will be played by the two respective players, and thus a player with degree dd will play dd bimatrix games simultaneously. More precisely, each player picks a strategy, and then plays this strategy in all of the bimatrix games that he is involved in. His payoff is then the sum of the payoffs that he obtains in each of the games.

Polymatrix games are a class of succinctly represented nn-player games: a polymatrix game is specified by at most n2n^{2} bimatrix games, each of which can be written down in quadratic space with respect to the number of strategies. This is unlike general nn-player strategic form games, which require a representation that is exponential in the number of players.

There has been relatively little work on approximation algorithms for polymatrix games. The approximation algorithms for general games can be applied in this setting in an obvious way, but to the best of our knowledge there have been no upper bounds that are specific to polymatrix games. On the other hand, the lower bound of Rubinstein mentioned above is actually proved by constructing polymatrix games. Thus, there is a constant but very small ϵ\epsilon such that it is PPAD\mathtt{PPAD}-hard to compute an ϵ\epsilon-Nash equilibrium Rub14b , and this again indicates that approximating polymatrix games is quite different to approximating bimatrix games.

Our contribution.

Our main result is an algorithm that, for every δ\delta in the range 0<δ0.50<\delta\leq 0.5, finds a (0.5+δ)(0.5+\delta)-Nash equilibrium of a polymatrix game in time polynomial in the input size and 1δ\frac{1}{\delta}. Note that our approximation guarantee does not depend on the number of players, which is a property that was not previously known to be achievable for polymatrix games, and still cannot be achieved for general strategic form games.

We prove this result by adapting the algorithm of Tsaknakis and Spirakis TS (henceforth referred to as the TS algorithm). They give a gradient descent algorithm for finding a 0.33930.3393-Nash equilibrium in a bimatrix game. We generalise their gradient descent techniques to the polymatrix setting, and show that it always arrives at a (0.5+δ)(0.5+\delta)-Nash equilibrium after a polynomial number of iterations.

In order to generalise the TS algorithm, we had to overcome several issues. Firstly, the TS algorithm makes the regrets of the two players equal in every iteration, but there is no obvious way to achieve this in the polymatrix setting. Instead, we show how gradient descent can be applied to a strategy profile where the regrets are not necessarily equal. Secondly, the output of the TS algorithm is either a point found by gradient descent, or a point obtained by modifying the result of gradient descent. In the polymatrix game setting, it is not immediately obvious how such a modification can be derived with a non-constant number of players (without an exponential blowup). Thus we apply a different analysis, which proves that the point resulting from gradient descent always has our approximation guarantee. It is an interesting open question whether a better approximation guarantee can be achieved when there is a constant number of players.

An interesting feature of our algorithm is that it can be applied even when players have differing degrees. Originally, polymatrix games were defined only for complete graphs H72 . Since previous work has only considered lower bounds for polymatrix games, it has been sufficient to restrict attention to regular graphs, as in work Rubinstein Rub14b . However, since this paper is proving an upper bound, we must be more careful. As it turns out, our algorithm will efficiently find a (0.5+δ)(0.5+\delta)-Nash equilibrium for all δ>0\delta>0, no matter what graph structure the polymatrix game has.

Finally, we show that our algorithm can be applied to two-player Bayesian games. In a two-player Bayesian game, each player is assigned a type according to a publicly known probability distribution. Each player knows their own type, but does not know the type of their opponent. We show that finding an ϵ\epsilon-Nash equilibrium in these games can be reduced to the problem of finding an ϵ\epsilon-Nash equilibrium in a polymatrix game, and therefore, our algorithm can be used to efficiently find a (0.5+δ)(0.5+\delta)-Nash equilibrium of a two-player Bayesian game.

Related work.

An FPTAS for the problem of computing an ϵ\epsilon-Nash equilibrium of a bimatrix game does not exist unless every problem in PPAD\mathtt{PPAD} can be solved in polynomial time CDT . Arguably, the biggest open question in equilibrium computation is whether there exists a PTAS for this problem. As we have mentioned, for any constant ϵ>0\epsilon>0, there does exist a quasi-polynomial-time algorithm for computing an ϵ\epsilon-Nash equilibria of a bimatrix game, or any game with a constant number of players LMM03 ; BBP14 , with running time kO(logk)k^{O(\log k)} for a k×kk\times k bimatrix game. Consequently, in contrast to the many-player case, it is not believed that there exists a constant ϵ\epsilon such that the problem of computing an ϵ\epsilon-Nash equilibrium of a bimatrix game (or any game with a constant number of players) is PPAD\mathtt{PPAD}-hard, since it seems unlikely that all problems in PPAD\mathtt{PPAD} have quasi-polynomial-time algorithms. On the other hand, for multi-player games, as mentioned above, there is a small constant ϵ\epsilon such that it is PPAD\mathtt{PPAD}-hard to compute an ϵ\epsilon-Nash equilibrium of an nn-player game when nn is not constant. One positive result we do have for multi-player games is that there is a PTAS for anonymous games (where the identity of players does not matter) when the number of strategies is constant DP14 .

Polymatrix games have played a central role in the reductions that have been used to show PPAD\mathtt{PPAD}-hardness of games and other equilibrium problems DGP ; CDT ; EY10 ; FT-C10 ; CPY13 . Computing an exact Nash equilibrium in a polymatrix game is PPAD\mathtt{PPAD}-hard even when all the bimatrix games played are either zero-sum games or coordination games CD11 . Polymatrix games have been used in other contexts too. For example, Govindan and Wilson proposed a (non-polynomial-time) algorithm for computing Nash equilibria of an nn-player game, by approximating the game with a sequence of polymatrix games GW04 . Later, they presented a (non-polynomial) reduction that reduces nn-player games to polymatrix games while preserving approximate Nash equilibria GW10 . Their reduction introduces a central coordinator player, who interacts bilaterally with every player.

Preliminaries

We start by fixing some notation. We use [k][k] to denote the set of integers {1,2,,k}\{1,2,\ldots,k\}, and when a universe [k][k] is clear, we will use Sˉ={i[k],iS}\bar{S}=\{i\in[k],i\notin S\} to denote the complement of S[k]S\subseteq[k]. For a kk-dimensional vector xx, we use xSx_{-S} to denote the elements of xx with with indices Sˉ\bar{S}, and in the case where S={i}S=\{i\} has only one element, we simply write xix_{-i} for xSx_{-S}.

An nn-player polymatrix game is defined by an undirected graph (V,E)(V,E) with nn vertices, where every vertex corresponds to a player. The edges of the graph specify which players interact with each other. For each i[n]i\in[n], we use N(i)={j  :  (i,j)E}N(i)=\{j\;:\;(i,j)\in E\} to denote the neighbours of player ii.

Each edge (i,j)E(i,j)\in E specifies that a bimatrix game will be played between players ii and jj. Each player i[n]i\in[n] has a fixed number of pure strategies mim_{i}, and the bimatrix game on edge (i,j)E(i,j)\in E will therefore be specified by an mi×mjm_{i}\times m_{j} matrix AijA_{ij}, which gives the payoffs for player ii, and an mj×mim_{j}\times m_{i} matrix AjiA_{ji}, which gives the payoffs for player jj. We allow the individual payoffs in each matrix to be an arbitrary (even negative) rational number. As we describe in the next subsection, we will rescale these payoffs so that the overall payoff to each player lies in the range $$.

1 Payoff Normalization

Before we continue, we must first discuss how the payoffs in the game are rescaled. It is common, when proving results about additive notions of approximate equilibria, to rescale the payoffs of the game. This is necessary in order for different results to be comparable. For example, all results about additive approximate equilibria in bimatrix games assume that the payoff matrices have entries in the range $,andthereforean, and therefore an\epsilonNashequilibriumalwayshasaconsistentmeaning.Forthesamereason,wemustrescalethepayoffsinapolymatrixinordertogiveaconsistentmeaningtoan-Nash equilibrium always has a consistent meaning. For the same reason, we must rescale the payoffs in a polymatrix in order to give a consistent meaning to an\epsilon$-approximation.

An initial, naive, approach would be to specify that each of the individual bimatrix games has entries in the range $$. This would be sufficient if we were only interested in polymatrix games played on either complete graphs or regular graphs. However, in this model, if the players have differing degrees, then they also have differing maximum payoffs. This means that an additive approximate equilibrium must pay more attention to high degree players, as they can have larger regrets.

One solution to this problem, which was adopted in the conference version of this paper DFSS14 , is to rescale according to the degree. That is, given a polymatrix game where each bimatrix game has payoffs in the range $,ifaplayerhasdegree, if a player has degreed,theneachofhispayoffmatricesisdividedby, then each of his payoff matrices is divided byd.Thistransformationensuresthateveryplayerhasregretintherange. This transformation ensures that every player has regret in the range$, and therefore low degree players are not unfairly treated by additive approximations.

However, rescaling according to the degree assumes that each bimatrix game actually uses the full range of payoffs between $.Inparticular,somebimatrixgamesmayhaveminimumpayoffstrictlygreaterthan,ormaximumpayoffstrictlylessthan. In particular, some bimatrix games may have minimum payoff strictly greater than , or maximum payoff strictly less than1$. This issue arises, in particular, in our application of two-player Bayesian games. Note that, unlike the case of a single bimatrix game, we cannot fix this by rescaling individual bimatrix games in a polymatrix game, because we must maintain the relationship between the payoffs in all of the bimatrix games that a player is involved in.

To address this, we will rescale the games so that, for each player, the minimum possible payoff is , and the maximum possible payoff is 11. For each player ii, we denote by U{U} the maximum payoff he can obtain, and by L{L} the minimum payoff he can obtain. Formally:

Then, for all ii and all jN(i)j\in N(i) we will apply the following transformation, which we call T()T(\cdot), to all the entries zz of payoff matrices AijA_{ij}:

Observe that, since player ii’s payoff is the sum of d(i){d(i)} many bimatrix games, it must be the case that after transforming the payoff matrices in this way, player ii’s maximum possible payoff is 11, and player ii’s minimum possible payoff is . For the rest of this paper, we will assume that the payoff matrices given by AijA_{ij} are rescaled in this way.

2 Approximate Nash Equilibria

A strategy profile specifies a mixed strategy for every player. We denote the set of mixed strategy profiles as Δ:=Δm1××Δmn\Delta:=\Delta_{m_{1}}\times\ldots\times\Delta_{m_{n}}. Given a strategy profile x=(x1,,xn)Δ\mathbf{x}=(x_{1},\ldots,x_{n})\in\Delta, the payoff of player ii under x\mathbf{x} is the sum of the payoffs that he obtains in each of the bimatrix games that he plays. Formally, we define:

We denote by ui(xi,x)u_{i}(x^{\prime}_{i},\mathbf{x}) the payoff for player ii when he plays xix^{\prime}_{i} and the other players play according to the strategy profile x\mathbf{x}. In some cases the first argument will be xixix_{i}-x^{\prime}_{i} which may not correspond to a valid strategy for player ii but we still apply the equation as follows:

Best responses.

Let vi(x)v_{i}(\mathbf{x}) be the vector of payoffs for each pure strategy of player ii when the rest of players play strategy profile x\mathbf{x}. Formally,

The corresponding best response payoff is given by:

Equilibria.

In order to define the exact and approximate equilibria of a polymatrix game, we first define the regret that is suffered by each player under a given strategy profile. The regret function fi:Δf_{i}:\Delta\rightarrow is defined, for each player ii, as follows:

The maximum regret under a strategy profile x\mathbf{x} is given by the function f(x)f(\mathbf{x}) where:

We say that x\mathbf{x} is an ϵ\epsilon-approximate Nash equilibrium (ϵ\epsilon-NE) if we have:

and x\mathbf{x} is an exact Nash equilibrium if we have f(x)=0f(\mathbf{x})=0.

The gradient

Our goal is to apply gradient descent to the regret function ff. In this section, we formally define the gradient of ff in Definition 1, and give a reformulation of that definition in Lemma 1. In order to show that our gradient descent method terminates after a polynomial number of iterations, we actually need to use a slightly modified version of this reformulation, which we describe at the end of this section in Definition 4.

Given a point xΔ\mathbf{x}\in\Delta, a feasible direction from x\mathbf{x} is defined by any other point xΔ\mathbf{x}^{\prime}\in\Delta. This defines a line between x\mathbf{x} and x\mathbf{x}^{\prime}, and formally speaking, the direction of this line is xx\mathbf{x}^{\prime}-\mathbf{x}. In order to define the gradient of this direction, we consider the function f((1ϵ)x+ϵx)f(x)f((1-\epsilon)\cdot\mathbf{x}+\epsilon\cdot\mathbf{x}^{\prime})-f(\mathbf{x}) where ϵ\epsilon lies in the range 0ϵ10\leq\epsilon\leq 1. The gradient of this direction is given in the following definition.

Given profiles x,xΔ\mathbf{x},\mathbf{x}^{\prime}\in\Delta and ϵ\epsilon\in, we define:

Then, we define the gradient of ff at x\mathbf{x} in the direction xx\mathbf{x}^{\prime}-\mathbf{x} as:

This is the natural definition of the gradient, but it cannot be used directly in a gradient descent algorithm. We now show how this definition can be reformulated. Firstly, for each x,xΔ\mathbf{x},\mathbf{x}^{\prime}\in\Delta, and for each player i[n]i\in[n], we define:

Next we define K(x)\mathcal{K}(\mathbf{x}) to be the set of players that have maximum regret under the strategy profile x\mathbf{x}.

Given a strategy profile x\mathbf{x}, define K(x)\mathcal{K}(\mathbf{x}) as follows:

The following lemma, which is proved in Appendix A, provides our reformulation.

The gradient of ff at point x\mathbf{x} along direction xx\mathbf{x}^{\prime}-\mathbf{x} is:

In order to show that our gradient descent algorithm terminates after a polynomial number of steps, we have to use a slight modification of the formula given in Lemma 1. More precisely, in the definition of Dfi(x,x)Df_{i}(\mathbf{x},\mathbf{x}^{\prime}), we need to take the maximum over the δ\delta-best responses, rather than the best responses.

We begin by providing the definition of the δ\delta-best responses.

We now define the function Dfiδ(x,x)Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime}).

Let x,xΔ\mathbf{x},\mathbf{x}^{\prime}\in\Delta, let ϵ\epsilon\in, and let δ(0,0.5]\delta\in(0,0.5]. We define Dfiδ(x,x)Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime}) as:

Furthermore, we define Dfδ(x,x)Df^{\delta}(\mathbf{x},\mathbf{x}^{\prime}) as:

Our algorithm works by performing gradient descent using the function DfδDf^{\delta} as the gradient. Obviously, this is a different function to DfDf, and so we are not actually performing gradient descent on the gradient of ff. It is important to note that all of our proofs are in terms of DfδDf^{\delta}, and so this does not affect the correctness of our algorithm. We proved Lemma 1 in order to explain where our definition of the gradient comes from, but the correctness of our algorithm does not depend on the correctness of Lemma 1.

The algorithm

In this section, we describe our algorithm for finding a (0.5+δ)(0.5+\delta)-Nash equilibrium in a polymatrix game by gradient descent. In each iteration of the algorithm, we must find the direction of steepest descent with respect to DfδDf^{\delta}. We show that this task can be achieved by solving a linear program, and we then use this LP to formally specify our algorithm.

We show that the direction of steepest descent can be found by solving a linear program. Our goal is, for a given strategy profile x\mathbf{x}, to find another strategy profile x\mathbf{x}^{\prime} so as to minimize the gradient Dfδ(x,x)Df^{\delta}(\mathbf{x},\mathbf{x}^{\prime}). Recall that DfδDf^{\delta} is defined in Equation (9) to be:

Note that the term f(x)f(\mathbf{x}) is a constant in this expression, because it is the same for all directions x\mathbf{x}^{\prime}. Thus, it is sufficient to formulate a linear program in order to find the x\mathbf{x}^{\prime} that minimizes maxiK(x)Dfiδ(x,x)\max_{i\in\mathcal{K}(\mathbf{x})}Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime}). Using the definition of DfiδDf^{\delta}_{i} in Equation (8), we can do this as follows.

Given a strategy profile x\mathbf{x}, the steepest descent linear program is defined as follows. Find xΔ\mathbf{x}^{\prime}\in\Delta, l1,l2,,lK(x)l_{1},l_{2},\ldots,l_{|\mathcal{K}(\mathbf{x})|}, and ww such that:

Once we have found the direction of steepest descent, we then need to move in that direction. More precisely, we fix a parameter ϵ=δδ+2\epsilon=\frac{\delta}{\delta+2} which is used to determine how far we move in the steepest descent direction. We will show in Section 6 that this value of ϵ\epsilon leads to a polynomial bound on the running time of our algorithm.

The algorithm.

We can now formally describe our algorithm. The algorithm takes a parameter δ(0,0.5]\delta\in(0,0.5], which will be used as a tradeoff between running time and the quality of approximation.

Algorithm 1 1. Choose an arbitrary strategy profile xΔ\mathbf{x}\in\Delta. 2. Solve the steepest descent linear program with input x\mathbf{x} to obtain x=Q(x)\mathbf{x}^{\prime}=Q(\mathbf{x}). 3. Set x:=x+ϵ(xx)\mathbf{x}:=\mathbf{x}+\epsilon(\mathbf{x}^{\prime}-\mathbf{x}), where ϵ=δδ+2\epsilon=\frac{\delta}{\delta+2}. 4. If f(x)0.5+δf(\mathbf{x})\leq 0.5+\delta then stop, otherwise go to step 2. A single iteration of this algorithm corresponds to executing steps 2, 3, and 4. Since this only involves solving a single linear programs, it is clear that each iteration can be completed in polynomial time.

The rest of this paper is dedicated to showing the following theorem, which is our main result.

Algorithm 1 finds a (0.5+δ)(0.5+\delta)-NE after at most O(1δ2)O(\frac{1}{\delta^{2}}) iterations.

To prove Theorem 4.1, we will show two properties. Firstly, in Section 5, we show that our gradient descent algorithm never gets stuck in a stationary point before it finds a (0.5+δ)(0.5+\delta)-NE. To do so, we define the notion of a δ\delta-stationary point, and we show that every δ\delta-stationary point is at least a (0.5+δ)(0.5+\delta)-NE, which then directly implies that the gradient descent algorithm will not get stuck before it finds a (0.5+δ)(0.5+\delta)-NE.

Secondly, in Section 6, we prove the upper bound on the number of iterations. To do this we show that, if an iteration of the algorithm starts at a point that is not a δ\delta-stationary point, then that iteration will make a large enough amount of progress. This then allows us to show that the algorithm will find a (0.5+δ)(0.5+\delta)-NE after O(1δ2)O(\frac{1}{\delta^{2}}) many iterations, and therefore the overall running time of the algorithm is polynomial.

Stationary points

Recall that Definition 5 gives a linear program for finding the direction x\mathbf{x}^{\prime} that minimises Dfδ(x,x)Df^{\delta}(\mathbf{x},\mathbf{x}^{\prime}). Our steepest descent procedure is able to make progress whenever this gradient is negative, and so a stationary point is any point x\mathbf{x} for which Dfδ(x,x)0Df^{\delta}(\mathbf{x},\mathbf{x}^{\prime})\geq 0. In fact, our analysis requires us to consider δ\delta-stationary points, which we now define.

Let x\mathbf{x}^{*} be a mixed strategy profile, and let δ>0\delta>0. We have that x\mathbf{x}^{*} is a δ\delta-stationary point if for all xΔ\mathbf{x}^{\prime}\in\Delta:

We now show that every δ\delta-stationary point of f(x)f(\mathbf{x}) is a (0.5+δ)(0.5+\delta)-NE. Recall from Definition 4 that:

Therefore, if x\mathbf{x}^{*} is a δ\delta-stationary point, we must have, for every direction x\mathbf{x}^{\prime}:

Since f(x)f(\mathbf{x}^{*}) is the maximum regret under the strategy profile x\mathbf{x}^{*}, in order to show that x\mathbf{x}^{*} is a (0.5+δ)(0.5+\delta)-NE, we only have to find some direction x\mathbf{x}^{\prime} such that that maxiK(x)Dfiδ(x,x)0.5\max_{i\in\mathcal{K}(\mathbf{x})}Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime})\leq 0.5. We do this in the following lemma.

In every stationary point x\mathbf{x}^{*}, there exists a direction x\mathbf{x}^{\prime} such that:

First, define xˉ\bar{\mathbf{x}} to be a strategy profile in which each player i[n]i\in[n] plays a best response against x\mathbf{x}^{*}. We will set x=xˉ+x2\mathbf{x}^{\prime}=\frac{\bar{\mathbf{x}}+\mathbf{x}^{*}}{2}. Then for each iK(x)i\in\mathcal{K}(\mathbf{x}), we have that Dfiδ(x,x)Df^{\delta}_{i}(\mathbf{x}^{*},\mathbf{x}^{\prime}), is less than or equal to:

Thus, the point x\mathbf{x}^{\prime} satisfies maxiK(x)Dfiδ(x,x)0.5\max_{i\in\mathcal{K}(\mathbf{x})}Df^{\delta}_{i}(\mathbf{x}^{*},\mathbf{x}^{\prime})\leq 0.5. ∎

We can sum up the results of the section in the following lemma.

Every δ\delta-stationary point x\mathbf{x}^{*} is a (0.5+δ)(0.5+\delta)-Nash equilibrium.

The time complexity of the algorithm

In this section, we show that Algorithm 1 terminates after a polynomial number of iterations. Let x\mathbf{x} be a strategy profile that is considered by Algorithm 1, and let x=Q(x)\mathbf{x}^{\prime}=Q(\mathbf{x}) be the solution of the steepest descent LP for x\mathbf{x}. These two profiles will be fixed throughout this section.

We begin by proving a technical lemma that will be crucial for showing our bound on the number of iterations. To simplify our notation, throughout this section we define fnew:=f(x+ϵ(xx))f_{new}:=f(\mathbf{x}+\epsilon(\mathbf{x}^{\prime}-\mathbf{x})) and f:=f(x)f:=f(\mathbf{x}). Furthermore, we define D=maxi[n]Dfiδ(x,x)\mathcal{D}=\max_{i\in[n]}Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime}). The following lemma, which is proved in Appendix B, gives a relationship between ff and fnewf_{new}.

In every iteration of Algorithm 11 we have:

In the next lemma we prove that, if we are not in a δ\delta-stationary point, then we have a bound on the amount of progress made in each iteration. We use this in order to bound the number of iterations needed before we reach a point x\mathbf{x} where f(x)0.5+δf(\mathbf{x})\leq 0.5+\delta.

Fix ϵ=δδ+2\epsilon=\frac{\delta}{\delta+2}, where 0<δ0.50<\delta\leq 0.5. Either x\mathbf{x} is a δ\delta-stationary point or:

Recall that by Lemma 4 the gain in every iteration of the steepest descent is

Df>δ\mathcal{D}-f>-\delta. Then, by definition, we are in a δ\delta-stationary point.

Dfδ\mathcal{D}-f\leq-\delta. We have set ϵ=δδ+2\epsilon=\frac{\delta}{\delta+2}. If we solve for δ\delta we get that δ=2ϵ1ϵ\delta=\frac{2\epsilon}{1-\epsilon}. Since Dfδ\mathcal{D}-f\leq-\delta, we have that (Df)(1ϵ)2ϵ(\mathcal{D}-f)(1-\epsilon)\leq-2\epsilon. Thus we have:

Finally, using the fact that ϵ=δδ+2\epsilon=\frac{\delta}{\delta+2}, we get that

So, when the algorithm has not reached yet a δ\delta-stationary point, there is a decrease on the value of ff that is at least as large as the bound specified in (12) in every iteration of the gradient descent procedure. In the following lemma we prove that after O(1δ2)O(\frac{1}{\delta^{2}}) iterations of the steepest descent procedure the algorithm finds a point x\mathbf{x} where f(x)0.5+δf(\mathbf{x})\leq 0.5+\delta.

After O(1δ2)O(\frac{1}{\delta^{2}}) iterations of the steepest descent procedure the algorithm finds a point x\mathbf{x} where f(x)0.5+δf(\mathbf{x})\leq 0.5+\delta.

Let x1\mathbf{x}_{1}, x2\mathbf{x}_{2}, \dots, xk\mathbf{x}_{k} be the sequence of strategy profiles that are considered by Algorithm 1. Since the algorithm terminates as soon as it finds a (0.5+δ)(0.5+\delta)-NE, we have f(xi)>0.5+δf(\mathbf{x}_{i})>0.5+\delta for every i<ki<k. Therefore, for each i<ki<k we we can apply Lemma 3 to argue that xi\mathbf{x}_{i} is not a δ\delta-stationary point, which then allows us to apply Lemma 5 to obtain:

So, the amount of progress made by the algorithm in iteration ii is:

Thus, each iteration of the algorithm decreases the regret by at least (δδ+2)20.5(\frac{\delta}{\delta+2})^{2}\cdot 0.5. The algorithm starts at a point x1\mathbf{x}_{1} with f(x1)1f(\mathbf{x}_{1})\leq 1, and terminates when it reaches a point xk\mathbf{x}_{k} with f(xk)0.5+δf(\mathbf{x}_{k})\leq 0.5+\delta. Thus the total amount of progress made over all iterations of the algorithm can be at most 1(0.5+δ)1-(0.5+\delta). Therefore, the number of iterations used by the algorithm can be at most:

Since δ<1\delta<1, we have that the algorithm terminates after at most O(1δ2)O(\frac{1}{\delta^{2}}) iterations. ∎

Lemma 6 implies that that after polynomially many iterations the algorithm finds a point such that f(x)0.5+δf(\mathbf{x})\leq 0.5+\delta, and by definition such a point is a (0.5+δ)(0.5+\delta)-NE. Thus we have completed the proof of Theorem 4.1.

Application: Two-player Bayesian games

In this section, we define two-player Bayesian games, and show how our algorithm can be applied in order to efficiently find a (0.5+δ)(0.5+\delta)-Bayesian Nash equilibrium. A two-player Bayesian game is played between a row player and a column player. Each player has a set of possible types, and at the start of the game, each player is assigned a type by drawing from a known joint probability distribution. Each player learns his type, but not the type of his opponent. Our task is to find an approximate Bayesian Nash equilibrium (BNE).

We show that this can be reduced to the problem of finding an ϵ\epsilon-NE in a polymatrix game, and therefore our algorithm can be used to efficiently find a (0.5+δ)(0.5+\delta)-BNE of a two-player Bayesian game. This section is split into two parts. In the first part we formally define two-player Bayesian games, and approximate Bayesian Nash equilibria. In the second part, we give the reduction from two-player Bayesian games to polymatrix games.

We will use k1k_{1} to denote the number of pure strategies of the row player and k2k_{2} to denote the number of pure strategies of the column player. Furthermore, we will use mm to denote the number of types of the row player, and nn to denote the number of types of the column player.

For each pair of types i[m]i\in[m] and j[n]j\in[n], there is a k1×k2k_{1}\times k_{2} bimatrix game (R,C)ij:=(Rij,Cij)(R,C)_{ij}:=(R_{ij},C_{ij}) that is played when the row player has type ii and the column player has type jj. We assume that all payoffs in every matrix RijR_{ij} and every matrix CijC_{ij} lie in the range $$.

Types.

The distribution over types is specified by a joint probability distribution: for each pair of types i[m]i\in[m] and j[n]j\in[n], the probability that the row player is assigned type ii and the column player is assigned type jj is given by pijp_{ij}. Obviously, we have that:

We also define some useful shorthands: for all i[m]i\in[m] we denote by piRp^{R}_{i} (pjCp^{C}_{j}) the probability that row (column) player has type i[m]i\in[m] (j[n]j\in[n]). Formally:

Note that i=1mpiR=j=1npjC=1\sum_{i=1}^{m}p^{R}_{i}=\sum_{j=1}^{n}p^{C}_{j}=1. Furthermore, we denote by piR(j)p^{R}_{i}(j) the conditional probability that type j[n]j\in[n] will be chosen for column player given that type ii is chosen for row player. Similarly, we define pjC(i)p^{C}_{j}(i) for the column player. Formally:

We can see that for given type t=(i,j)t=(i,j) we have that pij=piRpiR(j)=pjCpjC(i)p_{ij}=p^{R}_{i}\cdot p^{R}_{i}(j)=p^{C}_{j}\cdot p^{C}_{j}(i).

Strategies.

In order to play a Bayesian game, each player must specify a strategy for each of their types. Thus, a strategy profile is a pair (x,y)(\mathbf{x},\mathbf{y}), where x=(x1,x2,,xm)\mathbf{x}=(x_{1},x_{2},\dots,x_{m}) such that each xiΔk1x_{i}\in\Delta_{k_{1}}, and where y=(y1,y2,,yn)\mathbf{y}=(y_{1},y_{2},\dots,y_{n}) such that each yiΔk2y_{i}\in\Delta_{k_{2}}. This means that, when the row player gets type i[m]i\in[m] and the column player gets type j[n]j\in[n], then the game (Rij,Cij)(R_{ij},C_{ij}) will be played, and the row player will use strategy xix_{i} while the column player will use strategy yjy_{j}.

Given a strategy profile (x,y)(\mathbf{x},\mathbf{y}), we can define the expected payoff to both players (recall that the players are not told their opponent’s type).

Given a strategy profile (x,y)(\mathbf{x},\mathbf{y}) and a type t=(i,j)t=(i,j), the expected payoff for the row player is given by:

Similarly, for the column player the expected payoff is:

Rescaling.

Before we define approximate equilibria for two-player Bayesian games, we first rescale the payoffs. Much like for polymatrix games, rescaling is needed to ensure that an ϵ\epsilon-approximate equilibrium has a consistent meaning. Our rescaling will ensure that, for every possible pair of types, both player’s expected payoff uses the entire range $$.

For each type ii of the row player, we use URi{U}^{i}_{R} to denote the maximum expected payoff for the row player when he has type ii, and we use LRi{L}^{i}_{R} to denote the minimum expected payoff for the row player when he has type ii. Formally, these are defined to be:

Then we apply the transformation TRi()T_{R}^{i}(\cdot) to every element zz of RijR_{ij}, for all types jj of the column player, where:

Similarly, we transform all payoff matrices for the column player using

where UCj{U}^{j}_{C} and LCj{L}^{j}_{C} are defined symmetrically. Note that, after this transformation has been applied, both player’s expected payoffs lie in the range $.Moreover,thefullrangeisused:thereexistsastrategyforthecolumnplayeragainstwhichoneoftherowplayersstrategieshasexpectedpayoff. Moreover, the full range is used: there exists a strategy for the column player against which one of the row player’s strategies has expected payoff1$, and there exists a strategy for the column player against which one of the row player’s strategies has expected payoff . From now on we will assume that the payoff matrices have been rescaled in this way.

We can now define approximate Bayesian Nash equilibria for a two-player Bayesian game.

Let (x,y)(\mathbf{x},\mathbf{y}) be a strategy profile. The profile (x,y)(\mathbf{x},\mathbf{y}) is an ϵ\epsilon-BNE iff the following conditions hold:

2 The reduction

In this section we reduce in polynomial time the problem of computing an ϵ\epsilon-BNE for a two-player Bayesian game B\mathcal{B} to the problem of computing an ϵ\epsilon-NE of a polymatrix game P(B)\mathcal{P(B)}. We describe the construction of P(B)\mathcal{P(B)} and prove that every ϵ\epsilon-NE for P(B)\mathcal{P(B)} maps to an ϵ\epsilon-BNE of B\mathcal{B}.

Let B\mathcal{B} be a two-player Bayesian game where the row player has mm types and k1k_{1} pure strategies and the column player has nn types and k2k_{2} pure strategies. We will construct a polymatrix game P(B)\mathcal{P(B)} as follows.

The game has m+nm+n players. We partition the set of players [m+n][m+n] into two sets: the set K={1,2,,m}K=\{1,2,\dots,m\} will represent the types of the row player in B\mathcal{B}, while the set L={m+1,m+2,,m+n}L=\{m+1,m+2,\dots,m+n\} will represent the types of the column player in B\mathcal{B}. The underlying graph that shows the interactions between the players is a complete bipartite graph G=(KL,E)G=(K\cup L,E), where every player in KK (respectively LL) plays a bimatrix game with every player in LL (respectively KK). The bimatrix game played between vertices viKv_{i}\in K and vjLv_{j}\in L is defined to be (Rij,Cij)(R^{*}_{ij},C^{*}_{ij}), where:

Observe that, for each player ii in the KK, the matrices RijR^{*}_{ij} all have the same number of rows, and for each player jLj\in L, the matrices CijC^{*}_{ij} all have the same number of columns. Thus, P(B)\mathcal{P(B)} is a valid polymatrix game. Moreover, we clearly have that P(B)\mathcal{P(B)} has the same size as the original game B\mathcal{B}. Note that, since we have assumed that the Bayesian game has been rescaled, we have that for every player in P(B)\mathcal{P(B)} the minimum (maximum) payoff achievable under pure strategy profiles is (11), so no further scaling is needed in order to apply our algorithm.

We can now prove that every ϵ\epsilon-NE of the polymatrix game is also an ϵ\epsilon-BNE of the original two-player Bayesian game, which is the main result of this section.

Every ϵ\epsilon-NE of P(B)\mathcal{P(B)} is a ϵ\epsilon-BNE for B\mathcal{B}.

Let z=(x1,,xm,y1,,yn)\mathbf{z}=(x_{1},\ldots,x_{m},y_{1},\ldots,y_{n}) be an ϵ\epsilon-NE for P(B)\mathcal{P(B)}. This mean that no player can gain more than ϵ\epsilon by unilaterally changing his strategy. We define the strategy profile (x,y)(\mathbf{x},\mathbf{y}) for B\mathcal{B} where x=(x1,,xm)\mathbf{x}=(x_{1},\dots,x_{m}) and y=(y1,,yn)\mathbf{y}=(y_{1},\dots,y_{n}), and we will show that (x,y)(\mathbf{x},\mathbf{y}) is an ϵ\epsilon-BNE for B\mathcal{B}.

Let iKi\in K be a player. Since, z\mathbf{z} is an ϵ\epsilon-NE of P(B)\mathcal{P(B)}, we have:

By construction, we can see that player ii only interacts with the players from LL. Hence his payoff can be written as:

and since we are in an ϵ\epsilon-NE, we have:

This is true for all iKi\in K, thus it is true for all i[m]i\in[m].

Similarly, every player jLj\in L interacts only with players form KK, thus:

and since we are in an ϵ\epsilon-NE we have:

and this is true for all jKj\in K, thus it is true for all j[n]j\in[n].

Combining now the fact that Equation (20) is true for all i[m]i\in[m] and that Equation (21) is true for all j[m]j\in[m], it is easy to see that the strategy profile (x,y)(\mathbf{x},\mathbf{y}) is an ϵ\epsilon-BNE for B\mathcal{B}. ∎

Applying Algorithm 1 to P(B)\mathcal{P(B)} thus gives us the following.

A (0.5+δ)(0.5+\delta)-Bayesian Nash equilibrium of a two-player Bayesian game B\mathcal{B} can be found in time polynomial in the input size of B\mathcal{B} and 1/δ1/\delta.

Conclusions and open questions

We have presented a polynomial-time algorithm that finds a (0.5+δ)(0.5+\delta)-Nash equilibrium of a polymatrix game for any δ>0\delta>0. Though we do not have examples that show that the approximation guarantee is tight for our algorithm, we do not see an obvious approach to prove a better guarantee. The initial choice of strategy profile affects our algorithm, and it is conceivable that one may be able to start the algorithm from an efficiently computable profile with certain properties that allow a better approximation guarantee. One natural special case is when there is a constant number of players, which may allow one to derive new strategy profiles from a stationary point as done by Tsaknakis and Sprirakis TS . It may also be possible to develop new techniques when the number of pure strategies available to the players is constant, or when the structure of the graph is restricted in some way. For example, in the games arising from two-player Bayesian games, the graph is always bipartite.

This paper has considered ϵ\epsilon-Nash equilibria, which are the most well-studied type of approximate equilibria. However, ϵ\epsilon-Nash equilibria have a drawback: since they only require that the expected payoff is within ϵ\epsilon of a pure best response, it is possible that a player could be required to place probability on a strategy that is arbitrarily far from being a best response. An alternative, stronger, notion is an ϵ\epsilon-well supported approximate Nash equilibrium (ϵ\epsilon-WSNE). It requires that players only place probability on strategies that have payoff within ϵ\epsilon of a pure best response. Every ϵ\epsilon-WSNE is an ϵ\epsilon-Nash, but the converse is not true. For bimatrix games, the best-known additive approximation that is achievable in polynomial time gives a \bigl{(}\frac{2}{3}-0.0047\bigr{)}-WSNE FGSS12 . It builds on the algorithm given by Kontogiannis and Spirakis that achieves a 23\frac{2}{3}-WSNE in polynomial time KS . Recently a polynomial-time algorithm with a better approximation guarantee have been given for symmetric bimatrix games CFJ14 . Note, it has been shown that there is a PTAS for finding ϵ\epsilon-WSNE of bimatrix games if and only if there is a PTAS for ϵ\epsilon-Nash DGP ; CDT . For nn-player games with n>2n>2 there has been very little work on developing algorithms for finding ϵ\epsilon-WSNE. This is a very interesting direction, both in general and when n>2n>2 is a constant.

References

Appendix A Proof of Lemma 1

Before we begin with the proof, we introduce the following notation. For a player i[n]i\in[n], given a strategy profile x\mathbf{x} and a subset of ii’s pure strategies S[mi]S\subseteq[m_{i}], we use Mi(x,S)M_{i}(\mathbf{x},S) for taking the maximum of the payoffs of ii when the others play according to x\mathbf{x}, and player ii is restricted to pick elements from SS:

In order to find the gradient, we have to calculate the variation of fif_{i} along the direction xx\mathbf{x}^{\prime}-\mathbf{x}, by evaluating f(xˉ)f(\bar{\mathbf{x}}) for points xˉ\bar{\mathbf{x}} of the form

Recall from (4), that for xˉΔ\bar{\mathbf{x}}\in\Delta we have that fi(xˉ):=ui(xˉ)ui(xˉ)f_{i}(\bar{\mathbf{x}}):=u_{i}^{*}(\bar{\mathbf{x}})-u_{i}(\bar{\mathbf{x}}). In order to rewrite ui(xˉ)u_{i}^{*}(\bar{\mathbf{x}}) we introduce notation Λi(x,x,ϵ)\Lambda_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon) as follows.

In the following technical lemma we provide an expression for ui(xˉ)u_{i}^{*}(\bar{\mathbf{x}}). In order to rewrite ui(xˉ)u_{i}^{*}(\bar{\mathbf{x}}), we use the following simple observation. Consider a multiset of numbers {a1,,an}\{a_{1},\ldots,a_{n}\}, and the index sets S[n]S\subseteq[n] and Sˉ=[n]S\bar{S}=[n]\setminus S. We have the following identity:

We will use the expression (24) for ui(xˉ)u_{i}^{*}(\bar{\mathbf{x}}), along with the following reformulation of ui(xˉ)u_{i}(\bar{\mathbf{x}}):

We now use these reformulations to prove the following lemma.

We have that fi(xˉ)f(x)f_{i}(\bar{\mathbf{x}})-f(\mathbf{x}) is equal to:

Recall from (4) that fi(x)=Mi(x,S)ui(x)f_{i}(\mathbf{x})=M_{i}(\mathbf{x},S)-u_{i}(\mathbf{x}), so the formula above is equal to:

Using now (6) for Dfi(x,x)Df_{i}(\mathbf{x},\mathbf{x}^{\prime}), the above formula becomes:

Recall now that f(x)=maxj[n]fj(x)f(\mathbf{x})=\max_{j\in[n]}f_{j}(\mathbf{x}). Thus the term f(x)fi(x)f(\mathbf{x})-f_{i}(\mathbf{x}) can be written as \max_{j\in[n]}\bigl{\{}f_{j}(\mathbf{x})-f_{i}(\mathbf{x})\bigr{\}}. So, the expression above is equivalent to

Now we are ready to prove Lemma 1. Recall from definition 1 for the gradient that

This is true from the definition of pure best response strategies. So, from equation (22) for Λi(x,x,ϵ)\Lambda_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon) it is true that limϵ0Λi(x,x,ϵ)=0\lim_{\epsilon\rightarrow 0}\Lambda_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon)=0.

Furthermore, the term ϵ2ui(xx)\epsilon^{2}\cdot u_{i}(\mathbf{x}^{\prime}-\mathbf{x}) when is divided by ϵ\epsilon equals to ϵui(xx)\epsilon\cdot u_{i}(\mathbf{x}^{\prime}-\mathbf{x}), thus \lim_{\epsilon\rightarrow 0}\bigl{(}\epsilon\cdot u_{i}(\mathbf{x}^{\prime}-\mathbf{x})\bigr{)}=0.

is either 0 when fi(x)=f(x)f_{i}(\mathbf{x})=f(\mathbf{x}), i.e player ii has the maximum regret and \max_{j\in[n]}\bigl{\{}f_{j}(\mathbf{x})-f_{i}(\mathbf{x})\bigr{\}}=0, or -\infty otherwise, because \max_{j\in[n]}\bigl{\{}f_{j}(\mathbf{x})-f_{i}(\mathbf{x})\bigr{\}}>0.

To sum up, if fi(x)f_{i}(\mathbf{x}) achieves the maximum regret at point x\mathbf{x}^{\prime}, then the limit \lim_{\epsilon\rightarrow 0}\bigl{(}f_{i}(\bar{\mathbf{x}})-f(\mathbf{x})\bigr{)}=Df_{i}(\mathbf{x},\mathbf{x}^{\prime})-f(\mathbf{x}), otherwise the limit equals -\infty.

From (26) for the gradient we want the maximum of these quantities, thus we have the claimed result.

Appendix B Proof of Lemma 4

Throughout this proof, x,x,xˉ\mathbf{x},\mathbf{x}^{\prime},\bar{\mathbf{x}}, and ϵ\epsilon will be fixed as they are defined in Section 6. In order to prove this lemma, we must show a bound on:

Before we start the analysis we need to redefine the term Λiδ(x,x,ϵ)\Lambda^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon) in order to prove an analogous version of Lemma 7 when δ\delta-best responses are used.

We define Λiδ(x,x,ϵ)\Lambda^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon) as:

We now use this definition to prove the following lemma.

We will use the reformulation from Equation (25) for ui(xˉ)u_{i}(\bar{\mathbf{x}}):

The correctness of this was proved in Appendix A. Now we use all the these reformulations in order to prove the following lemma.

We have that fi(xˉ)f(x)f_{i}(\bar{\mathbf{x}})-f(\mathbf{x}) is less than or equal to:

Recall that, by definition, we have that:

Thus, we can apply Lemma 9 along with the reformulation given in Equation (29) for ui(xˉ)u_{i}(\bar{\mathbf{x}}) to prove that fi(xˉ)f(x)f_{i}(\bar{\mathbf{x}})-f(\mathbf{x}) is less than or equal to:

Having shown Lemma 10, we will now study each term of (30) and provide bounds for each of them. To begin with, it is easy to see that for all i[n]i\in[n] we have that \max_{j\in[n]}\bigl{\{}f_{j}(\mathbf{x})-f_{i}(\mathbf{x})\bigr{\}}\geq 0, and since ϵ<1\epsilon<1, we have that (1-\epsilon)\max_{j\in[n]}\bigl{\{}f_{j}(\mathbf{x})-f_{i}(\mathbf{x})\bigr{\}}\geq 0. Thus, Equation (30) is less than or equal to:

Next we consider the term Λiδ(x,x,ϵ)\Lambda^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon). In the following technical lemma we prove that Λiδ(x,x,ϵ)=0\Lambda^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon)=0 for all i[n]i\in[n].

We have Λiδ(x,x,ϵ)=0\Lambda^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon)=0 for all i[n]i\in[n].

According to equation (27) for Λiδ(x,x,ϵ)\Lambda^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon), we have:

We can rewrite this expression as follows. First define:

We now substitute these two bounds into the definition of Z(x,x,ϵ,k)Z(\mathbf{x},\mathbf{x}^{\prime},\epsilon,k). We have:

Next we consider the term ui(xx)u_{i}(\mathbf{x}^{\prime}-\mathbf{x}) in Equation (31). The following lemma provides a simple lower bound for this term.

For all i[n]i\in[n], we have Dfiδ(x,x)1ui(xx)Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime})-1\leq u_{i}(\mathbf{x}^{\prime}-\mathbf{x}).

For ui(xx)u_{i}(\mathbf{x}^{\prime}-\mathbf{x}) we have the following:

Recall that D=maxi[n]Dfiδ(x,x)\mathcal{D}=\max_{i\in[n]}Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime}) and fnew=f(xˉ)f_{new}=f(\bar{\mathbf{x}}) and f=f(x)f=f(\mathbf{x}). We can now apply the bounds from Lemma 11 and Lemma 12 to Equation (31) to obtain: