Computing Approximate Nash Equilibria in Polymatrix Games

Argyrios Deligkas, John Fearnley, Rahul Savani, Paul Spirakis

Introduction

Nash equilibria are the central solution concept in game theory. Since it is known that computing an exact Nash equilibrium DGP ; CDT is unlikely to be achievable in polynomial time, a line of work has arisen that studies the computational aspects of approximate Nash equilibria. The most widely studied notion is of an $\epsilon$ -approximate Nash equilibrium ( $\epsilon$ -Nash), which requires that all players have an expected payoff that is within $\epsilon$ of a best response. This is an additive notion of approximate equilibrium; the problem of computing approximate equilibria of bimatrix games using a relative notion of approximation is known to be $\mathtt{PPAD}$ -hard even for constant approximations Das13 .

So far, $\epsilon$-Nash equilibria have mainly been studied in the context of two-player bimatrix games. A line of work DMP ; Progress ; BBM10 has investigated the best $\epsilon$ that can be guaranteed in polynomial time for bimatrix games. The current best result, due to Tsaknakis and Spirakis TS , is a polynomial-time algorithm that finds a 0.3393-Nash equilibrium of a bimatrix game with all payoffs in $[0,1]$.

In this paper, we study $\epsilon$-Nash equilibria in the context of many-player games, a topic that has received much less attention. A simple approximation algorithm for many-player games can be obtained by generalising the algorithm of Daskalakis, Mehta and Papadimitriou DMP from the two-player setting to the $n$-player setting, which provides a guarantee of $\epsilon=1-\frac{1}{n}$. This has since been improved independently by three sets of authors BGR08 ; HRS08 ; BBM10 . They provide a method that converts a polynomial-time algorithm for finding $\epsilon$-Nash equilibria in $(n-1)$-player games into an algorithm that finds a $\frac{1}{2-\epsilon}$-Nash equilibrium in $n$-player games. Using the polynomial-time $0.3393$ algorithm of Tsaknakis and Spirakis TS for $2$-player games as the base case for this recursion, this allows us to provide polynomial-time algorithms with approximation guarantees of $0.6022$ in $3$-player games, and $0.7153$ in $4$-player games. These guarantees tend to $1$ as $n$ increases, and so far, no constant $\epsilon<1$ is known such that, for all $n$, an $\epsilon$-Nash equilibrium of an $n$-player game can be computed in polynomial time.
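The recursion $\epsilon\mapsto\frac{1}{2-\epsilon}$ is easy to evaluate numerically. The following minimal Python sketch (the function name `lifted_guarantee` is ours and purely illustrative) applies the map $n-2$ times to a two-player guarantee, which illustrates how the resulting guarantees creep towards $1$ as the number of players grows.

```python
def lifted_guarantee(eps_two_player: float, n: int) -> float:
    """Guarantee for n-player games obtained by applying
    eps -> 1 / (2 - eps) once per additional player beyond two."""
    if n < 2:
        raise ValueError("need at least two players")
    eps = eps_two_player
    for _ in range(n - 2):
        eps = 1.0 / (2.0 - eps)
    return eps


if __name__ == "__main__":
    # Base case: the 0.3393 guarantee for bimatrix games.
    for n in (2, 3, 4, 10, 100):
        print(n, lifted_guarantee(0.3393, n))
```

The fixed point of the map is $\epsilon=1$, so the sequence is increasing and bounded by $1$ but never yields a constant below $1$ that holds for all $n$.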

For $n$ -player games, we have lower bounds for $\epsilon$ -Nash equilibria. More precisely, Rubinstein has shown that when $n$ is not a constant there exists a constant but very small $\epsilon$ such that it is $\mathtt{PPAD}$ -hard to compute an $\epsilon$ -Nash equilibrium Rub14b . This is quite different from the bimatrix game setting, where the existence of a quasi-polynomial time approximation scheme rules out such a lower bound, unless all of $\mathtt{PPAD}$ can be solved in quasi-polynomial time LMM03 .

Polymatrix games.

In this paper, we focus on a particular class of many-player games called polymatrix games. In a polymatrix game, the interaction between the players is specified by an $n$ vertex graph, where each vertex represents one of the players. Each edge of the graph specifies a bimatrix game that will be played by the two respective players, and thus a player with degree $d$ will play $d$ bimatrix games simultaneously. More precisely, each player picks a strategy, and then plays this strategy in all of the bimatrix games that he is involved in. His payoff is then the sum of the payoffs that he obtains in each of the games.

Polymatrix games are a class of succinctly represented $n$ -player games: a polymatrix game is specified by at most $n^{2}$ bimatrix games, each of which can be written down in quadratic space with respect to the number of strategies. This is unlike general $n$ -player strategic form games, which require a representation that is exponential in the number of players.

There has been relatively little work on approximation algorithms for polymatrix games. The approximation algorithms for general games can be applied in this setting in an obvious way, but to the best of our knowledge there have been no upper bounds that are specific to polymatrix games. On the other hand, the lower bound of Rubinstein mentioned above is actually proved by constructing polymatrix games. Thus, there is a constant but very small $\epsilon$ such that it is $\mathtt{PPAD}$ -hard to compute an $\epsilon$ -Nash equilibrium Rub14b , and this again indicates that approximating polymatrix games is quite different to approximating bimatrix games.

Our contribution.

Our main result is an algorithm that, for every $\delta$ in the range $0<\delta\leq 0.5$ , finds a $(0.5+\delta)$ -Nash equilibrium of a polymatrix game in time polynomial in the input size and $\frac{1}{\delta}$ . Note that our approximation guarantee does not depend on the number of players, which is a property that was not previously known to be achievable for polymatrix games, and still cannot be achieved for general strategic form games.

We prove this result by adapting the algorithm of Tsaknakis and Spirakis TS (henceforth referred to as the TS algorithm). They give a gradient descent algorithm for finding a $0.3393$-Nash equilibrium in a bimatrix game. We generalise their gradient descent techniques to the polymatrix setting, and show that the resulting algorithm always arrives at a $(0.5+\delta)$-Nash equilibrium after a polynomial number of iterations.

In order to generalise the TS algorithm, we had to overcome several issues. Firstly, the TS algorithm makes the regrets of the two players equal in every iteration, but there is no obvious way to achieve this in the polymatrix setting. Instead, we show how gradient descent can be applied to a strategy profile where the regrets are not necessarily equal. Secondly, the output of the TS algorithm is either a point found by gradient descent, or a point obtained by modifying the result of gradient descent. In the polymatrix game setting, it is not immediately obvious how such a modification can be derived with a non-constant number of players (without an exponential blowup). Thus we apply a different analysis, which proves that the point resulting from gradient descent always has our approximation guarantee. It is an interesting open question whether a better approximation guarantee can be achieved when there is a constant number of players.

An interesting feature of our algorithm is that it can be applied even when players have differing degrees. Originally, polymatrix games were defined only for complete graphs H72 . Since previous work has only considered lower bounds for polymatrix games, it has been sufficient to restrict attention to regular graphs, as in the work of Rubinstein Rub14b . However, since this paper is proving an upper bound, we must be more careful. As it turns out, our algorithm will efficiently find a $(0.5+\delta)$-Nash equilibrium for all $\delta>0$, no matter what graph structure the polymatrix game has.

Finally, we show that our algorithm can be applied to two-player Bayesian games. In a two-player Bayesian game, each player is assigned a type according to a publicly known probability distribution. Each player knows their own type, but does not know the type of their opponent. We show that finding an $\epsilon$ -Nash equilibrium in these games can be reduced to the problem of finding an $\epsilon$ -Nash equilibrium in a polymatrix game, and therefore, our algorithm can be used to efficiently find a $(0.5+\delta)$ -Nash equilibrium of a two-player Bayesian game.

Related work.

An FPTAS for the problem of computing an $\epsilon$-Nash equilibrium of a bimatrix game does not exist unless every problem in $\mathtt{PPAD}$ can be solved in polynomial time CDT . Arguably, the biggest open question in equilibrium computation is whether there exists a PTAS for this problem. As we have mentioned, for any constant $\epsilon>0$, there does exist a quasi-polynomial-time algorithm for computing an $\epsilon$-Nash equilibrium of a bimatrix game, or any game with a constant number of players LMM03 ; BBP14 , with running time $k^{O(\log k)}$ for a $k\times k$ bimatrix game. Consequently, in contrast to the many-player case, it is not believed that there exists a constant $\epsilon$ such that the problem of computing an $\epsilon$-Nash equilibrium of a bimatrix game (or any game with a constant number of players) is $\mathtt{PPAD}$-hard, since it seems unlikely that all problems in $\mathtt{PPAD}$ have quasi-polynomial-time algorithms. On the other hand, for multi-player games, as mentioned above, there is a small constant $\epsilon$ such that it is $\mathtt{PPAD}$-hard to compute an $\epsilon$-Nash equilibrium of an $n$-player game when $n$ is not constant. One positive result we do have for multi-player games is that there is a PTAS for anonymous games (where the identity of players does not matter) when the number of strategies is constant DP14 .

Polymatrix games have played a central role in the reductions that have been used to show $\mathtt{PPAD}$ -hardness of games and other equilibrium problems DGP ; CDT ; EY10 ; FT-C10 ; CPY13 . Computing an exact Nash equilibrium in a polymatrix game is $\mathtt{PPAD}$ -hard even when all the bimatrix games played are either zero-sum games or coordination games CD11 . Polymatrix games have been used in other contexts too. For example, Govindan and Wilson proposed a (non-polynomial-time) algorithm for computing Nash equilibria of an $n$ -player game, by approximating the game with a sequence of polymatrix games GW04 . Later, they presented a (non-polynomial) reduction that reduces $n$ -player games to polymatrix games while preserving approximate Nash equilibria GW10 . Their reduction introduces a central coordinator player, who interacts bilaterally with every player.

Preliminaries

We start by fixing some notation. We use $[k]$ to denote the set of integers $\{1,2,\ldots,k\}$, and when a universe $[k]$ is clear, we will use $\bar{S}=\{i\in[k]\;:\;i\notin S\}$ to denote the complement of $S\subseteq[k]$. For a $k$-dimensional vector $x$, we use $x_{-S}$ to denote the elements of $x$ with indices $\bar{S}$, and in the case where $S=\{i\}$ has only one element, we simply write $x_{-i}$ for $x_{-S}$.

An $n$ -player polymatrix game is defined by an undirected graph $(V,E)$ with $n$ vertices, where every vertex corresponds to a player. The edges of the graph specify which players interact with each other. For each $i\in[n]$ , we use $N(i)=\{j\;:\;(i,j)\in E\}$ to denote the neighbours of player $i$ .

Each edge $(i,j)\in E$ specifies that a bimatrix game will be played between players $i$ and $j$. Each player $i\in[n]$ has a fixed number of pure strategies $m_{i}$, and the bimatrix game on edge $(i,j)\in E$ will therefore be specified by an $m_{i}\times m_{j}$ matrix $A_{ij}$, which gives the payoffs for player $i$, and an $m_{j}\times m_{i}$ matrix $A_{ji}$, which gives the payoffs for player $j$. We allow the individual payoffs in each matrix to be an arbitrary (even negative) rational number. As we describe in the next subsection, we will rescale these payoffs so that the overall payoff to each player lies in the range $[0,1]$.

1 Payoff Normalization

Before we continue, we must first discuss how the payoffs in the game are rescaled. It is common, when proving results about additive notions of approximate equilibria, to rescale the payoffs of the game. This is necessary in order for different results to be comparable. For example, all results about additive approximate equilibria in bimatrix games assume that the payoff matrices have entries in the range $[0,1]$, and therefore an $\epsilon$-Nash equilibrium always has a consistent meaning. For the same reason, we must rescale the payoffs in a polymatrix game in order to give a consistent meaning to an $\epsilon$-approximation.

An initial, naive approach would be to specify that each of the individual bimatrix games has entries in the range $[0,1]$. This would be sufficient if we were only interested in polymatrix games played on either complete graphs or regular graphs. However, in this model, if the players have differing degrees, then they also have differing maximum payoffs. This means that an additive approximate equilibrium must pay more attention to high degree players, as they can have larger regrets.

One solution to this problem, which was adopted in the conference version of this paper DFSS14 , is to rescale according to the degree. That is, given a polymatrix game where each bimatrix game has payoffs in the range $[0,1]$, if a player has degree $d$, then each of his payoff matrices is divided by $d$. This transformation ensures that every player has regret in the range $[0,1]$, and therefore low degree players are not unfairly treated by additive approximations.

However, rescaling according to the degree assumes that each bimatrix game actually uses the full range of payoffs in $[0,1]$. In particular, some bimatrix games may have minimum payoff strictly greater than $0$, or maximum payoff strictly less than $1$. This issue arises, in particular, in our application of two-player Bayesian games. Note that, unlike the case of a single bimatrix game, we cannot fix this by rescaling individual bimatrix games in a polymatrix game, because we must maintain the relationship between the payoffs in all of the bimatrix games that a player is involved in.

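To address this, we will rescale the games so that, for each player, the minimum possible payoff is $0$, and the maximum possible payoff is $1$. For each player $i$, we denote by $U$ the maximum payoff he can obtain, and by $L$ the minimum payoff he can obtain. Formally:

$$U:=\max_{p\in[m_{i}]}\sum_{j\in N(i)}\max_{q\in[m_{j}]}\bigl(A_{ij}\bigr)_{p,q},\qquad L:=\min_{p\in[m_{i}]}\sum_{j\in N(i)}\min_{q\in[m_{j}]}\bigl(A_{ij}\bigr)_{p,q}.$$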

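Then, for all $i$ and all $j\in N(i)$ we will apply the following transformation, which we call $T(\cdot)$, to all the entries $z$ of payoff matrices $A_{ij}$, where $d(i)=|N(i)|$ denotes the degree of player $i$:

$$T(z)=\frac{1}{U-L}\cdot\left(z-\frac{L}{d(i)}\right).$$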

Observe that, since player $i$’s payoff is the sum of $d(i)$ many bimatrix games, it must be the case that after transforming the payoff matrices in this way, player $i$’s maximum possible payoff is $1$, and player $i$’s minimum possible payoff is $0$. For the rest of this paper, we will assume that the payoff matrices given by $A_{ij}$ are rescaled in this way.
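The rescaling can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the maximum and minimum payoffs are taken over pure strategy profiles, the transformation is the affine map $z\mapsto\frac{1}{U-L}\bigl(z-\frac{L}{d(i)}\bigr)$, and the function names `payoff_range` and `normalize_player` are hypothetical. The input is the list of payoff matrices $A_{ij}$ of a single player $i$, one per neighbour $j$.

```python
def payoff_range(matrices):
    """Return (U, L): the largest and smallest total payoff the player can get.

    matrices[t][p][q] is the payoff on the t-th edge when the player uses
    pure strategy p and that neighbour uses pure strategy q.
    """
    m_i = len(matrices[0])
    U = max(sum(max(A[p]) for A in matrices) for p in range(m_i))
    L = min(sum(min(A[p]) for A in matrices) for p in range(m_i))
    return U, L


def normalize_player(matrices):
    """Rescale every entry z to (z - L / d) / (U - L), where d is the degree."""
    U, L = payoff_range(matrices)
    if U == L:
        raise ValueError("payoff is constant, so there is nothing to rescale")
    d = len(matrices)
    return [[[(z - L / d) / (U - L) for z in row] for row in A] for A in matrices]


# A degree-2 player with two pure strategies and arbitrary (even negative) payoffs.
example = [[[2, 4], [0, 6]], [[-1, 1], [3, 5]]]
rescaled = normalize_player(example)
```

Individual entries of the rescaled matrices may still be negative; only the total payoff over all edges is guaranteed to lie between $0$ and $1$.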

2 Approximate Nash Equilibria

A strategy profile specifies a mixed strategy for every player. We denote the set of mixed strategy profiles as $\Delta:=\Delta_{m_{1}}\times\ldots\times\Delta_{m_{n}}$. Given a strategy profile $\mathbf{x}=(x_{1},\ldots,x_{n})\in\Delta$, the payoff of player $i$ under $\mathbf{x}$ is the sum of the payoffs that he obtains in each of the bimatrix games that he plays. Formally, we define:

$$u_{i}(\mathbf{x}):=x_{i}^{T}\sum_{j\in N(i)}A_{ij}x_{j}.$$

We denote by $u_{i}(x^{\prime}_{i},\mathbf{x})$ the payoff for player $i$ when he plays $x^{\prime}_{i}$ and the other players play according to the strategy profile $\mathbf{x}$. In some cases the first argument will be $x_{i}-x^{\prime}_{i}$, which may not correspond to a valid strategy for player $i$, but we still apply the equation as follows:

$$u_{i}(x_{i}-x^{\prime}_{i},\mathbf{x}):=(x_{i}-x^{\prime}_{i})^{T}\sum_{j\in N(i)}A_{ij}x_{j}.$$

Best responses.

Let $v_{i}(\mathbf{x})$ be the vector of payoffs for each pure strategy of player $i$ when the rest of the players play strategy profile $\mathbf{x}$. Formally,

$$v_{i}(\mathbf{x}):=\sum_{j\in N(i)}A_{ij}x_{j}.$$

The corresponding best response payoff is given by:

$$u^{*}_{i}(\mathbf{x}):=\max_{k\in[m_{i}]}\bigl(v_{i}(\mathbf{x})\bigr)_{k}.$$

Equilibria.

In order to define the exact and approximate equilibria of a polymatrix game, we first define the regret that is suffered by each player under a given strategy profile. The regret function $f_{i}:\Delta\rightarrow[0,1]$ is defined, for each player $i$, as follows:

$$f_{i}(\mathbf{x}):=\max_{k\in[m_{i}]}\bigl(v_{i}(\mathbf{x})\bigr)_{k}-u_{i}(\mathbf{x}).$$

The maximum regret under a strategy profile $\mathbf{x}$ is given by the function $f(\mathbf{x})$ where:

$$f(\mathbf{x}):=\max\{f_{1}(\mathbf{x}),\ldots,f_{n}(\mathbf{x})\}.$$

We say that $\mathbf{x}$ is an $\epsilon$-approximate Nash equilibrium ($\epsilon$-NE) if we have:

$$f(\mathbf{x})\leq\epsilon,$$

and $\mathbf{x}$ is an exact Nash equilibrium if we have $f(\mathbf{x})=0$ .
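The definitions of this subsection translate directly into code. The sketch below is illustrative only: it assumes the game is stored as a dictionary `A` that maps an ordered pair `(i, j)` to the payoff matrix $A_{ij}$ (rows indexed by player $i$'s pure strategies), and all function names are our own.

```python
def payoff_vector(A, x, i):
    """v_i(x): payoff of each pure strategy of player i against profile x."""
    v = [0.0] * len(x[i])
    for (a, j), M in A.items():
        if a == i:
            for p, row in enumerate(M):
                v[p] += sum(r * xj for r, xj in zip(row, x[j]))
    return v


def regret(A, x, i):
    """f_i(x): best response payoff minus the payoff actually obtained."""
    v = payoff_vector(A, x, i)
    return max(v) - sum(xi * vi for xi, vi in zip(x[i], v))


def max_regret(A, x):
    """f(x): the largest regret over all players."""
    return max(regret(A, x, i) for i in range(len(x)))


def is_eps_nash(A, x, eps):
    return max_regret(A, x) <= eps


# Path 0 - 1 - 2. Player 1 has degree 2, so each of its matrices is scaled
# so that its total payoff stays in [0, 1].
A = {
    (0, 1): [[1, 0], [0, 1]],
    (1, 0): [[0, 0.5], [0.5, 0]],
    (1, 2): [[0.5, 0], [0, 0.5]],
    (2, 1): [[1, 0], [0, 1]],
}
all_first = [[1, 0], [1, 0], [1, 0]]
middle_switches = [[1, 0], [0, 1], [1, 0]]
```

In this toy game the profile `all_first` leaves no player with any regret, while in `middle_switches` player 0 fails to match player 1 and suffers the maximum regret.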

The gradient

Our goal is to apply gradient descent to the regret function $f$ . In this section, we formally define the gradient of $f$ in Definition 1, and give a reformulation of that definition in Lemma 1. In order to show that our gradient descent method terminates after a polynomial number of iterations, we actually need to use a slightly modified version of this reformulation, which we describe at the end of this section in Definition 4.

Given a point $\mathbf{x}\in\Delta$ , a feasible direction from $\mathbf{x}$ is defined by any other point $\mathbf{x}^{\prime}\in\Delta$ . This defines a line between $\mathbf{x}$ and $\mathbf{x}^{\prime}$ , and formally speaking, the direction of this line is $\mathbf{x}^{\prime}-\mathbf{x}$ . In order to define the gradient of this direction, we consider the function $f((1-\epsilon)\cdot\mathbf{x}+\epsilon\cdot\mathbf{x}^{\prime})-f(\mathbf{x})$ where $\epsilon$ lies in the range $0\leq\epsilon\leq 1$ . The gradient of this direction is given in the following definition.

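Given profiles $\mathbf{x},\mathbf{x}^{\prime}\in\Delta$ and $\epsilon\in[0,1]$, we define:

$$\bar{\mathbf{x}}:=(1-\epsilon)\cdot\mathbf{x}+\epsilon\cdot\mathbf{x}^{\prime}.$$

Then, we define the gradient of $f$ at $\mathbf{x}$ in the direction $\mathbf{x}^{\prime}-\mathbf{x}$ as:

$$Df(\mathbf{x},\mathbf{x}^{\prime}):=\lim_{\epsilon\rightarrow 0}\frac{1}{\epsilon}\bigl(f(\bar{\mathbf{x}})-f(\mathbf{x})\bigr).$$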

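This is the natural definition of the gradient, but it cannot be used directly in a gradient descent algorithm. We now show how this definition can be reformulated. Firstly, for each $\mathbf{x},\mathbf{x}^{\prime}\in\Delta$, and for each player $i\in[n]$, we define:

$$Df_{i}(\mathbf{x},\mathbf{x}^{\prime}):=\max_{k\in Br_{i}(\mathbf{x})}\bigl(v_{i}(\mathbf{x}^{\prime})\bigr)_{k}-u_{i}(x_{i},\mathbf{x}^{\prime})-u_{i}(x^{\prime}_{i},\mathbf{x})+u_{i}(x_{i},\mathbf{x}),$$

where $Br_{i}(\mathbf{x})$ denotes the set of pure best responses of player $i$ against $\mathbf{x}$.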

Next we define $\mathcal{K}(\mathbf{x})$ to be the set of players that have maximum regret under the strategy profile $\mathbf{x}$ .

Given a strategy profile $\mathbf{x}$, define $\mathcal{K}(\mathbf{x})$ as follows:

$$\mathcal{K}(\mathbf{x}):=\{i\in[n]\;:\;f_{i}(\mathbf{x})=f(\mathbf{x})\}.$$

The following lemma, which is proved in Appendix A, provides our reformulation.

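The gradient of $f$ at point $\mathbf{x}$ along direction $\mathbf{x}^{\prime}-\mathbf{x}$ is:

$$Df(\mathbf{x},\mathbf{x}^{\prime})=\max_{i\in\mathcal{K}(\mathbf{x})}Df_{i}(\mathbf{x},\mathbf{x}^{\prime})-f(\mathbf{x}).$$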

In order to show that our gradient descent algorithm terminates after a polynomial number of steps, we have to use a slight modification of the formula given in Lemma 1. More precisely, in the definition of $Df_{i}(\mathbf{x},\mathbf{x}^{\prime})$ , we need to take the maximum over the $\delta$ -best responses, rather than the best responses.

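We begin by providing the definition of the $\delta$-best responses. Let $\mathbf{x}\in\Delta$ and let $\delta\in(0,0.5]$. The set of $\delta$-best responses of player $i$ against $\mathbf{x}$ consists of the pure strategies whose payoff is within $\delta$ of a best response:

$$Br^{\delta}_{i}(\mathbf{x}):=\Bigl\{k\in[m_{i}]\;:\;\bigl(v_{i}(\mathbf{x})\bigr)_{k}\geq\max_{l\in[m_{i}]}\bigl(v_{i}(\mathbf{x})\bigr)_{l}-\delta\Bigr\}.$$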

We now define the function $Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime})$ .

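Let $\mathbf{x},\mathbf{x}^{\prime}\in\Delta$, let $\epsilon\in[0,1]$, and let $\delta\in(0,0.5]$. We define $Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime})$ as:

$$Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime}):=\max_{k\in Br^{\delta}_{i}(\mathbf{x})}\bigl(v_{i}(\mathbf{x}^{\prime})\bigr)_{k}-u_{i}(x_{i},\mathbf{x}^{\prime})-u_{i}(x^{\prime}_{i},\mathbf{x})+u_{i}(x_{i},\mathbf{x}),$$

where $Br^{\delta}_{i}(\mathbf{x})$ denotes the set of $\delta$-best responses of player $i$ against $\mathbf{x}$.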

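Furthermore, we define $Df^{\delta}(\mathbf{x},\mathbf{x}^{\prime})$ as:

$$Df^{\delta}(\mathbf{x},\mathbf{x}^{\prime}):=\max_{i\in\mathcal{K}(\mathbf{x})}Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime})-f(\mathbf{x}).$$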

Our algorithm works by performing gradient descent using the function $Df^{\delta}$ as the gradient. Obviously, this is a different function to $Df$ , and so we are not actually performing gradient descent on the gradient of $f$ . It is important to note that all of our proofs are in terms of $Df^{\delta}$ , and so this does not affect the correctness of our algorithm. We proved Lemma 1 in order to explain where our definition of the gradient comes from, but the correctness of our algorithm does not depend on the correctness of Lemma 1.
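To make the quantity $Df^{\delta}$ concrete, the following Python sketch evaluates it for a given pair of profiles. It is a sketch under stated assumptions, not the authors' implementation: we assume that $Df^{\delta}_{i}$ is the maximum of $v_{i}(\mathbf{x}^{\prime})$ over the $\delta$-best responses against $\mathbf{x}$, minus $u_{i}(x_{i},\mathbf{x}^{\prime})$ and $u_{i}(x^{\prime}_{i},\mathbf{x})$, plus $u_{i}(x_{i},\mathbf{x})$; the dictionary representation `A[(i, j)]` and all function names are hypothetical.

```python
def payoff_vector(A, x, i):
    """v_i(x) for a game stored as A[(i, j)] = payoff matrix of i against j."""
    v = [0.0] * len(x[i])
    for (a, j), M in A.items():
        if a == i:
            for p, row in enumerate(M):
                v[p] += sum(r * xj for r, xj in zip(row, x[j]))
    return v


def dot(a, b):
    return sum(s * t for s, t in zip(a, b))


def delta_best_responses(A, x, i, delta):
    """Pure strategies of player i whose payoff is within delta of the best."""
    v = payoff_vector(A, x, i)
    return [k for k, val in enumerate(v) if val >= max(v) - delta]


def Df_delta_i(A, x, xp, i, delta):
    vx, vxp = payoff_vector(A, x, i), payoff_vector(A, xp, i)
    best = max(vxp[k] for k in delta_best_responses(A, x, i, delta))
    # u_i(x_i, x') = x_i . v_i(x'),  u_i(x'_i, x) = x'_i . v_i(x)
    return best - dot(x[i], vxp) - dot(xp[i], vx) + dot(x[i], vx)


def Df_delta(A, x, xp, delta, tol=1e-12):
    regrets = []
    for i in range(len(x)):
        v = payoff_vector(A, x, i)
        regrets.append(max(v) - dot(x[i], v))
    f = max(regrets)
    K = [i for i, r in enumerate(regrets) if f - r <= tol]  # max-regret players
    return max(Df_delta_i(A, x, xp, i, delta) for i in K) - f


# Two players: player 0 wants to match, player 1 wants to mismatch.
A = {(0, 1): [[1, 0], [0, 1]], (1, 0): [[0, 1], [1, 0]]}
x = [[1, 0], [1, 0]]    # player 1 has regret 1 here
xp = [[1, 0], [0, 1]]   # direction: player 1 moves to its best response
```

Moving player 1 towards its best response lowers the maximum regret at unit rate in this example, and the zero direction $\mathbf{x}^{\prime}=\mathbf{x}$ always has value $0$.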

The algorithm

In this section, we describe our algorithm for finding a $(0.5+\delta)$ -Nash equilibrium in a polymatrix game by gradient descent. In each iteration of the algorithm, we must find the direction of steepest descent with respect to $Df^{\delta}$ . We show that this task can be achieved by solving a linear program, and we then use this LP to formally specify our algorithm.

We show that the direction of steepest descent can be found by solving a linear program. Our goal is, for a given strategy profile $\mathbf{x}$, to find another strategy profile $\mathbf{x}^{\prime}$ so as to minimize the gradient $Df^{\delta}(\mathbf{x},\mathbf{x}^{\prime})$. Recall that $Df^{\delta}$ is defined in Equation (9) to be:

$$Df^{\delta}(\mathbf{x},\mathbf{x}^{\prime})=\max_{i\in\mathcal{K}(\mathbf{x})}Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime})-f(\mathbf{x}).$$

Note that the term $f(\mathbf{x})$ is a constant in this expression, because it is the same for all directions $\mathbf{x}^{\prime}$ . Thus, it is sufficient to formulate a linear program in order to find the $\mathbf{x}^{\prime}$ that minimizes $\max_{i\in\mathcal{K}(\mathbf{x})}Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime})$ . Using the definition of $Df^{\delta}_{i}$ in Equation (8), we can do this as follows.

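Given a strategy profile $\mathbf{x}$, the steepest descent linear program is defined as follows. Find $\mathbf{x}^{\prime}\in\Delta$, $l_{1},l_{2},\ldots,l_{|\mathcal{K}(\mathbf{x})|}$, and $w$ such that:

$$\begin{aligned}\text{minimize}\quad & w\\ \text{subject to}\quad & \bigl(v_{i}(\mathbf{x}^{\prime})\bigr)_{k}\leq l_{i} && \forall i\in\mathcal{K}(\mathbf{x}),\ \forall k\in Br^{\delta}_{i}(\mathbf{x}),\\ & l_{i}-u_{i}(x_{i},\mathbf{x}^{\prime})-u_{i}(x^{\prime}_{i},\mathbf{x})+u_{i}(x_{i},\mathbf{x})\leq w && \forall i\in\mathcal{K}(\mathbf{x}),\\ & \mathbf{x}^{\prime}\in\Delta.\end{aligned}$$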

Once we have found the direction of steepest descent, we then need to move in that direction. More precisely, we fix a parameter $\epsilon=\frac{\delta}{\delta+2}$ which is used to determine how far we move in the steepest descent direction. We will show in Section 6 that this value of $\epsilon$ leads to a polynomial bound on the running time of our algorithm.

The algorithm.

We can now formally describe our algorithm. The algorithm takes a parameter $\delta\in(0,0.5]$ , which will be used as a tradeoff between running time and the quality of approximation.

Algorithm 1

1. Choose an arbitrary strategy profile $\mathbf{x}\in\Delta$.
2. Solve the steepest descent linear program with input $\mathbf{x}$ to obtain $\mathbf{x}^{\prime}=Q(\mathbf{x})$.
3. Set $\mathbf{x}:=\mathbf{x}+\epsilon(\mathbf{x}^{\prime}-\mathbf{x})$, where $\epsilon=\frac{\delta}{\delta+2}$.
4. If $f(\mathbf{x})\leq 0.5+\delta$ then stop, otherwise go to step 2.

A single iteration of this algorithm corresponds to executing steps 2, 3, and 4. Since this only involves solving a single linear program, it is clear that each iteration can be completed in polynomial time.
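The control flow of Algorithm 1 can be sketched as follows. This is a skeleton under our own assumptions rather than a full implementation: the steepest descent linear program is not solved here, and is instead represented by a caller-supplied function `direction_oracle` (a hypothetical name) that plays the role of $Q(\mathbf{x})$. The game representation `A[(i, j)]`, the other function names, and the iteration cap are likewise ours.

```python
def payoff_vector(A, x, i):
    """v_i(x) for a game stored as A[(i, j)] = payoff matrix of i against j."""
    v = [0.0] * len(x[i])
    for (a, j), M in A.items():
        if a == i:
            for p, row in enumerate(M):
                v[p] += sum(r * xj for r, xj in zip(row, x[j]))
    return v


def max_regret(A, x):
    """f(x): the maximum regret over all players."""
    out = []
    for i in range(len(x)):
        v = payoff_vector(A, x, i)
        out.append(max(v) - sum(xi * vi for xi, vi in zip(x[i], v)))
    return max(out)


def descend(A, x, delta, direction_oracle, max_iters=100_000):
    """Run steps 2-4 until f(x) <= 0.5 + delta.

    direction_oracle(A, x) must return a strategy profile x'; in the paper
    this is the solution of the steepest descent LP, here it is a stand-in.
    """
    eps = delta / (delta + 2.0)
    for _ in range(max_iters):
        xp = direction_oracle(A, x)                          # step 2
        x = [[a + eps * (b - a) for a, b in zip(xi, xpi)]    # step 3
             for xi, xpi in zip(x, xp)]
        if max_regret(A, x) <= 0.5 + delta:                  # step 4
            return x
    raise RuntimeError("iteration cap reached before the target regret")


# Toy run: the oracle always points at the uniform profile, which happens to
# be an exact equilibrium of this particular two-player game.
A = {(0, 1): [[1, 0], [0, 1]], (1, 0): [[0, 1], [1, 0]]}
uniform = [[0.5, 0.5], [0.5, 0.5]]
result = descend(A, [[1, 0], [1, 0]], 0.1, lambda game, x: uniform)
```

The toy oracle is chosen only so that the example runs without an LP solver; the guarantees discussed in this paper apply to the direction returned by the steepest descent linear program.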

The rest of this paper is dedicated to showing the following theorem, which is our main result.

Algorithm 1 finds a $(0.5+\delta)$ -NE after at most $O(\frac{1}{\delta^{2}})$ iterations.

To prove Theorem 4.1, we will show two properties. Firstly, in Section 5, we show that our gradient descent algorithm never gets stuck in a stationary point before it finds a $(0.5+\delta)$ -NE. To do so, we define the notion of a $\delta$ -stationary point, and we show that every $\delta$ -stationary point is at least a $(0.5+\delta)$ -NE, which then directly implies that the gradient descent algorithm will not get stuck before it finds a $(0.5+\delta)$ -NE.

Secondly, in Section 6, we prove the upper bound on the number of iterations. To do this we show that, if an iteration of the algorithm starts at a point that is not a $\delta$ -stationary point, then that iteration will make a large enough amount of progress. This then allows us to show that the algorithm will find a $(0.5+\delta)$ -NE after $O(\frac{1}{\delta^{2}})$ many iterations, and therefore the overall running time of the algorithm is polynomial.

Stationary points

Recall that Definition 5 gives a linear program for finding the direction $\mathbf{x}^{\prime}$ that minimises $Df^{\delta}(\mathbf{x},\mathbf{x}^{\prime})$ . Our steepest descent procedure is able to make progress whenever this gradient is negative, and so a stationary point is any point $\mathbf{x}$ for which $Df^{\delta}(\mathbf{x},\mathbf{x}^{\prime})\geq 0$ . In fact, our analysis requires us to consider $\delta$ -stationary points, which we now define.

Let $\mathbf{x}^{*}$ be a mixed strategy profile, and let $\delta>0$. We have that $\mathbf{x}^{*}$ is a $\delta$-stationary point if for all $\mathbf{x}^{\prime}\in\Delta$:

$$Df^{\delta}(\mathbf{x}^{*},\mathbf{x}^{\prime})\geq-\delta.$$

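We now show that every $\delta$-stationary point of $f(\mathbf{x})$ is a $(0.5+\delta)$-NE. Recall from Definition 4 that:

$$Df^{\delta}(\mathbf{x},\mathbf{x}^{\prime})=\max_{i\in\mathcal{K}(\mathbf{x})}Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime})-f(\mathbf{x}).$$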

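Therefore, if $\mathbf{x}^{*}$ is a $\delta$-stationary point, we must have, for every direction $\mathbf{x}^{\prime}$:

$$f(\mathbf{x}^{*})\leq\max_{i\in\mathcal{K}(\mathbf{x}^{*})}Df^{\delta}_{i}(\mathbf{x}^{*},\mathbf{x}^{\prime})+\delta.$$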

Since $f(\mathbf{x}^{*})$ is the maximum regret under the strategy profile $\mathbf{x}^{*}$, in order to show that $\mathbf{x}^{*}$ is a $(0.5+\delta)$-NE, we only have to find some direction $\mathbf{x}^{\prime}$ such that $\max_{i\in\mathcal{K}(\mathbf{x}^{*})}Df^{\delta}_{i}(\mathbf{x}^{*},\mathbf{x}^{\prime})\leq 0.5$. We do this in the following lemma.

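In every stationary point $\mathbf{x}^{*}$, there exists a direction $\mathbf{x}^{\prime}$ such that:

$$\max_{i\in\mathcal{K}(\mathbf{x}^{*})}Df^{\delta}_{i}(\mathbf{x}^{*},\mathbf{x}^{\prime})\leq 0.5.$$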

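First, define $\bar{\mathbf{x}}$ to be a strategy profile in which each player $i\in[n]$ plays a best response against $\mathbf{x}^{*}$. We will set $\mathbf{x}^{\prime}=\frac{\bar{\mathbf{x}}+\mathbf{x}^{*}}{2}$. Then for each $i\in\mathcal{K}(\mathbf{x}^{*})$, we have that $Df^{\delta}_{i}(\mathbf{x}^{*},\mathbf{x}^{\prime})$ is less than or equal to:

$$\begin{aligned}&\max_{k\in[m_{i}]}\bigl(v_{i}(\mathbf{x}^{\prime})\bigr)_{k}-u_{i}(x^{*}_{i},\mathbf{x}^{\prime})-u_{i}(x^{\prime}_{i},\mathbf{x}^{*})+u_{i}(x^{*}_{i},\mathbf{x}^{*})\\ &\quad\leq\tfrac{1}{2}\max_{k}\bigl(v_{i}(\bar{\mathbf{x}})\bigr)_{k}+\tfrac{1}{2}\max_{k}\bigl(v_{i}(\mathbf{x}^{*})\bigr)_{k}-\tfrac{1}{2}u_{i}(x^{*}_{i},\bar{\mathbf{x}})-\tfrac{1}{2}u_{i}(\bar{x}_{i},\mathbf{x}^{*})\\ &\quad=\tfrac{1}{2}\Bigl(\max_{k}\bigl(v_{i}(\bar{\mathbf{x}})\bigr)_{k}-u_{i}(x^{*}_{i},\bar{\mathbf{x}})\Bigr)\\ &\quad\leq\tfrac{1}{2},\end{aligned}$$

where the second line expands $\mathbf{x}^{\prime}$ by linearity (the three terms involving $u_{i}(x^{*}_{i},\mathbf{x}^{*})$ cancel), the equality uses that $\bar{x}_{i}$ is a best response against $\mathbf{x}^{*}$, so that $u_{i}(\bar{x}_{i},\mathbf{x}^{*})=\max_{k}\bigl(v_{i}(\mathbf{x}^{*})\bigr)_{k}$, and the final inequality uses that all payoffs lie in $[0,1]$.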

Thus, the point $\mathbf{x}^{\prime}$ satisfies $\max_{i\in\mathcal{K}(\mathbf{x}^{*})}Df^{\delta}_{i}(\mathbf{x}^{*},\mathbf{x}^{\prime})\leq 0.5$. ∎

We can sum up the results of the section in the following lemma.

Every $\delta$ -stationary point $\mathbf{x}^{*}$ is a $(0.5+\delta)$ -Nash equilibrium.
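The key inequality of this section can be sanity-checked numerically. The sketch below is an illustration under our own assumptions, not part of the proof: it draws random games on a complete graph whose entries are small enough that every player's total payoff lies in $[0,1]$, takes an arbitrary profile $\mathbf{x}$, forms the direction halfway between $\mathbf{x}$ and a best response profile, and records the largest value of $Df^{\delta}_{i}$ observed, using the same assumed formula for $Df^{\delta}_{i}$ as in the definition above. All names and parameters are hypothetical.

```python
import random


def payoff_vector(A, x, i):
    v = [0.0] * len(x[i])
    for (a, j), M in A.items():
        if a == i:
            for p, row in enumerate(M):
                v[p] += sum(r * xj for r, xj in zip(row, x[j]))
    return v


def dot(a, b):
    return sum(s * t for s, t in zip(a, b))


def Df_delta_i(A, x, xp, i, delta):
    vx, vxp = payoff_vector(A, x, i), payoff_vector(A, xp, i)
    br = [k for k, val in enumerate(vx) if val >= max(vx) - delta]
    return max(vxp[k] for k in br) - dot(x[i], vxp) - dot(xp[i], vx) + dot(x[i], vx)


def midpoint_direction(A, x):
    """x' = (best response profile + x) / 2, using a pure best response."""
    xp = []
    for i in range(len(x)):
        v = payoff_vector(A, x, i)
        best = v.index(max(v))
        xp.append([0.5 * xi + (0.5 if k == best else 0.0)
                   for k, xi in enumerate(x[i])])
    return xp


def random_profile(rng, n, m):
    profile = []
    for _ in range(n):
        w = [rng.random() + 1e-9 for _ in range(m)]
        profile.append([t / sum(w) for t in w])
    return profile


def random_game(rng, n, m):
    # Entries in [0, 1/(n-1)] keep each player's total payoff inside [0, 1].
    return {(i, j): [[rng.random() / (n - 1) for _ in range(m)] for _ in range(m)]
            for i in range(n) for j in range(n) if i != j}


rng = random.Random(0)
worst = float("-inf")
for _ in range(200):
    A = random_game(rng, 4, 3)
    x = random_profile(rng, 4, 3)
    xp = midpoint_direction(A, x)
    worst = max(worst, max(Df_delta_i(A, x, xp, i, 0.1) for i in range(4)))
```

In such experiments `worst` should never exceed $0.5$, in line with the lemma; a passing run is of course only evidence, not a substitute for the argument above.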

The time complexity of the algorithm

In this section, we show that Algorithm 1 terminates after a polynomial number of iterations. Let $\mathbf{x}$ be a strategy profile that is considered by Algorithm 1, and let $\mathbf{x}^{\prime}=Q(\mathbf{x})$ be the solution of the steepest descent LP for $\mathbf{x}$ . These two profiles will be fixed throughout this section.

We begin by proving a technical lemma that will be crucial for showing our bound on the number of iterations. To simplify our notation, throughout this section we define $f_{new}:=f(\mathbf{x}+\epsilon(\mathbf{x}^{\prime}-\mathbf{x}))$ and $f:=f(\mathbf{x})$ . Furthermore, we define $\mathcal{D}=\max_{i\in[n]}Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime})$ . The following lemma, which is proved in Appendix B, gives a relationship between $f$ and $f_{new}$ .

In every iteration of Algorithm $1$ we have:

$$f_{new}-f\leq\epsilon(\mathcal{D}-f)+\epsilon^{2}(1-\mathcal{D}).$$

In the next lemma we prove that, if we are not in a $\delta$ -stationary point, then we have a bound on the amount of progress made in each iteration. We use this in order to bound the number of iterations needed before we reach a point $\mathbf{x}$ where $f(\mathbf{x})\leq 0.5+\delta$ .

Fix $\epsilon=\frac{\delta}{\delta+2}$, where $0<\delta\leq 0.5$. Either $\mathbf{x}$ is a $\delta$-stationary point or:

$$f_{new}\leq\left(1-\left(\frac{\delta}{\delta+2}\right)^{2}\right)f.$$

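Recall that by Lemma 4 the gain in every iteration of the steepest descent is

$$f_{new}-f\leq\epsilon(\mathcal{D}-f)+\epsilon^{2}(1-\mathcal{D}).$$

We consider the following two cases.

Case 1: $\mathcal{D}-f>-\delta$. Then, by definition, we are in a $\delta$-stationary point.

Case 2: $\mathcal{D}-f\leq-\delta$. We have set $\epsilon=\frac{\delta}{\delta+2}$. If we solve for $\delta$ we get that $\delta=\frac{2\epsilon}{1-\epsilon}$. Since $\mathcal{D}-f\leq-\delta$, we have that $(\mathcal{D}-f)(1-\epsilon)\leq-2\epsilon$. Thus we have:

$$\begin{aligned}f_{new}-f&\leq\epsilon(\mathcal{D}-f)+\epsilon^{2}(1-\mathcal{D})\\ &=\epsilon(\mathcal{D}-f)(1-\epsilon)+\epsilon^{2}(1-f)\\ &\leq-2\epsilon^{2}+\epsilon^{2}(1-f)\\ &=-\epsilon^{2}-\epsilon^{2}f\\ &\leq-\epsilon^{2}f.\end{aligned}$$

Finally, using the fact that $\epsilon=\frac{\delta}{\delta+2}$, we get that

$$f_{new}\leq\left(1-\left(\frac{\delta}{\delta+2}\right)^{2}\right)f.$$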

So, when the algorithm has not yet reached a $\delta$-stationary point, there is a decrease in the value of $f$ that is at least as large as the bound specified in (12) in every iteration of the gradient descent procedure. In the following lemma we prove that after $O(\frac{1}{\delta^{2}})$ iterations of the steepest descent procedure the algorithm finds a point $\mathbf{x}$ where $f(\mathbf{x})\leq 0.5+\delta$.

After $O(\frac{1}{\delta^{2}})$ iterations of the steepest descent procedure the algorithm finds a point $\mathbf{x}$ where $f(\mathbf{x})\leq 0.5+\delta$ .

Let $\mathbf{x}_{1}$, $\mathbf{x}_{2}$, $\dots$, $\mathbf{x}_{k}$ be the sequence of strategy profiles that are considered by Algorithm 1. Since the algorithm terminates as soon as it finds a $(0.5+\delta)$-NE, we have $f(\mathbf{x}_{i})>0.5+\delta$ for every $i<k$. Therefore, for each $i<k$ we can apply Lemma 3 to argue that $\mathbf{x}_{i}$ is not a $\delta$-stationary point, which then allows us to apply Lemma 5 to obtain:

$$f(\mathbf{x}_{i+1})\leq\left(1-\left(\frac{\delta}{\delta+2}\right)^{2}\right)f(\mathbf{x}_{i}).$$

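So, the amount of progress made by the algorithm in iteration $i$ is:

$$f(\mathbf{x}_{i})-f(\mathbf{x}_{i+1})\geq\left(\frac{\delta}{\delta+2}\right)^{2}f(\mathbf{x}_{i})\geq\left(\frac{\delta}{\delta+2}\right)^{2}\cdot 0.5.$$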

Thus, each iteration of the algorithm decreases the regret by at least $(\frac{\delta}{\delta+2})^{2}\cdot 0.5$. The algorithm starts at a point $\mathbf{x}_{1}$ with $f(\mathbf{x}_{1})\leq 1$, and terminates when it reaches a point $\mathbf{x}_{k}$ with $f(\mathbf{x}_{k})\leq 0.5+\delta$. Thus the total amount of progress made over all iterations of the algorithm can be at most $1-(0.5+\delta)$. Therefore, the number of iterations used by the algorithm can be at most:

$$\frac{1-(0.5+\delta)}{\left(\frac{\delta}{\delta+2}\right)^{2}\cdot 0.5}=\frac{(1-2\delta)(\delta+2)^{2}}{\delta^{2}}.$$

Since $\delta<1$ , we have that the algorithm terminates after at most $O(\frac{1}{\delta^{2}})$ iterations. ∎
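The counting argument in this proof can be evaluated for concrete values of $\delta$. The following Python sketch (function names are ours) computes the step size $\epsilon=\frac{\delta}{\delta+2}$, the per-iteration decrease $(\frac{\delta}{\delta+2})^{2}\cdot 0.5$, and the resulting cap on the number of iterations, namely the total available progress $1-(0.5+\delta)$ divided by the per-iteration decrease.

```python
import math


def step_size(delta: float) -> float:
    """The fixed step epsilon = delta / (delta + 2)."""
    return delta / (delta + 2.0)


def iteration_bound(delta: float) -> int:
    """Upper bound on the number of iterations from the counting argument."""
    if not 0.0 < delta <= 0.5:
        raise ValueError("delta must lie in (0, 0.5]")
    per_iteration = 0.5 * step_size(delta) ** 2
    total_progress = 1.0 - (0.5 + delta)
    return math.ceil(total_progress / per_iteration)


if __name__ == "__main__":
    for delta in (0.5, 0.25, 0.1, 0.01):
        print(delta, step_size(delta), iteration_bound(delta))
```

Halving $\delta$ roughly quadruples the bound, which is the $O(\frac{1}{\delta^{2}})$ behaviour stated in the lemma.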

Lemma 6 implies that after polynomially many iterations the algorithm finds a point such that $f(\mathbf{x})\leq 0.5+\delta$, and by definition such a point is a $(0.5+\delta)$-NE. Thus we have completed the proof of Theorem 4.1.

Application: Two-player Bayesian games

In this section, we define two-player Bayesian games, and show how our algorithm can be applied in order to efficiently find a $(0.5+\delta)$ -Bayesian Nash equilibrium. A two-player Bayesian game is played between a row player and a column player. Each player has a set of possible types, and at the start of the game, each player is assigned a type by drawing from a known joint probability distribution. Each player learns his type, but not the type of his opponent. Our task is to find an approximate Bayesian Nash equilibrium (BNE).

We show that this can be reduced to the problem of finding an $\epsilon$ -NE in a polymatrix game, and therefore our algorithm can be used to efficiently find a $(0.5+\delta)$ -BNE of a two-player Bayesian game. This section is split into two parts. In the first part we formally define two-player Bayesian games, and approximate Bayesian Nash equilibria. In the second part, we give the reduction from two-player Bayesian games to polymatrix games.

We will use $k_{1}$ to denote the number of pure strategies of the row player and $k_{2}$ to denote the number of pure strategies of the column player. Furthermore, we will use $m$ to denote the number of types of the row player, and $n$ to denote the number of types of the column player.

For each pair of types $i\in[m]$ and $j\in[n]$, there is a $k_{1}\times k_{2}$ bimatrix game $(R,C)_{ij}:=(R_{ij},C_{ij})$ that is played when the row player has type $i$ and the column player has type $j$. We assume that all payoffs in every matrix $R_{ij}$ and every matrix $C_{ij}$ lie in the range $[0,1]$.

Types.

The distribution over types is specified by a joint probability distribution: for each pair of types $i\in[m]$ and $j\in[n]$, the probability that the row player is assigned type $i$ and the column player is assigned type $j$ is given by $p_{ij}$. Obviously, we have that:

$$\sum_{i=1}^{m}\sum_{j=1}^{n}p_{ij}=1.$$

We also define some useful shorthands: we denote by $p^{R}_{i}$ (respectively $p^{C}_{j}$) the probability that the row (respectively column) player has type $i\in[m]$ (respectively $j\in[n]$). Formally:

$$p^{R}_{i}:=\sum_{j=1}^{n}p_{ij},\qquad p^{C}_{j}:=\sum_{i=1}^{m}p_{ij}.$$

Note that $\sum_{i=1}^{m}p^{R}_{i}=\sum_{j=1}^{n}p^{C}_{j}=1$. Furthermore, we denote by $p^{R}_{i}(j)$ the conditional probability that type $j\in[n]$ will be chosen for the column player given that type $i$ is chosen for the row player. Similarly, we define $p^{C}_{j}(i)$ for the column player. Formally:

$$p^{R}_{i}(j):=\frac{p_{ij}}{p^{R}_{i}},\qquad p^{C}_{j}(i):=\frac{p_{ij}}{p^{C}_{j}}.$$

We can see that for given type $t=(i,j)$ we have that $p_{ij}=p^{R}_{i}\cdot p^{R}_{i}(j)=p^{C}_{j}\cdot p^{C}_{j}(i)$ .
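The marginal and conditional type probabilities are straightforward to compute from the joint distribution. The sketch below is illustrative: it assumes the joint distribution is given as an $m\times n$ table `p`, that every type has positive probability (so that the conditionals are well defined), and it uses exact rational arithmetic; the function names are ours.

```python
from fractions import Fraction


def marginals(p):
    """Return (p_R, p_C): the type distributions of the row and column player."""
    p_R = [sum(row) for row in p]
    p_C = [sum(p[i][j] for i in range(len(p))) for j in range(len(p[0]))]
    return p_R, p_C


def conditionals(p):
    """Return (cond_R, cond_C) with cond_R[i][j] = p^R_i(j), cond_C[j][i] = p^C_j(i)."""
    p_R, p_C = marginals(p)
    m, n = len(p), len(p[0])
    cond_R = [[p[i][j] / p_R[i] for j in range(n)] for i in range(m)]
    cond_C = [[p[i][j] / p_C[j] for i in range(m)] for j in range(n)]
    return cond_R, cond_C


# Two types per player.
p = [[Fraction(1, 8), Fraction(3, 8)],
     [Fraction(1, 4), Fraction(1, 4)]]
p_R, p_C = marginals(p)
cond_R, cond_C = conditionals(p)
```

Multiplying a marginal by the matching conditional recovers the joint probability, which is the identity stated above.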

Strategies.

In order to play a Bayesian game, each player must specify a strategy for each of their types. Thus, a strategy profile is a pair $(\mathbf{x},\mathbf{y})$, where $\mathbf{x}=(x_{1},x_{2},\dots,x_{m})$ such that each $x_{i}\in\Delta_{k_{1}}$, and where $\mathbf{y}=(y_{1},y_{2},\dots,y_{n})$ such that each $y_{j}\in\Delta_{k_{2}}$. This means that, when the row player gets type $i\in[m]$ and the column player gets type $j\in[n]$, then the game $(R_{ij},C_{ij})$ will be played, and the row player will use strategy $x_{i}$ while the column player will use strategy $y_{j}$.

Given a strategy profile $(\mathbf{x},\mathbf{y})$ , we can define the expected payoff to both players (recall that the players are not told their opponent’s type).

Given a strategy profile $(\mathbf{x},\mathbf{y})$ and a type $t=(i,j)$ , the expected payoff for the row player is given by:

Similarly, for the column player the expected payoff is:

Rescaling.

Before we define approximate equilibria for two-player Bayesian games, we first rescale the payoffs. Much like for polymatrix games, rescaling is needed to ensure that an $\epsilon$-approximate equilibrium has a consistent meaning. Our rescaling will ensure that, for every possible pair of types, both players' expected payoffs use the entire range $[0,1]$.

For each type $i$ of the row player, we use ${U}^{i}_{R}$ to denote the maximum expected payoff for the row player when he has type $i$ , and we use ${L}^{i}_{R}$ to denote the minimum expected payoff for the row player when he has type $i$ . Formally, these are defined to be:

Then we apply the transformation $T_{R}^{i}(\cdot)$ to every element $z$ of $R_{ij}$ , for all types $j$ of the column player, where:

Similarly, we transform all payoff matrices for the column player using

where ${U}^{j}_{C}$ and ${L}^{j}_{C}$ are defined symmetrically. Note that, after this transformation has been applied, each player’s expected payoffs lie in the range $[0,1]$. Moreover, the full range is used: there exists a strategy for the column player against which one of the row player’s strategies has expected payoff $1$, and there exists a strategy for the column player against which one of the row player’s strategies has expected payoff $0$. From now on we will assume that the payoff matrices have been rescaled in this way.
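The rescaling for a single row type can be sketched as follows. This is an illustration under stated assumptions rather than the exact transformation: we assume that the maximum (minimum) expected payoff of a type is taken over the row player's pure strategies and over arbitrary column strategies, weighted by the conditional probabilities of the column types, and that the transformation is the affine map $z\mapsto(z-L)/(U-L)$. The function names and the example matrices are hypothetical.

```python
from fractions import Fraction

def payoff_range(cond_i, R_i):
    """Return (U, L) for one row type.

    cond_i[j] is the conditional probability of column type j, and R_i[j] is
    the payoff matrix R_ij. For a fixed pure row strategy a, the column types
    can each pick the best (worst) column independently, so the extreme
    expected payoffs are attained at pure strategies.
    """
    k1 = len(R_i[0])  # number of pure strategies of the row player
    types = range(len(R_i))
    U = max(sum(cond_i[j] * max(R_i[j][a]) for j in types) for a in range(k1))
    L = min(sum(cond_i[j] * min(R_i[j][a]) for j in types) for a in range(k1))
    return U, L

def rescale(cond_i, R_i):
    """Apply z -> (z - L) / (U - L) to every entry of every matrix R_ij."""
    U, L = payoff_range(cond_i, R_i)
    if U == L:
        raise ValueError("payoffs are constant for this type; rescaling is undefined")
    return [[[(z - L) / (U - L) for z in row] for row in R_ij] for R_ij in R_i]

# Hypothetical row type facing two equally likely column types.
cond_i = [Fraction(1, 2), Fraction(1, 2)]
R_i = [[[2, 4], [0, 6]], [[1, 3], [5, -1]]]
scaled = rescale(cond_i, R_i)
```

Because the conditional probabilities sum to one, applying the same affine map to every entry applies that map to the expected payoff itself, which is why the rescaled range is exactly $[0,1]$ in this sketch.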

We can now define approximate Bayesian Nash equilibria for a two-player Bayesian game.

Let $(\mathbf{x},\mathbf{y})$ be a strategy profile. The profile $(\mathbf{x},\mathbf{y})$ is an $\epsilon$ -BNE iff the following conditions hold:
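One plausible way to write conditions of this kind is given below. It assumes that the expected payoff of a type is the sum of its bimatrix payoffs weighted by the conditional probabilities of the opponent's types; the deviation symbols $x'_{i}$ and $y'_{j}$ are introduced here only for illustration.

```latex
% Assumed form of the epsilon-BNE conditions: no type of either player can
% gain more than epsilon in expectation by deviating unilaterally.
\begin{align*}
\sum_{j=1}^{n} p^{R}_{i}(j)\, x_{i}^{T} R_{ij}\, y_{j}
  &\geq \sum_{j=1}^{n} p^{R}_{i}(j)\, (x'_{i})^{T} R_{ij}\, y_{j} - \epsilon
  && \text{for all } x'_{i}\in\Delta_{k_{1}} \text{ and all } i\in[m], \\
\sum_{i=1}^{m} p^{C}_{j}(i)\, x_{i}^{T} C_{ij}\, y_{j}
  &\geq \sum_{i=1}^{m} p^{C}_{j}(i)\, x_{i}^{T} C_{ij}\, y'_{j} - \epsilon
  && \text{for all } y'_{j}\in\Delta_{k_{2}} \text{ and all } j\in[n].
\end{align*}
```

Since the left-hand and right-hand sides are linear in the deviating strategy, it suffices to check these inequalities against pure deviations.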

2 The reduction

In this section we reduce in polynomial time the problem of computing an $\epsilon$ -BNE for a two-player Bayesian game $\mathcal{B}$ to the problem of computing an $\epsilon$ -NE of a polymatrix game $\mathcal{P(B)}$ . We describe the construction of $\mathcal{P(B)}$ and prove that every $\epsilon$ -NE for $\mathcal{P(B)}$ maps to an $\epsilon$ -BNE of $\mathcal{B}$ .

Let $\mathcal{B}$ be a two-player Bayesian game where the row player has $m$ types and $k_{1}$ pure strategies and the column player has $n$ types and $k_{2}$ pure strategies. We will construct a polymatrix game $\mathcal{P(B)}$ as follows.

The game has $m+n$ players. We partition the set of players $[m+n]$ into two sets: the set $K=\{1,2,\dots,m\}$ will represent the types of the row player in $\mathcal{B}$ , while the set $L=\{m+1,m+2,\dots,m+n\}$ will represent the types of the column player in $\mathcal{B}$ . The underlying graph that shows the interactions between the players is a complete bipartite graph $G=(K\cup L,E)$ , where every player in $K$ (respectively $L$ ) plays a bimatrix game with every player in $L$ (respectively $K$ ). The bimatrix game played between vertices $v_{i}\in K$ and $v_{j}\in L$ is defined to be $(R^{*}_{ij},C^{*}_{ij})$ , where:

Observe that, for each player $i\in K$, the matrices $R^{*}_{ij}$ all have the same number of rows, and for each player $j\in L$, the matrices $C^{*}_{ij}$ all have the same number of columns. Thus, $\mathcal{P(B)}$ is a valid polymatrix game. Moreover, $\mathcal{P(B)}$ clearly has the same size as the original game $\mathcal{B}$. Note that, since we have assumed that the Bayesian game has been rescaled, for every player in $\mathcal{P(B)}$ the minimum (maximum) payoff achievable under pure strategy profiles is $0$ ($1$), so no further scaling is needed in order to apply our algorithm.
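The construction can be sketched in a few lines of Python. We assume here that each matrix is scaled by the conditional probability of the opposing type, that is, $R^{*}_{ij}=p^{R}_{i}(j)\cdot R_{ij}$ and $C^{*}_{ij}=p^{C}_{j}(i)\cdot C_{ij}$, which makes the payoff of each polymatrix player coincide with the conditionally weighted expected payoff of the corresponding type. The function names, the 0-indexed player numbering, and the example data are hypothetical.

```python
from fractions import Fraction

def scale(c, A):
    """Multiply every entry of matrix A by the scalar c."""
    return [[c * z for z in row] for row in A]

def bilinear(x, A, y):
    """Return x^T A y."""
    return sum(x[a] * A[a][b] * y[b] for a in range(len(x)) for b in range(len(y)))

def build_polymatrix(p, R, C):
    """Return {(i, m + j): (R*_ij, C*_ij)} on the complete bipartite graph K x L.

    Players 0..m-1 stand for the row types (set K) and players m..m+n-1 for
    the column types (set L).
    """
    m, n = len(p), len(p[0])
    p_R = [sum(p[i]) for i in range(m)]
    p_C = [sum(p[i][j] for i in range(m)) for j in range(n)]
    games = {}
    for i in range(m):
        for j in range(n):
            # Assumed scaling by the conditional probability of the opposing type.
            games[(i, m + j)] = (scale(p[i][j] / p_R[i], R[i][j]),
                                 scale(p[i][j] / p_C[j], C[i][j]))
    return games

def polymatrix_payoff(v, z, games):
    """Payoff of player v: the sum of its payoffs over all incident edges."""
    total = 0
    for (i, l), (R_star, C_star) in games.items():
        if v == i:
            total += bilinear(z[i], R_star, z[l])
        elif v == l:
            total += bilinear(z[i], C_star, z[l])
    return total

# Hypothetical Bayesian game with m = n = 2 types and 2 pure strategies each.
p = [[Fraction(1, 8), Fraction(3, 8)], [Fraction(2, 8), Fraction(2, 8)]]
zero = [[0, 0], [0, 0]]
identity = [[1, 0], [0, 1]]
R = [[[[4, 0], [0, 0]], [[0, 4], [0, 0]]], [zero, zero]]
C = [[identity, identity], [identity, identity]]
# Profile z = (x_1, x_2, y_1, y_2).
z = [[1, 0], [0, 1], [1, 0], [Fraction(1, 2), Fraction(1, 2)]]
games = build_polymatrix(p, R, C)
```

The dictionary has exactly $m\cdot n$ entries, one per edge of the bipartite graph, so the sketch is consistent with the claim that $\mathcal{P(B)}$ has the same size as $\mathcal{B}$.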

We can now prove that every $\epsilon$ -NE of the polymatrix game is also an $\epsilon$ -BNE of the original two-player Bayesian game, which is the main result of this section.

Every $\epsilon$-NE of $\mathcal{P(B)}$ is an $\epsilon$-BNE for $\mathcal{B}$.

Let $\mathbf{z}=(x_{1},\ldots,x_{m},y_{1},\ldots,y_{n})$ be an $\epsilon$-NE for $\mathcal{P(B)}$. This means that no player can gain more than $\epsilon$ by unilaterally changing his strategy. We define the strategy profile $(\mathbf{x},\mathbf{y})$ for $\mathcal{B}$ where $\mathbf{x}=(x_{1},\dots,x_{m})$ and $\mathbf{y}=(y_{1},\dots,y_{n})$, and we will show that $(\mathbf{x},\mathbf{y})$ is an $\epsilon$-BNE for $\mathcal{B}$.

Let $i\in K$ be a player. Since $\mathbf{z}$ is an $\epsilon$-NE of $\mathcal{P(B)}$, we have:

By construction, we can see that player $i$ only interacts with the players from $L$ . Hence his payoff can be written as:

and since we are in an $\epsilon$ -NE, we have:

This is true for all $i\in K$ , thus it is true for all $i\in[m]$ .

Similarly, every player $j\in L$ interacts only with players from $K$, thus:

and since we are in an $\epsilon$ -NE we have:

and this is true for all $j\in L$, thus it is true for all $j\in[n]$.

Combining the facts that Equation (20) is true for all $i\in[m]$ and that Equation (21) is true for all $j\in[n]$, it is easy to see that the strategy profile $(\mathbf{x},\mathbf{y})$ is an $\epsilon$-BNE for $\mathcal{B}$. ∎

Applying Algorithm 1 to $\mathcal{P(B)}$ thus gives us the following.

A $(0.5+\delta)$ -Bayesian Nash equilibrium of a two-player Bayesian game $\mathcal{B}$ can be found in time polynomial in the input size of $\mathcal{B}$ and $1/\delta$ .

Conclusions and open questions

We have presented a polynomial-time algorithm that finds a $(0.5+\delta)$-Nash equilibrium of a polymatrix game for any $\delta>0$. Though we do not have examples that show that the approximation guarantee is tight for our algorithm, we do not see an obvious approach to prove a better guarantee. The initial choice of strategy profile affects our algorithm, and it is conceivable that one may be able to start the algorithm from an efficiently computable profile with certain properties that allow a better approximation guarantee. One natural special case is when there is a constant number of players, which may allow one to derive new strategy profiles from a stationary point as done by Tsaknakis and Spirakis TS . It may also be possible to develop new techniques when the number of pure strategies available to the players is constant, or when the structure of the graph is restricted in some way. For example, in the games arising from two-player Bayesian games, the graph is always bipartite.

This paper has considered $\epsilon$-Nash equilibria, which are the most well-studied type of approximate equilibria. However, $\epsilon$-Nash equilibria have a drawback: since they only require that the expected payoff is within $\epsilon$ of a pure best response, it is possible that a player could be required to place probability on a strategy that is arbitrarily far from being a best response. An alternative, stronger, notion is an $\epsilon$-well supported approximate Nash equilibrium ($\epsilon$-WSNE). It requires that players only place probability on strategies that have payoff within $\epsilon$ of a pure best response. Every $\epsilon$-WSNE is an $\epsilon$-Nash, but the converse is not true. For bimatrix games, the best-known additive approximation that is achievable in polynomial time gives a $\bigl(\frac{2}{3}-0.0047\bigr)$-WSNE FGSS12 . It builds on the algorithm given by Kontogiannis and Spirakis that achieves a $\frac{2}{3}$-WSNE in polynomial time KS . Recently a polynomial-time algorithm with a better approximation guarantee has been given for symmetric bimatrix games CFJ14 . Note that it has been shown that there is a PTAS for finding $\epsilon$-WSNE of bimatrix games if and only if there is a PTAS for $\epsilon$-Nash DGP ; CDT . For $n$-player games with $n>2$ there has been very little work on developing algorithms for finding $\epsilon$-WSNE. This is a very interesting direction, both in general and when $n>2$ is a constant.

References

Appendix A Proof of Lemma 1

Before we begin with the proof, we introduce the following notation. For a player $i\in[n]$ , given a strategy profile $\mathbf{x}$ and a subset of $i$ ’s pure strategies $S\subseteq[m_{i}]$ , we use $M_{i}(\mathbf{x},S)$ for taking the maximum of the payoffs of $i$ when the others play according to $\mathbf{x}$ , and player $i$ is restricted to pick elements from $S$ :

In order to find the gradient, we have to calculate the variation of $f_{i}$ along the direction $\mathbf{x}^{\prime}-\mathbf{x}$ , by evaluating $f(\bar{\mathbf{x}})$ for points $\bar{\mathbf{x}}$ of the form

Recall from (4) that for $\bar{\mathbf{x}}\in\Delta$ we have $f_{i}(\bar{\mathbf{x}}):=u_{i}^{*}(\bar{\mathbf{x}})-u_{i}(\bar{\mathbf{x}})$. In order to rewrite $u_{i}^{*}(\bar{\mathbf{x}})$ we introduce the notation $\Lambda_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon)$ as follows.

In the following technical lemma we provide an expression for $u_{i}^{*}(\bar{\mathbf{x}})$ . In order to rewrite $u_{i}^{*}(\bar{\mathbf{x}})$ , we use the following simple observation. Consider a multiset of numbers $\{a_{1},\ldots,a_{n}\}$ , and the index sets $S\subseteq[n]$ and $\bar{S}=[n]\setminus S$ . We have the following identity:
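One identity of this shape, which we assume is the kind of observation intended here, is $\max_{k\in[n]}a_{k}=\max_{k\in S}a_{k}+\max\bigl\{0,\max_{k\in\bar{S}}a_{k}-\max_{k\in S}a_{k}\bigr\}$: the overall maximum equals the maximum over $S$ plus a non-negative correction that is positive only when the maximum lies outside $S$. The Python sketch below checks it exhaustively on a small example; the function name and the numbers are hypothetical.

```python
from itertools import combinations

def split_max(a, S):
    """Evaluate max over S plus the correction max(0, max over complement - max over S)."""
    S = set(S)
    top_S = max(a[k] for k in S)
    rest = [a[k] for k in range(len(a)) if k not in S]
    if not rest:
        # The complement is empty, so no correction term is needed.
        return top_S
    return top_S + max(0, max(rest) - top_S)

a = [3, 7, 2, 5]
# Every non-empty index set S, given as a tuple of 0-based indices.
subsets = [S for r in range(1, len(a) + 1) for S in combinations(range(len(a)), r)]
identity_holds = all(split_max(a, S) == max(a) for S in subsets)
```

In the setting of this appendix, $S$ would play the role of the pure best responses at $\mathbf{x}$, so the correction term measures how much strategies outside that set can overtake it at $\bar{\mathbf{x}}$.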

We will use the expression (24) for $u_{i}^{*}(\bar{\mathbf{x}})$ , along with the following reformulation of $u_{i}(\bar{\mathbf{x}})$ :
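A reformulation of this kind follows from bilinearity of the payoffs. The derivation below is a sketch under two assumptions that are not restated in this appendix: that $\bar{\mathbf{x}}=\mathbf{x}+\epsilon(\mathbf{x}^{\prime}-\mathbf{x})$, and that the payoff of player $i$ has the polymatrix form $u_{i}(\mathbf{x})=x_{i}^{T}\sum_{j\neq i}A_{ij}x_{j}$, where $A_{ij}$ is taken to be zero when $i$ and $j$ do not interact.

```latex
% Assumptions: \bar{x} = x + \epsilon (x' - x) and u_i(x) = x_i^T \sum_{j \neq i} A_{ij} x_j.
% We write u_i(x' - x) := (x'_i - x_i)^T \sum_{j \neq i} A_{ij} (x'_j - x_j).
\begin{align*}
u_{i}(\bar{\mathbf{x}})
  &= \bigl(x_{i}+\epsilon(x'_{i}-x_{i})\bigr)^{T}
     \sum_{j\neq i} A_{ij}\bigl(x_{j}+\epsilon(x'_{j}-x_{j})\bigr) \\
  &= u_{i}(\mathbf{x})
     + \epsilon\Bigl((x'_{i}-x_{i})^{T}\sum_{j\neq i}A_{ij}x_{j}
     + x_{i}^{T}\sum_{j\neq i}A_{ij}(x'_{j}-x_{j})\Bigr)
     + \epsilon^{2}\,u_{i}(\mathbf{x}'-\mathbf{x}).
\end{align*}
```

The first-order term collects the effect of player $i$ deviating alone and of the opponents deviating alone, while the second-order term $\epsilon^{2}\cdot u_{i}(\mathbf{x}^{\prime}-\mathbf{x})$ is the only one that is quadratic in $\epsilon$.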

We now use these reformulations to prove the following lemma.

We have that $f_{i}(\bar{\mathbf{x}})-f(\mathbf{x})$ is equal to:

Recall from (4) that $f_{i}(\mathbf{x})=M_{i}(\mathbf{x},S)-u_{i}(\mathbf{x})$ , so the formula above is equal to:

Using now (6) for $Df_{i}(\mathbf{x},\mathbf{x}^{\prime})$ , the above formula becomes:

Recall now that $f(\mathbf{x})=\max_{j\in[n]}f_{j}(\mathbf{x})$ . Thus the term $f(\mathbf{x})-f_{i}(\mathbf{x})$ can be written as $\max_{j\in[n]}\bigl{\{}f_{j}(\mathbf{x})-f_{i}(\mathbf{x})\bigr{\}}$ . So, the expression above is equivalent to

Now we are ready to prove Lemma 1. Recall from Definition 1 for the gradient that

This is true from the definition of pure best response strategies. So, from equation (22) for $\Lambda_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon)$ it is true that $\lim_{\epsilon\rightarrow 0}\Lambda_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon)=0$ .

Furthermore, the term $\epsilon^{2}\cdot u_{i}(\mathbf{x}^{\prime}-\mathbf{x})$, when divided by $\epsilon$, equals $\epsilon\cdot u_{i}(\mathbf{x}^{\prime}-\mathbf{x})$, and thus $\lim_{\epsilon\rightarrow 0}\bigl(\epsilon\cdot u_{i}(\mathbf{x}^{\prime}-\mathbf{x})\bigr)=0$.

is either $0$ when $f_{i}(\mathbf{x})=f(\mathbf{x})$, i.e., player $i$ has the maximum regret and $\max_{j\in[n]}\bigl\{f_{j}(\mathbf{x})-f_{i}(\mathbf{x})\bigr\}=0$, or $-\infty$ otherwise, because $\max_{j\in[n]}\bigl\{f_{j}(\mathbf{x})-f_{i}(\mathbf{x})\bigr\}>0$.

To sum up, if $f_{i}(\mathbf{x})$ achieves the maximum regret at point $\mathbf{x}^{\prime}$ , then the limit $\lim_{\epsilon\rightarrow 0}\bigl{(}f_{i}(\bar{\mathbf{x}})-f(\mathbf{x})\bigr{)}=Df_{i}(\mathbf{x},\mathbf{x}^{\prime})-f(\mathbf{x})$ , otherwise the limit equals $-\infty$ .

From (26) for the gradient we want the maximum of these quantities, thus we have the claimed result.

Appendix B Proof of Lemma 4

Throughout this proof, $\mathbf{x},\mathbf{x}^{\prime},\bar{\mathbf{x}}$ , and $\epsilon$ will be fixed as they are defined in Section 6. In order to prove this lemma, we must show a bound on:

Before we start the analysis we need to redefine the term $\Lambda^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon)$ in order to prove an analogous version of Lemma 7 when $\delta$ -best responses are used.

We define $\Lambda^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon)$ as:

We now use this definition to prove the following lemma.

We will use the reformulation from Equation (25) for $u_{i}(\bar{\mathbf{x}})$ :

The correctness of this was proved in Appendix A. Now we use all of these reformulations in order to prove the following lemma.

We have that $f_{i}(\bar{\mathbf{x}})-f(\mathbf{x})$ is less than or equal to:

Recall that, by definition, we have that:

Thus, we can apply Lemma 9 along with the reformulation given in Equation (29) for $u_{i}(\bar{\mathbf{x}})$ to prove that $f_{i}(\bar{\mathbf{x}})-f(\mathbf{x})$ is less than or equal to:

Having shown Lemma 10, we will now study each term of (30) and provide bounds for each of them. To begin with, it is easy to see that for all $i\in[n]$ we have that $\max_{j\in[n]}\bigl{\{}f_{j}(\mathbf{x})-f_{i}(\mathbf{x})\bigr{\}}\geq 0$ , and since $\epsilon<1$ , we have that $(1-\epsilon)\max_{j\in[n]}\bigl{\{}f_{j}(\mathbf{x})-f_{i}(\mathbf{x})\bigr{\}}\geq 0$ . Thus, Equation (30) is less than or equal to:

Next we consider the term $\Lambda^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon)$ . In the following technical lemma we prove that $\Lambda^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon)=0$ for all $i\in[n]$ .

We have $\Lambda^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon)=0$ for all $i\in[n]$ .

According to equation (27) for $\Lambda^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime},\epsilon)$ , we have:

We can rewrite this expression as follows. First define:

We now substitute these two bounds into the definition of $Z(\mathbf{x},\mathbf{x}^{\prime},\epsilon,k)$ . We have:

Next we consider the term $u_{i}(\mathbf{x}^{\prime}-\mathbf{x})$ in Equation (31). The following lemma provides a simple lower bound for this term.

For all $i\in[n]$ , we have $Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime})-1\leq u_{i}(\mathbf{x}^{\prime}-\mathbf{x})$ .

For $u_{i}(\mathbf{x}^{\prime}-\mathbf{x})$ we have the following:

Recall that $\mathcal{D}=\max_{i\in[n]}Df^{\delta}_{i}(\mathbf{x},\mathbf{x}^{\prime})$ and $f_{new}=f(\bar{\mathbf{x}})$ and $f=f(\mathbf{x})$ . We can now apply the bounds from Lemma 11 and Lemma 12 to Equation (31) to obtain: