Prisoner's dilemma

Definition

The prisoner's dilemma is a non-zero-sum, simultaneous, two-player game or, more precisely, an isomorphism class of games. There are two players, each of whom has two choices: cooperate or defect. The payoffs for each combination of choices are given below (each player is self-interested, seeking to maximize its own payoff with no concern for the other player's payoff):

Choice pairs	Player 1 cooperates	Player 1 defects
Player 2 cooperates	Player 1: $x_{1}$ Player 2: $x_{2}$	Player 1: $y_{1}$ Player 2: $z_{2}$
Player 2 defects	Player 1: $z_{1}$ Player 2: $y_{2}$	Player 1: $w_{1}$ Player 2: $w_{2}$

where we have:

$z_{1}<w_{1}<x_{1}<y_{1},\qquad z_{2}<w_{2}<x_{2}<y_{2}$

In many cases, for simplicity, we assume that $z_{1}=z_{2},\ w_{1}=w_{2},\ x_{1}=x_{2},\ y_{1}=y_{2}$, i.e., we assume a symmetric prisoner's dilemma. This is not a necessary assumption.
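
As a concrete illustration, the following Python sketch encodes a symmetric prisoner's dilemma and checks the ordering condition. The numerical values 0, 1, 3, 5 for $z,w,x,y$ are assumed for illustration only (any values satisfying the ordering would do), and the names PAYOFFS and is_prisoners_dilemma are hypothetical.

```python
# Illustrative payoffs (assumed values, not taken from the text): z < w < x < y.
Z, W, X, Y = 0, 1, 3, 5

C, D = "cooperate", "defect"

# PAYOFFS[(move_1, move_2)] = (payoff to Player 1, payoff to Player 2)
PAYOFFS = {
    (C, C): (X, X),  # both cooperate
    (D, C): (Y, Z),  # Player 1 defects, Player 2 cooperates
    (C, D): (Z, Y),  # Player 1 cooperates, Player 2 defects
    (D, D): (W, W),  # both defect
}


def is_prisoners_dilemma(payoffs):
    """Check z_i < w_i < x_i < y_i for both players."""
    x1, x2 = payoffs[(C, C)]
    y1, z2 = payoffs[(D, C)]
    z1, y2 = payoffs[(C, D)]
    w1, w2 = payoffs[(D, D)]
    return z1 < w1 < x1 < y1 and z2 < w2 < x2 < y2


print(is_prisoners_dilemma(PAYOFFS))  # True
```

The check treats the two players separately, so it also accepts asymmetric payoff tables as long as each player's own four payoffs are ordered correctly.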

Key features

Winning strategies and Nash equilibrium

For any fixed choice of either player, the self-interested choice for the other player is to defect; in other words, defecting is a strictly dominant strategy for each player. Thus, even though the players do not know the other player's choice at the time of making their own choice, the unique Nash equilibrium is for both players to defect.

We explain the four cases:

Case	What the other player should do
Player 1 cooperates	In this case, Player 2 has a payoff of $x_{2}$ from cooperating and $y_{2}$ from defecting. Since $x_{2}<y_{2}$, the rational choice is to defect.
Player 1 defects	In this case, Player 2 has a payoff of $z_{2}$ from cooperating and $w_{2}$ from defecting. Since $z_{2}<w_{2}$, the rational choice is to defect.
Player 2 cooperates	In this case, Player 1 has a payoff of $x_{1}$ from cooperating and $y_{1}$ from defecting. Since $x_{1}<y_{1}$, the rational choice is to defect.
Player 2 defects	In this case, Player 1 has a payoff of $z_{1}$ from cooperating and $w_{1}$ from defecting. Since $z_{1}<w_{1}$, the rational choice is to defect.
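
The case analysis can also be checked mechanically. The sketch below computes each player's best response to every move of the other player and then lists the choice pairs in which both moves are best responses to each other. The payoff values (0, 1, 3, 5) are assumed for illustration, and the function names are hypothetical.

```python
from itertools import product

C, D = "cooperate", "defect"
MOVES = (C, D)

# Assumed illustrative payoffs: PAYOFFS[(move_1, move_2)] = (payoff_1, payoff_2)
PAYOFFS = {
    (C, C): (3, 3),
    (D, C): (5, 0),
    (C, D): (0, 5),
    (D, D): (1, 1),
}


def best_response(player, other_move):
    """Move maximizing `player`'s payoff (0 = Player 1, 1 = Player 2)
    when the other player's move is held fixed."""
    def payoff(move):
        profile = (move, other_move) if player == 0 else (other_move, move)
        return PAYOFFS[profile][player]
    return max(MOVES, key=payoff)


def nash_equilibria():
    """Choice pairs in which each move is a best response to the other.
    Comparing against a single best response is valid here because the
    strict inequalities rule out ties."""
    return [
        (m1, m2)
        for m1, m2 in product(MOVES, MOVES)
        if best_response(0, m2) == m1 and best_response(1, m1) == m2
    ]


for player, other_move in product((0, 1), MOVES):
    print(f"Player {player + 1} vs. {other_move}: {best_response(player, other_move)}")

print(nash_equilibria())  # [('defect', 'defect')]
```

In all four cases the best response is to defect, so mutual defection is the only pair that survives the check.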

Win-win strategy

If both players could negotiate and enforce each other's moves through a mutually agreed upon contract, then they would both seek to cooperate. Mutual cooperation is a win-win relative to mutual defection, even though mutual defection is the Nash equilibrium and mutual cooperation is not. More specifically, note that if both players cooperate, the payoffs are $x_{1}$ and $x_{2}$ respectively, and if both defect, the payoffs are $w_{1}$ and $w_{2}$ respectively. Further, $w_{1}<x_{1}$ and $w_{2}<x_{2}$, so mutual defection is a worse outcome for both players than mutual cooperation.
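
The comparison between the two outcomes can be expressed as a Pareto-dominance check: one outcome dominates another if it is at least as good for every player and strictly better for at least one. The sketch below applies this to an illustrative payoff table; the values (0, 1, 3, 5) and the function name pareto_dominates are assumptions made for the example.

```python
C, D = "cooperate", "defect"

# Assumed illustrative payoffs: PAYOFFS[(move_1, move_2)] = (payoff_1, payoff_2)
PAYOFFS = {
    (C, C): (3, 3),
    (D, C): (5, 0),
    (C, D): (0, 5),
    (D, D): (1, 1),
}


def pareto_dominates(a, b):
    """True if payoff pair `a` is at least as good as `b` for both players
    and strictly better for at least one of them."""
    return all(p >= q for p, q in zip(a, b)) and any(p > q for p, q in zip(a, b))


# Mutual cooperation dominates mutual defection, but not the other way round.
print(pareto_dominates(PAYOFFS[(C, C)], PAYOFFS[(D, D)]))  # True
print(pareto_dominates(PAYOFFS[(D, D)], PAYOFFS[(C, C)]))  # False

# Outcomes that are dominated by some other outcome in the table.
dominated = [
    profile
    for profile, payoff in PAYOFFS.items()
    if any(pareto_dominates(other, payoff) for other in PAYOFFS.values())
]
print(dominated)  # [('defect', 'defect')]
```

With these numbers, the equilibrium outcome is the only one that some other outcome improves upon for both players at once.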

Economic explanation for difference between winning strategy and win-win strategy

The win-win strategy of mutual cooperation differs so much from the Nash equilibrium strategy because each player has the power to affect the fortunes of the other player, but benefiting the other player comes at a cost, and the other player has no mechanism to compensate the first player for exercising that power beneficially.

We can formulate this by saying that the actions of players impose externalities (external costs and external benefits) on the other players, and there is a missing market for player choices, i.e., there is no mechanism for having enforceable contracts between the players that would allow them to internalize externalities.
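
To make the externality concrete, the sketch below measures, for an illustrative payoff table, what a player gives up by cooperating instead of defecting (the private cost) and what the other player gains as a result (the external benefit). The payoff values (0, 1, 3, 5) and the function name cooperation_effects are assumptions. The ordering of payoffs guarantees that both quantities are positive; that the benefit exceeds the cost is a property of the assumed numbers, not of the prisoner's dilemma in general.

```python
C, D = "cooperate", "defect"

# Assumed illustrative payoffs: PAYOFFS[(move_1, move_2)] = (payoff_1, payoff_2)
PAYOFFS = {
    (C, C): (3, 3),
    (D, C): (5, 0),
    (C, D): (0, 5),
    (D, D): (1, 1),
}


def cooperation_effects(player, other_move):
    """Return (private_cost, external_benefit) when `player`
    (0 = Player 1, 1 = Player 2) cooperates instead of defecting,
    holding the other player's move fixed."""
    other = 1 - player

    def profile(move):
        return (move, other_move) if player == 0 else (other_move, move)

    coop, defect = PAYOFFS[profile(C)], PAYOFFS[profile(D)]
    private_cost = defect[player] - coop[player]      # what the player gives up
    external_benefit = coop[other] - defect[other]    # what the other player gains
    return private_cost, external_benefit


print(cooperation_effects(0, C))  # (2, 3)
print(cooperation_effects(0, D))  # (1, 4)
```

In this example an enforceable contract could leave both players better off, since the gain to the other player is large enough to cover the cooperating player's cost.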

Repeated prisoner's dilemma

A repeated prisoner's dilemma (RPD) is a version of the prisoner's dilemma where the same pair of players play the prisoner's dilemma game repeatedly with one another. The typical RPD is one where both players have perfect memory of all past plays of the game, always make the move they intended to make, and where the game is stopped at a random point with no advance notice. Each player's goal is to maximize its cumulative payoff over all the rounds played.

RPD is qualitatively different from a single-iteration prisoner's dilemma because the ability of each player to respond to the past moves of the other player helps in the internalization of externalities, i.e., each player can reward or punish the other player based on past moves.

One of the strategies that makes sense in a repeated prisoner's dilemma, even though it does not make sense in a single iteration of the prisoner's dilemma, is the tit for tat strategy, where a player begins by cooperating and thereafter copies the other player's previous move, defecting if and only if the other player defected in the previous round. More generally, good strategies for the RPD share some key characteristics: they are nice (they start out by cooperating), clear (it is easy for the other player to figure out what strategy is being followed), retaliatory (if the other player defects, they defect in return), and forgiving (even if the other player defects once, it is possible to get back on track to both cooperating).
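
A minimal simulation can show how tit for tat behaves. In the sketch below, a strategy is a function of the two move histories; the payoff values (0, 1, 3, 5), the function names, and the use of a fixed number of rounds (in place of a random stopping point) are all simplifying assumptions made for the example.

```python
C, D = "cooperate", "defect"

# Assumed illustrative payoffs: PAYOFFS[(move_1, move_2)] = (payoff_1, payoff_2)
PAYOFFS = {
    (C, C): (3, 3),
    (D, C): (5, 0),
    (C, D): (0, 5),
    (D, D): (1, 1),
}


def tit_for_tat(own_history, other_history):
    """Cooperate first, then copy the other player's previous move."""
    return C if not other_history else other_history[-1]


def always_defect(own_history, other_history):
    """The single-round equilibrium strategy, repeated every round."""
    return D


def play(strategy_1, strategy_2, rounds):
    """Play `rounds` rounds and return the cumulative payoffs."""
    history_1, history_2 = [], []
    total_1 = total_2 = 0
    for _ in range(rounds):
        # Both moves are chosen before either is revealed (simultaneous play).
        move_1 = strategy_1(history_1, history_2)
        move_2 = strategy_2(history_2, history_1)
        payoff_1, payoff_2 = PAYOFFS[(move_1, move_2)]
        total_1 += payoff_1
        total_2 += payoff_2
        history_1.append(move_1)
        history_2.append(move_2)
    return total_1, total_2


print(play(tit_for_tat, tit_for_tat, 10))      # (30, 30)
print(play(tit_for_tat, always_defect, 10))    # (9, 14)
print(play(always_defect, always_defect, 10))  # (10, 10)
```

Against a defector, tit for tat loses only the first round and then retaliates, while two tit for tat players sustain mutual cooperation and each earn more than the defector does in any of these pairings.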

There are variations of the RPD where players make mistakes (i.e., their actual move is not always their intended move), have poor memories (i.e., they do not remember the entire past history of the game), etc.
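
The mistake-prone variation can be sketched by wrapping a strategy so that its intended move is occasionally flipped. The wrapper noisy, the error_rate parameter, the payoff values (0, 1, 3, 5), and the fixed number of rounds are all assumptions made for illustration rather than a standard formulation.

```python
import random

C, D = "cooperate", "defect"

# Assumed illustrative payoffs: PAYOFFS[(move_1, move_2)] = (payoff_1, payoff_2)
PAYOFFS = {
    (C, C): (3, 3),
    (D, C): (5, 0),
    (C, D): (0, 5),
    (D, D): (1, 1),
}


def tit_for_tat(own_history, other_history):
    """Cooperate first, then copy the other player's previous move."""
    return C if not other_history else other_history[-1]


def noisy(strategy, error_rate, rng):
    """Wrap `strategy` so that its intended move is flipped with
    probability `error_rate` (hypothetical model of mistakes)."""
    def wrapped(own_history, other_history):
        intended = strategy(own_history, other_history)
        if rng.random() < error_rate:
            return D if intended == C else C
        return intended
    return wrapped


def play(strategy_1, strategy_2, rounds):
    """Play `rounds` rounds; histories record the moves actually made."""
    history_1, history_2 = [], []
    total_1 = total_2 = 0
    for _ in range(rounds):
        move_1 = strategy_1(history_1, history_2)
        move_2 = strategy_2(history_2, history_1)
        payoff_1, payoff_2 = PAYOFFS[(move_1, move_2)]
        total_1 += payoff_1
        total_2 += payoff_2
        history_1.append(move_1)
        history_2.append(move_2)
    return total_1, total_2


rng = random.Random(0)  # seeded so that a run can be reproduced
shaky = noisy(tit_for_tat, 0.05, rng)
print(play(shaky, shaky, 200))
print(play(tit_for_tat, tit_for_tat, 200))  # (600, 600), the error-free baseline
```

Comparing the two printed totals gives a rough sense of how much accidental defections cost a pair of tit for tat players, since a single mistake can trigger a run of retaliation.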