Prisoner's dilemma

Standard prisoner's dilemma payoff matrix

| | B stays silent | B betrays |
|---|---|---|
| A stays silent | A: −2, B: −2 | A: −10, B: 0 |
| A betrays | A: 0, B: −10 | A: −5, B: −5 |

The prisoner's dilemma is a game analyzed in game theory.[citation needed] It is a thought experiment that challenges two completely rational agents to a dilemma: they can cooperate with their partner for mutual benefit or betray their partner ("defect") for individual reward.

This dilemma was originally framed by Merrill Flood and Melvin Dresher in 1950 while they worked at RAND.[citation needed] Albert W. Tucker later formalized the game by structuring the rewards in terms of prison sentences and named it "prisoner's dilemma".[1] William Poundstone described the game in his 1993 book Prisoner's Dilemma:

Two members of a criminal gang, A and B, are arrested and imprisoned. Each prisoner is in solitary confinement with no means of communication with their partner. The principal charge would lead to a sentence of ten years in prison; however, the police do not have the evidence for a conviction. They plan to sentence both to two years in prison on a lesser charge but offer each prisoner a Faustian bargain: If one of them confesses to the crime of the principal charge, betraying the other, they will be pardoned and free to leave while the other must serve the entirety of the sentence instead of just two years for the lesser charge.

This leads to four possible outcomes:

• A: If A and B both remain silent, they will each serve the lesser charge of 2 years in prison.
• B: If A betrays B but B remains silent, A will be set free while B serves 10 years in prison.
• C: If A remains silent but B betrays A, A will serve 10 years in prison and B will be set free.
• D: If A and B both betray each other, each of them serves 5 years in prison.

As a projection of rational behavior in terms of loyalty to one's partner in crime, the Prisoner's Dilemma suggests that criminals who are offered a greater reward will betray their partner.

Loyalty to one's partner is, in this game, irrational. This particular assumption of rationality implies that the only possible outcome for two purely rational prisoners is betrayal, even though mutual cooperation would yield a greater net reward.[2] Alternative ideas governing behavior have been proposed — see, for example, Elinor Ostrom.

The best response, i.e. the dominant strategy, is to betray the other, which aligns with the sure-thing principle.[3] The prisoner's dilemma also illustrates that the decisions made under collective rationality may not necessarily be the same as those made under individual rationality. This conflict is also evident in a situation called the "Tragedy of the Commons".[3]

In reality, systemic bias towards cooperative behavior happens despite predictions by simple models of "rational" self-interested action.[4][5][6][7] This bias towards cooperation has been evident since this game was first conducted at RAND: Secretaries involved often trusted each other and worked together toward the best common outcome.[8]

The prisoner's dilemma became the focus of extensive experimental research.[9][10] This research has taken one of three forms: single play (agents play one game only), iterated play (agents play several games in succession), and iterated play against a programmed player.[3] Research on the prisoner's dilemma has served to justify the categorical imperative raised by Immanuel Kant, which states that a rational agent is expected to "act in the way you wish others to act." This theory is vital for a situation involving different players each acting for their best interest who must take others' actions into consideration to form their own choice.[3]

In the "iterative" variant of the game, where two agents play against each other several times, agents continually have the opportunity to penalize the other for previous decisions. If the number of times the game will be played is known to the players, then by backward induction two classically rational players will betray each other repeatedly, for the same reasons as the single-shot variant. In an infinite or unknown length game there is no fixed optimum strategy, and prisoner's dilemma tournaments have been held to compete and test algorithms for such cases.[11]

The iterated version of the prisoner's dilemma is of particular interest to researchers. Because of its iterative nature, researchers have observed that the frequency with which players cooperate can change based on the outcomes of each iteration. Specifically, a player may be less willing to cooperate if their counterpart has repeatedly failed to cooperate, which causes disappointment. Conversely, as time goes by, cooperation can increase as a "tacit agreement" is established between players. Another notable aspect of the iterated version of the experiment is that this tacit agreement between players has always been established successfully even when the number of iterations is made public to both sides.[3]

The prisoner's dilemma game can model many real world situations involving cooperative behavior. In casual usage, the label "prisoner's dilemma" may be applied to any situation in which two entities could gain important benefits from cooperating or suffer from the failure to do so but find it difficult or expensive—though not necessarily impossible—to coordinate their activities.

Strategy for the prisoner's dilemma

Two prisoners are separated into individual rooms and cannot communicate with each other. The normal game is shown below:

| | Prisoner B stays silent (cooperates) | Prisoner B betrays (defects) |
|---|---|---|
| Prisoner A stays silent (cooperates) | Each serves 2 years | Prisoner A: 10 years; Prisoner B: goes free |
| Prisoner A betrays (defects) | Prisoner A: goes free; Prisoner B: 10 years | Each serves 5 years |

It is assumed that both prisoners understand the nature of the game, have no loyalty to each other and will have no opportunity for retribution or reward outside of the game. Regardless of what the other decides, each prisoner gets a higher reward by betraying the other ("defecting"). The reasoning involves analyzing both players' best responses: B will either cooperate or defect. If B cooperates, A should defect, because going free is better than serving 2 years. If B defects, A should also defect, because serving 5 years is better than serving 10. So, either way, A should defect since defecting is A's best response regardless of B's strategy. Parallel reasoning will show that B should defect.

Defection always results in a better payoff than cooperation, so it is a strictly dominant strategy for both A and B. Mutual defection is the only strict Nash equilibrium in the game (i.e. the only outcome from which each player could only do worse by unilaterally changing strategy). The dilemma is that mutual cooperation yields a better outcome than mutual defection but is not the rational outcome because the choice to cooperate, from a self-interested perspective, is irrational. Thus, the prisoner's dilemma is a game in which the Nash equilibrium is not Pareto efficient.
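The best-response argument can be checked mechanically. The following is a minimal sketch, assuming payoffs are written as negative years in prison (taken from the matrix above); the helper names `best_response_of_a` and `is_nash` are illustrative only.

```python
from itertools import product

# PAYOFF[(a_move, b_move)] = (payoff_A, payoff_B), in negative years of prison.
# "C" = stay silent (cooperate), "D" = betray (defect).
PAYOFF = {
    ("C", "C"): (-2, -2),
    ("C", "D"): (-10, 0),
    ("D", "C"): (0, -10),
    ("D", "D"): (-5, -5),
}
MOVES = ("C", "D")


def best_response_of_a(b_move):
    """Return A's payoff-maximizing move against a fixed move by B."""
    return max(MOVES, key=lambda a_move: PAYOFF[(a_move, b_move)][0])


def is_nash(a_move, b_move):
    """True if neither player gains by unilaterally switching moves."""
    a_pay, b_pay = PAYOFF[(a_move, b_move)]
    a_ok = all(PAYOFF[(alt, b_move)][0] <= a_pay for alt in MOVES)
    b_ok = all(PAYOFF[(a_move, alt)][1] <= b_pay for alt in MOVES)
    return a_ok and b_ok


# A's best response to each possible move by B.
best_responses = {b_move: best_response_of_a(b_move) for b_move in MOVES}
# All outcomes that are Nash equilibria.
equilibria = [cell for cell in product(MOVES, MOVES) if is_nash(*cell)]
# Outcomes that leave both players strictly better off than mutual defection.
pareto_better = [
    cell for cell in PAYOFF
    if all(x > y for x, y in zip(PAYOFF[cell], PAYOFF[("D", "D")]))
]
print(best_responses, equilibria, pareto_better)
```

Under these payoffs A's best response is to defect against either move, mutual defection is the only equilibrium found, and mutual cooperation is the only outcome that both players prefer to it.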

Generalized form

The structure of the traditional prisoner's dilemma can be generalized from its original prisoner setting. Suppose that the two players are represented by the colors red and blue and that each player chooses to either "cooperate" or "defect".

If both players cooperate, they both receive the reward R for cooperating. If both players defect, they both receive the punishment payoff P. If Blue defects while Red cooperates, then Blue receives the temptation payoff T, while Red receives the "sucker's" payoff, S. Similarly, if Blue cooperates while Red defects, then Blue receives the sucker's payoff S, while Red receives the temptation payoff T.

This can be expressed in normal form:

Canonical PD payoff matrix

| | Blue cooperates | Blue defects |
|---|---|---|
| Red cooperates | Red: R, Blue: R | Red: S, Blue: T |
| Red defects | Red: T, Blue: S | Red: P, Blue: P |

To be a prisoner's dilemma game in the strong sense, the payoffs must satisfy the following condition:

${\displaystyle T>R>P>S}$

The payoff relationship ${\displaystyle R>P}$ implies that mutual cooperation is superior to mutual defection, while the payoff relationships ${\displaystyle T>R}$ and ${\displaystyle P>S}$ imply that defection is the dominant strategy for both agents.

Special case: donation game

The "donation game"[12] is a form of prisoner's dilemma in which cooperation corresponds to offering the other player a benefit b at a personal cost c with b > c. Defection means offering nothing. The payoff matrix is thus

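| | Blue cooperates | Blue defects |
|---|---|---|
| Red cooperates | Red: b − c, Blue: b − c | Red: −c, Blue: b |
| Red defects | Red: b, Blue: −c | Red: 0, Blue: 0 |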

Note that ${\displaystyle 2R>T+S}$ (i.e. ${\displaystyle 2(b-c)>b-c}$), which qualifies the donation game to be an iterated game (see next section).

The donation game can be applied to markets. Suppose X grows oranges and Y grows apples. The marginal utility of an apple to the orange-grower X is b, which is higher than the marginal utility (c) of an orange, since X has a surplus of oranges and no apples. Similarly, for apple-grower Y, the marginal utility of an orange is b while the marginal utility of an apple is c. If X and Y contract to exchange an apple and an orange, and each fulfills their end of the deal, then each receives a payoff of b − c. If one "defects" and does not deliver as promised, the defector will receive a payoff of b, while the cooperator will lose c. If both defect, then neither one gains or loses anything.
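As a small illustration, the donation game can be mapped onto the generalized payoffs T, R, P and S. The sketch below assumes c > 0 in addition to b > c (the text only states b > c); the function name `donation_game` and the sample values b = 3, c = 1 are hypothetical.

```python
def donation_game(b, c):
    """Return (T, R, P, S) for a donation game with benefit b and cost c."""
    if not b > c > 0:
        raise ValueError("this sketch assumes b > c > 0")
    T = b        # defect against a cooperator: receive the benefit, pay nothing
    R = b - c    # mutual cooperation: receive the benefit, pay the cost
    P = 0        # mutual defection: nothing changes hands
    S = -c       # cooperate with a defector: pay the cost, receive nothing
    return T, R, P, S


T, R, P, S = donation_game(b=3, c=1)
is_pd = T > R > P > S            # strong-sense prisoner's dilemma ordering
allows_iteration = 2 * R > T + S  # 2(b - c) > b - c
print((T, R, P, S), is_pd, allows_iteration)
```

With these sample values both conditions hold, which matches the claim that any donation game with b > c > 0 is a prisoner's dilemma.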

The iterated prisoner's dilemma

If two players play the prisoner's dilemma more than once in succession, remember previous actions of their opponent and are allowed to change their strategy accordingly, the game is called the iterated prisoner's dilemma.

In addition to the general form above, the iterative version also requires that ${\displaystyle 2R>T+S}$, to prevent alternating cooperation and defection giving a greater reward than mutual cooperation.
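A short numerical sketch shows why the extra condition matters. It compares a player's per-round average under mutual cooperation with the average when two players take turns exploiting each other. The payoff values used are assumptions chosen for illustration, and `per_round_average` is a hypothetical helper.

```python
def per_round_average(T, R, S, rounds=100):
    """Average payoff per round: mutual cooperation vs. alternating T and S."""
    mutual = sum(R for _ in range(rounds)) / rounds
    # When the players alternate, one player earns T on even rounds, S on odd ones.
    alternating = sum(T if r % 2 == 0 else S for r in range(rounds)) / rounds
    return mutual, alternating


satisfied = per_round_average(T=5, R=3, S=0)   # 2R > T + S holds
violated = per_round_average(T=8, R=3, S=0)    # 2R > T + S fails
print(satisfied, violated)
```

In the first case cooperation pays more per round than alternating; in the second, alternating would beat mutual cooperation, which is the situation the condition rules out.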

The iterated prisoner's dilemma game is fundamental to some theories of human cooperation and trust. Assuming that the game effectively models transactions between two people that require trust, cooperative behavior in populations can be modeled by a multi-player iterated version of the game. In 1975, Grofman and Pool estimated the count of scholarly articles devoted to it at over 2,000. The iterated prisoner's dilemma is also referred to as the "peace-war game".[13][14]

If the game is played exactly N times and both players know this, then it is optimal to defect in all rounds. The only possible Nash equilibrium is to always defect. The proof is inductive: one might as well defect on the last turn, since the opponent will not have a chance to retaliate later. Therefore, both will defect on the last turn. Thus, the player might as well defect on the second-to-last turn, since the opponent will defect on the last no matter what is done, and so on. The same applies if the game length is unknown but has a known upper limit.

Unlike the standard prisoner's dilemma, in the iterated prisoner's dilemma the defection strategy is counter-intuitive and fails to predict the behavior of human players, despite defection being the only correct answer in standard game theory. The superrational strategy in the iterated prisoner's dilemma with fixed N is to cooperate against a superrational opponent, and in the limit of large N, experimental results on strategies align with the superrational version rather than the game-theoretic rational one.

For cooperation to emerge between game theoretic rational players, the number of rounds N must be unknown to the players. In this case "always defect" may no longer be a strictly dominant strategy but only a Nash equilibrium. As shown by Robert Aumann in a 1959 paper,[citation needed] rational players repeatedly interacting for indefinitely long games can sustain the cooperative outcome.

According to a 2019 experimental study in the American Economic Review that tested what strategies real-life subjects used in iterated prisoner's dilemma situations with perfect monitoring, the majority of chosen strategies were always defect, tit-for-tat, and grim trigger. Which strategy the subjects chose depended on the parameters of the game.[15]

Strategy for the iterated prisoner's dilemma

Interest in the iterated prisoner's dilemma (IPD) was kindled by Robert Axelrod in his book The Evolution of Cooperation (1984), in which he reports on a tournament that he organized of the N step prisoner's dilemma (with N fixed) in which participants have to choose their mutual strategy again and again and have memory of their previous encounters. Axelrod invited academic colleagues from around the world to devise computer strategies to compete in an IPD tournament. The programs that were entered varied widely in algorithmic complexity, initial hostility, capacity for forgiveness, and so forth.

Axelrod discovered that when these encounters were repeated over a long period of time with many players, each with different strategies, greedy strategies tended to do very poorly in the long run while more altruistic strategies did better, as judged purely by self-interest. He used this to show a possible mechanism for the evolution of altruistic behavior from mechanisms that are initially purely selfish, by natural selection.

The winning deterministic strategy was tit for tat, developed and entered into the tournament by Anatol Rapoport. It was the simplest of all the programs entered, containing only four lines of BASIC. The strategy is simply to cooperate on the first iteration of the game; after that, the player does what their opponent did on the previous move. Depending on the situation, a slightly better strategy can be "tit for tat with forgiveness": when the opponent defects, the player sometimes cooperates anyway on the next move, with a small probability (around 1–5%). This allows for occasional recovery from getting trapped in a cycle of defections. The exact probability depends on the line-up of opponents.
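Tit for tat and its forgiving variant are short enough to sketch directly. The code below is an illustration, not a reconstruction of Rapoport's BASIC program: the payoff values (T, R, P, S) = (5, 3, 1, 0) are an assumption, and the function names are hypothetical.

```python
import random

# Assumed payoffs (T, R, P, S) = (5, 3, 1, 0); the text does not fix numbers.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def tit_for_tat(own_history, opp_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opp_history[-1] if opp_history else "C"


def always_defect(own_history, opp_history):
    """Defect on every move."""
    return "D"


def make_forgiving_tit_for_tat(forgiveness, rng):
    """Tit for tat that cooperates after a defection with probability `forgiveness`."""
    def strategy(own_history, opp_history):
        if opp_history and opp_history[-1] == "D" and rng.random() >= forgiveness:
            return "D"
        return "C"
    return strategy


def play(strategy_a, strategy_b, rounds):
    """Play an iterated match and return the two total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


forgiving = make_forgiving_tit_for_tat(0.05, random.Random(0))
print(play(tit_for_tat, tit_for_tat, 10))
print(play(tit_for_tat, always_defect, 10))
print(play(forgiving, always_defect, 10))
```

Against a defector, plain tit for tat loses only the first round and then matches defection, while two tit-for-tat players cooperate throughout. The forgiving variant's score depends on the random draws.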

After analyzing the top-scoring strategies, Axelrod stated several conditions necessary for a strategy to succeed:

Nice
The most important condition is that the strategy must be "nice". That is, it will not defect before its opponent does (this is sometimes referred to as an "optimistic" algorithm). Almost all of the top-scoring strategies were nice. Therefore, for purely self-interested reasons, even a purely selfish strategy will not be the first to "cheat" on its opponent.
Retaliating
However, Axelrod contended, the successful strategy must not be a blind optimist; it must sometimes retaliate. An example of a non-retaliating strategy is Always Cooperate, a very bad choice that will frequently be exploited by "nasty" strategies.
Forgiving
Successful strategies must also be forgiving. Though players will retaliate, they will once again fall back to cooperating if the opponent does not continue to defect. This can stop long runs of revenge and counter-revenge, maximizing points.
Non-envious
The last quality is being non-envious, meaning not striving to score more than the opponent.

The optimal (points-maximizing) strategy for the one-time PD game is simply defection; as explained above, this is true whatever the composition of opponents (collectively called a "population") may be. However, in the iterated-PD game the optimal strategy depends upon the strategies of likely opponents, and how they will react to defections and cooperations. Consider, for example, a population where everyone defects every time, except for a single individual following the tit for tat strategy. That individual is at a slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy for that individual is to defect every time. In turn, given a population with a certain percentage of always-defectors and the rest being tit for tat players, the optimal strategy for an individual depends on the percentage and on the number of iterations played.

In the strategy called Pavlov, or win-stay, lose-switch, a player faced with a failure to cooperate switches strategy on the next turn.[16] In certain circumstances,[specify] Pavlov beats all other strategies by giving preferential treatment to co-players using a similar strategy.
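A minimal sketch of win-stay, lose-switch follows. It assumes the usual reading in which a round counts as a "win" whenever the opponent cooperated, and it assumes an opening move of cooperation; `pavlov` and `trace` are hypothetical names.

```python
def pavlov(own_history, opp_history):
    """Win-stay, lose-switch: keep the last move after a win, flip it after a loss."""
    if not own_history:
        return "C"                      # assumed opening move
    last_own, last_opp = own_history[-1], opp_history[-1]
    won = last_opp == "C"               # the opponent cooperated: treat as a win
    if won:
        return last_own
    return "D" if last_own == "C" else "C"


def trace(strategy, opp_moves):
    """Return the strategy's moves against a fixed sequence of opponent moves."""
    own, opp = [], []
    for opp_move in opp_moves:
        own.append(strategy(own, opp))
        opp.append(opp_move)
    return "".join(own)


print(trace(pavlov, "CCDDCC"))
print(trace(pavlov, "DDDD"))
```

Against a permanent defector this rule alternates between cooperating and defecting, since every round is a loss and triggers a switch.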

Deriving the optimal strategy is generally done in two ways:

• Bayesian Nash equilibrium: If the statistical distribution of opposing strategies can be determined (e.g., 50% tit for tat, 50% always cooperate) an optimal counterstrategy can be derived analytically. [a]
• Monte Carlo simulations of populations have been made, where individuals with low scores die off, and those with high scores reproduce (a genetic algorithm for finding an optimal strategy). The mix of algorithms in the final population generally depends on the mix in the initial population. The introduction of mutation (random variation during reproduction) lessens the dependency on the initial population; empirical experiments with such systems tend to produce tit for tat players (see for instance Chess 1988),[clarification needed] but no analytic proof exists that this will always occur.[18]
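The population approach can be illustrated with a much simpler model than a full Monte Carlo run. The sketch below swaps in a deterministic, fitness-proportional update on population shares, with no mutation and no sampling noise. The three strategies, the payoffs (5, 3, 1, 0), the 50-round matches and all names are assumptions made for illustration.

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}  # assumed values

# Each strategy maps the list of moves it has seen from its opponent to a move.
STRATEGIES = {
    "tit_for_tat": lambda seen: seen[-1] if seen else "C",
    "always_defect": lambda seen: "D",
    "always_cooperate": lambda seen: "C",
}


def match_score(name_a, name_b, rounds=50):
    """Total score of strategy `name_a` over one iterated match against `name_b`."""
    seen_by_a, seen_by_b, score_a = [], [], 0
    for _ in range(rounds):
        move_a = STRATEGIES[name_a](seen_by_a)
        move_b = STRATEGIES[name_b](seen_by_b)
        score_a += PAYOFF[(move_a, move_b)][0]
        seen_by_a.append(move_b)
        seen_by_b.append(move_a)
    return score_a


def evolve(shares, generations=100):
    """Rescale each strategy's share by its fitness relative to the population mean."""
    names = list(shares)
    score = {(a, b): match_score(a, b) for a in names for b in names}
    for _ in range(generations):
        fitness = {a: sum(shares[b] * score[(a, b)] for b in names) for a in names}
        mean = sum(shares[a] * fitness[a] for a in names)
        shares = {a: shares[a] * fitness[a] / mean for a in names}
    return shares


final = evolve({name: 1 / 3 for name in STRATEGIES})
print(final)
```

In this particular setup the defectors gain ground briefly by exploiting unconditional cooperators and then decline, leaving tit for tat with the largest share. As the text notes, the outcome depends on the initial mix, so this run should not be read as a general result.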

Although tit for tat is considered to be the most robust basic strategy, a team from Southampton University in England introduced a new strategy at the 20th-anniversary iterated prisoner's dilemma competition, which proved to be more successful than tit for tat. This strategy relied on collusion between programs to achieve the highest number of points for a single program. The university submitted 60 programs to the competition, which were designed to recognize each other through a series of five to ten moves at the start.[19] Once this recognition was made, one program would always cooperate and the other would always defect, assuring the maximum number of points for the defector. If the program realized that it was playing a non-Southampton player, it would continuously defect in an attempt to minimize the score of the competing program. As a result, the 2004 Prisoners' Dilemma Tournament results show University of Southampton's strategies in the first three places (and a number of positions towards the bottom), despite having fewer wins and many more losses than the GRIM strategy. (In a PD tournament, the aim of the game is not to "win" matches – that can easily be achieved by frequent defection).

The Southampton strategy takes advantage of the fact that multiple entries were allowed in this particular competition and that the performance of a team was measured by that of the highest-scoring player (meaning that the use of self-sacrificing players was a form of min-maxing). Because of this new rule, this competition also has little theoretical significance when analyzing single-agent strategies as compared to Axelrod's seminal tournament. However, it provided a basis for analyzing how to achieve cooperative strategies in multi-agent frameworks, especially in the presence of noise. In fact, long before this new-rules tournament was played, Dawkins, in his book The Selfish Gene, pointed out the possibility of such strategies winning if multiple entries were allowed, but he remarked that Axelrod would most probably not have allowed them if they had been submitted. The Southampton strategy also relies on circumventing the rule that no communication is allowed between the two players, which the Southampton programs arguably did with their preprogrammed "ten move dance" to recognize one another. This reinforces just how valuable communication can be in shifting the balance of the game.

Even without implicit collusion between software strategies (exploited by the Southampton team), tit for tat is not always the absolute winner of any given tournament; it would be more precise to say that its long-run results over a series of tournaments outperform its rivals. (In any one event a given strategy can be slightly better adjusted to the competition than tit for tat, but tit for tat is more robust.) The same applies to the tit for tat with forgiveness variant, and other optimal strategies: on any given day they might not "win" against a specific mix of counterstrategies. An alternative way of putting it uses the Darwinian ESS simulation. In such a simulation, tit for tat will almost always come to dominate, though nasty strategies will drift in and out of the population because a tit for tat population is penetrable by non-retaliating nice strategies, which in turn are easy prey for the nasty strategies. Richard Dawkins showed that here, no static mix of strategies forms a stable equilibrium, and the system will always oscillate between bounds.

Stochastic iterated prisoner's dilemma

In a stochastic iterated prisoner's dilemma game, strategies are specified in terms of "cooperation probabilities".[20] In an encounter between player X and player Y, X's strategy is specified by a set of probabilities P of cooperating with Y. P is a function of the outcomes of their previous encounters or some subset thereof. If P is a function of only their most recent n encounters, it is called a "memory-n" strategy. A memory-1 strategy is then specified by four cooperation probabilities: ${\displaystyle P=\{P_{cc},P_{cd},P_{dc},P_{dd}\}}$, where ${\displaystyle P_{ab}}$ is the probability that X will cooperate in the present encounter given that the previous encounter was characterized by (ab). For example, if the previous encounter was one in which X cooperated and Y defected, then ${\displaystyle P_{cd}}$ is the probability that X will cooperate in the present encounter. If each of the probabilities is either 1 or 0, the strategy is called deterministic. An example of a deterministic strategy is the tit-for-tat strategy written as P={1,0,1,0}, in which X responds as Y did in the previous encounter. Another is the win–stay, lose–switch strategy written as P={1,0,0,1}, in which X repeats its previous move if the previous encounter was a "win" (i.e., cc or dc) but changes its move if it was a loss (i.e., cd or dd). It has been shown that for any memory-n strategy there is a corresponding memory-1 strategy that gives the same statistical results, so that only memory-1 strategies need be considered.[20]

If we define P as the above 4-element strategy vector of X and ${\displaystyle Q=\{Q_{cc},Q_{cd},Q_{dc},Q_{dd}\}}$ as the 4-element strategy vector of Y, a transition matrix M may be defined for X whose ij th entry is the probability that the outcome of a particular encounter between X and Y will be j given that the previous encounter was i, where i and j are one of the four outcome indices: cc, cd, dc, or dd. For example, from X's point of view, the probability that the outcome of the present encounter is cd given that the previous encounter was cd is equal to ${\displaystyle M_{cd,cd}=P_{cd}(1-Q_{dc})}$. (The indices for Q are from Y's point of view: a cd outcome for X is a dc outcome for Y.) Under these definitions, the iterated prisoner's dilemma qualifies as a stochastic process and M is a stochastic matrix, allowing all of the theory of stochastic processes to be applied.[20]

One result of stochastic theory is that there exists a stationary vector v for the matrix M such that ${\displaystyle v\cdot M=v}$. Without loss of generality, it may be specified that v is normalized so that the sum of its four components is unity. The ij th entry in ${\displaystyle M^{n}}$ will give the probability that the outcome of an encounter between X and Y will be j given that the encounter n steps previous is i. In the limit as n approaches infinity, ${\displaystyle M^{n}}$ will converge to a matrix with fixed values, giving the long-term probabilities of an encounter producing j, which will be independent of i. In other words, the rows of ${\displaystyle M^{\infty }}$ will be identical, giving the long-term equilibrium result probabilities of the iterated prisoner's dilemma without the need to explicitly evaluate a large number of interactions. It can be seen that v is a stationary vector for ${\displaystyle M^{n}}$ and particularly ${\displaystyle M^{\infty }}$, so that each row of ${\displaystyle M^{\infty }}$ will be equal to v. Thus, the stationary vector specifies the equilibrium outcome probabilities for X. Defining ${\displaystyle S_{x}=\{R,S,T,P\}}$ and ${\displaystyle S_{y}=\{R,T,S,P\}}$ as the short-term payoff vectors for the {cc,cd,dc,dd} outcomes (from X's point of view), the equilibrium payoffs for X and Y can now be specified as ${\displaystyle s_{x}=v\cdot S_{x}}$ and ${\displaystyle s_{y}=v\cdot S_{y}}$, allowing the two strategies P and Q to be compared for their long-term payoffs.
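The construction of M and its stationary vector can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the payoffs default to (T, R, P, S) = (5, 3, 1, 0), the stationary vector is approximated by repeatedly applying M to a uniform start (which presumes the chain converges, as it does when all probabilities lie strictly between 0 and 1), and the function names are hypothetical.

```python
def transition_matrix(p, q):
    """Build M for memory-1 strategies p (player X) and q (player Y).

    Both vectors are ordered (cc, cd, dc, dd) from their owner's point of view.
    Rows and columns of M are ordered (cc, cd, dc, dd) from X's point of view.
    """
    q_from_x = (q[0], q[2], q[1], q[3])   # a cd outcome for X is dc for Y
    rows = []
    for px, qy in zip(p, q_from_x):
        rows.append([px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)])
    return rows


def stationary_vector(m, steps=10_000):
    """Approximate v with v.M = v by applying M repeatedly to a uniform start."""
    v = [0.25] * 4
    for _ in range(steps):
        v = [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]
    return v


def long_run_payoffs(p, q, T=5, R=3, P=1, S=0):
    """Return (s_x, s_y) = (v.S_x, v.S_y) for the stationary vector v."""
    v = stationary_vector(transition_matrix(p, q))
    s_x = sum(vi * si for vi, si in zip(v, (R, S, T, P)))
    s_y = sum(vi * si for vi, si in zip(v, (R, T, S, P)))
    return s_x, s_y


noisy_cooperator = (0.9, 0.9, 0.9, 0.9)
noisy_defector = (0.1, 0.1, 0.1, 0.1)
print(long_run_payoffs(noisy_cooperator, noisy_defector))
```

For deterministic strategies such as tit for tat the chain may have several stationary vectors, so the uniform starting point then influences the result and the output should be read with care.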

Zero-determinant strategies

The relationship between zero-determinant (ZD), cooperating and defecting strategies in the iterated prisoner's dilemma (IPD) illustrated in a Venn diagram. Cooperating strategies always cooperate with other cooperating strategies, and defecting strategies always defect against other defecting strategies. Both contain subsets of strategies that are robust under strong selection, meaning no other memory-1 strategy is selected to invade such strategies when they are resident in a population. Only cooperating strategies contain a subset that are always robust, meaning that no other memory-1 strategy is selected to invade and replace such strategies, under both strong and weak selection. The intersection between ZD and good cooperating strategies is the set of generous ZD strategies. Extortion strategies are the intersection between ZD and non-robust defecting strategies. Tit-for-tat lies at the intersection of cooperating, defecting and ZD strategies.

In 2012, William H. Press and Freeman Dyson published a new class of strategies for the stochastic iterated prisoner's dilemma called "zero-determinant" (ZD) strategies.[20] The long term payoffs for encounters between X and Y can be expressed as the determinant of a matrix which is a function of the two strategies and the short term payoff vectors: ${\displaystyle s_{x}=D(P,Q,S_{x})}$ and ${\displaystyle s_{y}=D(P,Q,S_{y})}$, which do not involve the stationary vector v. Since the determinant function ${\displaystyle s_{y}=D(P,Q,f)}$ is linear in f, it follows that ${\displaystyle \alpha s_{x}+\beta s_{y}+\gamma =D(P,Q,\alpha S_{x}+\beta S_{y}+\gamma U)}$ (where U = {1, 1, 1, 1}). Any strategy for which ${\displaystyle D(P,Q,\alpha S_{x}+\beta S_{y}+\gamma U)=0}$ is by definition a ZD strategy, and the long-term payoffs obey the relation ${\displaystyle \alpha s_{x}+\beta s_{y}+\gamma =0}$.

Tit-for-tat is a ZD strategy which is "fair" in the sense of not gaining advantage over the other player. However, the ZD space also contains strategies that, in the case of two players, can allow one player to unilaterally set the other player's score or alternatively, force an evolutionary player to achieve a payoff some percentage lower than his own. The extorted player could defect but would thereby hurt himself by getting a lower payoff. Thus, extortion solutions turn the iterated prisoner's dilemma into a sort of ultimatum game. Specifically, X is able to choose a strategy for which ${\displaystyle D(P,Q,\beta S_{y}+\gamma U)=0}$, unilaterally setting ${\displaystyle s_{y}}$ to a specific value within a particular range of values, independent of Y's strategy, offering an opportunity for X to "extort" player Y (and vice versa). (It turns out that if X tries to set ${\displaystyle s_{x}}$ to a particular value, the range of possibilities is much smaller, only consisting of complete cooperation or complete defection.[20])
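The linear relation enforced by an extortionate ZD strategy can be checked numerically. The sketch below is an illustration, not code from the cited paper. It assumes payoffs (T, R, P, S) = (5, 3, 1, 0) and builds X's strategy so that its vector, after subtracting 1 from the first two entries, is proportional to (S_x − P) − χ(S_y − P). The extortion factor χ = 3 and scale φ = 1/26 are sample values, and the function names are hypothetical. Long-run payoffs are approximated by power iteration rather than by the determinant formula.

```python
def stationary_payoffs(p, q, payoffs, steps=20_000):
    """Long-run payoffs (s_x, s_y) for memory-1 strategies p (X) and q (Y)."""
    T, R, P, S = payoffs
    q_from_x = (q[0], q[2], q[1], q[3])            # reindex Y's vector to X's view
    m = [[px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)]
         for px, qy in zip(p, q_from_x)]
    v = [0.25] * 4
    for _ in range(steps):                          # power iteration towards v.M = v
        v = [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]
    s_x = sum(a * b for a, b in zip(v, (R, S, T, P)))
    s_y = sum(a * b for a, b in zip(v, (R, T, S, P)))
    return s_x, s_y


def extortion_strategy(chi, phi, payoffs):
    """Memory-1 vector intended to enforce s_x - P = chi * (s_y - P)."""
    T, R, P, S = payoffs
    s_x, s_y = (R, S, T, P), (R, T, S, P)
    tilde = [phi * ((a - P) - chi * (b - P)) for a, b in zip(s_x, s_y)]
    p = (1 + tilde[0], 1 + tilde[1], tilde[2], tilde[3])
    if not all(0 <= value <= 1 for value in p):
        raise ValueError("phi is too large for these payoffs and chi")
    return p


PAYOFFS = (5, 3, 1, 0)                              # assumed (T, R, P, S)
extortioner = extortion_strategy(chi=3, phi=1 / 26, payoffs=PAYOFFS)
for opponent in [(0.7, 0.2, 0.6, 0.3), (0.5, 0.5, 0.5, 0.5), (0.9, 0.1, 0.8, 0.4)]:
    s_x, s_y = stationary_payoffs(extortioner, opponent, PAYOFFS)
    print(round(s_x - 1, 6), round(3 * (s_y - 1), 6))
```

For each of the sample opponents the two printed numbers agree, so X's surplus over the punishment payoff is three times Y's regardless of what Y plays. The opponents here use only probabilities strictly between 0 and 1, so that the power iteration converges to a unique stationary vector.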

An extension of the IPD is an evolutionary stochastic IPD, in which the relative abundance of particular strategies is allowed to change, with more successful strategies relatively increasing. This process may be accomplished by having less successful players imitate the more successful strategies, or by eliminating less successful players from the game, while multiplying the more successful ones. It has been shown that unfair ZD strategies are not evolutionarily stable. The key intuition is that an evolutionarily stable strategy must not only be able to invade another population (which extortionary ZD strategies can do) but must also perform well against other players of the same type (which extortionary ZD players fail to do, because they reduce each other's surplus).[21]

Theory and simulations confirm that beyond a critical population size, ZD extortion loses out in evolutionary competition against more cooperative strategies, and as a result, the average payoff in the population increases when the population is larger. In addition, there are some cases in which extortioners may even catalyze cooperation by helping to break out of a face-off between uniform defectors and win–stay, lose–switch agents.[12]

While extortionary ZD strategies are not stable in large populations, another ZD class called "generous" strategies is both stable and robust. In fact, when the population is not too small, these strategies can supplant any other ZD strategy and even perform well against a broad array of generic strategies for iterated prisoner's dilemma, including win–stay, lose–switch. This was proven specifically for the donation game by Alexander Stewart and Joshua Plotkin in 2013.[22] Generous strategies will cooperate with other cooperative players, and in the face of defection, the generous player loses more utility than its rival. Generous strategies are the intersection of ZD strategies and so-called "good" strategies, which were defined by Akin (2013)[23] to be those for which the player responds to past mutual cooperation with future cooperation and splits expected payoffs equally if he receives at least the cooperative expected payoff. Among good strategies, the generous (ZD) subset performs well when the population is not too small. If the population is very small, defection strategies tend to dominate.[22]

Continuous iterated prisoner's dilemma

Most work on the iterated prisoner's dilemma has focused on the discrete case, in which players either cooperate or defect, because this model is relatively simple to analyze. However, some researchers have looked at models of the continuous iterated prisoner's dilemma, in which players are able to make a variable contribution to the other player. Le and Boyd[24] found that in such situations, cooperation is much harder to evolve than in the discrete iterated prisoner's dilemma. The basic intuition for this result is straightforward: in a continuous prisoner's dilemma, if a population starts off in a non-cooperative equilibrium, players who are only marginally more cooperative than non-cooperators get little benefit from assorting with one another. By contrast, in a discrete prisoner's dilemma, tit-for-tat cooperators get a big payoff boost from assorting with one another in a non-cooperative equilibrium, relative to non-cooperators. Since nature arguably offers more opportunities for variable cooperation than a strict dichotomy of cooperation or defection, the continuous prisoner's dilemma may help explain why real-life examples of tit-for-tat-like cooperation are extremely rare in nature (e.g. Hammerstein[25]) even though tit for tat seems robust in theoretical models.

Emergence of stable strategies

Players often cannot coordinate on mutual cooperation and thus get locked into the inferior yet stable strategy of defection. In this way, iterated rounds facilitate the evolution of stable strategies.[26] Iterated rounds often produce novel strategies, which have implications for complex social interaction. One such strategy is win-stay lose-shift: if you can get away with cheating, repeat that behavior; if you get caught, switch. This strategy outperforms a simple tit-for-tat strategy.[27]

One problem with the tit-for-tat strategy is that it is vulnerable to signal error. The problem arises when one individual cheats in retaliation but the other interprets it as unprovoked cheating. As a result, the second individual now cheats in turn, which starts a see-saw pattern of cheating in a chain reaction.
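The see-saw pattern is easy to reproduce. The sketch below pits two tit-for-tat players against each other and injects a single erroneous defection; the function name and the choice of round are hypothetical.

```python
def tit_for_tat_with_error(rounds, error_round):
    """Two tit-for-tat players; A's move in `error_round` is flipped to a defection."""
    moves_a, moves_b = [], []
    for r in range(rounds):
        # Each player copies the other's previous move, cooperating at the start.
        move_a = moves_b[-1] if moves_b else "C"
        move_b = moves_a[-1] if moves_a else "C"
        if r == error_round:
            move_a = "D"        # a single mistaken (or misread) defection
        moves_a.append(move_a)
        moves_b.append(move_b)
    return "".join(moves_a), "".join(moves_b)


history_a, history_b = tit_for_tat_with_error(rounds=8, error_round=2)
print(history_a)
print(history_b)
```

After the single error the two players never return to mutual cooperation; they keep trading defections on alternate rounds.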

Even without repeated games, strong enlightened self-interest can result in a stable and efficient outcome.[28]

Real-life examples

The prisoner setting may seem contrived, but there are in fact many examples in human interaction as well as interactions in nature that have the same payoff matrix. The prisoner's dilemma is therefore of interest to the social sciences such as economics, politics, and sociology, as well as to the biological sciences such as ethology and evolutionary biology. Many natural processes have been abstracted into models in which living beings are engaged in endless games of prisoner's dilemma. This wide applicability of the PD gives the game its substantial importance.

Environmental studies

In environmental studies, the PD is evident in crises such as global climate change. It is argued that all countries will benefit from a stable climate, but any single country is often hesitant to curb CO2 emissions. The immediate benefit to any one country from maintaining current behavior is perceived to be greater than the purported eventual benefit to that country if all countries' behavior were changed, which explains the impasse concerning climate change in 2007.[29]

An important difference between climate-change politics and the prisoner's dilemma is uncertainty; the extent and pace at which pollution can change climate is not known. The dilemma faced by governments is therefore different from the prisoner's dilemma in that the payoffs of cooperation are unknown. This difference suggests that states will cooperate much less than in a real iterated prisoner's dilemma, so that the probability of avoiding a possible climate catastrophe is much smaller than that suggested by a game-theoretical analysis of the situation using a real iterated prisoner's dilemma.[30]

Osang and Nandy (2003) provide a theoretical explanation with proofs for a regulation-driven win-win situation along the lines of Michael Porter's hypothesis, in which government regulation of competing firms is substantial.[31]

Animals

Cooperative behavior of many animals can be understood as an example of the prisoner's dilemma. Often animals engage in long-term partnerships, which can be more specifically modeled as iterated prisoner's dilemma. For example, guppies inspect predators cooperatively in groups, and they are thought to punish non-cooperative inspectors.[32]

Vampire bats are social animals that engage in reciprocal food exchange. Applying the payoffs from the prisoner's dilemma can help explain this behavior:[33]

• Cooperate/Cooperate: "Reward: I get blood on my unlucky nights, which saves me from starving. I have to give blood on my lucky nights, which doesn't cost me too much."
• Defect/Cooperate: "Temptation: You save my life on my poor night. But then I get the added benefit of not having to pay the slight cost of feeding you on my good night."
• Cooperate/Defect: "Sucker's Payoff: I pay the cost of saving your life on my good night. But on my bad night you don't feed me and I run a real risk of starving to death."
• Defect/Defect: "Punishment: I don't have to pay the slight costs of feeding you on my good nights. But I run a real risk of starving on my poor nights."
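The four outcomes above form a prisoner's dilemma only if the payoffs are ordered temptation > reward > punishment > sucker's payoff. The following sketch checks that ordering, together with the usual extra condition for the iterated game (2 × reward > temptation + sucker's payoff). The numeric fitness scores are hypothetical and purely illustrative; they are not measurements from the cited source.

```python
def is_prisoners_dilemma(temptation, reward, punishment, sucker):
    """True if the payoffs satisfy T > R > P > S."""
    return temptation > reward > punishment > sucker


def rewards_steady_cooperation(temptation, reward, sucker):
    """True if mutual cooperation beats taking turns exploiting each other
    (2R > T + S), the usual extra condition for the iterated game."""
    return 2 * reward > temptation + sucker


# Hypothetical fitness scores for the four vampire-bat outcomes.
bat_payoffs = {"temptation": 10, "reward": 8, "punishment": 3, "sucker": 0}

print(is_prisoners_dilemma(**bat_payoffs))  # True
print(rewards_steady_cooperation(
    bat_payoffs["temptation"], bat_payoffs["reward"], bat_payoffs["sucker"]
))  # True
```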

Psychology

In addiction research and behavioral economics, George Ainslie points out[34] that addiction can be cast as an intertemporal PD problem between the present and future selves of the addict. In this case, defecting means relapsing, and not defecting both today and in the future is by far the best outcome. The case where one abstains today but relapses in the future is the worst outcome: in some sense the discipline and self-sacrifice involved in abstaining today have been "wasted", because the future relapse means that the addict is right back where they started and will have to start over (which is quite demoralizing and makes starting over more difficult). Relapsing today and tomorrow is a slightly "better" outcome, because while the addict is still addicted, they have not put effort into trying to stop. The final case, where one engages in the addictive behavior today while abstaining "tomorrow", will be familiar to anyone who has struggled with an addiction. The problem here is that (as in other PDs) there is an obvious benefit to defecting "today", but tomorrow one will face the same PD, and the same obvious benefit will be present then, ultimately leading to an endless string of defections.

John Gottman in his research described in "The Science of Trust" defines good relationships as those where partners know not to enter the (D,D) cell or at least not to get dynamically stuck there in a loop. In cognitive neuroscience, fast brain signaling associated with processing different rounds may indicate choices at the next round. Mutual cooperation outcomes entail brain activity changes predictive of how quickly a person will cooperate in kind at the next opportunity;[35] this activity may be linked to basic homeostatic and motivational processes, possibly increasing the likelihood to short-cut into the (C,C) cell of the game.

Economics

The prisoner's dilemma has been called the E. coli of social psychology, and it has been used widely to research various topics such as oligopolistic competition and collective action to produce a collective good.[36]

Without enforceable agreements, members of a cartel are also involved in a (multi-player) prisoner's dilemma.[38] 'Cooperating' typically means keeping prices at a pre-agreed minimum level. 'Defecting' means selling under this minimum level, instantly taking business (and profits) from other cartel members. Anti-trust authorities want potential cartel members to mutually defect, ensuring the lowest possible prices for consumers.

Sport

Doping in sport has been cited as an example of a prisoner's dilemma.[39]

Two competing athletes have the option to use an illegal and/or dangerous drug to boost their performance. If neither athlete takes the drug, then neither gains an advantage. If only one does, then that athlete gains a significant advantage over the competitor, reduced by the legal and/or medical dangers of having taken the drug. If both athletes take the drug, however, the benefits cancel out and only the dangers remain, putting them both in a worse position than if neither had used doping.[39]

In a conversation with Ken Griffey Jr. after the 1998 MLB season, Barry Bonds expressed his frustration with other players' use of steroids: "I had a helluva season last year, and nobody gave a crap. Nobody. As much as I've complained about McGwire and Canseco and all of the bull with steroids, I'm tired of fighting it. I turn 35 this year. I've got three or four good seasons left, and I wanna get paid. I'm just gonna start using some hard-core stuff, and hopefully it won't hurt my body. Then I'll get out of the game and be done with it."[40] Bonds found himself in the prisoner's dilemma that is doping in baseball: he felt that he had to use steroids so that his competitors would not have such a significant advantage over him. Doing so put him on an even playing field, though everyone was worse off than if no one had used steroids at all.

International politics

In international relations theory, the prisoner's dilemma is often used to demonstrate why cooperation fails in situations when cooperation between states is collectively optimal but individually suboptimal.[41][42] A classic example is the security dilemma, whereby an increase in one state's security (such as increasing its military strength) leads other states to fear for their own security (because they do not know if the security-increasing state intends to use its growing military for offensive purposes).[43] Consequently, security-increasing measures can lead to tensions, escalation or conflict with one or more other parties, producing an outcome which no party truly desires, a political instance of the prisoner's dilemma.[44][43][45][46][47] The security dilemma is particularly intense in situations when (1) it is hard to distinguish offensive weapons from defensive weapons, and (2) offense has the advantage in any conflict over defense.[43] Military technology and geography strongly affect the offense-defense balance.[43]

The prisoner's dilemma has frequently been used by realist international relations theorists to demonstrate why all states (regardless of their internal policies or professed ideology) under international anarchy will struggle to cooperate with one another even when all benefit from such cooperation.

Critics of realism, however, argue that iteration and extending the shadow of the future are solutions to the prisoner's dilemma. When actors play the prisoner's dilemma once, they have incentives to defect, but when they expect to play it repeatedly, they have greater incentives to cooperate.[48]
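The "shadow of the future" can be made concrete with a standard textbook calculation, sketched here under assumptions that are not drawn from the cited sources: the game is repeated indefinitely, future payoffs are discounted by a factor `delta` between 0 and 1, and a single defection is punished by mutual defection forever (a "grim trigger"). Cooperating forever is then worth R / (1 − delta), while defecting once is worth T + delta × P / (1 − delta), so cooperation is sustainable when delta ≥ (T − R) / (T − P). The payoff values T = 5, R = 3, P = 1 are illustrative only.

```python
def cooperation_threshold(temptation, reward, punishment):
    """Smallest discount factor at which grim-trigger cooperation is stable."""
    return (temptation - reward) / (temptation - punishment)


def value_of_cooperating(reward, delta):
    """Discounted value of receiving the reward payoff in every round."""
    return reward / (1 - delta)


def value_of_defecting(temptation, punishment, delta):
    """Discounted value of defecting once, then being punished forever."""
    return temptation + delta * punishment / (1 - delta)


T, R, P = 5, 3, 1  # illustrative payoffs, not taken from the article
print(cooperation_threshold(T, R, P))  # 0.5

for delta in (0.4, 0.6):
    cooperate = value_of_cooperating(R, delta)
    defect = value_of_defecting(T, P, delta)
    print(delta, "cooperate" if cooperate >= defect else "defect")
```

With these numbers, a player who weights the future at 0.6 prefers to keep cooperating, while one who weights it at 0.4 prefers to defect.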

Multiplayer dilemmas

Many real-life dilemmas involve multiple players.[49] Although metaphorical, Hardin's tragedy of the commons may be viewed as an example of a multi-player generalization of the PD: each villager makes a choice for personal gain or restraint. Unanimous (or even frequent) defection yields very low payoffs for everyone (representing the destruction of the "commons"). A commons dilemma most people can relate to is washing the dishes in a shared house. By not washing dishes an individual can gain by saving their time, but if that behavior is adopted by every resident, the collective cost is no clean plates for anyone.
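A multi-player dilemma of this kind is often modeled as a public goods game. The sketch below is one such model under assumed parameters, not a formula from the cited sources: each of four housemates may pay a cost of 1 to contribute, the total contribution is doubled and shared equally among everyone, and the function name `payoff` is hypothetical.

```python
def payoff(cooperates, others_cooperating, n_players=4, multiplier=2.0, cost=1.0):
    """Payoff to one player in an assumed linear public goods game.

    Every contribution of `cost` is multiplied by `multiplier` and the total
    is shared equally among all `n_players`, contributors or not.
    """
    contributors = others_cooperating + (1 if cooperates else 0)
    share = multiplier * cost * contributors / n_players
    return share - (cost if cooperates else 0.0)


# Whatever the others do, defecting pays more for the individual...
for k in range(4):
    print(k, payoff(True, k), payoff(False, k))

# ...yet everyone cooperating beats everyone defecting.
print(payoff(True, 3), payoff(False, 0))  # 1.0 0.0
```

Because the multiplier is smaller than the number of players, each contributor gets back less than the cost paid, which is what makes defection individually attractive in this sketch.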

The commons are not always exploited: William Poundstone, in a book about the prisoner's dilemma, describes a situation in New Zealand where newspaper boxes are left unlocked. It is possible for people to take a paper without paying (defecting), but very few do, feeling that if they do not pay then neither will others, destroying the system.[50] Subsequent research by Elinor Ostrom, winner of the 2009 Nobel Memorial Prize in Economic Sciences, hypothesized that the tragedy of the commons is oversimplified, with the negative outcome influenced by outside influences. Without complicating pressures, groups communicate and manage the commons among themselves for their mutual benefit, enforcing social norms to preserve the resource and achieve the maximum good for the group, an example of effecting the best-case outcome for PD.[51][52]

Related games

Closed-bag exchange

The prisoner's dilemma as a briefcase exchange

Douglas Hofstadter[53] once suggested that people often find problems such as the PD problem easier to understand when it is illustrated in the form of a simple game, or trade-off. One of several examples he used was "closed bag exchange":

Two people meet and exchange closed bags, with the understanding that one of them contains money, and the other contains a purchase. Either player can choose to honor the deal by putting into his or her bag what he or she agreed, or he or she can defect by handing over an empty bag.

Friend or Foe?

Friend or Foe? is a game show that aired from 2002 to 2003 on the Game Show Network in the US. It is an example of the prisoner's dilemma game tested on real people, but in an artificial setting. On the game show, three pairs of people compete. When a pair is eliminated, they play a game similar to the prisoner's dilemma to determine how the winnings are split. If they both cooperate (Friend), they share the winnings 50–50. If one cooperates and the other defects (Foe), the defector gets all the winnings, and the cooperator gets nothing. If both defect, both leave with nothing. Notice that the reward matrix is slightly different from the standard one given above, as the rewards for the "both defect" and the "cooperate while the opponent defects" cases are identical. This makes the "both defect" case a weak equilibrium, compared with being a strict equilibrium in the standard prisoner's dilemma. If a contestant knows that their opponent is going to vote "Foe", then their own choice does not affect their own winnings. In a specific sense, Friend or Foe has a rewards model between prisoner's dilemma and the game of Chicken.

The rewards matrix is

Payoffs (Pair 1, Pair 2)
                                Pair 2: "Friend" (cooperate)    Pair 2: "Foe" (defect)
Pair 1: "Friend" (cooperate)    1, 1                            0, 2
Pair 1: "Foe" (defect)          2, 0                            0, 0
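The difference between a weak and a strict equilibrium can be checked mechanically. The following sketch classifies a pair of moves by looking at what each player would gain from deviating alone. The function name `equilibrium_type` and the dictionary layout are hypothetical; the standard payoffs are the prison sentences from the matrix at the top of the article, written as negative numbers.

```python
# payoffs[(row_move, col_move)] = (row_payoff, col_payoff)
FRIEND_OR_FOE = {
    ("friend", "friend"): (1, 1),
    ("friend", "foe"): (0, 2),
    ("foe", "friend"): (2, 0),
    ("foe", "foe"): (0, 0),
}
STANDARD_PD = {
    ("silent", "silent"): (-2, -2),
    ("silent", "betray"): (-10, 0),
    ("betray", "silent"): (0, -10),
    ("betray", "betray"): (-5, -5),
}


def equilibrium_type(payoffs, profile):
    """Classify a pure-strategy profile as 'strict', 'weak', or 'none'."""
    moves = sorted({move for pair in payoffs for move in pair})
    row, col = profile
    row_pay, col_pay = payoffs[profile]
    # Gain each player would get by switching moves unilaterally.
    gains = [payoffs[(m, col)][0] - row_pay for m in moves if m != row]
    gains += [payoffs[(row, m)][1] - col_pay for m in moves if m != col]
    if any(g > 0 for g in gains):
        return "none"    # someone profits from deviating: not an equilibrium
    if any(g == 0 for g in gains):
        return "weak"    # someone is indifferent about deviating
    return "strict"      # every deviation strictly hurts the deviator


print(equilibrium_type(FRIEND_OR_FOE, ("foe", "foe")))      # weak
print(equilibrium_type(STANDARD_PD, ("betray", "betray")))  # strict
print(equilibrium_type(FRIEND_OR_FOE, ("friend", "friend")))  # none
```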

This payoff matrix has also been used on the British television programs Trust Me, Shafted, The Bank Job and Golden Balls, and on the American game show Take It All, as well as for the winning couple on the reality shows Bachelor Pad and Love Island. Game data from the Golden Balls series has been analyzed by a team of economists, who found that cooperation was "surprisingly high" for amounts of money that would seem consequential in the real world but were comparatively low in the context of the game.[54]

Iterated snowdrift

Researchers from the University of Lausanne and the University of Edinburgh have suggested that the "Iterated Snowdrift Game" may more closely reflect real-world social situations. Although this model is actually a chicken game, it will be described here. In this model, the risk of being exploited through defection is lower, and individuals always gain from taking the cooperative choice. The snowdrift game imagines two drivers who are stuck on opposite sides of a snowdrift, each of whom is given the option of shoveling snow to clear a path or remaining in their car. A player's highest payoff comes from leaving the opponent to clear all the snow by themselves, but the opponent is still nominally rewarded for their work.

This may better reflect real-world scenarios, the researchers giving the example of two scientists collaborating on a report, both of whom would benefit if the other worked harder. "But when your collaborator doesn't do any work, it's probably better for you to do all the work yourself. You'll still end up with a completed project."[55]

Example snowdrift payouts (A, B)
                B cooperates    B defects
A cooperates    500, 500        200, 800
A defects       800, 200        0, 0

Example PD payouts (A, B)
                B cooperates    B defects
A cooperates    500, 500        −200, 1200
A defects       1200, −200      0, 0
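The key difference between the two example tables is player A's best reply when B defects. A minimal sketch follows, using the example payouts above; the dictionary layout and the function name `best_response` are hypothetical.

```python
# payoffs[(a_move, b_move)] = (payout_to_A, payout_to_B)
SNOWDRIFT = {
    ("C", "C"): (500, 500), ("C", "D"): (200, 800),
    ("D", "C"): (800, 200), ("D", "D"): (0, 0),
}
PRISONERS_DILEMMA = {
    ("C", "C"): (500, 500), ("C", "D"): (-200, 1200),
    ("D", "C"): (1200, -200), ("D", "D"): (0, 0),
}


def best_response(payoffs, b_move):
    """Move that maximizes player A's payout against a fixed move by B."""
    return max("CD", key=lambda a_move: payoffs[(a_move, b_move)][0])


for name, game in (("snowdrift", SNOWDRIFT), ("PD", PRISONERS_DILEMMA)):
    print(name, {b_move: best_response(game, b_move) for b_move in "CD"})
# snowdrift {'C': 'D', 'D': 'C'}
# PD {'C': 'D', 'D': 'D'}
```

In the snowdrift example, cooperating is the better reply to a defector (200 rather than 0), whereas in the PD example defection is the best reply to either move.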

Coordination games

In coordination games, players must coordinate their strategies for a good outcome. An example is two cars that abruptly meet in a blizzard; each must choose whether to swerve left or right. If both swerve left, or both right, the cars do not collide. The local left- and right-hand traffic convention helps to coordinate their actions.

Symmetrical coordination games include stag hunt and Bach or Stravinsky.

Asymmetric prisoner's dilemmas

A more general set of games is asymmetric. As in the prisoner's dilemma, the best outcome is cooperation, and there are motives for defection. Unlike the symmetric prisoner's dilemma, though, one player has more to lose and/or more to gain than the other. Some such games have been described as a prisoner's dilemma in which one prisoner has an alibi, whence the term "alibi game".[56]

In experiments, players getting unequal payoffs in repeated games may seek to maximize profits, but only under the condition that both players receive equal payoffs; this may lead to a stable equilibrium strategy in which the disadvantaged player defects every X games, while the other always cooperates. Such behavior may depend on the experiment's social norms around fairness.[57]

Guardian's Dilemma

It is not only prisoners who face dilemmas. Guardians also confront situations in which there are only unattractive choices from which to choose. Examples can easily be found in cases where one agent must smooth tensions between its own partners: one can think of two colleagues jockeying for career advancement and the troubles this causes their company's managing director; two officials competing for promotion and the tension this causes for the head of their bureau; or in parenting when two siblings vie for attention and the anxiety this causes their parents. If the behaviour of the guardian satisfies one side, the other side feels exposed and alienated.

From an international relations perspective, Dr Spyros Katsoulas introduces the concept of the guardian's dilemma.[58] The guardian's dilemma is defined as the condition in which two states maintain their enmity towards one another despite sharing a stronger common ally. By default, a dilemma is a situation with unsatisfactory choices. The guardian's dilemma lies in the fact that the stronger state can neither stay out of a crisis between its allies nor get actively involved without affecting the fragile equilibrium. If the guardian abstains, the situation may spin out of control; if the guardian gets involved, any tilt against one side may be seen as a victory or a window of opportunity for the other. Expanding on Glenn Snyder's concept of the alliance security dilemma,[59] the outcomes of the interaction between the guardian and the two smaller partners are described as abandonment, entrapment, and emboldening.

Software

Several software packages have been created to run prisoner's dilemma simulations and tournaments, some of which have available source code.

In fiction

Hannu Rajaniemi set the opening scene of his The Quantum Thief trilogy in a "dilemma prison". The main theme of the series has been described as the "inadequacy of a binary universe" and the ultimate antagonist is a character called the All-Defector. Rajaniemi is a Cambridge-trained mathematician and holds a Ph.D. in mathematical physics – the interchangeability of matter and information is a major feature of the books, which take place in a "post-singularity" future. The first book in the series was published in 2010, with the two sequels, The Fractal Prince and The Causal Angel, published in 2012 and 2014, respectively.

A game modeled after the (iterated) prisoner's dilemma is a central focus of the 2012 video game Zero Escape: Virtue's Last Reward and a minor part in its 2016 sequel Zero Escape: Zero Time Dilemma.

In The Mysterious Benedict Society and the Prisoner's Dilemma by Trenton Lee Stewart, the main characters start by playing a version of the game and escaping from the "prison" altogether. Later they become actual prisoners and escape once again.

In The Adventure Zone: Balance during The Suffering Game subarc, the player characters are twice presented with the prisoner's dilemma during their time in two liches' domain, once cooperating and once defecting.

In Tiamat's Wrath, the eighth novel in The Expanse series by James S. A. Corey, Winston Duarte explains the prisoner's dilemma to his 14-year-old daughter, Teresa, to train her in strategic thinking.[citation needed]

An extreme version of the prisoner's dilemma is featured in the 2008 film The Dark Knight, in which the Joker rigs two ferries with bombs, one carrying prisoners and the other civilians, and gives each group the means to detonate the bomb on the other's ferry. Ultimately, the two sides decide not to act.

Notes

1. ^ For example see the 2003 study[17] for discussion of the concept and whether it can apply in real economic or strategic situations.
2. ^ This argument for the development of cooperation through trust is given in The Wisdom of Crowds, where it is argued that long-distance capitalism was able to form around a nucleus of Quakers, who always dealt honourably with their business partners (rather than defecting and reneging on promises, a phenomenon that had discouraged earlier long-term unenforceable overseas contracts). It is argued that dealings with reliable merchants allowed the meme for cooperation to spread to other traders, who spread it further until a high degree of cooperation became a profitable strategy in general commerce.

References

1. ^ Poundstone 1993, pp. 8, 117.
2. ^ Milovsky, Nicholas. "The Basics of Game Theory and Associated Games". Retrieved 11 February 2014.
3. Rapoport, Anatol (2016), "Prisoner's Dilemma", The New Palgrave Dictionary of Economics, London: Palgrave Macmillan UK, pp. 1–5, doi:10.1057/978-1-349-95121-5_1850-1, ISBN 978-1-349-95121-5, retrieved 2021-11-29
4. ^ Fehr, Ernst; Fischbacher, Urs (Oct 23, 2003). "The Nature of human altruism" (PDF). Nature. 425 (6960): 785–91. Bibcode:2003Natur.425..785F. doi:10.1038/nature02043. PMID 14574401. S2CID 4305295. Archived (PDF) from the original on 2013-06-18. Retrieved February 27, 2013.
5. ^ Tversky, Amos; Shafir, Eldar (2004). Preference, belief, and similarity: selected writings (PDF). Massachusetts Institute of Technology Press. ISBN 9780262700931. Retrieved February 27, 2013.
6. ^ Toh-Kyeong, Ahn; Ostrom, Elinor; Walker, James (Sep 5, 2002). "Incorporating Motivational Heterogeneity into Game-Theoretic Models of Collective Action" (PDF). Public Choice. 117 (3–4): 295–314. doi:10.1023/b:puch.0000003739.54365.fd. hdl:10535/4697. S2CID 153414274. Retrieved June 27, 2015.
7. ^ Oosterbeek, Hessel; Sloof, Randolph; Van de Kuilen, Gus (Dec 3, 2003). "Cultural Differences in Ultimatum Game Experiments: Evidence from a Meta-Analysis" (PDF). Experimental Economics. 7 (2): 171–88. doi:10.1023/B:EXEC.0000026978.14316.74. S2CID 17659329. Archived from the original (PDF) on May 12, 2013. Retrieved February 27, 2013.
8. ^ Ormerod, Paul (2010-12-22). Why Most Things Fail. ISBN 9780571266142.
9. ^ Deutsch, M. (1958). Trust and suspicion. Journal of Conflict Resolution, 2(4), 265–279. https://doi.org/10.1177/002200275800200401
10. ^ Rapoport, A., & Chammah, A. M. (1965). Prisoner's Dilemma: A study of conflict and cooperation. Ann Arbor, MI: University of Michigan Press.
11. ^ Kaznatcheev, Artem (March 2, 2015). "Short history of iterated prisoner's dilemma tournaments". Journal of Conflict Resolution. 24 (3): 379–403. doi:10.1177/002200278002400301. S2CID 145555261. Retrieved February 8, 2016.
12. ^ a b Hilbe, Christian; Martin A. Nowak; Karl Sigmund (April 2013). "Evolution of extortion in Iterated Prisoner's Dilemma games". PNAS. 110 (17): 6913–18. arXiv:1212.1067. Bibcode:2013PNAS..110.6913H. doi:10.1073/pnas.1214834110. PMC 3637695. PMID 23572576.
13. ^ Grofman, Bernard; Pool, Jonathan (January 1977). "How to make cooperation the optimizing strategy in a two‐person game". The Journal of Mathematical Sociology. 5 (2): 173–186. doi:10.1080/0022250x.1977.9989871. ISSN 0022-250X.
14. ^ Shy, Oz (1995). Industrial Organization: Theory and Applications. Massachusetts Institute of Technology Press. ISBN 978-0262193665. Retrieved February 27, 2013.
15. ^ Dal Bó, Pedro; Fréchette, Guillaume R. (2019). "Strategy Choice in the Infinitely Repeated Prisoner's Dilemma". American Economic Review. 109 (11): 3929–3952. doi:10.1257/aer.20181480. ISSN 0002-8282. S2CID 216726890.
16. ^ Wedekind, C.; Milinski, M. (2 April 1996). "Human cooperation in the simultaneous and the alternating Prisoner's Dilemma: Pavlov versus Generous Tit-for-Tat". Proceedings of the National Academy of Sciences. 93 (7): 2686–2689. Bibcode:1996PNAS...93.2686W. doi:10.1073/pnas.93.7.2686. PMC 39691. PMID 11607644.
17. ^ "Bayesian Nash equilibrium; a statistical test of the hypothesis" (PDF). Tel Aviv University. Archived from the original (PDF) on 2005-10-02.
18. ^ Wu, Jiadong; Zhao, Chengye (2019), Sun, Xiaoming; He, Kun; Chen, Xiaoyun (eds.), "Cooperation on the Monte Carlo Rule: Prisoner's Dilemma Game on the Grid", Theoretical Computer Science, Springer Singapore, vol. 1069, pp. 3–15, doi:10.1007/978-981-15-0105-0_1, ISBN 978-981-15-0104-3, S2CID 118687103
19. ^ "University of Southampton team wins Prisoner's Dilemma competition" (Press release). University of Southampton. 7 October 2004. Archived from the original on 2014-04-21.
20. Press, WH; Dyson, FJ (26 June 2012). "Iterated Prisoner's Dilemma contains strategies that dominate any evolutionary opponent". Proceedings of the National Academy of Sciences of the United States of America. 109 (26): 10409–13. Bibcode:2012PNAS..10910409P. doi:10.1073/pnas.1206569109. PMC 3387070. PMID 22615375.
21. ^ Adami, Christoph; Arend Hintze (2013). "Evolutionary instability of Zero Determinant strategies demonstrates that winning isn't everything". Nature Communications. 4: 3. arXiv:1208.2666. Bibcode:2013NatCo...4.2193A. doi:10.1038/ncomms3193. PMC 3741637. PMID 23903782.
22. ^ a b Stewart, Alexander J.; Joshua B. Plotkin (2013). "From extortion to generosity, evolution in the Iterated Prisoner's Dilemma". Proceedings of the National Academy of Sciences of the United States of America. 110 (38): 15348–53. Bibcode:2013PNAS..11015348S. doi:10.1073/pnas.1306246110. PMC 3780848. PMID 24003115.
23. ^ Akin, Ethan (2013). "Stable Cooperative Solutions for the Iterated Prisoner's Dilemma". p. 9. arXiv:1211.0969 [math.DS]. Bibcode:2012arXiv1211.0969A
24. ^ Le S, Boyd R (2007). "Evolutionary Dynamics of the Continuous Iterated Prisoner's Dilemma". Journal of Theoretical Biology. 245 (2): 258–67. Bibcode:2007JThBi.245..258L. doi:10.1016/j.jtbi.2006.09.016. PMID 17125798.
25. ^ Hammerstein, P. (2003). Why is reciprocity so rare in social animals? A protestant appeal. In: P. Hammerstein, Editor, Genetic and Cultural Evolution of Cooperation, MIT Press. pp. 83–94.
26. ^ Spaniel, William (2011). Game Theory 101: The Complete Textbook.
27. ^ Nowak, Martin; Karl Sigmund (1993). "A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game". Nature. 364 (6432): 56–58. Bibcode:1993Natur.364...56N. doi:10.1038/364056a0. PMID 8316296. S2CID 4238908.
28. ^ Stark, Oded (1989). "Altruism and the Quality of Life". The American Economic Review. 79 (2): 86–90. ISSN 0002-8282. JSTOR 1827736. Retrieved 25 July 2022.
29. ^ "Markets & Data". The Economist. 2007-09-27.
30. ^ Rehmeyer, Julie (2012-10-29). "Game theory suggests current climate negotiations won't avert catastrophe". Science News. Society for Science & the Public.
31. ^ Osang, Thomas; Nandy, Arundhati (August 2003). Environmental Regulation of Polluting Firms: Porter's Hypothesis Revisited (PDF) (paper). Archived (PDF) from the original on 2010-07-02.
32. ^ Brosnan, Sarah F.; Earley, Ryan L.; Dugatkin, Lee A. (October 2003). "Observational Learning and Predator Inspection in Guppies ( Poecilia reticulata ): Social Learning in Guppies". Ethology. 109 (10): 823–833. doi:10.1046/j.0179-1613.2003.00928.x.
33. ^ Dawkins, Richard (1976). The Selfish Gene. Oxford University Press.
34. ^ Ainslie, George (2001). Breakdown of Will. ISBN 978-0-521-59694-7.
35. ^ Cervantes Constantino, Garat, Nicolaisen, Paz, Martínez-Montes, Kessel, Cabana, and Gradin (2020). "Neural processing of iterated prisoner's dilemma outcomes indicates next-round choice and speed to reciprocate cooperation". Social Neuroscience. 16 (2): 103–120. doi:10.1080/17470919.2020.1859410. PMID 33297873. S2CID 228087900.
36. ^ Axelrod, Robert (1980). "Effective Choice in the Prisoner's Dilemma". The Journal of Conflict Resolution. 24 (1): 3–25. doi:10.1177/002200278002400101. ISSN 0022-0027. JSTOR 173932. S2CID 143112198.
37. ^ Henriksen, Lisa (March 2012). "Comprehensive tobacco marketing restrictions: promotion, packaging, price and place". Tobacco Control. 21 (2): 147–153. doi:10.1136/tobaccocontrol-2011-050416. PMC 4256379. PMID 22345238.
38. ^ Nicholson, Walter (2000). Intermediate microeconomics and its application (8th ed.). Fort Worth, TX: Dryden Press : Harcourt College Publishers. ISBN 978-0-030-25916-6.
39. ^ a b Schneier, Bruce (2012-10-26). "Lance Armstrong and the Prisoners' Dilemma of Doping in Professional Sports | Wired Opinion". Wired. Wired.com. Retrieved 2012-10-29.
40. ^ "Pearlman: Great wasn't good enough". ESPN.com. 2006-03-14. Retrieved 2021-12-08.
41. ^ Snyder, Glenn H. (1971). ""Prisoner's Dilemma" and "Chicken" Models in International Politics". International Studies Quarterly. 15 (1): 66–103. doi:10.2307/3013593. ISSN 0020-8833.
42. ^ Jervis, Robert (1978). "Cooperation under the Security Dilemma". World Politics. 30 (2): 167–214. doi:10.2307/2009958. ISSN 1086-3338.
43. ^ a b c d Jervis, Robert (1978). "Cooperation Under the Security Dilemma". World Politics. 30 (2): 167–214. doi:10.2307/2009958. hdl:2027/uc1.31158011478350. ISSN 0043-8871. JSTOR 2009958.
44. ^ Herz, John H. (1950). Idealist Internationalism and the Security Dilemma. pp. 157–180.
45. ^ Snyder, Glenn H. (1984). "The Security Dilemma in Alliance Politics". World Politics. 36 (4): 461–495. doi:10.2307/2010183. ISSN 0043-8871.
46. ^ Jervis, Robert (1976). Perception and Misperception in International Politics. Princeton University Press. pp. 58–113. ISBN 978-0-691-10049-4.
47. ^ Glaser, Charles L. (2010). Rational Theory of International Politics. Princeton University Press. ISBN 9780691143729.
48. ^ Axelrod, Robert; Hamilton, William D. (1981). "The Evolution of Cooperation". Science. 211 (4489): 1390–1396. doi:10.1126/science.7466396. ISSN 0036-8075.
49. ^ Gokhale CS, Traulsen A. Evolutionary games in the multiverse. Proceedings of the National Academy of Sciences. 2010 Mar 23. 107(12):5500–04.
50. ^ Poundstone 1993, pp. 126–127.
51. ^ "Elinor Ostrom and the Tragedy of the Commons". The Volokh Conspiracy. Volokh.com. 2009-10-12. Retrieved 2011-12-17.
52. ^ Ostrom, Elinor (2015) [1990]. Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge University Press. doi:10.1017/CBO9781316423936. ISBN 978-1-107-56978-2.
53. ^ Hofstadter, Douglas R. (1985). "Ch.29 The Prisoner's Dilemma Computer Tournaments and the Evolution of Cooperation.". Metamagical Themas: questing for the essence of mind and pattern. Bantam Dell Pub Group. ISBN 978-0-465-04566-2.
54. ^ Van den Assem, Martijn J. (January 2012). "Split or Steal? Cooperative Behavior When the Stakes Are Large". Management Science. 58 (1): 2–20. doi:10.1287/mnsc.1110.1413. hdl:1765/31292. S2CID 1371739. SSRN 1592456.
55. ^ Kümmerli, Rolf. "'Snowdrift' game tops 'Prisoner's Dilemma' in explaining cooperation". Retrieved 11 April 2012.
56. ^ Robinson, D.R.; Goforth, D.J. (May 5, 2004). Alibi games: the Asymmetric Prisoner's Dilemmas (PDF). Meetings of the Canadian Economics Association, Toronto, June 4–6, 2004. Archived (PDF) from the original on 2004-12-06.
57. ^ Beckenkamp, Martin; Hennig-Schmidt, Heike; Maier-Rigaud, Frank P. (March 4, 2007). "Cooperation in Symmetric and Asymmetric Prisoner's Dilemma Games" (PDF). Max Planck Institute for Research on Collective Goods. Archived (PDF) from the original on 2019-09-02.
58. ^ Katsoulas, Spyros (2022). The United States and Greek-Turkish Relations: the Guardian's Dilemma. Routledge, Taylor & Francis Group. ISBN 9781032123370.
59. ^ Glenn H. Snyder, "The Security Dilemma in Alliance Politics" World Politics, Volume 36, Issue 4, July 1984, pp. 461 - 495 https://doi.org/10.2307/2010183