Quantal response equilibrium

A solution concept in game theory
Relationship
Superset of: Nash equilibrium, logit equilibrium
Significance
Proposed by: Richard McKelvey and Thomas Palfrey
Used for: Non-cooperative games
Example: Traveler's dilemma

Quantal response equilibrium (QRE) is a solution concept in game theory. First introduced by Richard McKelvey and Thomas Palfrey,[1][2] it provides an equilibrium notion that incorporates bounded rationality. QRE is not an equilibrium refinement, and its predictions can differ significantly from those of Nash equilibrium. QRE is defined only for games with discrete strategies, although there are continuous-strategy analogues.

In a quantal response equilibrium, players are assumed to make errors in choosing which pure strategy to play. The probability of any particular strategy being chosen is positively related to the payoff from that strategy. In other words, very costly errors are unlikely.

The equilibrium arises from the realization of beliefs. Each player's expected payoffs are computed from beliefs about the other players' probability distributions over strategies, and in equilibrium those beliefs are correct.

Application to data

When analyzing data from the play of actual games, particularly from laboratory experiments such as those using the matching pennies game, Nash equilibrium can be unforgiving. Any non-equilibrium move can appear equally "wrong", but realistically such moves should not by themselves be grounds for rejecting a theory. QRE allows every strategy to be played with non-zero probability, so any data are possible (though not necessarily reasonable).

Logit equilibrium

By far the most common specification for QRE is logit equilibrium (LQRE). In a logit equilibrium, each player's strategy is chosen according to the probability distribution:

${\displaystyle P_{ij}={\frac {\exp(\lambda EU_{ij}(P_{-i}))}{\sum _{k}{\exp(\lambda EU_{ik}(P_{-i}))}}}}$

${\displaystyle P_{ij}}$ is the probability of player i choosing strategy j. ${\displaystyle EU_{ij}(P_{-i})}$ is the expected utility to player i of choosing strategy j, given that the other players are playing according to the probability distribution ${\displaystyle P_{-i}}$. Note that the "belief" density in the expected payoff on the right side must match the choice density on the left side. Thus computing expectations of observable quantities such as payoff, demand, output, etc., requires finding fixed points as in mean field theory.
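The fixed-point condition can be illustrated numerically. The following is a minimal Python sketch, not a reference implementation, for a two-player game: it applies damped fixed-point iteration to the logit formula. The function names, the damping scheme, and the example payoffs (an asymmetric matching pennies game) are assumptions made for illustration, and damped iteration is not guaranteed to converge for every game or every value of λ.

```python
import math

def logit_response(expected_utils, lam):
    """Logit choice probabilities P_ij for one player, given EU_ij for each strategy j."""
    m = max(expected_utils)  # subtracting the max avoids overflow; it cancels in the ratio
    weights = [math.exp(lam * (u - m)) for u in expected_utils]
    total = sum(weights)
    return [w / total for w in weights]

def expected_utilities(payoff, opp_probs):
    """EU of each own strategy j, where payoff[j][k] is the payoff against opponent strategy k."""
    return [sum(u * q for u, q in zip(row, opp_probs)) for row in payoff]

def logit_qre(payoff_1, payoff_2, lam, iters=5000, damping=0.1):
    """Approximate a logit QRE of a two-player game by damped fixed-point iteration.

    payoff_1[j][k]: player 1's payoff for own strategy j against player 2's strategy k.
    payoff_2[k][j]: player 2's payoff for own strategy k against player 1's strategy j.
    """
    p = [1.0 / len(payoff_1)] * len(payoff_1)  # start from uniform play
    q = [1.0 / len(payoff_2)] * len(payoff_2)
    for _ in range(iters):
        # Logit responses to the current beliefs about the opponent
        p_target = logit_response(expected_utilities(payoff_1, q), lam)
        q_target = logit_response(expected_utilities(payoff_2, p), lam)
        # Move only part of the way toward the responses (damping)
        p = [(1 - damping) * a + damping * b for a, b in zip(p, p_target)]
        q = [(1 - damping) * a + damping * b for a, b in zip(q, q_target)]
    return p, q

# Illustrative asymmetric matching pennies (assumed payoffs, not taken from the text):
# player 1 wants to match, player 2 wants to mismatch.
PAYOFF_1 = [[9, 0], [0, 1]]
PAYOFF_2 = [[0, 1], [1, 0]]

if __name__ == "__main__":
    p, q = logit_qre(PAYOFF_1, PAYOFF_2, lam=1.0)
    print("player 1:", p, "player 2:", q)
```

At the returned point, each player's distribution should reproduce itself when passed back through `logit_response`, which is the sense in which beliefs match choices.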

Of particular interest in the logit model is the non-negative parameter λ (sometimes written as 1/μ). λ can be thought of as the rationality parameter. As λ→0, players become "completely non-rational", and play each strategy with equal probability. As λ→∞, players become "perfectly rational", and play approaches a Nash equilibrium.
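
Both limits can be checked directly from the logit formula while holding the other players' distribution ${\displaystyle P_{-i}}$ fixed. This is a sketch of the individual response only, not a proof about equilibria. The symbols ${\displaystyle J_{i}}$ (the number of pure strategies of player i) and ${\displaystyle EU_{i}^{*}}$ (the largest expected utility among them) are notation introduced here for illustration. Setting λ = 0 makes every exponential equal to 1, so

${\displaystyle P_{ij}={\frac {1}{J_{i}}}}$

For λ > 0, dividing the numerator and denominator by ${\displaystyle \exp(\lambda EU_{i}^{*})}$ gives

${\displaystyle P_{ij}={\frac {\exp(-\lambda (EU_{i}^{*}-EU_{ij}))}{\sum _{k}{\exp(-\lambda (EU_{i}^{*}-EU_{ik}))}}}}$

As λ→∞, every term with ${\displaystyle EU_{ik}<EU_{i}^{*}}$ vanishes, so all probability concentrates on the strategies that are best responses to ${\displaystyle P_{-i}}$.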

In the case of a bounded-rational potential game, this logit equilibrium was shown to be a mean-field version of the equilibrium Gibbs measure.[3] The Gibbs measure has the same property of interpolating between "completely non-rational" (infinite "temperature") and "perfectly rational" (zero "temperature") decision making. Furthermore, the parameter λ is inversely related to "temperature" in the context of information theory and statistical mechanics. By a fluctuation-dissipation argument, this temperature is proportional to the square of a scaling parameter for Gaussian white noise; the argument relates a constrained maximum information entropy model to a stochastic dynamical model, both of which yield the same Gibbs equilibrium measure.

For dynamic games

For dynamic (extensive form) games, McKelvey and Palfrey defined agent quantal response equilibrium (AQRE). AQRE is somewhat analogous to subgame perfection. In an AQRE, each player plays with some error as in QRE. At a given decision node, the player determines the expected payoff of each action by treating their future self as an independent player with a known probability distribution over actions.
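
For a game of perfect information, the description above can be sketched by backward induction with a logit choice at each decision node. The following Python sketch assumes a three-node centipede-style game in which player 0 moves at the first and last nodes, so the first-node decision treats the last-node "future self" as a separate player with a known take probability. The game, its payoffs, and the function names are illustrative assumptions, and the logit form is one possible specification of the errors.

```python
import math

def logit(utils, lam):
    """Logit choice probabilities over the actions available at one node."""
    m = max(utils)  # shift by the max for numerical stability
    weights = [math.exp(lam * (u - m)) for u in utils]
    total = sum(weights)
    return [w / total for w in weights]

def aqre_centipede(take_payoffs, pass_payoffs, movers, lam):
    """Backward-induction logit AQRE of a centipede-style game.

    take_payoffs[t]: payoff pair (player 0, player 1) if the mover at node t takes.
    pass_payoffs:    payoff pair if every node is passed.
    movers[t]:       index (0 or 1) of the player who moves at node t.
    Returns (take probability at each node, expected payoff pair at the root).
    """
    continuation = tuple(pass_payoffs)  # expected payoffs from passing at the last node
    take_probs = [0.0] * len(movers)
    for t in reversed(range(len(movers))):
        i = movers[t]
        # The mover compares taking now with the expected payoff of passing,
        # given the (noisy) behaviour already computed for all later nodes.
        p_take, p_pass = logit([take_payoffs[t][i], continuation[i]], lam)
        take_probs[t] = p_take
        continuation = tuple(
            p_take * take_payoffs[t][k] + p_pass * continuation[k] for k in range(2)
        )
    return take_probs, continuation

# Hypothetical payoffs chosen for illustration only.
TAKE = [(0.4, 0.1), (0.2, 0.8), (1.6, 0.4)]
PASS = (0.8, 3.2)
MOVERS = [0, 1, 0]

if __name__ == "__main__":
    probs, value = aqre_centipede(TAKE, PASS, MOVERS, lam=2.0)
    print("take probabilities:", probs, "expected payoffs:", value)
```

For any finite λ every take probability lies strictly between 0 and 1, so every node is reached with positive probability.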

As in QRE, in an AQRE every strategy is used with nonzero probability. This gives AQRE an additional advantage over perfectly rational solution concepts: since every path is followed with some probability, there is no need to define beliefs "off the equilibrium path".

Critiques

Free parameter

LQRE has the free parameter λ. As λ→∞, LQRE approaches Nash equilibrium, so with λ chosen to fit the data, LQRE will always fit at least as well as Nash equilibrium. Changes in the parameter can result in large changes to equilibrium behavior.

However, the theory is incomplete without an account of where λ comes from. Estimates of λ from experiments can vary significantly. Sometimes this variation seems to result from individual characteristics (for instance, λ sometimes increases with learning). Other times it appears that λ varies from game to game.
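
One common way to obtain such estimates is maximum likelihood: choose the λ whose logit equilibrium makes the observed choice frequencies most probable. The following Python sketch does this by grid search for an assumed asymmetric matching pennies game. The game, the parameter `x`, the grid bounds, and the choice counts in the usage example are all hypothetical, and the bisection solver relies on the structure of this particular 2×2 game.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def logit_qre_amp(lam, x=9.0):
    """Logit QRE of an illustrative asymmetric matching pennies game.

    Player 1 earns x from (Top, Left) and 1 from (Bottom, Right), otherwise 0;
    player 2 earns 1 whenever the players mismatch, otherwise 0.
    Returns (p, q) = (P[Top], P[Left]).
    """
    def q_of(p):  # player 2: EU_Left - EU_Right = (1 - p) - p
        return sigmoid(lam * (1.0 - 2.0 * p))

    def p_of(q):  # player 1: EU_Top - EU_Bottom = x*q - (1 - q)
        return sigmoid(lam * ((x + 1.0) * q - 1.0))

    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on p_of(q_of(p)) - p, which is decreasing in p
        mid = 0.5 * (lo + hi)
        if p_of(q_of(mid)) > mid:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    return p, q_of(p)

def log_likelihood(lam, counts_1, counts_2):
    """Log-likelihood of choice counts (Top, Bottom) and (Left, Right) under logit QRE."""
    p, q = logit_qre_amp(lam)
    return (counts_1[0] * math.log(p) + counts_1[1] * math.log(1.0 - p)
            + counts_2[0] * math.log(q) + counts_2[1] * math.log(1.0 - q))

def estimate_lambda(counts_1, counts_2, lam_max=5.0, step=0.01):
    """Grid-search maximum-likelihood estimate of lambda on [0, lam_max]."""
    grid = [i * step for i in range(int(round(lam_max / step)) + 1)]
    return max(grid, key=lambda lam: log_likelihood(lam, counts_1, counts_2))

if __name__ == "__main__":
    # Made-up counts for illustration: 100 observed choices per player role.
    print(estimate_lambda((85, 15), (30, 70)))
```

Running the same procedure on data from different games or different subject pools is what makes the game-to-game variation in λ visible.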

In the case of certain bounded rational potential games, λ is inversely proportional to the square of the magnitude of fluctuations of the non-rational component of decisions (equivalent to "temperature"). Such fluctuations can be due to endogenous properties of agents such as lack of complete information, biases, emotions, etc. They can also result from external shocks, information, etc., that influence agents' decisions. In this sense, it may be useful to think of λ as a "tuning" parameter that is measured to fit data, much as temperature is measured to determine what will happen to water (turn to ice, steam, or remain water) in any given instance of an experiment.