Uncertainty aversion and equilibrium in extensive games

Jörn Rothe

London School of Economics, Department of Management, Managerial Economics and Strategy Group, Houghton Street, London, WC2A 2AE, United Kingdom
E-mail: J.D.Rothe@lse.ac.uk
WWW home page: http://www2.lse.ac.uk/management/people/jrothe.aspx

Abstract. This paper formulates a rationality concept for extensive games in which deviations from rational play are interpreted as evidence of irrationality. Instead of confirming some prior belief about the nature of non-rational play, we assume that such a deviation leads to genuine uncertainty. Assuming complete ignorance about the nature of non-rational play and extreme uncertainty aversion of the rational players, we formulate an equilibrium concept on the basis of Choquet expected utility theory. Equilibrium reasoning is thus only applied on the equilibrium path; maximin reasoning applies off the equilibrium path. The equilibrium path itself is endogenously determined. In general this leads to strategy profiles that differ qualitatively from sequential equilibria, but still satisfy equilibrium and perfection requirements. In the centipede game and the finitely repeated prisoners’ dilemma this approach can also resolve the backward induction paradox.

Keywords: rationality, extensive game, uncertainty aversion, perfect equilibrium, backward induction, maximin, Choquet expected utility theory.

1. Introduction

According to the principle of sequential rationality, a rational player of an extensive form game regards his opponents as rational even after a deviation from rational play. The internal consistency of this principle is the subject of much debate (see, e.g., Aumann (1995, 1996, 1998), Binmore (1996), Reny (1993)).

Attributing non-rational deviations to ‘trembles’ of otherwise perfectly rational players (Selten (1975)) is logically consistent, but raises the second concern with the principle of sequential rationality, namely its empirical plausibility. Quite independently of the question whether there exists a rationality concept that implies, or is at least consistent with, sequential rationality, the question arises whether there is room for an alternative rationality concept, in which deviations from the solution concept are interpreted as evidence of non-rationality. In this paper, we attempt to formulate such a rationality concept on the basis of Choquet expected utility theory.

* First version 1999. This paper continues the research presented in Rothe (1996, 2009, 2010) and is based on chapter 3 of my PhD thesis submitted to the University of London in 1999. I am grateful for financial support from the Deutsche Forschungsgemeinschaft (Graduiertenkolleg Bonn), Deutscher Akademischer Austauschdienst, Economic and Social Research Council, the European Doctoral Program and the LSE.

In a seminal series of papers, Kreps, Milgrom, Roberts, and Wilson (1982) (henceforth KMRW) developed the methodology for analysing games with possibly non-rational opponents. In their models, there is some a priori uncertainty about the rationality of the opponent. Under subjective expected utility, players act as if they possess a probability distribution over the ‘type’ of the opponents’ non-rationality. They maximise utility given their beliefs, and in sequential equilibrium their beliefs are consistent with the play of rational opponents. KMRW have shown how even small degrees of uncertainty about rationality can have large equilibrium effects. They showed that this can explain both intuitive strategic phenomena, particularly in industrial organization, and, at least to some degree, experimental evidence.

One problem in this approach, however, is for an outside observer to specify the probability distribution over the types of the non-rational opponents before experimental or field data are available. A second problem is that analysing the strategic interaction as a game with incomplete information implies that the other players, whether rational or not, can be modelled as ‘types’, who possess a consistent infinite hierarchy of beliefs about the strategic interaction. Thus, the players in this methodology are not really non-rational; rather, they are rational but have preferences that differ from those that the game attributes to ‘rational’ players.

In this paper, we argue that a consistency argument addresses both of these problems. A game-theoretic solution concept that singles out rational strategies implicitly defines all other strategies as non-rational. Thus, consistency requires that beliefs about non-rational players should not exclude any of these non-rational strategies. In other words, if the rationality concept is point-valued, the beliefs about non-rational play should include all deviations, and thus must be set-valued. So in this sense, the rationality concept itself pins down beliefs about non-rational play, but excludes subjective expected utility theory (henceforth SEU) as the adequate model of these beliefs. Thus, SEU is not an appropriate framework for beliefs about non-rationality when rationality is endogenous.

Thus, this paper argues that, after an opponent deviates from rational play, a rational player faces genuine uncertainty. What matters, then, is the rational player’s attitude towards uncertainty. This paper formulates the equilibrium concept for the case in which rational players are completely uncertainty averse. It is this case that has led to the development of decision theories with set-valued and nonadditive beliefs as an explanation of the Ellsberg (1961) paradox. Consequently, we base the equilibrium concept on Choquet expected utility theory (henceforth CEU) developed by Schmeidler (1989).

This paper joins a growing literature that applies CEU to games. The first of these were Dow and Werlang (1994) and Klibanoff (1993). Dow and Werlang (1994) consider normal form games in which players are CEU maximisers. Klibanoff (1993) similarly considers normal form games in which players follow maxmin expected utility theory (Gilboa and Schmeidler (1989)), which is closely related to CEU. In Hendon et al. (1995) players have belief functions, which amounts to a special case of CEU. Extensions and refinements have been proposed by Eichberger and Kelsey (1994), Lo (1995a), Marinacci (1994) and Ryan (1997). Epstein (1997a) analysed rationalizability in normal form games. These authors consider normal form games and do not distinguish between rational and non-rational players. The paper closest to ours is Mukerji (1994), who considers normal form games only but argues that the distinction between rational and non-rational players is necessary to reconcile CEU with the equilibrium concept. For normal form games our concepts differ only in motivation and technical detail. The present paper mainly concerns extensive form games. Extensive games have been studied by Lo (1995b) and Eichberger and Kelsey (1995). Lo (1995b) extends Klibanoff’s approach to extensive games; Eichberger and Kelsey (1995) are the first to use the Dempster-Shafer updating rule (see section 3) in extensive games. They do not distinguish between rational and non-rational players.

This paper is organized as follows: The next section discusses an example. Section 3 presents Choquet expected utility theory and discusses the problem of updating non-additive beliefs. In section 4 we formulate the equilibrium concept for two player games with perfect information. In section 5 we discuss the centipede game and the finitely repeated prisoners’ dilemma in order to relate the equilibrium concept to the foundations of game theory. Section 6 elaborates on the extension of the solution concept to general extensive games. Section 7 concludes. There is one appendix on details of updating non-additive beliefs.

2. An Example

Consider the following extensive form game, in which payoffs are given in von Neumann - Morgenstern utilities1:

Fig. 1. A simultaneous game preceded by an outside option

First, consider the case that x = 4. Then D cannot be rational for P2, because it is strictly dominated. Therefore, P1 knows at the beginning of the subgame that P2 is not rational and, consequently, has no reason to assume that P2 will play his strictly dominant strategy L in the subgame. P1’s best reply to L is T, but, intuitively, it is very risky.

In the absence of a theory of rationality, P1 faces true uncertainty about P2’s play in the subgame. Therefore, if P1 is sufficiently uncertainty averse, it becomes rational for him to play B. Thus, under these assumptions the rational strategies are U, L (because it is strictly dominant in the subgame) and B. This strategy combination is not a Nash equilibrium, yet no player has an incentive to deviate unilaterally from these rational strategies. Moreover, if there is some initial doubt

1 After D, both players P1 and P2 know that P2 chose D and they play the normal form subgame, i.e. choose simultaneously between T and B, respectively L and R.

ε > 0 about P2’s rationality², then D is also not a probability zero event, because nothing is known about a non-rational player, who might therefore well play D³. All that it takes to reach these conclusions formally is a calculus that allows non-additive, or set-valued, beliefs, and that captures P1’s uncertainty as well as his uncertainty aversion. In addition, in order to conclude that P2 must be non-rational after D we need an updating rule for non-additive beliefs.

Secondly, consider the case that x = 2. The above criticism of subgame perfection still applies: The equilibrium (T, L) in the subgame makes U rational, but once U is designated as rational, P1 faces true uncertainty after D and, if uncertainty averse, will rationally deviate to B. So (U, L, T) is not a rational solution. However, neither is (D, L, B), because if D is rational then P1 is justified in anticipating strategy L, and should play his best reply T. Now P2 has an incentive to deviate.

So suppose it is rational for P2 to play D with probability p. Suppose further that there is a probability ε > 0 that P2 is not rational at the beginning of the game. Then P1’s optimal strategy in the subgame will depend on his belief about the rationality of P2, given p and ε. The same updating rule that for x = 4 allows the natural conclusion that P2 is non-rational after D gives the result⁴ that

v(P2 rational | D) = (1 − ε)p / [1 − (1 − ε)(1 − p)].

Note that v(P2 rational | D) = (1 − ε)p / [(1 − ε)p + ε · 1] < 1 − ε. Thus, in line with his uncertainty aversion, P1 considers the worst case when he updates his belief ε. This worst case is that a non-rational player will play D with probability 1, because this makes it most likely that his behavior in the subgame is unpredictable, and, again due to uncertainty aversion, should be evaluated with the worst outcome.

Since a rational P2 will play L, P1 knows that T will give utility 1 with probability (1 − ε)p / [1 − (1 − ε)(1 − p)]. With the complementary probability, P2 is non-rational and the theory is silent about what this means. Again, P1 faces true uncertainty, and if he is extremely uncertainty averse, he will allocate the complementary probability to the worst outcome −99. So P1’s expected utility from T is

[(1 − ε)p / (1 − (1 − ε)(1 − p))] · 1 − 99 · [ε / (1 − (1 − ε)(1 − p))].

In a mixed strategy equilibrium P1 must be willing to randomize, so we must have

[(1 − ε)p − 99 ε] / [1 − (1 − ε)(1 − p)] = 0,

2 For simplicity, assume in this example that there is no doubt about the rationality of P1.

3 If rationality is common knowledge at the beginning of the game, then D is indeed a probability zero event. In general, we take the view that there is a difference between probability zero events in decision theory and probability zero events in games, where an event for one player is an act for another. Here, D is an act that might destroy this common knowledge. It is still intuitive that P1 should consider P2 as non-rational. This conclusion could be formally reached by taking limits as ε → 0. However, in this paper we concentrate on the case ε > 0.

4 See the next section and the appendix.


i.e. p = 99 ε / (1 − ε). Note, first, that P1 is willing to mix only if ε < ε̄ = 1/100, i.e. if the initial doubt about rationality is small enough; otherwise T will be too risky. Secondly, as ε goes to zero, p goes to zero, i.e. it is less and less rational for P2 to play D. Both aspects are quite intuitive.

Further, P2 must be willing to randomize as well, so that we must have 2 = q · 1 + (1 − q) · 3, i.e. q = 1/2, where q is the probability that P1 plays T. Overall, the rational strategies for given ε > 0 are given by (p* = 99 ε / (1 − ε), L, q* = 1/2) if ε < ε̄ and (D, L, B) if ε > ε̄. Again, no player has an incentive to deviate.

Finally, we can consider the case ε → 0. This gives the strategy profile (U, L, q* = 1/2). Note, however, that if ε = 0 (as opposed to ε ↘ 0), P1 has an incentive to deviate from q* = 1/2.
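As a numerical illustration of these formulas, the following sketch recomputes the equilibrium quantities of the x = 2 case. The payoffs it uses (1 for T against L, −99 as the worst outcome, 0 for B, and 1, 3, 2 for P2) are read off the indifference conditions above rather than from Figure 1, so they should be understood as assumptions of the sketch.

```python
# Numerical check of the x = 2 case.  Payoff assumptions (read off the
# indifference conditions above, not Figure 1): T against a rational
# opponent (who plays L) gives P1 utility 1, the worst outcome is -99,
# and B gives a sure 0; P2 gets 1 from (T, L), 3 from (B, L) and 2 from U.

def posterior_rational(eps, p):
    """Dempster-Shafer posterior that P2 is rational after observing D."""
    return (1 - eps) * p / (1 - (1 - eps) * (1 - p))

def eu_T(eps, p):
    """P1's Choquet expected utility of T after D (worst case -99)."""
    v = posterior_rational(eps, p)
    return v * 1 + (1 - v) * (-99)

eps = 0.005                      # initial doubt about P2's rationality
p_star = 99 * eps / (1 - eps)    # P2's equilibrium probability of D
q_star = 0.5                     # P1's probability of T, from 2 = q*1 + (1-q)*3

assert eps < 1 / 100                                    # mixing requires eps < 1/100
assert abs(eu_T(eps, p_star)) < 1e-9                    # P1 indifferent between T and B
assert abs(q_star * 1 + (1 - q_star) * 3 - 2) < 1e-9    # P2 indifferent between U and D

print(p_star, posterior_rational(eps, p_star))   # approx. 0.497 and 0.99
```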

3. Choquet Expected Utility and Updating

Under SEU, a player has preferences over acts that map a set of states of nature S into a set of consequences Z. Under consistency assumptions about this preference ordering, the player acts as if he possesses a utility function u over consequences (cardinal, i.e. unique up to affine transformations), and a probability distribution p over states that represents subjective beliefs, and maximises expected utility. This axiomatisation of SEU is due to Savage (1954). Anscombe and Aumann (1963) have simplified this approach by assuming that acts map states into lotteries (probability distributions) over consequences.

Ellsberg’s paradox (Ellsberg (1961)) provides evidence, however, that players do not necessarily act as if their beliefs are probability distributions. On the contrary, these experiments provide evidence for the hypothesis that beliefs are non-additive, and that players are uncertainty averse.

CEU also considers a preference relation over acts. Under weaker consistency assumptions, a player still acts as if he possesses a cardinal utility function u and subjective beliefs v, and maximises ‘expected utility’. The difference to SEU is that beliefs no longer have to be additive. Non-additive beliefs are given by a set function v that maps events (sets of states) into IR such that

(i) v(∅) = 0,

(ii) v(S) = 1,

(iii) E ⊆ F ⇒ v(E) ≤ v(F).

CEU was first axiomatised by Schmeidler (1989)5.

The expectation of a utility function with respect to non-additive beliefs v is defined by Choquet (1953). For u ≥ 0 the Choquet integral is given by the extended Riemann integral

∫ u dv := ∫ v(u ≥ t) dt,

where v(u ≥ t) is short for v({s ∈ S | u(s) ≥ t}). For arbitrary u = u⁺ − u⁻, where u⁺ := max{u, 0} and u⁻ := max{−u, 0} denote the positive and the negative part, the Choquet integral is defined as ∫ u dv := ∫ u⁺ dv − ∫ u⁻ dv̄, where v̄ is the dual of v, i.e. v̄(E) := 1 − v(Ē) and Ē is the complement of E.
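On a finite state space the Choquet integral can be computed directly from this definition by sorting outcomes. The following sketch (an illustration, not part of the paper) implements the discrete formula and checks two properties used later: with respect to an additive capacity the integral is the ordinary expectation, and with respect to the capacity that puts belief 1 only on the full state space (the 'basic capacity' of section 4) it is the minimum.

```python
def choquet(u, v, states):
    """Choquet integral of u with respect to the capacity v.

    u : dict mapping each state to a real outcome
    v : function from frozensets of states to [0, 1], with v(empty) = 0,
        v(all states) = 1 and monotone in set inclusion.
    Uses the discrete formula sum_k u_(k) * [v(A_k) - v(A_{k-1})], where
    outcomes are sorted in decreasing order and A_k is the set of the k
    largest states (A_0 is empty)."""
    order = sorted(states, key=lambda s: u[s], reverse=True)
    total, prev, top = 0.0, 0.0, set()
    for s in order:
        top.add(s)
        val = v(frozenset(top))
        total += u[s] * (val - prev)
        prev = val
    return total

S = ['a', 'b', 'c']
u = {'a': 4.0, 'b': 1.0, 'c': -2.0}

# Additive capacity: the Choquet integral is the ordinary expected value.
p = {'a': 0.5, 'b': 0.3, 'c': 0.2}
additive = lambda E: sum(p[s] for s in E)
assert abs(choquet(u, additive, S) - sum(p[s] * u[s] for s in S)) < 1e-12

# 'Basic' capacity (1 on the full state space, 0 otherwise): the integral
# is the minimum outcome, i.e. the worst-case evaluation used below.
basic = lambda E: 1.0 if set(E) == set(S) else 0.0
assert choquet(u, basic, S) == min(u.values())
```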

Non-additive beliefs express uncertainty aversion⁶ if v is supermodular, i.e. v(E ∪ E′) + v(E ∩ E′) ≥ v(E) + v(E′).

5 See also Gilboa (1987), Wakker (1989) and Sarin and Wakker (1992).

6 Note that we only claim that supermodularity is sufficient for uncertainty aversion, not that it is also necessary. Necessity is a controversial question (see Epstein (1997b) and Ghirardato and Marinacci (1997)). The reason why we associate uncertainty aversion with supermodularity is that we can then think interchangeably of non-additive and set-valued beliefs.

If v is supermodular, then its core Core(v) := {p | p(E) ≥ v(E) for all events E} of additive set functions p that eventwise dominate v is non-empty (Shapley, 1971). In that case, we can equivalently think of the players as possessing the set of additive beliefs Core(v). The Choquet integral of u is then given by ∫ u dv = min_{p ∈ Core(v)} ∫ u dp (Schmeidler (1986), Schmeidler (1989), Gilboa and Schmeidler (1989)).

Finally, we have to specify how players update beliefs. There is no universally agreed upon updating rule for non-additive beliefs. Instead, we take the view that different updating rules are appropriate for different circumstances. In line with the assumption that players are uncertainty averse, we use the Dempster-Shafer rule (Dempster (1967), Shafer (1976)), which is given by

v(A | B) := [v(A ∪ B̄) − v(B̄)] / [1 − v(B̄)].

The Dempster-Shafer rule reduces to Bayes’ Rule if the capacity v is additive. Gilboa and Schmeidler (1993) show that the Dempster-Shafer rule corresponds to pessimistic updating. The Dempster-Shafer rule is not dynamically consistent, but there is no dynamically consistent updating rule for non-additive beliefs (see, e.g., Epstein and Breton (1993) and Eichberger and Kelsey (1996)). Thus the Ellsberg paradox implies that updating must be dynamically inconsistent. The Dempster-Shafer rule preserves supermodularity (Fagin and Halpern (1990)), and is commutative (Gilboa and Schmeidler (1993)).
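A small sketch of the rule for a capacity on a finite state space; the check at the end confirms that for an additive capacity it coincides with Bayes' rule. The function and example names are illustrative, not from the paper.

```python
def ds_update(v, A, B, S):
    """Dempster-Shafer conditional v(A | B) = [v(A u Bc) - v(Bc)] / [1 - v(Bc)],
    where Bc is the complement of the conditioning event B.

    v is a capacity (a function from frozensets of states to [0, 1]);
    A and B are events, S is the full state space; requires v(Bc) < 1."""
    B_c = frozenset(S) - frozenset(B)
    return (v(frozenset(A) | B_c) - v(B_c)) / (1.0 - v(B_c))

# For an additive capacity the rule reduces to Bayes' rule.
S = {'s1', 's2', 's3'}
prob = {'s1': 0.2, 's2': 0.3, 's3': 0.5}
additive = lambda E: sum(prob[s] for s in E)

A, B = {'s1'}, {'s1', 's2'}
bayes = prob['s1'] / (prob['s1'] + prob['s2'])
assert abs(ds_update(additive, A, B, S) - bayes) < 1e-12
```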

Finally, we note that our approach does not rely on the details of the Dempster-Shafer rule. Any updating rule that takes into account that there are no probability zero events when non-rational play is unrestricted is admissible. Which updating rule will eventually prove to be the correct one is an issue that will have to be settled experimentally; for a first step in this direction see Cohen et al. (1999).

4. Perfect Choquet Equilibria

We use the following notation for extensive form games as defined in Selten (1975) and Kreps and Wilson (1982a): Let Γ be an extensive game, finite and with perfect recall. Let V be the set of vertices, with decision nodes X and endnodes Z. Let 0 be the origin (empty history). Let ≺ be the precedence relation, i.e. v ≺ v′ means that there is a path from v to v′. The relation ≺ is an arborescence, i.e. a partial ordering in which different nodes have disjoint successor sets. Let I be the player set. Let X_i be the decision nodes of player i ∈ I. Let H_i be the set of player i’s information sets h_i ∈ H_i. Let A(h_i) be the set of actions that are available to player i at his information set h_i; similarly let A(x_i) be the set of actions that are available to player i at his decision node x_i. Let A_i be the set of actions available to player i at some information set. Let X_0 be the set of all nodes at which there is a random move, and for x_0 ∈ X_0 let π(x_0) be the probability distribution over A(x_0).

Let S_i be the set of pure strategies s_i : H_i → A_i of player i, with s_i(h_i) ∈ A(h_i). Let Σ_i be the set of behavior strategies of player i, i.e. σ_i(h_i) is a probability distribution over A(h_i). The sets S and Σ are the sets of pure and behavior strategy profiles, s ∈ S = ×_{i∈I} S_i, σ ∈ Σ = ×_{i∈I} Σ_i. As usual, s_{-i} and σ_{-i} denote i-incomplete strategy combinations. Similarly, s_{i,-h_i} and σ_{i,-h_i} denote h_i-incomplete strategies of player i, i.e. strategies that do not specify an action at information set h_i.

Let u_i : Z → IR be the von Neumann - Morgenstern utility function of player i. For s ∈ S, let u_i(s) be the expected utility of player i if the pure strategy combination s is played and random moves are drawn according to the distributions π(x_0). For σ ∈ Σ, let u_i(σ) be the expected utility of player i if the behavior strategy combination σ is played. For a decision node x ∈ X, let u_i(σ | x) be the conditional expected utility of player i if the game starts at decision node x and the behavior strategy combination σ is played.

The definition of a perfect Choquet equilibrium will become quite involved for general extensive games. For this reason, we first restrict attention to two-player games with perfect information.

Since in extensive games lack of mutual knowledge of rationality arises endogenously whenever a player deviates from his rational strategy, we consider a situation in which rationality is in general not mutual knowledge. We aim to define what rational strategies are. We assume that rational players maximise Choquet expected utility, i.e. possess a utility function u and maximise utility given their beliefs. Since the opponent may be rational or not, their beliefs can be expressed as two capacities v_R and v_R̄, where v_R is the belief about the play of rational opponents and v_R̄ the belief about the play of non-rational opponents. Let ε_{ij}(x_i) be player i’s belief that player j is not rational at decision node x_i.

So for given beliefs the rational player chooses his action at decision node x_i by maximising

max_{a ∈ A(x_i)} [1 − ε_{ij}(x_i)] ∫ u_i(a, σ*_{i,-x_i}, s_j | x_i) dv_R + ε_{ij}(x_i) ∫ u_i(a, σ*_{i,-x_i}, s_j | x_i) dv_R̄,

where σ*_{i,-x_i} is player i’s plan for how to continue playing.

The strategy of a rational opponent has to be determined endogenously. So, in equilibrium the belief v_R has to coincide with the opponent’s rational strategy σ*_j. In particular, v_R is an additive belief and the Choquet integral reduces to the usual integral, i.e.

∫ u_i(a, σ*_{i,-x_i}, s_j | x_i) dv_R = u_i(a, σ*_{i,-x_i}, σ*_j | x_i).

It remains to specify the beliefs about play of non-rational opponents. Since the solution concept specifies rational strategies only, every deviation has to be considered non-rational. Thus v_R̄ should not impose any restriction on the play of a non-rational player, so that the rational player faces non-additive uncertainty. What matters then is the rational player’s attitude towards uncertainty. We define the solution concept for the case in which rational players are uncertainty-averse⁷. Consequently we assume that v_R̄ is the basic capacity that assigns belief

v_R̄(E) = 1 if E = S_j, and v_R̄(E) = 0 else,

to the event E ⊆ S_j that a non-rational player’s strategy lies in the set E.

7 The Ellsberg paradox seems to point towards uncertainty aversion, and this has been the main motivation for developing CEU. Smithson (1997) reports that uncertainty aversion is a robust phenomenon in the Ellsberg experiment. However, Smithson (1997) also draws attention to the fact that uncertainty aversion is not a universal empirical fact.


Modelling complete uncertainty as a basic capacity as opposed to a uniform probability distribution also has the practical advantage that the expected value of the utility function does not depend on the description of the state space. For instance, if a superfluous move, i.e. a copy of one of the opponent’s strategies, is added to the opponent's strategy set, Choquet expected utility under a basic capacity is the same, whereas the expected utility under a uniform probability distribution would, in general, change.

For this capacity, the Choquet integral reduces to⁸

∫ u_i(a, σ*_{i,-x_i}, s_j | x_i) dv_R̄ = min_{s_j ∈ S_j} u_i(a, σ*_{i,-x_i}, s_j | x_i).

Overall, a rational player thus maximises his expected utility, given his beliefs ε_{ij}, v_R and v_R̄. The perfection requirement now means that a player maximises his utility at each decision node in the game, conditional on that node being reached. Moreover, as the game progresses he updates his beliefs, and since his beliefs about non-rational opponents are non-additive he does so on the basis of the Dempster-Shafer rule. In a perfect Choquet equilibrium, a rational player correctly anticipates the play of a rational opponent and has no incentive to deviate. Formally:

Definition 1. Let Γ be a finite extensive two-player game with perfect information. Then σ* is a perfect Choquet equilibrium iff (if and only if) for each player i, each of his decision nodes x_i, and each pure action a_i(x_i) in the support of σ*_i(x_i)

a_i(x_i) ∈ arg max_{a ∈ A(x_i)} [1 − ε_{ij}(x_i)] u_i(a, σ*_{i,-x_i}, σ*_j | x_i) + ε_{ij}(x_i) min_{s_j ∈ S_j} u_i(a, σ*_{i,-x_i}, s_j | x_i),

where the belief ε_{ij}(x_i) is updated according to

ε_{ij}(x_i) := ε_{ij}(x_i') / (1 − [1 − ε_{ij}(x_i')] [1 − ∏_{x_i' ≺ x_j ≺ x_i} σ*_j(x_j)]),

where x_i' is player i’s decision node that precedes x_i, the product is taken over all decision nodes x_j of player j that lie between x_i' and x_i, and σ*_j(x_j) is the probability that player j takes the action that leads from x_j toward x_i⁹.


Thus, a perfect Choquet equilibrium resolves the infinite regress that arises in a situation in which rationality is not mutual knowledge: Rationality means to maximise utility given beliefs; these beliefs take into account that a rational opponent will do the same, and that the rationality concept does not restrict the play of a non-rational opponent.

8 Note that in contrast to some of the literature on CEU in games we do not restrict rational players’ beliefs to ‘simple capacities’, i.e. distorted probability distributions. In principle, the players may have arbitrary beliefs about non-rational play. Here, we assume instead that a rational player distinguishes between rational and non-rational opponents.

9 The updating rule takes into account that the opponent may move more than once between x_i' and x_i. See remark (6) in the appendix.

Note that in a perfect Choquet-equilibrium the equilibrium path is supported by a different solution concept, i.e. maximin play, off the equilibrium path. Consequently, the solution concept does not suffer from the logical deficiency of subgame-perfection, where the equilibrium path is supported by equilibrium reasoning off the equilibrium path.

Note also the important difference between subjective expected utility theory and Choquet expected utility in justifying the maximin-strategy against non-rational opponents. Under subjective expected utility the maximin-strategy is rational only if the rational player believes that the non-rational opponent minimaxes him. This belief seems difficult to justify. Under CEU the maximin-strategy is rational because the rational player cannot exclude the possibility that the non-rational opponent plays, perhaps by chance, a minimax-strategy, and because he reacts aversely towards the uncertainty created by his inability to forecast a non-rational opponent's play.

This solution concept generalizes immediately to repeated normal form games, i.e. multi-stage games with observed actions (Fudenberg and Tirole (1991)), in which the players move simultaneously in each stage, and learn the (pure) actions after each stage.

In section 6 we discuss the extension of this equilibrium concept to general extensive games and to more than two players. In the next section we relate the solution concept to the foundations of game theory.

5. Subgame-Perfection

The aim of this section is to relate our solution concept to the discussion on the foundations of game theory. The example in section 2 already shows how a perfect Choquet equilibrium differs from subgame-perfection. The following discussion of the centipede game shows that backward induction need not be based on common knowledge of rationality. Thus, we argue that our solution concept provides a robustness criterion for subgame-perfect equilibria.

5.1. The Centipede Game

The logical consistency of subgame-perfection has been controversial for a long time (Binmore (1987-88), Reny (1993)). Selten’s (1975) concept of trembling-hand perfection circumvents these difficulties by explaining deviations from rationality as unsystematic trembles of otherwise rational players, so that deviations are not evidence of non-rationality. Rationality is then defined as a limiting case of non-rationality where the probability of mistakes approaches zero. Though this approach is empirically implausible, it is logically consistent.

The logical status of subgame-perfection was further clarified by Aumann (1995, 1998) (see also Binmore (1996), Aumann (1996)). Aumann (1995) shows that common knowledge of ‘ex ante substantive rationality' implies the backward induction outcome in perfect information games. Here, a player is ‘rational' if there is no other strategy that the player knows to give him higher expected utility than the one he chooses.

The distinction between ‘ex ante’ and ‘ex post’ rationality refers to the point in the game when his knowledge matters. ‘Ex ante’ rationality at some decision node v means that at the beginning of the game he knows of no better action at v, ‘ex post’ rationality means that when v is reached he knows of no better strategy. Consequently, ‘ex ante’ rationality is weaker than ‘ex post’ rationality.

The distinction between ‘substantive’ and ‘material’ rationality refers to the decision nodes where the player is assumed to be rational. Thus ‘substantive’ rationality means that a player is rational at all decision nodes, whether they are reached by rational play or not. ‘Material’ rationality, on the other hand, means that players are only assumed to be rational at reached decision nodes. ‘Material’ rationality is weaker than substantive rationality, and Aumann shows that ‘material’ rationality does not imply the backward induction outcome.

Aumann (1998) notices that his result can be sharpened for the centipede game. The centipede game (Rosenthal (1981)) has become a cornerstone for the discussion of the foundations of game theory. Aumann shows that, due to its specific payoff structure, common knowledge of ‘ex post material rationality’ implies the backward induction outcome in the centipede game. The rationality concept cannot be weakened to ‘ex ante material rationality’. Note that common knowledge of rationality implies the backward induction outcome, not the backward induction strategy profile.

Fig. 2. A Centipede Game

It is immediate that the perfect Choquet equilibrium in the centipede game is to play down everywhere: At the last node ‘down’ is strictly dominant, at the penultimate node the player knows that a rational opponent will go down at the last node, or will consider the non-rational opponent unpredictable and assume the worst. In either case the continuation payoff is less than that from going ‘down’, so again ‘down’ is optimal. The same reasoning applies at every earlier node.
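The following sketch makes this backward-induction argument computational. It simplifies Definition 1 in two respects that should be kept in mind: the doubt ε is held constant instead of being updated by the Dempster-Shafer rule, and only pure strategies are considered. The four-move centipede payoffs are an illustrative parameterization in the spirit of Figure 2, not the figure itself.

```python
EPS = 0.05   # constant doubt about the opponent's rationality (a simplification)

def solve(node):
    """Backward induction with worst-case evaluation of deviations.

    Returns (rational_value, worst, plan): rational_value is the payoff pair
    under rational play from this node, worst[i] is player i's payoff when i
    follows the plan but the opponent plays adversarially below this node,
    and plan maps decision nodes to the chosen action."""
    if node[0] == 'end':
        payoff = node[1]
        return payoff, {0: payoff[0], 1: payoff[1]}, {}
    _, mover, children, name = node
    other = 1 - mover
    solved = {a: solve(child) for a, child in children.items()}
    # The mover evaluates each action by mixing rational continuation play
    # (weight 1 - EPS) with the worst case (weight EPS).
    score = lambda a: (1 - EPS) * solved[a][0][mover] + EPS * solved[a][1][mover]
    best = max(children, key=score)
    rv, worst_best, plan = solved[best]
    plan = dict(plan)
    for a, (_, _, sub) in solved.items():
        if a != best:
            plan.update(sub)        # keep off-path choices as well
    plan[name] = best
    worst = {mover: worst_best[mover],
             other: min(res[1][other] for res in solved.values())}
    return rv, worst, plan

# A four-move centipede in the spirit of Fig. 2 (payoffs are illustrative):
# 'T' takes and ends the game, 'P' passes.
game = ('node', 0, {'T': ('end', (2, 1)),
        'P': ('node', 1, {'T': ('end', (1, 4)),
              'P': ('node', 0, {'T': ('end', (4, 3)),
                    'P': ('node', 1, {'T': ('end', (3, 6)),
                          'P': ('end', (6, 5))}, 'x4')}, 'x3')}, 'x2')}, 'x1')

value, worst, plan = solve(game)
print(plan)    # every node chooses 'T': down everywhere
print(value)   # (2, 1): the backward induction outcome
```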

This conclusion is interesting for the following reasons: First, not only do we get the backward induction outcome, but also the backward induction strategy profile. Moreover, this profile is achieved using the same logic as subgame-perfection. Secondly, the backward induction solution arises without mutual knowledge of rationality. Finally, the original objection to backward induction no longer applies: players are not assumed to be rational off the equilibrium path.

5.2. The Finitely Repeated Prisoners’ Dilemma

One of the first papers to apply CEU to normal form games was Dow and Werlang (1994). In particular, Dow and Werlang develop an equilibrium concept for players who have non-additive beliefs and analyse the once-repeated prisoners’ dilemma. They show that players with non-additive beliefs may not backward induct, and contrast their result with that of Kreps et al. (1982).

One of the differences between a perfect Choquet equilibrium and the Nash equilibrium under uncertainty of Dow and Werlang (1994) is the way in which uncertainty arises in the game. Dow and Werlang (1994) do not distinguish between rational and non-rational players; in their equilibrium concept players are CEU maximisers who lack, to some degree, logical omniscience. They anticipate that their opponents maximise CEU, but do not draw the conclusion about the strategy choice. In other words, theirs is an equilibrium in beliefs. In our model, it is the lack of mutual knowledge of rationality that gives rise to uncertainty. The rational players can anticipate how rational opponents will act, but not how non-rational opponents will.

The main difference, however, is that Nash equilibrium under uncertainty is a normal form concept. Thus when players may have non-additive beliefs, cooperation in the first period and defection in the second can be an equilibrium: If the players believe that the opponent cooperates in the second period if, and only if, they do so at the first stage, they have an incentive to cooperate early, and for lack of logical omniscience both players may think so¹⁰.

Under complete uncertainty aversion, in a perfect Choquet equilibrium both players defect at all stages: In the second stage defection is strictly dominant, in the first stage a player can anticipate that a rational opponent will defect in the next stage, and a non-rational opponent causes uncertainty that is evaluated by its worst outcome, i.e. defection. So the second stage action is independent of the first stage action, and again defection is strictly dominant in the first stage.
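A minimal numerical check of this argument, using standard prisoners' dilemma stage payoffs (the paper does not specify numbers) and the same simplification as above of holding ε constant:

```python
# Stage payoffs for the row player of a standard prisoners' dilemma
# (illustrative numbers, not from the paper): g[(own action, opponent action)].
g = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 4, ('D', 'D'): 1}

def two_stage_value(a1, eps):
    """Choquet value of first-stage action a1 when the own second-stage action
    is the dominant D, a rational opponent defects in both stages, and a
    non-rational opponent is evaluated by the worst case."""
    rational = g[(a1, 'D')] + g[('D', 'D')]
    worst = min(g[(a1, b1)] for b1 in 'CD') + min(g[('D', b2)] for b2 in 'CD')
    return (1 - eps) * rational + eps * worst

# Defection is optimal in the first stage for every level of doubt eps.
for eps in (0.0, 0.3, 1.0):
    assert two_stage_value('D', eps) > two_stage_value('C', eps)
```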

Note that Aumann’s (1995) justification of the backward induction outcome does not apply to the repeated prisoners’ dilemma, since it is not a perfect information game. Thus, the perfect Choquet equilibrium concept sheds some light on the robustness of subgame-perfect equilibria. It is perhaps surprising that backward induction is robust in games like the centipede game or the finitely repeated prisoners’ dilemma, in which it is most counterintuitive.

However, we can also extend the result of Dow and Werlang (1994) in the following way: Instead of assuming that players are completely uncertainty averse, assume that they believe that if the opponent is non-rational, then he will defect in the first period and cooperate in the second period if, and only if, there was cooperation in the first period.

Now, if the probability that the opponent is non-rational is sufficiently high, then again it is rational to cooperate in the first period (and to defect in the second period), because first-period play influences the second-period play of the non-rational opponents. This result is interesting because it shows that backward induction may break down even if the strategies of the rational opponent can be correctly anticipated and subgame perfection is required. Yet, a basic shortcoming of this result is that it rests on this specific belief about non-rational play, which is as difficult to justify as the ‘crazy type’ of Kreps et al. (1982).

10 In fact this phenomenon is also related to another aspect of the Nash equilibrium under uncertainty in Dow and Werlang (1994), i.e. their definition of support of a non-additive measure. The support implicitly defines the knowledge of the players. What the correct support notion is for non-additive beliefs is controversial. This issue does not arise in a perfect Choquet equilibrium, in which players know the rational strategies in the usual sense.

6. Extension

The aim of this section is to discuss the extension of the solution concept to general extensive games. This extension is not straightforward, as the following example shows11:

Fig. 3. Extensive form game I

Note, first, that in a general extensive game a player should hold different beliefs about the degree of his opponent’s rationality at different decision nodes in the same information set. In the above example, at node a player P2 should have belief ε(0), i.e. the prior with which he started the game. However, if T strictly dominates B, then at decision node b P2 should hold the updated belief that the opponent is non-rational, i.e. ε(b) = 1.

The second problem, also illustrated in the above example, is that not all decision nodes of an information set matter equally for the player who moves there. Above, P2’s belief ε(b) does not matter at all, because P1 does not move after b. Only at decision node a is P2’s belief about P1’s rationality relevant for his decision.

Overall, in the example the intuitively correct belief for P2 to have at his information set is therefore his prior ε(0). We suggest generalizing this observation in the following way:

Let h_i be an information set of player i. Call a decision node x_i ∈ h_i relevant for player i if his opponent has a decision node in the subtree that starts at x_i. For each relevant decision node x_i, calculate the (in general non-additive) belief µ'(x_i) that the node is reached, given the optimal strategies and the beliefs ε(h_i') at preceding information sets h_i'. Form an updated belief ε(x_i) for each relevant decision node, where ε(x_i) is the Dempster-Shafer update from the preceding information sets and the equilibrium strategies. Finally, define ε(h_i) as the expected value of ε(x_i) given the weights µ(x_i) := µ'(x_i) / µ'(h_i). Formally:

Definition 2. Let Γ be a finite two-player game in extensive form.

11 Nature gives the move to player 1 or 2 with probability 1/2. Player 1 has full information, player 2 does not know if he moves first or second. The outcomes are denoted z_1, ..., z_6.

Then σ* is a perfect Choquet equilibrium iff (if and only if) for each player i, each of his information sets h_i, and each pure action a_i(h_i) in the support of σ*_i(h_i)

a_i(h_i) ∈ arg max_{a ∈ A(h_i)} [1 − ε_{ij}(h_i)] u_i(a, σ*_{i,-h_i}, σ*_j | h_i) + ε_{ij}(h_i) min_{s_j ∈ S_j} u_i(a, σ*_{i,-h_i}, s_j | h_i),

where

µ'(x_i) = ∏_{x_i' ≺ x_i} σ*_i(x_i') ∏_{x_j' ≺ x_i} ([1 − ε_{ij}(x_j')] σ*_j(x_j') + ε_{ij}(x_j')),

µ'(h_i) := Σ_{x_i ∈ h_i} µ'(x_i),    µ(x_i) := µ'(x_i) / µ'(h_i),

u_i(a, σ*_{i,-h_i}, σ*_j | h_i) = Σ_{x_i ∈ h_i} µ(x_i) u_i(a, σ*_{i,-h_i}, σ*_j | x_i),

ε_{ij}(x_i) := ε_{ij}(x_i') / (1 − [1 − ε_{ij}(x_i')] [1 − ∏_{x_i' ≺ x_j ≺ x_i} σ*_j(x_j)]),

ε_{ij}(h_i) := Σ_{x_i ∈ h_i} µ(x_i) ε_{ij}(x_i),

and where x_i' (x_j') are player i’s (j’s) decision nodes that precede x_i, and σ*_i(x_i') (σ*_j(x_j')) is the probability that player i (j) takes the action that leads from x_i' (x_j') toward x_i.
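The aggregation step of Definition 2 (normalizing the reach weights µ' over an information set and averaging the node-level beliefs) is summarized by the following small sketch; the numbers at the end are hypothetical, chosen only to mimic the situation of Figure 4 below.

```python
def aggregate(mu_prime, eps_node):
    """Aggregate node-level reach weights and beliefs over an information set.

    mu_prime : dict node -> (possibly non-additive) reach weight mu'(x)
    eps_node : dict node -> belief eps(x) that the opponent is non-rational
    Returns (mu, eps_h): the normalized weights and the information-set belief."""
    total = sum(mu_prime.values())
    mu = {x: w / total for x, w in mu_prime.items()}
    eps_h = sum(mu[x] * eps_node[x] for x in mu_prime)
    return mu, eps_h

# Hypothetical numbers: node a is reached by nature with a small prior doubt,
# node b only via a dominated move, so eps(b) = 1.
mu, eps_h = aggregate({'a': 0.5, 'b': 0.01}, {'a': 0.02, 'b': 1.0})
print(mu, eps_h)
```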

Consider the following example:

Fig. 4. Extensive form game II

Again, assume that T is strictly dominant, so that ε(b) = 1. Then µ'(a) = 1/2, and the calculation of µ'(b) again reflects uncertainty aversion, because the worst case is that non-rational players would play B, since this would give most weight to the worst outcome as P2 weighs his decision at his information set.

The extension of the equilibrium concept to more than two players is conceptually straightforward, but computationally demanding. For two opponents P_j and P_k of player i, and beliefs ε_j and ε_k about their lack of rationality, the rational player has to take all four cases into account: that either of the players is non-rational, that both are rational and that neither is. So at decision node x_i he should maximise

max_{a ∈ A(x_i)} (1 − ε_j)(1 − ε_k) u_i(a, σ*_{i,-x_i}, σ*_j, σ*_k)
+ (1 − ε_j) ε_k min_{s_k ∈ S_k} u_i(a, σ*_{i,-x_i}, σ*_j, s_k)
+ ε_j (1 − ε_k) min_{s_j ∈ S_j} u_i(a, σ*_{i,-x_i}, s_j, σ*_k)
+ ε_j ε_k min_{(s_j, s_k) ∈ S_j × S_k} u_i(a, σ*_{i,-x_i}, s_j, s_k).

In particular, taking into account that both players may be non-rational means that in the worst case their actions may be correlated. Beliefs ε_j and ε_k are then updated separately on the basis of the Dempster-Shafer rule.
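For a one-shot choice (suppressing the continuation strategy σ*_{i,-x_i}), this four-term objective can be written down directly; the payoff function and action sets in the example are hypothetical.

```python
from itertools import product

def choquet_value(a, eps_j, eps_k, sigma_j, sigma_k, S_j, S_k, u):
    """Four-term objective of action a against opponents j and k.

    sigma_j, sigma_k : the opponents' rational (pure) actions
    S_j, S_k         : their full action sets
    u                : player i's payoff u(a, s_j, s_k)
    In the doubly non-rational case the minimum is taken over pairs, so the
    worst case may be correlated."""
    return ((1 - eps_j) * (1 - eps_k) * u(a, sigma_j, sigma_k)
            + (1 - eps_j) * eps_k * min(u(a, sigma_j, sk) for sk in S_k)
            + eps_j * (1 - eps_k) * min(u(a, sj, sigma_k) for sj in S_j)
            + eps_j * eps_k * min(u(a, sj, sk) for sj, sk in product(S_j, S_k)))

# Hypothetical example: player i wants to match j's action, while k can punish.
u = lambda a, sj, sk: (3 if a == sj else 0) - (2 if sk == 'punish' else 0)
best = max('AB', key=lambda a: choquet_value(a, 0.1, 0.2, 'A', 'idle',
                                             ['A', 'B'], ['idle', 'punish'], u))
print(best)   # 'A'
```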

7. Conclusion

The paper has suggested a solution concept that combines equilibrium logic with maximin play off the equilibrium path. The solution concept is natural if rationality is not mutual knowledge, no restriction is imposed on deviations from rationality, and if players are uncertainty-averse.

Nevertheless, the solution concept also has some limitations. First, the computational effort of calculating equilibria may be quite high. Secondly, experiments show that players sometimes systematically deviate from rational play, so that it may be possible to formulate more restrictive assumptions on non-rational play after all. Note that this would give rise to a difference between a descriptive solution concept with such restrictions, and a normative concept like ours where we based the lack of restrictions on the consistency argument that a rationality concept alone does not restrict non-rational play.

Thirdly, the assumption that players are completely uncertainty-averse is extreme. Yet, at the current stage of the development of Choquet expected utility theory there is no ideal alternative12.

Finally, it seems a drastic consequence that strategic interaction comes to a complete halt after a deviation from rational play. Note, however, that the solution concept applies to one-shot games, and therefore leaves no room for real-world strategies to deal with doubts about rationality, e.g. experimentation and communication. Note also that trembling-hand perfection makes an equally extreme assumption to ensure logical consistency by postulating that otherwise fully rational players ‘tremble’, and that deviations therefore provide no evidence against rationality.

12 It would be possible to assume that players take a weighted average of the best and the worst outcome if they are certain to face a non-rational opponent. This is Hurwicz’s optimism-pessimism index (see Arrow and Hurwicz (1972)). However, by introducing another free parameter the model would lose predictive power.

Needless to say, our solution concept provides nothing but a first step that may be a basis for a more refined analysis.

Acknowledgments. For helpful comments on the first version I would like to thank Jürgen Eichberger, Leonardo Felli, Hans Haller, David Kelsey, Marco Mariotti, Sujoy Mukerji, Matthew Ryan and audiences at the Jahrestagung of the Verein für Socialpolitik in Bern (1997), the 1st World Congress of the Game Theory Society in Bilbao (2000), the 8th World Congress of the Econometric Society in Seattle (2000), and seminar audiences at the universities of Cambridge (2001), Leicester (2005), Newcastle (2006), Moscow (ICEF, 2008) and Bristol (2010). Errors are my responsibility.

Appendix

Let v be a capacity and consider events E, F ⊆ S. The Dempster-Shafer rule specifies that the posterior capacity of event E is given by (if v(F̄) < 1)

v(E | F) := [v(E ∪ F̄) − v(F̄)] / [1 − v(F̄)].

Let ε be the prior probability that a player is not rational, with 0 < ε < 1. Assume that a rational player chooses an action A with probability p. Then the posterior belief about the opponent’s rationality after action A is given by

(1 − ε)p / [1 − (1 − ε)(1 − p)].


This is derived as follows:

Let R be the event that the player is rational, let R̄ be the event that he is non-rational.

We want to calculate

v(R | A) = [v(R ∪ Ā) − v(Ā)] / [1 − v(Ā)].    (1)

Since a player is either rational or not, we have

v(R | A) + v(R̄ | A) = 1.    (2)

Consequently,

(3) v(R | A) = [v(R ∪ Ā) − v(Ā)] / [1 − v(Ā)] and
(4) v(R̄ | A) = [v(R̄ ∪ Ā) − v(Ā)] / [1 − v(Ā)]

imply

v(Ā) = v(R ∪ Ā) + v(R̄ ∪ Ā) − 1.

Further,

(5) v(Ā | R) = [v(Ā ∪ R̄) − v(R̄)] / [1 − v(R̄)], and
(6) v(Ā | R̄) = [v(Ā ∪ R) − v(R)] / [1 − v(R)].

We know that

(7) v(R) = 1 − ε,

(8) v(R̄) = ε,

(9) v(Ā | R) = 1 − p, and

(10) v(Ā | R̄) = 0,

so that

(11) v(Ā ∪ R̄) = (1 − ε)(1 − p) + ε, and

(12) v(Ā ∪ R) = 1 − ε.

Thus

v(Ā) = (1 − ε) + (1 − ε)(1 − p) + ε − 1 = (1 − ε)(1 − p).    (13)

Consequently,

v(R | A) = [(1 − ε) − (1 − ε)(1 − p)] / [1 − (1 − ε)(1 − p)] = (1 − ε)p / [1 − (1 − ε)(1 − p)].

Remarks:

(1) The derivation is only valid under lack of mutual knowledge of rationality, i.e. for ε > 0 and ε < 1; otherwise v(Ā | R) or v(Ā | R̄) is not well-defined.

(2) With 0 < ε < 1 there are no probability zero events, since

v(Ā) = (1 − ε)(1 − p) < 1.

This holds for any p ∈ [0, 1], including the boundaries.

(3) In particular, if ε > 0 then the updated belief ε' := v(R̄ | A) > 0, independently of p. However, if p = 0, then ε' = 1. Thus we also need to be able to update the belief ε = 1. Intuitively, if the prior belief about the opponent is that he is non-rational and beliefs about his behavior are uncertainty averse, then there are no probability zero events, and the posterior belief should also be that the opponent is non-rational. This can be justified directly from the Dempster-Shafer rule (1): From monotonicity, v(R̄) ≤ v(R̄ ∪ Ā), therefore v(R̄ ∪ Ā) = 1. Also, (6) implies v(Ā | R̄) = v(Ā ∪ R), so again by monotonicity, v(Ā) ≤ v(Ā ∪ R) = 0. Since this result also follows if we substitute ε = 1 into (13), we do not have to explicitly track this special case.

(4) The reason why ε = 0 has to be excluded is that there is no parallel argument that ε = 0 and p = 0 should give ε' = 1: (5) implies v(Ā ∪ R̄) = v(Ā | R) = 1, and (1) gives v(R | A) = [1 − v(Ā)] / [1 − v(Ā)], since monotonicity forces v(R ∪ Ā) = 1 but not v(Ā) = 1.

(5) Note that action A is always interpreted as evidence of non-rationality: v(R̄ | A) = ε' ≥ ε. Thus updating is in line with uncertainty aversion, which gives rise to non-additive beliefs in the first place. For the player, the worst case is that the non-rational opponent chose action A with probability 1, because this makes it more likely, under uncertainty aversion, to receive the worst outcome in the next stage.

(6) Note that if ε' = ε / (1 − (1 − ε)(1 − p)) and ε'' = ε' / (1 − (1 − ε')(1 − p')), then ε'' = ε / (1 − (1 − ε)(1 − p p')); a symbolic check follows these remarks.

(7) Finally, note that the argument rests heavily on (2), i.e. the requirement about beliefs that an opponent is either rational or non-rational, so that these beliefs have to be additive.
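The composition property stated in remark (6) can be verified symbolically; a short check, assuming SymPy is available:

```python
import sympy as sp

eps, p, q = sp.symbols('epsilon p q', positive=True)   # q plays the role of p'
e1 = eps / (1 - (1 - eps) * (1 - p))    # update after the first observed move
e2 = e1 / (1 - (1 - e1) * (1 - q))      # update after the second observed move
assert sp.simplify(e2 - eps / (1 - (1 - eps) * (1 - p * q))) == 0
```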

References

Anscombe, F. J. and R. J. Aumann (1963). A definition of subjective probability. Annals of Mathematical Statistics, 34, 199-205.

Arrow, K. J. and L. Hurwicz (1972). An optimality criterion for decision-making under ignorance. in: Carter & Ford (1972), 1-11.

Aumann, R. J. (1995). Backward induction and common knowledge of rationality. Games and Economic Behavior, 8, 6-19.

Aumann, R. J. (1996). Reply to Binmore. Games and Economic Behavior, 17, 138-46.

Aumann, R. J. (1998). On the centipede game. Games and Economic Behavior, 23, 97-105.

Binmore, K. G. (1987-88). Modelling rational players I & II. Economics and Philosophy, 3 & 4, 3:179-214, 4:9-55.

Binmore, K. G. (1996). A note on backward induction. Games and Economic Behavior, 17, 135-7.

Carter, C. F. and J. L. Ford (Eds.) (1972). Uncertainty and Expectations in Economics. Basil Blackwell: Oxford.

Choquet, G. (1953). Theory of capacities, Annales de l’Institut Fourier (Grenoble), 5, 131-295.

Cohen, M., Gilboa, I., Jaffray, J.-Y. and D. Schmeidler (1999). An experimental study of updating ambiguous beliefs. mimeo.

Dempster, A. P. (1967). Upper and lower probabilities induced by a multivalued mapping. Annals of Mathematical Statistics, 38, 325-39.

Dow, J. and S. R. d. C. Werlang (1994). Nash equilibrium under Knightian uncertainty: Breaking down backward induction. Journal of Economic Theory, 64, 305-324.

Eichberger, J. and D. Kelsey (1994). Non-additive beliefs and game theory. Discussion Paper 9410, Center for Economic Research.

Eichberger, J. and D. Kelsey (1995). Signalling games with uncertainty. mimeo.

Eichberger, J. and D. Kelsey (1996). Uncertainty aversion and preference for randomisation. Journal of Economic Theory, 71, 31-43.

Ellsberg, D. (1961). Risk, ambiguity and the savage axioms. Quarterly Journal of Economics, 75, 643-69.

Epstein, L. G. (1997a). Preference, rationalizability and equilibrium. Journal of Economic Theory, 73, 1-29.

Epstein, L. G. (1997b). Uncertainty aversion. mimeo.

Epstein, L. G. and M. L. Breton (1993). Dynamically consistent beliefs must be Bayesian. Journal of Economic Theory, 61, 1-22.

Fagin, R. and J. Y. Halpern (1990). A new approach to updating beliefs. mimeo.

Fudenberg, D. and J. Tirole (1991). Game Theory. MIT Press: Cambridge, MA.

Ghirardato, P. and M. Marinacci (1997). Ambiguity made precise: A comparative foundation and some implications. mimeo, University of Toronto.

Gilboa, I. (1987). Expected utility with purely subjective non-additive probabilities. Journal of Mathematical Economics, 16, 65-88.

Gilboa, I. and D. Schmeidler (1989). Maxmin expected utility with non-unique prior. Journal of Mathematical Economics, 18, 141-153.

Gilboa, I. and D. Schmeidler (1993). Updating ambiguous beliefs. Journal of Economic Theory, 59, 33-49.

Hendon, E., Jacobsen, H. J., Sloth, B. and T. Tranaes (1995). Nash equilibrium in lower probabilities. mimeo.

Klibanoff, P. (1993). Uncertainty, decision and normal form games. mimeo.

Kreps, D.M. and R. B. Wilson (1982a). Reputation and imperfect information. Journal of Economic Theory, 27, 253-79.

Kreps, D.M. and R. B. Wilson (1982b). Sequential equilibria. Econometrica 50, 863-894.

Kreps, D.M., Milgrom, P., Roberts, J. and R. B. Wilson (1982). Rational cooperation in the finitely repeated prisoners’ dilemma. Journal of Economic Theory, 27, 245-252.

Lo, K. C. (1995a). Nash equilibrium without mutual knowledge of rationality. mimeo, University of Toronto.

Lo, K. C. (1995b). Extensive form games with uncertainty averse players. mimeo, University of Toronto.

Marinacci, M. (1994). Equilibrium in ambiguous games. mimeo.

Milgrom, P. R. and J. Roberts (1982a). Limit pricing and entry under incomplete information: An equilibrium analysis. Econometrica, 50, 443-60.

Milgrom, P. R. and J. Roberts (1982b). Predation, reputation and entry deterrence. Journal of Economic Theory, 27, 280-312.

Mukerji, S. (1994). A theory of play for games in strategic form when rationality is not common knowledge. mimeo.

Reny, P. J. (1993). Common belief and the theory of games with perfect information. Journal of Economic Theory, 59, 257-74.

Rosenthal, R. W. (1981). Games of perfect information, predatory pricing and the chain store paradox. Journal of Economic Theory, 25, 92-100.

Rothe, J. (1996). Uncertainty aversion and equilibrium. mimeo.

Rothe, J. (2009). Uncertainty aversion and equilibrium. In: Contributions to game theory and management (Petrosjan, L. A. and N. A. Zenkevich, eds), Vol. II, pp. 363-382.

Rothe, J. (2010). Uncertainty aversion and equilibrium in normal form games. In: Contributions to game theory and management (Petrosjan, L. A. and N. A. Zenkevich, eds), Vol. III, pp. 342-367.

Ryan, M. (1997). A refinement of Dempster-Shafer equilibrium. mimeo, University of Auckland.

Sarin, R. and P.P. Wakker (1992). A simple axiomatization of nonadditive expected utility. Econometrica 60, 1255-72.

Savage, L. J. (1954). The Foundations of Statistics. Wiley: New York, NY (2nd edn.: Dover, 1972).

Schmeidler, D. (1986). Integral representation without additivity. Proceedings of the American Mathematical Society, 97, 255-61.

Schmeidler, D. (1989). Subjective probability and expected utility without additivity. Econometrica, 57, 571-587.

Selten, R. (1975). Re-examination of the perfectness concept for equilibrium points in extensive games. International Journal of Game Theory, 4, 25-55.

Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press: Princeton, NJ.

Shapley, L. S. (1971). Cores of convex games. International Journal of Game Theory, 1, 11-26.

Smithson, M. J. (1997). Human judgment and imprecise probabilities. mimeo, Imprecise Probability Project.

Wakker, P.P. (1989). Additive Representations of Preferences. Kluwer Academic Publishers: Dordrecht.
