UDC: 517.577.1 MSC2010: 91A10
A NEW APPROACH TO OPTIMAL SOLUTIONS OF NONCOOPERATIVE GAMES: ACCOUNTING FOR SAVAGE-NIEHANS
RISK
© V. I. Zhukovskiy
Lomonosov Moscow State University Faculty of Computational Mathematics and Cybernetics Department of Optimal Control Leninskiye Gory, GSP-1, Moscow, 119991, Russia e-mail: [email protected]
© L. V. Zhukovskaya
Federal State Budgetary Institution of Science Central Economic and Mathematical Institute of the
Russian Academy of Sciences (CEMI RAS) Nakhimovskii prosp., 47, Moscow, 117418, Russia e-mail: [email protected]
© Y. S. Mukhina
Lomonosov Moscow State University Faculty of Mechanics and Mathematics Department of Higher Algebra Leninskiye Gory, GSP-1, Moscow, 119991, Russia e-mail: [email protected]
Abstract. The novelty of the approach presented below is that each person in a conflict (player) seeks not only to increase his payoff but also to reduce his risk, taking into account a possible realization of any uncertainty from a given admissible set. A new concept, the so-called strongly-guaranteed Nash equilibrium in payoffs and risks, is introduced and its existence in mixed strategies is proved under standard assumptions of the theory of noncooperative games, i.e., compactness and convexity of the sets of players' strategies and continuity of the payoff functions.
Keywords: Savage-Niehans risk, minimax regret, uncertainties, noncooperative game, optimal solution
1. PRINCIPIA UNIVERSALIA
Consider the noncooperative N-player normal-form game under uncertainty
$$\langle \mathbb{N}, \{X_i\}_{i\in\mathbb{N}}, Y, \{f_i(x,y)\}_{i\in\mathbb{N}} \rangle, \qquad (1)$$
where $\mathbb{N} = \{1, 2, \ldots, N\}$ ($N \geq 2$) denotes the set of players; each player $i \in \mathbb{N}$ chooses and uses a pure strategy $x_i \in X_i \subset \mathbb{R}^{n_i}$, which yields a strategy profile $x = (x_1, \ldots, x_N) \in X = \prod_{i\in\mathbb{N}} X_i \subset \mathbb{R}^n$ ($n = \sum_{i\in\mathbb{N}} n_i$); regardless of the players' actions, an uncertainty $y \in Y \subset \mathbb{R}^m$ is realized in game (1); the payoff function $f_i(x,y)$ of player $i$ is defined on the pairs $(x,y) \in X \times Y$, and its value is called the payoff of player $i$.
At a conceptual level, the goal of each player in the standard setup considered before was to choose his strategy so as to achieve as great a payoff as possible.
The middle of the twentieth century was a remarkable period for the theory of noncooperative games. In 1949, J. Nash, a 21-year-old postgraduate at Princeton University, suggested a solution concept and proved its existence [1]; it subsequently became known as the Nash equilibrium: a strategy profile $x^e \in X$ is called a Nash equilibrium in a game $\langle \mathbb{N}, \{X_i\}_{i\in\mathbb{N}}, \{f_i[x]\}_{i\in\mathbb{N}} \rangle$ if
$$\max_{x_i \in X_i} f_i[x^e \| x_i] = f_i[x^e] \quad (i \in \mathbb{N}),$$
where $[x^e \| x_i] = [x_1^e, \ldots, x_{i-1}^e, x_i, x_{i+1}^e, \ldots, x_N^e]$.
This concept (and the approach driven by it) has become invaluable for resolving global (and other) problems in economics, social and military sciences. After 45 years, in 1994, J. Nash together with R. Selten and J. Harsanyi were awarded the Nobel Prize in Economic Sciences "for their pioneering analysis of equilibria in the theory of non-cooperative games."
In 1951, the American mathematician, economist and statistician L. Savage, who worked as a statistics assistant for J. von Neumann during World War II, proposed [2] the principle of minimax regret (the Savage-Niehans risk). In particular, for a single-criterion choice problem under uncertainty $\Gamma = \langle X, Y, \varphi(x,y) \rangle$, the principle of minimax regret can be written as
$$\min_{x\in X}\max_{y\in Y} R(x,y) = \max_{y\in Y} R(x^e,y) = R^e, \qquad (2)$$
where the Savage-Niehans risk function [2] has the form
$$R(x,y) = \max_{z\in X} \varphi(z,y) - \varphi(x,y). \qquad (3)$$
The value $R(x,y)$ is called the Savage-Niehans risk in the single-criterion choice problem $\Gamma$. It describes the risk borne by a decision maker who chooses an alternative $x$: the difference between the desired value of the criterion, $\max_{z\in X} \varphi(z,y)$, and the realized value $\varphi(x,y)$.
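For finite sets $X$ and $Y$, formulas (2) and (3) reduce to simple operations on a payoff matrix. The following minimal sketch illustrates this; the matrix `payoff` and the function names are hypothetical and serve only as an illustration (rows are alternatives $x$, columns are uncertainties $y$).

```python
def regret_matrix(payoff):
    """Savage-Niehans risk (3): R(x, y) = max_z phi(z, y) - phi(x, y)."""
    n_cols = len(payoff[0])
    # Best attainable value of the criterion for every fixed uncertainty y.
    col_max = [max(row[y] for row in payoff) for y in range(n_cols)]
    return [[col_max[y] - row[y] for y in range(n_cols)] for row in payoff]


def minimax_regret(payoff):
    """Principle (2): return (index of x^e, minimax regret R^e)."""
    regret = regret_matrix(payoff)
    worst = [max(row) for row in regret]  # max over y of R(x, y)
    best = min(range(len(worst)), key=worst.__getitem__)
    return best, worst[best]


# Hypothetical data: three alternatives, two possible uncertainties.
payoff = [[5, 1],
          [3, 3],
          [0, 6]]
print(regret_matrix(payoff))
print(minimax_regret(payoff))
```

In this illustrative matrix the middle alternative is never the best one for a fixed $y$, yet it has the smallest worst-case regret, which is exactly the behavior that principle (2) rewards.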
Note that a decision maker seeks to reduce precisely this risk as much as possible by choosing $x \in X$.
In fact, the combination of the concept of Nash equilibrium with the principle of minimax regret is the fundamental idea of this work. Such an approach matches the desire of each player not only to increase his payoff, but also to reduce the risk incurred in pursuing it.
In this context, two questions arise naturally:
1. How can we combine the two objectives of each player (payoff increase with simultaneous risk reduction) using only one criterion?
2. How can we combine these actions (alternatives) in a single strategy profile, in such a way that uncertainty is also accounted for?
2. How Can We Combine the Objectives of Each Player to Increase the Payoff and Simultaneously Reduce the Risk?
Construction of the Savage-Niehans Risk Function
Recall that, in accordance with the principle of minimax regret, the risk of player $i$ is defined by the Savage-Niehans risk function [3-5]
$$R_i(x,y) = \max_{z\in X} f_i(z,y) - f_i(x,y), \qquad (4)$$
where $f_i(x,y)$ denotes the payoff function of player $i$ in game (1). Thus, to construct the risk function $R_i(x,y)$ for player $i$, we first have to find the dependent maximum
$$\max_{x\in X} f_i(x,y) = f_i[y]$$
for all $y \in Y$. To calculate $f_i[y]$, in accordance with the theory of two-level hierarchical games, it is necessary to assume the discrimination of the lower-level player, who forms the uncertainty $y \in Y$ and sends this information to the upper level for constructing counterstrategies $x^{(i)}(y): Y \to X$ such that
$$\max_{x\in X} f_i(x,y) = f_i(x^{(i)}(y), y) = f_i[y] \quad \forall y \in Y.$$
The set of such counterstrategies is denoted by $X^Y$. (Actually, this set consists of $n$-dimensional vector functions $x(y): Y \to X$ with the domain of definition $Y$ and the codomain $X$.) Thus, to construct the first term in (4) at the upper level of the hierarchy, we have to solve $N$ single-criterion problems of the form
$$\langle X^Y, Y, f_i(x,y) \rangle \quad (i \in \mathbb{N})$$
for each uncertainty $y \in Y$; here $X^Y$ is the set of counterstrategies $x(y): Y \to X$ and $Y$ is the set of pure uncertainties $y$. The problem itself consists in determining the scalar functions $f_i[y]$ defined by the formula
$$f_i[y] = \max_{x(\cdot)\in X^Y} f_i(x,y) \quad \forall y \in Y.$$
After that, the Savage-Niehans risk functions are constructed by formula (4).
Continuity of Risk Function, Guaranteed Payoffs and Risks
Hereinafter, the collection of all compact sets of the Euclidean space $\mathbb{R}^k$ is denoted by $\operatorname{comp} \mathbb{R}^k$, and if a scalar function $\psi(x)$ is continuous on the set $X$, we write $\psi(\cdot) \in C(X)$. The main role in this work will be played by the following result.
Proposition 1. If $X \in \operatorname{comp} \mathbb{R}^n$, $Y \in \operatorname{comp} \mathbb{R}^m$, and $f_i(\cdot) \in C(X \times Y)$, then
(a) the maximum function $\max_{x\in X} f_i(x,y)$ is continuous on $Y$;
(b) the minimum function $\min_{y\in Y} f_i(x,y)$ is continuous on $X$.
These assertions can be found in most monographs on game theory, operations research, systems theory, and even in books on convex analysis [6].
Corollary 1. If in game (1) the sets $X \in \operatorname{comp} \mathbb{R}^n$ and $Y \in \operatorname{comp} \mathbb{R}^m$ and the functions $f_i(\cdot) \in C(X \times Y)$, then the Savage-Niehans risk function $R_i(x,y)$ ($i \in \mathbb{N}$) is continuous on $X \times Y$.
Indeed, by Proposition 1 the first term in (4) is continuous on $Y$, and a difference of continuous functions is itself continuous for all $(x,y) \in X \times Y$.
Let us proceed with guaranteed payoffs and risks in game (1). In a series of publications [7, 8], three different ways to account for uncertain factors of decision making in conflicts under uncertainty were proposed. Our analysis below will be confined to one of them, presented in [8]. The method applied here is as follows. Each payoff function $f_i(x,y)$ in game (1) is associated with its strong guarantee $f_i[x] = \min_{y\in Y} f_i(x,y)$ ($i \in \mathbb{N}$). As a consequence, choosing their strategies from a strategy profile $x \in X$, the players ensure a payoff $f_i[x] \leq f_i(x,y)$ $\forall y \in Y$ to each player $i$, i.e., under any realized uncertainty $y \in Y$. Such a strongly-guaranteed payoff $f_i[x]$ seems natural for the interval uncertainties $y \in Y$, because no additional probabilistic characteristics of $y$ (except for information on the admissible set $Y \subset \mathbb{R}^m$) are available. An example of such uncertainties can be the length of women's skirts [18]. For a clothing factory, production planning for the next year heavily affects its future profits; however, in view of the vagaries of fashion, any probabilistic characteristics can hardly be expected to be available. In such problems, it is possible to establish only some obvious limits of length variations.
Proposition 1, in combination with Corollary 1 as well as the continuity of $f_i(x,y)$ and $R_i(x,y)$ on $X \times Y$, leads to the following result.
Proposition 2. If in game (1) the sets $X_i$ ($i \in \mathbb{N}$) and $Y$ are compact and the payoff functions $f_i(x,y)$ are continuous on $X \times Y$, then the guaranteed payoffs
$$f_i[x] = \min_{y\in Y} f_i(x,y) \quad (i \in \mathbb{N}) \qquad (5)$$
and the guaranteed risks
$$R_i[x] = \max_{y\in Y} R_i(x,y) \quad (i \in \mathbb{N}) \qquad (6)$$
are scalar functions that are continuous on $X$.
Remark 1. First, the meaning of the guaranteed payoff $f_i[x]$ from (5) is that, for any $y \in Y$, the realized payoffs $f_i(x,y)$ are not smaller than $f_i[x]$. In other words, using his own strategies from a strategy profile $x \in X$ in game (1), each player ensures a payoff $f_i(x,y)$ of at least $f_i[x]$ under any uncertainty $y \in Y$ ($i \in \mathbb{N}$). Therefore, the guaranteed payoff $f_i[x]$ gives a lower bound for all possible payoffs $f_i(x,y)$ occurring when the uncertainty $y$ runs through all admissible values from $Y$. Second, the guaranteed risk $R_i[x]$ gives an upper bound for all Savage-Niehans risks $R_i(x,y)$ that can be realized under any uncertainty $y \in Y$. Indeed, from (6) it immediately follows that
$$R_i(x,y) \leq R_i[x] \quad \forall y \in Y \ (i \in \mathbb{N}).$$
Thus, adhering to his strategy $x_i$ from a strategy profile $x \in X$, player $i \in \mathbb{N}$ obtains a guarantee in the payoff $f_i[x]$, because $f_i[x] \leq f_i(x,y)$ $\forall y \in Y$, and simultaneously a guarantee in the risk $R_i[x] \geq R_i(x,y)$ $\forall y \in Y$.
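When the strategy sets and the set of uncertainties are finite, the guarantees (5) and (6) can be tabulated directly. The sketch below is only an illustration under that finiteness assumption; the payoff function `f1`, the sets and the function name `guarantees` are hypothetical.

```python
from itertools import product


def guarantees(f, strategy_sets, uncertainties):
    """Tabulate the guaranteed payoff (5) and the guaranteed risk (6).

    f(x, y) is the payoff of one player for a profile x (a tuple) and an
    uncertainty y; all sets are assumed to be finite.
    """
    profiles = list(product(*strategy_sets))
    # First term of (4): the best payoff attainable for each fixed y.
    best = {y: max(f(z, y) for z in profiles) for y in uncertainties}
    payoff_g = {x: min(f(x, y) for y in uncertainties) for x in profiles}
    risk_g = {x: max(best[y] - f(x, y) for y in uncertainties) for x in profiles}
    return payoff_g, risk_g


# Hypothetical payoff of player 1 with X_1 = X_2 = {0, 1} and Y = {0, 1}.
def f1(x, y):
    return 2 * x[0] * y + x[1] - x[0]


payoff_g, risk_g = guarantees(f1, [(0, 1), (0, 1)], (0, 1))
for x in sorted(payoff_g):
    print(x, payoff_g[x], risk_g[x])
```

Both bounds of Remark 1 can be checked on the resulting tables: no realized payoff falls below `payoff_g[x]`, and no realized risk exceeds `risk_g[x]`.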
3. Transition from Game (1) to a Noncooperative Game with Two-Component Payoff Function
The new mathematical model of a noncooperative N-player game under uncertainty with a two-component payoff function of each player in the form
$$G = \langle \mathbb{N}, \{X_i\}_{i\in\mathbb{N}}, Y, \{f_i(x,y), -R_i(x,y)\}_{i\in\mathbb{N}} \rangle$$
matches the desire of each player to increase his payoff and simultaneously reduce his risk. Here, $\mathbb{N}$, $X_i$ and $Y$ are the same as in game (1); the novelty consists in the transition from the one-component function $f_i(x,y)$ of each player $i$ to the two-component counterpart $\{f_i(x,y), -R_i(x,y)\}$, where $R_i(x,y)$ denotes the Savage-Niehans risk function for player $i$. Note that $R_i(x,y)$ enters the game $G$ with the minus sign, so that player $i$ seeks to increase both criteria simultaneously by an appropriate choice of his strategy $x_i \in X_i$. In this model, we expect any uncertainty $y \in Y$ to occur. Since $R_i(x,y) \geq 0$ for all $(x,y) \in X \times Y$, an increase of $-R_i(x,y)$ is equivalent to a reduction of $R_i(x,y)$.
Since the game $G$ involves interval uncertainties $y \in Y$ only (the only available information is the range of their variation), each player $i \in \mathbb{N}$ should focus on the guaranteed payoffs $f_i[x]$ from (5) and the guaranteed risks $R_i[x]$ from (6). This approach allows one to pass from the game $G$ to the game of guarantees
$$G^g = \langle \mathbb{N}, \{X_i\}_{i\in\mathbb{N}}, \{f_i[x], -R_i[x]\}_{i\in\mathbb{N}} \rangle,$$
in which each player $i \in \mathbb{N}$ chooses his strategy $x_i \in X_i$ so as to simultaneously maximize both criteria $f_i[x]$ and $-R_i[x]$. By "freezing" the strategies of all players in $G^g$ except for $x_i$, we arrive at the bi-criteria choice problem
$$G_i^g = \langle X_i, \{f_i[x], -R_i[x]\} \rangle$$
for each player $i$. Recall that, in the bi-criteria choice problem $G_i^g$, the strategies of all players except for player $i$ are considered to be fixed ("frozen"), and player $i$ chooses his strategy $x_i \in X_i$ so that for $x_i = x_i^e$ the maximum possible values of $f_i[x]$ and $-R_i[x]$ are simultaneously realized. At this point it is necessary to answer the first of the two major questions formulated at the end of Section 1.
4. How Can We Combine the Objectives of Each Player (Increase Payoff and Simultaneously Reduce Realized Risk) Using Only One Criterion?
To answer this question, we will apply the concept of vector optimum, the Pareto efficient solution, proposed in 1909 by the Italian economist and sociologist Pareto [10].
In what follows, for the choice problem $G_i^g$, introduce the notations $f_i[x_i] = f_i[x]$ and $R_i[x_i] = R_i[x]$ for the frozen strategies of all players except for the strategy $x_i$ of player $i$. Then the problem $G_i^g = \langle X_i, \{f_i[x], -R_i[x]\} \rangle$ can be transformed into
$$\langle X_i, \{f_i[x_i], -R_i[x_i]\} \rangle. \qquad (7)$$
Proposition 3. If in problem (7) there exist a strategy $x_i^e \in X_i$ and a value $\sigma_i \in (0,1)$ such that $x_i^e$ maximizes the scalar function
$$\Phi_i[x_i] = f_i[x_i] - \sigma_i R_i[x_i], \qquad (8)$$
i.e.,
$$\Phi_i[x_i^e] = \max_{x_i\in X_i} \left( f_i[x_i] - \sigma_i R_i[x_i] \right), \qquad (9)$$
then $x_i^e$ is a Pareto-maximal alternative in (7); in other words, for any $x_i \in X_i$ the system of two inequalities
$$f_i[x_i] \geq f_i[x_i^e], \quad -R_i[x_i] \geq -R_i[x_i^e],$$
with at least one strict inequality, is inconsistent.
Assume on the contrary that the strategy $x_i^e$ yielded by (9) is not a Pareto-maximal alternative in problem (7). Then there exists a strategy $x_i \in X_i$ of player $i$ such that the system of two inequalities
$$f_i[x_i] \geq f_i[x_i^e], \quad -R_i[x_i] \geq -R_i[x_i^e],$$
with at least one strict inequality, is consistent.
Multiply both sides of the second inequality by $\sigma_i > 0$ and add the result to the first inequality. Since at least one of the two inequalities is strict, we obtain
$$f_i[x_i] - \sigma_i R_i[x_i] > f_i[x_i^e] - \sigma_i R_i[x_i^e]$$
or, taking into account (8),
$$\Phi_i[x_i] > \Phi_i[x_i^e].$$
This strict inequality contradicts (9), and the conclusion follows.
Remark 2. The combination of criteria (5) and (6) in form (8) is of interest for two reasons. First, if for some $x_i \neq x_i^e$ the guaranteed payoff increases, $f_i[x_i] > f_i[x_i^e]$, then due to the Pareto maximality of $x_i^e$ and the fact that $R_i[x_i] \geq 0$ such an improvement inevitably leads to an increase of the guaranteed risk, $R_i[x_i] > R_i[x_i^e]$; conversely, for the same reasons, a reduction of the guaranteed risk, $R_i[x_i] < R_i[x_i^e]$, leads to a reduction of the guaranteed payoff, $f_i[x_i] < f_i[x_i^e]$ (both cases are undesirable for player $i$). Therefore, the replacement of the bi-criteria choice problem (7) with the single-criterion choice problem $\langle X_i, f_i[x_i] - \sigma_i R_i[x_i] \rangle$ matches the desire of player $i$ to increase $f_i[x_i]$ and simultaneously reduce $R_i[x_i]$.
Second, since $R_i[x_i] \geq 0$, an increase of the difference $f_i[x_i] - \sigma_i R_i[x_i]$ also matches the desire of player $i$ to increase the guaranteed payoff $f_i[x]$ and simultaneously reduce the guaranteed risk $R_i[x]$.
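The statement of Proposition 3 can be checked numerically on a finite set of alternatives. In the sketch below, each alternative of player $i$ is represented by a hypothetical pair (guaranteed payoff, guaranteed risk); the data, the value of `sigma` and the function names are assumptions made purely for illustration.

```python
def scalarized_choice(pairs, sigma):
    """Index of the alternative maximizing f - sigma * R, cf. (8) and (9)."""
    return max(range(len(pairs)), key=lambda k: pairs[k][0] - sigma * pairs[k][1])


def dominates(a, b):
    """True if pair a = (f, R) is at least as good as b in both criteria
    (payoff not lower, risk not higher) and differs from b in at least one."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b


# Hypothetical (guaranteed payoff, guaranteed risk) pairs of four alternatives.
pairs = [(4.0, 3.0), (3.0, 1.0), (2.0, 1.5), (1.0, 0.0)]
sigma = 0.8

best = scalarized_choice(pairs, sigma)
print(best, pairs[best])
# No alternative Pareto-dominates the maximizer of the scalarized criterion.
print(any(dominates(p, pairs[best]) for p in pairs))
```

Different values of `sigma` in $(0,1)$ may select different Pareto-maximal alternatives; the coefficient expresses how strongly the player weighs risk against payoff.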
5. Formalization of Guaranteed Equilibrium in Payoffs and Risks for Game (1)
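Now, let us answer the second question from Section 1: how can we combine the efforts of all $N$ players in a single strategy profile, taking into account the existing interval uncertainty? To do this, from game (1) we will pass successively to the noncooperative games $\Gamma_1$, $\Gamma_2$ and $\Gamma_3$, where
$$\Gamma_1 = \langle \mathbb{N}, \{X_i\}_{i\in\mathbb{N}}, Y, \{f_i(x,y), -R_i(x,y)\}_{i\in\mathbb{N}} \rangle,$$
$$\Gamma_2 = \langle \mathbb{N}, \{X_i\}_{i\in\mathbb{N}}, \{f_i[x], -R_i[x]\}_{i\in\mathbb{N}} \rangle,$$
$$\Gamma_3 = \langle \mathbb{N}, \{X_i\}_{i\in\mathbb{N}}, \{\Phi_i[x] = f_i[x] - \sigma_i R_i[x]\}_{i\in\mathbb{N}} \rangle.$$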
In all three games, $\mathbb{N} = \{1, 2, \ldots, N\}$ ($N \geq 2$) is the set of players; $x_i \in X_i \subset \mathbb{R}^{n_i}$ ($i \in \mathbb{N}$) denote the strategies of player $i$; $x = (x_1, \ldots, x_N) \in X = \prod_{i\in\mathbb{N}} X_i \subset \mathbb{R}^n$ ($n = \sum_{i\in\mathbb{N}} n_i$) forms a strategy profile; $y \in Y \subset \mathbb{R}^m$ are uncertainties; the payoff function $f_i(x,y)$ of each player $i \in \mathbb{N}$ is defined on the pairs $(x,y) \in X \times Y$; in (4), $R_i(x,y)$ denotes the Savage-Niehans risk function of player $i$; finally, $\sigma_i \in (0,1)$ ($i \in \mathbb{N}$) are some constants. In the game $\Gamma_1$, the payoff function of player $i$ becomes two-component: it consists of the payoff function $f_i(x,y)$ of player $i$ from (1) and the risk function $R_i(x,y)$ from (4) taken with the minus sign.
In the game $\Gamma_2$, the payoff function $f_i(x,y)$ and the risk function $R_i(x,y)$ are replaced with their guarantees $f_i[x] = \min_{y\in Y} f_i(x,y)$ and $R_i[x] = \max_{y\in Y} R_i(x,y)$, respectively. Finally, in the game $\Gamma_3$, the linear combination of the guarantees $f_i[x]$ and $-R_i[x]$ (see Proposition 3) is used instead of the payoff function of player $i$.
6. Internal Instability of the Set of Nash Equilibria
Consider a noncooperative N-player game in pure strategies (a non-zero-sum game of guarantees) of the form
$$\Gamma_3 = \langle \mathbb{N}, \{X_i\}_{i\in\mathbb{N}}, \{\Phi_i[x]\}_{i\in\mathbb{N}} \rangle. \qquad (10)$$
Each player $i$ chooses and uses his pure strategy $x_i \in X_i \subset \mathbb{R}^{n_i}$ without making coalitions with other players, thereby forming a strategy profile $x = (x_1, \ldots, x_N) \in X = \prod_{i\in\mathbb{N}} X_i \subset \mathbb{R}^n$ ($n = \sum_{i\in\mathbb{N}} n_i$); a payoff function $\Phi_i[x]$ is defined for each $i \in \mathbb{N}$ on the set of strategy profiles $X$, and its value is called the payoff of player $i$. Below, we will again use the notations $[x^e \| x_i] = [x_1^e, \ldots, x_{i-1}^e, x_i, x_{i+1}^e, \ldots, x_N^e]$ and $\Phi = (\Phi_1, \ldots, \Phi_N)$.
Definition 1. A strategy profile $x^e = (x_1^e, \ldots, x_N^e) \in X$ is called a Nash equilibrium in game (10) if
$$\max_{x_i\in X_i} \Phi_i[x^e \| x_i] = \Phi_i[x^e] \quad (i \in \mathbb{N}); \qquad (11)$$
denote by $X^e$ the set $\{x^e\}$ of Nash equilibria in game (10).
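Let us analyze the internal instability of $X^e$. A subset $X^* \subset \mathbb{R}^n$ is internally unstable if there exist at least two strategy profiles $x^{(j)} \in X^*$ ($j = 1, 2$) such that
$$\left[ \Phi[x^{(1)}] < \Phi[x^{(2)}] \right] \Longleftrightarrow \left[ \Phi_i[x^{(1)}] < \Phi_i[x^{(2)}] \ (i \in \mathbb{N}) \right], \qquad (12)$$
and internally stable otherwise.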
For example, let us consider the two-player game
$$\langle \{1,2\}, \{X_i = [-1,1]\}_{i=1,2}, \{f_i(x) = -x_i^2 + 2x_1x_2\}_{i=1,2} \rangle. \qquad (13)$$
A strategy profile $x^e = (x_1^e, x_2^e) \in [-1,1]^2$ is a Nash equilibrium in game (13) if (see (11))
$$-x_i^2 + 2x_i x_j^e \leq -(x_i^e)^2 + 2x_1^e x_2^e \quad \forall x_i \in [-1,1] \ (i, j = 1, 2;\ j \neq i),$$
which is equivalent to
$$-(x_1 - x_2^e)^2 \leq -(x_1^e - x_2^e)^2, \quad -(x_1^e - x_2)^2 \leq -(x_1^e - x_2^e)^2.$$
Therefore, $x_1^e = x_2^e = \alpha$ for any $\alpha = \mathrm{const} \in [-1,1]$, i.e., in (13) we have the sets
$$X^e = \{(\alpha, \alpha) \mid \alpha \in [-1,1]\}$$
and $f(X^e) = \bigcup_{x^e\in X^e} f(x^e) = \bigcup_{\alpha\in[-1,1]} (\alpha^2, \alpha^2)$. Thus, the set $X^e$ is internally unstable, since for game (13) with $x^{(1)} = (0,0)$ and $x^{(2)} = (1,1)$ we obtain $f_i(x^{(1)}) = 0 < f_i(x^{(2)}) = 1$ ($i = 1, 2$) (see (12)).
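The structure of the equilibrium set in this example can be confirmed by brute force on a grid. The following sketch is an illustration only: it restricts the strategy sets $[-1,1]$ to a finite grid (an assumption not made in the text), and the function names are hypothetical.

```python
def f(i, x1, x2):
    """Payoff of player i (0 or 1) in game (13): -x_i^2 + 2 * x_1 * x_2."""
    own = x1 if i == 0 else x2
    return -own ** 2 + 2 * x1 * x2


def nash_equilibria(points, tol=1e-12):
    """All grid profiles from which no unilateral grid deviation is profitable."""
    found = []
    for x1 in points:
        for x2 in points:
            ok1 = all(f(0, d, x2) <= f(0, x1, x2) + tol for d in points)
            ok2 = all(f(1, x1, d) <= f(1, x1, x2) + tol for d in points)
            if ok1 and ok2:
                found.append((x1, x2))
    return found


points = [k / 4 for k in range(-4, 5)]  # grid on [-1, 1] with step 0.25
equilibria = nash_equilibria(points)
print(equilibria)
# Payoff vectors at two equilibria, illustrating internal instability.
print([f(i, 0.0, 0.0) for i in (0, 1)], [f(i, 1.0, 1.0) for i in (0, 1)])
```

On the grid, the equilibria are exactly the diagonal profiles $(\alpha, \alpha)$, and both players are strictly better off at $(1,1)$ than at $(0,0)$.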
Remark 3. In the zero-sum setup of game (10) (i.e., with $\mathbb{N} = \{1, 2\}$ and $f_1 = -f_2 = f$), the equality $f(x^{(1)}) = f(x^{(2)})$ holds for any two saddle points $x^{(k)} \in X$ ($k = 1, 2$) (by the equivalence of saddle points). Therefore, the set of saddle points in the zero-sum game is always internally stable. Note that a saddle point is a Nash equilibrium in the zero-sum setup of game (10).
Remark 4. In the non-zero-sum setup of game (10), internal instability (see the previous example) does not occur if there is a unique Nash equilibrium in (10).
Let us associate with game (10) an auxiliary N-criteria choice problem of the form
$$\Gamma_c = \langle X^e, \{\Phi_i[x]\}_{i\in\mathbb{N}} \rangle, \qquad (14)$$
where the set $X^e$ of alternatives $x$ coincides with the set of Nash equilibria $x^e$ of game (10) and the $i$th criterion $\Phi_i[x]$ is the payoff function (8) of player $i$.
Definition 2. An alternative $x^P \in X^e$ is a Pareto-maximal (weakly efficient) alternative in (14) if for all $x \in X^e$ the system of inequalities
$$\Phi_i[x] \geq \Phi_i[x^P] \quad (i \in \mathbb{N}),$$
with at least one strict inequality, is inconsistent. Denote by $X^P$ the set $\{x^P\}$ of all such strategy profiles.
In accordance with Definition 2, the set $X^P \subseteq X^e$ is internally stable.
The following assertion is obvious. If
$$\sum_{i\in\mathbb{N}} f_i(x) \leq \sum_{i\in\mathbb{N}} f_i(x^P) \qquad (15)$$
for all $x \in X^e$, then $x^P$ is a Pareto-maximal alternative in problem (14).
Remark 5. A branch of mathematical programming focused on numerical methods for constructing Nash equilibria in games (10) has recently become known as equilibrium programming. At Moscow State University, research efforts in this field are being undertaken by the groups of Professors F. P. Vasiliev and A. S. Antipin at the Faculty of Computational Mathematics and Cybernetics. However, the equilibrium calculation methods developed so far yield a Nash equilibrium that is not necessarily Pareto-maximal (in other words, the methods themselves do not guarantee Pareto maximality). At the same time, such a guarantee does appear if the equilibrium is constructed using the sufficient conditions of Theorem 1.
Recall that the set of Nash equilibria $x^e$ of game (10) (Definition 1) is denoted by $X^e$, while the set of Pareto-maximal alternatives $x^P$ of problem (14) (Definition 2) is denoted by $X^P$.
Definition 3. A strategy profile $x^* \in X$ is called a Pareto-Nash equilibrium in game (10) if $x^*$ is simultaneously
(a) a Nash equilibrium in (10) (Definition 1) and
(b) a Pareto-maximal alternative in (14) (Definition 2).
Remark 6. The existence of $x^*$ in game (10) with $X^e \neq \varnothing$, compact sets $X_i$ and continuous payoff functions $\Phi_i[x]$ ($i \in \mathbb{N}$) follows directly from the fact that $X^e \in \operatorname{comp} X$.
8. Sufficient Conditions of Pareto-Nash Equilibrium in Game (10)
Relying on (11) and (15), introduce $N + 1$ scalar functions of the form
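$$\varphi_i(x,z) = f_i(z \| x_i) - f_i(z) \quad (i \in \mathbb{N}),$$
$$\varphi_{N+1}(x,z) = \sum_{r\in\mathbb{N}} f_r(x) - \sum_{r\in\mathbb{N}} f_r(z), \qquad (16)$$
where $z = (z_1, \ldots, z_N)$, $z_i \in X_i$ ($i \in \mathbb{N}$), $z \in X$, and $x \in X$. The Germeier convolution [11, p. 66] of the scalar functions (16) is given by
$$\varphi(x,z) = \max_{j=1,\ldots,N+1} \varphi_j(x,z). \qquad (17)$$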
Finally, let us associate with game (10) and the N-criteria choice problem (14) the zero-sum game
$$\langle X, Z = X, \varphi(x,z) \rangle, \qquad (18)$$
in which the first player chooses his strategy $x \in X$ to increase the payoff function $\varphi(x,z)$ from (16) and (17), while the opponent (the second player) forms his strategy $z \in X$, seeking to decrease this payoff function as much as possible.
A saddle point $(x^0, z^*) \in X^2$ in game (18) is defined by the chain of inequalities
$$\varphi(x, z^*) \leq \varphi(x^0, z^*) \leq \varphi(x^0, z) \quad \forall x, z \in X. \qquad (19)$$
In this case, the saddle point is formed by the minimax strategy $z^*$,
$$\min_{z\in X} \max_{x\in X} \varphi(x,z) = \max_{x\in X} \varphi(x, z^*),$$
in combination with the maximin strategy $x^0$,
$$\max_{x\in X} \min_{z\in X} \varphi(x,z) = \min_{z\in X} \varphi(x^0, z),$$
in game (18).
The next result provides a sufficient condition for the existence of a Pareto-Nash equilibrium in game (10).
Theorem 1. If there exists a saddle point $(x^0, z^*)$ in the zero-sum game (18) (i.e., inequalities (19) hold), then the minimax strategy $z^*$ is a Pareto-Nash equilibrium in game (10).
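Proof. Let $z = x^0$ in the right-hand inequality of (19). Then, using (16) and (17), we obtain
$$\varphi(x^0, x^0) = \max_{j=1,\ldots,N+1} \varphi_j(x^0, x^0) = 0.$$
In accordance with (19), it appears that $\varphi(x, z^*) \leq \varphi(x^0, z^*) \leq \varphi(x^0, x^0) = 0$, i.e.,
$$0 \geq \varphi(x, z^*) = \max_{j=1,\ldots,N+1} \varphi_j(x, z^*) \quad \forall x \in X.$$
Therefore, the following chain of implications is valid for all $x \in X$:
$$\left[ 0 \geq \max_{j=1,\ldots,N+1} \varphi_j(x, z^*) \geq \varphi_j(x, z^*) \right] \overset{(16)}{\Longrightarrow} \left[ \varphi_j(x, z^*) \leq 0 \ (j = 1, \ldots, N+1) \right]$$
$$\Longrightarrow \left\{ \left[ f_i(z^* \| x_i) - f_i(z^*) \leq 0 \ \forall x_i \in X_i \ (i \in \mathbb{N}) \right] \wedge \left[ \sum_{i\in\mathbb{N}} f_i(x) - \sum_{i\in\mathbb{N}} f_i(z^*) \leq 0 \ \forall x \in X^e \right] \right\}$$
$$\Longrightarrow \left\{ \left[ \max_{x_i\in X_i} f_i(z^* \| x_i) = f_i(z^*) \ (i \in \mathbb{N}) \right] \wedge \left[ \max_{x\in X^e} \sum_{i\in\mathbb{N}} f_i(x) = \sum_{i\in\mathbb{N}} f_i(z^*) \right] \right\}$$
$$\overset{(11),(15)}{\Longrightarrow} \left\{ [z^* \in X^e] \wedge [z^* \in X^P] \right\},$$
due to the inclusion $X^e \subset X$.
□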
Remark 7. Theorem 1 suggests the following design method for a Pareto-Nash equilibrium $x^*$ in game (10).
Step 1. Using the payoff functions $\Phi_i[x]$ ($i \in \mathbb{N}$) from (10) and also the vectors $z = (z_1, \ldots, z_N)$, $z_i \in X_i$, and $x = (x_1, \ldots, x_N)$, $x_i \in X_i$ ($i \in \mathbb{N}$), construct the function $\varphi(x,z)$ by formulas (16) and (17).
Step 2. Find the saddle point $(x^0, z^*)$ of the zero-sum game (18). Then $z^*$ is the Pareto-Nash equilibrium solution of game (10).
As far as we know, numerical calculation methods for the saddle point $(x^0, z^*)$ of the Germeier convolution
$$\varphi(x,z) = \max_{j=1,\ldots,N+1} \varphi_j(x,z)$$
are lacking; however, they are crucial (see Theorem 1) for constructing Nash equilibria that are simultaneously Pareto-maximal strategy profiles. Seemingly, this is a new field of equilibrium programming, and in our opinion it can be developed using the mathematical tools for optimizing the maximum function $\max_j \varphi_j(x)$ that were introduced by Professor V. F. Demyanov.
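Steps 1 and 2 can be illustrated on the example game (13) by exhaustive search over a coarse grid. This is not a numerical method in the sense discussed above, only a brute-force sketch under the assumption that the strategy sets are replaced by a finite grid; the function names are hypothetical.

```python
from itertools import product


def f(i, x):
    """Payoff of player i (0 or 1) in game (13): -x_i^2 + 2 * x_1 * x_2."""
    return -x[i] ** 2 + 2 * x[0] * x[1]


def phi(x, z):
    """Germeier convolution (17) of the N + 1 = 3 functions (16)."""
    dev1 = f(0, (x[0], z[1])) - f(0, z)  # player 1 deviates from z to x_1
    dev2 = f(1, (z[0], x[1])) - f(1, z)  # player 2 deviates from z to x_2
    total = (f(0, x) + f(1, x)) - (f(0, z) + f(1, z))
    return max(dev1, dev2, total)


def minimax_strategies(points, tol=1e-9):
    """Minimax strategies z* of game (18) on the grid, or [] if no saddle point."""
    profiles = list(product(points, repeat=2))
    upper = {z: max(phi(x, z) for x in profiles) for z in profiles}
    lower = {x: min(phi(x, z) for z in profiles) for x in profiles}
    v_upper = min(upper.values())  # min over z of max over x
    v_lower = max(lower.values())  # max over x of min over z
    if abs(v_upper - v_lower) > tol:
        return []  # minimax and maximin differ: no saddle point on the grid
    return sorted(z for z in profiles if upper[z] <= v_upper + tol)


points = [k / 5 for k in range(-5, 6)]  # grid on [-1, 1] with step 0.2
z_star = minimax_strategies(points)
print(z_star)
```

For this example the search returns the two diagonal corner profiles, which are the Nash equilibria of (13) with the largest payoffs for both players. The cost grows as the square of the number of grid profiles, so the sketch is practical only for very small problems.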
Remark 8. The next statement follows from results of operations research (see Proposition 1) and is a basic recipe for proving the existence of a Pareto-Nash equilibrium in mixed strategies in game (10). Namely, in game (10) with the sets $X_i \in \operatorname{comp} \mathbb{R}^{n_i}$ and the payoff functions $\Phi_i[\cdot] \in C(X)$ ($i \in \mathbb{N}$), the Germeier convolution $\varphi(x,z) = \max_{j=1,\ldots,N+1} \varphi_j(x,z)$ from (16) and (17) is continuous on $X \times X$.
9. Formalization of Strongly-Guaranteed Equilibrium in Payoffs and Risks
Let us consider the concept of guaranteed equilibrium in game (1) from the viewpoint of a risk-neutral player. Assume each player $i$ exhibits risk-neutral behavior, i.e., chooses his strategy to increase the payoff (the value of the payoff function $f_i(x,y)$) and simultaneously reduce the risk (the value of the Savage-Niehans risk function $R_i(x,y) = \max_{z\in X} f_i(z,y) - f_i(x,y)$) under any realization of the uncertainty $y \in Y$. Hereinafter, we use three N-dimensional vectors $f = (f_1, \ldots, f_N)$, $R = (R_1, \ldots, R_N)$, and $\Phi = (\Phi_1, \ldots, \Phi_N)$ as well as $N$ values $\sigma_i \in (0,1)$ ($i \in \mathbb{N}$).
Definition 4. A triplet $(x^P, f^P, R^P)$ is called a strongly-guaranteed Nash equilibrium in payoffs and risks in game (1) if
first, $f^P = f[x^P]$ and $R^P = R[x^P]$;
second, there exist scalar functions $f_i[x] = \min_{y\in Y} f_i(x,y)$ and $R_i[x] = \max_{y\in Y} R_i(x,y)$, $R_i(x,y) = \max_{z\in X} f_i(z,y) - f_i(x,y)$ ($i \in \mathbb{N}$), that are continuous on $X$;
third, the set $X^e$ of all Nash equilibria $x^e$ in the game of guarantees
$$\Gamma_3 = \langle \mathbb{N}, \{X_i\}_{i\in\mathbb{N}}, \{\Phi_i[x] = f_i[x] - \sigma_i R_i[x]\}_{i\in\mathbb{N}} \rangle$$
is non-empty for at least one value $\sigma_i \in (0,1)$, i.e.,
$$\max_{x_i\in X_i} \Phi_i[x^e \| x_i] = \Phi_i[x^e] \quad (i \in \mathbb{N}),$$
where $X^e = \{x^e\}$ and $[x^e \| x_i] = [x_1^e, \ldots, x_{i-1}^e, x_i, x_{i+1}^e, \ldots, x_N^e]$;
fourth, $x^P$ is a Pareto-maximal alternative in the N-criteria choice problem of guarantees
$$\langle X^e, \{\Phi_i[x]\}_{i\in\mathbb{N}} \rangle,$$
i.e., the system of inequalities
$$\Phi_i[x] \geq \Phi_i[x^P] \quad (i \in \mathbb{N}) \quad \forall x \in X^e,$$
with at least one strict inequality, is inconsistent.
Remark 9. Let us list a number of advantages of this equilibrium solution. First, as repeatedly mentioned, economists often divide decision makers (in our game (1), players) into three groups. The first group includes those who do not like to take risks (risk-averse players); the second group, risk-seeking players; and the third group, those who consider the payoffs and risks simultaneously (risk-neutral players). Definition 4 treats all players as risk-neutral ones, though it would be interesting to analyze the players from different groups (risk-averse, risk-seeking and risk-neutral players). We hope to address these issues in future work.
Second, lower and upper bounds on the payoffs and risks are provided by the inequalities $f_i[x^P] \leq f_i(x^P, y)$ $\forall y \in Y$ and $R_i[x^P] \geq R_i(x^P, y)$ $\forall y \in Y$, respectively; note that the continuity of the guarantees $f_i[x]$ and $R_i[x]$ follows directly from the inclusions $X_i \in \operatorname{comp} \mathbb{R}^{n_i}$ ($i \in \mathbb{N}$), $Y \in \operatorname{comp} \mathbb{R}^m$, and $f_i(\cdot) \in C(X \times Y)$ (see Proposition 2).
Third, an increase of the guaranteed payoff for a separate player (as compared to $f_i[x^P]$) would inevitably cause an increase of the guaranteed risk (again, as compared to $R_i[x^P]$), whereas a reduction of this risk would inevitably cause a reduction of the guaranteed payoff.
Fourth, it is impossible to increase the difference $\Phi_i[x^P]$ for all players simultaneously (this property follows from the Pareto maximality of the strategy profile $x^P$).
Fifth, the best solution has been selected from all guaranteed solutions, as the difference $\Phi_i[x^P]$ takes the largest value (in the sense of vector maximum).
Sixth, under the assumption that the sets $X_i$ ($i \in \mathbb{N}$) and $Y$ are compact and the payoff functions $f_i(x,y)$ are continuous on $X \times Y$, the guarantees $f_i[x]$ and $R_i[x]$ exist and are continuous on $X$ (Proposition 2). Therefore, the existence of solutions formalized by Definition 4 rests on the existence of Nash equilibria in the game of guarantees. Note that the framework developed in this section can also be applied to the concepts of Berge equilibrium, threats and counterthreats, and active equilibrium.
Below we will present a new method of proving the existence of a strongly-guaranteed Nash equilibrium in payoffs and risks. In particular, using the Germeier convolution of the payoff functions, we have already established sufficient conditions for the existence of Nash equilibria in pure strategies that are simultaneously Pareto-maximal with respect to all other equilibria (see Theorem 1).
Concluding this section, we will show the existence of such an equilibrium in mixed strategies under standard assumptions of noncooperative games (compact strategy sets and continuous payoff functions of all players).
10. Existence of Pareto Equilibrium in Mixed Strategies
The hope that game (10) has a Pareto-Nash equilibrium in pure strategies (Definition 3) is generally unfounded. Such an equilibrium may exist only for a special form of the payoff functions, a special structure of the strategy sets, and a special number of players. Therefore, adhering to an approach that stems from Borel [12], von Neumann [13], Nash [1] and their followers, we will establish the existence of a Pareto-Nash equilibrium in mixed strategies in game (10) under standard assumptions of noncooperative games (compact strategy sets and continuous payoff functions).
Thus, suppose that in game (10) the sets $X_i$ of pure strategies $x_i$ are convex and compact in $\mathbb{R}^{n_i}$ (i.e., convex, closed and bounded; denote this by $X_i \in \operatorname{cocomp} \mathbb{R}^{n_i}$), while the payoff function $f_i[x]$ of each player $i$ ($i \in \mathbb{N}$) is continuous on the set of all pure strategy profiles $X = \prod_{i\in\mathbb{N}} X_i$.
Consider the mixed extension of game (10). For each of the $N$ compact sets $X_i$ ($i \in \mathbb{N}$), construct the Borel $\sigma$-algebra $\mathcal{B}(X_i)$ and probability measures $\nu_i(\cdot)$ on $\mathcal{B}(X_i)$ (i.e., nonnegative countably-additive scalar functions defined on the elements of $\mathcal{B}(X_i)$ that are normalized to 1 on $X_i$). Denote by $\{\nu_i\}$ the set of such measures; a measure $\nu_i(\cdot)$ is called a mixed strategy of player $i$ ($i \in \mathbb{N}$) in game (10). Then, for the same game (10), construct mixed strategy profiles, i.e., the product measures $\nu(dx) = \nu_1(dx_1) \cdots \nu_N(dx_N)$. Denote by $\{\nu\}$ the set of such strategy profiles. Finally, calculate the expected values
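$$f_i(\nu) = \int_X f_i(x)\,\nu(dx) \quad (i \in \mathbb{N}). \qquad (20)$$
As a result, we associate with the game $\Gamma_3$ (10) its mixed extension
$$\tilde{\Gamma}_3 = \langle \mathbb{N}, \{\nu_i\}_{i\in\mathbb{N}}, \{f_i(\nu)\}_{i\in\mathbb{N}} \rangle.$$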
In the noncooperative game $\tilde{\Gamma}_3$,
$\nu_i(\cdot) \in \{\nu_i\}$ is a mixed strategy of player $i$;
$\nu(\cdot) \in \{\nu\}$ is a mixed strategy profile;
$f_i(\nu)$ is the payoff function of player $i$, defined by (20).
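For mixed strategies with finite support, the integral (20) reduces to a finite sum over all combinations of pure strategies weighted by the product measure. The sketch below illustrates this special case only; the representation of a mixed strategy as a dictionary and the chosen data are assumptions made for illustration.

```python
from itertools import product


def expected_payoff(f, mixed_profile):
    """Expected value (20) for finitely supported mixed strategies.

    mixed_profile is a list with one dict {pure strategy: probability}
    per player; the profile measure is the product of these measures.
    """
    total = 0.0
    for combo in product(*(nu.items() for nu in mixed_profile)):
        x = tuple(strategy for strategy, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        total += prob * f(x)
    return total


def f1(x):
    """Payoff of player 1 in the example game (13)."""
    return -x[0] ** 2 + 2 * x[0] * x[1]


# Player 1 mixes equally between -1 and 1; player 2 plays 1 for certain.
nu = [{-1.0: 0.5, 1.0: 0.5}, {1.0: 1.0}]
value = expected_payoff(f1, nu)
print(value)
```

A pure strategy is recovered as the degenerate measure that puts probability 1 on a single point, so the same function also evaluates pure strategy profiles.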
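In what follows, we will use the vectors $z = (z_1, \ldots, z_N) \in X$, where $z_i \in X_i$ ($i \in \mathbb{N}$), and $x = (x_1, \ldots, x_N) \in X$, as well as the mixed strategy profiles $\nu(\cdot), \mu(\cdot) \in \{\nu\}$ and the expected values
$$\Phi_i(\nu) = \int_X \Phi_i(x)\,\nu(dx), \quad \Phi_i(\mu) = \int_X \Phi_i(z)\,\mu(dz),$$
$$\Phi_i(\mu \| \nu_i) = \int_{X_1} \cdots \int_{X_{i-1}} \int_{X_i} \int_{X_{i+1}} \cdots \int_{X_N} \Phi_i(z_1, \ldots, z_{i-1}, x_i, z_{i+1}, \ldots, z_N)\, \mu_N(dz_N) \cdots \mu_{i+1}(dz_{i+1})\, \nu_i(dx_i)\, \mu_{i-1}(dz_{i-1}) \cdots \mu_1(dz_1). \qquad (21)$$
Once again, note that $x_i, z_i \in X_i$ ($i \in \mathbb{N}$) and $x, z \in X$.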
The following concept of Nash equilibrium in mixed strategies $\nu^e(\cdot) \in \{\nu\}$ in game (10) corresponds to Definition 1 of a Nash equilibrium in pure strategies $x^e \in X$ in the same game (10).
Definition 5. A strategy profile $\nu^e(\cdot) \in \{\nu\}$ is called a Nash equilibrium in the game $\tilde{\Gamma}_3$ if
$$\Phi_i[\nu^e \| \nu_i] \leq \Phi_i[\nu^e] \quad \forall \nu_i(\cdot) \in \{\nu_i\} \ (i \in \mathbb{N}); \qquad (22)$$
sometimes, the same strategy profile $\nu^e(\cdot) \in \{\nu\}$ will also be called a Nash equilibrium in mixed strategies in game (10).
By Glicksberg's theorem, under the conditions $X_i \in \operatorname{cocomp} \mathbb{R}^{n_i}$ and $f_i(\cdot) \in C(X)$ ($i \in \mathbb{N}$), there exists a Nash equilibrium in mixed strategies in game (10). Denote by $\mathcal{Y}$ the set of such mixed strategy profiles $\{\nu^e\}$. With the game $\tilde{\Gamma}_3$ we associate the N-criteria choice problem
$$\tilde{\Gamma}_\nu = \langle \mathcal{Y}, \{\Phi_i[\nu]\}_{i\in\mathbb{N}} \rangle. \qquad (23)$$
In (23), the decision maker (DM) chooses a strategy profile $\nu(\cdot) \in \mathcal{Y}$ in order to simultaneously maximize all elements of the vector criterion $\Phi(\nu) = (\Phi_1(\nu), \ldots, \Phi_N(\nu))$. Here a generally accepted solution is a Pareto-maximal alternative.
Definition 6. A strategy profile $\nu^P(\cdot) \in \mathcal{Y}$ is called a Pareto-maximal alternative for the N-criteria choice problem (23) if for any $\nu(\cdot) \in \mathcal{Y}$ the system of inequalities
$$\Phi_i[\nu] \geq \Phi_i[\nu^P] \quad (i \in \mathbb{N}),$$
with at least one strict inequality, is inconsistent.
An analog of (15) states the following. If
$$\sum_{i\in\mathbb{N}} f_i(\nu) \leq \sum_{i\in\mathbb{N}} f_i(\nu^P) \qquad (24)$$
for all $\nu(\cdot) \in \mathcal{Y}$, then the mixed strategy profile $\nu^P(\cdot) \in \mathcal{Y}$ is a Pareto-maximal alternative in the choice problem (23).
Definition 7. A mixed strategy profile $\nu^*(\cdot) \in \{\nu\}$ is called a Pareto-Nash equilibrium in mixed strategies in game (10) if
(a) $\nu^*(\cdot)$ is a Nash equilibrium in the game $\tilde{\Gamma}_3$ (Definition 5);
(b) $\nu^*(\cdot)$ is a Pareto-maximal alternative in the multicriteria choice problem (23) (Definition 6).
Now, we will prove the existence of a Nash equilibrium in mixed strategies that is simultaneously Pareto-maximal with respect to all other Nash equilibria.
Proposition 4. Consider the noncooperative game (10), assuming that
1. the set of pure strategies $X_i$ of each player $i$ is a nonempty, convex, and compact set in $\mathbb{R}^{n_i}$ ($i \in \mathbb{N}$);
2. the payoff function $\Phi_i[x]$ of player $i$ ($i \in \mathbb{N}$) is continuous on the set of all strategy profiles $X = \prod_{i\in\mathbb{N}} X_i$.
Then there exists a Pareto-Nash equilibrium in mixed strategies in game (10).
Proof. Using formulas (16) and (17), construct the scalar function
$$\varphi(x, z) = \max_{j = 1, \ldots, N+1} \psi_j(x, z),$$
where, as before,
$$\psi_i(x, z) = f_i(z \| x_i) - f_i(z) \quad (i \in N),$$
$$\psi_{N+1}(x, z) = \sum_{r \in N} f_r(x) - \sum_{r \in N} f_r(z),$$
$z = (z_1, \ldots, z_N) \in X$, $z_i \in X_i$ $(i \in N)$, and $x = (x_1, \ldots, x_N) \in X$, $x_i \in X_i$ $(i \in N)$. By the construction procedure and Remark 9, the function $\varphi(x, z)$ is well-defined and continuous on the product of compact sets $X \times X$. Introduce the auxiliary zero-sum game
$$\Gamma^a = \langle \{I, II\}, X, Z = X, \varphi(x, z) \rangle.$$
In this game, player I chooses his strategy $x \in X$ to maximize the continuous payoff function $\varphi(x, z)$ on $X \times Z$ $(Z = X)$, while player II seeks to minimize it by choosing an appropriate strategy $z \in X$.
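As a finite, pure-strategy illustration of this construction (the proof itself works with mixed strategies on compact sets), the sketch below builds the Germeier convolution for a two-player game given by a table and lists the strategies $z$ of player II for which $\max_x \varphi(x, z) \le 0$, which is the inequality the saddle point is used to obtain. The payoff table `table`, the function names and the tolerance `tol` are assumptions made for the example, not objects from the paper.

```python
from itertools import product


def germeier(payoffs, x, z):
    """phi(x, z) = max_j psi_j(x, z) with the N + 1 functions of the Nash case."""
    terms = []
    for i, phi_i in enumerate(payoffs):
        deviated = z[:i] + (x[i],) + z[i + 1:]   # profile z with z_i replaced by x_i
        terms.append(phi_i(deviated) - phi_i(z))
    terms.append(sum(p(x) for p in payoffs) - sum(p(z) for p in payoffs))
    return max(terms)


def pareto_nash_profiles(payoffs, strategy_sets, tol=1e-12):
    """All pure profiles z such that max over x of phi(x, z) is nonpositive."""
    profiles = list(product(*strategy_sets))
    return [z for z in profiles
            if max(germeier(payoffs, x, z) for x in profiles) <= tol]


# Hypothetical coordination game: both players get 2 at (1, 1), 1 at (0, 0), else 0.
table = {(0, 0): (1, 1), (0, 1): (0, 0), (1, 0): (0, 0), (1, 1): (2, 2)}
payoffs = [lambda p: table[p][0], lambda p: table[p][1]]
result = pareto_nash_profiles(payoffs, [(0, 1), (0, 1)])
print(result)
```

Both $(0,0)$ and $(1,1)$ are Nash equilibria of this table, but only $(1,1)$ survives the last component $\psi_{N+1}$ of the convolution.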
Next, we can apply a special case of Glicksberg's theorem to the game $\Gamma^a$, since a saddle point in the game $\Gamma^a$ coincides with a Nash equilibrium in the noncooperative two-player game
$$\Gamma_2 = \langle \{I, II\}, \{X, Z = X\}, \{f_I(x, z) = \varphi(x, z),\ f_{II}(x, z) = -\varphi(x, z)\} \rangle.$$
In this game, player I chooses his strategy $x \in X$ to maximize $f_I(x, z) = \varphi(x, z)$, while player II seeks to maximize $f_{II}(x, z) = -\varphi(x, z)$. In the game $\Gamma_2$, the sets $X$ and $Z = X$ are compact, while the payoff functions $f_I(x, z)$ and $f_{II}(x, z)$ are continuous on $X \times Z$. Therefore, by the aforementioned Glicksberg theorem, there exists a Nash equilibrium $(\nu^e, \mu^*)$ in the mixed extension of the game $\Gamma_2$, i.e.,
$$\tilde{\Gamma}_2 = \left\langle \{I, II\}, \{\nu\}, \{\mu\}, \left\{ f_j(\nu, \mu) = \int_X \int_X f_j(x, z)\, \nu(dx)\, \mu(dz) \right\}_{j = I, II} \right\rangle.$$
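Moreover, $(\nu^e, \mu^*)$ obviously represents a saddle point in the mixed extension of the game $\Gamma^a$,
$$\tilde{\Gamma}^a = \left\langle \{I, II\}, \{\nu\}, \{\mu\}, \varphi(\nu, \mu) = \int_X \int_X \varphi(x, z)\, \nu(dx)\, \mu(dz) \right\rangle.$$
Consequently, by Glicksberg's theorem, there exists a pair $(\nu^e, \mu^*)$ representing a saddle point of $\varphi(\nu, \mu)$, i.e.,
$$\varphi(\nu, \mu^*) \le \varphi(\nu^e, \mu^*) \le \varphi(\nu^e, \mu) \quad \forall \nu(\cdot), \mu(\cdot) \in \{\nu\}. \tag{25}$$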
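Setting $\mu = \nu^e$ in the right-hand inequality in (25), we obtain $\varphi(\nu^e, \mu^*) \le \varphi(\nu^e, \nu^e) = 0$, and hence for all $\nu(\cdot) \in \{\nu\}$ inequalities (25) yield
$$0 \ge \varphi(\nu, \mu^*) = \int_X \int_X \max_{j = 1, \ldots, N+1} \psi_j(x, z)\, \nu(dx)\, \mu^*(dz). \tag{26}$$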
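As established in [14],
$$\max_{j = 1, \ldots, N+1} \int_X \int_X \psi_j(x, z)\, \nu(dx)\, \mu(dz) \le \int_X \int_X \max_{j = 1, \ldots, N+1} \psi_j(x, z)\, \nu(dx)\, \mu(dz). \tag{27}$$
(This is an analog of the property that the maximum of a sum is not greater than the sum of the corresponding maxima.) From (26) and (27) it follows that
$$\max_{j = 1, \ldots, N+1} \int_X \int_X \psi_j(x, z)\, \nu(dx)\, \mu^*(dz) \le 0 \quad \forall \nu(\cdot) \in \{\nu\},$$
but then for each $j = 1, \ldots, N+1$ we surely have
$$\int_X \int_X \psi_j(x, z)\, \nu(dx)\, \mu^*(dz) \le 0 \quad \forall \nu(\cdot) \in \{\nu\}. \tag{28}$$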
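Inequality (27) is easy to check numerically when the product measure is replaced by a discrete distribution with finitely many atoms. The following sketch is an illustration only: the values in `psi_values` and the probabilities in `weights` are made-up numbers, and the two helper functions are hypothetical names, not notation from [14].

```python
def max_of_expectations(psi_values, weights):
    """Left-hand side of (27): max over j of the weighted sum of psi_j."""
    return max(sum(v * w for v, w in zip(row, weights)) for row in psi_values)


def expectation_of_max(psi_values, weights):
    """Right-hand side of (27): weighted sum of the pointwise maximum over j."""
    columns = zip(*psi_values)  # one column per atom of the measure
    return sum(max(col) * w for col, w in zip(columns, weights))


# psi_values[j][k] is the value of psi_j at atom k; weights[k] is its probability.
psi_values = [[1.0, -2.0, 0.5],
              [-1.0, 3.0, 0.0]]
weights = [0.5, 0.25, 0.25]

lhs = max_of_expectations(psi_values, weights)
rhs = expectation_of_max(psi_values, weights)
print(lhs, rhs, lhs <= rhs)
```

Here the left-hand side equals 0.25 and the right-hand side equals 1.375, so the inequality can be strict.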
Recall the normalization conditions of the mixed strategies and mixed strategy profiles, namely,
$$\int_{X_i} \nu_i(dx_i) = 1, \quad \int_{X_i} \mu_i(dz_i) = 1 \ (i \in N), \quad \int_X \nu(dx) = 1, \quad \int_X \mu(dz) = 1, \tag{29}$$
which hold $\forall \nu_i(\cdot) \in \{\nu_i\}$ and $\forall \mu_i(\cdot) \in \{\mu_i\}$ as well as $\forall \nu(\cdot) \in \{\nu\}$ and $\forall \mu(\cdot) \in \{\mu\}$. Taking these conditions into account, we will distinguish two cases, $j \in N$ and $j = N + 1$, and further specify inequalities (28) for each case.
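Case 1: $j \in N$. Using (16) and (29), for each $i \in N$ inequality (28) is reduced to
$$\begin{aligned}
&\int_X \int_X \left[ f_i(z \| x_i) - f_i(z) \right] \nu(dx)\, \mu^*(dz) = \int_X \int_{X_i} \left[ f_i(z \| x_i) - f_i(z) \right] \nu_i(dx_i)\, \mu^*(dz) \\
&\quad = \int_X \int_{X_i} f_i(z \| x_i)\, \nu_i(dx_i)\, \mu^*(dz) - \int_X f_i(z)\, \mu^*(dz) \int_{X_i} \nu_i(dx_i) \\
&\quad \overset{(29)}{=} \int_{X_1} \cdots \int_{X_{i-1}} \int_{X_i} \int_{X_{i+1}} \cdots \int_{X_N} f_i(z_1, \ldots, z_{i-1}, x_i, z_{i+1}, \ldots, z_N)\, \mu_N^*(dz_N) \cdots \mu_{i+1}^*(dz_{i+1})\, \nu_i(dx_i)\, \mu_{i-1}^*(dz_{i-1}) \cdots \mu_1^*(dz_1) - f_i(\mu^*) \\
&\quad = f_i(\mu^* \| \nu_i) - f_i(\mu^*) \le 0 \quad \forall \nu_i(\cdot) \in \{\nu_i\}.
\end{aligned}$$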
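In combination with (22), this inequality shows that $\mu^*(\cdot) \in \mathfrak{N}$, i.e., the mixed strategy profile $\mu^*(\cdot)$ is a Nash equilibrium in game (10) (Definition 5).
Case 2: $j = N + 1$. Now inequality (28) takes the form
$$\begin{aligned}
&\int_X \int_X \psi_{N+1}(x, z)\, \nu(dx)\, \mu^*(dz) = \int_X \int_X \sum_{i \in N} f_i(x)\, \nu(dx)\, \mu^*(dz) - \int_X \int_X \sum_{i \in N} f_i(z)\, \nu(dx)\, \mu^*(dz) \\
&\quad = \int_X \sum_{i \in N} f_i(x)\, \nu(dx) \int_X \mu^*(dz) - \int_X \sum_{i \in N} f_i(z)\, \mu^*(dz) \int_X \nu(dx) \\
&\quad \overset{(29)}{=} \sum_{i \in N} \int_X f_i(x)\, \nu(dx) - \sum_{i \in N} \int_X f_i(z)\, \mu^*(dz) = \sum_{i \in N} f_i(\nu) - \sum_{i \in N} f_i(\mu^*) \le 0 \quad \forall \nu(\cdot) \in \mathfrak{N},
\end{aligned}$$
since $\mathfrak{N} \subset \{\nu\}$. Hence, for $\nu^P = \mu^*$ we directly get (24), i.e., the strategy profile $\mu^*(\cdot)$ is a Pareto-maximal alternative in the $N$-criteria choice problem $\tilde{\Gamma}_c$ (23) (Definition 6).
This result, together with the inclusion $\mu^*(\cdot) \in \mathfrak{N}$, completes the proof of Proposition 4. □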
11. DE OMNI RE SCIBILI ET QUIBUSDAM ALIIS
As is easily seen, all the constructions and lines of reasoning used in our work carry over to the case of Berge equilibrium. We do this below.
To avoid repetition, we emphasize only those points of the proof that are dictated by the specifics of Berge equilibrium. Again consider the N-player game (1)
$$\langle N, \{X_i\}_{i \in N}, Y, \{f_i(x, y)\}_{i \in N} \rangle$$
and, using formulas (4), define the Savage-Niehans risk functions
$$R_i(x, y) = \max_{z \in X} f_i(z, y) - f_i(x, y).$$
Next, by formulas (5) and (6), construct the strongly-guaranteed payoff $f_i[x]$ of player $i$ and the corresponding guaranteed Savage-Niehans risk $R_i[x]$. As a result, we arrive at the game of guarantees
$$\Gamma^g = \langle N, \{X_i\}_{i \in N}, \{f_i[x], -R_i[x]\}_{i \in N} \rangle.$$
Then it is natural to pass to the auxiliary game (10),
$$\langle N, \{X_i\}_{i \in N}, \{\Phi_i[x] = f_i[x] - \sigma_i R_i[x]\}_{i \in N} \rangle$$
with a constant $\sigma_i \in (0, 1)$.
Recall that if the players in a two-player game ($N = \{1, 2\}$) exchange their payoff functions, then a Nash equilibrium in the new game is a Berge equilibrium in the original game. Therefore, all the properties intrinsic to Nash equilibria remain in force for Berge equilibria. In particular, the set of Berge equilibria is internally unstable. With this instability in mind, let us introduce an analog of Definition 3 for the auxiliary game (10). As before, $[x \| z_i] = [x_1, \ldots, x_{i-1}, z_i, x_{i+1}, \ldots, x_N]$.
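The payoff-exchange property for two players can be seen on a small bimatrix game. The sketch below is a minimal illustration in pure strategies; the matrices `A` and `B` (a prisoner's dilemma with strategy 0 for cooperation and 1 for defection) and the function names are assumptions chosen for the example.

```python
def pure_nash(A, B):
    """Pure Nash equilibria (i, j): A is the row player's payoff, B the column player's."""
    rows, cols = range(len(A)), range(len(A[0]))
    return [(i, j) for i in rows for j in cols
            if A[i][j] == max(A[k][j] for k in rows)      # row player cannot improve
            and B[i][j] == max(B[i][l] for l in cols)]    # column player cannot improve


def pure_berge(A, B):
    """Berge equilibria, found as Nash equilibria after the players exchange payoffs."""
    return pure_nash(B, A)


# Hypothetical prisoner's dilemma: index 0 = cooperate, index 1 = defect.
A = [[3, 0],
     [5, 1]]
B = [[3, 5],
     [0, 1]]

print(pure_nash(A, B))   # mutual defection
print(pure_berge(A, B))  # mutual cooperation
```

In this example the unique Nash equilibrium is mutual defection, while the unique Berge equilibrium is mutual cooperation: each player's payoff there is maximal with respect to the choice of the other player.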
Definition 8. A strategy profile $x^B \in X$ is called a Pareto-Berge equilibrium in game (10) if $x^B = (x_1^B, \ldots, x_N^B)$ is simultaneously
1. a Berge equilibrium in (10), i.e.,
$$\max_{x \in X} \Phi_i[x \| x_i^B] = \Phi_i[x^B] \quad (i \in N),$$
and
2. a Pareto-maximal alternative in the $N$-criteria choice problem
$$\langle X^B, \{\Phi_i[x]\}_{i \in N} \rangle,$$
i.e., for any $x \in X^B$ the system of inequalities
$$\Phi_i[x] \ge \Phi_i[x^B] \quad (i \in N),$$
with at least one strict inequality, is inconsistent.
Here $X^B$ denotes the set of all Berge equilibria $x^B$ in game (10).
Sufficient conditions for the existence of a Pareto-Berge equilibrium also involve the Germeier convolution, with the $N$-dimensional vectors $x = (x_1, \ldots, x_N) \in X$, $z = (z_1, \ldots, z_N) \in X$, $f = (f_1, \ldots, f_N)$, $R = (R_1, \ldots, R_N)$, and $\Phi = (\Phi_1, \ldots, \Phi_N)$, as well as $N$ constants $\sigma_i \in [0, 1]$ $(i \in N)$. Specifically, consider the $N + 1$ scalar functions
$$\begin{aligned}
\psi_1(x, z) &= \Phi_1[z_1, x_2, \ldots, x_N] - \Phi_1[z], \\
\psi_2(x, z) &= \Phi_2[x_1, z_2, \ldots, x_N] - \Phi_2[z], \\
&\ \ \vdots \\
\psi_N(x, z) &= \Phi_N[x_1, x_2, \ldots, z_N] - \Phi_N[z], \\
\psi_{N+1}(x, z) &= \sum_{j=1}^{N} \Phi_j[x] - \sum_{j=1}^{N} \Phi_j[z]
\end{aligned} \tag{30}$$
and their Germeier convolution
$$\varphi(x, z) = \max_{j = 1, \ldots, N+1} \psi_j(x, z). \tag{31}$$
Proposition 5. If there exists a saddle point $(x^0, z^B) \in X \times X$ in the zero-sum game
$$\langle X, Z = X, \varphi(x, z) \rangle,$$
i.e.,
$$\max_{x \in X} \varphi(x, z^B) = \varphi(x^0, z^B) = \min_{z \in X} \varphi(x^0, z),$$
then the minimax strategy $z^B$ is a Pareto-Berge equilibrium in game (10).
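For a finite game, the functions (30) and the convolution (31) can be evaluated by direct enumeration. The sketch below is a pure-strategy illustration under stated assumptions: the prisoner's dilemma table `table` plays the role of the functions $\Phi_i$, and the code lists the strategies $z$ with $\max_x \varphi(x, z) \le 0$, the inequality that the saddle point of Proposition 5 delivers. The function names and the tolerance `tol` are hypothetical.

```python
from itertools import product


def berge_convolution(payoffs, x, z):
    """phi(x, z) from (31) with psi_i(x, z) = Phi_i[x || z_i] - Phi_i[z] as in (30)."""
    terms = []
    for i, phi_i in enumerate(payoffs):
        mixed = x[:i] + (z[i],) + x[i + 1:]   # player i keeps z_i, all others play x
        terms.append(phi_i(mixed) - phi_i(z))
    terms.append(sum(p(x) for p in payoffs) - sum(p(z) for p in payoffs))
    return max(terms)


def pareto_berge_profiles(payoffs, strategy_sets, tol=1e-12):
    """All pure profiles z such that max over x of phi(x, z) is nonpositive."""
    profiles = list(product(*strategy_sets))
    return [z for z in profiles
            if max(berge_convolution(payoffs, x, z) for x in profiles) <= tol]


# Hypothetical prisoner's dilemma: 0 = cooperate, 1 = defect.
table = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
payoffs = [lambda p: table[p][0], lambda p: table[p][1]]
result = pareto_berge_profiles(payoffs, [(0, 1), (0, 1)])
print(result)
```

Mutual cooperation is selected here: it is a Berge equilibrium and it maximizes the sum of the payoffs, whereas the Nash equilibrium of the same table, mutual defection, fails the component $\psi_{N+1}$.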
As in Proposition 4, we can establish the existence of a Pareto-Berge equilibrium in mixed strategies.
Proposition 6. Consider the noncooperative game (10), assuming that
1. the sets $X_i$ $(i \in N)$ and $Y$ are nonempty, convex and compact in $\mathbb{R}^{n_i}$ $(i \in N)$ and $\mathbb{R}^m$, respectively;
2. the payoff functions $f_i(x, y)$ $(i \in N)$ are continuous on the Cartesian product $X \times Y$.
Then there exists a Pareto-Berge equilibrium in mixed strategies in this game.
12. A LA FIN DES FINS
Classical scholars believe that the whole essence of mathematical game theory is to provide comprehensive answers to the following three questions:
1. What is an appropriate optimality principle for a given game?
2. Does an optimal solution exist?
3. If yes, how can one find it?
The answer to the first question for the noncooperative N-player game (1) is the concept of Pareto-Nash equilibrium (Definition 4) or the concept of Pareto-Berge equilibrium (Definition 8).
Next, the answer to the second question is given by Propositions 4 or 6: if the sets of strategies are convex and compact and the payoff functions of the players are continuous on X x Y, then such equilibria exist in mixed strategies.
Finally, the answer to the third question is provided by the following procedure: first, construct the guarantees of the outcomes $f_i[x]$ (5) and risks $R_i[x]$ (6); second, define the functions $\Phi_i[x] = f_i[x] - \sigma_i R_i[x]$ $(i \in N)$; third, find the Germeier convolution $\varphi(x, z)$ of the payoff functions using formulas (16) and (17) for Nash equilibrium or using formulas (30) and (31) for Berge equilibrium; fourth, calculate the saddle point $(x^0, z^*)$ of this convolution; then the minimax strategy $z^*$ is the desired Pareto-Berge (or Pareto-Nash) equilibrium.
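The four steps can be traced on a small finite example. The sketch below is a hedged illustration, not the authors' algorithm: it assumes that the guarantees (5) and (6) have the form $f_i[x] = \min_{y} f_i(x, y)$ and $R_i[x] = \max_{y} R_i(x, y)$, it works with finite grids instead of compact sets, it stays in pure strategies, and it replaces the saddle-point computation by the check $\max_x \varphi(x, z) \le 0$. The payoff function `f`, the grids and the weights `sigma` are made-up data.

```python
from itertools import product


def guaranteed_phi(f, profiles, Y, sigma):
    """Steps 1-2: Phi_i[x] = f_i[x] - sigma_i * R_i[x] for every profile x."""
    n = len(sigma)
    # best[y][i] = max over z of f_i(z, y), the benchmark in the Savage-Niehans risk
    best = {y: [max(f(z, y)[i] for z in profiles) for i in range(n)] for y in Y}
    phi = {}
    for x in profiles:
        values = []
        for i in range(n):
            payoff = min(f(x, y)[i] for y in Y)             # assumed form of (5)
            risk = max(best[y][i] - f(x, y)[i] for y in Y)  # assumed form of (6)
            values.append(payoff - sigma[i] * risk)
        phi[x] = tuple(values)
    return phi


def convolution(phi, x, z, berge=False):
    """Step 3: Germeier convolution max_j psi_j(x, z)."""
    terms = []
    for i in range(len(z)):
        if berge:   # psi_i = Phi_i[x || z_i] - Phi_i[z]
            mixed = x[:i] + (z[i],) + x[i + 1:]
        else:       # psi_i = Phi_i[z || x_i] - Phi_i[z]
            mixed = z[:i] + (x[i],) + z[i + 1:]
        terms.append(phi[mixed][i] - phi[z][i])
    terms.append(sum(phi[x]) - sum(phi[z]))
    return max(terms)


def solve(f, strategy_sets, Y, sigma, berge=False, tol=1e-12):
    """Step 4 (simplified): profiles z with max over x of phi(x, z) nonpositive."""
    profiles = list(product(*strategy_sets))
    phi = guaranteed_phi(f, profiles, Y, sigma)
    return [z for z in profiles
            if max(convolution(phi, x, z, berge) for x in profiles) <= tol]


def f(x, y):
    """Hypothetical payoffs: a coordination game shifted by the uncertainty y."""
    base = {(1, 1): 2.0, (0, 0): 1.0}.get(x, 0.0)
    return (base + y, base + y)


pareto_nash = solve(f, [(0, 1), (0, 1)], Y=(0, 1), sigma=(0.5, 0.5))
pareto_berge = solve(f, [(0, 1), (0, 1)], Y=(0, 1), sigma=(0.5, 0.5), berge=True)
print(pareto_nash, pareto_berge)
```

In this example both variants select the profile $(1, 1)$, where the guaranteed payoff is largest and the guaranteed risk is zero. An empty list would only mean that no pure-strategy solution exists on the grid; existence is guaranteed by Propositions 4 and 6 in mixed strategies.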
REFERENCES
1. NASH, J. F. (1951) Non-cooperative Games. Ann. Math. 54. p. 286-295.
2. SAVAGE, L. J. (1951) The Theory of Statistical Decision. J. American Statistical Association. 46. p. 55-67.
3. Черемных, Ю. Н. Микроэкономика. Продвинутый уровень. — M.: Инфо-М, 2008. — 60 c.
CHEREMNYKH, Y. N. (2008) Microeconomics: Advanced Level. Moscow: Info-M.
4. Черкасов, В. В.. Проблемы риска в управленческой деятельности. — M.: Рефл-бук, 1999. — 78 c.
CHERKASOV, V. V. (1996) Business Risk in Entrepreneurship. Kiev: Libra.
5. Шикин, Е. В.. От игр к играм. — M.: URSS, 1997. — 70 c. SHIKIN, E. V. (1997) From Games to Games. Moscow: URSS.
6. Дмитрук, А. В. Выпуклый анализ. Элементарный вводный курс. — M.: МАКС-ПРЕСС, 2012. — 34 c.
DMITRUK, A. V. (2012) Convex analysis. An Elementary Introductory Course. Moscow: Makspress.
7. Жуковский, В. И., Кудрявцев, К. Н. Уравновешивание конфликтов при неопределенности. I. Аналог седловой точки // Математическая Теория Игр и ее Приложения. — 2013. — Т. 5. — C. 27-44.
ZHUKOVSKIY, V. I. and KUDRYAVTSEV, K. N. (2013) Equilibrating Conflicts under Uncertainty. I. Analog of a Saddle-Point. Mat. Teor. Igr Prilozh. 5 (1). p. 27-44.
8. Жуковский, В. И., Кудрявцев, К. Н. Уравновешивание конфликтов при неопределенности. II. Аналог максимина // Математическая Теория Игр и ее Приложения. - 2013. - Т. 5. - C. 3-45.
ZHUKOVSKIY, V. I. and KUDRYAVTSEV, K. N. (2013) Equilibrating Conflicts under Uncertainty. II. Analog of a Maximin. Mat. Teor. Igr Prilozh. 5 (2). p. 3-45.
9. Венцель, Е. С. Исследование операций. — M.: Знание, 1976. — 40 c. VENTSEL', I. A. (1976) Operations Research. Moscow: Znanie.
10. PARETO, V. (1909) Manuel d'économie politique. Paris: Giard.
11. Подиновский, В. В., Ногин, В. Д. Парето-оптимальные решения многокритериальных задач. — M.: Физматлит, 2007. — 100 c.
PODINOVSKII, V. V., NOGIN, V. D. (2007) Pareto Optimal Solutions of Multicriteria Problems. Moscow: Fizmatlit.
12. BOREL, E. (1927) Sur les systèmes de formes linéaires à déterminant symétrique gauche et la théorie générale du jeu. Comptes Rendus de l'Académie des Sciences. 184. p. 52-53.
13. VON NEUMANN, J. (1928) Zur Theorie der Gesellschaftsspiele. Math. Ann. 100. p. 295-320.
14. Жуковский, В. И., Кудрявцев, К. Н. Парето-оптимальное равновесие Нэша: достаточные условия и существование в смешанных стратегиях. — M.: URSS, 2004. - 61 c.
ZHUKOVSKIY, V. I., KUDRYAVTSEV, K. N. (2004) Pareto-Optimal Nash Equilibrium: Sufficient Conditions and Existence in Mixed Strategies. Moscow: URSS.