
Contributions to Game Theory and Management, VIII, 187—198

On Nash Equilibria for Stochastic Games and Determining the Optimal Strategies of the Players

Dmitrii Lozovanu1 and Stefan Pickl2

1 Institute of Mathematics and Computer Science, Academy of Sciences of Moldova, Academy str. 5, Chisinau, MD-2028, Moldova; e-mail: lozovanu@math.md; http://www.math.md/structure/applied-mathematics/math-modeling-optimization/
2 Institute for Theoretical Computer Science, Mathematics and Operations Research, Universität der Bundeswehr München, 85577 Neubiberg-München, Germany; e-mail: stefan.pickl@unibw.de

Abstract We consider n-person stochastic games in the sense of Shapley. The main results of the paper are related to the existence of Nash equilibria and determining the optimal stationary strategies of the players in the considered games. We show that a Nash equilibrium for the stochastic game with average payoff functions of the players exists if an arbitrary situation induces an ergodic Markov chain. For the stochastic game with discounted payoff functions we show that a Nash equilibrium always exists. Some approaches for determining Nash equilibria in the considered games are proposed.

Keywords: Markov decision processes, stochastic games, Nash equilibria, optimal stationary strategies.

1. Introduction

In this paper we consider infinite n-person stochastic games. An n-person stochastic game (Owen, 1982; Neyman and Sorin, 2003; Mertens and Neyman, 1981) is a dynamic game with probabilistic transitions played by players in a sequence of stages, where the beginning of each stage corresponds to a state from a given finite set of states of the game. The game starts at a given state. At each stage players select actions from their feasible sets of actions and each player receives a stage payoff that depends on the current state and the chosen actions. The game then moves to a new random state, the distribution of which depends on the previous state and the actions chosen by the players. The procedure is repeated at the new state and the play continues for an infinite number of stages. The total payoff of a player is either the average of the stage payoffs or the discounted sum of the stage payoffs. The considered stochastic games have been studied by Gillette, 1957; Mertens and Neyman, 1981; Filar and Vrieze, 1997; Lal and Sinha, 1992; Neyman and Sorin, 2003. The existence of Nash equilibria for n-person games has been proven in the case of stochastic games in which the total payoff of each player is the discounted sum of stage payoffs, and for some special cases of games with average payoffs. In the general case, for the game in which the total payoff is the average of the stage payoffs, a Nash equilibrium may not exist (Lozovanu and Pickl, 2014).

The main results we describe in this paper are concerned with the existence of Nash equilibria in the considered games and elaboration of algorithms for determining the optimal stationary strategies of the players. We consider the stationary

strategies in the sense of Shapley. We show that a Nash equilibrium for the stochastic game with average payoff functions of the players exists if an arbitrary situation generated by the strategies of the players induces a Markov unichain. For the stochastic game with discounted payoff functions we show that a Nash equilibrium always exists. The obtained results can easily be extended to antagonistic stochastic games, and the corresponding conditions for the existence of saddle points can be derived.

2. Formulation of the basic game models

A stochastic game with n players consists of the following elements:

1. A state space X (which we assume to be finite);

2. A finite set $A^i(x)$ of actions with respect to each player $i \in \{1, 2, \dots, n\}$ for an arbitrary state $x \in X$;

3. A stage payoff $f^i(x,a)$ with respect to each player $i \in \{1, 2, \dots, n\}$ for each state $x \in X$ and for an arbitrary action vector $a \in \prod_i A^i(x)$;

4. A transition probability function $p : X \times \prod_{x \in X} \prod_i A^i(x) \times X \to [0,1]$ that gives the transition probability $p^a_{x,y}$ from an arbitrary $x \in X$ to an arbitrary $y \in X$ for a fixed action vector $a \in \prod_i A^i(x)$, where $\sum_{y \in X} p^a_{x,y} = 1$, $\forall x \in X$, $a \in \prod_i A^i(x)$;

5. A starting state $x_0 \in X$.

The stochastic game starts in state $x_0$. At stage $t$ players observe state $x_t$ and simultaneously choose actions $a^i_t \in A^i(x_t)$, $i = 1, 2, \dots, n$. Then nature selects a state $y = x_{t+1}$ according to the transition probabilities $p^{a_t}_{x_t,y}$ for the fixed action vector $a_t = (a^1_t, a^2_t, \dots, a^n_t)$. A play of the stochastic game $x_0, a_0, x_1, a_1, \dots, x_t, a_t, \dots$ defines a stream of payoffs $f^i_0, f^i_1, f^i_2, \dots$, where $f^i_t = f^i(x_t, a_t)$, $t = 0, 1, 2, \dots$. The $t$-stage average stochastic game is the game where the payoff of player $i \in \{1, 2, \dots, n\}$ is

$$F^i_t = \frac{1}{t} \sum_{\tau=0}^{t-1} f^i_\tau.$$

The infinite average stochastic game is the game where the payoff of player $i \in \{1, 2, \dots, n\}$ is

$$F^i = \lim_{t \to \infty} F^i_t.$$

In a similar way the stochastic game with discounted sum payoffs of the players is defined. In such a game, along with the elements described above, a discount factor $\lambda$ ($0 < \lambda < 1$) is given, and the $t$-stage stochastic game with discounted sum payoffs is the game where the payoff of player $i \in \{1, 2, \dots, n\}$ is

$$\sigma^i_t = \sum_{\tau=0}^{t-1} \lambda^\tau f^i_\tau.$$

The infinite stochastic game with discounted payoffs is the game where the payoff of player $i \in \{1, 2, \dots, n\}$ is

$$\sigma^i = \lim_{t \to \infty} \sigma^i_t.$$
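These payoff definitions are straightforward to check numerically. The following sketch simulates a play under fixed pure stationary strategies and accumulates the $t$-stage average and discounted payoffs; all data (two states, two players, payoffs $f$, transitions $p$, strategies) are hypothetical illustration values, not taken from the paper.

```python
# A minimal simulation sketch of the t-stage average and discounted payoffs.
# All data below are hypothetical illustration values.
import random

X = [0, 1]                     # state space
s1 = {0: 0, 1: 1}              # pure stationary strategy of player 1
s2 = {0: 1, 1: 0}              # pure stationary strategy of player 2
# stage payoffs f^i(x, (a1, a2)) and transition rows p^a_{x,.} (each sums to 1);
# only the state/action pairs visited under (s1, s2) need to be specified here
f = {(0, (0, 1)): (2.0, 1.0), (1, (1, 0)): (0.5, 3.0)}
p = {(0, (0, 1)): [0.3, 0.7], (1, (1, 0)): [0.6, 0.4]}

def play(x0, t, lam=0.9):
    """Return the t-stage average F^i_t and the discounted sum sigma^i_t."""
    x, avg, disc = x0, [0.0, 0.0], [0.0, 0.0]
    for tau in range(t):
        a = (s1[x], s2[x])                    # players choose simultaneously
        for i in range(2):
            avg[i] += f[(x, a)][i] / t
            disc[i] += lam ** tau * f[(x, a)][i]
        x = random.choices(X, weights=p[(x, a)])[0]   # nature moves
    return avg, disc

print(play(x0=0, t=10000))
```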

The considered games can be formulated in terms of stationary strategies that correspond to pure strategies of the players. In this case we define the stationary strategies of the players as $n$ maps:

$$s^i : x \to a^i \in A^i(x) \ \text{ for } x \in X, \quad i = 1, 2, \dots, n.$$

Obviously, the corresponding sets of stationary strategies $S^1, S^2, \dots, S^n$ of the players are finite sets.

Let $s = (s^1, s^2, \dots, s^n)$ be a situation determined by a set of stationary strategies $s^1, s^2, \dots, s^n$ of the players $1, 2, \dots, n$. This situation induces a Markov process with the probability distributions $p^s_{x,y}$ in the states $x \in X$, i.e. we obtain the matrix of probability transitions $P^s = (p^s_{x,y})$. For this process we can determine the matrix of limiting probabilities $Q^s = (q^s_{x,y})$ that corresponds to $P^s$. Therefore, if the starting state $x_0$ is given, then for the Markov process with the matrix of probability transitions $P^s$ we can calculate the corresponding average costs per transition $F^1_{x_0}(s^1, s^2, \dots, s^n), F^2_{x_0}(s^1, s^2, \dots, s^n), \dots, F^n_{x_0}(s^1, s^2, \dots, s^n)$ for the players as follows:

$$F^i_{x_0}(s^1, s^2, \dots, s^n) = \sum_{x \in X} q^s_{x_0,x}\, f^i(x, s^1(x), s^2(x), \dots, s^n(x)), \quad i = 1, 2, \dots, n.$$

In such a way on the set of situations $S = S^1 \times S^2 \times \cdots \times S^n$ we obtain the functions $F^i_{x_0}(s^1, s^2, \dots, s^n)$, $i = 1, 2, \dots, n$ that define the stochastic game with average payoffs in pure strategies. This game is determined by the set of states $X$, the sets of actions of the players $\{A^i\}_{i=\overline{1,n}}$, the probability function $p$, the set of stage payoffs $\{f^i(x,a)\}_{i=\overline{1,n}}$, where $a \in A$, $A = \prod_{i=1}^n A^i$, and the starting position of the game $x_0$. Therefore we denote this game $(X, \{A^i\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, x_0)$.

We define the stochastic game with a discounted sum of stage payoffs in pure strategies in an analogous way if for the Markov process with the matrix of probability transitions $P^s = (p^s_{x,y})$ we consider the matrix $W^s(\lambda) = (w^s_{x,y}(\lambda))$, where $W^s(\lambda) = (I - \lambda P^s)^{-1}$. Then for a situation $s = (s^1, s^2, \dots, s^n)$ the total discounted sum of stage payoffs $\sigma^i_{x_0}(s^1, s^2, \dots, s^n)$ with given discount factor $\lambda$ ($0 < \lambda < 1$) for the players can be calculated as follows:

$$\sigma^i_{x_0}(s^1, s^2, \dots, s^n) = \sum_{x \in X} w^s_{x_0,x}(\lambda)\, f^i(x, s^1(x), s^2(x), \dots, s^n(x)), \quad i = 1, 2, \dots, n.$$

So, on the set of situations $S = S^1 \times S^2 \times \cdots \times S^n$ we obtain the functions $\sigma^i_{x_0}(s^1, s^2, \dots, s^n)$, $i = 1, 2, \dots, n$ that define the stochastic game with discounted payoffs in pure strategies. In a similar way as for the previous game we denote the discounted stochastic game in pure strategies $(X, \{A^i\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, \lambda, x_0)$.
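Both payoff formulas are directly computable by linear algebra. A minimal numpy sketch, assuming a situation $s$ whose matrix $P^s$ induces a unichain; the $2 \times 2$ numbers are hypothetical illustration values:

```python
# Evaluating a situation s in closed form: average payoffs via the limiting
# matrix Q^s, discounted payoffs via W^s(lambda) = (I - lambda P^s)^{-1}.
import numpy as np

P = np.array([[0.3, 0.7],      # P^s: transition matrix induced by situation s
              [0.6, 0.4]])
r = np.array([2.0, 0.5])       # f^i(x, s^1(x), ..., s^n(x)) for one player i
lam = 0.9
n = P.shape[0]

# For a unichain every row of Q^s equals the stationary distribution pi,
# the solution of pi P = pi, sum(pi) = 1 (solved here in least-squares form).
M = np.vstack([P.T - np.eye(n), np.ones(n)])
pi = np.linalg.lstsq(M, np.concatenate([np.zeros(n), [1.0]]), rcond=None)[0]
Q = np.tile(pi, (n, 1))

F = Q @ r                                        # average payoffs F^i_{x0}
sigma = np.linalg.inv(np.eye(n) - lam * P) @ r   # discounted payoffs sigma^i_{x0}
print(F, sigma)
```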

For these games Nash equilibria in pure strategies may not exist. Therefore in this paper we study the stochastic game using stationary strategies in the sense of Shapley (Shapley, 1953) that correspond to mixed strategies. For such games we formulate conditions for the existence of Nash equilibria and describe some approaches for determining the optimal strategies of the players.

3. Determining Nash Equilibria for Stochastic Games with Average Payoffs

We shall use a continuous model for studying the average stochastic games. We construct such a model as follows: At first we identify an arbitrary stationary strategy $s^i : x \to a^i \in A^i(x)$ with the set of boolean variables $s^i_{x,a^i} \in \{0,1\}$, $x \in X$, $a^i \in A^i(x)$, where $s^i_{x,a^i} = 1$ if and only if player $i$ fixes the action $a^i \in A^i(x)$ in the state $x$. So, the set of stationary strategies of player $i$ we regard as the set of solutions of the following system:

$$\sum_{a^i \in A^i(x)} s^i_{x,a^i} = 1, \ \forall x \in X; \qquad s^i_{x,a^i} \in \{0,1\}, \ \forall x \in X, \ \forall a^i \in A^i(x).$$

Then in this system we replace the condition $s^i_{x,a^i} \in \{0,1\}$ by the condition $0 \le s^i_{x,a^i} \le 1$ and we obtain the set of stationary strategies in the sense of Shapley (Shapley, 1953), where $s^i_{x,a^i}$ is treated as the probability of choosing the action $a^i$ by player $i$ every time the state $x$ is reached by any route in the dynamic stochastic game. Additionally, we shall use the following condition for the average stochastic games. We assume that an arbitrary situation $s = (s^1, s^2, \dots, s^n) \in S$ generates a Markov unichain with the corresponding matrix of probability transitions $P^s = (p^s_{x,y})$. We call a game with such a property with respect to the situations $s = (s^1, s^2, \dots, s^n) \in S$ a perfect game (Lozovanu, 2011). We show that in this case the problem of determining Nash equilibria for a stochastic game can be formulated as a continuous model that represents the game variant of the following optimization problem:

Minimize
$$\psi(s, q) = \sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, s_{x,a}\, q_x \qquad (1)$$
subject to
$$\begin{cases} \displaystyle\sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = q_y, & \forall y \in X; \\[2mm] \displaystyle\sum_{x \in X} q_x = 1; \\[2mm] \displaystyle\sum_{a \in A(x)} s_{x,a} = 1, & \forall x \in X; \\[2mm] s_{x,a} \ge 0, & \forall x \in X, \ a \in A(x). \end{cases} \qquad (2)$$

This problem represents a continuous model for an average Markov decision problem with immediate costs $f(x,a)$ in the states $x \in X$ for given actions $a \in A(x)$ and probability transitions $p^a_{x,y}$, where $\sum_{y \in X} p^a_{x,y} = 1$, $\forall x \in X$, $\forall a \in A(x)$. More precisely, problem (1), (2) corresponds to a Markov decision problem where each strategy induces a Markov unichain (see Lozovanu and Pickl, 2015). This is easy to show if we identify an arbitrary stationary strategy with the set of boolean variables $s_{x,a} \in \{0,1\}$, $x \in X$, $a \in A(x)$ that satisfy the conditions

$$\sum_{a \in A(x)} s_{x,a} = 1, \ \forall x \in X; \qquad s_{x,a} \in \{0,1\}, \ \forall x \in X, \ a \in A(x).$$

These conditions determine the feasible strategies in (2). The remaining restrictions in (2) correspond to the system of linear equations with respect to $q_x$ for $x \in X$. This system of linear equations reflects the ergodicity condition for the limiting probabilities $q_x$, $x \in X$ in the Markov unichain, where $q_x$, $x \in X$ are determined uniquely for given $s_{x,a}$, $\forall x \in X$, $a \in A(x)$. Thus, the value of the objective function (1) expresses the average cost per transition in this Markov unichain, and an arbitrary optimal solution $s^*_{x,a}, q^*_x$ ($x \in X$, $a \in A(x)$) of problem (1), (2) with $s^*_{x,a} \in \{0,1\}$ represents an optimal stationary strategy for a Markov decision problem with an average cost criterion. If such an optimal solution is known, then an optimal action for the Markov decision problem can be found by fixing $s^*(x) = a$ for $x \in X$ if $s^*_{x,a} = 1$.

Problem (1), (2) can be transformed into a linear programming problem using the notations $\alpha_{x,a} = s_{x,a} q_x$, $\forall x \in X$, $a \in A(x)$ (see Lozovanu and Pickl, 2015). Based on such a transformation of the problem we will describe some additional properties of the optimal stationary strategies in Markov decision processes.

Lemma 1. Let an average Markov decision problem be given, where an arbitrary stationary strategy $s$ generates a Markov unichain, and consider the function
$$\psi(s) = \sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, s_{x,a}\, q_x,$$
where $q_x$ for $x \in X$ satisfy the condition
$$\sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = q_y, \ \forall y \in X; \qquad \sum_{x \in X} q_x = 1. \qquad (3)$$
Then the function $\psi(s)$ depends only on $s_{x,a}$ for $x \in X$, $a \in A(x)$, and on the set $S$ of solutions of the system
$$\sum_{a \in A(x)} s_{x,a} = 1, \ \forall x \in X; \qquad s_{x,a} \ge 0, \ \forall x \in X, \ a \in A(x), \qquad (4)$$
the function $\psi(s)$ is monotone.

Proof. If an arbitrary strategy $s$ for a Markov decision problem induces a Markov unichain, then for such a strategy the rank of system (3) is equal to $|X|$ and (3) has a unique solution with respect to $q_x$ ($x \in X$) (see Puterman, 2005). Moreover, the system of linear equations (3) uniquely determines $q_x$, $\forall x \in X$ for an arbitrary solution of system (4). So, the function $\psi(s)$ depends only on $s_{x,a}$ for $x \in X$, $a \in A(x)$.

Now let us prove the second part of the lemma. We show that on the set of solutions of system (4) the function $\psi(s)$ is monotone. For this reason it is sufficient to show that for arbitrary $s', s'' \in S$ the following relation holds:
$$\min\{\psi(s'), \psi(s'')\} \le \psi(s) \le \max\{\psi(s'), \psi(s'')\} \qquad (5)$$
if
$$s = \theta s' + (1 - \theta)s'', \quad 0 \le \theta \le 1.$$

We prove the correctness of this property using the relationship of problem (1), (2) with the following linear programming problem:

Minimize
$$\psi(\alpha, q) = \sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, \alpha_{x,a} \qquad (6)$$
subject to
$$\begin{cases} \displaystyle\sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, \alpha_{x,a} = q_y, & \forall y \in X; \\[2mm] \displaystyle\sum_{x \in X} q_x = 1; \\[2mm] \displaystyle\sum_{a \in A(x)} \alpha_{x,a} = q_x, & \forall x \in X; \\[2mm] \alpha_{x,a} \ge 0, & \forall x \in X, \ a \in A(x). \end{cases} \qquad (7)$$

Problem (6), (7) is obtained from (1), (2) by introducing the substitutions $\alpha_{x,a} = s_{x,a} q_x$ for $x \in X$, $a \in A(x)$. These substitutions allow us to establish a bijective mapping between the set of feasible solutions of problem (1), (2) and the set of feasible solutions of the linear programming problem (6), (7). So, if $\alpha_{x,a}$ for $x \in X$, $a \in A(x)$ and $q_x$ for $x \in X$ are known, then we can uniquely determine

$$s_{x,a} = \frac{\alpha_{x,a}}{q_x}, \quad \forall x \in X, \ a \in A(x), \qquad (8)$$

for which $\psi(s) = \psi(\alpha, q)$. In particular, if an optimal basic solution $(\alpha^*, q^*)$ of the linear programming problem (6), (7) is found, then the optimal stationary strategy for a Markov decision problem can be found by fixing
$$s^*_{x,a} = \begin{cases} 1, & \text{if } \alpha^*_{x,a} > 0; \\ 0, & \text{if } \alpha^*_{x,a} = 0. \end{cases}$$

Let $s', s''$ be arbitrary solutions of system (4) where $\psi(s') \le \psi(s'')$. Then there exist corresponding feasible solutions $\alpha', \alpha''$ of the linear programming problem (6), (7) for which
$$\psi(s') = \psi(\alpha'), \quad \psi(s'') = \psi(\alpha''),$$
$$\alpha'_{x,a} = s'_{x,a} q'_x, \quad \alpha''_{x,a} = s''_{x,a} q''_x, \quad \forall x \in X, \ a \in A(x),$$

where $q'_x, q''_x$ are determined uniquely from the system of linear equations (3) for $s = s'$ and $s = s''$, respectively. The function $\psi(\alpha)$ is linear, and therefore for an arbitrary $\alpha = \theta \alpha' + (1 - \theta)\alpha''$, $0 \le \theta \le 1$, the following equality holds:

$$\psi(\alpha) = \theta \psi(\alpha') + (1 - \theta)\psi(\alpha''),$$

where $\alpha$ is a feasible solution of problem (6), (7) that in the initial problem (1), (2) corresponds to a feasible solution $s$ for which

$$\psi(s) = \psi(\alpha); \qquad q_x = \theta q'_x + (1 - \theta)q''_x, \ \forall x \in X.$$

Using (8) we have
$$s_{x,a} = \frac{\alpha_{x,a}}{q_x}, \quad \forall x \in X, \ a \in A(x),$$
i.e.
$$s_{x,a} = \frac{\theta \alpha'_{x,a} + (1 - \theta)\alpha''_{x,a}}{\theta q'_x + (1 - \theta)q''_x} = \frac{\theta s'_{x,a} q'_x + (1 - \theta)s''_{x,a} q''_x}{\theta q'_x + (1 - \theta)q''_x}.$$
So, we obtain
$$s_{x,a} = \theta_x s'_{x,a} + (1 - \theta_x)s''_{x,a},$$
where
$$\theta_x = \frac{\theta q'_x}{\theta q'_x + (1 - \theta)q''_x}, \quad 0 \le \theta_x \le 1.$$
It is easy to observe that $0 \le \theta_x \le 1$, where $\theta_x = 0$, $\forall x \in X$ if and only if $\theta = 0$, and $\theta_x = 1$, $\forall x \in X$ if and only if $\theta = 1$. Moreover, it can easily be seen from the proof above that $\psi(s) = \psi(s')$ in the case $\psi(s') = \psi(s'')$. Thus the function $\psi(s)$ is monotone on the set of solutions of system (4).
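The linear program (6), (7) together with the recovery rule (8) yields a direct computational procedure for the average Markov decision problem. A sketch using scipy, with hypothetical 2-state, 2-action data chosen so that every strategy induces a unichain (and hence $q_x > 0$ for all $x$, so the division in (8) is safe):

```python
# Solving the linear program (6), (7) and recovering a stationary strategy
# via rule (8).  The MDP data are hypothetical illustration values.
import numpy as np
from scipy.optimize import linprog

nx, na = 2, 2                              # |X| and |A(x)| (same for all x)
f = np.array([[4.0, 2.0],                  # f(x, a): immediate costs
              [1.0, 3.0]])
p = np.array([[[0.8, 0.2], [0.1, 0.9]],    # p[x][a][y]: transition probabilities
              [[0.5, 0.5], [0.7, 0.3]]])

nv = nx * na + nx                          # variables: all alpha_{x,a}, then q_x
c = np.concatenate([f.ravel(), np.zeros(nx)])        # objective (6)

A_eq, b_eq = [], []
for y in range(nx):                        # sum_{x,a} p^a_{x,y} alpha_{x,a} = q_y
    row = np.zeros(nv)
    for x in range(nx):
        for a in range(na):
            row[x * na + a] = p[x][a][y]
    row[nx * na + y] = -1.0
    A_eq.append(row); b_eq.append(0.0)
row = np.zeros(nv); row[nx * na:] = 1.0    # sum_x q_x = 1
A_eq.append(row); b_eq.append(1.0)
for x in range(nx):                        # sum_a alpha_{x,a} = q_x
    row = np.zeros(nv)
    row[x * na:(x + 1) * na] = 1.0
    row[nx * na + x] = -1.0
    A_eq.append(row); b_eq.append(0.0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
alpha, q = res.x[:nx * na].reshape(nx, na), res.x[nx * na:]
s = alpha / q[:, None]                     # recovery rule (8)
print("optimal average cost:", res.fun)
print("optimal stationary strategy:\n", s.round(3))
```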

Now we extend the results described above to the continuous model of a stochastic game with average payoffs. We consider the continuous model for perfect stochastic games.

Let us denote by $\overline{S}^i$, $i \in \{1, 2, \dots, n\}$ the set of solutions of the system
$$\sum_{a^i \in A^i(x)} s^i_{x,a^i} = 1, \ \forall x \in X; \qquad s^i_{x,a^i} \ge 0, \ \forall x \in X, \ a^i \in A^i(x). \qquad (9)$$
So, $\overline{S}^i$ is a convex compact set and an arbitrary extreme point of it corresponds to a basic solution $s^i$ of system (9), where $s^i_{x,a^i} \in \{0,1\}$, $\forall x \in X$, $a^i \in A^i(x)$. Thus, if $s^i$ is an arbitrary basic solution of system (9), then $s^i \in S^i$, and $s^i$ corresponds to a pure strategy.

On the set $\overline{S} = \overline{S}^1 \times \overline{S}^2 \times \cdots \times \overline{S}^n$ we define $n$ payoff functions
$$\psi^i(s^1, s^2, \dots, s^n) = \sum_{x \in X} \sum_{(a^1, a^2, \dots, a^n) \in A(x)} \prod_{k=1}^{n} s^k_{x,a^k}\, f^i(x, a^1, a^2, \dots, a^n)\, q_x, \quad i = 1, 2, \dots, n, \qquad (10)$$

where $q_x$ for $x \in X$ are determined uniquely from the following system of linear equations

$$\sum_{x \in X} \sum_{(a^1, a^2, \dots, a^n) \in A(x)} \prod_{k=1}^{n} s^k_{x,a^k}\, p^{(a^1, a^2, \dots, a^n)}_{x,y}\, q_x = q_y, \ \forall y \in X; \qquad \sum_{x \in X} q_x = 1 \qquad (11)$$

when $s^1, s^2, \dots, s^n$ are given.

The main results we prove for our game model are the following properties:

- The set of Nash equilibria situations of the continuous model is nonempty if and only if the set of Nash equilibria situations of the stochastic game in pure strategies is nonempty;

- If $(s^1, s^2, \dots, s^n)$ is an extreme point of $\overline{S}$, then $F^i_x(s^1, s^2, \dots, s^n) = \psi^i(s^1, s^2, \dots, s^n)$, $\forall x \in X$, $i = 1, 2, \dots, n$, and all Nash equilibria situations for the continuous game model that correspond to extreme points in $\overline{S}$ represent Nash equilibria situations for the stochastic game in pure strategies.

From Lemma 1 as a corollary we obtain the following result.

Lemma 2. For a perfect stochastic game each payoff function $\psi^i(s^1, s^2, \dots, s^n)$, $i \in \{1, 2, \dots, n\}$ possesses the property that $\psi^i(s^1, s^2, \dots, s^{i-1}, s^i, s^{i+1}, \dots, s^n)$ is monotone with respect to $s^i \in \overline{S}^i$ for arbitrary fixed $s^k \in \overline{S}^k$, $k = 1, 2, \dots, i-1, i+1, \dots, n$.

Using this lemma we can prove the following theorem.

Theorem 1. Let $(X, \{A^i\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, x)$ be a stochastic game with a given starting position $x \in X$ and average payoff functions
$$F^1_x(s^1, s^2, \dots, s^n), \ F^2_x(s^1, s^2, \dots, s^n), \ \dots, \ F^n_x(s^1, s^2, \dots, s^n)$$
of players $1, 2, \dots, n$, respectively. If for an arbitrary situation $s = (s^1, s^2, \dots, s^n) \in S$ of the game the transition probability matrix $P^s = (p^s_{x,y})$ corresponds to a Markov unichain, then for the continuous game on $\overline{S}$ there exists a Nash equilibrium $s^* = (s^{1*}, s^{2*}, \dots, s^{n*})$ which is a Nash equilibrium for an arbitrary starting position $x \in X$ of the game.

Proof. According to Lemma 2 each function $\psi^i(s^1, s^2, \dots, s^n)$, $i \in \{1, 2, \dots, n\}$ satisfies the condition that $\psi^i(s^1, s^2, \dots, s^{i-1}, s^i, s^{i+1}, \dots, s^n)$ is monotone with respect to $s^i \in \overline{S}^i$ for an arbitrary fixed $s^k \in \overline{S}^k$, $k = 1, 2, \dots, i-1, i+1, \dots, n$. In the considered game each set $\overline{S}^i$ is convex and compact. Therefore, these conditions (see Debreu, 1952; Dasgupta and Maskin, 1986; Simon, 1987 and Reny, 1999) provide the existence of a Nash equilibrium $s^* = (s^{1*}, s^{2*}, \dots, s^{n*})$ for the functions $\psi^i(s^1, s^2, \dots, s^n)$, $i \in \{1, 2, \dots, n\}$ on $\overline{S}^1 \times \overline{S}^2 \times \cdots \times \overline{S}^n$. This Nash equilibrium is a Nash equilibrium for an arbitrary starting position $x$ of the game.

Corollary 1. For the average stochastic game there exists a Nash equilibrium in pure strategies if and only if the continuous game has a Nash equilibrium in pure strategies.
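The hypothesis of Theorem 1 (perfectness) can be verified algorithmically for small games: a finite Markov chain is a unichain exactly when its transition digraph has a single closed communicating class. A brute-force sketch, exponential in $|X|$ and therefore practical only for tiny games; the data layout p[x][a][y] over joint actions and the example values are hypothetical:

```python
# Verifying the perfectness (unichain) hypothesis of Theorem 1.
import itertools
import numpy as np

def closed_classes(P, eps=1e-12):
    """Number of closed communicating classes of transition matrix P."""
    n = len(P)
    reach = (np.array(P) > eps) | np.eye(n, dtype=bool)
    for k in range(n):                         # boolean transitive closure
        reach |= reach[:, k:k + 1] & reach[k:k + 1, :]
    classes = {frozenset(j for j in range(n) if reach[i, j] and reach[j, i])
               for i in range(n)}
    # a class C is closed iff every state of C reaches only states of C
    return sum(all(reach[i].sum() == len(C) for i in C) for C in classes)

def is_perfect(p):
    """Check that every pure situation induces a Markov unichain.

    A pure situation amounts to a map x -> joint action a(x), so it suffices
    to enumerate all such maps."""
    nx = len(p)
    for joint in itertools.product(*(range(len(p[x])) for x in range(nx))):
        P = [p[x][joint[x]] for x in range(nx)]    # induced transition matrix
        if closed_classes(P) != 1:
            return False
    return True

p = [[[0.8, 0.2], [0.1, 0.9]],                 # hypothetical joint-action data
     [[0.5, 0.5], [1.0, 0.0]]]
print(is_perfect(p))                           # True for this example
```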

Using the results described above we may conclude that in the case of perfect games a Nash equilibrium for stochastic games with average payoffs can be determined by using classical iterative methods for the continuous game models with payoff functions $\psi^i(s^1, s^2, \dots, s^n)$, $i \in \{1, 2, \dots, n\}$ on the set $\overline{S}^1 \times \overline{S}^2 \times \cdots \times \overline{S}^n$. If we apply these iterative methods to a discrete game model with payoff functions $F^1_x(s^1, s^2, \dots, s^n), F^2_x(s^1, s^2, \dots, s^n), \dots, F^n_x(s^1, s^2, \dots, s^n)$ on $S^1 \times S^2 \times \cdots \times S^n$, then we obtain iterative procedures where players successively fix their strategies in order to minimize their payoff functions, respectively, and finally reach a Nash equilibrium (if such an equilibrium exists); a sketch of such a procedure is given below.

Note that if a stochastic game is not perfect, then a Nash equilibrium may not exist (Lozovanu and Pickl, 2014, 2015).
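The following sketch illustrates such an iterative procedure for a two-player perfect game: each player in turn replaces his strategy by a pure best response, computed here by brute-force enumeration over pure stationary strategies (feasible only for tiny games). All data are hypothetical illustration values, and convergence of the iteration is not guaranteed in general:

```python
# Best-response iteration sketch for a two-player perfect average game.
import itertools
import numpy as np

np.random.seed(0)
X = [0, 1]
A1, A2 = [0, 1], [0, 1]                    # individual action sets per state
f = np.random.rand(2, 2, 2, 2)             # f[i][x][a1][a2]: stage costs
p = np.full((2, 2, 2, 2), 0.5)             # p[x][a1][a2][y]: every situation
                                           # induces a unichain here

def avg_cost(s1, s2, i):
    """F^i(s^1, s^2) via the stationary distribution of the induced chain."""
    P = np.array([p[x][s1[x]][s2[x]] for x in X])
    M = np.vstack([P.T - np.eye(len(X)), np.ones(len(X))])
    pi = np.linalg.lstsq(M, np.array([0.0, 0.0, 1.0]), rcond=None)[0]
    return sum(pi[x] * f[i][x][s1[x]][s2[x]] for x in X)

def best_response(i, other):
    acts = A1 if i == 0 else A2
    strategies = list(itertools.product(acts, repeat=len(X)))
    cost = (lambda s: avg_cost(s, other, 0)) if i == 0 else \
           (lambda s: avg_cost(other, s, 1))
    return min(strategies, key=cost)       # players minimize their costs

s1, s2 = (0, 0), (0, 0)
for _ in range(20):                        # a fixed point is a candidate
    s1 = best_response(0, s2)              # equilibrium in pure strategies
    s2 = best_response(1, s1)
print("candidate Nash equilibrium:", s1, s2)
```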

4. Determining Nash Equilibria for Stochastic Games with Discounted Payoffs

In this section we show that a Nash equilibrium in mixed strategies exists for an arbitrary stochastic game with discounted payoff functions of the players and a given discount factor $\gamma$, $0 < \gamma < 1$. To prove this we shall use the continuous model for the stochastic game. We will formulate such a model using the following auxiliary optimization problem:

Maximize
$$\varphi_{x_0}(s, \sigma) = \sigma_{x_0} \qquad (12)$$
subject to
$$\sigma_x - \gamma \sum_{y \in X} \sum_{a \in A(x)} s_{x,a}\, p^a_{x,y}\, \sigma_y = \sum_{a \in A(x)} s_{x,a}\, f(x,a), \quad \forall x \in X, \qquad (13)$$

where $s_{x,a}$, $x \in X$, $a \in A(x)$ correspond to a fixed strategy that satisfies (4).
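For a fixed (possibly mixed) strategy $s$, system (13) is simply a linear system in $\sigma$. A minimal numpy sketch with hypothetical 2-state, 2-action data:

```python
# Solving system (13) for sigma, given a fixed strategy s.
import numpy as np

gamma = 0.9
s = np.array([[0.4, 0.6],                  # s_{x,a}: rows sum to 1
              [1.0, 0.0]])
f = np.array([[4.0, 2.0],                  # f(x, a)
              [1.0, 3.0]])
p = np.array([[[0.8, 0.2], [0.1, 0.9]],    # p[x][a][y]
              [[0.5, 0.5], [0.7, 0.3]]])

Ps = np.einsum('xa,xay->xy', s, p)         # sum_a s_{x,a} p^a_{x,y}
r = (s * f).sum(axis=1)                    # sum_a s_{x,a} f(x,a)
sigma = np.linalg.solve(np.eye(2) - gamma * Ps, r)   # unique solution of (13)
print(sigma)                               # sigma_x for each starting state x
```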

This problem represents a continuous model for the discounted Markov decision problem (see Lozovanu and Pickl, 2015). The system of linear equations (13) with respect to $\sigma_x$ has a unique solution for a fixed $s$, and we can find all $\sigma_x$ for $x \in X$ that represent the discounted sums of immediate costs in the decision problem with the corresponding starting positions $x \in X$. It is easy to observe that if we consider the optimization problem (12), (13) with respect to $\sigma$, then the equations in (13) can be replaced by inequalities ($\le$) and the values of the optimal solutions of problem (12), (13) will correspond to the same $\sigma_x$ for $x \in X$. Therefore, if after that we dualize (12), (13) with respect to $\sigma_x$ for fixed $s$, then we obtain the following linear programming problem:

Minimize
$$\sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, s_{x,a}\, \beta_x$$
subject to
$$\begin{cases} \displaystyle \beta_y - \gamma \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, \beta_x \ge 0, & \forall y \in X \setminus \{x_0\}; \\[2mm] \displaystyle \beta_y - \gamma \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, \beta_x \ge 1, & y = x_0. \end{cases}$$

If we add to this system the condition (4) and minimize also with respect to $s$, then we obtain the following optimization problem:

Minimize
$$\psi(s, \beta) = \sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, s_{x,a}\, \beta_x \qquad (14)$$
subject to
$$\begin{cases} \displaystyle \beta_y - \gamma \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, \beta_x \ge 0, & \forall y \in X \setminus \{x_0\}; \\[2mm] \displaystyle \beta_y - \gamma \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, \beta_x \ge 1, & y = x_0; \\[2mm] \displaystyle \sum_{a \in A(x)} s_{x,a} = 1, & \forall x \in X; \\[2mm] \beta_y \ge 0, \ \forall y \in X; \quad s_{x,a} \ge 0, & \forall x \in X, \ a \in A(x). \end{cases} \qquad (15)$$

Using elementary transformations in this problem and introducing the notations $\alpha_{x,a} = s_{x,a} \beta_x$, $\forall x \in X$, $a \in A(x)$, we obtain the following linear programming problem:

Minimize
$$\psi(\alpha, \beta) = \sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, \alpha_{x,a} \qquad (16)$$
subject to
$$\begin{cases} \displaystyle \beta_y - \gamma \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, \alpha_{x,a} \ge 0, & \forall y \in X \setminus \{x_0\}; \\[2mm] \displaystyle \beta_y - \gamma \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, \alpha_{x,a} \ge 1, & y = x_0; \\[2mm] \displaystyle \sum_{a \in A(x)} \alpha_{x,a} = \beta_x, & \forall x \in X; \\[2mm] \beta_y \ge 0, \ \forall y \in X; \quad \alpha_{x,a} \ge 0, & \forall x \in X, \ a \in A(x). \end{cases} \qquad (17)$$

If $(\alpha^*, \beta^*)$ is an optimal basic solution of problem (16), (17), then the optimal stationary strategy $s^*$ for the discounted Markov decision problem is determined as follows:
$$s^*_{x,a} = \begin{cases} 1, & \text{if } \alpha^*_{x,a} \ne 0; \\ 0, & \text{if } \alpha^*_{x,a} = 0, \end{cases} \qquad (18)$$
and $\alpha^*_{x,a} = s^*_{x,a} \beta^*_x$, $\forall x \in X$, $a \in A(x)$ (see Lozovanu and Pickl, 2015).
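As in the average case, problem (16), (17) can be handed to an LP solver, after which rule (18) recovers the optimal pure stationary strategy. A scipy sketch with the same hypothetical 2-state, 2-action data as before; it assumes the solver returns a basic optimal solution:

```python
# Solving the linear program (16), (17) and recovering s* by rule (18).
import numpy as np
from scipy.optimize import linprog

gamma, x0, nx, na = 0.9, 0, 2, 2
f = np.array([[4.0, 2.0], [1.0, 3.0]])
p = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.7, 0.3]]])

nv = nx * na + nx                          # variables: alpha_{x,a}, then beta_x
c = np.concatenate([f.ravel(), np.zeros(nx)])        # objective (16)

A_ub, b_ub = [], []                        # linprog form: A_ub z <= b_ub,
for y in range(nx):                        # i.e. negate each inequality of (17)
    row = np.zeros(nv)
    for x in range(nx):
        for a in range(na):
            row[x * na + a] = gamma * p[x][a][y]
    row[nx * na + y] = -1.0
    A_ub.append(row); b_ub.append(-1.0 if y == x0 else 0.0)

A_eq, b_eq = [], []
for x in range(nx):                        # sum_a alpha_{x,a} = beta_x
    row = np.zeros(nv)
    row[x * na:(x + 1) * na] = 1.0
    row[nx * na + x] = -1.0
    A_eq.append(row); b_eq.append(0.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
alpha = res.x[:nx * na].reshape(nx, na)
s_star = (alpha > 1e-9).astype(int)        # recovery rule (18)
print("s* =\n", s_star, "\noptimal value:", res.fun)
```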

For the continuous model of the discounted Markov decision problem we can prove properties similar to those of the average Markov decision model.

Lemma 3. Let a discounted Markov decision problem with the discount factor $\gamma$, $0 < \gamma < 1$ be given. Consider the function
$$\varphi_{x_0}(s) = \sigma_{x_0},$$
where $\sigma_x$ for $x \in X$ satisfy the condition
$$\sigma_x - \gamma \sum_{y \in X} \sum_{a \in A(x)} s_{x,a}\, p^a_{x,y}\, \sigma_y = \sum_{a \in A(x)} s_{x,a}\, f(x,a), \quad \forall x \in X. \qquad (19)$$
Then the function $\varphi_{x_0}(s)$ depends only on $s_{x,a}$ for $x \in X$, $a \in A(x)$, and on the set $S$ of solutions of the system
$$\sum_{a \in A(x)} s_{x,a} = 1, \ \forall x \in X; \qquad s_{x,a} \ge 0, \ \forall x \in X, \ a \in A(x)$$
the function $\varphi_{x_0}(s)$ is monotone.

The proof of this lemma is similar to the proof of Lemma 1 if instead of the linear programming formulation (6), (7) we use the linear programming formulation (16), (17).

We formulate the continuous model for the stochastic game with discounted payoffs as follows: On the set $\overline{S} = \overline{S}^1 \times \overline{S}^2 \times \cdots \times \overline{S}^n$ we consider $n$ payoff functions

$$\varphi^i_{x_0}(s^1, s^2, \dots, s^n) = \sigma^i_{x_0}, \quad i = 1, 2, \dots, n, \qquad (20)$$
where $\sigma^i_x$ for $x \in X$ satisfy the condition
$$\sigma^i_x - \gamma \sum_{y \in X} \sum_{(a^1, a^2, \dots, a^n) \in A(x)} \prod_{k=1}^{n} s^k_{x,a^k}\, p^{(a^1, a^2, \dots, a^n)}_{x,y}\, \sigma^i_y = \sum_{(a^1, a^2, \dots, a^n) \in A(x)} \prod_{k=1}^{n} s^k_{x,a^k}\, f^i(x, a^1, a^2, \dots, a^n), \quad \forall x \in X; \ i = 1, 2, \dots, n. \qquad (21)$$

This game model possesses the same properties as the previous continuous model:

- The set of Nash equilibria situations of the continuous model is nonempty if and only if the set of Nash equilibria situations of the stochastic game in pure strategies is nonempty;

- If $(s^1, s^2, \dots, s^n)$ is an extreme point of $\overline{S}$, then $\sigma^i_x(s^1, s^2, \dots, s^n) = \varphi^i_x(s^1, s^2, \dots, s^n)$, $\forall x \in X$, $i = 1, 2, \dots, n$, and all Nash equilibria situations for the continuous game model that correspond to extreme points in $\overline{S}$ represent Nash equilibria situations in pure strategies.

From Lemma 3 as a corollary we obtain the following result.

Lemma 4. For an arbitrary stochastic game with discounted payoffs each payoff function $\varphi^i_{x_0}(s^1, s^2, \dots, s^n)$, $i \in \{1, 2, \dots, n\}$ possesses the property that $\varphi^i_{x_0}(s^1, s^2, \dots, s^{i-1}, s^i, s^{i+1}, \dots, s^n)$ is monotone with respect to $s^i \in \overline{S}^i$ for arbitrary fixed $s^k \in \overline{S}^k$, $k = 1, 2, \dots, i-1, i+1, \dots, n$.

Using this lemma we can prove the following theorem.

Theorem 2. Let a stochastic game $(X, \{A^i\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, \gamma, x_0)$ with the starting position $x_0 \in X$ and discounted payoff functions
$$\varphi^1_x(s^1, s^2, \dots, s^n), \ \varphi^2_x(s^1, s^2, \dots, s^n), \ \dots, \ \varphi^n_x(s^1, s^2, \dots, s^n)$$
of the players $1, 2, \dots, n$ be given. Then in the considered game there exists a Nash equilibrium $s^* = (s^{1*}, s^{2*}, \dots, s^{n*})$ on $\overline{S}$ which is a Nash equilibrium for an arbitrary starting position $x \in X$.

The proof of this theorem is similar to the proof of Theorem 1, i.e. the existence of Nash equilibria for the continuous game with payoff functions $\varphi^i_{x_0}(s^1, s^2, \dots, s^n)$, $i \in \{1, 2, \dots, n\}$ on $\overline{S}$ can be obtained in an analogous way as for the game with average payoffs if we apply Lemma 4 and the corresponding results from (Debreu, 1952; Dasgupta and Maskin, 1986; Simon, 1987 and Reny, 1999).

5. Conclusion

The considered n-person stochastic games can be studied using continuous game models. Based on the proposed approach, new Nash equilibrium conditions for the games with average and discounted payoffs have been derived, and some approaches for determining the optimal stationary strategies of the players have been proposed. The obtained results can be extended to antagonistic stochastic games.

References

Dasgupta, P., Maskin, E. (1986). The existence of equilibrium in discontinuous economic games. Review of Economic Studies, 53, 1-26.

Debreu, G. (1952). A social equilibrium existence theorem. Proceedings of the National Academy of Sciences, 38, 886-893.

Filar, J. A., Vrieze, K. (1997). Competitive Markov Decision Processes. Springer.

Gillette, D. (1957). Stochastic games with zero stop probabilities. In: Contributions to the Theory of Games, vol. III, Princeton, 179-187.

Lal, A. K., Sinha, S. (1992). Zero-sum two-person semi-Markov games. J. Appl. Prob., 29, 56-72.

Lozovanu, D. (2011). The game-theoretical approach to Markov decision problems and determining Nash equilibria for stochastic positional games. Int. J. Mathematical Modelling and Numerical Optimisation, 2(2), 162-164.

Lozovanu, D., Pickl, S. (2014). Nash equilibria conditions for stochastic positional games. Contributions to Game Theory and Management, VII, Saint Petersburg State University, 201-213.

Lozovanu, D., Pickl, S. (2015). Optimization of Stochastic Discrete Systems and Control on Complex Networks. Springer.

Mertens, J. F., Neyman, A. (1981). Stochastic games. International Journal of Game Theory, 10, 53-66.

Neyman, A., Sorin, S. (2003). Stochastic Games and Applications. NATO ASI series, Kluwer Academic Press.

Owen, G. (1982). Game Theory, 2nd edition. Academic Press, New York.

Puterman, M. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, New Jersey.

Reny, P. J. (1999). On the existence of pure and mixed strategy Nash equilibria in discontinuous games. Econometrica, 67, 1029-1056.

Shapley, L. (1953). Stochastic games. Proc. Natl. Acad. Sci. U.S.A., 39, 1095-1100.

Simon, L. (1987). Games with discontinuous payoffs. Review of Economic Studies, 54, 569-597.
