Nash Equilibria Conditions for Stochastic Positional Games

Dmitrii Lozovanu1 and Stefan Pickl2

1 Institute of Mathematics and Computer Science,

Academy of Sciences of Moldova,

Academy str., 5, Chisinau, MD-2028, Moldova E-mail: lozovanu@math.md http://www.math.md/structure/applied-mathematics/math-modeling-optimization/

2 Institute for Theoretical Computer Science, Mathematics and Operations Research, Universität der Bundeswehr München, 85577 Neubiberg-München, Germany E-mail: stefan.pickl@unibw.de

Abstract. We formulate and study a class of stochastic positional games obtained by applying a game-theoretical concept to finite state space Markov decision processes with average and expected total discounted cost optimization criteria. Nash equilibria conditions for the considered class of games are proven and some approaches for determining the optimal strategies of the players are analyzed. The obtained results extend Nash equilibria conditions for deterministic positional games and can be used for studying Shapley stochastic games with average payoffs.

Keywords: Markov decision processes, stochastic positional games, Nash equilibria, Shapley stochastic games, optimal stationary strategies.

1. Introduction

In this paper we consider a class of stochastic positional games that extends the deterministic positional games studied by Moulin, 1976, Ehrenfeucht and Mycielski, 1979, Gurvich et al., 1988, Condon, 1992, Lozovanu and Pickl, 2006, 2009. We formulate and study this class of games by applying the concept of positional games to finite state space Markov decision processes with average and expected total discounted cost optimization criteria. We assume that the Markov process is controlled by several actors (players) as follows: the set of states of the system is divided into several disjoint subsets which represent the corresponding position sets of the players. Additionally, the cost of the system's transition from one state to another is given for each player separately. Each player has to determine which action should be taken in each state of his position set in order to minimize his own average cost per transition or his expected total discounted cost. In these games we seek a Nash equilibrium.

The main results of the paper concern the existence of Nash equilibria for the considered class of games and the determination of the optimal strategies of the players. We prove necessary and sufficient conditions for the existence of Nash equilibria in stochastic positional games that extend Nash equilibria conditions for deterministic positional games. Based on the constructive proof of these results we propose some approaches for determining the optimal strategies of the players. Additionally, we show that stochastic positional games are tightly connected with Shapley stochastic games (Shapley, 1953) and that the obtained results can be used for studying a special class of Shapley stochastic games with average payoffs.

2. Formulation of the Basic Game Models and Some Preliminary Results

We consider two game-theoretic models. We formulate the first game model for Markov decision processes with an average cost optimization criterion and call it the stochastic positional game with average payoffs. We formulate the second one for Markov decision processes with a discounted cost optimization criterion and call it the stochastic positional game with discounted payoffs. Then we show the relationship of these games to Shapley stochastic games.

2.1. Stochastic Positional Games with Average Payoffs

To formulate the stochastic positional game with average payoffs we shall use the framework of a Markov decision process (X, A, p, c) with a finite set of states X, a finite set of actions A, a transition probability function p : X × X × A → [0, 1] that satisfies the condition

∑_{y∈X} p^a_{x,y} = 1, ∀x ∈ X, ∀a ∈ A,

and a transition cost function c : X × X → ℝ which gives the costs c_{x,y} of the transitions of the dynamical system from an arbitrary state x ∈ X to another state y ∈ X (see Howard, 1960; Puterman, 2005). For the noncooperative game model with m players we assume that m transition cost functions

c^i : X × X → ℝ, i = 1, 2, …, m,

are given, where c^i_{x,y} expresses the cost of the system's transition from the state x ∈ X to the state y ∈ X for the player i ∈ {1, 2, …, m}. In addition we assume that the set of states X is divided into m disjoint subsets X_1, X_2, …, X_m,

X = X_1 ∪ X_2 ∪ ⋯ ∪ X_m (X_i ∩ X_j = ∅, ∀i ≠ j),

where X_i represents the position set of the player i ∈ {1, 2, …, m}. So, the Markov process is controlled by m players, where each player i ∈ {1, 2, …, m} fixes actions in his positions x ∈ X_i. We assume that each player fixes actions in the states from his position set using stationary strategies, i.e. we define the stationary strategies of the players as m maps:

s^i : x → a ∈ A_i(x) for x ∈ X_i, i = 1, 2, …, m,

where A_i(x) is the set of actions of the player i in the state x ∈ X_i. Without loss of generality we may consider |A_i(x)| = |A_i| = |A|, ∀x ∈ X_i, i = 1, 2, …, m. In order to simplify the notation we denote the set of possible actions in a state x ∈ X for an arbitrary player by A(x). A stationary strategy s^i, i ∈ {1, 2, …, m}, in the state x ∈ X_i means that at every discrete moment of time t = 0, 1, 2, … the player i uses the action a = s^i(x). The players fix their strategies independently and do not inform each other which strategies they use in the decision process.
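To make these definitions concrete, the data of a stochastic positional game and the Markov chain induced by a situation s = (s^1, …, s^m) can be represented directly. The following is a minimal sketch in Python; the two states, action labels, probabilities and the player partition are hypothetical data invented for illustration:

```python
# A minimal sketch (hypothetical data) of how a stochastic positional game
# can be represented: transition probabilities p[x][a][y], and a partition of
# states among the players.  A situation s = (s^1, ..., s^m) is a dict mapping
# each state to the action chosen by the player controlling it, and induces a
# simple Markov chain with matrix P_s.

# two states, two actions; every row of p[x][a] sums to 1
p = {
    0: {"a": {0: 0.5, 1: 0.5}, "b": {0: 1.0, 1: 0.0}},
    1: {"a": {0: 0.2, 1: 0.8}, "b": {0: 0.9, 1: 0.1}},
}
position_set = {0: 1, 1: 2}   # state 0 belongs to player 1, state 1 to player 2

def induced_chain(p, situation):
    """Build the transition matrix P_s of the Markov chain induced by a situation."""
    return {x: dict(p[x][situation[x]]) for x in p}

situation = {0: "a", 1: "b"}  # s^1 picks "a" in state 0, s^2 picks "b" in state 1
P_s = induced_chain(p, situation)

# sanity check: every row of P_s is a probability distribution
for x, row in P_s.items():
    assert abs(sum(row.values()) - 1.0) < 1e-12
```

Since each state is controlled by exactly one player, the induced chain depends only on the action chosen by the controlling player in that state, which is the defining feature of the positional structure.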

If the players 1, 2, …, m fix their stationary strategies s^1, s^2, …, s^m, respectively, then we obtain a situation s = (s^1, s^2, …, s^m). This situation corresponds to a simple Markov process determined by the probability distributions p^{s^i(x)}_{x,y} in the states x ∈ X_i for i = 1, 2, …, m. We denote by P_s = (p^s_{x,y}) the matrix of probability transitions of this Markov process. If the starting state x_0 is given, then for the Markov process with the matrix of probability transitions P_s we can determine the average cost per transition ω^i_{x_0}(s^1, s^2, …, s^m) with respect to each player i ∈ {1, 2, …, m}, taking into account the corresponding matrix of transition costs C^i = (c^i_{x,y}). So, on the set of situations we can define the payoff functions of the players as follows:

F^i_{x_0}(s^1, s^2, …, s^m) = ω^i_{x_0}(s^1, s^2, …, s^m), i = 1, 2, …, m.
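When the induced chain is ergodic, the average cost per transition of player i equals ∑_x π_x ∑_y p^s_{x,y} c^i_{x,y}, where π is the stationary distribution of P_s (and the value does not depend on the starting state). A sketch with hypothetical two-state data; power iteration stands in for an exact linear solve of the stationarity equations:

```python
# Average cost per transition of a player for a fixed situation, computed from
# the stationary distribution of the induced ergodic chain (hypothetical data).

P_s = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}   # ergodic 2-state chain
c1  = {0: {0: 1.0, 1: 3.0}, 1: {0: 2.0, 1: 0.0}}   # transition costs of player 1

def stationary(P, iters=10_000):
    """Power iteration for the stationary distribution of an ergodic chain."""
    pi = {x: 1.0 / len(P) for x in P}
    for _ in range(iters):
        pi = {y: sum(pi[x] * P[x].get(y, 0.0) for x in P) for y in P}
    return pi

def average_cost(P, c):
    """omega = sum_x pi_x * sum_y P[x][y] * c[x][y]."""
    pi = stationary(P)
    return sum(pi[x] * sum(P[x][y] * c[x][y] for y in P[x]) for x in P)

omega1 = average_cost(P_s, c1)   # average cost per transition of player 1
```

For this chain the stationary distribution is (9/14, 5/14), the expected immediate costs are 2.0 and 1.8, and the average cost comes out to 27/14.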

In such a way we obtain a discrete noncooperative game in normal form which is determined by the finite sets of strategies S^1, S^2, …, S^m of the m players and the payoff functions defined above. In this game we seek a Nash equilibrium (see Nash, 1951), i.e., we consider the problem of determining stationary strategies

s^{1*}, s^{2*}, …, s^{m*}

such that

F^i_{x_0}(s^{1*}, …, s^{i−1*}, s^{i*}, s^{i+1*}, …, s^{m*}) ≤ F^i_{x_0}(s^{1*}, …, s^{i−1*}, s^i, s^{i+1*}, …, s^{m*}), ∀s^i ∈ S^i, i = 1, 2, …, m.

The game defined above is determined uniquely by the set of states X, the position sets X_1, X_2, …, X_m, the set of actions A, the cost functions c^i : X × X → ℝ, i = 1, 2, …, m, the probability function p : X × X × A → [0, 1] and the starting position x_0. Therefore, we denote this game by (X, A, {X_i}_{i=1,m}, {c^i}_{i=1,m}, p, x_0). In the case m = 2 and c^2 = −c^1 we obtain an antagonistic stochastic positional game. If p^a_{x,y} ∈ {0, 1}, ∀x, y ∈ X, ∀a ∈ A, the stochastic positional game (X, A, {X_i}_{i=1,m}, {c^i}_{i=1,m}, p, x_0) is transformed into the cyclic game (Ehrenfeucht and Mycielski, 1979, Gurvich et al., 1988, Condon, 1992, Lozovanu and Pickl, 2006). Some results concerning the existence of Nash equilibria for stochastic positional games with average payoffs have been derived by Lozovanu et al., 2011. In particular the following theorem has been proven.

Theorem 1. If for an arbitrary situation s = (s^1, s^2, …, s^m) of the stochastic positional game with average payoffs the matrix of probability transitions P_s = (p^s_{x,y}) induces an ergodic Markov chain, then for the game there exists a Nash equilibrium.

If the matrix P_s for some situations does not correspond to an ergodic Markov chain, then a Nash equilibrium for the stochastic positional game with average payoffs may not exist. This follows from the constructive proof of the theorem (see Lozovanu et al., 2011). An example of a deterministic positional game with average payoffs for which a Nash equilibrium does not exist has been constructed by Gurvich et al., 1988. However, in the case of antagonistic stochastic positional games saddle points always exist (Lozovanu and Pickl, 2014), i.e. in this case the following theorem holds.

Theorem 2. For an arbitrary antagonistic stochastic positional game there exists a saddle point.


The existence of saddle points for deterministic positional games with average payoffs has been proven by Ehrenfeucht and Mycielski, 1979, and Gurvich et al., 1988.

2.2. Stochastic Positional Games with Discounted Payoffs

We formulate the stochastic positional game with discounted payoffs in a similar way to the game from Section 2.1. We assume that for the Markov process m transition cost functions c^i : X × X → ℝ, i = 1, 2, …, m, are given and the set of states X is divided into m disjoint subsets X_1, X_2, …, X_m, where X_i represents the position set of the player i ∈ {1, 2, …, m}. The Markov process is controlled by m players, where each player i ∈ {1, 2, …, m} fixes actions in his positions x ∈ X_i using stationary strategies, i.e. the stationary strategies of the players in this game are defined as m maps:

s^i : x → a ∈ A(x) for x ∈ X_i, i = 1, 2, …, m.

Let s^1, s^2, …, s^m be a set of stationary strategies of the players that determine the situation s = (s^1, s^2, …, s^m). Consider the matrix of probability transitions P_s = (p^s_{x,y}) which is induced by the situation s, i.e., each row of this matrix corresponds to a probability distribution p^{s^i(x)}_{x,y} in the state x, where x ∈ X_i. If the starting state x_0 is given, then for the Markov process with the matrix of probability transitions P_s we can determine the discounted expected total cost σ^i_{x_0}(s^1, s^2, …, s^m) with respect to each player i ∈ {1, 2, …, m}, taking into account the corresponding matrix of transition costs C^i = (c^i_{x,y}). So, on the set of situations we can define the payoff functions of the players as follows:

F^i_{x_0}(s^1, s^2, …, s^m) = σ^i_{x_0}(s^1, s^2, …, s^m), i = 1, 2, …, m.

In such a way we obtain a new discrete noncooperative game in normal form which is determined by the sets of strategies S^1, S^2, …, S^m of the m players and the payoff functions defined above. In this game we seek a Nash equilibrium. We denote the stochastic positional game with discounted payoffs by (X, A, {X_i}_{i=1,m}, {c^i}_{i=1,m}, p, γ, x_0).

For this game the following result has been proven (Lozovanu, 2011).

Theorem 3. For an arbitrary stochastic positional game (X, A, {X_i}_{i=1,m}, {c^i}_{i=1,m}, p, γ, x_0) with a given discount factor 0 < γ < 1 there exists a Nash equilibrium.

Based on the constructive proofs of Theorems 1 and 3 some iterative procedures for determining Nash equilibria in the considered positional games have been proposed (see Lozovanu et al., 2011).
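For a fixed situation s the vector of discounted expected total costs of a player satisfies σ = μ_s + γ P_s σ, where μ_s collects the expected immediate costs; for 0 < γ < 1 this is a contraction, so σ can be obtained by simple fixed-point iteration (or directly as σ = (I − γP_s)^{−1} μ_s). A sketch with hypothetical data:

```python
# Discounted expected total cost of one player for a fixed situation,
# computed by fixed-point iteration of sigma = mu + gamma * P_s * sigma
# (hypothetical two-state data).

P_s   = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}
mu    = {0: 2.0, 1: 1.8}      # expected immediate costs mu_x under the situation
gamma = 0.9

def discounted_cost(P, mu, gamma, iters=2000):
    """Iterate the contraction sigma <- mu + gamma * P * sigma to convergence."""
    sigma = {x: 0.0 for x in P}
    for _ in range(iters):
        sigma = {x: mu[x] + gamma * sum(P[x][y] * sigma[y] for y in P[x])
                 for x in P}
    return sigma

sigma = discounted_cost(P_s, mu, gamma)
```

Solving (I − γP_s)σ = μ_s exactly for this data gives σ_0 = 2.63/0.136 and σ_1 = 2.61/0.136, which the iteration reproduces to machine precision.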

2.3. The Relationship of Stochastic Positional Games with Shapley Stochastic Games

A stochastic game in the sense of Shapley (see Shapley, 1953) is a dynamic game with probabilistic transitions played by several players in a sequence of stages, where the beginning of each stage corresponds to a state of the dynamical system. The game starts at a given state from the set of states of the system. At each stage players select actions from their feasible sets of actions and each player receives a stage payoff that depends on the current state and the chosen actions. The game then moves to a new random state the distribution of which depends on the previous state and the actions chosen by the players. The procedure is repeated at a new state and the play continues for a finite or infinite number of stages. The total payoff of a player is either the limit inferior of the average of the stage payoffs or the discounted sum of the stage payoffs.

So, an average Shapley stochastic game with m players consists of the following elements:

1. A state space X (which we assume to be finite);

2. A finite set A_i(x) of actions with respect to each player i ∈ {1, 2, …, m} for an arbitrary state x ∈ X;

3. A stage payoff f^i(x, a) with respect to each player i ∈ {1, 2, …, m} for each state x ∈ X and for an arbitrary action vector a ∈ ∏_i A_i(x);

4. A transition probability function p : X × ∏_{x∈X} ∏_i A_i(x) × X → [0, 1] that gives the probability transitions p^a_{x,y} from an arbitrary x ∈ X to an arbitrary y ∈ X for a fixed action vector a ∈ ∏_i A_i(x), where ∑_{y∈X} p^a_{x,y} = 1, ∀x ∈ X, ∀a ∈ ∏_i A_i(x);

5. A starting state x0 e X.

The stochastic game starts in state x_0. At stage t the players observe the state x_t and simultaneously choose actions a^i_t ∈ A_i(x_t), i = 1, 2, …, m. Then nature selects a state x_{t+1} according to the probability transitions p^{a_t}_{x_t,y} for the fixed action vector a_t = (a^1_t, a^2_t, …, a^m_t). A play of the stochastic game x_0, a_0, x_1, a_1, …, x_t, a_t, … defines a stream of payoffs f^i_0, f^i_1, f^i_2, …, where f^i_t = f^i(x_t, a_t), t = 0, 1, 2, …. The t-stage average stochastic game is the game where the payoff of player i ∈ {1, 2, …, m} is

F^i_t = (1/t) ∑_{τ=0}^{t−1} f^i_τ.

The infinite average stochastic game is the game where the payoff of player i ∈ {1, 2, …, m} is

F^i = lim inf_{t→∞} F^i_t.

In a similar way a Shapley stochastic game with expected discounted payoffs of the players is defined. In such a game, in addition to the elements described above, a discount factor λ (0 < λ < 1) is given and the total payoff of a player represents the expected discounted sum of the stage payoffs.

Comparing Shapley stochastic games with stochastic positional games we can observe the following. The probability transitions from one state to another as well as the stage payoffs of the players in a Shapley stochastic game depend on the actions chosen by all players, while the probability transitions from one state to another as well as the stage payoffs (the immediate costs of the players) in a stochastic positional game depend only on the action of the player that controls the state in his position set. This means that a stochastic positional game can be regarded as a special case of a Shapley stochastic game. Nevertheless we can see that stochastic positional games can be used for studying some classes of Shapley stochastic games.

The main results concerning the determination of Nash equilibria in Shapley stochastic games have been obtained by Gillette, 1957, Mertens and Neyman, 1981, Filar and Vrieze, 1997, Lal and Sinha, 1992, Neyman and Sorin, 2003. The existence of Nash equilibria for such games has been proven in the case of stochastic games with a finite set of stages and in the case of games with infinitely many stages if the total payoff of each player is the discounted sum of stage payoffs. If the total payoff of a player represents the limit inferior of the average of the stage payoffs, then the existence of a Nash equilibrium in Shapley stochastic games is an open question. Based on the results mentioned in the previous sections we can show that in the case of average non-antagonistic stochastic games a Nash equilibrium may not exist. In order to prove this we can use the average stochastic positional game (X, A, {X_i}_{i=1,m}, {c^i}_{i=1,m}, p, x_0) from Section 2. It is easy to observe that this game can be regarded as a Shapley stochastic game with average payoff functions of the players, where for a fixed situation s = (s^1, s^2, …, s^m) the probability transition p^{s^i(x)}_{x,y} from a state x = x(t) ∈ X_i to a state y = x(t + 1) ∈ X depends only on the strategy s^i of player i, and the corresponding stage payoff in the state x of player i ∈ {1, 2, …, m} is equal to ∑_{y∈X} p^{s^i(x)}_{x,y} c^i_{x,y}. Taking into account that the cyclic game represents a particular case of the average stochastic positional game and that for the cyclic game a Nash equilibrium may not exist (see Gurvich et al., 1988), we obtain that for the average non-antagonistic Shapley stochastic game a Nash equilibrium may not exist. However, in the case of average payoffs Theorem 1 can be extended to Shapley stochastic games.

3. Nash Equilibria Conditions for Stochastic Positional Games with Average Payoffs

In this section we formulate Nash equilibria conditions for stochastic positional games in terms of bias equations for Markov decision processes. Nash equilibria conditions in such terms may be more useful for determining the optimal strategies of the players.

Theorem 4. Let (X, A, {X_i}_{i=1,m}, {c^i}_{i=1,m}, p, x) be a stochastic positional game with a given starting position x ∈ X and average payoff functions

F^1_x(s^1, s^2, …, s^m), F^2_x(s^1, s^2, …, s^m), …, F^m_x(s^1, s^2, …, s^m)

of the players 1, 2, …, m, respectively. Assume that for an arbitrary situation s = (s^1, s^2, …, s^m) of the game the transition probability matrix P_s = (p^s_{x,y}) corresponds to an ergodic Markov chain. Then there exist functions

ε^i : X → ℝ, i = 1, 2, …, m,

and values ω^1, ω^2, …, ω^m that satisfy the following conditions:

1) μ^i_{x,a} + ∑_{y∈X} p^a_{x,y} ε^i_y − ε^i_x − ω^i ≥ 0, ∀x ∈ X_i, ∀a ∈ A(x), i = 1, 2, …, m,

where μ^i_{x,a} = ∑_{y∈X} p^a_{x,y} c^i_{x,y};

2) min_{a∈A(x)} {μ^i_{x,a} + ∑_{y∈X} p^a_{x,y} ε^i_y − ε^i_x − ω^i} = 0, ∀x ∈ X_i, i = 1, 2, …, m;

3) on each position set X_i, i ∈ {1, 2, …, m}, there exists a map s^{i*} : X_i → A such that

s^{i*}(x) = a* ∈ Arg min_{a∈A(x)} {μ^i_{x,a} + ∑_{y∈X} p^a_{x,y} ε^i_y − ε^i_x − ω^i}

and

μ^j_{x,a*} + ∑_{y∈X} p^{a*}_{x,y} ε^j_y − ε^j_x − ω^j = 0, ∀x ∈ X_i, j = 1, 2, …, m.

The set of maps s^{1*}, s^{2*}, …, s^{m*} determines a Nash equilibrium situation s* = (s^{1*}, s^{2*}, …, s^{m*}) for the stochastic positional game (X, A, {X_i}_{i=1,m}, {c^i}_{i=1,m}, p, x) and

F^i_x(s^{1*}, s^{2*}, …, s^{m*}) = ω^i, ∀x ∈ X, i = 1, 2, …, m.

Moreover, the situation s* = (s^{1*}, s^{2*}, …, s^{m*}) is a Nash equilibrium for an arbitrary starting position x ∈ X.
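Conditions 1)-2) of Theorem 4 are straightforward to check numerically for a candidate solution. Below is a sketch for the one-player case (m = 1) with hypothetical data: the "reduced cost" μ_{x,a} + ∑_y p^a_{x,y} ε_y − ε_x − ω must be nonnegative for every action and its minimum over A(x) must be zero in every state:

```python
# Numerical check of the bias-equation conditions 1)-2) of Theorem 4
# (one-player case, hypothetical data).

def reduced_cost(mu, p, eps, omega, x, a):
    """mu_{x,a} + sum_y p^a_{x,y} eps_y - eps_x - omega."""
    return mu[x][a] + sum(p[x][a][y] * eps[y] for y in p[x][a]) - eps[x] - omega

def check_bias_conditions(mu, p, eps, omega, states, tol=1e-9):
    """Condition 1): all reduced costs >= 0; condition 2): min over A(x) is 0."""
    for x in states:
        m = min(reduced_cost(mu, p, eps, omega, x, a) for a in p[x])
        if m < -tol or m > tol:
            return False
    return True

# deterministic 2-state example: action "a" cycles 0 -> 1 -> 0 with expected
# immediate costs 0 and 2, action "b" loops in state 0 with cost 2;
# the optimal average cost per transition is omega = 1
p  = {0: {"a": {1: 1.0}, "b": {0: 1.0}}, 1: {"a": {0: 1.0}}}
mu = {0: {"a": 0.0, "b": 2.0}, 1: {"a": 2.0}}
eps, omega = {0: 0.0, 1: 1.0}, 1.0
ok = check_bias_conditions(mu, p, eps, omega, states=[0, 1])
```

For these data the reduced costs are 0 for the cycling action in both states and 1 for the self-loop, so the conditions hold; with a wrong candidate value (e.g. ω = 0.5) the check fails.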

Proof. Let a stochastic positional game with average payoffs be given and assume that for an arbitrary situation s of the game the transition probability matrix P_s = (p^s_{x,y}) corresponds to an ergodic Markov chain. Then according to Theorem 1 for this game there exists a Nash equilibrium s* = (s^{1*}, s^{2*}, …, s^{m*}) and we can set

ω^i = F^i_x(s^{1*}, s^{2*}, …, s^{m*}), ∀x ∈ X, i = 1, 2, …, m.

Let us fix the strategies s^{1*}, s^{2*}, …, s^{i−1*}, s^{i+1*}, …, s^{m*} of the players 1, 2, …, i − 1, i + 1, …, m and consider the problem of determining the minimal average cost per transition with respect to player i. Obviously, if we solve this decision problem then we obtain the strategy s^{i*}. We can determine the optimal strategy of this decision problem with an average cost optimization criterion using the bias equations with respect to player i. This means that there exist functions ε^i : X → ℝ and values ω^i, i = 1, 2, …, m, that satisfy the conditions:

1) μ^i_{x,a} + ∑_{y∈X} p^a_{x,y} ε^i_y − ε^i_x − ω^i ≥ 0, ∀x ∈ X_i, ∀a ∈ A(x);

2) min_{a∈A(x)} {μ^i_{x,a} + ∑_{y∈X} p^a_{x,y} ε^i_y − ε^i_x − ω^i} = 0, ∀x ∈ X_i.

Moreover, for the fixed strategies s^{1*}, s^{2*}, …, s^{i−1*}, s^{i+1*}, …, s^{m*} of the corresponding players 1, 2, …, i − 1, i + 1, …, m we can select the strategy s^{i*} of player i where

s^{i*}(x) ∈ Arg min_{a∈A(x)} {μ^i_{x,a} + ∑_{y∈X} p^a_{x,y} ε^i_y − ε^i_x − ω^i}

and ω^i = F^i_x(s^{1*}, s^{2*}, …, s^{m*}), ∀x ∈ X, i = 1, 2, …, m. This means that conditions 1)-3) of the theorem hold.

Corollary 1. If for a stochastic positional game (X, A, {X_i}_{i=1,m}, {c^i}_{i=1,m}, p, x) with average payoffs there exists a Nash equilibrium s* = (s^{1*}, s^{2*}, …, s^{m*}) which is a Nash equilibrium for an arbitrary starting position x ∈ X of the game, and for arbitrary two different starting positions x, y ∈ X it holds that F^i_x(s^{1*}, s^{2*}, …, s^{m*}) = F^i_y(s^{1*}, s^{2*}, …, s^{m*}), then there exist functions

ε^i : X → ℝ, i = 1, 2, …, m,

and values ω^1, ω^2, …, ω^m that satisfy conditions 1)-3) from Theorem 4. So, ω^i = F^i_x(s^{1*}, s^{2*}, …, s^{m*}), ∀x ∈ X, i = 1, 2, …, m, and an arbitrary Nash equilibrium can be found by fixing

s^{i*}(x) = a* ∈ Arg min_{a∈A(x)} {μ^i_{x,a} + ∑_{y∈X} p^a_{x,y} ε^i_y − ε^i_x − ω^i}.

Using the elementary properties of non-ergodic Markov decision processes with an average cost optimization criterion the following lemma can be obtained.

Lemma 1. Let (X, A, {X_i}_{i=1,m}, {c^i}_{i=1,m}, p, x) be an average stochastic positional game for which there exists a Nash equilibrium s* = (s^{1*}, s^{2*}, …, s^{m*}) which is a Nash equilibrium for an arbitrary starting position of the game, with ω^i_x = F^i_x(s^{1*}, s^{2*}, …, s^{m*}). Then s* = (s^{1*}, s^{2*}, …, s^{m*}) is a Nash equilibrium for the average stochastic positional game (X, A, {X_i}_{i=1,m}, {c̄^i}_{i=1,m}, p, x), where

c̄^i_{x,y} = c^i_{x,y} − ω^i_x, ∀x, y ∈ X, i = 1, 2, …, m,

and

F̄^i_x(s^{1*}, s^{2*}, …, s^{m*}) = 0, ∀x ∈ X, i = 1, 2, …, m.

Now using Corollary 1 and Lemma 1 we can prove the following results.

Theorem 5. Let (X, A, {X_i}_{i=1,m}, {c^i}_{i=1,m}, p, x) be an average stochastic positional game. Then in this game there exists a Nash equilibrium for an arbitrary starting position x ∈ X if and only if there exist functions

ε^i : X → ℝ, i = 1, 2, …, m,

and values ω^1_x, ω^2_x, …, ω^m_x for x ∈ X that satisfy the following conditions:

1) μ^i_{x,a} + ∑_{y∈X} p^a_{x,y} ε^i_y − ε^i_x − ω^i_x ≥ 0, ∀x ∈ X_i, ∀a ∈ A(x), i = 1, 2, …, m,

where μ^i_{x,a} = ∑_{y∈X} p^a_{x,y} c^i_{x,y};

2) min_{a∈A(x)} {μ^i_{x,a} + ∑_{y∈X} p^a_{x,y} ε^i_y − ε^i_x − ω^i_x} = 0, ∀x ∈ X_i, i = 1, 2, …, m;

3) on each position set X_i, i ∈ {1, 2, …, m}, there exists a map s^{i*} : X_i → A such that

s^{i*}(x) = a* ∈ Arg min_{a∈A(x)} {μ^i_{x,a} + ∑_{y∈X} p^a_{x,y} ε^i_y − ε^i_x − ω^i_x}

and

μ^j_{x,a*} + ∑_{y∈X} p^{a*}_{x,y} ε^j_y − ε^j_x − ω^j_x = 0, ∀x ∈ X_i, j = 1, 2, …, m.

If such conditions hold, then the set of maps s^{1*}, s^{2*}, …, s^{m*} determines a Nash equilibrium of the game for an arbitrary starting position x ∈ X and

F^i_x(s^{1*}, s^{2*}, …, s^{m*}) = ω^i_x, i = 1, 2, …, m.

Proof. The sufficiency condition of the theorem is evident. Let us prove the necessity. Assume that for the considered average stochastic positional game there exists a Nash equilibrium s* = (s^{1*}, s^{2*}, …, s^{m*}) which is a Nash equilibrium for an arbitrary starting position of the game. Denote

ω^i_x = F^i_x(s^{1*}, s^{2*}, …, s^{m*}), ∀x ∈ X, i = 1, 2, …, m,

and consider the following auxiliary game (X, A, {X_i}_{i=1,m}, {c̄^i}_{i=1,m}, p, x), where

c̄^i_{x,y} = c^i_{x,y} − ω^i_x, ∀x, y ∈ X, i = 1, 2, …, m.

Then according to Lemma 1 the auxiliary game has the same Nash equilibrium s* = (s^{1*}, s^{2*}, …, s^{m*}) as the initial one. Moreover, this equilibrium is a Nash equilibrium for an arbitrary starting position of the game and

F̄^i_x(s^{1*}, s^{2*}, …, s^{m*}) = 0, ∀x ∈ X, i = 1, 2, …, m.

Therefore, according to Corollary 1, for the auxiliary game there exist functions

ε^i : X → ℝ, i = 1, 2, …, m,

and values ω̄^1_x, ω̄^2_x, …, ω̄^m_x (ω̄^i_x = 0, i = 1, 2, …, m) that satisfy the conditions of Theorem 4, i.e.

1) μ̄^i_{x,a} + ∑_{y∈X} p^a_{x,y} ε^i_y − ε^i_x − ω̄^i_x ≥ 0, ∀x ∈ X_i, ∀a ∈ A(x), i = 1, 2, …, m,

where μ̄^i_{x,a} = ∑_{y∈X} p^a_{x,y} c̄^i_{x,y};

2) min_{a∈A(x)} {μ̄^i_{x,a} + ∑_{y∈X} p^a_{x,y} ε^i_y − ε^i_x − ω̄^i_x} = 0, ∀x ∈ X_i, i = 1, 2, …, m;

3) on each position set X_i, i ∈ {1, 2, …, m}, there exists a map s^{i*} : X_i → A such that

s^{i*}(x) = a* ∈ Arg min_{a∈A(x)} {μ̄^i_{x,a} + ∑_{y∈X} p^a_{x,y} ε^i_y − ε^i_x − ω̄^i_x}

and

μ̄^j_{x,a*} + ∑_{y∈X} p^{a*}_{x,y} ε^j_y − ε^j_x = 0, ∀x ∈ X_i, j = 1, 2, …, m.

Taking into account that ω̄^i_x = 0 and μ̄^i_{x,a} = μ^i_{x,a} − ω^i_x (because c̄^i_{x,y} = c^i_{x,y} − ω^i_x), we obtain conditions 1)-3) of the theorem.
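The cost-shift identity used in the last step of the proof follows directly from the fact that each row of the transition probability matrix sums to one:

```latex
\bar{\mu}^i_{x,a}
  = \sum_{y\in X} p^a_{x,y}\,\bar{c}^i_{x,y}
  = \sum_{y\in X} p^a_{x,y}\,\bigl(c^i_{x,y}-\omega^i_x\bigr)
  = \mu^i_{x,a}-\omega^i_x\sum_{y\in X} p^a_{x,y}
  = \mu^i_{x,a}-\omega^i_x .
```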

4. Nash Equilibria Conditions for Stochastic Positional Games with Discounted Payoffs

Now we formulate Nash equilibria conditions in terms of bias equations for stochastic positional games with discounted payoffs.

Theorem 6. Let a stochastic positional game (X, A, {X_i}_{i=1,m}, {c^i}_{i=1,m}, p, γ, x) with a discount factor 0 < γ < 1 be given. Then there exist values σ^i_x, i = 1, 2, …, m, for x ∈ X that satisfy the following conditions:

1) μ^i_{x,a} + γ ∑_{y∈X} p^a_{x,y} σ^i_y − σ^i_x ≥ 0, ∀x ∈ X_i, ∀a ∈ A(x), i = 1, 2, …, m,

where μ^i_{x,a} = ∑_{y∈X} p^a_{x,y} c^i_{x,y};

2) min_{a∈A(x)} {μ^i_{x,a} + γ ∑_{y∈X} p^a_{x,y} σ^i_y − σ^i_x} = 0, ∀x ∈ X_i, i = 1, 2, …, m;

3) on each position set X_i, i ∈ {1, 2, …, m}, there exists a map s^{i*} : X_i → A such that

s^{i*}(x) = a* ∈ Arg min_{a∈A(x)} {μ^i_{x,a} + γ ∑_{y∈X} p^a_{x,y} σ^i_y − σ^i_x}, ∀x ∈ X_i,

and

μ^j_{x,a*} + γ ∑_{y∈X} p^{a*}_{x,y} σ^j_y − σ^j_x = 0, ∀x ∈ X_i, j = 1, 2, …, m.

The set of maps s^{1*}, s^{2*}, …, s^{m*} determines a Nash equilibrium situation s* = (s^{1*}, s^{2*}, …, s^{m*}) for the stochastic positional game with discounted payoffs, where

F^i_x(s^{1*}, s^{2*}, …, s^{m*}) = σ^i_x, ∀x ∈ X, i = 1, 2, …, m.

Moreover, the situation s* = (s^{1*}, s^{2*}, …, s^{m*}) is a Nash equilibrium for an arbitrary starting position x ∈ X.
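Condition 2) of Theorem 6 is the discounted optimality equation σ^i_x = min_{a∈A(x)} {μ^i_{x,a} + γ ∑_y p^a_{x,y} σ^i_y} on the player's own position set. In the one-player case its unique solution can be recovered by value iteration, which converges for 0 < γ < 1; a sketch with hypothetical data:

```python
# Value iteration for the discounted optimality equation of condition 2)
# (one-player case, hypothetical data).

p     = {0: {"a": {1: 1.0}, "b": {0: 1.0}}, 1: {"a": {0: 1.0}}}
mu    = {0: {"a": 0.0, "b": 2.0}, 1: {"a": 2.0}}
gamma = 0.5

def value_iteration(p, mu, gamma, iters=200):
    """Iterate sigma_x <- min_a { mu[x][a] + gamma * sum_y p[x][a][y]*sigma_y }."""
    sigma = {x: 0.0 for x in p}
    for _ in range(iters):
        sigma = {x: min(mu[x][a] + gamma * sum(p[x][a][y] * sigma[y]
                                               for y in p[x][a])
                        for a in p[x])
                 for x in p}
    return sigma

sigma = value_iteration(p, mu, gamma)
```

For these data the fixed point is σ_0 = 4/3, σ_1 = 8/3, attained by action "a" in state 0, and one can verify directly that the minimum of the reduced expression μ_{x,a} + γ ∑_y p^a_{x,y} σ_y − σ_x over the actions is zero in both states.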

Proof. According to Theorem 3, for the discounted stochastic positional game (X, A, {X_i}_{i=1,m}, {c^i}_{i=1,m}, p, γ, x) there exists a Nash equilibrium s* = (s^{1*}, s^{2*}, …, s^{m*}) which is a Nash equilibrium for an arbitrary starting position x ∈ X of the game. Denote

σ^i_x = F^i_x(s^{1*}, s^{2*}, …, s^{m*}), ∀x ∈ X, i = 1, 2, …, m.

Let us fix the strategies s^{1*}, s^{2*}, …, s^{i−1*}, s^{i+1*}, …, s^{m*} of the players 1, 2, …, i − 1, i + 1, …, m and consider the problem of determining the minimal expected total discounted cost with respect to player i. Obviously, the optimal stationary strategy for this problem is s^{i*}. Then according to the properties of the bias equations for this Markov decision problem with discounted costs there exist values σ^i_x, i = 1, 2, …, m, for x ∈ X that satisfy the conditions:

1) μ^i_{x,a} + γ ∑_{y∈X} p^a_{x,y} σ^i_y − σ^i_x ≥ 0, ∀x ∈ X_i, ∀a ∈ A(x), i = 1, 2, …, m;

2) min_{a∈A(x)} {μ^i_{x,a} + γ ∑_{y∈X} p^a_{x,y} σ^i_y − σ^i_x} = 0, ∀x ∈ X_i, i = 1, 2, …, m.

Moreover, for the fixed strategies s^{1*}, s^{2*}, …, s^{i−1*}, s^{i+1*}, …, s^{m*} of the corresponding players 1, 2, …, i − 1, i + 1, …, m we can select the strategy s^{i*} of player i where

s^{i*}(x) ∈ Arg min_{a∈A(x)} {μ^i_{x,a} + γ ∑_{y∈X} p^a_{x,y} σ^i_y − σ^i_x}

and

F^i_x(s^{1*}, s^{2*}, …, s^{m*}) = σ^i_x, ∀x ∈ X, i = 1, 2, …, m.

This means that conditions 1)-3) of the theorem hold.

5. Saddle Point Conditions for Antagonistic Stochastic Positional Games

The antagonistic stochastic positional game with average payoff corresponds to the game from Section 2 in the case m = 2 with c = c^1 = −c^2. So, we have a game (X, A, X_1, X_2, c, p, x) where the stationary strategies s^1 and s^2 of the players are defined as two maps

s^1 : x → a ∈ A_1(x) for x ∈ X_1; s^2 : x → a ∈ A_2(x) for x ∈ X_2,

and the payoff function F_x(s^1, s^2) of the players is determined by the values of the average costs ω^s_x in the Markov processes with the corresponding probability matrices P_s induced by the situations s = (s^1, s^2) ∈ S. For this game a saddle point (s^{1*}, s^{2*}) always exists (Lozovanu and Pickl, 2014), i.e. for a given starting position x ∈ X it holds that

F_x(s^{1*}, s^{2*}) = min_{s^1∈S^1} max_{s^2∈S^2} F_x(s^1, s^2) = max_{s^2∈S^2} min_{s^1∈S^1} F_x(s^1, s^2).

Theorem 7. Let (X, A, X_1, X_2, c, p, x) be an arbitrary antagonistic stochastic positional game with an average payoff function F_x(s^1, s^2). Then the system of equations

ε_x + ω_x = max_{a∈A(x)} {μ_{x,a} + ∑_{y∈X} p^a_{x,y} ε_y}, ∀x ∈ X_1;

ε_x + ω_x = min_{a∈A(x)} {μ_{x,a} + ∑_{y∈X} p^a_{x,y} ε_y}, ∀x ∈ X_2,

has a solution on the set of solutions of the system of equations

ω_x = max_{a∈A(x)} {∑_{y∈X} p^a_{x,y} ω_y}, ∀x ∈ X_1;

ω_x = min_{a∈A(x)} {∑_{y∈X} p^a_{x,y} ω_y}, ∀x ∈ X_2,

i.e. the last system of equations has a solution ω_x, x ∈ X, for which there exists a solution ε_x, x ∈ X, of the system of equations

ε_x + ω_x = max_{a∈A(x)} {μ_{x,a} + ∑_{y∈X} p^a_{x,y} ε_y}, ∀x ∈ X_1;

ε_x + ω_x = min_{a∈A(x)} {μ_{x,a} + ∑_{y∈X} p^a_{x,y} ε_y}, ∀x ∈ X_2.

The optimal stationary strategies of the players

s^{1*} : x → a^1 ∈ A(x) for x ∈ X_1;

s^{2*} : x → a^2 ∈ A(x) for x ∈ X_2

in the antagonistic stochastic positional game can be found by fixing arbitrary maps s^{1*}(x) ∈ A(x) for x ∈ X_1 and s^{2*}(x) ∈ A(x) for x ∈ X_2 such that

s^{1*}(x) ∈ (Arg max_{a∈A(x)} {∑_{y∈X} p^a_{x,y} ω_y}) ∩ (Arg max_{a∈A(x)} {μ_{x,a} + ∑_{y∈X} p^a_{x,y} ε_y}), ∀x ∈ X_1,

and

s^{2*}(x) ∈ (Arg min_{a∈A(x)} {∑_{y∈X} p^a_{x,y} ω_y}) ∩ (Arg min_{a∈A(x)} {μ_{x,a} + ∑_{y∈X} p^a_{x,y} ε_y}), ∀x ∈ X_2.

For the strategies s^{1*}, s^{2*} the corresponding values of the payoff function F_x(s^{1*}, s^{2*}) coincide with the values ω_x for x ∈ X and

F_x(s^{1*}, s^{2*}) = min_{s^1∈S^1} max_{s^2∈S^2} F_x(s^1, s^2) = max_{s^2∈S^2} min_{s^1∈S^1} F_x(s^1, s^2), ∀x ∈ X.

Based on the constructive proof of this theorem (see Lozovanu and Pickl, 2014) an algorithm for determining the saddle points in antagonistic stochastic positional games has been elaborated. The saddle point conditions for antagonistic stochastic positional games with a discounted payoff can be derived from Theorem 6.
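The strategy-extraction step of Theorem 7 amounts to intersecting two Arg-sets per state: the actions attaining the optimum of ∑_y p^a_{x,y} ω_y and the actions attaining the optimum of μ_{x,a} + ∑_y p^a_{x,y} ε_y (maximum for player 1 on X_1, minimum for player 2 on X_2). A sketch with hypothetical data, where `best` is `max` or `min` depending on the player:

```python
# Extract an optimal action in a state per Theorem 7: it must simultaneously
# attain the optimum of sum_y p^a_{x,y}*omega_y and of
# mu_{x,a} + sum_y p^a_{x,y}*eps_y (hypothetical data below).

def arg_opt(values, best, tol=1e-9):
    """Actions whose value is within tol of the optimal value."""
    opt = best(values.values())
    return {a for a, v in values.items() if abs(v - opt) <= tol}

def extract_action(p, mu, omega, eps, x, best):
    vals_omega = {a: sum(p[x][a][y] * omega[y] for y in p[x][a]) for a in p[x]}
    vals_eps = {a: mu[x][a] + sum(p[x][a][y] * eps[y] for y in p[x][a])
                for a in p[x]}
    common = arg_opt(vals_omega, best) & arg_opt(vals_eps, best)
    return next(iter(common)) if common else None  # Theorem 7: nonempty

# state 0 belongs to X_1 (player 1, best = max); omega is constant here, so
# the first Arg-set keeps both actions and the second one decides
p     = {0: {"a": {1: 1.0}, "b": {0: 1.0}}}
mu    = {0: {"a": 0.0, "b": 2.0}}
omega = {0: 1.0, 1: 1.0}
eps   = {0: 0.0, 1: 1.0}
a1 = extract_action(p, mu, omega, eps, 0, max)
```

In this toy instance both actions attain the first maximum, while the second expression equals 1 for "a" and 2 for "b", so the intersection selects "b".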

6. Conclusion

Stochastic positional games with average and discounted payoffs represent a special class of Shapley stochastic games that extends deterministic positional games. For the considered class of games Nash equilibria conditions have been formulated and proven. Based on these results new algorithms for determining the optimal stationary strategies of the players can be elaborated.

References

Condon, A. (1992). The complexity of stochastic games. Information and Computation, 96(2), 203-224.

Ehrenfeucht, A., Mycielski, J. (1979). Positional strategies for mean payoff games. International Journal of Game Theory, 8, 109-113.

Filar, J.A., Vrieze, K. (1997). Competitive Markov Decision Processes. Springer.

Gillette, D. (1957). Stochastic games with zero stop probabilities. Contributions to the Theory of Games, vol. III, Princeton, 179-187.

Gurvich, V.A., Karzanov, A.V., Khachian, L.G. (1988). Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Computational Mathematics and Mathematical Physics, 28, 85-91.

Howard, R.A. (1960). Dynamic Programming and Markov Processes. Wiley.

Lal, A.K., Sinha, S. (1992). Zero-sum two-person semi-Markov games. J. Appl. Prob., 29, 56-72.

Lozovanu, D. (2011). The game-theoretical approach to Markov decision problems and determining Nash equilibria for stochastic positional games. Int. J. Mathematical Modelling and Numerical Optimization, 2(2), 162-164.

Lozovanu, D., Pickl, S. (2006). Nash equilibria conditions for cyclic games with p players. Electronic Notes in Discrete Mathematics, 25, 117-124.

Lozovanu, D., Pickl, S. (2009). Optimization and Multiobjective Control of Time-Discrete Systems. Springer.

Lozovanu, D., Pickl, S., Kropat, E. (2011). Markov decision processes and determining Nash equilibria for stochastic positional games. Proceedings of the 18th World Congress IFAC-2011, 13398-13493.

Lozovanu, D., Pickl, S. (2014). Antagonistic positional games in Markov decision processes and algorithms for determining the saddle points. Discrete Applied Mathematics (accepted for publication).

Mertens, J.F., Neyman, A. (1981). Stochastic games. International Journal of Game Theory, 10, 53-66.

Moulin, H. (1976). Prolongement des jeux à deux joueurs de somme nulle. Bull. Soc. Math. France, Mém. 45.

Nash, J.F. (1951). Non-cooperative games. Annals of Mathematics, 54(2), 286-295.

Neyman, A., Sorin, S. (2003). Stochastic Games and Applications. NATO ASI Series, Kluwer Academic Press.

Puterman, M. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, New Jersey.

Shapley L. (1953). Stochastic games. Proc. Natl. Acad. Sci. U.S.A. 39, 1095-1100.
