
Contributions to Game Theory and Management, X, 175—184

Stationary Nash Equilibria for Two-Player Average Stochastic Games with Finite State and Action Spaces

Dmitrii Lozovanu1 and Stefan Pickl2

1 Institute of Mathematics and Computer Science of Moldova Academy of Sciences, Academiei 5, Chisinau, MD-2028, Moldova, E-mail: lozovanu@math.md

2 Institute for Theoretical Computer Science, Mathematics and Operations Research, Universität der Bundeswehr München, 85577 Neubiberg-München, Germany, E-mail: stefan.pickl@unibw.de

Abstract. The problem of the existence and determination of stationary Nash equilibria in two-player average stochastic games with finite state and action spaces is considered. We show that an arbitrary two-player average stochastic game can be formulated in terms of stationary strategies such that each payoff is graph-continuous and quasimonotonic with respect to the players' strategies. Based on this result we ground an approach for determining the optimal stationary strategies of the players in the considered games. Moreover, based on the proposed approach a new proof of the existence of stationary Nash equilibria in two-player average stochastic games is derived, and the known methods for determining optimal strategies in games with quasimonotonic payoffs can be applied.

Keywords: two-player stochastic games, average payoffs, stationary Nash equilibria, optimal stationary strategies

1. Introduction

The aim of this paper is to propose a new approach for determining stationary Nash equilibria in two-player average stochastic games with finite state and action spaces. We ground such an approach by using a new model in stationary strategies for the considered class of average stochastic games. We show that the payoffs of the players in the proposed model are quasimonotonic (i.e. quasiconvex and quasiconcave) with respect to the corresponding strategies of the players and satisfy the graph-continuity property in the sense of Dasgupta and Maskin, 1986. Based on these results a new proof of the existence of stationary Nash equilibria in the considered two-player average stochastic games is obtained and a new approach for determining the optimal stationary strategies of the players is proposed.

Note that two-player stochastic games with average and discounted payoffs have been studied by Mertens and Neyman, 1981, Vieille, 2000, Solan and Vieille, 2010, who proved the existence of stationary Nash equilibria and proposed computing procedures for determining the optimal stationary strategies of the players in two-player stochastic games. The approach we propose for two-player average stochastic games differs from the mentioned ones and can be extended to n-player average stochastic games if the mentioned graph-continuity property of the payoffs holds. However, the graph-continuity property may fail for average stochastic games with $n \ge 3$ players. It is well known that for an n-player average stochastic game ($n \ge 3$) a stationary Nash equilibrium may not exist. This fact has been shown by Flesch, Thuijsman and Vrieze, 1997, who constructed an example of a 3-player average stochastic game with fixed starting state for which a stationary Nash equilibrium does not exist. Tijs and Vrieze, 1986, have shown that for an arbitrary average stochastic game with a finite set of states there always exists a non-empty subset of starting states for which a stationary Nash equilibrium exists. In the general case the problem of determining the states of an average stochastic game for which stationary Nash equilibria exist remains open.

2. A Two-Player Average Stochastic Game in Stationary Strategies

We first present the framework of a two-person stochastic game and then specify the formulation of stochastic games with average payoffs when players use pure and mixed stationary strategies.

2.1. The Framework of a Two-Person Stochastic Game

A stochastic game with two players consists of the following elements:

- a state space $X$ (which we assume to be finite);

- a finite set $A^1(x)$ of actions of player 1 for an arbitrary state $x \in X$;

- a finite set $A^2(x)$ of actions of player 2 for an arbitrary state $x \in X$;

- a payoff $f^1(x,a)$ with respect to player 1 for each state $x \in X$ and for an arbitrary action vector $a = (a^1, a^2) \in A^1(x) \times A^2(x)$;

- a payoff $f^2(x,a)$ with respect to player 2 for each state $x \in X$ and for an arbitrary action vector $a = (a^1, a^2) \in A^1(x) \times A^2(x)$;

- a transition probability function $p: X \times \prod_{x \in X} (A^1(x) \times A^2(x)) \times X \to [0,1]$ that gives the transition probabilities $p^a_{x,y}$ from an arbitrary $x \in X$ to an arbitrary $y \in X$ for every action vector $a = (a^1, a^2) \in A^1(x) \times A^2(x)$, where $\sum_{y \in X} p^a_{x,y} = 1$, $\forall x \in X$, $a \in A^1(x) \times A^2(x)$;

- a starting state $x_0 \in X$.

The game starts in the state $x_0$ and the play proceeds in a sequence of stages. At stage $t$ players observe state $x_t$ and simultaneously and independently choose actions $a^i_t \in A^i(x_t)$, $i = 1, 2$. Then nature selects a state $y = x_{t+1}$ according to the transition probabilities $p^{a_t}_{x_t,y}$ for the given action vector $a_t = (a^1_t, a^2_t)$. Such a play of the game produces a sequence of states and actions $x_0, a_0, x_1, a_1, \dots, x_t, a_t, \dots$ that defines a stream of stage payoffs $f^1_t = f^1(x_t, a_t)$, $f^2_t = f^2(x_t, a_t)$, $t = 0, 1, 2, \dots$. The infinite average stochastic game is the game with payoffs of the players

$$\omega^i_{x_0} = \liminf_{t \to \infty} \mathsf{E}\left(\frac{1}{t} \sum_{\tau=0}^{t-1} f^i_\tau\right), \quad i = 1, 2,$$

where $\omega^i_{x_0}$ expresses the average payoff per transition of player $i$ in the infinite game.
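For intuition, the following sketch works with a hypothetical 2-state example (all numbers are illustrative, not taken from the paper). Under a fixed pure stationary profile the play is a Markov chain, so the Cesàro averages $\frac{1}{t}\sum_{\tau<t}\mathsf{E}(f^1_\tau)$ can be computed exactly by propagating the state distribution, and for a unichain they approach the stationary average:

```python
# Hypothetical 2-state chain induced by one fixed pure stationary profile.
P = [[0.7, 0.3],   # transition row of state 0 under the fixed profile
     [0.4, 0.6]]   # transition row of state 1
f1 = [2.0, 5.0]    # player 1's stage payoff in states 0 and 1 under the profile

def cesaro_average(P, f, x0, T):
    """Exact value of (1/T) * sum_{t<T} E[f(x_t)] for start state x0,
    computed by propagating the state distribution (no sampling)."""
    n = len(P)
    dist = [1.0 if x == x0 else 0.0 for x in range(n)]
    total = 0.0
    for _ in range(T):
        total += sum(dist[x] * f[x] for x in range(n))
        dist = [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]
    return total / T

# This chain is a unichain, so the limit equals the stationary average:
# pi = (b/(a+b), a/(a+b)) with a = Pr(0 -> 1), b = Pr(1 -> 0).
a, b = P[0][1], P[1][0]
omega_exact = (b * f1[0] + a * f1[1]) / (a + b)
approx = cesaro_average(P, f1, x0=0, T=20000)
print(approx, omega_exact)   # close for large T
```

Because the chain is a unichain, the same limit is obtained from either starting state, anticipating the role of the limiting probabilities below.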

Each player aims to maximize his average payoff per transition. In the single-player case this game becomes the average Markov decision problem with a transition probability function $p: X \times \prod_{x \in X} A(x) \times X \to [0,1]$ and immediate rewards $f(x,a) = f^1(x,a)$ in the states $x \in X$ for given actions $a \in A(x) = A^1(x)$.

In this paper we study stochastic games in which the players use pure and mixed stationary strategies for selecting actions in the states.

2.2. Pure and mixed stationary strategies of the players

A strategy of player $i \in \{1, 2\}$ in a stochastic game is a mapping $s^i$ that for every state $x_t \in X$ provides a probability distribution over the set of actions $A^i(x_t)$. If these probabilities take only the values 0 and 1, then $s^i$ is called a pure strategy, otherwise $s^i$ is called a mixed strategy. If these probabilities depend only on the state $x_t = x \in X$ (i.e. $s^i$ does not depend on $t$), then $s^i$ is called a stationary strategy, otherwise $s^i$ is called a non-stationary strategy. This means that a pure stationary strategy of player $i \in \{1, 2\}$ can be regarded as a map

$$s^i: x \mapsto a^i \in A^i(x) \quad \text{for } x \in X$$

that determines for each state $x$ an action $a^i \in A^i(x)$, i.e. $s^i(x) = a^i$. Obviously, the corresponding sets of pure stationary strategies $S^1, S^2$ of the players in a game with finite state and action spaces are finite sets.

In the following we will identify a pure stationary strategy $s^i(x)$ of player $i$ with the set of boolean variables $s^i_{x,a^i} \in \{0,1\}$, where for a given $x \in X$ we have $s^i_{x,a^i} = 1$ if and only if player $i$ fixes the action $a^i \in A^i(x)$. So, we can represent the set of pure stationary strategies $S^i$ of player $i$ as the set of solutions of the following system:

$$\begin{cases} \sum_{a^i \in A^i(x)} s^i_{x,a^i} = 1, & \forall x \in X; \\ s^i_{x,a^i} \in \{0,1\}, & \forall x \in X,\ \forall a^i \in A^i(x). \end{cases}$$

If in this system we replace the restriction $s^i_{x,a^i} \in \{0,1\}$ for $x \in X$, $a^i \in A^i(x)$ by the condition $0 \le s^i_{x,a^i} \le 1$, then we obtain the set of stationary strategies in the sense of Shapley, 1953, where $s^i_{x,a^i}$ is treated as the probability that player $i$ chooses the action $a^i$ every time the state $x$ is reached by any route in the dynamic stochastic game. Thus, we can identify the set of mixed stationary strategies of the players with the set of solutions of the system

$$\begin{cases} \sum_{a^i \in A^i(x)} s^i_{x,a^i} = 1, & \forall x \in X; \\ s^i_{x,a^i} \ge 0, & \forall x \in X,\ \forall a^i \in A^i(x) \end{cases} \tag{1}$$

and for a given profile $s = (s^1, s^2)$ of mixed strategies $s^1, s^2$ of the players the transition probability $p^s_{x,y}$ from a state $x$ to a state $y$ can be calculated as follows:

$$p^s_{x,y} = \sum_{(a^1,a^2) \in A(x)} s^1_{x,a^1} s^2_{x,a^2}\, p^{(a^1,a^2)}_{x,y}, \quad \text{where } A(x) = A^1(x) \times A^2(x). \tag{2}$$
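As a quick illustration of formula (2), the sketch below (hypothetical data: two states and two actions per player, all numbers illustrative) mixes the action-pair transition rows with the players' strategy probabilities; each resulting row is again a probability distribution:

```python
# p[x][(a1, a2)] is the distribution over next states for the action pair.
p = {
    0: {(0, 0): [1.0, 0.0], (0, 1): [0.2, 0.8],
        (1, 0): [0.5, 0.5], (1, 1): [0.0, 1.0]},
    1: {(0, 0): [0.3, 0.7], (0, 1): [0.6, 0.4],
        (1, 0): [0.9, 0.1], (1, 1): [0.5, 0.5]},
}
# s1[x][a], s2[x][a]: probability that the player picks action a in state x.
s1 = {0: [0.5, 0.5], 1: [1.0, 0.0]}
s2 = {0: [0.25, 0.75], 1: [0.0, 1.0]}

def transition(p, s1, s2, x, y):
    """Formula (2): sum over (a1,a2) of s1_{x,a1} * s2_{x,a2} * p^{(a1,a2)}_{x,y}."""
    return sum(s1[x][a1] * s2[x][a2] * p[x][(a1, a2)][y]
               for a1 in range(len(s1[x]))
               for a2 in range(len(s2[x])))

Ps = [[transition(p, s1, s2, x, y) for y in (0, 1)] for x in (0, 1)]
# A convex mixture of probability rows is again a probability row.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in Ps)
```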

In the sequel we will distinguish stochastic games in pure and mixed stationary strategies.

2.3. Average stochastic games in pure stationary strategies

Let $s = (s^1, s^2)$ be a profile of pure stationary strategies of the players and denote by $a(s) = (a^1(s), a^2(s)) \in A^1(x) \times A^2(x)$ the action vector that corresponds to $s$ and determines the transition probabilities $p^s_{x,y} = p^{a(s)}_{x,y}$ in the states $x \in X$. Then the average payoffs per transition $\omega^1_{x_0}(s)$, $\omega^2_{x_0}(s)$ of the players are determined as follows:

$$\omega^i_{x_0}(s) = \sum_{y \in X} q^s_{x_0,y} f^i(y, a(s)), \quad i = 1, 2,$$

where $q^s_{x_0,y}$ represent the limiting probabilities in the states $y \in X$ for the Markov process with transition probability matrix $P^s = (p^s_{x,y})$ when the transitions start in $x_0$. So, if for the Markov process with probability matrix $P^s$ the corresponding limiting probability matrix $Q^s = (q^s_{x,y})$ is known, then $\omega^1_x$, $\omega^2_x$ can be determined for an arbitrary starting state $x \in X$ of the game. The functions $\omega^1_{x_0}(s)$, $\omega^2_{x_0}(s)$ on $S = S^1 \times S^2$ define a game in normal form that we denote by $(\{S^i\}_{i=1,2}, \{\omega^i_{x_0}(s)\}_{i=1,2})$. This game corresponds to an average stochastic game in pure stationary strategies that in extended form is determined by the tuple $(X, \{A^i(x)\}_{i=1,2}, \{f^i(x,a)\}_{i=1,2}, p, x_0)$.

If an arbitrary profile $s = (s^1, s^2)$ of pure stationary strategies in a stochastic game induces a probability matrix $P^s$ that corresponds to a Markov unichain, then we say that the game possesses the unichain property and call it a unichain stochastic game; otherwise we call it a multichain stochastic game.
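Since the pure stationary strategy sets are finite, the normal-form game in pure stationary strategies can in principle be tabulated by brute force. The sketch below does this for a hypothetical 2-state unichain example (all numbers illustrative); note that such a finite table need not contain a pure-strategy equilibrium, which is one motivation for considering mixed stationary strategies:

```python
from itertools import product

# Every action pair moves between the two states with probability in (0,1),
# so each pure profile induces a unichain; the stationary distribution has
# the closed form pi = (b/(a+b), a/(a+b)), a = Pr(0 -> 1), b = Pr(1 -> 0),
# and omega does not depend on the starting state.
p_move = {  # probability of moving to the *other* state under pair (a1, a2)
    0: {(0, 0): 0.3, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.8},
    1: {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.7, (1, 1): 0.5},
}
f = {1: {0: {(0, 0): 4, (0, 1): 1, (1, 0): 2, (1, 1): 0},
         1: {(0, 0): 1, (0, 1): 3, (1, 0): 0, (1, 1): 2}},
     2: {0: {(0, 0): 0, (0, 1): 3, (1, 0): 1, (1, 1): 4},
         1: {(0, 0): 2, (0, 1): 0, (1, 0): 3, (1, 1): 1}}}

def omega(i, s1, s2):
    """Average payoff of player i under the pure stationary profile (s1, s2),
    where s1, s2 are tuples giving the action chosen in states 0 and 1."""
    a = p_move[0][(s1[0], s2[0])]
    b = p_move[1][(s1[1], s2[1])]
    pi0, pi1 = b / (a + b), a / (a + b)
    return pi0 * f[i][0][(s1[0], s2[0])] + pi1 * f[i][1][(s1[1], s2[1])]

strategies = list(product((0, 1), repeat=2))  # 4 pure stationary strategies
equilibria = [
    (s1, s2)
    for s1 in strategies for s2 in strategies
    if all(omega(1, s1, s2) >= omega(1, t, s2) - 1e-12 for t in strategies)
    and all(omega(2, s1, s2) >= omega(2, s1, t) - 1e-12 for t in strategies)
]
print(equilibria)  # pure-strategy equilibria of the finite table, if any
```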

2.4. Average stochastic games in mixed stationary strategies

Let $s = (s^1, s^2)$ be a profile of mixed stationary strategies of the players. Then the elements of the transition probability matrix $P^s = (p^s_{x,y})$ of the Markov process induced by $s$ can be calculated according to (2). Therefore, if $Q^s = (q^s_{x,y})$ is the limiting probability matrix of $P^s$, then the average payoffs per transition $\omega^1_{x_0}(s)$, $\omega^2_{x_0}(s)$ of the players are determined as follows:

$$\omega^i_{x_0}(s) = \sum_{y \in X} q^s_{x_0,y} f^i(y, s), \quad i = 1, 2, \tag{3}$$

where

$$f^i(y, s) = \sum_{(a^1,a^2) \in A(y)} s^1_{y,a^1} s^2_{y,a^2} f^i(y, (a^1, a^2)) \tag{4}$$

expresses the average payoff (immediate reward) in the state $y \in X$ of player $i$ when the corresponding stationary strategies $s^1, s^2$ have been applied by players 1 and 2 in $y$.

Let $\overline{S}^1, \overline{S}^2$ be the corresponding sets of mixed stationary strategies of players 1 and 2, i.e. each $\overline{S}^i$ for $i \in \{1, 2\}$ represents the set of solutions of system (1). The functions $\omega^1_{x_0}(s)$, $\omega^2_{x_0}(s)$ on $\overline{S} = \overline{S}^1 \times \overline{S}^2$, defined according to (3), (4), determine a game in normal form that we denote by $(\{\overline{S}^i\}_{i=1,2}, \{\omega^i_{x_0}(s)\}_{i=1,2})$. This game corresponds to an average stochastic game in mixed stationary strategies that in extended form is determined by the tuple $(X, \{A^i(x)\}_{i=1,2}, \{f^i(x,a)\}_{i=1,2}, p, x_0)$.

2.5. Average stochastic games with random starting state

In this paper we also consider average stochastic games in which the starting state is chosen randomly according to a given distribution $\{\theta_x\}$ on $X$. So, for a given stochastic game we assume that the play starts in the states $x \in X$ with probabilities $\theta_x \ge 0$, where $\sum_{x \in X} \theta_x = 1$. If the players use mixed stationary strategies for selecting actions in the states, then the payoff functions

$$\psi^i_\theta(s^1, s^2) = \sum_{x \in X} \theta_x\, \omega^i_x(s^1, s^2), \quad i = 1, 2$$

on $\overline{S} = \overline{S}^1 \times \overline{S}^2$ define a game in normal form $(\{\overline{S}^i\}_{i=1,2}, \{\psi^i_\theta(s)\}_{i=1,2})$ that in extended form is determined by $(X, \{A^i(x)\}_{i=1,2}, \{f^i(x,a)\}_{i=1,2}, p, \{\theta_x\})$. In the case $\theta_x = 0$, $\forall x \in X \setminus \{x_0\}$, $\theta_{x_0} = 1$ the considered game becomes a stochastic game with fixed starting state $x_0$.

3. Some Auxiliary Results

In this section we present some auxiliary results for the average Markov decision problem in terms of stationary strategies, together with auxiliary results on the existence of pure-strategy Nash equilibria in n-player games.

3.1. Optimal Stationary Policies in the Average Markov Decision Problem

It is well-known that an optimal stationary policy (strategy) for the average Markov decision problem can be found by using the following linear programming model (see Puterman, 2005): Maximize

$$\sum_{x \in X} \sum_{a \in A(x)} f(x,a)\,\alpha_{x,a} \tag{5}$$

subject to

$$\begin{cases} \sum_{a \in A(y)} \alpha_{y,a} - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\,\alpha_{x,a} = 0, & \forall y \in X; \\ \sum_{a \in A(y)} \alpha_{y,a} + \sum_{a \in A(y)} \beta_{y,a} - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\,\beta_{x,a} = \theta_y, & \forall y \in X; \\ \alpha_{x,a} \ge 0,\ \beta_{x,a} \ge 0, & \forall x \in X,\ a \in A(x), \end{cases} \tag{6}$$

where $\theta_y$ for $y \in X$ represent arbitrary nonnegative values that satisfy the condition $\sum_{y \in X} \theta_y = 1$ and are treated as the probabilities of choosing the starting state $y \in X$. In the case $\theta_y = 1$ for $y = x_0$ and $\theta_y = 0$ for $y \in X \setminus \{x_0\}$ we obtain the linear programming model for an average Markov decision problem with fixed starting state $x_0$.

This linear programming model corresponds to a multichain case of an average Markov decision problem. If each stationary strategy in the decision problem induces an ergodic Markov chain then the restrictions (6) can be replaced by the restrictions

$$\begin{cases} \sum_{a \in A(y)} \alpha_{y,a} - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\,\alpha_{x,a} = 0, & \forall y \in X; \\ \sum_{y \in X} \sum_{a \in A(y)} \alpha_{y,a} = 1; \\ \alpha_{y,a} \ge 0, & \forall y \in X,\ a \in A(y). \end{cases} \tag{7}$$
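To see how (7) relates to stationary strategies, the sketch below (an illustrative 2-state, 2-action MDP, not taken from the paper) builds the occupation measure $\alpha_{x,a} = \pi_x s_{x,a}$ of a pure stationary strategy, where $\pi$ is the stationary distribution of the induced chain, and verifies that it is feasible for (7) and that the objective (5) at this point equals the strategy's average reward:

```python
# p[x][a] = distribution over next states; f[x][a] = immediate reward.
p = {0: {0: [0.7, 0.3], 1: [0.2, 0.8]},
     1: {0: [0.4, 0.6], 1: [0.9, 0.1]}}
f = {0: {0: 2.0, 1: 1.0}, 1: {0: 5.0, 1: 3.0}}

def occupation_measure(s):
    """Occupation measure alpha for a pure stationary strategy s (state -> action)."""
    a, b = p[0][s[0]][1], p[1][s[1]][0]      # Pr(0 -> 1), Pr(1 -> 0)
    pi = [b / (a + b), a / (a + b)]          # stationary distribution
    alpha = {(x, act): (pi[x] if act == s[x] else 0.0)
             for x in (0, 1) for act in (0, 1)}
    return alpha, pi

s = {0: 0, 1: 1}
alpha, pi = occupation_measure(s)

# Balance constraints of (7): for every y,
# sum_a alpha[y,a] - sum_x sum_a p[x][a][y] * alpha[x,a] = 0.
for y in (0, 1):
    lhs = sum(alpha[(y, act)] for act in (0, 1)) \
        - sum(p[x][act][y] * alpha[(x, act)] for x in (0, 1) for act in (0, 1))
    assert abs(lhs) < 1e-9
assert abs(sum(alpha.values()) - 1.0) < 1e-9   # normalization constraint of (7)

# Objective (5) at alpha equals the average reward of s.
objective = sum(f[x][act] * alpha[(x, act)] for x in (0, 1) for act in (0, 1))
avg_reward = sum(pi[x] * f[x][s[x]] for x in (0, 1))
assert abs(objective - avg_reward) < 1e-9
```

Conversely, an optimal basic solution of the LP yields an optimal strategy, as described by formula (8) below in the text.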

In the linear programming model (5), (6) the restrictions

$$\sum_{a \in A(y)} \alpha_{y,a} + \sum_{a \in A(y)} \beta_{y,a} - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\,\beta_{x,a} = \theta_y, \quad \forall y \in X$$

with the condition $\sum_{y \in X} \theta_y = 1$ generalize the constraint

$$\sum_{y \in X} \sum_{a \in A(y)} \alpha_{y,a} = 1$$

in the linear programming model (5), (7) for the ergodic case.

The relationship between feasible solutions of problem (5), (6) and stationary strategies in the average Markov decision problem is the following. Let $(\alpha, \beta)$ be a feasible solution of the linear programming problem (5), (6) and denote $X_\alpha = \{x \in X \mid \sum_{a \in A(x)} \alpha_{x,a} > 0\}$. Then $(\alpha, \beta)$ possesses the property that $\sum_{a \in A(x)} \beta_{x,a} > 0$ for $x \in X \setminus X_\alpha$, and a stationary strategy $s$ that corresponds to $(\alpha, \beta)$ is determined as

$$s_{x,a} = \begin{cases} \dfrac{\alpha_{x,a}}{\sum_{a \in A(x)} \alpha_{x,a}} & \text{if } x \in X_\alpha; \\[2mm] \dfrac{\beta_{x,a}}{\sum_{a \in A(x)} \beta_{x,a}} & \text{if } x \in X \setminus X_\alpha, \end{cases} \tag{8}$$

where $s_{x,a}$ expresses the probability of choosing the action $a \in A(x)$ in the state $x \in X$. Thus, $s$ can be regarded as a mapping that for every state $x \in X$ provides a probability distribution over the set of actions $A(x)$; if these probabilities take only the values 0 and 1, then $s$ corresponds to a pure stationary strategy, otherwise it corresponds to a mixed stationary strategy.
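The recovery rule (8) can be sketched directly; the $(\alpha, \beta)$ values below are illustrative placeholders, not the output of a solved LP:

```python
# Hypothetical feasible-looking (alpha, beta) values for 2 states, 2 actions;
# state 1 carries no alpha mass, so its strategy comes from beta.
alpha = {(0, 0): 0.45, (0, 1): 0.15, (1, 0): 0.0, (1, 1): 0.0}
beta  = {(0, 0): 0.10, (0, 1): 0.30, (1, 0): 0.20, (1, 1): 0.60}
states, actions = (0, 1), (0, 1)

def strategy_from_solution(alpha, beta):
    """Formula (8): normalize alpha on X_alpha and beta outside it."""
    s = {}
    for x in states:
        mass = sum(alpha[(x, a)] for a in actions)
        if mass > 0:                       # x in X_alpha: use alpha
            for a in actions:
                s[(x, a)] = alpha[(x, a)] / mass
        else:                              # x outside X_alpha: use beta
            mass = sum(beta[(x, a)] for a in actions)
            for a in actions:
                s[(x, a)] = beta[(x, a)] / mass
    return s

s = strategy_from_solution(alpha, beta)
# In every state the recovered s is a probability distribution over actions.
assert all(abs(sum(s[(x, a)] for a in actions) - 1.0) < 1e-9 for x in states)
print(s)
```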

Using the linear programming problem (5), (6), Lozovanu, 2016, showed that an average Markov decision problem in terms of stationary strategies can be formulated as follows: Maximize

$$\psi(s, q, w) = \sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, s_{x,a}\, q_x \tag{9}$$

subject to

$$\begin{cases} q_y - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = 0, & \forall y \in X; \\ q_y + w_y - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, w_x = \theta_y, & \forall y \in X; \\ \sum_{a \in A(y)} s_{y,a} = 1, & \forall y \in X; \\ s_{x,a} \ge 0,\ \forall x \in X,\ \forall a \in A(x); & w_x \ge 0,\ \forall x \in X, \end{cases} \tag{10}$$

where $\theta_y$ are the same values as in problem (5), (6), and $s_{x,a}, q_x, w_x$ for $x \in X$, $a \in A(x)$ represent the variables that must be found, where $q_x$ for $x \in X$ express the limiting probabilities in the states for the corresponding strategy $s$.

The main property that we shall use for the average stochastic game is expressed by the following theorem, proved by Lozovanu, 2016.

Theorem 1. Let an average Markov decision problem be given and consider the function

$$\psi_\theta(s) = \sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, s_{x,a}\, q_x, \tag{11}$$


where $q_x$ for $x \in X$ satisfy the condition

$$\begin{cases} q_y - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = 0, & \forall y \in X; \\ q_y + w_y - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, w_x = \theta_y, & \forall y \in X. \end{cases} \tag{12}$$

Then on the set $S$ of solutions of the system

$$\begin{cases} \sum_{a \in A(x)} s_{x,a} = 1, & \forall x \in X; \\ s_{x,a} \ge 0, & \forall x \in X,\ a \in A(x) \end{cases} \tag{13}$$

the function $\psi_\theta(s)$ depends only on $s_{x,a}$ for $x \in X$, $a \in A(x)$, and $\psi_\theta(s)$ is quasimonotonic on $S$ (i.e. $\psi_\theta(s)$ is quasiconvex and quasiconcave on $S$).

Remark 1. The function (11) on $S$ depends only on $s_{x,a}$ for $x \in X$, $a \in A(x)$ because system (12) uniquely determines $q_x$, $\forall x \in X$, for a given $s \in S$.
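Theorem 1 can be probed numerically. The sketch below uses a hypothetical single-player ergodic example with state-only rewards (all numbers illustrative); a continuous function that is both quasiconvex and quasiconcave is monotone along any line segment, and the check confirms this for a segment between two stationary strategies of this instance:

```python
# Two actions per state; action a in state x moves to the other state
# with probability move[x][a] (all in (0,1), so every strategy is ergodic).
move = {0: [0.3, 0.8], 1: [0.6, 0.2]}
f = [1.0, 4.0]  # rewards depend only on the state in this example

def psi(s0, s1):
    """Average reward of the mixed stationary strategy that plays action 1
    with probability s0 in state 0 and s1 in state 1."""
    a = (1 - s0) * move[0][0] + s0 * move[0][1]   # Pr(0 -> 1)
    b = (1 - s1) * move[1][0] + s1 * move[1][1]   # Pr(1 -> 0)
    q0, q1 = b / (a + b), a / (a + b)             # limiting probabilities
    return q0 * f[0] + q1 * f[1]

# Evaluate psi along the segment between strategies (0, 0) and (1, 1).
vals = [psi(t / 100, t / 100) for t in range(101)]
diffs = [vals[k + 1] - vals[k] for k in range(100)]
monotone = all(d >= -1e-12 for d in diffs) or all(d <= 1e-12 for d in diffs)
assert monotone   # consistent with quasimonotonicity on this segment
```

This is only an illustration for one instance, of course, not a substitute for the proof in Lozovanu, 2016.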

3.2. Existence of Pure Nash equilibria in n-Player Games with Quasimonotonic Payoffs

Let $(\{S^i\}_{i=\overline{1,n}}, \{f^i(s)\}_{i=\overline{1,n}})$ be an n-player game in normal form, where $S^i$, $i = \overline{1,n}$, represent the corresponding sets of strategies (pure strategies) of the players $1, 2, \dots, n$ and $f^i: \prod_{j=1}^n S^j \to \mathbb{R}^1$, $i = \overline{1,n}$, represent the corresponding payoffs of these players. Let $s = (s^1, s^2, \dots, s^n)$ be a profile of strategies of the players, $s \in S = \prod_{j=1}^n S^j$, and define $s^{-i} = (s^1, s^2, \dots, s^{i-1}, s^{i+1}, \dots, s^n)$, $S^{-i} = \prod_{j \ne i} S^j$, where $s^{-i} \in S^{-i}$. Thus, for an arbitrary $s \in S$ we can write $s = (s^i, s^{-i})$.

Fan, 1966, extended the well-known equilibrium result of Nash, 1951, to games with quasiconcave payoffs. He proved the following theorem:

Theorem 2. Let $S^i$, $i = \overline{1,n}$, be non-empty, convex and compact sets. If each payoff $f^i: S \to \mathbb{R}^1$, $i \in \{1, 2, \dots, n\}$, is continuous on $S$ and quasiconcave with respect to $s^i$ on $S^i$, then the game $(\{S^i\}_{i=\overline{1,n}}, \{f^i(s)\}_{i=\overline{1,n}})$ possesses a pure-strategy Nash equilibrium.

Dasgupta and Maskin, 1986, considered a class of games with discontinuous payoffs and proved a pure-strategy Nash equilibrium existence result for the case when the payoffs are upper semi-continuous and graph-continuous.

The payoff $f^i: S \to \mathbb{R}^1$ is upper semi-continuous if for any sequence $\{s_k\} \subseteq S$ such that $\{s_k\} \to s$ it holds that $\limsup_{k \to \infty} f^i(s_k) \le f^i(s)$.

The payoff $f^i: S \to \mathbb{R}^1$ is graph-continuous if for all $\overline{s} \in S$ there exists a function $F^i: S^{-i} \to S^i$ with $F^i(\overline{s}^{-i}) = \overline{s}^i$ such that $f^i(F^i(s^{-i}), s^{-i})$ is continuous at $s^{-i} = \overline{s}^{-i}$.

Dasgupta and Maskin proved the following theorem.

Theorem 3. Let $S^i$, $i = \overline{1,n}$, be non-empty, convex and compact sets. If each payoff $f^i: S \to \mathbb{R}^1$, $i \in \{1, 2, \dots, n\}$, is upper semi-continuous on $S$, graph-continuous and quasiconcave with respect to $s^i$ on $S^i$, then the game $(\{S^i\}_{i=\overline{1,n}}, \{f^i(s)\}_{i=\overline{1,n}})$ possesses a pure-strategy Nash equilibrium.

In the following we need to extend this theorem to the case when each payoff $f^i(s^i, s^{-i})$, $i = 1, 2, \dots, n$, is quasimonotonic with respect to $s^i$ on $S^i$, i.e. $f^i(s^i, s^{-i})$ is quasiconvex and quasiconcave with respect to $s^i$ on $S^i$. We can observe that in this case the reaction correspondences of the players

$$\varphi^i(s^{-i}) = \{\hat{s}^i \in S^i \mid f^i(\hat{s}^i, s^{-i}) = \max_{s^i \in S^i} f^i(s^i, s^{-i})\}, \quad i = 1, 2, \dots, n$$

are compact- and convex-valued, and therefore the upper semi-continuity condition can be dropped. So, in this case the theorem can be formulated as follows.

Theorem 4. Let $S^i$, $i = \overline{1,n}$, be non-empty, convex and compact sets. If each payoff $f^i: S \to \mathbb{R}^1$, $i \in \{1, 2, \dots, n\}$, is graph-continuous and quasimonotonic with respect to $s^i$ on $S^i$, then the game $(\{S^i\}_{i=\overline{1,n}}, \{f^i(s)\}_{i=\overline{1,n}})$ possesses a pure-strategy Nash equilibrium.

4. The Main Results

In this section we present the results concerning the existence and determination of stationary Nash equilibria for a two-player average stochastic game in stationary strategies. For this purpose we formulate the game in normal form.

4.1. The Game Model in Normal Form

The game model in normal form for the considered two-player average stochastic game is the following:

Let $\overline{S}^i$, $i \in \{1, 2\}$, be the set of solutions of the system

$$\begin{cases} \sum_{a^i \in A^i(x)} s^i_{x,a^i} = 1, & \forall x \in X; \\ s^i_{x,a^i} \ge 0, & \forall x \in X,\ a^i \in A^i(x) \end{cases} \tag{14}$$

that determines the set of stationary strategies of player $i$. Each $\overline{S}^i$ is a convex compact set and an arbitrary extreme point corresponds to a basic solution $s^i$ of system (14) with $s^i_{x,a^i} \in \{0,1\}$, $\forall x \in X$, $a^i \in A^i(x)$, i.e. such a solution corresponds to a pure stationary strategy of player $i$. On the set $\overline{S} = \overline{S}^1 \times \overline{S}^2$ we define the payoff functions

$$\psi^i_\theta(s^1, s^2) = \sum_{x \in X} \sum_{(a^1,a^2) \in A^1(x) \times A^2(x)} s^1_{x,a^1} s^2_{x,a^2} f^i(x, (a^1, a^2))\, q_x, \quad i = 1, 2, \tag{15}$$

where $q_x$ for $x \in X$ are determined uniquely from the following system of linear equations

$$\begin{cases} q_y - \sum_{x \in X} \sum_{(a^1,a^2) \in A^1(x) \times A^2(x)} s^1_{x,a^1} s^2_{x,a^2}\, p^{(a^1,a^2)}_{x,y}\, q_x = 0, & \forall y \in X; \\ q_y + w_y - \sum_{x \in X} \sum_{(a^1,a^2) \in A^1(x) \times A^2(x)} s^1_{x,a^1} s^2_{x,a^2}\, p^{(a^1,a^2)}_{x,y}\, w_x = \theta_y, & \forall y \in X, \end{cases} \tag{16}$$

for an arbitrary fixed profile $s = (s^1, s^2) \in \overline{S}$. The functions $\psi^i_\theta(s^1, s^2)$, $i = 1, 2$, represent the payoff functions for the average stochastic game in normal form that we denote by $(\{\overline{S}^i\}_{i=1,2}, \{\psi^i_\theta(s)\}_{i=1,2})$. This game is determined by the tuple $(X, \{A^i(x)\}_{i=1,2}, \{f^i(x,a)\}_{i=1,2}, p, \{\theta_y\})$, where $\theta_y$ for $y \in X$ are given nonnegative values such that $\sum_{y \in X} \theta_y = 1$.

If $\theta_y = 0$, $\forall y \in X \setminus \{x_0\}$, and $\theta_{x_0} = 1$, then we obtain an average stochastic game in normal form $(\{\overline{S}^i\}_{i=1,2}, \{\omega^i_{x_0}(s)\}_{i=1,2})$ with fixed starting state $x_0$, i.e. $\psi^i_\theta(s^1, s^2) = \omega^i_{x_0}(s^1, s^2)$, $i = 1, 2$. So, in this case the game is determined by $(X, \{A^i(x)\}_{i=1,2}, \{f^i(x,a)\}_{i=1,2}, p, x_0)$.

If $\theta_y > 0$, $\forall y \in X$, and $\sum_{y \in X} \theta_y = 1$, then we obtain an average stochastic game in which the play starts in the states $y \in X$ with probabilities $\theta_y$. In this case for the payoffs of the players in the game in normal form we have

$$\psi^i_\theta(s^1, s^2) = \sum_{y \in X} \theta_y\, \omega^i_y(s^1, s^2), \quad i = 1, 2. \tag{17}$$
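A small sketch of evaluating the payoffs (15): in the unichain case the $q_x$ determined by (16) are the stationary probabilities of the induced chain (and then $\psi^i_\theta$ does not depend on $\theta$), so for a hypothetical 2-state example (illustrative numbers, not from the paper) they admit a closed form:

```python
# Illustrative 2-state, 2-action two-player game; every induced chain is a
# unichain, so q in (16) is the stationary distribution of P^s.
p_move = {  # probability of moving to the *other* state under pair (a1, a2)
    0: {(0, 0): 0.2, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.9},
    1: {(0, 0): 0.5, (0, 1): 0.3, (1, 0): 0.6, (1, 1): 0.1},
}
f = {1: {0: {(0, 0): 3, (0, 1): 0, (1, 0): 1, (1, 1): 2},
         1: {(0, 0): 2, (0, 1): 4, (1, 0): 1, (1, 1): 0}},
     2: {0: {(0, 0): 1, (0, 1): 2, (1, 0): 0, (1, 1): 3},
         1: {(0, 0): 0, (0, 1): 1, (1, 0): 2, (1, 1): 4}}}

def mix(x, s1, s2, table):
    """Average of table[x][(a1, a2)] under the mixed strategies in state x."""
    return sum(s1[x][a1] * s2[x][a2] * table[x][(a1, a2)]
               for a1 in (0, 1) for a2 in (0, 1))

def psi(i, s1, s2):
    """Payoff (15) of player i for the mixed stationary profile (s1, s2)."""
    a = mix(0, s1, s2, p_move)        # Pr(0 -> 1) under s
    b = mix(1, s1, s2, p_move)        # Pr(1 -> 0) under s
    q = [b / (a + b), a / (a + b)]    # q from (16) in the unichain case
    return sum(q[x] * mix(x, s1, s2, f[i]) for x in (0, 1))

s1 = {0: [0.5, 0.5], 1: [1.0, 0.0]}
s2 = {0: [1.0, 0.0], 1: [0.0, 1.0]}
print(psi(1, s1, s2), psi(2, s1, s2))
```

In the general multichain case the full system (16), including the variables $w_x$, has to be solved instead of this closed form.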

4.2. Stationary Nash Equilibria Existence Results

As we have noted, the existence of stationary Nash equilibria in two-player average stochastic games has been shown by Vieille, 2000. Here we show that this result can also be derived from the following theorem.

Theorem 5. The game $(\{\overline{S}^i\}_{i=1,2}, \{\psi^i_\theta(s)\}_{i=1,2})$ possesses a pure-strategy Nash equilibrium $s^* = (s^{1*}, s^{2*})$ which is a stationary Nash equilibrium for the two-player average stochastic game determined by $(X, \{A^i(x)\}_{i=1,2}, \{f^i(x,a)\}_{i=1,2}, p, \{\theta_y\})$. Moreover, if $s^* = (s^{1*}, s^{2*})$ is a pure-strategy Nash equilibrium for the game $(\{\overline{S}^i\}_{i=1,2}, \{\psi^i_\theta(s)\}_{i=1,2})$ with $\theta_y > 0$, $\forall y \in X$, then $s^*$ is a stationary Nash equilibrium for the two-player average stochastic game $(X, \{A^i(x)\}_{i=1,2}, \{f^i(x,a)\}_{i=1,2}, p, y)$ with an arbitrary starting state $y \in X$.

Proof. The existence of a pure-strategy Nash equilibrium for the game $(\{\overline{S}^i\}_{i=1,2}, \{\psi^i_\theta(s)\}_{i=1,2})$ follows from Theorem 4. Indeed, according to Theorem 1 the payoff $\psi^1_\theta(s^1, s^2)$ is quasimonotonic with respect to $s^1$ on $\overline{S}^1$ for a fixed $s^2 \in \overline{S}^2$, and the payoff $\psi^2_\theta(s^1, s^2)$ is quasimonotonic with respect to $s^2$ on $\overline{S}^2$ for a fixed $s^1 \in \overline{S}^1$. The graph-continuity of the payoff functions also follows from Theorem 1 (see the proof of the theorem in Lozovanu, 2016). Note that the graph-continuity property of the payoffs holds only for two-player games.

Now let us prove the second part of the theorem. Let $s^* = (s^{1*}, s^{2*})$ be a pure-strategy Nash equilibrium for the game $(\{\overline{S}^i\}_{i=1,2}, \{\psi^i_\theta(s)\}_{i=1,2})$ determined by $(X, \{A^i(x)\}_{i=1,2}, \{f^i(x,a)\}_{i=1,2}, p, \{\theta_y\})$, where $\theta_y > 0$, $\forall y \in X$, $\sum_{y \in X} \theta_y = 1$. Then $s^* = (s^{1*}, s^{2*})$ is a Nash equilibrium for the average stochastic game $(\{\overline{S}^i\}_{i=1,2}, \{\psi^i_{\theta'}(s)\}_{i=1,2})$ with an arbitrary distribution $\{\theta'_y\}$ on $X$, where $\theta'_y > 0$, $\forall y \in X$, $\sum_{y \in X} \theta'_y = 1$, i.e.

$$\psi^i_{\theta'}(s^{i*}, s^{-i*}) \ge \psi^i_{\theta'}(s^i, s^{-i*}), \quad \forall s^i \in \overline{S}^i,\ i = 1, 2.$$

If here we express $\psi^i_{\theta'}$ via $\omega^i_y$ using (17), then we obtain

$$\sum_{y \in X} \theta'_y \left(\omega^i_y(s^{i*}, s^{-i*}) - \omega^i_y(s^i, s^{-i*})\right) \ge 0, \quad \forall s^i \in \overline{S}^i,\ i = 1, 2.$$

This property holds for arbitrary $\theta'_y > 0$, $\forall y \in X$, such that $\sum_{y \in X} \theta'_y = 1$, and therefore for an arbitrary $y \in X$ we have

$$\omega^i_y(s^{i*}, s^{-i*}) - \omega^i_y(s^i, s^{-i*}) \ge 0, \quad \forall s^i \in \overline{S}^i,\ i = 1, 2.$$

So, $s^* = (s^{1*}, s^{2*})$ is a Nash equilibrium for the average stochastic game $(\{\overline{S}^i\}_{i=1,2}, \{\omega^i_y(s)\}_{i=1,2})$ with an arbitrary starting state $y \in X$.

Remark 2. The graph-continuity property of the payoffs may fail to hold in the case of $n > 2$ players, and therefore Theorem 5 cannot be extended to general n-player games.

So, the optimal stationary strategies in a two-player average stochastic game determined by $(X, \{A^i(x)\}_{i=1,2}, \{f^i(x,a)\}_{i=1,2}, p, \{\theta_y\})$ can be found by determining the optimal strategies of the game $(\{\overline{S}^i\}_{i=1,2}, \{\psi^i_\theta(s)\}_{i=1,2})$, where the strategy sets $\overline{S}^1, \overline{S}^2$ and the payoff functions $\psi^1_\theta(s), \psi^2_\theta(s)$ are determined according to (14)-(16).

5. Conclusion

For a two-player average stochastic game a stationary Nash equilibrium exists, and the optimal stationary strategies of the players can be found by determining the optimal pure strategies for the game in normal form presented in this paper.

References

Dasgupta, P. and Maskin, E. (1986). The existence of equilibrium in discontinuous economic games. Review of Economic Studies, 53, 1-26.

Fan, K. (1966). Applications of a theorem concerning sets with convex sections. Math. Ann., 163, 189-203.

Flesch, J., Thuijsman, F. and Vrieze, K. (1997). Cyclic Markov equilibria in stochastic games. International Journal of Game Theory, 26, 303-314.

Lozovanu, D. (2016). Stationary Nash equilibria for average stochastic games. Buletinul A.S.R.M., ser. Math., 2(81), 71-92.

Mertens, J. F. and Neyman, A. (1981). Stochastic games. International Journal of Game Theory, 10, 53-66.

Nash, J. (1951). Non-cooperative games. Ann. Math., 54, 286-295.

Puterman, M. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, New Jersey.

Shapley, L. (1953). Stochastic games. Proc. Natl. Acad. Sci., U.S.A., 39, 1095-1100.

Solan, E. and Vieille, N. (2010). Computing uniform optimal strategies in two-player stochastic games. Economic Theory (special issue on equilibrium computation), 42, 237-253.

Tijs, S. and Vrieze, O. (1986). On the existence of easy initial states for undiscounted stochastic games. Math. Oper. Res., 11, 506-513.

Vieille, N. (2000). Equilibrium in 2-person stochastic games I, II. Israel J. Math., 119(1), 55-126.
