
Contributions to Game Theory and Management, XIII, 304-323

On the Existence of Stationary Nash Equilibria in Average Stochastic Games with Finite State and Action Spaces

Dmitrii Lozovanu1 and Stefan Pickl2

1 Institute of Mathematics and Computer Science of Moldova Academy of Sciences,

Academiei 5, Chisinau, MD-2028, Moldova, E-mail: lozovanu@math.md

2 Institute for Theoretical Computer Science, Mathematics and Operations Research,

Universität der Bundeswehr München, 85577 Neubiberg-München, Germany, E-mail: stefan.pickl@unibw.de

Abstract. We consider infinite n-person stochastic games with a limiting average payoff criterion for the players. The main results of the paper are concerned with the existence of stationary Nash equilibria and the determination of the optimal strategies of the players in games with finite state and action spaces. We present conditions for the existence of stationary Nash equilibria in the considered games and propose an approach for determining the optimal stationary strategies of the players if such strategies exist.

Keywords: Markov decision processes, Average stochastic games, Stationary Nash equilibria, Optimal stationary strategies

1. Introduction

In this paper we study the problem of the existence and determination of stationary Nash equilibria in average stochastic games with finite state and action spaces. Stochastic games, sometimes called Markov games, were introduced by Shapley, 1953. He considered two-person zero-sum stochastic games for which he proved the existence of the value and of optimal stationary strategies of the players with respect to a discounted payoff criterion. Later, this class of games was extended to general n-person stochastic games with discounted and average payoff criteria (Gillette, 1957; Fink, 1964; Takahashi, 1964; Vrieze, 1987; Filar et al., 1991). The main results for n-person stochastic games with discounted payoffs have been obtained by Fink, 1964, Takahashi, 1964 and Sobel, 1971, who proved the existence of stationary Nash equilibria in such games. Schultz, 1986 and Filar et al., 1991 showed that the problem of determining stationary Nash equilibria in n-person stochastic games with discounted payoffs can be represented as a nonlinear programming problem with linear constraints in which the global minimum of the objective function is equal to zero. Mertens and Neyman, 1981 studied two-person zero-sum games and proved the existence of uniform ε-optimal strategies for the players, i.e. they showed that for every ε > 0 each of the two players has a strategy that guarantees the value of the game up to ε with respect to the average payoff criterion. These results have been extended to an arbitrary noncooperative stochastic game of two players, and afterwards they have been used for studying the problem of the existence of Nash equilibria in non-stationary strategies for two-player average stochastic games (Vieille, 2002, 2009; Solan, 2009; Solan and Vieille, 2010). Algorithmic approaches concerned with determining the optimal strategies of the players in some classes of stochastic games can be found in (Schultz, 1986; Filar et al., 1991; Neyman and Sorin, 2003; Solan, 2009).

The n-person stochastic games with limiting average payoffs have been studied by many authors (Neyman and Sorin, 2003; Rogers, 1969; Sobel, 1971; Solan, 2009; Solan and Vieille, 2010; Vieille, 2002, 2009; Vrieze, 1987); however, the existence of stationary Nash equilibria has been proved only for some classes of such games. Rogers, 1969 and Sobel, 1971 showed that stationary Nash equilibria exist for nonzero-sum stochastic games with average payoffs when the transition probability matrices induced by any stationary strategies of the players are unichain. An important class of average stochastic games for which stationary Nash equilibria exist is the class of stochastic positional games (Lozovanu, 2018, 2019). Furthermore, Lozovanu, 2018, 2019 showed that for average stochastic positional games with unichain property and for two-player zero-sum stochastic positional games there exist stationary Nash equilibria in pure strategies. The main results concerned with the existence and determination of Nash equilibria in two-player average stochastic games can be found in (Mertens and Neyman, 1981; Neyman and Sorin, 2003; Vieille, 2002, 2009; Solan, 2009; Solan and Vieille, 2010). In the general case, for an average stochastic game with a given starting state a stationary Nash equilibrium may not exist. This fact has been shown by Flesch et al., 1997, who constructed an example of a 3-player average stochastic game with fixed starting state for which a stationary Nash equilibrium does not exist. Moreover, they showed that an m-player (m ≥ 3) average stochastic game may also fail to possess a stationary ε-equilibrium (ε > 0). In general, for an average stochastic game there may exist a nonempty subset of states such that if the game starts in one of them then a stationary Nash equilibrium exists (Tijs and Vrieze, 1986). However, the problem of determining the initial states in an average stochastic game for which stationary equilibria exist is an open problem.

In this contribution we consider average stochastic games with finite state and action spaces. We show that an arbitrary average stochastic game in stationary strategies can be represented as a game in normal form where each payoff is quasi-monotonic (quasi-concave and quasi-convex) with respect to the strategy of the corresponding player. Furthermore, we show that if the game in normal form has a pure Nash equilibrium then such an equilibrium corresponds to a stationary Nash equilibrium of the average stochastic game and vice versa. Based on this result and the results of Debreu, 1952, Glicksberg, 1952, Dasgupta and Maskin, 1986 and Reny, 1999 related to the existence of Nash equilibria in games with quasi-concave (quasi-convex) payoffs, we formulate conditions for the existence and determination of stationary Nash equilibria in average stochastic games.

2. Average Stochastic Games in Pure and Mixed Stationary Strategies

In this section we describe the framework of an n-person stochastic game and present the formulation of stochastic games with average payoffs when the players use pure and mixed stationary strategies.

2.1. The Framework of an n-person Average Stochastic Game

A stochastic game with n players consists of the following elements:

- a state space $X$ (which we assume to be finite);

- a finite set $A^i(x)$ of actions with respect to each player $i \in \{1, 2, \dots, n\}$ for an arbitrary state $x \in X$;

- a payoff $f^i(x,a)$ with respect to each player $i \in \{1, 2, \dots, n\}$ for each state $x \in X$ and for an arbitrary action vector $a \in \prod_{i=1}^{n} A^i(x)$;

- a transition probability function $p: X \times \prod_{x \in X} \prod_{i=1}^{n} A^i(x) \times X \to [0,1]$ that gives the probability transitions $p^a_{x,y}$ from an arbitrary $x \in X$ to an arbitrary $y \in X$ for a fixed action vector $a \in \prod_{i} A^i(x)$, where

$$\sum_{y \in X} p^a_{x,y} = 1, \quad \forall x \in X, \ a \in \prod_{i} A^i(x);$$

- a starting state $x_0 \in X$.

The game starts in the state $x_0$ and the play proceeds in a sequence of stages. At stage $t$ the players observe the state $x_t$ and simultaneously and independently choose actions $a^i_t \in A^i(x_t)$, $i = 1, 2, \dots, n$. Then nature selects a state $y = x_{t+1}$ according to the probability transitions $p^{a_t}_{x_t,y}$ for the given action vector $a_t = (a^1_t, a^2_t, \dots, a^n_t)$. Such a play of the game produces a sequence of states and actions $x_0, a_0, x_1, a_1, \dots, x_t, a_t, \dots$ that defines a stream of stage payoffs $f^1_t = f^1(x_t, a_t), f^2_t = f^2(x_t, a_t), \dots, f^n_t = f^n(x_t, a_t)$, $t = 0, 1, 2, \dots$. The infinite average stochastic game is the game with payoffs of the players

$$\omega^i_{x_0} = \liminf_{t \to \infty} \mathsf{E}\left(\frac{1}{t} \sum_{\tau=0}^{t-1} f^i_\tau\right), \quad i = 1, 2, \dots, n,$$

where $\mathsf{E}$ is the expectation operator with respect to the probability measure in the Markov process induced by the starting state $x_0$ and by the actions chosen by the players in the states $x_t \in X$. So, $\omega^i_{x_0}$ expresses the average payoff per transition of player $i$ in the infinite game. Each player in this game has the aim to maximize his average payoff per transition. In the case $n = 1$ this game becomes the average Markov decision problem with a probability transition function $p: X \times \prod_{x \in X} A(x) \times X \to [0,1]$ and step rewards $f(x,a) = f^1(x,a)$ in the states $x \in X$ for given actions $a \in A(x) = A^1(x)$.
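To make this payoff definition concrete, the following minimal sketch simulates a play of a small two-player game under fixed pure stationary strategies and reports the empirical average payoff per transition; the two-state game data is a hypothetical illustration, not an example from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical game: 2 states, 2 players, 2 actions per player in every state.
# p[x, a1, a2] is the distribution of the next state for the action vector (a1, a2),
# f[i, x, a1, a2] is the stage payoff of player i.
p = np.array([[[[0.9, 0.1], [0.2, 0.8]],
               [[0.5, 0.5], [0.7, 0.3]]],
              [[[0.3, 0.7], [0.6, 0.4]],
               [[0.1, 0.9], [0.8, 0.2]]]])       # shape (x, a1, a2, y)
f = rng.integers(0, 5, size=(2, 2, 2, 2)).astype(float)

s1 = [0, 1]   # pure stationary strategy of player 1: the action fixed in states 0, 1
s2 = [1, 0]   # pure stationary strategy of player 2

x, totals, T = 0, np.zeros(2), 50_000
for t in range(T):
    a1, a2 = s1[x], s2[x]
    totals += f[:, x, a1, a2]                    # accumulate the stage payoffs
    x = rng.choice(2, p=p[x, a1, a2])            # nature selects the next state
print("empirical average payoffs:", totals / T)  # estimates omega^i_{x0} for x0 = 0
```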

In this paper we will study stochastic games in which the players use pure and mixed stationary strategies for selecting the actions in the states.

2.2. Pure and Mixed Stationary Strategies of the Players

A strategy (policy) of player $i \in \{1, 2, \dots, n\}$ in a stochastic game is a mapping $s^i$ that provides for every state $x_t \in X$ a probability distribution over the set of actions $A^i(x_t)$. If these probabilities take only the values 0 and 1, then $s^i$ is called a pure strategy, otherwise $s^i$ is called a mixed strategy. If these probabilities depend only on the state $x_t = x \in X$ (i.e. $s^i$ does not depend on $t$), then $s^i$ is called a stationary strategy, otherwise $s^i$ is called a non-stationary strategy.

Thus, a pure stationary strategy of player $i \in \{1, 2, \dots, n\}$ can be regarded as a map $s^i: x \mapsto a^i \in A^i(x)$ for $x \in X$ that determines for each state $x$ an action $a^i \in A^i(x)$, i.e. $s^i(x) = a^i$. Obviously, the corresponding sets of pure stationary strategies $S^1, S^2, \dots, S^n$ of the players in the game with finite state and action spaces are finite sets.

In the following we will identify a pure stationary strategy $s^i(x)$ of player $i$ with the set of boolean variables $s^i_{x,a^i} \in \{0,1\}$, where for a given $x \in X$ we have $s^i_{x,a^i} = 1$ if and only if player $i$ fixes the action $a^i \in A^i(x)$. So, we can represent the set of pure stationary strategies $S^i$ of player $i$ as the set of solutions of the following system:

$$\begin{cases} \sum\limits_{a^i \in A^i(x)} s^i_{x,a^i} = 1, & \forall x \in X;\\ s^i_{x,a^i} \in \{0,1\}, & \forall x \in X, \ \forall a^i \in A^i(x). \end{cases}$$

If in this system we replace the restriction $s^i_{x,a^i} \in \{0,1\}$ for $x \in X$, $a^i \in A^i(x)$ by the condition $0 \le s^i_{x,a^i} \le 1$, then we obtain the set of stationary strategies in the sense of Shapley, 1953, where $s^i_{x,a^i}$ is treated as the probability of choosing the action $a^i$ by player $i$ every time when the state $x$ is reached by any route in the dynamic stochastic game. Thus, we can identify the set of mixed stationary strategies of the players with the set of solutions of the system

$$\begin{cases} \sum\limits_{a^i \in A^i(x)} s^i_{x,a^i} = 1, & \forall x \in X;\\ s^i_{x,a^i} \ge 0, & \forall x \in X, \ \forall a^i \in A^i(x) \end{cases} \qquad (1)$$

and for a given profile $s = (s^1, s^2, \dots, s^n)$ of mixed strategies $s^1, s^2, \dots, s^n$ of the players the probability transition $p^s_{x,y}$ from a state $x$ to a state $y$ can be calculated as follows:

$$p^s_{x,y} = \sum_{(a^1, a^2, \dots, a^n) \in A(x)}\ \prod_{k=1}^{n} s^k_{x,a^k}\, p^{(a^1, a^2, \dots, a^n)}_{x,y}, \qquad (2)$$

where $A(x) = \prod_{i=1}^{n} A^i(x)$.
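As an illustration of formula (2), the following sketch computes the transition matrix $P^s$ induced by mixed stationary strategies of two players, using the same hypothetical transition data as in the previous sketch.

```python
import numpy as np

# Hypothetical data: 2 states, each of the two players has 2 actions per state.
# p[x, a1, a2] is the distribution of the next state; s1, s2 are mixed stationary
# strategies given as (state x action) matrices of probabilities.
p = np.array([[[[0.9, 0.1], [0.2, 0.8]],
               [[0.5, 0.5], [0.7, 0.3]]],
              [[[0.3, 0.7], [0.6, 0.4]],
               [[0.1, 0.9], [0.8, 0.2]]]])       # shape (x, a1, a2, y)
s1 = np.array([[0.5, 0.5], [1.0, 0.0]])
s2 = np.array([[0.2, 0.8], [0.3, 0.7]])

# Formula (2): p^s_{x,y} = sum over (a1,a2) of s1_{x,a1} * s2_{x,a2} * p^{(a1,a2)}_{x,y}.
Ps = np.einsum('xa,xb,xaby->xy', s1, s2, p)
assert np.allclose(Ps.sum(axis=1), 1.0)          # every row of P^s is a distribution
print(Ps)
```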

In the sequel we will distinguish stochastic games in pure and in mixed stationary strategies.

2.3. Average Stochastic Games in Pure Stationary Strategies

Let $s = (s^1, s^2, \dots, s^n)$ be a profile of pure stationary strategies of the players and denote by $a(s) = (a^1(s), a^2(s), \dots, a^n(s)) \in \prod_{x \in X} \prod_{i=1}^{n} A^i(x)$ the action vector that corresponds to $s$ and determines the probability distributions $p^s_{x,y} = p^{a(s)}_{x,y}$ in the states $x \in X$. Then the average payoffs per transition $\omega^1_{x_0}(s), \omega^2_{x_0}(s), \dots, \omega^n_{x_0}(s)$ for the players are determined as follows:

$$\omega^i_{x_0}(s) = \sum_{y \in X} q^s_{x_0,y}\, f^i(y, a(s)), \quad i = 1, 2, \dots, n,$$

where $q^s_{x_0,y}$ represent the limiting probabilities in the states $y \in X$ for the Markov process with probability transition matrix $P^s = (p^s_{x,y})$ when the transitions start in $x_0$. So, if for the Markov process with probability matrix $P^s$ the corresponding limiting probability matrix $Q^s = (q^s_{x,y})$ is known, then the average payoffs $\omega^1_{x_0}, \omega^2_{x_0}, \dots, \omega^n_{x_0}$ can be determined for an arbitrary starting state $x_0 \in X$.

The functions $\omega^1_{x_0}(s), \omega^2_{x_0}(s), \dots, \omega^n_{x_0}(s)$ on $S = S^1 \times S^2 \times \dots \times S^n$ define a game in normal form that we denote by $(\{S^i\}_{i=\overline{1,n}}, \{\omega^i_{x_0}(s)\}_{i=\overline{1,n}})$. This game corresponds to an average stochastic game in pure stationary strategies that in extended form is determined by the tuple $(X, \{A^i(x)\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, x_0)$.
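Numerically, the limiting matrix $Q^s$ can be approximated by the Cesàro average $\frac{1}{T} \sum_{t=0}^{T-1} (P^s)^t$; a minimal sketch with a hypothetical $P^s$ and hypothetical stage rewards follows.

```python
import numpy as np

def limiting_matrix(P, T=10_000):
    """Approximate Q^s = lim (1/T) sum_{t<T} P^t by Cesaro averaging of matrix powers."""
    Q, M = np.zeros_like(P), np.eye(P.shape[0])
    for _ in range(T):
        Q += M
        M = M @ P
    return Q / T

Ps = np.array([[0.5, 0.5, 0.0],
               [0.1, 0.9, 0.0],
               [0.3, 0.3, 0.4]])     # hypothetical P^s; state 2 is transient here
f  = np.array([1.0, 4.0, 2.0])      # hypothetical stage rewards f(y, a(s)) of one player

Qs = limiting_matrix(Ps)
omega = Qs @ f                      # omega_{x0}(s) for every starting state x0
print(omega)
```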

If an arbitrary profile $s = (s^1, s^2, \dots, s^n)$ of pure stationary strategies in a stochastic game induces a probability matrix $P^s$ that corresponds to a Markov unichain, then we say that the game possesses the unichain property and we shortly call it a unichain stochastic game; otherwise we call it a multichain stochastic game.

2.4. Average Stochastic Games in Mixed Stationary Strategies

Let $s = (s^1, s^2, \dots, s^n)$ be a profile of mixed stationary strategies of the players. Then the elements of the probability transition matrix $P^s = (p^s_{x,y})$ in the Markov process induced by $s$ can be calculated according to (2). Therefore, if $Q^s = (q^s_{x,y})$ is the limiting probability matrix of $P^s$, then the average payoffs per transition $\omega^1_{x_0}(s), \omega^2_{x_0}(s), \dots, \omega^n_{x_0}(s)$ for the players are determined as follows:

$$\omega^i_{x_0}(s) = \sum_{y \in X} q^s_{x_0,y}\, f^i(y, s), \quad i = 1, 2, \dots, n, \qquad (3)$$

where

$$f^i(y, s) = \sum_{(a^1, a^2, \dots, a^n) \in A(y)}\ \prod_{k=1}^{n} s^k_{y,a^k}\, f^i(y, a^1, a^2, \dots, a^n) \qquad (4)$$

expresses the average payoff (immediate reward) in the state $y \in X$ of player $i$ when the corresponding stationary strategies $s^1, s^2, \dots, s^n$ have been applied by players $1, 2, \dots, n$ in $y$.

Let $\overline{S}^1, \overline{S}^2, \dots, \overline{S}^n$ be the corresponding sets of mixed stationary strategies for the players $1, 2, \dots, n$, i.e. each $\overline{S}^i$ for $i \in \{1, 2, \dots, n\}$ represents the set of solutions of system (1). The functions $\omega^1_{x_0}(s), \omega^2_{x_0}(s), \dots, \omega^n_{x_0}(s)$ on $\overline{S} = \overline{S}^1 \times \overline{S}^2 \times \dots \times \overline{S}^n$, defined according to (3), (4), determine a game in normal form that we denote by $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\omega^i_{x_0}(s)\}_{i=\overline{1,n}})$. This game corresponds to an average stochastic game in mixed stationary strategies that in extended form is determined by the tuple $(X, \{A^i(x)\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, x_0)$.

2.5. Average Stochastic Games with Random Starting State

In the paper we will also consider average stochastic games in which the starting state is chosen randomly according to a given distribution $\{\theta_x\}$ on $X$. So, for a given stochastic game we will assume that the play starts in the states $x \in X$ with probabilities $\theta_x > 0$, where $\sum_{x \in X} \theta_x = 1$. If the players use mixed stationary strategies for selecting the actions in the states, then the payoff functions

$$\psi^i_\theta(s^1, s^2, \dots, s^n) = \sum_{x \in X} \theta_x\, \omega^i_x(s^1, s^2, \dots, s^n), \quad i = 1, 2, \dots, n$$

on $\overline{S} = \overline{S}^1 \times \overline{S}^2 \times \dots \times \overline{S}^n$ define a game in normal form $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i_\theta(s)\}_{i=\overline{1,n}})$ that in extended form is determined by the tuple $(X, \{A^i(x)\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, \{\theta_x\})$. In the case $\theta_x = 0, \forall x \in X \setminus \{x_0\}$, $\theta_{x_0} = 1$, the considered game becomes the game with fixed starting state $x_0$. In an analogous way we can specify the game in normal form $(\{S^i\}_{i=\overline{1,n}}, \{\psi^i_\theta(s)\}_{i=\overline{1,n}})$ for the average stochastic game with random starting state when the players use pure stationary strategies for selecting the actions in the states.


2.6. Definition of Stationary Nash Equilibria

Let $s = (s^1, s^2, \dots, s^n) \in \overline{S}$. Define $s^{-i} = (s^1, s^2, \dots, s^{i-1}, s^{i+1}, \dots, s^n)$ and write $s = (s^i, s^{-i})$, $i = 1, 2, \dots, n$. The profile $s^* = (s^{1*}, s^{2*}, \dots, s^{n*})$ is called a stationary Nash equilibrium for an average stochastic game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\omega^i_{x_0}(s)\}_{i=\overline{1,n}})$ with given starting state $x_0$ if

$$\omega^i_{x_0}(s^{i*}, s^{-i*}) \ge \omega^i_{x_0}(s^i, s^{-i*}), \quad \forall s^i \in \overline{S}^i, \ i = 1, 2, \dots, n. \qquad (5)$$

The profile $s^* = (s^{1*}, s^{2*}, \dots, s^{n*})$ is called a stationary Nash equilibrium for an average stochastic game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i_\theta(s)\}_{i=\overline{1,n}})$ when the starting state is chosen randomly according to a given distribution $\{\theta_x\}$ on $X$ if

$$\psi^i_\theta(s^{i*}, s^{-i*}) \ge \psi^i_\theta(s^i, s^{-i*}), \quad \forall s^i \in \overline{S}^i, \ i = 1, 2, \dots, n. \qquad (6)$$

3. An Approach for Determining Stationary Nash Equilibria in Average Stochastic Games with Unichain Property

In this section we show that a unichain average stochastic game in stationary strategies can be represented as a continuous game in normal form in which the payoffs are quasi-monotonic with respect to the corresponding strategies of the players. Using such a model we propose an approach for determining stationary Nash equilibria for unichain average stochastic games.

3.1. A Continuous Model for the Average Markov Decision Problem with Unichain Property

In (Lozovanu, 2011) it has been shown that an average Markov decision problem with unichain property can be formulated as the following optimization problem:

Maximize

$$\psi(s, q) = \sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, s_{x,a}\, q_x \qquad (7)$$

subject to

$$\begin{cases} q_y - \sum\limits_{x \in X} \sum\limits_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = 0, & \forall y \in X;\\ \sum\limits_{x \in X} q_x = 1;\\ \sum\limits_{a \in A(x)} s_{x,a} = 1, & \forall x \in X;\\ s_{x,a} \ge 0, & \forall x \in X, \ a \in A(x). \end{cases} \qquad (8)$$

Here $f(x,a)$ represents the step reward in the state $x \in X$ for a given action $a \in A(x)$ in the unichain problem and $p^a_{x,y}$ expresses the probability transition from $x \in X$ to $y \in X$ for $a \in A(x)$. The variables $s_{x,a}$ correspond to strategies of selection of the actions $a \in A(x)$ in the states $x \in X$, and $q_x$ for $x \in X$ represent the limiting probabilities in the states $x \in X$ of the Markov process with the transition matrix $P^s = (p^s_{x,y})$ induced by the stationary strategy $s$.

In this problem the average reward $\psi(s, q)$ is maximized under the conditions (8) that determine the set of feasible stationary strategies in the unichain problem. An optimal solution $(s^*, q^*)$ of problem (7), (8) with $s^*_{x,a} \in \{0,1\}$ corresponds to an optimal pure stationary strategy $s^*: X \to A$, where $a^* = s^*(x)$ for $x \in X$ if $s^*_{x,a^*} = 1$. Using the notations $\alpha_{x,a} = s_{x,a}\, q_x$ for $x \in X$, $a \in A(x)$, problem (7), (8) can be

easily transformed into the following linear programming problem: Maximize

$$\psi(\alpha) = \sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, \alpha_{x,a} \qquad (9)$$

subject to

$$\begin{cases} q_y - \sum\limits_{x \in X} \sum\limits_{a \in A(x)} p^a_{x,y}\, \alpha_{x,a} = 0, & \forall y \in X;\\ \sum\limits_{x \in X} q_x = 1;\\ \sum\limits_{a \in A(x)} \alpha_{x,a} - q_x = 0, & \forall x \in X;\\ \alpha_{x,a} \ge 0, & \forall x \in X, \ a \in A(x). \end{cases} \qquad (10)$$

By eliminating the variables $q_x$ from (10) we can reduce this problem to the problem in which it is necessary to maximize the objective function (9) on the set of solutions of the following system:

$$\begin{cases} \sum\limits_{a \in A(y)} \alpha_{y,a} - \sum\limits_{x \in X} \sum\limits_{a \in A(x)} p^a_{x,y}\, \alpha_{x,a} = 0, & \forall y \in X;\\ \sum\limits_{x \in X} \sum\limits_{a \in A(x)} \alpha_{x,a} = 1;\\ \alpha_{x,a} \ge 0, & \forall x \in X, \ a \in A(x). \end{cases} \qquad (11)$$
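Under the unichain assumption, problem (9), (11) is an ordinary linear program and can be solved directly; below is a minimal sketch with hypothetical two-state, two-action data that uses scipy.optimize.linprog and then recovers the stationary strategy as $s_{x,a} = \alpha_{x,a} / \sum_{a} \alpha_{x,a}$.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical unichain average MDP: 2 states, 2 actions per state.
nX, nA = 2, 2
p = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.9, 0.1]]])      # p[x, a, y]
f = np.array([[3.0, 1.0], [0.0, 2.0]])        # step rewards f[x, a]

# Variables alpha_{x,a}, flattened as x * nA + a; constraints of system (11).
A_eq = np.zeros((nX + 1, nX * nA))
for y in range(nX):
    for x in range(nX):
        for a in range(nA):
            A_eq[y, x * nA + a] = (1.0 if x == y else 0.0) - p[x, a, y]
A_eq[nX, :] = 1.0                              # normalization: sum of alpha equals 1
b_eq = np.append(np.zeros(nX), 1.0)

res = linprog(c=-f.ravel(), A_eq=A_eq, b_eq=b_eq)   # maximize (9) via minimizing -(9)
alpha = res.x.reshape(nX, nA)
q = alpha.sum(axis=1)      # limiting probabilities; positive here since every pure
s = alpha / q[:, None]     # policy of this example induces an irreducible chain
print("optimal average reward:", -res.fun, "\noptimal stationary strategy:\n", s)
```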

Based on the relationship mentioned above between problem (7), (8) and problem (9), (11) in (Lozovanu, 2011) the following result has been announced.

Lemma 1. Let an average Markov decision problem with unichain property be given and consider the function

$$\psi(s) = \sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, s_{x,a}\, q_x,$$

where $q_x$ for $x \in X$ are determined uniquely from the following system of linear equations:

$$q_y - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = 0, \ \forall y \in X; \qquad \sum_{x \in X} q_x = 1.$$

Then the function $\psi(s)$ on the set $S$ of solutions of the system

$$\sum_{a \in A(x)} s_{x,a} = 1, \ \forall x \in X; \qquad s_{x,a} \ge 0, \ \forall x \in X, \ a \in A(x)$$

depends only on $s_{x,a}$ for $x \in X$, $a \in A(x)$, and $\psi(s)$ is quasi-monotonic on $S$ (i.e. $\psi(s)$ is quasi-concave and quasi-convex on $S$; see Boyd and Vandenberghe, 2004). Moreover, $\psi(s) = \omega_x(s), \ \forall x \in X$.

The full proof of this lemma in a more general form is presented in (Lozovanu, 2018).

3.2. Stationary Equilibria for Average Stochastic Games with Unichain Property

An average stochastic game with unichain property can be formulated in terms of stationary strategies as follows.

Let $\overline{S} = \overline{S}^1 \times \overline{S}^2 \times \dots \times \overline{S}^n$, where each $\overline{S}^i$ for $i \in \{1, 2, \dots, n\}$ represents the set of solutions of system (1), i.e. $\overline{S}^i$ represents the set of mixed stationary strategies of player $i$. On $\overline{S}$ we define the average payoffs for the players as follows:

$$\psi^i(s^1, s^2, \dots, s^n) = \sum_{x \in X}\ \sum_{(a^1, a^2, \dots, a^n) \in A(x)}\ \prod_{k=1}^{n} s^k_{x,a^k}\, f^i(x, a^1, a^2, \dots, a^n)\, q_x, \quad i = 1, 2, \dots, n,$$

where $q_x$ for $x \in X$ are determined uniquely from the following system of linear equations:

$$\begin{cases} \sum\limits_{x \in X}\ \sum\limits_{(a^1, a^2, \dots, a^n) \in A(x)}\ \prod\limits_{k=1}^{n} s^k_{x,a^k}\, p^{(a^1, a^2, \dots, a^n)}_{x,y}\, q_x = q_y, & \forall y \in X;\\ \sum\limits_{x \in X} q_x = 1, \end{cases}$$

where $s^i \in \overline{S}^i$, $i = 1, 2, \dots, n$. The functions $\psi^i(s^1, s^2, \dots, s^n)$, $i = 1, 2, \dots, n$ on $\overline{S}$ define a game in normal form $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i(s)\}_{i=\overline{1,n}})$ that corresponds to a stationary average stochastic game with unichain property, where $\psi^i(s^1, s^2, \dots, s^n) = \omega^i_x(s^1, s^2, \dots, s^n), \ \forall x \in X, \ i = 1, 2, \dots, n$. From Lemma 1 we obtain the following result.

Lemma 2. For an arbitrary unichain stochastic game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i(s)\}_{i=\overline{1,n}})$ each payoff function $\psi^i(s^i, s^{-i})$, $i \in \{1, 2, \dots, n\}$ is quasi-monotonic with respect to $s^i \in \overline{S}^i$ for arbitrary fixed $s^{-i} \in \overline{S}^{-i}$.

Based on Lemma 2 and results from (Debreu, 1952; Glicksberg, 1952) we obtain the following theorem.

Theorem 1. Let $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i(s)\}_{i=\overline{1,n}})$ be an average stochastic game determined by $(X, \{A^i(x)\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, x)$. If for an arbitrary profile $s = (s^1, s^2, \dots, s^n) \in \overline{S}$ of the game the transition probability matrix $P^s = (p^s_{x,y})$ corresponds to a Markov unichain, then for the game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i(s)\}_{i=\overline{1,n}})$ there exists a Nash equilibrium $s^* = (s^{1*}, s^{2*}, \dots, s^{n*})$ which is a stationary Nash equilibrium of the average stochastic game for an arbitrary starting state $x \in X$.

Proof. According to Lemma 2 each payoff $\psi^i(s^i, s^{-i})$, $i \in \{1, 2, \dots, n\}$ is quasi-monotonic with respect to $s^i \in \overline{S}^i$ for fixed $s^{-i} \in \overline{S}^{-i}$. Additionally, each payoff $\psi^i(s)$, $i \in \{1, 2, \dots, n\}$ is continuous on $\overline{S}$ because the stochastic game is unichain. Then according to (Debreu, 1952; Glicksberg, 1952) the game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i(s)\}_{i=\overline{1,n}})$ possesses a pure Nash equilibrium $s^* \in \overline{S}$ which is a stationary Nash equilibrium for the unichain average stochastic game with an arbitrary starting state $x \in X$. □

Thus, if $s^*$ is a Nash equilibrium for the game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i(s)\}_{i=\overline{1,n}})$, then $s^*$ is a stationary Nash equilibrium for the average stochastic game with unichain property.

4. Some Results for a Multichain Average Markov Decision Problem

In this section we extend the results from Section 3.1 to the multichain average Markov decision problem, i.e. we show how this decision problem can be formulated in terms of stationary strategies. We shall use these results in the next section for average stochastic games in the general case.

4.1. A Linear Programming Approach for a Multichain Decision Problem

The basic model that we shall use in the sequel for formulating and studying a Markov decision problem in terms of stationary strategies is the following linear programming problem (Kallenberg, 2016; Lozovanu and Pickl, 2015; Puterman, 2005):

Maximize

$$\psi(\alpha, \beta) = \sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, \alpha_{x,a} \qquad (12)$$

subject to

$$\begin{cases} \sum\limits_{a \in A(y)} \alpha_{y,a} - \sum\limits_{x \in X} \sum\limits_{a \in A(x)} p^a_{x,y}\, \alpha_{x,a} = 0, & \forall y \in X;\\ \sum\limits_{a \in A(y)} \alpha_{y,a} + \sum\limits_{a \in A(y)} \beta_{y,a} - \sum\limits_{x \in X} \sum\limits_{a \in A(x)} p^a_{x,y}\, \beta_{x,a} = \theta_y, & \forall y \in X;\\ \alpha_{x,a} \ge 0, \ \beta_{x,a} \ge 0, & \forall x \in X, \ a \in A(x), \end{cases} \qquad (13)$$

where $\theta_y$ for $y \in X$ represent arbitrary positive values that satisfy the condition $\sum_{y \in X} \theta_y = 1$. Recall that $f(x,a)$ denotes the step reward in a state $x \in X$ for a given action $a \in A(x)$ in the decision problem and $p^a_{x,y}$ represent the corresponding probability transitions from a state $x \in X$ to the states $y \in X$ for $a \in A(x)$, where $\sum_{y \in X} p^a_{x,y} = 1$.
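A sketch of how the linear program (12), (13) might be assembled and solved with scipy.optimize.linprog follows; the game data and the uniform distribution $\theta$ are hypothetical, and the stationary strategy is recovered through (15) below.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical multichain average MDP: 3 states, 2 actions per state, uniform theta.
nX, nA = 3, 2
rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(nX), size=(nX, nA))   # p[x, a, :] is a distribution over y
f = rng.uniform(0.0, 5.0, size=(nX, nA))        # step rewards f[x, a]
theta = np.full(nX, 1.0 / nX)

n = nX * nA                 # variables: alpha (first n entries), beta (last n entries)
idx = lambda x, a: x * nA + a
A_eq = np.zeros((2 * nX, 2 * n))
for y in range(nX):
    for x in range(nX):
        for a in range(nA):
            d = 1.0 if x == y else 0.0
            A_eq[y, idx(x, a)] = d - p[x, a, y]           # first block of (13)
            A_eq[nX + y, idx(x, a)] = d                   # alpha part of second block
            A_eq[nX + y, n + idx(x, a)] = d - p[x, a, y]  # beta part of second block
b_eq = np.concatenate([np.zeros(nX), theta])

res = linprog(c=np.concatenate([-f.ravel(), np.zeros(n)]), A_eq=A_eq, b_eq=b_eq)
alpha, beta = res.x[:n].reshape(nX, nA), res.x[n:].reshape(nX, nA)

# Strategy recovery in the spirit of (15): use alpha on X_alpha and beta elsewhere.
qa, qb = alpha.sum(axis=1, keepdims=True), beta.sum(axis=1, keepdims=True)
s = np.where(qa > 1e-9, alpha / np.maximum(qa, 1e-12), beta / np.maximum(qb, 1e-12))
print("optimal average reward:", -res.fun, "\nstationary strategy:\n", s)
```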

This problem generalizes the unichain linear programming model (9), (11) from Section 3.1. In (13) the restrictions

$$\sum_{a \in A(y)} \alpha_{y,a} + \sum_{a \in A(y)} \beta_{y,a} - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, \beta_{x,a} = \theta_y, \quad \forall y \in X \qquad (14)$$

with the condition $\sum_{y \in X} \theta_y = 1$ generalize the constraint $\sum_{x \in X} \sum_{a \in A(x)} \alpha_{x,a} = 1$ of the unichain model. The relationship between feasible solutions of problem (12), (13) and stationary strategies in the average Markov decision problem is the following (see Puterman, 2005):

Let $(\alpha, \beta)$ be a feasible solution of the linear programming problem (12), (13) and denote $X_\alpha = \{x \in X \mid \sum_{a \in A(x)} \alpha_{x,a} > 0\}$. Then $(\alpha, \beta)$ possesses the property that $\sum_{a \in A(x)} \beta_{x,a} > 0$ for $x \in X \setminus X_\alpha$, and a stationary strategy that corresponds to $(\alpha, \beta)$ is determined as follows:

$$s_{x,a} = \begin{cases} \dfrac{\alpha_{x,a}}{\sum_{a \in A(x)} \alpha_{x,a}} & \text{if } x \in X_\alpha;\\[8pt] \dfrac{\beta_{x,a}}{\sum_{a \in A(x)} \beta_{x,a}} & \text{if } x \in X \setminus X_\alpha, \end{cases} \qquad (15)$$

where $s_{x,a}$ expresses the probability of choosing the action $a \in A(x)$ in the state $x \in X$. It is easy to see that the set of feasible solutions of problem (12), (13) generates through (15) the set of stationary strategies $S$ that corresponds to the set of solutions of the following system:

$$\sum_{a \in A(x)} s_{x,a} = 1, \ \forall x \in X; \qquad s_{x,a} \ge 0, \ \forall x \in X, \ \forall a \in A(x).$$

In (Kallenberg, 2016; Lozovanu and Pickl, 2015; Puterman, 2005) the problem (12), (13) is regarded as the dual model of the following linear programming problem:

Minimize

$$\psi(\varepsilon, \omega) = \sum_{x \in X} \theta_x\, \omega_x \qquad (16)$$

subject to

$$\begin{cases} \varepsilon_x + \omega_x \ge f(x,a) + \sum\limits_{y \in X} p^a_{x,y}\, \varepsilon_y, & \forall x \in X, \ \forall a \in A(x);\\ \omega_x \ge \sum\limits_{y \in X} p^a_{x,y}\, \omega_y, & \forall x \in X, \ \forall a \in A(x). \end{cases} \qquad (17)$$

The optimal value of the objective function in this problem, as well as the optimal values of the objective functions in problems (12), (13) and (16), (17), express the optimal average reward when the initial state is chosen according to the distribution $\{\theta_x\}$. Solving problem (16), (17) we obtain the value $\omega^*_x$ for each $x \in X$, which expresses the optimal average reward for the decision problem with fixed starting state $x$, i.e. for the case when $\theta_x$ is equal to 1. This means that if $(\alpha^*, \beta^*)$ is an optimal solution of problem (12), (13) then we can determine the optimal strategy $s^*$ and the optimal values of the objective functions of problems (16), (17) and (12), (13), where $\psi(\varepsilon^*, \omega^*) = \psi(\alpha^*, \beta^*)$. An arbitrary optimal solution of problem (12), (13) or of problem (16), (17) determines an optimal strategy $s^*$ that is an optimal stationary strategy for the multichain average decision problem with an arbitrary starting state $x \in X$.

Remark 1. Problems (12), (13) and (16), (17) can also be considered for the case when $\theta_x = 0$ for some $x \in X$. In particular, if $\theta_x = 0, \forall x \in X \setminus \{x_0\}$ and $\theta_{x_0} = 1$, then these problems are transformed into the models with fixed starting state $x_0$. In this case for a feasible solution $(\alpha, \beta)$ the subset $X \setminus X_\alpha$ may contain states for which $\sum_{a \in A(x)} \beta_{x,a} = 0$, and in such states (15) cannot be used for determining $s_{x,a}$. Formula (15) can be used for determining the strategies $s_{x,a}$ in the states $x \in X$ for which either $\sum_{a \in A(x)} \alpha_{x,a} > 0$ or $\sum_{a \in A(x)} \beta_{x,a} > 0$, and these strategies determine the value of the objective function in the decision problem. In the states $x \in X_0$, where

$$X_0 = \{x \in X \mid \sum_{a \in A(x)} \alpha_{x,a} = 0, \ \sum_{a \in A(x)} \beta_{x,a} = 0\},$$

the strategies of selection of the actions may be arbitrary because they do not affect the value of the objective function.

4.2. A Multichain Markov Decision Model in Terms of Stationary Strategies

The multichain average Markov decision model in terms of stationary strategies that generalizes the unichain model (7), (8) from Section 3.1 is the following:

Maximize

$$\psi(s, q, w) = \sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, s_{x,a}\, q_x \qquad (18)$$

subject to

$$\begin{cases} q_y - \sum\limits_{x \in X} \sum\limits_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = 0, & \forall y \in X;\\ q_y + w_y - \sum\limits_{x \in X} \sum\limits_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, w_x = \theta_y, & \forall y \in X;\\ \sum\limits_{a \in A(y)} s_{y,a} = 1, & \forall y \in X;\\ s_{x,a} \ge 0, \ \forall x \in X, \ \forall a \in A(x); \quad w_x \ge 0, \ \forall x \in X, \end{cases} \qquad (19)$$

where $\theta_y$ are the same values as in problem (12), (13) and $s_{x,a}, q_x, w_x$ for $x \in X$, $a \in A(x)$ represent the variables that must be found.

Theorem 2. Optimization problem (18), (19) determines the optimal stationary strategies of the multichain average Markov decision problem.

Proof. Indeed, if we assume that each action set $A(x)$, $x \in X$ contains a single action $a'$, then system (13) is transformed into the following system of equations:

$$\begin{cases} q_y - \sum\limits_{x \in X} p_{x,y}\, q_x = 0, & \forall y \in X;\\ q_y + w_y - \sum\limits_{x \in X} p_{x,y}\, w_x = \theta_y, & \forall y \in X \end{cases}$$

with the conditions $q_y, w_y \ge 0$ for $y \in X$, where $q_y = \alpha_{y,a'}$, $w_y = \beta_{y,a'}$, $\forall y \in X$ and $p_{x,y} = p^{a'}_{x,y}$, $\forall x, y \in X$. This system uniquely determines $q_x$ for $x \in X$, and it determines $w_x$ for $x \in X$ up to additive constants in the recurrent classes of the Markov chain with transition matrix $P = (p_{x,y})$ (see Puterman, 2005). Here $q_x$ represents the limiting probability in the state $x$ when the transitions start in the states $y \in X$ with probabilities $\theta_y$, and therefore $q_x \ge 0$ for $x \in X$. The values $w_x$ in some states may be negative; however, the additive constants in the corresponding recurrent classes can always be chosen so that $w_x \ge 0$ for $x \in X$, and this does not affect the value of the objective function of the problem. In the case $|A(x)| = 1, \forall x \in X$ the average cost is determined as $\psi = \sum_{x \in X} f(x)\, q_x$, where $f(x) = f(x, a'), \forall x \in X$.

If the action sets $A(x)$, $x \in X$ may contain more than one action, then for a given stationary strategy $s \in S$ of selection of the actions in the states we can find the average cost $\psi(s)$ in a similar way as above by considering the probability matrix $P^s = (p^s_{x,y})$, where

$$p^s_{x,y} = \sum_{a \in A(x)} s_{x,a}\, p^a_{x,y} \qquad (20)$$

expresses the probability transition from $x \in X$ to a state $y \in X$ when the strategy $s$ is applied. In this case, to determine the average cost we have to solve the following system of equations:

$$\begin{cases} q_y - \sum\limits_{x \in X} p^s_{x,y}\, q_x = 0, & \forall y \in X;\\ q_y + w_y - \sum\limits_{x \in X} p^s_{x,y}\, w_x = \theta_y, & \forall y \in X. \end{cases}$$

If in this system we take into account (20), then this system can be written as follows:

$$\begin{cases} q_y - \sum\limits_{x \in X} \sum\limits_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = 0, & \forall y \in X;\\ q_y + w_y - \sum\limits_{x \in X} \sum\limits_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, w_x = \theta_y, & \forall y \in X. \end{cases} \qquad (21)$$

An arbitrary solution $(q, w)$ of the system of equations (21) uniquely determines $q_y$ for $y \in X$, and consequently the average cost

$$\psi(s) = \sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, s_{x,a}\, q_x \qquad (22)$$

is determined uniquely for the given strategy $s$. If we are seeking an optimal stationary strategy, then we should add to (21) the conditions

$$\sum_{a \in A(x)} s_{x,a} = 1, \ \forall x \in X; \qquad s_{x,a} \ge 0, \ \forall x \in X, \ a \in A(x) \qquad (23)$$

and maximize (22) under the constraints (21), (23). In such a way we obtain problem (18), (19). The conditions $w_x \ge 0$ for $x \in X$ in (19) do not restrict generality: as noted above, $w_x$ is determined up to additive constants in the recurrent classes, so these constants can always be chosen such that $w_x \ge 0$ for $x \in X$, and therefore we can preserve these conditions, which show the relationship of problem (18), (19) with problem (12), (13). □

Corollary 1. If $\theta_x > 0, \forall x \in X$, then an arbitrary optimal strategy $s^*$ of problem (18), (19) is an optimal stationary strategy for the multichain average decision problem with an arbitrary starting state $x \in X$. If $\theta_x = 0, \forall x \in X \setminus \{x_0\}$ and $\theta_{x_0} = 1$, then an optimal strategy $s^*$ of problem (18), (19) is an optimal stationary strategy for the multichain average decision problem with starting state $x_0$.

The relationship between feasible solutions of problem (12), (13) and feasible solutions of problem (18), (19) can be established on the basis of the following lemma.

Lemma 3. Let $(s, q, w)$ be a feasible solution of problem (18), (19). Then

$$\alpha_{x,a} = s_{x,a}\, q_x, \quad \beta_{x,a} = s_{x,a}\, w_x, \quad \forall x \in X, \ a \in A(x) \qquad (24)$$

represent a feasible solution $(\alpha, \beta)$ of problem (12), (13), and $\psi(s, q, w) = \psi(\alpha, \beta)$. If $(\alpha, \beta)$ is a feasible solution of problem (12), (13) with $\theta_x > 0, \forall x \in X$, then a feasible solution $(s, q, w)$ of problem (18), (19) can be determined as follows:

$$s_{x,a} = \begin{cases} \dfrac{\alpha_{x,a}}{\sum_{a \in A(x)} \alpha_{x,a}} & \text{for } x \in X_\alpha, \ a \in A(x);\\[8pt] \dfrac{\beta_{x,a}}{\sum_{a \in A(x)} \beta_{x,a}} & \text{for } x \in X \setminus X_\alpha, \ a \in A(x); \end{cases} \qquad (25)$$

$$q_x = \sum_{a \in A(x)} \alpha_{x,a}, \qquad w_x = \sum_{a \in A(x)} \beta_{x,a} \quad \text{for } x \in X.$$

If $(\alpha, \beta)$ is a feasible solution of problem (12), (13) for which $\theta_x = 0, \forall x \in X \setminus \{x_0\}$ and $\theta_{x_0} = 1$, then a feasible solution $(s, q, w)$ of problem (18), (19) can be determined as follows:

$$s_{x,a} = \begin{cases} \dfrac{\alpha_{x,a}}{\sum_{a \in A(x)} \alpha_{x,a}} & \text{for } x \in X_\alpha, \ a \in A(x);\\[8pt] \dfrac{\beta_{x,a}}{\sum_{a \in A(x)} \beta_{x,a}} & \text{for } x \in X \setminus (X_\alpha \cup X_0), \ a \in A(x);\\[8pt] \text{arbitrary} & \text{for } x \in X_0, \ a \in A(x); \end{cases}$$

$$q_x = \sum_{a \in A(x)} \alpha_{x,a}, \qquad w_x = \sum_{a \in A(x)} \beta_{x,a} \quad \text{for } x \in X,$$

where $X_0 = \{x \in X \mid \sum_{a \in A(x)} \alpha_{x,a} = 0, \ \sum_{a \in A(x)} \beta_{x,a} = 0\}$.

Proof. If $(s, q, w)$ is a feasible solution of problem (18), (19) and $(\alpha, \beta)$ is determined according to (24), then by introducing (24) in (12), (13) we obtain that (13) is transformed into (19) and $\psi(s, q, w) = \psi(\alpha, \beta)$, i.e. $(\alpha, \beta)$ is a feasible solution of problem (12), (13). The second part of the lemma follows directly from the properties of feasible solutions of problems (12), (13) and (18), (19). □
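The correspondence in Lemma 3 is directly computable; below is a small sketch of the two mappings (24) and (25) for the case $\theta_x > 0, \forall x \in X$, with $s$ stored as a (states × actions) numpy array and $q$, $w$ as vectors.

```python
import numpy as np

def to_alpha_beta(s, q, w):
    """Mapping (24): a feasible (s, q, w) of (18),(19) to (alpha, beta) of (12),(13)."""
    return s * q[:, None], s * w[:, None]

def to_s_q_w(alpha, beta, eps=1e-12):
    """Mapping (25), assuming theta_x > 0 for all x: (alpha, beta) to (s, q, w)."""
    q, w = alpha.sum(axis=1), beta.sum(axis=1)
    on_X_alpha = q > eps                       # the states of X_alpha
    s = np.where(on_X_alpha[:, None],
                 alpha / np.maximum(q, eps)[:, None],
                 beta / np.maximum(w, eps)[:, None])
    return s, q, w
```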

4.3. The Main Properties of the Problem in Stationary Strategies

Using problem (18), (19) we can now extend the results from Section 3.1 to the general case of an average Markov decision problem.

Theorem 3. Let an average Markov decision problem be given and consider the function

$$\psi(s) = \sum_{x \in X} \sum_{a \in A(x)} f(x,a)\, s_{x,a}\, q_x, \qquad (26)$$

where $q_x$ for $x \in X$ satisfy the condition

$$\begin{cases} q_y - \sum\limits_{x \in X} \sum\limits_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = 0, & \forall y \in X;\\ q_y + w_y - \sum\limits_{x \in X} \sum\limits_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, w_x = \theta_y, & \forall y \in X. \end{cases} \qquad (27)$$

Then on the set $S$ of solutions of the system

$$\sum_{a \in A(x)} s_{x,a} = 1, \ \forall x \in X; \qquad s_{x,a} \ge 0, \ \forall x \in X, \ a \in A(x) \qquad (28)$$

the function $\psi(s)$ depends only on $s_{x,a}$ for $x \in X$, $a \in A(x)$, and $\psi(s)$ is quasi-monotonic on $S$ (i.e. $\psi(s)$ is quasi-convex and quasi-concave on $S$).

The proof of this theorem can be found in (Lozovanu, 2018).

5. The Main Results for Average Stochastic Games

In this section we extend the results from Section 3.2 to the case of multichain average stochastic games in stationary strategies. We show that a multichain average stochastic game in normal form can be formulated as a game in which the payoffs possess the quasi-monotonicity property with respect to the corresponding strategies of the players. Based on this property we present conditions for the existence of stationary Nash equilibria in multichain average stochastic games.

5.1. A Normal Form of Average Stochastic Game in Stationary Strategies

The multichain average stochastic game in stationary strategies that generalizes the unichain game model from Section 3.2 is the following:

Let $\overline{S}^i$, $i \in \{1, 2, \dots, n\}$ be the set of solutions of the system

$$\begin{cases} \sum\limits_{a^i \in A^i(x)} s^i_{x,a^i} = 1, & \forall x \in X;\\ s^i_{x,a^i} \ge 0, & \forall x \in X, \ a^i \in A^i(x) \end{cases} \qquad (29)$$

that determines the set of stationary strategies of player $i$. Each $\overline{S}^i$ is a convex compact set, and an arbitrary extreme point of it corresponds to a basic solution $s^i$ of system (29), where $s^i_{x,a^i} \in \{0,1\}$, $\forall x \in X$, $a^i \in A^i(x)$, i.e. each basic solution of this system corresponds to a pure stationary strategy of player $i$. On $\overline{S} = \overline{S}^1 \times \overline{S}^2 \times \dots \times \overline{S}^n$ we define $n$ payoff functions

$$\psi^i_\theta(s^1, s^2, \dots, s^n) = \sum_{x \in X}\ \sum_{(a^1, a^2, \dots, a^n) \in A(x)}\ \prod_{k=1}^{n} s^k_{x,a^k}\, f^i(x, a^1, a^2, \dots, a^n)\, q_x, \quad i = 1, 2, \dots, n, \qquad (30)$$

where $q_x$ for $x \in X$ are determined uniquely from the following system of linear equations:

$$\begin{cases} q_y - \sum\limits_{x \in X}\ \sum\limits_{(a^1, a^2, \dots, a^n) \in A(x)}\ \prod\limits_{k=1}^{n} s^k_{x,a^k}\, p^{(a^1, a^2, \dots, a^n)}_{x,y}\, q_x = 0, & \forall y \in X;\\ q_y + w_y - \sum\limits_{x \in X}\ \sum\limits_{(a^1, a^2, \dots, a^n) \in A(x)}\ \prod\limits_{k=1}^{n} s^k_{x,a^k}\, p^{(a^1, a^2, \dots, a^n)}_{x,y}\, w_x = \theta_y, & \forall y \in X, \end{cases} \qquad (31)$$

for an arbitrary fixed $s = (s^1, s^2, \dots, s^n) \in \overline{S}$. The functions $\psi^i_\theta(s^1, s^2, \dots, s^n)$, $i = 1, 2, \dots, n$, represent the payoff functions for the average stochastic game in normal form $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i_\theta(s)\}_{i=\overline{1,n}})$. This game is determined by the tuple $(X, \{A^i(x)\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, \{\theta_y\})$, where $\theta_y$ for $y \in X$ are given nonnegative values such that $\sum_{y \in X} \theta_y = 1$.

If $\theta_y = 0, \forall y \in X \setminus \{x_0\}$ and $\theta_{x_0} = 1$, then we obtain an average stochastic game in normal form $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\omega^i_{x_0}(s)\}_{i=\overline{1,n}})$ when the starting state $x_0$ is fixed, i.e. $\psi^i_\theta(s^1, s^2, \dots, s^n) = \omega^i_{x_0}(s^1, s^2, \dots, s^n)$, $i = 1, 2, \dots, n$. So, in this case the game is determined by $(X, \{A^i(x)\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, x_0)$.

If $\theta_y > 0, \forall y \in X$ and $\sum_{y \in X} \theta_y = 1$, then we obtain an average stochastic game when the play starts in the states $y \in X$ with probabilities $\theta_y$. In this case for the payoffs of the players in the game in normal form we have

$$\psi^i_\theta(s^1, s^2, \dots, s^n) = \sum_{y \in X} \theta_y\, \omega^i_y(s^1, s^2, \dots, s^n), \quad i = 1, 2, \dots, n. \qquad (32)$$
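For a fixed strategy profile the payoffs (30) can thus be computed by solving the linear system (31) for $q$; a sketch for two players with hypothetical induced data follows. A least-squares solve is used because $w$ is determined only up to additive constants on recurrent classes, while $q$ is unique.

```python
import numpy as np

def psi_theta(theta, Ps, fs):
    """Evaluate the payoffs (30): Ps is the transition matrix induced by the profile s
    (via (2)) and fs[i] the induced stage payoffs f^i(x, s) (via (4)); the pair (q, w)
    is obtained from system (31) written in block-matrix form."""
    nX = Ps.shape[0]
    I = np.eye(nX)
    A = np.block([[I - Ps.T, np.zeros((nX, nX))],
                  [I,        I - Ps.T          ]])
    b = np.concatenate([np.zeros(nX), theta])
    z, *_ = np.linalg.lstsq(A, b, rcond=None)   # consistent system; the q part is unique
    q = z[:nX]
    return fs @ q                               # vector of psi^i_theta(s), one per player

# Hypothetical 2-state example (Ps and fs would come from (2) and (4)):
Ps = np.array([[0.6, 0.4], [0.2, 0.8]])
fs = np.array([[1.0, 3.0],      # induced payoffs of player 1 in states 0, 1
               [2.0, 0.5]])     # induced payoffs of player 2
theta = np.array([0.5, 0.5])
print(psi_theta(theta, Ps, fs))
```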

5.2. The Main Properties of Average Stochastic Games in Normal Form

Based on results from the previous section we can prove the following results.

Theorem 4. Let $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i_\theta(s)\}_{i=\overline{1,n}})$ be the game in normal form for the average stochastic game in stationary strategies determined by $(X, \{A^i(x)\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, \{\theta_y\})$, where $\theta_y > 0, \forall y \in X$, $\sum_{y \in X} \theta_y = 1$. If for this game there exists a Nash equilibrium $s^* = (s^{1*}, s^{2*}, \dots, s^{n*})$, then it is a Nash equilibrium for the game in normal form $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\omega^i_y(s)\}_{i=\overline{1,n}})$ with an arbitrary $y \in X$, i.e. $s^* = (s^{1*}, s^{2*}, \dots, s^{n*})$ is a stationary Nash equilibrium of the average stochastic game with an arbitrary starting state $y \in X$. Conversely, if for every starting state $y \in X$ the game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\omega^i_y(s)\}_{i=\overline{1,n}})$ has a Nash equilibrium, then for an arbitrary distribution function $\{\theta_y\}$ on $X$ with $\theta_y > 0, \forall y \in X$ ($\sum_{y \in X} \theta_y = 1$) the corresponding game in normal form $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i_\theta(s)\}_{i=\overline{1,n}})$ of the average stochastic game determined by $(X, \{A^i(x)\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, \{\theta_y\})$ has a Nash equilibrium $s^* = (s^{1*}, s^{2*}, \dots, s^{n*})$ which is a Nash equilibrium for each of the games in normal form $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\omega^i_y(s)\}_{i=\overline{1,n}})$ with the corresponding starting states $y \in X$.

Proof. Let $s^* = (s^{1*}, s^{2*}, \dots, s^{n*})$ be a Nash equilibrium for the game in normal form $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i_\theta(s)\}_{i=\overline{1,n}})$ determined by $(X, \{A^i(x)\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, \{\theta_y\})$, where $\theta_y > 0, \forall y \in X$, $\sum_{y \in X} \theta_y = 1$. Then $(s^{1*}, s^{2*}, \dots, s^{n*})$ is a Nash equilibrium for the average stochastic game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i_{\theta'}(s)\}_{i=\overline{1,n}})$ with an arbitrary distribution $\{\theta'_y\}$ on $X$, where $\theta'_y > 0, \forall y \in X$, $\sum_{y \in X} \theta'_y = 1$, i.e.

$$\psi^i_{\theta'}(s^{i*}, s^{-i*}) \ge \psi^i_{\theta'}(s^i, s^{-i*}), \quad \forall s^i \in \overline{S}^i, \ i = 1, 2, \dots, n.$$

If here we express $\psi^i_{\theta'}$ via $\omega^i_y$ using (32), then we obtain

$$\sum_{y \in X} \theta'_y \left( \omega^i_y(s^{i*}, s^{-i*}) - \omega^i_y(s^i, s^{-i*}) \right) \ge 0, \quad \forall s^i \in \overline{S}^i, \ i = 1, 2, \dots, n.$$

This property holds for arbitrary $\theta'_y > 0, \forall y \in X$ such that $\sum_{y \in X} \theta'_y = 1$, and therefore for an arbitrary $y \in X$ we have

$$\omega^i_y(s^{i*}, s^{-i*}) - \omega^i_y(s^i, s^{-i*}) \ge 0, \quad \forall s^i \in \overline{S}^i, \ i = 1, 2, \dots, n.$$

So, $(s^{1*}, s^{2*}, \dots, s^{n*})$ is a Nash equilibrium for each of the games in normal form $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\omega^i_y(s)\}_{i=\overline{1,n}})$ with the corresponding starting states $y \in X$.

Now assume that for each starting state $y \in X$ the average stochastic game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\omega^i_y(s)\}_{i=\overline{1,n}})$ has a Nash equilibrium. Let us show that the game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i_\theta(s)\}_{i=\overline{1,n}})$, determined by $(X, \{A^i(x)\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, \{\theta_y\})$ where $\theta_y > 0, \forall y \in X$, $\sum_{y \in X} \theta_y = 1$, has a Nash equilibrium. We prove this using an auxiliary average stochastic game with a new starting state $z$ and the set of states $X \cup \{z\}$, where for an arbitrary state $x \in X$ each player $i \in \{1, 2, \dots, n\}$ has the same set of actions $A^i(x)$, the same payoffs $f^i(x,a)$ for $a \in A(x)$ and the same transition probability distributions $p^a_{x,y}$ for $a \in A(x)$ as in the game determined by $(X, \{A^i(x)\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, \{\theta_y\})$; in the state $z$ of the auxiliary game each player $i \in \{1, 2, \dots, n\}$ has a single action $a^i_z$, and $A(z)$ contains a unique profile $a_z = (a^1_z, a^2_z, \dots, a^n_z)$ for which $p^{a_z}_{z,z} = 0$, $p^{a_z}_{z,y} = \theta_y, \forall y \in X$ and $f^i(z, a_z) = 0$, $i = 1, 2, \dots, n$. Obviously, for the auxiliary average stochastic game with starting state $z$, determined by $(X \cup \{z\}, \{A^i(x) \cup A^i(z)\}_{i=\overline{1,n}}, \{f^i(x,a), f^i(z,a_z)\}_{i=\overline{1,n}}, p \cup \{p^{a_z}_{z,y}\}, z)$, there exists a stationary Nash equilibrium, because a Nash equilibrium exists for the average stochastic game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\omega^i_y(s)\}_{i=\overline{1,n}})$ with an arbitrary starting state $y \in X$. Taking into account that the auxiliary game is equivalent to the average stochastic game determined by $(X, \{A^i(x)\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, \{\theta_y\})$, where $\theta_y > 0, \forall y \in X$, $\sum_{y \in X} \theta_y = 1$, we obtain that the considered average stochastic game with a random starting state has a stationary Nash equilibrium $s^* = (s^{1*}, s^{2*}, \dots, s^{n*})$ which is a stationary Nash equilibrium for the average stochastic game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\omega^i_y(s)\}_{i=\overline{1,n}})$ with an arbitrary starting state $y \in X$. □

From Theorem 3 we can easily obtain the following result.

Lemma 4. For an arbitrary game in normal form $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i_\theta(s)\}_{i=\overline{1,n}})$ with $\theta_y > 0, \forall y \in X$, $\sum_{y \in X} \theta_y = 1$, each payoff function $\psi^i_\theta(s^1, s^2, \dots, s^n)$, $i \in \{1, 2, \dots, n\}$ possesses the property that $\psi^i_\theta(s^i, s^{-i})$ is quasi-monotonic with respect to $s^i \in \overline{S}^i$ for arbitrary fixed $s^{-i} \in \overline{S}^{-i}$.

Proof. Indeed, if players $1, 2, \dots, i-1, i+1, \dots, n$ fix their stationary strategies $s^k \in \overline{S}^k$, $k = 1, 2, \dots, i-1, i+1, \dots, n$, then we obtain an average Markov decision problem with respect to $s^i \in \overline{S}^i$ and an average cost function $\psi^i_\theta(s^i, s^{-i})$. According to Theorem 3, $\psi^i_\theta(s^i, s^{-i})$ possesses the property that the value of this function is uniquely determined by $s^i \in \overline{S}^i$ and it is quasi-monotonic with respect to $s^i$ on $\overline{S}^i$. □

Using this lemma we can prove the following result.

Theorem 5. Let $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i_\theta(s)\}_{i=\overline{1,n}})$ be the normal form game for the average stochastic game determined by $(X, \{A^i(x)\}_{i=\overline{1,n}}, \{f^i(x,a)\}_{i=\overline{1,n}}, p, \{\theta_x\})$, where $\theta_x > 0, \forall x \in X$, $\sum_{x \in X} \theta_x = 1$. If each function $\psi^i_\theta$, $i \in \{1, 2, \dots, n\}$ is continuous on $\overline{S} = \overline{S}^1 \times \overline{S}^2 \times \dots \times \overline{S}^n$, then the game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i_\theta(s)\}_{i=\overline{1,n}})$ possesses a Nash equilibrium $s^* = (s^{1*}, s^{2*}, \dots, s^{n*})$ which is a stationary Nash equilibrium for the average stochastic game with an arbitrary starting state $y \in X$.

Proof. Indeed, according to Lemma 4 each function $\psi^i_\theta(s^1, s^2, \dots, s^n)$, $i \in \{1, 2, \dots, n\}$ satisfies the condition that $\psi^i_\theta(s^i, s^{-i})$ is quasi-monotonic with respect to $s^i \in \overline{S}^i$ for arbitrary fixed $s^{-i} \in \overline{S}^{-i}$. In the considered game each subset $\overline{S}^i$ is convex and compact and, according to the condition of the theorem, each payoff function $\psi^i_\theta(s^1, s^2, \dots, s^n)$, $i \in \{1, 2, \dots, n\}$ is continuous on $\overline{S}$. Based on results from (Dasgupta and Maskin, 1986; Debreu, 1952; Reny, 1999; Simon, 1987) these conditions provide the existence of a Nash equilibrium $s^* = (s^{1*}, s^{2*}, \dots, s^{n*})$ for the game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i_\theta(s)\}_{i=\overline{1,n}})$. According to Theorem 4 such an equilibrium is a Nash equilibrium for the game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\omega^i_y(s)\}_{i=\overline{1,n}})$ with an arbitrary starting state $y \in X$. □

Remark 2. Theorems 4 and 5 are also valid for the case of the game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i_\theta(s)\}_{i=\overline{1,n}})$ with $\theta_y = 0$ for some $y \in X$; however, in this case we obtain stationary Nash equilibria only for the games $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\omega^i_z(s)\}_{i=\overline{1,n}})$ with starting states $z \in X^+$, where $X^+ = \{z \in X \mid \theta_z > 0\}$.

Remark 3. Theorem 5 also holds for the case when the payoffs are not continuous but satisfy the so-called graph-continuity property from (Dasgupta and Maskin, 1986).

6. Stationary Equilibria for Average Stochastic Positional Games

Average stochastic positional games were introduced in (Lozovanu, 2018) as a generalization of the mean payoff games from (Ehrenfeucht and Mycielski, 1979). An average stochastic positional game represents an average stochastic game in which the set of states is divided into several disjoint subsets such that each subset represents the position set of one of the players, and each player controls the Markov process only in his position set. In such a game each player chooses actions in his position set in order to maximize his average reward per transition.

An average stochastic positional game with $n$ players consists of the following elements:

- a state space $X$ (which we assume to be finite);

- a partition $X = X_1 \cup X_2 \cup \dots \cup X_n$, where $X_i$ represents the position set of player $i \in \{1, 2, \dots, n\}$;

- a finite set $A(x)$ of actions in each state $x \in X$;

- a step reward $f^i(x,a)$ with respect to each player $i \in \{1, 2, \dots, n\}$ in each state $x \in X$ and for an arbitrary action $a \in A(x)$;

- a transition probability function $p: X \times \prod_{x \in X} A(x) \times X \to [0,1]$ that gives the probability transitions $p^a_{x,y}$ from an arbitrary $x \in X$ to an arbitrary $y \in X$ for a fixed action $a \in A(x)$, where $\sum_{y \in X} p^a_{x,y} = 1, \ \forall x \in X, \ a \in A(x)$;

- a starting state $x_0 \in X$.

The game starts at the moment of time $t = 0$ in the state $x_0$, where the player $i \in \{1, 2, \dots, n\}$ who is the owner of the position $x_0$ ($x_0 \in X_i$) chooses an action $a_0 \in A(x_0)$ and determines the rewards $f^1(x_0, a_0), f^2(x_0, a_0), \dots, f^n(x_0, a_0)$ for the corresponding players $1, 2, \dots, n$. After that the game passes to a state $y = x_1 \in X$ according to the probability distribution $\{p^{a_0}_{x_0,y}\}$. At the moment of time $t = 1$ the player $k \in \{1, 2, \dots, n\}$ who is the owner of the state position $x_1$ ($x_1 \in X_k$) chooses an action $a_1 \in A(x_1)$, and players $1, 2, \dots, n$ receive the corresponding rewards $f^1(x_1, a_1), f^2(x_1, a_1), \dots, f^n(x_1, a_1)$. Then the game passes to a state $y = x_2 \in X$ according to the probability distribution $\{p^{a_1}_{x_1,y}\}$, and so on indefinitely. Such a play of the game produces a sequence of states and actions $x_0, a_0, x_1, a_1, \dots, x_t, a_t, \dots$ that defines a stream of stage rewards $f^1(x_t, a_t), f^2(x_t, a_t), \dots, f^n(x_t, a_t)$, $t = 0, 1, 2, \dots$. The average stochastic positional game is the game with the following payoffs of the players:

$$\omega^i_{x_0} = \liminf_{t \to \infty} \mathsf{E}\left(\frac{1}{t} \sum_{\tau=0}^{t-1} f^i(x_\tau, a_\tau)\right), \quad i = 1, 2, \dots, n.$$

If $p^a_{x,y} \in \{0,1\}$, then the average stochastic positional game becomes a mean payoff game. The problem of the existence of pure and mixed stationary equilibria in stochastic positional games has been studied in (Lozovanu, 2018; Lozovanu, 2019). The pure and mixed stationary strategies in such a game can be defined in an analogous way as for a stochastic game, taking into account that each player selects actions only in his state positions and determines in these states the step rewards for all players. Thus, a stationary strategy of player $i \in \{1, 2, \dots, n\}$ in a stochastic positional game is a mapping $s^i$ that provides for every state $x \in X_i$ a probability distribution over the set of actions $A(x)$. This means that the set of stationary strategies $\overline{S}^i$ of player $i \in \{1, 2, \dots, n\}$ can be identified with the set of solutions of the system

$$\begin{cases} \sum\limits_{a \in A(x)} s^i_{x,a} = 1, & \forall x \in X_i;\\ s^i_{x,a} \ge 0, & \forall x \in X_i, \ \forall a \in A(x). \end{cases} \qquad (33)$$

The payoffs $\psi^i_\theta(s^1, s^2, \dots, s^n)$, $i = 1, 2, \dots, n$ on $\overline{S} = \overline{S}^1 \times \overline{S}^2 \times \dots \times \overline{S}^n$ for the game in normal form of the considered positional game can be obtained from (30), (31) if we take into account the particularity of the average stochastic positional game. So,

$$\psi^i_\theta(s^1, s^2, \dots, s^n) = \sum_{k=1}^{n} \sum_{x \in X_k} \sum_{a \in A(x)} s^k_{x,a}\, f^i(x,a)\, q_x, \quad i = 1, 2, \dots, n, \qquad (34)$$

where $q_x$ for $x \in X$ are determined uniquely from the following system of linear equations:

$$\begin{cases} q_y - \sum\limits_{k=1}^{n} \sum\limits_{x \in X_k} \sum\limits_{a \in A(x)} s^k_{x,a}\, p^a_{x,y}\, q_x = 0, & \forall y \in X;\\ q_y + w_y - \sum\limits_{k=1}^{n} \sum\limits_{x \in X_k} \sum\limits_{a \in A(x)} s^k_{x,a}\, p^a_{x,y}\, w_x = \theta_y, & \forall y \in X. \end{cases} \qquad (35)$$

In (Lozovanu, 2018) it is shown that the game $(\{\overline{S}^i\}_{i=\overline{1,n}}, \{\psi^i_\theta(s)\}_{i=\overline{1,n}})$ defined according to (33)-(35) possesses a Nash equilibrium which is a stationary Nash equilibrium for the average stochastic positional game with an arbitrary starting state $y \in X$. Moreover, in (Lozovanu, 2019) it is shown that for two-player zero-sum stochastic positional games and for n-player average stochastic positional games with unichain property there exist stationary Nash equilibria in pure strategies.

7. Conclusion

An arbitrary average stochastic game with finite state and action spaces can be formulated in terms of stationary strategies as a game in normal form in which each payoff is quasi-monotonic (quasi-concave and quasi-convex) with respect to the strategy of the corresponding player. Such a normal-form game (the game model from Section 5) allows us to determine all stationary Nash equilibria of the average stochastic game if stationary Nash equilibria exist. If the payoffs of the game in normal form are continuous or graph-continuous, then stationary Nash equilibria exist. For an average stochastic game with unichain property and for an average stochastic positional game stationary Nash equilibria always exist, and all stationary equilibria can be found by using the corresponding game models in normal form from Section 3 and Section 6. For two-player zero-sum average stochastic positional games and for n-player average stochastic positional games with unichain property there exist stationary equilibria in pure strategies.

References

Boyd, S. and Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.

Dasgupta, P. and Maskin, E. (1986). The existence of equilibrium in discontinuous economic games, I: Theory. The Review of Economic Studies, 53, 1-26.

Debreu, G. (1952). A social equilibrium existence theorem. Proceedings of the National Academy of Sciences, 38, 886-893.

Ehrenfeucht, A., Mycielski, J. (1979). Positional strategies for mean payoff games. International Journal of Game Theory, 8, 109-113.

Filar, J., Vrieze, K. (1997). Competitive Markov Decision Processes. Springer, New York, NY.

Filar, J. A., Schultz, T. A., Thuijsman, F., Vrieze, O. (1991). Nonlinear programming and stationary equilibria in stochastic games. Mathematical Programming, 50, 227-237.

Fink, A. M. (1964). Equilibrium in a stochastic n-person game. Journal of Science of the Hiroshima University, Ser. Math., 28, 89-93.

Flesch, J., Thuijsman, F., Vrieze, K. (1997). Cyclic Markov equilibria in stochastic games. International Journal of Game Theory, 26, 303-314.

Gillette, D. (1957). Stochastic games with zero stop probabilities. Contributions to the Theory of Games, 3, 179-187.

Glicksberg, I. L. (1952). A further generalization of the Kakutani fixed point theorem with application to Nash equilibrium points. Proceedings of the American Mathematical Society, 38, 170-174.

Kallenberg, L. (2016). Markov decision processes. University of Leiden, Netherlands.

Lozovanu, D. (2011). The game-theoretical approach to Markov decision problems and determining Nash equilibria for stochastic positional games. International Journal of Mathematical Modelling and Numerical Optimisation, 2, 162-174.

Lozovanu, D. (2018). Stationary Nash equilibria for average stochastic positional games. Chapter 9 in "Frontiers of Dynamic Games, Static and Dynamic Game Theory: Foundations and Applications" (L. Petrosyan et al., eds.), Springer, 139-163.

Lozovanu, D. (2019). Pure and mixed stationary Nash equilibria for average stochastic positional games. Chapter 8 in "Frontiers of Dynamic Games, Static and Dynamic Game Theory: Foundations and Applications" (L. Petrosyan et al., eds.), Springer, 131-174.

Lozovanu, D., Pickl, S. (2015). Optimization of stochastic discrete systems and control on complex networks. Springer.

Mertens, J.-F., Neyman, A. (1981). Stochastic games. International Journal of Game Theory, 10, 53-66.

Neyman, A., Sorin, S. (2003). Stochastic games and applications. NATO Science Series C, 569, Mathematical and Physical Sciences, Kluwer Academic Publishers.

Puterman, M. L. (2005). Markov decision processes: Discrete stochastic dynamic programming. Wiley, New Jersey.

Reny, P. J. (1999). On the existence of pure and mixed strategy Nash equilibria in discontinuous games. Econometrica, 67, 1029-1056.

Rogers, P. D. (1969). Nonzero-sum stochastic games. Technical Report, DTIC Document.

Schultz, T. A. (1986). Mathematical programming and stochastic games. Ph.D. Thesis, The Johns Hopkins University, Baltimore, Maryland.

Shapley, L. S. (1953). Stochastic games. Proceedings of the National Academy of Sciences, 39, 1095-1100.

Simon, L. K. (1987). Games with discontinuous payoffs. The Review of Economic Studies, 54, 569-597.

Sobel, M. J. (1971). Noncooperative stochastic games. The Annals of Mathematical Statistics, 42, 1930-1935.

Solan, E. (2009). Stochastic games. In: Encyclopedia of Complexity and Systems Science, 8698-8708, Springer.

Solan, E., Vieille, N. (2010). Computing uniformly optimal strategies in two-player stochastic games. Economic Theory, 42, 237-253.

Takahashi, M. (1964). Equilibrium points of stochastic non-cooperative n-person games. Journal of Science of the Hiroshima University, Ser. A-I, Math., 28, 95-99.

Tijs, S., Vrieze, O. (1986). On the existence of easy initial states for undiscounted stochastic games. Mathematics of Operations Research, 11, 506-513.

Vieille, N. (2002). Stochastic games: Recent results. Handbook of Game Theory with Economic Applications, 3, 1833-1850.

Vieille, N. (2009). Equilibrium in 2-person stochastic games I, II. Israel Journal of Mathematics, 8698-8708.

Vrieze, O. J. (1987). Stochastic games with finite state and action spaces. CWI Tracts, 33, 1-221.
