Contributions to Game Theory and Management, XII, 246-260

Pure Stationary Nash Equilibria for Discounted Stochastic Positional Games

Dmitrii Lozovanu1 and Stefan Pickl2

1 Institute of Mathematics and Computer Science of Moldova Academy of Sciences, Academiei 5, Chisinau, MD-2028, Moldova, E-mail: lozovanu@math.md

2 Institute for Theoretical Computer Science, Mathematics and Operations Research, Universität der Bundeswehr München, 85577 Neubiberg-München, Germany, E-mail: stefan.pickl@unibw.de

Abstract A discounted stochastic positional game is a stochastic game with discounted payoffs in which the set of states is divided into several disjoint subsets such that each subset represents the position set for one of the players and each player controls the Markov decision process only in his position set. In such a game each player chooses actions in his position set in order to maximize the expected discounted sum of his stage rewards. We show that an arbitrary discounted stochastic positional game with finite state and action spaces possesses a Nash equilibrium in pure stationary strategies. Based on the proof of this result we present conditions for determining all optimal pure stationary strategies of the players.

Keywords: stochastic positional games, discounted payoffs, pure stationary strategies, mixed stationary strategies, Nash equilibria

1. Introduction

Stochastic games were introduced by Shapley, 1953. He considered two-person zero-sum stochastic games with finite state and action spaces for which he proved the existence of the value and of optimal stationary strategies of the players with respect to a discounted payoff criterion. Later this result was extended to m-person stochastic games, and the existence of Nash equilibria in stationary strategies was obtained for a more general class of discounted stochastic games (see Fink, 1964; Takahashi, 1964; Sobol, 1971; Solan, 1998). Shapley defined a stationary strategy for a player as a map that provides in each state of the game a probability distribution over the set of feasible actions. Therefore a stationary strategy for a player in a stochastic game can be treated as a mixed stationary strategy, and the existence results mentioned above concern Nash equilibria in mixed stationary strategies for the considered games.

In this paper we study the problem of the existence of Nash equilibria in pure stationary strategies for a class of m-person stochastic games with discounted payoffs that we call discounted stochastic positional games. This class of games has been considered in Lozovanu and Pickl, 2015. A discounted stochastic positional game is an m-person stochastic game in which the set of states is divided into m disjoint subsets such that each subset represents the position set for one of the players and each player controls the Markov decision process only in his position set. In such a game each player chooses actions in his position set in order to maximize the expected discounted sum of his stage rewards. We show that for an arbitrary discounted stochastic positional game with finite state and action spaces there exists a Nash equilibrium in pure stationary strategies. Based on the proof of this result we present conditions for determining all pure stationary Nash equilibria.

The paper is organized as follows. In Section 2 the general formulation of a discounted stochastic positional game is presented, and then the formulation is specified for the case when the players use pure and mixed stationary strategies of choosing the actions in their position sets. In Section 3 some new basic properties of the solutions of a discounted Markov decision problem in terms of stationary strategies are presented; additionally, it is shown that such a problem can be represented as a quasi-monotonic programming problem. Based on these results, in Section 4 it is shown that a discounted stochastic game can be formulated in terms of stationary strategies, where the payoff of each player is quasi-monotonic with respect to his own strategy. Using these properties, a new proof of the existence of a stationary Nash equilibrium for a discounted stochastic game is derived, new conditions for determining the optimal strategies of the players are obtained, and it is shown that a stochastic positional game with discounted payoffs represents a particular case of a discounted stochastic game. In Section 5 the proof of the existence of pure stationary Nash equilibria for an arbitrary discounted stochastic positional game is presented.

2. Formulation of the Discounted Stochastic Positional Game in Terms of Stationary Strategies

First we present the general model for a discounted stochastic positional game and then we specify the formulation of the game when the players use pure and mixed stationary strategies of choosing the actions in their state positions.

2.1. The General Model of a Discounted Stochastic Positional Game

A discounted stochastic positional game with m players consists of the following elements:

- a state space $X$ (which we assume to be finite);

- a partition $X = X_1 \cup X_2 \cup \cdots \cup X_m$, where $X_i$ represents the position set of player $i \in \{1,2,\dots,m\}$;

- a finite set $A(x)$ of actions in each state $x \in X$;

- a step reward $f^i(x,a)$ with respect to each player $i \in \{1,2,\dots,m\}$ in each state $x \in X$ and for an arbitrary action $a \in A(x)$;

- a transition probability function $p: X \times \prod_{x\in X} A(x) \times X \to [0,1]$ that gives the transition probabilities $p^a_{x,y}$ from an arbitrary $x \in X$ to an arbitrary $y \in X$ for a fixed action $a \in A(x)$, where $\sum_{y\in X} p^a_{x,y} = 1$, $\forall x \in X$, $a \in A(x)$;

- a discount factor $\gamma$, $0 < \gamma < 1$;

- a starting state $x_0 \in X$.

The game starts at the moment of time $t = 0$ in the state $x_0$, where the player $i \in \{1,2,\dots,m\}$ who is the owner of the state position $x_0$ ($x_0 \in X_i$) chooses an action $a_0 \in A(x_0)$ and determines the rewards $f^1(x_0,a_0), f^2(x_0,a_0),\dots,f^m(x_0,a_0)$ for the corresponding players $1,2,\dots,m$. After that the game passes to a state $y = x_1 \in X$ according to the probability distribution $\{p^{a_0}_{x_0,y}\}$. At the moment of time $t = 1$ the player $k \in \{1,2,\dots,m\}$ who is the owner of the state position $x_1$ ($x_1 \in X_k$) chooses an action $a_1 \in A(x_1)$, and players $1,2,\dots,m$ receive the corresponding rewards $f^1(x_1,a_1), f^2(x_1,a_1),\dots,f^m(x_1,a_1)$. Then the game passes to a state $y = x_2 \in X$ according to the probability distribution $\{p^{a_1}_{x_1,y}\}$, and so on indefinitely. Such a play of the game produces a sequence of states and actions $x_0, a_0, x_1, a_1, \dots, x_t, a_t, \dots$ that defines a stream of stage rewards $f^1(x_t,a_t), f^2(x_t,a_t),\dots,f^m(x_t,a_t)$, $t = 0,1,2,\dots$. The discounted stochastic positional game is the game with the payoffs of the players

$$\sigma^i_{x_0} = E\left(\sum_{t=0}^{\infty} \gamma^t f^i(x_t,a_t)\right), \quad i = 1,2,\dots,m,$$

where $E$ is the expectation operator with respect to the probability measure in the Markov process induced by the actions chosen by the players in their position sets and the given starting state $x_0$. Each player chooses actions in his position set in order to maximize the expected discounted sum of his stage rewards. In the case $m = 1$ this game becomes the discounted Markov decision problem with the given action sets $A(x)$ for $x \in X$, the transition probability function $p: X \times \prod_{x\in X} A(x) \times X \to [0,1]$, the step rewards $f(x,a) = f^1(x,a)$ for $x \in X$, $a \in A(x)$, the given discount factor $\gamma$ and the starting state $x_0$.

In the paper we will study the discounted stochastic positional game when the players use pure and mixed stationary strategies of choosing the actions in the states.
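To make this dynamics concrete, the following sketch simulates plays of a small two-player positional game and estimates the discounted payoffs by Monte Carlo. All data below (states, ownership, rewards, transition probabilities and the chosen strategy profile) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ownership of the states: states 0 and 1 belong to player 1, state 2 to player 2.
owner = {0: 1, 1: 1, 2: 2}
# p[x][a] is the transition distribution over the three states for action a in x.
p = {0: [np.array([0.2, 0.8, 0.0]), np.array([0.0, 0.5, 0.5])],
     1: [np.array([0.3, 0.0, 0.7])],
     2: [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.6, 0.4])]}
# f[i][x][a] is the stage reward of player i+1 in state x for action a.
f = [{0: [1.0, 0.5], 1: [2.0], 2: [0.0, 1.0]},   # player 1
     {0: [0.0, 1.0], 1: [1.0], 2: [3.0, 0.5]}]   # player 2
gamma = 0.9

# A pure stationary strategy profile: the owner of state x always plays s[x].
s = {0: 0, 1: 0, 2: 1}

def play(x0, T=200):
    """Simulate one play; return the discounted reward streams of both players."""
    x, payoff = x0, np.zeros(2)
    for t in range(T):  # gamma**200 is negligible, so truncation is harmless
        a = s[x]        # the owner of x (player owner[x]) chooses the action
        payoff += gamma ** t * np.array([f[0][x][a], f[1][x][a]])
        x = int(rng.choice(3, p=p[x][a]))   # nature moves the process
    return payoff

estimate = np.mean([play(x0=0) for _ in range(2000)], axis=0)
print("estimated discounted payoffs from x0 = 0:", estimate)
```

For a fixed profile of pure stationary strategies the induced process is a Markov chain, so these estimates converge to the exact payoffs computed in Section 2.2.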

2.2. A Discounted Stochastic Positional Game in Pure and Mixed Stationary Strategies

A strategy of player $i \in \{1,2,\dots,m\}$ in a stochastic positional game is a mapping $s^i$ that provides for every state $x_t \in X_i$ a probability distribution over the set of actions $A(x_t)$. If these probabilities take only the values 0 and 1, then $s^i$ is called a pure strategy, otherwise $s^i$ is called a mixed strategy. If these probabilities depend only on the state $x_t = x \in X_i$ (i.e. $s^i$ does not depend on $t$), then $s^i$ is called a stationary strategy, otherwise $s^i$ is called a non-stationary strategy.

Thus, we can identify the set of mixed stationary strategies $S^i$ of player $i$ with the set of solutions of the system

$$\begin{cases} \sum\limits_{a\in A(x)} s^i_{x,a} = 1, & \forall x \in X_i;\\[1mm] s^i_{x,a} \ge 0, & \forall x \in X_i,\ \forall a \in A(x), \end{cases} \qquad (1)$$

where $s^i_{x,a}$ expresses the probability that player $i \in \{1,2,\dots,m\}$ chooses the action $a \in A(x)$ in the state $x \in X_i$. The set of pure stationary strategies $\widetilde S^i$ of player $i$ corresponds to the set of basic solutions of system (1).

Let $s = (s^1, s^2, \dots, s^m) \in S = S^1 \times S^2 \times \cdots \times S^m$ be a profile of stationary strategies (pure or mixed strategies) of the players. Then the elements of the probability transition matrix $P^s = (p^s_{x,y})$ in the Markov process induced by $s$ can be calculated as follows:

$$p^s_{x,y} = \sum_{a\in A(x)} s^i_{x,a}\, p^a_{x,y} \quad \text{for } x \in X_i,\ i = 1,2,\dots,m. \qquad (2)$$

Let us consider the matrix $W^s = (w^s_{x,y})$, where $W^s = (I - \gamma P^s)^{-1}$. Then in a discounted stochastic positional game the payoff of player $i \in \{1,2,\dots,m\}$ for a given profile $s$ and initial state $x_0 \in X$ is determined as follows:

$$\sigma^i_{x_0}(s) = \sum_{k=1}^{m} \sum_{y\in X_k} w^s_{x_0,y}\, f^i(y, s^k), \quad i = 1,2,\dots,m, \qquad (3)$$

where

$$f^i(y, s^k) = \sum_{a\in A(y)} s^k_{y,a}\, f^i(y,a) \quad \text{for } y \in X_k,\ k \in \{1,2,\dots,m\}. \qquad (4)$$

The functions $\sigma^1_{x_0}(s), \sigma^2_{x_0}(s), \dots, \sigma^m_{x_0}(s)$ on $S = S^1 \times S^2 \times \cdots \times S^m$, defined according to (3), (4), determine a game in normal form that we denote by $(\{S^i\}_{i=\overline{1,m}}, \{\sigma^i_{x_0}(s)\}_{i=\overline{1,m}})$. This game corresponds to a discounted stochastic positional game in mixed stationary strategies that in extended form is determined by the tuple $(\{X_i\}_{i=\overline{1,m}}, \{A(x)\}_{x\in X}, \{f^i(x,a)\}_{i=\overline{1,m}}, p, \gamma, x_0)$. The same functions restricted to $\widetilde S = \widetilde S^1 \times \widetilde S^2 \times \cdots \times \widetilde S^m$ determine the game $(\{\widetilde S^i\}_{i=\overline{1,m}}, \{\sigma^i_{x_0}(s)\}_{i=\overline{1,m}})$ that corresponds to a discounted stochastic positional game in pure strategies.
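As a numeric illustration of formulas (2)-(4), the following sketch (with assumed toy data, the same as in the simulation above) builds the induced matrix $P^s$ for a profile of mixed stationary strategies, forms $W^s = (I - \gamma P^s)^{-1}$ and evaluates the payoffs $\sigma^i_{x_0}(s)$ for every starting state.

```python
import numpy as np

gamma, n = 0.9, 3
# p[x, a, y]: transition probabilities; two actions per state (state 1 has
# the same distribution twice, i.e. effectively a single action).
p = np.array([[[0.2, 0.8, 0.0], [0.0, 0.5, 0.5]],
              [[0.3, 0.0, 0.7], [0.3, 0.0, 0.7]],
              [[1.0, 0.0, 0.0], [0.0, 0.6, 0.4]]])
# f[i, x, a]: step reward of player i+1 in state x for action a.
f = np.array([[[1.0, 0.5], [2.0, 2.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 1.0], [3.0, 0.5]]])
# s[x, a]: probability with which the owner of x chooses a (system (1));
# here X_1 = {0, 1} and X_2 = {2}, so rows 0-1 form player 1's strategy.
s = np.array([[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]])

P_s = np.einsum('xa,xay->xy', s, p)           # formula (2)
W_s = np.linalg.inv(np.eye(n) - gamma * P_s)  # W^s = (I - gamma P^s)^{-1}
f_s = np.einsum('xa,ixa->ix', s, f)           # formula (4)
sigma = W_s @ f_s.T                           # formula (3): sigma[x0, i]
print("payoffs sigma^i_{x0}(s) (rows = starting states):\n", sigma)
```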

A stochastic positional game can also be considered for the case when the starting state is chosen randomly according to a given distribution $\{\theta_x\}$ on $X$. So, for a given stochastic positional game we may assume that the play starts in the state $x \in X$ with probability $\theta_x > 0$, where $\sum_{x\in X} \theta_x = 1$. If the players use mixed stationary strategies then the payoff functions

$$\sigma^i_\theta(s) = \sum_{x\in X} \theta_x\, \sigma^i_x(s), \quad i = 1,2,\dots,m,$$

on $S$ define a game in normal form $(\{S^i\}_{i=\overline{1,m}}, \{\sigma^i_\theta(s)\}_{i=\overline{1,m}})$ that in extended form is determined by $(\{X_i\}_{i=\overline{1,m}}, \{A(x)\}_{x\in X}, \{f^i(x,a)\}_{i=\overline{1,m}}, p, \gamma, \{\theta_x\}_{x\in X})$. In the case $\theta_x = 0$, $\forall x \in X \setminus \{x_0\}$, $\theta_{x_0} = 1$, the considered game becomes a stochastic positional game with a fixed starting state $x_0$.

3. Some Auxiliary Results

To prove the main results we need some properties of the reward optimality equations for a discounted Markov decision problem with finite state and action spaces. Based on these properties we show how to determine the solutions of a discounted Markov decision problem and how to formulate such a problem in terms of stationary strategies as a quasi-monotonic programming problem. We shall use these results in the sequel for the discounted stochastic positional games.

3.1. Optimality Equations for a Discounted Markov Decision Process

Here we present the optimality equations for a discounted Markov decision process determined by a tuple $(X, \{A(x)\}_{x\in X}, \{f(x,a)\}_{x\in X, a\in A(x)}, p, \gamma)$, where $X$ is a finite set of states; $A(x)$ is a finite set of actions in $x \in X$; $f(x,a)$ is a step reward in $x \in X$ for $a \in A(x)$; $p: X \times \prod_{x\in X} A(x) \times X \to [0,1]$ is a transition probability function that satisfies the condition $\sum_{y\in X} p^a_{x,y} = 1$, $\forall x \in X$, $a \in A(x)$; and $\gamma$ is a discount factor.

Theorem 1. Let a Markov decision process $(X, \{A(x)\}_{x\in X}, \{f(x,a)\}_{x\in X, a\in A(x)}, p, \gamma)$ be given. Then the system of equations

$$\sigma_x = \max_{a\in A(x)} \Big\{ f(x,a) + \gamma \sum_{y\in X} p^a_{x,y}\sigma_y \Big\}, \quad \forall x \in X, \qquad (5)$$

has a unique solution with respect to $\sigma_x$, $x \in X$. If $\sigma^*_x$, $x \in X$, is the solution of system (5) then

$$\max_{a\in A(x)} \Big\{ f(x,a) + \gamma \sum_{y\in X} p^a_{x,y}\sigma^*_y - \sigma^*_x \Big\} = 0, \quad \forall x \in X,$$

and an arbitrary stationary strategy

$$s^*: x \to a \in A(x) \quad \text{for } x \in X$$

such that

$$s^*(x) = a^* \in \arg\max_{a\in A(x)} \Big\{ f(x,a) + \gamma \sum_{y\in X} p^a_{x,y}\sigma^*_y \Big\} \quad \text{for } x \in X$$

represents an optimal stationary strategy for the discounted Markov decision problem with an arbitrary starting state $x \in X$; the values $\sigma^*_x$ for $x \in X$ represent the optimal expected discounted total rewards induced by $s^*$ for the starting states $x$.

The proof of this theorem can be found in Puterman, 2005. Based on this theorem the optimal values $\sigma^*_x$, $x \in X$, for a discounted Markov decision problem can be determined by solving the following linear programming problem:

Minimize

$$\varphi_\theta(\sigma) = \sum_{x\in X} \theta_x \sigma_x \qquad (6)$$

subject to

$$\sigma_x \ge f(x,a) + \gamma \sum_{y\in X} p^a_{x,y}\sigma_y, \quad \forall x \in X,\ \forall a \in A(x), \qquad (7)$$

where $\theta_x$, $x \in X$, represent arbitrary positive values such that $\sum_{x\in X} \theta_x = 1$.
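The optimality equations (5) can also be solved by value iteration, since the operator on the right-hand side of (5) is a $\gamma$-contraction. The sketch below (assumed toy data) computes $\sigma^*_x$ in this way and extracts an optimal pure stationary strategy as in Theorem 1; the same values solve the linear program (6), (7).

```python
import numpy as np

gamma = 0.9
p = np.array([[[0.2, 0.8, 0.0], [0.0, 0.5, 0.5]],
              [[0.3, 0.0, 0.7], [0.3, 0.0, 0.7]],
              [[1.0, 0.0, 0.0], [0.0, 0.6, 0.4]]])   # p[x, a, y]
f = np.array([[1.0, 0.5], [2.0, 2.0], [0.0, 1.0]])   # f[x, a]

sigma = np.zeros(3)
while True:
    # Right-hand side of (5) for every pair (x, a), then the max over a.
    values = f + gamma * np.einsum('xay,y->xa', p, sigma)
    sigma_new = values.max(axis=1)
    if np.abs(sigma_new - sigma).max() < 1e-10:
        break
    sigma = sigma_new

s_star = values.argmax(axis=1)   # an optimal pure stationary strategy
print("optimal values sigma*_x:", sigma_new, "\noptimal strategy:", s_star)
```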

3.2. Dual Linear Programming Model for a Discounted Markov Decision Problem

The dual problem for the linear programming problem (6), (7) is the following:


Maximize

$$\varphi_\theta(\alpha) = \sum_{x\in X} \sum_{a\in A(x)} f(x,a)\, \alpha_{x,a} \qquad (8)$$

subject to

$$\begin{cases} \sum\limits_{a\in A(y)} \alpha_{y,a} - \gamma \sum\limits_{x\in X} \sum\limits_{a\in A(x)} p^a_{x,y}\, \alpha_{x,a} = \theta_y, & \forall y \in X;\\[1mm] \alpha_{x,a} \ge 0, & \forall x \in X,\ a \in A(x), \end{cases} \qquad (9)$$

where $\theta_y$ for $y \in X$ represent arbitrary positive values that satisfy the condition $\sum_{y\in X} \theta_y = 1$. Here $\theta_y$ for $y \in X$ can be treated as the probabilities of choosing the starting state $y \in X$ in the decision problem. In the case $\theta_y = 1$ for $y = x_0$ and $\theta_y = 0$ for $y \in X \setminus \{x_0\}$ we obtain the linear programming model for the discounted Markov decision problem with a fixed starting state $x_0$.

In Puterman, 2005 the following relationship is shown between feasible solutions of problem (8), (9) and stationary strategies in the discounted Markov decision problem determined by the tuple $(X, \{A(x)\}_{x\in X}, \{f(x,a)\}_{x\in X, a\in A(x)}, p, \gamma)$: if $\alpha$ is an arbitrary feasible solution of the linear programming problem (8), (9) then $\sum_{a\in A(x)} \alpha_{x,a} > 0$, $\forall x \in X$, and a stationary strategy $s: x \to a \in A(x)$ for $x \in X$ that corresponds to this feasible solution is determined as follows:

$$s_{x,a} = \frac{\alpha_{x,a}}{\sum\limits_{a\in A(x)} \alpha_{x,a}} \quad \text{for } x \in X,\ a \in A(x), \qquad (10)$$

where $s_{x,a}$ expresses the probability of choosing the action $a \in A(x)$ in $x \in X$.
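A small sketch of the dual model (8), (9) using scipy's linprog (the data are assumptions, as above): the LP is solved for the occupation measures $\alpha_{x,a}$, and a stationary strategy is then recovered by the normalization (10). Since linprog returns a basic optimal solution, the recovered strategy is pure.

```python
import numpy as np
from scipy.optimize import linprog

gamma, n, m = 0.9, 3, 2                 # states, actions per state
p = np.array([[[0.2, 0.8, 0.0], [0.0, 0.5, 0.5]],
              [[0.3, 0.0, 0.7], [0.3, 0.0, 0.7]],
              [[1.0, 0.0, 0.0], [0.0, 0.6, 0.4]]])   # p[x, a, y]
f = np.array([[1.0, 0.5], [2.0, 2.0], [0.0, 1.0]])   # f[x, a]
theta = np.full(n, 1.0 / n)             # positive starting distribution

# Equality constraints (9): for every y,
#   sum_a alpha_{y,a} - gamma * sum_{x,a} p^a_{x,y} alpha_{x,a} = theta_y.
A_eq = np.zeros((n, n * m))
for y in range(n):
    for x in range(n):
        for a in range(m):
            A_eq[y, x * m + a] = float(x == y) - gamma * p[x, a, y]

# linprog minimizes, so negate the objective (8).
res = linprog(c=-f.ravel(), A_eq=A_eq, b_eq=theta, bounds=(0, None))
alpha = res.x.reshape(n, m)

s = alpha / alpha.sum(axis=1, keepdims=True)   # strategy recovery (10)
print("optimal value of (8):", -res.fun)       # equals sum_x theta_x sigma*_x
print("recovered (pure) stationary strategy:\n", s.round(6))
```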

3.3. A Discounted Markov Decision Problem in Terms of Stationary Strategies

Using the relationship (10) between feasible solutions of problem (8), (9) and stationary strategies we can formulate the discounted Markov decision problem in terms of stationary strategies as follows:

Maximize

$$\psi_\theta(s,q) = \sum_{x\in X} \sum_{a\in A(x)} f(x,a)\, s_{x,a}\, q_x \qquad (11)$$

subject to

$$\begin{cases} q_y - \gamma \sum\limits_{x\in X} \sum\limits_{a\in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = \theta_y, & \forall y \in X;\\[1mm] \sum\limits_{a\in A(y)} s_{y,a} = 1, & \forall y \in X;\\[1mm] s_{x,a} \ge 0, & \forall x \in X,\ \forall a \in A(x), \end{cases} \qquad (12)$$

where $\theta_y$ are the same values as in problem (8), (9) and $s_{x,a}$, $q_x$ for $x \in X$, $a \in A(x)$ represent the variables that must be found. It is easy to observe that for fixed $s_{x,a}$, $x \in X$, $a \in A(x)$, system (12) uniquely determines $q_x$ for $x \in X$. This means that $\psi_\theta(s,q)$ depends only on $s$, and for a given $s \in S$ we have $\psi_\theta(s,q) = \psi_\theta(s)$, i.e. in (11) we can set

$$\psi_\theta(s) = \sum_{x\in X} \sum_{a\in A(x)} f(x,a)\, s_{x,a}\, q_x.$$

So, the decision problem in stationary strategies (problem (11), (12)) can be derived from (8), (9) if we introduce the following notations:

$$q_x = \sum_{a\in A(x)} \alpha_{x,a}, \quad \forall x \in X; \qquad s_{x,a} = \frac{\alpha_{x,a}}{\sum\limits_{a\in A(x)} \alpha_{x,a}}, \quad \forall x \in X,\ a \in A(x). \qquad (13)$$

This means that if $\alpha_{x,a}$, $x \in X$, $a \in A(x)$, is a feasible solution of problem (8), (9) then $s_{x,a}$, $x \in X$, $a \in A(x)$, and $q_x$, $x \in X$, determined according to (13), represent a feasible solution of problem (11), (12). Conversely, if $s_{x,a}$, $x \in X$, $a \in A(x)$; $q_x$, $x \in X$, is a feasible solution of problem (11), (12) then $\alpha_{x,a} = s_{x,a} q_x$, $x \in X$, $a \in A(x)$, represent a feasible solution of problem (8), (9).
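The correspondence (13) can be checked numerically. The sketch below (toy data as in the previous sketches) picks an arbitrary $s \in S$, computes $q$ from the first block of (12) and verifies that $\alpha_{x,a} = s_{x,a} q_x$ satisfies the dual constraints (9).

```python
import numpy as np

gamma, n = 0.9, 3
p = np.array([[[0.2, 0.8, 0.0], [0.0, 0.5, 0.5]],
              [[0.3, 0.0, 0.7], [0.3, 0.0, 0.7]],
              [[1.0, 0.0, 0.0], [0.0, 0.6, 0.4]]])   # p[x, a, y]
theta = np.full(n, 1.0 / n)
s = np.array([[0.5, 0.5], [1.0, 0.0], [0.3, 0.7]])   # an arbitrary point of S

P_s = np.einsum('xa,xay->xy', s, p)
q = np.linalg.solve(np.eye(n) - gamma * P_s.T, theta)   # first block of (12)
alpha = s * q[:, None]                                  # the map (s, q) -> alpha
# Dual constraints (9): sum_a alpha_{y,a} - gamma sum_{x,a} p^a_{x,y} alpha_{x,a}.
lhs = alpha.sum(axis=1) - gamma * np.einsum('xay,xa->y', p, alpha)
print(np.allclose(lhs, theta))   # expected: True
```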

3.4. A Quasi-Monotonic Programming Model in Stationary Strategies for a Discounted Markov Decision Problem

Based on the results from the previous section we show that a discounted Markov decision problem in stationary strategies can be represented as a quasi-monotonic programming problem. We assume that the discounted Markov decision problem is determined by a tuple $(X, \{A(x)\}_{x\in X}, \{f(x,a)\}_{x\in X, a\in A(x)}, p, \{\theta_x\}_{x\in X}, \gamma)$.

Theorem 2. Let a discounted Markov decision problem be given and consider the function

$$\psi_\theta(s) = \sum_{x\in X} \sum_{a\in A(x)} f(x,a)\, s_{x,a}\, q_x,$$

where $q_x$ for $x \in X$ satisfy the condition

$$q_y - \gamma \sum_{x\in X} \sum_{a\in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = \theta_y, \quad \forall y \in X. \qquad (14)$$

Then on the set $S$ of solutions of the system

$$\sum_{a\in A(x)} s_{x,a} = 1, \quad \forall x \in X; \qquad s_{x,a} \ge 0, \quad \forall x \in X,\ a \in A(x),$$

the function $\psi_\theta(s)$ depends only on $s_{x,a}$ for $x \in X$, $a \in A(x)$, and $\psi_\theta(s)$ is quasi-monotonic on $S$ (i.e. $\psi_\theta(s)$ is quasi-convex and quasi-concave on $S$).

Proof. For an arbitrary $s \in S$ system (14) uniquely determines $q_x$ for $x \in X$ and therefore $\psi_\theta(s)$ is determined uniquely for an arbitrary $s \in S$, i.e. the first part of the theorem holds.

Now let us prove the second part of the theorem. We show that the function $\psi_\theta(s)$ is quasi-monotonic on $S$. To prove this it is sufficient to show that for an arbitrary $c \in \mathbb{R}$ the sublevel set

$$L^-_c(\psi_\theta) = \{ s \in S \mid \psi_\theta(s) \le c \}$$

and the superlevel set

$$L^+_c(\psi_\theta) = \{ s \in S \mid \psi_\theta(s) \ge c \}$$

of the function $\psi_\theta(s)$ are convex. For this purpose we use the sublevel set $L^-_c(\varphi_\theta) = \{\alpha \mid \varphi_\theta(\alpha) \le c\}$ and the superlevel set $L^+_c(\varphi_\theta) = \{\alpha \mid \varphi_\theta(\alpha) \ge c\}$ of the function $\varphi_\theta(\alpha)$ for the linear programming problem (8), (9).

Denote by $\alpha^i$, $i = \overline{1,k}$, the basic solutions of system (9). All feasible solutions of problem (8), (9) can be obtained as convex combinations of the basic solutions $\alpha^i$, $i = \overline{1,k}$. Each $\alpha^i$, $i \in \{1,2,\dots,k\}$, determines a stationary strategy

$$s^{(i)}_{x,a} = \frac{\alpha^i_{x,a}}{q^i_x}, \quad x \in X,\ a \in A(x), \qquad (15)$$

for which $\psi_\theta(s^{(i)}) = \varphi_\theta(\alpha^i)$, where

$$q^i_x = \sum_{a\in A(x)} \alpha^i_{x,a}, \quad \forall x \in X. \qquad (16)$$

An arbitrary feasible solution $\alpha$ of system (9) determines a stationary strategy

$$s_{x,a} = \frac{\alpha_{x,a}}{q_x} \quad \text{for } x \in X,\ a \in A(x), \qquad (17)$$

for which $\psi_\theta(s) = \varphi_\theta(\alpha)$, where $q_x = \sum_{a\in A(x)} \alpha_{x,a}$, $\forall x \in X$. Taking into account that $\alpha$ can be represented as $\alpha = \sum_{i=1}^{k} \lambda_i \alpha^i$, where $\sum_{i=1}^{k} \lambda_i = 1$, $\lambda_i \ge 0$, $i = \overline{1,k}$, we have $\varphi_\theta(\alpha) = \sum_{i=1}^{k} \varphi_\theta(\alpha^i)\lambda_i$ and we can consider

$$\alpha = \sum_{i=1}^{k} \lambda_i \alpha^i; \qquad q = \sum_{i=1}^{k} \lambda_i q^i. \qquad (18)$$

Using (15)-(18) we obtain

$$s_{x,a} = \frac{\alpha_{x,a}}{q_x} = \frac{\sum\limits_{i=1}^{k} \lambda_i \alpha^i_{x,a}}{q_x} = \frac{\sum\limits_{i=1}^{k} \lambda_i s^{(i)}_{x,a} q^i_x}{q_x} = \sum_{i=1}^{k} \frac{\lambda_i q^i_x}{q_x}\, s^{(i)}_{x,a}, \quad \forall x \in X,\ a \in A(x),$$

and

$$q_x = \sum_{i=1}^{k} \lambda_i q^i_x \quad \text{for } x \in X. \qquad (19)$$

So,

$$s_{x,a} = \sum_{i=1}^{k} \frac{\lambda_i q^i_x}{q_x}\, s^{(i)}_{x,a} \quad \text{for } x \in X,\ a \in A(x), \qquad (20)$$

where $q_x$ for $x \in X$ are determined according to (19). The strategy $s$ defined by (20) is a feasible strategy because $s_{x,a} \ge 0$, $\forall x \in X$, $a \in A(x)$, and $\sum_{a\in A(x)} s_{x,a} = 1$, $\forall x \in X$. Moreover, we can observe that $q_x = \sum_{i=1}^{k} \lambda_i q^i_x$ for $x \in X$ represent a solution of system (14) for the strategy $s$ defined by (20). This can be verified by introducing (19) and (20) in (14); after such a substitution all equations from (14) are transformed into identities. For $\psi_\theta(s)$ we have

$$\psi_\theta(s) = \sum_{x\in X} \sum_{a\in A(x)} f(x,a)\, s_{x,a}\, q_x = \sum_{x\in X} \sum_{a\in A(x)} f(x,a) \sum_{i=1}^{k} \Big( \frac{\lambda_i q^i_x}{q_x}\, s^{(i)}_{x,a} \Big) q_x = \sum_{i=1}^{k} \Big( \sum_{x\in X} \sum_{a\in A(x)} f(x,a)\, s^{(i)}_{x,a}\, q^i_x \Big) \lambda_i = \sum_{i=1}^{k} \psi_\theta(s^{(i)}) \lambda_i,$$

i.e.

$$\psi_\theta(s) = \sum_{i=1}^{k} \psi_\theta(s^{(i)}) \lambda_i, \qquad (21)$$

where $s$ is the strategy that corresponds to $\alpha$. This means that if the strategies $s^{(1)}, s^{(2)}, \dots, s^{(k)}$ correspond to the basic solutions $\alpha^1, \alpha^2, \dots, \alpha^k$ of problem (8), (9) and $s \in S$ corresponds to an arbitrary solution $\alpha$ that can be expressed as a convex combination of basic solutions of problem (8), (9) with the corresponding coefficients $\lambda_1, \lambda_2, \dots, \lambda_k$, then we can express the strategy $s$ and the corresponding value $\psi_\theta(s)$ by (19)-(21).

Thus, an arbitrary strategy $s \in S$ can be represented in the form (19), (20), where $\lambda_1, \lambda_2, \dots, \lambda_k$ correspond to a solution of the following system:

$$\sum_{i=1}^{k} \lambda_i = 1; \qquad \lambda_i \ge 0,\ i = \overline{1,k}.$$

Consequently, the sublevel set $L^-_c(\psi_\theta)$ of the function $\psi_\theta(s)$ represents the set of strategies $s$ determined by (19), (20), where $\lambda_1, \lambda_2, \dots, \lambda_k$ satisfy the condition

$$\begin{cases} \sum\limits_{i=1}^{k} \psi_\theta(s^{(i)}) \lambda_i \le c;\\[1mm] \sum\limits_{i=1}^{k} \lambda_i = 1; \quad \lambda_i \ge 0,\ i = \overline{1,k}, \end{cases} \qquad (22)$$

and the superlevel set $L^+_c(\psi_\theta)$ of $\psi_\theta(s)$ represents the set of strategies $s$ determined by (19), (20), where $\lambda_1, \lambda_2, \dots, \lambda_k$ satisfy the condition

$$\begin{cases} \sum\limits_{i=1}^{k} \psi_\theta(s^{(i)}) \lambda_i \ge c;\\[1mm] \sum\limits_{i=1}^{k} \lambda_i = 1; \quad \lambda_i \ge 0,\ i = \overline{1,k}. \end{cases} \qquad (23)$$

Let us show that $L^-_c(\psi_\theta)$ and $L^+_c(\psi_\theta)$ are convex sets. We present the proof of convexity of the sublevel set $L^-_c(\psi_\theta)$; the proof of convexity of $L^+_c(\psi_\theta)$ is similar. Denote by $\Lambda$ the set of solutions $(\lambda_1, \lambda_2, \dots, \lambda_k)$ of system (22). Then from (19), (20), (22) we have

$$L^-_c(\psi_\theta) = \prod_{x\in X} S_x,$$

where $S_x$ represents the set of strategies

$$s_{x,a} = \frac{\sum\limits_{i=1}^{k} \lambda_i q^i_x\, s^{(i)}_{x,a}}{\sum\limits_{i=1}^{k} \lambda_i q^i_x} \quad \text{for } a \in A(x)$$

in the state $x \in X$ determined by $(\lambda_1, \lambda_2, \dots, \lambda_k) \in \Lambda$. Here $\sum_{i=1}^{k} \lambda_i q^i_x > 0$, and $s_{x,a}$ for a given $x \in X$ represents a linear-fractional function with respect to $\lambda_1, \lambda_2, \dots, \lambda_k$ defined on the convex set $\Lambda$, and $S_x$ is the image of this linear-fractional function on $\Lambda$. Therefore $S_x$ is a convex set (see Boyd and Vandenberghe, 2004). $\square$
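A quick numeric illustration of Theorem 2 (with assumed toy data): $\psi_\theta(s)$ is evaluated by solving the linear system (14) for $q$, and along a segment between two strategies the values stay between the values at the endpoints, as quasi-convexity and quasi-concavity together require.

```python
import numpy as np

gamma, n = 0.9, 3
p = np.array([[[0.2, 0.8, 0.0], [0.0, 0.5, 0.5]],
              [[0.3, 0.0, 0.7], [0.3, 0.0, 0.7]],
              [[1.0, 0.0, 0.0], [0.0, 0.6, 0.4]]])   # p[x, a, y]
f = np.array([[1.0, 0.5], [2.0, 2.0], [0.0, 1.0]])   # f[x, a]
theta = np.full(n, 1.0 / n)

def psi(s):
    """psi_theta(s): solve system (14) for q, then evaluate the objective."""
    P_s = np.einsum('xa,xay->xy', s, p)
    q = np.linalg.solve(np.eye(n) - gamma * P_s.T, theta)
    return float(np.sum(f * s * q[:, None]))

s0 = np.array([[1.0, 0.0]] * n)   # two pure stationary strategies
s1 = np.array([[0.0, 1.0]] * n)
vals = [psi((1 - t) * s0 + t * s1) for t in np.linspace(0.0, 1.0, 11)]
lo, hi = min(psi(s0), psi(s1)), max(psi(s0), psi(s1))
print(all(lo - 1e-9 <= v <= hi + 1e-9 for v in vals))   # expected: True
```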

4. Stationary Nash Equilibria for Discounted Stochastic Games

As we have noted, the problem of the existence of stationary Nash equilibria for discounted stochastic games has been studied by Fink, 1964; Takahashi, 1964; Sobol, 1971 and Solan, 1998. In this section we present a normal form game for a discounted stochastic game in mixed stationary strategies and show that the payoffs of the players in such a game are continuous and quasi-monotonic with respect to the corresponding strategies of the players. Based on these properties and the results of Dasgupta and Maskin, 1986 we obtain a new proof of the existence of stationary equilibria in a discounted stochastic game. Moreover, using such a model we can derive the conditions for determining the optimal stationary strategies of the players.

4.1. A Normal Form of a Discounted Stochastic Game in Stationary Strategies

In general, an $m$-player discounted stochastic game is determined by the following elements:

- a state space $X$ (which we assume to be finite);

- a finite set $A^i(x)$ of actions with respect to each player $i \in \{1,2,\dots,m\}$ for an arbitrary state $x \in X$;

- a stage payoff $f^i(x,a)$ with respect to each player $i \in \{1,2,\dots,m\}$ for each state $x \in X$ and for an arbitrary action vector $a \in \prod_i A^i(x)$;

- a transition probability function $p: X \times \prod_{x\in X} \prod_{i=1}^{m} A^i(x) \times X \to [0,1]$ that gives the transition probabilities $p^a_{x,y}$ from an arbitrary $x \in X$ to an arbitrary $y \in X$ for a fixed action vector $a \in \prod_i A^i(x)$, where $\sum_{y\in X} p^a_{x,y} = 1$, $\forall x \in X$, $a \in \prod_i A^i(x)$;

- a discount factor $\gamma$, $0 < \gamma < 1$;

- a starting state $x_0 \in X$.

The game starts in the state $x_0$ and the play proceeds in a sequence of stages. At stage $t$ the players observe the state $x_t$ and simultaneously and independently choose actions $a^i_t \in A^i(x_t)$, $i = 1,2,\dots,m$. Then nature selects a state $y = x_{t+1}$ according to the probability transitions $p^{a_t}_{x_t,y}$ for the given action vector $a_t = (a^1_t, a^2_t, \dots, a^m_t)$. Such a play of the game produces a sequence of states and actions $x_0, a_0, x_1, a_1, \dots, x_t, a_t, \dots$ that defines a stream of stage payoffs $f^1_t = f^1(x_t,a_t),\ f^2_t = f^2(x_t,a_t),\dots,\ f^m_t = f^m(x_t,a_t)$, $t = 0,1,2,\dots$. The discounted stochastic game is the game with the payoffs of the players

$$\sigma^i_{x_0} = E\left(\sum_{t=0}^{\infty} \gamma^t f^i(x_t,a_t)\right), \quad i = 1,2,\dots,m.$$

We will assume that the players use stationary strategies of choosing the actions in the states. A stationary strategy $s^i$ of player $i \in \{1,2,\dots,m\}$ we define as a mapping $s^i$ that provides for every state $x \in X$ a probability distribution over the set of actions $A^i(x)$. So, we can identify the set of mixed stationary strategies $S^i$ of player $i$ with the set of solutions of the system

$$\begin{cases} \sum\limits_{a\in A^i(x)} s^i_{x,a} = 1, & \forall x \in X;\\[1mm] s^i_{x,a} \ge 0, & \forall x \in X,\ \forall a \in A^i(x), \end{cases} \qquad (24)$$

where $s^i_{x,a}$ expresses the probability that player $i$ chooses the action $a \in A^i(x)$ in the state $x \in X$. The set of pure stationary strategies $\widetilde S^i$ of player $i$ corresponds to the set of basic solutions of system (24).

Let $s = (s^1, s^2, \dots, s^m) \in S = S^1 \times S^2 \times \cdots \times S^m$ be a profile of stationary strategies (pure or mixed strategies) of the players. Then the elements of the probability transition matrix $P^s = (p^s_{x,y})$ in the Markov process induced by $s$ can be calculated as follows:

$$p^s_{x,y} = \sum_{(a^1,a^2,\dots,a^m)\in A(x)}\ \prod_{k=1}^{m} s^k_{x,a^k}\ p^{(a^1,a^2,\dots,a^m)}_{x,y}, \qquad (25)$$

where $A(x) = \prod_{i=1}^{m} A^i(x)$. Let us consider the matrix $W^s = (w^s_{x,y})$, where $W^s = (I - \gamma P^s)^{-1}$. Then in a discounted stochastic game the payoff of player $i \in \{1,2,\dots,m\}$ for a given profile $s$ and starting state $x_0 \in X$ is determined as follows:

$$\sigma^i_{x_0}(s) = \sum_{y\in X} w^s_{x_0,y}\, f^i(y,s), \quad i = 1,2,\dots,m, \qquad (26)$$

where

$$f^i(y,s) = \sum_{(a^1,a^2,\dots,a^m)\in A(y)}\ \prod_{k=1}^{m} s^k_{y,a^k}\ f^i(y, a^1, a^2, \dots, a^m). \qquad (27)$$

The functions $\sigma^1_{x_0}(s), \sigma^2_{x_0}(s), \dots, \sigma^m_{x_0}(s)$ on $S = S^1 \times S^2 \times \cdots \times S^m$, determined according to (26), (27), define a game in normal form that we denote by $(\{S^i\}_{i=\overline{1,m}}, \{\sigma^i_{x_0}(s)\}_{i=\overline{1,m}})$. This game corresponds to a discounted stochastic game in stationary strategies.

The discounted stochastic game can also be considered for the case when the starting state is chosen randomly according to a given distribution $\{\theta_x\}$ on $X$. So, we may assume that the play starts in the state $x \in X$ with probability $\theta_x > 0$, where $\sum_{x\in X} \theta_x = 1$. If the players use mixed stationary strategies then the payoff functions

$$\sigma^i_\theta(s) = \sum_{x\in X} \theta_x\, \sigma^i_x(s), \quad i = 1,2,\dots,m,$$

on $S$ define a game in normal form $(\{S^i\}_{i=\overline{1,m}}, \{\sigma^i_\theta(s)\}_{i=\overline{1,m}})$. In the case $\theta_x = 0$, $\forall x \in X \setminus \{x_0\}$, $\theta_{x_0} = 1$, the considered game becomes a discounted stochastic game with a fixed starting state $x_0$.

Below we show how to represent the payoff functions $\sigma^i_\theta(s^1, s^2, \dots, s^m)$ on $S = S^1 \times S^2 \times \cdots \times S^m$ explicitly. Based on Theorem 2 and (26), (27), the payoffs in the game $(\{S^i\}_{i=\overline{1,m}}, \{\sigma^i_\theta(s)\}_{i=\overline{1,m}})$ can be defined as follows:

$$\sigma^i_\theta(s^1, s^2, \dots, s^m) = \sum_{x\in X}\ \sum_{(a^1,a^2,\dots,a^m)\in A(x)}\ \prod_{k=1}^{m} s^k_{x,a^k}\, f^i(x, a^1, a^2, \dots, a^m)\, q_x, \quad i = 1,2,\dots,m, \qquad (28)$$

where $q_x$, $x \in X$, are determined uniquely from the following system of equations:

$$q_y - \gamma \sum_{x\in X}\ \sum_{(a^1,a^2,\dots,a^m)\in A(x)}\ \prod_{k=1}^{m} s^k_{x,a^k}\, p^{(a^1,a^2,\dots,a^m)}_{x,y}\, q_x = \theta_y, \quad \forall y \in X, \qquad (29)$$

for an arbitrary $s = (s^1, s^2, \dots, s^m) \in S = S^1 \times S^2 \times \cdots \times S^m$, where each $S^i$, $i \in \{1,2,\dots,m\}$, is the set of solutions of system (24).

Each payoff function $\sigma^i_\theta(s^1, s^2, \dots, s^m)$ on $S$ is continuous. Additionally, according to Theorem 2, each $\sigma^i_\theta(s^1, s^2, \dots, s^m)$ is quasi-monotonic with respect to $s^i$ on $S^i$. Therefore, based on the results of Dasgupta and Maskin, 1986 and Debreu, 1952 we obtain the following theorem.

Theorem 3. The game $(\{S^i\}_{i=\overline{1,m}}, \{\sigma^i_\theta(s)\}_{i=\overline{1,m}})$ has a Nash equilibrium $s^* = (s^{1*}, s^{2*}, \dots, s^{m*}) \in S = S^1 \times S^2 \times \cdots \times S^m$ that is a stationary Nash equilibrium of the discounted stochastic game for an arbitrary starting state $x \in X$.

4.2. A Normal Form of a Discounted Stochastic Positional Game in Stationary Strategies

It is easy to see that a discounted stochastic positional game determined by a tuple $(\{X_i\}_{i=\overline{1,m}}, \{A(x)\}_{x\in X}, \{f^i(x,a)\}_{i=\overline{1,m}}, p, \gamma, \{\theta_y\}_{y\in X})$ represents a particular case of the discounted stochastic game from Section 4.1. Therefore, if we specify the game model from the previous section for the positional game then we obtain the normal form of the positional game in stationary strategies $(\{S^i\}_{i=\overline{1,m}}, \{\sigma^i_\theta(s)\}_{i=\overline{1,m}})$, where $S^i$ and $\sigma^i_\theta(s)$, $i \in \{1,2,\dots,m\}$, are defined as follows.

Let $S^i$, $i \in \{1,2,\dots,m\}$, be the set of solutions of system (1) that determines the set of mixed stationary strategies of player $i$. On the set $S = S^1 \times S^2 \times \cdots \times S^m$ we define the payoff functions

$$\sigma^i_\theta(s^1, s^2, \dots, s^m) = \sum_{k=1}^{m} \sum_{x\in X_k} \sum_{a\in A(x)} s^k_{x,a}\, f^i(x,a)\, q_x, \quad i = 1,2,\dots,m, \qquad (30)$$

where $q_x$ for $x \in X$ are determined uniquely from the following system of equations:

$$q_y - \gamma \sum_{k=1}^{m} \sum_{x\in X_k} \sum_{a\in A(x)} s^k_{x,a}\, p^a_{x,y}\, q_x = \theta_y, \quad \forall y \in X, \qquad (31)$$

for an arbitrary $s = (s^1, s^2, \dots, s^m) \in S = S^1 \times S^2 \times \cdots \times S^m$.

Note that the payoff functions $\sigma^i_\theta(s^1, s^2, \dots, s^m)$ defined according to (30), (31) for the positional game differ from the payoff functions defined according to (28), (29) in the general case of the game. As a corollary of Theorem 3 we obtain that for the game $(\{S^i\}_{i=\overline{1,m}}, \{\sigma^i_\theta(s)\}_{i=\overline{1,m}})$ defined according to (1), (30), (31) there exists a Nash equilibrium $s^* = (s^{1*}, s^{2*}, \dots, s^{m*}) \in S = S^1 \times S^2 \times \cdots \times S^m$ that is a stationary Nash equilibrium of the discounted stochastic positional game for an arbitrary starting state $x \in X$.

5. Existence of Pure Stationary Equilibria for a Discounted Stochastic Positional Game

The existence of Nash equilibria in pure stationary strategies for a discounted stochastic positional game can be derived on the basis of the following theorem.

Theorem 4. Let a discounted stochastic positional game be given that is determined by the tuple $(\{X_i\}_{i=\overline{1,m}}, \{A(x)\}_{x\in X}, \{f^i(x,a)\}_{i=\overline{1,m}}, p, \gamma)$. Then there exist values $\sigma^i_x$, $x \in X$, $i = 1,2,\dots,m$, that satisfy the following conditions:

1) $f^i(x,a) + \gamma \sum\limits_{y\in X} p^a_{x,y}\sigma^i_y - \sigma^i_x \le 0, \quad \forall x \in X_i,\ \forall a \in A(x),\ i = 1,2,\dots,m$;

2) $\max\limits_{a\in A(x)} \Big\{ f^i(x,a) + \gamma \sum\limits_{y\in X} p^a_{x,y}\sigma^i_y - \sigma^i_x \Big\} = 0, \quad \forall x \in X_i,\ i = 1,2,\dots,m$;

3) on each position set $X_i$, $i \in \{1,2,\dots,m\}$, there exists a map $s^{i*}: X_i \to \bigcup_{x\in X_i} A(x)$ such that

$$s^{i*}(x) = a^* \in \arg\max_{a\in A(x)} \Big\{ f^i(x,a) + \gamma \sum_{y\in X} p^a_{x,y}\sigma^i_y - \sigma^i_x \Big\}$$

and

$$f^j(x,a^*) + \gamma \sum_{y\in X} p^{a^*}_{x,y}\sigma^j_y - \sigma^j_x = 0, \quad \forall x \in X_i,\ j = 1,2,\dots,m.$$

The maps $s^{1*}, s^{2*}, \dots, s^{m*}$ determine a Nash equilibrium $s^* = (s^{1*}, s^{2*}, \dots, s^{m*})$ for the discounted stochastic positional game determined by $(\{X_i\}_{i=\overline{1,m}}, \{A(x)\}_{x\in X}, \{f^i(x,a)\}_{i=\overline{1,m}}, p, \gamma)$, and $s^* = (s^{1*}, s^{2*}, \dots, s^{m*})$ is a pure stationary Nash equilibrium for an arbitrary starting state $x \in X$.

Proof. According to Theorem 3, for the discounted stochastic positional game determined by $(\{X_i\}_{i=\overline{1,m}}, \{A(x)\}_{x\in X}, \{f^i(x,a)\}_{i=\overline{1,m}}, p, \gamma)$ there exists a stationary Nash equilibrium $s^* = (s^{1*}, s^{2*}, \dots, s^{m*})$. If $s^{i*}$ is a mixed stationary strategy of player $i \in \{1,2,\dots,m\}$ then for a fixed $x \in X_i$ the strategy $s^{i*}(x)$ represents a convex combination of actions determined by the probability distribution $\{s^{i*}_{x,a}\}$ on $A^*(x) = \{a \in A(x) \mid s^{i*}_{x,a} > 0\}$.

Let us consider the Markov process induced by the profile of mixed stationary strategies $s^* = (s^{1*}, s^{2*}, \dots, s^{m*})$. Then according to (2) the elements of the transition probability matrix $P^{s^*} = (p^{s^*}_{x,y})$ of this Markov process can be calculated as follows:

$$p^{s^*}_{x,y} = \sum_{a\in A(x)} s^{i*}_{x,a}\, p^a_{x,y} \quad \text{for } x \in X_i,\ i = 1,2,\dots,m, \qquad (32)$$

and the step rewards in the states induced by $s^*$ can be determined according to (4), i.e.

$$f^i(x, s^{k*}) = \sum_{a\in A(x)} s^{k*}_{x,a}\, f^i(x,a) \quad \text{for } x \in X_k,\ k \in \{1,2,\dots,m\}. \qquad (33)$$

Based on Theorem 1, for this Markov process we can write the following equations:

$$f^j(x, s^{i*}) + \gamma \sum_{y\in X} p^{s^*}_{x,y}\sigma^j_y - \sigma^j_x = 0, \quad \forall x \in X_i,\ \forall i,j \in \{1,2,\dots,m\}. \qquad (34)$$

From these equations we determine uniquely $\sigma^j_x$, $x \in X$, $j = 1,2,\dots,m$ (Puterman, 2005). These values satisfy the condition

$$f^j(x,a) + \gamma \sum_{y\in X} p^a_{x,y}\sigma^j_y - \sigma^j_x \le 0, \quad \forall x \in X_i,\ \forall a \in A(x),\ \forall i,j \in \{1,2,\dots,m\}. \qquad (35)$$

By introducing (32) and (33) in (34) we obtain

$$\sum_{a\in A(x)} s^{i*}_{x,a} f^j(x,a) + \gamma \sum_{y\in X} \sum_{a\in A(x)} s^{i*}_{x,a}\, p^a_{x,y}\sigma^j_y - \sigma^j_x = 0, \quad \forall x \in X_i,\ \forall i,j \in \{1,2,\dots,m\}.$$

In these equations we can set $\sigma^j_x = \sum_{a\in A(x)} s^{i*}_{x,a}\sigma^j_x$. After these substitutions and some elementary transformations of the equations we obtain

$$\sum_{a\in A(x)} s^{i*}_{x,a} \Big( f^j(x,a) + \gamma \sum_{y\in X} p^a_{x,y}\sigma^j_y - \sigma^j_x \Big) = 0, \quad \forall x \in X_i,\ \forall i,j \in \{1,2,\dots,m\}.$$

So, for the Markov process induced by the profile of mixed stationary strategies $s^* = (s^{1*}, s^{2*}, \dots, s^{m*})$ there exist values $\sigma^j_x$, $x \in X$, $j = 1,2,\dots,m$, that satisfy the following condition:

$$f^j(x,a) + \gamma \sum_{y\in X} p^a_{x,y}\sigma^j_y - \sigma^j_x = 0, \quad \forall x \in X_i,\ \forall a \in A^*(x),\ j = 1,2,\dots,m. \qquad (36)$$

Now let us fix the strategies $s^{1*}, s^{2*}, \dots, s^{(i-1)*}, s^{(i+1)*}, \dots, s^{m*}$ of the players $1,2,\dots,i-1,i+1,\dots,m$ and consider the problem of determining the maximal expected total discounted reward with respect to player $i \in \{1,2,\dots,m\}$. Obviously, if we solve this decision problem then we obtain the strategy $s^{i*}$. If we write the optimality equations for this discounted Markov decision problem with respect to player $i$ then we obtain that there exist values $\sigma^i_x$ for $x \in X$ such that

1) $f^i(x,a) + \gamma \sum\limits_{y\in X} p^a_{x,y}\sigma^i_y - \sigma^i_x \le 0, \quad \forall x \in X_i,\ \forall a \in A(x)$;

2) $\max\limits_{a\in A(x)} \Big\{ f^i(x,a) + \gamma \sum\limits_{y\in X} p^a_{x,y}\sigma^i_y - \sigma^i_x \Big\} = 0, \quad \forall x \in X_i$.

We can observe that $\sigma^i_x$, $x \in X$, determined from (34), satisfy conditions 1), 2) above and (36) holds. So, if for an arbitrary $i \in \{1,2,\dots,m\}$ we fix a map $s^{i*}: X_i \to \bigcup_{x\in X_i} A(x)$ such that

$$s^{i*}(x) = a^* \in \arg\max_{a\in A(x)} \Big\{ f^i(x,a) + \gamma \sum_{y\in X} p^a_{x,y}\sigma^i_y - \sigma^i_x \Big\}, \quad \forall x \in X_i,$$

and

$$f^j(x,a^*) + \gamma \sum_{y\in X} p^{a^*}_{x,y}\sigma^j_y - \sigma^j_x = 0, \quad \forall x \in X_i,\ j = 1,2,\dots,m,$$

then we obtain a Nash equilibrium in pure stationary strategies. $\square$
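The conditions of Theorem 4 also give a practical test for pure equilibria: for a candidate pure profile $s$, the values $\sigma^i_x$ can be computed from the induced Markov process (equations of the type (34)) and one then checks that no owner can improve in his position set. The sketch below (assumed toy data, as in the earlier sketches) enumerates all pure profiles of a tiny two-player game in this way; Theorem 4 guarantees that at least one profile passes the test.

```python
from itertools import product
import numpy as np

gamma, n = 0.9, 3
owner = np.array([0, 0, 1])       # X_1 = {0, 1}, X_2 = {2}
p = np.array([[[0.2, 0.8, 0.0], [0.0, 0.5, 0.5]],
              [[0.3, 0.0, 0.7], [0.3, 0.0, 0.7]],
              [[1.0, 0.0, 0.0], [0.0, 0.6, 0.4]]])      # p[x, a, y]
f = np.array([[[1.0, 0.5], [2.0, 2.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 1.0], [3.0, 0.5]]])    # f[i, x, a]

def is_pure_equilibrium(s):
    P = p[np.arange(n), s]        # kernel induced by the pure profile s
    # sigma^i solves sigma = f^i(x, s(x)) + gamma * P sigma (cf. (34)).
    sigma = np.stack([np.linalg.solve(np.eye(n) - gamma * P,
                                      f[i, np.arange(n), s])
                      for i in range(2)])
    for x in range(n):
        i = owner[x]
        # Conditions 1)-2): no action improves on s(x) for the owner of x.
        gains = f[i, x] + gamma * p[x] @ sigma[i] - sigma[i, x]
        if gains.max() > 1e-9:
            return False
    return True

for s in product(range(2), repeat=n):
    if is_pure_equilibrium(np.array(s)):
        print("pure stationary Nash equilibrium:", s)
```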

6. Conclusion

Discounted stochastic positional games represent a special class of discounted stochastic games with finite state and action spaces for which pure stationary Nash equilibria exist. The considered class of games represents a generalization of the discounted deterministic positional games on graphs considered by Gurvich et al., 1988. Pure and mixed stationary Nash equilibria for a discounted stochastic positional game can be obtained by using the game models and conditions from Sections 4.2 and 5. Stationary Nash equilibria for a discounted stochastic game can be determined by using the game model in mixed stationary strategies from Section 4.1.

References

Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.

Dasgupta, P. and Maskin, E. (1986). The existence of equilibrium in discontinuous economic games. Rev. Econ. Stud., 53, 1-26.

Debreu, G. (1952). A social equilibrium existence theorem. Proceedings of the National Academy of Sciences, 38, 886-893.

Fink, A. (1964). Equilibria in a stochastic n-person game. Journal of Science of Hiroshima University, Series A-I, 28, 89-93.

Gurvich, V., Karzanov, A. and Khachiyan, L. (1988). Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Comput. Math. Math. Phys., 28, 85-91.

Lozovanu, D. and Pickl, S. (2015). Optimization of Stochastic Discrete Systems and Control on Complex Networks. Springer.

Puterman, M. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Hoboken.

Shapley, L. (1953). Stochastic games. Proc. Natl. Acad. Sci. USA, 39, 1095-1100.

Sobol, M. (1971). Non-cooperative stochastic games. Ann. Math. Statist., 42, 1930-1935.

Solan, E. (1998). Discounted stochastic games. Mathematics of Operations Research, 23(4), 1010-1021.

Takahashi, M. (1964). Equilibrium points of stochastic non-cooperative n-person games. J. Sci. Hiroshima Univ., Series A-I, 28, 95-99.
