Elena M. Parilina
St. Petersburg University,
Faculty of Applied Mathematics and Control Processes,
Bibliotechnaya pl. 2, St. Petersburg, 198504, Russia
E-mail: [email protected]
WWW home page: http://www.apmath.spbu.ru/ru/staff/parilina/
Abstract. The paper considers stochastic games in the class of stationary strategies. The cooperative form of this class of stochastic games is constructed and the cooperative solution is found. Conditions of dynamic stability for stochastic games are obtained. The principles of dynamic stability comprise three conditions: subgame consistency, strategic stability, and the irrational behavior proof condition of the cooperative agreement. The paper also presents an example for which the cooperative agreement is found and the conditions of dynamic stability are verified.
Keywords: cooperative stochastic game, stationary strategies, time consistency, subgame consistency, payoff distribution procedure, strategic stability, irrational behavior proof condition
1. Introduction
A stochastic game is a dynamic game process. If cooperation is possible in the game, an important property of a cooperative agreement is its stability in dynamics. The work of Petrosyan and Zenkevich, 2009, considers three principles of stable cooperation: time consistency (dynamic consistency), strategic stability, and the irrational behavior proof condition.
L. A. Petrosyan was the first to introduce the concept of dynamic consistency for differential games (Petrosyan, 1977). This condition appeared to be relevant also for stochastic games (Petrosyan, 2006). The present paper considers stochastic games in stationary strategies with a finite number of states, any of which can be realized at every game stage. With this definition of stochastic games, consistency of the cooperative agreement should take place in every position (state) of the game. In other words, the requirement of subgame consistency is imposed on the cooperative agreement. Subgame consistency of the cooperative agreement lets the players expect to receive an allocation according to the same optimality principle in every stochastic subgame.
The condition of strategic stability is guaranteed by the existence of a Nash equilibrium in the regularized game with the payoffs that the players expect to receive as a result of the cooperative agreement. The regularization of the game is constructed on the basis of the initial stochastic game with the use of a payoff distribution procedure (Petrosyan and Danilov, 1979). The conditions of strategic stability for stochastic games were also considered by Grauer and Petrosyan, 2002.
The irrational behavior proof condition (Yeung, 2006) guarantees that if some player (or group of players) cancels the agreement at some stage of the game and the players play individually from this stage to the end of the game, they receive not less than if each player had played by himself during the whole game. This condition also secures the cooperative agreement against force majeure circumstances.
The notion of stochastic games was introduced by Shapley, 1953a. At present many papers deal with the study of stochastic games (Petrosyan et al., 2004, Petrosyan and Baranova, 2006, Herings and Peeters, 2004). Stochastic games have wide application in telecommunication system modeling (Parilina, 2010, Altman et al., 2003), in economics (Amir, 2003), and in the problem of tax evasion (Raghavan, 2006).
2. Stochastic games in stationary strategies
A stochastic game begins with a chance move, i.e. with the choice of the initial state from which the game process starts. A state of the stochastic game is a simultaneous normal-form game of n players. One of a finite number of states is realized at each stage of the stochastic game. In a state some action profile is realized, and depending on it the transition to the next state occurs with some probability. The players' payoffs are discounted as the game goes on. Introduce the following notation:
— The set of players is N = {1,...,n}.
— The set of states is {Γ^1, ..., Γ^t}, where Γ^j = ⟨N, X_1^j, ..., X_n^j, K_1^j, ..., K_n^j⟩ is state j, the set N is the same for all Γ^j, j = 1,...,t, X_i^j is the finite set of pure strategies of player i in Γ^j, and K_i^j(x_1^j, ..., x_n^j) = K_i^j(x^j) is the payoff function of player i in state Γ^j, j = 1,...,t.
— The probability that state Γ^k is realized if at the previous stage (in state Γ^j) action profile x^j = (x_1^j, ..., x_n^j) was realized is p(j,k; x^j). It is obvious that p(j,k; x^j) ≥ 0 and Σ_{k=1}^t p(j,k; x^j) = 1 for each x^j ∈ X^j = ∏_{i∈N} X_i^j and for any j, k = 1,...,t.
— The discount factor is δ ∈ (0,1).
— The vector of the initial distribution over states Γ^1, ..., Γ^t is π⁰ = (π_1⁰, ..., π_t⁰), where π_j⁰ is the probability that state Γ^j is realized at the first stage of the game, Σ_{j=1}^t π_j⁰ = 1.
— The set of player i's stationary strategies is S_i = {η_i}. Using stationary strategies, the player's choice in each state from the set {Γ^1, ..., Γ^t} at any stage depends only on which state is realized at this stage, i.e. η_i : Γ^j ↦ x_i^j ∈ X_i^j, j = 1,...,t.
Definition 1. Call the set

G = ⟨N, {Γ^j}_{j=1}^t, {X_i^j}_{i∈N, j=1,...,t}, {K_i^j}_{i∈N, j=1,...,t}, {p(j,k; ·)}, δ, π⁰⟩   (1)

a finite stochastic game in stationary strategies.
Definition 2. Call stochastic game (1) with vector π⁰ = (0,...,0,1,0,...,0) (with 1 in the jth component), i.e. the game beginning with state Γ^j, a finite stochastic subgame in stationary strategies, and denote it by G^j, j = 1,...,t.
Remark 1. Obviously, player i's stationary strategy in game G is also player i's stationary strategy in any subgame G^1, ..., G^t.
The payoff in a finite stochastic game is a random variable, so we have to determine the utility function of the payoff. Consider the mathematical expectation of a player's payoff as the utility of his payoff in stochastic game G. Let Ē_i(η) be the expected payoff of player i in game G and E_i^j(η) be the expected payoff of player i in subgame G^j when strategy profile η is realized in stochastic game G (subgame G^j). Form the vector E_i(η) = (E_i^1(η), ..., E_i^t(η)).
For the expected payoff of player i in subgame G^j the following recurrent equation holds:

E_i^j(η) = K_i^j(x^j) + δ Σ_{k=1}^t p(j,k; x^j) E_i^k(η),   (2)

under the condition that η(Γ^j) = x^j, i.e. η(·) = (η_1(·), ..., η_n(·)), where η_i(Γ^j) = x_i^j ∈ X_i^j, x^j = (x_1^j, ..., x_n^j) for each j = 1,...,t, i ∈ N.
Since stochastic game G is considered in the class of stationary strategies defined above and the set of states {Γ^1, ..., Γ^t} is finite, it is sufficient to consider the t subgames G^1, ..., G^t beginning with states Γ^1, ..., Γ^t respectively.
Hereinafter, let η(·) = (η_1(·), ..., η_n(·)) be a stationary strategy profile such that η_i(Γ^j) = x_i^j ∈ X_i^j, where j = 1,...,t, i ∈ N. We restrict our consideration to the set of player i's pure stationary strategies in stochastic game G and denote it by S_i.
The matrix of transition probabilities in stochastic game G under the realization of stationary strategy profile η(·) is

Π(η) = ( p(1,1; x^1) ... p(1,t; x^1) )
       ( p(2,1; x^2) ... p(2,t; x^2) )
       (     ...            ...      )
       ( p(t,1; x^t) ... p(t,t; x^t) ).   (3)
We can rewrite equation (2) in matrix form using (3) as follows:

E_i(η) = K_i(η) + δΠ(η)E_i(η),   (4)

where K_i(η) = (K_i^1(x^1), ..., K_i^t(x^t)), and K_i^j(x^j) is player i's payoff in state Γ^j on condition that strategy profile x^j ∈ X^j is realized in this state.
Equation (4) is equivalent to the following one:

E_i(η) = (I − δΠ(η))^{−1} K_i(η),   (5)

where I is the t × t identity matrix.
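Computing (5) directly requires inverting a t × t matrix; equivalently, one can iterate the fixed point (4), which converges geometrically because δ < 1. A minimal sketch in Python (all function names and numbers are illustrative, not from the paper):

```python
# Sketch of computing E_i via the fixed point (4): E = K + delta * P * E.
# The iteration converges because delta < 1 (cf. Remark 2), so it yields
# the same vector as the matrix inversion in (5). Numbers are illustrative.

def expected_payoffs(K, P, delta, iters=5000):
    """K[j]: stage payoff of the player in state j under a fixed profile;
    P[j][k]: transition probability from state j to state k;
    returns the vector (E^1, ..., E^t) of expected subgame payoffs."""
    t = len(K)
    E = [0.0] * t
    for _ in range(iters):
        E = [K[j] + delta * sum(P[j][k] * E[k] for k in range(t))
             for j in range(t)]
    return E

# Constant stage payoff 5 in both states: E^j = 5 / (1 - delta) = 500.
E = expected_payoffs([5.0, 5.0], [[0.6, 0.4], [0.2, 0.8]], 0.99)
print([round(e, 2) for e in E])  # [500.0, 500.0]
```

The sanity check at the end uses the fact that with a constant stage payoff the transition matrix is irrelevant and the discounted sum is a geometric series.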
Remark 2. Matrix (I − δΠ(η))^{−1} always exists for δ ∈ (0,1). It is not difficult to prove this statement. It is known that all the eigenvalues of stochastic matrix Π(η) lie in the interval [−1,1]. For the existence of matrix (I − δΠ(η))^{−1} it is necessary and sufficient that the determinant of matrix (Π(η) − (1/δ)I) be nonzero, i.e. matrix Π(η) must not have an eigenvalue equal to 1/δ. The last condition holds because 1/δ > 1, so this number cannot be an eigenvalue of stochastic matrix Π(η).
The expected payoff of player i in stochastic game G can be found in the following way:

Ē_i(η) = π⁰ E_i(η).   (6)
3. Cooperative stochastic games
Suppose now that the players from N decide to cooperate to receive the maximum total payoff. Denote the pure strategy profile maximizing the sum of the expected players' payoffs in stochastic game G by η̄(·) = (η̄_1(·), ..., η̄_n(·)), i.e.

max_{η ∈ ∏_{i∈N} S_i} Σ_{i∈N} Ē_i(η) = Σ_{i∈N} Ē_i(η̄).   (7)

Problem (7) may have more than one solution. Call a strategy profile η̄(·) satisfying (7) a cooperative decision.
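For small games the maximization problem above can be solved by brute force, since each pure stationary profile just fixes one action pair per state. A sketch under illustrative data (the stage totals and transitions loosely follow the two-player example of Section 5, but with a smaller discount factor; all names are assumptions):

```python
# Brute-force sketch of finding the cooperative decision: enumerate all
# pure stationary profiles and keep the one maximizing pi0 * sum_i E_i.
import itertools

DELTA = 0.9
PI0 = [0.5, 0.5]                      # initial distribution over states
# K[j][(a1, a2)]: total stage payoff sum_i K_i^j; P[j][(a1, a2)]: transitions
K = [{(0, 0): 4, (0, 1): 13, (1, 0): 14, (1, 1): 9},
     {(0, 0): 4, (0, 1): 9, (1, 0): 10, (1, 1): 8}]
P = [{(0, 0): [0.6, 0.4], (0, 1): [0.7, 0.3], (1, 0): [0.3, 0.7], (1, 1): [0.6, 0.4]},
     {(0, 0): [0.8, 0.2], (0, 1): [0.3, 0.7], (1, 0): [0.3, 0.7], (1, 1): [0.2, 0.8]}]

def total_subgame_values(profile):
    """profile[j]: action pair chosen in state j; returns (V^1, V^2)."""
    V = [0.0, 0.0]
    for _ in range(3000):             # fixed-point iteration of (2), delta < 1
        V = [K[j][profile[j]] + DELTA * sum(P[j][profile[j]][k] * V[k]
                                            for k in range(2))
             for j in range(2)]
    return V

profiles = list(itertools.product([(a, b) for a in (0, 1) for b in (0, 1)],
                                  repeat=2))
best = max(profiles,
           key=lambda pr: sum(p * v for p, v in zip(PI0, total_subgame_values(pr))))
print(best)  # ((0, 1), (1, 0))
```

Note that the maximizer here is not the profile with the largest stage totals in every state: the transition probabilities shift weight toward the more profitable state, which is exactly why the problem is posed over expected discounted payoffs.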
The coalitional form of a noncooperative game is usually given by the pair (N, V), where N is the set of players and V is a real-valued function, called the characteristic function of the game, defined on the set 2^N (the set of all subsets of N) and satisfying two properties: (1) V(∅) = 0, and (2) (superadditivity) for any disjoint coalitions S, T ⊆ N, S ∩ T = ∅, the inequality V(S) + V(T) ≤ V(S ∪ T) is satisfied. The value V(S) is a real number for each coalition S ⊆ N, which may be interpreted as the worth or power of coalition S when its members play together as a unit. Condition (2) says that the value of two disjoint coalitions is at least as great when they play together as when they play apart. The assumption of superadditivity is not needed for some of the theory of coalitional games, but it seems to be a natural condition.
Define the characteristic function V̄(S) in stochastic game G via the characteristic functions V^j(S) of the stochastic subgames G^j, j = 1,...,t, as follows:

V̄(S) = π⁰ V(S)   (8)

for any coalition S ⊆ N, where V(S) = (V^1(S), ..., V^t(S)) and V^j(S) is the value of the characteristic function of stochastic subgame G^j derived for coalition S.
The task is to determine the characteristic function V^j(S) for any coalition S ⊆ N.
Firstly, consider S = N. The Bellman equation (Bellman, 1957) for the value V(N) can be written as follows:

V(N) = max_{η ∈ ∏_{i∈N} S_i} [ Σ_{i∈N} K_i(η) + δΠ(η)V(N) ] = Σ_{i∈N} K_i(η̄) + δΠ(η̄)V(N),

where η̄(·) is a pure strategy profile satisfying condition (7). The value V(N) is obtained from the previous equation:

V(N) = (I − δΠ(η̄))^{−1} Σ_{i∈N} K_i(η̄).   (9)
Secondly, consider S ⊂ N, S ≠ ∅. To define the value of characteristic function V^j(S) for this coalition, j = 1,...,t, for each subgame G^j define an auxiliary zero-sum stochastic game G^j_S in which coalition S ⊂ N plays as a maximizing player and coalition N\S plays as a minimizing player. Define the value of function V^j(S) for subgame G^j as the lower value of antagonistic stochastic game G^j_S in pure stationary strategies (in fact, the lower value of a matrix game):

V^j(S) = max_{η_S} min_{η_{N\S}} Σ_{i∈S} E_i^j(η_S, η_{N\S}),   (10)

where (η_S(·), η_{N\S}(·)) is a strategy profile in pure stationary strategies, η_S(·) = (η_{i_1}(·), ..., η_{i_k}(·)) is a vector of stationary strategies of players i_1, ..., i_k ∈ S, {i_1, ..., i_k} = S, and η_S(·) ∈ ∏_{j=1}^k S_{i_j}, the set of pure stationary strategies of coalition S ⊂ N; similarly, η_{N\S}(·) is a vector of stationary strategies of players i_{k+1}, ..., i_n ∈ N\S, {i_{k+1}, ..., i_n} = N\S, and η_{N\S}(·) ∈ ∏_{j=k+1}^n S_{i_j}, the set of pure stationary strategies of coalition N\S.
Finally, consider S = ∅ and set the value of the characteristic function:

V^j(∅) = 0.   (11)
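In the two-player case the lower value (10) for a singleton coalition can be computed by brute force: enumerate the pure stationary strategies of both sides, evaluate the expected payoffs (2) for each pair, and take the max-min componentwise. A sketch using player 1's payoffs and the transitions of the example in Section 5 (one transition entry, marked below, is an assumption; it does not affect this particular max-min because it belongs to a profile the minimizer avoids):

```python
# Lower value (10) of the zero-sum game "player 1 vs player 2" in pure
# stationary strategies, by brute-force enumeration.
import itertools

DELTA = 0.99
# K1[j][(a1, a2)]: player 1's stage payoff in state j
K1 = [{(0, 0): 2, (0, 1): 1, (1, 0): 11, (1, 1): 5},
      {(0, 0): 3, (0, 1): 2, (1, 0): 8, (1, 1): 5}]
# P[j][(a1, a2)]: transition probabilities to (state 0, state 1)
P = [{(0, 0): [0.6, 0.4], (0, 1): [0.7, 0.3],   # (0, 1) entry is an assumption
      (1, 0): [0.3, 0.7], (1, 1): [0.6, 0.4]},
     {(0, 0): [0.8, 0.2], (0, 1): [0.3, 0.7],
      (1, 0): [0.3, 0.7], (1, 1): [0.2, 0.8]}]

def payoffs(s1, s2):
    """Expected payoffs (E^1, E^2) of player 1 when the players fix
    stationary strategies s1, s2 (one action per state)."""
    E = [0.0, 0.0]
    for _ in range(4000):                       # fixed-point iteration of (2)
        E = [K1[j][(s1[j], s2[j])]
             + DELTA * sum(P[j][(s1[j], s2[j])][k] * E[k] for k in range(2))
             for j in range(2)]
    return E

strategies = list(itertools.product((0, 1), repeat=2))
V1 = [max(min(payoffs(s1, s2)[j] for s2 in strategies) for s1 in strategies)
      for j in range(2)]
print([round(v, 2) for v in V1])  # [500.0, 500.0]
```

Here the minimizing player can hold player 1 down to a stage payoff of 5 in every state, so the lower value is 5/(1 − δ) = 500 in both subgames.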
Remark 3. Characteristic functions V^j(S), j = 1,...,t, determined by (9)–(11) and V̄(S) determined by (8) are superadditive.
Definition 3. A cooperative stochastic subgame G^{jco} is a set ⟨N, V^j(·)⟩, where N is the set of players and V^j : 2^N → R is the characteristic function calculated by (9)–(11).
Definition 4. A cooperative stochastic game G^{co} is a set ⟨N, V̄(·)⟩, where N is the set of players and V̄ : 2^N → R is the characteristic function calculated by (8).
Definition 5. Vector α^j = (α_1^j, ..., α_n^j) satisfying the two following conditions:
1) Σ_{i∈N} α_i^j = V^j(N),
2) α_i^j ≥ V^j({i}) for any i ∈ N,
is called an allocation in subgame G^{jco} (j = 1,...,t). Denote the set of allocations in cooperative subgame G^{jco} by I^j, j = 1,...,t.
Definition 6. Vector ᾱ = (ᾱ_1, ..., ᾱ_n), where ᾱ_i = π⁰α_i, α_i = (α_i^1, ..., α_i^t), and (α_1^j, ..., α_n^j) = α^j ∈ I^j, is called an allocation in cooperative stochastic game G^{co}. Denote the set of allocations in cooperative stochastic game G^{co} by Ī.
Suppose that the set of allocations in every subgame G^{jco}, j = 1,...,t, is nonempty. Then the set of allocations in cooperative stochastic game G^{co} is also nonempty.
4. Principles of stable cooperation
4.1. Subgame consistency of cooperative agreement
Suppose that the players cooperate in the stochastic game and for every subgame G^{jco} choose an allocation α^j = (α_1^j, ..., α_n^j) ∈ I^j. The problem is how to realize payments to the players at each stage of the stochastic game so that player i obtains the expected payoff α_i^j in stochastic subgame G^j. If the players receive payoffs according to their payoff functions in the states, they hardly ever get the components of the chosen allocation in the mathematical expectation sense. To find a way out of this situation we should suggest a method of redistribution of the total players' payoff in every state realized in the stochastic game process. Such a method was proposed by Petrosyan and Danilov, 1979, for differential games.
There are two principles of constructing the real payments to the players in a dynamic game, adapted here to the theory of stochastic games:
1. The sum of payments to the players in every state is equal to the sum of the players' payoffs in the action profile realized in this state according to the cooperative decision.
2. The expected sum of payments to player i in each subgame G^j is equal to the ith component of the allocation in subgame G^{jco} that the players have chosen before the beginning of the game.
Taking into account that in stochastic game (1) the number of subgames is equal to the number of possible states, we should find a vector β_i = (β_i^1, ..., β_i^t) for every i ∈ N, where β_i^j is a payment to player i in state Γ^j, j = 1,...,t. These payments have to satisfy the two principles above, and if they do, the payments are called a payoff distribution procedure (PDP) (see Petrosyan and Danilov, 1979).
Let us find conditions under which the new payments to the players satisfy these principles of PDP in terms of stochastic games.
1. The first principle is equivalent to the following equation:

Σ_{i∈N} β_i^j = Σ_{i∈N} K_i^j(x̄^j),   (12)

where x̄^j is the action profile realized under cooperative decision η̄(·) in state Γ^j, j = 1,...,t.
2. To find the condition for the second principle we need to work out the expected total payoff of player i in the stochastic subgame with new payments β_i^j in state Γ^j, j = 1,...,t. Denote this value by B_i^j and write the recurrent equation for it:

B_i^j = β_i^j + δ Σ_{k=1}^t p(j,k; x̄^j) B_i^k,

or in vector form:

B_i = β_i + δΠ(η̄)B_i,   (13)

where B_i = (B_i^1, ..., B_i^t). Equation (13) is equivalent to the following one:

B_i = (I − δΠ(η̄))^{−1} β_i.   (14)
With respect to the second principle of PDP and equation (14) we obtain the following equation for β_i:

α_i = (I − δΠ(η̄))^{−1} β_i,   (15)

where α_i = (α_i^1, ..., α_i^t), (α_1^j, ..., α_n^j) = α^j ∈ I^j. Equation (15) can be rewritten in an equivalent form:

β_i = (I − δΠ(η̄))α_i.   (16)

It is easy to show that β_i found from (16) satisfies (12): since Σ_{i∈N} β_i = (I − δΠ(η̄)) Σ_{i∈N} α_i = (I − δΠ(η̄))V(N), and V(N) is found from (9), equation (12) holds.
Remark 4. Equation (16) is equivalent to the following functional equation:

α_i = β_i + δΠ(η̄)α_i.   (17)

The second term on the right-hand side of equation (17) is the expected value of the component of the allocation in the subgame beginning at the next stage. Suppose that the allocation for each subgame is chosen according to the same optimality principle that has been chosen by the players at the beginning of the game.
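In vector form the PDP is a single matrix-vector computation, β_i = (I − δΠ(η̄))α_i, carried out row by row. A small sketch (the numbers approximate the two-state example of Section 5 and should be treated as illustrative):

```python
# Sketch of the PDP formula: beta_i = (I - delta * Pi) * alpha_i, where Pi
# is the transition matrix under the cooperative decision. The check at the
# bottom illustrates the two PDP principles. Numbers are illustrative.

def pdp(alpha_i, Pi, delta):
    t = len(alpha_i)
    return [alpha_i[j] - delta * sum(Pi[j][k] * alpha_i[k] for k in range(t))
            for j in range(t)]

delta = 0.99
Pi = [[0.7, 0.3], [0.3, 0.7]]        # transitions under the cooperative profile
alpha_1 = [659.02, 657.37]           # player 1's allocation components
alpha_2 = [493.46, 490.15]           # player 2's allocation components

beta_1 = pdp(alpha_1, Pi, delta)
beta_2 = pdp(alpha_2, Pi, delta)
print([round(b, 2) for b in beta_1])  # [7.08, 6.08]
print([round(b, 2) for b in beta_2])  # [5.92, 3.92]
# Principle 1: beta_1^j + beta_2^j must equal the total stage payoff in
# state j under the cooperative profile; principle 2 is built into (15)-(16).
```

Note how small the stage payments are compared with the allocation components: with δ close to 1 the allocations are roughly the payments divided by (1 − δ).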
Obviously, if the players keep to cooperative decision η̄(·), the expected payoff of player i in the stochastic game with new payments in some action profiles (namely, those realized in the states under cooperative decision η̄(·)) is equal to the expected value of the corresponding component of the allocation in cooperative stochastic game G^{co}.
Now for every allocation ᾱ = (ᾱ_1, ..., ᾱ_n), where ᾱ_i = π⁰α_i, α_i = (α_i^1, ..., α_i^t), (α_1^j, ..., α_n^j) = α^j ∈ I^j, we can determine the regularization of stochastic game G by the following definition.
Definition 7. Noncooperative stochastic game G^α (subgame G^{jα}, j = 1,...,t) is called an α-regularization of stochastic game G (subgame G^j) if for any player i ∈ N in state Γ^j the payoff function K_i^{α,j}(x^j) is defined as follows:

K_i^{α,j}(x^j) = { β_i^j,        if x^j = x̄^j,
                   K_i^j(x^j),   otherwise,   (18)

where the PDP β_i = (β_i^1, ..., β_i^t) (Petrosyan and Baranova, 2006) is found from (16).
The procedure of regularization of stochastic game G (subgame G^j) suggests a method of constructing real payments to the players in every state. The players are interested in this redistribution of their payoffs: receiving β_i^1, ..., β_i^t in states Γ^1, ..., Γ^t respectively, player i obtains the same sum (in terms of mathematical expectation) in game G^α (G^{jα}) as he has planned to receive in cooperative stochastic game G^{co} (G^{jco}), and the expected sum of the remaining payments belongs to the same optimality principle which has been chosen by the players at the beginning of the game. In this case we say that subgame consistency (dynamic consistency) of the chosen cooperative agreement takes place.
4.2. Strategic stability of cooperative agreement
Introduce additional notation. Let Γ(k) be the state realized at stage k of stochastic game G; obviously, Γ(k) ∈ {Γ^1, ..., Γ^t}. Write x(k) for the action profile realized in state Γ(k). Denote the subgame of stochastic game G^α from Definition 7 beginning from state Γ(k) by G^{α,k}.
Call the sequence ((Γ(1), x(1)), (Γ(2), x(2)), ..., (Γ(k−1), x(k−1))) the history of stage k and denote it by h(k). Let T̄ = {(Γ^1, x̄^1), (Γ^2, x̄^2), ..., (Γ^t, x̄^t)}.
Stochastic games G and G^α are games with perfect information in the sense that at each stage k (k = 1, 2, ...) all players from N know state Γ(k) and the history of stage k.
Definition 8. We call a behavior strategy profile φ̄(·) = (φ̄_1(·), ..., φ̄_n(·)) a strong transferable equilibrium in regularized game G^α if for any coalition S ⊆ N, S ≠ ∅, the inequality

Σ_{i∈S} Ē_i^α(φ̄) ≥ Σ_{i∈S} Ē_i^α(φ̄ ‖ φ_S)   (19)

is true for any behavior strategy of coalition S: φ_S(·) = {φ_i(·)}_{i∈S}, where Ē_i^α(·) is the expected payoff of player i in the α-regularization of stochastic game G.
Theorem 1. If in the α-regularization of stochastic game G with ᾱ such that ᾱ = π⁰α the following inequality holds for any coalition S ⊂ N, S ≠ ∅:

Σ_{i∈S} β_i ≥ (I − δΠ(η̄))F(S),   (20)

where F(S) = (F^1(S), ..., F^t(S)),

F^j(S) = max_{x_S^j ∈ ∏_{i∈S} X_i^j, x_S^j ≠ x̄_S^j} { Σ_{i∈S} K_i^j(x̄^j ‖ x_S^j) + δ Σ_{l=1}^t p(j,l; x̄^j ‖ x_S^j) V^l(S) },

then in regularized game G^α there exists a strong transferable equilibrium with payoffs (ᾱ_1, ..., ᾱ_n).

Proof. Consider the behavior strategy profile φ̄(·) = (φ̄_1(·), ..., φ̄_n(·)) in game G^α:
φ̄_i(h(k)) = { x̄_i^j,      if Γ(k) = Γ^j, j = 1,...,t, h(k) ⊂ T̄;
             x_i^j(S),    if Γ(k) = Γ^j, j = 1,...,t, and ∃ l ∈ [1, k−1]
                          and S ⊂ N, i ∉ S, such that h(l) ⊂ T̄
                          and (Γ(l), x(l)) ∉ T̄,
                          but (Γ(l), (x(l) ‖ x̄_S(l))) ∈ T̄;
             arbitrary,   in other cases,   (21)

where x_i^j(S) is the pure strategy of player i in state Γ^j which, together with the strategies x_p^j(S), p ≠ i, p ∉ S, forms the strategy of coalition N\S in the antagonistic game against coalition S in subgame G^j_S.
The proof of the theorem repeats the proofs of folk theorems (see Dutta, 1995) using the structure of strategy (21). Prove that φ̄(·) = (φ̄_1(·), ..., φ̄_n(·)) determined in (21) is a strong transferable equilibrium in stochastic game G^α.

From definition (21) it follows that if all players keep to cooperative decision η̄(·), the expected payoff of coalition S in subgame G^{jα}, j = 1,...,t, is equal to

Σ_{i∈S} E_i^{α,j}(φ̄(·)) = Σ_{i∈S} α_i^j.

Let E_S(φ̄(·)) be the vector (E_S^1(φ̄(·)), ..., E_S^t(φ̄(·))); then for any coalition S ⊆ N, S ≠ ∅, the next equality takes place:

E_S(φ̄) = (I − δΠ(η̄))^{−1} Σ_{i∈S} β_i.   (22)
Consider strategy profile (φ̄(·) ‖ φ_S(·)), S ⊂ N, S ≠ ∅, in which some coalition S deviates using strategy φ_S(·). Let stage k be such that there exists a number l ∈ [1, k−1] such that history h(l) ⊂ T̄ and (Γ(l), x(l)) ∉ T̄ but (Γ(l), (x(l) ‖ x̄_S(l))) ∈ T̄. Without loss of generality suppose that Γ(k) = Γ^j. Determine the payoff of coalition S in game G^α in strategy profile (φ̄(·) ‖ φ_S(·)) by the formula

Σ_{i∈S} Ē_i(φ̄ ‖ φ_S) = π⁰ Σ_{i∈S} E_i(φ̄ ‖ φ_S), where

Σ_{i∈S} E_i(φ̄ ‖ φ_S) = Σ_{i∈S} E_i^{α,[1,k−1]}(φ̄ ‖ φ_S) + δ^{k−1} Π^{k−1}(φ̄ ‖ φ_S) Σ_{i∈S} E_i^{α,[k,∞)}(φ̄ ‖ φ_S),   (23)

where the first term on the right-hand side of equation (23) is the expected payoff of coalition S at the first k−1 stages of game G^α, and E_i^{α,[k,∞)}(φ̄ ‖ φ_S) in the second term is the expected payoff of player i in the subgame of game G^α beginning from stage k. Since there were no deviations of any coalition from cooperative decision η̄(·) up to stage k−1 inclusive, as was shown before, the following equalities hold for the elements of the right-hand side of (23):

Σ_{i∈S} E_i^{α,[1,k−1]}(φ̄ ‖ φ_S) = Σ_{i∈S} E_i^{α,[1,k−1]}(φ̄),
Π^{k−1}(φ̄ ‖ φ_S) = Π^{k−1}(η̄).
In the second term on the right-hand side of (23), E_i^{α,[k,∞)}(φ̄ ‖ φ_S) means the vector (E_i^{α,1}(φ̄ ‖ φ_S), ..., E_i^{α,t}(φ̄ ‖ φ_S)), where E_i^{α,j}(φ̄ ‖ φ_S) is the expected payoff of player i ∈ S in regularized subgame G^{jα} beginning with state Γ^j.
Find the expected payoff of coalition S in subgame G^{jα} beginning with stage k when state Γ(k) is equal to Γ^j. The following formula takes place:

Σ_{i∈S} E_i^{α,j}(φ̄ ‖ φ_S) = Σ_{i∈S} K_i^j(x̄^j ‖ x_S^j) + δ Σ_{l=1}^t p(j,l; x̄^j ‖ x_S^j) V^l(S),   (24)

because the players from coalition N\S will punish coalition S by playing the antagonistic game against coalition S beginning from stage k+1, according to the definition of strategy profile φ̄(·). In (24) the value of characteristic function V^j(S) is determined by (10).
Since the expected payoffs of coalition S in strategy profiles φ̄(·) and (φ̄(·) ‖ φ_S(·)) coincide up to stage k−1, as a result of the deviation coalition S can increase its payoff only at the expense of the part of game G^α beginning with stage k, i.e. at the expense of the expected payoff in subgame G^{jα}, j = 1,...,t. Coalition S in strategy profile (φ̄(·) ‖ φ_S(·)) can guarantee the following expected payoff from stage k:

max_{x_S^j ∈ ∏_{i∈S} X_i^j, x_S^j ≠ x̄_S^j} { Σ_{i∈S} K_i^j(x̄^j ‖ x_S^j) + δ Σ_{l=1}^t p(j,l; x̄^j ‖ x_S^j) V^l(S) } = F^j(S).   (25)
According to the definition of PDP, the expected payoff of coalition S in regularized subgame G^{jα} in strategy profile φ̄(·) can be found from the equation:

Σ_{i∈S} E_i^α(φ̄(·)) = (I − δΠ(η̄))^{−1} Σ_{i∈S} β_i,   (26)

where E_i^α(φ̄(·)) = (E_i^{α,1}(φ̄(·)), ..., E_i^{α,t}(φ̄(·))). Taking into account inequality (20), from (25), (26) and the reasoning presented above we obtain the inequality

Ē_S^α(φ̄(·)) ≥ Ē_S^α(φ̄(·) ‖ φ_S(·)).

Thus behavior strategy profile (21) is a strong transferable equilibrium in the α-regularization of game G. The expected payoff of player i in game G^α in strategy profile φ̄(·) is equal to ᾱ_i, where ᾱ_i = π⁰α_i and vector α_i = (α_i^1, ..., α_i^t) consists of the ith components of allocations α^1, ..., α^t derived for cooperative subgames G^{1co}, ..., G^{tco} respectively.
Corollary 1. If in the α-regularization of game G for any player i ∈ N the next inequality takes place:

β_i ≥ (I − δΠ(η̄))W_i,

where W_i = (W_i^1, ..., W_i^t),

W_i^j = max_{x_i^j ∈ X_i^j, x_i^j ≠ x̄_i^j} { K_i^j(x̄^j ‖ x_i^j) + δ Σ_{l=1}^t p(j,l; x̄^j ‖ x_i^j) V^l({i}) },

then in the α-regularization of stochastic game G there exists a Nash equilibrium with players' payoffs (ᾱ_1, ..., ᾱ_n).
4.3. Condition of irrational behavior proofness
To protect the players against losses in cases when cooperation breaks up at some stage of the game, it is necessary that the following inequality holds for every i ∈ N and any k = 1, 2, ...:

V({i}) ≤ E_i^{α,[1,k]}(η̄) + δ^k Π^k(η̄) V({i}),   (27)

where E_i^{α,[1,k]}(η̄) = Σ_{m=0}^{k−1} δ^m Π^m(η̄) β_i is the mathematical expectation of player i's payoff at the first k stages of regularized game G^α.
We suppose that before the beginning of the next game stage the players know whether the cooperation has broken up or not, i.e. no information delay is supposed in such a problem definition. On the left-hand side of inequality (27) there is the value of characteristic function V({i}) = (V^1({i}), ..., V^t({i})) derived for player i, where V^j({i}) is the value of the characteristic function of player i in subgame G^j. On the right-hand side of inequality (27) the first term is equal to the expected value of player i's payoff if at the first k stages of the game the players keep to cooperative decision η̄(·); the second term is the expected payoff of player i beginning with stage k+1 if player i plays independently from this stage on.
Proposition 1. In stochastic game G^α, for the condition of irrational behavior proofness it is sufficient that the following inequality takes place for any i ∈ N:

(I − δΠ(η̄))(α_i − V({i})) ≥ 0,   (28)

where α_i = (α_i^1, ..., α_i^t) and α_i^j is the ith component of allocation α^j ∈ I^j.
Proof. Show that condition (28) is sufficient for inequality (27) for any k = 1, 2, .... We use the method of mathematical induction.

Rewrite inequality (27) for k = 1:

V({i}) ≤ β_i + δΠ(η̄)V({i}).   (29)

Transforming inequality (28) with the use of PDP (16), we get inequality (29).

Suppose that from the truth of inequality (28) the truth of inequality (27) for k = l follows. Rewrite inequality (27) for k = l:

V({i}) ≤ β_i + δΠ(η̄)β_i + ... + δ^{l−1}Π^{l−1}(η̄)β_i + δ^l Π^l(η̄)V({i}).   (30)

Prove the statement for k = l + 1. Inequality (27) for k = l + 1 is as follows:

V({i}) ≤ β_i + δΠ(η̄)β_i + ... + δ^l Π^l(η̄)β_i + δ^{l+1} Π^{l+1}(η̄)V({i}).   (31)

We should prove that if (28) is true then inequality (27) takes place for k = l + 1. After transformation the right-hand side of inequality (31) takes the following form:

β_i + δΠ(η̄) { β_i + δΠ(η̄)β_i + ... + δ^{l−1}Π^{l−1}(η̄)β_i + δ^l Π^l(η̄)V({i}) }.

Taking into account inequality (30), the expression in braces is not less than V({i}), so the right-hand side of inequality (31) is not less than β_i + δΠ(η̄)V({i}). Considering the definition of PDP (16) and inequality (28), we get the truth of inequality (27) for k = l + 1. So the statement is proved.
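The sufficient condition (28) is easy to check numerically once α_i, V({i}), and Π(η̄) are known. A sketch with illustrative numbers (close to those of the example in Section 5):

```python
# Numerical check of the sufficient condition (28) of Proposition 1:
# (I - delta * Pi) * (alpha_i - V({i})) >= 0 componentwise.
# All numbers are illustrative.

def irrational_proof_ok(alpha_i, V_i, Pi, delta):
    t = len(alpha_i)
    d = [alpha_i[j] - V_i[j] for j in range(t)]          # alpha_i - V({i})
    g = [d[j] - delta * sum(Pi[j][k] * d[k] for k in range(t))
         for j in range(t)]
    return all(x >= 0 for x in g), g

delta = 0.99
Pi = [[0.7, 0.3], [0.3, 0.7]]
ok, g = irrational_proof_ok([659.02, 657.37], [500.0, 500.0], Pi, delta)
print(ok, [round(x, 2) for x in g])  # True [2.08, 1.08]
```

Because (I − δΠ(η̄)) is applied to the difference α_i − V({i}), the check amounts to asking whether the stage payments β_i dominate the "individual" stage payments that would deliver V({i}) in expectation.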
5. Example
Consider the following stochastic game G:
1. The set of players is N = {1, 2}.
2. The set of states is {Γ^1, Γ^2}, where Γ^j = ⟨N, X_1^j, X_2^j, K_1^j, K_2^j⟩, j = 1, 2, X_1^j = {x_{11}^j, x_{12}^j} is the set of actions of player 1, and X_2^j = {x_{21}^j, x_{22}^j} is the set of actions of player 2. For state Γ^1 the players' payoffs are determined as follows:

( (2; 2)   (1; 12) )
( (11; 3)  (5; 4)  ).

And for state Γ^2 the players' payoffs are determined as follows:

( (3; 1)  (2; 7) )
( (8; 2)  (5; 3) ).
3. The transition probabilities from state Γ^1 are

( (0.6; 0.4)  (0.7; 0.3) )
( (0.3; 0.7)  (0.6; 0.4) ),

where element (k, l) of the matrix contains the transition probabilities from state Γ^1 to states Γ^1, Γ^2 respectively, on condition that player 1 chooses the kth action and player 2 chooses the lth action in state Γ^1. The transition probabilities from state Γ^2 are

( (0.8; 0.2)  (0.3; 0.7) )
( (0.3; 0.7)  (0.2; 0.8) ).
4. The discount factor is δ = 0.99.
5. The vector of the initial distribution over the set of states is π⁰ = (1/2, 1/2).
Determine the cooperative form G^{co} of the game G described above. Firstly, calculate the cooperative decision η̄ = (η̄_1, η̄_2) in stationary strategies using (5)–(7). We obtain the unique stationary strategy profile η̄_1(Γ^1) = x_{11}^1, η̄_1(Γ^2) = x_{12}^2, η̄_2(Γ^1) = x_{22}^1, η̄_2(Γ^2) = x_{21}^2.
Secondly, work out the values of characteristic function V(·) = (V^1(·), V^2(·)) for all possible coalitions using (9)–(11):

V(∅) = (0.00, 0.00), V({1}) = (500.00, 500.00), V({2}) = (334.44, 332.78), V({1,2}) = (1152.48, 1147.52).
Using (8), calculate the values of characteristic function V̄(·) for all possible coalitions:

V̄(∅) = 0.00, V̄({1}) = 500.00, V̄({2}) = 333.61, V̄({1,2}) = 1150.00.
So we determine cooperative stochastic subgame G^{jco} as the set ⟨N, V^j(·)⟩, j = 1, 2, and cooperative stochastic game G^{co} as the set ⟨N, V̄(·)⟩.
Finally, suppose that the players choose, for example, the Shapley value (Shapley, 1953b) as the allocation of their total payoff in cooperative stochastic game G^{co} and in all subgames G^{jco}, j = 1, 2.
The Shapley values calculated for the subgames are:

α_1 = (659.02, 657.37), α_2 = (493.46, 490.15),

where α_i = (α_i^1, α_i^2), and α_i^j is the ith component of the Shapley value of subgame G^{jco} calculated with characteristic function V^j(·), j = 1, 2, i ∈ N. Then, taking into account the vector of initial distribution π⁰, determine the allocation ᾱ in G^{co} by Definition 6:
ᾱ = (ᾱ_1, ᾱ_2) = (658.20, 491.80).
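For two players the Shapley value of each subgame splits the cooperative surplus equally on top of the individual values V^j({i}). A sketch reproducing the allocations above (the characteristic-function values are those reconstructed in this section, so treat them as illustrative):

```python
# Two-player Shapley value per subgame: each player gets his individual
# value plus half of the cooperative surplus.

def shapley_2p(vN, v1, v2):
    surplus = vN - v1 - v2
    return (v1 + surplus / 2, v2 + surplus / 2)

# characteristic function values per subgame (illustrative, per Section 5)
V_N = [1152.48, 1147.52]       # V^j({1,2})
V_1 = [500.00, 500.00]         # V^j({1})
V_2 = [334.44, 332.78]         # V^j({2})

alphas = [shapley_2p(V_N[j], V_1[j], V_2[j]) for j in range(2)]
print([tuple(round(x, 2) for x in a) for a in alphas])
# [(659.02, 493.46), (657.37, 490.15)]

# allocation in G^co: average over the initial distribution (1/2, 1/2)
pi0 = [0.5, 0.5]
alpha_bar = [sum(pi0[j] * alphas[j][i] for j in range(2)) for i in range(2)]
print([round(a, 1) for a in alpha_bar])  # [658.2, 491.8]
```

The equal-surplus form is specific to n = 2; for more players the Shapley value averages marginal contributions over all orderings of the coalition.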
Verify the principles of stable cooperation. To satisfy the principle of subgame consistency we should calculate the PDP for the allocation ᾱ = π⁰α using (16):

β_1 = (7.08, 6.08), β_2 = (5.92, 3.92).
As we can see, not all components of the PDP are equal to the corresponding payoffs of the players in the states. So if the players get their payoffs determined by the initial rules of the game, they cannot receive the components of the chosen allocation (in the mathematical expectation sense). This shows the subgame inconsistency of the chosen cooperative agreement. Realize the α-regularization of the initial stochastic game G using the PDP and Definition 7. In the α-regularization of game G the players' payoffs in state Γ^1 are as follows:
( (2; 2)   (7.08; 5.92) )
( (11; 3)  (5; 4)       ),

and in state Γ^2 the players' payoffs are:

( (3; 1)        (2; 7) )
( (6.08; 3.92)  (5; 3) ).
Check the second principle of stable cooperation, which is strategic stability of the cooperative agreement. For this purpose verify inequality (20) from Theorem 1. Compute F(S) for all S ⊂ N and obtain F({1}) = (500.00, 498.00), F({2}) = (332.44, 332.78). Then inequality (20) reduces to the following ones:

β_1 = (7.08, 6.08) ≥ (5.59, 4.39) = (I − δΠ(η̄))F({1}),
β_2 = (5.92, 3.92) ≥ (3.22, 3.43) = (I − δΠ(η̄))F({2}).

They are true, and we can assert the strategic stability of the players' chosen cooperative agreement.
Verify the third principle, the condition of irrational behavior proofness. The sufficient condition for this principle from Proposition 1 holds:

(I − δΠ(η̄))(α_1 − V({1})) = (2.08, 1.08) ≥ 0,
(I − δΠ(η̄))(α_2 − V({2})) = (2.08, 1.08) ≥ 0.

For this numerical example we have made the α-regularization of the initial game G and checked the principles of stable cooperation; they are all satisfied.
References
Altman,E., El-Azouzi, R. and Jimenez, T. (2003). Slotted Aloha as a stochastic game with partial information. Proceedings of Modeling and Optimization in Mobile, Ad-Hoc and Wireless Networks (WiOpt ’03), Sophia Antipolis, France.
Amir, R. (2003). Stochastic games in economics: The lattice-theoretic approach. Stochastic Games and Applications, A. Neyman and S. Sorin (eds.), NATO Science Series C, Mathematical and Physical Sciences, Vol. 570, Chap. 29, 443-453.
Bellman, R. (1957). Dynamic programming. Princeton University Press, Princeton, NJ.
Dutta, P. (1995). A folk theorem for stochastic games. Journal of Economic Theory, 66, 1-32.
Grauer, L. V. and Petrosjan, L. A. (2002). Strong Nash Equilibrium in Multistage Games. International Game Theory Review, 4(3), 255-264.
Herings, P. J.-J. and Peeters, R. J. A. P. (2004). Stationary Equilibria in Stochastic Games: Structure, Selection, and Computation. Journal of Economic Theory, 118(1), 32-60.
Parilina, E. M. (2010). Cooperative data transmission game in wireless network. Upravlenie bol’simi sistemami, 31.1, 93-110 (in Russian).
Petrosyan, L. A. (1977). Stability of the solutions in differential games with several players. Vestnik Leningrad. Univ. Mat. Mekh. Astronom., 19(4), 46-52 (in Russian).
Petrosjan, L. A. (2006). Cooperative Stochastic Games. Advances in Dynamic Games, Annals of the International Society of Dynamic Games, Vol. 8, Part VI, 139-146.
Petrosjan, L. A. and Baranova, E. M. (2006). Cooperative Stochastic Games in Stationary Strategies. Game Theory and Applications, XI, 7-17.
Petrosjan, L. A. and Danilov, N. N. (1979). Stability of the solutions in nonantagonistic differential games with transferable payoffs. Vestnik Leningrad. Univ. Mat. Mekh. Astronom., 1, 52-59.
Petrosyan, L. A. and Zenkevich, N. A. (2009). Principles of dynamic stability. Upravlenie bol’simi sistemami, 26.1, 100-120 (in Russian).
Petrosyan, L.A., Baranova, E. M., Shevkoplyas, E. V. (2004). Multistage Cooperative Games with Random Duration. Optimal control and differential games. Proceedings of Institute of mathematics and mechanics, 10(2), 116-130 (in Russian).
Raghavan, T. E. S. (2006). A Stochastic Game Model of Tax Evasion. Advances in dynamic games, Annals of the International Society of Dynamic Games, 2006, Vol. 8, Part VI, 397-420.
Shapley, L. S. (1953a). Stochastic Games. Proceedings of National Academy of Sciences of the USA, 39, 1095-1100.
Shapley, L. S. (1953b). A Value for n-person Games. Contributions to the Theory of Games-II, by H.W. Kuhn and A.W. Tucker, editors. (Annals of Mathematical Studies, Vol. 28), Princeton University Press, 307-317.
Yeung, D. W. K. (2006). An irrational-behavior-proof condition in cooperative differential games. International Game Theory Review, 8(4), 739-744.