Contributions to Game Theory and Management, XI, 129-195
A Survey on Cooperative Stochastic Games with Finite and Infinite Duration*
Elena Parilina
St. Petersburg State University, 7/9 Universitetskaya nab., Saint Petersburg 199034, Russia E-mail: [email protected]
Abstract The paper is a survey on cooperative stochastic games with finite and infinite duration which is based on the author's and coauthors' publications. We assume that the non-cooperative stochastic game is initially defined. The cooperative version of the game is constructed, and the cooperative solutions are found. The properties of cooperative solutions of the game which are realised in dynamics are considered. Several numerical examples of stochastic games illustrate theoretical results.
Keywords: cooperative stochastic game, cooperative solution, imputation distribution procedure, subgame consistency.
1. Introduction
The paper is an overview of the results obtained in the theory of cooperative stochastic games by the author and her coauthors (Baranova, 2006, Parilina, 2014, Parilina, 2015, Parilina, 2016, Baranova and Petrosjan, 2006, Parilina and Petrosyan, 2017, Parilina and Tampieri, 2018, Petrosyan et al., 2004, Petrosjan and Baranova, 2005, Petrosyan and Baranova, 2003, Petrosyan and Baranova, 2005a, Petrosyan and Baranova, 2005b).
The starting point of stochastic game theory is a publication of L. Shapley (Shapley, 1953a), in which the existence of the value of a zero-sum stochastic game with a finite set of players' strategies is proved. A generalization of this result to the case of an n-person stochastic game was obtained in the papers (Fink, 1964) and (Takahashi, 1964), in which it was proved that an equilibrium in stationary strategies exists in a stochastic game with a compact set of strategies and a finite set of states. Many papers are devoted to proving the existence of the Nash equilibrium in various classes of strategies and to studying stochastic games with incomplete information, asymmetric players, and stochastic games of a special structure (see the following publications: (Solan and Vieille, 2002, Vieille, 2000, Mertens and Neyman, 1981a, Mertens and Neyman, 1981b, Neyman, 2008, Neyman, 2013, Nowak, 1985, Nowak, 1999, Nowak and Radzik, Horner et al., 2010, Solan, 1998, Jaskiewicz and Nowak, 2016, Neyman and Sorin, 2003, Solan, 2009, Solan and Vieille, 2015)).
The method of constructing a cooperative version of a stochastic game realized on a finite tree was first proposed by L. A. Petrosyan in the paper (Petrosjan, 2006), where the problem of time consistency of the Shapley value was formulated and a method of regularization of the time-inconsistent Shapley value was introduced. Then the method of constructing a cooperative version of a stochastic game with infinite duration was proposed in the paper (Baranova and Petrosjan, 2006). Cooperative stochastic games of infinite duration with a finite set of strategies were later studied in (Kohlberg and Neyman, 2015, Parilina, 2015, Parilina and Tampieri, 2018). The principles of stable cooperation are formulated for dynamic and differential games in (Petrosyan and Zenkevich, 2015). The first principle is time consistency (or subgame consistency) of cooperative solutions, which was initially proposed by L. A. Petrosyan (Petrosyan, 1977) for differential games.
* The work was supported by Russian Science Foundation, project no. 17-11-01079.
The mechanism for determining payments to the players for regularization of time-inconsistent cooperative solutions using the so-called imputation distribution procedure was introduced by L. A. Petrosyan and V. V. Danilov (Petrosyan and Danilov, 1979). Further, the problem of constructing time-consistent cooperative solutions was studied in the paper (Petrosyan and Shevkoplyas, 2000) for differential games with random duration, and in (Yeung and Petrosyan, 2011) for dynamic games with random duration.
The second principle of stable cooperation in dynamic and differential games is strategic consistency of a cooperative solution which was initially proposed in (Petrosyan, 1998). This principle is relevant and can be adapted for various classes of differential and dynamic games (Shevkoplyas, 2010, Petrosjan and Grauer, 2002, Petrosyan and Chistyakov, 2013, Petrosyan and Sedakov, 2015).
The third principle of stable cooperation is irrational-behavior-proof which was formulated by D. W. K. Yeung (Yeung, 2006) and then was applied for linear-quadratic games (Tur, 2014, Markovkin, 2006). The conditions for stable cooperation with Markov processes, which allow players' cooperation, including irrational-behavior-proof condition, are formulated in (Avrachenkov et al., 2013).
The time consistency condition was extended to the case when the cooperative solution is a set (containing more than one imputation) in (Petrosyan, 1993) and was called strong time consistency. Recently, this condition has been investigated in various classes of games (Gromova and Petrosyan, 2015, Sedakov, 2015, Chistyakov and Petrosyan, 2011, Parilina and Petrosyan, 2017).
The paper is organized as follows. Section 2 contains results on cooperative stochastic games with finite duration while Section 3 is devoted to cooperative stochastic games with infinite duration. We briefly conclude in Section 4.
2. Cooperative stochastic games with finite duration
2.1. Non-cooperative stochastic games
We define a finite stochastic game played on a graph. Let $\mathcal{G} = (Z, L)$ be a finite graph with a tree structure, where $Z$ is the set of vertices of the graph, and $L : Z \to 2^Z$ is a mapping that assigns to each vertex $z \in Z$ the subset $L(z) \subset Z$ of vertices of set $Z$ immediately following $z$. The vertex $z_0$ is the initial vertex of the tree graph. We denote the terminal vertices of graph $\mathcal{G}$ by $Z_T \subset Z$, that is, the vertices $z$ for which $L(z) = \emptyset$. The finite tree graph with initial vertex $z_0$ is denoted by $\mathcal{G}(z_0)$.
Let at each vertex $z \in Z$ of the graph $\mathcal{G}(z_0)$ a normal-form game of $n$ players
$$\Gamma(z) = \langle N, A_1^z, \ldots, A_n^z, K_1^z, \ldots, K_n^z \rangle$$
be given, where $N = \{1, 2, \ldots, n\}$ is a finite set of players, the same for all vertices $z \in Z$; $A_i^z$ is a finite set of actions of player $i \in N$; and $K_i^z(a_1^z, \ldots, a_n^z) : \prod_{j \in N} A_j^z \to \mathbb{R}$ is a payoff function of player $i$, $a_j^z \in A_j^z$. The collection of actions $a^z = (a_1^z, \ldots, a_n^z)$, $a_i^z \in A_i^z$, $i \in N$, is called an action profile in the game $\Gamma(z)$, and $A^z = \prod_{i \in N} A_i^z$ is the set of action profiles in game $\Gamma(z)$.
For each vertex $z \in Z$ we define the transition probabilities to the vertices $y \in L(z)$ of the graph $\mathcal{G}(z_0)$ following the vertex $z$. These probabilities depend on the action profile $a^z$ realized in the game $\Gamma(z)$. Thus, for each vertex $z \in Z$ we define a function $p(\cdot|z, a^z) : A^z \to \Delta(L(z))$, where $\Delta(L(z))$ is the set of probability distributions over the set $L(z)$:
$$p(y|z, a^z) \ge 0, \qquad \sum_{y \in L(z)} p(y|z, a^z) = 1$$
for any action profile $a^z \in A^z$. The value $p(y|z, a^z)$ is the probability that at the next stage the game $\Gamma(y)$ will be played, $y \in L(z)$, if at the previous stage in the game $\Gamma(z)$ the action profile $a^z = (a_1^z, \ldots, a_n^z)$ has been realized.
We also suppose that the duration of the game is a random variable taking values $0, 1, \ldots, l$, where $l$ is the length of the game (by the length of the game we mean the number of stages along the longest possible path). Define the probabilities $q_k$ of the event that the game ends at stage $k$. Notice that $0 < q_k < 1$, $k = 0, \ldots, l - 1$, and $q_l = 1$. The stage $k$ of a vertex $z \in Z$ in a stochastic game with random duration is determined from the condition $z \in (L(z_0))^k$.
Remark 1. Notice that the probabilities $q_k$, $k = 0, \ldots, l$, are conditional probabilities and do not form the probability distribution of the game duration. In the case when all paths in graph $\mathcal{G}(z_0)$ have the same length $l$, the discrete distribution of the random variable equal to the game duration, determined by the conditional probabilities $q_k$, is presented in Table 1, in which $P_k$ is the probability that the game ends at stage $k$.
Table 1. Probability distribution of the game duration.
k      P_k
0      $q_0$
1      $(1 - q_0) q_1$
2      $(1 - q_0)(1 - q_1) q_2$
...    ...
l      $(1 - q_0)(1 - q_1) \cdots (1 - q_{l-1})$
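The passage from the conditional probabilities $q_k$ to the distribution $P_k$ of Table 1 can be illustrated with a short sketch. The function name and the numerical values of $q_k$ below are hypothetical and serve only as an illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def duration_distribution(q):
    """Return [P_0, ..., P_l] for conditional stopping probabilities q_0, ..., q_l."""
    dist = []
    survive = Fraction(1)  # probability that the game has not ended before stage k
    for qk in q:
        dist.append(survive * qk)   # P_k = (1 - q_0)...(1 - q_{k-1}) q_k
        survive *= 1 - qk
    return dist

# Hypothetical conditional probabilities with l = 2 and q_l = 1.
q = [Fraction(1, 4), Fraction(1, 2), Fraction(1)]
P = duration_distribution(q)
print(P)        # the entries of Table 1 for these q_k
print(sum(P))   # the P_k sum to one because q_l = 1
```

Since $q_l = 1$, the last entry coincides with the product $(1 - q_0) \cdots (1 - q_{l-1})$ in the last row of the table.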
Definition 1. A stochastic game with random duration $G(z_0)$, where $z_0$ is the initial vertex of a tree graph $\mathcal{G}(z_0)$, is a tuple
$$G(z_0) = \left\langle N, \mathcal{G}(z_0), \{\Gamma(z)\}_{z \in Z}, \{q_k\}_{k=0}^{l}, \{p(\cdot|z, a^z)\}_{z \in Z,\, a^z \in A^z} \right\rangle. \tag{1}$$
From the definition of a stochastic game with random duration it is clear that the transitions from some vertices of the graph $\mathcal{G}(z_0)$ to the others, as well as the final stage of the game, are random.
A stochastic game with random duration $G(z_0)$ is played in the following way:
1. At vertex $z_0$ of the graph $\mathcal{G}(z_0)$, a simultaneous game $\Gamma(z_0)$ is played. Suppose that in this game the action profile $a^{z_0} \in A^{z_0}$ is realized by the players. Each player $i \in N$ receives a payoff $K_i^{z_0}(a^{z_0})$. The stochastic game $G(z_0)$ either terminates with probability $q_0$, $0 < q_0 < 1$, or continues with probability $1 - q_0$ and transits to the vertex $y \in L(z_0)$ of the graph $\mathcal{G}(z_0)$ with probability $p(y|z_0, a^{z_0})$, which depends on the action profile $a^{z_0}$ realized in the game $\Gamma(z_0)$. In case when the set $L(z_0)$ is empty, the game terminates at the vertex $z_0$ with probability 1.
2. Suppose that at stage $k$ the game process is at the vertex $z_k \in Z$, at which the normal-form game $\Gamma(z_k)$ is given. Let the action profile $a^{z_k} \in A^{z_k}$ be realized in this game. Each player $i \in N$ receives a payoff $K_i^{z_k}(a^{z_k})$. The stochastic game either ends with probability $q_k$, $0 < q_k < 1$, or continues with probability $1 - q_k$ and transits to the vertex $z_{k+1} \in L(z_k)$ with probability $p(z_{k+1}|z_k, a^{z_k})$, which depends on the action profile $a^{z_k}$ realized in game $\Gamma(z_k)$. In case when the set $L(z_k)$ is empty, the game terminates at the vertex $z_k$ with probability 1.
3. The stochastic game continues until a terminal vertex is reached, or it may end earlier according to the realizations of the probabilities $q_0, \ldots, q_l$.
We denote by $G(z_k)$ the subgame (see (Kuhn, 1950, Kuhn, 1953)) of the game $G(z_0)$ starting at the vertex $z_k \in Z$ of the graph $\mathcal{G}(z_0)$ (starting with the game $\Gamma(z_k)$), which is also a stochastic game with random duration. Subgame $G(z_k)$ is defined on the subgraph $\mathcal{G}(z_k)$ with the set of vertices $Z(z_k) \subset Z$ and is given by the quintuple
$$G(z_k) = \left\langle N, \mathcal{G}(z_k), \{\Gamma(z)\}_{z \in Z(z_k)}, \{q_s\}_{s=k}^{l}, \{p(\cdot|z, a^z)\}_{z \in Z(z_k),\, a^z \in A^z} \right\rangle.$$
To solve the game, one needs to define the sets of players' strategies. We denote by $\varphi_i : Z \to \prod_{z \in Z} \Delta(A_i^z)$ the behavior strategy of player $i$ in game $G(z_0)$, where $\Delta(A_i^z)$ is the set of mixed actions of player $i$ at the vertex $z \in Z$. The strategy profile in stochastic game $G(z_0)$ is the collection of the players' strategies $\varphi = (\varphi_i : i \in N)$. Denote by $\Xi_i$ the set of behavior strategies of player $i$ in the stochastic game $G(z_0)$; then $\Xi = \prod_{i \in N} \Xi_i$ is the set of behavior strategy profiles in game $G(z_0)$. Obviously, the restriction of the strategy $\varphi_i$ to the subgraph $\mathcal{G}(z_k)$ of graph $\mathcal{G}(z_0)$ is a strategy in subgame $G(z_k)$. Denote this restriction by $\varphi_i^{z_k}$.
2.2. Main functional equations
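Assume that in stochastic game $G(z_0)$ players implement strategies $\varphi_i$, $i \in N$. The payoff of player $i$ is the mathematical expectation with respect to the random variable equal to the game duration; e. g., for the realized path $z_1 \in L(z_0)$, $z_2 \in L(z_1)$, ..., $z_l \in L(z_{l-1})$, $L(z_l) = \emptyset$, we obtain
$$E_i(z_0) = \sum_{k=0}^{l} P_k \sum_{j=0}^{k} K_i^{z_j}(a^{z_j}) = \sum_{k=0}^{l} q_k \left( \prod_{j=0}^{k-1} (1 - q_j) \right) \left( \sum_{m=0}^{k} K_i^{z_m}(a^{z_m}) \right),$$
where $a^{z_0} \in A^{z_0}$, $a^{z_1} \in A^{z_1}$, ..., $a^{z_l} \in A^{z_l}$ is the sequence of action profiles realized when players adopt strategies $(\varphi_i : i \in N)$.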
Since transitions from the vertices to the following vertices are stochastic, we take the mathematical expectation of the player's payoff with respect to these random transitions as the player's payoff in the stochastic game. The mathematical expectation $E_i(z_0, \varphi)$ of player $i$'s payoff in the game satisfies the functional equation
$$E_i(z_0, \varphi) = q_0 K_i^{z_0}(a^{z_0}) + (1 - q_0) \Bigl[ K_i^{z_0}(a^{z_0}) + \sum_{y \in L(z_0)} p(y|z_0, a^{z_0}) E_i(y, \varphi^y) \Bigr] = K_i^{z_0}(a^{z_0}) + (1 - q_0) \sum_{y \in L(z_0)} p(y|z_0, a^{z_0}) E_i(y, \varphi^y), \tag{2}$$
where $E_i(y, \varphi^y)$ is the mathematical expectation of player $i$'s payoff in the subgame $G(y)$ starting at the vertex $y \in L(z_0)$ of graph $\mathcal{G}(z_0)$.
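Assume that $z \in (L(z_0))^k$, that is, the game process enters the vertex $z \in Z$ at stage $k$. Then the mathematical expectation of player $i$'s payoff in subgame $G(z)$ satisfies the functional equation
$$E_i(z, \varphi^z) = q_k K_i^z(a^z) + (1 - q_k) \Bigl[ K_i^z(a^z) + \sum_{y \in L(z)} p(y|z, a^z) E_i(y, \varphi^y) \Bigr] = K_i^z(a^z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, a^z) E_i(y, \varphi^y).$$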
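The functional equation for the expected payoff can be evaluated recursively from the terminal vertices to the initial one. The following is a minimal sketch for one player under a fixed pure strategy profile, so that the stage payoff and the transition probabilities at each vertex are already determined; the dictionary layout, the vertex names and all numbers are hypothetical.

```python
def expected_payoff(z, tree, q):
    """E_i(z) = K_i^z + (1 - q_k) * sum over y in L(z) of p(y|z) * E_i(y)."""
    node = tree[z]
    continuation = sum(p * expected_payoff(y, tree, q)
                       for y, p in node["next"].items())
    return node["payoff"] + (1 - q[node["stage"]]) * continuation

# Hypothetical two-stage tree; "payoff" is the stage payoff of one player
# under the fixed action profile, "next" holds p(y|z, a^z) for y in L(z).
tree = {
    "z0": {"stage": 0, "payoff": 2.0, "next": {"z1": 0.5, "z2": 0.5}},
    "z1": {"stage": 1, "payoff": 4.0, "next": {}},
    "z2": {"stage": 1, "payoff": 0.0, "next": {}},
}
q = [0.5, 1.0]  # hypothetical stopping probabilities q_0, q_1

print(expected_payoff("z0", tree, q))  # 2 + 0.5 * (0.5 * 4 + 0.5 * 0) = 3.0
```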
To define a cooperative version of the game, it is necessary to determine a cooperative path (one of the cooperative paths, if there are several), that is, the path that maximizes the players' total payoff. In the case of stochastic games, this is a subtree with the given transition probabilities on which the maximum of the mathematical expectation of the players' total payoff in the whole game is achieved. Moreover, the maximum of the mathematical expectation of the players' total payoff in mixed strategies is equal to the maximum of the mathematical expectation of the players' total payoff in pure strategies. Therefore, we can restrict ourselves to the class of pure strategies when finding cooperative strategies in the stochastic game.
2.3. Cooperative stochastic games with finite duration
Denote by $\bar\varphi = (\bar\varphi_1, \ldots, \bar\varphi_n)$ the pure strategy profile in game $G(z_0)$ which maximizes the sum of the mathematical expectations of the players' payoffs:
$$V(N, z_0) = \max_{\varphi \in \Xi} \sum_{i \in N} E_i(z_0, \varphi) = \sum_{i \in N} E_i(z_0, \bar\varphi).$$
We call this strategy profile a cooperative one. Let strategy profile $\bar\varphi$ be such that $\bar\varphi_i(z) = \bar a_i^z$, $i \in N$, $z \in Z$. We can determine the cooperative strategy profile for any subgame $G(z)$, $z \in Z$, starting with the simultaneous game $\Gamma(z)$.
We construct a cooperative version of a stochastic game on the basis of the non-cooperative stochastic game with random duration $G(z_0)$ described above. For this purpose it is necessary to define the characteristic function for each subset $S$ (coalition) of the set of players $N$. The characteristic function of subgame $G(z)$, $z \in Z$, is denoted by $V(S, z)$, where $S \subset N$.
The characteristic function $V(S, z)$ shows which total payoff can be obtained by coalition $S$. There are several approaches to defining a characteristic function that determines the cooperative game on the basis of a non-cooperative one. We introduce some of these approaches:
1. $\alpha$-approach. In this case, $V(S, z)$ is the maxmin value of the zero-sum game between coalitions $S$ and $N \setminus S$. Moreover, the maxmin is found in the pure strategies of coalition $S$. This approach can be described as "pessimistic", since $V(S, z)$ is the payoff that coalition $S$ can guarantee for itself even if the players from $N \setminus S$ act against $S$. This approach goes back to the book of von Neumann and Morgenstern (von Neumann and Morgenstern, 1944).
2. $\beta$-approach. Following this approach, $V(S, z)$ is the minmax value of the zero-sum game $G_S$ between coalitions $S$ and $N \setminus S$. Moreover, the minmax is found in pure strategies. This approach can be considered as "optimistic". A comparison of the $\alpha$- and $\beta$-approaches can be found in (Aumann and Peleg, 1960).
3. The value of game $G_S$. In this case, $V(S, z)$ is equal to the value of the zero-sum game $G_S$ between coalitions $S$ and $N \setminus S$. This value always exists in mixed strategies, and it is equal to both the maxmin and the minmax of $G_S$. In case the minmax and maxmin are found in mixed strategies, the values of the $\alpha$- and $\beta$-characteristic functions coincide.
4. $\gamma$-approach. According to this approach, $V(S, z)$ is equal to the payoff of coalition $S$ in the Nash equilibrium of the game in which coalition $S$ acts as a single player and the players from $N \setminus S$ act individually.
5. $\delta$-approach. The value $V(S, z)$ is equal to the maximum payoff of coalition $S$ in the case when the players from $N \setminus S$ use the Nash equilibrium strategies of the n-person game in which all players act individually. This approach was proposed in (Petrosjan and Zaccour, 2003) and further considered in detail in the paper (Reddy and Zaccour, 2016).
6. $\zeta$-approach. In this case, $V(S, z)$ is equal to the payoff of coalition $S$ in the case when the players from $S$ use the strategies from the cooperative strategy profile maximizing the total payoff of coalition $N$, while the players from $N \setminus S$ minimize the payoff of $S$ (see (Gromova and Petrosyan, 2016)).
According to the $\alpha$-approach, the value of the characteristic function for coalition $S$ is equal to the maxmin value of a two-person zero-sum stochastic game $G_S$ between coalitions $S$ and $N \setminus S$. This approach was used in the paper (Petrosjan, 2006), in which a cooperative stochastic game was constructed on the basis of a non-cooperative one for the first time and the problem of time inconsistency of the Shapley value was considered.
We determine the values of the characteristic function. First we consider the case $S = N$ in stochastic game $G(z_0)$. For this purpose, we write Bellman's equation (see Bellman, 1957) for the maximum sum of the mathematical expectations of players' payoffs:
$$V(N, z_0) = \max_{a^{z_0} \in A^{z_0}} \Bigl[ \sum_{i \in N} K_i^{z_0}(a^{z_0}) + (1 - q_0) \sum_{y \in L(z_0)} p(y|z_0, a^{z_0}) V(N, y) \Bigr] = \sum_{i \in N} K_i^{z_0}(\bar a^{z_0}) + (1 - q_0) \sum_{y \in L(z_0)} p(y|z_0, \bar a^{z_0}) V(N, y) \tag{3}$$
with boundary condition
$$V(N, z) = \max_{a^z \in A^z} \sum_{i \in N} K_i^z(a^z), \quad z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\}. \tag{4}$$
Later on in this chapter, we suppose that $z \in (L(z_0))^k$.
For the subgame $G(z)$, $z \in Z$, equation (3) with boundary condition (4) takes the form
$$V(N, z) = \max_{a^z \in A^z} \Bigl[ \sum_{i \in N} K_i^z(a^z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, a^z) V(N, y) \Bigr] = \sum_{i \in N} K_i^z(\bar a^z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z) V(N, y) \tag{5}$$
with boundary condition
$$V(N, z) = \max_{a^z \in A^z} \sum_{i \in N} K_i^z(a^z), \quad z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\}. \tag{6}$$
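Equations (5) and (6) describe a backward induction over the tree. The sketch below is one possible way to evaluate them; it assumes that each vertex stores, for every action profile, the sum of the players' stage payoffs and the transition probabilities. The data layout, the action labels and the numbers are hypothetical.

```python
def grand_coalition_value(z, game, q):
    """Return (V(N, z), maximizing action profile) using recursion (5)-(6)."""
    node = game[z]
    best_value, best_profile = None, None
    for profile, (total_payoff, transitions) in node["actions"].items():
        continuation = sum(p * grand_coalition_value(y, game, q)[0]
                           for y, p in transitions.items())
        value = total_payoff + (1 - q[node["stage"]]) * continuation
        if best_value is None or value > best_value:
            best_value, best_profile = value, profile
    return best_value, best_profile

# Hypothetical data: action profile -> (sum of stage payoffs, p(y|z, a^z)).
game = {
    "z0": {"stage": 0, "actions": {("a", "a"): (3, {"z1": 1.0}),
                                   ("b", "b"): (4, {"z2": 1.0})}},
    "z1": {"stage": 1, "actions": {("a", "a"): (10, {})}},
    "z2": {"stage": 1, "actions": {("a", "a"): (2, {}), ("b", "b"): (5, {})}},
}
q = [0.5, 1.0]  # hypothetical stopping probabilities

value, profile = grand_coalition_value("z0", game, q)
print(value, profile)  # 8.0 ('a', 'a'): 3 + 0.5 * 10 beats 4 + 0.5 * 5
```

In this example the profile with the smaller immediate total payoff is the cooperative one, because it leads to the vertex with the larger continuation value.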
The strategy profile $(\varphi_i : i \in N)$ in stochastic game $G(z_0)$ generates a probability distribution over the set $Z$ of the vertices of graph $\mathcal{G}(z_0)$.
Definition 2. The subgraph of graph $\mathcal{G}(z_0)$ which consists of the vertices $z \in Z$ of the graph $\mathcal{G}(z_0)$ having positive realization probabilities generated by the cooperative strategy profile $\bar\varphi(\cdot)$ is called a cooperative subtree and denoted by $\bar{\mathcal{G}}(z_0)$.
Obviously, subgraph $\bar{\mathcal{G}}(z_0)$ is a finite tree graph. The set of vertices of graph $\bar{\mathcal{G}}(z_0)$ is denoted by $CZ \subset Z$.
Let $S \subset N$, $S \ne N$. For each vertex $z \in CZ$ we define the auxiliary zero-sum game denoted by $G_S(z)$. It is a zero-sum game between coalition $S \subset N$ acting as a maximizing player and coalition $N \setminus S$ acting as a minimizing player. In this game the payoff of coalition $S$ is the sum of the payoffs of the players belonging to coalition $S$. Then the value of the characteristic function $V(S, z)$ is given by the lower value of the zero-sum game $G_S(z)$ in pure strategies (similar to the lower value of the matrix game)$^1$.
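Function $V(S, z)$, $z \in CZ$, satisfies the following functional equation
$$V(S, z) = \max_{a_S^z \in A_S^z} \min_{a_{N \setminus S}^z \in A_{N \setminus S}^z} \Bigl[ \sum_{i \in S} K_i^z(a_S^z, a_{N \setminus S}^z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, (a_S^z, a_{N \setminus S}^z)) V(S, y) \Bigr] \tag{7}$$
with boundary condition
$$V(S, z) = \max_{a_S^z \in A_S^z} \min_{a_{N \setminus S}^z \in A_{N \setminus S}^z} \sum_{i \in S} K_i^z(a_S^z, a_{N \setminus S}^z), \quad z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\}, \tag{8}$$
where $a_S^z = (a_i^z : i \in S)$ is an action of coalition $S$; $A_S^z = \prod_{i \in S} A_i^z$ is the action set of coalition $S$; $a_{N \setminus S}^z = (a_j^z : j \in N \setminus S)$ is an action of coalition $N \setminus S$; and $A_{N \setminus S}^z = \prod_{j \in N \setminus S} A_j^z$ is the action set of coalition $N \setminus S$.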
In this chapter we use the $\alpha$-approach for construction of the characteristic function, proposed by von Neumann and Morgenstern (von Neumann and Morgenstern, 1944).
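The maxmin recursion for a coalition $S \ne N$ can be sketched in the same backward manner. The code below assumes that the joint actions of $S$ and of $N \setminus S$ are indexed by integers and that every pair of joint actions has a stored outcome; the names and numbers are hypothetical and illustrate only the lower value in pure strategies.

```python
def maxmin_value(z, game, q):
    """Lower value V(S, z) in pure strategies, following the maxmin recursion."""
    node = game[z]
    rows = sorted({a_s for a_s, _ in node["outcomes"]})   # joint actions of S
    cols = sorted({a_c for _, a_c in node["outcomes"]})   # joint actions of N \ S

    def total(a_s, a_c):
        payoff_s, transitions = node["outcomes"][(a_s, a_c)]
        continuation = sum(p * maxmin_value(y, game, q)
                           for y, p in transitions.items())
        return payoff_s + (1 - q[node["stage"]]) * continuation

    return max(min(total(a_s, a_c) for a_c in cols) for a_s in rows)

# Hypothetical data: (action of S, action of N \ S) -> (payoff of S, transitions).
game = {
    "z0": {"stage": 0, "outcomes": {(0, 0): (1, {"z1": 1.0}), (0, 1): (3, {"z1": 1.0}),
                                    (1, 0): (2, {"z1": 1.0}), (1, 1): (0, {"z1": 1.0})}},
    "z1": {"stage": 1, "outcomes": {(0, 0): (4, {}), (0, 1): (1, {}),
                                    (1, 0): (2, {}), (1, 1): (3, {})}},
}
q = [0.5, 1.0]  # hypothetical stopping probabilities

print(maxmin_value("z1", game, q))  # max(min(4, 1), min(2, 3)) = 2
print(maxmin_value("z0", game, q))  # max(min(2, 4), min(3, 1)) = 2
```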
For all $z \in CZ$ it is natural to suppose that
$$V(\emptyset, z) = 0. \tag{9}$$
Thus, for each subgame $G(z)$, $z \in CZ$, we have determined the characteristic function $V(S, z)$, $S \subset N$. The characteristic function $V(S, z)$ is determined by equation (5) with boundary condition (6), by equation (7) with boundary condition (8), and by equation (9).
The characteristic function $V(S, z)$ defined by formulas (5) - (9) is superadditive in $S$, i. e., for any vertex $z \in CZ$ and any coalitions $S, P \subset N$, $S \cap P = \emptyset$, the inequality
$$V(S \cup P, z) \ge V(S, z) + V(P, z)$$
holds.
Definition 3. A cooperative stochastic game with random duration $G(z_0)$ constructed on the basis of the non-cooperative stochastic game $G(z_0)$ is a tuple $\langle N, V(S, z_0) \rangle$, where $V(S, z_0)$ is the characteristic function defined by equation (5) with boundary condition (6) for coalition $N$, by equation (7) with boundary condition (8) for coalition $S \ne N$, $S \ne \emptyset$, and by formula (9) for coalition $S = \emptyset$.
Definition 4. An imputation in cooperative stochastic game $G(z_0)$ is a vector $\xi(z_0) = (\xi_1(z_0), \ldots, \xi_n(z_0))$ satisfying two properties:
1. Collective rationality: $\sum_{i \in N} \xi_i(z_0) = V(N, z_0)$;
2. Individual rationality: $\xi_i(z_0) \ge V(\{i\}, z_0)$ for any $i \in N$.
The set of imputations (see (Vilkas, 1990, Vorobiev, 1960, Vorobiev, 1967) and also (Vorobiev, 1985, Pecherski and Yanovskaya, 2004) for definitions of cooperative games) of cooperative stochastic game G(z0) is denoted by I(z0).
Definition 5. A solution of cooperative stochastic game G(z0) is a subset C(z0) of the set of imputations I (z0).
The solutions of a cooperative game can be conventionally divided into single-valued and multi-valued ones. The well-known single-valued solutions are the Shapley value (Shapley, 1953b) and the nucleolus (Schmeidler, 1969). The most well-known multi-valued solution is the core (Gillies, 1959). Suppose that solution $C(z_0)$ of cooperative stochastic game $G(z_0)$ is a non-empty subset of the imputation set $I(z_0)$.
Definition 6. A cooperative stochastic subgame $G(z)$, $z \in Z$, of game $G(z_0)$, constructed on the basis of the non-cooperative subgame $G(z)$, is a tuple $\langle N, V(S, z) \rangle$, where $V(S, z)$ is the characteristic function defined by equation (5) with boundary condition (6) for coalition $N$, by equation (7) with boundary condition (8) for coalition $S \ne N$, $S \ne \emptyset$, and by formula (9) for coalition $S = \emptyset$.
We now define the imputation, the imputation set and the solution for any cooperative subgame $G(z)$, $z \in Z$.
Definition 7. An imputation in cooperative stochastic subgame $G(z)$ is a vector $\xi(z) = (\xi_1(z), \ldots, \xi_n(z))$ satisfying two properties:
1. $\sum_{i \in N} \xi_i(z) = V(N, z)$;
2. $\xi_i(z) \ge V(\{i\}, z)$ for any $i \in N$.
The set of imputations of cooperative stochastic subgame G(z) is denoted by I(z).
Definition 8. The solution of a cooperative stochastic subgame G(z) is a subset C(z) of the set of imputations I(z).
Suppose that solution $C(z)$ of any cooperative subgame $G(z)$ is a non-empty subset of the imputation set $I(z)$ for any $z \in CZ$.
2.4. The Shapley value, core and nucleolus
In this section we define some cooperative solutions which will be used further in the work. The Shapley value of a cooperative stochastic game or subgame $G(z)$, $z \in CZ$, is a vector $Sh(z) = (Sh_1(z), \ldots, Sh_n(z))$, where an element $Sh_i(z)$, $i \in N$, is calculated by the formula
$$Sh_i(z) = \sum_{S \subset N,\, S \ni i} \frac{(|S| - 1)!\,(n - |S|)!}{n!} \bigl( V(S, z) - V(S \setminus \{i\}, z) \bigr),$$
where $|S|$ is the number of players in coalition $S$; see Shapley's paper (Shapley, 1953b).
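For a fixed vertex $z$ the Shapley value formula can be evaluated directly once the values $V(S, z)$ are known. The sketch below assumes that the characteristic function is stored as a dictionary keyed by frozensets of players; the three-player values are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import factorial

def shapley_value(players, v):
    """Shapley value for a characteristic function v given as {frozenset: value}."""
    n = len(players)
    result = {}
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for size in range(n):
            for rest in combinations(others, size):
                s = frozenset(rest) | {i}          # coalition S containing i
                weight = factorial(len(s) - 1) * factorial(n - len(s)) / factorial(n)
                total += weight * (v[s] - v[frozenset(rest)])
        result[i] = total
    return result

# Hypothetical superadditive characteristic function at one vertex.
v = {frozenset(): 0, frozenset({1}): 0, frozenset({2}): 0, frozenset({3}): 0,
     frozenset({1, 2}): 4, frozenset({1, 3}): 2, frozenset({2, 3}): 2,
     frozenset({1, 2, 3}): 6}

sh = shapley_value([1, 2, 3], v)
print(sh)  # approximately {1: 7/3, 2: 7/3, 3: 4/3}; the components sum to V(N) = 6
```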
The core of a cooperative stochastic game or subgame $G(z)$, $z \in CZ$, is denoted by $CO(z)$, and it is the set
$$CO(z) = \Bigl\{ \xi(z) \in I(z) : \sum_{i \in S} \xi_i(z) \ge V(S, z) \text{ for all } S \subset N, \ \sum_{i \in N} \xi_i(z) = V(N, z) \Bigr\}. \tag{10}$$
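Membership in the core (10) reduces to a finite list of inequalities, which the following sketch checks by enumeration. The dictionary representation of the characteristic function, the tolerance parameter and the numerical values are assumptions made for illustration.

```python
from itertools import combinations

def in_core(xi, v, tol=1e-9):
    """Check the conditions of (10) for an allocation xi given as {player: payoff}."""
    players = sorted(xi)
    if abs(sum(xi.values()) - v[frozenset(players)]) > tol:
        return False  # the allocation does not distribute exactly V(N, z)
    for size in range(1, len(players)):
        for s in combinations(players, size):
            if sum(xi[i] for i in s) < v[frozenset(s)] - tol:
                return False  # coalition s receives less than V(s, z)
    return True

# Hypothetical characteristic function at one vertex.
v = {frozenset({1}): 0, frozenset({2}): 0, frozenset({3}): 0,
     frozenset({1, 2}): 4, frozenset({1, 3}): 2, frozenset({2, 3}): 2,
     frozenset({1, 2, 3}): 6}

print(in_core({1: 7 / 3, 2: 7 / 3, 3: 4 / 3}, v))  # True
print(in_core({1: 1, 2: 1, 3: 4}, v))              # False: coalition {1, 2} gets 2 < 4
```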
For the cooperative stochastic game or subgame $G(z)$ and any vector $\xi(z) \in I(z)$, by $\theta(\xi(z))$ we denote the vector of the excesses $e(S, \xi(z)) = V(S, z) - \sum_{i \in S} \xi_i(z)$ arranged in descending order:
$$\theta(\xi(z)) = \bigl( e(S_1, \xi(z)), e(S_2, \xi(z)), \ldots, e(S_{2^n - 1}, \xi(z)) \bigr),$$
where coalitions are numbered so that $e(S_1, \xi(z)) \ge e(S_2, \xi(z)) \ge \ldots \ge e(S_{2^n - 1}, \xi(z))$. On the set of excess vectors $\{\theta(\xi(z)) : \xi(z) \in I(z)\}$ we consider the lexicographic ordering $\succ_{lex}$:
$$\theta(\xi(z)) \succ_{lex} \theta(\psi(z)) \quad \text{if there exists } i \in \{1, \ldots, 2^n - 1\}$$
such that
$$\theta_k(\xi(z)) = \theta_k(\psi(z)) \text{ for all } k = 1, \ldots, i - 1; \qquad \theta_i(\xi(z)) > \theta_i(\psi(z)),$$
where $\psi(z) \in I(z)$.
The definition of the nucleolus was first introduced in (Schmeidler, 1969). The nucleolus of a cooperative stochastic game or subgame $G(z)$, $z \in CZ$, is the subset of the imputation set on which the minimum $\min_{\xi(z) \in I(z)} \theta(\xi(z))$ with respect to $\succ_{lex}$ is reached.
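The excess vector and the lexicographic comparison can be illustrated for two given imputations. The sketch below does not compute the nucleolus itself, which would require solving a sequence of optimization problems; it only builds $\theta$ for hypothetical data and relies on the fact that Python compares lists lexicographically.

```python
from itertools import combinations

def excess_vector(xi, v):
    """Excesses e(S, xi) = V(S) - sum of xi_i over S, sorted in descending order."""
    players = sorted(xi)
    excesses = []
    for size in range(1, len(players) + 1):
        for s in combinations(players, size):
            excesses.append(v[frozenset(s)] - sum(xi[i] for i in s))
    return sorted(excesses, reverse=True)

# Hypothetical characteristic function and two imputations.
v = {frozenset({1}): 0, frozenset({2}): 0, frozenset({3}): 0,
     frozenset({1, 2}): 4, frozenset({1, 3}): 2, frozenset({2, 3}): 2,
     frozenset({1, 2, 3}): 6}
xi_a = {1: 2, 2: 2, 3: 2}
xi_b = {1: 3, 2: 3, 3: 0}

theta_a = excess_vector(xi_a, v)
theta_b = excess_vector(xi_b, v)
print(theta_a)            # [0, 0, -2, -2, -2, -2, -2]
print(theta_b)            # [0, 0, -1, -1, -2, -3, -3]
print(theta_a < theta_b)  # True: xi_a is lexicographically smaller, hence preferable
```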
If $C(z_0)$ is the solution of cooperative stochastic game $G(z_0)$, then later on in the work by solution $C(z)$ of cooperative subgame $G(z)$ we mean a solution constructed according to the same "rules" as $C(z_0)$. For example, if $C(z_0)$ is the Shapley value in stochastic game $G(z_0)$, then $C(z)$ is the Shapley value calculated for cooperative subgame $G(z)$, $z \in CZ$.
Here we assume that the players choose some fixed subset of the imputation set which contains the imputations satisfying "optimal" properties, i. e., the players maintain the grand coalition $N$ throughout the game process. Set $C(z)$ may consist of a single imputation if, e. g., the players have decided to use the Shapley value or the nucleolus, or it may be empty if, e. g., they have chosen the core and it is empty. The solution of the game or subgame $G(z)$ can be any other solution from the classical "static" cooperative theory, such as the von Neumann-Morgenstern solution (or the so-called stable set), the kernel, or M-stable sets (see Pecherski and Yanovskaya, 2004).
Further in the work we will suppose that $C(z)$ is a nonempty subset of set $I(z)$ for any $z \in CZ$, that is, for each vertex $z \in CZ$ there exists at least one imputation
$$\xi(z) = (\xi_1(z), \ldots, \xi_n(z)) \in C(z) \subset I(z).$$
2.5. Imputation distribution procedure
In this section we introduce the definition of an imputation distribution procedure of the cooperative stochastic game solution which has been chosen by the players. The imputation distribution procedure determines the payments to the players at each vertex of the cooperative subtree $\bar{\mathcal{G}}(z_0)$.
Definition 9. A path in a stochastic game is the sequence of action profiles $a^{z_0}, a^{z_1}, \ldots, a^{z_l}$, where $a^{z_j}$ is the action profile realized in the game $\Gamma(z_j)$, $z_j \in L(z_{j-1})$, $j = 1, \ldots, l$.
Consider any vertex $z \in CZ$, $z \in (L(z_0))^k$, of the cooperative subtree. Each player receives some payments implementing a cooperative agreement$^2$. Let the payment to player $i \in N$ at the vertex $z \in CZ$ be $\beta_i(z)$. In any cooperative subgame $G(z)$, the player can calculate the sum of the payments along the path $a^z, \ldots, a^{z_l}$, denoted by $a^{z \ldots z_l}$, and this sum is a random variable. We denote by $B_i(z)$ the mathematical expectation of the sum of such payments calculated along the path segment $a^{z \ldots z_l}$ in cooperative subgame $G(z)$. The value $B_i(z)$ satisfies the following functional equation:
$$B_i(z) = \beta_i(z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z) B_i(y) \tag{11}$$
with boundary condition
$$B_i(z) = \beta_i(z) \quad \text{for } z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\}. \tag{12}$$
Now we define the distribution procedure of the imputation belonging to the cooperative solution $C(z_0)$ chosen by the players at the beginning of the game.
Definition 10. Let $\xi(z_0)$ be the vector $(\xi_1(z_0), \ldots, \xi_n(z_0)) \in C(z_0)$. The set of vectors $\{\beta(z) = (\beta_1(z), \ldots, \beta_n(z)) : z \in CZ\}$ is called a distribution procedure of the imputation $\xi(z_0)$ if the following conditions are satisfied:
1. For each vertex $z \in CZ$:
$$\sum_{i \in N} \beta_i(z) = \sum_{i \in N} K_i^z(\bar a^z).$$
2. The components $\xi_i(z_0)$, $i \in N$, of imputation $\xi(z_0)$ coincide with the mathematical expectation of the corresponding components of the imputation distribution procedure with respect to the probability distribution of transitions and the end of the game, i. e., $\xi_i(z_0) = B_i(z_0)$, where $B_i(z_0)$ satisfies the functional equation (11) with the boundary condition (12).
2 Obviously, all $z, \ldots, z_l \in CZ$, since $CZ$ is the set of vertices of the cooperative subtree, and the strategy profile $\bar\varphi$ is determined.
For each cooperative subgame $G(z)$, $z \in CZ$, we write a functional equation of type (11) for the components $\xi_i(z)$ of the imputation $\xi(z) \in C(z) \subset I(z)$ and define the values $\gamma_i(z)$ from the equation
$$\xi_i(z) = \gamma_i(z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z) \xi_i(y), \tag{13}$$
where $\xi(y) = (\xi_i(y) : i \in N)$ is an imputation belonging to the solution $C(y)$ of the cooperative subgame $G(y)$. The boundary condition for $\gamma_i(z)$ is as follows:
$$\gamma_i(z) = \xi_i(z) \quad \text{for } z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\}. \tag{14}$$
Lemma 1. The vector $\gamma(z) = (\gamma_i(z) : i \in N)$ given by equation (13) with the boundary condition (14) is an imputation distribution procedure.
Proof. It is obvious that for the terminal vertices and the vertices at which the probability of the game end equals one, where the equality (14) holds, conditions 1 and 2 of Definition 10 are satisfied.
Now we prove that these conditions are satisfied for the remaining vertices of the cooperative subtree. Expressing the values $\gamma_i(z)$ from (13) and summing them over $i \in N$, we obtain
$$\sum_{i \in N} \gamma_i(z) = \sum_{i \in N} \xi_i(z) - (1 - q_k) \sum_{i \in N} \Bigl( \sum_{y \in L(z)} p(y|z, \bar a^z) \xi_i(y) \Bigr). \tag{15}$$
As we have
$$\xi(z) = (\xi_i(z) : i \in N) \in C(z) \subset I(z), \qquad \xi(y) = (\xi_i(y) : i \in N) \in C(y) \subset I(y),$$
then from (15) we obtain:
$$\sum_{i \in N} \gamma_i(z) = V(N, z) - (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z) V(N, y). \tag{16}$$
From (16) and (5) it follows that $\sum_{i \in N} \gamma_i(z) = \sum_{i \in N} K_i^z(\bar a^z)$ for the action profile $\bar a^z = (\bar a_i^z : i \in N)$ which has been realized in game $\Gamma(z)$ when the players used the cooperative strategy profile $\bar\varphi$. Therefore, $\gamma_i(z)$ satisfies Condition 1 of Definition 10.
Now we verify that Condition 2 of Definition 10 is satisfied. Specifically, we find the mathematical expectation of the sum of the payments $\gamma_i(z)$, defined by formula (13), along the vertices of the cooperative subtree. For the vertices $z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\}$, Condition 2 is satisfied. Continue with the vertices of the cooperative subtree from which the vertices mentioned above are reached in one stage. For such a vertex $z \in (L(z_0))^k$ we obtain the equality
$$B_i(z) = \xi_i(z) - (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z) \xi_i(y) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z) \gamma_i(y) = \xi_i(z),$$
because $\xi_i(y) = \gamma_i(y)$. Proceeding from the terminal vertices to the initial one, we prove that Condition 2 of Definition 10 is satisfied. The lemma is proved.
2.6. Subgame consistency of cooperative stochastic game solution
Before the game starts, players come to an agreement about cooperation, i. e., they agree to act as the grand coalition $N$ and expect to receive the imputation $\xi(z_0) \in C(z_0)$. The game process takes place along the vertices of the cooperative subtree $\bar{\mathcal{G}}(z_0)$. But since the stochastic structure of the game implies uncertainty in the realization of the vertices of the cooperative subtree, moving along a certain path, that is, along the vertices of the cooperative subtree, does not yet ensure the support of cooperation. Indeed, players moving along the cooperative path get into cooperative subgames with the current initial states, in which the same player may have different opportunities. The conditions of the conflict and the opportunities of the players involved in it change over time. It is natural to require that the optimality principle or "approach" be maintained in the choice of solutions of cooperative subgames. But at some moment, at vertex $z \in CZ$, the sum of the remaining payments to player $i$ may not be equal to the $i$th component of the imputation from solution $C(z)$ of the cooperative subgame $G(z)$. Therefore, at vertex $z \in CZ$ player $i$ may ask whether it is worth keeping the cooperative agreement to act "jointly optimally" proposed before the game, or whether it is better for player $i$ to deviate from it. If this deviation is beneficial for at least one player, it means subgame inconsistency of imputation $\xi(z_0) \in C(z_0)$ and, accordingly, of the motion along the vertices of the cooperative subtree.
Definition 11. An imputation $\xi(z_0) \in C(z_0)$ is called subgame-consistent in cooperative stochastic game $G(z_0)$ if for each vertex $z \in CZ \cap (L(z_0))^k$ there exists an imputation distribution procedure $\beta(z) = (\beta_i(z) : i \in N)$ such that
$$\xi_i(z) = \beta_i(z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z) \xi_i(y), \tag{17}$$
and
$$\xi_i(z) = \beta_i(z), \quad z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\}, \tag{18}$$
where $\xi(y) = (\xi_i(y) : i \in N)$ is an imputation belonging to solution $C(y)$ of cooperative subgame $G(y)$.
Remark 2. If $C(z_0)$ consists of more than one imputation, then the choice of the imputation $\xi(z_0)$ is not unique. If players have chosen a certain imputation $\xi(z_0) \in C(z_0)$ and decided to verify whether it is subgame consistent, it is first necessary to check condition (17) for the vertex $z_0$. This means verifying whether there exists an imputation distribution procedure $\beta(z_0) = (\beta_i(z_0) : i \in N)$ satisfying condition (17) for some imputation $\xi(y) \in C(y)$, where $y \in L(z_0)$. Obviously, the choice of imputation $\xi(y) \in C(y)$ is also not unique, and this imputation in its turn should be subgame consistent in cooperative subgame $G(y)$. This means that condition (17) should be satisfied for imputation $\xi(y) \in C(y)$. From Definition 11 it follows that condition (17) should be satisfied for every vertex $z$ of the cooperative subtree.
Definition 12. We say that cooperative stochastic game $G(z_0)$ has a subgame consistent solution $C(z_0)$ if all imputations $\xi(z_0) \in C(z_0)$ are subgame consistent.
Obviously, if the payments to the players are made at the vertices of the cooperative subtree in accordance with the initially defined payoff functions, it is in general impossible to achieve subgame consistency of the cooperative solution. This may lead to the breakup of the cooperative agreement. In this connection, the problem arises of finding a scheme or procedure of payments to the players at the vertices of the cooperative subtree which satisfies the property of subgame consistency of a cooperative solution. For this we need to find an imputation distribution procedure $(\beta_i(z) : i \in N)$ for all vertices $z \in CZ$ for which the conditions (17) and (18) are satisfied.
Theorem 1. Let in the cooperative stochastic game $G(z_0)$ and each subgame the cooperative solutions $C(z_0)$ and $C(z)$, $z \in CZ$, be nonempty. If for each $\xi(z) = (\xi_i(z) : i \in N) \in C(z)$ the imputation distribution procedure is defined by the formula
$$\beta_i(z) = \xi_i(z) - (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z) \xi_i(y), \tag{19}$$
for each $z \in CZ$, $z \notin \{z : L(z) = \emptyset\}$, where $\xi(y) = (\xi_i(y) : i \in N) \in C(y)$, and by formula (18) for any $z \in \{z : L(z) = \emptyset\}$, then cooperative solution $C(z_0)$ is subgame consistent.
Proof. To prove subgame consistency of the cooperative solution $C(z_0)$, it is required to prove that for each vector $\xi(z_0) \in C(z_0)$ conditions (17) and (18) are satisfied.
From Lemma 1 it follows that the payments determined by formulas (19) and (18) are the components of the imputation distribution procedure. Condition (17) follows from (19), taking into account that $\xi(y) = (\xi_i(y) : i \in N)$ belongs to the cooperative solution of the subgame $G(y)$.
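Formula (19) and recursion (11)-(12) can be checked numerically for one player. The sketch below assumes that the components $\xi_i(z)$ of the chosen solution are already known at every vertex of the cooperative subtree and that the transition probabilities are those generated by the cooperative action profiles; the tree, the function names and all numbers are hypothetical.

```python
def idp(z, xi, tree, q):
    """Payment beta_i(z) = xi_i(z) - (1 - q_k) * sum_y p(y|z) * xi_i(y), as in (19)."""
    node = tree[z]
    expected_next = sum(p * xi[y] for y, p in node["next"].items())
    return xi[z] - (1 - q[node["stage"]]) * expected_next

def expected_payments(z, beta, tree, q):
    """Expected sum of payments B_i(z) from recursion (11) with condition (12)."""
    node = tree[z]
    continuation = sum(p * expected_payments(y, beta, tree, q)
                       for y, p in node["next"].items())
    return beta[z] + (1 - q[node["stage"]]) * continuation

# Hypothetical cooperative subtree and components xi_i(z) for one player.
tree = {
    "z0": {"stage": 0, "next": {"z1": 0.5, "z2": 0.5}},
    "z1": {"stage": 1, "next": {}},
    "z2": {"stage": 1, "next": {}},
}
q = [0.5, 1.0]
xi = {"z0": 5.0, "z1": 8.0, "z2": 2.0}

beta = {z: idp(z, xi, tree, q) for z in tree}
print(beta)                                      # {'z0': 2.5, 'z1': 8.0, 'z2': 2.0}
print(expected_payments("z0", beta, tree, q))    # 5.0, which equals xi["z0"]
```

With a smaller hypothetical value such as $\xi_i(z_0) = 2$, the same formula gives $\beta_i(z_0) = -0.5$, which illustrates that the payments defined by (19) need not be nonnegative.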
The proposed method of implementing the imputation has an important property: at each vertex of the cooperative path, players are guided by the same "optimality principle" (property of subgame consistency) and, in this sense, have no reason to break the previously adopted cooperative agreement or to deviate from the cooperative strategy profile. The sum of payments to the players at each vertex of the cooperative subtree is also equal to the sum of the payoffs received by the players at that vertex (condition 1 of Definition 10 of an imputation distribution procedure). The latter condition may be called a condition of attainability of the payments, since players redistribute the sum which they obtain in the game and do not take any funds from outside.
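The payment rule (19) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `idp` and the list-based data layout are hypothetical, and the numbers are the Shapley values at vertex $z_3$ and at its successors $z_8$, $z_9$ from Example 1.1 (with $q_2 = 0$).

```python
from fractions import Fraction as F

def idp(xi_z, successors, q_k):
    """Payments beta_i(z) computed by formula (19).

    xi_z       -- imputation at vertex z, one entry per player
    successors -- list of (transition probability, imputation at successor y)
    q_k        -- probability that the game ends at the stage of vertex z
    With an empty successor list the result is a copy of xi_z.
    """
    n = len(xi_z)
    expected = [sum(p * xi_y[i] for p, xi_y in successors) for i in range(n)]
    return [xi_z[i] - (1 - q_k) * expected[i] for i in range(n)]

# Example 1.1: Shapley values at z3 and at its successors z8 and z9, q_2 = 0.
sh_z3 = [F(21, 4), F(33, 4)]
successors_z3 = [(F(1, 2), [F(6), F(6)]), (F(1, 2), [F(5), F(6)])]
beta_z3 = idp(sh_z3, successors_z3, F(0))
print(beta_z3)  # first component is -1/4: Player 1 would get a negative payment
```

The two payments sum to 2, which is the total stage payoff under the cooperative action profile at $z_3$, so the attainability condition holds even though one payment is negative.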
Notice that Definition 11 does not require the nonnegativity of functions $\beta_i(z)$, where $z \in CZ$. All imputations belonging to the solution $C(z)$ will be subgame consistent whenever $C(z) \neq \emptyset$ for all $z \in CZ$. This is possible if the payments to the players are not made according to their initially defined payoffs in the games along which the cooperative path realizes, but according to the imputation distribution procedure $\beta(z) = (\beta_1(z), \ldots, \beta_n(z))$ defined by (17), (18) for all $z \in CZ$, where $\beta_i(z)$ is the payment to player $i$ at the vertex $z \in CZ$. Moreover, in this case player $i$ receives, in expectation, the $i$th component of the imputation from the solution chosen by the players. It follows from Theorem 1. Thus, players can agree on getting negative payments at some vertices to ensure that cooperation is supported throughout the whole game, in order to guarantee receiving the components of the initially selected imputation $\xi(z_0)$ belonging to the solution $C(z_0)$ of the cooperative stochastic game $G(z_0)$.
2.7. Nonnegative components of imputation distribution procedure. Regularization of imputations
Suppose that the payoff of any player $i \in N$ is non-negative: $K_i^z(\bar a^z) \ge 0$ for all vertices $z \in CZ$. Assume that the players are interested in receiving non-negative payments at each vertex of the cooperative subtree and at the same time want to guarantee subgame consistency of the cooperative solution. When non-negativity of $\beta_i(z)$ cannot be guaranteed for all vertices $z \in CZ$, one can construct a new subgame-consistent solution based on the solution initially chosen by the players from the set $C(z_0)$. We present how this is done when the set $C(z_0) \subset I(z_0)$ is considered as the solution. Notice that this procedure can be applied to the solution concepts well known in the classical "static" cooperative game theory (core, nucleolus, von Neumann-Morgenstern solution).
For each vertex $z \in CZ$ define a new imputation distribution procedure by
$$\beta_i(z) = \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)}\, \xi_i(z), \qquad (20)$$
where $\xi(z) = (\xi_1(z), \ldots, \xi_n(z)) \in C(z)$, and $\bar a^z = (\bar a_1^z, \ldots, \bar a_n^z)$ is the realization of the cooperative strategy profile $\bar\eta = (\bar\eta_1(\cdot), \ldots, \bar\eta_n(\cdot))$ at vertex $z \in CZ$ maximizing the sum of mathematical expectations of the players' payoffs in stochastic game $G(z_0)$, and $V(N,z)$ is the value of the characteristic function of coalition $N$ calculated for cooperative subgame $G(z)$.
As $K_i^z(\bar a^z) \ge 0$ for each vertex $z \in CZ$ and each player $i \in N$, then $\beta_i(z) \ge 0$ for each vertex $z \in CZ$. Taking into account equation (20) and the equality $\sum_{i \in N} \xi_i(z) = V(N,z)$, we obtain that the current payment $\beta_i(z)$ to player $i$ in game $\Gamma(z)$ should be proportional to the $i$th component of the imputation $\xi(z) \in C(z)$ in cooperative subgame $G(z)$ of stochastic game $G(z_0)$.
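Determine a new imputation $\hat\xi(z)$ for cooperative subgame $G(z)$, where $z \in CZ$ and $z \in (L(z_0))^k$, on the basis of the "old" imputation $\xi(z)$ as a solution of the functional equation
$$\hat\xi_i(z) = \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)}\, \xi_i(z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z)\, \hat\xi_i(y) \qquad (21)$$
with boundary condition
$$\hat\xi_i(z) = \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)}\, \xi_i(z) = \xi_i(z) \qquad (22)$$
for $z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\}$.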
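The recursion (21) with boundary condition (22) can be evaluated by a simple traversal of the cooperative subtree. The sketch below is an illustration only: the dictionary layout and the names `SUCC`, `Q`, `K_SUM`, `V_N`, `SH` and `regularize` are hypothetical, and the numbers are those of Example 1.1 (the cooperative subtree $z_0, z_2, z_3, z_5, z_8, z_9$ with the Shapley value as the initial imputation).

```python
from fractions import Fraction as F

# Cooperative subtree of Example 1.1: successors with transition probabilities.
SUCC = {"z0": [(F(1, 3), "z2"), (F(2, 3), "z3")],
        "z2": [(F(1), "z5")],
        "z3": [(F(1, 2), "z8"), (F(1, 2), "z9")],
        "z5": [], "z8": [], "z9": []}
# Probability that the game ends at the stage of each vertex.
Q = {"z0": F(1, 8), "z2": F(0), "z3": F(0), "z5": F(1), "z8": F(1), "z9": F(1)}
# Sum of stage payoffs under the cooperative action profile (non-terminal vertices).
K_SUM = {"z0": F(2), "z2": F(11), "z3": F(2)}
# V(N, z) at the non-terminal vertices.
V_N = {"z0": F(107, 8), "z2": F(12), "z3": F(27, 2)}
# Initial imputations: the Shapley values of the subgames.
SH = {"z0": [F(143, 24), F(89, 12)], "z2": [F(13, 2), F(11, 2)],
      "z3": [F(21, 4), F(33, 4)], "z5": [F(1, 2), F(1, 2)],
      "z8": [F(6), F(6)], "z9": [F(5), F(6)]}

def regularize(z, xi):
    """Regularized imputation at vertex z by (21) with boundary condition (22)."""
    if not SUCC[z] or Q[z] == 1:
        return list(xi[z])                      # boundary condition (22)
    weight = K_SUM[z] / V_N[z]                  # share of V(N, z) paid at z
    children = [(p, regularize(y, xi)) for p, y in SUCC[z]]
    return [weight * xi[z][i]
            + (1 - Q[z]) * sum(p * child[i] for p, child in children)
            for i in range(len(xi[z]))]

print([float(x) for x in regularize("z0", SH)])
```

Exact rational arithmetic is used so that the result can be compared with the fractions reported in the example; floats would work equally well for larger trees.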
Construct a new characteristic function $\hat V(S,z)$ for each cooperative subgame $G(z)$ for all $z \in CZ$ using the functional equation
$$\hat V(S,z) = \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)}\, V(S,z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z)\, \hat V(S,y) \qquad (23)$$
with boundary condition
$$\hat V(S,z) = V(S,z) \quad \text{for } z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\}. \qquad (24)$$
Functions $V(S,z)$ and $\hat V(S,z)$ are superadditive, and $\hat V(N,z) = V(N,z)$ because $\hat V(N,z)$ and $V(N,z)$ satisfy the functional equation (5) with boundary condition (6).
For all vertices $z \in CZ$ and all subgame-inconsistent imputations $\xi(z) \in C(z)$, we compute the regularized imputations $\hat\xi(z)$ and define the set of solutions $\hat C(z)$ as follows:
$$\hat C(z) = \Big\{ \hat\xi(z) : \hat\xi_i(z) = \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)}\, \xi_i(z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z)\, \hat\xi_i(y), \qquad (25)$$
$$\hat\xi_i(z) = \xi_i(z) \text{ for } z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\},\ \xi(z) \in C(z) \Big\}.$$
Definition 13. The set $\hat C(z_0)$ defined by formula (25) is called the regularized solution of the cooperative stochastic game $G(z_0)$.
Therefore, players have an opportunity to regularize the solution chosen at the beginning of the game so that at each vertex of the stochastic game $G(z_0)$ the "new" solution $\hat C(z_0)$ is subgame consistent. But an imputation belonging to the new regularized solution $\hat C(z_0)$, generally speaking, will not be an imputation for the cooperative game with the characteristic function $V(S,z_0)$ defined by (7) and (8). It will be an imputation for a cooperative stochastic game with the new characteristic function $\hat V(S,z_0)$ defined by formulas (23), (24).
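Theorem 2. An imputation $\hat\xi(z) = (\hat\xi_1(z), \ldots, \hat\xi_n(z))$ defined by formula (21) with boundary condition (22) is subgame consistent, has a non-negative imputation distribution procedure, and is an imputation in the cooperative game $(N, \hat V)$, where characteristic function $\hat V(S,z)$ is defined by functional equation (23) with boundary condition (24).
Proof. Subgame consistency follows from the method of construction of the "new" imputation $\hat\xi(z)$. Comparing the functional equations (17) and (21), we obtain that for the proof it is necessary to show the non-negativity of the component
$$\frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)}\, \xi_i(z),$$
which is obvious because $K_i^z(a_1, \ldots, a_n) \ge 0$ for all $z \in Z$ and $i \in N$.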
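Now we prove that $\hat\xi(z) = (\hat\xi_1(z), \ldots, \hat\xi_n(z))$ has the properties of an imputation in the cooperative game with characteristic function $\hat V(S,z)$, which is given by the functional equation (23) with the boundary condition (24). To do this, for any player $i \in N$ and each vertex $z \in CZ$, it is necessary to prove two properties:
1. $\sum_{i \in N} \hat\xi_i(z) = \hat V(N,z)$,
2. $\hat\xi_i(z) \ge \hat V(\{i\},z)$.
The first property is obviously satisfied for vertices $z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\}$, $z \in CZ$. Now prove these properties for vertices $z \in \{z : L(z) \ni y \text{ and } L(y) = \emptyset\}$, $z \in CZ$:
$$\sum_{i \in N} \hat\xi_i(z) = \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)} \sum_{i \in N} \xi_i(z) + (1 - q_k) \sum_{y \in L(z)} \Big( p(y|z, \bar a^z) \sum_{i \in N} \hat\xi_i(y) \Big) =$$
$$= \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)}\, V(N,z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z)\, \hat V(N,y) =$$
$$= V(N,z) = \hat V(N,z), \quad y \in \{y : L(y) = \emptyset\}.$$
The second property is also obviously satisfied for the vertices $z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\}$. We show that $\hat\xi_i(z) - \hat V(\{i\},z) \ge 0$ for the vertices $z \in \{z : L(z) \ni y,\ L(y) = \emptyset\}$ using (21) and (23):
$$\hat\xi_i(z) - \hat V(\{i\},z) = \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)}\, \xi_i(z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z)\, \hat\xi_i(y) -$$
$$- \Big( \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)}\, V(\{i\},z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z)\, \hat V(\{i\},y) \Big) =$$
$$= \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)} \big( \xi_i(z) - V(\{i\},z) \big) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z) \big( \hat\xi_i(y) - \hat V(\{i\},y) \big) \ge 0.$$
The first term is non-negative since $\xi(z)$ is an imputation of cooperative subgame $G(z)$, and the second term is non-negative since $y \in \{y : L(y) = \emptyset\}$, where the boundary conditions (22) and (24) apply. The two properties are then proved recursively for the previous vertices $z \in CZ$ and so on until vertex $z_0$.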
It is important to know how the set $\hat C(z)$, which is the regularized solution defined by formula (25), is related to the set $\tilde C(z)$, which is the solution found for the cooperative subgame $G(z)$ with the characteristic function $\hat V(S,z)$ (i.e., the solution constructed using the same rules as the solution $C(z) \subset I(z)$ for the cooperative subgame $G(z)$). Now we find the sets $\hat C(z)$ and $\tilde C(z)$ for the cooperative stochastic subgame $G(z)$ when the solutions of the stochastic game $G(z_0)$ are the solution concepts (the Shapley value and the core) from the classical "static" theory of cooperative games.
2.8. Regularization of the Shapley value and the core
We start with the case when players choose a single-point optimality principle, the Shapley value, as a cooperative solution. The Shapley value calculated in cooperative stochastic game $G(z_0)$ is denoted by $Sh(z_0) = (Sh_i(z_0) : i \in N)$, and in cooperative subgame $G(z)$, where $z \in CZ$, by $Sh(z) = (Sh_i(z) : i \in N)$.
Define the regularized Shapley value $\widehat{Sh}(z)$ in cooperative subgame $G(z)$, where $z \in CZ$ and $z \in (L(z_0))^k$, based on the Shapley value of the initially given game as a solution of the functional equation
$$\widehat{Sh}_i(z) = \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)}\, Sh_i(z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z)\, \widehat{Sh}_i(y) \qquad (26)$$
with boundary condition
$$\widehat{Sh}_i(z) = Sh_i(z) \qquad (27)$$
for $z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\}$. The following theorem holds.
Theorem 3. The vector $\widehat{Sh}(z)$ satisfying the functional equation (26) with boundary condition (27) is the Shapley value of the cooperative subgame $(N, \hat V(\cdot, z))$, $z \in CZ$, of stochastic game $(N, \hat V(\cdot, z_0))$, where the values of characteristic function $\hat V(\cdot, z)$ are calculated by formulas (23) and (24).
Remark 3. Theorem 3 provides the relation between the sets $\hat C(z)$ and $\tilde C(z)$ mentioned at the end of the previous paragraph. If the Shapley value is chosen as a solution of the stochastic game $G(z_0)$, then $\hat C(z) = \tilde C(z)$ for any $z \in CZ$. Therefore, we may reformulate Theorem 2 in the following way.
Theorem 4. The vector $\widehat{Sh}(z_0)$ satisfying the functional equation (26) with boundary condition (27) is subgame-consistent, and $\widehat{Sh}(z_0) = \hat C(z_0) = \tilde C(z_0)$, where $\hat C(z_0)$ is a regularized solution satisfying equation (25), and $\tilde C(z_0)$ is the Shapley value of the cooperative stochastic game $(N, \hat V(\cdot, z_0))$ with characteristic function given by (23), (24).
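Proof. We show that the vector defined by (26) with boundary condition (27) coincides with $\tilde C(z)$. Calculate the Shapley value $\widetilde{Sh}(z)$ of cooperative stochastic game $(N, \hat V(\cdot, z))$, $z \in CZ$, with characteristic function given by (23), (24):
$$\widetilde{Sh}_i(z) = \sum_{S \subset N,\ S \ni i} \frac{(|S|-1)!\,(n-|S|)!}{n!} \big( \hat V(S,z) - \hat V(S \setminus \{i\}, z) \big).$$
Rewrite (23) for coalition $S \setminus \{i\}$ and obtain
$$\hat V(S \setminus \{i\}, z) = \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)}\, V(S \setminus \{i\}, z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z)\, \hat V(S \setminus \{i\}, y). \qquad (28)$$
Subtracting (28) from (23), multiplying by $\frac{(|S|-1)!\,(n-|S|)!}{n!}$ and summing up over all possible coalitions $S \subset N$ such that $S \ni i$, we obtain
$$\sum_{S \subset N,\ S \ni i} \frac{(|S|-1)!\,(n-|S|)!}{n!} \big[ \hat V(S,z) - \hat V(S \setminus \{i\}, z) \big] = \qquad (29)$$
$$= \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)} \sum_{S \subset N,\ S \ni i} \frac{(|S|-1)!\,(n-|S|)!}{n!} \big[ V(S,z) - V(S \setminus \{i\}, z) \big] +$$
$$+ (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z) \sum_{S \subset N,\ S \ni i} \frac{(|S|-1)!\,(n-|S|)!}{n!} \big[ \hat V(S,y) - \hat V(S \setminus \{i\}, y) \big] =$$
$$= Sh_i(z)\, \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)} + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z)\, \widetilde{Sh}_i(y).$$
The left-hand side of (29) is $\widetilde{Sh}_i(z)$, so $\widetilde{Sh}(z)$ satisfies the same functional equation as $\widehat{Sh}(z)$ with the same boundary condition. The result of the theorem follows from (29) and (26).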
Now we assume that the players choose the core as a solution of cooperative stochastic game $G(z_0)$. As before, we suppose that $CO(z) \neq \emptyset$ for any vertex $z \in CZ$. We also assume that $CO(z_0)$ is not subgame-consistent, i.e., there exists at least one imputation $\xi(z_0) \in CO(z_0)$ for which the condition of subgame consistency is not satisfied.
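Definition 14. The regularized core of stochastic game $G(z_0)$ is the set
$$\widehat{CO}(z_0) = \Big\{ \hat\xi(z_0) : \hat\xi_i(z_0) = \frac{\sum_{j \in N} K_j^{z_0}(\bar a^{z_0})}{V(N,z_0)}\, \xi_i(z_0) + (1 - q_0) \sum_{y \in L(z_0)} p(y|z_0, \bar a^{z_0})\, \hat\xi_i(y),$$
$$\hat\xi_i(z_0) = \xi_i(z_0) \text{ if } z_0 \in \{z : L(z) = \emptyset \text{ or } q_k = 1\},\ \xi(z_0) \in CO(z_0) \Big\}. \qquad (30)$$
Definition 15. The regularized core of cooperative subgame $G(z)$ is the set $\widehat{CO}(z)$ defined as
$$\widehat{CO}(z) = \Big\{ \hat\xi(z) : \hat\xi_i(z) = \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)}\, \xi_i(z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z)\, \hat\xi_i(y),$$
$$\hat\xi_i(z) = \xi_i(z) \text{ for } z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\},\ \xi(z) \in CO(z) \Big\}. \qquad (31)$$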
Denote by $\widetilde{CO}(z)$ the core calculated for cooperative subgame $(N, \hat V(\cdot, z))$, $z \in CZ$, with characteristic function $\hat V(S,z)$ defined by formulas (23), (24). We prove the theorem providing the relation between $\widehat{CO}(z)$ and $\widetilde{CO}(z)$.
Theorem 5. The regularized core defined by formula (30) is a subgame-consistent solution. Moreover, $\widehat{CO}(z_0) \subset \widetilde{CO}(z_0)$, where $\widetilde{CO}(z_0)$ is the core of cooperative stochastic game $(N, \hat V(\cdot, z_0))$ with characteristic function defined by formulas (23), (24).
Proof. Subgame consistency of the core follows from Theorem 2. To prove that $\widehat{CO}(z_0) \subset \widetilde{CO}(z_0)$, we need to prove that any imputation $\hat\xi(z_0) \in \widehat{CO}(z_0)$ belongs to the set $\widetilde{CO}(z_0)$, which is equivalent to the following: for any $\hat\xi(z) \in \widehat{CO}(z)$, $z \in CZ$, and any $S \subset N$ the inequality
$$\sum_{i \in S} \hat\xi_i(z) \ge \hat V(S,z) \qquad (32)$$
is true.
The proof is obvious for the vertices $z \in \{z : L(z) = \emptyset \text{ or } q_k = 1\}$. Now we prove this inequality for the vertices $z \in \{z : L(z) \ni y \text{ and } L(y) = \emptyset\}$:
$$\sum_{i \in S} \hat\xi_i(z) = \frac{\sum_{j \in N} K_j^z(\bar a^z)}{V(N,z)} \sum_{i \in S} \xi_i(z) + (1 - q_k) \sum_{y \in L(z)} \Big( p(y|z, \bar a^z) \sum_{i \in S} \hat\xi_i(y) \Big) \ge \hat V(S,z),$$
which is true because $y \in \{z : L(z) = \emptyset \text{ or } q_k = 1\}$ and $\sum_{i \in S} \xi_i(z) \ge V(S,z)$, as $\xi(z)$ is an imputation belonging to the core $CO(z)$.
The following part of the proof is made for the next vertices up to the initial vertex $z_0$ like in the proof of Theorem 1.
Now we consider examples of construction and regularization of the solution in cooperative stochastic games defined on the graphs.
Example 1.1. (Petrosyan et al., 2004) Consider stochastic game $G(z_0)$ defined on the graph represented in Fig. 1.
Fig. 1. Graph of Example 1.1.
The set of vertices of the graph is $Z = \{z_0, \ldots, z_9\}$. The set of players is $N = \{1, 2\}$. In each vertex of the graph a two-player normal-form game $\Gamma(z)$, $z \in Z$, is given:
$$\Gamma(z_0): \begin{pmatrix} (5,5) & (0,8) \\ (8,0) & (1,1) \end{pmatrix}, \quad \Gamma(z_2): \begin{pmatrix} (3,0) & (6,4) \\ (5,6) & (2,2) \end{pmatrix},$$
$$\Gamma(z_3): \begin{pmatrix} (1,11) & (4,2) \\ (1,3) & (1,1) \end{pmatrix}, \quad \Gamma(z_7): \begin{pmatrix} (1,1) & (0,2) \\ (2,0) & (1,2) \end{pmatrix},$$
$$\Gamma(z_8): \begin{pmatrix} (5,5) & (6,1) \\ (1,6) & (6,6) \end{pmatrix}, \quad \Gamma(z_9): \begin{pmatrix} (\ldots,2) & (3,4) \\ (5,6) & (1,5) \end{pmatrix},$$
$$\Gamma(z_1), \Gamma(z_4), \Gamma(z_5), \Gamma(z_6): \begin{pmatrix} (0,0) & (1,0) \\ (1,0) & (0,1) \end{pmatrix}.$$
To define stochastic game $G(z_0)$ completely, we need the transition probabilities and the probabilities of the game duration. First, define the transition probabilities from the vertices of the graph to the next vertices. If in game $\Gamma(z_0)$ the action profile (2,2) is realised, then stochastic game $G(z_0)$ transits to the vertex $z_2$ with probability 1/3 and to the vertex $z_3$ with probability 2/3. If any other action profile different from (2,2) is realised (arrow means the deterministic transition), then the game $G(z_0)$ transits to vertex $z_1$. At vertices $z_1$, $z_2$, when any action profile is played, stochastic game $G(z_0)$ transits to vertices $z_4$ and $z_5$ respectively. If in the game $\Gamma(z_3)$ the action profile (2,2) is played, then stochastic game $G(z_0)$ transits to vertices $z_8$ and $z_9$ with equal probabilities 1/2. If in the game $\Gamma(z_3)$ the action profile (2,1) is played, the game $G(z_0)$ transits to vertex $z_7$ with probability 1, and for other action profiles to vertex $z_6$ (arrow means the deterministic transition). Let probabilities $q_k$ that the game ends at stage $k$ be given:
$$q_1 = \tfrac{1}{8}, \quad q_2 = 0, \quad q_3 = 1.$$
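The above described sets and values determine the stochastic game with random duration $G(z_0)$ given by (1).
Let players choose the Shapley value as the cooperative solution of the game. For a two-player game, it is calculated by the formulas:
$$Sh_1(z) = V(\{1\},z) + \frac{V(\{1,2\},z) - V(\{1\},z) - V(\{2\},z)}{2},$$
$$Sh_2(z) = V(\{2\},z) + \frac{V(\{1,2\},z) - V(\{1\},z) - V(\{2\},z)}{2},$$
where $V(\{1\},z)$ and $V(\{2\},z)$ are the values of the characteristic function of the subgame starting at vertex $z$ for coalitions $\{1\}$ and $\{2\}$.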
We start to find the solution of the cooperative game from the terminal vertices of the graph, i.e., the vertices from which no transition to other vertices of the graph is possible. First, calculate $V(\{1\}, z_9)$ and $V(\{2\}, z_9)$ as maximum guaranteed players' payoffs in the game $\Gamma(z_9)$ using formula (8):
$$V(\{1\},z_9) = 3, \quad V(\{2\},z_9) = 4, \quad V(\{1,2\},z_9) = 11.$$
Then, we may calculate the Shapley value of the subgame $G(z_9)$ of the game $G(z_0)$ starting from game $\Gamma(z_9)$:
$$Sh_1(z_9) = 5, \quad Sh_2(z_9) = 6.$$
We make similar calculations for the subgames starting from the games $\Gamma(z_4)$, $\Gamma(z_5)$, $\Gamma(z_6)$, $\Gamma(z_7)$ and $\Gamma(z_8)$ using formula (8), since these games are realised at vertices from the set $\{z : L(z) = \emptyset\}$. The values of the characteristic functions for these subgames and the corresponding Shapley values are given in Table 2.
Table 2. Characteristic functions and the Shapley values of subgames $G(z)$, $z \in \{z_4, z_5, z_6, z_7, z_8, z_9\}$.
Vertex z   V({1},z)   V({2},z)   V({1,2},z)   Sh_1(z)   Sh_2(z)
z4         0          0          1            1/2       1/2
z5         0          0          1            1/2       1/2
z6         0          0          1            1/2       1/2
z7         1          2          3            1         2
z8         5          5          12           6         6
z9         3          4          11           5         6
Now consider the vertices from the set $\{z : (L(z))^2 = \emptyset\}$, starting with vertex $z_3$. As the stochastic game may transit to other vertices of the graph, we need to transform the payoff matrix of the game to calculate the Shapley value of cooperative subgame $G(z_3)$. With action profile (2,2) the mathematical expectations of the players' payoffs are found in the following way:
• for Player 1: $1 + (1 - q_2)\left(\tfrac{1}{2} V(\{1\}, z_8) + \tfrac{1}{2} V(\{1\}, z_9)\right) = 5$,
• for Player 2: $1 + (1 - q_2)\left(\tfrac{1}{2} V(\{2\}, z_8) + \tfrac{1}{2} V(\{2\}, z_9)\right) = 5.5$.
With action profile (2,1) they are
• for Player 1: $1 + (1 - q_2) V(\{1\}, z_7) = 2$,
• for Player 2: $3 + (1 - q_2) V(\{2\}, z_7) = 5$.
Similarly, with action profile (1,1) the mathematical expectations of the players' payoffs are
• for Player 1: $1 + (1 - q_2) V(\{1\}, z_6) = 1$,
• for Player 2: $11 + (1 - q_2) V(\{2\}, z_6) = 11$;
and with action profile (1,2) the mathematical expectations of the players' payoffs are
• for Player 1: $4 + (1 - q_2) V(\{1\}, z_6) = 4$,
• for Player 2: $2 + (1 - q_2) V(\{2\}, z_6) = 2$.
Then the bimatrix game written for the calculation of the values of characteristic function $V(\{1\},z_3)$ and $V(\{2\},z_3)$ looks like
$$\begin{pmatrix} (1, 11) & (4, 2) \\ (2, 5) & (5, 5.5) \end{pmatrix}.$$
The values of the characteristic function of cooperative subgame $G(z_3)$ of the game $G(z_0)$ for coalitions $\{1\}$ and $\{2\}$ are
$$V(\{1\},z_3) = 2, \quad V(\{2\},z_3) = 5.$$
To calculate $V(\{1,2\}, z_3)$ we use formula (7) and obtain the matrix game
$$\begin{pmatrix} 12 + (1 - q_2)V(\{1,2\}, z_6) & 6 + (1 - q_2)V(\{1,2\}, z_6) \\ 4 + (1 - q_2)V(\{1,2\}, z_7) & 2 + (1 - q_2)\big(0.5\,V(\{1,2\}, z_8) + 0.5\,V(\{1,2\}, z_9)\big) \end{pmatrix}$$
or in numeric form:
$$\begin{pmatrix} 13 & 7 \\ 7 & 13.5 \end{pmatrix}.$$
Therefore,
$$V(\{1,2\},z_3) = 13.5,$$
$$Sh_1(z_3) = 5.25, \quad Sh_2(z_3) = 8.25.$$
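The backward step for vertex $z_3$ can be reproduced with a short Python sketch. It is an illustration under assumptions: formulas (7) and (8) are not restated here, so the guaranteed payoffs are taken as maximin values over pure actions, which is what the numbers of this example correspond to; the helper names (`maximin_row`, `maximin_col`, `cont`) and the data layout are hypothetical.

```python
from fractions import Fraction as F

def maximin_row(m):
    """Guaranteed payoff of the player who chooses rows (pure actions)."""
    return max(min(row) for row in m)

def maximin_col(m):
    """Guaranteed payoff of the player who chooses columns (pure actions)."""
    return max(min(row[c] for row in m) for c in range(len(m[0])))

# Stage payoffs in Gamma(z3): A for Player 1, B for Player 2 (0-based action indices).
A = [[1, 4], [1, 1]]
B = [[11, 2], [3, 1]]
# Transition lotteries from z3 for every action profile.
NEXT = {(0, 0): [(F(1), "z6")], (0, 1): [(F(1), "z6")],
        (1, 0): [(F(1), "z7")],
        (1, 1): [(F(1, 2), "z8"), (F(1, 2), "z9")]}
# (V({1}), V({2}), V({1,2})) at the terminal vertices, as in Table 2.
V_TERMINAL = {"z6": (0, 0, 1), "z7": (1, 2, 3), "z8": (5, 5, 12), "z9": (3, 4, 11)}
q2 = F(0)

def cont(profile, idx):
    """Expected continuation value of one coalition after the given profile."""
    return (1 - q2) * sum(p * V_TERMINAL[y][idx] for p, y in NEXT[profile])

aux1 = [[A[r][c] + cont((r, c), 0) for c in range(2)] for r in range(2)]
aux2 = [[B[r][c] + cont((r, c), 1) for c in range(2)] for r in range(2)]
auxN = [[A[r][c] + B[r][c] + cont((r, c), 2) for c in range(2)] for r in range(2)]

v1, v2 = maximin_row(aux1), maximin_col(aux2)
vN = max(max(row) for row in auxN)      # the grand coalition maximizes the total
half = (vN - v1 - v2) / 2
sh = (v1 + half, v2 + half)             # two-player Shapley value
print(v1, v2, vN, sh)
```

The auxiliary matrices `aux1`, `aux2` and `auxN` correspond to the bimatrix and matrix games written out above for $z_3$.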
We make similar calculations for the cooperative subgame $G(z_1)$:
$$V(\{1\},z_1) = 0, \quad V(\{2\},z_1) = 0, \quad V(\{1,2\},z_1) = 2, \quad Sh_1(z_1) = Sh_2(z_1) = 1,$$
and for subgame $G(z_2)$:
$$V(\{1\},z_2) = 3, \quad V(\{2\},z_2) = 2, \quad V(\{1,2\},z_2) = 12, \quad Sh_1(z_2) = 6.5, \quad Sh_2(z_2) = 5.5.$$
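For cooperative stochastic game $G(z_0)$, the matrix game for the calculation of the values of the characteristic function for coalitions $\{1\}$ and $\{2\}$ is constructed in the same way. With action profile (2,2) the mathematical expectations of the players' payoffs calculated by formula (7) are
• for Player 1: $1 + (1 - q_1)\left(\tfrac{1}{3} V(\{1\}, z_2) + \tfrac{2}{3} V(\{1\}, z_3)\right) = 3\tfrac{1}{24}$,
• for Player 2: $1 + (1 - q_1)\left(\tfrac{1}{3} V(\{2\}, z_2) + \tfrac{2}{3} V(\{2\}, z_3)\right) = 4\tfrac{1}{2}$.
With action profile (2,1) the mathematical expectations of the players' payoffs are
• for Player 1: $8 + (1 - q_1) V(\{1\}, z_1) = 8$,
• for Player 2: $0 + (1 - q_1) V(\{2\}, z_1) = 0$.
Similarly, with action profile (1,1) the mathematical expectations of the players' payoffs are
• for Player 1: $5 + (1 - q_1) V(\{1\}, z_1) = 5$,
• for Player 2: $5 + (1 - q_1) V(\{2\}, z_1) = 5$.
With action profile (1,2) the mathematical expectations of the players' payoffs are
• for Player 1: $0 + (1 - q_1) V(\{1\}, z_1) = 0$,
• for Player 2: $8 + (1 - q_1) V(\{2\}, z_1) = 8$.
Finally, we obtain the matrix:
$$\begin{pmatrix} (5, 5) & (0, 8) \\ (8, 0) & \left(3\tfrac{1}{24},\ 4\tfrac{1}{2}\right) \end{pmatrix}.$$
Therefore,
$$V(\{1\},z_0) = 3\tfrac{1}{24}, \quad V(\{2\},z_0) = 4\tfrac{1}{2}.$$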
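For the calculation of $V(\{1,2\}, z_0)$ we form the matrix game using formula (7):
$$\begin{pmatrix} 10 + (1 - q_1)V(\{1,2\}, z_1) & 8 + (1 - q_1)V(\{1,2\}, z_1) \\ 8 + (1 - q_1)V(\{1,2\}, z_1) & 2 + (1 - q_1)\left(\tfrac{1}{3} V(\{1,2\}, z_2) + \tfrac{2}{3} V(\{1,2\}, z_3)\right) \end{pmatrix}$$
or in numeric form:
$$\begin{pmatrix} 11\tfrac{3}{4} & 9\tfrac{3}{4} \\ 9\tfrac{3}{4} & 13\tfrac{3}{8} \end{pmatrix}.$$
Calculating $V(\{1,2\},z_0)$ and $Sh_1(z_0)$, $Sh_2(z_0)$, we obtain:
$$V(\{1,2\},z_0) = 13\tfrac{3}{8}, \quad Sh_1(z_0) = 5\tfrac{23}{24}, \quad Sh_2(z_0) = 7\tfrac{5}{12}.$$
The cooperative subtree with root $z_0$ consists of the vertices $z_0$, $z_2$, $z_3$, $z_5$, $z_8$, $z_9$.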
Now we verify if the imputation distribution procedure is non-negative. It is negative at vertex $z_3$, which follows from equation (17) written for the vertex $z_3$:
$$Sh_1(z_3) = \beta_1(z_3) + (1 - q_2)\left(\tfrac{1}{2} Sh_1(z_8) + \tfrac{1}{2} Sh_1(z_9)\right),$$
$$5.25 = \beta_1(z_3) + (1 - 0)\left(\tfrac{1}{2} \cdot 6 + \tfrac{1}{2} \cdot 5\right),$$
$$\beta_1(z_3) = -0.25.$$
As $\beta_1(z_3)$ is negative, we make the regularization of the Shapley value to construct a «new» Shapley value with a non-negative imputation distribution procedure.
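Determine the new Shapley value for the vertices of the cooperative subtree with vertices $z_0$, $z_2$, $z_3$, $z_5$, $z_8$, $z_9$ by formulas (26) and (27):
$$\widehat{Sh}_1(z_5) = 0.5, \quad \widehat{Sh}_1(z_8) = 6, \quad \widehat{Sh}_1(z_9) = 5, \quad \widehat{Sh}_2(z_5) = 0.5, \quad \widehat{Sh}_2(z_8) = 6, \quad \widehat{Sh}_2(z_9) = 6,$$
$$\widehat{Sh}_1(z_2) = \tfrac{11}{12} \cdot 6.5 + \tfrac{1}{2} = 6\tfrac{11}{24}, \qquad \widehat{Sh}_2(z_2) = \tfrac{11}{12} \cdot 5.5 + \tfrac{1}{2} = 5\tfrac{13}{24},$$
$$\widehat{Sh}_1(z_3) = \tfrac{2}{13.5} \cdot 5.25 + \left(\tfrac{1}{2} \cdot 5 + \tfrac{1}{2} \cdot 6\right) = 6\tfrac{5}{18}, \qquad \widehat{Sh}_2(z_3) = \tfrac{2}{13.5} \cdot 8.25 + \left(\tfrac{1}{2} \cdot 6 + \tfrac{1}{2} \cdot 6\right) = 7\tfrac{2}{9},$$
$$\widehat{Sh}_1(z_0) = \frac{2}{13\tfrac{3}{8}} \cdot 5\tfrac{23}{24} + \left(1 - \tfrac{1}{8}\right)\left[\tfrac{1}{3} \cdot 6\tfrac{11}{24} + \tfrac{2}{3} \cdot 6\tfrac{5}{18}\right] = 6\tfrac{80741}{184896} \approx 6.437,$$
$$\widehat{Sh}_2(z_0) = \frac{2}{13\tfrac{3}{8}} \cdot 7\tfrac{5}{12} + \left(1 - \tfrac{1}{8}\right)\left[\tfrac{1}{3} \cdot 5\tfrac{13}{24} + \tfrac{2}{3} \cdot 7\tfrac{2}{9}\right] = 6\tfrac{173491}{184896} \approx 6.938.$$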
The «new» vector is the Shapley value of the cooperative game with characteristic function defined by formulas (23), (24). It is subgame-consistent, which follows from Theorem 4.
For the games $\Gamma(z_5)$, $\Gamma(z_8)$ and $\Gamma(z_9)$ the new characteristic functions are presented in Table 3.
Table 3. «New» characteristic functions.
Vertex z   $\hat V(\{1\},z)$   $\hat V(\{2\},z)$   $\hat V(\{1,2\},z)$
z0         3.763               4.265               13.375
z2         2.750               1.833               12.000
z3         4.296               5.241               13.500
z5         0.000               0.000               1.000
z8         5.000               5.000               12.000
z9         3.000               4.000               11.000
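The statement of Theorem 3 can be checked numerically on this example. The following Python sketch is a hedged illustration, not the author's procedure: it applies the same linear recursion to the characteristic function (formula (23)) and to the Shapley value (formula (26)), and then compares the Shapley value of the regularized characteristic function with the regularized Shapley value. The names `SUCC`, `Q`, `K_SUM`, `V`, `shapley2` and `regularize` are hypothetical; the data are taken from the calculations of Example 1.1.

```python
from fractions import Fraction as F

SUCC = {"z0": [(F(1, 3), "z2"), (F(2, 3), "z3")],
        "z2": [(F(1), "z5")],
        "z3": [(F(1, 2), "z8"), (F(1, 2), "z9")],
        "z5": [], "z8": [], "z9": []}
Q = {"z0": F(1, 8), "z2": F(0), "z3": F(0), "z5": F(1), "z8": F(1), "z9": F(1)}
K_SUM = {"z0": F(2), "z2": F(11), "z3": F(2)}   # cooperative stage payoffs
# (V({1},z), V({2},z), V({1,2},z)) for the vertices of the cooperative subtree.
V = {"z0": (F(73, 24), F(9, 2), F(107, 8)), "z2": (3, 2, 12),
     "z3": (2, 5, F(27, 2)), "z5": (0, 0, 1), "z8": (5, 5, 12), "z9": (3, 4, 11)}

def shapley2(v1, v2, v12):
    """Two-player Shapley value."""
    half = F(v12 - v1 - v2) / 2
    return (v1 + half, v2 + half)

def regularize(z, values):
    """Linear recursion shared by (23) and (26), applied componentwise."""
    if not SUCC[z] or Q[z] == 1:
        return tuple(F(x) for x in values[z])
    weight = K_SUM[z] / V[z][2]
    children = [(p, regularize(y, values)) for p, y in SUCC[z]]
    return tuple(weight * values[z][i]
                 + (1 - Q[z]) * sum(p * c[i] for p, c in children)
                 for i in range(len(values[z])))

SH = {z: shapley2(*V[z]) for z in V}
v_hat = regularize("z0", V)      # regularized characteristic function at z0
sh_hat = regularize("z0", SH)    # regularized Shapley value at z0
print([round(float(x), 3) for x in v_hat], [round(float(x), 3) for x in sh_hat])
print(shapley2(*v_hat) == sh_hat)
```

The equality holds exactly because both the Shapley value and the recursion are linear in the characteristic function, which is the mechanism used in the proof of Theorem 3.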
Remark 4. The nucleolus may be chosen by the players as a solution of the cooperative game (see Schmeidler, 1969). Notice that the nucleolus consists of one
vector, so there are no problems with the choice of a unique imputation from the imputation set. We also notice that the nucleolus belongs to the core when the latter is non-empty.
Example 1.2. Consider stochastic game $G(z_0)$ defined on the graph presented in Fig. 2. The set of vertices of the graph is $Z = \{z_0, \ldots, z_5\}$. The set of players is $N = \{1, 2, 3\}$. At each vertex of the graph a three-player normal-form game $\Gamma(z)$, $z \in Z$, is given. The payoff matrices are the following:
[payoff matrices of the games $\Gamma(z_0), \ldots, \Gamma(z_5)$]
In each game defined above, Player 1 chooses rows, Player 2 chooses columns and Player 3 chooses matrices.
Fig. 2. Graph of Example 1.2.
First, we define the transition probabilities from the vertices to the other vertices of the graph. If in game $\Gamma(z_0)$ action profile (1,1,1) is played, then stochastic game $G(z_0)$ transits to the vertex $z_1$ with probability 1/3 and to the vertex $z_2$ with probability 2/3. If any other action profile is played (arrow means the deterministic transition), then the game $G(z_0)$ transits to the vertex $z_1$. If action profile (2,1,2) is realised at vertex $z_2$, stochastic game $G(z_0)$ transits to the vertices $z_3$ and $z_4$ with probabilities 1/3 and 2/3 respectively. If any other action profile different from (2,1,2) is realised, game $G(z_0)$ transits to vertex $z_5$ with probability 1.
The probabilities $q_k$ that stochastic game $G(z_0)$ ends at stage $k$ are given:
$$q_1 = 0.5, \quad q_2 = 0, \quad q_3 = 1.$$
Let players choose the Shapley value as a solution of the game. We start solving the game with the vertices of the graph which belong to the set $\{z : L(z) = \emptyset\}$. We calculate the values of the characteristic function and the Shapley value for subgame $G(z_3)$. Similar calculations are made for the vertices $z_1$, $z_5$, $z_4$, and then for the vertices $z_2$ and $z_0$ using formula (7). The calculations are presented in Tables 4 and 5.
Table 4. Characteristic functions for subgames $G(z)$, $z \in \{z_0, z_1, z_2, z_3, z_4, z_5\}$.
z     V({1},z)   V({2},z)   V({3},z)   V({1,2},z)   V({2,3},z)   V({1,3},z)   V({1,2,3},z)
z0    2          1          3/2        11/2         9/2          5            83/9
z1    2          0          1          3            3            4            6
z2    3          1          4/3        7            6            7            47/3
z3    1          1          1          4            4            3            6
z4    0          1          0          8            9            5            13
z5    2          0          1          3            3            4            6
Table 5. The Shapley values of subgames $G(z)$, $z \in \{z_0, z_1, z_2, z_3, z_4, z_5\}$.
z     Sh_1(z)   Sh_2(z)   Sh_3(z)
z0    193/54    305/108   305/108
z1    8/3       7/6       13/6
z2    37/6      14/3      29/6
z3    11/6      7/3       11/6
z4    10/3      35/6      23/6
z5    8/3       7/6       13/6
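For more than two players the Shapley value is computed from the marginal contributions weighted by $(|S|-1)!\,(n-|S|)!/n!$. A minimal Python sketch is given below as an illustration; the function name `shapley` and the dictionary representation of the characteristic function are assumptions, and the data are the rows of Table 4 for $z_3$ and $z_4$.

```python
from fractions import Fraction as F
from itertools import combinations
from math import factorial

def shapley(players, v):
    """Shapley value of a TU game; v maps frozenset coalitions to values."""
    n = len(players)
    result = {}
    for i in players:
        others = [j for j in players if j != i]
        total = F(0)
        for r in range(len(others) + 1):
            for rest in combinations(others, r):
                without_i = frozenset(rest)
                with_i = without_i | {i}
                weight = F(factorial(len(with_i) - 1) * factorial(n - len(with_i)),
                           factorial(n))
                total += weight * (v[with_i] - v[without_i])
        result[i] = total
    return result

def game(v1, v2, v3, v12, v23, v13, v123):
    """Build the coalition dictionary in the column order of Table 4."""
    return {frozenset(): 0,
            frozenset({1}): v1, frozenset({2}): v2, frozenset({3}): v3,
            frozenset({1, 2}): v12, frozenset({2, 3}): v23,
            frozenset({1, 3}): v13, frozenset({1, 2, 3}): v123}

sh_z4 = shapley((1, 2, 3), game(0, 1, 0, 8, 9, 5, 13))
sh_z3 = shapley((1, 2, 3), game(1, 1, 1, 4, 4, 3, 6))
print(sh_z4, sh_z3)
```

Enumerating all coalitions is exponential in the number of players, which is harmless for the three-player examples considered here.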
The set of vertices of the cooperative subtree is $CZ = \{z_0, z_1, z_2, z_3, z_4\}$. We consider the Shapley value
$$Sh(z_0) = \left( 3\tfrac{31}{54},\ 2\tfrac{89}{108},\ 2\tfrac{89}{108} \right)$$
and verify if the imputation distribution procedure is non-negative. For this, we find values $\beta_i(z)$ for vertices $z_0 \in CZ$ and $z_2 \in CZ$ using formula (19):
$$\beta_i(z_2) = Sh_i(z_2) - (1 - q_2)\left( \tfrac{1}{3} Sh_i(z_3) + \tfrac{2}{3} Sh_i(z_4) \right),$$
$$\beta_i(z_0) = Sh_i(z_0) - (1 - q_1)\left( \tfrac{1}{3} Sh_i(z_1) + \tfrac{2}{3} Sh_i(z_2) \right),$$
obtaining
$$\beta_1(z_2) = 3\tfrac{1}{3}, \quad \beta_2(z_2) = 0, \quad \beta_3(z_2) = 1\tfrac{2}{3};$$
$$\beta_1(z_0) = 1\tfrac{2}{27}, \quad \beta_2(z_0) = 1\tfrac{2}{27}, \quad \beta_3(z_0) = \tfrac{23}{27}.$$
For $z \in \{z_0, z_2\}$ the conditions $\beta_i(z) \ge 0$, $\sum_{i \in N} \beta_i(z_2) = 5$ and $\sum_{i \in N} \beta_i(z_0) = 3$ are satisfied.
In all vertices of the cooperative subtree, the conditions of subgame consistency and non-negativity of the imputation distribution procedure for the Shapley value are satisfied. Therefore, we state that the Shapley value is a subgame-consistent imputation in game $G(z_0)$.
Now we repeat the calculations assuming that players adopt the nucleolus as a solution of the game $G(z_0)$. The nucleolus was initially proposed by D. Schmeidler (Schmeidler, 1969). The definition and some useful theorems and lemmas about the properties of the nucleolus may be found in (Pecherski and Yanovskaya, 2004, Driessen et al., 1992, Kohlberg, 1971). The works (Kohlberg, 1972, Montero, 2005) are devoted to the calculation of the nucleolus, which consists of a unique vector. For the calculation of the nucleolus, one may use Matlab (Mathworks, 2017) and the program TUGlab (TUGlab), written for calculations in cooperative game theory, or Mathematica (MATHEMATICA) and the program TUGames (Meinhardt) written for the same tasks.
The characteristic function was calculated above. The nucleolus of the subgame $G(z)$, $z \in CZ$, is denoted by $n(z) = (n_i(z) : i \in N)$.
We calculate the nucleoli for all subgames of the game $G(z_0)$. The results are presented in Table 6.
Table 6. The nucleoli of subgames $G(z)$, $z \in \{z_0, z_1, z_2, z_3, z_4, z_5\}$.
z     n_1(z)    n_2(z)    n_3(z)
z0    3 5/9     3 1/18    2 11/18
z1    2 1/2     1 1/4     2 1/4
z2    6 1/3     4 1/2     4 5/6
z3    1 2/3     2 2/3     1 2/3
z4    2 2/3     6 2/3     3 2/3
z5    2 1/2     1 1/4     2 1/4
Now we verify the subgame consistency of the nucleolus using formula (17) and calculate $\beta_i(z_2)$ for vertex $z_2$ by the formula:
$$n_i(z_2) = \beta_i(z_2) + (1 - q_2)\big( p(z_3|z_2, \bar a^{z_2})\, n_i(z_3) + p(z_4|z_2, \bar a^{z_2})\, n_i(z_4) \big).$$
We obtain:
$$\beta_1(z_2) = 4, \quad \beta_2(z_2) = -\tfrac{5}{6}, \quad \beta_3(z_2) = 1\tfrac{5}{6}.$$
The nucleolus of the cooperative stochastic game $G(z_0)$ is not subgame-consistent if the non-negativity of the imputation distribution procedure is required. For example, $\beta_2(z_2) < 0$. We do not verify the existence of a non-negative imputation distribution procedure satisfying (17) at the remaining vertices.
Calculate the «new» nucleolus $\hat n(z)$ for each vertex $z \in CZ$ by formula (21) with boundary condition (22). For vertex $z_2$, we use the following formula:
$$\hat n_i(z_2) = \frac{\sum_{j \in N} K_j^{z_2}(\bar a^{z_2})}{V(N,z_2)}\, n_i(z_2) + (1 - q_2) \sum_{y \in L(z_2)} p(y|z_2, \bar a^{z_2})\, \hat n_i(y),$$
and for vertex $z_0$:
$$\hat n_i(z_0) = \frac{\sum_{j \in N} K_j^{z_0}(\bar a^{z_0})}{V(N,z_0)}\, n_i(z_0) + (1 - q_1) \sum_{y \in L(z_0)} p(y|z_0, \bar a^{z_0})\, \hat n_i(y).$$
«New» nucleoli for the vertices of set $CZ$ are given in Table 7.
Table 7. The «new» nucleoli $\hat n(z)$ of subgames $G(z)$, $z \in \{z_0, z_1, z_2, z_3, z_4, z_5\}$.
z     $\hat n_1(z)$      $\hat n_2(z)$        $\hat n_3(z)$
z0    3 3487/140436      3 128867/280872      2 69149/93624
z1    2 1/2              1 1/4                2 1/4
z2    4 50/141           6 217/282            4 51/94
z3    1 2/3              2 2/3                1 2/3
z4    2 2/3              6 2/3                3 2/3
z5    2 1/2              1 1/4                2 1/4
Calculate characteristic function $\hat V(S,z)$ for each vertex $z \in CZ$ by formulas (23) and (24). Moreover, $\hat V(S,z_3) = V(S,z_3)$, $\hat V(S,z_4) = V(S,z_4)$, $\hat V(S,z_1) = \hat V(S,z_5) = V(S,z_1) = V(S,z_5)$. For the calculation of $\hat V(S,z_2)$ we use the formula
$$\hat V(S,z_2) = \frac{\sum_{j \in N} K_j^{z_2}(\bar a^{z_2})}{V(N,z_2)}\, V(S,z_2) + (1 - q_2) \sum_{y \in L(z_2)} p(y|z_2, \bar a^{z_2})\, \hat V(S,y),$$
and for $\hat V(S,z_0)$:
$$\hat V(S,z_0) = \frac{\sum_{j \in N} K_j^{z_0}(\bar a^{z_0})}{V(N,z_0)}\, V(S,z_0) + (1 - q_1)\big( p(z_1|z_0, \bar a^{z_0})\, \hat V(S,z_1) + p(z_2|z_0, \bar a^{z_0})\, \hat V(S,z_2) \big).$$
The values of the function $\hat V(S, \cdot)$ are given in Table 8.
Table 8. Characteristic function $\hat V(S,z)$, $z \in \{z_0, z_1, z_2, z_3, z_4, z_5\}$.
z     $\hat V(\{1\},z)$   $\hat V(\{2\},z)$   $\hat V(\{3\},z)$   $\hat V(\{1,2\},z)$   $\hat V(\{2,3\},z)$   $\hat V(\{1,3\},z)$   $\hat V(\{1,2,3\},z)$
z0    1 245/249           164/249             1 74/747            4 155/249             3 80/83               4 52/83               9 2/9
z1    2                   0                   1                   3                     3                     4                     6
z2    1 41/141            1 15/47             107/141             8 127/141             9 35/141              6 80/141              15 2/3
z3    1                   1                   1                   4                     4                     3                     6
z4    0                   1                   0                   8                     9                     5                     13
z5    2                   0                   1                   3                     3                     4                     6
Notice that the «new» nucleolus $\hat n(z_2)$ of subgame $G(z_2)$ belongs to the imputation set of the cooperative game with characteristic function $\hat V(S,z_2)$ (the nucleolus $\hat n(z_2)$ also belongs to the set $I(z_2)$, which is not true in general), but it is not the nucleolus of this cooperative game. The nucleolus of the cooperative game defined by characteristic function $\hat V(S,z_2)$ is denoted by $\tilde n(z_2) = (\tilde n_1(z_2), \tilde n_2(z_2), \tilde n_3(z_2))$. It equals
$$\tilde n(z_2) \approx (4.213, 6.894, 4.560) \neq \hat n(z_2).$$
The «new» nucleolus $\hat n(z_0)$ calculated for the game $G(z_0)$ belongs to the imputation set of the cooperative game defined by characteristic function $\hat V(S,z_0)$ ($\hat n(z_0)$ also belongs to the imputation set $I(z_0)$), but it does not coincide with the nucleolus of this cooperative game. The nucleolus of the cooperative game defined by characteristic function $\hat V(S,z_0)$, given above, is denoted by $\tilde n(z_0)$ and it equals
$$\tilde n(z_0) \approx (3.621, 2.720, 2.881) \neq \hat n(z_0).$$
2.9. Strongly subgame consistency of the core
In this section we consider the case when solution of the cooperative stochastic game is the set and contains more than one point. As an example of such a solution we examine the core. First, we describe the problem of subgame consistency and then find the sufficient conditions of strongly subgame consistency of the core. This problem was initially examined by Leon Petrosyan for differential games (Petrosyan, 1992) and then for multicriteria problems of optimal control (Petrosyan, 1993).
Suppose that the cores of stochastic game $G(z_0)$ and any subgame $G(z)$, $z \in CZ$, are non-empty. When players cooperate they come to an agreement about the realization of the cooperative strategy profile $\bar\eta$ and expect to receive the components of the imputation belonging to the core $CO(z_0)$. Reaching the intermediate vertex $z \in CZ \setminus \{z_0\}$ of the cooperative subtree, player $i \in N$ chooses an action $\bar a_i^z$ in accordance with the cooperative strategy $\bar\eta_i$ and receives the payoff $K_i^z(\bar a^z)$. If the players recalculate the cooperative solution, i.e., find the solution of the cooperative subgame starting from vertex $z$, they obtain the core $CO(z)$. It will be rational to require that the payoff received by the player in vertex $z$ summed with the expected sum of any imputations from solutions $CO(y)$, $y \in L(z)$, of the games of the cooperative subtrees following game $\Gamma(z)$, is equal to an imputation from solution $CO(z)$. If this property is satisfied for any vertex $z$ of the cooperative subtree, the core of cooperative stochastic game $G(z_0)$ is strongly subgame-consistent.
To introduce the mathematically strict definition of the strongly subgame-consistent core, it is necessary to define the so-called expected core. For any non-terminal vertex of the cooperative subtree we define the set of expected imputations belonging to the cores which are the solutions of the subgames following the considered vertex. For any vertex $z \in CZ$, $L(z) \neq \emptyset$, define the expected core:
$$EC(L(z)) = \Big\{ \alpha(L(z)) = \sum_{y \in L(z)} p(y|z, \bar a^z)\, \alpha(y) \ \Big|\ \alpha(y) \in CO(y) \Big\}. \qquad (33)$$
The set $EC(L(z))$ consists of the vectors $\alpha(L(z))$ which are the mathematical expectations of the possible collections of imputations from the cores of the subgames following vertex $z$, taken with respect to the probability distribution $\{p(y|z, \bar a^z),\ y \in L(z)\}$.
We also define the distribution procedure of the players' payoffs in the vertices of the cooperative subtree. Refine Definition 10 of the imputation distribution procedure. The first condition in Definition 10 may be called the condition of "feasibility of the imputation distribution procedure" because it guarantees that in any vertex of the cooperative subtree the sum of the payments to the players equals the sum of the payoffs received by the players when they realize cooperative strategies. The second condition guarantees to the players that they receive the components of the initially chosen imputation from the core of cooperative game $G(z_0)$ in the sense of mathematical expectation, if the payments to the players along the game are realized in accordance with the imputation distribution procedure $\{\beta(z) : z \in CZ\}$.
Now we need to define the distribution procedure of the imputation $\alpha(z_0)$ from the core $CO(z_0)$ in a way that the core is strongly subgame-consistent.
Definition 16. We call the core $CO(z_0)$ of the cooperative stochastic game $G(z_0)$ strongly subgame-consistent if there exists a distribution procedure $\{\beta(z)\}_{z \in CZ}$ of the imputation from the core $CO(z_0)$ such that for each vertex $z \in CZ$ the following inclusions hold:
$$\beta(z) \oplus (1 - q_k)\, EC(L(z)) \subset CO(z), \qquad (34)$$
$$B(z_0) \in CO(z_0), \qquad (35)$$
where
$$\beta(z) \oplus (1 - q_k)\, EC(L(z)) = \big\{ \beta(z) + (1 - q_k)\, \alpha(L(z)) : \alpha(L(z)) \in EC(L(z)) \big\}.$$
The imputation distribution procedure $\{\beta(z)\}_{z \in CZ}$ is then called strongly subgame-consistent.³
Condition (34) means that the set of vectors which are equal to the sum of the payments at vertex $z$ and the expected imputations from the cores of the subgames following vertex $z$, weighted by $1 - q_k$, is contained in the core of the subgame starting at vertex $z$. This is a restrictive condition on the payoffs in the games defined at the vertices, and often it is not satisfied if the payments to the players are realised in accordance with the initially defined payoff functions.
We impose additional restrictions on the characteristic functions of the subgames starting from the vertices of the cooperative subtree to obtain sufficient conditions of strongly subgame consistency of the core. Denote by $EV(S, L(z))$ the expected value of the characteristic function calculated for coalition $S \subset N$ at the vertices following vertex $z$:
$$EV(S, L(z)) = \sum_{y \in L(z)} p(y|z, \bar a^z)\, V(S,y).$$
Denote by
$$\Delta V(S,z) = V(S,z) - (1 - q_k)\, EV(S, L(z))$$
the difference between the value of the characteristic function at vertex $z$ and the expected value of the characteristic function at the following vertices on condition that the game does not finish at vertex $z$. Denote by $\Delta CO(z)$ the analogue of the core calculated using function $\Delta V(S,z)$. Now define a sufficient condition of strongly subgame consistency of the imputation distribution procedure and the core $CO(z_0)$.
Theorem 6. Let for each vertex $z \in CZ$ the core $CO(z)$ and the set $\Delta CO(z)$ be non-empty. If for each vertex $z \in CZ$ the distribution procedure $\{\beta(z) : z \in CZ\}$ of the imputation from the core $CO(z_0)$ satisfies the conditions
$$\beta(z) \in \Delta CO(z), \qquad (36)$$
$$B(z_0) \in CO(z_0), \qquad (37)$$
then the core $CO(z_0)$ and distribution procedure $\{\beta(z) : z \in CZ\}$ are strongly subgame-consistent.
³ The sum denoted by the sign $\oplus$ is called the Minkowski sum (see (Schneider), in which some properties of this operator are proved).
Proof. We need to prove that any vector $\beta(z) \in \Delta CO(z)$ satisfying conditions (36) and (37) is a strongly subgame-consistent distribution procedure of the imputation $\alpha(z_0) \in CO(z_0)$, that is, that conditions (34) and (35) from Definition 16 hold. Condition (37) coincides with (35); therefore, it remains to show that the inclusion (34) holds for any vertex $z \in CZ$. Consider any vector $\alpha(L(z)) \in EC(L(z))$ for vertex $z$ and calculate the sum $\beta(z) + (1 - q_k)\, \alpha(L(z))$. Verify if the latter vector belongs to the core $CO(z)$. Now calculate the sum of all components of the vector:
$$\sum_{i \in N} \beta_i(z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z) \sum_{i \in N} \alpha_i(y) =$$
$$= V(N,z) - (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z)\, V(N,y) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z) \sum_{i \in N} \alpha_i(y) = V(N,z),$$
which gives the property of collective rationality. Now consider $S \subset N$, $S \neq N$:
$$\sum_{i \in S} \beta_i(z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z) \sum_{i \in S} \alpha_i(y) \ge$$
$$\ge V(S,z) + (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z)\, V(S,y) - (1 - q_k) \sum_{y \in L(z)} p(y|z, \bar a^z)\, V(S,y) = V(S,z).$$
Since vertex $z \in CZ$ is arbitrary, we conclude that the core of cooperative game $G(z_0)$ and the procedure $\{\beta(z) : z \in CZ\}$ are strongly subgame-consistent.
When the analogue of the core $\Delta CO(z)$ is non-empty for each vertex $z$ of the cooperative subtree, Theorem 6 provides a method of constructing a strongly subgame-consistent distribution procedure for the imputation from the core that equals $B(z_0)$ by condition (37). Notice that, in general, not all imputations from the core can be realised using the distribution procedure $\{\beta(z) : z \in CZ\}$ defined above.
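Example 1.3 Consider stochastic game $G(z_0)$ defined on the graph depicted in Fig. 3.
The set of the vertices of the graph is $Z = \{z_0, \ldots, z_5\}$. The set of the players is $N = \{1, 2, 3\}$. In each vertex of the graph a three-person normal-form game $\Gamma(z)$, $z \in Z$, is defined.
Fig. 3. The tree of the game $G(z_0)$.
The payoff matrices are
$$\Gamma(z_0): \left( \begin{pmatrix} (2,2,2) & (2,2,0) \\ (2,2,0) & (0,0,3) \end{pmatrix}, \begin{pmatrix} (1,1,2) & (2,2,1) \\ (1,3,1) & (3,0,1) \end{pmatrix} \right),$$
$$\Gamma(z_2): \left( \begin{pmatrix} (1,1,1) & (2,2,0) \\ (2,2,0) & (0,0,3) \end{pmatrix}, \begin{pmatrix} (1,1,2) & (2,2,1) \\ (1,3,1) & (3,0,1) \end{pmatrix} \right),$$
$$\Gamma(z_1), \Gamma(z_5): \left( \begin{pmatrix} (2,0,1) & (1,0,1) \\ (3,1,2) & (2,2,2) \end{pmatrix}, \begin{pmatrix} (2,2,1) & (1,1,3) \\ (2,1,1) & (2,1,2) \end{pmatrix} \right),$$
$$\Gamma(z_3): \left( \begin{pmatrix} (1,1,1) & (2,2,2) \\ (3,2,0) & (1,4,1) \end{pmatrix}, \begin{pmatrix} (2,0,1) & (2,1,1) \\ (4,0,1) & (0,4,1) \end{pmatrix} \right),$$
$$\Gamma(z_4): \left( \begin{pmatrix} (2,1,0) & (2,1,3) \\ (3,1,2) & (3,6,4) \end{pmatrix}, \begin{pmatrix} (4,5,0) & (0,5,4) \\ (2,8,0) & (0,8,2) \end{pmatrix} \right).$$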
In each game the first player chooses rows, the second one chooses columns, and the third one chooses matrices. The strategy set of player $i \in N$ in game $\Gamma(z)$ is $A_i^z = \{1, 2\}$.
Define the probabilities of transition from all vertices to the following ones. If in game $\Gamma(z_0)$ the action profile (1,1,1) is realised, stochastic game $G(z_0)$ transits to vertex $z_1$ with a probability of 1/3 and to vertex $z_2$ with a probability of 2/3. If any action profile different from (1,1,1) is realised (an arrow means the deterministic transition), the game $G(z_0)$ transits to vertex $z_1$. If at vertex $z_2$ action profile (2,1,2) is realised, stochastic game $G(z_0)$ transits to vertices $z_3$ and $z_4$ with probabilities 1/3 and 2/3 respectively; otherwise, game $G(z_0)$ transits to vertex $z_5$ with a probability of 1. There are no transitions from any other vertices.
The probabilities $q_k$ that stochastic game $G(z_0)$ ends at stage $k$ are given:
$$q_1 = 0.5, \quad q_2 = 0, \quad q_3 = 1.$$
To construct the cooperative version of the stochastic game we find the cooperative strategy profile. This profile prescribes to play action profile (1,1,1) at vertex $z_0$. At vertex $z_0$ the game ends with a probability of 0.5. If the game does not end, it transits to vertex $z_1$ with a probability of 1/3, where the players play action profile (2,1,1) or (2,2,1), and to vertex $z_2$ with a probability of 2/3, where the players play action profile (2,1,2). At vertex $z_2$ the game does not end because $q_2 = 0$, and it transits to the vertices $z_3$ and $z_4$ with probabilities of 1/3 and 2/3 respectively. At vertices $z_3$ and $z_4$ the game terminates. Therefore, the set of the vertices of the cooperative subtree represented in Fig. 4 is $\{z_0, z_1, z_2, z_3, z_4\}$.
Fig. 4. Cooperative subtree of the game $G(z_0)$.
Find the values of the characteristic function using formulas (5) with boundary condition (6) for $S = N$, (7) with boundary condition (8) for $S \subset N$, and (9) for $S = \emptyset$. Calculations are given in Table 9. For further calculations we use package TUGlab of program Matlab [16].
Table 9. Characteristic functions $V(S, z)$ for $G(z)$, $z \in \{z_0, z_1, z_2, z_3, z_4, z_5\}$.
z \ S   {1}   {2}   {3}   {1,2}   {2,3}   {1,3}   {1,2,3}
z0      2     1     1.5   5.5     4.5     6       110/9
z1      2     0     1     3       4       3       6
z2      3     1     4/3   7       6       7       47/3
z3      1     1     1     4       4       3       6
z4      0     1     0     8       9       5       13
z5      2     0     1     3       4       3       6
Now we define the cores of the subgames beginning from the vertices of the cooperative subtree. We also check that they are all non-empty, so that the core can be used as a cooperative solution of the stochastic game. The systems of linear inequalities and equalities which determine the cores, and their graphical representations, are given in Tables 10 and 11. In the figures, the imputation set is depicted as a light-grey triangle and the cores are dark-grey sets. Notice that at vertices $z_1$ and $z_5$ the condition $\alpha_1 = 2$ holds for each element of the core, and the core is the segment connecting points (2,1,3) and (2,3,1).
For each vertex of the cooperative subtree we define the analogues of the cores denoted by $\Delta CO(z)$. Recall that for terminal vertices $z_1$, $z_3$, $z_4$ the sets $\Delta CO(\cdot)$ coincide with the cores $CO(\cdot)$. The systems of linear inequalities and equalities determining sets $\Delta CO(z_0)$ and $\Delta CO(z_2)$, and also their graphs, are presented in Table 12. Notice that the analogues of the cores $\Delta CO(\cdot)$ are non-empty for all vertices of the cooperative subtree. First, verify whether the core is strongly subgame-consistent if the payments to the players are realised according to the initially defined payoff functions, i.e., verify whether the payoff vectors in the vertices of the cooperative subtree belong to the corresponding sets $\Delta CO(\cdot)$ when the players realise the cooperative strategy profile:
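$$K^{z_0}(1,1,1) = (2,2,2) \in \Delta CO(z_0), \quad K^{z_1}(2,2,1) = (2,2,2) \in CO(z_1) = \Delta CO(z_1),$$
$$K^{z_2}(2,1,2) = (1,3,1) \notin \Delta CO(z_2),$$
$$K^{z_3}(1,2,1) = (2,2,2) \in CO(z_3) = \Delta CO(z_3), \quad K^{z_4}(2,2,1) = (3,6,4) \in CO(z_4) = \Delta CO(z_4).$$
Table 10. The core for vertices $z_0, z_1, z_5 \in CZ$.
z        Core
z0       $\alpha_1 \geqslant 2$, $\alpha_2 \geqslant 1$, $\alpha_3 \geqslant 1.5$, $\alpha_1 + \alpha_2 \geqslant 5.5$, $\alpha_1 + \alpha_3 \geqslant 6$, $\alpha_2 + \alpha_3 \geqslant 4.5$, $\alpha_1 + \alpha_2 + \alpha_3 = 110/9$
z1, z5   $\alpha_1 \geqslant 2$, $\alpha_2 \geqslant 0$, $\alpha_3 \geqslant 1$, $\alpha_1 + \alpha_2 \geqslant 3$, $\alpha_1 + \alpha_3 \geqslant 3$, $\alpha_2 + \alpha_3 \geqslant 4$, $\alpha_1 + \alpha_2 + \alpha_3 = 6$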
We can easily see that at vertex $z_2$ the condition of inclusion is not satisfied, so we cannot guarantee strong subgame consistency of an imputation from the core if the payments to the players are realised according to the initially defined payoff functions.
We show that condition (34) does not hold at vertex $z_2$. Following Definition 16, players may choose any imputation from the expected core of vertex $z$. Let them choose the imputations $(1.5, 3, 1.5) \in CO(z_3)$ and $(0, 8, 5) \in CO(z_4)$; then the sum on the left-hand side of inclusion (34) takes the form:
$$(1, 3, 1) + \tfrac{1}{3}(1.5, 3, 1.5) + \tfrac{2}{3}(0, 8, 5) = \left(\tfrac{3}{2}, \tfrac{28}{3}, \tfrac{29}{6}\right),$$
and this vector does not belong to the core $CO(z_2)$ (its first component is less than $V(\{1\}, z_2) = 3$), which means that condition (34) does not hold and the core is not strongly subgame-consistent.
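The counterexample above can be checked numerically. The following is a minimal sketch, not part of the original computations (which use TUGlab): the helper `in_core` and the variable names are illustrative, the characteristic function is copied from row $z_2$ of Table 9, and the continuation factor at $z_2$ is taken to be 1 because the game does not end there.

```python
from fractions import Fraction as F

# Characteristic function of the subgame at z2 (row z2 of Table 9).
V_Z2 = {
    (1,): F(3), (2,): F(1), (3,): F(4, 3),
    (1, 2): F(7), (2, 3): F(6), (1, 3): F(7),
    (1, 2, 3): F(47, 3),
}

def in_core(x, v):
    """Return True if payoff vector x (players 1..3) lies in the core of v."""
    if sum(x) != v[(1, 2, 3)]:
        return False  # collective rationality fails
    # every coalition must receive at least its characteristic value
    return all(sum(x[i - 1] for i in s) >= v[s] for s in v)

payment = (F(1), F(3), F(1))        # K^{z2}(2,1,2)
a_z3 = (F(3, 2), F(3), F(3, 2))     # chosen element of CO(z3)
a_z4 = (F(0), F(8), F(5))           # chosen element of CO(z4)
p_z3, p_z4 = F(1, 3), F(2, 3)       # transition probabilities from z2

# Left-hand side of inclusion (34) for these choices.
total = tuple(k + p_z3 * a + p_z4 * b
              for k, a, b in zip(payment, a_z3, a_z4))
print(total, in_core(total, V_Z2))
```

The vector satisfies collective rationality (its components sum to 47/3) but violates the individual bound of player 1, which is why the membership test fails.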
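Table 11. The core for the vertices $z_2, z_3, z_4 \in CZ$.
z    Core
z2   $\alpha_1 \geqslant 3$, $\alpha_2 \geqslant 1$, $\alpha_3 \geqslant 4/3$, $\alpha_1 + \alpha_2 \geqslant 7$, $\alpha_1 + \alpha_3 \geqslant 7$, $\alpha_2 + \alpha_3 \geqslant 6$, $\alpha_1 + \alpha_2 + \alpha_3 = 47/3$
z3   $\alpha_1 \geqslant 1$, $\alpha_2 \geqslant 1$, $\alpha_3 \geqslant 1$, $\alpha_1 + \alpha_2 \geqslant 4$, $\alpha_1 + \alpha_3 \geqslant 3$, $\alpha_2 + \alpha_3 \geqslant 4$, $\alpha_1 + \alpha_2 + \alpha_3 = 6$
z4   $\alpha_1 \geqslant 0$, $\alpha_2 \geqslant 1$, $\alpha_3 \geqslant 0$, $\alpha_1 + \alpha_2 \geqslant 8$, $\alpha_1 + \alpha_3 \geqslant 5$, $\alpha_2 + \alpha_3 \geqslant 9$, $\alpha_1 + \alpha_2 + \alpha_3 = 13$
Following Theorem 6, a collection of vectors $\beta(z)$ belonging to $\Delta CO(z)$, $z \in CZ$, is the distribution procedure of an imputation from the core $CO(z_0)$ of the initially defined game. By Theorem 6 we may also conclude that the collection of vectors $(\beta(z) : z \in CZ)$ is strongly subgame-consistent. For example, consider the following elements of the sets $\Delta CO(z)$, $z \in CZ$: $\beta(z_0) = (4, 1, 1)$, $\beta(z_1) = (2, 2, 2)$, $\beta(z_2) = (3, 1, 1)$, $\beta(z_3) = (2, 2, 2)$, $\beta(z_4) = (3, 6, 4)$. Calculate the mathematical expectations of the players' payoffs if in the vertices of the cooperative subtree they are paid in accordance with these vectors:
$$B(z_0) = (4, 1, 1) + 0.5\left\{\tfrac{1}{3}(2, 2, 2) + \tfrac{2}{3}\left((3, 1, 1) + \tfrac{1}{3}(2, 2, 2) + \tfrac{2}{3}(3, 6, 4)\right)\right\} = \left(\tfrac{56}{9}, \tfrac{29}{9}, \tfrac{25}{9}\right).$$
Obviously, $B(z_0) \in CO(z_0)$.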
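The expected payments $B(z_0)$ and the inclusion $B(z_0) \in CO(z_0)$ can be reproduced with exact arithmetic. This is an illustrative sketch only: the helper names (`add`, `scale`, `in_core`) are hypothetical, the payments $\beta(z)$ and transition probabilities are those of the example, and the characteristic function is row $z_0$ of Table 9.

```python
from fractions import Fraction as F

def add(*vs):
    """Componentwise sum of payoff vectors."""
    return tuple(sum(c) for c in zip(*vs))

def scale(c, v):
    """Multiply payoff vector v by scalar c."""
    return tuple(c * x for x in v)

# Payments at the vertices of the cooperative subtree (from the example).
beta = {"z0": (4, 1, 1), "z1": (2, 2, 2), "z2": (3, 1, 1),
        "z3": (2, 2, 2), "z4": (3, 6, 4)}

# Expected payments from z2 onwards (the game does not stop at z2).
B_z2 = add(beta["z2"], scale(F(1, 3), beta["z3"]), scale(F(2, 3), beta["z4"]))
# Expected payments from z0: the game continues with probability 0.5.
B_z0 = add(beta["z0"],
           scale(F(1, 2), add(scale(F(1, 3), beta["z1"]), scale(F(2, 3), B_z2))))

# Characteristic function at z0 (row z0 of Table 9).
V_Z0 = {(1,): F(2), (2,): F(1), (3,): F(3, 2),
        (1, 2): F(11, 2), (2, 3): F(9, 2), (1, 3): F(6),
        (1, 2, 3): F(110, 9)}

def in_core(x, v):
    """Core membership: efficiency plus all coalitional lower bounds."""
    return sum(x) == v[(1, 2, 3)] and all(
        sum(x[i - 1] for i in s) >= v[s] for s in v)

print(B_z0, in_core(B_z0, V_Z0))
```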
So, we have proposed a method of constructing a strongly subgame-consistent imputation distribution procedure when the core is chosen by the players as a set-valued optimality principle.
Table 12. Sets $\Delta CO(z)$ for vertices $z_0$ and $z_2$.
z    $\Delta CO(z)$
z0   $\alpha_1 \geqslant 2/3$, $\alpha_2 \geqslant 2/3$, $\alpha_3 \geqslant 8/9$, $\alpha_1 + \alpha_2 \geqslant 8/3$, $\alpha_1 + \alpha_3 \geqslant 19/6$, $\alpha_2 + \alpha_3 \geqslant 11/6$, $\alpha_1 + \alpha_2 + \alpha_3 = 6$
z2   $\alpha_1 \geqslant 8/3$, $\alpha_2 \geqslant 0$, $\alpha_3 \geqslant 1$, $\alpha_1 + \alpha_2 \geqslant 1/3$, $\alpha_1 + \alpha_3 \geqslant 8/3$, $\alpha_2 + \alpha_3 \geqslant -4/3$, $\alpha_1 + \alpha_2 + \alpha_3 = 5$
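The bounds in Table 12 follow from Table 9 through $\Delta V(S, z) = V(S, z) - (1 - q_k)EV(S, L(z))$. The sketch below recomputes them under the assumptions of the example (continuation probability 0.5 at $z_0$ and 1 at $z_2$, transitions 1/3 and 2/3 along the cooperative profile); the dictionary layout and the function name `delta_v` are illustrative choices, not the author's code.

```python
from fractions import Fraction as F

# Coalitions in the column order of Table 9.
COALITIONS = ["1", "2", "3", "12", "23", "13", "123"]
TABLE9 = {
    "z0": [F(2), F(1), F(3, 2), F(11, 2), F(9, 2), F(6), F(110, 9)],
    "z1": [F(2), F(0), F(1), F(3), F(4), F(3), F(6)],
    "z2": [F(3), F(1), F(4, 3), F(7), F(6), F(7), F(47, 3)],
    "z3": [F(1), F(1), F(1), F(4), F(4), F(3), F(6)],
    "z4": [F(0), F(1), F(0), F(8), F(9), F(5), F(13)],
}
# vertex -> (1 - q_k, {successor: transition probability}) under cooperation
SUCC = {
    "z0": (F(1, 2), {"z1": F(1, 3), "z2": F(2, 3)}),
    "z2": (F(1), {"z3": F(1, 3), "z4": F(2, 3)}),
}

def delta_v(z):
    """Delta V(S, z) = V(S, z) - (1 - q_k) * EV(S, L(z)) for every coalition S."""
    cont, succ = SUCC[z]
    return {
        S: TABLE9[z][j] - cont * sum(p * TABLE9[y][j] for y, p in succ.items())
        for j, S in enumerate(COALITIONS)
    }

for z in ("z0", "z2"):
    print(z, {S: str(val) for S, val in delta_v(z).items()})
```

The printed values are the right-hand sides of the inequalities defining $\Delta CO(z_0)$ and $\Delta CO(z_2)$.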
3. Cooperative stochastic games with infinite duration
3.1. Noncooperative stochastic games with infinite duration
In this section we consider stochastic games with infinite duration defined by Shapley in the paper (Shapley, 1953a). The main classical results on noncooperative stochastic games are presented in (Filar and Vrieze, 1997, Neyman and Sorin, 2003). As in the previous section, the game is realised in discrete time. The significant difference from the game considered in Section 2 is that now the game has an infinite duration, while the set of states which can be realised at any stage is finite and does not change over time. We first define a stochastic game and then describe the set of strategies and the payoff function of a player. Notice that the notations of this section, which are widely used in modern literature on stochastic games, differ slightly from the notations of Section 2.
Consider stochastic game $G$ defined by
1. The finite set of players $N = \{1, \ldots, n\}$.
2. The finite non-empty set of states $\Omega = \{\omega_1, \ldots, \omega_{\bar{\omega}}\}$.
3. The finite, non-empty set of available actions $A_i^\omega$ of player $i \in N$ in state $\omega \in \Omega$. The action of player $i \in N$ in state $\omega \in \Omega$ is an element $a_i^\omega \in A_i^\omega$. The action profile in state $\omega \in \Omega$ is a vector of players' actions $a^\omega = (a_i^\omega : i \in N)$. The set of action profiles in state $\omega$ is $A^\omega = A_1^\omega \times \ldots \times A_n^\omega$.
4. The finite payoff function $K_i^\omega : \prod_{j \in N} A_j^\omega \to \mathbb{R}$, for every player $i \in N$ and every state $\omega \in \Omega$.
5. The transition function $p(\cdot \mid \omega, a^\omega) : \Omega \times A^\omega \to \Delta(\Omega)$ from state $\omega \in \Omega$ and action profile $a^\omega \in \prod_{i \in N} A_i^\omega$. Here $\Delta(\Omega)$ is the set of probability distributions over set $\Omega$.
6. The initial state is determined by probability distribution
$$\pi_0 = (\pi_0^{\omega_1}, \ldots, \pi_0^{\omega}, \ldots, \pi_0^{\omega_{\bar{\omega}}}), \quad (38)$$
where $\pi_0^\omega$ is the probability that state $\omega$ is realised at the first stage of the game, $\sum_{\omega \in \Omega} \pi_0^\omega = 1$.
Here $\delta \in (0, 1)$ is a discount factor, the same for all players. Every state $\omega$ is determined by the $n$-person normal-form game
$$\langle N, \{A_i^\omega\}_{i \in N}, \{K_i^\omega\}_{i \in N} \rangle.$$
Time is discrete and game $G$ lasts for an infinite number of stages denoted by $t$. Stochastic game $G$ is realised in the following way:
1. Prior to the game, an initial state $\omega$ is chosen according to the probability distribution $\pi_0$, i.e., with probability $\pi_0^\omega$ the stochastic game starts in state $\omega$.
2. In state $\omega$ the players simultaneously choose their actions. Player $i$ chooses action $a_i^\omega \in A_i^\omega$, $i \in N$. Thus the action profile $a^\omega = (a_i^\omega : i \in N) \in A_1^\omega \times \ldots \times A_n^\omega$ is realised at the first stage. Player $i$ receives payoff $K_i^\omega(a^\omega)$. Once $a^\omega$ is announced to all players, the game transits to the next state $\omega' \in \Omega$ with probability $p(\omega' \mid \omega, a^\omega)$.
3. At the second stage, player $i \in N$ chooses action $a_i^{\omega'} \in A_i^{\omega'}$. Thus, at the second stage the action profile $a^{\omega'} = (a_i^{\omega'} : i \in N) \in A_1^{\omega'} \times \ldots \times A_n^{\omega'}$ is played and player $i$ receives payoff $K_i^{\omega'}(a^{\omega'})$.
4. The game is then played further in the way described above.
Finally, let $a_i^\omega \in \Delta(A_i^\omega)$ be a mixed action of player $i$ in state $\omega$, where $\Delta(A_i^\omega)$ is the set of probability measures over $A_i^\omega$.
A change of state may correspond to the presence of (positive or negative) shocks of different size. They are reflected in the players' payoffs.
The subgame of noncooperative stochastic game $G$ beginning from stage $k$ is denoted by $G(k)$.
To solve a stochastic game, we need to define the class of players' strategies and the method of calculating players' payoffs in the whole game. First, define players' strategies and distinguish two classes of strategies:
• The behavior strategy of player $i \in N$ is a function $\varphi_i = \{\varphi_i(k)\}_{k=1}^{\infty}$ with $\varphi_i(k) : h(k) \times \Omega \to \Delta(A_i^\omega)$, where $h(k)$ is the history of stage $k$, which is given by the collection of pairs consisting of states and action profiles which were realised at the previous stages until stage $k$: $((\omega(1), a(1)), (\omega(2), a(2)), \ldots, (\omega(k-1), a(k-1)))$. Denote the set of behavior strategies of player $i$ by $\Phi_i$ and the behavior strategy profile in the stochastic game by $\varphi = (\varphi_i : i \in N)$.
• We also consider a subset of the set of behavior strategies, namely the set of stationary strategies. A stationary strategy prescribes a player to choose the same (mixed) action in the same state independently of the history of the stage. To distinguish it from a behavior (not necessarily stationary) strategy of player $i$, denote a stationary strategy by $\eta_i = \{\eta_i(k)\}_{k=1}^{\infty}$, $\eta_i(k) : \Omega \to \Delta(A_i^\omega)$. Denote the profile of stationary strategies in a stochastic game by $\eta = (\eta_i : i \in N)$, and the set of stationary strategies of player $i$ by $H_i$, where $H_i \subset \Phi_i$.
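Now we determine players' payoffs in stochastic game (1):
• For a finite number of stages $t$, the payoff of player $i$ in a stochastic game is determined as a mathematical expectation:
$$E_i^t(\varphi) = \mathbb{E}_{\omega(1), \varphi}\, \frac{1}{t} \sum_{k=1}^{t} K_i^{\omega(k)}(a(k)),$$
i.e., a mathematical expectation of the payoff with respect to the initial state $\omega(1)$ and strategy profile $\varphi$, where $K_i^{\omega(k)}(a(k))$ is the payoff of player $i$ in state $\omega(k)$ realised at stage $k$, and $a(k)$ is the action profile in state $\omega(k)$ realised at stage $k$ in accordance with strategy profile $\varphi$.
• For the game $G$ with infinite duration, the payoff of player $i$ is determined as the discounted sum
$$E_i(\varphi) = \mathbb{E}_{\omega(1), \varphi} \sum_{k=1}^{\infty} \delta^{k-1} K_i^{\omega(k)}(a(k)) \quad (39)$$
as a mathematical expectation of the payoff with respect to the initial state $\omega(1)$ and profile $\varphi$.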
We formulate the main existence results for stochastic games with two players and with more than two players which are used in the present work.
Theorem 7. (Shapley, 1953a) A two-person zero-sum stochastic game with discount factor $\delta \in (0, 1)$ has a value for any initial state. Moreover, players' optimal strategies are stationary.
This result was extended to the case of nonzero-sum games with more than two players by Fink and Takahashi in 1964:
Theorem 8. (Fink, 1964, Takahashi, 1964) A nonzero-sum stochastic game with many players, discount factor $\delta \in (0, 1)$ and finite sets of states and strategies has a Nash equilibrium in stationary strategies for any initial state.
3.2. Stochastic games in stationary strategies
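In this section we provide formulas to calculate players' payoffs in a stochastic game in stationary strategies. Since the set of states $\Omega$ is finite, there are only $\bar{\omega}$ subgames $G^{\omega_1}, \ldots, G^{\omega_{\bar{\omega}}}$, with initial states $\omega_1, \ldots, \omega_{\bar{\omega}}$ respectively, because stationary strategies prescribe the same behavior in the same states even with different histories of the current stage. We denote a non-cooperative stochastic subgame in stationary strategies with initial state $\omega \in \Omega$ by $G^\omega$.
We now define the $\bar{\omega} \times \bar{\omega}$ matrix of transition probabilities in $G$:
$$\Pi(\eta) = \begin{pmatrix} p(\omega_1 \mid \omega_1, a^{\omega_1}) & \ldots & p(\omega_{\bar{\omega}} \mid \omega_1, a^{\omega_1}) \\ p(\omega_1 \mid \omega_2, a^{\omega_2}) & \ldots & p(\omega_{\bar{\omega}} \mid \omega_2, a^{\omega_2}) \\ \ldots & \ldots & \ldots \\ p(\omega_1 \mid \omega_{\bar{\omega}}, a^{\omega_{\bar{\omega}}}) & \ldots & p(\omega_{\bar{\omega}} \mid \omega_{\bar{\omega}}, a^{\omega_{\bar{\omega}}}) \end{pmatrix}, \quad (40)$$
whose entries $p(\omega' \mid \omega, a^\omega)$ are functions of the stationary strategy profile $\eta = (\eta_i : i \in N)$ such that $\eta_i(\omega) = a_i^\omega \in \Delta(A_i^\omega)$, $\omega \in \Omega$, $i \in N$, and $a^\omega = (a_1^\omega, \ldots, a_n^\omega)$ for any state $\omega \in \Omega$. The entry of matrix (40) in the $j$th row and the $l$th column is the probability of transition from the $j$th state to the $l$th state when players use strategy profile $\eta = (\eta_i : i \in N)$.
Stationarity allows us to write the expected payoff of player $i$ in an explicit form. Let $E_i^\omega(\eta)$ be the expected payoff of player $i$ in subgame $G^\omega$ when profile $\eta = (\eta_1, \ldots, \eta_n)$ in stationary strategies is adopted. The vectorial form of the expected payoffs is $E_i(\eta) = (E_i^{\omega_1}(\eta), \ldots, E_i^{\omega_{\bar{\omega}}}(\eta))^T$. The expected payoff of player $i$ in subgame $G^\omega$ satisfies the recurrent equation:
$$E_i^\omega(\eta) = K_i^\omega(a^\omega) + \delta \sum_{\omega' \in \Omega} p(\omega' \mid \omega, a^\omega) E_i^{\omega'}(\eta). \quad (41)$$
Given the matrix form of transition probabilities (40), rewrite equation (41) in a matrix form:
$$E_i(\eta) = K_i(a) + \delta \Pi(\eta) E_i(\eta), \quad (42)$$
where $K_i(a) = (K_i^{\omega_1}(a^{\omega_1}), \ldots, K_i^{\omega_{\bar{\omega}}}(a^{\omega_{\bar{\omega}}}))^T$. Equation (42) is equivalent to the equation
$$E_i(\eta) = (I - \delta \Pi(\eta))^{-1} K_i(a),$$
where $I$ is the identity matrix of size $\bar{\omega} \times \bar{\omega}$. Matrix $(I - \delta \Pi(\eta))^{-1}$ always exists for discount factor $\delta \in (0, 1)$. The payoff of player $i$ in game $G$ in stationary strategies, taking into account that the initial state is distributed according to $\pi_0$, is
$$\bar{E}_i(\eta) = \pi_0 E_i(\eta) = \pi_0 (I - \delta \Pi(\eta))^{-1} K_i(a). \quad (43)$$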
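The closed form above amounts to solving one linear system per player. The following is a minimal sketch with a hypothetical two-state game: the transition matrix `Pi`, the stage payoffs `K`, the discount factor and the initial distribution `pi0` are made-up illustrative data, and `solve` is a small exact Gauss-Jordan routine written for this example rather than a library call.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # pick a row with a non-zero pivot and move it into place
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

def discounted_payoffs(Pi, K, delta):
    """Return E = (I - delta * Pi)^{-1} K, the payoff vector over initial states."""
    n = len(Pi)
    A = [[(1 if r == c else 0) - delta * Pi[r][c] for c in range(n)]
         for r in range(n)]
    return solve(A, K)

# Hypothetical data: two states, transitions induced by a fixed stationary profile.
delta = F(1, 2)
Pi = [[F(1, 2), F(1, 2)],
      [F(0), F(1)]]
K = [F(2), F(1)]                 # stage payoffs of one player in each state
pi0 = [F(1, 2), F(1, 2)]         # initial distribution

E = discounted_payoffs(Pi, K, delta)
E_bar = sum(p * e for p, e in zip(pi0, E))
print(E, E_bar)
```

Exact fractions are used so that the result can be checked against recurrence (41) without rounding error; for larger state spaces a numerical linear solver would be the natural substitute.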
3.3. Cooperative stochastic games with infinite duration
Suppose that the players of game $G$ agree to cooperate and form the grand coalition $N$ in order to maximise their total payoff. The existence of the maximum of the discounted joint payoff follows from the theorem proved in (Shapley, 1953a), according to which the cooperative strategy of the grand coalition that yields the maximal payoff is stationary. Denote the pure stationary strategy of player $i$ as $\eta_i \in H_i$, where $H_i \subset \Phi_i$.4 The set of mixed stationary strategies of player $i$ is denoted as $\tilde{H}_i$, with $H_i \subset \tilde{H}_i$.
A cooperative strategy profile or cooperative solution maximising the sum of the expected players' payoffs in $G$ is denoted as $\eta^* = (\eta_1^*, \ldots, \eta_n^*)$, where5
$$\max_{\eta \in \prod_{i \in N} H_i} \sum_{i \in N} \bar{E}_i(\eta) = \sum_{i \in N} \bar{E}_i(\eta^*). \quad (44)$$
4 From now on we use the notation $\eta_i$ if player $i$ uses a stationary strategy in the game. When player $i$ uses a behaviour strategy (not necessarily stationary), we use the notation $\varphi_i$.
5 Without loss of generality we may find the maximum in equation (44) over the set of pure actions of coalition $N$.
In order to define the cooperative solution of the stochastic game, we determine the values of a characteristic function for any coalition $S \subset N$. This function describes how much collective payoff players can gain by forming a coalition. We denote the characteristic function as $V(S) = (V^{\omega_1}(S), \ldots, V^{\omega_{\bar{\omega}}}(S))$. Following (Kohlberg and Neyman, 2015), let $V(S)$ be the minmax value of the two-person zero-sum game $G_S$ between coalition $S$ and coalition $N \setminus S$.6 Before introducing the characteristic function, denote the stationary strategies of coalitions $S$ and $N \setminus S$ as $\eta_S \in H_S = \prod_{i \in S} H_i$ and $\eta_{N \setminus S} \in H_{N \setminus S} = \prod_{i \in N \setminus S} H_i$, respectively.
For any coalition $S \subset N$ we assume that players in $S$ play in the interests of the coalition. Therefore, the actions and strategies of the players in $S$ are correlated (Aumann, 1974).
In state $\omega \in \Omega$ the correlated actions of the players from coalition $S$ are $a_S^\omega \in \Delta(A_S^\omega)$, where $A_S^\omega = \prod_{i \in S} A_i^\omega$. The correlated stationary strategies of players from coalitions $S$ and $N \setminus S$ are $\eta_S(\omega) \in \Delta(A_S^\omega)$ and $\eta_{N \setminus S}(\omega) \in \Delta(A_{N \setminus S}^\omega)$, respectively.
Let the sets of correlated stationary strategies of coalitions $S$ and $N \setminus S$ be $\tilde{H}_S$ and $\tilde{H}_{N \setminus S}$, respectively.
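Begin the construction of the characteristic function by examining the grand coalition, $S = N$. The Bellman equation for the characteristic function $V(N)$ represents the maximal total payoff of coalition $N$:
$$V(N) = \max_{\eta \in \prod_{i \in N} H_i} \sum_{i \in N} E_i(\eta) = \sum_{i \in N} K_i(a^*) + \delta \Pi(\eta^*) V(N), \quad (45)$$
where $\eta^*$ is the cooperative strategy profile satisfying condition (44), $\eta^*(\omega) = a^{\omega*}$, $\omega \in \Omega$, and $K_i(a^*) = (K_i^{\omega_1}(a^{\omega_1 *}), \ldots, K_i^{\omega_{\bar{\omega}}}(a^{\omega_{\bar{\omega}} *}))^T$. From (45), we can infer the vector $V(N)$:
$$V(N) = (I - \delta \Pi(\eta^*))^{-1} \sum_{i \in N} K_i(a^*), \quad (46)$$
where $I$ is the identity $\bar{\omega} \times \bar{\omega}$ matrix and $\Pi(\eta^*)$ is the $\bar{\omega} \times \bar{\omega}$ matrix of transition probabilities in $G$ when players use the strategy profile $\eta^*$. Matrix $\Pi(\eta^*)$ is described in detail by (40).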
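The Bellman equation for the grand coalition can also be solved iteratively. The sketch below uses value iteration over pure joint actions, which is a swapped-in standard technique rather than the matrix inversion used in the text; the two-state data (`total_payoff`, `transition`) and the function name are hypothetical and serve only to illustrate the fixed-point structure.

```python
def grand_coalition_value(total_payoff, transition, delta, tol=1e-12):
    """Value iteration for
    V[w] = max_a { total_payoff[w][a] + delta * sum_w' transition[w][a][w'] * V[w'] }.
    Returns the value vector and a maximising pure stationary joint action per state."""
    states = range(len(total_payoff))
    V = [0.0 for _ in states]
    while True:
        V_new, policy = [], []
        for w in states:
            candidates = [
                total_payoff[w][a]
                + delta * sum(p * V[w2] for w2, p in enumerate(transition[w][a]))
                for a in range(len(total_payoff[w]))
            ]
            best = max(range(len(candidates)), key=candidates.__getitem__)
            policy.append(best)
            V_new.append(candidates[best])
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new, policy
        V = V_new

# Hypothetical data: 2 states, 2 joint actions of coalition N in each state.
# total_payoff[w][a] is the sum of all players' stage payoffs in state w under action a.
total_payoff = [[4.0, 5.0],
                [1.0, 0.0]]
# transition[w][a] is the probability row over next states.
transition = [[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]]

V_N, coop_actions = grand_coalition_value(total_payoff, transition, delta=0.5)
print(V_N, coop_actions)
```

Because the discounted Bellman operator is a contraction for $\delta \in (0, 1)$, the iteration converges; once the maximising actions are known, the same values can be obtained in one step from the matrix formula.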
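We define next the value $V^\omega(S)$ of coalition $S$ as the minmax payoff in the zero-sum game $G_S^\omega$ with initial state $\omega$:
$$V^\omega(S) = \min_{\eta_{N \setminus S}} \max_{\eta_S} \sum_{i \in S} E_i^\omega(\eta_S, \eta_{N \setminus S}) = \max_{\eta_S} \min_{\eta_{N \setminus S}} \sum_{i \in S} E_i^\omega(\eta_S, \eta_{N \setminus S}). \quad (47)$$
In equation (47), the maximum in $\min_{\eta_{N \setminus S}} \max_{\eta_S} \sum_{i \in S} E_i^\omega(\eta_S, \eta_{N \setminus S})$ is found over the set of pure strategies of coalition $S$, while the minimum in $\max_{\eta_S} \min_{\eta_{N \setminus S}} \sum_{i \in S} E_i^\omega(\eta_S, \eta_{N \setminus S})$ is found over the set of pure strategies of coalition $N \setminus S$.
The Bellman equation for the characteristic function $V^\omega(S)$ is
$$V^\omega(S) = \min_{\eta_{N \setminus S} \in \tilde{H}_{N \setminus S}} \max_{\eta_S \in H_S} \sum_{i \in S} E_i^\omega(\eta_S, \eta_{N \setminus S}) = \sum_{i \in S} E_i^\omega(\hat{\eta}_S, \hat{\eta}_{N \setminus S})$$
$$= \sum_{i \in S} K_i^\omega(\hat{a}_S^\omega, \hat{a}_{N \setminus S}^\omega) + \delta \sum_{\omega' \in \Omega} p(\omega' \mid \omega, (\hat{a}_S^\omega, \hat{a}_{N \setminus S}^\omega)) V^{\omega'}(S), \quad (48)$$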
6 The existence of the minmax value of two-player discounted stochastic game is proved by Shapley (1953a).
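where $(\hat{a}_S^\omega, \hat{a}_{N \setminus S}^\omega)$ is a profile in correlated actions in state $\omega \in \Omega$ such that $\hat{\eta}_S(\omega) = \hat{a}_S^\omega$, $\hat{\eta}_{N \setminus S}(\omega) = \hat{a}_{N \setminus S}^\omega$, and $K_i(\hat{a}_S, \hat{a}_{N \setminus S}) = (K_i^{\omega_1}(\hat{a}_S^{\omega_1}, \hat{a}_{N \setminus S}^{\omega_1}), \ldots, K_i^{\omega_{\bar{\omega}}}(\hat{a}_S^{\omega_{\bar{\omega}}}, \hat{a}_{N \setminus S}^{\omega_{\bar{\omega}}}))^T$. We then rewrite equation (48) in a matrix form:
$$V(S) = (I - \delta \Pi(\hat{\eta}_S, \hat{\eta}_{N \setminus S}))^{-1} \sum_{i \in S} K_i(\hat{a}_S, \hat{a}_{N \setminus S}). \quad (49)$$
Finally, we define the characteristic function $\bar{V}(S)$ for the whole stochastic game as:
$$\bar{V}(S) = \pi_0 V(S), \quad (50)$$
for any coalition $S \subset N$, where $V(S) = (V^{\omega_1}(S), \ldots, V^{\omega_{\bar{\omega}}}(S))$, and $V^\omega(S)$ is the value of the characteristic function of subgame $G^\omega$ for $S$.
The characteristic function has the following properties. First, for any state $\omega \in \Omega$,
$$V^\omega(\emptyset) = 0. \quad (51)$$
Second, the characteristic functions $\bar{V}(S)$ and $V^\omega(S)$ determined by (50) and (45)-(51), respectively, are superadditive (Aumann and Peleg, 1960). In other words, for any disjoint coalitions $S, T \subset N$, $S \cap T = \emptyset$, the inequality $V(S) + V(T) \leqslant V(S \cup T)$ holds. Superadditivity implies that the value of two disjoint coalitions is at least as great when they play together as when they act separately. If this property does not hold, coalition $S \cup T$ will not be formed.7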
We are now in a position to define the cooperative version of stochastic game 17 and its subgames.
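Definition 18. A cooperative stochastic game $G_c$, corresponding to a stochastic game $G$, is a pair $\langle N, \bar{V} \rangle$, where $N$ is the set of players and $\bar{V} : 2^N \to \mathbb{R}$ is the characteristic function calculated by (50). A cooperative stochastic subgame $G_c^\omega$ starting from state $\omega$ is a pair $\langle N, V^\omega \rangle$, where $V^\omega : 2^N \to \mathbb{R}$ is the characteristic function calculated by (45), (47) and (51).
When forming the grand coalition, players should decide not only what strategies to use to maximise the joint payoff but also how to allocate the total payoff. The next definitions introduce the allocation rule or solution (also called imputation) of $G_c^\omega$ and $G_c$, respectively. To determine an imputation of the joint payoff (44) we need the values of the characteristic function for every coalition $S \subset N$.
Definition 19. An imputation in the subgame $G_c^\omega$, $\omega \in \Omega$, is a vector $\alpha^\omega = (\alpha_1^\omega, \ldots, \alpha_n^\omega)$ satisfying: (i) $\sum_{i \in N} \alpha_i^\omega = V^\omega(N)$, and (ii) $\alpha_i^\omega \geqslant V^\omega(\{i\})$ for any $i \in N$. The set of imputations in $G_c^\omega$ is denoted as $U^\omega$.
Definition 20. An imputation in the game $G_c$ is a vector $\bar{\alpha} = (\bar{\alpha}_1, \ldots, \bar{\alpha}_n)$, where $\bar{\alpha}_i = \pi_0 \alpha_i$, $\alpha_i = (\alpha_i^{\omega_1}, \ldots, \alpha_i^{\omega_{\bar{\omega}}})^T$, and $(\alpha_1^\omega, \ldots, \alpha_n^\omega) = \alpha^\omega \in U^\omega$, $\omega \in \Omega$. The set of imputations in $G_c$ is denoted as $U$.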
7 The property of superadditivity is not needed and it is often omitted in cooperative game theory, because in real life there are a lot of motivations to consider both profitable and non-profitable coalitions. As Aumann and Dreze (1974, p. 233) note, there are arguments for superadditivity that are quite persuasive, but, as they also note, superadditivity is quite problematic in some economic applications.
By Definition 19, an imputation satisfies the following conditions: (i) the sum of the components of the imputation equals the value of the characteristic function of the grand coalition (group rationality condition), and (ii) any player obtains no less than she may get by non-cooperative play (individual rationality condition). The set of imputations is non-empty in any subgame $G_c^\omega$, $\omega \in \Omega$, and in the whole cooperative stochastic game $G_c$, since the characteristic function determined by equations (45)-(51) is superadditive.
3.4. Principles of stable cooperation
In cooperative games, the solution of a game is determined by an optimality principle. The optimality principle is assumed to be a subset of the imputation set. Therefore, the optimality principle contains one or more imputations (solutions) of a cooperative game, but sometimes it may be empty. For example, the core may be empty, in which case the solution of the cooperative game does not exist according to this optimality principle. The Shapley value as an optimality principle always exists and contains a unique imputation. Therefore, the solution of a cooperative game always exists and is unique according to this optimality principle. The solution of a cooperative stochastic game means an imputation.8 Here we do not consider the problem of choosing a unique imputation from a set but assume that the optimality principle contains exactly one imputation. Examples of one-point solutions are the Shapley value (Shapley, 1953b), the Von Neumann-Morgenstern solution (von Neumann and Morgenstern, 1944) and the nucleolus (Schmeidler, 1969). The realisation of an imputation in a cooperative stochastic game requires the satisfaction of some principles, which in turn ensure stable cooperation in the game. Following (Petrosyan and Zenkevich, 2015), we formulate the main principles of stable cooperation, including subgame consistency, strategic support (or strategic stability) and irrational-behaviour-proofness of the solution of a cooperative stochastic game. Each principle of stable cooperation is defined and analysed separately.
Subgame consistency. The principle of subgame consistency ensures that in any subgame the cooperative solution is determined according to the initially chosen allocation rule. This concept deserves a detailed explanation. Players agree on cooperation before the game and adopt an imputation following the allocation mechanism. During the game, they play the cooperative strategies $\eta_i^*$, $i \in N$, which maximise their total payoff. In any subgame beginning in a certain state, a player is able to derive her expected payoff for the remainder of the game. If at some intermediate stage of the game players decide to calculate their expected payoffs in the subgame according to the initially defined payoff functions, then most often these expected discounted payoffs do not coincide with an imputation calculated in accordance with the initially chosen optimality principle. This means subgame inconsistency of a cooperative solution (or optimality principle). If for any subgame the discounted players' payoffs coincide with the imputations calculated in accordance with the initial optimality principle, the cooperative solution (or optimality principle) is subgame consistent (see Petrosyan, 1977). To make the cooperative solution subgame consistent, we
8 We further consider the case when the solution of a cooperative stochastic game is an imputation set consisting of more than one imputation.
propose the transfer mechanism, called imputation distribution procedure (IDP).9 Originally, the idea of IDP was proposed by L. A. Petrosyan for differential games (Petrosyan and Danilov, 1979).
This mechanism leads to a modification of the players' payoffs in a dynamic game, which we call the regularisation of the game with respect to $\bar{\alpha}$, an imputation in cooperative game $G_c$. This modified game ensures several advantages to the players. First, subgame consistency is ensured through the "new" payoff functions. Second, the expected payoffs in the regularised game are equal to the components of imputation $\bar{\alpha}$. Third, the sum of the payoffs in any state of the regularised game is equal to the sum of the payoffs in the correspondent state of the initial game. For instance, suppose that players choose the Shapley value at the beginning of the game as an allocation rule. In this case, subgame consistency guarantees that, in each subgame, the vector of the players' payoffs for the remaining stages is the Shapley value calculated for this subgame.
Let players adopt a cooperative solution in the stochastic game, i.e., they choose imputation $\alpha^\omega = (\alpha_1^\omega, \ldots, \alpha_n^\omega)^T \in U^\omega$ for every subgame $G^\omega$. The problem is to determine the transfers that ensure the expected payoff $\alpha_i^\omega$ for player $i$ in every subgame $G^\omega$. If transfers are based on the payoff functions in every state, then players can hardly expect to get the payoff based on the initially chosen allocation rule. To overcome this, we propose a rule to transfer the players' total payoff, based on the method for differential games (Petrosyan and Danilov, 1979).
Since strategies are stationary, the number of states corresponds to the number of relevant "different" histories. In turn, when players implement cooperative strategies in the stochastic game (1), the number of relevant subgames is equal to the number of possible states. Therefore, we need to determine a vector of transfers $\beta_i = (\beta_i^{\omega_1}, \ldots, \beta_i^{\omega_{\bar{\omega}}})^T$ for every player $i \in N$, where $\beta_i^\omega$ is the transfer to player $i$ in state $\omega \in \Omega$.
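Definition 21. The set of transfers $\{\beta_i\}_{i \in N}$ is an IDP if the following conditions are satisfied:
1. In any state $\omega \in \Omega$ the sum of the transfers to the players is equal to the sum of the players' payoffs in cooperative strategy profile $\eta^*$:
$$\sum_{i \in N} \beta_i^\omega = \sum_{i \in N} K_i^\omega(a^{\omega *}). \quad (52)$$
2. The expected sum of transfers to player $i \in N$ in the game $G$ is equal to the $i$th component of the initially chosen imputation $\bar{\alpha}$.
We then define the conditions of subgame consistency for the imputation and IDP.
Definition 22. Imputation $\bar{\alpha} = (\bar{\alpha}_1, \ldots, \bar{\alpha}_n)$ and the corresponding IDP $\{\beta_i\}_{i \in N}$ are subgame consistent if the expected sum of transfers to any player $i$ in any subgame $G^\omega$ is equal to the $i$th component of the initially chosen imputation in subgame $G^\omega$ (calculated in accordance with the same principle as imputation $\bar{\alpha}$ of the whole game).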
9 Imputation distribution procedure was adapted for the class of discounted stochastic games in (Baranova and Petrosjan, 2006). See Petrosjan and Danilov (1979), and Baranova and Petrosjan (2006).
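The following statement suggests the method of IDP construction for imputation $\bar{\alpha}$.
Lemma 2. Let imputation $\bar{\alpha}$ be such that $\bar{\alpha} = (\bar{\alpha}_1, \ldots, \bar{\alpha}_n) \in U$, where $\bar{\alpha}_i = \pi_0 \alpha_i$, $\alpha_i = (\alpha_i^{\omega_1}, \ldots, \alpha_i^{\omega_{\bar{\omega}}})^T$ and $(\alpha_1^\omega, \ldots, \alpha_n^\omega) = \alpha^\omega \in U^\omega$. Then the collection $\{\beta_i\}_{i \in N}$, where $\beta_i$ is calculated by
$$\beta_i = (I - \delta \Pi(\eta^*)) \alpha_i, \quad (53)$$
is an IDP of imputation $\bar{\alpha}$ in game $G$.
Proof. Verify the IDP condition:
$$\sum_{i \in N} \beta_i^\omega = \sum_{i \in N} K_i^\omega(a^{\omega *}),$$
where $a^{\omega *}$ is the action profile prescribed by $\eta^*$ in state $\omega$.
It is easy to show that $\beta_i$ from (53) satisfies (52). Since $\sum_{i \in N} \beta_i$ is equal to $(I - \delta \Pi(\eta^*)) \sum_{i \in N} \alpha_i = (I - \delta \Pi(\eta^*)) V(N)$, and $V(N)$ is determined by (46), equation (52) holds.
The expected sum of transfers to player $i$ in the subgames, denoted as $B_i$, with the new transfer in state $\omega \in \Omega$ satisfies the recurrent equation:
$$B_i^\omega = \beta_i^\omega + \delta \sum_{\omega' \in \Omega} p(\omega' \mid \omega, a^{\omega *}) B_i^{\omega'},$$
or, in vectorial form:
$$B_i = \beta_i + \delta \Pi(\eta^*) B_i, \quad (54)$$
where $B_i = (B_i^{\omega_1}, \ldots, B_i^{\omega_{\bar{\omega}}})^T$. Equation (54) is equivalent to:
$$B_i = (I - \delta \Pi(\eta^*))^{-1} \beta_i. \quad (55)$$
Given the second condition of IDP and equation (55) we obtain:
$$\alpha_i = (I - \delta \Pi(\eta^*))^{-1} \beta_i, \quad (56)$$
where $\alpha_i = (\alpha_i^{\omega_1}, \ldots, \alpha_i^{\omega_{\bar{\omega}}})^T$, $(\alpha_1^\omega, \ldots, \alpha_n^\omega) = \alpha^\omega \in U^\omega$. Equation (56) can be rewritten equivalently as:
$$\beta_i = (I - \delta \Pi(\eta^*)) \alpha_i. \quad (57)$$
Finally, equation (53) is equivalent to:
$$\alpha_i = \beta_i + \delta \Pi(\eta^*) \alpha_i. \quad (58)$$
The second term on the right-hand side of (58) is the expected value of the transfers calculated for the subgame from the next stage onwards. Suppose that the imputation for each subgame is chosen following the same allocation rule that has been chosen by the players at the beginning of the game. If players maintain the cooperative strategy profile $\eta^*$, then the expected sum of transfers to player $i$ is equal to the correspondent component of imputation $\bar{\alpha}$ in cooperative stochastic game $G_c$.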
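Formula (53) and the two IDP conditions can be illustrated on a small instance. In the sketch below everything numeric is hypothetical: a two-state transition matrix under the cooperative profile, total stage payoffs `K_total`, and two players' subgame imputations `alpha` whose components sum to $V(N) = (20/3, 4)$, the value implied by these data. The function names `idp` and `expected_transfers` are illustrative.

```python
from fractions import Fraction as F

def mat_vec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def idp(alpha_i, Pi, delta):
    """Transfers beta_i = (I - delta * Pi) alpha_i, cf. formula (53)."""
    return [a - delta * pa for a, pa in zip(alpha_i, mat_vec(Pi, alpha_i))]

def expected_transfers(beta_i, Pi, delta, stages):
    """Iterate B <- beta_i + delta * Pi * B, cf. (54), for a fixed number of stages."""
    B = [F(0)] * len(beta_i)
    for _ in range(stages):
        B = [b + delta * pb for b, pb in zip(beta_i, mat_vec(Pi, B))]
    return B

# Hypothetical data (assumptions for illustration only).
delta = F(1, 2)
Pi = [[F(1, 2), F(1, 2)],      # transitions under the cooperative profile
      [F(0), F(1)]]
K_total = [F(4), F(2)]         # sum of all players' stage payoffs in each state
alpha = {1: [F(10, 3), F(1)],  # subgame imputations; they sum to V(N) = (20/3, 4)
         2: [F(10, 3), F(3)]}

beta = {i: idp(a, Pi, delta) for i, a in alpha.items()}

# Condition (52): transfers in each state sum to the total stage payoff.
state_sums = [beta[1][w] + beta[2][w] for w in range(2)]
# Second IDP condition: discounted expected transfers approach alpha_i.
B1 = expected_transfers(beta[1], Pi, delta, stages=60)
print(beta, state_sums, [float(b) for b in B1])
```

The truncated sum after 60 stages differs from $\alpha_1$ only by the discounted tail, which illustrates that the transfers redistribute the stage payoffs without changing what each player expects from any state onwards.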
10 Notice that IDP is uniquely defined by formula (53) if optimality principle provides unique cooperative solution a (e.g., if the solution is nucleolus, the Shapley value or another single-valued solution). If the cooperative solution is the set of imputations containing more than one imputation, the method of IDP construction should be modified ( see Parilina and Zaccour, 2015).
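Given Definition 21, for every imputation $\bar{\alpha} = (\bar{\alpha}_1, \ldots, \bar{\alpha}_n) \in U$, where $\bar{\alpha}_i = \pi_0 \alpha_i$, $\alpha_i = (\alpha_i^{\omega_1}, \ldots, \alpha_i^{\omega_{\bar{\omega}}})^T$, $(\alpha_1^\omega, \ldots, \alpha_n^\omega) = \alpha^\omega \in U^\omega$, we can define the regularisation of game $G$.
Definition 23. An $\bar{\alpha}$-regularisation of stochastic game $G$ (subgame $G^\omega$, $\omega \in \Omega$) is a non-cooperative stochastic game $G_{\bar{\alpha}}$ (subgame $G_{\bar{\alpha}}^\omega$) if, for any player $i \in N$ in state $\omega$, payoff function $K_i^{\bar{\alpha}, \omega}(a^\omega)$ is defined as:
$$K_i^{\bar{\alpha}, \omega}(a^\omega) = \begin{cases} \beta_i^\omega, & \text{if } a^\omega = a^{\omega *}, \\ K_i^\omega(a^\omega), & \text{otherwise}, \end{cases} \quad (59)$$
where $\beta_i^\omega$ is a component of the IDP of player $i$ defined by (53) and $a^{\omega *} = \eta^*(\omega)$.
Remark 6. The $\bar{\alpha}$-regularisation changes the payoff functions in any state $\omega \in \Omega$ only when action profiles $a^{\omega *} = \eta^*(\omega)$ are adopted. We may expect that players agree to modify the initial payoff functions to be sure that their cooperative solution satisfies the principle of subgame consistency.
Theorem 9. Let $\bar{\alpha} = (\bar{\alpha}_1, \ldots, \bar{\alpha}_n) \in U$ be the initially chosen imputation in game $G$, where $\bar{\alpha}_i = \pi_0 \alpha_i$, $\alpha_i = (\alpha_i^{\omega_1}, \ldots, \alpha_i^{\omega_{\bar{\omega}}})^T$, $(\alpha_1^\omega, \ldots, \alpha_n^\omega) = \alpha^\omega \in U^\omega$. Then the $\bar{\alpha}$-regularisation $G_{\bar{\alpha}}$ of game $G$ satisfies the principle of subgame consistency, i.e., the cooperative solution $\bar{\alpha}$ is subgame consistent in game $G_{\bar{\alpha}}$.
Proof. At the beginning of the game, players choose the following imputation: $\bar{\alpha} = (\bar{\alpha}_1, \ldots, \bar{\alpha}_n) \in U$, where $\bar{\alpha}_i = \pi_0 \alpha_i$, $\alpha_i = (\alpha_i^{\omega_1}, \ldots, \alpha_i^{\omega_{\bar{\omega}}})^T$, $(\alpha_1^\omega, \ldots, \alpha_n^\omega) = \alpha^\omega \in U^\omega$, and the cooperative strategy profile $\eta^*$. The $\bar{\alpha}$-regularisation of game $G$ is determined by Definition 23; thus the set of transfers $\{\beta_i\}_{i \in N}$ defined by (53) is an IDP. To show that the $\bar{\alpha}$-regularisation $G_{\bar{\alpha}}$ satisfies the principle of subgame consistency, we need to calculate the discounted payoffs in every subgame of the game $G_{\bar{\alpha}}$ when the cooperative strategy profile $\eta^*$ is played. Consider any subgame $G_{\bar{\alpha}}^\omega$ starting from state $\omega \in \Omega$. The discounted payoff of player $i$ satisfies
$$E_i^\omega(\eta^*) = \beta_i^\omega + \delta \sum_{\omega' \in \Omega} p(\omega' \mid \omega, a^{\omega *}) E_i^{\omega'}(\eta^*), \quad (60)$$
where $E_i(\eta^*) = (E_i^{\omega_1}(\eta^*), \ldots, E_i^{\omega_{\bar{\omega}}}(\eta^*))^T$ and $E_i^\omega(\eta^*)$ is the discounted payoff of player $i$ in subgame $G_{\bar{\alpha}}^\omega$ starting from state $\omega$ when players adopt $\eta^*$. Equation (60) can be rewritten in a vector form:
$$E_i(\eta^*) = \beta_i + \delta \Pi(\eta^*) E_i(\eta^*),$$
or
$$E_i(\eta^*) = (I - \delta \Pi(\eta^*))^{-1} \beta_i.$$
Substituting $\beta_i$ from (53), we obtain
$$E_i(\eta^*) = (I - \delta \Pi(\eta^*))^{-1} (I - \delta \Pi(\eta^*)) \alpha_i = \alpha_i.$$
Therefore, the $\bar{\alpha}$-regularisation $G_{\bar{\alpha}}$ satisfies the principle of subgame consistency.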
Definition 23 and Theorem 9 provide a method of constructing subgame consistent transfers in every state of a stochastic game. The imputation distribution procedure $\beta_i^{\omega_1}, \ldots, \beta_i^{\omega_{\bar{\omega}}}$ in states $\omega_1, \ldots, \omega_{\bar{\omega}}$ ensures that player $i$ receives the same expected payoff in game $G_{\bar{\alpha}}$ (subgame $G_{\bar{\alpha}}^\omega$) as she planned to receive in cooperative stochastic game $G_c$ (subgame $G_c^\omega$). Moreover, the expected payoff from future transfers is in line with the same allocation rule chosen by the players at the beginning of the game.
Strategic support. The principle of strategic support ensures that, along the whole game, an individual deviation from cooperative strategy profile in a regularized game does not yield a higher payoff than cooperation. In other words, it guarantees the existence of the Nash equilibrium in a regularized game with the same payoffs that players expect to receive with the cooperative solution (which was the basis of regularization). This principle was proposed in (Petrosyan, 1998). We reformulate the principle and then find conditions under which Nash equilibrium is subgame perfect (see Selten, 1975) in a regularized game with the payoffs described above.
Subgame perfectness is important for dynamic games because it guarantees the existence of a Nash equilibrium in any subgame such that the players' payoffs coincide with the cooperative ones. Comparing our approach with the standard analysis of deterministic (repeated) games, the condition of strategic support for stochastic (or dynamic) games corresponds to the condition of the existence of a subgame perfect Nash equilibrium in grim-trigger strategies. The main difference is that, in our setting, players first regularize the initial game by adapting the IDP to achieve subgame consistency.
Suppose players come to a cooperative agreement, i.e., find a cooperative strategy profile $\eta^*$. If a player deviates from the cooperative strategy profile, then the other players switch to a trigger strategy from the next stage onwards to punish the deviating player. The strict definition of a behaviour strategy used by players in the Nash equilibrium is given below (see formula (63)). Here we assume that a stochastic game is a game with perfect monitoring, that is, all players know the state of the current stage and the history of the stage.
To begin with, we define the Nash equilibrium in a regularized stochastic game.
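Denote the discounted payoff of player $i$ in the subgame of the $\alpha$-regularisation $G^{\alpha}$ starting from state $\omega$ as $E_i^{\alpha,\omega}(\cdot)$.
Definition 24. A Nash equilibrium in the regularised game $G^{\alpha}$ is a behaviour strategy profile $\varphi^* = (\varphi_1^*, \ldots, \varphi_n^*)$ such that, for any player $i \in N$ and for any state $\omega \in \Omega$, the condition
$$E_i^{\alpha,\omega}(\varphi_i^*, \varphi_{N\setminus i}^*) \ge E_i^{\alpha,\omega}(\varphi_i, \varphi_{N\setminus i}^*) \qquad (61)$$
holds for any behaviour strategy $\varphi_i$ of player $i$.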
We assume that the behaviour strategy exhibits the following structure. If, in the history of stage k, all players use their cooperative strategies, then they implement the cooperative correlated actions also in stage k. Conversely, if before stage k the individual deviation of a player $z \in N$ is observed, then the coalition $N \setminus z$ punishes player $z$. We assume that the punishment ensures that player $z$'s payoff is at most her minimax value in any subgame.11 Notice that, since we focus on a Nash equilibrium, we need to consider only individual deviations from this profile.12 If more than one member of the coalition deviates, the player may implement any strategy from her set of strategies.
We now outline the condition under which the Nash equilibrium with players' payoffs equal to the cooperative ones exists. For convenience, define
$$F(\{i\}) = (F^{\omega_1}(\{i\}), \ldots, F^{\omega_t}(\{i\}))^T,$$
$$F^{\omega}(\{i\}) = \max_{a_i^{\omega} \in A_i^{\omega}} \Big\{ K_i^{\omega}(a_i^{\omega}, a_{N\setminus i}^{\omega *}) + \delta \sum_{\omega' \in \Omega} p(\omega'|\omega, (a_i^{\omega}, a_{N\setminus i}^{\omega *})) V^{\omega'}(\{i\}) \Big\}.$$
The following inequality:
$$\alpha_i = (I - \delta\Pi(\eta^*))^{-1}\beta_i \ge F(\{i\}), \qquad (62)$$
compares two payoffs for each subgame: (i) the payoff when players adopt the cooperative strategy profile on the left-hand side, and (ii) the payoff of deviation plus future punishment on the right-hand side. If the first payoff is greater than or equal to the second one, the player gets no benefit from deviation. If this is true for any player and any state, then the principle of strategic stability is satisfied. This result is summarised in the following proposition.
Proposition 1. If, in an $\alpha$-regularisation $G^{\alpha}$ such that $\bar\alpha = \pi_0 \alpha$, inequality (62) holds for any player $i \in N$, then there exists a behaviour strategy profile $\varphi$ which is a Nash equilibrium with players' payoffs $(\bar\alpha_1, \ldots, \bar\alpha_n)$.
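Proof. We determine the behaviour strategy profile $\varphi = (\varphi_1, \ldots, \varphi_n)$, where strategies $\varphi_i$, $i \in N$, are:
$$\varphi_i(h(k)) = \begin{cases} a_i^{\omega *}, & \text{if } \omega(k) = \omega,\ h(k) \subset h^*; \\ a_i^{\omega}(z), & \text{if } \omega(k) = \omega \text{ and } \exists\, l \in [1, k-1],\ z \in N,\ i \ne z:\ h(l) \subset h^* \text{ and } (\omega(l), a(l)) \notin h^*, \\ & \text{but } (\omega(l), (a_z^*(l), a_{N\setminus z}(l))) \in h^*; \\ \text{any}, & \text{otherwise}, \end{cases} \qquad (63)$$
where $a_i^{\omega *}$ corresponds to player $i$'s cooperative action, while $a_i^{\omega}(z) \in \Delta(A_i^{\omega})$ is player $i$'s punishment that, together with actions $a_{i'}^{\omega}(z) \in \Delta(A_{i'}^{\omega})$ of players $i' \ne i$, $i' \in N \setminus z$, forms the action (either in pure or mixed strategies) of coalition $N \setminus z$ against player $z$.13 The proof follows the idea of the folk theorem for stochastic games (Dutta, 1995) using the structure of the behaviour strategy (63). Notice that we do not define the reaction of players when they observe the deviations of more than one player. This is because we focus here on the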
11 The strict definition of the behaviour strategy is given in the proof of Proposition 1.
12 Things change for subgame perfectness. In this case, we need to prove that inequality (61) holds for all possible histories and all stages. Therefore, we need to determine the strategy of a player even if more than one player deviates. Strategy (71) defines the behaviour of the player given any history.
13 Notice that the actions of the players from coalition N\z are correlated.
Nash equilibrium (not subgame perfect). When more than one player deviates, the player chooses any strategy from the player's set of strategies. We now prove that $\varphi(\cdot) = (\varphi_1(\cdot), \ldots, \varphi_n(\cdot))$ determined in (63) is a NE in the stochastic game $G^{\alpha}$. Given strategy (63) and provided that no player deviates from cooperative strategy profile $\eta^*$, the discounted payoff of player $i$ in the subgame starting from state $\omega \in \Omega$ is:
$$E_i^{\alpha,\omega}(\varphi) = E_i^{\alpha,\omega}(\eta^*).$$
Let $E_i^{\alpha}(\varphi)$ be equal to the vector $(E_i^{\alpha,\omega_1}(\varphi), \ldots, E_i^{\alpha,\omega_t}(\varphi))^T$. Then for any player $i \in N$ the next equation holds:
$$E_i^{\alpha}(\varphi) = (I - \delta\Pi(\eta^*))^{-1}\beta_i. \qquad (64)$$
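Consider next the profile of strategies $(\varphi'_z, \varphi_{N\setminus z})$, when some player $z$ deviates from strategy $\varphi_z$. Suppose that the deviation occurs at stage $k$, that is, $h(l) \subset h^*$ for $l \in [1, k-1]$, but $(\omega(k), a(k)) \notin h^*$ and $(\omega(k), (a_z^*(k), a_{N\setminus z}(k))) \in h^*$. Without loss of generality, we assume that $\omega(k) = \omega$ and player $z$ deviates at stage $k$. We are now able to determine the total payoff of player $z$ in the game $G^{\alpha}$ with strategy profile $(\varphi'_z, \varphi_{N\setminus z})$ by
$$\bar E_z^{\alpha}(\varphi'_z, \varphi_{N\setminus z}) = \pi_0 E_z^{\alpha}(\varphi'_z, \varphi_{N\setminus z}),$$
where
$$E_z^{\alpha}(\varphi'_z, \varphi_{N\setminus z}) = E_z^{\alpha,[1,k-1]}(\varphi'_z, \varphi_{N\setminus z}) + \delta^{k-1}\Pi^{k-1}(\varphi'_z, \varphi_{N\setminus z})\, E_z^{\alpha,[k,\infty)}(\varphi'_z, \varphi_{N\setminus z}). \qquad (65)$$
The first term on the right-hand side of (65) is the expected payoff of player $z$ in the first $k - 1$ stages of the game $G^{\alpha}$, and the second term is the expected payoff of player $z$ in the subgame of $G^{\alpha}$ beginning from stage $k$, where $E_z^{\alpha,[k,\infty)}(\varphi'_z, \varphi_{N\setminus z})$ is the vector $(E_z^{\alpha,[k,\infty),\omega_1}(\varphi'_z, \varphi_{N\setminus z}), \ldots, E_z^{\alpha,[k,\infty),\omega_t}(\varphi'_z, \varphi_{N\setminus z}))^T$, with $E_z^{\alpha,[k,\infty),\omega}(\varphi'_z, \varphi_{N\setminus z})$ being the expected payoff of player $z$ in the subgame starting from state $\omega$. Since there are no deviations from cooperative strategy profile $\eta^*$ up to stage $k - 1$, the following equalities hold:
$$E_z^{\alpha,[1,k-1]}(\varphi'_z, \varphi_{N\setminus z}) = E_z^{\alpha,[1,k-1]}(\eta^*),$$
$$\Pi^{k-1}(\varphi'_z, \varphi_{N\setminus z}) = \Pi^{k-1}(\eta^*).$$
The expected payoff of player $z$ in the subgame starting at stage $k$ in state $\omega(k) = \omega$ is
$$E_z^{\alpha,[k,\infty),\omega}(\varphi'_z, \varphi_{N\setminus z}) = K_z^{\omega}(a_z^{\omega}, a_{N\setminus z}^{\omega *}) + \delta \sum_{\omega' \in \Omega} p(\omega'|\omega, (a_z^{\omega}, a_{N\setminus z}^{\omega *})) V^{\omega'}(\{z\}), \qquad (66)$$
where $a_z^{\omega} \in \Delta(A_z^{\omega})$. Players from the coalition $N \setminus z$ punish player $z$ by playing the punishment actions prescribed by the definition of strategy profile $\varphi$. In (66), the value of the characteristic function $V^{\omega'}(\{z\})$ is the payoff to which player $z$ is held under punishment. Since the payoffs under profiles $\varphi$ and $(\varphi'_z, \varphi_{N\setminus z})$ do not differ up to stage $k - 1$, a deviation may be profitable for player $z$ only in the subgame starting from state $\omega \in \Omega$ at stage $k$. In particular, the strategy profile $(\varphi'_z, \varphi_{N\setminus z})$ ensures the following expected payoff of player $z$ from stage $k$:
$$F^{\omega}(\{z\}) = \max_{a_z^{\omega} \in A_z^{\omega}} \Big\{ K_z^{\omega}(a_z^{\omega}, a_{N\setminus z}^{\omega *}) + \delta \sum_{\omega' \in \Omega} p(\omega'|\omega, (a_z^{\omega}, a_{N\setminus z}^{\omega *})) V^{\omega'}(\{z\}) \Big\}. \qquad (67)$$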
According to the definition of PDP, the expected payoff of player $z$ in the regularised game with profile of strategies $\varphi(\cdot)$ can be found from:
$$E_z^{\alpha}(\varphi) = (I - \delta\Pi(\eta^*))^{-1}\beta_z = \alpha_z, \qquad (68)$$
where $E_z^{\alpha}(\varphi) = (E_z^{\alpha,\omega_1}(\varphi), \ldots, E_z^{\alpha,\omega_t}(\varphi))^T$. Taking into account (62), from (67), (68) and the above discussion we get
$$E_z^{\alpha}(\varphi) \ge E_z^{\alpha}(\varphi'_z, \varphi_{N\setminus z}),$$
which is satisfied when inequality
$$\alpha_z = (I - \delta\Pi(\eta^*))^{-1}\beta_z \ge F(\{z\}) \qquad (69)$$
is true. If inequality (69) is satisfied for any player $z \in N$, no player is willing to deviate from the cooperative strategy profile in any subgame of the $\alpha$-regularisation $G^{\alpha}$.
Thus the behaviour strategy profile (63) is a NE in the $\alpha$-regularisation of game $G$. The discounted payoff of player $i$ in the game $G^{\alpha}$ with profile of strategies $\varphi$ is equal to $\bar\alpha_i$, where $\bar\alpha_i = \pi_0\alpha_i$, while $\alpha_i = (\alpha_i^{\omega_1}, \ldots, \alpha_i^{\omega_t})^T$ consists of the $i$th components of imputations $\alpha^{\omega_1}, \ldots, \alpha^{\omega_t}$ derived from the cooperative subgames $G^{\omega_1}, \ldots, G^{\omega_t}$ accordingly.
Notice that the players' strategies used in the punishing regime of the behaviour strategy (63) need not be optimal for the punishers themselves: while punishing player $z$, a player $i \ne z$ may benefit if she deviates from the strategy profile formed by (63). Therefore, the strategy profile determined by strategies (63) is not subgame perfect.
We investigate now the conditions to obtain a subgame perfect Nash equilibrium (SPNE) in the $\alpha$-regularisation $G^{\alpha}$, that is, a strategy profile such that, for any state occurring in any period with any history, individual deviation is not profitable.
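We assume that, if the history of the stage differs from the cooperative history, then all players implement a Nash equilibrium of the game $G$ denoted by $\eta^{ne} = (\eta_1^{ne}, \ldots, \eta_n^{ne})$ such that $\eta_i^{ne}(\omega) \in \Delta(A_i^{\omega})$.14 Again for convenience, define
$$Q(\{i\}) = (Q^{\omega_1}(\{i\}), \ldots, Q^{\omega_t}(\{i\}))^T,$$
$$Q^{\omega}(\{i\}) = \max_{a_i^{\omega} \in A_i^{\omega}} \Big\{ K_i^{\omega}(a_i^{\omega}, a_{N\setminus i}^{\omega *}) + \delta \sum_{\omega' \in \Omega} p(\omega'|\omega, (a_i^{\omega}, a_{N\setminus i}^{\omega *})) E_i^{\omega'}(\eta^{ne}) \Big\},$$
and consider the inequality
$$\alpha_i = (I - \delta\Pi(\eta^*))^{-1}\beta_i \ge Q(\{i\}). \qquad (70)$$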
14 In the case of multiple Nash equilibria, one of them should be chosen for the realisation of the punishment. Notice that this can be implemented because players use correlated strategies.
The conditions for the existence of an SPNE are summarised in the following proposition. The validity of inequality (70) implies that the principle of strategic stability holds when the Nash equilibrium is subgame perfect.
Proposition 2. If, in an $\alpha$-regularisation $G^{\alpha}$ such that $\bar\alpha = \pi_0\alpha$, inequality (70) holds for any player $i \in N$, then there exists a behaviour strategy profile $\varphi$ which is an SPNE with players' payoffs $(\bar\alpha_1, \ldots, \bar\alpha_n)$.
Proof. The proof is similar to the proof of Proposition 1 using the structure of the "new" strategy profile. Determine this behaviour strategy profile as $\varphi = (\varphi_1, \ldots, \varphi_n)$, where strategies $\varphi_i$, $i \in N$, are:
$$\varphi_i(h(k)) = \begin{cases} a_i^{\omega *}, & \text{if } \omega(k) = \omega,\ h(k) \subset h^*; \\ a_i^{\omega, ne}, & \text{if } \omega(k) = \omega,\ h(k) \not\subset h^*, \end{cases} \qquad (71)$$
where $a_i^{\omega, ne} \in \Delta(A_i^{\omega})$ is player $i$'s punishment, which can be either in pure or mixed strategies. Notice that, if a multi-player deviation is observed in the history, all players implement $\eta^{ne}$.
Irrational-behaviour-proof. Subgame consistency and strategic support assume that the players are fully rational. However, in reality cooperation may be broken down by irrational reasons. For instance, a player may use irrational acts to extort additional gains if some circumstances allow it. Refusal of other players to yield to his extortion would result in the dissolution of the cooperative scheme. Thus in this case, a deviation would imply an "irrational behaviour."15
D.W.K. Yeung proposed a condition16 under which, even if irrational behaviour emerges in the game, a player is certain to obtain at least her individual payoff (Yeung, 2006). This condition can be explained as follows. Consider two different scenarios. In the first scenario, a player cooperates until a certain period, and then the cooperation breaks up. In the second scenario, a player plays individually during the whole game. If the payoff in the first scenario is not less than the payoff in the second scenario, then the principle of irrational-behaviour-proof is satisfied. The following definition provides the condition to satisfy this principle.
Definition 25. Cooperative solution $\bar\alpha$ and the corresponding IDP satisfy the principle of irrational-behaviour-proof if
$$E_i^{\alpha}[1, k] + \delta^k \Pi^k(\eta^*) V(\{i\}) \ge V(\{i\}) \quad \text{for every } i \in N \text{ and any } k = 1, 2, \ldots, \qquad (72)$$
where $E_i^{\alpha}[1, k]$ is the expected payoff of player $i$ in the first $k$ stages of the $\alpha$-regularisation $G^{\alpha}$.
The underlying assumption is that, before the beginning of each stage, players know whether the cooperation has broken down, so that the information is not delayed. On the left-hand side of inequality (72), the first term is the expected payoff of player $i$ in the first $k$ stages when the players implement cooperative strategy profile $\eta^*$ and the $\alpha$-regularisation of game $G$ is made. The second term is the expected payoff of player $i$ from stage $k + 1$, when the cooperation breaks up. The right-hand side is the expected payoff of player $i$ when she plays individually from the start onwards.
15 Note that it is possible to formulate an analogous condition for repeated games.
16 The so-called Yeung's condition or principle of irrational- behaviour-proof was adopted for linear-quadratic games in (Tur, 2014, Markovkin, 2006).
(71)
Theorem 10. If inequality
$$(I - \delta\Pi(\eta^*))(\alpha_i - V(\{i\})) \ge 0 \qquad (73)$$
holds for any $i \in N$, then the cooperative solution $\bar\alpha$ and the corresponding IDP $\{\beta_i\}_{i \in N}$ satisfy the principle of irrational-behaviour-proof.
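Proof. In what follows, we show that condition (73) is sufficient for inequality (72) to hold for any $k = 1, 2, \ldots$. The proof is by mathematical induction. First, we rewrite (72) for $k = 1$. Then we transform (73) by considering the definition of $\alpha_i$ and using IDP (56). We get
$$V(\{i\}) \le \beta_i + \delta\Pi(\eta^*)V(\{i\}), \qquad (74)$$
which is inequality (72) for $k = 1$.
Suppose that (73) implies (72) for $k = l$. Rewriting (72) for $k = l$ yields:
$$V(\{i\}) \le \beta_i + \ldots + \delta^{l-1}\Pi^{l-1}(\eta^*)\beta_i + \delta^{l}\Pi^{l}(\eta^*)V(\{i\}). \qquad (75)$$
We adopt the same procedure for $k = l + 1$. Inequality (72) for $k = l + 1$ is:
$$V(\{i\}) \le \beta_i + \ldots + \delta^{l}\Pi^{l}(\eta^*)\beta_i + \delta^{l+1}\Pi^{l+1}(\eta^*)V(\{i\}). \qquad (76)$$
Next we need to prove that, if (73) holds, then (72) holds for $k = l + 1$. After rearranging, the right-hand side of (76) takes the form
$$\beta_i + \delta\Pi(\eta^*)\big\{\beta_i + \delta\Pi(\eta^*)\beta_i + \ldots + \delta^{l-1}\Pi^{l-1}(\eta^*)\beta_i + \delta^{l}\Pi^{l}(\eta^*)V(\{i\})\big\}.$$
By (75), the expression in braces is not less than $V(\{i\})$, and the matrix $\delta\Pi(\eta^*)$ is nonnegative, so the right-hand side of (76) is not less than $\beta_i + \delta\Pi(\eta^*)V(\{i\})$. From equation (53) and (73), we get (72) for $k = l + 1$, which proves the theorem.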
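Condition (73) and inequality (72) can be checked numerically for a concrete game. The sketch below is an illustration only, assuming the data of Example 3 for Firm 1 (discount factor 0.99, cooperative transition matrix, Shapley values and individual values rounded as in the example); the function names are hypothetical. The left-hand side of (72) is obtained by the backward recursion $W_0 = V(\{i\})$, $W_{m+1} = \beta_i + \delta\Pi(\eta^*)W_m$, so that $W_k$ equals the sum of the first $k$ discounted transfers plus $\delta^k\Pi^k(\eta^*)V(\{i\})$.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def one_minus_delta_pi(delta, transition, vector):
    """Apply the operator (I - delta * Pi) to a vector."""
    expected = mat_vec(transition, vector)
    return [x - delta * e for x, e in zip(vector, expected)]


def lhs_of_72(delta, transition, beta, v_single, k):
    """Left-hand side of (72) after k cooperative stages (backward recursion)."""
    value = list(v_single)
    for _ in range(k):
        expected = mat_vec(transition, value)
        value = [b + delta * e for b, e in zip(beta, expected)]
    return value


# Data assumed from Example 3 (Firm 1), rounded as in the text.
DELTA = 0.99
PI_COOP = [[0.3, 0.7], [0.1, 0.9]]
ALPHA_1 = [870.516, 873.698]      # Shapley value components by state
V_1 = [538.60, 540.60]            # V({1}) by state

beta_1 = one_minus_delta_pi(DELTA, PI_COOP, ALPHA_1)
margin = one_minus_delta_pi(
    DELTA, PI_COOP, [a - v for a, v in zip(ALPHA_1, V_1)]
)  # left-hand side of condition (73)

holds_for_all_k = all(
    all(w >= v for w, v in zip(lhs_of_72(DELTA, PI_COOP, beta_1, V_1, k), V_1))
    for k in range(1, 51)
)
print(margin, holds_for_all_k)
```

With these inputs both components of `margin` are positive, and (72) holds for every $k$ tested, in line with the statement of Theorem 10.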
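Corollary 1. For the irrational-behaviour-proof principle it is sufficient that for each $i \in N$
$$K_i(\bar a) - \beta_i \le \delta\big(\alpha_i^{\min} - V^{\max}(\{i\})\big), \qquad (77)$$
where $K_i(\bar a) = \big(\max_{a_i^{\omega_1} \in A_i^{\omega_1}} K_i^{\omega_1}(a^{\omega_1 *} \| a_i^{\omega_1}), \ldots, \max_{a_i^{\omega_t} \in A_i^{\omega_t}} K_i^{\omega_t}(a^{\omega_t *} \| a_i^{\omega_t})\big)^T$, and $\max_{a_i^{\omega} \in A_i^{\omega}} K_i^{\omega}(a^{\omega *} \| a_i^{\omega})$ is the maximal payoff of player $i$ which he obtains deviating in state $\omega$ from the cooperative action profile $a^{\omega *}$ determined by condition (6), and $\bar a_i^{\omega} = \arg\max_{a_i^{\omega} \in A_i^{\omega}} K_i^{\omega}(a^{\omega *} \| a_i^{\omega})$ for each state $\omega \in \Omega$ and each player $i \in N$; $\alpha_i^{\min} = (\min_{\omega} \alpha_i^{\omega}, \ldots, \min_{\omega} \alpha_i^{\omega})^T$, $V^{\max}(\{i\}) = (\max_{\omega} V^{\omega}(\{i\}), \ldots, \max_{\omega} V^{\omega}(\{i\}))^T$.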
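Proof. Rewrite inequality (77) in the following way:
$$\beta_i + \delta\alpha_i^{\min} \ge K_i(\bar a) + \delta V^{\max}(\{i\}). \qquad (78)$$
Estimate the left- and right-hand sides of inequality (78). As the matrix of transition probabilities $\Pi(\eta^*)$ is stochastic, we obtain:
$$\beta_i + \delta\alpha_i^{\min} = \beta_i + \delta\Pi(\eta^*)\alpha_i^{\min} \le \beta_i + \delta\Pi(\eta^*)\alpha_i. \qquad (79)$$
For the right-hand side of (78) we have
$$K_i(\bar a) + \delta V^{\max}(\{i\}) = K_i(\bar a) + \delta\Pi(\bar\eta)V^{\max}(\{i\}), \qquad (80)$$
where $\Pi(\bar\eta)$ is a stochastic matrix, and $\bar\eta = (\bar\eta_j : j \in N)$ is a profile in stationary strategies such that
$$\bar\eta_j = \begin{cases} \arg\max_{\eta_i \in H_i} \Pi(\eta^* \| \eta_i)V(\{i\}), & \text{if } j = i, \\ \eta_j^*, & \text{if } j \ne i. \end{cases}$$
Therefore, we have the inequality:
$$K_i(\bar a) + \delta\Pi(\bar\eta)V^{\max}(\{i\}) = \max_{a_i} K_i(a^* \| a_i) + \delta \max_{\eta_i \in H_i}\{\Pi(\eta^* \| \eta_i)V(\{i\})\} \ge \max_{\eta_i \in H_i}\{K_i(a^* \| a_i) + \delta\Pi(\eta^* \| \eta_i)V(\{i\})\}. \qquad (81)$$
The inequalities (78), (79), (80) and (81) imply condition (73). Therefore, by Theorem 10 the principle of irrational-behaviour-proof is satisfied.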
3.5. Existence of stable cooperative solution
In this section we discuss the conditions guaranteeing the existence of a stable cooperative solution. First, we need to mention that the allocation rule adopted should give a non-empty subset of the imputation set. Cooperative solutions such as the Shapley value or the nucleolus always exist and we may calculate them for any subgame using the values of the characteristic function given by (7), (9) and (51).
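The existence of a subgame consistent cooperative solution follows from Theorem 9 and the method of construction of the IDP for $\bar\alpha$. For a given cooperative solution $\bar\alpha$, the regularisation of a stochastic game determines new payoff functions of the players in order to satisfy the principle of subgame consistency. Hence, the players' discounted payoffs in the $\alpha$-regularisation coincide with the components of cooperative solution $\bar\alpha$, which is subgame consistent. Therefore, a subgame consistent cooperative solution $\bar\alpha$ exists in general.
To verify whether cooperative solution $\bar\alpha$ satisfies the principles of strategic stability and irrational-behaviour-proof, we need to check that the following system of inequalities holds:
$$\begin{cases} \alpha_i = (I - \delta\Pi(\eta^*))^{-1}\beta_i \ge F(\{i\}), & i \in N, \\ (I - \delta\Pi(\eta^*))(\alpha_i - V(\{i\})) \ge 0, & i \in N. \end{cases} \qquad (82)$$
The first inequality, considered as a condition on discount factor $\delta$, is the analogue of the condition under which a cooperative strategy profile is an SPNE in repeated games. This system is non-linear in $\delta$, and in general its solution cannot be written in an explicit form.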
However, we may state the existence of a stable cooperative solution for the class of stochastic games in which the cooperative strategy profile coincides with the Nash equilibrium and the players are symmetric. In this case, the Shapley value satisfies the principles of stable cooperation. Further, we examine the solution of system (82) on a specific class of stochastic games with two states and two players.
Example 3. Stochastic game of competition between asymmetric firms.
Noncooperative game. Consider a Cournot duopoly with asymmetric firms, described as a stochastic game of the Prisoners' Dilemma type. Let the set of states be $\Omega = \{\omega_1, \omega_2\}$, where $\omega_j = (N, A_1^{\omega_j}, A_2^{\omega_j}, K_1^{\omega_j}, K_2^{\omega_j})$, $j = 1, 2$, and $A_i^{\omega_j} = \{C_j, D_j\}$ is the set of actions of player $i = 1, 2$. Strategies $C_j$ and $D_j$ stand for "collude" and "deviate" respectively. The payoffs in state $\omega_1$ are

         C1        D1
  C1   (7, 7)    (1, 8)
  D1   (8, 1)    (4, 5)

and the payoffs in state $\omega_2$ are

         C2          D2
  C2   (9, 9)      (1, 10)
  D2   (16.5, 1)   (6, 5)

States $\omega_1$ and $\omega_2$ represent a market with a low demand and a market with a high demand respectively. Both one-shot games have a unique Nash equilibrium, with payoffs (4, 5) and (6, 5) in states $\omega_1$ and $\omega_2$ respectively. Conversely, the cooperative action profile that maximizes the sum of the players' payoffs yields (7, 7) and (9, 9) respectively. In the cooperative action profile, players get equal payoffs, but in the Nash equilibrium outcome they obtain asymmetric payoffs. In particular, with a low demand Firm 1 has a lower payoff than Firm 2, and with a high demand Firm 2 has a lower payoff than Firm 1. This scenario could be interpreted as the result of technical features of firms' production. For instance, Firm 2 can be endowed with a production technology that is more efficient in producing low levels of output. In state $\omega_2$, the firms also obtain different payoffs when one firm colludes and its competitor "deviates". In particular, Firm 1's deviation payoff is larger than Firm 2's one. Hence the asymmetry of the players influences the cooperative payoff imputation. Moreover, in state $\omega_2$ the total cooperative payoff is not much larger than the one in action profile $(D_2, C_2)$ (18 against 17.5). Therefore, if the transition to state $\omega_1$ is less likely from profile $(D_2, C_2)$ than from profile $(C_2, C_2)$, then players may agree on playing profile $(D_2, C_2)$ to avoid transition from the high to the low demand state.
The transition probabilities from states $\omega_1$ and $\omega_2$ are given by the matrices

  ω1:  (0.3, 0.7)   (0.9, 0.1)        ω2:  (0.9, 0.1)   (0.4, 0.6)
       (0.4, 0.6)   (0.3, 0.7)             (0.1, 0.9)   (0.3, 0.7)

where the element $(k, l)$ of the matrix consists of transition probabilities from state $\omega_j$ to states $\omega_1$, $\omega_2$, on condition that player 1 chooses the $k$th action and player 2 chooses the $l$th one. We may mention that the probability of transiting to state $\omega_1$ in action profile $(C_2, C_2)$ is much higher than the probability of transiting to this state in action profile $(D_2, C_2)$, that is, 0.9 against 0.1. Let the discount factor be $\delta = 0.99$ and the vector of the initial distribution over the set of states be $\pi_0 = (0.5, 0.5)$.
Cooperative game. Determine cooperative game $G^c$ based on stochastic game $G$. For this, we compute the cooperative solution $\eta^* = (\eta_1^*, \eta_2^*)$ in stationary strategies using (5) and (6). We obtain a unique stationary strategy $\eta_1^* = (C_1, D_2)$ for player 1, and $\eta_2^* = (C_1, C_2)$ for player 2, which give the maximal total payoff of the players $\bar V(\{1, 2\}) = \pi_0 V(\{1, 2\}) = 1704.61$. Following this profile, in state $\omega_1$ the profile of cooperative strategies (when both players collude) gives payoff 7 to each firm. In state $\omega_2$, with the cooperative strategy profile, Firm 1 deviates and Firm 2 colludes, and the payoff of Firm 2 is less than its payoff in the Nash equilibrium. But this will be compensated by Firm 1 when they apply an imputation of their joint payoff. Therefore, the values of the characteristic function for the grand coalition are
$$V(\{1, 2\}) = E_1(\eta^*) + E_2(\eta^*) = (1702.43, 1706.80)^T.$$
By definition (51) the values of the characteristic function for the empty set are zero:
$$V(\emptyset) = (0, 0)^T.$$
Calculate the values of the characteristic function $V(S) = (V^{\omega_1}(S), V^{\omega_2}(S))^T$ for coalitions $S = \{1\}$ and $S = \{2\}$ using (7):
$$V(\{1\}) = (538.60, 540.60)^T, \quad V(\{2\}) = (500.00, 500.00)^T.$$
These are the firms' payoffs in the Nash equilibrium when both firms deviate in all states, i.e., they adopt action profiles $(D_1, D_1)$ and $(D_2, D_2)$.
Using (10), we may calculate $\bar V(S)$ for the whole game and all coalitions:
$$\bar V(\emptyset) = 0.00, \quad \bar V(\{1\}) = 539.60, \quad \bar V(\{2\}) = 500.00, \quad \bar V(\{1, 2\}) = 1704.61.$$
Thus, we determine cooperative stochastic subgame $G^{\omega_j}$ as the set $(N, V^{\omega_j}(\cdot))$, $j = 1, 2$, and cooperative stochastic game $G^c$ as the set $(N, \bar V(\cdot))$.
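The values above can be reproduced by solving the linear system $(I - \delta\Pi)E = K$ for a fixed stationary profile. The following sketch is an illustration under stated assumptions: it handles only the two-state case by Cramer's rule, the function name is hypothetical, and for the single-player coalitions it simply evaluates the profile $(D_1, D_1)$, $(D_2, D_2)$ that the example identifies with $V(\{i\})$ rather than solving the maxmin problem (7).

```python
def discounted_value(delta, transition, stage_payoff):
    """Solve (I - delta * Pi) E = K for two states by Cramer's rule."""
    a11 = 1 - delta * transition[0][0]
    a12 = -delta * transition[0][1]
    a21 = -delta * transition[1][0]
    a22 = 1 - delta * transition[1][1]
    det = a11 * a22 - a12 * a21
    e1 = (stage_payoff[0] * a22 - a12 * stage_payoff[1]) / det
    e2 = (a11 * stage_payoff[1] - a21 * stage_payoff[0]) / det
    return [e1, e2]


DELTA = 0.99
PI_0 = [0.5, 0.5]

# Cooperative profile: (C1, C1) in state 1, (D2, C2) in state 2.
PI_COOP = [[0.3, 0.7], [0.1, 0.9]]
TOTAL_STAGE_PAYOFF = [7 + 7, 16.5 + 1]

# Profile (D1, D1), (D2, D2), used in the example for V({1}) and V({2}).
PI_DD = [[0.3, 0.7], [0.3, 0.7]]
STAGE_PAYOFF_1 = [4, 6]
STAGE_PAYOFF_2 = [5, 5]

v_grand = discounted_value(DELTA, PI_COOP, TOTAL_STAGE_PAYOFF)
v_one = discounted_value(DELTA, PI_DD, STAGE_PAYOFF_1)
v_two = discounted_value(DELTA, PI_DD, STAGE_PAYOFF_2)
v_grand_bar = sum(p * v for p, v in zip(PI_0, v_grand))

print([round(x, 2) for x in v_grand], round(v_grand_bar, 2))
print([round(x, 2) for x in v_one], [round(x, 2) for x in v_two])
```

The printed values agree, up to rounding, with $V(\{1,2\})$, $V(\{1\})$, $V(\{2\})$ and $\bar V(\{1,2\})$ reported in the example.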
Cooperative solution: the Shapley value. We suppose that players choose the Shapley value as a cooperative solution of their total payoff in cooperative stochastic game $G^c$ and in all subgames $G^{\omega_j}$, $j = 1, 2$. For a two-player game the Shapley value is calculated by the formula:
$$\alpha_i^{\omega_j} = V^{\omega_j}(\{i\}) + \frac{V^{\omega_j}(\{1, 2\}) - V^{\omega_j}(\{1\}) - V^{\omega_j}(\{2\})}{2},$$
where $i = 1, 2$ and $j = 1, 2$. The Shapley values in subgames are
$$\alpha_1 = (870.516, 873.698)^T, \quad \alpha_2 = (831.916, 833.098)^T.$$
Then, taking into account the vector of initial distribution $\pi_0$, we are able to determine the Shapley value $\bar\alpha$ in the whole game $G^c$ by Definition 20:
$$\bar\alpha = (\bar\alpha_1, \bar\alpha_2) = (872.107, 832.507).$$
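As a quick check of the two-player formula, the sketch below recomputes the Shapley values from the characteristic function values of the example. The inputs are the rounded values reported in the text, so the results can differ from the paper's figures in the third decimal place; the function name is hypothetical.

```python
def shapley_two_player(v1, v2, v12):
    """Two-player Shapley value: own value plus half of the cooperative surplus."""
    surplus = v12 - v1 - v2
    return v1 + surplus / 2, v2 + surplus / 2


# Characteristic function values by state (rounded, taken from the example).
V_GRAND = [1702.43, 1706.80]
V_ONE = [538.60, 540.60]
V_TWO = [500.00, 500.00]
PI_0 = [0.5, 0.5]

by_state = [shapley_two_player(a, b, g) for a, b, g in zip(V_ONE, V_TWO, V_GRAND)]
alpha_1 = [pair[0] for pair in by_state]
alpha_2 = [pair[1] for pair in by_state]

# Shapley value of the whole game: expectation over the initial distribution.
alpha_bar = (
    sum(p * a for p, a in zip(PI_0, alpha_1)),
    sum(p * a for p, a in zip(PI_0, alpha_2)),
)
print(alpha_1, alpha_2, alpha_bar)
```

In each state the two components sum to the value of the grand coalition, which is the efficiency property of the Shapley value.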
Subgame consistency. Now we verify if the Shapley value satisfies the principles of stable cooperation and begin with subgame consistency. If firms receive stage payoffs according to their initially defined payoffs, then their discounted payoffs are (1526.809, 177.805) instead of the components of the Shapley value (872.107, 832.507). Therefore, we need to find an imputation distribution procedure, or transfer payments to the players, such that they finally receive the components of the Shapley value and the imputation distribution procedure is subgame-consistent. Using that $\bar\alpha$ equals $\pi_0\alpha$ and equation (53), we obtain the IDP:
$$\beta_1 = (I - \delta\Pi(\eta^*))\alpha_1 = (6.500, 9.052)^T, \quad \beta_2 = (I - \delta\Pi(\eta^*))\alpha_2 = (7.500, 8.448)^T.$$
Define the $\alpha$-regularisation of initial stochastic game $G$ using the IDP and Definition 23. We redefine the payoff functions of the players in the initial game in all states when players adopt the cooperative strategy profile, substituting the payoffs by the corresponding components of the IDP. The payoffs in states $\omega_1$ and $\omega_2$ become equal to:

  ω1:  (6.500, 7.500)   (1, 8)        ω2:  (9, 9)           (1, 10)
       (8, 1)           (4, 5)             (9.052, 8.448)   (6, 5)

In state $\omega_1$, the firms receive (6.5, 7.5) instead of (7, 7) under the cooperative action profile. In state $\omega_2$, they receive (9.052, 8.448) instead of (16.5, 1), that is, Firm 1 transfers 16.5 - 9.052 = 7.448 to Firm 2. If the regularisation of the initial game is made by the above described method, the Shapley value and the corresponding IDP are subgame-consistent.
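The transfers of the example can be recomputed directly from $\beta_i = (I - \delta\Pi(\eta^*))\alpha_i$. The sketch below is illustrative only and uses the rounded Shapley values from the text; it also checks that in each state the transfers add up to the total cooperative stage payoff (14 in $\omega_1$ and 17.5 in $\omega_2$) and reports how much Firm 1 passes to Firm 2. The helper name is hypothetical.

```python
def idp(delta, transition, alpha):
    """Transfers beta = (I - delta * Pi) alpha, one component per state."""
    return [
        a - delta * sum(p * x for p, x in zip(row, alpha))
        for a, row in zip(alpha, transition)
    ]


DELTA = 0.99
PI_COOP = [[0.3, 0.7], [0.1, 0.9]]   # transitions under (C1, C1) and (D2, C2)
ALPHA_1 = [870.516, 873.698]
ALPHA_2 = [831.916, 833.098]
COOP_STAGE_PAYOFF_1 = [7.0, 16.5]    # Firm 1's original cooperative stage payoffs
COOP_STAGE_TOTAL = [14.0, 17.5]

beta_1 = idp(DELTA, PI_COOP, ALPHA_1)
beta_2 = idp(DELTA, PI_COOP, ALPHA_2)

# Side payment from Firm 1 to Firm 2 in each state.
transfer_1_to_2 = [k - b for k, b in zip(COOP_STAGE_PAYOFF_1, beta_1)]

print([round(b, 3) for b in beta_1], [round(b, 3) for b in beta_2])
print([round(t, 3) for t in transfer_1_to_2])
```

Up to rounding of the inputs, the output reproduces the entries (6.5, 7.5) and (9.052, 8.448) of the regularised payoff matrices and the side payment of 7.448 in state $\omega_2$.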
Strategic support. We now check for strategic support of the Shapley value, i.e., we check if the firms benefit from individual deviations from the cooperative strategy profile. Since in state $\omega_1$ the cooperative action profile is not the Nash equilibrium, the players may have benefits from deviation. We verify if the inequality
$$\alpha_i^{\omega_1} \ge F^{\omega_1}(\{i\}), \quad i = 1, 2,$$
is true, where
$$F^{\omega_1}(\{i\}) = \max_{a_i^{\omega_1} \in A_i^{\omega_1}} \Big\{ K_i^{\omega_1}(a_i^{\omega_1}, a_{N\setminus i}^{\omega_1 *}) + \delta \sum_{\omega' \in \Omega} p(\omega'|\omega_1, (a_i^{\omega_1}, a_{N\setminus i}^{\omega_1 *})) V^{\omega'}(\{i\}) \Big\}.$$
Inequality (62) for Firm 1 is written in this way:
$$870.516 \ge 8 + 0.99\,(0.4,\ 0.6)\,(538.60,\ 540.60)^T = 542.402,$$
and for Firm 2:
$$831.916 \ge 8 + 0.99\,(0.9,\ 0.1)\,(500.00,\ 500.00)^T = 503.$$
In state $\omega_2$, cooperative action profile $(D_2, C_2)$ is the Nash equilibrium. Therefore, players cannot increase their payoffs by deviations, and we may conclude that inequality (62) holds for state $\omega_2$. The condition of strategic support is satisfied.
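The deviation payoffs in state $\omega_1$ can be recomputed as follows. This is a minimal sketch with a hypothetical helper: each option is a pair (stage payoff, transition row) for one action of the deviating firm against the partner's cooperative action $C_1$, and the continuation value is the firm's individual value $V(\{i\})$.

```python
def deviation_payoff(delta, options, punishment_value):
    """Best stage payoff plus discounted expected value under punishment."""
    return max(
        payoff + delta * sum(p * v for p, v in zip(row, punishment_value))
        for payoff, row in options
    )


DELTA = 0.99
V_ONE = [538.60, 540.60]
V_TWO = [500.00, 500.00]

# State 1, partner plays C1. Options: (own stage payoff, transition row).
FIRM_1_OPTIONS = [(7, [0.3, 0.7]),   # C1 against C1
                  (8, [0.4, 0.6])]   # D1 against C1
FIRM_2_OPTIONS = [(7, [0.3, 0.7]),   # C1 against C1
                  (8, [0.9, 0.1])]   # D1 against C1

f_one = deviation_payoff(DELTA, FIRM_1_OPTIONS, V_ONE)
f_two = deviation_payoff(DELTA, FIRM_2_OPTIONS, V_TWO)

ALPHA_1_STATE_1 = 870.516
ALPHA_2_STATE_1 = 831.916
print(round(f_one, 3), round(f_two, 3))
print(ALPHA_1_STATE_1 >= f_one, ALPHA_2_STATE_1 >= f_two)
```

Both comparisons come out in favour of cooperation, matching the values 542.402 and 503 on the right-hand sides above.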
Irrational-behavior-proof. To verify the condition of irrational behavior proof, we need to compare players' payoffs in two cases:
1) A firm plays individually during the whole game,
2) A firm cooperates with the other firm until some step, and after this it starts playing individually.
Notice that in the second case, when the firms cooperate, they receive payoffs in accordance with the IDP constructed on the basis of the initially chosen cooperative solution.
If the player's payoff in case 1) is not greater than his payoff in case 2), then the principle of irrational-behaviour-proof is satisfied. This can be verified with condition (73):
$$(I - \delta\Pi(\eta^*))(\alpha_1 - V(\{1\})) = (I - \delta\Pi(\eta^*))(\alpha_2 - V(\{2\})) = (2.500, 3.448)^T > 0. \qquad (83)$$
3.6. Strong transferable equilibrium
Proposition 1 can be generalized to the case when several players deviate, i.e., we may prove that if a condition similar to inequality (62) is satisfied in the $\alpha$-regularisation $G^{\alpha}$ of stochastic game $G$, there exists a strong transferable equilibrium with payoffs $(\bar\alpha_1, \ldots, \bar\alpha_n)$. In this case, players can implement a specially constructed profile in trigger strategies, where, as a punishment for a deviating coalition, the non-deviating players implement trigger strategies that hold the deviating coalition to its minimax payoff in any subgame. We define a strong transferable equilibrium and prove a theorem similar to Proposition 1.
Definition 26. (Petrosyan and Kuzutin, 2000) We call profile $\varphi = (\varphi_1, \ldots, \varphi_n)$ a strong transferable equilibrium in regularized game $G^{\alpha}$ if for any coalition $S \subset N$, $S \ne \emptyset$, inequality
$$\sum_{i \in S} E_i^{\alpha}(\varphi) \ge \sum_{i \in S} E_i^{\alpha}(\varphi'_S, \varphi_{N\setminus S})$$
holds for any behaviour strategy of coalition $S$: $\varphi'_S = (\varphi'_i : i \in S)$. Here $E_i^{\alpha}(\cdot)$ is the discounted payoff of player $i$ in the $\alpha$-regularisation of game $G$.
We will prove a theorem that gives a condition on the game parameters under which regularized game $G^{\alpha}$ has a transferable equilibrium with players' payoffs equal to the corresponding components of the cooperative solution according to which the initial stochastic game is regularized.
Theorem 11. If, in regularized game $G^{\alpha}$ such that the cooperative solution satisfies condition $\bar\alpha = \pi_0\alpha$, the inequality
$$\sum_{i \in S} \alpha_i \ge F(S)$$
holds for any coalition $S \subset N$, $S \ne \emptyset$, where $F(S) = (F^{\omega_1}(S), \ldots, F^{\omega_t}(S))^T$,
$$F^{\omega}(S) = \max_{a_S^{\omega} \in \prod_{i \in S} A_i^{\omega}} \Big\{ \sum_{i \in S} K_i^{\omega}(a^{\omega *} \| a_S^{\omega}) + \delta \sum_{\omega' \in \Omega} p(\omega'|\omega, a^{\omega *} \| a_S^{\omega}) V^{\omega'}(S) \Big\},$$
then in regularized game $G^{\alpha}$ there exists a strong transferable equilibrium with players' payoffs $(\bar\alpha_1, \ldots, \bar\alpha_n)$.
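Proof. The proof of the theorem is close to the proof of Proposition 1, but instead of strategy (63) we use the following behaviour strategy $\varphi_i$, $i \in N$:
$$\varphi_i(h(k)) = \begin{cases} a_i^{\omega *}, & \text{if } \omega(k) = \omega,\ h(k) \subset h^*; \\ a_i^{\omega}(S), & \text{if } \omega(k) = \omega \text{ and } \exists\, l \in [1, k-1],\ S \subset N,\ i \notin S:\ h(l) \subset h^* \text{ and } (\omega(l), a(l)) \notin h^*, \\ & \text{but } (\omega(l), (a_S^*(l), a_{N\setminus S}(l))) \in h^*; \\ \text{any}, & \text{otherwise}, \end{cases}$$
where $a_i^{\omega *}$ is the action of player $i$ in the cooperative mode, while $a_i^{\omega}(S) \in \Delta(A_i^{\omega})$ is the action of player $i$ in the trigger mode which, jointly with actions $a_{i'}^{\omega}(S) \in \Delta(A_{i'}^{\omega})$ of players $i' \ne i$, $i' \in N \setminus S$, forms the action of coalition $N \setminus S$ against coalition $S$ and holds coalition $S$ to the minmax value $V^{\omega}(S)$ in subgame $G^{\omega}$.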
3.7. Strongly subgame consistency of the core
Now suppose that the solution of a cooperative stochastic game is a subset of the imputation set that contains more than one point. For definiteness, let such a solution be the core. We formulate the problem of strong subgame consistency of the core and propose sufficient conditions for it for stochastic games with infinite duration given by (1).
Suppose that the cores of stochastic game $G^c$ and of any subgame $G^{\omega}$, $\omega \in \Omega$, are nonempty. In cooperation, players agree on the joint implementation of cooperative strategy profile $\eta^*$ and expect to obtain the components of an imputation belonging to the core $CO$. Reaching an intermediate state $\omega \in \Omega$, player $i \in N$ chooses action $a_i^{\omega *}$ in accordance with cooperative strategy $\eta_i^*$ and gets payoff $K_i^{\omega}(a^{\omega *})$. If the players recalculate the solution, i.e., they find a solution of cooperative subgame $G^{\omega}$, then the current solution will be the core $CO^{\omega}$. It would be reasonable to require that the payoff received by a player in state $\omega$, summed with the expected sum of any imputations from the cores $CO^{\omega'}$, $\omega' \in \Omega$, following state $\omega$, be an imputation from the core $CO^{\omega}$. If this property holds for any state $\omega \in \Omega$, then the core of cooperative stochastic game $G^c$ is strongly subgame-consistent.
To determine a strongly subgame-consistent core, we define the so-called expected core in state $\omega$. For each state $\omega \in \Omega$ we define the expected core:
$$EC(\omega) = \Big\{ \alpha(\omega) = \sum_{\omega' \in \Omega} p(\omega'|\omega, a^{\omega *})\,\alpha^{\omega'} \ \Big|\ \alpha^{\omega'} \in CO^{\omega'} \Big\}. \qquad (86)$$
Set $EC(\omega)$ contains vectors $\alpha(\omega)$ which are mathematical expectations of all possible sets of the imputations from the cores of subgames starting in states which are realised after the current state with respect to probability distribution $\{p(\omega'|\omega, a^{\omega *}), \omega' \in \Omega\}$.
Recall Definition 21 of the imputation distribution procedure. The first condition (52) in the definition can be called the condition of "attainability of the imputation distribution procedure" because it ensures that in any realized state the sum of payments to the players is equal to the sum of their payoffs when they implement cooperative strategies. The second condition guarantees that the players receive the components of the initially chosen imputation from the core of cooperative game $G^c$ in the sense of mathematical expectation, if payments to the players throughout the game are made in accordance with distribution procedure $\{\beta^{\omega} : \omega \in \Omega\}$.
We now define the distribution procedure of imputation $\bar\alpha = (\bar\alpha_1, \ldots, \bar\alpha_n)$, where $\bar\alpha_i = \pi_0\alpha_i$, $(\alpha_1^{\omega}, \ldots, \alpha_n^{\omega}) = \alpha^{\omega} \in CO^{\omega}$, such that the core is strongly subgame-consistent.
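Definition 27. We call the core $CO$ of cooperative stochastic game $G^c$ strongly subgame-consistent if there exists a distribution procedure $\{\beta^{\omega} : \omega \in \Omega\}$ of the imputation from the core $CO$ such that for any state $\omega \in \Omega$ the following inclusions hold:
$$\beta^{\omega} \oplus \delta EC(\omega) \subset CO^{\omega}, \qquad (87)$$
$$B^{\omega} \in CO^{\omega}, \quad \omega \in \Omega, \qquad (88)$$
where
$$\beta^{\omega} \oplus \delta EC(\omega) = \{\beta^{\omega} + \delta\alpha(\omega) : \alpha(\omega) \in EC(\omega)\}.$$
The distribution procedure $\{\beta^{\omega} : \omega \in \Omega\}$ is then also called strongly subgame-consistent.
Condition (87) means that the set of vectors equal to the sum of the imputation distribution procedure in state $\omega$ and the discounted expected core in state $\omega$ is a subset of the core $CO^{\omega}$. This condition imposes restrictions on payments to the players in the realized states, and very often is not satisfied for an arbitrary game if payments to the players are made in accordance with the initially defined payoff functions.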
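We impose additional restrictions on the characteristic functions of the subgames starting from the states in $\Omega$ to obtain a sufficient condition for strong subgame consistency of the core. Denote by $EV^{\omega}(S)$ the expected value of the characteristic function of coalition $S \subset N$ over the states following state $\omega$:
$$EV^{\omega}(S) = \sum_{\omega' \in \Omega} p(\omega'|\omega, a^{\omega *}) V^{\omega'}(S).$$
Denote by
$$\Delta V^{\omega}(S) = V^{\omega}(S) - \delta EV^{\omega}(S)$$
the difference between the value of the characteristic function in state $\omega$ and the discounted expected value of the characteristic function. We denote by $\Delta CO^{\omega}$ an analog of the core constructed with function $\Delta V^{\omega}(S)$. We formulate a sufficient condition for strong subgame consistency of the IDP and the core $CO$.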
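Theorem 12. Let the core $CO^{\omega}$ and the set $\Delta CO^{\omega}$ be nonempty for each state $\omega \in \Omega$. If for every state $\omega \in \Omega$ distribution procedure $\{\beta^{\omega} : \omega \in \Omega\}$ of the imputation from the core $CO$ satisfies conditions:
$$\beta^{\omega} \in \Delta CO^{\omega}, \qquad (89)$$
$$B^{\omega} \in CO^{\omega}, \quad \omega \in \Omega, \qquad (90)$$
then the core $CO$ and procedure $\{\beta^{\omega} : \omega \in \Omega\}$ are strongly subgame-consistent.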
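Proof. We prove that any vector $\beta^{\omega} \in \Delta CO^{\omega}$ satisfying conditions (89) and (90) is a strongly subgame-consistent distribution procedure of imputation $\bar\alpha \in CO$, i.e., conditions (87) and (88) from Definition 27 hold. Condition (90) coincides with (88), so we need to prove that inclusion (87) holds for each state $\omega \in \Omega$. In state $\omega$ consider any vector $\alpha(\omega) \in EC(\omega)$ and find the sum $\beta^{\omega} + \delta\alpha(\omega)$. Now we verify if the latter vector belongs to the core $CO^{\omega}$. First, calculate the sum of all components of the vector:
$$\sum_{i \in N} \beta_i^{\omega} + \delta \sum_{\omega' \in \Omega} p(\omega'|\omega, a^{\omega *}) \sum_{i \in N} \alpha_i^{\omega'} = V^{\omega}(N) - \delta \sum_{\omega' \in \Omega} p(\omega'|\omega, a^{\omega *}) V^{\omega'}(N) + \delta \sum_{\omega' \in \Omega} p(\omega'|\omega, a^{\omega *}) V^{\omega'}(N) = V^{\omega}(N),$$
which means that the property of collective rationality holds. Next, consider $S \subset N$, $S \ne N$:
$$\sum_{i \in S} \beta_i^{\omega} + \delta \sum_{\omega' \in \Omega} p(\omega'|\omega, a^{\omega *}) \sum_{i \in S} \alpha_i^{\omega'} \ge V^{\omega}(S) - \delta \sum_{\omega' \in \Omega} p(\omega'|\omega, a^{\omega *}) V^{\omega'}(S) + \delta \sum_{\omega' \in \Omega} p(\omega'|\omega, a^{\omega *}) V^{\omega'}(S) = V^{\omega}(S).$$
Since the choice of state $\omega \in \Omega$ is arbitrary, we conclude that the core of the game $G^c$ and procedure $\{\beta^{\omega} : \omega \in \Omega\}$ are strongly subgame-consistent.
When the analogs of the cores $\Delta CO^{\omega}$ are nonempty for any state $\omega$, Theorem 12 provides a method of construction of a strongly subgame-consistent distribution procedure of imputations from the core. Notice that generally not all imputations from the core can be realised with distribution procedure $\{\beta^{\omega} : \omega \in \Omega\}$ described above.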
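Condition (89) is a finite system of linear constraints and can be checked by enumeration of coalitions. The sketch below is an illustration under assumptions: the function name is hypothetical, and the data are the values $\Delta V^{\omega_1}(S)$ for state $\omega_1$ of Example 3, obtained from $\Delta V^{\omega}(S) = V^{\omega}(S) - \delta EV^{\omega}(S)$ with cooperative transition probabilities (0.3, 0.7), for instance $538.60 - 0.99\,(0.3 \cdot 538.60 + 0.7 \cdot 540.60) = 4.0$ for coalition $\{1\}$, while for the grand coalition $\Delta V$ equals the total cooperative stage payoff 14.

```python
from itertools import combinations


def in_delta_core(beta, delta_v, tol=1e-9):
    """Check beta against the core-type constraints built on Delta V.

    beta: dict player -> payment in the given state.
    delta_v: dict frozenset(coalition) -> Delta V of that coalition.
    """
    players = sorted(beta)
    grand = frozenset(players)
    # Efficiency: payments add up to Delta V of the grand coalition.
    if abs(sum(beta.values()) - delta_v[grand]) > tol:
        return False
    # Coalitional rationality for every proper nonempty coalition.
    for size in range(1, len(players)):
        for coalition in combinations(players, size):
            paid = sum(beta[i] for i in coalition)
            if paid < delta_v[frozenset(coalition)] - tol:
                return False
    return True


# Delta V in state 1 of Example 3 (derived as described above).
DELTA_V_STATE_1 = {
    frozenset({1}): 4.0,
    frozenset({2}): 5.0,
    frozenset({1, 2}): 14.0,
}

shapley_idp = {1: 6.5, 2: 7.5}       # transfers from the example
unbalanced_idp = {1: 3.0, 2: 11.0}   # hypothetical payments, Firm 1 underpaid

print(in_delta_core(shapley_idp, DELTA_V_STATE_1))
print(in_delta_core(unbalanced_idp, DELTA_V_STATE_1))
```

The first call returns True and the second False, since in the second case Firm 1 receives less than $\Delta V^{\omega_1}(\{1\})$. The check covers condition (89) only; condition (90) has to be verified separately.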
3.8. Stochastic game with one absorbing state
Noncooperative game. In this section we consider a two-player game with two states. The set of players is $N = \{1, 2\}$. Let state $\omega_1$ be given by:

         C            D
  C   (a, a+1)     (c, b)
  D   (b, c)       (d+1, d)

Players have two pure actions, C (to cooperate) and D (to defect). The constants satisfy the inequalities:
$$b > a + 1, \quad a > d + 1, \quad d > c > 0.$$
We also assume
$$2a + 1 > b + c. \qquad (92)$$
From inequality (92) it follows that players receive a larger total payoff by cooperating than defecting. The game represents a Prisoners' Dilemma with asymmetric payoffs in action profiles $(C, C)$ and $(D, D)$. If action profiles $(C, C)$ and $(D, D)$ are chosen in state $\omega_1$, the stochastic game remains in state $\omega_1$. If action profile $(C, D)$ or $(D, C)$ is chosen, the game transits to state $\omega_2$, which is "absorbing", i.e., this state will be realised in all following stages of the game with probability 1. In state $\omega_2$ both players have a unique action D with payoff $d$:

  ω2:     D
     D  (d, d)      (93)

The transition probabilities from states $\omega_1$ and $\omega_2$ are

  ω1:  (1, 0)   (0, 1)        ω2:  (0, 1)
       (0, 1)   (1, 0)

The discount factor is $\delta \in (0, 1)$ and the vector of the initial distribution on the set of states is $\pi_0 = (1, 0)$, i.e., the game starts in state $\omega_1$.
Cooperative game. For this game we construct a cooperative game by determining the characteristic functions for all subgames and the whole game. We then show how we need to redistribute the stage payoffs adopting IDP to obtain the subgame consistency of the Shapley value. The condition of strategic stability gives the lower bound of the discount factor.
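The first step is to determine the cooperative form $G^c$ of non-cooperative stochastic game $G$. We calculate the values of the characteristic functions for each subgame (starting from $\omega_1$ and $\omega_2$). The cooperative strategy profile $\eta^* = (\eta_1^*, \eta_2^*)$ prescribes action C in state $\omega_1$ and action D in state $\omega_2$ for both players. With profile $\eta^*$, the value of the characteristic function for the grand coalition $N$ is
$$\bar V(\{1, 2\}) = \bar E_1(\eta^*) + \bar E_2(\eta^*) = 2a + 1 + \delta(2a + 1) + \ldots = \frac{2a + 1}{1 - \delta}. \qquad (94)$$
In particular, the values of characteristic function $V^{\omega}(\{1, 2\})$ for both subgames are
$$V(\{1, 2\}) = \begin{pmatrix} V^{\omega_1}(\{1, 2\}) \\ V^{\omega_2}(\{1, 2\}) \end{pmatrix} = \begin{pmatrix} \dfrac{2a + 1}{1 - \delta} \\ \dfrac{2d}{1 - \delta} \end{pmatrix}. \qquad (95)$$
We can now calculate the values of the characteristic functions of coalitions $\{1\}$ and $\{2\}$:
$$V^{\omega_1}(\{1\}) = \max_{\eta_1}\min_{\eta_2} E_1^{\omega_1}(\eta_1, \eta_2) = \min_{\eta_2}\max_{\eta_1} E_1^{\omega_1}(\eta_1, \eta_2) = \frac{d + 1}{1 - \delta},$$
$$V^{\omega_1}(\{2\}) = \max_{\eta_2}\min_{\eta_1} E_2^{\omega_1}(\eta_1, \eta_2) = \min_{\eta_1}\max_{\eta_2} E_2^{\omega_1}(\eta_1, \eta_2) = \frac{d}{1 - \delta},$$
$$V^{\omega_2}(\{1\}) = V^{\omega_2}(\{2\}) = \frac{d}{1 - \delta}.$$
By equation (51), the values of the characteristic functions for the empty set are zero:
$$V(\emptyset) = (0, 0)^T.$$
Using (10), we then calculate the values of the characteristic function $\bar V(\cdot)$ for all possible coalitions taking into account the initial distribution of states $\pi_0 = (1, 0)$:
$$\bar V(\emptyset) = 0, \quad \bar V(\{1\}) = \frac{d + 1}{1 - \delta}, \quad \bar V(\{2\}) = \frac{d}{1 - \delta}, \quad \bar V(\{1, 2\}) = \frac{2a + 1}{1 - \delta}.$$
In this way, we determine cooperative stochastic subgames $G^{\omega_j}$ as the set $(N, V^{\omega_j}(\cdot))$, $j = 1, 2$, and cooperative stochastic game $G^c$ as the set $(N, \bar V(\cdot))$.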
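The Shapley value. We assume that players choose the Shapley value as an imputation of their total payoff in cooperative stochastic game $G^c$ and in all subgames $G^{\omega_j}$, $j = 1, 2$. For a two-person game, this is given by:
$$
\alpha_i^{\omega_j} = V^{\omega_j}(\{i\}) + \frac{V^{\omega_j}(\{1,2\}) - V^{\omega_j}(\{1\}) - V^{\omega_j}(\{2\})}{2},
$$
where $i = 1, 2$ and $j = 1, 2$. The Shapley values for the subgames are:
$$
\alpha_1 = \begin{pmatrix} \alpha_1^{\omega_1} \\ \alpha_1^{\omega_2} \end{pmatrix} = \begin{pmatrix} \dfrac{a+1}{1-\delta} \\[2mm] \dfrac{d}{1-\delta} \end{pmatrix}, \qquad
\alpha_2 = \begin{pmatrix} \alpha_2^{\omega_1} \\ \alpha_2^{\omega_2} \end{pmatrix} = \begin{pmatrix} \dfrac{a}{1-\delta} \\[2mm] \dfrac{d}{1-\delta} \end{pmatrix}.
$$
Taking into account the vector of initial distribution $\pi_0$, we are able to determine the Shapley value $\bar\alpha$ in game $G^c$ by Definition 20:
$$
\bar\alpha = (\bar\alpha_1, \bar\alpha_2) = \left( \frac{a+1}{1-\delta},\ \frac{a}{1-\delta} \right).
$$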
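As a numerical check of the Shapley values above, the following Python sketch applies the standard two-player Shapley formula to the closed-form characteristic function values of both subgames. The helper names `shapley_two_player` and `characteristic_values` and the sample values a = 4, d = 2, δ = 3/5 are illustrative assumptions; the characteristic function is not recomputed here but taken from the expressions in the text.

```python
from fractions import Fraction

def shapley_two_player(v1, v2, v12):
    """Shapley value of a two-player TU game with v(empty set) = 0."""
    surplus = v12 - v1 - v2  # gain from cooperation, split equally
    return (v1 + surplus / 2, v2 + surplus / 2)

def characteristic_values(a, d, delta):
    """Closed-form characteristic function values per state, as given in the text."""
    one = 1 - delta
    return {
        "w1": {"1": Fraction(d + 1) / one, "2": Fraction(d) / one,
               "12": Fraction(2 * a + 1) / one},
        "w2": {"1": Fraction(d) / one, "2": Fraction(d) / one,
               "12": Fraction(2 * d) / one},
    }

a, d, delta = 4, 2, Fraction(3, 5)
V = characteristic_values(a, d, delta)
# Shapley value of each subgame (one pair per starting state).
alpha = {w: shapley_two_player(V[w]["1"], V[w]["2"], V[w]["12"]) for w in V}
# Shapley value of the whole game: weight the subgames by the initial distribution.
pi0 = {"w1": 1, "w2": 0}
alpha_bar = tuple(sum(pi0[w] * alpha[w][i] for w in V) for i in range(2))
```

For these sample values the result in state ω1 is (25/2, 10), which agrees with ((a+1)/(1-δ), a/(1-δ)).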
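Subgame consistency of the Shapley value. We are now in a position to verify the principles of stable cooperation. We begin with subgame consistency. If players receive payoffs according to the initially defined payoff functions, their total payoffs will be $\frac{a}{1-\delta}$ and $\frac{a+1}{1-\delta}$, in contrast to the components of the Shapley value, $\frac{a+1}{1-\delta}$ and $\frac{a}{1-\delta}$. In order to obtain subgame consistency, we compute the IDP using (53):
$$
\beta_1 = (I - \delta\Pi(\eta^*))\alpha_1 =
\begin{pmatrix} 1-\delta & 0 \\ 0 & 1-\delta \end{pmatrix}
\begin{pmatrix} \dfrac{a+1}{1-\delta} \\[2mm] \dfrac{d}{1-\delta} \end{pmatrix}
= \begin{pmatrix} a+1 \\ d \end{pmatrix},
$$
$$
\beta_2 = (I - \delta\Pi(\eta^*))\alpha_2 =
\begin{pmatrix} 1-\delta & 0 \\ 0 & 1-\delta \end{pmatrix}
\begin{pmatrix} \dfrac{a}{1-\delta} \\[2mm] \dfrac{d}{1-\delta} \end{pmatrix}
= \begin{pmatrix} a \\ d \end{pmatrix}.
$$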
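The IDP computation is a single matrix-vector product and can be sketched in Python as follows. The function name `idp` and the sample values a = 4, d = 2, δ = 3/5 are assumptions made for illustration. The transition matrix under the cooperative profile is the identity here, because both states are retained with probability 1.

```python
from fractions import Fraction

def idp(alpha, transition, delta):
    """Stage payments beta = (I - delta * Pi) * alpha for one player."""
    n = len(alpha)
    beta = []
    for s in range(n):
        # Expected continuation value of the imputation from state s.
        expected_next = sum(transition[s][t] * alpha[t] for t in range(n))
        beta.append(alpha[s] - delta * expected_next)
    return beta

a, d, delta = 4, 2, Fraction(3, 5)
Pi = [[1, 0], [0, 1]]  # transitions under the cooperative profile
alpha1 = [Fraction(a + 1) / (1 - delta), Fraction(d) / (1 - delta)]
alpha2 = [Fraction(a) / (1 - delta), Fraction(d) / (1 - delta)]
beta1 = idp(alpha1, Pi, delta)  # stage payments to player 1 in (w1, w2)
beta2 = idp(alpha2, Pi, delta)  # stage payments to player 2 in (w1, w2)
```

With these values `beta1` is [5, 2] and `beta2` is [4, 2], i.e. (a+1, d) and (a, d). Paying these amounts at every stage and discounting them recovers the Shapley values of the subgames.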
We then determine the $\alpha$-regularisation of the initial stochastic game $G$ using the IDP and Definition 23. We re-establish the payoff functions of the initial game in state $\omega_1$ when players adopt the cooperative action profile. Therefore, the players' payoffs in state $\omega_1$ are:
$$
\omega_1:\quad
\begin{array}{c|cc}
 & C & D \\ \hline
C & (a+1,\ a) & (c,\ b) \\
D & (b,\ c) & (d+1,\ d)
\end{array}
$$
Thus, the payoffs in state $\omega_1$ for action profile (C, C) are $(a+1, a)$ instead of $(a, a+1)$. In the stochastic game regularised by the method described above, the Shapley value and the corresponding IDP satisfy the principle of subgame consistency (see Theorem 9).
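Strategic support of the Shapley value. We now evaluate the strategic support of the Shapley value by checking whether players may deviate from the cooperative strategy profile in state $\omega_1$ (in state $\omega_2$ players have a unique action). In this state the cooperative action profile is not a Nash equilibrium, thus players may benefit from deviation. We should check whether the inequality
$$
\alpha_i^{\omega_1} \geq F^{\omega_1}(\{i\}) \qquad (96)
$$
is true for any $i = 1, 2$, where
$$
F^{\omega_1}(\{i\}) = \max_{a_i^{\omega_1} \in A_i^{\omega_1}} \Big\{ K_i^{\omega_1}\big(a_i^{\omega_1}, a_{N\setminus i}^{\omega_1 *}\big) + \delta \sum_{\omega' \in \Omega} p\big(\omega' \mid \omega_1, (a_i^{\omega_1}, a_{N\setminus i}^{\omega_1 *})\big) V^{\omega'}(\{i\}) \Big\}.
$$
For player 1, inequality (96) yields:
$$
\frac{a+1}{1-\delta} \geq b + \delta d + \delta^2 d + \dots = b + \frac{\delta d}{1-\delta},
$$
and for player 2:
$$
\frac{a}{1-\delta} \geq b + \delta d + \delta^2 d + \dots = b + \frac{\delta d}{1-\delta}.
$$
Both inequalities hold if the following condition on the discount factor $\delta$ is satisfied:
$$
\delta \geq \frac{b-a}{b-d}.
$$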
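The bound on the discount factor can be checked numerically. The Python sketch below evaluates inequality (96) for both players in state ω1, taking the maximum over the player's two actions (continuing with C, or defecting with D against C). The function name `supports_cooperation`, the use of the regularised (C, C) payoffs (a+1, a), and the sample values a = 4, b = 6, d = 2 are assumptions made for illustration.

```python
from fractions import Fraction

def supports_cooperation(a, b, d, delta):
    """Check inequality (96) in state w1 for both players."""
    one = 1 - delta
    alpha = (Fraction(a + 1) / one, Fraction(a) / one)  # Shapley values in w1
    v_w1 = (Fraction(d + 1) / one, Fraction(d) / one)   # V^{w1}({i})
    v_w2 = Fraction(d) / one                            # V^{w2}({i})
    stage_coop = (a + 1, a)                             # regularised (C, C) payoffs
    for i in range(2):
        stay = stage_coop[i] + delta * v_w1[i]  # play C, game stays in w1
        defect = b + delta * v_w2               # play D against C, game moves to w2
        if alpha[i] < max(stay, defect):
            return False
    return True

a, b, d = 4, 6, 2
threshold = Fraction(b - a, b - d)  # lower bound on delta from the text
```

For these values the threshold is 1/2: the check passes for δ = 1/2 and δ = 3/5 and fails for δ = 2/5, where player 2 gains by defecting.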
Principle of irrational-behaviour-proof. In order to verify irrational-behaviour proof, we need to compare the payoffs of each player when:
1) A player acts as an "individual player" during the whole game.
2) A player cooperates with a competitor until some stage and then plays individually.
If the payoff of 2) is not less than the payoff of 1), then this principle is satisfied.
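For players 1 and 2, respectively, we obtain:
$$
(I - \delta\Pi(\eta^*))(\alpha_1 - V(\{1\})) =
\begin{pmatrix} 1-\delta & 0 \\ 0 & 1-\delta \end{pmatrix}
\begin{pmatrix} \dfrac{a+1}{1-\delta} - \dfrac{d+1}{1-\delta} \\[2mm] \dfrac{d}{1-\delta} - \dfrac{d}{1-\delta} \end{pmatrix}
= \begin{pmatrix} a-d \\ 0 \end{pmatrix} \geq 0,
$$
$$
(I - \delta\Pi(\eta^*))(\alpha_2 - V(\{2\})) =
\begin{pmatrix} 1-\delta & 0 \\ 0 & 1-\delta \end{pmatrix}
\begin{pmatrix} \dfrac{a}{1-\delta} - \dfrac{d}{1-\delta} \\[2mm] \dfrac{d}{1-\delta} - \dfrac{d}{1-\delta} \end{pmatrix}
= \begin{pmatrix} a-d \\ 0 \end{pmatrix} \geq 0.
$$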
Hence both players benefit from cooperation: even if the IDP is applied only during some initial stages and the game is then played non-cooperatively with the initially defined payoff functions, each player receives at least as much as in a game played individually by both players from the start.
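The irrational-behaviour-proof condition can be verified componentwise in the same way. In the Python sketch below, the function name `irrational_behaviour_check` and the sample values a = 4, d = 2, δ = 3/5 are illustrative assumptions; the vectors of Shapley values and of singleton coalition values are the closed-form expressions from the text, listed for states (ω1, ω2).

```python
from fractions import Fraction

def irrational_behaviour_check(alpha, v_single, transition, delta):
    """Return (I - delta * Pi)(alpha - V({i})) and whether it is nonnegative."""
    n = len(alpha)
    diff = [alpha[s] - v_single[s] for s in range(n)]
    lhs = [diff[s] - delta * sum(transition[s][t] * diff[t] for t in range(n))
           for s in range(n)]
    return lhs, all(x >= 0 for x in lhs)

a, d, delta = 4, 2, Fraction(3, 5)
one = 1 - delta
Pi = [[1, 0], [0, 1]]  # transitions under the cooperative profile

alpha1 = [Fraction(a + 1) / one, Fraction(d) / one]
v1 = [Fraction(d + 1) / one, Fraction(d) / one]
alpha2 = [Fraction(a) / one, Fraction(d) / one]
v2 = [Fraction(d) / one, Fraction(d) / one]

lhs1, ok1 = irrational_behaviour_check(alpha1, v1, Pi, delta)
lhs2, ok2 = irrational_behaviour_check(alpha2, v2, Pi, delta)
```

Both left-hand sides equal (a - d, 0) = (2, 0) for these values, so the condition holds for each player.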
Results. To sum up, we can formulate the conditions under which the Shapley value in the described stochastic game satisfies the three principles of stable cooperation (subgame consistency, strategic support, irrational-behavior-proof):
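1. The discount factor satisfies $\delta \geq \dfrac{b-a}{b-d}$.
2. The stochastic game is $\alpha$-regularised, i.e., the players' payoffs in state $\omega_1$ are:
$$
\omega_1:\quad
\begin{array}{c|cc}
 & C & D \\ \hline
C & (a+1,\ a) & (c,\ b) \\
D & (b,\ c) & (d+1,\ d)
\end{array}
$$
and the payoffs in state $\omega_2$ remain $(d, d)$.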
4. Conclusion
The paper summarizes the results on cooperative stochastic games with finite and infinite duration based on the author's and coauthors' publications. Section 2 describes cooperative stochastic games with finite duration and considers some properties of cooperative solutions applied in dynamics. Section 3 contains a method of construction of a cooperative stochastic game with infinite duration. The principles of stable cooperation in this class of games are examined in that section. Several numerical examples illustrate the theoretical results. For applications of the theoretical results, see the following publications (Bure and Parilina, 2017, Parilina, 2009, Parilina, 2008, Parilina and Sedakov, 2015).
Acknowledgments. The author thanks the Russian Science Foundation (project no. 17-11-01079) for financial support.
References
Avrachenkov, K., Cottatellucci, L. and Maggi, L. (2013). Cooperative Markov decision processes: Time consistency, greedy players satisfaction, and cooperation maintenance. International Journal of Game Theory, 42(1), 239-262.
Aumann, R.J. and Peleg, B. (1960). Von Neumann-Morgenstern Solutions to Cooperative Games without Side Payments. Bulletin of the American Mathematical Society, 66, 173-179.
Baranova, E. M. (2006). The Condition for Keeping Cooperation in Stochastic Cooperative Games. Proceedings of the Russian-Finnish Graduate School Seminar "Dynamic Games and Multicriteria Optimization", edited by V.V. Mazalov, Petrozavodsk, 54-58.
Baranova, E. M. and Petrosjan, L. A. (2006). Cooperative Stochastic Games in Stationary Strategies. Game Theory and Applications, 11, 1-7.
Bellman, R. (1957). Dynamic programming. Princeton: Princeton University Press, New Jersey.
Bure, V. M. and Parilina, E. M. (2017). "Multiple access" game with incomplete information. Mathematical Game Theory and Its Applications, 9(4), 3-17 (in Russian).
Chander, P. and Tulkens, H. (1997). The Core of an Economy with Multilateral Environmental Externalities. International Journal of Game Theory, 23, 379-401.
Chistyakov, S. and Petrosyan, L. (2011). Strong Strategic Support of Cooperative Solutions in Differential Games. Contributions to Game Theory and Management, IV, 105-111.
Driessen, T., Muto, S. and Nakayama, M. (1992). A cooperative Game of Information Trading: the Core, the Nucleolus and the Kernel. ZOR Methods and Models of Operations Research, 36(1), 55-72.
Filar, J. and Vrieze, K. (1997). Competitive Markov Decision Processes. New York: Springer-Verlag.
Fink, A.M. (1964). Equilibrium in a stochastic n-person game. Journal of Science of the Hiroshima University, A-I 28, 89-93.
Gillies, D.B. (1959). Solutions to general non-zero-sum games. In: Tucker, A. W.; Luce, R. D. Contributions to the Theory of Games IV. (Annals of Mathematics Studies 40). Princeton: Princeton University Press, 47-85.
Gromova, E. V. and Petrosyan, L. A. (2015). On an approach of construction of characteristic function in cooperative differential games. Mathematical Game Theory and Its Applications, 7(4), 19-39 (in Russian).
Horner, J., Rosenberg, D., Solan, E. and Vieille, N. (2010). On a Markov game with one-sided information. Operations Research, 58(4 PART 2), 1107-1115.
Jaśkiewicz, A. and Nowak, A.S. (2016). Stationary almost Markov perfect equilibria in discounted stochastic games. Mathematics of Operations Research, 41(2), 430-441.
Kohlberg, E. (1971). On the Nucleolus of a Characteristic Function Game. SIAM Journal of Applied Mathematics, 20, 62-66.
Kohlberg, E. (1972). The Nucleolus as a Solution to a Minimization Problem. SIAM Journal of Applied Mathematics, 23(1), 34-39.
Kohlberg, E. and Neyman, A. (2015). The cooperative solution of stochastic games. Harvard Business School. Working Paper, No. 15-071.
Kuhn, H. W. (1950). Extensive Games. Proceedings of National Academy of Sciences of the USA, 36, 570-576.
Kuhn, H. W. (1953). Extensive Games and the Problem of Information. Annals of Mathematics Studies, 28, 193-216.
Markovkin, M. V. (2006). D. W. K. Yeung's Condition for Linear Quadratic Differential Games. In: Dynamic Games and Their Applications, eds. L. A. Petrosyan, A. Y. Garnaev, St. Petersburg State University, St. Petersburg, 207-216.
Maschler, M. and Peleg, B. (1976). Stable sets and stable points of set-valued dynamic systems with applications to game theory. SIAM Journal on Control and Optimization, 14(2), 985-995.
WOLFRAM MATHEMATICA, https://www.wolfram.com/mathematica/
Mathworks (2017). https://se.mathworks.com
Meinhardt, H. I. Graphical Extensions of the Mathematica Package TuGames. http://library.wolfram.com/infocenter/MathSource/5709/TuGamesView3D.pdf.
Mertens, J.-F. and Neyman, A. (1981a). Minimax Theorems for Undiscounted Stochastic Games. Game Theory and Mathematical Economics, pp. 83-87.
Mertens, J. F. and Neyman, A. (1981b). Stochastic Games. International Journal of Game Theory, 10, 53-66.
Montero, M. (2005). On the nucleolus as a Power Index. Homo Oeconomicus, 22(4), 551-567.
von Neumann, J. and Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton: Princeton University Press.
Neyman, A. (2008). Existence of Optimal Strategies in Markov Games with Incomplete Information. International Journal of Game Theory, 37(4), 581-596.
Neyman, A. (2013). Stochastic Games with Short-Stage Duration. Dynamic Games and Applications, 3, 236-278.
Neyman, A. and Sorin, S. (2003). Stochastic Games and Applications. Dordrecht: Kluwer Academic Press.
Nowak, A.S. (1985). Existence of equilibrium stationary strategies in discounted noncooperative stochastic games with uncountable state space. Journal of Optimization Theory and Applications, 45(4), 591-602.
Nowak, A.S. (1999). Sensitive equilibria for ergodic stochastic games with countable state spaces. Mathematical Methods of Operations Research, 50(1), 65-76.
Nowak, A.S. and Radzik, T. (1994). A solidarity value for n-person transferable utility games. International Journal of Game Theory, 23(1), 43-48.
Parilina, E. M. (2014). Strategic consistency of single-point optimality principles in cooperative stochastic games. Mathematical Game Theory and Its Applications, 6(1), 56-72 (in Russian).
Parilina, E. M. (2015). Stable cooperation in stochastic games. Automation and Remote Control, 76, 1111-1122.
Parilina, E. (2016). Strategic Support of the Shapley Value in Stochastic Games. Contributions to Game Theory and Management, IX, 246-265.
Parilina, E. M. (2009). Cooperative stochastic game of data transmission in wireless network. Mathematical Game Theory and Its Applications, 1(4), 93-110 (in Russian).
Parilina, E. M. (2008). Subgame Consistency of Shapley Value in Cooperative Data Transmission Game in Wireless Network. Contributions to Game Theory and Management, SPb, 1, 381-994.
Parilina, E. and Sedakov, A. (2015). Stochastic Approach for Determining Stable Coalition Structure. International Game Theory Review, 17(4), pp. 155009-1-155009-22.
Parilina, E. M. and Petrosyan, L. A. (2017). Strongly subgame consistent core in stochastic games. Mathematical Game Theory and Its Applications, 9(2), 39-61 (in Russian).
Parilina, E. and Tampieri, A. (2018). Stability and Cooperative Solution in Stochastic Games. Theory and Decision, 84(4), 601-625.
Parilina, E. and Zaccour, G. (2015). Node-Consistent Core for Games Played over Event Trees. Automatica, 53, 304-311.
Pecherski, S.L. and Yanovskaya, E. B. (2004). Cooperative games: solutions and axioms. Saint Petersburg: Publ. house of European university in SPb (in Russian).
Petrosjan, L. A. (2006). Cooperative Stochastic Games. In: Advances in Dynamic Games. Annals of the ISDG. Application to Economics, Engineering and Environmental Management, ed. by A. Haurie, S. Muto, L. A. Petrosjan, T.E.S. Raghavan, pp. 139-146.
Petrosjan, L. A. and Baranova, E. M. (2005). Cooperative Stochastic Games in Stationary Strategies. Proceedings of the Fifth International ISDG Workshop International Society of Dynamic Games, Segovia (Spain), pp. 225-234.
Petrosjan, L. and Zaccour, G. (2003). Time-consistent Shapley value allocation of pollution cost reduction. Journal of Economic Dynamics & Control, 27, 381-398.
Petrosjan, L. A. and Zenkevich, N. A. (2015). Conditions for Sustainable Cooperation. Automation and Remote Control, 76(10), 1894-1904.
Petrosyan, L. A. (1977). Time consistency of solutions in cooperative games with many participants. Vestnik Leningradskogo universiteta. Séria 1, 19, 46-52 (in Russian).
Petrosyan, L. A. (1992). Construction of strongly time-consistent solutions in cooperative differential games. Vestnik Leningradskogo universiteta. Séria 1, 2, 33-38 (in Russian).
Petrosyan, L. A. (1993). Strongly time-consistent optimality principles in multi-criteria problems of optimal control. Tekhnicheskaya kibernetika, 1, 169-174 (in Russian).
Petrosyan, L. A. (1998). Semi-cooperative games. Vestnik Leningradskogo universiteta. Séria 1, 2, 57-63 (in Russian).
Petrosyan, L. A. and Baranova, E. M. (2003). Stochastic games with random duration. Trudi XXXIVth nauchnoi konferencii aspirantov i studentov "Procesi upravleniya i ustoichivost". SPb, 456-462 (in Russian).
Petrosyan, L. A. and Baranova, E. M. (2005a). Cooperative stochastic games. Tezisi dokladov Mezhdunarodnogo seminara "Teoriya upravleniya i teoriya obobjennih resheniy uravneniy Hamiltona-Yacobi". Ekaterinburg, pp. 33-35 (in Russian).
Petrosyan, L. A. and Baranova, E. M. (2005b). Cooperative stochastic games in stationary strategies. Sbornik trudov Mezhdunarodnoi conferencii "Ustoichivost i procesi upravleniya". SPb, 1, 495-503 (in Russian).
Petrosyan, L. A., Baranova, E. M. and Shevkoplyas, E. V. (2004). Multistage cooperative games with random duration. Trudi Instituta Matematiki i Mekhaniki, 10(2), 116-130 (in Russian).
Petrosyan, L. and Chistyakov, S. (2013). Strategic support of Cooperative Solutions in 2-Person Differential Games with Dependent Motions. Contributions to Game Theory and Management, 6, 388-394.
Petrosyan, L. A. and Danilov, N. A. (1979). Time consistent solutions of non-zero sum differential games with transferable payoffs. Vestnik Leningradskogo universiteta. Seria 1, 1, 46-54 (in Russian).
Petrosjan, L. A. and Grauer, L.V. (2002). Strong Nash equilibrium in multistage games. International Game Theory Review, 4(2), 255-264.
Gromova, E. V. and Petrosyan, L. A. (2015). Strongly time-consistent cooperative solution for a differential game of pollution control. Large-scale Systems Control, 55, 140-159 (in Russian).
Petrosyan, L. A. and Kuzutin, D.V. (2000). Games in extensive form: optimality and consistency. Saint Petersburg: Izd-vo S.-Peterburgskogo universiteta (in Russian).
Petrosyan, L. A. and Shevkoplyas, E. V. (2000). Cooperative differential games with random duration. Vestnik Sankt-peterburgskogo universiteta. Seria 1, 4, 14-18 (in Russian).
Petrosyan, L. and Sedakov, A. (2015). Strategic support of cooperation in dynamic games on networks. In: "Stability and Control Processes" in Memory of V.I. Zubov (SCP), 2015 International Conference, 256-260.
Reddy, P. V. and Zaccour, G. (2016). A friendly computable characteristic function. Mathematical Social Sciences, 82, 18-25.
Schmeidler, D. (1969). The Nucleolus of a Characteristic Function game. SIAM Journal of Applied Mathematics, 17, 1163-1170.
Schneider, R. (1993). Convex bodies: the Brunn-Minkowski theory. Cambridge Univ. Press.
Shapley, L. S. (1953a). Stochastic Games. Proceedings of National Academy of Sciences of the USA, 39, 1095-1100.
Shapley, L. S. (1953b). A Value for n-person Games. In: H.W. Kuhn, A.W. Tucker, eds. Contributions to the Theory of Games II, Princeton: Princeton Univ. Press, 307-317.
Sedakov, A. A. (2015). The strong time-consistent core. Mathematical Game Theory and Its Applications, 7(2), 69-84.
Selten, R. (1975). Reexamination of the perfectness concept for equilibrium points in extensive games. International Journal of Game Theory, 4, 25-55.
Shevkoplyas, E. V. (2010). Stable cooperation in differential games with random duration. Mathematical Game Theory and Its Applications, 2(3), 162-190 (in Russian).
Solan, E. (1998). Discounted Stochastic Games. Mathematics of Operations Research, 23, 1010-1021.
Solan, E. (2009). Stochastic Games. In: Encyclopedia of Database Systems, Springer.
Solan, E. and Vieille, N. (2002). Correlated Equilibrium in Stochastic Games. Games and Economic Behavior, 38, 362-399.
Solan, E. and Vieille, N. (2015). Stochastic games. Proceedings of the National Academy of Sciences of the United States of America, 112(45), 13743-13746.
Takahashi, M. (1964). Stochastic games with infinitely many strategies. Journal of Science of the Hiroshima University, A-I 28, 95-99.
http://mmiras.webs.uvigo.es/TUGlab/
Tur, A.V. (2014). The irrational behavior proof condition for linear-quadratic discrete-time dynamic games with nontransferable payoffs. Contributions to Game Theory and Management, 7, 384-392.
Vieille, N. (2000). Solvable states in n-player stochastic games. SIAM Journal on Control and Optimization, 38(6), 1794-1804.
Vilkas, E. (1990). Optimality in games and solutions. Moscow: Nauka (in Russian).
Vorobiev, N. N. (1960). Stable profiles in coalitional games. Dokladi Akademii USSR, 131, 493-495 (in Russian).
Vorobiev, N.N. (1967). Coalitional games. Teoriya Veroyatnostei i Ee Primenenie, 12(2), 289-306 (in Russian).
Vorobiev, N. N. (1985). Game theory for economists-cyberneticians. Moscow: Nauka (in Russian).
Yeung, D.W. K. (2006). An irrational-behavior-proof condition in cooperative differential games. International Game Theory Review, 8, 739-744.
Yeung, D.W. K. and Petrosyan, L. A. (2011). Subgame Consistent Cooperative Solution of Dynamic Games with Random Horizon. Journal of Optimization Theory and Applications, 150(1), 78-97.