Subgame Consistent Solution for Random-Horizon Cooperative Dynamic Games

David W.K. Yeung1 and Leon A. Petrosyan2
1 SRS Consortium for Advanced Study in Cooperative Dynamic Games, Shue Yan University, and Center of Game Theory, St. Petersburg State University. E-mail: dwkyeung@hksyu.edu
2 Faculty of Applied Mathematics-Control Processes, St. Petersburg State University. E-mail: Spbuoasis7@petrlink.ru
Abstract. In cooperative dynamic games a stringent condition, that of subgame consistency, is required for a dynamically stable cooperative solution. In particular, a cooperative solution is subgame consistent if an extension of the solution policy to a subgame starting at a later time with a state brought about by prior optimal behavior remains optimal. This paper extends subgame consistent solutions to dynamic (discrete-time) cooperative games with random horizon. In the analysis, new forms of the Bellman equation and the Isaacs-Bellman equation in discrete time are derived. Subgame consistent cooperative solutions are obtained for this class of dynamic games, and analytically tractable payoff distribution mechanisms which lead to the realization of these solutions are developed. This is the first time that subgame consistent solutions for cooperative dynamic games with random horizon are presented.
Keywords: Cooperative dynamic games, random horizon, subgame consistency.
AMS Subject Classifications. Primary 91A12, Secondary 91A25.
1. Introduction
Cooperative games suggest the possibility of socially optimal and group efficient solutions to decision problems involving strategic action. In cooperative dynamic games a stringent condition, that of subgame consistency, is required for a dynamically stable cooperative solution. A cooperative solution is subgame consistent if an extension of the solution policy to a subgame starting at a later time with a state brought about by prior optimal behavior remains optimal. Dynamic consistency thus ensures that as the game proceeds players are guided by the same optimality principle at each instant of time, and hence do not possess incentives to deviate from the previously adopted optimal behavior. A rigorous framework for the study of subgame-consistent solutions in cooperative stochastic differential games (which are formulated in continuous time) was established in Yeung and Petrosyan (2004 and 2006). A generalized theorem was developed for the derivation of an analytically tractable "payoff distribution procedure" leading to dynamically consistent solutions. Cooperative games with subgame consistent solutions are presented in Petrosyan and Yeung (2007), Yeung (2007 and 2008), and Yeung and Petrosyan (2006).
In discrete-time dynamic games, Basar and Ho (1974) examined informational properties of the Nash solutions of stochastic nonzero-sum games, and Basar (1978) developed equilibrium solutions of linear-quadratic stochastic games with noisy observation. Krawczyk and Tidball (2006) considered a dynamic game of water allocation. Nie et al. (2006) adopted a dynamic programming approach to discrete-time dynamic Stackelberg games. Dockner and Nishimura (1999) and Rubio and Ulph (2007) presented discrete-time dynamic games for pollution management. Dutta and Radner (2006) presented a discrete-time dynamic game used to study global warming. Ehtamo and Hamalainen (1993) examined cooperative incentive equilibria for a dynamic resource game. Yeung and Petrosyan (2010) developed a generalized theorem for the derivation of an analytically tractable "payoff distribution procedure" leading to dynamically consistent solutions for cooperative stochastic dynamic games.
In this paper, we extend subgame consistent solutions to discrete-time dynamic cooperative games with random horizon. In many game situations, the terminal time of the game is not known with certainty. Examples of such problems include uncertainty in the renewal of a lease, the terms of office of elected authorities or directorships, contract renewal, and the continuation of agreements subject to periodic negotiations. Petrosyan and Murzov (1966) first developed the Isaacs-Bellman equations under random horizon for zero-sum differential games. Petrosyan and Shevkoplyas (2003) gave the first analysis of dynamically consistent solutions for cooperative differential games with random duration. Shevkoplyas (2011) considered the Shapley value in cooperative differential games with random horizon. In this paper, a class of discrete-time dynamic games with random duration is formulated. A dynamic programming technique for solving inter-temporal problems with random horizon is developed to serve as the foundation for solving the game problem. To characterize a noncooperative equilibrium, a set of random-duration discrete-time Isaacs-Bellman equations is derived.
Moreover, subgame consistent cooperative solutions are derived for dynamic games with random horizon, together with analytically tractable payoff distribution mechanisms which lead to the realization of these solutions. This represents the first attempt to seek subgame consistent solutions for cooperative dynamic games with random horizons. The analysis widens the application of cooperative dynamic game theory to problems where the game horizon is random. The organization of the paper is as follows. Game formulation and derivation of dynamic programming techniques for random-horizon problems are provided in Section 2. A feedback Nash equilibrium is characterized for dynamic games with random horizon in Section 3. Dynamic cooperation under random horizon is presented in Section 4. Subgame consistent solutions and a payment mechanism leading to these solutions are obtained in Section 5. Concluding remarks are given in Section 6.
2. Game Formulation and Dynamic Programming with Random Horizon
In this section, we first formulate a class of dynamic games with random duration. Then we develop a dynamic programming technique for solving inter-temporal problems with random horizon which will serve as the foundation of solving the game problem.
2.1. Game Formulation
The n-person dynamic game under consideration is a $\hat T$-stage game, where $\hat T$ is a random variable with range $\{1, 2, \ldots, T\}$ and corresponding probabilities $\{\theta_1, \theta_2, \ldots, \theta_T\}$. Conditional upon reaching stage $t$, the probabilities that the game lasts up to stages $t, t+1, \ldots, T$ become, respectively,
$$\frac{\theta_t}{\sum_{\zeta=t}^{T} \theta_\zeta},\quad \frac{\theta_{t+1}}{\sum_{\zeta=t}^{T} \theta_\zeta},\quad \ldots,\quad \frac{\theta_T}{\sum_{\zeta=t}^{T} \theta_\zeta}.$$
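As a numerical illustration, the conditional probabilities above can be computed by renormalizing the tail of the horizon distribution once stage $t$ has been reached. The Python sketch below uses a hypothetical horizon distribution (the numbers are not from the paper).

```python
def conditional_horizon_probs(theta, t):
    """Return {tau: P(T_hat = tau | stage t reached)} for tau = t, ..., T.

    theta[k-1] is the unconditional probability that the game lasts exactly k stages.
    """
    tail = sum(theta[t - 1:])          # sum_{zeta=t}^{T} theta_zeta (stages are 1-indexed)
    return {tau: theta[tau - 1] / tail for tau in range(t, len(theta) + 1)}

theta = [0.1, 0.2, 0.3, 0.4]           # hypothetical horizon distribution, T = 4
probs = conditional_horizon_probs(theta, 3)
# Conditional on reaching stage 3, only horizons 3 and 4 remain possible,
# with probabilities 0.3/0.7 and 0.4/0.7.
```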
The payoff of player $i$ at stage $k \in \{1, 2, \ldots, T\}$ is $g_k^i[x_k, u_k^1, u_k^2, \ldots, u_k^n, x_{k+1}]$. When the game ends after stage $\hat T$, player $i$ receives a terminal payment $q_{\hat T+1}^i(x_{\hat T+1})$ in stage $\hat T + 1$.
The state space of the game is $X \subset R^m$ and the state dynamics of the game are characterized by the difference equation
$$x_{k+1} = f_k(x_k, u_k^1, u_k^2, \ldots, u_k^n), \qquad (1)$$
for $k \in \{1, 2, \ldots, T\} \equiv \mathbb{T}$ and $x_1 = x^0$, where $u_k^i \in R^{m_i}$ is the control vector of player $i$ at stage $k$ and $x_k \in X$ is the state.
The objective of player $i$ is
$$E\Big\{ \sum_{k=1}^{\hat T} g_k^i[x_k, u_k^1, u_k^2, \ldots, u_k^n, x_{k+1}] + q_{\hat T+1}^i(x_{\hat T+1}) \Big\} = \sum_{\hat T=1}^{T} \theta_{\hat T} \Big\{ \sum_{k=1}^{\hat T} g_k^i[x_k, u_k^1, u_k^2, \ldots, u_k^n, x_{k+1}] + q_{\hat T+1}^i(x_{\hat T+1}) \Big\}, \qquad (2)$$
for $i \in \{1, 2, \ldots, n\} \equiv N$.
In the continuous-time game of Petrosyan and Shevkoplyas (2003), the game horizon can extend to infinity. In the discrete-time game (1)-(2), the game horizon is random but finite. To solve the game (1)-(2), we first have to derive a dynamic programming technique for solving a random-horizon problem.
2.2. Dynamic Programming for Random Horizon Problem
Consider the case when $n = 1$ in the system (1)-(2). The payoff at stage $k \in \{1, 2, \ldots, T\}$ is $g_k[x_k, u_k, x_{k+1}]$. If the game ends after stage $\hat T$, the decision maker receives a terminal payment $q_{\hat T+1}(x_{\hat T+1})$ in stage $\hat T + 1$.
The problem can be formulated as the maximization of the expected payoff
$$E\Big\{ \sum_{k=1}^{\hat T} g_k[x_k, u_k, x_{k+1}] + q_{\hat T+1}(x_{\hat T+1}) \Big\} = \sum_{\hat T=1}^{T} \theta_{\hat T} \Big\{ \sum_{k=1}^{\hat T} g_k[x_k, u_k, x_{k+1}] + q_{\hat T+1}(x_{\hat T+1}) \Big\}, \qquad (3)$$
subject to the dynamics
$$x_{k+1} = f_k(x_k, u_k), \quad x_1 = x^0. \qquad (4)$$
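Exchanging the order of summation in (3) gives an equivalent "survival probability" form: the stage-$k$ payoff is collected whenever the horizon is at least $k$, i.e. with probability $\sum_{\hat T \ge k} \theta_{\hat T}$. The sketch below checks this identity numerically on hypothetical data, treating the stage payoffs along a fixed control path as plain numbers.

```python
theta = [0.1, 0.2, 0.3, 0.4]   # P(T_hat = tau), tau = 1..4 (hypothetical)
g = [5.0, 3.0, 2.0, 1.0]       # realized stage payoffs g_k along a fixed path
q = [0.0, 0.5, 1.0, 1.5]       # q[t] = terminal payment q_{tau+1} if the game ends after tau = t+1

# (a) Horizon-by-horizon expansion, as in equation (3).
ev_a = sum(theta[t] * (sum(g[:t + 1]) + q[t]) for t in range(4))

# (b) Survival-probability form: g_k weighted by P(T_hat >= k),
#     plus the theta-weighted terminal payments.
ev_b = (sum(sum(theta[t:]) * g[t] for t in range(4))
        + sum(theta[t] * q[t] for t in range(4)))
# ev_a and ev_b agree (both equal 10.5 for these numbers).
```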
Now consider the case when stage $\tau$ has arrived and the state is $x_\tau$. The problem can be formulated as the maximization of the expected payoff
$$\sum_{\hat T=\tau}^{T} \frac{\theta_{\hat T}}{\sum_{\zeta=\tau}^{T}\theta_\zeta} \Big\{ \sum_{k=\tau}^{\hat T} g_k[x_k, u_k, x_{k+1}] + q_{\hat T+1}(x_{\hat T+1}) \Big\}, \qquad (5)$$
subject to the dynamics
$$x_{k+1} = f_k(x_k, u_k), \quad \text{with } x_\tau \text{ given}. \qquad (6)$$
We define the value function $V(\tau, x)$ associated with the set of strategies $\{u_k = \mu_k(x), \text{ for } k \in \{\tau, \tau+1, \ldots, T\}\}$ that provides an optimal solution as follows:
$$V(\tau, x) = \max_{u_\tau, u_{\tau+1}, \ldots, u_T} \sum_{\hat T=\tau}^{T} \frac{\theta_{\hat T}}{\sum_{\zeta=\tau}^{T}\theta_\zeta} \Big\{ \sum_{k=\tau}^{\hat T} g_k[x_k, u_k, x_{k+1}] + q_{\hat T+1}(x_{\hat T+1}) \,\Big|\, x_\tau = x \Big\}$$
$$= \sum_{\hat T=\tau}^{T} \frac{\theta_{\hat T}}{\sum_{\zeta=\tau}^{T}\theta_\zeta} \Big\{ \sum_{k=\tau}^{\hat T} g_k[x_k^*, \mu_k(x_k^*), x_{k+1}^*] + q_{\hat T+1}(x_{\hat T+1}^*) \,\Big|\, x_\tau^* = x \Big\}, \qquad (7)$$
for $\tau \in \mathbb{T}$, where $x_{k+1}^* = f_k[x_k^*, \mu_k(x_k^*)]$ and $x_1^* = x^0$.
A theorem characterizing an optimal solution to the random-horizon problem (3)-(4) is provided below.
Theorem 1. A set of strategies $\{u_k = \mu_k(x), \text{ for } k \in \mathbb{T}\}$ provides an optimal solution to the problem (3)-(4) if there exist functions $V(k, x)$, for $k \in \mathbb{T}$, such that the following recursive relations are satisfied:
$$V(T+1, x) = q_{T+1}(x),$$
$$V(T, x) = \max_{u_T} \big\{ g_T[x, u_T, f_T(x, u_T)] + V[T+1, f_T(x, u_T)] \big\},$$
$$V(\tau, x) = \max_{u_\tau} \Big\{ g_\tau[x, u_\tau, f_\tau(x, u_\tau)] + \frac{\theta_\tau}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, q_{\tau+1}[f_\tau(x, u_\tau)] + \frac{\sum_{\zeta=\tau+1}^{T}\theta_\zeta}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, V[\tau+1, f_\tau(x, u_\tau)] \Big\}, \quad \text{for } \tau \in \{1, 2, \ldots, T-1\}. \qquad (8)$$
Proof. By definition, the value function at stage $T+1$ is
$$V(T+1, x) = q_{T+1}(x).$$
We first consider the case when the last stage $T$ has arrived. The problem then becomes
$$\max_{u_T} \big\{ g_T[x_T, u_T, x_{T+1}] + q_{T+1}(x_{T+1}) \big\},$$
subject to
$$x_{T+1} = f_T(x_T, u_T), \quad x_T = x. \qquad (9)$$
Using $V(T+1, x) = q_{T+1}(x)$, the problem in (9) can be formulated as the single-stage problem
$$\max_{u_T} \big\{ g_T[x, u_T, f_T(x, u_T)] + V[T+1, f_T(x, u_T)] \big\},$$
with $x_T = x$. Hence we have $V(T, x) = \max_{u_T} \big\{ g_T[x, u_T, f_T(x, u_T)] + V[T+1, f_T(x, u_T)] \big\}$.
Now consider the problem at stage $\tau \in \{1, 2, \ldots, T-1\}$, in which one has to maximize
$$\sum_{\hat T=\tau}^{T} \frac{\theta_{\hat T}}{\sum_{\zeta=\tau}^{T}\theta_\zeta} \Big\{ \sum_{k=\tau}^{\hat T} g_k[x_k, u_k, x_{k+1}] + q_{\hat T+1}(x_{\hat T+1}) \Big\}$$
$$= g_\tau[x_\tau, u_\tau, x_{\tau+1}] + \frac{\theta_\tau}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, q_{\tau+1}(x_{\tau+1}) + \frac{\sum_{\zeta=\tau+1}^{T}\theta_\zeta}{\sum_{\zeta=\tau}^{T}\theta_\zeta} \sum_{\hat T=\tau+1}^{T} \frac{\theta_{\hat T}}{\sum_{\zeta=\tau+1}^{T}\theta_\zeta} \Big\{ \sum_{k=\tau+1}^{\hat T} g_k[x_k, u_k, x_{k+1}] + q_{\hat T+1}(x_{\hat T+1}) \Big\}. \qquad (10)$$
Using $V(\tau+1, x)$ as characterized above, the problem in (10) can be formulated as the single-stage problem
$$\max_{u_\tau} \Big\{ g_\tau[x, u_\tau, f_\tau(x, u_\tau)] + \frac{\theta_\tau}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, q_{\tau+1}[f_\tau(x, u_\tau)] + \frac{\sum_{\zeta=\tau+1}^{T}\theta_\zeta}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, V[\tau+1, f_\tau(x, u_\tau)] \Big\}, \qquad (11)$$
with $x_\tau = x$.
Hence we have
$$V(\tau, x) = \max_{u_\tau} \Big\{ g_\tau[x, u_\tau, f_\tau(x, u_\tau)] + \frac{\theta_\tau}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, q_{\tau+1}[f_\tau(x, u_\tau)] + \frac{\sum_{\zeta=\tau+1}^{T}\theta_\zeta}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, V[\tau+1, f_\tau(x, u_\tau)] \Big\}, \quad \text{for } \tau \in \{1, 2, \ldots, T-1\}, \qquad (12)$$
and Theorem 1 follows.
Theorem 1 yields a set of Bellman equations for the random-horizon problems (3)-(4).
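To illustrate Theorem 1, the sketch below implements the backward recursion (8) for a hypothetical problem with binary state and control and $T = 3$, and cross-checks the resulting value against brute-force enumeration of control sequences. Enumerating open-loop sequences suffices here because the dynamics are deterministic and the horizon realization does not affect the state path. All functions and numbers are illustrative, not from the paper.

```python
import itertools

T = 3
theta = [0.2, 0.3, 0.5]                      # theta[t-1] = P(T_hat = t)
states, controls = [0, 1], [0, 1]

def f(k, x, u):                              # deterministic transition
    return (x + u) % 2

def g(k, x, u, x_next):                      # stage payoff (illustrative)
    return k * x + (1 - u) + 0.5 * x_next

def q(x):                                    # terminal payment
    return 2.0 * x

def tail(t):                                 # sum_{zeta=t}^{T} theta_zeta
    return sum(theta[t - 1:])

# Backward recursion of Theorem 1, equation (8).
V = {T + 1: {x: q(x) for x in states}}
policy = {}
for t in range(T, 0, -1):
    V[t], policy[t] = {}, {}
    for x in states:
        best, best_u = None, None
        for u in controls:
            xn = f(t, x, u)
            if t == T:
                val = g(t, x, u, xn) + V[t + 1][xn]
            else:
                val = (g(t, x, u, xn)
                       + theta[t - 1] / tail(t) * q(xn)
                       + tail(t + 1) / tail(t) * V[t + 1][xn])
            if best is None or val > best:
                best, best_u = val, u
        V[t][x], policy[t][x] = best, best_u

# Brute-force check: expected payoff (3) of every open-loop sequence.
def expected_payoff(x0, us):
    xs, path_g = [x0], []
    for k, u in enumerate(us, start=1):
        xn = f(k, xs[-1], u)
        path_g.append(g(k, xs[-1], u, xn))
        xs.append(xn)
    return sum(theta[t - 1] * (sum(path_g[:t]) + q(xs[t])) for t in range(1, T + 1))

x0 = 0
brute = max(expected_payoff(x0, us) for us in itertools.product(controls, repeat=T))
# brute coincides with the dynamic-programming value V[1][x0].
```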
3. Random Horizon Feedback Nash Equilibrium
In this section, we investigate the noncooperative outcome of the discrete-time game (1)-(2). In particular, a feedback Nash equilibrium of the game can be characterized by the following theorem.
Theorem 2. A set of strategies $\{\phi_k^i(x), \text{ for } k \in \mathbb{T} \text{ and } i \in N\}$ provides a feedback Nash equilibrium solution to the game (1)-(2) if there exist functions $V^i(k, x)$, for $k \in \mathbb{T}$ and $i \in N$, such that the following recursive relations are satisfied:
$$V^i(T+1, x) = q_{T+1}^i(x),$$
$$V^i(T, x) = \max_{u_T^i} \big\{ g_T^i[x, \phi_T^1(x), \phi_T^2(x), \ldots, \phi_T^{i-1}(x), u_T^i, \phi_T^{i+1}(x), \ldots, \phi_T^n(x), f_T^i(x, u_T^i)] + q_{T+1}^i[f_T^i(x, u_T^i)] \big\},$$
$$V^i(\tau, x) = \max_{u_\tau^i} \Big\{ g_\tau^i[x, \phi_\tau^1(x), \phi_\tau^2(x), \ldots, \phi_\tau^{i-1}(x), u_\tau^i, \phi_\tau^{i+1}(x), \ldots, \phi_\tau^n(x), f_\tau^i(x, u_\tau^i)] + \frac{\theta_\tau}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, q_{\tau+1}^i[f_\tau^i(x, u_\tau^i)] + \frac{\sum_{\zeta=\tau+1}^{T}\theta_\zeta}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, V^i[\tau+1, f_\tau^i(x, u_\tau^i)] \Big\}, \quad \text{for } \tau \in \{1, 2, \ldots, T-1\}; \qquad (13)$$
where $f_k^i(x, u_k^i) = f_k[x, \phi_k^1(x), \phi_k^2(x), \ldots, \phi_k^{i-1}(x), u_k^i, \phi_k^{i+1}(x), \ldots, \phi_k^n(x)]$.
Proof. The conditions in (13) show that the random-horizon dynamic programming result in Theorem 1 holds for each player given the other players' equilibrium strategies. Hence the conditions of a Nash (1951) equilibrium are satisfied and Theorem 2 follows.
The set of equations in (13) represents the discrete analogue of the Isaacs-Bellman equations under random horizon.
Substituting the set of feedback Nash equilibrium strategies $\{\phi_k^i(x), \text{ for } k \in \mathbb{T} \text{ and } i \in N\}$ into the players' payoffs yields
$$V^i(\tau, x) = E\Big\{ \sum_{k=\tau}^{\hat T} g_k^i[x_k, \phi_k^1(x_k), \phi_k^2(x_k), \ldots, \phi_k^n(x_k), x_{k+1}] + q_{\hat T+1}^i(x_{\hat T+1}) \Big\}$$
$$= \sum_{\hat T=\tau}^{T} \frac{\theta_{\hat T}}{\sum_{\zeta=\tau}^{T}\theta_\zeta} \Big\{ \sum_{k=\tau}^{\hat T} g_k^i[x_k, \phi_k^1(x_k), \phi_k^2(x_k), \ldots, \phi_k^n(x_k), x_{k+1}] + q_{\hat T+1}^i(x_{\hat T+1}) \Big\}, \quad i \in N,$$
where $x_\tau = x$. The value function $V^i(\tau, x)$ gives the expected game equilibrium payoff to player $i$ from stage $\tau$ to the end of the game.
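The recursion (13) can be illustrated numerically. The sketch below treats a hypothetical two-player game with binary states and actions in which both players have identical payoff functions, a special case chosen so that a pure-strategy equilibrium of each stage game is guaranteed to exist and can be found by enumerating action profiles (and so that $V^1 = V^2$); in the general asymmetric case the stage equilibrium need not be pure. All numbers are illustrative.

```python
import itertools

T = 2
theta = [0.4, 0.6]                            # P(T_hat = 1), P(T_hat = 2)
states, acts = [0, 1], [0, 1]

def f(k, x, u1, u2):                          # joint state transition
    return (x + u1 + u2) % 2

def g(k, x, u1, u2, xn):                      # common stage payoff for both players
    return (1.0 if u1 == u2 else 0.0) + 0.5 * x + 0.25 * xn

def q(x):                                     # common terminal payment
    return 1.0 * x

def tail(t):                                  # sum_{zeta=t}^{T} theta_zeta
    return sum(theta[t - 1:])

V = {T + 1: {x: q(x) for x in states}}        # common value function (V^1 = V^2 here)
phi = {}
for t in range(T, 0, -1):
    V[t], phi[t] = {}, {}
    for x in states:
        def payoff(u1, u2):                   # stage payoff plus weighted continuation, as in (13)
            xn = f(t, x, u1, u2)
            if t == T:
                return g(t, x, u1, u2, xn) + q(xn)
            return (g(t, x, u1, u2, xn)
                    + theta[t - 1] / tail(t) * q(xn)
                    + tail(t + 1) / tail(t) * V[t + 1][xn])
        # A profile is a pure-strategy Nash equilibrium of the stage game if no
        # unilateral deviation improves either player's payoff. With identical
        # payoffs the maximizing profile always qualifies, so one is found.
        for u1, u2 in itertools.product(acts, acts):
            if (all(payoff(u1, u2) >= payoff(d, u2) for d in acts)
                    and all(payoff(u1, u2) >= payoff(u1, d) for d in acts)):
                V[t][x], phi[t][x] = payoff(u1, u2), (u1, u2)
                break
```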
4. Dynamic Cooperation under Random Horizon
Now consider the case when the players agree to cooperate and distribute the payoff among themselves according to an optimality principle. Two essential properties that a cooperative scheme has to satisfy are group optimality and individual rationality.
4.1. Group Optimality
Maximizing the players’ expected joint payoff guarantees group optimality in a game where payoffs are transferable. To maximize their expected joint payoff the players have to solve the discrete-time dynamic programming problem of maximizing
$$E\Big\{ \sum_{j=1}^{n} \Big[ \sum_{k=1}^{\hat T} g_k^j[x_k, u_k^1, u_k^2, \ldots, u_k^n, x_{k+1}] + q_{\hat T+1}^j(x_{\hat T+1}) \Big] \Big\} = \sum_{j=1}^{n} \sum_{\hat T=1}^{T} \theta_{\hat T} \Big\{ \sum_{k=1}^{\hat T} g_k^j[x_k, u_k^1, u_k^2, \ldots, u_k^n, x_{k+1}] + q_{\hat T+1}^j(x_{\hat T+1}) \Big\}, \qquad (14)$$
subject to (1).
Invoking the random horizon dynamic programming method in Theorem 1 we can characterize an optimal solution to the problem (14)-(1) as
Corollary 1. A set of strategies $\{u_k^i = \psi_k^i(x), \text{ for } k \in \mathbb{T} \text{ and } i \in N\}$ provides an optimal solution to the problem (14)-(1) if there exist functions $W(k, x)$, for $k \in \mathbb{T}$, such that the following recursive relations are satisfied:
$$W(T+1, x) = \sum_{j=1}^{n} q_{T+1}^j(x),$$
$$W(T, x) = \max_{u_T^1, u_T^2, \ldots, u_T^n} \Big\{ \sum_{j=1}^{n} \big[ g_T^j[x, u_T^1, u_T^2, \ldots, u_T^n, f_T(x, u_T^1, u_T^2, \ldots, u_T^n)] + q_{T+1}^j[f_T(x, u_T^1, u_T^2, \ldots, u_T^n)] \big] \Big\},$$
$$W(\tau, x) = \max_{u_\tau^1, u_\tau^2, \ldots, u_\tau^n} \Big\{ \sum_{j=1}^{n} g_\tau^j[x, u_\tau^1, u_\tau^2, \ldots, u_\tau^n, f_\tau(x, u_\tau^1, u_\tau^2, \ldots, u_\tau^n)] + \frac{\theta_\tau}{\sum_{\zeta=\tau}^{T}\theta_\zeta} \sum_{j=1}^{n} q_{\tau+1}^j[f_\tau(x, u_\tau^1, u_\tau^2, \ldots, u_\tau^n)] + \frac{\sum_{\zeta=\tau+1}^{T}\theta_\zeta}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, W[\tau+1, f_\tau(x, u_\tau^1, u_\tau^2, \ldots, u_\tau^n)] \Big\}, \quad \text{for } \tau \in \{1, 2, \ldots, T-1\}. \qquad (15)$$
Substituting the optimal controls $\{\psi_k^i(x), \text{ for } k \in \mathbb{T} \text{ and } i \in N\}$ into the state dynamics (1), one obtains the dynamics of the cooperative trajectory:
$$x_{k+1} = f_k[x_k, \psi_k^1(x_k), \psi_k^2(x_k), \ldots, \psi_k^n(x_k)], \qquad (16)$$
for $k \in \mathbb{T}$ and $x_1 = x^0$.
We use $x_k^*$ to denote the solution generated by (16).
Using the set of optimal strategies $\{\psi_k^i(x_k^*), \text{ for } k \in \mathbb{T} \text{ and } i \in N\}$, one obtains the expected cooperative payoff
$$W(\tau, x_\tau^*) = \sum_{\hat T=\tau}^{T} \frac{\theta_{\hat T}}{\sum_{\zeta=\tau}^{T}\theta_\zeta} \Big\{ \sum_{k=\tau}^{\hat T} \sum_{j=1}^{n} g_k^j[x_k^*, \psi_k^1(x_k^*), \psi_k^2(x_k^*), \ldots, \psi_k^n(x_k^*), x_{k+1}^*] + \sum_{j=1}^{n} q_{\hat T+1}^j(x_{\hat T+1}^*) \Big\}. \qquad (17)$$
4.2. Individual Rationality
The players then have to agree on an optimality principle for distributing the total cooperative payoff among themselves. For individual rationality to be upheld, the imputation (see von Neumann and Morgenstern (1944)) a player receives under cooperation has to be no less than his expected noncooperative payoff along the cooperative state trajectory.
Let $\xi(\cdot, \cdot)$ denote the imputation vector guiding the distribution of the total cooperative payoff under the agreed-upon optimality principle along the cooperative trajectory $\{x_k^*\}_{k=1}^{T+1}$. At stage $\tau$, the imputation vector according to $\xi(\cdot, \cdot)$ is
$$\xi(\tau, x_\tau^*) = [\xi^1(\tau, x_\tau^*), \xi^2(\tau, x_\tau^*), \ldots, \xi^n(\tau, x_\tau^*)], \quad \text{for } \tau \in \mathbb{T}.$$
There is a variety of imputations that the players can agree upon. For example, (i) the players may share the excess of the total cooperative payoff over the sum of individual noncooperative payoffs equally, or (ii) they may share the total cooperative payoff in proportion to their noncooperative payoffs, or adopt a linear combination of (i) and (ii).
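The two example imputations can be written down directly; the sketch below (with hypothetical numbers) computes (i) the equal-surplus division and (ii) the proportional division for a given cooperative payoff $W(\tau, x_\tau^*)$ and noncooperative payoffs $V^i(\tau, x_\tau^*)$. Both satisfy group optimality by construction, and (i) is individually rational whenever the cooperative payoff weakly exceeds the sum of noncooperative payoffs.

```python
def equal_surplus(W, v):
    """Imputation (i): each player gets his noncooperative payoff plus an equal
    share of the cooperative surplus."""
    surplus = W - sum(v)
    return [vi + surplus / len(v) for vi in v]

def proportional(W, v):
    """Imputation (ii): the cooperative payoff is shared in proportion to the
    noncooperative payoffs."""
    s = sum(v)
    return [W * vi / s for vi in v]

W, v = 12.0, [3.0, 5.0]          # hypothetical W(tau, x*) and [V^1, V^2]
xi_a = equal_surplus(W, v)       # surplus 4 split equally: [5.0, 7.0]
xi_b = proportional(W, v)        # shares 3/8 and 5/8 of 12: [4.5, 7.5]
```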
For individual rationality to be maintained throughout all stages $\tau \in \mathbb{T}$, it is required that
$$\xi^i(\tau, x_\tau^*) \ge V^i(\tau, x_\tau^*), \quad \text{for } i \in N \text{ and } \tau \in \mathbb{T}.$$
To satisfy group optimality, the imputation vector has to satisfy
$$W(\tau, x_\tau^*) = \sum_{j=1}^{n} \xi^j(\tau, x_\tau^*), \quad \text{for } \tau \in \mathbb{T}.$$
5. Subgame Consistent Solutions and Payment Mechanism
To guarantee dynamic stability in a dynamic cooperation scheme, the solution has to satisfy the property of subgame consistency. A cooperative solution is subgame consistent if an extension of the solution policy to a subgame starting at a later time with a state along the optimal cooperative trajectory would remain optimal. In particular, subgame consistency ensures that as the game proceeds players are guided by the same optimality principle at each stage of the game, and hence do not possess incentives to deviate from the previously adopted optimal behavior.
Therefore, for subgame consistency to be satisfied, the imputation $\xi(\cdot, \cdot)$ according to the original optimality principle has to be maintained along the cooperative trajectory $\{x_k^*\}_{k=1}^{T+1}$. Let the imputation governed by the agreed-upon optimality principle be
$$\xi(k, x_k^*) = [\xi^1(k, x_k^*), \xi^2(k, x_k^*), \ldots, \xi^n(k, x_k^*)] \quad \text{at stage } k, \text{ for } k \in \mathbb{T}. \qquad (18)$$
Crucial to the analysis is the formulation of a payment mechanism so that the imputation in (18) can be realized as the game proceeds.
Following the analysis of Yeung and Petrosyan (2010), we formulate a discrete-time random-horizon Payoff Distribution Procedure (PDP) so that the agreed-upon imputations (18) can be realized. Let $B_k^i(x_k^*)$ denote the payment that player $i$ will receive at stage $k$ under the cooperative agreement.
The payment scheme involving $B_k^i(x_k^*)$ constitutes a PDP in the sense that, along the cooperative trajectory $\{x_k^*\}_{k=1}^{T+1}$, the imputation to player $i$ over the stages from $\tau$ to $\hat T$ can be expressed as
$$\xi^i(\tau, x_\tau^*) = E\Big\{ \sum_{k=\tau}^{\hat T} B_k^i(x_k^*) + q_{\hat T+1}^i(x_{\hat T+1}^*) \Big\} = \sum_{\hat T=\tau}^{T} \frac{\theta_{\hat T}}{\sum_{\zeta=\tau}^{T}\theta_\zeta} \Big\{ \sum_{k=\tau}^{\hat T} B_k^i(x_k^*) + q_{\hat T+1}^i(x_{\hat T+1}^*) \Big\}, \quad i \in N \text{ and } \tau \in \mathbb{T}. \qquad (19)$$
If the game lasts up to stage $T$, then at stage $T+1$ player $i$ will receive a terminal payment $q_{T+1}^i(x_{T+1}^*)$ and $B_{T+1}^i(x_{T+1}^*) = 0$. Hence the imputation $\xi^i(T+1, x_{T+1}^*)$ equals $q_{T+1}^i(x_{T+1}^*)$.
Theorem 3. A payment equaling
$$B_\tau^i(x_\tau^*) = \xi^i(\tau, x_\tau^*) - \frac{\sum_{\zeta=\tau+1}^{T}\theta_\zeta}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, \xi^i(\tau+1, x_{\tau+1}^*) - \frac{\theta_\tau}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, q_{\tau+1}^i(x_{\tau+1}^*), \quad \text{for } i \in N, \qquad (20)$$
given to player $i$ at stage $\tau \in \mathbb{T}$ would lead to the realization of the imputation $\xi(\tau, x_\tau^*)$ in (18).
Proof. Using (19) we obtain
$$\xi^i(\tau, x_\tau^*) = B_\tau^i(x_\tau^*) + \frac{\theta_\tau}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, q_{\tau+1}^i(x_{\tau+1}^*) + \frac{\sum_{\zeta=\tau+1}^{T}\theta_\zeta}{\sum_{\zeta=\tau}^{T}\theta_\zeta} \sum_{\hat T=\tau+1}^{T} \frac{\theta_{\hat T}}{\sum_{\zeta=\tau+1}^{T}\theta_\zeta} \Big\{ \sum_{k=\tau+1}^{\hat T} B_k^i(x_k^*) + q_{\hat T+1}^i(x_{\hat T+1}^*) \Big\}. \qquad (21)$$
Invoking the definition of $\xi^i(\tau+1, x_{\tau+1}^*)$ in (19), we can express (21) as
$$\xi^i(\tau, x_\tau^*) = B_\tau^i(x_\tau^*) + \frac{\theta_\tau}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, q_{\tau+1}^i(x_{\tau+1}^*) + \frac{\sum_{\zeta=\tau+1}^{T}\theta_\zeta}{\sum_{\zeta=\tau}^{T}\theta_\zeta}\, \xi^i(\tau+1, x_{\tau+1}^*). \qquad (22)$$
Using (22) one can readily obtain (20). Hence Theorem 3 follows.
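Theorem 3 can be checked numerically: starting from a hypothetical imputation stream $\xi^i(\tau, x_\tau^*)$ and terminal payments $q_{\tau+1}^i(x_{\tau+1}^*)$, the payments $B_\tau^i$ computed from (20) reproduce the imputations through the expectation (19). All numbers below are illustrative.

```python
T = 3
theta = [0.2, 0.3, 0.5]                      # P(T_hat = tau), tau = 1..3 (hypothetical)

def tail(t):                                 # sum_{zeta=t}^{T} theta_zeta
    return sum(theta[t - 1:])

xi = {1: 10.0, 2: 6.0, 3: 3.5, 4: 1.0}      # xi^i(tau, x*_tau), stages 1..T+1
q = {2: 2.0, 3: 1.5, 4: 1.0}                # q[t] = terminal payment at stage t;
                                            # note xi[T+1] == q[T+1], as required.

# Payments from equation (20).
B = {t: xi[t]
        - tail(t + 1) / tail(t) * xi[t + 1]
        - theta[t - 1] / tail(t) * q[t + 1]
     for t in range(1, T + 1)}

def imputation_from_payments(tau):
    """Right-hand side of (19): conditional expectation of accumulated payments
    plus the terminal payoff, from stage tau on."""
    return sum(theta[th - 1] / tail(tau)
               * (sum(B[k] for k in range(tau, th + 1)) + q[th + 1])
               for th in range(tau, T + 1))
# imputation_from_payments(tau) recovers xi[tau] for every tau.
```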
Note that the payoff distribution procedure $B_\tau^i(x_\tau^*)$ in (20) gives rise to the agreed-upon imputation
$$\xi(k, x_k^*) = [\xi^1(k, x_k^*), \xi^2(k, x_k^*), \ldots, \xi^n(k, x_k^*)] \quad \text{at stage } k, \text{ for } k \in \mathbb{T}.$$
Therefore subgame consistency is satisfied.
When all players are using the cooperative strategies, the payoff that player $i$ directly receives at stage $k \in \mathbb{T}$ is
$$g_k^i[x_k^*, \psi_k^1(x_k^*), \psi_k^2(x_k^*), \ldots, \psi_k^n(x_k^*), x_{k+1}^*].$$
However, according to the agreed-upon imputation, player $i$ is to receive $B_k^i(x_k^*)$ at stage $k$. Therefore a side-payment
$$\varpi_k^i(x_k^*) = B_k^i(x_k^*) - g_k^i[x_k^*, \psi_k^1(x_k^*), \psi_k^2(x_k^*), \ldots, \psi_k^n(x_k^*), x_{k+1}^*], \quad \text{for } k \in \mathbb{T} \text{ and } i \in N, \qquad (23)$$
will be given to player $i$.
6. Concluding Remarks
In this paper, we extend subgame consistent solutions to dynamic cooperative games with random horizon. Note that time consistency refers to the condition that the optimality principle agreed upon at the outset must remain effective throughout the game, at any instant of time along the optimal state trajectory. In the presence of stochastic elements, subgame consistency is required. In particular, subgame consistency requires that the optimality principle agreed upon at the outset must remain effective in any subgame with a later starting time and a realizable state brought about by prior optimal behavior along the optimal cooperative trajectory. However, the optimal cooperative trajectory (16) is governed by a deterministic difference equation with a random stopping time $\hat T + 1 \in \{2, 3, \ldots, T+1\}$. Hence the subgame consistency notion in this analysis is a form of optimal-trajectory-subgame consistency.
New forms of the Bellman equation and the Isaacs-Bellman equation are derived. Subgame consistent cooperative solutions are obtained for dynamic games with random horizon, and analytically tractable payoff distribution mechanisms which lead to the realization of these solutions are developed. The analysis widens the application of cooperative dynamic game theory to problems where the game horizon is random. Finally, since this is the first time that subgame consistent solutions are derived for cooperative dynamic games with random horizon, further research along this line is expected.
Acknowledgements
Financial support by the EU TOCSIN Project is gratefully acknowledged.

References
Basar, T. (1978). Decentralized Multicriteria Optimization of Linear Stochastic Systems.
IEEE Transactions on Automatic Control, AC-23(2), 233-243.
Basar, T. and Ho., Y. C. (1974). Informational properties of the Nash solutions of two stochastic nonzero-sum games. Journal of Economic Theory, 7(4), 370—387.
Dockner, E., and Nishimura, K. (1999). Transboundary pollution in a dynamic game model.
The Japanese Economic Review, 50(4), 443-456.
Dutta, P.-K. and Radner, R. (2006). Population growth and technological change in a global warming model. Economic Theory, 29(2), 251-270.
Ehtamo, H. and Hamalainen, R. (1993). A Cooperative Incentive Equilibrium for a Resource Management Problem. Journal of Economic Dynamics and Control, 17(4), 659-678.
Krawczyk, J. B. and Tidball, M. (2006). A Discrete-Time Dynamic Game of Seasonal Water Allocation. Journal of Optimization Theory and Applications, 128(2), 411-429.
Nash, J. F. Jr. (1951). Noncooperative Games. Annals of Mathematics, 54, 286-295.
Nie, P., Chen, L. and Fukushima M. (2006). Dynamic programming approach to discrete time dynamic feedback Stackelberg games with independent and dependent followers. European Journal of Operational Research, 169, 310-328.
Petrosyan, L. A. and Murzov, N. V. (1966). Game Theoretic Problems in Mechanics. Litovsk. Mat. Sb., 6, 423-433.
Petrosyan, L. A. and Shevkoplyas, E. V. (2003). Cooperative Solutions for Games With Random Duration. Game Theory and Applications IX, 125-139.
Petrosyan, L. A. and Yeung, D. W. K. (2007). Subgame-consistent Cooperative Solutions in Randomly-furcating Stochastic Differential Games. International Journal of Mathematical and Computer Modelling (Special Issue on Lyapunov's Methods in Stability and Control), 45, 1294-1307.
Rubio, S., and Ulph, A. (2007). An infinite-horizon model of dynamic membership of international environmental agreements. Journal of Environmental Economics and Management, 54(3), 296-310.
Shevkoplyas, E. V. (2011). The Shapley Value in Cooperative Differential Games with Random Duration. Forthcoming in Annals of Dynamic Games.
von Neumann, J. and Morgenstern, O. (1944). Theory of Games and Economic Behavior. John Wiley and Sons, New York.
Yeung, D.W. K. (2007). Dynamically Consistent Cooperative Solution in a Differential Game of Transboundary Industrial Pollution. Journal of Optimization Theory and Applications, 134, 143-160.
Yeung, D. W. K. and Petrosyan, L. A. (2004). Subgame Consistent Cooperative Solutions in Stochastic Differential Games. Journal of Optimization Theory and Applications, 120, 651-666.
Yeung, D.W. K. and Petrosyan, L. A. (2006). Cooperative Stochastic Differential Games, Springer-Verlag, New York.
Yeung, D. W. K. and Petrosyan, L. A. (2008). A Cooperative Stochastic Differential Game of Transboundary Industrial Pollution. Automatica, 44(6), 1532-1544.
Yeung, D. W. K. and Petrosyan, L. A. (2010). Subgame Consistent Solutions for Cooperative Stochastic Dynamic Games. Journal of Optimization Theory and Applications, 145(3), 579-596.