Leon Petrosyan and Sergey Chistyakov
St. Petersburg University,
Faculty of Applied Mathematics and Control Processes,
35 Universitetsky prospekt, St. Petersburg, 198504, Russia E-mail: [email protected]
Abstract. The problem of strategically supported cooperation in 2-person differential games with integral payoffs is considered. Based on the initial differential game, a new associated differential game (CD-game) is constructed. In addition to the initial game, it models the players' actions connected with the transition from the strategic form of the game to the cooperative form with a principle of optimality chosen in advance. The model allows each player to refuse cooperation at any time instant t. The Shapley value is considered as the cooperative principle of optimality. The construction of the CD-game is based on the so-called imputation distribution procedure described earlier in (Petrosjan and Zenkevich, 2009). The theorem established by the authors states that if, at each instant of time along the conditionally optimal (cooperative) trajectory, the future payments to each player according to the imputation distribution procedure exceed the maximal guaranteed value that this player can achieve in the CD-game, then there exists a Nash equilibrium in the class of recursive strategies, first introduced in (Chistyakov, 1981), that supports the cooperative trajectory. In the present paper, results similar to (Chistyakov and Petrosyan, 2011) are obtained without the requirement of independent motions and for a more general type of payoff functions.
Keywords: strong Nash equilibrium, time-consistency, core, cooperative trajectory.
1. Introduction
As in (Petrosjan and Zenkevich, 2009; Chistyakov and Petrosyan, 2011), this paper considers the problem of strategic support of cooperation in a 2-person differential game with prescribed duration $T - t_0$ and dependent motions. The dynamics of the game are described by
$$\frac{dx}{dt} = f(t, x, u^{(1)}, u^{(2)}), \tag{1}$$
$$x \in \mathbb{R}^n, \quad u^{(i)} \in P^{(i)} \in \mathrm{Comp}\,\mathbb{R}^{k(i)}, \quad i \in I = \{1, 2\},$$
$$x(t_0) = x_0. \tag{2}$$
The payoffs of the players $i \in I = \{1, 2\}$ have the integral form
$$H^{(i)}_{t_0,x_0}(u^{(1)}(\cdot), u^{(2)}(\cdot)) = \int_{t_0}^{T} h^{(i)}(t, x(t), u^{(1)}(t), u^{(2)}(t))\,dt, \tag{3}$$
where $u(\cdot) = (u^{(1)}(\cdot), u^{(2)}(\cdot))$ is a given vector function of open-loop controls and $x(t) = x(t, t_0, x_0, u^{(1)}(\cdot), u^{(2)}(\cdot))$ is the solution of the Cauchy problem (1) with the initial condition (2) corresponding to the admissible open-loop controls $u^{(1)}(\cdot)$, $u^{(2)}(\cdot)$ of the players.
Admissible open-loop controls of the players $i \in I$ are Lebesgue measurable functions
$$u^{(i)}(\cdot) : t \mapsto u^{(i)}(t) \in \mathbb{R}^{k(i)}, \quad i \in I = \{1, 2\},$$
such that
$$u^{(i)}(t) \in P^{(i)} \ \text{for almost all } t \in [t_0, T], \quad i \in I.$$
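It is supposed that the function $f : \mathbb{R} \times \mathbb{R}^n \times P^{(1)} \times P^{(2)} \to \mathbb{R}^n$ is continuous, locally Lipschitz with respect to $x$, and satisfies the following condition: there exists $\lambda > 0$ such that
$$\|f(t, x, u^{(1)}, u^{(2)})\| \le \lambda (1 + \|x\|) \quad \forall x \in \mathbb{R}^n, \ \forall u^{(1)} \in P^{(1)}, \ \forall u^{(2)} \in P^{(2)}.$$
Each of the functions
$$h^{(i)} : \mathbb{R} \times \mathbb{R}^n \times P^{(1)} \times P^{(2)} \to \mathbb{R}, \quad i \in I,$$
is also continuous.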
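For all $t \in \mathbb{R}_+$, $x \in \mathbb{R}^n$, $\ell \in \mathbb{R}^n$
$$\max_{u^{(1)} \in P^{(1)}} \min_{u^{(2)} \in P^{(2)}} \left( \langle \ell, f(t, x, u^{(1)}, u^{(2)}) \rangle + h^{(1)}(t, x, u^{(1)}, u^{(2)}) \right) = \min_{u^{(2)} \in P^{(2)}} \max_{u^{(1)} \in P^{(1)}} \left( \langle \ell, f(t, x, u^{(1)}, u^{(2)}) \rangle + h^{(1)}(t, x, u^{(1)}, u^{(2)}) \right)$$
and
$$\max_{u^{(2)} \in P^{(2)}} \min_{u^{(1)} \in P^{(1)}} \left( \langle \ell, f(t, x, u^{(1)}, u^{(2)}) \rangle + h^{(2)}(t, x, u^{(1)}, u^{(2)}) \right) = \min_{u^{(1)} \in P^{(1)}} \max_{u^{(2)} \in P^{(2)}} \left( \langle \ell, f(t, x, u^{(1)}, u^{(2)}) \rangle + h^{(2)}(t, x, u^{(1)}, u^{(2)}) \right),$$
where $\langle \cdot, \cdot \rangle$ is the scalar product in $\mathbb{R}^n$.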
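The saddle-point condition stated above can be checked numerically for concrete data. The following is a minimal sketch under stated assumptions: the scalar dynamics f = u1 + u2, the running payoff h = x - u1^2 + u2^2, the control grids on [-1, 1], and all function names are hypothetical choices made for illustration and are not taken from the paper.

```python
def maxmin_minmax(g, grid1, grid2):
    """Return (max over u1 of min over u2, min over u2 of max over u1) of g on finite grids."""
    maxmin = max(min(g(u1, u2) for u2 in grid2) for u1 in grid1)
    minmax = min(max(g(u1, u2) for u1 in grid1) for u2 in grid2)
    return maxmin, minmax

def toy_hamiltonian(l, x):
    """<l, f> + h for the hypothetical scalar data f = u1 + u2, h = x - u1**2 + u2**2."""
    return lambda u1, u2: l * (u1 + u2) + (x - u1 ** 2 + u2 ** 2)

# Controls restricted to a grid on [-1, 1] (a stand-in for the compact sets P1, P2).
grid = [k / 10 - 1 for k in range(21)]

# This expression is separable in u1 and u2, so the two values coincide up to rounding.
lower, upper = maxmin_minmax(toy_hamiltonian(0.5, 1.0), grid, grid)
print(abs(upper - lower) < 1e-9)

# A counterexample: for (u1 - u2)**2 the max-min and the min-max differ,
# so data of this kind would violate the condition.
lower_bad, upper_bad = maxmin_minmax(lambda u1, u2: (u1 - u2) ** 2, grid, grid)
print(lower_bad, upper_bad)
```

A grid check of this kind can only refute the condition or make it plausible for given data; it does not prove it on the full compact control sets.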
It is supposed that at each time instant $t \in [t_0, T]$ the players have information about the current position $(t, x(t))$ on the time interval $[t_0, t]$ and use recursive strategies (Chistyakov, 1977; Chistyakov, 1999).
2. Recursive strategies
Recursive strategies were first introduced in (Chistyakov, 1977) to justify the dynamic programming approach in zero-sum differential games, known as the method of open-loop iterations in non-regular differential games with a non-smooth value function. The $\varepsilon$-optimal strategies constructed by this method are universal in the sense that they remain $\varepsilon$-optimal in any subgame of the previously defined differential game (for every $\varepsilon > 0$). Exploiting this property, it became possible to prove the existence of an $\varepsilon$-equilibrium (Nash equilibrium) in non-zero-sum differential games (for every $\varepsilon > 0$) using the so-called "punishment strategies" (Chistyakov, 1981).
The basic idea is that when one of the players deviates from the conditionally optimal trajectory, the other players, after some small time delay, start to play against the deviating player. As a result, the deviating player is not able to get much more than he could get along the conditionally optimal trajectory. Punishing the deviating player at each time instant with one and the same strategy is possible because of the universal character of $\varepsilon$-optimal strategies in zero-sum differential games.
In this paper the same approach is used to establish the stability of cooperative agreements in the game $\Gamma(t_0, x_0)$. As in the case mentioned above, the principal argument is the universal character of $\varepsilon$-optimal recursive strategies in the specially defined zero-sum games $\Gamma_i(t_0, x_0)$, $i \in I = \{1, 2\}$, associated with the non-zero-sum game $\Gamma(t_0, x_0)$.
Recursive strategies lie in between piecewise open-loop strategies (Petrosyan, 1993) and the $\varepsilon$-strategies introduced by B. N. Pshenichny (Pschenichny, 1973). They differ from piecewise open-loop strategies in that, as in the case of the $\varepsilon$-strategies of B. N. Pshenichny, the moments of correction of the open-loop controls are not prescribed at the beginning of the game but are determined in the course of the game. At the same time, they differ from the $\varepsilon$-strategies of B. N. Pshenichny in that the open-loop controls are formed in a finite number of steps.
A recursive strategy $U_i^{(n)}$ of player $i$ with maximal number of control corrections $n$ is a procedure for the formation of an admissible open-loop control by player $i$ in the game $\Gamma(t_0, x_0)$, $(t_0, x_0) \in D$.
At the beginning of the game $\Gamma(t_0, x_0)$, player $i$, using the recursive strategy $U_i^{(n)}$, chooses the first correction instant $t_1^{(i)} \in (t_0, T]$ and his admissible open-loop control $u^{(i)} = u^{(i)}(t)$ on the time interval $[t_0, t_1^{(i)}]$. Then, if $t_1^{(i)} < T$, having the information about the state of the game at the time instant $t_1^{(i)}$, he chooses the next correction instant $t_2^{(i)}$ and his admissible open-loop control $u^{(i)} = u^{(i)}(t)$ on the time interval $(t_1^{(i)}, t_2^{(i)}]$, and so on. Then either at some $k$-th step ($k \le n - 1$) the admissible control is formed on the whole remaining time interval $[t_k^{(i)}, T]$, or at step $n$ player $i$ completes the process by choosing, at the time instant $t_{n-1}^{(i)}$, his admissible control on the remaining time interval $(t_{n-1}^{(i)}, T]$.
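The step-by-step formation of an open-loop control described above can be sketched as a short loop. This is only an illustration under stated assumptions: the toy dynamics dx/dt = u, the piecewise-constant controls, the rule `steer_to_zero`, and all names are hypothetical and not part of the paper's construction.

```python
from typing import Callable, List, Tuple

# A rule maps (t, x, corrections_left) to (next correction instant, constant control value).
Rule = Callable[[float, float, int], Tuple[float, float]]

def run_recursive_strategy(rule: Rule, t0: float, x0: float, T: float, n: int):
    """Form a piecewise-constant open-loop control in at most n steps.

    Assumes toy scalar dynamics dx/dt = u, integrated exactly on each piece.
    Returns the list of pieces (t_start, t_end, u) and the final state.
    """
    pieces: List[Tuple[float, float, float]] = []
    t, x = t0, x0
    for step in range(1, n + 1):
        t_next, u = rule(t, x, n - step)
        # On the last allowed step the control must cover the remaining interval.
        if step == n or t_next >= T:
            t_next = T
        pieces.append((t, t_next, u))
        x += u * (t_next - t)
        t = t_next
        if t >= T:
            break
    return pieces, x

def steer_to_zero(t: float, x: float, left: int) -> Tuple[float, float]:
    """Hypothetical rule: correct again after 0.25 time units, push x toward 0 with |u| <= 1."""
    u = -1.0 if x > 0 else (1.0 if x < 0 else 0.0)
    return t + 0.25, u

pieces, x_end = run_recursive_strategy(steer_to_zero, 0.0, 0.6, 1.0, 3)
for piece in pieces:
    print(piece)
```

The correction instants are not fixed in advance: each one is produced by the rule from the position observed at the previous correction instant, and the number of pieces never exceeds n.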
3. Associated games and corresponding solutions
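For each given state $(t_*, x_*) \in D$ and $i \in I = \{1, 2\}$ consider the zero-sum differential game $\Gamma_i(t_*, x_*)$ between player $i$ and player $I \setminus \{i\}$ with the same dynamics as in $\Gamma(t_*, x_*)$ and the payoff of player $i$ equal to
$$H^{(i)}_{t_*,x_*}(u^{(1)}(\cdot), u^{(2)}(\cdot)) = \int_{t_*}^{T} h^{(i)}(t, x(t), u^{(1)}(t), u^{(2)}(t))\,dt.$$
The games $\Gamma_i(t_*, x_*)$, $i \in I$, $(t_*, x_*) \in D$, like the games $\Gamma(t_*, x_*)$, $(t_*, x_*) \in D$, are considered in the class of recursive strategies. Under the conditions formulated above, each of the games $\Gamma_i(t_*, x_*)$, $i \in I$, $(t_*, x_*) \in D$, has a value
$$\mathrm{val}\,\Gamma_i(t_*, x_*)$$
and optimal strategies (saddle point).
Consider also the following optimization problem $\Gamma_I(t_*, x_*)$:
$$\sum_{i=1}^{2} H^{(i)}_{t_*,x_*}(u^{(1)}(\cdot), u^{(2)}(\cdot)) \to \max_{u^{(1)}(\cdot),\, u^{(2)}(\cdot)},$$
denoting the resulting maximal value by $V_I(t_*, x_*)$. We suppose that this optimization problem has an optimal open-loop solution.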
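The corresponding trajectory for the initial state $(t_0, x_0)$, that is, the solution of (1), (2) on the time interval $[t_0, T]$, is denoted by $x_0(\cdot)$ and called the "conditionally optimal cooperative trajectory". This trajectory is not necessarily unique. Thus, on the set $D$ the mapping
$$V(\cdot) : D \to \mathbb{R}^3$$
is defined with coordinate functions
$$V_I(\cdot), V_1(\cdot), V_2(\cdot) : D \to \mathbb{R},$$
where $V_i(t_*, x_*) = \mathrm{val}\,\Gamma_i(t_*, x_*)$, $i \in I$, and $V_I(t_*, x_*)$ is the maximal value in the optimization problem above.
This mapping assigns to each state $(t_*, x_*) \in D$ a characteristic function $v(t_*, x_*) : 2^I \to \mathbb{R}$ of the non-zero-sum game $\Gamma(t_*, x_*)$ and thus a classical 2-person cooperative game $(I, v(t_*, x_*))$.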
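Let $E(t_*, x_*) = \{\alpha = (\alpha_1, \alpha_2) : \alpha_i \ge V_i(t_*, x_*),\ i \in I,\ \alpha_1 + \alpha_2 = V_I(t_*, x_*)\}$ be the set of all imputations in the game $(I, v(t_*, x_*))$. A multivalued mapping
$$M : (t_*, x_*) \mapsto M(t_*, x_*) \subset E(t_*, x_*) \subset \mathbb{R}^2,$$
$$M(t_*, x_*) \ne \emptyset \quad \forall (t_*, x_*) \in D,$$
is called an "optimality principle" (defined over the family of games $\Gamma(t_*, x_*)$, $(t_*, x_*) \in D$), and the set $M(t_*, x_*)$ is called the "cooperative solution of the game $\Gamma(t_*, x_*)$ corresponding to this principle".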
As follows from (Fridman, 1971), under the conditions imposed above the following lemma holds.
Lemma 1. The functions $V_I(\cdot), V_1(\cdot), V_2(\cdot) : D \to \mathbb{R}$ are locally Lipschitz.
Since the solution of the Cauchy problem (1), (2) in the sense of Caratheodory is absolutely continuous, Lemma 1 implies the following theorem.
Theorem 1. For every solution $x(\cdot)$ of the Cauchy problem (1), (2) in the sense of Caratheodory corresponding to the open-loop controls $u(\cdot) = (u^{(1)}(\cdot), u^{(2)}(\cdot))$, the functions
$$\varphi_I, \varphi_i : [t_0, T] \to \mathbb{R}, \quad i \in I, \qquad \varphi_I(t) = V_I(t, x(t)), \quad \varphi_i(t) = V_i(t, x(t)),$$
are absolutely continuous on the time interval $[t_0, T]$.
As defined above, let $E(t_*, x_*)$ be the set of imputations in the game $\Gamma(t_*, x_*)$, and let $\xi(t_*, x_*) = (\xi_1(t_*, x_*), \xi_2(t_*, x_*)) \in E(t_*, x_*)$.
Then we have
$$\xi_i(t_*, x_*) \ge V_i(t_*, x_*), \quad i \in I.$$
4. Realization of cooperative solutions
We connect the realization of the solution of the game $\Gamma(t_0, x_0)$ with the known "imputation distribution procedure" (IDP) (Petrosjan and Danilov, 1979; Petrosjan, 1995).
By an IDP of the imputation $\xi(t_0, x_0)$ from the solution $M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ along the conditionally optimal trajectory $x_0(\cdot)$ we understand a function
$$\beta(t) = (\beta_1(t), \beta_2(t)), \quad t \in [t_0, T], \tag{4}$$
such that
$$\xi(t_0, x_0) = \int_{t_0}^{T} \beta(t)\,dt \tag{5}$$
and
$$\int_{t}^{T} \beta(\tau)\,d\tau \in E(t, x_0(t)) \quad \forall t \in [t_0, T], \tag{6}$$
where $E(t, x_0(t))$ is the set of imputations in the game $(I, v(t, x_0(t)))$.
The IDP $\beta(t)$, $t \in [t_0, T]$, of the imputation $\xi(t_0, x_0) \in M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ is called dynamically stable (time-consistent) along the conditionally optimal trajectory $x_0(\cdot)$ if
$$\int_{t}^{T} \beta(\tau)\,d\tau \in M(t, x_0(t)) \quad \forall t \in [t_0, T]. \tag{7}$$
The solution $M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ is dynamically stable (time-consistent) if for every $\xi(t_0, x_0) \in M(t_0, x_0)$ a dynamically stable IDP exists along at least one conditionally optimal trajectory.
If $M(t, x_0(t)) = E(t, x_0(t))$, $t \in [t_0, T]$, then $M(t, x_0(t)) \ne \emptyset$ (here $M(t, x_0(t))$ is the set of imputations in the subgame $\Gamma(t, x_0(t))$ with initial conditions on the conditionally optimal cooperative trajectory and duration $T - t$), and $\xi(t, x_0(t)) \in M(t, x_0(t))$ can be selected as an absolutely continuous function of $t$. Then the following theorem holds.
Theorem 2. For any conditionally optimal trajectory $x_0(\cdot)$, the following IDP of the solution $\xi(t_0, x_0) \in M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$,
$$\beta(t) = -\frac{d}{dt}\,\xi(t, x_0(t)), \quad t \in [t_0, T], \tag{8}$$
is a dynamically stable IDP along this trajectory. Therefore the solution $M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ is dynamically stable.
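The reason why the IDP (8) satisfies (5) and (7) can be written out in one line. The sketch below assumes that the imputation vanishes at the terminal time, $\xi(T, x_0(T)) = 0$, which is what the integral form of the payoffs suggests (all payoffs over the degenerate interval $[T, T]$ are zero); this terminal condition is an assumption here and is not stated explicitly in the text.

```latex
\int_t^T \beta(\tau)\,d\tau
  = -\int_t^T \frac{d}{d\tau}\,\xi(\tau, x_0(\tau))\,d\tau
  = \xi(t, x_0(t)) - \xi(T, x_0(T))
  = \xi(t, x_0(t)) \in M(t, x_0(t)),
  \qquad t \in [t_0, T].
```

Taking $t = t_0$ in this chain gives (5), and the chain itself is condition (7).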
As $\xi_i(t_0, x_0)$ we can take the Shapley value:
$$\xi_i(t_0, x_0) = Sh_i(t_0, x_0) = V_i(t_0, x_0) + \frac{V_I(t_0, x_0) - \sum_{j=1}^{2} V_j(t_0, x_0)}{2}, \quad i \in I,$$
and for the subgame along the cooperative trajectory
$$\xi_i(t, x_0(t)) = Sh_i(t, x_0(t)) = V_i(t, x_0(t)) + \frac{V_I(t, x_0(t)) - \sum_{j=1}^{2} V_j(t, x_0(t))}{2}, \quad i \in I.$$
From Theorem 1 it follows that the function $Sh_i(t, x_0(t))$ is absolutely continuous and thus differentiable for almost all $t \in [t_0, T]$ along $x_0(t)$. This shows that the IDP $\beta(t)$ for $\xi_i(t, x_0(t)) = Sh_i(t, x_0(t))$ can be computed by (8) according to Theorem 2.
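The computation of the Shapley value and of the corresponding IDP can be illustrated numerically. The sketch below is hedged: the value functions sampled along the cooperative trajectory are hypothetical linear functions of the remaining time, the derivative in (8) is replaced by a finite difference on a grid, and the function names are illustrative only.

```python
def shapley_2p(v1: float, v2: float, v_all: float):
    """Shapley value of a 2-person game: own guaranteed value plus half of the surplus."""
    surplus = v_all - (v1 + v2)
    return v1 + surplus / 2, v2 + surplus / 2

def idp_from_samples(ts, xi):
    """Approximate beta(t) = -d/dt xi(t) on each grid cell by a finite difference."""
    return [-(xi[k + 1] - xi[k]) / (ts[k + 1] - ts[k]) for k in range(len(ts) - 1)]

# Hypothetical values along the cooperative trajectory on [0, T]; all vanish at t = T.
T, N = 1.0, 100
ts = [T * k / N for k in range(N + 1)]
V1 = [1.0 * (T - t) for t in ts]
V2 = [2.0 * (T - t) for t in ts]
VI = [4.0 * (T - t) for t in ts]

sh = [shapley_2p(a, b, c) for a, b, c in zip(V1, V2, VI)]
sh1 = [s[0] for s in sh]

# Payment rate to player 1 and the total paid out over [0, T].
beta1 = idp_from_samples(ts, sh1)
paid = sum(b * (ts[k + 1] - ts[k]) for k, b in enumerate(beta1))
print(abs(paid - sh1[0]) < 1e-9)
```

In this toy data the total paid to player 1 over the whole interval reproduces his Shapley value at the initial state, which is the discrete counterpart of condition (5).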
5. On the strategic support of the imputation $\xi(t_0, x_0)$
If a cooperative agreement is reached in the game and each player gets his payoff according to the IDP (8), then it is natural to suppose that those who violate this agreement are to be punished. The effectiveness of the punishment (sanctions) reduces to the question of the existence of a Nash equilibrium in the following differential game $\Gamma^{\xi}(t_0, x_0)$, which differs from $\Gamma(t_0, x_0)$ only in the payoffs of the players.
The payoff of player $i$ in $\Gamma^{\xi}(t_0, x_0)$ is equal to
$$-\int_{t_0}^{t(u(\cdot))} \frac{d}{dt}\,\xi_i(t, x_0(t))\,dt + \int_{t(u(\cdot))}^{T} h^{(i)}(t, x(t), u^{(1)}(t), u^{(2)}(t))\,dt,$$
where $t(u(\cdot))$ is the last time instant $t \in [t_0, T]$ for which
$$x_0(\tau) = x(\tau, t_0, x_0, u(\cdot)) \quad \forall \tau \in [t_0, t].$$
Theorem 3. In the game $\Gamma^{\xi}(t_0, x_0)$, for each $\varepsilon > 0$ there exists an $\varepsilon$-Nash equilibrium with outcomes (payoffs) of the players in this equilibrium equal to
$$\xi(t_0, x_0) = (\xi_1(t_0, x_0), \xi_2(t_0, x_0)) \in E(t_0, x_0).$$
The idea of the proof is the following. Since $\xi(t_0, x_0)$ belongs to the imputation set of the game $\Gamma(t_0, x_0)$, we have
$$\xi_i(t, x_0(t)) \ge V_i(t, x_0(t)) \quad \forall i \in I, \ \forall t \in [t_0, T]. \tag{9}$$
This means that at each time instant $t \in [t_0, T]$, moving along the conditionally optimal trajectory $x_0(\cdot)$, no player $i \in I$ can guarantee himself a payoff on $[t, T]$ greater than the payoff according to the IDP (8), i.e. greater than
$$-\int_{t}^{T} \frac{d}{d\tau}\,\xi_i(\tau, x_0(\tau))\,d\tau = \xi_i(t, x_0(t)).$$
Indeed, if player $i$ deviates from the cooperative trajectory at some time instant $t$, this is immediately seen by his opponent $3 - i$ (both players know $x(t)$ at each time instant $t$, and a deviation of one player changes $x(t)$), who then, after a small time delay $\delta$, uses the punishment strategy, that is, his optimal strategy in the zero-sum game $\Gamma_i$ played against player $i$. Therefore, player $i$ gets no more than $V_i(t + \delta, x_0(t + \delta)) \le \xi_i(t, x_0(t)) + \varepsilon$.
At the same time, on the time interval $[t_0, t]$ he has already received, according to the IDP, the payoff
$$-\int_{t_0}^{t} \frac{d}{d\tau}\,\xi_i(\tau, x_0(\tau))\,d\tau = \xi_i(t_0, x_0) - \xi_i(t, x_0(t)).$$
Consequently, no player can guarantee in the game $\Gamma^{\xi}(t_0, x_0)$ a payoff greater than $\xi_i(t_0, x_0)$ (up to $\varepsilon$).
On the other hand, moving in the game $\Gamma^{\xi}(t_0, x_0)$ always along the conditionally optimal trajectory $x_0(\cdot)$, that is, following the cooperative solution, each player gets his payoff according to the imputation $\xi(t_0, x_0)$. Thus no player can benefit from deviating from the conditionally optimal trajectory, which in this case is natural to call an "equilibrium trajectory".
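Condition (9), on which the punishment argument rests, can be checked on sampled data. The following is a minimal sketch, assuming that the imputation and the guaranteed values have already been computed on a time grid along the cooperative trajectory; the sample values and the function name are hypothetical.

```python
def first_violation(ts, xi, V, tol=0.0):
    """Return None if xi[i][k] >= V[i][k] - tol for both players at every grid time,
    otherwise the pair (t, player) of the first violation found."""
    for k, t in enumerate(ts):
        for i in (0, 1):
            if xi[i][k] < V[i][k] - tol:
                return t, i + 1
    return None

# Hypothetical samples along the cooperative trajectory on [0, 1].
ts = [k / 10 for k in range(11)]
V = [[1.0 * (1 - t) for t in ts], [2.0 * (1 - t) for t in ts]]
xi = [[1.5 * (1 - t) for t in ts], [2.5 * (1 - t) for t in ts]]
print(first_violation(ts, xi, V))

# If player 2 were offered less than his guaranteed value, the check reports it.
xi_bad = [xi[0], [1.9 * (1 - t) for t in ts]]
print(first_violation(ts, xi_bad, V))
```

A reported violation marks a time instant at which the corresponding player could gain by deviating, so the threat of punishment would no longer support the cooperative trajectory there.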
References
Petrosjan, L. A., Zenkevich, N. A. (2009). Principles of stable cooperation. The mathematical games theory and its applications, 1, 1, 102-117 (in Russian).
Chistyakov, S. V. (1977). To the solution of game problem of pursuit. Prikl. Math. i Mech., 41, 5, 825-832, (in Russian).
Chistyakov, S.V. (1999). Operatory znacheniya antagonisticheskikx igr (Value Operators in Two-Person Zero-Sum Differential Games). St. Petersburg: St. Petersburg Univ. Press.
Chistyakov, S.V., Petrosyan, L. A. (2011). Strong Strategic Support of Cooperative solutions in Differential Games. Contributions to Game Theory and Management, Vol. 4, pp. 105-111.
Chentsov, A. G. (1976). On a game problem of converging at a given instant of time. Math. USSR Sbornik, 28, 3, 353-376.
Chistyakov, S.V. (1981). O beskoalizionnikx differenzial’nikx igrakx (On Coalition-Free Differential Games). Dokl. Akad. Nauk, 259(5), 1052-1055; English transl. in Soviet Math. Dokl. 24, 1981, no. 1, pp. 166-169.
Petrosjan, L.A. (1993). Differential Games of Pursuit. World Scientific, Singapore.
Pschenichny, B. N. (1973). E-strategies in differential games. Topics in Differential Games. New York, London, Amsterdam, pp. 45-56.
Fridman, A. (1971). Differential Games. John Wiley and Sons, New York, NY.
Petrosjan, L.A., Danilov, N. N. (1979). Stability of Solutions in nonzero-sum Differential Games with Integral Payoffs. Viestnik Leningrad University, N1, pp. 52-59.
Petrosjan, L.A. (1995). The Shapley Value for Differential Games. Annals of the International Society of Dynamic Games, Vol.3, Geert Jan Olsder Editor, Birkhauser, pp. 409-417.