Contributions to Game Theory and Management, XII, 159-176
Solution of the Differential Game with Hybrid Structure*
Ekaterina V. Gromova1 and Natalya G. Magnitskaya2
1 St. Petersburg State University,
7/9 Universitetskaya nab., St. Petersburg, 199034, Russia E-mail: [email protected]
2 St. Petersburg State University,
7/9 Universitetskaya nab., St. Petersburg, 199034, Russia E-mail: magnitsnatalya@gmail.com
Abstract This paper focuses on two approaches for calculating optimal controls in cooperative differential games with a hybrid structure, namely, games in which the (joint) payoff function has the form of a sum of integrals over different but adjoining time intervals. The methods are applied to a game-theoretic model with a random time horizon T, where T has a discrete structure, but their area of application is wider.
Keywords: differential games, random duration, discontinuous cumulative distribution function, discrete random variable, optimal control, Pontryagin's maximum principle.
1. Introduction
In this paper a particular problem of calculating optimal controls in open-loop form is considered (Pontryagin, 1961). In many continuous optimal control problems, including game-theoretic formulations (Basar and Olsder, 1995) in cooperative form, the objective functional can be written as an integral from $t_0$ to $T$. But in a hybrid formulation (Gromov and Gromova, 2017), where, for example, the payoff can be considered as a sum of integrals over different but adjoining time intervals, there is a lack of concrete algorithms for solving the problem. We consider the class of cooperative differential games with a discrete random time horizon (Gromova and Tur, 2017; Gromova et al., 2018) to demonstrate the methods, which are based on Pontryagin's maximum principle. The general formulation of differential games with a continuous random time horizon can be found in (Petrosjan and Shevkoplyas, 2000), and the fully discrete case of dynamic games with a discrete random time horizon was published in (Gromova and Plekhanova, 2019). In this paper we consider a hybrid model, namely, continuous dynamics and a discontinuous cumulative distribution function corresponding to a discrete random time horizon. Another approach, with a hybrid cumulative distribution function, was considered in (Gromov and Gromova, 2017).
The paper is structured as follows. In Section 2 the problem statement with a random discrete time horizon is given; in its subsections the particular cases with one and two points of discontinuity are considered. In Section 3 we consider a new approach to solving the differential game based on parametrization. In Section 4 we analyze another, more formal, backward approach in which the result of the previously solved (later) stage enters as a terminal payoff; the same example is solved and the results coincide. A numerical example is given in Section 5. In Sections 6 and 7 these two methods are applied to the case of two points of discontinuity and it is shown that the results coincide; a numerical example is given in Section 8.

* The reported study was funded by RFBR according to the research project No. 18-00-00727 (18-00-00725)
2. Game formulation
Consider a differential game with $n$ players (Basar and Olsder, 1995). The game starts at the time $t_0$ and ends at the random moment $T$, where $T$ is a random variable with known cumulative distribution function $F(t)$, $t \in [t_0, T_f]$; $T_f$ can be infinite (Gromova and Tur, 2017) or finite (Petrosjan and Murzov, 1966). The random terminal time $T$ can be formed by the stopping times of the individual players (see, e.g., (Gromova et al., 2016; Kostyunin et al., 2014)). Let $T_i$ be a random variable with known cumulative distribution function $F_i(t)$, $t \in [t_0, T_f]$, $i = \overline{1,n}$, where $T_i$ is the time instant of the process stop for the player $i$, $i = \overline{1,n}$. $\{T_i\}_{i=1}^{n}$ are assumed to be independent random variables. The game ends at the time of the game stop for the first player, which means
$$T = \min\{T_1, T_2, \ldots, T_n\}. \qquad (1)$$
Since $T_1, \ldots, T_n$ are independent, we have:
$$F(t) = 1 - \prod_{i=1}^{n}\big(1 - F_i(t)\big). \qquad (2)$$
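Formula (2) can be evaluated mechanically. The sketch below is our own illustration (not part of the original model): it computes the c.d.f. of $T = \min\{T_1, \ldots, T_n\}$ for a list of given marginal c.d.f.s, here two one-jump distributions whose jump sizes $p_1 = 0.3$, $q_1 = 0.7$ match the numerical example of Section 5.

```python
def cdf_min(cdfs, t):
    """Return F(t) = 1 - prod_i (1 - F_i(t)) for a list of c.d.f. callables, formula (2)."""
    prod = 1.0
    for F in cdfs:
        prod *= 1.0 - F(t)
    return 1.0 - prod

# Two one-jump c.d.f.s; the jump points and sizes are illustrative.
F1 = lambda t: 0.0 if t < 1 else (0.3 if t < 2 else 1.0)
F2 = lambda t: 0.0 if t < 1 else (0.7 if t < 2 else 1.0)
print(round(cdf_min([F1, F2], 1.5), 2))   # 1 - (1 - 0.3)(1 - 0.7) = 0.79
```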
The dynamics is defined by the equation:
$$\dot{x} = g(x, u_1, u_2, \ldots, u_n), \quad x(t_0) = x_0. \qquad (3)$$
The payoff of the player $i$ is defined as the expected integral payoff
$$K_i(t_0, x_0, u_1, \ldots, u_n) = \mathbb{E}\int_{t_0}^{T} h_i\big(x(t), u_1(t), \ldots, u_n(t)\big)\,dt = \int_{t_0}^{T_f}\!\!\int_{t_0}^{t} h_i\big(x(\tau), u(\tau)\big)\,d\tau\,dF(t), \quad i = \overline{1,n}, \qquad (4)$$
where $h_i(x, u_1, \ldots, u_n)$ is the instantaneous payoff function of the player $i$.
It was shown in (Kostyunin and Shevkoplyas, 2011) that under some mild conditions the payoff (4) can be transformed to the simpler form:
$$K_i(\cdot) = \int_{t_0}^{T_f}\big(1 - F(t)\big)h_i(\cdot)\,dt. \qquad (5)$$
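The transformation (4)-(5) is easy to check numerically for a discrete terminal time. The following sketch is our own illustration with a hypothetical instantaneous payoff $h$ along a fixed path (not the paper's computation): the expectation of the integral payoff coincides with the integral of $(1 - F(t))h(t)$.

```python
h = lambda t: 5.0 - 0.5 * t                        # hypothetical instantaneous payoff along a fixed path
t0, stops, probs = 0.0, [1.0, 2.0], [0.79, 0.21]   # P{T = 1} = 0.79, P{T = 2} = 0.21
F = lambda t: sum(p for p, tk in zip(probs, stops) if t >= tk)

def trapz(f, a, b, n=100000):
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    ys = [f(v) for v in xs]
    return sum((ys[k] + ys[k + 1]) * (xs[k + 1] - xs[k]) / 2 for k in range(n))

lhs = sum(p * trapz(h, t0, tk) for p, tk in zip(probs, stops))   # expectation of the integral of h from t0 to T
rhs = trapz(lambda t: (1 - F(t)) * h(t), t0, stops[-1])          # integral of (1 - F(t)) h(t) over [t0, Tf]
print(round(lhs, 4), round(rhs, 4))                              # both approximately 5.6425
```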
Let $T_i$, $i = 1, \ldots, n$, be discrete random variables. This means that for all players there are known time instants $\{t_1, t_2, \ldots, t_k\}$ at which the game may stop with some probabilities $\{p_1, p_2, \ldots, p_k\}$.
Consider the two-player case. Let $T_1$, $T_2$ be discrete random variables with known cumulative distribution functions $F_1(t)$ and $F_2(t)$. Let $\{t_1, t_2, \ldots, t_k\}$ and $\{\tau_1, \tau_2, \ldots, \tau_k\}$ be the time instants at which the distribution functions $F_1(t)$, $F_2(t)$ have simple discontinuities. Let
$$P\{T_1 = t_m\} = p_m, \quad P\{T_2 = \tau_j\} = q_j, \quad m, j = \overline{1,k}.$$
It is assumed that the game ends at the time of the game stop for the first player: $T = \min\{T_1, T_2\}$. Since $T_1$, $T_2$ are independent, we have:
$$F(t) = 1 - \prod_{i=1}^{2}\big(1 - F_i(t)\big). \qquad (6)$$
2.1. Problem statement for one point of discontinuity
Consider the following case: $F_1(t)$, $F_2(t)$ have simple discontinuities at the points $t_1 = \tau_1$, $t_2 = \tau_2$ (Fig. 1, Fig. 2).
Fig. 1. c.d.f. $F_1$
Fig. 2. c.d.f. $F_2$
Then the cumulative distribution function of $T$ has the form
$$F(t) = \begin{cases} 0, & t < t_1, \\ \hat p_1 = 1 - (1 - p_1)(1 - q_1), & t_1 \le t < t_2, \\ 1, & t \ge t_2. \end{cases}$$
From (5), for the discrete random time horizon (see (Gromova and Tur, 2017)), we have the payoff of the player $i$, $i = 1, 2$, in the following form:
$$K_i(\cdot) = \int_{t_0}^{t_1} h_i\big(x(t), u(t)\big)\,dt + (1 - \hat p_1)\int_{t_1}^{t_2} h_i\big(x(t), u(t)\big)\,dt.$$
Let $u(t) = (u_1(t), u_2(t))$.
Consider the cooperative form of the game (Petrosjan and Danilov, 1982). Then the optimal control problem is to maximize the total payoff of the players:
$$\max_{u_1, u_2}\sum_{i=1}^{2} K_i(t_0, x_0, u_1, u_2) = \int_{t_0}^{t_1}\big(h_1(x^*(t), u^*(t)) + h_2(x^*(t), u^*(t))\big)dt + (1 - \hat p_1)\int_{t_1}^{t_2}\big(h_1(x^*(t), u^*(t)) + h_2(x^*(t), u^*(t))\big)dt, \qquad (7)$$
where $x^*(t)$, $u^*(t)$ are the optimal trajectory and controls.
Fig. 3. c.d.f. $F(t)$
The solution will be considered in the class of open-loop strategies (Afanasyev et al., 2003).
2.2. Problem statement for two points of discontinuity
Consider a more complicated case of the discrete distribution. Let $F_1(t)$, $F_2(t)$ have simple discontinuities at the points $t_1 = \tau_1$, $t_2 = \tau_2$, $t_3 = \tau_3$ (Fig. 4, Fig. 5).
Fig. 4. c.d.f. $F_1$
Fig. 5. c.d.f. $F_2$
Then
$$F(t) = 1 - \prod_{i=1}^{2}\big(1 - F_i(t)\big) = 1 - \big(1 - F_1(t)\big)\big(1 - F_2(t)\big),$$
$$F(t) = \begin{cases} 0, & t < t_1, \\ \hat p_1 = 1 - (1 - p_1)(1 - q_1), & t_1 \le t < t_2, \\ \hat p_2 = 1 - (1 - p_1 - p_2)(1 - q_1 - q_2), & t_2 \le t < t_3, \\ 1, & t \ge t_3. \end{cases}$$
Fig. 6. c.d.f. $F(t)$
From (5) and (Gromova and Tur, 2017), we get the payoff of the player $i$, $i = 1, 2$:
$$K_i(\cdot) = \int_{t_0}^{t_1} h_i\big(x(t), u(t)\big)\,dt + (1 - \hat p_1)\int_{t_1}^{t_2} h_i\big(x(t), u(t)\big)\,dt + (1 - \hat p_2)\int_{t_2}^{t_3} h_i\big(x(t), u(t)\big)\,dt. \qquad (8)$$
Consider the cooperative form of the game. Then the optimal control problem is to maximize the total payoff of the players:
$$\max_{u_1, u_2}\sum_{i=1}^{2} K_i(t_0, x_0, u_1, u_2) = \int_{t_0}^{t_1}\big(h_1(x^*(t), u^*(t)) + h_2(x^*(t), u^*(t))\big)dt + (1 - \hat p_1)\int_{t_1}^{t_2}\big(h_1(x^*(t), u^*(t)) + h_2(x^*(t), u^*(t))\big)dt + (1 - \hat p_2)\int_{t_2}^{t_3}\big(h_1(x^*(t), u^*(t)) + h_2(x^*(t), u^*(t))\big)dt, \qquad (9)$$
where $x^*(t)$, $u^*(t)$ are the optimal trajectory and controls.
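The weights $1$, $(1 - \hat p_1)$, $(1 - \hat p_2)$ that multiply the consecutive integrals in (8)-(9) can be computed directly from the jump sizes of $F_1$ and $F_2$. The helper below is an illustrative sketch of ours (not the paper's code).

```python
def interval_weights(p, q):
    """Return 1 - F(t) on [t0,t1), [t1,t2), ... for jump sizes p of F1 and q of F2 at common points."""
    weights, cp, cq = [1.0], 0.0, 0.0
    for pm, qj in zip(p[:-1], q[:-1]):            # after the last jump the game has surely ended
        cp += pm
        cq += qj
        weights.append((1.0 - cp) * (1.0 - cq))   # 1 - p_hat_m = (1 - sum of p)(1 - sum of q)
    return weights

# Jump sizes consistent with the numerical example of Section 8 (p3, q3 complete the distributions).
print([round(w, 2) for w in interval_weights([0.2, 0.4, 0.4], [0.7, 0.2, 0.1])])   # [1.0, 0.24, 0.04]
```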
3. First approach. One point of discontinuity
Let us demonstrate the first approach to calculating open-loop controls for (7) by the example of a resource extraction differential game (Gromova, 2016) based on the models (Breton et al., 2005; Haurie et al., 2012; Jørgensen and Zaccour, 2007).
The dynamics of the total amount of the resource $x$ is defined by the equation:
$$\dot{x}(t) = -\sum_{i=1}^{2} u_i(t), \quad x(t_0) = x_0. \qquad (10)$$
The instantaneous payoff of the $i$-th player is defined as:
$$h_i(x, u) = r_i(u_i) - d_i x, \quad r_i(u_i) = u_i\Big(a_i - \frac{1}{2}u_i\Big), \quad a_i, d_i > 0, \quad \forall i = 1, 2. \qquad (11)$$
The solution will be considered in the class of open-loop strategies (Afanasyev et al., 2003). Let $u(t) = (u_1(t), u_2(t))$.
We will apply Pontryagin's maximum principle (Pontryagin, 1961) and find the solution on the two intervals $I_1 = [0, t_1]$ and $I_2 = [t_1, t_2]$: on the first interval the problem will be solved with two fixed ends, and on the second with a free right end. We introduce $x(t_1) = x_1$ as a parameter of the solution; its value will be found at the end of the solution from the maximization condition (7).
Interval $I_1$.
To find the profile of the optimal controls and trajectory we have to solve the maximization problem $\int_{t_0}^{t_1}\big(h_1(x(t), u(t)) + h_2(x(t), u(t))\big)dt$ for the dynamics (10) and the boundary conditions $x(t_0) = x_0$, $x(t_1) = x_1$, where $x_1$ is a parameter. The Hamiltonian is:
$$H(x, u, \psi) = -\psi\sum_{i=1}^{2} u_i + h_1(x(t), u(t)) + h_2(x(t), u(t)) = -\psi\sum_{i=1}^{2} u_i + u_1\Big(a_1 - \frac{1}{2}u_1\Big) + u_2\Big(a_2 - \frac{1}{2}u_2\Big) - d_1 x - d_2 x. \qquad (12)$$
Maximizing $H$ with respect to $u_i$:
$$\frac{\partial H}{\partial u_i} = -\psi + (a_i - u_i) = 0, \qquad u_i^*(t) = -\psi + a_i.$$
$H$ is concave w.r.t. $u_i$, $t \in [0, t_1]$:
$$\frac{\partial^2 H}{\partial u_i^2} = -1 < 0.$$
The adjoint equation:
$$\frac{d\psi}{dt} = -\frac{\partial H(x, u, \psi)}{\partial x} = \hat d, \quad \hat d = d_1 + d_2. \qquad (13)$$
Hence,
$$\psi(t) = \psi_0 + \hat d t, \qquad u_i^*(t) = -\psi_0 - \hat d t + a_i.$$
The dynamics is:
$$\dot{x}(t) = -\sum_{i=1}^{2} u_i^*(t) = 2\psi_0 + 2\hat d t - \hat a, \quad \hat a = a_1 + a_2.$$
i=i
We use the initial conditions:
x(0) = xo, x(ti) = xi, (14)
x(t) = 2*ot + dt2 — at + xo. Let us find *o according to the initial condition and x(t):
xi — dt2; + ati — xo
*0 =-2t[-•
Then the optimal trajectory for the interval $I_1$ is
$$x^*(t)_{I_1} = \frac{x_1 - \hat d t_1^2 + \hat a t_1 - x_0}{t_1}\,t + \hat d t^2 - \hat a t + x_0 = (x_1 - x_0)\frac{t}{t_1} + \hat d t(t - t_1) + x_0, \qquad (15)$$
and the optimal controls for the interval $I_1$ are
$$u_i^*(t)_{I_1} = -\frac{x_1 - \hat d t_1^2 + \hat a t_1 - x_0}{2 t_1} - \hat d t + a_i. \qquad (16)$$
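Formulas (15)-(16) can be verified symbolically. The sketch below is our own check (sympy is assumed to be available): it confirms that (15)-(16) satisfy the dynamics (10) and the boundary conditions (14).

```python
import sympy as sp

t, t1, x0, x1, a1, a2, d1, d2 = sp.symbols('t t1 x0 x1 a1 a2 d1 d2', positive=True)
a_hat, d_hat = a1 + a2, d1 + d2

psi0 = (x1 - d_hat*t1**2 + a_hat*t1 - x0) / (2*t1)      # adjoint constant found from x(t1) = x1
u = [-psi0 - d_hat*t + a1, -psi0 - d_hat*t + a2]        # controls (16)
x = (x1 - x0)*t/t1 + d_hat*t*(t - t1) + x0              # trajectory (15)

print(sp.simplify(sp.diff(x, t) + u[0] + u[1]))                           # 0, so the dynamics (10) holds
print(sp.simplify(x.subs(t, 0) - x0), sp.simplify(x.subs(t, t1) - x1))    # 0 0, conditions (14) hold
```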
Interval $I_2$.
To find the profile of the optimal controls and trajectory we have to solve the maximization problem $\int_{t_1}^{t_2}\big(h_1(x(t), u(t)) + h_2(x(t), u(t))\big)dt$ for the dynamics (10) and the initial condition $x(t_1) = x_1$, where $x_1$ is a parameter. The Hamiltonian is:
$$H(x, u, \psi) = -\psi\sum_{i=1}^{2} u_i + (1 - \hat p_1)h_1(x(t), u(t)) + (1 - \hat p_1)h_2(x(t), u(t)) =$$
$$= -\psi\sum_{i=1}^{2} u_i + (1 - \hat p_1)u_1\Big(a_1 - \frac{1}{2}u_1\Big) + (1 - \hat p_1)u_2\Big(a_2 - \frac{1}{2}u_2\Big) - (1 - \hat p_1)d_1 x - (1 - \hat p_1)d_2 x. \qquad (17)$$
Maximizing $H$ with respect to $u_i$:
$$\frac{\partial H}{\partial u_i} = -\psi + (1 - \hat p_1)(a_i - u_i) = 0, \qquad u_i^*(t) = \frac{-\psi + (1 - \hat p_1)a_i}{1 - \hat p_1}.$$
$H$ is concave w.r.t. $u_i$, $t \in [t_1, t_2]$:
$$\frac{\partial^2 H}{\partial u_i^2} = -(1 - \hat p_1) < 0.$$
The adjoint equation:
$$\frac{d\psi}{dt} = -\frac{\partial H(x, u, \psi)}{\partial x} = (1 - \hat p_1)\hat d, \quad \hat d = d_1 + d_2, \qquad (18)$$
$$\psi(t) = \int_{t_1}^{t}(1 - \hat p_1)\hat d\,d\tau + \psi_1 = (1 - \hat p_1)\hat d(t - t_1) + \psi_1.$$
The transversality condition:
$$\psi(t_2) = 0.$$
We get:
$$\psi(t) = (1 - \hat p_1)\hat d(t - t_2).$$
The dynamics is:
$$\dot{x}(t) = -\sum_{i=1}^{2} u_i^*(t) = \frac{2\psi}{1 - \hat p_1} - \hat a = 2\hat d(t - t_2) - \hat a, \quad \hat a = a_1 + a_2.$$
With the initial condition $x(t_1) = x_1$,
$$x(t) = -2\hat d t_2 t + \hat d t^2 - \hat a t + 2\hat d t_2 t_1 - \hat d t_1^2 + \hat a t_1 + x_1.$$
Optimal controls:
$$u_i^*(t)_{I_2} = -\hat d(t - t_2) + a_i. \qquad (19)$$
Optimal trajectory:
$$x^*(t)_{I_2} = -2\hat d t_2 t + \hat d t^2 - \hat a t + 2\hat d t_2 t_1 - \hat d t_1^2 + \hat a t_1 + x_1. \qquad (20)$$
Intervals $I_1$, $I_2$.
According to (7) we have to solve the maximization problem, taking into account (15), (16), (19), (20), i.e.
$$\max_{x_1}\sum_{i=1}^{2} K_i(t_0, x_0, u^*(t, x_1)).$$
Substituting (15), (16), (19), (20) into (7), we get:
$$\int_{t_0}^{t_1}\Big(u_1^*(t)_{I_1}\big(a_1 - \tfrac{1}{2}u_1^*(t)_{I_1}\big) + u_2^*(t)_{I_1}\big(a_2 - \tfrac{1}{2}u_2^*(t)_{I_1}\big) - d_1 x^*(t)_{I_1} - d_2 x^*(t)_{I_1}\Big)dt +$$
$$+ (1 - \hat p_1)\int_{t_1}^{t_2}\Big(u_1^*(t)_{I_2}\big(a_1 - \tfrac{1}{2}u_1^*(t)_{I_2}\big) + u_2^*(t)_{I_2}\big(a_2 - \tfrac{1}{2}u_2^*(t)_{I_2}\big) - d_1 x^*(t)_{I_2} - d_2 x^*(t)_{I_2}\Big)dt =$$
$$= -\frac{x_1^2}{4 t_1} + \frac{x_1(-\hat a t_1 + x_0)}{2 t_1} - \frac{\hat d x_1 t_1}{2} + (1 - \hat p_1)\hat d x_1(t_1 - t_2) + C(t_1, t_2), \qquad (21)$$
where $C(t_1, t_2)$ is an expression independent of $x_1$. The maximum of (21) is reached at:
$$x_1 = -\hat a t_1 + x_0 - \hat d t_1^2 + 2 t_1(1 - \hat p_1)\hat d(t_1 - t_2).$$
Substituting the obtained value of $x_1$ into (15), (16), (19), (20), we finally get the expressions for the optimal trajectory and controls on the intervals $I_1$, $I_2$:
$$x^*(t)_{I_1} = -\hat a t + 2(1 - \hat p_1)\hat d(t_1 - t_2)t + \hat d t(t - 2 t_1) + x_0,$$
$$u_i^*(t)_{I_1} = \hat d t_1 - (1 - \hat p_1)\hat d(t_1 - t_2) - \hat d t + a_i, \quad t \in [t_0; t_1],$$
$$x^*(t)_{I_2} = -2\hat d t_2 t + \hat d t^2 - \hat a t + 2\hat d t_2 t_1 - 2\hat d t_1^2 + x_0 + 2 t_1(1 - \hat p_1)\hat d(t_1 - t_2),$$
$$u_i^*(t)_{I_2} = -\hat d(t - t_2) + a_i, \quad t \in (t_1; t_2].$$
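As a sanity check (our own, not part of the paper), the total payoff can be evaluated numerically as a function of the parameter $x_1$ using the closed-form solutions (15), (16), (19), (20); with the parameter values of the numerical example in Section 5, a brute-force search recovers the closed-form maximizer.

```python
import numpy as np

a1, a2, d1, d2 = 5.0, 6.0, 1.0, 2.0       # values of the Section 5 example
p1_hat, t1, t2, x0 = 0.79, 1.0, 2.0, 40.0
a_hat, d_hat = a1 + a2, d1 + d2

def trapz(y, s):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(s) / 2))

def total_payoff(x1, n=4000):
    # interval I1: controls (16) and trajectory (15), both parameterized by x1
    s = np.linspace(0.0, t1, n)
    psi0 = (x1 - d_hat*t1**2 + a_hat*t1 - x0) / (2*t1)
    u1, u2 = -psi0 - d_hat*s + a1, -psi0 - d_hat*s + a2
    x = (x1 - x0)*s/t1 + d_hat*s*(s - t1) + x0
    J = trapz(u1*(a1 - u1/2) + u2*(a2 - u2/2) - d_hat*x, s)
    # interval I2: controls (19) and trajectory (20)
    s = np.linspace(t1, t2, n)
    u1, u2 = -d_hat*(s - t2) + a1, -d_hat*(s - t2) + a2
    x = -2*d_hat*t2*s + d_hat*s**2 - a_hat*s + 2*d_hat*t2*t1 - d_hat*t1**2 + a_hat*t1 + x1
    return J + (1 - p1_hat)*trapz(u1*(a1 - u1/2) + u2*(a2 - u2/2) - d_hat*x, s)

grid = np.linspace(20.0, 30.0, 2001)
best = grid[np.argmax([total_payoff(v) for v in grid])]
closed = -a_hat*t1 + x0 - d_hat*t1**2 + 2*t1*(1 - p1_hat)*d_hat*(t1 - t2)
print(best, closed)   # both approximately 24.74
```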
4. Second approach. One point of discontinuity
Consider the previous example, starting the solution from the second interval $I_2$. The value $(1 - \hat p_1)\int_{t_1}^{t_2}\big(h_1(x(t), u(t)) + h_2(x(t), u(t))\big)dt$ will be considered as the terminal payoff for the interval $I_1$. Let us substitute (19), (20) and find the value of the terminal payoff:
t2
$(xi) = (1 - pi^(ui(t)/2 (ai - 2u |(t)/2)+ u2(t)/2 (a2 - 2u2(t)/2) - dix*(t)^ -t1
d2x * (t)/2 )dt =
= (ti - t2)(pi - 1) (3a2 + 3a2 + 2d2(ti - t2)2 - 3adti + 3adt2 - 6xid). (22) 6
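A quick quadrature check of the closed form (22) against direct integration of (19)-(20) can be done as follows (our own sketch, with arbitrarily chosen parameter values).

```python
import numpy as np

a1, a2, d1, d2, p1_hat, t1, t2, x1 = 5.0, 6.0, 1.0, 2.0, 0.79, 1.0, 2.0, 24.74
a_hat, d_hat = a1 + a2, d1 + d2

s = np.linspace(t1, t2, 200001)
u1, u2 = -d_hat*(s - t2) + a1, -d_hat*(s - t2) + a2                                       # (19)
x = -2*d_hat*t2*s + d_hat*s**2 - a_hat*s + 2*d_hat*t2*t1 - d_hat*t1**2 + a_hat*t1 + x1    # (20)
f = u1*(a1 - u1/2) + u2*(a2 - u2/2) - d_hat*x
quad = (1 - p1_hat) * float(np.sum((f[1:] + f[:-1]) * np.diff(s) / 2))

closed = (t1 - t2)*(p1_hat - 1)/6 * (3*a1**2 + 3*a2**2 + 2*d_hat**2*(t1 - t2)**2
                                     - 3*a_hat*d_hat*t1 + 3*a_hat*d_hat*t2 - 6*x1*d_hat)
print(round(quad, 6), round(closed, 6))   # the two values agree (approximately -5.0862)
```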
Interval $I_1$.
To find the profile of the optimal strategies we have to solve the maximization problem $\int_{t_0}^{t_1}\big(h_1(x(t), u(t)) + h_2(x(t), u(t))\big)dt + \Phi(x_1)$ for the dynamics (10) and the initial condition $x(t_0) = x_0$. The Hamiltonian is:
$$H(x, u, \psi) = -\psi\sum_{i=1}^{2} u_i + h_1(x(t), u(t)) + h_2(x(t), u(t)) = -\psi\sum_{i=1}^{2} u_i + u_1\Big(a_1 - \frac{1}{2}u_1\Big) + u_2\Big(a_2 - \frac{1}{2}u_2\Big) - d_1 x - d_2 x. \qquad (23)$$
Maximizing $H$ with respect to $u_i$:
$$\frac{\partial H}{\partial u_i} = -\psi + (a_i - u_i) = 0, \qquad u_i^*(t) = -\psi + a_i.$$
$H$ is concave w.r.t. $u_i$, $t \in [0, t_1]$:
$$\frac{\partial^2 H}{\partial u_i^2} = -1 < 0.$$
The adjoint equation:
$$\frac{d\psi}{dt} = -\frac{\partial H(x, u, \psi)}{\partial x} = \hat d, \quad \hat d = d_1 + d_2. \qquad (24)$$
Hence,
$$\psi(t) = \psi_0 + \hat d t, \qquad u_i^*(t) = -\psi_0 - \hat d t + a_i.$$
The dynamics is:
$$\dot{x}(t) = -\sum_{i=1}^{2} u_i^*(t) = 2\psi_0 + 2\hat d t - \hat a, \quad \hat a = a_1 + a_2.$$
We use the initial condition:
$$x(0) = x_0, \qquad (25)$$
$$x(t) = 2\psi_0 t + \hat d t^2 - \hat a t + x_0.$$
According to the terminal payoff condition (22), the transversality condition is
$$\psi(t_1) = \frac{\partial\Phi(x_1)}{\partial x_1}, \qquad \psi(t_1) = (t_1 - t_2)(1 - \hat p_1)\hat d.$$
Then
$$\psi_0 = \hat d(-t_1\hat p_1 - t_2 + t_2\hat p_1).$$
Optimal controls for the interval $I_1$:
$$u_i^*(t)_{I_1} = -\hat d(-t_1\hat p_1 - t_2 + t_2\hat p_1) - \hat d t + a_i. \qquad (26)$$
Optimal trajectory for the interval $I_1$:
$$x^*(t)_{I_1} = 2\hat d(-t_1\hat p_1 - t_2 + t_2\hat p_1)t + \hat d t^2 - \hat a t + x_0. \qquad (27)$$
Let us substitute the optimal control and trajectory into $\int_{t_0}^{t_1}\big(h_1(x(t), u(t)) + h_2(x(t), u(t))\big)dt + \Phi(x_1)$ and get:
$$\int_{0}^{t_1}\Big(u_1^*(t)_{I_1}\big(a_1 - \tfrac{1}{2}u_1^*(t)_{I_1}\big) + u_2^*(t)_{I_1}\big(a_2 - \tfrac{1}{2}u_2^*(t)_{I_1}\big) - d_1 x^*(t)_{I_1} - d_2 x^*(t)_{I_1}\Big)dt + \Phi(x_1) = \Phi(x_1) + C(t_1, t_2), \qquad (28)$$
where $C(t_1, t_2)$ is an expression independent of $x_1$. We know that
$$x^*(t_1)_{I_1} = x_1.$$
Then
$$x_1 = 2\hat d(-t_1\hat p_1 - t_2 + t_2\hat p_1)t_1 + \hat d t_1^2 - \hat a t_1 + x_0 = -\hat a t_1 + x_0 - \hat d t_1^2 + 2 t_1(1 - \hat p_1)\hat d(t_1 - t_2). \qquad (29)$$
Substituting (29) into (19), (20) and using (26), (27), we get the expressions for the optimal trajectories and controls:
$$x^*(t)_{I_1} = 2\hat d(-t_1\hat p_1 - t_2 + t_2\hat p_1)t + \hat d t^2 - \hat a t + x_0 = -\hat a t + 2(1 - \hat p_1)\hat d(t_1 - t_2)t + \hat d t(t - 2 t_1) + x_0,$$
$$u_i^*(t)_{I_1} = -\hat d(-t_1\hat p_1 - t_2 + t_2\hat p_1) - \hat d t + a_i = \hat d t_1 - (1 - \hat p_1)\hat d(t_1 - t_2) - \hat d t + a_i, \quad t \in [t_0; t_1],$$
$$x^*(t)_{I_2} = -2\hat d t_2 t + \hat d t^2 - \hat a t + 2\hat d t_2 t_1 - 2\hat d t_1^2 + x_0 + 2 t_1(1 - \hat p_1)\hat d(t_1 - t_2),$$
$$u_i^*(t)_{I_2} = -\hat d(t - t_2) + a_i, \quad t \in (t_1; t_2].$$
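The fact that the two approaches yield the same parameter value can also be confirmed symbolically; the short sketch below (sympy is assumed to be available) checks that (29) coincides with the value of $x_1$ obtained by the first approach.

```python
import sympy as sp

t1, t2, x0, a_hat, d_hat, p1_hat = sp.symbols('t1 t2 x0 a_hat d_hat p1_hat', positive=True)
x1_backward = 2*d_hat*(-t1*p1_hat - t2 + t2*p1_hat)*t1 + d_hat*t1**2 - a_hat*t1 + x0     # from (29)
x1_forward  = -a_hat*t1 + x0 - d_hat*t1**2 + 2*t1*(1 - p1_hat)*d_hat*(t1 - t2)           # first approach
print(sp.simplify(x1_backward - x1_forward))   # 0
```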
5. Numerical example
Consider the previous example with numeric parameters.
Let $a_1 = 5$, $a_2 = 6$, $d_1 = 1$, $d_2 = 2$, $p_1 = 0.3$, $q_1 = 0.7$, $\hat p_1 = 0.79$, $t_1 = 1$, $t_2 = 2$, $x_0 = 40$.
Consequently:
$$x^*(t)_{I_1} = 3t^2 - 18.26t + 40, \quad u_1^*(t)_{I_1} = -3t + 8.63, \quad u_2^*(t)_{I_1} = -3t + 9.63, \quad t \in [t_0; t_1],$$
$$x^*(t)_{I_2} = 3t^2 - 23t + 44.74, \quad u_1^*(t)_{I_2} = -3t + 11, \quad u_2^*(t)_{I_2} = -3t + 12, \quad t \in [t_1; t_2].$$
Fig. 7. Optimal control for the first player
Fig. 8. Optimal trajectory
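The coefficients of the Section 5 example can be reproduced directly from the closed-form one-point solution; the following sketch performs the corresponding arithmetic (the printed pairs and triples are the coefficients of the formulas above).

```python
a1, a2, d1, d2, p1_hat, t1, t2, x0 = 5, 6, 1, 2, 0.79, 1, 2, 40
a_hat, d_hat = a1 + a2, d1 + d2
r = lambda v: round(v, 2)

c = d_hat*t1 - (1 - p1_hat)*d_hat*(t1 - t2)                  # constant term of u_i on I1 apart from a_i
print("u1_I1:", (-d_hat, r(c + a1)), " u2_I1:", (-d_hat, r(c + a2)))            # (-3, 8.63), (-3, 9.63)
lin = -a_hat + 2*(1 - p1_hat)*d_hat*(t1 - t2) - 2*d_hat*t1
print("x_I1 :", (d_hat, r(lin), x0))                                            # (3, -18.26, 40)

k2 = 2*d_hat*t2*t1 - 2*d_hat*t1**2 + x0 + 2*t1*(1 - p1_hat)*d_hat*(t1 - t2)
print("x_I2 :", (d_hat, -2*d_hat*t2 - a_hat, r(k2)))                            # (3, -23, 44.74)
print("u1_I2:", (-d_hat, d_hat*t2 + a1), " u2_I2:", (-d_hat, d_hat*t2 + a2))    # (-3, 11), (-3, 12)
```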
6. First approach. Two points of discontinuity
Let us demonstrate the first approach to calculating open-loop controls for (9), with the dynamics (10) and the instantaneous payoff (11), by the example of the resource extraction differential game (Jørgensen and Zaccour, 2007).
We will apply Pontryagin's maximum principle (Pontryagin, 1961) and find the solution on the three intervals $I_1 = [0, t_1]$, $I_2 = [t_1, t_2]$, $I_3 = [t_2, t_3]$: on the first and second intervals the problem will be solved with two fixed ends, and on the third with a free right end. We introduce $x(t_1) = x_1$, $x(t_2) = x_2$ as parameters of the solution; their values will be found at the end of the solution from the maximization condition (9).
Interval $I_1$.
To find the profile of the optimal strategies we have to solve the maximization problem $\int_{t_0}^{t_1}\big(h_1(x(t), u(t)) + h_2(x(t), u(t))\big)dt$ for the dynamics (10) and the boundary conditions $x(t_0) = x_0$, $x(t_1) = x_1$, where $x_1$ is a parameter.
By using Pontryagin's maximum principle we get the optimal trajectory
$$x^*(t)_{I_1} = \frac{x_1 - \hat d t_1^2 + \hat a t_1 - x_0}{t_1}\,t + \hat d t^2 - \hat a t + x_0 = (x_1 - x_0)\frac{t}{t_1} + \hat d t(t - t_1) + x_0, \qquad (30)$$
and the optimal controls
$$u_i^*(t)_{I_1} = -\frac{x_1 - \hat d t_1^2 + \hat a t_1 - x_0}{2 t_1} - \hat d t + a_i. \qquad (31)$$
Interval $I_2$.
To find the profile of the optimal strategies we have to solve the maximization problem $\int_{t_1}^{t_2}\big(h_1(x(t), u(t)) + h_2(x(t), u(t))\big)dt$ for the dynamics (10) and the boundary conditions $x(t_1) = x_1$, $x(t_2) = x_2$, where $x_1$, $x_2$ are parameters.
By using Pontryagin's maximum principle we get the optimal trajectory
$$x^*(t)_{I_2} = \frac{\big(x_2 - x_1 + \hat a(t_2 - t_1) - \hat d(t_2 - t_1)^2\big)(t - t_1)}{t_2 - t_1} + \hat d(t - t_1)^2 - \hat a(t - t_1) + x_1, \qquad (32)$$
and the optimal controls
$$u_i^*(t)_{I_2} = -\frac{x_2 - x_1 + \hat a(t_2 - t_1) - \hat d(t_2 - t_1)^2}{2(t_2 - t_1)} - \hat d(t - t_1) + a_i. \qquad (33)$$
Interval $I_3$.
To find the profile of the optimal strategies we have to solve the maximization problem $\int_{t_2}^{t_3}\big(h_1(x(t), u(t)) + h_2(x(t), u(t))\big)dt$ for the dynamics (10) and the initial condition $x(t_2) = x_2$, where $x_2$ is a parameter.
By using Pontryagin's maximum principle we get the optimal controls
$$u_i^*(t)_{I_3} = -\hat d(t - t_3) + a_i, \qquad (34)$$
and the optimal trajectory
$$x^*(t)_{I_3} = -2\hat d t_3 t + \hat d t^2 - \hat a t + 2\hat d t_3 t_2 - \hat d t_2^2 + \hat a t_2 + x_2. \qquad (35)$$
Intervals $I_1$, $I_2$, $I_3$.
According to (9) we have to solve the maximization problem, taking into account (30), (31), (32), (33), (34), (35), i.e.
$$\max_{x_1, x_2}\sum_{i=1}^{2} K_i(t_0, x_0, u^*(t, x_1, x_2)).$$
Substituting (30), (31), (32), (33), (34), (35) into (9), we get:
$$\int_{t_0}^{t_1}\Big(u_1^*(t)_{I_1}\big(a_1 - \tfrac{1}{2}u_1^*(t)_{I_1}\big) + u_2^*(t)_{I_1}\big(a_2 - \tfrac{1}{2}u_2^*(t)_{I_1}\big) - d_1 x^*(t)_{I_1} - d_2 x^*(t)_{I_1}\Big)dt +$$
$$+ (1 - \hat p_1)\int_{t_1}^{t_2}\Big(u_1^*(t)_{I_2}\big(a_1 - \tfrac{1}{2}u_1^*(t)_{I_2}\big) + u_2^*(t)_{I_2}\big(a_2 - \tfrac{1}{2}u_2^*(t)_{I_2}\big) - \hat d x^*(t)_{I_2}\Big)dt +$$
$$+ (1 - \hat p_2)\int_{t_2}^{t_3}\Big(u_1^*(t)_{I_3}\big(a_1 - \tfrac{1}{2}u_1^*(t)_{I_3}\big) + u_2^*(t)_{I_3}\big(a_2 - \tfrac{1}{2}u_2^*(t)_{I_3}\big) - \hat d x^*(t)_{I_3}\Big)dt =$$
$$= -\frac{x_1^2}{4 t_1} + \frac{x_1(-\hat a t_1 + x_0)}{2 t_1} - \frac{\hat d x_1 t_1}{2} + (1 - \hat p_1)\Big(-\frac{(x_2 - x_1)^2}{4(t_2 - t_1)} - \frac{\hat a(x_2 - x_1)}{2} - \frac{\hat d(t_2 - t_1)(x_2 - x_1)}{2} - \hat d x_1(t_2 - t_1)\Big) +$$
$$+ (1 - \hat p_2)\hat d x_2(t_2 - t_3) + C(t_1, t_2, t_3, x_0), \qquad (36)$$
where $C(t_1, t_2, t_3, x_0)$ is an expression independent of $x_1$, $x_2$. The maximum of (36) is reached at:
$$x_1 = -\hat a t_1 + x_0 - \hat d t_1^2 + 2 t_1(1 - \hat p_2)\hat d(t_2 - t_3) + 2 t_1(1 - \hat p_1)\hat d(t_1 - t_2),$$
$$x_2 = -\hat a(t_2 - t_1) - \hat d(t_1 - t_2)^2 + \frac{2(1 - \hat p_2)\hat d(t_2 - t_3)(t_2 - t_1)}{1 - \hat p_1} + x_1.$$
Hence,
$$x^*(t)_{I_1} = 2 t\hat d\big(-t_1 + (1 - \hat p_1)(t_1 - t_2) + (1 - \hat p_2)(t_2 - t_3)\big) + \hat d t^2 - \hat a t + x_0,$$
$$u_i^*(t)_{I_1} = \hat d\big(t_1 - (1 - \hat p_1)(t_1 - t_2) - (1 - \hat p_2)(t_2 - t_3)\big) - \hat d t + a_i, \quad t \in [t_0; t_1],$$
$$x^*(t)_{I_2} = \Big(-2\hat d t_2 + \frac{2(1 - \hat p_2)\hat d(t_2 - t_3)}{1 - \hat p_1}\Big)(t - t_1) + \hat d t^2 - \hat a t - 2\hat d t_1^2 + x_0 + 2 t_1(1 - \hat p_2)\hat d(t_2 - t_3) + 2 t_1(1 - \hat p_1)\hat d(t_1 - t_2),$$
$$u_i^*(t)_{I_2} = \hat d t_2 - \frac{(1 - \hat p_2)\hat d(t_2 - t_3)}{1 - \hat p_1} - \hat d t + a_i, \quad t \in (t_1; t_2],$$
$$x^*(t)_{I_3} = \hat d t^2 - \hat a t - 2\hat d t_3 t + 2\hat d t_3 t_2 - 2\hat d t_2^2 - 2\hat d t_1^2 + 2\hat d t_1 t_2 + \frac{2(1 - \hat p_2)\hat d(t_2 - t_3)(t_2 - t_1)}{1 - \hat p_1} + x_0 + 2 t_1(1 - \hat p_2)\hat d(t_2 - t_3) + 2 t_1(1 - \hat p_1)\hat d(t_1 - t_2),$$
$$u_i^*(t)_{I_3} = -\hat d(t - t_3) + a_i, \quad t \in (t_2; t_3].$$
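As an additional check (our own, not from the paper), the maximizing pair $(x_1, x_2)$ can be recovered numerically: assembling the three-interval payoff from (30)-(35) with the parameter values of Section 8 and maximizing it over $(x_1, x_2)$ reproduces $x_1 \approx 64.08$, $x_2 \approx 48.08$, the values given by the formulas above (scipy is assumed to be available).

```python
import numpy as np
from scipy.optimize import minimize

a1, a2, d1, d2 = 5.0, 6.0, 1.0, 2.0       # values of the Section 8 example
p1h, p2h, t1, t2, t3, x0 = 0.76, 0.96, 1.0, 2.0, 4.0, 80.0
a_hat, d_hat = a1 + a2, d1 + d2

def run(A, traj, a, b, n=2000):
    """Integrate the cooperative instantaneous payoff over [a, b] when u_i(t) = a_i - A(t)."""
    s = np.linspace(a, b, n)
    As = A(s)
    u1, u2 = a1 - As, a2 - As
    f = u1*(a1 - u1/2) + u2*(a2 - u2/2) - d_hat*traj(s)
    return float(np.sum((f[1:] + f[:-1]) * np.diff(s) / 2))

def total(z):
    x1, x2 = z
    c1 = (x1 - d_hat*t1**2 + a_hat*t1 - x0) / (2*t1)                       # interval I1, (30)-(31)
    J = run(lambda s: c1 + d_hat*s,
            lambda s: (x1 - x0)*s/t1 + d_hat*s*(s - t1) + x0, 0.0, t1)
    c2 = (x2 - x1 + a_hat*(t2 - t1) - d_hat*(t2 - t1)**2) / (2*(t2 - t1))  # interval I2, (32)-(33)
    J += (1 - p1h)*run(lambda s: c2 + d_hat*(s - t1),
                       lambda s: 2*c2*(s - t1) + d_hat*(s - t1)**2 - a_hat*(s - t1) + x1, t1, t2)
    J += (1 - p2h)*run(lambda s: d_hat*(s - t3),                           # interval I3, (34)-(35)
                       lambda s: d_hat*(s - t3)**2 - d_hat*(t2 - t3)**2 - a_hat*(s - t2) + x2, t2, t3)
    return J

res = minimize(lambda z: -total(z), np.array([60.0, 50.0]), method='Nelder-Mead')
print(res.x)   # approximately [64.08, 48.08]
```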
7. Second approach. Two points of discontinuity
Consider the previous example, starting the solution from the third interval $I_3$. The value $(1 - \hat p_2)\int_{t_2}^{t_3}\big(h_1(x(t), u(t)) + h_2(x(t), u(t))\big)dt$ will be considered as the terminal payoff for the interval $I_2$. Let us substitute (34), (35) and find the value of the terminal payoff:
$$\Phi_3(x_2) = (1 - \hat p_2)\int_{t_2}^{t_3}\Big(u_1^*(t)_{I_3}\big(a_1 - \tfrac{1}{2}u_1^*(t)_{I_3}\big) + u_2^*(t)_{I_3}\big(a_2 - \tfrac{1}{2}u_2^*(t)_{I_3}\big) - d_1 x^*(t)_{I_3} - d_2 x^*(t)_{I_3}\Big)dt =$$
$$= \frac{(t_2 - t_3)(\hat p_2 - 1)}{6}\Big(3 a_1^2 + 3 a_2^2 + 2\hat d^2(t_2 - t_3)^2 - 3\hat a\hat d t_2 + 3\hat a\hat d t_3 - 6 x_2\hat d\Big). \qquad (37)$$
Interval $I_2$.
To find the profile of the optimal strategies we have to solve the maximization problem $(1 - \hat p_1)\int_{t_1}^{t_2}\big(h_1(x(t), u(t)) + h_2(x(t), u(t))\big)dt$ for the dynamics (10), the initial condition $x(t_1) = x_1$ and the terminal payoff (37). By using Pontryagin's maximum principle we get the optimal trajectory for the interval $I_2$:
$$x^*(t)_{I_2} = \frac{2\hat d(t_2 - t_3)(1 - \hat p_2)(t - t_1)}{1 - \hat p_1} + \hat d(t^2 - t_1^2) - \hat a(t - t_1) - 2\hat d t_2(t - t_1) + x_1, \qquad (38)$$
and the optimal controls for the interval $I_2$:
$$u_i^*(t)_{I_2} = -\frac{\hat d(t_2 - t_3)(1 - \hat p_2)}{1 - \hat p_1} + \hat d t_2 - \hat d t + a_i. \qquad (39)$$
We know that $x^*(t_2)_{I_2} = x_2$, hence
$$x_2 = -\hat a(t_2 - t_1) - \hat d(t_1 - t_2)^2 + \frac{2(1 - \hat p_2)\hat d(t_2 - t_3)(t_2 - t_1)}{1 - \hat p_1} + x_1. \qquad (40)$$
The value $(1 - \hat p_1)\int_{t_1}^{t_2}\big(h_1(x(t), u(t)) + h_2(x(t), u(t))\big)dt + \Phi_3(x_2)$ will be considered as the terminal payoff for the interval $I_1$. Let us substitute (38), (39), (40) and find the value of the terminal payoff:
$$\Phi_2(x_1) = (1 - \hat p_1)\int_{t_1}^{t_2}\Big(u_1^*(t)_{I_2}\big(a_1 - \tfrac{1}{2}u_1^*(t)_{I_2}\big) + u_2^*(t)_{I_2}\big(a_2 - \tfrac{1}{2}u_2^*(t)_{I_2}\big) - d_1 x^*(t)_{I_2} - d_2 x^*(t)_{I_2}\Big)dt + \Phi_3(x_2) =$$
$$= (1 - \hat p_1)\hat d x_1(t_1 - t_2) + (1 - \hat p_2)(t_2 - t_3)\hat d x_1 + C(t_1, t_2, t_3), \qquad (41)$$
where $C(t_1, t_2, t_3)$ is an expression independent of $x_1$.
Interval $I_1$.
To find the profile of the optimal strategies we have to solve the maximization problem $\int_{t_0}^{t_1}\big(h_1(x(t), u(t)) + h_2(x(t), u(t))\big)dt + \Phi_2(x_1)$ for the dynamics (10), the initial condition $x(t_0) = x_0$ and the terminal payoff (41). By using Pontryagin's maximum principle we get the optimal controls for the interval $I_1$:
$$u_i^*(t)_{I_1} = -(t_1 - t_2)(1 - \hat p_1)\hat d - (t_2 - t_3)(1 - \hat p_2)\hat d + \hat d t_1 - \hat d t + a_i, \qquad (42)$$
and the optimal trajectory for the interval $I_1$:
$$x^*(t)_{I_1} = 2\big((t_1 - t_2)(1 - \hat p_1)\hat d + (t_2 - t_3)(1 - \hat p_2)\hat d - \hat d t_1\big)t + \hat d t^2 - \hat a t + x_0. \qquad (43)$$
Notice that $x^*(t_1)_{I_1} = x_1$, hence
$$x_1 = 2\big((t_1 - t_2)(1 - \hat p_1)\hat d + (t_2 - t_3)(1 - \hat p_2)\hat d\big)t_1 - \hat d t_1^2 - \hat a t_1 + x_0. \qquad (44)$$
Let us substitute (40), (44) into (34), (35), (38), (39) and get the expressions for the optimal trajectories and controls:
$$x^*(t)_{I_1} = 2 t\hat d\big(-t_1 + (1 - \hat p_1)(t_1 - t_2) + (1 - \hat p_2)(t_2 - t_3)\big) + \hat d t^2 - \hat a t + x_0,$$
$$u_i^*(t)_{I_1} = \hat d\big(t_1 - (1 - \hat p_1)(t_1 - t_2) - (1 - \hat p_2)(t_2 - t_3)\big) - \hat d t + a_i, \quad t \in [t_0; t_1],$$
$$x^*(t)_{I_2} = \Big(-2\hat d t_2 + \frac{2(1 - \hat p_2)\hat d(t_2 - t_3)}{1 - \hat p_1}\Big)(t - t_1) + \hat d t^2 - \hat a t - 2\hat d t_1^2 + x_0 + 2 t_1(1 - \hat p_2)\hat d(t_2 - t_3) + 2 t_1(1 - \hat p_1)\hat d(t_1 - t_2),$$
$$u_i^*(t)_{I_2} = -\frac{\hat d(t_2 - t_3)(1 - \hat p_2)}{1 - \hat p_1} + \hat d t_2 - \hat d t + a_i, \quad t \in (t_1; t_2],$$
$$x^*(t)_{I_3} = \hat d t^2 - \hat a t - 2\hat d t_3 t + 2\hat d t_3 t_2 - 2\hat d t_2^2 - 2\hat d t_1^2 + 2\hat d t_1 t_2 + \frac{2(1 - \hat p_2)\hat d(t_2 - t_3)(t_2 - t_1)}{1 - \hat p_1} + x_0 + 2 t_1(1 - \hat p_2)\hat d(t_2 - t_3) + 2 t_1(1 - \hat p_1)\hat d(t_1 - t_2),$$
$$u_i^*(t)_{I_3} = -\hat d(t - t_3) + a_i, \quad t \in (t_2; t_3].$$
These expressions coincide with those obtained by the first approach.
8. Numeric example
Consider the previous example with numeric parameters.
Let $a_1 = 5$, $a_2 = 6$, $d_1 = 1$, $d_2 = 2$, $p_1 = 0.2$, $p_2 = 0.4$, $q_1 = 0.7$, $q_2 = 0.2$, $\hat p_1 = 0.76$, $\hat p_2 = 0.96$, $t_1 = 1$, $t_2 = 2$, $t_3 = 4$, $x_0 = 80$.
Consequently:
$$x^*(t)_{I_1} = 3t^2 - 18.92t + 80, \quad u_1^*(t)_{I_1} = -3t + 8.96, \quad u_2^*(t)_{I_1} = -3t + 9.96, \quad t \in [t_0; t_1],$$
$$x^*(t)_{I_2} = 3t^2 - 25t + 86.08, \quad u_1^*(t)_{I_2} = -3t + 12, \quad u_2^*(t)_{I_2} = -3t + 13, \quad t \in [t_1; t_2],$$
$$x^*(t)_{I_3} = 3t^2 - 35t + 106.08, \quad u_1^*(t)_{I_3} = -3t + 17, \quad u_2^*(t)_{I_3} = -3t + 18, \quad t \in [t_2; t_3].$$
Fig. 9. Optimal trajectory
Fig. 10. Optimal control for the player 1
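The Section 8 coefficients can likewise be reproduced from the closed-form two-point solution; the sketch below performs the corresponding arithmetic (the printed pairs and triples are the coefficients of the formulas above).

```python
a1, a2, d1, d2 = 5, 6, 1, 2
p1h, p2h, t1, t2, t3, x0 = 0.76, 0.96, 1, 2, 4, 80
a_hat, d_hat = a1 + a2, d1 + d2
r = lambda v: round(v, 2)

c1 = d_hat*(t1 - (1 - p1h)*(t1 - t2) - (1 - p2h)*(t2 - t3))   # constant term of u_i on I1 apart from a_i
print("u1_I1:", (-d_hat, r(c1 + a1)))                         # (-3, 8.96)
lin1 = 2*d_hat*(-t1 + (1 - p1h)*(t1 - t2) + (1 - p2h)*(t2 - t3)) - a_hat
print("x_I1 :", (d_hat, r(lin1), x0))                         # (3, -18.92, 80)

c2 = d_hat*t2 - (1 - p2h)*d_hat*(t2 - t3)/(1 - p1h)
print("u1_I2:", (-d_hat, r(c2 + a1)))                         # (-3, 12.0)
lin2 = -2*d_hat*t2 + 2*(1 - p2h)*d_hat*(t2 - t3)/(1 - p1h) - a_hat
k2 = (2*d_hat*t2 - 2*(1 - p2h)*d_hat*(t2 - t3)/(1 - p1h))*t1 - 2*d_hat*t1**2 + x0 \
     + 2*t1*(1 - p2h)*d_hat*(t2 - t3) + 2*t1*(1 - p1h)*d_hat*(t1 - t2)
print("x_I2 :", (d_hat, r(lin2), r(k2)))                      # (3, -25.0, 86.08)

print("u1_I3:", (-d_hat, d_hat*t3 + a1))                      # (-3, 17)
k3 = 2*d_hat*t3*t2 - 2*d_hat*t2**2 - 2*d_hat*t1**2 + 2*d_hat*t1*t2 \
     + 2*(1 - p2h)*d_hat*(t2 - t3)*(t2 - t1)/(1 - p1h) + x0 \
     + 2*t1*(1 - p2h)*d_hat*(t2 - t3) + 2*t1*(1 - p1h)*d_hat*(t1 - t2)
print("x_I3 :", (d_hat, -a_hat - 2*d_hat*t3, r(k3)))          # (3, -35, 106.08)
```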
9. Conclusion
In this paper we considered two different approaches to the calculation of the optimal controls and trajectory in differential games with random duration. A new approach based on parametrization was constructed; it gives the same answer as the traditional method that uses a terminal payoff.
References
Afanasyev, V., Kolmanovskiy, V., Nosov, V. (2003). Mathematical theory of designing control systems. P. 617.
Basar, T., Olsder, G. (1995). Dynamic Noncooperative Game Theory. London: Academic Press.
Bellman, R. (1957). Dynamic Programming. Princeton: Princeton University Press.
Breton, M., Zaccour, G., Zahaf, A. (2005). A differential game of joint implementation of environmental projects. Automatica, 41(10), 1737-1749.
Dockner, E. J., Jorgensen, S., Long, N.V, Sorger, G. (2000). Differential Games in Economics and Management Science. Cambridge University Press.
Engwerda, J. (2005). LQ Dynamic Optimization and Differential Games. Wiley.
Gromov, D., Gromova, E. (2014). Differential games with random duration: a hybrid systems formulation. Contributions to Game Theory and Management, 7, 104-119.
Gromov, D., Gromova, E. (2017). On a class of hybrid differential games. Dynamic games and applications, 7(2), 266-288.
Gromova, E. (2016). The Shapley value as a sustainable cooperative solution in differential games of 3 players. Recent Advances in Game Theory and Applications, 67-89.
Gromova, E., Tur, A., Balandina, L. (2016). A game-theoretic model of pollution control with asymmetric time horizons. Contributions to Game Theory and Management, 9, 170-179.
Gromova, E., Tur, A. (2017). On the form of integral payoff in differential games with random duration. XXVI International Conference on Information, Communication and Automation Technologies (ICAT), IEEE, DOI: 10.1109/ICAT.2017.8171597.
Gromova, E., Malakhova, A., Palestini, A. (2018). Payoff Distribution in a Multi-Company Extraction Game with Uncertain Duration. Mathematics, 6(9), P. 165.
Gromova, E., Plekhanova, T. (2019). On the regularization of a cooperative solution in a multistage game with random time horizon. Discrete Applied Mathematics, 255, 40-55.
Haurie, A., Krawczyk, J., Zaccour, G. (2012). Games and dynamic games. World Scientific Books, control and differential equations, 211, 370-376.
Jørgensen, S., Zaccour, G. (2007). Developments in Differential Game Theory and Numerical Methods: Economic and Management Applications. Computational Management Science, 4(2), 159-182.
Kostyunin, S., Shevkoplyas, E. (2011). On simplification of integral payoff in differential games with random duration. Vestnik of St.Petersburg Univ., Ser. 10, 4, 47-56.
Kostyunin, S., Palestini, A., Shevkoplyas, E. (2014). On a Nonrenewable Resource Extraction Game Played by Asymmetric Firms. Journal of Optimization Theory and Applications, (2), 660-673.
Petrosjan, L., Danilov, N. (1982). Cooperative differential games and their applications, Tomsk University Press.
Petrosjan, L., Murzov, N. (1966). Game-theoretic problems of mechanics. Litovsk. Math. Sb., VI-3, 423-433. (in Russian)
Petrosjan, L., Shevkoplyas, E. (2000). Cooperative differential games with random duration. Vestnik Sankt-Peterburgskogo Universiteta. Ser. 1. Matematika Mekhanika Astronomiya, (4), 18-23.
Petrosyan, L., Zaccour, G. (2003). Time-consistent Shapley value allocation of pollution cost reduction. Journal of Economic Dynamics and Control, 27(3), 381-398.
Pontryagin, L. (1961). Mathematical Theory of Optimal Processes. P. 392.
Shevkoplyas, E. (2010). Stable cooperation in differential games with random duration. Upravlenie bol. syst., 31-1, 162-190.