
Contributions to Game Theory and Management, XVI, 7—19

Public Good Differential Game with Composite Distribution of Random Time Horizon*

Tatyana Balas and Anna Tur

St. Petersburg State University, 7/9, Universitetskaya nab., St. Petersburg, 199034, Russia
E-mail: st076855@student.spbu.ru, a.tur@spbu.ru

Abstract Differential games with random duration are considered. In some cases, the probability density function of the terminal time can change depending on different conditions, and the standard distributions cannot be used. The purpose of this work is to study games with a composite distribution function for the terminal time using dynamic programming methods. The solutions of the cooperative and non-cooperative public good differential game with random duration are considered.

Keywords: differential games, optimal control, dynamic programming, Hamilton-Jacobi-Bellman equation.

1. Introduction

Differential games are widely used to model conflict-controlled processes that evolve continuously over time. If the end time of the game is known, then the game is considered to be on a finite time interval. It is also common to consider games with an infinite horizon. However, when trying to describe real life processes, one often encounters uncertainty, in the sense that the terminal time of the game is not known in advance, but is the realization of some random variable. Such games are called games with random duration. Optimal control problems with uncertain duration were first considered by Yaari in (Yaari, 1965). Later, this idea was widely applied in problems of dynamic games. The study of cooperative and non-cooperative differential games with random duration is presented by Shevkoplyas and Petrosyan in (Petrosyan and Shevkoplyas, 2003, Shevkoplyas, 2014).

In some cases, the probability density function of the terminal time may change depending on different conditions, so a standard distribution cannot fully describe the random variable responsible for the moment of the end of the game. Such a scenario occurs when the operating mode of the system changes over time at the appropriate switching points and is characterized by its own distribution on each individual interval between switches. In such problems, a composite distribution function for the terminal time is used. In (Gromov and Gromova, 2014, 2017), games with a random horizon and a composite distribution function for the terminal time are considered as hybrid differential games, since the payoffs of the players in such games take the form of sums of integrals over different, but adjoint, time intervals. In (Balas, 2022), the differential game with a composite distribution function for the terminal time with two switching moments is investigated. In these works, the solutions were found in the class of open-loop strategies using the Pontryagin maximum principle. In (Balas and Tur, 2023), cooperative games with a composite distribution function for the terminal time were studied using the methods of dynamic programming.

*This work was supported by the Russian Foundation for Basic Research and the German Research Foundation (DFG), project number 21-51-12007.

https://doi.org/10.21638/11701/spbu31.2023.01

The aim of this paper is to study both cooperative and non-cooperative solutions to the class of games described using dynamic programming methods. We consider a public good differential game where two or more individuals invest in a single stock of capital. Such a model may be relevant, for example, when players invest in an environmental clean-up or reclamation problem. Knowledge accumulation as a public good can also be considered in this formulation (Dockner et al., 2000).

The paper is organized as follows. Section 2 gives the game formulation and the basic assumptions of the model. Section 3 presents a method for solving such games; the system of Hamilton-Jacobi-Bellman equations for the problem is given there. A cooperative model is investigated in Section 4. The case of the provision of a public good is described in Section 5. Section 6 presents the solution of the problem in the non-cooperative formulation using the Hamilton-Jacobi-Bellman equation. The same case is solved in a cooperative setting in Section 7. A comparison of the obtained solutions for the cooperative and non-cooperative cases is presented in Section 8. An illustrative example is solved in the non-cooperative case in Section 9 and in the cooperative case in Section 10.

2. Problem Statement

Consider the differential n-player game proposed in (Gromov, Gromova, 2017). The game starts from the initial state x_0 at the time t_0, and the terminal time T is a random variable distributed on [t_0, ∞) according to a known composite distribution function F(t) with N − 1 switches at fixed moments of time τ_1 < τ_2 < … < τ_{N−1}:

F(t) = 0, t < t_0,
F(t) = α_j(τ_j) F_{j+1}(t) + β_j(τ_j), t ∈ [τ_j; τ_{j+1}), 0 ≤ j ≤ N − 1,   (1)

where τ_0 = t_0, τ_N = ∞, and {F_j, 1 ≤ j ≤ N} is the set of distribution functions characterizing the different modes of operation. Here F_1(t_0) = 0 and

α_j(τ_j) = (1 − F(τ_j^−)) / (1 − F_{j+1}(τ_j)),   β_j(τ_j) = 1 − (1 − F(τ_j^−)) / (1 − F_{j+1}(τ_j)),

so that F(t) is continuous at the switching points. F(τ_j^−) is defined as the left limit of F(t) at t = τ_j, i.e., F(τ_j^−) = lim_{t→τ_j−0} F(t).

For simplicity, we denote F̄_{j+1}(t) = α_j(τ_j) F_{j+1}(t) + β_j(τ_j) and f̄_{j+1}(t) = α_j(τ_j) f_{j+1}(t).
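To make the gluing coefficients concrete, here is a minimal numerical sketch, assuming an exponential first mode and a uniform second mode with the parameter values of the example in Section 9 (λ = 0.1, τ_1 = 20, τ_2 = 60); it checks continuity at the switch and normalization at the right end:

```python
import math

# A minimal sketch of the composite CDF (1) with one switch, assuming an
# exponential first mode and a uniform second mode (the parameter values
# lam = 0.1, tau1 = 20, tau2 = 60 match the example of Section 9).
t0, tau1, tau2, lam = 0.0, 20.0, 60.0, 0.1

F1 = lambda t: 1.0 - math.exp(-lam * (t - t0))   # first mode of operation
F2 = lambda t: (t - tau1) / (tau2 - tau1)        # second mode (uniform on [tau1, tau2])

F_left = F1(tau1)                                # F(tau1-), the left limit at the switch
alpha1 = (1.0 - F_left) / (1.0 - F2(tau1))       # alpha_1(tau1)
beta1 = 1.0 - alpha1                             # beta_1(tau1) = 1 - alpha_1(tau1)

def F(t):
    """Composite distribution function with a single switch at tau1."""
    if t < t0:
        return 0.0
    if t < tau1:
        return F1(t)
    if t < tau2:
        return alpha1 * F2(t) + beta1            # the glued mode, \bar{F}_2(t)
    return 1.0

# Continuity at the switch and normalization at the right end:
assert abs(F(tau1) - F_left) < 1e-12
assert abs(alpha1 * F2(tau2) + beta1 - 1.0) < 1e-12
```

With these coefficients the glued branch evaluates to 1 − e^{−λτ_1}(τ_2 − t)/(τ_2 − τ_1), which is exactly the second branch of the distribution used in the example below.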

Let h_i(x(t), u(t)) be the instantaneous payoff of player i in the game Γ_T(x_0, t_0) at time t, where u(t) = (u_1(t), …, u_n(t)), u_i ∈ U_i ⊂ R^l is the control of player i, and the u_i are piecewise continuous functions.

The system dynamics is described by a first-order differential equation:

ẋ = g(t, x, u), x(t_0) = x_0.

The expected integral payoff of player i is:

K_i(x_0, t_0, u) = E[ ∫_{t_0}^{T} h_i(x(τ), u(τ)) dτ ] = ∫_{t_0}^{∞} ( ∫_{t_0}^{t} h_i(x(τ), u(τ)) dτ ) dF(t).   (2)

According to (Kostyunin and Shevkoplyas, 2011), this functional can be simplified to the following form:

K_i(x_0, t_0, u) = ∫_{t_0}^{∞} (1 − F(t)) h_i(x(t), u(t)) dt.

The expected payoff of player i in the subgame Γ_T(x(t), t) starting at the moment t from the state x(t) is evaluated by the formula:

K_i(t, x(t), u) = 1/(1 − F(t)) · ∫_{t}^{∞} (1 − F(τ)) h_i(x(τ), u(τ)) dτ.   (3)

This problem was considered in (Gromov, Gromova, 2017) in the class of open-loop strategies for the cooperative case.
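The simplification of the expected payoff can be illustrated numerically. The sketch below uses an assumed exponential terminal-time distribution and a sample payoff path (both choices are illustrative, not from the paper); for these choices both sides equal 1/(λ + 0.05) = 20/3:

```python
import math

# Numerical check (a sketch under an assumed exponential terminal-time
# distribution) that the expected payoff (2),
#   int_{t0}^{inf} ( int_{t0}^{t} h dtau ) dF(t),
# coincides with the simplified form int_{t0}^{inf} (1 - F(t)) h(t) dt.
lam = 0.1                              # hazard rate of the terminal time (assumption)
h = lambda t: math.exp(-0.05 * t)      # sample instantaneous payoff along a trajectory
F = lambda t: 1.0 - math.exp(-lam * t)
f = lambda t: lam * math.exp(-lam * t)

T_max, n = 300.0, 200000               # truncation horizon and grid size
dt = T_max / n

lhs, acc = 0.0, 0.0                    # acc approximates int_0^t h(tau) dtau
for i in range(n):
    t = i * dt
    lhs += f(t) * acc * dt             # integrate the accumulated payoff against dF
    acc += h(t) * dt

rhs = sum((1.0 - F(i * dt)) * h(i * dt) * dt for i in range(n))

# Both sides equal 1/(lam + 0.05) = 20/3 for this choice of F and h.
assert abs(lhs - rhs) < 1e-2
```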

3. Nash Equilibrium

First, consider the non-cooperative case of the game. We solve the optimization problem using the Hamilton-Jacobi-Bellman equation. Let u^{NE} = (u_1^{NE}, …, u_n^{NE}) denote the Nash equilibrium strategies, and for u_i ∈ U_i write u^{NE}_{−i}(u_i) = (u_1^{NE}, …, u_{i−1}^{NE}, u_i, u_{i+1}^{NE}, …, u_n^{NE}). Let V_j^i(t, x(t)) be the value of the Bellman function of player i at t ∈ [τ_{j−1}, τ_j]:

V_j^i(t, x(t)) = max_{u_i ∈ U_i} K_i(t, x(t), u^{NE}_{−i}(u_i)), t ∈ [τ_{j−1}, τ_j].   (4)

Consider the subgame Γ_T(t, x(t)) starting at a moment t ∈ [τ_{N−1}; ∞). Here we have a standard model of a game with random duration, because there are no switches during the period (τ_{N−1}; ∞). The expected payoff in Γ_T(t, x(t)) is as follows:

K_i(t, x(t), u) = 1/(1 − F̄_N(t)) · ∫_{t}^{+∞} (1 − F̄_N(τ)) h_i(x(τ), u(τ)) dτ.

In (Shevkoplyas, 2014) the Hamilton-Jacobi-Bellman equation for differential games with random duration was presented. Then, according to the Bellman principle and (Shevkoplyas, 2014), V_N^i(t, x) satisfies the equation:

f̄_N(t)/(1 − F̄_N(t)) · V_N^i(t, x) = ∂V_N^i(t, x)/∂t + max_{u_i ∈ U_i} ( h_i(x, u^{NE}_{−i}(u_i)) + ∂V_N^i(t, x)/∂x · g(x, u^{NE}_{−i}(u_i)) )

at t ∈ [τ_{N−1}; ∞), i = 1, …, n, with the boundary condition

lim_{t→∞} V_N^i(t, x) = 0.

Consider now the subgames Γ_T(t, x(t)) at t ∈ [τ_{j−1}, τ_j], j = 1, …, N − 1. For j = 1, …, N − 1 we have:

f̄_j(t)/(1 − F̄_j(t)) · V_j^i(t, x) = ∂V_j^i(t, x)/∂t + max_{u_i ∈ U_i} ( h_i(x, u^{NE}_{−i}(u_i)) + ∂V_j^i(t, x)/∂x · g(x, u^{NE}_{−i}(u_i)) ), t ∈ [τ_{j−1}, τ_j],   (5)

with the boundary conditions

V_j^i(τ_j, x(τ_j)) = V_{j+1}^i(τ_j, x(τ_j)), i = 1, …, n.

4. Cooperative Case

Consider now the cooperative case of the game Γ̄_T(x_0, t_0). The optimal cooperative strategies ū(t) = (ū_1(t), …, ū_n(t)) are defined as follows:

ū = arg max_{u_1, …, u_n} Σ_{i=1}^{n} K_i(t_0, x_0, u).

Let V̄_j(t, x(t)) be the value of the Bellman function at t ∈ [τ_{j−1}; τ_j]:

V̄_j(t, x(t)) = max_{u} Σ_{i=1}^{n} K_i(t, x(t), u), t ∈ [τ_{j−1}; τ_j].   (6)

Over the last interval [τ_{N−1}; ∞), V̄_N(t, x) satisfies the equation:

f̄_N(t)/(1 − F̄_N(t)) · V̄_N(t, x) = ∂V̄_N(t, x)/∂t + max_{u} ( Σ_{i=1}^{n} h_i(x, u) + ∂V̄_N(t, x)/∂x · g(x, u) )

with the boundary condition lim_{t→∞} V̄_N(t, x) = 0.

For all previous intervals, for j = 1, …, N − 1 we have:

f̄_j(t)/(1 − F̄_j(t)) · V̄_j(t, x) = ∂V̄_j(t, x)/∂t + max_{u} ( Σ_{i=1}^{n} h_i(x, u) + ∂V̄_j(t, x)/∂x · g(x, u) ), t ∈ [τ_{j−1}; τ_j],   (7)

V̄_j(τ_j, x(τ_j)) = V̄_{j+1}(τ_j, x(τ_j)).
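In the public good model studied in the following sections, the Bellman functions turn out to be linear in the state, V(t, x) = a(t)x + b(t), and the backward recursion (6)-(7) reduces to scalar ODEs glued at the switching points. A minimal numerical sketch of this interval-by-interval solve, with assumed example values Σ q_i = 30 and r̄ = Σ 1/r_i = 1.5 and the two-mode distribution of the example in Section 9:

```python
import math

# Sketch of the backward, interval-by-interval solve implied by (6)-(7) when
# the Bellman function is linear in the state, V(t, x) = a(t) x + b(t).
# Assumed data: q_total = sum of q_i, r_bar = sum of 1/r_i, and a two-mode
# distribution (exponential hazard lam, then uniform-type hazard 1/(tau2 - t)).
lam, tau1, tau2 = 0.1, 20.0, 60.0
q_total, r_bar = 30.0, 1.5

def solve_interval(t_end, t_start, a_end, b_end, rho, n=40000):
    """Integrate a' = rho(t) a - q_total, b' = rho(t) b - (r_bar/4) a^2
    backwards from (a_end, b_end) at t_end down to t_start (explicit Euler)."""
    dt = (t_end - t_start) / n
    a, b, t = a_end, b_end, t_end
    for _ in range(n):
        da = rho(t) * a - q_total
        db = rho(t) * b - (r_bar / 4.0) * a * a
        a -= da * dt                  # stepping backwards in time
        b -= db * dt
        t -= dt
    return a, b

# Last interval: terminal condition a(tau2) = b(tau2) = 0.
a1, b1 = solve_interval(tau2 - 1e-8, tau1, 0.0, 0.0, lambda t: 1.0 / (tau2 - t))
# First interval: boundary values glued from the last interval at tau1.
a0, b0 = solve_interval(tau1, 0.0, a1, b1, lambda t: lam)

# Closed forms derived in the later sections:
# a(tau1) = (sum q_i)(tau2 - tau1)/2 = 600 and b(tau1) = Q (tau2 - tau1)^3/16
# with Q = r_bar (sum q_i)^2 / 4 = 337.5, i.e. b(tau1) = 1,350,000.
assert abs(a1 - 600.0) < 1.0
assert abs(b1 - 1350000.0) < 5000.0
```

The same gluing pattern applies for any number of switches: each interval is solved backwards, and its left endpoint values become the terminal data for the preceding interval.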

5. Public Good Differential Game

Consider the case of the provision of a public good, in which a group of n agents carries out a project by making continuous contributions of some inputs or investments to build up a productive stock of a public good. Let x(t) be the level of the productive stock at the moment t, and let u_i(t) be the contribution to the public capital (investment) by agent i at the moment t.

The stock accumulation dynamics is:

ẋ(t) = Σ_{i=1}^{n} u_i(t),   (8)

x(t_0) = x_0, x ∈ R. The instantaneous payoff of player i is:

h_i(t, x, u_i) = q_i x(t) − r_i u_i²(t).   (9)

The expected integral payoff of player i is:

K_i(x_0, t_0, u) = ∫_{t_0}^{∞} ( ∫_{t_0}^{t} ( q_i x(τ) − r_i u_i²(τ) ) dτ ) dF(t)   (10)

with

F(t) = 0, t < t_0,
F(t) = F̄_1(t), t ∈ [t_0; τ_1),
F(t) = F̄_2(t), t ∈ [τ_1; τ_2),
F(t) = 1, t ≥ τ_2.

The game starts at the moment t_0 and ends at a random moment before τ_2. Here τ_1 is the single switch point.

6. Feedback Nash Equilibrium

Consider the game Γ_T(t, x(t)) when t ∈ [τ_1; τ_2]. We will look for the Bellman function in the following form:

V_2^i(t, x) = a_i(t) x + b_i(t).

The Hamilton-Jacobi-Bellman equation has the form:

f̄_2(t)/(1 − F̄_2(t)) · V_2^i(t, x) − ∂V_2^i(t, x)/∂t = max_{u_i} { h_i(t, x, u^{NE}_{−i}(u_i)) + ∂V_2^i(t, x)/∂x · g(t, x, u^{NE}_{−i}(u_i)) },

with the boundary condition

V_2^i(τ_2, x) = 0.

The equilibrium control of player i has the form:

u_i^{NE} = a_i(t)/(2 r_i).

After substituting it into the system of Hamilton-Jacobi-Bellman equations, we obtain a system of differential equations for a_i(t) and b_i(t):

ȧ_i(t) = f̄_2(t)/(1 − F̄_2(t)) · a_i(t) − q_i,   (11)

ḃ_i(t) = f̄_2(t)/(1 − F̄_2(t)) · b_i(t) − ( a_i²(t)/(4 r_i) + a_i(t) Σ_{j≠i} a_j(t)/(2 r_j) ).   (12)

Then

a_i(t) = −q_i ∫ (1 − F̄_2(t)) dt / (1 − F̄_2(t)).

Let

G_2(t) = −∫ (1 − F̄_2(t)) dt / (1 − F̄_2(t)), with G_2(τ_2) = 0.   (13)

Then

a_i(t) = q_i G_2(t),   (14)

b_i(t) = −Q_i ∫ G_2²(t)(1 − F̄_2(t)) dt / (1 − F̄_2(t)),   (15)

where Q_i = q_i²/(4 r_i) + (q_i/2) Σ_{j≠i} q_j/r_j. Let

H_2(t) = −∫ G_2²(t)(1 − F̄_2(t)) dt / (1 − F̄_2(t)), with H_2(τ_2) = 0.

Then

b_i(t) = Q_i H_2(t).   (16)

The Bellman function over the second interval is

V_2^i(t, x) = q_i G_2(t) x + Q_i H_2(t).

Consider now the game Γ_T(t, x(t)) over the first interval, when t ∈ [t_0, τ_1]. We look for the Bellman function in the form

V_1^i(t, x) = ā_i(t) x + b̄_i(t).   (17)

The Hamilton-Jacobi-Bellman equation has the form:

f̄_1(t)/(1 − F̄_1(t)) · V_1^i(t, x) − ∂V_1^i(t, x)/∂t = max_{u_i} { h_i(t, x, u^{NE}_{−i}(u_i)) + ∂V_1^i(t, x)/∂x · g(t, x, u^{NE}_{−i}(u_i)) },

with the boundary condition

V_1^i(τ_1, x) = V_2^i(τ_1, x).

The equilibrium control of player i is as follows:

u_i^{NE} = ā_i(t)/(2 r_i).

After substituting it into the system of Hamilton-Jacobi-Bellman equations, we obtain a system of differential equations for ā_i(t) and b̄_i(t):

dā_i(t)/dt = f̄_1(t)/(1 − F̄_1(t)) · ā_i(t) − q_i,   (18)

db̄_i(t)/dt = f̄_1(t)/(1 − F̄_1(t)) · b̄_i(t) − ( ā_i²(t)/(4 r_i) + ā_i(t) Σ_{j≠i} ā_j(t)/(2 r_j) ).   (19)

Then

ā_i(t) = q_i G_1(t), where G_1(t) = −∫ (1 − F̄_1(t)) dt / (1 − F̄_1(t)), with G_1(τ_1) = G_2(τ_1),

and

b̄_i(t) = Q_i H_1(t), where H_1(t) = −∫ G_1²(t)(1 − F̄_1(t)) dt / (1 − F̄_1(t)), with H_1(τ_1) = H_2(τ_1).

Finally, we have the Bellman function on the first interval:

V_1^i(t, x) = q_i G_1(t) x + Q_i H_1(t).

7. Cooperative Case

—T

Consider r (t, x(t)) - the cooperative variant of the game.

Assume that players cooperate in order to achieve the maximum total payoff:

n

VKi(i,u) ^ max

' U = («1,...,«„)

i=i

—T

Consider the subgame r (t, x(t)) over the second interval when t e [ti; t2]. We will look for the Bellman function in the form

V2(t, x) = a(t)x + b(t).

The Hamilton-Jacobi-Bellman equation has the form:

f̄_2(t)/(1 − F̄_2(t)) · V̄_2(t, x) − ∂V̄_2(t, x)/∂t = max_{u} { Σ_{i=1}^{n} h_i(t, x, u) + ∂V̄_2(t, x)/∂x · g(t, x, u) },

with the boundary condition

V̄_2(τ_2, x) = 0.

The optimal strategy of player i has the form:

ū_i = a(t)/(2 r_i).

We have a system of differential equations:

ȧ(t) = f̄_2(t)/(1 − F̄_2(t)) · a(t) − Σ_{i=1}^{n} q_i,   (20)

ḃ(t) = f̄_2(t)/(1 − F̄_2(t)) · b(t) − (r̄/4) a²(t),   (21)

here r̄ = Σ_{i=1}^{n} 1/r_i. Then

a(t) = −Σ_{i=1}^{n} q_i · ∫ (1 − F̄_2(t)) dt / (1 − F̄_2(t)).   (22)

Using the previously introduced function G_2(t), we obtain

a(t) = G_2(t) Σ_{i=1}^{n} q_i.   (23)

Similarly,

b(t) = −r̄ (Σ_{i=1}^{n} q_i)²/4 · ∫ G_2²(t)(1 − F̄_2(t)) dt / (1 − F̄_2(t)).   (24)

With the use of H_2(t), we have

b(t) = r̄ (Σ_{i=1}^{n} q_i)²/4 · H_2(t).   (25)

Let Q = r̄ (Σ_{i=1}^{n} q_i)²/4; then the Bellman function has the form

V̄_2(t, x) = Σ_{i=1}^{n} q_i G_2(t) x + Q H_2(t).

Consider the game Γ̄_T(t, x(t)) over the first interval, when t ∈ [t_0; τ_1]. We look for the Bellman function in the form V̄_1(t, x) = ā(t) x + b̄(t).

The Hamilton-Jacobi-Bellman equation has the form:

f̄_1(t)/(1 − F̄_1(t)) · V̄_1(t, x) − ∂V̄_1(t, x)/∂t = max_{u} { Σ_{i=1}^{n} h_i(t, x, u) + ∂V̄_1(t, x)/∂x · g(t, x, u) },

with the boundary condition

V̄_1(τ_1, x) = V̄_2(τ_1, x).

Using the previously introduced functions G_1(t), H_1(t), we obtain the solution in this case:

ā(t) = Σ_{i=1}^{n} q_i G_1(t),

b̄(t) = Q H_1(t),

V̄_1(t, x) = Σ_{i=1}^{n} q_i G_1(t) x + Q H_1(t).

8. Comparing of Results

Now we compare the obtained solutions for the cooperative and non-cooperative cases and estimate the possible losses of the players if they refuse to cooperate. To do this, compare the total payoffs of the players in the two cases: Σ_{i=1}^{n} K_i(t, x(t), ū) and Σ_{i=1}^{n} K_i(t, x(t), u^{NE}) in the subgame starting at the moment t from the point x(t) on the cooperative trajectory.

For t ∈ [t_0; τ_1]:

Σ_{i=1}^{n} K_i(t, x(t), u^{NE}) = Σ_{i=1}^{n} q_i G_1(t) x(t) + Σ_{i=1}^{n} Q_i H_1(t),

Σ_{i=1}^{n} K_i(t, x(t), ū) = Σ_{i=1}^{n} q_i G_1(t) x(t) + Q H_1(t).

For t ∈ [τ_1; τ_2]:

Σ_{i=1}^{n} K_i(t, x(t), u^{NE}) = Σ_{i=1}^{n} q_i G_2(t) x(t) + Σ_{i=1}^{n} Q_i H_2(t),

Σ_{i=1}^{n} K_i(t, x(t), ū) = Σ_{i=1}^{n} q_i G_2(t) x(t) + Q H_2(t).

It can be seen that the first terms in these expressions coincide for any t. We can therefore consider the ratio of the second terms:

Σ_{i=1}^{n} Q_i H_2(t) / (Q H_2(t)) = Σ_{i=1}^{n} Q_i H_1(t) / (Q H_1(t)) = Σ_{i=1}^{n} Q_i / Q.

Thus, by refusing to cooperate, the players lose in total the fraction

(Q − Σ_{i=1}^{n} Q_i) / Q   (26)

of the second term of the cooperative payoff. Here Q = r̄ (Σ_{i=1}^{n} q_i)²/4 and Q_i = q_i²/(4 r_i) + (q_i/2) Σ_{j≠i} q_j/r_j.
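For instance, with the parameter values used later in the example (q_1 = 10, q_2 = 20, r_1 = 1, r_2 = 2), the loss (26) can be evaluated directly; a short sketch:

```python
# Evaluating the loss from non-cooperation (26) with the parameter values of
# the example (q1 = 10, q2 = 20, r1 = 1, r2 = 2).
q = [10.0, 20.0]
r = [1.0, 2.0]
n = len(q)

r_bar = sum(1.0 / ri for ri in r)
Q = r_bar * sum(q) ** 2 / 4.0                      # cooperative coefficient Q
Qi = [q[i] ** 2 / (4 * r[i])
      + (q[i] / 2.0) * sum(q[j] / r[j] for j in range(n) if j != i)
      for i in range(n)]                           # individual coefficients Q_i

loss = (Q - sum(Qi)) / Q
print(Q, sum(Qi), loss)                            # 337.5 225.0 0.333...
```

So for these parameters the players forfeit one third of the state-independent part of the cooperative payoff by playing the Nash equilibrium.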

9. Example

Consider an example with a distribution function of the random terminal time of the following form:

F(t) = 0, t ∈ (−∞; 0),
F(t) = 1 − e^{−λt}, t ∈ [0; τ_1),
F(t) = 1 − e^{−λτ_1}(τ_2 − t)/(τ_2 − τ_1), t ∈ [τ_1; τ_2),
F(t) = 1, t ∈ [τ_2; ∞).   (27)

Here the first mode is exponential with parameter λ, and the second mode is of uniform type on [τ_1; τ_2].

An example of such a composite distribution function is shown in Fig. 1.

Fig. 1. Composite distribution function F(t) (λ = 0.1, τ_1 = 20, τ_2 = 60).

To find the Nash equilibrium we need to solve the system:

f̄_2(t)/(1 − F̄_2(t)) · V_2^i(t, x) − ∂V_2^i(t, x)/∂t = max_{u_i} { h_i(t, x, u^{NE}_{−i}(u_i)) + ∂V_2^i(t, x)/∂x · g(t, x, u^{NE}_{−i}(u_i)) },

f̄_1(t)/(1 − F̄_1(t)) · V_1^i(t, x) − ∂V_1^i(t, x)/∂t = max_{u_i} { h_i(t, x, u^{NE}_{−i}(u_i)) + ∂V_1^i(t, x)/∂x · g(t, x, u^{NE}_{−i}(u_i)) },

with the boundary conditions

V_2^i(τ_2, x) = 0,

V_1^i(τ_1, x) = V_2^i(τ_1, x).

Taking into account the form of the distribution function and the optimal controls, the problem is reduced to the solution of the following system:

1/(τ_2 − t) · (a_i(t) x + b_i(t)) − ȧ_i(t) x − ḃ_i(t) = q_i x − r_i (a_i(t)/(2 r_i))² + a_i(t) ( a_i(t)/(2 r_i) + Σ_{j≠i} a_j(t)/(2 r_j) ),

a_i(τ_2) = 0, b_i(τ_2) = 0,

λ (ā_i(t) x + b̄_i(t)) − (dā_i(t)/dt) x − db̄_i(t)/dt = q_i x − r_i (ā_i(t)/(2 r_i))² + ā_i(t) ( ā_i(t)/(2 r_i) + Σ_{j≠i} ā_j(t)/(2 r_j) ),

ā_i(τ_1) = a_i(τ_1), b̄_i(τ_1) = b_i(τ_1).

The solutions to these equations are

a_i(t) = q_i (τ_2 − t)/2,

b_i(t) = Q_i (τ_2 − t)³/16,

ā_i(t) = q_i/(2λ) · ( (λ(τ_2 − τ_1) − 2) e^{λ(t − τ_1)} + 2 ),

b̄_i(t) = e^{λ(t − τ_1)} Q_i/λ³ · [ (λ(τ_2 − τ_1))³/16 + (λ(τ_2 − τ_1) − 2)²(1 − e^{λ(t − τ_1)})/4 − λ(λ(τ_2 − τ_1) − 2)(t − τ_1) + (e^{−λ(t − τ_1)} − 1) ].

Then the Bellman functions take the form

V_2^i(t, x) = q_i (τ_2 − t)/2 · x(t) + Q_i (τ_2 − t)³/16,

V_1^i(t, x) = q_i/(2λ) · ( (λ(τ_2 − τ_1) − 2) e^{λ(t − τ_1)} + 2 ) x(t) + e^{λ(t − τ_1)} Q_i/λ³ · [ (λ(τ_2 − τ_1))³/16 + (λ(τ_2 − τ_1) − 2)²(1 − e^{λ(t − τ_1)})/4 − λ(λ(τ_2 − τ_1) − 2)(t − τ_1) + (e^{−λ(t − τ_1)} − 1) ],

where Q_i = q_i²/(4 r_i) + (q_i/2) Σ_{j≠i} q_j/r_j. The optimal trajectory and controls are of the form:

x^{NE}(t) = x_0 + [ (λ(τ_2 − τ_1) − 2)(e^{λ(t − τ_1)} − e^{−λτ_1})/λ + 2t ] · 1/(4λ) · Σ_{j=1}^{n} q_j/r_j, t ∈ [0, τ_1),
x^{NE}(t) = x_1 + [ (τ_2 − τ_1)² − (τ_2 − t)² ]/8 · Σ_{j=1}^{n} q_j/r_j, t ∈ [τ_1, τ_2],   (28)

here x_1 = x_0 + [ (λ(τ_2 − τ_1) − 2)(1 − e^{−λτ_1})/λ + 2τ_1 ] · 1/(4λ) · Σ_{j=1}^{n} q_j/r_j,

u_i^{NE}(t) = q_i/(4λ r_i) · ( (λ(τ_2 − τ_1) − 2) e^{λ(t − τ_1)} + 2 ), t ∈ [0, τ_1),
u_i^{NE}(t) = q_i (τ_2 − t)/(4 r_i), t ∈ [τ_1, τ_2].   (29)

10. Cooperative Solution

For the cooperative case we have

V̄_2(t, x) = Σ_{i=1}^{n} q_i (τ_2 − t)/2 · x(t) + Q (τ_2 − t)³/16,

V̄_1(t, x) = Σ_{i=1}^{n} q_i/(2λ) · ( (λ(τ_2 − τ_1) − 2) e^{λ(t − τ_1)} + 2 ) x(t) + e^{λ(t − τ_1)} Q/λ³ · [ (λ(τ_2 − τ_1))³/16 + (λ(τ_2 − τ_1) − 2)²(1 − e^{λ(t − τ_1)})/4 − λ(λ(τ_2 − τ_1) − 2)(t − τ_1) + (e^{−λ(t − τ_1)} − 1) ],

here Q = r̄ (Σ_{i=1}^{n} q_i)²/4. The optimal trajectory and controls are of the form:

x̄(t) = x_0 + [ (λ(τ_2 − τ_1) − 2)(e^{λ(t − τ_1)} − e^{−λτ_1})/λ + 2t ] · r̄/(4λ) · Σ_{j=1}^{n} q_j, t ∈ [0, τ_1),
x̄(t) = x̄_1 + [ (τ_2 − τ_1)² − (τ_2 − t)² ]/8 · r̄ Σ_{j=1}^{n} q_j, t ∈ [τ_1, τ_2],   (30)

here x̄_1 = x_0 + [ (λ(τ_2 − τ_1) − 2)(1 − e^{−λτ_1})/λ + 2τ_1 ] · r̄/(4λ) · Σ_{j=1}^{n} q_j,

ū_i(t) = Σ_{j=1}^{n} q_j/(4λ r_i) · ( (λ(τ_2 − τ_1) − 2) e^{λ(t − τ_1)} + 2 ), t ∈ [0, τ_1),
ū_i(t) = Σ_{j=1}^{n} q_j (τ_2 − t)/(4 r_i), t ∈ [τ_1, τ_2].   (31)
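A simple simulation can contrast the equilibrium and cooperative stock paths (28) and (30). The sketch below uses the parameter values of the figures (λ = 0.1, τ_1 = 20, τ_2 = 60, q = (10, 20), r = (1, 2)) and an assumed initial stock x_0 = 0; since the cooperative investments dominate the equilibrium ones pointwise, the cooperative stock dominates the Nash one at every moment:

```python
import math

# Comparing the Nash and cooperative stock paths by forward Euler integration
# of x' = sum_i u_i(t).  Assumed data: lam = 0.1, tau1 = 20, tau2 = 60,
# q = (10, 20), r = (1, 2), and x0 = 0 (the initial stock is an assumption).
lam, tau1, tau2 = 0.1, 20.0, 60.0
q, r, x0 = [10.0, 20.0], [1.0, 2.0], 0.0
k = lam * (tau2 - tau1)

def phi(t):
    """Common time profile of the linear value-function coefficients."""
    if t < tau1:
        return ((k - 2) * math.exp(lam * (t - tau1)) + 2) / (2 * lam)
    return (tau2 - t) / 2.0

def u_ne(i, t):      # equilibrium control: u_i^{NE} = q_i phi(t) / (2 r_i)
    return q[i] * phi(t) / (2 * r[i])

def u_coop(i, t):    # cooperative control: (sum_j q_j) phi(t) / (2 r_i)
    return sum(q) * phi(t) / (2 * r[i])

dt, x_ne, x_coop = 0.01, x0, x0
for step in range(int(tau2 / dt)):
    t = step * dt
    x_ne += sum(u_ne(i, t) for i in range(len(q))) * dt
    x_coop += sum(u_coop(i, t) for i in range(len(q))) * dt
    assert x_coop >= x_ne        # cooperation builds the stock faster
```

For these two players the cooperative investment rate is exactly 2.25 times the equilibrium rate at every moment, so the same factor relates the accumulated stocks.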

The optimal strategies of player 1 are shown in Fig. 2. The optimal trajectories are shown in Fig. 3.


11. Conclusion

Fig. 2. Optimal strategies for player 1 (λ = 0.1, τ_1 = 20, τ_2 = 60, q_1 = 10, q_2 = 20, r_1 = 1, r_2 = 2).

A method for the construction of optimal feedback cooperative and non-cooperative strategies in differential games with random duration and a composite distribution function of the terminal time is proposed. Due to the switching of the modes of the distribution function, the payoffs of the players in such games take the form of sums of integrals over different, but adjoint, time intervals. A method for solving such problems using dynamic programming is explored. It is shown that finding the optimal control in such problems reduces to the sequential consideration of the intervals, starting from the last switch, with the Hamilton-Jacobi-Bellman equation written on each interval and the boundary conditions obtained from the solution on the interval considered earlier. As an example, the public good differential game of n persons with one switch was given.

References

Balas, T.N. (2022). One hybrid optimal control problem with multiple switches. Control Process. Stab., 9, 379-386.

Balas, T., Tur, A. (2023). The Hamilton-Jacobi-Bellman Equation for Differential Games with Composite Distribution of Random Time Horizon. Mathematics, 11(2), 462.

Bellman, R. (1957). Dynamic Programming. Princeton University Press: Princeton, NJ, USA.

Dockner, E. J., Jorgensen, S., van Long, N., Sorger, G. (2000). Differential Games in Economics and Management Science. Cambridge University Press: Cambridge, UK.

Gromov, D., Gromova, E. (2014). Differential games with random duration: A hybrid systems formulation. Contrib. Game Theory Manag., 7, 104-119.

Gromov, D., Gromova, E. (2017). On a Class of Hybrid Differential Games. Dyn. Games Appl., 7, 266-288.

Kostyunin, S., Shevkoplyas, E. (2011). On simplification of integral payoff in the differential games with random duration. Vestn. St. Petersburg Univ. Math., 4, 47-56.

Petrosyan, L. A., Shevkoplyas, E. V. (2003). Cooperative Solution for Games with Random Duration. Game Theory Appl., 9, 125-139.

Shevkoplyas, E. V. (2009). The Hamilton-Jacobi-Bellman equation for a class of differential games with random duration. Math. Game Theory Appl., 1, 98-118.

Shevkoplyas, E., Kostyunin, S. (2011). Modeling of Environmental Projects under Condition of a Random Time Horizon. Contrib. Game Theory Manag., 4, 447-459.

Yaari, M. E. (1965). Uncertain Lifetime, Life Insurance, and the Theory of the Consumer. Rev. Econ. Stud., 32, 137-150.
