
Contributions to Game Theory and Management, XIII, 415-426

Feedback and Open-Loop Nash Equilibria in a Class of Differential Games with Random Duration *

Anna V. Tur and Natalya G. Magnitskaya

St. Petersburg State University,
7/9 Universitetskaya nab., Saint Petersburg 199034, Russia
E-mail: a.tur@spbu.ru
E-mail: magnitsnatalya@gmail.com

Abstract. One class of differential games with random duration is considered. It is assumed that the duration of the game is a random variable with values from a given finite interval, and the game can be interrupted only on this interval. Methods of constructing feedback and open-loop Nash equilibria for such games are proposed.

Keywords: differential game, Nash equilibrium, random variable, open-loop strategies, feedback strategies

1. Introduction

Differential game theory is commonly used to describe realistic conflict-controlled processes with many participants. When modeling economic or environmental processes, many researchers turn to differential games with finite or infinite duration. However, the recently popular direction of studying games with random duration makes it possible to model processes closer to real ones, in which the terminal time of the game is not known in advance but is a realization of some random variable (Petrosyan and Murzov, 1966; Petrosyan and Shevkoplyas, 2000; Shevkoplyas, 2014). The class of differential games with random duration was first introduced in (Petrosyan and Murzov, 1966) for a particular case of a zero-sum pursuit game. Later, the general formulation of differential games with random duration was given in (Petrosyan and Shevkoplyas, 2000).

The aim of this paper is to investigate the case when the game can end not over its whole duration, but only on a certain given interval. Players know that the game will not be interrupted before a certain moment; after this moment, the game may abruptly end. Treating the problem in this way leads to payoffs represented as sums of integrals over different but adjoining time intervals. The paper provides ways to construct open-loop and feedback Nash equilibria in this class of games.

The paper is structured as follows. In section 2 the problem formulation is given. A method of constructing a feedback Nash equilibrium is considered in section 3 and applied to an illustrative example in section 3.1. The construction of a Nash equilibrium in open-loop strategies is investigated in section 4. In section 4.1 open-loop Nash equilibrium strategies are found for the example from section 3.1.

* The reported study was funded by RFBR according to the research project No. 18-00-00727 (18-00-00725).

2. Problem Formulation

Consider a differential n-player game $\Gamma(x_0, T-t_0)$ defined on the interval $[t_0, T]$ with the system dynamics described by the differential equations:
$$\dot x(t) = g(t, x, u), \qquad x(t_0) = x_0, \tag{1}$$
where $x \in \mathbb{R}^l$, $u = (u_1, \ldots, u_n)$, $u_i = u_i(t) \in U_i \subset \mathrm{comp}\,\mathbb{R}^k$.

The game $\Gamma(x_0, T-t_0)$ starts from the initial state $x_0$ at the time instant $t_0$, but it can be interrupted at a random time instant distributed according to some predetermined law. Let the cumulative distribution function have the form:
$$F(\tau) = \begin{cases} 0, & \tau < T-\delta, \\ \varphi(\tau), & T-\delta \le \tau < T+\delta, \\ 1, & \tau \ge T+\delta, \end{cases} \tag{2}$$
where $\varphi(\tau)$ is assumed to be an absolutely continuous nondecreasing function satisfying the conditions $\varphi(T-\delta) = 0$, $\varphi(T+\delta) = 1$. This means that the game can end only during the period $[T-\delta, T+\delta]$, where $\delta \in [t_0, T]$.

The expected payoff of player $i \in N$ in $\Gamma(x_0, T-t_0)$ is defined in the following way:
$$K_i(x_0, T-t_0; u) = \int_{t_0}^{T-\delta} h_i[s, x(s), u]\,ds + \int_{T-\delta}^{T+\delta}\int_{T-\delta}^{t} h_i[s, x(s), u]\,ds\,dF(t). \tag{3}$$

According to (Gromova and Tur, 2017), where the transformation of the double-integral functional and its reduction to a single integral is described, the expected payoff of player $i \in N$ can be rewritten as:
$$K_i(x_0, T-t_0; u) = \int_{t_0}^{T-\delta} h_i[s, x(s), u]\,ds + \int_{T-\delta}^{T+\delta} h_i[s, x(s), u](1-F(s))\,ds. \tag{4}$$

And the expected payoff of player $i$ in the subgame $\Gamma(x(t), T-t)$, starting at the instant $t$ from the state $x(t)$, is:
$$K_i(x(t), T-t; u) = \begin{cases} \displaystyle\int_{t}^{T-\delta} h_i[s,x(s),u]\,ds + \int_{T-\delta}^{T+\delta} h_i[s,x(s),u](1-F(s))\,ds, & t \in [t_0, T-\delta), \\[2mm] \displaystyle\frac{1}{1-F(t)}\int_{t}^{T+\delta} h_i[s,x(s),u](1-F(s))\,ds, & t \in [T-\delta, T+\delta). \end{cases} \tag{5}$$

We assume the existence of a probability density function $f(t) = F'(t)$.
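The reduction from (3) to (4) is easy to verify numerically. The following minimal sketch (not part of the paper; the integrand h and the parameter values are placeholder assumptions, anticipating the uniform distribution used in the example of section 3.1) compares both forms of the expected payoff:

```python
# Numerical check of the reduction of the double-integral payoff (3)
# to the single-integral form (4), for a uniform F on [T-delta, T+delta].
# The integrand h is a placeholder, not from the paper.
from scipy import integrate

T, delta, t0 = 10.0, 3.0, 0.0

def h(s):                     # placeholder for h_i[s, x(s), u]
    return 2.0 * s + 1.0

def F(t):                     # uniform CDF on [T - delta, T + delta]
    return min(max((t - T + delta) / (2 * delta), 0.0), 1.0)

f = 1.0 / (2 * delta)         # its constant density on that interval

# Form (3): deterministic part plus the expectation of the tail integral
inner = lambda t: integrate.quad(h, T - delta, t)[0]
form3 = integrate.quad(h, t0, T - delta)[0] \
      + integrate.quad(lambda t: inner(t) * f, T - delta, T + delta)[0]

# Form (4): single integral weighted by the survival function 1 - F(s)
form4 = integrate.quad(h, t0, T - delta)[0] \
      + integrate.quad(lambda s: h(s) * (1 - F(s)), T - delta, T + delta)[0]

print(form3, form4)           # the two values agree up to quadrature error
```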

3. Feedback Nash Equilibrium. Hamilton-Jacobi-Bellman Equations

One of the principles of optimality in non-cooperative differential games is the feedback Nash equilibrium. Feedback Nash equilibrium strategies depend only on
the time variable and the current value of the state, but not on memory (including the initial state $x_0$) (Başar and Olsder, 1995). We use the sufficient conditions in the form of Hamilton-Jacobi-Bellman equations in order to find a feedback Nash equilibrium. The Hamilton-Jacobi-Bellman equations for differential games with random duration were proposed in (Shevkoplyas, 2014). Here this method is adapted to the problem under consideration.

In the framework of this approach the Bellman function $V^i(t,x)$ is defined as the payoff of player $i$ under the feedback Nash equilibrium $u^{NE}(t,x)$ in the subgame of $\Gamma(x(t), T-t)$ starting at the instant $t \in [T-\delta, T+\delta]$ in the state $x(t)$. Similarly, $W^i(t,x)$ is defined as the payoff of player $i$ under the feedback Nash equilibrium $u^{NE}(t,x)$ in the subgame of $\Gamma(x(t), T-t)$ starting at the instant $t \in [t_0, T-\delta]$ in the state $x(t)$. The following theorem takes place:

Theorem 1. $u^{NE}(t,x)$ is the Nash equilibrium in feedback strategies in the differential game $\Gamma(x_0, T-t_0)$ if there exist continuously differentiable functions $V^i(t,x): [T-\delta, T+\delta] \times \mathbb{R}^l \to \mathbb{R}$, $i \in N$, and $W^i(t,x): [t_0, T-\delta] \times \mathbb{R}^l \to \mathbb{R}$, $i \in N$, satisfying the following system of partial differential equations:

$$\begin{cases} \dfrac{f(t)}{1-F(t)}\,V^i(t,x) - V^i_t(t,x) = \max_{u_i}\big\{h_i(t,x,u^{NE}_i(u_i)) + V^i_x(t,x)\,g(t,x,u^{NE}_i(u_i))\big\} \\ \qquad\qquad = h_i(t,x,u^{NE}) + V^i_x(t,x)\,g(t,x,u^{NE}), \quad i \in N, \\ V^i(T+\delta, x(T+\delta)) = 0, \quad i \in N, \end{cases} \tag{6}$$
$$\begin{cases} -W^i_t(t,x) = \max_{u_i}\big\{h_i(t,x,u^{NE}_i(u_i)) + W^i_x(t,x)\,g(t,x,u^{NE}_i(u_i))\big\} \\ \qquad\qquad = h_i(t,x,u^{NE}) + W^i_x(t,x)\,g(t,x,u^{NE}), \quad i \in N, \\ W^i(T-\delta, x(T-\delta)) = V^i(T-\delta, x(T-\delta)), \quad i \in N, \end{cases}$$
where $u^{NE}_i(u_i) = (u^{NE}_1, \ldots, u_i, \ldots, u^{NE}_n)$.

Proof. Define $I_1 = [t_0, T-\delta]$ and $I_2 = [T-\delta, T+\delta]$.

First, consider our problem on the segment $I_2$. The payoff of player $i \in N$ on $I_2$ is given by
$$K_i^{I_2}(x(t), T-t; u) = \frac{1}{1-F(t)}\int_t^{T+\delta} h_i[s,x(s),u](1-F(s))\,ds, \quad t \in [T-\delta, T+\delta). \tag{7}$$

The Bellman function $V^i(t,x)$ is defined as the payoff of player $i$ under the feedback Nash equilibrium $u^{NE}(t,x)$ in the subgame of $\Gamma(x(t), T-t)$ starting at the instant $t \in [T-\delta, T+\delta]$ in the state $x(t)$.

According to (Shevkoplyas, 2014), the HJB equations for finding a Nash equilibrium in the game with payoffs of the form (7) are as follows:
$$\begin{cases} \dfrac{f(t)}{1-F(t)}\,V^i(t,x) - V^i_t(t,x) = \max_{u_i}\big\{h_i(t,x,u^{NE}_i(u_i)) + V^i_x(t,x)\,g(t,x,u^{NE}_i(u_i))\big\} \\ \qquad\qquad = h_i(t,x,u^{NE}) + V^i_x(t,x)\,g(t,x,u^{NE}), \quad i \in N, \\ V^i(T+\delta, x(T+\delta)) = 0, \quad i \in N, \end{cases} \tag{8}$$
where $u^{NE}_i(u_i) = (u^{NE}_1, \ldots, u_i, \ldots, u^{NE}_n)$.

Consider now our problem starting at some moment $t \in I_1$. The payoff of player $i \in N$ is:
$$K_i^{I_1}(x(t), T-t; u) = \int_t^{T-\delta} h_i[s,x(s),u]\,ds + V^i(T-\delta, x(T-\delta)), \tag{9}$$
where $V^i(T-\delta, x(T-\delta))$ is the payoff of player $i$ in the Nash equilibrium for the period $I_2$. The value $V^i(T-\delta, x(T-\delta))$ is considered as a terminal payoff of player $i$ for the problem on $I_1$. The corresponding HJB equations are:

$$\begin{cases} -W^i_t(t,x) = \max_{u_i}\big\{h_i(t,x,u^{NE}_i(u_i)) + W^i_x(t,x)\,g(t,x,u^{NE}_i(u_i))\big\} \\ \qquad\qquad = h_i(t,x,u^{NE}) + W^i_x(t,x)\,g(t,x,u^{NE}), \quad i \in N, \\ W^i(T-\delta, x(T-\delta)) = V^i(T-\delta, x(T-\delta)), \quad i \in N, \end{cases} \tag{10}$$
where $u^{NE}_i(u_i) = (u^{NE}_1, \ldots, u_i, \ldots, u^{NE}_n)$.

3.1. Differential Game of Investment

Consider an illustrative example. Assume that there are $n$ individuals who invest in a public stock of knowledge. Let $x(t)$ denote the stock of knowledge at time $t$ and $u_i(t,x)$ the investment of agent $i$ in public knowledge at time $t$. The dynamics of the stock is:
$$\dot x(t) = u_1(t) + u_2(t) + \ldots + u_n(t), \qquad x(0) = x_0. \tag{11}$$

If each agent derives linear utility from the consumption of the stock of knowledge and bears quadratic investment costs, the expected payoff of player $i \in N$ is:
$$K_i(x_0, T; u) = E\left[\int_0^{T}\big(q_i x(t) - r_i u_i^2(t)\big)\,dt\right], \tag{12}$$
where the expectation is taken with respect to the random terminal instant.

Assume that the random terminal instant is distributed uniformly on $[T-\delta, T+\delta]$. The cumulative distribution function has the form:
$$F(t) = \begin{cases} 0, & t < T-\delta, \\[1mm] \dfrac{t-T+\delta}{2\delta}, & T-\delta \le t < T+\delta, \\[1mm] 1, & t \ge T+\delta. \end{cases} \tag{13}$$
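For later use in the HJB equations, note the hazard rate implied by (13) (a one-line computation added here for clarity):
$$\frac{f(t)}{1-F(t)} = \frac{1/(2\delta)}{(T+\delta-t)/(2\delta)} = \frac{1}{T+\delta-t}, \qquad t \in [T-\delta, T+\delta);$$
this is the coefficient that appears on the left-hand side of (14) below.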

3.2. Feedback Nash Equilibrium

To find the feedback Nash equilibrium in the subgame starting at the time instant $T-\delta$ from the state $x(T-\delta)$, consider the first part of the HJB equations (6):
$$\begin{cases} \dfrac{1}{T+\delta-t}\,V^i(t,x) - V^i_t(t,x) = \max_{u_i}\Big(q_i x - r_i u_i^2 + V^i_x(t,x)\big(u_i + \textstyle\sum_{j\neq i} u_j^{NE}\big)\Big), \\ V^i(T+\delta, x(T+\delta)) = 0, \quad i \in N. \end{cases} \tag{14}$$

The Bellman function is sought in the form $V^i(t,x) = a_i(t)x + b_i(t)$. The maximization problem in (14) yields the strategy of player $i$:
$$u_i^{NE}(t,x) = \frac{V^i_x(t,x)}{2r_i} = \frac{a_i(t)}{2r_i}.$$

Substituting it into (14), we obtain the following system of differential equations for $a_i(t)$:
$$\dot a_i(t) = \frac{1}{T+\delta-t}\,a_i(t) - q_i, \quad i \in N, \qquad a_i(T+\delta) = 0. \tag{15}$$

Then
$$a_i(t) = \frac{q_i}{2}(T+\delta-t).$$
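Indeed, with $a_i(t) = \frac{q_i}{2}(T+\delta-t)$ we have (a short verification added for clarity):
$$\dot a_i(t) = -\frac{q_i}{2}, \qquad \frac{a_i(t)}{T+\delta-t} - q_i = \frac{q_i}{2} - q_i = -\frac{q_i}{2},$$
so the differential equation (15) and the terminal condition $a_i(T+\delta) = 0$ hold.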


For $b_i(t)$ we have:
$$\dot b_i(t) = \frac{b_i(t)}{T+\delta-t} - \frac{a_i^2(t)}{4r_i} - \sum_{j\neq i}\frac{a_i(t)\,a_j(t)}{2r_j}, \quad i \in N, \qquad b_i(T+\delta) = 0. \tag{16}$$
So we get
$$b_i(t) = \frac{m_i}{4}(T+\delta-t)^3, \quad i \in N, \tag{17}$$
where $m_i = \dfrac{q_i^2}{16 r_i} + \sum_{j\neq i}\dfrac{q_i q_j}{8 r_j}$.

Then the feedback Nash equilibrium strategies on $I_2$ are:
$$u_i^{NE}(t,x) = \frac{q_i}{4r_i}(T+\delta-t), \quad i \in N, \ t \in [T-\delta, T+\delta].$$

The payoff of player $i \in N$ in the Nash equilibrium for the period $I_2$ looks as follows:
$$V^i(T-\delta, x(T-\delta)) = q_i\delta\,x(T-\delta) + 2m_i\delta^3, \quad i \in N. \tag{18}$$
Then the boundary condition for the problem on $I_1$ is:
$$W^i(T-\delta, x(T-\delta)) = q_i\delta\,x(T-\delta) + 2m_i\delta^3, \quad i \in N.$$

To find the feedback Nash equilibrium in the subgame starting at the time instant $t_0$ from $x_0$ and ending at $T-\delta$ in $x(T-\delta)$, consider the system of HJB equations:
$$\begin{cases} -W^i_t(t,x) = \max_{u_i}\Big(q_i x - r_i u_i^2 + W^i_x(t,x)\big(u_i + \textstyle\sum_{j\neq i} u_j^{NE}\big)\Big), \quad i \in N, \\ W^i(T-\delta, x(T-\delta)) = q_i\delta\,x(T-\delta) + 2m_i\delta^3, \quad i \in N. \end{cases} \tag{19}$$

The Bellman function is sought in the form $W^i(t,x) = c_i(t)x + d_i(t)$. The maximization problem in (19) yields the strategy of player $i$: $u_i^{NE}(t,x) = \frac{W^i_x(t,x)}{2r_i} = \frac{c_i(t)}{2r_i}$. Substituting it into (19), we obtain the following system of differential equations for $c_i(t)$:

$$\dot c_i(t) = -q_i, \quad i \in N, \qquad c_i(T-\delta) = q_i\delta. \tag{20}$$
Then
$$c_i(t) = q_i(T-t).$$

For $d_i(t)$ we have:
$$\dot d_i(t) = -\frac{c_i^2(t)}{4r_i} - \sum_{j\neq i}\frac{c_i(t)\,c_j(t)}{2r_j}, \quad i \in N, \qquad d_i(T-\delta) = 2m_i\delta^3. \tag{21}$$
We get
$$d_i(t) = \frac{4m_i(T-t)^3 + 2m_i\delta^3}{3}, \quad i \in N, \tag{22}$$
where $m_i = \dfrac{q_i^2}{16 r_i} + \sum_{j\neq i}\dfrac{q_i q_j}{8 r_j}$.

Then the feedback Nash equilibrium strategies in the game $\Gamma(x_0, T-t_0)$ are:
$$u_i^{NE}(t) = \begin{cases} \dfrac{q_i}{2r_i}(T-t), & i \in N, \ t \in [t_0, T-\delta], \\[2mm] \dfrac{q_i}{4r_i}(T+\delta-t), & i \in N, \ t \in [T-\delta, T+\delta]. \end{cases}$$
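For intuition, the piecewise strategies are easy to simulate. A minimal sketch (not part of the paper) that evaluates them and integrates the state equation (11) by a simple Euler scheme, with the parameter values taken from the numeric example of section 4.2:

```python
# Evaluate the piecewise feedback NE strategies and integrate the state
# dynamics (11) with a simple Euler scheme.
import numpy as np

T, delta, x0 = 10.0, 3.0, 20.0            # values from section 4.2
q = np.array([4.0, 3.0, 6.0])
r = np.array([2.0, 1.0, 5.0])

def u_NE(t):
    """q_i (T - t)/(2 r_i) on [t0, T - delta];
       q_i (T + delta - t)/(4 r_i) on [T - delta, T + delta]."""
    if t <= T - delta:
        return q * (T - t) / (2 * r)
    return q * (T + delta - t) / (4 * r)

ts = np.linspace(0.0, T + delta, 1301)    # grid on [0, T + delta]
x = np.empty_like(ts)
x[0] = x0
for k in range(1, ts.size):
    x[k] = x[k-1] + u_NE(ts[k-1]).sum() * (ts[k] - ts[k-1])

print(x[-1])   # terminal stock of knowledge at T + delta
```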

4. Open-Loop Nash Equilibrium

The second part of the paper is devoted to the construction of an open-loop Nash equilibrium for the game under consideration; open-loop strategies depend only on the time variable $t$ and the initial state.

The method we introduce here is based on Pontryagin's maximum principle (Pontryagin et al., 1963).

We will find the solution on the two intervals $I_1 = [t_0, T-\delta]$ and $I_2 = [T-\delta, T+\delta]$, treating the values at the junction instant $T-\delta$ as parameters and determining them at the end of the solution from the maximization condition.

Let us start by studying the game on the period $I_1$.

Each player $i \in N$ tries to maximize $\int_t^{T-\delta} h_i[s, x(s), u]\,ds$ subject to the dynamics (1). The problem will be solved with two fixed ends: $x(t_0) = x_0$ and $x(T-\delta) = x_1$. Introduce $x_1$ as a parameter of the solution (we will see that $x_1$ is indeed a function of $n$ parameters). The use of such a method for cooperative differential games was proposed in (Gromov and Gromova, 2017; Gromova and Magnitskaya, 2019). Here we adapt it to non-cooperative games. On the interval $I_1$ the Hamiltonian of player $i$ is:

$$H_i(x, u^{NE}, \psi_i) = \psi_i\,g(t, x, u^{NE}) + h_i(t, x, u^{NE}), \quad i \in N. \tag{23}$$
The equilibrium strategies $u^{NE}$ are found from the first-order extremality condition:
$$\frac{\partial H_i(x, u^{NE}, \psi_i)}{\partial u_i} = 0.$$
The adjoint equations are:
$$\frac{d\psi_i}{dt} = -\frac{\partial H_i(x, u^{NE}, \psi_i)}{\partial x}. \tag{24}$$

We introduce the boundary conditions $\psi_i(T-\delta) = z_i$ as parameters of the solution. Let $u^{NE}_{I_1}(s, z_1, \ldots, z_n)$ be the equilibrium strategies on $I_1$; the corresponding equilibrium trajectory $x^{NE}_{I_1}(t, z_1, \ldots, z_n)$ can be found from (1), and $x_1 = x^{NE}_{I_1}(T-\delta, z_1, \ldots, z_n)$. Now turn to the solution on the second interval $I_2$.

Player $i$ maximizes $\int_{T-\delta}^{T+\delta} h_i[s, x(s), u](1-F(s))\,ds$ subject to the dynamics (1) with the initial condition $x(T-\delta) = x_1 = x^{NE}_{I_1}(T-\delta, z_1, \ldots, z_n)$ and with a free right end. The Hamiltonian of player $i$ on $I_2$ is:
$$H_i(x, u^{NE}, \psi_i) = \psi_i\,g(t, x, u^{NE}) + (1-F(t))\,h_i(t, x, u^{NE}), \quad i \in N. \tag{25}$$

To find the equilibrium strategies $u^{NE}$ we use the necessary condition for the maximum:
$$\frac{\partial H_i(x, u^{NE}, \psi_i)}{\partial u_i} = 0.$$

The adjoint equations are:
$$\frac{d\psi_i}{dt} = -\frac{\partial H_i(x, u^{NE}, \psi_i)}{\partial x}, \tag{26}$$
with the transversality conditions
$$\psi_i(T+\delta) = 0.$$

Let $u^{NE}_{I_2}(t, z_1, \ldots, z_n)$ be the equilibrium strategies on $I_2$ and $x^{NE}_{I_2}(t, z_1, \ldots, z_n)$ the equilibrium trajectory on $I_2$.

On the last step of our solution we find the values of the parameters $z_1^*, \ldots, z_n^*$ in the following way:
$$z_i^* = \arg\max_{z_i}\bigg(\int_{t_0}^{T-\delta} h_i\big[s, x^{NE}_{I_1}(s, z^{*i}), u^{NE}_{I_1}(s, z^{*i})\big]\,ds + \int_{T-\delta}^{T+\delta} h_i\big[s, x^{NE}_{I_2}(s, z^{*i}), u^{NE}_{I_2}(s, z^{*i})\big](1-F(s))\,ds\bigg), \tag{27}$$
where $z^{*i} = (z_1^*, \ldots, z_{i-1}^*, z_i, z_{i+1}^*, \ldots, z_n^*)$. Finally, we get the equilibrium strategies:
$$u^{NE}(t) = \begin{cases} u^{NE}_{I_1}(t, z_1^*, \ldots, z_n^*), & t \in [t_0, T-\delta], \\ u^{NE}_{I_2}(t, z_1^*, \ldots, z_n^*), & t \in [T-\delta, T+\delta]. \end{cases} \tag{28}$$
The method of constructing the strategies (28) ensures that they are equilibrium strategies.

4.1. Differential Game of Investment. Open-Loop Nash Equilibrium

Consider again the example suggested in section 3 and find the solution in the class of open-loop strategies.

Interval $I_1$. Let us start by studying the game on the period $I_1$.

The payoff of player $i$ on $I_1$ is $\int_0^{T-\delta}\big(q_i x(t) - r_i u_i^2(t)\big)\,dt$ subject to the dynamics (11). The Hamiltonian of player $i$ is:
$$H_i(x, u, \psi_i) = \psi_i\Big(u_i + \sum_{j\neq i} u_j^{NE}\Big) + q_i x(t) - r_i u_i^2(t). \tag{29}$$

To find the equilibrium strategies $u_i^{NE}$ we use the necessary condition for the maximum:
$$\frac{\partial H_i}{\partial u_i} = \psi_i - 2r_i u_i(t) = 0, \qquad u_i^{NE}(t) = \frac{\psi_i}{2r_i}.$$

The Hessian matrix is negative definite, hence the Hamiltonian $H_i$ is concave w.r.t. $u_i$, $t \in [0, T-\delta]$:
$$\frac{\partial^2 H_i}{\partial u_i^2} = -2r_i < 0.$$

The adjoint equations are as follows:
$$\frac{d\psi_i}{dt} = -\frac{\partial H_i(x, u, \psi_i)}{\partial x} = -q_i. \tag{30}$$


We introduce the boundary conditions $\psi_i(0) = z_i$, $i = 1, \ldots, n$, as parameters of the solution. Hence,
$$\psi_i(t) = z_i - q_i t, \qquad u_i^{NE}(t, z)\big|_{I_1} = \frac{z_i - q_i t}{2r_i}. \tag{31}$$

The dynamics is:
$$\dot x(t) = \sum_{i=1}^n u_i(t) = \sum_{i=1}^n \frac{z_i - q_i t}{2r_i} = z - t\bar q,$$
where $z = \sum_{i=1}^n \dfrac{z_i}{2r_i}$ and $\bar q = \sum_{i=1}^n \dfrac{q_i}{2r_i}$. We use the boundary condition:
$$x(0) = x_0. \tag{32}$$
Then the optimal trajectory on the interval $I_1$ is:
$$x^{NE}_{I_1}(t, z) = zt - \frac{\bar q t^2}{2} + x_0, \tag{33}$$
so that $x^{NE}_{I_1}(T-\delta, z) = z(T-\delta) - \frac{\bar q(T-\delta)^2}{2} + x_0$.

Interval $I_2$.

Now turn to the solution on the second interval $I_2$. Player $i$ maximizes
$$\int_{T-\delta}^{T+\delta}\Big(1 - \frac{t-T+\delta}{2\delta}\Big)\big(q_i x(t) - r_i u_i^2(t)\big)\,dt$$
subject to the dynamics (11) and the initial condition $x(T-\delta) = x^{NE}_{I_1}(T-\delta, z_1, \ldots, z_n)$. The Hamiltonian of player $i$ is:
$$H_i(x, u, \psi_i) = \psi_i\Big(u_i + \sum_{j\neq i} u_j^{NE}\Big) + \Big(1 - \frac{t-T+\delta}{2\delta}\Big)\big(q_i x(t) - r_i u_i^2(t)\big). \tag{34}$$

To find the equilibrium strategies $u_i^{NE}$ we use the necessary condition for the maximum:
$$\frac{\partial H_i}{\partial u_i} = \psi_i - 2\Big(1 - \frac{t-T+\delta}{2\delta}\Big) r_i u_i(t) = 0, \qquad u_i^{NE}(t)\big|_{I_2} = \frac{\psi_i\,\delta}{(\delta - t + T)\,r_i}.$$

The Hessian matrix is negative definite, hence the Hamiltonian $H_i$ is concave w.r.t. $u_i$, $t \in [T-\delta, T+\delta)$:
$$\frac{\partial^2 H_i}{\partial u_i^2} = -2\Big(1 - \frac{t-T+\delta}{2\delta}\Big) r_i < 0.$$

The adjoint equations are:
$$\frac{d\psi_i}{dt} = -\frac{\partial H_i(x, u, \psi_i)}{\partial x} = -\Big(1 - \frac{t-T+\delta}{2\delta}\Big) q_i, \tag{35}$$
with the transversality conditions:
$$\psi_i(T+\delta) = 0, \quad i \in N. \tag{36}$$

We get the solution of (35)-(36):
$$\psi_i(t) = -\frac{q_i(\delta+T)(t-T+\delta)}{2\delta} + \frac{q_i\big(t^2 - (T-\delta)^2\big)}{4\delta} + q_i\delta = \frac{q_i(T+\delta-t)^2}{4\delta}. \tag{37}$$
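One can check (37) symbolically; a small sympy sketch (an illustration, not part of the paper):

```python
# Check that (37) solves the adjoint equation (35) with the transversality
# condition (36), and simplify it to a compact form.
import sympy as sp

t, T, d, q = sp.symbols('t T delta q_i', positive=True)

psi = (-q*(d + T)*(t - T + d)/(2*d)
       + q*(t**2 - (T - d)**2)/(4*d)
       + q*d)                                        # (37)

survival = 1 - (t - T + d)/(2*d)                     # 1 - F(t)
print(sp.simplify(sp.diff(psi, t) + survival*q))     # (35) holds: -> 0
print(sp.simplify(psi.subs(t, T + d)))               # (36) holds: -> 0
print(sp.factor(psi))     # compact form: q_i (T + delta - t)^2 / (4 delta)
```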

To get the optimal trajectory, substitute (37) into (11):
$$\dot x(t) = \frac{\delta}{T+\delta-t}\sum_{i=1}^n \frac{\psi_i}{r_i} = \sum_{i=1}^n \frac{q_i(T+\delta-t)}{4r_i}.$$

The boundary condition is the following: $x(T-\delta) = x^{NE}_{I_1}(T-\delta, z_1, \ldots, z_n) = z(T-\delta) - \frac{\bar q(T-\delta)^2}{2} + x_0$.

Then the optimal trajectory is:
$$x^{NE}_{I_2}(t) = z(T-\delta) - \frac{\bar q(T-\delta)^2}{2} + x_0 + C(t), \tag{38}$$
where $C(t)$ is an expression independent of $z_1, \ldots, z_n$.

The optimal strategies on $I_2$ have the form:
$$u_i^{NE}(t)\big|_{I_2} = \bigg(-\frac{q_i(\delta+T)(t-T+\delta)}{2\delta} + \frac{q_i\big(t^2-(T-\delta)^2\big)}{4\delta} + q_i\delta\bigg)\frac{\delta}{(\delta-t+T)\,r_i} = \frac{q_i(T+\delta-t)}{4r_i}. \tag{39}$$

Intervals $I_1$, $I_2$. We find $z_1, \ldots, z_n$ from the maximization condition of the total payoff on the interval $[0, T+\delta]$, i.e.
$$z_i^* = \arg\max_{z_i} K_i\big(0, x_0, u^{NE}(t, z^{*i})\big),$$
according to (31), (33), (38), (39). Substituting (31), (33), (38), (39) into (27), we get:

$$\int_0^{T-\delta}\Big(q_i x^{NE}_{I_1}(t) - r_i \big(u_i^{NE}(t)\big|_{I_1}\big)^2\Big)dt + \int_{T-\delta}^{T+\delta}\Big(1 - \frac{t-T+\delta}{2\delta}\Big)\Big(q_i x^{NE}_{I_2}(t) - r_i \big(u_i^{NE}(t)\big|_{I_2}\big)^2\Big)dt =$$
$$= \frac{q_i z(T-\delta)^2}{2} - \frac{z_i^2(T-\delta)}{4r_i} + \frac{q_i z_i(T-\delta)^2}{4r_i} + q_i z(T-\delta)\delta + B(T,\delta), \tag{40}$$
where $B(T,\delta)$ is an expression independent of $z_1, \ldots, z_n$.

To find $z_i^*$ we use the necessary condition for the maximum:
$$\frac{\partial K_i}{\partial z_i} = 0, \quad i = 1, \ldots, n.$$
Solving
$$\frac{q_i(T-\delta)^2}{4r_i} - \frac{2z_i(T-\delta)}{4r_i} + \frac{q_i(T-\delta)^2}{4r_i} + \frac{q_i(T-\delta)\delta}{2r_i} = 0, \quad i = 1, \ldots, n,$$
we have:
$$z_i^* = q_i T.$$
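The value $z_i^* = q_i T$ can also be recovered purely numerically, by implementing the outer maximization step (27) with the closed-form expressions (31), (33), (38), (39) and iterating best responses over the $z_i$. A minimal sketch (not from the paper; scipy's scalar optimizer stands in for the argmax, and the parameter values are those of section 4.2):

```python
# Numeric version of the outer step (27) for the investment example:
# iterated best responses over the parameters z_i, using the closed
# forms (31), (33), (38), (39). Converges to z_i* = q_i T.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

T, d, x0 = 10.0, 3.0, 20.0
q = np.array([4.0, 3.0, 6.0])
r = np.array([2.0, 1.0, 5.0])
n = 3

def K(i, z):
    zb, qb = np.sum(z/(2*r)), np.sum(q/(2*r))        # z and q-bar
    x1 = lambda t: zb*t - qb*t**2/2 + x0             # trajectory (33)
    u1 = lambda t: (z[i] - q[i]*t)/(2*r[i])          # strategy (31)
    xs = x1(T - d)                                   # state at T - delta
    # C(t): the z-independent part of (38), integrating sum_j (39)
    C = lambda t: qb/2*((T + d)*(t - T + d) - (t**2 - (T - d)**2)/2)
    x2 = lambda t: xs + C(t)
    u2 = lambda t: q[i]*(T + d - t)/(4*r[i])         # strategy (39)
    sv = lambda t: (T + d - t)/(2*d)                 # survival 1 - F(t)
    return quad(lambda t: q[i]*x1(t) - r[i]*u1(t)**2, 0, T - d)[0] \
         + quad(lambda t: sv(t)*(q[i]*x2(t) - r[i]*u2(t)**2), T - d, T + d)[0]

z = np.zeros(n)
for _ in range(5):                                   # best-response iteration
    for i in range(n):
        res = minimize_scalar(
            lambda zi: -K(i, np.concatenate([z[:i], [zi], z[i+1:]])),
            bounds=(0.0, 100.0), method='bounded')
        z[i] = res.x

print(z, q*T)   # z converges to (40, 30, 60) = q_i T
```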

Finally, we get
$$u_i^{NE}(t)\big|_{I_1} = \frac{q_i(T-t)}{2r_i}, \quad t \in [0, T-\delta], \qquad u_i^{NE}(t)\big|_{I_2} = \frac{q_i(T+\delta-t)}{4r_i}, \quad t \in [T-\delta, T+\delta].$$

It can be noted that in the game under consideration the open-loop and feedback equilibrium strategies coincide. This is also characteristic of classical differential games with a linear state structure.

Note also that $\lim_{\delta \to 0} u_i^{NE}(t) = \frac{q_i(T-t)}{2r_i}$, and these are the Nash equilibrium strategies in the differential game with prescribed duration $T - t_0$.

4.2. Numeric Example

Consider the previous example with numeric parameters. Let $n = 3$, $T = 10$, $\delta = 3$, $q_1 = 4$, $q_2 = 3$, $q_3 = 6$, $r_1 = 2$, $r_2 = 1$, $r_3 = 5$, $x_0 = 20$. Consequently:

$$x^{NE}_{I_1}(t) = -\frac{31t^2}{20} + 31t + 20, \qquad u_1^{NE}(t)\big|_{I_1} = 10 - t,$$
$$u_2^{NE}(t)\big|_{I_1} = 15 - \frac{3t}{2}, \qquad u_3^{NE}(t)\big|_{I_1} = 6 - \frac{3t}{5}, \qquad t \in [0, 7),$$
$$x^{NE}_{I_2}(t) = -\frac{31t^2}{40} + \frac{403t}{20} + 57.975, \qquad u_1^{NE}(t)\big|_{I_2} = \frac{13}{2} - \frac{t}{2},$$
$$u_2^{NE}(t)\big|_{I_2} = \frac{39}{4} - \frac{3t}{4}, \qquad u_3^{NE}(t)\big|_{I_2} = \frac{39}{10} - \frac{3t}{10}, \qquad t \in [7, 13].$$
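These closed forms can be tabulated directly; the following snippet (an illustration, not part of the paper) evaluates the equilibrium trajectory and strategies on $[0, 13]$ and checks their continuity at the switching instant $t = 7$, which is what Figures 1-4 display:

```python
# Tabulate the numeric-example equilibrium trajectory and strategies on
# [0, 13]; these are the curves shown in Figures 1-4.
import numpy as np

q = np.array([4.0, 3.0, 6.0])
r = np.array([2.0, 1.0, 5.0])
T, d = 10.0, 3.0

t = np.linspace(0.0, T + d, 261)
on_I1 = t <= T - d

x = np.where(on_I1,
             -31*t**2/20 + 31*t + 20,               # x on I_1
             -31*t**2/40 + 403*t/20 + 57.975)       # x on I_2

u = np.where(on_I1[None, :],
             q[:, None]*(T - t)/(2*r[:, None]),     # u_i on I_1
             q[:, None]*(T + d - t)/(4*r[:, None])) # u_i on I_2

k = np.argmin(np.abs(t - 7.0))                      # index of the switch t = 7
print(x[k], u[:, k])  # trajectory and strategies are continuous at t = 7
```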

Fig. 1. Nash equilibrium trajectory

Fig. 2. Equilibrium strategy for player 1

5. Conclusion

A special class of differential games with random duration is investigated. A method of constructing a feedback Nash equilibrium based on the Hamilton-Jacobi-Bellman equations is proposed, and a method of constructing an open-loop Nash equilibrium based on Pontryagin's maximum principle is studied. An illustrative example demonstrating both methods is considered, and a numerical example with graphical illustrations is given.

Fig. 3. Equilibrium strategy for player 2

Fig. 4. Equilibrium strategy for player 3

References

Başar, T., Olsder, G. J. (1995). Dynamic Non-Cooperative Game Theory. London: Academic Press.

Dockner, E. J., Jorgensen, S., Long, N. V., Sorger, G. (2000). Differential Games in Economics and Management Science. Cambridge University Press.

Gromov, D., Gromova, E. (2017). On a Class of Hybrid Differential Games. Dynamic Games and Applications, 7, 266-288. https://doi.org/10.1007/s13235-016-0185-3

Gromova, E. V., Magnitskaya, N. G. (2019). Solution of the differential game with hybrid structure. Contributions to Game Theory and Management, 12, 159-176.

Gromova, E., Tur, A. (2017). On the form of integral payoff in differential games with random duration. In: 2017 XXVI International Conference on Information, Communication and Automation Technologies (ICAT), Sarajevo, pp. 1-6. doi: 10.1109/ICAT.2017.8171597

Petrosyan, L. A., Murzov, N. V. (1966). Game-theoretic Problems in Mechanics. Lithuanian Mathematical Collection, 3, 423-433.

Petrosyan, L. A., Shevkoplyas, E. V. (2000). Cooperative differential games with random duration. Vestnik Sankt-Peterburgskogo Universiteta. Ser. 1. Matematika, Mekhanika, Astronomiya, Issue 4, pp. 18-23.

Pontryagin, L. S., Boltyanskii, V. G., Gamkrelidze, R. V., Mishchenko, E. F. (1963). The Mathematical Theory of Optimal Processes. Wiley-Interscience.

Shevkoplyas, E. V. (2014). The Hamilton-Jacobi-Bellman equation for a class of differential games with random duration. Automation and Remote Control, 75, 959-970. https://doi.org/10.1134/S0005117914050142
