RANDOM INFORMATION HORIZON FOR A CLASS OF DIFFERENTIAL GAMES WITH CONTINUOUS UPDATING

Tur Anna V.; Petrosian Ovanes L.

UDC 519.837 Vestnik SPbGU. Applied Mathematics. Computer Science... 2022. Vol. 18. Iss. 3

MSC 91A25

Random information horizon for a class of differential games with continuous updating*

A. V. Tur, O. L. Petrosian

St Petersburg State University, 7-9, Universitetskaya nab., St Petersburg, 199034, Russian Federation

For citation: Tur A. V., Petrosian O. L. Random information horizon for a class of differential games with continuous updating. Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes, 2022, vol. 18, iss. 3, pp. 337-346. https://doi.org/10.21638/11701/spbu10.2022.304

In this paper we consider a class of differential games with continuous updating and a random information horizon. It is assumed that at each time instant players have information about the game (motion equations and payoff functions) only for a time interval of length $\theta$, and that this information is updated as time evolves. We first considered this type of game in 2019. Here we additionally assume that $\theta$ is a random variable. The subject of the paper is the definition of a solution concept based on Nash equilibrium and a solution technique based on Hamilton-Jacobi-Bellman equations.

Keywords: differential games with continuous updating, Nash equilibrium, Hamilton-Jacobi-Bellman equation, random information horizon.

1. Introduction. Differential game theory is commonly used to describe realistic conflict-controlled processes with many participants in the context of a dynamical system. When the participants (players) have different goals and act individually, noncooperative differential game theory is applied. It is usual to consider games with prescribed duration (finite time horizon) or games with infinite time horizon, in which the players know all the information about the dynamics of the game and about the preferences of the players (payoff functions) from the beginning of the game.

However, when considering game-theoretic models of realistic processes, it is important to take into account the various uncertainties that may arise during the game. For example, players may receive incomplete information about the process. This type of uncertainty was investigated in [1-5]. There it is supposed that players lack certain information about the motion equations and payoff functions on the whole time interval on which the game is played. At each time instant the information about the game structure is updated: players receive information about motion equations and payoff functions. The class of noncooperative differential games with continuous updating was considered in the papers [1, 2]. The system of Hamilton-Jacobi-Bellman equations for the Nash equilibrium in a game with continuous updating is derived in [2]. In the paper [1] the class of linear-quadratic differential games with continuous updating is considered and the explicit form of the Nash equilibrium is derived. Moreover, papers [3, 4] are devoted to the study of cooperative games with continuous updating.

* The work of the first author was funded by the Russian Foundation for Basic Research and Deutsche Forschungsgemeinschaft (DFG) (project N 21-51-12007). The work of the second author was carried out under the auspices of a grant from the President of the Russian Federation for state support of young Russian scientists — candidates of science (project N MK-4674.2021.1.1). © St Petersburg State University, 2022

Another type of uncertainty in differential games is that the game may end abruptly. The class of differential games with random duration was considered in [6-8] to study this case.

This paper attempts to combine these two approaches and considers the class of differential games with continuous updating and random information horizon. It is assumed that at each time instant players have information about the game for a time interval of length $\theta$, where $\theta$ is a random variable.

The aim is to present optimality conditions in the form of Hamilton — Jacobi — Bellman equations for the solution concept similar to the feedback Nash equilibrium for a class of games with continuous updating and random information horizon. The results are illustrated with a game-theoretical model of non-renewable resource extraction.

2. Initial game model. Consider a differential $n$-player game with prescribed duration $\Gamma(x_0, T - t_0)$ defined on the interval $[t_0, T]$. The motion equation has the form

$$\dot{x}(t) = g(t, x, u), \quad x(t_0) = x_0, \qquad (1)$$

$$x \in \mathbb{R}^l, \quad u = (u_1, \ldots, u_n), \quad u_i = u_i(t, x) \in U_i \subset \mathbb{R}^k.$$

The payoff function of player $i$ is defined in the following way:

$$K_i(x_0, T - t_0; u) = \int_{t_0}^{T} h_i[t, x(t), u(t, x)]\,dt, \quad i \in N. \qquad (2)$$

In formula (2), $h_i[t, x(t), u(t, x(t))]$ and $g(t, x, u)$ are integrable functions of all their arguments, and $x(t)$ is the solution of the Cauchy problem (1) with fixed $u(t, x) = (u_1(t, x), \ldots, u_n(t, x))$. We consider the class of closed-loop strategies. The strategy profile $u(t, x) = (u_1(t, x), \ldots, u_n(t, x))$ is called admissible if problem (1) has a unique and continuable solution. Each player attempts to maximize his payoff.

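[...] corresponding differential game with continuous updating.

[...] $n$-player differential game $\Gamma(x_0, t_0, t_0 + \theta)$, defined on the interval $[t_0, t_0 + \theta]$, where $\theta$ is a random variable distributed on $[0, \overline{T}]$ with some predetermined distribution law. Let its cumulative distribution function have the form

$$F(\tau) = \begin{cases} 0 & \text{for } \tau < 0, \\ \varphi(\tau) & \text{for } 0 \le \tau \le \overline{T}, \\ 1 & \text{for } \tau > \overline{T}, \end{cases}$$

where $\varphi(\tau)$ is assumed to be an absolutely continuous non-decreasing function satisfying the conditions $\varphi(0) = 0$, $\varphi(\overline{T}) = 1$, and $0 < \overline{T} < T - t_0$. Denote by $F^t(\tau)$ the cumulative distribution function of the random variable $t + \theta$:

$$F^t(\tau) = \begin{cases} 0 & \text{for } \tau < t, \\ \varphi(\tau - t) & \text{for } t \le \tau \le t + \overline{T}, \\ 1 & \text{for } \tau > t + \overline{T}. \end{cases}$$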

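The motion equation in $\Gamma(x_0, t_0, t_0 + \theta)$ has the form

$$\dot{x}^{t_0}(s) = g(s, x^{t_0}, u^{t_0}), \quad x^{t_0}(t_0) = x_0,$$

$$x^{t_0} \in \mathbb{R}^l, \quad u^{t_0} = (u_1^{t_0}, \ldots, u_n^{t_0}), \quad u_i^{t_0} = u_i^{t_0}(s, x^{t_0}) \in U_i \subset \operatorname{comp} \mathbb{R}^k.$$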

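The expected payoff of player $i \in N$ in $\Gamma(x_0, t_0, t_0 + \theta)$ is defined in the following way:

$$K_i^{t_0}(x_0, t_0, \overline{T}; u^{t_0}) = \int_{t_0}^{t_0 + \overline{T}} \int_{t_0}^{\tau} h_i[s, x^{t_0}(s), u^{t_0}(s, x^{t_0})]\,ds\,dF^{t_0}(\tau), \qquad (3)$$

where $x^{t_0}(s)$ and $u^{t_0}(s, x^{t_0})$ are the trajectory and strategies in the game $\Gamma(x_0, t_0, t_0 + \theta)$, and $\dot{x}^{t_0}(s)$ is the derivative with respect to $s$.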

Subgame of differential game with continuous updating. Consider the $n$-player differential game $\Gamma(x, t, t + \theta)$ defined on the interval $[t, t + \theta]$, where $t \in [t_0, T]$. The motion equation for the subgame $\Gamma(x, t, t + \theta)$ has the form

$$\dot{x}^t(s) = g(s, x^t, u^t), \quad x^t(t) = x, \qquad (4)$$

$$x^t \in \mathbb{R}^l, \quad u^t = (u_1^t, \ldots, u_n^t), \quad u_i^t = u_i^t(s, x^t) \in U_i \subset \operatorname{comp} \mathbb{R}^k.$$

The expected payoff of player $i \in N$ for the subgame $\Gamma(x, t, t + \theta)$ has the form

$$K_i^t(x, t, \overline{T}; u^t) = \int_{t}^{t + \overline{T}} \int_{t}^{\tau} h_i[s, x^t(s), u^t(s, x^t(s))]\,ds\,dF^t(\tau). \qquad (5)$$

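In formula (5), $x^t(s)$ and $u^t(s, x^t)$ are the trajectory and strategies in the game $\Gamma(x, t, t + \theta)$, and $\dot{x}^t(s)$ is the derivative with respect to $s$. The expected payoff (5) of player $i \in N$ can be represented in the form

$$K_i^t(x, t, \overline{T}; u^t) = \int_{t}^{t + \overline{T}} h_i[s, x^t(s), u^t(s, x^t(s))]\,(1 - F^t(s))\,ds. \qquad (6)$$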
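The step from the double integral (5) to the single integral (6) can be checked by exchanging the order of integration. The sketch below assumes that $h_i$ is integrable along the trajectory (so that Fubini's theorem applies) and uses $F^t(t + \overline{T}) = \varphi(\overline{T}) = 1$; the shorthand $h_i(s)$ for $h_i[s, x^t(s), u^t(s, x^t(s))]$ is introduced here only for brevity.

```latex
\begin{aligned}
\int_{t}^{t+\overline{T}} \int_{t}^{\tau} h_i(s)\,ds\,dF^t(\tau)
  &= \int_{t}^{t+\overline{T}} h_i(s) \left( \int_{s}^{t+\overline{T}} dF^t(\tau) \right) ds \\
  &= \int_{t}^{t+\overline{T}} h_i(s) \left( F^t(t+\overline{T}) - F^t(s) \right) ds \\
  &= \int_{t}^{t+\overline{T}} h_i(s) \left( 1 - F^t(s) \right) ds .
\end{aligned}
```

The weight $1 - F^t(s)$ is the probability that the information horizon $t + \theta$ has not yet been reached at time $s$.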

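The expected payoff of player $i$ in the subgame of $\Gamma(x, t, t + \theta)$ starting at the moment $\tau$ from $x^t(\tau)$ is given by the formula

$$K_i^t(x^t(\tau), \tau, \overline{T}; u^t) = \frac{1}{1 - F^t(\tau)} \int_{\tau}^{t + \overline{T}} h_i[s, x^t(s), u^t(s, x^t(s))]\,(1 - F^t(s))\,ds.$$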

A differential game with continuous updating develops according to the following rule: the current time $t \in [t_0, T]$ evolves continuously and, as a result, players continuously obtain new information about the motion equations and payoff functions in the game $\Gamma(x, t, t + \theta)$.

The strategy profile $u(t, x)$ in the differential game with continuous updating has the form

$$u(t, x) = u^t(s, x^t(s))|_{s=t}, \quad t \in [t_0, T], \qquad (7)$$

where $u^t(s, x^t)$, $s \in [t, t + \overline{T}]$, are strategies in the subgame $\Gamma(x, t, t + \theta)$.

For each $t \in [t_0, T]$ we find the strategies $u^t(s, x^t)$ in the subgame $\Gamma(x, t, t + \theta)$ starting at $t$. Then we construct a strategy profile $u(t, x)$ of the game with continuous updating from the strategies $u^t(s, x^t)$ in such a way that at each moment of time $t \in [t_0, T]$, $u(t, x)$ coincides with $u^t(s, x^t)$ at the initial moment of the subgame $\Gamma(x, t, t + \theta)$ (when $s = t$). We get $u(t, x)$ by combining the initial values of $u^t(s, x^t)$ for every $t \in [t_0, T]$. The trajectory $x(t)$ is determined in accordance with (1), where $u = u(t, x)$ are the strategies in the game with continuous updating (7). We suppose that the strategies with continuous updating obtained using (7) are admissible, that is, that problem (1) has a unique and continuable solution.
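The construction in (7) can be sketched in a few lines of Python. Everything named below is a hypothetical illustration rather than part of the paper: `toy_subgame_strategy` stands in for a family of subgame strategies $u^t(s, x^t)$, `toy_dynamics` for the right-hand side $g$ of (1) in a single-player scalar setting, and the explicit Euler scheme is only one simple way to integrate (1).

```python
import math


def updated_strategy(subgame_strategy):
    """Build u(t, x) = u^t(s, x)|_{s=t} from a family of subgame strategies."""
    def u(t, x):
        # Only the initial moment s = t of each subgame is used.
        return subgame_strategy(t, t, x)
    return u


def simulate(g, u, x0, t0, t_end, steps=10000):
    """Explicit Euler integration of x'(t) = g(t, x, u(t, x)) on [t0, t_end]."""
    dt = (t_end - t0) / steps
    t, x = t0, x0
    for _ in range(steps):
        x += dt * g(t, x, u(t, x))
        t += dt
    return x


# Hypothetical toy data (assumption, not from the paper).
T_BAR = 1.0


def toy_subgame_strategy(t, s, x):
    """Assumed subgame strategy u^t(s, x) = x / (T_BAR + t - s)."""
    return x / (T_BAR + t - s)


def toy_dynamics(t, x, u):
    """Assumed scalar dynamics x' = -u."""
    return -u
```

With these toy choices the updated strategy is $u(t, x) = x / \overline{T}$, so `simulate(toy_dynamics, updated_strategy(toy_subgame_strategy), 1.0, 0.0, 1.0)` should be close to `math.exp(-1.0)`.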

The essential difference between the game model with continuous updating and the classic differential game with prescribed duration $\Gamma(x_0, T - t_0)$ is that players in the initial game are guided by the payoffs that they will eventually obtain on the interval $[t_0, T]$, but in the game with continuous updating, at each time instant $t$ they orient themselves on the expected payoffs (3), which are calculated based on the information defined on the interval $[t, t + \theta]$, i.e. the information that they have at the instant $t$. Unlike previous models [2], here we assume that the duration $\theta$ of the period on which players have information about the game is not fixed: $\theta$ is a random variable distributed on the interval $[0, \overline{T}]$.

It is important to mention that the strategy profile and the related trajectory with continuous updating can be defined on an infinite interval due to the continuously updating structure, but in this paper we present them on the closed interval $[t_0, T]$.

4. Nash equilibrium in game with continuous updating. In the framework of continuously updated information, it is important to model the behavior of players. To do this, we use the concept of Nash equilibrium in feedback strategies. However, for the class of differential games with continuous updating, we would like it to have the following form: for any fixed $t \in [t_0, T]$, $u^{NE}(t, x) = (u_1^{NE}(t, x), \ldots, u_n^{NE}(t, x))$ coincides with the Nash equilibrium in the game (4)-(6) defined on the interval $[t, t + \theta]$ at the instant $t$.

Choosing at each time moment $t \in [t_0, T]$ strategies that are equilibrium at the initial [...] $t$ [...] the players to deviate from the chosen strategies at any moment, since they orient themselves [...] $t$ [...]

Following (7), we first find the Nash equilibrium $u^t_{NE}(s, x^t)$ in each subgame $\Gamma(x, t, t + \theta)$ for every $t \in [t_0, T]$, and then use the initial values of $u^t_{NE}(s, x^t)$ to construct $u^{NE}(t, x)$.

In order to combine the Nash equilibria in all such subgames, we introduce the concept of generalized Nash equilibrium in feedback strategies as the principle of optimality.

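Definition 1. Strategy profile $\tilde{u}^{NE}(t, s, x^t) = (\tilde{u}_1^{NE}(t, s, x^t), \ldots, \tilde{u}_n^{NE}(t, s, x^t))$ is a generalized Nash equilibrium in feedback strategies in the game with continuous updating if, for any fixed $t \in [t_0, T]$, the strategy profile $\tilde{u}^{NE}(t, s, x^t)$ is the feedback Nash equilibrium in the game $\Gamma(x, t, t + \theta)$.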

It is important to notice that the generalized feedback Nash equilibrium $\tilde{u}^{NE}(t, s, x^t)$ for a fixed $t$ is a function of $s$ and $x^t$, where $s$ is defined on the interval $[t, t + \overline{T}]$. Using the generalized feedback Nash equilibrium, it is possible to define a solution concept for the game model with continuous updating. The word "generalized" reflects the idea of generalizing the notion of Nash equilibrium by introducing an additional time parameter $t$ as the starting point of each game defined on the interval $[t, t + \theta]$.

Definition 2. Strategy profile $u^{NE}(t, x)$ is called the Nash equilibrium with continuous updating if it is defined in the following way:

$$u^{NE}(t, x) = \tilde{u}^{NE}(t, s, x^t(s))|_{s=t} = \left(\tilde{u}_1^{NE}(t, s, x^t(s))|_{s=t}, \ldots, \tilde{u}_n^{NE}(t, s, x^t(s))|_{s=t}\right), \quad t \in [t_0, T], \qquad (8)$$

where $\tilde{u}^{NE}(t, s, x^t(s))$ is the generalized feedback Nash equilibrium defined in Definition 1.

The trajectory $x^{NE}(t)$ corresponding to the Nash equilibrium with continuous updating can be obtained from system (1) by substituting $u^{NE}(t, x)$ there.

Unlike the generalized feedback Nash equilibrium, $u^{NE}(t, x)$ does not contain the feedback Nash equilibrium strategies for every $s \in [t, t + \overline{T}]$. The strategy profile $u^{NE}(t, x)$ contains only the strategies that the players actually perform according to the procedure described in Section 4, i.e. the continuous updating procedure, where $s = t$. The strategy profile $u^{NE}(t, x)$ will be used as the solution concept in the game with continuous updating.

5. Hamilton-Jacobi-Bellman equations with continuous updating. In order to define the strategy profile $u^{NE}(t, x)$, it is necessary to determine the generalized Nash equilibrium in feedback strategies $\tilde{u}^{NE}(t, \tau, x^t)$ in the game with continuous updating. To do this, we use a modified version of dynamic programming (see [9]). In the framework of this approach, the Bellman function $V^i(t, \tau, x^t)$ is defined as the payoff of player $i$ in the feedback Nash equilibrium in the subgame of $\Gamma(x, t, t + \theta)$ starting at the instant $\tau$ in the state $x^t(\tau)$:

$$V^i(t, \tau, x^t) = \frac{1}{1 - F^t(\tau)} \int_{\tau}^{t + \overline{T}} h_i[s, \tilde{x}_t^{NE}(s), \tilde{u}^{NE}(t, s, x^t)]\,(1 - F^t(s))\,ds, \quad i \in N.$$

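The following theorem holds.

Theorem. $\tilde{u}^{NE}(t, \tau, x^t)$ is the generalized Nash equilibrium in feedback strategies in the differential game with continuous updating and random information horizon if there exist functions $V^i(t, \tau, x^t) : [t_0, T] \times [t, t + \overline{T}] \times \mathbb{R}^l \to \mathbb{R}$, $i \in N$, continuously differentiable in $\tau$ and $x^t$, satisfying the system of partial differential equations

$$\frac{\varphi'(\tau - t)}{1 - \varphi(\tau - t)} V^i(t, \tau, x^t) - V^i_{\tau}(t, \tau, x^t) = \max_{\phi_i \in U_i} \left\{ h_i(\tau, x^t, \tilde{u}^{NE}_{-i}(\phi_i)) + V^i_{x^t}(t, \tau, x^t)\, g(\tau, x^t, \tilde{u}^{NE}_{-i}(\phi_i)) \right\} =$$

$$= h_i(\tau, x^t, \tilde{u}^{NE}) + V^i_{x^t}(t, \tau, x^t)\, g(\tau, x^t, \tilde{u}^{NE}), \quad i \in N, \qquad (9)$$

where $\tilde{u}^{NE}_{-i}(\phi_i) = (\tilde{u}_1^{NE}, \ldots, \phi_i, \ldots, \tilde{u}_n^{NE})$, $V^i(t, t + \overline{T}, x^t) = 0$, $i \in N$, and $\varphi'(\tau - t)$ is the derivative with respect to $\tau$.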

Proof. According to the definition of generalized Nash equilibrium, $\tilde{u}^{NE}(t, \tau, x^t)$ [...] $t$ [...] sufficient conditions for feedback Nash equilibrium in the differential game with random [...] $t$ [...] generalized Nash equilibrium are satisfied. The Theorem is proved. □
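The origin of the coefficient $\varphi'(\tau - t) / (1 - \varphi(\tau - t))$ in (9) can be illustrated by a short calculation. This is only a sketch under the assumption that $V^i$ is differentiable along the equilibrium trajectory; the auxiliary function $W^i$ and the shorthand $h_i(\tau)$, $g(\tau)$ for the integrand and the right-hand side of (4) evaluated at the equilibrium are introduced here for illustration and do not appear in the paper.

```latex
\begin{aligned}
W^i(\tau) &:= \bigl(1 - F^t(\tau)\bigr)\, V^i\bigl(t, \tau, x^t(\tau)\bigr)
           = \int_{\tau}^{t+\overline{T}} h_i(s)\,\bigl(1 - F^t(s)\bigr)\,ds, \\
\frac{dW^i}{d\tau} &= -\,h_i(\tau)\,\bigl(1 - \varphi(\tau - t)\bigr)
  \quad\text{(differentiating the integral)}, \\
\frac{dW^i}{d\tau} &= -\,\varphi'(\tau - t)\, V^i
  + \bigl(1 - \varphi(\tau - t)\bigr)\bigl(V^i_{\tau} + V^i_{x^t}\, g(\tau)\bigr)
  \quad\text{(product and chain rule)}, \\
\Longrightarrow\quad
\frac{\varphi'(\tau - t)}{1 - \varphi(\tau - t)}\, V^i - V^i_{\tau}
  &= h_i(\tau) + V^i_{x^t}\, g(\tau).
\end{aligned}
```

The last line agrees with the second equality in (9); the terminal condition $V^i(t, t + \overline{T}, x^t) = 0$ corresponds to the empty integration interval at $\tau = t + \overline{T}$.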

6. Model of non-renewable resource extraction. As an illustrative example consider a differential game model with continuous updating for the extraction of nonrenewable resource (see [10,11]). Assume that n players exploit a natural common-property resource, which does not regenerate over time, such as natural gas or earth minerals.

6.1. Initial game model. Denote by $x(t)$ the state variable indicating the resource stock available for extraction by the players at time $t$. The parameter $u_i(t)$ denotes the extraction rate of player $i$ at the same time. We assume that $u_i(t) > 0$ and $x(t) > 0$.

The dynamics of the stock is given by the following equation with initial condition:

$$\dot{x}(t) = -\sum_{i=1}^{n} a_i u_i(t, x), \quad x(t_0) = x_0,$$

where $a_i > 0$ for all $i = 1, \ldots, n$, and $x_0 > 0$.

Let the players have logarithmic utility functions

$$h_i(x(t), u_i(t)) = \ln u_i(t, x), \quad i = 1, \ldots, n.$$

The payoff of player $i$ is then

$$K_i(x_0, T - t_0) = \int_{t_0}^{T} \ln u_i(s, x)\,ds, \quad i = 1, \ldots, n.$$

6.2. Nash equilibrium strategies with continuous updating. Assume that information about motion equations and payoff functions is updated continuously in time. At every instant $t \in [t_0, T]$ players have information only on the interval $[t, t + \theta]$. This means that at each time instant they can count on the stability of the process only over the period $\theta$. Let $\theta$ be a random variable uniformly distributed on the interval $[0, \overline{T}]$; then $\varphi(s - t) = (s - t)/\overline{T}$, $s \in [t, t + \overline{T}]$. As described above, to determine the Nash equilibrium feedback strategies in the game with continuous updating, we consider the family of auxiliary subgames $\Gamma(x, t, t + \theta)$ with duration $\theta$, starting at the moment $t$ from the state $x$.
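For the uniform distribution, the coefficient $\varphi'(s - t)/(1 - \varphi(s - t))$ that multiplies the Bellman function in the Hamilton-Jacobi-Bellman system reduces to $1/(\overline{T} + t - s)$. The Python sketch below compares a finite-difference evaluation with this closed form; the function names and the step `h` are illustrative choices, not taken from the paper.

```python
def uniform_phi(tau, t_bar):
    """phi(tau) for theta ~ Uniform[0, t_bar], clamped to [0, 1]."""
    return min(max(tau / t_bar, 0.0), 1.0)


def hazard_rate(s, t, t_bar, h=1e-6):
    """phi'(s - t) / (1 - phi(s - t)), with phi' taken by central difference."""
    d_phi = (uniform_phi(s - t + h, t_bar) - uniform_phi(s - t - h, t_bar)) / (2.0 * h)
    return d_phi / (1.0 - uniform_phi(s - t, t_bar))


def hazard_rate_closed_form(s, t, t_bar):
    """Closed form 1 / (t_bar + t - s), valid for t <= s < t + t_bar."""
    return 1.0 / (t_bar + t - s)
```

For example, with `t = 2.0`, `t_bar = 1.0` and `s = 2.4` both functions should return approximately $1/0.6$. The coefficient grows without bound as $s$ approaches $t + \overline{T}$, where the horizon is certain to have been reached.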

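The dynamic constraints of the game $\Gamma(x, t, t + \theta)$ are given by

$$\dot{x}^t(s) = -\sum_{i=1}^{n} a_i u_i^t(s, x^t), \quad x^t(t) = x,$$

$$x^t \in \mathbb{R}^l, \quad u^t = (u_1^t, \ldots, u_n^t), \quad u_i^t = u_i^t(s, x^t) \in U_i \subset \operatorname{comp} \mathbb{R}^k.$$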

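The payoff function of player $i$ in $\Gamma(x, t, t + \theta)$ is

$$K_i^t(x, t, \overline{T}; u^t) = \int_{t}^{t + \overline{T}} \int_{t}^{\tau} \ln u_i^t(s, x^t)\,ds\,dF^t(\tau), \quad i \in N.$$

It can be represented in the form

$$K_i^t(x, t, \overline{T}; u^t) = \int_{t}^{t + \overline{T}} \ln u_i^t(s, x^t)\,(1 - F^t(s))\,ds, \quad i \in N.$$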

To calculate the feedback Nash equilibrium strategies in the subgame $\Gamma(x, t, t + \theta)$ we use the dynamic programming technique.

Denote by $V^i(t, s, x^t)$ the Bellman function, i.e. the payoff of player $i$ in the Nash equilibrium in the subgame of $\Gamma(x, t, t + \theta)$ starting at the moment $s$.

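The system of Hamilton-Jacobi-Bellman equations has the following form:

$$\frac{1}{t + \overline{T} - s} V^i(t, s, x^t) - V^i_s(t, s, x^t) = \max_{\phi_i} \left\{ \ln \phi_i - V^i_{x^t}(t, s, x^t) \Bigl( a_i \phi_i + \sum_{j \ne i} a_j \tilde{u}_j^{NE} \Bigr) \right\}. \qquad (10)$$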

We look for the Bellman functions $V^i(t, s, x^t)$ in the form $V^i(t, s, x^t) = A_i(t, s) \ln x^t + B_i(t, s)$. Maximizing the expression on the right-hand side of (10) and solving the corresponding differential equations for $A_i(t, s)$ and $B_i(t, s)$, we obtain

$$A_i(t, s) = \frac{\overline{T} + t - s}{2},$$

$$B_i(t, s) = \frac{\overline{T} + t - s}{2} \left( \ln 2 - \ln(\overline{T} + t - s) - \ln a_i - n + \frac{1}{2} \right).$$
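As a sanity check, the ansatz $V^i = A_i \ln x^t + B_i$ can be substituted into (10) numerically. The sketch below assumes $A_i = (\overline{T} + t - s)/2$ and $B_i = \frac{\overline{T} + t - s}{2}\bigl(\ln 2 - \ln(\overline{T} + t - s) - \ln a_i - n + \tfrac12\bigr)$, approximates the partial derivatives by central differences, and takes the other players' strategies to be $2x^t / (a_j(\overline{T} + t - s))$. All function names and the step size are illustrative assumptions.

```python
import math


def bellman(a_i, n, t, s, x, t_bar):
    """Assumed Bellman function V^i(t, s, x) = A ln x + B."""
    r = t_bar + t - s  # remaining length of the information horizon
    coeff_a = r / 2.0
    coeff_b = (r / 2.0) * (math.log(2.0) - math.log(r) - math.log(a_i) - n + 0.5)
    return coeff_a * math.log(x) + coeff_b


def hjb_residual(i, a, t, s, x, t_bar, step=1e-6):
    """Left-hand side minus right-hand side of (10) for player i."""
    n = len(a)
    r = t_bar + t - s
    v = bellman(a[i], n, t, s, x, t_bar)
    # Central differences for the partial derivatives in s and x.
    v_s = (bellman(a[i], n, t, s + step, x, t_bar)
           - bellman(a[i], n, t, s - step, x, t_bar)) / (2.0 * step)
    hx = step * x
    v_x = (bellman(a[i], n, t, s, x + hx, t_bar)
           - bellman(a[i], n, t, s, x - hx, t_bar)) / (2.0 * hx)
    lhs = v / r - v_s
    # First-order condition of the maximisation: 1 / phi_i = a_i * V_x.
    phi_i = 1.0 / (a[i] * v_x)
    # Other players use the candidate equilibrium strategies 2x / (a_j r).
    others = sum(a[j] * (2.0 * x / (a[j] * r)) for j in range(n) if j != i)
    rhs = math.log(phi_i) - v_x * (a[i] * phi_i + others)
    return lhs - rhs
```

Calling `hjb_residual(0, [0.5, 1.0, 0.9], 0.0, 0.3, 500.0, 1.0)` should return a value close to zero, up to finite-difference error.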

Finally, we get the generalized feedback Nash equilibrium strategies:

$$\tilde{u}_i^{NE}(t, s, x^t) = \frac{2 x^t}{a_i(\overline{T} + t - s)}, \quad i = 1, \ldots, n.$$

According to the procedure (8), we consider the equilibrium strategies in the subgames $\Gamma(x, t, t + \theta)$ at their starting points and construct the feedback Nash equilibrium with continuous updating as follows:

$$u_i^{NE}(t, x) = \tilde{u}_i^{NE}(t, s, x^t)|_{s=t} = \frac{2 x(t)}{a_i \overline{T}}, \quad i = 1, \ldots, n.$$

The equilibrium trajectory $x^{NE}(t)$ with continuous updating is

$$x^{NE}(t) = x_0 e^{-\frac{2n}{\overline{T}}(t - t_0)}.$$

It is interesting to compare the obtained solution with the solution of the initial game and the game with continuous updating and prescribed information horizon.

The equilibrium strategies and the corresponding trajectory in the initial game are

$$u^{NE}_{i,\text{initial}}(t, x) = \frac{x(t)}{a_i(T - t)}, \quad i = 1, \ldots, n,$$

$$x^{NE}_{\text{initial}}(t) = x_0 \frac{(T - t)^n}{(T - t_0)^n}.$$

The solution of the game with continuous updating and prescribed information horizon can be found in [12]:

$$u_i^{NE}(t, x) = \frac{x(t)}{a_i \overline{T}}, \quad i = 1, \ldots, n,$$

$$x^{NE}(t) = x_0 e^{-\frac{n}{\overline{T}}(t - t_0)}.$$

To illustrate the difference between these solutions, consider a numerical example. Assume the following values of parameters: $n = 3$, $x_0 = 500$, $t_0 = 0$, $T = 10$, $\overline{T} = 1$, $a_1 = 0.5$, $a_2 = 1$, $a_3 = 0.9$. The equilibrium strategies of the first player and the trajectories in the different games are presented in Figure, a and b (for players 2 and 3 the graphs are similar).

Figure. Nash equilibrium strategies (a) and resulting equilibrium trajectories (b) in games of different types

Note that the less information the players have, the faster they try to extract the resource. In the game with random information horizon, player 1 starts with an extraction rate of 2000, in the game with continuous updating and prescribed information horizon with a rate of 1000, and in the initial game with a rate of 100. Figure, b shows that the fastest resource consumption corresponds to the game with random information horizon.
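The initial extraction rates quoted above follow directly from the three closed-form strategies. The Python sketch below evaluates them, together with the corresponding closed-form trajectories, for the stated parameters; the function names are illustrative and the formulas are the ones given in Section 6.2 under the uniform-distribution assumption.

```python
import math


def rate_random_horizon(x, a_i, t_bar):
    """Strategy with random (uniform) information horizon: 2x / (a_i * t_bar)."""
    return 2.0 * x / (a_i * t_bar)


def rate_fixed_horizon(x, a_i, t_bar):
    """Strategy with prescribed information horizon: x / (a_i * t_bar)."""
    return x / (a_i * t_bar)


def rate_initial(x, a_i, t, t_end):
    """Strategy in the initial game: x / (a_i * (T - t))."""
    return x / (a_i * (t_end - t))


def traj_random_horizon(t, x0, n, t0, t_bar):
    """x0 * exp(-2n (t - t0) / t_bar)."""
    return x0 * math.exp(-2.0 * n * (t - t0) / t_bar)


def traj_fixed_horizon(t, x0, n, t0, t_bar):
    """x0 * exp(-n (t - t0) / t_bar)."""
    return x0 * math.exp(-n * (t - t0) / t_bar)


def traj_initial(t, x0, n, t0, t_end):
    """x0 * ((T - t) / (T - t0))^n."""
    return x0 * ((t_end - t) / (t_end - t0)) ** n


# Parameters of the numerical example (player 1).
N, X0, T0, T_END, T_BAR, A1 = 3, 500.0, 0.0, 10.0, 1.0, 0.5
```

With these parameters, `rate_random_horizon(X0, A1, T_BAR)`, `rate_fixed_horizon(X0, A1, T_BAR)` and `rate_initial(X0, A1, T0, T_END)` give the three starting rates of player 1, and evaluating the three trajectory functions at any common $t > t_0$ shows the ordering of the remaining stock.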

7. Conclusion. A differential game model with continuous updating and random information horizon is presented. The concept of Nash equilibrium for the new class of games is defined. A new type of Hamilton-Jacobi-Bellman equations is presented, and the technique for finding the Nash equilibrium in the game model with continuous updating is described. The theory is illustrated with a game model of non-renewable resource extraction, in which the Nash equilibrium and the corresponding trajectory in the initial game model are compared with those in the game models with continuous updating.

References

1. Kuchkarov I., Petrosian O. On class of linear quadratic non-cooperative differential games with continuous updating. Lecture Notes in Computer Science, 2019, vol. 11548, pp. 635-650.

2. Petrosian O., Tur A. Hamilton-Jacobi-Bellman equations for non-cooperative differential games with continuous updating. Mathematical Optimization Theory and Operations Research. MOTOR 2019. Ed. by I. Bykadorov, V. Strusevich, T. Tchemisova. Communications in Computer and Information Science, 2019, vol. 1090, pp. 178-191.

3. Kuchkarov I., Petrosian O. Open-loop based strategies for autonomous linear quadratic game models with continuous updating. Mathematical Optimization Theory and Operations Research. MOTOR 2020. Ed. by A. Kononov, M. Khachay, V. Kalyagin, P. Pardalos. Lecture Notes in Computer Science, 2020, vol. 12095, pp. 212-230.

4. Wang Z., Petrosian O. On class of non-transferable utility cooperative differential games with continuous updating. Journal of Dynamics & Games, 2020, vol. 7, no. 4, pp. 291-302. https://doi.org/10.3934/jdg.2020020

5. Shi L., Petrosian O. L., Boiko A. V. Looking forward approach for dynamic cooperative advertising game model. Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes, 2019, vol. 15, iss. 2, pp. 221-234. https://doi.org/10.21638/11702/spbu10.2019.206

6. Petrosyan L.A., Murzov N.V. Game-theoretic problems in mechanics. Lithuanian Mathematical Collection, 1966, vol. 3, pp. 423-433.

7. Petrosyan L. A., Shevkoplyas E. V. Cooperative differential games with random duration. Vestnik of Saint Petersburg University. Series 1. Mathematics. Mechanics. Astronomy, 2000, iss. 4, pp. 18-23.

8. Shevkoplyas E.V. The Hamilton — Jacobi — Bellman equation for a class of differential games with random duration. Autom. Remote Control, 2014, vol. 75, pp. 959-970. https://doi.org/10.1134/S0005117914050142

9. Bellman R. Dynamic programming. Princeton, Princeton University Press, 1957, 342 p.

10. Dockner E. J., Jorgensen S., Long N.V., Sorger G. Differential games in economics and management science. Cambridge, Cambridge University Press, 2000, 382 p.

11. Kostyunin S., Palestini A., Shevkoplyas E. A differential game-based approach to extraction of exhaustible resource with random terminal instants. Contributions to Game Theory and Management, 2012, vol. 5, pp. 147-155.

12. Petrosian O., Tur A., Wang Z., Gao H. Cooperative differential games with continuous updating using Hamilton-Jacobi-Bellman equation. Optimization Methods and Software, 2020, pp. 1099-1127. https://doi.org/10.1080/10556788.2020.1802456

Received: December 21, 2021.

Accepted: June 21, 2022.

Authors' information:

Anna V. Tur — PhD in Physics and Mathematics, Associate Professor; [email protected]

Ovanes L. Petrosian — PhD in Physics and Mathematics, Associate Professor; [email protected]

Random information horizon for a class of differential games with continuous updating*

A. V. Tur, O. L. Petrosian

St Petersburg State University, 7-9, Universitetskaya nab., St Petersburg, 199034, Russian Federation

For citation: Tur A. V., Petrosian O. L. Random information horizon for a class of differential games with continuous updating // Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes. 2022. Vol. 18. Iss. 3. P. 337-346. https://doi.org/10.21638/11701/spbu10.2022.304

A class of differential games with continuous updating and a random information horizon is considered. It is assumed that, continuously in time, the players receive information (about the motion equations and payoff functions) only for a certain time interval of length $\theta$, and that the information about the game is updated as time evolves. This type of game was first considered by us in 2019. Here we additionally [...] the definition of a solution concept based on Nash equilibrium and a description of a solution technique based on Hamilton-Jacobi-Bellman equations.

Keywords: differential game with continuous updating, Nash equilibrium, Hamilton-Jacobi-Bellman equations, random information horizon.

Contact information:

Tur Anna Viktorovna, Cand. Sci. (Phys.-Math.), Associate Professor; [email protected]

Petrosian Ovanes Leonovich, Cand. Sci. (Phys.-Math.), Associate Professor; [email protected]

* The work of the first author was funded by the Russian Foundation for Basic Research and the Deutsche Forschungsgemeinschaft (project No. 21-51-12007); the work of the second author was carried out under a grant from the President of the Russian Federation for state support of young Russian scientists, candidates of science (project No. MK-4674.2021.1.1).
