Research article in the field of Mathematics: "Non-autonomous Linear Quadratic Non-cooperative Differential Games with Continuous Updating"

CC BY

Contributions to Game Theory and Management, XV, 132—154

Non-autonomous Linear Quadratic Non-cooperative Differential Games with Continuous Updating*

Ildus Kuchkarov$^{1}$, Ovanes Petrosian$^{1,2}$ and Yin Li$^{1,3}$

1 St. Petersburg State University, Faculty of Applied Mathematics and Control Processes, 7/9, Universitetskaya nab., St. Petersburg, 199034, Russia
E-mail: kuchkarov_ildus@mail.ru
2 HSE University, 20, Myasnitskaya ul., St. Petersburg, 194100, Russia
E-mail: opetrosyan@hse.ru
3 School of Mathematics, Harbin Institute of Technology, 92, West Dazhi St., Harbin, 15000, China
E-mail: liyinrus@outlook.com

Abstract The subject of this paper is a non-autonomous linear quadratic case of a differential game model with continuous updating. This class of differential games is essentially new: it is assumed that, at each time instant, players have or use information about the game structure defined only on a closed time interval of fixed duration. As time evolves, the information about the motion equations and payoff functions of the players is updated. It is non-autonomy that simulates this effect of updating information. The linear quadratic case for this class of games is particularly important for practical problems arising in the engineering of human-machine interaction. Here we define the Nash equilibrium as an optimality principle and present an explicit form of the Nash equilibrium for the linear quadratic case. The case of dynamic updating for the linear quadratic differential game is also studied, and uniform convergence of the Nash equilibrium strategies and the corresponding trajectories for the cases of continuous updating and dynamic updating is demonstrated.

Keywords: differential games with continuous updating, Nash equilibrium, linear quadratic differential games, non-autonomous.

1. Introduction

The theory of games and global optimization problems are related to each other. As game theory examines the behavior of multi-agent systems, games can be viewed as multi-objective optimization problems. Therefore game theory plays an important role as an application of global optimization. On the other hand, a number of concepts such as the Nash equilibrium are taken from the theory of games to construct a special class of heuristic algorithms. In the theory of differential games, optimality conditions such as the Hamilton-Jacobi-Bellman equation or Pontryagin's maximum principle are constructed using approaches initially developed in the theory of optimization. Dynamic programming and the Bellman equation were the basis for HJB equations. Pontryagin's maximum principle was preceded by the method of Lagrange multipliers.

In the theory of classical differential games, it is usually assumed that players have all information about game dynamics and can forecast each other's actions. In particular, classical differential games are defined on finite or infinite time intervals (players have information about the dynamics of the game on finite or infinite time intervals) (Basar et al., 1995; Isaacs, 1965), or on a random time interval (players have information on a given time interval, but the duration of this interval is a random variable) (Shevkoplyas, 2014). One of the first works in the theory of differential games is devoted to the differential pursuit game (the player's gain depends on the time of capture of the opponent) (Petrosyan and Murzov, 1966). Classical differential game models assume that players know all information about the game dynamics (equations of motion) and about players' preferences (cost functions) at the beginning of the game. Different types of differential game models help to model players' behavior for different scenarios, but one important idea is usually missing. In real-life processes, information about the dynamics of the process is not known in advance; i.e. the form of the motion equations is not known in advance. Furthermore, the participants in the process, the players, usually cannot make a perfect forecast of others' preferences for the whole interval over which the process is defined.

*Research was supported by the Russian Science Foundation grant No. 18-71-00081. https://doi.org/10.21638/11701/spbu31.2022.11

Most real conflict-driven processes evolve continuously over time, and their participants constantly adapt. This paper continues along the lines of others related to continuous updating where it is assumed that players

- have information about motion equations and payoff functions only on $[t, t + \overline{T}]$, where $\overline{T}$ is the information horizon and $t$ is the current time instant;

- receive updated information about motion equations and payoff functions as time $t \in [t_0, +\infty)$ evolves.

In this paper, it is supposed that motion equations and payoff functions explicitly depend on the time parameter. Therefore, in the general form of the differential game with continuous updating, information about motion equations and payoff functions updates because its form changes as the current time $t \in [t_0, +\infty)$ evolves. This allows one to fully implement the concept of continuously updating information, in contrast to the autonomous case. While the Nash equilibrium is used to model the individual behavior of players, the unusual introduction of information updating makes the Nash equilibrium difficult to derive. This is because control problems that include a moving information horizon suffer from a lack of fundamental approaches. Such classical methods as dynamic programming, the Hamilton-Jacobi-Bellman equation (Bellman, 1957) and the Pontryagin maximum principle (Pontryagin, 1996) do not allow one to construct a Nash equilibrium directly in problems with a moving information horizon.

The class of games with continuous updating is represented in the literature by the following papers (Kuchkarov and Petrosian, 2019; Kuchkarov and Petrosian, 2020; Petrosian and Tur, 2019). In the paper (Petrosian and Tur, 2019) the system of Hamilton-Jacobi-Bellman equations is derived for the Nash equilibrium with continuous updating. In the papers (Kuchkarov and Petrosian, 2019; Kuchkarov and Petrosian, 2020) the class of autonomous linear-quadratic differential games with continuous updating is considered and the explicit form of the Nash equilibrium is derived for the feedback-based and open-loop-based cases. The continuous updating approach is an extension, or generalization, of the dynamic updating case, on which the following papers were published (Gromova and Petrosian, 2016; Petrosian, 2016a; Petrosian, 2016b; Petrosian and Barabanov, 2017; Petrosian et al., 2017; Petrosian et al., 2018; Petrosian et al., 2019; Yeung and Petrosian, 2017). There the updating procedure occurs not continuously in time, but at discrete time instants.

The class of differential games with dynamic and continuous updating has some similarities with Model Predictive Control (MPC) theory, which has been developed within the framework of numerical optimal control (Goodwin et al., 2005; Kwon and Han, 2005; Rawlings and Mayne, 2009; Wang, 2005). The MPC approach obtains the current control action by solving a finite-horizon open-loop optimal control problem at each sampling instant. For linear systems there exists a solution in explicit form (Bemporad et al., 2002; Hempel et al., 2015). However, in general, the MPC approach demands the solution of several optimization problems. Another related series of papers, corresponding to the class of stabilizing control, is (Kwon et al., 1982; Kwon and Pearson, 1977; Mayne and Michalska, 1990; Shaw, 1979), where similar approaches were considered for the class of linear quadratic optimal control problems. In the current paper, and in papers about the continuous updating approach, the main goal is different: to model players' behavior when information about the course of the game updates continuously in time.

In this paper, we extend the results of papers (Kuchkarov and Petrosian, 2019; Kuchkarov and Petrosian, 2020), where the class of autonomous linear-quadratic differential games with continuous updating is considered and the explicit form of the Nash equilibrium is derived. One of the main results of this paper is a set of sufficient conditions for the existence of a feedback-based and open-loop-based Nash equilibrium with continuous updating for the non-autonomous case. For this approach it is very important that motion equations and payoff functions are functions of time, i.e. that the system itself is non-autonomous. It is possible to make use of information updating and adaptation because motion equations and payoff functions are functions of the current time and, in the framework of the continuous updating approach, may not be known in advance. The implementation of previously unknown functions models the behavior of the system as if it used limited information at each moment in time. The popularity of the so-called linear quadratic differential games (Engwerda, 2005) can be explained by practical applications in engineering. To some extent, this kind of differential game is analytically and numerically solvable. On the other hand, the linear quadratic problem setting naturally appears if the agents' objective is to minimize the effect of a small perturbation of their nonlinear optimally controlled environment. By solving a linear quadratic control problem, and using the optimal actions implied by this problem, players can avoid most of the additional cost incurred by this perturbation. It is also proved in this paper that the Nash equilibrium in the corresponding linear quadratic game with dynamic updating uniformly converges to the introduced controls. This procedure allows us to conclude that the constructed control is indeed optimal in the game model with continuous updating, i.e. in the case when the length of the updating interval converges to zero. A similar procedure is performed for the corresponding trajectory. Another important issue addressed in the paper is the non-uniqueness of the Nash equilibrium on the interval $[t, t + \overline{T}]$ and what the Nash equilibrium with continuous updating looks like in this case.

The paper is structured as follows. Section 2 presents a description of the initial differential game model and the corresponding game model with continuous updating, as well as a conceptual strategy for it. In Section 3, the Nash equilibrium is adapted for a class of games with continuous updating and its explicit form for a class of linear-quadratic differential games is presented. Section 4 provides a description of the game model with dynamic updating and the form of Nash equilibrium with continuous updating. It also demonstrates the convergence of Nash equilibrium strategies and corresponding trajectories for the cases of dynamic and continuous updating. An illustrative model example and the corresponding numerical simulation are presented in Section 5. The demonstration of the convergence result is also presented in the numerical simulation section. In Section 6, conclusions are drawn.

2. Classical Differential Game Model and Model with Continuous Updating

2.1. Linear Quadratic Non-autonomous Game Model

Consider an $n$-player ($|N| = n$) linear quadratic non-autonomous differential game $\Gamma(x_0, t_0, T)$ defined on the interval $[t_0, T]$. Motion equations have the form

$$\dot{x}(t) = A(t)x(t) + B_1(t)u_1(t) + \dots + B_n(t)u_n(t), \qquad x(t_0) = x_0, \tag{1}$$

$$x \in \mathbb{R}^l, \quad u = (u_1, \dots, u_n), \quad u_i = u_i(t) \in U_i \subset \mathbb{R}^k, \quad t \in [t_0, T].$$

The payoff function of player $i \in N$ is defined as

$$K_i(x_0, t_0, T; u) = \int_{t_0}^{T} \Big( x'(t)Q_i(t)x(t) + \sum_{j=1}^{n} u_j'(t,x)R_{ij}(t)u_j(t,x) \Big)\,dt, \quad i \in N, \tag{2}$$

where $Q_i(t)$, $R_{ij}(t)$ are assumed to be symmetric for $t \in [t_0, T]$, $R_{ii}(t)$ is positive definite for $t \in [t_0, T]$, and $(\cdot)'$ means transpose here and hereafter.

Furthermore, for the right-hand side of (1) we suppose that the conditions of Theorem 5.1 from (Basar et al., 1995) are satisfied, so that the state equation admits a unique solution for every corresponding N-tuple of strategies. As the strategy space we consider the so-called Markov functions, that is, the set of functions each of which depends only on the current state of the system and on time, $u_i(t) \in \Gamma_i^{fb}$, $i \in N$:

$$\Gamma_i^{fb} = \{\, u_i(0, T) \mid u_i(t) = u_i(t, x(t)) \ \text{and} \ (u_1(\cdot), \dots, u_n(\cdot)) \in U \,\}.$$

Later in the paper we refer to $u(t, x(t)) = (u_1(t, x(t)), \dots, u_n(t, x(t)))$ as strategies in feedback form and use this notation.

2.2. Linear Quadratic Non-autonomous Game Model with Continuous Updating

The difference between a non-autonomous game model and an autonomous one for the class of games with continuous updating is that here the right hand side of motion equations and the integrand in the payoff function explicitly depend on the current time t. For the case of a linear quadratic model, the dependency on the current time t in (1), (2) is introduced by the following matrices:

$$A(t), \quad B_i(t), \quad Q_i(t), \quad R_{ij}(t), \qquad i, j = 1, \dots, n. \tag{3}$$

There is a special meaning to this dependency. At the beginning of the overall game, at time instant $t_0$, players only have information about the motion equations and payoff functions for the interval $[t_0, t_0 + \overline{T}]$. Thus they know the values of matrices (3) only for the interval $[t_0, t_0 + \overline{T}]$, and the form of matrices (3) can be different on the interval $[t_0 + \overline{T}, +\infty)$. In the case of an autonomous system, matrices (3) are constant and it can be said that they are known for the whole interval on which the game is defined.

Consider an $n$-player differential game $\Gamma(x, t, t + \overline{T})$, $t \in [t_0, +\infty)$, defined on the interval $[t, t + \overline{T}]$, where $0 < \overline{T} < +\infty$.

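Motion equations of $\Gamma(x, t, t + \overline{T})$ have the form

$$\dot{x}^t(s) = A(s)x^t(s) + B_1(s)u_1^t(s, x^t) + \dots + B_n(s)u_n^t(s, x^t), \qquad x^t(t) = x, \tag{4}$$

$$x^t \in \mathbb{R}^l, \quad u^t = (u_1^t, \dots, u_n^t), \quad u_i^t = u_i^t(s, x^t) \in U_i \subset \mathbb{R}^k, \quad t \in [t_0, +\infty).$$

The payoff function of player $i \in N$ in the game $\Gamma(x, t, t + \overline{T})$ is defined as

$$K_i^t(x, t, \overline{T}; u^t) = \int_{t}^{t+\overline{T}} \Big( (x^t(s))' Q_i(s) x^t(s) + \sum_{j=1}^{n} (u_j^t(s, x^t))' R_{ij}(s) u_j^t(s, x^t) \Big)\,ds, \tag{5}$$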

where $x^t(s)$, $u^t(s, x^t)$ are the trajectory and strategies in the game $\Gamma(x, t, t + \overline{T})$. For each strategy profile $u^t(s, x^t) = (u_1^t(s, x^t), \dots, u_n^t(s, x^t))$ and fixed time instant $t$ we obtain a trajectory $x^t(s)$, $s \in [t, t + \overline{T}]$ (the symmetry and positive definiteness conditions of Section 2.1 are assumed to hold). As the current time $t \in [t_0, +\infty)$ changes, the strategy profile $u^t(s, x^t)$ changes, and so does the corresponding trajectory $x^t(s)$. Therefore we keep the additional index $t$ in $x^t$, which denotes the trajectory or the state in the game starting at the current time instant $t$. It is supposed that the same conditions as in Section 2.1 are satisfied for an N-tuple of strategies, but for every current time instant $t$.

In the class of differential games with continuous updating, the time parameter $t \in [t_0, +\infty)$ evolves continuously. As a result, players continuously receive updated information about motion equations and payoff functions under $\Gamma(x, t, t + \overline{T})$. According to the model (4), (5) we assume that, at each current time $t \in [t_0, +\infty)$, the strategy $u^t(s, x^t)$ can be different for fixed $s$. Using the strategies $u(t, x)$, it is possible to model the behavior of players for continuously updated information:

$$u(t, x) := u^t(s, x^t(s))\big|_{s=t}, \quad t \in [t_0, +\infty),$$

where $u^t(s, x^t)$, $s \in [t, t + \overline{T}]$, are some fixed strategies defined in the subgame $\Gamma(x, t, t + \overline{T})$ starting at time instant $t$ with initial state $x^t(s)|_{s=t} = x$. Such a complex construction is needed because the decision in the whole game with continuous updating is based on particular decisions in each subgame, but only at one moment in time. The state or the trajectory corresponding to $u(t, x)$ in the model with continuous updating is defined according to the motion equations in the initial game without updating

$$\dot{x}(t) = A(t)x(t) + B_1(t)u_1(t, x) + \dots + B_n(t)u_n(t, x), \qquad x(t_0) = x_0, \qquad x \in \mathbb{R}^l, \tag{6}$$

with the strategies with continuous updating $u(t, x)$ involved.

The essential difference between a game model with continuous updating and a classic differential game $\Gamma(x_0, t_0, T)$ with a prescribed duration is that players in the initial game are guided by the payoffs that they will eventually receive on the interval $[t_0, T]$. In a game with continuous updating, by contrast, at the time instant $t$ they orient themselves on the expected payoffs (5), which are calculated using information about the game structure defined on the interval $[t, t + \overline{T}]$.
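The construction $u(t,x) := u^t(s,x)|_{s=t}$ can be illustrated with a small sketch. Everything numerical here is hypothetical: the scalar state, the constant drift `A`, the gains in `GAINS`, and the fading feedback used as the family of subgame strategies are illustrative assumptions, not quantities from the paper. The sketch only shows that the player applies each subgame strategy at the single instant $s = t$ and that the state is then driven by the original equation (6).

```python
import math

T_BAR = 1.0          # information horizon (assumed value)
A = -0.1             # hypothetical constant drift, scalar state
GAINS = (0.5, 0.3)   # hypothetical gains k_1, k_2 of the two players


def subgame_strategy(i, t, s, x):
    """Hypothetical u_i^t(s, x): a linear feedback fading out on [t, t + T_BAR]."""
    return -GAINS[i] * (1.0 - (s - t) / T_BAR) * x


def updating_strategy(i, t, x):
    """Strategy with continuous updating: u_i(t, x) := u_i^t(s, x) at s = t."""
    return subgame_strategy(i, t, t, x)


def simulate(x0, t0, t_end, dt=1e-3):
    """Explicit Euler integration of the scalar form of (6) with B_1 = B_2 = 1."""
    steps = int(round((t_end - t0) / dt))
    x, t = x0, t0
    for _ in range(steps):
        u = sum(updating_strategy(i, t, x) for i in range(len(GAINS)))
        x += dt * (A * x + u)
        t += dt
    return x
```

In this toy setting only the value of each subgame strategy at the start of its own horizon ever reaches the state equation, so the closed loop behaves like $\dot{x} = (A - k_1 - k_2)x$, even though every subgame strategy fades to zero inside its horizon.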

3. Nash Equilibrium with Continuous Updating in LQ Differential Games

3.1. Nash Equilibrium with Continuous Updating

In this section we present the definition of Nash equilibrium with continuous updating presented for the first time in (Petrosian and Tur, 2019).

For a class of games with continuous updating, the concept of Nash equilibrium $u^{NE}(t, x) = (u_1^{NE}(t, x), \dots, u_n^{NE}(t, x))$ is defined so that, for each fixed $t \in [t_0, +\infty)$, it coincides with the feedback (open-loop) Nash equilibrium in the game (4), (5) defined on the interval $[t, t + \overline{T}]$ at the instant $t$. This concept is used to model the behavior of players that use Nash equilibrium strategies under the information they have at every current time instant $t$.

As has been stated above and in the previous papers on continuous updating, it is impossible to directly use the definition of Nash equilibrium and the classical procedures to obtain it. Therefore direct application of classical approaches such as the Hamilton-Jacobi-Bellman equations or the Pontryagin maximum principle for determining Nash equilibrium in feedback (open-loop) strategies is not possible.

Definition 1. Strategy profile $u^{NE}_{fb}(t, x)$ ($u^{NE}_{ol}(t, x)$) is called the feedback-based (open-loop-based) Nash equilibrium with continuous updating if it is defined in the following way:

$$u^{NE}_{fb}(t, x) = u^{NE}_{fb}(t, s, x^t)\big|_{s=t} = \big(u^{NE}_{fb,1}(t, s, x^t)\big|_{s=t}, \dots, u^{NE}_{fb,n}(t, s, x^t)\big|_{s=t}\big),$$

$$\Big(u^{NE}_{ol}(t, x) = u^{NE}_{ol}(t, s, x^t)\big|_{s=t} = \big(u^{NE}_{ol,1}(t, s, x^t)\big|_{s=t}, \dots, u^{NE}_{ol,n}(t, s, x^t)\big|_{s=t}\big)\Big), \tag{7}$$

where $t \in [t_0, +\infty)$ and $u^{NE}_{fb}(t, s, x^t)$ ($u^{NE}_{ol}(t, s, x^t)$) is the feedback (open-loop) Nash equilibrium in the game $\Gamma(x, t, t + \overline{T})$ defined on the interval $[t, t + \overline{T}]$.

We suppose that the strategy with continuous updating obtained using (7) is admissible or that the problem (6) has a unique and continuable solution. Corresponding conditions of existence, uniqueness and continuability of A. F. Filippov (Filippov, 2004) are presented for the system (4)-(5).

Strategy profile $u^{NE}_{fb}(t, x)$ ($u^{NE}_{ol}(t, x)$) will be used as a solution concept in the game with continuous updating. It is important to notice that the Nash equilibrium with continuous updating $u^{NE}_{fb}(t, x)$ ($u^{NE}_{ol}(t, x)$) is not a Nash equilibrium in the classical sense, but can be used as a solution concept related to Nash equilibrium for a class of games with continuous updating. Trajectories corresponding to $u^{NE}_{fb}(t, x)$ ($u^{NE}_{ol}(t, x)$) are denoted by $x^{NE}_{fb}(t)$ ($x^{NE}_{ol}(t)$).

3.2. Theorems on Nash Equilibrium with Continuous Updating for LQ Differential Games

One of the main results of this paper is the establishment of sufficient conditions for the existence of a feedback-based Nash equilibrium with continuous updating for the non-autonomous case. Related results can be found in papers (Kuchkarov and Petrosian, 2019) and (Kuchkarov and Petrosian, 2020), where sufficient conditions for a Nash equilibrium with continuous updating in the autonomous case were presented.

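Theorem 1. For an N-person linear-quadratic differential game $\Gamma(x_0, t_0, t_0 + \overline{T})$ with continuous updating with $Q_i(\cdot) \ge 0$, $R_{ij}(\cdot) \ge 0$ ($i, j \in N$, $i \ne j$), let the system of N coupled matrix Riccati differential equations

$$\frac{dZ_i^t(\tau)}{d\tau} + Z_i^t(\tau)F^t(\tau) + (F^t(\tau))' Z_i^t(\tau) + Q_i(t + \overline{T}\tau) + \overline{T}^2 \sum_{j \in N} Z_j^t(\tau) B_j(t + \overline{T}\tau) R_{jj}^{-1}(t + \overline{T}\tau) R_{ij}(t + \overline{T}\tau) R_{jj}^{-1}(t + \overline{T}\tau) B_j'(t + \overline{T}\tau) Z_j^t(\tau) = 0, \quad \tau \in [0, 1],$$

$$Z_i^t(1) = 0, \quad i \in N, \tag{8}$$

where

$$F^t(\tau) = \overline{T}A(t + \overline{T}\tau) - \overline{T}^2 \sum_{i \in N} B_i(t + \overline{T}\tau) R_{ii}^{-1}(t + \overline{T}\tau) B_i'(t + \overline{T}\tau) Z_i^t(\tau),$$

and $A(t)$, $B(t)$, $Q(t)$, $R(t)$ are bounded continuous functions, have a unique bounded solution $Z_i^t(\cdot) \ge 0$, $i \in N$, for every $t \ge t_0$. Then the linear-quadratic differential game with continuous updating has a linear feedback-based Nash equilibrium with continuous updating that is continuous in $t$:

$$u^{NE}_{fb,i}(t, x) = -R_{ii}^{-1}(t) B_i'(t) Z_i^t(0) \overline{T} x, \quad i \in N. \tag{9}$$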

Proof. In order to prove the theorem we introduce the following change of variables:

$$s = t + \overline{T}\tau, \qquad y^t(\tau) = x^t(t + \overline{T}\tau), \qquad v_i^t(\tau, y) = u_i^t(t + \overline{T}\tau, y), \quad i \in N. \tag{10}$$

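By substituting (10) into the motion equations (4) and the payoff function (5), we obtain

$$\dot{y}^t(\tau) = \overline{T}A(t + \overline{T}\tau) y^t(\tau) + \sum_{i=1}^{N} \overline{T}B_i(t + \overline{T}\tau) v_i^t(\tau, y) \tag{11}$$

and

$$K_i^t(y^t, \overline{T}; v^t) = \int_0^1 \Big[ (y^t(\tau))' Q_i(t + \overline{T}\tau) y^t(\tau) + \sum_{j=1}^{N} (v_j^t(\tau, y))' R_{ij}(t + \overline{T}\tau) v_j^t(\tau, y) \Big]\,d\tau, \quad i \in N, \tag{12}$$

where the constant positive factor $\overline{T}$ produced by $ds = \overline{T}\,d\tau$ is omitted from (12), since it does not affect the equilibrium strategies.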
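The passage from (4) to (11) is the chain rule applied to the change of variables (10). Written out as a short sketch, with $g$ standing for an arbitrary integrable integrand (a placeholder symbol not used in the paper):

```latex
\begin{align*}
\frac{d y^t(\tau)}{d\tau}
  &= \frac{d}{d\tau}\, x^t(t + \overline{T}\tau)
   = \overline{T}\,\dot{x}^t(s)\big|_{s = t + \overline{T}\tau} \\
  &= \overline{T}A(t+\overline{T}\tau)\,y^t(\tau)
   + \sum_{i=1}^{N} \overline{T}B_i(t+\overline{T}\tau)\,v_i^t(\tau, y), \\
\int_{t}^{t+\overline{T}} g(s)\,ds
  &= \overline{T}\int_{0}^{1} g(t+\overline{T}\tau)\,d\tau .
\end{align*}
```

The first two lines give the rescaled dynamics on $\tau \in [0, 1]$; the last line is the corresponding rescaling of an integral over the information horizon.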

Corollary 6.5 from (Basar et al., 1995) (sufficient conditions for the existence of a Nash equilibrium in an affine-quadratic game) and the existence of a solution $Z_i^t(\tau) \ge 0$ of the system of differential equations (8) lead to feedback Nash equilibrium strategies in the subgame $\Gamma(x, t, t + \overline{T})$ of the form

$$v_i^{t,NE}(\tau, y) = -R_{ii}^{-1}(t + \overline{T}\tau) B_i'(t + \overline{T}\tau) Z_i^t(\tau) \overline{T} y. \tag{13}$$

From (10) we have

$$\tau = \frac{s - t}{\overline{T}},$$

and returning to the original variables we obtain the following strategies:

$$u_i^{t,NE}(s, x) = -R_{ii}^{-1}(s) B_i'(s) Z_i^t\Big(\frac{s - t}{\overline{T}}\Big) \overline{T} x. \tag{14}$$

These strategies form a Nash equilibrium in feedback strategies in the subgame $\Gamma(x, t, t + \overline{T})$ by construction.

Problem (11), (12) and solution (13) have the same form for all values of $t$ in the original game with continuous updating. Hence the feedback Nash equilibrium in all subgames has the parametric form

$$u^{NE}_{fb,i}(t, s, x) = -R_{ii}^{-1}(s) B_i'(s) Z_i^t\Big(\frac{s - t}{\overline{T}}\Big) \overline{T} x. \tag{15}$$

Applying the procedure (7) to (15) with $s = t$, we determine the Nash equilibrium with continuous updating:

$$u^{NE}_{fb,i}(t, x) = -R_{ii}^{-1}(t) B_i'(t) Z_i^t(0) \overline{T} x, \quad t \in [t_0, +\infty), \quad i \in N.$$

To prove the continuity of (9) we use a classical theorem from ODE theory: if the right-hand side of an ODE is continuous and bounded, and the ODE has a unique solution through every point $(t_0, x_0)$, then the solution depends continuously on the initial point and on the right-hand side. The system (8) satisfies the conditions of this theorem, hence $Z_i^t(0)$ is continuous as a function of $t$. Thus the right-hand side of (9) is continuous. This proves the theorem.
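For scalar states and controls, system (8) reduces to $n$ coupled scalar ODEs that can be integrated backward from $Z_i^t(1) = 0$. The following is a minimal numerical sketch under that scalar assumption, using a hand-written classical Runge-Kutta step; the function names, the two-player data `B`, `Q`, `R`, and the coefficient `A_var` are hypothetical and chosen only for illustration.

```python
import math


def solve_feedback_riccati(t, T_bar, A, B, Q, R, steps=400):
    """Scalar form of (8): integrate from tau = 1 (Z = 0) back to tau = 0.

    A, B[i], Q[i], R[i][j] are functions of time s; returns [Z_1^t(0), ...].
    """
    n = len(B)

    def rhs(tau, Z):
        s = t + T_bar * tau
        F = T_bar * A(s) - T_bar**2 * sum(
            B[k](s) ** 2 * Z[k] / R[k][k](s) for k in range(n))
        dZ = []
        for i in range(n):
            cross = sum(Z[j] ** 2 * B[j](s) ** 2 * R[i][j](s) / R[j][j](s) ** 2
                        for j in range(n))
            dZ.append(-(2.0 * Z[i] * F + Q[i](s) + T_bar**2 * cross))
        return dZ

    def shifted(Z, k, c):
        return [z + c * kk for z, kk in zip(Z, k)]

    h = -1.0 / steps                      # negative step: backward in tau
    tau, Z = 1.0, [0.0] * n
    for _ in range(steps):
        k1 = rhs(tau, Z)
        k2 = rhs(tau + h / 2, shifted(Z, k1, h / 2))
        k3 = rhs(tau + h / 2, shifted(Z, k2, h / 2))
        k4 = rhs(tau + h, shifted(Z, k3, h))
        Z = [z + h / 6 * (a + 2 * b + 2 * c + d)
             for z, a, b, c, d in zip(Z, k1, k2, k3, k4)]
        tau += h
    return Z


def feedback_gain(i, t, T_bar, A, B, Q, R):
    """Gain g_i(t) in (9), so that u_i(t, x) = g_i(t) * x."""
    Z0 = solve_feedback_riccati(t, T_bar, A, B, Q, R)
    return -B[i](t) * Z0[i] * T_bar / R[i][i](t)


def const(v):
    return lambda s: v


# Hypothetical two-player data (not taken from the paper).
B = [const(1.0), const(1.0)]
Q = [const(5.0), const(5.0)]
R = [[const(1.0), const(0.1)], [const(0.1), const(1.0)]]
A_var = lambda s: 0.5 * math.sin(s)       # time-varying drift
```

Calling `feedback_gain(0, t, 1.0, A_var, B, Q, R)` for several values of `t` requires one backward integration per `t`, which is the "family of Riccati equations" that distinguishes the non-autonomous case; with a constant drift such as `const(0.01)` the result no longer depends on `t`.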

Remark 1. The non-autonomous case differs from the autonomous case in that the solution to the Riccati equations may differ at each moment in time, thus it is necessary to find a solution to the family of Riccati equations. In addition, in the autonomous case, the continuity in t of the strategy followed from the constancy of the solution to the Riccati equations, while in the non-autonomous case this continuity must be shown explicitly.

The form of open-loop-based Nash equilibrium with continuous updating was presented in (Kuchkarov and Petrosian, 2020). We will use these results later to study convergence in section 4.

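Theorem 2. For an N-person linear-quadratic differential game with $Q_i(\cdot) \ge 0$, $R_{ij}(\cdot) \ge 0$ ($i, j \in N$, $i \ne j$), let there exist a solution set $\{M_i^t,\ i \in N,\ t \ge t_0\}$ to the coupled matrix Riccati differential equations

$$\frac{dM_i^t(\tau)}{d\tau} + \overline{T}M_i^t(\tau)A(t + \overline{T}\tau) + \overline{T}A'(t + \overline{T}\tau)M_i^t(\tau) + Q_i(t + \overline{T}\tau) - \overline{T}^2 M_i^t(\tau) \sum_{j \in N} B_j(t + \overline{T}\tau) \big(R_{jj}(t + \overline{T}\tau)\big)^{-1} B_j'(t + \overline{T}\tau) M_j^t(\tau) = 0, \tag{16}$$

$$M_i^t(1) = 0, \quad \tau \in [0, 1], \quad i \in N.$$

Then the differential game with continuous updating admits an open-loop-based Nash equilibrium with continuous updating given by

$$u^{NE}_{ol,i}(t, x) = -R_{ii}^{-1}(t) B_i'(t) M_i^t(0) \overline{T} x, \quad i \in N.$$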

Remark 2. Notice that the open-loop-based solution with continuous updating has a feedback form; i. e. the open-loop-based Nash equilibrium with continuous updating explicitly depends on the current state. This happens because of the way the solution is constructed, when at each current time t players reconsider their decisions under the continuously updated information.

Example 1. Consider the following autonomous differential game model with two players in order to compare an open-loop-based and feedback-based Nash equilibrium with continuous updating. Let the motion equations in the subgame have the form

Xt(s) = + (17) X(t) = x, x g R1

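and the payoff function of player $i \in \{1, 2\}$ is defined as

$$K_i^t(x, t, \overline{T}, u^t) = \int_{t}^{t+\overline{T}} \Big( q\,(x^t(s))^2 + r_1 \big(u_i^t(s, x^t(s))\big)^2 + r_2 \big(u_j^t(s, x^t(s))\big)^2 \Big)\,ds, \tag{18}$$

where $j \in \{1, 2\}$, $j \ne i$, $q, r_1, r_2 > 0$.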

As we need to consider only one equation because of players' symmetry, for a feedback-based Nash equilibrium with continuous updating the Riccati differential equation has the form:

¿(t) = 2T^z(r) + 2TV(t) - - q, t G [0,1],

\ri r2y

¿(1) = o.

For an open-loop-based Nash equilibrium with continuous updating we obtain

m(T) = 2T^m(T) + 2T2^^^ - q, t G [0,1],

r1

m(1) = 0.

Note that, since in this example we consider the case of two players, the conditions of Theorems 1 and 2 become necessary as well. Therefore, the feedback-based and open-loop-based Nash equilibria with continuous updating for this model are unique. We simulated the Nash equilibria obtained from the Riccati equations above for the parameters ^ = 0.01, $r_1 = 1$, $r_2 = 0.1$, $q = 5$, $\overline{T} = 1$, $t \in [0, 2]$, $x(0) = 100$, and plotted the strategies (Fig. 1) and the corresponding equilibrium trajectories (Fig. 2) for the open-loop-based and feedback-based cases. The two solutions clearly differ, even though both of them in fact have a feedback form (they depend on the state $x$).
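The comparison can be reproduced in outline with a short sketch. It assumes scalar dynamics of the form $\dot{x}^t(s) = a\,x^t(s) + u_1^t + u_2^t$; this form and the symbol $a$ are assumptions made for illustration and are not taken from (17). Under that assumption, and using the symmetry of the players, (8) and (16) with payoff (18) reduce to $\dot{z} = -2\overline{T}a z + \overline{T}^2(3/r_1 - r_2/r_1^2)z^2 - q$ and $\dot{m} = -2\overline{T}a m + 2\overline{T}^2 m^2 / r_1 - q$ with $z(1) = m(1) = 0$. The function names are hypothetical.

```python
import math


def terminal_zero_riccati(rhs, steps=1000):
    """Classical RK4 for dz/dtau = rhs(z), from tau = 1 (z = 0) back to tau = 0."""
    h = -1.0 / steps
    z = 0.0
    for _ in range(steps):
        k1 = rhs(z)
        k2 = rhs(z + h / 2 * k1)
        k3 = rhs(z + h / 2 * k2)
        k4 = rhs(z + h * k3)
        z += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return z


def example_gains(a, r1, r2, q, T_bar):
    """Return (feedback-based gain, open-loop-based gain) with u_i = gain * x."""
    def fb(z):   # symmetric scalar form of (8) under the assumed dynamics
        return -2 * T_bar * a * z + T_bar**2 * (3 / r1 - r2 / r1**2) * z * z - q

    def ol(m):   # symmetric scalar form of (16) under the assumed dynamics
        return -2 * T_bar * a * m + 2 * T_bar**2 * m * m / r1 - q

    z0 = terminal_zero_riccati(fb)
    m0 = terminal_zero_riccati(ol)
    return -T_bar * z0 / r1, -T_bar * m0 / r1


def trajectory(x0, a, gain, t):
    """Closed-loop state when both players apply u_i = gain * x (autonomous case)."""
    return x0 * math.exp((a + 2 * gain) * t)
```

With `example_gains(0.01, 1.0, 0.1, 5.0, 1.0)` the two gains differ, so the two closed-loop trajectories started from `x0 = 100` separate, which is the qualitative behavior described for Fig. 1 and Fig. 2; the numbers themselves depend on the assumed dynamics.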

3.3. Existence, Uniqueness

An important assumption made in the previous section is the existence of a Nash equilibrium with continuous updating. According to the definition, for the existence of a Nash equilibrium with continuous updating $u^{NE}(t, x)$, it is necessary and sufficient to assume the existence of a Nash equilibrium $u^{NE}(t, s, x^t)$ in each subgame $\Gamma(x, t, t + \overline{T})$, $t \ge t_0$.

Fig. 1. $u_{fb}(t)$: feedback-based Nash equilibrium with continuous updating for (17), (18); $u_{ol}(t)$: open-loop-based Nash equilibrium with continuous updating for (17), (18).

Fig. 2. $x_{fb}(t)$: trajectory for the feedback-based NE with continuous updating for (17), (18); $x_{ol}(t)$: trajectory for the open-loop-based NE with continuous updating for (17), (18).

The problem of the uniqueness of the Nash equilibrium in differential games is not trivial. It is known (Eisele, 1982) that, even for the case of an autonomous linear-quadratic game, the open-loop Nash equilibrium can be non-unique even when sufficient conditions similar to the theorems presented above are satisfied.

Suppose that, in a game with continuous updating, each subgame $\Gamma(x, t, t + \overline{T})$, $t \ge t_0$, has at least two Nash equilibria on the interval $[t, t + \overline{T}]$. Then the set of Nash equilibria with continuous updating is uncountable, because players can switch from one Nash equilibrium to another at every current time instant $t$.

Example 2. Consider the following game model with continuous updating defined by the motion equations with strategies of players in open-loop form

xt(s)- Ms) 0

x (s) 0 u2(s)J ,

x4(t) = x, xl e R2

and payoff function of player i e {1,2} defined as

t+ f

x, t, , u I —

' ' 2, ) 2

1

| ((xi(s))' Qixi(s) + ds,

where

Qi —

2 -i -1 4

Q2 —

15 -2 -2 i 2 3 .

If x = (0,0)' then according to example 4.1 in Eisele, 1982 this differential

2 U =

3 2

subgame has trivial solution «1 = «2 = 0 and set of nontrivial solutions «1 a, a e R. Thus the set of open-loop-based Nash equilibria with continuous updating is formed by piece-wise continuous functions of the form

$$u_{ol,i}(t) = \begin{cases} 0, & t \in [t_0, t_1), \\ u_i^*(t), & t \in [t_1, +\infty), \end{cases}$$

where $t_1$ is a switching point from the trivial to a nontrivial solution and $u_i^*(t)$ is some nontrivial strategy of player $i$ after this switching point. Since the set of switching points is uncountable, the set of open-loop-based Nash equilibria with continuous updating is uncountable as well.

Moreover, an open-loop-based Nash equilibrium with continuous updating may be continuous or may have a jump discontinuity at some time instant $t_1$ where players switch their strategies from trivial to nontrivial. Thus, if there are several Nash equilibria in the subgames, then the Nash equilibrium with continuous updating may be discontinuous.

Lemma 1. Let the differential game be autonomous and let the Nash equilibrium in each subgame $\Gamma(x, t, t + \overline{T})$ be unique and continuous. Then the Nash equilibrium with continuous updating is unique and continuous.

Proof. The uniqueness of a Nash equilibrium with continuous updating follows from its definition and the uniqueness of the Nash equilibrium in each subgame $\Gamma(x, t, t + \overline{T})$.

In the autonomous case, players consider the same subgame $\Gamma(x, t, t + \overline{T})$ at every moment of time $t$. Without loss of generality, let $u^*(s, x)$ be the Nash equilibrium in this common subgame $\Gamma(x, t_0, t_0 + \overline{T})$; then, according to Definition 1, the Nash equilibrium with continuous updating has the form $u^{NE}(t, x) = u^*(t_0, x)$. Thus, the Nash equilibrium with continuous updating does not explicitly depend on time, and its continuity follows from the continuity of the Nash equilibrium in the subgame $\Gamma(x, t, t + \overline{T})$.

Remark 3. In the case of non-uniqueness of the solution, an algorithm can be constructed to select a single solution by imposing additional constraints on the system. For example, the change in strategy should be as small as possible. For such a criterion for resolving non-uniqueness in Example 2, we obtain the trivial solution

$$u_{ol,i}(t) = 0, \quad t \in [t_0, +\infty).$$

Study of the properties of Nash equilibrium with continuous updating is a direction for further research.

4. Convergence Results for Strategies and Trajectories

In this section, convergence results for Nash equilibrium strategies and trajectories with continuous updating are presented. To this end, the concept of dynamic updating is introduced, which models the noncooperative behavior of players when information is updated at discrete time instants. Convergence of the strategies and trajectories with dynamic updating to those with continuous updating is proved for both the open-loop-based and feedback-based cases.

4.1. LQ Game Model with Dynamic Updating

In papers (Gromova and Petrosian, 2016; Petrosian, 2016a; Petrosian, 2016b; Petrosian and Barabanov, 2017; Petrosian et al., 2017; Petrosian et al., 2018; Yeung and Petrosian, 2017) the method for constructing a differential game model with dynamic updating is described. There it is assumed that players have information about the game structure only over a truncated interval and make decisions based on it. In order to model the behavior of players when information updates dynamically, consider the case when information is updated every $\Delta t > 0$ and the behavior of players on each segment $[t_0 + j\Delta t, t_0 + (j + 1)\Delta t]$, $j = 0, 1, 2, \dots$, is modeled using the notion of a truncated subgame:

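Definition 2. Let $j = 0, 1, 2, \ldots$. A truncated subgame $\bar\Gamma_j(x_0, t_0 + j\Delta t, t_0 + j\Delta t + T)$ is a game defined on the interval $[t_0 + j\Delta t, t_0 + j\Delta t + T]$ in the following way. On the interval $[t_0 + j\Delta t, t_0 + j\Delta t + T]$ the payoff function and the motion equation of the truncated subgame coincide with those of the initial game model $\Gamma(x_0, t_0, T)$:

$$\dot x^j(s) = A(s)x^j(s) + B_1(s)v_1^j(s, x^j) + \ldots + B_n(s)v_n^j(s, x^j),$$

$$x^j(t_0 + j\Delta t) = x_0, \qquad (19)$$

$$x^j \in \mathbb{R}^n, \quad v^j = (v_1^j, \ldots, v_n^j), \quad v_i^j = v_i^j(s, x^j) \in U_i \subset \mathrm{comp}\,\mathbb{R}^k, \quad t \in [t_0, +\infty),$$

$$K_i^j(x^j, t_0 + j\Delta t, t_0 + j\Delta t + T; v^j) = \int_{t_0 + j\Delta t}^{t_0 + j\Delta t + T} \Big( (x^j(s))' Q_i(s) x^j(s) + \sum_{k=1}^{n} (v_k^j(s, x^j))' R_{ik}(s) v_k^j(s, x^j) \Big) ds, \quad i \in N. \qquad (20)$$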

At any instant $t = t_0 + j\Delta t$ information about the game structure updates, and therefore players adapt to it. This class of game models is called differential games with dynamic updating.

In the same way as in Section 3, we need to define a special form of the Nash equilibrium. According to the approach described above, at any time instant $t \in [t_0, +\infty)$ players have or use truncated information about the game structure, so classical approaches for determining optimal strategies cannot be applied directly. To determine the solution for games with dynamic updating, the notion of a feedback-based (open-loop-based) Nash equilibrium with dynamic updating is introduced:

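Definition 3. A feedback-based (open-loop-based) Nash equilibrium with dynamic updating $v^{NE}_{fb}(t,x) = (v^{NE}_{fb,1}(t,x), \ldots, v^{NE}_{fb,n}(t,x))$ ($v^{NE}_{ol}(t,x) = (v^{NE}_{ol,1}(t,x), \ldots, v^{NE}_{ol,n}(t,x))$) of players in the game model with dynamic updating has the form:

$$\{v^{NE}_{fb}(t,x)\}_{t=t_0}^{+\infty} = v^{j,NE}_{fb}(t,x), \quad t \in (t_0 + j\Delta t, t_0 + (j+1)\Delta t], \quad j = 0, 1, 2, \ldots$$

$$\left(\{v^{NE}_{ol}(t,x)\}_{t=t_0}^{+\infty} = v^{j,NE}_{ol}(t,x), \quad t \in (t_0 + j\Delta t, t_0 + (j+1)\Delta t], \quad j = 0, 1, 2, \ldots\right) \qquad (21)$$

where $v^{j,NE}_{fb}(t,x) = (v^{j,NE}_{fb,1}(t,x), \ldots, v^{j,NE}_{fb,n}(t,x))$ ($v^{j,NE}_{ol}(t,x) = (v^{j,NE}_{ol,1}(t,x), \ldots, v^{j,NE}_{ol,n}(t,x))$) is some fixed feedback-based (open-loop) Nash equilibrium in the truncated subgame $\bar\Gamma_j(x_0^{j,NE}, t_0 + j\Delta t, t_0 + j\Delta t + T)$, $j = 0, 1, 2, \ldots$, starting along the equilibrium trajectory of the previous truncated subgame: $x_0^{j,NE} = x^{j-1,NE}(t_0 + j\Delta t)$.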

It is important to notice that the Nash equilibrium with dynamic updating $v^{NE}(t,x)$ is not a Nash equilibrium in the classical sense, but it can be used as a solution concept related to the Nash equilibrium for the class of games with dynamic updating. The corresponding trajectory $\hat x^{NE}_{fb}(t)$ ($\hat x^{NE}_{ol}(t)$) is obtained from motion equation (1) and the feedback-based (open-loop-based) Nash equilibrium with dynamic updating $v^{NE}_{fb}(t,x) = (v^{NE}_{fb,1}(t,x), \ldots, v^{NE}_{fb,n}(t,x))$ ($v^{NE}_{ol}(t,x)$).
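The piecewise construction in (21) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `update_index`, `dynamic_updating_strategy` and `subgame_strategy` are hypothetical and do not come from the paper, and the initial instant $t = t_0$ is assigned to the interval $j = 0$.

```python
import math


def update_index(t, t0, dt):
    """Index j of the updating interval that contains t.

    Follows the convention of (21): t lies in (t0 + j*dt, t0 + (j+1)*dt].
    The initial instant t = t0 is assigned to j = 0 (an assumption).
    """
    if t < t0:
        raise ValueError("t must not precede t0")
    if t == t0:
        return 0
    return math.ceil((t - t0) / dt) - 1


def dynamic_updating_strategy(subgame_strategy, t0, dt):
    """Stitch per-subgame strategies v^j(t, x) into a single strategy v(t, x).

    subgame_strategy(j, t, x) is a hypothetical callable that returns the
    equilibrium control of truncated subgame j at time t and state x.
    """
    def v(t, x):
        j = update_index(t, t0, dt)
        return subgame_strategy(j, t, x)
    return v
```

With a toy rule such as `lambda j, t, x: -(j + 1) * x`, the stitched strategy switches from one subgame rule to the next exactly at the updating instants.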

4.2. Nash Equilibrium with Dynamic Updating

Another important set of results from this paper are sufficient conditions for the existence of a feedback-based and open-loop-based Nash equilibrium with dynamic updating for a non-autonomous case:

Theorem 3. For an N-person linear-quadratic differential game $\bar\Gamma_j(x_0, t_0 + j\Delta t, t_0 + j\Delta t + T)$ with dynamic updating with $Q_i(\cdot) \geq 0$, $R_{ik}(\cdot) \geq 0$ ($i, k \in N$, $i \neq k$), let the system of N coupled matrix Riccati differential equations

$$\frac{dZ_i^{t_j}(\tau)}{d\tau} + Z_i^{t_j}(\tau)F^{t_j}(\tau) + (F^{t_j}(\tau))' Z_i^{t_j}(\tau) + Q_i(t_j + T\tau) + T^2 \sum_{k \in N} Z_k^{t_j}(\tau) B_k(t_j + T\tau) R_{kk}^{-1}(t_j + T\tau) R_{ik}(t_j + T\tau) R_{kk}^{-1}(t_j + T\tau) B_k'(t_j + T\tau) Z_k^{t_j}(\tau) = 0,$$

$$Z_i^{t_j}(1) = 0, \quad \tau \in [0,1], \quad i \in N, \qquad (22)$$

where $t_j = t_0 + j\Delta t$ and

$$F^{t_j}(\tau) = T A(t_j + T\tau) - T^2 \sum_{i \in N} B_i(t_j + T\tau) R_{ii}^{-1}(t_j + T\tau) B_i'(t_j + T\tau) Z_i^{t_j}(\tau),$$

have a solution $Z_i^{t_j}(\cdot) \geq 0$, $i \in N$, for $t \geq t_0 + j\Delta t$. Then the linear-quadratic differential game with dynamic updating has a feedback-based Nash equilibrium with dynamic updating, which for $t \in [t_0 + j\Delta t, t_0 + (j+1)\Delta t]$ is given by

$$v^{NE}_{fb,i}(t,x) = -R_{ii}^{-1}(t) B_i'(t) Z_i^{t_j}\left(\frac{t - (t_0 + j\Delta t)}{T}\right) T x, \quad j = 0, 1, 2, \ldots, \quad i \in N. \qquad (23)$$

Proof. The proof of this theorem is similar to the proof of Theorem 1. The essential difference is the time interval on which the solutions of the whole game and of the subgame are matched. On the one hand, the Nash equilibrium with continuous updating coincides with the Nash equilibrium in subgame $\Gamma(x, t, t + T)$ at only one point $t$. On the other hand, the Nash equilibrium with dynamic updating coincides with the Nash equilibrium in subgame $\bar\Gamma_j(x_0, t_0 + j\Delta t, t_0 + j\Delta t + T)$ on the time interval $[t_0 + j\Delta t, t_0 + (j+1)\Delta t]$. Thus the proof of this theorem repeats the proof of Theorem 1 down to (14).

In a game with dynamic updating, information is updated only at discrete time moments $t_j = t_0 + j\Delta t$, $j = 0, 1, 2, \ldots$, and (14) specified for the game with dynamic updating reads

$$v^{j,NE}_{fb,i}(s,x) = -R_{ii}^{-1}(s) B_i'(s) Z_i^{t_j}\left(\frac{s - t_j}{T}\right) T x,$$

which equals (23).
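For a scalar state, the system (22) and the strategy (23) can be explored numerically. The sketch below is a minimal illustration, not the authors' implementation: it assumes scalar coefficients supplied as Python callables, integrates (22) backward from $Z_i(1) = 0$ with a classical Runge-Kutta scheme, and evaluates (23) at the nearest grid node. The function names and the argument layout are hypothetical.

```python
import math


def solve_scalar_riccati(tj, T, a, b, q, r, steps=1000):
    """Integrate the scalar form of (22) backward from Z_i(1) = 0.

    a(s), b[i](s), q[i](s), r[i][k](s) are callables returning the scalar
    coefficients A, B_i, Q_i, R_ik at real time s = tj + T * tau.
    Returns grid with grid[m] = [Z_1, ..., Z_n] at tau = m / steps.
    """
    n = len(b)

    def rhs(tau, z):
        s = tj + T * tau
        bs = [b[i](s) for i in range(n)]
        rd = [r[i][i](s) for i in range(n)]
        # F = T*A - T^2 * sum_i B_i^2 Z_i / R_ii
        f = T * a(s) - T ** 2 * math.fsum(bs[i] ** 2 / rd[i] * z[i] for i in range(n))
        out = []
        for i in range(n):
            cross = math.fsum(
                z[k] ** 2 * bs[k] ** 2 * r[i][k](s) / rd[k] ** 2 for k in range(n)
            )
            out.append(-(2.0 * z[i] * f + q[i](s) + T ** 2 * cross))
        return out

    h = -1.0 / steps  # negative step: integrate from tau = 1 down to tau = 0
    z = [0.0] * n
    grid = [None] * (steps + 1)
    grid[steps] = z[:]
    for m in range(steps, 0, -1):
        tau = m / steps
        k1 = rhs(tau, z)
        k2 = rhs(tau + h / 2, [z[i] + h / 2 * k1[i] for i in range(n)])
        k3 = rhs(tau + h / 2, [z[i] + h / 2 * k2[i] for i in range(n)])
        k4 = rhs(tau + h, [z[i] + h * k3[i] for i in range(n)])
        z = [z[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(n)]
        grid[m - 1] = z[:]
    return grid


def feedback_strategy(grid, i, tj, T, b_i, r_ii):
    """Scalar version of (23) for player i on the interval starting at tj."""
    steps = len(grid) - 1

    def v(t, x):
        tau = (t - tj) / T
        m = min(steps, max(0, round(tau * steps)))  # nearest grid node
        return -b_i(t) / r_ii(t) * grid[m][i] * T * x

    return v
```

For two symmetric players with constant coefficients and $T = 1$, the value returned at $\tau = 0$ can be compared with the hyperbolic-tangent closed form that arises in the example of Section 5.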

Theorem 4. For an N-person linear-quadratic differential game $\bar\Gamma_j(x_0, t_0 + j\Delta t, t_0 + j\Delta t + T)$ with dynamic updating with $Q_i(\cdot) \geq 0$, $R_{ik}(\cdot) \geq 0$ ($i, k \in N$, $i \neq k$), let there exist a solution set $\{M_i^{t_j}, i \in N, t_j \geq t_0\}$ to the coupled matrix Riccati differential equations

$$\frac{dM_i^{t_j}(\tau)}{d\tau} + T M_i^{t_j}(\tau) A(t_j + T\tau) + T A'(t_j + T\tau) M_i^{t_j}(\tau) + Q_i(t_j + T\tau) - T^2 M_i^{t_j}(\tau) \sum_{k \in N} B_k(t_j + T\tau) \left(R_{kk}(t_j + T\tau)\right)^{-1} B_k'(t_j + T\tau) M_k^{t_j}(\tau) = 0, \qquad (24)$$

$$M_i^{t_j}(1) = 0, \quad \tau \in [0,1], \quad i \in N.$$

Then the differential game with dynamic updating admits an open-loop-based Nash equilibrium with dynamic updating, which for $t \in [t_0 + j\Delta t, t_0 + (j+1)\Delta t]$ is given by

$$v^{NE}_{ol,i}(t,x) = -R_{ii}^{-1}(t) B_i'(t) M_i^{t_j}\left(\frac{t - (t_0 + j\Delta t)}{T}\right) \Phi^{t_j}\left(\frac{t - (t_0 + j\Delta t)}{T}\right) T x(t_0 + j\Delta t), \quad j = 0, 1, 2, \ldots, \quad i \in N,$$

where $\Phi^{t_j}(\tau)$ is the solution of the following equation

$$\frac{d\Phi^{t_j}(\tau)}{d\tau} = \left( A(t_j + T\tau) - \sum_{i \in N} B_i(t_j + T\tau) R_{ii}^{-1}(t_j + T\tau) B_i'(t_j + T\tau) \right) \Phi^{t_j}(\tau),$$

$$\Phi^{t_j}(0) = E.$$

Proof. The proof of this Theorem can be obtained using Theorem 2 in the same way as we obtained the proof of Theorem 3 using Theorem 1.

4.3. Convergence of Nash Equilibrium Strategies and Trajectory

Now we show that the Nash equilibrium and the corresponding equilibrium trajectory of a differential game with dynamic updating converge to the corresponding equilibrium and trajectory of a differential game with continuous updating.

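Lemma 2. Let a matrix function $U(t)$ be uniformly bounded for $t \geq t_0$, and let a parametric matrix function $P^t(\tau)$ be continuously differentiable with respect to both $t$ and $\tau$, with $\frac{\partial P^t(\tau)}{\partial t}$ and $\frac{\partial P^t(\tau)}{\partial \tau}$ uniformly bounded for $t \geq t_0$, $\tau \in [0,1]$.

For $\Delta t \to 0$, $x \in X$ ($X$ is a bounded set), $t \in [t_0 + j\Delta t, t_0 + (j+1)\Delta t]$ and $\Delta t < T$ the following uniform convergence on $[t_0, +\infty)$ holds:

$$U(t) P^{t_0 + j\Delta t}\left(\frac{t - (t_0 + j\Delta t)}{T}\right) x \rightrightarrows U(t) P^t(0) x.$$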

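Proof. Introduce the notation $t_j \overset{\mathrm{def}}{=} t_0 + j\Delta t$, then $t \in [t_j, t_{j+1}]$. We need to show that

$$\left\| U(t) P^t(0) x - U(t) P^{t_j}\left(\frac{t - t_j}{T}\right) x \right\| \to 0 \quad \text{when } \Delta t \to 0.$$

From the Taylor expansion of $P^t(\tau)$ at the point $t = t_j$ we obtain:

$$P^t(\tau) = P^{t_j}(\tau) + \left.\frac{\partial P^t(\tau)}{\partial t}\right|_{t = t_j} (t - t_j) + o(t - t_j).$$

From the Taylor expansion of $P^{t_j}(\tau)$ at the point $\tau = 0$ we obtain:

$$P^{t_j}(\tau) = P^{t_j}(0) + \left.\frac{\partial P^{t_j}(\tau)}{\partial \tau}\right|_{\tau = 0} \tau + o(\tau).$$

As a result we have the estimate:

$$\left\| U(t) P^t(0) x - U(t) P^{t_j}\left(\frac{t - t_j}{T}\right) x \right\| \leq \|U(t)\| \|x\| \left\| P^t(0) - P^{t_j}\left(\frac{t - t_j}{T}\right) \right\| \leq \|U(t)\| \|x\| \left( \left\| \left.\frac{\partial P^t(\tau)}{\partial t}\right|_{t = t_j} \right\| \Delta t + \left\| \left.\frac{\partial P^{t_j}(\tau)}{\partial \tau}\right|_{\tau = 0} \right\| \tau + o(\Delta t) \right), \qquad (25)$$

where $\tau = \frac{\Delta t}{T}$.

When $\Delta t \to 0$ the right-hand side of (25) converges to zero, and as a result the left-hand side of (25) also converges to zero. This completes the proof.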
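The estimate (25) can be illustrated numerically on a toy example. The sketch below is not taken from the paper: the scalar functions `P(t, tau) = sin(t) * (1 - tau)` and `U(t) = 1 / (1 + t)` are arbitrary choices made here so that both partial derivatives of `P` and the function `U` are bounded by 1 in absolute value. For this choice the mean value theorem gives a gap of at most $(1 + 1/T)\Delta t$ for $|x| = 1$.

```python
import math


def P(t, tau):
    """Toy parametric function; both partial derivatives are bounded by 1."""
    return math.sin(t) * (1.0 - tau)


def U(t):
    """Toy bounded coefficient, |U(t)| <= 1 for t >= 0."""
    return 1.0 / (1.0 + t)


def sup_gap(dt, T=1.0, t0=0.0, t_end=10.0, samples_per_interval=20, x=1.0):
    """Largest sampled value of |U(t) P^t(0) x - U(t) P^{t_j}((t - t_j)/T) x|."""
    n_intervals = int(round((t_end - t0) / dt))
    worst = 0.0
    for j in range(n_intervals):
        tj = t0 + j * dt
        for m in range(samples_per_interval + 1):
            t = tj + dt * m / samples_per_interval
            continuous = U(t) * P(t, 0.0) * x          # continuous updating side
            dynamic = U(t) * P(tj, (t - tj) / T) * x   # dynamic updating side
            worst = max(worst, abs(continuous - dynamic))
    return worst
```

Calling `sup_gap` with a decreasing sequence of `dt` values shows the sampled gap shrinking together with the updating interval, in line with the lemma.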

Theorem 5. Let the conditions of Theorem 1 be satisfied, the Nash equilibrium be unique in game $\Gamma(x, t, t+T)$ $\forall t \geq t_0$, $R_{ii}^{-1}(t)B_i'(t)$ be uniformly bounded for $t \geq t_0$, and the solution $Z_i^t(\tau)$ of Riccati equation (8) be continuously differentiable with respect to $t$ and $\tau$, with $\frac{\partial Z_i^t(\tau)}{\partial t}$, $\frac{\partial Z_i^t(\tau)}{\partial \tau}$ uniformly bounded for $t \geq t_0$, $\tau \in [0,1]$.

For $\Delta t \to 0$ and $x \in X$ ($X$ is a bounded set) the feedback-based Nash equilibrium with dynamic updating $v^{NE}_{fb}(t,x)$ uniformly converges to the feedback-based Nash equilibrium with continuous updating $u^{NE}_{fb}(t,x)$:

$$v^{NE}_{fb,i}(t,x) \rightrightarrows u^{NE}_{fb,i}(t,x), \quad i \in N. \qquad (26)$$

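Proof. With notation $t_j \overset{\mathrm{def}}{=} t_0 + j\Delta t$ and $t \in [t_j, t_{j+1}]$ consider the expressions for $u^{NE}_{fb,i}$ and $v^{NE}_{fb,i}$:

$$u^{NE}_{fb,i}(t,x) = -R_{ii}^{-1}(t) B_i'(t) Z_i^t(0) T x,$$

$$v^{NE}_{fb,i}(t,x) = -R_{ii}^{-1}(t) B_i'(t) Z_i^{t_j}\left(\frac{t - t_j}{T}\right) T x, \quad t \in [t_0 + j\Delta t, t_0 + (j+1)\Delta t],$$

where $Z_i^t(\tau)$ is the solution of (8). Let $U(t) = -T R_{ii}^{-1}(t) B_i'(t)$, $P^t(\tau) = Z_i^t(\tau)$. Then the application of Lemma 2 completes the proof.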

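Theorem 6. Let the conditions of Theorem 2 be satisfied, the Nash equilibrium be unique in game $\Gamma(x, t, t+T)$ $\forall t \geq t_0$, $R_{ii}^{-1}(t)B_i'(t)$ be uniformly bounded for $t \geq t_0$, and the product $M_i^t(\tau)\Phi^t(\tau)$ of the solution $M_i^t(\tau)$ of Riccati equation (22) and $\Phi^t(\tau)$ be continuously differentiable with respect to both $t$ and $\tau$, with $\frac{\partial (M_i^t(\tau)\Phi^t(\tau))}{\partial t}$, $\frac{\partial (M_i^t(\tau)\Phi^t(\tau))}{\partial \tau}$ uniformly bounded for $t \geq t_0$, $\tau \in [0,1]$, where $\Phi^t(\tau)$ is the solution of

$$\frac{d\Phi^t(\tau)}{d\tau} = \left( A(t + T\tau) - \sum_{i \in N} B_i(t + T\tau) R_{ii}^{-1}(t + T\tau) B_i'(t + T\tau) \right) \Phi^t(\tau),$$

$$\Phi^t(0) = E.$$

For $\Delta t \to 0$ and $x \in X$ ($X$ is a bounded set) the open-loop-based Nash equilibrium with dynamic updating $v^{NE}_{ol}(t,x)$ uniformly converges to the open-loop-based Nash equilibrium with continuous updating $u^{NE}_{ol}(t,x)$:

$$v^{NE}_{ol,i}(t,x) \rightrightarrows u^{NE}_{ol,i}(t,x), \quad i \in N. \qquad (27)$$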

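Proof. With notation $t_j \overset{\mathrm{def}}{=} t_0 + j\Delta t$ and $t \in [t_j, t_{j+1}]$, consider the expressions for $u^{NE}_{ol,i}$ and $v^{NE}_{ol,i}$:

$$u^{NE}_{ol,i}(t,x) = -R_{ii}^{-1}(t) B_i'(t) M_i^t(0) T x,$$

$$v^{NE}_{ol,i}(t,x) = -R_{ii}^{-1}(t) B_i'(t) M_i^{t_j}\left(\frac{t - (t_0 + j\Delta t)}{T}\right) \Phi^{t_j}\left(\frac{t - (t_0 + j\Delta t)}{T}\right) T x, \quad t \in [t_0 + j\Delta t, t_0 + (j+1)\Delta t],$$

where $M_i^t(\tau)$ is the solution of (22).

Let $U(t) = -T R_{ii}^{-1}(t) B_i'(t)$, $P^t(\tau) = M_i^t(\tau)\Phi^t(\tau)$. Then the application of Lemma 2 completes the proof.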

Lemma 3. Let matrix functions $V_1(t)$, $V_2(t)$ be uniformly bounded for $t \geq t_0$, and let a parametric matrix function $P^t(\tau)$ be continuously differentiable with respect to both $t$ and $\tau$, with $\frac{\partial P^t(\tau)}{\partial t}$ and $\frac{\partial P^t(\tau)}{\partial \tau}$ uniformly bounded for $t \geq t_0$, $\tau \in [0,1]$.

Consider the differential equation for $y(t)$, $t \in [t_0, +\infty)$,

$$\frac{dy(t)}{dt} = \left(V_1(t) + V_2(t) P^t(0)\right) y(t), \qquad (28)$$

where $y(t_0) = y_0$.

Consider the differential equation for $z_j(t)$, $t \in [t_0 + j\Delta t, t_0 + (j+1)\Delta t]$, $\Delta t < T$,

$$\frac{dz_j(t)}{dt} = \left(V_1(t) + V_2(t) P^{t_0 + j\Delta t}\left(\frac{t - (t_0 + j\Delta t)}{T}\right)\right) z_j(t), \qquad (29)$$

where $z_j(t_0 + j\Delta t) = z_{j-1}(t_0 + j\Delta t)$ for $j \geq 1$, $z_0(t_0) = y_0$. Let $y^*(t)$ be a solution of (28) and

$$z^*(t) = \begin{cases} z_0(t), & t \in [t_0, t_0 + \Delta t], \\ z_j(t), & t \in (t_0 + j\Delta t, t_0 + (j+1)\Delta t], \end{cases}$$

where $z_j(t)$ satisfies (29). Let $y^*(t)$, $z^*(t)$ exist for $t \geq t_0$. Then $z^*(t)$ converges point-wise to $y^*(t)$ as $\Delta t \to 0$:

$$z^*(t) \xrightarrow[\Delta t \to 0]{} y^*(t).$$

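Proof. Introduce the notation $t_j \overset{\mathrm{def}}{=} t_0 + j\Delta t$, then $t \in [t_j, t_{j+1}]$. We need to show that $\|z^*(t) - y^*(t)\| \to 0$ when $\Delta t \to 0$.

Trajectories $z^*(t)$ and $y^*(t)$ satisfy the differential equations (29), (28) for $t \in [t_j, t_{j+1}]$. Notice that

$$P^t(0) y^* - P^{t_j}\left(\frac{t - t_j}{T}\right) z^* = P^t(0)(y^* - z^*) + \left(P^t(0) - P^{t_j}\left(\frac{t - t_j}{T}\right)\right) z^*.$$

Let $w_j(t) = y^*(t) - z^*(t)$ for $t \in [t_j, t_{j+1}]$, $V_3(t) = V_1(t) + V_2(t) P^t(0)$ and

$$f_j(t) = V_2(t) \left(P^t(0) - P^{t_j}\left(\frac{t - t_j}{T}\right)\right) z^*(t).$$

Then $w_j(t)$ satisfies the following differential equation

$$\frac{dw_j(t)}{dt} = V_3(t) w_j(t) + f_j(t).$$

Consider

$$w(t) = \begin{cases} w_0(t), & t \in [t_0, t_0 + \Delta t], \\ w_j(t), & t \in (t_0 + j\Delta t, t_0 + (j+1)\Delta t], \end{cases} \qquad (30)$$

and

$$f(t) = \begin{cases} f_0(t), & t \in [t_0, t_0 + \Delta t], \\ f_j(t), & t \in (t_0 + j\Delta t, t_0 + (j+1)\Delta t], \end{cases}$$

then (30) satisfies the following differential equation

$$\dot w(t) = V_3(t) w(t) + f(t)$$

with initial state $w(t_0) = 0$, since $z(t_0) = y(t_0)$. By the Cauchy formula we have for any $t \geq t_0$

$$w(t) = Y(t) \int_{t_0}^{t} Y^{-1}(\xi) f(\xi) \, d\xi,$$

where $Y(t)$ is the fundamental matrix of $\dot w(t) = V_3(t) w(t)$. Taking this into account we have for fixed $t$

$$\lim_{\Delta t \to 0} \|w(t)\| \leq \lim_{\Delta t \to 0} \left[ \|Y(t)\| \, \omega(t) \, \varepsilon \, \Delta t \, (t - t_0) + o(\Delta t) \right] = 0, \qquad (31)$$

where

$$\omega(t) = \max_{\xi \in [t_0, t]} \|Y^{-1}(\xi)\|,$$

$$\varepsilon = \|V_2(t)\| \left( \left\| \frac{\partial P^t(\tau)}{\partial t} \right\| + \left\| \left.\frac{\partial P^t(\tau)}{\partial \tau}\right|_{\tau = 0} \right\| \right) M(t),$$

$$M(t) = \max_{\xi \in [t_0, t]} \|z(\xi)\|.$$

According to (31), $w(t) \to 0$ when $\Delta t \to 0$. This proves the lemma.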
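The statement of Lemma 3 can be illustrated by integrating scalar versions of (28) and (29) side by side. The sketch below rests on assumptions made only for illustration: $V_1(t) = V_2(t) = -1$, a toy function `P(t, tau) = (1 - tau) * (1 + 0.5 * sin(t))`, and a classical Runge-Kutta integrator whose steps are aligned with the updating instants. None of these choices come from the paper.

```python
import math


def P(t, tau):
    """Toy parametric function with bounded partial derivatives."""
    return (1.0 - tau) * (1.0 + 0.5 * math.sin(t))


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def max_gap(dt, T=1.0, t0=0.0, t_end=2.0, y0=1.0, substeps=50):
    """Largest |y(t) - z(t)| on the grid, with V1 = V2 = -1 (assumed)."""
    n_intervals = int(round((t_end - t0) / dt))
    h = dt / substeps
    y = z = y0
    worst = 0.0
    for j in range(n_intervals):
        tj = t0 + j * dt
        # (28): coefficient uses P^t(0), refreshed at every instant t
        f_y = lambda t, v: (-1.0 - P(t, 0.0)) * v
        # (29): coefficient uses P^{t_j}((t - t_j)/T), frozen at the last update
        f_z = lambda t, v, tj=tj: (-1.0 - P(tj, (t - tj) / T)) * v
        for m in range(substeps):
            t = tj + m * h
            y = rk4_step(f_y, t, y, h)
            z = rk4_step(f_z, t, z, h)
            worst = max(worst, abs(y - z))
    return worst
```

Evaluating `max_gap` for a decreasing sequence of `dt` values gives a decreasing sequence of gaps, which is the behavior the lemma describes for a fixed time horizon.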


Theorem 7. Let the conditions of Theorem 1 be satisfied, the Nash equilibrium be unique in game $\Gamma(x, t, t + T)$ $\forall t \geq t_0$, $A(t)$ and $B_i(t)R_{ii}^{-1}(t)B_i'(t)$ be uniformly bounded for $i \in N$ and $t \geq t_0$, and the solution $Z_i^t(\tau)$ of Riccati equation (8) be continuously differentiable with respect to both $t$ and $\tau$, with $\frac{\partial Z_i^t(\tau)}{\partial t}$, $\frac{\partial Z_i^t(\tau)}{\partial \tau}$ uniformly bounded for $t \geq t_0$, $\tau \in [0,1]$.

The feedback-based equilibrium trajectory $\hat x^{NE}_{fb}(t)$ in the game with dynamic updating converges point-wise to the feedback-based equilibrium trajectory $x^{NE}_{fb}(t)$ in the game with continuous updating as $\Delta t \to 0$:

$$\hat x^{NE}_{fb}(t) \xrightarrow[\Delta t \to 0]{} x^{NE}_{fb}(t). \qquad (32)$$

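Proof. Let

$$V_1(t) = A(t),$$

$$V_2(t) = \left[-T B_1(t) R_{11}^{-1}(t) B_1'(t), \; \ldots, \; -T B_N(t) R_{NN}^{-1}(t) B_N'(t)\right],$$

$$P^t(\tau) = \begin{bmatrix} Z_1^t(\tau) \\ \vdots \\ Z_N^t(\tau) \end{bmatrix}.$$

Then applying Lemma 3 completes the proof.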

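Theorem 8. Let the conditions of Theorem 2 be satisfied, the Nash equilibrium be unique in game $\Gamma(x, t, t + T)$ $\forall t \geq t_0$, $A(t)$ and $B_i(t)R_{ii}^{-1}(t)B_i'(t)$ be uniformly bounded for $i \in N$ and $t \geq t_0$, and the product $M_i^t(\tau)\Phi^t(\tau)$ be continuously differentiable with respect to both $t$ and $\tau$, with $\frac{\partial (M_i^t(\tau)\Phi^t(\tau))}{\partial t}$, $\frac{\partial (M_i^t(\tau)\Phi^t(\tau))}{\partial \tau}$ uniformly bounded for $t \geq t_0$, $\tau \in [0,1]$, where $M_i^t(\tau)$ is the solution of Riccati equation (22) and $\Phi^t(\tau)$ is the solution of

$$\frac{d\Phi^t(\tau)}{d\tau} = \left( A(t + T\tau) - \sum_{i \in N} B_i(t + T\tau) R_{ii}^{-1}(t + T\tau) B_i'(t + T\tau) \right) \Phi^t(\tau),$$

$$\Phi^t(0) = E.$$

The open-loop-based equilibrium trajectory $\hat x^{NE}_{ol}(t)$ in the game with dynamic updating converges point-wise to the open-loop-based equilibrium trajectory $x^{NE}_{ol}(t)$ in the game with continuous updating as $\Delta t \to 0$:

$$\hat x^{NE}_{ol}(t) \xrightarrow[\Delta t \to 0]{} x^{NE}_{ol}(t). \qquad (33)$$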

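Proof. Let

$$V_1(t) = A(t),$$

$$V_2(t) = \left[-T B_1(t) R_{11}^{-1}(t) B_1'(t), \; \ldots, \; -T B_N(t) R_{NN}^{-1}(t) B_N'(t)\right],$$

$$P^t(\tau) = \begin{bmatrix} M_1^t(\tau)\Phi^t(\tau) \\ \vdots \\ M_N^t(\tau)\Phi^t(\tau) \end{bmatrix}.$$

Then the application of Lemma 3 completes the proof.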

Later in section 5, convergence results will be demonstrated using one differential game model.


5. Example Model

5.1. Common Description

Consider a non-autonomous linear quadratic game model with two players. Assume that the dynamics of the state $x$ are described by

$$\dot x(t) = -\beta t^{2p} x(t) + t^p u_1(t,x) + t^p u_2(t,x), \quad x(t_0) = x_0. \qquad (34)$$

Assume that the cost function of each player is given by

$$K_i(x_0, t_0, T; u) = \int_{t_0}^{T} \left( t^{2p} q_i x^2(t) + r_i u_i^2(t,x) \right) dt, \quad i = 1, 2. \qquad (35)$$

5.2. Game Model with Continuous Updating

Now consider the non-autonomous case with continuous updating. Here we suppose that at each time instant $t \in [t_0, +\infty)$ the two players use information about the motion equations and payoff functions on the interval $[t, t + T]$. As the current time $t$ evolves, the interval that defines the information shifts as well. The motion equations for the game model with continuous updating have the form

$$\dot x^t(s) = -\beta s^{2p} x^t(s) + s^p u_1^t(s,x) + s^p u_2^t(s,x), \quad x^t(t) = x, \quad t \in [t_0, +\infty). \qquad (36)$$

The cost function of player $i \in N$ for the game model with continuous updating is defined as

$$K_i^t(x^t, t, T; u^t) = \int_{t}^{t+T} \left( (x^t(s))^2 q s^{2p} + (u_i^t(s,x))^2 \right) ds, \quad i = 1, 2. \qquad (37)$$

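According to Theorem 1, which defines the form of the feedback-based Nash equilibrium with continuous updating, the first step is to solve the following differential equation:

$$\dot k^t(\tau) = (t + \tau T)^{2p} \left[ 2T\beta k^t(\tau) + 3T^2 (k^t(\tau))^2 - q \right], \quad k^t(1) = 0. \qquad (38)$$

The solution of (38) is

$$k^t(\tau) = \frac{\nu}{3T} \tanh\left[ \frac{T\nu}{2p+1} \left( (t + T)^{2p+1} - (t + \tau T)^{2p+1} \right) + \tanh^{-1}\frac{\beta}{\nu} \right] - \frac{\beta}{3T}, \qquad (39)$$

where $\nu = \sqrt{3q + \beta^2}$. According to (9) the feedback-based Nash equilibrium with continuous updating has the form:

$$u_i^{NE}(t,x) = -k^t(0) \, x \, T t^p. \qquad (40)$$

By substituting (39) in (40) we obtain:

$$u_i^{NE}(t,x) = \frac{\beta x t^p}{3} - \frac{\nu x t^p}{3} \tanh\left[ \frac{T\nu}{2p+1} \left( (t + T)^{2p+1} - t^{2p+1} \right) + \tanh^{-1}\frac{\beta}{\nu} \right]. \qquad (41)$$

By substituting (41) in (34) we obtain $x^{NE}(t)$ as the solution of the equation

$$\dot x^{NE}(t) = -\beta t^{2p} x^{NE}(t) + t^p u_1^{NE}(t,x) + t^p u_2^{NE}(t,x). \qquad (42)$$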
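The closed form (39) can be checked against a direct numerical integration of (38). The sketch below is an illustration under stated assumptions: the drift coefficient is read as $\beta$, $\nu = \sqrt{3q + \beta^2}$, the parameter values are those listed in Section 5.5, and the check is run only for $T = 1$. The function names are hypothetical.

```python
import math

# Parameter values from Section 5.5; BETA is our reading of the drift coefficient.
BETA, P_EXP, Q, T = 0.01, 0.4, 0.5, 1.0
NU = math.sqrt(3.0 * Q + BETA ** 2)


def k_closed(t, tau):
    """Closed-form expression (39) for k^t(tau)."""
    e = 2.0 * P_EXP + 1.0
    arg = T * NU / e * ((t + T) ** e - (t + tau * T) ** e) + math.atanh(BETA / NU)
    return NU / (3.0 * T) * math.tanh(arg) - BETA / (3.0 * T)


def riccati_rhs(t, tau, k):
    """Right-hand side of (38)."""
    return (t + tau * T) ** (2.0 * P_EXP) * (2.0 * T * BETA * k + 3.0 * T ** 2 * k ** 2 - Q)


def k_numeric(t, tau, steps=2000):
    """Integrate (38) backward from k^t(1) = 0 down to tau with RK4."""
    h = (tau - 1.0) / steps  # negative step
    s, k = 1.0, 0.0
    for _ in range(steps):
        k1 = riccati_rhs(t, s, k)
        k2 = riccati_rhs(t, s + h / 2, k + h / 2 * k1)
        k3 = riccati_rhs(t, s + h / 2, k + h / 2 * k2)
        k4 = riccati_rhs(t, s + h, k + h * k3)
        k += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        s += h
    return k


def u_continuous(t, x):
    """Strategy (40): u(t, x) = -k^t(0) * x * T * t^p."""
    return -k_closed(t, 0.0) * x * T * t ** P_EXP
```

Comparing `k_numeric(t, tau)` with `k_closed(t, tau)` for a few values of `t` and `tau` gives a quick consistency check of the formula before it is used in a simulation.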

5.3. Game Model with Dynamic Updating

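Performing similar calculations for the game with dynamic updating, based on the calculations for the original game and the approach described in Section 4.1, we obtain

$$\hat u_i^{NE}(t,x) = -k^{t_j}\left(\frac{t - t_j}{T}\right) x \, T t^p, \quad t \in [t_j, t_{j+1}]. \qquad (43)$$

By substituting (39) in (43) we obtain:

$$\hat u_i^{NE}(t,x) = \frac{\beta x t^p}{3} - \frac{\nu x t^p}{3} \tanh\left[ \frac{T\nu}{2p+1} \left( (t_j + T)^{2p+1} - t^{2p+1} \right) + \tanh^{-1}\frac{\beta}{\nu} \right]. \qquad (44)$$

By substituting (44) in (34) we obtain $\hat x^{NE}(t)$ as the solution of the equation

$$\dot{\hat x}^{NE}(t) = -\beta t^{2p} \hat x^{NE}(t) + t^p \hat u_1^{NE}(t,x) + t^p \hat u_2^{NE}(t,x), \quad \hat x^{NE}(t_0) = x_0. \qquad (45)$$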

5.4. Autonomous and Non-autonomous Cases

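Suppose that information does not change in time, i.e. consider the autonomous case with continuous updating. To do that we fix $t = t_0$ in (36) and (37). We then obtain the strategies

$$u_{0,i}^{NE}(t,x) = \frac{\beta x t_0^p}{3} - \frac{\nu x t_0^p}{3} \tanh\left[ \frac{T\nu}{2p+1} \left( (t_0 + T)^{2p+1} - t_0^{2p+1} \right) + \tanh^{-1}\frac{\beta}{\nu} \right]. \qquad (46)$$

By substituting (46) in (34) we obtain $x_0^{NE}(t)$ as the solution of the equation

$$\dot x_0^{NE}(t) = -\beta t^{2p} x_0^{NE}(t) + 2 t^p u_0^{NE}(t,x). \qquad (47)$$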

Fig. 3. $x^{NE}(t)$ (42): blue line; $\hat x^{NE}(t)$ (45): red broken line. Horizontal axis: time in multiples of $\Delta t$.

Fig. 4. $u^{NE}(t)$ (41): blue line; $\hat u^{NE}(t)$ (44): red broken line. Horizontal axis: time in multiples of $\Delta t$.

Fig. 5. $x^{NE}(t)$ (42): blue line; $\hat x^{NE}(t)$ (45): red broken line. Horizontal axis: time in multiples of $\Delta t$.

Fig. 6. $u^{NE}(t)$ (41): blue line; $\hat u^{NE}(t)$ (44): red broken line. Horizontal axis: time in multiples of $\Delta t$.

Fig. 7. $x^{NE}(t)$ (42): blue line; $x_0^{NE}(t)$ (47): red line.

Fig. 8. $u^{NE}(t,x)$ (41): blue line; $u_0^{NE}(t,x)$ (46): red line.

5.5. Numerical Simulation

Consider the results of numerical simulation for the game model presented above on the interval [0.1, 4.1], i.e. the initial time is $t_0 = 0.1$ and the terminal time is 4.1. At the initial instant $t_0 = 0.1$ the stock of knowledge is 100, i.e. $x_0 = 100$. The other model parameters are $\beta = 0.01$, $p = 0.4$, $q = 0.5$ and information horizon $T = 1$. Suppose that for the case of dynamic updating (red solid and dotted lines in Fig. 3-4) the interval between updating instants is $\Delta t = 0.5$, so that $l = 8$. Fig. 3 compares the equilibrium trajectory in the game with dynamic updating (red line) with the equilibrium trajectory with continuous updating (blue line). Fig. 4 presents the same comparison for the strategies.

To demonstrate the results of Theorems 5 and 7 on convergence of the equilibrium strategies and corresponding trajectory with dynamic updating to the equilibrium strategies and trajectory with continuous updating, consider the simulation results for a case of frequent updating, namely $l = 40$. Fig. 5-6 show the same solutions as Fig. 3-4, but for $\Delta t = 0.1$. The numerical experiments shown in Fig. 5-6 are thus consistent with the convergence results.
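The comparison described above can be reproduced approximately with a short simulation. The sketch below is not the authors' code: it assumes the closed-form gains (41) and (44), reads the drift coefficient as $\beta$, uses the parameter values listed in this section with $T = 1$, and integrates (42) and (45) with a Runge-Kutta scheme whose steps are aligned with the updating instants. All function names are hypothetical.

```python
import math

# Section 5.5 values; BETA is our reading of the drift coefficient.
BETA, P_EXP, Q, T = 0.01, 0.4, 0.5, 1.0
NU = math.sqrt(3.0 * Q + BETA ** 2)
E = 2.0 * P_EXP + 1.0


def gain(t_ref, t):
    """T * k^{t_ref}((t - t_ref) / T) according to (39)."""
    arg = T * NU / E * ((t_ref + T) ** E - t ** E) + math.atanh(BETA / NU)
    return NU / 3.0 * math.tanh(arg) - BETA / 3.0


def drift(t_ref, t, x):
    """Right-hand side of (34) when both players use u = -gain * x * t^p."""
    u = -gain(t_ref, t) * x * t ** P_EXP
    return -BETA * t ** (2.0 * P_EXP) * x + 2.0 * t ** P_EXP * u


def rk4_step(f, t, x, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h / 2 * k1)
    k3 = f(t + h / 2, x + h / 2 * k2)
    k4 = f(t + h, x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def max_trajectory_gap(dt, n_updates, t0=0.1, x0=100.0, substeps=100):
    """Largest gap between the trajectories of (42) and (45) on the grid."""
    h = dt / substeps
    xc = xd = x0
    worst = 0.0
    for j in range(n_updates):
        tj = t0 + j * dt
        f_c = lambda t, x: drift(t, t, x)            # continuous updating
        f_d = lambda t, x, tj=tj: drift(tj, t, x)    # dynamic updating
        for m in range(substeps):
            t = tj + m * h
            xc = rk4_step(f_c, t, xc, h)
            xd = rk4_step(f_d, t, xd, h)
            worst = max(worst, abs(xc - xd))
    return worst
```

Calling `max_trajectory_gap(0.5, 8)` and `max_trajectory_gap(0.1, 40)` corresponds to the two updating frequencies used for Fig. 3-4 and Fig. 5-6; the second value should be the smaller one if the simulation behaves as the convergence results suggest.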

To compare the autonomous and non-autonomous cases with continuous updating, consider the simulation results on $t \in [1, 8]$ with parameters $\beta = 0.01$, $p = 0.4$, $q = 0.5$, $x_0 = 100$. The strategies $u^{NE}(t,x)$ and $u_0^{NE}(t,x)$ obtained in the non-autonomous and autonomous cases respectively are presented in Fig. 8. In addition, the corresponding trajectories $x^{NE}(t)$ and $x_0^{NE}(t)$ are presented in Fig. 7.

6. Conclusion

The concepts of feedback-based and open-loop-based Nash equilibrium for the class of non-autonomous linear-quadratic differential games with continuous updating are constructed and the corresponding theorems are presented. The forms of feedback-based and open-loop-based Nash equilibrium with dynamic updating are also presented, and convergence of the feedback-based and open-loop-based Nash equilibrium with dynamic updating to the feedback-based and open-loop-based Nash equilibrium with continuous updating, as the number of updating instants tends to infinity, is proved. The results are demonstrated using a differential game model of knowledge stock. The obtained results are both fundamental and applied in nature, since they give specialists in applied fields a new mathematical tool for more realistic modeling of engineering systems describing human-machine interaction.

References

Basar, T. and Olsder, G. J. (1995). Dynamic noncooperative game theory. Academic Press, London.

Bellman, R. (1957). Dynamic Programming. Princeton University Press: Princeton, NJ.

Bemporad, A., Morari, M., Dua, V. and Pistikopoulos, E. (2002). The explicit linear quadratic regulator for constrained systems. Automatica, 38(1), 3-20.

Eisele, T. (1982). Nonexistence and nonuniqueness of open-loop equilibria in linear-quadratic differential games. Journal of Optimization Theory and Applications, 37(4), 443-468.

Engwerda, J. (2005). LQ Dynamic Optimization and Differential Games, Wiley: New York.

Filippov, A. (2004). Introduction to the theory of differential equations (in Russian), Editorial URSS: Moscow.

Goodwin, G., Seron, M. and Dona, J. (2005). Constrained Control and Estimation: An Optimisation Approach, Springer-Verlag: London.

Gromova, E. V. and Petrosian, O. L. (2016). Control of information horizon for cooperative differential game of pollution control, 2016 International Conference Stability and Oscillations of Nonlinear Control Systems: Pyatnitskiy.

Hempel, A., Goulart, P. and Lygeros, J. (2015). Inverse parametric optimization with an application to hybrid system control. IEEE Transactions on Automatic Control, 60(4), 1064-1069.

Isaacs, R. (1965). Differential Games, John Wiley and Sons: New York.

Kuchkarov, I. and Petrosian, O. (2019). On class of linear quadratic non-cooperative differential games with continuous updating. Lecture Notes in Computer Science, 11548, 635-650.

Kuchkarov, I. and Petrosian, O. (2020). Open-loop based strategies for autonomous linear quadratic game models with continuous updating. In: Kononov, A., M. Khachay, V. A. Kalyagin and P. Pardalos (eds.). Mathematical Optimization Theory and Operations Research, pp. 212-230. Springer International Publishing: Cham.

Kwon, W., Bruckstein, A. and Kailath, T. (1982). Stabilizing state-feedback design via the moving horizon method. 21st IEEE Conference on Decision and Control.

Kwon, W. and Han, S. (2005). Receding Horizon Control: Model Predictive Control for State Models, Springer-Verlag: London.

Kwon, W. and Pearson, A. (1977). A modified quadratic cost problem and feedback stabilization of a linear system. IEEE Transactions on Automatic Control, 22(5), 838-842.

Mayne, D. and Michalska, H. (1990). Receding horizon control of nonlinear systems. IEEE Transactions on Automatic Control, 35(7), 814-824.

Petrosian, O. L. (2016a). Looking forward approach in cooperative differential games. International Game Theory Review, (18), 1-14.

Petrosian, O. L. (2016b). Looking forward approach in cooperative differential games with infinite-horizon. Vestnik of Saint Petersburg University. Series 10. Applied Mathematics. Computer Science. Control Processes, (4), 18-30.

Petrosian, O. L. and Barabanov, A. E. (2017). Looking forward approach in cooperative differential games with uncertain-stochastic dynamics. Journal of Optimization Theory and Applications, 172, 328-347.

Petrosian, O.L., Nastych, M. A. and Volf, D. A. (2018). Non-cooperative differential game model of oil market with looking forward approach. Frontiers of Dynamic Games, Game Theory and Management, St. Petersburg, 2017 (Petrosyan L. A., V. V. Mazalov and N. Zenkevich eds), Birkhauser: Basel.

Petrosian, O. L., Nastych, M. A. and Volf, D. A. (2017). Differential game of oil market with moving informational horizon and non-transferable utility. 2017 Constructive Nonsmooth Analysis and Related Topics (dedicated to the memory of V. F. Demyanov).

Petrosian, O., Shi, L., Li, Y. and Gao, H. (2019). Moving information horizon approach for dynamic game models, Mathematics, 7(12), 1-31.

Petrosian, O. and Tur, A. (2019). Hamilton-Jacobi-Bellman equations for non-cooperative differential games with continuous updating. In: Mathematical Optimization Theory and Operations Research, pp. 178-191.

Petrosyan, L. A. and Murzov, N. V. (1966). Game-theoretic problems in mechanics. Lithuanian Mathematical Collection, (3), 423-433 .

Pontryagin, L. S. (1996). On theory of differential games. Successes of Mathematical Sciences, 26, 4(130), 219-274.

Rawlings, J. and Mayne, D. (2009). Model Predictive Control: Theory and Design. Nob Hill Publishing, LLC: Madison.

Shaw, L. (1979). Nonlinear control of linear multivariable systems via state-dependent feedback gains. IEEE Transactions on Automatic Control, 24(1), 108-112.

Shevkoplyas, E. (2014). Optimal solutions in differential games with random duration. Journal of Mathematical Sciences, 199(6), 715-722.

Wang, L. (2005). Model Predictive Control System Design and Implementation Using MATLAB. Springer-Verlag: London.


Yeung, D. W. K. and Petrosian, O. (2017). Cooperative stochastic differential games with information adaptation. International Conference on Communication and Electronic Information Engineering.

Yeung, D. W. K. and Petrosian, O. (2017). Infinite horizon dynamic games: A new approach via information updating. International Game Theory Review, 19, 1-23.
