URAL MATHEMATICAL JOURNAL, Vol. 2, No. 1, 2016

ON CALCULATING THE VALUE OF A DIFFERENTIAL GAME IN THE CLASS OF COUNTER STRATEGIES¹,²

Mikhail I. Gomoyunov

Krasovskii Institute of Mathematics and Mechanics, Ural Branch of the Russian Academy of Sciences, Ekaterinburg, [email protected]

Dmitry V. Kornev

Institute of Mathematics and Computer Sciences, Ural Federal University;

Krasovskii Institute of Mathematics and Mechanics, Ural Branch of the Russian Academy of Sciences, Ekaterinburg, [email protected]

Abstract: For a linear dynamical system with control and disturbance, a feedback control problem is considered, in which the Euclidean norm of a set of deviations of the system's motion from given targets at given instants of time is optimized. The problem is formalized into a differential game in "strategy-counter strategy" classes. A game value computing procedure which reduces the problem to a recursive construction of upper convex hulls of auxiliary functions is justified. Results of numerical simulations are presented.

Keywords: Differential games, Value of the game, Saddle point, Counter strategies.

Introduction

In this paper, a linear dynamical system subject to control and disturbance actions is considered. A feedback control problem with optimization of a quality index is posed. The quality index is the Euclidean norm of a set of deviations of the system's motion from given targets at given instants of time. The "saddle point condition in a small game" [1, p. 79] (see also [5, p. 46]), also known as the Isaacs condition [2] (see equality (2.7) below), is not assumed. Within the game-theoretic approach [1-10], the problem is formalized as a positional differential game in "strategy-counter strategy" classes (see, e.g., [1, p. 78], [5, p. 20]).

Based on methods from [4,5], a procedure that reduces the considered problem, under condition (2.7), to a recurrent construction of upper convex hulls of auxiliary functions was given in [9,10]. In the present paper, the applicability of that procedure is proved for the case when condition (2.7) is not imposed. To achieve this, we follow the idea of unification of differential games [3] and use constructions of characteristic inclusions from the theory of minimax solutions of Hamilton-Jacobi equations [6] (see also [7,8]).

Results of numerical simulations are presented.

1. Problem Statement

Consider a dynamical system described by the following equation:

$$dx/dt = A(t)x + f(t,u,v), \quad t_0 \le t < \vartheta, \quad x \in \mathbb{R}^n, \quad u \in P \subset \mathbb{R}^r, \quad v \in Q \subset \mathbb{R}^s. \tag{1.1}$$

¹ The paper is a translation of the paper "On calculating the value of a differential game in the class of counterstrategies" by M.I. Gomoyunov and D.V. Kornev published in Trudi Instituta Matematiki i Mehaniki UrO RAN, 2013, vol. 19, no. 1, pp. 59-68.

² This work was supported by Complex Program of Fundamental Research UrO RAN (project 15-16-1-13).

Here $t$ is time, $x$ is a phase vector, $u$ is a control vector, $v$ is a disturbance vector; $t_0$ and $\vartheta$ are fixed instants of time ($t_0 < \vartheta$); $P$ and $Q$ are given compact sets; the matrix function $A(t)$ is continuous on $[t_0,\vartheta]$, and the vector function $f(t,u,v)$ is continuous on $[t_0,\vartheta] \times P \times Q$.

A current position of system (1.1) is a pair $(t,x) \in [t_0,\vartheta] \times \mathbb{R}^n$. Denote

$$\lambda_1 = \max_{t \in [t_0,\vartheta]} \ \max_{x \in \mathbb{R}^n,\ \|x\| = 1} \|A(t)x\|, \quad \lambda_2 = \max_{(t,u,v) \in [t_0,\vartheta] \times P \times Q} \|f(t,u,v)\|, \quad \lambda = \max\{\lambda_1, \lambda_2\}. \tag{1.2}$$

Here and below, the symbol $\|\cdot\|$ denotes the Euclidean vector norm. Define a set $K$ of possible positions:

$$K = \{(t,x) \in [t_0,\vartheta] \times \mathbb{R}^n : \|x\| \le (1 + R_0)e^{(t - t_0)\lambda} - 1\}, \tag{1.3}$$

where $R_0 > 0$ is some fixed number. Let a position $(t_*,x_*) \in K$, $t_* < \vartheta$, and an instant $t^* \in (t_*,\vartheta]$ be given. We assume that admissible control and disturbance realizations are Borel measurable functions $u[t_*[\cdot]t^*) = \{u(t) \in P,\ t_* \le t < t^*\}$ and $v[t_*[\cdot]t^*) = \{v(t) \in Q,\ t_* \le t < t^*\}$, respectively. From the position $(t_*,x_*)$, such realizations uniquely generate the motion of system (1.1) as an absolutely continuous vector function $x[t_*[\cdot]t^*] = \{x(t) \in \mathbb{R}^n,\ t_* \le t \le t^*\}$ which satisfies the condition $x(t_*) = x_*$ at $t = t_*$ and, together with $u = u(t)$ and $v = v(t)$, satisfies equation (1.1) for almost all $t \in [t_*,t^*]$. Moreover, any position $(t,x(t))$, $t \in [t_*,t^*]$, realized on this motion belongs to the set $K$ (see, e.g., [1, p. 41], [5, p. 40]).

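It is assumed that a natural number $N$; instants of time $t^{[i]} \in [t_0,\vartheta]$, $t^{[i]} < t^{[i+1]}$, $i = \overline{1,N-1}$, $t^{[N]} = \vartheta$; constant $(p^{[i]} \times n)$-matrices $D^{[i]}$ ($1 \le p^{[i]} \le n$) and $n$-dimensional vectors $g^{[i]}$, $i = \overline{1,N}$, are given. The quality of the motion $x[t_*[\cdot]\vartheta]$, generated from the position $(t_*,x_*) \in K$ by some admissible realizations $u[t_*[\cdot]\vartheta)$ and $v[t_*[\cdot]\vartheta)$, is evaluated by the following index:

$$\gamma = \gamma(x[t_*[\cdot]\vartheta]) = \Big( \sum_{i = h(t_*)}^{N} \|D^{[i]}(x(t^{[i]}) - g^{[i]})\|^2 \Big)^{1/2}, \tag{1.4}$$

where

$$h(t) = \min\{i = \overline{1,N} : t^{[i]} \ge t\}, \quad t \in [t_0,\vartheta]. \tag{1.5}$$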
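As an illustration of (1.4) and (1.5), the following minimal sketch evaluates the index $h(t)$ and the quality index for states sampled at the evaluation instants. The function names and the two-dimensional data are hypothetical and chosen only so that the result can be checked by hand; they do not come from the paper.

```python
import math

def h(t, instants):
    """1-based index of the first evaluation instant t^[i] >= t, cf. (1.5)."""
    for i, ti in enumerate(instants, start=1):
        if ti >= t:
            return i
    raise ValueError("t lies after the last evaluation instant")

def quality_index(t_start, instants, D, g, x_at):
    """Quality index (1.4) for a motion that starts at time t_start.

    D[i] is a list of rows (a p_i x n matrix), g[i] is an n-vector,
    x_at[i] is the state at the instant instants[i] (0-based lists).
    """
    total = 0.0
    for i in range(h(t_start, instants) - 1, len(instants)):
        diff = [xa - ga for xa, ga in zip(x_at[i], g[i])]
        for row in D[i]:
            total += sum(r * d for r, d in zip(row, diff)) ** 2
    return math.sqrt(total)

# Hypothetical data: n = 2, N = 2 (assumption, for illustration only).
instants = [1.0, 2.0]
D = [[[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]]
g = [[0.0, 0.0], [1.0, 1.0]]
x_at = [[3.0, 5.0], [1.0, 5.0]]

gamma_full = quality_index(0.0, instants, D, g, x_at)  # both instants count
gamma_tail = quality_index(1.5, instants, D, g, x_at)  # only the last instant counts
```

With these numbers the deviations are 3 at the first instant and (0, 4) at the second, so the index equals 5 from the start and 4 when the motion starts after the first instant.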

The aim of the control is to make the quality index $\gamma$ (1.4) as small as possible. While solving this problem, it is convenient to also consider the problem of forming the disturbance actions that are most unfavorable from the control's point of view, that is, those aimed at maximizing $\gamma$.

According to [1, p. 75; 5, p. 51], these two problems may be united into an antagonistic positional differential game of two players in "strategy-counter strategy" classes. A control action $u$ is interpreted as an action of the first player, and a disturbance action $v$ is interpreted as an action of the second player. An admissible strategy $u(\cdot)$ of the first player is an arbitrary function

$$u(\cdot) = \{u(t,x,\varepsilon) \in P,\ (t,x) \in K,\ \varepsilon > 0\}.$$

An admissible counter strategy of the second player is an arbitrary function

$$v(\cdot) = \{v(t,x,u,\varepsilon) \in Q,\ (t,x) \in K,\ u \in P,\ \varepsilon > 0\}$$

which, for fixed $(t,x) \in K$ and $\varepsilon > 0$, is Borel measurable with respect to $u \in P$. Here $\varepsilon > 0$ is the accuracy parameter (see, e.g., [1, p. 68], [5, p. 47]).

It follows from the results of monographs [1,5] that differential game (1.1), (1.4) has a value $\rho(\cdot)$ and a saddle point that consists of the optimal minimax strategy $u^0(\cdot)$ and the optimal maximin counter strategy $v^0(\cdot)$. In particular, this means that for any number $\zeta > 0$ there exist a number $\varepsilon_* > 0$ and a function $\delta_*(\varepsilon) > 0$, $\varepsilon \in (0,\varepsilon_*]$, such that, for any initial position $(t_*,x_*) \in K$, $t_* < \vartheta$, any value of the accuracy parameter $\varepsilon \in (0,\varepsilon_*]$, and any partition $\Delta_M\{t_i\} = \{t_i : t_1 = t_*,\ t_i < t_{i+1},\ i = \overline{1,M},\ t_{M+1} = \vartheta\}$ of the time segment $[t_*,\vartheta]$ with diameter $\delta_M = \max_{i = \overline{1,M}}(t_{i+1} - t_i) \le \delta_*(\varepsilon)$, the following holds. On the one hand, the step-by-step control law of the first player $U^0 = \{u^0(\cdot), \varepsilon, \Delta_M\{t_i\}\}$, which forms the control actions

$$u(t) = u^0(t_i, x(t_i), \varepsilon), \quad t_i \le t < t_{i+1}, \quad i = \overline{1,M},$$

guarantees the inequality

$$\gamma \le \rho(t_*,x_*) + \zeta \tag{1.6}$$

for any admissible realization $v[t_*[\cdot]\vartheta)$. On the other hand, for any admissible realization $u[t_*[\cdot]\vartheta)$, the step-by-step control law of the second player $V^0 = \{v^0(\cdot), \varepsilon, \Delta_M\{t_i\}\}$, which forms the actions

$$v(t) = v^0(t_i, x(t_i), u(t), \varepsilon), \quad t_i \le t < t_{i+1}, \quad i = \overline{1,M},$$

guarantees the inequality

$$\gamma \ge \rho(t_*,x_*) - \zeta. \tag{1.7}$$

2. Procedure for Calculating the Game Value

In accordance with [10], consider the following procedure for calculating the value of differential game (1.1), (1.4). Let $t_* \in [t_0,\vartheta)$. Assign a partition of the time segment $[t_*,\vartheta]$:

$$\Delta_k = \Delta_k\{\tau_j\} = \{\tau_j : \tau_1 = t_*,\ \tau_j < \tau_{j+1},\ j = \overline{1,k},\ \tau_{k+1} = \vartheta\}. \tag{2.1}$$

In what follows, every partition of the form (2.1) is assumed to contain the instants $t^{[i]}$, $i = \overline{h(t_*),N}$, from quality index (1.4).
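The requirement that a partition of the form (2.1) contain the evaluation instants can be met, for instance, by subdividing each interval between consecutive instants uniformly. The sketch below is one possible construction under that assumption; the helper name `build_partition` is hypothetical. The sample call uses a segment [0, 4] with evaluation instants 2, 3, 4 and step bound 0.02, the same kind of data as in the example of Section 5.

```python
import math

def build_partition(t_start, theta, instants, delta):
    """Nodes tau_1 = t_start < ... < tau_{k+1} = theta with step <= delta
    that contain every evaluation instant lying in [t_start, theta]."""
    marks = sorted({t_start, theta,
                    *[t for t in instants if t_start <= t <= theta]})
    nodes = [marks[0]]
    for a, b in zip(marks, marks[1:]):
        # Number of equal subintervals needed on [a, b]; the small guard
        # protects against round-off in the division.
        n = max(1, math.ceil((b - a) / delta - 1e-12))
        nodes.extend(a + (b - a) * i / n for i in range(1, n + 1))
    return nodes

nodes = build_partition(0.0, 4.0, [2.0, 3.0, 4.0], 0.02)
diameter = max(b - a for a, b in zip(nodes, nodes[1:]))
```

Here the partition has 201 nodes (k = 200), and its diameter does not exceed 0.02 up to round-off.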

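Let $X(t,\tau)$ be the fundamental solution matrix of the equation $dx/dt = A(t)x$ such that $X(\tau,\tau) = E$. Denote

$$\Delta\psi_j(t_*,m) = \int_{\tau_j}^{\tau_{j+1}} \min_{u \in P} \max_{v \in Q} \langle m, X(\vartheta,\tau) f(\tau,u,v) \rangle \, d\tau, \quad m \in \mathbb{R}^n, \quad j = \overline{1,k}. \tag{2.2}$$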

Here and below, the symbol $\langle \cdot,\cdot \rangle$ denotes the inner product of vectors. Step by step, in reverse order, starting from the last point of the partition $\Delta_k$ (2.1), define sets $G_j(t_*,\tau_j \pm 0)$ of vectors $m \in \mathbb{R}^n$ and scalar functions $\varphi_j(t_*,\tau_j \pm 0,m)$, $m \in G_j(t_*,\tau_j \pm 0)$, $j = \overline{1,k+1}$. For $j = k+1$, we set

$$G_{k+1}(t_*,\tau_{k+1}+0) = \{m \in \mathbb{R}^n : m = 0\}, \quad \varphi_{k+1}(t_*,\tau_{k+1}+0,m) = 0, \quad m \in G_{k+1}(t_*,\tau_{k+1}+0),$$

$$G_{k+1}(t_*,\tau_{k+1}-0) = \{m \in \mathbb{R}^n : m = D^{[N]T} l,\ l \in \mathbb{R}^{p^{[N]}},\ \|l\| \le 1\}, \quad \varphi_{k+1}(t_*,\tau_{k+1}-0,m) = -\langle m, g^{[N]} \rangle, \quad m \in G_{k+1}(t_*,\tau_{k+1}-0),$$

where the superscript $T$ denotes matrix transposition.
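To make the quantity (2.2) concrete, here is a minimal numerical sketch. It assumes that $P$ and $Q$ are replaced by finite samples and that the integral is approximated by the midpoint rule; the function `delta_psi`, its arguments, and the scalar toy system are all hypothetical and are not the authors' implementation.

```python
def delta_psi(m, tau_a, tau_b, X_theta, f, P, Q, steps=200):
    """Midpoint-rule estimate of the integral over [tau_a, tau_b] of
    min_{u in P} max_{v in Q} <m, X(theta, tau) f(tau, u, v)>, cf. (2.2).

    X_theta(tau) returns the matrix X(theta, tau) as a list of rows;
    P and Q are finite lists (an assumption of this sketch).
    """
    n = len(m)
    step = (tau_b - tau_a) / steps
    total = 0.0
    for k in range(steps):
        tau = tau_a + (k + 0.5) * step
        X = X_theta(tau)
        # w = X^T m, so that <m, X f> = <w, f>.
        w = [sum(X[r][c] * m[r] for r in range(n)) for c in range(n)]
        total += step * min(
            max(sum(wi * fi for wi, fi in zip(w, f(tau, u, v))) for v in Q)
            for u in P
        )
    return total

# Toy scalar system dx/dt = u + v (A = 0, hence X = 1); hypothetical data.
value = delta_psi(
    m=[1.0], tau_a=0.0, tau_b=0.5,
    X_theta=lambda tau: [[1.0]],
    f=lambda t, u, v: [u + v],
    P=[-1.0, 1.0], Q=[-2.0, 2.0],
)
```

For this toy data the integrand is constant: min over u of max over v of (u + v) equals 1, so the estimate is 0.5.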

Further constructions are carried out according to the following recurrent relations. Assume that for $j+1$, $1 \le j \le k$, the sets $G_{j+1}(t_*,\tau_{j+1} \pm 0)$ and the functions $\varphi_{j+1}(t_*,\tau_{j+1} \pm 0,m)$, $m \in G_{j+1}(t_*,\tau_{j+1} \pm 0)$, are already defined. Then, for the current $j$, define

$$G_j(t_*,\tau_j+0) = G_{j+1}(t_*,\tau_{j+1}-0),$$

$$\psi_j(t_*,m) = \Delta\psi_j(t_*,m) + \varphi_{j+1}(t_*,\tau_{j+1}-0,m), \quad m \in G_j(t_*,\tau_j+0),$$

$$\varphi_j(t_*,\tau_j+0,\cdot) = \{\psi_j(t_*,\cdot)\}^*_{G_j(t_*,\tau_j+0)},$$

where the symbol $\{\psi(\cdot)\}^*_G$ denotes the upper convex hull of the function $\psi(\cdot)$ on the set $G$, i.e., the minimal concave function that majorizes $\psi(\cdot)$ for $m \in G$.
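The upper convex hull operation can be illustrated in the simplest setting of a function sampled on a one-dimensional grid. The sketch below is an assumption-laden toy: the actual sets $G$ lie in $\mathbb{R}^n$, whereas here the grid is scalar, and the helper name is hypothetical. It finds the upper hull of the sample points with a monotone-chain scan and then interpolates linearly.

```python
def upper_concave_envelope(ms, psis):
    """Values on the grid ms (strictly increasing, at least two points)
    of the least concave majorant of the points (ms[i], psis[i])."""
    hull = []  # indices of the vertices of the upper hull
    for i in range(len(ms)):
        # Drop the last vertex while it lies on or below the chord
        # joining its predecessor with the new point.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = ((ms[b] - ms[a]) * (psis[i] - psis[a])
                     - (psis[b] - psis[a]) * (ms[i] - ms[a]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    envelope = []
    seg = 0
    for i in range(len(ms)):
        while ms[hull[seg + 1]] < ms[i]:
            seg += 1
        a, b = hull[seg], hull[seg + 1]
        t = (ms[i] - ms[a]) / (ms[b] - ms[a])
        envelope.append(psis[a] + t * (psis[b] - psis[a]))
    return envelope

# Hypothetical sample: the dip at m = 2 is lifted onto the chord.
env = upper_concave_envelope([0, 1, 2, 3, 4], [0, 3, 1, 2, 0])
```

In this sample the value at m = 2 is raised from 1 to 2.5, the height of the chord between (1, 3) and (3, 2), while the other values are unchanged.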

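Next, if the instant $\tau_j$ is not equal to any of the instants $t^{[i]}$ from (1.4), then we set

$$G_j(t_*,\tau_j-0) = G_j(t_*,\tau_j+0), \quad \varphi_j(t_*,\tau_j-0,m) = \varphi_j(t_*,\tau_j+0,m), \quad m \in G_j(t_*,\tau_j-0).$$

Otherwise, if $\tau_j = t^{[h]}$, $h = h(\tau_j)$, then we define

$$\begin{aligned}
G_j(t_*,\tau_j-0) &= \{m \in \mathbb{R}^n : m = \nu m_* + X^T(t^{[h]},\vartheta) D^{[h]T} l,\ 0 \le \nu \le 1,\ l \in \mathbb{R}^{p^{[h]}},\ \|l\|^2 \le 1 - \nu^2,\ m_* \in G_j(t_*,\tau_j+0)\},\\
\varphi_j(t_*,\tau_j-0,m) &= \max_{\{\nu, m_*, l\}} \big[\nu \varphi_j(t_*,\tau_j+0,m_*) - \langle l, D^{[h]} g^{[h]} \rangle\big], \quad m \in G_j(t_*,\tau_j-0),
\end{aligned} \tag{2.3}$$

where the maximum is taken over all triples $\{\nu, m_*, l\}$ that, according to (2.3), correspond to the given vector $m \in G_j(t_*,\tau_j-0)$.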

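Let us denote

$$e(t_* \pm 0, x; \Delta_k) = \max_{m \in G_1(t_*,\tau_1 \pm 0)} \big[\langle m, X(\vartheta,t_*)x \rangle + \varphi_1(t_*,\tau_1 \pm 0,m)\big], \quad x \in \mathbb{R}^n. \tag{2.4}$$

For $t_* = \vartheta$, we formally assume that $\Delta_k$ denotes a degenerate partition that contains only one instant $\tau_1 = t_* = \vartheta = \tau_{k+1}$, and that $G_1(t_*,\tau_1 \pm 0) = G_{k+1}(t_*,\tau_{k+1} \pm 0)$ and $\varphi_1(t_*,\tau_1 \pm 0,m) = \varphi_{k+1}(t_*,\tau_{k+1} \pm 0,m)$, $m \in G_1(t_*,\tau_1 \pm 0)$. Then we have

$$e(\vartheta - 0, x; \Delta_k) = \|D^{[N]}(x - g^{[N]})\|, \quad e(\vartheta + 0, x; \Delta_k) = 0, \quad x \in \mathbb{R}^n. \tag{2.5}$$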
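The first equality in (2.5) follows from (2.4) because, for $m = D^{[N]T} l$, the bracket equals $\langle l, D^{[N]}(x - g^{[N]}) \rangle$, and its maximum over the unit ball $\|l\| \le 1$ is the Euclidean norm of $D^{[N]}(x - g^{[N]})$. The sketch below checks this numerically for hypothetical data with $p^{[N]} = 2$ by sampling unit vectors $l$ on a circle; the matrix and vectors are assumptions made only for the illustration.

```python
import math

# Hypothetical terminal data: a 2 x 3 matrix D, a target g and a state x.
D = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]
g = [1.0, 0.0, 0.0]
x = [2.0, 3.0, 1.0]

# d = D (x - g); the closed form in (2.5) is its Euclidean norm.
d = [sum(row[c] * (x[c] - g[c]) for c in range(3)) for row in D]
closed_form = math.sqrt(sum(di * di for di in d))

# Maximize <l, d> over unit vectors l = (cos a, sin a) sampled on a grid.
samples = 3600
sampled_max = max(
    math.cos(a) * d[0] + math.sin(a) * d[1]
    for a in (2.0 * math.pi * k / samples for k in range(samples))
)
```

The sampled maximum approaches the closed-form value from below as the angular grid is refined.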

Theorem 1. For any number $\xi > 0$ there exists a number $\delta > 0$ such that, for any initial position $(t_*,x_*) \in K$ and any partition $\Delta_k$ (2.1) of the time segment $[t_*,\vartheta]$ with diameter $\delta_k = \max_{j = \overline{1,k}}(\tau_{j+1} - \tau_j) \le \delta$ that contains the instants $t^{[i]}$, $i = \overline{h(t_*),N}$, from quality index (1.4), the following inequality holds:

$$|\rho(t_*,x_*) - e(t_* - 0, x_*; \Delta_k)| \le \xi. \tag{2.6}$$

In [10], this theorem was proved under the assumption that the following saddle point condition in a small game holds:

$$\min_{u \in P} \max_{v \in Q} \langle s, f(t,u,v) \rangle = \max_{v \in Q} \min_{u \in P} \langle s, f(t,u,v) \rangle, \quad t \in [t_0,\vartheta], \quad s \in \mathbb{R}^n. \tag{2.7}$$

The aim of this paper is to prove Theorem 1 without using condition (2.7).
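Condition (2.7) can fail even for very simple right-hand sides. The sketch below assumes a control term of the form $-1.8 R(v_1) u$, where $R(v_1)$ is a planar rotation, $u$ ranges over four unit vectors, and $v_1 \in \{-\pi/4, \pi/4\}$; this mirrors the coupling between control and disturbance in the example of Section 5 as read from the text, and the vector `w` plays the role of the relevant components of $s$. Any additive term that depends on the disturbance alone contributes equally to both sides of (2.7) and is therefore omitted. The function names are hypothetical.

```python
import math

P = [(1, 0), (0, 1), (-1, 0), (0, -1)]      # finite control set
ANGLES = [-math.pi / 4, math.pi / 4]        # admissible rotation angles v1

def payoff(w, u, theta):
    """<w, -1.8 R(theta) u>, where R(theta) is the rotation by theta."""
    ru = (u[0] * math.cos(theta) - u[1] * math.sin(theta),
          u[0] * math.sin(theta) + u[1] * math.cos(theta))
    return -1.8 * (w[0] * ru[0] + w[1] * ru[1])

def minmax(w):
    return min(max(payoff(w, u, th) for th in ANGLES) for u in P)

def maxmin(w):
    return max(min(payoff(w, u, th) for u in P) for th in ANGLES)

gap_diagonal = minmax((1.0, 1.0)) - maxmin((1.0, 1.0))
gap_axis = minmax((1.0, 0.0)) - maxmin((1.0, 0.0))
```

For `w = (1, 1)` the minimax equals 0 while the maximin equals $-1.8\sqrt{2}$, so the two sides of (2.7) differ; for `w = (1, 0)` they coincide. Under these assumptions, condition (2.7) holds for some directions $s$ and fails for others.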

3. The u- and v-stability properties of the value e(·)

In [10], inequality (2.6) is proved on the basis of the u- and v-stability properties of the value $e(\cdot)$ (2.4) with respect to system (1.1). However, when condition (2.7) does not hold, a stricter u-stability property is required (see, e.g., [1, p. 208]). An attempt to prove this stricter property by following the scheme from [10] meets a substantial difficulty: when the disturbance action $v$ is formed in response to admissible realizations $u = u(t)$ by the rule $v = v_*(u(t))$, where the function $v_* : P \to Q$ is Borel measurable, the reachable set of system (1.1) may fail to be compact. For this reason, we consider an auxiliary z-model, establish the proximity of motions of system (1.1) and the z-model, and prove an appropriate u-stability property of the value $e(\cdot)$ with respect to the z-model. The v-stability property does not depend on condition (2.7), so we use it as stated in [10].

Let $S \subset \mathbb{R}^n$ be the unit sphere and $q \in S$. Motions of the auxiliary z-model are described by the following differential inclusion:

$$dz/dt \in F_*(t,z,q) = A(t)z + F(t,q), \quad t_0 \le t < \vartheta, \quad z \in \mathbb{R}^n, \tag{3.1}$$

where

$$F(t,q) = \{g \in \mathbb{R}^n : \|g\| \le \sqrt{2}\lambda_2,\ \langle g,q \rangle \ge H(t,q)\}, \quad t \in [t_0,\vartheta], \quad q \in S,$$

$$H(t,s) = \min_{u \in P} \max_{v \in Q} \langle s, f(t,u,v) \rangle, \quad t \in [t_0,\vartheta], \quad s \in \mathbb{R}^n.$$

Here $\lambda_2 \ge 0$ is the constant from (1.2). Note that similar differential inclusions are used to define minimax solutions of Hamilton-Jacobi equations (see, e.g., [6, p. 14], [8]).

A position of z-model (3.1) is a pair $(t,z) \in [t_0,\vartheta] \times \mathbb{R}^n$. Define a set $K_z$ of possible positions of the z-model:

$$K_z = \{(t,z) \in [t_0,\vartheta] \times \mathbb{R}^n : \|z\| \le (1 + R_0 + \alpha)e^{(t - t_0)\lambda} - 1\}, \tag{3.2}$$

where $\alpha > 0$ is some fixed number, and $\lambda \ge 0$ is the constant defined in (1.2). It can be proved that for any $(t,z,q) \in [t_0,\vartheta] \times \mathbb{R}^n \times S$ the set $F_*(t,z,q)$ is nonempty, convex, and compact in $\mathbb{R}^n$, and the multivalued mapping $[t_0,\vartheta] \times \mathbb{R}^n \times S \ni (t,z,q) \mapsto F_*(t,z,q) \subset \mathbb{R}^n$ is continuous in the Hausdorff metric. Therefore (see, e.g., [11]), for any position $(t_*,z_*) \in K_z$, $t_* < \vartheta$, any $t^* \in (t_*,\vartheta]$, and any $q \in S$, differential inclusion (3.1) has at least one solution $z[t_*[\cdot]t^*] = \{z(t) \in \mathbb{R}^n,\ t_* \le t \le t^*\}$ that satisfies the equality $z(t_*) = z_*$. Each such solution determines a motion of z-model (3.1) that starts from the position $(t_*,z_*)$. For any such motion, the inclusion $(t,z(t)) \in K_z$, $t \in [t_*,t^*]$, is valid. Moreover, according to [11], for any fixed $q$, the reachability set of differential inclusion (3.1) at the instant $t^*$ from the position $(t_*,z_*)$ is a convex compact set in $\mathbb{R}^n$.

Lemma 1 (proximity of motions). For any number $\varepsilon > 0$ there exists a number $\delta > 0$ such that the following statement holds. Let $(t_*,x_*) \in K$, $(t_*,z_*) \in K_z$, $t_* < \vartheta$, $t^* \in (t_*,\vartheta]$, and $t^* - t_* \le \delta$. Let $x[t_*[\cdot]t^*]$ be a motion of system (1.1) that is generated from the position $(t_*,x_*)$ by a control realization $u^e[t_*[\cdot]t^*) = \{u^e(t) = u^e \in P,\ t_* \le t < t^*\}$, where

$$u^e \in \arg\min_{u \in P} \max_{v \in Q} \langle s_*, f(t_*,u,v) \rangle, \quad s_* = x_* - z_*, \tag{3.3}$$

together with an arbitrary admissible disturbance realization $v[t_*[\cdot]t^*)$. Let $z[t_*[\cdot]t^*]$ be a motion of z-model (3.1), for $q = q^e$, that starts from the position $(t_*,z_*)$, where

$$q^e \in \arg\max_{q \in S} \min_{g \in F(t_*,q)} \langle s_*, g \rangle. \tag{3.4}$$

Then for all instants $t \in [t_*,t^*]$ the following inequality holds:

$$\nu(t,x(t),z(t)) \le \nu(t_*,x_*,z_*) + (t - t_*)\varepsilon, \tag{3.5}$$

where

$$\nu(t,x,z) = \|x - z\|^2 e^{-2(t - t_0)\lambda}. \tag{3.6}$$
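The extremal choice (3.3) and the function (3.6) can be sketched as follows for finite samples of $P$ and $Q$. The helper names and the toy dynamics $f(t,u,v) = u + v$ are assumptions made for illustration and are not part of the paper.

```python
import math

def extremal_control(s, t, f, P, Q):
    """Some u^e in argmin_{u in P} max_{v in Q} <s, f(t, u, v)>, cf. (3.3)."""
    def worst_case(u):
        return max(sum(si * fi for si, fi in zip(s, f(t, u, v))) for v in Q)
    return min(P, key=worst_case)

def nu(t, x, z, t0, lam):
    """nu(t, x, z) = ||x - z||^2 * exp(-2 (t - t0) lam), cf. (3.6)."""
    dist2 = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return dist2 * math.exp(-2.0 * (t - t0) * lam)

# Toy planar dynamics f = u + v with finite P and Q (hypothetical data).
P = [(1, 0), (0, 1), (-1, 0), (0, -1)]
Q = [(0.5, 0.0), (-0.5, 0.0)]
f = lambda t, u, v: [u[0] + v[0], u[1] + v[1]]

x_star, z_star = [2.0, 0.0], [0.0, 0.0]
s_star = [xi - zi for xi, zi in zip(x_star, z_star)]
u_e = extremal_control(s_star, 0.0, f, P, Q)
nu_start = nu(0.0, x_star, z_star, t0=0.0, lam=1.0)
```

In this toy case the rule selects `u_e = (-1, 0)`, that is, the control that pushes the state toward the accompanying point against the worst disturbance.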

The proof of this lemma follows the scheme from [1, lemma 25.1] (see also [5, lemma 7.1]), but model (3.1) is used instead of a model-copy of system (1.1). Using the Lipschitz continuity of the functions $x(t)$ and $z(t)$, the continuity of the function $f(t,u,v)$ and of the multivalued function $F(t,q)$, relations (3.3) and (3.4), and the equality (see, e.g., [6, p. 16], [8])

$$\max_{q \in S} \min_{g \in F(t,q)} \langle g,s \rangle = H(t,s), \quad t \in [t_0,\vartheta], \quad s \in \mathbb{R}^n,$$

we deduce that, for almost all $t \in (t_*,t^*)$, the inequality

$$d\nu(t,x(t),z(t))/dt \le \eta(\delta)$$

holds for some function $\eta(\delta)$ such that $\eta(\delta) \to 0$ as $\delta \to 0$. Integrating this inequality over $[t_*,t]$, we obtain

$$\nu(t,x(t),z(t)) \le \nu(t_*,x_*,z_*) + (t - t_*)\eta(\delta).$$

If $\delta > 0$ is chosen so that $\eta(\delta) \le \varepsilon$, then inequality (3.5) holds.

Lemma 2 (property of u-stability with respect to the z-model). Let $(t_*,z_*) \in K_z$, $t_* < \vartheta$, and let a partition $\Delta_k$ (2.1) be chosen. Let $t^* = \tau_2$ be the second instant of the partition $\Delta_k$. Then for any $q_* \in S$ there exists a motion $z[t_*[\cdot]t^*]$ of z-model (3.1), for $q = q_*$, that starts from the initial position $(t_*,z_*)$ and satisfies the inequality

$$e(t_* + 0, z_*; \Delta_k) \ge e(t^* - 0, z(t^*); \Delta^*_{k^*}).$$

Here $\Delta^*_{k^*}$ is the partition of the time segment $[t^*,\vartheta]$ induced by the instants of the partition $\Delta_k$:

$$\Delta^*_{k^*} = \Delta^*_{k^*}\{\tau^*_j\} = \{\tau^*_j = \tau_{j+1} \in \Delta_k : j = \overline{1,k^*+1},\ k^* = k - 1\}. \tag{3.7}$$

The proof of this lemma is similar to the proof of the u-stability property from [10], with the reachability set of system (1.1) replaced by the reachability set of differential inclusion (3.1).

Lemma 3 (property of v-stability). Let $(t_*,x_*) \in K$, $t_* < \vartheta$, and let a partition $\Delta_k$ (2.1) be chosen. Let $t^* = \tau_2$ be the second instant of the partition $\Delta_k$ and let $\Delta^*_{k^*}$ be partition (3.7). Then for any control realization $u_*[t_*[\cdot]t^*) = \{u_*(t) = u_* \in P,\ t_* \le t < t^*\}$ there exists an admissible disturbance realization $v[t_*[\cdot]t^*)$ such that, for the motion $x[t_*[\cdot]t^*]$ of system (1.1) generated from the position $(t_*,x_*)$ by these realizations, the following inequality holds:

$$e(t_* + 0, x_*; \Delta_k) \le e(t^* - 0, x(t^*); \Delta^*_{k^*}).$$

The proof of this lemma is given in [10].

In the proof of Theorem 1 the following fact from [10] is used:

Lemma 4. For any position $(t_*,z_*) \in K_z$, $t_* < \vartheta$, and any partition $\Delta_k$ (2.1), the following relations hold:

$$e^2(t_* - 0, z_*; \Delta_k) = \begin{cases} e^2(t_* + 0, z_*; \Delta_k), & \text{for } t_* < t^{[h(t_*)]},\\ \|D^{[h(t_*)]}(z_* - g^{[h(t_*)]})\|^2 + e^2(t_* + 0, z_*; \Delta_k), & \text{for } t_* = t^{[h(t_*)]}, \end{cases} \tag{3.8}$$

where the value of $h(t_*)$ is determined according to (1.5).

4. Proof of Theorem 1

For the number $\zeta = \xi/2 > 0$, find a number $\varepsilon_* > 0$ and a function $\delta_*(\varepsilon) > 0$, $\varepsilon \in (0,\varepsilon_*]$, such that inequalities (1.6) and (1.7) hold. Choose a number $\varepsilon_1 > 0$ such that the inequality $\varepsilon_1 + (\vartheta - t_0)\varepsilon_1 \le \alpha^2 e^{-2(\vartheta - t_0)\lambda}$ is valid, where the constant $\alpha > 0$ is taken from (3.2) and the constant $\lambda \ge 0$ is defined in (1.2). Find a number $\varepsilon_2 > 0$ such that for any $i = \overline{1,N}$ and any positions $(t,z_1) \in K_z$ and $(t,z_2) \in K_z$ for which $\nu(t,z_1,z_2) \le \varepsilon_2 + (\vartheta - t_0)\varepsilon_2$, where the function $\nu(\cdot)$ is taken from (3.6), the following inequality is valid:

$$\|D^{[i]}(z_1 - g^{[i]})\|^2 - \|D^{[i]}(z_2 - g^{[i]})\|^2 \le \zeta^2/N. \tag{4.1}$$

For the number $\varepsilon = \min\{\varepsilon_*, \varepsilon_1, \varepsilon_2\} > 0$, choose a number $\delta^* > 0$ for which the statement of Lemma 1 holds. Let us show that the number $\delta = \min\{\delta_*(\varepsilon), \delta^*\} > 0$ satisfies the statement of Theorem 1.

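For positions $(\tau_j,x) \in K$, $j = \overline{1,k+1}$, let us define accompanying points $z(\tau_j,x,\varepsilon) \in \mathbb{R}^n$:

$$z(\tau_j,x,\varepsilon) \in \arg\min_{z} e(\tau_j - 0, z; \Delta^{(j)}_{k_j}), \tag{4.2}$$

where the minimum is taken under the condition

$$\nu(\tau_j,x,z) \le \varepsilon + (\tau_j - t_0)\varepsilon, \tag{4.3}$$

and the partitions $\Delta^{(j)}_{k_j}$ are defined on the basis of the partition $\Delta_k = \Delta_k\{\tau_j\}$ in the following way:

$$\Delta^{(j)}_{k_j} = \Delta^{(j)}_{k_j}\{\tau^{(j)}_i\} = \{\tau^{(j)}_i = \tau_{i+j-1} \in \Delta_k : i = \overline{1,k_j+1},\ k_j = k - j + 1\}.$$

Note that, taking into account the choice of the number $\varepsilon_1 > 0$, the inclusion $(\tau_j, z(\tau_j,x,\varepsilon)) \in K_z$ follows from (1.3) and (3.2). For $t = \tau_j$, $j = \overline{1,k}$, define a control strategy $u^e(\cdot)$ by the condition of the extremal shift to accompanying points:

$$u^e(\tau_j,x,\varepsilon) \in \arg\min_{u \in P} \max_{v \in Q} \langle s(\tau_j,x,\varepsilon), f(\tau_j,u,v) \rangle, \quad s(\tau_j,x,\varepsilon) = x - z(\tau_j,x,\varepsilon), \quad (\tau_j,x) \in K. \tag{4.4}$$

For the other values of $t$ the strategy $u^e(\cdot)$ is defined arbitrarily.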

Let $x[t_*[\cdot]\vartheta]$ be the motion of system (1.1) generated from the position $(t_*,x_*)$ when the first player forms control actions according to the control law $U^e = \{u^e(\cdot), \varepsilon, \Delta_k\}$, while the second player uses the law $V^0 = \{v^0(\cdot), \varepsilon, \Delta_k\}$ based on the optimal maximin counter strategy $v^0(\cdot)$. Then, by the above choice of $\varepsilon_* > 0$ and $\delta_*(\varepsilon) > 0$, inequality (1.7) holds on this motion. Let us show by induction on $j$ from $j = 1$ to $j = k+1$ that along this motion the following inequality holds:

$$\sum_{i = h(t_*)}^{h(\tau_j) - 1} \|D^{[i]}(x(t^{[i]}) - g^{[i]})\|^2 + e^2(\tau_j - 0, z_j; \Delta^{(j)}_{k_j}) \le e^2(t_* - 0, x_*; \Delta_k) + \zeta^2 (h(\tau_j) - 1)/N, \tag{4.5}$$

where $z_j = z(\tau_j, x(\tau_j), \varepsilon)$, and the sum is interpreted as zero if $h(t_*) > h(\tau_j) - 1$. For $j = 1$, inequality (4.5) follows from relation (4.2).

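Assuming that inequality (4.5) is valid for $j$, $1 \le j \le k$, let us prove it for $j+1$. Choose a vector $q^e_j = q^e_j(\tau_j, x(\tau_j), \varepsilon) \in S$ from the condition

$$q^e_j \in \arg\max_{q \in S} \min_{g \in F(\tau_j,q)} \langle s(\tau_j, x(\tau_j), \varepsilon), g \rangle. \tag{4.6}$$

By Lemma 2, for $q = q^e_j$ there exists a motion $z^{(j)}[\tau_j[\cdot]\tau_{j+1}]$ of z-model (3.1) that starts from the position $(\tau_j, z_j)$ and for which the following inequality holds:

$$e(\tau_j + 0, z_j; \Delta^{(j)}_{k_j}) \ge e(\tau_{j+1} - 0, z^{(j)}(\tau_{j+1}); \Delta^{(j+1)}_{k_{j+1}}). \tag{4.7}$$

By Lemma 1, due to the choice of the number $\delta^* > 0$, and taking into account definition (4.4) of the strategy $u^e(\cdot)$, choice (4.6) of the vector $q^e_j$, and inequality (4.3), we obtain

$$\nu(\tau_{j+1}, x(\tau_{j+1}), z^{(j)}(\tau_{j+1})) \le \nu(\tau_j, x(\tau_j), z_j) + (\tau_{j+1} - \tau_j)\varepsilon \le \varepsilon + (\tau_{j+1} - t_0)\varepsilon.$$

Hence, taking into consideration definition (4.2) of accompanying points, we derive

$$e(\tau_{j+1} - 0, z^{(j)}(\tau_{j+1}); \Delta^{(j+1)}_{k_{j+1}}) \ge e(\tau_{j+1} - 0, z_{j+1}; \Delta^{(j+1)}_{k_{j+1}}). \tag{4.8}$$

From (4.7) and (4.8) we conclude that

$$e(\tau_{j+1} - 0, z_{j+1}; \Delta^{(j+1)}_{k_{j+1}}) \le e(\tau_j + 0, z_j; \Delta^{(j)}_{k_j}). \tag{4.9}$$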

If $\tau_j < t^{[h(\tau_j)]}$, then $h(\tau_{j+1}) = h(\tau_j)$, and the validity of inequality (4.5) for $j+1$ follows from inequality (4.9), equality (3.8), and the induction hypothesis.

If $\tau_j = t^{[h(\tau_j)]}$, then from (4.9), taking into account equality (3.8), inequality (4.3), and choice (4.1) of the number $\varepsilon_2 > 0$, we derive

$$\begin{aligned}
e^2(\tau_{j+1} - 0, z_{j+1}; \Delta^{(j+1)}_{k_{j+1}}) &\le e^2(\tau_j - 0, z_j; \Delta^{(j)}_{k_j}) - \|D^{[h(\tau_j)]}(z_j - g^{[h(\tau_j)]})\|^2\\
&\le e^2(\tau_j - 0, z_j; \Delta^{(j)}_{k_j}) - \|D^{[h(\tau_j)]}(x(t^{[h(\tau_j)]}) - g^{[h(\tau_j)]})\|^2 + \zeta^2/N,
\end{aligned}$$

whence, due to the induction hypothesis and the equality $h(\tau_{j+1}) = h(\tau_j) + 1$, it follows that inequality (4.5) is valid for $j+1$ when $\tau_j = t^{[h(\tau_j)]}$.

Taking into consideration (2.5) and (4.5) for $j = k+1$, together with (4.3) and (4.1), we obtain

$$\begin{aligned}
\sum_{i = h(t_*)}^{N} \|D^{[i]}(x(t^{[i]}) - g^{[i]})\|^2 &= \sum_{i = h(t_*)}^{N-1} \|D^{[i]}(x(t^{[i]}) - g^{[i]})\|^2 + e^2(\tau_{k+1} - 0, z_{k+1}; \Delta^{(k+1)}_{k_{k+1}})\\
&\quad + \|D^{[N]}(x(t^{[N]}) - g^{[N]})\|^2 - \|D^{[N]}(z_{k+1} - g^{[N]})\|^2 \le e^2(t_* - 0, x_*; \Delta_k) + \zeta^2.
\end{aligned}$$

Thus, for the value $\gamma$ of quality index (1.4) realized on the considered motion, we have the inequality

$$\gamma \le e(t_* - 0, x_*; \Delta_k) + \zeta;$$

therefore, using (1.7), we obtain

$$\rho(t_*,x_*) - e(t_* - 0, x_*; \Delta_k) \le 2\zeta = \xi. \tag{4.10}$$

Let $x[t_*[\cdot]\vartheta]$ be the motion of system (1.1) generated from the position $(t_*,x_*)$ when the first player forms control actions according to the law $U^0 = \{u^0(\cdot), \varepsilon, \Delta_k\}$ based on the optimal minimax strategy $u^0(\cdot)$, while the second player, at every step $j = \overline{1,k}$, forms the realization $v[\tau_j[\cdot]\tau_{j+1})$ by means of the v-stability property (Lemma 3), using the information about the realized position $(\tau_j, x(\tau_j))$ and about the constant control realization $u(t) = u^0(\tau_j, x(\tau_j), \varepsilon)$ of the first player assigned for the interval $[\tau_j,\tau_{j+1})$. Then inequality (1.6) holds for this motion. Moreover, by induction on $j$ from $j = 1$ to $j = k+1$ and on the basis of the inequality

$$e(\tau_j + 0, x(\tau_j); \Delta^{(j)}_{k_j}) \le e(\tau_{j+1} - 0, x(\tau_{j+1}); \Delta^{(j+1)}_{k_{j+1}}),$$

which is valid due to the choice of $v[\tau_j[\cdot]\tau_{j+1})$, and equality (3.8), it can be proved that along this motion the following inequality holds as well:

$$\sum_{i = h(t_*)}^{h(\tau_j) - 1} \|D^{[i]}(x(t^{[i]}) - g^{[i]})\|^2 + e^2(\tau_j - 0, x(\tau_j); \Delta^{(j)}_{k_j}) \ge e^2(t_* - 0, x_*; \Delta_k). \tag{4.11}$$

From (4.11) for $j = k+1$, taking (2.5) into consideration, we derive that the realized value $\gamma$ of quality index (1.4) satisfies the inequality $\gamma \ge e(t_* - 0, x_*; \Delta_k)$. Hence, from (1.6) we conclude that

$$\rho(t_*,x_*) - e(t_* - 0, x_*; \Delta_k) \ge -\zeta = -\xi/2. \tag{4.12}$$

Inequalities (4.10) and (4.12) prove Theorem 1.

Remark 1. In a similar way, with obvious modifications, it can be verified that if the operations of minimum and maximum are interchanged in definition (2.2) of the function $\Delta\psi_j(\cdot)$, then the value $e(\cdot)$ (2.4) constructed by the modified procedure approximates the value function of differential game (1.1), (1.4) in the classes "counter strategies - strategies".

Remark 2. On the basis of the value $e(\cdot)$ (2.4), by means of the extremal shift to accompanying points [1,5], one can construct $\zeta$-optimal control laws of the players (see [5,13]) that guarantee inequalities (1.6) and (1.7).

5. Example

The example considered below is based on a model problem from [12, p. 49-58] (see also [5, section 38]).

Consider a dynamical system described by the following equations:

$$\begin{aligned}
dx_1/dt &= x_2,\\
dx_2/dt &= -t e^{0.2t} x_1 - 0.02 e^{0.2t} x_2 - 1.8(u_1 \cos v_1 - u_2 \sin v_1) + e^{0.2t} v_2,\\
dx_3/dt &= x_4,\\
dx_4/dt &= -t e^{0.2t} x_3 - 0.02 e^{0.2t} x_4 - 1.8(u_1 \sin v_1 + u_2 \cos v_1) + e^{0.2t} v_3,
\end{aligned} \tag{5.13}$$

$$0 \le t \le 4, \quad x = (x_1,x_2,x_3,x_4) \in \mathbb{R}^4, \quad u = (u_1,u_2) \in P = \{(1,0),\ (0,1),\ (-1,0),\ (0,-1)\} \subset \mathbb{R}^2,$$

$$v = (v_1,v_2,v_3) \in Q = \{(v_1,v_2,v_3) \in \mathbb{R}^3 : v_1 \in \{-\pi/4, \pi/4\},\ v_2^2 + v_3^2 \le 1\} \subset \mathbb{R}^3.$$

The initial condition

$$x(0) = (1, -1, 1, 1)$$

and the quality index

$$\gamma = \big(|x_1(2) + 0.5|^2 + |x_3(3) + 2|^2 + |x_1(4)|^2 + |x_3(4) - 2|^2\big)^{1/2} \tag{5.14}$$

are given.

The control problem for system (5.13) with quality index (5.14) was solved by means of the constructions described above. The results of numerical modeling are as follows. In the numerical experiments, we used a uniform partition of the time segment $[0,4]$ with step $\delta = 0.02$ and the value of the accuracy parameter $\varepsilon = 0.2$. The a priori calculated value of differential game (5.13), (5.14) in the classes "strategies - counter strategies" was $\rho_u \approx 2.46$, while in the classes "counter strategies - strategies" it was $\rho_v \approx 1.52$.

[Figure: two panels, each showing motion trajectories in the $(x_1, x_3)$ plane.]

Figure 1. Results of numerical modeling

In the left panel, the thin curve depicts the trajectory of system (5.13) formed by the $\zeta$-optimal control laws of the first and the second players in the classes "strategies - counter strategies". The realized value of quality index (5.14) was

$$\gamma = \big(|-1.55 + 0.5|^2 + |-0.91 + 2|^2 + |-1.16|^2 + |0.75 - 2|^2\big)^{1/2} \approx 2.28 \approx \rho_u.$$

The thick curve depicts the trajectory formed by the $\zeta$-optimal control laws of the first and the second players in the classes "counter strategies - strategies". The realized value of the quality index was

$$\gamma = \big(|-0.65 + 0.5|^2 + |-1.18 + 2|^2 + |-0.40|^2 + |0.78 - 2|^2\big)^{1/2} \approx 1.53 \approx \rho_v.$$
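The two realized values can be rechecked directly from the four deviations reported in each formula. The short sketch below only repeats that arithmetic; the helper name is hypothetical. The sign of the third deviation in the second formula is hard to read in the source, but it does not affect the result because each term is squared.

```python
import math

def euclidean_index(deviations):
    """Square root of the sum of squared deviations, as in (5.14)."""
    return math.sqrt(sum(d * d for d in deviations))

# Deviations from the targets, taken from the two formulas in the text.
gamma_u = euclidean_index([-1.55 + 0.5, -0.91 + 2, -1.16, 0.75 - 2])
gamma_v = euclidean_index([-0.65 + 0.5, -1.18 + 2, 0.40, 0.78 - 2])
```

Rounded to two decimals, these give 2.28 and 1.53, in agreement with the values stated in the text.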

In the right panel, the thin curve depicts the trajectory of system (5.13) formed by the $\zeta$-optimal control law of the second player in the classes "strategies - counter strategies", while the control actions of the first player were chosen randomly. The realized value of the quality index was $\gamma \approx 4.51 > \rho_u$. The thick curve depicts the trajectory formed by the $\zeta$-optimal control law of the first player in the classes "counter strategies - strategies", while the control actions of the second player were chosen randomly. The realized value of the quality index was $\gamma \approx 0.12 < \rho_v$.

The targets are shown in the pictures by small black squares. Points on the trajectories correspond to the moments of motion quality evaluation.


REFERENCES

1. Krasovskii N.N. Control of a Dynamic System. Problem about the Minimum of the Guaranteed Result. Moscow: Nauka, 1985. (in Russian)

2. Isaacs R. Differential Games. New York: John Wiley, 1965.

3. Krasovskii N.N. On the problem of unification of differential games // Doklady AN SSSR. 1976. Vol. 226, no. 6. P. 1260-1263.

4. Krasovskii A.N. Construction of mixed strategies on the basis of stochastic programs // J. Appl. Math. Mech. 1987. Vol. 51, no. 2. P. 144-149.

5. Krasovskii A.N., Krasovskii N.N. Control under Lack of Information. Berlin etc.: Birkhäuser, 1995.

6. Subbotin A.I. Minimax Inequalities and Hamilton-Jacobi Equations. Moscow: Nauka, 1991. (in Russian)

7. Subbotin A.I. Generalized Solutions of First-Order PDEs. The Dynamical Optimization Perspective. Boston: Birkhäuser, 1995.

8. Subbotin A.I. Existence and Uniqueness Results for Hamilton-Jacobi Equations // Nonlinear Anal. 1991. Vol. 16, no. 7/8. P. 683-699.

9. Lukoyanov N.Yu. One differential game with nonterminal payoff // Izvestiya akademii nauk. Teoriya i sistemi upravleniya. 1997. No. 1. P. 85-90.

10. Lukoyanov N.Yu. The problem of computing the value of a differential game for a positional functional // J. Appl. Math. Mech. 1998. Vol. 62, no. 2. P. 177-186.

11. Blagodatskikh V.I., Filippov A.F. Differential inclusions and optimal control // Proc. Steklov Inst. Math. 1986. No. 4. P. 199-259.

12. Krasovskii A.N., Reshetova T.N. Control under Information Deficiency: Study Guide. Sverdlovsk: UrGU, 1990. (in Russian)

13. Kornev D.V. On numerical solution of positional differential games with nonterminal payoff // Automation and Remote Control. 2012. Vol. 73, no. 11. P. 1808-1821.
