UDC 518.9 Vestnik of St. Petersburg University. Series 10. 2014. Issue 4
S. V. Chistyakov, F. F. Nikitin
ON REGULAR DIFFERENTIAL GAMES OF PURSUIT WITH FIXED DURATION
St. Petersburg State University, 7/9, Universitetskaya embankment, St. Petersburg, 199034, Russian Federation
In any differential game, the programmed maxmin is a guaranteed payoff of the first player. For a long time, owing to the simple geometric interpretation of the programmed maxmin and the difficulty of implementing Isaacs' method, the programmed maxmin was studied extensively. Researchers sought conditions under which the programmed maxmin is the value of the differential game. These conditions are called regularity conditions, and differential games satisfying them are called regular games. The programmed iteration method can be regarded as a non-smooth version of the dynamic programming method. Initially it was aimed at studying non-regular differential games; later it became clear that its scope of application is wider. For example, the theory of differential games as a whole can be built on the results of the programmed iteration method. One more example is provided in this article: based on the results of the programmed iteration method, the minimax theorem for convex-concave functions and the theorem on a measurable selector of a multi-valued map, we give a simple proof of a well-known regularity condition for the linear differential game of approach with fixed duration. Bibliogr. 14.
Keywords: differential games, zero-sum games, regular games, programmed iteration method.
Abstract (translated from the Russian). In any differential game, the programmed maxmin is a guaranteed payoff of the first player. For a long time, because of its simple geometric meaning in pursuit games and the difficulty of implementing Isaacs' method, this quantity was the subject of research aimed at finding conditions under which it is also the loss that the second player can guarantee not to exceed. These conditions are customarily called regularity conditions, and games in which they hold are called regular games. Thus, regularity conditions guarantee that the programmed maxmin is the value of the differential game. The programmed iteration method, a non-smooth version of the dynamic programming method, originated in the study of non-regular differential games, in which the programmed maxmin is not the value of the game. At the same time, the development of the method has shown that its capabilities are considerably broader. In particular, it can serve as a foundation for the theory of differential games as a whole. One more illustration of this is given in the present article, where, based on the results of the programmed iteration method, the minimax theorem for convex-concave functions and the theorem on a measurable selector of a multi-valued map, a simple justification is proposed for a known regularity condition in the linear game of approach at a given instant of time. Bibliogr. 14.
Chistyakov Sergei Vladimirovich, doctor of physical and mathematical sciences, professor; e-mail: [email protected]
Nikitin Fedor Fedorovich, candidate of physical and mathematical sciences, assistant; e-mail: [email protected]
1. Introduction. In a zero-sum differential game the programmed maxmin [1] is the guaranteed payoff of the maximizing player. For a long time, owing to the simple geometric interpretation of the programmed maxmin function [2] and the difficulties in applying Isaacs' method [3], research in differential games focused on finding conditions under which the programmed maxmin is also the guaranteed payoff of the second player. Such conditions are called regularity conditions, and games which possess the regularity property are called regular games. In other words, in a regular differential game the value of the game is equal to the programmed maxmin.
Initially the method of programmed iterations [4-7], which can be regarded as a non-smooth version of the dynamic programming method, was developed for non-regular differential games. For these games the programmed maxmin does not coincide with the value of the game. Later it turned out that the scope of application of the programmed iteration method is wider [8-12]. In particular, the theory of differential games can be developed on the basis of its results [12]. In this paper we demonstrate how a known regularity condition for linear differential games with fixed duration is derived from the programmed iteration method, the minimax theorem for convex-concave functions [13] and the theorem on measurable selectors of multi-valued maps [14].
Consider the game $\Gamma_T(t_0, x_0, y_0)$, in which the pursuer $P$ (in the space $\{x\} = \mathbb{R}^n$) and the evader $E$ (in the space $\{y\} = \mathbb{R}^m$) start from the positions $x(t_0) = x_0$ and $y(t_0) = y_0$ and move according to the systems of linear differential equations
$$\frac{dx}{dt} = A(t)x + B(t)u + f(t) \tag{1}$$
and
$$\frac{dy}{dt} = C(t)y + D(t)v + g(t). \tag{2}$$
Here $A(\cdot)$, $B(\cdot)$, $C(\cdot)$, $D(\cdot)$ are continuous matrix functions of the corresponding dimensions, $f(\cdot)$, $g(\cdot)$ are bounded measurable vector functions, and $u$, $v$ are the control vectors of the players, chosen from the sets
$$u \in P \in \operatorname{Comp} \mathbb{R}^p, \qquad v \in Q \in \operatorname{Comp} \mathbb{R}^q.$$
The payoff in the game is
$$H(x(T), y(T)) = \Bigl(\sum_{i=1}^{k} \bigl(x_i(T) - y_i(T)\bigr)^2\Bigr)^{1/2},$$
where $x(T) = (x_1(T), \ldots, x_n(T))$, $y(T) = (y_1(T), \ldots, y_m(T))$, $k \le \min\{n, m\}$.
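The dynamics (1), (2) and the terminal payoff can be simulated directly. The sketch below is an illustration only: it assumes constant matrices, uses a plain explicit Euler scheme, and the names `mat_vec`, `euler_trajectory` and `payoff` are hypothetical helpers, not part of the paper.

```python
import math

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def euler_trajectory(A, B, f, x0, control, t0, T, steps=1000):
    """Explicit Euler integration of dx/dt = A x + B u(t) + f(t) on [t0, T].

    A and B are constant matrices here (an assumption made for brevity);
    control and f are callables t -> vector.
    """
    dt = (T - t0) / steps
    x, t = list(x0), t0
    for _ in range(steps):
        Ax = mat_vec(A, x)
        Bu = mat_vec(B, control(t))
        ft = f(t)
        x = [xi + dt * (a + b + c) for xi, a, b, c in zip(x, Ax, Bu, ft)]
        t += dt
    return x

def payoff(xT, yT, k):
    """Terminal payoff H: Euclidean distance in the first k coordinates."""
    return math.sqrt(sum((xT[i] - yT[i]) ** 2 for i in range(k)))

# Simple motion in the plane: dx/dt = u, dy/dt = v (A = C = 0, B = D = I).
Z2 = [[0.0, 0.0], [0.0, 0.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
zero = lambda t: [0.0, 0.0]
xT = euler_trajectory(Z2, I2, zero, [0.0, 0.0], lambda t: [1.0, 0.0], 0.0, 1.0)
yT = euler_trajectory(Z2, I2, zero, [3.0, 4.0], lambda t: [0.0, 0.5], 0.0, 1.0)
print(payoff(xT, yT, k=2))
```

With time-dependent matrices the same loop applies if `A` and `B` are replaced by callables evaluated at `t`; a higher-order integrator would be preferable for stiff or fast-varying systems.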
Both players make their control decisions on the basis of full information about the game, i.e. they know the systems of differential equations, the initial conditions and the position $(t, x(t), y(t))$ at any moment $t \in [t_0, T]$.
We consider the game in the class of positional strategies [1]. However, this assumption is not essential, and the game could be formulated in other classes of strategies as well [2].
2. Regularity criterion. Let $C_P^t(t_*, x_*)$ (correspondingly $C_E^t(t_*, y_*)$) be the set of all positions reachable at the moment $t$ from the initial state $x_* = x(t_*)$ ($y_* = y(t_*)$) using measurable controls $u(\cdot)$ ($v(\cdot)$) with values $u(\tau) \in P$ ($v(\tau) \in Q$) for almost all $\tau \in [t_*, t]$. These sets are called reachability sets, and they are compact.
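Let $D \subset (-\infty, T] \times \mathbb{R}^n \times \mathbb{R}^m$ be a set which, together with every position $(t_*, x_*, y_*)$, also contains the set
$$D(t_*, x_*, y_*) = \{(t, x, y) \in [t_*, T] \times \mathbb{R}^n \times \mathbb{R}^m \mid x \in C_P^t(t_*, x_*),\ y \in C_E^t(t_*, y_*)\}.$$
Obviously, the set $D = D(t_0, x_0, y_0)$ has this property. We embed the game $\Gamma_T(t_0, x_0, y_0)$ in the family of games
$$\Gamma_T(D) = \{\Gamma_T(t_*, x_*, y_*) \mid (t_*, x_*, y_*) \in D\},$$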
where each element of the family is a differential game with its own initial position. Every game $\Gamma_T(t_*, x_*, y_*)$, $(t_*, x_*, y_*) \in D$, has a value [1]. The function
$$w(\cdot)\colon (t_*, x_*, y_*) \mapsto w(t_*, x_*, y_*)$$
maps every position $(t_*, x_*, y_*) \in D$ to the value $w(t_*, x_*, y_*)$ of the game $\Gamma_T(t_*, x_*, y_*)$. This function is called the value function of the game $\Gamma_T(D)$. The game $\Gamma_T(D)$ is regular if the value function is equal to the programmed maxmin function
$$w_-^{(0)}(\cdot)\colon (t_*, x_*, y_*) \mapsto w_-^{(0)}(t_*, x_*, y_*) = \max_{y \in C_E^T(t_*, y_*)}\ \min_{x \in C_P^T(t_*, x_*)} H(x, y), \tag{3}$$
$(t_*, x_*, y_*) \in D$.
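As a hedged illustration of definition (3), consider simple motion, an example assumed here and not treated in the text: $n = m = k$, $\dot x = u$ with $\|u\| \le 1$, and $\dot y = v$ with $\|v\| \le \beta$ for some $\beta \ge 0$. The reachability sets at time $T$ are then closed balls, and the programmed maxmin can be written in closed form:

```latex
% Simple motion (illustrative assumption); s is the remaining time
C_P^{T}(t_*,x_*) = \{x : \|x - x_*\| \le s\}, \qquad
C_E^{T}(t_*,y_*) = \{y : \|y - y_*\| \le \beta s\}, \qquad s = T - t_* .

% Inner minimum: distance from y to the pursuer's ball
\min_{x \in C_P^{T}(t_*,x_*)} \|x - y\| = \max\{0,\ \|y - x_*\| - s\}.

% Outer maximum: the evader moves straight away from x_*
w_-^{(0)}(t_*,x_*,y_*) = \max\{0,\ \|x_* - y_*\| + (\beta - 1)(T - t_*)\}.
```

The outer maximum uses the fact that the largest distance from $x_*$ to a point of the evader's ball is $\|x_* - y_*\| + \beta s$.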
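Define the operator $\Phi_-\colon C(D) \to C(D)$ so that for any function $w(\cdot) \in C(D)$ and any position $(t_*, x_*, y_*) \in D$ [6]
$$\Phi_- \circ w(t_*, x_*, y_*) = \max_{t \in [t_*, T]}\ \max_{y \in C_E^t(t_*, y_*)}\ \min_{x \in C_P^t(t_*, x_*)} w(t, x, y). \tag{4}$$
Here $\Phi_- \circ w(t_*, x_*, y_*)$ is the value of the image of the function $w(\cdot)$ at the position $(t_*, x_*, y_*)$.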
The following theorem is easily proved using facts from the method of programmed iterations.
Theorem 1. The game $\Gamma_T(D)$ is regular if and only if the programmed maxmin function $w_-^{(0)}(\cdot)$ is a fixed point of the operator $\Phi_-$.
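The fixed-point criterion of Theorem 1 can be probed numerically on a toy case. The sketch below assumes one-dimensional simple motion ($\dot x = u$, $|u| \le 1$, $\dot y = v$, $|v| \le \beta$), for which the programmed maxmin has the closed form $\max\{0, |x - y| + (\beta - 1)(T - t)\}$, and approximates operator (4) by brute force on grids. The example and the names `w0`, `grid` and `phi_minus` are assumptions made for illustration; the grid introduces an error of order $1/n$.

```python
def w0(t, x, y, T, beta):
    """Programmed maxmin for 1D simple motion |u| <= 1, |v| <= beta (closed form)."""
    return max(0.0, abs(x - y) + (beta - 1.0) * (T - t))

def grid(center, radius, n):
    """n + 1 equally spaced points of [center - radius, center + radius]."""
    if radius == 0.0:
        return [center]
    return [center - radius + 2.0 * radius * i / n for i in range(n + 1)]

def phi_minus(w, t0, x0, y0, T, beta, n=40):
    """Brute-force approximation of operator (4) on grids in t, y and x."""
    best = float("-inf")
    for j in range(n + 1):
        t = t0 + (T - t0) * j / n
        r = t - t0                      # time elapsed since t0
        xs = grid(x0, r, n)             # pursuer's reachable interval at time t
        ys = grid(y0, beta * r, n)      # evader's reachable interval at time t
        inner = max(min(w(t, x, y, T, beta) for x in xs) for y in ys)
        best = max(best, inner)
    return best

T = 1.0
for beta in (0.5, 1.0, 2.0):
    for (x0, y0) in ((0.0, 0.3), (0.0, 2.0)):
        lhs = phi_minus(w0, 0.0, x0, y0, T, beta)
        rhs = w0(0.0, x0, y0, T, beta)
        print(beta, x0, y0, round(lhs, 3), round(rhs, 3))
```

In each printed row the two numbers should agree up to the grid error, which is consistent with the programmed maxmin being a fixed point of $\Phi_-$ in this simple example.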
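3. Sufficient condition for regularity. Let $(x)_k$ ($(y)_k$) be the projection of the vector $x \in \mathbb{R}^n$ ($y \in \mathbb{R}^m$) onto the space $\mathbb{R}^k$ of its first $k$ coordinates. The function
$$H(x, y) = \Bigl(\sum_{i=1}^{k} (x_i - y_i)^2\Bigr)^{1/2}$$
can be expressed as
$$H(x, y) = \max_{l \in \mathbb{R}^k,\ \|l\| \le 1} \langle l, (x)_k - (y)_k \rangle,$$
where $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ are the scalar product and the Euclidean norm in $\mathbb{R}^k$, respectively. Using this expression in (3) and the minimax theorem for convex-concave functions [13], we get
$$\begin{aligned}
w_-^{(0)}(t_*, x_*, y_*) &= \max_{y \in C_E^T(t_*, y_*)}\ \min_{x \in C_P^T(t_*, x_*)}\ \max_{l \in \mathbb{R}^k,\ \|l\| \le 1} \langle l, (x)_k - (y)_k \rangle \\
&= \max_{y \in C_E^T(t_*, y_*)}\ \max_{l \in \mathbb{R}^k,\ \|l\| \le 1}\ \min_{x \in C_P^T(t_*, x_*)} \langle l, (x)_k - (y)_k \rangle.
\end{aligned} \tag{5}$$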
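The representation of the Euclidean norm as a maximum of scalar products over the unit ball, on which (5) rests, is easy to check numerically. The following sketch is illustrative only (the helper names and the sample vector are made up): it evaluates the scalar product at the maximiser $l = z/\|z\|$ and compares it with a random search over the unit ball.

```python
import math
import random

def dot(a, b):
    """Scalar product of two vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    """Euclidean norm."""
    return math.sqrt(dot(a, a))

def best_direction(z):
    """Maximiser of <l, z> over the unit ball: l = z / ||z|| (any unit l if z = 0)."""
    n = norm(z)
    if n == 0.0:
        return [1.0] + [0.0] * (len(z) - 1)
    return [zi / n for zi in z]

def random_unit_ball_point(k, rng):
    """Random point of the unit ball in R^k (Gaussian direction, random radius)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(k)]
    r = rng.random()
    n = norm(g)
    return [r * gi / n for gi in g]

rng = random.Random(0)
z = [3.0, -4.0, 12.0]                 # plays the role of (x)_k - (y)_k with k = 3
l_star = best_direction(z)
sampled_max = max(dot(random_unit_ball_point(3, rng), z) for _ in range(10000))
print(norm(z), dot(l_star, z), sampled_max)
```

The first two printed numbers coincide up to rounding, and the sampled maximum never exceeds them, as the Cauchy-Schwarz inequality requires.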
By the Cauchy formula, for any $x \in C_P^t(t_*, x_*)$ there exists a control $u(\cdot)$ such that
$$x = W(t, t_*)x_* + \int_{t_*}^{t} W(t, \tau)\,[B(\tau)u(\tau) + f(\tau)]\, d\tau. \tag{6}$$
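Formula (6) can be checked on a small example. The sketch below assumes a double integrator ($\dot x_1 = x_2$, $\dot x_2 = u$, $f = 0$) with a constant control, for which the transition matrix is $\begin{pmatrix} 1 & t - \tau \\ 0 & 1 \end{pmatrix}$, and compares the closed form given by the Cauchy formula with a numerical integration. The system and the function names are illustrative assumptions, not taken from the paper.

```python
def cauchy_double_integrator(x_star, u, t_star, t):
    """Formula (6) for dx1/dt = x2, dx2/dt = u with constant u and f = 0.

    Here A = [[0, 1], [0, 0]], B = [0, 1]^T, and the transition matrix is
    [[1, t - tau], [0, 1]] (assumed example).
    """
    s = t - t_star
    x1 = x_star[0] + s * x_star[1] + 0.5 * u * s * s   # transition term plus integral of (t - tau) u
    x2 = x_star[1] + u * s                              # transition term plus integral of u
    return [x1, x2]

def midpoint_integration(x_star, u, t_star, t, steps=200):
    """Integrate the same system numerically with the explicit midpoint rule."""
    dt = (t - t_star) / steps
    x1, x2 = x_star
    for _ in range(steps):
        m2 = x2 + 0.5 * dt * u          # velocity at the midpoint of the step
        x1, x2 = x1 + dt * m2, x2 + dt * u
    return [x1, x2]

x_formula = cauchy_double_integrator([1.0, -2.0], 0.5, 0.0, 3.0)
x_numeric = midpoint_integration([1.0, -2.0], 0.5, 0.0, 3.0)
print(x_formula, x_numeric)
```

For this particular system the midpoint rule is exact up to rounding, so the two results should agree to many digits; for general time-varying matrices the transition matrix usually has to be computed numerically.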
In (6), $W(t, \tau) = X(t)X^{-1}(\tau)$, where $X(t)$ is a fundamental matrix of solutions of the homogeneous linear differential equation corresponding to (1). Similarly, for any $y \in C_E^t(t_*, y_*)$ there exists a control $v(\cdot)$ of player $E$ such that
$$y = Z(t, t_*)y_* + \int_{t_*}^{t} Z(t, \tau)\,[D(\tau)v(\tau) + g(\tau)]\, d\tau, \tag{7}$$
where $Z(t, \tau) = Y(t)Y^{-1}(\tau)$ and $Y(t)$ is a fundamental matrix of solutions of the homogeneous linear differential equation corresponding to (2). Thus, if we define
$$h(\tau) = (W(T, \tau)f(\tau))_k - (Z(T, \tau)g(\tau))_k \tag{8}$$
and
$$p(l, t, x, y) = \Bigl\langle l,\ (W(T, t)x)_k - (Z(T, t)y)_k + \int_{t}^{T} h(\tau)\, d\tau \Bigr\rangle, \tag{9}$$
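then from (5)-(9) it follows that
$$\begin{aligned}
w_-^{(0)}(t_*, x_*, y_*) &= \max_{l \in \mathbb{R}^k,\ \|l\| \le 1}\ \max_{v(\cdot)}\ \min_{u(\cdot)} \Bigl\{ p(l, t_*, x_*, y_*) + \int_{t_*}^{T} \bigl\langle l, (W(T, \tau)B(\tau)u(\tau))_k - (Z(T, \tau)D(\tau)v(\tau))_k \bigr\rangle\, d\tau \Bigr\} \\
&= \max_{l \in \mathbb{R}^k,\ \|l\| \le 1} \Bigl\{ p(l, t_*, x_*, y_*) + \max_{v(\cdot)}\ \min_{u(\cdot)} \int_{t_*}^{T} \bigl\langle l, (W(T, \tau)B(\tau)u(\tau))_k - (Z(T, \tau)D(\tau)v(\tau))_k \bigr\rangle\, d\tau \Bigr\},
\end{aligned} \tag{10}$$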
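Here the maximum is taken over all controls $v(\cdot)$ of player $E$ and the minimum over all controls $u(\cdot)$ of player $P$. Note that
$$\begin{aligned}
&\max_{v(\cdot)}\ \min_{u(\cdot)} \int_{t_*}^{T} \bigl\langle l, (W(T, \tau)B(\tau)u(\tau))_k - (Z(T, \tau)D(\tau)v(\tau))_k \bigr\rangle\, d\tau \\
&\quad = \min_{u(\cdot)} \int_{t_*}^{T} \bigl\langle l, (W(T, \tau)B(\tau)u(\tau))_k \bigr\rangle\, d\tau - \min_{v(\cdot)} \int_{t_*}^{T} \bigl\langle l, (Z(T, \tau)D(\tau)v(\tau))_k \bigr\rangle\, d\tau \\
&\quad = \int_{t_*}^{T} \min_{u \in P} \bigl\langle l, (W(T, \tau)B(\tau)u)_k \bigr\rangle\, d\tau - \int_{t_*}^{T} \min_{v \in Q} \bigl\langle l, (Z(T, \tau)D(\tau)v)_k \bigr\rangle\, d\tau \\
&\quad = \int_{t_*}^{T} \max_{v \in Q}\ \min_{u \in P} \bigl\langle l, (W(T, \tau)B(\tau)u)_k - (Z(T, \tau)D(\tau)v)_k \bigr\rangle\, d\tau.
\end{aligned} \tag{11}$$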
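The step in (11) that moves the minimum over control functions inside the integral has a transparent discrete analogue: for piecewise-constant controls, minimising a Riemann sum over all control sequences gives the same result as summing the pointwise minima, and the pointwise minimisers form the selector. The sketch below is a toy check under stated assumptions: a scalar control with three admissible values, and a made-up vector function `b(tau)` standing in for the matrix product in the integrand.

```python
import itertools
import math

def integrand(l, tau, u):
    """Example integrand <l, b(tau)> * u for a scalar control u.

    The vector b(tau) stands in for the projected matrix product in (11);
    it is a made-up example, not taken from the paper.
    """
    b = [math.cos(tau), math.sin(2.0 * tau)]
    return (l[0] * b[0] + l[1] * b[1]) * u

def min_over_controls(l, taus, dt, values):
    """Minimise the Riemann sum over all piecewise-constant controls (brute force)."""
    return min(
        sum(integrand(l, tau, u) for tau, u in zip(taus, seq)) * dt
        for seq in itertools.product(values, repeat=len(taus))
    )

def sum_of_pointwise_minima(l, taus, dt, values):
    """Riemann sum of the pointwise minima, together with the minimising selector."""
    selector = [min(values, key=lambda u, tau=tau: integrand(l, tau, u)) for tau in taus]
    total = sum(integrand(l, tau, u) for tau, u in zip(taus, selector)) * dt
    return total, selector

N, T = 8, 2.0
dt = T / N
taus = [(j + 0.5) * dt for j in range(N)]     # midpoints of the time cells
values = [-1.0, 0.0, 1.0]                     # finite sample of the control set
l = [0.6, -0.8]
print(min_over_controls(l, taus, dt, values))
print(sum_of_pointwise_minima(l, taus, dt, values))
```

In the continuous setting the same argument needs the pointwise minimiser to be a measurable function of time, which is exactly what a measurable selection theorem provides.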
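Equalities (11) are valid because each of the multi-valued maps
$$\tau \mapsto \bigl\{u^* \in P \mid \langle l, (W(T, \tau)B(\tau)u^*)_k \rangle = \min_{u \in P} \langle l, (W(T, \tau)B(\tau)u)_k \rangle \bigr\}$$
and
$$\tau \mapsto \bigl\{v^* \in Q \mid \langle l, (Z(T, \tau)D(\tau)v^*)_k \rangle = \min_{v \in Q} \langle l, (Z(T, \tau)D(\tau)v)_k \rangle \bigr\}$$
is upper semicontinuous and, hence, possesses a measurable selector [14]. From (10) and (11) we get
$$w_-^{(0)}(t_*, x_*, y_*) = \max_{l \in \mathbb{R}^k,\ \|l\| \le 1} \Bigl\{ p(l, t_*, x_*, y_*) + \int_{t_*}^{T} \max_{v \in Q}\ \min_{u \in P} \bigl\langle l, (W(T, \tau)B(\tau)u)_k - (Z(T, \tau)D(\tau)v)_k \bigr\rangle\, d\tau \Bigr\}. \tag{12}$$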
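Formula (12) can be compared with definition (3) on a toy case. The sketch below again assumes one-dimensional simple motion ($\dot x = u$, $|u| \le 1$, $\dot y = v$, $|v| \le \beta$, so both transition matrices equal 1 and $h = 0$); then $p(l, t, x, y) = l(x - y)$ and the integrand in (12) equals $(\beta - 1)|l|$. The function names and the grids are illustrative assumptions.

```python
def maxmin_direct(t, x0, y0, T, beta, n=200):
    """Definition (3) by brute force: max over y, min over x of |x - y| at time T."""
    s = T - t
    xs = [x0 - s + 2.0 * s * i / n for i in range(n + 1)]
    ys = [y0 - beta * s + 2.0 * beta * s * i / n for i in range(n + 1)]
    return max(min(abs(x - y) for x in xs) for y in ys)

def maxmin_via_12(t, x0, y0, T, beta, n=200):
    """Formula (12) for the same game: maximise over l in [-1, 1] on a grid."""
    s = T - t
    ls = [-1.0 + 2.0 * i / n for i in range(n + 1)]
    # p(l, t, x, y) = l (x - y); the integrand in (12) equals (beta - 1) |l|.
    return max(l * (x0 - y0) + s * (beta - 1.0) * abs(l) for l in ls)

T = 1.0
for beta, y0 in ((0.5, 2.0), (0.5, 0.3), (2.0, 0.3)):
    print(beta, y0,
          maxmin_direct(0.0, 0.0, y0, T, beta),
          maxmin_via_12(0.0, 0.0, y0, T, beta))
```

The two columns should agree up to the grid error of the brute-force search, which illustrates how (12) reduces a max-min over reachability sets to a maximisation over the unit ball.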
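Theorem 2. If for any $\tau \in (-\infty, T]$ the function
$$\varphi(l, \tau) = \max_{v \in Q}\ \min_{u \in P} \bigl\langle l, (W(T, \tau)B(\tau)u)_k - (Z(T, \tau)D(\tau)v)_k \bigr\rangle \tag{13}$$
is concave in $l \in \mathbb{R}^k$, $\|l\| \le 1$, then the game $\Gamma_T(D)$ is regular.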
Proof. By Theorem 1 it is enough to prove that the programmed maxmin function $w_-^{(0)}(\cdot)$ is a fixed point of the operator $\Phi_-\colon C(D) \to C(D)$.
Let $(t_*, x_*, y_*) \in D$ be an arbitrary position. From (4), (12) and (13) it follows that
$$\Phi_- \circ w_-^{(0)}(t_*, x_*, y_*) = \max_{t \in [t_*, T]}\ \max_{y \in C_E^t(t_*, y_*)}\ \min_{x \in C_P^t(t_*, x_*)}\ \max_{l \in \mathbb{R}^k,\ \|l\| \le 1} \Bigl\{ p(l, t, x, y) + \int_{t}^{T} \varphi(l, \tau)\, d\tau \Bigr\}. \tag{14}$$
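By (9), the function $p(l, t, x, y)$ is linear in $l$ and affine in $x$ and $y$. Therefore, since $\varphi(l, \tau)$ is concave in $l \in \mathbb{R}^k$, $\|l\| \le 1$, the function
$$\chi(l, t, x, y) = p(l, t, x, y) + \int_{t}^{T} \varphi(l, \tau)\, d\tau$$
is convex in $x \in C_P^t(t_*, x_*)$ and concave in $l \in \mathbb{R}^k$, $\|l\| \le 1$. Hence, by the minimax theorem [13], we can interchange the operations of minimum over $x$ and maximum over $l$ on the right-hand side of (14), so that
$$\Phi_- \circ w_-^{(0)}(t_*, x_*, y_*) = \max_{l \in \mathbb{R}^k,\ \|l\| \le 1}\ \max_{t \in [t_*, T]}\ \max_{y \in C_E^t(t_*, y_*)}\ \min_{x \in C_P^t(t_*, x_*)} \Bigl\{ p(l, t, x, y) + \int_{t}^{T} \varphi(l, \tau)\, d\tau \Bigr\},$$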
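and, due to (9),
$$\Phi_- \circ w_-^{(0)}(t_*, x_*, y_*) = \max_{l \in \mathbb{R}^k,\ \|l\| \le 1}\ \max_{t \in [t_*, T]}\ \max_{y \in C_E^t(t_*, y_*)}\ \min_{x \in C_P^t(t_*, x_*)} \Bigl\{ \Bigl\langle l, (W(T, t)x)_k - (Z(T, t)y)_k + \int_{t}^{T} h(\tau)\, d\tau \Bigr\rangle + \int_{t}^{T} \varphi(l, \tau)\, d\tau \Bigr\}$$
or
$$\begin{aligned}
\Phi_- \circ w_-^{(0)}(t_*, x_*, y_*) = \max_{l \in \mathbb{R}^k,\ \|l\| \le 1}\ \max_{t \in [t_*, T]} \Bigl\{ &\min_{x \in C_P^t(t_*, x_*)} \langle l, (W(T, t)x)_k \rangle - \min_{y \in C_E^t(t_*, y_*)} \langle l, (Z(T, t)y)_k \rangle \\
&+ \Bigl\langle l, \int_{t}^{T} h(\tau)\, d\tau \Bigr\rangle + \int_{t}^{T} \varphi(l, \tau)\, d\tau \Bigr\}.
\end{aligned} \tag{15}$$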
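Note that $W(T, t)W(t, \tau) = W(T, \tau)$. Then from (6) we get
$$\begin{aligned}
\min_{x \in C_P^t(t_*, x_*)} \langle l, (W(T, t)x)_k \rangle
&= \min_{u(\cdot)} \Bigl\langle l, (W(T, t_*)x_*)_k + \int_{t_*}^{t} \bigl[ (W(T, \tau)B(\tau)u(\tau))_k + (W(T, \tau)f(\tau))_k \bigr]\, d\tau \Bigr\rangle \\
&= \Bigl\langle l, (W(T, t_*)x_*)_k + \int_{t_*}^{t} (W(T, \tau)f(\tau))_k\, d\tau \Bigr\rangle + \int_{t_*}^{t} \min_{u \in P} \langle l, (W(T, \tau)B(\tau)u)_k \rangle\, d\tau,
\end{aligned} \tag{16}$$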
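where the interchange of the minimum and the integral is justified in the same way as in (11). Similarly, we have
$$\min_{y \in C_E^t(t_*, y_*)} \langle l, (Z(T, t)y)_k \rangle = \Bigl\langle l, (Z(T, t_*)y_*)_k + \int_{t_*}^{t} (Z(T, \tau)g(\tau))_k\, d\tau \Bigr\rangle + \int_{t_*}^{t} \min_{v \in Q} \langle l, (Z(T, \tau)D(\tau)v)_k \rangle\, d\tau. \tag{17}$$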
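From (16) and (17), taking into account (8) and (13), we get
$$\min_{x \in C_P^t(t_*, x_*)} \langle l, (W(T, t)x)_k \rangle - \min_{y \in C_E^t(t_*, y_*)} \langle l, (Z(T, t)y)_k \rangle = \bigl\langle l, (W(T, t_*)x_*)_k - (Z(T, t_*)y_*)_k \bigr\rangle + \Bigl\langle l, \int_{t_*}^{t} h(\tau)\, d\tau \Bigr\rangle + \int_{t_*}^{t} \varphi(l, \tau)\, d\tau.$$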
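Substituting this expression into (15), we conclude that
$$\begin{aligned}
\Phi_- \circ w_-^{(0)}(t_*, x_*, y_*)
&= \max_{l \in \mathbb{R}^k,\ \|l\| \le 1}\ \max_{t \in [t_*, T]} \Bigl\{ \bigl\langle l, (W(T, t_*)x_*)_k - (Z(T, t_*)y_*)_k \bigr\rangle + \Bigl\langle l, \int_{t_*}^{t} h(\tau)\, d\tau \Bigr\rangle + \int_{t_*}^{t} \varphi(l, \tau)\, d\tau \\
&\qquad\qquad\qquad\qquad + \Bigl\langle l, \int_{t}^{T} h(\tau)\, d\tau \Bigr\rangle + \int_{t}^{T} \varphi(l, \tau)\, d\tau \Bigr\} \\
&= \max_{l \in \mathbb{R}^k,\ \|l\| \le 1} \Bigl\{ \Bigl\langle l, (W(T, t_*)x_*)_k - (Z(T, t_*)y_*)_k + \int_{t_*}^{T} h(\tau)\, d\tau \Bigr\rangle + \int_{t_*}^{T} \varphi(l, \tau)\, d\tau \Bigr\}.
\end{aligned}$$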
From this, (9), (12) and (13) it follows that
$$\Phi_- \circ w_-^{(0)}(t_*, x_*, y_*) = w_-^{(0)}(t_*, x_*, y_*).$$
Since the position $(t_*, x_*, y_*) \in D$ was chosen arbitrarily, the last equality means that the programmed maxmin function $w_-^{(0)}(\cdot)$ is indeed a fixed point of the operator $\Phi_-$. The theorem is proved.
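Note. The conditions of Theorem 2 are satisfied, for example, when $A(\cdot) = C(\cdot)$, $B(\cdot) = D(\cdot)$ and $Q = a + \beta P$ with $0 \le \beta \le 1$. Indeed, in this case $Z = W$ and (13) gives $\varphi(l, \tau) = (1 - \beta) \min_{u \in P} \langle l, (W(T, \tau)B(\tau)u)_k \rangle - \langle l, (W(T, \tau)B(\tau)a)_k \rangle$, which is concave in $l$ as the sum of a nonnegative multiple of a minimum of linear functions and a linear function.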
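The concavity hypothesis of Theorem 2 can be tested numerically on a simple case. The sketch below assumes planar simple motion with $P$ the unit disc and $Q$ the disc of radius $\beta$, so that the function in (13) reduces to $(\beta - 1)\|l\|$; it then searches random pairs of points in the unit disc for a violation of midpoint concavity. The example and the function names are illustrative assumptions.

```python
import math
import random

def phi(l, beta):
    """The function of (13) for planar simple motion with ||u|| <= 1, ||v|| <= beta.

    The minimum of <l, u> over the unit disc is -||l||, and the minimum of
    <l, v> over the disc of radius beta is -beta ||l||, hence (beta - 1) ||l||.
    """
    return (beta - 1.0) * math.hypot(l[0], l[1])

def violates_concavity(beta, trials=2000, seed=1):
    """Search random pairs in the unit disc for a midpoint-concavity violation."""
    rng = random.Random(seed)

    def point():
        # Rejection sampling of a uniformly distributed point of the unit disc.
        while True:
            p = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
            if math.hypot(p[0], p[1]) <= 1.0:
                return p

    for _ in range(trials):
        a, b = point(), point()
        mid = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
        if phi(mid, beta) < 0.5 * (phi(a, beta) + phi(b, beta)) - 1e-12:
            return True
    return False

for beta in (0.0, 0.5, 1.0, 1.5):
    print(beta, violates_concavity(beta))
```

No violation is expected for $\beta \le 1$, while for $\beta > 1$ the function is convex and a violation is found; since Theorem 2 gives only a sufficient condition, the latter case by itself says nothing about regularity.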
4. Conclusions. This paper has demonstrated how a regularity condition for linear differential games with fixed duration can be derived from the results of the programmed iteration method, the minimax theorem for convex-concave functions and the theorem on measurable selectors of multi-valued maps.
References
1. Krasovskii N. N., Subbotin A. I. Game-theoretical control problems. London: Springer, 2011, 532 p.
2. Petrosyan L. A. Differencial'nye igry presledovanija (Differential pursuit games). Leningrad: Izd-vo Leningr. un-ta, 1977, 222 p.
3. Isaacs R. Differential games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. New York: John Wiley and Sons, Inc., 1965, 384 p.
4. Chentsov A. G. O strukture odnoj igrovoj zadachi sblizhenija (The structure of an approach problem). Dokl. Akad. Nauk of the USSR, 1975, vol. 224, pp. 1272-1275.
5. Chentsov A. G. Ob igrovoj zadache sblizhenija v zadannyj moment vremeni (On differential game of approach). Mat. sb., 1976, vol. 99, issue 3, pp. 394-420.
6. Chistyakov S. V., Petrosyan L. A. Ob odnom podhode k resheniyu igr presledovanija (On one approach for solutions of games of pursuit). Prikl. Mat. Mekh., 1977, vol. 41, pp. 825-832.
7. Chistyakov S. V. K resheniyu igrovyh zadach presledovanija (On solutions for game problems of pursuit). Prikl. Mat. Mekh., 1977, vol. 41, pp. 825-832.
8. Chistyakov S. V. O funkcional'nyh uravnenijah v igrah sblizhenija v zadannyj moment vremeni (On functional equations for differential games with fixed duration). Prikl. Mat. Mekh., 1982, vol. 41, pp. 874-877.
9. Chistyakov S. V. Programmnye iteracii i universal'nye ε-optimal'nye strategii v pozicionnoj differencial'noj igre (Programmed iterations and universal ε-optimal strategies in positional differential game). Dokl. Akad. Nauk of the USSR, 1991, vol. 319, no. 6, pp. 1333-1335.
10. Chentsov A. G., Subbotin A. I. Iteracionnaja procedura postroenija minimaksnyh i vjazkostnyh reshenij uravnenija Gamil'tona-Jakobi (An iterative procedure for constructing minimax and viscosity solutions for the Hamilton-Jacobi equations and its generalization). Proc. Steklov Inst. Math., 1999, vol. 224, pp. 286-309.
11. Chistyakov S. V. Operatory znachenija v teorii differencial'nyh igr (Value operators in the theory of differential games). Izv. IMI Udm. State University, 2006, vol. 37, issue 3, pp. 169-172.
12. Chistyakov S. V., Nikitin F. F. Teorema sushhestvovanija i edinstvennosti reshenija obobshhennogo uravnenija Ajzeksa-Bellmana (Existence and uniqueness theorem for a generalized Isaacs-Bellman equation). Differential Equations, 2007, vol. 43, no. 6, pp. 757-766.
13. Fan Ky. Minimax theorem. Proc. Nat. Acad. Sci. USA, 1953, vol. 39, no. 1, pp. 42-47.
14. Castaing C., Valadier M. Convex analysis and measurable multifunctions. New York: Springer-Verlag, 1977, 277 p.
The article was received by the editorial office on June 26, 2014.