CC BY

Strong Strategic Support of Cooperative Solutions in Differential Games

Sergey Chistyakov and Leon Petrosyan

St. Petersburg University,

Faculty of Applied Mathematics and Control Processes,

35 Universitetsky prospekt, St. Petersburg, 198504, Russia E-mail: [email protected]

Abstract. The problem of strategically supported cooperation in n-person differential games with integral payoffs is considered. Based on the initial differential game, a new associated differential game (CD-game) is constructed. In addition to the initial game, it models the players' actions connected with the transition from the strategic form of the game to the cooperative form with a principle of optimality chosen in advance. The model allows each player to refuse cooperation at any time instant t. The core is taken as the cooperative principle of optimality. It is supposed that the components of an imputation from the core along any admissible trajectory are absolutely continuous functions of time. The construction of the CD-game is based on the so-called imputation distribution procedure described earlier in (Petrosjan and Zenkevich, 2009). The theorem established by the authors states that if at each instant of time along the conditionally optimal (cooperative) trajectory the future payments to each coalition of players according to the imputation distribution procedure exceed the maximal guaranteed value which this coalition can achieve in the CD-game, then there exists a strong Nash equilibrium in the class of recursive strategies first introduced in (Chistyakov, 1999). The proof of this theorem uses results and methods published in (Chistyakov, 1999; Chentsov, 1976).

Keywords: strong Nash equilibrium, time-consistency, core, cooperative trajectory.

1. Introduction

As in (Petrosjan and Zenkevich, 2009), this paper considers the problem of strategic support of cooperation in a differential m-person game with prescribed duration T and independent motions:

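$$\frac{dx^{(i)}}{dt} = f^{(i)}(t, x^{(i)}, u^{(i)}), \quad i \in I = [1 : m], \tag{1}$$

$$x^{(i)} \in R^{n(i)}, \quad u^{(i)} \in P^{(i)} \in \operatorname{Comp} R^{k(i)}, \quad x^{(i)}(t_0) = x_0^{(i)}, \quad i \in I. \tag{2}$$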

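The payoffs of the players $i \in I = [1 : m]$ have integral form

$$H^{(i)}_{t_0, x_0}(u(\cdot)) = \int_{t_0}^{T} h^{(i)}\big(t, x(t, t_0, x_0, u(\cdot)), u(t)\big)\,dt.$$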

Here $u(\cdot) = (u^{(1)}(\cdot), \ldots, u^{(m)}(\cdot))$ is a given m-vector of open loop controls,

$$x(t, t_0, x_0, u(\cdot)) = \big(x^{(1)}(t, t_0, x_0^{(1)}, u^{(1)}(\cdot)), \ldots, x^{(m)}(t, t_0, x_0^{(m)}, u^{(m)}(\cdot))\big),$$

where $x^{(i)}(\cdot) = x(\cdot, t_0, x_0^{(i)}, u^{(i)}(\cdot))$ is the solution of the Cauchy problem for the i-th subsystem of (1) with the corresponding initial conditions (2) and admissible open loop control $u^{(i)}(\cdot)$ of player i.

Admissible open loop controls of the players $i \in I$ are Lebesgue measurable open loop controls

$$u^{(i)}(\cdot) : t \mapsto u^{(i)}(t) \in R^{k(i)}$$

such that

$$u^{(i)}(t) \in P^{(i)} \quad \text{for all } t \in [t_0, T].$$

It is supposed that each of the functions

$$f^{(i)} : R \times R^{n(i)} \times P^{(i)} \to R^{n(i)}, \quad i \in I,$$

is continuous, locally Lipschitz with respect to $x^{(i)}$, and satisfies the following condition: there exists $\lambda^{(i)} > 0$ such that

$$\|f^{(i)}(t, x^{(i)}, u^{(i)})\| \le \lambda^{(i)}\,(1 + \|x^{(i)}\|) \quad \forall x^{(i)} \in R^{n(i)}, \ \forall u^{(i)} \in P^{(i)}.$$

Each of the functions

$$h^{(i)} : R \times R^{n(i)} \times P^{(i)} \to R, \quad i \in I,$$

is also continuous.

It is supposed that at each time instant $t \in [t_0, T]$ the players have information about the trajectory (solution) $x^{(i)}(\tau) = x(\tau, t_0, x_0^{(i)}, u^{(i)}(\cdot))$ of the system (1), (2) on the time interval $[t_0, t]$ and use recursive strategies (Chistyakov, 1977; Chistyakov, 1999).
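To make the setting concrete, the following minimal Python sketch integrates two independent subsystems of the form (1), (2) under given open loop controls and accumulates an integral payoff for each player. The right-hand sides `f`, the integrands `h`, the controls, the scalar states and the explicit Euler scheme are all illustrative assumptions, not taken from the paper; in particular, each integrand is assumed here to depend only on the player's own state and control.

```python
def simulate(f, h, t0, x0, T, u, steps=1000):
    """Explicit Euler scheme for one subsystem dx/dt = f(t, x, u)
    together with the running integral of h(t, x, u) over [t0, T]."""
    dt = (T - t0) / steps
    t, x, payoff = t0, x0, 0.0
    for _ in range(steps):
        ut = u(t)                      # open loop control value at time t
        payoff += h(t, x, ut) * dt     # left Riemann sum for the payoff
        x += f(t, x, ut) * dt          # Euler step for the state
        t += dt
    return x, payoff

# Hypothetical data for two players with independent motions.
players = {
    1: dict(f=lambda t, x, u: u, h=lambda t, x, u: x,
            x0=0.0, u=lambda t: 1.0),
    2: dict(f=lambda t, x, u: -x + u, h=lambda t, x, u: -u * u,
            x0=1.0, u=lambda t: 0.0),
}

# Each subsystem is integrated separately: the motions do not interact.
results = {i: simulate(p["f"], p["h"], 0.0, p["x0"], 1.0, p["u"])
           for i, p in players.items()}
```

Because the motions are independent, each player's state can be computed without reference to the other players' controls; the interaction enters only through the payoffs and the information available to the players.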

2. Recursive strategies

Recursive strategies were first introduced in (Chistyakov, 1977) to justify the dynamic programming approach in zero sum differential games, known as the method of open loop iterations in non-regular differential games with a non-smooth value function. The ε-optimal strategies constructed with the use of this method are universal in the sense that they remain ε-optimal in any subgame of the previously defined differential game (for every ε > 0). Exploiting this property, it became possible to prove the existence of ε-equilibrium (Nash equilibrium) in non-zero sum differential games (for every ε > 0) using the so-called "punishment strategies" (Chistyakov, 1981).

The basic idea is that when one of the players deviates from the conditionally optimal trajectory, the other players, after some small time delay, start to play against the deviating player. As a result, the deviating player is not able to get much more than he could get by following the conditionally optimal trajectory. Punishing the deviating player at each time instant with one and the same strategy is possible because of the universal character of ε-optimal strategies in zero sum differential games.

In this paper the same approach is used to verify the stability of cooperative agreements in the game $\Gamma(t_0, x_0)$, and, as in the case mentioned above, the principal argument is the universal character of ε-optimal recursive strategies in the specially defined zero sum games $\Gamma_S(t_0, x_0)$, $S \subset I$, associated with the non-zero sum game $\Gamma(t_0, x_0)$.

Recursive strategies lie between piecewise open loop strategies (Petrosjan, 1993) and the ε-strategies introduced by B. N. Pshenichny (Pschenichny, 1973). They differ from piecewise open loop strategies in that, as with the ε-strategies of B. N. Pshenichny, the moments of correction of the open loop controls are not prescribed at the beginning of the game but are defined during the game. At the same time, they differ from the ε-strategies of B. N. Pshenichny in that the open loop controls are formed in a finite number of steps.

A recursive strategy $U_i^{(n)}$ of player i with maximal number of control corrections n is a procedure by which player i forms an admissible open loop control in the game $\Gamma(t_0, x_0)$, $(t_0, x_0) \in D$.

At the beginning of the game $\Gamma(t_0, x_0)$, player i, using the recursive strategy $U_i^{(n)}$, defines the first correction instant $t_1^{(i)} \in (t_0, T]$ and his admissible open loop control $u^{(i)} = u^{(i)}(t)$ on the time interval $[t_0, t_1^{(i)}]$. Then, if $t_1^{(i)} < T$, having the information about the state of the game at the time instant $t_1^{(i)}$, he chooses the next moment of correction $t_2^{(i)}$ and his admissible open loop control $u^{(i)} = u^{(i)}(t)$ on the time interval $(t_1^{(i)}, t_2^{(i)}]$, and so on. Then either on the k-th step ($k \le n - 1$) the admissible control is formed on the time interval $[t_{k-1}^{(i)}, T]$, or on step n player i ends the process by choosing at the time instant $t_{n-1}^{(i)}$ his admissible control on the remaining time interval $(t_{n-1}^{(i)}, T]$.
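The procedure just described can be sketched in Python as a loop with at most n corrections. This is only an illustration under stated assumptions: the decision rule `rule(t, x)`, the scalar dynamics `f`, the Euler step and all names are hypothetical, and a single player is simulated in isolation.

```python
def run_recursive_strategy(rule, f, t0, x0, T, n, dt=1e-3):
    """Form an open loop control piece by piece with at most n corrections.

    rule(t, x) returns (next correction instant, control as a function of time)
    based on the state observed at the current correction instant.
    """
    t, x = t0, x0
    pieces = []                        # time intervals of the open loop pieces
    for step in range(1, n + 1):
        t_next, u = rule(t, x)
        if step == n:
            t_next = T                 # the last piece must cover the rest of the game
        else:
            t_next = min(max(t_next, t + dt), T)
        pieces.append((t, t_next))
        while t < t_next - 1e-12:      # Euler integration of dx/dt = f(t, x, u(t))
            h = min(dt, t_next - t)
            x += h * f(t, x, u(t))
            t += h
        t = t_next
        if t >= T:                     # control already formed up to T
            break
    return x, pieces

# Hypothetical rule: correct every 0.5 time units, hold the control -x(observed).
def rule(t, x):
    return t + 0.5, (lambda s, c=-x: c)

x_T, pieces = run_recursive_strategy(rule, lambda t, x, u: u, 0.0, 1.0, 2.0, 3)
```

In this run the correction instants are chosen during the game from the observed state, and the third (last allowed) step covers the whole remaining interval, which mirrors the finite number of steps that distinguishes recursive strategies from ε-strategies.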

3. Associated zero sum games and corresponding solutions

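For each given state $(t_*, x_*) \in D$ and nonempty coalition $S \subset I$, consider the zero sum differential game $\Gamma_S(t_*, x_*)$ between the coalition $S$ and $I \setminus S$ with the same dynamics as in $\Gamma(t_*, x_*)$ and the payoff of the coalition $S$ equal to the sum of the payoffs of the players $i \in S$ in the game $\Gamma(t_*, x_*)$:

$$\sum_{i \in S} H^{(i)}_{t_*, x_*}\big(u^{(S)}(\cdot), u^{(I \setminus S)}(\cdot)\big) = \sum_{i \in S} H^{(i)}_{t_*, x_*}(u(\cdot)) = \sum_{i \in S} \int_{t_*}^{T} h^{(i)}(t, x(t), u(t))\,dt,$$

here

$$u^{(S)}(\cdot) = \{u^{(i)}(\cdot)\}_{i \in S}, \quad u^{(I \setminus S)}(\cdot) = \{u^{(j)}(\cdot)\}_{j \in I \setminus S},$$

$$u(\cdot) = \big(u^{(S)}(\cdot), u^{(I \setminus S)}(\cdot)\big) = \big(u^{(1)}(\cdot), \ldots, u^{(m)}(\cdot)\big).$$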

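The games $\Gamma_S(t_*, x_*)$, $S \subset I$, $(t_*, x_*) \in D$, like the games $\Gamma(t_*, x_*)$, $(t_*, x_*) \in D$, are considered in the class of recursive strategies. Under the conditions formulated above, each of the games $\Gamma_S(t_*, x_*)$, $S \subset I$, $(t_*, x_*) \in D$, has a value

$$\operatorname{val} \Gamma_S(t_*, x_*).$$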

If $S = I$, the game $\Gamma_S(t_*, x_*)$ becomes a one-player optimization problem. We suppose that in this game there exists an optimal open loop solution. The corresponding trajectory, which is a solution of (1), (2) on the time interval $[t_0, T]$, we denote by

$$x_0(\cdot) = \big(x_0^{(1)}(\cdot), \ldots, x_0^{(m)}(\cdot)\big)$$

and call the "conditionally optimal cooperative trajectory". This trajectory need not be unique. Thus on the set $D$ the mapping

$$v(\cdot) : D \to R^{2^I}$$

is defined with coordinate functions

$$v_S(\cdot) : D \to R, \quad S \subset I, \qquad v_S(t_*, x_*) = \operatorname{val} \Gamma_S(t_*, x_*).$$

This mapping assigns to each state $(t_*, x_*) \in D$ a characteristic function $v(t_*, x_*) : 2^I \to R$ of the non-zero sum game $\Gamma(t_*, x_*)$ and thus an m-person classical cooperative game $(I, v(t_*, x_*))$.

Let $E(t_*, x_*)$ be the set of all imputations in the game $(I, v(t_*, x_*))$. A multivalued mapping

$$M : (t_*, x_*) \mapsto M(t_*, x_*) \subset E(t_*, x_*) \subset R^m, \qquad M(t_*, x_*) \ne \emptyset \quad \forall (t_*, x_*) \in D,$$

is called an "optimality principle" (defined over the family of games $\Gamma(t_*, x_*)$, $(t_*, x_*) \in D$), and the set $M(t_*, x_*)$ is called the "cooperative solution of the game $\Gamma(t_*, x_*)$ corresponding to this principle".

As follows from (Fridman, 1971), under the conditions imposed above the following lemma holds.

Lemma 1. The functions $v_S(\cdot) : D \to R$, $S \subset I$, are locally Lipschitz.

Since the solution of the Cauchy problem (1), (2) in the sense of Caratheodory is absolutely continuous, Lemma 1 implies the following theorem.

Theorem 1. For every solution of the Cauchy problem (1), (2) in the sense of Caratheodory

$$x(\cdot) = \big(x^{(1)}(\cdot), \ldots, x^{(m)}(\cdot)\big),$$

corresponding to the m-system of open loop controls

$$u(\cdot) = \big(u^{(1)}(\cdot), \ldots, u^{(m)}(\cdot)\big)$$

$$\big(x^{(i)}(\cdot) = x(\cdot, t_0, x_0^{(i)}, u^{(i)}(\cdot)), \ i \in I\big),$$

the functions

$$\varphi_S : [t_0, T] \to R, \quad S \subset I, \qquad \varphi_S(t) = v_S(t, x(t)),$$

are absolutely continuous on the time interval $[t_0, T]$.

Suppose that $M(t_*, x_*)$ is the core of the game $\Gamma(t_*, x_*)$, and let the imputation $\xi(t_*, x_*) = \{\xi_1(t_*, x_*), \ldots, \xi_m(t_*, x_*)\} \in M(t_*, x_*)$.

Then for each coalition $S \subset I$ we have

$$\sum_{i \in S} \xi_i(t_*, x_*) \ge v_S(t_*, x_*).$$
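For a fixed state, the core condition above is a finite system of inequalities and can be checked directly. The sketch below is a minimal Python illustration; the three-player characteristic function and the function name `in_core` are hypothetical, and the check also includes the usual efficiency requirement that an imputation distributes exactly the value of the grand coalition.

```python
from itertools import combinations

def in_core(xi, v, tol=1e-9):
    """Check whether the payoff vector xi lies in the core of the game (I, v).

    v maps frozensets of player indices 0..m-1 to coalition values.
    """
    players = range(len(xi))
    # Efficiency: the whole value of the grand coalition is distributed.
    if abs(sum(xi) - v[frozenset(players)]) > tol:
        return False
    # Coalitional rationality: every proper coalition S receives at least v(S).
    for r in range(1, len(xi)):
        for S in combinations(players, r):
            if sum(xi[i] for i in S) < v[frozenset(S)] - tol:
                return False
    return True

# Hypothetical characteristic function of a three-player game at one state.
v = {frozenset(S): val for S, val in [
    ((0,), 0.0), ((1,), 0.0), ((2,), 0.0),
    ((0, 1), 0.5), ((0, 2), 0.5), ((1, 2), 0.5),
    ((0, 1, 2), 1.0),
]}

equal_split = in_core((1 / 3, 1 / 3, 1 / 3), v)
lopsided = in_core((0.8, 0.1, 0.1), v)
```

Here the equal split satisfies every coalition, while the lopsided vector gives coalition {1, 2} only 0.2 against a guaranteed 0.5 and is therefore outside the core.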

4. Realization of cooperative solutions.

The realization of the solution of the game $\Gamma(t_0, x_0)$ we shall connect with the known "imputation distribution procedure" (IDP) (Petrosjan and Danilov, 1979; Petrosjan, 1995).

By an IDP of the imputation $\xi(t_0, x_0)$ from the core $M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ along the conditionally optimal trajectory $x_0(\cdot)$ we understand a function

$$\beta(t) = (\beta_1(t), \ldots, \beta_m(t)), \quad t \in [t_0, T], \tag{4}$$

such that

$$\xi(t_0, x_0) = \int_{t_0}^{T} \beta(t)\,dt \tag{5}$$

and

$$\int_{t}^{T} \beta(\tau)\,d\tau \in E(t, x_0(t)) \quad \forall t \in [t_0, T], \tag{6}$$

where $E(t, x_0(t))$ is the set of imputations in the game $(I, v(t, x_0(t)))$.

The IDP $\beta(t)$, $t \in [t_0, T]$, of the solution $M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ is called dynamically stable (time-consistent) along the conditionally optimal trajectory $x_0(\cdot)$ if

$$\int_{t}^{T} \beta(\tau)\,d\tau \in M(t, x_0(t)) \quad \forall t \in [t_0, T]. \tag{7}$$

The solution $M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ is dynamically stable (time-consistent) if a dynamically stable IDP exists along at least one conditionally optimal trajectory.

Suppose that $M(t, x_0(t)) \ne \emptyset$, $t \in [t_0, T]$ (here $M(t, x_0(t))$ is the core of the subgame $\Gamma(t, x_0(t))$ with initial conditions on the conditionally optimal cooperative trajectory and with duration $T - t$), and that $\xi(t, x_0(t)) \in M(t, x_0(t))$ can be selected as an absolutely continuous function of t. Then the following theorem holds.

Theorem 2. For any conditionally optimal trajectory $x_0(\cdot)$ the following IDP of the solution $\xi(t_0, x_0) \in M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$

$$\beta(t) = -\frac{d}{dt}\,\xi(t, x_0(t)), \quad t \in [t_0, T], \tag{8}$$

is a dynamically stable IDP along this trajectory. Therefore the solution $\xi(t_0, x_0) \in M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$ is dynamically stable.

5. On the strategic support of the imputation $\xi(t_0, x_0)$ from the core $M(t_0, x_0)$.

If a cooperative agreement is reached in the game and each player gets his payoff according to the IDP (8), then it is natural to suppose that those who violate this agreement are to be punished. The effectiveness of the punishment (sanctions) comes down to the question of the existence of a strong Nash equilibrium in the following differential game $\Gamma^{\xi}(t_0, x_0)$, which differs from $\Gamma(t_0, x_0)$ only by the payoffs of the players.

The payoff of player i in $\Gamma^{\xi}(t_0, x_0)$ is expressed through $t(u(\cdot))$, the last time instant $t \in [t_0, T]$ for which

$$x_0(\tau) = x(\tau, t_0, x_0, u(\cdot)) \quad \forall \tau \in [t_0, t].$$
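The IDP of Theorem 2 can be checked numerically on a toy example. In the minimal Python sketch below, the imputation along the trajectory, `xi(t)`, is a hypothetical smooth function that vanishes at the terminal instant (as it must when no payoff remains to be distributed); the weights, the finite-difference derivative and the trapezoid rule are all assumptions made for illustration. The sketch verifies that the payments remaining after instant t, computed from β = −dξ/dt, reproduce ξ at that instant, which is the identity behind time consistency.

```python
T = 1.0
weights = (0.5, 0.3, 0.2)      # hypothetical shares of three players

def xi(t):
    """Hypothetical imputation along the cooperative trajectory, xi(T) = 0."""
    return [w * (T - t) ** 2 for w in weights]

def beta(t, eps=1e-6):
    """IDP of the form beta(t) = -d/dt xi(t), via a central difference."""
    lo, hi = xi(t - eps), xi(t + eps)
    return [-(h - l) / (2 * eps) for l, h in zip(lo, hi)]

def future_payment(t, steps=2000):
    """Trapezoid rule for the integral of beta over [t, T], per player."""
    h = (T - t) / steps
    total = [0.0] * len(weights)
    for k in range(steps + 1):
        w = 0.5 if k in (0, steps) else 1.0
        b = beta(t + k * h)
        total = [s + w * h * bi for s, bi in zip(total, b)]
    return total

# Remaining payments at t = 0.25 should match the imputation at t = 0.25.
gap = max(abs(a - b) for a, b in zip(future_payment(0.25), xi(0.25)))
```

If ξ(t, x_0(t)) stays in the core for every t, the same identity shows that the remaining payments stay in the core as well, so the IDP is dynamically stable in the sense defined above.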

In this paper we use the following definition of strong Nash equilibrium.

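Definition 1. Let $\gamma = \langle I, \{X_i\}_{i \in I}, \{K_i\}_{i \in I} \rangle$ be an m-person game in normal form, where $I = [1 : m]$ is the set of players, $X_i$ is the set of strategies of player i, and

$$K_i : X = X_1 \times X_2 \times \cdots \times X_m \to R$$

is the payoff function of player i. We shall say that in the game $\gamma$ there exists a strong Nash equilibrium if

$$\forall \varepsilon > 0 \ \ \exists x^{\varepsilon} = (x_1^{\varepsilon}, x_2^{\varepsilon}, \ldots, x_m^{\varepsilon}) \in X$$

such that

$$\forall S \subset I, \ \forall x_S \in X_S = \prod_{i \in S} X_i: \qquad \sum_{i \in S} K_i(x_S, x^{\varepsilon}_{I \setminus S}) - \varepsilon \le \sum_{i \in S} K_i(x^{\varepsilon}),$$

where

$$x^{\varepsilon}_{I \setminus S} = \{x^{\varepsilon}_j\}_{j \in I \setminus S} \quad (x^{\varepsilon}_{I \setminus S} \in X_{I \setminus S}).$$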
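For a finite game the inequality in Definition 1 can be tested by brute force over all coalitions and all joint deviations. The following Python sketch does this for a fixed ε and a fixed strategy profile; the two-player coordination game and all names are hypothetical, and finite strategy sets are an assumption made only so that enumeration is possible.

```python
from itertools import combinations, product

def is_eps_strong_nash(payoff, strategy_sets, x, eps):
    """Check the inequality of Definition 1 for profile x and tolerance eps.

    payoff(profile) returns the tuple of payoffs K_i for all players.
    """
    m = len(strategy_sets)
    base = payoff(tuple(x))
    for r in range(1, m + 1):
        for S in combinations(range(m), r):
            # Enumerate all joint deviations x_S of the coalition S.
            for dev in product(*(strategy_sets[i] for i in S)):
                y = list(x)
                for i, s in zip(S, dev):
                    y[i] = s
                p = payoff(tuple(y))
                if sum(p[i] for i in S) - eps > sum(base[i] for i in S):
                    return False
    return True

# Hypothetical coordination game: both (0, 0) and (1, 1) are Nash equilibria.
def payoff(x):
    if x == (0, 0):
        return (2.0, 2.0)
    if x == (1, 1):
        return (1.0, 1.0)
    return (0.0, 0.0)

sets = [(0, 1), (0, 1)]
good = is_eps_strong_nash(payoff, sets, (0, 0), 0.0)
bad = is_eps_strong_nash(payoff, sets, (1, 1), 0.5)
```

The profile (1, 1) survives every individual deviation but not the joint deviation of both players to (0, 0), which is exactly the kind of coalition deviation that the strong equilibrium concept rules out.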

Let $C(t_*, x_*)$ be the core of the game $(I, v(t_*, x_*))$.

Theorem 3. In the game $\Gamma^{\xi}(t_0, x_0)$ there exists a strong Nash equilibrium with outcomes (payoffs) of the players in this equilibrium equal to

$$\xi(t_0, x_0) = \{\xi_1(t_0, x_0), \ldots, \xi_m(t_0, x_0)\} \in M(t_0, x_0).$$

The idea of the proof is as follows. Since $\xi(t_0, x_0)$ belongs to the core $M(t_0, x_0)$ of the game $\Gamma(t_0, x_0)$, we have

$$\sum_{i \in S} \xi_i(t, x_0(t)) \ge v_S(t, x_0(t)) \quad \forall S \subset I, \ \forall t \in [t_0, T]. \tag{9}$$

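This means that at each time instant $t \in [t_0, T]$, moving along the conditionally optimal trajectory $x_0(\cdot)$, no coalition can guarantee itself a payoff on $[t, T]$ greater than the one prescribed by the IDP (8), i.e. greater than

$$\sum_{i \in S} \int_{t}^{T} \beta_i(\tau)\,d\tau = -\sum_{i \in S} \int_{t}^{T} \frac{d}{d\tau}\,\xi_i(\tau, x_0(\tau))\,d\tau = \sum_{i \in S} \xi_i(t, x_0(t)).$$

At the same time, on the time interval $[t_0, t]$ the coalition has already received, according to the IDP, the payoff

$$\sum_{i \in S} \int_{t_0}^{t} \beta_i(\tau)\,d\tau = -\sum_{i \in S} \int_{t_0}^{t} \frac{d}{d\tau}\,\xi_i(\tau, x_0(\tau))\,d\tau = \sum_{i \in S} \xi_i(t_0, x_0) - \sum_{i \in S} \xi_i(t, x_0(t)).$$

Consequently, no coalition can guarantee in the game $\Gamma^{\xi}(t_0, x_0)$ a payoff greater than

$$\sum_{i \in S} \xi_i(t_0, x_0).$$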

By following the cooperative solution, that is, by always moving in the game $\Gamma^{\xi}(t_0, x_0)$ along the conditionally optimal trajectory $x_0(\cdot)$, each coalition gets its payoff according to the imputation $\xi(t_0, x_0)$ from the core $M(t_0, x_0)$. Thus no coalition can benefit from deviating from the conditionally optimal trajectory, which in this case is natural to call a "strongly equilibrium trajectory".
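The steps of the proof idea can be combined into one chain of relations. The LaTeX sketch below assumes, as the argument suggests, that a coalition S deviating at the instant t keeps the IDP payments collected on $[t_0, t]$ and can afterwards guarantee at most $v_S(t, x_0(t))$; the small ε terms and the delay before punishment starts are ignored, so this is a heuristic summary rather than the full proof.

```latex
\begin{align*}
\sum_{i \in S} \int_{t_0}^{t} \beta_i(\tau)\,d\tau + v_S(t, x_0(t))
  &= \sum_{i \in S} \xi_i(t_0, x_0) - \sum_{i \in S} \xi_i(t, x_0(t)) + v_S(t, x_0(t)) \\
  &\le \sum_{i \in S} \xi_i(t_0, x_0),
\end{align*}
```

The equality uses the IDP (8) on $[t_0, t]$, and the inequality is exactly the core condition (9) at the state $(t, x_0(t))$.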

References

Petrosjan, L. A., Zenkevich, N. A. (2009). Principles of stable cooperation. The mathematical games theory and its applications, 1, 1, 102-117 (in Russian).

Chistyakov, S. V. (1977). To the solution of game problem of pursuit. Prikl. Math. i Mech., 41, 5, 825-832, (in Russian).

Chistyakov, S.V. (1999). Operatory znacheniya antagonisticheskikx igr (Value Operators in Two-Person Zero-Sum Differential Games). St. Petersburg: St. Petersburg Univ. Press.

Chentsov, A. G. (1976). On a game problem of converging at a given instant of time. Math. USSR Sbornik, 28, 3, 353-376.

Chistyakov, S.V. (1981). O beskoalizionnikx differenzial’nikx igrakx (On Coalition-Free Differential Games). Dokl. Akad. Nauk, 259(5), 1052-1055; English transl. in Soviet Math. Dokl. 24, 1981, no. 1, pp. 166-169.

Petrosjan, L. A. (1993). Differential Games of Pursuit. World Scientific, Singapore.

Pschenichny, B. N. (1973). E-strategies in differential games. Topics in Differential Games. New York, London, Amsterdam, pp. 45-56.

Fridman, A. (1971). Differential Games. John Wiley and Sons, New York, NY.

Petrosjan, L.A., Danilov, N. N. (1979). Stability of Solutions in nonzero-sum Differential Games with Integral Payoffs. Viestnik Leningrad University, N1, pp. 52-59.

Petrosjan, L. A. (1995). The Shapley Value for Differential Games. Annals of the International Society of Dynamic Games, Vol.3, Geert Jan Olsder Editor, Birkhauser, pp. 409-417.
