
Contributions to Game Theory and Management, XIII, 360-387

Looking Forward Approach with Random Horizon in Cooperative Differential Games*

Ovanes Petrosian^{1,2} and Sergei Pogozhev^1

^1 St. Petersburg State University, Faculty of Applied Mathematics and Control Processes, Universitetskaya emb. 7/9, St. Petersburg, 198504, Russia, E-mail: petrosian.ovanes@yandex.ru, s.pogozhev@spbu.ru
^2 National Research University Higher School of Economics, 3A Kantemirovskaya Street, St. Petersburg, Russia, E-mail: opetrosyan@hse.ru

Abstract. In this paper the authors present a new approach to the determination and computation of a solution for differential games with prescribed duration in the case when players lack certain information about the dynamical system and payoff functions on the whole time interval on which the game is played. At each time instant players receive information about the dynamical system and payoff functions; however, the duration of the period for which this information remains valid is unknown and is represented as a random variable with known parameters. At certain instants of time the information is updated. A novel solution is described as a combination of imputation sets in the truncated subgames that are analyzed using the Looking Forward Approach with random horizon. A resource extraction game serves as an illustration in order to compare the cooperative trajectory, imputations, and imputation distribution procedure in the game with the Looking Forward Approach and in the original game with prescribed duration. The Looking Forward Approach is used for constructing game-theoretical models and defining solutions for conflict-controlled processes where information about the process updates dynamically.

Keywords: differential games; time-consistency; predictive control.

1. Introduction

Cooperative differential game theory offers socially convenient and group-efficient solutions to various decision problems involving strategic actions. One of the fundamental elements in this theory is the formulation of optimal behavior for players or economic agents. The design of cooperative strategies and the corresponding payoffs, the manner in which the payoff is distributed between players, and the time consistency of the resulting solution can be considered the main problems of this theory. Haurie analyzed the problem of dynamic instability of Nash bargaining solutions in differential games (Haurie, 1976). The notion of time consistency of differential game solutions was formalized mathematically by Petrosyan (Petrosyan, 1977). In the present research we examine a special case of cooperative differential games in which the game structure can change or update with time (time-dependent formulation) and assume that the players do not have information about the change of the game structure on the full time interval, but only have certain information

* The research of the first author was supported by a grant from the Russian Science Foundation (Project No 18-71-00081).

about the game structure on a truncated time interval. However, the duration of this interval is unknown in advance and is supposed to be a random variable whose distribution parameters are known to the players. By information about the game structure we mean information about the dynamical system and the payoff functions. The interpretation is as follows: players have certain information about the game structure, but the duration of the period during which this information is correct is unknown in advance. Evidently, this truncated information is valid only for a certain time and has to be updated. In order to define the best possible behavior of players in this type of cooperative differential game, a special approach is needed, which we call the Looking Forward Approach with random horizon. The approach brings up the following questions: how to define a cooperative trajectory, how to define a cooperative solution and allocate the cooperative payoff, and what properties the obtained solution will have. The aim of this paper is to answer these questions. The offered solution is based on the IDP-core introduced in (Petrosian et al., 2016); it is built up using a particular class of imputation distribution procedures (IDP) (Petrosyan and Danilov, 1979). It is demonstrated that the newly built solution is not only time-consistent (which is a rare property in cooperative differential games), but also strong time-consistent.

The concept of the Looking Forward Approach is new in game theory, especially in cooperative differential games, and gives the foundation for further study of differential games with dynamic updating. At the moment there are practically no results on constructing approaches for modeling conflict-controlled processes where information about the process updates in time. In the present work we examine the Looking Forward Approach with random horizon, which is one of the variations of the Looking Forward Approach introduced in (Petrosian, 2016a), where it was supposed that the duration of the truncated information is a fixed value. For more information about the approach see the following papers: (Petrosian et al., 2017; Yeung and Petrosian, 2017; Gromova and Petrosian, 2016; Petrosian, 2016a; Petrosian, 2016b; Petrosian and Barabanov, 2017; Petrosian et al., 2019; Petrosian and Kuchkarov, 2019). In the paper (Petrosian, 2016a) the Looking Forward Approach was applied to cooperative differential games with finite horizon. The notion of truncated subgame, the procedure for defining optimal strategies, the conditionally cooperative trajectory, the solution concept, and the solution property of Δt-time consistency for a fixed information horizon were determined there. The paper (Petrosian and Barabanov, 2017) was focused on studying the Looking Forward Approach with stochastic forecast and dynamic adaptation in the case when information about the conflicting process can change during the game. In the paper (Gromova and Petrosian, 2016) the Looking Forward Approach was applied to a cooperative differential game of pollution control. The aim of that paper was to study the dependency of the resulting solution upon the value of the information horizon; the corresponding optimization problem was formulated and solved. The paper (Petrosian et al., 2017) is devoted to applying the Looking Forward Approach to a game model of the oil market. Further papers on this subject are to be published in the near future. In the paper (Petrosian, 2016b) the Looking Forward Approach was applied to cooperative differential games with infinite horizon. The paper (Yeung and Petrosian, 2017) is devoted to studying the Looking Forward Approach for dynamic non-cooperative games; a special type of Hamilton-Jacobi-Bellman equations is derived there for different information structures available to the players during the game. Another interesting class of games

is the class of differential games with continuous updating considered in the papers (Petrosian and Tur, 2019; Kuchkarov and Petrosian, 2019), where it is supposed that the updating process evolves continuously in time. In the paper (Petrosian and Tur, 2019), the system of Hamilton-Jacobi-Bellman equations is derived for the Nash equilibrium in a game with continuous updating. In the paper (Kuchkarov and Petrosian, 2019) the class of linear-quadratic differential games with continuous updating is considered and the explicit form of the Nash equilibrium is derived.

In this article we use the special form of the Hamilton-Jacobi-Bellman equation for differential games with random horizon presented in (Shevkoplyas, 2011; Shevkoplyas, 2014; Petrosjan and Shevkoplyas, 2003). A characteristic function of a coalition is an essential concept in the theory of differential games. This function is defined, as indicated in (Chander and Tulkens, 1995), as the total payoff of the players from coalition S in the Nash equilibrium of the game with the following set of players: coalition S (acting as one player) and the players from the set N \ S. A computation of the Nash equilibrium, fully described in (Basar and Olsder, 1995), is necessary for this approach. A set of imputations, or a solution of the game, is determined by the characteristic function at the beginning of each subinterval. For any set of imputations the imputation distribution procedure (IDP), first introduced by L. Petrosyan in (Petrosyan and Danilov, 1979), is analysed. See recent publications on this topic in (Petrosyan and Yeung, 2006; Jorgensen and Yeung, 1999; Jorgensen et al., 2003). In order to determine a solution for the whole game it is required to combine the partial solutions and their IDPs on the subintervals. The properties of time consistency and strong time consistency, introduced by L. Petrosyan in (Petrosjan, 1993) and (Petrosyan, 1977), are also examined for the offered solution.

The Looking Forward Approach has similarities with the Model Predictive Control theory developed within the framework of numerical optimal control; see (Goodwin et al., 2005; Rawlings and Mayne, 2009; Wang, 2005; Kwon and Han, 2005) for recent results in this area. Model predictive control is a method of control in which the current control action is obtained by solving, at each sampling instant, a finite-horizon open-loop optimal control problem using the current state of the object as the initial state. This type of control is able to cope with hard limitations on controls and states, which is its strong point compared with other methods. It has therefore found wide application in the petrochemical and related industries, where key operating points are located close to the set of admissible states and controls. The main problem solved in Model Predictive Control is maintaining motion along a target trajectory under random perturbations and an uncertain dynamical system. At each time step an optimal control problem is solved to define controls which lead the system to the target trajectory. The Looking Forward Approach, on the other hand, solves the problem of modeling players' behavior when information about the process updates dynamically. It does not use a target trajectory, but answers the question of how to compose the trajectory which will be used by the players, as well as how to allocate the cooperative payoff along the composed trajectory.

To demonstrate the Looking Forward Approach we present an example of a cooperative resource extraction game with finite horizon. The original example was introduced by David Yeung and Steffen Jorgensen in (Jorgensen and Yeung, 1999); the problem of time consistency in this game was examined by David Yeung in

(Yeung and Petrosyan, 2012). In this article we analyze a three-player resource extraction game with the IDP-core described in (Petrosian et al., 2016) used as the cooperative solution. We present both analytical and numerical solutions for specific parameters. A comparison between the original approach and the Looking Forward Approach with random horizon is presented. In the final part of the example model we demonstrate the strong time consistency property of the constructed solution. The structure of the article is as follows. The basic game model is presented in Section 2. The sequence of auxiliary random truncated subgames and their solutions are determined in Section 3; this involves the cooperative behavior of players for the whole game and the allocation of the cooperative payoff between the players at each stage of the game. In Section 4 we present a new concept of the game solution for the case of updating information; the time consistency and strong time consistency properties of the solution are stated and proved. In Section 5 the Looking Forward Approach is applied to the game of cooperative resource extraction with finite horizon.

2. The Original Game

The n-person differential game Γ(x_0, T − t_0) with finite horizon T − t_0, initial state x_0 ∈ R^m, and initial time instant t_0 is given (t_0 and T are fixed values). The structure of the game is defined by the following dynamical system:

\[
\dot{x} = g(t, x, u), \quad x(t_0) = x_0, \qquad (1)
\]

where x takes values in R^m and u = (u_1, ..., u_n). Denote the set of players by N = {1, ..., n}. Player i chooses a control u_i, i = 1, ..., n. For each time instant t, u_i(t) ∈ U_i ⊂ Comp R^k. When open-loop strategies are used, we require piecewise continuity with a finite number of breaks. For feedback strategies we follow (Basar and Olsder, 1995). We require that for any n-tuple of strategies u(t, x) = (u_1(t, x), ..., u_n(t, x)) the solution of the Cauchy problem exists and is unique on the time interval [t_0, T]. For a more sophisticated definition of feedback strategies in zero-sum differential games see (Krasovskii and Kotel'nikova, 2010).

The payoff function of player i is
\[
K_i(x_0, T - t_0; u) = \int_{t_0}^{T} h_i(x(\tau), u(\tau)) \, d\tau, \qquad (2)
\]
where x(t) is the trajectory (the solution) of the system (1) with the control input u = (u_1, ..., u_n).

3. Random Truncated Subgame

Suppose information for players is updated at fixed time instants t = t_0 + jΔt, j = 0, ..., l, where 0 < Δt < T − t_0 and l = (T − t_0)/Δt − 1. During the time interval [t_0 + jΔt, t_0 + (j + 1)Δt], players have full information about the dynamics of the game described by g(t, x, u) and the payoff functions described by h_i(x(t), u(t)). The problem is that the players are not sure about the period of time during which this information is valid; all they know is that it is defined on the time interval [t_0 + jΔt, T_j], where the duration T_j is a random variable with known characteristics. The realization of the random variable T_j is denoted by t_j, j = 0, ..., l.

As was mentioned before, during the time interval [t_0 + jΔt, t_0 + (j + 1)Δt] players have full information about the dynamics of the game and the payoff functions on the time interval [t_0 + jΔt, T_j], where T_j is a random variable which takes values from the time interval [max(t_0 + (j + 1)Δt, t_{j−1}), T], and t_{j−1} is the realization of the random variable T_{j−1} (T_{j−1} is realized at the time instant t = t_0 + jΔt). At the time instant t = t_0 + (j + 1)Δt the information about the game is updated and the random variable T_j is realized, i.e. t_j becomes known to the players. On the next time interval (t_0 + (j + 1)Δt, t_0 + (j + 2)Δt] players have full information about the game structure on the time interval (t_0 + (j + 1)Δt, T_{j+1}], where T_{j+1} is a random variable which takes values from the time interval [max(t_0 + (j + 2)Δt, t_j), T]. For j = 0 we suppose that t_{j−1} = 0.

It may remain unclear why the random variable T_j is realized at the time instant t = t_0 + (j + 1)Δt while its value t_j exceeds t = t_0 + (j + 1)Δt. The interpretation can be the following. Suppose that at the time instants t = t_0 + jΔt players receive information about the game, but in order to accurately estimate the duration of the information horizon, which is the random variable T_j, they need time Δt (to make calculations, etc.). At the time instant t = t_0 + (j + 1)Δt the calculations are performed and the certain value of the information horizon T_j becomes known to the players, i.e. t_j. Another interpretation is that during the current Δt-interval [t_0 + jΔt, t_0 + (j + 1)Δt] players receive additional information which helps them to estimate the time up to which the information about the process is certain, i.e. to define the value of the information horizon t_j. At the time instant t = t_0 + (j + 1)Δt, after the estimation is performed, players receive new information about the game structure with random information horizon T_{j+1}, and the same procedure continues.

To model this kind of situation we introduce the following definition (Fig. 1). Denote the vector x_{j,0} = x(t_0 + jΔt).


Fig. 1. Each oval represents random truncated information, which is known to players during the time interval [t_0 + jΔt, t_0 + (j + 1)Δt], j = 0, ..., l

Definition 1. Let j = 0, ..., l. A random truncated subgame Γ_j(x_{j,0}, t_0 + jΔt) is defined on the time interval [t_0 + jΔt, T_j], where T_j is a random variable which takes values from the time interval [max(t_0 + (j + 1)Δt, t_{j−1}), T], and t_{j−1} is the realization of the random horizon T_{j−1} in the previous truncated subgame Γ_{j−1}(x_{j−1,0}, t_0 + (j − 1)Δt).

The realization of T_{j−1} occurs at the time instant t = t_0 + jΔt. The dynamical system and the payoff functions on the time interval [t_0 + jΔt, T_j] coincide with those of the game Γ(x_0, T − t_0) on the same time interval. The dynamical system and the initial condition of the truncated subgame Γ_j(x_{j,0}, t_0 + jΔt) have the following form:

\[
\dot{x} = g(t, x, u), \quad x(t_0 + j\Delta t) = x_{j,0}. \qquad (3)
\]

The payoff function of player i in the random truncated subgame Γ_j is equal to
\[
K_i^j(x_{j,0}, t_0 + j\Delta t; u) = \int_{t_0 + j\Delta t}^{T} \int_{t_0 + j\Delta t}^{t} h_i(x(\tau), u(\tau)) \, d\tau \, dF_j(t), \qquad (4)
\]
where F_j(t) is the distribution function of T_j.

\[
\int_{t_0 + j\Delta t}^{T} dF_j(t) = \int_{\max(t_0 + (j+1)\Delta t,\, t_{j-1})}^{T} dF_j(t) = 1, \qquad (5)
\]
Due to the definition, F_j(t) is a conditional distribution function, i.e. F_j(t) = F_j(t | T_{j−1} = t_{j−1}). Further, by the notation F_j(t) we will refer to F_j(t | T_{j−1} = t_{j−1}).

Suppose that the realization of the random horizon T_{j−1} in the game Γ_{j−1}(x_{j−1,0}, t_0 + (j − 1)Δt) exceeds the time t = t_0 + (j + 1)Δt:
\[
t_{j-1} > t_0 + (j+1)\Delta t, \qquad (6)
\]
then the random horizon T_j must exceed the realization of T_{j−1}, because the information about the game structure is already known on the time interval [t_0 + jΔt, t_{j−1}]. That is why in formula (5) the probability of T_j taking values from the time interval [t_0 + jΔt, t_{j−1}] equals zero:
\[
\int_{t_0 + j\Delta t}^{\max(t_0 + (j+1)\Delta t,\, t_{j-1})} dF_j(t) = 0. \qquad (7)
\]

In the papers on cooperative differential games with random horizon (Shevkoplyas, 2009; Shevkoplyas, 2010; Shevkoplyas, 2011; Shevkoplyas, 2014; Petrosjan and Shevkoplyas, 2003) the distribution function of T_j is defined on an infinite time interval. In this paper T_j takes values from a finite time interval because the original game is defined on the finite time interval [t_0, T].

In (Kostyunin and Shevkoplyas, 2011) the order of integration in the double integral (4) was changed according to Tonelli's theorem:
\[
K_i^j(x_{j,0}, t_0 + j\Delta t; u) = \int_{t_0 + j\Delta t}^{T} (1 - F_j(\tau)) h_i(x(\tau), u(\tau)) \, d\tau. \qquad (8)
\]
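The passage from (4) to (8) is easy to verify numerically. The following minimal sketch (not part of the paper) compares the double integral (4) with the single integral (8) for a toy instantaneous payoff and a uniform horizon distribution on [a, T]; all function names and parameter values are illustrative assumptions.

```python
# Illustrative check (not from the paper): the expected payoff written as the
# double integral (4) equals the survival-weighted single integral (8).
import numpy as np
from scipy import integrate

t0_j, a, T = 0.0, 1.0, 4.0            # subgame start, support start of T_j, game end

def h(t):                              # toy instantaneous payoff h_i(x(t), u(t))
    return np.exp(-0.3 * t)

def F(t):                              # uniform distribution of T_j on [a, T]
    return np.clip((t - a) / (T - a), 0.0, 1.0)

def f(t):                              # corresponding density
    return 1.0 / (T - a) if a <= t <= T else 0.0

# Left-hand side: the double integral (4)
inner = lambda t: integrate.quad(h, t0_j, t)[0]
lhs = integrate.quad(lambda t: inner(t) * f(t), t0_j, T, points=[a])[0]

# Right-hand side: the single integral (8) with the survival weight 1 - F_j(tau)
rhs = integrate.quad(lambda tau: (1.0 - F(tau)) * h(tau), t0_j, T)[0]

print(lhs, rhs)                        # the two values agree up to quadrature error
```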

3.1. Solution of Random Truncated Cooperative Subgame

Consider a truncated cooperative subgame Γ_j^c(x_{j,0}, t_0 + jΔt) defined on the time interval [t_0 + jΔt, T_j] with the initial condition x(t_0 + jΔt) = x_{j,0}, where T_j is a random variable with the distribution function (5).

Fig. 2. Behavior of players in the game with random truncated information can be modeled using the random truncated subgames Γ_j(x_{j,0}, t_0 + jΔt), j = 0, ..., l

Classically, at the first step of a cooperative differential game we define cooperative strategies and the corresponding cooperative trajectory. At the second step we define the rule for allocating the cooperative payoff between the players along the cooperative trajectory. To do this we define a characteristic function and the corresponding cooperative solution. The total payoff of the players to be maximized in this game is

\[
\sum_{i \in N} K_i^j(x_{j,0}, t_0 + j\Delta t; u) = \sum_{i \in N} \int_{t_0 + j\Delta t}^{T} \int_{t_0 + j\Delta t}^{t} h_i(x(\tau), u(\tau)) \, d\tau \, dF_j(t) \qquad (9)
\]
subject to
\[
\dot{x} = g(t, x, u), \quad x(t_0 + j\Delta t) = x_{j,0}. \qquad (10)
\]

This is an optimal control problem. Sufficient conditions for the solution and the optimal feedback are given by Theorem 1, first presented in (Shevkoplyas, 2014). Denote the maximum value of the joint payoff of the players (9) by the function W^{(jΔt)}(t, x):
\[
W^{(j\Delta t)}(t, x) = \max_{u \in U} \left\{ \sum_{i \in N} K_i^j(x, t; u) \right\}, \qquad (11)
\]
where x and t are the current state and time, correspondingly, and U = U_1 × ... × U_n.

Theorem 1. Assume there exists a continuously differentiable function W^{(jΔt)}(t, x) : [t_0 + jΔt, T_j] × R^m → R satisfying the partial differential equation
\[
\frac{f_j(t)}{1 - F_j(t)} \, W^{(j\Delta t)}(t, x) = W_t^{(j\Delta t)}(t, x) + \max_{u \in U} \left\{ \sum_{i=1}^{n} h_i(t, x, u) + W_x^{(j\Delta t)}(t, x) \, g(t, x, u) \right\}, \qquad (12)
\]
where \lim_{t \to T^-} W^{(j\Delta t)}(t, x) = 0 and f_j(t) is the probability density function of the random variable T_j (5). Assume that the maximum in (12) is achieved under controls u^*(t, x). Then u^*(t, x) is optimal in the control problem defined by (9), (10).

Theorem 1 (presented in (Shevkoplyas, 2014)) requires that the function W^{(jΔt)} be C^1. However, it is possible to assume only continuity by considering viscosity solutions using the Subbotin approach (Subbotin, 1984; Subbotin, 1995). Due to the shortage of space, it is not possible to properly introduce and define this type of solution in the paper. In the example model we define and obtain a solution W^{(jΔt)} of class C^1.

3.2. Conditionally Cooperative Trajectory

During the game Γ(x_0, T − t_0) players possess only truncated information about its structure. Obviously, this is not enough to construct an optimal control and the corresponding trajectory for the game Γ(x_0, T − t_0). As a cooperative trajectory in the game Γ(x_0, T − t_0) we propose to use a conditionally cooperative trajectory defined in the following way:

Definition 2. The conditionally cooperative trajectory {x^*(t)}_{t=t_0}^{T} is defined as a composition of cooperative trajectories x_j^*(t) in the truncated cooperative subgames Γ_j^c(x_{j−1}^*(t_0 + jΔt), t_0 + jΔt) defined on the successive time intervals [t_0 + jΔt, t_0 + (j + 1)Δt] (Fig. 3):
\[
\{x^*(t)\}_{t=t_0}^{T} =
\begin{cases}
x_0^*(t), & t \in [t_0, t_0 + \Delta t), \\
\;\;\vdots \\
x_j^*(t), & t \in [t_0 + j\Delta t, t_0 + (j+1)\Delta t), \\
\;\;\vdots \\
x_l^*(t), & t \in [t_0 + l\Delta t, t_0 + (l+1)\Delta t].
\end{cases} \qquad (13)
\]
On the time interval [t_0 + jΔt, t_0 + (j + 1)Δt] the conditionally cooperative trajectory coincides with the cooperative trajectory x_j^*(t) in the truncated cooperative subgame Γ_j^c(x_{j−1}^*(t_0 + jΔt), t_0 + jΔt). At the time instant t = t_0 + (j + 1)Δt the information about the game structure updates in the position x_j^*(t_0 + (j + 1)Δt). On the time interval (t_0 + (j + 1)Δt, t_0 + (j + 2)Δt] the trajectory x^*(t) coincides with the cooperative trajectory x_{j+1}^*(t) in the truncated cooperative subgame Γ_{j+1}^c(x_j^*(t_0 + (j + 1)Δt), t_0 + (j + 1)Δt), which starts at the time instant t = t_0 + (j + 1)Δt in the position x_j^*(t_0 + (j + 1)Δt). For j = 0: x_{j−1}^*(t_0 + jΔt) = x_0.

3.3. Characteristic Function

For each coalition S ⊆ N and j = 0, ..., l define the values of the characteristic function as it was done in (Chander and Tulkens, 1995):
\[
V_j(S; x_{j,0}^*, t_0 + j\Delta t) =
\begin{cases}
\sum_{i \in N} K_i^j(x_{j,0}^*, t_0 + j\Delta t; u^*), & S = N, \\
\tilde{V}_j(S, x_{j,0}^*, t_0 + j\Delta t), & S \subset N, \\
0, & S = \emptyset,
\end{cases} \qquad (14)
\]
where \tilde{V}_j(S, x_{j,0}^*, t_0 + j\Delta t) is defined as the total payoff of the players from coalition S in the Nash equilibrium u^{NE} = (u_1^{NE}, ..., u_{|N \setminus S|+1}^{NE}) in the game with the following set of players: coalition S (acting as one player) and the players from the set N \ S, i.e. in the game with |N \ S| + 1 players.

Fig. 3. The solid line represents the conditionally cooperative trajectory {x^*(t)}_{t=t_0}^{T}. Dashed lines represent parts of cooperative trajectories that are not used in the composition, i.e., each dashed trajectory is no longer optimal in the current random truncated subgame.

An imputation ξ^j(x_{j,0}^*, t_0 + jΔt) for each random truncated cooperative subgame Γ_j^c(x_{j,0}, t_0 + jΔt) is defined as an arbitrary vector which satisfies the conditions
\[
\xi_i^j(x_{j,0}^*, t_0 + j\Delta t) \ge V_j(\{i\}, x_{j,0}^*, t_0 + j\Delta t), \; i \in N, \qquad
\sum_{i \in N} \xi_i^j(x_{j,0}^*, t_0 + j\Delta t) = V_j(N, x_{j,0}^*, t_0 + j\Delta t). \qquad (15)
\]
Denote the set of all possible imputations for the random truncated subgame by E_j(x_{j,0}, t_0 + jΔt). As an optimality principle or solution
\[
W_j(x_{j,0}^*, t_0 + j\Delta t) \subset E_j(x_{j,0}^*, t_0 + j\Delta t) \qquad (16)
\]
in each random truncated cooperative subgame Γ_j^c(x_{j,0}, t_0 + jΔt) we use the IDP-core introduced in (Petrosian et al., 2016). The construction of this solution is based upon a special class of IDPs (Petrosyan and Danilov, 1979).

Definition 3. A function β^j(t, x^*), t ∈ [t_0 + jΔt, T_j], is called an Imputation Distribution Procedure for the imputation ξ^j(x_{j,0}^*, t_0 + jΔt) ∈ E_j(x_{j,0}^*, t_0 + jΔt) if
\[
\xi^j(x_{j,0}^*, t_0 + j\Delta t) = \int_{t_0 + j\Delta t}^{T} (1 - F_j(\tau)) \beta^j(\tau, x^*(\tau)) \, d\tau. \qquad (17)
\]
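As a simple illustration of (17) (not taken from the paper), the sketch below integrates a hypothetical constant IDP against the survival weight 1 − F_j(τ) on a grid to obtain the imputation it generates; the distribution is the truncated exponential used later in the example, and all numerical values are assumptions.

```python
# Sketch under assumptions: the imputation generated by a candidate IDP via (17),
# xi^j = int_{t0 + j*dt}^{T} (1 - F_j(tau)) * beta^j(tau) dtau.
import numpy as np

T, start = 4.0, 1.0                     # game end and start of the truncated subgame
lam, left = 0.5, 2.0                    # rate and support start of T_j, cf. (37)

def F_j(t):                             # truncated exponential distribution of T_j
    t = np.maximum(t, left)
    return (1 - np.exp(-lam * (t - left))) / (1 - np.exp(-lam * (T - left)))

def beta(t):                            # hypothetical constant IDP for three players
    return np.array([1.0, 2.0, 0.5])

tau = np.linspace(start, T, 3001)
dt = tau[1] - tau[0]
weights = 1.0 - F_j(tau)                # survival weight of the random horizon
xi = sum(w * beta(t) for w, t in zip(weights, tau)) * dt
print(xi)                               # imputation corresponding to this IDP
```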

Using the IDP β^j(t, x^*) it is possible to define the rule for allocating the imputation ξ^j(x_{j,0}^*, t_0 + jΔt) over the time interval [t_0 + jΔt, T_j], where T_j is a random variable. It is obvious that the number of functions β^j(t, x^*(t)) that satisfy equation (17) is infinite, i.e. there are infinitely many ways of allocating the cooperative payoff between the players, but according to the formula for the class of games with random horizon presented in (Shevkoplyas, 2009; Shevkoplyas, 2010),
\[
\beta^j(t, x^*(t)) = \frac{f_j(t)}{1 - F_j(t)} \, \xi^j(x^*(t), t) - \frac{d}{dt} \xi^j(x^*(t), t), \qquad (18)
\]
it is possible to define the unique β^j(t, x^*(t)) that ensures the time consistency property (Petrosyan and Danilov, 1979) of the imputation ξ^j(x_{j,0}^*, t_0 + jΔt) or of the cooperative solution W_j(x_{j,0}^*, t_0 + jΔt) (in case of a multi-valued principle of optimality):

Definition 4. A solution W_j(x_{j,0}^*, t_0 + jΔt) (an imputation ξ^j(x_{j,0}^*, t_0 + jΔt)) is called time-consistent if for any imputation ξ^j(x_{j,0}^*, t_0 + jΔt) ∈ W_j(x_{j,0}^*, t_0 + jΔt) there exists an IDP β^j(t, x^*) which for ∀t ∈ [t_0 + jΔt, T] satisfies
\[
\xi^j(x_{j,0}^*, t_0 + j\Delta t) \in \left\{ \int_{t_0 + j\Delta t}^{t} (1 - F_j(\tau)) \beta^j(\tau, x^*) \, d\tau \right\} \oplus W_j(x^*(t), t).
\]

3.4. IDP-core

In this paper we use the approach proposed in (Petrosian et al., 2016), in which functions that can be used as IDPs for some imputations are constructed first and the corresponding solutions are then composed. Suppose that the characteristic function V_j(S; x^*(t), t), S ⊆ N, is continuously differentiable in t, t ∈ [t_0 + jΔt, T], along the cooperative trajectory x^*(t). Introduce the following notation:

\[
U_j(S; x^*(t), t) = -\frac{d}{dt} V_j(S; x^*(t), t), \qquad (19)
\]
where t ∈ [t_0 + jΔt, T] and S ⊆ N.

Define B_j(t, x^*) as the set of integrable vector functions β^j(t, x^*) satisfying the following inequalities:
\[
B_j(t, x^*) = \Big\{ \beta^j(t, x^*) = (\beta_1^j(t, x^*), \ldots, \beta_n^j(t, x^*)) : \;
\sum_{i \in S} (1 - F_j(t)) \beta_i^j(t, x^*) \ge U_j(S, x^*(t), t), \;
\sum_{i \in N} (1 - F_j(t)) \beta_i^j(t, x^*) = U_j(N, x^*(t), t), \; \forall S \subset N \Big\}. \qquad (20)
\]
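To make (20) concrete, here is a small sketch (with made-up numbers, not from the paper) that checks whether a candidate IDP value at a fixed time instant satisfies the efficiency equality and the coalitional inequalities for three players.

```python
# Pointwise check of the constraints (20) at a fixed time t, three players;
# the values of U_j(S) and of the candidate IDP are made up for illustration.
from itertools import combinations
import numpy as np

N = (1, 2, 3)
U = {(1,): 0.8, (2,): 0.5, (3,): 0.6,            # hypothetical U_j(S; x*(t), t)
     (1, 2): 1.6, (1, 3): 1.7, (2, 3): 1.3,
     (1, 2, 3): 3.0}
survival = 0.7                                   # 1 - F_j(t) at the chosen time
beta = np.array([1.6, 1.2, 1.5])                 # candidate beta^j_i(t, x*)
beta *= U[N] / (survival * beta.sum())           # rescale to satisfy efficiency exactly

weighted = survival * beta                       # (1 - F_j(t)) * beta^j_i(t, x*)
ok = np.isclose(weighted.sum(), U[N])            # efficiency for the grand coalition
for size in (1, 2):                              # coalitional rationality
    for S in combinations(N, size):
        ok &= weighted[[i - 1 for i in S]].sum() >= U[S]

print("candidate lies in B_j(t, x*):", bool(ok))
```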

Suppose that B_j(t, x^*) ≠ ∅, ∀t ∈ [t_0 + jΔt, T], j = 0, ..., l. Then using the set B_j(t, x^*) it is possible to define the following set of vectors:

Definition 5. The set of all possible vectors ξ^j(x^*(t), t) generated by some integrable selectors β^j(t, x^*) ∈ B_j(t, x^*) we shall call the IDP-core and denote by \hat{C}_j(x^*(t), t), where
\[
\hat{C}_j(x^*(t), t) = \{ \xi^j(x^*(t), t), \; t \in [t_0 + j\Delta t, T] \} \qquad (21)
\]
and for t ∈ [t_0 + jΔt, T]
\[
\xi^j(x^*(t), t) = \int_{t}^{T} (1 - F_j(\tau)) \beta^j(\tau, x^*) \, d\tau. \qquad (22)
\]

In (Petrosian et al., 2016) it was proved that the IDP-core is a subset of the Core:

Theorem 2. The set \hat{C}_j(x^*(t), t) is a subset of the Core C_j(x^*(t), t) in the random cooperative truncated subgame Γ_j^c(x^*(t), t), t ∈ [t_0 + jΔt, T].

The Core is a classical solution in the theory of games (Shapley, 1952). In our case the Core C_j(x^*(t), t) for each random truncated subgame is defined as the set of imputations ξ^j(x_{j,0}^*, t_0 + jΔt) = (ξ_1^j(x_{j,0}^*, t_0 + jΔt), ..., ξ_n^j(x_{j,0}^*, t_0 + jΔt)) satisfying, for ∀t ∈ [t_0 + jΔt, T]:

1. efficiency: \sum_{i \in N} ξ_i^j(x^*(t), t) = V_j(N; x^*(t), t);

2. coalitional rationality: \sum_{i \in S} ξ_i^j(x^*(t), t) \ge V_j(S; x^*(t), t), ∀S ⊂ N.

The main result of the paper (Petrosian et al., 2016) is the proof that the IDP-core is strong time-consistent in differential games with prescribed duration. The same result can be obtained for the random truncated subgame.

Definition 6. The set W_j(x_{j,0}^*, t_0 + jΔt) is called strong time-consistent if

1. W_j(x^*(t), t) ≠ ∅, ∀t ∈ [t_0 + jΔt, T];

2. for each imputation ξ^j(x^*(t), t) ∈ W_j(x^*(t), t) there exists an IDP β^j(t, x^*) = (β_1^j(t, x^*), ..., β_n^j(t, x^*)), t ∈ [t_0 + jΔt, T], such that
\[
\xi^j(x^*(t), t) = \int_{t}^{T} (1 - F_j(\tau)) \beta^j(\tau, x^*) \, d\tau, \qquad (23)
\]
and
\[
\int_{t_0 + j\Delta t}^{t} (1 - F_j(\tau)) \beta^j(\tau, x^*) \, d\tau \oplus W_j(x^*(t), t) \subset W_j(x_{j,0}^*, t_0 + j\Delta t) \qquad (24)
\]
for each t ∈ [t_0 + jΔt, T], where a ⊕ B = {a + b : b ∈ B}, a ∈ R^n, B ⊂ R^n.

Strong time consistency of a solution means that the solution obtained by "optimally" reconsidering the initial solution at any time instant during the game will belong to the initial solution. In particular, for the IDP-core in a random truncated subgame it means that for each ξ^j(x_{j,0}^*, t_0 + jΔt) from \hat{C}_j(x_{j,0}^*, t_0 + jΔt), a deviation along x_j^*(t) at any time instant t ∈ [t_0 + jΔt, T_j] to any other imputation in the current IDP-core ξ^j(x^*(t), t) ∈ \hat{C}_j(x^*(t), t) leads to an imputation which belongs to the initial IDP-core \hat{C}_j(x_{j,0}^*, t_0 + jΔt). The paper (Petrosian et al., 2016) contains the proof that the IDP-core is strong time-consistent:

Theorem 3. Suppose \hat{C}_j(x^*(t), t) ≠ ∅, ∀t ∈ [t_0 + jΔt, T]. Then the IDP-core \hat{C}_j(x_{j,0}^*, t_0 + jΔt) is strong time-consistent in the game Γ_j^c(x_{j,0}^*, t_0 + jΔt).

In the paper (Petrosian et al., 2016) the properties of the IDP-core as a cooperative solution are discussed and the technique for its construction is demonstrated on a linear-quadratic game model of pollution control.

It is natural to suggest that the distribution of the total payoff of the players in the game Γ(x_0, T − t_0) along the conditionally cooperative trajectory {x^*(t)}_{t=t_0}^{T} can be organized as a composition of IDPs for each time interval [t_0 + jΔt, t_0 + (j + 1)Δt], j = 0, ..., l, in accordance with the structure of the game Γ(x_0, T − t_0). This will be formalized in the next section as a new solution concept.

The family of sets W_j(x_{j,0}^*, t_0 + jΔt) = \hat{C}_j(x_{j,0}^*, t_0 + jΔt) does not directly compose a solution for the game Γ(x_0, T − t_0). For any j = 0, ..., l the optimal solution for the truncated subgame Γ_j^c(x_{j,0}^*, t_0 + jΔt) is defined on the time interval [t_0 + jΔt, T_j]. This particular solution makes sense on the interval [t_0 + jΔt, t_0 + (j + 1)Δt] only, because the information about the game structure updates after every time interval and it is irrelevant to use a solution which is based upon outdated information. The necessary information can be extracted by using the IDP of each truncated subgame. Therefore, in order to construct an optimal solution for the whole game Γ(x_0, T − t_0) we use the sets of IDPs B_j(t, x^*) instead of the sets of imputations \hat{C}_j(x_{j,0}^*, t_0 + jΔt).

4. Concept of Solution

In order to introduce a solution concept for the differential game Γ(x_0, T − t_0) with the Looking Forward Approach we use the family of sets B_j(t, x^*), j = 0, ..., l. First we construct the set of IDPs for the whole game Γ(x_0, T − t_0) in the following way: for each fixed composition of IDPs β^j(t, x^*) ∈ B_j(t, x^*), j = 0, ..., l, we define the resulting IDP \hat{β}(t, x^*).

Definition 7. The resulting IDP \hat{β}(t, x^*) is a function defined as a combination of imputation distribution procedures β^j(t, x^*) ∈ B_j(t, x^*) in all truncated cooperative subgames Γ_j^c(x_{j,0}, t_0 + jΔt), j = 0, ..., l:
\[
\hat{\beta}(t, x^*) =
\begin{cases}
(1 - F_0(t)) \beta^0(t, x_0^*), & t \in [t_0, t_0 + \Delta t], \\
\;\;\vdots \\
(1 - F_j(t)) \beta^j(t, x^*), & t \in [t_0 + j\Delta t, t_0 + (j+1)\Delta t], \\
\;\;\vdots \\
(1 - F_l(t)) \beta^l(t, x^*), & t \in [t_0 + l\Delta t, t_0 + (l+1)\Delta t].
\end{cases} \qquad (25)
\]

The set of all possible resulting IDPs \hat{β}(t, x^*) (25) for different compositions β^j(t, x^*) ∈ B_j(t, x^*), j = 0, ..., l, we denote by \hat{B}(t, x^*).

Using a resulting IDP \hat{β}(t, x^*) ∈ \hat{B}(t, x^*) it is possible to determine a resulting imputation which can be used as an imputation in the game Γ(x_0, T − t_0) with the Looking Forward Approach. But the question remains whether the resulting imputation actually allocates the joint cooperative payoff along the conditionally cooperative trajectory x^*(t); this fact is proved in Theorem 4.

Definition 8. The resulting imputation \hat{ξ}(x_0, T − t_0) is a vector defined in the following way:
\[
\hat{\xi}(x_0, T - t_0) = \int_{t_0}^{T} \hat{\beta}(\tau, x^*(\tau)) \, d\tau
= \sum_{j=0}^{l} \left[ \int_{t_0 + j\Delta t}^{t_0 + (j+1)\Delta t} (1 - F_j(\tau)) \beta^j(\tau, x^*(\tau)) \, d\tau \right]. \qquad (26)
\]

Denote by the resulting solution \hat{W}(x_0, T − t_0) the set of all resulting imputations \hat{ξ}(x_0, T − t_0) composed by (25), (26). In game models with the Looking Forward Approach we use \hat{W} as a solution.
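A minimal sketch (assumptions only, not the paper's computation) of how a resulting imputation (26) is assembled: the chosen IDP of each truncated subgame is weighted by its survival function, integrated over its own Δt-interval, and the pieces are summed.

```python
# Assemble the resulting imputation (26) from per-interval IDPs; the survival
# functions and IDPs below are hypothetical stand-ins.
import numpy as np

t0, T, dt, n = 0.0, 4.0, 1.0, 3
l = int((T - t0) / dt) - 1                       # number of information updates

def survival(j, t):                              # stand-in for 1 - F_j(t)
    return np.exp(-0.5 * (t - (t0 + j * dt)))

def beta_j(j, t):                                # stand-in for the chosen IDP beta^j(t, x*)
    return np.array([1.0, 0.8, 0.6]) * (1.0 + 0.1 * j)

xi_hat = np.zeros(n)
for j in range(l + 1):
    grid = np.linspace(t0 + j * dt, t0 + (j + 1) * dt, 501)
    step = grid[1] - grid[0]
    vals = np.array([survival(j, t) * beta_j(j, t) for t in grid])
    xi_hat += vals.sum(axis=0) * step            # one summand of (26)

print("resulting imputation:", xi_hat)
```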

Fig. 4. A combination of IDPs β^j(t, x^*) ∈ B_j(t, x^*) defined for each ξ^j(x_{j,0}^*, t_0 + jΔt) ∈ W_j(x_{j,0}^*, t_0 + jΔt), j = 0, ..., l, determines the resulting distribution procedure \hat{β}(t, x^*) ∈ \hat{B}(t, x^*)

Theorem 4. With any \hat{ξ}(x_0, T − t_0) ∈ \hat{W}(x_0, T − t_0) it is possible to allocate the joint payoff of players (9) along the conditionally cooperative trajectory x^*(t) during the game Γ(x_0, T − t_0), and for ∀t ∈ [t_0 + jΔt, t_0 + (j + 1)Δt], j = 0, ..., l:

\[
\sum_{i=1}^{n} \int_{t_0}^{t} \hat{\beta}_i(\tau, x^*(\tau)) \, d\tau
= \sum_{i=1}^{n} \left[ \sum_{k=0}^{j-1} \int_{t_0 + k\Delta t}^{t_0 + (k+1)\Delta t} (1 - F_k(\tau)) h_i(x^*(\tau), u^*(\tau)) \, d\tau
+ \int_{t_0 + j\Delta t}^{t} (1 - F_j(\tau)) h_i(x^*(\tau), u^*(\tau)) \, d\tau \right]. \qquad (27)
\]

Proof. To prove this theorem we start from the last random truncated subgame Γ_l(x_{l,0}^*, t_0 + lΔt), i.e. we prove that for ∀t ∈ [t_0 + lΔt, T]
\[
\sum_{i=1}^{n} \int_{t_0 + l\Delta t}^{t} \hat{\beta}_i(\tau, x^*(\tau)) \, d\tau
= \sum_{i=1}^{n} \int_{t_0 + l\Delta t}^{t} (1 - F_l(\tau)) h_i(x^*(\tau), u^*(\tau)) \, d\tau. \qquad (28)
\]
Indeed, the maximum joint payoff in this game is defined by the function W^{(lΔt)}(t_0 + lΔt, x_{l,0}^*) (11). According to the definition of this function, for ∀t ∈ [t_0 + lΔt, T]:
\[
W^{(l\Delta t)}(t, x^*(t)) = \max_{u \in U} \left\{ \sum_{i \in N} K_i^l(x^*(t), t; u) \right\}
= \sum_{i=1}^{n} \int_{t}^{T} (1 - F_l(\tau)) h_i(x^*(\tau), u^*(\tau)) \, d\tau
= \sum_{i=1}^{n} \int_{t}^{T} (1 - F_l(\tau)) \beta_i^l(\tau, x^*(\tau)) \, d\tau
= \sum_{i=1}^{n} \int_{t}^{T} \hat{\beta}_i(\tau, x^*(\tau)) \, d\tau. \qquad (29)
\]

However,
\[
W^{(l\Delta t)}(t_0 + l\Delta t, x_{l,0}^*) - W^{(l\Delta t)}(t, x^*(t))
= \sum_{i=1}^{n} \int_{t_0 + l\Delta t}^{t} (1 - F_l(\tau)) h_i(x^*(\tau), u^*(\tau)) \, d\tau, \qquad (30)
\]
where ∀t ∈ [t_0 + lΔt, T]. From (29) and (30) it follows that (28) holds for ∀t ∈ [t_0 + lΔt, T]. Using this result we prove that it also holds for the random truncated subgame Γ_{l−1}(x_{l−1,0}^*, t_0 + (l − 1)Δt), i.e. we prove that

\[
\sum_{i=1}^{n} \int_{t_0 + (l-1)\Delta t}^{t} \hat{\beta}_i(\tau, x^*(\tau)) \, d\tau
= \sum_{i=1}^{n} \int_{t_0 + (l-1)\Delta t}^{t} (1 - F_{l-1}(\tau)) h_i(x^*(\tau), u^*(\tau)) \, d\tau, \qquad (31)
\]
where ∀t ∈ [t_0 + (l − 1)Δt, t_0 + lΔt]. Similarly as in the game Γ_l(x_{l,0}^*, t_0 + lΔt), for ∀t ∈ [t_0 + (l − 1)Δt, t_0 + lΔt]
\[
W^{((l-1)\Delta t)}(t_0 + (l-1)\Delta t, x_{l-1,0}^*) - W^{((l-1)\Delta t)}(t, x^*(t))
= \sum_{i=1}^{n} \int_{t_0 + (l-1)\Delta t}^{t} (1 - F_{l-1}(\tau)) h_i(x^*(\tau), u^*(\tau)) \, d\tau. \qquad (32)
\]
Then it follows that (31) is satisfied. We proceed in the same manner until the first random truncated subgame Γ_0(x_0, t_0). This enables us to combine the results (28), (31) and show that (27) holds for ∀t ∈ [t_0, T]. This completes the proof.

4.1. Time-consistency of the Solution Concept

It is easy to see that the resulting solution \hat{W}(x_0, T − t_0) is time-consistent, but there is another, surprising property of \hat{W}(x_0, T − t_0).

Theorem 5. The resulting solution \hat{W}(x_0, T − t_0) is strong time-consistent in the game Γ(x_0, T − t_0).

Proof. Suppose that in the game Γ(x_0, T − t_0) players agreed to choose an imputation \hat{ξ}(x_0, T − t_0) ∈ \hat{W}(x_0, T − t_0). It means that during the game, in each random truncated subgame Γ_j^c(x_{j,0}^*, t_0 + jΔt), they agreed on choosing the imputation ξ^j(x_{j,0}^*, t_0 + jΔt) ∈ \hat{C}_j(x_{j,0}^*, t_0 + jΔt) with the corresponding IDP β^j(t, x^*) ∈ B_j(t, x^*), t ∈ [t_0 + jΔt, T]. In fact, during the game players use the IDP \hat{β}(t, x^*) and allocate the cooperative payoff in the following way:
\[
\int_{t_0}^{T} \hat{\beta}(\tau, x^*(\tau)) \, d\tau = \sum_{j=0}^{l} \int_{t_0 + j\Delta t}^{t_0 + (j+1)\Delta t} (1 - F_j(t)) \beta^j(t, x^*) \, dt.
\]

Suppose that at a given time instant t = t_{br}, where t_{br} ∈ [t_0 + kΔt, T], in the random truncated subgame Γ_k^c(x_{k,0}^*, t_0 + kΔt) players decide to choose another imputation ξ'^k(x_k^*(t_{br}), t_{br}) from the IDP-core \hat{C}_k(x_k^*(t_{br}), t_{br}). Therefore, there exists an IDP β'^k(t, x_k^*) ∈ B_k(t, x_k^*), t ∈ [t_{br}, T], which corresponds to this imputation:
\[
\xi'^k(x_k^*(t_{br}), t_{br}) = \int_{t_{br}}^{T} (1 - F_k(t)) \beta'^k(t, x_k^*) \, dt. \qquad (33)
\]

In this case, during the game players will allocate the cooperative payoff according to \hat{ξ}'(x_0, T − t_0) using the following resulting IDP:
\[
\hat{\beta}'(t, x^*) =
\begin{cases}
(1 - F_k(t)) \beta^k(t, x_k^*), & t \in [t_0 + k\Delta t, t_{br}), \\
(1 - F_k(t)) \beta'^k(t, x_k^*), & t \in [t_{br}, t_0 + (k+1)\Delta t], \\
(1 - F_j(t)) \beta^j(t, x^*), & t \in [t_0 + j\Delta t, t_0 + (j+1)\Delta t],
\end{cases}
\]
where j ≠ k, j = 0, ..., l. The corresponding resulting imputation has the following form:

\[
\hat{\xi}'(x_0, T - t_0) = \int_{t_0}^{T} \hat{\beta}'(t, x^*) \, dt
= \sum_{j=0, \, j \ne k}^{l} \int_{t_0 + j\Delta t}^{t_0 + (j+1)\Delta t} (1 - F_j(t)) \beta^j(t, x^*) \, dt
+ \int_{t_0 + k\Delta t}^{t_{br}} (1 - F_k(t)) \beta^k(t, x_k^*) \, dt
+ \int_{t_{br}}^{t_0 + (k+1)\Delta t} (1 - F_k(t)) \beta'^k(t, x_k^*) \, dt. \qquad (34)
\]

Since β'^k(t, x_k^*) ∈ B_k(t, x_k^*), t ∈ [t_{br}, T], the resulting IDP \hat{β}'(t, x^*) belongs to \hat{B}(t, x^*). According to the definition of \hat{W}(x_0, T − t_0), all vectors \hat{ξ}(x_0, T − t_0) obtained by formula (26) using \hat{β}(t, x^*) from the set \hat{B}(t, x^*) constitute the resulting solution \hat{W}(x_0, T − t_0) of the game Γ(x_0, T − t_0). In (34) we constructed the imputation \hat{ξ}'(x_0, T − t_0) with the IDP \hat{β}'(t, x^*) from the set \hat{B}(t, x^*) and we see that the resulting imputation \hat{ξ}'(x_0, T − t_0) belongs to the initial solution \hat{W}(x_0, T − t_0). This completes the proof.

5. Looking Forward Approach with Random Horizon in Cooperative Extraction Game

The following example of a resource extraction game with two players was considered by Jorgensen and Yeung (1999). The problem of time consistency in this example was studied by Yeung and Petrosyan (2012). In a previous paper on the Looking Forward Approach (Petrosian and Barabanov, 2017) the same example for two players was considered, but with a new forecast factor; three competing models were implemented: with the stochastic forecast, with the deterministic forecast, and without a forecast. In this paper we consider a resource extraction game with three players with the special form of cooperative solution described in Sections 3.3 and 3.4 and in (Petrosian et al., 2016). An analytical form of the characteristic function for each coalition is derived according to (Chander and Tulkens, 1995) and presented below. Furthermore, we apply the Looking Forward Approach with random horizon to the example. In the final part of the example the strong time consistency property of the constructed solution concept is demonstrated.

In the following model we derive the analytical solution of the problem, but in general an analytical solution cannot be found. In order to apply the Looking Forward Approach to the general class of cooperative differential games we need to solve two main problems. The first problem is to solve (9) subject to (10) for each truncated subgame. Mathematically this is a classical control problem, and there are numerous methods for solving it. Solving this problem we obtain approximate cooperative strategies, cooperative trajectories x_j^*(t) and the corresponding joint payoff (9). The second problem is to define how to allocate the cooperative payoff between the players. We need to calculate the characteristic function (14) for each truncated subgame along the cooperative trajectory; to do this we can use evolutionary algorithms (Eiben and Smith, 2003) suitable for game-theoretical problems. After calculating the characteristic functions we can determine a solution for each truncated subgame (for example, the IDP-core) and then calculate the corresponding resulting solution (26).
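The two-step scheme described above can be organized as in the following schematic sketch; the solver and the characteristic function evaluator are passed in as callables (placeholders for any numerical control method or evolutionary algorithm), and none of the names come from the paper.

```python
# Schematic Looking Forward Approach pipeline (placeholders, not the paper's code):
# for each truncated subgame, solve the joint control problem (9)-(10), record the
# cooperative trajectory piece, and evaluate the characteristic function (14).
def looking_forward(x0, t0, T, dt, horizon_cdfs,
                    solve_joint_control, characteristic_value):
    coalitions = [(1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
    x_start, pieces, char_functions = x0, [], []
    n_updates = int((T - t0) / dt) - 1
    for j in range(n_updates + 1):
        # step 1: cooperative strategies and trajectory in truncated subgame j
        u_star, x_star = solve_joint_control(x_start, t0 + j * dt, horizon_cdfs[j])
        pieces.append((t0 + j * dt, t0 + (j + 1) * dt, x_star))
        # step 2: characteristic function along x_star (e.g. for building the IDP-core)
        char_functions.append({S: characteristic_value(S, x_star, t0 + j * dt,
                                                        horizon_cdfs[j])
                               for S in coalitions})
        # initial state of the next truncated subgame, cf. x_{j,0}
        x_start = x_star(t0 + (j + 1) * dt)
    return pieces, char_functions
```

Passing the two subroutines as arguments keeps the composition logic of the approach separate from the particular numerical methods chosen for the control problem and for the coalition values.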

5.1. The Original Game

Consider an economy endowed with a single renewable resource, with n ≥ 2 resource extractors (firms). Let u_i(t) denote the quantity of the resource extracted by firm i at time t, for i ∈ N, where each firm controls its rate of extraction. Let x(t) ∈ X ⊂ R be the size of the resource stock at time t. The growth dynamics of the renewable resource stock becomes

\[
\dot{x} = a\sqrt{x(t)} - b x(t) - \sum_{i=1}^{3} u_i, \quad x(t_0) = x_0, \qquad (35)
\]
where a\sqrt{x(t)} − b x(t) is the natural rate of evolution of the resource and u_i ∈ [0, d], d > 0, i = 1, ..., 3. The payoff of extractor i ∈ N depends on the amount extracted u_i(t), the resource stock size x(t), and the parameter c_i, i = 1, ..., 3:

\[
K_i(x_0, t_0; u) = \int_{t_0}^{T} \left[ \sqrt{u_i(\tau)} - \frac{c_i}{\sqrt{x(\tau)}} \, u_i(\tau) \right] d\tau, \qquad (36)
\]
where c_i is a constant and c_i ≠ c_k, ∀i ≠ k, i, k = 1, ..., 3. We consider a set of parameters x_0, T, a, b, d, and c_i, i = 1, ..., 3, such that the resource stock remains non-negative in the corresponding control problem.
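For intuition, the sketch below forward-integrates the stock dynamics (35) with three constant extraction rates; the rates and the time grid are arbitrary illustrative choices, not values used later in the paper.

```python
# Forward simulation of the stock dynamics (35) under constant extraction rates;
# all numerical choices here are illustrative.
import numpy as np
from scipy.integrate import solve_ivp

a, b = 10.0, 0.5                      # natural growth parameters in (35)
u = np.array([5.0, 3.0, 4.0])         # constant extraction rates of the three firms

def stock(t, x):
    return [a * np.sqrt(max(x[0], 0.0)) - b * x[0] - u.sum()]

sol = solve_ivp(stock, (0.0, 4.0), [200.0], max_step=0.01)
print("stock at the terminal time:", sol.y[0, -1])
```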

5.2. Random Truncated Subgame

The original game Γ(x_0, T − t_0) is defined on the time interval [t_0, T]. Suppose that for any t ∈ [t_0 + jΔt, t_0 + (j + 1)Δt], j = 0, ..., l, players have truncated information about the structure of the game. It includes information about the dynamical system and the payoff functions on the time interval [t_0 + jΔt, T_j], where T_j is a truncated exponentially distributed random variable with distribution function F_j(t) and density function f_j(t):
\[
F_j(t) = \frac{1 - \exp(-\lambda(t - \max(t_0 + (j+1)\Delta t, t_{j-1})))}{1 - \exp(-\lambda(T - \max(t_0 + (j+1)\Delta t, t_{j-1})))}, \qquad (37)
\]
\[
f_j(t) = \frac{\lambda \exp(-\lambda(t - \max(t_0 + (j+1)\Delta t, t_{j-1})))}{1 - \exp(-\lambda(T - \max(t_0 + (j+1)\Delta t, t_{j-1})))}. \qquad (38)
\]

The exponential distribution is widely used for describing the time between events in a Poisson process; under the events here we can understand changes in the game structure. Also, let us denote by λ_j(t) the corresponding hazard function:
\[
\lambda_j(t) =
\begin{cases}
\dfrac{f_j(t)}{1 - F_j(t)}, & t \in [\max(t_0 + (j+1)\Delta t, t_{j-1}), T], \\[2mm]
0, & t \in [t_0 + j\Delta t, \max(t_0 + (j+1)\Delta t, t_{j-1})].
\end{cases}
\]
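The distribution (37)-(38) is easy to work with numerically. The sketch below implements F_j, f_j and inverse-transform sampling of the horizon T_j with λ = 0.5, as in the example of Section 5.8; the helper names are not from the paper.

```python
# Truncated exponential horizon (37)-(38): distribution, density and sampling.
import numpy as np

lam, T = 0.5, 4.0

def make_Fj(left):                        # left = max(t0 + (j+1)*dt, t_{j-1})
    Z = 1.0 - np.exp(-lam * (T - left))
    F = lambda t: (1.0 - np.exp(-lam * (t - left))) / Z
    f = lambda t: lam * np.exp(-lam * (t - left)) / Z
    return F, f

def sample_horizon(left, rng):            # inverse of (37) applied to a uniform draw
    Z = 1.0 - np.exp(-lam * (T - left))
    return left - np.log(1.0 - rng.random() * Z) / lam

rng = np.random.default_rng(0)
F0, f0 = make_Fj(left=1.0)                # first subgame: support [t0 + dt, T] = [1, 4]
print(F0(4.0), sample_horizon(1.0, rng))  # F_0(T) = 1 and one simulated realization
```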

The truncated information is formalized in the random truncated subgame Γ_j(x_{j,0}, t_0 + jΔt). The dynamical system and the initial condition for this subgame have the following form:
\[
\dot{x} = a\sqrt{x(t)} - b x(t) - \sum_{i=1}^{3} u_i, \quad x(t_0 + j\Delta t) = x_{j,0}. \qquad (39)
\]
According to (8) the payoff function of extractor i is equal to
\[
K_i^j(x_{j,0}, t_0 + j\Delta t; u) = \int_{t_0 + j\Delta t}^{T} (1 - F_j(\tau)) h_i(x(\tau), u(\tau)) \, d\tau. \qquad (40)
\]

Consider the case when the resource extractors agree to act cooperatively in the random truncated subgame Γ_j^c(x_{j,0}, t_0 + jΔt). They follow the optimality principle under which they maximize their joint payoff and share the excess of the total expected cooperative payoff over the sum of individual non-cooperative payoffs proportionally to the agents' non-cooperative payoffs.

5.3. Cooperative Trajectory

Next, consider the random truncated subgame Γ_j^c(x_{j,0}, t_0 + jΔt). The maximized joint payoff in the game Γ_j^c(x_{j,0}, t_0 + jΔt) has the following form (Jorgensen and Yeung, 1999):
\[
W^{(j\Delta t)}(t, x) = A_j(t)\sqrt{x} + C_j(t), \qquad (41)
\]
where the functions A_j(t), C_j(t) satisfy the equations
\[
\dot{A}_j(t) = \left[ \lambda_j(t) + \frac{b}{2} \right] A_j(t) - \sum_{i=1}^{3} \frac{1}{4(c_i + A_j(t)/2)}, \qquad
\dot{C}_j(t) = \lambda_j(t) C_j(t) - \frac{a}{2} A_j(t) \qquad (42)
\]
with boundary conditions \lim_{t \to T^-} A_j(t) = \lim_{t \to T^-} C_j(t) = 0.
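A backward Euler sketch for the system (42), integrating from the boundary condition A_j(T) = C_j(T) = 0 with the hazard λ_j(t) of the truncated exponential horizon; the parameters follow the numerical example below, while the step size and the capping of the hazard near t = T are ad hoc assumptions.

```python
# Backward Euler integration of (42) from A_j(T) = C_j(T) = 0; the hazard of the
# truncated exponential blows up as t -> T, so it is capped (an ad hoc choice).
import numpy as np

a, b, T = 10.0, 0.5, 4.0
c = np.array([0.15, 0.65, 0.45])
lam, left = 0.5, 1.0                                    # horizon support [left, T]

def hazard(t):                                          # lambda_j(t), zero before the support
    if t < left:
        return 0.0
    num = lam * np.exp(-lam * (t - left))
    den = np.exp(-lam * (t - left)) - np.exp(-lam * (T - left))
    return num / den if den > 1e-12 else 1e6

dt = 1e-4
A, C = 0.0, 0.0                                         # boundary values at t = T
for t in np.arange(T, 0.0, -dt):
    dA = (hazard(t) + b / 2.0) * A - np.sum(1.0 / (4.0 * (c + A / 2.0)))
    dC = hazard(t) * C - (a / 2.0) * A
    A -= dt * dA                                        # one Euler step backward in time
    C -= dt * dC

print("A_j(t_0), C_j(t_0):", A, C)
```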

The optimal cooperative trajectory x_j^*(t) of the random truncated subgame Γ_j^c(x_{j,0}, t_0 + jΔt) can be represented explicitly (Jorgensen and Yeung, 1999) on the full interval [t_0 + jΔt, T]. The trajectory with the initial condition x = x_{j,0} is
\[
x_j^*(t) = \varpi_j^2(t_0 + j\Delta t, t) \left[ x_{j,0}^{1/2} + \int_{t_0 + j\Delta t}^{t} \frac{a}{2} \, \varpi_j^{-1}(t_0 + j\Delta t, \tau) \, d\tau \right]^2, \qquad (43)
\]

where t ∈ (t_0 + jΔt, t_0 + (j + 1)Δt],
\[
\varpi_j(t_0 + j\Delta t, t) = \exp\left\{ \int_{t_0 + j\Delta t}^{t} -\left[ \frac{b}{2} + \sum_{i=1}^{3} \frac{1}{8(c_i + A_j(\tau)/2)^2} \right] d\tau \right\}. \qquad (44)
\]

The initial conditions are defined recursively by the optimal trajectory of the previous game: x_{0,0} = x_0 and x_{j,0} = x_{j−1}^*(t_0 + jΔt) for j = 1, ..., l. The conditionally cooperative trajectory x^*(t) is defined in accordance with the Looking Forward Approach as
\[
x^*(t) = x_j^*(t), \quad t \in [t_0 + j\Delta t, t_0 + (j+1)\Delta t], \qquad (45)
\]
for j = 0, ..., l.
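The composition rule (45) can be sketched as follows; the per-subgame trajectories here are simple closed-form stand-ins rather than the exact expression (43), so the numbers are purely illustrative.

```python
# Sketch of (45): the conditionally cooperative trajectory follows each subgame
# trajectory only on its own Delta t-interval; subgame trajectories are stand-ins.
import numpy as np

t0, T, dt = 0.0, 4.0, 1.0
l = int((T - t0) / dt) - 1

def subgame_trajectory(j, x_start):
    """Stand-in for x_j^*(t): relaxation toward a subgame-specific level."""
    target = 300.0 - 20.0 * j
    return lambda t: target + (x_start - target) * np.exp(-(t - (t0 + j * dt)))

pieces, x_start = [], 200.0
for j in range(l + 1):
    traj = subgame_trajectory(j, x_start)
    pieces.append(traj)
    x_start = traj(t0 + (j + 1) * dt)       # x_{j+1,0}: initial state of the next subgame

def x_star(t):                              # conditionally cooperative trajectory (45)
    j = min(int((t - t0) // dt), l)
    return pieces[j](t)

print([round(x_star(t), 1) for t in np.linspace(t0, T, 9)])
```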

5.4. Characteristic Function

In order to allocate the cooperative payoff in each random truncated subgame it is necessary to define the values of the characteristic function V_j(S; x_{j,0}, t_0 + jΔt) (V_j(S; x^*(t), t)) for each coalition S ⊆ N. According to formula (14), the maximized joint payoff W^{(jΔt)}(t_0 + jΔt, x_{j,0}) (41) corresponds to the value of the characteristic function of the grand coalition V_j(N; x_{j,0}, t_0 + jΔt) in the random truncated subgame Γ_j^c(x_{j,0}, t_0 + jΔt):
\[
V_j(N; x^*(t), t) = W^{(j\Delta t)}(t, x^*(t)), \qquad (46)
\]
where t ∈ [t_0 + jΔt, T], j = 0, ..., l. Next, we need to define the values of the characteristic function for the following coalitions:
\[
\{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}. \qquad (47)
\]
According to (14), for the single-player coalitions {i}, i = 1, ..., 3, we need to determine the Nash equilibrium point and, as a result, V_j({i}; x^*(t), t).

5.5. Single Player Coalitions

The random truncated subgame Γ_j(x_{j,0}, t_0 + jΔt) has a Nash equilibrium point defined by the feedback
\[
u_i^{NE}(t, x) = \frac{x}{4\left[ c_i + A_i^j(t)/2 \right]^2}, \quad i = 1, \ldots, 3, \qquad (48)
\]
where the functions A_i^j(t), C_i^j(t) are defined by the equations
\[
\dot{A}_i^j(t) = \left[ \lambda_j(t) + \frac{b}{2} \right] A_i^j(t)
+ A_i^j(t) \sum_{k \ne i} \frac{1}{8(c_k + A_k^j(t)/2)^2}
- \frac{1}{4(c_i + A_i^j(t)/2)}, \qquad
\dot{C}_i^j(t) = \lambda_j(t) C_i^j(t) - \frac{a}{2} A_i^j(t)
\]
for i = 1, ..., 3 with boundary conditions \lim_{t \to T^-} A_i^j(t) = 0 and \lim_{t \to T^-} C_i^j(t) = 0.

The value function of extractor i = 1, ..., 3 at the Nash equilibrium point is equal to
\[
V_i^j(t, x) = A_i^j(t)\sqrt{x} + C_i^j(t), \quad i = 1, \ldots, 3. \qquad (49)
\]
Therefore, the value of the characteristic function for the single-player coalitions S = {i}, i ∈ N, can be calculated in the following way:
\[
V_j(\{i\}; x^*(t), t) = V_i^j(t, x^*(t)), \qquad (50)
\]
where t ∈ [t_0 + jΔt, T], j = 0, ..., l.

5.6. Two Player Coalitions

According to formula (14), the characteristic function V_j(S; x_{j,0}, t_0 + jΔt) (V_j(S; x^*(t), t)) for the two-player coalitions S = {1, 2}, {1, 3}, {2, 3} is defined as the total payoff of the players from coalition S in the Nash equilibrium u^{NE} = (u_1^{NE}, u_2^{NE}) in the game with the following set of players: coalition S (acting as one player) and the players from the set N \ S, i.e. in the game with |N \ S| + 1 = 2 players. It means that the players from coalition S behave as one player and the other players from the set N \ S act separately. Using this approach we define the Nash equilibrium between two players: the combined player (coalition S) and the remaining player (the single player from N \ S).

Consider the calculation of V_j(S; x_{j,0}, t_0 + jΔt) in the case S = {1, 2}; the calculations for the other coalitions follow the same algorithm. The payoffs of the players in this case have the following form:
\[
V_{\{1,2\}}^j(t, x) = A_{\{1,2\}}^j(t)\sqrt{x} + C_{\{1,2\}}^j(t), \qquad
V_3^j(t, x) = A_3^j(t)\sqrt{x} + C_3^j(t),
\]
where the functions A_{\{1,2\}}^j(t), A_3^j(t), C_{\{1,2\}}^j(t), C_3^j(t) satisfy the equations
\[
\dot{A}_{\{1,2\}}^j(t) = \left[ \lambda_j(t) + \frac{b}{2} + \frac{1}{8(c_3 + A_3^j(t)/2)^2} \right] A_{\{1,2\}}^j(t)
- \sum_{k \in S} \frac{1}{4(c_k + A_{\{1,2\}}^j(t)/2)},
\]
\[
\dot{A}_3^j(t) = \left[ \lambda_j(t) + \frac{b}{2} + \sum_{k \in S} \frac{1}{8(c_k + A_{\{1,2\}}^j(t)/2)^2} \right] A_3^j(t)
- \frac{1}{4(c_3 + A_3^j(t)/2)},
\]
\[
\dot{C}_{\{1,2\}}^j(t) = \lambda_j(t) C_{\{1,2\}}^j(t) - \frac{a}{2} A_{\{1,2\}}^j(t), \qquad
\dot{C}_3^j(t) = \lambda_j(t) C_3^j(t) - \frac{a}{2} A_3^j(t)
\]
with boundary conditions \lim_{t \to T^-} A_{\{1,2\}}^j(t) = \lim_{t \to T^-} A_3^j(t) = 0 and \lim_{t \to T^-} C_{\{1,2\}}^j(t) = \lim_{t \to T^-} C_3^j(t) = 0.

Therefore, the value of the characteristic function for the coalition S = {1, 2} can be calculated in the following way:
\[
V_j(\{1, 2\}; x^*(t), t) = V_{\{1,2\}}^j(t, x^*(t)), \qquad (51)
\]
where t ∈ [t_0 + jΔt, T], j = 0, ..., l.

5.7. IDP-core

Using the values of the characteristic function V_j(S; x_{j,0}, t_0 + jΔt), ∀S ⊆ N (46), (50), (51) and formula (20), we construct the set B_j(t, x^*) as the set of integrable vector functions β^j(t, x^*) satisfying:
\[
\begin{aligned}
\sum_{i=1}^{3} (1 - F_j(t)) \beta_i^j(t, x^*) &= -\tfrac{d}{dt} V_j(\{1, 2, 3\}; x^*(t), t), \\
(1 - F_j(t)) (\beta_1^j(t, x^*) + \beta_2^j(t, x^*)) &\ge -\tfrac{d}{dt} V_j(\{1, 2\}; x^*(t), t), \\
(1 - F_j(t)) (\beta_1^j(t, x^*) + \beta_3^j(t, x^*)) &\ge -\tfrac{d}{dt} V_j(\{1, 3\}; x^*(t), t), \\
(1 - F_j(t)) (\beta_2^j(t, x^*) + \beta_3^j(t, x^*)) &\ge -\tfrac{d}{dt} V_j(\{2, 3\}; x^*(t), t), \\
(1 - F_j(t)) \beta_1^j(t, x^*) &\ge -\tfrac{d}{dt} V_j(\{1\}; x^*(t), t), \\
(1 - F_j(t)) \beta_2^j(t, x^*) &\ge -\tfrac{d}{dt} V_j(\{2\}; x^*(t), t), \\
(1 - F_j(t)) \beta_3^j(t, x^*) &\ge -\tfrac{d}{dt} V_j(\{3\}; x^*(t), t).
\end{aligned} \qquad (52)
\]

Then, combining the sets B_j(t, x^*), t ∈ [t_0 + jΔt, t_0 + (j + 1)Δt], j = 0, ..., l, for all random truncated subgames, we construct the set \hat{B}(t, x^*). Further we calculate the set of all possible imputations \hat{ξ}(x_0, T − t_0) ∈ \hat{W}(x_0, T − t_0) (26).

The step by step construction of the IDP-core for a linear quadratic game model of pollution control is presented in the paper (Petrosian et al., 2016).

5.8. Numerical Example

Consider a numerical example, where information about the structure of the game during the time interval [t_0 + jΔt, t_0 + (j + 1)Δt] is known up to the random horizon T_j, where T_j is a random variable distributed according to (37) with λ = 0.5. The total game length is T = 4. Information about the game updates every Δt = 1. The parameters of the dynamical system are a = 10, b = 0.5. Assume c_1 = 0.15, c_2 = 0.65, and c_3 = 0.45 in the payoff functions, and the initial conditions t_0 = 0, x_0 = 200. The realized information horizons take the following values: t_0 = 2.423, t_1 = 3.538, t_2 = 3.871, t_3 = 4.

The generated values of the information horizon determine the time until which the available truncated information remains correct. In Fig. 5 it is easy to see how the information horizon T_j was generated and how the probability density function f_j(t) (38) changes between random truncated subgames.

In Figs. 6-8 we can see the cooperative strategies for each player defined with the Looking Forward Approach with random horizon (non-smooth solid line) and the cooperative strategies in the original game in (Jorgensen and Yeung, 1999) (smooth dotted line).

The conditionally cooperative trajectory x^*(t) is composed from the solutions of the random truncated subgames Γ_j(x_{j,0}, t_0 + jΔt) with the dynamical system (39). In Fig. 9 the following comparison is presented: the conditionally cooperative trajectory x^*(t) (thick solid line) defined using the Looking Forward Approach with random horizon, the conditionally cooperative trajectory (thin solid line) defined with the classical Looking Forward Approach (Petrosian, 2016a) (where T_j = 2 is a deterministic value), and the cooperative trajectory (dotted line) in the original game Γ(x_0, T − t_0).

Fig. 5. Probability density function fj(t), j = 0,1, 2, 3 (38) for each random truncated subgame.

Fig. 6. Cooperative strategies for player 1 defined with Looking Forward Approach with random horizon (non-smooth), and cooperative strategies in the original game in (Jorgensen and Yeung, 1999) (smooth)

Fig. 7. Cooperative strategies for player 2 defined with Looking Forward Approach with random horizon (non-smooth), and cooperative strategies in the original game in (Jorgensen and Yeung, 1999) (smooth)


Fig. 8. Cooperative strategies for player 3 defined with Looking Forward Approach with random horizon (non-smooth), and cooperative strategies in the original game in (Jorgensen and Yeung, 1999) (smooth)

The cooperative trajectory in the original game is defined in (Jorgensen and Yeung, 1999). In the other two figures one can see the conditionally cooperative trajectory of the corresponding approach together with the cooperative trajectories of the individual truncated subgames.

Fig. 9. The trajectory of the resource stock x^*(t) (thick solid line) with the Looking Forward Approach with random horizon, the trajectory (thick dotted line) defined with the classical Looking Forward Approach, and the cooperative trajectory (thin dotted line) in the original game Γ(x_0, T − t_0).

Next, in order to allocate the cooperative payoff between the players it is necessary to define a set of IDPs β^j(t, x^*) for each random truncated subgame Γ_j^c(x_{j,0}^*, t_0 + jΔt), j = 0, ..., l. For that, using the fixed parameters of the model, we numerically calculate the values of the characteristic function V_j(S; x^*(t), t), S ⊆ N, for each random truncated subgame Γ_j^c(x_{j,0}^*, t_0 + jΔt).

Using the values of the characteristic function V_j(S; x^*(t), t), S ⊆ N, we construct the sets B_j(t, x^*), j = 0, ..., l (20). By combining the sets B_j(t, x^*) we construct the set of IDPs \hat{B}(t, x^*) for the whole game. On the basis of \hat{B}(t, x^*) we construct the solution concept \hat{W}(x_0, T − t_0) using formula (26).

Let us demonstrate the strong time consistency property of the solution concept \hat{W}. Suppose that at the beginning of the game Γ(x_0, T − t_0) players agreed to use the proportional solution.

Fig. 10. The trajectory of the resource stock x* (t) (thick solid line) with Looking Forward Approach with random horizon, and corresponding cooperative trajectories (dotted lines).

Fig. 11. The trajectory of the resource stock x* (t) (thick solid line) defined with classical Looking Forward Approach, and corresponding cooperative trajectories (dotted line).

For each random truncated subgame Γ_j^c(x_{j,0}^*, t_0 + jΔt) the proportional solution for players i ∈ N is defined using the IDP in the following way:
\[
(1 - F_j(t)) \beta_i^{j,\mathrm{prop}}(t, x^*) =
\frac{U_j(\{i\}; x_{j,0}^*, t_0 + j\Delta t)}{\sum_{k \in N} U_j(\{k\}; x_{j,0}^*, t_0 + j\Delta t)} \, U_j(N; x_{j,0}^*, t_0 + j\Delta t), \qquad (53)
\]
where U_j(S; x_{j,0}^*, t_0 + jΔt), ∀S ⊆ N, is defined in (19). According to the Looking Forward Approach, the proportional solution should allocate the cooperative payoff during the whole game Γ(x_0, T − t_0) using the following IDP:
\[
\hat{\beta}^{\mathrm{prop}}(t, x^*) = (1 - F_j(t)) \beta^{j,\mathrm{prop}}(t, x^*), \quad t \in [t_0 + j\Delta t, t_0 + (j+1)\Delta t], \; j = 0, \ldots, l. \qquad (54)
\]
Via integration of \hat{β}^{\mathrm{prop}}(t, x^*) over t it is possible to define the proportional imputation \hat{ξ}^{\mathrm{prop}}(x^*(t), T − t) (26). In Figs. 13, 14 it can be seen that \hat{β}^{\mathrm{prop}}(t, x^*) lies within the set \hat{B}(t, x^*), which means that the proportional solution is strong time-consistent with the given parameters.
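At a single time instant the proportional rule (53) reduces to a simple normalization, as in the following sketch with made-up values of U_j({i}) and U_j(N).

```python
# Proportional IDP (53) at one time instant; the U_j values are made up.
import numpy as np

U_single = np.array([0.8, 0.5, 0.6])        # hypothetical U_j({i}; .)
U_grand = 3.0                               # hypothetical U_j(N; .)
survival = 0.7                              # 1 - F_j(t) at the chosen time

beta_prop = U_single / U_single.sum() * U_grand / survival
print("proportional IDP:", beta_prop)
print("efficiency holds:", np.isclose(survival * beta_prop.sum(), U_grand))
```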

Suppose that at the moment of time t_{br} ∈ [t_0, T] players decide that the proportional solution is no longer fair for them and they choose another imputation from the solution concept \hat{W}(x^*(t_{br}), T − t_{br}), for example the solution which is based upon the Shapley value for each random truncated subgame. For each random truncated subgame the Shapley value is calculated according to the formula:
\[
Sh_i^j(x^*(t_{br}), t_{br}) = \sum_{\substack{S \subseteq N \\ i \in S}} \frac{(|N| - |S|)!\,(|S| - 1)!}{|N|!}
\left( V_j(S; x^*(t_{br}), t_{br}) - V_j(S \setminus \{i\}; x^*(t_{br}), t_{br}) \right). \qquad (55)
\]

Using the Shapley value it is possible to define an IDP for each random truncated subgame (17). According to the Looking Forward Approach, the Shapley-value-based solution should allocate the cooperative payoff during the whole game using the following IDP:
\[
\hat{\beta}^{Sh}(t, x^*) = (1 - F_j(t)) \beta^{j, Sh}(t, x^*), \quad t \in [t_0 + j\Delta t, t_0 + (j+1)\Delta t], \; j = 0, \ldots, l,
\]
where β^{j,Sh}(t, x^*) is defined using formula (18). It is worth mentioning that the IDP for the Shapley value β^{j,Sh}(t, x^*) and the IDP for the proportional solution β^{j,prop}(t, x^*) are calculated in a way that ensures the time consistency property, i.e. using formula (18). An extended description of the step-by-step construction of the IDP for the Shapley value is presented in (Shevkoplyas, 2009).
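Formula (55) can be evaluated directly from the characteristic function values of a truncated subgame at t_{br}; the sketch below does this for three players with made-up coalition values.

```python
# Shapley value (55) for three players from given characteristic function values;
# the coalition values are illustrative, not the ones computed in the example.
from itertools import combinations
from math import factorial

N = (1, 2, 3)
V = {(): 0.0, (1,): 10.0, (2,): 14.0, (3,): 12.0,
     (1, 2): 30.0, (1, 3): 26.0, (2, 3): 29.0, (1, 2, 3): 50.0}

def shapley(i):
    total = 0.0
    for size in range(1, len(N) + 1):
        for S in combinations(N, size):
            if i not in S:
                continue
            weight = factorial(len(N) - len(S)) * factorial(len(S) - 1) / factorial(len(N))
            total += weight * (V[S] - V[tuple(p for p in S if p != i)])
    return total

print([round(shapley(i), 3) for i in N])    # the components sum to V[(1, 2, 3)]
```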

Let us set the moment t_{br} = 1.2 when players decide to reconsider the proportional solution. Then, according to (25), the formula for the IDP for the whole game has the following form:
\[
\hat{\beta}(t, x^*) =
\begin{cases}
\hat{\beta}^{\mathrm{prop}}(t, x^*), & t \in [t_0, t_{br}], \\
\hat{\beta}^{Sh}(t, x^*), & t \in (t_{br}, T].
\end{cases} \qquad (56)
\]

In Fig. 12 the IDP \hat{β}^{\mathrm{prop}}(t, x^*) for the proportional solution (54) (thick solid line) and the IDP \hat{β}(t, x^*) for the combined solution (56) (dotted line) are presented.

Fig. 12. IDP \hat{β}^{\mathrm{prop}}(t, x^*) for the proportional solution (53) (thick solid line), IDP \hat{β}(t, x^*) for the combined solution (56) (dotted line).

Via direct integration of \hat{β}(t, x^*) (56) over t it is possible to apply formula (26) and obtain the resulting allocation \hat{ξ}(x^*(t), T − t). According to \hat{ξ}(x^*(t), T − t), players allocate the cooperative payoff in the game Γ(x_0, T − t_0) in the following way:
\[
\hat{\xi}(x^*(t), T - t) = (12.3, \; 30.2, \; 16.8). \qquad (57)
\]

In Figs. 13, 14 it can be seen that \hat{β}(t, x^*) (56) lies within the set \hat{B}(t, x^*) (20), which means that the corresponding imputation \hat{ξ}(x^*(t), T − t) ∈ \hat{W}(x^*(t), T − t) with the given parameters. This fact demonstrates the strong time consistency property of the solution concept \hat{W}(x_0, T − t_0). Also, in Figs. 13, 14 it can be seen that the proportional solution \hat{β}^{\mathrm{prop}}(t, x^*) (54) lies within the set \hat{B}(t, x^*).

In Fig. 15 the difference between \hat{ξ}(x^*(t), T − t) and \hat{ξ}^{\mathrm{prop}}(x^*(t), T − t) is presented.

6. Conclusion

A novel approach to the definition of a solution for a differential game is presented. The game is defined on a time interval divided into subintervals. The players do not have full information about the structure of the game on the full time interval. Instead, they know the parameters of the dynamical system and of the payoff functions on a truncated interval, but the duration of validity of this information is unknown in advance. A combined trajectory is composed recursively from the local trajectories. The IDP-core is used as the solution of each truncated subgame. The solution for the whole game is described as a new solution concept. It is proved that the new solution is not only time-consistent but also strong time-consistent, which is a rare property of solutions of cooperative differential games.

The approach is illustrated by an example of a resource extraction game. A comparison between the original approach and the Looking Forward Approach with random horizon is presented. Combined trajectories for both approaches are shown, and the solution concept based on the IDP-core is constructed.

Fig. 13. Axes: β_1, t. The set can be calculated using (20).

Fig. 14. Axes: β_2, β_3, t. β_1 can be calculated using (20).

Fig. 15. Imputation \hat{ξ}^{\mathrm{prop}}(x^*(t), T − t) for the proportional solution (thick solid line), imputation \hat{ξ}(x^*(t), T − t) for the combined solution (dotted line).

In the final part of the example the strong time consistency property of the constructed solution is demonstrated: it is supposed that players agreed on using a proportional solution from the solution set, but at some point they decide to switch to the Shapley value. As it turns out, the resulting solution belongs to the solution set.

References

Eiben, A.E., Smith, J. E. (2003). Introduction to Evolutionary Computing. Berlin, Springer.

Haurie, A. (1976). A note on nonzero-sum differential games with bargaining solutions. Journal of Optimization Theory and Applications, 18, 31-39.

Subbotin, A.I. (1984). Generalization of the main equation of differential game theory. Journal of Optimization Theory and Applications, 43, 103-133.

Subbotin, A.I. (1995). Generalized Solutions of First Order PDEs. Basel, Birkhauser.

Cassandras, C. J., Lafortune, S. (2008). Introduction to discrete event systems. New York, Springer.

Gillies, D. B. (1959). Solutions to general non-zero-sum games. Contributions to the Theory of Games, 4, 47-85.

Yeung, D.W. K., Petrosyan, L. A. (2012). Subgame-consistent Economic Optimization. New York, Springer.

Gromova, E. V., Petrosian, O. L. (2016). Control of information horizon for cooperative differential game of pollution control. 2016 International Conference Stability and Oscillations of Nonlinear Control Systems (Pyatnitskiy's Conference), 1-4.

Shevkoplyas, E. V. (2009). Time-consistency Problem Under Condition of a Random Game Duration in Resource Extraction. Contributions to Game Theory and Management, 2, 461-473.

Shevkoplyas, E. V. (2010). Stable cooperation in differential games with random duration. Mat. Teor. Igr Pril., 2(3), 79-105.

Shevkoplyas, E. V. (2011). The Shapley value in cooperative differential games with random duration. Ann. Dyn. Games. New York, Springer.

Shevkoplyas, E. V. (2014). The Hamilton-Jacobi-Bellman Equation for a Class of Differential Games with Random Duration. Automation and Remote Control, 75, 959-970.

Goodwin, G.C., Serón, M. M., Dona, J. A. (2005). Constrained Control and Estimation: An Optimisation Approach. Springer, New York.

Kuchkarov, I. and O. Petrosian (2019). On Class of Linear Quadratic Non-cooperative Differential Games with Continuous Updating. Lecture Notes in Computer Science, 11548, 635-650.

Rawlings, J. B., Mayne, D .Q. (2009). Model Predictive Control: Theory and Design. Madison, Nob Hill Publishing.

Petrosjan, L. A. (1993). Strongly time-consistent differential optimality principles. Vestnik St. Petersburg Univ. Math., 26, 40-46.

Petrosjan, L. A., Shevkoplyas, E. V. (2003). Cooperative solutions for games with random duration. Game Theory Appl., 9, 125-139.

Petrosyan, L. A. (1977). Time-consistency of solutions in multi-player differential games. Vestnik Leningrad State University, 4, 46-52.

Petrosyan, L. A., Yeung, D.W. K. (2006). Dynamically stable solutions in randomly-furcating differential games. Trans. Steklov Inst. Math, 253, 208-220.

Petrosyan, L. A., Danilov, N.N. (1979). Stability of solutions in non-zero sum differential games with transferable payoffs. Vestnik Leningrad State University, 1, 52-59.

Shapley, L. S. (1953). A Value for n-person Games. Contributions to the Theory of Games, 2, 307-317.

Shapley, L. S. (1952). Notes on the N-Person Game III: Some Variants of the von-Neumann-Morgenstern Definition of Solution. Rand Corporation research memorandum RM-817, 1-12.

Wang, L. (2005). Model Predictive Control System, Design and Implementation Using MATLAB. Springer, New York.

Krasovskii, N.N., Kotel'nikova, A.N. (2010). On a differential interception game. Proceedings of the Steklov Institute of Mathematics, 268, 161-206.

Petrosian, O. L. (2016). Looking Forward Approach in Cooperative Differential Games. International Game Theory Review, 18(2), 1-14.

Petrosian, O. L. (2016). Looking Forward Approach in Cooperative Differential Games with infinite horizon. Vestnik S.-Petersburg Univ. Ser. 10. Prikl. Mat. Inform. Prots. Upr., 4, 18-30.

Petrosian, O. L., Barabanov, A. E. (2017). Looking Forward Approach in Cooperative Differential Games with Uncertain Stochastic Dynamics. Journal of Optimization Theory and Applications, 172(1), 328-347.

Petrosian, O., Tur, A. (2019). Hamilton-Jacobi-Bellman Equations for Non-cooperative Differential Games with Continuous Updating. Mathematical Optimization Theory and Operations Research, 178-191.

Petrosian, O. L., Gromova, E. V., Pogozhev, S. V. (2016). Strong Time-consistent Subset of Core in Cooperative Differential Games with Finite Time Horizon. Mat. Teor. Igr Pril., 8(4), 79-106.

Petrosian, O., Kuchkarov, I. (2019). About the Looking Forward Approach in Cooperative Differential Games with Transferable Utility. Frontiers of Dynamic Games: Game Theory and Management, St. Petersburg, 2018, 175-208.

Petrosian, O., Shi, L., Li, Y., Gao, H. (2019). Moving Information Horizon Approach for Dynamic Game Models. Mathematics, 7, 1239.

Petrosian, O. L., Nastych, M. A., Volf, D. A. (2017). Differential Game of Oil Market with Moving Informational Horizon and Non-Transferable Utility. Constructive Nonsmooth Analysis and Related Topics (dedicated to the memory of V.F. Demyanov) (CNSA), 2017, 1-4.

Chander, P., Tulkens, H. (1995). A core-theoretic solution for the design of cooperative agreements on transfrontier pollution. International Tax and Public Finance, 2(2), 279-293.

Bellman, R. (1957). Dynamic Programming. Princeton, Princeton University Press.

Jorgensen, S., Yeung, D.W. K. (1999). Inter- and intragenerational renewable resource extraction. Annals of Operations Research, 88, 275-289.

Jorgensen, S., Martin-Herran, G., Zaccour, G. (2003). Agreeability and Time Consistency in Linear-State Differential Games. Journal of Optimization Theory and Applications, 119, 49-63.

Kostyunin, S.Yu., Shevkoplyas, E.V. (2011). On simplification of integral payoff in differential games with random duration. Vestnik S.-Petersburg Univ. Ser. 10. Prikl. Mat. Inform. Prots. Upr. 4, 47-56.

Basar, T., Olsder, G.J. (1995). Dynamic Noncooperative Game Theory. London, Academic Press.

Kwon, W. H., Han, S.H. (2005). Receding Horizon Control: Model Predictive Control for State Models. Springer, New York.

Yeung, D.W. K., Petrosian, O.L. (2017). Infinite Horizon Dynamic Games: A New Approach via Information Updating. International Game Theory Review, 20(1), 1-23.
