CC BY

Cooperative Differential Games with Pairwise Interactions in Pollution Control Problems

He Yang

St.Petersburg State University,

7/9, Universitetskaya nab., St. Petersburg, 199034, Russia

E-mail: [email protected]

Abstract. This paper introduces a new class of dynamic games that combines two phenomena often observed in real life: differential network games and pairwise interactions. The vertices of the network are the players, and the edges are the connections between them. Under cooperation, a particular type of characteristic function is introduced. Two cooperative solutions are then constructed: the proportional solution and the Shapley value. Finally, the results are illustrated by an example.

Keywords: dynamic network game, pairwise interaction, characteristic function, Shapley value, programming.

1. Introduction

Differential games on networks have recently found wide application. The literature addresses questions such as the following. How should the motion equation of each player be defined? What is the payoff of each player? How do players react to changes in the behavior of their neighbors? If a player can cut the connection with a neighbor, will he do so? Are the cooperative solutions dynamically stable (time-consistent)? Petrosyan (2010) first proposed a differential game on a network in which the state variable of each player depends on his own control, and the payoff function of each player depends on his own actions and on those of his neighbors. Later, Petrosyan and Yeung (2020) proposed a new characteristic function that satisfies the convexity property. Tur and Petrosyan (2021) investigated several basic solutions, such as the core, the Shapley value (Shapley, 1953) and the τ-value (Tijs, 1987). Petrosyan and Pankratova (2023) constructed a new characteristic function combined with the partner set.

A serious issue, however, is time consistency: in real-life settings this property is often not satisfied (Yeung and Petrosyan, 2016; Petrosyan, 1993). In this paper we consider differential games with pairwise interactions, in which a player's payoff depends on his own strategy and on the strategies of his neighbors in the network.

The paper is organized as follows. Section 2 describes the model of differential games with pairwise interactions. Section 3 defines the characteristic function. Section 4 constructs the proportional solution. Section 5 proposes the dynamic Shapley value. Section 6 presents an example with numerical results. Section 7 concludes.

https://doi.org/10.21638/11701/spbu31.2023.18

2. Differential Games with Pairwise Interactions

Consider a class of n-person differential network games with pairwise interactions over the time horizon $[t_0, T]$. The players are connected in a network. Let $N = \{1, 2, \ldots, n\}$ denote the set of players; the nodes of the network represent the players.

A pair $(N, L)$ is called a network, where $N$ is a set of nodes and $L \subset N \times N$ is a given set of arcs. Note that $(i, i) \notin L$. Nodes represent the players. If $\text{arc}(i, j) \in L$, a link connects players $i \in N$ and $j \in N$. All connections are assumed to be undirected. We denote the set of players connected to player $i$ by $K(i) = \{j : \text{arc}(i, j) \in L\}$ for $i \in N$, $j \neq i$, and write $\tilde{K}(i) = K(i) \cup \{i\}$.
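As a small illustration of these definitions, the sketch below builds the neighbor sets for an undirected network. The helper name `neighbor_sets` and the three-player arc list (links 1-2 and 2-3, matching the links discussed in the example of Section 6) are assumptions made for illustration only.

```python
# Hypothetical helper: build neighbor sets K(i) from an undirected arc list.
def neighbor_sets(players, arcs):
    K = {i: set() for i in players}
    for i, j in arcs:
        if i == j:
            # The model excludes pairs (i, i) from L.
            raise ValueError("self-loops (i, i) are not allowed in L")
        K[i].add(j)  # undirected link: record both directions
        K[j].add(i)
    return K


players = [1, 2, 3]
arcs = [(1, 2), (2, 3)]  # assumed network: 1 - 2 - 3
K = neighbor_sets(players, arcs)
K_tilde = {i: K[i] | {i} for i in players}  # K(i) together with i itself

print(K)        # neighbors of each player
print(K_tilde)  # neighbors plus the player
```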

The state dynamics of the game are given by

$$\dot{x}^{ij}(t) = f^{ij}(x^{ij}(t), u^{ij}(t)), \quad x^{ij}(t_0) = x_0^{ij}, \tag{1}$$

for $t \in [t_0, T]$, $i \in N$ and $j \in K(i)$.

Here $x^{ij}(t) \in R^m$ is the state variable of player $i$ interacting with player $j \in K(i)$ at time $t$, and $u^{ij}(t) \in U^{ij} \subset R^l$ is the control variable of player $i$ interacting with player $j$. Every player $i$ plays a differential game with each player $j$ to whom he is linked in the network. The function $f^{ij}(x^{ij}(t), u^{ij}(t))$ is continuously differentiable in $x^{ij}(t)$ and $u^{ij}(t)$.

The payoff of player $i$ on the link (arc) $i \leftrightarrow j$ is defined by

$$K_i^j(x_0^{ij}, x_0^{ji}, u^{ij}, u^{ji}, T - t_0) = \int_{t_0}^{T} h_i^j(x^{ij}(\tau), x^{ji}(\tau), u^{ij}(\tau), u^{ji}(\tau))\, d\tau. \tag{2}$$

Player $i$ plays a separate differential game with each of his neighbors, so the game on the link $i \leftrightarrow j$ involves both the control of player $i$ and the control of the neighbor $j$. The payoff of player $i$ therefore depends not only on his own controls, collected in the strategy profile $u^i(t) = (u^{ij}(t), j \in K(i))$, and on his trajectories $x^i(t) = (x^{ij}(t), j \in K(i))$, but also on the controls of his neighbors, which come from the strategy profiles $u^j(t) = (u^{ji}(t), i \in K(j))$. Denote $u(t) = (u^1(t), \ldots, u^i(t), \ldots, u^n(t))$, where $u^i(t) = (u^{ij}(t), j \in K(i))$ is the control of player $i$ in the network. We write $x_0 = (x_0^1, \ldots, x_0^i, \ldots, x_0^n)$ for the vector of initial conditions, where $x_0^i = (x^{ij}(t_0), j \in K(i))$ collects the initial conditions of player $i$. The payoff function of player $i$ is given by

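$$H_i(x_0^i, x_0^j, u^i, u^j, j \in K(i), T - t_0) = \sum_{j \in K(i)} K_i^j(x_0^{ij}, x_0^{ji}, u^{ij}, u^{ji}, T - t_0) = \sum_{j \in K(i)} \int_{t_0}^{T} h_i^j(x^{ij}(\tau), x^{ji}(\tau), u^{ij}(\tau), u^{ji}(\tau))\, d\tau. \tag{3}$$

Here the term $h_i^j(x^{ij}(\tau), x^{ji}(\tau), u^{ij}(\tau), u^{ji}(\tau))$ is the instantaneous gain that player $i$ obtains through the network link with player $j \in K(i)$. We also assume that $h_i^j(x^{ij}(\tau), x^{ji}(\tau), u^{ij}(\tau), u^{ji}(\tau))$ is non-negative.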

3. Cooperative Differential Games with Pairwise Interactions

In this section we use the characteristic function first proposed by Petrosyan et al. (Petrosyan, Yeung and Pankratova, 2023). The game $\Gamma(x_0, T - t_0)$ is defined on the network $(N, L)$ with system dynamics (1), and the players' payoffs are determined by (3). Player $i \in N$, choosing the control variables $u^{ij}$ from his sets of feasible controls, seeks to maximize his objective functional (3). Suppose that the players can cooperate to achieve the maximum total payoff

$$\max_{u^1, \ldots, u^n} \sum_{i \in N} \sum_{j \in K(i)} \int_{t_0}^{T} h_i^j(x^{ij}(\tau), x^{ji}(\tau), u^{ij}(\tau), u^{ji}(\tau))\, d\tau = \sum_{i \in N} \sum_{j \in K(i)} \int_{t_0}^{T} h_i^j(\bar{x}^{ij}(\tau), \bar{x}^{ji}(\tau), \bar{u}^{ij}(\tau), \bar{u}^{ji}(\tau))\, d\tau \tag{4}$$

subject to dynamics (1).

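Definition 1. The characteristic function $V(S; x_0, T - t_0)$ is defined as

$$V(S; x_0, T - t_0) = \sum_{i \in S} \sum_{j \in K(i) \cap S} \int_{t_0}^{T} h_i^j(\bar{x}^{ij}(\tau), \bar{x}^{ji}(\tau), \bar{u}^{ij}(\tau), \bar{u}^{ji}(\tau))\, d\tau + \alpha(S) \sum_{i \in S} \sum_{j \in K(i) \cap (N \setminus S)} \int_{t_0}^{T} h_i^j(\bar{x}^{ij}(\tau), \bar{x}^{ji}(\tau), \bar{u}^{ij}(\tau), \bar{u}^{ji}(\tau))\, d\tau, \quad S \subseteq N, \tag{5}$$

where $\bar{u}$ and $\bar{x}$ denote the optimal cooperative controls and the corresponding trajectories from (4).

Here $\alpha(S) \in [0, 1)$: when player $i$ is inside the coalition $S$ and his neighbor $j$ is outside $S$, player $i$ loses part of his payoff, namely

$$[1 - \alpha(S)] \sum_{i \in S} \sum_{j \in K(i) \cap (N \setminus S)} \int_{t_0}^{T} h_i^j(\bar{x}^{ij}(\tau), \bar{x}^{ji}(\tau), \bar{u}^{ij}(\tau), \bar{u}^{ji}(\tau))\, d\tau.$$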
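The structure of this characteristic function can be sketched in a few lines of Python, assuming the link payoffs (the integrals of the instantaneous gains along the cooperative trajectory) have already been computed. The function name `characteristic_value` and all numbers below are hypothetical; they are not taken from the paper.

```python
# Sketch of a characteristic function of this type: links inside S count fully,
# links from S to outside players are discounted by alpha(S).
def characteristic_value(S, K, link_payoff, alpha):
    S = frozenset(S)
    if not S:
        return 0.0  # value of the empty coalition
    inside = sum(link_payoff[(i, j)] for i in S for j in K[i] if j in S)
    outside = sum(link_payoff[(i, j)] for i in S for j in K[i] if j not in S)
    return inside + alpha.get(S, 0.0) * outside


# Hypothetical data: path network 1 - 2 - 3 and made-up link payoffs.
K = {1: {2}, 2: {1, 3}, 3: {2}}
link_payoff = {(1, 2): 10.0, (2, 1): 20.0, (2, 3): 30.0, (3, 2): 40.0}
alpha = {frozenset(s): 0.2 for s in ({1}, {2}, {3})}
alpha.update({frozenset(s): 0.8 for s in ({1, 2}, {1, 3}, {2, 3})})

v1 = characteristic_value({1}, K, link_payoff, alpha)
v12 = characteristic_value({1, 2}, K, link_payoff, alpha)
v13 = characteristic_value({1, 3}, K, link_payoff, alpha)
vN = characteristic_value({1, 2, 3}, K, link_payoff, alpha)
print(v1, v12, v13, vN)
```

For the grand coalition no link leaves the coalition, so the value of `alpha` for $N$ plays no role; the sketch simply defaults it to zero.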

From (5), for the coalitions $\{i\}$ and $\emptyset$ we get

$$V(\{i\}; x_0, T - t_0) = \alpha(\{i\}) \sum_{j \in K(i),\, j \neq i} \int_{t_0}^{T} h_i^j(\bar{x}^{ij}(\tau), \bar{x}^{ji}(\tau), \bar{u}^{ij}(\tau), \bar{u}^{ji}(\tau))\, d\tau, \tag{6}$$

$$V(\emptyset; x_0, T - t_0) = 0, \tag{7}$$

subject to dynamics (1).

An interesting feature is that $\alpha(S)$ may differ from one coalition to another; in general, $\alpha(S) \neq \alpha(S \setminus \{i\})$.

4. Proportional Solution (β-value)

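Using the characteristic function defined above, we introduce the proportional solution (β-value) as

$$\beta_i(x_0, T - t_0) = \frac{V(\{i\}; x_0, T - t_0)}{\sum_{k \in N} V(\{k\}; x_0, T - t_0)}\, V(N; x_0, T - t_0) \tag{8}$$

for $i \in N$. Here the ratio $V(\{i\}; x_0, T - t_0) / \sum_{k \in N} V(\{k\}; x_0, T - t_0)$ is the share of player $i$'s individual value in the sum of the individual values of all players.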
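The proportional rule is simple enough to check by hand. The sketch below applies it to the singleton values and the grand-coalition value reported in the numerical example of Section 6; the function name `proportional_solution` is an illustrative assumption.

```python
# Proportional solution: split V(N) in proportion to the singleton values V({i}).
def proportional_solution(v_single, v_grand):
    total = sum(v_single.values())
    return {i: v / total * v_grand for i, v in v_single.items()}


# Values as reported in the numerical example of Section 6.
v_single = {1: 2.114062e4, 2: 7.920335e4, 3: 4.115977e4}
v_grand = 7.0751872e5

beta = proportional_solution(v_single, v_grand)
for player, share in beta.items():
    print(player, round(share, 2))
```

By construction the shares add up to $V(N; x_0, T - t_0)$, so the proportional solution is efficient whenever the singleton values are not all zero.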

5. Dynamic Shapley Value

Introduce the Shapley value as

$$Sh_i(x_0, T - t_0) = \sum_{\substack{S \subseteq N \\ S \ni i}} \frac{(|S| - 1)!\,(n - |S|)!}{n!} \bigl[ V(S; x_0, T - t_0) - V(S \setminus \{i\}; x_0, T - t_0) \bigr] \tag{9}$$

for $i \in N$.
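The Shapley formula can be evaluated directly by enumerating coalitions. The following sketch does this for the three-player characteristic function values reported in Section 6; the function name `shapley_value` and the dictionary layout are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


# Direct evaluation of the Shapley formula by enumerating coalitions S containing i.
def shapley_value(players, v):
    n = len(players)
    result = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(len(others) + 1):
            for rest in combinations(others, r):
                without_i = frozenset(rest)
                with_i = without_i | {i}
                s = len(with_i)
                weight = factorial(s - 1) * factorial(n - s) / factorial(n)
                total += weight * (v[with_i] - v[without_i])
        result[i] = total
    return result


# Characteristic function values as reported in Section 6.
v = {
    frozenset(): 0.0,
    frozenset({1}): 2.114062e4,
    frozenset({2}): 7.920335e4,
    frozenset({3}): 4.115977e4,
    frozenset({1, 2}): 4.5528798e5,
    frozenset({1, 3}): 2.4920158e5,
    frozenset({2, 3}): 5.6904414e5,
    frozenset({1, 2, 3}): 7.0751872e5,
}

sh = shapley_value([1, 2, 3], v)
for player, value in sh.items():
    print(player, round(value, 2))
```

The enumeration is exponential in the number of players, which is harmless for three players but would need a different approach for large networks.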

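From (9), we obtain

$$
\begin{aligned}
Sh_i(x_0, T - t_0) = \sum_{\substack{S \subseteq N \\ S \ni i}} \frac{(|S| - 1)!\,(n - |S|)!}{n!} \Biggl[ & \sum_{l \in S} \sum_{j \in K(l) \cap S} \int_{t_0}^{T} h_l^j(\bar{x}^{lj}(\tau), \bar{x}^{jl}(\tau), \bar{u}^{lj}(\tau), \bar{u}^{jl}(\tau))\, d\tau \\
& + \alpha(S) \sum_{l \in S} \sum_{j \in K(l) \cap (N \setminus S)} \int_{t_0}^{T} h_l^j(\bar{x}^{lj}(\tau), \bar{x}^{jl}(\tau), \bar{u}^{lj}(\tau), \bar{u}^{jl}(\tau))\, d\tau \\
& - \sum_{l \in S \setminus \{i\}} \sum_{j \in K(l) \cap (S \setminus \{i\})} \int_{t_0}^{T} h_l^j(\bar{x}^{lj}(\tau), \bar{x}^{jl}(\tau), \bar{u}^{lj}(\tau), \bar{u}^{jl}(\tau))\, d\tau \\
& - \alpha(S \setminus \{i\}) \sum_{l \in S \setminus \{i\}} \sum_{j \in K(l) \cap (N \setminus (S \setminus \{i\}))} \int_{t_0}^{T} h_l^j(\bar{x}^{lj}(\tau), \bar{x}^{jl}(\tau), \bar{u}^{lj}(\tau), \bar{u}^{jl}(\tau))\, d\tau \Biggr].
\end{aligned} \tag{10}
$$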

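Applying the Shapley value imputation in (10) to any time instant $t \in [t_0, T]$, we obtain

$$
\begin{aligned}
Sh_i(\bar{x}(t), T - t) = \sum_{\substack{S \subseteq N \\ S \ni i}} \frac{(|S| - 1)!\,(n - |S|)!}{n!} \Biggl[ & \sum_{l \in S} \sum_{j \in K(l) \cap S} \int_{t}^{T} h_l^j(\bar{x}^{lj}(\tau), \bar{x}^{jl}(\tau), \bar{u}^{lj}(\tau), \bar{u}^{jl}(\tau))\, d\tau \\
& + \alpha(S) \sum_{l \in S} \sum_{j \in K(l) \cap (N \setminus S)} \int_{t}^{T} h_l^j(\bar{x}^{lj}(\tau), \bar{x}^{jl}(\tau), \bar{u}^{lj}(\tau), \bar{u}^{jl}(\tau))\, d\tau \\
& - \sum_{l \in S \setminus \{i\}} \sum_{j \in K(l) \cap (S \setminus \{i\})} \int_{t}^{T} h_l^j(\bar{x}^{lj}(\tau), \bar{x}^{jl}(\tau), \bar{u}^{lj}(\tau), \bar{u}^{jl}(\tau))\, d\tau \\
& - \alpha(S \setminus \{i\}) \sum_{l \in S \setminus \{i\}} \sum_{j \in K(l) \cap (N \setminus (S \setminus \{i\}))} \int_{t}^{T} h_l^j(\bar{x}^{lj}(\tau), \bar{x}^{jl}(\tau), \bar{u}^{lj}(\tau), \bar{u}^{jl}(\tau))\, d\tau \Biggr].
\end{aligned} \tag{11}
$$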

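Proposition 1. The Shapley value imputation in (10)-(11) satisfies the time consistency property.

Proof. By direct computation, splitting each integral over $[t_0, T]$ in (10) into the parts over $[t_0, t]$ and $[t, T]$, we have

$$
\begin{aligned}
Sh_i(x_0, T - t_0) = {} & \sum_{\substack{S \subseteq N \\ S \ni i}} \frac{(|S| - 1)!\,(n - |S|)!}{n!} \Biggl[ \sum_{l \in S} \sum_{j \in K(l) \cap S} \int_{t_0}^{t} h_l^j(\bar{x}^{lj}(\tau), \bar{x}^{jl}(\tau), \bar{u}^{lj}(\tau), \bar{u}^{jl}(\tau))\, d\tau \\
& + \alpha(S) \sum_{l \in S} \sum_{j \in K(l) \cap (N \setminus S)} \int_{t_0}^{t} h_l^j(\bar{x}^{lj}(\tau), \bar{x}^{jl}(\tau), \bar{u}^{lj}(\tau), \bar{u}^{jl}(\tau))\, d\tau \\
& - \sum_{l \in S \setminus \{i\}} \sum_{j \in K(l) \cap (S \setminus \{i\})} \int_{t_0}^{t} h_l^j(\bar{x}^{lj}(\tau), \bar{x}^{jl}(\tau), \bar{u}^{lj}(\tau), \bar{u}^{jl}(\tau))\, d\tau \\
& - \alpha(S \setminus \{i\}) \sum_{l \in S \setminus \{i\}} \sum_{j \in K(l) \cap (N \setminus (S \setminus \{i\}))} \int_{t_0}^{t} h_l^j(\bar{x}^{lj}(\tau), \bar{x}^{jl}(\tau), \bar{u}^{lj}(\tau), \bar{u}^{jl}(\tau))\, d\tau \Biggr] \\
& + Sh_i(\bar{x}(t), T - t),
\end{aligned} \tag{12}
$$

for $i \in N$, which exhibits the time consistency property of the Shapley value imputation $Sh_i(\bar{x}(t), T - t)$ for $t \in [t_0, T]$.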

6. Differential Games with Pairwise Interactions in Pollution Problems

Consider the following game-theoretic model. The network structure is shown in Figure 1. Three players represent three national or regional factories that participate in the game on the network, $N = \{1, 2, 3\}$.

Consider first the link $1 \leftrightarrow 2$ (a similar game is considered by Breton et al. (Breton, Zaccour and Zahaf, 2005)). Region 1 and Region 2 play a pollution game. Each region has an industrial production site. Production is assumed to be proportional to the emissions $u^{12}$, so the strategy of each player is to choose the amount of pollutants emitted into the atmosphere, $u^{12} \in [0, b_{12}]$, $b_{12} > 0$. $A_{12}$ is the subsidy that the government pays to factory 1 at each moment, and $d_{12}(x^{12}(t) + x^{21}(t))$ is the penalty that the environmental department imposes on factory 1 at each moment.

Let $x^{12}$ be the accumulated stock of pollution in region 1. The dynamics of players 1 and 2 on the link $1 \leftrightarrow 2$ are described by

$$\dot{x}^{12}(t) = u^{12}(t), \quad x^{12}(t_0) = x_0^{12}, \quad t \in [t_0, T], \tag{13}$$

$$\dot{x}^{21}(t) = u^{21}(t), \quad x^{21}(t_0) = x_0^{21}, \quad t \in [t_0, T]. \tag{14}$$

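The payoff of each player in the pairwise interaction game on the link $1 \leftrightarrow 2$ is defined as

$$K_1^{12}(x_0^{12}, x_0^{21}, u^{12}(t), u^{21}(t), T - t_0) = \int_{t_0}^{T} \Bigl[ \bigl(b_{12} - \tfrac{1}{2} u^{12}(t)\bigr) u^{12}(t) - d_{12}\bigl(x^{12}(t) + x^{21}(t)\bigr) + A_{12} \Bigr]\, dt,$$

$$K_2^{21}(x_0^{21}, x_0^{12}, u^{12}(t), u^{21}(t), T - t_0) = \int_{t_0}^{T} \Bigl[ \bigl(b_{21} - \tfrac{1}{2} u^{21}(t)\bigr) u^{21}(t) - d_{21}\bigl(x^{12}(t) + x^{21}(t)\bigr) + A_{21} \Bigr]\, dt.$$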

For the link $2 \leftrightarrow 3$ (a similar game is considered by Gromova et al. (Gromova, Tur and Barsuk, 2022)), we consider another pollution game. The emissions of players 2 and 3 are denoted by $u^{23}$ and $u^{32}$, where $u^{23} \in [0, b_{23}]$, $b_{23} > 0$, and $u^{32} \in [0, b_{32}]$, $b_{32} > 0$. Let $x^{23}(t)$ and $x^{32}(t)$ denote the stocks of pollution accumulated by time $t$. The dynamics of players 2 and 3 on the link $2 \leftrightarrow 3$ are described by

$$\dot{x}^{23}(t) = u^{23}(t) - \delta x^{23}(t), \quad x^{23}(t_0) = x_0^{23}, \quad t \in [t_0, T], \tag{15}$$

$$\dot{x}^{32}(t) = u^{32}(t) - \delta x^{32}(t), \quad x^{32}(t_0) = x_0^{32}, \quad t \in [t_0, T], \tag{16}$$

where $\delta > 0$ is the absorption coefficient corresponding to the natural purification of the atmosphere. We do not consider any additional cost here.

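The payoff of each player in the pairwise interaction game on the link $2 \leftrightarrow 3$ is defined as

$$K_2^{23}(x_0^{23}, x_0^{32}, u^{23}(t), u^{32}(t), T - t_0) = \int_{t_0}^{T} \Bigl( \bigl(b_{23} - \tfrac{1}{2} u^{23}(t)\bigr) u^{23}(t) - d_{23}\bigl(x^{23}(t) + x^{32}(t)\bigr) + A_{23} \Bigr)\, dt,$$

$$K_3^{32}(x_0^{32}, x_0^{23}, u^{23}(t), u^{32}(t), T - t_0) = \int_{t_0}^{T} \Bigl( \bigl(b_{32} - \tfrac{1}{2} u^{32}(t)\bigr) u^{32}(t) - d_{32}\bigl(x^{32}(t) + x^{23}(t)\bigr) + A_{32} \Bigr)\, dt. \tag{17}$$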

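In the network game with multiple links, the payoff of each player is defined as

$$H_1(x_0^{12}, x_0^{21}, u^{12}(t), u^{21}(t), T - t_0) = \int_{t_0}^{T} \Bigl[ \bigl(b_{12} - \tfrac{1}{2} u^{12}(t)\bigr) u^{12}(t) - d_{12}\bigl(x^{12}(t) + x^{21}(t)\bigr) + A_{12} \Bigr]\, dt, \tag{18}$$

$$
\begin{aligned}
H_2(x_0^{12}, x_0^{21}, x_0^{23}, x_0^{32}, u^{12}(t), u^{21}(t), u^{23}(t), u^{32}(t), T - t_0) = {} & \int_{t_0}^{T} \Bigl[ \bigl(b_{21} - \tfrac{1}{2} u^{21}(t)\bigr) u^{21}(t) - d_{21}\bigl(x^{21}(t) + x^{12}(t)\bigr) + A_{21} \Bigr]\, dt \\
& + \int_{t_0}^{T} \Bigl[ \bigl(b_{23} - \tfrac{1}{2} u^{23}(t)\bigr) u^{23}(t) - d_{23}\bigl(x^{23}(t) + x^{32}(t)\bigr) + A_{23} \Bigr]\, dt,
\end{aligned} \tag{19}
$$

$$H_3(x_0^{32}, x_0^{23}, u^{23}(t), u^{32}(t), T - t_0) = \int_{t_0}^{T} \Bigl[ \bigl(b_{32} - \tfrac{1}{2} u^{32}(t)\bigr) u^{32}(t) - d_{32}\bigl(x^{32}(t) + x^{23}(t)\bigr) + A_{32} \Bigr]\, dt, \tag{20}$$

subject to dynamics (13)-(16).

Under cooperation, the players maximize the total payoff

$$
\begin{aligned}
V(N; x_0, T - t_0) = \max_{u^{12}, u^{21}, u^{32}, u^{23}} \int_{t_0}^{T} \Bigl[ & \bigl(b_{12} - \tfrac{1}{2} u^{12}(t)\bigr) u^{12}(t) - d_{12}\bigl(x^{12}(t) + x^{21}(t)\bigr) + A_{12} \\
& + \bigl(b_{21} - \tfrac{1}{2} u^{21}(t)\bigr) u^{21}(t) - d_{21}\bigl(x^{21}(t) + x^{12}(t)\bigr) + A_{21} \\
& + \bigl(b_{23} - \tfrac{1}{2} u^{23}(t)\bigr) u^{23}(t) - d_{23}\bigl(x^{23}(t) + x^{32}(t)\bigr) + A_{23} \\
& + \bigl(b_{32} - \tfrac{1}{2} u^{32}(t)\bigr) u^{32}(t) - d_{32}\bigl(x^{32}(t) + x^{23}(t)\bigr) + A_{32} \Bigr]\, dt
\end{aligned} \tag{21}
$$

subject to dynamics (13)-(16).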

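To solve the optimization problem we use the Pontryagin Maximum Principle (PMP). First, write down the Hamiltonian function:

$$
\begin{aligned}
H(x_0, T - t_0, u(t), \psi) = {} & u^{12}\bigl(b_{12} - \tfrac{1}{2} u^{12}\bigr) - d_{12}(x^{12} + x^{21}) + A_{12} + u^{21}\bigl(b_{21} - \tfrac{1}{2} u^{21}\bigr) - d_{21}(x^{21} + x^{12}) + A_{21} \\
& + \bigl(b_{23} - \tfrac{1}{2} u^{23}\bigr) u^{23} - d_{23}(x^{23} + x^{32}) + A_{23} + \bigl(b_{32} - \tfrac{1}{2} u^{32}\bigr) u^{32} - d_{32}(x^{32} + x^{23}) + A_{32} \\
& + \psi_{12} u^{12} + \psi_{21} u^{21} + \psi_{23}(u^{23} - \delta x^{23}) + \psi_{32}(u^{32} - \delta x^{32}).
\end{aligned}
$$

The adjoint variables satisfy the boundary conditions

$$\psi_{ij}(T) = 0, \quad i \in N, \; j \in K(i). \tag{22}$$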

Taking the first derivative with respect to $u^{12}$, we get the expression for the optimal control:

$$u^{12}(t) = b_{12} + \psi_{12}(t).$$

The canonical system is written as

$$
\begin{cases}
\dot{x}^{12} = u^{12} = b_{12} + \psi_{12}(t), \\
\dot{\psi}_{12} = d_{12} + d_{21} = d, \\
\dot{\psi}_{21} = d_{21} + d_{12} = d,
\end{cases} \tag{23}
$$

where $b = b_{12} + b_{21}$ and $d = d_{12} + d_{21}$.

Using the initial condition $x^{12}(t_0) = x_0^{12}$ and the boundary condition obtained from (22), we get

$$\psi_{12}(t) = -d(T - t), \quad \psi_{21}(t) = -d(T - t).$$

Substituting this solution into the differential equation (23), we obtain the expression for $x^{12}(t)$:

$$x^{12}(t) = \frac{d}{2} t^2 - \frac{d}{2} t_0^2 + (b_{12} - dT)\, t + (-b_{12} + Td)\, t_0 + x_0^{12}. \tag{24}$$

The optimal control is

$$u^{12}(t) = b_{12} - d\,(T - t).$$

Similarly, we get the optimal trajectory

$$x^{21}(t) = \frac{d}{2} t^2 - \frac{d}{2} t_0^2 + (b_{21} - dT)\, t + (-b_{21} + Td)\, t_0 + x_0^{21}. \tag{25}$$
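As a numerical cross-check of the link $1 \leftrightarrow 2$ solution, the sketch below integrates player 1's instantaneous payoff along the closed-form optimal control and trajectories, using the parameter values of the numerical example in this section. The helper names (`simpson`, `control`, `stock`, `gain_12`) are assumptions for illustration. Since the integrand is a quadratic polynomial in $t$, Simpson's rule is exact here up to rounding.

```python
# Composite Simpson rule (n must be even).
def simpson(f, a, b, n=200):
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


# Parameters from the numerical example of this section.
b12, b21 = 200.0, 250.0
d12, d21 = 1.0, 1.5
A12 = 2360.0
t0, T = 0.0, 5.0
x12_0, x21_0 = 50.0, 60.0
alpha_1 = 0.2
d = d12 + d21


def control(b, t):
    # Optimal control on link 1-2: u(t) = b - d (T - t).
    return b - d * (T - t)


def stock(b, x0, t):
    # Closed-form trajectory obtained by integrating the control from t0.
    return 0.5 * d * t**2 - 0.5 * d * t0**2 + (b - d * T) * t + (-b + T * d) * t0 + x0


def gain_12(t):
    # Instantaneous payoff of player 1 on link 1-2.
    u12 = control(b12, t)
    total_stock = stock(b12, x12_0, t) + stock(b21, x21_0, t)
    return (b12 - 0.5 * u12) * u12 - d12 * total_stock + A12


link_12 = simpson(gain_12, t0, T)
v1 = alpha_1 * link_12
print(link_12, v1)
```

Under these assumptions the discounted value `v1` agrees with the reported $V(\{1\}; x_0, T - t_0) = 2.114062 \cdot 10^4$ up to rounding.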

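On the link $2 \leftrightarrow 3$, with $b = b_{23} + b_{32}$ and $d = d_{23} + d_{32}$, the optimal trajectories are

$$x^{23}(t) = C_{23} e^{-\delta t} + \frac{b_{23}}{\delta} + \frac{e^{-\delta(T - t)}\, d}{2\delta^2} - \frac{d}{\delta^2}, \tag{26}$$

where $C_{23} = e^{\delta t_0}\Bigl(x_0^{23} - \dfrac{b_{23}}{\delta} - \dfrac{e^{-\delta(T - t_0)}\, d}{2\delta^2} + \dfrac{d}{\delta^2}\Bigr)$, and

$$x^{32}(t) = C_{32} e^{-\delta t} + \frac{b_{32}}{\delta} + \frac{e^{-\delta(T - t)}\, d}{2\delta^2} - \frac{d}{\delta^2}, \tag{27}$$

where $C_{32} = e^{\delta t_0}\Bigl(x_0^{32} - \dfrac{b_{32}}{\delta} - \dfrac{e^{-\delta(T - t_0)}\, d}{2\delta^2} + \dfrac{d}{\delta^2}\Bigr)$.

The corresponding optimal controls are

$$u^{21}(t) = b_{21} - d\,(T - t), \quad d = d_{12} + d_{21}, \tag{28}$$

$$u^{23}(t) = b_{23} + \frac{e^{-\delta(T - t)} - 1}{\delta}\, d, \quad d = d_{23} + d_{32}, \tag{29}$$

$$u^{32}(t) = b_{32} + \frac{e^{-\delta(T - t)} - 1}{\delta}\, d, \quad d = d_{23} + d_{32}. \tag{30}$$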
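A quick way to sanity-check a closed-form trajectory on the link $2 \leftrightarrow 3$ is to verify numerically that it satisfies the state equation (15) and the initial condition. The sketch below does this for $x^{23}$, assuming the closed form $x^{23}(t) = C_{23} e^{-\delta t} + b_{23}/\delta + e^{-\delta(T-t)} d / (2\delta^2) - d/\delta^2$ and the control $u^{23}(t) = b_{23} + (e^{-\delta(T-t)} - 1)\, d/\delta$ with $d = d_{23} + d_{32}$. The function names and the finite-difference step are illustrative assumptions; parameter values are those of the numerical example.

```python
import math

# Parameters from the numerical example of this section.
delta = 0.3
b23 = 300.0
d23, d32 = 1.5, 2.0
t0, T = 0.0, 5.0
x23_0 = 60.0
d = d23 + d32


def u23(t):
    # Assumed optimal control on link 2-3.
    return b23 + (math.exp(-delta * (T - t)) - 1.0) / delta * d


# Integration constant fixed by the initial condition x23(t0) = x23_0.
C23 = math.exp(delta * t0) * (
    x23_0 - b23 / delta - math.exp(-delta * (T - t0)) * d / (2 * delta**2) + d / delta**2
)


def x23(t):
    # Assumed closed-form trajectory.
    return (
        C23 * math.exp(-delta * t)
        + b23 / delta
        + math.exp(-delta * (T - t)) * d / (2 * delta**2)
        - d / delta**2
    )


def residual(t, h=1e-5):
    # Central-difference derivative minus the right-hand side u23 - delta * x23.
    derivative = (x23(t + h) - x23(t - h)) / (2 * h)
    return derivative - (u23(t) - delta * x23(t))


worst = max(abs(residual(t0 + k * (T - t0) / 10)) for k in range(1, 10))
print(x23(t0), worst)
```

A residual close to zero at interior points indicates that the trajectory and the control are mutually consistent with the dynamics; it does not by itself prove optimality.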

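Along the optimal cooperative controls and trajectories, the characteristic function (5) takes the values

$$V(\{1\}; x_0, T - t_0) = \alpha(\{1\}) \int_{t_0}^{T} \Bigl[ \bigl(b_{12} - \tfrac{1}{2} u^{12}(t)\bigr) u^{12}(t) - d_{12}\bigl(x^{12}(t) + x^{21}(t)\bigr) + A_{12} \Bigr]\, dt,$$

$$
\begin{aligned}
V(\{2\}; x_0, T - t_0) = \alpha(\{2\}) \Bigl( & \int_{t_0}^{T} \Bigl[ \bigl(b_{21} - \tfrac{1}{2} u^{21}(t)\bigr) u^{21}(t) - d_{21}\bigl(x^{21}(t) + x^{12}(t)\bigr) + A_{21} \Bigr]\, dt \\
& + \int_{t_0}^{T} \Bigl[ \bigl(b_{23} - \tfrac{1}{2} u^{23}(t)\bigr) u^{23}(t) - d_{23}\bigl(x^{23}(t) + x^{32}(t)\bigr) + A_{23} \Bigr]\, dt \Bigr),
\end{aligned}
$$

$$V(\{3\}; x_0, T - t_0) = \alpha(\{3\}) \int_{t_0}^{T} \Bigl[ \bigl(b_{32} - \tfrac{1}{2} u^{32}(t)\bigr) u^{32}(t) - d_{32}\bigl(x^{32}(t) + x^{23}(t)\bigr) + A_{32} \Bigr]\, dt,$$

$$
\begin{aligned}
V(\{1, 2\}; x_0, T - t_0) = {} & \int_{t_0}^{T} \Bigl[ \bigl(b_{12} - \tfrac{1}{2} u^{12}(t)\bigr) u^{12}(t) - d_{12}\bigl(x^{12}(t) + x^{21}(t)\bigr) + A_{12} \Bigr]\, dt \\
& + \int_{t_0}^{T} \Bigl[ \bigl(b_{21} - \tfrac{1}{2} u^{21}(t)\bigr) u^{21}(t) - d_{21}\bigl(x^{21}(t) + x^{12}(t)\bigr) + A_{21} \Bigr]\, dt \\
& + \alpha(\{1, 2\}) \int_{t_0}^{T} \Bigl[ \bigl(b_{23} - \tfrac{1}{2} u^{23}(t)\bigr) u^{23}(t) - d_{23}\bigl(x^{23}(t) + x^{32}(t)\bigr) + A_{23} \Bigr]\, dt,
\end{aligned}
$$

$$
\begin{aligned}
V(\{1, 3\}; x_0, T - t_0) = \alpha(\{1, 3\}) \int_{t_0}^{T} \Bigl[ & \bigl(b_{12} - \tfrac{1}{2} u^{12}(t)\bigr) u^{12}(t) - d_{12}\bigl(x^{12}(t) + x^{21}(t)\bigr) + A_{12} \\
& + \bigl(b_{32} - \tfrac{1}{2} u^{32}(t)\bigr) u^{32}(t) - d_{32}\bigl(x^{32}(t) + x^{23}(t)\bigr) + A_{32} \Bigr]\, dt,
\end{aligned}
$$

$$
\begin{aligned}
V(\{2, 3\}; x_0, T - t_0) = {} & \int_{t_0}^{T} \Bigl[ \bigl(b_{23} - \tfrac{1}{2} u^{23}(t)\bigr) u^{23}(t) - d_{23}\bigl(x^{23}(t) + x^{32}(t)\bigr) + A_{23} \Bigr]\, dt \\
& + \int_{t_0}^{T} \Bigl[ \bigl(b_{32} - \tfrac{1}{2} u^{32}(t)\bigr) u^{32}(t) - d_{32}\bigl(x^{32}(t) + x^{23}(t)\bigr) + A_{32} \Bigr]\, dt \\
& + \alpha(\{2, 3\}) \int_{t_0}^{T} \Bigl[ \bigl(b_{21} - \tfrac{1}{2} u^{21}(t)\bigr) u^{21}(t) - d_{21}\bigl(x^{21}(t) + x^{12}(t)\bigr) + A_{21} \Bigr]\, dt.
\end{aligned}
$$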

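Remark 1. The instantaneous payoff in the game is $\bigl(b_{ij} - \tfrac{1}{2} u^{ij}(t)\bigr) u^{ij}(t) - d_{ij}\bigl(x^{ij}(t) + x^{ji}(t)\bigr) + A_{ij}$. Since $\bigl(b_{ij} - \tfrac{1}{2} u^{ij}(t)\bigr) u^{ij}(t) \geq 0$ for $u^{ij} \in [0, b_{ij}]$, if $A_{ij} > \max_{t \in [t_0, T]} d_{ij}\bigl(x^{ij}(t) + x^{ji}(t)\bigr)$, then the instantaneous payoffs of every player are non-negative at any time $t$.

Additional conditions (with $b = b_{12} + b_{21}$ in the first two lines and $b = b_{23} + b_{32}$ in the last two):

$$A_{12} > \max \bigl[ d_{12}(x^{12}(t) + x^{21}(t)) \bigr] = d_{12}\bigl[ b\,(T - t_0) + x_0^{12} + x_0^{21} \bigr],$$

$$A_{21} > \max \bigl[ d_{21}(x^{21}(t) + x^{12}(t)) \bigr] = d_{21}\bigl[ b\,(T - t_0) + x_0^{21} + x_0^{12} \bigr],$$

$$A_{23} > \max \bigl[ d_{23}(x^{23}(t) + x^{32}(t)) \bigr] = d_{23}\Bigl[ (x_0^{23} + x_0^{32})\, e^{-\delta(T - t_0)} + \frac{b}{\delta}\bigl(1 - e^{-\delta(T - t_0)}\bigr) \Bigr],$$

$$A_{32} > \max \bigl[ d_{32}(x^{32}(t) + x^{23}(t)) \bigr] = d_{32}\Bigl[ (x_0^{32} + x_0^{23})\, e^{-\delta(T - t_0)} + \frac{b}{\delta}\bigl(1 - e^{-\delta(T - t_0)}\bigr) \Bigr].$$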
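The right-hand sides of these conditions are easy to evaluate for the parameter values used in the numerical example of this section. The sketch below computes them; the variable names are illustrative assumptions, and the bounds are taken exactly in the form stated above (maximum emissions $u^{ij} = b_{ij}$ over the whole horizon).

```python
import math

# Parameters from the numerical example of this section.
b12, b21, b23, b32 = 200.0, 250.0, 300.0, 280.0
d12, d21, d23, d32 = 1.0, 1.5, 1.5, 2.0
delta = 0.3
t0, T = 0.0, 5.0
x12_0, x21_0, x23_0, x32_0 = 50.0, 60.0, 60.0, 80.0

# Link 1-2: no decay, so the joint stock is at most x0 + (b12 + b21)(T - t0).
stock_12 = (b12 + b21) * (T - t0) + x12_0 + x21_0

# Link 2-3: with decay rate delta the joint stock obeys s' <= b - delta * s.
decay = math.exp(-delta * (T - t0))
stock_23 = (x23_0 + x32_0) * decay + (b23 + b32) / delta * (1.0 - decay)

bounds = {
    "A12": d12 * stock_12,
    "A21": d21 * stock_12,
    "A23": d23 * stock_23,
    "A32": d32 * stock_23,
}
for name, value in bounds.items():
    print(name, round(value, 2))
```

With these inputs the bounds for $A_{12}$ and $A_{21}$ evaluate to 2360 and 3540, and the bounds for $A_{23}$ and $A_{32}$ fall just below 2300 and 3066.5 respectively.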

We now compute the characteristic function, the Shapley value and the proportional solution. Assume the following parameter values: $b_{12} = 200$, $b_{21} = 250$, $b_{23} = 300$, $b_{32} = 280$, $d_{12} = 1$, $d_{21} = 1.5$, $d_{23} = 1.5$, $d_{32} = 2$, $\delta = 0.3$, $\alpha(\{1\}) = \alpha(\{2\}) = \alpha(\{3\}) = 0.2$, $\alpha(\{1, 2\}) = \alpha(\{1, 3\}) = \alpha(\{2, 3\}) = 0.8$, $t_0 = 0$, $T = 5$, $x_0^{12} = 50$, $x_0^{21} = 60$, $x_0^{23} = 60$, $x_0^{32} = 80$, $A_{12} = 2360$, $A_{21} = 3540$, $A_{23} = 2300$, $A_{32} = 3066.5$. Then $V(N; x_0, T - t_0) = 7.0751872 \cdot 10^5$ and

$V(\{1\}; x_0, T - t_0) = 2.114062 \cdot 10^4$, $V(\{2\}; x_0, T - t_0) = 7.920335 \cdot 10^4$,

$V(\{3\}; x_0, T - t_0) = 4.115977 \cdot 10^4$, $V(\{1, 2\}; x_0, T - t_0) = 4.5528798 \cdot 10^5$,

$V(\{1, 3\}; x_0, T - t_0) = 2.4920158 \cdot 10^5$, $V(\{2, 3\}; x_0, T - t_0) = 5.6904414 \cdot 10^5$.

The Shapley value and the proportional solution are

$Sh(x_0, T - t_0) = (1.5055948 \cdot 10^5,\; 3.3951212 \cdot 10^5,\; 2.1744712 \cdot 10^5)$,

$\beta(x_0, T - t_0) = (1.0570313 \cdot 10^5,\; 3.9601674 \cdot 10^5,\; 2.0579885 \cdot 10^5)$.

To illustrate time consistency, we take the proportional solution (β-value) as the cooperative solution and split the horizon at $t = 2.5$. The payoffs of the players over the period $[0, 2.5]$ are $(5.422852 \cdot 10^4,\; 2.0087443 \cdot 10^5,\; 1.0361796 \cdot 10^5)$, and over the period $[2.5, 5]$ they are $(5.147461 \cdot 10^4,\; 1.9514232 \cdot 10^5,\; 1.0218089 \cdot 10^5)$.
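The reported figures can be checked with simple arithmetic: for each player, the payoff collected over $[0, 2.5]$ plus the payoff over $[2.5, 5]$ should add up to the player's component of the β-value. A minimal sketch, using only the numbers reported above (variable names are illustrative):

```python
# Reported beta-value and the payoffs over the two sub-periods.
beta = (1.0570313e5, 3.9601674e5, 2.0579885e5)
first_period = (5.422852e4, 2.0087443e5, 1.0361796e5)    # payoffs over [0, 2.5]
second_period = (5.147461e4, 1.9514232e5, 1.0218089e5)   # payoffs over [2.5, 5]

# Gap between the sum over the two periods and the whole-horizon imputation.
gaps = [abs(a + b - total) for a, b, total in zip(first_period, second_period, beta)]
print(gaps)
```

The gaps are at the level of rounding in the reported digits, which is what the decomposition underlying time consistency requires.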

7. Conclusion

In this paper we studied differential games with pairwise interactions. A new characteristic function was proposed for the game. Under cooperation, we considered the proportional solution and the Shapley value as solutions. Finally, the results were illustrated by an example.

References

Breton, M., Zaccour, G., Zahaf, M. (2005). A differential game of joint implementation of environmental projects. Automatica, 41(10), 1737-1749.

Gromova, E., Tur, A., Barsuk, P. (2022). A pollution control problem for the aluminum production in eastern Siberia: Differential game approach. In: Smirnov, N., Golovkina, A. (eds) Stability and Control Processes. SCP 2020. Lecture Notes in Control and Information Sciences - Proceedings. Springer, Cham. https://doi.org/10.1007/978-3-030-87966-244

Petrosyan, L. A. (1993). Strong Time-Consistent Differential Optimality Principles. Vestn. Leningrad. Univ., 4, 35-40.

Petrosyan, L. A. (2010). Cooperative differential games on networks. Trudy Instituta Matematiki i Mekhaniki UrO RAN, 16(5), 143-150.

Petrosyan, L. A., Pankratova, Y. B. (2023). Solutions of cooperative differential games with partner sets. Matematicheskaya Teoriya Igr i Ee Prilozheniya, 15(2), 105-121.

Petrosyan, L. A., Yeung, D. W., Pankratova, Y. B. (2023). Power Degrees in Dynamic Multi-Agent Systems. Trudy Instituta Matematiki i Mekhaniki UrO RAN, 29(3), 128-137.

Tijs, S. H. (1987). An axiomatization of the τ value. Math. Soc. Sci., 13, 177-181. https://doi.org/10.1016/0165-4896(87)90054-0

Tur, A. V., Petrosyan, L. A. (2020). The Shapley value for differential network games: Theory and application. Journal of Dynamics and Games, 8(2), 151-166. https://doi.org/10.3934/jdg.2020021

Tur, A. V., Petrosyan, L. A. (2021). Cooperative optimality principles in differential games on networks. Automation and Remote Control, 82, 1095-1106. https://doi.org/10.1134/S0005117921060096

Shapley, L. S. (1953). A value for N-person Games. In: Kuhn, H., Tucker, A. (eds.). Contributions to the Theory of Games, pp. 307-317. Princeton University Press, Princeton.

Yeung, D. W. K., Petrosyan, L. A. (2016). Subgame Consistent Cooperation. Springer Science and Business Media LLC, Singapore.
