Cooperative Differential Games with Pairwise Interactions in
Pollution Control Problems
He Yang
St. Petersburg State University,
7/9, Universitetskaya nab., St. Petersburg, 199034, Russia
E-mail: [email protected]
Abstract. This paper establishes a new class of dynamic games that combines two phenomena often observed in real life: differential network games and pairwise interactions. It is assumed that the vertices of the network are players and the edges are connections between them. For the cooperative game, a particular type of characteristic function is introduced. Then two cooperative solutions are constructed: the proportional solution and the Shapley value. Finally, the results are illustrated by an example.
Keywords: dynamic network game, pairwise interaction, characteristic function, Shapley value, programming.
1. Introduction
Recently, differential games on networks have been widely used to model real-life problems. Papers in this area investigate questions such as the following. How should the motion equation of each player be defined? What is the payoff of each player? How do the players react to changes in the behavior of their neighbors? If a player can cut the connection with a neighbor, will he do it or not? Are the cooperative solutions dynamically stable (time-consistent) or not? For the first time, (Petrosyan, 2010) proposed a differential game on a network, which assumes that the state variable of each player depends on his own control, while the payoff function of each player depends on his own actions and on those of his neighbors. Later, (Petrosyan and Yeung, 2020) proposed a new characteristic function that satisfies the convexity property. In (Tur and Petrosyan, 2021) some basic solutions are investigated, such as the core, the Shapley value (Shapley, 1953) and the τ-value (Tijs, 1987). In (Petrosyan and Pankratova, 2023), a new characteristic function using partner sets is constructed.
However, a serious problem is the time-consistency property: in real-life cooperation it is often not satisfied (Yeung and Petrosyan, 2016; Petrosyan, 1993). In this paper, we consider differential games with pairwise interactions, in which a player's payoff depends on his or her own strategy and on the strategies of his or her neighbors in the network.
The paper is organized as follows. Section 2 describes the model of differential games with pairwise interactions. Section 3 defines a characteristic function. In Section 4 the proportional solution is constructed. In Section 5 the dynamic Shapley value is proposed. Section 6 illustrates the results with a numerical example. Section 7 concludes.
2. Differential Games with Pairwise Interactions
Consider a class of $n$-person differential network games with pairwise interactions over the time horizon $[t_0,T]$. The players are connected in a network. Let $N=\{1,2,\dots,n\}$ denote the set of players in the network; the nodes of the network represent the players.
https://doi.org/10.21638/11701/spbu31.2023.18
A pair $(N,L)$ is called a network, where $N$ is a set of nodes and $L\subset N\times N$ is a given set of arcs. Note that $(i,i)\notin L$. Nodes represent the players. If $\mathrm{arc}(i,j)\in L$, a link connects players $i\in N$ and $j\in N$. All connections are assumed to be undirected. We denote the set of players connected to player $i$ by $K(i)=\{j : \mathrm{arc}(i,j)\in L\}$, $i\in N$, $j\neq i$, and set $\bar K(i)=K(i)\cup\{i\}$.
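A minimal Python sketch of how the neighbour sets $K(i)$ and $\bar K(i)$ can be built, assuming the network is given as a list of undirected arcs. The function name `neighbor_sets` is hypothetical; the example network is the three-player network of Section 6 (links 1-2 and 2-3).

```python
def neighbor_sets(players, arcs):
    """Build K(i) and K_bar(i) = K(i) | {i} for an undirected network (N, L).

    `arcs` is an iterable of pairs (i, j); self-loops are rejected because
    (i, i) is not in L.
    """
    K = {i: set() for i in players}
    for i, j in arcs:
        if i == j:
            raise ValueError(f"self-loop ({i}, {i}) is not allowed in L")
        # Connections are undirected, so arc(i, j) makes i and j neighbours.
        K[i].add(j)
        K[j].add(i)
    K_bar = {i: K[i] | {i} for i in players}
    return K, K_bar


# Network of the example in Section 6: links 1-2 and 2-3.
K, K_bar = neighbor_sets([1, 2, 3], [(1, 2), (2, 3)])
print(K)      # {1: {2}, 2: {1, 3}, 3: {2}}
print(K_bar)  # {1: {1, 2}, 2: {1, 2, 3}, 3: {2, 3}}
```

Storing each arc in both directions mirrors the assumption that all connections are undirected.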
The state dynamics of the game are given by
$$\dot x_{ij}(t)=f_{ij}\big(x_{ij}(t),u_{ij}(t)\big),\qquad x_{ij}(t_0)=x^0_{ij}, \qquad (1)$$
for $t\in[t_0,T]$ and $i\in N$, $j\in K(i)$.
Here $x_{ij}(t)\in\mathbb{R}^m$ is the state variable of player $i$ interacting with player $j\in K(i)$ at time $t$, and $u_{ij}(t)\in U_{ij}\subset\mathbb{R}^l$ is the control variable of player $i$ interacting with player $j$. Every player $i$ plays a separate differential game with each neighbor $j$ according to the network structure. The function $f_{ij}(x_{ij}(t),u_{ij}(t))$ is continuously differentiable in $x_{ij}(t)$ and $u_{ij}(t)$.
Define the payoff of player $i$ on the link $i\leftrightarrow j$ by
$$K_i^{ij}\big(x^0_{ij},x^0_{ji},u_{ij},u_{ji},T-t_0\big)=\int_{t_0}^{T}h_i^j\big(x_{ij}(\tau),x_{ji}(\tau),u_{ij}(\tau),u_{ji}(\tau)\big)\,d\tau. \qquad (2)$$
Because player $i$ plays several different differential games, each pairwise game involves the control of player $i$ and the control of the neighbor who plays that game with him. The payoff function of player $i$ depends not only on his own control variables, given by the strategy profile $u^i(t)=(u_{ij}(t),\ j\in K(i))$, and his trajectories $x^i(t)=(x_{ij}(t),\ j\in K(i))$, but also on the control variables of his neighbors, given by the strategy profiles $u^j(t)=(u_{ji}(t),\ i\in K(j))$. Denote by $u(t)=(u^1(t),\dots,u^i(t),\dots,u^n(t))$ the controls of all players, where $u^i(t)=(u_{ij}(t),\ j\in K(i))$ is the control variable of player $i$ in the network structure. We use $x_0=(x^1_0,\dots,x^i_0,\dots,x^n_0)$ to denote the vector of initial conditions, where $x^i_0=(x_{ij}(t_0),\ j\in K(i))$ is the set of initial conditions of player $i$. The payoff function of player $i$ is given by
$$
\begin{aligned}
H_i\big(x^i_0,x^j_0,u^i,u^j,\ j\in K(i),T-t_0\big)&=\sum_{j\in K(i)}K_i^{ij}\big(x^0_{ij},x^0_{ji},u_{ij},u_{ji},T-t_0\big)\\
&=\sum_{j\in K(i)}\int_{t_0}^{T}h_i^j\big(x_{ij}(\tau),x_{ji}(\tau),u_{ij}(\tau),u_{ji}(\tau)\big)\,d\tau.
\end{aligned}
\qquad (3)
$$
Here the term $h_i^j\big(x_{ij}(\tau),x_{ji}(\tau),u_{ij}(\tau),u_{ji}(\tau)\big)$ is the instantaneous gain that player $i$ obtains through the network link with player $j\in K(i)$. We also suppose that the term $h_i^j$ is non-negative.
3. Cooperative Differential Games with Pairwise Interactions
In this section, we use the characteristic function first proposed in (Petrosyan, Yeung and Pankratova, 2023). The game $\Gamma(x_0,T-t_0)$ is defined on the network $(N,L)$, with the system dynamics (1) and the players' payoffs (3). Player $i\in N$, choosing the control variables $u_{ij}$ from his set of feasible controls, seeks to maximize his objective functional (3). Suppose that the players cooperate to achieve the maximum total payoff
$$
\begin{aligned}
&\max_{u}\sum_{i\in N}\sum_{j\in K(i)}\int_{t_0}^{T}h_i^j\big(x_{ij}(\tau),x_{ji}(\tau),u_{ij}(\tau),u_{ji}(\tau)\big)\,d\tau\\
&\qquad=\sum_{i\in N}\sum_{j\in K(i)}\int_{t_0}^{T}h_i^j\big(x^*_{ij}(\tau),x^*_{ji}(\tau),u^*_{ij}(\tau),u^*_{ji}(\tau)\big)\,d\tau,
\end{aligned}
\qquad (4)
$$
subject to dynamics (1). Here $u^*$ and $x^*$ denote the optimal cooperative controls and the corresponding trajectories.
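Definition 1. The characteristic function $V(S;x_0,T-t_0)$ is defined as
$$
\begin{aligned}
V(S;x_0,T-t_0)={}&\sum_{i\in S}\ \sum_{j\in K(i)\cap S}\int_{t_0}^{T}h_i^j\big(x_{ij}(\tau),x_{ji}(\tau),u_{ij}(\tau),u_{ji}(\tau)\big)\,d\tau\\
&+a(S)\sum_{i\in S}\ \sum_{j\in K(i)\cap(N\setminus S)}\int_{t_0}^{T}h_i^j\big(x_{ij}(\tau),x_{ji}(\tau),u_{ij}(\tau),u_{ji}(\tau)\big)\,d\tau,\qquad S\subseteq N,
\end{aligned}
\qquad (5)
$$
where the controls and trajectories are those of the cooperative solution of (4). Here $a(S)\in[0,1)$: when player $i$ is inside the coalition $S$ and his neighbor $j$ is outside it, player $i$ loses part of his payoff on that link, namely
$$[1-a(S)]\sum_{i\in S}\ \sum_{j\in K(i)\cap(N\setminus S)}\int_{t_0}^{T}h_i^j\big(x_{ij}(\tau),x_{ji}(\tau),u_{ij}(\tau),u_{ji}(\tau)\big)\,d\tau.$$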
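Definition (5) only needs each player's payoff on each link, evaluated along the cooperative solution, plus the coefficient $a(S)$. A minimal Python sketch under that assumption follows; the function name `characteristic_function` and the link payoff numbers are hypothetical, chosen only to make the bookkeeping easy to check by hand.

```python
def characteristic_function(S, K, link_payoff, a):
    """Characteristic function (5).

    S           -- coalition (any iterable of players)
    K           -- dict: player i -> set of neighbours K(i)
    link_payoff -- dict: (i, j) -> integral of h_i^j along the cooperative
                   solution of (4), i.e. player i's payoff on link i<->j
    a           -- function: coalition (frozenset) -> a(S) in [0, 1)
    """
    S = frozenset(S)
    # Links with both ends in S keep the full cooperative payoff.
    inside = sum(link_payoff[(i, j)] for i in S for j in K[i] if j in S)
    # Links leading outside S keep only the fraction a(S).
    outside = sum(link_payoff[(i, j)] for i in S for j in K[i] if j not in S)
    return inside + a(S) * outside


def a(S):
    """Hypothetical coefficients: 0.2 for singletons, 0.8 otherwise.
    For S = N there are no outside links, so a(N) has no effect."""
    return 0.2 if len(S) == 1 else 0.8


# Hypothetical link payoffs on the network 1-2, 2-3 (not taken from the paper).
K = {1: {2}, 2: {1, 3}, 3: {2}}
link_payoff = {(1, 2): 10.0, (2, 1): 20.0, (2, 3): 30.0, (3, 2): 40.0}

for S in [{1}, {2}, {1, 2}, {1, 3}, {1, 2, 3}]:
    print(sorted(S), characteristic_function(S, K, link_payoff, a))
```

For example, $V(\{1,2\})=10+20+0.8\cdot 30$: the links inside the coalition count fully, while player 2's link to player 3 is discounted by $a(\{1,2\})$.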
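From (5), for the coalitions $\{i\}$ and $\emptyset$ we get
$$V(\{i\};x_0,T-t_0)=a(\{i\})\sum_{j\in K(i)}\int_{t_0}^{T}h_i^j\big(x_{ij}(\tau),x_{ji}(\tau),u_{ij}(\tau),u_{ji}(\tau)\big)\,d\tau, \qquad (6)$$
$$V(\emptyset;x_0,T-t_0)=0, \qquad (7)$$
subject to dynamics (1). For a singleton only the second term of (5) remains, because $(i,i)\notin L$ implies $K(i)\cap\{i\}=\emptyset$.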
Note that the coefficient $a(S)$ may differ across coalitions; in general $a(S)\neq a(S\setminus\{i\})$ for $i\in S$.
4. Proportional Solution (β-value)
Using the characteristic function defined above, we introduce the proportional solution ($\beta$-value) as
$$\beta_i(x_0,T-t_0)=\frac{V(\{i\};x_0,T-t_0)}{\sum_{k\in N}V(\{k\};x_0,T-t_0)}\,V(N;x_0,T-t_0) \qquad (8)$$
for $i\in N$. Here the ratio $V(\{i\};x_0,T-t_0)\big/\sum_{k\in N}V(\{k\};x_0,T-t_0)$ is the share of player $i$'s individual value in the total individual values of all players.
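Formula (8) is a one-line computation once the singleton values and $V(N)$ are known. The Python sketch below (the function name is hypothetical) applies it to the values of $V(\{i\})$ and $V(N)$ reported in the numerical example of Section 6.

```python
def proportional_solution(v_single, v_grand):
    """Proportional solution (8): share V(N) in proportion to V({i})."""
    total = sum(v_single.values())
    return {i: v / total * v_grand for i, v in v_single.items()}


# V({i}) and V(N) as reported in the numerical example of Section 6.
v_single = {1: 2.114062e4, 2: 7.920335e4, 3: 4.115977e4}
v_grand = 7.0751872e5

beta = proportional_solution(v_single, v_grand)
for i, value in beta.items():
    print(i, round(value, 2))
```

In that example $a(\{i\})=0.2$ for every player and every link of player $i$ leads outside $\{i\}$, so $\sum_i V(\{i\})=0.2\,V(N)$ and each $\beta_i$ coincides with player $i$'s own cooperative payoff.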
5. Dynamic Shapley Value
Introduce the Shapley value as
$$Sh_i(x_0,T-t_0)=\sum_{\substack{S\subseteq N\\ S\ni i}}\frac{(|S|-1)!\,(n-|S|)!}{n!}\Big[V(S;x_0,T-t_0)-V(S\setminus\{i\};x_0,T-t_0)\Big] \qquad (9)$$
for $i\in N$.
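Formula (9) can be evaluated directly by enumerating, for each player $i$, all coalitions that contain $i$. The following Python sketch (function name hypothetical) does this and, as input, uses the characteristic function values reported in the numerical example of Section 6.

```python
from itertools import combinations
from math import factorial


def shapley(players, v):
    """Shapley value via formula (9).

    v maps frozenset(S) -> V(S) and must contain every coalition,
    including the empty one (V(empty set) = 0).
    """
    n = len(players)
    sh = {}
    for i in players:
        others = [k for k in players if k != i]
        total = 0.0
        # Every coalition S containing i is S = R | {i}, R a subset of the others.
        for size in range(len(others) + 1):
            for rest in combinations(others, size):
                S = frozenset(rest) | {i}
                weight = factorial(len(S) - 1) * factorial(n - len(S)) / factorial(n)
                total += weight * (v[S] - v[S - {i}])
        sh[i] = total
    return sh


# Characteristic function values reported in the numerical example of Section 6.
v = {
    frozenset(): 0.0,
    frozenset({1}): 2.114062e4,
    frozenset({2}): 7.920335e4,
    frozenset({3}): 4.115977e4,
    frozenset({1, 2}): 4.5528798e5,
    frozenset({1, 3}): 2.4920158e5,
    frozenset({2, 3}): 5.6904414e5,
    frozenset({1, 2, 3}): 7.0751872e5,
}
sh = shapley([1, 2, 3], v)
print({i: round(x, 2) for i, x in sh.items()})
```

Enumeration over all subsets is exponential in $n$, which is harmless for the three-player example but would need a different approach for large networks.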
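From (9), we obtain
$$
\begin{aligned}
Sh_i(x_0,T-t_0)=\sum_{\substack{S\subseteq N\\ S\ni i}}\frac{(|S|-1)!\,(n-|S|)!}{n!}\Bigg[&\sum_{l\in S}\ \sum_{j\in K(l)\cap S}\int_{t_0}^{T}h_l^j\big(x_{lj}(\tau),x_{jl}(\tau),u_{lj}(\tau),u_{jl}(\tau)\big)\,d\tau\\
&+a(S)\sum_{l\in S}\ \sum_{j\in K(l)\cap(N\setminus S)}\int_{t_0}^{T}h_l^j\big(x_{lj}(\tau),x_{jl}(\tau),u_{lj}(\tau),u_{jl}(\tau)\big)\,d\tau\\
&-\sum_{l\in S\setminus\{i\}}\ \sum_{j\in K(l)\cap(S\setminus\{i\})}\int_{t_0}^{T}h_l^j\big(x_{lj}(\tau),x_{jl}(\tau),u_{lj}(\tau),u_{jl}(\tau)\big)\,d\tau\\
&-a(S\setminus\{i\})\sum_{l\in S\setminus\{i\}}\ \sum_{j\in K(l)\cap(N\setminus(S\setminus\{i\}))}\int_{t_0}^{T}h_l^j\big(x_{lj}(\tau),x_{jl}(\tau),u_{lj}(\tau),u_{jl}(\tau)\big)\,d\tau\Bigg]
\end{aligned}
\qquad (10)
$$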
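Applying the Shapley value imputation (10) at any time instant $t\in[t_0,T]$, that is, to the subgame starting from the cooperative state $x(t)$, we obtain
$$
\begin{aligned}
Sh_i(x(t),T-t)=\sum_{\substack{S\subseteq N\\ S\ni i}}\frac{(|S|-1)!\,(n-|S|)!}{n!}\Bigg[&\sum_{l\in S}\ \sum_{j\in K(l)\cap S}\int_{t}^{T}h_l^j\big(x_{lj}(\tau),x_{jl}(\tau),u_{lj}(\tau),u_{jl}(\tau)\big)\,d\tau\\
&+a(S)\sum_{l\in S}\ \sum_{j\in K(l)\cap(N\setminus S)}\int_{t}^{T}h_l^j\big(x_{lj}(\tau),x_{jl}(\tau),u_{lj}(\tau),u_{jl}(\tau)\big)\,d\tau\\
&-\sum_{l\in S\setminus\{i\}}\ \sum_{j\in K(l)\cap(S\setminus\{i\})}\int_{t}^{T}h_l^j\big(x_{lj}(\tau),x_{jl}(\tau),u_{lj}(\tau),u_{jl}(\tau)\big)\,d\tau\\
&-a(S\setminus\{i\})\sum_{l\in S\setminus\{i\}}\ \sum_{j\in K(l)\cap(N\setminus(S\setminus\{i\}))}\int_{t}^{T}h_l^j\big(x_{lj}(\tau),x_{jl}(\tau),u_{lj}(\tau),u_{jl}(\tau)\big)\,d\tau\Bigg]
\end{aligned}
\qquad (11)
$$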
Proposition 1. The Shapley value imputation in (10)-(11) satisfies the time-consistency property.
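Proof. Since $a(S)$ does not depend on time, each integral over $[t_0,T]$ in (10) splits along the cooperative trajectory into an integral over $[t_0,t]$ and one over $[t,T]$, and the parts over $[t,T]$ add up to (11). By direct computation we therefore have
$$
\begin{aligned}
Sh_i(x_0,T-t_0)=\sum_{\substack{S\subseteq N\\ S\ni i}}\frac{(|S|-1)!\,(n-|S|)!}{n!}\Bigg[&\sum_{l\in S}\ \sum_{j\in K(l)\cap S}\int_{t_0}^{t}h_l^j\big(x_{lj}(\tau),x_{jl}(\tau),u_{lj}(\tau),u_{jl}(\tau)\big)\,d\tau\\
&+a(S)\sum_{l\in S}\ \sum_{j\in K(l)\cap(N\setminus S)}\int_{t_0}^{t}h_l^j\big(x_{lj}(\tau),x_{jl}(\tau),u_{lj}(\tau),u_{jl}(\tau)\big)\,d\tau\\
&-\sum_{l\in S\setminus\{i\}}\ \sum_{j\in K(l)\cap(S\setminus\{i\})}\int_{t_0}^{t}h_l^j\big(x_{lj}(\tau),x_{jl}(\tau),u_{lj}(\tau),u_{jl}(\tau)\big)\,d\tau\\
&-a(S\setminus\{i\})\sum_{l\in S\setminus\{i\}}\ \sum_{j\in K(l)\cap(N\setminus(S\setminus\{i\}))}\int_{t_0}^{t}h_l^j\big(x_{lj}(\tau),x_{jl}(\tau),u_{lj}(\tau),u_{jl}(\tau)\big)\,d\tau\Bigg]\\
&+Sh_i(x(t),T-t),
\end{aligned}
\qquad (12)
$$
$i\in N$, which exhibits the time-consistency property of the Shapley value imputation $Sh_i(x(t),T-t)$ for $t\in[t_0,T]$.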
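The key step in the proof is that, with time-invariant $a(S)$, every $V(S)$ over $[t_0,T]$ is the sum of its parts over $[t_0,t]$ and $[t,T]$, and the Shapley value is linear in $V$. The Python sketch below checks this additivity numerically. It computes the Shapley value as the average marginal contribution over all orderings of the players, an equivalent form of (9); the coalition values for the two sub-intervals are hypothetical.

```python
from itertools import permutations


def shapley_by_orderings(players, v):
    """Shapley value as the average marginal contribution over all
    orderings of the players (an equivalent form of formula (9))."""
    orders = list(permutations(players))
    sh = {i: 0.0 for i in players}
    for order in orders:
        coalition = frozenset()
        for i in order:
            sh[i] += v[coalition | {i}] - v[coalition]
            coalition = coalition | {i}
    return {i: total / len(orders) for i, total in sh.items()}


# Hypothetical coalition values accumulated on [t0, t] and on [t, T].
players = [1, 2, 3]
keys = [frozenset(), frozenset({1}), frozenset({2}), frozenset({3}),
        frozenset({1, 2}), frozenset({1, 3}), frozenset({2, 3}), frozenset({1, 2, 3})]
v_early = dict(zip(keys, [0.0, 1.0, 2.0, 1.5, 6.0, 3.0, 7.0, 10.0]))
v_late = dict(zip(keys, [0.0, 0.5, 2.5, 1.0, 5.0, 2.0, 6.5, 9.0]))
# With time-invariant a(S), V over [t0, T] is the sum of the two pieces.
v_total = {S: v_early[S] + v_late[S] for S in keys}

sh_total = shapley_by_orderings(players, v_total)
sh_early = shapley_by_orderings(players, v_early)
sh_late = shapley_by_orderings(players, v_late)
for i in players:
    print(i, sh_total[i], sh_early[i] + sh_late[i])
```

The split $Sh_i(x_0,T-t_0)=(\text{part on }[t_0,t])+Sh_i(x(t),T-t)$ in (12) is exactly this additivity applied along the cooperative trajectory.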
6. Differential Games with Pairwise Interactions in Pollution Problems
Consider the following game-theoretic model. The network structure is shown in Figure 1. There are three players representing three national or regional factories that participate in the game with this network structure, $N=\{1,2,3\}$.
For the link $1\leftrightarrow 2$ (a similar game is considered in (Breton, Zaccour and Zahaf, 2005)), Region 1 and Region 2 play a pollution game. Each region has an industrial production site. Production is assumed to be proportional to the pollution $u_{12}$, so the strategy of each player is to choose the amount of pollutants emitted into the atmosphere, $u_{12}\in[0,b_{12}]$, $b_{12}>0$. $A_{12}$ is the subsidy that the government pays to factory 1 at each moment, and $d_{12}$ is the rate at which the environmental department penalizes factory 1 for the accumulated pollution at each moment.
Let $x_{12}$ be the accumulated volume (stock) of pollution in region 1. The dynamics of players 1 and 2 on the link $1\leftrightarrow 2$ are described by
$$\dot x_{12}(t)=u_{12}(t),\qquad x_{12}(t_0)=x^0_{12},\qquad t\in[t_0,T], \qquad (13)$$
$$\dot x_{21}(t)=u_{21}(t),\qquad x_{21}(t_0)=x^0_{21},\qquad t\in[t_0,T]. \qquad (14)$$
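The payoff of each player in the pairwise interaction game on the link $1\leftrightarrow 2$ is defined as
$$K_1^{12}\big(x^0_{12},x^0_{21},u_{12}(t),u_{21}(t),T-t_0\big)=\int_{t_0}^{T}\Big[\Big(b_{12}-\tfrac12u_{12}(t)\Big)u_{12}(t)-d_{12}\big(x_{12}(t)+x_{21}(t)\big)+A_{12}\Big]dt,$$
$$K_2^{21}\big(x^0_{21},x^0_{12},u_{12}(t),u_{21}(t),T-t_0\big)=\int_{t_0}^{T}\Big[\Big(b_{21}-\tfrac12u_{21}(t)\Big)u_{21}(t)-d_{21}\big(x_{12}(t)+x_{21}(t)\big)+A_{21}\Big]dt.$$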
For the link $2\leftrightarrow 3$ (a similar game is considered in (Gromova, Tur and Barsuk, 2022)), we consider another pollution game. The pollution released by players 2 and 3 is denoted by $u_{23}$ and $u_{32}$, where $u_{23}\in[0,b_{23}]$, $b_{23}>0$, and $u_{32}\in[0,b_{32}]$, $b_{32}>0$. Let $x_{23}(t)$ and $x_{32}(t)$ denote the stocks of accumulated pollution by time $t$. The dynamics of players 2 and 3 on the link $2\leftrightarrow 3$ are described by
$$\dot x_{23}(t)=u_{23}(t)-\delta x_{23}(t),\qquad x_{23}(t_0)=x^0_{23},\qquad t\in[t_0,T], \qquad (15)$$
$$\dot x_{32}(t)=u_{32}(t)-\delta x_{32}(t),\qquad x_{32}(t_0)=x^0_{32},\qquad t\in[t_0,T], \qquad (16)$$
where $\delta>0$ is the absorption coefficient corresponding to the natural purification of the atmosphere. Here we do not consider any additional cost.
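The payoff of each player in the pairwise interaction game on the link $2\leftrightarrow 3$ is defined as
$$K_2^{23}\big(x^0_{23},x^0_{32},u_{23}(t),u_{32}(t),T-t_0\big)=\int_{t_0}^{T}\Big[\Big(b_{23}-\tfrac12u_{23}(t)\Big)u_{23}(t)-d_{23}\big(x_{23}(t)+x_{32}(t)\big)+A_{23}\Big]dt,$$
$$K_3^{32}\big(x^0_{32},x^0_{23},u_{23}(t),u_{32}(t),T-t_0\big)=\int_{t_0}^{T}\Big[\Big(b_{32}-\tfrac12u_{32}(t)\Big)u_{32}(t)-d_{32}\big(x_{32}(t)+x_{23}(t)\big)+A_{32}\Big]dt. \qquad (17)$$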
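In the network game with multiple links, the payoff of each player is defined as
$$H_1\big(x^0_{12},x^0_{21},u_{12}(t),u_{21}(t),T-t_0\big)=\int_{t_0}^{T}\Big[\Big(b_{12}-\tfrac12u_{12}(t)\Big)u_{12}(t)-d_{12}\big(x_{12}(t)+x_{21}(t)\big)+A_{12}\Big]dt, \qquad (18)$$
$$
\begin{aligned}
H_2\big(x^0_{21},x^0_{12},x^0_{23},x^0_{32},u_{12}(t),u_{21}(t),u_{23}(t),u_{32}(t),T-t_0\big)={}&\int_{t_0}^{T}\Big[\Big(b_{21}-\tfrac12u_{21}(t)\Big)u_{21}(t)-d_{21}\big(x_{21}(t)+x_{12}(t)\big)+A_{21}\Big]dt\\
&+\int_{t_0}^{T}\Big[\Big(b_{23}-\tfrac12u_{23}(t)\Big)u_{23}(t)-d_{23}\big(x_{23}(t)+x_{32}(t)\big)+A_{23}\Big]dt,
\end{aligned}
\qquad (19)
$$
$$H_3\big(x^0_{32},x^0_{23},u_{23}(t),u_{32}(t),T-t_0\big)=\int_{t_0}^{T}\Big[\Big(b_{32}-\tfrac12u_{32}(t)\Big)u_{32}(t)-d_{32}\big(x_{32}(t)+x_{23}(t)\big)+A_{32}\Big]dt, \qquad (20)$$
subject to dynamics (13)-(16).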
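Under cooperation, the players maximize the total payoff
$$
\begin{aligned}
V(N;x_0,T-t_0)=\max_{u_{12},u_{21},u_{23},u_{32}}\int_{t_0}^{T}\Big[&\Big(b_{12}-\tfrac12u_{12}(t)\Big)u_{12}(t)-d_{12}\big(x_{12}(t)+x_{21}(t)\big)+A_{12}\\
&+\Big(b_{21}-\tfrac12u_{21}(t)\Big)u_{21}(t)-d_{21}\big(x_{21}(t)+x_{12}(t)\big)+A_{21}\\
&+\Big(b_{23}-\tfrac12u_{23}(t)\Big)u_{23}(t)-d_{23}\big(x_{23}(t)+x_{32}(t)\big)+A_{23}\\
&+\Big(b_{32}-\tfrac12u_{32}(t)\Big)u_{32}(t)-d_{32}\big(x_{32}(t)+x_{23}(t)\big)+A_{32}\Big]dt
\end{aligned}
\qquad (21)
$$
subject to the dynamics (13)-(16).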
Using the Pontryagin Maximum Principle (PMP) to solve the optimization problem, we first write down the Hamiltonian:
$$
\begin{aligned}
H(x(t),u(t),\psi(t))={}&u_{12}\Big(b_{12}-\tfrac12u_{12}\Big)-d_{12}(x_{12}+x_{21})+A_{12}+u_{21}\Big(b_{21}-\tfrac12u_{21}\Big)-d_{21}(x_{21}+x_{12})+A_{21}\\
&+\Big(b_{23}-\tfrac12u_{23}\Big)u_{23}-d_{23}(x_{23}+x_{32})+A_{23}+\Big(b_{32}-\tfrac12u_{32}\Big)u_{32}-d_{32}(x_{32}+x_{23})+A_{32}\\
&+\psi_{12}u_{12}+\psi_{21}u_{21}+\psi_{23}(u_{23}-\delta x_{23})+\psi_{32}(u_{32}-\delta x_{32}).
\end{aligned}
$$
The adjoint variables satisfy the boundary conditions
$$\psi_{ij}(T)=0,\qquad i\in N,\ j\in K(i). \qquad (22)$$
Taking the first derivative of the Hamiltonian with respect to $u_{12}$ and setting it to zero, $b_{12}-u_{12}+\psi_{12}=0$, we get the expression for the optimal control:
$$u_{12}(t)=b_{12}+\psi_{12}(t).$$
The canonical system is
$$\dot x_{12}=u_{12}=b_{12}+\psi_{12}(t),\qquad \dot\psi_{12}=-\frac{\partial H}{\partial x_{12}}=d_{12}+d_{21}=d,\qquad \dot\psi_{21}=-\frac{\partial H}{\partial x_{21}}=d_{21}+d_{12}=d, \qquad (23)$$
where $b=b_{12}+b_{21}$ and $d=d_{12}+d_{21}$. Recall that the initial condition is $x_{12}(t_0)=x^0_{12}$; using the other boundary condition, obtained from (22), we get
$$\psi_{12}(t)=-d\,(T-t),\qquad \psi_{21}(t)=-d\,(T-t).$$
Substituting this solution into the differential equation (23), we obtain the expression for $x_{12}(t)$:
$$x_{12}(t)=\frac{d}{2}t^2-\frac{d}{2}t_0^2+(b_{12}-dT)\,t+(-b_{12}+dT)\,t_0+x^0_{12}. \qquad (24)$$
The optimal control is
$$u_{12}(t)=b_{12}-d\,(T-t).$$
Similarly, we get the optimal trajectory
$$x_{21}(t)=\frac{d}{2}t^2-\frac{d}{2}t_0^2+(b_{21}-dT)\,t+(-b_{21}+dT)\,t_0+x^0_{21}, \qquad (25)$$
where $d=d_{12}+d_{21}$ and $b=b_{12}+b_{21}$.
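The closed-form results for the link $1\leftrightarrow 2$ can be checked by integrating player 1's instantaneous payoff along (24), (25) and the optimal control $u_{12}(t)=b_{12}-d(T-t)$. The Python sketch below uses the parameter values of the numerical example later in this section; Simpson's rule is our own choice of quadrature (it is exact here because the integrand is a quadratic polynomial in $t$). The result, $K_1^{12}=105703.125$, multiplied by $a(\{1\})=0.2$ should agree with the reported $V(\{1\};x_0,T-t_0)=2.114062\cdot10^4$.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3


# Link 1 <-> 2 with the parameters of the numerical example in Section 6.
b12, b21, d12, d21 = 200.0, 250.0, 1.0, 1.5
A12, t0, T = 2360.0, 0.0, 5.0
x12_0, x21_0 = 50.0, 60.0
d = d12 + d21


def u12(t):
    """Optimal control u12(t) = b12 - d (T - t)."""
    return b12 - d * (T - t)


def x12(t):
    """Optimal trajectory (24)."""
    return d / 2 * t**2 - d / 2 * t0**2 + (b12 - d * T) * t + (-b12 + d * T) * t0 + x12_0


def x21(t):
    """Optimal trajectory (25)."""
    return d / 2 * t**2 - d / 2 * t0**2 + (b21 - d * T) * t + (-b21 + d * T) * t0 + x21_0


def h1(t):
    """Instantaneous payoff of player 1 on link 1 <-> 2."""
    return (b12 - 0.5 * u12(t)) * u12(t) - d12 * (x12(t) + x21(t)) + A12


K1_12 = simpson(h1, t0, T)   # player 1's cooperative payoff on link 1 <-> 2
V1 = 0.2 * K1_12             # V({1}) = a({1}) * K1_12 with a({1}) = 0.2
print(K1_12, V1)
```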
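For the link $2\leftrightarrow 3$ the same procedure gives $\dot\psi_{23}=-\partial H/\partial x_{23}=d+\delta\psi_{23}$ with $\psi_{23}(T)=0$, hence $\psi_{23}(t)=-\frac{d}{\delta}\big(1-e^{-\delta(T-t)}\big)$, and likewise for $\psi_{32}$. The optimal trajectories are
$$x_{23}(t)=C_{23}e^{-\delta t}+\frac{b_{23}}{\delta}+\frac{d}{2\delta^2}e^{-\delta(T-t)}-\frac{d}{\delta^2},\qquad C_{23}=e^{\delta t_0}\Big(x^0_{23}-\frac{b_{23}}{\delta}-\frac{d}{2\delta^2}e^{-\delta(T-t_0)}+\frac{d}{\delta^2}\Big), \qquad (26)$$
$$x_{32}(t)=C_{32}e^{-\delta t}+\frac{b_{32}}{\delta}+\frac{d}{2\delta^2}e^{-\delta(T-t)}-\frac{d}{\delta^2},\qquad C_{32}=e^{\delta t_0}\Big(x^0_{32}-\frac{b_{32}}{\delta}-\frac{d}{2\delta^2}e^{-\delta(T-t_0)}+\frac{d}{\delta^2}\Big), \qquad (27)$$
where, for this link, $b=b_{23}+b_{32}$ and $d=d_{23}+d_{32}$.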
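The corresponding optimal controls are
$$u_{21}(t)=b_{21}-d\,(T-t), \qquad (28)$$
$$u_{23}(t)=b_{23}-\frac{1-e^{-\delta(T-t)}}{\delta}\,d, \qquad (29)$$
$$u_{32}(t)=b_{32}-\frac{1-e^{-\delta(T-t)}}{\delta}\,d, \qquad (30)$$
with $d=d_{12}+d_{21}$ in (28) and $d=d_{23}+d_{32}$ in (29)-(30).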
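As a sanity check, the closed-form trajectory (26) together with the control (29) should satisfy the dynamics (15) and the initial condition. The Python sketch below measures the residual of (15) with a central finite difference, assuming the parameter values of the numerical example in this section; the helper name `residual` and the step `eps` are illustrative choices.

```python
import math

# Link 2 <-> 3 with the parameters of the numerical example in Section 6.
b23, d23, d32, delta = 300.0, 1.5, 2.0, 0.3
t0, T, x23_0 = 0.0, 5.0, 60.0
d = d23 + d32


def u23(t):
    """Optimal control (29)."""
    return b23 - (1 - math.exp(-delta * (T - t))) / delta * d


C23 = math.exp(delta * t0) * (x23_0 - b23 / delta
                              - d / (2 * delta**2) * math.exp(-delta * (T - t0))
                              + d / delta**2)


def x23(t):
    """Optimal trajectory (26)."""
    return (C23 * math.exp(-delta * t) + b23 / delta
            + d / (2 * delta**2) * math.exp(-delta * (T - t)) - d / delta**2)


def residual(t, eps=1e-5):
    """Central-difference check of (15): dx23/dt - (u23 - delta * x23)."""
    dxdt = (x23(t + eps) - x23(t - eps)) / (2 * eps)
    return dxdt - (u23(t) - delta * x23(t))


max_res = max(abs(residual(t0 + k * (T - t0) / 50)) for k in range(51))
print(abs(x23(t0) - x23_0), max_res)
```

The same check applies to (27) and (30) after swapping in $b_{32}$ and $x^0_{32}$.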
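Along the cooperative controls and trajectories (24)-(30), the characteristic function (5) takes the values
$$V(\{1\};x_0,T-t_0)=a(\{1\})\int_{t_0}^{T}\Big[\Big(b_{12}-\tfrac12u_{12}(t)\Big)u_{12}(t)-d_{12}\big(x_{12}(t)+x_{21}(t)\big)+A_{12}\Big]dt,$$
$$
\begin{aligned}
V(\{2\};x_0,T-t_0)=a(\{2\})\Big(&\int_{t_0}^{T}\Big[\Big(b_{21}-\tfrac12u_{21}(t)\Big)u_{21}(t)-d_{21}\big(x_{21}(t)+x_{12}(t)\big)+A_{21}\Big]dt\\
&+\int_{t_0}^{T}\Big[\Big(b_{23}-\tfrac12u_{23}(t)\Big)u_{23}(t)-d_{23}\big(x_{23}(t)+x_{32}(t)\big)+A_{23}\Big]dt\Big),
\end{aligned}
$$
$$V(\{3\};x_0,T-t_0)=a(\{3\})\int_{t_0}^{T}\Big[\Big(b_{32}-\tfrac12u_{32}(t)\Big)u_{32}(t)-d_{32}\big(x_{32}(t)+x_{23}(t)\big)+A_{32}\Big]dt,$$
$$
\begin{aligned}
V(\{1,2\};x_0,T-t_0)={}&\int_{t_0}^{T}\Big[\Big(b_{12}-\tfrac12u_{12}(t)\Big)u_{12}(t)-d_{12}\big(x_{12}(t)+x_{21}(t)\big)+A_{12}\Big]dt\\
&+\int_{t_0}^{T}\Big[\Big(b_{21}-\tfrac12u_{21}(t)\Big)u_{21}(t)-d_{21}\big(x_{21}(t)+x_{12}(t)\big)+A_{21}\Big]dt\\
&+a(\{1,2\})\int_{t_0}^{T}\Big[\Big(b_{23}-\tfrac12u_{23}(t)\Big)u_{23}(t)-d_{23}\big(x_{23}(t)+x_{32}(t)\big)+A_{23}\Big]dt,
\end{aligned}
$$
$$
\begin{aligned}
V(\{1,3\};x_0,T-t_0)=a(\{1,3\})\Big(&\int_{t_0}^{T}\Big[\Big(b_{12}-\tfrac12u_{12}(t)\Big)u_{12}(t)-d_{12}\big(x_{12}(t)+x_{21}(t)\big)+A_{12}\Big]dt\\
&+\int_{t_0}^{T}\Big[\Big(b_{32}-\tfrac12u_{32}(t)\Big)u_{32}(t)-d_{32}\big(x_{32}(t)+x_{23}(t)\big)+A_{32}\Big]dt\Big),
\end{aligned}
$$
$$
\begin{aligned}
V(\{2,3\};x_0,T-t_0)={}&\int_{t_0}^{T}\Big[\Big(b_{23}-\tfrac12u_{23}(t)\Big)u_{23}(t)-d_{23}\big(x_{23}(t)+x_{32}(t)\big)+A_{23}\Big]dt\\
&+\int_{t_0}^{T}\Big[\Big(b_{32}-\tfrac12u_{32}(t)\Big)u_{32}(t)-d_{32}\big(x_{32}(t)+x_{23}(t)\big)+A_{32}\Big]dt\\
&+a(\{2,3\})\int_{t_0}^{T}\Big[\Big(b_{21}-\tfrac12u_{21}(t)\Big)u_{21}(t)-d_{21}\big(x_{21}(t)+x_{12}(t)\big)+A_{21}\Big]dt.
\end{aligned}
$$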
Remark 1. The instantaneous payoff in the game is $\big(b_{ij}-\tfrac12u_{ij}(t)\big)u_{ij}(t)-d_{ij}\big(x_{ij}(t)+x_{ji}(t)\big)+A_{ij}$. Since $\big(b_{ij}-\tfrac12u_{ij}(t)\big)u_{ij}(t)\ge 0$ for $u_{ij}\in[0,b_{ij}]$, if $A_{ij}>\max_{t\in[t_0,T]}d_{ij}\big(x_{ij}(t)+x_{ji}(t)\big)$, then all instantaneous payoffs of each player are non-negative at every time $t$.
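Additional conditions:
$$A_{12}>\max\Big[d_{12}\big(x_{12}(t)+x_{21}(t)\big)\Big]=d_{12}\big[b(T-t_0)+x^0_{12}+x^0_{21}\big],$$
$$A_{21}>\max\Big[d_{21}\big(x_{21}(t)+x_{12}(t)\big)\Big]=d_{21}\big[b(T-t_0)+x^0_{21}+x^0_{12}\big],$$
$$A_{23}>\max\Big[d_{23}\big(x_{23}(t)+x_{32}(t)\big)\Big]=d_{23}\Big[\big(x^0_{23}+x^0_{32}\big)e^{-\delta(T-t_0)}+\frac{b}{\delta}\big(1-e^{-\delta(T-t_0)}\big)\Big],$$
$$A_{32}>\max\Big[d_{32}\big(x_{32}(t)+x_{23}(t)\big)\Big]=d_{32}\Big[\big(x^0_{32}+x^0_{23}\big)e^{-\delta(T-t_0)}+\frac{b}{\delta}\big(1-e^{-\delta(T-t_0)}\big)\Big],$$
where $b=b_{12}+b_{21}$ in the first two conditions and $b=b_{23}+b_{32}$ in the last two.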
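The bounds in these conditions come from the largest possible stock when both players emit at their maximal rates: linear growth at rate $b$ on the link $1\leftrightarrow 2$, and the solution of $\dot y=b-\delta y$ on the link $2\leftrightarrow 3$. The Python sketch below evaluates them for the parameter values of the numerical example; the dictionary layout is our own choice.

```python
import math

# Parameters of the numerical example in Section 6.
b12, b21, b23, b32 = 200.0, 250.0, 300.0, 280.0
d12, d21, d23, d32 = 1.0, 1.5, 1.5, 2.0
delta, t0, T = 0.3, 0.0, 5.0
x12_0, x21_0, x23_0, x32_0 = 50.0, 60.0, 60.0, 80.0

# Link 1 <-> 2: no absorption, so the stock grows at most at rate b = b12 + b21.
bound_12 = (b12 + b21) * (T - t0) + x12_0 + x21_0
# Link 2 <-> 3: with absorption, the stock is bounded by the solution of y' = b - delta*y.
decay = math.exp(-delta * (T - t0))
bound_23 = (x23_0 + x32_0) * decay + (b23 + b32) / delta * (1 - decay)

thresholds = {
    "A12": d12 * bound_12,
    "A21": d21 * bound_12,
    "A23": d23 * bound_23,
    "A32": d32 * bound_23,
}
for name, value in thresholds.items():
    print(f"{name} > {value:.2f}")
```

The values of $A_{ij}$ used in the example match these bounds: $A_{12}=2360$ and $A_{21}=3540$ coincide with them, while $A_{23}=2300$ and $A_{32}=3066.5$ lie just above.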
We now compute the Shapley value and the proportional solution. Assume the following values of parameters: $b_{12}=200$, $b_{21}=250$, $b_{23}=300$, $b_{32}=280$, $d_{12}=1$, $d_{21}=1.5$, $d_{23}=1.5$, $d_{32}=2$, $\delta=0.3$, $a(\{1\})=a(\{2\})=a(\{3\})=0.2$, $a(\{1,2\})=a(\{1,3\})=a(\{2,3\})=0.8$, $t_0=0$, $T=5$, $x^0_{12}=50$, $x^0_{21}=60$, $x^0_{23}=60$, $x^0_{32}=80$, $A_{12}=2360$, $A_{21}=3540$, $A_{23}=2300$, $A_{32}=3066.5$. Then $V(N;x_0,T-t_0)=7.0751872\cdot10^5$.
Then we calculate
$$V(\{1\};x_0,T-t_0)=2.114062\cdot10^4,\qquad V(\{2\};x_0,T-t_0)=7.920335\cdot10^4,$$
$$V(\{3\};x_0,T-t_0)=4.115977\cdot10^4,\qquad V(\{1,2\};x_0,T-t_0)=4.5528798\cdot10^5,$$
$$V(\{1,3\};x_0,T-t_0)=2.4920158\cdot10^5,\qquad V(\{2,3\};x_0,T-t_0)=5.6904414\cdot10^5,$$
$$Sh(x_0,T-t_0)=\big(1.5055948\cdot10^5,\ 3.3951212\cdot10^5,\ 2.1744712\cdot10^5\big),$$
$$\beta(x_0,T-t_0)=\big(1.0570313\cdot10^5,\ 3.9601674\cdot10^5,\ 2.0579885\cdot10^5\big).$$
To illustrate time consistency, we choose the proportional solution ($\beta$-value) as the cooperative solution and take $t=2.5$. The payoffs of the players on the time interval $[0,2.5]$ are $(5.422852\cdot10^4,\ 2.0087443\cdot10^5,\ 1.0361796\cdot10^5)$, and on the interval $[2.5,5]$ they are $(5.147461\cdot10^4,\ 1.9514232\cdot10^5,\ 1.0218089\cdot10^5)$; for each player, the two parts sum to the corresponding component of $\beta(x_0,T-t_0)$.
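Time consistency here means that, for each player, the payoff received on $[0,2.5]$ plus the payoff on $[2.5,5]$ reproduces the component of the $\beta$-value for the whole horizon. A short Python check on the reported figures (not a recomputation of the model) is sketched below; small gaps of the order of 0.01 reflect the rounding of the published values.

```python
# Reported in Section 6: beta on [0, 5] and the players' payoffs on the
# sub-intervals [0, 2.5] and [2.5, 5].
beta_total = (1.0570313e5, 3.9601674e5, 2.0579885e5)
first_half = (5.422852e4, 2.0087443e5, 1.0361796e5)
second_half = (5.147461e4, 1.9514232e5, 1.0218089e5)

for player, (total, p1, p2) in enumerate(zip(beta_total, first_half, second_half), start=1):
    gap = total - (p1 + p2)
    print(f"player {player}: {p1 + p2:.2f} vs {total:.2f} (gap {gap:+.2f})")
```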
7. Conclusion
In this paper, we studied differential games with pairwise interactions on networks. A new characteristic function is proposed for the game. Under cooperation, we considered the proportional solution and the Shapley value as cooperative solutions. Finally, the results are illustrated by an example.
References
Breton, M., Zaccour, G., Zahaf, M. (2005). A differential game of joint implementation of environmental projects. Automatica, 41(10), 1737-1749.
Gromova, E., Tur, A., Barsuk, P. (2022). A pollution control problem for the aluminum production in eastern Siberia: Differential game approach. In: Smirnov, N., Golovkina, A. (eds) Stability and Control Processes. SCP 2020. Lecture Notes in Control and Information Sciences - Proceedings. Springer, Cham. https://doi.org/10.1007/978-3-030-87966-244
Petrosyan, L. A. (1993). Strong Time-Consistent Differential Optimality Principles. Vestn. Leningrad. Univ., 4, 35-40.
Petrosyan, L. A. (2010). Cooperative differential games on networks. Trudy Instituta Matematiki i Mekhaniki UrO RAN, 16(5), 143-150.
Petrosyan, L. A., Pankratova, Y. B. (2023). Solutions of cooperative differential games with partner sets. Matematicheskaya Teoriya Igr i Ee Prilozheniya, 15(2), 105-121.
Petrosyan, L. A., Yeung, D. W., Pankratova, Y. B. (2023). Power Degrees in Dynamic Multi-Agent Systems. Trudy Instituta Matematiki i Mekhaniki UrO RAN, 29(3), 128-137.
Shapley, L. S. (1953). A value for N-person Games. In: Kuhn, H., Tucker, A. (eds.) Contributions to the Theory of Games, pp. 307-317. Princeton University Press, Princeton.
Tijs, S. H. (1987). An axiomatization of the τ-value. Math. Soc. Sci., 13, 177-181. https://doi.org/10.1016/0165-4896(87)90054-0
Tur, A. V., Petrosyan, L. A. (2020). The Shapley value for differential network games: Theory and application. Journal of Dynamics and Games, 8(2), 151-166. https://doi.org/10.3934/jdg.2020021
Tur, A. V., Petrosyan, L. A. (2021). Cooperative optimality principles in differential games on networks. Automation and Remote Control, 82, 1095-1106. https://doi.org/10.1134/S0005117921060096
Yeung, D. W. K., Petrosyan, L. A. (2016). Subgame Consistent Cooperation. Springer Science and Business Media LLC, Singapore.