Anna Belitskaia
St. Petersburg State University,
Faculty of Applied Mathematics and Control Processes, Universitetski pr. 35, St. Petersburg, 198504, Russia
E-mail: [email protected]
Abstract. In this paper an n-person network game-theoretic model of emission reduction is considered. Each player has its own evolution of the stock of accumulated pollution. The dynamics of player $i$, $i = 1, \dots, n$, depends on the emissions of players $k \in K_i$, where $K_i$ is the set of players connected with player $i$ by arcs. A Nash equilibrium is constructed and the cooperative game is considered. The ES-value is proposed as the optimal imputation. A restriction on the network structure under which the irrational behavior proof condition holds is derived.
Keywords: network game, Nash equilibrium, ES-value, imputation distribution procedure, irrational behavior proof condition.
1. Introduction
Public interest in environmental problems has increased in recent years. This has led to special intergovernmental agreements on reducing emissions. The parties may disagree on how the costs of reducing emissions or pollution accumulation should be allocated. Considerable attention is devoted to the principles of forming agreements aimed at reducing the level of pollution, including conflicts of interest among the parties to an agreement, as well as to game-theoretic models in the field of environmental protection. One example of such models is the game-theoretic model of pollution cost reduction.
The model of pollution cost reduction was proposed in (Petrosjan and Zaccour, 2003). There are two types of costs in the model: the emission reduction cost incurred when limiting emissions to a specified level, and the damage cost. The aim of each player is to minimize its total cost.
In this paper a network game of emission reduction is considered. The model is based on the model considered in (Petrosjan and Zaccour, 2003).
2. Problem statement
Consider a network differential game on a network $G = (P, L)$, where $P$ is a finite set of vertices and $L$ is a set of pairs $(i, j)$, $i \in P$, $j \in P$, called the set of arcs. Each $p \in P$ is a vertex of the network, and a pair $(p, y) \in L$ is an arc connecting the vertices $p$ and $y$.
Consider the network game of emission reduction $\Gamma(I, L)$, where $I = \{1, 2, \dots, n\}$ is the set of players involved in the network game.
The players of the set $I$ are the vertices of the network.
$L$ is the set of arcs $(i, j)$, $i \in I$, $j \in I$.
Denote the emission of player $i$, $i = 1, 2, \dots, n$, at time $t \in [t_0, \infty)$ by $u_i(t)$.
Denote by:
$K_i$ the set of players that influence the evolution of the stock of accumulated pollution of player $i$; in this model $K_i$ is the set of players connected with player $i$ by an arc;
$M_i$ the set of players that player $i$ influences, i.e. the set of players that have a connection with player $i$;
$m_j$ the number of players in $M_j$, i.e. the number of other players whose stock of accumulated pollution depends on the emissions $u_j$ of player $j$: $|M_j| = m_j$, $M_j \neq \emptyset$, $j \in I$.
Let $x_i(t)$ be the stock of accumulated pollution of player $i$ by time $t$. The evolution of the stock of accumulated pollution of player $i$ is governed by the following differential equation:
$$\dot x_i(t) = \sum_{j \in K_i} \frac{u_j(t)}{2 m_j} + \frac{u_i(t)}{2} - \delta x_i(t), \quad K_i \neq \emptyset, \tag{1}$$
where $\delta$ denotes the natural rate of pollution absorption.
The arc $(i, j)$ belongs to $L$ in the network game of emission reduction if the evolution of the stock of accumulated pollution of player $i$ depends on the emissions of player $j$. The network is directed, i.e. $(i, j) \in L$ does not imply $(j, i) \in L$.
In contrast to the model in (Petrosjan and Zaccour, 2003), each player in the network game of emission reduction has its own stock of accumulated pollution. The evolution of the stock of player $i$ can depend not only on the emissions of player $i$, but also on the emissions of the other players that have connections with player $i$.
The game begins at time $t_0$ with the initial state $x^0 = (x_1^0, x_2^0, \dots, x_n^0)$.
Denote by $C_i(u_i)$ the emission reduction cost incurred by country $i$ when limiting its emission to the level $u_i$:
$$C_i(u_i(t)) = \frac{\gamma}{2}\left(u_i(t) - \bar u_i\right)^2, \quad 0 \le u_i(t) \le \bar u_i, \quad \gamma > 0.$$
Suppose that the following condition holds:
$$\bar u_i \ge \frac{\pi}{\gamma(\rho + \delta)}.$$
$D_i(x_i)$ denotes the damage cost of player $i$:
$$D_i(x_i) = \pi x_i(t), \quad \pi > 0.$$
Both functions are continuously differentiable and convex, with $C_i'(u_i) < 0$ and $D_i'(x_i) > 0$. Each player seeks to minimize its total cost. The payoff function of player $i$ is defined as:
$$K_i(x^0, t_0) = \int_{t_0}^{\infty} e^{-\rho(t - t_0)}\left(C_i(u_i(t)) + D_i(x_i(t))\right) dt,$$
where $\rho$ is the common social discount rate.
Example 1. Consider an example that demonstrates how the evolution of the stock of accumulated pollution of player $i$ is constructed.
Four players participate in the game: $I = \{1, 2, 3, 4\}$.
Player 1 influences player 2 only. That is, the evolutions of the stocks of accumulated pollution of players 1 and 2 depend on the emissions of player 1. Player 1 keeps one half of its own emissions and passes the other half to player 2. Player 2 influences player 3 only.
Player 3 influences players 2 and 4. Player 3 keeps one half of its own emissions, passes one quarter to player 2 and one quarter to player 4. Player 4 influences players 1 and 3.
Thus we obtain the following evolutions of the stocks of accumulated pollution:
$$\dot x_1(t) = \frac{u_1}{2} + \frac{u_4}{4} - \delta x_1(t),$$
$$\dot x_2(t) = \frac{u_2}{2} + \frac{u_1}{2} + \frac{u_3}{4} - \delta x_2(t),$$
$$\dot x_3(t) = \frac{u_3}{2} + \frac{u_2}{2} + \frac{u_4}{4} - \delta x_3(t),$$
$$\dot x_4(t) = \frac{u_4}{2} + \frac{u_3}{4} - \delta x_4(t),$$
$$x_i(t_0) = x_i^0, \quad i = 1, \dots, 4.$$
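The construction of Example 1 can be reproduced mechanically from the list of influences. The following Python sketch is only an illustration: the names `influences`, `build_sets` and `inflow_coefficients` are hypothetical and do not come from the paper, and the code assumes that each player keeps one half of its emissions and splits the other half equally among the $m_j$ players it influences.

```python
from fractions import Fraction

# influences[j] = set of players whose pollution stock is affected by player j
# (Example 1: 1 -> 2, 2 -> 3, 3 -> {2, 4}, 4 -> {1, 3}).
influences = {1: {2}, 2: {3}, 3: {2, 4}, 4: {1, 3}}


def build_sets(influences):
    """Return (K, m): K[i] = players influencing i, m[j] = |M_j|."""
    K = {i: set() for i in influences}
    for j, targets in influences.items():
        for i in targets:
            K[i].add(j)
    m = {j: len(targets) for j, targets in influences.items()}
    return K, m


def inflow_coefficients(influences):
    """coef[i][j] = weight of u_j in the equation for dx_i/dt."""
    K, m = build_sets(influences)
    coef = {}
    for i in influences:
        row = {i: Fraction(1, 2)}            # player i keeps half of u_i
        for j in K[i]:
            row[j] = Fraction(1, 2 * m[j])   # share received from player j
        coef[i] = row
    return coef


coef = inflow_coefficients(influences)
for i in sorted(coef):
    terms = " + ".join(f"{coef[i][j]}*u{j}" for j in sorted(coef[i]))
    print(f"dx{i}/dt = {terms} - delta*x{i}")
```

Under this assumption the weights of each $u_j$ over all equations add up to one, so every unit of emission is assigned to exactly one stock.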
3. Solution of the problem
In subsection 3.1 we calculate a feedback Nash equilibrium. In subsection 3.2 we minimize the total cost of the grand coalition. The solution of the game in the form of the ES-value is considered in subsection 3.3. In subsection 3.4 the time-consistent ES-value distribution procedure is calculated. In subsection 3.5 the irrational behavior proof condition is verified for the network game of emission reduction when the time-consistent ES-value distribution procedure is used, and the restriction on the network structure under which this condition holds is derived.
3.1. Computation of feedback Nash equilibrium.
As the first step we compute a Nash equilibrium. To obtain a feedback Nash equilibrium, assuming differentiability of the value function, the system of Hamilton-Jacobi-Bellman equations must be satisfied. Denote by $F_i(x)$ the Bellman function of this problem. The system is given by:
$$\rho F_i(x) = \min_{u_i}\left\{\frac{\gamma}{2}(u_i - \bar u_i)^2 + \pi x_i + \frac{\partial F_i(x)}{\partial x_i}\left(\sum_{j \in K_i}\frac{u_j}{2 m_j} + \frac{u_i}{2} - \delta x_i\right)\right\}, \quad i \in I, \tag{2}$$
where $x = (x_1, x_2, \dots, x_n)$ is the state of the game.
The costs of player $i$ in any fixed state $x = (x_1, x_2, \dots, x_n)$ depend only on the stock of accumulated pollution of player $i$ and do not depend on the stocks of accumulated pollution of the other players. So we seek the Bellman function $F_i(x)$ in the following form:
$$F_i(x) = a_i x_i + b_i. \tag{3}$$
Differentiating the right-hand side of (2) with respect to $u_i$ and equating the result to zero leads to:
$$u_i^N = \bar u_i - \frac{1}{2\gamma}\frac{\partial F_i(x)}{\partial x_i}. \tag{4}$$
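Substituting $u_i^N$ (4) and the Bellman function $F_i(x)$ (3) into the Hamilton-Jacobi-Bellman equation (2), we get:
$$\rho a_i x_i + \rho b_i = \frac{1}{8\gamma}\left[\frac{\partial(a_i x_i + b_i)}{\partial x_i}\right]^2 + \pi x_i + \left[\sum_{j \in K_i}\left(\bar u_j \frac{1}{2 m_j}\right) + \frac{1}{2}\bar u_i - \sum_{j \in K_i}\frac{1}{4\gamma m_j}\frac{\partial(a_j x_j + b_j)}{\partial x_j} - \frac{1}{4\gamma}\frac{\partial(a_i x_i + b_i)}{\partial x_i} - \delta x_i\right]\frac{\partial(a_i x_i + b_i)}{\partial x_i}. \tag{5}$$
Simplifying the right-hand side of (5) leads to:
$$\rho a_i x_i + \rho b_i = \frac{1}{8\gamma}a_i^2 + \pi x_i + a_i U_i^N - a_i \delta x_i,$$
where
$$U_i^N = \sum_{j \in K_i}\left(\bar u_j \frac{1}{2 m_j}\right) + \frac{1}{2}\bar u_i - \frac{1}{2\gamma}\sum_{j \in K_i}\left(a_j \frac{1}{2 m_j}\right) - \frac{a_i}{4\gamma}.$$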
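Rewrite the Nash strategies (4) in the following form:
$$u_i^N = \bar u_i - \frac{1}{2\gamma}a_i. \tag{6}$$
Let us calculate the coefficients $a_i$ and $b_i$:
$$a_i = \frac{\pi}{\rho + \delta}, \qquad b_i = \frac{\pi^2}{8\rho\gamma(\rho + \delta)^2} + \frac{\pi}{\rho(\rho + \delta)}U_i^N,$$
where
$$U_i^N = \sum_{j \in K_i}\frac{\bar u_j}{2 m_j} + \frac{\bar u_i}{2} - \frac{\pi}{4\gamma(\rho + \delta)}\left(\sum_{j \in K_i}\frac{1}{m_j} + 1\right).$$
Substituting the coefficient $a_i$ into equation (6) gives:
$$u_i^N = \bar u_i - \frac{\pi}{2\gamma(\rho + \delta)}. \tag{7}$$
The cost of player $i$ in the Nash equilibrium is:
$$F_i(x^N) = \frac{\pi}{\rho(\rho + \delta)}\left(\frac{\pi}{8\gamma(\rho + \delta)} + U_i^N + \rho x_i^N\right),$$
where $x_i^N$ is the noncooperative trajectory of player $i$.
Substituting the Nash equilibrium strategies $u_i^N$ (7) into the differential equation (1) with the initial state $x_i(t_0) = x_i^0$, we obtain the following noncooperative trajectory:
$$x_i^N(t) = e^{-\delta(t - t_0)}x_i^0 + \frac{U_i^N}{\delta}\left(1 - e^{-\delta(t - t_0)}\right), \quad i = 1, 2, \dots, n.$$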
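As a sanity check of the closed-form Nash cost, one can integrate the discounted running cost along the noncooperative trajectory numerically and compare the result with the formula $\frac{\pi}{\rho(\rho+\delta)}\left(\frac{\pi}{8\gamma(\rho+\delta)} + U_i^N + \rho x_i^0\right)$. The sketch below is an illustration under stated assumptions: the parameter values, the function names and the truncation of the infinite horizon at 800 time units are all hypothetical choices, and the network is the one of Example 1 with $t_0 = 0$.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
PI, GAMMA, RHO, DELTA = 1.0, 2.0, 0.05, 0.1
# Network of Example 1: K[i] = players influencing i, m[j] = |M_j|.
K = {1: {4}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
m = {1: 1, 2: 1, 3: 2, 4: 2}
u_bar = {1: 10.0, 2: 10.0, 3: 10.0, 4: 10.0}


def nash_emission(i):
    """Constant Nash emission u_i^N = u_bar_i - pi / (2 gamma (rho + delta))."""
    return u_bar[i] - PI / (2.0 * GAMMA * (RHO + DELTA))


def nash_inflow(i):
    """Constant inflow into the stock of player i under Nash emissions."""
    received = sum(nash_emission(j) / (2.0 * m[j]) for j in K[i])
    return received + nash_emission(i) / 2.0


def nash_stock(i, x0, t):
    """Closed-form noncooperative trajectory, t measured from t0 = 0."""
    decay = math.exp(-DELTA * t)
    return decay * x0 + nash_inflow(i) / DELTA * (1.0 - decay)


def nash_cost_closed_form(i, x0):
    c = PI / (RHO + DELTA)
    return c / RHO * (c / (8.0 * GAMMA) + nash_inflow(i) + RHO * x0)


def nash_cost_numeric(i, x0, horizon=800.0, steps=20000):
    """Composite Simpson rule on [0, horizon]; steps must be even."""
    gap = nash_emission(i) - u_bar[i]

    def integrand(t):
        running_cost = 0.5 * GAMMA * gap ** 2 + PI * nash_stock(i, x0, t)
        return math.exp(-RHO * t) * running_cost

    h = horizon / steps
    total = integrand(0.0) + integrand(horizon)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return total * h / 3.0


for i in sorted(K):
    print(i, nash_cost_closed_form(i, 5.0), nash_cost_numeric(i, 5.0))
```

The two printed columns should agree up to the quadrature and truncation error, which is negligible for these values because the discount factor at the chosen horizon is of order $e^{-40}$.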
3.2. Minimization of the total cost of the grand coalition
We minimize the total cost of the grand coalition $I = \{1, 2, \dots, n\}$. This gives the following optimization problem:
$$\min_{u_1, u_2, \dots, u_n}\sum_{i \in I}K_i(x^0, t_0) = \min_{u_1, u_2, \dots, u_n}\sum_{i \in I}\int_{t_0}^{\infty} e^{-\rho(t - t_0)}\left(C_i(u_i(t)) + \pi x_i(t)\right) dt, \tag{8}$$
subject to the dynamics
$$\dot x_i(t) = \sum_{j \in K_i}\frac{u_j(t)}{2 m_j} + \frac{u_i(t)}{2} - \delta x_i(t), \quad x_i(t_0) = x_i^0, \quad i = 1, \dots, n.$$
Rewrite the dynamic programming problem (8) in the following form:
$$\min_{u_1, u_2, \dots, u_n}\sum_{i \in I}K_i(x^0, t_0) = \min_{u_1, u_2, \dots, u_n}\int_{t_0}^{\infty} e^{-\rho(t - t_0)}\left(\sum_{i \in I}C_i(u_i(t)) + \pi\sum_{i \in I}x_i(t)\right) dt. \tag{9}$$
Denote
$$x = \sum_{i \in I}x_i.$$
The functional minimized in the right-hand side of (9) depends only on $x$ and does not depend on the individual stocks $x_i$, $i = 1, \dots, n$. So the minimal cost of the grand coalition $I$ depends on $x$ and does not depend on $x_1, x_2, \dots, x_n$ separately. Therefore we can consider the Bellman function as a function that depends only on $\sum_{i \in I}x_i = x$.
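The solution of problem (9) is equivalent to the solution of the following Hamilton-Jacobi-Bellman equation:
$$\rho F(I, x_1, x_2, \dots, x_n) = \min_{u_1, u_2, \dots, u_n}\left\{\sum_{i=1}^{n}\left(\frac{\gamma}{2}(u_i - \bar u_i)^2 + \pi x_i\right) + \sum_{i=1}^{n}\frac{\partial F(I, x_1, x_2, \dots, x_n)}{\partial x_i}\left(\sum_{j \in K_i}\frac{u_j}{2 m_j} + \frac{u_i}{2} - \delta x_i\right)\right\}, \tag{10}$$
where $F(I, x_1, x_2, \dots, x_n)$ is the Bellman function.
Differentiating the right-hand side of (10) with respect to $u_i$, we get the strategies $u_i^I$:
$$u_i^I = \bar u_i - \frac{1}{2\gamma}\left(\sum_{j \in M_i}\frac{1}{m_i}\frac{\partial F(I, x_1, x_2, \dots, x_n)}{\partial x_j} + \frac{\partial F(I, x_1, x_2, \dots, x_n)}{\partial x_i}\right). \tag{11}$$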
It can be shown in the usual way that the linear function
$$F(I, x_1, x_2, \dots, x_n) = a\sum_{i=1}^{n}x_i + b = ax + b \tag{12}$$
satisfies equation (10). The Bellman function depends only on $\sum_{i \in I}x_i$.
By assumption,
$$F(I, x_1, x_2, \dots, x_n) = F(I, x).$$
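Substitute the strategies $u_i^I$ (11) and the Bellman function (12) into the Hamilton-Jacobi-Bellman equation:
$$\rho a\sum_{i=1}^{n}x_i + \rho b = \sum_{i=1}^{n}\frac{a^2}{2\gamma} + \pi\sum_{i=1}^{n}x_i + a\sum_{i=1}^{n}\left(\sum_{j \in K_i}\frac{u_j^I}{2 m_j} + \frac{u_i^I}{2}\right) - a\delta\sum_{i=1}^{n}x_i, \tag{13}$$
where $u_i^I = \bar u_i - a/\gamma$ by (11) and (12).
Solving equation (13) leads to the following expressions for the coefficients $a$ and $b$:
$$a = \frac{\pi}{\rho + \delta}, \tag{14}$$
$$b = \frac{\pi}{\rho(\rho + \delta)}\left(\sum_{i=1}^{n}\bar u_i - \frac{n\pi}{2\gamma(\rho + \delta)}\right). \tag{15}$$
Taking (14) into account, we get the optimal strategies of the grand coalition:
$$u_i^I = \bar u_i - \frac{\pi}{\gamma(\rho + \delta)}. \tag{16}$$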
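Substituting the coefficients (14) and (15) into formula (12), we get the minimal cost of the grand coalition:
$$F(I, x) = \frac{\pi}{\rho(\rho + \delta)}\left(\rho\sum_{i=1}^{n}x_i^I + \sum_{i=1}^{n}\bar u_i - \frac{n\pi}{2\gamma(\rho + \delta)}\right), \tag{17}$$
where $x_i^I$ is the optimal cooperative trajectory of player $i \in I$.
Substituting the optimal strategies of the grand coalition $u_i^I$ (16) and solving the dynamics equation (1) with the initial state $x_i(t_0) = x_i^0$, we obtain the optimal cooperative trajectory of player $i \in I$:
$$x_i^I(t) = e^{-\delta(t - t_0)}x_i^0 + \frac{1}{\delta}U_i^I\left(1 - e^{-\delta(t - t_0)}\right), \quad i = 1, 2, \dots, n,$$
where
$$U_i^I = \sum_{j \in K_i}\frac{\bar u_j}{2 m_j} + \frac{\bar u_i}{2} - \frac{\pi}{2\gamma(\rho + \delta)}\left(\sum_{j \in K_i}\frac{1}{m_j} + 1\right).$$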
The sum of the Nash emissions of all players is equal to:
$$\sum_{i \in I}u_i^N = \sum_{i \in I}\bar u_i - \frac{n\pi}{2\gamma(\rho + \delta)}.$$
The sum of the optimal emissions of the players involved in the grand coalition $I$ is equal to:
$$\sum_{i \in I}u_i^I = \sum_{i \in I}\bar u_i - \frac{n\pi}{\gamma(\rho + \delta)}.$$
Thus the sum of emissions of all players in the noncooperative case is greater than the sum of emissions of all players in the cooperative case; the difference is $\frac{n\pi}{2\gamma(\rho + \delta)}$. The more players are involved in the game, the more emissions are reduced in the cooperative case as compared with the noncooperative case.
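The comparison of total emissions can be illustrated numerically. The following minimal sketch uses hypothetical parameter values and function names; it only evaluates the two closed-form strategies $\bar u_i - \frac{\pi}{2\gamma(\rho+\delta)}$ and $\bar u_i - \frac{\pi}{\gamma(\rho+\delta)}$ for several numbers of players.

```python
def nash_total(u_bar, pi, gamma, rho, delta):
    """Sum of the Nash emissions, one term per player."""
    return sum(u - pi / (2.0 * gamma * (rho + delta)) for u in u_bar)


def cooperative_total(u_bar, pi, gamma, rho, delta):
    """Sum of the cooperative emissions of the grand coalition."""
    return sum(u - pi / (gamma * (rho + delta)) for u in u_bar)


def emission_gap(u_bar, pi, gamma, rho, delta):
    """Extra emissions in the noncooperative case."""
    return (nash_total(u_bar, pi, gamma, rho, delta)
            - cooperative_total(u_bar, pi, gamma, rho, delta))


# Hypothetical values: pi = 1, gamma = 2, rho = 0.05, delta = 0.1.
PI, GAMMA, RHO, DELTA = 1.0, 2.0, 0.05, 0.1
for n in (2, 4, 8):
    u_bar = [10.0] * n   # every player has the same emission ceiling
    gap = emission_gap(u_bar, PI, GAMMA, RHO, DELTA)
    print(n, gap, n * PI / (2.0 * GAMMA * (RHO + DELTA)))
```

The gap does not depend on the ceilings $\bar u_i$ or on the network structure, only on the number of players and on the cost parameters.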
3.3. The ES-value.
Definition 1. The vector
$$\xi(t) = [\xi_1(t), \xi_2(t), \dots, \xi_n(t)]$$
is an ES-value (Driessen and Funaki, 1991) if its component $\xi_i(t)$ is given by
$$\xi_i(t) = F_i(x^N) + \frac{F(I, x) - \sum_{j \in I}F_j(x^N)}{n}, \quad i \in I, \tag{18}$$
where $F_i(x^N)$ is the cost of player $i$ in the Nash equilibrium and $F(I, x)$ is the minimal cost of the grand coalition.
Let us calculate the ES-value in the network game of emission reduction. Substitute the cost of player $i$ in the Nash equilibrium $F_i(x^N)$ and the minimal cooperative cost $F(I, x)$ (17) into equation (18). The first summand in the right-hand side of equation (18) is given by:
Therefore the component of the ES-value for player $i$, $i \in I$, in the network game of emission reduction is equal to:
3.4. Time-consistency
Time-consistency means that if the agreement is renegotiated at any intermediate instant of time, assuming that the coalitional agreement has prevailed from the initial date until that instant, then the same outcome is obtained. The notion of time-consistency was introduced in (Petrosjan, 1993) and has been used in problems of environmental management (Petrosjan and Zaccour, 2003).
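Definition 2. The vector $\beta(t) = (\beta_1(t), \beta_2(t), \dots, \beta_n(t))$ is an ES-value distribution procedure (ESDP) (see Petrosjan, 1993) if
$$\xi_i(x^0, t_0) = \int_{t_0}^{\infty} e^{-\rho(t - t_0)}\beta_i(t)\,dt, \quad i \in I.$$
Definition 3. The vector $\beta(t) = (\beta_1(t), \beta_2(t), \dots, \beta_n(t))$ is a time-consistent ESDP (Petrosjan, 1993) if at $(x^I(t), t)$, for any $t \in [t_0, \infty)$, the following condition holds:
$$\xi_i(x^0, t_0) = \int_{t_0}^{t} e^{-\rho(\tau - t_0)}\beta_i(\tau)\,d\tau + e^{-\rho(t - t_0)}\xi_i(x^I(t), t), \quad i \in I.$$
Consider the ES-value (19) computed in subsection 3.3.
Straightforward calculations give the following form of the time-consistent ESDP in the network game of emission reduction:
$$\beta_i(t) = \rho\,\xi_i(x^I(t), t) - \frac{d}{dt}\xi_i(x^I(t), t), \quad i \in I. \tag{20}$$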
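The link between a distribution procedure and time-consistency can be checked numerically. The sketch below assumes the relation $\beta_i(t) = \rho\,\xi_i(t) - \frac{d}{dt}\xi_i(t)$ along the cooperative trajectory and verifies that the discounted payments up to time $t$ plus the discounted remaining allocation reproduce the initial allocation. The function `xi` is a hypothetical smooth allocation path chosen only for illustration; it is not the ES-value of the paper, and the discount rate is an arbitrary value.

```python
import math

RHO = 0.05  # hypothetical discount rate, t0 = 0


def xi(t):
    """Hypothetical allocated cost-to-go along the cooperative trajectory."""
    return 3.0 + 2.0 * math.exp(-0.1 * t)


def xi_dot(t):
    """Exact time derivative of xi."""
    return -0.2 * math.exp(-0.1 * t)


def beta(t):
    """Payment rate assumed to satisfy beta = rho * xi - d(xi)/dt."""
    return RHO * xi(t) - xi_dot(t)


def discounted_payments(t, steps=2000):
    """Simpson approximation of the integral of e^{-rho tau} beta(tau) on [0, t]."""
    if t == 0.0:
        return 0.0
    h = t / steps
    total = beta(0.0) + math.exp(-RHO * t) * beta(t)
    for k in range(1, steps):
        tau = k * h
        total += (4.0 if k % 2 else 2.0) * math.exp(-RHO * tau) * beta(tau)
    return total * h / 3.0


def time_consistency_residual(t):
    """Payments so far plus discounted remaining allocation minus xi(0)."""
    return discounted_payments(t) + math.exp(-RHO * t) * xi(t) - xi(0.0)


for t in (0.0, 1.0, 10.0, 50.0):
    print(t, time_consistency_residual(t))
```

The residual should vanish up to quadrature error for every $t$, because $e^{-\rho\tau}\beta(\tau)$ is the exact derivative of $-e^{-\rho\tau}\xi(\tau)$ under the assumed relation.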
3.5. The irrational behavior proof condition
Consider the case where the cooperative scheme has proceeded up to time $t \in [t_0, +\infty)$ and some players behave irrationally, leading to the dissolution of the scheme. The irrational behavior proof condition (Yeung, 2006), also called the D. W. K. Yeung condition, is a condition under which the player concerned still performs better under the cooperative scheme even if irrational behavior appears later in the game.
Consider the solution of the game in the form of the ES-value. The irrational behavior proof condition for the problem of emission reduction is as follows:
$$F_i(x^0) \ge \int_{t_0}^{t} e^{-\rho(\tau - t_0)}\beta_i(\tau)\,d\tau + e^{-\rho(t - t_0)}F_i(x^I(t)), \quad i \in I, \tag{21}$$
where $F_i(x^I(t))$ is the cost of player $i$ in the Nash equilibrium with the initial state $x^I(t)$ on the optimal cooperative trajectory, and $\beta_i(\tau)$ is the time-consistent ES-value distribution procedure.
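Let us verify the irrational behavior proof condition. The left-hand side of inequality (21) is written as follows:
$$F_i(x^0) = \frac{\pi}{\rho(\rho + \delta)}\left(\frac{\pi}{8\gamma(\rho + \delta)} + A_i - \frac{\pi}{4\gamma(\rho + \delta)}\sum_{j \in K_i}\frac{1}{m_j} - \frac{\pi}{4\gamma(\rho + \delta)} + \rho x_i^0\right), \tag{22}$$
where
$$A_i = \sum_{j \in K_i}\frac{\bar u_j}{2 m_j} + \frac{1}{2}\bar u_i.$$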
Consider the integral in the right-hand side of inequality (21). Substituting $\beta_i(\tau)$ (20) leads to the following integral:
e-S(t — to )x
_ — P(T — toiO/_N.7_ . — 0( t — to) _ I 0
/ {~2rtT+ 5r- „ + S
to
(a tt / , 1 s 3,\ 7T2
+ 1 * 7(/3 + (5) 4m3- + 4'J 5(p + S) ) + 2pj(p + S)2 +
nxo I . n / x—r , 1 . 3 \ i n . ,
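The second summand in the right-hand side of inequality (21) can be calculated:
$$e^{-\rho(t - t_0)}F_i(x^I(t)) = e^{-\rho(t - t_0)}\frac{\pi}{\rho(\rho + \delta)}\left\{\frac{\pi}{8\gamma(\rho + \delta)} + A_i - \frac{\pi}{4\gamma(\rho + \delta)}\sum_{j \in K_i}\frac{1}{m_j} - \frac{\pi}{4\gamma(\rho + \delta)} + \rho e^{-\delta(t - t_0)}x_i^0 + \frac{\rho}{\delta}\left(A_i - \frac{\pi}{2\gamma(\rho + \delta)}\left(\sum_{j \in K_i}\frac{1}{m_j} + 1\right)\right)\left(1 - e^{-\delta(t - t_0)}\right)\right\}. \tag{24}$$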
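Substituting the integral (23) and the value of $e^{-\rho(t - t_0)}F_i(x^I(t))$ defined by formula (24) into inequality (21) leads to:
$$\int_{t_0}^{t} e^{-\rho(\tau - t_0)}\beta_i(\tau)\,d\tau + e^{-\rho(t - t_0)}F_i(x^I(t)) = e^{-\rho(t - t_0)}\left(\frac{\pi^2}{8\rho\gamma(\rho + \delta)^2} + \frac{\pi^2}{4\delta\gamma(\rho + \delta)^2} - \frac{\pi^2}{4\delta\gamma(\rho + \delta)^2}\sum_{j \in K_i}\frac{1}{m_j}\right) + \frac{\pi^2}{2\rho\gamma(\rho + \delta)^2} + \frac{\pi}{\rho(\rho + \delta)}\left(A_i - \frac{\pi}{\gamma(\rho + \delta)}\left(\sum_{j \in K_i}\frac{1}{4 m_j} + \frac{3}{4}\right)\right) + x_i^0\frac{\pi}{\rho + \delta}. \tag{25}$$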
Compare the right-hand side (25) and the left-hand side (22) of inequality (21). It can be shown that inequality (21) is equivalent to the following inequality:
$$\delta\left(e^{-\rho(t - t_0)} - 1\right) + 2\rho e^{-\rho(t - t_0)}\left(1 - \sum_{j \in K_i}\frac{1}{m_j}\right) \le 0. \tag{26}$$
At the moment $t = t_0$ inequality (26) takes the following form:
$$1 - \sum_{j \in K_i}\frac{1}{m_j} \le 0. \tag{27}$$
If the sum $\sum_{j \in K_i}\frac{1}{m_j}$ satisfies the inequality
$$\sum_{j \in K_i}\frac{1}{m_j} \ge 1, \tag{28}$$
then inequality (27) is satisfied. Hence inequality (26) is satisfied at time $t = t_0$.
The first summand in the left-hand side of inequality (26) is nonpositive for all $t \in [t_0, +\infty)$. If (28) is satisfied, then the second summand in the left-hand side of inequality (26) is also nonpositive for all $t \in [t_0, +\infty)$. Therefore inequality (26) is satisfied for all $t \in [t_0, +\infty)$ if (28) is satisfied. This proves the following theorem.
Theorem 1. The irrational behavior proof condition holds in the network game of emission reduction for the time-consistent ES-value distribution procedure if the following restriction on the network structure is satisfied:
$$\sum_{j \in K_i}\frac{1}{m_j} \ge 1, \quad i \in I.$$
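The restriction can be checked directly for a given network. The sketch below assumes that the restriction of Theorem 1 reads $\sum_{j \in K_i} 1/m_j \ge 1$ for every player $i$, and that the left-hand side of inequality (26) is $\delta(e^{-\rho(t-t_0)} - 1) + 2\rho e^{-\rho(t-t_0)}(1 - \sum_{j \in K_i} 1/m_j)$. The function names, the three-player ring network and the parameter values are hypothetical illustrations.

```python
import math
from fractions import Fraction


def restriction_sums(influences):
    """Return {i: sum over j in K_i of 1/m_j} for a directed network.

    influences[j] is the set of players whose stock depends on u_j,
    so m_j = len(influences[j]).
    """
    m = {j: len(targets) for j, targets in influences.items()}
    sums = {i: Fraction(0) for i in influences}
    for j, targets in influences.items():
        for i in targets:
            sums[i] += Fraction(1, m[j])
    return sums


def ibp_left_side(t, s, rho, delta):
    """Assumed left-hand side of (26); t is measured from t0, s = sum of 1/m_j."""
    discount = math.exp(-rho * t)
    return delta * (discount - 1.0) + 2.0 * rho * discount * (1.0 - float(s))


example_1 = {1: {2}, 2: {3}, 3: {2, 4}, 4: {1, 3}}
ring = {1: {2}, 2: {3}, 3: {1}}   # hypothetical three-player cycle

for name, network in (("Example 1", example_1), ("ring", ring)):
    sums = restriction_sums(network)
    satisfied = {i: s >= 1 for i, s in sums.items()}
    print(name, satisfied)
```

For the network of Example 1 the sums are 1/2 for players 1 and 4 and 3/2 for players 2 and 3, so under the stated assumption the restriction holds only for players 2 and 3. In the ring every player receives the whole transferred share of exactly one neighbour, so the sum equals 1 for each player.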
References
Driessen, T. S. H. and Y. Funaki (1991). Coincidence of and collinearity between game theoretic solutions. OR Spektrum, 13, 15-30.
Dockner, E. J., S. Jorgensen, N. van Long and G. Sorger (2000). Differential Games in Economics and Management Science. Cambridge University Press, 41-85.
Haurie, A. and G. Zaccour (1995). Differential game models of global environment management. Annals of the International Society of Dynamic Games, 2, 3-24.
Iljina, A. and N. Kozlovskaya (2010). D. W. K. Yeung's condition for the coalitional solution of the game of pollution cost reduction. Graduate School of Management, Contributions to Game Theory and Management, 3, 171-181.
Kaitala, V. and M. Pohjola (1995). Sustainable international agreements on green house warming: a game theory study. Annals of the International Society of Dynamic Games, 2, 67-88.
Owen, G. (1997). Values of games with a priori unions. In: R. Henn and O. Moeschlin (eds.). Mathematical Economy and Game Theory (Berlin), 78-88.
Petrosjan, L. (1993). Differential Games of Pursuit. World Sci. Pbl., 320.
Petrosjan, L. A. (2010). Cooperative differential games on networks. Trudy Inst. Mat. i Mekh. UrO RAN, 16(5), 143-150.
Petrosjan, L. and G. Zaccour (2003). Time-consistent Shapley value allocation of pollution cost reduction. Journal of Economic Dynamics and Control, 27, 381-398.
Yeung, D. W. K. (2006). An irrational-behavior-proofness condition in cooperative differential games. International Game Theory Review, 8, 739-744.
Yeung, D. W. K. and L. A. Petrosjan (2006). Cooperative Stochastic Differential Games. Springer Science+Business Media.