David W.K. Yeung², Leon Petrosyan¹, Vladimir Zhuk¹ and Anna V. Iljina¹
¹ St. Petersburg State University, Faculty of Applied Mathematics and Control Processes, Russia, St. Petersburg
² Department of Business Administration, Hong Kong Shue Yan University, Hong Kong
Abstract. The irrational behavior proof condition for a single player was introduced in Yeung, 2006. In this paper a generalization of this condition to arbitrary coalitions $S \subset N$ is proposed. The condition is demonstrated on a differential cooperative game first considered in Petrosyan and Zaccour, 2003.
It is shown that the dynamic Shapley Value computed for this game satisfies also the irrational behavior proof condition for coalitions.
Keywords: optimal cooperative trajectory, Nash equilibrium, imputation distribution procedure, characteristic function, Shapley value.
1. Irrational Behavior Proof Condition
Consider an n-person differential game $\Gamma(x_0, t_0)$ with infinite duration and independent motions on the time interval $[t_0, +\infty)$. Let $I = \{1, \dots, n\}$ be the set of players. Motion equations have the form:
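where $h_i(x, u)$ is a continuous function, $x(t) = \{x_1(t), \dots, x_n(t)\}$ is the solution of system (1) when open-loop controls $u_1(t), \dots, u_n(t)$ are used, and
$$x(t_0) = \{x_1(t_0), \dots, x_n(t_0)\} = \{x_1^0, \dots, x_n^0\} = x_0$$
is the initial condition.
Each player seeks to decrease his total costs.
Suppose that there exist an n-tuple of open-loop controls $\bar{u}(t) = (\bar{u}_1(t), \dots, \bar{u}_n(t))$ and a trajectory $\bar{x}(t)$, $t \in [t_0, +\infty)$, such that
$$\sum_{i=1}^{n} K_i(x_0, t_0; \bar{u}(t)) = \min_{u} \sum_{i=1}^{n} K_i(x_0, t_0; u(t)), \tag{3}$$
where $K_i$ denotes the total cost (2) of player $i$.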
The trajectory $\bar{x}(t) = (\bar{x}_1(t), \dots, \bar{x}_n(t))$ satisfying (3) is called the "optimal cooperative trajectory".
Define the characteristic function $V(x_0, t_0; S)$ of the cooperative game for $S \subset I$. The computation of the characteristic function values is not standard (it is similar to the definition given in Petrosyan and Zaccour, 2003) and therefore needs to be discussed. We assume that the left-out players ($I \setminus S$) stick to their feedback Nash strategies when the characteristic function value is computed for coalition $S$. The values of the characteristic function $V(x_0, t_0; S)$ are called "the minimal guaranteed cost of coalition $S$".
Let $L(x_0, t_0)$ be the imputation set, where $I = \{1, 2, \dots, n\}$. Let $\alpha_i(t)$ denote the cost of player $i$ under cooperation over the time interval $[t, +\infty)$ along the cooperative path $\bar{x}(t)$ for $t \in [t_0, +\infty)$. Since the imputation satisfies group and individual rationality, we have:
$$V(x_0, t_0; I) = \sum_{j=1}^{n} \alpha_j(t_0), \quad \text{and} \quad \alpha_i(t_0) \le V(x_0, t_0; \{i\}).$$
Suppose $\alpha \in L(x_0, t_0)$. Consider $V(\bar{x}(t), t; S)$ and $L(x_0, t_0)$ along the cooperative trajectory $\bar{x}(t)$. Let the function $\beta_i(t)$ satisfy the condition:
$$\alpha_i(t) = \int_{t}^{\infty} e^{-\rho(\tau - t_0)} \beta_i(\tau)\,d\tau.$$
We call the function $\beta_i(t)$, $t \in [t_0, +\infty)$, the imputation distribution procedure (IDP) (see Petrosyan, 1993 and Petrosyan and Zaccour, 2003). Suppose now that at some intermediate instant of time $t$, $t_0 \le t < \infty$, the irrational behavior of some player (or players) forces the other players to leave the cooperative agreement. Then the irrational behavior proof condition (see Yeung, 2006) requires that the following inequality be satisfied:
$$V(x_0, t_0; \{i\}) \ge \int_{t_0}^{t} e^{-\rho(\tau - t_0)} \beta_i(\tau)\,d\tau + e^{-\rho(t - t_0)} V(\bar{x}(t); \{i\}), \quad i \in I, \tag{4}$$
where $V(x_0, t_0; \{i\})$ is the minimal guaranteed cost of player $i$ with the initial state $x_0$ when he plays individually, and $V(\bar{x}(t); \{i\})$ is the similar cost with initial state $\bar{x}(t)$ on the cooperative trajectory. The first term on the right-hand side of condition (4) is the cost that player $i$ incurs under cooperation on the interval $[t_0, t]$ using the cost distribution over time $\beta_i(\tau)$ along the cooperative trajectory $\bar{x}(t)$. Since the costs are calculated from the initial time $t_0$, the second term on the right-hand side of inequality (4) is discounted. The irrational behavior proof condition means that the minimal guaranteed cost of player $i$ calculated at time $t_0$, provided that all players act individually, has to be no less than his cost calculated at time $t_0$ in the case when player $i$ participates in the Grand coalition $I$ during $[t_0, t]$, the coalition $I$ breaks up at moment $t$, and all players play individually over the time interval $[t, +\infty)$. If condition (4) is satisfied, player $i$ is irrational-behavior-proof (I-B-P), because irrational actions leading to the dissolution of the cooperative scheme will not bring his resultant costs above his initial noncooperative costs.
$V(x_0, t_0; \{i\})$ is constant. If $t = t_0$, inequality (4) turns into an identity. A sufficient condition for inequality (4) to hold is therefore that the function on its right-hand side be monotonically non-increasing in $t$, and a sufficient condition for this is the non-positivity of its derivative.
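Differentiating the right-hand side of inequality (4) with respect to $t$ leads to
$$\frac{d}{dt}\left[\int_{t_0}^{t} e^{-\rho(\tau - t_0)} \beta_i(\tau)\,d\tau + e^{-\rho(t - t_0)} V(\bar{x}(t); \{i\})\right] = e^{-\rho(t - t_0)} \beta_i(t) + e^{-\rho(t - t_0)} \frac{d}{dt} V(\bar{x}(t); \{i\}) - \rho e^{-\rho(t - t_0)} V(\bar{x}(t); \{i\}).$$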
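Then we get the following sufficient condition for inequality (4) to hold for the IDP $\beta(\tau) = (\beta_1(\tau), \beta_2(\tau), \dots, \beta_n(\tau))$:
$$e^{-\rho(\tau - t_0)} \beta_i(\tau) \le -e^{-\rho(\tau - t_0)} \frac{d}{d\tau} V(\bar{x}(\tau); \{i\}) + \rho e^{-\rho(\tau - t_0)} V(\bar{x}(\tau); \{i\}), \quad i = 1, \dots, n. \tag{5}$$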
Multiplying both sides of inequality (5) by $e^{\rho(\tau - t_0)}$, we get:
$$\beta_i(\tau) \le -\frac{d}{d\tau} V(\bar{x}(\tau); \{i\}) + \rho V(\bar{x}(\tau); \{i\}), \quad i = 1, \dots, n. \tag{6}$$
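The step from the pointwise condition back to inequality (4) can be spelled out as follows. This is only a sketch: $G_i(t)$ is shorthand introduced here for the right-hand side of (4) and is not notation from the original text.

```latex
G_i(t) := \int_{t_0}^{t} e^{-\rho(\tau - t_0)} \beta_i(\tau)\,d\tau
          + e^{-\rho(t - t_0)} V(\bar{x}(t); \{i\}),
\qquad
G_i(t_0) = V(x_0, t_0; \{i\}).

% Condition (5) states exactly that G_i'(\tau) \le 0, hence
G_i(t) = G_i(t_0) + \int_{t_0}^{t} G_i'(\tau)\,d\tau
       \le G_i(t_0) = V(x_0, t_0; \{i\}),
\qquad t \ge t_0,
```

which is inequality (4).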
In (6), $V(\bar{x}(\tau); \{i\})$ is the value of the zero-sum game played between coalition $I \setminus \{i\}$, acting as one player, and player $i$, with the coalitional payoff equal to $[-H_i(\bar{x}(\tau); u_1, \dots, u_n)]$. Suppose that $y(t)$, $t \in [\tau, +\infty)$, is the trajectory of this zero-sum game when the saddle point strategies are played. We suppose that for each initial condition $\bar{x}(\tau)$, $\tau \in [t_0, +\infty)$, such a saddle point exists (if not, we can consider an $\varepsilon$-saddle point in piecewise open-loop strategies, which exists for every given $\varepsilon > 0$, but the following formulas are then to be understood with $\varepsilon$-accuracy).
Substitute the initial state on the optimal cooperative trajectory $\bar{x}(\tau)$ and the trajectory $y(t)$ into the payoff (2) of player $i$:
$$V(\bar{x}(\tau); \{i\}) = \int_{\tau}^{\infty} e^{-\rho(t - t_0)} h_i(\bar{x}(\tau); y(t))\,dt, \tag{7}$$
where $y(\tau) = \bar{x}(\tau)$.
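Substitute the value of the zero-sum game $V(\bar{x}(\tau); \{i\})$ from (7) into inequality (5) (equivalently, (6)) and compute the derivative on the right-hand side:
$$e^{-\rho(\tau - t_0)} \beta_i(\tau) \le -\frac{d}{d\tau}\left(e^{-\rho(\tau - t_0)} \int_{\tau}^{\infty} e^{-\rho(t - t_0)} h_i(\bar{x}(\tau); y(t))\,dt\right).$$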
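Then, using the definition of the differential of a function of several variables and the motion equations (1), we obtain:
$$-\frac{d}{d\tau}\left(e^{-\rho(\tau - t_0)} \int_{\tau}^{\infty} e^{-\rho(t - t_0)} h_i(\bar{x}(\tau); y(t))\,dt\right) = \rho e^{-\rho(\tau - t_0)} \int_{\tau}^{\infty} e^{-\rho(t - t_0)} h_i(\bar{x}(\tau); y(t))\,dt - e^{-\rho(\tau - t_0)} \frac{d}{d\tau} \int_{\tau}^{\infty} e^{-\rho(t - t_0)} h_i(\bar{x}(\tau); y(t))\,dt$$
$$= \rho e^{-\rho(\tau - t_0)} \int_{\tau}^{\infty} e^{-\rho(t - t_0)} h_i(\bar{x}(\tau); y(t))\,dt - e^{-\rho(\tau - t_0)} \left[ -e^{-\rho(\tau - t_0)} h_i(\bar{x}(\tau); y(\tau)) + \int_{\tau}^{\infty} e^{-\rho(t - t_0)} \sum_{l=1}^{n} \sum_{k=1}^{m} \frac{\partial h_i(\bar{x}(\tau); y(t))}{\partial x_{lk}} f_{lk}(\bar{x}(\tau), \bar{u}(\tau))\,dt \right]$$
$$= \rho e^{-\rho(\tau - t_0)} \int_{\tau}^{\infty} e^{-\rho(t - t_0)} h_i(\bar{x}(\tau); y(t))\,dt + e^{-2\rho(\tau - t_0)} h_i(\bar{x}(\tau); \bar{x}(\tau)) - e^{-\rho(\tau - t_0)} \int_{\tau}^{\infty} e^{-\rho(t - t_0)} \sum_{l=1}^{n} \sum_{k=1}^{m} \frac{\partial h_i(\bar{x}(\tau); y(t))}{\partial x_{lk}} f_{lk}(\bar{x}(\tau), \bar{u}(\tau))\,dt.$$
Hence
$$e^{-\rho(\tau - t_0)} \beta_i(\tau) \le \rho e^{-\rho(\tau - t_0)} \int_{\tau}^{\infty} e^{-\rho(t - t_0)} h_i(\bar{x}(\tau); y(t))\,dt + e^{-2\rho(\tau - t_0)} h_i(\bar{x}(\tau); \bar{x}(\tau)) - e^{-\rho(\tau - t_0)} \int_{\tau}^{\infty} e^{-\rho(t - t_0)} \sum_{l=1}^{n} \sum_{k=1}^{m} \frac{\partial h_i(\bar{x}(\tau); y(t))}{\partial x_{lk}} f_{lk}(\bar{x}(\tau), \bar{u}(\tau))\,dt. \tag{8}$$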
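Multiplying (8) by $e^{\rho(\tau - t_0)}$ gives:
$$\beta_i(\tau) \le \rho \int_{\tau}^{\infty} e^{-\rho(t - t_0)} h_i(\bar{x}(\tau); y(t))\,dt + e^{-\rho(\tau - t_0)} h_i(\bar{x}(\tau); \bar{x}(\tau)) - \int_{\tau}^{\infty} e^{-\rho(t - t_0)} \sum_{l=1}^{n} \sum_{k=1}^{m} \frac{\partial h_i(\bar{x}(\tau); y(t))}{\partial x_{lk}} f_{lk}(\bar{x}(\tau), \bar{u}(\tau))\,dt.$$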
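If $\alpha(t) \in M(\bar{x}(t), T - t) \subset L(\bar{x}(t), T - t)$, where $M(\bar{x}(t), T - t)$ is some fixed optimality principle (core, NM-solution, Shapley value), and if $\alpha(t)$ can be chosen as a differentiable selector, we can write
$$\beta_i(t) = \rho\,\alpha_i(t) - \frac{d}{dt}\alpha_i(t),$$
which also means time-consistency of $\alpha$ (see Petrosyan, 1993). Condition (8) then gives, in terms of the optimal imputation,
$$\rho\,\alpha_i(\tau) - \frac{d}{d\tau}\alpha_i(\tau) \le \rho \int_{\tau}^{\infty} e^{-\rho(t - t_0)} h_i(\bar{x}(\tau); y(t))\,dt + e^{-\rho(\tau - t_0)} h_i(\bar{x}(\tau); \bar{x}(\tau)) - \int_{\tau}^{\infty} e^{-\rho(t - t_0)} \sum_{l=1}^{n} \sum_{k=1}^{m} \frac{\partial h_i(\bar{x}(\tau); y(t))}{\partial x_{lk}} f_{lk}(\bar{x}(\tau), \bar{u}(\tau))\,dt,$$
which now guarantees both time-consistency and irrational behavior proof.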
2. The Irrational Behavior Proof Condition for Coalitions
In Section 1 we considered the irrational behavior proof condition for player $i$. We can introduce a similar condition for coalitions:
$$V(x_0, t_0; \{S_k\}) \ge \sum_{i \in S_k} \int_{t_0}^{t} e^{-\rho(\tau - t_0)} \beta_i(\tau)\,d\tau + e^{-\rho(t - t_0)} V(\bar{x}(t); \{S_k\}), \tag{9}$$
where $S_k$ is any coalition in the n-person differential game $\Gamma(x_0, t_0)$ and player $i$ is included in the coalition $S_k$. $V(x_0, t_0; \{S_k\})$ is the minimal guaranteed cost of coalition $S_k$ with the initial state $x_0$ on the interval $[t_0, +\infty)$ when the left-out players ($I \setminus S_k$) stick to their feedback Nash strategies. $V(\bar{x}(t); \{S_k\})$ is the similar cost with the initial state on the optimal cooperative trajectory $\bar{x}(t)$ on the interval $[t, +\infty)$. The first term on the right-hand side of inequality (9) is the sum of the costs of the players included in coalition $S_k$, obtained using the IDP $\beta_i(\tau)$ while the players are involved in the Grand coalition.
Condition (9) means that the minimal guaranteed cost of coalition $S_k$ calculated at the moment $t_0$, provided that coalition $S_k$ operates on its own, should be no less than the minimal guaranteed cost calculated at moment $t_0$ in the case when the Grand coalition $I$ breaks up at the moment $t$ and the coalition $S_k$ survives until the end of the game.
2.1. Example
Consider the game of emission reduction, for which the irrational behavior proof condition for coalitions is satisfied. This problem was considered earlier by Kaitala and Pohjola (1995) and Petrosyan and Zaccour (2000).
Problem statement. The dynamics of the model was proposed in Petrosyan and Zaccour, 2003.
Let $I$ be the set of countries involved in the game of emission reduction: $I = \{1, \dots, n\}$. The game starts at the instant of time $t_0$ from the initial state $x_0$. The emission of player $i$ ($i = 1, \dots, n$) at time $t$, $t \in [t_0, \infty)$, is denoted $u_i(t)$. Let $x(t)$ denote the stock of accumulated pollution by time $t$. The evolution of this stock is governed by the following differential equation:
$$\dot{x}(t) = \sum_{i \in I} u_i(t) - \delta x(t), \quad x(t_0) = x_0, \tag{10}$$
where $\delta$ denotes the natural rate of pollution absorption. For brevity we write $u_i(t) = u_i$ and $x(t) = x$. $C_i(u_i)$ is the emission reduction cost incurred by country $i$ while limiting its emission to level $u_i$:
$$C_i(u_i(t)) = \frac{\gamma}{2}\,(u_i(t) - \bar{u}_i)^2, \quad 0 \le u_i(t) \le \bar{u}_i, \quad \gamma > 0.$$
$D_i(x(t))$ denotes its damage cost:
$$D_i(x) = \pi x(t), \quad \pi > 0.$$
Both functions are continuously differentiable and convex, with $C_i'(u_i) < 0$ and $D_i'(x) > 0$. The payoff function of player $i$ is defined as
$$K_i(x_0, t_0, u) = \int_{t_0}^{\infty} e^{-\rho(t - t_0)} \left( C_i(u_i(t)) + D_i(x(t)) \right) dt,$$
subject to the dynamics (10), where $u = (u_1, \dots, u_n)$ and $\rho$ is the common social discount rate. We now check the irrational behavior proof condition for coalitions for the problem of emission reduction.
Solution of the problem. The irrational behavior proof condition for coalitions for the problem of emission reduction reads as follows:
$$V(x_0; \{S_k\}) \ge \sum_{i \in S_k} \int_{t_0}^{t} e^{-\rho(\tau - t_0)} \beta_i(\tau)\,d\tau + e^{-\rho(t - t_0)} V(\bar{x}(t); \{S_k\}). \tag{11}$$
For completeness we present the formulas, first obtained in Petrosyan and Zaccour, 2003, for the feedback Nash equilibrium $u^N$, the optimal cooperative trajectory $\bar{x}(t)$, the optimal cost of the Grand coalition, the optimal cost of intermediate coalitions, the Shapley value and a time-consistent IDP.
Since the game is played over the infinite time horizon, we look for stationary strategies. To obtain a feedback Nash equilibrium, assuming differentiability of the value function, the following Hamilton-Jacobi-Bellman (H-J-B) equations must be satisfied:
$$\rho F_i(x, t) = \min_{u_i} \left\{ C_i(u_i) + D_i(x) + F_i'(x) \left[ \sum_{j \in I} u_j(t) - \delta x(t) \right] \right\}.$$
From the H-J-B equation we obtain the Nash emission strategy:
$$u_i^N = \bar{u}_i - \frac{\pi}{\gamma(\rho + \delta)}.$$
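The intermediate step behind the Nash emission strategy can be sketched as follows. The linear ansatz $F_i(x) = A_i + Bx$ and the constants $A_i$, $B$ are assumptions introduced here for illustration; they are consistent with the linear-in-$x$ value function of the example but are not spelled out in the original text.

```latex
% First-order condition of the H-J-B minimization with respect to u_i:
\gamma\,(u_i - \bar{u}_i) + F_i'(x) = 0
\quad\Longrightarrow\quad
u_i = \bar{u}_i - \frac{F_i'(x)}{\gamma}.

% With F_i(x) = A_i + B x, matching the coefficients of x in the H-J-B equation:
\rho B = \pi - \delta B
\quad\Longrightarrow\quad
B = \frac{\pi}{\rho + \delta},
\qquad
u_i^N = \bar{u}_i - \frac{\pi}{\gamma(\rho + \delta)}.
```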
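For these Nash strategies the value function $F_i(x, t)$ satisfying the H-J-B equation is:
$$F_i(x, t) = \frac{\pi}{\rho(\rho + \delta)} \left\{ \frac{\pi}{2\gamma(\rho + \delta)} + \sum_{j \in I} \bar{u}_j - \frac{n\pi}{\gamma(\rho + \delta)} + \rho x \right\}.$$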
To compute the cost of the Grand coalition we need to solve a standard dynamic programming problem. The following Bellman equation must be satisfied:
$$\rho F(I, x, t) = \min_{u_i,\ i \in I} \left\{ \sum_{i \in I} \left( C_i(u_i) + D_i(x) \right) + F'(I, x, t) \left[ \sum_{i \in I} u_i(t) - \delta x(t) \right] \right\}.$$
The optimal emission strategy satisfying the Bellman equation is:
$$u_i^I = \bar{u}_i - \frac{n\pi}{\gamma(\rho + \delta)}, \quad i \in I.$$
The function $F(I, x, t)$ in this case takes the following value:
$$F(I, x, t) = \frac{n\pi}{\rho(\rho + \delta)} \left\{ \sum_{i \in I} \bar{u}_i - \frac{n^2\pi}{2\gamma(\rho + \delta)} + \rho x \right\}.$$
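To get the time representation of the accumulated stock of pollution, we insert the expression for $u_i^I$ into (10) and solve to obtain the optimal cooperative trajectory:
$$\bar{x}(t) = e^{-\delta(t - t_0)} x_0 + \frac{1}{\delta} \left( \sum_{i \in I} \bar{u}_i - \frac{n^2\pi}{\gamma(\rho + \delta)} \right) \left( 1 - e^{-\delta(t - t_0)} \right). \tag{12}$$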
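As a sanity check on the cooperative trajectory (12), the following minimal sketch compares the closed form with a forward-Euler integration of the dynamics (10) under the cooperative emissions. All numerical parameter values and the function names are illustrative assumptions, not taken from the paper.

```python
import math

# Illustrative parameter values (assumptions, not from the paper).
n = 3                          # number of countries
u_bar = [10.0, 12.0, 8.0]      # emission upper bounds u_bar_i
gamma, pi_, rho, delta = 2.0, 0.1, 0.05, 0.1
x0, t0 = 5.0, 0.0


def total_coop_emission():
    """Sum over i of the cooperative emission u_bar_i - n*pi/(gamma*(rho+delta))."""
    return sum(u_bar) - n * n * pi_ / (gamma * (rho + delta))


def x_coop(t):
    """Closed-form cooperative trajectory, formula (12)."""
    decay = math.exp(-delta * (t - t0))
    return decay * x0 + total_coop_emission() / delta * (1.0 - decay)


def x_euler(t, steps=100_000):
    """Forward-Euler integration of dx/dt = sum_i u_i - delta*x from t0 to t."""
    x, h = x0, (t - t0) / steps
    for _ in range(steps):
        x += h * (total_coop_emission() - delta * x)
    return x


if __name__ == "__main__":
    for t in (1.0, 5.0, 20.0):
        print(t, x_coop(t), x_euler(t))
```

The two columns printed should agree up to the discretization error of the Euler scheme.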
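To compute the optimal cost of an intermediate coalition $S_k$ we consider the Bellman equation:
$$\rho F(S_k, x, t) = \min_{u_i,\ i \in S_k} \left\{ \sum_{i \in S_k} \left( C_i(u_i) + D_i(x) \right) + F'(S_k, x, t) \left[ \sum_{i \in S_k} u_i(t) + \sum_{i \in I \setminus S_k} u_i^N(t) - \delta x(t) \right] \right\}.$$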
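By a procedure very similar to the one adopted for the Grand coalition we get the following results:
$$F(S_k, x, t) = \frac{k\pi}{\rho(\rho + \delta)} \left\{ \sum_{i \in I} \bar{u}_i - \frac{k^2\pi}{2\gamma(\rho + \delta)} - \frac{\pi(n - k)}{\gamma(\rho + \delta)} + \rho x(t) \right\}, \quad n = |I|,$$
$$u_i^{S_k} = \bar{u}_i - \frac{k\pi}{\gamma(\rho + \delta)}, \quad \forall i \in S_k, \quad k = |S_k|.$$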
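The coalition cost can be checked numerically. The sketch below evaluates $F(S_k, x)$ in the linear form $\frac{k\pi}{\rho(\rho+\delta)}\{\sum_{i \in I}\bar{u}_i - \frac{k^2\pi}{2\gamma(\rho+\delta)} - \frac{\pi(n-k)}{\gamma(\rho+\delta)} + \rho x\}$ and computes the residual of the Bellman equation when coalition members emit $\bar{u}_i - k\pi/(\gamma(\rho+\delta))$ and outsiders play the Nash strategy. Function names and parameter values are hypothetical and chosen only for illustration.

```python
def coalition_cost(k, x, n, sum_u_bar, gamma, pi_, rho, delta):
    """Cost-to-go F(S_k, x) of a k-member coalition; outsiders play Nash."""
    c = gamma * (rho + delta)
    bracket = sum_u_bar - k**2 * pi_ / (2.0 * c) - (n - k) * pi_ / c + rho * x
    return k * pi_ / (rho * (rho + delta)) * bracket


def bellman_residual(k, x, n, sum_u_bar, gamma, pi_, rho, delta):
    """rho*F minus the Bellman right-hand side at the candidate optimum."""
    c = gamma * (rho + delta)
    slope = k * pi_ / (rho + delta)                  # dF/dx of the linear value
    abatement_cost = k * 0.5 * gamma * (k * pi_ / c) ** 2
    damage_cost = k * pi_ * x
    # Total emissions: k members cut by k*pi/c each, n-k outsiders by pi/c each.
    drift = sum_u_bar - k * (k * pi_ / c) - (n - k) * pi_ / c - delta * x
    rhs = abatement_cost + damage_cost + slope * drift
    return rho * coalition_cost(k, x, n, sum_u_bar, gamma, pi_, rho, delta) - rhs


if __name__ == "__main__":
    params = dict(n=3, sum_u_bar=30.0, gamma=2.0, pi_=0.1, rho=0.05, delta=0.1)
    for k in (1, 2, 3):
        print(k, coalition_cost(k, 5.0, **params), bellman_residual(k, 5.0, **params))
```

For $k = 1$ the function reproduces the individual Nash cost $F_i$ and for $k = n$ the Grand coalition cost $F(I, x, t)$, so a single routine covers all three cases of the characteristic function.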
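The characteristic function $V(x, t, \{S_k\})$ of the cooperative game is defined as follows:
$$V(x, t, \{i\}) = F_i(x, t) = \frac{\pi}{\rho(\rho + \delta)} \left\{ \frac{\pi}{2\gamma(\rho + \delta)} + \sum_{j \in I} \bar{u}_j - \frac{n\pi}{\gamma(\rho + \delta)} + \rho x \right\},$$
$$V(x, t, \{S_k\}) = F(S_k, x, t) = \frac{k\pi}{\rho(\rho + \delta)} \left\{ \sum_{i \in I} \bar{u}_i - \frac{k^2\pi}{2\gamma(\rho + \delta)} - \frac{\pi(n - k)}{\gamma(\rho + \delta)} + \rho x(t) \right\}, \tag{13}$$
$$V(x, t, I) = F(I, x, t) = \frac{n\pi}{\rho(\rho + \delta)} \left\{ \sum_{i \in I} \bar{u}_i - \frac{n^2\pi}{2\gamma(\rho + \delta)} + \rho x \right\}.$$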
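As the imputation in this example we consider the Shapley value. Denote it by $\phi(V, x, t) = (\phi_1(V, x, t), \dots, \phi_n(V, x, t))$. Component $i$ is given by:
$$\phi_i(V, x, t) = \frac{\pi}{\rho(\rho + \delta)} \left\{ \sum_{i \in I} \bar{u}_i - \frac{n^2\pi}{2\gamma(\rho + \delta)} + \rho x \right\}.$$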
Define a time-consistent IDP. Allocate to player $i$, $i = 1, 2, \dots, n$, at the instant of time $t \in [t_0, +\infty)$, the following amount:
$$\beta_i(t) = \rho\,\phi_i(V, x, t) - \frac{d}{dt}\phi_i(V, x, t). \tag{14}$$
The formula (14) allocates at instant of time t to player i a cost corresponding to the interest payment (interest rate times his cost-to-go under cooperation given by his Shapley value) minus the variation over time of this cost-to-go.
The following proposition shows that $\beta(t) = (\beta_1(t), \dots, \beta_n(t))$, as given by (14), is indeed a time-consistent IDP.
Substituting the Shapley value in (14) leads to:
$$\beta_i(t) = \frac{n^2\pi^2}{2\gamma(\rho + \delta)^2} + \pi\,\bar{x}(t). \tag{15}$$
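The passage from (14) to (15) can be verified numerically. The sketch below takes the Shapley component in the symmetric form $\phi_i = F(I, x, t)/n$, evaluates it along the cooperative trajectory, approximates the time derivative by a central difference, and compares the result with the closed form $\frac{n^2\pi^2}{2\gamma(\rho+\delta)^2} + \pi\bar{x}(t)$. Parameter values and function names are illustrative assumptions.

```python
import math

# Illustrative parameter values (assumptions, not from the paper).
n, sum_u_bar = 3, 30.0
gamma, pi_, rho, delta = 2.0, 0.1, 0.05, 0.1
x0, t0 = 5.0, 0.0
c = gamma * (rho + delta)


def x_coop(t):
    """Cooperative trajectory (12)."""
    steady = (sum_u_bar - n**2 * pi_ / c) / delta
    decay = math.exp(-delta * (t - t0))
    return decay * x0 + steady * (1.0 - decay)


def shapley(t):
    """Shapley component phi_i = F(I, x)/n evaluated on the cooperative path."""
    bracket = sum_u_bar - n**2 * pi_ / (2.0 * c) + rho * x_coop(t)
    return pi_ / (rho * (rho + delta)) * bracket


def idp_from_definition(t, h=1e-5):
    """Formula (14): rho*phi_i - d(phi_i)/dt, derivative by central difference."""
    dphi_dt = (shapley(t + h) - shapley(t - h)) / (2.0 * h)
    return rho * shapley(t) - dphi_dt


def idp_closed_form(t):
    """Formula (15)."""
    return n**2 * pi_**2 / (2.0 * gamma * (rho + delta) ** 2) + pi_ * x_coop(t)


if __name__ == "__main__":
    for t in (0.0, 1.0, 10.0, 50.0):
        print(t, idp_from_definition(t), idp_closed_form(t))
```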
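The game of emission reduction is symmetric, so the minimal guaranteed costs, and hence the IDP components $\beta_i(\tau)$ given by (15), are the same for every player. Then we can write:
$$\sum_{i \in S_k} \int_{t_0}^{t} e^{-\rho(\tau - t_0)} \beta_i(\tau)\,d\tau = k \int_{t_0}^{t} e^{-\rho(\tau - t_0)} \beta_i(\tau)\,d\tau, \quad k = |S_k|.$$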
Rewriting inequality (11):
$$V(x_0; \{S_k\}) \ge k \int_{t_0}^{t} e^{-\rho(\tau - t_0)} \beta_i(\tau)\,d\tau + e^{-\rho(t - t_0)} V(\bar{x}(t); \{S_k\}). \tag{16}$$
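Let us calculate $V(x_0; \{S_k\})$:
$$V(x_0; \{S_k\}) = \frac{k\pi}{\rho(\rho + \delta)} \left( \sum_{i \in I} \bar{u}_i - \frac{k^2\pi}{2\gamma(\rho + \delta)} - \frac{\pi(n - k)}{\gamma(\rho + \delta)} + \rho x_0 \right). \tag{17}$$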
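Substituting the trajectory $\bar{x}(t)$ from (12) into the characteristic function (13) and multiplying the resulting expression by $e^{-\rho(t - t_0)}$ leads to:
$$e^{-\rho(t - t_0)} V(\bar{x}(t); \{S_k\}) = e^{-\rho(t - t_0)} \frac{k\pi}{\rho(\rho + \delta)} \left( \sum_{i \in I} \bar{u}_i - \frac{k^2\pi}{2\gamma(\rho + \delta)} - \frac{\pi(n - k)}{\gamma(\rho + \delta)} + \rho \left\{ e^{-\delta(t - t_0)} x_0 + \frac{1}{\delta} \left( \sum_{i \in I} \bar{u}_i - \frac{n^2\pi}{\gamma(\rho + \delta)} \right) \left( 1 - e^{-\delta(t - t_0)} \right) \right\} \right).$$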
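Consider the integral on the right-hand side of inequality (11). Substitute the IDP for the Shapley value $\beta_i(\tau)$ from (15) and the optimal cooperative trajectory $\bar{x}(\tau)$ from (12) into the integral:
$$\int_{t_0}^{t} e^{-\rho(\tau - t_0)} \beta_i(\tau)\,d\tau = \int_{t_0}^{t} e^{-\rho(\tau - t_0)} \left( \frac{n^2\pi^2}{2\gamma(\rho + \delta)^2} + \pi e^{-\delta(\tau - t_0)} x_0 + \frac{\pi}{\delta} \left( \sum_{i \in I} \bar{u}_i - \frac{n^2\pi}{\gamma(\rho + \delta)} \right) \left( 1 - e^{-\delta(\tau - t_0)} \right) \right) d\tau.$$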
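Compute the integral and simplify the resulting expression:
$$\int_{t_0}^{t} e^{-\rho(\tau - t_0)} \beta_i(\tau)\,d\tau = e^{-\rho(t - t_0)} \pi \left( -\frac{n^2\pi}{2\rho\gamma(\rho + \delta)^2} - \frac{e^{-\delta(t - t_0)} x_0}{\rho + \delta} - \left( \sum_{i \in I} \bar{u}_i - \frac{n^2\pi}{\gamma(\rho + \delta)} \right) \left( \frac{1}{\rho\delta} - \frac{e^{-\delta(t - t_0)}}{\delta(\rho + \delta)} \right) \right)$$
$$+ \pi \left( \frac{n^2\pi}{2\rho\gamma(\rho + \delta)^2} + \frac{x_0}{\rho + \delta} + \left( \sum_{i \in I} \bar{u}_i - \frac{n^2\pi}{\gamma(\rho + \delta)} \right) \left( \frac{1}{\rho\delta} - \frac{1}{\delta(\rho + \delta)} \right) \right).$$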
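Adding $k \int_{t_0}^{t} e^{-\rho(\tau - t_0)} \beta_i(\tau)\,d\tau$ and $e^{-\rho(t - t_0)} V(\bar{x}(t); \{S_k\})$ and simplifying the resulting expression, we get:
$$k \int_{t_0}^{t} e^{-\rho(\tau - t_0)} \beta_i(\tau)\,d\tau + e^{-\rho(t - t_0)} V(\bar{x}(t); \{S_k\}) = e^{-\rho(t - t_0)} \frac{k\pi^2}{\gamma\rho(\rho + \delta)^2} \left( \frac{n^2}{2} - \frac{k^2}{2} - n + k \right) + \frac{k\pi x_0}{\rho + \delta} + \sum_{i \in I} \bar{u}_i \frac{k\pi}{\rho(\rho + \delta)} - \frac{k n^2 \pi^2}{2\rho\gamma(\rho + \delta)^2}. \tag{18}$$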
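Substitute the left-hand side (17) and the right-hand side (18) into inequality (11) and rewrite the irrational behavior proof condition for coalitions for the problem of emission reduction:
$$\frac{k\pi}{\rho(\rho + \delta)} \sum_{i \in I} \bar{u}_i - \frac{k^3\pi^2}{2\rho\gamma(\rho + \delta)^2} - \frac{\pi^2(n - k)k}{\rho\gamma(\rho + \delta)^2} + \frac{k\pi x_0}{\rho + \delta} \ge e^{-\rho(t - t_0)} \frac{k\pi^2}{\gamma\rho(\rho + \delta)^2} \left( \frac{n^2}{2} - \frac{k^2}{2} - n + k \right) + \frac{k\pi x_0}{\rho + \delta} + \sum_{i \in I} \bar{u}_i \frac{k\pi}{\rho(\rho + \delta)} - \frac{k n^2 \pi^2}{2\rho\gamma(\rho + \delta)^2}. \tag{19}$$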
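Cancel the identical summands on both sides of inequality (19) and divide the resulting inequality by the factor $\frac{k\pi^2}{\rho\gamma(\rho + \delta)^2} > 0$:
$$-\frac{k^2}{2} - n + k \ge e^{-\rho(t - t_0)} \left( \frac{n^2}{2} - \frac{k^2}{2} - n + k \right) - \frac{n^2}{2}. \tag{20}$$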
Inequality (20) is equivalent to the following inequality:
$$\left( e^{-\rho(t - t_0)} - 1 \right) \left( \frac{n^2}{2} - \frac{k^2}{2} - n + k \right) \le 0. \tag{21}$$
Since $e^{-\rho(t - t_0)} - 1 \le 0$ for $t \ge t_0$, inequality (21) holds when $\frac{n^2}{2} - \frac{k^2}{2} - n + k \ge 0$. Let us verify this. Multiplying this inequality by 2 and factoring gives
$$(n - k)(n + k - 2) \ge 0,$$
which holds for every coalition size $1 \le k \le n$; hence inequality (21) is true. Thus the irrational behavior proof condition for coalitions is satisfied for the problem of emission cost reduction.
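The conclusion can also be checked numerically. The sketch below evaluates both sides of the coalition condition (11) for every coalition size $k$ on a grid of break-up times $t$, using the coalition cost (17), the cooperative trajectory (12) and the closed-form discounted integral of the IDP (15). It then compares the slack with the expression $\frac{k\pi^2}{\rho\gamma(\rho+\delta)^2}(1 - e^{-\rho(t-t_0)})(\frac{n^2}{2} - \frac{k^2}{2} - n + k)$ implied by (21). Parameter values and function names are illustrative assumptions.

```python
import math

# Illustrative parameter values (assumptions, not from the paper).
n, sum_u_bar = 3, 30.0
gamma, pi_, rho, delta = 2.0, 0.1, 0.05, 0.1
x0, t0 = 5.0, 0.0
c = gamma * (rho + delta)
U = sum_u_bar - n**2 * pi_ / c          # total cooperative emission


def x_coop(t):
    """Cooperative trajectory (12)."""
    decay = math.exp(-delta * (t - t0))
    return decay * x0 + U / delta * (1.0 - decay)


def V(k, x):
    """Minimal guaranteed cost of a k-member coalition, as in (17)."""
    bracket = sum_u_bar - k**2 * pi_ / (2.0 * c) - (n - k) * pi_ / c + rho * x
    return k * pi_ / (rho * (rho + delta)) * bracket


def discounted_idp(t):
    """Integral from t0 to t of exp(-rho*(tau-t0)) * beta_i(tau), beta_i from (15)."""
    a = n**2 * pi_**2 / (2.0 * gamma * (rho + delta) ** 2)
    s = t - t0
    part_rho = (a + pi_ * U / delta) * (1.0 - math.exp(-rho * s)) / rho
    part_mix = (pi_ * x0 - pi_ * U / delta) * (
        1.0 - math.exp(-(rho + delta) * s)) / (rho + delta)
    return part_rho + part_mix


def ibp_gap(k, t):
    """Left-hand side minus right-hand side of condition (11)."""
    rhs = k * discounted_idp(t) + math.exp(-rho * (t - t0)) * V(k, x_coop(t))
    return V(k, x0) - rhs


def predicted_gap(k, t):
    """Slack implied by inequality (21)."""
    scale = k * pi_**2 / (rho * gamma * (rho + delta) ** 2)
    shape = n**2 / 2.0 - k**2 / 2.0 - n + k
    return scale * (1.0 - math.exp(-rho * (t - t0))) * shape


if __name__ == "__main__":
    for k in range(1, n + 1):
        for t in (0.0, 0.5, 5.0, 50.0, 500.0):
            print(k, t, ibp_gap(k, t), predicted_gap(k, t))
```

For the Grand coalition ($k = n$) the slack is zero up to rounding, which reflects the fact that (11) holds with equality in that case.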
References
Haurie, A. and G. Zaccour (1995). Differential game models of global environment management. Annals of the International Society of Dynamic Games, 2, 3-24.
Kaitala, V. and M. Pohjola (1995). Sustainable international agreements on green house warming: a game theory study. Annals of the International Society of Dynamic Games, 2, 67-88.
Petrosyan, L. (1993). Differential Games of Pursuit. World Sci. Pbl., 320.
Petrosyan, L. and G. Zaccour (2003). Time-consistent Shapley value allocation of pollution cost reduction. Journal of Economic Dynamics and Control, 27 , 381-398.
Petrosyan, L. and S. Mamkina (2006). Dynamic games with coalitional structures. International Game Theory Review, 8(2), 295-307.
Petrosyan, L. and N. Kozlovskaya (2007). Time-consistent allocation in coalitional game of pollution cost reduction. Computational Economics and Financial and Industrial Systems, A Preprints Volume of the 11th IFAC Symposium, IFAC publications Internet Homepage, http://www.elsevier.com/locate/ifac, 156-160.
Yeung, D. W. K. (2006). An irrational-behavior-proof condition in cooperative differential games. International Game Theory Review, 8, 739-744.