On the Construction of the Characteristic Function in Cooperative Differential Games with Random Duration
Ekaterina Shevkoplyas
St. Petersburg State University,
Faculty of Applied Mathematics and Control Processes,
198904, Russia, St. Petersburg, Petrodvorets, University pr., 35
E-mail address: [email protected]
Abstract. The class of cooperative differential games with random duration is studied, and the problem of constructing the characteristic function is investigated. The Hamilton-Jacobi-Bellman equation for the problem with random duration is derived. A method for calculating the values of the characteristic function with the help of this equation is presented as an algorithm. The results are illustrated with examples.
Keywords: Cooperation, differential games, random duration, characteristic function, Hamilton-Jacobi-Bellman equation, resource extraction.
Introduction
In differential game theory it is common to consider cooperative differential games with prescribed duration or an infinite time horizon. However, it seems more realistic to expect a game to end at a random time instant. Therefore, the class of cooperative differential games with random duration was proposed in [Petrosjan and Shevkoplyas, 2003]. We start by introducing a definition of the game in Section 1.
In a cooperative differential game the players solve an optimal control problem: the maximization of the total payoff subject to a number of constraints, in particular a differential equation describing the evolution of the state of the game. One of the basic solution techniques for optimal control problems is the Hamilton-Jacobi-Bellman equation [Dockner, 2000]. For games with random duration, however, we face a non-standard dynamic programming problem because of the form of the objective functional (a double integral); for this reason, until recently no Hamilton-Jacobi-Bellman equation suitable for cooperative differential games with random duration existed. In Section 2 we derive the Hamilton-Jacobi-Bellman equation for the problem with random duration in the general case of an arbitrary probability density function $f(t) = F'(t)$. In Section 3 we discuss methods of constructing the characteristic function: the classical approach of von Neumann and Morgenstern [Vorobyev, 1985] and a non-standard Nash equilibrium approach [Petrosjan, 2003]. The main idea of the latter approach is the assumption that if a subset of players forms a coalition, then the left-out players stick to their feedback Nash strategies. We present an algorithm for calculating the values of the characteristic function, based on the Petrosjan and Zaccour concept and the new Hamilton-Jacobi-Bellman equation derived in Section 2.
Finally, we consider two applications of the theory. In Section 4 we consider a first example, a 3-person cooperative differential game with random duration in which the open-loop and feedback solutions coincide. In Section 5 we study a simple model of a common-property nonrenewable resource, such as an oil field, which was studied in [Dockner, 2000] under the assumption of an infinite time horizon. We consider this problem under the condition of random game duration and use the new Hamilton-Jacobi-Bellman equation to solve the cooperative game of nonrenewable resource extraction with exponentially distributed duration.
1. Definition of the game
Consider the $n$-person differential game $\Gamma(x_0)$ starting from the initial state $x_0$ with random duration $T - t_0$. Here the random variable $T$ with distribution function $F(t)$, $t \in [t_0, \infty)$,
$$\int_{t_0}^{\infty} dF(t) = 1,$$
is the time instant when the game $\Gamma(x_0)$ ends. The game starts at the moment $t_0$ from the position $x_0$.
Let the motion equations have the form
$$\dot{x} = g(x, u_1, \ldots, u_n), \quad x \in R^n, \quad u_i \in U \subset \mathrm{comp}\, R^l, \qquad (1)$$
$$x(t_0) = x_0.$$
The “instantaneous” payoff at the moment $t$, $t \in [t_0, \infty)$, is defined as $h_i(x(t))$. Then the expected integral payoff of player $i$, $i = 1, \ldots, n$, is evaluated by the formula
$$K_i(x_0, u_1, \ldots, u_n) = \int_{t_0}^{\infty} \int_{t_0}^{t} h_i(x(\tau))\, d\tau\, dF(t), \quad h_i \geq 0, \quad i = 1, \ldots, n. \qquad (2)$$
Let $x^*(t)$ and $u^*(t) = (u_1^*(t), \ldots, u_n^*(t))$ be the cooperative trajectory and the corresponding $n$-tuple of open-loop controls maximizing the joint expected payoff of the players (we suppose that the maximum is attained):
$$\max_{u} \sum_{i=1}^{n} K_i(x_0, u_1, \ldots, u_n) = \sum_{i=1}^{n} K_i(x_0, u_1^*, \ldots, u_n^*) = \sum_{i=1}^{n} \int_{t_0}^{\infty} \int_{t_0}^{t} h_i(x^*(\tau))\, d\tau\, dF(t) = V(I, x_0). \qquad (3)$$
The trajectory $x^*(t)$ and the open-loop controls $u^*(t) = (u_1^*(t), \ldots, u_n^*(t))$ are called optimal. For simplicity we shall suppose further that the optimal trajectory is unique; the following holds for each optimal trajectory.
For the set of subgames $\Gamma(x^*(\vartheta))$ occurring along an optimal trajectory $x^*(\vartheta)$ one can similarly define the expected total integral payoff in the cooperative game $\Gamma(x^*(\vartheta))$:
$$V(I, x^*(\vartheta)) = \sum_{i=1}^{n} \int_{\vartheta}^{\infty} \int_{\vartheta}^{t} h_i(x^*(\tau))\, d\tau\, dF_{\vartheta}(t). \qquad (4)$$
It is clear that $1 - F(\vartheta)$ is the probability that the game $\Gamma(x^*(\vartheta))$ starts.
Then we have the conditional distribution function $F_{\vartheta}(t)$ as follows:
$$F_{\vartheta}(t) = \frac{F(t) - F(\vartheta)}{1 - F(\vartheta)}, \quad t \in [\vartheta, \infty). \qquad (5)$$
In the same way we get the expression for the conditional distribution in the subgames $\Gamma(x^*(\vartheta + \Delta))$:
$$F_{\vartheta+\Delta}(t) = \frac{F_{\vartheta}(t) - F_{\vartheta}(\vartheta + \Delta)}{1 - F_{\vartheta}(\vartheta + \Delta)} = \frac{F(t) - F(\vartheta + \Delta)}{1 - F(\vartheta + \Delta)}. \qquad (6)$$
Further we assume the existence of a density function $f(t) = F'(t)$. As above, we get the formula for the conditional density function:
$$f_{\vartheta}(t) = \frac{f(t)}{1 - F(\vartheta)}, \quad t \in [\vartheta, \infty); \qquad f_{\vartheta+\Delta}(t) = \frac{f(t)}{1 - F(\vartheta + \Delta)}, \quad t \in [\vartheta + \Delta, \infty). \qquad (7)$$
From (7) we obtain
$$f_{\vartheta+\Delta}(t) = \frac{1 - F(\vartheta)}{1 - F(\vartheta + \Delta)}\, f_{\vartheta}(t). \qquad (8)$$
This is needed for the sequel.
2. The Hamilton-Jacobi-Bellman equation
The Hamilton-Jacobi-Bellman equation lies at the heart of the dynamic programming approach to optimal control problems. Let us remark that the functional (4) does not have the standard form for a dynamic programming problem. Thus, we need to derive a Hamilton-Jacobi-Bellman equation appropriate for the problem with random duration.
We denote $H(x(t)) = \sum_{i=1}^{n} h_i(x(t))$. In the general case we consider $H(x, u)$.
Let $P(x, \vartheta)$ be the optimization problem
$$\max_{u} \int_{\vartheta}^{\infty} f_{\vartheta}(t) \int_{\vartheta}^{t} H(x(\tau), u(\tau))\, d\tau\, dt \qquad (9)$$
subject to $x(t)$ satisfying (1), $x(\vartheta) = x$.
Let $W(x, \vartheta)$ be the optimal value (or Bellman function) of the objective functional of the problem $P(x, \vartheta)$ in (9):
$$W(x, \vartheta) = \max_{u} \int_{\vartheta}^{\infty} f_{\vartheta}(t) \int_{\vartheta}^{t} H(x(\tau), u(\tau))\, d\tau\, dt. \qquad (10)$$
We can see that the maximal total payoff in $\Gamma(x_0)$ is
$$V(I, x_0) = W(x_0, t_0).$$
In control theory one usually assumes that the given functions $g$ and $H$ are sufficiently smooth and satisfy certain boundedness conditions to ensure that solutions to (10) are uniquely defined and the integral in (10) makes sense. Here we do not impose any strong restrictions, because we cannot easily assume any restrictive properties of the functions $g$ and $H$; we only assume that the objective functional $W$ is well defined.
Clearly, if we behave optimally from $\vartheta + \Delta$ onwards, the total expected payoff is given by the formula
$$W(x, \vartheta + \Delta) = \max_{u} \int_{\vartheta+\Delta}^{\infty} f_{\vartheta+\Delta}(t) \int_{\vartheta+\Delta}^{t} H(x(\tau), u(\tau))\, d\tau\, dt. \qquad (11)$$
Using (10), (8), (7) and (11), we get:
$$W(x, \vartheta) = \max_{u} \Bigg[ \int_{\vartheta}^{\vartheta+\Delta} f_{\vartheta}(t) \int_{\vartheta}^{t} H(x(\tau), u(\tau))\, d\tau\, dt + \int_{\vartheta+\Delta}^{\infty} f_{\vartheta}(t)\, dt \int_{\vartheta}^{\vartheta+\Delta} H(x(\tau), u(\tau))\, d\tau + \frac{1}{1 - F(\vartheta)} \int_{\vartheta+\Delta}^{\infty} f(t) \int_{\vartheta+\Delta}^{t} H(x(\tau), u(\tau))\, d\tau\, dt \Bigg] =$$
$$= \max_{u} \Bigg[ \int_{\vartheta}^{\vartheta+\Delta} f_{\vartheta}(t) \int_{\vartheta}^{t} H(x(\tau), u(\tau))\, d\tau\, dt + \frac{1 - F(\vartheta + \Delta)}{1 - F(\vartheta)} \int_{\vartheta}^{\vartheta+\Delta} H(x(\tau), u(\tau))\, d\tau + \frac{1 - F(\vartheta + \Delta)}{1 - F(\vartheta)}\, W(x(\vartheta + \Delta), \vartheta + \Delta) \Bigg]. \qquad (12)$$
Note that
$$\frac{1 - F(\vartheta + \Delta)}{1 - F(\vartheta)} = 1 - \frac{F(\vartheta + \Delta) - F(\vartheta)}{1 - F(\vartheta)}. \qquad (13)$$
Now subtract $W(x, \vartheta)$ from both sides of (12) and divide the resulting equation by $\Delta$. This yields
$$0 = \max_{u} \Bigg( \frac{1}{\Delta} \int_{\vartheta}^{\vartheta+\Delta} f_{\vartheta}(t) \int_{\vartheta}^{t} H(x(\tau), u(\tau))\, d\tau\, dt + \frac{1}{\Delta} \int_{\vartheta}^{\vartheta+\Delta} H(x(\tau), u(\tau))\, d\tau + \frac{F(\vartheta) - F(\vartheta + \Delta)}{\Delta\, (1 - F(\vartheta))} \int_{\vartheta}^{\vartheta+\Delta} H(x(\tau), u(\tau))\, d\tau + \frac{W(x(\vartheta + \Delta), \vartheta + \Delta) - W(x(\vartheta), \vartheta)}{\Delta} + \frac{F(\vartheta) - F(\vartheta + \Delta)}{\Delta\, (1 - F(\vartheta))}\, W(x(\vartheta + \Delta), \vartheta + \Delta) \Bigg). \qquad (14)$$
Let $\Delta \to 0$. From the mean value theorem we know that
$$\lim_{\Delta \to 0} \frac{1}{\Delta} \int_{\vartheta}^{\vartheta+\Delta} f_{\vartheta}(t) \int_{\vartheta}^{t} H(x(\tau), u(\tau))\, d\tau\, dt = 0. \qquad (15)$$
Moreover, we have
$$\lim_{\Delta \to 0} \frac{1}{\Delta} \int_{\vartheta}^{\vartheta+\Delta} H(x(\tau), u(\tau))\, d\tau = H(x(\vartheta), u(\vartheta)); \qquad \lim_{\Delta \to 0} \frac{1}{\Delta}\, \frac{F(\vartheta) - F(\vartheta + \Delta)}{1 - F(\vartheta)} = -\frac{F'(\vartheta)}{1 - F(\vartheta)} = -\frac{f(\vartheta)}{1 - F(\vartheta)}. \qquad (16)$$
Combining (14), (15) and (16), we obtain
$$0 = \max_{u} \Bigg( H(x(\vartheta), u(\vartheta)) + \frac{d}{d\vartheta} W(x, \vartheta) + \lim_{\Delta \to 0} \frac{1}{\Delta}\, \frac{F(\vartheta) - F(\vartheta + \Delta)}{1 - F(\vartheta)}\, W(x(\vartheta + \Delta), \vartheta + \Delta) \Bigg).$$
Finally, we have the Hamilton-Jacobi-Bellman equation:
$$\frac{f(\vartheta)}{1 - F(\vartheta)}\, W(x, \vartheta) = \frac{\partial W(x, \vartheta)}{\partial \vartheta} + \max_{u} \Bigg[ H(x(\vartheta), u(\vartheta)) + \frac{\partial W(x, \vartheta)}{\partial x}\, g(x, u) \Bigg]. \qquad (17)$$
Suppose that the final time instant $T$ has the exponential distribution:
$$f(t) = \rho\, e^{-\rho(t - t_0)}, \quad F(t) = 1 - e^{-\rho(t - t_0)} \quad \text{if } t \geq t_0; \qquad F(t) = f(t) = 0 \quad \text{if } t < t_0. \qquad (18)$$
Then the Bellman function is as follows:
$$W(x, \vartheta) = \max_{u} \int_{\vartheta}^{\infty} \rho\, e^{-\rho(t - \vartheta)} \int_{\vartheta}^{t} H(x(\tau), u(\tau))\, d\tau\, dt, \qquad (19)$$
and the Hamilton-Jacobi-Bellman equation (17) takes the form:
$$\rho\, W(x, t) = \frac{\partial W(x, t)}{\partial t} + \max_{u} \Bigg[ H(x(t), u(t)) + \frac{\partial W(x, t)}{\partial x}\, g(x, u) \Bigg]. \qquad (20)$$
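As a quick sanity check of the reduction from (17) to (20), here is a minimal sympy sketch (the symbols follow the exponential law (18) above) confirming that the hazard term $f(\vartheta)/(1 - F(\vartheta))$ is the constant $\rho$ for the exponential distribution:

```python
# A minimal sympy check that for the exponential law (18) the hazard term
# f(theta)/(1 - F(theta)) in (17) equals the constant rho, which is what
# turns (17) into the discounted-looking equation (20).
import sympy as sp

theta, t0, rho = sp.symbols('theta t_0 rho', positive=True)
f = rho * sp.exp(-rho * (theta - t0))    # density from (18)
F = 1 - sp.exp(-rho * (theta - t0))      # distribution function from (18)

print(sp.simplify(f / (1 - F)))          # prints: rho
```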
This equation looks like the Hamilton-Jacobi-Bellman equation for the problem with prescribed duration and discount factor $\rho$ ([Dockner, 2000]). Let us remark that the Hamilton-Jacobi-Bellman equation for a 2-person game of pursuit with random duration was first derived in ([Petrosjan, 1966]).
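The resemblance can also be seen directly from the payoff functional: interchanging the order of integration turns the double integral (19) into a single discounted integral. A small numerical sketch (the payoff $h$, the rate $\rho$, and the truncation horizon below are arbitrary values chosen only for illustration):

```python
# A numeric check that, for exponential duration, the expected
# double-integral payoff equals a discounted single integral:
#   int_{t0}^{inf} rho*e^{-rho(t-t0)} int_{t0}^{t} h(tau) dtau dt
#     = int_{t0}^{inf} h(tau) e^{-rho(tau-t0)} dtau.
import numpy as np
from scipy.integrate import quad

rho, t0, T = 0.5, 0.0, 80.0            # T truncates the improper integrals
h = lambda tau: 1.0 + np.sin(tau)      # an arbitrary instantaneous payoff

inner = lambda t: quad(h, t0, t)[0]
double_form = quad(lambda t: rho * np.exp(-rho * (t - t0)) * inner(t),
                   t0, T, limit=200)[0]
single_form = quad(lambda tau: h(tau) * np.exp(-rho * (tau - t0)),
                   t0, T, limit=200)[0]

print(double_form, single_form)        # agree up to truncation error
```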
3. The characteristic function
Classical solution. The common way ([Vorobyev, 1985]) to define the characteristic function in $\Gamma(x_0)$ is the following:
$$V(S, x_0) = \begin{cases} 0, & S = \emptyset; \\ \max_{u_S} \min_{u_{N \setminus S}} \sum_{i \in S} K_i(x_0, u), & S \subset N; \\ \max_{u} \sum_{i=1}^{n} K_i(x_0, u), & S = N. \end{cases} \qquad (21)$$
Then $V(S, x_0)$ in (21) is superadditive. But this approach does not seem to be the best in the context of environmental and similar problems, because it is unlikely that, if a subset of players forms a coalition to tackle an environmental problem, the remaining players would form an anti-coalition to harm their efforts. For environmental problems we can use another method of characteristic function construction, proposed in ([Petrosjan, 2003]), with the assumption that the left-out players stick to their feedback Nash strategies. Then we have the following definition of the characteristic function:
$$V(S, x^*(\vartheta)) = \begin{cases} 0, & S = \emptyset; \\ W_i(x^*(\vartheta), \vartheta), & S = \{i\}, \quad i = 1, \ldots, n; \\ W_K(x^*(\vartheta), \vartheta), & S = K \subset I, \end{cases} \qquad (22)$$
where $W_i(x^*(\vartheta), \vartheta)$, $W_K(x^*(\vartheta), \vartheta)$ are the solutions of the corresponding Hamilton-Jacobi-Bellman equations. Consider an algorithm for the computation of the characteristic function values $V(S, x^*(\vartheta))$, $S \subset N$, based on the Petrosjan and Zaccour approach and the new Hamilton-Jacobi-Bellman equation (17).
(1) Maximize the total expected payoff of the grand coalition $I$:
$$W_I(x, \vartheta) = \max_{u_i \in U} \frac{1}{1 - F(\vartheta)} \int_{\vartheta}^{\infty} \int_{\vartheta}^{t} \sum_{i=1}^{n} h_i(x(\tau))\, d\tau\, dF(t), \qquad (23)$$
$$x(\vartheta) = x.$$
Denote $\sum_{i=1}^{n} h_i(\cdot)$ by $H(\cdot)$. Then the Bellman function $W_I(x, \vartheta)$ satisfies the HJB equation (17). The results of the optimization are the optimal trajectory $x^*(t)$ and the optimal strategies $u^* = (u_1^*, \ldots, u_n^*)$.
(2) Calculate a feedback Nash equilibrium. Without cooperation each player $i$ seeks to maximize his expected payoff (2). Thus, player $i$ solves the dynamic programming problem:
$$W_i(x, \vartheta) = \max_{u_i \in U} \frac{1}{1 - F(\vartheta)} \int_{\vartheta}^{\infty} \int_{\vartheta}^{t} h_i(x(\tau))\, d\tau\, dF(t), \qquad (24)$$
$$x(\vartheta) = x.$$
Denote $h_i(\cdot)$ by $H(\cdot)$. In this notation $W_i(x, \vartheta)$ satisfies the HJB equation (17) for all $i \in I$. Denote by $u^N(\cdot) = \{u_i^N(\cdot), i = 1, \ldots, n\}$ any feedback Nash equilibrium of this noncooperative game $\Gamma(x_0)$, and let the corresponding trajectory be $x^N(t)$. We calculate $W_i(x^*(\vartheta), \vartheta)$ under the condition that before the time instant $\vartheta$ the players use their optimal strategies $u^*$.
(3) Compute the outcomes for all remaining possible coalitions:
$$W_K(x, \vartheta) = \max_{u_i,\, i \in K} \frac{1}{1 - F(\vartheta)} \int_{\vartheta}^{\infty} \int_{\vartheta}^{t} \sum_{i \in K} h_i(x(\tau))\, d\tau\, dF(t), \qquad (25)$$
$$u_j = u_j^N \quad \text{for } j \in I \setminus K,$$
$$x(\vartheta) = x.$$
Here for the left-out players $j \in I \setminus K$ we insert their Nash strategies (see Step 2). In the notation $\sum_{i \in K} h_i(\cdot) = H(\cdot)$ the Bellman function $W_K(x, \vartheta)$ satisfies the corresponding HJB equation (17).
(4) Define the characteristic function $V(S, x^*(\vartheta))$, $\forall S \subseteq I$, as
$$V(S, x^*(\vartheta)) = \begin{cases} 0, & S = \emptyset; \\ W_i(x^*(\vartheta), \vartheta), & S = \{i\}, \quad i = 1, \ldots, n; \\ W_K(x^*(\vartheta), \vartheta), & S = K \subset I. \end{cases} \qquad (26)$$
Let us remark that the constructed function $V(S, x^*(\vartheta))$ is not superadditive in general.
4. Example I
Now consider an example of a 3-person cooperative differential game $\Gamma(z_0)$ with random duration $T - t_0$. The game starts at the moment $t_0$ from the position $z_0$. Suppose the random variable $T$ has the probability density function $f(t) = e^{-t}$. The motion equations have the form
$$\dot{z} = u + v + w, \qquad (27)$$
$$z = (x, y), \quad z(t_0) = z_0 = (x_0, y_0); \quad u = (u_1, u_2), \quad v = (v_1, v_2), \quad w = (w_1, w_2),$$
$$|u| \leq 1, \quad |v| \leq 1, \quad |w| \leq 1. \qquad (28)$$
The “instantaneous” payoff at the moment $t$, $t \in [t_0, \infty)$, is defined as
$$h_i(z(\tau)) = a_i\, x(\tau) + b_i\, y(\tau) + c_i, \quad a_i, b_i, c_i \geq 0; \qquad (29)$$
$$a_i^2 + b_i^2 + c_i^2 \neq 0, \quad i = 1, 2, 3.$$
The expected integral payoff is evaluated by the formula
$$K_i(z_0, u, v, w) = \int_{t_0}^{\infty} \int_{t_0}^{t} h_i(z(\tau))\, e^{-t}\, d\tau\, dt, \quad i = 1, 2, 3. \qquad (30)$$
The cooperative form of the game $\Gamma(z_0)$ means that before the beginning of the game the players agree to use controls $u^*, v^*, w^*$ such that the corresponding trajectory $z^*(t)$ maximizes the joint expected payoff of the players, i.e.
$$\max_{u,v,w} \sum_{i=1}^{3} K_i(z_0, u, v, w) = \sum_{i=1}^{3} K_i(z_0, u^*, v^*, w^*) = \sum_{i=1}^{3} \int_{t_0}^{\infty} \int_{t_0}^{t} h_i(z^*(\tau))\, e^{-t}\, d\tau\, dt. \qquad (31)$$
Classical solution
The value $\mathrm{Val}\, G_{S, I \setminus S}$ is defined as:
$$\mathrm{Val}\, G_{S, I \setminus S} = \begin{cases} \max_{u} \min_{v,w} K_1(\cdot), & S = \{1\}, \quad I \setminus S = \{2, 3\}; \\ \max_{v} \min_{u,w} K_2(\cdot), & S = \{2\}, \quad I \setminus S = \{1, 3\}; \\ \max_{w} \min_{u,v} K_3(\cdot), & S = \{3\}, \quad I \setminus S = \{1, 2\}; \\ \max_{u,v} \min_{w} \big( K_1(\cdot) + K_2(\cdot) \big), & S = \{1, 2\}, \quad I \setminus S = \{3\}; \\ \max_{u,w} \min_{v} \big( K_1(\cdot) + K_3(\cdot) \big), & S = \{1, 3\}, \quad I \setminus S = \{2\}; \\ \max_{v,w} \min_{u} \big( K_2(\cdot) + K_3(\cdot) \big), & S = \{2, 3\}, \quad I \setminus S = \{1\}. \end{cases} \qquad (32)$$
From (31) we get the formula for the value $V(z_0, I)$ in the game $\Gamma(z_0)$:
$$V(z_0, I) = \max_{u,v,w} \int_{t_0}^{\infty} \int_{t_0}^{t} \big( a_{123}\, x(\tau) + b_{123}\, y(\tau) + c_{123} \big)\, e^{-t}\, d\tau\, dt, \qquad (33)$$
where $a_{123} = a_1 + a_2 + a_3$; $b_{123} = b_1 + b_2 + b_3$; $c_{123} = c_1 + c_2 + c_3$.
Apply the maximum principle ([Pontryagin, 1976]) to calculate the optimal open-loop controls and the corresponding trajectory. The functional to be maximized is defined by (33). Consider the internal integral in (33); we begin by solving the problem
$$J = \max_{u,v,w} \int_{t_0}^{t_f} \big( a_{123}\, x(\tau) + b_{123}\, y(\tau) + c_{123} \big)\, d\tau. \qquad (34)$$
Here $t_f$ ($t$ final) is some fixed $t$ from the interval $[t_0, \infty)$. Consider the dual problem:
$$J = \min_{u,v,w} \Bigg( -\int_{t_0}^{t_f} \big( a_{123}\, x(\tau) + b_{123}\, y(\tau) + c_{123} \big)\, d\tau \Bigg). \qquad (35)$$
The Hamiltonian for (35) has the form
$$H = \psi_1 (u_1 + v_1 + w_1) + \psi_2 (u_2 + v_2 + w_2) - \big( a_{123}\, x(\cdot) + b_{123}\, y(\cdot) + c_{123} \big). \qquad (36)$$
The functions $\psi_1$, $\psi_2$ satisfy the following differential equations:
$$\frac{dx}{dt} = \frac{\partial H}{\partial \psi_1} = u_1 + v_1 + w_1; \qquad \frac{dy}{dt} = \frac{\partial H}{\partial \psi_2} = u_2 + v_2 + w_2;$$
$$\frac{d\psi_1}{dt} = -\frac{\partial H}{\partial x} = a_{123}; \qquad \frac{d\psi_2}{dt} = -\frac{\partial H}{\partial y} = b_{123},$$
with the boundary conditions
$$x(t_0) = x_0; \quad y(t_0) = y_0; \qquad x(t_f) \text{ free}; \quad y(t_f) \text{ free};$$
$$\psi_1(t_0) \text{ free}; \quad \psi_2(t_0) \text{ free}; \qquad \psi_1(t_f) = 0; \quad \psi_2(t_f) = 0.$$
Assume further to = 0.
The optimal controls can be calculated from the condition of maximizing $H$. Since the controls enter $H$ linearly, the maximum is attained on the boundary. Moreover, taking into account the constraints (28) on the admissible controls, we have
$$u_2 = \sqrt{1 - u_1^2}; \quad v_2 = \sqrt{1 - v_1^2}; \quad w_2 = \sqrt{1 - w_1^2}. \qquad (37)$$
Substituting the expressions (37) into the equation (36) for $H$ and taking the partial derivative, we get
$$u_1^* = \sqrt{\frac{\psi_1^2}{\psi_1^2 + \psi_2^2}}; \qquad u_2^* = \sqrt{1 - u_1^{*2}} = \sqrt{\frac{\psi_2^2}{\psi_1^2 + \psi_2^2}}. \qquad (38)$$
In a similar way we get the other optimal controls:
$$v_1^* = \sqrt{\frac{\psi_1^2}{\psi_1^2 + \psi_2^2}}; \qquad v_2^* = \sqrt{1 - v_1^{*2}} = \sqrt{\frac{\psi_2^2}{\psi_1^2 + \psi_2^2}},$$
and analogously for $w_1^*$, $w_2^*$.
For $\psi_1(t)$, $\psi_2(t)$ we get the formulas:
$$\psi_1(t) = -a_{123}\, t_f + a_{123}\, t = a_{123}\, (t - t_f); \qquad \psi_2(t) = -b_{123}\, t_f + b_{123}\, t = b_{123}\, (t - t_f). \qquad (39)$$
From (38) and (39) we get the optimal controls:
$$u_1^* = v_1^* = w_1^* = \frac{a_{123}}{\sqrt{a_{123}^2 + b_{123}^2}}; \qquad u_2^* = v_2^* = w_2^* = \frac{b_{123}}{\sqrt{a_{123}^2 + b_{123}^2}}. \qquad (40)$$
Note that the resulting optimal controls are stationary! Thus, the usage of the maximum principle for the internal integral (34) is well defined. From (27) we get
$$x(t) = (u_1 + v_1 + w_1)\, t + x_0; \qquad y(t) = (u_2 + v_2 + w_2)\, t + y_0,$$
and then for the optimal trajectory:
$$x^*(t) = \frac{3\, a_{123}}{\sqrt{a_{123}^2 + b_{123}^2}}\, t + x_0; \qquad y^*(t) = \frac{3\, b_{123}}{\sqrt{a_{123}^2 + b_{123}^2}}\, t + y_0. \qquad (41)$$
Now we can calculate the value of the functional (33) along the optimal trajectory and thus define the value of the characteristic function $V(z_0, I)$ (the case where all players cooperate). Using (41), we have
$$\sum_{i=1}^{3} h_i(z^*(t)) = 3 \sqrt{a_{123}^2 + b_{123}^2}\; t + a_{123}\, x_0 + b_{123}\, y_0 + c_{123}.$$
It is not difficult to show that the following formula is true:
$$V(z_0, I) = \int_{0}^{\infty} \int_{0}^{t} \sum_{i=1}^{3} h_i(z^*(\tau))\, e^{-t}\, d\tau\, dt = 3 \sqrt{a_{123}^2 + b_{123}^2} + a_{123}\, x_0 + b_{123}\, y_0 + c_{123}. \qquad (42)$$
Hence,
$$V(z_0, I) = 3 \sqrt{a_{123}^2 + b_{123}^2} + \sum_{i=1}^{3} h_i(z_0). \qquad (43)$$
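Formula (42) can be checked numerically. A minimal sketch (the sample values of $a_{123}$, $b_{123}$ and the constant term are arbitrary assumptions) that evaluates the double integral by quadrature and compares it with the closed form:

```python
# A numeric verification of (42): the integral of (alpha*tau + const)*e^{-t}
# over 0 <= tau <= t < infinity equals alpha + const, which gives (43).
from math import sqrt, exp
from scipy.integrate import dblquad

a123, b123 = 2.0, 1.0                 # sample coefficient sums
const = 0.7                           # stands for a123*x0 + b123*y0 + c123
alpha = 3 * sqrt(a123**2 + b123**2)   # slope along the optimal trajectory

# Inner variable tau runs over [0, t], outer variable t over [0, T];
# T = 60 truncates the improper integral with negligible error.
val, _ = dblquad(lambda tau, t: (alpha * tau + const) * exp(-t),
                 0.0, 60.0, lambda t: 0.0, lambda t: t)
print(val, alpha + const)             # the two numbers agree
```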
Applying the maximum principle to the functionals (32), we get the expressions for the characteristic function:
$$V(z_0, \{i, j\}) = \sqrt{a_{ij}^2 + b_{ij}^2} + h_i(z_0) + h_j(z_0), \quad i, j = 1, 2, 3; \quad i < j;$$
$$V(z_0, \{i\}) = -\sqrt{a_i^2 + b_i^2} + h_i(z_0), \quad i = 1, 2, 3, \qquad (44)$$
where $a_{ij} = a_i + a_j$, $b_{ij} = b_i + b_j$.
For simplicity rename the characteristic function $V(z_0, S)$ as $V(S)$. It is not difficult to show that the convexity property ([Vorobyev, 1985]) of the characteristic function,
$$V(S_1 \cup S_2) \geq V(S_1) + V(S_2) - V(S_1 \cap S_2), \quad \forall\, S_1, S_2 \subseteq I, \qquad (45)$$
is satisfied. This means that the c-core defined for the characteristic function $V(z_0, S)$, $S \subseteq I$, in the game $\Gamma(z_0)$ is not empty and contains the Shapley Value.
Calculate the Shapley Value in the game $\Gamma(z_0)$. Substituting the values of $V(S)$ from (44), we get
$$Sh_1 = -\frac{1}{3}\sqrt{a_1^2 + b_1^2} + \frac{1}{6}\sqrt{a_2^2 + b_2^2} + \frac{1}{6}\sqrt{a_3^2 + b_3^2} + \frac{1}{6}\sqrt{a_{12}^2 + b_{12}^2} + \frac{1}{6}\sqrt{a_{13}^2 + b_{13}^2} - \frac{1}{3}\sqrt{a_{23}^2 + b_{23}^2} + \sqrt{a_{123}^2 + b_{123}^2} + h_1(z_0),$$
$$Sh_2 = -\frac{1}{3}\sqrt{a_2^2 + b_2^2} + \frac{1}{6}\sqrt{a_1^2 + b_1^2} + \frac{1}{6}\sqrt{a_3^2 + b_3^2} + \frac{1}{6}\sqrt{a_{12}^2 + b_{12}^2} + \frac{1}{6}\sqrt{a_{23}^2 + b_{23}^2} - \frac{1}{3}\sqrt{a_{13}^2 + b_{13}^2} + \sqrt{a_{123}^2 + b_{123}^2} + h_2(z_0), \qquad (46)$$
$$Sh_3 = -\frac{1}{3}\sqrt{a_3^2 + b_3^2} + \frac{1}{6}\sqrt{a_1^2 + b_1^2} + \frac{1}{6}\sqrt{a_2^2 + b_2^2} + \frac{1}{6}\sqrt{a_{13}^2 + b_{13}^2} + \frac{1}{6}\sqrt{a_{23}^2 + b_{23}^2} - \frac{1}{3}\sqrt{a_{12}^2 + b_{12}^2} + \sqrt{a_{123}^2 + b_{123}^2} + h_3(z_0).$$
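Since (46) is just the Shapley Value of the characteristic function (44), it can be reproduced mechanically. A minimal sketch (the coefficients $a_i$, $b_i$, $c_i$ and the initial state are arbitrary sample values) that builds $V(S)$ from (44), computes the Shapley Value by averaging marginal contributions over all orderings, and checks the efficiency property $\sum_i Sh_i = V(z_0, I)$:

```python
# Shapley Value of the characteristic function (44), computed by the
# standard permutation formula; the result can be compared with (46).
from itertools import permutations
from math import hypot

a = [1.0, 2.0, 0.5]; b = [0.7, 0.3, 1.5]; c = [0.2, 0.1, 0.4]  # sample data
x0, y0 = 1.0, 2.0
h = lambda i: a[i] * x0 + b[i] * y0 + c[i]       # h_i(z0), see (29)

def V(S):
    """Characteristic function (44): -|.| for singletons, +|.| for pairs,
    3|.| for the grand coalition, where |.| = sqrt(a_S^2 + b_S^2)."""
    if not S:
        return 0.0
    aS, bS = sum(a[i] for i in S), sum(b[i] for i in S)
    hS = sum(h(i) for i in S)
    factor = {1: -1.0, 2: 1.0, 3: 3.0}[len(S)]
    return factor * hypot(aS, bS) + hS

players = [0, 1, 2]
sh = [0.0, 0.0, 0.0]
for perm in permutations(players):               # 3! = 6 orderings
    coalition = []
    for i in perm:
        sh[i] += (V(coalition + [i]) - V(coalition)) / 6.0
        coalition.append(i)

print(sh, sum(sh), V(players))                   # sum(sh) == V(I, z0)
```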
In a similar way we get the expressions for the characteristic function in the subgame $\Gamma(z^*(\vartheta))$ starting at the moment $\vartheta$:
$$V(z^*(\vartheta), \{i, j\}) = \sqrt{a_{ij}^2 + b_{ij}^2} + h_i(z^*(\vartheta)) + h_j(z^*(\vartheta)), \quad i, j = 1, 2, 3; \quad i < j;$$
$$V(z^*(\vartheta), \{i\}) = -\sqrt{a_i^2 + b_i^2} + h_i(z^*(\vartheta)), \quad i = 1, 2, 3, \qquad (47)$$
and then we get the Shapley Value for the subgame $\Gamma(z^*(\vartheta))$:
$$Sh_1^{\vartheta} = -\frac{1}{3}\sqrt{a_1^2 + b_1^2} + \frac{1}{6}\sqrt{a_2^2 + b_2^2} + \frac{1}{6}\sqrt{a_3^2 + b_3^2} + \frac{1}{6}\sqrt{a_{12}^2 + b_{12}^2} + \frac{1}{6}\sqrt{a_{13}^2 + b_{13}^2} - \frac{1}{3}\sqrt{a_{23}^2 + b_{23}^2} + \sqrt{a_{123}^2 + b_{123}^2} + h_1(z^*(\vartheta)),$$
$$Sh_2^{\vartheta} = -\frac{1}{3}\sqrt{a_2^2 + b_2^2} + \frac{1}{6}\sqrt{a_1^2 + b_1^2} + \frac{1}{6}\sqrt{a_3^2 + b_3^2} + \frac{1}{6}\sqrt{a_{12}^2 + b_{12}^2} + \frac{1}{6}\sqrt{a_{23}^2 + b_{23}^2} - \frac{1}{3}\sqrt{a_{13}^2 + b_{13}^2} + \sqrt{a_{123}^2 + b_{123}^2} + h_2(z^*(\vartheta)),$$
$$Sh_3^{\vartheta} = -\frac{1}{3}\sqrt{a_3^2 + b_3^2} + \frac{1}{6}\sqrt{a_1^2 + b_1^2} + \frac{1}{6}\sqrt{a_2^2 + b_2^2} + \frac{1}{6}\sqrt{a_{13}^2 + b_{13}^2} + \frac{1}{6}\sqrt{a_{23}^2 + b_{23}^2} - \frac{1}{3}\sqrt{a_{12}^2 + b_{12}^2} + \sqrt{a_{123}^2 + b_{123}^2} + h_3(z^*(\vartheta)).$$
Nash equilibrium approach
To construct the characteristic function we use Steps 1-4 of the algorithm. Clearly, the HJB equation (17) takes the form:
$$W(x, y, t) = \frac{\partial W(x, y, t)}{\partial t} + \max_{u} \Bigg( H(x(t), u(t)) + \frac{\partial W(x, y, t)}{\partial x}\, (u_1 + v_1 + w_1) + \frac{\partial W(x, y, t)}{\partial y}\, (u_2 + v_2 + w_2) \Bigg). \qquad (48)$$
Step 1. The Bellman function:
$$W(x, y, \vartheta, I) = \max_{u,v,w} \int_{\vartheta}^{\infty} \int_{\vartheta}^{t} \big( a_{123}\, x(\tau) + b_{123}\, y(\tau) + c_{123} \big)\, e^{-(t - \vartheta)}\, d\tau\, dt.$$
Further denote $W(x, y, \vartheta, I)$ by $W$. We have the HJB equation:
$$W = \frac{\partial W}{\partial \vartheta} + \max_{u} \Bigg( a_{123}\, x + b_{123}\, y + c_{123} + \frac{\partial W}{\partial x}\, (u_1 + v_1 + w_1) + \frac{\partial W}{\partial y}\, (u_2 + v_2 + w_2) \Bigg). \qquad (49)$$
As above, the system (37) holds. Differentiating the right-hand side of (49) with respect to $u_1, v_1, w_1$, we obtain the optimal strategies:
$$u_1 = v_1 = w_1 = \sqrt{\frac{\left( \partial W / \partial x \right)^2}{\left( \partial W / \partial x \right)^2 + \left( \partial W / \partial y \right)^2}}; \qquad u_2 = v_2 = w_2 = \sqrt{\frac{\left( \partial W / \partial y \right)^2}{\left( \partial W / \partial x \right)^2 + \left( \partial W / \partial y \right)^2}}. \qquad (50)$$
We suppose $W = Ax + By + C$. Then we have $\frac{\partial W}{\partial x} = A$, $\frac{\partial W}{\partial y} = B$. Further, substituting this into (49), we get
$$A = a_{123}; \qquad B = b_{123}; \qquad C = 3 \sqrt{a_{123}^2 + b_{123}^2} + c_{123}.$$
Then we obtain
$$W = a_{123}\, x + b_{123}\, y + 3 \sqrt{a_{123}^2 + b_{123}^2} + c_{123},$$
and the optimal controls
$$u_1^* = v_1^* = w_1^* = \frac{a_{123}}{\sqrt{a_{123}^2 + b_{123}^2}}; \qquad u_2^* = v_2^* = w_2^* = \frac{b_{123}}{\sqrt{a_{123}^2 + b_{123}^2}}. \qquad (51)$$
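A small symbolic sketch (using sympy) confirming that the linear guess for $W$ together with the feedback controls (51) indeed satisfies the HJB equation (49):

```python
# Symbolic check that W = a123*x + b123*y + 3*sqrt(a123^2 + b123^2) + c123
# with the feedback controls (51) satisfies the HJB equation (49).
import sympy as sp

x, y = sp.symbols('x y', real=True)
a, b, c = sp.symbols('a123 b123 c123', positive=True)

W = a*x + b*y + 3*sp.sqrt(a**2 + b**2) + c
Wx, Wy = sp.diff(W, x), sp.diff(W, y)

# Controls (51): every player uses the unit vector (a123, b123)/|(a123, b123)|
u1 = v1 = w1 = a / sp.sqrt(a**2 + b**2)
u2 = v2 = w2 = b / sp.sqrt(a**2 + b**2)

# Right-hand side of (49); dW/dtheta = 0 because W does not depend on theta
rhs = (a*x + b*y + c) + Wx*(u1 + v1 + w1) + Wy*(u2 + v2 + w2)

print(sp.simplify(rhs - W))   # prints 0, so (49) holds
```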
Note that the open-loop controls (40) coincide with the feedback solution (51). Hence, it follows that the optimal trajectory has the form
$$x^*(t) = \frac{3\, a_{123}}{\sqrt{a_{123}^2 + b_{123}^2}}\, t + x_0; \qquad y^*(t) = \frac{3\, b_{123}}{\sqrt{a_{123}^2 + b_{123}^2}}\, t + y_0, \qquad (52)$$
and the value
$$V(z^*(\vartheta), I) = W(x^*(\vartheta), y^*(\vartheta), \vartheta, I) = a_{123}\, x^* + b_{123}\, y^* + 3 \sqrt{a_{123}^2 + b_{123}^2} + c_{123} =$$
$$= 3 \sqrt{a_{123}^2 + b_{123}^2}\; \vartheta + 3 \sqrt{a_{123}^2 + b_{123}^2} + a_{123}\, x_0 + b_{123}\, y_0 + c_{123}.$$
Let us remark that this is the same result as obtained above via the maximum principle. Step 2. Similarly, in the noncooperative case we get
$$u_1^N = \frac{a_1}{\sqrt{a_1^2 + b_1^2}}, \quad u_2^N = \frac{b_1}{\sqrt{a_1^2 + b_1^2}}; \qquad v_1^N = \frac{a_2}{\sqrt{a_2^2 + b_2^2}}, \quad v_2^N = \frac{b_2}{\sqrt{a_2^2 + b_2^2}}; \qquad w_1^N = \frac{a_3}{\sqrt{a_3^2 + b_3^2}}, \quad w_2^N = \frac{b_3}{\sqrt{a_3^2 + b_3^2}};$$
$$V(\{i\}, z^*(\vartheta)) = \Bigg( \sqrt{a_i^2 + b_i^2} + \sum_{j \neq i} \frac{a_i a_j + b_i b_j}{\sqrt{a_j^2 + b_j^2}} \Bigg) \vartheta + \sqrt{a_i^2 + b_i^2} + \sum_{j \neq i} \frac{a_i a_j + b_i b_j}{\sqrt{a_j^2 + b_j^2}} + a_i\, x_0 + b_i\, y_0 + c_i.$$
Step 3. Consider the coalition $K = \{1, 2\}$. Then we obtain
$$u_1^{1,2} = \frac{a_{12}}{\sqrt{a_{12}^2 + b_{12}^2}} = v_1^{1,2}; \qquad u_2^{1,2} = \frac{b_{12}}{\sqrt{a_{12}^2 + b_{12}^2}} = v_2^{1,2};$$
$$w_1^N = \frac{a_3}{\sqrt{a_3^2 + b_3^2}}; \qquad w_2^N = \frac{b_3}{\sqrt{a_3^2 + b_3^2}},$$
so that
$$\dot{x} = \frac{2\, a_{12}}{\sqrt{a_{12}^2 + b_{12}^2}} + \frac{a_3}{\sqrt{a_3^2 + b_3^2}},$$
and
$$V(\{1, 2\}, z^*(\vartheta)) = \Bigg( 2 \sqrt{a_{12}^2 + b_{12}^2} + \frac{a_{12} a_3 + b_{12} b_3}{\sqrt{a_3^2 + b_3^2}} \Bigg) \vartheta + 2 \sqrt{a_{12}^2 + b_{12}^2} + \frac{a_{12} a_3 + b_{12} b_3}{\sqrt{a_3^2 + b_3^2}} + a_{12}\, x_0 + b_{12}\, y_0 + c_{12}.$$
We get similar results for the coalitions $\{1, 3\}$ and $\{2, 3\}$. Thus, we have constructed the characteristic function $V(S, z^*(\vartheta))$, $S \subseteq I$, with the help of the HJB equation (17). However, one can show that $V(S, z^*(\vartheta))$ is not superadditive.
5. A Game Theoretic Model of Nonrenewable Resources with Random Duration
Consider a simple model of common-property nonrenewable resource extraction published in [Dockner, 2000]. Let $x(t)$ and $c_i(t)$ denote, respectively, the stock of the nonrenewable resource (such as an oil field) and player $i$'s rate of extraction at time $t$ [Dockner, 2000]. Let the transition equation have the form
$$\dot{x}(t) = -\sum_{i=1}^{n} c_i(t), \quad i = 1, \ldots, n; \qquad (53)$$
$$x(t_0) = x_0. \qquad (54)$$
The game starts at $t_0$ from $x_0$. We suppose that the game ends at the random time instant $T$ with the exponential distribution $f(t) = \rho\, e^{-\rho(t - t_0)}$, $t \geq t_0$.
The utility function of player $i$ at $t$ is as follows:
$$h_i(c_i(t)) = A \ln(c_i) + B. \qquad (55)$$
Here $A$ is positive and $B$ is a constant which may be positive, negative or zero.
As in the general case, we define the integral expected payoff
$$K_i(x_0, c_1, \ldots, c_n) = \int_{t_0}^{\infty} \int_{t_0}^{t} \big( A \ln(c_i(\tau)) + B \big)\, \rho\, e^{-\rho(t - t_0)}\, d\tau\, dt, \quad i = 1, \ldots, n,$$
and consider the total payoff in the cooperative form of the game:
$$\max \sum_{i=1}^{n} K_i(x_0, c_1, \ldots, c_n) = \sum_{i=1}^{n} K_i(x_0, c_1^I, \ldots, c_n^I) = \int_{t_0}^{\infty} \int_{t_0}^{t} \Big( A \sum_{i=1}^{n} \ln(c_i^I(\tau)) + nB \Big)\, \rho\, e^{-\rho(t - t_0)}\, d\tau\, dt. \qquad (56)$$
Step 1. Grand coalition $I = \{1, \ldots, n\}$. We have the following Bellman function:
$$W_I(x, \vartheta) = \max_{c_i} \int_{\vartheta}^{\infty} \int_{\vartheta}^{t} \Big( A \sum_{i=1}^{n} \ln(c_i(\tau)) + nB \Big)\, \rho\, e^{-\rho(t - \vartheta)}\, d\tau\, dt. \qquad (57)$$
Let us define $\sum_{i=1}^{n} h_i(c_i(\cdot)) = H(c(\cdot))$. Then we can use the Hamilton-Jacobi-Bellman equation (17):
$$\rho\, W_I(x, t) = \frac{\partial W_I(x, t)}{\partial t} + \max_{c} \Bigg( H(c) + \frac{\partial W_I(x, t)}{\partial x}\, g(x, c) \Bigg). \qquad (58)$$
Combining (58) and (57), we obtain
$$\rho\, W_I(x, t) = \frac{\partial W_I(x, t)}{\partial t} + \max_{c_i} \Bigg( A n \ln(c_i) + nB + \frac{\partial W_I(x, t)}{\partial x}\, (-n c_i) \Bigg). \qquad (59)$$
Suppose the Bellman function $W_I$ has the form
$$W_I = A_I \ln(x) + B_I. \qquad (60)$$
Then we get
$$\frac{\partial W_I(x, t)}{\partial x} = \frac{A_I}{x}; \qquad \frac{\partial W_I(x, t)}{\partial t} = \frac{A_I}{x}\, \dot{x}. \qquad (61)$$
Differentiating the right-hand side of (59) with respect to $c_i$, we obtain the optimal strategies
$$c_i^I = \frac{A}{\partial W_I(x, t) / \partial x}. \qquad (62)$$
Using (61) and (62), we get
$$\dot{x} = -\frac{nA}{A_I}\, x. \qquad (63)$$
Substituting (60), (62) and (63) into (59), we have an equation for the coefficients:
$$\rho A_I \ln(x) + \rho B_I = -2nA + nB + An \ln(A) - An \ln(A_I) + An \ln(x). \qquad (64)$$
The result is:
$$A_I = \frac{An}{\rho}; \qquad B_I = \frac{Bn}{\rho} - \frac{2An}{\rho} - \frac{An \ln(n)}{\rho} + \frac{An \ln(\rho)}{\rho}.$$
It follows that
$$W_I(x, t) = \frac{An}{\rho} \ln(x) + \frac{Bn}{\rho} - \frac{2An}{\rho} - \frac{An \ln(n)}{\rho} + \frac{An \ln(\rho)}{\rho}. \qquad (65)$$
Then we get the optimal strategies $c_i^I = \frac{\rho x}{n}$, $i = 1, \ldots, n$.
Finally, we have the optimal trajectory and optimal controls
$$x^I(t) = x_0\, e^{-\rho(t - t_0)}; \qquad c_i^I(t) = \frac{\rho}{n}\, x_0\, e^{-\rho(t - t_0)},$$
and
$$V(I, x^I(\vartheta)) = W_I(x^I, \vartheta) = \frac{An}{\rho} \ln(x^I(\vartheta)) + \frac{Bn}{\rho} - \frac{2An}{\rho} - \frac{An \ln(n)}{\rho} + \frac{An \ln(\rho)}{\rho} =$$
$$= \frac{An}{\rho} \ln(x_0) - An\, (\vartheta - t_0) + \frac{Bn}{\rho} - \frac{2An}{\rho} - \frac{An \ln(n)}{\rho} + \frac{An \ln(\rho)}{\rho}.$$
Let $\vartheta = t_0$. Then
$$V(I, x_0) = W_I(x_0, t_0) = \frac{An}{\rho} \ln(x_0) + \frac{Bn}{\rho} - \frac{2An}{\rho} - \frac{An \ln(n)}{\rho} + \frac{An \ln(\rho)}{\rho}. \qquad (66)$$
Step 2. Feedback Nash equilibrium. The Bellman function for player $i$:
$$W_i(x, \vartheta) = \max_{c_i} \int_{\vartheta}^{\infty} \int_{\vartheta}^{t} \big( A \ln(c_i(\tau)) + B \big)\, \rho\, e^{-\rho(t - \vartheta)}\, d\tau\, dt. \qquad (67)$$
The initial state is
$$x(\vartheta) = x^I(\vartheta).$$
Now the HJB equation (58) has the form
$$\rho\, W_i(x, t) = \frac{\partial W_i(x, t)}{\partial t} + \max_{c_i} \Bigg( A \ln(c_i) + B + \frac{\partial W_i(x, t)}{\partial x} \Big( -\sum_{j=1}^{n} c_j \Big) \Bigg). \qquad (68)$$
We find $W_i$ in the form
$$W_i = A_N \ln(x) + B_N. \qquad (69)$$
As before, we get
$$A_N = \frac{A}{\rho}; \qquad B_N = \frac{B}{\rho} - \frac{2An}{\rho} + \frac{A \ln(\rho)}{\rho}. \qquad (70)$$
Then we get the Nash feedback strategies and trajectory
$$c_i^N = \rho x, \quad i = 1, \ldots, n; \qquad (71)$$
$$x^N(t) = x^I(\vartheta)\, e^{-n\rho(t - \vartheta)}; \qquad c_i^N(t) = \rho\, x^I(\vartheta)\, e^{-n\rho(t - \vartheta)}.$$
So we get the value
$$V(\{i\}, x^I(\vartheta)) = W_i(x^I, \vartheta) = \frac{A}{\rho} \ln(x^I(\vartheta)) + \frac{B}{\rho} - \frac{2An}{\rho} + \frac{A \ln(\rho)}{\rho}.$$
Let $\vartheta = t_0$. Then
$$V(\{i\}, x_0) = W_i(x_0, t_0) = \frac{A}{\rho} \ln(x_0) + \frac{B}{\rho} - \frac{2An}{\rho} + \frac{A \ln(\rho)}{\rho}. \qquad (72)$$
Step 3. Coalition $K \subset I$, $|K| = k$, $|I \setminus K| = n - k$. Here $k$ players form the coalition $K$. Their optimization problem is:
$$W_K(x, \vartheta) = \max_{c_i,\, i \in K} \int_{\vartheta}^{\infty} \int_{\vartheta}^{t} \Big( A \sum_{i \in K} \ln(c_i(\tau)) + kB \Big)\, \rho\, e^{-\rho(t - \vartheta)}\, d\tau\, dt. \qquad (73)$$
The initial state is $x(\vartheta) = x^I(\vartheta)$. Let us recall that the left-out players $i \in I \setminus K$ use the feedback Nash strategies (71).
In the same way, we get the trajectory and controls that are "optimal" for the coalition $K$:
$$x^K(t) = x^I(\vartheta)\, e^{-(n-k+1)\rho(t - \vartheta)}; \qquad c_i^K(t) = \frac{\rho}{k}\, x^I(\vartheta)\, e^{-(n-k+1)\rho(t - \vartheta)}, \quad i \in K,$$
and the value of the coalition payoff:
$$V(K, x^I(\vartheta)) = W_K(x^I, \vartheta) = \frac{Ak}{\rho} \ln(x^I(\vartheta)) + \frac{kB}{\rho} - \frac{2Ak(n-k+1)}{\rho} - \frac{Ak}{\rho} \ln(k) + \frac{Ak \ln(\rho)}{\rho}.$$
Let $\vartheta = t_0$. Then
$$V(K, x_0) = W_K(x_0, t_0) = \frac{Ak}{\rho} \ln(x_0) + \frac{kB}{\rho} - \frac{2Ak(n-k+1)}{\rho} - \frac{Ak}{\rho} \ln(k) + \frac{Ak \ln(\rho)}{\rho}. \qquad (74)$$
Thus, we have constructed the characteristic function $V(K, x_0)$, $K \subseteq I$ (see (66), (74)).
Proposition 1. Suppose the characteristic function $V(K, x_0)$, $K \subseteq I$, is given by (66), (74). Then $V(K, x_0)$ is superadditive.
Lemma 1. Let $s_1 \geq 1$, $s_2 \geq 1$. Then
$$s_1 \ln(s_1) + s_2 \ln(s_2) + 4 s_1 s_2 \geq (s_1 + s_2) \ln(s_1 + s_2). \qquad (75)$$
This lemma can be proved by standard methods: it is easily shown that the left-hand side grows faster than the right-hand side. The proof of Proposition 1 is then by direct calculation: for disjoint coalitions $K_1, K_2$ with $|K_1| = s_1$, $|K_2| = s_2$, the terms of (74) that are linear in $k$ cancel in the difference $V(K_1 \cup K_2, x_0) - V(K_1, x_0) - V(K_2, x_0)$, and the superadditivity inequality reduces exactly to (75).
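Proposition 1 can also be checked numerically. A minimal sketch (the values of $n$, $A$, $B$, $\rho$, $x_0$ are arbitrary assumptions; it uses the fact that $V(K, x_0)$ in (74) depends on the coalition $K$ only through its size $k$):

```python
# Numeric check of superadditivity for the characteristic function (74):
# V(K1 u K2) >= V(K1) + V(K2) for all disjoint coalitions K1, K2.
from math import log

n, A, B, rho, x0 = 5, 1.0, 0.5, 0.3, 10.0   # sample parameters

def V(k):
    """V(K, x0) from (74) for a coalition of size k (V(empty) = 0)."""
    if k == 0:
        return 0.0
    return (A*k/rho)*log(x0) + k*B/rho - 2*A*k*(n - k + 1)/rho \
           - (A*k/rho)*log(k) + (A*k/rho)*log(rho)

# V depends only on the coalition size, so testing all size pairs suffices
ok = all(V(s1 + s2) >= V(s1) + V(s2)
         for s1 in range(1, n) for s2 in range(1, n - s1 + 1))
print(ok)   # True, in line with Proposition 1 and Lemma 1
```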
Finally, since the players are symmetric, we get the Shapley Value in our example:
$$Sh_i(x_0) = \frac{V(I, x_0)}{n} = \frac{A}{\rho} \ln(x_0) + \frac{B}{\rho} - \frac{2A}{\rho} - \frac{A \ln(n)}{\rho} + \frac{A \ln(\rho)}{\rho}.$$
References
Dockner E., Jorgensen S., van Long N., Sorger G. 2000. Differential Games in Economics and Management Science. Cambridge University Press.
Petrosjan L.A., Zaccour G. 2003. Time-consistent Shapley Value Allocation of Pollution Cost Reduction. Journal of Economic Dynamics and Control. Vol. 27: 381-398.
Petrosjan L.A., Shevkoplyas E.V. 2003. Cooperative Solutions for Games with Random Duration. Game Theory and Applications. Vol. IX. Nova Science Publishers: 125-139.
Bellman R. 1957. Dynamic programming. Princeton University Press: Princeton, NJ.
Shevkoplyas E.V. 2005. The Hamilton-Jacobi-Bellman equation for cooperative differential games with random duration. Stability and Control Processes Conference. St.Petersburg. Ext. abstracts: 630-639 (in Russian).
Shevkoplyas E.V. 2005. On the Construction of the Characteristic Function in Cooperative Differential Games with Random Duration. International Seminar “Control Theory and Theory of Generalized Solutions of Hamilton-Jacobi Equations” (CGS’2005). Ekaterinburg, Russia. Ext. abstracts. Vol.1: 262-270 (in Russian).
Petrosjan L.A., Murzov N.V. 1966. Game Theoretic Model in Mechanics. Litovskyi matematicheskyi sbornik. Vol. VI. Vilnius, Lithuania: 423-432.
Pontryagin L.S., Boltyansky V.G. 1976. Mathematical Theory of Optimal Processes. Moscow: Nauka.
Vorobyev N.N. 1985. Game Theory for Economists and Cybernetics. Moscow: Nauka.