The D. W. K. Yeung condition for cooperative differential games with nontransferable payoffs

CC BY
Anna V. Belitskaia

St. Petersburg University, Faculty of Applied Mathematics and Control Processes, Russia, St. Petersburg

Abstract. The irrational behavior proof condition for a single player was introduced in Yeung, 2006. Irrational behavior implies the dissolution of the cooperative scheme. Yeung, 2006 considered the condition under which, even if irrational behavior appears later in the game, the player concerned would still perform better under the cooperative scheme. That paper considered differential games with transferable payoffs. In this paper the irrational behavior proof condition for differential games with nontransferable payoffs is proposed.

Keywords: Pareto-Optimality, Time-consistency, Payoff Distribution Procedure.

1. Problem statement

Consider a two-person non-zero-sum differential game $\Gamma(x_0, [t_0, +\infty))$ starting from the initial state $x_0 \in R^2$ at the moment $t_0 \in R^1$.

The motion equations have the form

$$\dot{x}_i = f_i(x, u_1, u_2), \quad u_i \in U_i \subset R^l, \quad x \in R^2, \quad i = 1, 2, \qquad x(t_0) = x_0, \tag{1}$$

where $u_i \in U_i$ is the control variable of player $i$ and $U_i$ is a compact set.

Let $g_i(x(t), u_1(t), u_2(t))$ be the instantaneous payoff of player $i \in \{1, 2\}$ at the moment $t \in [t_0, +\infty)$.

The payoff of player i is given by:

$$K_i(x_0, [t_0, +\infty), u_1(\cdot), u_2(\cdot)) = \int_{t_0}^{+\infty} e^{-\rho(\tau - t_0)} g_i(x(\tau), u_1(\tau), u_2(\tau))\, d\tau, \quad g_i > 0, \quad i = 1, 2,$$

where $\rho$ is the discount rate. Suppose that payoffs are nontransferable.

If the players agree to cooperate, they play an open-loop strategy pair $u^*(t) = (u_1^*(t), u_2^*(t))$ which generates a Pareto-optimal payoff vector in the game $\Gamma(x_0, [t_0, +\infty))$:

$$\lambda_i(x_0, [t_0, +\infty)) = \int_{t_0}^{+\infty} e^{-\rho(\tau - t_0)} g_i[\tau, x^*(\tau), u_1^*(\tau), u_2^*(\tau)]\, d\tau, \quad i = 1, 2,$$

where $x^*(t)$ is the Pareto-optimal trajectory, obtained by substituting the Pareto-optimal open-loop strategy pair $u^*(t) = (u_1^*(t), u_2^*(t))$ into the motion equations (1) and solving the resulting differential equations.

In the absence of cooperation the players stick to the feedback Nash strategies $\bar{u}(t) = (\bar{u}_1(t), \bar{u}_2(t))$. The noncooperative payoff of player $i$ along the noncooperative trajectory $\bar{x}(t)$ for $t \in [t_0, +\infty)$ is defined as:

$$w_i(x_0, [t_0, +\infty)) = \int_{t_0}^{+\infty} e^{-\rho(\tau - t_0)} g_i[\tau, \bar{x}(\tau), \bar{u}_1(\tau), \bar{u}_2(\tau)]\, d\tau, \quad i = 1, 2.$$

2. Time-consistency

A stringent requirement for solutions of cooperative differential games is time-consistency. Time-consistency of a cooperative solution guarantees that the chosen optimality principle remains optimal along the optimal trajectory. A time-consistent solution of a game with nontransferable payoffs requires

1) Pareto optimality throughout the game horizon,

2) individual rationality throughout the game horizon.

For time-consistency of a solution in a game with transferable payoffs, the Imputation Distribution Procedure is used.

For time-consistency of a solution in a game with nontransferable payoffs, the notion of a Payoff Distribution Procedure (PDP) was introduced in Petrosyan, 1997:

Definition 1. The function $\beta_i(t)$ is called a Payoff Distribution Procedure (PDP) if it satisfies

$$\lambda_i(x_0, [t_0, +\infty)) = \int_{t_0}^{+\infty} e^{-\rho(t - t_0)} \beta_i(t)\, dt, \quad i = 1, 2.$$

Definition 2. An admissible Pareto-optimal solution is called time-consistent if there exists a PDP pair $\beta(t) = (\beta_1(t), \beta_2(t))$ such that the condition of individual rationality is satisfied:

$$\int_{t}^{+\infty} e^{-\rho(\tau - t_0)} \beta_i(\tau)\, d\tau \ge e^{-\rho(t - t_0)} w_i(x^*(t), [t, +\infty)), \quad i = 1, 2,$$

where $w_i(x^*(t), [t, +\infty))$ is the Nash outcome of player $i \in \{1, 2\}$ in the subgame $\Gamma(x^*(t), [t, +\infty))$. These payoffs are computed with the initial condition on the Pareto-optimal trajectory $x^*(t)$.

Theorem 1. An admissible Pareto-optimal solution is time-consistent if the functions $w_i(x^*(t), [t, +\infty))$ are differentiable and there exist nonnegative functions $\eta_i(t)$, $t \in [t_0, +\infty)$, such that the following conditions are satisfied:

$$\lambda_i(x_0, [t_0, +\infty)) - w_i(x_0, [t_0, +\infty)) = \int_{t_0}^{+\infty} e^{-\rho(\tau - t_0)} \eta_i(\tau)\, d\tau, \quad i = 1, 2.$$

Then the PDP $\beta_i(t)$ can be computed by the formula (see Petrosyan, 1997):

$$\beta_i(t) = \eta_i(t) + \rho\, w_i(x^*(t), [t, +\infty)) - \frac{d}{dt} w_i(x^*(t), [t, +\infty)), \quad i = 1, 2. \tag{2}$$
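The following short derivation is a sketch of why a PDP of the form (2) yields the individual rationality condition of Definition 2. It assumes, in addition to what is stated above, that $e^{-\rho(\tau - t_0)} w_i(x^*(\tau), [\tau, +\infty)) \to 0$ as $\tau \to +\infty$; this decay assumption is not stated explicitly in the text.

```latex
\begin{align*}
\int_{t}^{+\infty} e^{-\rho(\tau - t_0)} \beta_i(\tau)\, d\tau
&= \int_{t}^{+\infty} e^{-\rho(\tau - t_0)} \eta_i(\tau)\, d\tau
 - \int_{t}^{+\infty} \frac{d}{d\tau}\Big[ e^{-\rho(\tau - t_0)}
   w_i(x^*(\tau), [\tau, +\infty)) \Big]\, d\tau \\
&= \int_{t}^{+\infty} e^{-\rho(\tau - t_0)} \eta_i(\tau)\, d\tau
 + e^{-\rho(t - t_0)} w_i(x^*(t), [t, +\infty)) \\
&\ge e^{-\rho(t - t_0)} w_i(x^*(t), [t, +\infty)),
 \qquad \text{since } \eta_i(\tau) \ge 0 .
\end{align*}
```

The first equality uses $\frac{d}{d\tau}\big[e^{-\rho(\tau - t_0)} w_i\big] = -e^{-\rho(\tau - t_0)}\big(\rho w_i - \frac{d}{d\tau} w_i\big)$. Setting $t = t_0$ in the second line recovers the condition on $\eta_i$ in Theorem 1.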

3. Irrational Behavior Proof Condition

Consider the case where the cooperative scheme has proceeded up to time $t \in [t_0, +\infty)$ and some players behave irrationally, leading to the dissolution of the scheme. The condition under which, even if irrational behavior appears later in the game, the player concerned would still perform better under the cooperative scheme is the irrational behavior proof condition (Yeung, 2006), which is also called the D.W.K. Yeung condition.

The D.W.K. Yeung condition for the differential game of infinite duration $\Gamma(x_0, [t_0, +\infty))$ is described by the following inequality:

$$w_i(x_0, [t_0, +\infty)) \le \int_{t_0}^{t} e^{-\rho(\tau - t_0)} \beta_i(\tau)\, d\tau + e^{-\rho(t - t_0)} w_i(x^*(t), [t, +\infty)), \quad i = 1, 2, \tag{3}$$

where $\beta_i(t)$ is a time-consistent payoff distribution procedure; $x^*$ is the cooperative optimal trajectory;

$w_i(x_0, [t_0, +\infty))$ is the noncooperative payoff of player $i$ with the initial state $x_0$ over the time interval $[t_0, +\infty)$;

$w_i(x^*(t), [t, +\infty))$ is the noncooperative payoff of player $i$ with the initial state on the cooperative trajectory $x^*(t)$ over the time interval $[t, +\infty)$.

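The two sides of inequality (3) coincide at $t = t_0$, so it suffices that the right-hand side be nondecreasing in $t$. Differentiating the right-hand side of inequality (3) with respect to $t$ leads to the following sufficient condition for inequality (3) to hold.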

Theorem 2. The following inequality is sufficient for the D.W.K. Yeung condition to hold for player $i = 1, 2$ on the time interval $[t_0, +\infty)$:

$$\beta_i(t) \ge -\frac{d}{dt} w_i(x^*(t), [t, +\infty)) + \rho\, w_i(x^*(t), [t, +\infty)). \tag{4}$$
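The step from the D.W.K. Yeung inequality to the sufficient condition of Theorem 2 can be written out as follows. This is a sketch; the symbol $F_i(t)$ is a hypothetical name introduced here for the right-hand side of the Yeung inequality and does not appear in the original text.

```latex
\begin{align*}
F_i(t) &:= \int_{t_0}^{t} e^{-\rho(\tau - t_0)} \beta_i(\tau)\, d\tau
          + e^{-\rho(t - t_0)} w_i(x^*(t), [t, +\infty)), \\
F_i(t_0) &= w_i(x^*(t_0), [t_0, +\infty)) = w_i(x_0, [t_0, +\infty)), \\
\frac{d}{dt} F_i(t) &= e^{-\rho(t - t_0)} \Big[ \beta_i(t)
          - \rho\, w_i(x^*(t), [t, +\infty))
          + \frac{d}{dt} w_i(x^*(t), [t, +\infty)) \Big].
\end{align*}
```

If the bracket is nonnegative for all $t \ge t_0$, then $F_i$ is nondecreasing and $F_i(t) \ge F_i(t_0) = w_i(x_0, [t_0, +\infty))$, which is the D.W.K. Yeung condition. Nonnegativity of the bracket is exactly the inequality of Theorem 2.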

Proposition 1. The D.W.K. Yeung condition is satisfied for any time-consistent Pareto-optimal solution in the cooperative differential game with nontransferable payoffs when the PDP is computed by formula (2).

Proof. If the Pareto-optimal solution is time-consistent, the PDP $\beta_i(t)$ can be computed by formula (2).

Rewrite formula (2):

$$\beta_i(t) - \eta_i(t) = \rho\, w_i(x^*(t), [t, +\infty)) - \frac{d}{dt} w_i(x^*(t), [t, +\infty)), \quad \eta_i(t) \ge 0, \quad i = 1, 2.$$

Because $\eta_i(t)$ is a nonnegative function, the PDP $\beta_i(t)$ satisfies the following inequality:

$$\beta_i(t) \ge \rho\, w_i(x^*(t), [t, +\infty)) - \frac{d}{dt} w_i(x^*(t), [t, +\infty)), \quad i = 1, 2.$$

Thus the sufficient condition (4) for inequality (3) is satisfied for the PDP (2). Hence the D.W.K. Yeung condition is satisfied.

4. Example

To demonstrate the technique for computing the PDP and verifying the D.W.K. Yeung condition, consider the dynamic game of emission reduction.

The dynamics of the model were proposed in Petrosyan and Zaccour, 2003.

As a Pareto-optimal solution, consider the payoffs that implement the maximal total payoff.

4.1. Problem statement

Let $I = \{1, 2\}$ be the set of countries involved in the emission reduction game. The game starts at time $t = 0$ from the initial state $x_0$.

Denote by $u_i(t)$ the emission of player $i = 1, 2$ at time $t$, $t \in [0, +\infty)$.

Let $x(t)$ denote the stock of accumulated pollution by time $t$. The evolution of this stock is governed by the following differential equation:

$$\dot{x}(t) = \sum_{i \in I} u_i(t) - \delta x(t), \qquad x(t_0) = x_0, \tag{5}$$

where $\delta$ is the natural rate of pollution absorption.

$C_i(u_i)$ is the emission reduction cost incurred by country $i$ while limiting its emission to the level $u_i$:

$$C_i(u_i(t)) = \frac{\gamma}{2}(u_i(t) - \bar{u}_i)^2, \quad 0 \le u_i(t) \le \bar{u}_i, \quad \gamma > 0.$$

$D_i(x(t))$ denotes its damage cost:

$$D_i(x) = \pi x(t), \quad \pi > 0.$$

Both functions are continuously differentiable and convex, with $C_i'(u_i) < 0$ and $D_i'(x) > 0$.

The payoff function of player $i$ is defined as

$$K_i(x_0, [0, +\infty), u) = \int_{0}^{+\infty} e^{-\rho t} \big( C_i(u_i(t)) + D_i(x(t)) \big)\, dt,$$

where $u = (u_1, u_2)$ is the strategy profile in the game and $\rho$ is the common social discount rate. Each player seeks to minimize its total cost.

Let payoffs be nontransferable in the emission reduction game.

4.2. Solution of the problem

Every player seeks to decrease its total cost, so the inequality sign in the irrational behavior proof condition for the emission reduction game is reversed.

The D.W.K. Yeung condition for the emission reduction game:

$$w_i(x_0, [0, +\infty)) \ge \int_{0}^{t} e^{-\rho\tau} \beta_i(\tau)\, d\tau + e^{-\rho t} w_i(x^*(t), [t, +\infty)).$$

For completeness, we present the formulas for the feedback Nash equilibrium, the optimal cooperative trajectory $x^*(t)$ and the optimal cost of the grand coalition, first obtained in Petrosyan and Zaccour, 2003.

The noncooperative payoff $w_i(x_0, [0, +\infty))$ can be computed with the help of a system of Hamilton-Jacobi-Bellman equations.

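The noncooperative payoff of player $i$ is:

$$w_i(x_0, [0, +\infty)) = \frac{\pi}{\rho(\rho + \delta)} \left( \sum_{j \in I} \bar{u}_j - \frac{3\pi}{2\gamma(\rho + \delta)} + \rho x_0 \right), \quad i = 1, 2. \tag{6}$$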
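As an illustration of the Hamilton-Jacobi-Bellman step, the noncooperative cost can be obtained from a linear value function. This is a sketch, not necessarily the derivation used in Petrosyan and Zaccour, 2003: the linear form $w_i(x) = a x + b_i$ (with hypothetical coefficient names $a$, $b_i$) and the assumption that the minimizer is interior to $[0, \bar{u}_i]$ are assumptions made here.

```latex
\begin{align*}
\rho\, w_i(x) &= \min_{u_i} \Big\{ \tfrac{\gamma}{2}(u_i - \bar{u}_i)^2 + \pi x
   + \frac{d w_i}{dx}\,(u_1 + u_2 - \delta x) \Big\},
   \qquad w_i(x) = a x + b_i, \\
u_i &= \bar{u}_i - \frac{a}{\gamma}
   \quad \text{(first-order condition)}, \\
\rho a &= \pi - a\delta
   \;\Longrightarrow\; a = \frac{\pi}{\rho + \delta}
   \quad \text{(coefficient of } x\text{)}, \\
\rho b_i &= \frac{a^2}{2\gamma} + a\Big(\sum_{j \in I} \bar{u}_j - \frac{2a}{\gamma}\Big)
   = a \sum_{j \in I} \bar{u}_j - \frac{3a^2}{2\gamma}, \\
w_i(x_0) &= \frac{\pi}{\rho(\rho + \delta)}
   \Big( \sum_{j \in I} \bar{u}_j - \frac{3\pi}{2\gamma(\rho + \delta)} + \rho x_0 \Big).
\end{align*}
```

Under these assumptions the feedback Nash emissions are constant, $\bar{u}_i - \pi / (\gamma(\rho + \delta))$, and the cost is linear in the pollution stock.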

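By minimizing the sum of all players' costs, the cooperative payoff of player $i$ (the component of the minimal total payoff) is obtained as

$$\lambda_i(x_0, [0, +\infty)) = \frac{\pi}{\rho(\rho + \delta)} \left( \sum_{j \in I} \bar{u}_j - \frac{2\pi}{\gamma(\rho + \delta)} + \rho x_0 \right), \quad i = 1, 2.$$

The corresponding cooperative strategies are:

$$u_i^* = \bar{u}_i - \frac{2\pi}{\gamma(\rho + \delta)}, \quad i = 1, 2. \tag{7}$$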

To compute the optimal cooperative trajectory, substitute the cooperative strategies (7) into the motion equation (5).

Optimal cooperative trajectory:

$$x^*(t) = e^{-\delta t} x_0 + \frac{1}{\delta} \sum_{i \in I} u_i^* \left( 1 - e^{-\delta t} \right).$$

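The noncooperative payoff with the initial state on the optimal cooperative trajectory $x^*(t)$ is:

$$w_i(x^*(t), [t, +\infty)) = -\frac{\pi^2 (8\rho + 3\delta)}{2\rho\gamma\delta(\rho + \delta)^2} + \frac{\pi e^{-\delta t} x_0}{\rho + \delta} + \frac{\pi \sum_{j \in I} \bar{u}_j}{\rho\delta} - \frac{\pi \sum_{j \in I} \bar{u}_j\, e^{-\delta t}}{\delta(\rho + \delta)} + \frac{4\pi^2 e^{-\delta t}}{\gamma\delta(\rho + \delta)^2}. \tag{8}$$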

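To compute the PDP by formula (2), first compute the nonpositive function $\eta_i(t)$:

$$\lambda_i(x_0, [0, +\infty)) - w_i(x_0, [0, +\infty)) = -\frac{\pi^2}{2\rho\gamma(\rho + \delta)^2} = \int_{0}^{+\infty} e^{-\rho\tau} \eta_i(\tau)\, d\tau.$$

As the nonpositive function $\eta_i(t)$ we take the constant function:

$$\eta_i(t) = -\frac{\pi^2}{2\gamma(\rho + \delta)^2}, \quad i = 1, 2. \tag{9}$$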

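To compute the PDP for the emission reduction game, substitute the function $\eta_i(t)$ (9) and the noncooperative payoff $w_i(x^*(t), [t, +\infty))$ (8) into equation (2):

$$\beta_i(t) = -\frac{2\pi^2 (2\rho + \delta)}{\gamma\delta(\rho + \delta)^2} + \frac{4\pi^2 e^{-\delta t}}{\gamma\delta(\rho + \delta)} + \pi e^{-\delta t} x_0 + \frac{\pi}{\delta} \left( \sum_{j \in I} \bar{u}_j - \sum_{j \in I} \bar{u}_j\, e^{-\delta t} \right), \quad i = 1, 2. \tag{10}$$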

Substituting the time-consistent PDP (10), the noncooperative payoff $w_i(x_0, [0, +\infty))$ with the initial state $x_0$ (6) and the noncooperative payoff $w_i(x^*(t), [t, +\infty))$ with the initial state on the cooperative trajectory $x^*(t)$ (8) shows that the D.W.K. Yeung condition holds for the emission reduction game.
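The verification can also be checked numerically. The sketch below is illustrative only: the parameter values are hypothetical, and the closed forms used for $w_i$ and $x^*(t)$ are those that follow from a linear value function for the pollution dynamics, which is an assumption to be checked against Petrosyan and Zaccour, 2003. The PDP is built from $\beta_i = \eta_i + \rho w_i - \frac{d}{dt} w_i$ with a finite-difference derivative, and the cost version of the D.W.K. Yeung inequality is evaluated at several times.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
rho, delta, gamma, pi_ = 0.05, 0.1, 2.0, 0.1
u_bar = (3.0, 4.0)      # upper emission bounds of the two players (assumed)
x0 = 10.0               # initial pollution stock (assumed)
U = sum(u_bar)

def x_star(t):
    """Cooperative trajectory under u_i* = u_bar_i - 2*pi/(gamma*(rho+delta))."""
    c = U - 4.0 * pi_ / (gamma * (rho + delta))   # total cooperative emissions
    return math.exp(-delta * t) * x0 + c / delta * (1.0 - math.exp(-delta * t))

def w(x):
    """Noncooperative cost of one player starting from stock x (linear in x)."""
    return pi_ / (rho * (rho + delta)) * (
        U - 3.0 * pi_ / (2.0 * gamma * (rho + delta)) + rho * x
    )

# Constant nonpositive function eta_i.
eta = -pi_ ** 2 / (2.0 * gamma * (rho + delta) ** 2)

def beta(t, h=1e-5):
    """PDP: beta = eta + rho*w - dw/dt, derivative by central difference."""
    dw = (w(x_star(t + h)) - w(x_star(t - h))) / (2.0 * h)
    return eta + rho * w(x_star(t)) - dw

def yeung_gap(t, n=2000):
    """w_i(x0) minus the right-hand side of the cost-form Yeung inequality."""
    step = t / n
    # Midpoint rule for the integral of exp(-rho*tau) * beta(tau) over [0, t].
    integral = step * sum(
        math.exp(-rho * (k + 0.5) * step) * beta((k + 0.5) * step)
        for k in range(n)
    )
    return w(x0) - (integral + math.exp(-rho * t) * w(x_star(t)))

times = (0.5, 1.0, 5.0, 20.0)
gaps = [yeung_gap(t) for t in times]
for t, g in zip(times, gaps):
    print(f"t = {t:5.1f}   gap = {g:.6f}")
```

A nonnegative gap at every sampled time is consistent with the condition; analytically the gap equals $-\int_0^t e^{-\rho\tau}\eta_i(\tau)\,d\tau$, which is nonnegative because $\eta_i \le 0$.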

References

Petrosjan, L. A. (1997). The Time-Consistency Problem in Nonlinear Dynamics. RBCM -J. of Brazilian Soc. of Mechanical Sciences, Vol. XIX, No 2. pp. 291-303.

Yeung, D.W.K. (2006). An irrational-behavior-proofness condition in cooperative differential games. International Game Theory Review, 8, 739-744.

David W. K. Yeung, Leon A. Petrosyan (2006). Cooperative Stochastic Differential Games. Springer Science+Business Media, Inc., P. 242.

Petrosyan, L., Zaccour, G. (2003). Time-consistent Shapley value allocation of pollution cost reduction. Journal of Economic Dynamics and Control, 27, 381-398.

Petrosjan, L. A. (2003). Cooperation in differential games. Conference Book of the XV Italian Meeting on Game Theory and Applications.

Petrosyan, L. A., Zenkevich, N. A. (2008). Time-consistency of cooperative solutions in management. Game theory and applications, Nova Science Publ., NY.

Haurie, A., Zaccour, G. (1995). Differential game models of global environment management. Annals of the International Society of Dynamic Games, pp. 3-24.

Kaitala, V., M. Pohjola (1995). Sustainable international agreements on greenhouse warming: a game theory study. Annals of the International Society of Dynamic Games, pp. 67-88.

Dockner, E. J., S. Jorgensen, N. Van Long and G. Sorger (2000). Differential Games in Economics and Management Science. Cambridge University Press, pp. 41-85.
