Dynamically Stable Cooperation and the Tenet of Transitory Compensation

Scientific article in the specialty "Mathematics"

CC BY

Abstract of the scientific article in mathematics. Authors: W. K. Yeung, L. A. Petrosyan

Dynamic cooperation is one of the most complex forms of decision making under uncertainty. The interaction between strategic behavior, evolutionary dynamics and a stochastic process has to be studied simultaneously, which makes the construction of dynamically stable solutions very difficult. Despite the pressing need for cooperation in the global economy, the lack of a formal analysis of solutions has hindered serious study of the problem. This article examines an essential element leading to dynamically stable (subgame consistent) solutions: the principle of transitory compensation.

DYNAMICALLY STABLE COOPERATION AND THE TENET OF TRANSITORY COMPENSATION

Dynamic cooperation represents one of the most complex forms of decision-making analysis under uncertainty. In particular, interactions between strategic behavior, dynamic evolution and stochastic elements have to be considered simultaneously in the process. This complexity leads to great difficulties in the derivation of dynamically stable solutions. Despite urgent calls for cooperation in the global economy, the lack of formal solution analysis has precluded rigorous study of this problem. In this paper, we examine an essential factor leading to subgame consistent solutions for dynamic cooperation: transitory compensation.


UDK 539.3. Viestnik St. Petersburg University. Ser. 10, 2006, issue 1

W. K. Yeung, L. A. Petrosyan (В. К. Янг, Л. А. Петросян)

DYNAMICALLY STABLE COOPERATION AND THE TENET OF TRANSITORY COMPENSATION *) **)

1. Introduction. Interactive behavior in stochastic dynamic environments constitutes a highly complex form of decision making under uncertainty. In particular, interactions between strategic behaviors, dynamic evolution and stochastic elements have to be considered simultaneously. As a result, such problems have not been studied extensively. Examples of solvable problems include Clemhout and Wan [1], Kaitala [2], Jørgensen and Yeung [3], and Yeung [4-5]. Cooperative actions hold out the promise of more socially optimal and group efficient solutions to problems involving strategic actions. Economic cooperation represents one of the most complex forms of decision-making analysis under uncertainty. On top of strategic interactions, dynamic evolution and stochastic elements, the participants' optimal behavior has to be formulated. This complexity leads to great difficulties in the derivation of a satisfactory analysis.

Formulation of optimal behaviors for agents is a fundamental element in the theory of cooperative games. The agents' behaviors satisfying some specific optimality principles constitute a solution of the game. In other words, the solution of a cooperative game is generated by a set of optimality principles. In dynamic cooperation, a stringent condition on the cooperative agreement is required: The solution optimality principle must remain optimal at any instant of time throughout the game along the optimal state trajectory chosen at the outset. This condition is known as dynamic stability or time consistency. In the presence of stochastic elements, subgame consistency (formally defined in Yeung and Petrosyan [6]) is required in addition to dynamic stability for a credible cooperative solution. A cooperative solution is subgame consistent if an extension of the solution policy to a situation with a later starting time and any feasible state brought about by prior optimal behaviors would remain optimal.

In this paper, we examine an integral factor leading to subgame consistent solutions for dynamic cooperation — transitory compensation. Section 2 presents the basic formulation of a cooperative stochastic differential game. Section 3 examines the notion of subgame consistency. Equilibrating transitory compensations supporting subgame consistent solutions are derived in Section 4. The economic rationale behind transitory compensation is examined in Section 5. Concluding remarks are provided in Section 6.

2. Game Formulation and Noncooperative Equilibria. Consider the situation in which there exist two economic agents (firms, regions or nations). We shall use the terms agent and player interchangeably. The planning horizon of the agents begins at time $t_0$ and ends at time $T$, where $T$ can be finite or infinite. The state space of the economic system is denoted by $X \subset R^n$, and $\{x(s) \in X,\ t_0 \le s \le T\}$ represents permissible state trajectories. The state may include any stock variables like resource endowments, sizes of labor force, capital stocks and states of technology.

*) Financial support from the Research Grant Council of Hong Kong (Grant HKBU2103/04H) and Hong Kong Baptist University (Grant FRG/04-05/II-03) is gratefully acknowledged.

**) The article is published without editorial revision by the publisher.

© W. K. Yeung, L. A. Petrosyan, 2006

The state dynamics of the problem is characterized by the vector-valued stochastic differential equations:

$$dx(s) = f[s, x(s), u_1(s), u_2(s)]\,ds + \sigma[s, x(s)]\,dz(s), \qquad x(t_0) = x_0, \tag{1}$$

where $\sigma[s, x(s)]$ is an $n \times \Theta$ matrix, $z(s)$ is a $\Theta$-dimensional Wiener process, and the initial state $x_0$ is given. Let $\Omega[s, x(s)] = \sigma[s, x(s)]\,\sigma[s, x(s)]'$ denote the covariance matrix with its element in row $h$ and column $\zeta$ denoted by $\Omega^{h\zeta}[s, x(s)]$. $u_i \in U_i \subset \mathrm{comp}\,R^l$ is the control vector of player $i \in \{1,2\}$.
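The dynamics (1) can be explored numerically. The following is a minimal sketch, assuming a scalar state and feedback controls already substituted into the drift, so that `f` and `sigma` depend only on time and state. The function name `euler_maruyama` and the example drift are illustrative choices, not part of the paper; the scheme is the standard Euler-Maruyama discretization.

```python
import math
import random


def euler_maruyama(f, sigma, x0, t0, T, n_steps, rng):
    """Simulate one path of dx = f(s, x) ds + sigma(s, x) dz on [t0, T].

    Assumption: scalar state, with the players' feedback controls already
    folded into the drift f. Returns a list of (time, state) pairs.
    """
    dt = (T - t0) / n_steps
    s, x = t0, x0
    path = [(s, x)]
    for _ in range(n_steps):
        dz = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment, N(0, dt)
        x = x + f(s, x) * dt + sigma(s, x) * dz
        s = s + dt
        path.append((s, x))
    return path


# Hypothetical example: mean-reverting drift with constant volatility.
example_path = euler_maruyama(
    f=lambda s, x: -0.5 * x,
    sigma=lambda s, x: 0.2,
    x0=1.0, t0=0.0, T=1.0, n_steps=1000,
    rng=random.Random(0),
)
```

Setting `sigma` to zero recovers the deterministic version of the dynamics, which gives a convenient sanity check against the exact exponential solution.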

The instantaneous payoff of player $i$ is denoted by $g^i[s, x(s), u_1(s), u_2(s)]$, and when the game terminates at time $T$, player $i$ receives a terminal value of $q^i(x(T))$. Given a time-varying instantaneous discount rate $r(s)$, for $s \in [t_0, T]$, values received at time $t$ have to be discounted by the factor $\exp\left[-\int_{t_0}^{t} r(y)\,dy\right]$. Hence at time $t_0$, the payoff function of player $i$ is given as:

$$E_{t_0}\left\{ \int_{t_0}^{T} g^i[s, x(s), u_1(s), u_2(s)] \exp\left[-\int_{t_0}^{s} r(y)\,dy\right] ds + \exp\left[-\int_{t_0}^{T} r(y)\,dy\right] q^i(x(T)) \right\}. \tag{2}$$

We use $\Gamma(x_0, T - t_0)$ to denote the noncooperative game (1), (2).
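To make the discounting in (2) concrete, the sketch below evaluates the expression inside the expectation for a single realized payoff stream. It assumes the instantaneous payoff along the path is available as a function of time `g(s)` and uses the trapezoidal rule; the name `discounted_payoff` and the numerical inputs are hypothetical. The expectation $E_{t_0}$ would then be approximated by averaging this quantity over many simulated paths.

```python
import math


def discounted_payoff(g, q_T, r, t0, T, n=10_000):
    """Integral of g(s) * exp(-int_{t0}^{s} r(y) dy) over [t0, T], plus the
    discounted terminal value q_T, for one realized payoff stream.

    Assumptions: g and r are plain functions of time; both integrals are
    approximated with the trapezoidal rule on n equal steps.
    """
    h = (T - t0) / n
    cum_r = 0.0           # running value of int_{t0}^{s} r(y) dy
    prev_r = r(t0)
    prev_val = g(t0)      # integrand at s = t0, where the discount factor is 1
    total = 0.0
    for k in range(1, n + 1):
        s = t0 + k * h
        cur_r = r(s)
        cum_r += 0.5 * h * (prev_r + cur_r)
        cur_val = g(s) * math.exp(-cum_r)
        total += 0.5 * h * (prev_val + cur_val)
        prev_r, prev_val = cur_r, cur_val
    return total + math.exp(-cum_r) * q_T


# Hypothetical inputs: constant payoff rate 1, constant discount rate 5%,
# terminal value 2, horizon [0, 10].
value = discounted_payoff(lambda s: 1.0, 2.0, lambda s: 0.05, 0.0, 10.0)
```

With a constant rate the result can be compared against the closed form $(1 - e^{-rT})/r + e^{-rT} q$.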

Definition 1. A set of feedback strategies $\{\phi_i^{(t_0)*}(t,x) \in U_i, \text{ for } i \in \{1,2\}\}$ provides a Nash equilibrium solution to the game $\Gamma(x_0, T - t_0)$ if there exist continuously differentiable functions $V^{(t_0)i}(t,x) : [t_0,T] \times R^n \to R$, $i \in \{1,2\}$, satisfying the following (Fleming-Bellman-Isaacs) partial differential equations:

$$-V_t^{(t_0)i}(t,x) - \frac{1}{2}\sum_{h,\zeta=1}^{n} \Omega^{h\zeta}(t,x)\, V_{x^h x^\zeta}^{(t_0)i}(t,x) = \max_{u_i}\left\{ g^i[t,x,u_i(t,x),\phi_j^{(t_0)*}(t,x)] \exp\left[-\int_{t_0}^{t} r(y)\,dy\right] + V_x^{(t_0)i}(t,x)\, f[t,x,u_i(t,x),\phi_j^{(t_0)*}(t,x)] \right\},$$

and

$$V^{(t_0)i}(T,x) = \exp\left[-\int_{t_0}^{T} r(y)\,dy\right] q^i(x), \qquad i,j \in \{1,2\} \text{ and } j \neq i.$$

Consider the alternative game $\Gamma(x_\tau, T - \tau)$ with payoff structure (2) and dynamics (1) starting at time $\tau \in [t_0, T]$ with initial state $x_\tau \in X$.

A set of feedback strategies $\{\phi_i^{(\tau)*}(t,x) \in U_i, \text{ for } i \in \{1,2\} \text{ and } t \in [\tau,T]\}$ provides a Nash equilibrium solution to the noncooperative game $\Gamma(x_\tau, T - \tau)$ if there exist suitably smooth functions $V^{(\tau)i}(t,x_t) : [\tau,T] \times R^n \to R$, for $i \in \{1,2\}$, satisfying the corresponding Isaacs-Bellman-Fleming equations in Definition 1.

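Remark 1. Examining the Isaacs-Bellman-Fleming equations of the game $\Gamma(x_\tau, T - \tau)$ for different values of $\tau \in [t_0, T]$, one can readily verify that:

$$\phi_i^{(\tau)*}(s, x(s)) = \phi_i^{(t_0)*}(s, x(s)), \text{ which is denoted by } \phi_i^*(s, x(s)), \quad s \in [\tau, T],$$

$$V^{(\tau)i}(\tau, x_\tau) = \exp\left[\int_{t_0}^{\tau} r(y)\,dy\right] V^{(t_0)i}(\tau, x_\tau), \text{ and}$$

$$V^{(t)i}(t, x_t) = \exp\left[\int_{\tau}^{t} r(y)\,dy\right] V^{(\tau)i}(t, x_t), \quad \text{for } t_0 \le \tau \le t \le T \text{ and } i \in \{1,2\}.$$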

3. Subgame Consistency. The notions of time consistency and subgame consistency guarantee the dynamic stability of a credible cooperative solution. In particular, the dynamic stability of a solution of a cooperative differential game requires that, when the game proceeds along an optimal trajectory, the players are guided at each instant of time by the same optimality principle, and hence have no grounds for deviating from the previously adopted optimal behavior throughout the game. Central to a successful economic cooperative scheme is the property of subgame consistency. The question of dynamic stability in differential games has been explored rigorously in the past three decades. Haurie [7] discussed the problem of instability in extending the Nash bargaining solution to differential games. Petrosyan [8] formalized mathematically the notion of dynamic stability in solutions of differential games. Petrosyan and Danilov [9] introduced the notion of "imputation distribution procedure" for a cooperative solution.

As pointed out by Jørgensen and Zaccour [10], conditions ensuring time consistency of cooperative solutions could be quite stringent and analytically intractable. In the presence of stochastic elements, a more stringent condition, subgame consistency, is required for a credible cooperative solution. A cooperative solution is subgame consistent if an extension of the solution policy to a situation with a later starting time and any feasible state brought about by prior optimal behaviors would remain optimal. The recent work of Yeung and Petrosyan [6] developed a generalized theorem for the derivation of an analytically tractable payoff distribution procedure supporting a subgame consistent solution.

Let $\Gamma_c(x_0, T - t_0)$ denote a cooperative game with the game structure of $\Gamma(x_0, T - t_0)$ in which the players agree to act according to a particular optimality principle. The solution optimality principle for a cooperative game $\Gamma_c(x_0, T - t_0)$ includes

(i) an agreement on a set of cooperative strategies and

(ii) a mechanism to distribute total payoff among players.

Under the solution optimality principle, both group rationality and individual rationality are required to be satisfied.

3.1. Group Rationality and Optimal Trajectory. Consider the cooperative game $\Gamma_c(x_0, T - t_0)$ with transferable payoffs. To achieve group rationality, the players can agree to act so that the sum of the expected payoffs is maximized. We use

$$\{\psi_i^{(t_0)*}(t,x) \in U_i, \text{ for } i \in \{1,2\} \text{ and } t \in [t_0,T]\}$$

to denote a set of controls that maximizes their joint expected payoff, and $W^{(t_0)}(t,x) : [t_0,T] \times R^n \to R$ to denote the corresponding optimal value function that satisfies the stochastic dynamic programming equation of Fleming and Rishel [11]. We delineate the corresponding trajectory by the stochastic process:

$$x^*(t) = x_0 + \int_{t_0}^{t} f[s, x^*(s), \psi_1^{(t_0)*}(s, x^*(s)), \psi_2^{(t_0)*}(s, x^*(s))]\,ds + \int_{t_0}^{t} \sigma[s, x^*(s)]\,dz(s). \tag{3}$$

We use $X_t^*$ to denote the set of realizable values of $x^*(t)$ at time $t$ generated by (3). The terms $x^*(t)$ and $x_t^*$ are used interchangeably.

Consider the alternative cooperative game $\Gamma_c(x_\tau^*, T - \tau)$ which begins at time $\tau \in [t_0, T]$ with initial state $x_\tau^* \in X_\tau^*$ and the same solution optimality principle. We use $\{\psi_i^{(\tau)*}(t,x) \in U_i, \text{ for } i \in \{1,2\} \text{ and } t \in [\tau,T]\}$ to denote a set of controls that yields the optimal trajectory of $\Gamma_c(x_\tau^*, T - \tau)$, and $W^{(\tau)}(t,x) : [\tau,T] \times R^n \to R$ to denote the optimal value function that satisfies the stochastic dynamic programming equation.

Remark 2. Examining the stochastic dynamic programming equations for $\Gamma_c(x_0, T - t_0)$ and $\Gamma_c(x_\tau^*, T - \tau)$, one can readily verify that:

$$\psi_i^{(\tau)*}(s, x(s)) = \psi_i^{(t_0)*}(s, x(s)), \quad \text{for } s \in [\tau, T],$$

$$W^{(\tau)}(\tau, x_\tau) = \exp\left[\int_{t_0}^{\tau} r(y)\,dy\right] W^{(t_0)}(\tau, x_\tau), \text{ and}$$

$$W^{(t)}(t, x_t) = \exp\left[\int_{\tau}^{t} r(y)\,dy\right] W^{(\tau)}(t, x_t), \quad \text{for } t_0 \le \tau \le t \le T.$$

Hereafter we adopt the notation $\psi_i^{(\tau)*}(s, x(s)) = \psi_i^{(t)*}(s, x(s)) = \psi_i^*(s, x(s))$, for $i \in \{1,2\}$ and $t_0 \le \tau \le t \le s \le T$. An optimality principle satisfying group rationality would involve an agreement of the players to adopt $[\psi_1^*(s, x^*(s)), \psi_2^*(s, x^*(s))]$, for $s \in [t_0, T]$. A corollary of Remark 2 is:

Corollary 1. Any optimal trajectory of the game $\Gamma_c(x_\tau^*, T - \tau)$ is a continuation of some optimal trajectory of the game $\Gamma_c(x_0, T - t_0)$.

Hence the optimal (stochastic) trajectory is characterized by (3).

3.2. Individual Rationality. The solution optimality principle would also provide a mechanism to distribute total expected payoff among players. Denote by $\xi^{(\tau)}(\tau, x_\tau^*) = [\xi^{(\tau)1}(\tau, x_\tau^*), \xi^{(\tau)2}(\tau, x_\tau^*)]$ the expected shares of the players over the interval $[\tau, T]$ in the game $\Gamma_c(x_\tau^*, T - \tau)$ according to the optimality principle. Following Yeung and Petrosyan [6], we formulate a payoff distribution procedure (PDP) over time so that the agreed upon shares can be realized. Let $B_i^\tau(s)$ denote the instantaneous payment given to player $i$ at time instant $s \in [\tau, T]$ in the cooperative game $\Gamma_c(x_\tau^*, T - \tau)$. A terminal value of $q^i(x^*(T))$ is received by player $i$ at time $T$.

In particular, for $i \in \{1,2\}$ and $\tau \in [t_0, T]$, $B_i^\tau(s)$ and $q^i(x^*(T))$ constitute a PDP for the game $\Gamma_c(x_\tau^*, T - \tau)$ in the sense that $\xi^{(\tau)i}(\tau, x_\tau^*)$ equals:

$$E_\tau\left\{ \left( \int_{\tau}^{T} B_i^\tau(s) \exp\left[-\int_{\tau}^{s} r(y)\,dy\right] ds + q^i(x^*(T)) \exp\left[-\int_{\tau}^{T} r(y)\,dy\right] \right) \Big|\ x(\tau) = x_\tau^* \right\}.$$

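Definition 2. The vector $\xi^{(\tau)}(\tau, x_\tau^*) = [\xi^{(\tau)1}(\tau, x_\tau^*), \xi^{(\tau)2}(\tau, x_\tau^*)]$ is an imputation of the cooperative game $\Gamma_c(x_\tau^*, T - \tau)$, for $\tau \in [t_0, T]$, if it satisfies:

(i) $\sum_{j=1}^{2} \xi^{(\tau)j}(\tau, x_\tau^*) = W^{(\tau)}(\tau, x_\tau^*)$,

(ii) $\xi^{(\tau)i}(\tau, x_\tau^*) \ge V^{(\tau)i}(\tau, x_\tau^*)$, for $i \in \{1,2\}$.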

Part (i) of Definition 2 ensures Pareto optimality and hence group rationality throughout the game. Part (ii) guarantees individual rationality in the sense that each player receives at least the payoff he will get if there is no agreement and the game is played noncooperatively.
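The two requirements of Definition 2 are easy to check numerically once the values at a given $(\tau, x_\tau^*)$ are known. The following sketch assumes those values are supplied as plain numbers; the function name `is_imputation` and the tolerance are illustrative choices.

```python
def is_imputation(xi, W, V, tol=1e-9):
    """Check Definition 2 at a single (tau, x) point.

    xi: proposed shares of the players,
    W:  maximized joint cooperative payoff,
    V:  noncooperative payoffs of the players.
    Assumption: all values are already evaluated at the same time and state.
    """
    group_rational = abs(sum(xi) - W) <= tol          # part (i)
    individually_rational = all(                      # part (ii)
        share >= fallback - tol for share, fallback in zip(xi, V)
    )
    return group_rational and individually_rational


# Hypothetical numbers: joint payoff 10, noncooperative payoffs 5 and 3.
ok = is_imputation((6.0, 4.0), 10.0, (5.0, 3.0))
```

A split that sums to the joint payoff but leaves one player below the noncooperative payoff fails part (ii), and a split that does not exhaust the joint payoff fails part (i).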

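Moreover, for $i \in \{1,2\}$ and $t \in [\tau, T]$, we introduce the term $\xi^{(\tau)i}(t, x_t^*)$, which equals

$$E_\tau\left\{ \left( \int_{t}^{T} B_i^\tau(s) \exp\left[-\int_{\tau}^{s} r(y)\,dy\right] ds + q^i(x^*(T)) \exp\left[-\int_{\tau}^{T} r(y)\,dy\right] \right) \Big|\ x(t) = x_t^* \right\},$$

to denote the expected value of player $i$'s cooperative payoff over the time interval $[t, T]$, given that the state is $x_t^* \in X_t^*$ at time $t \in [\tau, T]$, in the game $\Gamma_c(x_\tau^*, T - \tau)$. The vector $\xi^{(\tau)}(t, x_t^*) = [\xi^{(\tau)1}(t, x_t^*), \xi^{(\tau)2}(t, x_t^*)]$ can be regarded as the dynamic imputation vector over the interval $[t, T]$ of the game $\Gamma_c(x_\tau^*, T - \tau)$ so that $\xi^{(\tau)}(\tau, x_\tau^*)$ is attained.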

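Definition 3. The vector $\xi^{(\tau)}(t, x_t^*) = [\xi^{(\tau)1}(t, x_t^*), \xi^{(\tau)2}(t, x_t^*)]$, for $t \in [\tau, T]$, satisfying Definition 2 and the condition that

$$\xi^{(\tau)i}(t, x_t^*) = \exp\left[-\int_{\tau}^{t} r(y)\,dy\right] \xi^{(t)i}(t, x_t^*), \quad \text{for } i \in \{1,2\}$$

and $x_t^* \in X_t^*$, is a subgame consistent imputation of $\Gamma_c(x_\tau^*, T - \tau)$.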

Subgame consistency as defined in Definition 3 guarantees that the solution imputations remain valid throughout the game interval, in the sense that the extension of the solution policy to a situation with a later starting time and any possible state brought about by prior optimal behavior of the players remains optimal. Moreover, group and individual rationality are satisfied throughout the game. Crucial to the analysis is the formulation of a compensation mechanism that would lead to the realization of $\xi^{(\tau)i}(t, x_t^*)$ according to the solution optimality principle. This will be done in the next section.

4. The Tenet of Transitory Compensation. In this section, a profit distribution mechanism is developed to compensate transitory changes so that $\xi^{(\tau)i}(t, x_t^*)$ based on the solution optimality principle can be realized. For Definition 3 to hold, it is required that $B_i^\tau(s) = B_i^t(s)$, for $i \in \{1,2\}$, $\tau \in [t_0, T]$, $t \in [t_0, T]$ and $\tau \le t$. Adopting the notation $B_i^\tau(s) = B_i^t(s) = B_i(s)$ and applying Definition 3, the PDP of the subgame consistent imputation vectors $\xi^{(\tau)}(t, x_t^*)$ has to satisfy the following condition.

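Condition 1. The PDP with $B(s)$ and $q(x^*(T))$ corresponding to the subgame consistent imputation vectors $\xi^{(\tau)}(t, x_t^*)$ must satisfy the following conditions:

(i) $\sum_{j=1}^{2} B_j(s) = \sum_{j=1}^{2} g^j[s, x_s^*, \psi_1^*(s, x_s^*), \psi_2^*(s, x_s^*)]$, for $s \in [t_0, T]$;

(ii) $E_\tau\left\{ \left( \int_{\tau}^{T} B_i(s) \exp\left[-\int_{\tau}^{s} r(y)\,dy\right] ds + q^i(x^*(T)) \exp\left[-\int_{\tau}^{T} r(y)\,dy\right] \right) \Big|\ x(\tau) = x_\tau^* \right\} \ge V^{(\tau)i}(\tau, x_\tau^*)$; and

(iii) $\xi^{(\tau)i}(\tau, x_\tau^*) = E_\tau\left\{ \left( \int_{\tau}^{\tau+\Delta t} B_i(s) \exp\left[-\int_{\tau}^{s} r(y)\,dy\right] ds + \exp\left[-\int_{\tau}^{\tau+\Delta t} r(y)\,dy\right] \xi^{(\tau+\Delta t)i}(\tau + \Delta t, x_\tau^* + \Delta x_\tau^*) \right) \Big|\ x(\tau) = x_\tau^* \right\}$,

for $\tau \in [t_0, T]$ and $i \in \{1,2\}$;

where $\Delta x_\tau^* = f[\tau, x_\tau^*, \psi_1^*(\tau, x_\tau^*), \psi_2^*(\tau, x_\tau^*)]\,\Delta t + \sigma[\tau, x_\tau^*]\,\Delta z_\tau + o(\Delta t)$, $x(\tau) = x_\tau^* \in X_\tau^*$, $\Delta z_\tau = z(\tau + \Delta t) - z(\tau)$, and $E_\tau[o(\Delta t)]/\Delta t \to 0$ as $\Delta t \to 0$.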

Consider the following condition concerning subgame consistent imputations $\xi^{(\tau)}(\tau, x_\tau^*)$, for $\tau \in [t_0, T]$:

Condition 2. For $i \in \{1,2\}$, $t \ge \tau$ and $\tau \in [t_0, T]$, the terms $\xi^{(\tau)i}(t, x_t^*)$ are functions that are continuously twice differentiable in $t$ and $x_t^*$.

If the dynamic imputations $\xi^{(\tau)i}(t, x_t^*)$, for $\tau \in [t_0, T]$ and $t \in [\tau, T]$, satisfy Conditions 1 and 2, the PDP with $B(s)$ and $q(x^*(T))$ is characterized by the following theorem:

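Theorem 1. Tenet of Transitory Compensation. If the solution imputations $\xi^{(\tau)i}(\tau, x_\tau^*)$, for $i \in \{1,2\}$ and $\tau \in [t_0, T]$, satisfy Conditions 1 and 2, a PDP with a terminal payment $q^i(x^*(T))$ at time $T$ and an instantaneous imputation rate at time $\tau \in [t_0, T]$:

$$B_i(\tau) = -\left[\xi_t^{(\tau)i}(t, x_t)\Big|_{t=\tau,\ x_t=x_\tau^*}\right] - \left[\xi_{x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau,\ x_t=x_\tau^*}\right] f[\tau, x_\tau^*, \psi_1^*(\tau, x_\tau^*), \psi_2^*(\tau, x_\tau^*)] - \frac{1}{2}\sum_{h,\zeta=1}^{n} \Omega^{h\zeta}(\tau, x_\tau^*) \left[\xi_{x_t^h x_t^\zeta}^{(\tau)i}(t, x_t)\Big|_{t=\tau,\ x_t=x_\tau^*}\right], \quad \text{for } i \in \{1,2\},$$

yields a subgame consistent solution to the cooperative game $\Gamma_c(x_0, T - t_0)$.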

Proof. See Yeung and Petrosyan [6].
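As a numerical illustration of Theorem 1, the sketch below treats the deterministic case ($\sigma = 0$) with a scalar state and a constant discount rate. Everything concrete here is an assumption made for illustration: the linear drift, the parameter values, and the imputation $\xi^{(t)i}(t,x) = (1 + T - t)\,x$ with terminal payment $q^i(x) = x$ are hypothetical and are not derived from any particular game. The relation $\xi^{(\tau)i}(t,x) = e^{-r(t-\tau)}\xi^{(t)i}(t,x)$ from Definition 3 is imposed, the rate $B_i(\tau)$ is computed from the formula in Theorem 1 with the second-order term dropped, and the check is that discounting $B_i$ over $[\tau, T]$ and adding the discounted terminal payment recovers $\xi^{(\tau)i}(\tau, x_\tau^*)$.

```python
import math

# Hypothetical parameters: discount rate, drift coefficient, horizon, initial state.
R, A, T0, T, X0 = 0.05, 0.1, 0.0, 2.0, 1.0


def drift(t, x):
    """Cooperative closed-loop drift f (assumed linear); sigma is zero."""
    return A * x


def state(t):
    """Exact cooperative trajectory of dx/dt = A x with x(T0) = X0."""
    return X0 * math.exp(A * (t - T0))


def q(x):
    """Terminal payment to the player (assumed)."""
    return x


def xi(tau, t, x):
    """xi^{(tau)i}(t, x): share over [t, T], discounted back to time tau."""
    return math.exp(-R * (t - tau)) * (1.0 + T - t) * x


def compensation(tau, h=1e-5):
    """B_i(tau) = -xi_t - xi_x * f at t = tau (Theorem 1 with sigma = 0).

    Partial derivatives are approximated by central finite differences.
    """
    x = state(tau)
    xi_t = (xi(tau, tau + h, x) - xi(tau, tau - h, x)) / (2.0 * h)
    xi_x = (xi(tau, tau, x + h) - xi(tau, tau, x - h)) / (2.0 * h)
    return -xi_t - xi_x * drift(tau, x)


def pdp_value(tau, n=4000):
    """Discounted integral of B_i over [tau, T] plus discounted terminal payment."""
    h = (T - tau) / n
    vals = [compensation(tau + k * h) * math.exp(-R * k * h) for k in range(n + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))  # trapezoidal rule
    return integral + math.exp(-R * (T - tau)) * q(state(T))


gap = abs(pdp_value(0.5) - xi(0.5, 0.5, state(0.5)))
```

In this setting the value of `gap` should be close to zero for any starting time in the horizon, up to discretization error, which is what subgame consistency of the payment stream requires.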

5. Economic Exegesis of Transitory Compensation. In this section, we examine the rationale behind the tenet of transitory compensation and provide an economic exegesis for it. Consider a cooperative scheme $\Gamma_c(x_0, T - t_0)$ in which the players agree to maximize the sum of their expected payoffs and divide the total cooperative payoff according to a certain imputation mechanism. For instance, the imputation mechanism may be required to satisfy the Nash bargaining scheme, that is, the players maximize the product of expected gains in excess of the noncooperative payoffs. In a more general setting, $\xi^{(\tau)i}(\tau, x_\tau^*)$ may be expressed as a function of the cooperative payoff and the individual noncooperative payoffs. In particular

$$\xi^{(\tau)i}(\tau, x_\tau^*) = \omega^{(\tau)i}\left[W^{(\tau)}(\tau, x_\tau^*),\ V^{(\tau)i}(\tau, x_\tau^*),\ V^{(\tau)j}(\tau, x_\tau^*)\right],$$

and

$$\xi^{(\tau)i}(t, x_t^*) = \omega^{(\tau)i}\left[W^{(\tau)}(t, x_t^*),\ V^{(\tau)i}(t, x_t^*),\ V^{(\tau)j}(t, x_t^*)\right], \quad \text{for } i \in \{1,2\}.$$

If $\omega^{(\tau)i}(t, x_t^*)$ is continuously differentiable in $W^{(\tau)}(t, x_t^*)$, $V^{(\tau)i}(t, x_t^*)$, and $V^{(\tau)j}(t, x_t^*)$, then Condition 2 is satisfied because the latter three expressions are continuously differentiable in $t$ and $x_t^*$. Moreover, $\omega_W^{(\tau)i}(t, x_t^*) \ge 0$, $\omega_{V^i}^{(\tau)i}(t, x_t^*) \ge 0$ and $\omega_{V^j}^{(\tau)i}(t, x_t^*) \le 0$, where the subscripts denote partial derivatives with respect to the corresponding argument.
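The Nash bargaining scheme mentioned above gives a simple instance of the function $\omega^{(\tau)i}$. With two players and transferable payoffs, maximizing $(\xi^1 - V^1)(\xi^2 - V^2)$ subject to $\xi^1 + \xi^2 = W$ splits the cooperative surplus equally, so that $\xi^i = V^i + \tfrac{1}{2}(W - V^1 - V^2)$. The sketch below, with hypothetical function names and numbers, computes this split and estimates the three partial derivatives by finite differences; their signs agree with the sign conditions stated in the text.

```python
def nash_bargaining_shares(W, V1, V2):
    """Split maximizing (xi1 - V1) * (xi2 - V2) subject to xi1 + xi2 = W."""
    surplus = W - V1 - V2  # cooperative gain over the noncooperative payoffs
    return V1 + 0.5 * surplus, V2 + 0.5 * surplus


def nash_product(xi1, W, V1, V2):
    """Product of gains when player 1 receives xi1 and player 2 receives W - xi1."""
    return (xi1 - V1) * (W - xi1 - V2)


def marginal_shares(W, V1, V2, h=1e-6):
    """Finite-difference partial derivatives of player 1's share
    with respect to W, V1 and V2, in that order."""
    base = nash_bargaining_shares(W, V1, V2)[0]
    d_W = (nash_bargaining_shares(W + h, V1, V2)[0] - base) / h
    d_V1 = (nash_bargaining_shares(W, V1 + h, V2)[0] - base) / h
    d_V2 = (nash_bargaining_shares(W, V1, V2 + h)[0] - base) / h
    return d_W, d_V1, d_V2


# Hypothetical values: joint payoff 10, noncooperative payoffs 5 and 3.
shares = nash_bargaining_shares(10.0, 5.0, 3.0)
weights = marginal_shares(10.0, 5.0, 3.0)
```

For this scheme the marginal shares are constants (one half, one half, and minus one half), so the second derivatives of $\omega^{(\tau)i}$ vanish.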

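Using Theorem 1 we obtain the equilibrating transition formula:

$$\begin{aligned} B_i(\tau) ={}& \omega_W^{(\tau)i}(\tau, x_\tau^*) \sum_{j=1}^{2} g^j[\tau, x_\tau^*, \psi_1^*(\tau, x_\tau^*), \psi_2^*(\tau, x_\tau^*)] \\ &+ \omega_{V^i}^{(\tau)i}(\tau, x_\tau^*) \Big\{ g^i[\tau, x_\tau^*, \phi_1^*(\tau, x_\tau^*), \phi_2^*(\tau, x_\tau^*)] \\ &\qquad + \left[V_{x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau,\ x_t=x_\tau^*}\right] \left( f[\tau, x_\tau^*, \phi_1^*(\tau, x_\tau^*), \phi_2^*(\tau, x_\tau^*)] - f[\tau, x_\tau^*, \psi_1^*(\tau, x_\tau^*), \psi_2^*(\tau, x_\tau^*)] \right) \Big\} \\ &+ \omega_{V^j}^{(\tau)i}(\tau, x_\tau^*) \Big\{ g^j[\tau, x_\tau^*, \phi_1^*(\tau, x_\tau^*), \phi_2^*(\tau, x_\tau^*)] \\ &\qquad + \left[V_{x_t}^{(\tau)j}(t, x_t)\Big|_{t=\tau,\ x_t=x_\tau^*}\right] \left( f[\tau, x_\tau^*, \phi_1^*(\tau, x_\tau^*), \phi_2^*(\tau, x_\tau^*)] - f[\tau, x_\tau^*, \psi_1^*(\tau, x_\tau^*), \psi_2^*(\tau, x_\tau^*)] \right) \Big\}. \end{aligned} \tag{4}$$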

The formula in (4) expresses the components of the equilibrating transitory compensation in economically interpretable terms: $\omega_W^{(\tau)i}(\tau, x_\tau^*)$ is the marginal share of the total cooperative payoff that player $i$ is entitled to receive according to the agreed upon optimality principle; $\omega_{V^i}^{(\tau)i}(\tau, x_\tau^*)$ is the marginal share of his own noncooperative payoff that player $i$ is entitled to receive according to the agreed upon optimality principle; and $\omega_{V^j}^{(\tau)i}(\tau, x_\tau^*)$ is the marginal share of the other player's noncooperative payoff that player $i$ is entitled to receive according to the agreed upon optimality principle.

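The term $\sum_{j=1}^{2} g^j[\tau, x_\tau^*, \psi_1^*(\tau, x_\tau^*), \psi_2^*(\tau, x_\tau^*)]$ is the instantaneous cooperative payoff and $g^i[\tau, x_\tau^*, \phi_1^*(\tau, x_\tau^*), \phi_2^*(\tau, x_\tau^*)]$ is the instantaneous noncooperative payoff of player $i$. The term

$$\left[V_{x_t}^{(\tau)i}(t, x_t)\Big|_{t=\tau,\ x_t=x_\tau^*}\right] \left( f[\tau, x_\tau^*, \phi_1^*(\tau, x_\tau^*), \phi_2^*(\tau, x_\tau^*)] - f[\tau, x_\tau^*, \psi_1^*(\tau, x_\tau^*), \psi_2^*(\tau, x_\tau^*)] \right)$$

reflects the instantaneous effect on player $i$'s noncooperative payoff when the change in the state variable $x$ follows the optimal trajectory governed by (3) instead of the noncooperative path (1).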

Therefore the compensation $B_i(\tau)$ player $i$ receives at time $\tau$ is the sum of

(i) player $i$'s agreed upon marginal share of total cooperative profit,

(ii) player $i$'s agreed upon marginal share of his own noncooperative profit plus the instantaneous effect on his noncooperative payoff when the change in the state variable $x$ follows the optimal trajectory instead of the noncooperative path, and

(iii) player $i$'s agreed upon marginal share of player $j$'s noncooperative profit plus the instantaneous effect on player $j$'s noncooperative payoff when the change in the state variable $x$ follows the optimal trajectory instead of the noncooperative path.
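The three components listed above can be assembled as follows. This is a sketch for a scalar state, in which every argument is a number already evaluated at $(\tau, x_\tau^*)$; the function name, the argument names and the sample values are hypothetical. With the Nash bargaining weights (one half, one half, minus one half), the payments to the two players add up to the instantaneous cooperative payoff, as part (i) of Condition 1 requires.

```python
def transitory_compensation(w_W, w_Vi, w_Vj, g_coop_total, g_nc_i, g_nc_j,
                            Vx_i, Vx_j, f_nc, f_coop):
    """Assemble B_i(tau) from the three components of formula (4).

    Assumption: scalar state, all inputs evaluated at the same (tau, x).
    w_W, w_Vi, w_Vj: marginal shares (partial derivatives of the share rule),
    g_coop_total:    sum of both players' instantaneous cooperative payoffs,
    g_nc_i, g_nc_j:  instantaneous noncooperative payoffs,
    Vx_i, Vx_j:      state derivatives of the noncooperative value functions,
    f_nc, f_coop:    drift under noncooperative and cooperative strategies.
    """
    drift_gap = f_nc - f_coop
    own_term = g_nc_i + Vx_i * drift_gap    # component (ii)
    other_term = g_nc_j + Vx_j * drift_gap  # component (iii)
    return w_W * g_coop_total + w_Vi * own_term + w_Vj * other_term


# Hypothetical values, with Nash bargaining weights for both players.
B1 = transitory_compensation(0.5, 0.5, -0.5, 10.0, 4.0, 3.0, 2.0, 1.5, 0.3, 0.1)
B2 = transitory_compensation(0.5, 0.5, -0.5, 10.0, 3.0, 4.0, 1.5, 2.0, 0.3, 0.1)
```

The second call swaps the roles of the two players, so that `B1 + B2` can be compared with the total cooperative payoff rate.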

6. Concluding Remarks. Stochastic dynamic cooperation represents one of the most complex forms of decision-making analysis under uncertainty. In particular, interactions between strategic behavior, dynamic evolution and stochastic elements have to be considered simultaneously in the process. This complexity leads to great difficulties in the derivation of dynamically stable solutions. In this paper, we examine an integral factor leading to subgame consistent solutions of cooperative differential games: transitory compensation. The analysis can be readily applied to the deterministic version of the class of cooperative stochastic differential games introduced here by setting $\sigma$ equal to zero. Since analytically tractable subgame consistent solutions of cooperative stochastic differential games have been found recently, further research along this line is expected.


References

1. Clemhout S., Wan H. Y. Dynamic Common-Property Resources and Environmental Problems // J. of Optimization Theory and Applications. 1985. Vol. 46. P. 471-481.

2. Kaitala V. Equilibria in A Stochastic Resource Management Game Under Imperfect Information // Europ. J. of Operational Research. 1993. Vol. 71. P. 439-453.

3. Jørgensen S., Yeung D. W. K. Inter- and Intragenerational Renewable Resource Extraction // Annals of Operations Research. 1999. Vol. 88. P. 275-289.

4. Yeung D. W. K. A Stochastic Differential Game Model of Institutional Investor Speculation // J. of Optimization Theory and Applications. 1999. Vol. 102. P. 463-477.

5. Yeung D. W. K. Infinite Horizon Stochastic Differential Games with Branching Payoffs // J. of Optimization Theory and Applications. 2001. Vol. 111. P. 445-460.

6. Yeung D. W. K., Petrosyan L. A. Subgame Consistent Cooperative Solutions in Stochastic Differential Games // J. of Optimization Theory and Applications. 2004. Vol. 120. P. 651-666.

7. Haurie A. A Note on Nonzero-sum Differential Games with Bargaining Solution // J. of Optimization Theory and Applications. 1976. Vol. 18. P. 31-39.

8. Petrosyan L. A. Stable Solutions of Differential Games with Many Participants // Viestnik of Leningrad University. 1977. N 19. P. 46-52.

9. Petrosyan L. A., Danilov N. Stability of Solutions in Non-zero Sum Differential Games with Transferable Payoffs // Viestnik of Leningrad University. 1979. N 1. P. 52-59.

10. Jørgensen S., Zaccour G. Time Consistency in Cooperative Differential Games // Decision and Control in Management Science: Essays in Honor of Alain Haurie / Ed. by G. Zaccour. Boston: Kluwer Academic Publ., 2002. P. 349-366.

11. Fleming W. H., Rishel R. W. Deterministic and Stochastic Optimal Control. New York: Springer-Verlag, 1975. 212 p.

Received by the editorial office on 24 November 2005.
