CC BY
The Irrational Behavior Proof Condition for Linear-Quadratic Discrete-time Dynamic Games with Nontransferable Payoffs*

Anna V. Tur

St.Petersburg State University,

Faculty of Applied Mathematics and Control Processes, Universitetskii pr. 35, St.Petersburg, 198504, Russia E-mail: [email protected]

Abstract. The paper considers linear-quadratic discrete-time dynamic games with nontransferable payoffs. The Pareto-optimal solution is studied as the optimality principle. The time consistency of this solution and the irrational behavior proof condition are investigated. As an example, the government debt stabilization game is considered.

Keywords: linear-quadratic games, discrete-time games, games with non-transferable payoffs, Pareto-optimal solution, time consistency, PDP, irrational behavior proof condition.

1. Introduction

Consider an n-person discrete-time dynamic game $\Gamma(k_0, x_0)$ described by the state equation

$$x(k+1) = A(k)x(k) + \sum_{i=1}^{n} B_i(k)u_i(k), \tag{1}$$

$$k \ge k_0, \quad k_0 \in K_+, \quad x(k_0) = x_0.$$

Here $x$ is the $m$-dimensional state of the system, $u_i$ is the $r$-dimensional control variable of player $i$, $x(k_0) = x_0$ is the arbitrarily chosen initial state of the system, $A(k), B_i(k) \in Z(K_+)$ are matrices of appropriate dimensions, $K_+$ is the set of nonnegative integers, and $Z(K_+)$ is the set of bounded real matrices. The payoff function of player $i \in N$ is

$$J_i(k_0, x_0, u) = \sum_{k=k_0}^{\infty} w_i(k, x(k), u_i(k)), \quad \forall i = 1, \dots, n, \tag{2}$$

$$w_i(k, x(k), u_i(k)) = x^T(k)P_i(k)x(k) + u_i^T(k)R_i(k)u_i(k),$$

$$P_i(k), R_i(k) \in Z(K_+), \quad P_i(k) = P_i^T(k), \quad R_i(k) = R_i^T(k) \quad \forall i \in N.$$

Suppose that payoffs are nontransferable.

We will assume that the players use feedback strategies

$$u_i(k, x) = M_i(k)x(k)$$

to control the system.

* This work was supported by St. Petersburg State University under grant No. 9.38.245.2014.

Definition 1. A set of strategies

$$\{u_i(k, x) = M_i(k)x(k),\ i = 1, \dots, n\} \tag{3}$$

is called permissible (admissible) if the following conditions are satisfied:

1. $M_i(k) \in Z(K_+)$ $\forall i = 1, \dots, n$.

2. The resulting system described by

$$x(k+1) = \Big(A(k) + \sum_{i=1}^{n} B_i(k)M_i(k)\Big)x(k) \tag{4}$$

is uniformly asymptotically stable (as $k \to \infty$).
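For time-invariant matrices, condition 2 of Definition 1 can be checked numerically. The following Python sketch assumes constant $A$, $B_i$, $M_i$ (an assumption; the paper allows time-varying matrices), in which case asymptotic stability of (4) is equivalent to the closed-loop matrix having spectral radius below one. The function names and the numerical data are hypothetical.

```python
import numpy as np

def closed_loop_matrix(A, Bs, Ms):
    """Return A + sum_i B_i M_i for constant matrices (time-invariant case)."""
    return A + sum(B @ M for B, M in zip(Bs, Ms))

def is_permissible(A, Bs, Ms):
    """Condition 2 of Definition 1 for constant matrices:
    the closed loop (4) is asymptotically stable iff its spectral radius is < 1."""
    Acl = closed_loop_matrix(A, Bs, Ms)
    return bool(np.max(np.abs(np.linalg.eigvals(Acl))) < 1.0)

# Hypothetical two-player data (not taken from the paper).
A = np.array([[1.1, 0.0], [0.0, 0.5]])
Bs = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
Ms = [np.array([[-0.6, 0.0]]), np.array([[0.0, -0.2]])]

stable = is_permissible(A, Bs, Ms)            # closed loop is diag(0.5, 0.3)
no_feedback = [np.zeros((1, 2)), np.zeros((1, 2))]
unstable = is_permissible(A, Bs, no_feedback)  # open loop has eigenvalue 1.1
```

Boundedness of the gains (condition 1) holds trivially for constant matrices, so only the stability test is needed in this special case.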

Suppose that the players agree to use a Pareto-optimal solution as the optimality principle, and that they consent to use a vector of weights

$$\alpha = (\alpha_1, \dots, \alpha_n): \quad \sum_{i=1}^{n} \alpha_i = 1, \quad 0 < \alpha_i < 1,$$

on their payoffs to obtain a Pareto-optimal outcome.

Then the optimal cooperative strategies of the players can be found by solving the following control problem (Engwerda, 2005):

$$\max_{u} \sum_{i=1}^{n} \alpha_i J_i(k_0, x_0, u). \tag{5}$$

Let $u^\alpha(k) = (u_1^\alpha(k), \dots, u_n^\alpha(k))$ be the set of strategies solving this optimal control problem:

$$(u_1^\alpha, \dots, u_n^\alpha) = \arg\max_{(u_1, \dots, u_n)} \sum_{i=1}^{n} \alpha_i J_i(k_0, x_0, u). \tag{6}$$

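Let

$$J^\alpha(k_0, x_0, u) = \sum_{i=1}^{n} \alpha_i J_i(k_0, x_0, u), \qquad P^\alpha(k) = \sum_{i=1}^{n} \alpha_i P_i(k), \quad k \ge k_0,$$

$$R^\alpha(k) = \begin{pmatrix} \alpha_1 R_1(k) & O & \dots & O \\ O & \alpha_2 R_2(k) & \dots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \dots & \alpha_n R_n(k) \end{pmatrix}, \quad k \ge k_0.$$

Then

$$J^\alpha(k_0, x_0, u) = \sum_{k=k_0}^{\infty} \big(x^T(k)P^\alpha(k)x(k) + u^T(k)R^\alpha(k)u(k)\big). \tag{7}$$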

Finding the Pareto-optimal solution is thus reduced to the linear-quadratic optimal control problem (1)-(7) with the single stacked control variable $u(k) = (u_1(k), \dots, u_n(k))$.
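The reduction to a single weighted problem can be illustrated with a small Python sketch that builds the weighted state cost and the block-diagonal weighted control cost from the players' matrices at one time step. The helper names and the numerical data are hypothetical and serve only to show the construction.

```python
import numpy as np

def weighted_state_cost(alphas, Ps):
    """P^alpha = sum_i alpha_i P_i."""
    return sum(a * P for a, P in zip(alphas, Ps))

def weighted_control_cost(alphas, Rs):
    """Block-diagonal R^alpha = diag(alpha_1 R_1, ..., alpha_n R_n)."""
    sizes = [R.shape[0] for R in Rs]
    out = np.zeros((sum(sizes), sum(sizes)))
    pos = 0
    for a, R, s in zip(alphas, Rs, sizes):
        out[pos:pos + s, pos:pos + s] = a * R
        pos += s
    return out

# Hypothetical two-player data (not taken from the paper).
alphas = [0.45, 0.55]
Ps = [np.eye(2), 2.0 * np.eye(2)]
Rs = [np.array([[1.0]]), np.array([[2.0]])]
P_alpha = weighted_state_cost(alphas, Ps)
R_alpha = weighted_control_cost(alphas, Rs)

# Stage payoff of the weighted problem for one state x and stacked control u.
x = np.array([1.0, -2.0])
u = np.array([0.5, 3.0])
stage_weighted = x @ P_alpha @ x + u @ R_alpha @ u

# The same number obtained as the weighted sum of the players' stage payoffs.
stage_players = [x @ Ps[i] @ x + u[i] * Rs[i][0, 0] * u[i] for i in range(2)]
stage_sum = sum(a * w for a, w in zip(alphas, stage_players))
```

The two quantities agree because each player's control enters only its own block of the weighted control cost.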

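The unique control in the class of admissible strategies

$$\{u_i^\alpha(k) = M_i^\alpha(k)x,\ i = 1, \dots, n\}$$

maximizing $J^\alpha(k_0, x_0, u)$ exists if and only if (Bertsekas, 2007) the following conditions are satisfied:

1. The system of matrix equations

$$\begin{cases} (A(k) + B(k)M^\alpha(k))^T \Theta^\alpha(k+1)(A(k) + B(k)M^\alpha(k)) - \Theta^\alpha(k) - P^\alpha(k) - M^\alpha(k)^T R^\alpha(k) M^\alpha(k) = 0, \\ M^\alpha(k) = -\big(-R^\alpha(k) + B^T(k)\Theta^\alpha(k+1)B(k)\big)^{-1} B^T(k)\Theta^\alpha(k+1)A(k), \end{cases} \quad k \ge k_0, \tag{8}$$

has a solution $\{M^\alpha(k), \Theta^\alpha(k)\} \in Z(K_+)$, with dimensions $rn \times m$ and $m \times m$ respectively, where $\Theta^\alpha(k)$ is symmetric for all $k \ge k_0$.

2. The set of strategies

$$\{u_i^\alpha(k) = M_i^\alpha(k)x,\ i = 1, \dots, n\}, \tag{9}$$

where $M_i^\alpha(k)$ is the $i$-th block of the matrix

$$M^\alpha(k) = \begin{pmatrix} M_1^\alpha(k) \\ \vdots \\ M_n^\alpha(k) \end{pmatrix},$$

is admissible.

3. The matrices $-R^\alpha(k) + B^T(k)\Theta^\alpha(k+1)B(k)$ are positive definite.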

The cooperative state trajectory $x^\alpha(k)$ can be found by substituting the cooperative strategies $\{u_i^\alpha(k)\}$ into (1) and solving the system

$$x(k+1) = A(k)x(k) + B(k)u^\alpha(k). \tag{10}$$

The payoffs of the players are

$$J_i^\alpha(k_0, x_0, u^\alpha) = \sum_{k=k_0}^{\infty} \Big((x^\alpha(k))^T P_i(k)x^\alpha(k) + (u_i^\alpha(k))^T R_i(k)u_i^\alpha(k)\Big). \tag{11}$$

Here $B(k) = (B_1(k)\ B_2(k)\ \dots\ B_n(k))$.

2. Time-consistency

Suppose that there exists an $\alpha$ such that the inequalities

$$J_i^\alpha(k_0, x_0, u^\alpha) \ge V_i(k_0, x_0), \quad i = 1, \dots, n, \tag{12}$$

required for individual rationality in the cooperative game are satisfied at the initial time. Here $V_i(k_0, x_0)$ is the Nash outcome of player $i$ in the game $\Gamma(k_0, x_0)$.

But if there exists $k > k_0$ such that for some $i$

$$J_i^\alpha(k, x^\alpha(k), u^\alpha) < V_i(k, x^\alpha(k)),$$

then the individual rationality condition is time-inconsistent: it holds at the initial time but fails later along the cooperative trajectory.

To overcome the time-inconsistency problem in games with nontransferable payoffs, the notion of a Payoff Distribution Procedure (PDP) was introduced by L.A. Petrosyan (1997). In this paper the PDP and the time-consistency of the Pareto-optimal solution are detailed for linear-quadratic discrete-time dynamic games.

Definition 2. A vector $\beta(k) = (\beta_1(k), \dots, \beta_n(k))$ is a PDP if

$$\sum_{k=k_0}^{\infty} \Big[(x^\alpha(k))^T P_i(k)x^\alpha(k) + (u_i^\alpha(k))^T R_i(k)u_i^\alpha(k)\Big] = \sum_{k=k_0}^{\infty} \beta_i(k), \quad i = 1, \dots, n.$$

Definition 3. A Pareto-optimal solution is called time-consistent if there exists a PDP such that the condition of individual rationality is satisfied:

$$\sum_{k=l}^{\infty} \beta_i(k) \ge V_i(l, x^\alpha(l)), \quad \forall l \ge k_0, \quad i = 1, \dots, n, \tag{13}$$

where $V_i(l, x^\alpha(l))$ is the Nash outcome of player $i$ in the subgame $\Gamma(l, x^\alpha(l))$.

Suppose that condition (12) is satisfied for some Pareto-optimal solution. Then there exist functions $\eta_i(k) \ge 0$ such that

$$J_i^\alpha(k_0, x_0, u^\alpha) - V_i(k_0, x_0) = \sum_{k=k_0}^{\infty} \eta_i(k). \tag{14}$$

In (Petrosyan, 1997) a formula for the PDP that guarantees time-consistency in cooperative differential games with nontransferable payoffs is considered. The following theorem gives a discrete-time analog of this formula.

Theorem 1. Let the inequalities

$$J_i^\alpha(k_0, x_0, u^\alpha) \ge V_i(k_0, x_0), \quad i = 1, \dots, n,$$

be satisfied for some Pareto-optimal solution. Then the PDP $\beta(k)$ computed by the formula

$$\beta_i(k) = \eta_i(k) - V_i(k+1, x^\alpha(k+1)) + V_i(k, x^\alpha(k)), \quad i = 1, \dots, n, \quad k \ge k_0, \tag{15}$$

guarantees time-consistency of this Pareto-optimal solution along the cooperative trajectory $x^\alpha(k)$ for $k \ge k_0$. Here $\eta_i(k) \ge 0$ are functions satisfying (14).

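Proof. First we show that $\beta(k)$ is a PDP. The differences of the Nash outcomes in (15) telescope, so

$$\sum_{k=k_0}^{\infty} \beta_i(k) = \sum_{k=k_0}^{\infty} \eta_i(k) - V_i(\infty, x^\alpha(\infty)) + V_i(k_0, x_0) = J_i^\alpha(k_0, x_0, u^\alpha) - V_i(k_0, x_0) + V_i(k_0, x_0) = J_i^\alpha(k_0, x_0, u^\alpha). \tag{16}$$

Here $V_i(\infty, x^\alpha(\infty)) = \lim_{k \to \infty} V_i(k, x^\alpha(k)) = 0$. So $\beta(k)$ satisfies Definition 2.

Now we show that the condition of individual rationality is satisfied. Using (15) we obtain

$$\sum_{k=l}^{\infty} \beta_i(k) = \sum_{k=l}^{\infty} \eta_i(k) - V_i(\infty, x^\alpha(\infty)) + V_i(l, x^\alpha(l)) = \sum_{k=l}^{\infty} \eta_i(k) + V_i(l, x^\alpha(l)) \ge V_i(l, x^\alpha(l)). \tag{17}$$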
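The two steps of the proof can be checked numerically on a truncated horizon. The Python sketch below uses hypothetical scalar data (a geometrically decaying trajectory, a quadratic Nash outcome, and a geometric split of the gap into nonnegative terms); none of these numbers come from the paper, and the infinite sums are replaced by sums over a long finite horizon.

```python
import numpy as np

def pdp_from_eta(eta, V):
    """Formula (15) on a truncated horizon: beta(k) = eta(k) - V(k+1) + V(k).
    eta has length K, V has length K + 1 (values along the cooperative trajectory)."""
    eta = np.asarray(eta, dtype=float)
    V = np.asarray(V, dtype=float)
    return eta - V[1:] + V[:-1]

# Hypothetical scalar data in the maximization setting (not from the paper).
K = 200
k = np.arange(K)
x = 2.0 * 0.5 ** np.arange(K + 1)   # assumed cooperative trajectory x(k) = 2 * 0.5^k
V = -1.0 * x ** 2                    # assumed Nash outcome V(k, x(k)) = -x(k)^2
gap = 0.3                            # assumed J - V(k0, x0) >= 0, as in (12)
eta = gap * 0.5 ** (k + 1)           # eta(k) >= 0, sums to gap up to truncation

beta = pdp_from_eta(eta, V)
total = beta.sum()                           # should equal J = V[0] + gap, as in (16)
tail_sums = np.cumsum(beta[::-1])[::-1]      # sum over k >= l of beta(k), for each l
rational = bool(np.all(tail_sums >= V[:-1] - 1e-12))  # inequality (17)
```

Any other nonnegative split of the gap would work equally well here; the geometric one is chosen only because its partial sums are easy to verify.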

2.1. Irrational Behavior Proof Condition

A condition under which a player would still perform better under the cooperative scheme even if irrational behavior appears later in the game was considered in (Yeung, 2006). The irrational behavior proof condition for differential games with nontransferable payoffs was proposed in (Belitskaia, 2012). In this paper the irrational behavior proof condition is specified for linear-quadratic discrete-time dynamic games with nontransferable payoffs.

Definition 4. A Pareto-optimal solution $(J_1^\alpha(k_0, x_0, u^\alpha), \dots, J_n^\alpha(k_0, x_0, u^\alpha))$ satisfies the irrational behavior proof condition (Yeung, 2006) in the game $\Gamma(k_0, x_0)$ if the inequalities

$$\sum_{k=k_0}^{l} \beta_i(k) + V_i(l+1, x^\alpha(l+1)) \ge V_i(k_0, x_0), \quad i = 1, \dots, n, \tag{18}$$

hold for all $l \ge k_0$, where $\beta(k) = (\beta_1(k), \dots, \beta_n(k))$ is a time-consistent PDP of $(J_1^\alpha(k_0, x_0, u^\alpha), \dots, J_n^\alpha(k_0, x_0, u^\alpha))$.

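So if for all $i = 1, \dots, n$ the inequalities

$$\beta_i(k) + V_i(k+1, x^\alpha(k+1)) - V_i(k, x^\alpha(k)) \ge 0, \quad k \ge k_0,$$

hold, then the Pareto-optimal solution satisfies the irrational behavior proof condition: summing them over $k = k_0, \dots, l$ telescopes the Nash outcomes and gives (18). Rewrite these inequalities using (8), with the Nash outcome written as the quadratic form $V_i(k, x) = x^T \Theta_i(k) x$:

$$\beta_i(k) + (x^\alpha(k))^T \Big((A(k) + B(k)M^\alpha(k))^T \Theta_i(k+1)(A(k) + B(k)M^\alpha(k)) - \Theta_i(k)\Big) x^\alpha(k) \ge 0, \quad k \ge k_0. \tag{19}$$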

If we use formula (15), then

$$\beta_i(k) + V_i(k+1, x^\alpha(k+1)) - V_i(k, x^\alpha(k)) = \eta_i(k), \quad k \ge k_0,$$

where $\eta_i(k) \ge 0$ for all $k \ge k_0$. This means that conditions (19) are always satisfied in this case.

We now formulate these results.

Theorem 2. If in a linear-quadratic discrete-time dynamic game with nontransferable payoffs, for some Pareto-optimal solution and its PDP, the inequalities

$$\beta_i(k) + V_i(k+1, x^\alpha(k+1)) - V_i(k, x^\alpha(k)) \ge 0, \quad k \ge k_0, \quad i = 1, \dots, n,$$

hold, where $V_i(l, x^\alpha(l))$ is the Nash outcome of player $i$ in the subgame $\Gamma(l, x^\alpha(l))$, then the irrational behavior proof condition is satisfied for this Pareto-optimal solution.

Proposition 1. If the PDP $\beta(k)$ of a Pareto-optimal solution in a linear-quadratic discrete-time dynamic game with nontransferable payoffs is calculated using formula (15), then the irrational behavior proof condition is satisfied for this Pareto-optimal solution.
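Both the irrational behavior proof inequalities (18) and the stepwise sufficient condition of Theorem 2 are easy to test on a finite horizon. The Python sketch below is only an illustration: the function names are hypothetical, the data are made up (scalar, maximization setting), and a finite horizon stands in for the infinite one.

```python
import numpy as np

def satisfies_ibp(beta, V, tol=1e-12):
    """Check (18): sum_{k=k0}^{l} beta(k) + V(l+1) >= V(k0) for every l on the horizon.
    beta has length K, V has length K + 1."""
    beta = np.asarray(beta, dtype=float)
    V = np.asarray(V, dtype=float)
    lhs = np.cumsum(beta) + V[1:]
    return bool(np.all(lhs >= V[0] - tol))

def satisfies_stepwise(beta, V, tol=1e-12):
    """Sufficient condition of Theorem 2: beta(k) + V(k+1) - V(k) >= 0 for all k."""
    beta = np.asarray(beta, dtype=float)
    V = np.asarray(V, dtype=float)
    return bool(np.all(beta + V[1:] - V[:-1] >= -tol))

# Hypothetical scalar data (not taken from the paper).
K = 50
x = 2.0 * 0.5 ** np.arange(K + 1)
V = -1.0 * x ** 2                          # assumed Nash outcomes along the trajectory
eta = 0.3 * 0.5 ** (np.arange(K) + 1)      # assumed nonnegative eta(k)
beta_15 = eta - V[1:] + V[:-1]             # PDP built by formula (15)

ok_stepwise = satisfies_stepwise(beta_15, V)
ok_ibp = satisfies_ibp(beta_15, V)

# An arbitrary payment sequence with a large negative first payment violates (18).
beta_bad = np.zeros(K)
beta_bad[0] = -5.0
bad_ibp = satisfies_ibp(beta_bad, V)
```

For the sequence built by formula (15) the stepwise expression equals $\eta_i(k)$, which is why both checks pass, in line with Proposition 1.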

3. Example

As an example, consider the government debt stabilization game (van Aarle, Bovenberg and Raith, 1995). The Pareto solution of this game is considered in (Engwerda, 2005). This paper treats the discrete-time case of this problem and the time-consistency of the cooperative solution.

Assume that government debt accumulation, $d(k)$, is the sum of interest payments on government debt, $rd(k)$, and primary fiscal deficits, $f(k)$, minus the seignorage (i.e. the issue of base money) $m(k)$. So,

$$d(k+1) = rd(k) + f(k) - m(k), \quad d(0) = d_0.$$

The objective of the fiscal authority is to minimize a sum of time profiles of the primary fiscal deficit, base-money growth and government debt.

The monetary authorities are assumed to choose the growth of base money such that a sum of time profiles of base-money growth and government debt is minimized.

With a two-dimensional state vector $x(k)$ and scalar controls $u_1(k)$, $u_2(k)$ of the two players, the system can be rewritten as

$$x(k+1) = A(k)x(k) + \sum_{i=1}^{2} B_i(k)u_i(k).$$

The payoff function of player $i$ is

$$J_i = \sum_{k=k_0}^{\infty} \Big(x^T(k)P_i(k)x(k) + \sum_{j=1}^{2} u_j^T(k)R_{ij}(k)u_j(k)\Big), \quad \forall i = 1, 2,$$

P1 = (oo), P2 = (oo), $R_{11} = 1$, $R_{12} = \eta$, $R_{21} = 0$, $R_{22} = 1$.

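Following (Basar and Olsder, 1999), to find the Nash equilibrium we solve the system

$$\begin{cases} \Big(A(k) + \sum_{l=1}^{2} B_l(k)M_l^{NE}(k)\Big)^T \Theta_i(k+1)\Big(A(k) + \sum_{l=1}^{2} B_l(k)M_l^{NE}(k)\Big) - \Theta_i(k) + P_i(k) + M_j^{NE}(k)^T R_{ij}(k)M_j^{NE}(k) + M_i^{NE}(k)^T R_{ii}(k)M_i^{NE}(k) = 0, \\ M_i^{NE}(k) = -\big(R_{ii}(k) + B_i^T(k)\Theta_i(k+1)B_i(k)\big)^{-1} B_i^T(k)\Theta_i(k+1)\big(A(k) + B_j(k)M_j^{NE}(k)\big), \end{cases}$$

$$i = 1, 2, \quad j \ne i.$$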

Let A = 1] = 1, 2 = s = 2,7 = 1- Then

$$u_1^{NE}(k, x) = (-0.073193 \quad -0.166311)\, x(k), \qquad u_2^{NE}(k, x) = (0.142083 \quad 0.318188)\, x(k),$$

$$J_1^{NE} = x_0^T \begin{pmatrix} 0.656174 & 0.354202 \\ 0.354202 & 0.844156 \end{pmatrix} x_0,$$

$$J_2^{NE} = x_0^T \begin{pmatrix} 1.273766 & 0.613087 \\ 0.613087 & 1.444844 \end{pmatrix} x_0,$$

$$V_1(k, x(k)) = x^T(k) \begin{pmatrix} 0.656174 & 0.354202 \\ 0.354202 & 0.844156 \end{pmatrix} x(k),$$

$$V_2(k, x(k)) = x^T(k) \begin{pmatrix} 1.273766 & 0.613087 \\ 0.613087 & 1.444843 \end{pmatrix} x(k).$$

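According to (8), to find the Pareto solution we solve the system

$$\begin{cases} (A(k) + B_1 M_1^\alpha + B_2 M_2^\alpha)^T \Theta^\alpha(k+1)(A(k) + B_1 M_1^\alpha + B_2 M_2^\alpha) - \Theta^\alpha(k) + P^\alpha(k) + M^\alpha(k)^T R^\alpha(k) M^\alpha(k) = 0, \\ M^\alpha(k) = -\big(R^\alpha(k) + B^T(k)\Theta^\alpha(k+1)B(k)\big)^{-1} B^T(k)\Theta^\alpha(k+1)A(k), \end{cases}$$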

where $P^\alpha(k) = \alpha P_1(k) + (1 - \alpha)P_2(k)$, $R^\alpha(k) = \begin{pmatrix} \alpha & 0 \\ 0 & \alpha\eta + 1 - \alpha \end{pmatrix}$,

$B(k) = (B_1(k) \ \ B_2(k))$.
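When all matrices are constant, a stationary solution of a system of this form (minimization case) can be approximated by fixed-point iteration on the value matrix. The Python sketch below is one possible numerical approach and is not necessarily the method used to obtain the numbers in this example; the function name and the scalar test data are hypothetical.

```python
import numpy as np

def solve_stationary_pareto(A, B, P, R, iters=10_000, tol=1e-12):
    """Iterate Theta <- (A + B M)^T Theta (A + B M) + P + M^T R M with
    M = -(R + B^T Theta B)^{-1} B^T Theta A until Theta stops changing.
    Assumes constant matrices and a convergent iteration (not guaranteed in general)."""
    Theta = np.zeros_like(P, dtype=float)
    for _ in range(iters):
        M = -np.linalg.solve(R + B.T @ Theta @ B, B.T @ Theta @ A)
        Acl = A + B @ M
        Theta_next = Acl.T @ Theta @ Acl + P + M.T @ R @ M
        if np.max(np.abs(Theta_next - Theta)) < tol:
            return M, Theta_next
        Theta = Theta_next
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical scalar data (not the debt-stabilization model).
A = np.array([[1.0]])
B = np.array([[1.0]])
P = np.array([[1.0]])
R = np.array([[1.0]])
M, Theta = solve_stationary_pareto(A, B, P, R)

# Residual of the stationary matrix equation at the returned pair.
Acl = A + B @ M
residual = Acl.T @ Theta @ Acl - Theta + P + M.T @ R @ M
spectral_radius = float(np.max(np.abs(np.linalg.eigvals(Acl))))
```

For this scalar test the stationary equation reduces to $\Theta^2 - \Theta - 1 = 0$, so the iteration should approach the golden ratio, and the closed loop should be stable.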

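For $\alpha = 0.45$:

$$M_1^\alpha = (-0.2272618408 \quad -0.5075099515),$$

$$M_2^\alpha = (0.1022678284 \quad 0.2283794781),$$

$$J_1(u^\alpha) = x_0^T \begin{pmatrix} 0.6808499028 & 0.4139353163 \\ 0.4139353163 & 0.9409769084 \end{pmatrix} x_0,$$

$$J_2(u^\alpha) = x_0^T \begin{pmatrix} 1.223914910 & 0.4917964794 \\ 0.4917964794 & 1.139011465 \end{pmatrix} x_0.$$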

If, for example, $x_0 = (-3 \quad 2)$, then

$$J_1^\alpha(k_0, x_0, u^\alpha) - V_1(k_0, x_0) = -0.107435164999999722,$$

$$J_2^\alpha(k_0, x_0, u^\alpha) - V_2(k_0, x_0) = -0.216497664600000528.$$

So, conditions (12) are satisfied (we consider the minimization problem, that is why we have an opposite sign in (12)).
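The two differences above are plain quadratic forms and can be re-evaluated directly. The Python sketch below uses the matrices as they can be read from the text (rounded as printed there), so small deviations in the last digits from the reported values are to be expected; the variable and function names are hypothetical.

```python
import numpy as np

x0 = np.array([-3.0, 2.0])

# Nash value matrices and cooperative payoff matrices as printed in the example.
V1 = np.array([[0.656174, 0.354202], [0.354202, 0.844156]])
V2 = np.array([[1.273766, 0.613087], [0.613087, 1.444844]])
J1 = np.array([[0.6808499028, 0.4139353163], [0.4139353163, 0.9409769084]])
J2 = np.array([[1.223914910, 0.4917964794], [0.4917964794, 1.139011465]])

def quad(M, x):
    """Quadratic form x^T M x."""
    return float(x @ M @ x)

gap1 = quad(J1, x0) - quad(V1, x0)   # cooperative cost minus Nash cost, player 1
gap2 = quad(J2, x0) - quad(V2, x0)   # cooperative cost minus Nash cost, player 2

# In the minimization setting, cooperation is individually rational when both gaps are <= 0.
individually_rational = gap1 <= 0 and gap2 <= 0
```

Both gaps come out negative and agree with the reported values to several decimal places, which is the minimization counterpart of condition (12).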

But at the next step we have

$$J_1^\alpha(k_1, x_1, u^\alpha) - V_1(k_1, x_1) = 0.0504046297943969643.$$

This means that the individual rationality condition is time-inconsistent. To avoid this problem, we use the PDP calculated by formula (15):

$$\beta_1(k) = \frac{-0.107435164999999722}{k(k+1)} + (x^\alpha(k))^T \begin{pmatrix} 0.537430998 & 0.0789600736 \\ 0.0789600736 & 0.16307449 \end{pmatrix} x^\alpha(k),$$

$$\beta_2(k) = \frac{-0.216497664600000528}{k(k+1)} + (x^\alpha(k))^T \begin{pmatrix} 1.060309389 & 0.144646529 \\ 0.144646529 & 0.35798954 \end{pmatrix} x^\alpha(k). \tag{20}$$

Note that $\eta_i(k) < 0$, because we now consider the minimization problem.
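The weights $1/(k(k+1))$ appearing in (20) are one way to split the gap $c_i = J_i^\alpha(k_0, x_0, u^\alpha) - V_i(k_0, x_0)$ over time so that (14) holds. Assuming the initial time is $k_0 = 1$ (an assumption; the text does not state it explicitly), the choice $\eta_i(k) = c_i / (k(k+1))$ works because the series telescopes:

$$\sum_{k=1}^{\infty} \frac{1}{k(k+1)} = \sum_{k=1}^{\infty} \Big(\frac{1}{k} - \frac{1}{k+1}\Big) = 1, \qquad \text{hence} \qquad \sum_{k=1}^{\infty} \eta_i(k) = c_i.$$

Since $c_i < 0$ in this example, every term $\eta_i(k)$ has the same (negative) sign, as required in the minimization setting.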

A sufficient condition for the irrational behavior proof condition to hold has the form

$$\beta_1(k) - (x^\alpha(k))^T \begin{pmatrix} 0.537430998 & 0.0789600736 \\ 0.0789600736 & 0.16307449 \end{pmatrix} x^\alpha(k) \le 0,$$

$$\beta_2(k) - (x^\alpha(k))^T \begin{pmatrix} 1.060309389 & 0.144646529 \\ 0.144646529 & 0.35798954 \end{pmatrix} x^\alpha(k) \le 0.$$

These inequalities are satisfied for $\beta(k)$ computed by formula (20).

References

Aarle, B. van, Bovenberg, L. and Raith, M. (1995). Monetary and fiscal policy interaction and government debt stabilization. Journal of Economics, 62(2), 111-140.

Basar T. and Olsder G. J. (1999). Dynamic Noncooperative Game Theory, 2nd edition. Classics in Applied Mathematics, SIAM, Philadelphia.

Belitskaia, A. V. (2012). The D.W.K. Yeung Condition for Cooperative Differential Games with Nontransferable Payoffs. Graduate School of Management, Contributions to game theory and management, 5, 45-50.

Bertsekas D.P. (2007). Dynamic Programming and Optimal Control, Vol I and II, 3rd edition. Athena Scientific.

Engwerda, J. C. (2005). LQ Dynamic Optimization and Differential Games. Chichester: John Wiley & Sons, 497 p.

Markovkin, M. V. (2006). D. W. K. Yeung’s Condition for Linear Quadratic Differential Games. In: Dynamic Games and Their Applications (L. A. Petrosyan and A. Y. Garnaev , eds.), St Petersburg State University, St Petersburg, 207-216.

Markovkina, A. V. (2008). Dynamic game-theoretic model of production planning under competition. Graduate School of Management, Contributions to game theory and management, 2, 474-482.

Petrosjan, L. A. (1997). The Time-Consistency Problem in Nonlinear Dynamics. RBCM -J. of Brazilian Soc. of Mechanical Sciences, Vol. XIX, No 2. pp. 291-303.

Petrosyan, L. A. and N. N. Danilov (1982). Cooperative differential games and their applications. (Izd. Tomskogo University, Tomsk).

Yeung, D. W. K. (2006). An irrational-behavior-proofness condition in cooperative differential games. Intern. J. of Game Theory Rew., 8, 739-744.

Yeung, D. W. K. and L. A. Petrosyan (2004). Subgame consistent cooperative solutions in stochastic differential games. Journal of Optimization Theory and Applications, 120(3), 651-666.
