The Irrational Behavior Proof Condition for Linear-Quadratic Discrete-time Dynamic Games with Nontransferable Payoffs*
Anna V. Tur
St.Petersburg State University,
Faculty of Applied Mathematics and Control Processes, Universitetskii pr. 35, St.Petersburg, 198504, Russia E-mail: a.tur@spbu.ru
Abstract The paper considers linear-quadratic discrete-time dynamic games with nontransferable payoffs. The Pareto-optimal solution is studied as the optimality principle. The time consistency of this solution and the irrational behavior proof condition are investigated. As an example, the government debt stabilization game is considered.
Keywords: linear-quadratic games, discrete-time games, games with non-transferable payoffs, Pareto-optimal solution, time consistency, PDP, irrational behavior proof condition.
1. Introduction
Consider an $n$-person discrete-time dynamic game $\Gamma(k_0,x_0)$ described by the state equation
$$x(k+1) = A(k)x(k) + \sum_{i=1}^{n} B_i(k)u_i(k), \tag{1}$$
$$k \ge k_0,\quad k_0 \in K_+,\quad x(k_0) = x_0.$$
Here $x$ is the $m$-dimensional state of the system, $u_i$ is the $r$-dimensional control variable of player $i$, $x(k_0) = x_0$ is the arbitrarily chosen initial state of the system, $A(k), B_i(k) \in Z(K_+)$ are matrices of appropriate dimensions, $K_+$ is the set of nonnegative integers, and $Z(K_+)$ is the set of bounded real matrices. The payoff function of player $i \in N$ is
$$J_i(k_0,x_0,u) = \sum_{k=k_0}^{\infty} w_i(k,x(k),u_i(k)),\quad \forall i = 1,\dots,n, \tag{2}$$
$$w_i(k,x(k),u_i(k)) = x^T(k)P_i(k)x(k) + u_i^T(k)R_i(k)u_i(k),$$
$$P_i(k), R_i(k) \in Z(K_+),\quad P_i(k) = P_i^T(k),\quad R_i(k) = R_i^T(k)\quad \forall i \in N.$$
Suppose that payoffs are nontransferable.
We will assume that the players use feedback strategies
$$u_i(k,x) = M_i(k)x(k)$$
to control the system.
* This work was supported by St. Petersburg State University under grant No. 9.38.245.2014.
Definition 1. A set of strategies
$$\{u_i(k,x) = M_i(k)x(k),\ i = 1,\dots,n\} \tag{3}$$
is called admissible if the following conditions are satisfied:
1. $M_i(k) \in Z(K_+)$ $\forall i = 1,\dots,n$.
2. The resulting system described by
$$x(k+1) = \Big(A(k) + \sum_{i=1}^{n} B_i(k)M_i(k)\Big)x(k) \tag{4}$$
is uniformly asymptotically stable (as $k \to \infty$).
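For constant matrices, the stability requirement in condition 2 can be checked numerically: the closed-loop system is asymptotically stable exactly when the spectral radius of $A + \sum_i B_i M_i$ is below one. The following is a minimal sketch under that time-invariance assumption; the function name `is_admissible` and all numerical values are illustrative and do not come from the paper.

```python
import numpy as np

def is_admissible(A, Bs, Ms):
    """Check stability of x(k+1) = (A + sum_i B_i M_i) x(k) for constant matrices."""
    A_cl = A + sum(B @ M for B, M in zip(Bs, Ms))
    spectral_radius = max(abs(np.linalg.eigvals(A_cl)))
    return spectral_radius < 1.0

# Hypothetical data: the open-loop system has an unstable mode (1.2),
# which the feedback of player 1 moves inside the unit circle.
A = np.array([[1.2, 0.0], [0.0, 0.5]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])
M1 = np.array([[-0.7, 0.0]])
M2 = np.array([[0.0, -0.1]])
zero = np.zeros((1, 2))

stable_with_feedback = is_admissible(A, [B1, B2], [M1, M2])
stable_without_feedback = is_admissible(A, [B1, B2], [zero, zero])
```

For time-varying $A(k)$, $B_i(k)$, $M_i(k)$ an eigenvalue test of this kind is not sufficient, and uniform asymptotic stability has to be established by other means.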
Suppose that the players agree to use a Pareto-optimal solution as the optimality principle, and that they agree on a vector of weights
$$\alpha = (\alpha_1,\dots,\alpha_n):\quad \sum_{i=1}^{n}\alpha_i = 1,\quad 0 < \alpha_i < 1,$$
on their payoffs to obtain a Pareto-optimal outcome.
Then the optimal cooperative strategies of the players can be found by solving the following control problem (Engwerda, 2005):
$$\max_{u}\sum_{i=1}^{n}\alpha_iJ_i(k_0,x_0,u). \tag{5}$$
Let $u^\alpha(k) = (u_1^\alpha(k),\dots,u_n^\alpha(k))$ be the set of strategies solving this optimal control problem:
$$(u_1^\alpha,\dots,u_n^\alpha) = \arg\max_{(u_1,\dots,u_n)}\sum_{i=1}^{n}\alpha_iJ_i(k_0,x_0,u). \tag{6}$$
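Let
$$J^\alpha(k_0,x_0,u) = \sum_{i=1}^{n}\alpha_iJ_i(k_0,x_0,u),\qquad P^\alpha(k) = \sum_{i=1}^{n}\alpha_iP_i(k),\quad k \ge k_0,$$
$$R^\alpha(k) = \begin{pmatrix}\alpha_1R_1(k) & O & \dots & O\\ O & \alpha_2R_2(k) & \dots & O\\ \vdots & \vdots & \ddots & \vdots\\ O & O & \dots & \alpha_nR_n(k)\end{pmatrix},\quad k \ge k_0.$$
Then
$$J^\alpha(k_0,x_0,u) = \sum_{k=k_0}^{\infty}\left(x^T(k)P^\alpha(k)x(k) + u^T(k)R^\alpha(k)u(k)\right). \tag{7}$$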
Finding the Pareto-optimal solution thus reduces to the linear-quadratic optimal control problem (1)-(7) with a single control variable $u(k)$.
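The reduction to a single-controller problem amounts to forming the weighted state-cost matrix and the block-diagonal control-cost matrix of (7). A minimal sketch for constant matrices follows; the helper name `weighted_matrices` and the two-player data are hypothetical and serve only to illustrate the construction.

```python
import numpy as np

def weighted_matrices(alpha, Ps, Rs):
    """Return P^alpha = sum_i alpha_i P_i and R^alpha = blockdiag(alpha_i R_i)."""
    alpha = np.asarray(alpha, dtype=float)
    # Weights must lie strictly between 0 and 1 and sum to one.
    assert np.isclose(alpha.sum(), 1.0) and np.all((alpha > 0) & (alpha < 1))
    P_alpha = sum(a * P for a, P in zip(alpha, Ps))
    sizes = [R.shape[0] for R in Rs]
    R_alpha = np.zeros((sum(sizes), sum(sizes)))
    offset = 0
    for a, R, r in zip(alpha, Rs, sizes):
        R_alpha[offset:offset + r, offset:offset + r] = a * R
        offset += r
    return P_alpha, R_alpha

# Hypothetical two-player data (not from the paper).
Ps = [np.diag([-1.0, 0.0]), np.diag([-2.0, -1.0])]
Rs = [np.array([[-1.0]]), np.array([[-3.0]])]
P_alpha, R_alpha = weighted_matrices([0.25, 0.75], Ps, Rs)
```

With these inputs, `P_alpha` is `diag(-1.75, -0.75)` and `R_alpha` is `diag(-0.25, -2.25)`.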
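A unique control in the class of admissible strategies
$$\{u_i^\alpha(k) = M_i^\alpha(k)x,\ i = 1,\dots,n\}$$
maximizing $J^\alpha(k_0,x_0,u)$ exists if and only if (Bertsekas, 2007) the following conditions are satisfied:
1. The system of matrix equations
$$\begin{cases}(A(k)+B(k)M^\alpha(k))^T\Theta^\alpha(k+1)(A(k)+B(k)M^\alpha(k)) - \Theta^\alpha(k) - P^\alpha(k) - M^\alpha(k)^TR^\alpha(k)M^\alpha(k) = 0,\\ M^\alpha(k) = -\left(-R^\alpha(k) + B^T(k)\Theta^\alpha(k+1)B(k)\right)^{-1}B^T(k)\Theta^\alpha(k+1)A(k),\end{cases}\quad k \ge k_0, \tag{8}$$
has a solution $\{M^\alpha(k), \Theta^\alpha(k)\} \in Z(K_+)$ with dimensions $rn \times m$ and $m \times m$ respectively, where $\Theta^\alpha(k)$ is symmetric for all $k \ge k_0$.
2. The set of strategies
$$\{u_i^\alpha(k) = M_i^\alpha(k)x,\ i = 1,\dots,n\}, \tag{9}$$
where $M_i^\alpha(k)$ is the $i$-th block of the matrix
$$M^\alpha(k) = \begin{pmatrix}M_1^\alpha(k)\\ \vdots\\ M_n^\alpha(k)\end{pmatrix},$$
is admissible.
3. The matrices $-R^\alpha(k) + B^T(k)\Theta^\alpha(k+1)B(k)$ are positive definite.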
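For constant matrices, a stationary solution of system (8) can be approximated by iterating the two equations until the matrix $\Theta^\alpha$ stops changing. The sketch below is a plain fixed-point (value-iteration style) scheme chosen for illustration; it is not a procedure given in the paper, its convergence is not guaranteed in general, and the scalar test data are hypothetical.

```python
import numpy as np

def solve_pareto_lq(A, B, P_alpha, R_alpha, iters=2000, tol=1e-12):
    """Fixed-point iteration on system (8) for constant matrices."""
    m = A.shape[0]
    Theta = np.zeros((m, m))
    for _ in range(iters):
        G = -R_alpha + B.T @ Theta @ B              # condition 3: should be positive definite
        M = -np.linalg.solve(G, B.T @ Theta @ A)    # second equation of (8)
        A_cl = A + B @ M
        # First equation of (8), solved for Theta(k) given Theta(k+1).
        Theta_next = A_cl.T @ Theta @ A_cl - P_alpha - M.T @ R_alpha @ M
        converged = np.max(np.abs(Theta_next - Theta)) < tol
        Theta = Theta_next
        if converged:
            break
    G = -R_alpha + B.T @ Theta @ B
    M = -np.linalg.solve(G, B.T @ Theta @ A)
    return M, Theta

# Hypothetical scalar maximization problem: maximize -sum(x^2 + u^2).
A = np.array([[1.0]])
B = np.array([[1.0]])
P_alpha = np.array([[-1.0]])
R_alpha = np.array([[-1.0]])
M, Theta = solve_pareto_lq(A, B, P_alpha, R_alpha)
```

In this scalar case the fixed point satisfies $\Theta^2 - \Theta - 1 = 0$, so $\Theta = (1+\sqrt{5})/2$ and $M = -\Theta/(1+\Theta)$, and the closed-loop coefficient $A + BM$ lies inside the unit circle.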
The cooperative state trajectory $x^\alpha(k)$ can be found by substituting the cooperative strategies $\{u_i^\alpha(k)\}$ into (1) and solving the system
$$x(k+1) = A(k)x(k) + B(k)u^\alpha(k). \tag{10}$$
Here $B(k) = (B_1(k)\ B_2(k)\ \dots\ B_n(k))$. The payoffs of the players are
$$J_i^\alpha(k_0,x_0,u^\alpha) = \sum_{k=k_0}^{\infty}\left((x^\alpha(k))^TP_i(k)x^\alpha(k) + (u_i^\alpha(k))^TR_i(k)u_i^\alpha(k)\right). \tag{11}$$
2. Time-consistency
Suppose that there exists a weight vector $\alpha$ such that the inequalities
$$J_i^\alpha(k_0,x_0,u^\alpha) \ge V_i(k_0,x_0),\quad i = 1,\dots,n, \tag{12}$$
required for individual rationality in the cooperative game, are satisfied at the initial time. Here $V_i(k_0,x_0)$ is the Nash outcome of player $i$ in the game $\Gamma(k_0,x_0)$.
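But if there exists $k > k_0$ such that for some $i$
$$J_i^\alpha(k,x^\alpha(k),u^\alpha) < V_i(k,x^\alpha(k)),$$
then the individual rationality condition is time-inconsistent: it holds at the initial time but fails in a subgame along the cooperative trajectory.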
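The check described above can be sketched numerically when both the cooperative payoff-to-go and the Nash outcome are time-invariant quadratic forms in the state. The code below assumes exactly that; the function `first_violation`, the matrices `S_i` and `Theta_i`, and the closed-loop matrix `A_cl` are hypothetical illustrations, not data from the paper.

```python
import numpy as np

def first_violation(A_cl, S_i, Theta_i, x0, horizon=50):
    """Return the first step k with x(k)^T S_i x(k) < x(k)^T Theta_i x(k), else None.

    S_i     : matrix of the cooperative payoff-to-go of player i (assumed quadratic)
    Theta_i : matrix of the Nash outcome of player i (assumed quadratic)
    A_cl    : closed-loop matrix of the cooperative trajectory
    """
    x = np.asarray(x0, dtype=float)
    for k in range(horizon):
        if x @ S_i @ x < x @ Theta_i @ x:
            return k
        x = A_cl @ x
    return None

# Hypothetical data: individual rationality holds at k = 0 but fails at k = 1.
A_cl = np.diag([0.1, 0.9])
S_i = np.diag([2.0, 1.0])
Theta_i = np.diag([1.0, 2.0])
k_bad = first_violation(A_cl, S_i, Theta_i, [2.0, 1.0])

# With a uniformly larger cooperative payoff no violation occurs.
k_none = first_violation(A_cl, np.diag([2.0, 2.0]), np.diag([1.0, 1.0]), [2.0, 1.0])
```

In the first case the payoff difference is $9 - 6 > 0$ at the initial state and $0.89 - 1.66 < 0$ one step later, so the function returns `1`.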
To overcome the time inconsistency problem in the game with nontransferable payoffs the notion of Payoff Distribution Procedure (PDP) was introduced by L.A. Petrosyan (1997). In this paper the PDP and time-consistency of Pareto-optimal solution are detailed for linear-quadratic discrete-time dynamic games.
Definition 2. A vector $\beta(k) = (\beta_1(k),\dots,\beta_n(k))$ is a PDP if
$$\sum_{k=k_0}^{\infty}\left[(x^\alpha(k))^TP_i(k)x^\alpha(k) + (u_i^\alpha(k))^TR_i(k)u_i^\alpha(k)\right] = \sum_{k=k_0}^{\infty}\beta_i(k),\quad i = 1,\dots,n.$$
Definition 3. A Pareto-optimal solution is called time-consistent if there exists a PDP such that the condition of individual rationality is satisfied:
$$\sum_{k=l}^{\infty}\beta_i(k) \ge V_i(l,x^\alpha(l)),\quad \forall l \ge k_0,\ i = 1,\dots,n, \tag{13}$$
where $V_i(l,x^\alpha(l))$ is the Nash outcome of player $i$ in the subgame $\Gamma(l,x^\alpha(l))$.
Suppose that condition (12) is satisfied for some Pareto-optimal solution. Then there exist functions $\eta_i(k) \ge 0$ such that
$$J_i^\alpha(k_0,x_0,u^\alpha) - V_i(k_0,x_0) = \sum_{k=k_0}^{\infty}\eta_i(k). \tag{14}$$
In (Petrosyan, 1997) a formula for the PDP that guarantees time-consistency in cooperative differential games with nontransferable payoffs is considered. The following theorem gives an analog of this formula.
Theorem 1. Let the inequalities
$$J_i^\alpha(k_0,x_0,u^\alpha) \ge V_i(k_0,x_0),\quad i = 1,\dots,n,$$
be satisfied for some Pareto-optimal solution. Then the PDP $\beta(k)$ computed by the formula
$$\beta_i(k) = \eta_i(k) - V_i(k+1,x^\alpha(k+1)) + V_i(k,x^\alpha(k)),\quad i = 1,\dots,n,\ k \ge k_0, \tag{15}$$
guarantees time-consistency of this Pareto-optimal solution along the cooperative trajectory $x^\alpha(k)$ for $k \ge k_0$. Here $\eta_i(k) \ge 0$ are functions satisfying (14).
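Proof. First we show that $\beta(k)$ is a PDP. Summing (15) over $k$, the terms $V_i$ telescope:
$$\sum_{k=k_0}^{\infty}\beta_i(k) = \sum_{k=k_0}^{\infty}\eta_i(k) - V_i(\infty,x^\alpha(\infty)) + V_i(k_0,x_0) = J_i^\alpha(k_0,x_0,u^\alpha) - V_i(k_0,x_0) + V_i(k_0,x_0) = J_i^\alpha(k_0,x_0,u^\alpha). \tag{16}$$
Here $V_i(\infty,x^\alpha(\infty)) = \lim_{k\to\infty}V_i(k,x^\alpha(k)) = 0$. So $\beta(k)$ satisfies Definition 2.
Now we show that the condition of individual rationality is satisfied. Using (15) we obtain
$$\sum_{k=l}^{\infty}\beta_i(k) = \sum_{k=l}^{\infty}\eta_i(k) - V_i(\infty,x^\alpha(\infty)) + V_i(l,x^\alpha(l)) = \sum_{k=l}^{\infty}\eta_i(k) + V_i(l,x^\alpha(l)) \ge V_i(l,x^\alpha(l)). \tag{17}$$
□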
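The construction in Theorem 1 can be illustrated on a finite horizon for a single player. The sketch below assumes that the Nash outcome vanishes at the terminal step, mimicking $V_i(k,x^\alpha(k)) \to 0$; the function name `pdp` and all numbers are hypothetical.

```python
def pdp(eta, V):
    """Formula (15): beta(k) = eta(k) - V(k+1) + V(k), with len(V) == len(eta) + 1."""
    return [eta[k] - V[k + 1] + V[k] for k in range(len(eta))]

# Hypothetical data for one player.
eta = [0.5, 0.25, 0.125, 0.125]   # nonnegative increments, as required by (14)
V = [3.0, 2.5, 1.0, 0.25, 0.0]    # Nash outcomes along the cooperative trajectory
J = V[0] + sum(eta)               # cooperative payoff implied by (14)

beta = pdp(eta, V)
# Tail sums of the PDP from step l onward, to compare with V[l] as in (13).
tails = [sum(beta[l:]) for l in range(len(beta))]
```

Here `beta` sums to `J`, which is the PDP property (16), and each tail sum is at least `V[l]`, which is the individual rationality condition (17).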
2.1. Irrational Behavior Proof Condition
The condition under which a player still performs better under the cooperative scheme even if irrational behavior appears later in the game was considered in (Yeung, 2006). The irrational behavior proof condition for differential games with nontransferable payoffs was proposed in (Belitskaia, 2012). In this paper the irrational behavior proof condition is specialized to linear-quadratic discrete-time dynamic games with nontransferable payoffs.
Definition 4. A Pareto-optimal solution $(J_1^\alpha(k_0,x_0,u^\alpha),\dots,J_n^\alpha(k_0,x_0,u^\alpha))$ satisfies the irrational behavior proof condition (Yeung, 2006) in the game $\Gamma(k_0,x_0)$ if the inequalities
$$\sum_{k=k_0}^{l}\beta_i(k) + V_i(l+1,x^\alpha(l+1)) \ge V_i(k_0,x_0),\quad i = 1,\dots,n, \tag{18}$$
hold for all $l \ge k_0$, where $\beta(k) = (\beta_1(k),\dots,\beta_n(k))$ is a time-consistent PDP of $(J_1^\alpha(k_0,x_0,u^\alpha),\dots,J_n^\alpha(k_0,x_0,u^\alpha))$.
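So if for all $i = 1,\dots,n$ the inequalities
$$\beta_i(k) + V_i(k+1,x^\alpha(k+1)) - V_i(k,x^\alpha(k)) \ge 0,\quad k \ge k_0,$$
hold, then the Pareto-optimal solution satisfies the irrational behavior proof condition. Writing the Nash outcome as the quadratic form $V_i(k,x) = x^T\Theta_i(k)x$ and using (8) together with the closed-loop dynamics $x^\alpha(k+1) = (A(k)+B(k)M^\alpha(k))x^\alpha(k)$, these inequalities can be rewritten as
$$\beta_i(k) + (x^\alpha(k))^T\left((A(k)+B(k)M^\alpha(k))^T\Theta_i(k+1)(A(k)+B(k)M^\alpha(k)) - \Theta_i(k)\right)x^\alpha(k) \ge 0,\quad k \ge k_0. \tag{19}$$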
If we use formula (15), then
$$\beta_i(k) + V_i(k+1,x^\alpha(k+1)) - V_i(k,x^\alpha(k)) = \eta_i(k),\quad k \ge k_0,$$
where $\eta_i(k) \ge 0$ for all $k \ge k_0$. This means that conditions (19) are always satisfied in this case.
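The step from the per-period inequalities to condition (18) is a telescoping sum. Written out, with the shorthand $V_i(k) = V_i(k,x^\alpha(k))$ (introduced here only for brevity), summing the per-period inequality over $k = k_0,\dots,l$ gives:

```latex
\begin{aligned}
0 &\le \sum_{k=k_0}^{l}\bigl(\beta_i(k) + V_i(k+1) - V_i(k)\bigr) \\
  &= \sum_{k=k_0}^{l}\beta_i(k) + V_i(l+1) - V_i(k_0),
\end{aligned}
\qquad\text{hence}\qquad
\sum_{k=k_0}^{l}\beta_i(k) + V_i(l+1) \ge V_i(k_0).
```

Since $V_i(k_0) = V_i(k_0,x_0)$, the last inequality is exactly (18) for the given $l$.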
Let’s formulate these results.
Theorem 2. If, in a linear-quadratic discrete-time dynamic game with nontransferable payoffs, a Pareto-optimal solution and its PDP satisfy the inequalities
$$\beta_i(k) + V_i(k+1,x^\alpha(k+1)) - V_i(k,x^\alpha(k)) \ge 0,\quad k \ge k_0,\ i = 1,\dots,n,$$
where $V_i(l,x^\alpha(l))$ is the Nash outcome of player $i$ in the subgame $\Gamma(l,x^\alpha(l))$, then the irrational behavior proof condition is satisfied for this Pareto-optimal solution.
Proposition 1. If the PDP $\beta(k)$ of a Pareto-optimal solution in a linear-quadratic discrete-time dynamic game with nontransferable payoffs is calculated using formula (15), then the irrational behavior proof condition is satisfied for this Pareto-optimal solution.
3. Example
As an example, consider the government debt stabilization game (van Aarle, Bovenberg and Raith, 1995). The Pareto solution of this game is considered in (Engwerda, 2005). This paper presents the discrete-time case of this problem and the time-consistency of the cooperative solution.
Assume that government debt accumulation, $d(k)$, is the sum of interest payments on government debt, $rd(k)$, and primary fiscal deficits, $f(k)$, minus the seigniorage (i.e. the issue of base money) $m(k)$. So,
$$d(k+1) = rd(k) + f(k) - m(k),\quad d(0) = d_0.$$
The objective of the fiscal authority is to minimize a sum of time profiles of the primary fiscal deficit, base-money growth and government debt.
The monetary authorities are assumed to choose the growth of base money such that a sum of time profiles of base-money growth and government debt is minimized.
Let
k
fc + 1
k
k
Then our system can be rewritten as
$$x(k+1) = A(k)x(k) + \sum_{i=1}^{2}B_i(k)u_i(k).$$
The payoff function of player $i$ is
$$J_i = \sum_{k=k_0}^{\infty}\Big(x^T(k)P_i(k)x(k) + \sum_{j=1}^{2}u_j^T(k)R_{ij}(k)u_j(k)\Big),\quad i = 1,2,$$
Pl = (oo) ,p2 = (oo) , Rn = 1, Rl2 = n, R21 =°, R22 = 1-
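Following (Basar and Olsder, 1999), to find the Nash equilibrium we solve the system
$$\begin{cases}\Big(A(k)+\sum_{l=1}^{2}B_l(k)M_l^{NE}(k)\Big)^T\Theta_i(k+1)\Big(A(k)+\sum_{l=1}^{2}B_l(k)M_l^{NE}(k)\Big) - \Theta_i(k) + P_i(k)\\ \qquad + M_j^{NE}(k)^TR_{ij}(k)M_j^{NE}(k) + M_i^{NE}(k)^TR_{ii}(k)M_i^{NE}(k) = 0,\\ M_i^{NE}(k) = -\left(R_{ii}(k)+B_i^T(k)\Theta_i(k+1)B_i(k)\right)^{-1}B_i^T(k)\Theta_i(k+1)\left(A(k)+B_j(k)M_j^{NE}(k)\right),\end{cases}\quad i = 1,2,\ j \ne i.$$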
Let A = 1] = 1, 2 = s = 2,7 = 1- Then
$$u_1^{NE}(k,x) = \begin{pmatrix}-0.073193 & -0.166311\end{pmatrix}x(k),\qquad u_2^{NE}(k,x) = \begin{pmatrix}0.142083 & 0.318188\end{pmatrix}x(k),$$
$$J_1 = x_0^T\begin{pmatrix}0.656174 & 0.354202\\ 0.354202 & 0.844156\end{pmatrix}x_0,$$
$$J_2 = x_0^T\begin{pmatrix}1.273766 & 0.613087\\ 0.613087 & 1.444844\end{pmatrix}x_0,$$
$$V_1(k,x(k)) = x^T(k)\begin{pmatrix}0.656174 & 0.354202\\ 0.354202 & 0.844156\end{pmatrix}x(k),$$
$$V_2(k,x(k)) = x^T(k)\begin{pmatrix}1.273766 & 0.613087\\ 0.613087 & 1.444843\end{pmatrix}x(k).$$
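The coupled Nash equations above can be explored numerically for constant matrices by updating the feedback gains and the value matrices in turn until they stop changing. The sketch below is a naive Jacobi-style fixed-point scheme, offered only as an illustration: it is not the procedure of (Basar and Olsder, 1999) or of this paper, convergence is not guaranteed in general, and the scalar data are hypothetical because the system matrices of the example are not available here.

```python
import numpy as np

def solve_feedback_nash(A, B, P, R, iters=5000, tol=1e-12):
    """Iterate the coupled two-player equations (minimization form).

    B[i], P[i] belong to player i; R[i][j] weights u_j in the cost of player i.
    """
    m = A.shape[0]
    Theta = [np.zeros((m, m)), np.zeros((m, m))]
    M = [np.zeros((B[0].shape[1], m)), np.zeros((B[1].shape[1], m))]
    for _ in range(iters):
        M_new = []
        for i in (0, 1):
            j = 1 - i
            G = R[i][i] + B[i].T @ Theta[i] @ B[i]
            # Best response of player i to the previous gain of player j.
            M_new.append(-np.linalg.solve(G, B[i].T @ Theta[i] @ (A + B[j] @ M[j])))
        A_cl = A + B[0] @ M_new[0] + B[1] @ M_new[1]
        Theta_new = []
        for i in (0, 1):
            j = 1 - i
            Theta_new.append(A_cl.T @ Theta[i] @ A_cl + P[i]
                             + M_new[j].T @ R[i][j] @ M_new[j]
                             + M_new[i].T @ R[i][i] @ M_new[i])
        change = max(max(np.max(np.abs(Theta_new[i] - Theta[i])),
                         np.max(np.abs(M_new[i] - M[i]))) for i in (0, 1))
        M, Theta = M_new, Theta_new
        if change < tol:
            break
    return M, Theta

# Hypothetical symmetric scalar game (not the debt stabilization data).
one, zero = np.array([[1.0]]), np.array([[0.0]])
A = np.array([[0.5]])
B = [one, one]
P = [one, one]
R = [[one, zero], [zero, one]]
M_ne, Theta_ne = solve_feedback_nash(A, B, P, R)
```

A converged pair `(M_ne, Theta_ne)` satisfies both stationary equations up to the tolerance; when the iteration does not settle, a different solver or starting point is needed.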
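According to (8), with the signs adapted to the minimization problem, to find the Pareto solution we solve the system
$$\begin{cases}(A(k)+B_1M_1^\alpha+B_2M_2^\alpha)^T\Theta^\alpha(k+1)(A(k)+B_1M_1^\alpha+B_2M_2^\alpha) - \Theta^\alpha(k) + P^\alpha(k) + M^\alpha(k)^TR^\alpha(k)M^\alpha(k) = 0,\\ M^\alpha(k) = -\left(R^\alpha(k)+B^T(k)\Theta^\alpha(k+1)B(k)\right)^{-1}B^T(k)\Theta^\alpha(k+1)A(k),\end{cases}$$
where
$$P^\alpha(k) = \alpha P_1(k) + (1-\alpha)P_2(k),\qquad R^\alpha(k) = \begin{pmatrix}\alpha R_{11}+(1-\alpha)R_{21} & 0\\ 0 & \alpha R_{12}+(1-\alpha)R_{22}\end{pmatrix},\qquad B(k) = \begin{pmatrix}B_1(k) & B_2(k)\end{pmatrix}.$$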
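For $\alpha = 0.45$:
$$M_1^\alpha = \begin{pmatrix}-0.2272618408 & -0.5075099515\end{pmatrix},\qquad M_2^\alpha = \begin{pmatrix}0.1022678284 & 0.2283794781\end{pmatrix},$$
$$J_1(u^\alpha) = x_0^T\begin{pmatrix}0.6808499028 & 0.4139353163\\ 0.4139353163 & 0.9409769084\end{pmatrix}x_0,$$
$$J_2(u^\alpha) = x_0^T\begin{pmatrix}1.223914910 & 0.4917964794\\ 0.4917964794 & 1.139011465\end{pmatrix}x_0.$$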
If, for example, $x_0 = (-3,\ 2)$, then
$$J_1^\alpha(k_0,x_0,u^\alpha) - V_1(k_0,x_0) = -0.107435164999999722,$$
$$J_2^\alpha(k_0,x_0,u^\alpha) - V_2(k_0,x_0) = -0.216497664600000528.$$
So, conditions (12) are satisfied (we consider the minimization problem, that is why we have an opposite sign in (12)).
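The two payoff differences can be cross-checked from the quadratic forms reported in the example. The sketch below uses the Nash and cooperative matrices as printed in the text; because the Nash matrices are given to six decimals only, agreement with the reported differences is expected to a few decimal places rather than to all printed digits. The variable names are chosen here for illustration.

```python
import numpy as np

x0 = np.array([-3.0, 2.0])

# Nash outcome matrices as reported in the text (six decimals).
# The (2,2) entry for player 2 is printed as both 1.444844 and 1.444843;
# 1.444844 is used here.
V1 = np.array([[0.656174, 0.354202], [0.354202, 0.844156]])
V2 = np.array([[1.273766, 0.613087], [0.613087, 1.444844]])

# Cooperative payoff matrices for alpha = 0.45 as reported in the text.
J1 = np.array([[0.6808499028, 0.4139353163], [0.4139353163, 0.9409769084]])
J2 = np.array([[1.223914910, 0.4917964794], [0.4917964794, 1.139011465]])

# Differences J_i - V_i evaluated at x0 (negative means cooperation is cheaper).
diff1 = x0 @ (J1 - V1) @ x0
diff2 = x0 @ (J2 - V2) @ x0
```

Both differences come out negative and close to the reported values, consistent with cooperation lowering each player's cost at the initial state.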
But at the next step we have
$$J_1^\alpha(k_1,x_1,u^\alpha) - V_1(k_1,x_1) = 0.0504046297943969643.$$
This means that the individual rationality condition is time-inconsistent. To avoid this problem, we use the PDP calculated by formula (15):
$$\begin{aligned}\beta_1(k) &= \frac{-0.107435164999999722}{k(k+1)} + (x^\alpha(k))^T\begin{pmatrix}0.537430998 & 0.0789600736\\ 0.0789600736 & 0.16307449\end{pmatrix}x^\alpha(k),\\ \beta_2(k) &= \frac{-0.216497664600000528}{k(k+1)} + (x^\alpha(k))^T\begin{pmatrix}1.060309389 & 0.144646529\\ 0.144646529 & 0.35798954\end{pmatrix}x^\alpha(k).\end{aligned} \tag{20}$$
Note that $\eta_i(k) < 0$, because we now consider the minimization problem.
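The first term of each component in (20) is a constant divided by $k(k+1)$. Assuming the time index in the example starts at $k_0 = 1$ (the text does not state this explicitly), this choice of $\eta_i(k)$ satisfies (14), because the weights sum to one:

```latex
\sum_{k=1}^{\infty}\frac{1}{k(k+1)}
  = \sum_{k=1}^{\infty}\left(\frac{1}{k} - \frac{1}{k+1}\right)
  = 1,
\qquad\text{so}\qquad
\sum_{k=1}^{\infty}\eta_i(k)
  = \sum_{k=1}^{\infty}\frac{c_i}{k(k+1)}
  = c_i ,
```

where $c_i$ denotes the difference between the cooperative payoff and the Nash outcome of player $i$ at the initial state, that is, the numerator appearing in (20).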
The sufficient condition for the irrational behavior proof condition takes the form
$$\beta_1(k) - (x^\alpha(k))^T\begin{pmatrix}0.537430998 & 0.0789600736\\ 0.0789600736 & 0.16307449\end{pmatrix}x^\alpha(k) \le 0,$$
$$\beta_2(k) - (x^\alpha(k))^T\begin{pmatrix}1.060309389 & 0.144646529\\ 0.144646529 & 0.35798954\end{pmatrix}x^\alpha(k) \le 0.$$
These inequalities are satisfied for $\beta(k)$ computed by formula (20).
References
Aarle, B. van, Bovenberg, L. and Raith, M. (1995). Monetary and fiscal policy interaction and government debt stabilization. Journal of Economics, 62(2), 111-140.
Basar T. and Olsder G. J. (1999). Dynamic Noncooperative Game Theory, 2nd edition. Classics in Applied Mathematics, SIAM, Philadelphia.
Belitskaia, A. V. (2012). The D.W.K. Yeung Condition for Cooperative Differential Games with Nontransferable Payoffs. Graduate School of Management, Contributions to game theory and management, 5, 45-50.
Bertsekas D.P. (2007). Dynamic Programming and Optimal Control, Vol I and II, 3rd edition. Athena Scientific.
Engwerda, J. C. (2005). LQ Dynamic Optimization and Differential Games. Chichester: John Wiley & Sons, 497 p.
Markovkin, M. V. (2006). D. W. K. Yeung’s Condition for Linear Quadratic Differential Games. In: Dynamic Games and Their Applications (L. A. Petrosyan and A. Y. Garnaev , eds.), St Petersburg State University, St Petersburg, 207-216.
Markovkina, A. V. (2008). Dynamic game-theoretic model of production planning under competition. Graduate School of Management, Contributions to game theory and management, 2, 474-482.
Petrosjan, L. A. (1997). The Time-Consistency Problem in Nonlinear Dynamics. RBCM -J. of Brazilian Soc. of Mechanical Sciences, Vol. XIX, No 2. pp. 291-303.
Petrosyan, L. A. and N. N. Danilov (1982). Cooperative differential games and their applications. (Izd. Tomskogo University, Tomsk).
Yeung, D. W. K. (2006). An irrational-behavior-proofness condition in cooperative differential games. International Game Theory Review, 8, 739-744.
Yeung, D. W. K. and L. A. Petrosyan (2004). Subgame consistent cooperative solutions in stochastic differential games. Journal of Optimization Theory and Applications, 120(3), 651-666.