UDC 519.837 Vestnik SPbGU. Applied Mathematics. Computer Science... 2018. Vol. 14. Iss. 4
MSC 91A20, 91A25
New characteristic function for multistage dynamic games*
Y. B. Pankratova, L. A. Petrosyan
St. Petersburg State University, 7—9, Universitetskaya nab., St. Petersburg, 199034, Russian Federation
For citation: Pankratova Y. B., Petrosyan L. A. New characteristic function for multistage dynamic games. Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes, 2018, vol. 14, iss. 4, pp. 316-324. https://doi.org/10.21638/11702/spbu10.2018.404
Finite-stage dynamic n-person games with transferable payoffs are considered. The cooperative version of the game is defined, and a new approach to constructing characteristic functions in multistage games, based on the characteristic functions of the stage games, is proposed. It is proved that the values of this new characteristic function dominate the values of the characteristic function constructed by the min-max approach. This allows one to construct a subcore of the classical core of the multistage game under consideration and guarantees that the new approach leads to time-consistent (see the works of L. Petrosyan, G. Zaccour, 2003; L. Petrosyan, 1991) and, in some cases, strongly time-consistent solutions (see the paper of L. Petrosyan, 1993). An example is provided that shows the construction of the newly defined characteristic function and the time-consistency and strong time-consistency of the core.
Keywords: multistage game, characteristic function, time-consistency, strong time-consistency.
Introduction. There are many ways to define the characteristic function in cooperative games. For one-stage games, recent literature defines the characteristic function axiomatically. This approach, however, is difficult to use for multistage dynamic games, since the evolution of the characteristic function plays an important role in solving the cooperative version of a dynamic game. In this paper we define the characteristic function of a multistage game through the characteristic functions of its stage games; this approach appears to be effective from the computational point of view. Moreover, the newly defined characteristic function dominates the classical characteristic function in each stage game. This makes it possible to construct time-consistent and strongly time-consistent solution concepts using a non-negative imputation distribution procedure (IDP).
Main result. We consider a finite multistage game $G(z_1)$, which starts with the one-stage game $\Gamma(z_1)$ at the vertex $z_1$ of the game tree $G$. Denote by $\Gamma(z)$ the one-stage game at the vertex $z$ of the game tree:
$$\Gamma(z) = \langle N;\ U_1^z, \dots, U_i^z, \dots, U_n^z;\ K_1^z, \dots, K_i^z, \dots, K_n^z \rangle. \tag{1}$$
In formula (1), $N$ is the set of players, which is the same for all games $\Gamma(z)$, $z \in G$; $U_i^z$ is the set of strategies of player $i \in N$, and $K_i^z$ is the payoff function of player $i$. We consider the case when the game $\Gamma(z)$ is finite, i.e. the sets $U_i^z$ are finite. During the game $G(z_1)$ a finite sequence of one-stage games
$$\Gamma(z_1), \dots, \Gamma(z_k), \dots, \Gamma(z_l)$$
is realized. If the stage game $\Gamma(z_k)$ takes place, the next stage game $\Gamma(z_{k+1})$ occurs at the vertex $z_{k+1} = T(z_k; u_1^{z_k}, \dots, u_n^{z_k})$, which depends on the stage $z_k$ and on the strategies $u^{z_k} = (u_1^{z_k}, \dots, u_n^{z_k})$ chosen by the players in the stage game $\Gamma(z_k)$. A strategy of player $i \in N$ in the game $G(z_1)$ is a mapping $u_i(\cdot)$ that determines the choice of player $i$ in each possible one-stage game, i.e. $u_i(z) = u_i^z \in U_i^z$. Each strategy profile $u = (u_1(\cdot), \dots, u_n(\cdot))$ uniquely determines the trajectory $z = (z_1, \dots, z_l)$ and the payoffs of the players as the sums of their payoffs in the corresponding stage games.
* This work was supported by the Russian Foundation for Basic Research (grant N 17-51-53030). © St. Petersburg State University, 2018
A strategy profile $u = (u_1(\cdot), \dots, u_n(\cdot))$ that maximizes the sum of the players' payoffs in the game $G(z_1)$ is called a cooperative strategy profile. It generates a sequence of one-stage games $\Gamma(z_1), \Gamma(z_2), \dots, \Gamma(z_l)$ and the corresponding path $z = (z_1, \dots, z_l)$, which we call the cooperative trajectory.
We are interested in the subgames of the game $G(z_1)$ along the cooperative trajectory,
$$G(z_1), G(z_2), \dots, G(z_l),$$
each of which starts with the one-stage game $\Gamma(z_k)$, $k = 1, \dots, l$.
Consider the cooperative version of the game $G(z_1)$ and of the subgames $G(z_k)$, $k = 1, \dots, l$.
Let $V(z_k, S)$, $S \subset N$, be the characteristic function of the subgame $G(z_k)$, defined in the classical sense [1], i.e. as the lower value of the zero-sum game based on $G(z_k)$ between the coalition $S$ (the first player) and the coalition $N \setminus S$ (the second player), where the payoff of $S$ equals the sum of the payoffs of its members. In the same classical way, define the characteristic function $\bar V(z, S)$, $S \subset N$, of the one-stage game $\Gamma(z)$ as the lower value of the zero-sum game associated with $\Gamma(z)$ between the coalition $S$ as the first player and the coalition $N \setminus S$ as the second. Here and below a bar marks objects of a one-stage game, to distinguish them from those of the multistage game.
Define $W(S)$, $S \subset N$, as follows:
$$W(S) = \max_{z} \bar V(z, S), \quad S \subset N.$$
Denote by $C(z_k)$ the core of the multistage game $G(z_k)$, by $\bar C(z_k)$ the core of the one-stage game $\Gamma(z_k)$, which is played at stage $k$ at the vertex $z_k$, and by $\bar D(z_k)$ the set of imputations $\alpha^{z_k} = (\alpha_1^{z_k}, \dots, \alpha_n^{z_k})$ in $\Gamma(z_k)$ satisfying the conditions
$$\sum_{i \in S} \alpha_i^{z_k} \ge W(S), \quad S \subset N,\ S \ne N,$$
$$\sum_{i \in N} \alpha_i^{z_k} = \bar V(z_k, N),$$
where $\bar V(z_k, N)$ is the sum of the players' payoffs in the game $\Gamma(z_k)$ when the players use the cooperative strategy profile in $G(z_k)$. Suppose that $C(z_k) \ne \emptyset$, $\bar C(z_k) \ne \emptyset$ and $\bar D(z_k) \ne \emptyset$. A necessary condition for the last of these is the inequality
$$W(S) \le \min_{z} \bar V(z, N).$$
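Define
$$W(z_k, S) = (l - k + 1)\, W(S), \tag{2}$$
where $l$ is the number of stages in the game $G(z_1)$. Formula (2) defines a new characteristic function in the multistage game $G(z_1)$, based on the analogue $W(S)$ of the characteristic function of the one-stage game $\Gamma(z)$. Consider the set $D(z_k)$ of imputations $\alpha^{z_k} = (\alpha_1^{z_k}, \dots, \alpha_n^{z_k})$ in the game $G(z_k)$ such that
$$\sum_{i \in S} \alpha_i^{z_k} \ge W(z_k, S), \quad S \subset N,\ S \ne N,$$
$$\sum_{i \in N} \alpha_i^{z_k} = V(z_k, N).$$
Suppose that the condition
$$\max_{z} \bar V(z, S) = W(S) < \bar V(z, N) = V(N), \quad z \in G(z_1),\ S \ne N,$$
is satisfied, where $\bar V(z, \cdot)$ denotes the characteristic function of the one-stage game $\Gamma(z)$. The following theorem holds.
Theorem 1. The inclusion
$$D(z_k) \subset C(z_k) \tag{3}$$
is true.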
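Proof. The core $C(z_k)$ is the set of all imputations $\alpha^{z_k} = (\alpha_1^{z_k}, \dots, \alpha_n^{z_k})$ in the cooperative version of the multistage subgame $G(z_k)$ such that
$$\sum_{i \in S} \alpha_i^{z_k} \ge V(z_k, S), \quad S \subset N,\ S \ne N, \qquad \sum_{i \in N} \alpha_i^{z_k} = V(z_k, N).$$
Hence, to prove (3) it is sufficient to prove
$$V(z_k, S) \le W(z_k, S), \quad k = 1, \dots, l,\ S \subset N. \tag{4}$$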
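We prove (4) by induction on the number of stages in the subgame $G(z_k)$. For a one-stage subgame we have $G(z_l)$, which coincides with the one-stage game $\Gamma(z_l)$, since $G(z_l)$ contains the single vertex $z_l$. In this case $V(z_l, S) = \bar V(z_l, S) \le W(S) = W(z_l, S)$, where $\bar V(z, S)$ is the characteristic function of the one-stage game $\Gamma(z)$. Consider the less trivial case of two stages. For the lower value $V(z_{l-1}, S)$ we have the following analogue of the Bellman equation:
$$V(z_{l-1}, S) = \max_{u_i,\, i \in S}\ \min_{u_j,\, j \in N \setminus S} \Big( \sum_{i \in S} K_i^{z_{l-1}}(u_1, \dots, u_n) + V(z_l, S) \Big),$$
where $z_l = T(z_{l-1}; u_1, \dots, u_n)$, or
$$V(z_{l-1}, S) = \max_{u_i,\, i \in S}\ \min_{u_j,\, j \in N \setminus S} \Big( \sum_{i \in S} K_i^{z_{l-1}}(u_1, \dots, u_n) + V\big(T(z_{l-1}; u_1, \dots, u_n), S\big) \Big),$$
but
$$V(z_l, S) = \bar V(z_l, S) \le \max_{z} \bar V(z, S) = W(S),$$
and we get
$$\begin{aligned}
V(z_{l-1}, S) &\le \max_{u_i,\, i \in S}\ \min_{u_j,\, j \in N \setminus S} \Big( \sum_{i \in S} K_i^{z_{l-1}}(u_1, \dots, u_n) + W(S) \Big) \\
&= W(S) + \max_{u_i,\, i \in S}\ \min_{u_j,\, j \in N \setminus S} \sum_{i \in S} K_i^{z_{l-1}}(u_1, \dots, u_n) \\
&= W(S) + \bar V(z_{l-1}, S) \le W(S) + \max_{z} \bar V(z, S) = 2W(S) = W(z_{l-1}, S).
\end{aligned}$$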
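Suppose that (4) holds for all subgames with $l - k$ stages. Then for the subgame $G(z_k)$, which has $l - k + 1$ stages, we can write
$$V(z_k, S) = \max_{u_i,\, i \in S}\ \min_{u_j,\, j \in N \setminus S} \Big( \sum_{i \in S} K_i^{z_k}(u_1, \dots, u_n) + V\big(T(z_k; u_1, \dots, u_n), S\big) \Big).$$
By the induction hypothesis,
$$V\big(T(z_k; u_1, \dots, u_n), S\big) = V(z_{k+1}, S) \le W(z_{k+1}, S) = (l - k)\, W(S).$$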
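This gives us
$$\begin{aligned}
V(z_k, S) &\le \max_{u_i,\, i \in S}\ \min_{u_j,\, j \in N \setminus S} \Big( \sum_{i \in S} K_i^{z_k}(u_1, \dots, u_n) + (l - k)\, W(S) \Big) \\
&= (l - k)\, W(S) + \max_{u_i,\, i \in S}\ \min_{u_j,\, j \in N \setminus S} \sum_{i \in S} K_i^{z_k}(u_1, \dots, u_n) \\
&\le (l - k)\, W(S) + \max_{z} \bar V(z, S) = (l - k)\, W(S) + W(S) \\
&= (l - k + 1)\, W(S) = W(z_k, S).
\end{aligned}$$
The theorem is proved.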
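Inequality (4) can be checked numerically on a small game. The following is a minimal sketch, not the authors' implementation: the two-player stage games `A` and `B`, their payoffs, and the transition rule are all hypothetical, and a vertex is identified with the pair (stage game, number of remaining stages), which is a simplifying assumption. The lower values are taken in pure strategies, as in the Bellman equation used in the proof.

```python
from functools import lru_cache
from itertools import product

PLAYERS = (0, 1)
STRATEGIES = (0, 1)
STAGES = 3  # l, the number of stages (hypothetical)

# Hypothetical stage games: payoff vectors indexed by the profile (u0, u1).
STAGE_GAMES = {
    "A": {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)},
    "B": {(0, 0): (2, 4), (0, 1): (1, 1), (1, 0): (0, 0), (1, 1): (4, 2)},
}


def transition(game, profile):
    """Hypothetical rule T: matching choices lead to game A, others to B."""
    return "A" if profile[0] == profile[1] else "B"


def profiles(coalition):
    """For each joint choice of the coalition, list the full profiles
    obtained from all responses of the complementary coalition."""
    inside = [p for p in PLAYERS if p in coalition]
    outside = [p for p in PLAYERS if p not in coalition]
    for own in product(STRATEGIES, repeat=len(inside)):
        group = []
        for other in product(STRATEGIES, repeat=len(outside)):
            choice = dict(zip(inside, own))
            choice.update(zip(outside, other))
            group.append(tuple(choice[p] for p in PLAYERS))
        yield group


def stage_value(game, coalition):
    """Pure-strategy maxmin value of the coalition in one stage game."""
    table = STAGE_GAMES[game]
    return max(
        min(sum(table[u][p] for p in coalition) for u in group)
        for group in profiles(coalition)
    )


@lru_cache(maxsize=None)
def multistage_value(game, stages_left, coalition):
    """Bellman recursion for V(z_k, S), with stages_left = l - k + 1."""
    if stages_left == 0:
        return 0
    table = STAGE_GAMES[game]
    return max(
        min(
            sum(table[u][p] for p in coalition)
            + multistage_value(transition(game, u), stages_left - 1, coalition)
            for u in group
        )
        for group in profiles(coalition)
    )


def w(coalition):
    """W(S): the largest stage-game value of S over all stage games."""
    return max(stage_value(g, coalition) for g in STAGE_GAMES)


# Check V(z_k, S) <= (l - k + 1) W(S) for every coalition and subgame length.
for S in [(0,), (1,), (0, 1)]:
    for stages_left in range(1, STAGES + 1):
        for g in STAGE_GAMES:
            assert multistage_value(g, stages_left, S) <= stages_left * w(S)
```

The recursion mirrors the proof: replacing the continuation value by its bound $(l-k)W(S)$ pulls a constant out of the max-min, which leaves a single stage-game value bounded by $W(S)$.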
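Definition 1. The finite sequence of vectors $\beta = (\beta_1, \dots, \beta_i, \dots, \beta_n)$ is called an IDP in $G(z_1)$ for an imputation $\alpha = (\alpha_1, \dots, \alpha_n)$ if $\alpha_i = \sum_{k=1}^{l} \beta_{ik}$, where $\beta_i = (\beta_{i1}, \dots, \beta_{ik}, \dots, \beta_{il})$ and $k$ is the corresponding stage in the game $G(z_1)$.
The IDP $\beta = (\beta_1, \dots, \beta_k, \dots, \beta_l)$, written by stages with $\beta_k = (\beta_{1k}, \dots, \beta_{nk})$, is called time-consistent [2-5] for an imputation $\alpha^{z_1} \in D(z_1)$ if for any $1 \le k \le l$
$$\sum_{m=k}^{l} \beta_m = \alpha^{z_k} \in D(z_k).$$
It is clear that if we take a sequence of imputations $\alpha^{z_1}, \alpha^{z_2}, \dots, \alpha^{z_k}, \dots, \alpha^{z_l}$, $\alpha^{z_k} \in D(z_k)$, $k = 1, \dots, l$, and define
$$\beta_k = \alpha^{z_k} - \alpha^{z_{k+1}}, \quad k = 1, \dots, l - 1, \qquad \beta_l = \alpha^{z_l}, \tag{5}$$
then the IDP $\beta = (\beta_1, \dots, \beta_k, \dots, \beta_l)$ is a time-consistent IDP for $\alpha^{z_1} \in D(z_1)$. In some cases it is important that $\beta_k \ge 0$, $k = 1, \dots, l$. In general, however, the non-negativity of the IDP cannot be guaranteed. In our case the following theorem holds.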
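Theorem 2. For any $\alpha^{z_1} \in D(z_1)$ there exists a non-negative time-consistent IDP $\beta$ ($\beta \ge 0$).
Proof. For each $1 \le k \le l$ we have
$$\alpha^{z_1} = \sum_{m=1}^{k} \beta_m + \alpha^{z_{k+1}},$$
which follows from (5) and the time consistency of the IDP $\beta$ for the imputation $\alpha^{z_1}$. Take $\beta_k = \alpha^{z_1}/l$, $k = 1, \dots, l$, and prove that $\beta = (\beta_1, \dots, \beta_k, \dots, \beta_l)$ is a time-consistent IDP for $\alpha^{z_1}$. Define
$$\alpha^{z_k} = \frac{l - k + 1}{l}\, \alpha^{z_1}$$
and prove that $\alpha^{z_k} \in D(z_k)$, or
$$\frac{l - k + 1}{l} \sum_{i \in S} \alpha_i^{z_1} \ge (l - k + 1)\, W(S). \tag{6}$$
Since $\alpha^{z_1} \in D(z_1)$, we have
$$\sum_{i \in S} \alpha_i^{z_1} \ge l\, W(S), \qquad \sum_{i \in N} \alpha_i^{z_1} = l\, W(N). \tag{7}$$
Multiplying both sides of (7) by $\frac{l - k + 1}{l}$, we get (6), which means that $\alpha^{z_k} \in D(z_k)$, $k = 1, \dots, l$. Then
$$\beta_k = \alpha^{z_k} - \alpha^{z_{k+1}} = \frac{\alpha^{z_1}}{l} \ge 0.$$
The theorem is proved.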
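The construction (5) and the uniform choice $\beta_k = \alpha^{z_1}/l$ from the proof can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the imputation $(15, 15, 18)$ with $l = 3$ is borrowed from the example later in the paper, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def idp_from_imputations(alphas):
    """Formula (5): beta_k = alpha^{z_k} - alpha^{z_{k+1}}, beta_l = alpha^{z_l}."""
    betas = []
    for k, current in enumerate(alphas):
        if k + 1 < len(alphas):
            nxt = alphas[k + 1]
            betas.append(tuple(a - b for a, b in zip(current, nxt)))
        else:
            betas.append(tuple(current))
    return betas


def tail_sum(betas, k):
    """Sum of beta_m over m = k, ..., l (k is 1-based)."""
    return tuple(sum(column) for column in zip(*betas[k - 1:]))


def scaled_imputations(alpha, stages):
    """alpha^{z_k} = (l - k + 1) / l * alpha^{z_1}, as in the proof of Theorem 2."""
    return [
        tuple(Fraction(stages - k + 1, stages) * a for a in alpha)
        for k in range(1, stages + 1)
    ]


alpha = (Fraction(15), Fraction(15), Fraction(18))  # imputation from the example
stages = 3                                           # l
alphas = scaled_imputations(alpha, stages)
betas = idp_from_imputations(alphas)

# Time consistency: the payments from stage k onward add up to alpha^{z_k}.
for k in range(1, stages + 1):
    assert tail_sum(betas, k) == alphas[k - 1]

# Every stage payment equals alpha^{z_1} / l and is non-negative.
assert all(b == tuple(a / stages for a in alpha) for b in betas)
assert all(x >= 0 for b in betas for x in b)
```

Because the differences in (5) telescope, the tail sums recover $\alpha^{z_k}$ for any sequence of imputations; the uniform scaling is what additionally makes each $\beta_k$ non-negative.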
Definition 2. The set $D(z_1)$ is called strongly time-consistent if for each imputation $\alpha^{z_1} \in D(z_1)$ there exists an IDP $\beta = (\beta_1, \dots, \beta_k, \dots, \beta_l)$ such that
$$D(z_1) \supset \sum_{j=1}^{k} \beta_j \oplus D(z_{k+1}),$$
where the operation $\oplus$ means $a \oplus B = \{a + b : b \in B\}$.
Theorem 3. Suppose that the value $V(z, N) = V(N)$ of the grand coalition is the same in all stage games $\Gamma(z)$. Under this condition the set $D(z_1)$ is time-consistent and strongly time-consistent.
The theorem also holds in the general case, without the requirement that $V(z, N)$ be the same for all stage games $\Gamma(z)$, but the proof is more difficult.
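Proof. If the condition of Theorem 3 holds, then $W(z_1, N) = l\, W(N)$ in $G(z_1)$. Suppose $\xi^1 \in D(z_1)$; then
$$\sum_{i \in S} \xi_i^1 \ge W(z_1, S) = l\, W(S), \qquad \sum_{i \in N} \xi_i^1 = W(z_1, N) = l\, W(N). \tag{8}$$
From (8) we have
$$\sum_{i \in S} \frac{\xi_i^1}{l} \ge \frac{W(z_1, S)}{l} = W(S), \qquad \sum_{i \in N} \frac{\xi_i^1}{l} = \frac{W(z_1, N)}{l} = W(N).$$
Define $\beta_{ik}$, $k = 1, \dots, l$, as
$$\beta_{ik} = \frac{\xi_i^1}{l};$$
then $\beta_k = (\beta_{1k}, \dots, \beta_{nk})$ satisfies the stage-game conditions with $W(S)$, i.e. it belongs to the set of stage imputations at $z_k$, and $\beta_{ik} \ge 0$. Consider the expression
$$\sum_{j=1}^{k} \beta_j \oplus D(z_{k+1})$$
and let $\xi^{k+1} \in D(z_{k+1})$. Then we can construct an IDP $\beta'$ for the imputation $\xi^{k+1}$,
$$\beta'_{im} = \frac{\xi_i^{k+1}}{l - k}, \quad m = k + 1, \dots, l,$$
and construct the vector
$$\hat\xi = \sum_{j=1}^{k} \beta_j + \sum_{m=k+1}^{l} \beta'_m = \frac{k}{l}\, \xi^1 + \xi^{k+1}.$$
For this vector
$$\sum_{i \in S} \hat\xi_i = \frac{k}{l} \sum_{i \in S} \xi_i^1 + \sum_{i \in S} \xi_i^{k+1} \ge \frac{k}{l}\, l\, W(S) + (l - k)\, W(S) = k\, W(S) + (l - k)\, W(S) = l\, W(S),$$
and we get that $\hat\xi \in D(z_1)$.
Since the imputation $\xi^{k+1} \in D(z_{k+1})$ was arbitrary,
$$\sum_{j=1}^{k} \beta_j \oplus D(z_{k+1}) \subset D(z_1),$$
which proves the strong time-consistency of $D(z_1)$.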
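The estimate above covers the coalitions $S \ne N$; membership in $D(z_1)$ also requires the efficiency condition for $N$. Under the assumption of Theorem 3 it follows from the same computation with equalities. In the display below, $\xi^1 \in D(z_1)$, $\xi^{k+1} \in D(z_{k+1})$, and $\hat\xi = \frac{k}{l}\xi^1 + \xi^{k+1}$ denotes the constructed vector; these symbol names are chosen for this note and are an assumption about the original notation.

```latex
\sum_{i \in N} \hat\xi_i
  = \frac{k}{l} \sum_{i \in N} \xi_i^{1} + \sum_{i \in N} \xi_i^{k+1}
  = \frac{k}{l}\, l\, W(N) + (l - k)\, W(N)
  = l\, W(N) = W(z_1, N).
```

The second equality uses that the total payoff in the subgame starting at $z_{k+1}$, which has $l - k$ stages, equals $(l - k)\, W(N)$ when the grand coalition's value is the same in all stage games.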
Example. Consider a three-person, three-stage game $G(z_1)$ starting from the vertex $z_1$, at which the stage game $\Gamma(z_1)$ is played. For each player $i \in N = \{1, 2, 3\}$ the set of strategies consists of two elements, which for simplicity we denote by $\{1, 2\}$. The payoff functions of the players $i \in N$ in $\Gamma(z_1)$ are defined as shown in Fig. 1.
Figure 1. Game $\Gamma(z_1)$. For either strategy of player 3, the payoffs $(K_1, K_2, K_3)$ are as follows (player 1 chooses the row, player 2 the column):
                 Player 2: 1    Player 2: 2
Player 1: 1      (5,5,5)        (0,10,5)
Player 1: 2      (10,0,6)       (1,1,5)
Note that if the players choose the profile (1,1,1), the payoffs are equal to (5,5,5). If at the first stage players 1 and 2 choose the profile (1,1) or (2,2) and player 3 chooses an arbitrary strategy $k = 1, 2$, the game passes to the stage
$$z_2 = T(z_1; 1, 1, k) = T(z_1; 2, 2, k), \quad k = 1, 2,$$
and at stage $z_2$ the game $\Gamma(z_2) = \Gamma(z_1)$ is played (the game $\Gamma(z_1)$ is repeated). If players 1 and 2 choose the profile (1,2) or (2,1) and player 3 chooses strategy 1 or 2, the game passes to the stage
$$z_2' = T(z_1; 1, 2, k) = T(z_1; 2, 1, k), \quad k = 1, 2,$$
and at stage $z_2'$ the new stage game $\Gamma(z_2')$ is played. The payoff functions of the players $i \in N$ are defined as shown in Fig. 2.
Figure 2. Game $\Gamma(z_2')$
In our example, in the games $\Gamma(z_1)$, $\Gamma(z_2')$ player 3 cannot change the payoffs of players 1 and 2. This is done for simplicity.
If at the stages $z_2$, $z_2'$ players 1 and 2 choose the profile (1,1) or (2,2), the game passes to the stage $z_3$, where the stage game $\Gamma(z_3)$ is played (Fig. 3). Otherwise the stage $z_3'$ is realized, with the game $\Gamma(z_3')$ played at this stage (Fig. 4).
Figure 3. Game $\Gamma(z_3)$
Figure 4. Game $\Gamma(z_3')$
Thus, we have four different stage games that can occur in the game $G(z_1)$. The Table gives the values of the characteristic functions of each stage game.
Table. The values of characteristic functions
V(S)                 V(1)   V(2)   V(3)   V(1,2)   V(1,3)   V(2,3)   V(1,2,3)
Γ(z_2) = Γ(z_1)      1      1      5      10       6        6        16
Γ(z_2')              0      0      5      10       5        6        16
Γ(z_3)               1      0      0      10       11       10       16
Γ(z_3')              0      0      0      10       9        5        16
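The first row of the Table can be reproduced from the payoffs of Fig. 1 by a direct max-min computation in pure strategies. The following is a minimal sketch under stated assumptions: the payoff entries are read from Fig. 1 with player 1 choosing the row and player 2 the column (consistent with the profile (1,1,1) giving (5,5,5)), player 3's choice does not affect payoffs, and the function names are hypothetical.

```python
from itertools import product

# Payoffs (K1, K2, K3) of the stage game in Fig. 1, indexed by the strategies
# (u1, u2) of players 1 and 2; player 3's choice does not change them.
PAYOFF_12 = {
    (1, 1): (5, 5, 5),
    (1, 2): (0, 10, 5),
    (2, 1): (10, 0, 6),
    (2, 2): (1, 1, 5),
}
PLAYERS = (1, 2, 3)
STRATEGIES = (1, 2)


def payoff(profile):
    """Payoff vector for a full profile (u1, u2, u3)."""
    return PAYOFF_12[(profile[0], profile[1])]


def lower_value(coalition):
    """Pure-strategy maxmin value of the coalition against its complement."""
    inside = [p for p in PLAYERS if p in coalition]
    outside = [p for p in PLAYERS if p not in coalition]
    best = None
    for own in product(STRATEGIES, repeat=len(inside)):
        worst = None
        for other in product(STRATEGIES, repeat=len(outside)):
            choice = dict(zip(inside, own))
            choice.update(zip(outside, other))
            profile = tuple(choice[p] for p in PLAYERS)
            total = sum(payoff(profile)[p - 1] for p in inside)
            worst = total if worst is None else min(worst, total)
        best = worst if best is None else max(best, worst)
    return best


COALITIONS = [(1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
stage_values = {S: lower_value(S) for S in COALITIONS}
print(stage_values)
```

For the grand coalition the complement is empty, so the inner minimum runs over a single (empty) response and the value reduces to the maximal total payoff.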
By the definition of $W$ we have in this case
$$W(\{1\}) = 1, \quad W(\{2\}) = 1, \quad W(\{3\}) = 5,$$
$$W(\{1,2\}) = 10, \quad W(\{1,3\}) = 11, \quad W(\{2,3\}) = 10, \quad W(\{1,2,3\}) = 16.$$
All possible trajectories in $G(z_1)$ are cooperative, since $V(N) = 16$ in each stage game $\Gamma(z)$. Thus, we have
$$W(z_1, S) = 3W(S), \quad W(z_2, S) = 2W(S), \quad W(z_3, S) = W(S).$$
For the whole game $G(z_1)$ this gives
$$W(z_1, \{1\}) = 3 \cdot 1 = 3, \quad W(z_1, \{2\}) = 3 \cdot 1 = 3, \quad W(z_1, \{3\}) = 3 \cdot 5 = 15,$$
$$W(z_1, \{1,2\}) = 3 \cdot 10 = 30, \quad W(z_1, \{1,3\}) = 3 \cdot 11 = 33, \quad W(z_1, \{2,3\}) = 3 \cdot 10 = 30,$$
$$W(z_1, \{1,2,3\}) = 3 \cdot 16 = 48.$$
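The analogue of the core $D(z_1)$ in this case coincides with the set of solutions of the following system:
$$\alpha_1 \ge 3, \quad \alpha_2 \ge 3, \quad \alpha_3 \ge 15, \quad \alpha_1 + \alpha_2 \ge 30, \quad \alpha_1 + \alpha_3 \ge 33, \quad \alpha_2 + \alpha_3 \ge 30, \quad \alpha_1 + \alpha_2 + \alpha_3 = 48.$$
It is clear that the set $D(z_1)$ is not empty (one can take $\alpha_1 = 15$, $\alpha_2 = 15$, $\alpha_3 = 18$ or $\alpha_1 = 16$, $\alpha_2 = 15$, $\alpha_3 = 17$). For each $\alpha \in D(z_1)$ we can define the IDP $\beta_k = \alpha/3$, $k = 1, 2, 3$; then it is easily seen that $D(z_1)$ is strongly time-consistent.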
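The numbers of the example can be checked mechanically. The sketch below, with hypothetical names, takes the four rows of the Table, forms $W(S)$ as the column-wise maximum, and tests the two imputations named above for membership in $D(z_1)$; it also checks that the scaled imputations $\frac{l-k+1}{l}\alpha$ induced by the IDP $\beta_k = \alpha/3$ lie in $D(z_k)$ for $k = 1, 2, 3$. The row keys are placeholders for the four stage games.

```python
from fractions import Fraction

COALITIONS = [(1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
GRAND = (1, 2, 3)
STAGES = 3  # l

# Rows of the Table: stage-game characteristic function values per coalition.
TABLE = {
    "game_1": (1, 1, 5, 10, 6, 6, 16),
    "game_2": (0, 0, 5, 10, 5, 6, 16),
    "game_3": (1, 0, 0, 10, 11, 10, 16),
    "game_4": (0, 0, 0, 10, 9, 5, 16),
}

# W(S) is the maximum of the stage-game values of S over all stage games.
W = {S: max(row[i] for row in TABLE.values()) for i, S in enumerate(COALITIONS)}


def in_D(alpha, k):
    """Check alpha in D(z_k): bounds (l-k+1)W(S) for S != N and efficiency."""
    factor = STAGES - k + 1
    if sum(alpha) != factor * W[GRAND]:
        return False
    return all(
        sum(alpha[p - 1] for p in S) >= factor * W[S]
        for S in COALITIONS
        if S != GRAND
    )


for alpha in [(15, 15, 18), (16, 15, 17)]:
    assert in_D(alpha, 1)
    for k in range(1, STAGES + 1):
        # Imputation left to distribute from stage k on under beta_k = alpha / 3.
        scaled = tuple(Fraction(STAGES - k + 1, STAGES) * a for a in alpha)
        assert in_D(scaled, k)
```

Exact fractions are used so that the efficiency condition can be tested with equality rather than with a tolerance.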
Conclusion. In this paper we sought conditions under which a subset of imputations from the core is time-consistent and strongly time-consistent in a multistage game. To this end we introduced a new characteristic function that dominates the values of the characteristic function in the sense of the papers [6-9]. With the help of this newly defined characteristic function we construct an analogue of the core and, if it is not empty, prove its time-consistency and strong time-consistency. The condition is rather strong, but it holds in multistage games whose stage games do not differ much from each other and have the same maximal joint payoff.
References
1. Von Neumann J., Morgenstern O. Theory of games and economic behavior. Princeton, Princeton University Press, 1953, 666 p.
2. Basar T. Time consistency and robustness of equilibria in noncooperative dynamic games. Dynamic Policy Games in Economics. Eds by F. Van der Ploeg, A. de Zeeuw. Amsterdam, North-Holland, Elsevier Science Publ., 1989, pp. 9-54.
3. Beard R., McDonald S. Time consistent fair water sharing agreements. Ann. Intern. Soc. Dyn. Games. Boston, Birkhäuser Publ., 2007, vol. 9, pp. 393-410.
4. Marin-Solano J. Time-consistent equilibria in a differential game model with time inconsistent preferences and partial cooperation. Dynamic Games in Economics. Berlin, Springer Publ., 2014, pp. 219-238.
5. Yeung D. W. K., Petrosyan L. A. Cooperative stochastic differential games. New York, Springer-Verlag Publ., 2006, 253 p.
6. Petrosyan L. A. Strongly time-consistent differential optimality principles. Vestnik of Saint Petersburg University. Series Mathematics. Mechanics. Astronomy, 1993, no. 4, pp. 35-40.
7. Petrosyan L. A., Gromova E. V. On an approach to constructing a characteristic function in cooperative differential games. Automation and Remote Control, 2017, vol. 78, no. 9, pp. 1680-1692.
8. Petrosyan L., Zaccour G. Time-consistent Shapley value allocation of pollution cost reduction. Journal of Economic Dynamics and Control, 2003, vol. 27, no. 3, pp. 381-398.
9. Petrosyan L. A. Time consistency of the optimality principles in non-zero sum differential games. Lecture Notes in Control and Information Sciences, 1991, vol. 157, pp. 299-311.
Received: August 28, 2018. Accepted: September 25, 2018.
Author's information:
Yaroslavna B. Pankratova, PhD in Physics and Mathematics
Leon A. Petrosyan, Dr. Sci. in Physics and Mathematics, Professor
New characteristic function for multistage dynamic games
Ya. B. Pankratova, L. A. Petrosyan
St. Petersburg State University, Russian Federation, 199034, St. Petersburg, Universitetskaya nab., 7-9
For citation: Pankratova Y. B., Petrosyan L. A. New characteristic function for multistage dynamic games // Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes. 2018. Vol. 14. Iss. 4. P. 316-324. https://doi.org/10.21638/11702/spbu10.2018.404
The paper considers finite-stage dynamic n-person games with transferable payoffs. For such games a cooperative version of the game is developed, and a new approach to constructing the characteristic function, based on the characteristic functions defined in the simultaneous (stage) games, is proposed. It is shown that the values of the new characteristic function for each coalition exceed the values of the characteristic function constructed by the maxmin approach. This makes it possible to use the new characteristic function to construct a subcore of the multistage game under consideration. Conditions are obtained which guarantee that the new approach leads to a time-consistent solution (see the works of L. Petrosyan and G. Zaccour, 2003, and L. Petrosyan, 1993) and, in some cases, to a strongly time-consistent solution that coincides with the subcore (see the paper of L. Petrosyan, 1993). A test example of determining the new characteristic function is given, and the strong time-consistency of the subcore constructed with its help is shown.
Keywords: multistage game, characteristic function, time-consistency, strong time-consistency.
Contact information:
Pankratova Yaroslavna Borisovna, Cand. Sci. in Physics and Mathematics, senior lecturer
Petrosyan Leon Aganesovich, Dr. Sci. in Physics and Mathematics, professor