STRATEGIC SUPPORT OF THE SHAPLEY VALUE IN STOCHASTIC GAMES

Parilina Elena M.

Contributions to Game Theory and Management, IX, 246—265


Saint Petersburg State University, 7/9 Universitetskaya nab., St. Petersburg, 199034, Russia. E-mail: [email protected] WWW home page: http://www.apmath.spbu.ru/en/staff/parilina/index.html

Abstract. We consider cooperative behavior in stochastic games. We assume that players cooperate and agree on realizing the Shapley value as an imputation of their total payoff. We examine the problem of subgame (time) consistency of the Shapley value and construct an imputation distribution procedure that makes the Shapley value subgame consistent. Applying this procedure, we redefine the payoffs in the stochastic game. We then examine the problem of strategic support of the Shapley value and prove that, under certain conditions, the cooperative strategy profile is a Nash equilibrium in the stochastic game with redefined payoff functions. The theoretical results are illustrated by a data transmission game for a wireless network of a specific topology.

Keywords: cooperative stochastic game, time consistency, subgame consistency, imputation distribution procedure, strategic support

1. Introduction

We consider the class of stochastic games with discounted payoffs in which players use stationary strategies. This class of games for two players was introduced by (Shapley, 1953a). Most papers devoted to stochastic games examine the non-cooperative behavior of the players, e.g. see (Herings and Peeters, 2004), (Jaskiewicz and Nowak, 2015), (Rosenberg et al., 2003). The cooperative model of a stochastic game was initially proposed by (Petrosjan, 2006), who investigated the problem of subgame consistency of cooperative solutions in a stochastic game played over a finite tree. The same problem was examined in (Petrosjan and Baranova, 2006) for discounted stochastic games in which the set of states is finite and players use stationary strategies. That paper proposes a method of finding a cooperative solution and of verifying whether the solution satisfies the principle of subgame (time) consistency. The problem of time consistency of a cooperative solution was posed in (Petrosyan, 1977). Petrosyan proposed to modify the payment mechanism along the cooperative trajectory of the initial game and introduced the imputation distribution procedure (IDP) to make the cooperative solution time-consistent (Petrosyan and Danilov, 1979). This idea was realized for the class of differential games, but it is relevant for stochastic games as well.

Two other principles of stable cooperation in dynamic games were formulated in (Petrosyan and Zenkevich, 2009), including the principle of strategic support of a cooperative solution. If the cooperative solution is strategically supported, then there exists a Nash equilibrium in trigger strategies with the players' payoffs equal to their payoffs in cooperation. The trigger strategy punishes the deviating player by allowing him to obtain only the maxmin value in the subgame starting from the stage following the stage at which the deviation has been observed (Petrosyan, 2008). The problem of strategic support is considered in (Parilina and Zaccour, 2015b), where a subgame perfect ε-equilibrium is constructed for dynamic games played over event trees¹. Another principle of stable cooperation is irrational-behavior proofness (Yeung, 2006). It guarantees that the players' payoffs in cooperation are not less than their payoffs when the cooperation breaks down at some stage and the players continue the game as singletons. These three principles of stable cooperation were adapted to discounted stochastic games in (Parilina, 2015). Recently, the existence of cooperative solutions (Harsanyi, Shapley, Nash solutions) for discounted stochastic games was proved in (Kohlberg and Neyman, 2015).

* This work was supported by Saint Petersburg State University (research project 9.38.245.2014) and Russian Foundation for Basic Research (project 16-01-00713).

In our paper we focus on two problems of cooperation in stochastic games: subgame consistency and strategic support. We prove a theorem which allows us to construct the imputation distribution procedure that makes the Shapley value subgame consistent. Then we define a behavior strategy profile in trigger strategies which supports cooperation when some player deviates from the cooperative strategy profile. Note that we initially find the cooperative solution assuming that players use stationary strategies, but the construction of the trigger strategies requires the class of behavior strategies. Behavior strategies make it possible to observe a player's deviation and to switch to the punishment mode of the trigger strategy.

As an example of a stochastic game we examine the problem of data transmission in a simple wireless network. The network topology consists of three nodes. Two of the nodes generate data packages in each time slot with the corresponding probabilities. The third node is the destination. The first two nodes are connected by a one-way channel, i.e. the first node (player 1) can transmit a package either directly to node 3 or to node 2. For the transmission of a package to node 2, node 1 receives a nonnegative reward. The system of rewards and costs makes it possible to support cooperation between nodes 1 and 2, which are players 1 and 2 in the game, respectively. The described situation can be solved as a cooperative stochastic game.

Modeling data transmission as a stochastic game was introduced in (Altman et al., 2003, Parilina, 2010, Sagduyu and Ephremides, 2006). The game theory models of the behaviour in ad hoc wireless networks with emphasis on the development of cooperation mechanisms to stimulate package forwarding are considered in (Michiardi and Molva, 2003). Game theoretical models are useful for modeling the data transmission not only in ad hoc but also in CSMA networks (Benslama et al., 2013). The problem of constructing and analyzing the simple mechanism to stimulate the nodes for package forwarding is investigated in (Buttyan and Hubaux, 2003).

The rest of the paper is organized as follows: Section 2 describes the model of a non-cooperative stochastic game, and Section 3 deals with the construction of the cooperative version of the stochastic game. Section 4 contains the description of the subgame consistency problem and the method to make the cooperative solution subgame-consistent. The idea of strategic support of a cooperative solution is investigated in Section 5. We provide an illustrative example of data transmission in a wireless network in Section 6, and briefly conclude in Section 7.

1 The details of the specification of a game played over event trees may be found in (Haurie et al., 2012).

2. Non-cooperative stochastic game

We consider a stochastic game with a finite number of states in which a player's payoff is the discounted sum of the stage payoffs obtained along the realized trajectory (the sequence of realized states and action profiles). The game begins with a chance move, i.e. with the choice of the initial state. Each state of the stochastic game is a normal form game of n players, and exactly one of the finitely many states is realized at each stage. The players choose an action profile in the realized state, and the state of the next stage is drawn according to transition probabilities which depend on the current state and on this action profile.

Definition 1. Let stochastic game $G$ be determined by the set

\[
G = \Big\langle N, \{\Gamma^j\}_{j=1}^{t}, \delta, \pi^0, \{p(j,k;x^j)\}_{j=1,\dots,t;\; k=1,\dots,t;\; x^j \in \prod_{i \in N} X_i^j} \Big\rangle, \tag{1}
\]

where

- $N = \{1,\dots,n\}$ is the set of players.

- $\Gamma^j = \langle N, X_1^j,\dots,X_n^j, K_1^j,\dots,K_n^j \rangle$ is a non-cooperative normal form game which defines state $j$, $j = 1,\dots,t$; $X_i^j$ is the finite set of pure actions of player $i$ in $\Gamma^j$, and $K_i^j(x_1^j,\dots,x_n^j) = K_i^j(x^j)$ is the payoff function of player $i$ in state $\Gamma^j$, $j = 1,\dots,t$.

- $p(j,k;x^j)$ is the probability that state $\Gamma^k$ is realized if at the previous stage (in state $\Gamma^j$) action profile $x^j = (x_1^j,\dots,x_n^j)$ has been realized; $p(j,k;x^j) \ge 0$ and $\sum_{k=1}^{t} p(j,k;x^j) = 1$ for each $x^j \in X^j = \prod_{i \in N} X_i^j$ and for any $j, k = 1,\dots,t$.

- $\delta \in (0,1)$ is the discount factor.

- $\pi^0 = (\pi_1^0,\dots,\pi_t^0)$ is the vector of the initial distribution on states $\Gamma^1,\dots,\Gamma^t$, where $\pi_j^0$ is the probability that state $\Gamma^j$ is realized at the first stage of the game, $\sum_{j=1}^{t} \pi_j^0 = 1$.

For constructing the cooperative model of a stochastic game we need to define its subgame and the class of strategies which players use in the game.

Definition 2. Stochastic game (1) with vector $\pi^0 = (0,\dots,0,1,0,\dots,0)$ (the $j$-th component is equal to 1), i.e. the game beginning from state $\Gamma^j$, is called a stochastic subgame $G^j$, $j = 1,\dots,t$.

We assume that players realise stationary strategies in the game. Let $\Xi_i = \{\eta_i\}$ be the set of stationary strategies of player $i \in N$ in game $G$. Using a stationary strategy, a player chooses an action in each state from $\{\Gamma^1,\dots,\Gamma^t\}$ depending only on which state is realized at this stage, i.e. $\eta_i : \Gamma^j \mapsto x_i^j \in X_i^j$, $j = 1,\dots,t$. Since the set of states is finite and players use stationary strategies, game $G$ has a finite number of subgames even though its horizon is infinite. The number of subgames equals the number of states.

Remark 1. Obviously, a stationary strategy of player $i$ in game $G$ is also a stationary strategy of the player in subgames $G^1,\dots,G^t$.

The payoff to a player in game $G$ is a random variable. We take the mathematical expectation of the player's payoff as his payoff in game $G$. Let $\bar E_i(\eta)$ be the expected payoff of player $i$ in game $G$ and $E_i^j(\eta)$ be the expected payoff of player $i$ in subgame $G^j$ when strategy profile $\eta$ is realised in game $G$ (subgame $G^j$). Let $E_i(\eta)$ be the vector $(E_i^1(\eta),\dots,E_i^t(\eta))'$.

The expected payoff to player $i$ in subgame $G^j$ satisfies the following recurrent equation:

\[
E_i^j(\eta) = K_i^j(x^j) + \delta \sum_{k=1}^{t} p(j,k;x^j)\, E_i^k(\eta) \tag{2}
\]

subject to $\eta(\Gamma^j) = x^j$, i.e. $\eta(\cdot) = (\eta_1(\cdot),\dots,\eta_n(\cdot))$, where $\eta_i(\Gamma^j) = x_i^j \in X_i^j$, $x^j = (x_1^j,\dots,x_n^j)$ for any $j = 1,\dots,t$, $i \in N$. Hereinafter, let $\eta(\cdot) = (\eta_1(\cdot),\dots,\eta_n(\cdot))$ be the stationary strategy profile such that $\eta_i(\Gamma^j) = x_i^j \in X_i^j$, where $j = 1,\dots,t$, $i \in N$.

The transition matrix of stochastic game $G$ when stationary strategy profile $\eta(\cdot)$ is realised is:

\[
\Pi(\eta) = \begin{pmatrix}
p(1,1;x^1) & \dots & p(1,t;x^1) \\
p(2,1;x^2) & \dots & p(2,t;x^2) \\
\vdots & \ddots & \vdots \\
p(t,1;x^t) & \dots & p(t,t;x^t)
\end{pmatrix}. \tag{3}
\]

We can rewrite equation (2) in matrix form using matrix (3) in the following way:

\[
E_i(\eta) = K_i(x) + \delta \Pi(\eta) E_i(\eta), \tag{4}
\]

where $K_i(x) = (K_i^1(x^1),\dots,K_i^t(x^t))'$, and $K_i^j(x^j)$ is the payoff to player $i$ in state $\Gamma^j$ when action profile $x^j \in X^j$ is realized in this state. Equation (4) is equivalent to the equation²:

\[
E_i(\eta) = (I_t - \delta \Pi(\eta))^{-1} K_i(x), \tag{5}
\]

where $I_t$ is the identity $t \times t$ matrix.

The expected payoff to player $i$ in game $G$ is calculated by the formula:

\[
\bar E_i(\eta) = \pi^0 E_i(\eta). \tag{6}
\]
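As a numerical illustration of equations (5) and (6), the following Python sketch solves the linear system for one player's expected payoffs. The two-state transition matrix, the stage payoffs and the discount factor are made-up values rather than data from the paper, and the function name `expected_payoffs` is hypothetical.

```python
import numpy as np

def expected_payoffs(Pi, K, delta):
    """Solve E = K + delta * Pi @ E for E, i.e. equation (5), for one player."""
    t = Pi.shape[0]
    return np.linalg.solve(np.eye(t) - delta * Pi, K)

# Hypothetical two-state example (illustrative numbers only).
Pi = np.array([[0.5, 0.5],
               [0.2, 0.8]])   # row-stochastic transition matrix Pi(eta)
K = np.array([1.0, 0.0])      # stage payoffs K_i^j(x^j) of one player
delta = 0.9                   # discount factor
pi0 = np.array([1.0, 0.0])    # initial distribution pi^0

E = expected_payoffs(Pi, K, delta)  # payoffs in subgames G^1, G^2
E_bar = pi0 @ E                     # payoff in the whole game, equation (6)
```

Solving the linear system directly avoids forming the inverse matrix explicitly, which is usually the numerically safer choice.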

3. Cooperative stochastic game

Suppose that the players of the grand coalition $N$ decide to cooperate in order to receive the maximal total payoff. Denote the strategy profile maximizing the sum of the players' expected payoffs in game $G$ as $\bar\eta(\cdot) = (\bar\eta_1(\cdot),\dots,\bar\eta_n(\cdot))$:

\[
\max_{\eta \in \prod_{i \in N} \Xi_i} \sum_{i \in N} \bar E_i(\eta) = \sum_{i \in N} \bar E_i(\bar\eta). \tag{7}
\]

Call the strategy profile $\bar\eta(\cdot)$ the cooperative strategy profile.

2 Matrix $(I_t - \delta\Pi(\eta))^{-1}$ always exists for $\delta \in (0,1)$. The proof follows. It is known that all eigenvalues of the stochastic matrix $\Pi(\eta)$ have absolute value at most 1. For the existence of matrix $(I_t - \delta\Pi(\eta))^{-1}$ it is necessary and sufficient that the determinant of matrix $(\Pi(\eta) - \frac{1}{\delta} I_t)$ be not equal to zero. Thus matrix $\Pi(\eta)$ must not have an eigenvalue equal to $\frac{1}{\delta}$. The last condition holds because $\frac{1}{\delta} > 1$, so this number cannot be an eigenvalue of the stochastic matrix $\Pi(\eta)$.

The cooperative model of a non-cooperative game $G$ is given by the pair $\langle N, v \rangle$, where $N$ is the set of players and $v$ is a real-valued function, called the characteristic function of the game, defined on the set $2^N$ (the set of all subsets of $N$) and satisfying the property $v(\emptyset) = 0$. The value $v(S)$ is a real number which is assigned to coalition $S \subset N$ and may be interpreted as the worth or power of coalition $S$. The members of coalition $S$ play together as a unit.

Define the characteristic function $\bar v(S)$ in stochastic game $G$ using the characteristic functions $v^j(S)$ of stochastic subgames $G^j$, $j = 1,\dots,t$, as follows:

\[
\bar v(S) = \pi^0 v(S) \tag{8}
\]

for any coalition $S \subset N$, where $v(S) = (v^1(S),\dots,v^t(S))'$ and $v^j(S)$ is the value of the characteristic function for subgame $G^j$ calculated for coalition $S$. Now the problem is to define the characteristic function $v^j(S)$ for any coalition $S$. We use the α-approach to define the characteristic function. According to this approach, the value of the characteristic function for coalition $S$ is equal to the maximal total payoff which coalition $S$ can guarantee when the left-out players cooperate and minimize the total payoff of coalition $S$.

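First, consider coalition $S = N$. The Bellman equation for $v(N)$ is:

\[
v(N) = \max_{\eta \in \prod_{i \in N} \Xi_i} \Big[ \sum_{i \in N} K_i(x) + \delta \Pi(\eta) v(N) \Big] = \sum_{i \in N} K_i(\bar x) + \delta \Pi(\bar\eta) v(N),
\]

where $\bar\eta(\cdot)$ is the cooperative strategy profile and $\bar x$ is the corresponding collection of action profiles. Therefore, the value $v(N)$ is:

\[
v(N) = (I_t - \delta \Pi(\bar\eta))^{-1} \sum_{i \in N} K_i(\bar x). \tag{9}
\]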
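The Bellman equation for $v(N)$ can be solved numerically. The sketch below uses value iteration, a standard method for discounted problems which the paper itself does not prescribe. The dictionary-based game encoding, the function name `grand_coalition_value` and the two-state data are assumptions made for illustration only.

```python
import numpy as np

def grand_coalition_value(total_payoff, trans, delta, tol=1e-10):
    """Approximate v(N) by value iteration on the Bellman equation.

    total_payoff[j][x] -- sum over players of the stage payoffs in state j
                          under joint action x
    trans[j][x]        -- next-state probabilities p(j, .; x)
    """
    t = len(total_payoff)
    v = np.zeros(t)
    while True:
        new_v = np.array([
            max(total_payoff[j][x] + delta * np.dot(trans[j][x], v)
                for x in total_payoff[j])
            for j in range(t)
        ])
        converged = np.max(np.abs(new_v - v)) < tol
        v = new_v
        if converged:
            break
    # Joint action attaining the maximum in every state (cooperative profile).
    policy = [
        max(total_payoff[j],
            key=lambda x: total_payoff[j][x] + delta * np.dot(trans[j][x], v))
        for j in range(t)
    ]
    return v, policy

# Hypothetical two-state game (illustrative numbers only).
total_payoff = [{"stay": 1.0, "move": 0.0}, {"only": 3.0}]
trans = [{"stay": [1.0, 0.0], "move": [0.0, 1.0]}, {"only": [0.0, 1.0]}]
v_N, coop_policy = grand_coalition_value(total_payoff, trans, delta=0.5)
```

In this toy game moving to the second state is optimal, so the result can be checked by hand against equation (9): $v^2(N) = 3/(1 - 0.5) = 6$ and $v^1(N) = 0 + 0.5 \cdot 6 = 3$.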

Second, consider coalition $S \subset N$, $S \neq \emptyset$. To define the value of the characteristic function $v^j(S)$, $j = 1,\dots,t$, for each subgame $G^j$, we consider a zero-sum stochastic game $G_S^j$ with two players (coalitions $S$ and $N \setminus S$), where coalition $S \subset N$ plays as a maximizing player and coalition $N \setminus S$ plays as a minimizing player. Define the value $v^j(S)$ for subgame $G^j$ as the maxmin of the payoff of coalition $S$ in stochastic game $G_S^j$ (in fact, the lower value of a matrix game):

\[
v^j(S) = \max_{\eta_S} \min_{\eta_{N \setminus S}} \sum_{i \in S} E_i^j(\eta_S, \eta_{N \setminus S}), \tag{10}
\]

where $(\eta_S(\cdot), \eta_{N \setminus S}(\cdot))$ is a stationary strategy profile such that $\eta_S(\cdot) = (\eta_{i_1}(\cdot),\dots,\eta_{i_k}(\cdot))$ is a vector of stationary strategies of players $i_1,\dots,i_k \in S$, $\{i_1\} \cup \dots \cup \{i_k\} = S$, $\eta_S(\cdot) \in \prod_{j=1}^{k} \Xi_{i_j}$, the set of stationary strategies of coalition $S \subset N$, and $\eta_{N \setminus S}(\cdot)$ is a vector of stationary strategies of players $i_{k+1},\dots,i_n \in N \setminus S$, $\{i_{k+1}\} \cup \dots \cup \{i_n\} = N \setminus S$, $\eta_{N \setminus S}(\cdot) \in \prod_{j=k+1}^{n} \Xi_{i_j}$, the set of stationary strategies of coalition $N \setminus S$.
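For small games, the maxmin in (10) can be evaluated by brute force. The Python sketch below enumerates the pure stationary strategies of the coalition and of its complement and applies equation (5) to each pair. The encoding of the game, the function name `maxmin_value` and the one-state example are hypothetical, and the enumeration covers pure stationary strategies only.

```python
import itertools
import numpy as np

def maxmin_value(actions_S, actions_C, payoff_S, trans, delta):
    """Vector (v^1(S), ..., v^t(S)) of maxmin values over pure stationary strategies.

    actions_S[j], actions_C[j] -- actions of coalition S / its complement in state j
    payoff_S[j][(a, b)]        -- summed stage payoff of the members of S
    trans[j][(a, b)]           -- next-state probabilities
    """
    t = len(actions_S)
    best = np.full(t, -np.inf)
    for eta_S in itertools.product(*actions_S):        # one action per state
        worst = np.full(t, np.inf)
        for eta_C in itertools.product(*actions_C):
            Pi = np.array([trans[j][(eta_S[j], eta_C[j])] for j in range(t)])
            K = np.array([payoff_S[j][(eta_S[j], eta_C[j])] for j in range(t)])
            E = np.linalg.solve(np.eye(t) - delta * Pi, K)  # equation (5)
            worst = np.minimum(worst, E)   # complement minimizes, per subgame
        best = np.maximum(best, worst)     # coalition maximizes, per subgame
    return best

# Hypothetical one-state game: a repeated 2x2 matrix game.
actions_S = [["top", "bottom"]]
actions_C = [["left", "right"]]
payoff_S = [{("top", "left"): 2, ("top", "right"): 0,
             ("bottom", "left"): 1, ("bottom", "right"): 1}]
trans = [{profile: [1.0] for profile in payoff_S[0]}]
v_S = maxmin_value(actions_S, actions_C, payoff_S, trans, delta=0.5)
```

The number of strategy pairs grows exponentially with the number of states, so this direct enumeration is only practical for examples of the size considered in Section 6.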

Third, consider $S = \emptyset$. Let the value of the characteristic function be:

\[
v^j(\emptyset) = 0. \tag{11}
\]

Remark 2. Characteristic functions $\bar v(S)$ determined by (8) and $v^j(S)$ determined by (9)-(11) are superadditive.

Definition 3. Cooperative stochastic subgame $G_{co}^j$ is a pair $\langle N, v^j \rangle$, where $N$ is the set of players and $v^j : 2^N \to \mathbb{R}$ is the characteristic function calculated by (9)-(11).

Definition 4. Cooperative stochastic game $G_{co}$ is a pair $\langle N, \bar v \rangle$, where $N$ is the set of players and $\bar v : 2^N \to \mathbb{R}$ is the characteristic function calculated by (8).

Definition 5. Vector $\alpha^j = (\alpha_1^j,\dots,\alpha_n^j)$ satisfying the two following conditions:

1. $\sum_{i \in N} \alpha_i^j = v^j(N)$,

2. $\alpha_i^j \ge v^j(\{i\})$ for any $i \in N$,

is called an imputation in subgame $G_{co}^j$ ($j = 1,\dots,t$). Denote the imputation set in cooperative subgame $G_{co}^j$ as $A^j$.

Definition 6. The vector $\bar\alpha = (\bar\alpha_1,\dots,\bar\alpha_n)$, where $\bar\alpha_i = \pi^0 \alpha_i$, $\alpha_i = (\alpha_i^1,\dots,\alpha_i^t)$, and $(\alpha_1^j,\dots,\alpha_n^j) = \alpha^j \in A^j$, is called an imputation in game $G_{co}$. Denote the imputation set in cooperative stochastic game $G_{co}$ as $\bar A$.

Suppose that the imputation set in any subgame $G_{co}^j$, $j = 1,\dots,t$, is nonempty. Then the imputation set in cooperative stochastic game $G_{co}$ is also nonempty.

4. Subgame consistency of the Shapley value

Suppose that players decide to cooperate in the stochastic game and for every subgame $G_{co}^j$ they agree to choose an imputation $\alpha^j = (\alpha_1^j,\dots,\alpha_n^j) \in A^j$. The problem is to realize payments to the players at each stage of the stochastic game so as to guarantee the expected payoff $\alpha_i^j$ for player $i$ in stochastic subgame $G^j$. If players receive stage payoffs according to their payoff functions, their expected payoffs generally do not coincide with the components of the chosen imputation. To solve this problem we need a method of redistributing the players' total payoff in every state which may be realized during the game. Initially, such a method was proposed by (Petrosyan and Danilov, 1979) for differential games.

There are two principles of constructing the payment scheme in a dynamic game which can be applied to the theory of stochastic games:

1. The sum of the payments to the players in every state is equal to the sum of the players' payoffs in the action profile realized in this state according to the cooperative strategy profile $\bar\eta(\cdot)$.

2. The expected sum of the payments to player $i$ in the game $G$ is equal to the $i$th component $\bar\alpha_i$ of the imputation $\bar\alpha$.

Taking into account that in stochastic game (1) with stationary strategies the number of subgames is equal to the number of possible states, we need to define the vector $\beta_i = (\beta_i^1,\dots,\beta_i^t)$ for every $i \in N$, where $\beta_i^j$ is a payment to player $i$ in state $\Gamma^j$, $j = 1,\dots,t$. If these payments satisfy the two mentioned principles, they are called an imputation distribution procedure (IDP) (see (Petrosyan and Danilov, 1979)). We are interested in constructing a subgame-consistent (time-consistent) IDP.

Definition 7. We call the IDP subgame-consistent in stochastic game $G$ if for any subgame of game $G$ the vector of the expected discounted sums of the payments to players $1,\dots,n$ belongs to the same cooperative solution³.

3 Let the cooperative solution be a singleton like the Shapley value. Then in any subgame we consider the Shapley value as a cooperative solution. The case where the cooperative solution is a set (e.g., the core) is considered in detail in (Parilina and Zaccour, 2015a) for games played over event trees.

In this paper we examine the Shapley value as a cooperative solution. Therefore, the subgame-consistent IDP guarantees that any player obtains the corresponding component of the Shapley value in any subgame.
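For reference, the Shapley value of a cooperative game can be computed directly from its characteristic function by averaging marginal contributions over all orderings of the players. The sketch below is a generic textbook computation, not code from the paper; the function name `shapley_value` and the two-player characteristic function are hypothetical.

```python
from itertools import permutations

def shapley_value(players, v):
    """Shapley value for a characteristic function v: frozenset -> worth."""
    phi = {i: 0.0 for i in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for i in order:
            # Marginal contribution of player i when joining the coalition.
            phi[i] += v[coalition | {i}] - v[coalition]
            coalition = coalition | {i}
    return {i: phi[i] / len(orders) for i in players}

# Hypothetical characteristic function of one subgame with two players.
v = {frozenset(): 0.0, frozenset({1}): 1.0,
     frozenset({2}): 2.0, frozenset({1, 2}): 5.0}
phi = shapley_value([1, 2], v)
```

Applied to each subgame characteristic function $v^j$ in turn, such a routine would give the vectors $\alpha^j$ that enter the imputation distribution procedure.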

Theorem 1. Let the components of the IDP be calculated by the equation:

\[
\beta_i = (I_t - \delta \Pi(\bar\eta)) \alpha_i, \tag{12}
\]

where $\alpha_i = (\alpha_i^1,\dots,\alpha_i^t)$, and $(\alpha_1^j,\dots,\alpha_n^j) = \alpha^j$ is the Shapley value in the cooperative game $G_{co}^j$ with characteristic function $v^j(S)$. Then the IDP is subgame-consistent.


Proof. First, we prove that $\beta_i$, $i \in N$, calculated by equation (12) is an IDP. Taking into account equation (9) we obtain

\[
\sum_{i \in N} \beta_i = (I_t - \delta \Pi(\bar\eta)) \sum_{i \in N} \alpha_i = (I_t - \delta \Pi(\bar\eta)) v(N) = \sum_{i \in N} K_i(\bar x).
\]

Second, we calculate the expected sum of the payments to player $i$ in the game $G$ according to equation (12). Denote this sum for player $i$ by $\bar B_i$. It satisfies the equation

\[
\bar B_i = \pi^0 B_i = \pi^0 (B_i^1,\dots,B_i^t)',
\]

where $B_i^j$ can be found from the equation

\[
B_i^j = \beta_i^j + \delta \sum_{k=1}^{t} p(j,k;\bar x^j) B_i^k,
\]

or in vector form:

\[
B_i = \beta_i + \delta \Pi(\bar\eta) B_i. \tag{13}
\]

Equation (13) is equivalent to the following one:

\[
B_i = (I_t - \delta \Pi(\bar\eta))^{-1} \beta_i. \tag{14}
\]

Substituting equation (12) into (14), we obtain $B_i = \alpha_i$ and hence $\bar B_i = \bar\alpha_i$. The equality $B_i = \alpha_i$ proves the subgame consistency of the IDP determined by equation (12).

Remark 3. Equation (12) is equivalent to the following one:

\[
\alpha_i = \beta_i + \delta \Pi(\bar\eta) \alpha_i. \tag{15}
\]

The second summand on the right-hand side of equation (15) is the discounted expected value of the component of the Shapley value in the subgame starting from the next stage. Therefore, any player will receive his component of the Shapley value in any subgame if the payments to the players are the IDP satisfying equation (12).
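Equations (12), (14) and (15) can be checked numerically. In the sketch below the cooperative transition matrix, the discount factor and the vector of Shapley value components are hypothetical numbers chosen for illustration; the function name `idp` is likewise an assumption.

```python
import numpy as np

def idp(Pi_coop, alpha_i, delta):
    """Stage payments of equation (12): beta_i = (I - delta * Pi) alpha_i."""
    t = len(alpha_i)
    return (np.eye(t) - delta * Pi_coop) @ alpha_i

# Hypothetical data: two states, cooperative transition matrix Pi(eta_bar).
Pi_coop = np.array([[0.5, 0.5],
                    [0.2, 0.8]])
delta = 0.9
alpha_1 = np.array([4.0, 3.0])   # assumed Shapley components of player 1

beta_1 = idp(Pi_coop, alpha_1, delta)
# Expected discounted sum of the payments, equation (14).
B_1 = np.linalg.solve(np.eye(2) - delta * Pi_coop, beta_1)
```

Here `B_1` reproduces `alpha_1`, which is exactly the subgame consistency property stated in Theorem 1. Note that a stage payment may be much smaller than the stage payoff it replaces, since it is the whole discounted stream that must add up to the imputation.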

Obviously, if players realise the cooperative strategy profile $\bar\eta(\cdot)$, the expected payoff of player $i$ in stochastic game $G$ with new payments in cooperative action profiles is equal to the expected value of the corresponding component of the Shapley value in cooperative stochastic game $G_{co}$.

Now for the imputation $\bar\alpha = (\bar\alpha_1,\dots,\bar\alpha_n)$, where $\bar\alpha_i = \pi^0 \alpha_i$, $\alpha_i = (\alpha_i^1,\dots,\alpha_i^t)$, $(\alpha_1^j,\dots,\alpha_n^j) = \alpha^j \in A^j$, we determine the regularization of game $G$ in the following way.

Definition 8. Noncooperative stochastic game $G_\alpha$ (subgame $G_\alpha^j$, $j = 1,\dots,t$) is called the α-regularization of stochastic game $G$ (subgame $G^j$) if for any player $i \in N$ in state $\Gamma^j$ the payoff function $K_i^{\alpha,j}(x^j)$ is defined as follows:

\[
K_i^{\alpha,j}(x^j) = \begin{cases} \beta_i^j, & \text{if } x^j = \bar x^j; \\ K_i^j(x^j), & \text{if } x^j \neq \bar x^j, \end{cases} \tag{16}
\]

where the IDP $\beta = (\beta_1,\dots,\beta_n)$ satisfies equation (12)⁴.

We thus suggest a method of constructing a new payoff function in the game $G$ (subgame $G^j$) in every state when the action profile is cooperative. Here we may ask a question: "Do the players agree to redefine the payoff function in the game?" Our answer is "Yes", if they want to make the payoff functions in the states subgame-consistent in the sense of Definition 7. When the payoffs are redistributed using the payments $\beta_i^1,\dots,\beta_i^t$ in the states $\Gamma^1,\dots,\Gamma^t$ respectively, player $i$ receives the same sum (in terms of mathematical expectation) in game $G_\alpha$ ($G_\alpha^j$) as he has planned to receive in the cooperative stochastic game $G_{co}$ ($G_{co}^j$). Moreover, in any subgame his expected payoff will be the corresponding component of the Shapley value. In this case, we can state the subgame consistency (time consistency) of the chosen cooperative solution.

5. Strategic support of the Shapley value

In this section we need some additional notation. Let $\Gamma(k) \in \{\Gamma^1,\dots,\Gamma^t\}$ be the state realized at stage $k$ of game $G_\alpha$. Let $x(k)$ be the action profile realized in state $\Gamma(k)$. Denote the subgame of game $G_\alpha$ starting from state $\Gamma(k)$ as $G_\alpha^{\Gamma(k)}$. Call the sequence $((\Gamma(1),x(1)),(\Gamma(2),x(2)),\dots,(\Gamma(k-1),x(k-1)))$ the history of stage $k$ and denote it as $h(k)$. Let $T$ be the set $\{(\Gamma^1,\bar x^1),(\Gamma^2,\bar x^2),\dots,(\Gamma^t,\bar x^t)\}$.

In this section we consider stochastic game $G_\alpha$ as a game with perfect information in the sense that at each stage $k$ ($k = 1, 2, \dots$) all players know state $\Gamma(k)$ and the history of stage $k$. We would like to prove that the cooperative strategy profile in the game $G_\alpha$ can be supported by a Nash equilibrium in trigger strategies. To construct the Nash equilibrium we need to consider the sets of behavior strategies $\Phi_i$, $i \in N$.

Definition 9. We call the behavior strategy profile $\varphi^* = (\varphi_1^*,\dots,\varphi_n^*)$ a Nash equilibrium in game $G_\alpha$ if for any player $i \in N$ the inequality

\[
\bar E_i^\alpha(\varphi^*) \ge \bar E_i^\alpha(\varphi^* \,\|\, \varphi_i) \tag{17}
\]

is true for any behavior strategy $\varphi_i \in \Phi_i$ of player $i$, where $\bar E_i^\alpha(\cdot)$ is the expected payoff of player $i$ in the α-regularization $G_\alpha$.

The following theorem gives a condition under which the cooperative strategy profile is supported by a Nash equilibrium in the α-regularization $G_\alpha$ of game $G$.

Theorem 2. If in the α-regularization $G_\alpha$ the following inequality is true for any player $i \in N$:

\[
\beta_i \ge (I_t - \delta \Pi(\bar\eta)) F(\{i\}), \tag{18}
\]

where $F(\{i\}) = (F^1(\{i\}),\dots,F^t(\{i\}))$,

\[
F^j(\{i\}) = \max_{x_i^j \in X_i^j} \Big\{ K_i^j(\bar x^j \,\|\, x_i^j) + \delta \sum_{l=1}^{t} p(j,l;\bar x^j \,\|\, x_i^j)\, v^l(\{i\}) \Big\},
\]

then in the game $G_\alpha$ there exists a Nash equilibrium with payoffs $(\bar\alpha_1,\dots,\bar\alpha_n)$.

4 The IDP for stochastic games was initially proposed in (Petrosjan, 2006) for the case when the game process is realised on a graph, and in (Petrosjan and Baranova, 2006) for the case when the number of states in the stochastic game is finite and players use stationary strategies.
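Condition (18) is a finite system of linear inequalities and is straightforward to verify once $F(\{i\})$ is known. The following Python sketch computes $F^j(\{i\})$ and tests (18) for one player. All numbers, the dictionary encoding and the function names `deviation_bound` and `condition_18_holds` are assumptions made for illustration.

```python
import numpy as np

def deviation_bound(K_dev, P_dev, v_single, delta):
    """Vector F({i}): best one-shot deviation payoff followed by punishment.

    K_dev[j][a] -- stage payoff of player i in state j when he plays a
                   and the other players keep their cooperative actions
    P_dev[j][a] -- next-state probabilities under that action profile
    v_single    -- vector (v^1({i}), ..., v^t({i})) of maxmin values
    """
    return np.array([
        max(K_dev[j][a] + delta * np.dot(P_dev[j][a], v_single)
            for a in K_dev[j])
        for j in range(len(K_dev))
    ])

def condition_18_holds(beta_i, Pi_coop, F_i, delta, eps=1e-12):
    """Check beta_i >= (I - delta * Pi(eta_bar)) F({i}) componentwise."""
    t = len(beta_i)
    return bool(np.all(beta_i >= (np.eye(t) - delta * Pi_coop) @ F_i - eps))

# Hypothetical two-state data for one player.
Pi_coop = np.array([[0.5, 0.5], [0.2, 0.8]])
delta = 0.9
beta_1 = np.array([0.85, 0.12])
v_single = np.array([1.0, 1.0])
K_dev = [{"coop": 0.5, "dev": 1.0}, {"coop": 0.1}]
P_dev = [{"coop": [0.5, 0.5], "dev": [1.0, 0.0]}, {"coop": [0.2, 0.8]}]

F_1 = deviation_bound(K_dev, P_dev, v_single, delta)
supported = condition_18_holds(beta_1, Pi_coop, F_1, delta)
```

With these numbers the condition holds; making the punishment payoffs `v_single` larger weakens the threat and eventually violates (18).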

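Proof. Consider the behavior strategy profile $\hat\varphi(\cdot) = (\hat\varphi_1(\cdot),\dots,\hat\varphi_n(\cdot))$ in game $G_\alpha$:

\[
\hat\varphi_i(h(k)) = \begin{cases}
\bar x_i^j, & \text{if } \Gamma(k) = \Gamma^j,\ j = 1,\dots,t,\ h(k) \subset T; \\
\hat x_i^j(p), & \text{if } \Gamma(k) = \Gamma^j,\ j = 1,\dots,t,\ \exists\, l \in [1, k-1] \text{ and } p \in N,\ p \neq i: \\
 & h(l) \subset T, \text{ and } (\Gamma(l), x(l)) \notin T, \text{ but } (\Gamma(l), (x(l) \,\|\, \bar x_p(l))) \in T; \\
\text{any action}, & \text{in other cases},
\end{cases} \tag{19}
\]

where $\hat x_i^j(p)$ is an action of player $i$ in state $\Gamma^j$ which, together with actions $\hat x_k^j(p)$, $k \neq i$, $k \neq p$, forms the strategy of coalition $N \setminus \{p\}$ in the zero-sum game against player $p$ in subgame $G^j$.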

The proof of the theorem follows the proof of a folk theorem (for example, see (Dutta, 1995)) and uses the structure of the trigger strategy (19). We prove that $\hat\varphi(\cdot) = (\hat\varphi_1(\cdot),\dots,\hat\varphi_n(\cdot))$ determined in (19) is a Nash equilibrium in stochastic game $G_\alpha$.

If player $p$ does not deviate from the cooperative strategy profile $\bar\eta$, then, taking into account the definition of strategy (19), the expected payoff of player $p$ in the subgame $G_\alpha^j$, $j = 1,\dots,t$, is

\[
E_p^{\alpha,j}(\hat\varphi(\cdot)) = E_p^{\alpha,j}(\bar\eta(\cdot)).
\]

Let $E_p^\alpha(\hat\varphi(\cdot))$ be equal to the vector $(E_p^{\alpha,1}(\hat\varphi(\cdot)),\dots,E_p^{\alpha,t}(\hat\varphi(\cdot)))$; then for any player $p \in N$ the following equation is true:

\[
E_p^\alpha(\hat\varphi(\cdot)) = (I_t - \delta \Pi(\bar\eta))^{-1} \beta_p. \tag{20}
\]

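Consider the strategy profile $(\hat\varphi(\cdot) \,\|\, \varphi_p(\cdot))$, $p \in N$, when player $p \in N$ deviates from strategy $\hat\varphi_p(\cdot)$. Let stage $k$ be such that there exists a number $l \in [1, k-1]$ such that history $h(l) \subset T$ and $(\Gamma(l), x(l)) \notin T$ but $(\Gamma(l), (x(l) \,\|\, \bar x_p(l))) \in T$. Without loss of generality, we suppose that $\Gamma(k) = \Gamma^j$. Calculate the payoff of player $p$ in game $G_\alpha$ in strategy profile $(\hat\varphi(\cdot) \,\|\, \varphi_p(\cdot))$ as

\[
\bar E_p^\alpha(\hat\varphi \,\|\, \varphi_p) = \pi^0 E_p^\alpha(\hat\varphi \,\|\, \varphi_p),
\]

where

\[
E_p^\alpha(\hat\varphi \,\|\, \varphi_p) = E_p^{\alpha,[1,k-1]}(\hat\varphi \,\|\, \varphi_p) + \delta^{k-1} \Pi^{k-1}(\hat\varphi \,\|\, \varphi_p)\, E_p^{\alpha,[k,\infty)}(\hat\varphi \,\|\, \varphi_p), \tag{21}
\]

where $E_p^{\alpha,[1,k-1]}(\hat\varphi \,\|\, \varphi_p)$ is the expected payoff of player $p$ at the first $k - 1$ stages of game $G_\alpha$, and $E_p^{\alpha,[k,\infty)}(\hat\varphi \,\|\, \varphi_p)$ is the expected payoff of player $p$ in the subgame of game $G_\alpha$ starting from stage $k$. Since, as shown before, no player deviated from the cooperative strategy profile $\bar\eta(\cdot)$ up to stage $k - 1$ inclusive, the following equalities hold for the elements of the right side of (21):

\[
E_p^{\alpha,[1,k-1]}(\hat\varphi \,\|\, \varphi_p) = E_p^{\alpha,[1,k-1]}(\hat\varphi),
\]

\[
\Pi^{k-1}(\hat\varphi \,\|\, \varphi_p) = \Pi^{k-1}(\bar\eta).
\]

In the second term of the right side of (21), by $E_p^{\alpha,[k,\infty)}(\hat\varphi \,\|\, \varphi_p)$ we mean the vector $(E_p^{\alpha,1}(\hat\varphi \,\|\, \varphi_p),\dots,E_p^{\alpha,t}(\hat\varphi \,\|\, \varphi_p))$, where $E_p^{\alpha,j}(\hat\varphi \,\|\, \varphi_p)$ is the expected payoff of player $p$ in the regularized subgame $G_\alpha^j$ starting from the state $\Gamma^j$.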

Now we calculate the expected payoff of player $p$ in the subgame $G_\alpha^j$ starting from stage $k$ when the state $\Gamma(k)$ is $\Gamma^j$:

\[
E_p^{\alpha,j}(\hat\varphi \,\|\, \varphi_p) = K_p^j(\bar x^j \,\|\, x_p^j) + \delta \sum_{l=1}^{t} p(j,l;\bar x^j \,\|\, x_p^j)\, v^l(\{p\}), \tag{22}
\]

because the players of coalition $N \setminus \{p\}$ will punish player $p$ by playing the zero-sum game against him beginning from stage $k + 1$ according to the definition of strategy profile $\hat\varphi(\cdot)$.

Since the expected payoffs of player $p$ in strategy profiles $\hat\varphi(\cdot)$ and $(\hat\varphi(\cdot) \,\|\, \varphi_p(\cdot))$ are equal until stage $k - 1$, player $p$ can increase his payoff by deviating only at the expense of the part of game $G_\alpha$ beginning with stage $k$, i.e. at the expense of the expected payoff in subgame $G_\alpha^j$, $j = 1,\dots,t$. In strategy profile $(\hat\varphi(\cdot) \,\|\, \varphi_p(\cdot))$ player $p$ can guarantee the following expected payoff from stage $k$:

\[
\max_{x_p^j \in X_p^j} \Big\{ K_p^j(\bar x^j \,\|\, x_p^j) + \delta \sum_{l=1}^{t} p(j,l;\bar x^j \,\|\, x_p^j)\, v^l(\{p\}) \Big\}. \tag{23}
\]

According to the definition of the IDP, the expected payoff of player $p$ in the subgame $G_\alpha^j$ in strategy profile $\hat\varphi(\cdot)$ can be calculated by the equation:

\[
E_p^\alpha(\hat\varphi(\cdot)) = (I_t - \delta \Pi(\bar\eta))^{-1} \beta_p, \tag{24}
\]

where $E_p^\alpha(\hat\varphi(\cdot)) = (E_p^{\alpha,1}(\hat\varphi(\cdot)),\dots,E_p^{\alpha,t}(\hat\varphi(\cdot)))$. Taking into account inequality (18) together with (23) and (24), we obtain the inequality

\[
\bar E_p^\alpha(\hat\varphi(\cdot)) \ge \bar E_p^\alpha(\hat\varphi(\cdot) \,\|\, \varphi_p(\cdot)).
\]

Therefore, the behavior strategy profile (19) is a Nash equilibrium in the α-regularization of game $G$. The expected payoff of player $p$ in game $G_\alpha$ in strategy profile $\hat\varphi(\cdot)$ is equal to $\bar\alpha_p$, where $\bar\alpha_p = \pi^0 \alpha_p$ and the vector $\alpha_p = (\alpha_p^1,\dots,\alpha_p^t)$ consists of the $p$-th components of the Shapley values $\alpha^1,\dots,\alpha^t$ calculated for the cooperative subgames $G_{co}^1,\dots,G_{co}^t$ accordingly. This completes the proof.

6. Data transmission game

6.1. Model

In this section we introduce an example of a stochastic game application in telecommunication systems. We consider a slotted synchronous system in which nodes 1 and 2 independently generate packages in each time slot with probabilities $a_1$ and $a_2$, respectively, provided that their individual queues were empty at the end of the previous time slot. The graph of the wireless network is depicted in Fig. 1. Some assumptions about this system are as follows:

1. Nodes 1 and 2 (players 1 and 2, respectively) are going to send their packages to a common destination (node 3).

2. The maximum buffer capacity of any node equals one. The destination node can accept only one transmitted package in one time slot. We do not assume multiple package transmissions or simultaneous transmissions and reception by any node in any time slot.

3. If players simultaneously transmit packages to the destination node, the latter rejects these packages and they return to their initial nodes, i.e. at the next time slot no new packages can be generated in nodes 1 and 2.

4. All transmitted packages have the same length, and it requires one time slot to transmit a package from one node to the other which has the direct channel with the first one.

5. Player 1 chooses between sending a package directly to node 3 or relying on node 2 to forward the package to the final destination (node 3).

6. If player 1 (node 1) transmits a package to player 2 (node 2) which has already had a package in its queue, player 2 rejects this package. Otherwise, player 2 decides on whether to accept or reject the package from player 1.

We suggest the following system of rewards and costs:

- $f \ge 0$ is a reward to player 1 or player 2 for each successful transmission to the destination node.

- Player 1 receives a reward $c \ge 0$ from player 2 for delivering a package to player 2, who can obtain the value $f$ only after successful transmission of that particular package to the final destination in a subsequent time slot.

- Each time slot of package delay results in an additional cost $d \ge 0$ for the node that has that particular package in its queue (regardless of that package's source).

- $D_{ij}$ is the energy cost of one package transmission from node $i$ to node $j$.

We suppose that the game ends in any time slot with probability $q$, $0 < q < 1$. The probability $1 - q$ of continuing the game then plays the role of the discount factor. The transmission problem in a wireless network can be solved as a stochastic game. Denote the pair $(Q_1, Q_2)$ as the state of the stochastic game, where $Q_i$ is the queue content of node $i$, $i = 1, 2$. The queue content $Q_i$ is equal to 0 if no package is present in the queue of node $i$ and to 1 if one package is present.

Fig. 1: Topology of a wireless network.

The set of states in stochastic game is

Q = {(0,0); (0,1); (1,0); (1,1)}.


Consider the game in a cooperative setting, meaning that the players' actions are coordinated by one center to improve the work of the network. Coordinating the actions of the devices is useful for increasing the speed of data transmission. To solve the cooperative version of the stochastic game, we assume that players have information not only on their own queues but also on the other player's queue.

Now we need to describe the states, i. e., the games in normal form corresponding to the states:

1. State (0, 0): Player 1 has a unique action W (waiting), player 2 has the same action W (waiting). The payoffs to the players are (0,0).

2. State (0, 1): Player 1 has a unique action W (waiting), player 2 also has a unique action T3 (transmission to node 3). The payoffs to the players are $(0, f - D_{23})$.

3. State (1, 0): Player 1 has two actions: i) T3 (transmission to node 3), ii) T2 (transmission to node 2). Player 2 has two actions: i) Ac (accepting a package from node 1), ii) Rej (rejecting a package from node 1).

The payoffs to the players are represented in the matrix (rows correspond to the actions T3 and T2 of player 1, columns to the actions Ac and Rej of player 2):

\[
\begin{pmatrix}
(f - D_{13},\ 0) & (f - D_{13},\ 0) \\
(c - D_{12},\ -c) & (-d - D_{12},\ 0)
\end{pmatrix}.
\]

4. State (1, 1): Player 1 has two actions: i) T3 (transmission to node 3), ii) W (waiting). Player 2 has two actions: i) T3 (transmission to node 3), ii) W (waiting). The payoffs to the players are as follows (rows correspond to the actions T3 and W of player 1, columns to the actions T3 and W of player 2):

\[
\begin{pmatrix}
(-d - D_{13},\ -d - D_{23}) & (f - D_{13},\ -d) \\
(-d,\ f - D_{23}) & (-d,\ -d)
\end{pmatrix}.
\]

6.2. Transition matrix

Assume the players use the stationary strategies. In the game defined in stationary strategies the players' choice of an action in the states depends neither on the history, nor on the time slot, in which the game is at present, but depends only on the state. In applications of stochastic games it is important to use a simple set of strategies for decreasing the number of calculations of players' expected payoffs.

Denote the set of mixed stationary strategies of player i as t i, i — 1 , 2. According to the game structure the player 1 's mixed stationary strategy assigns him to choose action W with probability one in the states (0, 0), (0, 1), action T3 with probability P11 in the state (1, 0), and action T3 with probability p12 in the state (1, 1). The player 2's mixed stationary strategy assigns him to choose action W with probability one in the state (0, 0), action T3 in the state (0, 1), action Ac with probability p21 in the state (1, 0), and action T3 with probability p22 in the state (1, 1). Denote a player i's mixed stationary strategy as n — (pi1 ,pi2). A stationary strategy profile is n — (^1 ,^2) — (p11,p12,p21,p22).

The transition matrix when the players realize the stationary strategy profile $\eta$ is

$$\Pi(\eta) = \{p(k, l; x^k)\}_{k = 1, \ldots, 4;\; l = 1, \ldots, 4}, \qquad (25)$$

where

$$\begin{aligned}
p(1, 1; x^1) &= (1 - a_1)(1 - a_2), \\
p(1, 2; x^1) &= (1 - a_1) a_2, \\
p(1, 3; x^1) &= a_1 (1 - a_2), \\
p(1, 4; x^1) &= a_1 a_2, \\
p(2, 1; x^2) &= (1 - a_1)(1 - a_2), \\
p(2, 2; x^2) &= (1 - a_1) a_2, \\
p(2, 3; x^2) &= a_1 (1 - a_2), \\
p(2, 4; x^2) &= a_1 a_2, \\
p(3, 1; x^3) &= p_{11} (1 - a_1)(1 - a_2), \\
p(3, 2; x^3) &= p_{11} (1 - a_1) a_2 + (1 - p_{11}) p_{21} (1 - a_1), \\
p(3, 3; x^3) &= p_{11} a_1 (1 - a_2) + (1 - p_{11})(1 - p_{21})(1 - a_2), \\
p(3, 4; x^3) &= p_{11} a_1 a_2 + (1 - p_{11}) p_{21} a_1 + (1 - p_{11})(1 - p_{21}) a_2, \\
p(4, 1; x^4) &= 0, \\
p(4, 2; x^4) &= p_{12} (1 - p_{22})(1 - a_1), \\
p(4, 3; x^4) &= (1 - p_{12}) p_{22} (1 - a_2), \\
p(4, 4; x^4) &= p_{12} p_{22} + (1 - p_{12})(1 - p_{22}) + p_{12} (1 - p_{22}) a_1 + (1 - p_{12}) p_{22} a_2.
\end{aligned}$$
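The entries of (25) can be checked mechanically. The following is a minimal sketch, assuming the states are ordered (0, 0), (0, 1), (1, 0), (1, 1) as in the payoff vectors of this section; the function name `transition_matrix` and the sample probabilities in the demonstration are illustrative choices, not part of the original model description.

```python
def transition_matrix(p11, p12, p21, p22, a1, a2):
    """Build the 4x4 matrix (25); rows and columns follow the state
    order (0,0), (0,1), (1,0), (1,1) (an assumed ordering)."""
    # In states (0,0) and (0,1) the next state depends only on new arrivals.
    arrivals = [(1 - a1) * (1 - a2), (1 - a1) * a2, a1 * (1 - a2), a1 * a2]
    # State (1,0): player 1 plays T3 w.p. p11, player 2 plays Ac w.p. p21.
    row3 = [
        p11 * (1 - a1) * (1 - a2),
        p11 * (1 - a1) * a2 + (1 - p11) * p21 * (1 - a1),
        p11 * a1 * (1 - a2) + (1 - p11) * (1 - p21) * (1 - a2),
        p11 * a1 * a2 + (1 - p11) * p21 * a1 + (1 - p11) * (1 - p21) * a2,
    ]
    # State (1,1): player 1 plays T3 w.p. p12, player 2 plays T3 w.p. p22.
    row4 = [
        0.0,
        p12 * (1 - p22) * (1 - a1),
        (1 - p12) * p22 * (1 - a2),
        p12 * p22 + (1 - p12) * (1 - p22)
        + p12 * (1 - p22) * a1 + (1 - p12) * p22 * a2,
    ]
    return [arrivals, list(arrivals), row3, row4]


# Illustrative mixed profile: every row should be a probability distribution.
sample = transition_matrix(0.3, 0.7, 0.5, 0.2, a1=0.4, a2=0.1)
for row in sample:
    print([round(x, 4) for x in row], "row sum:", round(sum(row), 10))
```

Each row summing to one is a quick sanity check that no transition term was lost when the formulas were typed in.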

6.3. Payoff functions

If the stationary strategy profile $\eta$ is realized, the payoff to player 1 in the stochastic game is

$$K_1(x) = (K_1^1(x^1), K_1^2(x^2), K_1^3(x^3), K_1^4(x^4))',$$

where

$$\begin{aligned}
K_1^1(x^1) &= K_1^2(x^2) = 0, \\
K_1^3(x^3) &= p_{11}(f - D_{13}) + (1 - p_{11}) p_{21} (c - D_{12}) + (1 - p_{11})(1 - p_{21})(-d - D_{12}), \\
K_1^4(x^4) &= p_{12} p_{22} (-d - D_{13}) + p_{12}(1 - p_{22})(f - D_{13}) + (1 - p_{12})(-d).
\end{aligned}$$

If the stationary strategy profile $\eta$ is realized, the payoff to player 2 in the stochastic game is

$$K_2(x) = (K_2^1(x^1), K_2^2(x^2), K_2^3(x^3), K_2^4(x^4))',$$

where

$$\begin{aligned}
K_2^1(x^1) &= 0, \\
K_2^2(x^2) &= f - D_{23}, \\
K_2^3(x^3) &= (1 - p_{11}) p_{21} (-c), \\
K_2^4(x^4) &= p_{12} p_{22} (-d - D_{23}) + (1 - p_{12}) p_{22} (f - D_{23}) + (1 - p_{22})(-d).
\end{aligned}$$
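As an illustration, the stage payoff vectors can be evaluated for a concrete profile. This is a sketch only: the function name `stage_payoffs` is hypothetical, and the parameter values in the demonstration are those of the numerical illustration in Section 6.5.

```python
def stage_payoffs(p11, p12, p21, p22, f, d, c, D12, D13, D23):
    """Return (K1, K2), the expected stage payoffs of players 1 and 2
    in the states (0,0), (0,1), (1,0), (1,1)."""
    K1 = [
        0.0,
        0.0,
        p11 * (f - D13) + (1 - p11) * p21 * (c - D12)
        + (1 - p11) * (1 - p21) * (-d - D12),
        p12 * p22 * (-d - D13) + p12 * (1 - p22) * (f - D13)
        + (1 - p12) * (-d),
    ]
    K2 = [
        0.0,
        f - D23,
        (1 - p11) * p21 * (-c),
        p12 * p22 * (-d - D23) + (1 - p12) * p22 * (f - D23)
        + (1 - p22) * (-d),
    ]
    return K1, K2


# Pure profile (0, 0, 1, 1) with the parameters of Section 6.5.
K1, K2 = stage_payoffs(0, 0, 1, 1, f=1.0, d=0.1, c=0.3,
                       D12=0.1, D13=0.6, D23=0.2)
print([round(x, 4) for x in K1])  # [0.0, 0.0, 0.2, -0.1]
print([round(x, 4) for x in K2])  # [0.0, 0.8, -0.3, 0.8]
```

For this profile the result agrees with the stage payoffs $K_1 = (0, 0, 0.2, -0.1)'$ and $K_2 = (0, 0.8, -0.3, 0.8)'$ quoted in the numerical illustration.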

We consider the set of pure stationary strategies, which is denoted as $\Xi_i$, $i = 1, 2$. For example, player 1's pure stationary strategy $\eta_1 = (1, 0)$ prescribes player 1 to choose action T3 in state (1, 0) and action W in state (1, 1). Each player has 4 pure stationary strategies in the stochastic game; therefore, there are 16 pure stationary strategy profiles. For each pure stationary strategy profile $\eta = (\eta_1, \eta_2)$ the transition matrix $\Pi(\eta)$ is determined by (25).

For example, for the pure stationary strategy profile $\eta^1 = (1, 1, 1, 1)$ the transition matrix is

$$\Pi(\eta^1) = \begin{pmatrix}
(1 - a_1)(1 - a_2) & (1 - a_1) a_2 & a_1 (1 - a_2) & a_1 a_2 \\
(1 - a_1)(1 - a_2) & (1 - a_1) a_2 & a_1 (1 - a_2) & a_1 a_2 \\
(1 - a_1)(1 - a_2) & (1 - a_1) a_2 & a_1 (1 - a_2) & a_1 a_2 \\
0 & 0 & 0 & 1
\end{pmatrix}.$$

For each strategy profile $\eta \in \Xi = \prod_{i=1}^{2} \Xi_i$ we can calculate the players' expected payoffs in the subgames, which are denoted as $E_i(\eta) = (E_i^{(0,0)}(\eta), E_i^{(0,1)}(\eta), E_i^{(1,0)}(\eta), E_i^{(1,1)}(\eta))'$, and

$$E_i(\eta) = (I_4 - (1 - q)\Pi(\eta))^{-1} K_i(x), \qquad (26)$$

where $K_i(x)$ and $\Pi(\eta)$ are determined above.

The expected payoff to player $i$ in the whole game including the chance move is

$$\bar{E}_i(\eta) = \pi^0 E_i(\eta), \qquad (27)$$

where $\pi^0 = (\pi^0_{(0,0)}, \pi^0_{(0,1)}, \pi^0_{(1,0)}, \pi^0_{(1,1)})$ is the vector of initial probabilities, and $\pi^0_k$ is the probability that the first state in the stochastic game is $k \in Q$. The vector $\pi^0$ is given.
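For orientation, formula (26) can be read as the solution of a one-step recursion. The derivation below is a sketch, assuming the usual interpretation that the game continues to the next stage with probability $1 - q$ and that $I_4$ is the $4 \times 4$ identity matrix:

```latex
E_i^k(\eta) = K_i^k(x^k) + (1 - q) \sum_{l=1}^{4} p(k, l; x^k)\, E_i^l(\eta),
\qquad k = 1, \ldots, 4,
\\[4pt]
\Longleftrightarrow\quad E_i(\eta) = K_i(x) + (1 - q)\,\Pi(\eta)\, E_i(\eta)
\\[4pt]
\Longrightarrow\quad \bigl(I_4 - (1 - q)\,\Pi(\eta)\bigr) E_i(\eta) = K_i(x).
```

Since $\Pi(\eta)$ is a stochastic matrix and $0 < 1 - q < 1$, the spectral radius of $(1 - q)\Pi(\eta)$ is below one, so the matrix $I_4 - (1 - q)\Pi(\eta)$ is invertible and (26) is well defined.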

6.4. Algorithm of solving cooperative stochastic game

In this section we describe the steps of solving the cooperative stochastic game of data transmission in a wireless network with the topology represented in Fig. 1.

1. For any state $k \in Q$ and any pure strategy profile $\eta = (\eta_1, \eta_2)$, $\eta_i \in \Xi_i$, $i = 1, 2$, calculate the players' expected payoffs $E_i^k(\eta)$ in subgame $G^k$ by equation (5) and their expected payoffs in the whole game $\bar{E}_i(\eta)$ by equation (6).

2. Find the cooperative strategy profile $\bar{\eta}$ by equation (7).

3. Calculate the values of the characteristic functions $v^k(S)$ for any state $k \in Q$ and any coalition $S \subset N$ using equations (9), (10), (11). Then calculate the values of the characteristic function $\bar{v}(S)$ for any $S \subset N$ by (8).

4. Calculate the Shapley values $\alpha^k = (\alpha_1^k, \ldots, \alpha_n^k)$ for any subgame $G^k$ starting from state $k \in Q$ using the formula (Shapley, 1953b):

$$\alpha_i^k = \sum_{S \subset N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left(v^k(S \cup \{i\}) - v^k(S)\right). \qquad (28)$$

Then calculate the Shapley value for the whole game $\bar{\alpha} = (\bar{\alpha}_1, \bar{\alpha}_2)$ using the equation $\bar{\alpha}_i = \pi^0 \alpha_i$.

5. Calculate the components of the IDP $\beta_i^k$, $i = 1, 2$, $k \in Q$, by equation (12).

6. To construct the subgame-consistent Shapley value, determine the $\alpha$-regularization $G_\alpha$ by re-defining the payoff functions by equation (16).

7. Verify whether there exists a Nash equilibrium in behavior strategies with payoffs $(\bar{\alpha}_1, \bar{\alpha}_2)$ using inequality (18).
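For the two-player game considered in this section, formula (28) in step 4 takes a particularly simple form. The following worked expansion assumes the standard normalization $v^k(\emptyset) = 0$, which is not restated in this section:

```latex
\alpha_1^k
  = \tfrac{1}{2}\bigl(v^k(\{1\}) - v^k(\emptyset)\bigr)
  + \tfrac{1}{2}\bigl(v^k(\{1,2\}) - v^k(\{2\})\bigr)
  = \tfrac{1}{2}\bigl(v^k(\{1,2\}) + v^k(\{1\}) - v^k(\{2\})\bigr),
\\[4pt]
\alpha_2^k
  = \tfrac{1}{2}\bigl(v^k(\{1,2\}) + v^k(\{2\}) - v^k(\{1\})\bigr)
  = v^k(\{1,2\}) - \alpha_1^k .
```

Both weights equal $0!\,1!/2! = 1!\,0!/2! = 1/2$, so each player receives their own guaranteed value plus half of the surplus created by cooperation.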

6.5. Numerical illustration

We now give a numerical example of the data transmission game for the wireless network. The simulation parameters are as follows. The probability of package appearance at node 1 is higher than at node 2: $a_1 = 0.4$, $a_2 = 0.1$. The probability that the game ends is $q = 0.01$, which is equivalent to the discount factor 0.99. The rewards and costs are $f = 1$, $d = 0.1$, $c = 0.3$, $D_{12} = 0.1$, $D_{13} = 0.6$, $D_{23} = 0.2$. Note that the cost of package transmission from node 1 to node 3 is three times the cost of package transmission from node 2 to node 3. Therefore, the cooperation of nodes 1 and 2 may be profitable. Let the game begin from any state with equal probability, i.e., $\pi^0 = (0.25, 0.25, 0.25, 0.25)$.

Table 1 represents the players' expected payoffs

$$E_i(\eta) = (E_i^{(0,0)}(\eta), E_i^{(0,1)}(\eta), E_i^{(1,0)}(\eta), E_i^{(1,1)}(\eta))'$$

for any pure stationary strategy profile $\eta$ and any player $i = 1, 2$, together with the sum of the expected payoffs. The last column in Table 1 is $\bar{E}_1 + \bar{E}_2$, which is the players' total expected payoff in the whole game taking into account the vector of initial probabilities $\pi^0$.

The cooperative strategy profile maximizing the players' total payoff is

$$\bar{\eta} = \eta^{11} = (0, 0, 1, 1),$$

in which player 1's strategy $\eta_1^{11} = (0, 0)$ prescribes him not to transmit to node 3 but to transmit to node 2 in state (1, 0), when there is a package at node 1 and no package at node 2. Player 2's strategy $\eta_2^{11} = (1, 1)$ prescribes her to accept the package from player 1 in state (1, 0). When the game is in state (1, 1), player 1 waits and player 2 transmits to node 3.

The maximum of the players' total expected payoff in the whole game is

$$\max_{\eta \in \Xi} \sum_{i \in N} \bar{E}_i(\eta) = \sum_{i \in N} \bar{E}_i(\bar{\eta}) = 26.9472.$$
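The search for the cooperative profile can be reproduced by brute force over the 16 pure stationary profiles. The sketch below is not the author's code: it restates the transition probabilities (25) and the stage payoffs of Section 6.3 so that it runs on its own, and it approximates (26) by iterating the recursion $E \leftarrow K + (1 - q)\Pi E$ instead of inverting the matrix. All function and variable names are illustrative.

```python
from itertools import product

# Parameter values of the numerical illustration.
A1, A2, Q_END = 0.4, 0.1, 0.01
F, D, C, D12, D13, D23 = 1.0, 0.1, 0.3, 0.1, 0.6, 0.2
PI0 = [0.25, 0.25, 0.25, 0.25]


def model(p11, p12, p21, p22):
    """Transition matrix (25) and stage payoffs for a stationary profile."""
    arrivals = [(1 - A1) * (1 - A2), (1 - A1) * A2, A1 * (1 - A2), A1 * A2]
    row3 = [
        p11 * (1 - A1) * (1 - A2),
        p11 * (1 - A1) * A2 + (1 - p11) * p21 * (1 - A1),
        p11 * A1 * (1 - A2) + (1 - p11) * (1 - p21) * (1 - A2),
        p11 * A1 * A2 + (1 - p11) * p21 * A1 + (1 - p11) * (1 - p21) * A2,
    ]
    row4 = [
        0.0,
        p12 * (1 - p22) * (1 - A1),
        (1 - p12) * p22 * (1 - A2),
        p12 * p22 + (1 - p12) * (1 - p22)
        + p12 * (1 - p22) * A1 + (1 - p12) * p22 * A2,
    ]
    k1 = [0.0, 0.0,
          p11 * (F - D13) + (1 - p11) * p21 * (C - D12)
          + (1 - p11) * (1 - p21) * (-D - D12),
          p12 * p22 * (-D - D13) + p12 * (1 - p22) * (F - D13)
          + (1 - p12) * (-D)]
    k2 = [0.0, F - D23,
          (1 - p11) * p21 * (-C),
          p12 * p22 * (-D - D23) + (1 - p12) * p22 * (F - D23)
          + (1 - p22) * (-D)]
    return [arrivals, arrivals, row3, row4], k1, k2


def subgame_payoffs(pi, k, sweeps=3000):
    """Iterate E <- K + (1 - q) * Pi * E; the fixed point satisfies (26)."""
    e = [0.0] * 4
    for _ in range(sweeps):
        e = [k[s] + (1 - Q_END) * sum(pi[s][t] * e[t] for t in range(4))
             for s in range(4)]
    return e


def total_payoff(profile):
    """Total expected payoff of both players in the whole game, as in (27)."""
    pi, k1, k2 = model(*profile)
    e1, e2 = subgame_payoffs(pi, k1), subgame_payoffs(pi, k2)
    return sum(w * (x + y) for w, x, y in zip(PI0, e1, e2))


# Enumerate all pure profiles (p11, p12, p21, p22) and pick the best one.
totals = {p: total_payoff(p) for p in product((0, 1), repeat=4)}
best = max(totals, key=totals.get)
print("cooperative profile:", best, "total payoff: %.4f" % totals[best])
```

With 3000 sweeps the iteration error is far below the precision reported in Table 1, so the enumeration should single out the profile (0, 0, 1, 1) with a total payoff close to 26.9472.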

Table 1: Expected payoffs in the stochastic game.

$\eta$                      $E_1(\eta)$                                   $E_2(\eta)$                                   $\bar{E}_1 + \bar{E}_2$

$\eta^1 = (1,1,1,1)$      (-53.0129, -53.0129, -52.6129, -70.0000)      (-22.9935, -22.1935, -22.9935, -30.0000)      -81.7048
$\eta^2 = (1,1,1,0)$      (15.8400, 15.8400, 16.2400, 16.2400)          (6.76818, 7.56818, 6.76818, 7.27732)          23.1355
$\eta^3 = (1,0,1,1)$      (14.7353, 14.7353, 15.1353, 14.8563)          (7.9200, 8.7200, 7.9200, 8.7200)              23.1855
$\eta^4 = (1,0,1,0)$      (-5.10968, -5.10968, -4.70968, -10.0000)      (-7.02581, -6.22581, -7.02581, -10.0000)      -13.8016
$\eta^5 = (1,1,0,1)$      (-53.0129, -53.0129, -52.6129, -70.0000)      (-22.9935, -22.1935, -22.9935, -30.0000)      -81.7048
$\eta^6 = (1,1,0,0)$      (15.8400, 15.8400, 16.2400, 16.2400)          (6.76818, 7.56818, 6.76818, 7.27732)          23.1355
$\eta^7 = (1,0,0,1)$      (14.7353, 14.7353, 15.1353, 14.8563)          (7.9200, 8.7200, 7.9200, 8.7200)              23.1855
$\eta^8 = (1,0,0,0)$      (-5.10968, -5.10968, -4.70968, -10.0000)      (-7.02581, -6.22581, -7.02581, -10.0000)      -13.8016
$\eta^9 = (0,1,1,1)$      (-64.7464, -64.7464, -65.9794, -70.0000)      (-27.3398, -26.5398, -27.9446, -30.0000)      -94.3241
$\eta^{10} = (0,1,1,0)$   (11.5347, 11.5347, 11.8060, 12.0060)          (13.4228, 14.2228, 13.6218, 13.8218)          25.4926
$\eta^{11} = (0,0,1,1)$   (4.90248, 4.90248, 5.04298, 4.87602)          (21.4759, 22.2759, 21.8337, 22.4792)          26.9472
$\eta^{12} = (0,0,1,0)$   (-8.93504, -8.93504, -9.06741, -10.0000)      (-8.73596, -7.93596, -8.97396, -10.0000)      -18.1458
$\eta^{13} = (0,1,0,1)$   (-64.2491, -64.2491, -65.4128, -70.0000)      (-26.728, -25.928, -27.248, -30.000)          -93.4537
$\eta^{14} = (0,1,0,0)$   (-8.4855, -8.4855, -8.8128, -7.6828)          (5.60847, 6.40847, 5.57380, 6.13680)          -2.43475
$\eta^{15} = (0,0,0,1)$   (-18.532, -18.532, -19.010, -18.910)          (7.9200, 8.7200, 7.9200, 8.7200)              -10.426
$\eta^{16} = (0,0,0,0)$   (-10.559, -10.559, -10.917, -10.000)          (-8.8313, -8.0313, -9.0826, -10.000)          -19.49526

We calculate the values of the characteristic functions for the subgames by equations (9), (10), (11):

$$\begin{aligned}
v(\{1\}) &= (-5.10968, -5.10968, -4.70968, -10.0)', \\
v(\{2\}) &= (-8.73596, -7.93596, -8.97396, -10.0)', \\
v(\{1, 2\}) &= (26.3784, 27.1784, 26.8766, 27.3553)'.
\end{aligned}$$


The characteristic function of the whole game is found by equation (8):

$$\bar{v}(\{1\}) = -6.23226, \quad \bar{v}(\{2\}) = -8.91147, \quad \bar{v}(\{1, 2\}) = 26.9472.$$

Then we may calculate the Shapley values for the subgames and the whole game using equation (28):

- for subgames:

• $\alpha_1 = (15.0023, 15.0023, 15.5705, 13.6776)'$,

• $\alpha_2 = (11.376, 12.176, 11.3062, 13.6776)'$,

- for the whole game:

• $\bar{\alpha}_1 = 14.8132$,

• $\bar{\alpha}_2 = 12.134$.
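These values can be recomputed directly from formula (28) and the characteristic function values listed above. The sketch below is a generic implementation for any finite player set, under the assumption that the empty coalition has value zero; the names `shapley`, `alpha` and `whole_game` are illustrative.

```python
from itertools import combinations
from math import factorial


def shapley(players, v):
    """Shapley value by formula (28); v maps frozenset coalitions to
    values and must contain the empty coalition with value 0."""
    n = len(players)
    value = {}
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 not containing i
            for coalition in combinations(others, size):
                S = frozenset(coalition)
                weight = (factorial(len(S)) * factorial(n - len(S) - 1)
                          / factorial(n))
                total += weight * (v[S | {i}] - v[S])
        value[i] = total
    return value


# Characteristic function values per state (0,0), (0,1), (1,0), (1,1).
v1 = [-5.10968, -5.10968, -4.70968, -10.0]
v2 = [-8.73596, -7.93596, -8.97396, -10.0]
v12 = [26.3784, 27.1784, 26.8766, 27.3553]
pi0 = [0.25, 0.25, 0.25, 0.25]

alpha = []
for k in range(4):
    v = {frozenset(): 0.0, frozenset({1}): v1[k],
         frozenset({2}): v2[k], frozenset({1, 2}): v12[k]}
    alpha.append(shapley([1, 2], v))

# Whole-game Shapley value: initial probabilities times subgame values.
whole_game = {i: sum(p * a[i] for p, a in zip(pi0, alpha)) for i in (1, 2)}
print([round(a[1], 4) for a in alpha], round(whole_game[1], 4))
print([round(a[2], 4) for a in alpha], round(whole_game[2], 4))
```

Up to rounding of the inputs, the output agrees with the subgame Shapley values and the whole-game values 14.8132 and 12.134 reported above.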

The cooperative payoff distribution procedures $\beta_1$ for player 1 and $\beta_2$ for player 2 are found by equation (12) using the Shapley values:

- $\beta_1 = (0, 0, 1.24274, -1.54974)'$,

- $\beta_2 = (0, 0.8, -1.34274, 2.24974)'$,

where $\beta_i^k$ is the payment to player $i$ in state $k$. Recall that the players' stage payoffs in the states under the cooperative strategy profile, as defined by the payoff matrices, are as follows:

- $K_1 = (0, 0, 0.2, -0.1)'$,

- $K_2 = (0, 0.8, -0.3, 0.8)'$.

In states (0, 0) and (0, 1) the components of the IDP coincide with the payoffs given by the payoff functions $K_1$ and $K_2$. In states (1, 0) and (1, 1), however, the total payoff is redistributed among the players. In state (1, 0) the players obtain -0.1 together, and according to the IDP player 1 receives 1.24274 instead of 0.2, while player 2 receives -1.34274 instead of -0.3. Therefore, player 2 gives 1.04274 to player 1 to make the IDP subgame consistent. In state (1, 1) the players obtain 0.7 together, and according to the IDP player 1 receives -1.54974 instead of -0.1, while player 2 receives 2.24974 instead of 0.8. Therefore, player 1 gives 1.44974 to player 2 to make the IDP subgame consistent.

Thus, the Shapley value $\bar{\alpha} = (14.8132, 12.134)'$ is subgame consistent if the payments to the players in the states are made according to the IDP $\beta_1 = (0, 0, 1.24274, -1.54974)'$ and $\beta_2 = (0, 0.8, -1.34274, 2.24974)'$.
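Equation (12) is not reproduced in this section. The reported IDP components are, however, consistent with the relation "payment in state $k$ = Shapley value in state $k$ minus the discounted expected Shapley value of the next state under the cooperative profile". The sketch below checks this numerically; the relation itself, the function name `idp`, and the hard-coded cooperative transition matrix (obtained from (25) with $p_{11} = p_{12} = 0$, $p_{21} = p_{22} = 1$, $a_1 = 0.4$, $a_2 = 0.1$) are assumptions made for illustration.

```python
q = 0.01

# Transition matrix (25) for the cooperative profile (0, 0, 1, 1).
PI_COOP = [
    [0.54, 0.06, 0.36, 0.04],
    [0.54, 0.06, 0.36, 0.04],
    [0.0, 0.6, 0.0, 0.4],
    [0.0, 0.0, 0.9, 0.1],
]

alpha1 = [15.0023, 15.0023, 15.5705, 13.6776]
alpha2 = [11.376, 12.176, 11.3062, 13.6776]


def idp(alpha, pi, q):
    """Assumed form of (12): beta = alpha - (1 - q) * Pi * alpha."""
    n = len(alpha)
    return [alpha[k] - (1 - q) * sum(pi[k][l] * alpha[l] for l in range(n))
            for k in range(n)]


beta1 = idp(alpha1, PI_COOP, q)
beta2 = idp(alpha2, PI_COOP, q)
print([round(b, 3) for b in beta1])
print([round(b, 3) for b in beta2])

# In every state the payments add up to the total stage payoff K1 + K2.
stage_total = [0.0, 0.8, -0.1, 0.7]
print([round(b1 + b2 - t, 3) for b1, b2, t in zip(beta1, beta2, stage_total)])
```

The small discrepancies in the fourth or fifth decimal place come from the rounding of the Shapley values used as input.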

Now we examine the problem of strategic support of the cooperative strategy profile. First, calculate the values $F^k(\{i\})$ for $i = 1, 2$ and $k \in Q$ determined in Theorem 2:

1. $F(\{1\}) = (-5.10968, -5.10968, -4.70968, -5.28632)'$,

2. $F(\{2\}) = (-8.73596, -7.93596, -8.97396, -8.1858)'$.

Second, verify whether inequalities (18) hold. For player 1, inequality (18) takes the form

$$(0, 0, 1.24274, -1.54974)' \ge (-0.186662, -0.186662, 0.418853, -0.566649)'.$$

The inequality does not hold. In state (1, 1) player 1 has an incentive to deviate, since his payoff according to the IDP in this state, -1.54974, is less than his payoff in case of deviation, -0.566649. This means that the cooperation cannot be supported strategically by the behavior strategy profile (19).

For player 2, inequality (18) takes the form

$$(0, 0.8, -1.34274, 2.24974)' \ge (-0.0718427, 0.728157, -1.01842, 0.620393)'.$$

Again, the inequality does not hold. In state (1, 0) player 2 has an incentive to deviate, since her payoff according to the IDP in this state, -1.34274, is lower than her payoff in case of deviation, -1.01842. Hence the behavior strategy profile determined by (19) cannot strategically support the cooperative payoffs.
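The componentwise check can be written out in a few lines. This sketch assumes, as the discussion above implies, that inequality (18) requires each IDP component to be at least the corresponding deviation payoff; the helper name `violated_states` is illustrative, and the numbers are copied from the two inequalities.

```python
STATES = ["(0,0)", "(0,1)", "(1,0)", "(1,1)"]


def violated_states(beta, deviation):
    """States in which the IDP payment is below the deviation payoff."""
    return [s for s, b, r in zip(STATES, beta, deviation) if b < r]


beta1 = [0.0, 0.0, 1.24274, -1.54974]
deviation1 = [-0.186662, -0.186662, 0.418853, -0.566649]
beta2 = [0.0, 0.8, -1.34274, 2.24974]
deviation2 = [-0.0718427, 0.728157, -1.01842, 0.620393]

print("player 1:", violated_states(beta1, deviation1))  # ['(1,1)']
print("player 2:", violated_states(beta2, deviation2))  # ['(1,0)']
```

Each player has exactly one state in which the condition fails, which is enough to rule out strategic support by the profile (19).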

7. Conclusion

We have examined the problem of cooperation in a dynamic game with a stochastic structure. First, we constructed a subgame-consistent cooperative solution of the game by redefining the players' state payoff functions using the imputation distribution procedure. Second, we provided conditions for verifying whether the cooperative solution can be supported strategically. All theoretical results are demonstrated on the example of a stochastic game modeling data transmission in an ad hoc wireless network with a simple topology. The numerical example illustrates the relevance of game-theoretic models for telecommunication problems, since such models suggest a way to reduce costs.

References

Altman, E., El-Azouzi, R. and Jimenez, T. (2003). Slotted Aloha as a stochastic game with partial information. In Proceedings of Modeling and Optimization in Mobile, Ad-Hoc and Wireless Networks (WiOpt '03), Sophia Antipolis, France, March 2003.

Benslama, M., Boucenna, L. and Batatia, H. (2013). Ad Hoc Networks Telecommunications and Game Theory. Wiley-ISTE.

Buttyan, L. and Hubaux, J. P. (2003). Stimulating cooperation in self-organizing mobile ad hoc network. ACM Journal for Mobile Networks (MONET), 8(5), 579-592.

Dutta, P. (1995). A folk theorem for stochastic games. Journal of Economic Theory, 66, 1-32.

International Game Theory Review, 4(3), 255-264.

Haurie, A., Krawczyk, J. B. and Zaccour, G. (2012). Games and Dynamic Games. World Scientific, Singapore.

Herings, P. J.-J. and Peeters, R. J. A. P. (2004). Stationary Equilibria in Stochastic Games: Structure, Selection, and Computation. Journal of Economic Theory, 118(1), 32-60.

Jaskiewicz, A. and Nowak, A. (2015). On pure stationary almost Markov Nash equilibria in nonzero-sum ARAT stochastic games. Mathematical Methods of Operations Research, 81(2), 169-179.

Kohlberg, E. and Neyman, A. (2015). The Cooperative Solution of Stochastic Games. Working Paper 15-071, Harvard Business School.

Michiardi, P. and Molva, R. (2003). A game-theoretical approach to evaluate cooperation enforcement mechanisms in mobile ad hoc networks. In Proc. WiOpt'03: Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, Sophia-Antipolis, France.

Parilina, E. M. (2010). Cooperative data transmission game in wireless network. Upravlenie bol'simi sistemami, 31.1, 93-110 (in Russian).

Parilina, E. M. (2015). Stable cooperation in stochastic games. Automation and Remote Control, 76(6), 1111-1122.

Parilina, E. and Zaccour, G. (2015a). Node-consistent core for games played over event trees. Automatica, 53, 304-311.

Parilina, E. and Zaccour, G. (2015b). Approximated cooperative equilibria for games played over event trees. Operations Research Letters, 43(5), 507-513.

Petrosyan, L. A. (1977). Stability of the solutions in differential games with several players. Vestnik Leningrad. Univ. Mat. Mekh. Astronom., 19(4), 46-52 (in Russian).

Petrosjan, L. A. (2006). Cooperative Stochastic Games. Advances in Dynamic Games, Annals of the International Society of Dynamic Games, Vol. 8, Part VI, 139-146.

Petrosyan, L. A. (2008). Strategically Supported Cooperation. International Game Theory Review, 10(4), 471-480.

Petrosjan, L. A. and Baranova, E. M. (2006). Cooperative Stochastic Games in Stationary Strategies. Game Theory and Applications, XI, 7-17.

Petrosjan, L. A. and Danilov, N. N. (1979). Stability of the solutions in nonantagonistic differential games with transferable payoffs. Vestnik Leningrad. Univ. Mat. Mekh. Astronom., 1, 52-59.

Petrosyan, L. A. and Zenkevich, N. A. (2009). Principles of dynamic stability. Upravlenie bol'simi sistemami, 26.1, 100-120 (in Russian).

Rosenberg, D., Solan, E. and Vieille, N. (2003). Stochastic Games with Imperfect Monitoring. International Journal of Game Theory, 32, 133-150.

Sagduyu, Y. E. and Ephremides, A. (2006). A game-theoretic look at simple relay channel. Wireless Networks, 12(5), 545-560.

Shapley, L. S. (1953a). Stochastic Games. Proceedings of National Academy of Sciences of the USA, 39(10), 1095-1100.

Shapley, L. S. (1953b). A Value for n-person Games. In Contributions to the Theory of Games, vol. II, H. W. Kuhn and A. W. Tucker, editors. Annals of Mathematical Studies, 28, 307-317. Princeton University Press.

Srinivasan, V., Nuggehalli, P., Chiasserini, C. F. and Rao, R. R. (2003). Cooperation in wireless ad hoc networks. Proc. INFOCOM 2003, Twenty-Second Annual Joint Conference of the IEEE Computer and Communications, San Francisco, CA, USA, 808-817.

Yeung, D. W. K. (2006). An irrational-behavior-proof condition in cooperative differential games. International Game Theory Review, 8, 739-744.
