COOPERATION IN THE MULTI-AGENT SYSTEM WITH DIFFERENT TYPES OF INTERACTIONS

Grinikh Aleksandra L.

Contributions to Game Theory and Management, XV, 60—80

Cooperation in the Multi-Agent System with Different Types of Interactions *

Aleksandra L. Grinikh

St. Petersburg State University, Faculty of Applied Mathematics and Control Processes, 7/9, Universitetskaya nab., St. Petersburg, 199034, Russia
E-mail: st062331@student.spbu.ru

HSE University, St. Petersburg School of Physics, Mathematics, and Computer Science, 3A/2, Kantemirovskaya ul., St. Petersburg, 194100, Russia
E-mail: [email protected]

Abstract This paper summarizes our previous work on optimality principles for the "n-person prisoner's dilemma" game. The classic model is studied through a new payoff function for each player, which makes it possible to analyze the game without restricting the number of players. A new characteristic function allows us to introduce a time-consistent subset of the core of the dynamic game. For this type of game we consider some specific properties of the players' payoffs and construct a new form of their interaction. Using a network representation, the classic model is extended to a wider class of games in which the influence of each player on the other players' payoff functions can be specified. These results can be used to describe cooperation in other multi-agent systems.

Keywords: n-person prisoner's dilemma, cooperative game, characteristic function, network game, Shapley value.

1. Introduction

Nowadays most interactions between people can be approximated by game-theoretic models. One of the fundamental models is the "prisoner's dilemma". The game describes the interaction of rational agents in terms of mutually advantageous concessions, that is, the rejection of a strictly dominant strategy in order to achieve a Pareto optimal solution. Hamburger (Hamburger, 1973) introduced the "n-person prisoner's dilemma" game to model the interactions of a wider group of people. His problem statement represents the cumulative effect of the players' influence on each other's results. The game retains all the main properties of the classical model, which show that a higher joint gain is possible if all players abandon their personal interests. The analysis of such games usually focuses on the search for a Nash equilibrium; in addition, various optimality principles of the cooperative version of the game can be investigated.

This problem exhibits the conflict between individual gains and social welfare. All players are questioned about their joint crime. Each of them has two pure strategies: "to stay silent" or "to betray".

A larger number of players makes the game more interesting for cooperative game theory, since it gives an opportunity to analyze players' interactions in terms of the network. The construction of population structure through cooperation in social dilemmas can be represented using the "n-person prisoner's dilemma" type of game.

*This work was supported by the Russian Science Foundation grant No. 22-11-00051, https://rscf.ru/en/project/22-11-00051/

https://doi.org/10.21638/11701/spbu31.2022.06

A large body of research on the model explores the following four directions:

— theoretical analysis of one-stage models;

— the study of multi-stage and repetitive models;

— the analysis of evolutionarily stable strategies;

— the representation of empirical results.

Straffin (Straffin, 1993) conducted experiments with partially cooperative behavior in the repeated "n-person prisoner's dilemma" game. Since people interact over several stages and, moreover, each choice of strategies changes the relationships between them, it makes sense to extend the research to a repeated or multi-stage version of the model. In that paper Straffin gives a general description of the one-stage model of the "n-person prisoner's dilemma" and introduces the following basic principles for the construction of the players' payoff functions:

1. The strategy D ("to betray") is dominant for each player.

2. If all players choose the dominant strategy D, the sum of their payoffs is lower than if all of them choose the strategy C ("to stay silent").

Aumann (Aumann, 1959) analyzes the equilibrium behaviour of the players when the number of repetitions of the game is unknown. He provides an example of the construction of a characteristic function for the one-stage three-person prisoner's dilemma game. Moreover, both non-cooperative and partially cooperative behaviour of some coalitions in the model are analysed. Well-known economic problems that can be analysed using the model, such as the "tragedy of the commons", are suggested as examples.

To sum up the results for the finite-stage model, Carroll (Carroll, 1988) introduces a restricted probability function for the continuation of the game to the next stage. He proves that there is a noncooperative Nash equilibrium in the repeated "n-person prisoner's dilemma" with a probability-constrained function, which can be regarded as another type of finite game.

The paper of Petrosjan and Grauer (Petrosjan and Grauer, 2002) examines a repeated game with a discount factor that provides an effective punishment in the infinite version of the game. It contains a theorem establishing a strong Nash equilibrium for a restricted range of the discount factor. It is also proved that the core of the infinitely repeated "n-person prisoner's dilemma" game is nonempty.

This paper studies a new equilibrium principle of behavior in the "n-person prisoner's dilemma" game. We use the new characteristic function (Petrosyan, 2019) to consider different optimality principles in the dynamic "n-person prisoner's dilemma" model, in particular the subcore of the dynamic game (Petrosyan and Pankratova, 2018), which contains the Shapley value.

As a result, we analyze optimality principles for multi-stage games and their time-consistency for a variety of models of the "n-person prisoner's dilemma" game.

2. The Model of "n-Person Prisoner's Dilemma" Game without Differentiation of Players' Relationships

2.1. Model Description for "n-Person Prisoner's Dilemma" Game

Let $\gamma$ be a static game. The set of players is $N$, with cardinality $|N| = n$. Each player has two pure strategies: C ("to stay silent") and D ("to betray"). Suppose that all players have the same impact on each other's payoffs, so that they can be considered symmetric.

Let $x$ be the number of players from the set $N$ that choose the strategy C. The payoff function of player $i$ in the static game $\gamma$ without differentiation of players' relationships is $h_i$; it depends on the strategy of player $i$ and on the number of players from the set $N$ that choose the strategy C, $\forall i \in N$:

$$h_i(x_1, \dots, x_i, \dots, x_n) = \begin{cases} C_i(x) = a_1 x + b_1, & \forall x \in (0, n], \text{ if } x_i = C, \\ D_i(x) = a_2 x + b_2, & \forall x \in [0, n), \text{ if } x_i = D, \end{cases} \qquad (1)$$

where $x$ is the number of players from the set $N$ that choose the strategy C.

The payoff function $h_i(x_1, \dots, x_i, \dots, x_n)$ satisfies the following conditions:

1. $D_i(x - 1) > C_i(x)$, $\forall x \in [1, n]$, i.e. the strategy "to betray" strictly dominates the strategy "to stay silent";

2. $C_i(n) > D_i(0)$, so the strategy profile $(C, \dots, C)$ Pareto dominates $(D, \dots, D)$;

3. $D_i(x) \ge D_i(0)$, $\forall x \in [0, n - 1]$, and $C_i(x) \ge C_i(1)$, $\forall x \in [1, n]$, therefore the payoffs of the players when $x$ players stay silent are not less than when no other player stays silent;

4. $C_i(x) = C_j(x)$ and $D_i(x) = D_j(x)$, which means that the players are symmetric.
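The conditions above can be checked numerically for a given set of coefficients. The following Python sketch is a minimal illustration, assuming the linear payoffs $C_i(x) = a_1 x + b_1$ and $D_i(x) = a_2 x + b_2$ from (1); the function names `c_payoff`, `d_payoff` and `satisfies_conditions` are hypothetical and introduced here only for illustration. Condition 4 (symmetry) holds by construction, since all players share the same coefficients.

```python
def c_payoff(x, a1, b1):
    """Payoff C(x) of a silent player when x players (this one included) stay silent."""
    return a1 * x + b1


def d_payoff(x, a2, b2):
    """Payoff D(x) of a betraying player when x players stay silent."""
    return a2 * x + b2


def satisfies_conditions(n, a1, b1, a2, b2):
    """Check conditions 1-3 of the n-person prisoner's dilemma for linear payoffs."""
    # Condition 1: betraying strictly dominates staying silent.
    dominance = all(
        d_payoff(x - 1, a2, b2) > c_payoff(x, a1, b1) for x in range(1, n + 1)
    )
    # Condition 2: full cooperation is better for each player than full betrayal.
    pareto = c_payoff(n, a1, b1) > d_payoff(0, a2, b2)
    # Condition 3: payoffs do not fall below the values with no other silent player.
    monotone_d = all(d_payoff(x, a2, b2) >= d_payoff(0, a2, b2) for x in range(n))
    monotone_c = all(
        c_payoff(x, a1, b1) >= c_payoff(1, a1, b1) for x in range(1, n + 1)
    )
    return dominance and pareto and monotone_d and monotone_c
```

For instance, with $n = 3$, $a_1 = 2$, $b_1 = 0$, $a_2 = 4$, $b_2 = 4$ the check passes, while raising $a_1$ to 5 violates the dominance condition.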

A considerable amount of literature writes the payoff function in table form. The functional form used here simplifies the calculation of the characteristic function, the construction of the core and the calculation of the Shapley value. Moreover, this form of the payoff function is useful for future research.

2.2. An Effective Punishment for the Infinitely Repeated Game

Denote by $\Gamma$ the infinitely repeated stage game $\gamma$. For this game a strong Nash equilibrium (Petrosjan and Grauer, 2002) is found; it involves a new type of player behaviour that gives the players higher payoffs. This behavior is also stable against deviations of coalitions.

Suppose that each player chooses the strategy C during the first $k_i - k^*$ stages. The sum of their payoffs at each stage equals

$$A(N) = \sum_{i=1}^{n} H_i(C, \dots, C) = \sum_{i=1}^{n} H_i(x).$$

Since the strategy D is strictly dominant at every stage of the "n-person prisoner's dilemma" game by the first property of the model, each of the players tends "to betray", because this increases the player's payoff regardless of the strategies of the other players. Therefore, choosing the strategy D is preferable for each player $i$: when all other players choose the strategy C, the payoff of player $i$ equals

$$V(i) = H_i(x \| x_i) > H_i(x) = A(i).$$

Thus, there should be a "punishment" for "betraying" players to ensure cooperative behaviour throughout the whole game. Choose $k^*$ in such a way that the "betrayal" of any coalition is not advantageous to that coalition, i.e. so that the strategy "to betray" of each player in the coalition significantly reduces the sum of its players' payoffs.

Definition 1. The strong Nash equilibrium in the game $\Gamma_k$ is an n-dimensional strategy profile $x^*(\cdot) = (x_1^*(\cdot), \dots, x_n^*(\cdot))$ which satisfies the inequality

$$\sum_{i \in M} H_i(x^*(\cdot)) \ge \sum_{i \in M} H_i(x^*(\cdot) \| x_M(\cdot))$$

for all $M \subset N$, $x_M(\cdot) \in \prod_{j \in M} X_j$ (Petrosyan and Grauer, 2004).

Theorem 1. The strategy profile $X$ is a strong Nash equilibrium for the infinitely repeated "n-person prisoner's dilemma" type of game, where all players choose the strategy D at stage $k_i$ only if not all players from the set $N$ chose the strategy D at stage $k_i - k^*$, and all players choose the strategy C at stage $k_i$ otherwise.

Proof (of theorem). Construct an equilibrium in the n-person prisoner's dilemma that satisfies the condition of a strong Nash equilibrium, i.e. that is stable with respect to the deviation of any coalition.

Consider $X_i$, a strategy defined as follows: a player plays "against each who betrayed" for the next $k^*$ stages as soon as he receives less than under full cooperation. Since a player's payoff under both strategies D and C increases as more of the other players choose the strategy C, the only way to minimize any player's payoff is to choose the strategy D. Notably, the strategies "against all" and "against each who betrayed" (Petrosyan, 2019) coincide for the "n-person prisoner's dilemma". This means that only the fact of betrayal by some player or coalition matters, not the identity of the betrayer. Thus, let all $n$ players play the following strategy at each stage:

_q _ J D on the stage kq, if 3 j € N : xq-1 = D and 3 I € N : xp1 = D i 1 C, otherwise.

Find a number of stages $k^*$ that guarantees cooperation and "effectively punishes" the coalition that betrayed.

Consider a coalition $S \subset N$ with cardinality $|S| = s$. By our assumption, all $n$ players play the strategy C ("to stay silent") at the initial stages. Therefore, the coalition $S$ gets the following sum of its players' payoffs:

$$A(S) = (a_1 n + b_1) s = a_1 n s + b_1 s.$$

This coalition may try to deviate from the strategy $X_S$ in order to maximize the sum of its players' payoffs. Suppose that the remaining players $N \setminus S$ continue to play the strategy $X_{N \setminus S}$. Then $x_i = C$ for all $i \in N \setminus S$ at the stage of the deviation. When the coalition $S$ deviates from cooperative behavior, the maximum sum of its players' payoffs equals

$$W(S) = \max_{x_S} \sum_{i \in S} h_i(x_S, x_{N \setminus S}).$$

In our case, this is the sum of the payoffs $D_i$ of the $s$ players of $S$ when the other $n - s$ players choose the strategy "to stay silent" (C):

$$W(S) = \sum_{i \in S} D_i(n - s).$$

Since all players are symmetric by the fourth property of the payoffs in the "n-person prisoner's dilemma",

$$W(S) = s D_i(n - s) = s (a_2 (n - s) + b_2) = a_2 n s - a_2 s^2 + b_2 s.$$

Note that only a coalition $S$ of sufficiently small cardinality $s$ can obtain a sum of payoffs larger than under cooperative behavior, i.e. $W(S) > A(S)$:

$$a_2 n s - a_2 s^2 + b_2 s > a_1 n s + b_1 s.$$

Dividing by $s$, the cardinality of a coalition that gains from betrayal satisfies the condition

$$s < \frac{a_2 n + b_2 - a_1 n - b_1}{a_2}.$$

Therefore, it is enough to prove that the "punishment" is effective for coalitions whose cardinality is below this bound.

When all players from the coalition $S$ have chosen the strategy D, at the next stage the players from the coalition $N \setminus S$ choose the strategy "to betray" (D) in accordance with the strategy $X_{N \setminus S}$. In that case, the strategy of the coalition $S$ at this stage is chosen to maximize the sum of the payoffs of the players from $S$:

$$V(S) = \max_{x_S} \sum_{i \in S} h_i(x_S, x_{N \setminus S}).$$

Since $n - s$ players choose the strategy D at this stage, this value can be computed as

$$V(S) = \max \left\{ \sum_{i \in S} D_i(0), \ \sum_{i \in S} C_i(s) \right\}.$$

The symmetry of the players allows us to calculate this value as

$$V(S) = \max \{ s (a_1 s + b_1), \ s b_2 \}.$$

Therefore, at the "punishment" stage the sum of the payoffs of the players from the coalition that betrayed is

$$V(S) = \begin{cases} a_1 s^2 + b_1 s, & \text{if } s > \frac{b_2 - b_1}{a_1}, \\ b_2 s, & \text{otherwise.} \end{cases}$$

Define the number of punishment stages that guarantees cooperative behavior of the players. It must make the total payoff of the coalition's players over the whole game smaller than the payoff they would get if all players from the set $N$ followed the strategy $X_i$ during the whole game:

$$k^* (V(S) - A(S)) < A(S) - W(S),$$

that is, if

$$k^* > \frac{W(S) - A(S)}{A(S) - V(S)},$$

the "punishment" is effective.

Since $V(S)$ depends on the cardinality of the coalition, find $k^*$ for each case.

For a coalition $S$ with at most $\frac{b_2 - b_1}{a_1}$ players the number of stages satisfies

$$k^* (a_1 n + b_1 - b_2) > a_2 n - a_2 s + b_2 - a_1 n - b_1.$$

In this case, if the number of players in the coalition is $s \le \frac{b_2 - b_1}{a_1}$, it is unprofitable for them to deviate during the stages $K - k^*$ when

$$k^* > \frac{a_2 n - a_2 s + b_2 - a_1 n - b_1}{a_1 n + b_1 - b_2}.$$

Next, find the number of "punishment" stages that protects against deviations from cooperative behavior by coalitions with cardinality $\frac{b_2 - b_1}{a_1} < s < \frac{a_2 n + b_2 - a_1 n - b_1}{a_2}$. Recall that coalitions with cardinality $s > \frac{a_2 n + b_2 - a_1 n - b_1}{a_2}$ have no reason to deviate even in the absence of punishment. If the number of stages for the "punishment" satisfies the condition

$$k^* (a_1 n s + b_1 s - a_1 s^2 - b_1 s) > a_2 n s - a_2 s^2 + b_2 s - a_1 n s - b_1 s,$$

then the betrayal is unprofitable for coalitions with cardinality $\frac{b_2 - b_1}{a_1} < s < \frac{a_2 n + b_2 - a_1 n - b_1}{a_2}$. For such cardinalities we get

$$k^* > \frac{a_2 n s - a_2 s^2 + b_2 s - a_1 n s - b_1 s}{a_1 n s - a_1 s^2}.$$

Since on the considered intervals the number of stages required for effective punishment decreases as the cardinality of the coalition increases, the number of stages $k^*$ that makes the punishment effective for any coalition is $\max(k^*) = k^*(1)$:

$$\max(k^*) = \begin{cases} \dfrac{a_2 (n - 1) + b_2 - a_1 n - b_1}{a_1 n + b_1 - b_2}, & \text{if } \frac{b_2 - b_1}{a_1} \ge 1, \\[2ex] \dfrac{a_2 (n - 1) + b_2 - a_1 n - b_1}{a_1 n - a_1}, & \text{otherwise.} \end{cases}$$

□

Example 1. Construct a repeated "three-person prisoner's dilemma" game $\Gamma$ that satisfies the previous assumptions. The game can be written in matrix form, where the first player is the row player and chooses the row C or D of the table, the second player is the column player and the third player is the page player. The order of the payoffs corresponds to the numbering of the players (see Table 1).

Table 1. Three-Person Prisoner's Dilemma (the first player is the row player, the second one is the column player and the third one is the page player)

            Player 3: C                        Player 3: D
        C             D                    C             D
C   (6, 6, 6)     (4, 12, 4)       C   (4, 4, 12)    (2, 8, 8)
D   (12, 4, 4)    (8, 8, 2)        D   (8, 2, 8)     (4, 4, 4)

Consider each player's payoff as a function of the number of players choosing the strategy C:

$$H_i(x_1, x_2, x_3) = \begin{cases} C_i(x) = 2x, & \forall x \in (0, 3], \text{ if } x_i = C \text{ and } x \text{ players choose the strategy C}, \\ D_i(x) = 4x + 4, & \forall x \in [0, 3), \text{ if } x_i = D \text{ and } x \text{ players choose the strategy C}. \end{cases}$$
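The correspondence between the payoff functions and the matrix form can be verified mechanically. The sketch below is an illustration only: it assumes the linear payoffs $C_i(x) = 2x$ and $D_i(x) = 4x + 4$ of this example, and the helper names `stage_payoffs` and `payoff_table` are hypothetical.

```python
from itertools import product


def stage_payoffs(profile, a1=2, b1=0, a2=4, b2=4):
    """Payoffs of all players for a strategy profile such as ('C', 'D', 'C')."""
    x = profile.count("C")  # number of players who stay silent
    return tuple(a1 * x + b1 if s == "C" else a2 * x + b2 for s in profile)


def payoff_table(n=3):
    """Map every strategy profile of n players to the tuple of payoffs."""
    return {profile: stage_payoffs(profile) for profile in product("CD", repeat=n)}
```

Each key of `payoff_table()` is a profile (row player, column player, page player), so the dictionary lists the same eight cells as the matrix form of the game.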

Since $\frac{b_2 - b_1}{a_1} > 1$, the number of stages that guarantees cooperative behavior equals

$$k^* = \frac{4 \cdot (3 - 1) + 4 - 2 \cdot 3 - 0}{2 \cdot 3 + 0 - 4} = 3,$$

therefore, three stages are enough to "effectively punish" the "coalition that betrayed".
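The same computation can be repeated for coalitions of every size. The following sketch assumes the expressions for $A(S)$, $W(S)$ and $V(S)$ derived in the proof of Theorem 1 and integer coefficients; the names `coalition_sums`, `punishment_bound` and `worst_case_bound` are hypothetical helpers, not part of the original model.

```python
from fractions import Fraction


def coalition_sums(n, s, a1, b1, a2, b2):
    """Return (A, W, V) for a coalition of s symmetric players."""
    cooperative = s * (a1 * n + b1)            # A(S): everyone stays silent
    deviation = s * (a2 * (n - s) + b2)        # W(S): coalition betrays, others silent
    punished = max(s * (a1 * s + b1), s * b2)  # V(S): others betray as punishment
    return cooperative, deviation, punished


def punishment_bound(n, s, a1, b1, a2, b2):
    """Ratio (W - A) / (A - V); zero when the coalition gains nothing by deviating."""
    cooperative, deviation, punished = coalition_sums(n, s, a1, b1, a2, b2)
    if deviation <= cooperative:
        return Fraction(0)
    if cooperative <= punished:
        raise ValueError("the punishment does not reduce the coalition's payoff")
    return Fraction(deviation - cooperative, cooperative - punished)


def worst_case_bound(n, a1, b1, a2, b2):
    """Largest ratio over all coalition sizes 1..n."""
    return max(punishment_bound(n, s, a1, b1, a2, b2) for s in range(1, n + 1))
```

With the coefficients of Example 1, `worst_case_bound(3, 2, 0, 4, 4)` is attained by a single deviating player, in line with $\max(k^*) = k^*(1)$.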

Thus, the new Nash equilibrium is the profile of strategies $X_i$ for all players. Moreover, it is a strong Nash equilibrium.

For this multi-stage game $\Gamma$ we have found the number of stages $k^*$ that guarantees that deviation from cooperative behavior is disadvantageous for any coalition.

2.3. The Model of Cooperation in the Dynamic Game

Suppose that there is a finite number $f$ of "n-person prisoner's dilemma" games: $(\gamma_1, \dots, \gamma_f)$. At each of an infinite number of stages one of these games is realized. Define the number of stages $k^*$ that is necessary for the "effective punishment" of the "coalition that betrayed" in the resulting game $\Gamma_f$.

The payoff function in each game $\gamma_j$ of the set of $f$ games is defined as

$$h_i^{\gamma_j}(x_1, \dots, x_i, \dots, x_n) = \begin{cases} C_i^{\gamma_j}(x^{\gamma_j}) = a_1^{\gamma_j} x^{\gamma_j} + b_1^{\gamma_j}, & \forall x^{\gamma_j} \in (0, n], \text{ if } x_i^{\gamma_j} = C, \\ D_i^{\gamma_j}(x^{\gamma_j}) = a_2^{\gamma_j} x^{\gamma_j} + b_2^{\gamma_j}, & \forall x^{\gamma_j} \in [0, n), \text{ if } x_i^{\gamma_j} = D, \end{cases}$$

where $x^{\gamma_j}$ is the number of players that choose the strategy C.

The total payoff of each player $i$, $H_i(X_1, \dots, X_i, \dots, X_n)$, is calculated as the sum of the payoffs at each stage of the game $\Gamma_f$.

Construct the gain from deviation for a coalition $S$ in the game with these coefficients as the difference between its payoff under deviating behavior and its payoff on the cooperative trajectory:

$$W^{\gamma_j}(S) - A^{\gamma_j}(S) = a_2^{\gamma_j} n s - a_2^{\gamma_j} s^2 + b_2^{\gamma_j} s - a_1^{\gamma_j} n s - b_1^{\gamma_j} s.$$

The maximum gain from betraying can be calculated as

$$\max_{s \in [1, n]} \frac{(a_2^{\gamma_j} (n - s) + b_2^{\gamma_j}) s - (a_1^{\gamma_j} n + b_1^{\gamma_j}) s}{s}.$$

This is the gain of a coalition $S$ with cardinality $|S| = 1$, since the payoff increases as the number of players with the strategy C increases.

The effect of the punishment equals the difference between the payoff under full cooperation and the guaranteed payoff of the deviating coalition:

$$A^{\gamma_j}(S) - V^{\gamma_j}(S) = a_1^{\gamma_j} n s + b_1^{\gamma_j} s - \max \{ a_1^{\gamma_j} s^2 + b_1^{\gamma_j} s; \ b_2^{\gamma_j} s \},$$

but it is necessary to keep in mind that these payoffs arise at different stages of the game: the stage of "punishment" $j$ satisfies $j > j^*$.

The punishment is least effective when the difference between the payoffs under deviating and cooperative behaviour is maximal at the stage of deviation,

$$\max_{j \in [1, f]} (W^{\gamma_j}(1) - A^{\gamma_j}(1)),$$

and the difference between the singleton's payoff under cooperative behavior of all players and the payoff of this player at the punishment stage is minimal,

$$\min_{j \in [1, f]} (A^{\gamma_j}(1) - V^{\gamma_j}(1)).$$

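In this case, we can find the number of stages for the "effective punishment" such that the gain from the deviation is less than the loss from the punishment during the $k^*$ stages, i.e.

$$\max_{j \in [1, f]} (W^{\gamma_j}(1) - A^{\gamma_j}(1)) < k^* \min_{j \in [1, f]} (A^{\gamma_j}(1) - V^{\gamma_j}(1)).$$

In the given game, this holds if

$$\max_{j \in [1, f]} \left( (a_2^{\gamma_j} n - a_2^{\gamma_j} + b_2^{\gamma_j}) - (a_1^{\gamma_j} n + b_1^{\gamma_j}) \right) < k^* \min_{i \in [1, f]} \left( (a_1^{\gamma_i} n - a_1^{\gamma_i} + b_1^{\gamma_i}) - b_1^{\gamma_i} \right).$$

Therefore, no coalition tends to deviate from cooperative behavior if the number of stages for the "punishment" satisfies

$$k^* > \frac{\max_{j \in [1, f]} \left( a_2^{\gamma_j} n - a_2^{\gamma_j} + b_2^{\gamma_j} - a_1^{\gamma_j} n - b_1^{\gamma_j} \right)}{\min_{i \in [1, f]} \left( a_1^{\gamma_i} n - a_1^{\gamma_i} \right)}.$$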
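The worst-case reasoning of this subsection (largest gain of a single deviator over all games divided by the smallest punishment loss over all games) can be sketched as follows. This is an illustration under stated assumptions: each game is given by a hypothetical tuple `(a1, b1, a2, b2)` of integer coefficients, the punishment payoff of a singleton is taken as $V(1) = \max\{a_1 + b_1; b_2\}$ from the general expression for $V(S)$, and the function names are introduced here only for this example.

```python
from fractions import Fraction


def singleton_gain(n, a1, b1, a2, b2):
    """W(1) - A(1): gain of one player who betrays while the others stay silent."""
    return (a2 * (n - 1) + b2) - (a1 * n + b1)


def singleton_loss(n, a1, b1, a2, b2):
    """A(1) - V(1): per-stage loss of that player while being punished."""
    return (a1 * n + b1) - max(a1 + b1, b2)


def punishment_bound(n, games):
    """Lower bound on k*: largest gain over all games / smallest loss over all games."""
    worst_gain = max(singleton_gain(n, *game) for game in games)
    smallest_loss = min(singleton_loss(n, *game) for game in games)
    if smallest_loss <= 0:
        raise ValueError("the punishment is not costly in at least one game")
    return Fraction(worst_gain, smallest_loss)
```

Because the sketch keeps the maximum inside $V(1)$, its denominator can differ from the closed form $a_1 n - a_1$ whenever $b_2 > a_1 + b_1$.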

2.4. The Core of the "n-Person Prisoner's Dilemma"

Consider a dynamic game $\Gamma_f$ which is played during $K_f$ stages. This game consists of a set of $f$ static games $(\gamma_1, \dots, \gamma_f)$, each of which can be described by the model of the "n-person prisoner's dilemma". The games are realised with probabilities $p_1, \dots, p_f$ at each stage of the game $\Gamma_f$, $\sum_{j=1}^{f} p_j = 1$.

The payoff functions for each possible static game are:

$$h_i^{\gamma_j}(x_1, \dots, x_i, \dots, x_n) = \begin{cases} C_i^{\gamma_j}(x^{\gamma_j}) = a_1^{\gamma_j} x^{\gamma_j} + b_1^{\gamma_j}, & \forall x^{\gamma_j} \in (0, n], \text{ if } x_i^{\gamma_j} = C, \\ D_i^{\gamma_j}(x^{\gamma_j}) = a_2^{\gamma_j} x^{\gamma_j} + b_2^{\gamma_j}, & \forall x^{\gamma_j} \in [0, n), \text{ if } x_i^{\gamma_j} = D, \end{cases}$$

where $x^{\gamma_j}$ is the number of players who play the strategy C.

Let

$$V^{\gamma_j}(N) = \max_{x_1, \dots, x_i, \dots, x_n} \sum_{i \in N} h_i^{\gamma_j}(x_1, \dots, x_i, \dots, x_n)$$

be equal for all $\gamma_j$, $j \in [1, f]$.

Definition 2. The core of the game $\Gamma_f$ is the set of allocations $(\alpha_1, \dots, \alpha_n)$ which satisfy the following conditions:

1. individual rationality: $\alpha_i \ge V^{\Gamma_f}(i)$, $\forall i \in N$;

2. coalitional rationality: $\sum_{i \in S} \alpha_i \ge V^{\Gamma_f}(S)$, $\forall S \subset N$;

3. efficiency: $\sum_{i \in N} \alpha_i = V^{\Gamma_f}(N)$.

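The value of the characteristic function for each individual player at each stage of $\Gamma_f$ equals

$$V^{\gamma_j}(i) = D_i^{\gamma_j}(0) = b_2^{\gamma_j}.$$

Suppose that $S$ is a coalition, $S \subset N$, $|S| = s$. It can guarantee the payoff

$$V^{\gamma_j}(S) = \max_{r \in [0, s]} \left( r (a_1^{\gamma_j} r + b_1^{\gamma_j}) + (s - r)(a_2^{\gamma_j} r + b_2^{\gamma_j}) \right), \quad \forall S \subset N.$$

$V^{\gamma_j}(N)$ is the same in all games $(\gamma_1, \dots, \gamma_f)$ and is equal to

$$V^{\gamma_j}(N) = \max_{s \in [0, n]} \left( s (a_1^{\gamma_j} s + b_1^{\gamma_j}) + (n - s)(a_2^{\gamma_j} s + b_2^{\gamma_j}) \right).$$

So the core of the game $\Gamma_f$ is the set of allocations that satisfy the following conditions:

$$\sum_{i=1}^{n} \alpha_i = K_f \sum_{j=1}^{f} p_j \max_{s \in [0, n]} \left( a_1^{\gamma_j} s^2 + b_1^{\gamma_j} s + a_2^{\gamma_j} n s - a_2^{\gamma_j} s^2 + b_2^{\gamma_j} n - b_2^{\gamma_j} s \right),$$

$$\sum_{i \in S} \alpha_i \ge K_f \sum_{j=1}^{f} p_j \max_{r \in [0, s]} \left( a_1^{\gamma_j} r^2 + b_1^{\gamma_j} r + a_2^{\gamma_j} s r - a_2^{\gamma_j} r^2 + b_2^{\gamma_j} s - b_2^{\gamma_j} r \right).$$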

Definition 3. Define $W(S)$, $S \subset N$, as follows: $W(S) = \max_j V(\gamma_j, S)$. Denote by $D(\gamma_j)$ the set of imputations $\alpha^{\gamma_j} = (\alpha_1^{\gamma_j}, \dots, \alpha_n^{\gamma_j})$ in $\Gamma_f$ satisfying the conditions

$$\sum_{i \in S} \alpha_i^{\gamma_j} \ge W(S), \ S \subset N, \ S \ne N, \qquad \sum_{i \in N} \alpha_i^{\gamma_j} = V(\gamma_j, N),$$

where $V(\gamma_j, N)$ is the maximum sum of the players' payoffs in the game $\Gamma_f$ (Petrosyan and Pankratova, 2018).

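Definition 4. The set $D(\gamma_1)$ is called strongly time-consistent in $\Gamma_f$ if

1. $D(\gamma_{i+1}) \ne \emptyset$, $j \in [1, f]$;

2. there exists an imputation distribution procedure (see Petrosyan, 1993) $\beta = (\beta_1, \dots, \beta_j, \dots, \beta_k)$ such that $D(\gamma_1) \supset \sum_{j=1}^{i} \beta_j \oplus D(\gamma_{i+1})$ for all allocations $\alpha^{\gamma_1} \in D(\gamma_1)$.

Here the sign $\oplus$ means

$$\beta_j \oplus D(\gamma_{i+1}) = \{ \beta_j \oplus d(\gamma_{i+1}) : d(\gamma_{i+1}) \in D(\gamma_{i+1}) \}$$

(Petrosyan and Grauer, 2004).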

Theorem 2 (Time-Consistent Subset of the Core). The 3-person game $\Gamma_f$ has a time-consistent subset of the core $D$, which can be described by the following conditions:

$$\alpha_i^D \ge K_f \max_{j \in [1, f]} b_2^{\gamma_j},$$

$$\alpha_i^D + \alpha_j^D \ge K_f \max \left\{ \max_{j \in [1, f]} (4 a_1^{\gamma_j} + 2 b_1^{\gamma_j}); \ \max_{j \in [1, f]} (a_1^{\gamma_j} + b_1^{\gamma_j} + a_2^{\gamma_j} + b_2^{\gamma_j}); \ \max_{j \in [1, f]} (2 b_2^{\gamma_j}) \right\},$$

$$\alpha_i^D + \alpha_j^D + \alpha_k^D = K_f \max_{j \in [1, f]} (9 a_1^{\gamma_j} + 3 b_1^{\gamma_j}), \quad \forall i, j, k \in N.$$

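Proof (of theorem). Since the dominant strategy for each player is D ("to betray"), the value of the characteristic function for a singleton in the game $\gamma_j$, $\forall j \in [1, f]$, is

$$V^{\gamma_j}(i) = H_i^{\gamma_j}(D, D, D) = b_2^{\gamma_j}, \quad \forall i \in N.$$

All players are symmetric, so the value of the characteristic function for a two-person coalition is the maximum of the three sums $\sum_{i \in S} H_i^{\gamma_j}(D, D, D)$, $\sum_{i \in S} H_i^{\gamma_j}(C, D, D)$ and $\sum_{i \in S} H_i^{\gamma_j}(C, C, D)$, where $S = \{1, 2\}$. Therefore,

$$V^{\gamma_j}(1, 2) = V^{\gamma_j}(1, 3) = V^{\gamma_j}(2, 3) = \max \left( 4 a_1^{\gamma_j} + 2 b_1^{\gamma_j}; \ a_1^{\gamma_j} + b_1^{\gamma_j} + a_2^{\gamma_j} + b_2^{\gamma_j}; \ 2 b_2^{\gamma_j} \right).$$

Since the strategy profile $(C, C, C)$ is more effective than $(D, D, D)$,

$$V^{\gamma_j}(N) = \max \left\{ 9 a_1^{\gamma_j} + 3 b_1^{\gamma_j}; \ 4 a_1^{\gamma_j} + 2 b_1^{\gamma_j} + 2 a_2^{\gamma_j} + b_2^{\gamma_j}; \ a_1^{\gamma_j} + b_1^{\gamma_j} + 2 a_2^{\gamma_j} + 2 b_2^{\gamma_j} \right\}.$$

Assume that

$$W(S) = \max_{j \in [1, f]} V(\gamma_j, S), \quad S \subset N.$$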

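Define $\gamma_j$ as the subgame of $\Gamma_f$ with the starting point $j$, where $j \in [1, K_f]$. Thus, the subgame $\gamma_1$ coincides with $\Gamma_f$.

Next define a new characteristic function $W(\gamma_j, S)$:

$$W(\gamma_j, S) = (K_f - j + 1) W(S),$$

where $j \in [1, K_f]$.

Since $V^{\gamma_j}(N) = \max_{x_1, \dots, x_i, \dots, x_n} \sum_{i \in N} H_i^{\gamma_j}(x_1, \dots, x_i, \dots, x_n)$ is equal for all $\gamma_j$, $j \in [1, f]$,

$$\alpha_i^D \ge K_f \max_{j \in [1, f]} b_2^{\gamma_j},$$

$$\alpha_i^D + \alpha_j^D \ge W(\gamma_1, S), \text{ where } |S| = 2 \text{ and}$$

$$W(\gamma_1, S) = K_f \max \left\{ \max_{j \in [1, f]} (4 a_1^{\gamma_j} + 2 b_1^{\gamma_j}); \ \max_{j \in [1, f]} (a_1^{\gamma_j} + b_1^{\gamma_j} + a_2^{\gamma_j} + b_2^{\gamma_j}); \ \max_{j \in [1, f]} (2 b_2^{\gamma_j}) \right\},$$

$$\sum_{i \in N} \alpha_i^D = K_f \max \left\{ 9 a_1^{\gamma_j} + 3 b_1^{\gamma_j}; \ 4 a_1^{\gamma_j} + 2 b_1^{\gamma_j} + 2 a_2^{\gamma_j} + b_2^{\gamma_j}; \ a_1^{\gamma_j} + b_1^{\gamma_j} + 2 a_2^{\gamma_j} + 2 b_2^{\gamma_j} \right\}, \quad \forall j \in [1, f].$$

These inequalities prove that $D$ is a subset of the core.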

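Since $W(N)$ is equal for all $\gamma_j$ in $\Gamma_f$, $W(\gamma_1, N) = K_f W(N)$.

Therefore, we can define an imputation distribution procedure as

$$\beta_i = \frac{\alpha_i^{\gamma_1}}{K_f}.$$

Then, for all subgames of $\Gamma_f$,

$$\beta_i^j = \frac{\alpha_i^{\gamma_j}}{K_f - j + 1},$$

where $j \in [1, K_f]$.

The allocation $\alpha$ can be written using the IDP (Petrosyan, 1993), which gives us

$$\sum_{j \in [1, K_f]} \sum_{i \in S} \frac{\alpha_i^{\gamma_j}}{K_f - j + 1} = \sum_{j=1}^{l} \sum_{i \in S} \frac{\alpha_i^{\gamma_j}}{K_f - j + 1} + \sum_{j=l+1}^{K_f} \sum_{i \in S} \frac{\alpha_i^{\gamma_j}}{K_f - j + 1} \ge l \, W(S) + (K_f - l) W(S) = K_f W(S).$$

Hence, we have constructed the strongly time-consistent D-subset of the core.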

□

Example 2 (D-subset of the core).

Consider a dynamic 3-person prisoner's dilemma $\Gamma_f$, where $V^{\gamma_j}(N)$ are equal for all $j \in [1, f]$. Let $K_f = 5$, $f = 3$.

Table 2. Game $\gamma_1$ (the 1st is the row player, the 2nd is the column player and the 3rd is the page player)

            Player 3: C                              Player 3: D
        C                 D                      C                 D
C   (100, 100, 100)   (90, 115, 90)      C   (90, 90, 115)    (80, 100, 100)
D   (115, 90, 90)     (100, 100, 80)     D   (100, 80, 100)   (85, 85, 85)

Table 3. Game $\gamma_2$ (the 1st is the row player, the 2nd is the column player and the 3rd is the page player)

            Player 3: C                              Player 3: D
        C                 D                      C                 D
C   (100, 100, 100)   (50, 105, 50)      C   (50, 50, 105)    (0, 90, 90)
D   (105, 50, 50)     (90, 90, 0)        D   (90, 0, 90)      (75, 75, 75)

Table 4. Game $\gamma_3$ (the 1st is the row player, the 2nd is the column player and the 3rd is the page player)

            Player 3: C                              Player 3: D
        C                 D                      C                 D
C   (100, 100, 100)   (90, 110, 90)      C   (90, 90, 110)    (80, 98, 98)
D   (110, 90, 90)     (98, 98, 80)       D   (98, 80, 98)     (86, 86, 86)

The values of the characteristic functions of the games $(\gamma_1, \gamma_2, \gamma_3)$ for each coalition are:

Table 5. The values of the characteristic functions of the games $\gamma_1$-$\gamma_3$

S                {1}   {2}   {3}   {1, 2}   {1, 3}   {2, 3}   {1, 2, 3}
V^{\gamma_1}     85    85    85    180      180      180      300
V^{\gamma_2}     75    75    75    150      150      150      300
V^{\gamma_3}     86    86    86    180      180      180      300
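The entries of Table 5 can be reproduced from the expression $V^{\gamma_j}(S) = \max_{r \in [0, s]} (r (a_1 r + b_1) + (s - r)(a_2 r + b_2))$. The sketch below is illustrative: the coefficient tuples in `GAMES` are an assumption, read off from Tables 2-4 by fitting linear payoffs (for $\gamma_1$, for example, $C(1), C(2), C(3) = 80, 90, 100$ and $D(0), D(1), D(2) = 85, 100, 115$), and the function names are hypothetical.

```python
def coalition_value(s, a1, b1, a2, b2):
    """Best guaranteed total payoff of s symmetric players when all outsiders betray."""
    # r members stay silent, s - r members betray; only the r members count as silent.
    return max(r * (a1 * r + b1) + (s - r) * (a2 * r + b2) for r in range(s + 1))


# Assumed linear coefficients (a1, b1, a2, b2) fitted to Tables 2-4.
GAMES = {
    "gamma_1": (10, 70, 15, 85),
    "gamma_2": (50, -50, 15, 75),
    "gamma_3": (10, 70, 12, 86),
}


def characteristic_table(games=GAMES, n=3):
    """Values V(S) for |S| = 1..n in every game; symmetry makes the size sufficient."""
    return {
        name: [coalition_value(s, *coefficients) for s in range(1, n + 1)]
        for name, coefficients in games.items()
    }
```

Because the players are symmetric, one value per coalition size is enough; the three columns for two-player coalitions in Table 5 coincide.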


Then we construct the D-subset of the core for the game $\Gamma_f$:

$$\alpha_i \ge 430, \quad \forall i \in N;$$

$$\alpha_i + \alpha_j \ge 900, \quad \forall i, j \in N, \ i \ne j;$$

$$\alpha_i + \alpha_j + \alpha_k = 1500, \quad \forall i, j, k \in N, \ i \ne j \ne k.$$

Consequently, the D-subset of the core for this dynamic 3-person prisoner's dilemma contains imputations such as (430, 470, 600), (500, 500, 500), (430, 535, 535), etc.
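Membership in this set is easy to test. The following sketch is a minimal illustration that hard-codes the bounds of this example (430 for single players, 900 for pairs, 1500 in total); the names `BOUNDS`, `TOTAL` and `in_d_subset` are hypothetical.

```python
from itertools import combinations

# Lower bounds on the sum of shares by coalition size, taken from Example 2.
BOUNDS = {1: 430, 2: 900}
TOTAL = 1500


def in_d_subset(allocation, bounds=BOUNDS, total=TOTAL):
    """Return True if the allocation satisfies every inequality of the D-subset."""
    if sum(allocation) != total:
        return False
    players = range(len(allocation))
    for size, bound in bounds.items():
        for members in combinations(players, size):
            if sum(allocation[i] for i in members) < bound:
                return False
    return True
```

The three imputations listed above pass this check, while an allocation such as (400, 500, 600) fails the individual bound.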

2.5. The Shapley Value of Dynamic "n-Person Prisoner's Dilemma"

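Definition 5. The Shapley value for $\Gamma_f$ is the imputation $(Sh_1^{\Gamma_f}, \dots, Sh_n^{\Gamma_f})$ of the payoff $V^{\Gamma_f}(N)$ such that

$$Sh_i^{\Gamma_f} = \sum_{S \subset N, \, i \in S} \frac{(s - 1)! (n - s)!}{n!} \left[ V^{\Gamma_f}(S) - V^{\Gamma_f}(S \setminus \{i\}) \right]$$

(Shapley, 1953).

It is well known that the Shapley value satisfies:

1) efficiency: $\sum_{i \in N} Sh_i^{\Gamma_f}(V^{\Gamma_f}) = V^{\Gamma_f}(N)$;

2) symmetry: if players $i$ and $j$ are symmetric with respect to $V^{\Gamma_f}$, then $Sh_i^{\Gamma_f}(V) = Sh_j^{\Gamma_f}(V)$;

3) additivity: for two games $V^{\Gamma_f}$ and $W^{\Gamma_f}$, $Sh_i^{\Gamma_f}(V^{\Gamma_f}) + Sh_i^{\Gamma_f}(W^{\Gamma_f}) = Sh_i^{\Gamma_f}(V^{\Gamma_f} + W^{\Gamma_f})$;

4) null player: if $V^{\Gamma_f}(S \cup \{i\}) - V^{\Gamma_f}(S) = 0$, $\forall S \subset N \setminus \{i\}$, then $Sh_i^{\Gamma_f}(V^{\Gamma_f}) = 0$.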
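The definition can be evaluated directly by enumerating coalitions. The sketch below is a generic illustration of the Shapley formula, not a method specific to this paper; `shapley_value` and `symmetric_game` are hypothetical helper names, and the worth of the empty coalition is assumed to be zero.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_value(n, value):
    """Shapley value of an n-player game; `value` maps a frozenset of players to its worth."""
    players = range(n)
    result = []
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for size in range(n):  # size of the coalition S \ {i}
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for subset in combinations(others, size):
                without_i = frozenset(subset)
                total += weight * (value(without_i | {i}) - value(without_i))
        result.append(total)
    return result


def symmetric_game(worth_by_size):
    """Characteristic function that depends only on the coalition size."""
    return lambda coalition: worth_by_size[len(coalition)]
```

For a symmetric three-player game with worths 85, 180 and 300 for coalitions of size one, two and three, `shapley_value(3, symmetric_game({0: 0, 1: 85, 2: 180, 3: 300}))` gives each player an equal share of 300, as the efficiency and symmetry axioms require.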

Consider now a generalisation of the finitely repeated game $\Gamma_f$ in which $V^{\gamma_j}(N)$ may differ between stages. Define the values of the characteristic functions of the games $\gamma_j$, $j \in [1, f]$, for the coalition $N$:

$$V^{\gamma_j}(N) = \max \sum_{i \in N} H_i^{\gamma_j}(x_1, \dots, x_i, \dots, x_n).$$

Assume that $S_j$ is a coalition, $|S_j| = s_j$, whose players select the strategy D. We call this coalition the deviating coalition. Then $|N \setminus S_j| = n - s_j$.

The maximum of the characteristic function for the game $\gamma_j$ is achieved when the benefit from the deviation of each additional player,

$$D^{\gamma_j}(x = n - (s_j + 1)) - C^{\gamma_j}(x = n - s_j) = \left( a_2^{\gamma_j} (n - s_j - 1) + b_2^{\gamma_j} \right) - \left( a_1^{\gamma_j} (n - s_j) + b_1^{\gamma_j} \right),$$

is less than the total loss of all other players, including those who have already deviated,

$$\sum_{i=1}^{n - s_j - 1} \left( C^{\gamma_j}(x^*) - C^{\gamma_j}(x^{**}) \right) + \sum_{i=1}^{s_j} \left( D^{\gamma_j}(x^*) - D^{\gamma_j}(x^{**}) \right),$$

where $x^* = n - s_j$ and $x^{**} = n - s_j - 1$.

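We now find the number of players in the deviating coalition $S_j$ that gives the maximum value of the characteristic function for each of the $f$ possible realizations of $\gamma_j$ in $\Gamma_f$:

$$a_1^{\gamma_j} n - a_1^{\gamma_j} s_j - a_1^{\gamma_j} + a_2^{\gamma_j} s_j > a_2^{\gamma_j} n - a_2^{\gamma_j} s_j - a_2^{\gamma_j} + b_2^{\gamma_j} - a_1^{\gamma_j} n + a_1^{\gamma_j} s_j - b_1^{\gamma_j}.$$

Then the number of deviating players is

$$s_j = \frac{(2 a_1^{\gamma_j} - a_2^{\gamma_j}) n + (a_2^{\gamma_j} - b_2^{\gamma_j} + b_1^{\gamma_j} - a_1^{\gamma_j})}{2 a_1^{\gamma_j} - 2 a_2^{\gamma_j}}, \quad \forall \gamma_j \in \Gamma_f.$$

Therefore, the values of the characteristic functions at each stage of the game $\Gamma_f$ can be written as

$$V^{\gamma_j}(N) = \left( a_1^{\gamma_j} (n - s_j) + b_1^{\gamma_j} \right)(n - s_j) + \left( a_2^{\gamma_j} (n - s_j) + b_2^{\gamma_j} \right) s_j,$$

where $s_j = \frac{(2 a_1^{\gamma_j} - a_2^{\gamma_j}) n + (a_2^{\gamma_j} - b_2^{\gamma_j} + b_1^{\gamma_j} - a_1^{\gamma_j})}{2 a_1^{\gamma_j} - 2 a_2^{\gamma_j}}$, $j \in [1, f]$.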

By the efficiency and symmetry axioms, the expected value of the characteristic function of $\Gamma_f$ for the grand coalition is shared equally:

$$Sh_i(V^{\Gamma_f}) = \frac{K_f}{n} \sum_{j=1}^{f} \left( (a_1^{\gamma_j} - a_2^{\gamma_j}) s_j^2 + (a_2^{\gamma_j} n + b_2^{\gamma_j} - 2 a_1^{\gamma_j} n - b_1^{\gamma_j}) s_j + a_1^{\gamma_j} n^2 + b_1^{\gamma_j} n \right) p_j$$

for $\forall i \in N$, where $s_j = \frac{(2 a_1^{\gamma_j} - a_2^{\gamma_j}) n + (a_2^{\gamma_j} - b_2^{\gamma_j} + b_1^{\gamma_j} - a_1^{\gamma_j})}{2 a_1^{\gamma_j} - 2 a_2^{\gamma_j}}$.

Furthermore, at each stage of $\Gamma_f$ the Shapley value is equal to

$$Sh_i(V^{\gamma_j}) = \frac{1}{n} \sum_{j=1}^{f} \left( (a_1^{\gamma_j} - a_2^{\gamma_j}) s_j^2 + (a_2^{\gamma_j} n + b_2^{\gamma_j} - 2 a_1^{\gamma_j} n - b_1^{\gamma_j}) s_j + a_1^{\gamma_j} n^2 + b_1^{\gamma_j} n \right) p_j$$

for $\forall i \in N$, with the same $s_j$. It does not change during the transition from one stage of the game to the next, given that the probability of each possible game $(\gamma_1, \dots, \gamma_f)$ remains the same at all stages. Accordingly, the Shapley value of this game is time-consistent and belongs to the D-subset of the core.
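The polynomial inside the sum is the expansion of $V^{\gamma_j}(N) = (a_1 (n - s_j) + b_1)(n - s_j) + (a_2 (n - s_j) + b_2) s_j$ in powers of $s_j$. The sketch below checks this expansion numerically and computes the equal share; it is an illustration under assumptions: the number of deviators `s` is passed in explicitly for each game, the probabilities are supplied by the caller, and all function names are hypothetical.

```python
from fractions import Fraction


def grand_value(n, s, a1, b1, a2, b2):
    """Total payoff of all n players when s of them betray and n - s stay silent."""
    x = n - s
    return (a1 * x + b1) * x + (a2 * x + b2) * s


def grand_value_expanded(n, s, a1, b1, a2, b2):
    """The same quantity written as a polynomial in s."""
    return (a1 - a2) * s**2 + (a2 * n + b2 - 2 * a1 * n - b1) * s + a1 * n**2 + b1 * n


def expected_equal_share(n, stages, games, probabilities):
    """Equal split of the expected grand-coalition value over `stages` stages.

    Each game is a tuple (a1, b1, a2, b2, s) with s the number of deviators.
    """
    expected = sum(
        Fraction(p) * grand_value(n, s, a1, b1, a2, b2)
        for (a1, b1, a2, b2, s), p in zip(games, probabilities)
    )
    return Fraction(stages, n) * expected
```

For three equally likely games in which full cooperation is optimal ($s_j = 0$) and the grand coalition earns 300 per stage, five stages give each of the three players a share of 500, which matches the symmetric imputation (500, 500, 500) of Example 2.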

3. The Model of "n-Person Prisoner's Dilemma" on the Network

3.1. The Description of "n-Person Prisoner's Dilemma" on the Network

Let M be the network. Its nodes represent players of the n-person prisoner's dilemma game. The path from i to j is a sequence of players connected by the edges of the network M. Then the path length is the number of edges on the path from i to j. If the path from i to j contains the minimal possible number of edges, then it is the shortest path from i to j. Thus, the distance between players i and j is the length of the shortest path from i to j.
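Distances of this kind are usually computed with breadth-first search. The following sketch assumes that the network is stored as a dictionary mapping each player to the players adjacent to it; this representation and the name `distances_from` are choices made here for illustration.

```python
from collections import deque


def distances_from(network, source):
    """Shortest-path distance (number of edges) from `source` to every reachable player."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in network[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance
```

Players that cannot be reached from `source` are simply absent from the returned dictionary.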

Let $\gamma_M$ be the static noncooperative n-person prisoner's dilemma network game. We denote the set of all players by $N$. Each of them has two possible pure strategies:

— the strategy "C" means "to stay silent";

— the strategy "D" means "to betray".

Therefore, the set of pure strategies of each player in the static n-person prisoner's dilemma network game can be represented as $X_i = \{C, D\}$, $\forall i \in N$.

Let $x_{m,S}$ be the number of players from the set $S$ for which the following conditions are fulfilled:

— They use the strategy "C";

— The distance between them and the player i equals to m.

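The payoff function $h_i(x_1, \dots, x_i, \dots, x_n)$ of player $i$ in the n-person prisoner's dilemma network game, given by (2), depends entirely on the player's strategy and the numbers $x_{m,N}$:

$$h_i(x_1, \dots, x_i, \dots, x_n) = \begin{cases} \left( \sum_{m=0}^{\infty} a_1 \delta^m x_{m,N} \right) + b_1, & \text{if } x_i = C; \\ \left( \sum_{m=1}^{\infty} a_2 \delta^m x_{m,N} \right) + b_2, & \text{if } x_i = D. \end{cases} \qquad (2)$$

Hereinafter, the parameters $a_1, a_2, b_1, b_2, \delta$ are the same for all players. The payoff function of the n-person prisoner's dilemma network game meets the following conditions:

1. $\left( \sum_{m=0}^{\infty} a_1 \delta^m x_{m,N} \right) + b_1 < \left( \sum_{m=1}^{\infty} a_2 \delta^m x_{m,N} \right) + b_2$, $\forall i \in N$, so the strategy D strictly dominates the strategy C;

2. $\sum_{i \in N} \left( \left( \sum_{m=0}^{\infty} a_1 \delta^m x_{m,N} \right) + b_1 \right) > \sum_{i \in N} b_2$, where $x_{m,N}$ corresponds to the case when all players from the set $N$ choose the strategy C. This inequality shows that joint "silence" brings a larger sum of all players' payoffs than joint "defection";

3. $\left( \sum_{m=0}^{\infty} a_1 \delta^m x_{m,N} \right) + b_1 \ge b_1$ and $\left( \sum_{m=1}^{\infty} a_2 \delta^m x_{m,N} \right) + b_2 \ge b_2$, so the "defection" of any other player makes the payoff of player $i$ lower.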

This type of "n-person prisoner's dilemma" covers a wider class of games; the previous game without differentiation of players' relationships can be represented as a game on the complete network.
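The network payoff can be sketched in a few lines. The code below is an illustration under assumptions: the network is a dictionary of adjacency lists, the strategy profile is a dictionary from players to "C" or "D", the weight of a silent player at distance $m$ is $\delta^m$, and unreachable players contribute nothing; the names `distances_from` and `network_payoff` are hypothetical.

```python
from collections import deque


def distances_from(network, source):
    """Breadth-first search: number of edges from `source` to each reachable player."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in network[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def network_payoff(network, profile, i, a1, b1, a2, b2, delta):
    """Payoff of player i: silent players at distance m are weighted by delta ** m."""
    distance = distances_from(network, i)
    if profile[i] == "C":
        # The sum starts at m = 0, so the silent player i counts itself.
        weighted = sum(delta**m for j, m in distance.items() if profile[j] == "C")
        return a1 * weighted + b1
    # The sum starts at m = 1: only the other silent players matter.
    weighted = sum(
        delta**m for j, m in distance.items() if j != i and profile[j] == "C"
    )
    return a2 * weighted + b2
```

On the complete network with $\delta = 1$ every other player is at distance one, so the payoffs reduce to $a_1 x + b_1$ and $a_2 x + b_2$; with three players and the coefficients $a_1 = 2$, $b_1 = 0$, $a_2 = 4$, $b_2 = 4$ this reproduces the entries of Table 1.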

3.2. An Effective Punishment for "n-Person Prisoner's Dilemma" on the Network

Let $\Gamma_M$ be the infinitely repeated prisoner's dilemma game $\gamma_M$ on the network $M$. The payoff of player $i$ in the game $\Gamma_M$ equals the sum of its payoffs in all stage games.

It can be seen that the subgame perfect Nash equilibrium for $\Gamma_M$ is the profile of strategies that consist of repeating the action "D" in all subgames of $\Gamma_M$, since at each stage of this game the strategy "D" is strictly dominant.

Definition 6. The Grim-Trigger strategy of player $i$ in the n-person prisoner's dilemma game on the network $M$ is the strategy of choosing the action "C" at all stages of the game $\Gamma_M$ until the stage at which one of the other players chooses the action "D". After that stage player $i$ chooses the action "D" regardless of the actions of all other players.

Lemma 1. The profile of Grim-Trigger strategies of all players constitutes a Nash equilibrium for the n-person prisoner's dilemma game $\Gamma_M$ on the network $M$.

Proof (of lemma).

From conditions 2-3 on the payoff functions in the one-stage prisoner's dilemma game $\gamma_M$ it follows that $\sum_{m=0}^{\infty} a_1 \delta^m x_{m,N} + b_1 > b_2$.

Suppose that all the players choose the Grim-Trigger strategies. Then the difference between the payoff that player $i$ obtains at each stage by following the Grim-Trigger strategy and the payoff he obtains at each stage after deviating from it equals $\sum_{m=0}^{\infty} a_1 \delta^m x_{m,N} + b_1 - b_2$. This difference is greater than 0, so over the infinite horizon player $i$ loses an infinite gain. The future losses exceed the one-time benefit from the deviation; therefore, the set of Grim-Trigger strategies is a Nash equilibrium for the n-person prisoner's dilemma game $\Gamma_M$ on the network $M$.

However, the set of Grim-Trigger strategies is sensitive to even a slight deviation from the current strategies: after any such deviation, each player gets the worst payoffs for an infinite period.

Let us introduce the "effective punishment" for the game $\Gamma_M$ on the network $M$.

Definition 7. The "punishment" is the choice by the non-deviating players of the actions that give the minimal possible payoffs to the deviating players.

Theorem 3. The number of stages that provides an "effective punishment" in the game $\Gamma_M$ on the network $M$ and makes all the players follow the action "to stay silent" at all stages of the game is given by (3).

$$k = \max_{i \in N} \left\lceil \frac{a_2 \sum_{m=1}^{\infty} \delta^m x_{m,N}}{a_1 + b_1 - b_2 + a_1 \sum_{m=1}^{\infty} \delta^m x_{m,N}} \right\rceil \tag{3}$$

Proof (of theorem). Let $k_i$ be the minimal number of stages that provides an "effective punishment" for player $i$. If he decides to use the action "D" at any stage of the game $\Gamma_M$, all the other players will try to minimize his payoffs at the next stages to punish him for such behaviour. Since the strategy "to defect", in contrast with the strategy "to stay silent", brings lower payoffs to all the other players, the non-deviating players will choose it as a punishment. Then the number of stages $k_i$ satisfies the inequality (4).

$$b_2 k_i + \sum_{m=1}^{\infty} a_2 \delta^m x_{m,N} < \left(\sum_{m=0}^{\infty} a_1 \delta^m x_{m,N} + b_1\right) k_i \tag{4}$$

Since $k_i$ is the minimal number that satisfies (4), it is equal to (5).

$$k_i = \left\lceil \frac{a_2 \sum_{m=1}^{\infty} \delta^m x_{m,N}}{a_1 + b_1 - b_2 + a_1 \sum_{m=1}^{\infty} \delta^m x_{m,N}} \right\rceil \tag{5}$$

Therefore, the number of stages $k$ that provides an "effective punishment" in the n-person prisoner's dilemma game $\Gamma_M$ on the network $M$ is the maximum of the numbers of stages that provide an "effective punishment" for the individual players from the set $N$ (6).

$$k = \max_{i \in N} k_i \tag{6}$$

□
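The step from (4) to (5) can be spelled out as follows. This is a sketch under two assumptions: that $\delta^0 x_{0,N} = 1$ (the player himself), as the denominator of (5) suggests, and that $a_1 + b_1 - b_2 + a_1 \sum_{m \ge 1} \delta^m x_{m,N} > 0$, which is the inequality used in the proof of Lemma 1. The shorthand $\sigma_i$ is introduced only for brevity and is not the author's notation.

```latex
\begin{align*}
\sigma_i &:= \sum_{m=1}^{\infty} \delta^m x_{m,N}, \\
b_2 k_i + a_2 \sigma_i &< \bigl(a_1 + a_1 \sigma_i + b_1\bigr) k_i
  && \text{inequality (4)}, \\
a_2 \sigma_i &< \bigl(a_1 + b_1 - b_2 + a_1 \sigma_i\bigr) k_i, \\
k_i &> \frac{a_2 \sigma_i}{a_1 + b_1 - b_2 + a_1 \sigma_i}.
\end{align*}
```

The smallest integer satisfying the last line coincides with the ceiling in (5) whenever the ratio is not itself an integer.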

Example 3. Let Fig. 1 show the network $M$ for the 3-person prisoner's dilemma $\Gamma_M$, where the second player is the head of a criminal group.

Fig. 1. An example of the network $M$ for the three-person prisoner's dilemma game $\Gamma_M$

The one-stage payoff functions are given by (7).

$$h_i(x_1, x_2, x_3) = \begin{cases} \left(\sum_{m=0}^{\infty} 0.8^m x_{m,N}\right) + 3, & \text{if } x_i = C; \\ \left(\sum_{m=1}^{\infty} 2 \cdot 0.8^m x_{m,N}\right) + 5, & \text{if } x_i = D. \end{cases} \tag{7}$$


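The number of stages of "effective punishment" for each player is given by (8)-(10).

$$k_1 = \left\lceil \frac{2 \cdot (0.8 + 0.64)}{1 + 3 - 5 + 1 \cdot (0.8 + 0.64)} \right\rceil = 7 \tag{8}$$

$$k_2 = \left\lceil \frac{2 \cdot (0.8 \cdot 2)}{1 + 3 - 5 + 1 \cdot (0.8 \cdot 2)} \right\rceil = 6 \tag{9}$$

$$k_3 = \left\lceil \frac{2 \cdot (0.8 + 0.64)}{1 + 3 - 5 + 1 \cdot (0.8 + 0.64)} \right\rceil = 7 \tag{10}$$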

Therefore, an "effective punishment" for the game $\Gamma_M$ can be realized in the number of stages given by (11).

$$k = \max\{7, 6, 7\} = 7 \tag{11}$$

Thus, a deviation from the strategy "to stay silent" at all stages of the game $\Gamma_M$ on the network $M$, which provides the maximum of the sum of all players' payoffs, can be surely punished within $k = 7$ stages.
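The arithmetic of Example 3 can be reproduced with a short Python sketch. It assumes that the network in Fig. 1 is the path 1 - 2 - 3 with player 2 in the centre, which is what the sums in (8)-(10) suggest, and that $x_{m,N}$ counts the players at distance $m$ when everybody cooperates. The function names are hypothetical. The second function searches directly for the smallest $k$ in inequality (4), read as $b_2 k + a_2 \sigma < (a_1 (1 + \sigma) + b_1) k$, as a cross-check of formula (5).

```python
import math
from collections import deque

def discounted_neighbours(adj, i, delta):
    """Sum of delta**m over all other players, m = shortest-path distance from i."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(delta ** d for j, d in dist.items() if j != i)

def punishment_length(adj, i, a1, a2, b1, b2, delta):
    """k_i from formula (5); assumes the denominator is positive."""
    s = discounted_neighbours(adj, i, delta)
    return math.ceil(a2 * s / (a1 + b1 - b2 + a1 * s))

def minimal_k_by_search(adj, i, a1, a2, b1, b2, delta, limit=1000):
    """Smallest k with b2*k + a2*s < (a1*(1 + s) + b1)*k, or None if none up to `limit`."""
    s = discounted_neighbours(adj, i, delta)
    for k in range(1, limit + 1):
        if b2 * k + a2 * s < (a1 * (1 + s) + b1) * k:
            return k
    return None

adj = {1: [2], 2: [1, 3], 3: [2]}       # assumed path network 1 - 2 - 3
params = dict(a1=1, a2=2, b1=3, b2=5, delta=0.8)

k_by_player = {i: punishment_length(adj, i, **params) for i in adj}
k_by_search = {i: minimal_k_by_search(adj, i, **params) for i in adj}
k = max(k_by_player.values())
print(k_by_player, k)   # {1: 7, 2: 6, 3: 7} 7
```

The central player needs a shorter punishment because both of his neighbours are at distance 1, which raises his per-stage loss from being punished relative to his one-time gain from deviating.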

3.3. Cooperative Network Game "n-Person Prisoner's Dilemma"

Let $N = \{1, 2, \ldots, n\}$ be the set of players. Any nonempty subset $S \subseteq N$ is called a coalition.

Definition 8. The cooperative solution is the strategy profile that maximizes the sum of all players' payoffs.

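Definition 9. By the characteristic function of an n-person game we mean a function $V: 2^N \to \mathbb{R}$ that assigns a value $V(S)$ to each subset of players in such a way that $V(S)$ is the payoff to the subset $S \subseteq N$ when its members maximize the sum of the payoffs of the players from $S$, whereas the players from the set $N \setminus S$ act against $S$. $V(S)$ is called the value of the coalition $S$:

$$V(S) = \max_{\mu_S} \min_{\nu_{N \setminus S}} H_S(\mu_S, \nu_{N \setminus S}), \quad S \subseteq N,$$

where $\mu_S \in \bar{X}_S$, $\nu_{N \setminus S} \in \bar{X}_{N \setminus S}$, $H_S$ is the total payoff of the players from $S$, and $\bar{\Gamma} = (\bar{X}_S, \bar{X}_{N \setminus S}, H_S)$ is a mixed extension of the zero-sum game $\Gamma$.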

The characteristic function represents the guaranteed total payoff for the players in a given coalition.

Let $x^i_N$ be the number of players adjacent to player $i$ on the network $M$.

Theorem 4 (Cooperative Solution). In the cooperative solution player $i$ chooses his pure strategy C if the number of players adjacent to him on the network $M$ satisfies the condition

$$x^i_N > \frac{b_2 - b_1}{2a_1 - a_2}. \tag{12}$$

Proof (of theorem). Player $i$ should use the strategy D if his gain from choosing the dominant strategy instead of the strategy C is greater than the losses of the adjacent players from his "betrayal".

Therefore, player $i$ chooses the strategy D to maximize the sum of payoffs of all the players from the coalition $N$ if the inequality $a_1 x^i_N < (a_2 - a_1) x^i_N + b_2 - b_1$ holds.

Then player $i$ will choose the strategy C in the cooperative solution if

$$a_1 x^i_N > (a_2 - a_1) x^i_N + b_2 - b_1. \tag{13}$$

Consequently, if $\frac{b_2 - b_1}{2a_1 - a_2}$ is less than the number of players from the set $N$ that are adjacent to player $i$, then this player will choose the strategy C in the cooperative solution.

□
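For completeness, the rearrangement that leads from (13) to (12) is written out below. It is a sketch that assumes $2a_1 - a_2 > 0$, a condition the proof uses implicitly; $x^i_N$ denotes the number of players adjacent to player $i$.

```latex
\begin{align*}
a_1 x^i_N &> (a_2 - a_1)\, x^i_N + b_2 - b_1 && \text{condition (13)}, \\
(2a_1 - a_2)\, x^i_N &> b_2 - b_1, \\
x^i_N &> \frac{b_2 - b_1}{2a_1 - a_2} && \text{if } 2a_1 - a_2 > 0.
\end{align*}
```

If $2a_1 - a_2$ were negative, dividing by it would reverse the inequality; the text does not discuss that case.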

The same ratio can be found for a coalition $S \subset N$. If the number $x^i_S$ of players from the set $S \subset N$ that are adjacent to player $i$ satisfies the relation $x^i_S > \frac{b_2 - b_1}{2a_1 - a_2}$, then in the cooperative solution player $i$ will choose the strategy C. Otherwise, he will choose D.

Therefore, the characteristic function of a cooperative game for the n-person prisoner's dilemma on the network can be written as:

$$V(\{i\}) = b_2; \tag{14}$$

$$V(S) = \sum_{i \in S} h_i\left(x_j = C \text{ if } x^j_S > \frac{b_2 - b_1}{2a_1 - a_2} \text{ and } j \in S, \text{ otherwise } x_j = D\right); \tag{15}$$

$$V(N) = \sum_{i \in N} h_i\left(x_j = C \text{ if } x^j_N > \frac{b_2 - b_1}{2a_1 - a_2}, \text{ otherwise } x_j = D\right). \tag{16}$$

Example 4 (3-person game).

Consider the payoff function for the 3-person prisoner's dilemma game on the network M:

$$\begin{cases} C_i(x) = 16 x^i_N + 1, & \text{if } x_i = C; \\ D_i(x) = 16 x^i_N + 18, & \text{if } x_i = D. \end{cases}$$

The network $M$ for this game is shown in Fig. 2.

Fig. 2. An example of network $M$ for the 3-person prisoner's dilemma game

Next, we construct the characteristic function of the resulting game on the basis of ratios relative to the neighbors of each player. In accordance with the payoff function, a player from the coalition $S$ should choose the strategy C if the number of players from this coalition adjacent to him is not less than $\frac{b_2 - b_1}{2a_1 - a_2} = \frac{18 - 1}{2 \cdot 16 - 16} = \frac{17}{16}$.

As Fig. 2 shows, this condition can be met only by the second player, and only if all players belong to the coalition $S$.

We can see that the maximal sum of players' payoffs for the grand coalition is not necessarily achieved when all players choose the strategy C. This is not an obligatory condition on players' payoffs in the n-person prisoner's dilemma game on the network. Nevertheless, the game retains all the basic features of its prototype, the two-person prisoner's dilemma.

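Table 6. Characteristic function of the 3-person game from Example 4

| $S$ | Inequalities | $V(S)$ |
|---|---|---|
| $\emptyset$ | $1 \notin S$, $2 \notin S$, $3 \notin S$ | 0 |
| $\{1\}$ | $x^1_S < \frac{18-1}{2 \cdot 16 - 16}$, $2 \notin S$, $3 \notin S$ | 18 |
| $\{2\}$ | $1 \notin S$, $x^2_S < \frac{18-1}{2 \cdot 16 - 16}$, $3 \notin S$ | 18 |
| $\{3\}$ | $1 \notin S$, $2 \notin S$, $x^3_S < \frac{18-1}{2 \cdot 16 - 16}$ | 18 |
| $\{1, 2\}$ | $x^1_S < \frac{18-1}{2 \cdot 16 - 16}$, $x^2_S < \frac{18-1}{2 \cdot 16 - 16}$, $3 \notin S$ | 36 |
| $\{1, 3\}$ | $x^1_S < \frac{18-1}{2 \cdot 16 - 16}$, $2 \notin S$, $x^3_S < \frac{18-1}{2 \cdot 16 - 16}$ | 36 |
| $\{2, 3\}$ | $1 \notin S$, $x^2_S < \frac{18-1}{2 \cdot 16 - 16}$, $x^3_S < \frac{18-1}{2 \cdot 16 - 16}$ | 36 |
| $N$ | $x^1_N < \frac{18-1}{2 \cdot 16 - 16}$, $x^2_N > \frac{18-1}{2 \cdot 16 - 16}$, $x^3_N < \frac{18-1}{2 \cdot 16 - 16}$ | 69 |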
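The values $V(S)$ of Example 4 can be recomputed with a short Python sketch. It rests on assumptions that the text does not state explicitly: the network is the path 1 - 2 - 3, the quantity multiplied by 16 in the payoff is the number of adjacent players who choose C, and players outside the coalition defect. The function names are hypothetical. The last lines also enumerate all eight pure strategy profiles to check the value of the grand coalition.

```python
from itertools import combinations, product

ADJ = {1: [2], 2: [1, 3], 3: [2]}       # assumed path network 1 - 2 - 3
A1, A2, B1, B2 = 16, 16, 1, 18          # coefficients of Example 4
THRESHOLD = (B2 - B1) / (2 * A1 - A2)   # 17/16

def payoff(profile, i):
    """Payoff of player i; x counts adjacent players who choose C (assumed reading)."""
    x = sum(1 for j in ADJ[i] if profile[j] == "C")
    return A1 * x + B1 if profile[i] == "C" else A2 * x + B2

def coalition_profile(coalition):
    """Rule (15): j in S plays C iff more than THRESHOLD of its neighbours are in S."""
    profile = {}
    for j in ADJ:
        inside = sum(1 for v in ADJ[j] if v in coalition)
        profile[j] = "C" if j in coalition and inside > THRESHOLD else "D"
    return profile

def value(coalition):
    """Total payoff of the coalition members under the profile given by rule (15)."""
    profile = coalition_profile(coalition)
    return sum(payoff(profile, i) for i in coalition)

players = sorted(ADJ)
table = {S: value(set(S))
         for r in range(len(players) + 1)
         for S in combinations(players, r)}

# Brute-force check: the largest total payoff over all pure strategy profiles.
totals = {p: sum(payoff(dict(zip(players, p)), i) for i in players)
          for p in product("CD", repeat=len(players))}
best_profile = max(totals, key=totals.get)

for S, v in table.items():
    print(S, v)
print(best_profile, totals[best_profile])
```

Under these assumptions the profile in which only the second player cooperates gives a total of 69, whereas full cooperation gives 67, in line with the remark that the grand coalition's maximum is not necessarily attained when all players choose C.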

4. Conclusion

Thus, this paper summarizes our works on games of the "n-person prisoner's dilemma" type. It also provides an analysis of a new equilibrium behavior of players under the conditions of the dynamic model.

The article identifies the players' payoff function in the model, which depends on the number of cooperating players and on the strategy of the player. The most recent complete consideration of the distinctive properties of the model for each player is presented. The characteristic function for the dynamic type of the game is constructed.

We consider new strong Nash equilibria for both types of the repeated game: without differentiation of relationships and on the network. The "effective punishment" is found for the dynamic types of the "n-person prisoner's dilemma" model.

The core of the dynamic model is constructed. The D subset of the core for the dynamic "n-person prisoner's dilemma" game provides a time-consistent solution. Its time-consistency is proved in terms of the considered model.

Optimality principles such as the Shapley value are studied for the repeated and dynamic games that are based on the properties of the "prisoner's dilemma".

We consider two types of non-cooperative "n-person prisoner's dilemma" games on the network that take into account the distance between the nodes of the network (players with their relationships) as the degree of influence on each other's payoff function. This helps to construct a cooperative type of the game that considers pairwise interactions of players on the network and, moreover, discounted influences based on the network interaction. The key principle of players' behavior for achieving the maximum sum of payoffs of the players from a coalition is presented in terms of the number of adjacent players (in this research, the term "adjacent players" refers to players whose nodes are connected by network edges). The construction of the game makes it possible to investigate such a principle of optimality as the Shapley value, which takes several forms depending on the coefficients of the initial game.

References

Aumann, R. J. (1959). Acceptable points in general cooperative n-person games. Contributions to the Theory of Games 4(AM-40), 287-324.

Carroll, J. W. (1988). Iterated N-player prisoner's dilemma games. Philosophical Studies. 53(3), 411-415.

Grinikh, A. L. (2019). Stochastic n-person prisoner's dilemma: the time-consistency of core and Shapley value. Contributions to Game Theory and Management, XII, 151-158.

Grinikh, A. L. and Petrosyan, L. A. (2021). An Effective Punishment for an n-Person Prisoner's Dilemma on a Network. Trudy Instituta matematiki i mekhaniki UrO RAN, 27(3), 256-262.

Grinikh, A. L. and Petrosyan, L. A. (2021). Shapley value of n-person prisoner's dilemma. Journal of Physics: Conference Series. IOP Publishing, 1864(1), 012061.

Grinikh, A. L. and Petrosyan, L. A. (2021). Cooperative n-person Prisoner's Dilemma on a Network. Contributions to Game Theory and Management, 14(0), 122-126.

Hamburger, H. (1973). N-person prisoner's dilemma. Journal of Mathematical Sociology, 3(1), 27-48.

Petrosyan, L. A. (1993). Differential games of pursuit (Vol. 2). World Scientific, 312.

Petrosyan, L. (2019). Strong Strategic Support of Cooperation in Multistage Games. International Game Theory Review (IGTR), 21(1), 1-12.

Petrosjan, L. A. and Grauer, L. V. (2002). Strong Nash equilibrium in multistage games. International Game Theory Review, 4(2), 255-264.

Petrosyan, L. A. and Grauer, L. V. (2004). Multistage games. Journal of applied mathematics and mechanics, 68(4), 597-605.

Petrosyan, L. A. and Pankratova, Y. B. (2018). New characteristic function for multistage dynamic games. Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes, 14(4), 316-324.

Shapley, L. S. (1953). A value for n-person games. Contributions to the Theory of Games, 2(28), 307-317.

Straffin, P. D. (1993). Game theory and strategy. MAA, 36.
