URAL MATHEMATICAL JOURNAL, Vol. 4, No. 2, 2018, pp. 79-87

DOI: 10.15826/umj.2018.2.009

ALTRUISTIC AND AGGRESSIVE TYPES OF BEHAVIOR IN A NON-ANTAGONISTIC DIFFERENTIAL GAME

Anatolii F. Kleimenov

N.N. Krasovskii Institute of Mathematics and Mechanics Ural Branch of Russian Academy of Sciences, 16 S. Kovalevskaya Str., Ekaterinburg, Russia, 620990;

Ural Federal University, 19 Mira str., Ekaterinburg, Russia, 620002 kleimenov@imm.uran.ru

Abstract: An example of a non-antagonistic positional (feedback) differential two-person game (NPDG) is considered in which each of the two players, in addition to the normal type of behavior, oriented toward maximizing their own functional, can use other types of behavior. In particular, these can be the altruistic and aggressive types. In the course of the game the players can switch their behavior from one type to another. The use by the players of types of behavior other than normal can lead to outcomes more preferable for them than in a game with only normal behavior. The example, with the dynamics of simple motion on a plane and phase constraints, illustrates the procedure of constructing new solutions.

Keywords: Non-antagonistic positional differential game, Altruistic type of behavior, Aggressive type of behavior.

Introduction

In this article we consider a non-antagonistic positional differential two-person game (see, for example, [9]) in which each of the two players, in addition to the normal (nor) type of behavior, oriented toward maximizing their own functional, can use other types of behavior introduced in [2, 5], such as the altruistic (alt) and aggressive (agg) types. It is assumed that during the game the players can switch their behavior from one type to another. The idea of letting the players switch their behavior type in the course of the game was applied to a game with cooperative dynamics in [5] and to the repeated bimatrix 2 x 2 game in [3], which made it possible to obtain new solutions in these games.

It is assumed that each player, along with a positional strategy, chooses an indicator function that is defined on the whole time interval of the game and takes values in the set {nor, alt, agg}. A player's indicator function shows how the type of behavior this player adheres to changes over time. Rules for forming the controls are introduced for each pair of the players' behavior types.

The formalization of positional strategies in the game is based on the formalization and results of the general theory of antagonistic positional differential games [7, 8]. For non-antagonistic positional differential games this formalization was developed in [1].

In the article the concept of the BT-solution is introduced.

An example of a game with the dynamics of simple motion in the plane and phase constraints is proposed. We assume that the first and second players can exhibit altruism and aggression towards their partner for some time periods, and the case of mutual aggression is allowed. Sets of BT-solutions are described. This paper is a continuation of [4].

1. Equations of motion and phase constraints.

Let the equations of dynamics be as follows:

$$\dot{x} = u + v, \quad x, u, v \in \mathbb{R}^2, \quad \|u\| \le 1, \quad \|v\| \le 1, \quad 0 \le t \le t_f, \quad x(0) = x_0, \qquad (1.1)$$

where $x$ is the phase vector; $u$ and $v$ are the controls of Player 1 (P1) and Player 2 (P2), respectively. Let the payoff functional of Player $i$ be

$$I_i = \sigma_i(x(t_f)) = M - \|x(t_f) - a^{(i)}\|, \quad i = 1, 2, \qquad (1.2)$$

where $M$ is a constant. That is, the goal of Player $i$ is to bring the vector $x(t_f)$ as close as possible to the target point $a^{(i)}$.

Let the initial conditions and values of parameters be given (Fig. 1):

$$t_f = 5.0, \quad x_0 = (0, 0), \quad a^{(1)} = (10, 8), \quad a^{(2)} = (-10, 8), \quad M = 18.$$

The game has the following phase constraints. The trajectories of system (1.1) are forbidden from entering the interior of the set $S$, which is obtained by removing the line segment $Oe$ from the quadrilateral $Oabc$ (Fig. 1). The set $S$ consists of two parts $S_1$ and $S_2$, that is, $S = S_1 \cup S_2$. The points defining the phase constraints have the coordinates $a = (-4.5, 3.6)$, $b = (0, 8)$, $c = (6.5, 5.2)$, $O = (0, 0)$, $e = (3.25, 6.6)$. We have $a \in Oa^{(2)}$, $c \in Oa^{(1)}$, $e \in bc$, and $|a^{(1)}b| = |a^{(2)}b| = 10$.
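The incidence relations stated above are easy to check numerically. The following is a minimal sketch in Python; the helper names `dist` and `on_segment` are illustrative and not part of the paper.

```python
import math

# Points from the example (Section 1).
O = (0.0, 0.0)
a = (-4.5, 3.6)
b = (0.0, 8.0)
c = (6.5, 5.2)
e = (3.25, 6.6)
a1 = (10.0, 8.0)    # target point a^(1)
a2 = (-10.0, 8.0)   # target point a^(2)

def dist(p, q):
    """Euclidean distance between two points of the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def on_segment(p, q, r, tol=1e-9):
    """True if the point r lies on the segment pq (up to tolerance)."""
    return abs(dist(p, r) + dist(r, q) - dist(p, q)) < tol

checks = {
    "a lies on O a^(2)": on_segment(O, a2, a),
    "c lies on O a^(1)": on_segment(O, a1, c),
    "e lies on bc": on_segment(b, c, e),
    "|a^(1) b| = 10": math.isclose(dist(a1, b), 10.0),
    "|a^(2) b| = 10": math.isclose(dist(a2, b), 10.0),
}
for name, ok in checks.items():
    print(name, ok)
```

With the coordinates given in the text, the point $e$ turns out to be the midpoint of the side $bc$.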

Figure 1. The attainability set

2. Attainability set

The attainability set of system (1.1) constructed for the moment $t_f$ consists of the points of the circle of radius 10 located not higher than the broken line $aOc$; it is also bounded by two arcs connecting the large circle with the sides $ab$ and $bc$ of the quadrilateral. The first arc is an arc of the circle with center at the point $a$ and radius $r_1 = 10 - |Oa| = |ad_2|$. The second (composite) arc consists of an arc of the circle with center at the point $e$ and radius $r_2 = 10 - |Oe| = |ed_1|$ and an arc of the circle with center at the point $c$ and radius $r_3 = 10 - |Oc|$ (Fig. 1). Results of approximate calculations: $r_1 = 4.24$, $r_2 = 2.64$, $r_3 = 1.68$, $d_1 = (0.82, 7.65)$, $d_2 = (-1.47, 6.56)$. We also have $|Oa^{(1)}| = |Oa^{(2)}| = 12.81$.

In Fig. 1 the dashed lines represent arcs of the circle with center at the point $b$ and radius $r_4 = |Oa^{(1)}| - |a^{(1)}b| = 12.81 - 10 = 2.81$. These arcs intersect the sides $bc$ and $ab$ at the points $p_1 = (2.58, 6.89)$ and $p_2 = (-2.01, 6.04)$, respectively. By construction, the lengths of the two-link lines $a^{(1)}bp_2$ and $a^{(2)}bp_1$ are equal to each other and to the lengths of the segments $Oa^{(1)}$ and $Oa^{(2)}$.
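The approximate values quoted above can be reproduced with a few lines of Python. This is a sketch under two assumptions that are not spelled out in the text: the radius 10 is read as the maximal combined speed 2 multiplied by the final time 5, and $d_1$, $d_2$ are taken on the sides $bc$ and $ab$ in the direction of the vertex $b$ (this choice agrees with the reported coordinates). The helper `point_towards` is illustrative.

```python
import math

# Points defining the phase constraints (Section 1).
O, a, b, c, e = (0.0, 0.0), (-4.5, 3.6), (0.0, 8.0), (6.5, 5.2), (3.25, 6.6)
R = 10.0  # assumption: combined maximal speed 2 times final time 5

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def point_towards(p, q, length):
    """Point at distance `length` from p in the direction of q."""
    d = dist(p, q)
    return (p[0] + length * (q[0] - p[0]) / d,
            p[1] + length * (q[1] - p[1]) / d)

r1 = R - dist(O, a)            # radius of the arc centered at a
r2 = R - dist(O, e)            # radius of the arc centered at e
r3 = R - dist(O, c)            # radius of the arc centered at c
d1 = point_towards(e, b, r2)   # assumed to lie on bc, between e and b
d2 = point_towards(a, b, r1)   # assumed to lie on ab, between a and b

print(round(r1, 2), round(r2, 2), round(r3, 2))
print(tuple(round(v, 2) for v in d1), tuple(round(v, 2) for v in d2))
```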
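The radius $r_4$ and the points $p_1$, $p_2$ can be checked in the same way. The sketch below assumes that $p_1$ lies on the side $bc$ and $p_2$ on the side $ab$, which is what the reported coordinates suggest; the helper names are illustrative.

```python
import math

O, a, b, c = (0.0, 0.0), (-4.5, 3.6), (0.0, 8.0), (6.5, 5.2)
a1, a2 = (10.0, 8.0), (-10.0, 8.0)   # target points a^(1), a^(2)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def point_towards(p, q, length):
    """Point at distance `length` from p in the direction of q."""
    d = dist(p, q)
    return (p[0] + length * (q[0] - p[0]) / d,
            p[1] + length * (q[1] - p[1]) / d)

r4 = dist(O, a1) - dist(a1, b)
p1 = point_towards(b, c, r4)   # assumed intersection with side bc
p2 = point_towards(b, a, r4)   # assumed intersection with side ab

# Lengths of the two-link lines a^(1) b p2 and a^(2) b p1.
len_1 = dist(a1, b) + dist(b, p2)
len_2 = dist(a2, b) + dist(b, p1)
print(round(r4, 2), tuple(round(v, 2) for v in p1), tuple(round(v, 2) for v in p2))
print(round(len_1, 2), round(len_2, 2), round(dist(O, a1), 2))
```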

3. Strategies and motions

Assume that both players have information about the current position of the game $(t, x(t))$. Then P1 and P2 act in the class of pure positional strategies.

The formalization of players' positional strategies and motions generated by them in a NPDG is based on the formalization and results of antagonistic positional differential games theory from the books [7, 8]. The following presentation is given in [1].

The strategy of P1 is identified with the pair $U = \{u(t, x, \varepsilon), \beta_1(\varepsilon)\}$, where $u(\cdot)$ is an arbitrary function of the position $(t, x)$ and of a positive precision parameter $\varepsilon > 0$, taking values in the set $\{u \in \mathbb{R}^2 : \|u\| \le 1\}$. The function $\beta_1 : (0, \infty) \to (0, \infty)$ is a continuous monotonic function satisfying the condition $\beta_1(\varepsilon) \to 0$ as $\varepsilon \to 0$. For a fixed $\varepsilon$, the value $\beta_1(\varepsilon)$ is the upper bound on the step of the subdivision of the segment $[t_0, t_f]$ that P1 applies when forming step-by-step motions. Similarly, the strategy of P2 is defined as $V = \{v(t, x, \varepsilon), \beta_2(\varepsilon)\}$.

Motions of two types — approximated (step-by-step) and ideal (limiting) are considered as motions generated by a pair {strategy of P1 - strategy of P2}.

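An approximated motion $x_\Delta[\cdot] = x[\cdot, t_0, x_0, U, \varepsilon_1, \Delta_1, V, \varepsilon_2, \Delta_2]$ generated by a pair of strategies $(U, V)$ from the initial position $(t_0, x_0)$, for fixed values of the players' precision parameters $\varepsilon_1$ and $\varepsilon_2$ and for fixed subdivisions $\Delta_1 = \{t_i^{(1)}\}$ and $\Delta_2 = \{t_j^{(2)}\}$ of the interval $[t_0, t_f]$ chosen by P1 and P2, respectively, under the conditions $\delta(\Delta_i) \le \beta_i(\varepsilon_i)$, $i = 1, 2$, is introduced as the step-by-step solution of the differential equation

$$\dot{x}_\Delta[t] = f(t, x_\Delta[t], u_\Delta[t], v_\Delta[t]), \quad x_\Delta[t_0] = x_0,$$

where

$$u_\Delta[t] = u(t_i^{(1)}, x_\Delta[t_i^{(1)}], \varepsilon_1), \quad t_i^{(1)} \le t < t_{i+1}^{(1)}, \qquad v_\Delta[t] = v(t_j^{(2)}, x_\Delta[t_j^{(2)}], \varepsilon_2), \quad t_j^{(2)} \le t < t_{j+1}^{(2)}.$$

For system (1.1) the right-hand side is $f = u + v$.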
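To make the step-by-step construction concrete, here is a minimal Python sketch for the simple-motion dynamics (1.1). Each player re-evaluates a positional strategy only at the nodes of their own subdivision and holds the control constant in between. The function names, the sample strategy `head_to_a1`, and the two subdivisions are hypothetical choices made for illustration; the sketch enforces neither the step bound on the subdivisions nor the phase constraints.

```python
import math

def unit_towards(x, target):
    """Unit vector from x towards target (zero vector if x == target)."""
    dx, dy = target[0] - x[0], target[1] - x[1]
    n = math.hypot(dx, dy)
    return (0.0, 0.0) if n == 0.0 else (dx / n, dy / n)

def stepwise_motion(x0, u_strategy, v_strategy, nodes1, nodes2, eps1, eps2):
    """Approximated motion of x' = u + v.

    nodes1 and nodes2 are the sets of subdivision nodes of P1 and P2;
    both must contain the initial and the final moment.
    """
    times = sorted(set(nodes1) | set(nodes2))
    x = x0
    u = v = (0.0, 0.0)
    path = [(times[0], x)]
    for t, t_next in zip(times, times[1:]):
        if t in nodes1:                # P1 updates only at own nodes
            u = u_strategy(t, x, eps1)
        if t in nodes2:                # P2 updates only at own nodes
            v = v_strategy(t, x, eps2)
        h = t_next - t
        # Controls are constant on [t, t_next), so this step is exact.
        x = (x[0] + h * (u[0] + v[0]), x[1] + h * (u[1] + v[1]))
        path.append((t_next, x))
    return path

a1 = (10.0, 8.0)

def head_to_a1(t, x, eps):
    """Hypothetical positional strategy: full speed towards a^(1)."""
    return unit_towards(x, a1)

nodes1 = {0.5 * k for k in range(11)}   # P1: step 0.5 on [0, 5]
nodes2 = {1.0 * k for k in range(6)}    # P2: step 1.0 on [0, 5]
path = stepwise_motion((0.0, 0.0), head_to_a1, head_to_a1,
                       nodes1, nodes2, 0.1, 0.1)
t_end, x_end = path[-1]
print(t_end, x_end)
```

In this run both players push along the segment $Oa^{(1)}$, which passes through the point $c$ along the boundary of $S$, so the motion covers a distance of 10 in time 5.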

A limiting motion generated by the pair of strategies $(U, V)$ from the initial position $(t_0, x_0)$ is a continuous function $x[t] = x[t, t_0, x_0, U, V]$ for which there exists a sequence of approximated motions $\{x[t, t_0^k, x_0^k, U, \varepsilon_1^k, \Delta_1^k, V, \varepsilon_2^k, \Delta_2^k]\}$ converging uniformly to $x[t]$ on $[t_0, t_f]$ as $k \to \infty$, $\varepsilon_1^k \to 0$, $\varepsilon_2^k \to 0$, $t_0^k \to t_0$, $x_0^k \to x_0$.

A pair of strategies $(U, V)$ generates a nonempty compact (in the metric of the space $C[t_0, t_f]$) set $X(t_0, x_0, U, V)$ consisting of limit motions $x[\cdot, t_0, x_0, U, V]$.

The control laws $(U, \varepsilon_1, \Delta_1)$ and $(V, \varepsilon_2, \Delta_2)$ are said to be consistent with respect to the precision parameter if $\varepsilon_1 = \varepsilon_2$. Consistent control laws generate consistent approximated motions, the sequences of which generate consistent limit motions.

4. Auxiliary zero-sum positional differential games $\Gamma_i$

Consider the auxiliary zero-sum positional differential games $\Gamma_1$ and $\Gamma_2$. The dynamics of both games is described by equation (1.1). In the game $\Gamma_i$, Player $i$ maximizes the payoff functional $\sigma_i(x(t_f))$ (1.2) and Player $3 - i$ opposes him. It follows from [7, 8] that both games $\Gamma_1$ and $\Gamma_2$ have universal saddle points

$$\{u^{(i)}(t, x, \varepsilon), v^{(i)}(t, x, \varepsilon)\}, \quad i = 1, 2, \qquad (4.1)$$

and continuous value functions

$$\gamma_1(t, x), \quad \gamma_2(t, x). \qquad (4.2)$$

The property of strategies (4.1) to be universal means that they are optimal not only for the fixed initial position (t0,x0) but also for any position (t*,x*) assumed as initial one.

It is not difficult to see that the value $\gamma_i(t, x)$ (4.2) is the maximal guaranteed payoff of Player $i$ in the position $(t, x)$ of the game.

In our example, the value functions $\gamma_1(t, x)$ and $\gamma_2(t, x)$, $0 \le t \le t_f$, $x \in \mathbb{R}^2 \setminus S$, are as follows:

$$\gamma_i(t, x) = \begin{cases} 18 - \|x - a^{(i)}\|, & xa^{(i)} \cap \operatorname{int} S = \varnothing, \\ 18 - \rho_S(x, a^{(i)}), & \text{otherwise}, \end{cases} \qquad (4.3)$$

where $i = 1, 2$, and $\rho_S(x, a^{(i)})$ denotes the smaller of the two distances from the point $x$ to the point $a^{(i)}$, one of which is calculated when the set $S$ is bypassed clockwise and the other when the set $S$ is bypassed counterclockwise.

5. NE- and P(NE)-solutions in NPDG

At first we solve the game NPDG (without abnormal behavior types). Introduce the following definitions from [1].

Definition 1. A pair of strategies $(U^N, V^N)$ is called a Nash equilibrium solution (NE-solution) of the game if for any motion $x^*[\cdot] \in X(t_0, x_0, U^N, V^N)$, any moment $\tau \in [t_0, t_f]$, and any strategies $U$ and $V$ the following inequalities hold:

$$\max_{x[\cdot]} \sigma_1(x[t_f, \tau, x^*[\tau], U, V^N]) \le \min_{x[\cdot]} \sigma_1(x[t_f, \tau, x^*[\tau], U^N, V^N]),$$

$$\max_{x[\cdot]} \sigma_2(x[t_f, \tau, x^*[\tau], U^N, V]) \le \min_{x[\cdot]} \sigma_2(x[t_f, \tau, x^*[\tau], U^N, V^N]),$$

where the min operations are performed over the set of consistent motions and the max operations over the sets of all motions.

Definition 2. An NE-solution $(U^P, V^P)$ which is Pareto non-improvable with respect to the values $I_1$, $I_2$ (1.2) is called a P(NE)-solution.

In [1] the following structure of NE- and P(NE)-solutions was established. Namely, it was shown that all NE- and P(NE)-solutions of the game NPDG can be found in the class of pairs of strategies $(U, V)$ each of which generates a unique limit motion (trajectory). The decision strategies that make up such a pair generating the trajectory $x^*(\cdot)$ have the form

$$U^0 = \{u^0(t, x, \varepsilon), \beta_1^0(\varepsilon)\}, \quad V^0 = \{v^0(t, x, \varepsilon), \beta_2^0(\varepsilon)\}, \qquad (5.1)$$

$$u^0(t, x, \varepsilon) = \begin{cases} u^*(t, \varepsilon), & \|x - x^*(t)\| < \varepsilon\varphi(t), \\ u^{(2)}(t, x, \varepsilon), & \|x - x^*(t)\| \ge \varepsilon\varphi(t), \end{cases}$$

$$v^0(t, x, \varepsilon) = \begin{cases} v^*(t, \varepsilon), & \|x - x^*(t)\| < \varepsilon\varphi(t), \\ v^{(1)}(t, x, \varepsilon), & \|x - x^*(t)\| \ge \varepsilon\varphi(t), \end{cases}$$

for all $t \in [t_0, t_f]$, $\varepsilon > 0$. In (5.1), $u^*(t, \varepsilon)$ and $v^*(t, \varepsilon)$ denote families of program controls generating the limit motion $x^*(t)$. The function $\varphi(\cdot)$ and the functions $\beta_1^0(\cdot)$ and $\beta_2^0(\cdot)$ are chosen in such a way that the approximated motions generated by the pair $(U^0, V^0)$ from the initial position $(t_0, x_0)$ do not go beyond the $\varepsilon\varphi(t)$-neighborhood of the trajectory $x^*(t)$. The functions $u^{(2)}(\cdot, \cdot, \cdot)$ and $v^{(1)}(\cdot, \cdot, \cdot)$ are defined in (4.1).

It is proved in [1] that the point $t = t_f$ is the maximum point of the value function $\gamma_i(t, x)$, $i = 1, 2$, computed along an NE-trajectory and a P(NE)-trajectory.

One can check that in this game the trajectory $x(t) \equiv 0$, $t \in [0, 5]$ (the stationary point $O$) is the only NE-trajectory and, consequently, the only P(NE)-trajectory; the players' gains on it are $I_1 = I_2 = 5.19$.
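The gains at the stationary point follow directly from (1.2) with $M = 18$; a short Python check (the function name `sigma` is illustrative):

```python
import math

M = 18.0
a1, a2 = (10.0, 8.0), (-10.0, 8.0)   # target points a^(1), a^(2)

def sigma(x, target):
    """Terminal payoff (1.2): M minus the distance to the target point."""
    return M - math.hypot(x[0] - target[0], x[1] - target[1])

O = (0.0, 0.0)
I1, I2 = sigma(O, a1), sigma(O, a2)
print(round(I1, 2), round(I2, 2))
```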

6. Types of behavior

Let us move on to the game NPDGwBT (with abnormal behavior types), in which each player during certain periods of time may exhibit altruism and aggression towards another player, and the mutual aggression is allowed. The definitions of behavior types other than normal are given in [2, 5].

Definition 3. We say that on the time interval $(t_1, t_2) \subset [t_0, t_f]$ Player 1 adheres to the altruistic (alt) type of behavior with respect to Player 2 if his payoff functional on this interval is equal to the functional $I_2$ (1.2) of Player 2.

Definition 4. We say that on the time interval $(t_1, t_2) \subset [t_0, t_f]$ Player 1 adheres to the aggressive (agg) type of behavior with respect to Player 2 if his payoff functional on this interval is equal to the functional $-I_2$, where $I_2$ (1.2) is the payoff functional of Player 2.

Definition 5. We say that on the time interval $(t_1, t_2) \subset [t_0, t_f]$ Player 1 adheres to the paradoxical (par) type of behavior if his payoff functional on this interval is equal to the functional $I_1$ (1.2) taken with the opposite sign.

Similarly, we define the altruistic and aggressive types of behavior for Player 2 towards Player 1, as well as the paradoxical type of behavior for Player 2.

In this paper, we do not use the paradoxical type of players' behavior. Note that the aggressive type of behavior is actually used in NPDG in the form of the punishment strategies contained in the structure of the game's solutions (see, for example, [1, 6]).

The above definitions characterize the extreme types of players' behavior. Real individuals, however, as a rule behave partly normally, partly altruistically, partly aggressively, and partly paradoxically. In other words, mixed types of behavior seem to be more consistent with reality.

If each player is confined to "pure" types of behavior, then in the game (1.1), (1.2) there are 9 possible pairs of types of behavior: (nor, nor), (nor, alt), (nor, agg), (alt, nor), (alt, alt), (alt, agg), (agg, nor), (agg, alt), (agg, agg). For the two pairs (nor, alt) and (alt, nor) the interests of the players coincide and they solve a team problem of control. For the two pairs (nor, agg) and (agg, nor) the players have opposite interests and, therefore, they play a zero-sum game. The remaining 5 pairs define non-antagonistic games.
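The classification of the nine pairs can be reproduced mechanically from Definitions 3 and 4. In the Python sketch below, each behavior type is encoded as the signed functional that the player effectively maximizes; this encoding and the function names are illustrative assumptions, not notation from the paper.

```python
from itertools import product

TYPES = ("nor", "alt", "agg")

def effective_functional(player, behavior):
    """Return (sign, index): the player maximizes sign * I_index."""
    other = 2 if player == 1 else 1
    if behavior == "nor":
        return (+1, player)    # own functional
    if behavior == "alt":
        return (+1, other)     # partner's functional
    if behavior == "agg":
        return (-1, other)     # minus partner's functional
    raise ValueError(behavior)

def classify(pair):
    """Type of decision problem generated by a pair of behavior types."""
    s1, i1 = effective_functional(1, pair[0])
    s2, i2 = effective_functional(2, pair[1])
    if i1 == i2:
        return "team problem" if s1 == s2 else "zero-sum game"
    return "non-antagonistic game"

table = {pair: classify(pair) for pair in product(TYPES, repeat=2)}
for pair, kind in table.items():
    print(pair, kind)
```

The output lists two team problems, two zero-sum games, and five non-antagonistic games, in agreement with the count given in the text.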

The idea of letting the players switch their behavior from one type to another in the course of the game was applied to a game with cooperative dynamics in [5] and to the repeated bimatrix 2 x 2 game in [3], which made it possible to obtain new solutions in these games.

The extension of this approach to non-antagonistic positional differential games leads to new problem formulations. In particular, it is of interest to see how the players' gains obtained on Nash solutions are transformed. A relevant problem is to minimize the time of "abnormal" behavior, provided that the players' payoffs are greater than when the players behave normally.

7. Formalization of actions. Rules 1,2

In NPDGwBT we assume that simultaneously with the choice of a positional strategy [7, 8] each player also chooses his indicator function [5]. We denote the indicator function of Player $i$ by $\alpha_i : [t_0, t_f] \to \{\text{nor}, \text{alt}, \text{agg}\}$, $i = 1, 2$. If the indicator function of some player takes a value, say, alt, on some time interval, then this player acts on this interval as an altruist in relation to his partner. Thus, in the game NPDGwBT, P1 controls the choice of a pair of actions {positional strategy, indicator function} $(U, \alpha_1(\cdot))$, and P2 controls the choice of a pair of actions $(V, \alpha_2(\cdot))$.

As mentioned above, for any pair of types of behavior three types of decision making problems can arise: a team problem, a zero-sum game, and a non-antagonistic game. We adopt the following Rules 1-2.

Rule 1. If on the time interval $(t_1, t_2) \subset [t_0, t_f]$ the players' indicator functions generate a non-antagonistic game, then on this interval P1 and P2 choose one of the P(NE)-solutions of this game. If a zero-sum game is realized, then P1 and P2 choose one of the saddle points of this game as a solution. Finally, if a team problem of control is realized, then P1 and P2 choose one of the pairs of controls such that the value function $\gamma_i(t, x)$ calculated along the generated trajectory is a non-decreasing function, where $i$ is the number of the player whose functional is maximized in the team problem.

Generally speaking, the same part of the trajectory can be tracked by several pairs of players' types of behavior, and these pairs may differ from each other by the time of use of abnormal types. It is natural to introduce the following Rule 2.

Rule 2. If there are several pairs of types of behavior that track a certain part of the trajectory, then P1 and P2 choose one of them that minimizes the time of using abnormal types of behavior.

8. BT-solution in NPDGwBT

We now introduce the definition of the solution of the game NPDGwBT. The definition of BT-solution is given in [5].

Note that the set of motions generated by a pair of actions $\{(U, \alpha_1(\cdot)), (V, \alpha_2(\cdot))\}$ coincides with the set of motions generated by the pair $(U, V)$ in the corresponding NPDG.

Definition 6. The pair $\{(U^0, \alpha_1^0(\cdot)), (V^0, \alpha_2^0(\cdot))\}$, consistent with Rules 1 and 2, forms a BT-solution of the game NPDGwBT if there exists a trajectory $x^{BT}(\cdot)$ generated by this pair and there is a P(NE)-solution in the corresponding game NPDG generating a trajectory $x^P(\cdot)$ such that the inequalities

$$\sigma_i(x^{BT}(t_f)) \ge \sigma_i(x^P(t_f)), \quad i = 1, 2,$$

hold, where at least one of the inequalities is strict.

Definition 7. A BT-solution $\{(U^0, \alpha_1^0(\cdot)), (V^0, \alpha_2^0(\cdot))\}$ which is Pareto non-improvable with respect to the values $I_1$, $I_2$ (1.2) is called a P(BT)-solution of the game NPDGwBT.

Problem 1. Find the set of BT-solutions.

Problem 2. Find the set of P(BT)-solutions.

In the general case, Problems 1 and 2 need not have solutions. However, it is to be expected that the use of abnormal behavior types by the players in the game NPDGwBT can in some cases lead to outcomes more preferable for them than in the corresponding game NPDG with only the normal type of behavior.

In our example, just such a situation will take place.

9. Building BT-solutions

Let us now turn to the game NPDGwBT, in which each player during certain periods of time may exhibit altruism and aggression towards another player, and the case of mutual aggression is allowed.

In the attainability set, we find all the points x for which the inequalities hold

$$\sigma_i(x) \ge \sigma_i(O), \quad i = 1, 2, \qquad \sigma_1(x) + \sigma_2(x) > \sigma_1(O) + \sigma_2(O). \qquad (9.1)$$

Such points form two sets $D_1$ and $D_2$ (see Fig. 1). The set $D_1$ is bounded by the segment $p_1d_1$ and by the arcs $p_1q_1$ and $q_1d_1$ of the circles mentioned above. The set $D_2$ is bounded by the segment $p_2d_2$ and by the arcs $d_2q_2$ and $q_2p_2$ of the circles mentioned above. On the arc $p_1q_1$, the non-strict inequality (9.1) for $i = 2$ becomes an equality, and on the arc $q_2p_2$, the non-strict inequality (9.1) becomes an equality for $i = 1$. At the remaining points of the sets $D_1$ and $D_2$, the non-strict inequalities (9.1) for $i \in \{1, 2\}$ are strict.

We construct a BT-solution leading to the point $d_1 \in D_1$. Let us find a point $m$ whose distance to the point $a^{(2)}$ is the same whether we go around the set $S_2$ clockwise or counterclockwise. We also find a point $n$ whose distance to the point $a^{(1)}$ is the same whether we go around the set $S_1$ clockwise or counterclockwise. The results of the calculations: $m = (1.79, 3.63)$, $n = (0.32, 0.65)$.


Consider the trajectory $Oed_1$; the players' gains on it are $I_1 = 8.82$, $I_2 = 7.10$, that is, the gains of both players on this trajectory are greater than the gains on the single P(NE)-trajectory. As follows from the above, the trajectory $Oed_1$ is not a Nash trajectory. Therefore, if it is possible to construct indicator function-programs of the players that provide motion along this trajectory, then a BT-solution will be constructed.

First of all, we find that if we move along the trajectory $Oed_1$ with the maximum velocity for $t \in [0, 5]$, the point $n$ is reached at $t = 0.361$, the point $m$ at $t = 2.022$, and the point $e$ at $t = 3.678$. It is easy to verify that for such a motion along the trajectory $Oed_1$, on the interval $t \in [0, 0.361]$ both functions $\gamma_1(t, x)$ and $\gamma_2(t, x)$ (4.3) decrease monotonically; on the interval $t \in [0.361, 2.022]$, the function $\gamma_2(t, x)$ continues to decrease and the function $\gamma_1(t, x)$ increases; on the interval $t \in [2.022, 3.678]$, both functions increase; finally, on the remaining interval $t \in [3.678, 5]$, the function $\gamma_2(t, x)$ continues to increase and the function $\gamma_1(t, x)$ decreases.
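The points $m$ and $n$ and the three hit times can be reproduced numerically. The sketch below rests on assumptions inferred from the reported coordinates rather than stated in the text: both points lie on the segment $Oe$; for $m$, the two routes to $a^{(2)}$ go through $O$ and through $e$ and $b$; for $n$, the two routes to $a^{(1)}$ go through $O$ and through $e$; the maximum velocity along the trajectory equals 2.

```python
import math

O, b, e = (0.0, 0.0), (0.0, 8.0), (3.25, 6.6)
a1, a2 = (10.0, 8.0), (-10.0, 8.0)   # target points a^(1), a^(2)
SPEED = 2.0                           # assumption: ||u|| + ||v|| at maximum

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

L = dist(O, e)   # length of the segment Oe

# m at arc length s_m from O: s + |O a2| = (L - s) + |e b| + |b a2|.
s_m = (L + dist(e, b) + dist(b, a2) - dist(O, a2)) / 2.0
# n at arc length s_n from O: s + |O a1| = (L - s) + |e a1|.
s_n = (L + dist(e, a1) - dist(O, a1)) / 2.0

m = (s_m * e[0] / L, s_m * e[1] / L)
n = (s_n * e[0] / L, s_n * e[1] / L)

t_n, t_m, t_e = s_n / SPEED, s_m / SPEED, L / SPEED
print(tuple(round(v, 2) for v in m), tuple(round(v, 2) for v in n))
print(round(t_n, 3), round(t_m, 3), round(t_e, 3))
```

Under these assumptions the computed values agree with the coordinates and times quoted in the text to the stated precision.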

We check that on the segment On of the trajectory, the pair (agg,agg), which determines the non-antagonistic game, is the only pair of types of behaviors that realizes motion on the segment in accordance with Rule 1; this is the motion generated by the P(NE)-solution, the best for both players. In the next segment nm, two pairs of types of behaviors realize motion on the segment according to Rule 1, namely (nor, alt) and (agg,alt); however, according to Rule 2, only the pair (nor, alt) remains; it defines a team problem of control in which the motion represents the maximum shift in the direction of point m. There are already four pairs of "candidates" (nor, nor), (alt, nor), (nor, alt) and (alt, alt) on the segment me, but according to Rule 2 the last three pairs are discarded; the remaining pair defines a non-antagonistic game and the motion on this segment is generated by the P(NE)-solution of the game. Finally, for the last segment ed1, the only pair of types of behaviors is the pair (alt, nor), which defines a team problem of control; the motion represents the maximum shift in the direction of the point d1.
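The selection made by Rule 2 on each segment can be illustrated with a small Python sketch. The candidate lists are taken from the paragraph above; the simplification that, on a fixed segment, minimizing the time of abnormal behavior amounts to minimizing the number of players behaving abnormally is an assumption of this sketch, as are the function names.

```python
def abnormal_count(pair):
    """Number of players in the pair whose behavior type is not normal."""
    return sum(1 for behavior in pair if behavior != "nor")

def apply_rule2(candidates):
    """Pick the candidate pair with the least abnormal behavior."""
    return min(candidates, key=abnormal_count)

# Pairs admitted by Rule 1 on each segment of the trajectory O-e-d1.
candidates = {
    "On": [("agg", "agg")],
    "nm": [("nor", "alt"), ("agg", "alt")],
    "me": [("nor", "nor"), ("alt", "nor"), ("nor", "alt"), ("alt", "alt")],
    "ed1": [("alt", "nor")],
}
chosen = {segment: apply_rule2(pairs) for segment, pairs in candidates.items()}
for segment, pair in chosen.items():
    print(segment, pair)
```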

Thus, we have constructed the following indicator function-programs

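$$\alpha_1^{(2)}(t) = \{\text{agg}, \ t \in [0, 0.361); \ \text{nor}, \ t \in [0.361, 3.678); \ \text{alt}, \ t \in [3.678, 5]\}, \qquad (9.2)$$

$$\alpha_2^{(2)}(t) = \{\text{agg}, \ t \in [0, 0.361); \ \text{alt}, \ t \in [0.361, 2.022); \ \text{nor}, \ t \in [2.022, 5]\}. \qquad (9.3)$$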
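The indicator function-programs (9.2), (9.3) are piecewise constant and can be written down directly. The Python sketch below tabulates the resulting pair of behavior types on each interval and the total time each player spends in abnormal behavior; the names `alpha1`, `alpha2`, and `schedule` are illustrative.

```python
T_N, T_M, T_E, T_F = 0.361, 2.022, 3.678, 5.0   # switching times and final time

def alpha1(t):
    """Indicator function-program (9.2) of Player 1."""
    if t < T_N:
        return "agg"
    if t < T_E:
        return "nor"
    return "alt"

def alpha2(t):
    """Indicator function-program (9.3) of Player 2."""
    if t < T_N:
        return "agg"
    if t < T_M:
        return "alt"
    return "nor"

breaks = [0.0, T_N, T_M, T_E, T_F]
schedule = []
for lo, hi in zip(breaks, breaks[1:]):
    mid = 0.5 * (lo + hi)          # both programs are constant on (lo, hi)
    schedule.append((lo, hi, alpha1(mid), alpha2(mid)))

abnormal1 = sum(hi - lo for lo, hi, b1, _ in schedule if b1 != "nor")
abnormal2 = sum(hi - lo for lo, hi, _, b2 in schedule if b2 != "nor")

for row in schedule:
    print(row)
print(round(abnormal1, 3), round(abnormal2, 3))
```

The four intervals correspond to the segments $On$, $nm$, $me$, and $ed_1$ of the trajectory.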

We denote by $(U^{(2)}, V^{(2)})$ the pair of players' strategies that generates the limit motion $Oed_1$ for $t \in [0, 5]$ and is consistent with the constructed indicator functions. Then we obtain the following assertion.

Theorem 1. The pair of actions $\{(U^{(2)}, \alpha_1^{(2)}(\cdot)), (V^{(2)}, \alpha_2^{(2)}(\cdot))\}$ with (9.2), (9.3) provides a BT-solution.

Following the scheme of the proof of Theorem 1, we arrive at the following assertion.

Theorem 2. The sets D1 and D2 consist of those and only those points that are endpoints of the trajectories generated by the BT-solutions of the game.

10. Conclusion

In this paper, we use complex switching, namely, switching from one type of behavior to another, which changes the nature of the optimization problem from a non-antagonistic game to a zero-sum game or a team problem of control and vice versa. These switchings are carried out according to pre-selected indicator function-programs. Each player controls the choice of a pair of actions {positional strategy, indicator function}. Thus, the possibilities of each player are in general expanded, and it is possible to introduce a new concept of a game solution (the P(BT)-solution) in which both players increase their payoffs in comparison with the payoffs in a Nash equilibrium of the game without switching of behavior types. It is advantageous for the players to implement a P(BT)-trajectory, so they will follow the declared indicator function-programs, for example, (9.2), (9.3).

REFERENCES

1. Kleimenov A. F. Neantagonisticheskie positsionnye differentsialnye igry [Non-antagonistic Positional Differential Games]. Ekaterinburg: Nauka, 1993. 185 p. (in Russian).

2. Kleimenov A. F. Solutions in a non-antagonistic positional differential game. J. Appl. Math. Mech., 1997. Vol. 61, No. 5. P. 717-723. DOI: 10.1016/S0021-8928(97)00094-4

3. Kleimenov A. F. An approach to building dynamics for repeated bimatrix 2x2 games involving various behavior types. In: Dynamic and Control. London: Gordon and Breach Sci. Publ., 1998. P. 195-204.

4. Kleimenov A. F. Altruistic behavior in a non-antagonistic positional differential game. Autom. Remote Control, 2017. Vol. 78, No. 4. P. 762-769. DOI: 10.1134/S0005117917040178

5. Kleimenov A. F., Kryazhimskii A. V. Normal Behavior, Altruism and Aggression in Cooperative Game Dynamics. Interim Report IR-98-076. Laxenburg: IIASA, 1998. 47 p. URL: https://core.ac.uk/download/pdf/52947411.pdf

6. Kononenko A. F. On equilibrium positional strategies in nonantagonistic differential games. Dokl. Akad. Nauk SSSR, 1976. Vol. 231, No. 2. P. 285-288. (in Russian).

7. Krasovskii N. N. Upravlenie dinamicheskoi sistemoi [Control of a Dynamical System]. Moscow: Nauka, 1985. 520 p. (in Russian).

8. Krasovskii N. N., Subbotin A. I. Game-Theoretical Control Problems. New York: Springer-Verlag, 1988. 517 p.

9. Petrosyan L. A., Zenkevich N. A., Shevkoplyas E. V. Teoriya igr [Game Theory]. St. Petersburg: BHV-Petersburg, 2012. 424 p. (in Russian).
