
Contributions to Game Theory and Management, XII, 37-48

Random Search Methods for the Solution of a Stackelberg Game of Resource Allocation*

Grigory I. Belyavsky and Natalya V. Danilova

I.I. Vorovich Institute of Mathematics, Mechanics and Computer Sciences of Southern Federal University, 8a, Milchakova, Rostov-on-Don, Russia beliavsky@hotmail.com daniloval98686@mail.ru

Abstract. We consider a dynamic Stackelberg game on a finite time interval. The game is reduced to a problem of infinite-dimensional optimization with two additional constraints. Two finite-dimensional approximations of the problem are defined. They are solved by two numerical algorithms which do not require calculation of the gradient of the payoff function. The first algorithm is an algorithm of simulated annealing with a uniform partition of the interval. The second algorithm uses a piecewise-constant approximation of the solution with a choice of the interval partition. Two illustrative examples connected with a resource allocation problem are considered. The numerical results are given and compared.

Keywords: random search methods, Stackelberg game, resource allocation

1. Introduction

Dynamic Stackelberg games (Basar and Olsder, 1999) are actively analyzed and discussed as adequate models of hierarchically controlled dynamic systems. In particular, one of the interesting problem domains is resource allocation in organizational and economic systems (Christodoulou et al., 2015; Novikov, 2013).

Analytical methods for the solution of dynamic Stackelberg games are quite complicated due to the complex nature of those models. A comprehensive approach was proposed by Germeier for static Stackelberg games (Germeier, 1986) and developed by Kononenko and Gorelov for the dynamic case (Gorelov and Kononenko, 2015; Kononenko, 1977; Kononenko, 1980). The idea consists in the implementation of a cooperative trajectory and punishment in the case of defection.

However, numerical algorithms are more convenient in this context. Evolutionary algorithms, such as genetic and simulated annealing algorithms (Jones, 2008), are especially useful. An important place belongs to the methods which do not require the calculation of the gradient of the payoff function (Hazan, 2015).

The authors' approach is presented in (Belyavsky et al., 2016; Belyavsky et al., 2018a; Belyavsky et al., 2018b). In the paper (Belyavsky et al., 2016) an application of evolutionary modeling to the solution of problems of sustainable management in active systems is considered. The different information structures of hierarchical differential games are described. A result which gives the opportunity of using genetic algorithms for the solution of these problems is obtained and illustrated by a model example. In (Belyavsky et al., 2018a) a dynamic game-theoretic model of resource allocation in an organizational system is proposed. The algorithms of evolutionary modeling are developed in this context and illustrated

* The research is supported by the Russian Science Foundation, project 17-19-01038.

by model examples. The paper (Belyavsky et al., 2018b) considers resource allocation among producers (agents) in the case where the Principal knows nothing about their cost functions while the agents have Markovian awareness about her strategies. We use a dynamic setup of the stochastic inverse Stackelberg game as the model and suggest an algorithm for solving this game based on Q-learning. The associated Bellman equations contain functions of one variable for the Principal and the agents.

This paper develops the described approach. In Section 2 the model formulation is given. Section 3 presents the Stackelberg game in infinite-dimensional and finite-dimensional spaces. In Sections 4 and 5 the simulated annealing and binary partition algorithms are presented, respectively. Section 6 is dedicated to the numerical results and their comparative analysis based on the first numerical example. Section 7 treats an application of the simulated annealing algorithm in a static game with incomplete information. The numerical results concerned with an additional illustrative example are given in Section 8. Section 9 concludes.

2. A model formulation

A dynamic Stackelberg game with one leader and multiple followers (agents) is considered. The game-theoretic model contains the following main elements: the state of the game $(x_0(t), x(t)) \in \mathbb{R}^{r+1}$; the strategies $(u(t), v(t)) \in \mathbb{R}^{r+1}$; the leader's payoff $\int_0^T g_0(x_0, u, v)\,dt$; the agents' payoffs $\int_0^T g_i(x, u, v)\,dt$, where $u$ is the leader's control and $v_i$ is the reaction of the agent indexed by $i$.

Define the leader's problem as the calculation of

$$\max_u \int_0^T g_0(x_0, u, v)\,dt, \quad \text{with constraint } dx_0(t) = f_0(x_0, u)\,dt, \; x_0(0) = x_0^0. \qquad (1)$$

A homeostasis condition $x(t) \in X$ can also be added, for example, in the form $(x_0^* - x(t))^2 \le a$. The homeostasis condition can be expressed by a penalty $k \int_0^T (x_0^* - x(t))^2\,dt$. We can include the penalty into the leader's payoff functional:

$$\int_0^T \left\{ g_0(x_0, u, v) - k (x_0^* - x(t))^2 \right\} dt.$$

The agents' problems are set up in the form

$$\max_{v_i} \int_0^T g_i(x, u, v)\,dt, \quad \text{with constraints } dx_i(t) = f_i(x_i, v_i)\,dt, \; x_i(0) = x_i^0. \qquad (2)$$

It is supposed that the game (1), (2) can be transformed into a static Stackelberg game in infinite-dimensional linear spaces:

$$J_0(u, v) \to \max_u; \qquad J_i(u, v) \to \max_{v_i}, \; i = 1, 2, \dots, r. \qquad (3)$$

In other words, for any feasible strategies $(u, v)$ there is an algorithm of calculation of the trajectories $(x_0, x_i)$. It is assumed that the functions $u$ and $v_i$ belong to the Banach space $B[0,1]$ of bounded functions with the uniform norm $\|f\| = \sup_{t \in [0,1]} |f(t)|$. A normalization in time is made additionally. Thus, the game (3) is considered.
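Such a reduction presupposes a procedure that, given $(u, v)$, computes the trajectories and the payoff integrals. As an illustration only, here is a minimal sketch of such a procedure based on the explicit Euler scheme; the primitives `f0`, `g0` and all numerical values below are hypothetical placeholders, not part of the model.

```python
def leader_payoff(u, v, f0, g0, x0_init, T=1.0, n_steps=1000):
    """Compute int_0^T g0(x0, u, v) dt along dx0 = f0(x0, u) dt
    by the explicit Euler scheme (rectangle rule for the integral)."""
    dt = T / n_steps
    x0, payoff = x0_init, 0.0
    for k in range(n_steps):
        t = k * dt
        payoff += g0(x0, u(t), v(t)) * dt  # accumulate the payoff integral
        x0 += f0(x0, u(t)) * dt            # advance the leader's state
    return payoff

# Toy usage with illustrative primitives:
u = lambda t: 1.0 + 0.1 * t                      # leader's control
v = lambda t: 0.5                                # agents' aggregate reaction
f0 = lambda x, u_: -x + u_                       # hypothetical state dynamics
g0 = lambda x, u_, v_: v_ - 0.1 * (x - 1.0)**2   # hypothetical instant payoff
print(leader_payoff(u, v, f0, g0, x0_init=1.0))
```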

3. The Stackelberg game in infinite-dimensional and finite-dimensional spaces

The leader chooses her strategy $u$ and reports it to the agents. In turn, the agents choose their strategies as a best response to the leader's strategy from the set of Nash equilibria in their game in normal form: $v(u) \in N(u)$. Therefore the leader's problem takes the form

$$\max_u \min_{v \in N(u)} J_0(u, v), \qquad (4)$$

if the agents do not cooperate with her, and the form

$$\max_u \max_{v \in N(u)} J_0(u, v), \qquad (5)$$

if they cooperate.

Let $\varphi(u)$ be the solution of the internal problem in (4) or (5). Then the leader's problem has the form

$$\max_u J_0(u), \qquad (6)$$

where $J_0(u) = J_0(u, \varphi(u))$.

Consider a finite-dimensional approximation of problem (6). A sufficient condition for the possibility of the finite-dimensional approximation is the continuity of the functional $J_0(u)$ on the set of feasible solutions. The continuity is ensured by the Lipschitz condition

$$|J_0(u) - J_0(w)| \le L \|u - w\|,$$

which follows from the two inequalities:

$$|J_0(u_2, v) - J_0(u_1, w)| \le L_u \|u_2 - u_1\| + L_v \|v - w\|_r, \qquad \|\varphi(u_2) - \varphi(u_1)\|_r \le L_* \|u_2 - u_1\|. \qquad (7)$$

The last inequality in (7) is the most difficult to verify.

Consider the first class of feasible controls as a subset of the space of bounded functions in the form

$$L_1([0,1]) = \left\{ u \in B[0,1] : \exists\, a,\ \sup \frac{|u(t) - u(s)|}{|t - s|} \le a,\ 0 \le t \le 1,\ 0 \le s \le 1,\ t \ne s \right\}. \qquad (8)$$

It is assumed that the leader is in a sense restricted in her actions and therefore chooses her controls from this class. In other words, the leader is unable to make 'sharp motions'.

The next result forms a base for the proposed method of finite-dimensional approximation.

Theorem 1 (Belyavsky et al., 2016). If $u \in L_1([0,1])$ then there exists a sequence

$$u_n(t) = u_0 + a \sum_{i=1}^{n} \delta_i^n(u)\, I_{\{t > \tau_i\}}, \quad \delta_i^n(u) \in \{-1, 0, 1\}, \; \tau_i = i/n, \; i = 0, 1, \dots, n, \qquad (9)$$

that converges to $u$ in the norm of the space $B[0,1]$.

This result means that the subset

$$\tilde{L}_1[0,1] = \left\{ u \in L_1[0,1] : \exists\, (u_0, a, n, \delta),\ u = u_0 + a \sum_{i=1}^{n} \delta_i\, I_{\{t > \tau_i\}},\ \delta_i \in \{-1, 0, 1\},\ \tau_i = i/n,\ i = 0, 1, \dots, n \right\}$$

is dense in $L_1[0,1]$.
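For illustration, a control from $\tilde{L}_1[0,1]$ is fully specified by the triple $(u_0, a, \Delta)$ and can be evaluated pointwise as follows (a minimal sketch; the function name is hypothetical).

```python
import numpy as np

def control_value(t, u0, a, delta):
    """u(t) = u0 + a * sum_i delta_i * 1{t > tau_i}, tau_i = i/n,
    delta_i in {-1, 0, 1} -- the parametrization used in (9)."""
    n = len(delta)
    taus = np.arange(1, n + 1) / n
    return u0 + a * np.sum(np.asarray(delta) * (t > taus))

# Example: the control steps up after t = 1/4 and down after t = 3/4.
print(control_value(0.6, u0=1.0, a=0.5, delta=[1, 0, 0, -1]))  # 1.5
```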

Assume that the initial problem (6) with the additional constraint $u \in L_1[0,1]$ has a solution. Then the finite-dimensional approximation of problem (6) with the additional constraint is the optimization problem

$$\max_{u \in \tilde{L}_1[0,1]} J_0(u) = \max_{u_0 \in \mathbb{R},\, a,\, \Delta} J_0(u_0, a, \Delta), \quad \Delta = (\delta_i)_{i=1}^{n}. \qquad (10)$$

For a fixed $\Delta$ it is assumed that the problem $\max_{u_0 \in \mathbb{R},\, a} J_0(u_0, a, \Delta)$ has a solution. Suppose that this condition holds. The rationale for the finite-dimensional approximation is given by the following result.

Theorem 2 (Belyavsky et al., 2016). Let $p^* = \max_{u \in L_1[0,1]} J_0(u) = J_0(u^*)$ and $q^* = \max_{u \in \tilde{L}_1[0,1]} J_0(u) = J_0(\tilde{u}^*)$. Then for any $\varepsilon > 0$ we have $p^* - q^* < \varepsilon$.

In fact, the continuity of the functional $J_0(u)$ and the density of the set $\tilde{L}_1([0,1])$ in the set $L_1([0,1])$ imply the following inequalities: $p^* - q^* = J_0(u^*) - J_0(\tilde{u}^*) \le J_0(u^*) - J_0(u^1) < \varepsilon$. To satisfy the last inequality it is required to choose $u^1 \in \tilde{L}_1([0,1])$ close enough to $u^*$.

The second class of feasible controls is the set of bounded functions having a finite number of points of discontinuity on the segment $[0,1]$. Denote this set by $L_2([0,1])$. Define the set

$$\tilde{L}_2([0,1]) = \left\{ u \in B[0,1] : \exists\, (n, (c_i)_{i=1}^{n}, (\tau_i)_{i=0}^{n}),\ 0 = \tau_0 < \tau_1 < \dots < \tau_n = 1,\ u(t) = c_1 I_{\{0\}}(t) + \sum_{i=1}^{n} c_i\, I_{(\tau_{i-1}, \tau_i]}(t) \right\}.$$

It is evident that the set $\tilde{L}_2$ is dense in the set $L_2$. Given the continuity of the functional $J_0(u)$ on the set $L_2([0,1])$, the problem

$$\max_{u \in \tilde{L}_2[0,1]} J_0(u) \qquad (11)$$

is a finite-dimensional approximation of problem (6) with the additional constraint $u \in L_2[0,1]$ if both problems have solutions.

4. A simulated annealing algorithm

The simulated annealing algorithm is used for the solution of problem (10). Let us note that the first two variables, $u_0$ and $a$, are real numbers, while the third variable $\Delta$ takes its values from the finite set of sequences $S = \{(\delta_i)_{i=1}^{n} : \delta_i \in \{-1, 0, 1\}\}$. If the functional $J_0(u)$ satisfies a global Lipschitz condition with the Lipschitz constant $L$, and $K = \sup a$ in the definition (8), then for a given $\varepsilon$ the number of elements in the sequence is equal to $n = [KL/\varepsilon] + 1$ (see details in Belyavsky et al., 2016). Thus, the considered problem reduces to the calculation of the maximum of the function $F(\Delta) = \max_{u_0, a} J_0(u_0, a, \Delta)$ on the finite set $S$. Problems of this kind can be solved efficiently by algorithms of evolutionary modeling, such as genetic algorithms and simulated annealing algorithms. The genetic algorithm was used in (Belyavsky et al., 2016; Belyavsky et al., 2018a); that is why here we consider the simulated annealing algorithm (Jones, 2008).

The algorithm starts from an initial $\Delta$ and an initial temperature $T = T_s$. The iterations have the following form (a code sketch is given after the list).

1. A new $\tilde{\Delta}$ is generated in the neighborhood of the current $\Delta$.

2. If $F(\tilde{\Delta}) > F(\Delta)$ then $\Delta := \tilde{\Delta}$; else $\Delta := \tilde{\Delta}$ with probability

$$p = \exp\left( \frac{F(\tilde{\Delta}) - F(\Delta)}{T} \right).$$

3. Set the temperature $T := qT$, where $0 < q < 1$.

4. The iterations are repeated while $(T > T_f) \wedge (|\Delta F| > \varepsilon)$.
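The loop below is a minimal sketch of these four steps for a generic objective $F$ on $S$; the neighbor-generation rule with the parameter $q$ anticipates Section 6, and all numerical parameters are illustrative, not prescribed by the paper (the cooling factor is named `cool` to avoid a clash with the mutation parameter $q$).

```python
import math
import random

def neighbor(delta, q):
    """Replace each delta_j with probability q/n by a uniformly chosen
    different element of {-1, 0, 1} (the rule described in Section 6)."""
    n = len(delta)
    return [random.choice([s for s in (-1, 0, 1) if s != d])
            if random.random() < q / n else d
            for d in delta]

def simulated_annealing(F, delta0, Ts=100.0, Tf=1e-9, cool=0.95, q=5, eps=1e-6):
    delta, T = list(delta0), Ts
    F_cur = F(delta)
    while T > Tf:
        cand = neighbor(delta, q)
        dF = F(cand) - F_cur
        # Accept an improvement always, a deterioration with prob. exp(dF/T).
        if dF > 0 or random.random() < math.exp(dF / T):
            delta, F_cur = cand, F_cur + dF
            if abs(dF) < eps:     # the accepted change is negligible: stop
                break
        T *= cool                 # cooling schedule T := qT, 0 < q < 1
    return delta, F_cur
```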

5. A binary partition algorithm

Now suppose that the functional $J_0(u)$ is additive, i.e. for an arbitrary partition of the segment $[0,1]$, $0 = \tau_0 < \tau_1 < \dots < \tau_n = 1$, the equality holds:

$$J_0(u) = J_0\left( \sum_{i=1}^{n} u(t)\, I_{[\tau_{i-1}, \tau_i)}(t) \right) = \sum_{i=1}^{n} J_0\left( u(t)\, I_{[\tau_{i-1}, \tau_i)}(t) \right). \qquad (12)$$

The algorithm has the following form (a code sketch is given after the description).

1. Initialization. Let the initial partition of the segment be given: $[0, 1/2) \cup [1/2, 1]$. Consider the approximation $u^1(t) = c_{1,0}\, I_{[0,1/2]}(t) + c_{1,1}\, I_{(1/2,1]}(t)$. From all approximations of this form choose the best one using the additivity of the functional: $J_0(u^1) = J_0(c_{1,0}\, I_{[0,1/2]}) + J_0(c_{1,1}\, I_{(1/2,1]})$. For this purpose two independent problems are solved: $\max_c J_0(c\, I_{[0,1/2]})$ and $\max_c J_0(c\, I_{(1/2,1]})$.

2. Iteration. Let the current partition be $0 = \tau_0^{(n)} < \tau_1^{(n)} < \dots < \tau_n^{(n)} = 1$, and the respective current approximation be $u^{(n)}(t) = c_{n,1}\, I_{\{0\}}(t) + \sum_{i=1}^{n} c_{n,i}\, I_{(\tau_{i-1}^{(n)}, \tau_i^{(n)}]}(t)$. Define the sequence

$$b_{n,j} = \frac{1}{\tau_j^{(n)} - \tau_{j-1}^{(n)}}\, J_0\left( c_{n,j}\, I_{(\tau_{j-1}^{(n)}, \tau_j^{(n)}]} \right)$$

and the respective probabilities of the choice of an interval: $p_{n,j} = b_{n,j} \big/ \sum_{k=1}^{n} b_{n,k}$, $j = 1, 2, \dots, n$. An interval is chosen randomly according to the distribution of probabilities $p_n$. The chosen interval with the index $j$, namely $(\tau_{j-1}^{(n)}, \tau_j^{(n)}]$, is partitioned by the point $(\tau_{j-1}^{(n)} + \tau_j^{(n)})/2$ into two intervals of equal length, and the new approximation

$$u^{(n+1)}(t) = c_{n+1,1}\, I_{\{0\}}(t) + \sum_{i=1}^{n+1} c_{n+1,i}\, I_{(\tau_{i-1}^{(n+1)}, \tau_i^{(n+1)}]}(t)$$

is calculated, where $c_{n+1,i} = c_{n,i}$ if $i = 1, \dots, j-1$;

$$c_{n+1,j} = \arg\max_c J_0\left( c\, I_{(\tau_{j-1}^{(n)},\, (\tau_{j-1}^{(n)} + \tau_j^{(n)})/2]} \right), \qquad c_{n+1,j+1} = \arg\max_c J_0\left( c\, I_{((\tau_{j-1}^{(n)} + \tau_j^{(n)})/2,\, \tau_j^{(n)}]} \right),$$

and $c_{n+1,i} = c_{n,i-1}$ if $i = j+2, \dots, n+1$. The new partition is calculated as $\tau_i^{(n+1)} = \tau_i^{(n)}$ if $i = 0, \dots, j-1$; $\tau_j^{(n+1)} = (\tau_{j-1}^{(n)} + \tau_j^{(n)})/2$; $\tau_i^{(n+1)} = \tau_{i-1}^{(n)}$ if $i = j+1, \dots, n+1$. The iterations are repeated until the changes become small enough.

This algorithm is monotonous. Therefore the sequence $J_0(u^{(n)})$ converges if the leader's payoff functional is bounded from above. An objective of minimizing the number of points of the partition of the interval is pursued additionally.
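A minimal sketch of one iteration of this scheme is given below; `J0_piece(c, t0, t1)` stands for the contribution $J_0(c\, I_{(t_0, t_1]})$ and `best_c(t0, t1)` for its inner maximizer, both supplied by the caller (hypothetical names), and positivity of the weights $b_{n,j}$ is assumed when forming the probabilities.

```python
import numpy as np

def refine_partition(taus, cs, J0_piece, best_c, rng=None):
    """One iteration: choose an interval with probability proportional to
    b_j = J0_piece(c_j, tau_{j-1}, tau_j) / (tau_j - tau_{j-1}),
    split it in half and re-optimize the constants on the two halves."""
    rng = rng or np.random.default_rng()
    n = len(cs)
    b = np.array([J0_piece(cs[j], taus[j], taus[j + 1]) / (taus[j + 1] - taus[j])
                  for j in range(n)])
    j = rng.choice(n, p=b / b.sum())        # random choice of the interval
    mid = 0.5 * (taus[j] + taus[j + 1])
    new_cs = cs[:j] + [best_c(taus[j], mid), best_c(mid, taus[j + 1])] + cs[j + 1:]
    new_taus = np.insert(taus, j + 1, mid)
    return new_taus, new_cs

# Starting from taus = np.array([0.0, 0.5, 1.0]) and the two optimal constants,
# the iteration is repeated until J0 of the approximation stops improving.
```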

6. The first example

We will consider a dynamic Stackelberg game which is already reduced to the form (6). The game describes a resource allocation between producers; see the monograph (Novikov, 2013) in this connection. The amount of the resource allocated at the moment $t$ is denoted by $u(t)$. Each player receives his part of the resource $u_i(t)$; it is evident that $u(t) = \sum_{i=1}^{r} u_i(t)$. Given the resource, each player produces a good $v_i(t)$ so as to maximize his instant payoff. The leader tends to maximize the total production and uses the proportional distribution mechanism, i.e. $u_i(t) = \gamma_i(t) v_i(t)$. Then $\sum_{i=1}^{r} \gamma_i(t) v_i(t) = u(t)$. An instant payoff of the $i$-th follower is calculated as the difference between his part of the resource and the production cost: $p_i(t) = \gamma_i(t) v_i(t) - \varphi_i(v_i(t))$. The cost $\varphi_i(x)$ is a convex non-decreasing function defined on $\mathbb{R}_+$, and $\varphi_i(0) = 0$. Thus, an auxiliary Stackelberg game arises in the following form:

$$\max \sum_{i=1}^{r} v_i \quad \text{with constraint } \sum_{i=1}^{r} \gamma_i v_i = u, \qquad (13)$$

$$\max_{v_i \ge 0} \left[ \gamma_i v_i - \varphi_i(v_i) \right].$$

In this formulation the game between the followers is decomposed into independent optimization problems. If we assume that $\gamma_i = \gamma$ then the equality $\sum_{i=1}^{r} \gamma_i v_i = u$ implies the equality $\gamma = u \big/ \sum_{i=1}^{r} v_i$, and the solution of the followers' game consists in the calculation of the Nash equilibrium in the game with the individual followers' payoffs

$$\max_{v_i} \left[ \frac{u\, v_i}{\sum_{j=1}^{r} v_j} - \varphi_i(v_i) \right]$$

(see details in Christodoulou et al., 2015).

Consider the game with the cost functions $\varphi_i(x) = \mu_i x^2$. In this case the solution of (13) takes the form

$$v_i = \frac{\gamma}{2\mu_i}, \qquad \gamma = \sqrt{ u \Big/ \sum_{i=1}^{r} \frac{1}{2\mu_i} }. \qquad (14)$$

Notice that all $\gamma_i$ do not depend on $i$.

The instant payoff of the leader is determined as $R\left( \sum_{i=1}^{r} v_i(t),\, u(t) \right)$. The function $R(x, y)$ increases in the first argument and decreases in the second argument. The total payoff is the integral of the instant payoff: $\int_0^T R\left( \sum_{i=1}^{r} v_i(t),\, u(t) \right) dt$. Based on (14) we receive the total payoff as $J_0(u) = \int_0^T R\left( \sqrt{ u \sum_{i=1}^{r} \frac{1}{2\mu_i} },\ u \right) dt$. Denote $A(t) = \sqrt{ \sum_{i=1}^{r} \frac{1}{2\mu_i(t)} }$. Then $J_0(u) = \int_0^T R(A\sqrt{u},\, u)\,dt$. Assume that $R(x, y) = x - y$; then $J_0(u) = \int_0^T \left[ A\sqrt{u} - u \right] dt$. It is evident that the function $u^*(t) = A^2(t)/4$ is a maximizer of $J_0(u)$. This function satisfies the first and the second additional constraints if $\mu_i(t) > 0$.

Consider the algorithm of simulated annealing in the game with two followers and $\mu_1(t) = t^2 + 1$, $\mu_2(t) = 2t^2 + 1$. In this algorithm a uniform partition of the interval is fixed. Calculate

$$F(\Delta) = \max_{u_0, a} \sum_{i=1}^{n} \left[ a_i \sqrt{u_0 + a g_i(\Delta)} - (u_0 + a g_i(\Delta))\, \Delta t \right],$$

where $a_i = \int_{\tau_{i-1}}^{\tau_i} A(t)\,dt$, $g_i(\Delta) = \sum_{j=1}^{i-1} \delta_j$, and $\Delta t = \tau_i - \tau_{i-1}$. From the concavity in $u_0$ and $a$ it follows that the optimal values are the solutions of the algebraic system of two equations:

$$\sum_{j=1}^{n} \frac{a_j}{\sqrt{u_0 + a g_j(\Delta)}} = 2, \qquad \sum_{j=1}^{n} g_j(\Delta) \left[ \frac{a_j}{\sqrt{u_0 + a g_j(\Delta)}} - 2\Delta t \right] = 0.$$

For the generation of $\tilde{\Delta}$ in the neighborhood of $\Delta$, each $\delta_j$ is replaced with probability $q/n$ by an element of $\{-1, 0, 1\} \setminus \{\delta_j\}$, chosen with equal probabilities. The parameter $q \in \{1, 2, \dots, n-1\}$ determines the mean number of changed elements of $\Delta$. The results of calculations are presented in Fig. 1.

Fig. 1. The results of the simulated annealing algorithm for $T_{\max} = 100$, $T_{\min} = 10^{-9}$, $q = 5$, $n = 64$. The dotted line is the exact solution.
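Alternatively to solving this system, the inner maximization over $(u_0, a)$ can be performed by a direct numerical search. The sketch below evaluates $F(\Delta)$ for this example; the midpoint quadrature, the Nelder-Mead method, the starting point and the clipping of $u_0 + a g_i(\Delta)$ to nonnegative values are all illustrative choices added here, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def make_F(n):
    """Build F(Delta) for mu_1(t) = t^2 + 1, mu_2(t) = 2 t^2 + 1."""
    dt = 1.0 / n
    mids = (np.arange(n) + 0.5) * dt
    A = np.sqrt(1.0 / (2 * (mids**2 + 1)) + 1.0 / (2 * (2 * mids**2 + 1)))
    a_int = A * dt                          # a_i = int A(t) dt, midpoint rule

    def F(delta):
        g = np.concatenate(([0.0], np.cumsum(delta)[:-1]))  # g_i = sum_{j<i} delta_j
        def neg(z):
            u0, a = z
            u = np.clip(u0 + a * g, 0.0, None)              # keep u >= 0
            return -np.sum(a_int * np.sqrt(u) - u * dt)
        return -minimize(neg, x0=[0.1, 0.01], method="Nelder-Mead").fun
    return F

F = make_F(64)   # then F(delta) is plugged into the annealing loop above
```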

Now consider an application of the binary partition algorithm to the same problem. For this purpose the following problem is solved each time: for an interval $[\tau, s]$ it is required to find

$$\max_c \left[ \sqrt{c} \int_{\tau}^{s} A(t)\,dt - c(s - \tau) \right].$$

The optimal value is $c^* = \left( \frac{1}{2(s - \tau)} \int_{\tau}^{s} A(t)\,dt \right)^2$. The results of calculations are presented in Fig. 2.

Fig. 2. The results of the binary partition algorithm. The dotted line is the exact solution.

7. A static game with incomplete information: The simulated annealing algorithm

Consider the following problem setup (Belyavsky et al., 2018b). The leader uses a resource allocation as an incentive for the agent and tries to $\max_u [\psi(v) - \varphi(u, v)]$. The agent maximizes his profit: $\max_v [\varphi(u, v) - f(v)]$. Assume that a Nash equilibrium $(u^*, v^*)$ exists in this game. The feature of the game is that the leader does not know the agent's cost function $f$.

So, the leader uses a sequence of controls $u(t)$, $t = 0, 1, \dots$, for the determination of $u^*$. The sequence $v(t)$, $t = 0, 1, \dots$, represents the agent's best responses to $u(t)$. At each moment of time $t$ the leader knows the interval $v(0), \dots, v(t-1)$ of the best-response sequence of the agent. Based on this information, the leader chooses the control $u(t) = u(t-1)(1 + a\delta(t))$, according to which the agent receives the amount of resource $\varphi(v(t-1), u(t))$. Similarly to Section 3, $\delta(t) \in \{-1, 0, 1\}$, $0 < a < 1$. Thus, at each iteration of the game the leader can keep her control or increase/decrease it by a fixed fraction, according to the 'Lipschitz' concept of the paper. The initial value $u(0)$ and the sequence $\delta(t)$ completely determine the sequence $u(t)$. Assume that the initial value $u(0) = x$ is known.

Consider the agent's problem. The agent supposes that $\delta(t)$ is a Markov sequence with the set of states $\{-1, 0, 1\}$ and a transition probability matrix $Q$ of dimension $3 \times 3$. The initial probability distribution $y$ on the set $\{-1, 0, 1\}$ is given. The agent's problem is to calculate

$$\max_v E_{x,y} \sum_{t=1}^{\infty} \beta^t \left[ \varphi(v(t-1), u(t)) - f(v(t)) \right]. \qquad (15)$$

If the function $-f(z) + \beta \varphi(z, w)$ is strictly concave for any value of the argument $w$ then the optimal control of the agent is

$$v^*(x, y) = \arg\max_z \left[ -f(z) + \beta \varphi\big(z,\ y(1 + a(q_{x,3} - q_{x,1}))\big) \right] \qquad (16)$$

(see details in Belyavsky et al., 2018b). Thus, the current reaction of the agent is calculated as

$$v(t) = \arg\max_z \left[ -f(z) + \beta \varphi\big(z,\ u(t)(1 + a(\hat{q}_{\delta(t),3} - \hat{q}_{\delta(t),1}))\big) \right], \qquad (17)$$

where $\hat{Q}_t$ is the estimate of the matrix $Q$ built from the interval $\delta(1), \dots, \delta(t)$.

The leader observes the agent's reaction $v(t)$, knows her own sequence $\delta(t)$, and chooses the next value $\delta(t+1)$ taking into account that the consequent agent's best response $v(t+1)$ is a random value. Thus, the leader solves the problem

$$\max_{\delta(t+1)} E\left[ \psi\big(v(\delta(t+1))\big) - \varphi\big(v(t),\ u(t)(1 + a\delta(t+1))\big) \right]. \qquad (18)$$

This problem is solved by a reinforcement learning algorithm. According to this algorithm, the values of $Q$ are updated as

$$Q_{t+1}(\delta(t)) = Q_t(\delta(t)) + h_t \big( R(\delta(t)) - Q_t(\delta(t)) \big). \qquad (19)$$

In (19), $R(\delta(t)) = \psi(v(t)) - \varphi(v(t-1), u(t))$, the initial value is $Q_0(\cdot) = 0$, and the sequence $h_t$ satisfies the conditions $\sum_{t=1}^{\infty} h_t = \infty$, $\sum_{t=1}^{\infty} h_t^2 < \infty$. Then the distribution of probabilities on the set $\{-1, 0, 1\}$ is calculated as follows:

$$p_{t+1}(j) = \exp\big(Q_{t+1}(j)/T_{t+1}\big) \Big/ \sum_{k=-1}^{1} \exp\big(Q_{t+1}(k)/T_{t+1}\big), \quad j = -1, 0, 1, \qquad (20)$$

and $\delta(t+1)$ is chosen respectively. In (20), $T$ is a temperature that controls the degree of randomness in the choice of the next $\delta$ (see Section 4). The convergence of the algorithm is studied in (Sutton and Barto, 1998).
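A minimal sketch of one leader's step, combining the update (19) with the softmax choice (20); the step sizes $h_t = 1/t$ (which satisfy the two summability conditions) and the externally supplied temperature are illustrative choices, and the names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng()

def leader_step(Q, counts, d_cur, reward, T):
    """Update the value of the played delta by (19) and draw delta(t+1) by (20).
    Q and counts are length-3 arrays indexed by delta + 1."""
    i = d_cur + 1
    counts[i] += 1
    Q[i] += (reward - Q[i]) / counts[i]     # (19) with h_t = 1/t
    p = np.exp(Q / T)
    p /= p.sum()                            # softmax distribution (20)
    return int(rng.choice((-1, 0, 1), p=p))

# Typical use: Q = np.zeros(3); counts = np.zeros(3); T is lowered each step.
```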

8. The second example

Consider another illustrative example where $\psi(x) = \sqrt{x}$, $\varphi(x, y) = xy$ and $f(x) = \lambda x^2$. It is borrowed from (Belyavsky et al., 2018b) and slightly modified. The equality (17) takes the form

$$v(t) = \arg\max_z \left[ -\lambda z^2 + \beta z u(t)\big(1 + a(\hat{q}_{\delta(t),3} - \hat{q}_{\delta(t),1})\big) \right];$$

a simple calculation gives

$$v(t) = \frac{\beta u(t)\big(1 + a(\hat{q}_{\delta(t),3} - \hat{q}_{\delta(t),1})\big)}{2\lambda}.$$

The equilibrium solution in the game (15), (18) is $v^* = \frac{\beta u^*}{2\lambda}$, $u^* = \left( \frac{\lambda}{8\beta} \right)^{1/3}$. The equilibrium solution in the initial game is $v^* = \frac{u^*}{2\lambda}$, $u^* = \left( \frac{\lambda}{8} \right)^{1/3}$. For $\lambda = 1$, $\beta = 0.9$, $a = 0.05$ the following numerical results are received:

following numerical results are received:

Iteration   The leader's control   The leader's payoff
 1          0.4                    0
 2          0.412                  0.352264
 3          0.39964                0.354196
 4          0.411629               0.352203
 5          0.423978               0.35414
 6          0.436697               0.355904
 7          0.436697               0.357482
 8          0.449798               0.357482
 9          0.463292               0.358856
10          0.449394               0.36001
11          0.462875               0.358817
12          0.476762               0.359978
13          0.491064               0.360902
14          0.491064               0.361569
15          0.491064               0.361569
16          0.476333               0.361569
17          0.490622               0.360877
18          0.505341               0.361553
19          0.520501               0.361952
20          0.504886               0.362054
21          0.520033               0.361944
22          0.504432               0.362055
23          0.504432               0.361936
24          0.504432               0.361936
25          0.504432               0.361936
26          0.504432               0.361936

Table 1. The numerical results for Example 2

Note that the equilibrium solution of the leader for the given input data is equal to 0.519, and the respective payoff is 0.362.
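For completeness, the run in Table 1 can be reproduced qualitatively by the short simulation below; the empirical-frequency estimate of $\hat{Q}_t$, the step sizes and the temperature schedule are assumptions of this sketch, so the produced trajectory matches Table 1 only in its general behavior.

```python
import numpy as np

lam, beta, a = 1.0, 0.9, 0.05
rng = np.random.default_rng(0)

u, v_prev, d_prev = 0.4, 0.0, 0
Q, counts = np.zeros(3), np.zeros(3)   # leader's values for delta = -1, 0, 1
trans = np.zeros((3, 3))               # empirical counts for the estimate Q-hat

for t in range(1, 27):
    # Agent's reaction (17): v = beta * u * (1 + a (q3 - q1)) / (2 lam).
    row = trans[d_prev + 1]
    q1, q3 = (row[0] / row.sum(), row[2] / row.sum()) if row.sum() else (0.0, 0.0)
    v = beta * u * (1 + a * (q3 - q1)) / (2 * lam)
    # Leader's reward R = psi(v(t)) - phi(v(t-1), u(t)) and the update (19).
    i = d_prev + 1
    counts[i] += 1
    Q[i] += (np.sqrt(v) - v_prev * u - Q[i]) / counts[i]
    # Softmax choice (20) of delta(t+1), then the next control.
    p = np.exp(Q / max(0.99**t, 1e-3))
    p /= p.sum()
    d_next = int(rng.choice((-1, 0, 1), p=p))
    trans[d_prev + 1, d_next + 1] += 1
    u, v_prev, d_prev = u * (1 + a * d_next), v, d_next

print(u)   # expected to drift toward the equilibrium value u* ~ 0.519
```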

9. Conclusion

The considered example with the known exact solution demonstrates the numerical applicability of both algorithms. Since the algorithms require only the calculation of payoff functionals, they are efficient in situations where other methods do not generate implementable numerical schemes. It should be noticed that optimization methods which do not require the calculation of the gradient are actively discussed in the modern literature (see, for example, Hazan, 2015).

While the two algorithms provide a comparable precision of the approximate calculations in the first example, they essentially differ. Their comparison by some important criteria is given in Table 2.

Criterion                                          Simulated annealing algorithm   Binary partition algorithm
Additional conditions on the leader's functional   absent                          present (additivity)
Additional conditions on the leader's solution     present (Lipschitz condition)   practically absent
Pre-partition of the interval                      required                        not required

Table 2. Comparison of the algorithms

Note that in the static game with incomplete information (Section 7) the binary partition algorithm is not applicable.

The comparison of the simulated annealing and the genetic algorithm leads to the following conclusions. Both of them are random search algorithms for optimization problems. An essential difference is that simulated annealing is a depth-search algorithm where only one potential solution is studied in each iteration, while the genetic algorithm is a width-search algorithm which tests several potential solutions at each step. A faster convergence may be expected from the genetic algorithm, while the simulated annealing algorithm has simpler iterations in a numerical sense. Therefore, the choice between them is determined by the specific problem. For example, if the calculation of the value of the finite-dimensional leader's objective function is a complicated problem, and the objective function is close to a concave one, then the simulated annealing algorithm looks more attractive.

The considered methods are close to the method of scenarios used in simulation modeling.

References

Basar, T., Olsder, G. J. (1999). Dynamic Non-Cooperative Game Theory. SIAM.

Belyavsky, G. I., Danilova, N. V., Ougolnitsky, G. A. (2016). Evolutionary modeling in sustainable management of active systems. Math. Game Theory Appl., 8(4), 14-29 (In Russian).

Belyavsky, G.I., Danilova, N.V., Ougolnitsky, G.A. (2018a). Evolutionary methods for solving dynamic resource allocation problems. Math. Game Theory Appl., 10(1), 5-22 (In Russian).

Belyavsky, G., Danilova, N., Ougolnitsky, G. (2018b). A Markovian Mechanism of Proportional Resource Allocation in the Incentive Model as a Dynamic Stochastic Inverse Stackelberg Game. Mathematics, 6(8), 131.

Christodoulou, G., Sgouritsa, A., Tang, B. (2015). On the Efficiency of the Proportional Allocation Mechanism for Divisible Resources. In: M. Hoefer (ed.), SAGT 2015, LNCS 9347, 165-177.

Germeier, Yu.B. (1986). Non-antagonistic Games. Reidel Publishing Co., Dordrecht, Boston.

Gorelov, M. A., Kononenko, A. F. (2015). Dynamic models of conflicts. III. Hierarchical games. Automation and Remote Control, 76(2), 264-277.

Hazan, E. (2015). Introduction to Online Convex Optimization. Foundations and Trends in Optimization, 2(3-4), 157-325.

Jones, M. T. (2008). Artificial Intelligence: A Systems Approach. Infinity Science Press, Hingham, MA.

Kononenko, A. F. (1977). On multi-step conflicts with information exchange. USSR Comput. Math. Math. Phys., 17, 104-113.

Kononenko, A. F. (1980). The structure of the optimal strategy in controlled dynamic systems. USSR Comput. Math. Math. Phys., 13-24.

Novikov, D. (2013). Theory of Control in Organizations. Nova Science Publishers, New York.

Sutton, R. S., Barto, A. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
