Contributions to Game Theory and Management, XV, 51—59
Optimal Control in a Multiagent Opinion Dynamic System*
Jingjing Gao and Elena Parilina
St. Petersburg State University, Faculty of Applied Mathematics and Control Processes, 7/9, Universitetskaya nab., St. Petersburg, 199034, Russia E-mail: [email protected] E-mail: [email protected]
Abstract The paper considers a multiagent system of opinion dynamics modeling opinion formation in a finite social network. In the system, there is an influencer, or player, who is interested in making the agents' opinions close to a target opinion. We assume that the player can influence the system only at a limited number of time periods. The player minimizes his costs by selecting the periods at which he controls the multiagent system, while observing the agents' opinions at every time period. The optimization problem is solved using the Euler-equation approach. Numerical simulations illustrate the proposed method of finding the optimal solution of the problem.
Keywords: multiagent system, opinion dynamics, linear-quadratic games, Euler-equation approach.
1. Introduction
Social network modeling attracts great interest from researchers in different areas (sociology, applied mathematics, physics, management science, economics, etc.). A social network is represented as a multiagent system, in which the members of the network are the agents, and the influencer on the agents' opinions is represented by the player. To describe information exchange in social networks, opinion dynamics models are used. An agent's opinion is influenced over time by the opinions of the agents around him and can also be influenced by the players. There are well-known classical models of opinion dynamics, e.g., the DeGroot model (DeGroot, 1974), the Sznajd model (Sznajd-Weron and Sznajd, 2000), and bounded confidence models (Deffuant et al., 2000; Hegselmann and Krause, 2002). Based on these models, many extensions have been proposed, such as the Friedkin and Johnsen model (Friedkin and Johnsen, 1990). Some works on this topic are based on discrete-time linear-quadratic problems or games (Ignaciuk and Bartoszewicz, 2010; Liu et al., 2014). The mean-field linear-quadratic optimal control problem modeling opinion dynamics is studied in (Elliott et al., 2013; Ni et al., 2015). The mean-field game approach is also used in (Bauso et al., 2016). Opinion dynamics models with cooperative and noncooperative approaches are examined in (Rogov and Sedakov, 2020; Sedakov and Zhen, 2019).
The papers (Gao and Parilina, 2021a) and (Gao and Parilina, 2021b) consider linear-quadratic optimization problems in the opinion dynamics model. The former focuses on the observation of the opinion state differences at the terminal time in a
*The work of the second author was supported by the Russian Science Foundation grant No. 22-11-00051, https://rscf.ru/en/project/22-11-00051/
https://doi.org/10.21638/11701/spbu31.2022.05
finite-time horizon. The latter focuses on the optimization problem when the difference between the agents' opinions and the socially desirable opinion is taken into account at a given number of observation moments. In the present paper, we continue the study of the model proposed by Mazalov and Parilina (2020) and focus on the optimal control problem in the multiagent system when the control can be applied only at a limited number of periods.
We consider a multiagent system of opinion dynamics with two agents and one player. The agents and the player represent a simple social network or multiagent system. The opinions of the agents are influenced by the average social opinion. The player influences the opinion of only one agent, trying to make it as close to the target opinion as possible. We assume that the influence of the player is limited, meaning that he can control the agent only at a limited number of periods. The main research question is when the player should apply control to the agent to minimize his costs, given that he observes the agents' opinions at every time period. In the paper, we assume that the player considers a set G of control periods for a given number of such periods k. The player uses the periods from this set to influence the agent's opinion at these particular moments. A set of control moments is called optimal for the player when the associated costs are minimal among all such sets. The model is a linear-quadratic optimal control problem, and the solution is found by the Euler-equation approach (see González-Sánchez and Hernández-Lerma, 2013; Dechert, 1978; González-Sánchez and Hernández-Lerma, 2014).
The rest of the paper is organized as follows. Section 2 describes the model of a multiagent system of opinion dynamics. Section 3 introduces the Euler-equation approach. Section 4 provides main theoretical results while Section 5 demonstrates the results of the numerical simulations. Section 6 concludes the paper.
2. Model
We consider a multiagent system representing a social network with two agents. Let x_1(t) ∈ ℝ (x_2(t) ∈ ℝ) be the opinion of agent 1 (agent 2) at moment t, t = 0, ..., T. We assume that the player, who is not an agent in the system, can control agent 1 at several moments, but he can observe the system at any moment t. We denote the player's influence on agent 1 at moment t by u(t) ∈ ℝ, t = 0, ..., T - 1. The set of moments at which the player controls agent 1 is called the set of control moments G. The number of elements in the set G is given, and it is equal to k < T. Therefore, we consider the problem when k is known to the player, but the set of control moments G is not fixed. Let this set be represented as G = {t_1, t_2, ..., t_k}.
When moment t belongs to the set of control moments G, agent 1's future opinion depends on his own present opinion, the present average opinion of the society, and the player's present control. When moment t does not belong to the set of control moments G, agent 1's future opinion depends on his own present opinion and the present average opinion of the society. Agent 2 is not influenced by the player, and his future opinion depends on his own present opinion and the present average opinion of the society. The dynamics of the agents' opinions are defined by
the following equations:
x_1(t + 1) = x_1(t) + a_1 \left( \frac{x_1(t) + x_2(t)}{2} - x_1(t) \right) + u(t), \quad t \in G, \qquad (1)

x_1(t + 1) = x_1(t) + a_1 \left( \frac{x_1(t) + x_2(t)}{2} - x_1(t) \right), \quad t \notin G, \qquad (2)

x_2(t + 1) = x_2(t) + a_2 \left( \frac{x_1(t) + x_2(t)}{2} - x_2(t) \right), \quad t = 0, \ldots, T - 1, \qquad (3)

with the initial condition

x_1(0) = x_1^0, \quad x_2(0) = x_2^0. \qquad (4)
In the dynamics equations (1)-(3), the constants a_1 > 0 and a_2 > 0 denote agent 1's and agent 2's beliefs about the average social opinion, respectively.
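To fix ideas, the dynamics (1)-(3) can be simulated directly for a given set of control moments and given control values. The following Python sketch is our own illustration and not part of the paper; the function name, the dictionary interface for the controls, and the sample values are assumptions.

```python
# A minimal simulation sketch of dynamics (1)-(3); names are illustrative.
def simulate_opinions(x1_0, x2_0, a1, a2, T, G, u):
    """Return the opinion trajectories x1(0..T), x2(0..T).

    G is the set of control moments; u maps a moment t in G to the
    player's influence u(t). At all other moments no control is applied.
    """
    x1, x2 = [x1_0], [x2_0]
    for t in range(T):
        avg = (x1[t] + x2[t]) / 2.0                   # average social opinion
        du = u.get(t, 0.0) if t in G else 0.0         # player's influence, if any
        x1.append(x1[t] + a1 * (avg - x1[t]) + du)    # equations (1)-(2)
        x2.append(x2[t] + a2 * (avg - x2[t]))         # equation (3)
    return x1, x2

# Example: two steps with a single (illustrative) control at t = 0.
x1, x2 = simulate_opinions(0.7, 0.2, 0.2, 0.9, T=2, G={0}, u={0: -0.1})
```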
The player needs to define a set of control moments G = {t_1, t_2, ..., t_k} for a given k. Let 0 ≤ t_1 < t_2 < ... < t_k ≤ T - 1, where k < T. The player's target opinion is s ∈ ℝ. The player aims to minimize his costs by choosing the set of control moments and the values of the controls at the periods from this set. We first solve the optimization problem over the set of controls for a given set of control moments G. The optimization problem of the player is
\min_{u} J(u) = \sum_{j=1}^{k} \delta^{t_j} c\, u^2(t_j) + \sum_{t=0}^{T} \delta^{t} \left( (x_1(t) - s)^2 + (x_2(t) - s)^2 \right), \qquad (5)
where δ ∈ (0, 1] is the discount factor and c > 0 is the player's cost per unit level of influence.
Second, we choose, among all possible sets of control moments, the one that yields the player's minimal costs.
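For completeness, the costs (5) can be evaluated numerically once the trajectories and the controls are known. The sketch below is our own illustration (the function name and interface are assumptions, not the authors' code); it can be combined with the simulation sketch given after the dynamics above.

```python
# Illustrative evaluation of the cost functional (5); names are ours.
def player_costs(x1, x2, u, s, delta, c):
    """x1, x2: opinion trajectories of length T + 1; u: dict {t_j: u(t_j)}."""
    T = len(x1) - 1
    control_cost = sum(delta ** t * c * ut ** 2 for t, ut in u.items())
    state_cost = sum(delta ** t * ((x1[t] - s) ** 2 + (x2[t] - s) ** 2)
                     for t in range(T + 1))
    return control_cost + state_cost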
3. The Euler-Equation Approach
In this section we briefly describe the Euler-equation approach to solving the optimization problem. Let X ⊂ ℝ^n and U ⊂ ℝ^m be the state space and the control set, respectively. Given an initial state x^0 ∈ X, the state of the system evolves according to the dynamics

x(t + 1) = f_t(x(t), u(t)), \quad t = 0, 1, \ldots, T - 1. \qquad (6)
The optimal control problem is to find a control u(t) ∈ U maximizing the functional

\sum_{t=0}^{T} \delta^{t} r_t(x(t), u(t)) \qquad (7)

subject to the state dynamics equations (6) and a given initial condition x(0) = x^0, where r_t(x(t), u(t)) is a reward or cost function of the player.
We can reformulate this problem in terms of the state trajectory x(t) by expressing u(t) from equation (6) as a function of x(t) and x(t + 1), say u(t) = q(x(t),x(t + 1)). Therefore, we can rewrite functional (7) in the following form:
\sum_{t=0}^{T} \delta^{t} g_t(x(t), x(t + 1)), \qquad (8)

where g_t(x(t), x(t + 1)) = r_t(x(t), q(x(t), x(t + 1))), t = 0, 1, ..., T - 1. The Euler-equation approach gives the necessary conditions¹ for the optimal trajectory x*(t), which are²

\frac{\partial g_{t-1}(x^*(t - 1), x^*(t))}{\partial y} + \delta\, \frac{\partial g_t(x^*(t), x^*(t + 1))}{\partial x} = 0, \quad t = 1, \ldots, T - 1, \qquad (9)

where g_{t-1} is differentiated with respect to y, its second argument, and g_t is differentiated with respect to x, its first argument.
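To make the approach concrete, condition (9) can be checked numerically along a candidate trajectory. The sketch below is our own illustration for a simple scalar problem x(t+1) = Ax(t) + u(t) with reward r_t(x, u) = -(x - s)^2 - cu^2, so that u = q(x, y) = y - Ax; the parameter values and function names are assumptions, not taken from the paper. Along an optimal interior trajectory the residuals vanish up to numerical error, which is condition (9) written in the weighted form δ^(t-1) ∂g_{t-1}/∂y + δ^t ∂g_t/∂x = 0.

```python
import numpy as np

# Scalar illustration of the Euler-equation condition (9); all names and
# values are our own assumptions.
A, c_u, s, delta = 0.5, 0.4, 1.0, 0.95

def g(x, y):
    u = y - A * x                       # control recovered from x(t+1) = A x(t) + u(t)
    return -((x - s) ** 2 + c_u * u ** 2)

def euler_residual(x, t, eps=1e-6):
    """delta**(t-1) * dg/dy(x[t-1], x[t]) + delta**t * dg/dx(x[t], x[t+1])."""
    dg_dy = (g(x[t - 1], x[t] + eps) - g(x[t - 1], x[t] - eps)) / (2 * eps)
    dg_dx = (g(x[t] + eps, x[t + 1]) - g(x[t] - eps, x[t + 1])) / (2 * eps)
    return delta ** (t - 1) * dg_dy + delta ** t * dg_dx

# A non-optimal guess typically yields nonzero residuals.
x_guess = np.linspace(0.0, 1.0, 6)
print([round(euler_residual(x_guess, t), 4) for t in range(1, 5)])
```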
We can notice that the problem considered in the paper belongs to the class of linear-quadratic optimization problems, and we apply the Euler-equation method to find the player's optimal strategy in the dynamic problem with average-oriented opinion dynamics (see Mazalov and Parilina, 2020).
4. Theoretical Results
The necessary conditions for the optimal control problem (5) subject to (1)-(3) with the initial condition (4) are given in the following theorem.
Theorem 1. Let {u*(t) : t = t_1, t_2, ..., t_k} be the optimal strategy minimizing functional (5) subject to the initial conditions (4) and the state dynamics equations (1), (2) and (3), and let {(x_1^*(t), x_2^*(t)) : t = 0, ..., T} be the corresponding state trajectory. The moments 0 ≤ t_1 < t_2 < ... < t_k ≤ T - 1 are given. Then the optimal strategy u*(t), t = t_1, t_2, ..., t_k, is defined as

u^*(t) = z^*(t + 1) - A z^*(t),

and the corresponding optimal state trajectory (x_1^*(t), x_2^*(t)), t = 1, ..., T, satisfies the system of equations

\left( \frac{a_2 \delta}{2} - \delta \right) z(t) + z(t - 1) = (\delta - a_2 \delta)(x_2(t) - s) - x_2(t - 1) + s,
    t = 1, \ldots, T - 1, \; t, t - 1 \notin \{t_1, t_2, \ldots, t_k\},

B z(t) + C z(t - 1) - Ac\, z(t - 2) = (\delta^2 - a_2 \delta^2)(x_2(t) - s) - \delta (x_2(t - 1) - s),
    t = 1, \ldots, T - 1, \; t \notin \{t_1, t_2, \ldots, t_k\}, \; t - 1 \in \{t_2, \ldots, t_k\},

D z(t) + E z(t - 1) - Ac\, z(t - 2) + Ac \delta^2 z(t + 1) = (\delta^2 - a_2 \delta^2)(x_2(t) - s) - \delta (x_2(t - 1) - s),
    t = 1, \ldots, T - 1, \; t, t - 1 \in \{t_2, \ldots, t_k\},

F z(t) + (A^2 c + 1) z(t - 1) + Ac \delta\, z(t + 1) = (\delta - a_2 \delta)(x_2(t) - s) - x_2(t - 1) + s,
    t = 1, \ldots, T - 1, \; t \in \{t_1, t_2, \ldots, t_k\}, \; t - 1 \notin \{t_1, t_2, \ldots, t_k\},

z(t) + z(t - 1) = -a_2 \delta (x_2(t) - s) - x_2(t - 1) + s,
    t = T, \; T \notin \{t_1, t_2, \ldots, t_k\}, \; T - 1 \notin \{t_1, t_2, \ldots, t_k\},

\frac{a_2 \delta^2}{2} z(t) + (c + \delta) z(t - 1) - Ac\, z(t - 2) = -a_2 \delta^2 (x_2(t) - s) - x_2(t - 1) + s,
    t = T, \; T \notin \{t_1, t_2, \ldots, t_k\}, \; T - 1 \in \{t_2, \ldots, t_k\},

z(t) + x_2(t) - s = 0, \quad t = T, \; T - 1 \notin \{t_1, t_2, \ldots, t_k\},

c (z(t) - A z(t - 1)) + \delta (z(t) + x_2(t) - s) = 0, \quad t = T, \; T - 1 \in \{t_1, t_2, \ldots, t_k\},

x_2(t + 1) = x_2(t) + \frac{a_2}{2} z(t), \quad t = 0, \ldots, T - 1, \qquad (10)

where z^*(t) = x_1^*(t) - x_2^*(t), and

A = 1 - \frac{a_1 + a_2}{2},
B = \frac{a_2 \delta^2}{2} - c \delta - \delta^2,
C = Ac \delta - c - \delta,
D = \frac{a_2 \delta^2}{2} - Ac \delta - c \delta - A^2 c \delta^2 - \delta^2,
E = Ac \delta + c + A^2 c \delta + \delta,
F = \frac{a_2 \delta}{2} - Ac - \delta - A^2 c \delta.

¹See (González-Sánchez and Hernández-Lerma, 2013; Dechert, 1978; González-Sánchez and Hernández-Lerma, 2014).
²We assume that the conditions of Theorem 2.1 in (González-Sánchez and Hernández-Lerma, 2013) are satisfied.
Proof. We introduce a new variable z(t) defined as

z(t) = x_1(t) - x_2(t), \quad t = 0, \ldots, T.

From the state equations (1), (2) and (3), taking into account the expression of z(t), we obtain the new state equations:

z(t + 1) = A z(t) + u(t), \quad t \in \{t_1, t_2, \ldots, t_k\}, \qquad (11)

z(t + 1) = A z(t), \quad t \notin \{t_1, t_2, \ldots, t_k\},

x_2(t + 1) = x_2(t) + \frac{a_2}{2} z(t), \quad t = 0, \ldots, T - 1, \qquad (12)

with the initial condition

z(0) = x_1^0 - x_2^0, \quad x_2(0) = x_2^0,

where A = 1 - \frac{a_1 + a_2}{2}.
We express u(t) from (11) and obtain

u(t) = z(t + 1) - A z(t), \quad t \in \{t_1, t_2, \ldots, t_k\}. \qquad (13)

Substituting these expressions into \sum_{t=0}^{T} \delta^{t} g_t(x(t), x(t + 1)), we can rewrite the functional in the following form:

J(z, x_2) = (x_1(0) - s)^2 + (x_2(0) - s)^2 + \sum_{j=1}^{k} \delta^{t_j} c \left( z(t_j + 1) - A z(t_j) \right)^2 + \sum_{t=1}^{T} \delta^{t} \left[ (z(t) + x_2(t) - s)^2 + (x_2(t) - s)^2 \right].
To minimize J (z, x2) under condition given by equations (12) and (13), we form the Lagrange function
t-1
L (z, X2, k) = J (z, X2) + 2 kt (x2 (t +1) - X2 (t) - f. z (t)j .
t=0
The first-order conditions are \frac{\partial L(z, x_2, \lambda)}{\partial z(t)} = 0, t = 1, \ldots, T, and \frac{\partial L(z, x_2, \lambda)}{\partial x_2(t)} = 0, t = 1, \ldots, T.
First, we find the derivatives and get

\frac{\partial J(z, x_2)}{\partial z(t)} = \delta^{t} 2 (z(t) + x_2(t) - s), \quad t = 1, \ldots, T - 1, \; t, t - 1 \notin \{t_1, t_2, \ldots, t_k\},

\frac{\partial J(z, x_2)}{\partial z(t)} = \delta^{t-1} 2 c (z(t) - A z(t - 1)) + \delta^{t} 2 (z(t) + x_2(t) - s), \quad t \notin \{t_1, t_2, \ldots, t_k\}, \; t - 1 \in \{t_1, t_2, \ldots, t_k\},

\frac{\partial J(z, x_2)}{\partial z(t)} = \delta^{t-1} 2 c (z(t) - A z(t - 1)) - \delta^{t} 2 A c (z(t + 1) - A z(t)) + \delta^{t} 2 (z(t) + x_2(t) - s), \quad t, t - 1 \in \{t_1, t_2, \ldots, t_k\},

\frac{\partial J(z, x_2)}{\partial z(t)} = -\delta^{t} 2 A c (z(t + 1) - A z(t)) + \delta^{t} 2 (z(t) + x_2(t) - s), \quad t \in \{t_1, t_2, \ldots, t_k\}, \; t - 1 \notin \{t_1, t_2, \ldots, t_k\},

\frac{\partial J(z, x_2)}{\partial z(t)} = \delta^{t} 2 (z(t) + x_2(t) - s), \quad t = T, \; t - 1 \notin \{t_1, t_2, \ldots, t_k\},

\frac{\partial J(z, x_2)}{\partial z(t)} = \delta^{t-1} 2 c (z(t) - A z(t - 1)) + \delta^{t} 2 (z(t) + x_2(t) - s), \quad t = T, \; t - 1 \in \{t_1, t_2, \ldots, t_k\},

\frac{\partial J(z, x_2)}{\partial x_2(t)} = \delta^{t} \left[ 2 (z(t) + x_2(t) - s) + 2 (x_2(t) - s) \right], \quad t = 1, \ldots, T.
Second, we rewrite the system of the first-order conditions in the following form:
z(t) + x_2(t) - s = \frac{a_2}{4} \lambda_t \delta^{-t}, \quad t = 1, \ldots, T - 1, \; t, t - 1 \notin \{t_1, t_2, \ldots, t_k\},

c (z(t) - A z(t - 1)) + \delta (z(t) + x_2(t) - s) = \frac{a_2}{4} \lambda_t \delta^{-t+1}, \quad t \notin \{t_1, t_2, \ldots, t_k\}, \; t - 1 \in \{t_1, t_2, \ldots, t_k\},

c (z(t) - A z(t - 1)) - A c (z(t + 1) - A z(t)) + z(t) + x_2(t) - s = \frac{a_2}{4} \lambda_t \delta^{-t}, \quad t, t - 1 \in \{t_1, t_2, \ldots, t_k\},

-A c (z(t + 1) - A z(t)) + z(t) + x_2(t) - s = \frac{a_2}{4} \lambda_t \delta^{-t}, \quad t \in \{t_1, t_2, \ldots, t_k\}, \; t - 1 \notin \{t_1, t_2, \ldots, t_k\}, \qquad (14)

z(t) + x_2(t) - s = 0, \quad t = T, \; t - 1 \notin \{t_1, t_2, \ldots, t_k\},

c (z(t) - A z(t - 1)) + \delta (z(t) + x_2(t) - s) = 0, \quad t = T, \; t - 1 \in \{t_1, t_2, \ldots, t_k\},

\delta^{t} \left[ 2 z(t) + 4 (x_2(t) - s) \right] + \lambda_{t-1} - \lambda_t = 0, \quad t = 1, \ldots, T - 1,

\delta^{t} \left[ 2 z(t) + 4 (x_2(t) - s) \right] + \lambda_{t-1} = 0, \quad t = T,

with the initial conditions z(0) = x_1^0 - x_2^0, x_2(0) = x_2^0.
Eliminating λ_t from system (14), we finally obtain the system of equations
\left( \frac{a_2 \delta}{2} - \delta \right) z(t) + z(t - 1) = (\delta - a_2 \delta)(x_2(t) - s) - x_2(t - 1) + s,
    t = 1, \ldots, T - 1, \; t, t - 1 \notin \{t_1, t_2, \ldots, t_k\},

B z(t) + C z(t - 1) - Ac\, z(t - 2) = (\delta^2 - a_2 \delta^2)(x_2(t) - s) - \delta (x_2(t - 1) - s),
    t = 1, \ldots, T - 1, \; t \notin \{t_1, t_2, \ldots, t_k\}, \; t - 1 \in \{t_2, \ldots, t_k\},

D z(t) + E z(t - 1) - Ac\, z(t - 2) + Ac \delta^2 z(t + 1) = (\delta^2 - a_2 \delta^2)(x_2(t) - s) - \delta (x_2(t - 1) - s),
    t = 1, \ldots, T - 1, \; t, t - 1 \in \{t_2, \ldots, t_k\},

F z(t) + (A^2 c + 1) z(t - 1) + Ac \delta\, z(t + 1) = (\delta - a_2 \delta)(x_2(t) - s) - x_2(t - 1) + s,
    t = 1, \ldots, T - 1, \; t \in \{t_1, t_2, \ldots, t_k\}, \; t - 1 \notin \{t_1, t_2, \ldots, t_k\},

z(t) + z(t - 1) = -a_2 \delta (x_2(t) - s) - x_2(t - 1) + s,
    t = T, \; T \notin \{t_1, t_2, \ldots, t_k\}, \; T - 1 \notin \{t_1, t_2, \ldots, t_k\},

\frac{a_2 \delta^2}{2} z(t) + (c + \delta) z(t - 1) - Ac\, z(t - 2) = -a_2 \delta^2 (x_2(t) - s) - x_2(t - 1) + s,
    t = T, \; T \notin \{t_1, t_2, \ldots, t_k\}, \; T - 1 \in \{t_2, \ldots, t_k\},

z(t) + x_2(t) - s = 0, \quad t = T, \; T - 1 \notin \{t_1, t_2, \ldots, t_k\},

c (z(t) - A z(t - 1)) + \delta (z(t) + x_2(t) - s) = 0, \quad t = T, \; T - 1 \in \{t_1, t_2, \ldots, t_k\},

where B = \frac{a_2 \delta^2}{2} - c \delta - \delta^2, C = Ac \delta - c - \delta, D = \frac{a_2 \delta^2}{2} - Ac \delta - c \delta - A^2 c \delta^2 - \delta^2, E = Ac \delta + c + A^2 c \delta + \delta, F = \frac{a_2 \delta}{2} - Ac - \delta - A^2 c \delta. The theorem is proved. □
We use the following algorithm to find the optimal solution:
1. Using Theorem 1, solve the optimization problem of minimizing functional (5) subject to the initial conditions (4) and the state dynamics equations (1), (2) and (3) for a given set of control moments 0 ≤ t_1 < t_2 < ... < t_k ≤ T - 1. Calculate the optimal value of the functional (5).
2. Choose the set of control moments that minimizes the value of the functional (5) over all sets of control moments, when k is fixed.
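This two-step procedure can also be prototyped by brute force, without solving the system of Theorem 1: enumerate all candidate sets of k control moments and, for each set, minimize the costs (5) over the k control values numerically. The sketch below is our own illustration built on scipy.optimize.minimize; it is not the authors' implementation, and the function names are assumptions.

```python
# Brute-force prototype of the two-step algorithm; an illustrative sketch only.
from itertools import combinations
import numpy as np
from scipy.optimize import minimize

def costs(u_values, moments, a1, a2, delta, c, s, x1_0, x2_0, T):
    """Simulate (1)-(3) with controls u_values applied at 'moments' and return (5)."""
    u = dict(zip(moments, u_values))
    x1, x2, J = x1_0, x2_0, 0.0
    for t in range(T + 1):
        J += delta ** t * ((x1 - s) ** 2 + (x2 - s) ** 2)
        if t in u:
            J += delta ** t * c * u[t] ** 2
        if t < T:
            avg = (x1 + x2) / 2.0
            x1, x2 = x1 + a1 * (avg - x1) + u.get(t, 0.0), x2 + a2 * (avg - x2)
    return J

def best_control_moments(k, a1, a2, delta, c, s, x1_0, x2_0, T):
    best = None
    for moments in combinations(range(T), k):       # candidate sets of moments 0..T-1
        res = minimize(costs, np.zeros(k),
                       args=(moments, a1, a2, delta, c, s, x1_0, x2_0, T))
        if best is None or res.fun < best[1]:
            best = (moments, res.fun, res.x)
    return best

# Parameters of the numerical example in Section 5 (as reported in the paper).
print(best_control_moments(3, 0.2, 0.9, 1.0, 0.8, 0.5, 0.7, 0.2, 10))
```

For these parameters the paper reports the optimal set {0, 8, 9} with minimal costs 0.1511; the sketch above only illustrates the search logic and is not a substitute for the authors' computations.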
5. Numerical Simulations
In this section we demonstrate the results of the numerical simulations based on the theoretical results presented in Section 4.
Table 1. Optimal control trajectories and state.

t      | t_1 = 0 | 1      | 2      | 3      | 4      | 5
x_1(t) | 0.7000  | 0.5193 | 0.5084 | 0.5036 | 0.5016 | 0.5007
x_2(t) | 0.2000  | 0.4250 | 0.4674 | 0.4858 | 0.4939 | 0.4973
z(t)   | 0.5000  | 0.0943 | 0.0409 | 0.0178 | 0.0077 | 0.0034
u(t)   | -0.1307 |        |        |        |        |

t      | 6      | 7      | t_2 = 8   | t_3 = 9   | 10
x_1(t) | 0.5003 | 0.5001 | 0.5001    | 0.5000    | 0.5000
x_2(t) | 0.4988 | 0.4995 | 0.4998    | 0.4999    | 0.5000
z(t)   | 0.0015 | 0.0006 | 0.0003    | 0.0001    | 0.00005
u(t)   |        |        | -0.000007 | -0.000006 |
Let a_1 = 0.2, a_2 = 0.9, δ = 1, c = 0.8, and let the initial agents' opinions be x_1(0) = 0.7, x_2(0) = 0.2. The player's target opinion is s = 0.5. We also assume that k is equal to three. For the time horizon T = 10, we apply the algorithm and obtain that the player's minimal costs are achieved when the set of control moments is {0, 8, 9}. The values of the optimal agents' opinion trajectories and the optimal control trajectory are given in Table 1. The optimal value of functional (5) is 0.1511.
The optimal opinion trajectories (for both agents 1 and 2) and the player's strategy trajectory are shown in Figures 1 and 2, respectively.
Fig. 1. Optimal state trajectories (blue — x1(t), red — x2(t))
Fig. 2. Optimal strategy trajectory u(t) (control is added at moments 0, 8, 9)
Behaving optimally, the player chooses to control agent 1 at the moments 0, 8, and 9 to influence his opinion. Recall that the player observes the opinions of both agents at each moment. Calculations show that the player finds this set of control moments optimal, i.e., the set of control moments {0, 8, 9} minimizes his total costs, which are equal to 0.1511. We can easily notice in Figure 1 that after moment 4 the opinions of both agents almost reach the target opinion s = 0.5.
6. Conclusion
We model a multiagent system of opinion dynamics. In the system, there are two agents and one player who is interested in making the society's opinion closer to the target opinion. The feature of the model is that the player can control only one agent in the system and can influence this agent at a limited number of time moments. We find the necessary conditions for the optimal solution of the problem, which consists in the minimization of his costs. In the numerical simulations, we find the set of control moments and the player's optimal controls that he chooses at the moments from this set. The model can also be extended to a larger number of agents or to multiple players.
References

Bauso, D., Tembine, H. and Basar, T. (2016). Opinion Dynamics in Social Networks through Mean-Field Games. SIAM J. Con. Opt., 54(6), 3225-3257.
Dechert, D. (1978). Optimal control problems from second-order difference equations. J. Econ. Theory, 19(1), 50-63.
Deffuant, G., Neau, D., Amblard, F. and Weisbuch, G. (2000). Mixing beliefs among interacting agents. Advances in Complex Systems, 3(01n04), 87-98.
DeGroot, M. H. (1974). Reaching a consensus. J. Ame. Sta. Ass., 69(345), 118-121.
Elliott, R., Li, X. and Ni, Y. H. (2013). Discrete time mean-field stochastic linear-quadratic optimal control problems. Automatica, 49(11), 3222-3233.
Friedkin, N. E. and Johnsen, E. C. (1990). Social influence and opinions. J. Mat. Soc., 15(3-4), 193-206.
Gao, J. and Parilina, E. (2021a). Average-oriented opinion dynamics with the last moment of observation. Control Processes and Stability, 8(1), 505-509 (in Russian).
Gao, J. and Parilina, E. M. (2021b). Opinion Control Problem with Average-Oriented Opinion Dynamics and Limited Observation Moments. Contributions to Game Theory and Management, 14, 103-112.
González-Sánchez, D. and Hernández-Lerma, O. (2013). Discrete-time stochastic control and dynamic potential games: the Euler-Equation approach. Chap. 2. Springer International Publishing: Cham, Switzerland.
González-Sánchez, D. and Hernández-Lerma, O. (2014). On the Euler equation approach to discrete-time nonstationary optimal control problems. J. Dyn. Games, 1(1), 57.
Hegselmann, R. and Krause, U. (2002). Opinion dynamics and bounded confidence models, analysis, and simulation. J. Art. Soc. Soc. Simu., 5(3).
Ignaciuk, P. and Bartoszewicz, A. (2010). Linear-quadratic optimal control strategy for periodic-review inventory systems. Automatica, 46(12), 1982-1993.
Liu, X., Li, Y. and Zhang, W. (2014). Stochastic linear quadratic optimal control with constraint for discrete-time systems. App. Math. Comp., 228, 264-270.
Mazalov, V. and Parilina, E. (2020). The Euler-Equation Approach in Average-Oriented Opinion Dynamics. Mathematics, 8(3), 355.
Ni, Y. H., Elliott, R. and Li, X. (2015). Discrete-time mean-field stochastic linear-quadratic optimal control problems, II: Infinite horizon case. Automatica, 57, 65-77.
Rogov, M. A. and Sedakov, A. A. (2020). Coordinated Influence on the Opinions of Social Network Members. Automation and Remote Control, 81(3), 528-547.
Sedakov, A. A. and Zhen, M. (2019). Opinion dynamics game in a social network with two influence nodes. Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes, 15(1), 118-125.
Sznajd-Weron, K. and Sznajd, J. (2000). Opinion evolution in closed community. Int. J. Mod. Phy. C, 11(06), 1157-1165.