Vladimir D. Matveenko
St.Petersburg Institute for Economics and Mathematics, RAS Tchaikovskogo Str. 1, St.Petersburg, 191187, Russia E-mail: [email protected]
Abstract. A game model proposed here helps to reveal the relation between institutions (i.e. the norms and rules used in a society) and the decisions of private agents. The players in the game are a government and numerous private agents. Activities of the private agents (the second player) are modeled as paths in an oriented graph with a finite set of nodes. The government (the first player) establishes and announces an institutional system: a set of actions (e.g. taxes, incentives, etc.) on the arcs of the graph. A move of a private agent yields gains to her and to the government, depending on the institutional system created by the government. The players try to maximize discounted sums of utilities given discount factors and horizons. The government has no information about the precise number and initial positions of the private agents. The basic question is: can the government establish a consistent institutional system corresponding to a Nash equilibrium? We show that in the specific case of myopic private agents a consistent institutional system does exist. A constructive proof is provided. The case of an almost myopic government is considered in detail. A possible application of the game model is the problem of the effectiveness of government control in the science and R&D sector in Russia, which has become topical in connection with a reform recently started by the Russian government.
Keywords: iterated games, economic behavior, myopic agents, institutions, dynamic programming.
1. Introduction
The study of institutions (i.e. the norms and rules used in a society) is a topical issue in modern economics, sociology and management. The following problems related to institutions seem to be of great importance. How do institutions emerge? What is the relation between institutions and the behavior of economic agents? Why do institutions differ among countries (and among organizations)? Why is a transplantation of institutions from abroad often unsuccessful? To answer such questions game models are needed.
Generally there are two points of view on the emergence of institutions and, correspondingly, on the possibility of changing them. These views originate in pamphlets of philosophers of the 17th and 18th centuries: Thomas Hobbes' "Leviathan" and Bernard Mandeville's "The Fable of the Bees". The first view is that institutions are created by governments artificially and purposefully, and the second is that institutions are the result of a game equilibrium and reflect the interests of many participants. Our model unifies these points of view. Governments create institutions taking into account the further reaction of private agents. Here are the opinions of two famous economists about the role of institutions. (Galbraith, 1983) defined power as "the ability of individuals or groups to win the submission of others to their purpose". Political and economic institutions often serve to force economic agents to act in a way suitable for a government. As (North, 1990) noted, "In the jargon of economists, institutions define and limit the set of choices of individuals". Our model uncovers the mechanisms hinted at in these quotations. The game model proposed here helps to reveal the relation between institutions and the decisions of private agents.
* This work was supported by the Russian Foundation for Humanities (RGNF), project 07-02-04048a.
A description of the model and an outline of the rest of the paper are provided in the following Section.
2. The Model
Players in the game are a government and numerous private agents. Activities of the private agents (the second player) are modeled as paths in an oriented graph $(M, N)$ with a finite set of nodes $M$ (states) and a set of arcs $N$ (actions). The government (the first player) establishes and announces an institutional system $P$. More precisely, for each node $i = 1, \dots, n$ the government defines a set $p^i$ of actions on the arcs starting from the node $i$, to be used in response to actions of private agents. The institutions $p^i$ are taken from a set $A^i$ of feasible institutions. In reality the government's actions can be, e.g., values of taxes and subsidies, other measures of incentives and punishment, etc. Thus the government commits to act in a definite way in response to actions of private agents. The whole institutional system $P$ consists of the institutions $p^i$:
$$P = (p^1, p^2, \dots, p^n).$$
After the institutional system is announced, the private agents choose their paths of actions. A move of a private agent from the node $i$ to the node $j$ yields her an instant utility gain $u(i, j, p^i)$. At the same time the government receives an instant utility gain $v(i, j, p^i)$. The gains depend on the institutions created by the government. We will also use the notation $u(i, j, P)$, $v(i, j, P)$. The preferences of different private agents are assumed to be identical, which means that the agents receive the same utilities if they choose the same actions. But the agents can be situated in different nodes, and naturally their choice depends not only on their preferences but on their initial states as well. Each private agent solves the following problem:
$$\max_{i_1, i_2, \dots} \sum_{t=0}^{T} \beta^t u(i_t, i_{t+1}, P),$$
where $(i_t, i_{t+1}) \in N$, $0 < \beta < 1$ is the discount factor of the private agent, $i_0$ is the initial state of the agent, and $P$ is an institutional system.
An agent is myopic if she possesses a zero discount factor $\beta = 0$ or a zero horizon $T = 0$. These two cases are close to each other though not identical. The difference between the two definitions of a myopic agent is discussed in Section 4.
An optimal path for a myopic agent can be found stepwise.
The government maximizes a discounted sum of utilities related to the actions of the private agents given a discount factor $0 < \beta < 1$. If the government dealt with a single agent, the government's problem would be:
$$\max \sum_{t=0}^{T} \beta^t v(i_t, i_{t+1}, P).$$
A specific feature of the model is that the government deals with many agents and possesses only limited information. The government has information on the agents' preferences but not enough information about their number and positions. In particular, the government has no information about how many agents are active now and in what nodes new active agents will appear in the future.
A question arises: is it possible for the government to establish a consistent institutional system corresponding to a Nash equilibrium under this partial information? Very often in economics and politics it is supposed that such an institutional system is possible. For example, governments are often accused of not satisfying the interests of one or another group of citizens. The U.S. is often criticized for so-called double standards, when different institutions are applied to different partners in international relations. We will see that in the special case of myopic private agents a consistent institutional system really exists in our model.
If the game with a single agent is considered as a game in extensive form, its structure seems to be rather simple. The first player defines an institutional system $P$, i.e. her actions on all arcs of the graph. After that the second player chooses a route in the graph. Although the structure is simple, there may be three serious problems: (1) an enormous number of strategies; (2) a hard calculation of gains; (3) the presence of not a single but many "second players" with different initial nodes.
In Section 3 a constructive procedure solving the problem is proposed, based on methods of dynamic programming and 'extremal' ('idempotent') algebra with the operations $a \otimes b = a + \beta b$, $a \oplus b = \max\{a, b\}$. (See (Matveenko, 1990, Matveenko, 1998) for applications of extremal algebra to schemes of dynamic programming without or with discounting.) An example of the application of this constructive procedure to a version of the iterated Prisoner's Dilemma is provided in Section 5.
Another possible application of our model (Section 6) is the problem of the effectiveness of government control in the science and R&D sector in Russia, which has become topical in connection with a reform recently started by the government. Here $M$ can be interpreted as a set of possible themes which can be explored by researchers, and $N$ as a set of possible directions of new research. The government would be more satisfied if science dealt with themes related to some definite practical issues such as nanotechnologies. The model shows that if the researchers are myopic (i.e. they are interested more in their current achievements and welfare than in the long-term prospects of their activities) then the government is really able to establish effective incentives under incomplete information. However, if the researchers are long-term-oriented, the government's problem is unsolvable.
In Section 4 a detailed discussion of the notion of a myopic behavior of private agents and governments is provided.
3. The Basic Theorem
In this Section a constructive proof of the basic theorem is given.
Theorem 1. Let us assume that, in each node $i$ and for each institution $p^i$, the utilities $u(i, j, p^i)$, $j = 1, 2, \dots, n$, for alternative actions of a private agent are different: $u(i, j_1, p^i) \ne u(i, j_2, p^i)$ if $j_1 \ne j_2$. Then, if the agents are myopic, the government is able to create a consistent institutional system $P$.
Proof. In each node $i$ a unique response of the private agent corresponds to each institution $p^i$. For a node $i$ denote by $\tilde N_i$ the set of arcs $(i, j)$ chosen by the agent under some institution $p^i$. Then $\tilde N = \tilde N_1 \cup \dots \cup \tilde N_n$ is the set of all arcs $(i, j)$ chosen by the agent under some institutional system $P$. To each arc $s = (i, j) \in \tilde N$ there corresponds an institution $p^i(s)$ under which the arc is chosen. If the agent chooses an arc $s$ under several different institutions $p^i$, then let $p^i(s)$ be an institution which provides the maximum utility $v(i, j, p^i)$ to the government.
As a result, a subgraph $(M, \tilde N)$ is constructed, to each arc $s = (i, j)$ of which an institution $p^i(s)$ corresponds. (The subgraph may consist of more than one connected component.) Given a discount factor $\beta \in (0, 1)$, a family of dynamic programming problems
$$\max_{i_1, i_2, \dots} \sum_{t=0}^{\infty} \beta^t u(i_t, i_{t+1}, p^{i_t})$$
with different initial states $i_0 = i$ is defined. Denote these problems by $S_\beta(i)$ and their values by $V_\beta(i)$. The value function satisfies a recurrent relation of dynamic programming (a Bellman equation)
$$V_\beta(i) = \max_{j = 1, \dots, n} \{u(i, j) + \beta V_\beta(j)\}.$$
For each node $i$ an arc $(i, j(i))$ exists for which the maximum is achieved:
$$V_\beta(i) = u(i, j(i)) + \beta V_\beta(j(i)).$$
Thus a policy function j(i) is defined (possibly not in a unique way) which shows what action of the private agent in the node i is the most desirable for the government. Fixing the policy function j(i) the government establishes an institutional system
$$P = (p^1(1, j(1)), \dots, p^n(n, j(n))).$$
If the horizon $T$ is sufficiently large, this institutional system provides the maximum discounted utility for the government independently of the initial state $i_0$. □
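The three steps of the proof can be sketched in code. The following is a minimal illustration only, not the author's implementation: the function name `consistent_system`, the callables `u` and `v`, the list `institutions`, and the fixed number of value-iteration sweeps (used here in place of an infinite horizon) are all assumptions introduced for the sketch. Nodes are numbered from 0, and every node is assumed to have at least one feasible institution.

```python
def consistent_system(n, institutions, u, v, beta, sweeps=500):
    """Sketch of the constructive procedure from the proof of Theorem 1.

    institutions[i] -- feasible institutions in node i (hypothetical input format)
    u(i, j, p), v(i, j, p) -- instant gains of a private agent / the government
    """
    # Step 1: for every arc a myopic agent can choose, keep the institution
    # that is best for the government among those inducing that arc.
    arc_inst = {}
    for i in range(n):
        for p in institutions[i]:
            # Myopic best response; unique under the assumption of Theorem 1.
            j = max(range(n), key=lambda k: u(i, k, p))
            if (i, j) not in arc_inst or v(i, j, p) > v(i, j, arc_inst[(i, j)]):
                arc_inst[(i, j)] = p

    def gain(i, j):
        return v(i, j, arc_inst[(i, j)])

    def arcs_from(i):
        return [j for j in range(n) if (i, j) in arc_inst]

    # Step 2: value iteration for the Bellman equation on the subgraph.
    V = [0.0] * n
    for _ in range(sweeps):
        V = [max(gain(i, j) + beta * V[j] for j in arcs_from(i)) for i in range(n)]
    policy = [max(arcs_from(i), key=lambda j: gain(i, j) + beta * V[j])
              for i in range(n)]

    # Step 3: announce, in every node, the institution attached to the desired arc.
    P = [arc_inst[(i, policy[i])] for i in range(n)]
    return P, policy, V
```

Because the agents are myopic, Step 1 needs only one pass over the institutions of each node rather than over full institutional systems, which is what keeps the search small.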
4. An Almost Myopic Government
It was already said that myopic behavior can be defined in two ways: an agent possesses either a zero horizon or a zero discount factor. The first definition seems to be more natural from the point of view of game theory. However, this definition allows for a large degree of indeterminacy in the agent's choice. For example, for the following utility matrix
$$U = \begin{pmatrix} 2 & 5 & 4 & 3 & 2 \\ 3 & 1 & 3 & 2 & 3 \\ 3 & 5 & 4 & 1 & 5 \\ 2 & 5 & 1 & 5 & 3 \\ 5 & 1 & 3 & 2 & 4 \end{pmatrix}$$
the choice of an agent with a zero horizon is indeterminate in each of the nodes 2, 3, 4. Thus the agent with a zero horizon faces the risk of a disadvantageous choice. For example, if the agent makes her choice in the node $i = 4$ then the nodes $j = 2$ and $j = 4$ are equally attractive for her at the moment; however, the long-run consequences of the choice are different, although the agent (at the present time) may not be interested in studying these consequences.
The second definition of a myopia (a zero discount factor) leads to a concept of an almost myopic agent trying to diminish this risk. A zero discount factor can be treated as a limit of a sequence of diminishing positive discount factors. This approach provides a solution not only for the myopic agent but for an agent with a small discount factor as well.
We assumed above that the private agents are absolutely myopic; however, the government can be almost myopic, having a small discount factor. Below in this Section we show how the dynamic programming problem described in the previous Section can be solved for an almost myopic government. The argument $P$ or $p^i$ will be omitted for convenience.
Let us call a path $\{\hat i_0, \hat i_1, \dots\}$ lexicographically maximal (l.m.) if for any path $\{i_0 = \hat i_0, i_1, \dots\}$ the following holds: if $v(i_t, i_{t+1}) > v(\hat i_t, \hat i_{t+1})$ for an index $t$, then there exists an index $k < t$ for which $v(\hat i_k, \hat i_{k+1}) > v(i_k, i_{k+1})$. Notice that any l.m. path $\{\hat i_t\}$ is stepwise optimal, i.e.
$$\hat i_{t+1} \in \operatorname{Arg\,max}_{j} v(\hat i_t, j), \quad t = 0, 1, \dots$$
The converse is true if all the sets
$$\operatorname{Arg\,max}_{j} v(i, j), \quad i \in M,$$
are one-element.
Theorem 2. There exists a number $\bar\beta \in (0, 1)$ such that for all discount factors $\beta \in (0, \bar\beta)$ the solutions of the problems $S_\beta(i)$, $i \in M$, are the l.m. paths and only they.
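Proof. For an initial node $i_0$ consider an l.m. path $\hat\tau = \{\hat i_0, \hat i_1, \dots\}$ and an arbitrary path $\tau = \{i_0 = \hat i_0, i_1, \dots\}$ which is not l.m. Then
$$v(\hat i_s, \hat i_{s+1}) \ne v(i_s, i_{s+1})$$
for some $s \ge 0$, and
$$v(\hat i_t, \hat i_{t+1}) = v(i_t, i_{t+1})$$
for $0 \le t < s$ if $s \ne 0$. Hence
$$v(\hat i_s, \hat i_{s+1}) > v(i_s, i_{s+1}),$$
$$\sum_{t=0}^{\infty} \beta^t v(\hat i_t, \hat i_{t+1}) - \sum_{t=0}^{\infty} \beta^t v(i_t, i_{t+1}) = \beta^s \Big\{ v(\hat i_s, \hat i_{s+1}) - v(i_s, i_{s+1}) + \beta \sum_{t=0}^{\infty} \beta^t \big[ v(\hat i_{s+1+t}, \hat i_{s+2+t}) - v(i_{s+1+t}, i_{s+2+t}) \big] \Big\} \ge \beta^s \Big\{ A + \frac{\beta B}{1 - \beta} \Big\},$$
where
$$A = \min_{i, j, k, l:\; v(i,j) > v(k,l)} \big( v(i, j) - v(k, l) \big) > 0,$$
$$B = \min \big[ v(i, j) - v(k, l) \big] = \min_{i,\, j \in a(i)} v(i, j) - \max_{k,\, l \in a(k)} v(k, l) < 0.$$
If $\beta < A/(A - B)$ then $A + \beta B/(1 - \beta) > 0$ and hence
$$\sum_{t=0}^{\infty} \beta^t v(\hat i_t, \hat i_{t+1}) > \sum_{t=0}^{\infty} \beta^t v(i_t, i_{t+1}),$$
thereby $\hat\tau$ is a solution of the problem $S_\beta(i_0)$, and $\tau$ is not. □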
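The last implication of the proof can be checked by a short calculation. The derivation below is added only as a worked step; it uses nothing beyond $0 < \beta < 1$, $A > 0$ and $B < 0$ (so that $1 - \beta > 0$ and $A - B > 0$).

```latex
\begin{align*}
A + \frac{\beta B}{1 - \beta} > 0
  &\iff A(1 - \beta) + \beta B > 0 \\
  &\iff A > \beta (A - B) \\
  &\iff \beta < \frac{A}{A - B}.
\end{align*}
```

Since $0 < A/(A - B) < 1$, any threshold not exceeding this ratio lies in the interval $(0, 1)$ required by Theorem 2.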
Example 1. For the utility matrix $U$ given above we have $A = 1$, $B = -4$, $A/(A - B) = 0.2$. The value $\bar\beta = 0.2$ can be used.
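The constants of Example 1 can be recomputed mechanically. The sketch below assumes that every arc of the complete graph on five nodes is available, so that $A$ is the smallest positive gap between two distinct entries of $U$ and $B$ is the smallest entry minus the largest one; the function name `myopia_threshold` is hypothetical.

```python
def myopia_threshold(v):
    """Return (A, B, A / (A - B)) for a gain matrix v with at least two distinct entries."""
    values = sorted({x for row in v for x in row})
    # A: smallest positive difference between two gains.
    A = min(hi - lo for lo, hi in zip(values, values[1:]))
    # B: smallest gain minus largest gain (negative).
    B = values[0] - values[-1]
    return A, B, A / (A - B)


U = [
    [2, 5, 4, 3, 2],
    [3, 1, 3, 2, 3],
    [3, 5, 4, 1, 5],
    [2, 5, 1, 5, 3],
    [5, 1, 3, 2, 4],
]
A, B, threshold = myopia_threshold(U)
print(A, B, threshold)  # 1 -4 0.2
```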
Now let us describe an algorithm for constructing all l.m. paths. An example of its application is provided below. On the first iteration of the algorithm, for each node $i \in M$ a number
$$a_i = \max_{j = 1, \dots, n} v(i, j)$$
is calculated and a set of "successors" of the node $i$ is found:
$$A^{(1)}(i) = \operatorname{Arg\,max}_{j = 1, \dots, n} v(i, j).$$
Let $h^{(1)}$ be the number of distinct values among $a_i$, $i \in M$. The set of nodes $M$ is decomposed into disjoint subsets (classes)
$$H_s^{(1)}, \quad s = 1, \dots, h^{(1)},$$
in such a way that nodes with equal values $a_i$ are located in the same class. The classes are numbered in ascending order of the values $a_i$. The order number $r^{(1)}(i)$ of the class containing the node $i$ is called the rating of the node $i$.
On the $k$-th iteration, $k > 1$, the sets of successors and the ratings obtained on the previous iteration are modified in the following way. For $i = 1, \dots, n$ the value
$$\hat r^{(k)}(i) = \max_{j \in A^{(k-1)}(i)} r^{(k-1)}(j)$$
is calculated and it is set that
$$A^{(k)}(i) = \operatorname{Arg\,max}_{j \in A^{(k-1)}(i)} r^{(k-1)}(j).$$
If a class $H_s^{(k-1)}$, $s = 1, \dots, h^{(k-1)}$, contains more than one element and the values $\hat r^{(k)}(i)$, $i \in H_s^{(k-1)}$, differ, then this class is decomposed into new classes which are ordered in ascending order of the values $\hat r^{(k)}(i)$. All the classes (both those kept unchanged and the newly created ones) are numbered anew, and after that each node $i \in H_s^{(k)}$ is assigned the rating $r^{(k)}(i) = s$.
The algorithm stops when all the sets $A^{(k)}(i)$, $i = 1, \dots, n$, become one-element or when all the sets $A^{(k)}(i)$, $i = 1, \dots, n$, and $H_s^{(k)}$, $s = 1, \dots, h^{(k)}$, stabilize. Not more than $n$ iterations are needed for the stabilization.
Notice that on the k-th iteration the set of successors A(k)(i) is a set of nodes which immediately follow the node i0 = i on k-step l.m. paths. The ratings r(k) (i) provide a possibility to compare among themselves k-step l.m. paths initiating in different nodes.
Example 2. Let us construct l.m. paths for the case of the utility matrix U given above in the beginning of the Section.
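1st iteration:
$A^{(1)}(1) = \{2\}$, $A^{(1)}(2) = \{1, 3, 5\}$, $A^{(1)}(3) = \{2, 5\}$, $A^{(1)}(4) = \{2, 4\}$, $A^{(1)}(5) = \{1\}$;
$r^{(1)}(2) = 1$, $r^{(1)}(1) = r^{(1)}(3) = r^{(1)}(4) = r^{(1)}(5) = 2$.
2nd iteration:
$\hat r^{(2)}(1) = 1$, $\hat r^{(2)}(2) = 2$, $\hat r^{(2)}(3) = 2$, $\hat r^{(2)}(4) = 2$, $\hat r^{(2)}(5) = 2$;
$A^{(2)}(1) = \{2\}$, $A^{(2)}(2) = \{1, 3, 5\}$, $A^{(2)}(3) = \{5\}$, $A^{(2)}(4) = \{4\}$, $A^{(2)}(5) = \{1\}$;
$r^{(2)}(2) = 1$, $r^{(2)}(1) = 2$, $r^{(2)}(3) = r^{(2)}(4) = r^{(2)}(5) = 3$.
3rd iteration:
$\hat r^{(3)}(1) = 1$, $\hat r^{(3)}(2) = 3$, $\hat r^{(3)}(3) = 3$, $\hat r^{(3)}(4) = 3$, $\hat r^{(3)}(5) = 2$;
$A^{(3)}(1) = \{2\}$, $A^{(3)}(2) = \{3, 5\}$, $A^{(3)}(3) = \{5\}$, $A^{(3)}(4) = \{4\}$, $A^{(3)}(5) = \{1\}$;
$r^{(3)}(2) = 1$, $r^{(3)}(1) = 2$, $r^{(3)}(5) = 3$, $r^{(3)}(3) = r^{(3)}(4) = 4$.
4th iteration:
$\hat r^{(4)}(1) = 1$, $\hat r^{(4)}(2) = 4$, $\hat r^{(4)}(3) = 3$, $\hat r^{(4)}(4) = 4$, $\hat r^{(4)}(5) = 2$;
$A^{(4)}(1) = \{2\}$, $A^{(4)}(2) = \{3\}$, $A^{(4)}(3) = \{5\}$, $A^{(4)}(4) = \{4\}$, $A^{(4)}(5) = \{1\}$;
$r^{(4)}(2) = 1$, $r^{(4)}(1) = 2$, $r^{(4)}(5) = 3$, $r^{(4)}(3) = 4$, $r^{(4)}(4) = 5$.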
Thus l.m. paths go through the contours {1, 2, 3, 5, 1} and {4}.
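The iterations of Example 2 can be reproduced with a short program. This is a sketch of the rating-refinement algorithm as described above, not the author's code: the names `lm_successors` and `rank` are hypothetical, nodes are numbered from 0 (so node 1 of the text is index 0), and classes are refined by sorting on the pair (previous rating, new maximal successor rating), which is one way to realize the renumbering rule.

```python
def rank(keys):
    """Ratings 1, 2, ... in ascending order of the distinct keys."""
    order = {k: r for r, k in enumerate(sorted(set(keys)), start=1)}
    return [order[k] for k in keys]


def lm_successors(v):
    """Return (successor sets, ratings) after the algorithm stabilizes."""
    n = len(v)
    # First iteration: stepwise-optimal successors and ratings by the value a_i.
    a = [max(row) for row in v]
    succ = [{j for j in range(n) if v[i][j] == a[i]} for i in range(n)]
    rating = rank(a)
    for _ in range(n):
        # Best rating reachable through the current successors of each node.
        best = [max(rating[j] for j in succ[i]) for i in range(n)]
        new_succ = [{j for j in succ[i] if rating[j] == best[i]} for i in range(n)]
        # Split classes whose members disagree on `best`, keeping the old order.
        new_rating = rank([(rating[i], best[i]) for i in range(n)])
        if new_succ == succ and new_rating == rating:
            break
        succ, rating = new_succ, new_rating
    return succ, rating


U = [
    [2, 5, 4, 3, 2],
    [3, 1, 3, 2, 3],
    [3, 5, 4, 1, 5],
    [2, 5, 1, 5, 3],
    [5, 1, 3, 2, 4],
]
succ, rating = lm_successors(U)
for i in range(len(U)):
    print(i + 1, "->", sorted(j + 1 for j in succ[i]), "rating", rating[i])
```

With the matrix $U$ the successor sets become one-element, giving the moves 1 to 2, 2 to 3, 3 to 5, 5 to 1 and 4 to 4, in agreement with the contours stated in the text.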
The influence of the discount factor $\beta$ on the asymptotics of optimal paths in models with a continuous set of states was studied by (Deneckere and Pelikan, 1986) and (Boldrin and Montrucchio, 1986), who demonstrated the possibility of complex dynamics if the discount factor is between 0 and 1 but far from the bounds of the segment. For models with a finite set of states this kind of dynamics was studied by (Matveenko, 1998).
5. An Example of the Game
An example of our model arises as a version of a repeated Prisoner’s Dilemma where the first player establishes an institutional environment and then monitors the actions of the second player and, in fact, directs her behavior by use of the established institutions.
Example 3. The gains in the base Prisoner's Dilemma game are taken from (Mueller, 2003); rows are strategies of the first player, columns are strategies of the second player:
Strategies       Not steal   Steal
Not steal (S)    (10,9)      (7,11)
Steal (T)        (12,6)      (8,8)
The first player commits to execute 'governmental' functions and announces her further actions (i.e. an institutional system). Now it seems proper not to use the terms 'not steal' and 'steal' in connection with the first player (the government) but to speak about her soft (S) and tough (T) actions. Notice that the resulting iterated game is not a game with simultaneous moves: the government responds to the actions of the private agents but commits to her reaction in advance.
The first player has 16 strategies (versions of an institutional system $P$). Three of them deserve special attention. The first two institutional systems (Figures 1, 2) are equally good, from the point of view of the government, if the agents are patient, which here means $2/3 < \beta < 1$. The institutional system represented in Fig. 1 is a Tit-for-Tat or a grim strategy (it is also suitable for a standard simultaneous-move version of an iterated Prisoner's Dilemma).
[Figure 1: graph with arc labels S (7,11) and T (12,6).]
Fig. 1. A Tit-for-Tat strategy of the first player and corresponding gains of the players. Under this institutional system the best behavior for a patient second player is not to steal; however, an impatient second player will steal.
[Figure 2: graph; one arc is labeled T (12,6).]
Fig. 2. A 'tougher' strategy of the first player. If private agents are patient, this institutional system provides the same gains to the players as the Tit-for-Tat strategy. An impatient agent will continue to steal if she steals initially.
It is easy to see that under the Tit-for-Tat strategy an impatient agent with a discount factor $0 < \beta < 2/3$ will continue to steal if she stole initially and will alternate stealing/not stealing if she did not steal initially. Under the 'tougher' strategy (Fig. 2) an impatient agent will not steal if she did not steal initially; however, she will continue to steal if she did initially. In the case of an impatient agent the best institutional system, from the point of view of the government, is shown in Fig. 3.
[Figure 3: graph with arc labels T (8,8) and S (10,9).]
Fig. 3. An institutional system which is the best, from the point of view of the government, in the case of impatient private agents.
Now let us assume that the private agents are myopic and hence look for their paths stepwise. The government can solve the game by use of the method described in Section 3. On the first step, a correspondence is constructed between arcs $s = (i, j) \in N$ and institutions $p^i(s)$ under which the arcs are chosen by private agents. Notice that instead of 16 alternative full institutional systems $P$ we now deal only with 4 institutions $p^i$ in node $i$ = "not steal" and 4 institutions in node "steal".
Fig. 4. Four possible institutions in node "not steal" and gains of the second player.
In node "not steal" (see Fig. 4) the private agent will continue not to steal if the government plays S in response to non-stealing and T in response to stealing. Thus institution (S,T) corresponds to the arc (not steal, not steal). Under each of the three other institutions the private agent will steal. In this case the best play for the government is (T,T). Thus institution (T,T) corresponds to the arc (not steal, steal).
In node "steal", in a similar way, the following correspondence between arcs and institutions takes place. The arc (steal, not steal) corresponds to (S,T) played by the government in response to non-stealing and stealing respectively, and the arc (steal, steal) corresponds to (T,T).
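The first step can be checked by enumerating the four institutions of each node. The sketch below is an illustration under stated assumptions: the payoff dictionary `PAYOFF` is read off the table of Example 3 as (government gain, agent gain), gains are assumed not to depend on the node the agent starts from, and the names `PAYOFF`, `ACTIONS` and `arc_institutions` are hypothetical.

```python
from itertools import product

# (government gain, agent gain) for each (government action, agent action).
PAYOFF = {
    ("S", "not steal"): (10, 9), ("S", "steal"): (7, 11),
    ("T", "not steal"): (12, 6), ("T", "steal"): (8, 8),
}
ACTIONS = ("not steal", "steal")


def arc_institutions():
    """Map each arc (node, move) to (best institution, government gain)."""
    best = {}
    for node in ACTIONS:
        # An institution is a pair (response to not stealing, response to stealing).
        for inst in product("ST", repeat=2):
            response = dict(zip(ACTIONS, inst))
            # A myopic agent compares only her instant gains.
            move = max(ACTIONS, key=lambda a: PAYOFF[(response[a], a)][1])
            gain = PAYOFF[(response[move], move)][0]
            if (node, move) not in best or gain > best[(node, move)][1]:
                best[(node, move)] = (inst, gain)
    return best


for arc, (inst, gain) in arc_institutions().items():
    print(arc, "->", inst, "government gain", gain)
```

The enumeration attaches (S,T) to both arcs leading to "not steal" and (T,T) to both arcs leading to "steal", matching the correspondence described in the text.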
On the second step, a dynamic programming problem is solved with a matrix of gains of the first player corresponding to the institutions $p^i(s)$ found on the first step (see Fig. 5).
Fig. 5. The dynamic programming problem of the first player.
It is easy to calculate the value function for our example:
V(Notsteal) = --------,
V(Steal) = ■
In more complex cases a value function for such a problem can often be obtained as the result of a recurrent process of multiplication
$$x_{t+1} = A \otimes x_t, \quad t = 0, 1, \dots$$
where x0 is an arbitrary n-dimensional initial vector with positive elements, n is the number of nodes in the graph, A is the matrix of gains of the first player, in our case
A=(121).
and the multiplication of a matrix and a vector is defined in a natural way using the elementary operations $a \otimes b = a + \beta b$, $a \oplus b = \max\{a, b\}$. A justification of the method and a discussion of conditions for its applicability are provided in (Matveenko, 1998).
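The recurrent multiplication can be sketched as follows. The gain matrix used here is an assumption, not a value taken from the text: it is read off the payoff table of Example 3 together with the institutions selected on the first step (government gain 10 on arcs leading to "not steal" under (S,T), and 8 on arcs leading to "steal" under (T,T)). The function names, the tolerance and the discount factor $\beta = 0.5$ are likewise illustrative choices.

```python
def extremal_matvec(A, x, beta):
    """(A (x) x)_i = max_j (a_ij + beta * x_j): 'sum' is max, 'product' is a + beta * b."""
    return [max(a_ij + beta * x_j for a_ij, x_j in zip(row, x)) for row in A]


def value_function(A, beta, x0=None, tol=1e-12, max_iter=10_000):
    """Iterate x_{t+1} = A (x) x_t until successive vectors agree within tol."""
    x = list(x0) if x0 is not None else [1.0] * len(A)
    for _ in range(max_iter):
        nxt = extremal_matvec(A, x, beta)
        if max(abs(p - q) for p, q in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


def policy_function(A, V, beta):
    """Index of the successor attaining the maximum in the Bellman equation."""
    return [max(range(len(V)), key=lambda j: row[j] + beta * V[j]) for row in A]


# Assumed gains of the first player; rows and columns: 0 = not steal, 1 = steal.
A = [[10, 8],
     [10, 8]]
beta = 0.5
V = value_function(A, beta)
print(V, policy_function(A, V, beta))
```

Under these assumed gains the iteration approaches $10/(1 - \beta)$ in both nodes (20 for $\beta = 0.5$), and the policy function points to "not steal" from either node.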
The value function makes it possible to find a policy function, i.e. to identify in all nodes the desirable (for the first player) actions of the second player. For our example (see Fig. 5) these actions are (not steal, not steal) and (steal, not steal).
On the third step the policy function is used to select corresponding institutions from the set of institutions found on the first step. As we could expect we come to the third institutional system described above in Fig. 3 which, as we remember, was suitable for impatient private agents.
6. Some Applications of the Model
Not only the Prisoner's Dilemma but also some other kinds of iterated games (Snowdrift, etc.) can be studied by use of our model. Practical applications of such games are numerous: arms races, corruption and anti-corruption, energy security, etc. Let us consider an example of an energy security game.
Example 4. The second player (industrial countries) has two strategies: to increase or not to increase oil consumption. The first player (oil producing countries) can restrict or not restrict oil production:
Strategies                Increase consumption   Not increase consumption
Restrict production       (5,-1)                 (1,1)
Not restrict production   (3,3)                  (2,2)
OPEC usually uses a Tit-for-Tat strategy: it restricts production if buyers increase demand and does not restrict production if the demand is stable. The average gains of the players without discounting are 2 and 2. The Russian government proposed a non-restricting concept of energy security, promising an increase in oil production in response to demand. The average gains in this case are higher: 3 and 3.
An example of our model that is important for Russia is a game between the government and researchers (see also (Kraynov and Matveenko, 2007a, Kraynov and Matveenko, 2007b)). It is known that the science and R&D sector in Russia is financed mostly by the government and is very ineffective in a practical sense. The Nobel laureate in physics Zhores Alferov put it this way: "Our science is first class, not business class". In particular, in 2003 the state sector of science included 463 institutes of the Russian Academy of Sciences. The Russian government has started a reform of Russian science. It picked out several priority directions, such as nanotechnologies, and is trying to create an institutional system forcing researchers to act in a way that leads to developing the priorities. The question is: can such a reform be successful or not? In our model the nodes of the graph can be interpreted as different subjects (themes) of research and the arcs as different ways to use results for new research. Our model answers that if the researchers are myopic in their preferences then in principle the government is able to create a consistent institutional system to manage science. The assumption of myopia of the researchers seems rather plausible at the present time. However, if the discount factor (or the horizon) of the researchers increases, the system of management will stop working.
Another, more general, conclusion of the model is that if a government wants to create a consistent institutional system it will try to use all possibilities (such as media, school education) to create myopic citizens. In history we can find a lot of confirmations of this thesis.
References
Boldrin, M. and L. Montrucchio (1986). On the indeterminacy of capital accumulation paths. Journal of Economic Theory, 40, 26-39.
Deneckere, R. and S. Pelikan (1986). Competitive chaos. Journal of Economic Theory, 40, 13-25.
Galbraith, J. K. (1983). The anatomy of power. Houghton Mifflin: Boston.
Kraynov, D. E. and V. D. Matveenko (2007a). A game model of interaction between the government and the science sector. In: V Moscow International Conference on Operations Research (ORM2007) dedicated to the outstanding Russian scientist Nikita N. Moiseev 90th birthday. Proceedings, 276-278. MAXPress: Moscow (in Russian).
Kraynov, D. E. and V. D. Matveenko (2007b). Modeling of government's control of research and innovation activities in the Russian economy. In: 9th International Conference "Public Sector Transition: New Quality of Management". Conference Papers, Vol. 2, 105-149. St. Petersburg State University: St. Petersburg (in Russian).
Matveenko, V. D. (1990). Optimal trajectories of the scheme of dynamic programming and extremal powers of nonnegative matrices. Diskretnaya Matematika, 2, 59-71 (in Russian).
Matveenko, V. D. (1998). The structure of optimal trajectories of a discrete deterministic scheme with discounting. Diskretnaya Matematika. English version: Discrete Mathematics and Applications, 8, 637-651.
Mueller, D. (2003). Public choice III. Cambridge University Press: Cambridge.
North, D. (1990). Institutions, institutional change and economic performance. Cambridge University Press: Cambridge.