Controllable Queueing Systems: From the Very Beginning
up to Nowadays1
V. Rykov
Doctor of Sci, Professor, Department of Applied Mathematics & Computer Modeling, Gubkin Russian State Oil & Gas University, Leninsky Prospect, 65,
119991 Moscow, Russia. e-mail: vladimir_rykov@mail.ru
Abstract
The present paper reviews the development of the theory of Controllable Queueing Systems from its very beginning up to the present day. The main stages of the theory's development are considered and some new problems are mentioned. The review is addressed to those who are interested in the creation and development of the theory of Controllable Queueing Systems, and who want to understand the tendencies of its development and the new directions and problems of its study.
Keywords: Controllable queueing systems, Markov decision processes, Optimality principle, monotonicity of optimal policies
1 Introduction
The theory of Controllable Queueing Systems (CQS) is a special direction of investigation within the general theory of controllable stochastic processes on the one hand, and within Queueing Theory (QT) on the other. The theory of controllable stochastic processes is a special topic that we will not touch here; we concentrate on CQS. Papers devoted to the problems of Queueing Systems (QS) control appeared almost simultaneously with the first works on QS, but a special approach to CQS was developed by Rykov [48] in 1975. Several monographs devoted to this problem appeared thereafter [25, 26, 31, 64, 67].
The preliminary results of the theory's development can be found in [48]. In this paper we concentrate on the new results and approaches in this theory; nevertheless, some of the initial principal results on which the theory is based are also briefly recalled.
The paper is organized as follows.
In the next section the definition of CQS is given and some examples of CQS are proposed. The elements of the theory of Discrete Time Controllable Semi-Regenerative Processes (DTCSRP), which serves as a base for the study of CQS, are considered next. The optimality principle for CQS as a result of this theory's development, and the problem of calculating actual optimal rules for CQS, including numerical methods for computing optimal policies, are discussed in the following two sections; the qualitative properties of optimal policies are also considered there. Sections 5-8 are devoted to special CQS. Finally, in the Conclusion some new problems, approaches and new problem settings are discussed.
1 The publication was financially supported by the Ministry of Education and Science of the Russian Federation (the Agreement number 02.A03.21.0008) and by the Russian Foundation for Basic Research according to the research projects No. 17-07-00142 and No. 17-01-00633.
2 CQS. Definition and main properties

2.1 Definitions
In this review we will use a slightly modified Kendall system of notation for QS [29] and will consider a QS as a mathematical object consisting of four components α | β | γ | δ, where
• α — input flow,
• β — service mechanism,
• γ — system structure,
• δ — service discipline.
The symbols α and β take the usual values M, GI, etc. (Markov, recurrent, and others) for the input flow and for the distribution of the recurrent service mechanism. The system structure γ consists of two numbers, γ = (n, m), where the first is the number of servers and the second is the buffer size; the latter is omitted when the buffer is infinite. The last symbol δ denotes the service discipline and is omitted for the FIFO (first in, first out) discipline.
Each of these components is itself a complex mathematical object, which is usually studied in detail in QT and is determined by some parameters. QS investigation can usually be divided into three directions:
• analysis, which means the calculation of some output Quality of Service (QoS) characteristics for a completely determined system;
• synthesis, which means finding some of the input characteristics in order to provide the needed QoS indexes; and
• control, which means operating the system during its work with the goal of optimizing its behavior with respect to given criteria.
In this review we focus on the last problem, which is the most important for applications. Based on the above, the following definition of a CQS is reasonable.
Definition 1 A CQS is a QS for which some parameters of its components admit dynamic variation during its operation. Naturally, this variation is admitted within some domain and serves some goals determined by QoS functionals.
2.2 Classification of CQS
According to this definition, CQS can be classified as systems with:
• controllable input flow,
• controllable service mechanism,
• controllable system structure,
• controllable service discipline and
• complex CQS, for which some parameters of several components admit control.
For the control problem setting one needs the goal of control and the control rules.
2.3 Control times, goals and rules
Note first that in most practical situations a change of control (decision making) is possible only at special times, for example at the times of customer arrivals or service completions. We will call these times control or decision times (DT). On the other hand, the solution of the control problem is usually accompanied by the goal of control and, besides the control parameters, also depends on the admissible domain of their variation and the rules of their use, as well as on the possibility of observing the process.
Concerning the goals of control, they should be given by the optimization of some control quality functionals. This may be the optimization of some QoS characteristics in the steady-state regime of the system's operation, or the optimization (minimization or maximization) of the time to attain some state or set of states. In other cases the goal of control can be formulated as the optimization of some economic indexes connected with the system's operation. In the latter case some structure of losses and rewards (Loss-Reward Structure, LRS) connected with the system operation should be specified. Following tradition, we will consider as an optimality criterion the minimization of some Loss Functional (LF). Losses or rewards can be connected both with the system's stay in different states and with transitions from one state to another. The LRS usually includes:
• a random reward R_n of the system manager for the service of the n-th customer, with finite mean value m_R (service cost),
• a penalty C_w(l) for the sojourn of l customers in the system (waiting or holding cost),
• a cost C_u(a_k) for using the k-th service mode (usage cost),
• a penalty C(k, k′) for switching from the k-th service mode to the k′-th one (switching cost).
Using these data the LF is constructed. The general form of the LF will be considered in the next section, and its special forms will be presented together with concrete examples.
The control rules are usually determined with the help of control strategies, which define the manner in which decisions are taken by a Decision Maker (DM); they depend, generally speaking, on the observability of the system behavior and can be realized in several ways:
• taking into account the whole history of the process,
• taking into account only the last state of the system, or
• without any information;
• moreover, the decision can be made randomly or not.
Specifications of these strategies will be given in section 3.2.2.
2.4 Examples
Consider some examples of CQS that will be studied in detail later.
2.4.1 Arrival control
Consider an M/GI/1/∞ QS with controllable input. Customers arrive according to a Poisson flow with intensity λ and are served during random times that are i.i.d. with a general Cumulative Distribution Function (CDF) B(t) with mean value m_B = μ^{−1} and variance σ_B². The LRS includes:
• a random reward R_n of the system manager for the service of the n-th customer, with finite mean value m_R (service cost),
• a linear penalty C_w(l) = C_w l for the sojourn of l customers in the system (waiting cost).
The control times are the arrival times, and the decisions consist in the admission or rejection of an arriving customer. The problem consists in organizing the admission of customers to the system aiming at reward maximization (or loss minimization).
2.4.2 Control of service mechanism
Consider an M/M/1/∞ QS with controllable service rate. Customers arrive according to a Poisson input with intensity λ and are served with exponentially distributed service times having one of a finite number K of values of the parameter, μ_k (k = 1, 2, ..., K). The LRS includes:
• a random cost R_n for the service of the n-th customer, with finite mean value m_R (service cost),
• a linear penalty C_w(l) = C_w l for the sojourn of l customers in the system (waiting cost),
• a cost C_u(μ_k) of using the server in the k-th regime with service rate μ_k (usage cost), and
• a penalty C(μ_k, μ_l) for switching from the k-th regime to the l-th one (switching cost).
The control times are both the arrival and the service completion times, and the decisions consist in the choice of the service regime (service rate) aiming at the maximization of the long-run expected reward per unit of time.
2.4.3 Control of system structure
Consider an M/GI/1/∞ QS with controllable system structure. Customers arrive according to a Poisson input with intensity λ and are served according to a recurrent service mechanism with generally distributed service time with CDF B(t). The LRS includes:
• penalties C_1 for switching the server on and C_0 for switching it off,
• a cost C_u of using the server per unit of time (usage cost),
• a linear penalty C_w(l) = C_w l for the sojourn of l customers in the system (waiting cost).
The control times are both the arrival and the service completion times, and the decision consists in the possibility to switch the server on at a customer arrival time and to switch it off at a service completion time, aiming at the minimization of the long-run expected loss per unit of time.
2.4.4 Service discipline control
Consider an M_N/GI_N/1/∞ QS with several (N) types of customers and a controllable service discipline chosen among the priority disciplines. Inside the classes, customers are served according to the FIFO discipline.
Customers of the i-th type arrive from a Poisson input with intensity λ_i (i = 1, ..., N) and are served according to a recurrent mechanism with general CDF B_i(t) (i = 1, ..., N). The LRS includes:
• a linear penalty C_i(l) = C_i l for the sojourn of l customers of the i-th type in the system (waiting cost).
The control consists in the choice of a customer for service at each service completion (decision) time, aiming at the minimization of the long-run expected loss per unit of time.
Consider first the common model for the investigation of these and many other CQS.

3 Discrete time controllable semi-regenerative processes
As mentioned above in section 2.3, in most practical situations a change of control (decision making) is possible only at special times, for example at customer arrival or service completion times. We will call these times decision times (DT) and denote by {S_n, n = 0, 1, ..., S_0 = 0} the sequence of decision times. Of course, they are measurable functionals of the process describing the system behavior. For modelling CQS it is possible to use the so-called Discrete Time Controllable Semi-Regenerative Process (DTCSRP). Detailed information about these processes and their applications can be found in [31]. We recall here some needed definitions and properties of these processes.
3.1 Semi-regenerative processes
3.1.1 Definitions and main properties
Semi-regenerative processes are a mixture of regenerative and semi-Markov processes. Several authors (G. Klimov, E. Nummelin, V. Rykov, M. Yastrebenetsky) introduced them under different names. We recall here their contemporary definition. Let
• X = {X(t), t ∈ R} be a stochastic process with measurable state space (E, ℰ),
• F_t^X = σ{X(v), v ≤ t} be the generated flow of σ-algebras, and
• {S_n, n = 0, 1, 2, ...} be a sequence of its Markov times with S_0 = 0.
Definition 2 (Rykov, Yastrebenetsky (1971)) A pair {X(t), S_n} is called a (homogeneous) Semi-Regenerative Process (SRP) if for any subset Γ ⊆ E and for all n = 1, 2, ... it holds that
P{X(S_n + t) ∈ Γ | F_{S_n}} = P{X(S_n + t) ∈ Γ | X(S_n)} = P{X(S_1 + t) ∈ Γ | X(S_1)}.  (1)
Here
• the r.v.'s S_n are called Regeneration Times (RTs),
• the intervals (S_n, S_{n+1}] and their lengths T_n = S_{n+1} − S_n are called Regeneration Periods (RPs),
• the functional random elements W_n = {(X(S_n + t), T_n), t ≤ T_n}, n = 1, 2, ..., are called Regeneration Cycles (RCs), and
• the random elements X_n = X(S_n) are called Regeneration States (RSs).
Remark 1 It is intuitively clear that the behavior of an SRP is fully determined by its regeneration cycles W_n, which form a Markov chain in a functional space, and under an additional regularity condition any SRP is reconstructed (up to equivalence) from them. However, the actual determination of the generator of a Markov chain in a functional space is not a simple procedure.
Therefore we focus on the most important case, the one-dimensional characteristics of an SRP. For this, consider also the complementary processes
Y_n = (X_n, T_n),  N(t) = max{n : S_n ≤ t},  and  Y(t) = X(S_{N(t)}).
Theorem 3 (Jolkoff, Rykov (1981) and Rykov (1997)) Let {(X(t), S_n), t ≥ 0, n = 1, 2, ...} be an SRP. Then the sequences {X_n, n = 1, 2, ...} and {Y_n, n = 1, 2, ...} are a homogeneous embedded Markov chain (EMCh) and a semi-Markov chain (SMCh), respectively, {N(t), t ≥ 0} is a Markov Renewal Process (MRP), and Y(t) is a semi-Markov process (SMP).
The proof of the theorem can be found in the above-mentioned papers (see also [20, 45] for generalizations). Denote
• the Transition Matrix (TM) of the MCh {X_n, n = 1, 2, ...} and the Semi-Markov Matrix (SMM) of the SMCh {Y_n, n = 1, 2, ...} by
P(x, y) = P{X_{n+1} = y | X_n = x},
Q(x, t, y) = P{X_{n+1} = y, T_{n+1} ≤ t | X_n = x};
• the one-dimensional SRP distribution over a separate regeneration period (the SRP transition function) by
φ(x, t, B) = P{X(S_n + t) ∈ B, t < T_n | X(S_n) = x},  n ≥ 1;
• the one-dimensional SRP distribution for a given initial state x by
π(x, t, B) = P{X(t) ∈ B | X(0) = x} = P_x{X(t) ∈ B};
• the Markov Renewal Matrix (MRM) [Korolyuk, Turbin (1972), Jolkoff, Rykov (1981), Rykov (1997)] by
K(x, t, y) = E_x Σ_{n≥0} 1{S_n ≤ t, X_n = y}.
The behavior of an SRP {(X(t), S_n), t ≥ 0, n = 1, 2, ...} is not fully determined by its transition function. However, many useful properties and characteristics of an SRP can be represented in terms of the appropriate characteristics on a separate cycle and its MRM. In particular, for the one-dimensional distributions of an SRP, which are of most interest in practice, the following theorem can be proved with the help of the total probability formula:
Theorem 4 The one-dimensional distributions of an SRP satisfy the following relations:
π(x, t, B) = φ(x, t, B) + Σ_{y∈E} ∫_0^t Q(x, du, y) π(y, t − u, B),  (2)
π(x, t, B) = φ(x, t, B) + Σ_{y∈E} ∫_0^t K(x, du, y) φ(y, t − u, B)  (3)
= φ(x, t, B) + K ∗ φ(t, B),  (4)
where ∗ denotes the matrix-functional convolution.
Remark 2 One can see that the equality (4) is the solution of the Markov renewal equation (2).

3.1.2 Renewal, Limit and Ergodic Theorems
The well-known key renewal, limit and ergodic theorems are generalized to SRPs as follows.

Theorem 5 (Key renewal theorem [Rykov, Yastrebenetsky (1971), Jolkoff, Rykov (1981)]) Under the conditions usual for the renewal theorem, the following limiting formula holds:
lim_{t→∞} K ∗ g(t, x) = lim_{t→∞} Σ_{y∈E} ∫_0^t K(x, du, y) g(y, t − u) = m^{−1} ∫_0^∞ Σ_{y∈E} π̄(y) g(y, t) dt,
where π̄ = {π̄(x), x ∈ E} is the invariant distribution of the EMCh and
m = ∫_0^∞ Σ_{x∈E} π̄(x) Q(x, [s, ∞), E) ds
is the stationary mean RP length.
The last statement provides the calculation of the SRP stationary probability distribution in terms of its distributions over separate RPs and the invariant measure of the EMCh.
Theorem 6 (Limit theorem) Under an additional assumption of uniform regularity, for an SRP with an ergodic (positive recurrent) EMCh the steady-state probabilities exist and are equal to
π(B) = lim_{t→∞} π(x, t, B) = m^{−1} ∫_0^∞ Σ_{x∈E} π̄(x) φ(x, t, B) dt.  (5)
Theorem 7 (Ergodic theorem) For a uniformly regular SRP with an ergodic (positive recurrent) EMCh,
lim_{t→∞} (1/t) ∫_0^t g(X(u)) du = lim_{t→∞} (1/t) E ∫_0^t g(X(u)) du = m^{−1} Σ_{x∈E} π̄(x) g(x) q(x),  (6)
where q(x) = ∫_0^∞ (1 − Q(x, u, E)) du is the mean time the semi-Markov process Y(t) stays in the state x.
Proofs of these theorems can be found in [20, 22, 45, 61].
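To make the statements above concrete, here is a minimal numerical sketch (in Python, with made-up transition and sojourn data) of how, by Theorems 5-7, the stationary distribution of a semi-Markov process is assembled from the invariant measure of the EMCh and the mean sojourn times q(x):

```python
import numpy as np

# Illustration of Theorems 5-7: the time-stationary distribution of a
# semi-Markov process equals the EMCh invariant measure weighted by the
# mean sojourn times q(x), normalized by the mean RP length m.
# The matrix P and the vector q below are made-up illustrative numbers.

P = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.4, 0.6, 0.0]])    # EMCh transition matrix P(x, y)
q = np.array([2.0, 1.0, 3.0])      # mean sojourn times q(x)

# Invariant distribution of the EMCh: the left Perron eigenvector of P.
w, v = np.linalg.eig(P.T)
pi_bar = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi_bar /= pi_bar.sum()

m = pi_bar @ q                     # stationary mean RP length
pi = pi_bar * q / m                # stationary distribution, cf. formula (6)
print("EMCh invariant:", pi_bar, "  SMP stationary:", pi)
```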
3.2 Discrete time controllable SRP
3.2.1 Definition
In controllable stochastic processes two factors should usually be taken into account: the actions of Nature and the will of the Decision Maker (DM). As the process modeling the system behavior we consider an SRP and suppose that the decision times coincide with the regeneration times, which leads to the definition of a discrete time controllable process (DTCSRP).
Definition 8 A DTCSRP is a triple {(X(t), S_n, U_n), n = 1, 2, ...}, where {(X(t), S_n), n = 1, 2, ...} is an SRP for which the Regeneration Times (RTs) {S_n} are also Decision Times (DTs), and U_n denotes the decision at time S_n. As before, the intervals (S_n, S_{n+1}] and their lengths T_n = S_{n+1} − S_n are called Regeneration Periods (RPs).
Many concrete CQS can be modeled with a DTCSRP, including those proposed in section 2.4. We will return to them in sections 5-8. See also [31].
As for the usual SRP, the main role for a DTCSRP is played by the controllable embedded Markov chain (CMCh) {(X_n, U_n), n = 1, 2, ...} and the controllable semi-Markov chain (CSMCh) {(Y_n, U_n), n = 1, 2, ...}. Their family (with respect to the decision a ∈ A) of transition matrices (describing the actions of Nature) is denoted by
P(x, y; a) = P{X(S_n) = y | X(S_{n−1}) = x, U(S_{n−1}) = a},
Q(x, t, y; a) = P{X(S_n) = y, T_n ≤ t | X(S_{n−1}) = x, U(S_{n−1}) = a}.
Remark 3 Besides these two processes one can also consider the controllable functional random elements {(W_n, U_n), n = 1, 2, ...}, where W_n = {(X(S_n + t), T_n), t ≤ T_n} is a Regeneration Cycle (RC) and U_n is the appropriate decision at time S_n.
The behavior of a controllable process is determined not only by its transition probabilities but also by the controller, or Decision Maker (DM). The control rules are determined by strategies.
3.2.2 Strategies
Definition 9 The manner of decision making is called a strategy.
As mentioned in section 2.3, a strategy depends on the observability of the system behavior and can be realized in several ways. In order to formalize the possibilities mentioned there, we need to consider the history of the DTCSRP. Denoting the decision at time S_n by U_n and putting T_n = S_n − S_{n−1}, the random history of the process up to time S_n can be represented as the sequence
H_n = {X_0, U_0, T_1, X_1, ..., U_{n−1}, T_n, X_n},
and its realizations by the appropriate small letters.
A trajectory of a DTCSRP is presented in a diagram (omitted here).
In the case of a fully observable controllable process trajectory, the most general decision rule is determined by a random measurable strategy
δ = {d_i(u_i | h_i), i = 0, 1, ...},
where d_i(u_i | h_i) is a distribution over the decisions admissible on the trajectory h_i. This class of strategies is denoted by Δ. Besides this class there exist other classes; the most popular one is the class of simple Markov strategies, for which the decision depends only on the last state of the process and the appropriate distribution is degenerate at some decision,
d_i(u_i | h_i) = ε(f(x_i)),
where x_i is the last state of the trajectory h_i and f(x_i) = u* determines an optimal non-randomized decision in the state x_i. The appropriate strategy can be represented as δ = {f, f, ..., f, ...} = f^∞, and the appropriate class of simple Markov strategies is denoted by Δ_M.
Suppose for simplicity that the initial time S_0 = 0 is a decision and regeneration time. Then, for an initial distribution α of the process and a given strategy δ, there exists a probability measure P_α^δ on the set of process trajectories. The expectation corresponding to this measure will be denoted by E_α^δ. For the initial distribution degenerate at the state x, α = ε(x), the appropriate probability and expectation will be denoted by P_x^δ and E_x^δ, respectively.
3.2.3 Quality functional and optimization problem
The loss functional associated with the system LRS will be denoted by Z(t), and it is supposed that it can be represented in the form
Z(t) = Σ_{n≥0} 1{S_n ≤ t} Z_n(t − S_n),  (7)
where Z_n(t) represents the appropriate loss functional resulting only from the decision at time S_n and does not depend on the other decisions. It is calculated based on the LRS for the concrete models.
There are several approaches to the control of the process; the two most popular are:
• expected discounted loss minimization,
w(x, δ) = E_x^δ ∫_0^∞ e^{−st} Z(t) dt → inf;  (8)
• expected long-run average loss minimization,
g(x, δ) = lim_{t→∞} (1/t) E_x^δ Z(t) → inf.  (9)
In both cases the optimal strategy δ* is the one for which
w(x, δ*) = inf{w(x, δ) : δ ∈ Δ} = w(x),  (10)
g(x, δ*) = inf{g(x, δ) : δ ∈ Δ} = g(x).  (11)
3.2.4 Optimality equations
One of the main results of the theory of Markov Decision Processes, which also holds for DTCSRP (see [31]), is that the functions (10), (11) satisfy the so-called optimality, or Bellman, equations, which have the following forms:
• For the discounted loss minimization,
w(x) = inf_{a∈A_x} { c̃(x, a) + Σ_{y∈E} q̃(x, a, y) w(y) },  (12)
where c̃(x, a) is the one-step discounted expected loss function for the initial state x and decision a,
c̃(x, a) = E_x^a ∫_0^∞ e^{−st} Z_n(t) dt,
and q̃(x, a, y) is the probability generating function of an RP (inter-decision time) under decision a for the initial state x at a given DT S_n,
q̃(x, a, y) = ∫_0^∞ e^{−st} Q_{x,y}(dt; a).
• For the long-run criterion, the optimality equation involves, besides the model price g(x), the so-called value function v = {v(x), x ∈ E}:
v(x) = inf_{a∈A_x} { c(x, a) − g(x) m(x, a) + Σ_{y∈E} P(x, a, y) v(y) },  (13)
where c(x, a) is the one-step expected loss function for the initial state x and decision a,
c(x, a) = E_x^a Z_n(∞),
m(x, a) is the expected RP length, and P(x, a, y) is the transition probability over an RP (inter-decision time) under decision a for the initial state x,
P(x, a, y) = Q_{x,y}(∞; a).
For given functions w(x), or g(x) and v(x), the right-hand side of the equations (12), (13) is known as the Bellman function b(x, a), and the equations themselves are also called the Bellman equations; in the general case they can be represented as
v(x) = inf_{a∈A_x} b(x, a).  (14)
4 Optimal strategy construction
4.1 Optimality Principle
The main result of the theory of DTCSRP, as well as of the theory of MDP, consists in the validity of the optimality principle under assumptions that hold in most applicable situations. In [31] it was shown that, under conditions that are quite reasonable from the applications point of view, the optimality principle holds for DTCSRP, i.e. there exists a simple Markov optimal strategy, and the appropriate policies can be found from the optimality equations. The optimality principle means that:
• an optimal strategy exists and belongs to the class of simple Markov strategies, and therefore is determined by a policy f = {f(x), x ∈ E}, and
• it can be found as the solution of the optimality equation
f(x) = argmin{b(x, a) : a ∈ A(x)}.
As a result of these investigations, the problem of QS control is reduced to the problems of
• solving the Bellman equation (14), and
• minimizing the Bellman function b(x, a) with respect to a ∈ A(x).
4.2 Numerical methods
There are two main methods for the solution of the Bellman equation:
• the iteration algorithm due to Howard [28], and
• the linear programming algorithm due to Wolfe and Dantzig.
4.2.1 Iteration algorithm
The iteration algorithm was first proposed by R. A. Howard for MDP [28]. For both discounted and long-run cost minimization it consists of two alternating procedures. The algorithm:
• Beginning: choose some initial policy f_0 (for example, the one-step optimal policy).
• Policy evaluation: for a given policy f_k, solve the Bellman equations (12) or (13) in order to find the value functions w_k = {w_k(x), x ∈ E}, or g_k = {g_k(x), x ∈ E} and v_k = {v_k(x), x ∈ E}.
• Policy improvement: for the given value functions w_k, or g_k and v_k, find for each state x the decision f_{k+1}(x) that minimizes the value of the Bellman function,
f_{k+1}(x) = argmin{b(x, a; v_k) : a ∈ A(x)}.
Construct the new policy f_{k+1} = {f_{k+1}(x), x ∈ E}, keeping in each state x the previous decision f_k(x) if it coincides with the new one, f_{k+1}(x) = f_k(x).
• End: compare the two successive policies f_{k+1} and f_k. If they coincide, STOP: the last policy is optimal; if not, go to the step Policy evaluation.
It was proved in [28] that at each step k of the algorithm the value functions w_k, or g_k and v_k, do not increase; therefore, for a decision process with a finite number of states the algorithm stops after a finite number of steps, and for a system with a denumerable state space the algorithm converges.
There are various improvements and specifications of the algorithm (see, for example, Puterman [46], Rykov [49], and others).
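As an illustration of the algorithm, the sketch below implements Howard's two alternating procedures for the long-run average criterion on a toy uniformized controlled M/M/1/N queue with a finite set of service modes. All rates, costs and the buffer size are illustrative assumptions:

```python
import numpy as np

# Policy iteration sketch for service-rate control in an M/M/1/N queue,
# uniformized with the constant G. The model data are assumptions.
LAM, N = 1.0, 10                     # arrival rate, buffer size
MODES = [0.6, 1.2, 2.0]              # admissible service rates
CU = {0.6: 0.2, 1.2: 0.8, 2.0: 2.0}  # usage cost rate per mode
CW = 1.0                             # holding cost rate per customer
G = LAM + max(MODES)                 # uniformization constant

def transition(x, mu):
    """One-step probabilities of the uniformized chain from state x."""
    p, serv = np.zeros(N + 1), (mu if x > 0 else 0.0)
    p[min(x + 1, N)] += LAM / G       # arrival (lost when the buffer is full)
    p[max(x - 1, 0)] += serv / G      # service completion
    p[x] += 1.0 - LAM / G - serv / G  # fictitious self-transition
    return p

def cost(x, mu):
    return (CW * x + CU[mu]) / G      # expected cost per uniformized step

def evaluate(policy):
    """Policy evaluation: solve v = c - g*1 + P v with v[0] = 0."""
    P = np.array([transition(x, policy[x]) for x in range(N + 1)])
    c = np.array([cost(x, policy[x]) for x in range(N + 1)])
    A = np.eye(N + 1) - P
    A[:, 0] = 1.0                     # unknown 0 becomes the gain g
    sol = np.linalg.solve(A, c)
    g, v = sol[0], sol.copy()
    v[0] = 0.0
    return g, v

policy = [MODES[0]] * (N + 1)         # Beginning: an arbitrary initial policy
while True:
    g, v = evaluate(policy)           # Policy evaluation
    new = [min(MODES, key=lambda mu: cost(x, mu) + transition(x, mu) @ v)
           for x in range(N + 1)]     # Policy improvement
    if new == policy:                 # End: successive policies coincide
        break
    policy = new
print("long-run cost per unit time:", g * G, "\npolicy:", policy)
```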
4.2.2 Linear programming algorithm
Consider first the LPA for the discounted-loss model. For the event
B_n(x, u) = {..., X(S_n) = x, U(S_n) = u, ...},
denote by π_n(x, u) the quantity
π_n(x, u) = E_μ^δ [ e^{−sS_n} 1{B_n(x, u)} ]
for a simple Markov strategy δ = f^∞ and an initial process distribution μ. Then, due to (7), the functional (8) with the initial process distribution μ can be represented as follows:
w_μ(δ) = E_μ^δ ∫_0^∞ e^{−st} Z(t) dt
= E_μ^δ ∫_0^∞ e^{−st} Σ_{n≥0} Σ_{x∈E} Σ_{u∈A(x)} 1{B_n(x, u), S_n ≤ t} Z_n(t − S_n) dt
= Σ_{n≥0} Σ_{x∈E} Σ_{u∈A(x)} E_μ^δ [ e^{−sS_n} 1{B_n(x, u)} ] E_x^u ∫_0^∞ e^{−sv} Z_n(v) dv
= Σ_{n≥0} Σ_{x∈E} Σ_{u∈A(x)} π_n(x, u) c̃(x, u),  (15)
where c̃(x, u) = E_x^u ∫_0^∞ e^{−sv} Z_n(v) dv.
Using the backward Kolmogorov equation for the DTCSRP, one can show that the functions π_n(x, u) satisfy the equations
Σ_{u∈A(y)} π_0(y, u) = μ(y),
Σ_{u∈A(y)} π_n(y, u) = Σ_{x∈E} Σ_{u∈A(x)} π_{n−1}(x, u) q̃(x, u, y) for n > 0.  (16)
Introducing the variables ξ(x, u) = Σ_{n≥0} π_n(x, u), the problem of discounted cost minimization can be represented as
Σ_{x∈E} Σ_{u∈A(x)} c̃(x, u) ξ(x, u) → inf  (17)
under the restrictions arising from the equations (16),
Σ_{x∈E} Σ_{u∈A(x)} (δ_{x,y} − q̃(x, u, y)) ξ(x, u) = μ(y),  y ∈ E,  (18)
and
Σ_{x∈E} Σ_{u∈A(x)} ξ(x, u) = 1 + q̃_1(s)(1 + q̃_2(s)(1 + ...)) ...,  ξ(x, u) ≥ 0.  (19)
The connection between the Linear Program solution and the optimal control policy of the appropriate DTCSRP is contained in the following theorem:
Theorem 10 (Wolfe, Dantzig) For a DTCSRP with any simple Markov strategy δ = f^∞, the variables ξ(x, u) of the Linear Program (17)-(19) have the following property: for any x ∈ E, ξ(x, u) > 0 for only one value u(x) ∈ A(x), while ξ(x, u) = 0 for all other u ∈ A(x); and vice versa, to any solution ξ of the Linear Program with this property there corresponds a simple Markov strategy f = {f(x) = u(x), x ∈ E}.
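A minimal sketch of this construction for a tiny discrete-time discounted MDP (the states, actions, costs and discount factor are randomly generated assumptions, and the discrete-time version of constraint (18) uses the discount factor in place of q̃) may look as follows; the Wolfe-Dantzig property then lets the optimal policy be read off from the positive components of ξ:

```python
import numpy as np
from scipy.optimize import linprog

# LP sketch of (17)-(19) for a small discrete-time discounted MDP.
S, A, BETA = 3, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel P[x, u, y]
c = rng.uniform(0.0, 1.0, size=(S, A))       # one-step costs c(x, u)
mu = np.full(S, 1.0 / S)                     # initial distribution mu(y)

# Constraint (18): sum_u xi(y,u) - BETA * sum_{x,u} P(x,u,y) xi(x,u) = mu(y).
Aeq = np.zeros((S, S * A))
for y in range(S):
    for x in range(S):
        for u in range(A):
            Aeq[y, x * A + u] = (x == y) - BETA * P[x, u, y]
res = linprog(c.ravel(), A_eq=Aeq, b_eq=mu, bounds=(0, None))

xi = res.x.reshape(S, A)
# Theorem 10: in a basic optimal solution exactly one xi(x, .) is positive
# for every state, so argmax recovers a simple Markov policy.
print("xi:\n", xi.round(4), "\noptimal decisions:", xi.argmax(axis=1))
```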
Let us turn now to the problem of long-run loss minimization. The Linear Programming Algorithm (LPA) for the criterion of long-run expected loss minimization looks a little more complicated. For any simple Markov strategy δ = f^∞, denote by Π(δ) and H(δ) the limiting and the fundamental matrices of the embedded Markov chain (MC) X = {X_n},
Π(δ) = lim_{n→∞} (1/n) Σ_{0≤k≤n−1} P^k(δ) = [π(x, y; f)],  H(δ) = (I − P(δ) + Π(δ))^{−1} = [h(x, y; f)].
The LPA for the long-run expected loss minimization is the following:
Σ_{x∈E} Σ_{u∈A(x)} c(x, u) ξ(x, u) → inf  (20)
with respect to the variables
ξ(y, u) = Σ_{x∈E} μ(x) π(x, y; δ) d(u | y)  and  η(y, u) = Σ_{x∈E} μ(x) h(x, y; δ) d(u | y),
under the restrictions
Σ_{x∈E} Σ_{u∈A(x)} (δ_{x,y} − P(x, u, y)) ξ(x, u) = 0,
Σ_{u∈A(y)} ξ(y, u) + Σ_{x∈E} Σ_{u∈A(x)} (δ_{x,y} − P(x, u, y)) η(x, u) = μ(y) for all y ∈ E.  (21)
The connection of the LPA solution to the optimal policy is established with the help of the Wolfe-Dantzig theorem.
Remark 4 An additional attractive feature of the LPA consists in the possibility to use the dual linear programming algorithm for constructing the optimality domains of some given simple control rules, as will be shown in section 8.2.
4.3 Qualitative properties of optimal policies
The knowledge of some qualitative properties of optimal policies allows one to significantly simplify their calculation. For example, for monotone policies it is enough to find only the levels at which the policy switches from one regime to another. The validity of the optimality principle allows one to investigate some qualitative properties of optimal policies, for example their monotonicity. Because the optimal policy is the optimizer of the Bellman function b(x, u),
f(x) = argmin{b(x, u) : u ∈ A(x)},
one can look for conditions on it that provide the monotonicity of the optimal policy f = {f(x) : x ∈ E}.
Because for CQS the state and decision sets E and A are usually multi-dimensional, the problem of investigating the monotonicity of an optimal policy consists in finding conditions for the monotonicity of the solution of a multidimensional optimization problem. These conditions usually have the form of sub- or super-modularity of the function being optimized [69]. For the special case of optimal control of QS, an appropriate condition was proposed by Rykov in [53]. In order to explain the optimal policy monotonicity conditions, consider first the problem of minimizing a smooth function b(x, a) over a sufficiently good domain G ⊆ E × A:
min{b(x, a) : a ∈ A_x}.  (22)
The necessary condition of a local minimum is
b_a(x, a) = 0,  b_aa(x, a) ≥ 0,  (23)
where the notations
b_a(x, a) = ∂b(x, a)/∂a,  b_aa(x, a) = ∂²b(x, a)/∂a²,  b_ax(x, a) = ∂²b(x, a)/∂a∂x
are used. Therefore the problem solution is a function a = f(x) obtained from the first equation of (23). Its derivative can be found from the equation
b_ax(x, a) + b_aa(x, a) f′(x) = 0,
whence
f′(x) = − b_ax(x, a) / b_aa(x, a).
Thus, the monotonicity condition for the smooth optimization problem can be formulated as a theorem.
Theorem 11 If the minimizer f(x) of a smooth function b(x, a) is monotone, then b_ax(·,·) preserves its sign along this solution; namely, it is non-positive for a non-decreasing solution f(x) and non-negative for a non-increasing one. The converse also holds: if the second mixed derivative of a smooth function b(·,·) preserves its sign in a sufficiently good domain G, then the solution of the minimization problem (22) can be chosen monotone.
It is necessary to note that, because the solution of problem (22) may be non-unique, the monotone choice should be used.
For the discrete minimization problem the appropriate conditions are almost the same; however, instead of the second mixed derivative, the second difference of the Bellman function with respect to the state and control variables is used. If it is denoted as before by b_ax(x, a), the conditions of the discrete monotonicity problem look the same as in the previous theorem. For the detailed formulation and proof see [53].
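The discrete condition is straightforward to check numerically. The following sketch, for an arbitrary illustrative Bellman function on a grid, tests the sign of the second mixed difference and the monotonicity of the pointwise minimizer:

```python
import numpy as np

# Discrete analogue of Theorem 11: if the second mixed difference
#   D(x, a) = b(x+1, a+1) - b(x+1, a) - b(x, a+1) + b(x, a)
# keeps one sign, the minimizer f(x) admits a monotone choice.
X, A = np.arange(10), np.arange(5)
b = (X[:, None] - 1.5 * A[None, :]) ** 2   # an illustrative submodular b(x, a)

D = b[1:, 1:] - b[1:, :-1] - b[:-1, 1:] + b[:-1, :-1]
print("all mixed differences <= 0:", bool(np.all(D <= 0)))

f = b.argmin(axis=1)                        # pointwise minimizer f(x)
print("minimizer non-decreasing:", bool(np.all(np.diff(f) >= 0)))
```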
It is rarely possible to prove the monotonicity property of an optimal policy. Nevertheless, because for queueing systems monotonicity leads to threshold-type strategies, the attractiveness of such simple strategies and the simplicity of their realization have led to many investigations of CQS under strategies with monotone (threshold) policies. In a series of papers by Dudin and his colleagues [11]-[16], systems with multi-threshold control policies for QS with Markov Arrival (MAP) and Batch Markov Arrival (BMAP) processes have been investigated. A numerical investigation of the optimal control of a multi-server system with heterogeneous servers has been done in [17]. In Rykov & Efrosinin [57], the optimal control policy for systems with respect to their lifetime has been found in the class of threshold policies.
In the next sections we demonstrate the methods considered above on some of the CQS examples.
5 Arrival control
Control of arrivals to a QS is a traditional problem in QT and has a long history. An excellent review of the earliest works on the problem can be found in Stidham Jr. [65], which we follow in some parts of this section. Consider the arrival control model M/GI/1 proposed in the example of section 2.4.1. In this model customers arrive according to a Poisson flow with intensity λ and are served during random times that are i.i.d. with CDF B(t), mean value m_B = μ^{−1} and variance σ_B². The LRS usually includes: (a) a reward R for each served customer with mean value m_R, and (b) a penalty C_w(l) per unit of time for l customers staying in the system (holding or waiting cost). The control times are the arrival times, and the decisions consist in the admission or rejection of an arriving customer.
The problem consists in the admission for service of the customers (jobs) arriving at the system. There are different possibilities for the control of arrivals to a QS: (a) static or (b) dynamic control, for (c) single-server, (d) multi-server, or (e) network queueing models.
5.1 Static flow control
5.1.1 Single-server static arrival control
In [65] a single-server static flow control model M/GI/1 has been reviewed. For this model the DM admits each job with probability p and rejects it with complementary probability 1 − p. Therefore the actually admitted flow of jobs has intensity λp. Under this control rule, for any control parameter p the system is a usual M/GI/1 QS with Poisson input of intensity λp and generally distributed service time B, B(t) = P{B ≤ t}, with mean service time b and variance σ_b².
The problem consists in the admission of jobs to the system in order to maximize the long-run mean reward, which, due to the proposed LRS, has the form
λp m_R − λp C_w w(p) → max under the condition 0 ≤ λpb < 1,  (24)
where w(p) is the stationary sojourn time of a customer in the system. For the considered system one has, by the Pollaczek-Khinchine formula,
λp w(p) = l(p) = λpb + (λp)²(b² + σ_b²) / (2(1 − λpb)).
In terms of ρ = λb and V_b² = σ_b² b^{−2}, the problem (24) can be represented in the form
λp m_R − λp C_w w(p) = λ m_R p − C_w [ ρp + ρ²p²(1 + V_b²) / (2(1 − ρp)) ] → max  (25)
under the condition 0 ≤ p < min(1, ρ^{−1}).
This is a simple optimization problem that can easily be solved numerically. Moreover, the last equation allows one to obtain some interesting theoretical results proposed in [65].
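For instance, problem (25) can be solved by a direct search over the admission probability p; the sketch below does this for assumed parameter values:

```python
import numpy as np

# Numerical sketch of the static admission problem (25); all the
# parameter values below are assumptions.
lam, m_R, C_w = 2.0, 5.0, 1.0        # arrival rate, service reward, waiting cost
b, sigma2 = 0.4, 0.1                 # mean and variance of the service time
rho, V2 = lam * b, sigma2 / b**2     # traffic intensity and V_b^2

def reward(p):
    # Mean number in system l(p) by the Pollaczek-Khinchine formula.
    l = rho * p + (rho * p) ** 2 * (1 + V2) / (2 * (1 - rho * p))
    return lam * m_R * p - C_w * l

ps = np.linspace(0.0, min(1.0, 1.0 / rho) - 1e-6, 10_000)
p_star = ps[np.argmax(reward(ps))]
print(f"optimal admission probability p* = {p_star:.4f}")
```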
5.1.2 Multi-server static arrival control
In this case the r-server system M/GI/r is considered. A Poisson input flow of customers with intensity λ arrives at the system. The customers are served by the servers during random times with general distributions B_i(·) with mean value b_i and variance σ_i² for the i-th server (i = 1, ..., r). In order to provide the existence of the stationary regime it is supposed that λ p_i b_i < 1 (with p_i defined below), and the servers are ordered such that
b_1 ≤ b_2 ≤ ... ≤ b_r.
The (static) control rule is determined by a probability vector (p_1, p_2, ..., p_r), where p_i is the probability of sending an arriving job to the i-th server.
In this case the problem of static optimization is considered in [65] with respect to the minimization of the total number of customers in the queues:
minimize Σ_{1≤i≤r} l_i(p_i) subject to Σ_{1≤i≤r} p_i = 1, 0 ≤ λ p_i b_i < 1,
where l_i(p_i) is the stationary number of customers in the buffer of the i-th server and ρ_i = λ p_i.
5.2 Dynamic arrival control
The dynamic arrival control of this model has also been considered in [65]. Some generalizations of the dynamic flow control model have been given in Kitaev & Rykov (1995) [31].
The Bellman equation for the problem has been proposed, and it was shown that the Bellman function satisfies Theorem 11, from which it follows that the optimal policy belongs to the class of monotone policies and therefore the optimal strategy has the threshold property. Thus, there exists a threshold level, say l*, such that any arriving customer who finds more than l* other customers in the queue should be rejected. The optimal threshold level l* can be found by investigating the loss functional for the system M/GI/(1, l*) with a finite buffer.
6 Service mechanism control
Control of the system's service mechanism is the most frequently considered area of control, with many diverse settings of the problem. The distinctions lie in the input and service mechanisms as well as in the LRS.
One of the earliest works devoted to service mechanism control is due to Sabeti [62]. He considered the M/M/(1, n) system with controllable service rate without waiting cost C_w and proved the monotonicity property of the optimal policy. For the M/GI/(1, n) system this result was generalized in [63]. For closed queueing systems with controllable service rate, analogous results were obtained in [7, 8]. A more detailed review of the earliest works on the topic can be found in [9, 66, 48]. In the framework of DTCSRP the system has been considered in [31]. A numerical investigation of optimal service rate policies has been proposed in [73].
The hysteresis phenomenon of the optimal policy arises when switching costs are taken into account. The M/M/(1, n) system, with n ≤ ∞, with both input and service rate controllable and with a switching cost, has been considered in [42]. As the control parameter the pair a = (λ, μ) is considered, and the control parameters are supposed to be totally ordered: a ≤ a′ iff both λ ≤ λ′ and μ ≤ μ′. The control epochs for the model are both the arrival and the service completion times. The switching cost C(a, a′) has been included in the model, and the conditions that provide the optimality of a monotone hysteresis policy are studied. The one-step optimization problem of the Bellman function for the same model was studied in [27]. The optimality of a monotone hysteresis policy under somewhat more general conditions is proved in [50]. The same model under still more general conditions, in the framework of DTCSRP, has been considered in [31], where the optimality of a monotone hysteresis policy has been established.
Remark 5 This system can also be considered as a complex system, in which two parameters, the input and service rates, are controllable.
The more general case of an M/GI/(1, n) system with controllable service rate, but without switching cost, has been studied in [73]. Customers arrive in a Poisson flow of intensity λ and are served with one of a finite number of rates (service modes) a ∈ A = {a_k : k = 0, 1, ..., r}, such that the service time has CDF
B_k(t) = B(t; a_k) = B(a_k t), k = 0, ..., r,
where B(t) is a given CDF with finite mean m. The service beginning times are taken as the control times (epochs); at that, it is convenient to treat a service delay as the service mode with rate a_0 = 0. The control consists in choosing, at the control epochs, one of the service modes in order to maximize the discounted or long-run expected reward, or to minimize the appropriate losses. The monotonicity of the optimal policy has been studied numerically.
For multi-server systems the control rules usually consist in the choice of servers for service. This setting is close to the problems of system structure control and will be considered in the next section.
7 System structure control
7.1 Servers switching on and off problem
The problems of system structure control are very close to service mechanism control and consist in switching servers on and off depending on the system state with the goal of optimizing some given functional. Such systems have been considered by Heyman [24] and Deb [10].
Consider the system M/GI/1 with controllable system structure, where the control consists in the possibility to switch the server on and off. Customers arrive according to a Poisson flow of intensity λ and are served during random service times with general CDF B(t). The reward structure includes:
• penalties C_1 and C_0 for switching the server on and off;
• a penalty c_u per unit of server operating time (usage cost); and
• a penalty c_w(i) per unit of time for i customers waiting.
The control epochs are both the arrival and the service completion times, and the control consists in switching the server on and off with the goal of long-run average loss minimization.
The monotonicity of the optimal policy has been proved, which leads to its threshold property.
7.2 Slow server problem
The so-called slow server problem can also be considered as a system structure control problem. The problem is the following. For the M/M/r QS with Poisson input of intensity λ and r heterogeneous servers with service intensities μ_i (i = 1, ..., r), ordered such that
μ_1 ≥ μ_2 ≥ ... ≥ μ_r,  (26)
the conservative discipline, under which any idle server must be used, need not be expedient, for example with respect to the minimization of the mean sojourn (holding) time (or the mean number) of customers in the system. Therefore the problem of optimal server usage arises.
The problem was first considered for two servers in the static regime by B. Krishnamoorthy in [39]; for two servers in the dynamic regime it was studied by Hajek, Lin & Kumar, and Koole in [23, 41, 37]. It was shown that the optimal rule for using the slow server has a threshold character, and the optimal level of the queue length q* for using the slow server is
q* = ⌊μ_1 μ_2^{−1}⌋ + 1.
For a system with several heterogeneous servers the analogous rule was proved by Rykov in [54]; see also [53]. For the investigation of the problem, consider a Markov decision process with state space E = {x = (q, d_1, ..., d_r)}, where q is the queue length and d_k (k = 1, ..., r) is the indicator of the k-th server state: d_k = 0 if the k-th server is free and d_k = 1 if it is busy. The system of sets
A(x) = {set of indexes of the free servers in the state x}
is used as the action sets.
It was proved that the optimal server usage rule also has a threshold structure: for any system state x = (q, d_1, ..., d_r) there exists a threshold level q*(x) such that a new server should be used iff the queue length q is greater than q*(x), q > q*(x). At that, the server with the highest intensity among the free servers should be switched on. However, the levels for switching the servers on and off have to be calculated numerically; one example of such a calculation can be found in [17].
For the generalized setting of the problem, which also includes a waiting cost c_w per unit of time for any customer waiting in the queue and usage costs c_k (k = 1, ..., r) per unit of time for using the k-th server, the analogous optimal service rule has been found in [56, 55]. In this case the monotonicity of the optimal policy is preserved if, in addition to the servers' ordering (26), the following condition holds:
c_1^{−1}μ_1 ≥ c_2^{−1}μ_2 ≥ ... ≥ c_r^{−1}μ_r.  (27)
It should be noted that the optimal rule depends neither on the input intensity λ nor on the waiting cost c_w. The threshold levels q*(x) again have to be found numerically.
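The form of the resulting rule can be sketched as follows. The activation level used below is the two-server heuristic ⌊μ_1/μ_k⌋ + 1 standing in for the numerically computed levels q*(x), so this illustrates only the structure of the policy, not the optimal rule itself:

```python
def assign_server(q, busy, mu):
    """Threshold-type activation rule (a sketch). Servers are assumed
    ordered by decreasing rate as in (26); the fastest free server is
    activated only when the queue is long enough. For r > 2 servers the
    true levels q*(x) must be computed numerically, cf. [17]."""
    free = [k for k in range(len(mu)) if not busy[k]]
    if not free:
        return None                      # no server to switch on
    k = min(free)                        # fastest free server
    q_star = int(mu[0] / mu[k]) + 1      # heuristic activation level
    return k if q >= q_star else None

# Five waiting customers, servers 2 and 3 idle: the faster idle one is used.
print(assign_server(q=5, busy=[True, False, False], mu=[3.0, 1.5, 0.5]))
```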
8 Service discipline control
Some of the first investigations on CQS did not touch the delicate problems related to MDP and dealt only with optimization problems in the framework of simple priority systems.
8.1 Priority optimization
The problem of priority system optimization can be considered as the problem of service discipline optimization in a small class of decision rules that depend neither on the observation of the process trajectory nor on the system state.
One of the first papers devoted to the optimal control of QS addressed the problem of optimal priority assignment [2]. The problem is the following. Consider a single-server M_r/GI_r/1 queueing system with r independent Poisson inputs of intensities λ_k (k = 1, ..., r). The random service time B_k of k-th type customers has CDF B_k(t) (k = 1, ..., r) with mean value b_k = μ_k^{−1}, and each unit of waiting time of a k-type customer is penalized with c_k units. The problem consists in the choice of the service priorities in order to minimize the expected long-run losses
lim_{t→∞} (1/t) E ∫_0^t Σ_{1≤k≤r} c_k L_k(s) ds = Σ_{1≤k≤r} c_k l_k,
where L_k(t) is the queue length of the k-th type customers at time t, l_k is its stationary mean value, and w_k is the stationary waiting time of the k-th type customers. Priority systems have been investigated in detail by Klimov [32, 35], Gnedenko et al. [19], Jaiswal [21] and others, where the steady-state system characteristics were calculated. Based on the stationary characteristics of priority systems, the solution was proposed by Bronstein and Rykov in [2] (see also [6]) with the help of a simple method that many years later received the name perturbation method [3]; the resulting rule is now known as the cμ-rule. It is the following: the priorities should be organized in such a manner that
b_1 c_1^{−1} ≤ b_2 c_2^{−1} ≤ ... ≤ b_r c_r^{−1},
or, equivalently, in the order
c_1 μ_1 ≥ c_2 μ_2 ≥ ... ≥ c_r μ_r,
which gives this discipline the name cμ-rule.
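In code the cμ-rule is just a sort by the index c_k μ_k = c_k / b_k; the rates and costs below are made-up numbers:

```python
# The cmu-rule: serve classes in decreasing order of c_k * mu_k = c_k / b_k.
b = [0.5, 0.2, 1.0]   # mean service times b_k
c = [1.0, 3.0, 0.5]   # waiting costs c_k per unit of time
priority = sorted(range(len(b)), key=lambda k: c[k] / b[k], reverse=True)
print("priority order of classes:", priority)   # highest c*mu first
```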
Further, it was shown in [59] that this discipline is also optimal inside an essentially wider class of so-called dynamic priorities, where the idea is the following: the whole set of states E is divided into classes E_i (i = 0, 1, ..., r), and in class E_i the decision a = i should be taken at the decision epoch. This class of decisions in fact represents a class of Markov decision strategies, and therefore the results show that the priority discipline is optimal in the class of Markov decisions. These results were developed further for systems with dynamic preemptive-resume priorities in [70, 71, 72] and others.
8.2 System with feedback
The problem of service discipline control in more general settings has been considered by Klimov [33] and Kitaev & Rykov [30]. Recall the more general system M_r/GI_r/1/∞ with feedback proposed in [30]: customers of r types arrive at the single-server system in Poisson flows with intensities λ_i for the i-th type of customers (i-customers); the customers are served with i.i.d. service times distributed according to the CDF B_i(t) with mean value b_i. Each served i-customer leaves the system with probability q_i(0), or generates a set of customers (n_1, ..., n_r) of the respective types with probability q_i(n_1, ..., n_r), which should also be served in the system. The LRS includes only a linear waiting (holding) cost C_w(l, i) = c_i l_i per unit of time spent by l_i customers of the i-th type in the system.
The problem consists in the construction of a service discipline that minimizes the long-run expected loss from the system's operation. In [31] the existence of a simple Markov optimal strategy was proved, which made it possible to consider the problem in the framework of the dynamic priority setting. As a result, the problem was reduced to the linear program
Σ_{1≤i,j≤r} b_i c_j x_ij → min  (28)
with respect to x_ij, under the restrictions
Σ_{1≤i≤r} (a_ij x_ik + a_ik x_ij) = γ_jk  (j, k = 1, ..., r).  (29)
Here x_ij is the stationary mean number of j-th type customers at the i-th type decision epochs, a_ij = δ_ij − b_i λ_j − q_ij, where q_ij is the mean number of j-th type customers generated by an i-customer. The model constants γ_jk > 0 do not depend on the control rule.
The solution of the linear problem (28), (29) is attained, as usual, at the extreme points of the polyhedron determined by the restrictions (29). This statement allows one to show that the optimal policy determines a priority discipline. The dual linear program has been used in order to find the optimal priority rule; the rule is not quite simple and is determined by an algorithm, which can be found in [30, 31].
Remark 6 The connection between the iteration algorithm of optimal policy construction and a certain linear program is well known. In this problem, the use of the dual linear program allows one to construct the optimal control policy. The possibilities of the dual linear program have not been fully exploited up to now. Indeed, the dual linear program allows one to search for the model parameters under which some given simple rule (or service discipline) is optimal (see also the remarks in the Conclusion). This approach has not yet exhausted itself and should be developed in the future.
Further investigation of priority queues has been carried out by Miscoy and others [43], where, in particular, numerical methods for the generalized Kendall equation were proposed. They can also be used as a method of optimal priority construction for priority queues with switching times.
8.3 Closed queuing systems
The optimality of the priority rule for a closed queueing system has been proved by Koole in [38]. A closed system M_r/M/1/∞ with r sources, one server and controllable service discipline is considered. The sources have exponentially distributed life and repair times with parameters λ_i and μ_i (i = 1, ..., r). The system includes a waiting (holding) cost c_k (k = 1, ..., r) for every unit of time that the k-th source is not functioning. The goal of the investigation is the preemptive service discipline that minimizes the total average waiting cost.
The problem is formulated in the framework of a Markov decision model. In [38] it is proved that if the sources are ordered in such a manner that the conditions
λ_1 < ... < λ_r and c_1 μ_1 > ... > c_r μ_r  (30)
hold, then the priority rule "serve the source with the minimal index" is optimal.
In the case when all c_k = 1 this result shows that the rule is also optimal for minimizing the average queue length. For the case when λ_k = λ for all k = 1, ..., r, the results show that the cμ-rule also holds for the closed system with homogeneous sources. One can see that these results agree with the analogous results for the open queueing system with heterogeneous servers considered in subsection 7.2.
9 Conclusion
To model CQS, the DTCSRP is used in this paper. Of course, the assumption that the control can be changed only at special control epochs is a certain restriction; however, it is natural for many practical situations. Moreover, for MDP, due to the memoryless property of the exponential distribution, the epochs of any state change are the natural decision times. Nevertheless, there are some examples (and works) where the choice of the control epochs is itself a subject of investigation; thus, in [4, 72, 58] the problem of optimal service interruption has been investigated.
Another approach that could be mentioned as a subject for further investigation consists in the construction of domains in the set of system parameters such that some natural control rule would be optimal whenever the system parameters lie in the domain (domains of optimality). Besides the practical importance of such an approach, it allows in some cases to find an optimal policy for all possible values of the system parameters (an example of such an approach has been proposed in [30, 31]).
A further development of the above approach to the optimal control of QS is the following. Because for CQS the state and decision sets E and A are usually multi-dimensional, the problem of investigating the qualitative properties of optimal rules should be stated as follows: is it possible to introduce into the sets E and A a partial order with respect to which the optimal policy would be monotone?
References
[1] A.K. Agrawala, E.G. Coffman Jr., M.R. Garey, and S.K. Tripathi (1984). A stochastic optimization algorithm, minimizing expected flow times on uniform processors.// IEEE Trans. Comput., V. C-33, pp. 351-356.
[2] O.I. Bronstein, V.V. Rykov (1965). On optimal priority rules in queueing systems.// Izv. AS USSR. Techn. Cibern. No. 6, pp. 28-37 (in Russian).
[3] C. Buyukkoc, P. Varaiya, J. Walrand (1985). The cμ rule revisited.// Advances in Applied Probability. V. 17, pp. 237-238.
[4] S.S. Chitgopekar (1969). Continuous time Markovian sequential control processes.// SIAM J. Contr. V. 7, No. 3, pp. 367-389.
[5] R.W. Conway, W.L. Maxwell, L.W. Miller (1967). Theory of Scheduling. Addison-Wesley, Reading, Mass., 1967.
[6] D. Cox, W. Smith (1966). Queueing theory.// M.: "Mir", 1966.
[7] T.B. Crabill (1972). Optimal control facility with variable exponential service times and constant arrival rate.// Management Sci. V.18, No. 9, pp. 560-566.
[8] T.B. Crabill (1974). Optimal control of a maintenance system with variable service rate. // Oper. Res. V.22, No. 4, pp. 736-745.
[9] T.B. Crabill, D. Gross, and M. Magazine (1977). A classified bibliography of research on optimal design and control of queues.// Oper. Res. V. 25, pp. 219-232.
[10] R.Deb (1976). Optimal control of batch service queues with switching costs.// Adv. Appl. Prob. V. 8, pp. 177-194.
[11] A. Dudin (1998). Optimal multithreshold control for a BMAP/G/1 queue with N service modes.// Queueing Systems. V. 30, pp. 273-287.
[12] A. Dudin, I. Khalaf (1988). Optimal control by the current value of the threshold: Data communication/switching communication under the adaptive communication. In: XIII All-Union Workshop in Computer Networks. VINITI: Moscow, 1988.
[13] Che Soong Kim, Valentina Klimenok, Alexander Birukov, Alexander Dudin (2006). Optimal multi-threshold control by the BMAP/SM/1 retrial system.// Ann. Oper. Res. V. 141, pp. 193-210.
[14] Che Soong Kim, Sergey Dudin, Valentina Klimenok (2009). The MAP/PH/1 /N queue with flows of customers as a model for traffic control in telecommunication networks.// Performance Evaluation V. 66, pp. 564-579.
[15] Bin Sun, Moon Ho Lee, Sergey A. Dudin, and Alexander N. Dudin (2014). Analysis of Multiserver Queueing System with Opportunistic Occupation and Reservation of Servers. Hindawi Publishing Corporation. Mathematical Problems in Engineering. Volume 2014, Article ID 178108, 13 pages, http://dx.doi.org/10.1155/2014/178108.
[16] Bin Sun, Moon Ho Lee, Alexander N. Dudin, and Sergey A. Dudin (2014). MAP+MAP/M_2/N/∞ Queueing System with Absolute Priority and Reservation of Servers.// Hindawi Publishing Corporation. Mathematical Problems in Engineering. Volume 2014, Article ID 813150, 15 pages, http://dx.doi.org/10.1155/2014/813150.
[17] D.V. Efrosinin, V.V. Rykov (2003). Numerical investigation of optimal control of a system with heterogeneous servers.// Autom. and Remote Control, No. 2, pp. 143-151.
[18] A. Ephremides, P. Varaiya and J. Walrand (1980). A simple dynamic routing problem.// IEEE Trans. on AC. V. AC-25, No. 4, pp. 690-693.
[19] B.V. Gnedenko and others (1973). Priority queueing systems.// M.: Moscow State Univ. Publ. House, 1973.
[20] J. Jacod (1971). Théorème de renouvellement et classification pour les chaînes semi-markoviennes.// Ann. Inst. Henri Poincaré, Sect. B. V. 7, pp. 85-129.
[21] N. Jaiswal (1973). Queues with Priority. M.: "Mir", 1973.
[22] S. Jolkoff, V. Rykov (1981). Generalized regenerative processes with embedded regenerative periods and their applications.// MOS, Ser. Optimization. 1981, pp. 575-591.
[23] B. Hajek (1984). Optimal control of two interacting service stations.// IEEE Trans. Automat. Control. V.29, pp. 491-499.
[24] D.P. Heyman (1968). Optimal operating policies for M/ GI/1 queueing systems.// Oper. Res. V. 16, pp. 362-382.
[25] D.P. Heyman, M. Sobel (1982). Stochastic Models in Operations Research. Vol. 1. McGraw-Hill, New York, 1982.
[26] D.P. Heyman, M. Sobel (1984). Stochastic Models in Operations Research. Vol. 2. McGraw-Hill, New York, 1984.
[27] S.K. Hipp, U.D. Holzbaur (1988). Decision processes with monotone hysteretic policies.// Oper. Res. V. 36, No. 4, pp. 585-588.
[28] R.A. Howard (1960). Dynamic Programming and Markov Processes. MIT Press; 7th edition (1972).
[29] D.G. Kendall (1953) Stochastic processes occurring in the theory of queues and their analysis by the method of embedded Markov chains.// Annals of Math. Stat. Vol. 24, pp. 338-354.
[30] V.Yu. Kitaev and V.V. Rykov (1980) A service system with a branching flow of secondary customers.// Autom and Remote Control, No. 9, pp. 52-61.
[31] V.Yu. Kitaev and V.V. Rykov (1995) Controlled queueing systems. CRC Press, N.Y, 1995.
[32] G.P. Klimov (1966). Stochastic queueing systems.// M.: "Nauka', 1966 (In Russian).
[33] G.P. Klimov (1974). Time sharing service systems I.// Theory Prob. Appl. V. 39, pp. 479-490.
[34] G.P. Klimov (2011). Probability theory and mathematical statistics.// M.: Moscow State Univ. Publ. House, 2011 (in Russian).
[35] G.P. Klimov (2011). Queueing theory.// M.: Moscow State Univ. Publ. House, 2011 (in Russian).
[36] G.M. Koole (1995) Stochastic scheduling and dynamic programming. CWI TRACT. V.113. Amsterdam.
[37] G.M. Koole (1995) A simple proof of the optimality of the threshold policy in two-server queueing system.// Syst.&Control Lett. V. 26, pp. 301-303.
[38] G.M. Koole (1996) Scheduling a Repairman in a Finite Source System.// Mat. Meth. in Oper. Res., V. 44, pp. 333-344.
[39] B. Krishnamoorthy (1963). On Poisson queue with two heterogeneous servers.// Oper. Res. V. 11, pp. 321-330.
[40] P. Langrock and W.W. Rykow (1984). Methoden und Modelle zur Steuerung von Bedienungssystemen. In: Handbuch der Bedienungstheorie. Berlin, Akademie-Verlag, V. 2, pp. 422-486.
[41] W. Lin, P.R. Kumar (1984) Optimal control of a queueing system with two heterogeneous servers.// IEEE Transactions on Automatic Control. V.29, pp. 696-703.
[42] F.V. Lu, R.F. Serfozo (1984). M/M/1 queueing decision processes with monotone hysteretic optimal policies.// Oper. Res. V. 32, No. 5, pp. 1116-1129.
[43] G.K. Miscoy, S. Giordano, A.Yu. Bejan, V.V. Rykov (2008). Multi-dimensional analogs of Kendall's equations for priority queueing systems: numerical aspects.// Autom. and Remote Control, No. 6 (2008), pp. 82-94.
[44] Ph. Nain, P. Tsoucas, J. Walrand (1989). Interchange arguments in stochastic scheduling.// J. Appl. Prob., V. 27, pp. 815-826.
[45] E. Nummelin (1978). Uniform and ratio-limit theorems for Markov-renewal and semi-
regenerative processes on a general state space.// Ann. inst. Henri Poincare, sect. B, V. 14, pp. 119-143.
[46] M. Puterman (1994). Markov Decision Processes. Wiley, N.Y., 1994.
[47] S. Ross (1970). Applied Probability Models with Optimization Applications. Holden-Day, San Francisco, 1970.
[48] V.V. Rykov (1975). Controllable queueing systems. In: Itogi Nauki i Tekhniki. Teoriya Veroyatn. Matem. Statist. Teoretich. Kibern. V. 12, pp. 45-152 (in Russian). English translation in J. of Soviet Math.
[49] V.V. Rykov (1977). Controllable stochastic processes and systems.// Gubkin Moscow state institute of oil and gas industry. Moscow. 106 p. (In Russian).
[50] V.V. Rykov (1995). Hysteretic phenomena in controllable queueing systems.// Vestnik RUDN. Ser. Applied Math. and Informatics, No. 1, pp. 101-111.
[51] V.V. Rykov (1997) Two approaches to complex hierarchical systems decomposition. Continuously interrupted systems.// Autom and Remote Control, No. 10, pp. 91-104.
[52] V.V. Rykov (1997) Two approaches to complex hierarchical systems decomposition. Aggregative systems.// Autom and Remote Control, No. 12, pp. 140-149.
[53] V.V. Rykov (1999). On monotonicity conditions for optimal control policies of queueing systems.// Autom. and Remote Control, No. 9, pp. 92-106 (in Russian).
[54] V.V. Rykov (2001). Monotone control of queueing systems with heterogeneous servers.// Queueing Systems. V. 37, pp. 391-403.
[55] V.V. Rykov (2013). On a Slow Server Problem.// In: Stochastic Orders in Reliability and Risk (edd. by H. Li and X. Li). Springer, N.Y., 2013, pp. 351-361.
[56] V.V. Rykov, D.V. Efrosinin (2009). To the slow server problem.// Autom and Remote Control, No. 12, pp. 81-91.
[57] V.V. Rykov, D.V. Efrosinin (2012). On optimal control of systems on their life time. Recent Advances in System Reliability. Springer Series in Reliability Engineering. Springer, (Edds. by A.Lisnianski and I.Frenkel), 2012, pp. 307-319.
[58] V.V. Rykov, D.V. Kozyrev. (2014) Optimal control for scheduling with correction.// In: Proceedings of the XII All-Russian Workshop on Control Problems (Moscow 6-19 June 2014). M.: ITP Ras, 2014, pp. 8850-8854.
[59] V.V. Rykov, E.E. Lemberg. On optimal dynamic priorities in single-server queueing systems.// Izv. AS USSR. Techn. Cybern. No. 1, pp. 25-34 (in Russian).
[60] V.V. Rykov, E. Levner (1998) On optimal allocation of jobs between heterogeneous servers.// In Distributed Computer Communication Networks. Proceedings of the International Workshop, June 16-19 1998, Moscow. IITP RAS, Moscow, 1998, pp. 20-28.
[61] V.V. Rykov, M.A. Yastrebenetsky (1971). On regenerative processes with several types of regeneration points.// Cybernetics, No. 3 (1971), pp. 82-86. Kiev (in Russian).
[62] H. Sabeti (1973). Optimal selection of service rates in queueing with different costs.// J. Oper. Res. Soc. Jpn. V. 16, No. 1, pp. 15-35.
[63] R. Schasberger (1975). A note on optimal service selection in a single server queue.// Management Sci. V. 21, No. 11, pp. 1321-1331.
[64] L.I. Sennott (1999). Stochastic Dynamic Programming and the Control of Queueing Systems.// John Wiley & Sons, N.Y., 1999, 358p.
[65] Sh. Stidham, Jr. (1985). Optimal control of admission to a queueing system.// IEEE Transactions on Automatic Control. V. AC-30, No. 8, pp. 705-713.
[66] Sh. Stidham, Jr. and R. Weber (1993). A survey of Markov decision models for control of networks of queues.// Queueing Systems. V. 13, pp. 291-314.
[67] H. Tijms (1994). Stochastic Models. An Algorithmic Approach. Wiley, N.Y., 1994.
[68] E.A. Timofeev (1991). On optimal choice of mean waiting times in the queueing system GI/M_n/1.// Autom. and Remote Control, 1991, No. 6, pp. 77-83.
[69] D. Topkis (1978) Minimizing a submodular function on a lattice.// Operations Research. V.26,
No. 2, pp. 305-321.
[70] E.B. Veklerov (1967). On optimal preemptive dynamic priority disciplines in queueing systems.// Izv. AN SSSR, Techn. Cyb., No. 2, pp. 87-90 (in Russian).
[71] E.B. Veklerov (1971). On optimal priority disciplines in queueing systems.// Autom. and Remote Control, No. 6, pp. 149-153 (in Russian).
[72] E.B. Veklerov, V.V. Rykov (1971) On optimal synthesis of queueing systems.// In. Adaptive systems. Complex systems. M.:"Nauka", 1971, pp. 384-389.
[73] S.N. Verbitskiy, V.V. Rykov (1998). Numerical investigation of service rate optimal policies.// Autom. and Remote Control. No. 11 (1998), pp. 59-70 (in Russian; English translation exists).
[74] W. Winston (1977). Optimality of the shortest line discipline.// J. Appl. Prob. V. 14, pp. 181-189.
[75] R. Weber (1993) On a conjecture about assigning jobs to processors of different speeds. IEEE Transactions on Automatic Control. V. 38, No. 1, pp. 166-170.