MSC 93E20, 60J50, 60G40
DOI: 10.14529/mmp190104
PERFORMANCE BOUNDS AND SUBOPTIMAL POLICIES FOR MULTI-CLASS QUEUE
A. Madankan, University of Zabol, Zabol, Iran, Amadankan@uoz.ac.ir
In this paper, we consider a general class of queuing systems with multiple job types and a flexible service facility. We use a stochastic control policy to determine the performance loss in a multi-class M/M/1 queue. The underlying system is a Markov decision process (MDP). We show how to compute performance bounds for a stochastic control policy of an MDP with an average cost criterion. In practice, heuristic control policies are often used because mathematically optimal policies can be hard to compute and to implement. We derive bounds on the performance of such policies relative to an optimal policy; the goal is to quantify the gap between a specific policy and optimality. In particular, this study shows that, for any non-idling policy, the average queue length is within a factor of the optimum that depends only on the service rates.
Keywords: queueing system; multiple job classes; stochastic control policy.
Introduction
We are interested in using the average cost per period (ACPP) in a multi-class M/M/1 queue to determine the performance loss associated with using a given control policy. We consider non-idling policies, that is, policies that always serve a job as long as the queue is non-empty. The problem considered in this paper is the control of a single-server queue with multiple job classes. It was shown in [1, 2] that the optimal policy for this problem is known, but implementing it requires exact information about the service rate of each class. Note also that it is rather difficult to analyse the performance of some policies, such as FIFO (see [1]).
Indeed, it is known (see, for example, [3, 4]) that the cμ-rule is the optimal control in two main settings: (i) among all non-preemptive disciplines for generally distributed service requirements, and (ii) among all preemptive disciplines for exponentially distributed service requirements. In the preemptive case, the cμ-rule is optimal only if the service times are exponentially distributed. The queuing system considered here is a discrete-time system; the continuous-time case and its optimal control policy, the cμ-rule, have been studied before. In the discrete-time case, the optimality of the cμ-rule was established in [4, 5]. Recall that the cμ-rule is the discipline that gives strict priority in descending order of $c_k \mu_k$, where $c_k$ and $\mu_k$ denote, respectively, the cost and the inverse of the mean service requirement of class $k$.
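To make the discipline concrete, here is a minimal sketch (Python) of the selection step of the preemptive cμ-rule; the class parameters are hypothetical and serve only to illustrate the ranking by the product $c_k \mu_k$.

```python
# Minimal sketch of the c-mu rule: among non-empty classes, serve the one
# with the largest c_k * mu_k (holding cost times service rate).
def cmu_rule(queue_lengths, costs, service_rates):
    """Return the class index to serve, or None if all queues are empty."""
    nonempty = [k for k, n in enumerate(queue_lengths) if n > 0]
    if not nonempty:
        return None
    return max(nonempty, key=lambda k: costs[k] * service_rates[k])

# Hypothetical example: class 1 is empty, so class 2 wins with c*mu = 2.0 * 0.8.
print(cmu_rule([3, 0, 5], costs=[1.0, 4.0, 2.0], service_rates=[0.5, 0.9, 0.8]))  # -> 2
```

Under preemption, this selection is re-evaluated in every slot, so a newly arrived job with a larger $c_k \mu_k$ immediately displaces the job in service.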
The problem that we consider is a Markov decision process with an average cost per period objective. The main contribution of this paper is a method for determining the performance loss incurred by not using an optimal control. We develop a systematic approach for evaluating the difference between the cost of a specific policy and the optimal cost, and we use it to relate these two costs. We believe that our results open an interesting direction for further research. For instance, well-known optimality results in a single-class queue, such as the optimality of the Shortest Service Time discipline or the optimality of FCFS, can all be derived as corollaries of the results for this queue. In order to gain insight into the structure of the optimal policy in the multi-class case, we consider several relevant cases where the service time distributions are exponential.
Finding an optimal control policy for a Markov decision process over an infinite (or even very large) state space is a prominent topic, since computing such a policy is often intractable [3]. Even when the general form of the optimal control policy is known, running it in a discrete-time process can be difficult: in each time slot, substantial computational work is needed to evaluate the cost of each control action [6]. These are the reasons why sub-optimal heuristic control policies are used in practice.
In the present work, using the general methodology discussed below, we derive a sub-optimality factor for the queue, valid for any considered policy, that involves only the service rates. The resulting bound establishes that the choice of policy cannot affect the queue length much when the service rates are approximately equal. Our contribution is a set of bounds on the ACPP for the problem of controlling multi-class queues. Many authors have already worked on bounds for Markov chain problems, as well as on the optimal control and analysis of multi-class queues.
For the average cost criterion in finite-state Markov decision processes, as in [7], bounds are used to establish the convergence of an iteration algorithm. For the general state spaces considered here, the bounds we obtain are related to Lyapunov theorems for Markov chains; a similar upper bound can be found in [5]. For systems with positive unbounded costs, the standard Lyapunov theorems produce upper bounds. Our bounds can be viewed as a generalization of both the Lyapunov bounds and the finite-state bounds, and they have the feature that, even with unbounded costs, they provide both an upper and a lower bound.
1. Problem Definition
The model we consider is a multi-class queuing system in which each job belongs to one of several distinct classes. If the system deals with a single job class, then the order in which jobs are served is unimportant and does not affect quantities such as the average queue length; simple control policies (such as first come, first served (FCFS)) and more complex control policies yield the same results. When classes differ in their service times, however, the order of service does affect such quantities. Many policies that minimize the average queue length in a finite multi-class M/M/1 queuing system can be found in [3]. A multi-class queue differs from a single-class queue in several ways: for instance, (1) jobs of some classes arrive more frequently than those of other classes, and (2) the service times of some classes are longer than those of other classes.
Under the optimal policy for the considered model, classes are prioritized according to their average service times: job classes with shorter average service times have higher priority than those with longer average service times. We also allow preemption in service, meaning that service of a low-priority job is temporarily suspended when a higher-priority job arrives. Thus the optimal policy distinguishes between classes, requires statistical information about the service of every class, and preempts jobs in service when needed. As for sub-optimal control policies, consider the set of policies under which the server serves jobs without idling whenever the queue is non-empty; we call these non-idling policies. We show that there is a factor, involving only the service rates, that bounds the queue lengths. When the service rates of all classes are approximately equal, the resulting bound quantifies the intuition that the control policy has only a small effect on the queue length.
1.1. Notation
Consider a discrete-time Markov decision process consisting of a measurable general state space $Y$ with a given $\sigma$-field $\mathcal{B}(Y)$, a finite set $A$ of actions available in each time slot, and a measurable cost function $c : Y \times A \to \mathbb{R}$. Taking action $a \in A$ in state $y \in Y$ incurs the corresponding cost.
The state evolves according to a stochastic kernel $p : \mathcal{B}(Y) \times Y \times A \to [0,1]$ defined by
$$ p(B, y, a) = \Pr\{ Y_{t+1} \in B \mid Y_t = y,\ A_t = a \}, \qquad \forall y \in Y,\ a \in A,\ B \in \mathcal{B}(Y). $$
In addition, $p(B, \cdot, \cdot) : Y \times A \to [0,1]$ is a measurable function for each $B \in \mathcal{B}(Y)$. We study the performance of the system under static state-feedback policies. A policy is a measurable function $\pi : Y \to A$ that chooses the action in each time slot based on the current system state. The set of all measurable policies is denoted by $Q = \{ \pi : Y \to A \mid \pi \text{ is measurable} \}$.
Under a policy $\pi \in Q$, the state evolution is a time-homogeneous Markov chain $(Y_0, Y_1, \ldots)$ whose transition probabilities are specified by the stochastic kernel $p(B, y, \pi(y))$.
For convenience and ease of notation we define
$$ E_t^c = \frac{1}{t}\, \mathbb{E}\left[ \sum_{k=0}^{t-1} c\big(Y_k, \pi(Y_k)\big) \,\middle|\, Y_0 = y \right], $$
writing simply $c(Y_k)$ when the cost does not depend on the action (or once the policy is fixed), and we write $E_y^g$ for the expectation of $g(Y_{t+1})$ conditioned on $Y_t = y$. We consider the following performance function for the system under a specific policy $\pi \in Q$ in terms of the ACPP:
$$ J(\pi) = \limsup_{t \to \infty} E_t^c, $$
where we define
$$ J_{\mathrm{opt}} = \inf_{\pi \in Q} \Big\{ \liminf_{t \to \infty} E_t^c \Big\}. $$
Our goal is to find and develop tools for determining policies whose ACPP $J(\pi)$ is close to $J_{\mathrm{opt}}$.
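For intuition, $J(\pi)$ can be approximated by straightforward Monte Carlo simulation of the finite-horizon averages $E_t^c$. The sketch below (Python) is generic, and the toy chain at the end is a hypothetical illustration rather than the queue model of Section 4.

```python
import random

def estimate_acpp(step, cost, policy, y0, horizon=100_000, runs=5):
    """Monte Carlo estimate of the average cost per period J(pi)."""
    totals = []
    for _ in range(runs):
        y, total = y0, 0.0
        for _ in range(horizon):
            a = policy(y)        # action from the static state-feedback policy
            total += cost(y, a)  # per-period cost c(y, a)
            y = step(y, a)       # sample Y_{t+1} from the kernel p(., y, a)
        totals.append(total / horizon)
    return sum(totals) / runs

# Hypothetical toy chain: reflected random walk on {0,...,9}, cost = state.
step = lambda y, a: min(9, y + 1) if random.random() < 0.4 else max(0, y - 1)
print(estimate_acpp(step, cost=lambda y, a: y, policy=lambda y: 0, y0=0))
```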
2. Bounds for Markov Chains without Control
Y. Wang and S. Boyd [8] showed how to compute performance bounds for finite-horizon stochastic control problems with linear system dynamics and arbitrary constraints, objective, and noise distribution, including asymmetric costs and constraint sets. Here we present a methodology for determining bounds on the ACPP. Before presenting a lower bound on the ACPP, we first obtain bounds for a Markov chain without control (or with a given state-feedback control); we then develop this approach into a lower bound on the ACPP of any policy, and finally investigate the difference between optimality and a given policy by combining an upper bound on the cost of the given policy with a lower bound on the cost of any policy.
Theorem 1. Suppose $g : Y \to \mathbb{R}$ is a measurable function. Let $C(y) = c(y) + E_y^g - g(y)$ and define
$$ \alpha_u = \sup_{y \in Y} C(y), \qquad \alpha_l = \inf_{y \in Y} C(y). $$
If there exists an $\varepsilon > 0$ such that
$$ \sup_{y \in Y} \Big\{ \mathbb{E}\big[ |g(Y_{t+1})|^{1+\varepsilon} \mid Y_t = y \big] - |g(y)|^{1+\varepsilon} \Big\} < \infty, $$
then for all $y \in Y$,
$$ \alpha_u \geq \limsup_{t \to \infty} E_t^c, \qquad \alpha_l \leq \liminf_{t \to \infty} E_t^c. $$
This theorem is the main result of this paper; we use it to determine upper and lower bounds on the average cost incurred by Markov chains with general measurable state spaces. Before proving Theorem 1, we establish the existence of all the expectations that we need.
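Although the theorem is stated for general measurable state spaces, it is easy to check numerically on a finite chain, where the long-run average cost can be computed exactly from the stationary distribution. The sketch below (Python, with a hypothetical birth-death chain and the test function $g(y) = y^2$) verifies that the average cost lies between $\alpha_l$ and $\alpha_u$.

```python
import numpy as np

# Hypothetical birth-death chain on {0,...,n-1}: up w.p. p, down w.p. q.
n, p, q = 20, 0.3, 0.5
P = np.zeros((n, n))
for y in range(n):
    P[y, min(y + 1, n - 1)] += p
    P[y, max(y - 1, 0)] += q
    P[y, y] += 1 - p - q

c = np.arange(n, dtype=float)   # cost c(y) = y
g = c ** 2                      # test function g(y) = y^2
C = c + P @ g - g               # C(y) = c(y) + E[g(Y_{t+1}) | y] - g(y)
alpha_u, alpha_l = C.max(), C.min()

# Long-run average cost from the stationary distribution (pi P = pi).
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(abs(evals - 1.0))])
pi /= pi.sum()
print(alpha_l, pi @ c, alpha_u)  # prints alpha_l <= average cost <= alpha_u
```

The bounds are valid for any measurable $g$; how tight they are depends on how well $g$ approximates the relative value function of the chain.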
Lemma 1. Suppose $g : Y \to \mathbb{R}$ is a measurable function such that
$$ \sup_{y \in Y} C(y) < \infty, \qquad \inf_{y \in Y} C(y) > -\infty, $$
and suppose there exists an $\varepsilon > 0$ such that
$$ \sup_{y \in Y} \Big\{ \mathbb{E}\big[ |g(Y_{t+1})|^{1+\varepsilon} \mid Y_t = y \big] - |g(y)|^{1+\varepsilon} \Big\} < \infty. \tag{1} $$
Then for all $y \in Y$ and all $t$,
$$ \mathbb{E}\big[ |g(Y_t)|^{1+\varepsilon} \mid Y_0 = y \big] < \infty, \tag{2} $$
$$ \big| \mathbb{E}[ c(Y_t) \mid Y_0 = y ] \big| < \infty. \tag{3} $$
Proof. Let $M$ denote the value of the supremum in (1). Then immediately
$$ \mathbb{E}\big[ |g(Y_{t+1})|^{1+\varepsilon} \mid Y_t = y \big] \leq M + |g(y)|^{1+\varepsilon}, \quad \forall y \in Y. $$
Clearly $\mathbb{E}[|g(Y_0)|^{1+\varepsilon} \mid Y_0 = y] < \infty$. By induction, if $\mathbb{E}[|g(Y_t)|^{1+\varepsilon} \mid Y_0 = y] < \infty$ for some $t$, then
$$ \mathbb{E}\big[ |g(Y_{t+1})|^{1+\varepsilon} \mid Y_0 = y \big] \leq M + \mathbb{E}\big[ |g(Y_t)|^{1+\varepsilon} \mid Y_0 = y \big] < \infty, $$
and as a result $\mathbb{E}[|g(Y_t)|^{1+\varepsilon} \mid Y_0 = y] < \infty$ for all $t$, which is (2).

From the fact that $g(y) \leq 1 + |g(y)|^{1+\varepsilon}$,
$$ \mathbb{E}[ g(Y_t) \mid Y_0 = y ] \leq 1 + \mathbb{E}\big[ |g(Y_t)|^{1+\varepsilon} \mid Y_0 = y \big] < \infty, \quad \forall y \in Y, $$
and likewise, since $-g(y) \leq 1 + |g(y)|^{1+\varepsilon}$,
$$ -\infty < -1 - \mathbb{E}\big[ |g(Y_t)|^{1+\varepsilon} \mid Y_0 = y \big] \leq \mathbb{E}[ g(Y_t) \mid Y_0 = y ], \quad \forall y \in Y. $$
Now if
$$ \sup_{y \in Y} C(y) \leq \alpha_u < \infty, $$
then
$$ c(y) \leq \alpha_u + g(y) - \mathbb{E}[ g(Y_{t+1}) \mid Y_t = y ], \quad \forall y \in Y, $$
where $|\mathbb{E}[g(Y_{t+1}) \mid Y_t = y]| < \infty$. Accordingly,
$$ \mathbb{E}[ c(Y_t) \mid Y_0 = y ] < \infty, \quad \forall t,\ \forall y \in Y. \tag{4} $$
In a similar way, if
$$ \inf_{y \in Y} C(y) \geq \alpha_l > -\infty, $$
then
$$ c(y) \geq \alpha_l + g(y) - \mathbb{E}[ g(Y_{t+1}) \mid Y_t = y ], \quad \forall y \in Y. $$
Accordingly,
$$ -\infty < \mathbb{E}[ c(Y_t) \mid Y_0 = y ], \quad \forall t,\ \forall y \in Y. \tag{5} $$
Now (3) follows by combining (4) and (5). □
Now we can prove Theorem 1.

Proof. [Proof of Theorem 1] By the definition of $\alpha_u$ we have
$$ \alpha_u \geq \frac{1}{t}\, \mathbb{E}\left[ \sum_{k=0}^{t-1} C(Y_k) \,\middle|\, Y_0 = y \right] = \frac{1}{t}\, \mathbb{E}\left[ \sum_{k=0}^{t-1} c(Y_k) \,\middle|\, Y_0 = y \right] - \frac{1}{t} \big( g(y) - \mathbb{E}[ g(Y_t) \mid Y_0 = y ] \big), $$
which means
$$ \frac{1}{t}\, \mathbb{E}\left[ \sum_{k=0}^{t-1} c(Y_k) \,\middle|\, Y_0 = y \right] \leq \alpha_u + \frac{1}{t} \big( g(y) - \mathbb{E}[ g(Y_t) \mid Y_0 = y ] \big). $$
In the same way,
$$ \frac{1}{t}\, \mathbb{E}\left[ \sum_{k=0}^{t-1} c(Y_k) \,\middle|\, Y_0 = y \right] \geq \alpha_l + \frac{1}{t} \big( g(y) - \mathbb{E}[ g(Y_t) \mid Y_0 = y ] \big). $$
Hence, if we show that $\lim_{t \to \infty} \frac{1}{t} \mathbb{E}[ g(Y_t) \mid Y_0 = y ] = 0$ for all $y \in Y$, the proof is complete.

Suppose that $\sup_{y \in Y} \{ \mathbb{E}[ |g(Y_{t+1})|^{1+\varepsilon} \mid Y_t = y ] - |g(y)|^{1+\varepsilon} \} = M$ with $M < \infty$. Then
$$ M \geq \frac{1}{t}\, \mathbb{E}\left[ \sum_{k=0}^{t-1} \Big( \mathbb{E}\big[ |g(Y_{k+1})|^{1+\varepsilon} \mid Y_k \big] - |g(Y_k)|^{1+\varepsilon} \Big) \,\middle|\, Y_0 = y \right] = \frac{1}{t} \Big( \mathbb{E}\big[ |g(Y_t)|^{1+\varepsilon} \mid Y_0 = y \big] - |g(y)|^{1+\varepsilon} \Big), \tag{6} $$
so that $\mathbb{E}[ |g(Y_t)|^{1+\varepsilon} \mid Y_0 = y ] \leq |g(y)|^{1+\varepsilon} + tM$ and, by Jensen's inequality, $\mathbb{E}[ |g(Y_t)| \mid Y_0 = y ] \leq \big( |g(y)|^{1+\varepsilon} + tM \big)^{\frac{1}{1+\varepsilon}}$. Also we know that
$$ \big( |g(y)|^{1+\varepsilon} + tM \big)^{\frac{1}{1+\varepsilon}} \leq |g(y)| + (tM)^{\frac{1}{1+\varepsilon}}, \tag{7} $$
therefore (6) and (7) imply that
$$ \mathbb{E}\big[ |g(Y_t)| \mid Y_0 = y \big] \leq |g(y)| + (tM)^{\frac{1}{1+\varepsilon}}. \tag{8} $$
Dividing (8) by $t$ and taking the limit superior as $t \to \infty$, we obtain
$$ \limsup_{t \to \infty} \frac{1}{t}\, \mathbb{E}\big[ |g(Y_t)| \mid Y_0 = y \big] \leq \lim_{t \to \infty} \frac{1}{t} \Big( |g(y)| + (tM)^{\frac{1}{1+\varepsilon}} \Big) = 0, $$
and so
$$ \lim_{t \to \infty} \frac{1}{t}\, \mathbb{E}[ g(Y_t) \mid Y_0 = y ] = 0, \quad \forall y \in Y. \ \Box $$
3. Bounds with Control
We used Theorem 1 to provide bounds on the ACPP incurred by Markov chains with general measurable state spaces. In this section we extend the result to establish a lower bound on the ACPP incurred by any policy, which allows us to bound the difference between $J(\pi)$ and $J_{\mathrm{opt}}$ for a specific $\pi$.
Lemma 2. Suppose $g : Y \to \mathbb{R}$ is a measurable function and let
$$ \alpha_l = \inf_{y \in Y,\, a \in A} \big\{ c(y, a) + \mathbb{E}[ g(Y_{t+1}) \mid Y_t = y, A_t = a ] - g(y) \big\}. $$
Then any static state-feedback control policy $\pi : Y \to A$ such that
$$ \sup_{y \in Y} \big\{ \mathbb{E}\big[ |g(Y_{t+1})|^{1+\varepsilon} \mid Y_t = y, A_t = \pi(y) \big] - |g(y)|^{1+\varepsilon} \big\} < \infty $$
has an ACPP satisfying
$$ \alpha_l \leq \liminf_{t \to \infty} \frac{1}{t} \sum_{k=0}^{t-1} \mathbb{E}[ c(Y_k, \pi(Y_k)) \mid Y_0 = y ], \quad \forall y \in Y. $$
Proof. For any state-feedback policy $\pi \in Q$,
$$ \alpha_l = \inf_{y \in Y,\, a \in A} \big\{ c(y, a) + \mathbb{E}[ g(Y_{t+1}) \mid Y_t = y, A_t = a ] - g(y) \big\} \leq \inf_{y \in Y} \big\{ c(y, \pi(y)) + \mathbb{E}[ g(Y_{t+1}) \mid Y_t = y, A_t = \pi(y) ] - g(y) \big\}. $$
Applying Theorem 1 now completes the proof. □
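On a finite toy MDP, the bound of Lemma 2 is one line of linear algebra: $\alpha_l = \min_{y,a} \{ c(y,a) + \sum_{y'} p(y' \mid y, a)\, g(y') - g(y) \}$ lower-bounds the ACPP of every stationary policy simultaneously. A minimal sketch (Python, with a hypothetical two-action chain and a hypothetical quadratic $g$):

```python
import numpy as np

n = 10
# Hypothetical controlled chain: action 0 drifts up, action 1 drifts down.
P = np.zeros((2, n, n))
for y in range(n):
    P[0, y, min(y + 1, n - 1)] += 0.6
    P[0, y, max(y - 1, 0)] += 0.4
    P[1, y, min(y + 1, n - 1)] += 0.3
    P[1, y, max(y - 1, 0)] += 0.7
cost = np.stack([np.arange(n, dtype=float),
                 np.arange(n, dtype=float) + 0.5])  # c(y,a); action 1 costs extra

g = 2.0 * np.arange(n) ** 2                         # candidate function g
drift = np.stack([P[a] @ g - g for a in range(2)])  # E[g(Y_{t+1})|y,a] - g(y)
alpha_l = (cost + drift).min()                      # Lemma 2: J(pi) >= alpha_l
print(alpha_l)
```

Different choices of $g$ give different values of $\alpha_l$; one can search over a parametric family of functions $g$ for the largest such lower bound, in the spirit of [8].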
4. Control Policy of Multi-Class Queue
The considered queuing system is a discrete-time model. In each time slot $t$, let $W_t^i \in \{0, 1\}$ be the number of class-$i$ arrivals to the queue. We restrict the system so that at most one arrival occurs per time slot (which can be arranged by taking the slots sufficiently short), i.e. $\sum_{i=1}^N W_t^i \leq 1$, and let $W_t = (W_t^1, \ldots, W_t^N)$ denote the arrival vector. The vectors $W_t$ and $W_{t'}$ are i.i.d. for distinct time slots $t' \neq t$. We set $\lambda_i = \mathbb{E}[W_t^i]$, which is independent of $t$. Let $Y_t^i$ denote the number of class-$i$ jobs at time $t$, and let $Y_t = (Y_t^1, \ldots, Y_t^N)$ be the state vector. The control in each time slot $t$ is characterized by
$$ A_t^i = \begin{cases} 1, & \text{if a job of class } i \text{ is being serviced}, \\ 0, & \text{otherwise}. \end{cases} $$
The service of a class-$i$ job served in a given time slot is completed with probability $\theta_i$, independently of the service history. Define the number of class-$i$ departures in time slot $t$ (jobs served successfully) as the random variable $D_t^i = A_t^i\, I(Y_t^i)\, B_t^i$, where $B_t^i$ is a Bernoulli random variable with $\mathbb{E}[B_t^i] = \theta_i$ and $I$ is the indicator function
$$ I(y) = \begin{cases} 0, & \text{if } y = 0, \\ 1, & \text{otherwise}. \end{cases} $$
The queue length dynamics then take the form $Y_{t+1}^i = Y_t^i + W_t^i - D_t^i$ for each $i \in \{1, \ldots, N\}$.
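A single slot of these dynamics is straightforward to simulate. The sketch below (Python) draws the arrival vector $W_t$ (at most one arrival per slot), applies the service decision $a = A_t$, and samples the Bernoulli departure $D_t^i$; the rates passed in are assumed to satisfy $\sum_i \lambda_i \leq 1$.

```python
import random

def queue_step(y, a, lam, theta):
    """One slot of the dynamics Y_{t+1}^i = Y_t^i + W_t^i - D_t^i."""
    N = len(y)
    # Arrivals: at most one per slot; class i arrives w.p. lam[i].
    u, w = random.random(), [0] * N
    acc = 0.0
    for i in range(N):
        acc += lam[i]
        if u < acc:
            w[i] = 1
            break
    # Departure: the served class (a[i] = 1) completes w.p. theta[i] if non-empty.
    d = [a[i] * (1 if y[i] > 0 else 0) * (1 if random.random() < theta[i] else 0)
         for i in range(N)]
    return [y[i] + w[i] - d[i] for i in range(N)]
```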
This problem is a Markov decision process with the ACPP criterion, since our aim is to choose how to serve the different job classes so as to minimize the average queue length. For the considered model the state space is $Y = \mathbb{Z}_+^N$, and the actions are chosen from
$$ A = \Big\{ a \in \{0,1\}^N \,\Big|\, \sum_{i=1}^N a_i = 1 \Big\}. $$
The cost in each time slot $t$ is the total number of jobs of all classes, $c(Y_t) = \sum_{i=1}^N Y_t^i$. Let $\pi : Y \to A$ be a policy, and let $(Y_0, Y_1, \ldots)$ be the queue length process generated by this policy. Then the average queue length under policy $\pi$ is
$$ J(\pi) = \limsup_{t \to \infty} \frac{1}{t} \sum_{k=0}^{t-1} \mathbb{E}[ c(Y_k) \mid Y_0 = 0 ]. $$
The problem we consider is how the control policy affects the average queue length. Stability is the property under which bounded queue lengths exist for any non-idling policy. We call a policy $\pi$ non-idling if, whenever $y \neq 0$, it serves a non-empty class, i.e. $\pi(y) = a$ with $a_i = 1$ only if $y_i > 0$. Lemma 3 covers the case in which the system cannot be stabilized by any non-idling policy; since the result is standard (see [16]), we omit the proof.
Lemma 3. If $\sum_{i=1}^N \lambda_i / \theta_i > 1$, then no non-idling policy has a bounded average queue length.
Theorem 2. Let $Q_{NI}$ be the set of all non-idling policies, let $\pi \in Q_{NI}$ be arbitrary, and let
$$ J_{\mathrm{opt}} = \inf_{\pi \in Q_{NI}} \Big\{ \liminf_{t \to \infty} E_t^c \Big\}. $$
If $\sum_{i=1}^N \lambda_i / \theta_i < 1$, then $J(\pi) < \infty$ and
$$ \frac{J(\pi)}{J_{\mathrm{opt}}} \leq \frac{\max_i \{\theta_i\}}{\min_i \{\theta_i\}}. $$
Proof. In order to obtain a lower bound on $J_{\mathrm{opt}}$, we define
$$ g_l(y) = K_1 \left( \Big( \sum_{i=1}^N \frac{y_i}{\theta_i} \Big)^2 + K_2 \sum_{i=1}^N \frac{y_i}{\theta_i} \right), $$
where the coefficient $K_1$ is
$$ K_1 = \frac{\min_j \{\theta_j\}}{2 \Big( 1 - \sum\limits_{i=1}^N \lambda_i / \theta_i \Big)} $$
and $K_2$ is a constant determined by the arrival and service rates. Let $\Delta_l(y, a) = \mathbb{E}[ g_l(Y_{t+1}) \mid Y_t = y, A_t = a ] - g_l(y)$. For all $y \neq 0$, an action $a$ that minimizes $c(y) + \Delta_l(y, a)$ serves a non-empty class. Therefore,
$$ \min_{a \in A} \{ c(y) + \Delta_l(y, a) \} = \sum_{i=1}^N \Big( 1 - \frac{\min_j \{\theta_j\}}{\theta_i} \Big) y_i + \alpha \min_j \{\theta_j\}, $$
where the constant $\alpha$ is given below.
In order to obtain an upper bound on $J(\pi)$, we use the function
$$ g_u(y) = K_3 \left( \Big( \sum_{i=1}^N \frac{y_i}{\theta_i} \Big)^2 + K_2 \sum_{i=1}^N \frac{y_i}{\theta_i} \right), $$
where the constant $K_3$ is
$$ K_3 = \frac{\max_j \{\theta_j\}}{2 \Big( 1 - \sum\limits_{i=1}^N \lambda_i / \theta_i \Big)}. $$
Let $\Delta_u(y) = \mathbb{E}[ g_u(Y_{t+1}) \mid Y_t = y ] - g_u(y)$. Using the queue length dynamics $Y_{t+1}^i = Y_t^i + W_t^i - D_t^i$, we obtain, for any non-idling policy,
$$ c(y) + \Delta_u(y) = \sum_{i=1}^N \Big( 1 - \frac{\max_j \{\theta_j\}}{\theta_i} \Big) y_i + \alpha \max_j \{\theta_j\}, $$
where $\alpha$ is a constant determined by the rates $\lambda_i$ and $\theta_i$, with denominator $2 \big( 1 - \sum_{i=1}^N \lambda_i / \theta_i \big)$.
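For concreteness, we record the elementary drift computation underlying both displays. Under any non-idling action $e_j$ that serves a non-empty class $j$, the weighted length $x(y) = \sum_{i=1}^N y_i / \theta_i$ satisfies
$$ \mathbb{E}\big[ x(Y_{t+1}) - x(Y_t) \,\big|\, Y_t = y,\ A_t = e_j \big] = \sum_{i=1}^N \frac{\mathbb{E}[W_t^i]}{\theta_i} - \frac{\mathbb{E}[D_t^j]}{\theta_j} = \sum_{i=1}^N \frac{\lambda_i}{\theta_i} - 1, $$
which is negative exactly under the stability condition of Theorem 2; this uniform negative drift is what the quadratic functions $g_l$ and $g_u$ exploit.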
To finish the proof, we need to show that for any non-idling policy $\pi \in Q_{NI}$,
$$ \sup_{y \in Y} \big\{ \mathbb{E}[ g_u(Y_{t+1})^2 \mid Y_t = y, A_t = \pi(y) ] - g_u(y)^2 \big\} < \infty $$
and
$$ \sup_{y \in Y} \big\{ \mathbb{E}[ g_l(Y_{t+1})^2 \mid Y_t = y, A_t = \pi(y) ] - g_l(y)^2 \big\} < \infty. $$
Both $g_u$ and $g_l$ have the form
$$ g(x) = K ( x^2 + K_2 x ), $$
where $x = \sum_{i=1}^N y_i / \theta_i$. Squaring $g$, we have
$$ g(x)^2 = K^2 ( x^4 + 2 K_2 x^3 + K_2^2 x^2 ). $$
For any non-idling policy, the expected drift $\mathbb{E}[ g(Y_{t+1})^2 \mid Y_t = y, A_t = \pi(y) ] - g(y)^2$ is a polynomial of degree three in $x$ whose third-order term is
$$ 4 K^2 \Big( \sum_{i=1}^N \frac{\lambda_i}{\theta_i} - 1 \Big) x^3. $$
This is negative for all $y \neq 0$ when the system is stable, which means that the expected drift of $g^2$ is bounded above for all policies in $Q_{NI}$. Hence the bounds $\alpha_u$ and $\alpha_l$ are well founded and satisfy
$$ \frac{J(\pi)}{J_{\mathrm{opt}}} \leq \frac{\alpha_u}{\alpha_l} = \frac{\max_i \{\theta_i\}}{\min_i \{\theta_i\}}. \ \Box $$
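The bound is easy to probe by simulation: estimate the average queue length under two different non-idling policies and check that their ratio never exceeds $\max_i \{\theta_i\} / \min_i \{\theta_i\}$. A sketch (Python), reusing the queue_step function defined above, with hypothetical rates:

```python
def avg_queue_length(choose, lam, theta, horizon=200_000):
    """Estimate the average total queue length under a non-idling policy."""
    y, total = [0] * len(lam), 0.0
    for _ in range(horizon):
        total += sum(y)
        a = [0] * len(lam)
        nonempty = [i for i in range(len(y)) if y[i] > 0]
        if nonempty:                        # non-idling: always serve some class
            a[choose(y, nonempty, theta)] = 1
        y = queue_step(y, a, lam, theta)
    return total / horizon

lam, theta = [0.2, 0.3], [0.9, 0.6]         # hypothetical rates, sum(lam/theta) < 1
fastest = lambda y, ne, th: max(ne, key=lambda i: th[i])  # serve fastest class
longest = lambda y, ne, th: max(ne, key=lambda i: y[i])   # serve longest queue
J_f = avg_queue_length(fastest, lam, theta)
J_l = avg_queue_length(longest, lam, theta)
print(J_f, J_l, max(theta) / min(theta))    # J_l / J_f should stay below 1.5
```

Since $J_{\mathrm{opt}} \leq \min(J_f, J_l)$, Theorem 2 implies $\max(J_f, J_l) / \min(J_f, J_l) \leq \max(J_f, J_l) / J_{\mathrm{opt}} \leq \max_i\{\theta_i\}/\min_i\{\theta_i\}$, which equals 1,5 for these rates.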
The point of this result is that we did not simply use the maximum and minimum service rates of the queues to obtain, respectively, upper and lower bounds on the average queue length of the multi-class queue. Although such approaches, based only on the minimum and maximum service rates, do yield bounds, the gap between those bounds can be large for given service rates. In fact, the bound provided by Theorem 2 is tighter than the bound obtained by considering only the minimum and maximum service rates.
Conclusion
In this paper, we considered a queuing problem formulated as a Markov decision process on a general state space, and we described a method for computing bounds on the costs of such processes under the average cost per period criterion. Our method naturally yields a factor that can be used in the problem of controlling a multi-class queue to bound the performance loss. The bound we found relates the average queue length achieved by an arbitrary non-idling policy to that achieved by an optimal policy, and it differs substantially from the bound obtained by applying the minimum and maximum service rates in queues serving multi-class jobs.
References
1. Atar R., Mandelbaum A., Reiman M.I. Scheduling a Multi-Class Queue with Many Exponential Servers: Asymptotic Optimality in Heavy Traffic. The Annals of Applied Probability, 2004, vol. 14, no. 3, pp. 1084-1134. DOI: 10.1214/105051604000000233
2. Regan K., Boutilier C. Robust Policy Computation in Reward-Uncertain MDPs Using Nondominated Policies. Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, July, 2010, pp. 1127-1133.
3. Kebarighotbi A., Cassandras C.G. Optimal Scheduling of Parallel Queues with Stochastic Flow Models: The cμ-rule Revisited. IFAC Proceedings Volumes, 2011, no. 44, pp. 8223-8228.
4. Shanthikumar J., Yao D. Multiclass Queueing Systems: Polymatroidal Structure and Optimal Scheduling Control. Operations Research, 1992, vol. 40, no. 2, pp. 293-299. DOI: 10.1287/opre.40.3.S293
5. Meyn S., Tweedie R. Markov Chains and Stochastic Stability. London, Springer, 1993. DOI: 10.1007/978-1-4471-3267-7
6. Puterman M.L. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New Jersey, John Wiley and Sons, 2009.
7. Schweitzer P.J., Seidmann A. Generalized Polynomial Approximations in Markovian Decision Processes. Journal of Mathematical Analysis and Applications, 1985, vol. 110, pp. 568-582. DOI: 10.1016/0022-247X(85)90317-8
8. Wang Y., Boyd S. Performance Bounds and Sub-Optimal Policies for Linear Stochastic Control via LMIs. International Journal of Robust and Nonlinear Control, 2011, no. 21, pp. 1710-1728. DOI: 10.1002/rnc.1665
9. Osipova N., Ayesta U., Avrachenkov K. Optimal Policy for Multi-Class Scheduling in a Single Server Queue. 21st International Teletraffic Congress, 2009.
10. Li J., Zhang H.M. Bounding Queuing System Performance with Variational Theory. Transportation Research Procedia, 2015, no. 7, pp. 519-535.
11. Senderovich A., Weidlich M., Gal A., Mandelbaum A. Queue Mining for Delay Prediction in Multi-Class Service Processes. Information Systems, 2015, no. 53, pp. 278-295. DOI: 10.1016/j.is.2015.03.010
12. Huang Q., Chakravarthy S.R. Analytical and Simulation Modeling of a Multi-Server Queue with Markovian Arrivals and Priority Services. Simulation Modelling Practice and Theory, 2012, no. 28, pp. 12-26. DOI: 10.1016/j.simpat.2012.05.010
13. Casale G., Sansottera A., Cremonesi P. Compact Markov-Modulated Models for Multiclass Trace Fitting. European Journal of Operational Research, 2016, vol. 255, no. 3, pp. 822-833. DOI: 10.1016/j.ejor.2016.06.005
14. Lefeber E., Lammer S., Rooda J.E. Optimal Control of a Deterministic Multiclass Queuing System For Which Several Queues Can Be Served Simultaneously. Systems and Control Letters, 2011, vol. 60, no. 7, pp. 524-529. DOI: 10.1016/j.sysconle.2011.04.010
15. Walraevens J., Bruneel H., Fiems D., Wittevrongel S. Delay Analysis of Multiclass Queues with Correlated Train Arrivals and a Hybrid Priority/Fifo Scheduling Discipline. Applied Mathematical Modelling, 2017, no. 45, pp. 823-839. DOI: 10.1016/j.apm.2017.01.044
16. Kleinrock L. Queueing Systems. Volume II: Computer Applications. New Jersey, John Wiley and Sons, 1976.
17. Hsieh C.-T., Lam S.S. Two Classes of Performance Bounds for Closed Queueing Networks. Performance Evaluation, 1987, vol. 7, no. 1, pp. 3-30. DOI: 10.1016/0166-5316(87)90054-X
18. Koukopoulos D., Mavronicolas M., Spirakis P. Performance and Stability Bounds for Dynamic Networks. Journal of Parallel and Distributed Computing, 2007, vol. 67, no. 4, pp. 386-399. DOI: 10.1016/j.jpdc.2006.11.005
Received June 28, 2018
Ali Madankan, Department of Computer Science, University of Zabol (Zabol, Iran), Amadankan@uoz.ac.ir.