On averaged expected cost control as reliability for 1D
ergodic diffusions
S.V. Anulova5 6, H. Mai7 8, A.Yu. Veretennikov9 10 Mon Nov 27 10:24:42 2017
Abstract
For a Markov model described by a one-dimensional diffusion with ergodic control without discount on the infinite horizon an ergodic Bellman equation is proved for the optimal readiness coefficient; convergence of the iteration improvement algorithm is established.
1 Introduction
According to textbooks in reliability (see, e.g., [7], [19]), the coefficient of readiness is one of the main characteristics of the reliability of a system. In this paper the model under consideration is an ergodic Markov process described as a one-dimensional diffusion which is controlled so as to spend more time in a "good domain" on average over the infinite time horizon. The current readiness of the system is measured by a function $f$ taking values in the interval $[0,1]$: one signifies full readiness, while zero means that the model is in the breakdown state. Hence, in particular, we do not just split the real line into two parts, where $f = 1$ or $f = 0$, but allow a soft transition from full readiness ($f = 1$) to a complete failure of the model ($f = 0$). Both coefficients of the diffusion as well as the function $f$ itself may depend on the control. We allow only feedback (Markov) control strategies with values in some compact set. The main result states an ergodic Bellman equation for the optimal readiness characteristic $\rho$ along with some auxiliary function; this $\rho$ may be regarded as the most favourable readiness averaged simultaneously in space and time. We also state a control improvement algorithm which in principle provides a tool to solve the Bellman equation approximately.
Earlier results on ergodic control in continuous time were obtained in [13], [15], [3], et al. The latest works include [1], [2], [18]; see also the references therein. In the very first papers and books, compact cases with some auxiliary boundary conditions (imposed so as to simplify ergodicity) were studied; convergence of the control improvement algorithms was studied only partially. In the later investigations noncompact spaces are allowed; however, apparently, ergodic control in the diffusion coefficient $\sigma$ of the process was not tackled earlier. On controlled diffusion processes on a finite horizon, or on the infinite horizon with discount (also known as killing), the reader may consult [3], [10].
5 Institute for Control Sciences, Moscow, Russia; email: anulovas@ipu.ru
6 For the first author this research has been supported by the Russian Foundation for Basic Research grant no. 16-08-01285
7 CREST and ENSAE ParisTech, France; email: hilmar.mai@gmail.com
8 The second author thanks the Institut Louis Bachelier for financial support.
9 University of Leeds, UK, & National Research University Higher School of Economics, & Institute for Information Transmission Problems, Moscow, Russia; email: a.veretennikov@leeds.ac.uk
10 The third author is grateful for the financial support of the DFG through the CRC 1283 "Taming uncertainty and profiting from randomness and low regularity in analysis, stochastics and their applications" at Bielefeld University during his stay there in August 2017; also, for this author this study has been funded by the Russian Academic Excellence Project '5-100' and by the Russian Foundation for Basic Research grant no. 17-01-00633_a. All the authors gratefully acknowledge the support and hospitality of the Oberwolfach Research Institute for Mathematics (MFO) during the RiP programme in June 2014 where this study was initiated.
Discrete time and space theory was developed simultaneously in the monographs [5], [6, 8], [14], [17] and some others; important journal references can be found therein. Technical difficulties related to control in the diffusion coefficient are not an issue in discrete models. A combination of discrete state spaces and continuous time can be found in [18], et al. Reliability was not an issue in most of the cited works; however, it may be introduced in any Markov model. The paper consists of five sections, not counting two lines of the Conclusions: 1 - Introduction, 2 - Setting, 3 - Assumptions and Auxiliaries, 4 - Main result and 5 - Sketch of the Proof.
2 Setting
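Given a standard probability space $(\Omega, \mathcal F, (\mathcal F_t), \mathsf P)$ and a one-dimensional $(\mathcal F_t)$ Wiener process $W = (W_t)_{t\ge 0}$ on it, we consider a one-dimensional SDE with coefficients $b$, $\sigma$ and a control parameter $\alpha$ described as follows:
$$dX^\alpha_t = b(\alpha(X^\alpha_t), X^\alpha_t)\,dt + \sigma(\alpha(X^\alpha_t), X^\alpha_t)\,dW_t, \quad t \ge 0, \qquad X^\alpha_0 = x \in \mathbb R. \tag{1}$$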
Its (weak) solution exists [11] and under our conditions (dimension one, boundedness of all coefficients and uniform non-degeneracy, or ellipticity, of $\sigma^2$) is weakly unique.
Let a non-empty compact set $U \subset \mathbb R$ be the range of possible control values; being compact, $U$ is in particular bounded. Let $b: U \times \mathbb R \to \mathbb R$, $\sigma: U \times \mathbb R \to \mathbb R$, $\alpha: \mathbb R \to U$ be given Borel functions (some more regularity assumptions will be presented later).
Denote by $L^\alpha$ the (extended) generator which corresponds to the equation (1) with a fixed function $\alpha(\cdot)$:
$$L^\alpha(x) = b(\alpha(x),x)\,\frac{d}{dx} + \frac12\,\sigma^2(\alpha(x),x)\,\frac{d^2}{dx^2}, \quad x \in \mathbb R.$$
Given a running cost function $f: U \times \mathbb R \to \mathbb R$ from a suitable function class, we aim to choose an optimal (in some relaxed setting, at least "nearly optimal") control strategy $\alpha: \mathbb R \to U$ (Markov homogeneous or, in another language, a Markov feedback strategy) such that the corresponding solution $X^\alpha$ maximizes the averaged cost function
$$\rho^\alpha(x) := \liminf_{T\to\infty} \frac1T \int_0^T \mathsf E_x f(\alpha(X^\alpha_t), X^\alpha_t)\,dt. \tag{2}$$
Recall that the function $f$ takes values
$$0 \le f \le 1; \tag{3}$$
hence this running cost may be regarded as a measure of the current readiness of the underlying device. Namely, any value between zero and one can be treated as a measure of availability, while the limit $\rho^\alpha$, if it exists, can be understood as an availability (= readiness) of the system averaged with respect to time and "ensemble". This is especially natural for the set of possible values $\{0; 1\}$ for such a function; however, the whole interval of values $[0,1]$ also makes evident sense in the context of reliability theory. In the sequel we assume that the assumption (3) is satisfied.
By $K$ we denote the class of strategies $\alpha: \mathbb R \to U$ which are Borel measurable. For convenience, for every $\alpha \in K$ we define the function $f^\alpha: \mathbb R \to \mathbb R$, $f^\alpha(x) = f(\alpha(x),x)$, $x \in \mathbb R$. Now, instead of (2) we can use the equivalent form
$$\rho^\alpha(x) = \liminf_{T\to\infty} \frac1T \int_0^T \mathsf E_x f^\alpha(X^\alpha_t)\,dt.$$
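As an illustration of the averaged cost (2), the finite-horizon quantity $\frac1T\int_0^T \mathsf E_x f(\alpha(X_t),X_t)\,dt$ can be approximated by an Euler-Maruyama discretization of (1) combined with a sample mean over independent paths. The following is only a minimal sketch: the coefficients `b`, `sigma`, `f`, the constant strategies and all numerical parameters are hypothetical choices made for this example, not taken from the paper.

```python
import numpy as np

def averaged_cost(b, sigma, f, alpha, x0, T=50.0, dt=0.01, n_paths=200, seed=0):
    """Estimate (1/T) * int_0^T E_x f(alpha(X_t), X_t) dt by Euler-Maruyama."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))       # all paths start at X_0 = x0
    n_steps = int(round(T / dt))
    total = 0.0
    for _ in range(n_steps):
        u = alpha(x)                       # feedback control u = alpha(X_t)
        total += np.mean(f(u, x)) * dt     # left-endpoint rule in time, sample mean for E_x
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        x = x + b(u, x) * dt + sigma(u, x) * dw
    return total / T

# Hypothetical model (an assumption for illustration only):
# bounded drift pulling towards 0, unit diffusion, readiness f in (0, 1].
b = lambda u, x: -u * x / np.sqrt(1.0 + x**2)
sigma = lambda u, x: np.ones_like(x)
f = lambda u, x: np.exp(-x**2)

rho_strong = averaged_cost(b, sigma, f, lambda x: np.full_like(x, 2.0), x0=0.0)
rho_weak = averaged_cost(b, sigma, f, lambda x: np.full_like(x, 0.5), x0=0.0)
print(rho_strong, rho_weak)
```

In this toy model a stronger pull towards the "good domain" near zero yields a larger estimated readiness; the estimate carries both a time-discretization bias and Monte Carlo noise.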
The "maximin" cost function (or, in other terms, the ergodic availability or readiness coefficient of the system) is defined by the expression
$$\rho(x) := \sup_{\alpha\in K}\, \liminf_{T\to\infty} \frac1T \int_0^T \mathsf E_x f^\alpha(X^\alpha_t)\,dt. \tag{4}$$
Suppose that for every $\alpha \in K$ the solution $X^\alpha$ of the equation (1) is an ergodic process, that is,
Anulova, S.V., Mai1, H., Veretennikov, A.Yu. RT&A, No 4 (47) ON AVERAGED EXPECTED COST CONTROL_Volume 12, December 2017
there exists a unique limiting distribution $\mu^\alpha$ of $X^\alpha_t$ as $t \to \infty$, the same for all initial conditions $X_0 = x \in \mathbb R$. Then it is true that for every $x \in \mathbb R$,
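$$\rho^\alpha(x) = \rho^\alpha := \int f^\alpha(x')\,\mu^\alpha(dx') =: \langle f^\alpha, \mu^\alpha\rangle, \tag{5}$$
and
$$\rho(x) = \rho := \sup_{\alpha\in K} \int f^\alpha(x')\,\mu^\alpha(dx') = \sup_{\alpha\in K} \langle f^\alpha, \mu^\alpha\rangle. \tag{6}$$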
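In dimension one the stationary average in (5) can also be evaluated by quadrature, since the invariant density of a diffusion with generator $b\,\frac{d}{dx} + \frac12\sigma^2\frac{d^2}{dx^2}$ is proportional to $\sigma^{-2}(x)\exp\big(\int^x 2b/\sigma^2\,dy\big)$ (the stationary Fokker-Planck equation). The sketch below uses this formula on a truncated interval; the truncation `half_width`, the grid size and the example coefficients are assumptions made here for illustration.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoid rule for samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def stationary_cost(b, sigma, f, alpha, half_width=10.0, n=4001):
    """Approximate <f^alpha, mu^alpha> using the explicit 1D invariant density."""
    x = np.linspace(-half_width, half_width, n)
    u = alpha(x)                           # feedback control on the grid
    s2 = sigma(u, x) ** 2
    g = 2.0 * b(u, x) / s2
    # Cumulative trapezoid integral of 2b/sigma^2 from the left end of the grid.
    G = np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(x))))
    dens = np.exp(G - G.max()) / s2        # unnormalized density, shifted to avoid overflow
    return trapezoid(f(u, x) * dens, x) / trapezoid(dens, x)

# Hypothetical model (illustration only).
b = lambda u, x: -u * x / np.sqrt(1.0 + x**2)
sigma = lambda u, x: np.ones_like(x)
f = lambda u, x: np.exp(-x**2)

rho_strong = stationary_cost(b, sigma, f, lambda x: np.full_like(x, 2.0))
rho_weak = stationary_cost(b, sigma, f, lambda x: np.full_like(x, 0.5))
print(rho_strong, rho_weak)
```

The additive constant in the exponent cancels after normalization, so the starting point of the inner integral is immaterial.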
Note that under our assumptions $\rho$ does not depend on $x$. Ergodicity requires special conditions on the characteristics $b$, $\sigma$, $\alpha$; they will be specified in the next section. We also define an auxiliary function which depends on $x$ and which looks like a cost function but is not one:
$$v^\alpha(x) := \int_0^\infty \mathsf E_x\big(f^\alpha(X^\alpha_t) - \rho^\alpha\big)\,dt, \quad \alpha \in K.$$
This integral will converge under the recurrence assumptions below.
Solutions of the equation (1) will be understood as weak ones. Correspondingly, the ergodic Bellman equation (7) below will be established for weak solutions.
The first goal of the paper is to prove that the cost $\rho$, which is a constant in the ergodic setting, is a component of the pair $(V,\rho)$ which is a unique solution of the ergodic HJB, or Bellman's, equation
$$\sup_{u\in U}\big[L^u V(x) + f^u(x) - \rho\big] = 0, \quad x \in \mathbb R, \tag{7}$$
where $L^u$ and $f^u(x) = f(u,x)$ correspond to the constant control value $u$, $V$ will be unique up to an additive constant, while $\rho$ will be unique in the standard sense. The meaning of the function $V$ is that it coincides with $v^\alpha$ for the optimal strategy $\alpha$ if the latter exists, and this function is the main tool for finding an optimal strategy. Note that due to the one-dimensional setting and the non-degeneracy of $\sigma^2$ which will be assumed, the equation (7) is equivalent to the following:
$$\sup_{u\in U} \frac{\sigma^2(u,x)}{2}\left[V''(x) + \frac{2b(u,x)}{\sigma^2(u,x)}\,V'(x) + \frac{2f^u(x)}{\sigma^2(u,x)} - \frac{2\rho}{\sigma^2(u,x)}\right] = 0, \quad x\in\mathbb R. \tag{8}$$
Further, due to the non-degeneracy of $\sigma^2$ and in particular because the right hand sides in (7) and (8) are equal to zero, we conclude that they are both equivalent to
$$\sup_{u\in U}\left[V''(x) + \frac{2b(u,x)}{\sigma^2(u,x)}\,V'(x) + \frac{2f^u(x)}{\sigma^2(u,x)} - \frac{2\rho}{\sigma^2(u,x)}\right] = 0, \quad x\in\mathbb R. \tag{9}$$
The second goal is to show that the "RIA" algorithm ("reward improvement algorithm" or, in some papers, "PIA" for "policy improvement algorithm") provides a sequence of convergent approximate costs, $\rho_n \to \rho$, $n \to \infty$. Let us also emphasize that, unlike in the finite horizon case, here in the average ergodic control setting the solution of the HJB equation is a couple $(V,\rho)$, where $\rho$ is the desired cost while $V$ is some auxiliary function, which also admits a certain interpretation in terms of control theory.
Note that solutions of the equations (7), (8) and (9) will be studied in Sobolev classes; hence, (second) derivatives will be defined up to almost everywhere with respect to Lebesgue's measure. To keep all strategies Borel, all expressions involving Sobolev derivatives will be understood as Borel measurable expressions, since for any Lebesgue measurable function there is a Borel function which coincides with the former almost everywhere. Respectively, all HJB or Poisson equations will be understood in the Sobolev sense with Borel versions of any second order Sobolev derivative. First order derivatives are all continuous due to Sobolev imbedding theorems.
3 Assumptions and auxiliaries
To ensure ergodicity of $X^\alpha$ under any feedback control strategy $\alpha \in K$, we make the following assumptions on the drift and diffusion coefficients.
(A1) The function $b$ is bounded, $C^1$ in $x$, and
$$\lim_{|x|\to\infty}\, \sup_{u\in U}\, x\,b(u,x) = -\infty. \tag{10}$$
(A2) The function $\sigma$ is bounded, uniformly non-degenerate and $C^1$ in $x$.
(A3) The function $f$ takes values in the interval $[0,1]$.
(A4) The functions $\sigma(u,x)$, $b(u,x)$, $f(u,x)$ are continuous in $(u,x)$.
(A5) The set $U \subset \mathbb R$ is compact.
Lemma 1 Let the assumptions (A1) - (A4) be satisfied. Then the function $v^\alpha$ has the following properties:
1. For any strategy $\alpha$ the function $v^\alpha$ is continuous as well as $(v^\alpha)'$, and there exist $C, m > 0$ such that $\sup_\alpha \big(|v^\alpha(x)| + |(v^\alpha)'(x)|\big) \le C(1 + |x|^m)$.
2. $v^\alpha \in W^2_{p,loc}$ for any $p \ge 1$.
3. $v^\alpha \in C^{1,\mathrm{Lip}}$ (i.e., $(v^\alpha)'$ is locally Lipschitz).
4. $v^\alpha$ satisfies a Poisson equation in the whole space,
$$L^\alpha v^\alpha(x) + f^\alpha(x) - \langle f^\alpha, \mu^\alpha\rangle = 0, \tag{11}$$
in the Sobolev sense.
5. The solution of this equation is unique up to an additive constant in the class of Sobolev solutions $W^2_{p,loc}$ with no more than some (any) polynomial growth.
6. $\langle v^\alpha, \mu^\alpha\rangle = 0$.
Proof follows from [21] and [16]; see also [9, Lemma 4.13 and Remark 4.3].
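Lemma 2 Let the assumptions (A1) - (A3) hold true. Then,
• For any $C_1, m_1 > 0$ there exist $C, m > 0$ such that for any strategy $\alpha \in K$ and for any function $g$ growing no faster than $C_1(1+|x|^{m_1})$,
$$\sup_{t\ge 0} |\mathsf E_x g(X^\alpha_t)| \le C(1+|x|^m). \tag{12}$$
• For any strategy $\alpha \in K$ the function $\rho^\alpha$ is a constant, and there exists $C < \infty$ such that
$$\sup_{\alpha\in K} |\rho^\alpha| \le C < \infty. \tag{13}$$
• For any $\alpha \in K$, the invariant measure $\mu^\alpha$ integrates any polynomial:
$$\int |x|^m\,\mu^\alpha(dx) < \infty \quad \text{for any } m > 0.$$
Proof follows from [21] and [16].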
4 Main result
Recall that the state space dimension is D = 1 and that all SDE solutions with any Markov strategy are weak, unique in distribution, strong Markov and ergodic. All of these follow from [11] and from the assumptions (A1) and (A2) (see [21] about ergodicity).
The "exact RIA" reads as follows. Let us start with some homogeneous Markov strategy $\alpha_0$, which uniquely determines $\rho_0 = \rho^{\alpha_0} = \langle f^{\alpha_0}, \mu^{\alpha_0}\rangle$ and $v_0 = v^{\alpha_0}$. Next, for any couple $(v,\rho)$ such that $v \in C^2$, or $v \in W^2_{p,loc}$ with any $p > 0$, and $\rho \in \mathbb R$, define
$$F[v,\rho](x) := \sup_{u\in U}\big[L^u v(x) + f^u(x) - \rho\big] = \max_{u\in U}\big[L^u v(x) + f^u(x) - \rho\big].$$
Recall that unless $v \in C^2$, we consider a Borel version of the expression in the right hand side. Now, by induction, given $\alpha_n$, $\rho_n$ and $v_n$, the next "improved" strategy $\alpha_{n+1}$ is defined as follows: for any $x$,
$$L^{\alpha_{n+1}} v_n(x) + f^{\alpha_{n+1}}(x) - \rho_n = F[v_n,\rho_n](x), \tag{14}$$
which is equivalent to
$$L^{\alpha_{n+1}} v_n(x) + f^{\alpha_{n+1}}(x) = \max_{u\in U}\big[L^u v_n(x) + f^u(x)\big] =: G[v_n](x).$$
In the sequel we assume that a Borel measurable version of such a strategy can be chosen. In our case existence of such a Borel strategy follows from Stschegolkow's (Shchegolkov's) theorem, see [20], [12, Satz 39], [4, Theorem 1] (the first two references are in German, the last one cites the same result in English), which states that if any section of a (nonempty) Borel set $E$ in the direct product of two complete separable metric spaces is sigma-compact (i.e., equals a countable union of compact sets) then a Borel selection belonging to this set $E$ exists. Now, the value $\rho_{n+1}$ is defined as
$$\rho_{n+1} := \langle f^{\alpha_{n+1}}, \mu^{\alpha_{n+1}}\rangle,$$
where, in turn, $\mu^{\alpha_{n+1}}$ is the (unique) invariant measure which corresponds to the strategy $\alpha_{n+1}$. Recall that
$$v_n(x) = \int_0^\infty \mathsf E_x\big(f^{\alpha_n}(X^{\alpha_n}_t) - \rho_n\big)\,dt.$$
Theorem 1 Let the assumptions (A1) - (A5) be satisfied. Then the Bellman equation (7) holds true for $\rho$ and some auxiliary function $V \in C^2$; the component $\rho$ of the solution of this equation is unique; for any $n$, $\rho_{n+1} \ge \rho_n$; the sequence $(\rho_n)$ is bounded; and $\rho_n \uparrow \rho$ as $n \to \infty$.
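To make the improvement loop concrete, here is a minimal numerical sketch of policy iteration for a discretized stand-in of the problem. It is not the construction used in the paper: the diffusion is replaced by a birth-death chain on a grid (upwind finite differences, reflection at the ends), the control set is a finite list, and the normalization $v(x_{i_0}) = 0$ is used instead of $\langle v,\mu\rangle = 0$ (admissible, since $V$ is defined up to an additive constant). The model functions and all names below are hypothetical.

```python
import numpy as np

def rates(b, sigma, u, x, h):
    """Upwind up/down jump rates of the approximating chain for controls u on grid x."""
    diffusion = sigma(u, x) ** 2 / (2.0 * h * h)
    drift = b(u, x)
    up = diffusion + np.maximum(drift, 0.0) / h
    down = diffusion + np.maximum(-drift, 0.0) / h
    up[-1] = 0.0      # reflect at the right end of the grid
    down[0] = 0.0     # reflect at the left end of the grid
    return up, down

def apply_generator(up, down, v):
    """(Q v)_i = up_i (v_{i+1} - v_i) + down_i (v_{i-1} - v_i)."""
    dv_up = np.append(v[1:] - v[:-1], 0.0)
    dv_down = np.insert(v[:-1] - v[1:], 0, 0.0)
    return up * dv_up + down * dv_down

def evaluate(up, down, cost, i0):
    """Solve the discrete Poisson equation Q v + cost - rho = 0 with v[i0] = 0."""
    n = len(cost)
    idx = np.arange(n)
    A = np.zeros((n + 1, n + 1))
    A[idx, idx] = -(up + down)
    A[idx[:-1], idx[:-1] + 1] = up[:-1]
    A[idx[1:], idx[1:] - 1] = down[1:]
    A[:n, n] = -1.0               # coefficient of the unknown constant rho
    A[n, i0] = 1.0                # normalization v[i0] = 0
    sol = np.linalg.solve(A, np.append(-cost, 0.0))
    return sol[:n], sol[n]

def policy_iteration(b, sigma, f, controls, half_width=5.0, n=201, max_iter=50, tol=1e-10):
    x = np.linspace(-half_width, half_width, n)
    h = x[1] - x[0]
    cols = np.arange(n)
    ups, downs, costs = [], [], []
    for u in controls:            # tabulate rates and costs for every control value
        uu = np.full(n, float(u))
        up, down = rates(b, sigma, uu, x, h)
        ups.append(up); downs.append(down); costs.append(f(uu, x))
    ups, downs, costs = np.array(ups), np.array(downs), np.array(costs)
    policy = np.zeros(n, dtype=int)           # alpha_0: first control everywhere
    history = []
    for _ in range(max_iter):
        v, rho = evaluate(ups[policy, cols], downs[policy, cols], costs[policy, cols], n // 2)
        history.append(rho)
        # Discrete analogue of G[v_n]: maximize Q^u v + f^u over u at each grid point.
        gains = np.array([apply_generator(ups[k], downs[k], v) + costs[k]
                          for k in range(len(controls))])
        best = gains.argmax(axis=0)
        improve = gains[best, cols] > gains[policy, cols] + tol
        if not improve.any():
            break
        policy = np.where(improve, best, policy)   # change control only on strict improvement
    return x, np.asarray(controls)[policy], v, history

# Hypothetical model: the control strengthens the drift, increases the noise
# and lowers the instantaneous readiness.
b = lambda u, x: -u * x / np.sqrt(1.0 + x**2)
sigma = lambda u, x: 0.5 + 0.5 * u
f = lambda u, x: np.exp(-x**2) * (1.0 - 0.1 * u)

x, alpha_star, v, history = policy_iteration(b, sigma, f, [0.5, 1.0, 1.5, 2.0])
print(history)
```

For this finite chain the recorded values in `history` play the role of $\rho_n$ and are nondecreasing up to rounding, mirroring the inequality $\rho_{n+1} \ge \rho_n$ of Theorem 1; how well the chain approximates the diffusion is a separate question not addressed by the sketch.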
5 Sketch of the Proof
Let us show the sketch of the main steps of the proof.
1. From (14) and (11) it may be derived that
$$\big(L^{\alpha_{n+1}} v_n - L^{\alpha_{n+1}} v_{n+1}\big)(x) \ge \rho_n - \rho_{n+1} \quad \text{a.e.}$$
Further, from Dynkin's formula applied to $(v_n - v_{n+1})(X^{\alpha_{n+1}}_t)$ we obtain
$$\mathsf E_x v_n(X^{\alpha_{n+1}}_t) - \mathsf E_x v_{n+1}(X^{\alpha_{n+1}}_t) - v_n(x) + v_{n+1}(x) \ge (\rho_n - \rho_{n+1})\,t.$$
Since the left hand side here is bounded for a fixed $x$, after division of all terms by $t$ and letting $t \to \infty$ we obtain
$$0 \ge \rho_n - \rho_{n+1},$$
as required. Therefore, $\rho_n \le \rho_{n+1}$, so that $\rho_n \uparrow \tilde\rho$ with some $\tilde\rho$. Thus, the RIA does converge, although so far we do not know whether $\tilde\rho = \rho$. Clearly, $\tilde\rho \le \rho$, since $\rho$ is the sup over all Markov strategies, while $\tilde\rho$ is the sup over some countable subset of them.
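The first inequality of step 1 can be unpacked as follows. This is only a sketch of the omitted computation; it uses the definition (14) of the improved strategy and the Poisson equation (11) written for $\alpha_n$ and for $\alpha_{n+1}$, with $\rho_n$ denoting the averaged cost $\langle f^{\alpha_n},\mu^{\alpha_n}\rangle$ of the strategy $\alpha_n$.

```latex
\begin{align*}
L^{\alpha_{n+1}} v_n + f^{\alpha_{n+1}} - \rho_n
  &= \max_{u\in U}\bigl[L^{u} v_n + f^{u} - \rho_n\bigr]
   \;\ge\; L^{\alpha_n} v_n + f^{\alpha_n} - \rho_n = 0
   && \text{by (14) and (11) for } \alpha_n,\\
L^{\alpha_{n+1}} v_{n+1} + f^{\alpha_{n+1}} - \rho_{n+1} &= 0
   && \text{by (11) for } \alpha_{n+1},\\
L^{\alpha_{n+1}}\,(v_n - v_{n+1}) &\ge \rho_n - \rho_{n+1}
   && \text{(subtracting the second line from the first).}
\end{align*}
```

All relations hold almost everywhere, since the second derivatives are understood in the Sobolev sense.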
Recall that now we want to show that $v_n \to v$ such that the couple $(v,\tilde\rho)$ satisfies the HJB equation (7), and that $\tilde\rho$, as well as $v$ in some sense, is unique here.
2. What we want to do is to pass to the limit in the equation
$$L^{\alpha_{n+1}} v_{n+1}(x) + f^{\alpha_{n+1}}(x) - \rho_{n+1} = 0$$
as $n \to \infty$, after having shown compactness of the set $(v_n)$ at least in $C^1$ (and later on, in $C^{1,\beta}$ for any $0 < \beta < 1$). Since
$$\rho_n = L^{\alpha_n} v_n(x) + f^{\alpha_n}(x),$$
we obtain after division by $\sigma^2/2$,
$$v_n''(x) = \frac{2\rho_n}{\sigma^2(\alpha_n(x),x)} - \frac{2 f^{\alpha_n}(x)}{\sigma^2(\alpha_n(x),x)} - \frac{2 b(\alpha_n(x),x)}{\sigma^2(\alpha_n(x),x)}\,v_n'(x). \tag{15}$$
Due to the local boundedness and absolute continuity of $v_n'$ (see Lemma 1) we conclude that the sequence $(v_n)$ is locally (i.e. on any bounded interval) tight in $C^1$. Hence, there is a subsequence $n' \to \infty$ such that $v_{n'}$ converges in $C^1$ on any bounded interval to some function $v \in C^1$ (in fact, even $v \in C^{1,\mathrm{Lip}} = \{g \in C^1(\mathbb R): g' \in \mathrm{Lip}\}$, as will be clear in a few lines and follows, e.g., from (15)). Denoting
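$$F_1[x,v',\tilde\rho] := \max_u\big[\hat b^u v' + \hat f^u - \hat\rho^u\big](x) = \max_u\Big[\hat b^u v' + \hat f^u - \frac{\tilde\rho}{\hat a^u}\Big](x),$$
where
$$\hat a^u(x) = \tfrac12\,(\sigma^u(x))^2, \quad \hat b^u(x) = b^u(x)/\hat a^u(x), \quad \hat f^u(x) = f^u(x)/\hat a^u(x), \quad \hat\rho^u(x) = \tilde\rho/\hat a^u(x),$$
with $\sigma^u(x) = \sigma(u,x)$ and $b^u(x) = b(u,x)$, and by using the bounds as in [10, Chapter 1], one can establish the limiting equation as $n' \to \infty$,
$$v'(x) - v'(r) + \int_r^x F_1[s,v'(s),\tilde\rho]\,ds = 0, \tag{16}$$
which implies by differentiation that
$$v''(x) + F_1[x,v',\tilde\rho](x) = 0. \tag{17}$$
This equation is equivalent to (9) and, hence, to (7), as required. In other words, the limiting pair $(v,\tilde\rho)$ satisfies the HJB equation (7).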
3. Uniqueness for $\rho$. Suppose there are two solutions of the HJB equation, $(v^1,\rho^1)$ and $(v^2,\rho^2)$, with a polynomial growth for $v^i$, $i = 1,2$. Denote $v(x) := v^1(x) - v^2(x)$ and consider two Borel strategies $\alpha_1(x) \in \arg\min_u (L^u v(x))$ and $\alpha_2(x) \in \arg\max_u (L^u v(x))$, and denote by $X^i$ a (weak) solution of the SDE corresponding to each strategy $\alpha_i$. (It exists and is weakly unique.) Note that
$$h_2(x) := \max_u\big(L^u v(x) - \rho^1 + \rho^2\big) = \max_u\big(L^u v^1(x) + f^u(x) - \rho^1 - L^u v^2(x) - f^u(x) + \rho^2\big)$$
$$\ge \max_u\big(L^u v^1(x) + f^u(x) - \rho^1\big) - \max_u\big(L^u v^2(x) + f^u(x) - \rho^2\big) = 0 \quad \text{a.e.},$$
and similarly,
$$h_1(x) := \min_u\big(L^u v(x) - \rho^1 + \rho^2\big) = -\max_u\big(L^u(-v)(x) - \rho^2 + \rho^1\big)$$
$$\le -\Big[\max_u\big(L^u v^2(x) + f^u(x) - \rho^2\big) - \max_u\big(L^u v^1(x) + f^u(x) - \rho^1\big)\Big] = 0 \quad \text{a.e.}$$
We have $L^{\alpha_2} v(x) = h_2(x) - \rho^2 + \rho^1$ and $L^{\alpha_1} v(x) = h_1(x) - \rho^2 + \rho^1$. Further, Dynkin's formula is applicable. So,
$$\mathsf E_x v(X^1_t) - v(x) = \mathsf E_x \int_0^t L^{\alpha_1} v(X^1_s)\,ds = \mathsf E_x \int_0^t h_1(X^1_s)\,ds + (\rho^1 - \rho^2)\,t \le (\rho^1 - \rho^2)\,t.$$
The last inequality here is due to $h_1 \le 0$ a.e. along with Krylov's bounds [10]. Here the left hand side is bounded (for fixed $x$) due to Lemma 2; so, dividing by $t$ and letting $t \to \infty$, we obtain
$$\rho^1 - \rho^2 \ge 0.$$
In exactly the same way, using $X^2$ and $h_2 \ge 0$, we show that also
$$\rho^1 - \rho^2 \le 0.$$
Thus, eventually,
$$\rho^1 = \rho^2.$$
4. Proof of the equality $\tilde\rho = \rho$. We have seen that for any initial $(\alpha_0,\rho_0)$, the sequence $\rho_n$ converges monotonically to $\tilde\rho$, which is a component of a solution of the Bellman equation (7), as shown in step 2, and this component is unique, as was just shown in step 3. Hence, given some (any) $\varepsilon > 0$, take any initial strategy $\alpha_0$ such that
$$\rho_0 = \rho^{\alpha_0} > \rho - \varepsilon.$$
Then, clearly, the corresponding limit $\tilde\rho$ will satisfy the same inequality,
$$\tilde\rho = \lim_{n\to\infty} \rho_n > \rho - \varepsilon.$$
Due to uniqueness of $\tilde\rho$ as a component of a solution of the equation (7), and since $\varepsilon > 0$ is arbitrary, and because it is already established that $\tilde\rho \le \rho$, we now conclude that
$$\tilde\rho = \rho.$$
The sketch of the proof of the Theorem 1 is thus completed.
6 Discussion
Thus, we have an approach which in principle allows one to evaluate the ergodic readiness coefficient in certain diffusion Markov models.
7 Addendum: Borel measurability
In the presentation of the RIA we have assumed the existence of a Borel measurable version of a strategy which maximizes some function for a fixed $x$. In our case existence of such a Borel strategy follows from Stschegolkow's (Shchegolkov's) theorem, see [20], [12, Satz 39], [4, Theorem 1] (the first two references are in German, the last one cites the same result in English), which states that if any section of a (nonempty) Borel set $E$ in the direct product of two complete separable metric spaces is sigma-compact (i.e., equals a countable union of compact sets) then a Borel selection belonging to this set $E$ exists. In our case $E = \{(u,x): F[u,x] = \varphi(x) := \max_{v\in U} F[v,x],\ x \in \mathbb R\}$. This set is nonempty and closed and, hence, Borel. Indeed, if $E \ni (u_n,x_n) \to (u,x)$, $n \to \infty$, then $F[u_n,x_n] \to F[u,x]$ due to continuity of $F$. Also, due to continuity of $F$, $\varphi(x_n) \to \varphi(x)$. Since each $u_n$ is a point of $\arg\max F[\cdot,x_n]$, where $F[u_n,x_n] = \varphi(x_n)$, we have $F[u,x] = \lim_{n\to\infty} F[u_n,x_n] = \lim_{n\to\infty} \varphi(x_n) = \varphi(x)$; hence $(u,x) \in E$, i.e., $E$ is closed. Further, any section $E_x$ of $E$ is also closed, again due to continuity of $F$: if $(u_n,x) \in E$ and $u_n \to u$, then $F[u_n,x] \to F[u,x]$, i.e., actually, $F[u_n,x] = F[u,x]$. Being a closed subset of the compact set $U$, each section is compact. Thus, Stschegolkow's theorem is applicable.
References
[1] A. Arapostathis. On the policy iteration algorithm for nondegenerate controlled diffusions under the ergodic criterion. In: Optimization, control, and applications of stochastic systems, Systems Control Found. Appl., 1-12. Birkhäuser/Springer, New York, 2012.
[2] A. Arapostathis, V. S. Borkar, M. K. Ghosh. Ergodic control of diffusion processes. Encyclopedia of Mathematics and its Applications 143. Cambridge: Cambridge University Press, 2012.
[3] V. S. Borkar. Optimal control of diffusion processes. Harlow: Longman Scientific & Technical; New York: John Wiley & Sons, 1989.
[4] L.D. Brown, R. Purves. Measurable selections of extrema. Ann. Stat., 1: 902-912, 1973.
[5] E.B. Dynkin and A.A. Yushkevich. Upravlyaemye markovskie protsessy i ikh prilozheniya, Moskva: "Nauka", 1975 (in Russian).
[6] R. A. Howard. Dynamic programming and Markov processes. New York-London: John Wiley & Sons, Inc. and the Technology Press of the Massachusetts Institute of Technology, 1960.
[7] B. V. Gnedenko, Yu. K. Belyaev, A. D. Solovyev. Mathematical Methods in Reliability Theory. Academic Press, New York, 1969.
[8] R. A. Howard. Dynamic probabilistic systems. Vol. II: Semi-Markov and decision processes. Reprint of the 1971 original ed. Mineola, NY: Dover Publications, 577-1108, 2007.
[9] R. Khasminskii. Stochastic stability of differential equations. With contributions by G.N. Milstein and M.B. Nevelson. 2nd completely revised and enlarged ed. Berlin: Springer, 2012.
[10] N.V. Krylov. Controlled diffusion processes, 2nd ed. Berlin, et al., Springer, 2009.
[11] N. V. Krylov, On the selection of a Markov process from a system of processes and the construction of quasi-diffusion processes. Math. USSR Izv. 7: 691-709, 1973.
[12] A.A. Ljapunow, E.A. Stschegolkow, and W.J. Arsenin. Arbeiten zur deskriptiven Mengenlehre. Mathematische Forschungsberichte. 1. Berlin: VEB Deutscher Verlag der Wissenschaften, 1955.
[13] P. Mandl. Analytical treatment of one-dimensional Markov processes. Berlin-Heidelberg-New York: Springer, 1968.
[14] H. Mine and S. Osaki. Markovian decision processes. New York: American Elsevier Publishing Company, Inc., 1970.
[15] R. Morton. On the optimal control of stationary diffusion processes with inaccessible boundaries and no discounting. J. Appl. Probab., 8: 551-560, 1971.
[16] E. Pardoux and A.Yu. Veretennikov. On Poisson equation and diffusion approximation. II. Ann. Probab., 31(3): 1166-1192, 2003.
[17] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., Hoboken, 2005.
[18] V. V. Rykov, Controllable Queueing Systems: From the Very Beginning up to Nowadays, RT&A, 2(45), vol. 12, 39-61, 2017.
[19] A. D. Solovyev. Basics of mathematical reliability theory, vol. 1. Moscow: Znanie, 1975 (in Russian).
[20] E.A. Shchegol'kov. Über die Uniformisierung gewisser B-Mengen. Dokl. Akad. Nauk SSSR, n. Ser., 59: 1065-1068, 1948.
[21] A.Yu. Veretennikov. On polynomial mixing for SDEs with a gradient-type drift. Theory Probab. Appl., 45(1): 160-164, 2000.