DOI: 10.17516/1997-1397-2022-15-3-292-307 УДК 519.2
On Approximation of Empirical Kac Processes under General Random Censorship Model
Abdurahim A. Abdushukurov*
Moscow State University named after M. V. Lomonosov, Tashkent Branch
Tashkent, Uzbekistan
Gulnoz S. Saifulloeva†
Navoi State Pedagogical Institute, Navoi, Uzbekistan
Received 07.11.2021, received in revised form 10.12.2021, accepted 20.02.2022

Abstract. A general random censorship model is considered in the paper, and approximation results are proved for empirical Kac processes. The model includes such important special cases as random censorship on the right and the competing risks model. The results rely on strong approximation theory, and optimal approximation rates are obtained. Cumulative hazard processes are investigated in a similar manner in the general setting. The results are also applied to the estimation of the characteristic function in the random censorship model on the right.
Keywords: censored data, competing risks, empirical estimates, Kac estimate, strong approximation, Gaussian process, characteristic function.
Citation: A.A. Abdushukurov, G.S. Saifulloeva, On Approximation of Empirical Kac Processes under General Random Censorship Model, J. Sib. Fed. Univ. Math. Phys., 2022, 15(3), 292-307. DOI: 10.17516/1997-1397-2022-15-3-292-307.
1. Introduction and preliminaries
Following [3-5], we define a general random censorship model in the following way. Let Z be a real random variable (r.v.) with distribution function (d.f.) H(x) = P(Z < x), x ∈ R. Assume that A^(1), ..., A^(k) are pairwise disjoint random events for a fixed integer k ≥ 1. Define the subdistribution functions H(x; i) = P(Z < x, A^(i)), i ∈ ℑ = {1, ..., k}. Suppose that when observing Z we are interested in the joint behaviour of the pairs (Z, A^(i)), i ∈ ℑ. Let {(Z_j, A_j^(1), ..., A_j^(k)), j ≥ 1} be a sequence of independent replicas of (Z, A^(1), ..., A^(k)) defined on some probability space {Ω, A, P}. We assume throughout that the functions H(x), H(x; 1), ..., H(x; k) are continuous. Denote by H_n(x) the ordinary empirical d.f. of Z_1, ..., Z_n and introduce the empirical sub-d.f.-s H_n(x; i), i ∈ ℑ:

    H_n(x; i) = (1/n) Σ_{j=1}^{n} δ_j^(i) I(Z_j ≤ x),   (x; i) ∈ R̄ × ℑ,

where R̄ = [−∞, ∞] and δ_j^(i) = I(A_j^(i)) is the indicator of the event A_j^(i), and
* [email protected] [email protected] © Siberian Federal University. All rights reserved
    H_n(x; 1) + ··· + H_n(x; k) = (1/n) Σ_{j=1}^{n} I(Z_j ≤ x) = H_n(x),   x ∈ R̄,

is the ordinary empirical d.f. Properties of many biometric estimates depend on the limit behaviour of the proposed empirical statistics. The following results are straightforward consequences of the Dvoretzky–Kiefer–Wolfowitz exponential inequality with constant D = 2 [8, 12]: for all n = 1, 2, ... and ε > 0,

    P( sup_{|x|≤∞} |H_n(x) − H(x)| ≥ ( (1+ε)/2 · log n/n )^{1/2} ) ≤ 2 n^{−(1+ε)},   (1)

and

    P( sup_{|x|≤∞} |H_n(x; i) − H(x; i)| ≥ ( (1+ε)/2 · log n/n )^{1/2} ) ≤ 2 n^{−(1+ε)},   i ∈ ℑ.   (2)
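As a quick sanity check of these definitions, the sketch below simulates the model, builds H_n and the sub-d.f.-s H_n(·; i), and verifies both the identity H_n(x; 1) + ··· + H_n(x; k) = H_n(x) and the smallness of the sup-deviation controlled by (1). It is an illustration only: the exponential law for Z, the uniform cause labels and all names are our own assumptions, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_subdfs(z, labels, grid, k):
    """Empirical d.f. H_n and sub-d.f.-s H_n(.; i) evaluated on a grid."""
    Hn = np.array([(z <= x).mean() for x in grid])
    Hn_sub = np.array([[((z <= x) & (labels == i)).mean() for x in grid]
                       for i in range(1, k + 1)])
    return Hn, Hn_sub

# Toy model: Z exponential(1), cause label uniform on {1, 2}, independent of Z.
n, k = 2000, 2
z = rng.exponential(size=n)
labels = rng.integers(1, k + 1, size=n)
grid = np.linspace(0.0, 5.0, 101)
Hn, Hn_sub = empirical_subdfs(z, labels, grid, k)

# The sub-d.f.-s always sum to the ordinary empirical d.f.
assert np.allclose(Hn_sub.sum(axis=0), Hn)

# Glivenko-Cantelli / DKW behaviour: sup deviation from H(x) = 1 - exp(-x).
H = 1.0 - np.exp(-grid)
sup_dev = np.abs(Hn - H).max()
assert sup_dev < 0.06
```

With n = 2000 the observed sup-deviation is typically of order 0.02, comfortably inside the DKW-type threshold in (1).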
The vector-valued empirical process {α_n(t) = (α_n^(0)(t_0), α_n^(1)(t_1), ..., α_n^(k)(t_k)), t = (t_0, ..., t_k) ∈ R̄^{k+1}} plays a decisive role, where α_n^(0)(x) = √n (H_n(x) − H(x)) and α_n^(i)(x) = √n (H_n(x; i) − H(x; i)), i ∈ ℑ. The following Burke–Csörgő–Horváth theorem [3, 4] is an extended analogue of the Komlós–Major–Tusnády-type results [9-11].
Theorem A ([3, 4]). If the underlying probability space {Ω, A, P} is rich enough, then one can define k+1 sequences of Gaussian processes B_n^(0)(x), B_n^(1)(x), ..., B_n^(k)(x) such that for α_n(t) and B_n(t) = (B_n^(0)(t_0), B_n^(1)(t_1), ..., B_n^(k)(t_k)), t = (t_0, ..., t_k), we have

    P{ sup_{t ∈ R̄^{k+1}} ||α_n(t) − B_n(t)||^{(k+1)} > n^{−1/2}(M log n + z) } ≤ K exp(−λz),   (3)

for all real z, where M = (2k+1)A_1, K = (2k+1)A_2 and λ = A_3/(2k+1), with A_1, A_2 and A_3 absolute constants. Moreover, B_n is a (k+1)-dimensional vector-valued Gaussian process with the same covariance structure as the vector α_n(t); namely, E B_n^(i)(x) = 0, (x, i) ∈ R̄ × S, S = ℑ ∪ {0}, and for any i, j ∈ ℑ, i ≠ j, and x, y ∈ R̄,

    E B_n^(0)(x) B_n^(0)(y) = min{H(x), H(y)} − H(x)H(y),
    E B_n^(i)(x) B_n^(i)(y) = min{H(x; i), H(y; i)} − H(x; i)H(y; i),
    E B_n^(i)(x) B_n^(j)(y) = −H(x; i)H(y; j),
    E B_n^(0)(x) B_n^(i)(y) = min{H(x; i), H(y; i)} − H(x)H(y; i).   (4)
If we set z = λ^{−1}(1 + ε) log n in (3), then

    P{ sup_{t ∈ R̄^{k+1}} ||α_n(t) − B_n(t)||^{(k+1)} > C n^{−1/2} log n } ≤ K n^{−(1+ε)},

where C = (2k+1)(A_1 + (1+ε)/A_3). Consequently,

    sup_{t ∈ R̄^{k+1}} ||α_n(t) − B_n(t)||^{(k+1)} =_{a.s.} O(n^{−1/2} log n).   (5)
Let us note that in the proof of Theorem A (Theorem 3.1 in [4]) a sequence of two-parameter Gaussian processes Q^(0)(x; n), Q^(1)(x; n), ..., Q^(k)(x; n) was constructed such that, for α_n(t) and Q(t; n) = (Q^(0)(t_0; n), ..., Q^(k)(t_k; n)), t ∈ R̄^{k+1}, the following approximation holds:

    sup_{t ∈ R̄^{k+1}} ||α_n(t) − n^{−1/2} Q(t; n)||^{(k+1)} =_{a.s.} O(n^{−1/2} log² n),

where Q(t; n) is a (k+1)-dimensional vector-valued Gaussian process with E Q^(i)(x; n) = 0, (x, i) ∈ R̄ × S, and for any i, j ∈ ℑ, i ≠ j, and x, y ∈ R̄,

    E Q^(0)(x; n) Q^(0)(y; m) = min(n, m){ min{H(x), H(y)} − H(x)H(y) },
    E Q^(0)(x; n) Q^(i)(y; m) = min(n, m){ min{H(x; i), H(y; i)} − H(x)H(y; i) },
    E Q^(i)(x; n) Q^(i)(y; m) = min(n, m){ min{H(x; i), H(y; i)} − H(x; i)H(y; i) },
    E Q^(i)(x; n) Q^(j)(y; m) = −min(n, m) H(x; i)H(y; j).
Let us observe that {Q^(i), i ∈ S} are Kiefer processes and satisfy the distributional equality

    Q^(i)(x; n) =_D W^(i)(H(x; i); n) − H(x; i) W^(i)(1; n),   (6)

where {W^(i)(y; n), 0 ≤ y ≤ 1, n ≥ 1, i ∈ S} are two-parameter Wiener processes with E W^(i)(y; n) = 0 and

    E W^(i)(y; n) W^(i)(u; m) = min(n, m) min(y, u),   i ∈ S

(here H(x; 0) := H(x)).
It is important to note that, although the Kiefer processes {Q^(i), i ∈ S} are dependent, the corresponding Wiener processes are independent. Indeed, it follows from the proof of Theorem A that

    Q^(1)(x; n) = K(H(x; 1); n),
    Q^(2)(x; n) = K(H(x; 2) + H(+∞; 1); n) − K(H(+∞; 1); n),
    ...
    Q^(i)(x; n) = K(H(x; i) + H(+∞; 1) + ··· + H(+∞; i−1); n) − K(H(+∞; 1) + ··· + H(+∞; i−1); n),   i ∈ ℑ,

where H(+∞; i) = lim_{x↑+∞} H(x; i) and H(+∞; 1) + ··· + H(+∞; k) = 1. The Kiefer processes {K(y; n), 0 ≤ y ≤ 1, n ≥ 1} are represented in terms of two-parameter Wiener processes {W(y; n), 0 ≤ y ≤ 1, n ≥ 1} by the distributional equality

    {K(y; n), 0 ≤ y ≤ 1, n ≥ 1} =_D {W(y; n) − y W(1; n), 0 ≤ y ≤ 1, n ≥ 1}.   (7)

Then, taking into account (6) and (7), the Wiener processes {W^(i), i ∈ ℑ} also admit the following representations for all (x; i) ∈ R̄ × ℑ:

    W^(1)(H(x; 1); n) = W(H(x; 1); n),
    W^(2)(H(x; 2); n) = W(H(x; 2) + H(+∞; 1); n) − W(H(+∞; 1); n),
    ...
    W^(i)(H(x; i); n) = W(H(x; i) + H(+∞; 1) + ··· + H(+∞; i−1); n) − W(H(+∞; 1) + ··· + H(+∞; i−1); n).

Now, performing direct calculations of the covariances of the processes {W^(i), i ∈ ℑ}, it is easy to show that these processes are independent.
2. Kac processes under general censoring
Following [9], we introduce the modified empirical d.f. of Kac in the following way. Along with the sequence {Z_j, j ≥ 1} on the probability space {Ω, A, P}, consider also a sequence {ν_n, n ≥ 1} of r.v.-s having the Poisson distribution with parameter Eν_n = n, n = 1, 2, .... We assume throughout that the two sequences {Z_j, j ≥ 1} and {ν_n, n ≥ 1} are independent. The Kac empirical d.f. is

    H_n*(x) = (1/n) Σ_{j=1}^{ν_n} I(Z_j ≤ x) if ν_n ≥ 1,  and  H_n*(x) = 0 if ν_n = 0,

while the Kac empirical sub-d.f. is

    H_n*(x; i) = (1/n) Σ_{j=1}^{ν_n} δ_j^(i) I(Z_j ≤ x) if ν_n ≥ 1,  and  H_n*(x; i) = 0 if ν_n = 0,   i ∈ ℑ,

with H_n*(x; 1) + ··· + H_n*(x; k) = H_n*(x) for all x ∈ R̄. Here we suppose that the sequence {ν_n, n ≥ 1}
is independent of the random vectors {(Z_j, δ_j^(1), ..., δ_j^(k)), j ≥ 1}, where δ_j^(i) = I(A_j^(i)). Let us note that the statistics H_n*(x; i) (and also H_n*(x)) are unbiased estimators of H(x; i), i ∈ ℑ (and of H(x)):

    E(H_n*(x; i)) = (1/n) E[ Σ_{k=1}^{ν_n} δ_k^(i) I(Z_k ≤ x) ]
        = (1/n) Σ_{m=1}^{∞} E[ Σ_{k=1}^{m} δ_k^(i) I(Z_k ≤ x) | ν_n = m ] · P(ν_n = m)
        = (1/n) Σ_{m=1}^{∞} m H(x; i) P(ν_n = m) = (1/n) H(x; i) Σ_{m=1}^{∞} m · (n^m e^{−n}/m!)
        = H(x; i) e^{−n} Σ_{m=0}^{∞} n^m/m! = H(x; i),   (x; i) ∈ R̄ × ℑ.

Consequently,

    E[H_n*(x)] = Σ_{i=1}^{k} E[H_n*(x; i)] = Σ_{i=1}^{k} H(x; i) = H(x),   x ∈ R̄.
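The unbiasedness argument above can be checked by Monte Carlo. In the sketch below (an illustration under an assumed exponential Z and two equally likely causes; not the paper's code), the average of the Kac sub-d.f. H_n*(x; i) over many replications matches H(x; i) even though the normalisation is by the parameter n rather than by the random ν_n.

```python
import numpy as np

rng = np.random.default_rng(1)

def kac_subdf(z_stream, labels_stream, nu, x, i, n):
    """Kac empirical sub-d.f. H_n*(x; i): sum over the first nu (Poisson)
    observations, normalised by the parameter n, not by nu."""
    if nu == 0:
        return 0.0
    zj = z_stream[:nu]
    dj = labels_stream[:nu]
    return np.sum((zj <= x) & (dj == i)) / n

# Monte Carlo check of unbiasedness: E H_n*(x; i) = H(x; i).
n, x, i, reps = 50, 1.0, 1, 20000
vals = np.empty(reps)
for r in range(reps):
    nu = rng.poisson(n)
    z = rng.exponential(size=max(nu, 1))
    lab = rng.integers(1, 3, size=max(nu, 1))   # causes 1 and 2, prob. 1/2 each
    vals[r] = kac_subdf(z, lab, nu, x, i, n)

# H(1; 1) = P(Z <= 1, A^(1)) = 0.5 * (1 - exp(-1)) in this toy model
target = 0.5 * (1 - np.exp(-1))
assert abs(vals.mean() - target) < 0.01
```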
Let us define the empirical Kac processes α_n^(i)*(x) = √n (H_n*(x; i) − H(x; i)), i ∈ ℑ, and α_n^(0)*(x) = √n (H_n*(x) − H(x)).

Theorem 1. If the underlying probability space {Ω, A, P} is rich enough, then one can define k+1 sequences of Gaussian processes W_n^(0)(x), W_n^(1)(x), ..., W_n^(k)(x) such that for α_n*(t) = (α_n^(0)*(t_0), α_n^(1)*(t_1), ..., α_n^(k)*(t_k)) and W_n*(t) = (W_n^(0)(t_0), W_n^(1)(t_1), ..., W_n^(k)(t_k)), t = (t_0, t_1, ..., t_k), we have

    P{ sup_{t ∈ R̄^{k+1}} ||α_n*(t) − W_n*(t)||^{(k+1)} > C* n^{−1/2} log n } ≤ K* n^{−r},   (8)

where r ≥ 2 is an arbitrary integer, C* = C*(r) depends only on r, and K* is an absolute constant. Moreover, W_n*(t) is a (k+1)-dimensional vector-valued Gaussian process with expectation E W_n^(i)(x) = 0, (x, i) ∈ R̄ × S, and for any i ∈ ℑ and x, y ∈ R̄,

    E W_n^(0)(x) W_n^(0)(y) = min{H(x), H(y)},
    E W_n^(i)(x) W_n^(i)(y) = min{H(x; i), H(y; i)},
    E W_n^(i)(x) W_n^(0)(y) = min{H(x; i), H(y)}.   (9)
The basic relation between α_n(t) and α_n*(t) is the following easily checked identity:

    α_n^(i)*(x) = (ν_n/n)^{1/2} α_{ν_n}^(i)(x) + H(x; i) (ν_n − n)/√n,   i ∈ S.

Hence the approximating sequence has the form

    W_n^(i)(x) = B_n^(i)(x) + H(x; i) W*(n)/√n,   i ∈ S,   (10)

where B_n^(i)(x) is a Poisson-indexed Brownian-bridge-type process of Theorem A and {W*(x), x ≥ 0} is a Wiener process. It is easy to verify that {W_n^(i)(x), (x; i) ∈ R̄ × ℑ} =_D {W(H(x; i)), (x; i) ∈ R̄ × ℑ} for a standard Wiener process W. The proof of Theorem 1 is similar to the proof of Theorem 1 in [6] and is omitted.
Since lim_{x↑+∞} H_n*(x) = H_n*(+∞) = ν_n/n, using Stirling's formula we obtain

    P(ν_n = n) = P(H_n*(+∞) = 1) = n^n e^{−n}/n! = (2πn)^{−1/2}(1 + o(1)) = o(1),   n → ∞,

and

    P(H_n*(+∞) > 1) = P(ν_n > n) = Σ_{k=n+1}^{∞} n^k e^{−n}/k! → 1/2,   n → ∞.

Thus H_n*(x) exceeds 1 with positive probability. In order to avoid this undesirable property, the following modifications of the Kac statistics are proposed:

    H̃_n(x) = 1 − (1 − H_n*(x)) I(H_n*(x) ≤ 1),   x ∈ R̄,
    H̃_n(x; i) = 1 − (1 − H_n*(x; i)) I(H_n*(x; i) ≤ 1),   (x; i) ∈ R̄ × ℑ.   (11)

The following inequalities are useful in studying the Kac processes.
Theorem 2. Let {ν_n, n ≥ 1} be a sequence of Poisson r.v.-s with Eν_n = n. Then for any ε > 0 such that

    n/log n ≥ ε/(8(1 + e/3)²),   e = exp(1),   (12)

we have

    P( |ν_n − n| ≥ (1/2)(ε/2 · n log n)^{1/2} ) ≤ 2 n^{−εw},   (13)

    P( sup_{|x|≤∞} |H_n*(x; i) − H(x; i)| ≥ 2(ε log n/(2n))^{1/2} ) ≤ 4 n^{−εw},   i ∈ ℑ,   (14)

    P( sup_{|x|≤∞} |H̃_n(x; i) − H(x; i)| ≥ 2(ε log n/(2n))^{1/2} ) ≤ 4 n^{−εw},   i ∈ ℑ,   (15)

where w = [16(1 + e/3)]^{−1}.
Proof. Let γ_1, γ_2, ... be a sequence of Poisson r.v.-s with Eγ_k = 1 for all k = 1, 2, .... Then

    S_n = ν_n − n = Σ_{k=1}^{n} (γ_k − 1) = Σ_{k=1}^{n} ξ_k

and

    E exp(tξ_k) = e^{−t} E exp(tγ_1) = exp(−(t + 1)) Σ_{k=0}^{∞} (e^t)^k/k! = exp(e^t − (t + 1)).

Using the Taylor expansion of e^t, we obtain

    E exp(tξ_k) = exp{ 1 + t + t²/2 + ψ(t) − (t + 1) } = exp{ t²/2 + ψ(t) },

where ψ(t) = (t³/6) exp(θt), 0 < θ < 1. Taking into account that t³ ≤ t² for 0 ≤ t ≤ 1, we obtain the estimate ψ(t) ≤ (t³/6)e ≤ e t²/6. Thus

    E exp(tξ_k) ≤ exp{ (t²/2)(1 + e/3) },   0 ≤ t ≤ 1.
The following result from [13] is necessary for further considerations.

Lemma 1 ([13]). Let {ξ_n, n ≥ 1} be a sequence of independent r.v.-s with Eξ_n = 0, n = 1, 2, .... Suppose that U, λ_1, ..., λ_n are positive real numbers such that

    E exp(tξ_k) ≤ exp( (1/2) λ_k t² )   for k = 1, 2, ..., n,  |t| ≤ U.   (16)

Let A = λ_1 + ··· + λ_n. Then

    P( |ξ_1 + ··· + ξ_n| ≥ z ) ≤ 2 exp(−z²/(2A))  if 0 ≤ z ≤ AU,   and   ≤ 2 exp(−Uz/2)  if z ≥ AU.
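The moment bound derived above, E exp(tξ_k) ≤ exp((t²/2)(1 + e/3)) for 0 ≤ t ≤ 1, which supplies condition (16) of Lemma 1, can be verified numerically. The check below is our own, not part of the proof; it uses the closed form E exp(tξ_k) = exp(e^t − 1 − t) for a centred Poisson(1) variable.

```python
import math

def mgf_centered_poisson(t):
    """Exact m.g.f. of xi = gamma - 1 with gamma ~ Poisson(1):
    E exp(t*xi) = exp(e^t - 1 - t)."""
    return math.exp(math.exp(t) - 1.0 - t)

def mgf_bound(t):
    """Majorant exp((t^2/2) * (1 + e/3)) used in the proof for 0 <= t <= 1."""
    return math.exp(0.5 * t * t * (1.0 + math.e / 3.0))

# The bound holds on a fine grid of [0, 1].
for j in range(101):
    t = j / 100.0
    assert mgf_centered_poisson(t) <= mgf_bound(t) + 1e-12
```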
Setting λ_k = 1 + e/3, U = 1 and z = (1/2)(ε/2 · n log n)^{1/2} in Lemma 1, we obtain (13); here 0 < z = (1/2)(ε/2 · n log n)^{1/2} ≤ (1 + e/3)n = AU by condition (12). Consider the probability in (14). Using the total probability formula, we have

    P( sup_{|x|≤∞} |H_n*(x; i) − H(x; i)| ≥ 2(ε log n/(2n))^{1/2} ) =
    = P( sup_{|x|≤∞} |H_n(x; i) − H(x; i) + (1/n) Σ_{k=n+1}^{ν_n} δ_k^(i) I(Z_k ≤ x)| ≥ 2(ε log n/(2n))^{1/2} | ν_n ≥ n ) · P(ν_n ≥ n) +
    + P( sup_{|x|≤∞} |H_n(x; i) − H(x; i) − (1/n) Σ_{k=ν_n+1}^{n} δ_k^(i) I(Z_k ≤ x)| ≥ 2(ε log n/(2n))^{1/2} | ν_n < n ) · P(ν_n < n) ≤
    ≤ P( sup_{|x|≤∞} |H_n(x; i) − H(x; i)| ≥ (ε log n/(2n))^{1/2} ) + P( |ν_n − n|/n ≥ (ε log n/(2n))^{1/2} ) ≤
    ≤ 2 n^{−ε} + 2 n^{−εw} ≤ 4 n^{−εw},   i ∈ ℑ,
where we applied (2) and (13); this proves (14). Let us define T_n^(i) = inf{x : H̃_n(x; i) = 1}, i ∈ ℑ. If x ≥ T_n^(i) and ν_n ≥ n, then H̃_n(x; i) = 1 and H_n*(x; i) − H(x; i) ≥ H̃_n(x; i) − H(x; i) ≥ 0. Hence, on the event {ν_n ≥ n},

    sup_{|x|≤∞} |H̃_n(x; i) − H(x; i)| = max{ sup_{x<T_n^(i)} |H̃_n(x; i) − H(x; i)|, sup_{x≥T_n^(i)} |H̃_n(x; i) − H(x; i)| } ≤
    ≤ max{ sup_{x<T_n^(i)} |H_n*(x; i) − H(x; i)|, sup_{x≥T_n^(i)} |H_n*(x; i) − H(x; i)| } =
    = sup_{|x|≤∞} |H_n*(x; i) − H(x; i)|,   i ∈ ℑ.   (17)

On the event {ν_n < n} it is obvious that H̃_n(x; i) = H_n*(x; i) for all (x; i) ∈ R̄ × ℑ. Now, taking into account the last two relations, the total probability formula and (14), we obtain (15). Theorem 2 is proved. □
Let α̃_n(t) = (α̃_n^(0)(t_0), α̃_n^(1)(t_1), ..., α̃_n^(k)(t_k)), where α̃_n^(0)(x) = √n (H̃_n(x) − H(x)) and α̃_n^(i)(x) = √n (H̃_n(x; i) − H(x; i)), (x; i) ∈ R̄ × ℑ. We shall prove an approximation theorem for the vector-valued modified empirical Kac process α̃_n(t) by the appropriate Gaussian vector-valued process W_n*(t), t ∈ R̄^{k+1}, of Theorem 1.
Theorem 3. Let {T_n, n ≥ 1} be a numerical sequence satisfying for each n the condition T_n < T_H = inf{x : H(x) = 1} ≤ ∞ and such that

    min{ min_{i∈ℑ} ( P(A^(i)) − H(T_n; i) ), 1 − H(T_n) } ≥ 2( r log n/(2wn) )^{1/2}.   (18)

If condition (12) holds for some ε > 0, then on the probability space of Theorem 2 one can define k+1 sequences of mean-zero Gaussian processes W_n^(0)(x), W_n^(1)(x), ..., W_n^(k)(x) with the covariance structure (9) such that for α̃_n(t) and W_n*(t) = (W_n^(0)(t_0), W_n^(1)(t_1), ..., W_n^(k)(t_k)) we have

    P{ sup_{t ∈ (−∞; T_n]^{(k+1)}} ||α̃_n(t) − W_n*(t)||^{(k+1)} > C n^{−1/2} log n } ≤ K n^{−β},   (19)

where K is an absolute constant, C = C(ε) and β = min(r, εw).

Proof. It is easy to see that the probability in (19) can be estimated by the sum

    q_{1n} + q_{2n} = P( sup_{x ≤ T_n} |α̃_n^(0)(x) − W_n^(0)(x)| > C n^{−1/2} log n ) +
    + Σ_{i=1}^{k} P( sup_{x ≤ T_n} |α̃_n^(i)(x) − W_n^(i)(x)| > C n^{−1/2} log n ).   (20)
Taking into account that H_n*(x) ≤ H_n*(T_n) for any x ≤ T_n, and that α̃_n^(0)(x) = α_n^(0)*(x) whenever H_n*(T_n) ≤ 1, and using the total probability formula, we have

    q_{1n} ≤ P( sup_{x ≤ T_n} |α_n^(0)*(x) − W_n^(0)(x)| > C* n^{−1/2} log n ) + P( H_n*(T_n) > 1 ) ≤
    ≤ K* n^{−r} + P( H_n*(T_n) − H(T_n) > 1 − H(T_n) ) ≤
    ≤ K* n^{−r} + P( sup_{|x|≤∞} |H_n*(x) − H(x)| ≥ 2( r log n/(2wn) )^{1/2} ) ≤ L n^{−r},   (21)

where Theorem 1, condition (18) and the analogue of (14) for H_n* − H (with ε = r/w) are used, and L = K* + 4. Analogously,

    q_{2n} ≤ Σ_{i=1}^{k} P( sup_{x ≤ T_n} |α_n^(i)*(x) − W_n^(i)(x)| > C* n^{−1/2} log n ) +
    + Σ_{i=1}^{k} P( H_n*(T_n; i) > P(A^(i)) ) + k P( |ν_n − n|/n ≥ (1/2)( 4r log n/(2wn) )^{1/2} ) ≤
    ≤ k L n^{−r} + 2k n^{−4r},   (22)

where inequalities (13), (18) and Theorem 1 are used. Now (19) follows from (21) and (22). Theorem 3 is proved. □
3. Estimation of exponential-hazard function
In many practical situations where we are interested in the joint behaviour of the pairs {(Z, A^(i)), i ∈ ℑ}, the so-called exponential-hazard functions {S^(i)(x) = exp(−Λ^(i)(x)), i ∈ ℑ} play a crucial role. Here Λ^(i)(x) is the i-th cumulative hazard function ( ∫_{−∞}^{x} = ∫_{(−∞; x]} ):

    Λ^(i)(x) = ∫_{−∞}^{x} dH(u; i)/(1 − H(u)),   i ∈ ℑ,

where Λ^(1)(x) + ··· + Λ^(k)(x) = Λ(x) = ∫_{−∞}^{x} dH(u)/(1 − H(u)) is the corresponding cumulative hazard function of the d.f. H(x).
Let us consider two important special cases of the generalized censorship model.

1. Let {X_1, X_2, ...} be a sequence of independent r.v.-s with common continuous d.f. F. They are censored on the right by a sequence {Y_1, Y_2, ...} of independent r.v.-s with common continuous d.f. G, independent of the X-sequence. One can only observe the sequence of pairs {(Z_j, δ_j), j = 1, ..., n}, where Z_j = min(X_j, Y_j) and δ_j = δ_j^(1) is the indicator of the event A_j = A_j^(1) = {Z_j = X_j}. In this case k = 2, 1 − H(x) = (1 − F(x))(1 − G(x)) and H(x; 1) = ∫_{−∞}^{x} (1 − G(u)) dF(u). Thus S^(1)(x) = S(x) = 1 − F(x). A useful special case is 1 − G(x) = (1 − F(x))^β, β > 0, which corresponds to the independence of the r.v.-s Z_j and δ_j, j ≥ 1.

2. Let k ≥ 1 and consider independent sequences {Y_j^(i), j ≥ 1} (i = 1, ..., k) of independent r.v.-s with common continuous d.f.-s F^(i). Let Z_j = min(Y_j^(1), ..., Y_j^(k)). One observes the sequences {(Z_j, δ_j^(i)), i = 1, ..., k, j ≥ 1}, where δ_j^(i) is the indicator of the event A_j^(i) = {Z_j = Y_j^(i)}. This is the competing risks model with S^(i)(x) = 1 − F^(i)(x), i ∈ ℑ.
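Special case 1 is easy to simulate. The sketch below (illustrative assumptions: exponential F and G, our own names) generates right-censored data under the proportional-hazards submodel 1 − G = (1 − F)^β and checks the announced independence of Z_j and δ_j.

```python
import numpy as np

rng = np.random.default_rng(2)

# Koziol-Green submodel of random right censoring: 1 - G = (1 - F)^beta.
# With F = Exp(1), G is Exp(rate beta); then delta = I(X <= Y) is
# independent of Z = min(X, Y) and P(delta = 1) = 1/(1 + beta).
n, beta = 100000, 0.5
x = rng.exponential(1.0, size=n)          # lifetimes, d.f. F
y = rng.exponential(1.0 / beta, size=n)   # censoring times, d.f. G (scale = 1/beta)
z = np.minimum(x, y)
delta = (x <= y).astype(int)

assert abs(delta.mean() - 1.0 / (1.0 + beta)) < 0.01

# Independence check: P(delta = 1 | Z <= median) should match P(delta = 1).
med = np.median(z)
p_cond = delta[z <= med].mean()
assert abs(p_cond - delta.mean()) < 0.02
```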
Let us define the natural Kac-type estimator

    Λ̃_n^(i)(x) = ∫_{−∞}^{x} dH̃_n(u; i)/(1 − H̃_n(u)),   i ∈ ℑ,

of Λ^(i)(x), i ∈ ℑ. Let w_n^(i)(x) = √n (Λ̃_n^(i)(x) − Λ^(i)(x)), i ∈ ℑ, be the Kac-type hazard process, w_n(t) = (w_n^(1)(t_1), ..., w_n^(k)(t_k)), t = (t_1, ..., t_k), and let Y_n(t) = (Y_n^(1)(t_1), ..., Y_n^(k)(t_k)) be the corresponding vector process with

    Y_n^(i)(x) = ∫_{−∞}^{x} W_n^(0)(u) dH(u; i)/(1 − H(u))² + W_n^(i)(x)/(1 − H(x)) − ∫_{−∞}^{x} W_n^(i)(u) dH(u)/(1 − H(u))²,   i ∈ ℑ,

where (W_n^(0)(x), W_n^(1)(x), ..., W_n^(k)(x)) are Wiener processes with the covariance structure (9). Then for i ∈ ℑ we have E Y_n^(i)(x) = 0 and

    E Y_n^(i)(x) Y_n^(i)(y) = C^(i)(x, y),   x, y ≤ T_H = inf{x : H(x) = 1} ≤ ∞.
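Up to the Poisson sample size, Λ̃_n^(i) is a Nelson–Aalen-type integral of the empirical sub-d.f. against 1 minus the empirical d.f. Below is a hedged sketch with the ordinary empirical quantities: we use the left limit 1 − H_n(u−) in the denominator, the usual convention at jump points (an assumption on our part; the paper works with the Kac versions and continuous H), and the check uses uncensored exponential data, where Λ(x) = x.

```python
import numpy as np

rng = np.random.default_rng(3)

def hazard_estimate(z, delta, x):
    """Nelson-Aalen-type estimate of Lambda(x) = int_{-inf}^x dH(u;1)/(1 - H(u-)),
    built from the ordinary empirical d.f. and sub-d.f."""
    order = np.argsort(z)
    z, delta = z[order], delta[order]
    n = len(z)
    # at the j-th order statistic (0-indexed), 1 - H_n(u-) = (n - j)/n
    at_risk = (n - np.arange(n)) / n
    jumps = (delta / n) / at_risk
    return np.sum(jumps[z <= x])

# Uncensored Exp(1) data: Lambda(x) = x, so the estimate at x = 1 should be near 1.
n = 20000
z = rng.exponential(size=n)
delta = np.ones(n)
est = hazard_estimate(z, delta, 1.0)
assert abs(est - 1.0) < 0.05
```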
Theorem 4. Let {T_n, n ≥ 1} be a numerical sequence satisfying for each n the condition T_n < T_H and such that

    n/log n ≥ max{ 32 ε w², 2r b_n²/w, 2ε b_n²/w },   (23)

where b_n = (1 − H(T_n))^{−1}, ε > 0, r ≥ 2. Then, on the probability space of Theorem 2,

    P( sup_{t ∈ (−∞; T_n]^{(k)}} ||w_n(t) − Y_n(t)||^{(k)} > r(n) ) ≤ k Φ_1 n^{−β},   (24)

where r(n) = Φ_0 b_n³ n^{−1/2} log n, Φ_0 = Φ_0(ε, r) and Φ_1 is an absolute constant.
Proof. It is sufficient to prove that for each i ∈ ℑ

    P( sup_{x ≤ T_n} |w_n^(i)(x) − Y_n^(i)(x)| > r(n) ) ≤ Φ_1 n^{−β}.   (25)

For each i ∈ ℑ we have the representation

    w_n^(i)(x) − Y_n^(i)(x) = ∫_{−∞}^{x} (α̃_n^(0)(u) − W_n^(0)(u)) dH(u; i)/(1 − H(u))² + (α̃_n^(i)(x) − W_n^(i)(x))/(1 − H(x)) −
    − ∫_{−∞}^{x} (α̃_n^(i)(u) − W_n^(i)(u)) dH(u)/(1 − H(u))² + n^{−1/2} ∫_{−∞}^{x} (α̃_n^(0)(u))² dH(u; i)/((1 − H(u))²(1 − H̃_n(u))) +
    + n^{−1/2} ∫_{−∞}^{x} α̃_n^(0)(u) dα̃_n^(i)(u)/((1 − H(u))(1 − H̃_n(u))) = Σ_{m=1}^{4} R_{mn}^(i)(x),

where the last two integrals together constitute R_{4n}^(i)(x). Using (15) and (19), we have for the sum R_{1n}^(i) + R_{2n}^(i) + R_{3n}^(i)

    P( sup_{x ≤ T_n} |R_{1n}^(i)(x) + R_{2n}^(i)(x) + R_{3n}^(i)(x)| > 3C b_n² n^{−1/2} log n + ε b_n² n^{−1/2} log n ) ≤
    ≤ 3K n^{−β} + 2L n^{−εw} ≤ (3K + 2L) n^{−β},   i ∈ ℑ.   (26)
Rewrite R_{4n}^(i) in the form

    R_{4n}^(i)(x) = n^{−1/2} ∫_{−∞}^{x} (α̃_n^(0)(u))² dH(u; i)/((1 − H(u))²(1 − H̃_n(u))) +
    + n^{−1/2} ∫_{−∞}^{x} α̃_n^(0)(u) dα̃_n^(i)(u)/((1 − H(u))(1 − H̃_n(u))) = R_{41}^(i)(x) + R_{42}^(i)(x).   (27)

Then, taking into account (15), we obtain for i ∈ ℑ

    P( sup_{x ≤ T_n} |R_{41}^(i)(x)| > 2ε b_n³ n^{−1/2} log n ) ≤ 2L n^{−εw} ≤ 2L n^{−β}.   (28)
There exists an absolute constant A such that

    P( sup_{x ≤ T_n} |R_{42}^(i)(x)| > 3A b_n² n^{−1/2} log n ) ≤ P( H_n*(T_n) > 1 ) +
    + P( sup_{x ≤ T_n} n^{−1/2} | ∫_{−∞}^{x} α_n^(0)*(u) dα_n^(i)*(u)/(1 − H(u))² | > 3A b_n² n^{−1/2} log n ) ≤ L n^{−r} + p_n,   (29)

since for any x ≤ T_n we have H_n*(x) ≤ H_n*(T_n), and if H_n*(T_n) ≤ 1 then H̃_n(x; i) = H_n*(x; i) and hence α̃_n^(i)(x) = α_n^(i)*(x), i ∈ ℑ ∪ {0}. It remains to estimate the probability p_n. Following the proof of Theorem 1 in [6], set α_{ν_n}^(0)(x) = √ν_n (H_{ν_n}(x) − H(x)) and α_{ν_n}^(i)(x) = √ν_n (H_{ν_n}(x; i) − H(x; i)), i ∈ ℑ, where H_{ν_n} and H_{ν_n}(·; i) are the ordinary empirical d.f. and sub-d.f. of Z_1, ..., Z_{ν_n}. Using representation (10), the bilinear integral in p_n splits into four parts, so that p_n ≤ p_{1n} + p_{2n} + p_{3n} + p_{4n}, where

    p_{1n} = P( (ν_n/n) sup_{x ≤ T_n} | ∫_{−∞}^{x} α_{ν_n}^(0)(u) dα_{ν_n}^(i)(u)/(1 − H(u))² | > (3/2) A b_n² log n ),

    p_{2n} = P( (ν_n/n)^{1/2} (|ν_n − n|/√n) sup_{x ≤ T_n} | ∫_{−∞}^{x} α_{ν_n}^(0)(u) dH(u; i)/(1 − H(u))² | > (ε/2)(3/2)^{1/2} b_n² log n ),

    p_{3n} = P( (ν_n/n)^{1/2} (|ν_n − n|/√n) sup_{x ≤ T_n} | ∫_{−∞}^{x} H(u) dα_{ν_n}^(i)(u)/(1 − H(u))² | > (ε/2)(3/2)^{1/2} b_n² log n ),

    p_{4n} = P( ((ν_n − n)²/n) ∫_{−∞}^{T_n} H(u) dH(u; i)/(1 − H(u))² > (ε/8) b_n² log n ),

with (3/2)A + ε(3/2)^{1/2} + ε/8 ≤ 3A for a suitable A = A(ε).
Taking into account the Lemma in [5], we have

    P( sup_{x ≤ T_n} | ∫_{−∞}^{x} α_n^(0)(u) dα_n^(i)(u)/(1 − H(u))² | > A b_n² log n ) ≤ B n^{−ε},   (30)

where A = A(ε) and B is an absolute constant. Moreover, applying (13) with ε replaced by 2n/log n, we have

    P( |ν_n − n|/n > 1/2 ) ≤ 2 exp(−2nw).   (31)
It follows from (30) and (31) that

    p_{1n} ≤ P( |ν_n − n| > n/2 ) + P( sup_{x ≤ T_n} | ∫_{−∞}^{x} α_{ν_n}^(0)(u) dα_{ν_n}^(i)(u)/(1 − H(u))² | > A b_n² log ν_n, n/2 ≤ ν_n ≤ 3n/2 ) ≤
    ≤ 2 exp(−2nw) + Σ_{n/2 ≤ m ≤ 3n/2} P( sup_{x ≤ T_n} | ∫_{−∞}^{x} α_m^(0)(u) dα_m^(i)(u)/(1 − H(u))² | > A b_n² log m ) P(ν_n = m) ≤
    ≤ 2 exp(−2nw) + B Σ_{m ≥ n/2} m^{−ε} P(ν_n = m) ≤ 2 exp(−2nw) + 2^ε B n^{−ε},   (32)

where we used the fact that for n/2 ≤ ν_n ≤ 3n/2 the threshold (3/2)(n/ν_n) A b_n² log n dominates A b_n² log ν_n for a suitable A = A(ε).
Analogously, using (31), (13) and (1), we obtain

    p_{2n} ≤ P( |ν_n − n| > n/2 ) + P( |ν_n − n| ≥ (1/2)(ε/2 · n log n)^{1/2} ) +
    + P( sup_{|x|≤∞} |α_{ν_n}^(0)(x)| > (ε/2 · log ν_n)^{1/2}, n/2 ≤ ν_n ≤ 3n/2 ) ≤ 2 exp(−2nw) + 2 n^{−εw} + D n^{−ε}.   (33)

Integrating by parts and using (2) in the same way, we obtain

    p_{3n} ≤ 2 exp(−2nw) + 2 n^{−εw} + 2D n^{−ε}.   (34)

Finally, using (13) and the bound ∫_{−∞}^{T_n} H(u) dH(u; i)/(1 − H(u))² ≤ b_n, we have

    p_{4n} ≤ P( |ν_n − n| ≥ (1/2)(ε/2 · n log n)^{1/2} ) ≤ 2 n^{−εw}.   (35)

Now combining (26)-(29) and (32)-(35), we obtain (25). Theorem 4 is proved. □
Corollary 1. It follows from (24) that for suitable r ≥ 2 and ε > 0 one can obtain an approximation on (−∞; T]^{(k)} with b^{−1} = 1 − H(T) > 0:

    sup_{t ∈ (−∞; T]^{(k)}} ||w_n(t) − Y_n(t)||^{(k)} =_{a.s.} O(n^{−1/2} log n).   (36)
Now we consider joint estimation of the exponential-hazard functions {S^(i)(x) = exp(−Λ^(i)(x)), i ∈ ℑ}. Let us consider the hazard function estimator

    Λ̃_n(x) = ∫_{−∞}^{x} dH̃_n(u)/(1 − H̃_n(u))

and the corresponding hazard process w_n^(0)(x) = √n (Λ̃_n(x) − Λ(x)). In the next Theorem 5 we approximate w_n^(0)(x) by the sequence of Gaussian processes

    Y_n^(0)(x) = W_n^(0)(x)/(1 − H(x)).

Theorem 5. Let {T_n, n ≥ 1} be a numerical sequence that satisfies the condition T_n < T_H for each n and such that (23) holds. Then on the probability space of Theorem 2 we have

    P( sup_{x ≤ T_n} |w_n^(0)(x) − Y_n^(0)(x)| > r_0(n) ) ≤ Φ̃_1 n^{−β},   (37)

where r_0(n) = Φ̃_0 b_n³ n^{−1/2} log n, Φ̃_0 = Φ̃_0(ε, r) and Φ̃_1 is an absolute constant.
Proof. It is easy to verify that

    w_n^(0)(x) − Y_n^(0)(x) = (α̃_n^(0)(x) − W_n^(0)(x))/(1 − H(x)) +
    + n^{−1/2} ∫_{−∞}^{x} (α̃_n^(0)(u))² dH(u)/((1 − H(u))²(1 − H̃_n(u))) +
    + n^{−1/2} ∫_{−∞}^{x} α̃_n^(0)(u) dα̃_n^(0)(u)/((1 − H(u))(1 − H̃_n(u))).

The further proof of (37) is similar to the proof of Theorem 4 and hence the details are omitted. Theorem 5 is proved. □
One can obtain from Theorems 4 and 5 the following theorem on deviations of the processes w_n^(0) and w_n^(i), i ∈ ℑ.

Theorem 6. Let {T_n, n ≥ 1} be a numerical sequence that satisfies for each n the condition T_n < T_H and such that (23) holds. Then

    P( sup_{x ≤ T_n} |w_n^(0)(x)| > r_0(n) + 2b_n(ε log n)^{1/2} ) ≤ Φ̃_1 n^{−β} + 2n^{−ε},   (38)

and for i ∈ ℑ

    P( sup_{x ≤ T_n} |w_n^(i)(x)| > r(n) + 6b_n²(ε log n)^{1/2} ) ≤ Φ_1 n^{−β} + 3n^{−ε}.   (39)

Proof. It is easy to verify that for any n ≥ 1

    W_n^(0)(x) =_D W(H(x))   and   W_n^(i)(x) =_D W(H(x; i)),   (x; i) ∈ R̄ × ℑ,

where {W(y), 0 ≤ y ≤ 1} is a standard Wiener process on [0, 1]. Then the probability in (38) is not greater than

    P( sup_{x ≤ T_n} |w_n^(0)(x) − Y_n^(0)(x)| > r_0(n) ) + P( sup_{x ≤ T_n} |Y_n^(0)(x)| > 2b_n(ε log n)^{1/2} ) ≤
    ≤ Φ̃_1 n^{−β} + P( sup_{0 ≤ y ≤ 1} |W(y)| > 2(ε log n)^{1/2} ) ≤ Φ̃_1 n^{−β} + 2n^{−ε},   (40)

where inequality (37) and the well-known exponential inequality for the Wiener process (see [14], Eq. (29.2)) are used. Analogously, (39) follows from (25) and the second estimate in (40). Theorem 6 is proved. □
To estimate the exponential-hazard functions {S^(i)(x) = exp(−Λ^(i)(x)), i ∈ ℑ} we use the following exponential (Altshuler–Breslow type), product-limit (Kaplan–Meier type) and relative-risk power (Abdushukurov) estimators ([1-3]):

    S_{1n}^(i)(x) = exp( −Λ̃_n^(i)(x) ),
    S_{2n}^(i)(x) = Π_{u ≤ x} ( 1 − dΛ̃_n^(i)(u) ),
    S_{3n}^(i)(x) = [ 1 − H̃_n(x) ]^{R_n^(i)(x)},   (41)

where R_n^(i)(x) = Λ̃_n^(i)(x) (Λ̃_n(x))^{−1}, i ∈ ℑ.
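The three estimators in (41) can be sketched as follows. This is an illustration built from the ordinary empirical d.f. rather than its Kac version, on right-censored (special case 1, i = 1) Koziol–Green data, where all three should be close to S(x) = e^{−x}; the data generator and all names are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def survival_estimates(z, delta, x):
    """Exponential (Altshuler-Breslow), product-limit (Kaplan-Meier) and
    relative-risk power estimators of S(x) = 1 - F(x), built from the
    ordinary empirical quantities (the paper uses their Kac versions)."""
    order = np.argsort(z)
    z, delta = z[order], delta[order]
    n = len(z)
    at_risk = n - np.arange(n)                      # number of Z_j >= current point
    mask = z <= x
    lam = np.sum(delta[mask] / at_risk[mask])       # cumulative hazard, cause 1
    s_exp = np.exp(-lam)                            # S_1n: exponential estimator
    s_km = np.prod(1.0 - delta[mask] / at_risk[mask])   # S_2n: product-limit
    lam_all = np.sum(1.0 / at_risk[mask])           # total cumulative hazard
    r_n = lam / lam_all if lam_all > 0 else 0.0     # relative-risk R_n(x)
    s_pow = (1.0 - mask.mean()) ** r_n              # S_3n: power estimator
    return s_exp, s_km, s_pow

# Koziol-Green censored sample: F = Exp(1), 1 - G = (1 - F)^0.5; true S(1) = exp(-1).
n, beta = 50000, 0.5
xlife = rng.exponential(1.0, size=n)
ycens = rng.exponential(1.0 / beta, size=n)
z = np.minimum(xlife, ycens)
delta = (xlife <= ycens).astype(float)
s_exp, s_km, s_pow = survival_estimates(z, delta, 1.0)
true_s = np.exp(-1.0)
for s in (s_exp, s_km, s_pow):
    assert abs(s - true_s) < 0.02
```

In the Koziol–Green submodel, R_n(x) estimates the constant 1/(1 + β), which is why the power estimator is consistent here.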
It follows from the proof of Theorem 1.4.1 in [3] that for all (x; i) ∈ (−∞; Z_{(n)}) × ℑ, Z_{(n)} = max(Z_1, ..., Z_n),

    0 ≤ S_{1n}^(i)(x) − S_{2n}^(i)(x) ≤ (2/n) ∫_{−∞}^{x} dH̃_n(u; i)/(1 − H̃_n(u))² = O(1/n),
    0 ≤ S_{3n}^(i)(x) − S_{1n}^(i)(x) ≤ (2/n) ∫_{−∞}^{x} dH̃_n(u; i)/(1 − H̃_n(u))² = O(1/n).   (42)

Hence it is sufficient to consider only the estimator S_{1n}^(i). Let us introduce the vector-processes q_n(t) = (q_n^(1)(t_1), ..., q_n^(k)(t_k)) and q_n*(t) = (q_n^(1)*(t_1), ..., q_n^(k)*(t_k)), where q_n^(i)(x) = √n (S^(i)(x) − S_{1n}^(i)(x)) and q_n^(i)*(x) = S^(i)(x) Y_n^(i)(x), i ∈ ℑ.
In the next theorem the vector-valued process q_n(t) is approximated by the Gaussian vector-valued process q_n*(t), t ∈ R̄^{k}.

Theorem 7. Let {T_n, n ≥ 1} be a numerical sequence that satisfies for each n the condition T_n < T_H and such that inequality (23) holds. Then on the probability space of Theorem 2 we have

    P( sup_{t ∈ (−∞; T_n]^{(k)}} ||q_n(t) − q_n*(t)||^{(k)} > r*(n) ) ≤ k R* n^{−β},   (43)

where r*(n) = r(n) + (1/2) n^{−1/2} ( r(n) + 6b_n²(ε log n)^{1/2} )² and R* is an absolute constant.

Proof. Using the Taylor expansion, for each i ∈ ℑ we obtain

    q_n^(i)(x) = S^(i)(x) w_n^(i)(x) − (1/2) n^{−1/2} exp(−ξ_n^(i)(x)) ( w_n^(i)(x) )²,

where ξ_n^(i)(x) ∈ [ min(Λ̃_n^(i)(x), Λ^(i)(x)), max(Λ̃_n^(i)(x), Λ^(i)(x)) ]. Now, using (24), (38) and (39), we obtain the required result. Theorem 7 is proved. □
4. Estimation of characteristic function under random right censoring
Let X1,X2,... be independent identically distributed r.v.-s with common continuous d.f. F. They are interpreted as an infinite sample of the random lifetime X. Another sequence of independent and identically distributed r.v.-s Y1,Y2,... with common continuous d.f. G censors on the right is introduced. This sequence is independent of {Xj }. Then the observations available at the n-th stage consist of the pairs {(Zj,Sj), 1 < j < n} = C(n), where Zj = min(Xj,Yj) and Sj is the indicator of the event Aj = {Zj = Xj} = {Xj < Yj}. Let
/to
eitxdF(x)
-to
be the characteristic function of d.f. F. The problem consists in estimating of d.f. F from censored sample C(n). In some situations it is more desirable to estimate C(t) rather then F. We consider estimator for C(t) in this model as Fourier-Stieltjes transform of estimator Fn(x) = 1 — S1n(x) = 1 — exp (—A(n1)(x)) :
/TO
eitxdFn (x), t £ R.
-TO
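Since F_n is a jump function, C_n(t) is a finite Fourier–Stieltjes sum over the jumps of F_n = 1 − exp(−Λ_n^(1)). The sketch below is our own illustrative implementation with the ordinary (non-Kac) hazard estimate; for censored exponential data the target is C(t) = 1/(1 − it).

```python
import numpy as np

rng = np.random.default_rng(5)

def char_func_estimate(z, delta, t):
    """Fourier-Stieltjes transform of F_n(x) = 1 - exp(-Lambda_n(x)):
    C_n(t) = sum_j exp(i*t*Z_(j)) * (jump of F_n at Z_(j))."""
    order = np.argsort(z)
    z, delta = z[order], delta[order]
    n = len(z)
    at_risk = n - np.arange(n)
    lam_jumps = delta / at_risk                  # Nelson-Aalen increments
    surv = np.exp(-np.cumsum(lam_jumps))         # exp(-Lambda_n) at order statistics
    surv_prev = np.concatenate(([1.0], surv[:-1]))
    df_jumps = surv_prev - surv                  # jumps of F_n = 1 - exp(-Lambda_n)
    return np.sum(np.exp(1j * t * z) * df_jumps)

# Right-censored Exp(1) sample; true c.f. of Exp(1) is 1/(1 - i*t).
n = 50000
x = rng.exponential(size=n)
y = rng.exponential(2.0, size=n)
z = np.minimum(x, y)
delta = (x <= y).astype(float)
t = 0.7
c_hat = char_func_estimate(z, delta, t)
c_true = 1.0 / (1.0 - 1j * t)
assert abs(c_hat - c_true) < 0.05
```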
It follows from (39) that, as n → ∞,

    sup_{x ≤ T_n} |F_n(x) − F(x)| =_{a.s.} O( b_n² (log n/n)^{1/2} ),   (44)

where b_n^{−1} = 1 − H(T_n). It also follows from (44) that, as n → ∞,

    1 − F_n(T_n) =_{a.s.} O(1 − F(T_n)),   F_n(−T_n) =_{a.s.} O(F(−T_n)).   (45)

It is obvious that Δ_n(τ) →_{a.s.} 0 as n → ∞ for any τ < ∞, where Δ_n(τ) = sup_{|t| ≤ τ} |C_n(t) − C(t)|. Let us consider the quantity Δ_n(τ_n) for a special numerical sequence τ_n that tends to +∞ as n → ∞.
In the following theorem we prove a uniform convergence result for the empirical characteristic function.

Theorem 8. Let {τ_n, n ≥ 1} be a numerical sequence that tends to +∞ slowly enough as n → ∞. Then Δ_n(τ_n) →_{a.s.} 0 as n → ∞.

Proof. Let us choose the sequence {τ_n, n ≥ 1} so that, as n → ∞,

    γ_n = max{ 1 − F(T_n), F(−T_n), b_n² τ_n T_n (log n/n)^{1/2} } → 0,   (46)

where {T_n, n ≥ 1} is a sequence that satisfies condition (23). Introducing the truncated integrals

    b_n(t) = ∫_{|x| ≤ T_n} e^{itx} dF_n(x),   b̄_n(t) = ∫_{|x| ≤ T_n} e^{itx} dF(x),

and setting d_n(t) = b_n(t) − b̄_n(t), we have

    Δ_n(τ_n) ≤ sup_{|t| ≤ τ_n} |d_n(t)| + sup_{|t| ≤ τ_n} |b_n(t) − C_n(t)| + sup_{|t| ≤ τ_n} |b̄_n(t) − C(t)|.   (47)

Integrating by parts, we obtain

    sup_{|t| ≤ τ_n} |d_n(t)| = sup_{|t| ≤ τ_n} | ∫_{|x| ≤ T_n} e^{itx} d(F_n(x) − F(x)) | ≤
    ≤ sup_{|t| ≤ τ_n} [ | e^{itx}(F_n(x) − F(x)) |_{x=−T_n}^{x=T_n} | + |t| ∫_{|x| ≤ T_n} |F_n(x) − F(x)| dx ] ≤
    ≤ 2(1 + 2τ_n T_n) sup_{|x| ≤ T_n} |F_n(x) − F(x)|.   (48)

On the other hand,

    sup_{|t| ≤ τ_n} |b_n(t) − C_n(t)| ≤ sup_{|t| ≤ τ_n} ∫_{|x| > T_n} |e^{itx}| dF_n(x) ≤ 1 − F_n(T_n) + F_n(−T_n)   (49)

and

    sup_{|t| ≤ τ_n} |b̄_n(t) − C(t)| ≤ sup_{|t| ≤ τ_n} ∫_{|x| > T_n} |e^{itx}| dF(x) ≤ 1 − F(T_n) + F(−T_n).   (50)

Now, combining (44)-(50), we conclude that Δ_n(τ_n) =_{a.s.} O(γ_n), n → ∞. Theorem 8 is proved. □
References

[1] A.A. Abdushukurov, Nonparametric estimation of the distribution function based on relative risk function, Commun. Statist. Theory Methods, 27(1998), no. 8, 1991-2012. DOI: 10.1080/03610929808832205
[2] A.A. Abdushukurov, On nonparametric estimation of reliability indices by censored samples, Theory Probab. Appl., 43(1999), no. 1, 3-11.
[3] A.A. Abdushukurov, Statistics of Incomplete Observations, University Press, Tashkent, 2009 (in Russian).
[4] M.D. Burke, S. Csörgő, L. Horváth, Strong approximations of some biometric estimates under random censorship, Z. Wahrschein. Verw. Gebiete, 56(1981), 87-112.
[5] M.D. Burke, S. Csörgő, L. Horváth, A correction to and improvement of "Strong approximations of some biometric estimates under random censorship", Probab. Theory Related Fields, 79(1988), 51-57.
[6] S. Csörgő, Strong approximation of empirical Kac processes, Carleton Math. Lect. Notes, 26(1980), 71-86.
[7] S. Csörgő, L. Horváth, On random censorship from the right, Acta Sci. Math., 44(1982), 23-34.
[8] A. Dvoretzky, J. Kiefer, J. Wolfowitz, Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator, Ann. Math. Statist., 27(1956), 642-669.
[9] M. Kac, On deviations between theoretical and empirical distributions, Proc. Nat. Acad. Sci. USA, 35(1949), 252-257.
[10] D.M. Mason, Classical empirical process theory and weighted approximation, Comunicaciones del CIMAT, No. I-15-03, 2015.
[11] D.M. Mason, Selected definitions and results from modern empirical process theory, Comunicaciones del CIMAT, No. I-17-01, 2017.
[12] P. Massart, The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality, Ann. Probab., 18(1990), no. 3, 1269-1283.
[13] V.V. Petrov, Limit Theorems for Sums of Random Variables, Nauka, Moscow, 1987 (in Russian).
[14] A.V. Skorokhod, Random Processes with Independent Increments, Nauka, Moscow, 1964 (in Russian).
On Approximation of Empirical Kac Processes under the General Random Censorship Model

Abdurahim A. Abdushukurov
Tashkent Branch of Moscow State University
Tashkent, Uzbekistan

Gulnoz S. Saifulloeva
Navoi State Pedagogical Institute
Navoi, Uzbekistan

Abstract. The paper considers a general random censorship model and proves approximation results for empirical Kac processes. This model includes such important special cases as random censorship on the right and the competing risks model. Our results involve strong approximation theory, and optimal approximation rates are obtained. Cumulative hazard processes are also investigated. These results are applied to the estimation of the characteristic function in the random censorship model on the right.

Keywords: censored data, competing risks, empirical estimates, Kac estimate, strong approximation, Gaussian processes, characteristic function.