UDC 519.24
On Estimation of Conditional Distribution Function under Dependent Random Right Censored Data
Abdurahim A. Abdushukurov*
Department of Probability Theory and Mathematical Statistics, National University of Uzbekistan, VUZ Gorodok, Tashkent, 100174, Uzbekistan
Rustamjon S. Muradov†
Institute of Mathematics, National University of Uzbekistan, VUZ Gorodok, Tashkent, 100174, Uzbekistan
Received 10.06.2014, received in revised form 20.08.2014, accepted 01.10.2014
In this article we study a simple integral-type estimator of the distribution function under random right censoring at fixed covariate values, where the dependence between the lifetime and the censoring variable may be expressed by a given Archimedean copula. We prove an almost sure asymptotic representation, which provides a key tool for obtaining a weak convergence result for the estimator.
Keywords: fixed design, right censoring, copulas, asymptotic representation, weak convergence, Gaussian process.
Introduction
In research areas such as biomedicine, engineering, insurance and the social sciences, researchers are interested in positive variables expressed as the time until a certain event. For example, in medicine the survival time of an individual and, in industrial trials, the time until breakdown of a machine are non-negative random variables (r.v.-s) of interest. In such practical situations the observed data may be incomplete, that is, censored. This is the case, for example, in medicine when the event of interest is death due to a given cause and the censoring event is death due to another cause. In an industrial study, it may happen that a piece of equipment is taken away (that is, censored) because it shows some sign of future failure. Moreover, the r.v.-s of interest (lifetimes, failure times) and the censoring r.v.-s can usually be influenced by another variable, often called a prognostic factor or covariate. In medicine the dose of a drug, and in engineering some environmental conditions (temperature, pressure, ...), influence the observed variables. The basic problem consists in estimating the distribution of the lifetime from such censored dependent data. The aim of this paper is to consider this problem in the random right censoring model in the presence of a covariate.
*[email protected]
†r_muradov@myrambler.ru
© Siberian Federal University. All rights reserved

Let us consider the case when the support of the covariate $C$ is the interval $[0,1]$, and describe our results at fixed design points $0 \le x_1 \le x_2 \le \dots \le x_n \le 1$, at which we consider responses (survival or failure times) $X_1,\dots,X_n$ and censoring times $Y_1,\dots,Y_n$ of identical objects under study. These responses are independent nonnegative r.v.-s with conditional distribution function (d.f.) at $x_i$, $F_{x_i}(t) = P(X_i \le t \mid C_i = x_i)$. They are subject to random right censoring, that is, for $X_i$ there is a censoring variable $Y_i$ with conditional d.f. $G_{x_i}(t) = P(Y_i \le t \mid C_i = x_i)$, and at the $n$-th stage of the experiment the observed data are
$$S^{(n)} = \{(Z_i, \delta_i, C_i),\ 1 \le i \le n\},$$
where $Z_i = \min(X_i, Y_i)$ and $\delta_i = I(X_i \le Y_i)$, with $I(A)$ denoting the indicator of the event $A$. Note that in the sample $S^{(n)}$ the r.v. $X_i$ is observed only when $\delta_i = 1$. In survival analysis it is common to assume independence between the r.v.-s $X_i$ and $Y_i$ conditional on the covariate $C_i$. In some practical situations, however, this assumption does not hold. Therefore, in this article we consider a model in which the dependence structure is described through a copula function. So let
$$S_x(t_1,t_2) = P(X_x > t_1, Y_x > t_2), \quad t_1, t_2 \ge 0,$$
be the joint survival function of the response $X_x$ and the censoring variable $Y_x$ at $x$. Then the marginal survival functions are $S_x^X(t) = 1 - F_x(t) = S_x(t,0)$ and $S_x^Y(t) = 1 - G_x(t) = S_x(0,t)$, $t \ge 0$. We suppose that the marginal d.f.-s $F_x$ and $G_x$ are continuous. Then, according to Sklar's theorem (see [1]), the joint survival function $S_x(t_1,t_2)$ can be expressed as
$$S_x(t_1,t_2) = C_x\big(S_x^X(t_1), S_x^Y(t_2)\big), \quad t_1, t_2 \ge 0, \tag{1}$$
where $C_x(u,v)$ is a known copula function depending on $x$, $S_x^X$ and $S_x^Y$ in a general way. In the case of no covariates this idea was first considered by Zheng and Klein [2], who proposed the copula-graphic estimator. Rivest and Wells [3] investigated the copula-graphic estimator and derived a closed-form expression for it when the joint survival function (1) is modelled by an Archimedean copula. The copula-graphic estimator was then shown to be uniformly consistent and asymptotically normal. Note that the copula-graphic estimator is equivalent to the product-limit estimator of Kaplan and Meier [4] when the survival and censoring times are assumed to be independent. Braekers and Veraverbeke [5] extended the copula-graphic estimator to the fixed design regression case and showed that the estimator has an asymptotic representation and a Gaussian limit. We consider another estimator of the d.f. $F_x$, which has a simpler form than the copula-graphic estimator and is equivalent to the usual exponential-hazard estimator under independent censoring. We study the large-sample properties of the proposed estimator and present a uniform asymptotic normality result with the same limiting Gaussian process as for the copula-graphic estimator.
1. Construction of estimator and asymptotic results
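Assume that at the fixed design value $x \in (0,1)$, $C_x$ in (1) is an Archimedean copula, i.e.
$$S_x(t_1,t_2) = \varphi_x^{[-1]}\big(\varphi_x(S_x^X(t_1)) + \varphi_x(S_x^Y(t_2))\big), \quad t_1, t_2 \ge 0, \tag{2}$$
where, for each $x$, $\varphi_x : [0,1] \to [0,+\infty]$ is a known continuous, convex, strictly decreasing function with $\varphi_x(1) = 0$. Here $\varphi_x^{[-1]}$ is the pseudo-inverse of $\varphi_x$ (see Nelsen [1]), given by
$$\varphi_x^{[-1]}(s) = \begin{cases} \varphi_x^{-1}(s), & 0 \le s \le \varphi_x(0), \\ 0, & \varphi_x(0) \le s \le \infty. \end{cases}$$
We assume that the copula generator function is strict, i.e. $\varphi_x(0) = \infty$, and hence $\varphi_x^{[-1]} = \varphi_x^{-1}$. From (2) it follows that
$$P(Z_x > t) = 1 - H_x(t) = \bar H_x(t) = S_x^Z(t) = S_x(t,t) = \varphi_x^{-1}\big(\varphi_x(S_x^X(t)) + \varphi_x(S_x^Y(t))\big), \quad t \ge 0. \tag{3}$$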
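As an illustration of (2) and (3), the following minimal Python sketch uses the Clayton generator $\varphi(u) = (u^{-\theta} - 1)/\theta$, which is strict for $\theta > 0$. The choice of this generator, the value of `THETA`, the exponential margins and all function names are assumptions made only for this example; they do not come from the paper. The sketch compares $S_x^Z(t)$ computed from (3) with the empirical fraction of simulated pairs for which $\min(X, Y) > t$.

```python
import math
import random

THETA = 2.0  # hypothetical Clayton dependence parameter (theta > 0)

def phi(u):
    """Clayton generator on (0, 1]: phi(1) = 0, phi(u) -> infinity as u -> 0+."""
    return (u ** (-THETA) - 1.0) / THETA

def phi_inv(s):
    """Inverse generator on [0, infinity)."""
    return (1.0 + THETA * s) ** (-1.0 / THETA)

def joint_survival(s_x, s_y):
    """Right-hand side of (2) at marginal survival values s_x, s_y in (0, 1]."""
    return phi_inv(phi(s_x) + phi(s_y))

def sample_pair(rng, rate_x=1.0, rate_y=0.5):
    """Draw (X, Y) with exponential margins (assumed) and Clayton survival copula."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    w = 1.0 - rng.random()
    # Conditional inversion for the Clayton copula: solve dC(u, v)/du = w for v.
    v = ((w ** (-THETA / (1.0 + THETA)) - 1.0) * u ** (-THETA) + 1.0) ** (-1.0 / THETA)
    # S^X(X) = u and S^Y(Y) = v for exponential survival functions.
    return -math.log(u) / rate_x, -math.log(v) / rate_y

t = 1.0
# Relation (3): S^Z(t) = S(t, t) with S^X(t) = exp(-t), S^Y(t) = exp(-0.5 t).
s_z_formula = joint_survival(math.exp(-1.0 * t), math.exp(-0.5 * t))

rng = random.Random(0)
pairs = [sample_pair(rng) for _ in range(20000)]
s_z_empirical = sum(1 for x, y in pairs if min(x, y) > t) / len(pairs)
```

With a strict generator the pseudo-inverse coincides with the ordinary inverse, which is why `phi_inv` needs no truncation at $\varphi(0)$.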
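Let $H_x^{(1)}(t) = P(Z_x \le t, \delta_x = 1)$ be a subdistribution function and let $\Lambda_x(t)$ be the crude hazard function of the r.v. $X_x$ subject to censoring by $Y_x$,
$$\Lambda_x(dt) = \frac{P(X_x \in dt,\ X_x \le Y_x)}{P(X_x \ge t,\ Y_x \ge t)} = \frac{H_x^{(1)}(dt)}{S_x^Z(t-)}. \tag{4}$$
From (4) one can obtain the following expression for the survival function $S_x^X$:
$$S_x^X(t) = \varphi_x^{-1}\left[-\int_0^t S_x^Z(u-)\,\varphi_x'(S_x^Z(u))\,d\Lambda_x(u)\right] = \varphi_x^{-1}\left[-\int_0^t \varphi_x'(S_x^Z(u))\,dH_x^{(1)}(u)\right], \quad t \ge 0, \tag{5}$$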
(see, for example, [3, 5]). In order to construct an estimator of $S_x^X$ according to representation (5), we introduce smoothed estimators of $S_x^Z$ and $H_x^{(1)}$ and regularity conditions for them. Similarly to Braekers and Veraverbeke [5], we use the Gasser-Müller weights
$$\begin{aligned} w_{ni}(x,h_n) &= \frac{1}{q_n(x,h_n)} \int_{x_{i-1}}^{x_i} \frac{1}{h_n}\,\pi\!\left(\frac{x-z}{h_n}\right) dz, \quad i = 1,\dots,n, \\ q_n(x,h_n) &= \int_0^{x_n} \frac{1}{h_n}\,\pi\!\left(\frac{x-z}{h_n}\right) dz, \end{aligned} \tag{6}$$
where $x_0 = 0$, $\pi$ is a known probability density function (kernel) and $\{h_n, n \ge 1\}$ is a sequence of positive constants tending to zero as $n \to \infty$, called the bandwidth sequence. Let us introduce the weighted estimators of $H_x$, $S_x^Z$ and $H_x^{(1)}$, respectively, as
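$$\begin{aligned} H_{xh}(t) &= \sum_{i=1}^n w_{ni}(x,h_n)\, I(Z_i \le t), \\ S_{xh}^Z(t) &= 1 - H_{xh}(t), \\ H_{xh}^{(1)}(t) &= \sum_{i=1}^n w_{ni}(x,h_n)\, I(Z_i \le t, \delta_i = 1). \end{aligned} \tag{7}$$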
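The weights (6) and the weighted empiricals (7) can be sketched as follows. The sketch assumes the Epanechnikov kernel on $[-1,1]$, whose distribution function is available in closed form, so each cell integral in (6) becomes a difference of two values of that function. The kernel choice, the function names and the toy data are assumptions for illustration only.

```python
def kernel_cdf(u):
    """CDF of the Epanechnikov kernel 0.75 * (1 - u^2) on [-1, 1] (assumed kernel)."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3

def gasser_muller_weights(x, design, h):
    """Weights (6): kernel mass on each cell (x_{i-1}, x_i], normalised by q_n."""
    edges = [0.0] + list(design)  # x_0 = 0
    # Integral over (a, b] of (1/h) * pi((x - z)/h) dz equals
    # kernel_cdf((x - a)/h) - kernel_cdf((x - b)/h) after substituting v = (x - z)/h.
    raw = [kernel_cdf((x - a) / h) - kernel_cdf((x - b) / h)
           for a, b in zip(edges[:-1], edges[1:])]
    q_n = sum(raw)  # integral over (0, x_n]
    return [r / q_n for r in raw]

def weighted_edf(t, weights, z, delta=None):
    """H_xh(t) from (7), or H_xh^(1)(t) when the indicators delta are supplied."""
    total = 0.0
    for i, w in enumerate(weights):
        if z[i] <= t and (delta is None or delta[i] == 1):
            total += w
    return total

# Hypothetical toy sample on an equispaced design.
design = [k / 10 for k in range(1, 11)]
z = [0.8, 2.5, 1.1, 0.4, 3.0, 1.7, 0.9, 2.2, 0.6, 1.3]
delta = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
weights = gasser_muller_weights(0.5, design, 0.2)
h_at_1 = weighted_edf(1.0, weights, z)
h1_at_1 = weighted_edf(1.0, weights, z, delta)
```

Because the cell integrals add up to $q_n(x,h_n)$, the weights sum to one, so $H_{xh}$ is a proper distribution function and $H_{xh}^{(1)} \le H_{xh}$.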
Then, plugging the estimators (7) into (5), we get the corresponding estimator of $S_x^X(t)$:
$$S_{xh}^X(t) = 1 - F_{xh}(t) = \varphi_x^{-1}\left[-\int_0^t \varphi_x'(S_{xh}^Z(u))\,dH_{xh}^{(1)}(u)\right], \quad t \ge 0. \tag{8}$$
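Since $H_{xh}^{(1)}$ is a step function, the integral in (8) is a finite sum over the uncensored observations. The sketch below is one possible way to evaluate it, under two assumptions that are not stated in the paper: there are no ties among the $Z_i$, and $S_{xh}^Z$ is evaluated at the left limit $u-$ of each jump point so that $\varphi_x'$ stays finite at the largest observation. The function name and the toy data are hypothetical.

```python
import math

def estimate_cdf(t, weights, z, delta, dphi, phi_inv):
    """Sketch of (8): F_xh(t) = 1 - phi^{-1}( -sum_i w_i * delta_i * phi'(S(Z_i-)) ),
    the sum running over observations with Z_i <= t (left limits, no ties assumed)."""
    order = sorted(range(len(z)), key=lambda i: z[i])
    surv_left = 1.0  # value of S_xh^Z just before the current observation
    total = 0.0
    for i in order:
        if z[i] > t:
            break
        if delta[i] == 1 and weights[i] > 0.0:
            total -= weights[i] * dphi(surv_left)  # phi' < 0, so total grows
        surv_left -= weights[i]
    return 1.0 - phi_inv(total)

# Independence generator phi(u) = -log(u): phi'(u) = -1/u, phi^{-1}(s) = exp(-s).
weights = [0.25, 0.25, 0.25, 0.25]   # hypothetical equal weights
z = [0.5, 1.5, 2.0, 4.0]
delta = [0, 1, 1, 0]
f_hat = estimate_cdf(2.0, weights, z, delta,
                     dphi=lambda u: -1.0 / u,
                     phi_inv=lambda s: math.exp(-s))
```

With the independence generator and equal weights the sum is $1/3 + 1/2$, the usual cumulative-hazard sum over the risk sets, so `f_hat` equals $1 - e^{-5/6}$, an exponential-hazard type estimate.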
Note that in the case of no covariate, estimator (8) reduces to the estimator first obtained by Zheng and Klein [2]. In the case of the independence copula, $\varphi(y) = -\log y$, the Zheng and Klein estimate reduces to an exponential-hazard estimate (see [8, 9]). It is also well known that under independent censoring the Kaplan-Meier product-limit estimator and the exponential-hazard estimator are asymptotically equivalent. Accordingly, we will show that estimator (8) and the copula-graphic estimator of Braekers and Veraverbeke have the same asymptotic behaviour.
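For the design points $x_1, \dots, x_n$, denote
$$\underline{\Delta}_n = \min_{1 \le i \le n}(x_i - x_{i-1}), \quad \overline{\Delta}_n = \max_{1 \le i \le n}(x_i - x_{i-1}).$$
For the kernel $\pi$, let
$$\|\pi\|_2^2 = \int_{-\infty}^{\infty} \pi^2(u)\,du, \quad m_\nu(\pi) = \int_{-\infty}^{\infty} u^\nu \pi(u)\,du, \ \nu = 1, 2, \quad \|\pi\|_\infty = \sup_{u \in \mathbb{R}} \pi(u).$$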
Moreover, we use the following assumptions on the design and on the kernel function:
(A1) As $n \to \infty$, $x_n \to 1$, $\overline{\Delta}_n = O(n^{-1})$, $\overline{\Delta}_n - \underline{\Delta}_n = o(n^{-1})$.
(A2) $\pi$ is a probability density function with compact support $[-M, M]$ for some $M > 0$, with $m_1(\pi) = 0$ and $|\pi(u) - \pi(u')| \le C(\pi)|u - u'|$, where $C(\pi)$ is some constant.
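To make the kernel quantities concrete, the sketch below evaluates them numerically for the Epanechnikov kernel, which is only an assumed example (the paper does not fix a kernel). It has compact support $[-1,1]$, so $M = 1$, and is Lipschitz with constant $3/2$, so it is compatible with (A2). The helper `integrate` is a plain midpoint rule written for this illustration.

```python
def kernel(u):
    """Epanechnikov density on [-1, 1] (assumed example kernel, M = 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def integrate(f, a, b, steps=20000):
    """Composite midpoint rule on [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (k + 0.5) * width) for k in range(steps))

mass = integrate(kernel, -1.0, 1.0)                           # total mass of pi
m1 = integrate(lambda u: u * kernel(u), -1.0, 1.0)            # m_1(pi), zero under (A2)
m2 = integrate(lambda u: u * u * kernel(u), -1.0, 1.0)        # m_2(pi)
l2_norm_sq = integrate(lambda u: kernel(u) ** 2, -1.0, 1.0)   # ||pi||_2^2
sup_norm = kernel(0.0)                                        # ||pi||_inf, attained at 0
```

For this kernel the exact values are $m_2(\pi) = 1/5$, $\|\pi\|_2^2 = 3/5$ and $\|\pi\|_\infty = 3/4$; the numerical values agree with them up to the quadrature error.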
Let $T_{H_x} = \inf\{t \ge 0 : H_x(t) = 1\}$. Then $T_{H_x} = \min(T_{F_x}, T_{G_x})$. For our results we need some smoothness conditions on the functions $H_x(t)$ and $H_x^{(1)}(t)$. We formulate them for a general (sub)distribution function $N_x(t)$, $0 \le x \le 1$, $t \in \mathbb{R}$, and for a fixed $T > 0$.
(A3) $\frac{\partial}{\partial x} N_x(t) = \dot N_x(t)$ exists and is continuous in $(x,t) \in [0,1] \times [0,T]$.
(A4) $\frac{\partial}{\partial t} N_x(t) = N_x'(t)$ exists and is continuous in $(x,t) \in [0,1] \times [0,T]$.
(A5) $\frac{\partial^2}{\partial x^2} N_x(t) = \ddot N_x(t)$ exists and is continuous in $(x,t) \in [0,1] \times [0,T]$.
(A6) $\frac{\partial^2}{\partial t^2} N_x(t) = N_x''(t)$ exists and is continuous in $(x,t) \in [0,1] \times [0,T]$.
(A7) $\frac{\partial^2}{\partial x\,\partial t} N_x(t) = \dot N_x'(t)$ exists and is continuous in $(x,t) \in [0,1] \times [0,T]$.
(A8) $\frac{\partial \varphi_x(u)}{\partial u} = \varphi_x'(u)$ and $\frac{\partial^2 \varphi_x(u)}{\partial u^2} = \varphi_x''(u)$ are Lipschitz in the $x$-direction with a bounded Lipschitz constant, and $\frac{\partial^3 \varphi_x(u)}{\partial u^3} = \varphi_x'''(u) \le 0$ exists and is continuous in $(x,u) \in [0,1] \times (0,1]$.
It is clear that for the right-hand side of representation (5) to exist we must require condition (A4) for the functions $H_x(t)$ and $H_x^{(1)}(t)$ in $[0,1] \times [0,T]$ with $T < T_{H_x}$, and the existence of $\varphi_x'(u)$ on $[0,1] \times (0,1]$.
We derive an almost sure representation result with rate.
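Theorem 1.1. Assume (A1), (A2), $H_x(t)$ and $H_x^{(1)}(t)$ satisfy (A5)-(A7) in $[0,T]$ with $T < T_{H_x}$, $\varphi_x$ satisfies (A8) and $h_n \to 0$, $\frac{\log n}{n h_n} \to 0$, $\frac{n h_n^5}{\log n} = O(1)$. Then, as $n \to \infty$,
$$F_{xh}(t) - F_x(t) = \sum_{i=1}^n w_{ni}(x,h_n)\,\Psi_{tx}(Z_i, \delta_i) + r_n(t),$$
where
$$\begin{aligned} \Psi_{tx}(Z_i,\delta_i) = -\frac{1}{\varphi_x'(S_x^X(t))} \Bigg[ &\int_0^t \varphi_x''(S_x^Z(u))\,\big(I(Z_i \le u) - H_x(u)\big)\,dH_x^{(1)}(u) \\ &- \varphi_x'(S_x^Z(t))\,\big(I(Z_i \le t, \delta_i = 1) - H_x^{(1)}(t)\big) \\ &- \int_0^t \varphi_x''(S_x^Z(u))\,\big(I(Z_i \le u, \delta_i = 1) - H_x^{(1)}(u)\big)\,dH_x(u) \Bigg] \end{aligned}$$
and
$$\sup_{0 \le t \le T} |r_n(t)| \overset{a.s.}{=} O\left(\left(\frac{\log n}{n h_n}\right)^{3/4}\right).$$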
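A quick numerical look at the remainder rate may help. After multiplication by the normalising factor $(nh_n)^{1/2}$, the bound on $r_n$ becomes $(\log n)^{3/4}(nh_n)^{-1/4}$. The sketch below tabulates this quantity for the bandwidth $h_n = c\,n^{-1/5}$; the constant `c`, the sample sizes and the function name are assumptions chosen for illustration.

```python
import math

def scaled_remainder_rate(n, c=1.0):
    """(n h_n)^{1/2} * (log n / (n h_n))^{3/4} for the bandwidth h_n = c * n^{-1/5}."""
    h = c * n ** (-0.2)
    nh = n * h
    return math.sqrt(nh) * (math.log(n) / nh) ** 0.75

sample_sizes = (10 ** 3, 10 ** 6, 10 ** 9, 10 ** 12)
rates = [scaled_remainder_rate(n) for n in sample_sizes]
```

The values decrease, although slowly, which is consistent with the normalised remainder being asymptotically negligible whenever $(\log n)^3/(nh_n) \to 0$.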
The weak convergence of the process $(nh_n)^{1/2}\{F_{xh}(\cdot) - F_x(\cdot)\}$ in the space $\ell^\infty[0,T]$ of uniformly bounded functions on $[0,T]$, endowed with the uniform topology, is the content of the next theorem.
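Theorem 1.2. Assume (A1), (A2), $H_x(t)$ and $H_x^{(1)}(t)$ satisfy (A5)-(A7) in $[0,T]$ with $T < T_{H_x}$, and that $\varphi_x$ satisfies (A8).
(I) If $n h_n^5 \to 0$ and $\frac{(\log n)^3}{n h_n} \to 0$, then, as $n \to \infty$,
$$(nh_n)^{1/2}\{F_{xh}(\cdot) - F_x(\cdot)\} \Rightarrow W_x(\cdot) \quad \text{in } \ell^\infty[0,T].$$
(II) If $h_n = C n^{-1/5}$ for some $C > 0$, then, as $n \to \infty$,
$$(nh_n)^{1/2}\{F_{xh}(\cdot) - F_x(\cdot)\} \Rightarrow W_x^*(\cdot) \quad \text{in } \ell^\infty[0,T],$$
where $W_x(\cdot)$ and $W_x^*(\cdot)$ are Gaussian processes with means
$$E W_x(t) = 0, \quad E W_x^*(t) = a_x(t),$$
and the same covariance
$$\mathrm{Cov}(W_x(t), W_x(s)) = \mathrm{Cov}(W_x^*(t), W_x^*(s)) = \Gamma_x(t,s),$$
with
$$a_x(t) = -\frac{C^{5/2}\, m_2(\pi)}{2\,\varphi_x'(S_x^X(t))} \int_0^t \Big[\varphi_x''(S_x^Z(u))\,\ddot H_x(u)\,dH_x^{(1)}(u) - \varphi_x'(S_x^Z(u))\,d\ddot H_x^{(1)}(u)\Big]$$
and
$$\begin{aligned} \Gamma_x(t,s) = \frac{\|\pi\|_2^2}{\varphi_x'(S_x^X(t))\,\varphi_x'(S_x^X(s))} \Bigg\{ &\int_0^{\min(t,s)} \big(\varphi_x'(S_x^Z(z))\big)^2\,dH_x^{(1)}(z) \\ &+ \int_0^{\min(t,s)} \big[\varphi_x''(S_x^Z(w))\,S_x^Z(w) + \varphi_x'(S_x^Z(w))\big] \int_0^w \varphi_x''(S_x^Z(y))\,dH_x^{(1)}(y)\,dH_x^{(1)}(w) \\ &+ \int_0^{\min(t,s)} \varphi_x''(S_x^Z(w)) \int_w^{\max(t,s)} \big(\varphi_x''(S_x^Z(y))\,S_x^Z(y) + \varphi_x'(S_x^Z(y))\big)\,dH_x^{(1)}(y)\,dH_x^{(1)}(w) \\ &- \int_0^t \big[\varphi_x''(S_x^Z(y))\,S_x^Z(y) + \varphi_x'(S_x^Z(y))\big]\,dH_x^{(1)}(y) \int_0^s \big[\varphi_x''(S_x^Z(w))\,S_x^Z(w) + \varphi_x'(S_x^Z(w))\big]\,dH_x^{(1)}(w) \Bigg\}. \end{aligned}$$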
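The covariance $\Gamma_x(t,s)$ contains the factor $\varphi_x''(u)\,u + \varphi_x'(u)$, evaluated at $u = S_x^Z(\cdot)$, in all terms except the first. The sketch below evaluates this factor for the independence generator $\varphi(u) = -\log u$, where it vanishes identically, and for the Clayton generator, where it equals $\theta u^{-\theta-1}$. The Clayton generator, the value $\theta = 2$ and the function names are assumptions used only for illustration.

```python
def k_independence(u):
    """phi''(u) * u + phi'(u) for the independence generator phi(u) = -log(u)."""
    d1 = -1.0 / u          # phi'(u)
    d2 = 1.0 / (u * u)     # phi''(u)
    return d2 * u + d1

def k_clayton(u, theta):
    """Same factor for the Clayton generator (u**(-theta) - 1) / theta (assumed example)."""
    d1 = -u ** (-theta - 1.0)                   # phi'(u)
    d2 = (theta + 1.0) * u ** (-theta - 2.0)    # phi''(u)
    return d2 * u + d1

grid = (0.1, 0.5, 0.9)
values_indep = [k_independence(u) for u in grid]
values_clayton = [k_clayton(u, 2.0) for u in grid]
```

Under independence only the first term of $\Gamma_x(t,s)$ survives, and since $\varphi'(u) = -1/u$ it becomes $\|\pi\|_2^2\, S_x^X(t)\, S_x^X(s) \int_0^{\min(t,s)} dH_x^{(1)}(z)/(S_x^Z(z))^2$.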
2. Proofs of Theorems 1.1 and 1.2
In order to prove Theorems 1.1 and 1.2 we need some auxiliary results for the empiricals $H_{xh}$ and $H_{xh}^{(1)}$. While Lemma 2.1 below (i.e. Lemma A4 from [9]) on the rates of strong uniform consistency of weighted empiricals is formulated only for $H_{xh}$, it remains true for $H_{xh}^{(1)}$ and is proved in exactly the same way.
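Lemma 2.1 ([9]). (I) Assume (A1), (A2), $H_x(t)$ satisfies (A3), $h_n \to 0$, $nh_n \to \infty$, $\frac{n h_n^3}{\log n} = O(1)$. Then, as $n \to \infty$,
$$\sup_{0 \le t \le T} |H_{xh}(t) - H_x(t)| \overset{a.s.}{=} O\left(\left(\frac{\log n}{n h_n}\right)^{1/2}\right).$$
(II) Assume (A1), (A2), $H_x(t)$ satisfies (A3) and (A5), $h_n \to 0$, $\frac{n h_n^5}{\log n} = O(1)$. Then, as $n \to \infty$,
$$\sup_{0 \le t \le T} |H_{xh}(t) - H_x(t)| \overset{a.s.}{=} O\left(\left(\frac{\log n}{n h_n}\right)^{1/2}\right).$$
The next Lemma 2.2 (Lemma 2 in [5]) provides the convergence rate of Theorem 1.1.
Lemma 2.2 ([5]). Under the conditions of Theorem 1.1, as $n \to \infty$,
$$\sup_{0 \le t \le T} \left| -\int_0^t \big[\varphi_x'(S_{xh}^Z(u)) - \varphi_x'(S_x^Z(u))\big]\, d\big(H_{xh}^{(1)}(u) - H_x^{(1)}(u)\big) \right| \overset{a.s.}{=} O\left(\left(\frac{\log n}{n h_n}\right)^{3/4}\right).$$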
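Proof of Theorem 1.1. Applying a second-order Taylor expansion, we have
$$\begin{aligned} F_{xh}(t) - F_x(t) &= -\big(S_{xh}^X(t) - S_x^X(t)\big) \\ &= -\left\{ \varphi_x^{-1}\left[-\int_0^t \varphi_x'(S_{xh}^Z(u))\,dH_{xh}^{(1)}(u)\right] - \varphi_x^{-1}\left[-\int_0^t \varphi_x'(S_x^Z(u))\,dH_x^{(1)}(u)\right] \right\} \\ &= -\frac{1}{\varphi_x'(S_x^X(t))} \left\{ -\int_0^t \varphi_x'(S_{xh}^Z(u))\,dH_{xh}^{(1)}(u) + \int_0^t \varphi_x'(S_x^Z(u))\,dH_x^{(1)}(u) \right\} \\ &\quad + \frac{\varphi_x''\big(\varphi_x^{-1}(\theta_{xh}(t))\big)}{2\big[\varphi_x'\big(\varphi_x^{-1}(\theta_{xh}(t))\big)\big]^3} \left\{ -\int_0^t \varphi_x'(S_{xh}^Z(u))\,dH_{xh}^{(1)}(u) + \int_0^t \varphi_x'(S_x^Z(u))\,dH_x^{(1)}(u) \right\}^2 \\ &= A_n(t) + B_n(t), \end{aligned} \tag{9}$$
where $\theta_{xh}(t)$ lies between
$$-\int_0^t \varphi_x'(S_{xh}^Z(u))\,dH_{xh}^{(1)}(u) \quad \text{and} \quad -\int_0^t \varphi_x'(S_x^Z(u))\,dH_x^{(1)}(u).$$
In (9) we rewrite the first summand as
$$A_n(t) = -\frac{1}{\varphi_x'(S_x^X(t))} \big[Q_{n1}(t) + Q_{n2}(t) + Q_{n3}(t)\big], \tag{10}$$
where
$$Q_{n1}(t) = -\int_0^t \big[\varphi_x'(S_{xh}^Z(u)) - \varphi_x'(S_x^Z(u))\big]\,dH_x^{(1)}(u),$$
$$Q_{n2}(t) = -\int_0^t \varphi_x'(S_x^Z(u))\,d\big(H_{xh}^{(1)}(u) - H_x^{(1)}(u)\big),$$
and
$$Q_{n3}(t) = -\int_0^t \big[\varphi_x'(S_{xh}^Z(u)) - \varphi_x'(S_x^Z(u))\big]\,d\big(H_{xh}^{(1)}(u) - H_x^{(1)}(u)\big).$$
From Lemma 2.2, we get
$$\sup_{0 \le t \le T} |Q_{n3}(t)| \overset{a.s.}{=} O\left(\left(\frac{\log n}{n h_n}\right)^{3/4}\right). \tag{11}$$
Furthermore, for $0 \le t \le T < T_{H_x}$, again by Taylor expansion and using $S_{xh}^Z - S_x^Z = -(H_{xh} - H_x)$,
$$\begin{aligned} Q_{n1}(t) &= \int_0^t \varphi_x''(S_x^Z(u))\,\big(H_{xh}(u) - H_x(u)\big)\,dH_x^{(1)}(u) - \int_0^t \frac{1}{2}\,\varphi_x'''(\eta_{xh}(u))\,\big(H_{xh}(u) - H_x(u)\big)^2\,dH_x^{(1)}(u) \\ &= \int_0^t \varphi_x''(S_x^Z(u))\,\big(H_{xh}(u) - H_x(u)\big)\,dH_x^{(1)}(u) + q_n(t), \end{aligned} \tag{12}$$
where $\eta_{xh}(u) \in [\min(\bar H_{xh}(u), \bar H_x(u)), \max(\bar H_{xh}(u), \bar H_x(u))]$ with $\bar H_{xh} = 1 - H_{xh}$, and, from Lemma 2.1,
$$\sup_{0 \le t \le T} |q_n(t)| \overset{a.s.}{=} O\left(\frac{\log n}{n h_n}\right). \tag{13}$$
Integrating by parts and using $dS_x^Z(u) = -dH_x(u)$, we rewrite $Q_{n2}(t)$ as
$$Q_{n2}(t) = -\varphi_x'(S_x^Z(t))\,\big(H_{xh}^{(1)}(t) - H_x^{(1)}(t)\big) - \int_0^t \varphi_x''(S_x^Z(u))\,\big(H_{xh}^{(1)}(u) - H_x^{(1)}(u)\big)\,dH_x(u). \tag{14}$$
Therefore, from (10)-(14) and Lemma 2.1, we have
$$\sup_{0 \le t \le T} |A_n(t)| \overset{a.s.}{=} O\left(\left(\frac{\log n}{n h_n}\right)^{1/2}\right). \tag{15}$$
Since
$$\sup_{0 \le t \le T} |B_n(t)| \overset{a.s.}{=} O\left(\Big(\sup_{0 \le t \le T} |A_n(t)|\Big)^2\right), \tag{16}$$
it follows from (15) that
$$\sup_{0 \le t \le T} |B_n(t)| \overset{a.s.}{=} O\left(\frac{\log n}{n h_n}\right). \tag{17}$$
Then, finally, from (9)-(17) we obtain that for $0 \le t \le T < T_{H_x}$, as $n \to \infty$,
$$\begin{aligned} F_{xh}(t) - F_x(t) \overset{a.s.}{=} &-\frac{1}{\varphi_x'(S_x^X(t))} \Bigg\{ \int_0^t \varphi_x''(S_x^Z(u))\,\big(H_{xh}(u) - H_x(u)\big)\,dH_x^{(1)}(u) \\ &- \varphi_x'(S_x^Z(t))\,\big(H_{xh}^{(1)}(t) - H_x^{(1)}(t)\big) - \int_0^t \varphi_x''(S_x^Z(u))\,\big(H_{xh}^{(1)}(u) - H_x^{(1)}(u)\big)\,dH_x(u) \Bigg\} \\ &+ O\left(\left(\frac{\log n}{n h_n}\right)^{3/4}\right) = \sum_{i=1}^n w_{ni}(x,h_n)\,\Psi_{tx}(Z_i,\delta_i) + O\left(\left(\frac{\log n}{n h_n}\right)^{3/4}\right), \end{aligned}$$
where the last equality uses $\sum_{i=1}^n w_{ni}(x,h_n) = 1$. This completes the proof of Theorem 1.1.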
Note that the almost sure representation of Theorem 1.1 plays a key role in the investigation of estimator (8) and, in particular, provides a basic tool for obtaining the weak convergence result of Theorem 1.2. The main summand of this representation is the same as in the case of the copula-graphic estimator from [5]. Hence the proof of Theorem 1.2 can be carried out along the lines of the proof of Theorem 2 from [5], and is therefore omitted. Thus, estimator (8) and the copula-graphic estimator are asymptotically equivalent.
References
[1] R.B.Nelsen, An Introduction to Copulas, Springer, New York, 1999.
[2] M.Zheng, J.P.Klein, Estimates of marginal survival for dependent competing risks based on an assumed copula, Biometrika, 82(1995), 127-138.
[3] L.P.Rivest, M.T.Wells, A martingale approach to the copula-graphic estimator for the survival function under dependent censoring, Journal of Multivariate Analysis, 79(2001), 138-155.
[4] E.L.Kaplan, P.Meier, Nonparametric estimation from incomplete observations, Journal of the American Statistical Association, 53(1958), 457-481.
[5] R.Braekers, N.Veraverbeke, A copula-graphic estimator for the conditional survival function under dependent censoring, The Canadian Journal of Statistics, 33(2005), no. 3, 429-447.
[6] T.R.Fleming, D.P.Harrington, Counting Processes and Survival Analysis, Wiley, New York, 1991.
[7] R.S.Muradov, A.A.Abdushukurov, Estimation of multivariate distributions and its mixtures, LAMBERT Academic Publishing, 2011 (in Russian).
[8] A.A.Abdushukurov, R.S.Muradov, Estimation of survival and mean residual life functions from dependent random censored data, New Trends in Mathematical Sciences, 2(2014), no. 1, 35-48.
[9] I.Van Keilegom, N.Veraverbeke, Estimation and the bootstrap with censored data in fixed design nonparametric regression, Ann. Inst. Statist. Math., 49(1997), no. 3, 467-491.
On Estimation of the Conditional Distribution Function under Dependent Random Right Censoring
Abdurahim A. Abdushukurov, Rustamjon S. Muradov
In this article we study a simple integral-type estimator of the distribution function of randomly censored observations at fixed covariates, where the dependence between the lifetime and the censoring random variable is expressed through Archimedean copulas. For the estimator we prove an asymptotic representation with probability one, which provides a key approach for obtaining a weak convergence result.
Keywords: fixed design, right censoring, copulas, asymptotic representation, weak convergence, Gaussian process.