DOI: 10.17516/1997-1397-2020-13-4-422-430 UDC 519.2
On Estimation of Bivariate Survival Function from Random Censored Data
Abdurakhim A. Abdushukurov*
Moscow State University, Tashkent Branch, Tashkent, Uzbekistan
Rustamjon S. Muradov
Namangan Institute of Engineering & Technology, Namangan, Uzbekistan
Received 11.03.2020, received in revised form 08.05.2020, accepted 16.06.2020
Abstract. At present there are several approaches to estimating survival functions of vectors of lifetimes. However, some of these estimators are either inconsistent or not fully defined in the range of joint survival functions, and therefore they are not applicable in practice. In this paper three types of estimates of the bivariate survival function, of exponential-hazard, product-limit and relative-risk power structure, are considered when the number of summands in the empirical estimates is replaced with a sequence of Poisson random variables. It is shown that the proposed estimates are asymptotically equivalent.
Keywords: bivariate survival function, Poisson random variables, empirical estimates.
Citation: A.A. Abdushukurov, R.S. Muradov, On Estimation of Bivariate Survival Function from Random Censored Data, J. Sib. Fed. Univ. Math. Phys., 2020, 13(4), 422-430. DOI: 10.17516/1997-1397-2020-13-4-422-430.
Introduction
The problem of estimating a multivariate distribution (or survival) function from incomplete data has been studied since the early 1980s (Campbell (1981), Campbell & Foldes (1982), Hanley & Parnes (1983), Horvath (1983), Tsai, Leurgans & Crowley (1986), Burke (1988), Dabrowska (1988, 1989), Gill (1992), Huang (2000), Abdushukurov (2004), etc.; see [1-20]). In the bivariate case there are numerous examples of paired data subject to random censoring: the lifetimes of pairs of individuals (twins or married couples), the failure times of two components of a system, and others. At present there are several approaches to estimating survival functions of vectors of lifetimes. However, some of these estimators are either inconsistent or not fully defined in the range of joint survival functions, and hence they are not applicable in practice. In this work we present estimators of the bivariate survival function and study some of their properties. We extend some results given in [1-4] to Poisson random summation. At the end of the paper we present consistent estimators of the parameters of the Marshall-Olkin exponential distribution.
1. Random right censoring model
Let $\mathbb X=\{X_i=(X_{1i},X_{2i})\}_{i=1}^{\infty}$ be a sequence of independent and identically distributed (i.i.d.) two-dimensional random vectors with a common continuous survival function $F(s,t)=P(X_{11}>s,\,X_{21}>t)$, $(s,t)\in\overline{\mathbb R}_+^2=[0,\infty)\times[0,\infty)$. This sequence is censored from the right by a sequence $\mathbb Y=\{Y_i=(Y_{1i},Y_{2i})\}_{i=1}^{\infty}$ of i.i.d. random vectors with survival function $G(s,t)=P(Y_{11}>s,\,Y_{21}>t)$, $(s,t)\in\overline{\mathbb R}_+^2$. Assume that we observe the sample $V^{(n)}=\{(Z_i,\Delta_i),\,1\le i\le n\}$, where $Z_i=(Z_{1i},Z_{2i})$, $\Delta_i=(\delta_{1i},\delta_{2i})$, $Z_{ki}=\min(X_{ki},Y_{ki})$, $\delta_{ki}=I(Z_{ki}=X_{ki})$, $k=1,2$, and $I(\cdot)$ is the indicator function. The problem consists of estimating $F$ from the sample $V^{(n)}$. Let $H(s,t)=P(Z_{1i}>s,\,Z_{2i}>t)$, $(s,t)\in\overline{\mathbb R}_+^2$, and let the sequences $\mathbb X$ and $\mathbb Y$ be independent. Then $H(s,t)=F(s,t)G(s,t)$, $(s,t)\in\overline{\mathbb R}_+^2$.

© Siberian Federal University. All rights reserved

In this paper we use functionals of exponential-hazard, product-limit and relative-risk power type to construct the corresponding three types of estimates of $F$. In the empirical estimates the upper summation index $n$ is replaced by a Poisson random variable (r.v.) $\mu_n$ with expectation $E\mu_n=n$. Such a random number of summands arises, for example, in insurance as the number of group insurance payments made by an insurance company to its customers in connection with insured events. Following [2], we introduce some auxiliary functionals for $(x,y)\in\overline{\mathbb R}_+^2$:
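$$M(x,y)=P(Z_{11}\le x,\,Z_{21}>y),\qquad N(x,y)=P(Z_{11}>x,\,Z_{21}\le y),$$
$$\widetilde M(x,y)=P(Z_{11}\le x,\,Z_{21}>y,\,\delta_{11}=1),\qquad \widetilde N(x,y)=P(Z_{11}>x,\,Z_{21}\le y,\,\delta_{21}=1),$$
$$\Lambda_1(x,y)=\int_0^x\frac{M(ds,y)}{H(s-,y)},\qquad \Lambda_2(x,y)=\int_0^y\frac{N(x,dt)}{H(x,t-)},$$
$$\widetilde\Lambda_1(x,y)=\int_0^x\frac{\widetilde M(ds,y)}{H(s-,y)},\qquad \widetilde\Lambda_2(x,y)=\int_0^y\frac{\widetilde N(x,dt)}{H(x,t-)},$$
where
$$\begin{aligned}
&\Lambda(x,y)=\Lambda_1(x,0)+\Lambda_2(x,y),\qquad \widetilde\Lambda(x,y)=\widetilde\Lambda_1(x,0)+\widetilde\Lambda_2(x,y),\\
&\Lambda^c(x,y)=\Lambda_1^c(x,0)+\Lambda_2^c(x,y),\qquad \widetilde\Lambda^c(x,y)=\widetilde\Lambda_1^c(x,0)+\widetilde\Lambda_2^c(x,y),\\
&\Lambda_1^c(x,y)=\Lambda_1(x,y)-\sum_{s\le x}\Lambda_1(\Delta s,y),\qquad \Lambda_1(\Delta s,y)=\Lambda_1(s,y)-\Lambda_1(s-,y),\\
&\Lambda_2^c(x,y)=\Lambda_2(x,y)-\sum_{t\le y}\Lambda_2(x,\Delta t),\qquad \Lambda_2(x,\Delta t)=\Lambda_2(x,t)-\Lambda_2(x,t-),
\end{aligned}\tag{1.1}$$
and $\widetilde\Lambda_1^c$, $\widetilde\Lambda_2^c$ are defined similarly. To construct estimates of $F$ we estimate the functionals (1.1). First, we introduce the following empirical estimates of $H$ and of the four probabilities $M$, $N$, $\widetilde M$, $\widetilde N$ from the sample $V^{(n)}$: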
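$$\begin{aligned}
H_n(x,y)&=\frac1n\sum_{i=1}^nI(Z_{1i}>x,\,Z_{2i}>y),\\
M_n(x,y)&=\frac1n\sum_{i=1}^nI(Z_{1i}\le x,\,Z_{2i}>y),\\
N_n(x,y)&=\frac1n\sum_{i=1}^nI(Z_{1i}>x,\,Z_{2i}\le y),\\
\widetilde M_n(x,y)&=\frac1n\sum_{i=1}^nI(Z_{1i}\le x,\,Z_{2i}>y,\,\delta_{1i}=1),\\
\widetilde N_n(x,y)&=\frac1n\sum_{i=1}^nI(Z_{1i}>x,\,Z_{2i}\le y,\,\delta_{2i}=1).
\end{aligned}\tag{1.2}$$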
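As an illustration only, the empirical estimates (1.2) can be computed directly from a censored sample. The sketch below assumes, purely for the example, that $X$ and $Y$ have independent exponential components (rate 1 for $X$, rate 0.5 for $Y$); the helper names `simulate_censored` and `empirical_estimates` are hypothetical. The normaliser `n` is a separate argument so that it need not equal the number of summed observations.

```python
import numpy as np


def simulate_censored(n, rng, rate_x=1.0, rate_y=0.5):
    """Hypothetical example data: X and Y with independent exponential components."""
    X = rng.exponential(1.0 / rate_x, size=(n, 2))   # lifetimes (X_1i, X_2i)
    Y = rng.exponential(1.0 / rate_y, size=(n, 2))   # censoring times (Y_1i, Y_2i)
    Z = np.minimum(X, Y)                             # Z_ki = min(X_ki, Y_ki)
    delta = X <= Y                                   # delta_ki = I(Z_ki = X_ki)
    return Z, delta


def empirical_estimates(Z, delta, x, y, n=None):
    """Empirical estimates (1.2) at (x, y); n is the normaliser (defaults to len(Z))."""
    n = len(Z) if n is None else n
    Z1, Z2 = Z[:, 0], Z[:, 1]
    d1, d2 = delta[:, 0], delta[:, 1]
    return {
        "H": np.count_nonzero((Z1 > x) & (Z2 > y)) / n,
        "M": np.count_nonzero((Z1 <= x) & (Z2 > y)) / n,
        "N": np.count_nonzero((Z1 > x) & (Z2 <= y)) / n,
        "M_tilde": np.count_nonzero((Z1 <= x) & (Z2 > y) & d1) / n,
        "N_tilde": np.count_nonzero((Z1 > x) & (Z2 <= y) & d2) / n,
    }
```

In this example $F(x,y)=e^{-x-y}$ and $G(x,y)=e^{-(x+y)/2}$, so $H(0.5,0.5)=F(0.5,0.5)G(0.5,0.5)=e^{-1.5}\approx0.223$, which `empirical_estimates(Z, delta, 0.5, 0.5)["H"]` should approach for large samples.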
Let $\{\mu_n,\,n\ge1\}$ be a sequence of Poisson random variables (r.v.'s) with parameter $E\mu_n=n$, independent of the pair $(\mathbb X,\mathbb Y)$. Along with estimates (1.2), we also propose their analogues $H_n^*$, $M_n^*$, $N_n^*$, $\widetilde M_n^*$, $\widetilde N_n^*$, obtained from (1.2) by replacing the upper limit of summation $n$ by the r.v. $\mu_n$. However, these estimates have the disadvantage that they can exceed 1. For example, for
$$H_n^*(x,y)=\frac1n\sum_{i=1}^{\mu_n}I(Z_{1i}>x,\,Z_{2i}>y)$$
we have
$$P\bigl(H_n^*(0,0)>1\bigr)=P(\mu_n>n)=\sum_{m=n+1}^{\infty}\frac{n^me^{-n}}{m!}>0.$$
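To see how often this happens, the probability $P(\mu_n>n)$ can be evaluated exactly. The small sketch below (the function name is hypothetical) sums the Poisson probabilities in log-space. By the central limit theorem this probability tends to $1/2$, so the defect of $H_n^*$ does not disappear as $n$ grows.

```python
import math


def prob_poisson_exceeds_mean(n):
    """P(mu_n > n) for mu_n ~ Poisson(n), i.e. P(H_n^*(0, 0) > 1)."""
    # P(mu_n <= n) = sum_{m=0}^{n} e^{-n} n^m / m!, each term formed in log-space
    cdf = sum(math.exp(-n + m * math.log(n) - math.lgamma(m + 1)) for m in range(n + 1))
    return 1.0 - cdf
```

For example, with $n=1$ the value is $1-2/e\approx0.264$.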
To avoid this disadvantage we consider the following truncated versions of the estimates $H_n^*$, $M_n^*$, $N_n^*$, $\widetilde M_n^*$, $\widetilde N_n^*$:
$$H_n^0(x,y)=1-\bigl(1-H_n^*(x,y)\bigr)I\bigl(H_n^*(x,y)\le1\bigr)=\begin{cases}H_n^*(x,y)&\text{if }H_n^*(x,y)\le1,\\ 1&\text{if }H_n^*(x,y)>1,\end{cases}$$
and similarly constructed estimates $M_n^0$, $N_n^0$, $\widetilde M_n^0$, $\widetilde N_n^0$. In a similar way we construct the corresponding estimates of the functionals in (1.1):
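$$\begin{aligned}
\Lambda_{1n}(x,y)&=\int_0^x\frac{M_n^0(ds,y)}{H_n^0(s-,y)},\qquad \Lambda_{2n}(x,y)=\int_0^y\frac{N_n^0(x,dt)}{H_n^0(x,t-)},\\
\widetilde\Lambda_{1n}(x,y)&=\int_0^x\frac{\widetilde M_n^0(ds,y)}{H_n^0(s-,y)},\qquad \widetilde\Lambda_{2n}(x,y)=\int_0^y\frac{\widetilde N_n^0(x,dt)}{H_n^0(x,t-)},
\end{aligned}\tag{1.3}$$
$$\Lambda_n(x,y)=\Lambda_{1n}(x,0)+\Lambda_{2n}(x,y),\qquad \widetilde\Lambda_n(x,y)=\widetilde\Lambda_{1n}(x,0)+\widetilde\Lambda_{2n}(x,y).$$
The relative-risk function is
$$R(x,y)=\frac{\widetilde\Lambda(x,y)}{\Lambda(x,y)},$$
and its estimator is
$$R_n(x,y)=\frac{\widetilde\Lambda_n(x,y)}{\Lambda_n(x,y)}.$$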
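A minimal sketch of the estimators (1.3) and of $R_n$, written for continuous data (no ties). It assumes that the data array holds the $\mu_n$ observed pairs, so `len(Z)` plays the role of $\mu_n$ while `n` is the normaliser, and it applies the truncation $1-(1-K^*)I(K^*\le1)=\min(K^*,1)$ both to the integrator and to the risk function. The helper names `hazard_jumps` and `cumulative_hazards` are hypothetical.

```python
import numpy as np


def hazard_jumps(T, counted, base, upper, n):
    """Jumps of int_0^upper K0(dt) / H0(t-) along one coordinate (a sketch of (1.3)).

    T       : coordinate of integration (Z_1 for the first integral, Z_2 for the second)
    counted : mask of observations entering the integrator K (delta == 1 for tilde versions)
    base    : mask fixing the other coordinate (all True for y = 0, Z_1 > x for the second)
    n       : normaliser; len(T) plays the role of mu_n
    """
    pts = np.sort(T[counted & base])
    # K0 = min(K*, 1): increments of the capped cumulative count
    dK = np.diff(np.minimum(np.arange(1, pts.size + 1) / n, 1.0), prepend=0.0)
    keep = pts <= upper
    pts, dK = pts[keep], dK[keep]
    # H0(t-) = min((1/n) #{base and T >= t}, 1); every jump point lies in its own risk set
    T_base = np.sort(T[base])
    at_risk = T_base.size - np.searchsorted(T_base, pts, side="left")
    return dK / np.minimum(at_risk / n, 1.0)


def cumulative_hazards(Z, delta, x, y, n):
    """Return Lambda_n(x, y), Lambda~_n(x, y) and R_n(x, y)."""
    Z1, Z2 = Z[:, 0], Z[:, 1]
    everyone = np.ones(Z1.size, dtype=bool)
    right = Z1 > x
    lam = (hazard_jumps(Z1, everyone, everyone, x, n).sum()
           + hazard_jumps(Z2, everyone, right, y, n).sum())
    lam_t = (hazard_jumps(Z1, delta[:, 0], everyone, x, n).sum()
             + hazard_jumps(Z2, delta[:, 1], right, y, n).sum())
    R = lam_t / lam if lam > 0 else 0.0   # convention when no jumps occur
    return lam, lam_t, R
```

With independent exponential components (rate 1 for $X$, rate 0.5 for $Y$) one expects $\Lambda_n(0.5,0.5)\approx-\log H=1.5$ and $\widetilde\Lambda_n(0.5,0.5)\approx-\log F=1$.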
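Using estimates (1.3), we propose the following three estimates of $F(x,y)$, of exponential, product and power structure:
$$\begin{aligned}
F_{1n}(x,y)&=\exp\{-\widetilde\Lambda_n(x,y)\}=\exp\{-(\widetilde\Lambda_{1n}(x,0)+\widetilde\Lambda_{2n}(x,y))\},\\
F_{2n}(x,y)&=\prod_{s\le x}\bigl(1-\widetilde\Lambda_{1n}(\Delta s,0)\bigr)\prod_{t\le y}\bigl(1-\widetilde\Lambda_{2n}(x,\Delta t)\bigr),\\
F_{3n}(x,y)&=\bigl[H_n^0(x,y)\bigr]^{R_n(x,y)}.
\end{aligned}\tag{1.4}$$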
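Once the jumps of the cumulative hazards are available, the three constructions in (1.4) differ only in how they combine them. The sketch below takes the jumps, $H_n^0(x,y)$ and $R_n(x,y)$ as inputs (the function name is hypothetical; the test values are arbitrary numbers). Since $e^{-a}\ge1-a$, it always gives $F_{1n}\ge F_{2n}$, and for jumps $a_i\in[0,1]$ one has $\bigl|\prod e^{-a_i}-\prod(1-a_i)\bigr|\le\sum(e^{-a_i}-1+a_i)\le\frac12\sum a_i^2$, the kind of bound behind part (I) of Theorem 1.1.

```python
import numpy as np


def three_estimates(jumps_1, jumps_2, H0, R):
    """Estimates (1.4) at a fixed (x, y).

    jumps_1 : jumps of Lambda~_1n(., 0) at points s <= x
    jumps_2 : jumps of Lambda~_2n(x, .) at points t <= y
    H0      : H_n^0(x, y), assumed > 0
    R       : R_n(x, y)
    """
    j1 = np.asarray(jumps_1, dtype=float)
    j2 = np.asarray(jumps_2, dtype=float)
    F1 = np.exp(-(j1.sum() + j2.sum()))          # exponential-hazard structure
    F2 = np.prod(1.0 - j1) * np.prod(1.0 - j2)   # product-limit structure
    F3 = H0 ** R                                 # relative-risk power structure
    return float(F1), float(F2), float(F3)
```

For example, with jumps $0.1, 0.2$ and $0.05$ the product-limit value is $0.9\cdot0.8\cdot0.95=0.684$, while the exponential-hazard value is $e^{-0.35}\approx0.705$.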
Let $A_n=\bigl([0,Z_1^{(n)})\times[0,Z_2^{(n)})\bigr)\cap A$, where $Z_k^{(n)}=\max(Z_{k1},\dots,Z_{kn})$, $A=[0,T_Z^{(1)})\times[0,T_Z^{(2)})$, $T_Z^{(k)}=\inf\{t>0:P(Z_{k1}\le t)=1\}$, $k=1,2$. The following theorem states the asymptotic equivalence of the estimates (1.4).
Theorem 1.1. For all $(x,y)\in A_n$:
(I) $0\le F_{1n}(x,y)-F_{2n}(x,y)=O_p\bigl(\frac1n\bigr)$. If, in addition, the survival function $G$ is continuous on $A_n$, then
(II) $|F_{1n}(x,y)-F_{3n}(x,y)|=O_p\Bigl(\bigl(\frac{\log n}{n}\bigr)^{1/2}\Bigr)$.
From (I) and (II) one also obtains
$$|F_{3n}(x,y)-F_{2n}(x,y)|=O_p\Bigl(\Bigl(\frac{\log n}{n}\Bigr)^{1/2}\Bigr).$$
To prove Theorem 1.1 we need the following auxiliary statements.
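Lemma 1.1. Let $\{\mu_n,\,n\ge1\}$ be a sequence of Poisson r.v.'s with expectation $n$. Then for any number $\varepsilon>0$ and for all $n$ such that
$$\frac{n}{\log n}\ge\frac{\varepsilon}{8(1+e/3)},\qquad e=\exp(1),\tag{1.5}$$
the inequality
$$P\Bigl(\Bigl|\frac{\mu_n}{n}-1\Bigr|\ge\frac12\Bigl(\frac{\varepsilon\log n}{2n}\Bigr)^{1/2}\Bigr)\le2n^{-c_0}\tag{1.6}$$
holds, where $c_0=c_0(\varepsilon)=\varepsilon/\bigl(16(1+e/3)\bigr)$.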
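Proof. Let $\{\gamma_k,\,k\ge1\}$ be a sequence of independent Poisson r.v.'s with expectation $E\gamma_k=1$ for all $k=1,2,\dots$. Then $\mu_n-n$ has the same distribution as $\sum_{k=1}^n(\gamma_k-1)=\sum_{k=1}^n\xi_k$, where $\xi_k=\gamma_k-1$ and
$$Ee^{t\xi_k}=e^{-t}Ee^{t\gamma_k}=e^{-t}e^{-1}\sum_{m=0}^{\infty}\frac{(e^t)^m}{m!}=\exp\bigl(e^t-(t+1)\bigr).$$
Using the Taylor expansion of $e^t$, we have
$$Ee^{t\xi_k}=\exp\Bigl(1+t+\frac{t^2}{2}+\psi(t)-(t+1)\Bigr)=\exp\Bigl(\frac{t^2}{2}+\psi(t)\Bigr),$$
where $\psi(t)=\frac{t^3}{6}\exp(\theta t)$, $0<\theta<1$. For $0\le t\le1$ we have $t^3\le t^2$ and consequently $\psi(t)\le\frac{t^3}{6}e\le e\,\frac{t^2}{6}$. Hence, for $0\le t\le1$ we obtain
$$Ee^{t\xi_k}\le\exp\Bigl\{\frac{t^2}{2}\Bigl(1+\frac e3\Bigr)\Bigr\}=\exp\Bigl\{\frac{t^2}{2}\,g_k\Bigr\},\qquad g_k=1+\frac e3.$$
The same bound holds for $-\xi_k$, since $Ee^{-t\xi_k}=\exp(e^{-t}-1+t)\le\exp(t^2/2)$ for $t\ge0$. Then, using the following exponential inequality of Petrov ([22]) for non-identically distributed r.v.'s,
$$P\Bigl(\Bigl|\sum_{k=1}^n\xi_k\Bigr|\ge u\Bigr)\le2\exp\Bigl(-\frac{u^2}{2G}\Bigr),\qquad0\le u\le G=\sum_{k=1}^ng_k=n\Bigl(1+\frac e3\Bigr),$$
with $u=\frac12\bigl(\frac{\varepsilon}{2}\,n\log n\bigr)^{1/2}$, which satisfies $0\le u\le G$ under (1.5) and gives $u^2/(2G)=c_0\log n$, we obtain (1.6). □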
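The bound (1.6) can be compared numerically with the exact Poisson probability. The sketch below (function names are hypothetical) sums the Poisson probabilities directly, cutting the upper tail at $n+40\sqrt n+100$, where the remaining mass is negligible, and returns both sides of (1.6). For small $\varepsilon$ the right-hand side $2n^{-c_0}$ can exceed 1, so only larger $\varepsilon$ give an informative bound.

```python
import math


def poisson_two_sided_tail(n, u):
    """P(|mu_n - n| >= u) for mu_n ~ Poisson(n), by direct summation of the pmf."""
    top = n + int(40 * math.sqrt(n)) + 100        # pmf beyond this point is negligible
    total = 0.0
    for m in range(top + 1):
        if abs(m - n) >= u:
            total += math.exp(-n + m * math.log(n) - math.lgamma(m + 1))
    return total


def lemma_1_1(n, eps):
    """Return (exact left-hand side of (1.6), right-hand side 2 n^{-c0})."""
    if n < 2 or n / math.log(n) < eps / (8 * (1 + math.e / 3)):
        raise ValueError("condition (1.5) is not satisfied")
    level = 0.5 * math.sqrt(eps * math.log(n) / (2 * n))   # bound on |mu_n / n - 1|
    c0 = eps / (16 * (1 + math.e / 3))
    return poisson_two_sided_tail(n, level * n), 2 * n ** (-c0)
```

For instance, `lemma_1_1(1000, 20)` returns a right-hand side of about 0.022, far above the exact probability.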
The following inequality for two-dimensional empirical estimates from [21, p. 292] is used below. Let $C=C(H)=H\bigl(T_Z^{(1)},T_Z^{(2)}\bigr)>0$.
Lemma 1.2 ([21]). For all real $z>0$
$$P\Bigl(\sup_{(x,y)\in\overline{\mathbb R}_+^2}|H_n(x,y)-H(x,y)|>zC^2\Bigr)\le V_z\,(1+n^2)^2\exp\bigl(-2nz^2C^4\bigr),\tag{1.7}$$
where $V_z=V_z(H)=4\exp\bigl(4zC^2+4z^2C^4\bigr)$.
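Corollary 1.1. Let $z=z_0=\bigl(\frac{4+\varepsilon}{2}\cdot\frac{\log n}{n}\bigr)^{1/2}C^{-2}$ in (1.7). Then
$$P\Bigl(\sup_{(x,y)\in\overline{\mathbb R}_+^2}|H_n(x,y)-H(x,y)|>\Bigl(\frac{4+\varepsilon}{2}\cdot\frac{\log n}{n}\Bigr)^{1/2}\Bigr)\le q_n(\varepsilon),\tag{1.8}$$
where
$$q_n(\varepsilon)=4\exp\Bigl\{4\Bigl(\frac{4+\varepsilon}{2}\cdot\frac{\log n}{n}\Bigr)^{1/2}\Bigl[1+\Bigl(\frac{4+\varepsilon}{2}\cdot\frac{\log n}{n}\Bigr)^{1/2}\Bigr]\Bigr\}(n^2+1)^2n^{-(4+\varepsilon)}=O\bigl(n^{-\varepsilon}\bigr).$$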
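A quick numerical check, based on the displayed formula for $q_n(\varepsilon)$, that $q_n(\varepsilon)\,n^{\varepsilon}$ stays bounded (it approaches 4). The helper `q_n_bound` is hypothetical; it replaces $4+\varepsilon$ by a general constant $c$, so that $c=4+\varepsilon$ gives $q_n(\varepsilon)$ and $c=(4+\varepsilon)/4$ gives the quantity $q_n^0(\varepsilon)$ used in Lemma 1.3.

```python
import math


def q_n_bound(n, c):
    """q_n of Corollary 1.1 with 4 + eps replaced by a general constant c > 0."""
    a = math.sqrt(c * math.log(n) / (2 * n))              # equals z_0 * C^2
    log_poly = 2 * math.log(n * n + 1) - c * math.log(n)  # log of (n^2 + 1)^2 n^{-c}
    return 4 * math.exp(4 * a * (1 + a) + log_poly)
```

For $n=10^6$ and $\varepsilon=2$, the product $q_n(\varepsilon)\,n^{\varepsilon}$ is about 4.1.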
Therefore, for $\varepsilon>1$, (1.8) and the Borel-Cantelli lemma give
$$\sup_{(x,y)\in A_n}|H_n(x,y)-H(x,y)|\overset{\text{a.s.}}{=}O\Bigl(\Bigl(\frac{\log n}{n}\Bigr)^{1/2}\Bigr).\tag{1.9}$$
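A Monte Carlo sketch of the rate in (1.9), under the illustrative assumption that the components of $Z$ are independent exponentials with rate 1.5 (rate 1 lifetimes, rate 0.5 censoring), so that $H(x,y)=e^{-1.5x-1.5y}$. The helper name is hypothetical, and the supremum is only approximated by a maximum over a finite grid.

```python
import numpy as np


def sup_deviation_on_grid(Z, H_true, grid):
    """Approximate sup |H_n - H| by its maximum over grid x grid (a lower bound for the sup)."""
    n = Z.shape[0]
    A = (Z[:, 0][:, None] > grid[None, :]).astype(float)   # A[i, a] = I(Z_1i > grid[a])
    B = (Z[:, 1][:, None] > grid[None, :]).astype(float)   # B[i, b] = I(Z_2i > grid[b])
    Hn = A.T @ B / n                                        # Hn[a, b] = H_n(grid[a], grid[b])
    X, Y = np.meshgrid(grid, grid, indexing="ij")
    return float(np.max(np.abs(Hn - H_true(X, Y))))
```

Dividing the result by $(\log n/n)^{1/2}$ for several sample sizes gives a rough picture of the rate.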
In the next lemma we establish an analogue of (1.7) for the empirical estimate $H_n^0$. Let $q_n^0(\varepsilon)$ be obtained from $q_n(\varepsilon)$ by replacing $4+\varepsilon$ with $(4+\varepsilon)/4$.
Lemma 1.3. Under the conditions of Lemma 1.1,
$$P\Bigl(\sup_{(x,y)\in A_n}|H_n^0(x,y)-H(x,y)|>\Bigl(\frac{4+\varepsilon}{2}\cdot\frac{\log n}{n}\Bigr)^{1/2}\Bigr)\le2n^{-c_0(\varepsilon+4)}+q_n^0(\varepsilon).\tag{1.10}$$
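Proof. If $\mu_n\le n$, then $H_n^0(x,y)=H_n^*(x,y)$ for all $(x,y)\in\overline{\mathbb R}_+^2$, and if $\mu_n>n$ we have
$$\sup_{(x,y)\in\overline{\mathbb R}_+^2}|H_n^0(x,y)-H(x,y)|\le\sup_{(x,y)\in\overline{\mathbb R}_+^2}|H_n^*(x,y)-H(x,y)|,$$
since $H\le1$. Writing $z_0C^2=\bigl(\frac{4+\varepsilon}{2}\cdot\frac{\log n}{n}\bigr)^{1/2}$ and using the formula of total probability, we obtain
$$\begin{aligned}
P\Bigl(\sup_{A_n}|H_n^0-H|>z_0C^2\Bigr)&\le P\Bigl(\sup_{A_n}|H_n^*-H|>z_0C^2\Bigr)\\
&=P\Bigl(\sup_{A_n}\Bigl|H_n-H+\frac1n\sum_{i=n+1}^{\mu_n}I(Z_{1i}>x,Z_{2i}>y)\Bigr|>z_0C^2\,\Big|\,\mu_n>n\Bigr)P(\mu_n>n)\\
&\quad+P\Bigl(\sup_{A_n}\Bigl|H_n-H-\frac1n\sum_{i=\mu_n+1}^{n}I(Z_{1i}>x,Z_{2i}>y)\Bigr|>z_0C^2\,\Big|\,\mu_n\le n\Bigr)P(\mu_n\le n)\\
&\le P\Bigl(\sup_{A_n}|H_n-H|>\tfrac12z_0C^2\Bigr)+P\Bigl(\sup_{A_n}\frac1n\sum_{i=n\wedge\mu_n+1}^{n\vee\mu_n}I(Z_{1i}>x,Z_{2i}>y)>\tfrac12z_0C^2\Bigr)\\
&\le q_n^0(\varepsilon)+P\Bigl(\Bigl|\frac{\mu_n}{n}-1\Bigr|>\tfrac12z_0C^2\Bigr)\le2n^{-c_0(\varepsilon+4)}+q_n^0(\varepsilon),
\end{aligned}$$
where (1.6) (with $\varepsilon+4$ in place of $\varepsilon$) and (1.8) are used. □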
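The proof rests on two deterministic facts: (a) $|H_n^*-H_n|\le|\mu_n-n|/n$ uniformly in $(x,y)$, because the two sums differ by at most $|\mu_n-n|$ indicator terms; (b) truncating at 1 never moves the estimate away from $H$, because $H\le1$. The grid-based sketch below checks both (helper names are hypothetical; the data in the test are assumed independent exponentials, as in the earlier examples).

```python
import numpy as np


def survival_on_grid(Z, grid, n):
    """(1/n) sum_i I(Z_1i > x, Z_2i > y) on grid x grid; len(Z) need not equal n."""
    A = (Z[:, 0][:, None] > grid[None, :]).astype(float)
    B = (Z[:, 1][:, None] > grid[None, :]).astype(float)
    return A.T @ B / n


def lemma_1_3_facts(Z_all, n, mu, grid, H_true):
    """Check facts (a) and (b) on a grid; Z_all must hold at least max(n, mu) rows."""
    Hn = survival_on_grid(Z_all[:n], grid, n)         # fixed number of summands
    H_star = survival_on_grid(Z_all[:mu], grid, n)    # mu_n summands, normaliser n
    H0 = np.minimum(H_star, 1.0)                      # 1 - (1 - H*) I(H* <= 1)
    X, Y = np.meshgrid(grid, grid, indexing="ij")
    H = H_true(X, Y)
    # (a) |H*_n - H_n| <= |mu_n - n| / n uniformly on the grid
    fact_a = np.max(np.abs(H_star - Hn)) <= abs(mu - n) / n + 1e-12
    # (b) |H0_n - H| <= |H*_n - H| because H <= 1
    fact_b = np.all(np.abs(H0 - H) <= np.abs(H_star - H) + 1e-12)
    return bool(fact_a), bool(fact_b)
```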
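Proof of Theorem 1.1. From inequalities (2.4.2) in [2] applied to the estimates $F_{1n}$ and $F_{2n}$ we have
$$\begin{aligned}
0\le F_{1n}(x,y)-F_{2n}(x,y)&\le\frac1{2n^2}\left[\sum_{i=1}^{\mu_n-1}\frac{\delta_{1(i)}I\bigl(Z_1^{(i)}\le x\bigr)}{\bigl(S_Z^0(Z_1^{(i)}-)\bigr)^2}+\sum_{i=1}^{\mu_n-1}\frac{\delta_{2(i)}I\bigl(Z_2^{(i)}\le y\bigr)}{\bigl(H_n^0(x,Z_2^{(i)}-)\bigr)^2}\right]\\
&\le\frac{\mu_n}{2n^2}\left[\frac1{\bigl(S_Z^0(Z_1^{(\mu_n-1)}-)\bigr)^2}+\frac1{\bigl(H_n^0(x,Z_2^{(\mu_n-1)}-)\bigr)^2}\right],
\end{aligned}\tag{1.11}$$
where $Z_k^{(1)}<\dots<Z_k^{(\mu_n)}$ are the order statistics constructed from $Z_{k1},\dots,Z_{k\mu_n}$, $k=1,2$, $\delta_{k(i)}$ corresponds to $Z_k^{(i)}$, and $S_Z^0(x)=H_n^0(x,0)$. It is known that $Z_k^{(n)}\to T_Z^{(k)}$ as $n\to\infty$, $k=1,2$. We show that $Z_k^{(\mu_n)}\xrightarrow{P}T_Z^{(k)}$, $k=1,2$, as $n\to\infty$. For $\varepsilon>0$, $0<\delta<1$ and $k=1,2$ we have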
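$$\begin{aligned}
P\bigl(|Z_k^{(\mu_n)}-T_Z^{(k)}|>\varepsilon\bigr)&\le P\Bigl(|Z_k^{(\mu_n)}-T_Z^{(k)}|>\varepsilon,\ n(1-\delta)<\mu_n<n(1+\delta)\Bigr)+P\Bigl(\Bigl|\frac{\mu_n}{n}-1\Bigr|\ge\delta\Bigr)\\
&\le P\bigl(|Z_k^{([n(1-\delta)])}-T_Z^{(k)}|>\varepsilon\bigr)+P\Bigl(\Bigl|\frac{\mu_n}{n}-1\Bigr|\ge\delta\Bigr),
\end{aligned}$$
where we used that $Z_k^{(m)}$ is nondecreasing in $m$ and does not exceed $T_Z^{(k)}$ almost surely. For arbitrary $\eta>0$ there exists $n_1$ such that for $n\ge n_1$
$$P\bigl(|Z_k^{([n(1-\delta)])}-T_Z^{(k)}|>\varepsilon\bigr)<\frac{\eta}{2},\qquad k=1,2.\tag{1.12}$$
Since $P\bigl(|\mu_n/n-1|\ge\delta\bigr)\to0$ as $n\to\infty$, there exists $n_2$ such that for $n\ge n_2$
$$P\Bigl(\Bigl|\frac{\mu_n}{n}-1\Bigr|\ge\delta\Bigr)<\frac{\eta}{2}.\tag{1.13}$$
Then for $n\ge n_0=\max(n_1,n_2)$ we obtain from (1.12) and (1.13) that
$$P\bigl(|Z_k^{(\mu_n)}-T_Z^{(k)}|>\varepsilon\bigr)<\eta,\tag{1.14}$$
which is the required result. Thus, taking into account (1.13) and (1.14), as $n\to\infty$, with probability close to 1 we have
$$Z_k^{(\mu_n-1)}\approx Z_k^{(n-1)},\quad k=1,2,\qquad \frac{\mu_n}{n^2}=O_p\Bigl(\frac1n\Bigr).\tag{1.15}$$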
Taking into account (1.15) and the following relations, obtained from (1.10) for $(x,y)\in A_n$ (here $S_Z(x)=H(x,0)$):
$$\frac1{S_Z^0(x)}\le\frac{|S_Z^0(x)-S_Z(x)|}{S_Z^0(x)S_Z(x)}+\frac1{S_Z(x)}=\frac1{S_Z(x)}+O_p\Bigl(\Bigl(\frac{\log n}{n}\Bigr)^{1/2}\Bigr),$$
$$\frac1{H_n^0(x,y)}\le\frac{|H_n^0(x,y)-H(x,y)|}{H_n^0(x,y)H(x,y)}+\frac1{H(x,y)}=\frac1{H(x,y)}+O_p\Bigl(\Bigl(\frac{\log n}{n}\Bigr)^{1/2}\Bigr),$$
we obtain the right-hand estimate in (I) from (1.11). Now, by the inequality $|u-v|\le|\log u-\log v|$ for $0<u,v\le1$, since $0\le R_n(x,y)\le1$ and $\widetilde\Lambda_n=R_n\Lambda_n$, for $(x,y)\in A_n$ we have
$$\begin{aligned}
|F_{1n}(x,y)-F_{3n}(x,y)|&\le\bigl|\widetilde\Lambda_n(x,y)-R_n(x,y)\bigl(-\log H_n^0(x,y)\bigr)\bigr|\\
&=R_n(x,y)\bigl|\bigl(-\log H_n^0(x,y)\bigr)-\Lambda_n(x,y)\bigr|\\
&\le\bigl|-\log H_n^0(x,y)+\log H(x,y)\bigr|+\bigl|\bigl(-\log H(x,y)\bigr)-\Lambda_n(x,y)\bigr|.
\end{aligned}\tag{1.16}$$
According to Lemma 1.3 and the mean value theorem, for $(x,y)\in A_n$ we obtain
$$\bigl|-\log H_n^0(x,y)+\log H(x,y)\bigr|=O_p\Bigl(\Bigl(\frac{\log n}{n}\Bigr)^{1/2}\Bigr).\tag{1.17}$$
Taking into account the continuity of $G$, Lemma 3.4.3, the proof of Theorem 2.4.3 and Remark 2.4.4 in [2], we obtain for $(x,y)\in A_n$ that
$$\bigl|\bigl(-\log H(x,y)\bigr)-\Lambda_n(x,y)\bigr|=O_p\Bigl(\Bigl(\frac{\log n}{n}\Bigr)^{1/2}\Bigr).\tag{1.18}$$
Now (II) follows from relations (1.16)-(1.18). □
It was shown in Theorem 2.4.3 of [2] that, when $F$ and $G$ are continuous, both the exponential-hazard and the relative-risk power functionals coincide with the estimated survival function $F$. Hence, taking into account Theorem 1.1, all three estimates (1.4) are consistent estimates of $F$ (see also [5]).
2. Estimation of parameters of the Marshall-Olkin exponential distribution
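Let us consider the survival function $F(s,t)=P(X_{11}>s,\,X_{21}>t)$, $(s,t)\in\overline{\mathbb R}_+^2$, of the Marshall-Olkin exponential form with unknown parameters $\lambda_1,\lambda_2,\lambda_{12}$:
$$F(s,t)=\exp\bigl(-\lambda_1s-\lambda_2t-\lambda_{12}\max(s,t)\bigr),\qquad(s,t)\in\overline{\mathbb R}_+^2.\tag{2.1}$$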
Then the corresponding cumulative hazard function is
$$\Lambda(s,t)=-\log F(s,t)=\lambda_1s+\lambda_2t+\lambda_{12}\max(s,t).\tag{2.2}$$
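To experiment with (2.1)-(2.2), one can simulate from the Marshall-Olkin model with the standard common-shock construction: with independent exponential shocks $E_1,E_2,E_{12}$ of rates $\lambda_1,\lambda_2,\lambda_{12}$, the pair $X_1=\min(E_1,E_{12})$, $X_2=\min(E_2,E_{12})$ has survival function (2.1). The helper names below are hypothetical. Note that this distribution puts positive mass $\lambda_{12}/(\lambda_1+\lambda_2+\lambda_{12})$ on the diagonal $X_1=X_2$.

```python
import numpy as np


def sample_marshall_olkin(size, lam1, lam2, lam12, rng):
    """Draw (X_1, X_2) with survival function (2.1) via independent exponential shocks."""
    E1 = rng.exponential(1.0 / lam1, size)     # shock hitting component 1 only
    E2 = rng.exponential(1.0 / lam2, size)     # shock hitting component 2 only
    E12 = rng.exponential(1.0 / lam12, size)   # common shock hitting both components
    return np.minimum(E1, E12), np.minimum(E2, E12)


def mo_cumulative_hazard(s, t, lam1, lam2, lam12):
    """Lambda(s, t) = -log F(s, t) from (2.2)."""
    return lam1 * s + lam2 * t + lam12 * np.maximum(s, t)
```

For $\lambda_1=1$, $\lambda_2=2$, $\lambda_{12}=0.5$ the empirical frequency of $\{X_1>0.3,\,X_2>0.2\}$ should be close to $e^{-0.85}\approx0.427$.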
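A nonparametric estimator of $\Lambda(s,t)$ based on (1.4) is $\widehat\Lambda_n(s,t)=-\log F_{1n}(s,t)=\widetilde\Lambda_{1n}(s,0)+\widetilde\Lambda_{2n}(s,t)$. It is easy to verify from (2.2) that for $s>0$ we have the system of equations
$$\begin{cases}\Lambda(s,0)=\lambda_1s+\lambda_{12}s,\\ \Lambda(0,s)=\lambda_2s+\lambda_{12}s,\\ \Lambda(s,s)=\lambda_1s+\lambda_2s+\lambda_{12}s.\end{cases}\tag{2.3}$$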
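From (2.3) we find expressions for the unknown parameters $\lambda_1$, $\lambda_2$ and $\lambda_{12}$ at a fixed point $s=s_0>0$:
$$\begin{aligned}
\lambda_1&=\frac1{s_0}\bigl(\Lambda(s_0,s_0)-\Lambda(0,s_0)\bigr),\\
\lambda_2&=\frac1{s_0}\bigl(\Lambda(s_0,s_0)-\Lambda(s_0,0)\bigr),\\
\lambda_{12}&=\frac1{s_0}\bigl(\Lambda(s_0,0)+\Lambda(0,s_0)-\Lambda(s_0,s_0)\bigr).
\end{aligned}\tag{2.4}$$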
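Now we obtain estimators of the parameters from (2.4) by replacing $\Lambda$ with its estimator $\widehat\Lambda_n=-\log F_{1n}$:
$$\begin{aligned}
\lambda_1^{(n)}&=\frac1{s_0}\bigl(\widehat\Lambda_n(s_0,s_0)-\widehat\Lambda_n(0,s_0)\bigr),\\
\lambda_2^{(n)}&=\frac1{s_0}\bigl(\widehat\Lambda_n(s_0,s_0)-\widehat\Lambda_n(s_0,0)\bigr),\\
\lambda_{12}^{(n)}&=\frac1{s_0}\bigl(\widehat\Lambda_n(s_0,0)+\widehat\Lambda_n(0,s_0)-\widehat\Lambda_n(s_0,s_0)\bigr).
\end{aligned}\tag{2.5}$$
It follows from Theorem 1.1 that $\widehat\Lambda_n(s,t)$ is a consistent estimator of $\Lambda(s,t)$. Consequently, relations (2.5) give consistent estimators of the corresponding parameters (2.4) of distribution (2.1).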
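The inversion (2.5) is simple enough to check numerically. The sketch below takes any estimate of $\Lambda$ as a callable (the helper name is hypothetical). As a simplifying assumption for the test, the data are uncensored, so $-\log$ of the ordinary empirical survival function stands in for $-\log F_{1n}$; with censored data one would plug in $-\log F_{1n}$ instead.

```python
import numpy as np


def mo_parameter_estimates(Lam_hat, s0):
    """Estimators (2.5): invert system (2.3) at s = s0 > 0.

    Lam_hat : callable (s, t) -> estimate of Lambda(s, t) = -log F(s, t)
    """
    a = Lam_hat(s0, s0)
    b = Lam_hat(s0, 0.0)
    c = Lam_hat(0.0, s0)
    return (a - c) / s0, (a - b) / s0, (b + c - a) / s0   # lambda_1, lambda_2, lambda_12
```

Applied to the exact hazard (2.2) with $\lambda_1=1$, $\lambda_2=2$, $\lambda_{12}=0.5$, it recovers these values for any $s_0>0$.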
References
[1] A.A. Abdushukurov, Nonparametrical estimators of survival function on the plane by censored observations, Uzbek Mathematical Journal, 2(2004), 3-11 (in Russian).
[2] A.A. Abdushukurov, Estimates of unknown distributions from incomplete observations and their properties, LAMBERT Academic Publishing, 2011 (in Russian).
[3] A.A. Abdushukurov, R.S. Muradov, Estimation of two-dimensional survival function by random right censored data, Proc. of XI-th Intern. Conf. Computer Data Analysis and Modelling, Minsk, Belarus, 2016, 85-86.
[4] A.A. Abdushukurov, Estimation of joint survival function from censored observations, Industrial Lab. Diagn. Mater., 82(2016), 80-84 (in Russian).
[5] A.A. Abdushukurov, R.S. Muradov, On the estimates of the distribution function in a random censorship model, Industrial Lab. Diagn. Mater., 80(2014), 62-67 (in Russian).
[6] A.A. Abdushukurov, R.S. Muradov, On Estimation of Conditional Distribution Function under Dependent Random Right Censored Data, Journal of Siberian Federal University, 7(2014), 409-416.
[7] A.A. Abdushukurov, R.S. Muradov, Estimation of survival functions from dependent random right censored data, Intern. J. Innovation in Science and Mathematics, 2(2014), 280-287.
[8] A.A. Abdushukurov, R.S. Muradov, Some algebraic properties of Archimedean copula functions and their applications in the statistical estimation of the survival function, New Trends in Math. Sci., 4(2016), 88-98. DOI: 10.20852/ntmsci.2016318808
[9] M.D. Burke, Estimation of a bivariate distribution function under random censorship, Biometrika, 75(1988), 379-382.
[10] G. Campbell, Nonparametric bivariate estimation with randomly censored data, Biometrika, 68(1981), 417-422.
[11] G. Campbell, A. Foldes, Large sample properties of nonparametric bivariate estimators with censored data, Colloquia Mathematica Societatis Janos Bolyai, 32(1982), 103-122.
[12] D.M. Dabrowska, Kaplan-Meier estimate on the plane, Ann. Statist., 16(1988), 1475-1489.
[13] D.M. Dabrowska, Kaplan-Meier estimate on the plane: weak convergence, LIL and the bootstrap, J. Multivar. Anal., 29(1989), 308-325.
[14] R.D. Gill, Multivariate Survival Analysis I, Theory Probab. Appl., 37(1992), 19-35 (in Russian).
[15] R.D. Gill, Multivariate Survival Analysis II, Theory Probab. Appl., 37(1992), 307-328 (in Russian).
[16] J.A. Hanley, M.N. Parnes, Nonparametric estimation of a multivariate distribution in the presence of censoring, Biometrics, 39(1983), 129-139.
[17] L. Horvath, The rate of strong uniform consistency for the multivariate product-limit estimator, J. Multivar. Anal., 13(1983), 202-209.
[18] Y. Huang, Two-sample multistate accelerated sojourn times model, J.A.S.A., 95(2000), 619-627.
[19] R.S. Muradov, A.A. Abdushukurov, Estimation of multivariate distributions and their mixes by incomplete data, LAMBERT Academic Publishing, Germany, 2011 (in Russian).
[20] W.Y. Tsai, S. Leurgans, J. Crowley, Nonparametric estimation of a bivariate survival function in the presence of censoring, Ann. Statist., 14(1986), 1351-1365.
[21] J.D. Fermanian, Multivariate hazard rates under random censorship, J. Multivar. Anal., 62(1997), 273-309.
[22] V.V. Petrov, Limit Theorems of Probability Theory: Sequences of Independent Random Variables, Oxford Studies in Probability, Vol. 4, Clarendon Press, Oxford, 1995.
On Estimation of a Bivariate Survival Function from Randomly Censored Data
Abdurakhim A. Abdushukurov
Moscow State University, Tashkent Branch, Tashkent, Uzbekistan
Rustamjon S. Muradov
Namangan Institute of Engineering and Technology
Namangan, Uzbekistan
Abstract. At present there are several approaches to estimating survival functions of vectors of lifetimes. However, some of these estimates are either inconsistent or not fully defined in the domain of joint survival functions, and therefore they are not applicable in practice. In this work the authors propose consistent estimates of the joint survival function of exponential, product and power structure under a random Poisson sample size. It is shown that these estimates are asymptotically equivalent.
Keywords: bivariate survival function, Poisson random variables, empirical estimates.