Journal of Siberian Federal University. Mathematics & Physics 2023, 16(1), 66—75
EDN: NTXBVR UDC 517.24
On Special Empirical Processes of Independence in Presence of Covariates
Abduraxim A. Abdushukurov*
Moscow State University named after M. V. Lomonosov, Tashkent Branch
Tashkent, Uzbekistan
V. I. Romanovskiy Institute of Mathematics of Uzbekistan Academy of Sciences
Tashkent, Uzbekistan
Farkhad A. Abdikalikov
Karakalpak State University named after Berdakh
Nukus, Uzbekistan
V. I. Romanovskiy Institute of Mathematics of Uzbekistan Academy of Sciences
Tashkent, Uzbekistan
Received 10.05.2022, received in revised form 15.08.2022, accepted 04.11.2022
Abstract. In this paper we investigate asymptotic properties of a class of empirical processes in the presence of covariates, indexed by a class of measurable functions.
Keywords: empirical processes, metric entropy, Gaussian processes.
Citation: A.A. Abdushukurov, F.A. Abdikalikov, On Special Empirical Processes of Independence in Presence of Covariates, J. Sib. Fed. Univ. Math. Phys., 2023, 16(1), 66-75. EDN: NTXBVR
1. Introduction and preliminaries
Special empirical processes of independence were introduced by Abdushukurov and Kakadjanova [1, 2] for empirical processes indexed by a class $\mathcal{F}$ of measurable functions. The modern asymptotic theory of empirical processes indexed by a class $\mathcal{F}$ is being actively developed, and its current results allow one to establish uniform versions of the laws of large numbers and central limit theorems for empirical measures under entropy conditions on the class $\mathcal{F}$. These results are essentially generalizations of the classical theorems of Glivenko-Cantelli and Donsker [3, 4]. In applied mathematics, while generalizing the Glivenko-Cantelli theorem to classes of sets, Vapnik and Chervonenkis made in the 1970s a significant contribution to the development of statistical (machine) learning theory (Vapnik-Chervonenkis theory), which justifies the principle of empirical risk minimization (for details, see the monograph [5]).
In the papers [1, 2] the limiting properties of generalized empirical processes of independence of random variables (r.v.-s) and events indexed by a class $\mathcal{F}$ were investigated. Here we extend this model to the regression case. The need to consider such processes stems from practical situations in which we investigate joint properties of a triple of observed data: an r.v., an event and a covariate. Let us consider the sequence of observed triples $\{(Z_k, A_k, X_k),\ k \ge 1\}$,
where $Z_k$ are positive random elements defined on a probability space $(\Omega, \mathcal{A}, P)$ with values in a measurable space $(\mathfrak{X}, \mathcal{B})$. The events $A_k$ have a common probability $p = P(A_k) \in (0, 1)$. For our analysis, we consider the observed data $(Z_1, \delta_1), \dots, (Z_n, \delta_n)$ at $n$ fixed design points $0 < x_1 < x_2 < \dots < x_n < 1$ of the covariate $X$, where $\delta_k = I(A_k)$ is the indicator of the event $A_k$. The observed r.v.-s at a design point $x \in [0, 1]$ are $Z_x$ and $\delta_x$. Here $\delta_x = 1$ means that the event $A_x$ occurs. Each pair $(Z_x, \delta_x)$ induces a statistical model $(\mathfrak{X} \times \{0, 1\}, \mathcal{B} \times \{0, 1\}, P_x)$ for a given $X = x$, where the distribution
$$\{P_x(B \times D) = P(Z \in B, \delta \in D \mid X = x),\ B \in \mathcal{B},\ D \subset \{0, 1\}\}$$
is represented, for each Borel set $B$, by the subdistributions
$$P_x(B \times \{0, 1\}) = Q_x(B) = Q_{0x}(B) + Q_{1x}(B), \quad Q_{mx}(B) = P_x(B \times \{m\}), \quad m = 0, 1.$$
Our interest is focused on the hypothesis $H$ of independence of $Z_x$ and $\delta_x$. It is easy to see that under $H$ we have $Q_{1x}(B) = p_x Q_x(B)$ and $Q_{0x}(B) = (1 - p_x) Q_x(B)$ for all $B \in \mathcal{B}$, where $p_x = Q_{1x}(\mathfrak{X})$. Let us introduce the signed measure
$$\{\Lambda_x(B) = Q_{1x}(B) - p_x Q_x(B),\ B \in \mathcal{B}\},$$
which is equal to zero under the hypothesis $H$. Using this measure, we construct an empirical process for testing the hypothesis $H$. To this end, we introduce empirical analogues of the above measures for $B \in \mathcal{B}$:
$$Q_{xh}(B) = \sum_{i=1}^{n} w_{ni}(x; h_n) I(Z_i \in B) = Q_{0xh}(B) + Q_{1xh}(B), \tag{1}$$
where
$$Q_{mxh}(B) = \sum_{i=1}^{n} w_{ni}(x; h_n) I(Z_i \in B, \delta_i = m), \quad m = 0, 1,$$
and
$$\Lambda_{xh}(B) = Q_{1xh}(B) - p_{xh} Q_{xh}(B), \quad p_{xh} = Q_{1xh}(\mathfrak{X}).$$
The nonparametric estimators above involve a sequence of smoothing weights $\{w_{ni}(x; h_n)\}$ depending on a positive bandwidth sequence $\{h_n,\ n \ge 1\}$ tending to zero as $n \to \infty$. In the present case of fixed design points it is common to use the Gasser-Müller-type weights
$$w_{ni}(x; h_n) = \frac{1}{C_n(x; h_n)} \int_{x_{i-1}}^{x_i} \frac{1}{h_n} k\left(\frac{x - z}{h_n}\right) dz \quad (i = 1, \dots, n), \qquad C_n(x; h_n) = \int_{0}^{x_n} \frac{1}{h_n} k\left(\frac{x - z}{h_n}\right) dz.$$
Here $x_0 = 0$ and $k$ is a known probability density function (the kernel).
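As an illustration, the Gasser-Müller weights can be evaluated in closed form once the distribution function of the kernel is available. The sketch below assumes the Epanechnikov kernel and an equispaced design; both are choices made here for illustration only, and the function names are hypothetical.

```python
def epanechnikov_cdf(u: float) -> float:
    """Distribution function of the kernel k(u) = 0.75 * (1 - u^2) on [-1, 1]."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3


def gasser_muller_weights(x, design, h, kernel_cdf=epanechnikov_cdf):
    """Weights w_ni(x; h) for design points x_1 < ... < x_n, with x_0 = 0.

    The integral of (1/h) k((x - z)/h) over [a, b] equals
    K((x - a)/h) - K((x - b)/h), where K is the kernel distribution function.
    """
    edges = [0.0] + list(design)
    raw = [
        kernel_cdf((x - edges[i - 1]) / h) - kernel_cdf((x - edges[i]) / h)
        for i in range(1, len(edges))
    ]
    # Normalizing constant C_n(x; h): the same integral taken over [0, x_n].
    # It is zero when x lies farther than h from [0, x_n]; not handled here.
    c_n = kernel_cdf((x - edges[0]) / h) - kernel_cdf((x - edges[-1]) / h)
    return [r / c_n for r in raw]


design = [i / 10 for i in range(1, 11)]  # hypothetical equispaced design
weights = gasser_muller_weights(0.5, design, h=0.2)
```

By construction the weights are nonnegative and sum to one, and only design intervals within distance $h$ of $x$ receive positive weight.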
2. Asymptotic results
For $B = (-\infty, t]$, let us define the conditional distribution function (d.f.) and the subdistribution functions for a given $X = x$:
$$G_x(t) = Q_x((-\infty, t]) = P(Z \le t \mid X = x) = P(Z_x \le t),$$
and
$$G_{mx}(t) = Q_{mx}((-\infty, t]) = P(Z \le t, \delta = m \mid X = x) = P(Z_x \le t, \delta_x = m), \quad m = 0, 1.$$
We shall need the following additional notation. For the design points $x_1, \dots, x_n$ we denote $\underline{\Delta}_n = \min_{1 \le i \le n}(x_i - x_{i-1})$ and $\overline{\Delta}_n = \max_{1 \le i \le n}(x_i - x_{i-1})$. We use the following assumptions on the design points and the kernel (see [6-8]):
(C1) $x_n \to 1$, $\overline{\Delta}_n = O(n^{-1})$, $\overline{\Delta}_n - \underline{\Delta}_n = o(n^{-1})$.
(C2) $k$ is a probability density function with support $[-M, M]$ for some $M > 0$, $m_1(k) = \int_{-\infty}^{\infty} y k(y)\,dy = 0$, and $k$ is Lipschitz of order 1.
Note that $C_n(x; h_n) = 1$ for $n$ sufficiently large, since $x_n \to 1$ and $k$ has finite support. This means that in all proofs of asymptotic results we may take $C_n(x; h_n) = 1$.
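Further on we will need typical smoothness conditions on $G_x(t)$ and $G_{mx}(t)$, $m = 0, 1$, and on the probability $p_x = G_{1x}(+\infty) = \lim_{t \to \infty} G_{1x}(t)$.
(C3) The second-order partial derivatives $\ddot{G}_x(t) = \frac{\partial^2}{\partial x^2} G_x(t)$, $\ddot{G}_{mx}(t) = \frac{\partial^2}{\partial x^2} G_{mx}(t)$ and $G''_x(t) = \frac{\partial^2}{\partial t^2} G_x(t)$, $G''_{mx}(t) = \frac{\partial^2}{\partial t^2} G_{mx}(t)$, $m = 0, 1$, exist and are continuous for $0 \le x \le 1$ and $t \in \mathbb{R}$.
(C4) The second-order derivative $\ddot{p}_x = \frac{\partial^2}{\partial x^2} p_x$ exists and is continuous for $0 \le x \le 1$.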
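In what follows, we also use the notation
$$\|\dot{G}_x\| = \sup_{(t,x) \in [0,T] \times [0,1]} |\dot{G}_x(t)|, \quad \|\ddot{G}_x\| = \sup_{(t,x) \in [0,T] \times [0,1]} |\ddot{G}_x(t)|,$$
$$\|\dot{p}_x\| = \sup_{x \in [0,1]} |\dot{p}_x|, \quad \|\ddot{p}_x\| = \sup_{x \in [0,1]} |\ddot{p}_x|,$$
where a single dot denotes the first derivative with respect to $x$.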
We denote the weighted estimators of $G_x(t)$ and $G_{mx}(t)$, $m = 0, 1$, obtained from (1) by
$$\begin{aligned}
G_{xh}(t) &= Q_{xh}((-\infty, t]) = \sum_{i=1}^{n} w_{ni}(x; h_n) I(Z_i \le t), \\
G_{mxh}(t) &= Q_{mxh}((-\infty, t]) = \sum_{i=1}^{n} w_{ni}(x; h_n) I(Z_i \le t, \delta_i = m), \quad m = 0, 1,
\end{aligned} \tag{2}$$
where, by the definition of $w_{ni}(x; h_n)$, we have $w_{n1}(x; h_n) + \dots + w_{nn}(x; h_n) = 1$. Note that for $w_{ni}(x; h_n) = 1/n$, $i = 1, \dots, n$, the estimators (2) reduce to the usual empirical estimators of $G_x(t)$ and $G_{mx}(t)$, $m = 0, 1$.
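A minimal sketch of the estimators (2), together with the empirical signed measure $Q_{1xh}(B) - p_{xh} Q_{xh}(B)$ evaluated on $B = (-\infty, t]$, may help fix ideas. It assumes the weights have already been computed; the function name and the toy data are hypothetical.

```python
def weighted_estimates(t, z, delta, weights):
    """Return (G_xh(t), G_1xh(t), p_xh, G_1xh(t) - p_xh * G_xh(t)).

    z       -- observed values Z_1, ..., Z_n
    delta   -- indicators delta_1, ..., delta_n (0 or 1)
    weights -- smoothing weights w_n1, ..., w_nn, assumed to sum to one
    """
    g = sum(w for w, zi in zip(weights, z) if zi <= t)
    g1 = sum(w for w, zi, di in zip(weights, z, delta) if zi <= t and di == 1)
    p = sum(w for w, di in zip(weights, delta) if di == 1)
    return g, g1, p, g1 - p * g


# Toy data with uniform weights 1/n (the classical empirical case).
z = [1.0, 2.0, 3.0, 4.0]
delta = [1, 0, 1, 0]
uniform = [0.25] * 4
g, g1, p, signed = weighted_estimates(2.5, z, delta, uniform)
```

For this toy sample the signed measure vanishes at $t = 2.5$, which is the behaviour expected on average under the hypothesis $H$.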
For sufficiently large $n$, by condition (C1), we have $C_n(x; h_n) \approx 1$. Hence, in the asymptotic calculations below we put $C_n(x; h_n) = 1$.
Now we give some asymptotic results for the estimators (2) from [6, 8]. Let $T < T_{G_x} = \inf\{t : G_x(t) = 1\}$.
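Lemma 2.1 ([6]). (Bias and variance). (a) Let conditions (C1)-(C3) be satisfied and let $h_n \to 0$, $n h_n \to \infty$ as $n \to \infty$. Then, as $n \to \infty$,
$$\sup_{0 \le t \le T} |EG_{xh}(t) - G_x(t)| = o(h_n).$$
(b) Let conditions (C1)-(C3) be satisfied and let $h_n \to 0$ as $n \to \infty$. Then, as $n \to \infty$,
$$\sup_{0 \le t \le T} |EG_{xh}(t) - G_x(t)| = O\left(h_n^2 + \frac{1}{n}\right).$$
In particular,
$$\sup_{0 \le t \le T} \left|EG_{xh}(t) - G_x(t) - \frac{1}{2} m_2(k) \ddot{G}_x(t) h_n^2\right| = o(h_n^2) + O\left(\frac{1}{n}\right).$$
(c) Under the conditions of (a), as $n \to \infty$, the variance satisfies
$$DG_{xh}(t) = \frac{1}{n h_n} G_x(t)(1 - G_x(t)) \|k\|_2^2 + o\left(\frac{1}{n h_n}\right),$$
where $m_2(k) = \int_{-\infty}^{\infty} y^2 k(y)\,dy$ and $\|k\|_2^2 = \int_{-\infty}^{\infty} k^2(y)\,dy$.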
Lemma 2.2 ([6]). (Pointwise strong consistency). Let conditions (C1)-(C3) be satisfied and let $h_n \to 0$, $\frac{\log n}{n h_n} = o(1)$ as $n \to \infty$. Then, as $n \to \infty$, for $t \le T$,
$$G_{xh}(t) \to G_x(t) \quad \text{a.s.}$$
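Lemma 2.3 ([6]). (Exponential bound of Dvoretzky-Kiefer-Wolfowitz type). Let conditions (C1), (C2) be satisfied and let $n h_n \to \infty$ as $n \to \infty$.
(a) For $\varepsilon > 0$ and large $n$ such that $\varepsilon^2 > 2 (n h_n)^{-1}$, and for $T > 0$,
$$P\left(\sup_{0 \le t \le T} |G_{xh}(t) - EG_{xh}(t)| > \varepsilon\right) \le 2 d_0 n h_n \varepsilon \exp\left(-d_1 n h_n \varepsilon^2\right).$$
(b) Moreover, if condition (C3) holds, then for $\varepsilon > 0$ and $n$ such that
$$\varepsilon > \max\left\{\sqrt{6}\,\|k\|_2 (n h_n)^{-1/2},\ 2 \|\dot{G}_x\| \overline{\Delta}_n + 2 m_2(k) \|\ddot{G}_x\| h_n^2\right\},$$
we have
$$P\left(\sup_{0 \le t \le T} |G_{xh}(t) - G_x(t)| > \varepsilon\right) \le \frac{1}{2} d_0 n h_n \varepsilon \exp\left(-\frac{1}{4} d_1 n h_n \varepsilon^2\right), \tag{3}$$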
where $d_0 = \frac{8 e^2}{\sqrt{2}\,\|k\|_2}$ and $d_1 = \frac{4}{3 \|k\|_2^2}$. From (3), by the Borel-Cantelli lemma with $\varepsilon = \varepsilon_n = c (n h_n)^{-1/2} (\log n)^{1/2}$, we obtain the following result.
Lemma 2.4 ([6]). (Rate of strong uniform consistency). Let conditions (C1)-(C3) be satisfied and let $\frac{n h_n^5}{\log n} = O(1)$ as $n \to \infty$. Then, as $n \to \infty$,
$$\sup_{0 \le t \le T} |G_{xh}(t) - G_x(t)| = O\left(\left(\frac{\log n}{n h_n}\right)^{1/2}\right) \quad \text{a.s.}$$
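For a measure $G_x$ and a class $\mathcal{F}$ of Borel measurable functions $f : \mathfrak{X} \to \mathbb{R}$, we introduce the integral over $\mathfrak{X}$
$$G_x f = \int_{\mathfrak{X}} f\,dG_x, \quad f \in \mathcal{F},$$
which is the expectation of the function $f$ with respect to the measure $G_x$. Let us introduce the following $\mathcal{F}$-indexed extensions of (1) for $f \in \mathcal{F}$:
$$G_{xh} f = \int_{\mathfrak{X}} f\,dG_{xh} = \sum_{i=1}^{n} w_{ni}(x; h_n) f(Z_i) = G_{0xh} f + G_{1xh} f,$$
where
$$G_{0xh} f = \sum_{i=1}^{n} w_{ni}(x; h_n) (1 - \delta_i) f(Z_i), \qquad G_{1xh} f = \sum_{i=1}^{n} w_{ni}(x; h_n) \delta_i f(Z_i).$$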
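Introduce the empirical process under the validity of $H$,
$$\left(\frac{n h_n}{p_{xh}(1 - p_{xh})}\right)^{1/2} \Lambda_{xh} f, \quad f \in \mathcal{F}, \tag{4}$$
and the expansion
$$\left(\frac{n h_n}{p_{xh}(1 - p_{xh})}\right)^{1/2} (\Lambda_{xh} - \Lambda_x) f = A_{1xh} f - p_x \cdot A_{xh} f - G_x f \cdot A_{1xh} 1 - R_{xh}(f), \quad f \in \mathcal{F}, \tag{5}$$
where
$$\begin{aligned}
A_{xh} f &= \left(\frac{n h_n}{p_{xh}(1 - p_{xh})}\right)^{1/2} \int_{\mathfrak{X}} f\,d(G_{xh} - G_x), \\
A_{1xh} f &= \left(\frac{n h_n}{p_{xh}(1 - p_{xh})}\right)^{1/2} \int_{\mathfrak{X}} f\,d(G_{1xh} - G_{1x}), \\
R_{xh}(f) &= \left(\frac{n h_n}{p_{xh}(1 - p_{xh})}\right)^{1/2} (p_{xh} - p_x) \int_{\mathfrak{X}} f\,d(G_{xh} - G_x).
\end{aligned} \tag{6}$$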
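The algebra behind expansion (5) can be checked in three lines. The following worked step is a sketch, writing $\Lambda$ for the signed measure and using only the definitions $\Lambda_{xh} f = G_{1xh} f - p_{xh} G_{xh} f$ and $\Lambda_x f = G_{1x} f - p_x G_x f$:

```latex
\begin{align*}
(\Lambda_{xh} - \Lambda_x) f
  &= (G_{1xh} - G_{1x}) f - \bigl(p_{xh} G_{xh} f - p_x G_x f\bigr), \\
p_{xh} G_{xh} f - p_x G_x f
  &= p_x (G_{xh} - G_x) f + G_x f \,(p_{xh} - p_x)
     + (p_{xh} - p_x)(G_{xh} - G_x) f, \\
p_{xh} - p_x
  &= (G_{1xh} - G_{1x})\, 1 .
\end{align*}
```

Multiplying the first identity by $\left(n h_n / (p_{xh}(1 - p_{xh}))\right)^{1/2}$ and substituting the other two gives the four terms on the right-hand side of (5), the last product being the remainder $R_{xh}(f)$.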
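In order to consider uniform versions of the Glivenko-Cantelli and Donsker theorems we need some notation from bracketing entropy theory. Let $L_q(Q)$ be the space of functions $f : \mathfrak{X} \to \mathbb{R}$ with the norm
$$\|f\|_{Q,q} = (Q|f|^q)^{1/q} = \left(\int_{\mathfrak{X}} |f|^q\,dQ\right)^{1/q}.$$
To measure the complexity, or entropy, of a set $\mathcal{F}$ of Borel measurable functions we need the concept of an $\varepsilon$-bracket in $L_q(Q)$. An $\varepsilon$-bracket in $L_q(Q)$ is a pair of functions $\varphi, \psi \in L_q(Q)$ such that $Q(\varphi(Z) \le \psi(Z)) = 1$ and $\|\psi - \varphi\|_{Q,q} \le \varepsilon$, that is, $Q(\psi - \varphi)^q \le \varepsilon^q$. A function $f \in \mathcal{F}$ is covered by the bracket $[\varphi, \psi]$ if $Q(\varphi(Z) \le f(Z) \le \psi(Z)) = 1$. Note that the functions $\varphi$ and $\psi$ need not belong to $\mathcal{F}$, but they must have finite norms. The bracketing number $N_{[\,]}(\varepsilon, \mathcal{F}, L_q(Q))$ is the minimal number of $\varepsilon$-brackets in $L_q(Q)$ needed to cover the set $\mathcal{F}$ [3, 4]:
$$N_{[\,]}(\varepsilon, \mathcal{F}, L_q(Q)) = \min\Bigl\{k : \text{for some } f_1, \dots, f_k \in L_q(Q),\ \mathcal{F} \subset \bigcup_{i,j} [f_i, f_j],\ \|f_j - f_i\|_{Q,q} \le \varepsilon\Bigr\}.$$
The number $H_q(\varepsilon) = \log N_{[\,]}(\varepsilon, \mathcal{F}, L_q(Q))$ is called the metric entropy of the class $\mathcal{F}$ in $L_q(Q)$. The metric entropy of the class $\mathcal{F}$ in $L_q(Q_m)$, $m = 0, 1$, is denoted by $H_{mq}(\varepsilon) = \log N_{m[\,]}(\varepsilon, \mathcal{F}, L_q(Q_m))$. The integrals of the metric entropies are
$$J^{(q)}_{m[\,]}(\delta) = J_{m[\,]}(\delta, \mathcal{F}, L_q(Q_m)) = \int_0^{\delta} (H_{mq}(\varepsilon))^{1/2}\,d\varepsilon, \quad 0 < \delta \le 1, \quad m = 0, 1.$$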
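Let us recall the important properties of the numbers $N_{[\,]}(\cdot)$. They tend to $+\infty$ as $\varepsilon \downarrow 0$. However, for the Donsker theorems they should not grow to $+\infty$ too fast. This rate of growth is measured by the integrals $J^{(q)}_{m[\,]}(\delta)$. For example, for a class $\mathcal{F}$ of monotone functions $f : \mathfrak{X} \to [0, 1]$ and each measure $Q_m$ one has
$$H_{mq}(\varepsilon) \le k_0 \varepsilon^{-1},$$
where $k_0$ depends only on $q$. In particular, for the class of indicators $\mathcal{F} = \{I(-\infty, t],\ t \in \mathbb{R}\}$ the entropy is $H_{m1}(\varepsilon) \sim |\log \varepsilon|$ and, as $n \to \infty$,
$$J^{(2)}_{m[\,]}(\delta_n) = \int_0^{\delta_n} [\log O(\varepsilon^{-1})]^{1/2}\,d\varepsilon = O(\delta_n^{1/2}) \to 0, \quad \delta_n \downarrow 0.$$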
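The indicator example can be made concrete. The sketch below assumes, purely for illustration, that $Q$ is the uniform distribution on $[0, 1]$; it builds an explicit family of $\varepsilon$-brackets in $L_q(Q)$ for the indicator class and integrates the resulting upper bound on the bracketing entropy numerically. The function names are hypothetical, and the bracket count is an upper bound rather than the minimal bracketing number.

```python
import math


def bracket_count(eps: float, q: int = 1) -> int:
    """Number of grid brackets of L_q(Q)-size at most eps, Q = Uniform(0, 1)."""
    return math.ceil(eps ** (-q))


def indicator_brackets(eps: float, q: int = 1):
    """Pairs (a, b) encoding the brackets [I(z <= a), I(z <= b)].

    Every indicator I(z <= t) with a <= t <= b lies between the two ends,
    and the L_q(Q) size of the bracket is (b - a) ** (1 / q) <= eps.
    """
    k = bracket_count(eps, q)
    grid = [j / k for j in range(k + 1)]
    return list(zip(grid[:-1], grid[1:]))


def entropy_integral(delta: float, q: int = 2, steps: int = 2000) -> float:
    """Midpoint-rule value of the integral over (0, delta) of sqrt(log count)."""
    step = delta / steps
    total = 0.0
    for i in range(steps):
        eps = (i + 0.5) * step
        total += math.sqrt(math.log(bracket_count(eps, q))) * step
    return total
```

Since the number of brackets grows only polynomially in $1/\varepsilon$, the entropy grows like $|\log \varepsilon|$ and the integral shrinks to zero with $\delta$, in line with the statement above.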
We now investigate relation (5) and its summands (6). The next lemma is useful for showing that the remainder term $R_{xh}(f)$ in (6) converges to zero.
Lemma 2.5 ([8]). Assume (C1), (C2), (C4) and $h_n \to 0$.
(a) For $\varepsilon > 0$ and $n$ sufficiently large such that
$$\varepsilon > 2 \|\dot{p}_x\| \overline{\Delta}_n + 2 m_2(k) \|\ddot{p}_x\| h_n^2,$$
we have
$$P(|p_{xh} - p_x| > \varepsilon) \le 2 \exp\left(-\frac{d\,n h_n \varepsilon^2}{1 + \varepsilon/6}\right),$$
where $d$ is some absolute constant.
(b) If $\frac{\log n}{n h_n} \to 0$, then $p_{xh} - p_x \to 0$ a.s.
(c) If $\frac{n h_n^5}{\log n} = O(1)$, then $p_{xh} - p_x = O\left((n h_n)^{-1/2} (\log n)^{1/2}\right)$ a.s.
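Now we prove that the two-dimensional vector field $(A_{xh} f, A_{1xh} g)$, $f, g \in \mathcal{F}$, converges weakly in the space $\ell^{\infty}(\mathcal{F}) \times \ell^{\infty}(\mathcal{F})$ to the corresponding Gaussian field for a class of measurable functions $\mathcal{F}$. This is needed for the investigation of expansion (5).
Theorem 2.1. Let conditions (C1)-(C4) hold and let the class $\mathcal{F}$ of measurable functions $f$ be such that
$$\mathcal{F} \subset L_2(Q_{mx}) \quad \text{and} \quad J^{(2)}_{m[\,]}(1) < \infty, \quad m = 0, 1. \tag{7}$$
Then, as $n \to \infty$, the sequence of random vector fields $(A_{xh} f, A_{1xh} g)$, $f, g \in \mathcal{F}$, converges weakly in $\ell^{\infty}(\mathcal{F}) \times \ell^{\infty}(\mathcal{F})$ to the Gaussian field $(A_x f, A_{1x} g)$, $f, g \in \mathcal{F}$, with zero mean and covariance structure
$$\begin{aligned}
\operatorname{cov}(A_x f, A_x g) &= \|k\|_2^2 \{G_x fg - G_x f\,G_x g\}, \\
\operatorname{cov}(A_{1x} f, A_{1x} g) &= \|k\|_2^2 \{G_{1x} fg - G_{1x} f\,G_{1x} g\}, \\
\operatorname{cov}(A_x f, A_{1x} g) &= \|k\|_2^2 \{G_{1x} fg - G_x f\,G_{1x} g\}.
\end{aligned} \tag{8}$$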
Proof. Consider the first condition in (7). For fixed $f \in \mathcal{F}$ it implies that $Q_{mx} f^2 < \infty$, $m = 0, 1$, and hence $Q_x f^2 = Q_{0x} f^2 + Q_{1x} f^2 < \infty$. For every such Donsker class $\mathcal{F}$ satisfying the second condition in (7), the sequences $A_{xh} f$ and $A_{1xh} g$ are asymptotically tight (see Lemma 1.3.8 in [3]). There exist tight Borel measurable versions of the Gaussian processes $A_x f$ and $A_{1x} g$, that is, Gaussian processes with zero mean and joint covariance (8). Tightness and measurability of the limiting processes $A_x f$ and $A_{1x} g$ are equivalent to the existence of versions with all sample paths $f \mapsto A_x f$, $g \mapsto A_{1x} g$ uniformly bounded and uniformly continuous with respect to the corresponding mean-square metrics (see [3], p. 226)
$$E(A_x f - A_x g)^2 = \sigma^2_{Q_x}(f) + \sigma^2_{Q_x}(g) + \sigma^2_{Q_x}(f - g),$$
$$E(A_{1x} f - A_{1x} g)^2 = \sigma^2_{Q_{1x}}(f) + \sigma^2_{Q_{1x}}(g) + \sigma^2_{Q_{1x}}(f - g),$$
where $\sigma^2_{Q_x}(f) = Q_x(f - Q_x f)^2$ and $\sigma^2_{Q_{1x}}(f) = Q_{1x}(f - Q_{1x} f)^2$.
On the other hand, the vector field under consideration is a normalized sum of independent and identically distributed random vectors
$$(A_{xh} f, A_{1xh} g) = (n h_n)^{-1/2} \sum_{i=1}^{n} \bigl(w_{ni}(x; h_n)(f(Z_i) - Q_x f),\ w_{ni}(x; h_n)(\delta_i g(Z_i) - Q_{1x} g)\bigr). \tag{9}$$
Hence, by the multivariate central limit theorem, the marginals of the sequence of vector fields converge to the marginals of a Gaussian vector-valued field with zero mean and covariance matrix defined by structure (8). The vector field (9) is an element of the product space $\ell^{\infty}(\mathcal{F}) \times \ell^{\infty}(\mathcal{F})$, and it also induces tight sequences of distributions in the product space by Lemma 1.4.3 of [3]. The limiting covariance structure of the vector (9) coincides with the covariance structure (8). These arguments complete the proof of Theorem 2.1. □
Remark 2.1. Consider formulas (8). For $g = 1$ and $f \in \mathcal{F}$ we have $G_{1x} 1 = p_x$ and hence
$$\operatorname{cov}(A_x f, A_{1x} 1) = G_{1x} f - G_x f\,G_{1x} 1 = G_{1x} f - p_x G_x f = \Lambda_x f. \tag{10}$$
Since the covariance (10) is zero under the hypothesis $H$, the Gaussian field $\{A_x f, f \in \mathcal{F}\}$ and the normal r.v. $A_{1x} 1$ with variance $p_x(1 - p_x)$ are independent.
Remark 2.2. By Lemmas 2.1-2.5 and the consistency of $G_{xh}$ for $G_x$ and of $p_{xh}$ for $p_x$, the remainder $R_{xh}(f)$ tends to zero as $n \to \infty$: $|R_{xh}(f)| = o(1)$ in probability.
Now we study the normalized empirical process (5) without the remainder term $R_{xh}(f)$, which tends to zero as $n \to \infty$. Let us denote
$$\Delta_{xh} f = (n h_n)^{1/2} (\Lambda_{xh} - \Lambda_x) f = (p_{xh}(1 - p_{xh}))^{1/2} \{A_{1xh} f - p_x \cdot A_{xh} f - G_x f \cdot A_{1xh} 1\}.$$
This process is an intermediate random field; it plays a supporting role in the study of the basic process (5), whose weak convergence to the corresponding Gaussian process is stated in the following theorem.
Theorem 2.2. Let conditions (C1)-(C4) and (7) hold. Then, as $n \to \infty$,
$$\Delta_{xh} f \Rightarrow \Delta_x f \quad \text{in } \ell^{\infty}(\mathcal{F}), \tag{11}$$
where $\{\Delta_x f, f \in \mathcal{F}\}$ is a Gaussian field with zero mean and covariance
$$\operatorname{cov}(\Delta_x f, \Delta_x g) = \|k\|_2^2\, p_x(1 - p_x) \{G_x fg - G_x f\,G_x g\}. \tag{12}$$
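Proof. Consider the process $\Delta_{xh} f$, which by Theorem 2.1 is asymptotically zero-mean Gaussian. It therefore suffices to consider its covariance
$$\operatorname{cov}(\Delta_{xh} f, \Delta_{xh} g) = \|k\|_2^2\, p_x(1 - p_x) \sum_{j=1}^{9} C_j, \tag{13}$$
where
$$\begin{aligned}
C_1 &= G_{1x} fg - G_{1x} f\,G_{1x} g, & C_2 &= -p_x (G_{1x} fg - G_x f\,G_{1x} g), \\
C_3 &= -(1 - p_x) G_x f\,G_{1x} g, & C_4 &= -p_x (G_{1x} fg - G_x g\,G_{1x} f), \\
C_5 &= p_x^2 (G_x fg - G_x f\,G_x g), & C_6 &= p_x G_x f (G_{1x} g - p_x G_x g), \\
C_7 &= -(1 - p_x) G_x g\,G_{1x} f, & C_8 &= p_x G_x g (G_{1x} f - p_x G_x f), \\
C_9 &= p_x (1 - p_x) G_x f\,G_x g. &&
\end{aligned} \tag{14}$$
Adding all the terms (14) according to formula (13), we obtain (12). Theorem 2.2 is proved. □
Thus, statistics for testing the hypothesis $H$ can be constructed from the normalized process as some functional of it.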
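The summation of the nine terms is easy to verify with exact rational arithmetic. The sketch below writes $a = G_x f$, $b = G_x g$, $c = G_x fg$ and imposes the hypothesis $H$ through $G_{1x} f = p_x G_x f$, $G_{1x} g = p_x G_x g$, $G_{1x} fg = p_x G_x fg$. It takes $C_8 = p_x G_x g (G_{1x} f - p_x G_x f)$, the symmetric counterpart of $C_6$, which is an assumption made here; the function names are hypothetical.

```python
from fractions import Fraction


def covariance_terms(p, a, b, c, a1, b1, c1):
    """Terms C_1, ..., C_9, where a1, b1, c1 stand for G_1x f, G_1x g, G_1x fg."""
    return [
        c1 - a1 * b1,            # C_1
        -p * (c1 - a * b1),      # C_2
        -(1 - p) * a * b1,       # C_3
        -p * (c1 - b * a1),      # C_4
        p ** 2 * (c - a * b),    # C_5
        p * a * (b1 - p * b),    # C_6
        -(1 - p) * b * a1,       # C_7
        p * b * (a1 - p * a),    # C_8 (symmetric form, assumed)
        p * (1 - p) * a * b,     # C_9
    ]


def sum_under_h(p, a, b, c):
    """Sum of the nine terms when the independence hypothesis H holds."""
    return sum(covariance_terms(p, a, b, c, p * a, p * b, p * c))


p, a, b, c = Fraction(1, 3), Fraction(1, 2), Fraction(1, 4), Fraction(1, 5)
total = sum_under_h(p, a, b, c)
```

Under $H$ the terms $C_6$ and $C_8$ vanish and the remaining ones collapse to $p_x(1 - p_x)\{G_x fg - G_x f\,G_x g\}$, the bracket appearing in (12).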
3. Application to random censoring
Let us consider a right random censoring model, where $Z_i = \min\{T_i, C_i\}$ and $A_i = \{T_i \le C_i\}$. Here the r.v.-s $T_i$ and $C_i$ denote lifetimes and censoring times, which are independent at the fixed design points $0 < x_1 < x_2 < \dots < x_n < 1$. Hence at each design point $x_i$ there is an r.v. $C_i$ such that we only observe the pair $(Z_i, \delta_i)$, where $\delta_i = I(A_i)$. Furthermore, we suppose that the d.f.-s $F_{x_i}$ and $K_{x_i}$ of the r.v.-s $T_i$ and $C_i$ are continuous and $F_{x_i}(0) = K_{x_i}(0) = 0$. Consequently, the d.f. of $Z_i$ is
$$H_{x_i}(t) = P(Z_i \le t \mid X = x_i) = 1 - (1 - F_{x_i}(t))(1 - K_{x_i}(t)).$$
The subdistributions are defined as
$$Q_{0x_i}(B) = P(Z_i \in B, \delta_i = 0 \mid X = x_i) = P(C_{x_i} \in B \cap [0, T_{x_i}]) = \int_B (1 - F_{x_i}(t))\,K_{x_i}(dt),$$
$$Q_{1x_i}(B) = P(Z_i \in B, \delta_i = 1 \mid X = x_i) = P(T_{x_i} \in B \cap [0, C_{x_i}]) = \int_B (1 - K_{x_i}(t))\,F_{x_i}(dt).$$
As in the situation without covariates, we can also define in this model a Koziol-Green type sub-model by assuming that, for a given design point $x$, the conditional survival function of $C_x$ is some power of the conditional survival function of $T_x$: for $t > 0$,
$$1 - K_x(t) = (1 - F_x(t))^{\beta_x},$$
where $\beta_x > 0$ is allowed to depend on the covariate $x$. We note that here
$$P(\delta_x = 1) = \int_0^{\infty} (1 - K_x(t))\,dF_x(t) = \int_0^{\infty} (1 - F_x(t))^{\beta_x}\,dF_x(t),$$
$$P(\delta_x = 0) = \int_0^{\infty} (1 - F_x(t))\,dK_x(t) = \beta_x \int_0^{\infty} (1 - F_x(t))^{\beta_x}\,dF_x(t),$$
and hence $\beta_x = \frac{P(\delta_x = 0)}{P(\delta_x = 1)}$.
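For completeness, both probabilities can be evaluated explicitly. The following worked step is a sketch that uses only the continuity of $F_x$, the condition $F_x(0) = 0$ and the substitution $u = 1 - F_x(t)$:

```latex
\begin{align*}
dK_x(t) &= \beta_x (1 - F_x(t))^{\beta_x - 1}\, dF_x(t), \\
P(\delta_x = 1) &= \int_0^{\infty} (1 - F_x(t))^{\beta_x}\, dF_x(t)
                 = \int_0^1 u^{\beta_x}\, du = \frac{1}{1 + \beta_x}, \\
P(\delta_x = 0) &= \beta_x \int_0^1 u^{\beta_x}\, du = \frac{\beta_x}{1 + \beta_x}.
\end{align*}
```

The two probabilities sum to one, and their ratio recovers $\beta_x$.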
By this extra assumption the estimator in this sub-model has a simpler form than in the general model and is given by
$$F_{xh}(t) = 1 - (1 - H_{xh}(t))^{\gamma_{xh}},$$
where $H_{xh}(t) = \sum_{i=1}^{n} w_{ni}(x; h_n) I(Z_i \le t)$ and $\gamma_{xh} = \sum_{i=1}^{n} w_{ni}(x; h_n) \delta_i$ are Stone-type estimators for $H_x(t) = P(Z_x \le t)$ and $\gamma_x = \frac{1}{1 + \beta_x} = P(\delta_x = 1)$. This estimator has been studied more extensively by Veraverbeke and Cadarso-Suarez [7]. These authors noted that estimation and testing methods in the Koziol-Green proportional hazards model based on $F_{xh}$ are superior to those based on the product-limit estimator of Kaplan and Meier [10] or the relative-risk power estimator of Abdushukurov [9]. Hence the question arises as to when the advantages of the Koziol-Green model can be used. In other words, there is a need to test the validity of the composite hypothesis described by relation (10). But this relation is equivalent to the hypothesis $H$ of independence of the r.v.-s $Z_x$ and $\delta_x$ in the sample. Let us consider the following special normalized empirical process and the special Kolmogorov-type statistic obtained from (15): $\sup_{|t| < \infty} |\Delta_{xh}(t)|$, where
$$\Delta_{xh}(t) = \left(\frac{n h_n}{p_{xh}(1 - p_{xh})}\right)^{1/2} \|k\|_2^{-1} \bigl(H_{1xh}(t) - p_{xh} H_{xh}(t)\bigr), \quad |t| < \infty, \tag{16}$$
where $H_{1xh}(t) = \sum_{i=1}^{n} w_{ni}(x; h_n) I(Z_i \le t, \delta_i = 1)$. Then we have the following consequence of Theorem 2.2: if $H$ holds, then as $n \to \infty$
$$\Delta_{xh}(t) \Rightarrow B(H_x(t)), \tag{17}$$
where $\{B(y), 0 \le y \le 1\}$ is a Brownian bridge. Note that the statistics based on convergence (17) are consistent. Moreover, by Theorem 2.2 one can consider more general classes of statistics using $\mathcal{F}$-indexed processes, which are more flexible in applications than (16).
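To make the construction concrete, the following sketch evaluates the Koziol-Green type estimator $1 - (1 - H_{xh}(t))^{\gamma_{xh}}$ and the Kolmogorov-type statistic built from (16). It assumes that the weights are given, that there are no ties among the $Z_i$, and that $0 < p_{xh} < 1$; the function names, the toy data and the value $\|k\|_2^2 = 0.6$ (the Epanechnikov kernel) are illustrative assumptions.

```python
import math


def kg_estimator(t, z, delta, weights):
    """Koziol-Green type estimate 1 - (1 - H_xh(t)) ** gamma_xh."""
    h_xh = sum(w for w, zi in zip(weights, z) if zi <= t)
    gamma = sum(w * d for w, d in zip(weights, delta))
    return 1.0 - (1.0 - h_xh) ** gamma


def kolmogorov_statistic(z, delta, weights, n, h, k_norm_sq):
    """Supremum over t of the absolute value of the process in (16).

    H_1xh(t) - p_xh * H_xh(t) is a step function that jumps only at the
    observed Z_i, so it is enough to scan the sorted sample.
    """
    p = sum(w * d for w, d in zip(weights, delta))
    scale = math.sqrt(n * h / (p * (1.0 - p) * k_norm_sq))
    order = sorted(range(len(z)), key=lambda i: z[i])
    h_all = 0.0
    h_one = 0.0
    sup = 0.0
    for i in order:
        h_all += weights[i]
        h_one += weights[i] * delta[i]
        sup = max(sup, abs(h_one - p * h_all))
    return scale * sup


z = [1.0, 2.0, 3.0, 4.0]      # toy sample
delta = [1, 0, 1, 0]
w = [0.25] * 4                # uniform weights, for illustration only
stat = kolmogorov_statistic(z, delta, w, n=4, h=1.0, k_norm_sq=0.6)
f_hat = kg_estimator(2.5, z, delta, w)
```

In practice the weights would be the Gasser-Müller weights at the design point of interest, and the observed statistic would be compared with quantiles of the supremum of a Brownian bridge, as suggested by (17).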
References
[1] A.A. Abdushukurov, L.R. Kakadjanova, A class of special empirical processes of independence, J. Sib. Fed. Univ. Math. Phys., 8(2015), no. 2, 125-133.
[2] A.A. Abdushukurov, L.R. Kakadjanova, Sequential empirical process of independence, J. Sib. Fed. Univ. Math. Phys., 11(2018), no. 5, 634-643. DOI: 10.17516/1997-1397-2018-11-5-634-643
[3] A.W. Van der Vaart, J.A. Wellner, Weak convergence and empirical processes, Springer, 1996.
[4] A.W. Van der Vaart, Asymptotic Statistics, Cambridge University Press, 1998.
[5] V.N. Vapnik, Statistical learning theory, Wiley, New York, 1998.
[6] I. Van Keilegom, N. Veraverbeke, Estimation and bootstrap with censored data in fixed design nonparametric regression, Annals of the Institute of Statistical Mathematics, 49(1997), 467-491.
[7] N. Veraverbeke, C. Cadarso-Suarez, Estimation of the conditional distribution in a conditional Koziol-Green model, Test, 9(2000), 97-122.
[8] R. Braekers, Regression problems with partially informative or dependent censoring, Limburgs Universitair Centrum, 2004.
[9] A.A. Abdushukurov, Nonparametric estimation of the distribution function based on relative risk function, Commun. Statist. Theory Methods, 27(1998), no. 8, 1991-2012.
[10] E.L. Kaplan, P. Meier, Nonparametric estimation from incomplete observations, J.A.S.A., 53(1958), 457-481.
Special Empirical Processes of Independence in the Presence of Covariates
Abduraxim A. Abdushukurov
Tashkent Branch of Moscow State University named after M. V. Lomonosov
Tashkent, Uzbekistan
V. I. Romanovskiy Institute of Mathematics of Uzbekistan Academy of Sciences
Tashkent, Uzbekistan
Farkhad A. Abdikalikov
Karakalpak State University
Nukus, Uzbekistan
V. I. Romanovskiy Institute of Mathematics of Uzbekistan Academy of Sciences
Tashkent, Uzbekistan
Abstract. We study asymptotic properties of a class of empirical processes in the presence of covariates for a certain class of measurable functions.
Keywords: empirical processes, metric entropy, Gaussian processes.