
DOI: 10.17516/1997-1397-2022-15-4-523-536   UDC 519

On the Nonparametric Estimation of the Functional Regression Based on Censored Data under Strong Mixing Condition

Farid Leulmi*, Sara Leulmi†

University Frères Mentouri Constantine, Algeria

Soumia Kharfouchi‡

University Salah Boubnider Constantine, Algeria

Received 04.02.2022, received in revised form 09.03.2022, accepted 10.05.2022

Abstract. In this paper, we are concerned with local linear nonparametric estimation of the regression function in the censorship model when the covariates take values in a semimetric space. Then, we establish the pointwise almost-complete convergence, with rate, of the proposed estimator when the sample is a strong mixing sequence. To lend further support to our theoretical results, a simulation study is carried out to illustrate the good accuracy of the studied method.

Keywords: functional data, censored data, locally modeled regression, almost-complete convergence, strong mixing.

Citation: F. Leulmi, S. Leulmi, S. Kharfouchi, On the Nonparametric Estimation of the Functional Regression Based on Censored Data under Strong Mixing Condition, J. Sib. Fed. Univ. Math. Phys., 2022, 15(4), 523-536. DOI: 10.17516/1997-1397-2022-15-4-523-536.

1. Introduction and preliminaries

Nonparametric estimation for functional data is an important subject in the modern statistical literature. This research field is motivated by the fact that many data sets collected in practice are given in the form of curves. The monograph of [7] is a pioneering work in the nonparametric setting, where the authors established the pointwise almost-complete convergence of different kernel-type estimators.

However, many works show that the performance of the local linear method is better than that of the kernel one. Such is the case in [2], where the authors obtained the rate of the pointwise almost-complete convergence of a local linear estimator of the regression function. The uniform convergence of other nonparametric local linear estimators has been investigated in papers such as [6, 13, 17], in the independent and identically distributed (i.i.d.) data case.

Unfortunately, in many practical applications, such as reliability and survival time studies, the response variable of interest may be incompletely observed, which makes the study of censored data all the more useful in practice.

*[email protected]   †[email protected]   ‡[email protected]   © Siberian Federal University. All rights reserved

This can be seen, for example, in the work of [1], where the authors gave a family of robust nonparametric estimators for which consistency and asymptotic normality results are established under independent data. For the same kind of data, [10, 11] investigated the rates of the pointwise and the uniform almost-complete convergence of a local linear estimator of the conditional quantile and of the regression function, and showed that the local linear method outperforms the kernel method even for censored data.

All the above mentioned works concern the independent functional data case. Nevertheless, in many situations we face dependent data. A widely studied example is the case of $\alpha$-mixing dependence. We refer to [15] for kernel nonparametric regression estimation under random censorship. [16] examined the almost-complete consistency and the asymptotic normality of an estimator of the relative error regression for strictly stationary data. Furthermore, [3] used the local linear approach to estimate the conditional density and established its pointwise almost sure convergence in the censored and functional $\alpha$-mixing case.

By combining ideas from the two previous works [12] and [11], devoted to the local linear estimation of the regression function for complete dependent functional data and for independent censored functional data respectively, we propose a novel estimation procedure for the regression function in the case of dependent functional and incomplete data. Among incomplete data models, we are interested here in right censoring, which is frequently encountered in practice.

To our knowledge, the local linear estimation of the regression function combining censored and functional dependent data has not been studied in the statistical literature, so we address this problem in the present work. More precisely, we first present, in Section 2, a local linear estimator of the regression function. Then, in Section 3, we establish the rate of its pointwise almost-complete convergence under standard conditions. A simulation study is carried out in Section 4 to show the good behaviour of our estimator. Finally, the proofs of the main results are given in the Appendix.

Throughout this paper the following notations will be adopted. Let $\tau_U = \sup\{t \in \mathbb{R};\ F_U(t) < 1\}$ denote the upper endpoint of the support of $F_U$, where $F_U(t) = P(U \le t)$ denotes the distribution function of a real random variable (r.r.v.) $U$. Furthermore, $\mathcal{F}$ is an infinite-dimensional space equipped with a semimetric $d$, $X$ is a random variable valued in $\mathcal{F}$ and, for any $x \in \mathcal{F}$ and $h > 0$, $B(x,h) := \{y \in \mathcal{F};\ d(x,y) \le h\}$ denotes the closed ball in $\mathcal{F}$ of center $x$ and radius $h$. We also define $\Phi_x(r_1, r_2) := P(r_1 \le d(x,X) \le r_2)$, where $r_1$ and $r_2$ are two real numbers.

For the sake of clarity, we feel welcome to recall some definitions.

• Let $\{Z_i,\ i = 1, 2, \dots\}$ be a strictly stationary sequence of random variables and let $\mathcal{F}_i^k(Z)$ denote the $\sigma$-algebra generated by $\{Z_j,\ i \le j \le k\}$. Given a positive integer $n$, set
$$\alpha(n) = \sup\left\{ |P(A \cap B) - P(A)P(B)| : A \in \mathcal{F}_1^k(Z),\ B \in \mathcal{F}_{k+n}^{+\infty}(Z),\ k \in \mathbb{N} \right\}.$$
The sequence is said to be $\alpha$-mixing (strong mixing) if the mixing coefficient $\alpha(n) \to 0$ as $n \to \infty$.

Many processes do satisfy the strong mixing property, see [14] for more details and examples.
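For instance (a standard example, not taken from the paper), a stationary AR(1) sequence with Gaussian innovations is strong mixing at a geometric rate:
$$Z_i = \rho Z_{i-1} + \eta_i, \quad \eta_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2),\ |\rho| < 1 \;\Longrightarrow\; \alpha(n) = O\left(|\rho|^n\right).$$
The autoregressive drivers used in the simulation study of Section 4 are of this type.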

• Let $(z_n)_{n \in \mathbb{N}^*}$ be a sequence of real random variables. We say that $(z_n)_{n \in \mathbb{N}^*}$ converges almost-completely (a.co.) toward zero if, and only if, $\forall \varepsilon > 0,\ \sum_{n=1}^{\infty} P(|z_n| > \varepsilon) < \infty$. Moreover, let $(u_n)_{n \in \mathbb{N}^*}$ be a sequence of positive real numbers; we say that $z_n = O(u_n)$ a.co. if, and only if, $\exists \varepsilon > 0,\ \sum_{n=1}^{\infty} P(|z_n| > \varepsilon u_n) < \infty$.

It is clear, from the Borel–Cantelli lemma, that this convergence is stronger than almost sure convergence.
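Spelling out the Borel–Cantelli step (a routine argument, recalled here for completeness): for every $\varepsilon > 0$,
$$\sum_{n=1}^{\infty} P\left(|z_n| > \varepsilon\right) < \infty \;\Longrightarrow\; P\left( \limsup_{n \to \infty} \left\{ |z_n| > \varepsilon \right\} \right) = 0,$$
so that, almost surely, $|z_n| \le \varepsilon$ for all $n$ large enough; hence $z_n \to 0$ almost surely.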

2. Definition of the estimator

Consider $n$ pairs of random variables $(X_i, Y_i)_{i=1,\dots,n}$ identically distributed as the pair $(X, Y)$, which is valued in $\mathcal{F} \times \mathbb{R}$.

We recall that, in the complete data case, the local linear estimator of the regression function $m(x) = E(Y \mid X = x)$ is presented in [2] as follows
$$\widehat{m}(x) = \frac{\sum_{i,j=1}^{n} W_{ij}(x)\, Y_j}{\sum_{i,j=1}^{n} W_{ij}(x)} \qquad \left(\frac{0}{0} := 0\right),$$

with

$$W_{ij}(x) = \beta(X_i,x)\left(\beta(X_i,x) - \beta(X_j,x)\right) K\!\left(h^{-1}d(X_i,x)\right) K\!\left(h^{-1}d(X_j,x)\right), \qquad (1)$$

where $\beta(\cdot,\cdot)$ is a known function from $\mathcal{F} \times \mathcal{F}$ into $\mathbb{R}$ such that, $\forall \xi \in \mathcal{F},\ \beta(\xi, \xi) = 0$, the function $K$ is a kernel and $h := h_n$ is a sequence of strictly positive real numbers which plays the role of a smoothing parameter.
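As an illustration, the following is a minimal sketch of this complete-data estimator in Python, assuming the curves are stored as numpy arrays and that a semimetric d and a locating function beta are supplied by the user; the function names are ours, not the authors'.

```python
import numpy as np

def K(t):
    # Quadratic kernel used in the simulation study: K(t) = (3/2)(1 - t^2) on [0, 1].
    t = np.asarray(t, dtype=float)
    return 1.5 * (1.0 - t**2) * ((t >= 0.0) & (t <= 1.0))

def llr_estimate(X, Y, x, h, d, beta=None):
    """Complete-data local linear estimator of m(x) built from the weights W_ij(x) of (1).

    X : list of curves (1-d numpy arrays), Y : array of responses,
    x : curve at which m is estimated, h : bandwidth,
    d : semimetric d(u, v), beta : locating function (beta = d by default,
    as in the paper's simulations)."""
    if beta is None:
        beta = d
    Y = np.asarray(Y, dtype=float)
    b = np.array([beta(Xi, x) for Xi in X])          # beta(X_i, x)
    k = K(np.array([d(Xi, x) for Xi in X]) / h)      # K(h^{-1} d(X_i, x))
    # W_ij(x) = beta_i (beta_i - beta_j) K_i K_j; the diagonal i = j vanishes.
    W = b[:, None] * (b[:, None] - b[None, :]) * (k[:, None] * k[None, :])
    num = np.sum(W * Y[None, :])                     # sum_{i,j} W_ij(x) Y_j
    den = np.sum(W)
    return num / den if den != 0.0 else 0.0          # convention 0/0 = 0
```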

As $Y_i$ is not available in practice, we can only observe a sample $(X_i, Z_i, \delta_i)_{1 \le i \le n}$ of identically distributed observations of $(X, Z = Y \wedge R, \delta)$, where $R$ is a nonnegative censoring random variable with unknown continuous survival function $G$ ($\forall t,\ G(t) = P(R > t)$), $\delta = 1_{\{Y \le R\}}$ (where $1_A$ denotes the indicator function of the set $A$) and $Y$ is a nonnegative random variable.

Throughout this paper, we will assume that the sequences $(X_i)_{1 \le i \le n}$, $(Y_i)_{1 \le i \le n}$ and $(R_i)_{1 \le i \le n}$ are stationary and $\alpha$-mixing with mixing coefficients $\alpha_1(n)$, $\alpha_2(n)$ and $\alpha_3(n)$ respectively. Notice that, in view of Lemma 2 in [5], the sequences $(X_i, Y_i)_{1 \le i \le n}$, $(Z_i)_{1 \le i \le n}$ and then $(X_i, Z_i, \delta_i)_{1 \le i \le n}$ are $\alpha$-mixing with coefficients $a(n) = 4\max(\alpha_1(n), \alpha_2(n))$, $b(n) = 4\max(\alpha_2(n), \alpha_3(n))$ and $\alpha(n) = 4\max(\alpha_1(n), b(n)) = 4\max\left(\alpha_1(n), 4\max(\alpha_2(n), \alpha_3(n))\right)$ respectively.

Furthermore, the dependence assumption on $(X_i)_{1 \le i \le n}$, $(Y_i)_{1 \le i \le n}$ and $(R_i)_{1 \le i \le n}$ is quite general, and one may compare it with the classical assumption that $(X_i, Y_i)_{1 \le i \le n}$ is dependent while the censoring sequence $(R_i)_{1 \le i \le n}$ is i.i.d. (see for example [3]). Indeed, since $(X_i, Y_i)_{1 \le i \le n}$ is stationary and $\alpha$-mixing, it is straightforward that the sequences $(X_i)_{1 \le i \le n}$ and $(Y_i)_{1 \le i \le n}$ are also stationary and $\alpha$-mixing, each being a projection-image of the former. On the other hand, the $\alpha$-mixing condition on $(R_i)_{1 \le i \le n}$ encompasses the independence assumption, for which one may simply put $\alpha_3 = 0$. Let (A1) denote the following assumptions.

• $R$ and $(X, Y)$ are independent and $\tau_Y < \tau_R < \infty$.

• $\exists\, T < \tau_Y$ such that $\forall i,\ 1 \le i \le n;\ Z_i \le T$.

These assumptions are standard conditions in nonparametric estimation under censoring, and they permit us to obtain an unbiased estimator. In particular, the independence assumption between $R$ and $(X, Y)$ is plausible whenever the censoring is independent of the patients' characteristics, and $\tau_Y < \tau_R$ implies that $G(T) > 0$ because $T < \tau_Y$.

A feasible local linear nonparametric estimator of $m(x)$, constructed in [11], is defined by
$$\widehat{m}(x) = \frac{\displaystyle\sum_{i,j=1}^{n} W_{ij}(x)\, \frac{\delta_j Z_j}{\bar{G}_n(Z_j)}}{\displaystyle\sum_{i,j=1}^{n} W_{ij}(x)} \qquad \left(\frac{0}{0} := 0\right), \qquad (2)$$

where $W_{ij}(x)$ is defined in (1) and $\bar{G}_n$ is the well-known Kaplan–Meier estimator [9] of $G$, which is given by
$$\bar{G}_n(t) = \begin{cases} \displaystyle\prod_{i=1}^{n} \left( 1 - \frac{1 - \delta_{(i)}}{n - i + 1} \right)^{1_{\{Z_{(i)} \le t\}}} & \text{if } t \le Z_{(n)}, \\[2mm] 0 & \text{if } t > Z_{(n)}, \end{cases} \qquad (3)$$
where $Z_{(1)} \le Z_{(2)} \le \dots \le Z_{(n)}$ are the order statistics of the $Z_i$ and $\delta_{(i)}$ is the noncensoring indicator corresponding to $Z_{(i)}$. Notice that, for all $1 \le j \le n$, $\bar{G}_n(Z_j) = 0$ implies that $\delta_j = 0$. From now on, $(X_i, Z_i, \delta_i)_{1 \le i \le n}$ is strongly mixing with mixing coefficient $\alpha(n)$. Now we are in a position to give our assumptions and main result.
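For concreteness, here is a sketch of the Kaplan–Meier estimator $\bar{G}_n$ of (3) and of the censored estimator (2); it reuses the hypothetical llr_estimate helper from the previous sketch and relies on our reconstruction of (3), so it should be read as an illustration rather than the authors' code.

```python
import numpy as np

def km_censoring_survival(Z, delta):
    """Kaplan-Meier estimator Gbar_n of the censoring survival function G, as in (3).
    Returns a function t -> Gbar_n(t)."""
    Z = np.asarray(Z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    order = np.argsort(Z)
    Zs, ds = Z[order], delta[order]
    n = len(Zs)
    # factors (1 - (1 - delta_(i)) / (n - i + 1)) for i = 1, ..., n (1-based);
    # only censored observations (delta_(i) = 0) contribute a factor < 1.
    factors = 1.0 - (1.0 - ds) / (n - np.arange(1, n + 1) + 1.0)

    def Gbar(t):
        if t > Zs[-1]:
            return 0.0
        return float(np.prod(factors[Zs <= t]))      # exponents 1_{Z_(i) <= t}
    return Gbar

def llr_censored(X, Z, delta, x, h, d, beta=None):
    """Censored local linear estimator (2): llr_estimate applied to the
    synthetic responses delta_j * Z_j / Gbar_n(Z_j)."""
    Z = np.asarray(Z, dtype=float)
    delta = np.asarray(delta, dtype=float)
    Gbar = km_censoring_survival(Z, delta)
    G = np.array([Gbar(z) for z in Z])
    Ystar = np.zeros_like(Z)
    pos = G > 0.0                 # Gbar_n(Z_j) = 0 forces delta_j = 0 anyway
    Ystar[pos] = delta[pos] * Z[pos] / G[pos]
    return llr_estimate(X, Ystar, x, h, d, beta)
```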

3. Main results

The aim of this section is to establish the pointwise almost-complete convergence of $\widehat{m}$. For this purpose, we need the following assumptions.

(H1) For any $h > 0$, $\Phi_x(h) := \Phi_x(0, h) > 0$.

(H2) There exists $b > 0$ such that
$$\forall x_1, x_2 \in B(x,h);\ |m(x_1) - m(x_2)| \le C_x d^b(x_1, x_2),$$
where $C_x$ is a positive constant depending on $x$.

(H3) The function $\beta(\cdot,\cdot)$ is such that
$$\exists\, 0 < M_1 < M_2,\ \forall x' \in \mathcal{F};\ M_1 d(x, x') \le |\beta(x, x')| \le M_2 d(x, x').$$

(H4) The kernel $K$ is a positive and differentiable function on its support $[0,1]$ and
$$\exists\, C, C' > 0;\ 0 < C\, 1_{[0,1]}(t) \le K(t) \le C'\, 1_{[0,1]}(t) < \infty.$$

(H5) This condition is divided into the two following conditions (H5a) and (H5b).

(H5a) There exist $C > 0$ and $a > \sup\left( \dfrac{4}{d},\, \dfrac{u+3}{u} \right)$ satisfying
$$\forall n \in \mathbb{N};\ \alpha(n) \le C n^{-a},$$
where $d$ and $u$ are defined in (H5b) and (H8) respectively.

(H5b) There exist $0 < d < 1$, $C > 0$, $C' > 0$ such that
$$C' \left[\Phi_x(h)\right]^{1+d} \le \psi_x(h) \le C \left[\Phi_x(h)\right]^{1+d},$$
where $\psi_x(h) := \psi_x(0, h)$ and
$$\psi_x(h_1, h_2) := P\left( h_1 < d(X_1, x) \le h_2,\ 0 < d(X_2, x) \le h_2 \right).$$

(H6) For all $m \ge 2$, $S_m : x \mapsto E\left( |Y|^m \mid X = x \right)$ is a continuous operator at $x$ and
$$\exists\, C > 0;\ \sup_{i \neq j} E\left( |Y_i Y_j| \mid (X_i, X_j) \right) \le C < \infty.$$

(H7) There exist $n_0 \in \mathbb{N}$ and $C > 0$ such that
$$\forall n > n_0, \quad \frac{1}{\Phi_x(h)} \int_0^1 \Phi_x(zh, h)\, \frac{d}{dz}\left( z^2 K(z) \right) dz > C > 0$$
and
$$h^2 \int_{B(x,h)} \int_{B(x,h)} \beta(u,x)\, \beta(t,x)\, dP_{(X_1,X_2)}(u,t) = o\left( \int_{B(x,h)} \int_{B(x,h)} \beta^2(u,x)\, \beta^2(t,x)\, dP_{(X_1,X_2)}(u,t) \right),$$
where $dP_{(X_1,X_2)}$ is the joint distribution of $(X_1, X_2)$.

where dP(Xl,x2) is the joint distribution of (X1,X2). (H8) The bandwidth h satisfies h = 0 and 3 n0 > 0, u > 0, C1 > 0, C2 > 0 such that

C1n— +no < $x(h) < C2n-u,

. a — 3 with n0 < - and u < 1.

a +1

Note that these conditions are standard in this context: hypotheses (H1)-(H5) and (H7)-(H8) are the same conditions assumed in [12], and condition (H6) is condition (H6) of [12] with $\varphi(t) = t$.

Now, we are in a position to state the almost-complete convergence of $\widehat{m}(x)$.

Theorem 3.1. Assume that assumptions (A1) and (H1)-(H8) are satisfied, then
$$\widehat{m}(x) - m(x) = O\left(h^b\right) + O_{a.co.}\left( \sqrt{\frac{\ln n}{n \Phi_x(h)}} \right).$$
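To read the rate, suppose in addition (as an illustration only, not an assumption of the paper) that the small-ball probability is of fractal type, $\Phi_x(h) \approx C h^{\tau}$ for some $\tau > 0$. Balancing the bias and dispersion terms then yields the classical bandwidth choice
$$h^b = \sqrt{\frac{\ln n}{n h^{\tau}}} \;\Longleftrightarrow\; h \approx \left( \frac{\ln n}{n} \right)^{\frac{1}{2b + \tau}}, \qquad \text{so that} \qquad \widehat{m}(x) - m(x) = O_{a.co.}\left( \left( \frac{\ln n}{n} \right)^{\frac{b}{2b + \tau}} \right).$$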

One of the main features of the present paper is the study of local linear estimation in the dependent and censored case, which generalizes several usual situations; in particular, it covers the independent case (see [11]), the complete data case (see [12]) and the kernel method (see [15]).

Proof 3.1. Let us set
$$\widetilde{m}(x) = \frac{\displaystyle\sum_{i,j=1}^{n} W_{ij}(x)\, \frac{\delta_j Z_j}{G(Z_j)}}{\displaystyle\sum_{i,j=1}^{n} W_{ij}(x)} \qquad \left(\frac{0}{0} := 0\right), \qquad (4)$$
where $W_{ij}(x)$ is defined in (1); it will play a prominent part in the proof of Theorem 3.1 thanks to the following decomposition, valid for all $x \in \mathcal{F}$:
$$\widehat{m}(x) - m(x) = \frac{1}{\widehat{m}_0(x)} \left[ \left( \widehat{m}_1(x) - \widetilde{m}_1(x) \right) + \left( \widetilde{m}_1(x) - E\widetilde{m}_1(x) \right) + \left( E\widetilde{m}_1(x) - m(x) \right) \right] + \frac{m(x)}{\widehat{m}_0(x)} \left( 1 - \widehat{m}_0(x) \right), \qquad (5)$$
where
$$\widehat{m}_1(x) = \frac{1}{n(n-1) E\left[W_{12}(x)\right]} \sum_{i \neq j} W_{ij}(x) \frac{\delta_j Z_j}{\bar{G}_n(Z_j)}, \qquad \widehat{m}_0(x) = \frac{1}{n(n-1) E\left[W_{12}(x)\right]} \sum_{i \neq j} W_{ij}(x) \qquad (6)$$
and
$$\widetilde{m}_1(x) = \frac{1}{n(n-1) E\left[W_{12}(x)\right]} \sum_{i \neq j} W_{ij}(x) \frac{\delta_j Z_j}{G(Z_j)}. \qquad (7)$$
To treat the pointwise almost-complete convergence of $\widehat{m}(x)$, we need Lemma A1 introduced in [12] and the preliminary technical Lemma 3.1. The proof of Theorem 3.1 is then a direct consequence of the following lemmas. □

In what follows, let $C$ denote a strictly positive generic constant and, for any $x \in \mathcal{F}$ and all $i = 1, \dots, n$, let $K_i(x) := K\left( h^{-1} d(X_i, x) \right)$ and $\beta_i(x) := \beta(X_i, x)$.

As the dependence assumption brings covariance terms into play, let us define, for $k \in \{0, 2\}$ and $l \in \{0, 1\}$,
$$S_{n,l,k}(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} \left| \mathrm{Cov}\left( \Lambda_i^{(k,l)}(x),\, \Lambda_j^{(k,l)}(x) \right) \right|, \qquad (8)$$
where, for $i \in \{1, \dots, n\}$,
$$\Lambda_i^{(k,l)}(x) = \frac{1}{h^k} \left\{ K_i(x)\, \beta_i^k(x)\, \delta_i^l Z_i^l\, G^{-l}(Z_i) - E\left[ K_i(x)\, \beta_i^k(x)\, \delta_i^l Z_i^l\, G^{-l}(Z_i) \right] \right\}. \qquad (9)$$
We now focus on these covariance terms in the following result.

Lemma 3.1. Under assumptions (A1) and (H1)-(H7), we have
$$S_{n,l,k}(x) = O\left( n \Phi_x(h) \right). \qquad (10)$$

Proof 3.2. By following the same steps as in the proof of Lemma A.2 in [12], we get the result. □

Lemma 3.2. Assume that hypotheses (A1), (H1)-(H5) and (H7) hold, then
$$E\left( \widetilde{m}_1(x) \right) - m(x) = O\left( h^b \right).$$

Proof 3.3. The bias term is not affected by the dependence condition. Therefore, since the triples $(X_i, Z_i, \delta_i)$ are identically distributed, we get
$$E\widetilde{m}_1(x) - m(x) = \frac{1}{E\left[W_{12}(x)\right]}\, E\left\{ W_{12}(x) \left[ E\left( Z_2 G^{-1}(Z_2) \delta_2 \mid X_2 \right) - m(x) \right] \right\}.$$
Assumption (A1), combined with the facts that $E(\delta_2 \mid X_2, Y_2) = G(Y_2)$ and $\delta_2 Z_2 = \delta_2 Y_2$, gives
$$E\left[ Z_2 G^{-1}(Z_2) \delta_2 \mid X_2 \right] = E\left[ Y_2 G^{-1}(Y_2)\, E\left( \delta_2 \mid X_2, Y_2 \right) \mid X_2 \right] = m(X_2).$$
Then, we have
$$E\widetilde{m}_1(x) - m(x) = \frac{1}{E\left[W_{12}(x)\right]}\, E\left[ W_{12}(x) \left( m(X_2) - m(x) \right) \right]. \qquad (11)$$
The claimed result is obtained by using the last relation and condition (H2). □

Lemma 3.3. Under the assumptions of Theorem 3.1, we get
$$\widetilde{m}_1(x) - E\left( \widetilde{m}_1(x) \right) = O_{a.co.}\left( \sqrt{\frac{\ln n}{n \Phi_x(h)}} \right).$$

Proof 3.4. Inspired by the proof of Lemma 4.4 in [2], we consider the following decomposition
$$\widetilde{m}_1(x) = \frac{1}{n(n-1) E\left[W_{12}(x)\right]} \sum_{i \neq j} W_{ij}(x) \frac{\delta_j Z_j}{G(Z_j)} = Q(x) \left[ D_{2,1}(x) D_{4,0}(x) - D_{3,1}(x) D_{3,0}(x) \right], \qquad (12)$$
where, for $p \in \{2, 3, 4\}$ and $l \in \{0, 1\}$,
$$D_{p,l}(x) = \frac{1}{n \Phi_x(h)} \sum_{j=1}^{n} \frac{K_j(x)\, \beta_j^{p-2}(x)\, Z_j^l \delta_j^l\, G^{-l}(Z_j)}{h^{p-2}} \quad \text{and} \quad Q(x) = \frac{n^2 h^2 \Phi_x^2(h)}{n(n-1) E\left[W_{12}(x)\right]}.$$

Notice that $Q(x) = O(1)$ (see the proof of Lemma 2 in [12]), so it suffices to show that, for $p \in \{2, 3, 4\}$ and $l \in \{0, 1\}$,
$$D_{p,l}(x) - E\left( D_{p,l}(x) \right) = O_{a.co.}\left( \sqrt{\frac{\ln n}{n \Phi_x(h)}} \right), \qquad E\left[ D_{p,l}(x) \right] = O(1),$$
and that
$$\mathrm{Cov}\left[ D_{2,1}(x), D_{4,0}(x) \right] = O\left( \frac{1}{n \Phi_x(h)} \right) \quad \text{and} \quad \mathrm{Cov}\left[ D_{3,1}(x), D_{3,0}(x) \right] = O\left( \frac{1}{n \Phi_x(h)} \right).$$

• Firstly, we have
$$D_{p,l}(x) - E D_{p,l}(x) = \frac{1}{n \Phi_x(h)} \sum_{i=1}^{n} \Lambda_i^{(p-2,l)}(x),$$
where $\Lambda_i^{(k,l)}(x)$ is defined in (9).

Note that, because $E\left( \Lambda_i^{(k,l)}(x) \right) = 0$ and $E\left| \Lambda_i^{(k,l)}(x) \right|^q = O\left( \Phi_x(h) \right)$ for $q > 2$, and using Tchebychev's inequality, we can apply Proposition A.11-i in [7] to get, for any $q > 2$, $\varepsilon > 0$, $r > 1$ and some $0 < C < \infty$,
$$P\left( \left| D_{p,l}(x) - E\left[ D_{p,l}(x) \right] \right| > \varepsilon \right) = P\left( \left| \sum_{i=1}^{n} \Lambda_i^{(p-2,l)}(x) \right| > n \varepsilon \Phi_x(h) \right) \le C \left[ A_1(x) + A_2(x) \right], \qquad (13)$$
where
$$A_1(x) = \left( 1 + \frac{\varepsilon^2 n^2 \left( \Phi_x(h) \right)^2}{r\, S_{n,l,k}(x)} \right)^{-r/2} \quad \text{and} \quad A_2(x) = n r^{-1} \left( \frac{r}{\varepsilon n \Phi_x(h)} \right)^{(a+1)q/(q+a)}.$$

Now, choose, for $\eta > 0$,
$$\varepsilon = \eta \sqrt{\frac{\ln n}{n \Phi_x(h)}} \quad \text{and} \quad r = (\ln n)^2.$$

In view of Lemma 3.1, we have $S_{n,l,k}(x) = O\left( n \Phi_x(h) \right)$. So, we obtain
$$A_2(x) \le C\, n^{1 - \frac{(a+1)q}{2(q+a)}}\, (\ln n)^{-2 + \frac{3(a+1)q}{2(q+a)}}\, \left( \Phi_x(h) \right)^{-\frac{(a+1)q}{2(q+a)}}.$$
Next, using (H8), there exists some real number $\nu > 0$ such that
$$A_2(x) = O\left( n^{-1-\nu} \right). \qquad (14)$$

Moreover, in view of equation (10) and the fact that $\ln(1 + x) = x - x^2/2 + o(x^2)$ as $x$ tends to zero, we can write
$$A_1(x) \le C n^{-\eta^2/2}, \qquad (15)$$
which shows that $A_1(x)$ is the general term of a convergent series for an appropriate choice of $\eta$. Hence, by combining relations (13), (14) and (15), we derive
$$D_{p,l}(x) - E D_{p,l}(x) = O_{a.co.}\left( \sqrt{\frac{\ln n}{n \Phi_x(h)}} \right).$$

• It is easy to see that, under (H1), (H3), (H4) and (A1), we get, for $p \in \{2, 3, 4\}$ and $l \in \{0, 1\}$,
$$E\left[ D_{p,l}(x) \right] = h^{2-p}\, \Phi_x(h)^{-1}\, E\left[ K_1(x)\, \beta_1^{p-2}(x)\, Z_1^l \delta_1^l\, G^{-l}(Z_1) \right] \le C, \qquad (16)$$
the last inequality being obtained by using Lemma A1(i) in [12] and condition (A1).

• Finally, by following arguments similar to those used to prove (10), we obtain
$$\mathrm{Cov}\left[ D_{2,1}(x), D_{4,0}(x) \right] = O\left( \frac{1}{n \Phi_x(h)} \right)$$
and
$$\mathrm{Cov}\left[ D_{3,1}(x), D_{3,0}(x) \right] = O\left( \frac{1}{n \Phi_x(h)} \right).$$
In view of (H8), this last rate is negligible with respect to $O\left( \sqrt{\frac{\ln n}{n \Phi_x(h)}} \right)$. The proof is then completed. □

Lemma 3.4 (see [12]). If assumptions (H1), (H3), (H4), (H5a), (H5b), (H7) and (H8) are satisfied, we obtain
$$\widehat{m}_0(x) - 1 = O_{a.co.}\left( \sqrt{\frac{\ln n}{n \Phi_x(h)}} \right) \quad \text{and} \quad \sum_{n=1}^{\infty} P\left( \widehat{m}_0(x) < \frac{1}{2} \right) < \infty.$$

Lemma 3.5. Under assumptions (A1), (H1), (H3), (H4), (H5a), (H5b) and (H7), we have
$$\widehat{m}_1(x) - \widetilde{m}_1(x) = O_{a.co.}\left( \sqrt{\frac{\ln n}{n \Phi_x(h)}} \right).$$

Proof 3.5. From assumption (A1) and the definitions of $\widehat{m}_1(x)$ and $\widetilde{m}_1(x)$ in (6) and (7), we can write
$$\left| \widehat{m}_1(x) - \widetilde{m}_1(x) \right| \le \frac{1}{n(n-1) E\left[W_{12}(x)\right]} \sum_{i \neq j} W_{ij}(x)\, \delta_j |Z_j| \left| \frac{1}{\bar{G}_n(Z_j)} - \frac{1}{G(Z_j)} \right| \le \frac{|T| \sup_{t \le T} \left| \bar{G}_n(t) - G(t) \right|}{\bar{G}_n(T)\, G(T)}\, \widehat{m}_0(x), \qquad (17)$$
where $\widehat{m}_0(x)$ is defined in (6).

On the other hand, following [5] and [18], we obtain
$$\sup_{t \le T} \left| \bar{G}_n(t) - G(t) \right| = O_{a.co.}\left( \sqrt{\frac{\ln n}{n}} \right), \qquad (18)$$
which is an $O_{a.co.}\left( \sqrt{\frac{\ln n}{n \Phi_x(h)}} \right)$ since $\Phi_x(h) \le 1$. The proof is completed by using Lemma 3.4. □

4. Simulation study

In this section, two simulated examples are presented to illustrate the performance of the proposed local linear regression estimator (LLR). More precisely, we compare the LLR estimator to the kernel regression estimator (KR) studied in [15].

For the computation of the LLR and KR estimators, we use the quadratic kernel $K(x) = \frac{3}{2}(1 - x^2)\, 1_{[0,1]}(x)$, and the bandwidth $h$ is chosen by the 2-fold cross-validation method. Taking into account the smoothness of the curves $X_i(t)$ (see Figs. 1 and 2), we choose the semimetric $d$ based on the first derivative (for the first example) and on the PCA (for the second example), as described in [7] (see the routines "semimetric.deriv" and "semimetric.pca" on the website http://www.lsp.ups-tlse.fr/staph/npfda), and we take $\beta = d$ (for the LLR estimator).
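The cited routines are written in R; as a rough Python stand-in for the derivative-based semimetric (the original "semimetric.deriv" uses a B-spline expansion, so the finite-difference version below is only an approximation we introduce for illustration):

```python
import numpy as np

def semimetric_deriv(x1, x2, t):
    """Derivative-based semimetric d(x1, x2) = sqrt( int (x1'(t) - x2'(t))^2 dt ),
    with the derivatives approximated by finite differences on the grid t."""
    d1 = np.gradient(np.asarray(x1, dtype=float), t)   # x1'(t)
    d2 = np.gradient(np.asarray(x2, dtype=float), t)   # x2'(t)
    return float(np.sqrt(np.trapz((d1 - d2) ** 2, t)))
```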

Example 1. Let us consider the following nonparametric regression model
$$Y = m(X) + \epsilon, \quad \text{where} \quad m(X) = \frac{1}{4} \exp\left( 2 - \left( \int_0^{\pi/3} X'(t)\, dt \right)^2 \right)$$
and $\epsilon$ is an error term generated by the autoregressive model
$$\epsilon_i = \frac{1}{\sqrt{2}} \left( \epsilon_{i-1} + \xi_i \right), \quad i = 1, \dots, n,$$
where the $\xi_i$ are i.i.d. centered normal random variables with variance $0.1$ ($\xi_i \sim \mathcal{N}(0, 0.1)$). The functional covariate $X(t)$ is defined, for $t \in [0, \pi/3]$, by
$$X(t) = 2 - \cos\left( W \left( t - \frac{\pi}{2} \right) \right),$$
where $W$ is an $\alpha$-mixing process generated by $W_i = \frac{1}{9} W_{i-1} + \eta_i$, with $\eta_i$ i.i.d. $\mathcal{N}(0, 1)$ and independent of the $W_i$, the sequence being started independently with $W_0 \sim \mathcal{N}(0, 1)$ (see Fig. 1 for a sample of these curves). Notice that the conditional mean function coincides with $m(x)$.

For this model, we adopt the censoring mechanism $(X_i, Z_i, \delta_i)_{1 \le i \le n}$, where $Z_i = \min(Y_i, R_i)$, $\delta_i = 1_{\{Y_i \le R_i\}}$ and the censoring random variable is $R_i = a_i R_{i-1} + \zeta_i$, with $a_i \sim \mathcal{N}(0, 0.1)$ and $\zeta_i$ i.i.d. $\mathcal{E}(1.5)$, independent of the $R_i$, the sequence being started independently with $R_0 \sim \mathcal{E}(1.5)$.
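Putting the pieces together, the data-generating process of Example 1 can be sketched as follows (the constants reproduce the formulas above, parts of which are our reconstruction, so they should be treated as assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_example1(n, n_grid=100):
    """Sketch of the data-generating process of Example 1."""
    t = np.linspace(0.0, np.pi / 3.0, n_grid)
    # alpha-mixing AR(1) driver: W_i = W_{i-1} / 9 + eta_i, W_0 ~ N(0, 1)
    W = np.empty(n)
    W[0] = rng.normal(0.0, 1.0)
    for i in range(1, n):
        W[i] = W[i - 1] / 9.0 + rng.normal(0.0, 1.0)
    X = 2.0 - np.cos(W[:, None] * (t[None, :] - np.pi / 2.0))   # curves X_i(t)
    # regression function m(X) = exp(2 - (int_0^{pi/3} X'(t) dt)^2) / 4
    dX = np.gradient(X, t, axis=1)
    m = np.exp(2.0 - np.trapz(dX, t, axis=1) ** 2) / 4.0
    # AR(1) noise: eps_i = (eps_{i-1} + xi_i) / sqrt(2), xi_i ~ N(0, 0.1)
    eps = np.empty(n)
    eps[0] = rng.normal(0.0, np.sqrt(0.1))
    for i in range(1, n):
        eps[i] = (eps[i - 1] + rng.normal(0.0, np.sqrt(0.1))) / np.sqrt(2.0)
    Y = m + eps
    # AR(1) censoring: R_i = a_i R_{i-1} + zeta_i, a_i ~ N(0, 0.1), zeta_i ~ Exp(1.5)
    R = np.empty(n)
    R[0] = rng.exponential(1.0 / 1.5)
    for i in range(1, n):
        R[i] = rng.normal(0.0, np.sqrt(0.1)) * R[i - 1] + rng.exponential(1.0 / 1.5)
    Z = np.minimum(Y, R)
    delta = (Y <= R).astype(int)
    return X, Z, delta, m
```

In this simulation, to illustrate the performance of our estimator, we proceed as follows: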

• Step 1. For different sample sizes $n = 100, 200, 300, 500$, we split our data into two subsets:

- $(X_i, Y_i)_{1 \le i \le n_1}$: the learning sample used to build the estimators, where $n_1 = n/2$.

- $(X_i, Y_i)_{n_2 \le i \le n}$: the testing sample used to make the comparison, with $n_2 = n_1 + 1$.

• Step 2. We compute the two estimators using the learning sample, obtaining the LLR and KR estimators of the conditional expectation ($\widehat{m}$ and $\widehat{m}_{KR}$), for the different sample sizes $n = 100, 200, 300, 500$.

• Step 3. We plot the true values $m(X_i)$ for all $i$ ($n_2 \le i \le n$) against the values predicted by the two estimators, one per graph (for a fixed sample size $n = 300$, see Fig. 1).

Fig. 1. From left to right the curves Xi, the LLR and KR estimators (n = 300)

• Step 4. To be more precise, we measure the prediction accuracy, for different values of $n$, by using the mean absolute errors (MAE), given by
$$\mathrm{MAE(LLR)} := \frac{1}{n - n_2 + 1} \sum_{j=n_2}^{n} \left| \widehat{m}(X_j) - m(X_j) \right|, \qquad \mathrm{MAE(KR)} := \frac{1}{n - n_2 + 1} \sum_{j=n_2}^{n} \left| \widehat{m}_{KR}(X_j) - m(X_j) \right|,$$
and the prediction errors (MSE), such that
$$\mathrm{MSE(LLR)} := \frac{1}{n - n_2 + 1} \sum_{j=n_2}^{n} \left( \widehat{m}(X_j) - m(X_j) \right)^2, \qquad \mathrm{MSE(KR)} := \frac{1}{n - n_2 + 1} \sum_{j=n_2}^{n} \left( \widehat{m}_{KR}(X_j) - m(X_j) \right)^2.$$

The obtained results are reported in Tab. 1.

Table 1. MSE and MAE comparison of the LLR and KR methods according to sample size

          n = 100           n = 200           n = 300           n = 500
        MSE     MAE       MSE     MAE       MSE     MAE       MSE     MAE
LLR    0.0896  0.2008    0.0775  0.1796    0.0641  0.1529    0.0396  0.1062
KR     0.1190  0.2338    0.0867  0.1933    0.0796  0.1590    0.0471  0.1529

From Tab. 1 and Fig. 1, we observe that both estimators perform better as the sample size $n$ increases. It can also be seen that our predictor behaves better than the kernel one.

We now give a second example to support this conclusion.

Example 2. We fix $n = 200$ and we generate the functional explanatory variables $X(t)$ as follows:
$$X_i(t) = a_i \sin\left( 4 (b_i - t) \right) + c_i, \quad i = 1, \dots, 200,$$
where $a_i \sim \mathcal{N}(4, 3)$, $c_i \sim \mathcal{N}(0, 0.01)$ and $b_i$ is an $\alpha$-mixing process generated by $b_i = \frac{1}{2} b_{i-1} + \eta_i$, with $\eta_i$ i.i.d. $\mathcal{N}(0, 1)$ and independent of the $b_i$, the sequence being started independently with $b_0 \sim \mathcal{N}(0, 3)$. We carried out the simulation with a 200-sample of the curve $X(t)$ (see Fig. 2). The scalar response variable is defined as
$$Y = m(X) + \epsilon, \quad \text{where} \quad m(X) = \int_0^1 \frac{1}{1 + |X(t)|}\, dt$$
and $\epsilon$ is an error term generated by the autoregressive model
$$\epsilon_i = \frac{1}{\sqrt{2}} \left( \epsilon_{i-1} + \xi_i \right), \quad i = 1, \dots, 200,$$
with $\xi_i \sim \mathcal{N}(0, 0.1)$. Notice that the conditional median function coincides with $m(x)$.

We also simulate $n$ i.i.d. random variables $(R_i)$ exponentially distributed with parameter $\lambda$, which is adapted in order to obtain different censoring rates (CR). We compute our estimator from the observed data $(X_i, Z_i, \delta_i)_{1 \le i \le n}$, where $Z_i = \min(Y_i, R_i)$ and $\delta_i = 1_{\{Y_i \le R_i\}}$. Next, we split our data into a learning sample of size 135 and a test sample of size 65. The true values are plotted against the values predicted by our estimator $\widehat{m}(x)$ and the kernel estimator $\widehat{m}_{KR}(x)$ (CR = 1.48%).

Fig. 2. From left to right the curves Xi, the LLR and KR estimators (CR = 1.48%)

To be more precise, we measure the prediction accuracy, for different values of CR, by using the mean absolute errors (MAE), given by
$$\mathrm{MAE(LLR)} := \frac{1}{65} \sum_{j=136}^{200} \left| \widehat{m}(X_j) - m(X_j) \right|, \qquad \mathrm{MAE(KR)} := \frac{1}{65} \sum_{j=136}^{200} \left| \widehat{m}_{KR}(X_j) - m(X_j) \right|,$$
and the prediction errors (MSE), such that
$$\mathrm{MSE(LLR)} := \frac{1}{65} \sum_{j=136}^{200} \left( \widehat{m}(X_j) - m(X_j) \right)^2, \qquad \mathrm{MSE(KR)} := \frac{1}{65} \sum_{j=136}^{200} \left( \widehat{m}_{KR}(X_j) - m(X_j) \right)^2.$$

The obtained results are reported in Tab. 2.

Table 2. MSE and MAE comparison of the LLR and KR methods according to CR

         CR = 1.48%        CR = 28.67%       CR = 48.15%       CR = 73.33%
        MSE     MAE       MSE     MAE       MSE     MAE       MSE      MAE
LLR    0.0019  0.0331    0.0182  0.1044    0.0260  0.1271    0.05458  0.2106
KR     0.0037  0.0353    0.0220  0.1098    0.0295  0.1474    0.0610   0.2314

Fig. 2 and Tab. 2 show that our estimator performs better than the kernel estimator. It is also clear that the quality of both estimators becomes slightly worse when the percentage of censoring is high; however, it remains acceptable.

Conclusion and comments

Our theoretical and practical studies confirm, unsurprisingly, that the quality of the LLR and KR estimators improves with a larger sample size $n$ and a lower censoring rate CR. Furthermore, as in the independent censored data case, the LLR estimator remains more accurate than the KR one in all cases.

References

[1] L.Ait Hennani, M.Lemdani, E.Ould Said, Robust regression analysis for a censored response and functional regressors, Journal of Nonparametric Statistics, 31(2018), no. 1, 221-243.

[2] J.Barrientos-Marin, F.Ferraty, P.Vieu, Locally modelled regression and functional data, Journal of Nonparametric Statistics, 22(2010), no. 5, 617-632. DOI: 10.1080/10485250903089930

[3] A.Benkhaled, F.Madani, S.Khardani, Strong consistency of local linear estimation of a conditional density function under random censorship, Arabian Journal of Mathematics, 9(2020), 513-529. DOI: 10.1007/s40065-020-00282-1

[4] A.Bouchentouf, A.Hamza, A.Rabhi, Strong uniform consistency rates of conditional hazard estimation in the single functional index model for dependent functional data under random censorship, International Journal of Statistics and Economics, 18(2017), 82-101.

[5] Z.Cai, Estimating a distribution function for censored time series data, Journal of Multivariate Analysis, 78(2001), 299-318.

[6] J.Demongeot, A.Laksaci, F.Madani, M.Rachdi, Functional data: local linear estimation of the conditional density and its application, Statistics, 47(2013), no. 1, 26-44.

[7] F.Ferraty, P.Vieu, Nonparametric functional data analysis. Theory and Practice. Springer Series in Statistics, New York, 2006.

[8] Z.Guessoum, E.Ould Said, On nonparametric estimation of the regression function under random censorship model, Statistics & Decisions, 26(2008), 159-177.

[9] E.L.Kaplan, P.Meier, Nonparametric estimation from incomplete observations, Journal of the American Statistical Association, 53(1958), 457-481.

[10] S.Leulmi, Local linear estimation of the conditional quantile for censored data and functional regressors, Communications in Statistics - Theory and Methods, 50(2019), 1-15. DOI: 10.1080/03610926.2019.1692033

[11] S.Leulmi, Nonparametric local linear regression estimation for censored data and functional regressors, Journal of the Korean Statistical Society (2020), 1-22.

[12] S.Leulmi, F.Messaci, Local linear estimation of a generalized regression function with functional dependent data, Communications in Statistics - Theory and Methods, 47(2018), no.23, 5795-5811.

[13] S.Leulmi, F.Messaci, A class of local linear estimators with functional data, J. Sib. Fed. Univ. Math. Phys, 12(2019), no. 3, 379-391. DOI: 10.17516/1997-1397-2019-12-3-379-391.

[14] Z.Lin, C.Lu, Limit theory of mixing dependent random variables. Mathematics and its Applications, Beijing: Sciences Press, Kluwer Academic Publishers, 1996.

[15] N.Ling, Y.Liu, The kernel regression estimation for randomly censored functional stationary ergodic data, Communications in Statistics - Theory and Methods, 2016. DOI: 10.1080/03610926.2016.1185117

[16] B.Mechab, N.Hamidi, S.Benaissa, Nonparametric estimation of the relative error in functional regression and censored data, Chilean Journal of Statistics, 10(2019), no. 2, 177-195.

[17] F.Messaci, N.Nemouchi, I.Ouassou, M.Rachdi, Local polynomial modelling of the conditional quantile for functional data, Statistical Methods & Applications, 24(2015), no.4, 597-622.

[18] N.Rouabah, N.Nemouchi, F.Messaci, A rate of consistency for nonparametric estimators of the distribution function based on censored dependent data, Statistical Methods & Applications, 28(2019), 259-280. DOI: 10.1007/s10260-018-00445-7
