Scientific article: HEAVY TAIL INDEX ESTIMATOR THROUGH WEIGHTED LEAST-SQUARES RANK REGRESSION (subject area: Mathematics)

CC BY

Authors: Khemissi Zahia, Brahimi Brahim, Benatia Fatah




DOI: 10.17516/1997-1397-2022-15-6-797-805 UDC 519.65

Heavy Tail Index Estimator through Weighted Least-squares Rank Regression

Zahia Khemissi, Brahim Brahimi, Fatah Benatia

Laboratory of Applied Mathematics, Mohamed Khider University, Biskra, Algeria

Received 10.07.2022, received in revised form 15.09.2022, accepted 20.10.2022

Abstract. In this paper, we propose a method based on weighted least-squares estimation for the shape parameter of the Frechet distribution. The performance of the proposed estimator is assessed in a simulation study, which shows that the weighted estimation method performs better than maximum likelihood estimation, maximum product of spacings estimation and least-squares estimation in terms of bias and root mean square error for most of the sample sizes considered. In addition, a real example based on the Danish data is provided to demonstrate the performance of the method. Keywords: Frechet distribution, weighted least-squares regression, rank regression, Monte Carlo simulation, shape parameter.

Citation: Z.Khemissi, B.Brahimi, F.Benatia, Heavy Tail Index Estimator through Weighted Least-squares Rank Regression, J. Sib. Fed. Univ. Math. Phys., 2022, 15(6), 797-805. DOI: 10.17516/1997-1397-2022-15-6-797-805.

Introduction

Among parametric methods for estimating distributions, maximum likelihood estimation (MLE) has received great interest: it has good theoretical properties for large samples and is often preferred. Regression on a probability plot is another way to estimate the parameters of statistical distributions, because the procedure is simple to implement for both complete and censored data. It uses a linear regression model whose dependent variable is a nonparametric estimate of the distribution function evaluated at the ranked sample; the least-squares estimates of the parameters of this regression model then serve as estimates of the parameters of the distribution under study.

Least squares (LS) regression based on the relationship between the empirical cumulative distribution function (cdf) and the order statistics is frequently used to estimate the parameters of distributions. The weighted least squares (WLS) method is also applied to parameter estimation; it is comparatively concise and easy to understand. In the literature, WLS estimation has been reported as an alternative that can be superior to existing methods [5,13,17,18]. [15] studied the LS, ridge regression and maximum product of spacings (MPS) methods to estimate the parameters of the Pareto distribution, while [17] considered Laplace-distributed errors [11] and Box-Cox regression to stabilize the variance. [13] considered the LSE and WLSE for the Pareto distribution. [17] considered a regression procedure for the parameters of the three-parameter generalized Pareto distribution and applied the WLSE with the Box-Cox procedure. [18] applied WLS rank regression to estimate the parameters of the Weibull, exponential and Gumbel distributions, and their results showed that the WLS estimator outperforms the usual LS estimator, especially in small samples. Some research has also been conducted on the Frechet distribution: [2] studied the performance of three estimation methods for the scale parameter of the two-parameter Frechet distribution, namely LS, WLS and MLE, and their Monte Carlo simulation showed that MLE performed best compared with LS and WLS in terms of both bias and mean square error.

ORCID: https://orcid.org/0000-0003-4482-3749; https://orcid.org/00000-0002-3236-8729. © Siberian Federal University. All rights reserved

In this paper, we propose a weighted least squares regression estimator of the shape parameter of the heavy-tailed Frechet distribution, based on a nonparametric estimate of the cumulative distribution function. Following [18], the weights are derived as approximate weights that stabilize the variances; they therefore have a simple form and do not depend on the parameter of the distribution. The proposed WLS estimator is then evaluated in a Monte Carlo simulation, where in most cases it performs better than the usual LS estimator and gives results close to those of the MLE and MPSE estimators. Finally, we present an illustrative example based on the Danish data.

The rest of the paper is organized as follows. Section 1 defines the LS and WLS estimators. Section 2 presents a simulation study illustrating the performance of our estimator, together with an application to real data on Danish fire insurance claims. Concluding remarks are given in the final section.

1. Methodology and main results

Extreme value models play an important role in statistics. The generalized extreme value (GEV) distribution [9] and its sub-models are widely used in applications involving extreme events. We consider the one-parameter Frechet distribution, whose probability density function with shape parameter $\gamma > 0$ is

$$f(x;\gamma) = \gamma x^{-(\gamma+1)} \exp\left(-x^{-\gamma}\right), \qquad x > 0,\ \gamma > 0,$$

and whose cdf is

$$F(x;\gamma) = \exp\left(-x^{-\gamma}\right). \qquad (1)$$
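The density and cdf above can be checked numerically. The following Python sketch evaluates $f$ and $F$ and draws samples by inverse-transform sampling, solving $F(x) = u$ for $x = (-\log u)^{-1/\gamma}$. The paper does not say how its samples were generated, so this sampler and the names `frechet_pdf`, `frechet_cdf` and `frechet_sample` are illustrative assumptions.

```python
import math
import random


def frechet_pdf(x: float, gamma: float) -> float:
    """f(x; gamma) = gamma * x^-(gamma + 1) * exp(-x^-gamma) for x > 0, else 0."""
    if x <= 0:
        return 0.0
    return gamma * x ** (-(gamma + 1)) * math.exp(-(x ** -gamma))


def frechet_cdf(x: float, gamma: float) -> float:
    """F(x; gamma) = exp(-x^-gamma) for x > 0, else 0 (equation (1))."""
    if x <= 0:
        return 0.0
    return math.exp(-(x ** -gamma))


def frechet_sample(n: int, gamma: float, rng: random.Random) -> list[float]:
    """Inverse-transform sampling: F(x) = u  <=>  x = (-log u)^(-1/gamma)."""
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:  # random() may return exactly 0.0, where log(u) is undefined
            out.append((-math.log(u)) ** (-1.0 / gamma))
    return out
```

Inverse-transform sampling is exact here because $F$ has a closed-form inverse, so no rejection or approximation step is needed.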

In this section, we describe the methods of estimation for the one-parameter Frechet distribution.

1.1. Least squares method

The principles of least squares estimation were discovered independently by [1,6,12]. A distribution function can be transformed into a linear regression model if it can be written in explicit form. Since $-\log F(x) = x^{-\gamma}$, taking logarithms linearizes Equation (1) as follows:

$$-\log\bigl(-\log F(x)\bigr) = \gamma \log x.$$
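A quick numerical check of this linearization, as a minimal Python sketch (the helper names are hypothetical): the transformed cdf should equal $\gamma \log x$ at every $x > 0$.

```python
import math


def frechet_cdf(x: float, gamma: float) -> float:
    """F(x; gamma) = exp(-x^-gamma), equation (1)."""
    return math.exp(-(x ** -gamma))


def linearized_cdf(x: float, gamma: float) -> float:
    """Left-hand side of the linearization: -log(-log F(x; gamma))."""
    return -math.log(-math.log(frechet_cdf(x, gamma)))


gamma = 1.5
for x in (0.5, 2.0, 5.0, 20.0):
    # Both columns should agree: the transformed cdf is a line through the origin in log x.
    print(f"x={x:5.1f}  -log(-log F)={linearized_cdf(x, gamma):+.6f}  "
          f"gamma*log x={gamma * math.log(x):+.6f}")
```

Points far in the right tail lose some floating-point precision, because $F(x)$ is then very close to 1 and $-\log F(x)$ is recovered from a value near 1.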

Suppose that $x_{(1)} < x_{(2)} < \dots < x_{(n)}$ are the order statistics of $x_1, x_2, \dots, x_n$, which are independent and identically distributed from the Frechet distribution. The regression model can then be written as

$$-\log\bigl(-\log F(x_{(i)};\gamma)\bigr) = \gamma \log x_{(i)}. \qquad (2)$$

Comparing equation (2) with $Y_i = \gamma X_i$, we get $Y_i = -\log\bigl(-\log F(x_{(i)};\gamma)\bigr)$ and $X_i = \log x_{(i)}$, and the regression model with an error term is

$$Y_i = \gamma X_i + \varepsilon_i. \qquad (3)$$

However, the error terms of model (3) are not identically distributed, since they do not have equal variances. This heteroscedasticity may adversely affect the LSE, and in such cases alternative estimation approaches that stabilize the variances should be used.

In LS estimation, the sum of squared errors defined below is minimized:

$$\min_{\gamma} \sum_{i=1}^{n} \bigl(Y_i - \gamma \log x_{(i)}\bigr)^2, \qquad Y_i = -\log\bigl(-\log F(x_{(i)};\gamma)\bigr). \qquad (4)$$

Following [3], we use the mean rank estimator to estimate the values of the cumulative distribution function $F(x)$:

$$\hat F(x_{(i)}) = \frac{i}{n+1},$$

where $i$ is the rank of $x_{(i)}$, the $i$th smallest value among $x_{(1)}, x_{(2)}, \dots, x_{(n)}$.
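The mean rank estimator pairs each order statistic with a plotting position. A minimal Python sketch (the helper names are hypothetical):

```python
def mean_rank_positions(n: int) -> list[float]:
    """Mean rank estimates F_hat(x_(i)) = i / (n + 1), i = 1, ..., n."""
    return [i / (n + 1) for i in range(1, n + 1)]


def rank_pairs(sample: list[float]) -> list[tuple[float, float]]:
    """Pair each order statistic x_(i) with its plotting position i / (n + 1)."""
    xs = sorted(sample)
    return list(zip(xs, mean_rank_positions(len(xs))))


print(rank_pairs([2.4, 0.7, 1.3, 5.0]))
# [(0.7, 0.2), (1.3, 0.4), (2.4, 0.6), (5.0, 0.8)]
```

Because $i/(n+1)$ never reaches 0 or 1, the response $-\log(-\log \hat F(x_{(i)}))$ is finite for every $i$, which would not be true for the empirical cdf value $i/n$ at $i = n$.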

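The LS estimate $\hat\gamma_{LS}$ of $\gamma$ is obtained by differentiating (4) with respect to $\gamma$, setting the derivative equal to zero and replacing $F$ by $\hat F$, which gives

$$\hat\gamma_{LS} = \frac{-\sum_{i=1}^{n} \log\bigl(-\log \hat F(x_{(i)})\bigr)\log x_{(i)}}{\sum_{i=1}^{n} \bigl(\log x_{(i)}\bigr)^2}.$$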
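The closed-form LS estimate can be sketched in a few lines of Python, assuming the mean rank estimator $\hat F(x_{(i)}) = i/(n+1)$; `gamma_ls` is a hypothetical name, not code from the paper.

```python
import math


def gamma_ls(sample: list[float]) -> float:
    """LS rank-regression estimate of the Frechet shape parameter.

    Regresses Y_i = -log(-log(i/(n+1))) on X_i = log x_(i) through the origin:
    gamma_LS = sum(X_i * Y_i) / sum(X_i^2).
    """
    xs = sorted(sample)
    n = len(xs)
    num = den = 0.0
    for i, x in enumerate(xs, start=1):
        y = -math.log(-math.log(i / (n + 1)))  # response from the mean rank estimate
        lx = math.log(x)                        # regressor log x_(i)
        num += lx * y
        den += lx * lx
    return num / den
```

One way to sanity-check it: a "perfect" sample with $x_{(i)} = (-\log(i/(n+1)))^{-1/\gamma}$ lies exactly on the regression line, so `gamma_ls` returns $\gamma$ up to rounding.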

1.2. Weighted least squares method

To estimate the parameter of the Frechet distribution, let $x_{(1)} < x_{(2)} < \dots < x_{(n)}$ denote the order statistics of a sample of size $n$ from a Frechet distribution $F$, so that the regression model is based on the equation

$$Y_i = \gamma X_i, \qquad (5)$$

which [16] call the regression of $Y$ on $X$. Since the order statistics $x_{(1)} < x_{(2)} < \dots < x_{(n)}$ do not have constant variance, nor do the log-transformed order statistics $X_i$, the regression model (5) is heteroscedastic.

Adding an error term to equation (2) and replacing $F(x_{(i)})$ by its estimate $\hat F_i$, we obtain

$$-\log\bigl(-\log \hat F_i\bigr) = \gamma \log x_{(i)} + \varepsilon_i. \qquad (6)$$

To estimate the regression parameter $\gamma$, the model is fitted by minimizing the weighted sum of squares

$$\min_{\gamma} \sum_{i=1}^{n} w_i \bigl[Y_i - \gamma \log x_{(i)}\bigr]^2,$$

where $w_i$, $i = 1, \dots, n$, is the weight of the $i$th observation.

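In this paper, we use the approximate variance-stabilizing weights proposed by [18]:

$$w_i = \frac{1}{\operatorname{Var}\bigl(A(x_{(i)})\bigr)}, \qquad \operatorname{Var}\bigl(A(x_{(i)})\bigr) \approx \frac{m_i(1-m_i)}{n+2}\left(\frac{dA(F(x_{(i)}))}{dF(x_{(i)})}\right)^2, \qquad (7)$$

where $m_i = i/(n+1)$ and the derivative is evaluated at $F(x_{(i)}) = m_i$.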

These weights rely on large-sample properties of the empirical distribution function (or, equivalently, of the order statistics): the least-squares weights are obtained from large-sample variances by taking the inverse of the approximate variance of a scalar function $A$ of an order statistic, which stabilizes the variance for the WLS estimation method.
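The factor $m_i(1-m_i)/(n+2)$ in (7) is in fact the exact variance of $F(x_{(i)})$: it has the law of the $i$th order statistic of $n$ uniform variables, $\mathrm{Beta}(i, n+1-i)$, whose variance is $i(n+1-i)/\bigl((n+1)^2(n+2)\bigr) = m_i(1-m_i)/(n+2)$ with $m_i = i/(n+1)$. The large-sample approximation enters only through the first-order (delta-method) expansion of $A$. The Python sketch below compares the formula with a Monte Carlo estimate; the function names and simulation settings are illustrative assumptions.

```python
import random
import statistics


def order_stat_var_formula(i: int, n: int) -> float:
    """m_i(1 - m_i)/(n + 2), m_i = i/(n + 1): the Beta(i, n + 1 - i) variance."""
    m = i / (n + 1)
    return m * (1 - m) / (n + 2)


def delta_method_var(i: int, n: int, dA) -> float:
    """Formula (7): Var(A(F(x_(i)))) ~ m_i(1 - m_i)/(n + 2) * A'(m_i)^2."""
    m = i / (n + 1)
    return order_stat_var_formula(i, n) * dA(m) ** 2


def simulated_order_stat_var(i: int, n: int, reps: int, seed: int = 0) -> float:
    """Monte Carlo variance of the i-th smallest of n uniforms, the law of F(X_(i))."""
    rng = random.Random(seed)
    draws = [sorted(rng.random() for _ in range(n))[i - 1] for _ in range(reps)]
    return statistics.variance(draws)
```

With `dA = lambda u: 1.0` (that is, $A(u) = u$), `delta_method_var` reduces to the exact Beta variance, which is a convenient consistency check.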


Furthermore, if $A(x_{(i)})$ is of the form $A(x_{(i)}) = A(F(x_{(i)}))$, we compute the approximate variance of $-\log\bigl(-\log F(x_{(i)};\gamma)\bigr) = \gamma \log x_{(i)}$ using formula (7) with $A(u) = -\log(-\log u)$, whose derivative is $A'(u) = -1/(u \log u)$:

$$\operatorname{Var}\Bigl(-\log\bigl(-\log \hat F(x_{(i)})\bigr)\Bigr) \approx \frac{m_i(1-m_i)}{n+2}\cdot\frac{1}{m_i^2(\log m_i)^2} = \frac{1-m_i}{(n+2)\, m_i (\log m_i)^2}, \qquad m_i = \frac{i}{n+1}.$$

Therefore the weights $w_i$ depend only on $i$ and $n$, and are independent of the parameter of the distribution.
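Applying formula (7) with $A(u) = -\log(-\log u)$, whose derivative is $A'(u) = -1/(u \log u)$, gives $\operatorname{Var}(-\log(-\log \hat F_i)) \approx (1-m_i)/\bigl((n+2)\,m_i(\log m_i)^2\bigr)$. The Python sketch below computes the resulting weights under this reading of (7); the helper names are hypothetical.

```python
import math


def approx_var_y(i: int, n: int) -> float:
    """Delta-method Var(-log(-log F_hat_i)) = (1 - m)/((n + 2) m (log m)^2), m = i/(n + 1)."""
    m = i / (n + 1)
    return (1 - m) / ((n + 2) * m * math.log(m) ** 2)


def wls_weights(n: int) -> list[float]:
    """w_i = 1 / approx_var_y(i, n); depends only on i and n, never on gamma."""
    return [1.0 / approx_var_y(i, n) for i in range(1, n + 1)]
```

Since $-\log(-\log u)$ is steep as $u \to 1$, the largest order statistic receives the smallest weight.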

For the linear regression model (5), the weighted least-squares problem is solved by setting

$Y^* = \bigl(-\log(-\log \hat F_1), \dots, -\log(-\log \hat F_n)\bigr)^\top$, $X^* = \bigl(\log x_{(1)}, \dots, \log x_{(n)}\bigr)^\top$ and $W = \operatorname{diag}(w_1, w_2, \dots, w_n)$,

whose solution is $\hat\gamma_{WLS} = (X^{*\top} W X^*)^{-1} X^{*\top} W Y^*$. Since the model has a single regressor, this reduces to

$$\hat\gamma_{WLS} = \frac{-\sum_{i=1}^{n} w_i \log x_{(i)} \log\bigl(-\log \hat F_i\bigr)}{\sum_{i=1}^{n} w_i \bigl(\log x_{(i)}\bigr)^2}, \qquad w_i = \frac{1}{\operatorname{Var}\Bigl(-\log\bigl(-\log \hat F(x_{(i)})\bigr)\Bigr)}.$$
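With a single regressor, $(X^{*\top}WX^*)^{-1}X^{*\top}WY^*$ is just a ratio of two weighted sums. A minimal Python sketch, assuming the mean rank estimate $\hat F_i = i/(n+1)$ and the delta-method weights $w_i = (n+2)\,m_i(\log m_i)^2/(1-m_i)$ obtained by applying (7) to $A(u) = -\log(-\log u)$ (our reading of the weights); `gamma_wls` and its optional `weights` argument are hypothetical.

```python
import math


def gamma_wls(sample, weights=None):
    """WLS slope sum(w X Y) / sum(w X^2) for the no-intercept model Y_i = gamma X_i + e_i."""
    xs = sorted(sample)
    n = len(xs)
    num = den = 0.0
    for i, x in enumerate(xs, start=1):
        m = i / (n + 1)                  # mean rank estimate F_hat_i
        y = -math.log(-math.log(m))      # Y_i = -log(-log F_hat_i)
        lx = math.log(x)                 # X_i = log x_(i)
        if weights is None:
            # Inverse of the delta-method variance (1 - m)/((n + 2) m (log m)^2).
            w = (n + 2) * m * math.log(m) ** 2 / (1 - m)
        else:
            w = weights[i - 1]           # caller-supplied weight for the i-th order statistic
        num += w * lx * y
        den += w * lx * lx
    return num / den
```

Passing `weights=[1.0] * n` reduces the estimator to the unweighted LS estimate, which makes it easy to compare the two fits on the same sample.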

Table 1 reports the mean of $\hat\gamma_{WLS}$, together with the competing estimators, for $\gamma = 1.67, 1.11, 0.5$ and sample sizes $n = 10, 20, 30, 50, 100, 200, 500, 1000, 2000$.

2. Simulation and example

2.1. Performance of the estimator

To compare the performance of the proposed estimator $\hat\gamma_{WLS}$ with the least squares estimator $\hat\gamma_{LS}$, the maximum likelihood estimator $\hat\gamma_{MLE}$ and the maximum product of spacings estimator $\hat\gamma_{MPSE}$, we carried out a simulation study. A common approach to selecting the best method is Monte Carlo simulation with appropriate criteria, such as bias and mean squared error (MSE) [10]. We generated 10000 random samples from the Frechet distribution for each sample size $n = 10, 20, 30, 50, 100, 200, 500, 1000, 2000$ and each shape parameter $\gamma = 1.67, 1.11, 0.5$. The methods are compared in terms of bias and root mean square error (RMSE). The bias of an estimator is $\operatorname{Bias}(\hat\gamma) = E(\hat\gamma) - \gamma$, and the RMSE is the square root of the sum of the variance and the squared bias, $\operatorname{RMSE}(\hat\gamma) = \sqrt{\operatorname{Var}(\hat\gamma) + \operatorname{Bias}(\hat\gamma)^2}$.
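A small-scale version of this Monte Carlo comparison can be sketched in Python. It is not the authors' code: the inverse-transform sampler, the bisection solver for the MLE score equation $n/\gamma - \sum \log x_i + \sum x_i^{-\gamma}\log x_i = 0$, the delta-method WLS weights and the replication counts are all assumptions made for illustration, and the MPS estimator is omitted for brevity. RMSE is computed as $\sqrt{\operatorname{Var} + \operatorname{Bias}^2}$, as defined above.

```python
import math
import random


def frechet_sample(n, gamma, rng):
    """Inverse-transform draws from F(x) = exp(-x^-gamma): x = (-log u)^(-1/gamma)."""
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:  # log(0) is undefined; random() can return exactly 0.0
            out.append((-math.log(u)) ** (-1.0 / gamma))
    return out


def _rank_regression(sample, weighted):
    """Through-origin regression of -log(-log(i/(n+1))) on log x_(i)."""
    xs = sorted(sample)
    n = len(xs)
    num = den = 0.0
    for i, x in enumerate(xs, start=1):
        m = i / (n + 1)
        y = -math.log(-math.log(m))
        lx = math.log(x)
        # Inverse delta-method variance for WLS, or a constant weight for plain LS.
        w = (n + 2) * m * math.log(m) ** 2 / (1 - m) if weighted else 1.0
        num += w * lx * y
        den += w * lx * lx
    return num / den


def gamma_ls(sample):
    return _rank_regression(sample, weighted=False)


def gamma_wls(sample):
    return _rank_regression(sample, weighted=True)


def gamma_mle(sample, iters=60):
    """Root of the score n/g - sum(log x) + sum(x^-g log x), found by bisection."""
    logs = [math.log(x) for x in sample]
    n, s = len(logs), sum(logs)

    def score(g):
        return n / g - s + sum(math.exp(-g * lx) * lx for lx in logs)

    lo, hi = 1e-3, 1.0
    while score(hi) > 0:  # the score is strictly decreasing in g, so expand until it turns negative
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bias_rmse(estimator, gamma, n, reps, seed=0):
    """Monte Carlo bias E(g_hat) - gamma and RMSE sqrt(Var + Bias^2) over reps samples."""
    rng = random.Random(seed)
    est = [estimator(frechet_sample(n, gamma, rng)) for _ in range(reps)]
    mean = sum(est) / reps
    bias = mean - gamma
    var = sum((e - mean) ** 2 for e in est) / reps
    return bias, math.sqrt(var + bias ** 2)


if __name__ == "__main__":
    for name, est in (("MLE", gamma_mle), ("LS", gamma_ls), ("WLS", gamma_wls)):
        print(name, bias_rmse(est, gamma=1.5, n=50, reps=200))
```

With only a few hundred replications these figures are noisy; the study described above uses 10000 samples per configuration.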


Table 1. Mean estimates of γ obtained by the different estimators for the true values γ = 1/0.6 ≈ 1.67, 1/0.9 ≈ 1.11 and 0.5 (each entry is a mean; results are re-scaled by the factor 0.00001)

n     Method  γ = 1.67  γ = 1.11  γ = 0.5
10    MLE     1.32995   0.95437   0.45015
      MPSE    1.30107   1.89420   0.41260
      LSE     2.49297   1.61866   0.49807
      WLS     2.06358   1.48917   0.47880
20    MLE     1.49870   0.99107   0.52681
      MPSE    1.46761   0.95218   0.49972
      LSE     2.09857   1.66033   0.70631
      WLS     2.04806   1.34121   0.68271
30    MLE     1.60355   1.07210   0.56824
      MPSE    1.54872   1.01028   0.49935
      LSE     2.16820   1.35582   0.70790
      WLS     2.06776   1.28703   0.60878
50    MLE     1.44972   1.05178   0.47361
      MPSE    1.42337   1.01293   0.44725
      LSE     1.48879   1.37017   0.47361
      WLS     1.44812   1.04935   0.45039
100   MLE     1.52108   1.04215   0.45006
      MPSE    1.41076   1.03052   0.43118
      LSE     1.38191   1.13933   0.47727
      WLS     1.19045   1.09802   0.46916
200   MLE     1.70824   1.22987   0.53599
      MPSE    1.69816   1.21880   0.54960
      LSE     1.82353   1.22692   0.55239
      WLS     1.77158   1.19479   0.53146
500   MLE     1.65717   1.12576   0.49174
      MPSE    1.52088   1.02215   0.47583
      LSE     1.79043   1.13586   0.47122
      WLS     1.66917   1.12096   0.49279
1000  MLE     1.65780   1.12194   0.49958
      MPSE    1.63754   1.08834   0.46457
      LSE     1.73552   1.14054   0.51225
      WLS     1.68209   1.11442   0.51928
2000  MLE     1.63653   1.08665   0.51535
      MPSE    1.60534   1.06778   0.50646
      LSE     1.64144   1.11822   0.52656
      WLS     1.65295   1.11291   0.51404

Table 2 presents the simulated bias and RMSE of the MLE, MPSE and LSE and of the WLS estimator considered in this study, for the selected sample sizes and values of the Frechet parameter.

Table 2. Simulated bias and RMSE for γ = 1.667, 1.111 and 0.5 (results are re-scaled by the factor 0.00001)


              γ = 1.667            γ = 1.111            γ = 0.5
n     Method  Bias      RMSE       Bias      RMSE       Bias      RMSE
10    MLE     -0.42880  0.57890    -0.95252  1.03148    -0.42863  0.47367
      MPSE    -0.44991  0.55032    -0.97364  1.04259    -0.43001  0.46928
      LSE     -0.33831  0.54721    -0.89221  1.05260    -0.40149  0.46417
      WLS     -0.33348  0.53601    -0.88898  1.05734    -0.40004  0.41580
20    MLE     -0.32511  0.28018    -0.21651  0.58715    -0.09743  0.18422
      MPSE    -0.33076  0.30029    -0.22108  0.60739    -0.09956  0.19070
      LSE      0.33339  0.59442     0.22226  0.69628     0.10002  0.19833
      WLS      0.32463  0.20644     0.21442  0.50429     0.09239  0.18193
30    MLE     -0.21613  0.25724    -0.11059  0.55177    -0.07176  0.16830
      MPSE    -0.23182  0.30835    -0.14268  0.57266    -0.73280  0.18033
      LSE      0.25063  0.46406     0.16709  0.60937     0.07519  0.19922
      WLS      0.20260  0.26453     0.15507  0.50969     0.06978  0.13936
50    MLE     -0.11182  0.17517    -0.11782  0.41691    -0.04952  0.08261
      MPSE    -0.14393  0.27124    -0.11813  0.42014    -0.05096  0.09907
      LSE      0.17100  0.33949     0.11953  0.42633     0.05136  0.10185
      WLS      0.10558  0.14598     0.10372  0.33065     0.04367  0.07379
100   MLE     -0.09519  0.12317    -0.05947  0.32213    -0.03155  0.04696
      MPSE    -0.09938  0.16086    -0.06458  0.35150    -0.03479  0.05707
      LSE      0.10359  0.22574     0.06906  0.39049     0.03696  0.06772
      WLS      0.08920  0.11128     0.04345  0.28419     0.02676  0.03939
200   MLE     -0.05917  0.08742    -0.04145  0.30828    -0.01665  0.02923
      MPSE    -0.06157  0.10553    -0.04194  0.33039    -0.01778  0.03045
      LSE      0.06246  0.15187     0.04203  0.35125     0.01874  0.04556
      WLS      0.05417  0.06031     0.03611  0.30687     0.01525  0.02809
500   MLE     -0.02649  0.05554    -0.01999  0.13703    -0.00744  0.01966
      MPSE    -0.02988  0.07663    -0.20984  0.15184    -0.01032  0.02104
      LSE      0.03100  0.09109     0.02067  0.17072     0.01182  0.02733
      WLS      0.02109  0.05095     0.01739  0.11530     0.00703  0.01838
1000  MLE     -0.01663  0.03994    -0.01091  0.02627    -0.00619  0.01182
      MPSE    -0.01475  0.05597    -0.01109  0.03230    -0.00428  0.01187
      LSE      0.01767  0.07299     0.01178  0.04199     0.00945  0.01189
      WLS      0.01599  0.03091     0.01066  0.01727     0.00480  0.01127
2000  MLE     -0.00940  0.02776    -0.00626  0.01850    -0.00302  0.00833
      MPSE    -0.01002  0.03387    -0.00657  0.02521    -0.00306  0.00945
      LSE      0.01027  0.04365     0.00685  0.02910     0.00308  0.01309
      WLS      0.00899  0.01911     0.00599  0.01274     0.00270  0.00473

2.2. Results and discussion

According to the bias criterion:

We first evaluate the proposed WLS estimator in terms of bias. For the small sample size n = 10, it has the smallest bias, followed by the LSE, MLE and MPSE. For n > 10 and all values of the shape parameter, the WLS estimator is in general the best in terms of bias, followed by the MLE, MPSE and LSE. In addition, the bias decreases as the sample size increases, for every value of the shape parameter.

According to the RMSE criterion: for the sample size n = 10 and γ = 1.67 or γ = 0.5, the proposed WLS has a smaller RMSE than the LSE, MPSE and MLE, whereas for γ = 1.11 the MLE has a smaller RMSE than the MPSE, LSE and WLS.

For n > 10, the RMSE of the LSE is larger than those of the MPSE, MLE and WLS for every value of the shape parameter. The WLS has the smallest RMSE as the sample size grows, while the MLE and MPSE also perform well. For every method and every value of the shape parameter, the RMSE generally decreases as the sample size increases, so the estimates become more accurate.

2.3. Illustrative example

As a real application, we use the 2167 observations of the Danish data, which record large fire insurance claims in Denmark from Thursday 3 January 1980 to Monday 31 December 1990 and are available in the "evir" package of the R software [8]. These data have been used in many applications of extreme value theory.

In this section, we assess the performance of the proposed estimator on the weekly and monthly maximum losses over this period. The 2167 observations yield 310 weekly maxima and 132 monthly maxima. These provide an excellent example of extreme value analysis, since studies of the Danish data consistently find a heavy tail with an index between 1 and 2.
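The weekly and monthly maxima are block maxima: the claims are grouped into blocks and the largest loss in each block is kept. A minimal Python sketch, assuming ISO weeks (Monday to Sunday) and calendar months as blocks (the paper does not state its block definition) and using made-up claim values rather than the Danish data; all names are hypothetical.

```python
from datetime import date


def block_maxima(records, block_key):
    """Largest loss in each block; blocks without any claim yield no maximum."""
    best = {}
    for day, loss in records:
        k = block_key(day)
        if k not in best or loss > best[k]:
            best[k] = loss
    return [best[k] for k in sorted(best)]


def iso_week(day: date) -> tuple[int, int]:
    """Block label (ISO year, ISO week number); ISO weeks run Monday to Sunday."""
    iso = day.isocalendar()
    return (iso[0], iso[1])


def calendar_month(day: date) -> tuple[int, int]:
    """Block label (year, month)."""
    return (day.year, day.month)


# Made-up claims for illustration only; these are not values from the Danish data.
claims = [
    (date(1980, 1, 3), 1.0),
    (date(1980, 1, 4), 2.5),
    (date(1980, 1, 16), 1.2),
    (date(1980, 2, 5), 3.0),
]
print(block_maxima(claims, iso_week))        # [2.5, 1.2, 3.0]
print(block_maxima(claims, calendar_month))  # [2.5, 3.0]
```

Blocks that contain no claim simply produce no maximum, so the number of maxima depends on how the claims are spread over time as well as on the block length.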

This allows us to fit heavy-tailed models to the data with the proposed estimator, in line with the objective of this paper, and to compare it with the bias-reduced estimator of [4] for the case of an infinite second moment (see Tab. 7 in [4]), defined by the following formula

V7 -1 e -1 ) n

Our case study is mostly based on the Frechet distribution (1) with shape parameter γ = 1.5; we then compute estimates of its shape parameter with the estimation methods considered in this study, see Tab. 3.

Table 3. Parameter estimates for the Frechet distribution fitted to the weekly and monthly maxima of the Danish fire losses.

Maxima   N    γ_MLE    γ_MPSE   γ_LS     γ_WLS    A
Monthly  132  0.63622  0.67531  0.71649  0.68363  0.466853
Weekly   310  0.67842  0.65912  0.69471  0.67593  0.408663

Conclusion

The WLS method depends on how the weights are calculated. In this paper, we proposed a weighted least squares estimator, based on the easily calculated weights proposed in [18], and applied it to the estimation of the shape parameter of the heavy-tailed Frechet distribution. The Monte Carlo simulation compared the methods in terms of bias and RMSE, and showed that WLS with the proposed weights can be a good alternative method for estimating the shape parameter of the Frechet distribution for all the sample sizes considered. In the real-data application, the proposed WLS estimator was also found to perform better than the other methods considered.

Moreover, the estimation methods considered here can also be applied to the Burr XII distribution and to other distributions with explicit cumulative distribution functions, by computing the inverse of the approximate variance to obtain the weights for weighted least squares estimation.

References

[1] R.Adrain, Research concerning the probabilities of the errors which happen in making observations, &c. The Analyst, or Mathematical Museum, Vol. I, Article XIV, Philadelphia: William P. Farrand and Co 1808, 93-109.

[2] M.S.Annasaheb, C.B.Girish, Comparison of three estimation methods for Frechet distribution, International Journal of Multidisciplinary Research and Development, 5(2018), no. 1, 38-41.

[3] A.Benard, E.C.Bos-Levenbach, The plotting of observations on probability paper, Statistica Neerlandica, 7(1953), 163-173.

[4] B.Brahimi, D.Meraghni, A.Necir, D.Yahia, A bias-reduced estimator for the mean of a heavy-tailed distribution with an infinite second moment, Journal of Statistical Planning and Inference, 143(2013), 1064-1081.

[5] R.M.Engeman, T.J.Keefe, On generalized least squares estimation of the Weibull distribution, Communications in Statistics. Theory and Methods, 11(1982), 2181-2193. DOI: 10.1080/03610928208828380

[6] C.F.Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement, translated by G.W.Stewart, Society for Industrial and Applied Mathematics, 1995.

[7] W.L.Hung, Weighted least squares estimation of the shape parameter of the Weibull distribution, Quality and Reliability Engineering International, Wiley Online Library, 2001.

[8] R.Ihaka, R.Gentleman, R: a language for data analysis and graphics, Journal of Computational and Graphical Statistics, 5(1996), 299-314. DOI: 10.1080/10618600.1996.10474713

[9] A.F.Jenkinson, The frequency distribution of the annual maximum (or minimum) values of meteorological elements, Quarterly Journal of the Royal Meteorological Society, 81(1955), 158-171.

[10] Y.M.Kantar, I.Usta, Ş.Acitas, A Monte Carlo Simulation Study on Partially Adaptive Estimators of linear Regression Models, Journal of Applied Statistics, Taylor & Francis, 2011.

[11] R.Koenker, G.Bassett, Regression quantiles, Econometrica: Journal of the Econometric Society, 46(1978), no. 1, 33-50, JSTOR.

[12] A.M.Legendre, Nouvelles méthodes pour la détermination des orbites des comètes, Paris, Appendix, 'Sur la Méthode des moindres quarrés', 1805, 72-80.

[13] H.L.Lu, S.H.Tao, The Estimation of Pareto distribution by a weighted least square method. Quality & Quantity, Springer, 2007.

[14] H.L.Lu, C.H.Chen, J.W.Wu, A Note on weighted least-squares estimation of the shape parameter of the Weibull distribution, Quality and Reliability Engineering, Wiley Online Library, 2004.

[15] H.J.Malik, Estimation of the parameters of the Pareto distribution, Metrika: International Journal for Theoretical and Applied Statistics. Springer, 15(1970), 126-132.

[16] L.F.Zhang, M.Xie, L.C.Tang, A study of two estimation approaches for parameters of Weibull distribution based on WPP. Reliability Engineering & System Safety, Elsevier, 2007.

[17] J.M.V.Zyl, A Median Regression Model to Estimate the Parameters of the Three-parameter Generalized Pareto Distribution, Communications in Statistics-Simulation and Computation, 41(2012), 544-553. DOI: 10.1080/03610918.2011.595868

[18] J.M.V.Zyl, R.Schall, Parameter estimation through weighted least-squares rank regression with specific reference to the Weibull and Gumbel distributions, Communications in Statistics-Simulation and Computation, 41(2012), 1654-1666. DOI: 10.1080/03610918.2011.611315

Estimating the Heavy Tail Index by Weighted Least-Squares Rank Regression

Zahia Khemissi, Brahim Brahimi, Fatah Benatia

Laboratory of Applied Mathematics, Mohamed Khider University, Biskra, Algeria

Abstract. In this paper, we propose a weighted least-squares method for estimating the shape parameter of the Frechet distribution. The performance of the proposed estimator is shown in a simulation study, which finds that the weighted estimation method performs better than maximum likelihood estimation, maximum product of spacings estimation and least-squares estimation in terms of bias and root mean square error for most of the sample sizes considered. In addition, a real example based on the Danish data is given to demonstrate the performance of the method.

Keywords: Frechet distribution, weighted least-squares regression, rank regression, Monte Carlo simulation, shape parameter.
