AN IMPROVED ESTIMATOR OF FINITE POPULATION MEAN UNDER RANKED SET SAMPLING
Francis Delali Baeta (1,2), Dioggban Jakperik (2), Michael Jackson Adjabui (2)
(1) Ho Technical University, Ho, Ghana
(2) C. K. Tedam University of Technology and Applied Sciences, Navrongo, Ghana
Abstract
To obtain reliable estimates of population parameters, the data sampled for estimation must accurately represent the underlying population. Whether sampled data are representative also depends on the sampling technique used to obtain them. This matters because sampling bias can lead to over- or under-estimation of parameters. Ranked Set Sampling is considered a better alternative to the classical sampling designs for obtaining such data. Ranked Set Sampling is designed to minimize the number of measured observations required to achieve a desired precision in making inferences, and it is therefore more economical for estimation than the classical sampling designs. This is an added advantage when data are difficult to obtain. Many estimators have recently been developed for the finite population mean under ranked set sampling. This paper aims to improve estimation by modifying an existing estimator using a simple linear combination of the known population mean, the square root of the known coefficient of variation, and the known median of an auxiliary variable. The theoretical properties of the proposed estimator, namely the bias and mean squared error, were derived up to the first order of approximation using Taylor's expansion. The bias, mean squared error, absolute relative bias, and relative efficiency were used to evaluate and compare the proposed modified estimator and its competitors. The R software was used to aid computations. An empirical application to real data showed that the proposed modified estimator is superior to the competing estimators compared, since it has the least bias, the least mean squared error, the least absolute relative bias, and the highest relative efficiency for all sample sizes considered.
The bias and mean squared error of the modified estimator under Ranked Set Sampling were found to be smaller than those of the existing estimators compared. Hence it is more efficient and provides more reliable estimates than those estimators, and we recommend it for use in survey estimation.
Keywords: Ranked Set Sampling, Ratio Estimator, Bias, Mean Squared Error, Auxiliary Variable
1. Introduction
Over the years, researchers have been preoccupied with developing new estimators of the mean of finite heterogeneous populations, with the aim of reducing the bias and MSE of existing estimators as far as possible [1, 5, 9, 16]. For reliable estimates, the data employed for estimation must be representative of the underlying population, and whether sampled data are representative depends also on the sampling method [18]. Sampling bias can lead to over- or under-estimation of population parameters. Consequently, one current field of interest is the identification of designs that generate representative samples for super populations. Estimation of the finite population mean has relied disproportionately on the classical sampling designs, especially simple random sampling (SRS). However, as noted by [3], the SRS procedure is incapable of generating representative samples for certain populations. The consequence, as noted by [10], is that a specific sample that is not truly representative of the underlying population may be included for estimation, which can lead to unreliable estimates. Therefore, to improve accuracy and precision in the estimation of the finite population mean, sampling procedures that do not suffer from these weaknesses of SRS must be considered. Among such methods, the Ranked Set Sampling (RSS) technique is a good alternative to SRS for obtaining data that are truly representative of the population under study [2]. The goal of RSS is to collect observations that are more likely to span the full range of values in the population, and it therefore produces more representative samples than SRS [10]. RSS was first introduced by [12], who used it to estimate pasture yield, for circumstances in which taking actual measurements of sample units is difficult. [17] established the statistical methodology for RSS.
The procedure for obtaining a ranked set sample is briefly outlined by the following steps:
1. Randomly select a sample of size $m^2$ from the targeted population.
2. Distribute the $m^2$ selected units into $m$ sets, each of size $m$.
3. Rank the units within each set with respect to the attribute of interest, using the judgement of an expert or by the aid of an auxiliary variable that is correlated with the study variable.
4. Select the ith ranked unit from the ith set for actual measurement of the attribute of interest, in the order i = 1,2,3,..., m.
5. Repeat steps 1 to 4 for $r$ cycles to obtain a sample of size $n = mr$.
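The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: ranking is assumed to be perfect and done on the auxiliary value $x$, the $m^2$ units of each cycle are drawn without replacement, and cycles are drawn independently, so a unit may reappear in a later cycle. The function name `ranked_set_sample` and the toy population are hypothetical.

```python
import random

def ranked_set_sample(population, m, r, rng=None):
    """Draw a balanced ranked set sample of size n = m * r.

    `population` is a list of (x, y) pairs. Units are ranked on the
    auxiliary value x (perfect ranking is assumed), and only the i-th
    ranked unit of the i-th set is retained for measurement.
    """
    if rng is None:
        rng = random.Random()
    if len(population) < m * m:
        raise ValueError("population must contain at least m**2 units")
    sample = []
    for _ in range(r):                                        # step 5: r cycles
        units = rng.sample(population, m * m)                 # step 1: m^2 units at random
        for i in range(m):
            current_set = units[i * m:(i + 1) * m]            # step 2: i-th set of size m
            ranked = sorted(current_set, key=lambda u: u[0])  # step 3: rank on x
            sample.append(ranked[i])                          # step 4: keep the i-th ranked unit
    return sample

# Hypothetical toy population in which y is an exact linear function of x
population = [(float(x), 0.4 * x + 5.0) for x in range(1, 41)]
rss = ranked_set_sample(population, m=3, r=4, rng=random.Random(1))
print(len(rss))  # 12 measured units, n = m * r
```

Only the $n = mr$ retained units need to be measured on $Y$; the other $m^2 - m$ units in each cycle are ranked but not measured, which is where RSS saves measurement cost.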
RSS is preferred when mechanisms are readily available for ranking a set of sample units, whether by means of an auxiliary variable or the judgement of an expert. [7] proved that the ranked set sample mean is an unbiased estimator of the population mean, even under imperfect ranking. [10] proposed that an auxiliary variable X could be used to rank the study variable Y when judgement ranking of Y is difficult. Consequently, many estimators have been developed under RSS, employing a variety of auxiliary variables for ranking.
[16] introduced the classical ratio estimator under RSS, and several authors have since extended their work using a variety of auxiliary information. [11] suggested a modified ratio estimator of the population mean under RSS using the quartile deviations and the known mean of an auxiliary variable. [4] proposed a generalized ratio estimator of the population mean under RSS using the known population mean of an auxiliary variable and some pre-assigned constants. [13] suggested a modified ratio-cum-product estimator of the finite population mean under RSS using known population information on the mean, the coefficient of variation and the coefficient of kurtosis of an auxiliary variable. [15] proposed a ratio-type estimator of the population mean under RSS using the known population mean and quartiles of an auxiliary variable. [8] proposed a ratio-type estimator under RSS based on the known population mean and population deciles of an auxiliary variable. [9] proposed a generalised ratio-type estimator based on RSS, employing known population parameters such as the coefficients of variation, kurtosis and skewness, as well as the mean of the auxiliary variable. [14] suggested a ratio-type estimator of the population mean based on RSS employing the known coefficient of variation, the known median and the known population mean of the auxiliary variable. Each of these estimators was reported to be more efficient than the competitors it was compared with. Notwithstanding, the existing estimators still carry non-negligible biases and mean squared errors. Therefore, this study sought to improve estimation by modifying an existing RSS estimator of the finite population mean.
2. Review of Existing Estimators
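Suppose the study variable $Y$ and the auxiliary variable $X$ are positively correlated. Then [16] expressed the classical ratio estimator of the population mean under RSS as

$$\bar y_{R,RSS} = \bar y_{[n]}\,\frac{\bar X}{\bar x_{(n)}} \tag{2.1}$$

with bias and MSE

$$B(\bar y_{R,RSS}) = \bar Y\left[\theta\left(C_x^2 - \rho C_x C_y\right) - \left(W_{x(i)}^2 - W_{yx(i)}\right)\right] \tag{2.2}$$

$$MSE(\bar y_{R,RSS}) = \bar Y^2\left[\theta\left(C_x^2 - 2\rho C_x C_y + C_y^2\right) - \left(W_{x(i)}^2 - 2W_{yx(i)} + W_{y[i]}^2\right)\right] \tag{2.3}$$

Throughout this study,

$$W_{x(i)}^2 = \frac{1}{m^2 r}\sum_{i=1}^{m}\left(\frac{\mu_{x(i)} - \bar X}{\bar X}\right)^2, \qquad W_{y[i]}^2 = \frac{1}{m^2 r}\sum_{i=1}^{m}\left(\frac{\mu_{y[i]} - \bar Y}{\bar Y}\right)^2,$$

$$W_{yx(i)} = \frac{1}{m^2 r\,\bar X\,\bar Y}\sum_{i=1}^{m}\left(\mu_{y[i]} - \bar Y\right)\left(\mu_{x(i)} - \bar X\right), \qquad \theta = \frac{1}{mr}, \qquad C_x^2 = \frac{S_x^2}{\bar X^2}, \qquad C_y^2 = \frac{S_y^2}{\bar Y^2},$$

where $\bar y_{[n]}$ and $\bar x_{(n)}$ are the ranked set sample means of $Y$ and $X$, $\mu_{x(i)}$ and $\mu_{y[i]}$ are the population means of the $i$th ranked units of $X$ and $Y$, $S_x^2$ and $S_y^2$ are the population variances, and $\rho$ is the correlation coefficient between $X$ and $Y$.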
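As a sketch of how the quantities $\theta$, $W_{x(i)}^2$, $W_{y[i]}^2$ and $W_{yx(i)}$ could be computed from their definitions, the Python function below takes the rank-wise population means $\mu_{x(i)}$ and $\mu_{y[i]}$ as inputs. How these means are obtained for a given population is not specified in this section, so they are treated here as known; the function name and the toy inputs are hypothetical.

```python
def rss_w_terms(mu_x, mu_y, x_bar, y_bar, m, r):
    """Return (theta, W2_x, W2_y, W_yx) from rank-wise population means.

    mu_x[i] and mu_y[i] are the means of the (i+1)-th ranked units of X and Y;
    x_bar and y_bar are the population means of X and Y.
    """
    if not (len(mu_x) == len(mu_y) == m):
        raise ValueError("need exactly m rank-wise means for X and for Y")
    theta = 1.0 / (m * r)
    denom = m * m * r
    tau_x = [(mx - x_bar) / x_bar for mx in mu_x]  # relative deviations of X rank means
    tau_y = [(my - y_bar) / y_bar for my in mu_y]  # relative deviations of Y rank means
    w2_x = sum(t * t for t in tau_x) / denom
    w2_y = sum(t * t for t in tau_y) / denom
    w_yx = sum(a * b for a, b in zip(tau_x, tau_y)) / denom
    return theta, w2_x, w2_y, w_yx

# Hypothetical inputs for m = 3, r = 1
theta, w2_x, w2_y, w_yx = rss_w_terms([60.0, 72.0, 84.0], [20.0, 24.0, 28.0],
                                      x_bar=72.0, y_bar=24.0, m=3, r=1)
```

Written with these relative deviations, the combination $W_{x(i)}^2 - 2W_{yx(i)} + W_{y[i]}^2$ in (2.3) is $\frac{1}{m^2 r}$ times a sum of squared differences between the $Y$ and $X$ deviations, so it is never negative and the $W$ terms can only reduce the first-order MSE.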
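[13] modified the classical ratio estimator of the population mean under RSS using $C_x$ and $\beta_2(x)$, respectively, as

$$\bar y_{M1,RSS} = \bar y_{[n]}\,\frac{\bar X + C_x}{\bar x_{(n)} + C_x} \tag{2.4}$$

and

$$\bar y_{M2,RSS} = \bar y_{[n]}\,\frac{\bar X C_x + \beta_2(x)}{\bar x_{(n)} C_x + \beta_2(x)} \tag{2.5}$$

with the respective biases

$$B(\bar y_{M1,RSS}) = \bar Y\left[\theta\left(\varphi^2 C_x^2 - \varphi\rho C_x C_y\right) - \left(\varphi^2 W_{x(i)}^2 - \varphi W_{yx(i)}\right)\right] \tag{2.6}$$

$$B(\bar y_{M2,RSS}) = \bar Y\left[\theta\left(\nu^2 C_x^2 - \nu\rho C_x C_y\right) - \left(\nu^2 W_{x(i)}^2 - \nu W_{yx(i)}\right)\right] \tag{2.7}$$

and the respective MSEs

$$MSE(\bar y_{M1,RSS}) = \bar Y^2\left[\theta\left(\varphi^2 C_x^2 + C_y^2 - 2\varphi\rho C_x C_y\right) - \left(\varphi^2 W_{x(i)}^2 + W_{y[i]}^2 - 2\varphi W_{yx(i)}\right)\right] \tag{2.8}$$

$$MSE(\bar y_{M2,RSS}) = \bar Y^2\left[\theta\left(\nu^2 C_x^2 + C_y^2 - 2\nu\rho C_x C_y\right) - \left(\nu^2 W_{x(i)}^2 + W_{y[i]}^2 - 2\nu W_{yx(i)}\right)\right] \tag{2.9}$$

where

$$\varphi = \frac{\bar X}{\bar X + C_x}, \qquad \nu = \frac{\bar X C_x}{\bar X C_x + \beta_2(x)},$$

and $\beta_2(x)$ is the coefficient of kurtosis of $X$.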
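All of the estimators above have the form $\bar y_{[n]}(\bar X + b)/(\bar x_{(n)} + b)$ for a constant $b$: $b = 0$ for (2.1), $b = C_x$ for (2.4), and $b = \beta_2(x)/C_x$ for (2.5) after dividing numerator and denominator by $C_x$. With $k = \bar X/(\bar X + b)$, their first-order bias and MSE reduce to one pair of expressions in $k$: $k = 1$ gives (2.2) and (2.3), $k = \varphi$ gives (2.6) and (2.8), and $k = \nu$ gives (2.7) and (2.9). The Python sketch below uses this observation; the function name `bias_mse` is hypothetical, and the inputs are the data constants of Section 5 with the $W$ values reported there for $n = 9$ ($m = r = 3$). With these inputs it reproduces the Table 1 entries for $\bar y_{R,RSS}$ and $\bar y_{M1,RSS}$ to the reported precision.

```python
def bias_mse(k, y_bar, theta, cx, cy, rho, w2_x, w2_y, w_yx):
    """First-order bias and MSE of y_[n] * (X + b) / (x_(n) + b), where k = X / (X + b)."""
    e11 = theta * cx**2 - w2_x            # E(e1^2)
    e00 = theta * cy**2 - w2_y            # E(e0^2)
    e01 = theta * rho * cx * cy - w_yx    # E(e0 e1)
    bias = y_bar * (k**2 * e11 - k * e01)
    mse = y_bar**2 * (k**2 * e11 + e00 - 2 * k * e01)
    return bias, mse

# Data constants from Section 5 and the n = 9 (m = r = 3) W terms
X_BAR, Y_BAR, RHO, CX, CY, BETA2 = 72.5454, 27.4909, 0.2521, 0.1436, 0.3629, 2.1429
params = dict(y_bar=Y_BAR, theta=1 / 9, cx=CX, cy=CY, rho=RHO,
              w2_x=0.0012, w2_y=0.0064, w_yx=0.0028)

phi = X_BAR / (X_BAR + CX)               # k for y_M1,RSS
nu = X_BAR * CX / (X_BAR * CX + BETA2)   # k for y_M2,RSS

bias_r, mse_r = bias_mse(1.0, **params)     # classical y_R,RSS
bias_m1, mse_m1 = bias_mse(phi, **params)   # y_M1,RSS of [13]
print(f"R:  bias={bias_r:.4f}, MSE={mse_r:.4f}")
print(f"M1: bias={bias_m1:.4f}, MSE={mse_m1:.4f}")
```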
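[14] used the coefficient of variation ($C_x$) and the median ($M_d$) of an auxiliary variable to propose a ratio-type estimator of the population mean,

$$\bar y_{P,RSS} = \bar y_{[n]}\,\frac{\bar X + C_x M_d}{\bar x_{(n)} + C_x M_d} \tag{2.10}$$

with bias and MSE

$$B(\bar y_{P,RSS}) = \bar Y\left[\theta\left(\lambda^2 C_x^2 - \lambda\rho C_x C_y\right) - \left(\lambda^2 W_{x(i)}^2 - \lambda W_{yx(i)}\right)\right] \tag{2.11}$$

and

$$MSE(\bar y_{P,RSS}) = \bar Y^2\left[\theta\left(\lambda^2 C_x^2 + C_y^2 - 2\lambda\rho C_x C_y\right) - \left(\lambda^2 W_{x(i)}^2 + W_{y[i]}^2 - 2\lambda W_{yx(i)}\right)\right] \tag{2.12}$$

where

$$\lambda = \frac{\bar X}{\bar X + C_x M_d}.$$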
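For a concrete check of (2.11) and (2.12), the Python lines below evaluate them directly for the data of Section 5 at $n = 9$ ($m = r = 3$), using the $W$ values reported for Table 1. The variable names are illustrative. The results agree with the Table 1 row for $\bar y_{P,RSS}$ (bias 0.0556, MSE 8.6428).

```python
# Data constants from Section 5 and the n = 9 (m = r = 3) W terms
X_BAR, Y_BAR = 72.5454, 27.4909
RHO, CX, CY, MD = 0.2521, 0.1436, 0.3629, 69.0
THETA, W2_X, W2_Y, W_YX = 1 / 9, 0.0012, 0.0064, 0.0028

lam = X_BAR / (X_BAR + CX * MD)  # lambda in (2.11)-(2.12)

# Equation (2.11): first-order bias of y_P,RSS
bias_p = Y_BAR * (THETA * (lam**2 * CX**2 - lam * RHO * CX * CY)
                  - (lam**2 * W2_X - lam * W_YX))

# Equation (2.12): first-order MSE of y_P,RSS
mse_p = Y_BAR**2 * (THETA * (lam**2 * CX**2 + CY**2 - 2 * lam * RHO * CX * CY)
                    - (lam**2 * W2_X + W2_Y - 2 * lam * W_YX))

print(f"lambda={lam:.4f}, bias={bias_p:.4f}, MSE={mse_p:.4f}")
```

The minus sign in front of $2\lambda W_{yx(i)}$ matters: with a plus sign the same inputs would give an MSE of roughly 1.2 instead of 8.6428.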
3. The Proposed Estimator
Motivated by [14], this study proposes a modified ratio-type estimator of the finite population mean under Ranked Set Sampling that utilizes the coefficient of variation ($C_x$) and the median ($M_d$) of the employed auxiliary variable:

$$\bar y_{B,RSS} = \bar y_{[n]}\,\frac{\bar X + M_d\sqrt{C_x}}{\bar x_{(n)} + M_d\sqrt{C_x}} \tag{3.1}$$
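Computing the estimate (3.1) from an observed ranked set sample needs only the two sample means and the known population constants. Below is a minimal Python sketch. The function name and the three measured pairs are hypothetical and are not taken from the paper's data.

```python
import math
from statistics import mean

def proposed_estimate(y_sample, x_sample, x_bar, md, cx):
    """Point estimate of the mean of Y via equation (3.1).

    y_sample and x_sample are the measured values of the ranked set sample;
    x_bar, md and cx are the known population mean, median and
    coefficient of variation of the auxiliary variable X.
    """
    if len(y_sample) != len(x_sample) or not y_sample:
        raise ValueError("need a non-empty sample of paired (y, x) values")
    shift = md * math.sqrt(cx)   # the constant Md * sqrt(Cx)
    y_rss = mean(y_sample)       # ranked set sample mean of Y
    x_rss = mean(x_sample)       # ranked set sample mean of X
    return y_rss * (x_bar + shift) / (x_rss + shift)

# Hypothetical measured pairs, with the population constants of Section 5
y_obs = [20.0, 27.0, 35.0]
x_obs = [60.0, 71.0, 85.0]
estimate = proposed_estimate(y_obs, x_obs, x_bar=72.5454, md=69.0, cx=0.1436)
classical = mean(y_obs) * 72.5454 / mean(x_obs)   # y_R,RSS for comparison
```

Because the same constant $M_d\sqrt{C_x}$ is added to the numerator and denominator, the adjustment factor is pulled towards 1 compared with the classical ratio $\bar X/\bar x_{(n)}$. Here, that shows up as `estimate` lying between the sample mean of $Y$ and `classical`.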
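Using large-sample properties, write $\bar y_{[n]} = \bar Y(1 + e_0)$ and $\bar x_{(n)} = \bar X(1 + e_1)$, where $E(e_0) = E(e_1) = 0$ and, under RSS,

$$E(e_0^2) = \theta C_y^2 - W_{y[i]}^2, \qquad E(e_1^2) = \theta C_x^2 - W_{x(i)}^2, \qquad E(e_0 e_1) = \theta\rho C_x C_y - W_{yx(i)}.$$

Equation (3.1) then becomes

$$\bar y_{B,RSS} = \bar y_{[n]}\,\frac{\bar X + M_d\sqrt{C_x}}{\bar X(1 + e_1) + M_d\sqrt{C_x}} = \bar y_{[n]}\,\frac{\bar X + M_d\sqrt{C_x}}{\bar X + M_d\sqrt{C_x} + \bar X e_1} = \frac{\bar y_{[n]}}{1 + \omega e_1},$$

where

$$\omega = \frac{\bar X}{\bar X + M_d\sqrt{C_x}}.$$

Now $\bar y_{B,RSS} = \bar Y(1 + e_0)(1 + \omega e_1)^{-1}$. Assuming $|\omega e_1| < 1$ and using Taylor's expansion to the second order,

$$\bar y_{B,RSS} \approx \bar Y(1 + e_0)\left(1 - \omega e_1 + \omega^2 e_1^2\right) \approx \bar Y\left(1 + e_0 - \omega e_1 + \omega^2 e_1^2 - \omega e_0 e_1\right).$$

Therefore the bias of the proposed estimator is

$$B(\bar y_{B,RSS}) = \bar Y\left[\omega^2 E(e_1^2) - \omega E(e_0 e_1)\right] = \bar Y\left[\omega^2\left(\theta C_x^2 - W_{x(i)}^2\right) - \omega\left(\theta\rho C_x C_y - W_{yx(i)}\right)\right]$$

$$= \bar Y\left[\theta\left(\omega^2 C_x^2 - \omega\rho C_x C_y\right) - \left(\omega^2 W_{x(i)}^2 - \omega W_{yx(i)}\right)\right]. \tag{3.2}$$

To the first order of approximation, $\bar y_{B,RSS} - \bar Y \approx \bar Y(e_0 - \omega e_1)$, so the mean squared error of the proposed estimator is

$$MSE(\bar y_{B,RSS}) = \bar Y^2\left[\omega^2 E(e_1^2) - 2\omega E(e_0 e_1) + E(e_0^2)\right] = \bar Y^2\left[\omega^2\left(\theta C_x^2 - W_{x(i)}^2\right) + \left(\theta C_y^2 - W_{y[i]}^2\right) - 2\omega\left(\theta\rho C_x C_y - W_{yx(i)}\right)\right]$$

$$= \bar Y^2\left[\theta\left(\omega^2 C_x^2 + C_y^2 - 2\omega\rho C_x C_y\right) - \left(\omega^2 W_{x(i)}^2 + W_{y[i]}^2 - 2\omega W_{yx(i)}\right)\right]. \tag{3.3}$$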
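Equations (3.2) and (3.3) can be evaluated the same way. The Python sketch below, with illustrative variable names, uses the data constants of Section 5 and the $W$ values reported there for $n = 9$ ($m = r = 3$). Its output agrees with the Table 1 row for $\bar y_{B,RSS}$ (bias 0.0433, MSE 8.1567) to the reported precision.

```python
import math

# Data constants from Section 5 and the n = 9 (m = r = 3) W terms
X_BAR, Y_BAR = 72.5454, 27.4909
RHO, CX, CY, MD = 0.2521, 0.1436, 0.3629, 69.0
THETA, W2_X, W2_Y, W_YX = 1 / 9, 0.0012, 0.0064, 0.0028

omega = X_BAR / (X_BAR + MD * math.sqrt(CX))

# Second-order moments of e0 and e1 under RSS
e11 = THETA * CX**2 - W2_X            # E(e1^2)
e00 = THETA * CY**2 - W2_Y            # E(e0^2)
e01 = THETA * RHO * CX * CY - W_YX    # E(e0 e1)

bias_b = Y_BAR * (omega**2 * e11 - omega * e01)              # equation (3.2)
mse_b = Y_BAR**2 * (omega**2 * e11 + e00 - 2 * omega * e01)  # equation (3.3)
print(f"omega={omega:.4f}, bias={bias_b:.4f}, MSE={mse_b:.4f}")
```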
4. Efficiency Comparison
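The proposed estimator $\bar y_{B,RSS}$ was compared with the RSS estimators of [13] and [14]. The proposed estimator is more efficient than the estimator of [14] if

$$MSE(\bar y_{B,RSS}) < MSE(\bar y_{P,RSS})$$

$$\iff \omega^2\left(\theta C_x^2 - W_{x(i)}^2\right) - 2\omega\left(\theta\rho C_x C_y - W_{yx(i)}\right) < \lambda^2\left(\theta C_x^2 - W_{x(i)}^2\right) - 2\lambda\left(\theta\rho C_x C_y - W_{yx(i)}\right)$$

$$\iff \left(\omega^2 - \lambda^2\right)\left(\theta C_x^2 - W_{x(i)}^2\right) < 2\left(\omega - \lambda\right)\left(\theta\rho C_x C_y - W_{yx(i)}\right).$$

Hence, provided $\omega < \lambda$ (which holds whenever $0 < C_x < 1$ and $M_d > 0$, since then $\sqrt{C_x} > C_x$), dividing both sides by the negative quantity $\omega - \lambda$ reverses the inequality, and the proposed estimator is more efficient than the estimator of [14] if

$$\rho < \frac{(\omega + \lambda)\left(\theta C_x^2 - W_{x(i)}^2\right) + 2W_{yx(i)}}{2\theta C_x C_y}, \tag{4.1}$$

where $\rho$ is the correlation coefficient between the auxiliary variable $X$ and the study variable $Y$. Let $P = \theta C_x^2 - W_{x(i)}^2$, $Q = 2W_{yx(i)}$ and $R = 2\theta C_x C_y$. Then, by the same argument and provided $\omega < \varphi$ and $\omega < \nu$ respectively, the proposed estimator $\bar y_{B,RSS}$ is more efficient than $\bar y_{M1,RSS}$ and $\bar y_{M2,RSS}$ if

$$\rho < \frac{P(\omega + \varphi) + Q}{R} \quad\text{and}\quad \rho < \frac{P(\omega + \nu) + Q}{R},$$

respectively, where

$$\omega = \frac{\bar X}{\bar X + M_d\sqrt{C_x}}, \qquad \varphi = \frac{\bar X}{\bar X + C_x}, \qquad \nu = \frac{\bar X C_x}{\bar X C_x + \beta_2(x)}.$$

5. Empirical Application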
The dataset that was used for evaluating the estimators is taken from page 34 of [6] and a general description is given below.
X : Weekly family income. Y : Weekly family expenditure.
Objective: To estimate mean weekly family expenditure.
$N = 33$, $\bar X = 72.5454$, $\bar Y = 27.4909$, $\rho = 0.2521$, $M_d = 69$, $\beta_2(x) = 2.1429$, $C_x = 0.1436$, $C_y = 0.3629$
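With these constants, the efficiency conditions of Section 4 can be checked numerically. The Python sketch below does this for the $n = 9$ design ($m = r = 3$), taking $W_{x(i)}^2 = 0.0012$ and $W_{yx(i)} = 0.0028$ from the values reported for Table 1; the dictionary keys and variable names are illustrative. It first confirms that $\omega$ is smaller than each competitor's constant, as the derivation requires, and then compares the observed $\rho$ with each threshold.

```python
import math

X_BAR, RHO, CX, CY = 72.5454, 0.2521, 0.1436, 0.3629
MD, BETA2 = 69.0, 2.1429
THETA, W2_X, W_YX = 1 / 9, 0.0012, 0.0028   # n = 9 (m = r = 3)

omega = X_BAR / (X_BAR + MD * math.sqrt(CX))
competitors = {
    "P,RSS": X_BAR / (X_BAR + CX * MD),           # lambda
    "M1,RSS": X_BAR / (X_BAR + CX),               # phi
    "M2,RSS": X_BAR * CX / (X_BAR * CX + BETA2),  # nu
}

# P, Q and R as defined in Section 4
P = THETA * CX**2 - W2_X
Q = 2 * W_YX
R = 2 * THETA * CX * CY

thresholds = {}
for name, k in competitors.items():
    if not omega < k:
        raise ValueError(f"the condition requires omega < k for {name}")
    thresholds[name] = (P * (omega + k) + Q) / R
    print(f"{name}: threshold {thresholds[name]:.4f}, rho = {RHO}, "
          f"proposed more efficient: {RHO < thresholds[name]}")
```

All three thresholds lie well above $\rho = 0.2521$, which is consistent with $\bar y_{B,RSS}$ having the smallest MSE in Table 1.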
The absolute relative bias (ARB) of each estimator was obtained by the formula

$$ARB(\bar y_i) = \frac{\left|B(\bar y_i)\right|}{\left|B(\bar y_{R,RSS})\right|}, \tag{5.1}$$

where $i = M1,RSS;\ M2,RSS;\ P,RSS;\ B,RSS$.
The Percent Relative Efficiency (PRE) of an estimator $\bar y_i$, compared with the classical RSS ratio estimator $\bar y_{R,RSS}$ of [16], was obtained by

$$PRE = \frac{MSE(\bar y_{R,RSS})}{MSE(\bar y_i)} \times 100, \tag{5.2}$$

where $i = M1,RSS;\ M2,RSS;\ P,RSS;\ B,RSS$.
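Equations (5.1) and (5.2) are simple ratios. The Python sketch below, with hypothetical function names, recomputes the ARB and PRE columns of Table 1 from its Bias and MSE columns.

```python
def arb(bias_i, bias_ref):
    """Absolute relative bias, equation (5.1): |Bias(y_i)| / |Bias(y_R,RSS)|."""
    return abs(bias_i) / abs(bias_ref)

def pre(mse_ref, mse_i):
    """Percent relative efficiency, equation (5.2): 100 * MSE(y_R,RSS) / MSE(y_i)."""
    return 100.0 * mse_ref / mse_i

# Bias and MSE values copied from Table 1 (m = r = 3)
table1 = {
    "R,RSS": (0.0668, 9.0725),
    "M1,RSS": (0.0667, 9.0653),
    "M2,RSS": (0.0562, 8.6695),
    "P,RSS": (0.0556, 8.6428),
    "B,RSS": (0.0433, 8.1567),
}
ref_bias, ref_mse = table1["R,RSS"]
for name, (b, mse) in table1.items():
    print(f"{name}: ARB = {arb(b, ref_bias):.4f}, PRE = {pre(ref_mse, mse):.1f}")
```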
Six ranked set sample sizes were considered, obtained from different set sizes $m$ and numbers of cycles $r$, and the results are displayed in Tables 1 to 6. For each case, the corresponding values of $W_{x(i)}^2$, $W_{y[i]}^2$ and $W_{yx(i)}$ were determined for the sample size $n = m \times r$.
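The six designs can also be evaluated in one loop. The Python sketch below, with illustrative names, recomputes the MSEs of $\bar y_{R,RSS}$ (constant $k = 1$) and $\bar y_{B,RSS}$ (constant $k = \omega$) from the first-order MSE expressions (2.3) and (3.3). It uses the data constants above and the $W$ values reported for Tables 1 to 6. Because those $W$ values are rounded to four decimals, the resulting PRE values agree with the tables only to within about 0.1.

```python
import math

X_BAR, Y_BAR = 72.5454, 27.4909
RHO, CX, CY, MD = 0.2521, 0.1436, 0.3629, 69.0

# (m, r, W2_x(i), W2_y[i], W_yx(i)) as reported for Tables 1 to 6
designs = [
    (3, 3, 0.0012, 0.0064, 0.0028),
    (3, 4, 0.0009, 0.0050, 0.0021),
    (3, 5, 0.0007, 0.0040, 0.0017),
    (4, 4, 0.0008, 0.0044, 0.0019),
    (4, 5, 0.0006, 0.0036, 0.0015),
    (5, 5, 0.0006, 0.0033, 0.0013),
]

def mse(k, theta, w2_x, w2_y, w_yx):
    """First-order MSE of the ratio-type estimator with constant k."""
    return Y_BAR**2 * (theta * (k**2 * CX**2 + CY**2 - 2 * k * RHO * CX * CY)
                       - (k**2 * w2_x + w2_y - 2 * k * w_yx))

omega = X_BAR / (X_BAR + MD * math.sqrt(CX))
pre_b = []
for m, r, w2_x, w2_y, w_yx in designs:
    theta = 1.0 / (m * r)
    mse_r = mse(1.0, theta, w2_x, w2_y, w_yx)    # classical y_R,RSS (k = 1)
    mse_b = mse(omega, theta, w2_x, w2_y, w_yx)  # proposed y_B,RSS (k = omega)
    pre_b.append(100.0 * mse_r / mse_b)
    print(f"n = {m * r:2d}: MSE_R = {mse_r:.4f}, MSE_B = {mse_b:.4f}, PRE = {pre_b[-1]:.1f}")
```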
For sample size $n = 9$, where $m = 3$ and $r = 3$, $W_{x(i)}^2 = 0.0012$, $W_{y[i]}^2 = 0.0064$ and $W_{yx(i)} = 0.0028$, and the corresponding performance of the various estimators is displayed in Table 1.
Table 1: m=3, r=3
Estimator Bias MSE ARB PRE
yR, RSS 0.0668 9.0725 1.0000 100.0
yM1, RSS 0.0667 9.0653 0.9985 100.1
yM2, RSS 0.0562 8.6695 0.8413 104.6
yP, RSS 0.0556 8.6428 0.8323 105.0
yB, RSS 0.0433 8.1567 0.6482 111.2
If $n = 12$, where $m = 3$ and $r = 4$, then $W_{x(i)}^2 = 0.0009$, $W_{y[i]}^2 = 0.0050$ and $W_{yx(i)} = 0.0021$, and the performance of the various estimators is displayed in Table 2.
Table 2: m=3, r=4
Estimator Bias MSE ARB PRE
yR, RSS 0.0501 6.6533 1.0000 100.0
yM1, RSS 0.0450 6.6478 0.8982 100.1
yM2, RSS 0.0418 6.3319 0.8343 105.1
yP, RSS 0.0417 6.3310 0.8323 105.1
yB, RSS 0.0325 5.9664 0.6487 111.5
If $n = 15$, where $m = 3$ and $r = 5$, then $W_{x(i)}^2 = 0.0007$, $W_{y[i]}^2 = 0.0040$ and $W_{yx(i)} = 0.0017$, and the performance of the various estimators is displayed in Table 3.
Table 3: m=3, r=5
Estimator Bias MSE ARB PRE
yR, RSS 0.0412 5.3680 1.0000 100.0
yM1, RSS 0.0411 5.3635 0.9976 100.1
yM2, RSS 0.0346 5.1103 0.8398 105.0
yP, RSS 0.0343 5.1031 0.8325 105.2
yB, RSS 0.0267 4.8055 0.6481 111.7
If $n = 16$, where $m = 4$ and $r = 4$, then $W_{x(i)}^2 = 0.0008$, $W_{y[i]}^2 = 0.0044$ and $W_{yx(i)} = 0.0019$, and the performance of the various estimators is displayed in Table 4.
Table 4: m=4, r=4
Estimator Bias MSE ARB PRE
yR, RSS 0.0431 4.8955 1.0000 100.0
yM1, RSS 0.0429 4.8908 0.9954 100.1
yM2, RSS 0.0367 4.5020 0.8515 108.7
yP, RSS 0.0364 4.5020 0.8445 108.7
yB, RSS 0.0291 4.2936 0.6752 114.1
For $n = 20$, where $m = 4$ and $r = 5$, $W_{x(i)}^2 = 0.0006$, $W_{y[i]}^2 = 0.0036$ and $W_{yx(i)} = 0.0015$, and the performance of the various estimators is displayed in Table 5.
Table 5: m=4, r=5
Estimator Bias MSE ARB PRE
yR, RSS 0.0350 3.8559 1.0000 100.0
yM1, RSS 0.0349 3.8521 0.9971 100.1
yM2, RSS 0.0310 3.6368 0.8857 106.0
yP, RSS 0.0296 3.6292 0.8457 106.3
yB, RSS 0.0234 3.3685 0.6686 114.5
If $n = 25$, where $m = 5$ and $r = 5$, then $W_{x(i)}^2 = 0.0006$, $W_{y[i]}^2 = 0.0033$ and $W_{yx(i)} = 0.0013$, and the performance of the various estimators is displayed in Table 6.
Table 6: m=5, r=5
Estimator Bias MSE ARB PRE
yR, RSS 0.0275 2.8278 1.0000 100.0
yM1, RSS 0.0274 2.8248 0.9963 100.1
yM2, RSS 0.0239 2.6750 0.8691 105.7
yP, RSS 0.0235 2.6487 0.8545 106.8
yB, RSS 0.0190 2.4400 0.6909 115.9
6. Conclusion
The study modified the ratio estimator of [14] and derived the theoretical properties of the modified estimator up to order $O(n^{-1})$. The modified estimator was compared with all the RSS estimators considered by [14], using the classical RSS ratio estimator of [16] as the basis of comparison. Ranked set samples of various sizes were considered to test the performance of the estimators, and for all sizes and all sample combinations the proposed modified estimator had the least bias and the least MSE. Relative to the classical RSS ratio estimator of [16], the gain in efficiency of the proposed modified estimator ranged from 11.2% to 15.9%, whilst that of the estimator of [14] ranged from 5.0% to 8.7%. This study therefore recommends the use of the proposed estimator, since it can provide more efficient and more accurate estimates than its competitors.
Funding: This work received no funding.
Conflict of Interest: The authors declare that they have no conflicts of interest.
Data Availability: The data that was used in this study is available within the paper.
References
[1] Afzal, I. and Masood, N. (2022). An exponentially ratio type estimator under ranked set sampling (RSS) and its efficiency. Journal of Statistics, 26:44-53.
[2] Ahmed, S. and Shabbir, J. (2019). Model based estimation of population total in presence of non-ignorable non-response. PLoS ONE, 14(10):e0222701.
[3] Al-Saleh, M. F. and Samawi, H. M. (2007). A note on inclusion probability in ranked set sampling and some of its variations. Test, 16:198-209.
[4] Brar, S. S. and Malik, S. (2014). Generalized ratio type estimator of population mean under ranked set sampling. International Journal of Statistics and Reliability Engineering, 1(2):178-192.
[5] Cochran, W. (1940). The estimation of the yields of cereal experiments by sampling for the ratio of grain to total produce. The Journal of Agricultural Science, 30(2):262-275.
[6] Cochran, W. G. (1977). Sampling Techniques. John Wiley & Sons, New York, NY, USA.
[7] Dell, T. and Clutter, J. (1972). Ranked set sampling theory with order statistics background. Biometrics, 28:545-555.
[8] Jeelani, M. I., Bouza, C. N., and Sharma, M. (2017). Modified ratio estimator under rank set sampling. Investigación Operacional, 38(1):103-106.
[9] Khan, Z. and Ismail, M. (2019). Modified ratio estimators of population mean based on ranked set sampling. Pakistan Journal of Statistics and Operation Research, 2:445-449.
[10] Stokes, S. L. (1977). Ranked set sampling with concomitant variables. Communications in Statistics-Theory and Methods, 6(12):1207-1211.
[11] Maqbool, S. and Javaid, S. (2014). Modified ratio estimator using quartiles of auxiliary variable in ranked set sampling. Int. J. Agricult. Stat. Sci, 10(2):333-334.
[12] McIntyre, G. (1952). A method for unbiased selective sampling, using ranked sets. Australian Journal of Agricultural Research, 3(4):385-390.
[13] Mehta, N. and Mandowara, V. (2016). A modified ratio-cum-product estimator of finite population mean using ranked set sampling. Communications in Statistics-Theory and Methods, 45(2):267-276.
[14] Riyaz, S., Rather, K. U. I., Maqbool, S., and Jan, T. (2023). Ratio estimator of population mean using a new linear combination under ranked set sampling. Reliability: Theory & Applications, 18(2 (73)):479-485.
[15] Saini, M. and Kumar, A. (2017). Ratio estimators for the finite population mean under simple random sampling and rank set sampling. International Journal of System Assurance Engineering and Management, 8:488-492.
[16] Samawi, H. M. and Muttlak, H. A. (1996). Estimation of ratio using rank set sampling. Biometrical Journal, 38(6):753-764.
[17] Takahasi, K. and Wakimoto, K. (1968). On unbiased estimates of the population mean based on the sample stratified by means of ordering. Annals of the Institute of Statistical Mathematics, 20(1):1-31.
[18] Wolfe, D. A. (2010). Ranked set sampling. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4):460-466.