Unbiased Exponential Type Estimators of Population Mean Using Auxiliary Variable as an Attribute in Double Sampling
Sajad Hussain*
Department of Mathematics, School of Engineering (SOE), Presidency University, Bengaluru, Karnataka, India-560064
Abstract
In this paper, unbiased ratio-cum-product exponential type estimators for estimating the population mean have been introduced, specifically within the framework of a double sampling plan. The large sample properties of these estimators are investigated by deriving their bias and mean square error (MSE) expressions. The findings indicate that, under optimal conditions, the proposed estimators are not only unbiased but also more efficient than traditional methods, including the sample mean and the double sampling ratio and product type estimators developed by Naik and Gupta [11] and Singh et al. [17]. To further substantiate the theoretical results, we conducted a numerical study, which demonstrates the practical effectiveness of the proposed estimators in improving estimation accuracy.
Keywords: Exponential Estimator, Auxiliary Attribute, Proposed Estimator, Population, Optimum Value.
1. Introduction
In sampling theory, the precision of an estimate can be enhanced by incorporating information from an auxiliary variable, particularly when this variable is highly correlated with the study variable. The method of utilizing this auxiliary information depends on its form. For quantitative auxiliary information, estimators like Cochran's ratio estimator [4], Robson's product type estimator [12], Bahl and Tuteja's exponential type estimators [2], and other estimators developed by AL-Omari and Bouza [1], Shalabh and Tsai [15], Singh and Vishwakarma [16], Mehta et al. [10], and Hussain et al. [5, 6, 7] are commonly employed.
However, in many practical scenarios the auxiliary information is qualitative in nature, meaning that the auxiliary variable is an attribute correlated with the study variable. For example, Jhajj et al. [8] and Shabbir and Gupta [13] illustrated cases such as (a) the height of an individual depending on their sex, and (b) crop yield depending on the variety of the crop. In such cases, traditional estimators that rely on quantitative auxiliary information are not suitable, because the relationship between the study variable and the auxiliary attribute is measured by the point biserial correlation. To address this, researchers such as Naik and Gupta [11], Jhajj et al. [8], Singh et al. [17], Shabbir and Gupta [13, 14], and Abd-Elfattah et al. [3] have developed ratio and product type estimators for the population mean that use prior knowledge of the parameters of the auxiliary attribute, thereby improving estimation accuracy when the auxiliary information is qualitative. These estimators typically assume that the population proportion P is known beforehand. In practice, however, P is often not known in advance, and the method of double sampling is then employed.
In response to this challenge, Naik and Gupta [11] and Singh et al. [17] introduced double sampling ratio and product type estimators. However, these estimators are biased, meaning they can potentially under or overestimate the true population mean, which introduces inaccuracies in the estimation process.
Taking into account the above discussion, we propose unbiased ratio-cum-product exponential type estimators of the population mean, denoted as $T_{rp1}$ and $T_{rp2}$, utilizing auxiliary information in the form of an attribute. Theoretical expressions for the Bias and Mean Square Error (MSE) up to the first order of approximation are derived for the proposed estimators. These results are then compared, both theoretically and empirically, with some existing double sampling ratio and product type estimators.
2. Methodology and Existing Estimators
Suppose a sample of size n is selected from a population of N units by Simple Random Sampling Without Replacement (SRSWOR). Each unit in the population is associated with two variables: $Y_i$, the value of the study variable for the i-th unit, and $\phi_i$, the value of the auxiliary attribute for the same unit (i = 1, 2, ..., N). The auxiliary attribute $\phi$ is assumed to be dichotomous, taking only the values 0 and 1: a value of 1 indicates the presence of the attribute and a value of 0 its absence. This complete dichotomy classifies the population units into two distinct categories according to whether $\phi$ is present or absent, and it is this classification that allows the auxiliary information to be used to improve the precision of estimates. For instance, if $\phi$ is highly correlated with the study variable Y, this information can be used to develop more efficient estimators of the population mean or other parameters of interest. Under SRSWOR every unit in the population has an equal chance of being selected and no unit can be selected more than once, which maintains the randomness and representativeness of the sample. This procedure is fundamental in sampling theory, particularly when auxiliary information is available to refine the estimates.
Let $A = \sum_{i=1}^{N} \phi_i$ and $a = \sum_{i=1}^{n} \phi_i$ denote the number of units possessing the attribute $\phi$ in the population and in the sample, respectively. Then $P = A/N$ and $p = a/n$ are the proportions of units possessing the attribute $\phi$ in the population and in the sample. When the value of P is unknown, the method of double sampling can be applied. This involves allocating a portion of the budget to gathering information on the auxiliary variable. First, a large preliminary sample of size $n'$ is collected, with $p_1$ denoting the proportion of units possessing the attribute $\phi$ in this sample. Then a smaller second-phase sample of size $n$ is drawn, nested within the first-phase sample ($n < n'$). In this second sample, $p$ is the proportion of units with the attribute $\phi$ and $\bar{y}$ is the mean of the study variable Y. The formulas used to compute the various measures in this case are presented below.
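Population quantities:
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i, \qquad S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2, \qquad S_\phi^2 = \frac{1}{N-1}\sum_{i=1}^{N}(\phi_i - P)^2, \qquad S_{y\phi} = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i\phi_i - NP\bar{Y}\right),$$
which are, respectively, the mean of the study variable, the mean square of the study variable, the mean square of the auxiliary attribute, and the covariance between the study variable and the auxiliary attribute.
Sample quantities:
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2, \qquad s_\phi^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\phi_i - p)^2, \qquad s_{y\phi} = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i\phi_i - np\bar{y}\right),$$
with the same interpretations at the sample level. Further,
$$C_y = \frac{S_y}{\bar{Y}} \quad\text{and}\quad C_p = \frac{S_\phi}{P}$$
are the coefficients of variation of Y and $\phi$ respectively,
$$\rho_{pb} = \frac{S_{y\phi}}{S_y S_\phi}$$
is the point biserial correlation between Y and $\phi$,
$$\hat{\beta}_\phi = \frac{s_{y\phi}}{s_\phi^2}$$
is the sample regression coefficient, and
$$\gamma = \frac{1-f}{n}, \qquad \gamma_1 = \left(\frac{1}{n'} - \frac{1}{N}\right), \qquad \gamma_2 = \left(\frac{1}{n} - \frac{1}{n'}\right), \qquad \gamma_3 = \gamma + \gamma_1,$$
where $f = n/N$ is the sampling fraction.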
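As a rough illustration of the quantities defined above, the following Python sketch computes $\gamma$, $\gamma_1$ and $\gamma_2$ for the sample sizes reported for population P1 in Table 1 (N = 89, n' = 45, n = 23), and the summaries $C_y$, $C_p$ and $\rho_{pb}$ for a small made-up population. The function names and the toy data are hypothetical and serve only to make the definitions concrete.

```python
import math

def sampling_constants(N, n_first, n):
    """Return (gamma, gamma1, gamma2) for population size N,
    first-phase size n_first (n') and second-phase size n."""
    f = n / N                        # sampling fraction
    gamma = (1 - f) / n              # equals 1/n - 1/N
    gamma1 = 1 / n_first - 1 / N
    gamma2 = 1 / n - 1 / n_first
    return gamma, gamma1, gamma2

def population_summary(Y, phi):
    """Return (C_y, C_p, rho_pb) for study values Y and a 0/1 attribute phi."""
    N = len(Y)
    Y_bar = sum(Y) / N
    P = sum(phi) / N
    S_y2 = sum((y - Y_bar) ** 2 for y in Y) / (N - 1)
    S_phi2 = sum((a - P) ** 2 for a in phi) / (N - 1)
    S_yphi = (sum(y * a for y, a in zip(Y, phi)) - N * P * Y_bar) / (N - 1)
    C_y = math.sqrt(S_y2) / Y_bar
    C_p = math.sqrt(S_phi2) / P
    rho_pb = S_yphi / math.sqrt(S_y2 * S_phi2)
    return C_y, C_p, rho_pb

# Sample sizes of population P1 as listed in Table 1.
gamma, gamma1, gamma2 = sampling_constants(89, 45, 23)

# Toy population (hypothetical data, not from the paper).
toy_Y = [2, 4, 6, 8, 10, 12]
toy_phi = [0, 0, 0, 1, 1, 1]
C_y, C_p, rho_pb = population_summary(toy_Y, toy_phi)
```

With these definitions $\gamma_2 = \gamma - \gamma_1$, an identity that is used repeatedly when the bias and MSE expressions are simplified.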
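To obtain the Bias and MSE expressions when the auxiliary variable is an attribute in a single sampling plan, consider
$$e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}} \quad\text{and}\quad e_\phi = \frac{p - P}{P}.$$
The expected values of the quantities $e_0$, $e_\phi$, $e_0^2$, $e_\phi^2$ and $e_0 e_\phi$ are obtained as
$$E(e_0) = E(e_\phi) = 0, \quad E(e_0^2) = \gamma C_y^2, \quad E(e_\phi^2) = \gamma C_p^2, \quad E(e_0 e_\phi) = \gamma C_{yp},$$
where $C_{yp} = \rho_{pb} C_y C_p$.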
To obtain the Bias and MSE expressions when the auxiliary variable is an attribute in a double sampling plan, consider
$$e_0 = \bar{Y}^{-1}(\bar{y} - \bar{Y}), \qquad e_\phi = P^{-1}(p - P), \qquad e'_\phi = P^{-1}(p_1 - P).$$
The following expected values are then obtained:
$$E(e_0) = E(e_\phi) = E(e'_\phi) = 0, \quad E(e_0^2) = \gamma C_y^2, \quad E(e_\phi^2) = \gamma C_p^2, \quad E(e_\phi'^{2}) = \gamma_1 C_p^2,$$
$$E(e_0 e_\phi) = \gamma C_{yp}, \quad E(e_0 e'_\phi) = \gamma_1 C_{yp}, \quad E(e_\phi e'_\phi) = \gamma_1 C_p^2.$$
When auxiliary information is unavailable, the sample mean serves as an estimator of the population mean. The sample mean, denoted by $\bar{y}$, is computed by averaging the observations in the sample:
$$t_m = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$
The estimator $t_m$ is unbiased and its variance is
$$V(t_m) = \gamma \bar{Y}^2 C_y^2. \tag{1}$$
Naik and Gupta [11] proposed double sampling ratio and product type estimators for the population mean for the case where the auxiliary variable is an attribute:
$$T_{ngr} = \bar{y}\,\frac{p_1}{p}, \qquad T_{ngp} = \bar{y}\,\frac{p}{p_1}.$$
The Bias and MSE of the estimators $T_{ngr}$ and $T_{ngp}$ are
$$\text{Bias}(T_{ngr}) = \bar{Y}(\gamma - \gamma_1)(C_p^2 - C_{yp}),$$
$$\text{MSE}(T_{ngr}) = \bar{Y}^2\left(\gamma C_y^2 + \gamma_2 C_p^2 - 2\gamma_2 C_{yp}\right), \tag{2}$$
$$\text{Bias}(T_{ngp}) = \bar{Y}(\gamma - \gamma_1) C_{yp},$$
$$\text{MSE}(T_{ngp}) = \bar{Y}^2\left(\gamma C_y^2 + \gamma_2 C_p^2 + 2\gamma_2 C_{yp}\right). \tag{3}$$
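To make expressions (1) to (3) concrete, the sketch below evaluates them for population P1 using the values listed in Table 1. It assumes that the mean reported in Table 1 is used in place of $\bar{Y}$ and that $\gamma = 1/n - 1/N$ and $\gamma_2 = 1/n - 1/n'$; the variable names are illustrative.

```python
# Values for population P1 as listed in Table 1.
N, n_first, n = 89, 45, 23
Y_bar, C_y, C_p, C_yp = 2.911, 0.542, 3.782, 1.201

gamma = 1 / n - 1 / N            # second-phase sample versus population
gamma2 = 1 / n - 1 / n_first     # second-phase versus first-phase sample

# Variance of the sample mean, equation (1).
var_tm = gamma * Y_bar**2 * C_y**2

# Naik and Gupta ratio estimator, bias and equation (2).
bias_ngr = Y_bar * gamma2 * (C_p**2 - C_yp)
mse_ngr = Y_bar**2 * (gamma * C_y**2 + gamma2 * C_p**2 - 2 * gamma2 * C_yp)

# Naik and Gupta product estimator, bias and equation (3).
bias_ngp = Y_bar * gamma2 * C_yp
mse_ngp = Y_bar**2 * (gamma * C_y**2 + gamma2 * C_p**2 + 2 * gamma2 * C_yp)

print(round(var_tm, 3), round(mse_ngr, 3), round(bias_ngr, 3))
```

Rounded to three decimals, the first three quantities agree with the entries for $t_m$ and $T_{ngr}$ in Table 2. Because the correlation is positive for P1, the product estimator has a larger MSE than the ratio estimator there.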
Later, Singh et al. [17] introduced double sampling exponential ratio and product estimators for the population mean, for situations where the auxiliary information is given as an attribute:
$$T_{sgr} = \bar{y}\exp\left(\frac{p_1 - p}{p_1 + p}\right), \qquad T_{sgp} = \bar{y}\exp\left(\frac{p - p_1}{p + p_1}\right).$$
The Bias and MSE of the estimators $T_{sgr}$ and $T_{sgp}$ are
$$\text{Bias}(T_{sgr}) = \frac{1}{2}\gamma_2\bar{Y}\left(\frac{1}{2}C_p^2 - C_{yp}\right), \qquad \text{MSE}(T_{sgr}) = \bar{Y}^2\left(\gamma C_y^2 + \frac{1}{4}\gamma_2 C_p^2 - \gamma_2 C_{yp}\right), \tag{4}$$
$$\text{Bias}(T_{sgp}) = \frac{1}{2}\gamma_2\bar{Y}\left(\frac{1}{2}C_p^2 + C_{yp}\right), \qquad \text{MSE}(T_{sgp}) = \bar{Y}^2\left(\gamma C_y^2 + \frac{1}{4}\gamma_2 C_p^2 + \gamma_2 C_{yp}\right). \tag{5}$$
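The expressions (4) and (5) can be checked numerically in the same way. The sketch below evaluates the exponential ratio estimator of Singh et al. [17] for population P1 with the Table 1 values, again treating the tabulated mean as $\bar{Y}$; the helper name `singh_moments` is hypothetical.

```python
def singh_moments(Y_bar, C_y, C_p, C_yp, N, n_first, n, kind="ratio"):
    """Bias and MSE of the double sampling exponential ratio or product
    estimator of Singh et al., to the first order of approximation."""
    gamma = 1 / n - 1 / N
    gamma2 = 1 / n - 1 / n_first
    sign = -1.0 if kind == "ratio" else 1.0   # ratio: minus, product: plus
    bias = 0.5 * gamma2 * Y_bar * (0.5 * C_p**2 + sign * C_yp)
    mse = Y_bar**2 * (gamma * C_y**2 + 0.25 * gamma2 * C_p**2
                      + sign * gamma2 * C_yp)
    return bias, mse

# Population P1 (Table 1).
bias_sgr, mse_sgr = singh_moments(2.911, 0.542, 3.782, 1.201, 89, 45, 23,
                                  kind="ratio")
print(round(mse_sgr, 3), round(bias_sgr, 3))
```

The rounded MSE and bias match the $T_{sgr}$ row of Table 2.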
3. Proposed Estimators in Double Sampling
The proposed proportion based unbiased exponential ratio-cum-product estimators in a double sampling plan are
$$T_{rp1} = \bar{y}\left[w_1\exp\left(\frac{p_1 - p}{l\,p_1}\right) + (1 - w_1)\exp\left(\frac{p - p_1}{l\,p_1}\right)\right], \tag{6}$$
$$T_{rp2} = \bar{y}\left[w_2\exp\left(\frac{p_1 - p}{g\,p}\right) + (1 - w_2)\exp\left(\frac{p - p_1}{g\,p}\right)\right], \tag{7}$$
where $l\,(\neq 0)$, $g\,(\neq 0)$, $w_1$ and $w_2$ are constants. The values of $l$ and $g$ are chosen such that the proposed estimators are unbiased, and the values of $w_1$ and $w_2$ are chosen such that $\text{MSE}(T_{rpi})$ (i = 1, 2) is minimum.
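A minimal sketch of how the two estimators could be evaluated from sample summaries is given below. It reads (6) as using $l\,p_1$ and (7) as using $g\,p$ in the denominators of the exponents. The function names and all numerical inputs are hypothetical and chosen only for illustration.

```python
import math

def t_rp1(y_bar, p, p1, w1, l):
    """Estimator (6): exponents use l * p1 in the denominator."""
    u = (p1 - p) / (l * p1)
    return y_bar * (w1 * math.exp(u) + (1 - w1) * math.exp(-u))

def t_rp2(y_bar, p, p1, w2, g):
    """Estimator (7): exponents use g * p in the denominator."""
    u = (p1 - p) / (g * p)
    return y_bar * (w2 * math.exp(u) + (1 - w2) * math.exp(-u))

# Hypothetical sample summaries: second-phase mean and proportion,
# first-phase proportion, and arbitrary constants w, l, g.
y_bar, p, p1 = 3.1, 0.10, 0.08
est1 = t_rp1(y_bar, p, p1, w1=0.854, l=2.0)
est2 = t_rp2(y_bar, p, p1, w2=0.7, g=-1.5)

# When both phases give the same proportion, the exponents vanish and
# each estimator reduces to the sample mean.
same1 = t_rp1(y_bar, 0.08, 0.08, w1=0.3, l=2.0)
same2 = t_rp2(y_bar, 0.08, 0.08, w2=0.4, g=1.0)
```

The weights $w_1$ and $w_2$ mix a ratio-type exponential factor with a product-type one, which is why the estimators are called ratio-cum-product.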
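Theorem 4.1: The Bias and MSE expressions for the double sampling ratio-cum-product type estimators $T_{rp1}$ and $T_{rp2}$, to the first order of approximation, are
$$\text{Bias}(T_{rp1}) = \frac{\gamma_2\bar{Y}}{l}\left[C_{yp} - 2w_1 C_{yp} + \frac{C_p^2}{2l}\right], \tag{8}$$
$$\text{Bias}(T_{rp2}) = \frac{\gamma_2\bar{Y}}{g}\left[(2w_2 - 1)(C_p^2 - C_{yp}) + \frac{C_p^2}{2g}\right], \tag{9}$$
and
$$\text{MSE}(T_{rp1}) = \bar{Y}^2\left[\gamma C_y^2 + \frac{(2w_1 - 1)^2}{l^2}\gamma_2 C_p^2 - \frac{2(2w_1 - 1)}{l}\gamma_2 C_{yp}\right], \tag{10}$$
$$\text{MSE}(T_{rp2}) = \bar{Y}^2\left[\gamma C_y^2 + \frac{(2w_2 - 1)^2}{g^2}\gamma_2 C_p^2 - \frac{2(2w_2 - 1)}{g}\gamma_2 C_{yp}\right]. \tag{11}$$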
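Proof: Writing the estimators $T_{rp1}$ and $T_{rp2}$, i.e. (6) and (7), in terms of the $e$'s, the expressions obtained are
$$T_{rp1} = \bar{Y}(1 + e_0)\left[w_1\exp\left\{\frac{P(1 + e'_\phi) - P(1 + e_\phi)}{lP(1 + e'_\phi)}\right\} + (1 - w_1)\exp\left\{\frac{P(1 + e_\phi) - P(1 + e'_\phi)}{lP(1 + e'_\phi)}\right\}\right]$$
$$\Rightarrow\ T_{rp1} = \bar{Y}(1 + e_0)\left[w_1\exp\left\{\frac{e'_\phi - e_\phi - e_\phi'^{2} + e_\phi e'_\phi + \cdots}{l}\right\} + (1 - w_1)\exp\left\{\frac{e_\phi - e'_\phi + e_\phi'^{2} - e_\phi e'_\phi + \cdots}{l}\right\}\right], \tag{12}$$
$$T_{rp2} = \bar{Y}(1 + e_0)\left[w_2\exp\left\{\frac{P(1 + e'_\phi) - P(1 + e_\phi)}{gP(1 + e_\phi)}\right\} + (1 - w_2)\exp\left\{\frac{P(1 + e_\phi) - P(1 + e'_\phi)}{gP(1 + e_\phi)}\right\}\right]$$
$$\Rightarrow\ T_{rp2} = \bar{Y}(1 + e_0)\left[w_2\exp\left\{\frac{e'_\phi - e_\phi + e_\phi^2 - e_\phi e'_\phi + \cdots}{g}\right\} + (1 - w_2)\exp\left\{\frac{e_\phi - e'_\phi - e_\phi^2 + e_\phi e'_\phi + \cdots}{g}\right\}\right]. \tag{13}$$
Solving (12) and (13) and retaining the terms up to the second degree only, we have
$$T_{rp1} = \bar{Y}(1 + e_0)\left[1 + \frac{2w_1\left(e'_\phi - e_\phi - e_\phi'^{2} + e_\phi e'_\phi\right)}{l} + \frac{e_\phi^2 + e_\phi'^{2} - 2e_\phi e'_\phi}{2l^2} + \frac{e_\phi - e'_\phi + e_\phi'^{2} - e_\phi e'_\phi}{l}\right], \tag{14}$$
$$T_{rp2} = \bar{Y}(1 + e_0)\left[1 + \frac{2w_2\left(e'_\phi + e_\phi^2 - e_\phi - e_\phi e'_\phi\right)}{g} + \frac{e_\phi^2 + e_\phi'^{2} - 2e_\phi e'_\phi}{2g^2} + \frac{e_\phi - e'_\phi - e_\phi^2 + e_\phi e'_\phi}{g}\right]. \tag{15}$$
Further solving (14) and (15) and excluding the terms of degree higher than two, we have
$$T_{rp1} - \bar{Y} = \bar{Y}\left[e_0 + \frac{2w_1\left(e'_\phi - e_\phi'^{2} - e_\phi + e_\phi e'_\phi + e'_\phi e_0 - e_0 e_\phi\right)}{l} + \frac{e_\phi^2 + e_\phi'^{2} - 2e_\phi e'_\phi}{2l^2} + \frac{e_\phi - e'_\phi + e_\phi'^{2} - e_\phi e'_\phi + e_0 e_\phi - e_0 e'_\phi}{l}\right], \tag{16}$$
$$T_{rp2} - \bar{Y} = \bar{Y}\left[e_0 + \frac{2w_2\left(e'_\phi + e_\phi^2 - e_\phi - e_\phi e'_\phi + e'_\phi e_0 - e_0 e_\phi\right)}{g} + \frac{e_\phi^2 + e_\phi'^{2} - 2e_\phi e'_\phi}{2g^2} + \frac{e_\phi - e'_\phi - e_\phi^2 + e_\phi e'_\phi + e_0 e_\phi - e_0 e'_\phi}{g}\right]. \tag{17}$$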
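Taking expectation on both sides of equations (16) and (17), the Bias of the estimators $T_{rp1}$ and $T_{rp2}$ is obtained as
$$\text{Bias}(T_{rp1}) = \frac{\gamma_2\bar{Y}}{l}\left[C_{yp} - 2w_1 C_{yp} + \frac{C_p^2}{2l}\right], \tag{18}$$
$$\text{Bias}(T_{rp2}) = \frac{\gamma_2\bar{Y}}{g}\left[(2w_2 - 1)(C_p^2 - C_{yp}) + \frac{C_p^2}{2g}\right]. \tag{19}$$
Squaring equations (16) and (17) on both sides and then taking expectation, the mean square error of the proposed estimators is obtained as
$$\text{MSE}(T_{rp1}) = \bar{Y}^2\left[\gamma C_y^2 + \frac{(2w_1 - 1)^2}{l^2}\gamma_2 C_p^2 - \frac{2(2w_1 - 1)}{l}\gamma_2 C_{yp}\right], \tag{20}$$
$$\text{MSE}(T_{rp2}) = \bar{Y}^2\left[\gamma C_y^2 + \frac{(2w_2 - 1)^2}{g^2}\gamma_2 C_p^2 - \frac{2(2w_2 - 1)}{g}\gamma_2 C_{yp}\right]. \tag{21}$$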
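For readers who want the expectation step spelled out, the following worked lines apply the double sampling moments of Section 2 to the first-order expansion of $T_{rp1}$; the case of $T_{rp2}$ is analogous. This is a sketch of the intermediate algebra and relies on $\gamma_2 = \gamma - \gamma_1$.

```latex
\begin{align*}
E\left(e'_\phi - e_\phi - e_\phi'^{2} + e_\phi e'_\phi + e_0 e'_\phi - e_0 e_\phi\right)
  &= -\gamma_1 C_p^2 + \gamma_1 C_p^2 + \gamma_1 C_{yp} - \gamma C_{yp}
   = -\gamma_2 C_{yp}, \\
E\left(e_\phi - e'_\phi\right)^2
  &= \gamma C_p^2 + \gamma_1 C_p^2 - 2\gamma_1 C_p^2
   = \gamma_2 C_p^2, \\
\text{Bias}(T_{rp1})
  &= \bar{Y}\left[-\frac{(2w_1 - 1)}{l}\gamma_2 C_{yp} + \frac{\gamma_2 C_p^2}{2l^2}\right]
   = \frac{\gamma_2\bar{Y}}{l}\left[C_{yp} - 2w_1 C_{yp} + \frac{C_p^2}{2l}\right], \\
\text{MSE}(T_{rp1})
  &= \bar{Y}^2 E\left[e_0 + \frac{(2w_1 - 1)}{l}\left(e'_\phi - e_\phi\right)\right]^2
   = \bar{Y}^2\left[\gamma C_y^2 + \frac{(2w_1 - 1)^2}{l^2}\gamma_2 C_p^2
     - \frac{2(2w_1 - 1)}{l}\gamma_2 C_{yp}\right].
\end{align*}
```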
3.0.1 Unbiased Condition
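The double sampling exponential estimators $T_{rp1}$ and $T_{rp2}$ are unbiased, to the first order of approximation, if
$$l = \frac{C_p}{2(2w_1 - 1)\rho_{pb}C_y} \quad\text{and}\quad g = \frac{C_p}{2(2w_2 - 1)(\rho_{pb}C_y - C_p)}$$
respectively.
Using the values of $l$ and $g$ in equations (20) and (21) respectively, the expressions obtained are as follows:
$$\text{MSE}(T_{rp1}) = \bar{Y}^2\left[\gamma C_y^2 + 4\gamma_2(1 - 2w_1)^4\rho_{pb}^2 C_y^2 - 4\gamma_2(1 - 2w_1)^2\rho_{pb}^2 C_y^2\right], \tag{22}$$
$$\text{MSE}(T_{rp2}) = \bar{Y}^2\left[\gamma C_y^2 + 4\gamma_2(1 - 2w_2)^4(\rho_{pb}C_y - C_p)^2 - 4\gamma_2(1 - 2w_2)^2(\rho_{pb}C_y - C_p)\rho_{pb}C_y\right]. \tag{23}$$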
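The unbiasedness conditions can be checked numerically. The sketch below plugs $l = C_p / \{2(2w_1 - 1)\rho_{pb}C_y\}$ and $g = C_p / \{2(2w_2 - 1)(\rho_{pb}C_y - C_p)\}$ into the first-order bias expressions and confirms that both biases vanish. It uses the Table 1 values for population P1, with $C_{yp}$ computed as $\rho_{pb}C_yC_p$. The weight w = 0.8 is an arbitrary illustrative choice, not a value from the paper.

```python
# Population P1 (Table 1); C_yp is computed as rho_pb * C_y * C_p.
n_first, n = 45, 23
Y_bar, rho_pb, C_y, C_p = 2.911, 0.586, 0.542, 3.782
C_yp = rho_pb * C_y * C_p
gamma2 = 1 / n - 1 / n_first

def bias_rp1(w1, l):
    """First-order bias of T_rp1."""
    return (gamma2 * Y_bar / l) * (C_yp - 2 * w1 * C_yp + C_p**2 / (2 * l))

def bias_rp2(w2, g):
    """First-order bias of T_rp2."""
    return (gamma2 * Y_bar / g) * ((2 * w2 - 1) * (C_p**2 - C_yp)
                                   + C_p**2 / (2 * g))

def l_unbiased(w1):
    return C_p / (2 * (2 * w1 - 1) * rho_pb * C_y)

def g_unbiased(w2):
    return C_p / (2 * (2 * w2 - 1) * (rho_pb * C_y - C_p))

w = 0.8                                  # arbitrary weight, must differ from 0.5
b1 = bias_rp1(w, l_unbiased(w))          # vanishes up to rounding error
b2 = bias_rp2(w, g_unbiased(w))          # vanishes up to rounding error
b1_other = bias_rp1(w, 1.0)              # a different l leaves a non-zero bias
```

Both conditions require $w \neq 1/2$, since the denominators of $l$ and $g$ contain the factor $(2w - 1)$.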
3.0.2 Optimum Value of $w_1$ and $w_2$
Differentiating equations (22) and (23) with respect to $w_1$ and $w_2$ respectively and equating the derivatives to zero, the optimal values are obtained as
$$w_{1(opt)} = \frac{1}{2} \pm \frac{1}{2\sqrt{2}} \approx 0.146,\ 0.854, \tag{24}$$
$$w_{2(opt)} = \frac{1}{2} \pm \frac{1}{2}\sqrt{\frac{\rho_{pb}C_y}{2(\rho_{pb}C_y - C_p)}}. \tag{25}$$
Substituting the values of $w_{1(opt)}$ and $w_{2(opt)}$ from equations (24) and (25) in equations (22) and (23) respectively, the minimum value of the MSE up to $O(n^{-1})$ for the estimators $T_{rpi}$ (i = 1, 2) is obtained as
$$\text{MSE}_{min}(T_{rpi}) = \gamma\bar{Y}^2 C_y^2(1 - \rho_{pb}^2) + \gamma_1\bar{Y}^2\rho_{pb}^2 C_y^2; \quad i = 1, 2. \tag{26}$$
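The optimum in (24) can be sanity-checked numerically. The sketch below evaluates the MSE in (22) on a grid of $w_1$ values for population P1 and compares the grid minimiser with the closed form $1/2 \pm 1/(2\sqrt{2})$ and with the minimum MSE in (26). The grid search is only an illustrative check and is not part of the method. The Table 1 mean is used in place of $\bar{Y}$.

```python
import math

# Population P1 (Table 1).
N, n_first, n = 89, 45, 23
Y_bar, rho_pb, C_y = 2.911, 0.586, 0.542
gamma = 1 / n - 1 / N
gamma1 = 1 / n_first - 1 / N
gamma2 = 1 / n - 1 / n_first

def mse_rp1(w1):
    """MSE of T_rp1 after imposing the unbiasedness condition, equation (22)."""
    t = (1 - 2 * w1) ** 2
    return Y_bar**2 * (gamma * C_y**2
                       + 4 * gamma2 * t**2 * rho_pb**2 * C_y**2
                       - 4 * gamma2 * t * rho_pb**2 * C_y**2)

# Closed-form optimum weights from (24).
w_opt = (0.5 - 1 / (2 * math.sqrt(2)), 0.5 + 1 / (2 * math.sqrt(2)))

# Crude grid search over w1 in (0, 1) as an independent check.
grid = [i / 1000 for i in range(1, 1000)]
w_grid = min(grid, key=mse_rp1)

# Minimum MSE from equation (26).
mse_min = (gamma * Y_bar**2 * C_y**2 * (1 - rho_pb**2)
           + gamma1 * Y_bar**2 * rho_pb**2 * C_y**2)

print(round(w_opt[0], 3), round(w_opt[1], 3), round(mse_min, 3))
```

The two optimum weights are symmetric about 1/2 because (22) depends on $w_1$ only through $(1 - 2w_1)^2$. The resulting minimum MSE rounds to the value reported for the proposed estimators in Table 2.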
4. Efficiency Comparison
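To compare the MSE of the proposed double sampling estimators with that of existing estimators, namely the mean per unit estimator ($t_m$), the double sampling ratio and product estimators ($T_{ngr}$ and $T_{ngp}$) of Naik and Gupta [11], and the double sampling exponential ratio and product estimators ($T_{sgr}$ and $T_{sgp}$) of Singh et al. [17], we first write their Bias and MSE expressions in double sampling up to the first order of approximation:
$$\text{Bias}(t_m) = 0, \qquad \text{MSE}(t_m) = \gamma\bar{Y}^2 C_y^2. \tag{27}$$
$$\text{Bias}(T_{ngr}) = \bar{Y}(\gamma - \gamma_1)(C_p^2 - C_{yp}), \qquad \text{MSE}(T_{ngr}) = \bar{Y}^2\left(\gamma C_y^2 + \gamma_2 C_p^2 - 2\gamma_2 C_{yp}\right). \tag{28}$$
$$\text{Bias}(T_{ngp}) = \bar{Y}(\gamma - \gamma_1)C_{yp}, \qquad \text{MSE}(T_{ngp}) = \bar{Y}^2\left(\gamma C_y^2 + \gamma_2 C_p^2 + 2\gamma_2 C_{yp}\right). \tag{29}$$
$$\text{Bias}(T_{sgr}) = \frac{1}{2}\gamma_2\bar{Y}\left(\frac{1}{2}C_p^2 - C_{yp}\right), \qquad \text{MSE}(T_{sgr}) = \bar{Y}^2\left(\gamma C_y^2 + \frac{1}{4}\gamma_2 C_p^2 - \gamma_2 C_{yp}\right). \tag{30}$$
$$\text{Bias}(T_{sgp}) = \frac{1}{2}\gamma_2\bar{Y}\left(\frac{1}{2}C_p^2 + C_{yp}\right), \qquad \text{MSE}(T_{sgp}) = \bar{Y}^2\left(\gamma C_y^2 + \frac{1}{4}\gamma_2 C_p^2 + \gamma_2 C_{yp}\right). \tag{31}$$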
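The efficiency of the proposed double sampling ratio-cum-product exponential estimators $T_{rp1}$ and $T_{rp2}$ is compared with that of the existing estimators as follows.
From equations (26), (27), (28), (29), (30) and (31), we get the following conditions under which $T_{rp1}$ and $T_{rp2}$ are more efficient than the estimators $t_m$, $T_{ngr}$, $T_{ngp}$, $T_{sgr}$ and $T_{sgp}$:
$$\text{MSE}_{min}(T_{rpi}) < V(t_m) \ \Rightarrow\ \gamma\bar{Y}^2 C_y^2(1 - \rho_{pb}^2) + \gamma_1\bar{Y}^2\rho_{pb}^2 C_y^2 < \gamma\bar{Y}^2 C_y^2, \ \text{if}$$
$$\gamma_2\rho_{pb}^2\bar{Y}^2 C_y^2 > 0. \tag{32}$$
$$\text{MSE}_{min}(T_{rpi}) < \text{MSE}(T_{ngr}) \ \Rightarrow\ \gamma\bar{Y}^2 C_y^2(1 - \rho_{pb}^2) + \gamma_1\bar{Y}^2\rho_{pb}^2 C_y^2 < \bar{Y}^2\left(\gamma C_y^2 + \gamma_2 C_p^2 - 2\gamma_2 C_{yp}\right), \ \text{if}$$
$$\gamma_2(C_p - \rho_{pb}C_y)^2 > 0. \tag{33}$$
$$\text{MSE}_{min}(T_{rpi}) < \text{MSE}(T_{ngp}) \ \Rightarrow\ \gamma\bar{Y}^2 C_y^2(1 - \rho_{pb}^2) + \gamma_1\bar{Y}^2\rho_{pb}^2 C_y^2 < \bar{Y}^2\left(\gamma C_y^2 + \gamma_2 C_p^2 + 2\gamma_2 C_{yp}\right), \ \text{if}$$
$$\gamma_2(C_p + \rho_{pb}C_y)^2 > 0. \tag{34}$$
$$\text{MSE}_{min}(T_{rpi}) < \text{MSE}(T_{sgr}) \ \Rightarrow\ \gamma\bar{Y}^2 C_y^2(1 - \rho_{pb}^2) + \gamma_1\bar{Y}^2\rho_{pb}^2 C_y^2 < \bar{Y}^2\left(\gamma C_y^2 + \frac{1}{4}\gamma_2 C_p^2 - \gamma_2 C_{yp}\right), \ \text{if}$$
$$\gamma_2(C_p - 2\rho_{pb}C_y)^2 > 0. \tag{35}$$
$$\text{MSE}_{min}(T_{rpi}) < \text{MSE}(T_{sgp}) \ \Rightarrow\ \gamma\bar{Y}^2 C_y^2(1 - \rho_{pb}^2) + \gamma_1\bar{Y}^2\rho_{pb}^2 C_y^2 < \bar{Y}^2\left(\gamma C_y^2 + \frac{1}{4}\gamma_2 C_p^2 + \gamma_2 C_{yp}\right), \ \text{if}$$
$$\gamma_2(C_p + 2\rho_{pb}C_y)^2 > 0. \tag{36}$$
Since $\gamma_2 > 0$ whenever $n < n'$, each condition holds as long as the squared term on its left-hand side is non-zero. The proposed estimators are therefore theoretically at least as efficient as the existing estimators in all cases, and strictly more efficient except in these boundary cases.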
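The conditions (32) to (36) state that each MSE difference reduces to $\gamma_2\bar{Y}^2$ times a squared term. The sketch below verifies this numerically for population P1 by comparing the direct differences with the closed forms. It assumes $C_{yp} = \rho_{pb}C_yC_p$ and uses the Table 1 mean in place of $\bar{Y}$; the dictionary keys are illustrative labels.

```python
import math

# Population P1 (Table 1).
N, n_first, n = 89, 45, 23
Y_bar, rho_pb, C_y, C_p = 2.911, 0.586, 0.542, 3.782
C_yp = rho_pb * C_y * C_p
gamma = 1 / n - 1 / N
gamma2 = 1 / n - 1 / n_first
Y2 = Y_bar**2

# Minimum MSE of the proposed estimators, written as gamma - gamma2 * rho^2.
mse_min = Y2 * C_y**2 * (gamma - gamma2 * rho_pb**2)

# First-order MSEs of the existing estimators.
mse = {
    "t_m":   Y2 * gamma * C_y**2,
    "T_ngr": Y2 * (gamma * C_y**2 + gamma2 * C_p**2 - 2 * gamma2 * C_yp),
    "T_ngp": Y2 * (gamma * C_y**2 + gamma2 * C_p**2 + 2 * gamma2 * C_yp),
    "T_sgr": Y2 * (gamma * C_y**2 + 0.25 * gamma2 * C_p**2 - gamma2 * C_yp),
    "T_sgp": Y2 * (gamma * C_y**2 + 0.25 * gamma2 * C_p**2 + gamma2 * C_yp),
}
gaps = {name: value - mse_min for name, value in mse.items()}

# Closed forms of the same differences.
closed = {
    "t_m":   Y2 * gamma2 * (rho_pb * C_y) ** 2,
    "T_ngr": Y2 * gamma2 * (C_p - rho_pb * C_y) ** 2,
    "T_ngp": Y2 * gamma2 * (C_p + rho_pb * C_y) ** 2,
    "T_sgr": Y2 * gamma2 * (C_p - 2 * rho_pb * C_y) ** 2 / 4,
    "T_sgp": Y2 * gamma2 * (C_p + 2 * rho_pb * C_y) ** 2 / 4,
}
```

Every gap is non-negative by construction, which is the content of the efficiency comparison.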
5. Numerical illustration
The populations P1 and P2 are considered for comparing the efficiency of the proposed ratio-cum-product double sampling exponential estimators $T_{rp1}$ and $T_{rp2}$ with the existing estimators. Population P1 is from Sukhatme and Sukhatme [18], where the study variable (Y) is the number of villages in the circles and the auxiliary attribute ($\phi$) indicates a circle consisting of more than five villages. Population P2 is from Mukhopadhyay [9], in which the study variable (Y) is the household size and the auxiliary attribute ($\phi$) indicates a household that availed an agricultural loan from a bank. From population P1, a first-phase sample of size 45 is drawn, and a second-phase sample of size 23 is drawn nested within it. From population P2, a first-phase sample of size 13 is drawn, and a second-phase sample of size 7 is drawn nested within it.
Table 1: Characteristics of population data sets
Population   N    n'   n    ȳ       p1      ρ_pb     C_y     C_p     β2(φ)    C_yp
P1           89   45   23   2.911   0.067   0.586    0.542   3.782   11.433   1.201
P2           25   13   7    9.462   0.385   -0.396   0.497   1.317   -2.056   -0.259
Table 1 shows that the correlation between the study variable and the auxiliary attribute is positive for population P1 and negative for population P2. Accordingly, the ratio type estimators are used for comparison in P1 and the product type estimators in P2.
Table 2: MSE and Bias of various estimators for population P1
Estimator                       MSE     |Bias|
tm (Sample mean)                0.080   0.000
Tngr (Naik and Gupta, 1996)     2.224   0.811
Tsgr (Singh et al., 2007)       0.508   0.184
Trpi (Proposed)                 0.062   0.000
The data of Table-2 clearly shows that the estimators Trpi have the lowest MSE as compared to the other existing estimators tm, Tngr and Tsgr and are also unbiased.
Table 3: PRE of Trpi with respect to the estimators tm, Tngr and Tsgr for P1
Estimator                       Percent Relative Efficiency w.r.t.
                                tm        Tngr       Tsgr
tm (Sample mean)                100.000   2780.000   635.000
Tngr (Naik and Gupta, 1996)     3.597     100.000    22.842
Tsgr (Singh et al., 2007)       15.748    437.795    100.000
Trpi (Proposed)                 129.032   3587.097   819.355
Table 3 shows that the PRE of the proposed estimators Trpi for population P1 with respect to tm, Tngr and Tsgr is 129.032, 3587.097 and 819.355 respectively. The highest PRE is obtained with respect to the ratio estimator Tngr, followed by the exponential ratio type estimator Tsgr and the sample mean estimator tm.
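The PRE entries follow directly from the MSE values in Table 2: the PRE of an estimator with respect to a reference estimator is 100 times the MSE of the reference divided by the MSE of the estimator. The short sketch below reproduces some of the Table 3 entries from the rounded Table 2 values; the dictionary keys and the helper name `pre` are illustrative.

```python
# Rounded MSE values for population P1, copied from Table 2.
mse = {"t_m": 0.080, "T_ngr": 2.224, "T_sgr": 0.508, "T_rpi": 0.062}

def pre(estimator, reference):
    """Percent relative efficiency of `estimator` with respect to `reference`."""
    return 100 * mse[reference] / mse[estimator]

# Row of the proposed estimators in Table 3.
proposed_row = [round(pre("T_rpi", ref), 3) for ref in ("t_m", "T_ngr", "T_sgr")]
print(proposed_row)
```

A PRE above 100 means that the estimator in the row has a smaller MSE than the reference estimator in the column.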
Table 4: MSE and Bias of tm, Tngp, Tsgp and Trpi for population P2
Estimator                       MSE     |Bias|
tm (Sample mean)                2.362   0.000
Tngp (Naik and Gupta, 1996)     9.816   0.162
Tsgp (Singh et al., 2007)       3.431   0.190
Trpi (Proposed)                 2.125   0.000
The data of Table-4 clearly shows that the estimators Trpi have the lowest MSE as compared to the other estimators tm, Tngp and Tsgp and are also unbiased.
Table 5: PRE of Trpi with respect to the estimators tm, Tngp and Tsgp for P2
Estimator                       Percent Relative Efficiency w.r.t.
                                tm        Tngp      Tsgp
tm (Sample mean)                100.000   415.580   145.258
Tngp (Naik and Gupta, 1996)     24.063    100.000   34.953
Tsgp (Singh et al., 2007)       68.843    286.097   100.000
Trpi (Proposed)                 111.153   461.929   161.459
Table 5 shows that the PRE of the proposed estimators Trpi for population P2 with respect to tm, Tngp and Tsgp is 111.153, 461.929 and 161.459 respectively. The highest PRE is obtained with respect to the product estimator Tngp, followed by the exponential product type estimator Tsgp and the sample mean estimator tm.
It can be observed from Table 3 and Table 5 that the proposed double sampling ratio-cum-product exponential estimators Trpi have a PRE above 100 with respect to each of the existing estimators tm, Tngr, Tngp, Tsgr and Tsgp considered for the populations P1 and P2, which indicates that the proposed estimators are more efficient.
6. Conclusion
In this paper, we have introduced two unbiased ratio-cum-product exponential estimators of the population mean, denoted as $T_{rp1}$ and $T_{rp2}$, within the framework of double sampling. The large sample properties of these estimators have been derived to the first order of approximation and compared theoretically with existing ratio and product type estimators. The numerical evaluation on two population data sets shows that the proposed estimators have a smaller MSE than the existing estimators considered, while remaining unbiased to the first order of approximation. These findings indicate that the proposed estimators are a useful alternative when the auxiliary information is an attribute and its population proportion is unknown.
References
[1] AL-Omari, A. and Bouza, C. 2015. Ratio estimators of the population mean with missing values using ranked set sampling, Environmetrics, 26(2):67-76.
[2] Bahl, S. and Tuteja, R. K. 1991. Ratio and product type exponential estimator. Information and Optimization Sciences, 12(1):159-163.
[3] Abd-elfattah, A.M., El-Sherpieny, E.A., Mohamed, S.M. and Abdouf, O.F. 2010. Improvement in estimating the population mean in simple random sampling using information on auxiliary attribute. Applied Mathematics and Computation, 215(12):4198-4202.
[4] Cochran, W.G. 1940. The estimation of the yields of the cereal experiments by sampling for the ratio of grain to total produce. The Journal of Agricultural Science, 30:262-275.
[5] Hussain, S., Sharma, M. and Chandra, H., 2021. Modified Exponential Product Type Estimators for Estimating Population Mean Using Auxiliary Information. In Special Proceedings 23rd Annual Conference, 24-28.
[6] Hussain, S. and Bhat, V.A., 2023. New Median Based Almost Unbiased Exponential Type Ratio Estimators In The Absence Of Auxiliary Variable. Reliability: Theory & Applications, 18 (72):242-249.
[7] Hussain, S., Sharma, M., Bhat, V.A. and Bhat, M.I.J., 2024. Proportion Based Dual Unbiased Exponential Type Estimators of Population Mean. Thailand Statistician, 22(1):31-39.
[8] Jhajj, H.S., Sharma, M.K. and Grover, L.K. 2006. A family of estimators of population mean using information on auxiliary attribute. Pakistan Journal of Statistics 22(1):43.
[9] Mukhopadhyay, P. 2000. Theory and methods of survey sampling, PHI Learning, New Delhi.
[10] Mehta, V., Singh, H. P. and Pal, S. K. 2020. A general procedure for estimating finite population mean using ranked set sampling. Investigacion Operacional, 41(1):80-92.
[11] Naik, V.D. and Gupta, P.C. 1996. A note on estimation of mean with known population proportion of an auxiliary character. Journal of the Indian Society of Agricultural Statistics, 48(2):151-158.
[12] Robson, D.S. 1957. Applications of multivariate polykays to the theory of unbiased ratio type estimation. Journal of the American Statistical Association, 52:511-522.
[13] Shabbir, J. and Gupta, S. 2007. On estimating the finite population mean with known population proportion of an auxiliary variable. Pakistan Journal of Statistics, 23(1):1.
[14] Shabbir, J. and Gupta, S. 2010. Estimation of the finite population mean in two phase sampling when auxiliary variables are attributes. Hacettepe Journal of Mathematics and Statistics, 39(1):121-129.
[15] Shalabh and Tsai, J.R. 2017. Ratio and product methods of estimation of population mean in the presence of correlated measurement errors. Communications in Statistics-Simulation and Computation, 46(7):5566-5593.
[16] Singh, N. and Vishwakarma, G. K. 2019. A generalised class of estimator of population mean with the combined effect of measurement errors and non-response in sample survey. Investigacion Operacional, 40(2):275-285.
[17] Singh, R., Chauhan, P., Sawan, N., and Smarandache, F. 2007. Ratio-product type exponential estimator for estimating finite population mean using information on auxiliary attribute. Renaissance High press USA, 1:18-32.
[18] Sukhatme, P.V. and Sukhatme, B.V. 1970. Sampling Theory of Surveys with Applications, Iowa State University Press, USA.