Estimating the net premium using additional information about a quantile of the cumulative distribution function
Zhanna N. Zenkova
Associate Professor, Institute of Applied Mathematics and Computer Science
National Research Tomsk State University;
SRO "Association of professional actuaries"
Address: 36, Lenin Avenue, Tomsk, 634050, Russian Federation
E-mail: [email protected]
Elizaveta A. Krainova
Doctoral Student, Institute of Applied Mathematics and Computer Science
National Research Tomsk State University
Address: 36, Lenin Avenue, Tomsk, 634050, Russian Federation
E-mail: [email protected]
Abstract
This paper considers the problem of increasing the accuracy of net premium estimates in non-life insurance. The improvement is achieved by using additional information about a known quantile of the loss cumulative distribution function. The additional information is incorporated by projecting the empirical cumulative distribution function onto the class of cumulative distribution functions with the given quantile; the modified empirical cumulative distribution function is then substituted into the integral that yields the mean value. This gives a modified estimate of the mean that is unbiased and whose variance is asymptotically smaller than the variance of the classical sample mean, so that its mean-square error is also smaller. Therefore, for a large sample size the modified estimate is more accurate than the classical one.
The influence of the quantile level on the variance of the new estimate is studied for uniform, triangular and normal distributions. The results suggest that, for a symmetric distribution, the variance is minimized when the known quantile is the median (the center of symmetry). For Simpson's triangular distribution, it is shown that in skewed cases the quantile information reduces the variance more than in the symmetric case.
The modified estimate of the mean is applied to a real data set to calculate a net premium. The data contain information about payments for voluntary health insurance made by an insurance company. It is demonstrated that the classical method underestimates the net premium, which could lead to the company's bankruptcy. With the modified technique, the net premium becomes higher and the bankruptcy risk is reduced.
This paper contains practically significant results which make it possible to give important recommendations to an insurance company.
Key words: net premium, sample mean, additional information, cumulative distribution function quantile, modified estimation of mean value, accuracy of estimation, mean-square error, non-life insurance.
Citation: Zenkova Z.N., Krainova E.A. (2017) Estimating the net premium using additional information about a quantile of the cumulative distribution function. Business Informatics, no. 4 (42), pp. 55—63. DOI: 10.17323/1998-0663.2017.4.55.63.
Introduction
The insurance business actively employs the latest scientific developments, especially modern statistical methods and models. This paper considers non-life insurance [1], in which the insurance indemnity is paid only upon the occurrence of an insured event, which is essentially non-deterministic. Because insured events are random, insurance companies use the so-called net premium [2] as the basis for calculating insurance premiums. The net premium depends on the mean loss amount and the probability that the insured event will occur. The net premium is then multiplied by a coefficient greater than one, which is calculated from the probability of the company's bankruptcy set by an actuary. The premium is further increased so that the company can make a profit and pay the agents' commissions.
To optimize their business, insurance companies aim to set policy premiums that, on the one hand, are attractive to clients and, on the other hand, account for the random character of the insured events as accurately as possible. For these purposes, different statistical methods and models are used, including methods that involve various types of additional information [3-14].
This paper proposes a new method of improvement of the net premium estimation by considering additional information about the quantile of the loss cumulative distribution function. This results in a more accurate modified estimation of the expected value of the insurer's payouts per insured event. The
new method of calculations was tested using real-life data on voluntary health insurance and allowed us to re-evaluate the policy premiums, thus reducing the risk of bankruptcy of the insurance company.
This paper should be of considerable practical interest because this approach to the calculation of net premiums has not been used before.
1. Estimating the mean insurance indemnity payment using additional information about the quantile of the cumulative distribution function
The net premium p plays the key role in the calculation of the insurance premium rates in non-life insurance [1], and it is defined as follows [2]:
$$p = z \cdot EX, \qquad (1)$$
where X is the payout amount, a random variable with the cumulative distribution function $F(x) = P(X \le x)$;
EX is the expected value of the random variable X;
z is the probability of occurrence of the insured event.
In insurance practice, neither z nor EX is known exactly. They must be estimated from the available data on payouts. Very often, the researcher possesses some additional information about the random variable under consideration and its distribution. This information could come from the conditions of an experiment, the researcher's professional experience, etc. It is well known that the use of additional information positively affects the properties of the modified statistics by increasing the accuracy of the estimation [3-13].
This paper considers additional information in the form of a known quantile $x_q$ of the cumulative distribution function of a given level q, i.e., we know that
$$F(x_q) = q. \qquad (2)$$
Using (2), we obtain the modified estimate of the expected value by substituting [15] the modified empirical distribution function into the functional for EX:
$$\bar{X}^q = \int_{-\infty}^{+\infty} x \, dF_N^q(x).$$
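The modified empirical distribution function $F_N^q(x)$ is obtained as the projection of the ordinary empirical distribution function onto the prior class [9] by the formula:
$$F_N^q(x) = \begin{cases} q \, \dfrac{F_N(x) \wedge F_N(x_q)}{F_N(x_q)} + (1-q) \, \dfrac{\bigl(F_N(x) - F_N(x_q)\bigr) \vee 0}{1 - F_N(x_q)}, & \text{if } F_N(x_q) \in (0;1); \\ F_N(x), & \text{if } F_N(x_q) = 0 \text{ or } F_N(x_q) = 1, \end{cases} \qquad (3)$$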
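where
$$F_N(x) = \frac{1}{N} \sum_{i=1}^{N} H\bigl(x - X_{(i)}\bigr) \qquad (4)$$
is the ordinary empirical distribution function, the symbols $\vee$ and $\wedge$ denote the maximum and the minimum of two values, respectively, $H(y) = \{0 : y < 0;\ 1 : y \ge 0\}$ is the Heaviside step function, $X_{(1)} \le X_{(2)} \le \ldots \le X_{(N)}$ are the order statistics of the independent sample $(X_1, X_2, \ldots, X_N)$, and N is the sample size.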
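The projection in (3) can be illustrated with a minimal Python sketch. It reflects one reading of the formula, assuming that ties are counted as $X_i \le x$; the function names `ecdf` and `modified_ecdf` and the four-point sample are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def ecdf(sample, x):
    """Ordinary empirical distribution function F_N(x): share of observations <= x."""
    ordered = sorted(sample)
    return bisect_right(ordered, x) / len(ordered)


def modified_ecdf(sample, x, x_q, q):
    """Empirical CDF rescaled so that its value at x_q equals q (one reading of (3))."""
    f_x = ecdf(sample, x)
    f_q = ecdf(sample, x_q)
    # If all observations lie on one side of x_q, no rescaling is possible.
    if f_q == 0.0 or f_q == 1.0:
        return f_x
    lower = q * min(f_x, f_q) / f_q                          # part at or below x_q
    upper = (1.0 - q) * max(f_x - f_q, 0.0) / (1.0 - f_q)    # part above x_q
    return lower + upper


# Hypothetical sample: three of four observations do not exceed x_q = 3.
sample = [1.0, 2.0, 3.0, 10.0]
value_at_quantile = modified_ecdf(sample, 3.0, x_q=3.0, q=0.9)
value_at_max = modified_ecdf(sample, 10.0, x_q=3.0, q=0.9)
print(value_at_quantile, value_at_max)
```

In this toy example the ordinary empirical function equals 0.75 at x = 3, while the modified one is forced to the known level q = 0.9 at that point and still reaches 1 at the largest observation.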
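Then, for $F_N(x_q) \in (0;1)$,
$$\bar{X}^q = \frac{q}{N F_N(x_q)} \sum_{i=1}^{N} X_i \, I(X_i \le x_q) + \frac{1-q}{N \bigl(1 - F_N(x_q)\bigr)} \sum_{i=1}^{N} X_i \, I(X_i > x_q) = \frac{q}{r} \sum_{i=1}^{N} X_i \, I(X_i \le x_q) + \frac{1-q}{N-r} \sum_{i=1}^{N} X_i \, I(X_i > x_q),$$
where $I(\cdot)$ is the indicator function.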
Here the random variable r is defined through
$$r = N F_N(x_q),$$
that is, r is the number of sample values that do not exceed $x_q$, and it has the binomial distribution Bi(N, q) [15].
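After the modification using the known quantile, the estimate of the expected value is given by the equation
$$\bar{X}^q = \begin{cases} \dfrac{q}{r} \sum\limits_{i=1}^{N} X_i \, I(X_i \le x_q) + \dfrac{1-q}{N-r} \sum\limits_{i=1}^{N} X_i \, I(X_i > x_q), & \text{if } r = 1, \ldots, N-1; \\ \bar{X}, & \text{if } r = 0 \text{ or } r = N, \end{cases} \qquad (5)$$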
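where
$$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i \qquad (6)$$
is the classical sample mean. Additionally, for $r = 1, \ldots, N-1$ the equation (5) can be written using the order statistics:
$$\bar{X}^q = \frac{q}{r} \sum_{i=1}^{r} X_{(i)} + \frac{1-q}{N-r} \sum_{i=r+1}^{N} X_{(i)}.$$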
Let us note that the equation (5) can be considered as a generalization of the asymptotically normal estimation proposed in [8], which dealt with a more general case of the functional estimation using several exactly known expected values; however, the situations when a modification cannot be defined were not taken into account. It is worth noting that the aforementioned estimation was obtained not by substitution, but by projection onto the prior class using the Kullback-Leibler divergence.
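The estimate (5) amounts to a weighted combination of two group means. A minimal sketch, assuming that observations equal to $x_q$ belong to the lower group; the name `modified_mean` and the toy data are hypothetical.

```python
def modified_mean(sample, x_q, q):
    """Mean estimate using a known quantile x_q of level q, following (5)."""
    n = len(sample)
    low = [x for x in sample if x <= x_q]    # observations not exceeding x_q
    high = [x for x in sample if x > x_q]    # observations above x_q
    r = len(low)
    # When one group is empty the modification is undefined: use the sample mean.
    if r == 0 or r == n:
        return sum(sample) / n
    return q * sum(low) / r + (1.0 - q) * sum(high) / (n - r)


# Hypothetical payouts with a known 0.9-quantile equal to 3.
payouts = [1.0, 2.0, 3.0, 10.0]
classical = sum(payouts) / len(payouts)            # 4.0
modified = modified_mean(payouts, x_q=3.0, q=0.9)  # 0.9 * 2 + 0.1 * 10 = 2.8
print(classical, modified)
```

Here the sample puts weight 0.25 on the upper group, whereas the known quantile says it should be 0.1, so the modified estimate moves toward the lower group mean.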
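Let us find the expected value of the estimate (5):
$$E\bar{X}^q = E\bigl(E(\bar{X}^q \mid r)\bigr) = E\Bigl( \frac{q}{r} \sum_{i=1}^{N} X_i \, I(X_i \le x_q) + \frac{1-q}{N-r} \sum_{i=1}^{N} X_i \, I(X_i > x_q) \Bigm| r = 1, \ldots, N-1 \Bigr) \bigl(1 - q^N - (1-q)^N\bigr) + EX \, \bigl(q^N + (1-q)^N\bigr) = EX.$$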
Therefore, the modified estimation of the expected value (5) is unbiased.
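The step that drives this result is the conditional expectation for $r = 1, \ldots, N-1$. A brief sketch of it is given below; it uses the partial integrals L and R, which the text introduces only later in the variance derivation, and relies on the fact that, given r, the observations in each group are distributed according to F truncated at $x_q$.

```latex
E\bigl(\bar{X}^q \mid r\bigr)
  = \frac{q}{r} \cdot r \cdot E(X \mid X \le x_q)
  + \frac{1-q}{N-r} \cdot (N-r) \cdot E(X \mid X > x_q)
  = q \cdot \frac{L}{q} + (1-q) \cdot \frac{R}{1-q}
  = L + R = EX,
\qquad
L = \int_{-\infty}^{x_q} x \, dF(x), \quad
R = \int_{x_q}^{+\infty} x \, dF(x).
```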
Let us find the variance of the estimation (5) using the equation
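$$D\bar{X}^q = D\bigl(E(\bar{X}^q \mid r)\bigr) + E\bigl(D(\bar{X}^q \mid r)\bigr),$$
where $D\bigl(E(\bar{X}^q \mid r)\bigr) = D(EX) = 0$. Using the independence of $X_i$, $i = 1, \ldots, N$, we obtain for $r = 1, \ldots, N-1$
$$D(\bar{X}^q \mid r) = \frac{q^2}{r} \, D(X_1 \mid X_1 \le x_q) + \frac{(1-q)^2}{N-r} \, D(X_1 \mid X_1 > x_q).$$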
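According to the properties of expectation (Jensen's inequality),
$$\frac{1}{E(r \mid r > 0)} \le E\Bigl( \frac{1}{r} \Bigm| r > 0 \Bigr),$$
because the function $f(r) = 1/r$ is convex. The convergence of $E(1/r \mid r > 0)$ to $1 / E(r \mid r > 0)$ was verified numerically for $N = 10, \ldots, 100$. Figure 1 shows the relative deviation
$$\Delta = \frac{E\bigl( \frac{1}{r} \mid r > 0 \bigr) - \dfrac{1}{E(r \mid r > 0)}}{E\bigl( \frac{1}{r} \mid r > 0 \bigr)} \cdot 100\%$$
as a function of the sample size N for q = 0.2, 0.5 and 0.8.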
Fig. 1. Deviation Δ (in %) as a function of the sample size N for q = 0.2, 0.5, 0.8
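A numerical check of this kind can be sketched in a few lines of Python. The code below assumes r ~ Bi(N, q) conditioned on r > 0 and the definition of the relative deviation Δ given above; the function name `jensen_gap_percent` is hypothetical, and the sketch is not the authors' original computation.

```python
from math import comb


def jensen_gap_percent(n, q):
    """Relative deviation (in %) between E(1/r | r>0) and 1/E(r | r>0), r ~ Bi(n, q)."""
    probs = [comb(n, k) * q**k * (1.0 - q) ** (n - k) for k in range(n + 1)]
    p_positive = 1.0 - probs[0]                       # P(r > 0)
    mean_r = sum(k * probs[k] for k in range(1, n + 1)) / p_positive
    mean_inv_r = sum(probs[k] / k for k in range(1, n + 1)) / p_positive
    return (mean_inv_r - 1.0 / mean_r) / mean_inv_r * 100.0


for q in (0.2, 0.5, 0.8):
    gaps = [jensen_gap_percent(n, q) for n in (10, 50, 100)]
    print(q, [round(g, 2) for g in gaps])
```

The deviation is positive by Jensen's inequality and shrinks as N grows, which is the behaviour the figure is described as showing.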
Since $q \in (0,1)$, then, as $N \to \infty$,
$$P(r = 1, \ldots, N-1) = 1 - q^N - (1-q)^N \to 1,$$
$$P(r = 0) = (1-q)^N \to 0, \quad P(r = N) = q^N \to 0.$$
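To obtain the asymptotic value of the variance, it is sufficient to consider the case $r = 1, \ldots, N-1$; then
$$\lim_{N \to \infty} N D\bar{X}^q = q \, D(X_1 \mid X_1 \le x_q) + (1-q) \, D(X_1 \mid X_1 > x_q) =$$
$$= q \left[ \int_{-\infty}^{x_q} x^2 \, d\frac{F(x)}{q} - \left( \int_{-\infty}^{x_q} x \, d\frac{F(x)}{q} \right)^2 \right] + (1-q) \left[ \int_{x_q}^{+\infty} x^2 \, d\frac{F(x) - q}{1-q} - \left( \int_{x_q}^{+\infty} x \, d\frac{F(x) - q}{1-q} \right)^2 \right] =$$
$$= \int_{-\infty}^{x_q} x^2 \, dF(x) - \frac{1}{q} L^2 + \int_{x_q}^{+\infty} x^2 \, dF(x) - \frac{1}{1-q} R^2 = DX + (EX)^2 - \frac{1}{q} L^2 - \frac{1}{1-q} R^2.$$
Here $L = \int_{-\infty}^{x_q} x \, dF(x)$, $R = \int_{x_q}^{+\infty} x \, dF(x)$.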
Therefore, the asymptotic variance of the modified estimate of the expected value, normalized to the sample size, is given by the following equation:
$$\sigma_q^2 = DX + (EX)^2 - \frac{1}{q} L^2 - \frac{1}{1-q} R^2. \qquad (7)$$
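Since $(EX)^2 = L^2 + 2LR + R^2$, then
$$(EX)^2 - \frac{1}{q} L^2 - \frac{1}{1-q} R^2 = -\frac{1-q}{q} L^2 + 2LR - \frac{q}{1-q} R^2 = -\left( \sqrt{\frac{1-q}{q}} \, L - \sqrt{\frac{q}{1-q}} \, R \right)^2 \le 0,$$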
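which means that the equation (7) can be rewritten as follows:
$$\sigma_q^2 = DX - \left( \sqrt{\frac{1-q}{q}} \, L - \sqrt{\frac{q}{1-q}} \, R \right)^2$$
or, equivalently [8],
$$\sigma_q^2 = \sigma^2 - \frac{1}{q(1-q)} \left( \int_{-\infty}^{x_q} x \, dF(x) - q \cdot EX \right)^2, \qquad (8)$$
where $\sigma^2 = DX = N D\bar{X}$.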
For the case when the known quantile is the median, we have:
$$\sigma_{0.5}^2 = \sigma^2 - 4 \left( \int_{-\infty}^{x_{0.5}} x \, dF(x) - \frac{EX}{2} \right)^2.$$
From (8) it follows that $\sigma_q^2 \le \sigma^2$, i.e., when the number of observations is large enough, the use of the additional information about the quantile can reduce the mean-square error and, consequently, increase the accuracy of the estimate of the expected value.
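The variance reduction can also be observed in a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions rather than part of the original study: it draws samples from the uniform distribution on [0,1] with the median $x_{0.5} = 0.5$ treated as known, for which (8) evaluates to 1/12 - 1/16 = 1/48. The sample size, number of replications and seed are arbitrary choices, and the helper names are hypothetical.

```python
import random
from statistics import pvariance


def modified_mean(sample, x_q, q):
    """Mean estimate using a known quantile x_q of level q, following (5)."""
    n = len(sample)
    low = [x for x in sample if x <= x_q]
    high = [x for x in sample if x > x_q]
    r = len(low)
    if r == 0 or r == n:
        return sum(sample) / n
    return q * sum(low) / r + (1.0 - q) * sum(high) / (n - r)


def simulate(n=200, replications=2000, q=0.5, seed=1):
    """Return N * variance of the classical and of the modified estimate."""
    rng = random.Random(seed)
    classical, modified = [], []
    for _ in range(replications):
        sample = [rng.random() for _ in range(n)]   # U[0,1], so x_q = q
        classical.append(sum(sample) / n)
        modified.append(modified_mean(sample, x_q=q, q=q))
    return n * pvariance(classical), n * pvariance(modified)


norm_var_classical, norm_var_modified = simulate()
print(norm_var_classical, norm_var_modified)
```

Up to simulation noise, the first value should be close to 1/12 and the second close to 1/48.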
For the uniform distribution $U_{[0,1]}(x)$, the equation (8) takes the following form:
$$\sigma_q^2 = \frac{1}{12} - \frac{q(1-q)}{4}. \qquad (9)$$
It is obvious that the minimum of the variance is reached at q = 0.5. Figure 2 shows the dependence given by the equation (9).
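The closed form (9) can be cross-checked against the general expression (8). The sketch below assumes the standard facts for U[0,1]: EX = 1/2, DX = 1/12, $x_q = q$ and $\int_0^{x_q} x \, dF(x) = q^2/2$. The function names are hypothetical.

```python
def sigma_q2_uniform(q):
    """Closed form (9) for the uniform distribution on [0, 1]."""
    return 1.0 / 12.0 - q * (1.0 - q) / 4.0


def sigma_q2_general(q, variance, mean, partial_mean):
    """Equation (8); partial_mean is the integral of x dF(x) from -inf to x_q."""
    return variance - (partial_mean - q * mean) ** 2 / (q * (1.0 - q))


levels = [i / 100 for i in range(1, 100)]
best_q = min(levels, key=sigma_q2_uniform)

# For U[0,1]: x_q = q and the partial mean up to x_q equals q^2 / 2.
max_gap = max(
    abs(sigma_q2_general(q, 1.0 / 12.0, 0.5, q * q / 2.0) - sigma_q2_uniform(q))
    for q in levels
)
print(best_q, sigma_q2_uniform(best_q), max_gap)
```

On this grid the minimum is attained at q = 0.5, where the normalized variance drops from 1/12 to 1/48.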
Figure 3 shows $\sigma_q^2$ vs. q for the normal distribution N(10, 4).
Figure 4 shows the second term of the equation (8), namely
$$d = -\frac{1}{q(1-q)} \left( \int_{-\infty}^{x_q} x \, dF(x) - q \cdot EX \right)^2, \qquad (10)$$
for different values of c for Simpson's triangular distribution over the interval [0,1], where $c \in (0,1)$. The value of d shows how much smaller the final variance $\sigma_q^2$ is compared to the variance of the original estimate of the mean, $\sigma^2 = N D\bar{X}$. Note that the best results were not obtained for the symmetric case (c = 0.5): accounting for the quantile turned out to be more beneficial in the cases of large skewness (c = 0.9 and c = 0.1). When the skewness is positive (c < 0.5), the accuracy increases if we use the information about a quantile of level q > 0.5, and vice versa (see Figure 5 and Table 1).
Fig. 2. $\sigma_q^2$ vs. q for $U_{[0,1]}(x)$, $\sigma^2 = 1/12$
Figure 5 shows the combinations of the parameter c and the quantile level q that give the maximum improvement in the quality of the modified mean estimate, meaning that d reaches its minimum value. Table 1 shows the numerical values of the asymptotic normalized variances $\sigma_q^2$ and the minimum values of d for these combinations of c and q.
Fig. 3. $\sigma_q^2$ vs. q for the normal distribution N(10, 4), $\sigma^2 = 4$
Fig. 4. The dependence of d on q for the triangular distribution over [0,1] with the parameter $c \in (0,1)$
Therefore, for all the distributions considered, accounting for the additional information leads to a significantly smaller variance of the estimates. For symmetric distributions, the variance reaches its minimum when the quantile coincides with the median, i.e., with the center of symmetry. Skewed distributions require additional study; however, judging by the example of the triangular distribution, we can assume that, for a positively skewed distribution, a more accurate estimate of the expected value may be obtained using the information about a quantile that is greater than the median, and vice versa.
Fig. 5. The combination of the parameter c and the quantile q that allows for the maximum improvement of the quality of the estimation of the modified mean for the triangular distribution over [0,1] with the parameter $c \in (0,1)$
Table 1.
Values of the asymptotic normalized variances $\sigma_q^2 = \lim_{N \to \infty} N D\bar{X}^q$, $\sigma^2 = N D\bar{X}$ and $d = \sigma_q^2 - \sigma^2$ for the optimal combinations of c and q giving the minimum d values
c     q       $\sigma^2$  $\sigma_q^2$  d
0.01  0.6355  0.055006  0.015375  -0.03963
0.1   0.616   0.050555  0.014224  -0.03633
0.2   0.609   0.046667  0.013541  -0.0331
0.3   0.5935  0.043889  0.013383  -0.03051
0.4   0.564   0.042222  0.013619  -0.0286
0.5   0.500   0.041667  0.013889  -0.02778
0.6   0.436   0.042222  0.013619  -0.0286
0.7   0.4065  0.043889  0.013383  -0.03051
0.8   0.3910  0.046667  0.013541  -0.03313
0.9   0.384   0.050555  0.014224  -0.03633
0.99  0.3645  0.055006  0.015375  -0.03963
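The search for the best quantile level in the triangular case can be sketched as a simple grid search over q using the expression (10) for d. The code assumes that the parameter c is the mode of the triangular density on [0,1], so that EX = (1 + c)/3; this interpretation, the grid step and all function names are assumptions made for illustration.

```python
from math import sqrt


def triangular_quantile(q, c):
    """Quantile of level q for the triangular distribution on [0, 1] with mode c."""
    if q <= c:
        return sqrt(q * c)
    return 1.0 - sqrt((1.0 - q) * (1.0 - c))


def triangular_partial_mean(x, c):
    """Integral of t f(t) dt from 0 to x for the triangular density with mode c."""
    if x <= c:
        return 2.0 * x**3 / (3.0 * c)

    def antiderivative(t):
        return t**2 - 2.0 * t**3 / 3.0

    left = 2.0 * c**2 / 3.0   # contribution of the rising branch on [0, c]
    return left + (antiderivative(x) - antiderivative(c)) / (1.0 - c)


def variance_gain(q, c):
    """The term d from (10): change in the normalized variance (non-positive)."""
    mean = (1.0 + c) / 3.0
    partial = triangular_partial_mean(triangular_quantile(q, c), c)
    return -((partial - q * mean) ** 2) / (q * (1.0 - q))


def best_level(c, steps=2000):
    """Grid search for the level q that minimizes d for a given c."""
    levels = [i / steps for i in range(1, steps)]
    q_opt = min(levels, key=lambda q: variance_gain(q, c))
    return q_opt, variance_gain(q_opt, c)


for c in (0.2, 0.5, 0.8):
    print(c, best_level(c))
```

For c = 0.5 and c = 0.2 the search should land near the corresponding rows of Table 1 (q = 0.500 and q = 0.609), up to the resolution of the grid.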
It is worth mentioning that for uniform distribution, the estimation of the cumulative distribution function involving symmetry has a smaller variance compared to the estimation that takes the median into account, as shown in [7].
As a result, the modified estimate of the net premium (1) that accounts for the information about the quantile can be calculated as follows:
$$p^q = z \cdot \bar{X}^q. \qquad (11)$$
This formula yields a more accurate result than the classical method of estimation that uses the sample mean, because the mean-square error of $\bar{X}^q$ is smaller.
2. Estimation of the net premium using a quantile for voluntary health insurance
The proposed method for the net premium estimation (11) was applied to real-life data on insurance indemnity payments made on voluntary health insurance policies. For privacy reasons, the actual values are scaled, and the name of the insurance company is concealed. There were N = 239 insured events over the considered period. Table 2 presents the sample $X = \{X_1, X_2, \ldots, X_N\}$ in grouped form: $Y_i$ are the non-repeating payments, $n_i$ is the number of repetitions of $Y_i$ in the sample, $N = \sum_{i=1}^{k} n_i$, and k is the number of non-repeating payments.
Table 2.
Scaled data on indemnity payments for insured events of voluntary health insurance in conventional monetary units (cmu) per unit
i  $Y_i$, cmu/unit  $n_i$  i  $Y_i$, cmu/unit  $n_i$  i  $Y_i$, cmu/unit  $n_i$  i  $Y_i$, cmu/unit  $n_i$
1 16.9 2 16 131.3 2 31 437.5 1 46 1343.8 1
2 20.6 1 17 137.5 1 32 468.8 5 47 1500.0 1
3 25.0 1 18 156.3 24 33 487.5 1 48 1562.5 2
4 28.1 1 19 162.5 1 34 500.0 3 49 1662.5 1
5 31.3 8 20 175.0 1 35 531.3 1 50 1687.5 1
6 37.5 9 21 187.5 18 36 562.5 2 51 2020.0 1
7 46.9 1 22 200 4 37 600.0 1 52 2500.0 1
8 50.0 1 23 210.0 1 38 625.0 8 53 3125.0 2
9 56.3 11 24 218.8 2 39 739.8 1 54 3750.0 2
10 62.5 19 25 243.8 2 40 750.0 2 55 4687.5 1
11 87.5 1 26 250.0 5 41 781.25 1 56 5000.0 1
12 93.8 20 27 281.3 4 42 937.5 2 57 5625.0 1
13 100.0 5 28 312.5 19 43 1000.0 1 58 8312.5 1
14 112.5 2 29 375.0 10 44 1125.0 2 59 8437.5 1
15 125.0 11 30 406.25 1 45 1250.0 2 60 9375.0 1
When the net premium is calculated by the traditional method, the sample mean is $\bar{X}$ = 504.73 cmu/unit. The probability of an insured event was estimated as the ratio of the number of insured events to the number of policies, which gave z = 0.035. The resulting net premium is p = 17.67.
However, from long-term experience in the health insurance business, the company's actuary knows that in 90% of the cases the indemnity payments do not exceed 750 cmu/unit, i.e., we know the quantile $x_q = x_{0.9} = 750$ of the level q = 0.9 for the random variable considered here. Using (5), we obtain $\bar{X}^q$ = 516.23 cmu/unit. As a result, by accounting for the quantile, we find the modified net premium $p^q$ = 18.07, which is 2.26% greater than p.
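The premium arithmetic reported above can be retraced from the stated values z = 0.035, 504.73 and 516.23. The short sketch below takes these three numbers from the text as given; the function name `net_premium` is hypothetical, and the relative increase is computed from the premiums rounded to two decimals, which is an assumption about how the reported 2.26% was obtained.

```python
def net_premium(event_probability, mean_payout):
    """Net premium as the product of event probability and mean payout, as in (1)."""
    return event_probability * mean_payout


z = 0.035                                           # estimated event probability
p_classical = round(net_premium(z, 504.73), 2)      # uses the sample mean
p_modified = round(net_premium(z, 516.23), 2)       # uses the modified mean
increase_percent = round((p_modified - p_classical) / p_classical * 100.0, 2)
print(p_classical, p_modified, increase_percent)
```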
Therefore, knowledge of the quantile leads to a more accurate evaluation of the net premium. The premium calculated previously was underestimated, which means an increased risk of bankruptcy for the insurance company. Since the modified estimate is more accurate when the sample size is large enough (here N = 239), the company's actuary was urged to recalculate the health insurance premiums.
Conclusion
This paper presents a new approach to the calculation of net premiums in the non-life insurance business. The approach uses additional information about the quantile of the cumulative distribution function to calculate the expected value of loss. The modified mean estimation is unbiased and more accurate than the classical sample mean, because its asymptotic mean-square error is smaller. Therefore, the new estimate of the net premium is more
accurate for a sufficiently large number of observations.
The paper also studies the effect of the quantile on the asymptotic normalized variance of the modified mean in the cases of uniform, triangular and normal distributions. It is shown that in the case of a symmetric distribution, maximum accuracy is achieved by using the median for the adjustment calculation. As follows from the consideration of triangular distribution, positively skewed distributions will benefit from employing quantile levels greater than 0.5, and vice versa.
The proposed method was tested using real-life data on voluntary health insurance indemnity payments. The use of additional information allowed us to obtain a more accurate value for the net premium. The company's actuary was urged to recalculate the insurance premium rates to reduce the risk of the company's bankruptcy.
References
1. Mironkina Y.N., Sorokin A.S. (2011) Osnovy aktuarnykh raschetov [Fundamentals of actuarial calculations]. Moscow: Center EAOI (in Russian).
2. Falin G.I., Falin A.I. (2004) Teoriya riska dlya aktuariev v zadachakh [Theory of risk for actuaries in tasks]. Moscow: Mir, Nauchny Mir (in Russian).
3. Abu-Dayyeh W.A., Ahmed M.S., Ahmed R.A., Muttlak H.A. (2003) Some estimators of a finite population mean using auxiliary information. Applied Mathematics and Computation, no. 139, pp. 287-298.
4. Haq A., Shabbir J. (2014) An improved estimator of finite population mean when using two auxiliary attributes. Applied Mathematics and Computation, no. 241, pp. 14-24.
5. Singh H.P., Tailor R. (2005) Estimation of finite population mean with known coefficient of variation of an auxiliary character. Statistica, vol. LXV, no. 3, pp. 301-313.
6. Tarima S., Pavlov D. (2006) Using auxiliary information in statistical function estimation. ESAIM: Probability and Statistics, no. 10, pp. 11-23.
7. Dmitriev Y.G. (1976) O svoystvakh otsenok funktsii raspredeleniya i funktsionalov pri dopolnitel'noy apriornoy informatsii [On the features of distribution function and functionals estimations in case of additional a priori information]. Mathematical Statistics and Its Applications, no. 4, pp. 63-76 (in Russian).
8. Dmitriev Y.G., Tarasenko P.F. (1992) Ispol'zovanie apriornoy informatsii v statisticheskoy obrabotke eksperimental'nykh dannykh [Using a priori information in statistical processing of experimental data]. Izvestiya vuzov. Physics, no. 9, pp. 136-142 (in Russian).
9. Dmitriev Y.G., Ustinov Y.K. (1988) Statisticheskoe otsenivanie raspredeleniy veroyatnostey s ispol'zovaniem dopolnitel'noy informatsii [Statistical estimation of probability distributions using additional information]. Tomsk: TSU (in Russian).
10. Zhurko E.S., Zenkova Z.N. (2017) Vliyanie apriornoy informatsii na rezul'taty metoda tsenoobrazovaniya na tovar-novinku PSM [The influence of a priori information on the results of the PSM new product pricing method]. Proceedings of the III International Scientific and Practical Conference "Actual Problems and Perspectives of State Statistics Development in Modern Conditions". Saratov, 5-7 December 2016. Saratov: Saratovstat, vol. 2, pp. 66-68 (in Russian).
11. Zhurko E.S., Zenkova Z.N. (2016) Modifikatsiya metoda tsenoobrazovaniya PSM s uchetom kvantilya zadannogo urovnya [Modification of the PSM pricing method considering quantile of the specified level]. Proceedings of the International Scientific and Practical Conference "Information Technologies of Siberia". Kemerovo, 10 November 2016. Kemerovo: KuzSTU, pp. 134-136 (in Russian).
12. Zenkova Z.N., Makeeva O.B. (2015) Ispol'zovanie informatsii o kvantile pri analize oborachivaemosti oborotnykh sredstv [Using quantile information in current assets turnover analysis]. Proceedings of the III All-Russian Youth Conference "Mathematical Support and Software for Informational, Technical and Economical Systems". Tomsk, 22-23 May 2015. Tomsk: TSU, pp. 82-87 (in Russian).
13. Zenkova Z.N., Muravleva M.A. (2014) Asimptoticheskaya nesmeshchennost' otsenki funktsii raspredeleniya po odnokratno intervalom tsenzurirovannym dannym s privlecheniem informatsii o simmetrii [Asymptotic unbiasedness of distribution function estimation based on single interval-censored data using information about symmetry]. Proceedings of the X All-Russian Conference "New Information Technologies in Complex Structures Research". Katun', 9-11 June 2014. Tomsk: TSU, pp. 114-115 (in Russian).
14. Zenkova Z.N., Krakovetskaya I.V. (2013) Neparametricheskaya otsenka Ternbulla dlya interval'no-tsenzurirovannykh dannykh v marketingovom issledovanii sprosa na bioenergeticheskie napitki [Nonparametric Turnbull estimator for interval-censored data in the marketing research of the demand of bio-energy drinks]. Tomsk State University Journal of Control and Computer Science, no. 3(24), pp. 64-69 (in Russian).
15. Borovkov A.A. (1997) Matematicheskaya statistika [Mathematical Statistics]. Novosibirsk: Nauka; Institute of Mathematics (in Russian).