Events Dependence Used in Reliability Speaks More to Know. Modeling Competing Risks
Boyan Dimitrov
Department of Mathematics, Kettering University, 1700 West University Avenue, Flint, Michigan, USA email: [email protected]
Abstract
In this article we show how some known measures of dependence between random events can be easily transferred into measures of local dependence between random variables. This enables everyone to see and visually evaluate the local dependence between uncertain units on every region of their particular values. We believe that the true value of such dependences lies in applications to non-numeric variables, as well as in finance and risk studies. We also trust that our approach may give a serious push to the microscopic analysis of the pictures of dependence offered by big data. Numeric and graphical examples confirm the beauty, simplicity and utility of this approach, especially in reliability models.
Keywords: local measures of dependence, local regression coefficients, local correlation, mapping the local dependence, big data tools, microscopic analysis of dependence in reliability models, graphic illustrations
1. Introduction
Big data files contain large numbers of simultaneous multi-dimensional observations. This fact offers plenty of opportunities for establishing possible dependences between the observed variables. Most of these dependences will be of a global nature. However, there exist (or can be created) techniques for taking a microscopic look at the details. In this article we want to show the ideas behind these microscopic looks.
The concepts of measuring dependence should start from the very roots of Probability Theory. Independence of random events is introduced simultaneously with conditional probability. Where independence does not hold, events are dependent. Further, the focus in textbooks is on independence; textbooks usually do not discuss what to do when events are dependent. However, there are ways to go deeper into the analysis of dependence, to see some detailed pictures, and to use them later in the study of random variables. This question is discussed in our previous articles (Dimitrov 2010, 2015) and further in Esa and Dimitrov (2013, 2017). Some particular situations are analyzed in Dimitrov and Esa (2014) and Esa and Dimitrov (2017). Applications to the study of politics appear in Esa and Dimitrov (2013). We refer to these articles for a quick passage to the essentials.
First we notice that the most informative measures of dependence between random events are the two regression coefficients. Their definitions follow:
Definition 1. The regression coefficient R_B(A) of the event A with respect to the event B is the difference between the conditional probability of the event A given the event B and the conditional probability of the event A given the complementary event B̄, namely

R_B(A) = P(A|B) − P(A|B̄).

This measure of the dependence of the event A on the event B is a directed dependence.
The regression coefficient Ra(B) of the event B with respect to the event A is defined analogously.
From the many interesting properties of the regression coefficients we would like to point out here just a few:
(R1) The equality R_B(A) = R_A(B) = 0 takes place if and only if the two events are independent.
(R2) The regression coefficients R_B(A) and R_A(B) are numbers with equal signs, and this is the sign of their connection δ(A, B) = P(A∩B) − P(A)P(B). The relationships

R_B(A) = [P(A∩B) − P(A)P(B)] / (P(B)[1 − P(B)]),  and  R_A(B) = [P(A∩B) − P(A)P(B)] / (P(A)[1 − P(A)])

hold.
The numerical values of R_B(A) and R_A(B) may not always be equal. There exists an asymmetry in the dependence between random events, and this reflects the nature of real life.

(R3) The regression coefficients R_B(A) and R_A(B) are numbers between −1 and 1, i.e. they satisfy the inequalities

−1 ≤ R_B(A) ≤ 1;  −1 ≤ R_A(B) ≤ 1.
(R4.1) The equality R_B(A) = 1 holds only when the random event A coincides with (or is equivalent to) the event B. Then the equality R_A(B) = 1 is also valid.

(R4.2) The equality R_B(A) = −1 holds only when the random event A coincides with (or is equivalent to) the event B̄, the complement of the event B. Then R_A(B) = −1 is also valid, and respectively A = B̄.
We interpret the properties (R4) of the regression coefficients in the following way: the closer the numerical value of R_B(A) is to 1, the more densely the events A and B sit within each other, considered as sets of outcomes of the experiment. In a similar way we also interpret the negative values of the regression coefficient.
There is a symmetric measure of dependence between random events, and this is their coefficient of correlation.
Definition 2. The correlation coefficient between two events A and B is the number

ρ_{A,B} = ±√(R_B(A) · R_A(B)),

where the sign, plus or minus, is the sign of either of the two regression coefficients.
Remark. The correlation coefficient ρ_{A,B} between the events A and B equals the formal correlation coefficient ρ_{I_A,I_B} between the random variables I_A and I_B, the indicators of the two random events A and B.
The correlation coefficient ρ_{A,B} between two random events is symmetric and is located between the numbers R_B(A) and R_A(B). The following statements hold:

(p1) ρ_{A,B} = 0 holds if and only if the two events A and B are independent. The use of the numerical values of the correlation coefficient is similar to the use of the two regression coefficients: the closer ρ_{A,B} is to zero, the closer to independence the two events A and B are.
For random variables a similar statement is not true: the equality of their mutual correlation coefficient to zero does not imply independence.
(p2) The correlation coefficient ρ_{A,B} is always a number between −1 and 1, i.e.

−1 ≤ ρ_{A,B} ≤ 1.
(p2.1) The equality ρ_{A,B} = 1 holds if and only if the events A and B are equivalent, i.e. when A = B.

(p2.2) The equality ρ_{A,B} = −1 holds if and only if the events A and B̄ are equivalent, i.e. when A = B̄.

The closer ρ_{A,B} is to the number 1, the more densely the events A and B sit one within the other, and when ρ_{A,B} = 1 the two events coincide (are equivalent). The closer ρ_{A,B} is to the number −1, the more densely the events A and B̄ sit one within the other, and when ρ_{A,B} = −1 the events A and B̄ coincide (are equivalent).
2. The transfer rules
The above measures allow studying the behavior of the interaction between any pair of numeric random variables (X, Y) throughout the sample space, and a better understanding and use of their dependence.
Let the joint cumulative distribution function (c.d.f.) of the pair (X, Y) be F(x, y) = P(X ≤ x, Y ≤ y), with marginals F(x) = P(X ≤ x) and G(y) = P(Y ≤ y). Let us introduce the events

A_x = {x < X ≤ x + Δ1x};  B_y = {y < Y ≤ y + Δ2y}, for any x, y ∈ ℝ.
Then the measures of dependence between the events A_x and B_y turn into measures of local dependence between the pair of r.v.'s X and Y on the rectangle D = [x, x + Δ1x] × [y, y + Δ2y]. Naturally, they can be named and calculated as follows:

Regression coefficients of Y with respect to X, and of X with respect to Y, on the rectangle D = [x, x + Δ1x] × [y, y + Δ2y]. By the use of Definition 1 we get
R_x((X,Y) ∈ D) = {Δ_D F(x, y) − [F(x + Δ1x) − F(x)][G(y + Δ2y) − G(y)]} / ([F(x + Δ1x) − F(x)]{1 − [F(x + Δ1x) − F(x)]}).
Here Δ_D F(x, y) denotes the two-dimensional finite difference of the function F(x, y) on the rectangle D = [x, x + Δ1x] × [y, y + Δ2y], namely

Δ_D F(x, y) = F(x + Δ1x, y + Δ2y) − F(x + Δ1x, y) − F(x, y + Δ2y) + F(x, y).
In an analogous way R_y((X,Y) ∈ D) is defined; only the denominator in the above expression is changed respectively, to [G(y + Δ2y) − G(y)]{1 − [G(y + Δ2y) − G(y)]}.
The correlation coefficient ρ((X,Y) ∈ D) between the r.v.'s X and Y on the rectangle D = [x, x + Δ1x] × [y, y + Δ2y] can be presented in a similar way by the use of Definition 2: it is the square root of the product of the two regression coefficients, taken with their common sign. We omit the detailed expressions.
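As a quick illustration (ours, not from the paper), the rectangle-based measures can be estimated from data by replacing the probabilities of the events A_x and B_y with empirical frequencies. The following minimal Python sketch does this for a simulated pair sharing a common shock; the particular pair and all numeric choices are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dependent pair sharing a common random component (our choice).
u, v, w = rng.exponential(1.0, (3, 200_000))
xs, ys = np.minimum(u, w), np.minimum(v, w)

def local_measures(xs, ys, x, dx, y, dy):
    a = (xs > x) & (xs <= x + dx)          # event A_x about X
    b = (ys > y) & (ys <= y + dy)          # event B_y about Y
    p_a, p_b, p_ab = a.mean(), b.mean(), (a & b).mean()
    delta = p_ab - p_a * p_b               # connection delta(A_x, B_y)
    r_x = delta / (p_a * (1 - p_a))        # R_x: predicts B_y from A_x
    r_y = delta / (p_b * (1 - p_b))        # R_y: predicts A_x from B_y
    rho = np.sign(delta) * np.sqrt(abs(r_x * r_y))   # Definition 2
    return r_x, r_y, rho

print(local_measures(xs, ys, 0.5, 0.25, 0.5, 0.25))
```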
It is even easier to find the local dependence at a point (X = i, Y = j) for a pair of discretely distributed r.v.'s (X, Y). The regression coefficients of X with respect to Y, and of Y with respect to X, at the point (X = i, Y = j) are determined by the rule

R_{X=i}(Y = j) = [P(X = i, Y = j) − P(X = i)P(Y = j)] / (P(X = i)[1 − P(X = i)]) = (p_{i,j} − p_{i.} p_{.j}) / (p_{i.}(1 − p_{i.})),

and analogously for R_{Y=j}(X = i), with p_{.j}(1 − p_{.j}) in the denominator.
Similarly, the local correlation coefficient between the values of the two r.v.'s (X, Y) is given by

ρ_{X,Y}(X = i, Y = j) = (p_{i,j} − p_{i.} p_{.j}) / √( p_{i.}(1 − p_{i.}) p_{.j}(1 − p_{.j}) ).
Using these rules one can see and visualize the local dependence between every pair of r.v.'s with a given joint distribution, as the sketch below illustrates.
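The discrete rules translate directly into code. Below is a minimal Python sketch of ours that computes the local regression and correlation coefficients exactly from a joint probability table; the 2×2 table used is an arbitrary example, not data from the paper.

```python
import numpy as np

# Joint table p[i, j] = P(X=i, Y=j): an arbitrary example to exercise the rules.
p = np.array([[0.20, 0.10],
              [0.05, 0.65]])          # rows: values of X; columns: values of Y
px = p.sum(axis=1)                    # marginal p_{i.}
py = p.sum(axis=0)                    # marginal p_{.j}

def r_x(i, j):
    # regression coefficient R_{X=i}(Y=j): predicts {Y=j} from {X=i}
    return (p[i, j] - px[i] * py[j]) / (px[i] * (1 - px[i]))

def rho(i, j):
    # local correlation coefficient rho_{X,Y}(X=i, Y=j)
    num = p[i, j] - px[i] * py[j]
    return num / np.sqrt(px[i] * (1 - px[i]) * py[j] * (1 - py[j]))

for i in range(2):
    for j in range(2):
        print(i, j, round(r_x(i, j), 4), round(rho(i, j), 4))
```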
This ends our theoretical background of the local dependence structural study. Next we illustrate its application on qualitative and quantitative probability models.
3. Illustrations
Let us consider the two traditional systems of independent components: the system in series and the system in parallel. We want to study how the regression coefficient of a component with respect to the system, and vice versa, the regression coefficient of the system with respect to a component, change in time during the work of the system. For simplicity we consider a system of just two components: when focusing on one component, everything else can be treated as the second component. The results of the studies are shown next.
3.1 A system in series. Assume both components have life times exponentially distributed with parameters λ1 and λ2. Then the system reliability function at any time instant t (this is the event B) equals r(t) = e^{−(λ1+λ2)t}, and the probability that component 1 functions (this is the event A) is e^{−λ1 t}. The regression coefficient of the system with respect to component 1 is then

R_1(S) = [r(t) − r(t)e^{−λ1 t}] / (e^{−λ1 t}[1 − e^{−λ1 t}]) = e^{−λ2 t}.
Analogously we evaluate the regression coefficient of component 1 with respect to the system at time t. It is given by the relation

R_S(1) = [r(t) − r(t)e^{−λ1 t}] / (r(t)[1 − r(t)]) = (1 − e^{−λ1 t}) / (1 − e^{−(λ1+λ2)t}).
And the correlation coefficients between the system reliability and the component reliabilities change during the time according to the relations

ρ_{S,1}(t) = √( e^{−λ2 t}(1 − e^{−λ1 t}) / (1 − e^{−(λ1+λ2)t}) ),  ρ_{S,2}(t) = √( e^{−λ1 t}(1 − e^{−λ2 t}) / (1 − e^{−(λ1+λ2)t}) ).
Notice that all dependences are positive. Graphs of these functions of local dependence in time for λ1 = 1 and λ2 = 2 are shown in the next figures.
We observe that the local correlation between the system reliability and each component decreases to 0 as time increases, and it stays higher for the weaker component 2. At the same time the regression coefficients between the system and the strongest component behave differently: the local dependence R_1(S) approaches 0 with time (as if the system becomes independent of component 1 as time grows), while the local dependence R_S(1) of the strongest component 1 on the system reliability approaches 1 as time grows. The sketch below evaluates these closed forms.
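For readers who wish to reproduce the curves, the following Python sketch (ours) evaluates the closed forms derived above for λ1 = 1, λ2 = 2; the time grid is our choice, and plotting is omitted.

```python
import numpy as np

# Series system of two exponential components, lambda1 = 1, lambda2 = 2.
l1, l2 = 1.0, 2.0
t = np.linspace(0.01, 5.0, 500)

r = np.exp(-(l1 + l2) * t)                      # system reliability r(t)
R1_S = np.exp(-l2 * t)                          # R_1(S): system w.r.t. component 1
RS_1 = (1 - np.exp(-l1 * t)) / (1 - r)          # R_S(1): component 1 w.r.t. system
rho1 = np.sqrt(np.exp(-l2 * t) * (1 - np.exp(-l1 * t)) / (1 - r))
rho2 = np.sqrt(np.exp(-l1 * t) * (1 - np.exp(-l2 * t)) / (1 - r))

# Limiting behavior noted in the text: R1_S -> 0, RS_1 -> 1, both rho's -> 0.
print(R1_S[-1], RS_1[-1], rho1[-1], rho2[-1])
```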
3.2 A system in parallel. Assume again both components have life times exponentially distributed with parameters λ1 and λ2. Then the system reliability function at any time instant t (this is the event B) equals r(t) = 1 − (1 − e^{−λ1 t})(1 − e^{−λ2 t}), and the probability that component 1 functions (this is the event A) is e^{−λ1 t}. Applying the rules we obtain:
The regression coefficient of the system with respect to component 1 is then

R_1(S) = [e^{−λ1 t} − r(t)e^{−λ1 t}] / (e^{−λ1 t}[1 − e^{−λ1 t}]) = 1 − e^{−λ2 t}.
Analogously we evaluate the regression coefficient of component 1 with respect to the system at time t. It is given by the relation

R_S(1) = [e^{−λ1 t} − r(t)e^{−λ1 t}] / (r(t)[1 − r(t)]) = e^{−λ1 t} / [1 − (1 − e^{−λ1 t})(1 − e^{−λ2 t})].
And the correlation coefficients between the system reliability and the component reliabilities change during the time according to the relations

ρ_{S,1}(t) = √( e^{−λ1 t}(1 − e^{−λ2 t}) / [1 − (1 − e^{−λ1 t})(1 − e^{−λ2 t})] ),

ρ_{S,2}(t) = √( e^{−λ2 t}(1 − e^{−λ1 t}) / [1 − (1 − e^{−λ1 t})(1 − e^{−λ2 t})] ).
Notice that all dependences are positive. Graphs of these functions of local dependence in time for λ1 = 1 and λ2 = 2 are shown in the next figures: the first represents the two regression coefficients superimposed on the same graph, and the second represents the two correlation coefficients.
We see that the local correlation measure of dependence between the system reliability and a component approaches 1 for the strongest component 1, and approaches 0 for the weakest component, as time increases. At the same time both regression coefficients between the system and the strongest component approach 1 as time grows. A quick Monte Carlo check of these closed forms is sketched below.
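As a sanity check (ours, not the paper's), the parallel-system formulas can be verified by Monte Carlo simulation of the exponential lifetimes at a fixed time t; all numeric choices below are our own assumptions.

```python
import numpy as np

# Monte Carlo check of the parallel-system regression coefficients at time t.
rng = np.random.default_rng(1)
l1, l2, t, n = 1.0, 2.0, 0.7, 1_000_000
T1 = rng.exponential(1 / l1, n)                 # lifetime of component 1
T2 = rng.exponential(1 / l2, n)                 # lifetime of component 2

A = T1 > t                                      # component 1 still works at t
S = (T1 > t) | (T2 > t)                         # parallel system works at t

pA, pS, pAS = A.mean(), S.mean(), (A & S).mean()
R1_S_hat = (pAS - pA * pS) / (pA * (1 - pA))    # estimate of R_1(S)
RS_1_hat = (pAS - pA * pS) / (pS * (1 - pS))    # estimate of R_S(1)

r = 1 - (1 - np.exp(-l1 * t)) * (1 - np.exp(-l2 * t))
print(R1_S_hat, 1 - np.exp(-l2 * t))            # theory: R_1(S) = 1 - e^{-lambda2 t}
print(RS_1_hat, np.exp(-l1 * t) / r)            # theory: R_S(1) = e^{-lambda1 t} / r(t)
```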
3.3. Categorical variables
We see the most interesting and valuable applications in Big Data analysis in the study of local dependences between non-numeric variables, as well as between non-numeric and numeric variables. Since the analysis in this kind of study is quite similar, we point here only to an example of local dependence between the categories of two non-numeric random variables. It is just an illustration of the proposed measures of dependence between random events: we analyzed an example from the book of Agresti (2006), and this illustration can be seen in Dimitrov (2010).
3.4. A challenging idea in modeling dependent variables
Modeling dependence in multivariate distributions always has been, and still is, a hot topic in applied probability, statistics and risk studies. One of the most popular approaches to modeling dependence is known as the Farlie-Gumbel-Morgenstern (FGM) model. It uses a construction of bivariate distributions as a mixture of two or more marginal distributions. The main disadvantage of this approach is that it produces multivariate distributions with a limited magnitude of the correlation coefficient ρ_{X,Y}: the original construction gives ρ_{X,Y} within [−.32, .32]. Some later generalizations (Bekrizadeh et al., 2012) expanded this range to [−.5, +.43]. Other approaches, based on copula constructions (Joe, 1997; Nelsen, 2006), offer constructions of dependent multivariate distributions with desired marginals. Most of these constructions use purely analytical instrumentation, where one can reach the goal but loses the meaning.
In this subsection we offer a construction based on dependence between the two components of a random vector due to the presence of a common random component in each. In our opinion, such models are of interest in reliability and risk modeling, where competing risks are present and have a realistic meaning, and each risk is represented by a r.v. independent of the others. We illustrate this approach on a very particular bivariate dependence, where the components are indicator variables.
Let U, V and W be independent one-dimensional r.v.'s. Consider the following constructions:
A) X = min(U, W); Y = min(V, W), and the pair (X, Y);
B) X = max(U, W); Y = max(V, W), and the pair (X, Y);
C) X = min(U, W); Y = max(V, W), and the pair (X, Y);
D) X = U + W; Y = V + W, and the pair (X, Y).
Other algebraic operations may also be used in similar constructions. Obviously, the components of each pair are dependent due to the presence of one and the same component W in both. The good thing here is that we see the interaction between X and Y, and one may use any distributions for the original risks U, V and W. Our goals here are to find the correlation coefficients in each of the above four constructions, and also to investigate the local correlation structure between X and Y in the light of the proposed measures of the strength of local dependence explored recently (Dimitrov 2010, 2015). Actually, we will use the measure ρ_{A,B} defined in Definition 2. We start at the grass roots, considering the examples where U, V and W have the simplest Bernoulli distributions with parameters p_i, i = 1, 2, 3, or have the Uniform distribution on [0, 1].
Common distribution pattern of U, V, W:

value       0               1
f_i(.)      q_i = 1 − p_i   p_i
Everyone knows that the expected value and the standard deviation of a Bernoulli distributed r.v. are E(U) = p and σ_U = √(pq).
3.4.1 Minimum-Minimum competing risks
Elementary combinatorial considerations show that the joint distribution of the random vector (X, Y) is presented by the following table.
Table 1. Joint distribution of (X, Y)

X \ Y     0                          1          f_X(.)
0         1 − p3(p1 + p2 − p1p2)     q1p2p3     1 − p1p3
1         p1q2p3                     p1p2p3     p1p3
f_Y(.)    1 − p2p3                   p2p3       1
On the margins are the marginal distributions of the components X and Y; each is also a Bernoulli distributed r.v. This fact simplifies the calculation of the correlation coefficient ρ_{X,Y}, using the shortcut rule

ρ_{X,Y} = [E(XY) − E(X)E(Y)] / (σ_X σ_Y)

and the results above about Bernoulli distributed r.v.'s. After several algebraic manipulations we arrive at the expression
ρ_{X,Y} = q3 √( p1p2 / [(1 − p1p3)(1 − p2p3)] ).
A brief analysis of this expression shows that this correlation coefficient can take any value between 0 (when q3 is close to 0, and p1, p2 are small) and 1 (when q3 is close to 1, and so are p1, p2). Hence, depending on the probabilities p1, p2 and p3, any correlation between X and Y is feasible.
In particular, if U, V and W are identically distributed with common parameter p, then

ρ_{X,Y} = p/(1 + p).

This time the correlation coefficient may take values only between 0 and 0.5. Of course, one may get negative correlations of the same size by replacing the second component Y with its negative −Y. A numeric check of these expressions is sketched below.
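A small numerical check (ours) confirms that the general expression reduces to p/(1 + p) when p1 = p2 = p3 = p:

```python
from math import sqrt, isclose

def rho_minmin(p1, p2, p3):
    # Global correlation for construction (A), as derived above.
    q3 = 1 - p3
    return q3 * sqrt(p1 * p2 / ((1 - p1 * p3) * (1 - p2 * p3)))

for p in (0.1, 0.5, 0.9):
    assert isclose(rho_minmin(p, p, p), p / (1 + p), rel_tol=1e-12)
print("rho(p, p, p) = p/(1+p) confirmed for the tested values")
```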
Local dependence magnitudes.
Now let us see the strength of dependence of the event {Y=0} with respect to the event {X=0}; that is, we may predict the event {Y=0} knowing that {X=0} occurred, by making use of the relations above. First we determine the local regression coefficient using Definition 1 and the data in Table 1. We get

R_{X=0}(Y=0) = p2q3 / (1 − p1p3).
Further considerations show that if we know the individual parameters of the variables U, V and W, and know that the event {X=0} occurred, then our prediction of the event {Y=0} is given by the posterior probability

P(Y=0 | X=0) = 1 − q1p2p3/(1 − p1p3).

Due to the positive dependence, the prediction probability increases with the information about the known value of the component X.
Not going into detailed explanations, we get

R_{X=0}(Y=1) = −p2q3 / (1 − p1p3);  P(Y=1 | X=0) = q1p2p3/(1 − p1p3).
As we expect, we have P(Y=0 | X=0) + P(Y=1 | X=0) = 1.
Let us see the local strength of dependence of the event {Y=0} with respect to the event {X=1}; that is, we may predict the event {Y=0} knowing that {X=1} occurred. First we determine the local regression coefficient using Definition 1 and the data in Table 1. We get

R_{X=1}(Y=0) = −p2q3 / (1 − p1p3).
This negative regression coefficient indicates that the chances of the event {Y=0} to happen decrease if it is known that the event {X=1} occurred. The equivalent of the Bayes posterior probability rule is now valid:

P(Y=0 | X=1) = 1 − p2p3 − p2q3 = 1 − p2.
Similarly, we determine R_{X=1}(Y=1) and the respective posterior probability:

R_{X=1}(Y=1) = p2q3 / (1 − p1p3);  P(Y=1 | X=1) = p2p3 + p2q3 = p2.
Comparing all four results, we observe complete symmetry in regard to the local dependence strengths: positive regression coefficients for the same results in Y as in X, and negative ones (of the same magnitude) for the opposite results. So to say, the two risks support each other in the sense that they act in the same direction.
Since the constructions are symmetric, the regression coefficients R_{Y=j}(X=i), for i, j = 0, 1, in the relationships above can be found from the same expressions, keeping p3 and q3 as they are and interchanging the indices of p1, q1 and p2, q2. We skip the details, but give a numeric example for construction (A) with p1 = .3, p2 = .6 and p3 = .9, with the calculated correlation coefficient (as a measure of global dependence), the regression coefficients (as measures of local dependence), and the posterior probabilities for each variable. For comparison, the same characteristics can be calculated for the combination of numeric parameters p1 = .3, p2 = .6 and p3 = .1. We have

ρ_{X,Y}(p1 = .3, p2 = .6, p3 = .9) = 0.0732143;
Table 2. Joint distribution of (X, Y) for p1 = .3, p2 = .6, p3 = .9

X \ Y     0        1        f_X(.)
0         .352     .378     .73
1         .108     .162     .27
f_Y(.)    .46      .54      1
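Table 2 and the value of ρ_{X,Y} can be reproduced by direct enumeration of the Bernoulli triple (U, V, W); the following Python sketch (ours) does exactly that for construction (A):

```python
from itertools import product
from math import sqrt

# Enumerate all 8 outcomes of (U, V, W) for p1 = .3, p2 = .6, p3 = .9.
p = {'U': 0.3, 'V': 0.6, 'W': 0.9}
joint = {(i, j): 0.0 for i in (0, 1) for j in (0, 1)}
for u, v, w in product((0, 1), repeat=3):
    prob = 1.0
    for name, val in zip('UVW', (u, v, w)):
        prob *= p[name] if val else 1 - p[name]
    joint[(min(u, w), min(v, w))] += prob     # X = min(U, W), Y = min(V, W)

print(joint)   # expected: {(0,0): .352, (0,1): .378, (1,0): .108, (1,1): .162}

ex = sum(i * pr for (i, j), pr in joint.items())   # E(X) = P(X=1)
ey = sum(j * pr for (i, j), pr in joint.items())   # E(Y) = P(Y=1)
exy = joint[(1, 1)]                                # E(XY)
rho = (exy - ex * ey) / sqrt(ex * (1 - ex) * ey * (1 - ey))
print(rho)     # expected: 0.0732143...
```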
The next graphs show surfaces within the cube [0, 1] × [0, 1] × [0, 1] where combinations of the values p1, p2 and p3 produce correlation coefficients of equal value.

[Figure: Minimum-Minimum competing risks. Level surfaces of the global correlation coefficient.]

In the next illustrations we do not give a detailed numerical analysis, and show just summary graphs similar to this one.
3.4.2 Maximum-Maximum competing risks

[Figure: Max-Max competing risks. Level surfaces of the global correlation coefficient.]
3.4.3 Minimum-Maximum competing risks
3.4.4 Sums of competing risks

Here we consider the configuration X = U + W, Y = V + W, and the pair (X, Y). Now each of X and Y takes the values 0, 1, 2.

Table 2.4. Joint distribution of (X, Y)

X \ Y     0          1                    2          f_X(.)
0         q1q2q3     q1p2q3               0          q1q3
1         p1q2q3     p1p2q3 + q1q2p3      q1p2p3     p1q3 + q1p3
2         0          p1q2p3               p1p2p3     p1p3
f_Y(.)    q2q3       p2q3 + q2p3          p2p3       1

Here Cov(X, Y) = Cov(U + W, V + W) = Var(W) = p3q3, while Var(X) = p1q1 + p3q3 and Var(Y) = p2q2 + p3q3, so the global correlation coefficient is

ρ_{X,Y} = p3q3 / √[(p1q1 + p3q3)(p2q2 + p3q3)].
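Since Cov(X, Y) = Var(W) here, the closed form is easy to verify by simulation; the following sketch (ours) compares the empirical correlation with the formula above, for parameter values of our choosing.

```python
import numpy as np

# Monte Carlo check of the sums construction X = U + W, Y = V + W.
rng = np.random.default_rng(2)
p1, p2, p3, n = 0.3, 0.6, 0.9, 1_000_000
u = rng.binomial(1, p1, n)
v = rng.binomial(1, p2, n)
w = rng.binomial(1, p3, n)
x, y = u + w, v + w                               # X, Y take values 0, 1, 2

var = lambda pk: pk * (1 - pk)                    # Bernoulli variance p*q
theory = var(p3) / np.sqrt((var(p1) + var(p3)) * (var(p2) + var(p3)))
print(np.corrcoef(x, y)[0, 1], theory)            # the two should nearly agree
```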
[Figures: Sums of competing risks. Level surfaces of the correlation coefficient ρ(X, Y) at several fixed levels.]
Conclusions
We extended our previous study of local dependence between random events to measures of local dependence between random variables. This turns into a study of the local dependence on a rectangle where interval values of the random variables meet. These local dependences are universally valid and can be extended to higher dimensions. As illustrations, we considered local dependences in reliability systems. The numerical illustrations can be graphically visualized, and show that local dependence is essentially different on different areas of the field. The graphics invite many more comments and further thoughts. Our expectation is that the analysis of Big Data sets will be enriched by the inclusion of our approach into its system tools. An excellent example of this approach can be seen in Dimitrov and Esa (2018).
We also discussed four models for constructing dependence between two random variables (X, Y), built on three independent Bernoulli distributed r.v.'s U, V and W with different parameters. These models produce correlation coefficients in different ranges. These ranges are shown on correlation level surfaces in the space of the probabilities of success of the Bernoulli variables used in the models. Local dependences between the values of X and Y are studied via the magnitudes of the correlation coefficient. Their numerical values are presented for particular combinations of the parameters, and graphs of some level surfaces are shown. We are sure that using other particular distributions of the components, different from the Bernoulli ones, may lead to more interesting and useful results.
References
[1] A. Agresti (2006). Categorical Data Analysis, New York, John Wiley & Sons.
[2] I. Bairamov, S. Kotz (2002), Dependence structure and symmetry of Huang-Kotz FGM distributions and their extensions, Metrika 56, pp. 55-72.
[3] N. Blomqvist (1950), On a measure of dependence between two random variables, Annals of Mathematical Statistics 21, pp. 593-600.
[4] J. F. Carriere (2004), Copulas, Encyclopedia of Actuarial Science, Vol. 1-3, Wiley, New York, NY.
[5] B. Dimitrov (2014), Dependence between random events and its use, Proceedings of the FISC Conference booklet, pp. 119-132.
[6] B. Dimitrov (2013), Measures of dependence in reliability, Proceedings of the 8th MMR'2013, Stellenbosch, South Africa, July 1-4, pp. 65-69.
[7] B. Dimitrov (2010), Some Obreshkov measures of dependence and their use, Comptes Rendus de l'Academie Bulgare des Sciences, v. 63, No 1, pp. 15-18.
[8] B. Dimitrov and S. Esa (2018), Interval dependence structures of two bivariate distributions in risk and reliability, RT&A, v. 13, No 1, pp. 28-38.
[9] D. Drouet-Mari, S. Kotz (2001), Correlation and dependence, Imperial College Press, London.
[10] F. Durante (2006), A new class of symmetric bivariate copulas, Nonparametric Stat. 8, pp. 499-510.
[11] F. Durante, P. Jaworski, A new characterization of bivariate copulas, AMS Subject Classification: 60E05, 62H20, 26A24, 2009.
[12] S. Esa and B. Dimitrov (2016), Dependence structures in politics and in reliability, In Proceedings of the Second International Symposium on Stochastic Models in Reliability Engineering, Life Science and Operations Management (SMRLO'16), I. Frenkel and A. Lisnianski (eds.), Beer Sheva, Israel, February 15-18, pp. 318-322, IEEE CPS, 978-1-4673-9941-8/16.
[13] S. Esa and B. Dimitrov (2013), Dependencies in the world of politics, Proceedings of the 8th MMR'2013, Stellenbosch, South Africa, July 1-4, pp. 70-73.
[14] S. Esa and B. Dimitrov (2013), Survival models in some political processes, Risk Analysis and Applications (RT&A), v. 8, No 3 (30), pp. 97-102.
[15] D. G. J. Farlie, The performance of some correlation coefficients for a general Bivariate distribution, Biometrika 47(1960), 307-323.
[16] H. Joe, Multivariate models and dependence concepts, Chapman & Hall, London, 1997.
[17] J. S. Huang, S. Kotz (1999), Modifications of the Farlie-Gumbel-Morgenstern distributions. A tough hill to climb, Metrika 49, pp. 135-145.
[18] J. M. Kim and E. A. Sungur, New class of bivariate copulas, In Proceedings for the Spring Conference of Korean Statistical Society, pages 207-212, 2004.
[19] D. Morgenstern, Einfache beispiele zweidimensionaler Verteilungen. Mitteilungsblatt für Mathematische Statistik 8(1956), 234-235.
[20] R. B. Nelsen, An introduction to copulas, Springer Series in Statistics. Springer, New York, second edition, 2006.
[21] R. B. Nelsen (1994), Characterization of the Farlie-Gumbel-Morgenstern distribution by the property of the correlation coefficient, Sankhyā A, 56, pp. 476-479.