CC BY

FAST AND ROBUST BIVARIATE CONTROL CHARTS FOR INDIVIDUAL OBSERVATIONS

Sajesh T A

Department of Statistics, St. Thomas College (Autonomous), Thrissur, Kerala, India

*[email protected]

Abstract

There are various circumstances in which it is important to monitor or control two or more related quality characteristics simultaneously. Tracking these characteristics independently can be quite misleading. Hotelling's $T^2$ chart, in which the $T^2$ statistics are computed from the classical estimates of location and scatter, is the best-known multivariate process monitoring and control approach. It is well known that outliers in a data set have a significant impact on classical estimators: any statistic computed from the classical estimates can be distorted by even a single outlier. This study investigates this non-robustness and proposes four robust bivariate control charts based on the robust Gnanadesikan-Kettenring estimator. Four highly robust scale estimators with the best possible breakdown point, namely the Qn estimator, Sn estimator, MAD estimator, and τ estimator, are used to robustify the Gnanadesikan-Kettenring estimator. The performance of the proposed control charts is assessed through a Monte Carlo simulation and a real-life data set. All four techniques outperform the traditional method and offer greater computational efficiency.

Keywords: Gnanadesikan-Kettenring estimator, Qn estimator, Sn estimator, MAD, τ estimator.

1. Introduction

Bivariate control charts are specifically designed for situations where two variables are observed simultaneously. They enable the detection of patterns or trends that signal a shift or alteration in the process. A control chart is constructed in two separate phases, Phase I and Phase II [1]. In Phase I, historical data are used to determine control limits, estimate the unknown parameters of the in-control process, and evaluate the stability of the process. In Phase II, the parameter estimates and control limits obtained in Phase I are applied to data gathered during actual production in order to detect deviations or out-of-control signals. The goal of Phase II is to monitor the process and maintain its stability in accordance with the defined control limits.

The most frequently used multivariate control chart for monitoring the variability of a multivariate industrial process is Hotelling's $T^2$ control chart. Let $x_i$, $i = 1, 2, \ldots, n$, be a two-dimensional vector of measurements made on a process at time period $i$. Then, for the sample $x = \{x_1, x_2, \ldots, x_n\}$, Hotelling's $T^2$ statistic is defined as

$$T^2(x_i) = (x_i - \theta)^T \Sigma^{-1} (x_i - \theta). \tag{1}$$

It is assumed that, when the process is in statistical control, the $x_i$'s are independent bivariate normal random vectors with mean vector $\theta$ and covariance matrix $\Sigma$. If both $\theta$ and $\Sigma$ are known, $T^2(x_i)$ follows a chi-square distribution with 2 degrees of freedom. When the population parameters $\theta$ and $\Sigma$ are unknown, the $T^2$ statistic is constructed using the classical estimators of the mean vector ($\bar{x}$) and covariance matrix ($S$):

$$T^2(x_i) = (x_i - \bar{x})^T S^{-1} (x_i - \bar{x}), \tag{2}$$

where $\bar{x} = (\bar{x}_1, \bar{x}_2)^T$ is the sample mean vector and

$$S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}, \qquad S_{12} = \mathrm{Cov}(X_1, X_2).$$

When control charts are constructed from Phase I data, the classical estimators, which are highly susceptible to outliers, can produce inaccurate results (the so-called masking problem). Several methods have been proposed in the literature to lessen the negative effects of outliers. These alternative methods replace the classical estimators with robust estimators, which are more resistant to outliers, making the control charts more robust and reliable.

This paper proposes a robust bivariate control chart that can effectively handle spurious outliers. The proposed control chart makes use of the covariance estimator introduced by Gnanadesikan and Kettenring [2], which is based on the identity

$$\mathrm{cov}(X, Y) = \frac{1}{4}\left(\sigma(X + Y)^2 - \sigma(X - Y)^2\right), \tag{3}$$

where $\sigma$ denotes the standard deviation and $X$, $Y$ is a pair of random variables. Replacing $\sigma$ with a robust scale estimator robustifies the Gnanadesikan and Kettenring (GK) estimator. The robust GK estimator is defined as

$$\mathrm{cov}_R(X, Y) = \frac{1}{4}\left(s_r(X + Y)^2 - s_r(X - Y)^2\right), \tag{4}$$

where $s_r$ is a robust scale estimator. This study considers four robust scale estimators with an optimal breakdown point to robustify the GK estimator. The resulting robust GK estimators are then used to construct bivariate control charts. The lower control limits (LCL) of these charts are set to zero, and the upper control limits (UCL) are estimated by fitting curves to simulated quantiles. The performance of these control charts is examined and compared with that of the classical bivariate control chart through Monte Carlo simulation.
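The GK construction can be illustrated with a short sketch. It is not the author's implementation: the function names `mad`, `gk_cov` and `gk_cov_matrix` are hypothetical, and the MAD (defined in Section 2) is used as the default robust scale purely for illustration. Passing the ordinary standard deviation as the scale recovers the classical covariance, which is exactly the identity the estimator is built on.

```python
import numpy as np

def mad(x, b=1.4826):
    """Median absolute deviation about the median, scaled by b for Gaussian consistency."""
    x = np.asarray(x, dtype=float)
    return b * np.median(np.abs(x - np.median(x)))

def gk_cov(x, y, scale=mad):
    """GK covariance: (scale(x + y)^2 - scale(x - y)^2) / 4 for any scale function."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 0.25 * (scale(x + y) ** 2 - scale(x - y) ** 2)

def gk_cov_matrix(data, scale=mad):
    """2x2 scatter matrix: squared scales on the diagonal, GK covariance off the diagonal."""
    data = np.asarray(data, dtype=float)
    x, y = data[:, 0], data[:, 1]
    c = gk_cov(x, y, scale)
    return np.array([[scale(x) ** 2, c], [c, scale(y) ** 2]])

# With scale=np.std the identity returns the (population) sample covariance.
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = 0.5 * x + rng.standard_normal(200)
classical = gk_cov(x, y, scale=np.std)
robust = gk_cov(x, y)  # MAD-based version
```

Note that a matrix assembled this way is symmetric but is not guaranteed to be positive definite, so a practical implementation would need to check this before inverting it.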

2. Robust scale estimators

Robust estimates of scale are crucial in a variety of applications, from genuine scale problems to outlier identification, and as auxiliary quantities in more complicated analyses. Nevertheless, robust estimation of scale seems to be somewhat less widely accepted among the broad population of users of statistical methods. Previously, the interquartile range, which has a breakdown point of 25%, was the only robust scale estimator available in the majority of statistical software packages.

i) Median Absolute Deviation (MAD)

A preliminary or auxiliary estimate of scale is frequently required in robust estimation. A very robust scale estimator is the median absolute deviation about the median (MAD), given by

$$\mathrm{MAD} = b\,\mathrm{med}_i\left\{\left|x_i - \mathrm{med}_j(x_j)\right|\right\}, \tag{4}$$

where 'med' denotes the median. The MAD has the best possible breakdown point (50%, twice that of the interquartile range), and its influence function is bounded, with the sharpest possible bound among all scale estimators. The MAD was first promoted by Hampel [3], who attributed it to Gauss. The constant $b$ in the definition of the MAD is needed to make the estimator consistent for the parameter of interest. For the usual parameter $\sigma$ at Gaussian distributions, we need to set $b = 1.4826$. In spite of its many advantages, the MAD also has some drawbacks. Its efficiency at Gaussian distributions is very low: whereas the asymptotic efficiency of the sample median is still 64%, the MAD is only 37% efficient. It also takes a symmetric view of dispersion.

Sajesh T A RT&A, No 4 (76) ROBUST BIVARIATE CONTROL CHART_Volume 18, December 2023

ii) τ Estimator

Yohai and Zamar [4] introduced a new class of robust scale estimates, called τ estimates, which possess the optimal breakdown value. Let $\rho$ be a real function satisfying the following properties:

1. $\rho(0) = 0$.

2. $\rho(-u) = \rho(u)$.

3. $0 \le u \le v$ implies that $\rho(u) \le \rho(v)$.

4. $\rho$ is continuous.

5. Let $a = \sup \rho(u)$; then $0 < a < \infty$.

6. If $\rho(u) < a$ and $0 \le u < v$, then $\rho(u) < \rho(v)$.

Let $\rho_1$ and $\rho_2$ be two functions satisfying the above assumptions. Then, for a given sample $u = (u_1, \ldots, u_n)$, the τ estimate of scale is defined as

$$\tau_n^2(u) = s^2(u)\,\frac{1}{n}\sum_{i=1}^{n}\rho_2\!\left(\frac{u_i}{s(u)}\right), \tag{5}$$

where $s$ is an M-estimate of scale based on $\rho_1$. The τ estimate is asymptotically normal and has a bounded influence function. Maronna and Zamar [5] used this estimate to introduce a multivariate outlier detection technique, taking $s = \mathrm{MAD}$ and $\rho_2(x) = \min(x^2, c^2)$; the estimator possesses approximately 80% efficiency when $c = 3$. This study adopts the choices of Maronna and Zamar for $s$ and $\rho_2$.
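A minimal sketch of the τ scale with $s = \mathrm{MAD}$ and $\rho_2(x) = \min(x^2, c^2)$ may help to see how large deviations are capped. The function names are hypothetical, and centring the sample at its median is an assumption made here for simplicity; it is not taken from the text above.

```python
import numpy as np

def mad(x, b=1.4826):
    """Median absolute deviation about the median, scaled for Gaussian consistency."""
    x = np.asarray(x, dtype=float)
    return b * np.median(np.abs(x - np.median(x)))

def tau_scale(x, c=3.0):
    """tau scale: s * sqrt(mean(rho2(u / s))) with s = MAD and rho2(t) = min(t^2, c^2)."""
    x = np.asarray(x, dtype=float)
    u = x - np.median(x)          # assumption: centre the sample at its median
    s = mad(x)
    if s == 0.0:                  # degenerate sample (more than half the points coincide)
        return 0.0
    rho2 = np.minimum((u / s) ** 2, c ** 2)
    return s * np.sqrt(np.mean(rho2))

sample = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
tau = tau_scale(sample)           # the outlier contributes at most c^2 to the mean
```

Because $\rho_2$ is bounded, moving the outlier further away leaves the estimate unchanged, whereas the standard deviation of the same sample grows without bound.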

iii) Sn and Qn Estimator

To address the low efficiency of the MAD, Rousseeuw and Croux [6] introduced two robust scale estimators with the optimal breakdown value of 50%, namely the Sn estimator and the Qn estimator. The Sn estimator is defined as

$$S_n = c\,\mathrm{med}_i\left\{\mathrm{med}_j\,|x_i - x_j|\right\}. \tag{6}$$

The factor $c$ is for consistency, and its default value is 1.1926. Moreover, the asymptotic efficiency of Sn is 58.23%, which is much higher than that of the MAD. A drawback of the MAD and Sn is that their influence functions have discontinuities. The Qn estimator overcomes this drawback. It is defined as

$$Q_n = d\,\left\{|x_i - x_j|;\ i < j\right\}_{(k)}, \tag{7}$$

that is, $d$ times the $k$-th order statistic of the pairwise absolute differences, where $d$ is a constant factor and $k = \binom{h}{2} \approx \binom{n}{2}/4$, where $h = [n/2] + 1$ is roughly half the number of observations.

The estimator Qn shares the attractive properties of Sn: a simple and explicit formula, a definition that is equally suitable for asymmetric distributions, and a 50% breakdown point. In addition, its influence function is smooth, and its efficiency at Gaussian distributions is very high (about 82%). Rousseeuw and Croux [6] showed that, although Qn is more efficient, Sn is preferable in most applications because of its low gross-error sensitivity.
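Both definitions can be written down directly from Equations (6) and (7). The sketch below is a naive $O(n^2)$ illustration with hypothetical function names; it uses plain medians and omits the finite-sample correction factors and the fast algorithms of Rousseeuw and Croux [6]. The value $d = 2.2219$ is the Gaussian consistency constant reported in [6]; it is not stated in the text above.

```python
import numpy as np

def sn_scale(x, c=1.1926):
    """Naive Sn: c * med_i { med_j |x_i - x_j| } using plain medians."""
    x = np.asarray(x, dtype=float)
    diffs = np.abs(x[:, None] - x[None, :])       # n x n matrix of |x_i - x_j|
    return c * np.median(np.median(diffs, axis=1))

def qn_scale(x, d=2.2219):
    """Naive Qn: d times the k-th smallest |x_i - x_j|, i < j, with k = C(h, 2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = n // 2 + 1
    k = h * (h - 1) // 2
    i, j = np.triu_indices(n, k=1)                # all index pairs with i < j
    pair_diffs = np.sort(np.abs(x[i] - x[j]))
    return d * pair_diffs[k - 1]                  # k is 1-based in Equation (7)

sample = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
sn = sn_scale(sample)   # inner medians are 2, 1, 1, 2, 97, so the outer median is 2
qn = qn_scale(sample)   # n = 5 gives h = 3, k = 3; the third smallest difference is 1
```

Neither estimate is affected by the single extreme value in the example, which is the behaviour the 50% breakdown point describes.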

3. Proposed Bivariate Robust Control Charts

Let $\{x_1, \ldots, x_n\}$ be a set of Phase I data that follow a bivariate normal distribution with mean vector $\theta$ and covariance matrix $\Sigma$. Let $y \notin \{x_1, \ldots, x_n\}$ be a Phase II observation. Then it is known that

$$\frac{n(n-2)}{2(n+1)(n-1)}\,T^2(y) \sim F_{(2,\,n-2)}, \tag{8}$$

where $T^2(y)$ is the Hotelling statistic of Equation (1) computed with the classical estimates $\bar{x}$ and $S$, and $F_{(\nu_1, \nu_2)}$ is the F distribution with $(\nu_1, \nu_2)$ degrees of freedom [7]. Since this statistic is not robust to the presence of outliers, the proposed robust control charts use a robust Hotelling's $T^2$ statistic, denoted by $T_R^2$. It is obtained by replacing the classical estimates used in the computation of Hotelling's $T^2$ statistic with their robust counterparts. Let $x_{med}$ and $S_{GK}$ denote the component-wise median vector and the robust GK covariance matrix, respectively. We define a robust Hotelling's $T^2$ for $y$ based on these estimates by

$$T_R^2(y) = (y - x_{med})^T S_{GK}^{-1} (y - x_{med}), \tag{9}$$

where

$$S_{GK} = \begin{pmatrix} s_r(X_1)^2 & \mathrm{cov}_R(X_1, X_2) \\ \mathrm{cov}_R(X_1, X_2) & s_r(X_2)^2 \end{pmatrix}.$$

By applying Slutsky's theorem [8], the asymptotic distribution of $T_R^2$ is obtained as the chi-square distribution with 2 degrees of freedom: as $n \to \infty$,

$$(y - x_{med})^T S_{GK}^{-1} (y - x_{med}) \xrightarrow{D} (y - \theta)^T \Sigma^{-1} (y - \theta) \sim \chi^2_{(2)}. \tag{10}$$

This asymptotic distribution, however, holds only for large sample sizes. To determine the control limits of the proposed control charts, we use Monte Carlo simulations to estimate the quantiles of $T_R^2$ for different sample sizes and then fit a smooth curve to the quantiles as a function of the sample size. For modest Phase I sample sizes, these fits can be used to determine the appropriate control limits of the proposed control charts.

3.1. Estimation of Control Limits for the proposed control charts

Upper control limits of the proposed control charts are obtained by modelling the quantiles of $T_R^2$, for a given Phase I sample size $n$, computed from $N = 10000$ trials. Phase I samples are generated from the standard bivariate normal distribution $N_2(0, I)$, and the 95%, 99%, 99.73% and 99.9% quantiles of $T_R^2$ are computed using the Sn, Qn, MAD and τ estimates. In each trial, for each data set, we also generate a new random observation $y_i$ from $N_2(0, I)$ (treated as a Phase II observation) and calculate the corresponding $T_R^2(y_i)$ value using the robust GK covariance estimates. By inverting the empirical distribution function of $T_R^2(y_i)$, computed for $n = 10, 15, 20, \ldots, 500$, we obtain Monte Carlo estimates of the 95%, 99%, 99.73% and 99.9% quantiles.
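The simulation described above can be sketched as follows for a single sample size. This is an illustration under stated assumptions rather than the author's code: the function names are hypothetical, only the MAD-based GK chart is shown, and the number of trials is reduced from 10000 to keep the example fast.

```python
import numpy as np

def mad(x, b=1.4826):
    """Median absolute deviation about the median, scaled for Gaussian consistency."""
    x = np.asarray(x, dtype=float)
    return b * np.median(np.abs(x - np.median(x)))

def robust_t2(phase1, y, scale=mad):
    """Robust T^2 of one observation y: component-wise median plus GK scatter matrix."""
    x1, x2 = phase1[:, 0], phase1[:, 1]
    cov12 = 0.25 * (scale(x1 + x2) ** 2 - scale(x1 - x2) ** 2)
    s_gk = np.array([[scale(x1) ** 2, cov12], [cov12, scale(x2) ** 2]])
    diff = y - np.median(phase1, axis=0)
    return float(diff @ np.linalg.solve(s_gk, diff))

def simulate_quantile(n, level=0.99, trials=2000, seed=0):
    """Monte Carlo estimate of the `level` quantile of the robust T^2 for Phase I size n."""
    rng = np.random.default_rng(seed)
    stats = np.empty(trials)
    for t in range(trials):
        phase1 = rng.standard_normal((n, 2))   # in-control Phase I sample from N2(0, I)
        y = rng.standard_normal(2)             # independent Phase II observation
        stats[t] = robust_t2(phase1, y)
    return float(np.quantile(stats, level))

q99_n50 = simulate_quantile(50)
```

Repeating the call over a grid of sample sizes gives the points to which a curve in $n$ can then be fitted.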

Scatter plots of the empirical quantiles of $T_R^2(y_i)$ versus the sample size $n$ suggest that the quantiles can be modelled by a family of regression curves of the form $f(n) = a + b/n^c$. Scatter plots of the empirical 99% quantiles of $T_R^2(y_i)$ computed using the four robust scale estimates are shown in Figure 1. Since the $T_R^2$ statistic asymptotically follows the $\chi^2_{(2)}$ distribution, the following two-parameter family of curves is used for all the robust GK control charts:

$$f_{1-\alpha}(n) = \chi^2_{(2,\,1-\alpha)} + \frac{b_{1-\alpha}}{n^{c_{1-\alpha}}}, \tag{11}$$

where $\chi^2_{(2,1-\alpha)}$ is the $1 - \alpha$ quantile of the $\chi^2$ distribution with 2 degrees of freedom, and $b_{1-\alpha}$ and $c_{1-\alpha}$ are constants for the overall false alarm probability $\alpha$. Fitting this curve to the data allows us to estimate the desired upper control limits of the proposed control charts for any Phase I sample of size $n$. Note that, as $n$ increases, $f_{1-\alpha}(n)$ approaches $\chi^2_{(2,1-\alpha)}$. Table 1 gives the least-squares estimates of the parameters $b_{1-\alpha}$ and $c_{1-\alpha}$. Using Table 1 and Equation (11), we can compute the 95%, 99%, 99.73% and 99.9% quantiles of $T_R^2(y_i)$ for a Phase I sample of size $n$. The regression curves given by Equation (11) fit well in all the cases in Table 1, yielding $R^2$ values of at least 90.7%.
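As a worked example, the sketch below evaluates an upper control limit from the Table 1 estimates for $\alpha = 0.01$. It assumes the fitted curve has the form $\chi^2_{(2,1-\alpha)} + b_{1-\alpha}/n^{c_{1-\alpha}}$, which is how the garbled regression formula is read here; the function and dictionary names are hypothetical. For 2 degrees of freedom the chi-square quantile has the closed form $-2\ln\alpha$, so no statistics library is needed.

```python
import math

# Table 1 estimates (b, c) for alpha = 0.01, keyed by chart type.
TABLE1_ALPHA_001 = {
    "GK(Qn)": (13120.0, 2.451),
    "GK(Sn)": (1281.0, 1.598),
    "GK(MAD)": (5496.0, 1.758),
    "GK(tau)": (1145.0, 1.632),
}

def chi2_upper_quantile_2df(alpha):
    """(1 - alpha) quantile of chi-square with 2 df; its CDF is 1 - exp(-x / 2)."""
    return -2.0 * math.log(alpha)

def upper_control_limit(n, alpha, b, c):
    """UCL under the assumed curve: chi-square quantile plus b / n**c."""
    return chi2_upper_quantile_2df(alpha) + b / n ** c

b, c = TABLE1_ALPHA_001["GK(Sn)"]
ucl_small = upper_control_limit(50, 0.01, b, c)    # noticeably above the asymptotic 9.21
ucl_large = upper_control_limit(500, 0.01, b, c)   # much closer to the asymptotic value
```

The correction term shrinks as $n$ grows, so the limit decreases towards the asymptotic chi-square quantile.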

Figure 1: Simulated Quantiles of $T_R^2$ and the Fitted Curves for $\alpha = 0.01$

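Table 1: The Least-Squares Estimates of the Regression Parameters $b_{1-\alpha}$ and $c_{1-\alpha}$ for Confidence Levels $1 - \alpha$ = 0.99, 0.95, 0.9973, 0.999 and 0.90.

| α | Parameter | GK(Qn) | GK(Sn) | GK(MAD) | GK(τ) |
|---|---|---|---|---|---|
| 0.01 | $b_{1-\alpha}$ | 13120.00 | 1281.00 | 5496.00 | 1145.00 |
| 0.01 | $c_{1-\alpha}$ | 2.451 | 1.598 | 1.758 | 1.632 |
| 0.01 | $R^2$ | 0.983 | 0.963 | 0.985 | 0.954 |
| 0.05 | $b_{1-\alpha}$ | 228.80 | 249.50 | 462.60 | 324.20 |
| 0.05 | $c_{1-\alpha}$ | 1.437 | 1.399 | 1.373 | 1.447 |
| 0.05 | $R^2$ | 0.916 | 0.930 | 0.976 | 0.942 |
| 0.0027 | $b_{1-\alpha}$ | 120600.00 | 11440000.00 | 721800.00 | 2610.00 |
| 0.0027 | $c_{1-\alpha}$ | 2.945 | 4.66 | 3.22 | 1.656 |
| 0.0027 | $R^2$ | 0.987 | 0.990 | 0.990 | 0.907 |
| 0.001 | $b_{1-\alpha}$ | 556600.00 | 26090.00 | 41240.00 | 3973.00 |
| 0.001 | $c_{1-\alpha}$ | 3.526 | 1.947 | 1.769 | 1.826 |
| 0.001 | $R^2$ | 0.968 | 0.920 | 0.950 | 0.930 |
| 0.1 | $b_{1-\alpha}$ | 31530.00 | 248.30 | 219.4 | 118.60 |
| 0.1 | $c_{1-\alpha}$ | 1.65 | 1.574 | 1.31 | 1.281 |
| 0.1 | $R^2$ | 0.951 | 0.919 | 0.967 | 0.947 |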

3.2. Computational algorithm for obtaining the proposed control chart

A step-by-step approach for constructing the proposed robust control chart is as follows:

Phase I

1. Select the confidence level $1 - \alpha$ and the sample size $n$.

2. Collect the Phase I data $\{x_1, x_2, \ldots, x_n\}$ at predetermined periodic intervals and compute robust estimates of the location and scale parameters from these data.

3. Use Table 1 to select the least-squares estimates of the parameters $b_{1-\alpha}$ and $c_{1-\alpha}$ for the desired $\alpha$, then use Equation (11) to determine the upper control limit.

Phase II

4. Compute $T_R^2$ for each new observation as per Equation (9) and plot it on a control chart with the control limits determined in Phase I.

5. Analyse the chart and look for any out-of-control points or non-random patterns.
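The steps above can be sketched end to end as follows. This is a minimal illustration, not the author's implementation: `fit_phase1` and `monitor_phase2` are hypothetical names, the MAD is used as the robust scale, and the control limit is assumed to take the form chi-square quantile plus $b/n^c$ with the Table 1 values for GK(MAD) at $\alpha = 0.01$.

```python
import math
import numpy as np

def mad(x, b=1.4826):
    """Median absolute deviation about the median, scaled for Gaussian consistency."""
    x = np.asarray(x, dtype=float)
    return b * np.median(np.abs(x - np.median(x)))

def fit_phase1(phase1, alpha, b, c, scale=mad):
    """Steps 1-3: robust centre, inverse GK scatter and UCL (assumed curve form)."""
    phase1 = np.asarray(phase1, dtype=float)
    n = phase1.shape[0]
    x1, x2 = phase1[:, 0], phase1[:, 1]
    cov12 = 0.25 * (scale(x1 + x2) ** 2 - scale(x1 - x2) ** 2)
    s_gk = np.array([[scale(x1) ** 2, cov12], [cov12, scale(x2) ** 2]])
    ucl = -2.0 * math.log(alpha) + b / n ** c    # chi-square(2) quantile is -2 ln(alpha)
    return np.median(phase1, axis=0), np.linalg.inv(s_gk), ucl

def monitor_phase2(new_obs, center, s_inv, ucl):
    """Step 4: robust T^2 for each new observation and its out-of-control flag."""
    diff = np.atleast_2d(np.asarray(new_obs, dtype=float)) - center
    t2 = np.einsum("ij,jk,ik->i", diff, s_inv, diff)
    return t2, t2 > ucl

rng = np.random.default_rng(1)
phase1 = rng.standard_normal((100, 2))           # stand-in for historical in-control data
center, s_inv, ucl = fit_phase1(phase1, alpha=0.01, b=5496.0, c=1.758)
t2, flags = monitor_phase2([[0.1, -0.2], [6.0, 6.0]], center, s_inv, ucl)
```

In this toy run the first new observation lies near the centre and stays below the limit, while the second is far from the bulk of the Phase I data and is flagged.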

4. Performance of the Proposed Control Charts

We carried out numerous simulations to assess and compare the efficiency of the proposed control charts under normal and contaminated situations. The efficacy of a control chart is assessed by its ability to detect changes in the process behaviour and by its rate of false detections when different estimators are used in Phase I. We measure performance by two quantities computed from the Phase II data over 1000 replications: the success rate (SR), which is the proportion of statistic values for out-of-control observations that exceed the control limit and thus estimates the probability of detecting a change, and the false alarm rate (FAR), which is the proportion of in-control observations that are falsely flagged. Phase I data are generated using the following contaminated model:

$$(1 - \varepsilon)\,N_2(\theta_0, \Sigma_0) + \varepsilon\,N_2(\theta_1, \Sigma_1),$$

where $\varepsilon$ is the proportion of outliers, $\theta_0$ and $\Sigma_0$ are the in-control parameters, and $\theta_1$ and $\Sigma_1$ are the out-of-control parameters of location and scatter. Without loss of generality, $\theta_0$ is set to the zero vector. We generated Phase I data sets for $n$ = 25, 50, 100 and 1000, $\varepsilon$ = 0, 0.1 and 0.2, and $\alpha$ = 0.001, 0.0027, 0.01, 0.05 and 0.1. The probability of detecting a change depends on the values of $\theta_1$, $\Sigma_0$ and $\Sigma_1$, and hence we consider three different contaminated models in which these values vary.
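Sampling from such a contaminated model can be sketched as below. The function name is hypothetical, and each observation is assumed to be contaminated independently with probability $\varepsilon$; the study may instead have fixed the number of outliers exactly, which the text does not specify.

```python
import numpy as np

def sample_contaminated(n, eps, theta0, sigma0, theta1, sigma1, rng):
    """Draw n points from (1 - eps) N2(theta0, sigma0) + eps N2(theta1, sigma1)."""
    is_outlier = rng.random(n) < eps                      # Bernoulli(eps) labels (assumption)
    clean = rng.multivariate_normal(theta0, sigma0, size=n)
    shifted = rng.multivariate_normal(theta1, sigma1, size=n)
    data = np.where(is_outlier[:, None], shifted, clean)  # pick one draw per row
    return data, is_outlier

rng = np.random.default_rng(0)
identity = np.eye(2)
data, is_outlier = sample_contaminated(
    1000, 0.1, [0.0, 0.0], identity, [10.0, 10.0], identity, rng
)
```

Returning the outlier labels alongside the data makes it easy to check afterwards which observations a chart should and should not flag.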

Case A: Independent Variables


In this case, the two variables (quality characteristics) $X_1$ and $X_2$ are assumed to be independent. The contaminated normal model considered is as follows:

$$(1 - \varepsilon)\,N_2(0, I_2) + \varepsilon\,N_2(\theta_1, I_2),$$

where $I_2$ is the identity matrix of size 2. In this case, we compare the behaviour of the different robust alternatives under shifts of different sizes in the means of all the variables when the variables are independent.

Case B: Correlated Variables

In this case, the two variables $X_1$ and $X_2$ are assumed to be correlated. The contaminated normal model considered is as follows:

$$(1 - \varepsilon)\,N_2(0, \Sigma_0) + \varepsilon\,N_2(\theta_1, \Sigma_0),$$

where $\Sigma_0 = \begin{pmatrix} 1 & 0.9 \\ 0.9 & 1 \end{pmatrix}$. We use this value of $\Sigma_0$ to analyse whether the correlation level affects the detection probability of each alternative.

Table 2: SR (FAR), in %, obtained for Case A with different values of $\theta_1$ and $\varepsilon$

| n | $\theta_1$ | ε (%) | GK(Qn) | GK(Sn) | GK(MAD) | GK(τ) | Classical |
|---|---|---|---|---|---|---|---|
| 50 | (0,0) | 0 | 100 (1.9) | 100 (1.0) | 99.8 (0.8) | 100 (1.3) | 100 (1.3) |
| 50 | (0,0) | 10 | 100 (1.4) | 100 (1.0) | 99.9 (0.4) | 100 (1.1) | 100 (1.1) |
| 50 | (0,0) | 20 | 100 (1.3) | 100 (0.9) | 99.8 (0.6) | 100 (0.6) | 100 (0.6) |
| 50 | (5,5) | 0 | 100 (2.0) | 100 (1.6) | 99.7 (1.3) | 100 (1.2) | 100 (1.5) |
| 50 | (5,5) | 10 | 98.8 (0.3) | 98.8 (0.3) | 97.7 (0.4) | 98.6 (0.1) | 22.1 (0.2) |
| 50 | (5,5) | 20 | 86.2 (0.0) | 80 (0.0) | 76.8 (0.0) | 74.3 (0.1) | 2 (0.2) |
| 50 | (10,10) | 0 | 100 (1.1) | 99.9 (0.3) | 99.7 (0.5) | 100 (1.7) | 100 (0.8) |
| 50 | (10,10) | 10 | 99.4 (0.3) | 99.1 (0.1) | 97.9 (0.3) | 99.1 (0.4) | 0.6 (0.4) |
| 50 | (10,10) | 20 | 88.3 (0.0) | 80.8 (0.0) | 79.8 (0.1) | 75.8 (0.2) | 0.4 (0.3) |
| 100 | (0,0) | 0 | 100 (1.2) | 100 (1.0) | 100 (1.0) | 100 (1.2) | 100 (1.2) |
| 100 | (0,0) | 10 | 100 (1.5) | 100 (1.4) | 100 (1.3) | 100 (1.3) | 100 (1.3) |
| 100 | (0,0) | 20 | 100 (1.4) | 100 (1.3) | 100 (1.2) | 100 (0.7) | 100 (0.7) |
| 100 | (5,5) | 0 | 100 (0.6) | 100 (0.7) | 100 (0.9) | 100 (1.5) | 100 (0.6) |
| 100 | (5,5) | 10 | 99.7 (0.2) | 99.6 (0.1) | 99.8 (0.3) | 99.8 (0.4) | 30.2 (0.4) |
| 100 | (5,5) | 20 | 92.5 (0.0) | 90.4 (0.0) | 93.5 (0.0) | 83.9 (0.0) | 2.5 (0.1) |
| 100 | (10,10) | 0 | 100 (1.5) | 100 (1.2) | 99.9 (0.7) | 100 (0.7) | 100 (1.1) |
| 100 | (10,10) | 10 | 99.8 (0.2) | 99.8 (0.3) | 99.8 (0.5) | 99.6 (0.4) | 0.1 (0.2) |
| 100 | (10,10) | 20 | 91.1 (0.1) | 89.4 (0.1) | 92.5 (0.1) | 85.8 (0.2) | 0.4 (0.4) |
| 500 | (0,0) | 0 | 100 (1.9) | 100 (1.9) | 100 (1.8) | 100 (1.8) | 100 (1.8) |
| 500 | (0,0) | 10 | 100 (1.5) | 100 (1.4) | 100 (1.4) | 100 (1.4) | 100 (1.4) |
| 500 | (0,0) | 20 | 100 (0.8) | 100 (0.9) | 100 (0.9) | 100 (0.8) | 100 (0.8) |
| 500 | (5,5) | 0 | 100 (1.4) | 100 (1.2) | 100 (1.4) | 100 (1.2) | 100 (1.2) |
| 500 | (5,5) | 10 | 99.8 (0.0) | 99.8 (0.0) | 99.9 (0.0) | 99.8 (0.1) | 34.5 (0.0) |
| 500 | (5,5) | 20 | 94.8 (0.0) | 95.5 (0.0) | 98.1 (0.0) | 89.3 (0.0) | 2.4 (0.3) |
| 500 | (10,10) | 0 | 100 (0.6) | 100 (0.7) | 100 (0.8) | 100 (1.8) | 100 (0.6) |
| 500 | (10,10) | 10 | 99.8 (0.1) | 99.9 (0.1) | 99.9 (0.2) | 99.9 (0.2) | 0.7 (0.1) |
| 500 | (10,10) | 20 | 94.6 (0.0) | 95.1 (0.0) | 97.8 (0.1) | 88.1 (0.0) | 0.4 (0.2) |
| 1000 | (0,0) | 0 | 100 (1.0) | 100 (0.9) | 100 (0.8) | 100 (0.9) | 100 (0.9) |
| 1000 | (0,0) | 10 | 100 (0.8) | 100 (0.8) | 100 (0.9) | 100 (0.8) | 100 (0.8) |
| 1000 | (0,0) | 20 | 100 (0.6) | 100 (0.5) | 100 (0.5) | 100 (0.7) | 100 (0.7) |
| 1000 | (5,5) | 0 | 100 (0.8) | 100 (0.8) | 100 (1.0) | 100 (0.7) | 100 (0.6) |
| 1000 | (5,5) | 10 | 99.9 (0.2) | 99.9 (0.2) | 100 (0.3) | 100 (0.0) | 34.9 (0.3) |
| 1000 | (5,5) | 20 | 94.9 (0.0) | 95.8 (0.1) | 98.3 (0.1) | 89.5 (0.0) | 2 (0.4) |
| 1000 | (10,10) | 0 | 100 (1.8) | 100 (1.9) | 100 (1.7) | 100 (0.8) | 100 (1.8) |
| 1000 | (10,10) | 10 | 99.8 (0.1) | 99.8 (0.1) | 100 (0.2) | 99.8 (0.1) | 0.8 (0.2) |
| 1000 | (10,10) | 20 | 93.8 (0.0) | 94.7 (0.0) | 98.3 (0.1) | 89 (0.0) | 0.3 (0.2) |

Case C: Correlated Variables and Regression Outliers

Here, the two variables $X_1$ and $X_2$ are assumed to be correlated, and regression outliers are introduced. The contaminated normal model considered is as follows:

$$(1 - \varepsilon)\,N_2(0, \Sigma_0) + \varepsilon\,N_2(\theta_1, \Sigma_1),$$

where £0 = [^ and £1 = [^ 0 ]. In this case, we analyse and compare the proposed robust

methods in terms of the so-called good leverage and regression outliers.

In all three cases, we consider $\theta_1$ to be a vector of size 2 whose elements are all 0 (when there is no change), 5 or 10 (which is a good leverage point). This process is repeated 1000 times, and in each trial a random observation $z_{1k}$ from $N_2(0, \Sigma_u)$ and another observation $z_{2k}$ from $N_2(\theta_c, \Sigma_c)$ are generated, for $k = 1, 2, \ldots, 1000$. Here, $\Sigma_u$ is the covariance matrix used to generate the uncontaminated observations in the Phase I data, and $\theta_c$ and $\Sigma_c$ are the location and covariance parameters used to generate the contaminated observations in the Phase I model. The success rates are computed as the percentages of $z_{2k}$'s that are successfully detected, and the false alarm rates are computed as the percentages of $z_{1k}$'s that are falsely detected. The results obtained for $\alpha = 0.01$ are presented here.

Table 3: SR (FAR), in %, obtained for Case B with different values of $\theta_1$ and $\varepsilon$

| n | $\theta_1$ | ε (%) | GK(Qn) | GK(Sn) | GK(MAD) | GK(τ) | Classical |
|---|---|---|---|---|---|---|---|
| 50 | (0,0) | 0 | 95.6 (4.1) | 90.6 (4.6) | 84.6 (2.7) | 97 (2.8) | 97.4 (0.8) |
| 50 | (0,0) | 10 | 95.6 (4.3) | 91.4 (3.2) | 84.6 (2.7) | 96.3 (2.7) | 96.3 (0.9) |
| 50 | (0,0) | 20 | 95.5 (3.4) | 90.9 (4.8) | 82.4 (1.7) | 95.7 (2.6) | 96.4 (1.4) |
| 50 | (5,5) | 0 | 94.7 (4.5) | 91.2 (3.8) | 83.6 (3.0) | 96.9 (1.4) | 97 (0.9) |
| 50 | (5,5) | 10 | 84.7 (3.0) | 77.1 (2.9) | 67.2 (1.7) | 79.7 (3.1) | 17.9 (0.4) |
| 50 | (5,5) | 20 | 57 (2.3) | 48.2 (1.9) | 40.5 (1.7) | 40.2 (1.1) | 2.3 (0.2) |
| 50 | (10,10) | 0 | 95.3 (3.4) | 91.2 (3.5) | 82.4 (2.8) | 97.1 (2.5) | 98.3 (1.2) |
| 50 | (10,10) | 10 | 84.4 (3.9) | 76.2 (1.9) | 67.4 (2.6) | 80.3 (2.6) | 0.7 (0.2) |
| 50 | (10,10) | 20 | 57.1 (2.5) | 46.3 (1.3) | 43.8 (0.8) | 41.1 (2.1) | 0.8 (0.4) |
| 100 | (0,0) | 0 | 98.6 (3.3) | 96.3 (4.7) | 94.2 (3.9) | 98.6 (2.4) | 98.4 (1.3) |
| 100 | (0,0) | 10 | 97.9 (2.0) | 95 (3.7) | 92.2 (3.4) | 97.6 (2.2) | 97.4 (1.1) |
| 100 | (0,0) | 20 | 98.5 (1.8) | 95.8 (3.4) | 92.4 (3.5) | 97.9 (0.9) | 97.8 (0.4) |
| 100 | (5,5) | 0 | 98.5 (2.7) | 96 (4.8) | 93.2 (3.6) | 98.1 (0.9) | 98.1 (1.0) |
| 100 | (5,5) | 10 | 90.4 (1.5) | 86.9 (2.6) | 86.7 (3.1) | 85.8 (2.0) | 23.5 (0.9) |
| 100 | (5,5) | 20 | 63.3 (0.9) | 56.8 (2.0) | 61.4 (2.7) | 49.6 (1.9) | 2.8 (0.2) |
| 100 | (10,10) | 0 | 98.8 (2.1) | 96 (3.7) | 93 (3.6) | 98.5 (1.7) | 98.7 (1.1) |
| 100 | (10,10) | 10 | 91.2 (0.7) | 87.4 (2.2) | 87.6 (2.6) | 88.2 (2.1) | 0.8 (0.3) |
| 100 | (10,10) | 20 | 62.4 (0.7) | 57.5 (2.5) | 60.1 (2.0) | 51.2 (1.6) | 0.7 (0.6) |
| 500 | (0,0) | 0 | 99.2 (1.0) | 99 (2.2) | 98.8 (2.8) | 99.3 (1.2) | 99.2 (1.0) |
| 500 | (0,0) | 10 | 99.1 (1.1) | 98.9 (2.7) | 98.1 (3.5) | 99 (1.1) | 99 (1.1) |
| 500 | (0,0) | 20 | 98.6 (1.3) | 98.6 (2.6) | 97.8 (3.5) | 98.7 (1.5) | 98.6 (1.5) |
| 500 | (5,5) | 0 | 98.6 (1.1) | 98.7 (2.9) | 98.3 (3.9) | 99 (1.2) | 98.4 (1.1) |
| 500 | (5,5) | 10 | 90.8 (0.7) | 91.9 (2.6) | 94 (3.2) | 90 (0.7) | 25.6 (0.6) |
| 500 | (5,5) | 20 | 62.4 (0.0) | 63.8 (0.8) | 74.8 (1.6) | 53.5 (0.8) | 3.1 (0.3) |
| 500 | (10,10) | 0 | 98.3 (1.7) | 98.2 (2.2) | 97.6 (3.8) | 98.5 (1.4) | 98.4 (1.7) |
| 500 | (10,10) | 10 | 90.9 (0.1) | 92.4 (1.5) | 93.1 (2.5) | 91.1 (0.4) | 0.7 (0.2) |
| 500 | (10,10) | 20 | 64.6 (0.0) | 66.6 (0.9) | 77.1 (1.4) | 55 (0.3) | 0.6 (0.4) |
| 1000 | (0,0) | 0 | 99 (0.2) | 98.9 (0.5) | 98.7 (1.3) | 99 (0.3) | 98.9 (0.2) |
| 1000 | (0,0) | 10 | 98.1 (1.3) | 98.1 (1.6) | 97.8 (1.7) | 98 (1.1) | 97.9 (1.1) |
| 1000 | (0,0) | 20 | 99.1 (0.6) | 98.9 (1.6) | 98.8 (2.1) | 99.1 (0.8) | 98.9 (0.7) |
| 1000 | (5,5) | 0 | 98.8 (1.3) | 98.7 (1.5) | 98.7 (2.6) | 98.7 (1.2) | 98.8 (1.1) |
| 1000 | (5,5) | 10 | 91.3 (0.0) | 92.4 (0.5) | 93.9 (2.0) | 91.8 (0.5) | 25.8 (0.2) |
| 1000 | (5,5) | 20 | 64.7 (0.0) | 66.7 (0.1) | 76.4 (1.1) | 53.4 (0.3) | 3.2 (0.6) |
| 1000 | (10,10) | 0 | 98.3 (1.6) | 98.1 (2.1) | 98 (2.7) | 98.7 (1.2) | 98.3 (1.6) |
| 1000 | (10,10) | 10 | 90.1 (0.1) | 91.8 (0.4) | 93.8 (1.4) | 90.8 (0.4) | 0.5 (0.2) |
| 1000 | (10,10) | 20 | 64.8 (0.0) | 67.2 (0.0) | 77.3 (1.1) | 53.1 (0.3) | 0.7 (0.1) |

Table 2 presents the results obtained from simulations conducted for Case A samples. The findings clearly indicate that when the phase I data contains outliers, the control charts based on robust methods exhibit a high success rate with minimal FAR compared to the classical control chart. Furthermore, the success rates of the proposed control charts improve with increasing sample size. Conversely, when the phase I data is uncontaminated, the robust control charts demonstrate similar performance to their classical counterpart.

Table 4: SR (FAR), in %, obtained for Case C with different values of $\theta_1$ and $\varepsilon$

| n | $\theta_1$ | ε (%) | GK(Qn) | GK(Sn) | GK(MAD) | GK(τ) | Classical |
|---|---|---|---|---|---|---|---|
| 50 | (0,0) | 0 | 97.9 (4.7) | 96.2 (4.9) | 91.6 (2.3) | 99.6 (2.4) | 100 (1.2) |
| 50 | (0,0) | 10 | 98.8 (3.8) | 97.1 (3.1) | 95.3 (2.4) | 99.3 (2.7) | 100 (1.3) |
| 50 | (0,0) | 20 | 99 (4.1) | 98.8 (4.1) | 96.5 (3.9) | 99.1 (4.3) | 100 (1.8) |
| 50 | (5,5) | 0 | 98.3 (3.7) | 96.8 (3.3) | 90 (2.7) | 99.2 (1.9) | 100 (0.6) |
| 50 | (5,5) | 10 | 95.1 (2.1) | 85.6 (2.8) | 75.9 (2.1) | 91.5 (3.7) | 8.4 (0.2) |
| 50 | (5,5) | 20 | 78.6 (0.2) | 45.9 (1.3) | 38 (1.1) | 36.7 (2.0) | 0.9 (0.3) |
| 50 | (10,10) | 0 | 98.8 (5.2) | 94.9 (4.0) | 90.8 (2.9) | 99.1 (2.6) | 100 (1.0) |
| 50 | (10,10) | 10 | 96.3 (3.2) | 87.8 (2.5) | 77.4 (1.7) | 92.7 (2.9) | 0.8 (0.6) |
| 50 | (10,10) | 20 | 76.6 (1.7) | 47.6 (1.4) | 37.1 (1.0) | 38 (1.5) | 0.6 (0.4) |
| 100 | (0,0) | 0 | 100 (2.5) | 98.2 (3.3) | 96 (3.7) | 99.9 (1.6) | 100 (1.0) |
| 100 | (0,0) | 10 | 100 (2.1) | 99.2 (2.6) | 97.7 (3.5) | 99.9 (1.7) | 100 (0.6) |
| 100 | (0,0) | 20 | 99.9 (2.8) | 99.7 (3.1) | 99.5 (5.2) | 99.8 (3.3) | 100 (1.9) |
| 100 | (5,5) | 0 | 99.8 (2.6) | 98.4 (2.9) | 95.7 (2.5) | 100 (1.6) | 100 (0.5) |
| 100 | (5,5) | 10 | 99.7 (0.6) | 94.6 (3.5) | 93.3 (2.0) | 98.4 (2.7) | 10.6 (0.1) |
| 100 | (5,5) | 20 | 91.2 (0.1) | 62.4 (2.3) | 68.4 (1.6) | 47.4 (1.3) | 1.2 (0.4) |
| 100 | (10,10) | 0 | 99.9 (3.5) | 98 (4.5) | 96.4 (4.3) | 100 (1.9) | 100 (1.5) |
| 100 | (10,10) | 10 | 99.9 (0.7) | 96.1 (3.4) | 94 (3.2) | 98 (2.7) | 0.6 (0.1) |
| 100 | (10,10) | 20 | 92.9 (0.2) | 65.5 (1.8) | 68.9 (1.5) | 50.6 (1.5) | 0.2 (0.1) |
| 500 | (0,0) | 0 | 100 (1.0) | 99.9 (1.9) | 99.4 (3.4) | 100 (1.0) | 100 (0.8) |
| 500 | (0,0) | 10 | 100 (1.5) | 100 (1.8) | 99.8 (3.5) | 100 (1.6) | 100 (1.3) |
| 500 | (0,0) | 20 | 100 (2.1) | 100 (2.4) | 100 (3.8) | 100 (2.1) | 100 (1.7) |
| 500 | (5,5) | 0 | 100 (1.3) | 100 (2.5) | 99.9 (3.3) | 100 (1.7) | 100 (1.4) |
| 500 | (5,5) | 10 | 100 (0.4) | 99.6 (1.5) | 98.7 (2.3) | 100 (0.5) | 11 (0.6) |
| 500 | (5,5) | 20 | 98.9 (0.0) | 85.3 (1.0) | 95.7 (1.0) | 56.4 (1.4) | 1.8 (0.0) |
| 500 | (10,10) | 0 | 100 (1.3) | 99.9 (2.5) | 99.6 (3.3) | 100 (1.5) | 100 (1.4) |
| 500 | (10,10) | 10 | 100 (0.2) | 99.8 (0.8) | 98.9 (2.6) | 100 (0.4) | 0.9 (0.4) |
| 500 | (10,10) | 20 | 99.2 (0.0) | 87.9 (1.2) | 95.4 (1.9) | 58.8 (1.0) | 0.4 (0.2) |
| 1000 | (0,0) | 0 | 100 (1.1) | 100 (1.5) | 99.9 (1.9) | 100 (1.3) | 100 (1.3) |
| 1000 | (0,0) | 10 | 100 (1.7) | 100 (2.0) | 100 (3.3) | 100 (2.0) | 100 (1.7) |
| 1000 | (0,0) | 20 | 100 (1.9) | 100 (2.5) | 100 (3.5) | 100 (1.7) | 100 (1.7) |
| 1000 | (5,5) | 0 | 100 (0.9) | 100 (1.2) | 99.9 (1.8) | 100 (0.8) | 100 (0.8) |
| 1000 | (5,5) | 10 | 100 (0.3) | 99.9 (0.8) | 100 (1.2) | 100 (0.5) | 10 (0.3) |
| 1000 | (5,5) | 20 | 99.7 (0.0) | 89.9 (0.3) | 98.9 (0.9) | 57.1 (0.4) | 2 (0.3) |
| 1000 | (10,10) | 0 | 100 (1.0) | 100 (1.0) | 100 (2.2) | 100 (1.3) | 100 (1.0) |
| 1000 | (10,10) | 10 | 100 (0.1) | 100 (0.8) | 99.8 (1.6) | 100 (0.0) | 0.6 (0.2) |
| 1000 | (10,10) | 20 | 99.9 (0.0) | 88.2 (0.1) | 98.5 (1.1) | 59.4 (0.0) | 0.2 (0.1) |

The results obtained for Case B samples are presented in Table 3. The proposed control charts perform well with respect to SR and FAR in almost all cases. Although their performance is not very satisfactory for small samples with high contamination, their SR and FAR improve with increasing sample size.

In all three cases, the proposed robust GK-based control charts perform better than the classical control chart in contaminated situations and perform similarly to the classical chart in uncontaminated cases. Among the robust control charts, the one based on GK(Qn) shows a better success rate than the others when n < 500, while the control chart based on GK(MAD) outperforms the others for large samples, when the data come from Case A and Case B. When the data come from Case C, however, the GK(Qn)-based control chart shows superior performance in all cases.

4.1. Time Complexity

A key benefit of the proposed methods is their fast computation. To compare the computation times of these approaches for various sample sizes, a simulation study consisting of 10,000 trials was carried out; the average running times are presented in Table 5. Among the robust methods, the control chart based on GK(Sn) is the fastest for n ≤ 500 (about 1.6 to 4.5 times faster than the other three methods at n = 50), while the GK(MAD)-based control chart is the fastest at n = 1000.

Table 5: Running time (in seconds)

| n | GK(Qn) | GK(Sn) | GK(MAD) | GK(τ) |
|---|---|---|---|---|
| 50 | 0.000128 | 0.000080 | 0.000207 | 0.000361 |
| 100 | 0.000222 | 0.000093 | 0.000231 | 0.000361 |
| 500 | 0.001065 | 0.000290 | 0.000304 | 0.000516 |
| 1000 | 0.002231 | 0.000526 | 0.000397 | 0.000992 |

4.2. Real life data

A data set given by Quesenberry [9] is used to evaluate the performance of the proposed methods on real-life data. The original data consist of 11 quality variables measured on 30 products from a production process. For comparison purposes, we take the third and fourth variables as our bivariate data. Bivariate control charts based on the proposed robust methods and the classical method are constructed for these data and presented in Figure 2. The charts show that the data are outlier-free, and none of the methods, including the classical method, produces a false detection. To evaluate performance in a contaminated situation, we artificially created two outlying observations: observations 7 and 16 were changed from (21.5, 5.08) and (21.5, 15.32) to (22.75, 5.08) and (22.75, 15.32), respectively, by adding a small shift of 1.25 to the first variable. Control charts for the contaminated data set are given in Figure 3; all the proposed control charts detect these outliers, whereas the classical control chart does not.

Figure 2: Control charts for uncontaminated Quesenberry data

Figure 3: Control charts for contaminated Quesenberry data

5. Conclusions

When quality characteristics are interdependent, monitoring them simultaneously is crucial. Using univariate techniques to monitor or analyse such data is frequently ineffective. Hotelling's $T^2$ control chart, which is constructed using the classical estimates of location and scatter, is the best-known multivariate process monitoring and control method. Unfortunately, outliers have a significant impact on the classical estimates, which results in incorrect $T^2$ statistics. Consequently, a Hotelling's $T^2$ control chart based on classical estimates can be quite misleading. In this paper, we address this problem for the bivariate case and propose four robust control charts that handle outliers. The proposed control charts make use of the covariance estimator introduced by Gnanadesikan and Kettenring [2]. Four highly robust scale estimators, the Qn estimator, Sn estimator, MAD estimator, and τ estimator, are used to robustify the GK estimator. These estimators have a bounded influence function and an optimal breakdown value.

Four robust control charts for bivariate quality characteristics were introduced using the GK(Qn), GK(Sn), GK(MAD) and GK(τ) estimators. The upper control limits of these charts are obtained from simulated empirical quantiles of the robust $T^2$ statistic, while the lower control limits are set to zero. The performance of the proposed methods was evaluated through Monte Carlo simulation. Three cases of contaminated Phase I data sets were considered for various amounts of contamination, and in all cases the proposed methods outperform the classical chart. The proposed methods were also applied to a real-life data set, and even though the data contained outliers, they were still able to identify the out-of-control observations. Another advantage of the proposed methods is their fast computation.

References

[1]. Alt, F.B. Multivariate quality control. In S. Kotz and N. Johnson (Eds.), The Encyclopaedia of Statistical Sciences, Volume 6 (110-122), John Wiley & Sons, 1985.

[2]. Gnanadesikan, R. and Kettenring, J.R. (1972), Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics, 28:81-124.

[3]. Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69:383-393.

[4]. Yohai, V. J. and Zamar, R. H. (1986). High breakdown point estimates of regression by means of the minimization of an efficient scale. Journal of the American Statistical Association, 86:403-413.

[5]. Maronna, R.A. and Zamar, R.H. (2002). Robust estimates of location and dispersion for high dimensional datasets, Technometrics, 44(4): 307-317.

[6]. Rousseeuw, P. J. and Croux, C. (1993). Alternatives to the median absolute deviation. Journal of the American Statistical Association, 88(424):1273-1283.

[7]. Wilks, S. S. Mathematical Statistics. John Wiley & Sons, 1962

[8]. Serfling, R.J. Approximation Theorems of Mathematical Statistics. John Wiley & Sons, 1980.

[9]. Quesenberry, C.P. (2001). The multivariate short-run snapshot Q chart. Quality Engineering, 13(4):679-683.
