CC BY
BAYES ESTIMATION OF CAPABILITY INDEX USING THREE-PARAMETER WEIBULL DISTRIBUTION

Sonam Gubreley¹, Ankita Gupta², and Satyanshu K. Upadhyay¹

¹Department of Statistics, ²Statistics Section, Mahila Mahavidyalay, Banaras Hindu University, Varanasi - 221 005, India. sonamgubreley05@gmail.com

Abstract

The process capability index is an important tool used in quality control and process improvement. Generally, the index is estimated under the assumption of a normal distribution, although some other distributions are also recommended in the literature. This paper instead considers a three-parameter Weibull distribution and obtains an estimate of the process capability index under the Bayesian framework. Bayesian development is based on the use of non-informative priors and the posterior sample-based inferences are drawn using an important Markov Chain Monte Carlo technique, namely, the Gibbs sampler algorithm. Finally, a numerical illustration based on two real datasets is provided.

Keywords: Process capability index, Gibbs sampler, Three-parameter Weibull distribution

1. Introduction

With the advancement of technology, there is an ever-increasing demand for high-quality products and services. Smart manufacturing processes that employ various advanced technologies facilitate automation, enhance productivity, improve maintenance and monitoring, and reduce the scope for human error. However, the associated software products need to be examined for quality assurance.

The quality and reliability of a product can be assessed through various statistical tools. Among these, the process capability index (PCI) has found favour with manufacturers because it assists decision-making and supports efforts to improve process performance. A PCI measures the potential of a process and its performance. It is extremely important for quality control engineers since it quantifies the relationship between the actual performance of the process and the predetermined specifications of the product, and thereby ascertains whether the process meets the defined manufacturing requirements. Many capability indices have been developed so far (see, for example, [31], [11], [14] and [5]). The first index put forward in the literature was Cp, which simply compares the span of the specifications with the six-sigma spread of the process (see [31]). This index implicitly takes the process mean to be centred between the lower and the upper specification limits, so it does not take into account the location of the process mean relative to the specifications. Consequently, if the process is not centred on the specification region, a substantial percentage of the products may have characteristics outside the specification limits even though Cp is high. In order to overcome this problem, [11] introduced another capability index, Cpk, which takes process centring into account in addition to the span of the specifications relative to the six-sigma spread of the process. In other words, it measures the distance between the process mean of the quality characteristic of interest and the specification limit closest to it, relative to the three-sigma spread. Mathematically, Cp and Cpk can be defined as

$$C_p = \frac{USL - LSL}{6\sigma_p}, \tag{1}$$

$$C_{pk} = \min(C_{pu}, C_{pl}), \tag{2}$$

where

$$C_{pu} = \frac{USL - \mu_p}{3\sigma_p}, \tag{3}$$

$$C_{pl} = \frac{\mu_p - LSL}{3\sigma_p}, \tag{4}$$

USL and LSL are the upper and lower specification limits, respectively, μp denotes the process mean and σp represents the process standard deviation.
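The effect of centring on the two indices can be seen in a small numerical sketch of (1)-(4). The function names and the process values (limits 18 and 22, standard deviation 0.5) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def cp(usl, lsl, sigma):
    """Equation (1): specification width relative to the six-sigma spread."""
    return (usl - lsl) / (6.0 * sigma)


def cpk(usl, lsl, mu, sigma):
    """Equations (2)-(4): the smaller of the two one-sided indices."""
    cpu = (usl - mu) / (3.0 * sigma)
    cpl = (mu - lsl) / (3.0 * sigma)
    return min(cpu, cpl)


# Hypothetical process with limits 18 and 22 and standard deviation 0.5.
potential = cp(22.0, 18.0, sigma=0.5)           # ignores the process mean
centred = cpk(22.0, 18.0, mu=20.0, sigma=0.5)   # mean at the midpoint
shifted = cpk(22.0, 18.0, mu=21.0, sigma=0.5)   # mean moved towards USL
```

In this sketch Cp is the same for both processes, whereas Cpk equals Cp only for the centred one and drops once the mean moves towards a specification limit.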

Both of these PCIs are defined under two important assumptions: the process is under statistical control and the quality characteristic of interest is normally distributed (see [31]). Perhaps because of these assumptions, a bulk of the literature concerns the estimation of PCIs under normality (see, for example, [1], [2], [13] and [23]). However, industrial processes are often not normally distributed and, in such scenarios, the values of conventional PCIs may be misleading and may misrepresent the quality of the product. One may refer to [10], [27] and [24] for a systematic and detailed coverage. In order to remove this discrepancy, [3] proposed a quantile-based measure of the capability index for non-normal distributions, which is given below.

$$C_{pk} = \min\left\{\frac{USL - M}{U_p - M}, \frac{M - LSL}{M - L_p}\right\}, \tag{5}$$

where Up, Lp and M are the 99.865th, 0.135th and 50th percentiles of the target distribution, respectively, and USL and LSL denote the upper and lower specification limits. A value of Cpk < 1 is unfavourable and indicates that the process is incapable, a value of 1 ≤ Cpk ≤ 1.33 indicates that the process is barely capable, and Cpk > 1.33 shows that the process is capable of meeting the consumers' requirements.
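A minimal sketch of the quantile-based index (5) together with the capability bands described above (below 1, between 1 and 1.33, above 1.33). The function names and the percentile values are hypothetical; in practice the percentiles would come from the fitted distribution.

```python
def quantile_cpk(usl, lsl, upper, median, lower):
    """Equation (5); upper and lower are the 99.865th and 0.135th percentiles."""
    return min((usl - median) / (upper - median),
               (median - lsl) / (median - lower))


def classify(cpk_value):
    """Capability bands as described in the text."""
    if cpk_value < 1.0:
        return "incapable"
    if cpk_value <= 1.33:
        return "barely capable"
    return "capable"


# Hypothetical right-skewed process: the median lies closer to the lower percentile.
value = quantile_cpk(usl=22.0, lsl=18.0, upper=21.8, median=20.9, lower=20.4)
label = classify(value)
```

Because each side is scaled by its own percentile distance, a skewed distribution is penalised only on the side where its tail actually approaches the specification limit.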

Besides the normality assumption, several developments can be seen in the literature under non-normal assumptions as well. [3], [14], [17], [16], [22], [12], [9], [26] and [20] are some of the important references in which capability indices are estimated under non-normal distributions. A thorough review of the literature on the estimation of PCIs for non-normal datasets reveals that most of the developments use the classical framework and only a few consider the Bayesian approach. Further, in statistical process control, most datasets lie around a particular location, generally far from zero, and it therefore becomes imperative to assess the capability index with a model that has a location parameter even if one is dealing with non-normal data. To the best of our knowledge, there is no reference in the literature that entertains a non-normal model with a location parameter for estimating the capability index. To bridge this gap, this paper considers a three-parameter Weibull distribution for estimating the capability index and performs a Bayes analysis of the distribution.

The Weibull distribution is an important distribution that has received considerable attention in the field of reliability and quality control. Its versatility stems from the fact that it accommodates increasing, decreasing and constant hazard rates for different values of its shape parameter (see [18] and [15]). The literature on the analysis of the Weibull distribution has considered both the two-parameter and the three-parameter forms of the model, where the former is defined without a threshold parameter. The two-parameter Weibull distribution is easier to deal with than the three-parameter form and, therefore, the literature on both classical and Bayes analysis of the two-parameter Weibull distribution is available in bulk (see, for example, [19], [28], [15], [25], among others). On the other hand, the three-parameter Weibull distribution is much richer because of the involvement of a threshold parameter, although its analysis is more challenging because of the sometimes unusual behaviour of the likelihood function, especially when the shape parameter is less than unity (see also [30] and [32]). As a result, this model is comparatively less entertained in the literature. [30], [32] and [28] are some of the important references where this form of the model is explored.

As mentioned, this paper attempts to provide a Bayes analysis of the three-parameter Weibull distribution with the ultimate objective of estimating the PCI. The entire development uses non-informative priors for the model parameters. The resulting posterior is analytically intractable for exact posterior-based inferences and, therefore, the paper utilizes an important Markov Chain Monte Carlo (MCMC) procedure, namely the Gibbs sampler algorithm, to simulate posterior samples and draw sample-based inferences, including those for the PCI. Finally, the proposed methodology is numerically illustrated on the basis of two real datasets from a juice manufacturing company.

The plan of the paper is as follows. The next section briefly describes the three-parameter Weibull model and its Bayesian formulation. Section 3 provides numerical illustration based on two real datasets. Finally, a brief conclusion is provided in the last section.

2. Model Formulation

2.1. Likelihood function

The probability density function (pdf) of the three-parameter Weibull distribution is

$$f(x \mid \theta, \beta, \mu) = \frac{\beta}{\theta}\left(\frac{x-\mu}{\theta}\right)^{\beta-1}\exp\left[-\left(\frac{x-\mu}{\theta}\right)^{\beta}\right], \quad x > \mu;\ \theta, \beta, \mu > 0, \tag{6}$$

where θ, β and μ are the scale, shape and location parameters, respectively. The distribution exhibits an increasing hazard rate for β > 1, a decreasing hazard rate for β < 1 and, for β = 1, it reduces to the two-parameter exponential model possessing a constant hazard rate. Let us use the notation W(θ, β, μ) to denote the three-parameter Weibull distribution given in (6). The reliability function and the hazard function of W(θ, β, μ) at time t are, respectively, given by

$$R(t) = \exp\left[-\left(\frac{t-\mu}{\theta}\right)^{\beta}\right], \tag{7}$$

and

$$h(t) = \frac{\beta}{\theta}\left(\frac{t-\mu}{\theta}\right)^{\beta-1}. \tag{8}$$

Similarly, the expressions for Up, Lp and M for the model W(θ, β, μ) can be written as

$$U_p = \theta\,[2.86967]^{1/\beta} + \mu, \tag{9}$$

$$L_p = \theta\,[0.00058]^{1/\beta} + \mu, \tag{10}$$

and

$$M = \theta\,[\ln 2]^{1/\beta} + \mu, \tag{11}$$

respectively.
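The percentile expressions (9)-(11) can be combined with the quantile-based index (5) as in the following sketch. The names `theta`, `beta` and `mu` stand for the scale, shape and location parameters, and the function names are hypothetical. The constants are used exactly as printed in (9) and (10); numerically they coincide with −log10(0.00135) and −log10(0.99865), whereas inverting (7) with natural logarithms would give about 6.6077 and 0.00135, so values computed from exact natural-log quantiles would differ from those obtained here.

```python
import math

# Constants exactly as printed in equations (9)-(11).
K_UPPER = 2.86967
K_LOWER = 0.00058
K_MEDIAN = math.log(2.0)


def weibull_percentiles(theta, beta, mu):
    """Return (Up, Lp, M) from equations (9)-(11)."""
    up = theta * K_UPPER ** (1.0 / beta) + mu
    lp = theta * K_LOWER ** (1.0 / beta) + mu
    m = theta * K_MEDIAN ** (1.0 / beta) + mu
    return up, lp, m


def weibull_cpk(theta, beta, mu, usl, lsl):
    """Plug the percentiles into the quantile-based index (5)."""
    up, lp, m = weibull_percentiles(theta, beta, mu)
    return min((usl - m) / (up - m), (m - lsl) / (m - lp))


# Illustrative parameter values (an assumption for this sketch), of the
# same order as the Data1 estimates reported in Section 3.
up, lp, m = weibull_percentiles(theta=0.701, beta=1.493, mu=20.384)
cpk_value = weibull_cpk(theta=0.701, beta=1.493, mu=20.384, usl=22.0, lsl=18.0)
```

With these inputs and specification limits of 18.0 and 22.0, the index comes out at roughly 1.22, in the "barely capable" band.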

Let us now assume that an experiment consisting of n units is being conducted and let x = (xi; i = 1, 2, ..., n) be the resulting observations. Then, the likelihood function for the dataset x can be expressed as

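$$L(x \mid \theta, \beta, \mu) = \left(\frac{\beta}{\theta}\right)^{n}\left[\prod_{i=1}^{n}\left(\frac{x_i-\mu}{\theta}\right)^{\beta-1}\right]\exp\left[-\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\theta}\right)^{\beta}\right]. \tag{12}$$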
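For numerical work the likelihood (12) is usually handled on the log scale. The following is a minimal sketch under that assumption; `weibull_loglik` is a hypothetical name, `theta`, `beta` and `mu` denote the scale, shape and location parameters, and the two-point sample is invented purely to make the arithmetic easy to check.

```python
import math


def weibull_loglik(x, theta, beta, mu):
    """Log of the likelihood (12); -inf when the parameters are inadmissible."""
    if theta <= 0.0 or beta <= 0.0 or mu >= min(x):
        return float("-inf")
    n = len(x)
    z = [(xi - mu) / theta for xi in x]
    return (n * math.log(beta / theta)
            + (beta - 1.0) * sum(math.log(zi) for zi in z)
            - sum(zi ** beta for zi in z))


# Hypothetical toy sample; with beta = 1 the model is a shifted exponential,
# so the log-likelihood reduces to -sum((x_i - mu) / theta).
toy = [1.0, 2.0]
ll = weibull_loglik(toy, theta=1.0, beta=1.0, mu=0.5)
outside = weibull_loglik(toy, theta=1.0, beta=1.0, mu=1.0)  # mu not below min(x)
```

Returning minus infinity when the location is not below the smallest observation mirrors the support restriction x > μ in (6).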

Sonam Gubreley, Ankita Gupta, S.K. Upadhyay RT&A, No 1 (77)

BAYES ESTIMATION OF CAPABILITY INDEX Volume 19, March 2024

2.2. Bayesian formulation

To conduct a Bayesian analysis, it is essential to specify a prior distribution for the parameters of the entertained model. Several types of priors have been proposed in the literature for the Weibull parameters. This paper, however, considers the joint non-informative prior suggested by [32], which is given as

$$g(\theta, \beta, \mu) \propto \frac{1}{\theta\beta}. \tag{13}$$

Obviously, the parameter μ is assigned a constant prior over the positive real space. The updated belief in the form of the posterior distribution can be obtained by combining the prior distribution specified in (13) with the likelihood function given in (12) via Bayes theorem. The joint posterior up to proportionality can, therefore, be written as

$$p(\theta, \beta, \mu \mid x) \propto \frac{\beta^{n-1}}{\theta^{n\beta+1}}\left[\prod_{i=1}^{n}(x_i-\mu)^{\beta-1}\right]\exp\left[-\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\theta}\right)^{\beta}\right]; \quad \theta > 0,\ \beta > 0,\ \mu < \min(x). \tag{14}$$
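The posterior (14) is needed only up to proportionality, so in code it is convenient to work with the logarithm of its kernel. A minimal sketch follows, with `theta`, `beta` and `mu` denoting the scale, shape and location parameters; the function name and the toy sample are hypothetical.

```python
import math


def log_posterior_kernel(x, theta, beta, mu):
    """Log of the right-hand side of (14), up to an additive constant."""
    if theta <= 0.0 or beta <= 0.0 or mu >= min(x):
        return float("-inf")
    n = len(x)
    shifted = [xi - mu for xi in x]
    return ((n - 1) * math.log(beta)
            - (n * beta + 1.0) * math.log(theta)
            + (beta - 1.0) * sum(math.log(s) for s in shifted)
            - sum((s / theta) ** beta for s in shifted))


# Hypothetical toy sample; with beta = 1 the kernel reduces to
# -(n + 1) * log(theta) - sum(x_i - mu) / theta.
toy = [1.0, 2.0]
value = log_posterior_kernel(toy, theta=2.0, beta=1.0, mu=0.5)
```

Each term corresponds to one factor of (14): the power of the shape parameter, the power of the scale parameter, the product over the shifted observations and the exponential sum.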

Obviously, the posterior given in (14) is analytically intractable and, therefore, one has to proceed with some approximation or simulation-based approach for drawing the desired inferences from it. As mentioned, this paper considers the Gibbs sampler algorithm, an important MCMC procedure, because of its straightforwardness and ease of implementation. The algorithm requires the specification of low-dimensional full conditionals for simulating the high-dimensional posterior, where both the full conditionals and the posterior need to be specified up to proportionality only. The algorithm starts with appropriately chosen initial values for the variates and then simulates from the full conditionals one by one in a cyclic fashion, using the most recent available values of all the other variates at every stage. The initial values are thus updated after the first cycle of iteration through all the full conditionals. The process is continued for a large number of cycles until some systematic pattern of convergence is achieved among the generated variates. Moreover, it can be easily seen that the posterior (14) results in three one-dimensional full conditionals corresponding to θ, β and μ, and these full conditionals can be easily simulated, resulting in an easy implementation of the Gibbs sampler algorithm. For further details on the algorithm, one can refer to [7], [6] and [32], among others.

Turning to the full conditionals derived from (14), the full conditional for θ happens to be the kernel of a gamma distribution after an appropriate transformation and, hence, θ can be easily generated from a gamma generating routine (see [4]). The full conditional of β can be seen to be log-concave and, therefore, β can be simulated using the adaptive rejection sampling procedure (see [8]). The generation of μ from its full conditional is based on the rejection algorithm using the envelope density

$$g_1(\mu \mid \beta, x_1) = \frac{\beta}{x_1^{\beta}}\,(x_1 - \mu)^{\beta-1}; \quad x_1 > \mu,$$

where x1 is the minimum of (xi; i = 1, 2, ..., n) (see [32] for further details).
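The gamma step can be sketched as follows. Collecting the terms of (14) that involve the scale parameter (called `theta` here) gives a kernel proportional to theta^(-(n*beta+1)) * exp(-S * theta^(-beta)) with S = sum((x_i - mu)^beta), so the transformed variable lam = theta^(-beta) has a gamma density with shape n and rate S. This particular transformation is worked out from (14) for illustration and is an assumption about what "appropriate transformation" means; the function name, seed and toy data are hypothetical, and the steps for the shape and location parameters are not shown.

```python
import random


def draw_theta(x, beta, mu, rng):
    """One draw of the scale parameter from its full conditional under (14).

    With S = sum((x_i - mu)**beta), lam = theta**(-beta) follows a gamma
    distribution with shape n and rate S, hence theta = lam**(-1/beta).
    """
    n = len(x)
    s = sum((xi - mu) ** beta for xi in x)
    lam = rng.gammavariate(n, 1.0 / s)  # second argument is the scale, 1/rate
    return lam ** (-1.0 / beta)


rng = random.Random(2024)          # arbitrary seed
toy = [1.2, 1.9, 2.4, 3.1, 3.3]    # hypothetical observations
draws = [draw_theta(toy, beta=1.5, mu=1.0, rng=rng) for _ in range(20000)]
```

Within a full Gibbs cycle this draw would use the current values of the shape and location parameters, which are then updated in turn from their own full conditionals.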

3. Numerical Illustration

For numerical illustration of the proposed formulation, the paper considers two real datasets on the weights (in grams) of thirty juice packs each of grape and strawberry flavours. In the discussion that follows, the dataset on the weights of grape-flavoured juice packs is referred to as Data1, whereas that on strawberry-flavoured packs is referred to as Data2. The two datasets are presented in Table 1; they were collected to assess the process of filling powdered juice bags. The datasets were first reported by [21], where the authors analysed them under the assumption of a normal distribution and evaluated Cpk with the specification limits LSL = 18.0 and USL = 22.0. These specification limits were set in accordance with the guidelines provided by the National Institute of Metrology, Quality and Technology (INMETRO), the Brazilian organisation responsible for quality control.

Before proceeding with the analysis of the datasets, let us plot the control charts with the specification limits of 18.0 and 22.0. The control charts are presented in Figure 1, where the red line corresponds to Data1 and the blue line corresponds to Data2. The specification limits 18.0 and 22.0 suggest that the process should hover around their midpoint, 20.0, although Figure 1 clearly suggests that the process is not centred there. In fact, a few values lie outside the specified range, which suggests that the process is out of control.

Table 1: Data on weights (in grams) of juice packs

Data1
21.011  20.635  21.732  21.333  20.587
20.587  21.784  21.088  20.997  21.100
22.155  21.116  20.707  20.413  20.822
20.883  20.930  20.908  20.897  20.486
20.935  21.867  20.814  20.795  21.520
20.537  21.438  20.621  20.975  20.919

Data2
22.572  21.376  20.768  21.833  19.970
21.583  21.813  22.025  20.892  20.241
21.816  21.232  21.730  20.529  21.435
21.106  20.519  21.263  20.684  21.233
19.624  21.150  20.962  21.024  20.316
21.942  21.495  20.819  20.973  21.115

Figure 1: Control chart for the two datasets.

Further, before carrying out the Bayes analysis of the considered datasets, let us check the compatibility of the two datasets with the assumed model (6). The compatibility was examined using the Kolmogorov-Smirnov (KS) test statistic, which was evaluated using the maximum likelihood (ML) estimates of the model parameters. The ML estimates of θ, β and μ were found to be 0.693, 1.475 and 20.391, respectively, for Data1 and 2.635, 4.244 and 18.737, respectively, for Data2. For Data1, the KS statistic was found to be 0.110 with a corresponding p-value of 0.860, while for Data2 the KS statistic was 0.066 with a corresponding p-value of 0.998. Obviously, the two datasets show good compatibility with the model W(θ, β, μ).
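The KS statistic for Data1 can be recomputed from Table 1 and the reported ML estimates with a short sketch. The fitted distribution function is taken as F(x) = 1 − R(x), with R from (7); the function names are hypothetical, `theta`, `beta` and `mu` denote the scale, shape and location estimates, and the p-value is not computed here.

```python
import math

# Data1 from Table 1 (weights in grams of grape-flavoured packs).
DATA1 = [
    21.011, 20.635, 21.732, 21.333, 20.587, 20.587, 21.784, 21.088, 20.997, 21.100,
    22.155, 21.116, 20.707, 20.413, 20.822, 20.883, 20.930, 20.908, 20.897, 20.486,
    20.935, 21.867, 20.814, 20.795, 21.520, 20.537, 21.438, 20.621, 20.975, 20.919,
]


def weibull_cdf(x, theta, beta, mu):
    """Distribution function implied by (7): F(x) = 1 - R(x)."""
    if x <= mu:
        return 0.0
    return 1.0 - math.exp(-(((x - mu) / theta) ** beta))


def ks_statistic(data, theta, beta, mu):
    """Largest gap between the empirical and the fitted distribution functions."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = weibull_cdf(x, theta, beta, mu)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# ML estimates for Data1 as reported in the text.
d1 = ks_statistic(DATA1, theta=0.693, beta=1.475, mu=20.391)
```

With these inputs the statistic comes out close to the value of 0.110 quoted above.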

For performing the Bayes analysis, the Gibbs sampler algorithm was implemented on the posterior (14) as per the details given in subsection 2.2. Convergence monitoring was done using ergodic averages, obtained separately for each of the three variates, using a single long run of the iterating chain. It was found that 50K iterations were enough for the ergodic averages to exhibit stationary behaviour. Once convergence was assessed, equally spaced observations at a gap of 10 were chosen to make the autocorrelation negligibly small. In this way, a posterior sample of size 1K was taken from the marginal posterior of each of θ, β and μ (see also [29] and [32]). Once the samples of θ, β and μ are obtained, they can be substituted in (9)-(11) to get the corresponding samples of size 1K from the posterior of each of Up, Lp and M. Finally, the samples of Up, Lp and M so obtained can be used to get a posterior sample of size 1K for Cpk given in (5).
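The bookkeeping described above (ergodic averages for monitoring, then equally spaced draws after convergence) can be sketched as follows. It is assumed here that the first 50,000 iterations are discarded and the next 10,000 are thinned at a gap of 10; the function names are hypothetical and a simple counter stands in for a real chain so that the indices are easy to inspect.

```python
def ergodic_averages(chain):
    """Running means of the chain, one value per iteration."""
    averages, total = [], 0.0
    for k, value in enumerate(chain, start=1):
        total += value
        averages.append(total / k)
    return averages


def thin(chain, burn_in, gap, size):
    """Drop the first burn_in draws, then keep every gap-th draw, size in all."""
    kept = chain[burn_in::gap][:size]
    if len(kept) < size:
        raise ValueError("chain too short for the requested sample size")
    return kept


# Stand-in chain: iteration indices 0, 1, ..., 59999.
chain = list(range(60000))
sample = thin(chain, burn_in=50000, gap=10, size=1000)
running = ergodic_averages([1.0, 2.0, 3.0])
```

Applying the same thinning to each of the three parameter chains keeps the draws aligned, so that each retained triple can be passed through (9)-(11) and (5) to give one posterior draw of Cpk.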

Table 2: Estimated posterior summaries for θ, β, μ and Cpk

Dataset  Parameter  Mean    Median  Mode    0.95 HPDI
Data1    θ          0.701   0.700   0.698   (0.587, 0.816)
         β          1.493   1.490   1.483   (1.215, 1.794)
         μ          20.384  20.386  20.391  (20.350, 20.412)
         Cpk        1.227   1.221   1.211   (0.931, 1.532)
Data2    θ          2.663   2.589   2.439   (1.770, 3.725)
         β          4.259   4.139   3.901   (2.522, 6.242)
         μ          18.706  18.780  18.928  (17.715, 19.561)
         Cpk        0.870   0.869   0.867   (0.695, 1.053)

Table 2 provides a few important posterior summaries for the entertained model parameters and Cpk, each estimated on the basis of the corresponding 1K posterior samples. These summaries are shown in the form of the estimated posterior mean, median, mode and the highest posterior density interval with 0.95 coverage probability (0.95 HPDI) for each of the two datasets. It can be observed from Table 2 that the estimated posterior mean, median and mode of each parameter are quite close to each other for both datasets, implying that the posterior distributions are approximately symmetric. Furthermore, the widths of the 0.95 HPDIs for all the parameters are quite small, indicating less variability in the estimated values of the parameters and, hence, ensuring the consistency of the estimated values. An important finding presented in Table 2 is that 1 ≤ Cpk ≤ 1.33 for Data1, indicating that the process is barely capable, whereas Cpk < 1 for Data2, implying that the process is incapable and requires further improvement. A similar conclusion was drawn on the basis of the control charts shown in Figure 1.
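Sample-based summaries of the kind reported in Table 2 can be computed from the retained draws as in the following sketch. The paper does not state how the HPDIs were obtained; the estimator used here (the shortest interval containing 95% of the sorted draws) is one common choice and is an assumption, as are the function names and the artificial draws. The posterior mode, which needs a density estimate, is not shown.

```python
import math
import statistics


def hpdi(samples, prob=0.95):
    """Shortest interval that contains a fraction prob of the sorted draws."""
    xs = sorted(samples)
    n = len(xs)
    # Number of draws the interval must cover; the small offset guards
    # against floating-point round-up in prob * n.
    m = max(1, math.ceil(prob * n - 1e-9))
    best = min(range(n - m + 1), key=lambda i: xs[i + m - 1] - xs[i])
    return xs[best], xs[best + m - 1]


def summarise(samples, prob=0.95):
    """Posterior mean, median and HPDI estimated from the draws."""
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "hpdi": hpdi(samples, prob),
    }


# Artificial right-skewed draws: squares of 0.00, 0.01, ..., 0.99.
draws = [(k / 100) ** 2 for k in range(100)]
summary = summarise(draws)
```

For a skewed sample such as this one the interval sits towards the dense end of the draws, which is what distinguishes an HPDI from an equal-tailed interval.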

4. Conclusion

Technological advancements have typically led to an expansion of industry, wherein the need for high-quality goods and services is reinforced by a competitive environment. From this vantage point, manufacturing industries are always susceptible to process failures that lead to products not meeting the desired specifications. The manufacturing sector has made extensive use of PCIs, which provide a numerical gauge of the ability of a process to produce goods that satisfy the factory-set quality standards. In estimating PCIs, it is most often assumed that the data are generated randomly from a normal model. Nonetheless, asymmetric data are found in many circumstances. This paper has demonstrated the utility of the three-parameter Weibull model in estimating the aforesaid index. Further, the Bayesian methodology developed in the paper is found to offer the intended inferences in a routine manner. The inferential results show that the process pertaining to Data1 is barely capable, while that pertaining to Data2 is incapable of offering the desired quality assurance.

References

[1] Albing, M. (2006). Process capability analysis with focus on indices for one-sided specification limits. PhD thesis, Lulea Tekniska Universitet.

[2] Chou, Y. and Owen, D. B. (1989). On the distributions of the estimated process capability indices. Communications in Statistics-Theory and Methods, 18(12):4549-4560.

[3] Clements, J. A. (1989). Process capability calculations for non-normal distributions. Quality Progress, 22:95-100.

[4] Devroye, L. (1986). Non-Uniform Random Variate Generation. Springer-Verlag, New York.

[5] Feigenbaum, A. V. (1951). Quality Control: Principles, practice and administration: An industrial management tool for improving product quality and design and for reducing operating costs and losses. McGraw-Hill.

[6] Gelfand, A. E., Hills, S. E., Racine-Poon, A., and Smith, A. F. (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. Journal of the American Statistical Association, 85(412):972-985.

[7] Gelfand, A. E. and Smith, A. F. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410):398-409.

[8] Gilks, W. R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Journal of the Royal Statistical Society: Series C (Applied Statistics), 41(2):337-348.

[9] Gupta, P. K. and Singh, A. K. (2017). Classical and Bayesian estimation of Weibull distribution in presence of outliers. Cogent Mathematics, 4(1):1300975.

[10] Hosseinifard, Z., Abbasi, B., and Niaki, S. (2014). Process capability estimation for leukocyte filtering process in blood service: A comparison study. IIE Transactions on Healthcare Systems Engineering, 4(4):167-177.

[11] Kane, V. E. (1986). Process capability indices. Journal of Quality Technology, 18(1):41-52.

[12] Kashif, M., Aslam, M., Al-Marshadi, A. H., and Jun, C. (2016). Capability indices for non-normal distribution using Gini's mean difference as measure of variability. IEEE Access, 4:7322-7330.

[13] Kocherlakota, S. and Kocherlakota, K. (1991). Process capability index: bivariate normal distribution. Communications in Statistics-Theory and Methods, 20(8):2529-2547.

[14] Kotz, S. and Johnson, N. L. (2002). Process capability indices: A review, 1992-2000. Journal of Quality Technology, 34(1):2-19.

[15] Lawless, J. F. (2011). Statistical models and methods for lifetime data. John Wiley & Sons, New York.

[16] Leiva, V., Marchant, C., Saulo, H., Aslam, M., and Rojas, F. (2014). Capability indices for Birnbaum-Saunders processes applied to electronic and food industries. Journal of Applied Statistics, 41(9):1881-1902.

[17] Lin, G., Pearn, W., and Yang, Y. (2005). A Bayesian approach to obtain a lower bound for the Cpm capability index. Quality and Reliability Engineering International, 21(6):655-668.

[18] Mann, N. R., Schafer, R. E., and Singpurwalla, N. D. (1974). Methods for Statistical Analysis of Reliability and Life Data. John Wiley & Sons, New York.

[19] Martz, H. F. and Waller, R. A. (1982). Bayesian reliability analysis. Wiley series in probability and mathematical statistics. John Wiley & Sons, New York.

[20] Meng, F., Yang, J., and Huang, S. (2021). Hypothesis testing of process capability index Cpk from the perspective of generalized fiducial inference. Quality and Reliability Engineering International, 37(4):1578-1598.

[21] Molina, R. (2018). Tecnicas de avaliacao de medidas e verificacao dos indices de capacidade. Presidente Prudente-Sao Paulo.

[22] Panichkitkosolkul, W. (2016). Confidence intervals for the process capability index Cp based on confidence intervals for variance under non-normality. Malaysian Journal of Mathematical Sciences, 10(1):101-115.

[23] Pearn, W. (1998). New generalization of process capability index Cpk. Journal of Applied Statistics, 25(6):801-810.

[24] Pina-Monarrez, M. R., Ortiz-Yanez, J. F., and Rodriguez-Borbon, M. I. (2016). Non-normal capability indices for the Weibull and lognormal distributions. Quality and Reliability Engineering International, 32(4):1321-1329.

[25] Ramos, P. L., Almeida, M. H., Louzada, F., Flores, E., and Moala, F. A. (2022). Objective Bayesian inference for the capability index of the Weibull distribution and its generalization. Computers & Industrial Engineering, 167:108012.

[26] Saha, M., Dey, S., Yadav, A. S., and Ali, S. (2021). Confidence intervals of the index Cpk for normally distributed quality characteristics using classical and Bayesian methods of estimation. Brazilian Journal of Probability and Statistics, 35(1):138-157.

[27] Sennaroğlu, B. and Şenvar, O. (2015). Performance comparison of Box-Cox transformation and weighted variance methods with Weibull distribution. Journal of Aeronautics and Space Technologies, 8(2):49-55.

[28] Singpurwalla, N. D. (2006). Reliability and risk: A Bayesian Perspective. John Wiley & Sons, New York.

[29] Smith, A. F. and Roberts, G. O. (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Methodological), 55(1):3-23.

[30] Smith, R. L. and Naylor, J. (1987). A comparison of maximum likelihood and Bayesian estimators for the three-parameter Weibull distribution. Journal of the Royal Statistical Society Series C: Applied Statistics, 36(3):358-369.

[31] Sullivan, L. P. (1984). Reducing variability: A new approach to quality. Quality Progress, 17(7):15-21.

[32] Upadhyay, S., Vasishta, N., and Smith, A. (2001). Bayes inference in life testing and reliability via Markov chain Monte Carlo simulation. Sankhya: The Indian Journal of Statistics, Series A (1961-2002), 63(1):15-40.
