
Mathematical Structures and Modeling. 2014. N. 4(32). P. 128-133

UDC 519.1:616.24

GRANULARITY EXPLAINS EMPIRICAL FACTOR-OF-THREE RELATION BETWEEN PROBABILITIES OF PULMONARY EMBOLISM IN DIFFERENT PATIENT CATEGORIES

Beverly Rivera1

Research Assistant, PhD student, e-mail: barivera@miners.utep.edu

F. Zapata2

Research Assistant Professor, Ph.D. (Computer Science), e-mail: fazg74@gmail.com

V. Kreinovich1

Ph.D. (Math.), Professor, e-mail: vladik@utep.edu

1University of Texas at El Paso, El Paso, TX 79968, USA
2Research Institute for Manufacturing and Engineering Systems (RIMES), USA

Abstract. Pulmonary embolism is a very dangerous difficult-to-detect medical condition. To diagnose pulmonary embolism, medical practitioners combine indirect signs of this condition into a single score, and then classify patients into low-probability, intermediate-probability, and high-probability categories. Empirical analysis shows that, when we move from each category to the next one, the probability of pulmonary embolism increases by a factor of three. In this paper, we provide a theoretical explanation for this empirical relation between probabilities.

Keywords: pulmonary embolism, granularity, factor-of-three relation, low-probability category, intermediate-probability category, high-probability category.

1. Formulation of the Problem

Pulmonary embolism: a brief reminder. One of the most dangerous medical conditions is pulmonary embolism, a blockage of the main artery of the lung (or one of its branches) which can lead to collapse and sudden death; see, e.g., [1]. Pulmonary embolism is responsible for about 15% of sudden deaths.

If detected in time, pulmonary embolism can be treated: either by anticoagulation medicines such as heparin or warfarin, or, in severe cases, by surgery. The problem is that pulmonary embolism is difficult to diagnose: the lungs appear mostly normal, fever is either absent or low-grade, etc.

Scores: a brief description. Since pulmonary embolism is difficult to diagnose directly, hospitals' emergency departments take into account several variables (age, heart rate, different types of pain, etc.) to produce a numerical score.

A high score indicates a high probability of pulmonary embolism; so, for such patients, doctors start applying aggressive treatment.

One of the most widely used ways to assign scores is known as the Geneva score; its latest version is described in [4]. Depending on this score, patients are classified into three categories:

• patients with low scores are classified into the low-probability category;

• patients with intermediate scores are classified into the intermediate-probability category; and

• patients with high scores are classified into the high-probability category.

Scores: empirical fact. According to an empirical study [4]:

• in the low-probability category, approximately 8% of the patients had pulmonary embolism;

• in the intermediate-probability category, approximately 28% of the patients had pulmonary embolism; and

• in the high-probability category, approximately 74% of the patients had pulmonary embolism.

From each category to the next one, the probability increases by a factor of three.

What we do in this paper. Division into categories is a particular case of granularity; see, e.g., [5]. In this paper, following ideas from [2,3], we use granularity techniques to provide a theoretical explanation for the above empirical relation between probabilities.

2. Explanation

Main idea. We are interested in the situation where we estimate probability — a quantity which can only take non-negative values. In general, to estimate the values of a non-negative quantity, we select a sequence of positive numbers

... < e_0 < e_1 < e_2 < ...

(e.g., 0.1, 0.3, 1.0, etc.), and every actual value x of the estimated quantity is then estimated by one of these numbers. Each estimate is approximate: when the estimate is equal to e_i, the actual value x of the estimated quantity may differ from e_i; in other words, there may be a non-zero estimation error Δx = e_i − x ≠ 0.

What is the probability distribution of this estimation error? This error is caused by many different factors. It is known that, under certain reasonable conditions, an error caused by many different factors is distributed according to the Gaussian (normal) distribution; see, e.g., [7]. This fact, called the central limit theorem, is one of the reasons for the widespread use of the Gaussian distribution in science and engineering applications. It is therefore reasonable to assume that Δx is normally distributed.

It is known that a normal distribution is uniquely determined by its two parameters: its average a and its standard deviation σ. Let us denote the average of the error Δx by Δe_i, and its standard deviation by σ_i. Thus, when the estimate is e_i, the actual value x = e_i − Δx is distributed according to a Gaussian distribution, with average e_i − Δe_i (which we will denote by ẽ_i) and standard deviation σ_i.

For a Gaussian distribution with given a and σ, the probability density is everywhere positive, so, theoretically, we can have values which are arbitrarily far away from the average a. In practice, however, the probabilities of large deviations from a are so small that the possibility of such deviations can be safely ignored. For example, it is known that the probability of having a value outside the "three sigma" interval [a − 3σ, a + 3σ] is ≈ 0.1%, and therefore, in most applications to science and engineering, it is assumed that values outside this interval are impossible.

There are some applications where we cannot make this assumption. For example, in designing computer chips, when we have millions of elements on the chip, allowing 0.1% of these elements to malfunction would mean that at any given time, thousands of elements malfunction and thus, the chip would malfunction as well. For such critical applications, we want the probability of deviation to be much smaller than 0.1%, e.g., ≤ 10⁻⁸. Such small probabilities (which practically exclude any possibility of an error) can be guaranteed if we use a "six sigma" interval [a − 6σ, a + 6σ]. For this interval, the probability for a normally distributed variable to be outside it is indeed ≈ 10⁻⁸.
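These tail probabilities can be checked numerically (a sketch; note that the exact two-sided three-sigma tail is about 0.27%, i.e., of the same order as the rounded 0.1% figure used in the text):

```python
from math import erfc, sqrt

def two_sided_tail(k):
    """P(|X - a| > k*sigma) for a normally distributed variable X."""
    return erfc(k / sqrt(2))

p3 = two_sided_tail(3)  # about 2.7e-3: "three sigma" deviations are rare
p6 = two_sided_tail(6)  # about 2.0e-9: below the 10^-8 threshold
assert p3 < 0.003
assert p6 < 1e-8
```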

Within this Gaussian description, what is the optimal granularity?

Optimal granularity: informal explanation. In accordance with the above idea, for each e_i, if the actual value x is within the "three sigma" range

I_i = [e_i − 3σ_i, e_i + 3σ_i],

then it is reasonable to take e_i as the corresponding estimate.

We want a granulation which would cover all possible values, so each positive real number must be covered by one of these intervals. In other words, we want the union of all these intervals to coincide with the set of all positive real numbers.

We also want to make sure that all values that we are covering are indeed non-negative, i.e., that for every i, even the extended "six sigma" interval

[e_i − 6σ_i, e_i + 6σ_i]

only contains non-negative values.

One of the main purposes of granularity is to decrease the number of "labels" that we use to describe different quantities. So, we want to consider optimal (minimal) sets of intervals. Formally, we can interpret "minimal" in the sense that whichever finite subset we pick, we cannot enlarge their overall coverage by modifying one or several of these intervals. Let us formalize these ideas.

Optimal granularity: formal description. In the following definitions, we will use the fact that an arbitrary interval [a⁻, a⁺] can be represented in the Gaussian-type form [a − 3σ, a + 3σ]: it is sufficient to take a = (a⁻ + a⁺)/2 and σ = (a⁺ − a⁻)/6.

Definition.

• We say that an interval I = [a − 3σ, a + 3σ] is reliably non-negative if every real number from the interval [a − 6σ, a + 6σ] is non-negative.

• A set {I_i}, i = 1, 2, ..., of reliably non-negative intervals I_i is called a granulation if every positive real number belongs to one of the intervals I_i.

• We say that a granulation can be improved if, for some finite set {i_1, ..., i_k}, we can replace the intervals I_{i_j} with some other intervals Ĩ_{i_j} for which

⋃_{j=1}^{k} I_{i_j} ⊆ ⋃_{j=1}^{k} Ĩ_{i_j},   ⋃_{j=1}^{k} I_{i_j} ≠ ⋃_{j=1}^{k} Ĩ_{i_j},

and still get a granulation.

• A granulation is called optimal if it cannot be improved.

Proposition. In an optimal granulation, I_i = [a_i, a_{i+1}], where a_{i+1} = 3a_i.

This explains the fact that each next probability is three times larger than the previous one.

Proof.

1°. Let us first prove that for every interval I_i = [a_i − 3σ_i, a_i + 3σ_i] from an optimal granulation, a_i = 6σ_i.

Indeed, since all the intervals I_i must be reliably non-negative, we can conclude that a_i − 6σ_i ≥ 0, hence a_i ≥ 6σ_i. So, to complete this part of the proof, it is sufficient to show that we cannot have a_i > 6σ_i. We will prove this by showing that if a_i > 6σ_i, then the corresponding granulation can be improved.

Indeed, in this case, we can take σ̃_i = a_i/6 > σ_i, and consider a wider interval Ĩ_i = [a_i − 3σ̃_i, a_i + 3σ̃_i] ⊃ I_i. Due to our choice of σ̃_i, this new interval is also reliably non-negative. Therefore, if we replace the interval I_i by Ĩ_i, we still get a granulation, and I_i ⊂ Ĩ_i, I_i ≠ Ĩ_i. Thus, the original granulation can be improved. So, if the granulation is optimal (i.e., cannot be improved), we have a_i = 6σ_i.

2°. Let us now prove that for every interval I_i = [a_i⁻, a_i⁺] from an optimal granulation, a_i⁺ = 3a_i⁻.

Indeed, from Part 1 of this proof, we can conclude that for an arbitrary interval I_i = [a_i⁻, a_i⁺] = [a_i − 3σ_i, a_i + 3σ_i] from the optimal granulation, we have 3σ_i = 0.5·a_i, hence a_i⁻ = a_i − 3σ_i = 0.5·a_i and a_i⁺ = a_i + 3σ_i = 1.5·a_i. Thus, a_i⁺ = 3a_i⁻.


3°. Let us now show that if two intervals from an optimal granulation intersect, then this intersection can only consist of a single point.

To prove this, we will show that if two intervals I_i = [a_i⁻, a_i⁺] and I_j = [a_j⁻, a_j⁺] have a more extensive intersection, then the granulation can be improved. Without losing generality, we can assume that a_i⁻ < a_j⁻.

We already know that, since both I_i and I_j are intervals from an optimal granulation, we have a_i⁺ = 3a_i⁻ and a_j⁺ = 3a_j⁻. Since a_i⁻ < a_j⁻, we thus conclude that

a_i⁺ = 3a_i⁻ < 3a_j⁻ = a_j⁺.

The fact that the intervals I_i = [a_i⁻, 3a_i⁻] and I_j = [a_j⁻, 3a_j⁻] have an intersection means that a_j⁻ ≤ 3a_i⁻; the fact that this intersection is not simply a single point means that a_j⁻ < 3a_i⁻. In this case, I_i ∪ I_j = [a_i⁻, 3a_j⁻].

Let us show that we can improve the granulation if we replace I_i by itself (Ĩ_i = I_i) and I_j by Ĩ_j = [3a_i⁻, 9a_i⁻]. Indeed, both new intervals are reliably non-negative, and the new union Ĩ_i ∪ Ĩ_j = [a_i⁻, 9a_i⁻] is a strict superset of the old one, because a_j⁻ < 3a_i⁻ and hence 3a_j⁻ < 9a_i⁻.
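This improvement step can be illustrated on concrete numbers (a sketch; the endpoints 1.0 and 2.0 are illustrative choices, not values from the paper):

```python
# Two intervals of the optimal form [a-, 3a-] whose intersection is more
# than a single point: a_i^- = 1, a_j^- = 2, and 2 < 3*1, so they overlap.
ai, aj = 1.0, 2.0
Ii = (ai, 3 * ai)               # [1, 3]
Ij = (aj, 3 * aj)               # [2, 6]; overlaps [1, 3] on [2, 3]
old_union = (Ii[0], Ij[1])      # [1, 6]

# Replace I_j by the interval [3*a_i^-, 9*a_i^-], as in the proof.
Ij_new = (3 * ai, 9 * ai)       # [3, 9]
new_union = (Ii[0], Ij_new[1])  # [1, 9]

# The new intervals meet in a single point (x = 3), and their union
# strictly contains the old one.
assert Ii[1] == Ij_new[0]
assert new_union[0] <= old_union[0] and new_union[1] > old_union[1]
```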

4°. So, in an optimal granulation, every interval must be of the type [a, 3a], these intervals must cover all positive real numbers, and they cannot intersect in more than one point. Thus, right after each interval [a_i, 3a_i], there should be the next interval [a_{i+1}, 3a_{i+1}], so we should have a_{i+1} = 3a_i.

Thus, we get the description from the formulation of the proposition.

5°. One can also easily prove that the granulation in which I_i = [a_i, a_{i+1}] with a_{i+1} = 3a_i cannot be improved and is thus optimal. The proposition is proven.
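The resulting optimal granulation can also be checked numerically (a sketch; the starting endpoint a_0 = 1.0 is an arbitrary illustrative choice, not fixed by the paper):

```python
def gaussian_form(lo, hi):
    """Midpoint a and sigma such that [lo, hi] = [a - 3*sigma, a + 3*sigma]."""
    return (lo + hi) / 2, (hi - lo) / 6

a = 1.0  # arbitrary starting endpoint
for _ in range(5):
    lo, hi = a, 3 * a                    # interval of the form [a_i, 3*a_i]
    mid, sigma = gaussian_form(lo, hi)
    assert abs(mid - 6 * sigma) < 1e-12  # a_i = 6*sigma_i (Part 1 of the proof)
    assert mid - 6 * sigma >= -1e-12     # interval is reliably non-negative
    assert abs(hi / lo - 3) < 1e-12      # a_i^+ = 3*a_i^- (Part 2 of the proof)
    a = hi  # the next interval starts where this one ends: a_{i+1} = 3*a_i
```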

Acknowledgments

This work was supported in part by the National Science Foundation grants HRD-0734825 and HRD-1242122 (Cyber-ShARE Center of Excellence) and DUE-0926721.

References

1. Goldhaber S.Z. Pulmonary thromboembolism // Kasper D.L., Braunwald E., Fauci A.S., Hauser S.L., Longo D., Jameson J.L. Harrison's Principles of Internal Medicine. Columbus, Ohio: McGraw Hill, 2004. P. 1561-1565.

2. Hobbs J.R., Kreinovich V. Optimal choice of granularity in commonsense estimation: why half-orders of magnitude // Proceedings of the Joint 9th World Congress of the International Fuzzy Systems Association and 20th International Conference of the North American Fuzzy Information Processing Society IFSA/NAFIPS 2001. Vancouver, Canada, July 25-28, 2001. P. 1343-1348.

3. Hobbs J., Kreinovich V. Optimal choice of granularity in commonsense estimation: why half-orders of magnitude // International Journal of Intelligent Systems. 2006. V. 21, N. 8. P. 843-855.

4. Le Gal G., Righini M., Roy P.M., Sanchez O., Aujesky D., Bounameaux H., Perrier A. Prediction of pulmonary embolism in the emergency department: the revised Geneva score // Annals of Internal Medicine. 2006. V. 144, N. 3. P. 165-171.

5. Pedrycz W., Skowron A., Kreinovich V. (eds.). Handbook on Granular Computing. Chichester, UK: Wiley, 2008.

6. Pregerson B. Quick Essentials: Emergency Medicine. Bel Air, California: E.D. Insight Books, 2004.

7. Wadsworth H.M. (editor). Handbook of statistical methods for engineers and scientists. New York: McGraw-Hill Publishing Co., 1990.
