Научная статья на тему 'Mathematical model for cancer prevalence and cancer mortality'

Mathematical model for cancer prevalence and cancer mortality Текст научной статьи по специальности «Медицинские технологии»

CC BY
52
11
i Надоели баннеры? Вы всегда можете отключить рекламу.

Аннотация научной статьи по медицинским технологиям, автор научной работы — Kalas Josef, Novotny Jan, Michalek Jaroslav, Nakonechny Olexander

The first part of the paper designs a deterministic model to describe cancer prevalence and mortality in a population. Next the asymptotic properties of the model are investigated. In the second part, the model is applied to real-world data. For selected model data, a numerical solution is found to the di erential equations describing the model, a long-term prediction is made with its results compared with those of predictions made by regression analysis, which are often used to model the prevalence and mortality in the present literature. It is shown that, although for short-term predictions (up to 10 years) both approaches are nearly equivalent, there is a major di erence between them if a longer-term prediction is made and finding a reliable prediction for a period longer than 10 years based on short time series seems to be unlikely.

i Надоели баннеры? Вы всегда можете отключить рекламу.
iНе можете найти то, что вам нужно? Попробуйте сервис подбора литературы.
i Надоели баннеры? Вы всегда можете отключить рекламу.

Текст научной работы на тему «Mathematical model for cancer prevalence and cancer mortality»

MATHEMATICAL MODEL FOR CANCER PREVALENCE AND

CANCER MORTALITY

© J. Kalas, J. Novotny, J. Michalek, O. Nakonechny

Institute of Mathematics. Faculty of Mechanical Engineering Brno UT e-mail: fusek.m@gmail.com Taras Shevchenko National University of Kyiv. Cybernetic Faculty e-mail: a.nakonechniy@gmail.com

Abstract. The first part of the paper designs a deterministic model to describe cancer prevalence and mortality in a population. Next the asymptotic properties of the model are investigated. In the second part, the model is applied to real-world data. For selected model data, a numerical solution is found to the differential equations describing the model, a long-term prediction is made with its results compared with those of predictions made by regression analysis, which are often used to model the prevalence and mortality in the present literature. It is shown that, although for short-term predictions (up to 10 years) both approaches are nearly equivalent, there is a major difference between them if a longer-term prediction is made and finding a reliable prediction for a period longer than 10 years based on short time series seems to be unlikely.

Introduction

Today, cancer is one of the major health risks of our civilisation. The statistics of the cancer prevalence and mortality related to the geographical distribution of such variables is a subject that has been receiving much attention in the present literature [3, 4, 6, 7]. The objective is to design good mathematical models that can be used to describe the changes in the prevalence numbers with respect to their prediction and to the prediction of mortality.

This paper is concerned with the design of a mathematical model based on differential equations for making reliable short-term predictions for a given population with the possibility of a long-term perspective. The model is then tested on real-world data and the resulting predictions are compared with the predictions obtained by regression analysis.

1. Model

We will use the following denotations in a population with cancer occurrence:

n1(t) number of people suffering from cancer (prevalence) at time t, n2(t) number of deaths from cancer (mortality) at time t.

The time interval in which the prevalence ni(t) and mortality n2(t) is to be modelled is (0,T) with T being a time horizon and, denoting by n(t) the population size at time t, n(T) gives the size of the observed population at the time horizon T.

When constructing the model, we assume the prevalence change over a time interval At to be proportional to the length of this interval next to the prevalence at t and, finally, to the logarithm of ^T) • Thus, as t increases and t is close to the time horizon T, the change in the growth rate drad1t(~t) is slower and, when the time horizon n(T) is reached, it almost vanishes. Similarly, we assume that the change in mortality over a time interval At is proportional to the length of this interval and to the mortality n2(t), and, finally, to the logarithm of ^(j)• Thus, when describing the prevalence behaviour, we see that it does not change in the limit case if the mortality reaches the value of prevalence.

The given considerations lead to the following system of differential equations for prevalence n and mortality n2 :

dni(t) , . -, ( n(T)

iW - aini (t) ln v ;

dt \n1(t)

dn2(t) /^n1(t) o^n (t) ln '

dt \n2(t)

These equations should be solved in terms of n and n2, subject to initial conditions ni(to) = nio and n2(t0) = n2o. The model has two parameters, a and a2, which affect the shape of n1 and n2, respectively. When fitting the model to a particular population data, the initial conditions are given, while the parameters a1 and a2 are to be estimated. The constant n(T) in equation (1), as mentioned above, denotes the size n of the whole population (e.g. of a given country) at time T - the horizon of the intended prognosis. This quantity should be estimated or based on an expert judgment.

2. The phase analysis of the model equations

It can be shown that the solutions of (1) have the form

n1(t) = exp {lnn(T) — cexp (—a1t)} . (3)

Inserting this into (2) yields an equation in n2 and t only, which is however nontrivial. Therefore we shall accomplish phase analysis of the autonomous two-dimensional system (1), (2) in the first quadrant of the phase space of (1), (2). It can be easily seen that, for the right-hand sides of (1) and (2), it holds that a1n1 ln ^^pj > 0 (< 0) iff

0 < n1 < n(T) (n1 > n(T)) and a2n2 ln ^^ > 0 (< 0) iff n1 > n2 > 0 (0 < n1 < n2). Hence the direction field of (1), (2) looks as in Figure 1. The nulclines of (1), (2) are lines n2 = n1 and n1 = n(T). From the direction field we infer that any trajectory of (1), (2)

o o

starting in the interior R+ of the first quadrant R+ remains in R+ for t ^ ro and any

trajectory is bounded. Taking into account the practical meaning of ni, n2, it is obvious that only trajectories lying in the interior of the shaded triangle T are admissible in our model.

U2 '

un \ lç

1 ^ X / Î

S

\ \

T

/ Î / t \

MT)

Fig. 1. Direction field of the system (1), (2).

Theorem 1. The autonomous system (1), (2) has a unique stationary point S = (n(T),n(T)) in the interior of the first quadrant. The trajectory starting at a point (n(T),n20) different from the stationary point S is a part of a straight line n1 = n(T).

o

Any trajectory starting in the interior T of the triangle T remains in T for increasing t and tends to the point S as t ^ œ (see Figure 2).

Proof: Any stationary point of (1), (2) is an intersection of nulclines of (1), (2).

o

Clearly, there is the unique intersection of the nulclines n2 = n1, n1 = n(T) in R+ at the point S = (n(T),n(T)). The solution with the initial point (n(T),n20), where n20 G (0, n(T)) U (n(T), œ), is of the form

(ni(t), n2(t)) = ( n(T), n(T) exp J ln exp[a(to - t)]

V I n(T)

The corresponding trajectory is a part of a straight line n1 = n(T). The Jacobi matrix of the mapping

/ x ( ! /XT)\ ! /V

(n1,n2) ^ a1n1in - , ln —

V V n1 J V n2,

is

" ' n(T ) a1 | —1 + in -

J (n,1,n,2 )

«1 (— 1+ln *?) 0

»2Û'2 fin ^ — ll

2 ni 2 y n2 J

0

Thus

J (n(T ),n(T ))

a1 0

a2 —a2

Since the eigenvalues of the matrix J(n(T),n(T)) are A1 = —a1 < 0, A2 = —a2 < 0, the stationary point S = (n(T),n(T)) is a stable node. With respect to the direction field

o

of (1), (2), we observe that any trajectory starting in the interior T of the triangle T

o

remains in T for t ^ ro. In view of the Poincare-Bendixson theory (see e. g. Hartman [2],

o

Chapter VII), the w-limit set i?(C+) of any trajectory C + starting in T is the set i?(C+) = {(n(T),n(T))}. This implies (n1(t), n2(t)) ^ (n(T),n(T)) as t ^ ro for any solution (n1(t),n2(t)) of (1),(2) corresponding to the considered trajectory. □

"2 t

n(T )

0 n(T) ni

Fig. 2. Phase trajectories corresponding to the solution (n1(t), n2(t)).

Theorem 2. If a2 > a1, then infinitely many trajectories of (1), (2) starting in T approach the stationary point S = (n(T),n(T)) as t ^ œ with the characteristic direction (a2 — a1;a2) and there is at least one trajectory of (1), (2) starting at T such that it approaches the point S with the characteristic direction (0,1). Moreover,

n1(t) = n(T) + e-ait [(a2 — a1)K + o(1)] as t ^ œ, n2(t) = n(T) + e-ait[a2K + o(1)] as t ^œ

o

for infinitely many solutions (n1(t), n2(t)) of (1), (2) starting in T, where k is a nonzero real constant dependent on the solution (n1 (t),n2(t)).

Proof: Denote ' = dt. The transformation x1 = n1 — n(T), x2 = n2 — n(T) converts the system (1), (2) into the system (written in a vector form)

—a1 «2

0

«2

+

a(x1 + n(T)) in xin+nTT) + «1^1 ^2(^2 + n(T)) in Xi+n(T) — a2X1 + «2X2^

with the singular point (x10,x20) (2). The transformation

(0, 0) corresponding to the singular point S of (1)

yields

0 a2 — a1

1 «2

—«2 0 0 —a1

+ F (y1,y2),

where

F (y1,y2)

F1 being defined by

Fl(X1,X2)

— «2 a2—ai 1

a2—ai

F1((a2 — «1 )y2,y1 + «2^2),

«1(^1 + n(T)) in ^) + «1X1 ^2(^2 + n(T)) in X1+n(T) — "2X1 + «2X2^

Notice that the inverse transformation to (4) is given by

a2—ai

1

a2—ai

It can be easily verified that ||F(y1;y2)|| / ||(y1,y2)||1+e ^ 0 as (y1;y2) ^ (0,0) for some e > 0, where || • || denotes the Euclidean norm in R2. The transformations used are

iНе можете найти то, что вам нужно? Попробуйте сервис подбора литературы.

o

regular affine, the triangle T is converted to a new triangle T' and T' is an invariant set with respect to the system (5). Combining this with Theorem 3.1 from Chapter VIII of [2],

o

we get that infinitely many solutions (y1(t),y2(t)) of (5) with (y1(to),y2(t0)) G T' satisfy (y1(t),y2(t)) ^ (0, 0) and (y 1 (t), y2(t))/1|(y 1 (t), y2(t))|| ^ (0,1) as t ^ rc. Moreover, Theorem 3.5 from Chapter VIII of [2] provides the equations

y1(t) = e-ai*o(1) as t ^ rc, y2(t) = e-ai*(x + o(1)) as t ^rc

for these solutions, where k is a nonzero real constant. Using the transformation (4), the characteristic direction (0,1) of (5) is converted to the characteristic direction (a2 — ai,a2) and the relations n = n(T) + (a2 — ai)y2, n2 = n(T) + y1 + a2y2 yield the desired results. Note that the trajectory corresponding to the solution (n1(t), n2(t)) = (n(T),n(T) exp |ln nf0) exp[a2(t0 — t)]|) tends to the singular point S with the characteristic direction (0,1) as t ^ œ. □

The case a2 < a1 is analogous and from our data point of view is not important.

3. Parameter estimation

This section is concerned with parameter estimation « and a2. To estimate the parameters a and a2, we propose to minimize the L2 distance between the predictions and the real-world data. Consider real-world data for the years t0 • • • tm denoting them by ni0, • • •, nim and n20, • • •, n2m. Also denote the solution to (1), (2) by n^a^ t), n2(a^ a2, t) where the dependence on the parameters ai and a2 is stressed. The optimization problem can then be expressed as

m m

(nii - ni(ai,ti))2 + C2 (n2i - n2(ai, «2, t»))2], (6)

i=0 i=0

s.t. ai > 0, «2 > 0,

where ci and c2 are suitable weighting coefficients (in the basic setting ci = 1, c2 = 1 ).

As mentioned in the previous section, the solutions to (1) have the form (3). Substituting (3) into (2) yields a non-trivial equation in n2 and t only. Thus it is better, using computer, to integrate the equations (1), (2) numerically and use a black-box type solver for the problem (6). In this case, the solver requires that the objective function of (6) is evaluated on a sequence of points (ai, a2). For each such point, the equations (1), (2) are solved and subsequently the value of (6) is obtained.

By this approach, satisfactory results on the given data were achieved. We used Octave with the lsode ODE solver [5] to integrate the equations (1), (2), and the NOMAD [1] solver for the optimization.

4. Data

The model was tested for functionality using the data shown by Table 1. In processing prevalence the numbers of colon cancers were used (the cancer type being C18) in the Czech Republic's male population from 1989 to 2005, see [3]. The table is completed by further demographic data on the numbers of new born and deceased men as well as the total size of the Czech male population during the years in question.

Table 1. Men's population — C18 cancer type.

diseased total

year prevalence incidence mortality births deaths population

(ni) (n2 )

1989 3853 1505 1101 n.a. n.a. n.a.

1990 4075 1476 1153 n.a. n.a. n.a.

1991 4416 1730 1258 129354 63342 5006002

1992 4807 1710 1193 121705 61767 5013413

1993 5231 1756 1205 121025 59180 5019297

1994 5578 1835 1294 106579 58609 5020464

1995 6091 1886 1214 96097 58925 5016515

1996 6525 1951 1255 90446 56709 5012085

1997 7149 2234 1308 90657 56692 5008730

1998 7602 2163 1354 90535 55139 5005435

1999 8267 2325 1389 89471 54845 5001062

2000 8821 2323 1437 90910 54882 4996731

2001 9511 2459 1467 90715 53772 4967986

2002 10268 2603 1415 92786 54377 4966706

2003 10938 2559 1488 93685 55880 4974740

2004 11569 2460 1414 97664 54190 4980913

2005 12273 2622 1414 102211 54072 5002648

5. Results

Since the total population of the Czech Republic is steady, we estimate the value of n(T) to be approximately 5 000 000. The estimated parameters of the model (1) and (2)

are

a1 = 0.0111 a = 0.0119

and the fitted time dependencies are shown in Figure 3. It can be seen that the short-time predictions obtained from this model are reasonable, especially for prevalence.

-estimated

o data

years

(a) Prevalence.

(b) Mortality.

Fig. 3. Estimates.

In the event of a long-term prediction, the model achieves an equilibrium close to n(T) — see Figure 4. It is obvious however, that the model does not give a satisfactory description of reality in the long term. It is clear from the pictures that, for a short time horizon (of up to ten years) the predictions obtained seem to be realistic. Predictions for a long time horizon, however, are rather debatable.

5 4.5 4 3.5

8 3

C

■§ 2.5 * 2 1.5 1

0.5

02000 2100 2200 2300 2400 2500 2600 2700 2800 years

2000 2100 2200 2300 2400 2500 2600 2700 2800 years

(a) Prevalence. (b) Mortality.

Fig. 4. Long term estimates.

x 10

2000

1800

1600 -

1400

1200 -

1000

800-1-1-1-1-1-1-1-1-1-1-

1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010

1990

1995

2000

2005

years

x 10

x 10

6. Comparison with the regression model

In the medical community, linear regression models prevail nowadays. We present a regression model of both mortality and prevalence, based on the data of Table 1, which is to be compared with the model based on differential equations (DE model) developed in the previous section.

A linear dependence for mortality and a quadratic one for prevalence are the appropriate polynomial choices, as indicated by statistical tests of their coefficients differences from zero.

Mortality: m = + A(y — 1989) Prevalence: p = £o + A(y — 1989) + &(y — 1989)2

The regression coefficients are summarized in tables (3) and (2), and the fitted dependencies are depicted in Figure 5.

Table 2. Regression coefficients — mortality.

parameter estimate conf. interval (95%)

ßo 1144 1096 1190

ßi 21.4 16.4 26.4

Table 3. Regression coefficients — prevalence

parameter estimate conf. interval (95%)

ßo 3792 3714 3870

ßi 292 270 315

ß2 15.2 13.8 16.6

Figure 6 shows a comparison of the regression and DE models. The models will differ by more than 50 percent by 2040 in the case of mortality, and by 2070 in the case of prevalence. This considerable difference may be accounted for by the regression model dependent variables growing at a polynomial while those of the DE model at an exponential rate. Because of this, the use of either of these models for long-term predictions is considerably limited. However, the graphics give an outline of the behaviour of the observed quantities. Based on the comparison of the models, it may be concluded that the regression predictions, used quite often nowadays, are applicable to short-term predictions (of up to ten years). The values predicted by the regression approach are similar to those obtained from the dynamic DE model. For long-term predictions extending beyond 10 years, however, the methods differ considerably thus making a reliable prediction for this period based on the short data series rather unrealistic.

5 4.5 4 3.5 3

"55 2.5

1 a.

2 1.5 1

0.5 0

1995 2000

years

(a) Prevalence.

years

iНе можете найти то, что вам нужно? Попробуйте сервис подбора литературы.

(b) Mortality.

Fig. 5. Regression model.

— DE model

+ data

--- - regression

JT

O

E

1990 2000 2010 2020 2030 2040 2050 2060 2070 2080 2090 2100 years

1990 2000 2010 2020 2030 2040 2050 2060 2070 2080 2090 2100 years

(a) Prevalence. (b) Mortality.

Fig. 6. Model results comparison.

The paper was prepared with the support of the Czech Science Foundation — Projects MSMNo. 0021622418 and CQR No. 1M06047 and of the grant 201/08/0469 of the Czech Science Foundation.

References

1. Abramson M. A., Audet C., Couture G., Dennis J. E., Jr., and LeDigabelS.: The nomad project. Software available at www.gerad.ca/nomad. August 2009.

2. Hartman, P.: Ordinary differential equations, Society for Industrial and applied mathematics (SIAM), Philadelphia, 2002,(Second edition).

x 10

1600

1500

1400 -

1300 -

1200

1100- u

1000-1-1-1-1-1-1-1-1-1-1-

1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010

1990

2005

2010

x 10

x 10

5

3. Konecny M., Kubacek P., Stampach R., Kozel J. Strachon Z., Dite P., Kraus R., Koska P., Geryk E., Michalek J. and Odehnal J. Cancer prevalence in the Czech Republic 1989- 2005-2015 PF MU Brno, 2008, pp. 1-69.

4. Levi F., Lucchini F., Negri E., Boyle P., La Vecchia C.: Mortality from major cancer sites in European Union, 1955 - 1998. Ann. Oncol. 2003, 14:490-495.

5. Radhakrishnan K. and Hindmarsh A. C., Description and use of Isode, the livermore solver for ordinary differential equations, Tech. Rep., 1993. [Online]. Available: http://gltrs.grc.nasa.gov/cgi-bin/GLTRS/browse.pl?2003/RP-1327.html. October 2009.

6. Stracci F., Canosa A., Minelli L., Petrinelli A. M., Cassetti T., Romagnoli C. and La Rosa F.: Cancer mortality trends in the Umbria region of Italy 1978 - 2004: a jointpoint regression analysis BMC Cancer 2007, 7;10 p. 1-9.

7. Wingo P. A., Cardinez C. J., Landis S. H., Greenlee R. T., Ries L. A., Anderson R. N., Thun M. J.: Long-term trends in cancer mortality in the United States, 1930 - 1998. Cancer 2003, 97 (Suppl 12]: 3133-3275. Erratum Cancer 2005, 103 Suppl 12:2658)

i Надоели баннеры? Вы всегда можете отключить рекламу.