Increasing the accuracy of macroeconomic time series forecast by incorporating functional and correlational dependencies between them

Moiseev N.; Volodin A.

Прикладная эконометрика, 2019, т. 53, с. 119-137. Applied Econometrics, 2019, v. 53, pp. 119-137.


The paper presents a parametric approach to forecasting vectors of macroeconomic indicators that takes into account functional and correlational dependencies between them. We argue that this information makes it possible to achieve a steady decrease in their mean-squared forecast error. The paper also provides an algorithm for calculating the general form of the corrected probability density function for each of the modelled indicators. To demonstrate the efficiency of the proposed method, we conduct a rigorous simulation and empirical investigation.

Keywords: regression analysis; GDP; inflation; monetary base; unemployment; maximum likelihood method; probability density function; functional and correlation dependencies of macroeconomic indicators; projection accuracy; mean square error; Bayesian econometrics.

JEL classification: C52; C53.

1. Introduction

When constructing complex forecasting systems for macroeconomic processes, researchers often resort to regression tools. This approach is one of the most common and yields not only point forecasts but also confidence intervals for a given significance level. Many methods for constructing regression models have been developed, based either on choosing a single optimal model or on weighting the set of equations under consideration. The best model is usually selected according to some efficiency criterion. One such criterion is the Bayesian information criterion proposed by Schwarz (1978). A considerable amount of work was subsequently devoted to its application in econometrics, see, for example, (Raftery et al., 1997; Hoeting et al., 1999; Brock, Durlauf, 2001; Fernandez et al., 2001; Garratt et al., 2003; Sala-i-Martin et al., 2004). Another common criterion is the Mallows information criterion, which was first introduced in (Mallows, 1973) and is similar to the Akaike information criterion, see (Akaike, 1973) and (Shibata, 1980), whose asymptotic optimality was investigated by Shibata (1980, 1981, 1983), Lee and Karagrigoriou (2001), Ing (2003, 2004, 2007) and Ing, Wei (2003, 2005). There are also a number of methods for specifying the regression equation, see, for example, (Moiseev, 2017a; Shehata, White, 2008). Recently, however, the focus of researchers has shifted towards weighting models, since this procedure has proved more efficient than selecting a single optimal model. Model averaging was pioneered by Bates and Granger (1969) and further developed in (Granger, Ramanathan, 1984). Since then, the synergistic effect of weighting models in terms of reducing the forecast error has been confirmed by a significant number of econometricians, and the effectiveness of this approach is hardly in question, see, for example, (Granger, 1989; Clemen, 1989; Hendry, Clements, 2002; Hansen, 2007, 2008, 2014; Claeskens, Hjort, 2003; Zubakin et al., 2015; Moiseev, 2016, 2017b; Stock, Watson, 2006).

1 Moiseev Nikita: Plekhanov Russian University of Economics, Moscow, Russia. Volodin Andrei: University of Regina, Regina, Canada.

However, regardless of the chosen modeling method, each macroeconomic indicator is in most cases modeled and predicted by a model constructed specifically for it, i.e. separately from the others. As a result, the predicted indicators often turn out to be incoherent with each other, or one of the functionally dependent indicators is predicted from the predictions obtained for the remaining ones. For example, when predicting the indices of the Gross Domestic Product (GDP), the GDP deflator and GDP in constant prices, one may use a trivial functional relation: the GDP index is equal to the product of the GDP deflator index and the real GDP index. Models are therefore usually built for only two of these indicators, and the forecast for the third is obtained by expressing its value through the predictions of these models. With this approach, however, the information that could be obtained from a model for the third indicator is lost, which is a significant omission, since this information could improve the forecasts. In this paper, we propose that the accuracy of predictions of functionally and correlationally dependent indicators can be improved if, after forecasting each of them separately, we apply special adjustments that take into account the dependencies between them. Since the system of statistical accounting of macro-level economic processes involves a large number of functional and correlational dependencies between the measured characteristics of the economic structure, the proposed method for correcting the predictions of regression models has a wide range of application. It should be noted that ideas somewhat resembling those expressed in this paper have already been published in international statistical journals; they concern regression with multiple responses, i.e. modeling a vector of target variables.
Such models consist of a system of regression equations with the assumption of some degree of correlation between the modelled target variables. A number of papers have been devoted to nonparametric models with two target variables (biresponse models) estimated using smoothing splines, see, for example, (Wang et al., 2000; Chen, Wang, 2011; Welsh, Yee, 2006; Lestari et al., 2010), as well as using polynomial approximation, see (Chamidah et al., 2012). The purpose of models with multiple responses is to obtain more accurate predictions than models with one target variable: the latter take into account only the influence of explanatory factors on the output variable, whereas the former add information about the interdependence between the predicted responses. This interdependence is usually represented by the variance-covariance matrix of errors, which is used to weight the observed deviations when estimating the true model parameters, by analogy with the generalized least-squares method. The maximum effect is achieved when there is a sufficiently strong correlation between the analyzed responses, as is clearly shown in the works of Ruchstuhl et al. (2000), Welsh et al. (2002) and Guo (2002). Regression models with multiple responses are typically used in the analysis of categorical or panel data in medicine and sociology, see (Wang et al., 2000; Chen, Wang, 2011; Welsh, Yee, 2006; Antoniadis, Sapatinas, 2007). However, in the field of economics, approaches that share a similar idea can also produce models of higher quality. A prerequisite for this improvement in forecast accuracy, as mentioned above, is the fact that there are functional and strong correlational dependencies among the main macroeconomic indicators that can be used to improve the reliability of the developed models.

The paper has the following structure. In section 2 we propose various conceptual models of the functional dependencies of macroeconomic processes and provide some examples of highly correlated indicators. Section 3 presents a method of accounting for functional and correlational dependencies between macroeconomic indicators when they are simultaneously predicted using systems of regression equations. Section 4 is devoted to simulation and empirical investigation of the proposed methods in order to demonstrate their efficiency. Section 5 summarizes the obtained results, highlights the key characteristics of the proposed method and discusses directions for further research.

2. Conceptual model of interdependencies of macroeconomic processes

The interdependencies of macroeconomic indicators have both linear (for example, balance indicators) and nonlinear forms (for example, the above dependence of the GDP index on the price index and the index of products). In this paper we will consider several conceptual complex models based on functional and correlational interdependencies of macroeconomic indicators. Let us begin with the widely known Fisher equation of exchange, which links the money supply, money velocity and GDP.

$$M \cdot V = P \cdot Q \quad \text{or} \quad M \cdot V = GDP, \tag{1}$$

where M is the aggregate of the money supply, V is velocity of M, P is the level of prices for goods and services, and Q is the number of goods and services sold.

If we consider the growth indices of each of these indicators, then (1) is transformed into the following equation:

$$\frac{M_t}{M_{t-1}} \cdot \frac{V_t}{V_{t-1}} = \frac{GDP_t}{GDP_{t-1}} \;\Rightarrow\; I_M \cdot I_V = I_{GDP}. \tag{2}$$

Thus, equation (2) is the first complex model of functional interrelation of three macroeconomic indicators considered in this paper. The second model is also a three-factor model and represents the functional dependency between real GDP, the GDP deflator and nominal GDP.

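$$\frac{\sum p_t q_t}{\sum p_{t-1} q_t} \cdot \frac{\sum p_{t-1} q_t}{\sum p_{t-1} q_{t-1}} = \frac{\sum p_t q_t}{\sum p_{t-1} q_{t-1}} \;\Rightarrow\; I_p \cdot I_q = I_{pq}, \tag{3}$$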

where the first factor is the Paasche index, and the second one is the Laspeyres index.
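The decomposition in (3) can be checked numerically. The sketch below uses made-up prices and quantities for three goods (all numbers and variable names are illustrative assumptions, not data from the paper) and assumes the usual aggregate forms of the Paasche price index and the Laspeyres quantity index.

```python
# Hypothetical prices and quantities for three goods in two periods.
p_prev = [10.0, 4.0, 25.0]
q_prev = [100.0, 250.0, 40.0]
p_curr = [11.0, 4.4, 24.0]
q_curr = [105.0, 240.0, 46.0]


def value(prices, quantities):
    """Total value: sum of price times quantity over all goods."""
    return sum(p * q for p, q in zip(prices, quantities))


# Paasche price index: current-period quantities serve as weights.
paasche_price = value(p_curr, q_curr) / value(p_prev, q_curr)
# Laspeyres quantity index: previous-period prices serve as weights.
laspeyres_quantity = value(p_prev, q_curr) / value(p_prev, q_prev)
# Index of nominal value.
value_index = value(p_curr, q_curr) / value(p_prev, q_prev)

# The product of the two indices reproduces the nominal index.
identity_gap = abs(paasche_price * laspeyres_quantity - value_index)
print(paasche_price, laspeyres_quantity, value_index, identity_gap)
```

Because the identity holds exactly, separately produced forecasts of the three indices are coherent only if they satisfy it as well.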

By incorporating model (3) into model (2), we obtain the third conceptual model, which includes the five abovementioned factors.

$$2 \cdot I_M \cdot I_V = 2 \cdot I_{GDP} \;\Rightarrow\; I_M \cdot I_V = \frac{I_{GDP} + I_{RGDP} \cdot I_P}{2}. \tag{4}$$

If we move from growth indices to nominal indices, we can construct even more complex models of functionally dependent macroindicators. Let us consider in more detail the nominal GDP, which is the total market value of all finished goods and services produced and sold on the territory of the country during the year at current market prices.

There are several ways of calculating GDP, one of which is based on accounting for expenditures. According to the expenditure method, GDP can be represented as follows:

$$GDP = C + In + G + Ex - Im, \tag{5}$$

where C is total final consumption, In — investments, G — government expenditure, Ex — total export, and Im — total import.

If we apply this method of calculating GDP to formula (1), then by trivial mathematical operations we obtain the following conceptual model of the interrelation of macroeconomic indicators, which incorporates ten indicators:

$$M \cdot V = \frac{RGDP \cdot I_P + GDP + C + In + G + Ex - Im}{3}. \tag{6}$$

Next, consider GDP per capita, which is one of the main indicators of well-being and life standard in a country or a region. This indicator is calculated as follows:

$$GDP_{\text{per capita}} = \frac{GDP}{P}, \tag{7}$$

where P denotes population.

Then, if we incorporate GDP per capita into model (6), we get the following conceptual model of the relationship between the considered macroindicators:

$$M \cdot V = \frac{RGDP \cdot I_P + GDP + C + In + G + Ex - Im + GDP_{\text{per capita}} \cdot P}{4}. \tag{8}$$

Next, let us move on to the second method of calculating GDP, which is based on accounting of income. According to this method, GDP is calculated as:

$$GDP = Y + A + T - S + Z, \tag{9}$$

where Y — national income, A — amortization, T — indirect taxes, S — subsidies and Z — net income from foreign production factors.

National income, in its turn, consists of the following components:

$$Y = W + R + Int + O, \tag{10}$$

where W — wages before taxes, R — rental payments, Int — interest accrued and O — gross profit of organizations.

Thus, if we incorporate this method of calculating GDP into model (8), we can construct a more comprehensive macroeconomic equation:

$$5 \cdot M \cdot V = RGDP \cdot I_P + GDP + C + In + G + Ex - Im + GDP_{\text{per capita}} \cdot P + W + R + Int + O + A + T - S + Z. \tag{11}$$

This model includes twenty different macroeconomic indicators, placed together in one equation. Therefore, we conclude that the system of statistical macroeconomic indicators is constructed in such a way that it is possible to form a model that includes the major macroeconomic indicators and establishes functional dependencies between them.

Thus, based on the proposed model, it is possible to create an integrated system for forecasting major macroeconomic indicators of a country, which allows obtaining more consistent predictions.

It should be noted that, in addition to functional dependencies, macroeconomic indicators are often closely correlated, which can also help improve the quality of the forecasts obtained. The first example of such indicators is the GDP deflator and the consumer price index (CPI). Both indicators characterize the rate of price growth in the economy, but they are calculated on different sets of goods and services. The GDP deflator takes into account the price increase for all goods and services included in the calculation of GDP for a certain period, whereas the CPI is calculated for the set of goods included in the consumer basket. Figure 1 shows a scatter plot based on the quarterly data for the GDP deflator and the CPI for the United States from 1947 to 2017.

Fig. 1. Scatter plot of quarterly inflation by GDP deflator and CPI in USA

As can be seen in Fig. 1, the GDP deflator and the CPI are closely related (the determination coefficient is 0.62, and the correlation coefficient is 0.79). Particularly important here is the fact that CPI inflation is forecast significantly better using the GDP deflator for the same period than using a first-order autoregression model. The same conclusion also holds for the GDP deflator. Therefore, it is reasonable to assume that taking this correlational dependence into account may substantially increase the accuracy of the obtained forecasts when these two indicators are modelled simultaneously.

Another example of strong correlation between two time series is the relationship between two major US stock indices, namely the Dow Jones Industrial Average (DJIA) and the Standard & Poor's 500 (S&P 500). Figure 2 shows the scatter plot of daily growth rates for the indicators in question.

[Figure 2 (image not reproduced): axis label "Dow Jones Industrial Average"; R² = 0.9361.]

Fig. 2. Scatter plot of daily growth rates for DJIA and S&P 500

As can be seen from Fig. 2, the DJIA and the S&P 500 have an almost linear relationship (the determination coefficient is equal to 0.936, and the correlation coefficient is 0.967). Stock indices are rather difficult to forecast, as they are considered to be martingales. However, the use of this correlational dependence can still improve the quality of the obtained forecasts, as will be shown below. Note that there are many more correlational dependencies between macroeconomic indicators, which are omitted in this paper for the sake of brevity.

Thus, it can be concluded that in the case of complex simultaneous forecasting of time series of macroeconomic indicators, it is possible to incorporate functional and correlational interrelations between the modelled indicators into the system of developed models. The obtained forecasts then become coherent with each other, which has a positive effect on the accuracy of predictions.

3. Method of calculating adjustments

Let $\{y_t, X_t : t = 1, \ldots, n\}$ be a real-valued sample, where $y_t$ is the target variable and $X_t = (1, x_{1t}, x_{2t}, \ldots)$ is a countable set of possible explanatory variables. Then the linear regression model for $y_t$ looks as follows:

$$y_t = X_t B + e_t \quad \text{or}$$

(12)

(13)

(14)

Here we presuppose that the classical OLS assumptions hold.

Assumption 1. Strict exogeneity, i.e. $E(e_t \mid X) = 0$.

Assumption 2. Homoscedasticity, i.e. $E(e_t^2 \mid X) = \sigma^2$.

Assumption 3. Normality, i.e. $e_t \sim N(0; \sigma)$.

Assumption 4. No perfect multicollinearity, i.e. $X^T X$ is a positive-definite matrix.

Assumption 5. No autocorrelation, i.e. $\mathrm{cov}(e_q, e_u) = 0, \; \forall q \neq u$.

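Then the probability density function (pdf) of the one-step-ahead forecast for the value $y_{n+1}$ follows a location-scale Student's distribution with $\nu = n - m - 1$ degrees of freedom:

$$\psi(y_{n+1}) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right) \sqrt{\pi \nu \, MSFE_{n+1}}} \left[1 + \frac{(y_{n+1} - \hat{y}_{n+1})^2}{\nu \cdot MSFE_{n+1}}\right]^{-\frac{\nu + 1}{2}}, \tag{15}$$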

where $\hat{y}_{n+1}$ is the location parameter, computed according to formula (13), and $MSFE_{n+1}$ is the scale parameter, computed in the usual way as shown below:

$$MSFE_{n+1} = s^2 \left(1 + X_{n+1}^T (X^T X)^{-1} X_{n+1}\right), \tag{16}$$

where $X_{n+1}$ is a column vector of the values of the explanatory variables involved in constructing the forecast for period $n + 1$, and $s^2$ is an unbiased estimator of the error variance, calculated as follows:

$$s^2 = \frac{1}{n - m - 1} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2. \tag{17}$$
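The forecast density described above can be sketched in a few lines for the single-regressor case. The data, the function names `ols_forecast` and `t_pdf`, and the closed-form expression used for the leverage term are illustrative assumptions; MSFE is treated here as the squared scale of the Student density.

```python
import math


def ols_forecast(x, y, x_new):
    """One-regressor OLS with intercept: returns (point forecast, MSFE, degrees of freedom)."""
    n = len(x)
    m = 1  # number of regressors, excluding the intercept
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    dof = n - m - 1
    s2 = sum(r * r for r in residuals) / dof  # unbiased error variance
    # With one regressor, X_new' (X'X)^-1 X_new equals 1/n + (x_new - x_bar)^2 / sxx.
    leverage = 1.0 / n + (x_new - x_bar) ** 2 / sxx
    msfe = s2 * (1.0 + leverage)
    return intercept + slope * x_new, msfe, dof


def t_pdf(value, location, msfe, dof):
    """Location-scale Student density with squared scale msfe."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(math.pi * dof * msfe))
    z2 = (value - location) ** 2 / (dof * msfe)
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(z2))


# Made-up sample, used only to exercise the functions.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
y_hat, msfe, dof = ols_forecast(x, y, 7.0)
print(y_hat, msfe, dof, t_pdf(y_hat, y_hat, msfe, dof))
```

The density is highest at the point forecast and widens as the new regressor value moves away from the sample mean, because the leverage term grows.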

Further, suppose that there is a set of target variables $y^{(1)}, y^{(2)}, \ldots, y^{(K)}$, each of which is modeled using the data sets $X^{(1)}, X^{(2)}, \ldots, X^{(K)}$ respectively. In addition, there is a known functional form that binds all the considered target variables together:

$$y^{(i)} = f_i\left(y^{(1)}, \ldots, y^{(i-1)}, y^{(i+1)}, \ldots, y^{(K)}\right), \tag{18}$$

where $f_i$ denotes a function that expresses $y^{(i)}$ in terms of the other target variables under consideration.

Then it is possible to correct the obtained predictions, taking into account their probability density functions and the known functional relationship between the target variables. This procedure is proposed to be carried out using the Maximum Likelihood Method (MLM). The essence of the method is to maximize the so-called likelihood function, which is the joint probability density of a set of analyzed random variables. In the case of the functionally dependent target variables mentioned above, the joint probability density can be represented as follows in terms of the function $f_1$:

$$LH = \psi_1\left(f_1\left(y^{(2)}_{n+1}, y^{(3)}_{n+1}, \ldots, y^{(K)}_{n+1}\right)\right) \cdot \psi_2\left(y^{(2)}_{n+1}\right) \cdots \psi_K\left(y^{(K)}_{n+1}\right). \tag{19}$$

Thus, the procedure of adjusting the forecasts is reduced to finding such values of predicted random variables that would maximize expression (19). In order to reduce computational complexity of calculating the optimal parameters when maximizing the likelihood function (19), we resort to the maximization procedure for its log-likelihood function:

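$$\log LH = \log \psi_1\left(f_1\left(y^{(2)}_{n+1}, y^{(3)}_{n+1}, \ldots, y^{(K)}_{n+1}\right)\right) + \log \psi_2\left(y^{(2)}_{n+1}\right) + \cdots + \log \psi_K\left(y^{(K)}_{n+1}\right). \tag{20}$$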

Let us illustrate the process of correcting the predicted values using the example of two target variables that are functionally dependent on each other. Suppose we are constructing a one-step-ahead forecast for two target variables $y^{(1)}_{n+1}$ and $y^{(2)}_{n+1}$, with $y^{(1)}_{n+1} = f\left(y^{(2)}_{n+1}\right)$. Each of the target variables is predicted by its own set of explanatory factors, $X^{(1)}$ and $X^{(2)}$ respectively. The result of such parallel independent work of the two regression models is two location-scale t-distributions, which characterize $y^{(1)}_{n+1}$ and $y^{(2)}_{n+1}$. However, the calculated predictions may turn out to be incoherent, especially when the sets of explanatory variables differ significantly from each other, i.e. the known functional dependence $y^{(1)}_{n+1} = f\left(y^{(2)}_{n+1}\right)$ will be violated. Therefore, by maximizing the log-likelihood function (20), we adjust the obtained forecasts in order to bring them into coherence with the initially stated functional relationship. Figure 3 schematically displays the procedure for such an adjustment.

As can be seen in Fig. 3, the target variables were initially predicted at the levels corresponding to the modes of their individual probability distributions. After the adjustment procedure, however, the predicted values shifted to the point corresponding to the maximum of the joint probability density. In other words, given the obtained probability distributions and the known functional dependence between the target variables, the corrected forecasts reflect the most probable joint occurrence of the target variables' values.
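The two-variable adjustment can be sketched as a one-dimensional maximization of the log-likelihood. Everything numeric here is a hypothetical illustration: the link `y1 = 2 * y2 + 1`, the forecast locations, the MSFE values and the degrees of freedom are assumptions, and ternary search is just one simple optimizer that works because the log-likelihood is concave on the chosen bracket.

```python
import math


def log_t_pdf(value, location, msfe, dof):
    """Log-density of a location-scale Student distribution with squared scale msfe."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(math.pi * dof * msfe))
    return log_norm - (dof + 1) / 2 * math.log1p((value - location) ** 2 / (dof * msfe))


def maximize_1d(func, lo, hi, tol=1e-10):
    """Ternary search for the maximum of a unimodal function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if func(m1) < func(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


def link(y2):
    """Hypothetical functional relationship y1 = f(y2)."""
    return 2.0 * y2 + 1.0


DOF = 20
Y1_HAT, MSFE1 = 3.4, 0.5  # separate forecast of y1 (assumed values)
Y2_HAT, MSFE2 = 1.0, 0.2  # separate forecast of y2 (assumed values)
# The separate forecasts are incoherent: link(1.0) = 3.0, not 3.4.


def log_likelihood(y2):
    """Log of the joint density with y1 replaced by f(y2)."""
    return log_t_pdf(link(y2), Y1_HAT, MSFE1, DOF) + log_t_pdf(y2, Y2_HAT, MSFE2, DOF)


y2_adj = maximize_1d(log_likelihood, 0.0, 2.0)
y1_adj = link(y2_adj)
print(y1_adj, y2_adj)
```

The adjusted pair satisfies the link exactly and lies between the two incoherent forecasts, closer to the one with the smaller MSFE.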

In addition to proposed adjustments to traditional predictions, it is also possible to obtain the adjusted probability density function for all target variables under consideration. This procedure is proposed to be carried out by calculating the marginal distribution for the analyzed target variable, which takes into account probability distributions of the remaining target variables and the functional relationship between them. Thus, according to this method, corrected probability density function is calculated as shown below:

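$$\tilde{\psi}_i\left(y^{(i)}_{n+1}\right) = \frac{Q\left(y^{(i)}_{n+1}\right)}{\int_{-\infty}^{\infty} Q\left(y^{(i)}_{n+1}\right) dy^{(i)}_{n+1}}, \tag{21}$$

where the normalizing constant in the denominator is an integral of the likelihood function over all target variables under consideration, and $Q\left(y^{(i)}_{n+1}\right)$ is computed as follows:

$$Q\left(y^{(i)}_{n+1}\right) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \psi_1\left(f_1\left(y^{(2)}_{n+1}, \ldots, y^{(K)}_{n+1}\right)\right) \times \psi_2\left(y^{(2)}_{n+1}\right) \cdots \psi_K\left(y^{(K)}_{n+1}\right) dy^{(2)}_{n+1} \cdots dy^{(i-1)}_{n+1} \, dy^{(i+1)}_{n+1} \cdots dy^{(K)}_{n+1}. \tag{22}$$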

Thus, from initial probability distribution for the target variable we can obtain the adjusted probability density function, whose variance would be less than or equal to the variance of the initial distribution, see Fig. 4.

[Figure 4 (image not reproduced): legend entries "original pdf" and "adjusted pdf"; horizontal axis y(n + 1).]

Fig. 4. Example of adjusted and initial probability density function

As can be seen from Fig. 4, the modes of the adjusted and initial distributions do not coincide, which indicates a correction of the predicted value, since the mode of the adjusted distribution corresponds to the maximum point of the likelihood function.
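The adjusted density can also be approximated numerically. The sketch below assumes three target variables bound by the hypothetical relation y3 = y1 + y2, made-up locations and MSFE values, and a plain rectangle rule on a fixed grid; it marginalizes the likelihood over y2 to obtain the adjusted density of y1 and compares its variance with that of the original density.

```python
import math


def t_pdf(value, location, msfe, dof):
    """Location-scale Student density with squared scale msfe."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(math.pi * dof * msfe))
    z2 = (value - location) ** 2 / (dof * msfe)
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(z2))


DOF = 30
STEP = 0.05
GRID = [-8.0 + STEP * k for k in range(321)]  # grid on [-8, 8]


def psi1(v):
    return t_pdf(v, 0.0, 1.0, DOF)  # separate forecast density of y1 (assumed)


def psi2(v):
    return t_pdf(v, 0.5, 1.0, DOF)  # separate forecast density of y2 (assumed)


def psi3(v):
    return t_pdf(v, 1.0, 2.0, DOF)  # separate forecast density of y3 (assumed)


PSI2_ON_GRID = [psi2(y2) for y2 in GRID]


def q(y1):
    """Unnormalized adjusted density of y1: the likelihood integrated over y2."""
    inner = sum(psi3(y1 + y2) * w for y2, w in zip(GRID, PSI2_ON_GRID)) * STEP
    return psi1(y1) * inner


q_values = [q(y1) for y1 in GRID]
norm = sum(q_values) * STEP
adjusted = [v / norm for v in q_values]
original = [psi1(y1) for y1 in GRID]


def moments(density):
    """Mean and variance of a density tabulated on GRID."""
    mean = sum(y * d for y, d in zip(GRID, density)) * STEP
    var = sum((y - mean) ** 2 * d for y, d in zip(GRID, density)) * STEP
    return mean, var


mean_orig, var_orig = moments(original)
mean_adj, var_adj = moments(adjusted)
print(mean_orig, var_orig, mean_adj, var_adj)
```

With these inputs the adjusted density is narrower than the original one and its centre moves towards the value implied by the other two forecasts.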

Next, consider the situation in which the target variables are bound not by a functional dependence but by a correlational one. Suppose we construct a system of models for the target variables $y^{(1)}_t, y^{(2)}_t, \ldots, y^{(K)}_t$, each of which is modeled using the data sets $X^{(1)}, X^{(2)}, \ldots, X^{(K)}$ respectively. We also assume that all the target variables are bound by a linear correlation function:

$$y^{(i)}_t = b_0 + b_1 y^{(1)}_t + \ldots + b_{i-1} y^{(i-1)}_t + b_{i+1} y^{(i+1)}_t + \ldots + b_K y^{(K)}_t + e_t. \tag{23}$$

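Then, according to model (23), the probability density function for the predicted value of the target variable $y^{(i)}_{n+1}$ is a location-scale t-distribution with $\nu = n - K - 1$ degrees of freedom. Denoting this conditional density by $\varphi_i$, we have

$$\varphi_i\left(y^{(i)}_{n+1} \mid y^{(1)}_{n+1}, \ldots, y^{(i-1)}_{n+1}, y^{(i+1)}_{n+1}, \ldots, y^{(K)}_{n+1}\right) = \psi\left(\hat{y}^{(i)}_{n+1}, MSFE_{n+1}, n - K - 1\right), \tag{24}$$

where

$$\hat{y}^{(i)}_{n+1} = \hat{b}_0 + \hat{b}_1 y^{(1)}_{n+1} + \ldots + \hat{b}_{i-1} y^{(i-1)}_{n+1} + \hat{b}_{i+1} y^{(i+1)}_{n+1} + \ldots + \hat{b}_K y^{(K)}_{n+1}, \tag{25}$$

and $MSFE_{n+1}$ is a scale parameter, which is computed straightforwardly according to formula (16).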

Hence, we conclude that if we multiply the joint probability density by the density of the abovementioned distribution, we obtain the likelihood function that takes into account correlational relationship between considered target variables. The following formula presents the likelihood function for calculating the correction for the i-th target variable:

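$$LH = \psi_1\left(y^{(1)}_{n+1}\right) \cdot \psi_2\left(y^{(2)}_{n+1}\right) \cdots \psi_K\left(y^{(K)}_{n+1}\right) \cdot \varphi_i\left(y^{(i)}_{n+1} \mid y^{(1)}_{n+1}, \ldots, y^{(i-1)}_{n+1}, y^{(i+1)}_{n+1}, \ldots, y^{(K)}_{n+1}\right). \tag{26}$$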

It is easy to see from formula (26) that the parameters of the conditional density $\varphi_i$ (the last factor in (26)), namely the scale parameter and the location parameter, depend on the quantities $y^{(1)}_{n+1}, \ldots, y^{(i-1)}_{n+1}, y^{(i+1)}_{n+1}, \ldots, y^{(K)}_{n+1}$. Therefore, when calculating the likelihood function, the unconditional density $\psi_i$ and the conditional density $\varphi_i$ are evaluated at the same argument $y^{(i)}_{n+1}$; however, for the first one the parameters are constant, while for the second one they depend on the values taken by the other target variables during the optimization process.

It should be specially noted that, although all target variables under consideration are subject to corrections when the likelihood function (26) is maximized, it is not appropriate to use this likelihood function to calculate a correction for target variables other than $y^{(i)}_{n+1}$. It is therefore highly recommended that, when calculating corrections for each of the target variables, an individual likelihood function be used in which the corrected target variable is expressed through the remaining target variables by a regression line. Based on the form of formula (26), the corrections obtained by regressing one of the target variables on all the others are not equal to the corrections obtained by regressing another target variable on all the remaining ones. The calculated adjustments coincide only in the case of a functional relationship between the output variables. Thus, there arises the problem of choosing between the obtained adjustments, which in this paper is proposed to be solved by choosing the correction in which the corrected target variable is modeled through the rest of the variables by a regression line. In this way, it is possible to achieve a more substantial improvement in the quality of the forecasts obtained.
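A minimal sketch of the correlation-based correction for two target variables follows. All numbers are hypothetical, and the sketch simplifies the procedure in one respect that should be stated loudly: the scale of the conditional density is held fixed, whereas in the text it also depends on the values of the other target variables. Coordinate-wise ternary search is an assumed optimizer, chosen because the log-likelihood is concave on the search box.

```python
import math


def log_t_pdf(value, location, msfe, dof):
    """Log-density of a location-scale Student distribution with squared scale msfe."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(math.pi * dof * msfe))
    return log_norm - (dof + 1) / 2 * math.log1p((value - location) ** 2 / (dof * msfe))


def maximize_1d(func, lo, hi, tol=1e-10):
    """Ternary search for the maximum of a unimodal function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if func(m1) < func(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


DOF = 20
Y1_HAT, MSFE1 = 1.0, 0.5  # separate forecast of y1 (assumed)
Y2_HAT, MSFE2 = 2.0, 0.5  # separate forecast of y2 (assumed)
B0, B1, MSFE_COND = 0.2, 1.5, 0.1  # assumed regression of y2 on y1, fixed scale


def regression_line(y1):
    """Location of the conditional density of y2 given y1."""
    return B0 + B1 * y1


def log_likelihood(y1, y2):
    """Log of the product of both forecast densities and the conditional density."""
    return (log_t_pdf(y1, Y1_HAT, MSFE1, DOF)
            + log_t_pdf(y2, Y2_HAT, MSFE2, DOF)
            + log_t_pdf(y2, regression_line(y1), MSFE_COND, DOF))


# Coordinate ascent: maximize over y1 and y2 in turn until the pair settles.
y1_adj, y2_adj = Y1_HAT, Y2_HAT
for _ in range(100):
    y1_adj = maximize_1d(lambda v: log_likelihood(v, y2_adj), 0.6, 1.6)
    y2_adj = maximize_1d(lambda v: log_likelihood(y1_adj, v), 1.4, 2.4)

print(y1_adj, y2_adj, y2_adj - regression_line(y1_adj))
```

Following the recommendation in the text, only the corrected value of y2 would be kept from this run; a correction for y1 would use a second likelihood built on the regression of y1 on y2.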

To calculate the corrected density function, we will, by analogy with the previous case, calculate the marginal distribution for the analyzed target variable, which takes into account the probability distributions of the other output variables and the correlational relationship between them. Thus, according to this method, the corrected probability density function is calculated as shown below:

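where the normalizing constant in the denominator is the integral of the likelihood function over all target variables under consideration, and $Q\left(y^{(i)}_{n+1}\right)$ is calculated as follows:

$$Q\left(y^{(i)}_{n+1}\right) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \psi_1\left(y^{(1)}_{n+1}\right) \cdots \psi_K\left(y^{(K)}_{n+1}\right) \cdot \varphi_i\left(y^{(i)}_{n+1} \mid y^{(1)}_{n+1}, \ldots, y^{(i-1)}_{n+1}, y^{(i+1)}_{n+1}, \ldots, y^{(K)}_{n+1}\right) dy^{(1)}_{n+1} \cdots dy^{(i-1)}_{n+1} \, dy^{(i+1)}_{n+1} \cdots dy^{(K)}_{n+1}. \tag{28}$$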

4. Empirical and simulation testing of proposed methods

In order to test the efficiency of the methods proposed in this paper, we carry out simulation and empirical experiments. First, we check how well the developed correction system works on artificially generated data in the presence of a functional dependence between the predicted variables. Let us consider the simplest case, in which three target variables $y_{1t}$, $y_{2t}$ and $y_{3t}$ are modeled using single-factor linear regression models, namely:

$$\begin{aligned} y_{1t} &= b_{10} + b_{11} x_{1t} + e_{1t}, \\ y_{2t} &= b_{20} + b_{21} x_{2t} + e_{2t}, \\ y_{3t} &= b_{30} + b_{31} x_{3t} + e_{3t}. \end{aligned} \tag{29}$$

Moreover, the simulated target variables are bound by the functional dependence $y_{3t} = y_{1t} + y_{2t}$. The explanatory variables $x_{1t}$ and $x_{2t}$ are generated from the normal distribution with zero mean and unit variance, and the variable $x_{3t}$ is generated using the equation $x_{3t} = x_{1t} + x_{2t}$. The true parameters of this experiment were set at $b_{10} = 2$, $b_{20} = 2$, $b_{11} = 0.7$ and $b_{21} = 0.7$. The true errors of the first two models are generated from the normal distribution with zero mean and unit variance, namely $e_{1t} \sim N(0, 1)$ and $e_{2t} \sim N(0, 1)$. For the third model, true coefficients and errors are not explicitly generated, since $y_{3t}$ and $x_{3t}$ are defined functionally by the values of the factors and target variables of the first two models.

Table 1 shows the mean-squared realized forecast error for models (29), for the same models with the proposed corrections, and for models that include both explanatory variables $x_{1t}$ and $x_{2t}$, given below:

$$\begin{aligned} y_{1t} &= c_{10} + c_{11} x_{1t} + c_{12} x_{2t}, \\ y_{2t} &= c_{20} + c_{21} x_{1t} + c_{22} x_{2t}, \\ y_{3t} &= c_{30} + c_{31} x_{1t} + c_{32} x_{2t}. \end{aligned} \tag{30}$$

To provide a comprehensive simulation experiment, the described models were tested on data frames of different lengths n and compared by the value of the mean-squared realized forecast error. To compute the results on each data frame we used 10000 simulations. Therefore, this experiment can be considered as a basis for making sound conclusions about the efficiency of proposed methods.
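The data-generating process and the baseline forecasts of models (29) can be sketched as follows. The slope value 0.7, the sample length, the seed and the reduced number of runs are assumptions made to keep the example small; the correction step itself is not included.

```python
import random


def fit_line(x, y):
    """OLS intercept and slope for a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


def generate(rng, size):
    """Draw one data set in which y3 = y1 + y2 and x3 = x1 + x2 hold exactly."""
    x1 = [rng.gauss(0.0, 1.0) for _ in range(size)]
    x2 = [rng.gauss(0.0, 1.0) for _ in range(size)]
    y1 = [2.0 + 0.7 * v + rng.gauss(0.0, 1.0) for v in x1]
    y2 = [2.0 + 0.7 * v + rng.gauss(0.0, 1.0) for v in x2]
    x3 = [a + b for a, b in zip(x1, x2)]
    y3 = [a + b for a, b in zip(y1, y2)]
    return x1, x2, x3, y1, y2, y3


def simulate(n, runs, seed=0):
    """Mean-squared realized one-step-ahead error of the three separate models."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(runs):
        x1, x2, x3, y1, y2, y3 = generate(rng, n + 1)
        for k, (x, y) in enumerate(((x1, y1), (x2, y2), (x3, y3))):
            intercept, slope = fit_line(x[:n], y[:n])  # fit on the first n points
            totals[k] += (y[n] - (intercept + slope * x[n])) ** 2  # forecast point n + 1
    return [t / runs for t in totals]


mse_y1, mse_y2, mse_y3 = simulate(20, 2000)
print(mse_y1, mse_y2, mse_y3)
```

Since the error of the third model is the sum of two unit-variance errors, its realized error should be roughly twice that of the first two models.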

As can be seen from Table 1, the proposed method of correcting predicted target variables exceeds both models (29) and models (30) in accuracy for almost every data frame under consideration (performance of the best model is highlighted in bold). The only exception is the mean-squared realized error of model (29) for $y_{1t}$ at n = 100, which proved to be slightly lower than the error obtained with the proposed correction. For all target variables and methods under consideration, the accuracy of the prediction improves as the number of observations increases and, for a sufficiently long data frame, the differences in the mean-squared realized forecast error can be considered insignificant. However, when analyzing time series of macroeconomic processes, one almost always has to work under conditions of scarce statistical data. It is well known that too long a data frame yields forecasts just as inaccurate as too short a one. Therefore, when analyzing macroeconomic processes, applying the proposed adjustments allows a significant reduction of the forecast error.

Table 1. Comparison of the analyzed methods, simulation experiment, functional dependence

| n | Models (29): y1 | Models (29): y2 | Models (29): y3 | Corrected: y1 | Corrected: y2 | Corrected: y3 | Models (30): y1 | Models (30): y2 | Models (30): y3 |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 1.8019 | 1.8276 | 3.5756 | 1.6442 | 1.6751 | 3.1601 | 3.4469 | 4.0148 | 7.8445 |
| 6 | 1.5515 | 1.5467 | 3.0982 | 1.4703 | 1.4768 | 2.8733 | 2.3229 | 2.3652 | 4.7144 |
| 7 | 1.4329 | 1.4091 | 2.7944 | 1.3656 | 1.3667 | 2.6249 | 1.8767 | 1.8832 | 3.7273 |
| 8 | 1.3962 | 1.3734 | 2.7777 | 1.3517 | 1.3329 | 2.6383 | 1.7161 | 1.7349 | 3.4859 |
| 9 | 1.3225 | 1.2916 | 2.5672 | 1.2927 | 1.2604 | 2.4694 | 1.6013 | 1.5391 | 3.0879 |
| 10 | 1.2638 | 1.3012 | 2.5377 | 1.2414 | 1.2768 | 2.4443 | 1.4808 | 1.5111 | 2.9322 |
| 15 | 1.1418 | 1.1421 | 2.3064 | 1.1293 | 1.1299 | 2.2582 | 1.2341 | 1.2623 | 2.5204 |
| 20 | 1.0857 | 1.1089 | 2.1876 | 1.0776 | 1.0991 | 2.1631 | 1.1575 | 1.1763 | 2.3346 |
| 50 | 1.0217 | 1.0273 | 2.0583 | 1.0189 | 1.0263 | 2.0527 | 1.0432 | 1.0533 | 2.1087 |
| 80 | 1.0151 | 1.0275 | 2.0108 | 1.0149 | 1.0263 | 2.0069 | 1.0292 | 1.0371 | 2.0383 |
| 100 | 1.0209 | 1.0005 | 2.0009 | 1.0212 | 0.9995 | 1.9939 | 1.0339 | 1.0107 | 2.0215 |

Next, let us proceed to the empirical testing of the correction method that takes into account the functional relationship between the modelled target variables. Consider the simplest three-factor macroeconomic equation (3) described in section 2. To test the methods, we took quarterly statistics for the USA for these indicators, from Q1 1947 to Q3 2016. Thus, the data set for the empirical experiment comprises 279 observations. The basic system of models is represented as fourth-order autoregressions, as shown below:

$$\begin{aligned} I_{p,t} &= b_{10} + b_{11} I_{p,t-1} + b_{12} I_{p,t-2} + b_{13} I_{p,t-3} + b_{14} I_{p,t-4}, \\ I_{q,t} &= b_{20} + b_{21} I_{q,t-1} + b_{22} I_{q,t-2} + b_{23} I_{q,t-3} + b_{24} I_{q,t-4}, \\ I_{pq,t} &= b_{30} + b_{31} I_{pq,t-1} + b_{32} I_{pq,t-2} + b_{33} I_{pq,t-3} + b_{34} I_{pq,t-4}. \end{aligned} \tag{31}$$

We also add to the comparison the models of three target variables under consideration with all the exogenous variables of the system (31). Thus, we obtain in some way an equivalent of the reduced form of a system of simultaneous linear equations.

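$$\begin{aligned} I_{p,t} &= b_{10} + \sum_{i=1}^{4} b_{1i} I_{p,t-i} + \sum_{i=1}^{4} c_{1i} I_{q,t-i} + \sum_{i=1}^{4} d_{1i} I_{pq,t-i}, \\ I_{q,t} &= b_{20} + \sum_{i=1}^{4} b_{2i} I_{p,t-i} + \sum_{i=1}^{4} c_{2i} I_{q,t-i} + \sum_{i=1}^{4} d_{2i} I_{pq,t-i}, \\ I_{pq,t} &= b_{30} + \sum_{i=1}^{4} b_{3i} I_{p,t-i} + \sum_{i=1}^{4} c_{3i} I_{q,t-i} + \sum_{i=1}^{4} d_{3i} I_{pq,t-i}. \end{aligned} \tag{32}$$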

Table 2 shows the mean-squared realized forecast error for models (31), models (32) and models with corrections according to the proposed method, calculated on the abovementioned data set.


Table 2. Comparison of the analyzed methods, empirical experiment, functional dependence

| n | Models (31): I_p | Models (31): I_q | Models (31): I_pq | Corrected: I_p | Corrected: I_q | Corrected: I_pq | Models (32): I_p | Models (32): I_q | Models (32): I_pq |
|---|---|---|---|---|---|---|---|---|---|
| 20 | 0.9705 | 0.1103 | 0.9757 | 0.9123 | 0.1064 | 0.9702 | 2.4943 | 0.3209 | 2.4845 |
| 30 | 0.8295 | 0.0857 | 0.8598 | 0.7985 | 0.0837 | 0.8362 | 1.2493 | 0.1734 | 1.3475 |
| 40 | 0.7695 | 0.0775 | 0.8373 | 0.7444 | 0.0769 | 0.7971 | 1.0735 | 0.1281 | 1.1849 |
| 50 | 0.6878 | 0.0779 | 0.7502 | 0.6592 | 0.0779 | 0.7181 | 0.9584 | 0.1032 | 1.0215 |
| 60 | 0.6662 | 0.0792 | 0.7364 | 0.6341 | 0.0791 | 0.7009 | 0.8775 | 0.0951 | 0.9421 |
| 70 | 0.6654 | 0.0837 | 0.7683 | 0.6443 | 0.0828 | 0.7217 | 0.9275 | 0.0957 | 1.0054 |
| 80 | 0.6562 | 0.0887 | 0.7718 | 0.6481 | 0.0887 | 0.7199 | 0.8932 | 0.0988 | 0.9639 |
| 90 | 0.6516 | 0.0866 | 0.7941 | 0.6461 | 0.0867 | 0.7388 | 0.9257 | 0.0989 | 0.9981 |
| 100 | 0.5814 | 0.0776 | 0.6822 | 0.5694 | 0.0792 | 0.6537 | 0.8594 | 0.1081 | 0.8257 |
| 110 | 0.5294 | 0.0577 | 0.6628 | 0.5247 | 0.0581 | 0.6242 | 0.6032 | 0.0701 | 0.7112 |
| 120 | 0.5324 | 0.0511 | 0.6626 | 0.5208 | 0.0521 | 0.6256 | 0.5933 | 0.0588 | 0.7183 |

As can be seen from Table 2, the proposed method of correcting predicted target variables yields better forecasts than both models (31) and models (32) in the majority of cases (performance of the best model is highlighted in bold). The only exceptions are the mean-squared realized errors of model (31) for $I_{q,t}$ on the data frames n = 90, n = 100, n = 110 and n = 120, which in the empirical testing turned out to be slightly lower than the errors obtained with the proposed correction. Unlike in the previous simulation experiment, the dynamics of the mean-squared realized forecast error (MSRE) do not have a hyperbolic form here. The MSRE has a local minimum around n = 60, after which we observe a slight increase up to n = 90. Thus, in the interval from n = 20 to n = 90, the dependence of the mean-squared realized error on the length of the data frame is closer to parabolic, which is described in detail by Moiseev (2017b).

Next, we test the efficiency of the proposed methods of correcting the obtained forecasts in the case where the target variables are related not by a functional but by a correlational dependence. Let us begin with the simulation experiment. Let the two target variables $y_{1t}$ and $y_{2t}$ be simulated using single-factor linear regression models, namely:

$$
\begin{aligned}
y_{1t} &= b_{10} + b_{11} x_{1t}, \\
y_{2t} &= b_{20} + b_{21} x_{2t}.
\end{aligned}
\tag{33}
$$

The simulated target variables are related by the correlational dependence $y_{2t} = b_0 + b_1 y_{1t} + \varepsilon_t$. To generate the endogenous and exogenous variables of this system, we use the method of generating correlated normally distributed random variables. First, we specify the true variance-covariance matrix

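$$
\Sigma =
\begin{array}{c|cccc}
 & x_1 & x_2 & y_1 & y_2 \\ \hline
x_1 & 1.00 & 0.10 & 0.15 & 0.10 \\
x_2 & 0.10 & 1.00 & 0.10 & 0.15 \\
y_1 & 0.15 & 0.10 & 1.00 & 0.90 \\
y_2 & 0.10 & 0.15 & 0.90 & 1.00
\end{array}
$$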

by which we set certain interrelations between the exogenous and endogenous variables of the system. Next, we introduce a column vector $Z_{4 \times 1}$ of independent identically distributed (i.i.d.) standard normal variables (zero mean, unit variance), which implies that

$$
E(ZZ^{T}) = I_m,
$$

where $I_m$ is the identity matrix (here m = 4).

After that, we generate the variables $x_{1t}$, $x_{2t}$, $y_{1t}$ and $y_{2t}$ by applying the Cholesky decomposition of the true variance-covariance matrix $\Sigma$:

$$
x_{1t} = S_{[1,\cdot]} Z_t, \quad x_{2t} = S_{[2,\cdot]} Z_t, \quad y_{1t} = S_{[3,\cdot]} Z_t, \quad y_{2t} = S_{[4,\cdot]} Z_t, \tag{34}
$$

where $\Sigma = S S^{T}$ and $S_{[i,\cdot]}$ is the $i$-th row of the lower triangular matrix $S$.

Thus, for each observation t, we obtain values of the random variables of this system of equations that have the true variance-covariance matrix $\Sigma$. By analogy with the previous simulation experiment, the described models were tested on data frames of different length n and compared according to the mean-squared realized forecast error. The results for each data frame were computed from 10,000 simulations, so this experiment provides a sound basis for conclusions about the efficiency of the proposed methods.
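The generation step in (34) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `generate_system`, the seed and the sample size are assumptions made here, and NumPy's `cholesky` is used to obtain the lower triangular factor.

```python
import numpy as np

# True variance-covariance matrix from the text; rows/columns are x1, x2, y1, y2.
SIGMA = np.array([
    [1.00, 0.10, 0.15, 0.10],
    [0.10, 1.00, 0.10, 0.15],
    [0.15, 0.10, 1.00, 0.90],
    [0.10, 0.15, 0.90, 1.00],
])

def generate_system(n_obs, rng):
    """Return an (n_obs, 4) array whose columns x1, x2, y1, y2 have covariance SIGMA.

    Hypothetical helper written for this illustration.
    """
    S = np.linalg.cholesky(SIGMA)          # lower triangular factor, SIGMA = S @ S.T
    Z = rng.standard_normal((4, n_obs))    # i.i.d. N(0, 1) draws, one column per t
    return (S @ Z).T                       # row t holds S[i, :] @ Z_t for i = 1..4

rng = np.random.default_rng(0)             # seed chosen arbitrarily
sample = generate_system(200_000, rng)
sample_cov = np.cov(sample, rowvar=False)  # should be close to SIGMA for a large sample
print(np.round(sample_cov, 2))
```

With a large number of draws the sample covariance should approach the specified matrix, which is a quick way to check that the factorization was applied in the right orientation.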

Table 3 displays the mean-squared realized forecast error for models (33), for the models incorporating the proposed corrections and for the models including all the explanatory variables of the system, listed below:

$$
\begin{aligned}
y_{1t} &= c_{10} + c_{11} x_{1t} + c_{12} x_{2t}, \\
y_{2t} &= c_{20} + c_{21} x_{1t} + c_{22} x_{2t}.
\end{aligned}
\tag{35}
$$

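To carry out the corrections, we use two likelihood functions. The corrections $\hat{y}_{1.1}$ and $\hat{y}_{2.1}$ are obtained by maximizing the following likelihood function:

$$
LH = f_1\bigl(y_{1(n+1)}\bigr)\, f_2\bigl(y_{2(n+1)}\bigr)\, f_{1|2}\bigl(y_{1(n+1)} \mid y_{2(n+1)}\bigr), \tag{36}
$$

and the corrections $\hat{y}_{1.2}$ and $\hat{y}_{2.2}$ are calculated by maximizing the likelihood function presented below:

$$
LH = f_1\bigl(y_{1(n+1)}\bigr)\, f_2\bigl(y_{2(n+1)}\bigr)\, f_{2|1}\bigl(y_{2(n+1)} \mid y_{1(n+1)}\bigr). \tag{37}
$$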

The difference between the likelihood functions (36) and (37) is that in (36) the regression line is constructed for the first target variable with the second one as the factor, whereas (37), on the contrary, uses the regression equation for the second target variable with the first one as the explanatory variable.
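To make the idea of such a correction concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: all densities are treated as normal, the likelihood is taken to be the product of the two individual forecast densities and the density of a regression of the first target on the second, and every name (`corrected_forecasts`, `m1`, `s1`, `a`, `b`, `s3`, ...) is hypothetical. Under these assumptions the log-likelihood is quadratic, so its maximum solves a 2x2 linear system.

```python
import numpy as np

def corrected_forecasts(m1, s1, m2, s2, a, b, s3):
    """Maximize N(y1; m1, s1^2) * N(y2; m2, s2^2) * N(y1; a + b*y2, s3^2) over (y1, y2).

    m1, m2 : point forecasts of the two individual models (assumed inputs)
    s1, s2 : their forecast standard deviations
    a, b   : intercept and slope of the regression of y1 on y2
    s3     : residual standard deviation of that regression
    Setting the gradient of the log-likelihood to zero gives A @ [y1, y2] = rhs.
    """
    A = np.array([
        [1.0 / s1**2 + 1.0 / s3**2, -b / s3**2],
        [-b / s3**2, 1.0 / s2**2 + b**2 / s3**2],
    ])
    rhs = np.array([
        m1 / s1**2 + a / s3**2,
        m2 / s2**2 - a * b / s3**2,
    ])
    return np.linalg.solve(A, rhs)

# Hypothetical numbers: the raw forecasts (1.2, 2.0) disagree with the
# regression line y1 = 0.5 * y2, so both are pulled towards coherence.
y1_corr, y2_corr = corrected_forecasts(m1=1.2, s1=0.5, m2=2.0, s2=0.5,
                                       a=0.0, b=0.5, s3=0.3)
print(y1_corr, y2_corr)
```

Swapping the roles of the two variables in the conditional term gives the analogue of the second likelihood. When the raw forecasts already lie on the regression line, this sketch leaves them unchanged.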

As can be seen from Table 3, the proposed method of correcting the forecasts works better in all cases under consideration than the traditional models and the models involving all exogenous factors of the system (the performance of the best model is highlighted in bold). It yields a significant improvement in forecast quality compared with models (33) and (35), especially for short data frames. With a sufficiently large number of observations, the difference in forecast accuracy between the analyzed approaches is minimal, but such a situation is extremely rare when modelling time series of macroeconomic processes. Thus, the proposed methods have practical significance.
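The comparison between the single-factor models (33) and the all-regressor models (35) can be reproduced in outline with a small Monte Carlo loop. The sketch below is an assumption-laden illustration, not the authors' code: it forecasts only $y_1$, takes the mean-squared realized forecast error to be the plain average of squared one-step-ahead errors, uses far fewer replications than the paper, and the function name `simulate_mse` is hypothetical.

```python
import numpy as np

# Variance-covariance matrix from the text; columns are x1, x2, y1, y2.
SIGMA = np.array([
    [1.00, 0.10, 0.15, 0.10],
    [0.10, 1.00, 0.10, 0.15],
    [0.15, 0.10, 1.00, 0.90],
    [0.10, 0.15, 0.90, 1.00],
])

def simulate_mse(n, reps, seed=0):
    """Average squared error of the forecast of y1 at observation n+1.

    Returns (mse_single, mse_full): y1 regressed on x1 only, as in (33),
    and y1 regressed on x1 and x2, as in (35); both are OLS fits on n observations.
    """
    rng = np.random.default_rng(seed)
    S = np.linalg.cholesky(SIGMA)
    err_single = np.empty(reps)
    err_full = np.empty(reps)
    for r in range(reps):
        data = (S @ rng.standard_normal((4, n + 1))).T
        x1, x2, y1 = data[:, 0], data[:, 1], data[:, 2]
        ones = np.ones(n + 1)
        X_single = np.column_stack([ones, x1])
        X_full = np.column_stack([ones, x1, x2])
        for X, store in ((X_single, err_single), (X_full, err_full)):
            # Estimate on the first n rows, forecast the held-out last row.
            beta, *_ = np.linalg.lstsq(X[:n], y1[:n], rcond=None)
            store[r] = (y1[n] - X[n] @ beta) ** 2
    return err_single.mean(), err_full.mean()

for n in (5, 10, 50):
    single, full = simulate_mse(n, reps=2000)
    print(n, round(single, 3), round(full, 3))
```

The correction step itself is deliberately left out here; the loop only shows how the two uncorrected benchmarks can be compared across data frame lengths.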

Table 3. Comparison of the analyzed methods, simulation experiment, correlational dependence

| n | $y_1$ (33) | $y_2$ (33) | $\hat{y}_{1.1}$ | $\hat{y}_{2.1}$ | $\hat{y}_{1.2}$ | $\hat{y}_{2.2}$ | $y_1$ (35) | $y_2$ (35) |
|---|---|---|---|---|---|---|---|---|
| 5 | 1.7458 | 1.7253 | 1.4577 | 1.4812 | 1.4985 | 1.4371 | 3.3691 | 3.2243 |
| 6 | 1.5144 | 1.5095 | 1.3197 | 1.3378 | 1.3465 | 1.3142 | 2.2871 | 2.2294 |
| 7 | 1.3914 | 1.3867 | 1.2512 | 1.2597 | 1.2736 | 1.2387 | 1.8487 | 1.8112 |
| 8 | 1.3371 | 1.3074 | 1.2136 | 1.2288 | 1.2324 | 1.2136 | 1.6348 | 1.6468 |
| 9 | 1.2675 | 1.2454 | 1.1714 | 1.1728 | 1.1877 | 1.1601 | 1.5057 | 1.4979 |
| 10 | 1.2128 | 1.2222 | 1.1281 | 1.1504 | 1.1413 | 1.1379 | 1.4051 | 1.4161 |
| 15 | 1.1376 | 1.1253 | 1.0859 | 1.0872 | 1.0949 | 1.0808 | 1.2343 | 1.2253 |
| 20 | 1.0684 | 1.0813 | 1.0394 | 1.0485 | 1.0446 | 1.0438 | 1.1325 | 1.1389 |
| 50 | 0.9996 | 1.0116 | 0.9887 | 0.9982 | 0.9904 | 0.9976 | 1.0158 | 1.0235 |

The model $\hat{y}_{1.1}$, which is constructed using the regression line of $y_{1t}$ on $y_{2t}$, displays a higher forecast accuracy than the model $\hat{y}_{1.2}$, which uses the regression of $y_{2t}$ on $y_{1t}$. A similar conclusion holds for the models $\hat{y}_{2.1}$ and $\hat{y}_{2.2}$: the correction based on the regression line of $y_{2t}$ on $y_{1t}$ works better for predicting the variable $y_{2t}$. The mean-squared realized forecast errors of the compared models all decrease as the number of observations in the data frame grows, and the differences between them shrink.

Let us move on to the empirical testing of the method of correcting forecasts of macroeconomic indicators that takes into account correlational dependencies between them. Here we consider daily data on two US stock indices, DJIA and S&P 500, which are discussed in detail in section 2. To ensure the stationarity of the analyzed data, we transform the initial values into growth rates. For convenience of further analysis, we denote the target variable DJIA as $y_{1t}$ and the target variable S&P 500 as $y_{2t}$. In this experiment, the target variables are modelled using a simple autoregressive model, namely:

$$
\begin{aligned}
y_{1t} &= b_{10} + b_{11} y_{1(t-1)}, \\
y_{2t} &= b_{20} + b_{21} y_{2(t-2)}.
\end{aligned}
\tag{38}
$$

Note that, to avoid a high degree of multicollinearity between the lagged variables of system (38) and, consequently, low informativeness of the corrections for such a structure of equations, the variable $y_{2t}$ is modelled using the lag $t-2$ rather than $t-1$. By analogy with the previous experiments of this section, the described models were tested on data frames of different lengths n and compared by mean-squared realized forecast error. The results for each data frame were computed on daily data on the US stock indices for the period from 05.10.2012 to 05.10.2017.
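The evaluation scheme for models (38) can be sketched as a rolling-window exercise. The code below is an illustration under stated assumptions: growth rates are taken to be percentage changes (the exact definition is not given in this section), the realized error is the plain average of squared one-step-ahead errors, synthetic levels stand in for the index data, and the helper names `growth_rates` and `rolling_ar_mse` are hypothetical.

```python
import numpy as np

def growth_rates(levels):
    """Percentage growth rates of a series of index levels (assumed definition)."""
    levels = np.asarray(levels, dtype=float)
    return 100.0 * (levels[1:] / levels[:-1] - 1.0)

def rolling_ar_mse(y, lag, n):
    """Mean squared one-step forecast error of y_t = b0 + b1 * y_(t-lag).

    For every forecast origin t the coefficients are re-estimated by OLS
    on the n most recent observations only (a data frame of length n).
    """
    y = np.asarray(y, dtype=float)
    if len(y) <= n + lag:
        raise ValueError("series too short for the requested window and lag")
    errors = []
    for t in range(n + lag, len(y)):
        target = y[t - n:t]                  # y_s for s = t-n, ..., t-1
        regressor = y[t - n - lag:t - lag]   # matching lagged values y_(s-lag)
        X = np.column_stack([np.ones(n), regressor])
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        forecast = beta[0] + beta[1] * y[t - lag]
        errors.append((y[t] - forecast) ** 2)
    return float(np.mean(errors))

# Synthetic index levels used only to exercise the functions.
rng = np.random.default_rng(0)
levels = 1000.0 * np.cumprod(1.0 + 0.01 * rng.standard_normal(500))
y = growth_rates(levels)
for n in (5, 10, 50):
    print(n, rolling_ar_mse(y, lag=1, n=n), rolling_ar_mse(y, lag=2, n=n))
```

Calling `rolling_ar_mse` with `lag=1` corresponds to the first equation of (38) and `lag=2` to the second.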

Table 4 displays the mean-squared realized forecast error for models (38), for the models taking into account the proposed corrections and for the models with all explanatory variables of the system, given below:

$$
\begin{aligned}
y_{1t} &= c_{10} + c_{11} y_{1(t-1)} + c_{12} y_{2(t-2)}, \\
y_{2t} &= c_{20} + c_{21} y_{1(t-1)} + c_{22} y_{2(t-2)}.
\end{aligned}
\tag{39}
$$

By analogy with the previous simulation experiment, we use two likelihood functions to carry out the corrections. The corrections $\hat{y}_{1.1}$ and $\hat{y}_{2.1}$ are obtained by maximizing the likelihood function (36), and the corrections $\hat{y}_{1.2}$ and $\hat{y}_{2.2}$ are calculated by maximizing the likelihood function (37).

Table 4. Comparison of the analyzed methods, empirical experiment, correlational dependence

| n | $y_1$ (38) | $y_2$ (38) | $\hat{y}_{1.1}$ | $\hat{y}_{2.1}$ | $\hat{y}_{1.2}$ | $\hat{y}_{2.2}$ | $y_1$ (39) | $y_2$ (39) |
|---|---|---|---|---|---|---|---|---|
| 5 | 1.0316 | 1.1901 | 0.8787 | 0.9914 | 0.8813 | 0.9731 | 1.6726 | 1.8255 |
| 6 | 0.9159 | 1.0409 | 0.7846 | 0.8589 | 0.7929 | 0.8524 | 1.3907 | 1.4832 |
| 7 | 0.8381 | 0.9439 | 0.7569 | 0.8232 | 0.7564 | 0.8107 | 1.1868 | 1.2695 |
| 8 | 0.7646 | 0.8501 | 0.7084 | 0.7645 | 0.7134 | 0.7627 | 1.0383 | 1.1243 |
| 9 | 0.7322 | 0.8268 | 0.6798 | 0.7433 | 0.6853 | 0.7418 | 0.9737 | 1.0591 |
| 10 | 0.6858 | 0.7825 | 0.6585 | 0.7182 | 0.6603 | 0.7157 | 0.8821 | 0.9529 |
| 15 | 0.6429 | 0.7235 | 0.6191 | 0.6723 | 0.6196 | 0.6701 | 0.7339 | 0.7872 |
| 20 | 0.6161 | 0.6753 | 0.5999 | 0.6424 | 0.6019 | 0.6436 | 0.6789 | 0.7301 |
| 30 | 0.5799 | 0.6476 | 0.5731 | 0.6209 | 0.5756 | 0.6208 | 0.6133 | 0.6643 |
| 50 | 0.5763 | 0.6267 | 0.5679 | 0.6111 | 0.5717 | 0.6118 | 0.5968 | 0.6406 |

Analyzing the results presented in Table 4, it can be concluded that the proposed method of corrections yields the smallest mean-squared realized forecast error for every considered number of observations (the performance of the best model is highlighted in bold). However, whereas in Table 3 the model $\hat{y}_{1.1}$ outperformed $\hat{y}_{1.2}$ and the model $\hat{y}_{2.2}$ outperformed $\hat{y}_{2.1}$ throughout, on empirical data there are individual cases where $\hat{y}_{1.2}$ works slightly better than $\hat{y}_{1.1}$ (n = 7) and where $\hat{y}_{2.1}$ is more accurate than $\hat{y}_{2.2}$ (n = 20 and n = 50). In all other cases the picture is similar to Table 3: $\hat{y}_{1.1}$ works better than $\hat{y}_{1.2}$, and $\hat{y}_{2.2}$ works better than $\hat{y}_{2.1}$. Thus, the empirical experiment, like the simulation experiment, supports the recommendation made in section 3: when choosing between the resulting adjustments, one should choose the one in which the corrected target variable is modelled through the remaining ones.

The proposed method clearly results in a significant improvement in forecast quality compared with models (38) and (39), especially for short data frames. Models (39) perform poorly because each of them includes all the exogenous variables of the system of regression equations, which requires estimating more parameters than models (38). Consequently, the overall uncertainty about the forecasted value increases, because it incorporates the uncertainty in the estimate of each parameter of the regression equation. This effect gradually vanishes as the number of observations increases. Although the difference in forecast accuracy between the analyzed approaches is minimal for a sufficiently long data frame, the method remains relevant, because sufficiently long statistical data are extremely rare when modelling time series of macroeconomic processes.
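The link between the number of estimated parameters and forecast uncertainty can be made explicit with the standard expression for the forecast error variance of an OLS regression. As a sketch, assume uncorrelated homoscedastic errors with variance $\sigma^2$, a design matrix $X$ of size $n \times k$ and a new regressor vector $x_0$; these symbols are introduced here for illustration and are not the paper's notation:

$$
\operatorname{Var}\bigl(y_0 - \hat{y}_0 \mid X, x_0\bigr) = \sigma^2 \Bigl(1 + x_0^{T} (X^{T} X)^{-1} x_0\Bigr).
$$

The second term in the brackets reflects parameter-estimation uncertainty. Averaged over the sample points themselves it equals $k/n$, since $\sum_{i=1}^{n} x_i^{T} (X^{T} X)^{-1} x_i = \operatorname{tr}\bigl(X (X^{T} X)^{-1} X^{T}\bigr) = k$. It therefore grows with the number of parameters $k$ and fades as $n$ increases, which is consistent with the behaviour of the all-regressor models in Tables 3 and 4.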

5. Conclusion

The paper presents a method of correcting forecasts by taking into account functional and correlational dependencies between the modelled macroeconomic indicators. The simulation and empirical experiments show, on fairly simple examples, the practical benefits of the proposed methods for modelling time series of macroeconomic processes that depend on each other functionally or correlationally. When such corrections are made, the obtained forecasts become coherent with each other, which improves their quality. In general, the developed methods result in a significant improvement in the quality of forecasts compared with regression equations that model each macroeconomic process separately, and they also work better than models that include all the exogenous variables of the regression equations under consideration. This positive effect is achieved by using information about the form of the functional or correlational dependence between the forecasted target variables. Since the majority of macroeconomic indicators can be bound functionally, as shown in section 2, the proposed method of correcting forecasts can be considered relevant for complex simultaneous forecasting of macroeconomic indicators. Section 2 also showed that many macroeconomic indicators are closely correlated and, as demonstrated in this paper, this information can likewise be used to improve the quality of regression models.

Acknowledgements. This research was carried out within the framework of the basic part of a state commission in the sphere of scientific activity of the Ministry of Education and Science of the Russian Federation on the topic «Intellectual analysis of large-scale text data in finance, business and education on the basis of adaptive semantic models», project number 9577.

References

Akaike H. (1973). Information theory and an extension of the maximum likelihood principle. In: Petrov B., Csaki F. (eds.), Proceedings of the 2nd International Symposium on Information Theory, 267-281.

Antoniadis A., Sapatinas T. (2007). Estimation and inference in functional mixed-effects models. Computational Statistics and Data Analysis, 51 (10), 4793-4813.

Bates J. M., Granger C. W. J. (1969). The combination of forecasts. Operations Research Quarterly, 20 (4), 451-468.

Brock W., Durlauf S. (2001). Growth empirics and reality. World Bank Economic Review, 15 (2), 229-272.

Chamidah N., Budiantara I. N., Sunaryo S., Zain I. (2012). Designing of child growth chart based on multi-response local polynomial modelling. Journal of Mathematics and Statistics, 8 (3), 342-247.

Chen H., Wang Y. (2011). A penalized spline approach to functional mixed effects model analysis. Biometrics, 67 (3), 861-870.

Claeskens G., Hjort N. L. (2003). The focused information criterion. Journal of the American Statistical Association, 98 (1), 900-916.

Clemen R. T. (1989). Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5 (4), 559-581.

Fernandez C., Ley C. E., Steel M. F. J. (2001). Benchmark priors for Bayesian model averaging. Journal of Econometrics, 100 (2), 381-427.

Garratt A., Lee K., Pesaran M. H., Shin Y. (2003). Forecasting uncertainties in macroeconomic modelling: An application to the UK economy. Journal of the American Statistical Association, 98 (464), 823-838.

Granger C. W. J. (1989). Combining forecasts — twenty years later. Journal of Forecasting, 8 (3), 167-173.

Granger C. W. J., Ramanathan R. (1984). Improved methods of combining forecast accuracy. Journal of Forecasting, 3 (2), 197-204.

Guo W. (2002). Functional mixed effects models. Biometrics, 58 (1), 121-128.

Hansen B. E. (2007). Least squares model averaging. Econometrica, 75 (4), 1175-1189.

Hansen B. E. (2008). Least-squares forecast averaging. Journal of Econometrics, 146 (2), 342-350.

Hansen B. E. (2014). Model averaging, asymptotic risk and regressor groups. Quantitative Economics, 5 (3), 495-530.

Hendry D. F., Clements M. P. (2002). Pooling of forecasts. Econometrics Journal, 5 (4), 1-26.

Hoeting J. A., Madigan D., Raftery A. E., Volinsky C. T. (1999). Bayesian model averaging: A tutorial. Statistical Science, 14 (4), 382-417.

Ing C.-K. (2003). Multistep prediction in autoregressive processes. Econometric Theory, 19 (2), 254-279.

Ing C.-K. (2004). Selecting optimal multistep predictors for autoregressive processes of unknown order. Annals of Statistics, 32 (2), 693-722.

Ing C.-K. (2007). Accumulated prediction errors, information criteria and optimal forecasting for autoregressive time series. Annals of Statistics, 35 (3), 1238-1277.

Ing C.-K., Wei C.-Z. (2003). On same-realization prediction in an infinite-order autoregressive process. Journal of Multivariate Analysis, 85 (1), 130-155.

Ing C.-K., Wei C.-Z. (2005). Order selection for same-realization predictions in autoregressive processes. Annals of Statistics, 33 (5), 2423-2474.

Lee S., Karagrigoriou A. (2001). An asymptotically optimal selection of the order of a linear process. Sankhya Series, A63, 93-106.

Lestari B., Budiantara I. N., Sunaryo S., Mashuri M. (2010). Spline estimator in multi-response nonparametric regression model with unequal correlation of errors. Journal of Mathematics and Statistics, 6 (3), 327-332.

Mallows C. L. (1973). Some comments on $C_p$. Technometrics, 15 (4), 661-675.

Moiseev N. A. (2016). Linear model averaging by minimizing mean-squared forecast error unbiased estimator. Model Assisted Statistics and Applications, 11 (4), 325-338.

Moiseev N. A. (2017a). p-Value adjustment to control type I errors in linear regression models. Journal of Statistical Computation and Simulation, 87 (9), 1701-1711.

Moiseev N. A. (2017b). Forecasting time series of economic processes by model averaging across data frames of various lengths. Journal of Statistical Computation and Simulation, 87 (16), 3111-3131.

Raftery A. E., Madigan D., Hoeting J. A. (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association, 92 (437), 179-191.

Ruckstuhl A., Welsh A. H., Carroll R. J. (2000). Nonparametric function estimation of the relationship between two repeatedly measured variables. Statistica Sinica, 10 (1), 51-71.

Sala-i-Martin X., Doppelhofer G., Miller R. I. (2004). Determinants of long-term growth: A Bayesian averaging of classical estimates (BACE) approach. American Economic Review, 94 (4), 813-835.

Schwarz G. (1978). Estimating the dimension of a model. Annals of Statistics, 6 (2), 461-464.


Shehata Y. A., White P. (2008). A randomization method to control the type I error rates in best subset regression. Journal of Modern Applied Statistical Methods, 7 (2), 398-407.

Shibata R. (1980). Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. Annals of Statistics, 8 (1), 147-164.

Shibata R. (1981). An optimal selection of regression variables. Biometrika, 68 (1), 45-54.

Shibata R. (1983). Asymptotic mean efficiency of a selection of regression variables. Annals of the Institute of Statistical Mathematics, 35 (1), 415-423.

Stock J. H., Watson M. W. (2006). Forecasting with many predictors. In: Elliott G., Granger C. W. J., Timmermann A. (eds.), Handbook of Economic Forecasting, vol. 1. Elsevier, Amsterdam, 515-554.

Wang Y., Guo W., Brown M. B. (2000). Spline smoothing for bivariate data with application to association between hormones. Statistica Sinica, 10 (1), 377-397.

Welsh A. H., Lin X., Carroll R. J. (2002). Marginal longitudinal nonparametric regression: Locality and efficiency of spline and kernel methods. Journal of the American Statistical Association, 97 (458), 482-494.

Welsh A. H., Yee T. W. (2006). Local regression for vector responses. Journal of Statistical Planning and Inference, 136 (9), 3007-3031.

Zubakin V. A., Kosorukov O. A., Moiseev N. A. (2015). Improvement of regression forecasting models. Modern Applied Science, 9 (6), 344-353.

Received 20.01.2019; accepted 01.03.2019.
