
DOI: 10.15827/0236-235X.113.075-082 Received 26.10.15

REVIEW OF STUDIES ON TIME SERIES FORECASTING BASED ON HYBRID METHODS, NEURAL NETWORKS AND MULTIPLE REGRESSION

(The work was done with partial support from the Russian Foundation for Basic Research, project no. 14-07-00603)

S.A. Yarushev, Postgraduate Student, [email protected]; A.N. Averkin, Ph.D. (Physics and Mathematics), Associate Professor, [email protected] (Dubna International University for Nature, Society and Man, Universitetskaya St. 19, Dubna, 141980, Russian Federation)

The article gives a detailed overview of studies in time series forecasting. It also considers the history of the development of forecasting methods. The authors review the most relevant current forecasting methods, such as statistical, connectionist and hybrid ones, as well as forecasting methods based on multiple regression, their basic parameters, application areas and performance.

The paper considers recent research in the field of hybrid forecasting methods, gives a short overview of these methods and notes their efficiency as reported by their authors. Particular emphasis is placed on a study of using Big Data in forecasting, which suggests a forecasting model based on Big Data technology using a hybrid of soft computing and artificial neural networks and tests it on a stock market. The article also considers a model based on neural networks, wavelet analysis and the bootstrap method, developed for inflow forecasting to support effective water resources management.

The paper also presents a detailed comparative study of methods based on neural networks and multiple regression. It considers different studies, describing the comparison methods and results, and shows a comparison of these methods on the example of predicting housing values, with a detailed analysis of both methods on different samples. Finally, the authors summarize the study and compare the forecasting results.

Keywords: hybrid models, forecasting, time series, multiple regression, neural networks, fuzzy modeling methods, regression analysis, econometric methods.

The recent surge in research on artificial neural networks (ANN) has shown that neural networks have strong capabilities in prediction and classification problems. ANN are successfully used to solve various problems in many areas of business, industry and science [1].

The rapid growth in the number of articles published in scientific journals across various disciplines shows high interest in neural networks. A look at several large bibliographic databases is enough to see that thousands of articles on neural networks are published every year.

Neural networks can process input variables in parallel and therefore can easily handle large amounts of data. The main advantage of neural networks is their ability to find patterns [2]. ANN are a promising alternative in the professional forecasting toolbox. In fact, the non-linear structure of neural networks is particularly useful for identifying complex relationships in most real-world problems.

Perhaps neural networks are the most universal forecasting method, considering that they can not only find non-linear structures in problems, but can also model linear processes. For example, the possibility of using neural networks for modeling linear time series has been studied and confirmed by a number of researchers [3-5].
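As a minimal illustration of this point (assuming scikit-learn and NumPy; the AR(2) coefficients and network size are illustrative, not taken from the cited studies), a small feedforward network can be fitted to a synthetic linear autoregressive series:

```python
# A minimal sketch: an MLP approximating a linear AR(2) process (illustrative coefficients).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 500
y = np.zeros(n)
for t in range(2, n):                              # linear AR(2): y_t = 0.6*y_{t-1} - 0.3*y_{t-2} + noise
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal(scale=0.1)

X = np.column_stack([y[1:-1], y[:-2]])             # lagged inputs (y_{t-1}, y_{t-2})
target = y[2:]

mlp = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
mlp.fit(X[:400], target[:400])                     # train on the first 400 points
print("out-of-sample R^2:", mlp.score(X[400:], target[400:]))
```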

One of the main applications of ANN is forecasting. In recent years, interest in forecasting using neural networks has increased. Forecasting has a long history, and its importance is reflected in its application in a variety of disciplines from business to engineering.

The ability to predict the future accurately is fundamental to many decision-making processes in planning, strategy development and policy making, as well as in supply management and stock pricing. As such, forecasting is an area in which much effort has been invested. It remains an important and active area of human activity and will continue to evolve in the future. A review of the research needs in forecasting was presented by Armstrong [6].

For several decades forecasting was dominated by linear methods, which are simple to design and use, as well as easy to understand and interpret. However, linear models have significant limitations: they cannot capture nonlinear relationships in the data. In the early 1980s a large-scale forecasting competition was held, in which the most widely used linear methods were tested on more than 1,000 real time series [7]. None of the linear models proved universally best, which can be interpreted as a failure of linear models to account for the degree of non-linearity that is common in the real world.

Financial market prediction is one of the most important research directions due to its commercial appeal [8]. Unfortunately, financial markets are dynamic, non-linear, complex, non-parametric and chaotic by nature [9]. Financial time series are non-stationary, noisy and subject to frequent structural breaks [10]. In addition, financial markets are affected by a large number of macroeconomic factors [11, 12], such as political developments, global economic development, bank ratings, the policies of large corporations, exchange rates, investment expectations, events in other stock markets and even psychological factors.

Artificial neural networks are one of the technologies that have shown significant progress in the study of stock markets. In general, share prices form a random sequence with some noise; in turn, artificial neural networks are powerful parallel processors of nonlinear systems that depend on their own internal relations. The development of techniques and methods that can approximate any nonlinear continuous function without a priori notions about the nature of the process itself is shown in the work of R. Pino et al. [13]. A number of methods demonstrate reasonable efficacy in price forecasting; their weak point is that they all have limitations in forecasting stock prices and rely on linear methods. Previous investigations reveal a common problem: none of them provides a comprehensive model for share valuation. A model that evaluates share value and removes this uncertainty can help to increase the investment attractiveness of stock exchanges. Therefore, the search for the best method of financial time series forecasting remains one of the most popular and promising research tasks.

Time series forecasting based on multiple regression

Classification of Econometric Models. Many publications reflect fragmentary studies of local economic processes and indicators [14]. Among the publications devoted to the development and application of econometric methods, models, software and technological tools, we select only those that give a full description of forecasting performance dynamics and compare them. It is reasonable to make a preliminary classification of the tools used in country-level forecasting models, because it only makes sense to compare comparable products. Among the main characteristics of country-level forecasting model systems, we highlight a few obvious ones:

- the power of prediction;

- the presence and development of tools for debugging regression equations;

- the availability and flexibility of calculation control systems;

- control of the balance values of predicted indicators;

- control of indicators against allowable value ranges;

- the presence and development of tabular and graphical tools to analyze the indicators.

The power of a forecasting system determines its ability to solve, in a single pass, the maximum possible number of related regression equations. We distinguish low power ("M") if the system solves up to 10 equations, medium power ("C") up to 100 equations, high power ("B") up to 1,000 equations, and extra-high power ("X") over 1,000 equations.

Availability and development of debugging tools for regression equations. Compiling regression equations is a creative process, whereas debugging and selecting them is routine, laborious and time-consuming. Without automation of the main elements of this process, a system can only belong to class "M". For class "C" or higher, an equation language at the level of program identifiers is required, with automatic translation of equations written in this formal language into parameter tables and input data fields passed to the standard procedures for estimating regression coefficients. At the same time, procedures for automatic checking of the correctness of regression equation entries at the level of the formal language are necessary.

The availability and flexibility of calculation control systems. Advanced forecasting models, as well as the software and technology tools that implement them (systems of class "C" or higher), should support at least three operation modes: preparatory, operating and final.

The preparatory mode includes forming and updating the baseline reported data for forecast calculation; setting the values of the forecasting scenario variants; entering and debugging regression equations; setting calculation parameters; and setting a forecasting scenario identifier.

The operating mode includes setting the parameters and indicator values of the forecasting scenario variant, executing the forecast calculation and analyzing the control information fields.

The final mode includes the full output of all calculated forecast indicator values; quality control for each indicator; the ability to display tables and graphs of the forecast and reported values of any single indicator; the ability to output combined graphs of forecast and reported values for any selected set of indicators; and saving the results of the forecast version.

A special place in a country-level econometric model is occupied by the control of balance values of indicators and the control of indicators against allowable values.

Balance value control of forecast indicators. In the SNA (System of National Accounts), gross domestic product is calculated in three ways. Each method has its own set of indicators, and the ratio of their totals should not deviate from 1.0 (within a tolerance). Any balance items calculated in the system (for example, a profit and loss statement, a labour force balance, etc.) can be incorporated into the balance value control system. Detected deviations exceeding the allowable amount do not stop the calculation, but draw the attention of the expert and researcher to the discrepancies.
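A minimal sketch of such a balance control in Python; the method names and the 1% tolerance are illustrative assumptions, not taken from the article:

```python
# Hedged sketch of balance value control: compare indicator totals obtained by
# different calculation methods and warn (without stopping the calculation)
# when the relative discrepancy exceeds a tolerance.
def check_balance(totals: dict, tolerance: float = 0.01) -> list:
    base = max(totals.values())
    warnings = []
    for method, value in totals.items():
        deviation = abs(value - base) / base
        if deviation > tolerance:
            warnings.append(f"{method}: relative deviation {deviation:.2%} exceeds tolerance")
    return warnings  # the forecast calculation continues; warnings go to the analyst

# GDP computed by the three SNA methods (synthetic numbers)
print(check_balance({"production": 101.2, "income": 100.0, "expenditure": 99.1}))
```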

Control of indicators against allowable values. For the majority of indicators, the ranges of acceptable values are known. The limits are set (or reset) in the preparatory mode before the calculations are performed. Values flagged by the control system as exceeding the limits do not stop the calculation; they serve as a warning about possible inconsistencies in the selected forecast values.

Availability and development of tabular and graphical tools for analyzing the indicators. After the main calculation phase is complete, the amount of work on preparing materials for the analytical study of the forecast version increases. One must have tables and graphs with a forecast quality assessment for each indicator, the ability to visually compare any indicators in any combination, a full picture of the forecast performance of the tested version, the results of monitoring the balance ratios, and the results of monitoring the allowable ranges of values of each indicator.

Classification of Indicators. The parameters and factors under study are divided into nominal and anomalous. For example, all indicators of the SNA are nominal. Indicators that are not typical for a steady-state economy but are used in national statistics should be considered anomalous: for example, wage arrears, accounts payable, hidden wages, the number of undocumented working migrants and similar indicators. It is not recommended to use anomalous factors when forecasting nominal indicators, or they should be used only in exceptional cases, where they (temporarily) explain quantitative and qualitative phenomena and processes well. Furthermore, it should be borne in mind that the same indicator can be treated either as nominal or as anomalous.

The possibilities of econometric models are limited by the information base and the state of the methodological tools. Restrictions on the use of econometric methods for forecasting arise under the following conditions:

- the emergence of new indicators with short series of reported data;

- incomparability of reported data series because of radical methodological changes in the corresponding segment of reporting;

- the dependence of the values on legislative decrees, executive or management decisions of financial authorities ("Policy" indicators);

- the use of indicators with hidden (unobserved) sets of influencing factors;

- administratively dependent indicators.

The latter situation was detected at the early stages of using econometric models by their author, the 1980 Nobel Prize winner L.R. Klein. Administrators themselves determine the future values of control actions on the basis of speculative conclusions about the behavior of influencing factors. Their own estimates of the forecast behavior of market indicators become the basis for developing doctrines, goals and control actions that, in turn, determine the behavior of endogenous indicators.

Time Series Forecasting Based on Neural Networks

Artificial neural networks (ANN) are computing structures that model biological processes. ANN explore many competing hypotheses simultaneously using massively parallel non-linear networks composed of computing elements connected by weighted links. This set of connection weights contains the knowledge generated by the ANN. ANN have been successfully used for low-level tasks such as speech or character recognition, and they are currently being studied for decision-making and induction problems [15].

In general, an ANN model is described by the network topology, the characteristics of its processing units and the training (learning) rules. ANN consist of a large number of simple processing elements, each interacting with the others through excitatory or inhibitory connections. The distributed representation over a large number of interconnected processing elements provides error tolerance. Learning is performed using rules that adapt the link weights in response to input patterns, and these weight changes allow the network to adapt to new situations. Lippmann [16] surveyed a wide range of topologies used to implement ANN.

Over the past 10 years more and more research effort has focused on using ANN in business, although opinions about the value of these approaches are mixed. Some researchers consider them effective for unstructured decision-making tasks, while others have expressed doubts about their potential, suggesting that stronger empirical evidence is required. In order to evaluate the effectiveness of neural networks, it is necessary to conduct a thorough review of studies assessing the predictive capability of ANN.

Next, we study the effectiveness of neural networks in forecasting problems through an extensive review of existing research on the subject.

Recent research in hybrid time series prediction

One approach to solving difficult real-world problems is the hybridization of artificial intelligence and statistical methods, combining the strengths and eliminating the weaknesses of each method [17].

Ranjeeta Bisoi and P.K. Dash used a hybrid dynamic neural network (DNN) trained by a sliding mode algorithm and differential evolution (DE) [18]. They used this model to predict stock price indices and stock return volatilities of two important Indian stock markets, including Reliance Industries Limited (RIL), from one day ahead to one month in advance. The DNN comprises a set of first-order IIR filters for processing the past inputs and their functional expansions. Its weights are adjusted using a sliding mode strategy known for its fast convergence and robustness with respect to chaotic variations in the inputs. Extensive computer simulations were carried out to predict the stock market indices and return volatilities simultaneously. The authors note that a simple IIR-based DNN-FLANN model hybridized with DE produces better forecasting accuracy than more complicated neural architectures.
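The following is an illustrative sketch only (SciPy and NumPy assumed): it shows differential evolution tuning the lag weights of a trivial one-step-ahead predictor, as a demonstration of DE-based parameter search rather than the authors' DNN-FLANN architecture.

```python
# Sketch: differential evolution minimizing the one-step-ahead MSE of a
# two-lag linear predictor on a synthetic price-like series.
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(1)
series = np.cumsum(rng.normal(size=300))           # synthetic price-like series

def forecast_mse(params):
    a, b = params                                  # weights of the two lagged inputs
    pred = a * series[1:-1] + b * series[:-2]      # one-step-ahead prediction
    return np.mean((series[2:] - pred) ** 2)

result = differential_evolution(forecast_mse, bounds=[(-2, 2), (-2, 2)], seed=0)
print("best lag weights:", result.x, "MSE:", result.fun)
```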

Kumar S. proposed reservoir inflow forecasting using ensemble models based on neural networks, wavelet analysis and the bootstrap method [19]. The aim of that study was to develop an ensemble modeling approach based on wavelet analysis, bootstrap resampling and neural networks (BWANN) for reservoir inflow forecasting. The performance of the BWANN model was compared with wavelet-based ANN (WANN), wavelet-based MLR (WMLR), bootstrap- and wavelet-based multiple linear regression (BWMLR), standard ANN and standard multiple linear regression (MLR) models. The study demonstrated the importance of proper selection of wavelet functions and of an appropriate methodology for developing wavelet-based models. Moreover, the performance of BWANN models was found to be better than that of the BWMLR model for uncertainty assessment. Compared to point predictions, a range forecast is more reliable and accurate and can be very helpful for operational inflow forecasting.
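A hedged sketch of a BWANN-style ensemble is given below (PyWavelets and scikit-learn assumed); the wavelet choice ('db4'), lag depth, ensemble size and synthetic data are illustrative assumptions, not details from the cited study.

```python
# Wavelet decomposition of the series, lagged features, bootstrap resampling of
# the training rows, and averaging of several MLP forecasts plus a simple range.
import numpy as np
import pywt
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
inflow = np.abs(np.cumsum(rng.normal(size=400)))        # synthetic inflow series

# Per-band reconstructions from a 2-level 'db4' decomposition
coeffs = pywt.wavedec(inflow, "db4", level=2)
bands = []
for i in range(len(coeffs)):
    kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
    bands.append(pywt.waverec(kept, "db4")[: len(inflow)])

lag, n = 3, len(inflow)
signals = [inflow] + bands                              # raw series plus wavelet bands
X = np.array([np.concatenate([[s[t - k] for k in range(1, lag + 1)] for s in signals])
              for t in range(lag, n)])
y = inflow[lag:]

split, replicates = 300, 10
member_preds = []
for b in range(replicates):
    idx = rng.integers(0, split, size=split)            # bootstrap resample of training rows
    net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=b)
    net.fit(X[idx], y[idx])
    member_preds.append(net.predict(X[split:]))

mean_forecast = np.mean(member_preds, axis=0)           # ensemble point forecast
lo, hi = np.percentile(member_preds, [5, 95], axis=0)   # simple range forecast
print("test MSE:", np.mean((y[split:] - mean_forecast) ** 2))
```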

Wang J. proposed a hybrid forecasting model based on data mining and a genetic algorithm-adaptive particle swarm optimization [20]. This study proposes a hybrid forecasting model that can effectively preprocess the original data and improve forecasting accuracy. The developed model applies a genetic algorithm-adaptive particle swarm optimization algorithm to optimize the parameters of a wavelet neural network (WNN) model. The proposed hybrid method is then examined on the wind farms of eastern China. The forecasting performance demonstrates that the developed model is better than some traditional models (for example, back propagation, WNN, fuzzy neural network, support vector machine). Its applicability is further verified by paired-sample T tests.

Singh P. suggested a big data time series forecasting model based on the hybridization of two soft computing (SC) techniques, viz. fuzzy sets and an artificial neural network [21]. The proposed model is illustrated with the stock index price data set of the State Bank of India (SBI). The performance of the model is verified with different numbers of factors, viz. two factors, three factors and M factors. Various statistical analyses indicate that the proposed model makes far better decisions with the M-factors data set.

Comparison of Neural Networks and Multiple Regression on the Example of Predicting Housing Value. In their work, Nguyen and Cripps [22] compared the effectiveness of forecasts made by multiple regression and an artificial neural network trained with backpropagation. The comparison is made on the example of predicting the value of residential real estate. The two models were compared using different data sets, functional specifications and comparison criteria. Using the same specifications for both methods and real data can help explain why other studies that compared ANN and multiple regression yielded different results.

For an objective comparison of the models it is necessary to identify the potential problems of each model that could distort its performance. Such problems are identified here for both multiple regression and ANN. In addition, specifications from earlier studies of multiple regression models are applied to the data set used in this study.

The comparison used a standard feedforward neural network trained by backpropagation. The same experiments were performed with many variations in the neural network training methods. Different neural network architectures, such as ARTMAP (adaptive resonance theory), Gaussian and neural-regression networks, were also studied. After hundreds of experiments and architecture changes, the standard backpropagation method showed better performance than the other neural network architectures. Several other studies comparing neural networks and multiple regression obtained different results. Thus, ironically, the main issue and purpose of this study is to determine why some researchers obtained their best results using multiple regression, while others concluded that neural networks perform better.

Some studies show the superiority of ANN over multiple regression in forecasting the real estate market [23, 24]. Other studies [25], however, showed that ANN are not always superior to regression. Because ANN can learn and recognize complex patterns without being programmed with explicit rules, they can easily be used even on small statistical data sets. In contrast to regression analysis, a neural network does not need a predetermined functional form of the determinants. This property of ANN is important because several studies [26, 27] found that property age has a nonlinear relationship with property value (for the data sets used in those studies). Other studies have shown that, in addition to age, the living area also has a non-linear relationship with property value [28]. Based on the results of previous studies and the theoretical capabilities of ANN, one would expect ANN to perform better than multiple regression.

When using multiple regression, it is necessary to address methodological problems of the functional form: misspecification, linearity, multicollinearity and heteroscedasticity. In most cases a nonlinear functional form can be transformed into a linear one before regression analysis is applied [29]. As noted earlier, some studies found that age and living area have a nonlinear relationship with property value. Multicollinearity does not affect the predictive capabilities of multiple regression or of ANN [30], because inferences are drawn jointly within a certain region of observations. Multicollinearity, however, makes it impossible to separate the effects of the supposedly independent variables. Heteroscedasticity arises when cross-sectional data are used. In addition to these methodological problems, the lack of a relevant explanatory variable is another source of error when using multiple regression and ANN; this is often due to a lack of data.
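As a practical illustration of the multicollinearity check (not part of the original study; statsmodels and pandas assumed), variance inflation factors can be computed as follows; the attribute names mirror those of the housing example below, and the data are synthetic placeholders.

```python
# Sketch: variance inflation factors as a multicollinearity check.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sqrft": rng.normal(1800, 400, size=500),
    "bed": rng.integers(1, 6, size=500),
    "bath": rng.integers(1, 4, size=500),
    "age": rng.integers(0, 60, size=500),
})
X = sm.add_constant(df).to_numpy(dtype=float)          # constant term included
for i, col in enumerate(df.columns, start=1):          # skip the constant column
    print(col, round(variance_inflation_factor(X, i), 2))   # VIF > ~10 signals strong collinearity
```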

When using a feedforward neural network trained by error backpropagation, it is necessary to address the following methodological issues: the number of hidden layers, the number of neurons in each hidden layer, the choice of the training data sample and its size, the choice of the test data sample and its size, as well as overtraining. As a general rule, the amount of training and the number of hidden neurons affect the memorization and generalization ability of the model. The more extensive the training and the larger the number of hidden neurons, the better the model reproduces correct predictions on the training set. On the other hand, the ANN becomes less able to predict new data, i.e. its ability to generalize weakens when overtraining occurs, which can happen when the dimension of the hidden layer is too large. To avoid overtraining, it is advisable to use the heuristic described in Hecht-Nielsen's article [31]. Despite limitations, there is some theoretical basis to facilitate determining the number of hidden layers and neurons to use. In most cases there is no way to determine the best number of hidden neurons without training multiple networks and estimating their generalization errors. If an ANN has only a few hidden neurons, both the training error and the generalization error will be high due to high statistical bias. If a neural network has too many hidden neurons, the training error will be small, but the generalization error will be high because of overfitting and high variance [32]. If the training sample is not a representative set of the data, there is no basis for ANN training; typically, a representative training set is generated by random sampling of the data set. When the training data set is too small, the ANN will tend to memorize: the trained models become too specific, and extreme points (noise) have an outsized impact on model quality. This can be corrected, however, with the help of the K-fold cross-validation training method [33].
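A minimal sketch of selecting the number of hidden neurons by K-fold cross-validation (scikit-learn assumed; the synthetic data, candidate layer sizes and 5-fold split are illustrative assumptions):

```python
# Sketch: cross-validated choice of hidden layer size to limit overtraining.
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                          # six property attributes (synthetic)
y = 150 + X @ rng.normal(size=6) + 0.5 * rng.normal(size=500)   # positive "price" target

search = GridSearchCV(
    MLPRegressor(max_iter=3000, random_state=0),
    param_grid={"hidden_layer_sizes": [(2,), (5,), (10,), (20,)]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_absolute_percentage_error",      # matches the MAPE criterion used below
)
search.fit(X, y)
print("selected hidden layer size:", search.best_params_["hidden_layer_sizes"])
```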

Input data for the research. To carry out the study, the authors selected 3,906 observations on sales of single-unit residential properties, collected over 18 months. The following real estate parameters were used: living area (sqrft), the number of bedrooms (bed#), the number of bathrooms (bath#), the number of years since the building was constructed (age), the quarter in which the property was sold (quarter#) and whether the property has a garage or a carport (garage_cp).

The following training sample sizes were drawn from the 3,906 observations: 306, 506, 706, 906, 1,106, 1,306, 1,506, 1,706, 1,906, 2,106, 2,306, 2,506, 2,706, 2,906, 3,106, 3,306, 3,506 and 3,706. These sets are called training samples one to eighteen (T1, T2, ..., T18), respectively. Each set extends the previous one, i.e. T1 ⊂ T2 ⊂ T3 ⊂ ... ⊂ T18. The complement of each training set with respect to the full data set of 3,906 observations gives the corresponding test data sets V1-V18. Table 1 shows some basic statistical information for the data set. Each training set from T1 to T18 is uniformly distributed over the quarters of the 18 months. The test data sets are used to reduce errors of each model related to sample size. Both models use identical training samples as well as identical validation data; for example, validation set V1 corresponds to training set T1, etc.
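A hedged sketch of this nested sampling scheme (a plain random ordering is assumed here; the original study additionally keeps the quarterly distribution uniform within each training set):

```python
# Sketch: nested training sets T1..T18 and their complements V1..V18.
import numpy as np

rng = np.random.default_rng(0)
n_total = 3906
order = rng.permutation(n_total)                       # one fixed random ordering of observations
sizes = list(range(306, 3707, 200))                    # 306, 506, ..., 3706 (18 sizes)

training_sets = {f"T{k + 1}": order[:size] for k, size in enumerate(sizes)}
validation_sets = {f"V{k + 1}": order[size:] for k, size in enumerate(sizes)}

# T1 is contained in T2, T2 in T3, and so on; each Vk is the complement of Tk
assert set(training_sets["T1"]).issubset(set(training_sets["T2"]))
print(len(training_sets["T18"]), len(validation_sets["V18"]))     # 3706 and 200
```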

Using the ANN and multiple regression models. 108 comparisons with different training sample sizes, functional specifications and forecast periods were carried out to compare the predictive efficacy of multiple regression and ANN. The multiple regression and ANN models used in this study address the issues described above: misspecification of the functional form, linearity and heteroscedasticity for the multiple regression models; and, for the neural network models, the number of hidden layers, the number of neurons in each hidden layer, the selected training sample, the test data set and its size, as well as overtraining.

As noted in previous studies, the authors introduced some improvements to the multiple regression model for predicting the selling price, and the same improvements were used in the ANN models for the same prediction task. Accordingly, the six functional specifications of multiple regression (based on previous studies) were used to develop the ANN models. This process demonstrates the improvement of the multiple regression model and also improves the ANN model. The resulting models are compared on the 18 training samples and the corresponding 18 test sets. The study used linear, semi-log and fully logarithmic functional forms of multiple regression.

The results of the comparison. The authors compared multiple regression and ANN, attempting to conduct valid comparisons of the predictive abilities of both models. Multiple comparative experiments were carried out with different training data sets, functional specifications and forecast periods; 108 trials were conducted in total. Two criteria were used for the comparison: MAPE (mean absolute percentage error) and PE (prediction error). Based on the prediction results, both criteria may differ across data samples and model specifications; thus each criterion should be applied carefully when assessing forecasting accuracy, together with the size of the sample used. For moderate to large sample sizes, ANN perform better than multiple regression on both criteria: in this study, for training sample sizes from 506 to 1,506 cases (out of 3,906 in total), ANN outperformed multiple regression on both criteria. In general, as the ANN functional specification becomes more complicated, the training sample size should be increased to ensure that the ANN works better than multiple regression. Multiple regression shows better results (by the mean absolute percentage error criterion) than ANN when small training data sets are used. For each functional specification, the performance of multiple regression remains roughly constant across sample sizes, while ANN performance improves significantly as the amount of training data grows.
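For reference, MAPE has the standard form below; the exact formula for PE used by the authors is not reproduced in this review, so the relative-error form shown here is an assumption:

```latex
\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_t-\hat{y}_t}{y_t}\right|,
\qquad
\mathrm{PE}_t = \frac{\hat{y}_t-y_t}{y_t}\cdot 100\%,
```

where \(y_t\) is the observed sale price and \(\hat{y}_t\) is the forecast.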

Fluctuations in the performance of an ANN model are associated with the large number of possible settings and the lack of a methodical approach to choosing the best parameters. For example, experiments must be conducted to determine the best data representation, model specification, number of hidden layers, number of neurons in each hidden layer, learning rate and number of training cycles. All of these actions are intended to identify the best neural network model; failure to conduct such experiments can lead to a badly specified ANN model.

If other input variables, such as a fireplace, the number of floors, finishing materials, lot size, connected utilities and the type of financing, are included, the results may vary.

The research findings explain why previous studies give different results when comparing multiple regression and ANN for prediction. Predictive efficiency depends on the evaluation criteria (MAPE and PE) used, in conjunction with the size of the training sample and the model parameters. Fluctuations in ANN model performance may be due to the large number of settings selected by experimentation and depend on the size of the training sample.

In conclusion, it should be noted that when a large training data set and appropriate ANN parameters are available, an ANN works much better than multiple regression; otherwise, the results vary.

Conclusion

In this research we made a detailed review of time series forecasting models based on multiple regression and on artificial neural networks. A thorough analysis of studies in this field was conducted, and their results were reviewed to identify high-quality forecasting methods. Particular attention was given to studies comparing multiple regression and artificial neural networks. The article described in detail a study containing a practical comparison of the predictive capability of ANN and multiple regression, using the example of predicting the value of residential real estate. The results of that study demonstrate the superiority of neural networks in prediction quality over multiple regression. The authors also identified some difficulties encountered when predicting with both methods. Thus, based on the research results, it can be concluded that in order to gain the greatest advantage of ANN over multiple regression, one should have the largest possible training set: the larger the training set, the better the forecast produced by the neural network.

Considering all the above-mentioned results of studies that compared neural networks with a number of other prediction methods, we conclude that in the majority of cases artificial neural networks provide higher-quality predictions than other methods, including multiple regression.

References

1. Widrow B., Rumelhart D.E., Lehr M.A. Neural networks: Applications in industry, business and science. Stanford Univ. Communications of the ACM. 1994, vol. 37, no. 3, pp. 93-105.

2. Chang P.C., Wang Y.W., Liu C.H. The development of a weighted evolving fuzzy neural network for PCB sales forecasting. Expert Systems with Applications. 2007, vol. 32, iss. 1, pp. 86-96.

3. Hwang H.B. Insights into neural-network forecasting of time series corresponding to ARMA (p, q) structures. Omega. 2001, vol. 29, iss. 3, pp. 273-289.

4. Medeiros M.C., Pedreira C.E. What are the effects of forecasting linear time series with neural networks? Engineering Intelligent Systems. 2001, vol. 9, pp. 237-424.

5. Zhang G.P. An investigation of neural networks for linear time-series forecasting. Computers & Operations Research. 2001, vol. 28, pp. 1183-1202.

6. Armstrong J.S. Research needs in forecasting. Int. Journ. of Forecasting. 1988, vol. 4, pp. 449-465.

7. Makridakis S., Anderson A., Carbone R., Fildes R., Hibon M., Lewandowski R., Newton J., Parzen E., Winkler R. The accuracy of extrapolation (time series) methods: Results of a forecasting competition. Journ. of Forecasting. 1982, vol. 6, iss. 1 (2), pp. 111-153.

8. Majhi R., Panda G. Stock market prediction of S&P 500 and DJIA using bacterial foraging optimization technique. Proc. 2007 IEEE Congress on Evolutionary Computation. 2007, pp. 2569-2579.

9. Tan T.Z., Quek C., Ng G.S. Brain inspired genetic complementary learning for stock market prediction. Proc. IEEE Congress on Evolutionary Computation. 2005, vol. 3, pp. 2653-2660.

10. Oh K.J., Kim K.J. Analyzing stock market tick data using piecewise non linear model. Expert System with Applications. 2002, vol. 3, pp. 249-255.

11. Miao K., Chen F., Zhao Z.G. Stock price forecast based on bacterial colony RBF neural network. Journ. of QingDao University. 2007, vol. 20, pp. 50-54 (in Chinese).

12. Wang Y.F. Mining stock prices using fuzzy rough set system. Expert System with Applications. 2003, vol. 24, iss. 1, pp. 13-23.

13. Pino R., Parreno J., Gomez A., Priore P. Forecasting next-day price of electricity in the Spanish energy market using artificial neural networks. Engineering Applications of Artificial Intelligence. 2008, vol. 21, pp. 53-62.

14. Kitova O.V., Kolmakov I.B., Potapov S.V., Sharafutdino-va A.R. Sistemy modeley kratkosrochnogo prognoza pokazateley sotsialno-ekonomicheskogo razvitiya RF [Systems of Short-time Prediction Models of the Russian Federation Social Indicators]. Init-siativy XXI veka, 2012, no. 4, pp. 36-38.

15. Schocken S., Ariav G. Neural networks for decision support: problems and opportunities. Decision Support Systems. 1994, vol. 11, pp. 393-414.


16. Lippmann R.P. An introduction to computing with neural nets. IEEE ASSP Magazine. 1987, vol. 4, iss. 2, part 1, pp. 4-22.

17. Averkin A.N., Yarushev S.A. Hybrid methods of time series prediction in financial markets. Proc. Int. Conf. on Soft Computing and Measurements. St. Petersburg, St. Petersburg Electrotechnical University "LETI" Publ., 2015, vol. 1, part 1-3, pp. 332-335.

18. Bisoi R., Dash P.K. Prediction of financial time series and its volatility using a hybrid dynamic neural network trained by sliding mode algorithm and differential evolution. Int. Journ. of Information and Decision Sciences. 2015, vol. 7, no. 2, pp. 166-191.

19. Kumar S. Reservoir inflow forecasting using ensemble models based on neural networks, wavelet analysis and bootstrap method. Water Resources Management. 2015, vol. 29, no. 13, pp. 4863-4883.

20. Wang J. Hybrid forecasting model-based data mining and genetic algorithm-adaptive particle swarm optimization: a case study of wind speed time series. IET Renewable Power Generation. 2015.

21. Singh P. Big data time series forecasting model: a fuzzy-neuro hybridize approach. Computational Intelligence for Big Data Analysis. 2015, pp. 55-72.

22. Nguyen N., Cripps A. Predicting Housing Value: A comparison of multiple regression analysis and artificial neural networks. JRER. 2001, vol. 3, pp. 314-336.

23. Tsukuda J., Baba S.-I. Predicting Japanese corporate bankruptcy in terms of financial data using neural networks. Computers & Industrial Engineering. 1994, vol. 27, iss. 1-4, pp. 445-448.

24. Do Q., Grudnitski G. A neural network approach to residential property appraisal. The Real Estate Appraiser. 1992, vol. 58, pp. 38-45.

25. Allen W.C., Zumwalt J.K. Neural networks: a word of caution. Unpublished Working Paper. Colorado State Univ., 1994.

26. Grether D., Mieszkowski P. Determinants of real values. Journ. of Urban Economics. 1974, vol. 1, iss. 2, pp. 27-45.

27. Jones W., Ferri M., McGee L. A competitive testing approach to models of depreciation in housing. Journ. of Economics and Business. 1981, vol. 33, iss. 3, pp. 2-11.

28. Goodman A.C., Thibodeau T.G. Age-related heteroskedas-ticity in hedonic house price equations. Journ. of Housing Research. 1995, vol. 6, pp. 25-42.

29. Kmenta J. Elements of econometrics. NY, Macmillan Publ., 1971.

30. Neter M., Wasserman W., Kutner J. Applied linear statistical models. 3rd ed., McGraw-Hill Publ., 1990.

31. Hecht-Nielsen R. Kolmogorov's Mapping Neural Network Existence Theorem. Proc. IEEE 1st Int. Conf. on Neural Networks. San Diego, CA, 1987.

32. Geman S., Bienenstock E., Doursat R. Neural networks and the bias/variance dilemma. Neural Computation. 1992, vol. 4, pp. 1-58.

33. Goutte C. Note on free lunches and cross-validation. Neural Computation. 1997, vol. 9, pp. 11-15.
