Section "Mathematical Methods of Modeling, Control and Data Analysis"
UDC 004.94
FORECASTING THE DEFAULT OF CREDIT CARD CLIENTS IN THE RAPIDMINER SYSTEM
Wang Xuxin* Scientific supervisor - M. V. Karaseva
Reshetnev Siberian State University of Science and Technology, 31, Krasnoyarskii rabochii prospekt, Krasnoyarsk, 660037, Russian Federation
*E-mail: 1248809030@qq.com
The application of data mining methods is a relevant direction for solving problems in various fields, including the rocket and space field. This work solves the problem of classifying credit card holders in order to identify the risk of non-payment on credit cards. The RapidMiner system was used to solve the problem; a model was built in it using such methods as a decision tree, an artificial neural network, and the k nearest neighbors method.
Keywords: RapidMiner, decision tree, artificial neural network, k nearest neighbors method, T-test.
FORECASTING THE DEFAULT OF CREDIT CARD CLIENTS WITH RAPIDMINER
Wang Xuxin* Scientific supervisor - M. V. Karaseva
Reshetnev Siberian State University of Science and Technology, 31, Krasnoyarskii rabochii prospekt, Krasnoyarsk, 660037, Russian Federation
*E-mail: 1248809030@qq.com
The application of data mining methods is a relevant direction for solving problems in various fields, including the rocket and space field. This article solves the problem of classifying credit card holders in order to identify the risk of default on credit cards. The RapidMiner system is used to solve the problem. The model was built with such methods as a decision tree, an artificial neural network, and the k nearest neighbors method.
Keywords: RapidMiner, decision tree, artificial neural network, k nearest neighbors method, T-test.
Actual Problems of Aviation and Cosmonautics - 2021. Volume 2
This research addresses the case of customers' default payments in Taiwan and compares the predictive accuracy of the probability of default among three data mining methods [1]. From the perspective of risk management, the predictive accuracy of the estimated probability of default is more valuable than the binary result of classification (credible or not credible clients) [2-3]. However, the real probability of default is unknown [4].
To build the commercial bank's business risk model, we use the RapidMiner tool. After selecting the required data set, we construct the model. The RapidMiner model we built is shown in fig. 1:
Fig. 1. Model in the RapidMiner system
The Validation operator needs a brief explanation: it contains a nested process, which is expanded as follows (fig. 2):
Fig. 2. Validation operator in the RapidMiner system
To solve the problem, an artificial neural network (ANN), a decision tree (DT) and the k nearest neighbors method (k-NN) were used. The following accuracies were obtained: ANN: 81.96%; DT: 69.11%; k-NN: 81.7%.
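The validation scheme used to score a classifier can be sketched outside RapidMiner as follows. This is only a minimal illustration, not the author's RapidMiner process: it assumes 10-fold cross-validation and k = 5 for k-NN (neither value is stated in the source), and it runs on synthetic two-class data rather than the Taiwan credit card data set. The names `knn_predict`, `cross_val_accuracy` and `make_synthetic` are hypothetical helpers.

```python
import math
import random


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def cross_val_accuracy(data, k_neighbors=5, folds=10):
    """Per-fold accuracy of k-NN under k-fold cross-validation."""
    scores = []
    for f in range(folds):
        # Every folds-th record, starting at f, forms the test fold.
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        hits = sum(knn_predict(train, x, k_neighbors) == y for x, y in test)
        scores.append(hits / len(test))
    return scores


def make_synthetic(n=200, seed=0):
    """Synthetic stand-in data: two Gaussian clusters, labels 0 and 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 * label
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


scores = cross_val_accuracy(make_synthetic())
mean = sum(scores) / len(scores)
std = math.sqrt(sum((s - mean) ** 2 for s in scores) / (len(scores) - 1))
print(f"accuracy: {mean:.3f} +/- {std:.3f}")
```

The mean and standard deviation over folds give a figure of the same "mean +/- deviation" form that a significance test can later compare across methods.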
Table 1
T-Test Significance

        ANN               DT                k-NN
        0.817 +/- 0.006   0.691 +/- 0.008   0.820 +/- 0.005
ANN                       0.000             0.378
DT                                          0.000
k-NN
Table 1 shows that the algorithm with accuracy 0.820 is the best, but the algorithm with accuracy 0.817 is almost as accurate. The T-test confirms that the difference between these two algorithms is not significant (p = 0.378). The algorithm with accuracy 0.691 is significantly worse than the other two (p = 0.000).
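The significance check can be illustrated with a small sketch that computes a two-sample t-test from the means and deviations reported in Table 1. Several things here are assumptions and not taken from the source: the number of validation folds (10), the pooled-variance form of the test, and the reading of "+/-" as a sample standard deviation. RapidMiner's T-Test operator may compute the statistic differently, so the p-values below need not reproduce the table exactly. The function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the density over [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def t_test_from_summary(mean_a, sd_a, mean_b, sd_b, n):
    """Pooled two-sample t-test for two groups of equal size n."""
    se = math.sqrt((sd_a ** 2 + sd_b ** 2) / n)
    t = (mean_a - mean_b) / se
    return t, two_sided_p(t, 2 * n - 2)


N_FOLDS = 10  # assumption: the source does not state the number of folds

t_close, p_close = t_test_from_summary(0.817, 0.006, 0.820, 0.005, N_FOLDS)
t_far, p_far = t_test_from_summary(0.817, 0.006, 0.691, 0.008, N_FOLDS)
print(f"0.817 vs 0.820: t = {t_close:.3f}, p = {p_close:.3f}")
print(f"0.817 vs 0.691: t = {t_far:.3f}, p = {p_far:.3g}")
```

Under these assumptions the first comparison gives a p-value well above 0.05 and the second a p-value close to zero, which agrees qualitatively with the conclusions drawn from Table 1.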
To improve the efficiency of data mining methods, it is necessary to use procedures that optimize the method settings [5]. This can become a direction for further research.
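As a simple illustration of optimizing a method setting, the sketch below selects the number of neighbors k for k-NN by exhaustive search with leave-one-out validation. This is a plain grid search, swapped in for brevity; it is not the self-configuring evolutionary approach of [5] and not something the source reports doing. The data are synthetic, and `loo_accuracy` and `best_k` are hypothetical helper names.

```python
import random


def loo_accuracy(points, k):
    """Leave-one-out accuracy of 1-D k-NN with majority voting."""
    hits = 0
    for i, (x, y) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        nearest = sorted(rest, key=lambda p: abs(p[0] - x))[:k]
        vote = 1 if 2 * sum(label for _, label in nearest) > k else 0
        hits += vote == y
    return hits / len(points)


def best_k(points, candidates):
    """Return (best k, accuracy per candidate); ties go to the smaller k."""
    results = {k: loo_accuracy(points, k) for k in candidates}
    return max(sorted(results), key=results.get), results


# Synthetic stand-in data: one feature, two overlapping classes.
rng = random.Random(1)
points = []
for _ in range(150):
    label = rng.randint(0, 1)
    points.append((rng.gauss(1.5 * label, 1.0), label))

k_star, table = best_k(points, [1, 3, 5, 7, 9, 11])
for k, acc in sorted(table.items()):
    print(f"k = {k:2d}: accuracy = {acc:.3f}")
print("selected k:", k_star)
```

Only odd candidate values are used so that a two-class majority vote cannot tie.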
References
1. Machine Learning Repository [Electronic resource]. URL: https://archive.ics.uci.edu/ (date of access: 14.02.2021).
2. Yeh I. C., Lien C. H. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 2009, 36(2), pp. 2473-2480.
3. Crouhy M., Galai D., Mark R. A comparative analysis of current credit risk models. Journal of Banking and Finance, 2000, 24(1), pp. 59-117.
4. Covitz D., Downing C. Liquidity or Credit Risk. Journal of Finance, 2007, 62(5), pp. 2303-2328.
5. Karaseva T. S., Semenkina O. E. Automatic differential equations identification by self-configuring genetic programming algorithm. IOP Conference Series: Materials Science and Engineering, 2021. pp. 12076.
© Wang Xuxin, 2021