Actual Problems of Aviation and Cosmonautics - 2021. Volume 2
UDC 519.68
IMPROVING CLASSIFICATION ACCURACY USING DATA ANALYSIS TECHNIQUES
Y. S. Ganzha*, T. V. Pen
Scientific supervisors: A. M. Popov, A. A. Stupina
Reshetnev Siberian State University of Science and Technology
31, Krasnoyarskii rabochii prospekt, Krasnoyarsk, 660037, Russian Federation
*E-mail: yanavaio@yandex.ru
This article presents existing and newly developed methods of data analysis. The application of single-criterion and multi-criteria evolutionary algorithms in data analysis is considered and analyzed.
Keywords: data analysis methods, evolutionary algorithms, classification.
Introduction. A wide variety of software packages now provide classification tools and methods. In practice, however, these tools often prove insufficiently effective, which makes non-standard data mining approaches relevant. Experiments carried out with single-criterion and multi-criteria evolutionary algorithms confirm the high efficiency of such approaches [1].
For the problem of choosing an optimal set of features, a single-criterion genetic algorithm is advisable for two reasons: each feature is conveniently encoded as a zero or a one, and the classification accuracy on the considered set of features can serve as the objective function. A multi-criteria genetic algorithm for the selection of informative features has also been considered [2].
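The binary encoding described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-nearest-neighbour leave-one-out fitness, the tournament selection, the one-point crossover, the parameter values, and the helper names `toy_data`, `loo_accuracy_1nn` and `select_features` are all assumptions chosen to keep the example self-contained.

```python
import random


def toy_data():
    """Hypothetical data: feature 0 separates the classes, features 1-3 are noise."""
    X, y = [], []
    for label in (0, 1):
        for i in range(4):
            X.append([10.0 * label + 0.1 * i, 0.1 * i, 1.0 - 0.1 * i, 0.05 * i * i])
            y.append(label)
    return X, y


def loo_accuracy_1nn(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier on the features where mask == 1."""
    cols = [c for c, bit in enumerate(mask) if bit]
    if not cols:  # an empty feature set cannot classify anything
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = float("inf"), None
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[c] - xj[c]) ** 2 for c in cols)
            if d < best_d:
                best_d, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def select_features(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Single-criterion GA: a chromosome is a 0/1 mask, fitness is accuracy."""
    n = len(X[0])
    if n < 2:
        raise ValueError("one-point crossover needs at least two features")
    rng = random.Random(seed)
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:  # avoid re-evaluating identical masks
            cache[key] = loo_accuracy_1nn(X, y, mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: the best mask always survives
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n)                  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    X, y = toy_data()
    mask, acc = select_features(X, y, generations=5)
    print("selected mask:", mask, "accuracy:", acc)
```

Elitism is used here so that the best accuracy found never decreases between generations; other selection and crossover operators could be substituted without changing the encoding.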
An alternative approach is proposed that uses a multi-criteria genetic algorithm with a self-adjusting procedure, which selects the most efficient evolutionary operators for each problem. Its advantage is an integrated approach to the problem of selecting informative features [3].
Each feature is then assessed. After the evaluation, features are selected by effectiveness, where 0 marks an ineffective feature and 1 marks an effective feature.
Section "Mathematical Methods of Modeling, Control and Data Analysis"
The average effectiveness of each feature is calculated by formula (1):
$$AV = \frac{\sum_{j=1}^{k} \mathrm{Fitness}_j^r}{k}, \qquad (1)$$
where k is the number of classes, Fitness is a fitness function, and r is a feature.
The effective value (1) is assigned to a feature when
$$AV > \frac{\sum_{r=1}^{R} \sum_{j=1}^{k} \mathrm{Fitness}_j^r}{R \cdot k},$$
where R is the number of features; otherwise, when
$$AV < \frac{\sum_{r=1}^{R} \sum_{j=1}^{k} \mathrm{Fitness}_j^r}{R \cdot k},$$
the feature is not effective (0).
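The thresholding rule can be sketched in a few lines. The sketch rests on an assumed reading of the rule: AV for a feature is taken as the mean of its fitness values over the k classes, and a feature is marked effective when its AV exceeds the mean fitness over all features and classes. The normalizing terms, the nested-list layout `fitness[r][j]`, and the function name `effective_features` are assumptions made for illustration.

```python
def effective_features(fitness):
    """Mark each feature as effective (1) or not effective (0).

    fitness[r][j] holds the fitness of feature r on class j
    (a hypothetical layout chosen for this sketch).
    """
    k = len(fitness[0])                            # number of classes
    av = [sum(row) / k for row in fitness]         # per-feature average
    threshold = sum(av) / len(av)                  # assumed: overall mean
    flags = [1 if a > threshold else 0 for a in av]
    return flags, av, threshold


if __name__ == "__main__":
    # Three features evaluated on two classes (made-up numbers).
    table = [[0.9, 0.8], [0.2, 0.4], [0.6, 0.6]]
    flags, av, threshold = effective_features(table)
    print(flags)
```

With the made-up table above, the first and third features lie above the overall mean and the second lies below it.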
The classification accuracy is the ratio of the number of correctly classified objects to the total number of objects in the database [4, 5].
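As a small illustration of this definition, accuracy can be computed directly from the true and predicted labels. The function name `classification_accuracy` and the sample labels are hypothetical.

```python
def classification_accuracy(y_true, y_pred):
    """Share of objects whose predicted class equals the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        raise ValueError("at least one object is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Three of the four objects are classified correctly.
    print(classification_accuracy(["a", "b", "a", "c"], ["a", "b", "c", "c"]))
```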
The following classifiers were selected for testing: k-NN, Naive Bayes, and VFI. The results obtained confirm the effectiveness of the developed method, whose average accuracy is 74%. The recognition quality depends on the choice of classifier, and the highest accuracy was shown by k-NN. The accuracy of a classifier is influenced by its efficiency and by the number of features.
References
1. Angeline P. J. Adaptive and self-adaptive evolutionary computations. In: Palaniswami M., Attikiouzel Y., Marks R., Fogel D., Fukuda T. (Eds.) Computational Intelligence: A Dynamic System Perspective. NJ: IEEE Press, 1995, pp. 152-161.
2. Angeline P. J. Two self-adaptive crossover operators for genetic programming. In: Angeline P., Kinnear K. (Eds.) Advances in Genetic Programming II. Cambridge, MA: MIT Press, 1996, pp. 89-110.
3. Haupt R. L., Haupt S. E. Practical Genetic Algorithms. Hoboken, NJ: John Wiley & Sons, 2004.
4. Akhtar F., Hahne C. RapidMiner 5: Operator Reference. Dortmund, 2012.
5. Burkhardt F., Paeschke A., Rolfes M., Sendlmeier W. F., Weiss B. A database of German emotional speech. In: Interspeech, 2005.
© Ganzha Y. S., Pen T. V., 2021