DEVELOPMENT OF A MODULE FOR EVALUATING THE ACTIVITY OF THE MAKHALLA CHAIRPERSONS BASED ON THE EXPERTS' ASSESSMENT WITH THE ALGORITHMS
1Ibragimov Muhiddin Fakhraddin ugli, 2Matyakubov Islombek Ikrom ugli, 3Raximov Asadbek Dilshod ugli, 4Musayev Muhammadjon Xursandbek ugli
1Assistant of the department of "Software engineering", Urgench branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi
2,3,4 Students of the direction of "Software engineering", Urgench branch of Tashkent University of Information Technologies named after Muhammad al-Khwarizmi
https://doi.org/10.5281/zenodo.7915928
Abstract. This article describes the application of machine learning algorithms in developing a module that evaluates the activity of mahallas based on their employment data. Within the classification task of machine learning, models are built using the Linear Regression, Polynomial Regression and Neural Network algorithms. These algorithms are used to automate the assessment of mahallas. The experiments were carried out on the KNIME Analytics Platform. The results obtained with the algorithms are compared and conclusions are presented.
Keywords: expert, model, evaluation, linear regression learner, polynomial regression learner, RProp MLP Learner, X-Partitioner, X-Aggregator, error, object.
Introduction
Mahalla data contain a large amount of information that characterizes the living conditions and the material and spiritual condition of the population. The standard of living of the population means the level at which people are provided with the necessary material and spiritual benefits and the degree to which their consumption and needs are satisfied.[1]
The standard of living of the population can be regarded as the level at which its material, spiritual and social needs are met. One of the important indicators in this regard is the employment of the population of the mahalla. Incomplete employment data are common when data are collected in cross-section, but discarding such records during pre-processing results in a loss of informative data.[2,3] During model building, this increases the errors found when verifying the adequacy of the model.
Experts face a number of difficulties in assessing the activity of mahalla chairpersons, especially on the basis of the existing employment indicators, because the number of mahallas is large. To automate the evaluation process, a model is built that evaluates mahallas by the employment indicator, which also allows mahalla chairpersons to compare their performance with that of other mahallas. Machine learning classification methods and algorithms applied to the available data help to solve this problem.[4,5]
Problem statement. When solving a classification problem, methods such as the feed-forward neural network, logistic regression, Naive Bayes, support vector machines, random forest and nearest neighbor give good results. Many classification algorithms have been developed based on these methods, and each of them works well on different types of datasets. It is therefore important to determine which classification method is effective for the chosen data set. As an efficiency indicator, it is necessary to consider not only the classification error rate but also the time taken to execute the algorithm. Accordingly, the main issue in this article is to choose a training method that minimizes both the classification error and the time spent on training for the given data set.
To assess a model reliably, it is advisable to use a more complicated but more reliable testing method. One such method is K-Fold Cross-Validation. Its main feature is that every object participates in both the training process and the testing process (see Figure 1).
Figure 1. Splitting the training sample into subsets using the K-Fold Cross-Validation testing method.
In the K-Fold Cross-Validation testing method, the training sample is divided into k subsets. The model is then trained and tested k times. In the i-th run, the i-th subset is used only for testing, and the remaining subsets are used for training. The error is calculated for each test, and the average error is determined by the following formula.
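$$E = \frac{1}{K}\sum_{i=1}^{K} E_i$$
where $E_i$ is the error obtained in the $i$-th test and $K$ is the number of subsets.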
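The cross-validation procedure described above can be sketched in a few lines of Python. This is only an illustration, not the KNIME implementation: the helper names (`k_fold_indices`, `cross_validate`), the use of the mean squared error as the per-fold error, and the toy "predict the training mean" model are all assumptions made for the example.

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(xs, ys, k, fit, predict):
    """Train and test k times; return the per-fold errors and their average."""
    fold_errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        # The i-th fold is used only for testing, the rest only for training.
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Assumption: mean squared error is used as the error of one test.
        squared = [(predict(model, xs[i]) - ys[i]) ** 2 for i in test_idx]
        fold_errors.append(mean(squared))
    return fold_errors, mean(fold_errors)

# Hypothetical toy model: always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return mean(train_y)

def predict_mean(model, x):
    return model

xs = list(range(6))
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fold_errors, avg_error = cross_validate(xs, ys, 3, fit_mean, predict_mean)
print(fold_errors, avg_error)
```

Any model can be plugged in through the `fit` and `predict` arguments, so the same loop could be reused to compare several learners on identical folds.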
Using data on the employment of residents of mahallas in the Khorezm region (Table 1), a data set was built in which each mahalla is treated as an object.[10,11]
Table 1.
Demographic indicators and employment indicators in the areas of the regions and expert assessment

| Makhallas | Population | The number of families | Number of pensioners | The number of people under the age of 18 | Number of people engaged in child care | Number of permanently employed | People engaged in business | Number of unemployed | Expert assessment |
|---|---|---|---|---|---|---|---|---|---|
| Uslar | 2371 | 649 | 194 | 726 | 182 | 486 | 149 | 189 | 86 |
| Yuksalish | 3314 | 667 | 314 | 981 | 248 | 617 | 164 | 208 | 71 |
| Namuna | 2857 | 797 | 356 | 1208 | 245 | 64 | 129 | 138 | 75 |
| Gulzor | 4012 | 915 | 283 | 1374 | 150 | 423 | 917 | 270 | 72 |
| Bogzor | 3723 | 745 | 389 | 1344 | 205 | 0 | 144 | 243 | 76 |
| Uzbekistan | 4861 | 1203 | 124 | 1754 | 386 | 811 | 202 | 215 | 68 |
| Navbakhor | 6347 | 1945 | 389 | 2443 | 432 | 281 | 328 | 234 | 65 |
The regression problem, which is a part of the classification problem, is solved for the generated training sample.
Regression is one of the methods of intelligent data analysis and is a set of statistical processes for evaluating the relationship between variables related to an object or process.
Linear regression analysis. Regression analysis is widely used mainly for prediction and forecasting, and it is now closely tied to the field of machine learning [8]. In linear regression, the relationship between an independent variable and the dependent variable is represented by a line, called the regression line, which is described by the linear equation y = a * x + b. If there is more than one independent variable, a multiple linear regression model of the following form is used:
$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$
• $y$ is the response, that is, the result predicted by the model;
• $b_0$ is the intercept, that is, the value of $y$ when all $x_i$ are 0;
• $b_1$ is the coefficient of the first feature $x_1$;
• $b_n$ is the coefficient of the $n$-th feature $x_n$;
• $x_1, x_2, \dots, x_n$ are the independent variables of the model.
In essence, the equation describes the relationship between a continuous dependent variable (y) and two or more independent variables (x1, x2, x3, ...). In polynomial regression, the relationship between the independent variables and the dependent variable is represented by a polynomial rather than by a straight line.
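As a rough illustration of the two regression models described above, the following Python sketch fits a multiple linear regression by ordinary least squares and then reuses the same fit for a second-degree polynomial by adding squared features. It is a minimal sketch under stated assumptions: the function names are hypothetical, the normal equations are solved with plain Gaussian elimination, the polynomial expansion contains no cross terms, and the toy data are invented for the example. It does not reproduce the internals of the KNIME learners.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_linear(rows, targets):
    """Least-squares fit of y = b0 + b1*x1 + ... + bn*xn (normal equations)."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 gives the intercept b0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def add_squares(rows):
    """Degree-2 expansion without cross terms: (x1..xn) -> (x1..xn, x1^2..xn^2)."""
    return [list(row) + [v * v for v in row] for row in rows]

def predict(coeffs, row):
    """Evaluate b0 + sum(bi * xi) for one object."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], row))

# Invented toy data generated from y = 1 + 2*x1 - 3*x2.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
targets = [1, 3, -2, 0, 2]
linear_coeffs = fit_linear(rows, targets)

# Invented toy data generated from y = 2 + x^2, fitted as a polynomial.
poly_rows = add_squares([[0], [1], [2], [3]])
poly_coeffs = fit_linear(poly_rows, [2, 3, 6, 11])
print(linear_coeffs, poly_coeffs)
```

Treating the polynomial model as a linear model over expanded features keeps a single fitting routine for both cases; only the feature preparation differs.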
File Reader: The data are loaded through this component. The loaded data look as follows.
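| Row ID | Col0 | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 | Col7 | Col8 | Col9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Row0 | 2371 | 649 | 194 | 726 | 182 | 486 | 149 | 312 | 189 | 86 |
| Row1 | 3314 | 667 | 314 | 981 | 248 | 617 | 164 | 421 | 208 | 71 |
| Row2 | 2857 | 797 | 356 | 1208 | 245 | 64 | 129 | 83 | 138 | 75 |
| Row3 | 4012 | 915 | 283 | 1374 | 150 | 423 | 917 | 489 | 270 | 72 |
| Row4 | 3723 | 745 | 389 | 1344 | 205 | 0 | 144 | 147 | 243 | 76 |
| Row5 | 4661 | 1203 | 124 | 1754 | 386 | 811 | 202 | 316 | 215 | 68 |
| Row6 | 6347 | 1945 | 389 | 2443 | 432 | 281 | 328 | 285 | 234 | 65 |
| Row7 | 4114 | 1039 | 397 | 1402 | 29 | 786 | 365 | 342 | 136 | 73 |
| Row8 | 3386 | 935 | 283 | 1010 | 115 | 481 | 414 | 280 | 217 | 86 |
| Row9 | 3305 | 925 | 334 | 926 | 23 | 781 | 298 | 345 | 145 | 86 |
| Row10 | 3468 | 874 | 523 | 1285 | 29 | 102 | 124 | 275 | 255 | 87 |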
Figure 3. Loading the data set built from the mahalla data.
Normalizer: Through this component, the data are normalized: min-max normalization maps every column to the [0..1] range. This step is necessary to give the same weight to all features.
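Min-max normalization as described for the Normalizer component can be sketched as follows. The function name is hypothetical, and the sample uses only the first three rows and first three columns of the loaded data, so the resulting values differ from those obtained on the full data set.

```python
def min_max_normalize(rows):
    """Scale every column to [0, 1] using (v - min) / (max - min)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column has max == min; map it to 0.0 to avoid dividing by zero.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return normalized

# Col0..Col2 of Row0..Row2 from the loaded data (illustrative subset only).
sample = [[2371, 649, 194], [3314, 667, 314], [2857, 797, 356]]
scaled = min_max_normalize(sample)
for row in scaled:
    print([round(v, 3) for v in row])
```

After scaling, a column measured in thousands (population) and a column measured in hundreds (pensioners) contribute on the same scale to the fitted coefficients.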
Figure 4. View of the data set after normalization.
X-Partitioner and X-Aggregator: These components form the training and testing samples for K-Fold Cross-Validation; they divide the training sample into k parts and organize the training process k times.
The polynomial regression model obtained:
$$
\begin{aligned}
y ={}& -0.5823\,x_1 - 0.347\,x_2 - 0.2316\,x_3 - 0.4682\,x_4 - 0.4759\,x_5 \;(?)\; 0.3101\,x_6 + 0.00048\,x_7 - 0.0909\,x_8 \\
& + 0.1683\,x_1^2 - 0.3748\,x_2^2 + 0.198\,x_3^2 + 0.2268\,x_4^2 + 0.287\,x_5^2 + 0.2266\,x_6^2 - 0.161\,x_7^2 + 0.059\,x_8^2
\end{aligned}
$$
R-Squared: 0.7163 Adjusted R-Squared: 0.6964
Linear Regression Learner: This component builds the regression model, whose adequacy is then checked by passing the test sample to the Regression Predictor. After testing, the Line Chart component is used to visualize how much the results differ from the actual values, and the error is as follows.[12]
$$y = -0.2315\,x_1 - 0.0889\,x_2 - 0.2799\,x_3 - 0.3407\,x_4 - 0.2359\,x_5 - 0.0879\,x_6 - 0.173\,x_7 - 0.0731\,x_8$$
R-Squared: 0.6799 Adjusted R-Squared: 0.669
| Row ID | Variable | Coeff. | Std. Err. | t-value | P>\|t\| |
|---|---|---|---|---|---|
| Row10 | Intercept | 1.047 | 0.027 | 38.233 | 0 |
| Row3 | Col2 | -0.28 | 0.06 | -4.645 | 0 |
| Row5 | Col4 | -0.236 | 0.053 | -4.464 | 0 |
| Row4 | Col3 | -0.341 | 0.095 | -3.579 | 0 |
| Row7 | Col6 | -0.173 | 0.067 | -2.573 | 0.011 |
| Row1 | Col0 | -0.232 | 0.141 | -1.636 | 0.103 |
| Row6 | Col5 | -0.088 | 0.056 | -1.58 | 0.115 |
| Row8 | Col7 | -0.073 | 0.083 | -0.879 | 0.38 |
| Row2 | Col1 | -0.089 | 0.112 | -0.792 | 0.429 |
| Row9 | Col8 | -0.058 | 0.077 | -0.754 | 0.452 |
Figure 5. The error obtained using the Linear Regression Learner method.
Polynomial Regression Learner: With this component, we build a polynomial regression model and obtain the following nonlinear result.
R-Squared: 0.7163 Adjusted R-Squared: 0.6964
Figure 6. The model and the error obtained using the Polynomial Regression Learner method.
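The R-Squared and Adjusted R-Squared values reported for the linear and polynomial models follow standard definitions, which can be sketched as below. The function names and the four-point example are assumptions for illustration; the number of objects used in the experiments is not stated in the text, so the reported values are not recomputed here.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R^2 for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Invented example: four observed values and the model's predictions for them.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n_samples=4, n_predictors=1)
print(round(r2, 4), round(adj_r2, 4))
```

Because the adjusted value shrinks as predictors are added, it is the fairer figure when comparing the linear model with the polynomial model, which uses twice as many terms.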
RProp MLP Learner: With this component, we build the neural network regression model and obtain the following graph.
Figure 7. Initial and final values obtained with the RProp MLP Learner method.
Table 2.
Experts' assessment

| № | Linear Regression Learner | Polynomial Regression Learner | RProp MLP Learner |
|---|---|---|---|
| 1 | 0.024 | 0.021 | 0.009 |
| 2 | 0.019 | 0.014 | 0.008 |
| 3 | 0.023 | 0.016 | 0.024 |
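To compare the three methods with a single number, the errors in Table 2 can be averaged per method. The sketch below assumes that the three numbered rows are separate test runs of the same experiment, which the text does not state explicitly; the variable names are illustrative.

```python
# Errors copied from Table 2, one list per method (rows 1 to 3).
errors = {
    "Linear Regression Learner": [0.024, 0.019, 0.023],
    "Polynomial Regression Learner": [0.021, 0.014, 0.016],
    "RProp MLP Learner": [0.009, 0.008, 0.024],
}

# Average error per method, as in the cross-validation averaging formula.
mean_errors = {name: sum(vals) / len(vals) for name, vals in errors.items()}

# The method with the smallest average error.
best = min(mean_errors, key=mean_errors.get)

for name, value in mean_errors.items():
    print(f"{name}: {value:.4f}")
print("lowest average error:", best)
```

Under this assumption the neural network has the lowest average error, although it also has the largest single error (row 3), so the ranking depends on how the rows are aggregated.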
Conclusion. Machine learning models for evaluating the activity of mahalla chairpersons based on mahalla data were built using the linear regression, polynomial regression and multi-layer neural network methods. The model built with the neural network method was found to be more effective than the models built with the linear and polynomial regression methods.
REFERENCES
1. Development of algorithms and software products for personality recognition based on speech signal processing S Ismoilov, O Masharipov, M Ibragimov, AIP Conference Proceedings, 2022
2. Ibragimov Mukhiddin Fakhraddin ugli EURASIAN JOURNAL OF MATHEMATICAL THEORY AND COMPUTER SCIENCES Volume 2 Issue 14, December 2022 https://doi.org/10.5281/zenodo.7485196
3. Ijro Hokimiyati Organlarda Qaror Qabul Qilishning Intelektual Algaritmlarini Ishlab Chiqish Va Uni Tadbiq Qilish I.M. Fahraddin o'g'li - Komputer texnologiyalari, 2022
4. Modeling using polynomial regression algorithms of machine learning on mahalla data O.K.Xujaev M.F.Ibragimov
5. THE IMPORTANCE OF MONITORING IN THE MANAGEMENT OF SOCIO-ECONOMIC PROCESSES IN SELF-GOVERNMENT BODIES M.F.Ibragimov, O.K.Xujaev
6. К Вопросу Оценки Компетентности Подготовки Будущих Бакалавров «Программный Инжиниринг» В Слабо Формализованных Условиях Ф Юсупов, О Казаков, М Ибрагимов.
7. Tomas Loster. Determining the optimal number of clusters in cluster analysis. The 10th International Days of Statistics and Economics. Conference proceedings. September 8-10, 2016; Prague, Czech Republic. pp. 1078-1090.
8. X. Rahimboev, M. Ismoilov. "Creating a model for parametric assessment of the state of the control object and its components". Acta Turin Polytech. Univ. Tashkent, vol. 10, no. 2, pp. 19-33, 2020. https://uzjournals.edu.uz/actattpu/vol10/iss2/11.
9. Raximboyev XJ "Development of a decision-support algorithm for self-government bodies using machine learning" Scientific-technical journal, FerPI, 2020, V.24, №6. pp. 23-30. https://uzjournals.edu.uz/ferpi/vol24/iss6/4
10. Dubina, I. N. Fundamentals of mathematical modeling of socio-economic processes: textbook and workshop for undergraduate and graduate students / I. N. Dubina. - Moscow: Yurayt Publishing House, 2019. - 349 p.
11. A. A. Barseghyan, M. S. Kupriyanov, I. I. Xolod, M. D. Tess and S. I. Elizarov. Analysis of data and processes: textbook. - 3rd ed., revised and expanded. - St. Petersburg: BHV-Peterburg, 2009. - 512 p.: ill. + CD-ROM.
12. Prokopenko, N. Yu. Decision support systems: textbook / N. Yu. Prokopenko; Nizhny Novgorod State University of Architecture and Civil Engineering. - Nizhny Novgorod: Nizhny Novgorod State University of Architecture and Civil Engineering, 2017.
13. Kornikov V.V., Seregin I.A., Xovanov N.V. Bayesian model for processing non-numerical, inexact and incomplete information about weight coefficients // http://inftech.webservis.ru/it/conference/scm/2000/session3/kornikov.htm
14. https://www.knime.com/knime-analytics-platform
15. Sahami M. Learning limited dependence Bayesian classifiers // Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. - Portland, Ore, USA, 1996. - P. 335-338
16. Friedman N. Learning belief networks in the presence of missing values and hidden variables // Proceedings of the 14th International Conference on Machine Learning. - 1997. - P. 125-133.