UNIVERSUM: TECHNICAL SCIENCES
No. 3 (120), March 2024
DOI - 10.32743/UniTech.2024.120.3.17028
DETERMINATION OF INFORMATIVE FEATURES USING THE METHOD OF DIVISION INTO INTERVALS BASED ON THE COMPACTITY HYPOTHESIS
Fayzulla Shodiev
Associate professor, Karshi State University, Uzbekistan, Karshi. E-mail: fayzulloshyu@gmail.com
Munisa Davronova
Student of Applied Mathematics, Karshi State University, Uzbekistan, Karshi
ABSTRACT
The sample contains data on 295 soft wheat varieties (objects) and the values of their features, obtained from the experimental fields of the Southern Agricultural Research Institute.
Using the method of partitioning feature values into intervals based on the compactness hypothesis, the optimal interval boundaries were found. On this basis, the weights of the features in the sample were calculated and the informative features were determined. In addition, the feature values were smoothed and latent features were constructed from them.
Keywords: sample, feature, informative feature, latent feature, feature weight, quantitative feature, division into intervals, stochastic methods, deterministic methods, standardization.
Introduction. This article solves the problem of partitioning a sample of data on wheat varieties into optimal intervals, using the interval-partitioning method based on the compactness hypothesis. Drought-resistant varieties are identified by calculating the weights of features (parameters) from the optimal intervals. One of the main goals of the research is to develop a new approach to determining feature weights in the selection of wheat varieties, using the method of division into compactness intervals.
Adoption of this approach will help bring about positive changes in the seed industry.
Feature weights are used for the following purposes:
• to calculate the proximity measure between objects;
• to select and sort informative features;
• in the search for patterns to model the intuitive decision-making process;
• in order to reduce the space of features in the calculation of generalized values (latent features) [1].
Bibliographic description: Shodiev F.Yu., Davronova M.I. DETERMINATION OF INFORMATIVE FEATURES USING THE METHOD OF DIVISION INTO INTERVALS BASED ON THE COMPACTITY HYPOTHESIS // Universum: технические науки: electronic scientific journal. 2024. 3(120). URL: https://7universum.com/ru/tech/archive/item/17028
Materials and methods. Weighting methods are aimed at solving supervised and unsupervised learning problems. It is known that there is no universal classification method. Therefore, constrained and unconstrained optimization algorithms are used in the calculation process. It should be noted that there is no strict distinction in content between the terms "feature weight" and "feature contribution". The criteria used to calculate the weight and contribution of features are based on verifying the truth of the compactness hypothesis [2].
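Weights of quantitative features. Let the values of a quantitative feature in the descriptions of the objects be sorted into the sequence

$$r_{j_1}, r_{j_2}, \ldots, r_{j_m} \qquad (1)$$

and let $u_1^1, \ldots, u_l^1, \ldots, u_1^l, \ldots, u_l^l$ be a set of integers in which $u_i^p$ is the number of values of the feature, in the descriptions of objects of class $K_i$, whose sequence numbers in (1) lie in the range from $a_{p-1}+1$ to $a_p$. Here $m$ is the number of objects and $l$ is the number of classes and of intervals. All values of the quantitative feature whose sequence numbers in (1) lie between $a_{p-1}+1$ and $a_p$ are treated as equivalent to one gradation of a nominal measurement scale, and the ranges are chosen according to the following criterion:

$$\left(\frac{\sum_{p=1}^{l}\sum_{i=1}^{l} \left(u_i^p - 1\right) u_i^p}{\sum_{i=1}^{l} |K_i|\left(|K_i| - 1\right)}\right)\left(\frac{\sum_{p=1}^{l}\sum_{i=1}^{l} u_i^p\left(m - |K_i| - \sum_{j=1}^{l} u_j^p + u_i^p\right)}{\sum_{i=1}^{l} |K_i|\left(m - |K_i|\right)}\right) \to \max_{\{A\}} \qquad (2)$$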
The maximum value of this criterion, which lies in the range [0;1], is taken as the weight w of the quantitative feature [3].
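As a minimal sketch of how criterion (2) could be evaluated for one fixed choice of intervals, the function below takes a table of counts in which `counts[i][p]` plays the role of $u_i^p$. The function name and the sample tables are illustrative assumptions, not the authors' implementation. The search over the boundaries {A} is not shown here.

```python
def interval_criterion(counts):
    """Evaluate criterion (2) for one fixed partition into intervals.

    counts[i][p] stands for u_i^p: the number of objects of class i whose
    feature value falls in interval p. Assumes at least two classes with at
    least two objects each, so that neither denominator is zero.
    """
    class_sizes = [sum(row) for row in counts]            # |K_i|
    m = sum(class_sizes)                                  # number of objects
    n_intervals = len(counts[0])
    interval_sizes = [sum(row[p] for row in counts) for p in range(n_intervals)]

    # First factor: same-class objects that share an interval.
    within = sum(u * (u - 1) for row in counts for u in row)
    within_max = sum(k * (k - 1) for k in class_sizes)

    # Second factor: objects of different classes that lie in different intervals.
    between = sum(
        counts[i][p] * (m - class_sizes[i] - interval_sizes[p] + counts[i][p])
        for i in range(len(counts))
        for p in range(n_intervals)
    )
    between_max = sum(k * (m - k) for k in class_sizes)

    return (within / within_max) * (between / between_max)


# Hypothetical count tables: rows are classes, columns are intervals.
perfect = interval_criterion([[3, 0], [0, 4]])   # each interval holds one class
mixed = interval_criterion([[2, 1], [1, 3]])     # classes overlap in both intervals
```

With these hypothetical tables, the perfectly separated case gives the maximum value 1, while the overlapping case gives a value strictly between 0 and 1.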
Dividing the values of quantitative features into intervals is widely used in building models aimed at extracting new knowledge (hidden patterns) from databases in poorly formalized subject areas. Stochastic and deterministic methods are used for dividing into intervals.
Stochastic methods are usually used in the initial analysis of data. From the point of view of division into intervals, measurements on quantitative scales fall into two cases:
• the objects of the sample are not divided into classes;
• the objects of the sample are divided into classes.
Traditional methods for the first case include histograms and decile and percentile distributions. Here the set of values of the feature under consideration, $X = \{x_1, \ldots, x_m\}$, is divided into $k$ intervals of length

$$h = \frac{\max_{x_i \in X} x_i - \min_{x_i \in X} x_i}{k}.$$

For decile and percentile distributions, the number of intervals is k = 10 and k = 100, respectively (that is, 9 and 99 interior boundaries).
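The equal-length partition described above can be sketched as follows. This is an illustrative example: the function name and the sample values are assumptions, and the intervals here are closed on the left, with the maximum value assigned to the last interval.

```python
def equal_width_bins(values, k):
    """Split the range of `values` into k intervals of equal length h."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are equal, so the range has zero length")
    h = (hi - lo) / k                        # interval length h = (max - min) / k
    edges = [lo + j * h for j in range(k + 1)]
    edges[-1] = hi                           # guard against floating-point drift
    # Interval index of every value; the maximum is placed in the last interval.
    indices = [min(int((x - lo) / h), k - 1) for x in values]
    return h, edges, indices


# Hypothetical feature values split into k = 5 intervals.
h, edges, indices = equal_width_bins([2.0, 4.0, 5.0, 9.0, 12.0], k=5)
```

For these values the interval length is 2.0 and the boundaries are 2, 4, 6, 8, 10 and 12.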
When the objects are divided into classes, the partition can be carried out by the method developed by V. Vapnik, which is based on the distribution law and the number of intervals. This method is heuristic: when dividing into intervals, it takes into account an entropy measure of the objects' membership in one class or another [4].
Two methods are known for partitioning quantitative features into non-intersecting intervals based on deterministic criteria [5]. The algorithms of these methods are invariant to measurement scales and are used for the following purposes:
• searching for latent features in a database when modeling the intuitive decision-making process;
• ensuring that the information lost when nominal features are formed from quantitative features is minimal;
• selecting informative sets from features of different types.
Interpretation of criteria. Let a set of admissible objects $E_0 = \{S_1, \ldots, S_m\}$ be given, divided into two disjoint classes $K_1$ and $K_2$. Each object is described by $n$ features of different types $X(n) = (x_1, \ldots, x_n)$, of which $\xi$ ($\xi > 0$) are measured on a quantitative scale and the remaining $n - \xi$ on a nominal scale. Let an operator map the features $X(n)$ to a set of quantitative features $Y(\mu) = (y_1, \ldots, y_\mu)$, whose elements may include latent features in addition to the $\xi$ quantitative features taken from $X(n)$. Examples of latent features are the combinations $x_i \cdot x_j$, $x_i / x_j$ and $x_j / x_i$, as well as generalized indicators derived from quantitative and nominal features [6].
Two criteria are available for dividing the values of the features from $Y(\mu)$, taken over the objects of the sample $E_0$, into non-intersecting intervals. The first criterion is based on the condition that the number of classes and the number of intervals are equal. In the case considered here, this number is 2.
For each feature $y_j \in Y(\mu)$, the criterion is applied as follows. The ordered set of values of the feature is divided into two intervals $[c_0; c_1]$ and $(c_1; c_2]$, where $c_0 = \min_{S_v \in E_0} y_{vj}$ and $c_2 = \max_{S_v \in E_0} y_{vj}$, with $S_v = (y_{v1}, \ldots, y_{v\mu})$. The calculation of the interval boundary $c_1$ is based on the hypothesis that each interval contains feature values of objects from only one class, $K_t$ or $K_{3-t}$ ($t = 1, 2$) [7].
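The pairwise combinations named above as examples of latent features (products and ratios of two features) can be generated as in the following sketch. The function name, the feature names and the numbers are hypothetical, and skipping ratios with a zero denominator is an assumption made here for safety, not something the source specifies.

```python
from itertools import combinations


def pairwise_latent_features(columns):
    """Build candidate latent features x_i*x_j, x_i/x_j and x_j/x_i.

    `columns` maps a feature name to its list of values, one per object.
    A ratio is skipped when its denominator column contains a zero.
    """
    latent = {}
    for (name_i, xi), (name_j, xj) in combinations(columns.items(), 2):
        latent[f"({name_i}*{name_j})"] = [a * b for a, b in zip(xi, xj)]
        if all(b != 0 for b in xj):
            latent[f"({name_i}/{name_j})"] = [a / b for a, b in zip(xi, xj)]
        if all(a != 0 for a in xi):
            latent[f"({name_j}/{name_i})"] = [b / a for a, b in zip(xi, xj)]
    return latent


# Hypothetical two-object sample with two quantitative features.
sample = {"spike_length": [8.0, 10.0], "grain_moisture": [2.0, 4.0]}
latent = pairwise_latent_features(sample)
```

Applying the same function again to the resulting columns would produce combinations of combinations, which is one way to read the nested feature names that appear in the results.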
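Suppose that $u_1^1, u_1^2$ ($u_2^1, u_2^2$) are the numbers of values of the feature $y_j \in Y(\mu)$ of objects belonging to class $K_1$ ($K_2$) that fall in the intervals $[c_0; c_1]$ and $(c_1; c_2]$, respectively.
Let $A = (a_0, a_1, a_2)$, with $a_0 = 1$ and $a_2 = m$, where $a_1$ is the sequence number, in the sequence of values of the feature $y_j \in Y(\mu)$ over the objects of the sample $E_0$ sorted in ascending order, of the element that defines the interval boundary $c_1 = r_{j_{a_1}}$, and let $m_t = |K_t \cap E_0|$ ($t = 1, 2$).
The following criterion can be used to calculate the optimal value of the interval boundary $c_1$, and its value can serve as an indicator of the compactness of the quantitative feature when the objects of the set $E_0$ are divided into classes:

$$\left(\frac{\sum_{i=1}^{2}\left[u_i^1\left(m_{3-i} - u_{3-i}^1\right) + u_i^2\left(m_{3-i} - u_{3-i}^2\right)\right]}{2 m_1 m_2}\right)\left(\frac{\sum_{p=1}^{2}\sum_{i=1}^{2} u_i^p\left(u_i^p - 1\right)}{m_1\left(m_1 - 1\right) + m_2\left(m_2 - 1\right)}\right) \to \max_{\{A\}} \qquad (3)$$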
If each of the two intervals contains values of the feature $y_j \in Y(\mu)$ of objects from only one of the classes $K_1$, $K_2$, then the value of criterion (3) is equal to 1. If $r_{j_1} = r_{j_2} = \ldots = r_{j_{m-1}} = r_{j_m}$, then the value of criterion (3) is equal to 0. In all other cases, the value of the criterion lies in the interval (0;1) [8].
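A minimal sketch of the search for the boundary $c_1$ under criterion (3) is given below. It assumes a brute-force scan over all distinct feature values as candidate boundaries. The function name and the toy data are illustrative, and the authors' actual search procedure may differ.

```python
def best_boundary(values, labels):
    """Return (weight, c1) maximizing criterion (3) for a two-class sample.

    `labels` holds 1 or 2 for each object. Assumes both classes contain at
    least two objects. Candidate boundaries are the distinct feature values
    except the largest, so that both [c0; c1] and (c1; c2] are non-empty.
    """
    m1, m2 = labels.count(1), labels.count(2)
    pairs = sorted(zip(values, labels))
    best = (0.0, None)
    left = {1: 0, 2: 0}                        # u_1^1 and u_2^1, counts in [c0; c1]
    for idx, (value, label) in enumerate(pairs):
        left[label] += 1
        # A boundary is only valid between two different feature values.
        if idx + 1 == len(pairs) or pairs[idx + 1][0] == value:
            continue
        u11, u21 = left[1], left[2]
        u12, u22 = m1 - u11, m2 - u21          # counts in (c1; c2]
        between = (u11 * (m2 - u21) + u12 * (m2 - u22)
                   + u21 * (m1 - u11) + u22 * (m1 - u12))
        within = sum(u * (u - 1) for u in (u11, u12, u21, u22))
        score = (between / (2 * m1 * m2)) * (
            within / (m1 * (m1 - 1) + m2 * (m2 - 1)))
        if score > best[0]:
            best = (score, value)
    return best


# Hypothetical data: the classes are perfectly separated at the value 3.0.
separable = best_boundary([1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0],
                          [1, 1, 1, 2, 2, 2, 2])
# Hypothetical data: the classes alternate, so no boundary separates them.
overlapping = best_boundary([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                            [1, 2, 1, 2, 1, 2])
# All values equal: no valid boundary exists and the weight stays 0.
constant = best_boundary([5.0, 5.0, 5.0, 5.0], [1, 1, 2, 2])
```

The three calls mirror the three cases described in the text: a weight of 1 for separable classes, a weight of 0 when all values coincide, and a value strictly between them otherwise.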
Results and discussion. The sample includes 295 soft wheat varieties and the values of their (quantitative) features, obtained from SARI (Southern Agricultural Research Institute). The objects in the sample are divided into two classes according to the recommendations of experts: drought-resistant varieties (class 1, 15 objects) and varieties not resistant to drought (class 2, 280 objects).
Table 1 below presents the interval boundaries and feature weights obtained by smoothing the feature values of the wheat varieties and dividing them into compactness intervals based on criteria (2) and (3).
Calculating the feature weights in the sample directly (without smoothing) leads to substantial losses in the quality of the results, because the numerical values in different feature columns differ sharply in scale. For example, the values of the "The nature of the grain" feature lie in the range [669.46; 838.1].
To improve the quality of the results, we smooth each quantitative feature column by standardization.
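The text does not specify which form of standardization was applied. The sketch below assumes the common z-score transform (subtract the column mean, divide by the standard deviation), which is consistent with the small positive and negative boundary values reported in the results. The middle sample value is hypothetical; the two endpoints are the range limits quoted above.

```python
from statistics import mean, pstdev


def standardize(column):
    """Return z-scores: (x - mean) / standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)             # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in column]   # a constant column has no spread
    return [(x - mu) / sigma for x in column]


# Endpoints taken from the quoted range; 750.0 is a made-up intermediate value.
z = standardize([669.46, 750.0, 838.1])
```

After this transform every column has mean 0 and unit standard deviation, so features measured in very different units become comparable.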
Table 1.
Interval boundaries and feature weights after smoothing of the sample
№ | Features | C0 | C1 | C2 | Feature weight
1 | 1000 grain weight | -2.5615 | 1.1405 | 2.3561 | 0.643057
2 | Productivity | -3.6627 | 1.1193 | 4.6424 | 0.600568
3 | The nature of the grain | -4.6941 | 0.52603 | 1.7866 | 0.405435
4 | Plant height | -3.042 | 0.43686 | 2.9022 | 0.334819
5 | Protein content | -2.1688 | 0.51598 | 2.3836 | 0.312253
6 | Spike length | -2.2199 | -0.93573 | 3.6874 | 0.298118
7 | Amount of gluten | -2.2876 | 0.97734 | 2.6098 | 0.282956
8 | The length of the last syllable | -3.0479 | 0.36968 | 3.86 | 0.280146
9 | The number of spikes | -2.8483 | 0.47097 | 2.8419 | 0.271753
10 | Vegetation period | -2.9167 | 0.20305 | 2.0749 | 0.267084
11 | IDK | -6.138 | -0.22963 | 1.2866 | 0.264227
12 | Grain vitreousness | -1.2678 | -0.09024 | 3.3513 | 0.253981
13 | Grain moisture | -1.6489 | -0.4849 | 2.8778 | 0.251748
Features with a weight of 0.4 and above in Table 1 (1000 grain weight, Productivity and The nature of the grain) can be taken as informative features, because they contribute the most to the classification of drought-resistant wheat varieties.
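The selection rule stated above can be written as a short filter. The weights are copied from Table 1, while the variable names are illustrative.

```python
# Feature weights as reported in Table 1.
weights = {
    "1000 grain weight": 0.643057,
    "Productivity": 0.600568,
    "The nature of the grain": 0.405435,
    "Plant height": 0.334819,
    "Protein content": 0.312253,
    "Spike length": 0.298118,
    "Amount of gluten": 0.282956,
    "The length of the last syllable": 0.280146,
    "The number of spikes": 0.271753,
    "Vegetation period": 0.267084,
    "IDK": 0.264227,
    "Grain vitreousness": 0.253981,
    "Grain moisture": 0.251748,
}

THRESHOLD = 0.4  # cut-off used in the text for informative features

# Keep features at or above the threshold, ordered by decreasing weight.
informative = [
    name
    for name, weight in sorted(weights.items(), key=lambda item: -item[1])
    if weight >= THRESHOLD
]
```

With the Table 1 weights, the filter keeps three of the thirteen features.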
When evaluating drought-resistant varieties, attention should be paid to combinations of features rather than to individual features. For this purpose, better results can be achieved if the weights are calculated for latent features built from combinations of the original features.
Table 2.
Weights of the latent features of the sample
№ | Features | Feature weight
1 | (Productivity*1000 grain weight) | 0.693828
2 | ((The nature of the grain*Protein content)*(Spike length/Grain moisture)) | 0.462482
3 | ((Protein amount*IDK)*(Spike length*Number of spikes, units)) | 0.459401
4 | ((Number of spikes, grain/Grain moisture)*(Protein amount*IDK)) | 0.427136
5 | ((Number of spikes/Grain moisture)*(Spike length*Number of spikes)) | 0.417391
6 | ((Spike length/Grain moisture)*(Spike length*Number of spikes)) | 0.410489
7 | ((Number of spikes/Grain moisture)*(Vegetation period/Grain moisture)) | 0.397817
8 | ((Protein content/Grain moisture)*(Spike length*Protein content)) | 0.393862
9 | ((Spike length/Grain moisture)*(The length of the last syllable*Protein content)) | 0.391693
10 | ((Number of spikes*Protein content)*(IDK/Grain vitreousness)) | 0.391693
11 | ((Number of spikelets/Grain vitreousness)*(Protein content*IDK)) | 0.391693
12 | ((Vegetation period*Spikes number)*(Spike length*Protein amount)) | 0.386234
13 | ((Spike length/Grain moisture)*(Number of spikes*Protein content)) | 0.384879
14 | ((Number of spikes/Grain moisture)*(Spike length*Protein content)) | 0.384879
15 | ((Protein content/Grain moisture)*(Spike length*Number of spikes)) | 0.384879
16 | ((Protein content*IDK)*(Vegetation period/Grain moisture)) | 0.374164
17 | ((The length of the last syllable*Protein amount)*(Spike length/Gluten amount)) | 0.372762
18 | ((Vegetation period/Grain moisture)*(Spike length*Number of spikes)) | 0.372694
19 | ((Number of spikes/Grain vitreousness)*(Spike length*Number of spikes)) | 0.371363
20 | ((Number of spikes/Grain moisture)*(Number of spikes*Protein content)) | 0.370322
21 | (Plant height*Protein content) | 0.336662
22 | (Number of spikes*Amount of gluten) | 0.312379
Table 2 above shows the weights of the newly formed latent features (features based on hidden patterns) after the features were latentized twice. This means that new informative features have been formed, which will make a more significant contribution to the assessment of varieties.
Conclusion. In this article, the weights of the features in the sample were calculated on the basis of dividing the feature values of wheat varieties into compactness intervals.
The results show that the informative features of drought-resistant wheat varieties largely coincide with the features recognized by experts [9].
The identified informative features not only confirm the opinion of experts in the field, but also point to features that the experts have overlooked and that deserve attention.
References:
1. Madraximov S.F., Saidov D.Y. Stability of the objects of classes and grouping the features // Problems of Computational and Applied Mathematics. - 2016. - No. 3. - pp. 50-54.
2. Shodiyev F. Intellectual system based on the determination of hidden legality // Central Asian Journal of Education and Computer Sciences (CAJECS). - 2022. - Vol. 1. - No. 5. - pp. 11-16.
3. Ignatyev N.A., Madrakhimov S.F., Saidov D.Y. Stability of object classes and selection of the latent features // International Journal of Engineering Technology and Sciences. - 2017. - Vol. 4. - No. 1. - pp. 61-71.
4. Vapnik V.N. Algorithms and programs for dependency recovery. - Moscow: Nauka, 1984. - 816 p. (in Russian).
5. Zguralskaya E.N. Algorithm for selecting optimal boundaries of the intervals for partitioning feature values in classification // Proceedings of the Samara Scientific Center of the Russian Academy of Sciences. - 2012. - Vol. 14. - No. 4 (3). - pp. 826-829. (in Russian).
6. Shodiev F.Yu., Eshboev E.A., Egamberdiev E.Kh. Use of generalized estimates for predicting the resistance of wheat varieties to diseases // Asian Journal of Multidimensional Research. - 2021. - Vol. 10. - No. 4. - pp. 602-610.
7. Ignatyev N.A. Computing generalized indicators and data mining // Avtomatika i Telemekhanika. - 2011. - No. 5. - pp. 183-190. (in Russian).
8. Shodiev F., Eshboev E., Suyarova A. Predicting disease resistance of high-quality wheat varieties using the method of calculating generalized estimates // E3S Web of Conferences. - EDP Sciences, 2023. - Vol. 401. - p. 04063.
9. Sharma S.N., Sain R.S., Sharma R.K. Genetics of spike length in durum wheat // Euphytica. - 2003. - Vol. 130. - pp. 155-161.