DETERMINATION OF INFORMATIVE FEATURES USING THE METHOD OF DIVISION INTO INTERVALS BASED ON THE COMPACTITY HYPOTHESIS

Scientific article in the field of "Electrical engineering, electronic engineering, information technology"

CC BY




UNIVERSUM: ТЕХНИЧЕСКИЕ НАУКИ (Universum: Technical Sciences), No. 3 (120), March 2024

DOI: 10.32743/UniTech.2024.120.3.17028

DETERMINATION OF INFORMATIVE FEATURES USING THE METHOD OF DIVISION INTO INTERVALS BASED ON THE COMPACTITY HYPOTHESIS

Fayzulla Shodiev

Associate professor, Karshi State University, Uzbekistan, Karshi
E-mail: [email protected]

Munisa Davronova

Student of Applied Mathematics, Karshi State University, Uzbekistan, Karshi


ABSTRACT

The sample contains information about 295 soft wheat varieties (objects) and the values of their features, obtained from the experimental fields of the Southern Agricultural Research Institute.

Using the method of partitioning feature values into intervals based on the compactness hypothesis, the optimal interval boundaries were found. On this basis, the weights of the features in the sample were calculated and the informative features were determined. In addition, the feature values were smoothed and latent features were constructed from them.


Keywords: sample, feature, informative feature, latent feature, feature weight, quantitative feature, division into intervals, stochastic methods, deterministic methods, standardization.


Introduction. This article considers a sample of data on wheat varieties and solves the problem of partitioning feature values into optimal intervals using the interval-partitioning method based on the compactness hypothesis. Drought-resistant varieties are identified by calculating the weights of features (parameters) from the optimal intervals. One of the main goals of the research is to develop a new approach to determining feature weights in a sample of wheat varieties using the method of division into compactness intervals.

Adoption of this approach will help bring about positive changes in the seed industry.

Feature weights are used for the following purposes:

• to calculate the proximity measure between objects;

• to select and sort informative features;

• in the search for patterns to model the intuitive decision-making process;

• in order to reduce the space of features in the calculation of generalized values (latent features) [1].

Bibliographic description: Shodiev F.Yu., Davronova M.I. DETERMINATION OF INFORMATIVE FEATURES USING THE METHOD OF DIVISION INTO INTERVALS BASED ON THE COMPACTITY HYPOTHESIS // Universum: технические науки: электрон. научн. журн. 2024. 3(120). URL: https://7universum.com/ru/tech/archive/item/17028

Materials and methods. Methods for calculating feature weights are aimed at solving supervised and unsupervised learning problems. It is known that there is no universal classification method. Therefore, constrained and unconstrained optimization algorithms are used in the calculation process. It should be noted that there is no strict distinction in content between the terms "feature weight" and "feature contribution". The criteria used to calculate the weights and contributions of features are essentially based on testing the truth of the compactness hypothesis [2].

Weights of quantitative features. Let the values of a quantitative feature be sorted into the sequence

$$r_{j_1}, r_{j_2}, \ldots, r_{j_m} \qquad (1)$$

and let $u_1^1, \ldots, u_1^l, \ldots, u_l^1, \ldots, u_l^l$ be a set of integers, where $u_i^p$ is the number of values of the feature, in the descriptions of objects of class $K_i$, whose order numbers in (1) lie in the range from $a_{p-1}+1$ to $a_p$.

All values of the quantitative feature whose order numbers in (1) lie in the range from $a_{p-1}+1$ to $a_p$ are treated as equivalent on the nominal measurement scale, and the boundaries of the ranges are chosen according to the following criterion:

$$\left(\frac{\sum_{p=1}^{l}\sum_{i=1}^{l} u_i^p\,(u_i^p-1)}{\sum_{i=1}^{l}|K_i|\,(|K_i|-1)}\right)\left(\frac{\sum_{p=1}^{l}\sum_{i=1}^{l} u_i^p\left(m-|K_i|-\sum_{j=1}^{l}u_j^p+u_i^p\right)}{\sum_{i=1}^{l}|K_i|\,(m-|K_i|)}\right)\to\max_{\{A\}} \qquad (2)$$

The maximum value of this criterion lies in $[0;1]$ and is taken as the weight $w$ of the quantitative feature for the resulting set of intervals [3].
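As an illustration of criterion (2), the sketch below evaluates it for a given table of counts $u_i^p$. The function name `criterion_2` and the convention of returning 0 for degenerate samples are assumptions made for this example; the article does not describe an implementation.

```python
from typing import Sequence


def criterion_2(u: Sequence[Sequence[int]]) -> float:
    """Value of criterion (2) for a count table u, where u[i][p] is the number
    of feature values of class i that fall into interval p (hypothetical layout)."""
    class_sizes = [sum(row) for row in u]                     # |K_i|
    m = sum(class_sizes)                                      # total number of objects
    n_intervals = len(u[0])
    interval_sizes = [sum(row[p] for row in u) for p in range(n_intervals)]

    # First factor: pairs of same-class values that share an interval.
    intra_num = sum(x * (x - 1) for row in u for x in row)
    intra_den = sum(k * (k - 1) for k in class_sizes)

    # Second factor: pairs of different-class values that lie in different intervals.
    inter_num = sum(
        u[i][p] * (m - class_sizes[i] - interval_sizes[p] + u[i][p])
        for i in range(len(u))
        for p in range(n_intervals)
    )
    inter_den = sum(k * (m - k) for k in class_sizes)

    if intra_den == 0 or inter_den == 0:
        return 0.0  # assumption: a degenerate sample gets zero weight
    return (intra_num / intra_den) * (inter_num / inter_den)


if __name__ == "__main__":
    # Each class occupies its own interval.
    print(criterion_2([[3, 0], [0, 4]]))
    # Both classes fall into the same interval.
    print(criterion_2([[3, 0], [4, 0]]))
```

The count table is the only input, so the same function can be reused for any number of classes and intervals once the boundaries have been chosen.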

Dividing the range of values of quantitative features into intervals is widely used when building models aimed at extracting new knowledge (hidden patterns) from databases of poorly formalized subject areas. Stochastic and deterministic methods are used for dividing into intervals.

Stochastic methods are usually used in the initial analysis of data. From the point of view of division into intervals, two cases are distinguished for measurements on quantitative scales:

• the objects of the sample are not divided into classes;

• the objects of the sample are divided into classes.

Traditional methods for the first case include histograms and decile and percentile distributions. Here the range of the set of values $X = \{x_1, \ldots, x_m\}$ of the feature under consideration is divided into $k$ intervals of length

$$h = \frac{\max_{x_i \in X} x_i - \min_{x_i \in X} x_i}{k}.$$

For decile and percentile distributions, the number of intervals is $k = 10$ and $k = 100$, respectively.
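The equal-width partition described above can be sketched as follows. The helper names `equal_width_bounds` and `interval_index` are hypothetical and serve only to illustrate the formula for $h$.

```python
def equal_width_bounds(values, k):
    """Return the k + 1 boundaries of k equal-width intervals over the values."""
    lo, hi = min(values), max(values)
    h = (hi - lo) / k                      # interval length h = (max - min) / k
    return [lo + i * h for i in range(k + 1)]


def interval_index(x, bounds):
    """Return the zero-based index of the interval that contains x."""
    k = len(bounds) - 1
    h = (bounds[-1] - bounds[0]) / k
    if h == 0:
        return 0                           # all values coincide: a single interval
    # The maximum value is assigned to the last interval rather than to a new one.
    return min(int((x - bounds[0]) / h), k - 1)


if __name__ == "__main__":
    data = [0, 5, 10]
    bounds = equal_width_bounds(data, 5)
    print(bounds)
    print([interval_index(x, bounds) for x in data])
```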

When the objects are divided into classes, the partition into intervals can be carried out by the method developed by V. Vapnik, which is based on the distribution law and the number of intervals. This method is heuristic: when dividing into intervals, it uses entropy to take into account the membership of objects in one class or another [4].

Two methods are known for partitioning the values of quantitative features into non-intersecting intervals based on deterministic criteria [5]. The algorithms of these methods are invariant to measurement scales and are used for the following purposes:

• searching for latent features in a database when modeling the intuitive decision-making process;

• ensuring that the information lost when forming nominal features from quantitative features is minimal;

• selecting informative sets of features of different types.

Interpretation of criteria. Let a set of objects $E_0 = \{S_1, \ldots, S_m\}$ be given, divided into two disjoint classes $K_1$ and $K_2$. Each object is described by $n$ features of different types $X(n) = (x_1, \ldots, x_n)$, of which $\xi$ ($\xi > 0$) are measured on a quantitative scale and the remaining $n - \xi$ on a nominal scale. Let there be an operator that maps the features $X(n)$ to a set of quantitative features $Y(\mu) = (y_1, \ldots, y_\mu)$; besides the $\xi$ quantitative features taken from $X(n)$, its elements may also include latent features.

Examples of latent features are the combinations $x_i \cdot x_j$, $x_i / x_j$ and $x_j / x_i$, as well as generalized indicators derived from quantitative and nominal features [6].

There are two criteria for dividing the values of the features from $Y(\mu)$, taken over a subset $P \subseteq E_0$ of the sample, into non-intersecting intervals. The first criterion is based on the condition that the number of intervals equals the number of classes. In the case considered here, this number is 2.

For each feature $y_j \in Y(\mu)$, the partition by this criterion is performed as follows. The ordered set of values of the feature is divided into two intervals $[c_0; c_1]$, $(c_1; c_2]$. Here $c_0 = \min_{S_v \in E_0} y_{vj}$ and $c_2 = \max_{S_v \in E_0} y_{vj}$ $(S_v = (y_{v1}, \ldots, y_{v\mu}))$. The boundary $c_1$ is calculated under the hypothesis that each interval contains feature values only of objects from class $K_t$ or only of objects from class $K_{3-t}$ $(t = 1, 2)$ [7].

Suppose that $u_1^1, u_1^2$ ($u_2^1, u_2^2$) are the numbers of values of the feature $y_j \in Y(\mu)$ of objects of class $K_1$ ($K_2$) that belong to the intervals $[c_0; c_1]$ and $(c_1; c_2]$, respectively.

Let $A = (a_0, a_1, a_2)$, $a_0 = 1$, $a_2 = m$, be an increasing sequence of order numbers in the sequence $r_{j_1}, \ldots, r_{j_m}$ of values of the feature $y_j \in Y(\mu)$ on the sample $P$, sorted in ascending order, that defines the interval boundary $c_1 = r_{j_{a_1}}$, and let $m_t = |K_t \cap P|$ $(t = 1, 2)$.

The following criterion can be used to calculate the optimal value of the interval boundary $c_1$; its value also serves as an indicator of the compactness of the quantitative feature when the objects of the set $P$ are divided into classes:


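$$\left(\frac{\sum_{p=1}^{2}\left[u_1^p\,(m_2-u_2^p)+u_2^p\,(m_1-u_1^p)\right]}{2\,m_1 m_2}\right)\left(\frac{\sum_{p=1}^{2}\sum_{i=1}^{2}u_i^p\,(u_i^p-1)}{m_1(m_1-1)+m_2(m_2-1)}\right)\to\max_{\{A\}} \qquad (3)$$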

If each of the two intervals contains values of the feature $y_j$ of objects from only one class $K_t$, then the value of criterion (3) is equal to 1.

If $r_{j_1} = r_{j_2} = \ldots = r_{j_{m-1}} = r_{j_m}$, then the value of criterion (3) is equal to 0. In all other cases, the value of the criterion lies in the interval $(0;1)$ [8].
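A minimal sketch of how the boundary $c_1$ could be found with criterion (3) is given below. It assumes class labels 1 and 2 and simply scans every position in the sorted values; the names `criterion_3` and `best_boundary` are hypothetical, and the authors' actual search procedure is not described in the article.

```python
def criterion_3(u, m1, m2):
    """Criterion (3); u[t][p] is the number of values of class t + 1 in interval p + 1."""
    sizes = (m1, m2)
    intra_den = m1 * (m1 - 1) + m2 * (m2 - 1)
    if m1 == 0 or m2 == 0 or intra_den == 0:
        return 0.0  # assumption: degenerate samples get zero weight
    inter = sum(u[t][p] * (sizes[1 - t] - u[1 - t][p]) for t in (0, 1) for p in (0, 1))
    intra = sum(u[t][p] * (u[t][p] - 1) for t in (0, 1) for p in (0, 1))
    return (inter / (2 * m1 * m2)) * (intra / intra_den)


def best_boundary(values, labels):
    """Scan all admissible boundaries c1 and return (c1, weight).

    Returns (None, 0.0) when all values coincide and no boundary exists.
    """
    pairs = sorted(zip(values, labels))
    m1 = sum(1 for _, lab in pairs if lab == 1)
    m2 = len(pairs) - m1
    left = [0, 0]                       # class counts inside [c0; c1]
    best_c1, best_w = None, 0.0
    for idx, (value, lab) in enumerate(pairs[:-1]):
        left[0 if lab == 1 else 1] += 1
        if value == pairs[idx + 1][0]:
            continue                    # equal values cannot be split by a boundary
        u = [[left[0], m1 - left[0]], [left[1], m2 - left[1]]]
        w = criterion_3(u, m1, m2)
        if w > best_w:
            best_c1, best_w = value, w
    return best_c1, best_w


if __name__ == "__main__":
    values = [1, 2, 3, 10, 11, 12, 13]
    labels = [1, 1, 1, 2, 2, 2, 2]
    print(best_boundary(values, labels))
```

The returned weight reproduces the two limiting cases stated above: it is 1 when the classes are perfectly separated by $c_1$ and 0 when all values coincide.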

Results and discussion. The sample includes 295 soft wheat varieties from SARI (Southern Agricultural Research Institute) and the values of their quantitative features. The objects in the sample are divided into two classes according to the recommendations of experts: varieties resistant to drought (class 1, 15 objects) and varieties not resistant to drought (class 2, 280 objects).

Table 1 below presents the interval boundaries and weights obtained by smoothing the feature values of the wheat varieties and dividing them into compactness intervals based on criteria (2) and (3).

Calculating the weights of the features in the sample directly (without smoothing) leads to considerable losses, because the numerical values in the feature columns differ sharply in scale. For example, the values of the feature "The nature of the grain" lie in the range [669.46; 838.1].

In order to improve the quality of the obtained results, we smooth each quantitative feature column by standardization.
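The article does not state which standardization formula was used. The sketch below assumes the common z-score transform with the population standard deviation; the function name `standardize` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Z-score standardization: subtract the mean, divide by the standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)              # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in column]    # a constant column carries no information
    return [(x - mu) / sigma for x in column]


if __name__ == "__main__":
    # Hypothetical values on the scale of the "nature of the grain" feature.
    column = [669.46, 750.0, 838.1]
    print(standardize(column))
```

After this transform every column has zero mean and unit variance, which is consistent with the boundaries in Table 1 lying within a few units of zero.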

Table 1. Interval boundaries and feature weights after smoothing of the sample

| № | Feature | c0 | c1 | c2 | Feature weight |
|---|---|---|---|---|---|
| 1 | 1000 grain weight | -2.5615 | 1.1405 | 2.3561 | 0.643057 |
| 2 | Productivity | -3.6627 | 1.1193 | 4.6424 | 0.600568 |
| 3 | The nature of the grain | -4.6941 | 0.52603 | 1.7866 | 0.405435 |
| 4 | Plant height | -3.042 | 0.43686 | 2.9022 | 0.334819 |
| 5 | Protein content | -2.1688 | 0.51598 | 2.3836 | 0.312253 |
| 6 | Spike length | -2.2199 | -0.93573 | 3.6874 | 0.298118 |
| 7 | Amount of gluten | -2.2876 | 0.97734 | 2.6098 | 0.282956 |
| 8 | The length of the last internode | -3.0479 | 0.36968 | 3.86 | 0.280146 |
| 9 | The number of spikes | -2.8483 | 0.47097 | 2.8419 | 0.271753 |
| 10 | Vegetation period | -2.9167 | 0.20305 | 2.0749 | 0.267084 |
| 11 | IDK | -6.138 | -0.22963 | 1.2866 | 0.264227 |
| 12 | Grain vitreousness | -1.2678 | -0.09024 | 3.3513 | 0.253981 |
| 13 | Grain moisture | -1.6489 | -0.4849 | 2.8778 | 0.251748 |

Features with a weight of 0.4 and above in Table 1 can be taken as informative, because they contribute the most to the classification of drought-resistant wheat varieties. In Table 1 these are 1000 grain weight, productivity and the nature of the grain.
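The threshold rule can be illustrated with the first five weights from Table 1. The dictionary `WEIGHTS` and the function `informative_features` are names introduced only for this sketch.

```python
# Excerpt of the feature weights reported in Table 1 (first five rows).
WEIGHTS = {
    "1000 grain weight": 0.643057,
    "Productivity": 0.600568,
    "The nature of the grain": 0.405435,
    "Plant height": 0.334819,
    "Protein content": 0.312253,
}


def informative_features(weights, threshold=0.4):
    """Return feature names with weight >= threshold, ordered by decreasing weight."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, weight in ranked if weight >= threshold]


if __name__ == "__main__":
    print(informative_features(WEIGHTS))
```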

When evaluating drought-resistant varieties, it is necessary to consider combinations of features rather than individual features alone. For this purpose, better results can be achieved if the weights are calculated for latent features formed from the original ones.

Table 2. Weights of the latent features of the sample

| № | Feature | Feature weight |
|---|---|---|
| 1 | (Productivity*1000 grain weight) | 0.693828 |
| 2 | ((The nature of the grain*Protein content)*(Spike length/Grain moisture)) | 0.462482 |
| 3 | ((Protein amount*IDK)*(Spike length*Number of spikes, units)) | 0.459401 |
| 4 | ((Number of spikes, grain/Grain moisture)*(Protein amount*IDK)) | 0.427136 |
| 5 | ((Number of spikes/Grain moisture)*(Spike length*Number of spikes)) | 0.417391 |
| 6 | ((Spike length/Grain moisture)*(Spike length*Number of spikes)) | 0.410489 |
| 7 | ((Number of spikes/Grain moisture)*(Vegetation period/Grain moisture)) | 0.397817 |
| 8 | ((Protein content/Grain moisture)*(Spike length*Protein content)) | 0.393862 |
| 9 | ((Spike length/Grain moisture)*(The length of the last internode*Protein content)) | 0.391693 |
| 10 | ((Number of spikes*Protein content)*(IDK/Grain vitreousness)) | 0.391693 |
| 11 | ((Number of spikelets/Grain vitreousness)*(Protein content*IDK)) | 0.391693 |
| 12 | ((Vegetation period*Spikes number)*(Spike length*Protein amount)) | 0.386234 |
| 13 | ((Spike length/Grain moisture)*(Number of spikes*Protein content)) | 0.384879 |
| 14 | ((Number of spikes/Grain moisture)*(Spike length*Protein content)) | 0.384879 |
| 15 | ((Protein content/Grain moisture)*(Spike length*Number of spikes)) | 0.384879 |
| 16 | ((Protein content*IDK)*(Vegetation period/Grain moisture)) | 0.374164 |
| 17 | ((The length of the last internode*Protein amount)*(Spike length/Gluten amount)) | 0.372762 |
| 18 | ((Vegetation period/Grain moisture)*(Spike length*Number of spikes)) | 0.372694 |
| 19 | ((Number of spikes/Grain vitreousness)*(Spike length*Number of spikes)) | 0.371363 |
| 20 | ((Number of spikes/Grain moisture)*(Number of spikes*Protein content)) | 0.370322 |
| 21 | (Plant height*Protein content) | 0.336662 |
| 22 | (Number of spikes*Amount of gluten) | 0.312379 |

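Table 2 above shows the weights of the newly formed latent features (based on hidden patterns) after latent features were constructed in two stages. Some of them have higher weights than the original features in Table 1 (for example, 0.693828 for Productivity*1000 grain weight against 0.643057 for 1000 grain weight), which means that new informative features have been formed that contribute more significantly to the assessment of varieties.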
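One possible way to generate candidate latent features of the form $x_i \cdot x_j$, $x_i / x_j$ and $x_j / x_i$ is sketched below. How the authors selected the combinations reported in Table 2 is not specified, so the function `latent_candidates`, the rule of skipping ratios with a zero denominator and the sample data are all assumptions.

```python
from itertools import combinations


def latent_candidates(columns):
    """Build pairwise products and ratios from a dict {feature name: list of values}."""
    latent = {}
    for (a, xa), (b, xb) in combinations(columns.items(), 2):
        latent[f"({a}*{b})"] = [p * q for p, q in zip(xa, xb)]
        # Assumption: a ratio is skipped when its denominator column contains a zero.
        if all(q != 0 for q in xb):
            latent[f"({a}/{b})"] = [p / q for p, q in zip(xa, xb)]
        if all(p != 0 for p in xa):
            latent[f"({b}/{a})"] = [q / p for p, q in zip(xa, xb)]
    return latent


if __name__ == "__main__":
    # Hypothetical values for three objects.
    columns = {
        "Productivity": [1.0, 2.0, 4.0],
        "1000 grain weight": [2.0, 0.5, 1.0],
    }
    first_stage = latent_candidates(columns)
    second_stage = latent_candidates(first_stage)   # applying the step a second time
    print(sorted(first_stage))
    print(len(second_stage))
```

Applying the function to its own output produces names with nested parentheses of the same shape as the two-stage features in Table 2. Each candidate would then be weighted with criterion (3) in the same way as an original feature.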

Conclusion. In this article, the weights of the features in the sample were calculated by dividing the feature values of wheat varieties into compactness intervals.

The obtained results show that the informative features of drought-resistant wheat varieties largely coincide with the features recognized by experts [9].

The identified informative features not only confirm the opinion of experts in the field, but also draw attention to features that they have overlooked.

References:

1. Madraximov S.F., Saidov D.Y. Stability of the objects of classes and grouping the features //Проблемы вычислительной и прикладной математики. - 2016. - №. 3. - С. 50-54.

2. Shodiyev F. Intellectual system based on the determination of hidden legality //Central Asian journal of education and computer sciences (CAJECS). - 2022. - Т. 1. - №. 5. - С. 11-16.

3. Ignatyev N.A., Madrakhimov S.F., Saidov D.Y. Stability of object classes and selection of the latent features // International journal of engineering technology and sciences. - 2017. - Т. 4. - №. 1. - С. 61-71.

4. Вапник В.Н. Алгоритмы и программы восстановления зависимостей. - М.: Наука, 1984. - 816 с.


5. Згуральская Е.Н. Алгоритм выбора оптимальных границ интервалов разбиения значений признаков при классификации // Известия Самарского научного центра Российской академии наук. - 2012. - Т. 14. - № 4 (3). - С. 826-829.

6. Шодиев Ф.Ю., Эшбоев Э.А., Эгамбердиев Э.Х. Использование обобщенных оценок для прогнозирования устойчивости сортов пшеницы к болезням // Азиатский журнал многомерных исследований. - 2021. - Т. 10. - № 4. - С. 602-610.

7. Игнатьев Н.А. Вычисление обобщённых показателей и интеллектуальный анализ данных // Автоматика и телемеханика. - 2011. - № 5. - С. 183-190.

8. Шодиев Ф., Эшбоев Е., Суярова А. Прогнозирование устойчивости к болезням высококачественных сортов пшеницы с использованием метода расчета обобщенных оценок // E3S Web of Conferences. - EDP Sciences, 2023. - Т. 401. - С. 04063.

9. Sharma S.N., Sain R.S., Sharma R.K. Genetics of spike length in durum wheat // Euphytica. - 2003. - Vol. 130. - P. 155-161.