Original scientific paper Economics of Agriculture 4/2017

UDC: 519.237:336]:631.51.02

THE DISCRIMINANT ANALYSIS APPLIED TO THE DIFFERENTIATION OF SOIL TYPES

Radovan Damnjanović¹, Snežana Krstić², Milena Knežević³, Svetislav Stanković⁴, Dejan Jeremić⁵

Summary

In agroeconomics it is frequently important to assign an individual to one of two groups. In plant breeding, for example, the problem might be to decide whether a plant or plant progeny belongs to a high-yielding or a low-yielding group.

Sometimes the decision can be made on the basis of a single variable, but more often the two groups differ in several variables, each of which gives some indication of the group in which the individual should be placed. This is a classical problem of discrimination, where the general task is to find a discriminant function.

Key words: analysis, differentiation, soil, types, plant.

JEL: C25, C35.

Introduction

According to Kardaun et al. (1993), the theory of discriminant analysis is a well-developed branch of statistics and, at the same time, still a field of active research. Some of the algorithms are implemented in special or general statistical packages. One can approach discriminant analysis from a purely data-descriptive point of view or from a probabilistic point of view (both approaches, but most easily the latter, can be incorporated into a decision-theoretic framework). In the latter approach, a probabilistic model is used to describe the situation. The applicability of such a model in non-random situations may be questioned from a fundamental point of view (Breiman et al., 1984). However, such a probabilistic framework is almost indispensable if one wants to estimate the performance of procedures in future situations and to express uncertainties in various estimates.

1 Radovan Damnjanović, Ph.D., Assistant Professor, Military Academy, University of Defence, Belgrade, Pavla Jurišića Šturma 33, 11000 Belgrade, Serbia, E-mail: radovandam78@gmail.com

2 Snežana Krstić, Ph.D., Associate Professor, Military Academy, University of Defence, Belgrade, Pavla Jurišića Šturma 33, 11000 Belgrade, Serbia, E-mail: snezanakrstic17@gmail.com

3 Milena Knežević, Ph.D., University of Defence, Belgrade, Pavla Jurišića Šturma 33, 11000 Belgrade, Serbia.

4 Svetislav Stanković, Ph.D., Assistant Professor, Military Academy, University of Defence, Belgrade, Pavla Jurišića Šturma 33, 11000 Belgrade, Serbia.

5 Dejan Jeremić, Ph.D., Assistant Professor, Sequester Employment, Palmotićeva br. 22, 11000 Belgrade, Serbia, E-mail: office@sequesteremployment.com

EP 2017 (64) 4 (1513-1521)

Moreover, it often leads to procedures that are also sensible from a data-descriptive point of view (Chercassky, Mucier, 2007). Conversely, a specific procedure can often be viewed as a data-descriptive one, with little further interpretation, or as a probabilistic one, with considerably more interpretation, the validity of which of course depends on the adequacy of the framework.

Sometimes a procedure developed in one probabilistic framework can also be interpreted in another probabilistic framework, which may be more relevant for the data at hand (Farlov, 1984; Forsyth, 1989; Gilad-Bachrach, 2006; Gilad-Bachrach, 2004).

Thanh et al. (2017) show that there has been a great effort to transfer linear discriminant techniques that operate on vector data to high-order data, generally referred to as Multilinear Discriminant Analysis (MDA) techniques. Many existing works focus on maximizing the ratio of inter-class to intra-class variance defined on tensor data representations. However, there had been no attempt to employ class-specific discrimination criteria for tensor data. In their paper, they propose a multilinear subspace learning technique suitable for applications requiring class-specific tensor models. The method maximizes the discrimination of each individual class in the feature space while retaining the spatial structure of the input.

Early on, Beauchamp et al. (1980) applied discriminant analysis to uranium exploration. Discriminant analysis methods can be used on hydrogeochemical data collected in the NURE Program to aid in formulating geochemical models that identify the anomalous areas used in resource estimation. The methods were applied to data from the Plainview, Texas Quadrangle, which has approximately 850 groundwater samples with more than 40 quantitative measurements per sample. Topics addressed include estimation of misclassification probabilities, variable selection, and robust discrimination (Hart, 1989; Haussler, 1990; Han & Camber, 2000; Kantardzic, 2011). A method using generalized distance measures is given that enables the assignment of samples to a background population or a mineralized population whose parameters were estimated from separate studies (Milojevic et al., 2013; Vukoje, 2013; Stanojevic et al., 2017).

Also, Zhijin, et al. (1994) used the discriminant analysis method in multivariate statistical theory to handle the e n ^ separation in BES, describing the principle of the discriminant analysis method, deriving the unstandardized discriminant functions (responsible for particle separation), giving the discriminant efficiency for e n ^ and comparing the results from the discriminant analysis method with those obtained in a conventional way.

Data and Variables

The data set comprises 286 soil samples, of which 100 contained the organism Azotobacter and 186 did not. Three characteristics of the soil were studied:

X1 = pH⁶

X2 = amount of readily available phosphate

X3 = total nitrogen content

The data were collected at the Iowa Agricultural Experiment Station by Cox and Martin (1). In our case, X1, X2, and X3 were measured on 52 soil samples. Group A comprised 25 samples that contained Azotobacter, while Group B comprised 27 samples that did not.

Methods

We use discriminant analysis to evaluate the differences between soil types. In other words, given the three characteristics X1, X2, and X3, discriminant analysis can indicate whether a soil sample contains the organism Azotobacter or not. Azotobacter is known to have a positive effect on agricultural production, but that effect is not the subject of this paper. For the purposes of our research, we chose stepwise discriminant analysis to determine which variable is decisive for the classification procedure (Kohavi, 1995; Quinlan & Cameron-Jonas, 1995; Koteri & Lester, 2012), that is, for deciding whether the soil contains the bacterium Azotobacter. The first step in our analysis is the application of linear discriminant analysis.

Linear Discriminant Analysis - LDA (Supervised Learning)

The first step in the classification process is the application of LDA, a data mining method for discovering hidden knowledge (Written & Frank, 2005) that learns from a sample. It produced the following results.

We report only the confusion matrix, which indicates a resubstitution error of about 12%. A detailed analysis of the results shows that some variables are not important for determining the presence of Azotobacter.

6 The pH value is a measure of the activity of hydrogen ions (H+) in a solution and thus determines whether the solution is acidic or basic. The pH value is dimensionless; for comparison, a pH scale ranging from 0 to 14 is used. For acidic solutions the pH is less than 7 (pH < 7.0), and for bases it is greater than 7 (pH > 7.0).

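Table 1. Classifier performances

Error rate: 0.1224

Values prediction

| Value | Recall | 1-Precision |
|---|---|---|
| Yes | 0.7727 | 0.0556 |
| No | 0.9630 | 0.1613 |

Confusion matrix (rows: actual class; columns: predicted class)

| | Yes | No | Sum |
|---|---|---|---|
| Yes | 17 | 5 | 22 |
| No | 1 | 26 | 27 |
| Sum | 18 | 31 | 49 |

Source: authors' calculations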
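The figures in Table 1 follow directly from the confusion matrix. The short Python sketch below recomputes them; it assumes that rows are actual classes and columns are predicted classes, which is the reading under which the reported recall values are reproduced. The variable and function names are illustrative only.

```python
# Confusion matrix from Table 1 (assumed layout: rows = actual, columns = predicted).
confusion = {
    "Yes": {"Yes": 17, "No": 5},
    "No": {"Yes": 1, "No": 26},
}
classes = ["Yes", "No"]

total = sum(confusion[a][p] for a in classes for p in classes)
errors = sum(confusion[a][p] for a in classes for p in classes if a != p)
error_rate = errors / total


def recall(cls):
    """Share of actual `cls` samples that were predicted as `cls`."""
    return confusion[cls][cls] / sum(confusion[cls].values())


def one_minus_precision(cls):
    """Share of samples predicted as `cls` that belong to the other class."""
    predicted = sum(confusion[a][cls] for a in classes)
    return 1 - confusion[cls][cls] / predicted


print(f"error rate: {error_rate:.4f}")
for cls in classes:
    print(f"{cls}: recall={recall(cls):.4f}, 1-precision={one_minus_precision(cls):.4f}")
```

Six of the 49 samples are misclassified, which gives the resubstitution error of 0.1224 quoted in the text.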

The classification function is as follows:

Z = 21.5 X1 - 0.07 X2 + 0.03 X3 - 76

The question arises whether the variables X2 and X3 should be dropped from the analysis as insignificant for the classification.

Because the resubstitution error is optimistic, we turn to a resampling method, the bootstrap, which gives a better assessment of the classification performance.

Table 2. Bootstrap error estimation

| Estimator | Error rate |
|---|---|
| .632+ bootstrap | 0.1429 |
| .632 bootstrap | 0.1423 |
| Resubstitution | 0.1224 |

Source: authors' calculations

The bootstrap estimates (about 14%) are higher than the resubstitution error (about 12%), which confirms that the latter is optimistic.
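To illustrate how a .632 bootstrap estimate of this kind can be obtained, the following Python sketch combines the resubstitution error with the average out-of-bag error using the weights 0.368 and 0.632. It is a simplified illustration and not the authors' computation: the classifier is a stand-in threshold rule on one variable rather than LDA, the out-of-bag error is averaged per resample, the .632+ correction is not implemented, and the data are synthetic values with hypothetical class means.

```python
import random


def fit_threshold(xs, ys):
    """Stand-in classifier: cut at the midpoint between the two class means."""
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    return (mean0 + mean1) / 2, mean1 > mean0


def predict(model, x):
    cut, one_is_high = model
    return int((x > cut) == one_is_high)


def error_rate(model, xs, ys):
    return sum(predict(model, x) != y for x, y in zip(xs, ys)) / len(xs)


def bootstrap_632(xs, ys, n_boot=200, seed=0):
    """Return (resubstitution error, mean out-of-bag error, .632 estimate)."""
    rng = random.Random(seed)
    n = len(xs)
    resub = error_rate(fit_threshold(xs, ys), xs, ys)
    oob_errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        drawn = set(idx)
        out = [i for i in range(n) if i not in drawn]  # out-of-bag cases
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        if not out or by.count(0) == 0 or by.count(1) == 0:
            continue  # skip degenerate resamples
        model = fit_threshold(bx, by)
        oob_errors.append(
            error_rate(model, [xs[i] for i in out], [ys[i] for i in out])
        )
    oob = sum(oob_errors) / len(oob_errors)
    return resub, oob, 0.368 * resub + 0.632 * oob


# Synthetic pH-like values (hypothetical means), 22 "Yes" and 27 "No" cases.
data_rng = random.Random(1)
xs = [data_rng.gauss(7.2, 0.5) for _ in range(22)] + [
    data_rng.gauss(6.0, 0.5) for _ in range(27)
]
ys = [1] * 22 + [0] * 27

resub, oob, err632 = bootstrap_632(xs, ys)
print(f"resubstitution={resub:.4f}  out-of-bag={oob:.4f}  .632={err632:.4f}")
```

Because the out-of-bag cases were not used to fit the rule, their error is usually larger than the resubstitution error, and the weighted combination pulls the estimate upward, in the same direction as in Table 2.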

The second step in the classification process is the application of stepwise discriminant analysis, with the following results:

Table 3. Detailed results

| N | d.f. | Best | Sol.1 | Sol.2 | Sol.3 | Sol.4 | Sol.5 |
|---|---|---|---|---|---|---|---|
| 1 | (1, 47) | X1 (L: 0.4565, F: 55.95, p: 0.0000) | X1 (L: 0.4565, F: 55.95, p: 0.0000) | X2 (L: 0.7000, F: 20.15, p: 0.0000) | X3 (L: 0.7295, F: 17.42, p: 0.0001) | - | - |
| 2 | (1, 46) | - | X3 (L: 0.4444, F: 1.26, p: 0.2679) | X2 (L: 0.4454, F: 1.15, p: 0.2900) | - | - | - |

Source: authors' calculations

Using the forward strategy with an F threshold of 3.84, only one variable is found to be relevant: X1 = pH.
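The F values in Table 3 can be checked against the L values if L is read as Wilks' lambda, which is an assumption here since the table does not define the column. For two groups, the partial F for adding one variable is F = df2 * (lambda_before / lambda_after - 1), with df2 the second degree of freedom from the table. The sketch below applies this formula; the function name is illustrative.

```python
def f_to_enter(lambda_before, lambda_after, df2):
    """Partial F with (1, df2) degrees of freedom for adding one variable
    in a two-group problem, assuming the lambdas are Wilks' lambda values."""
    return df2 * (lambda_before / lambda_after - 1.0)


F_THRESHOLD = 3.84  # threshold quoted in the text

# Step 1: no variable in the model yet, so lambda_before = 1; d.f. = (1, 47).
step1 = {
    name: f_to_enter(1.0, lam, 47)
    for name, lam in {"X1": 0.4565, "X2": 0.7000, "X3": 0.7295}.items()
}
# Step 2: X1 is already in the model (lambda = 0.4565); d.f. = (1, 46).
step2 = {
    name: f_to_enter(0.4565, lam, 46)
    for name, lam in {"X3": 0.4444, "X2": 0.4454}.items()
}

for step, values in (("step 1", step1), ("step 2", step2)):
    for name, f in values.items():
        side = "above" if f > F_THRESHOLD else "below"
        print(f"{step}: {name} F = {f:.2f} ({side} threshold)")
```

Under this reading the recomputed values agree with the table up to rounding of the lambdas, and both step-2 candidates fall below 3.84, which is why the selection stops after X1.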

The third step in the analysis is a second run of LDA, which in this case gives the same classification error, but with the following discriminant function:

Z = 19.5 X1 - 71.5
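As a small illustration, the two reported functions can be evaluated side by side. The paper does not state the cutoff value or which class corresponds to larger scores, so the sketch below only computes the scores and makes no class assignment. The sample values are hypothetical and the function names are illustrative.

```python
def z_full(x1, x2, x3):
    """Three-variable function reported after the first LDA run."""
    return 21.5 * x1 - 0.07 * x2 + 0.03 * x3 - 76.0


def z_reduced(x1):
    """pH-only function reported after stepwise selection."""
    return 19.5 * x1 - 71.5


# Hypothetical sample values, chosen only for illustration.
sample = {"x1": 7.0, "x2": 30.0, "x3": 20.0}

print(f"full model score:    {z_full(**sample):.2f}")
print(f"reduced model score: {z_reduced(sample['x1']):.2f}")
```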

Again, using bootstrap validation, we obtained a similar error of 13%.

Applying LDA and stepwise discriminant analysis (STEPDISC) shows that the data mining method, so-called supervised learning, arrives at a single crucial variable that, in combination with a constant, has a dominant effect on determining whether the soil contains Azotobacter or not.

The next section uses decision trees to assess how strongly each variable influences the classification.

Application of Decision Trees

Learning a decision tree is the process of creating a discriminating function in the form of a decision tree (1,8), (2,995-1003), (18, 404). The tree is created recursively, from the top (the root) to the leaves: each internal node represents a logical test on the value of an attribute from the problem description, and each leaf represents the class to which an example is assigned. During construction, the attribute for each node is selected heuristically, based on an assessment of how well it discriminates the subset of training examples that reaches that node. Although a tree can classify all cases in a training set perfectly, this does not guarantee high accuracy on new examples, because such trees often overfit the training examples. The tree is therefore simplified, which results in smaller trees that are both more accurate and more comprehensible. In our analysis we used the well-known decision tree algorithm C4.5 (16,287), which is available within the WEKA system (University of Waikato) (19), for the purpose of selecting the associated attributes. The main advantage of a decision tree is that it represents knowledge in a meaningful way, by extracting IF-THEN classification rules.
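The heuristic used to pick a test at each node can be illustrated with the entropy-based criterion on which C4.5 relies (information gain, normalized to a gain ratio). The Python sketch below evaluates the root split X1 < 6.8 of the tree reported in this section. The class counts are reconstructed from the leaf percentages of that tree (22 "Yes" and 27 "No" overall; 5 and 26 on the X1 < 6.8 side; 17 and 1 on the other side), and the function names are illustrative, not WEKA's API.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def gain_and_ratio(parent, children):
    """Information gain of a split and its gain ratio (gain / split information)."""
    total = sum(parent)
    weights = [sum(child) / total for child in children]
    gain = entropy(parent) - sum(
        w * entropy(child) for w, child in zip(weights, children)
    )
    split_info = entropy([sum(child) for child in children])
    return gain, gain / split_info


# Counts are [Yes, No], reconstructed from the reported leaf percentages.
parent = [22, 27]
children = [[5, 26], [17, 1]]  # X1 < 6.8 and X1 >= 6.8

gain, ratio = gain_and_ratio(parent, children)
print(f"information gain = {gain:.3f} bits, gain ratio = {ratio:.3f}")
```

A learner of this kind computes such a score for every candidate attribute and threshold at a node and keeps the best one, then repeats the procedure on each resulting subset.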

The results obtained indicate an accuracy of 90%, that is, an error of 10%.

Decision tree

- X1 < 6.8000
    • X1 < 5.7500 then contains Azotobacter = No (100.00% of 9 examples)
    • X1 >= 5.7500
        ■ X2 < 34.0000 then contains Azotobacter = No (90.91% of 11 examples)
        ■ X2 >= 34.0000
            o X1 < 6.1500 then contains Azotobacter = Yes (60.00% of 5 examples)
            o X1 >= 6.1500 then contains Azotobacter = No (83.33% of 6 examples)
- X1 >= 6.8000 then contains Azotobacter = Yes (94.44% of 18 examples)
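The tree above translates directly into IF-THEN rules. The Python sketch below encodes them as a function and recomputes the overall accuracy from the leaf statistics; the function name is illustrative, and the correct counts per leaf are obtained by rounding the reported percentages.

```python
def contains_azotobacter(x1, x2):
    """Apply the reported tree; x1 is pH, x2 is readily available phosphate."""
    if x1 >= 6.80:
        return "Yes"
    if x1 < 5.75:
        return "No"
    if x2 < 34.0:
        return "No"
    return "Yes" if x1 < 6.15 else "No"


# (share of the majority class, number of examples) for each leaf of the tree.
leaves = [(1.0000, 9), (0.9091, 11), (0.6000, 5), (0.8333, 6), (0.9444, 18)]
correct = sum(round(share * n) for share, n in leaves)
total = sum(n for _, n in leaves)
accuracy = correct / total

print(contains_azotobacter(7.0, 10.0))
print(f"{correct} of {total} training examples correct ({accuracy:.1%})")
```

The leaves cover 49 examples, 44 of which fall in the majority class of their leaf, which matches the accuracy of about 90% stated in the text. Note that X3 does not appear in any test.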

Attribute selection methods

Building an adequate model relies on knowledge of the problem and often reduces to selecting an appropriate set of attributes. Irrelevant and redundant attributes in the problem model degrade the performance of most inductive learning methods, so such attributes are often removed by preliminary or embedded attribute selection (2). The optimal set contains all relevant attributes, while redundant and irrelevant attributes are usually excluded, although weakly relevant redundant attributes may contain information that improves classification performance in practice (2), (4), (9). In what follows, some of the preliminary attribute selection methods built into the WEKA system (19) are used to check further the significance of individual attributes in the problem model.

Results of WEKA attribute selection. Using different methods of searching for and evaluating attribute subsets, the best subset is found, the one that gives the most accurate rules (trees). Some of the methods also give numerical estimates for individual attributes.

The ReliefF method (which evaluates each attribute separately) gave the following results:

Table 4. Detailed results

| N | Attribute | Weight |
|---|---|---|
| 1 | X1 | 0.175578 |
| 2 | X3 | 0.054731 |
| 3 | X2 | 0.048435 |

Source: authors' calculations

The ReliefF method ranks the attributes by importance and places X1 = pH first, well ahead of X3 and X2.
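The idea behind such weights can be sketched with the basic Relief scheme, of which ReliefF is an extension: an attribute gains weight when it differs between an instance and its nearest neighbour from the other class (nearest miss) and loses weight when it differs from its nearest neighbour of the same class (nearest hit). The Python sketch below is a minimal two-class version with one hit and one miss per instance; it is not WEKA's implementation, and the toy data are invented for illustration (the first attribute separates the classes, the second does not).

```python
def relief_weights(rows, labels):
    """Basic Relief for two classes: one nearest hit and one nearest miss per instance."""
    n_features = len(rows[0])
    # Range of each attribute, used to scale differences to [0, 1].
    spans = [
        (max(r[j] for r in rows) - min(r[j] for r in rows)) or 1.0
        for j in range(n_features)
    ]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(a, b, j) for j in range(n_features))

    weights = [0.0] * n_features
    for i, row in enumerate(rows):
        hits = [r for k, r in enumerate(rows) if k != i and labels[k] == labels[i]]
        misses = [r for k, r in enumerate(rows) if labels[k] != labels[i]]
        hit = min(hits, key=lambda r: distance(row, r))
        miss = min(misses, key=lambda r: distance(row, r))
        for j in range(n_features):
            weights[j] += (diff(row, miss, j) - diff(row, hit, j)) / len(rows)
    return weights


# Invented toy data: attribute 0 is informative, attribute 1 is noise.
rows = [(0.0, 0.2), (0.1, 0.8), (0.2, 0.5), (0.8, 0.3), (0.9, 0.7), (1.0, 0.4)]
labels = [0, 0, 0, 1, 1, 1]

weights = relief_weights(rows, labels)
print([round(w, 3) for w in weights])
```

On these toy data the informative attribute receives a clearly larger weight than the noisy one, which mirrors the gap between X1 and the other two attributes in Table 4.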

Conclusion

Agroeconomics is facing increasing challenges, especially in research not only on the quality of land but also on other food resources as sources of organic food. Methods for discovering hidden knowledge have an advantage over classical methods because they classify more precisely and have higher predictive capacity.

The aim of this study was to examine the usefulness and accuracy of these methods for determining, from sample measurements, whether Azotobacter is present in the soil or not (categories "yes" and "no"). Supervised linear discriminant analysis was used to identify the effect of the individual variables on the presence or absence of Azotobacter, together with methods for validating the classification accuracy and identifying the key variable, which in this case is pH. In addition, a decision tree was used, which gave more precise results regarding the level at which individual variables exert their influence. The results are accurate at the level of 90%. Unlike conventional multivariate research, the supervised analysis examined the influence of the three variables on the presence of Azotobacter and showed that two of them are not decisive for the classification. The research thus achieved its stated objective with a high degree of accuracy (about 90%).

The ReliefF attribute selection method clearly established the dominance of the pH factor, while the other two attributes had relatively minor weights of about 0.05 each. Such a methodological tool would greatly help researchers in agriculture, especially because research can be carried out on sparse training sets, with a large number of attributes (characteristics of the subject of research, e.g. land, quality of agricultural products, fruits, vegetables, eggs, meat, and many others) and very few examples. The problem of sparsity is related to assessing task difficulty, which data mining addresses by reducing the number of attributes (variables). Such methodological approaches enable the discovery of hidden knowledge in agronomy and agroeconomics, above all about the causes that determine the key variables, attributes, and factors for solving research problems and testing hypotheses, both in agronomy and in other fields of research.

Literature

1. Beauchamp, J.J., Begovich, C.L., Kane, V.E. & Wolf, D.A. (1980): Application of discriminant analysis and generalized distance measures to uranium exploration, Journal of the International Association for Mathematical Geology; vol. 12, No. 6, pp. 539-558.

2. Breiman, L., Friedman, J.H., Olshen, R.A., Stone C.J. (1984): Classification and Regression Trees, Wadsworth, Belmont.

3. Chercassky, V., Mucier, F.M. (2007): Learning from Data: Concepts, Theory and Methods, 2nd ed., John Wiley - IEEE Press.

4. Farlov, S. (1984): Self-Organizing Methods in Modeling: GMDH Type Algorithm, Taylor and Francis.

5. Forsyth, R. (1989): Machine Learning: Principles and Techniques, London: Chapman and Hall.

6. Gilad-Bachrach, N. F. (2006): Large margin principles for feature selection, in Guvon, G., Sikravesh, Z. (2006): Feature extraction, foundations and applications, Springer-Verlag.

7. Gilad-Bachrach, N. F. (2004): Margin based feature selection - theory and algorithms, In Proc. 21st ICML.

8. Han, J., & Camber, M. (2000): Data mining concepts and techniques, San Diego, USA: Morgan Kaufmann.

9. Hart, A. (1989): Machine induction as a form of knowledge acquisition in knowledge engineering, in Forsyth, R. (1989): Machine Learning: Principles and Techniques, Chapman and Hall, London.

10. Haussler, D. (1990): Probably approximately correct learning. In Proc. of the 8th National Conference on Artificial Intelligence, pp. 1101-1108, Morgan Kaufmann.

11. Kantardzic, M. (2011): Data mining: concepts, models, methods, and algorithms. Wiley-IEEE Press.

12. Kardaun, O.J.W.F., Itoh, S.I., Itoh, K., and Kardaun, J.W.P.F. (1993) Discriminant Analysis to Predict the Occurrence of ELMS in H-Mode Discharges, Nagoya, Japan: National Institute for Fusion Science.

13. Kohavi, R. (1995): A Study of Cross-validation and Bootstrap for Accuracy Estimation and Model Selection, Proc. of International Joint Conference on Artificial Intelligence.

14. Koteri, S., Lester, R. (2012): The Role of Accounting in the Financial Crisis: Lessons For The Future, Accounting Horizons, vol. 26. No.2, pp. 335-352.

15. Milojevic, I., Vukoje, A., Mihajlovic, M. (2013): Accounting consolidation of the balance by the acquisition method, Ekonomika poljoprivrede, vol. 60, no. 2 , pp. 237-252, Drustvo agrarnih ekonomista, Beograd, Srbija.

16. Quinlan, R. J., Cameron-Jonas, R.R. (1995): Induction of Logic Programs: FOIL and Related Systems, New Generation Computing, vol. 13, pp. 287-312.

17. Stanojevic, S., Bordevic, N., Volf, D. (2017): Primena kvantitativnih metoda u privredivanju poslovanja privrednih drustava , ODITOR, vol. 3, No. 1, pp. 91-101. Centar za ekonomska i finansijska istrazivanja, Beograd, Srbija.

18. Thanh, T.D., Moncef, G., & Alexandras, I. (2017): Multilinear class-specific discriminant analysis, Aalborg, Pattern Recognition Letters, vol. 93, No. 3, pp. 131-136, Elsevier, Denmark.

19. Vukoje, A. (2013): Faktori egzistencije kao uslov stvaranja trzisne pozicije preduzeca, ODITOR, vol. 1, No. 5, pp. 27-37, Centar za ekonomska i finansijska istrazivanja, Beograd, Srbija.

20. Written, I.H., Frank, E. (2005): Data Mining: Practical machine learning tools and techniques, 2nd edition, Morgan Kaufmann, San Francisco.

21. Zhijin, J., Taijie, W., Yigang, X. & Tao, H. (1994): The use of the discriminant analysis method for e n p separation in BES, Netherlands: Nuclear Instruments and Methods in Physics Research. Section A, Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 345, No. 3, pp. 541-548.
