Research article: 'Cluster analysis, fuzzy sets, and fuzzy logic models in bird identification'

Field: Biological sciences

CC BY
Journal: Ukrainian Journal of Ecology
Keywords: bird species / identification / cluster analysis / fuzzy sets / fuzzy logic

Abstract of the research article in biological sciences. Authors: V. V. Osadchyi, V. S. Yeremeev, A. V. Matsyura

In our recent research (Osadchiy et al., 2016) we considered a mathematical model for identifying bird species from the results of inaccurate field measurements. We used the total length of the bird, the wingspan, the wingbeat frequency, and the flight speed as the input factors of the model. Testing the model on a hypothetical case of identifying some target species, such as Rook, Common Raven, Mallard, White Stork, and Lapwing, revealed that this model can be used for bird species identification with definite limitations. However, in the previous model we applied a recognition algorithm based on classical mathematical statistics. The limitations of that model are obvious: it does not take into account many characteristics and behavioral features of birds that cannot be represented in numerical form, such as diurnal activity pattern and flocking behavior. In this case the possibility of using traditional mathematical statistics is quite limited. The present study is devoted to the development of a mathematical method for identifying bird species based on cluster analysis with fuzzy logic and fuzzy sets, which extends the possibilities of the algorithm previously proposed in our research.


Ukrainian Journal of Ecology

Ukrainian Journal of Ecology, 2017, 7(2), 96-103, doi: 10.15421/2017_25

ORIGINAL ARTICLE UDC 636.5:637.5:637.04

Cluster analysis, fuzzy sets, and fuzzy logic models in bird identification

V.V. Osadchyi¹, V.S. Yeremeev¹, A.V. Matsyura²

¹Bogdan Chmelnitskiy Melitopol State Pedagogical University, Melitopol, Ukraine, E-mail: poliform55@gmail.com
²Altai State University, Barnaul, Russia, E-mail: amatsyura@gmail.com

Submitted: 18.01.2017. Accepted: 25.04.2017


Key words: bird species, identification, cluster analysis, fuzzy sets, fuzzy logic.

Bird identification is a key point of field ornithological research. Until now, the ornithologist has been guided by experience or by a field guide with information on the body mass, geometric dimensions, and plumage color of birds (Ryabitsev, 2001; Opredelitel', 2017). More recently, researchers from the California Institute of Technology and Cornell University developed an online service for identifying bird species of the United States and Canada from photographs (Identifikatsiya ptits, 2016).

Nowadays, technical methods and computer data processing are extremely important for the analysis of field bird observations (Bosak, 1990; Potapov, 1990b; Il'ichev et al., 1975; Ganya et al., 1991). Sound spectroscopy makes it possible to recognize bird species and to study vocalization patterns, including local dialects. Ornithologists at the Cornell Laboratory started to use sonograms to record songs of night birds and to observe the exchange of information between them using alarm calls (Zhambyu, 1988).

In addition to scientific interest, ornithological research is of great practical importance. One of the problems relates to the risk of aircraft collisions with birds. At present, around 2.5-3 thousand aircraft collisions with birds are recorded annually in the world. Nowadays, every large airport has its own ornithological service that studies the migration routes of birds and takes measures to control their abundance.

Technical means, automation of the observation process, and the application of mathematical methods for processing field data allow ornithological research to be performed at a higher level. In the last decade, an applied ornithological database was created by Osadchiy et al. (2015) to accumulate the results of observations of various bird species in the southeastern region of Ukraine. Its content and data reliability depend on a large number of factors: weather conditions, the technical limitations of field surveys, terrain type, and so on. Bird species determination is closely connected to the accuracy of the visual or technical means used. The use of technical means allows expanding the scope of observation and organizing bird counts more precisely. Modern technical facilities significantly expand the possibilities of obtaining more correct data, but they still have some limitations. In Osadchiy et al. (2016) a mathematical model for recognition of bird species was presented that takes into account errors in field measurements. Four parameters were considered as input factors of the model: the total length of the bird, the wingspan, the wingbeat frequency, and the flight speed. Testing the model on a hypothetical case of recognition of rooks, ravens, mallards, storks, and lapwings suggested that this model could be used to some extent. The model recognition algorithm was based on classical mathematical statistics, and the limitations of this model are obvious: it does not consider many characteristics and behavioral features of birds that cannot be represented in numerical form. For example, some species are most active in the morning and others in the afternoon; some species prefer to fly in flocks whereas others do not. In such situations, the possibility of using traditional mathematical statistics is excluded. The present study is devoted to the development of a mathematical method for identifying bird species by cluster analysis (Zhambyu, 1988) using fuzzy logic (Konysheva, Nazarov, 2011) and fuzzy sets (Eremeev, Baryshevskiy, 2011), which extends the possibilities of the algorithm proposed in (Osadchiy et al., 2016).

Methods

The main field characteristics of bird species are: geometric dimensions (length of the bird, wingspan, etc.); wingbeat frequency; flight speed; body weight. The accuracy of the measured parameters depends on the distance to the observer, the technical parameters of observation instruments, the terrain pattern, and weather conditions, among which we selected: air transparency (rain, snow, fog); wind speed and direction; atmospheric pressure; air humidity and temperature. The mathematical model for recognizing the bird species observed in real time is represented in the form

$$W = W(U_1, U_2, \ldots, U_n, V), \tag{1}$$

where $W = 1, 2, \ldots, i, \ldots, m$ is the set of species, $U_1, U_2, \ldots, U_k, \ldots, U_n$ are the bird parameters measured during the observation, and $V$ is the noise interference due to measurement errors.

Each bird species $W = i$ is characterized by a set of properties $U^0_{i1}, U^0_{i2}, \ldots, U^0_{ik}, \ldots, U^0_{in}$, which we unite into the set $U^0_i = \{U^0_{ik}\}$, where $i = 1, 2, \ldots, m$.

The sets $U^0_i$ for all the bird species $W$ form the reference set $U^0 = \{U^0_i\}$. The measurement results $U_1, U_2, \ldots, U_n$ of the unknown species form the set $U = \{U_k\}$, $k = 1, \ldots, n$. The set of measurements $U$ in general does not coincide with any of the sets $U^0_i$. Therefore, the question of whether the observation results belong to one of the reference species should be considered from a probabilistic point of view.

Suppose that in the process of field observations, data on the parameters $U_1, U_2, \ldots, U_n$ of an unknown bird species were obtained, and we need to select from the set of reference objects the species $W$ whose reference values $U^0_{i1}, U^0_{i2}, \ldots, U^0_{in}$ agree best with the measurements. A possible solution of a similar problem for parameters $U_1, U_2, \ldots, U_n$ characterized by continuous random variables was obtained in (Osadchiy et al., 2016). In our case this restriction is removed. Some parameters can be set at a qualitative level: for example, species $i$ is nocturnal while species $i+1$ is diurnal; one bird is black, another is white, etc. Similar problems can be solved with the help of cluster analysis (Zhambyu, 1988) using methods of fuzzy logic (Konysheva, Nazarov, 2011).

Conceptual tool for cluster analysis

Cluster analysis is widely used in the classification of information that depends on many factors (Hartigan, Wong, 1979). The initial data for solving the problem are the measurements $U_1, U_2, \ldots, U_n$ and the set of reference values of these parameters $U^0_{i1}, U^0_{i2}, \ldots, U^0_{ik}, \ldots, U^0_{in}$, $i = 1, 2, \ldots, m$.

This mathematical model is presented in Fig. 1 as a "black box" with reference parameters, where the input factors are the results of measurements and the output factor is the identified species of the observed bird.

Fig. 1. Mathematical model of bird species identification.

The algorithm for solving the problem should select, from the set of known bird species, the species $i$ whose reference indicators $U^0_{i1}, U^0_{i2}, \ldots, U^0_{ik}, \ldots, U^0_{in}$ most likely correspond to the measured values. The set of reference parameters $U^0_{i1}, U^0_{i2}, \ldots, U^0_{ik}, \ldots, U^0_{in}$ for each species $W = i$ will be associated with the reference cluster number $i$. We assign the set of parameters $U = \{U_k\}$, $k = 1, \ldots, n$, to a cluster named "Measurements", which characterizes the results of observations. The simplest formula for determining the distance between the i-th and j-th clusters is (Obzor, 2017):

$$r_{ij} = \sum_{k=1}^{n} f_k \left| U^0_{ik} - U^0_{jk} \right|, \quad i, j = 1, 2, \ldots, m, \tag{2}$$

where $f_k$ is the coefficient by which the significance of the individual parameters can be adjusted. The choice of the formula for the distance measure depends on the problem statement; to obtain the best result it is necessary to experiment with various formulas. The geometric distance in a multidimensional Euclidean space

$$r_{ij} = \left\{ \sum_{k=1}^{n} f_k \left( U^0_{ik} - U^0_{jk} \right)^2 \right\}^{1/2}, \quad i, j = 1, 2, \ldots, m, \tag{3}$$

or its square is often used:

$$r_{ij}^2 = \sum_{k=1}^{n} f_k \left( U^0_{ik} - U^0_{jk} \right)^2, \quad i, j = 1, 2, \ldots, m. \tag{4}$$
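The three measures (2)-(4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy parameter vectors are hypothetical and do not come from the paper.

```python
import math

def city_block(a, b, f):
    """Formula (2): weighted sum of absolute differences."""
    return sum(fk * abs(x - y) for fk, x, y in zip(f, a, b))

def squared_euclidean(a, b, f):
    """Formula (4): weighted sum of squared differences."""
    return sum(fk * (x - y) ** 2 for fk, x, y in zip(f, a, b))

def euclidean(a, b, f):
    """Formula (3): square root of the weighted sum of squared differences."""
    return math.sqrt(squared_euclidean(a, b, f))

# Hypothetical two-parameter clusters with unit weights f_k = 1.
cluster_a = [1.0, 2.0]
cluster_b = [4.0, 6.0]
weights = [1.0, 1.0]

print(city_block(cluster_a, cluster_b, weights))         # 7.0
print(euclidean(cluster_a, cluster_b, weights))          # 5.0
print(squared_euclidean(cluster_a, cluster_b, weights))  # 25.0
```

Raising a weight in `weights` increases the influence of the corresponding parameter on all three distances.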

The order of magnitude of the parameters $U^0_{ik}$ of the same species can differ significantly. For example, the wingbeat frequency can be 1-2 Hz, while the length of the bird is about 100-130 cm, which leads to a disproportionate contribution of the corresponding parameters to the distance calculated by (3) or (4). To level out this shortcoming, a relative difference was used in (Osadchiy et al., 2016):

$$\left( U^0_{ik} - U^0_{jk} \right) / U^0_{ik}. \tag{5}$$

The transformation (5) gives good results at a relatively high measurement accuracy. We denote the variance of the measurement error of $U^0_{ik}$ by $(\sigma^i_k)^2$. As the accuracy decreases and the root-mean-square deviation $\sigma^i_k$ approaches the difference $U^0_{ik} - U^0_{jk}$, it is advisable to use the value

$$Z^{ij}_k = \left( U^0_{ik} - U^0_{jk} \right) / \sigma^{ij}_k \tag{6}$$

instead of the value (5), where $\sigma^{ij}_k$ is the root-mean-square deviation, equal to the square root of the variance $(\sigma^{ij}_k)^2$, which determines the errors of the k-th parameter measurement for bird species with indices $i$ and $j$ in accordance with the formula:

$$(\sigma^{ij}_k)^2 = (\sigma^i_k)^2 + (\sigma^j_k)^2. \tag{7}$$

Results

In the present paper, we propose to calculate the distance between clusters as

$$r_{ij} = \left\{ \sum_{k=1}^{n} \left[ f_k \left( Z^{ij}_k \right)^2 \right] \right\}^{1/2}, \quad i, j = 1, 2, \ldots, m, \tag{8}$$

instead of formula (3), where the relative difference between two values of the same parameter is determined by formula (6). As mentioned earlier, some parameters are qualitative. This excludes the possibility of estimating them with a continuous set of real numbers and, therefore, of calculating distances directly by formulas (2), (3), (4), and (8). In this case, we use fuzzy definitions. Let us consider three examples.

The first example.

Let one of the parameters of the observed species describe the degree of blackness of the bird plumage. The values of the parameter "Black color" are given by numbers from 0 to 2.86 (the second column on the left in Table 1), namely: the value "Absolutely black" is 2.86, the value "Rather black than other color" is 1.66, etc. The values of U from 0 to 2.86 in Table 1 should be considered as reference points for an approximate estimation of plumage color. The corresponding probabilities are given in the third column of Table 1: the value "Absolutely black" corresponds to 1, the value "Rather black than other color" to 0.90, etc. The numerical characteristics of the parameter U in Tables 1, 2, and 3 were chosen so that the random variable (6) obeys a normal distribution with zero mathematical expectation and unit variance.

Table 1. Quantitative characteristics of plumage blackness in observed bird species.

Parameter for bird plumage color   Value, U   Probability, P(U)
"Absolutely black"                 2.88       ≈1.0
"Rather black"                     1.66       0.90
"Undefined"                        0.97       0.67
"Rather non-black"                 0.46       0.35
"Non-black"                        0.0        0.0
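One reading that is numerically consistent with Table 1 is that P(U) is the two-sided probability P(|Z| < U) of a standard normal variable. This interpretation is an assumption (the text does not state the mapping explicitly), and the function name below is hypothetical. A minimal check:

```python
import math

def two_sided_normal_probability(u):
    """P(|Z| < u) for a standard normal Z, expressed through the error function."""
    return math.erf(u / math.sqrt(2.0))

# (value U, probability P(U)) pairs as listed in Table 1.
table_1 = {
    "Absolutely black": (2.88, 1.0),
    "Rather black": (1.66, 0.90),
    "Undefined": (0.97, 0.67),
    "Rather non-black": (0.46, 0.35),
    "Non-black": (0.0, 0.0),
}

for label, (u, p_table) in table_1.items():
    p_model = two_sided_normal_probability(u)
    print(f"{label:18s} U={u:4.2f}  table={p_table:4.2f}  model={p_model:5.3f}")
```

Under this assumption the tabulated probabilities are reproduced to within about 0.01.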

The second example.

Let one of the parameters characterize the daily activity of a bird species over three periods: "Morning" (until 10.00), "Daytime" (from 10.00 to 15.00), and "Evening" (from 15.00).

Table 2. Daytime bird activity.

Daily activity parameter   Value, U   Probability, P(U)
Before 10.00 AM            2.88       ≈1.0
From 10.00 AM to 3.00 PM   0.97       0.67
From 3.00 PM               0.0        0.0

The third example.

It is known that some bird species demonstrate flocking behavior. An individual in a flock spends less time watching for danger and more time feeding. On the other hand, in this case part of the energy is spent on social conflicts (fights, demonstrative behavior) (Matematicheskie modeli, 2017). This can be represented by fuzzy logic, as illustrated in Table 3.

Table 3. Estimation of flocking behavior tendency.

Spatial parameter    Value, U   Probability, P(U)
Flock pattern        2.88       ≈1.0
Individual pattern   0.0        0.0

If the relative difference (6) is used in formula (4) instead of the difference $U^0_{ik} - U^0_{jk}$, then the square of the distance between the clusters is determined by the formula:

$$R_{ij} = \sum_{k=1}^{n} \left[ f_k \left( Z^{ij}_k \right)^2 \right], \quad i, j = 1, 2, \ldots, m. \tag{9}$$

In the literature, a more general power-law measure of this distance is also used:

$$S_{ij} = \left\{ \sum_{k=1}^{n} \left[ f_k \left( Z^{ij}_k \right)^p \right] \right\}^{1/r}, \quad i, j = 1, 2, \ldots, m. \tag{10}$$
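The role of the two exponents in the power-law measure can be illustrated with a short sketch. The function name and the toy values are hypothetical, and taking the absolute value of each normalized difference is an assumption made here so that odd values of p also give a non-negative result. With p = 2 the measure reduces to the square root of the weighted sum of squares when r = 2, and to the sum itself when r = 1.

```python
def power_measure(z, f, p, r):
    """Power-law measure: (sum_k f_k * |z_k|**p) ** (1/r).

    z : normalized differences Z_k for one pair of clusters
    f : weights f_k
    """
    return sum(fk * abs(zk) ** p for fk, zk in zip(f, z)) ** (1.0 / r)

# Hypothetical normalized differences for two parameters, unit weights.
z = [3.0, 4.0]
f = [1.0, 1.0]

root_form = power_measure(z, f, p=2, r=2)     # square root of the sum of squares
squared_form = power_measure(z, f, p=2, r=1)  # sum of squares

print(root_form, squared_form)
```

Because the squared form is a monotone function of the root form, both give the same ranking of candidate species; they differ in how strongly large deviations stand out.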

For p = r = 2, expression (10) coincides with formula (8), and for p = 2 and r = 1 with formula (9). It is not possible to give preference to any one of the expressions (8), (9), and (10); the choice depends on the properties of the clusters. The optimal choice of a metric is made on specific material and requires additional research.

Distances between reference clusters

Reference data for some bird species are given in Table 4. They contain parameters determined by a continuous set of real numbers (the length of the bird, the wingspan, the wingbeat frequency, the flight speed) and qualitative parameters (the plumage blackness, species activity at different times of day, flocking behavior tendency) in accordance with the data of Tables 1-3. The first four parameters for rook, raven, mallard, stork, and lapwing are taken from (Osadchiy et al., 2016), whereas the last three were suggested by experts. Let us consider the mallard as an example of plumage color estimation. According to experts, its degree of blackness on the "black - not black" scale can be accepted as "difficult to determine" ("Undefined" in Table 1). Therefore, this value, U, is taken to be 0.97 according to Table 1. The error of the set value can be determined from the nearest upper and lower values of the blackness parameter in Table 1. The nearest upper value is 1.66. Adding half the difference between 1.66 and 0.97 to 0.97, we get the maximum estimated value equal to 1.31. The closest lower value for this parameter is 0.46. By subtracting from 0.97 half the difference between 0.97 and 0.46, we get the minimum estimated value equal to 0.72. All cells of Table 4 were filled in a similar way. The parameters of individuals can vary considerably depending on the habitat and time of observation; therefore the authors do not claim that the values in Table 4 are exact.
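The half-way rule used above for the mallard can be written as a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and the handling of the end points of the scale (the interval stops at the end point itself) is inferred from the ranges listed in Table 4 rather than stated in the text.

```python
def interval_from_scale(scale, value):
    """Return (lower, upper) bounds placed half-way to the neighbouring scale points.

    At either end of the scale the bound is the end point itself (an assumption).
    """
    points = sorted(scale)
    idx = points.index(value)
    lower = value if idx == 0 else value - (value - points[idx - 1]) / 2.0
    upper = value if idx == len(points) - 1 else value + (points[idx + 1] - value) / 2.0
    return lower, upper

# Reference points of the plumage blackness scale from Table 1.
BLACKNESS_SCALE = [0.0, 0.46, 0.97, 1.66, 2.88]

low, high = interval_from_scale(BLACKNESS_SCALE, 0.97)
print(round(low, 3), round(high, 3))  # 0.715 1.315, i.e. the 0.72-1.31 range quoted for the mallard
```

Applied to the other scale points, the same rule gives, for example, 0.0-0.23 for "Non-black" and 2.27-2.88 for "Absolutely black".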

Table 4. The reference values Ui for some bird species.

Species       Bird length,  Wingspan,  Wingbeat frequency,  Flight speed,  Plumage darkness,  Daily activity,  Spatial pattern,
              U1, cm        U2, cm     U3, Hz               U4, km/h       U5                 U6               U7
Rook, W1      60-70         130-140    3-4                  50-60          2.27-2.88          0.48-1.90        1.43-2.86
Raven, W2     60-70         120-130    3-4                  40-50          1.32-2.26          0.48-1.90        0.0-1.42
Mallard, W3   57-62         85-95      5-7                  72-97          0.72-1.31          1.91-2.86        1.43-2.86
Stork, W4     100-115       155-165    1.5-2.5              35-45          0.0-0.23           0.0-0.47         0.0-1.42
Lapwing, W5   27-33         78-88      40-45                95-105         0.24-0.71          0.0-0.47         0.0-1.42

At the first stage, we calculated the distances $r_{ij}$ between the reference clusters by formula (8) with the coefficients $f_k = 1$. It was assumed that the root-mean-square deviations $\sigma^i_k$ and $\sigma^j_k$ in formula (7) are equal to half the difference between the maximum and minimum values of the reference parameters in Table 4. The results of the calculations are presented in Table 5.
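The calculation described above can be sketched as follows. The function names and the two toy species are hypothetical, and using the midpoint of each range as the reference value is an assumption. The sketch shows the mechanics of formulas (6)-(9) and is not meant to reproduce the published tables.

```python
import math

def midpoint_and_sigma(lo, hi):
    """Reference value (range midpoint, an assumption) and sigma = half the range width."""
    return (lo + hi) / 2.0, (hi - lo) / 2.0

def normalized_differences(ranges_i, ranges_j):
    """Z_k per formulas (6) and (7) for two clusters given as lists of (min, max)."""
    z = []
    for (lo_i, hi_i), (lo_j, hi_j) in zip(ranges_i, ranges_j):
        mid_i, sigma_i = midpoint_and_sigma(lo_i, hi_i)
        mid_j, sigma_j = midpoint_and_sigma(lo_j, hi_j)
        sigma_ij = math.sqrt(sigma_i ** 2 + sigma_j ** 2)  # formula (7)
        # Note: a parameter with zero width in both clusters would divide by zero.
        z.append((mid_i - mid_j) / sigma_ij)               # formula (6)
    return z

def distance_R(ranges_i, ranges_j, f=None):
    """Formula (9): weighted sum of squared normalized differences."""
    z = normalized_differences(ranges_i, ranges_j)
    f = f if f is not None else [1.0] * len(z)
    return sum(fk * zk ** 2 for fk, zk in zip(f, z))

def distance_r(ranges_i, ranges_j, f=None):
    """Formula (8): square root of formula (9)."""
    return math.sqrt(distance_R(ranges_i, ranges_j, f))

# Two hypothetical species described by two parameters each.
species_a = [(10.0, 20.0), (1.0, 3.0)]
species_b = [(20.0, 30.0), (3.0, 5.0)]

print(distance_R(species_a, species_b))  # close to 4.0
print(distance_r(species_a, species_b))  # close to 2.0
```

In this toy case each parameter differs by one full range width, so each contributes a squared normalized difference of 2.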

Table 5. Distances between the reference clusters $r_{ij}$ calculated by formula (8).

              W1, Rook   W2, Raven   W3, Mallard   W4, Stork   W5, Lapwing
W1, Rook      0          1.005       3.09          3.83        7.95
W2, Raven     1.005      0           2.44          2.98        7.75
W3, Mallard   3.09       2.44        0             5.01        6.12
W4, Stork     3.83       2.98        5.01          0           9.13
W5, Lapwing   7.95       7.75        6.12          9.13        0

The distance between clusters i and j is equal to the distance between clusters j and i, so Table 5 is symmetrical with respect to the diagonal elements.

The calculation of the diagonal elements themselves requires additional explanation. Formally, according to expression (6), the value $Z^{ij}_k$ for $i = j$ is 0. Therefore, the diagonal elements $r_{ii}$ calculated by formula (8) are also equal to 0, which is reflected in Table 5.

Such an approach to computing $r_{ii}$ is justified when the reference values of the parameters $U^0_{ik}$ are specified with absolute accuracy. In fact, each of these parameters is defined in a certain probability interval $[U^{0,\min}_{ik}, U^{0,\max}_{ik}]$.

Suppose there are several reference databases. We denote the values of the same parameter for species $i$ in different databases as $U^{0,1}_{ik}$ and $U^{0,2}_{ik}$. Since $U^{0,1}_{ik}$ and $U^{0,2}_{ik}$ are within the interval $[U^{0,\min}_{ik}, U^{0,\max}_{ik}]$, their values differ by not more than $U^{0,\max}_{ik} - U^{0,\min}_{ik}$, i.e. by a quantity of the order $(U^{0,\max}_{ik} - U^{0,\min}_{ik})/2$ on average. Earlier it was indicated that the standard deviation $\sigma^i_k$, which characterizes the errors of parameter $k$ for species $i$, was assumed to be equal to

$$\sigma^i_k = \left( U^{0,\max}_{ik} - U^{0,\min}_{ik} \right) / 2.$$

Therefore, the value $Z^{ii}_k = \left| U^{0,1}_{ik} - U^{0,2}_{ik} \right| / \sigma^{ii}_k$, calculated with regard to formula (7), is $1/\sqrt{2}$. In this case, the diagonal elements, equal to the distance between $U^{0,1}_{ik}$ and $U^{0,2}_{ik}$ calculated by formula (8), will be equal to $\sqrt{3.5} = 1.87$. Replacing the zero values by 1.87 in the diagonal elements of Table 5, we rewrite it (Table 6).
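The arithmetic behind the value 1.87 can be spelled out as a short worked derivation, assuming $f_k = 1$ and $n = 7$ parameters as in Table 4, and taking the typical difference between two database values to be equal to $\sigma^i_k$:

```latex
\[
\left| U^{0,1}_{ik} - U^{0,2}_{ik} \right| \approx \sigma^i_k ,
\qquad
\sigma^{ii}_k = \sqrt{(\sigma^i_k)^2 + (\sigma^i_k)^2} = \sqrt{2}\,\sigma^i_k
\quad\Longrightarrow\quad
Z^{ii}_k \approx \frac{\sigma^i_k}{\sqrt{2}\,\sigma^i_k} = \frac{1}{\sqrt{2}} ,
\]
\[
r_{ii} = \left\{ \sum_{k=1}^{7} \left( \frac{1}{\sqrt{2}} \right)^{2} \right\}^{1/2}
       = \sqrt{7 \cdot 0.5} = \sqrt{3.5} \approx 1.87 .
\]
```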

Table 6. Distances between the reference clusters $r_{ij}$ calculated by formula (8), taking into account the errors of the diagonal elements.

              W1, Rook   W2, Raven   W3, Mallard   W4, Stork   W5, Lapwing
W1, Rook      1.87       1.005       3.09          3.83        7.95
W2, Raven     1.005      1.87        2.44          2.98        7.75
W3, Mallard   3.09       2.44        1.87          5.01        6.12
W4, Stork     3.83       2.98        5.01          1.87        9.13
W5, Lapwing   7.95       7.75        6.12          9.13        1.87

All distances in Table 6 are calculated by the same formulas for the normalized values of the parameters (6). Since the diagonal elements of this table, which are equal to 1.87, characterize the errors of setting the reference parameters, all distances $r_{ij}$ between clusters that are less than 1.87 can be regarded as insignificant, and the corresponding species are considered indistinguishable. Since $r_{12} = 1.005 < 1.87$, distinguishing the raven from the rook is impossible at the considered measurement accuracy.

The distances $r_{14} = 3.83$, $r_{15} = 7.95$, $r_{25} = 7.75$, $r_{34} = 5.01$, $r_{35} = 6.12$, and $r_{45} = 9.13$ are several times larger than the diagonal element 1.87, so the pairs "rook" - "stork", "rook" - "lapwing", "raven" - "lapwing", "duck" - "stork", "duck" - "lapwing", and "stork" - "lapwing" can be regarded as well distinguishable. The distances $r_{23} = 2.44$, $r_{24} = 2.98$, and $r_{13} = 3.09$ are only slightly larger than the diagonal element. Therefore, with some confidence, we can assume that the pairs "raven" - "duck", "raven" - "stork", and "rook" - "duck" are distinguishable rather than indistinguishable. To obtain more reliable conclusions, the distances $R_{ij}$ between the reference clusters were calculated by formula (9). The results of the calculations are given in Table 7.

Table 7. Distances between the reference clusters $R_{ij}$ calculated by formula (9).

              W1, Rook   W2, Raven   W3, Mallard   W4, Stork   W5, Lapwing
W1, Rook      0          1.01        9.56          14.66       63.15
W2, Raven     1.01       0           5.93          8.88        60.13
W3, Mallard   9.56       5.93        0             25.15       37.51
W4, Stork     14.66      8.88        25.15         0           83.27
W5, Lapwing   63.15      60.13       37.51         83.27       0

The deviation of the reference parameter $U^0_{jk}$ from the reference parameter $U^0_{ik}$ is a random variable that obeys a normal distribution with a variance approximately equal to $(\sigma^{ij}_k)^2$. The normalized parameter (6) is also a random variable that obeys a normal distribution with a mathematical expectation equal to zero and a variance equal to unity. Therefore, the sum of squared deviations (9) for two random species with indices i and j obeys the Pearson $\chi^2$ (chi-square) distribution with k degrees of freedom (Kremer, 2004). This is the distribution of the sum

$$\chi^2 = \sum_{i=1}^{k} Z_i^2, \tag{11}$$

where $Z_i$ is a random variable that obeys the normal distribution law with zero mathematical expectation and unit variance. The density of the $\chi^2$ distribution is:

$$\varphi(x) = \begin{cases} \dfrac{x^{k/2 - 1} e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}, & x \ge 0, \\[2ex] 0, & x < 0, \end{cases}$$

where $\Gamma(y) = \int_0^{\infty} e^{-t} t^{y-1}\, dt$ is the Euler gamma function, which equals $(y-1)!$ for positive integers $y$.

Since the distances $R_{ij}$ between the clusters in Table 7 obey the distribution (11), we can use the Pearson criterion to test the hypothesis $H_0$ that the distance between two reference species is equal to zero. The critical values $\chi^2_{cr}$ for k degrees of freedom are given in Table 8.

Table 8. Critical values of $\chi^2_{cr}$ for significance level q = 0.3 (Obzor, 2017).

k               3      4      5      6      7
$\chi^2_{cr}$   3.66   4.88   6.06   7.23   8.38
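The tabulated critical values can be cross-checked by integrating the chi-square density numerically. The sketch below uses only the standard library; the function names and the choice of Simpson's rule are illustrative assumptions, not part of the original method. It is intended for k of 2 or more, where the density is finite at zero.

```python
import math

def chi2_density(x, k):
    """Chi-square density with k degrees of freedom (zero for negative x)."""
    if x < 0:
        return 0.0
    return x ** (k / 2.0 - 1.0) * math.exp(-x / 2.0) / (2.0 ** (k / 2.0) * math.gamma(k / 2.0))

def upper_tail(x, k, steps=2000):
    """P(chi2 > x), computed as 1 minus a Simpson's-rule integral of the density on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = chi2_density(0.0, k) + chi2_density(x, k)
    for n in range(1, steps):
        total += (4 if n % 2 else 2) * chi2_density(n * h, k)
    return 1.0 - total * h / 3.0

# Critical values from Table 8 (significance level q = 0.3).
table_8 = {3: 3.66, 4: 4.88, 5: 6.06, 6: 7.23, 7: 8.38}

for k, critical in table_8.items():
    print(k, critical, round(upper_tail(critical, k), 3))
```

Each printed tail probability should be close to 0.3, the significance level stated for Table 8.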

The number of parameters k in our case is equal to 7. According to Table 8, the critical value $\chi^2_{cr}$ is equal to 8.38. The distances between clusters for all pairs of birds, except for the pairs "rook" - "duck" and "raven" - "duck", are much larger than the critical value of 8.38, so it can be argued that, despite the errors in specifying the parameters of birds, most species can be regarded as distinguishable (Table 7).

We note that the measure (9) is more sensitive than (8) when species are identified. Thus, the measure (9) in Table 7, in contrast to the measure (8) in Table 6, allows us to distinguish the clusters in the pairs "rook" - "duck" and "raven" - "duck".

Testing the "Measurements" cluster against the reference clusters

The results of measurements $U_1, U_2, \ldots, U_n$ are merged into a cluster called "Measurements". We assign it the number m+1. The distance between the "Measurements" cluster and any reference cluster with the number i in Euclidean space is determined by formula (8) or (9):

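$$r_{i,m+1} = \left\{ \sum_{k=1}^{n} \left[ f_k \left( Z^{i,m+1}_k \right)^2 \right] \right\}^{1/2}, \quad i = 1, 2, \ldots, m, \tag{12}$$

or

$$R_{i,m+1} = \sum_{k=1}^{n} \left[ f_k \left( Z^{i,m+1}_k \right)^2 \right], \quad i = 1, 2, \ldots, m, \tag{13}$$

where

$$Z^{i,m+1}_k = \left( U^0_{ik} - U^{m+1}_k \right) / \sigma^{i,m+1}_k. \tag{14}$$

The power function for calculating the distance can be written similarly to (10) in the form:

$$S_{i,m+1} = \left\{ \sum_{k=1}^{n} \left[ f_k \left( Z^{i,m+1}_k \right)^p \right] \right\}^{1/r}, \quad i = 1, 2, \ldots, m. \tag{15}$$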

We now consider an algorithm for identifying an unknown species using the reference parameters presented in Table 4. Suppose that observations of an unknown species yielded the data presented in Table 9.

Table 9. Parameter values for identification of an unknown bird species ("Measurements" cluster, number m + 1 = 6).

Species   Bird length,  Wingspan,  Wingbeat frequency,  Flight speed,  Plumage darkness   Daily activity   Distribution pattern
          cm            cm         Hz                   km/h
Unknown   50-60         80-90      4-6                  70-90          0.24-0.71          1.91-2.86        0.0-1.42

The distance $r_{i6}$ between the "Measurements" cluster and the reference clusters was computed using formula (12). It was assumed that the root-mean-square deviation $\sigma^6_k$ in formula (14) is half the difference between the maximum and minimum values of the parameters of the "Measurements" cluster (Table 9), and the standard deviation $\sigma^i_k$ in the same formula is half the difference between the maximum and minimum values of the reference parameters in Table 4. The results of the calculations are presented as a ranking series:

$$r_{\text{mallard},6} = 0.86 < r_{\text{raven},6} = 2.64 < r_{\text{rook},6} = 3.42 < r_{\text{stork},6} = 4.89 < r_{\text{lapwing},6} = 5.82. \tag{16}$$

It is clear from (16) that the mallard is the most probable identification of the observed object, followed by the raven, rook, stork, and lapwing (in descending order of probability).

In some cases, when one of the criteria differs by an order of magnitude from the alternatives, such mathematical processing of data can fully satisfy the researcher. The probability of identification itself remains unknown, although some preliminary conclusions can be obtained from the following considerations.

The root-mean-square deviations $\sigma^6_k$ and $\sigma^i_k$ characterize the variances of the k-th parameter for the "Measurements" cluster and for the reference parameter of the i-th species. The calculated distances $r_{i6}$, formula (12), are equal to the square root of the sum of squares of the factors $Z^{i,m+1}_k$ given by formula (14). The numerator of $Z^{i,m+1}_k$ is equal to the difference between the reference cluster parameter $U^0_{ik}$ and the "Measurements" cluster parameter $U^{m+1}_k$. The denominator is equal to the root-mean-square deviation $\sigma^{i,m+1}_k$, which characterizes the measurement error. For $Z^{i,m+1}_k < 1$, the value $\sigma^{i,m+1}_k$ is not less than the difference $|U^0_{ik} - U^{m+1}_k|$, which indicates the coincidence of these parameters with a probability of at least 0.5-0.6.

Since the number of parameters in our case is 7, the coincidence of the clusters "Measurements" and "Mallard" has a probability approximately equal to (0.5-0.6)/7 = 0.1. Similar reasoning allows us to conclude that there is a low probability of coincidence of the unknown bird with the stork and the lapwing. The question of identifying the unknown species with the raven or the rook requires additional analysis, which we performed by formula (13). The results of the calculations are presented below as a ranking series:

$$R_{\text{mallard},6} = 0.74 < R_{\text{raven},6} = 6.99 < R_{\text{rook},6} = 11.73 < R_{\text{stork},6} = 23.89 < R_{\text{lapwing},6} = 33.89. \tag{17}$$
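The ranking series (17) can be screened against the critical value 8.38 listed in Table 8 for seven degrees of freedom at q = 0.3. The sketch below is only an illustration of that comparison; the variable names are hypothetical.

```python
# Critical chi-square value for k = 7, q = 0.3 (Table 8).
CRITICAL_CHI2 = 8.38

# Distances R_i6 from the ranking series (17).
ranking_17 = {
    "mallard": 0.74,
    "raven": 6.99,
    "rook": 11.73,
    "stork": 23.89,
    "lapwing": 33.89,
}

# Sort candidates by distance and split them at the critical value.
ordered = sorted(ranking_17.items(), key=lambda item: item[1])
retained = [species for species, R in ordered if R <= CRITICAL_CHI2]
excluded = [species for species, R in ordered if R > CRITICAL_CHI2]

print("retained:", retained)  # ['mallard', 'raven']
print("excluded:", excluded)  # ['rook', 'stork', 'lapwing']
```

Species whose distance exceeds the critical value are rejected as candidates; those below it cannot be told apart from the measurements at this significance level.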

The distances $R_{i6}$ in (17) obey the $\chi^2$ distribution. According to Table 8, the critical value of Pearson's criterion for seven degrees of freedom at a significance level of q = 0.3 is equal to 8.38. Therefore, the last three species in (17), rook, stork, and lapwing, should be excluded from consideration as candidates for the unknown bird. The question of the coincidence or difference of the clusters "Measurements" and "Raven" remains open. The answer can be obtained by using more precise measuring devices.

Impact of measurement errors on the validity of unknown bird identification

The reliability of unknown bird identification depends on two standard deviations. One of them, $\sigma^{m+1}_k$, determines the error in measuring the k-th parameter of the unknown bird; the other, $\sigma^i_k$, is the error in setting the k-th parameter for the reference cluster of the i-th species. In this paper, the calculation of distances between clusters was carried out using formulas (12) and (13) in a multidimensional Euclidean space. The values calculated from formula (13) allow us to determine the reliability of the conclusions using the statistical distribution $\chi^2$. Therefore, when studying the correctness of the identification of an unknown species, we use this formula.

Table 10 presents the results of calculations of $\chi^2$ for various dispersions $(\sigma^{i,m+1}_k)^2$ at fixed values of the average reference parameters given in Table 4 and of the average parameters of the unknown species for the "Measurements" cluster (Table 9). The second column of values in Table 10 corresponds to the values $(\sigma^{i,m+1}_k)^2$ used in the construction of the ranking series (17). The first, third, and fourth columns of values are obtained for the values $(\sigma^{i,m+1}_k)^2$ multiplied by 0.5, 2.0, and 4.0, respectively.

Table 10. Influence of the dispersion $(\sigma^{i,m+1}_k)^2$ on the distance between the reference clusters and the "Measurements" cluster, calculated by formula (13).

Species   0.5 $(\sigma^{i,6}_k)^2$   1.0 $(\sigma^{i,6}_k)^2$   2.0 $(\sigma^{i,6}_k)^2$   4.0 $(\sigma^{i,6}_k)^2$
Rook      23.46                      11.73                      5.86                       2.93
Raven     13.98                      6.99                       3.48                       1.74
Mallard   1.48                       0.74                       0.37                       0.18
Stork     47.78                      23.89                      11.94                      5.98
Lapwing   67.78                      33.89                      16.95                      8.7

The critical value of the Pearson criterion for seven degrees of freedom at a significance level of q = 0.3 is 8.38. Even with a twofold decrease in measurement accuracy relative to the standard value corresponding to the second column of values in Table 10, we can state that the clusters "Measurements" - "rook", "Measurements" - "stork", and "Measurements" - "lapwing" do not coincide. The coincidence between the clusters "Measurements" and "Raven" can be examined using more precise methods of observation. Reducing the dispersion $(\sigma^{i,6}_k)^2$ by two times will increase the distance between these clusters to 13.98 and allow an appropriate conclusion to be drawn.
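The dependence on the dispersion follows directly from formula (13): each term is a squared difference divided by a variance, so multiplying every variance by the same factor c divides the distance by c. A minimal sketch of this scaling is given below; the function name is hypothetical, and it assumes that one common factor is applied to all seven variances.

```python
def rescale_distance(R_baseline, variance_factor):
    """Distance (13) after multiplying every variance by variance_factor."""
    return R_baseline / variance_factor

# Baseline distances from the ranking series (17).
baseline = {"rook": 11.73, "raven": 6.99, "mallard": 0.74, "stork": 23.89, "lapwing": 33.89}

CRITICAL_CHI2 = 8.38  # Table 8, k = 7, q = 0.3

for factor in (0.5, 1.0, 2.0, 4.0):
    scaled = {species: rescale_distance(R, factor) for species, R in baseline.items()}
    rejected = [species for species, R in scaled.items() if R > CRITICAL_CHI2]
    print(factor, {s: round(R, 2) for s, R in scaled.items()}, "rejected:", rejected)
```

Halving the variances doubles every distance, which is how the raven reaches 13.98 and moves above the critical value.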

Conclusions

In this paper, we proposed an algorithm for identifying bird species using cluster analysis. This method is based on the concept of a "cluster" (Konysheva, Nazarov, 2011), which in this case consists of a set of parameters that characterize a certain bird species. As an example, the clusters related to the rook, raven, mallard, white stork, and lapwing were considered. The number of characteristics in a cluster is not limited. Numerical results of the cluster analysis were obtained for a set of seven parameters: the length of the bird, the wingspan, the wingbeat frequency, the flight speed, the plumage blackness, the daily activity pattern, and the tendency to fly in a flock. The distance in a multidimensional Euclidean space, formulas (8), (9), (12), and (13), was chosen for the calculations.

The distances between the reference species, calculated from formula (8), are given in Table 6. The calculated values allow us to determine the cluster differences in the pairs "rook" - "raven", "rook" - "duck", "rook" - "white stork", "rook" - "lapwing", "raven" - "mallard duck", "raven" - "white stork", "raven" - "lapwing", "mallard duck" - "white stork", "mallard duck" - "lapwing", and "white stork" - "lapwing". The smallest distance, 1.005, obtained using formula (8), refers to the pair "rook" - "raven", which indicates only a slight difference between these species in the observation parameters presented in Table 4 and reflects the systematic closeness of the species. The largest distance, 9.13, was calculated for the pair "white stork" - "lapwing". Application of formula (9) is more informative (see Table 7). In this case, the smallest distance, 1.01, was also obtained for the pair "rook" - "raven", and the largest distance, 83.27, for the pair "white stork" - "lapwing". The method was tested by cluster analysis of the observation parameters of an unknown species with fuzzy data on the wingspan, wingbeat frequency, flight speed, plumage blackness, daily activity, and tendency to fly in a flock. The Pearson criterion value obtained for the discussed case indicated, with a high degree of significance, that the unknown observed species could be a mallard.

References

Eremeev, V.S., Baryshevskiy, S.O. (2011). Graficheskiy metod resheniya zadach nechetkogo lineynogo programmirovaniya s chetko postavlennoy tsel'yu pri nechetkikh ogranicheniyakh. Geometrichne modelyuvannya i informatsiyni tekhnologii proektuvannya: Tavrian State Agrotechnical Academy, 4(49), 27-32 (in Russian).

Ganya, I.M., Zubkov, N.I., Kotyatsy, M.I. (1991). Radiolokatsionnaya ornitologiya. Kishinev: Shtiintsa (in Russian).

Hartigan, J.A., Wong, M.A. (1979). Algorithm AS 136: A k-means clustering algorithm. Applied Statistics, 28(1), 100-108.

Identifikatsiya ptits po peniyu. Available from: http://muz4in.net/news/instrument vtorojmirovoj vojny kotoryj izmenil nashi sposoby izuchenija penija ptic/2015-12-11 -39847/ (Accessed on 15.05.2017).

Identifikatsiya ptits. Available from: https://nplus1.ru/news/2015/06/09/hitchcock-knowed-about-birds-better/ (Accessed on 15.05.2017).

Il'ichev, V.D., Vasil'ev, B.D., Zhantiev, R.D. (1975). Bioakustika. Moscow: Vysshaya shkola (in Russian).

Konysheva, L.K., Nazarov, D.M. (2011). Osnovy teorii nechetkikh mnozhestv. Saint Petersburg: BKhV-Peterburg Press (in Russian).

Kremer, N.Sh. (2004). Teoriya veroyatnostey i matematicheskaya statistika. Moscow: YuNITI-DANA (in Russian).

Kto i kak spasaet samolety ot ptits. Available from: http://www.yaplakal.com/forum3/topic1404151.html/ (Accessed on 15.05.2017).

Matematicheskie modeli staynogo povedeniya kulikov. Available from: http://dom-i-zveri.ru/povadki-ptic/matematicheskie-modeli-stajnogo-povedeniya-kulikov.html/ (Accessed on 15.05.2017).

Obzor algoritmov klasterizatsii dannykh. Available from: http://habrahabr.ru/post/101338/ (Accessed on 15.05.2017).

Opredelitel' ptits stran SNG. Available from: http://onbird.ru/opredelitel-ptic/p8/ (Accessed on 15.05.2017).

Osadchiy, V.V., Siokhin, V.D., Gorlov, P.I., Vasil'ev, V.M., Pechers'ki, P.I. (2015). Komp'yuterna programa "Web portal formuvannya informatsiyno'i bazi danikh z migratsii' ptakhiv v Azovo-Chornomors'komu regioni Ukrai'ni". Ukrainian Patent 62480 from 12.11.2015 (in Ukrainian).

Osadchyi, V.V., Matsyura, A.V., Eremeev, V.S. (2016). Mathematical model of bird species identifying: implication of radar data processing. Biological Bulletin of Bogdan Chmelnitskiy Melitopol State Pedagogical University, 6(3), 463-471. doi: http://dx.doi.org/10.15421/2016119

Potapov, E.R. (1990a). Uchet khishchnykh ptits v ravninnykh tundrakh (pp. 12-16). In Metody izucheniya i okhrany khishchnykh ptits. Metodicheskie rekomendatsii. E.P. Kryukova (Ed.). Tver': Oblastnaya tipografiya (in Russian).

Potapov, E.R. (1990b). Bioradiotelemetriya v izuchenii khishchnykh ptits: sredstva i vozmozhnosti (pp. 1138-164). In Metody izucheniya i okhrany khishchnykh ptits. Metodicheskie rekomendatsii. E.P. Kryukova (Ed.). Tver': Oblastnaya tipografiya (in Russian).

Ryabitsev, V.K. (2001). Ptitsy Urala, Priural'ya i Zapadnoy Sibiri. Spravochnik-opredelitel'. Ekaterinburg: Ural University Press (in Russian).

V N'yu-Yorke ubili 70 tysyach ptits dlya bezopasnykh poletov. Available from: http://comments.ua/world/571583-v-nyu-yorke-ubili-70-tisyach-ptits.html/ (Accessed on 15.05.2017).

Zhambyu, M. (1988). Ierarkhicheskiy klaster-analiz i sootvetstviya. Moscow: Finansy i statistika (in Russian).

Citation:

Osadchyi, V.V., Yeremeev, V.S., Matsyura, A.V. (2017). Cluster analysis, fuzzy sets, and fuzzy logic models in bird identification. Ukrainian Journal of Ecology, 7(2), 96-103.

This work is licensed under a Creative Commons Attribution 4.0 License.
