RUSSIAN JOURNAL OF EARTH SCIENCES, VOL. 18, ES2001, doi:10.2205/2018ES000618, 2018
Short introduction into DMA
S. M. Agayan1, Sh. R. Bogoutdinov1,2, and R. I. Krasnoperov1
Received 21 January 2018; accepted 26 February 2018; published 6 March 2018.
Discrete Mathematical Analysis (DMA) is a new approach to data analysis that is being developed at the Geophysical Center of the Russian Academy of Sciences. Earlier papers are mainly devoted to applied research and to solving specific problems in various areas of the Earth sciences, such as detection of geophysical anomalies, monitoring of geophysical processes, and seismic zoning. The goal of this paper is, to a certain degree, the opposite: to give a formal mathematical description of the principles that form the basis of DMA. KEYWORDS: Fuzzy mathematics; discrete mathematical analysis; discontinuity; anomality; clusters; trends; geophysical monitoring.
Citation: Agayan, S. M., Sh. R. Bogoutdinov, and R. I. Krasnoperov (2018), Short introduction into DMA, Russ. J. Earth. Sci., 18, ES2001, doi:10.2205/2018ES000618.
Introduction
Data mining in the natural sciences can be presented very schematically as follows (Figure 1). Nowadays data are analyzed and processed mainly with classical methods such as statistical analysis, time-frequency signal analysis, wavelet analysis, fractal analysis, and mathematical morphology, which is currently gaining popularity.
For all their advantages, most of these methods are excessively robust (rigid) because of their mathematical origin. This means that the object being studied (more precisely, its model) has to meet certain preliminary criteria (stationarity, normality, regularity, etc.). If it fails to meet them, problems may occur. Previously such problems were solved by simplifying the model.
In recent times, owing to the development of computational tools, more gentle and undemanding approaches have been introduced (combinatorial exhaustive search, simulation modelling, neural networks, etc.).
1 The Geophysical Center of RAS, Moscow, Russia
2 Institute of Physics of the Earth of RAS, Moscow, Russia
Copyright 2018 by the Geophysical Center RAS. http://rjes.wdcb.ru/doi/2018ES000618-res.html
Figure 1. Scheme of a research (diagram blocks: Object, Model, Data, Theory, Data analysis, New knowledge).
The presented scheme (Figure 1) does not include the researcher, although the researcher's role is important even when a firm theory exists (e.g. during the discussion and interpretation of results) and is absolutely essential in other cases. A more accurate scheme is shown in Figure 2. It represents the situation in geology and geophysics (which is particularly close to the authors' practice): the researcher's role in this area is extremely important, since data and knowledge tend to be irregular and ill-defined.
Figure 2. The final scheme of data analysis (diagram blocks: Data analysis, Theory, Researcher, New knowledge).
Figure 3. Transition from mathematical concepts (white fields) to the concepts of data analysis (yellow fields).
Compared with any formal apparatus, an experienced researcher is more accurate at detecting anomalies in low-dimensional physical fields, at moving from the local level to the global one to achieve a unified interpretation, at recognizing signals of any desired form within short fragments of records, etc. But a researcher cannot cope with high dimensions and large volumes of data. For that reason, the task of teaching a computer to analyze data as a researcher does becomes particularly topical.
Discrete mathematical analysis - DMA
In solving this problem, the primary consideration was that a human thinks and operates not in numbers but in fuzzy concepts. Therefore, the technical basis of our modelling was formed, along with classical mathematics, by fuzzy mathematics and, directly through it, by fuzzy logic [Zadeh, 1965].
The authors presume that the researcher's advantage over formal methods in data analysis is explained by the human's flexible and adaptive perception of the fundamental features of proximity, continuity, connectivity, trend, etc., because these particular features, like a "construction set", form all the algorithms for data analysis. The more thoroughly the features are modelled, the more comprehensive the "construction set" is. This is why there should be many variants of continuity, connectivity, trend, etc.
The resulting solution is a new approach to data analysis that, being researcher-oriented, falls between robust mathematical methods and gentle combinatorial ones. This is DMA [Gvishiani et al., 2010].
Discrete mathematical analysis (DMA) is a series of algorithms for processing discrete data, unified by a common formal basis: numeric fuzzy comparison, measures of proximity in discrete spaces, and the discrete limit. The idea of DMA is based on the construction of discrete analogues of the concepts of classical mathematical analysis: limit, continuity, smoothness, connectivity, monotonicity, extremum, etc.
Thus, our way is to implement the classical continuous mathematics, substituting its fundamental basis with fuzzy models of their discrete analogues.
Referring to the scheme in Figure 3, let us presume that we know what large-small is; then we can construct the following mathematical concepts:
• Proximate-Remote = large-small distance;
• Continuous-Discontinuous = proximate to proximate - proximate to remote;
• Dense-Non-dense = large-small presence of proximate points;
• Connected-Disconnected = possibility-impossibility of transition from any point to another via proximate points;
and concepts of data analysis:
• Anomalies = discontinuous;
• Clusters = connected and dense;
• Trends will be discussed further.
For the construction of DMA, and particularly for implementing the presented scheme, ordinary sets and Boolean logic are not sufficient: Boolean features are internally disjoint (robust), which impoverishes the modelling. Conceptual features have to be continuous (gentle) and thus fuzzy.
Fuzzy mathematics = fuzzy sets + fuzzy logic.
Fuzzy mathematics is an appropriate link (interface) between a researcher and a computer.
Fuzzy Sets
Definition 1. A fuzzy set $A = \{(x, \mu_A(x))\}$ is a set of pairs, where $x$ is a point of the universal set $X$ and $\mu_A(x)$ is the degree of membership of $x$ in $A$.
Main operations with fuzzy sets
• Fuzzy complement
1. $\mu_{\bar{A}}(x) = 1 - \mu_A(x), \quad \forall x \in X$
2. $\mu_{\bar{A}}(x) = \sqrt{1 - \mu_A(x)}, \quad \forall x \in X$
• Fuzzy intersection
1. $\mu_{A \cap B}(x) = \min(\mu_A(x), \mu_B(x)), \quad \forall x \in X$
2. $\mu_{A \cap B}(x) = \mu_A(x)\,\mu_B(x), \quad \forall x \in X$
• Fuzzy union
1. $\mu_{A \cup B}(x) = \max(\mu_A(x), \mu_B(x)), \quad \forall x \in X$
2. $\mu_{A \cup B}(x) = \mu_A(x) + \mu_B(x) - \mu_A(x)\,\mu_B(x), \quad \forall x \in X$
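The pointwise operations listed above can be illustrated with a minimal Python sketch. It assumes the standard convention of [Zadeh, 1965], in which intersection is the minimum (or the product) and union is the maximum (or the probabilistic sum); the function names are hypothetical and chosen only for this illustration.

```python
# Pointwise fuzzy-set operations on membership degrees in [0, 1].
# Function names are illustrative assumptions, not part of DMA.

def complement(mu_a):
    """Standard fuzzy complement: 1 - mu_A(x)."""
    return 1.0 - mu_a

def intersection_min(mu_a, mu_b):
    """Intersection, variant 1: minimum of the two degrees."""
    return min(mu_a, mu_b)

def intersection_prod(mu_a, mu_b):
    """Intersection, variant 2: product of the two degrees."""
    return mu_a * mu_b

def union_max(mu_a, mu_b):
    """Union, variant 1: maximum of the two degrees."""
    return max(mu_a, mu_b)

def union_prob(mu_a, mu_b):
    """Union, variant 2: probabilistic sum a + b - a*b."""
    return mu_a + mu_b - mu_a * mu_b

mu_a, mu_b = 0.25, 0.5
print(complement(mu_a))               # 0.75
print(intersection_min(mu_a, mu_b))   # 0.25
print(intersection_prod(mu_a, mu_b))  # 0.125
print(union_max(mu_a, mu_b))          # 0.5
print(union_prob(mu_a, mu_b))         # 0.625
```

In both variants the intersection never exceeds either argument and the union is never smaller than either argument, which is the property that makes the two pairs interchangeable as models of "and" and "or".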
We adopt two concepts from fuzzy mathematics:
1. The so-called principle of fuzziness: "Any element possesses any quality, but to a varying degree".
2. Fuzzy logic operations for combining the qualities and algorithm construction.
Commonly in DMA, X is the domain of definition of a data record, field, or process. Any of their features is manifested within X to various degrees and, in this respect, can be considered a gentle structure within X.
Large-small
From this perspective, in order to set up the scheme in Figure 3 one should answer a question, what "large" is, and what "small" is.
Given: $A = \{(a_k, w_k) \mid k = 1, 2, \dots;\ w_k > 0\}$, a finite numerical collection with weights.
Definition 2. The measure of maximality $\operatorname{mesmax}_A(x)$ (minimality $\operatorname{mesmin}_A(x)$) is a fuzzy structure on $\mathbb{R}$ that answers the question: "To what measure (mes) is the number $x$ large (small) modulo $A$?"
$$\operatorname{mesmax}_A(x) = \operatorname{mes}(A < x) \in [-1, 1], \qquad \operatorname{mesmin}_A(x) = \operatorname{mes}(x < A) \in [-1, 1]$$
Measure of extremality = (Measure of maximality) $\vee$ (Measure of minimality)
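Informal interpretation for functions: $\operatorname{mesmax}_f(x)$ shows to what degree the value of the function $f$ at a point $x$ is large. The same holds for $\operatorname{mesmin}_f(x)$. For $f : X \to \mathbb{R}$,
$$\operatorname{mesmax}_f(x) = \operatorname{mesmax}_{\operatorname{Im} f} f(x), \qquad \operatorname{mesmin}_f(x) = \operatorname{mesmin}_{\operatorname{Im} f} f(x)$$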
There are four constructions for measures in DMA. The most transparent is "fuzzy comparison".
Fuzzy comparison
In many cases the conventional linear measure of greatness of one number over another as their difference appears to be too coarse.
Definition 3. A fuzzy comparison $n(a, b)$ of non-negative numbers $a, b \in \mathbb{R}^+$ defines the degree to which "b" exceeds "a":
$$n(a, b) = \operatorname{mes}(a < b) \in [-1, 1].$$
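Example 1. $n(a, b) = \dfrac{b - a}{\max(a, b)}$. Two pairs of numbers are given: (5, 10) and (70, 75). The conventional difference is the same for both, whereas the fuzzy comparison differs:
$$\operatorname{mes}(5 < 10) = n(5, 10) = \frac{10 - 5}{10} = \frac{1}{2}, \qquad \operatorname{mes}(70 < 75) = n(70, 75) = \frac{75 - 70}{75} = \frac{1}{15},$$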
which seems more natural (a five-year-old child differs from a ten-year-old far more than a seventy-year-old man differs from a seventy-five-year-old).
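Example 1 can be checked numerically. The sketch below assumes the comparison $n(a, b) = (b - a)/\max(a, b)$, which is one admissible fuzzy comparison and agrees with the two pairs discussed above; the function name and the convention $n(0, 0) = 0$ are assumptions made for this illustration.

```python
def fuzzy_comparison(a, b):
    """n(a, b) = (b - a) / max(a, b) for non-negative a and b.

    Assumed convention: n(0, 0) = 0, since the formula is undefined there.
    """
    if a < 0 or b < 0:
        raise ValueError("fuzzy comparison is defined for non-negative numbers")
    largest = max(a, b)
    return 0.0 if largest == 0 else (b - a) / largest

# Same absolute difference, very different degree of "greatness".
print(fuzzy_comparison(5, 10))   # 0.5
print(fuzzy_comparison(70, 75))  # approximately 0.0667 (= 1/15)
print(fuzzy_comparison(10, 5))   # -0.5: the comparison is antisymmetric
```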
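Every fuzzy comparison defines measures of extremality, for example by means of the binary method:
$$\operatorname{mesmax}_A(x) = \frac{\sum_i w_i\, n(a_i, x)}{\sum_i w_i} \in [-1, 1], \qquad \operatorname{mesmin}_A(x) = \frac{\sum_i w_i\, n(x, a_i)}{\sum_i w_i} \in [-1, 1]$$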
Thus, we have an answer for the question: "What is large and what is small?"
Definition 4. The element $x$ is large (small) modulo $A$ if $\operatorname{mesmax}_A(x) > 0.5$ ($\operatorname{mesmin}_A(x) > 0.5$).
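A minimal sketch of the binary method and of Definition 4 follows. It assumes that the measures are weighted averages of the fuzzy comparison $n(a, b) = (b - a)/\max(a, b)$ over the collection $A$; the function names and the sample collection are hypothetical.

```python
def n(a, b):
    """Assumed fuzzy comparison (b - a) / max(a, b), with n(0, 0) = 0."""
    largest = max(a, b)
    return 0.0 if largest == 0 else (b - a) / largest

def mes_max(collection, x):
    """Weighted degree to which x is large modulo A = [(a_i, w_i), ...]."""
    total = sum(w for _, w in collection)
    return sum(w * n(a, x) for a, w in collection) / total

def mes_min(collection, x):
    """Weighted degree to which x is small modulo A."""
    total = sum(w for _, w in collection)
    return sum(w * n(x, a) for a, w in collection) / total

def is_large(collection, x):
    """Definition 4: x is large modulo A if mesmax exceeds 0.5."""
    return mes_max(collection, x) > 0.5

def is_small(collection, x):
    """Definition 4: x is small modulo A if mesmin exceeds 0.5."""
    return mes_min(collection, x) > 0.5

A = [(1.0, 1.0), (2.0, 1.0), (4.0, 2.0)]  # pairs (value, weight)
print(mes_max(A, 4.0))   # 0.3125: 4 is not "large" modulo A
print(is_large(A, 8.0))  # True
print(is_small(A, 0.5))  # True
```

With this particular comparison the two measures are antisymmetric, so `mes_min(A, x) == -mes_max(A, x)`; other comparisons need not have this property.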
Proximate—remote
Proximity measures are constructed using the measures of minimality. Let us give two constructions.
• 1st construction: $dX$ is the collection of all non-trivial distances in the space $X$. For points $x$ and $y$ the following problem is solved: "To what degree is the distance $d(x, y)$ between them small among the others?" The answer is $\delta_x(y)$:
$$dX = \{d(\bar{x}, \bar{y}) : \bar{x} \ne \bar{y} \in X\}, \qquad \delta_x(y) = n(d(x, y), dX) = \frac{\sum_{\bar{x} \ne \bar{y}} n(d(x, y), d(\bar{x}, \bar{y}))}{|X|(|X| - 1)}$$
• 2nd construction: $dX(x)$ is the collection of distances from $x$ to the other points of $X$. The rest is the same:
$$dX(x) = \{d(x, \bar{y}) : \bar{y} \in X \setminus \{x\}\}, \qquad \delta_x(y) = n(d(x, y), dX(x)) = \frac{\sum_{\bar{y} \ne x} n(d(x, y), d(x, \bar{y}))}{|X| - 1}$$
Example 2. Application of the 2nd construction (see Figure 4).
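The 2nd construction can be sketched in a few lines of Python. The sketch assumes the Euclidean distance, distinct points, and the fuzzy comparison $n(a, b) = (b - a)/\max(a, b)$; the function names and the sample points are hypothetical.

```python
import math

def n(a, b):
    """Assumed fuzzy comparison (b - a) / max(a, b), with n(0, 0) = 0."""
    largest = max(a, b)
    return 0.0 if largest == 0 else (b - a) / largest

def proximity(points, x, y):
    """delta_x(y): how small d(x, y) is among the distances from x
    to all other points of the (finite, duplicate-free) space."""
    others = [p for p in points if p != x]
    d_xy = math.dist(x, y)
    return sum(n(d_xy, math.dist(x, p)) for p in others) / len(others)

X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
origin = X[0]
print(proximity(X, origin, (1.0, 0.0)))   # positive: a proximate point
print(proximity(X, origin, (10.0, 0.0)))  # negative: a remote point
```

Note that the measure is relative to the base point: in general $\delta_x(y) \ne \delta_y(x)$, because the two points see different collections of distances.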
Dense-non-dense
NB: The denser the space at a point (Figure 4), the smaller the radius of the red circle. In other words, such a circle can be considered a criterion of the density (irreducibility) of the space at a point. Let us give a general definition:
Figure 4. Dependence of the partition of the space by the measure of nearness on the position of the point (black). Dark blue: $-1.0 < \delta_x(y) < 0.0$; blue: $0.0 < \delta_x(y) < 0.5$; green: $0.5 < \delta_x(y) < 0.75$; red: $0.75 < \delta_x(y) < 1.0$.
Figure 5. Dense points.
Definition 5. For a subset $A \subset X$, the density $P_A(x)$ is the membership function on $X$ of the fuzzy concept "proximity (irreducibility) to $A$ in $X$": the value $P_A(x)$ expresses on the scale [0, 1] the degree of proximity (irreducibility) of the point $x$ to the subset $A$ within the space $(X, d)$.
Let us give two constructions:
• 1st construction extends the concept of proximity measures from two points to a subset and a point: if $d(x, A)$ is a variant of the distance from $x$ to $A$ in $X$ and $d(X, A) = \{d(x, A) : x \in X\}$, then
$$P_A(x) = \operatorname{mesmin}_{d(X, A)} d(x, A)$$
• 2nd construction implements "dense = large presence of proximate points". Let $r > 0$, $D_A(x, r) = \{a \in A : d(a, x) < r\}$, $D_A(X, r) = \{D_A(y, r),\ y \in X\}$. Then
$$P_A(x) = \operatorname{mesmax}_{|D_A(X, r)|} |D_A(x, r)|$$
Let $(X, d)$ be a finite metric space, $P_X(\cdot)$ the chosen density model within it, and $P(X) = \{P_X(x) : x \in X\}$.
Definition 6. The point $x^*$ is dense in $X$ if $\operatorname{mesmax}_{P(X)} P_X(x^*) > 0.5$.
Example 3. Red color denotes dense points (see Figure 5).
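The selection of dense points in Definition 6 can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes that the density model $P_X(x)$ is simply the neighbour count $|D_X(x, r)|$, that all weights are equal, and that the fuzzy comparison is $n(a, b) = (b - a)/\max(a, b)$. All names and the sample data are hypothetical.

```python
import math

def n(a, b):
    """Assumed fuzzy comparison (b - a) / max(a, b), with n(0, 0) = 0."""
    largest = max(a, b)
    return 0.0 if largest == 0 else (b - a) / largest

def neighbour_count(points, x, r):
    """|D_X(x, r)|: number of points closer than r to x (x itself included)."""
    return sum(1 for p in points if math.dist(p, x) < r)

def mes_max(values, x):
    """Unweighted binary measure of maximality of x modulo `values`."""
    return sum(n(v, x) for v in values) / len(values)

def dense_points(points, r):
    """Points whose density is large among all densities (threshold 0.5)."""
    counts = [neighbour_count(points, p, r) for p in points]
    return [p for p, c in zip(points, counts) if mes_max(counts, c) > 0.5]

cluster = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
noise = [(10.0 * k, 10.0 * k) for k in range(1, 11)]  # isolated points
print(dense_points(cluster + noise, r=1.0))  # only the five clustered points
```

Because the threshold 0.5 is applied to a relative measure, a point is declared dense only when its neighbourhood is markedly richer than those of most other points.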
Clusters
DMA defines clusters informally as continuous regions of the initial space with a relatively high density of points that are separated from other similar regions by regions of relatively low density. The rigorous formalization of clusterness is based on the above-mentioned conjunction:
Clusterness = density + connectivity .
Let us consider connectivity: a proximity measure $\delta$ and a proximity threshold $\alpha$ are chosen.
Definition 7. $A$ is $\alpha$-$\delta$-connected if for all $x, y \in A$ there is a chain $z_1, \dots, z_n$ with $x = z_1$ and $y = z_n$ for which $\delta_{z_i}(z_{i+1}) > \alpha$, $i = 1, \dots, n - 1$.
This definition implements "connectivity as ability to transfer through proximate points".
Definition 8. $A$ is an $\alpha$-$\delta$-cluster if $\min_{x \in A} P_A(x) > \beta$ and $A$ is $\alpha$-$\delta$-connected, where $\beta$ is a density threshold.
DMA-clustering (density + connectivity) is more realistic than traditional clustering, for example in noisy spaces. It includes two stages:
1st stage is noise removal = topological filtering = reduction of space to dense points;
2nd stage is clustering of dense points in general sense = breakdown into connected components (clusters).
Example 4. Figure 6 demonstrates DMA-clustering with respect to vertical view on density (the densest is in the bottom of the hills): Figure 6a - initial set; Figure 6b - result of the 1st stage; Figure 6c - result of the 2nd stage.
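The two stages can be sketched in Python under several simplifying assumptions that are not taken from the paper: stage 1 uses a plain neighbour-count threshold as a stand-in for the density test, stage 2 links two points only when each is proximate to the other (the proximity measure itself is not symmetric), and the proximity measure is the 2nd construction with $n(a, b) = (b - a)/\max(a, b)$. All function names and parameters are hypothetical.

```python
import math

def n(a, b):
    """Assumed fuzzy comparison (b - a) / max(a, b), with n(0, 0) = 0."""
    largest = max(a, b)
    return 0.0 if largest == 0 else (b - a) / largest

def proximity(points, x, y):
    """delta_x(y), 2nd construction, computed inside `points`."""
    others = [p for p in points if p != x]
    d_xy = math.dist(x, y)
    return sum(n(d_xy, math.dist(x, p)) for p in others) / len(others)

def dense_points(points, r, min_neighbours):
    """Stage 1 (stand-in): keep points with enough neighbours within r."""
    return [p for p in points
            if sum(1 for q in points if q != p and math.dist(p, q) < r)
            >= min_neighbours]

def connected_components(points, alpha):
    """Stage 2: components of the graph of mutually proximate points."""
    unvisited = set(points)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, stack = [seed], [seed]
        while stack:
            x = stack.pop()
            linked = [y for y in unvisited
                      if proximity(points, x, y) > alpha
                      and proximity(points, y, x) > alpha]
            for y in linked:
                unvisited.remove(y)
                component.append(y)
                stack.append(y)
        clusters.append(sorted(component))
    return sorted(clusters)

def dma_clustering(points, r, min_neighbours, alpha):
    dense = dense_points(points, r, min_neighbours)  # noise removal
    return connected_components(dense, alpha)        # split into clusters

data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),      # first group
        (10.0, 0.0), (11.0, 0.0), (10.0, 1.0),   # second group
        (5.0, 20.0)]                             # isolated noise point
for cluster in dma_clustering(data, r=2.0, min_neighbours=2, alpha=0.5):
    print(cluster)
```

Computing the proximity measure only on the filtered set is a deliberate choice in this sketch: after noise removal the distances to discarded points no longer influence which dense points count as proximate.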
Continuous-Discontinuous
Discrete continuity is a focus of DMA, since it is closely associated with discontinuity (one of the manifestations of anomaly). Let us consider the following approach: let $f$ be a mapping between finite metric spaces $X$ and $Y$ that takes a point $x \in X$ to a point $y \in Y$:
$$f : X \to Y, \qquad X \ni x \mapsto y \in Y.$$
The above-mentioned fuzzy comparisons and measures of proximity allow us to formalize the concept of continuity of $f$ at a point $x$: any pair of proximity measures, $\delta_x$ at the point $x$ within $X$ and $\delta_y$ at the point $y$ within $Y$, allows us to implement the earlier formulated logic of continuity (proximate to proximate) for the mapping $f$ and to obtain a fuzzy measure of continuity $C_f(x)$ of the mapping $f$ at the point $x$:
Definition 9.
$$D(x) = \{\bar{x} \in X : \delta_x(\bar{x}) > 0.5\}, \qquad D(y) = \{\bar{y} \in Y : \delta_y(\bar{y}) > 0.5\}$$
$$C_f(x) = \frac{|\{\bar{x} \in D(x) : f(\bar{x}) \in D(y)\}|}{|D(x)|}$$
Example 5. In Figure 7, red shows the points where the mapping $f$ has a low measure of discrete continuity (a high measure of anomaly).
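The "proximate to proximate" logic of Definition 9 can be sketched for a one-dimensional record. The sketch assumes that $X$ is a set of time nodes, that $Y$ is the set of values actually taken by $f$, that both spaces use the absolute difference as distance, and that proximity is the 2nd construction with $n(a, b) = (b - a)/\max(a, b)$. The function names and the test record with a single spike are hypothetical.

```python
def n(a, b):
    """Assumed fuzzy comparison (b - a) / max(a, b), with n(0, 0) = 0."""
    largest = max(a, b)
    return 0.0 if largest == 0 else (b - a) / largest

def proximity(space, x, y):
    """delta_x(y) in a finite set of numbers with distance |x - y|."""
    others = [p for p in space if p != x]
    if not others:          # a one-point space: everything is proximate
        return 1.0
    d_xy = abs(x - y)
    return sum(n(d_xy, abs(x - p)) for p in others) / len(others)

def continuity(xs, f, x):
    """C_f(x): share of points proximate to x whose images are
    proximate to f(x)."""
    ys = sorted({f(p) for p in xs})   # image of f, taken as the space Y
    y = f(x)
    near_x = [p for p in xs if proximity(xs, x, p) > 0.5]
    kept = [p for p in near_x if proximity(ys, y, f(p)) > 0.5]
    return len(kept) / len(near_x)

nodes = list(range(11))

def record(t):
    """Linear record with a single spike at t = 5."""
    return 100 if t == 5 else t

print(continuity(nodes, record, 2))  # 1.0: regular behaviour
print(continuity(nodes, record, 5))  # about 0.33: the spike is anomalous
```

The point $x$ always belongs to $D(x)$, since $\delta_x(x) = 1$, so the denominator never vanishes.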
Trends
Let a series $x = \{x(t_i)\}_{i=0}^{N}$ be given, where $t_i = a + ih$, $h = (b - a)/N$, $t_0 = a$, $t_N = b$.
Definition 10. Let us consider the restrictions $x|_{[a, t_i]}$ and $x|_{[t_i, b]}$, respectively the left and right parts of $x$ at the node $t_i$, and denote them $Lx(t_i)$ and $Rx(t_i)$.
The series $x$ increases (decreases) at the node $t_i$ if $Lx(t_i) < Rx(t_i)$ ($Rx(t_i) < Lx(t_i)$).
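These inequalities can be modelled in different ways by fuzzy comparisons, and the fuzzy measure can also be assigned in different ways. Let us consider one possible scheme of such modelling. Let $\delta^-_{t_i}(t_j)$ and $\delta^+_{t_i}(t_j)$ be the one-sided weights at the node $t_i$:
$$\delta^-_{t_i}(t_j) = \frac{t_j - a + h}{t_i - a + h}, \qquad \delta^+_{t_i}(t_j) = \frac{b + h - t_j}{b + h - t_i}$$
and
$$gr^- x(t_i) = \frac{\sum_j x(t_j)\,\delta^-_{t_i}(t_j)}{\sum_j \delta^-_{t_i}(t_j)},\ t_j \in [a, t_i], \qquad gr^+ x(t_i) = \frac{\sum_j x(t_j)\,\delta^+_{t_i}(t_j)}{\sum_j \delta^+_{t_i}(t_j)},\ t_j \in [t_i, b]$$
Figure 6. DMA-clustering.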
Definition 11.
1. The series $x$ increases (decreases) at the node $t_i$ if $gr^- x(t_i) < gr^+ x(t_i)$ ($gr^+ x(t_i) < gr^- x(t_i)$).
2. The series $x$ increases (decreases) within $[a, b]$ if it increases (decreases) at every $t_i \in [a, b]$.
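Definition 11 can be illustrated with a short Python sketch. It assumes one reading of the scheme above: the left weight of a node $t_j \le t_i$ is $(t_j - a + h)/(t_i - a + h)$, the right weight of a node $t_j \ge t_i$ is $(b + h - t_j)/(b + h - t_i)$, and $gr^-$, $gr^+$ are the corresponding weighted means of the series. The function names and the label "undecided" for equal means are assumptions of this sketch.

```python
def one_sided_means(x, a=0.0, h=1.0):
    """Return (gr_minus, gr_plus): weighted means of the left and right
    parts of the series x sampled at t_i = a + i*h."""
    last = len(x) - 1
    t = [a + i * h for i in range(last + 1)]
    b = t[-1]
    gr_minus, gr_plus = [], []
    for i in range(last + 1):
        # Weights grow towards the node t_i from both sides.
        w_left = [(t[j] - a + h) / (t[i] - a + h) for j in range(i + 1)]
        w_right = [(b + h - t[j]) / (b + h - t[i]) for j in range(i, last + 1)]
        gr_minus.append(
            sum(w * v for w, v in zip(w_left, x[: i + 1])) / sum(w_left))
        gr_plus.append(
            sum(w * v for w, v in zip(w_right, x[i:])) / sum(w_right))
    return gr_minus, gr_plus

def trend_labels(x, a=0.0, h=1.0):
    """Label every node as 'increase', 'decrease' or 'undecided'."""
    gr_minus, gr_plus = one_sided_means(x, a, h)
    labels = []
    for left, right in zip(gr_minus, gr_plus):
        if left < right:
            labels.append("increase")
        elif right < left:
            labels.append("decrease")
        else:
            labels.append("undecided")
    return labels

print(trend_labels([0, 1, 2, 3, 4]))        # every node: 'increase'
print(trend_labels([0, 1, 2, 3, 2, 1, 0]))  # mixed labels around the peak
```

Because each node is compared through averages over the whole left and right parts, the labels reflect the global shape of the record at that node rather than the sign of a local difference.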
Such trends differ from ordinary ones in that they intersect, covering each other. Their intersection is the fuzzy extremum. Choosing a single candidate for the fuzzy extremum within it is a separate task, which has been solved within DMA.
Figure 7. The clearest variant of "proximate to proximate."
Example 6. Figure 8 demonstrates the results of determining trends and fuzzy extrema in different situations. Red shows the zones of monotonic increase, green the zones of monotonic decrease, and cyan the fuzzy extrema.
Thus, we have addressed all the main problems of data analysis and hence formed a variant of it. Many of the resulting tools are "able to shoot and engaged in active combat", that is, they are ready for practical use and already applied. But, for example, the block "continuous-discontinuous" is not very convenient for detecting real anomalies, since these are much broader and more complicated.
This task is solved by the system for monitoring dynamic processes that was developed within the DMA framework. A dynamic process is defined as a set of time series of arbitrary nature. Monitoring includes the analysis of the activity measures [Agayan et al., 2016] of the separate time series, followed by an assessment of the anomality of the dynamic process as a whole. A measure of activity is a formalization of the fuzzy and ambiguous concept of the activity of a time series. Any time series may be associated with a number of measures of activity that implement various views of its activity.
Application of DMA to Analysis of Geomagnetic Data
Figure 8. Fuzzy trends and extremums.
For automated assessment of the level of geomagnetic activity within a region where a single geomagnetic observatory is located, for assessment of geomagnetic conditions within a given region using data from a network of observatories, or for global assessment of magnetic disturbances across the Earth, a new indicator has been introduced. It is based on the value of the measure of anomality $\mu$ of magnetometer data at a given moment (time interval). This indicator makes it possible to measure the level of geomagnetic activity at various observatories on a single scale, regardless of the amplitude of disturbances typical of a given observatory. This amplitude depends on the latitude at which an observatory is located; the largest amplitudes of geomagnetic disturbances are typical of auroral regions. The indicator $\mu$ is, to a certain extent, an analogue of the traditional Kp-index (Figure 9) [Love and Remick, 2007]. A widely known disadvantage of the Kp-index is the very long 3-hour interval over which it is calculated. Moreover, the calculation of the Kp-index requires preliminary elimination of the regular daily variation from magnetograms, which is highly labor-consuming and causes delays. Nowadays there is a demand for operative geomagnetic indices, calculated with a 1-minute interval and provided on the Internet in quasi-real-time mode. The proposed indicator $\mu$ is aimed at overcoming the disadvantages of the traditional Kp-index. The calculation of $\mu$ is algorithmized and may be executed in operative and automated mode with the same frequency as the initial data are acquired.
Figure 9. Comparison of the Kp-index (downloaded from [WDC, 1957]) and the measure of activity $\mu$ (data for analysis obtained from [INTERMAGNET, 1991]), May 2005.
Let us consider an example of geophysical monitoring of a geomagnetic nature. The measures of activity $\mu$ in this case play the same role as indices of geomagnetic activity. Let us compare $\mu$ with the widely known Kp-index.
The magnetic conditions in the network over 3-hour intervals according to the Kp-index and $\mu$ are presented in Figure 10 along the horizontal and vertical axes, respectively.
Using the Kolmogorov mean, a new system of coordinates is constructed. Its first (third) quadrant contains the events that are interesting (uninteresting) according to both the Kp-index and $\mu$. The fourth quadrant contains events that are interesting according to the Kp-index and uninteresting according to $\mu$. The opposite, second quadrant is nontrivial: it contains events that are interesting according to $\mu$ and uninteresting according to the Kp-index (Figure 10).
Indeed, let us consider one of the events from the second quadrant and compare it with the corresponding intervals of the magnetic records registered at the stations that perform monitoring (Figure 11).
It is apparent that within the first hour all of the considered stations register a certain event. It was detected by the index based on measure of activity, and missed by the standard Kp-index.
Conclusion
The goal the authors pursued in this paper is to give a brief formal mathematical description of the Discrete Mathematical Analysis (DMA) approach.
Figure 10. Condition of the INTERMAGNET network, May 2005 (horizontal axis: Kp-index; vertical axis: measure of activity).
We formally described the basic mathematical concepts that form the apparatus of DMA: proximate-remote, dense-non-dense, continuous-discontinuous, clusters, and trends. A more detailed description and examples of DMA applications to geological and geophysical tasks (monitoring of geophysical processes, seismic zoning, analysis of magnetograms) can be found in the following papers: [Agayan et al., 2014; 2016], [Bogoutdinov et al., 2010], [Gvishiani et al., 2008a; 2008b; 2010; 2013a; 2013b; 2016], [Mikhailov et al., 2003], [Sidorov et al., 2012], [Soloviev et al., 2012a; 2012b; 2013; 2016], [Zelinskiy et al., 2014], [Zlotnicki et al., 2005].
Figure 11. Intervals of the magnetic station records.
Acknowledgments. The authors wish to express gratitude to the staff of the World Data Center for Geomagnetism in Kyoto for data on geomagnetic indices. The results presented in this paper rely on data collected at magnetic observatories. We thank the national institutes that support them and INTERMAGNET for promoting high standards of magnetic observatory practice (http://www.intermagnet.org). The research has been conducted in the framework of the state task of the Geophysical Center of RAS (theme No. 0145-2018-0002 "Development of new methods of intellectual analysis of geomagnetic data, and the system of ground-based measurements of the Earth's magnetic field for studying electro-magnetic processes in the near-Earth space and their effect on climate and technological infrastructure").
References
Agayan, S., Sh. Bogoutdinov, M. Dobrovolsky (2014), Discrete Perfect Sets and their application in cluster analysis, Cybernetics and Systems Analysis, 50, No. 2, 176-190.
Agayan, S., Sh. R. Bogoutdinov, A. Soloviev, et al. (2016), The Study of Time Series Using the DMA Methods and Geophysical Applications, Data Science Journal, 15, 1-15.
Bogoutdinov, Sh. R., A. D. Gvishiani, S. M. Agayan, A. A. Soloviev, E. Kihn (2010), Recognition of disturbances with specified morphology in time series. Part 1: Spikes on Magnetograms of the Worldwide INTERMAGNET Network, Physics of the Solid Earth, 46, No. 11, 1004-1016.
Gvishiani, A. D., S. M. Agayan, Sh. R. Bogoutdinov (2008), Fuzzy recognition of anomalies in time series, Doklady Earth Sciences, 421, No. 1, 838-842.
Gvishiani, A. D., S. M. Agayan, Sh. R. Bogoutdinov, A. A. Soloviev (2010), Discrete mathematical analysis and applications geology and geophysics, Bulletin of KRAESC. Earth Sciences, No. 10, 109-125.
Gvishiani, A. D., S. M. Agayan, Sh. R. Bogoutdinov, J. Zlotnicki, J. Bonnin (2008), Mathematical methods of geoinformatics. III. Fuzzy comparisons and recognition of anomalies in time series, Cybernetics and System Analysis, 44, No. 3, 309-323.
Gvishiani, A. D., M. N. Dobrovolsky, S. Agayan, B. Dzeboev (2013), Fuzzy-based clustering of epicenters and strong earthquake-prone areas, Environmental Engineering and Management Journal, 12, No. 1, 1-10.
Gvishiani, A. D., B. Dzeboev, S. M. Agayan (2013), A new approach to recognition of the strong earthquake-prone areas in the Caucasus, Izvestiya, Physics of the Solid Earth, 49, No. 6, 747-766.
Gvishiani, A. D., A. Soloviev, R. Krasnoperov, et al. (2016), Automated Hardware and Software System for Monitoring the Earth's Magnetic Environment, Data Science Journal, 15, No. 18, 1-24.
Love, J., K. J. Remick (2007), Magnetic indices, in: Gubbins, D. and Herrero-Bervera, E. (eds.), Encyclopedia of Geomagnetism and Paleomagnetism, 711-713 pp., Springer, New York.
Mikhailov, V., et al. (2003), Application of artificial intelligence for Euler solutions clustering, Geophysics, 68, No. 1, 168-180.
Sidorov, R. V., A. A. Soloviev, Sh. R. Bogoutdinov (2012), Application of the SP algorithm to the INTERMAGNET magnetograms of the disturbed geomagnetic field, Izvestiya, Physics of the Solid Earth, 48, No. 5, 410-414.
Soloviev, A., S. M. Agayan, Sh. R. Bogoutdinov (2016), Estimation of geomagnetic activity using measure of anomalousness, Annals of Geophysics, 59, No. 6, 1-17.
Soloviev, A., Sh. R. Bogoutdinov, A. D. Gvishiani, R. Kulchinskiy, J. Zlotnicki (2013), Mathematical Tools for Geomagnetic Data Monitoring and the INTERMAGNET Russian Segment, Data Science Journal, 12, WDS114-WDS119.
Soloviev, A., A. Chulliat, Sh. R. Bogoutdinov, et al. (2012), Automated recognition of spikes in 1 Hz data recorded at the Easter Island magnetic observatory, Earth, Planets and Space, 64, No. 9, 743-752.
Soloviev, A. A., S. M. Agayan, A. D. Gvishiani, et al. (2012), Recognition of disturbances with specified morphology in time series. Part 2: Spikes on 1-s magnetograms, Izvestiya, Physics of the Solid Earth, 48, No. 5, 395-409.
Zelinskiy, N. R., N. G. Kleimenova, O. V. Kozyreva, S. M. Agayan, Sh. R. Bogoutdinov, A. A. Soloviev (2014), Algorithm for recognizing Pc3 geomagnetic pulsations in 1-s data from INTERMAGNET equatorial observatories, Izvestiya, Physics of the Solid Earth, 50, No. 2, 240-248.
Zadeh, L. A. (1965), Fuzzy sets, Information and Control, 8, No. 3, 338-353.
Zlotnicki, J., J.-L. LeMouel, A. D. Gvishiani, et al. (2005), Automatic fuzzy-logic recognition of anomalous activity on long geophysical records. Application to electric signals associated with the volcanic activity of la Fournaise volcano (Reunion Island), Earth and Planetary Science Letters, 234, No. 1-2, 261-278.
INTERMAGNET (1991), International Real-time Magnetic Observatory Network web-page, WDCS, Paris.
WDC (1957), World Data Center for Geomagnetism, Kyoto, WDCS, Kyoto.
S. M. Agayan, Sh. R. Bogoutdinov, R. I. Krasnoperov, Geophysical Center of RAS, Moscow ([email protected])