Testing and Diagnosis of Bad Messages in Individual Cyberspace
Vladimir Hahanov, Senior Member, IEEE, Svetlana Chumachenko, Member, IEEE,
Aleksandr Mischenko
Abstract — The theory, methods and architecture of parallel information analysis are presented in analytical, graph and tabular forms of associative relations for the search, recognition and diagnosis of destructive components and for decision making in an n-dimensional vector cybernetic individual space. Vector-logical process models of topical applications are considered. They include testing and diagnosis of bad messages and recovery of the serviceability of hardware and software components of computer systems; the quality of decisions is estimated by non-arithmetic metrics of interaction of Boolean vectors. A concept of self-developing information in a computer ecosystem is offered, which repeats the evolution of human functionality. Original process models of associative-logical information analysis are presented on the basis of a high-speed multiprocessor in n-dimensional vector discrete space.
I. Introduction
The problem of creating an effective infrastructure of cyberspace (Cyber Space), as well as self-developing information and computing ecosystems (ICES) of the planet is particularly important for global companies, such as Kaspersky Laboratory, Google and Microsoft.
Cyberspace, as an object of nature, is also susceptible to destructive components that affect the performance of its subjects: computers, systems and networks. Therefore, now and in the future, the standardization of the space and the specialization of all interacting entities, including the negative ones as an integral part of the ecosystem, remain an important problem. This activity is permanent; its purpose is to keep up with, and stay one step ahead of, the emergence of new malicious components by creating a cyberspace infrastructure that serves the computer ecosystem of the planet and the quality of each person's life.

Manuscript received July 3, 2011.
Vladimir Hahanov is with the Kharkov National University of Radioelectronics, Ukraine, 61166, Kharkov, Lenin Prosp., 14, room 321 (corresponding author; phone: (057)7021326; fax: (057)7021326; e-mail: hahanov@kture.kharkov.ua).
Svetlana Chumachenko is with the Kharkov National University of Radioelectronics, Ukraine, 61166, Kharkov, Lenin Prosp., 14, room 321 (phone: (057)7021326; fax: (057)7021326; e-mail: ri@kture.kharkov.ua).
Alexandr Mischenko is with the Kharkov National University of Radioelectronics, Ukraine, 61166, Kharkov, Lenin Prosp., 14, room 321 (phone: (057)7021326; fax: (057)7021326; e-mail: alex@simplesolutions.com.ua).
Among the modules of such an infrastructure, we can provide diagnosis of failures and of spam by analyzing the information obtained at the testing stage and by using special built-in spam-search methods, standard-based boundary scan, or assertion redundancy focused on spam detection. This makes it possible to identify and eliminate spam without external tools, that is, without complex external programs for modeling, testing and diagnosis, by grafting an intelligent testability-redundancy package onto each e-mail at the stage of its creation. The recognition predicate should operate not only on Boolean but also on register and matrix variables, which makes it practically useful for writing the equations of diagnosis or recognition in formal form:
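$$
\begin{aligned}
x_a &\leftrightarrow x \oplus a = 0 \;\vee\; \min Q_i \leftarrow x \oplus a \oplus Q = 0;\\
x_m &\leftrightarrow x \oplus m = 0 \;\vee\; \min Q_i \leftarrow x \oplus m \oplus Q = 0;\\
T &\oplus S = Q.
\end{aligned}
$$
[Example: matrices T, S and Q with rows 00, 01, 10, 11]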
where $x_a$, $x_m$ are predicate variables; $a$, $m$ are the values of the variables; $Q_i$ is an estimate of how well the variable value is recognized; $T$ is the test; $S$ is the object under test (a program).
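The equations above rest on the xor triad: Q is whatever makes $x \oplus a \oplus Q = 0$ (or $T \oplus S \oplus Q = 0$), so Q = T ⊕ S marks exactly the positions where two vectors disagree. A minimal sketch, assuming T and S are encoded as fixed-width bit vectors of expected and observed responses (the function names and example values are hypothetical):

```python
from typing import List

def triad_q(t: int, s: int) -> int:
    """Return Q such that T xor S xor Q = 0, i.e. Q = T xor S."""
    return t ^ s

def predicate(x: int, a: int) -> bool:
    """The predicate x_a holds exactly when x xor a = 0."""
    return x ^ a == 0

def mismatch_positions(q: int, n: int) -> List[int]:
    """Positions (0 = leftmost of n bits) where Q has a one, i.e. where T and S differ."""
    return [i for i in range(n) if (q >> (n - 1 - i)) & 1]

expected, observed = 0b1011, 0b1001   # hypothetical test response vs. object response
q = triad_q(expected, observed)
assert expected ^ observed ^ q == 0     # the triad always closes to zero
print(format(q, "04b"), mismatch_positions(q, 4), predicate(expected, observed))
# 0010 [2] False
```

Any one member of the triad can be recovered from the other two, which is why the same relation can serve as a test, a diagnosis or a recognition rule.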
On the basis of the recognition predicate, quite compact predicate equations can be built for an m-image of any complexity, nature and shape. Such equations form intelligent solutions in pattern recognition, decision making, testing, knowledge and technical facilities, and diagnosis (recognition) of spam e-mails.
R&I, 2012, №1

In this regard, the proposed cyberspace infrastructure, together with the metric for measuring and modeling the process of analysis and synthesis of subjects, makes it possible to create effective solutions for computer products focused on quick search, detection and diagnosis of not only positive but also negative subjects. Specifically, the proposed infrastructure can solve the following problems: 1) Description of the variety of cyberspace e-waste. 2) Formalization of the interaction of the triad of components <program, spam, tests>. 3) Diagnosis and filtering of e-mail. 4) Creation and effective use of a spam database. 5) Creation of high-speed intelligent means of self-developing service and protection of cyberspace.
The basic principles of evolution, evident even in the modern computer industry, are as follows: 1) Standardization is the most important factor for the evolution and life cycle of ICS; the market neither accepts nor understands non-standard interface solutions. 2) Specialization is the efficiency of (personally oriented) services and products in terms of performance, quality, cost and energy saving, achieved by optimizing the structure and the functional components that cover the specification. 3) Vector-logical quality criteria are widely used for solving the problems of generating ideas, analysis and synthesis. Generation is the process of creating new functionality; synthesis operates with existing components of the information space to create a structure; analysis is the estimation of the obtained solution. 4) The Hasse diagram is used to develop strategies for optimally covering the functional specification with library components, or combinations of them, belonging to the information space. This is consistent with the modern Y-technology, part of ESL design, which uses library components at all levels of product design to meet the specified functionality in the synthesis process.
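Item 4) relies on a Hasse diagram of library components. As an illustrative sketch only, assuming each component is described by the set of specification functions it implements and components are ordered by set inclusion, the edges of the Hasse diagram are the covering pairs (a below b with nothing strictly in between). The component and function names are hypothetical:

```python
from typing import Dict, FrozenSet, List, Tuple

def hasse_edges(items: Dict[str, FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Covering pairs (a, b): items[a] is a proper subset of items[b]
    and no third item c lies strictly between them."""
    names = list(items)
    less = {(a, b) for a in names for b in names if items[a] < items[b]}
    edges = []
    for a, b in sorted(less):
        if not any((a, c) in less and (c, b) in less for c in names):
            edges.append((a, b))
    return edges

components = {                      # hypothetical library components
    "F1": frozenset({"x1"}),
    "F2": frozenset({"x1", "x2"}),
    "F3": frozenset({"x1", "x2", "y"}),
    "F4": frozenset({"x2"}),
}
print(hasse_edges(components))  # [('F1', 'F2'), ('F2', 'F3'), ('F4', 'F2')]
```

Walking such a diagram upward from small components toward the specification is one way to enumerate candidate combinations for a covering strategy.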
Fig. 1 shows the closed cycle of ICES evolution, which is isomorphic to the spiral of human development wound on the time axis.
The purpose of this paper is to significantly improve the quality of the user's individual cyberspace (ICP) and to reduce operating costs by vaccinating the ICP with an infrastructure service that includes libraries of positive and negative messages and provides testing, diagnosis and removal of harmful components of e-mails.
The object of study is a personal cyberspace with the information it contains, its carriers and converters, as well as destructive components harmful to the functionality that improves the quality of human life.
The subject of research is an infrastructure service that includes libraries of positive and negative messages and embedded software redundancy, operates in real time, and provides testing, diagnosis and removal of malicious and "junk" e-mail information described in the corresponding libraries.
Fig. 1. Cycle ICES
Motivation:
1) The market lacks antispam protection with built-in testing, diagnosis and removal of harmful components as part of an infrastructure service, comparable to boundary-scan standards in digital systems-on-chip and to assertion redundancy in software products, which is focused on built-in testing of defects and errors in hardware or software products.
2) There are theoretical developments in the technology of algebraic vector analysis of information, oriented to high-performance solutions and to estimation in problems of recognition of images and actions and of testing of objects.
3) The production and marketing infrastructure of Kaspersky Lab is able to support a project of vaccination technology for electronic communication and has the authority to offer such information technologies to the market.
4) Miniaturized digital communication devices (phones, smartphones, tablets) require constant protection from massive and unwanted e-mails through the introduction of built-in antispam means that control the exchange of information.
Tasks: 1) To develop mathematical tools for the analysis of cyberspace based on models and methods for servicing software products: testing, diagnosis and elimination of massive and unwanted e-mails. 2) To create standard process models and criteria for the interaction of e-mails, with content analysis of useful functionality. 3) To develop a technology for analyzing the structure of program code in order to determine critical points and install assertion operators that monitor and manage its functioning. 4) To create infrastructure-service programs for built-in testing, diagnosis and removal of harmful components of the functional software package using a library of spam information. 5) To test and verify the functionality of the integrated infrastructure service that protects against malicious software code components.
II. Evolution of Cyber Space and Internet
To create a scheme that implements useful functionality, the lowest-level primitives must first be generated. This requires filters F = {F1, F2, ..., Fj, ..., Fm} that form a table of primitive relations taken from the information space of the planet (Fig. 2). With a standardized data structure for individual portals and browsers, and with new services delivered at higher speed, a gradual qualitative improvement of all components of cyberspace can be expected. The ultimate goal of this mutual and positive influence on the cyberspace infrastructure is to develop uniform interface standards and to transform cyberspace into a self-developing intelligent information and computing ecosystem. Of particular importance will be the primary filters, or converters, that create new standardized primitives and thereby a technological infrastructure for high-speed navigation in cyberspace using a specialized non-arithmetic engine (I-Computer). Over time, the amorphous "garbage" part of the Internet will shrink and the standardized infrastructure will grow. By 2020, the information space of the planet should adopt civilized data-structure formats with standardized interfaces, just as happened with the planetary infrastructure of transport connections to terminals, hotels and gas stations that satisfies all user requests.
Fig. 2. The Evolution of Cyber Space and the Internet
Given a specification in the form of a vector of input and output variables, obtained after processing a verbal description, it is easy to formulate the strategy for building new functionality as the task of covering the generalized vector <X,Y> with library elements. The general solution of the problem is similar to the synthesis of an automaton model that defines the interaction of components in time and space. However, the variety of primitives is not specified in advance, which excludes such a possibility and means a shift from the strict determinism of digital automata to the field of evolutionary and quasi-optimal solutions.
Problem statement: given a specification as a vector of essential variables, cover it with a minimal set of primitives from the library and generate the output vector. An elegant solution to the problem of synthesizing the functional structure of a specification is the key to a computer that generates new solutions by itself. Once this problem is solved, only two problems remain on the way to computer intelligence: the self-generation of the original functionality required to solve the covering problem, and the specification of new services useful to humans or computers.
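The covering task above has the shape of the classic set-cover problem, which is NP-hard in general. A minimal sketch using the textbook greedy heuristic (a swapped-in technique, not the authors' method), assuming each library primitive is modeled as the set of specification variables it provides; the names F1..F3, x1..x3 and y1 are hypothetical:

```python
from typing import Dict, FrozenSet, List, Set

def greedy_cover(spec: Set[str], library: Dict[str, FrozenSet[str]]) -> List[str]:
    """Pick library primitives until every variable of the specification is covered."""
    uncovered = set(spec)
    chosen: List[str] = []
    while uncovered:
        # primitive that covers the most still-uncovered variables (first one on ties)
        name, gain = max(
            ((n, len(vs & uncovered)) for n, vs in library.items()),
            key=lambda item: item[1],
            default=("", 0),
        )
        if gain == 0:
            raise ValueError(f"no primitive covers {sorted(uncovered)}")
        chosen.append(name)
        uncovered -= library[name]
    return chosen

library = {                                   # hypothetical primitives
    "F1": frozenset({"x1", "x2"}),
    "F2": frozenset({"x2", "x3", "y1"}),
    "F3": frozenset({"x3"}),
}
print(greedy_cover({"x1", "x2", "x3", "y1"}, library))  # ['F2', 'F1']
```

Greedy selection gives a quasi-optimal cover (within a logarithmic factor of the minimum) rather than a guaranteed minimal one, which fits the shift toward quasi-optimal solutions described above.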
III. Integral Metric Evaluation of the Diagnosis
The infrastructure of brain-like algorithms for spam detection includes models, methods and associative logical data structures intended to support search, recognition and decision making based on non-arithmetic vector operations. The score is determined by solving the problem of a vector-logical criterion for the quality of interaction between the query (a vector m) and a system of associative vectors (associates). This interaction generates a constructive response in the form of one or more associates, together with a numerical characteristic of the degree of membership (quality function) of the input vector m in the found solution: $\mu(m \in A)$.
The input vector $m = (m_1, m_2, \dots, m_i, \dots, m_q)$, $m_i \in \{0, 1, x\}$, and the matrix $A_i$ of associates, whose components satisfy $A_{ijr} \in A_{ij} \in A_i \in A$ and $A_{ijr} \in \{0, 1, x\}$, have the same dimension $q$. In what follows, the membership level of the vector m in the vector A is denoted $\mu(m \in A)$.
Fig. 3 defines five types of set-theoretic (logical) ∩-interactions of two vectors, m ∩ A.
Fig. 3. The results of the intersection of two vectors
They form all primitive variants of the generalized PRP-system response (Search, Recognition and Decision making) to an input vector query. In the technology of technical diagnostics (Design & Test), a specified sequence of actions is isomorphic to this route: searching for defects, detecting them, and deciding how to restore serviceability. All three stages of the technological route require a metric for evaluating solutions in order to make the optimal choice.
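Assuming that the five interaction types of Fig. 3 are the five possible set relations between two ternary cubes (equal, m inside A, A inside m, partial overlap, disjoint), which appears consistent with the five shooting outcomes listed in Section IV, they can be classified coordinate by coordinate. A minimal sketch, assuming vectors are strings over '0', '1' and the don't-care symbol 'x' (function names are hypothetical):

```python
from typing import Optional

def meet(a: str, b: str) -> Optional[str]:
    """Intersection of two ternary symbols; None stands for the empty set."""
    if a == b or b == "x":
        return a
    if a == "x":
        return b
    return None  # {0} ∩ {1}

def relation(m: str, A: str) -> str:
    """Classify the set-theoretic relation between ternary cubes m and A."""
    if len(m) != len(A):
        raise ValueError("vectors must have the same dimension")
    parts = [meet(a, b) for a, b in zip(m, A)]
    if None in parts:
        return "disjoint"          # one empty coordinate empties the whole cube
    common = "".join(parts)        # m ∩ A as a ternary vector
    if common == m and common == A:
        return "equal"
    if common == m:
        return "m inside A"
    if common == A:
        return "A inside m"
    return "partial overlap"

for pair in [("1x", "1x"), ("11", "1x"), ("1x", "11"), ("x1", "1x"), ("0x", "1x")]:
    print(pair, relation(*pair))
```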
Definition. The integral theoretical metric for evaluating query quality is a function of the interaction $m \cap A$ of multivalued vectors, determined as the average of three normalized parameters: the minimum distance $d(m, A)$, the membership function $\mu(m \in A)$ and the membership function $\mu(A \in m)$:
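$$
\begin{aligned}
Q &= \frac{1}{3}\left[d(m, A) + \mu(m \in A) + \mu(A \in m)\right],\\
d(m, A) &= \frac{1}{n}\left[n - \mathrm{card}\{\, i : m_i \cap A_i = \varnothing,\ i = 1, \dots, n \,\}\right];\\
\mu(m \in A) &= 2^{\mathrm{card}(m \cap A) - \mathrm{card}(A)}, \quad \mathrm{card}(m \cap A) = \mathrm{card}\{\, i : m_i \cap A_i = x \,\}, \quad \mathrm{card}(A) = \mathrm{card}\{\, i : A_i = x \,\};\\
\mu(A \in m) &= 2^{\mathrm{card}(m \cap A) - \mathrm{card}(m)}, \quad \mathrm{card}(m) = \mathrm{card}\{\, i : m_i = x \,\}.
\end{aligned}
\tag{1}
$$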
Explanations. The parameters evaluate the level of interaction of the vectors in the interval [0, 1]. If each parameter reaches its limiting maximum value of 1, the vectors are equal. The minimum score Q = 0 corresponds to a complete mismatch of the vectors in all n coordinates. If the intersection $m \cap A = m$ covers half of the space of vector A, the membership functions and the quality are:

$$\mu(m \in A) = \tfrac{1}{2};\quad \mu(A \in m) = 1;\quad d(m, A) = 1;\quad Q(m, A) = \frac{\tfrac{1}{2} + 1 + 1}{3} = \frac{5}{6}.$$
The same value is obtained if the intersection covers half of the space of vector m. If the intersection covers half of the spaces of both vectors A and m, the membership functions take the values:

$$\mu(m \in A) = \tfrac{1}{2};\quad \mu(A \in m) = \tfrac{1}{2};\quad d(m, A) = 1;\quad Q(m, A) = \frac{\tfrac{1}{2} + \tfrac{1}{2} + 1}{3} = \frac{4}{6} = \frac{2}{3}.$$
Note that if the intersection of the two vectors is the empty set, 2 raised to the power of the empty set is taken to be zero:

$$2^{\mathrm{card}(m \cap A)} = 2^{\varnothing} = 0.$$

This means that the two spaces have no common points.
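Formula (1) can be checked against the worked examples. A minimal sketch, assuming ternary vectors are strings over '0', '1' and 'x' (where 'x' covers both values, so a vector with k symbols 'x' spans 2^k points) and that a single empty coordinate makes the whole intersection empty; `Fraction` keeps the results exact, and the function names are hypothetical:

```python
from fractions import Fraction
from typing import Optional

def meet(a: str, b: str) -> Optional[str]:
    """Intersection of ternary symbols '0', '1', 'x'; None stands for the empty set."""
    if a == b or b == "x":
        return a
    if a == "x":
        return b
    return None

def q_integral(m: str, A: str) -> Fraction:
    """Integral metric Q of formula (1) for ternary vectors of equal length n."""
    if len(m) != len(A) or not m:
        raise ValueError("vectors must be non-empty and of the same dimension")
    n = len(m)
    parts = [meet(a, b) for a, b in zip(m, A)]
    empty = parts.count(None)                 # coordinates with m_i ∩ A_i = ∅
    d = Fraction(n - empty, n)
    if empty:
        mu_m_in_A = mu_A_in_m = Fraction(0)   # 2^card(∅) is taken to be 0
    else:
        common = parts.count("x")             # card(m ∩ A)
        mu_m_in_A = Fraction(2 ** common, 2 ** A.count("x"))
        mu_A_in_m = Fraction(2 ** common, 2 ** m.count("x"))
    return (d + mu_m_in_A + mu_A_in_m) / 3

print(q_integral("11", "1x"), q_integral("x1", "1x"))  # 5/6 2/3
```

With m = "11" and A = "1x", the intersection equals m and covers half of A, giving 5/6; with m = "x1" and A = "1x", the intersection "11" covers half of each vector, giving 2/3, matching the two examples.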
The purpose of introducing vector-logical criteria of solution quality is to significantly improve performance when computing the quality Q of interaction between the components m and A in the analysis of associative data structures, by using only vector logic operations. Without averaging of the membership functions and the minimum distance, the arithmetic criterion (1) can be transformed into the form:
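$$
\begin{aligned}
Q &= d[m, A_{i(j)}] + \mu[m \in A_{i(j)}] + \mu[A_{i(j)} \in m],\\
d(m, A_{i(j)}) &= \mathrm{card}[m \oplus A_{i(j)} = 1];\\
\mu(m \in A_{i(j)}) &= \mathrm{card}[A_{i(j)} = 1] - \mathrm{card}[m \wedge A_{i(j)} = 1];\\
\mu(A_{i(j)} \in m) &= \mathrm{card}[m = 1] - \mathrm{card}[m \wedge A_{i(j)} = 1], \qquad i(j) = 1, \dots, n(m).
\end{aligned}
\tag{2}
$$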
The first component measures the mismatch of the n-dimensional vectors: it is the minimum distance obtained by the xor operation. The second and third components determine the degree of non-membership of the conjunction result with respect to the number of ones in each of the two interacting vectors. Membership and non-membership are complementary notions, but here it is technologically more convenient to compute non-membership. Thus the ideal quality criterion is zero when the two vectors are equal, and the interaction quality of two binary vectors decreases as the value of the criterion grows from 0. Finally, to eliminate arithmetic from the computation of the vector quality criterion, expression (2) is transformed into:
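$$
\begin{aligned}
Q &= d(m, A) \vee \mu(m \in A) \vee \mu(A \in m),\\
d(m, A) &= m \oplus A; \quad \mu(m \in A) = \overline{A \wedge m} \wedge A; \quad \mu(A \in m) = m \wedge \overline{m \wedge A}.
\end{aligned}
\tag{3}
$$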
Here the criteria are not numbers but vectors that evaluate the interaction of the components. More zeros in the three vectors means a better quality criterion, while the presence of ones indicates a deterioration of the interaction quality.
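The arithmetic form (2) and the purely vector form (3) can be compared side by side. A minimal sketch, assuming binary vectors are stored as non-negative Python integers used as bit vectors (function names are hypothetical):

```python
from typing import Tuple

def popcount(v: int) -> int:
    """Number of ones in a non-negative integer used as a bit vector."""
    return bin(v).count("1")

def q_counts(m: int, a: int) -> Tuple[int, int, int, int]:
    """Arithmetic form (2): returns (d, mu(m in A), mu(A in m), Q)."""
    both = popcount(m & a)
    d = popcount(m ^ a)
    mu_m_in_a = popcount(a) - both
    mu_a_in_m = popcount(m) - both
    return d, mu_m_in_a, mu_a_in_m, d + mu_m_in_a + mu_a_in_m

def q_vector(m: int, a: int) -> int:
    """Vector form (3): Q = d | mu(m in A) | mu(A in m), using bit operations only."""
    d = m ^ a
    mu_m_in_a = ~(a & m) & a      # ones of A not shared with m
    mu_a_in_m = m & ~(m & a)      # ones of m not shared with A
    return d | mu_m_in_a | mu_a_in_m

m, a = 0b1100, 0b1010
print(q_counts(m, a))                  # (2, 1, 1, 4)
print(format(q_vector(m, a), "04b"))   # 0110
```

Python's `~` on an integer yields a negative number, but ANDing it with a non-negative operand keeps every result non-negative. For fully specified binary vectors both membership vectors are sub-vectors of m ⊕ A, so in this sketch the OR in (3) coincides with d; the test below checks that.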
IV. Process Model of Diagnosing SPAM
The quality metric (3) makes it possible to assess both the proximity of spatial objects to each other and the interaction of vector spaces. A practical illustration of the usefulness of the integral quality criterion is shooting at a target, which follows the previously given diagram of vector interactions (see Fig. 3):
1) The shell hit the target exactly. 2) The target was struck by an unreasonably large-caliber projectile. 3) The projectile caliber is insufficient to destroy the target. 4) An inefficient and inaccurate shot by a large-caliber projectile. 5) The projectile missed the target.
The process-interaction model is accompanied by an integral quality criterion that evaluates not only a hit or a miss but also the efficiency of the weapon caliber. The analytical form of the generalized process model, which selects the best interaction between the input query m and the system of logical associative relations, is:

$$
\begin{aligned}
P(m, A) &= \min_{i = 1, \dots, n} Q_i(m \otimes A_i) = Q_i : \bigvee_{j=1}^{n} \left[(Q_i \wedge Q_j) \oplus Q_i\right] = 0;\\
Q(m, A) &= (Q_1, Q_2, \dots, Q_i, \dots, Q_n); \quad A = (A_1, A_2, \dots, A_i, \dots, A_n);\\
\otimes &\in \{\mathrm{and}, \mathrm{or}, \mathrm{xor}, \mathrm{not}, \mathrm{slc}, \mathrm{nop}\};\\
A_i &= (A_{i1}, A_{i2}, \dots, A_{ij}, \dots, A_{is});\\
A_{ij} &= (A_{ij1}, A_{ij2}, \dots, A_{ijr}, \dots, A_{ijq}); \quad m = (m_1, m_2, \dots, m_r, \dots, m_q).
\end{aligned}
\tag{4}
$$

$$
\begin{aligned}
Q_i &= d(m, A_i) \vee \mu(m \in A_i) \vee \mu(A_i \in m),\\
d(m, A_i) &= m \oplus A_i; \quad \mu(m \in A_i) = \overline{A_i \wedge m} \wedge A_i; \quad \mu(A_i \in m) = m \wedge \overline{m \wedge A_i}.
\end{aligned}
\tag{5}
$$

To detail the structure of the vector computations, analytical and structural process models are given for analyzing the matrix A by columns or by rows.
The proposed process model for the analysis (graph) of associative tables identified by the spam components, together with the logical decision-quality criteria introduced above, makes it possible to solve the problem of quasi-optimal coverage when diagnosing varieties of spam messages in the individual cyberspace (ICP) of users. The vector-computation model has provided the basis for developing a specialized multiprocessor architecture oriented to search, pattern recognition and decision making using the associative structure of tables (Fig. 4).
The evaluation of the effectiveness (Fig. 5) of a design solution under specialization and standardization, Sp → St, is based on the combined use of three mutually conflicting parameters: quality Y, speed T and program cost H.
Fig. 5. Evaluation of the effectiveness of the process model
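The process model selects the associate whose quality vector is smallest, P(m, A) = min Q_i. A minimal sketch of that row-by-row search, assuming binary associates stored as integers and, as a simplification, ordering the vectors Q_i by their number of ones (ties go to the lowest index); the table values are hypothetical:

```python
from typing import List, Tuple

def q_vector(m: int, a: int) -> int:
    """Criterion (5) for one associate: d | mu(m in A_i) | mu(A_i in m)."""
    return (m ^ a) | (~(a & m) & a) | (m & ~(m & a))

def best_associate(m: int, table: List[int]) -> Tuple[int, int]:
    """Return (i, Q_i) for the row whose Q_i has the fewest ones (ties: lowest i)."""
    if not table:
        raise ValueError("the associative table is empty")
    scores = [(bin(q_vector(m, row)).count("1"), i) for i, row in enumerate(table)]
    _, best = min(scores)
    return best, q_vector(m, table[best])

table = [0b1010, 0b1100, 0b0001]       # hypothetical associates A_1..A_3
print(best_associate(0b1101, table))   # (1, 1)
```

Each row is scored independently, so the list comprehension is exactly the part a multiprocessor architecture could evaluate in parallel, one associate per processing element.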
V. Practical Results of the Implementation of Infrastructure
SquirrelMail, an e-mail client with a Web interface written in PHP, was chosen as the object of investigation. The application can be installed on virtually any web server that has PHP installed and a connection to a mail server over IMAP and SMTP. The interface window is shown in Fig. 6.
This client is easily extended with plugins. For this study, a plugin was written that determines the usefulness of information based on user preferences. The process model of the plugin is shown in Fig. 7. Based on the user's activity, the attributes of the content were studied. While new messages were downloaded from an individual cyberspace (here represented by a subset of e-mails), the information was filtered not on a "spam" or "not spam" basis but according to the user's personal preferences (Fig. 8).
Fig. 7. The process-model of the plugin for SquirrelMail
Fig. 8. Graph-scheme of the letter analysis
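The plugin's internals are described only through Figs. 7 and 8. As a hypothetical sketch in the spirit of the two libraries of positive and negative messages introduced earlier: each letter is encoded, by assumption, as a bit vector of content attributes; the user's actions grow the two libraries; and a new letter is assigned to whichever library holds its nearest vector by Hamming distance (the number of ones in m ⊕ A). The class name, the attribute encoding and the "review" tie rule are illustrative and are not SquirrelMail's plugin API:

```python
from typing import List, Optional

def hamming(a: int, b: int) -> int:
    """Number of ones in a xor b."""
    return bin(a ^ b).count("1")

class PreferenceFilter:
    """Two growing libraries of attribute vectors: positive (useful) and negative."""

    def __init__(self) -> None:
        self.positive: List[int] = []
        self.negative: List[int] = []

    def learn(self, letter: int, useful: bool) -> None:
        """Record the user's verdict on a letter (e.g. read vs. deleted unread)."""
        (self.positive if useful else self.negative).append(letter)

    def classify(self, letter: int) -> str:
        if not self.positive and not self.negative:
            return "unknown"
        dp: Optional[int] = min((hamming(letter, v) for v in self.positive), default=None)
        dn: Optional[int] = min((hamming(letter, v) for v in self.negative), default=None)
        if dn is None or (dp is not None and dp < dn):
            return "useful"
        if dp is None or dn < dp:
            return "unwanted"
        return "review"   # equally close to both libraries

f = PreferenceFilter()
f.learn(0b1100, useful=True)
f.learn(0b0011, useful=False)
print(f.classify(0b1110), f.classify(0b0111), f.classify(0b1010))  # useful unwanted review
```

Letters that sit exactly between the two libraries are left for the user, which mirrors the observation that some mass mailings turned out to be useful rather than spam.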
Of all the letters arriving at the mailbox, part of the "spam" (anonymous mass mailings) was identified not as spam but as distribution lists that can carry useful information for the user. Fig. 9 shows the effectiveness of introducing the ICS infrastructure service for a single user, where TL is Total Letters, S is Spam, SAI is Spam After Infrastructure and UL is Useful Letters. If the market for the infrastructure is assumed to be on the order of 1 billion users, the time saving for cyberspace users can be estimated as follows (here for users in Ukraine), where TE is the total time saving per year; k is the spam reduction ratio achieved by the infrastructure; L is the number of letters per month; N is the potential number of users in Ukraine; T is the time needed to analyze a single letter; M is the number of months per year; HE is the annual financial saving from introducing the infrastructure; and Ch is the cost of one hour of work of a single user in Ukraine:
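$$TE = k \times L \times N \times T \times M = 0.9 \times 800 \times 10\,000\,000 \times 1 \times 12 = 86\,400\,000\,000\ \text{s} \approx 24\,000\,000\ \text{h} \approx 2740\ \text{years};$$
$$HE = TE \times Ch = 24\,000\,000\ \text{h} \times \$5 = \$120\,000\,000.$$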
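The estimate can be reproduced directly. A minimal sketch, assuming T = 1 is measured in seconds per letter (the reading consistent with the quoted 2740 years and $120 million) and using `Fraction` so that k = 0.9 introduces no rounding; the function name is hypothetical:

```python
from fractions import Fraction

def annual_savings(k: Fraction, letters_per_month: int, users: int,
                   seconds_per_letter: int, months: int, cost_per_hour: int):
    """Return (TE in seconds, TE in hours, HE in dollars) for TE = k*L*N*T*M, HE = TE*Ch."""
    te_seconds = k * letters_per_month * users * seconds_per_letter * months
    te_hours = te_seconds / 3600
    return te_seconds, te_hours, te_hours * cost_per_hour

te_s, te_h, he = annual_savings(Fraction(9, 10), 800, 10_000_000, 1, 12, 5)
print(int(te_s), int(te_h), round(float(te_h) / (24 * 365)), int(he))
# 86400000000 24000000 2740 120000000
```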
Month   Jul   Aug   Sept   Oct   Nov    Dec
TL      760   850   1010   900   1010   1270
S       532   595   707    630   707    889
SAI     152   170   202    180   202    254
UL      80    89    106    95    106    133
Fig. 9. The effectiveness of implementation of ICS infrastructure
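Simple ratios make the monthly data of Fig. 9 easier to read. The sketch below only does arithmetic on the reported values; it shows that in every month spam is 70% of all letters and the infrastructure removes 5/7 (about 71%) of it:

```python
from fractions import Fraction

# Monthly values reported in Fig. 9
months = ["Jul", "Aug", "Sept", "Oct", "Nov", "Dec"]
tl = [760, 850, 1010, 900, 1010, 1270]    # total letters
s = [532, 595, 707, 630, 707, 889]        # spam
sai = [152, 170, 202, 180, 202, 254]      # spam after infrastructure

spam_share = [Fraction(sp, total) for sp, total in zip(s, tl)]
removed_share = [Fraction(sp - left, sp) for sp, left in zip(s, sai)]

for month, share, removed in zip(months, spam_share, removed_share):
    print(f"{month}: spam share {float(share):.2f}, removed {float(removed):.3f}")
```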
VI. Conclusion
1. The scientific novelty of the results is that a servicing infrastructure for individual cyberspace is proposed for the first time. It is characterized by built-in testing, diagnosis and restoration of the ICS and by two growing libraries of positive and negative messages, which makes it possible to reduce the time needed to analyze received information significantly (several times).
2. The practical significance of the research on the ICS infrastructure service lies in improving the quality of life of all stakeholders on the planet who use e-mail services to communicate with the outside world. In this sense, ICS is a model of future human communication with the outside world that is invariant with respect to the technical means available in cyberspace. The annual economic effect of introducing the ICP infrastructure for Ukrainian users can exceed $120 million.
3. Directions for future research. An urgent problem is the creation of the theory, methods and architecture for parallel analysis of information represented in analytical, graph and tabular forms of associative relations for search, recognition and diagnosis of destructive components and for decision making in n-dimensional vector discrete space. It is advisable to use vector-logical process models of topical applications here, including diagnosis of viruses and recovery of software and hardware components of computer systems, whose solution quality is estimated by a non-arithmetic metric of interaction of binary vectors. The problem is focused on search, detection and diagnosis of destructive hardware and software components by methods of discrete cyberspace. The generality of the proposed theory of synthesis and analysis of cyberspace rests on the vanishing xor of a triad of equivalent components, $m \oplus A \oplus Q = 0$, which formulates the conditions for solving the problem. Here the first component, m, is the input code; the second, A, is a destructive reference model; and the third, Q, is the result of the interaction of the first two, which may degenerate into a quality criterion of relationships or decision making, or into an assessment of the recognition of objects or images.
The goal is a substantial improvement of the quality of software products and a reduction of operating costs by vaccinating them, that is, by introducing embedded software redundancy in the form of an infrastructure service that provides testing, diagnosis and removal of harmful components classified in libraries. The object of study is cyberspace represented by information, its carriers and converters, as well as destructive components harmful to the functionality that improves the quality of human life. The subject is the service infrastructure in the form of built-in redundant code running in real time, which provides testing, diagnosis and removal of harmful components described in the corresponding libraries.
4. Expected results and market appeal: 1) Infrastructure protection of built-in code from unauthorized modification that leads to a change in functionality. 2) The redundancy of the infrastructure code, synthesized automatically at the design and verification stage, is no more than 5% of the specified functionality. 3) The market attractiveness of the infrastructure is given by the variety of software products multiplied by the sales of each product, about one billion copies per year in total. 4) The cost of creating the infrastructure for software is 20% of the cost of developing the functional code. If sales are no less than 500 copies, the cost of creating a fully integrated antivirus pays back within a year. 5) Introducing patented software for vaccinating products at their birth can bring the company about 2 billion dollars in the first 3 years of operation. 6) The marketing problem for global companies (Kaspersky Lab) is to persuade software developers to embed existing antivirus code inside the code of useful functionality.