Scientific article on the topic 'From Big Data to Big Knowledge'

From Big Data to Big Knowledge. Text of a scientific article in the specialty «Mass media and mass communications»

CC BY
Keywords
BIG DATA / MODELS / KNOWLEDGE / AITKEN'S PROCEDURE / BAYESIAN BELIEF NETWORKS / GLOBAL BANK OF MATHEMATICAL MODELS / GLOBAL SEMANTIC NETWORK

Abstract of the scientific article on mass media and mass communications. Authors of the scientific work: Bogomolov A.I., Nevejin V.P.

The article considers the problem of acquiring knowledge through the accumulation and processing of Big Data. A solution is proposed based on a modeling methodology that provides for the subsequent standardization and integration of mathematical models and their placement in a global distributed data bank and a global semantic network. Examples of integrating models in order to obtain new knowledge are given, as well as alternative approaches to the uniform description of mathematical models.


Text of the scientific work on the topic «From Big Data to Big Knowledge»

UDC: 004.62

FROM BIG DATA TO BIG KNOWLEDGE

Bogomolov A.I., Nevejin V.P. Financial University under the Government of the Russian Federation, Moscow

Abstract. The article considers the problem of acquiring knowledge through the accumulation and processing of Big Data. A solution is proposed based on a modeling methodology that provides for the subsequent standardization and integration of mathematical models and their placement in a global distributed data bank and a global semantic network. Examples of integrating models in order to obtain new knowledge are given, as well as alternative approaches to the uniform description of mathematical models.

Keywords: Big Data, models, knowledge, Aitken's procedure, Bayesian belief networks, global bank of mathematical models, global semantic network


Introduction

With Big Data we are experiencing another stage of the technological revolution, one that is radically changing the information environment [1]. Big Data is one of the few terms with a precisely known date of birth: September 3, 2008, when Nature, one of the oldest British scientific journals, published a special issue devoted to the question "How can technologies that open up the possibility of working with large volumes of data affect the future of science?". The special issue summarized earlier discussions about the role of data in science in general and in e-science in particular.

Computer methods for processing big data can be used in almost all sciences, from archaeology to nuclear physics. As a consequence, scientific methods themselves are changing noticeably. It is no accident that the neologism "libratory" has appeared, formed from the words library and laboratory; it reflects a changing notion of what can be considered the result of research. However, the problem remains of how to get from big data to big knowledge, and this problem is still far from solved.

Big Data, business and information technology

In business, as in science, large amounts of data are not something completely new either: the need to work with large volumes of data has long been discussed, for example in connection with the spread of radio frequency identification (RFID) [2], social networks, the Internet of Things, etc. In modern society the production of knowledge consumes more resources than the production of goods and services, which allows us to speak of a knowledge industry. Knowledge is more than data and information. Our knowledge helps us to create new technologies and machines, to treat diseases and conquer space, to understand different situations, to solve complex problems and perform difficult tasks, to learn from our experience and adjust our behavior accordingly. If we work in a company, our knowledge combined with that of our colleagues contributes to its success.

Until recently, the main means of accumulating, publishing and disseminating knowledge were books, later joined by radio and television. Over the last 20 years the situation has changed radically. The advent of the Internet, the universal spread of computers and computer networks, digital data repositories, OLAP and Data Mining technologies and other advances in information technology have brought revolutionary changes in the storage and processing of information and in learning. Most information and knowledge is now stored in networks and in machine-readable form, except, of course, for the knowledge stored in people's minds.

Modern information technologies make it possible to store and organize huge amounts of information. Researchers have created a 5D disk that records data in five dimensions and preserves it for billions of years. It can store 360 terabytes of data and withstand temperatures of up to 1000 degrees.

The Internet is becoming available everywhere. A project for ultra-fast 5G Internet delivered from solar-powered drones has been proposed (Fig. 1) [3].

Figure 1 - Ultra-fast 5G Internet from drones with solar panels

Cloud, Big Data and analytics: these three factors of IT are not only interconnected, but today cannot exist without one another. Working with Big Data is impossible without cloud storage and cloud computing. The appearance of cloud technologies, not only as an idea but as completed projects, triggered a new turn in the spiral of growing interest in Big Data analytics [4].

However, for all the achievements of information technology, a fundamental problem has emerged: the ability to generate and store huge data arrays has proved easier to acquire than the ability to extract information and knowledge from them. The most likely cause of this imbalance is that in the 65 years of the history of computers we have not understood how to obtain new knowledge as a result of data processing [5]. Nevertheless, the interrelation of data, information and knowledge in the decision-making process can be represented by the following scheme (Fig. 2).

Figure 2 - Scheme of interrelation of data, information and knowledge

That is why it is meaningless to talk about solving the problem of Big Data until the links of the chain "data - information - knowledge" are restored. Data is processed to obtain information, of which there should be just enough for a person to turn it into knowledge.

Big data and knowledge

In recent decades there have been no major works on the relation of raw data to useful information, and what we usually call Claude Shannon's information theory is nothing other than a statistical theory of signal transmission, which has no bearing on information as perceived by a human. There are plenty of publications reflecting particular points of view, but there is no full-fledged modern theory of information. As a result, the overwhelming majority of specialists do not distinguish between data and information at all. Everyone simply states that there is a lot, or a great deal, of data, but no one has a mature view of what exactly there is a lot of or of how the problem should be solved. All this is because the technical capabilities for working with data are clearly ahead of our ability to use them. Only one author, Dion Hinchcliffe, editor of Web 2.0 Journal [6], offers a classification of Big Data that relates the technology to the result expected from Big Data processing, but it too is far from satisfactory.

Hinchcliffe divides the approaches to Big Data into three groups:

• Fast Data, whose volume is measured in terabytes;

• Big Analytics, petabyte-scale data; and

• Deep Insight, exabytes and zettabytes.

The groups differ not only in the volumes of data handled, but also in the quality of the solutions for processing them.

The solution to the problem of obtaining knowledge from Big Data, in our opinion, lies in studying the process of modeling. For simplicity, we turn to the tasks and problems of economic-mathematical modeling.

The mathematical model as formalized knowledge

Mathematical modeling of economic processes and systems is a widespread practice, both in education and in economic research. The great many mathematical models created over the last decades are supplemented daily by newly created models with different degrees of novelty and practical significance.

Modern information technologies make it possible to store these models and make them available to researchers. Given common requirements and standards for their description, they could be stored in a distributed data bank of mathematical models. The creation and maintenance of banks of mathematical models with common (standardized) classification criteria, "immersed" in a single information space (the Internet), is required for the further development of the "knowledge industry". If mathematical models were presented on the Internet in a standardized form, in some cases they could be combined into one system when solving practical problems.

An example of the positive effect of integrating several models into one is the use of Aitken's statistical procedure [7] in the system for mass appraisal of real estate in Moscow. The development of this system was commissioned by the Government of Moscow from the Finance Academy under the Government of the Russian Federation. The core of the system is an econometric model of mass real estate valuation in Moscow that is non-linear in its parameters.

On the assumption that other models could be used alongside the developed econometric model to improve the accuracy of property value estimates, the econometric model was integrated with additional models on the basis of Aitken's statistical procedure. This significantly improved the quality of the models and the accuracy of the calculated cost characteristics, i.e. made it possible to obtain new knowledge.
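The idea of combining several models through Aitken's procedure can be illustrated with a minimal sketch. It assumes the simplest setting: several models each estimate the same value, and the covariance matrix of their errors is known. The Aitken (generalized least squares) estimate is then a weighted average with weights proportional to the row sums of the inverse covariance matrix. The function names and the numbers are hypothetical and are not taken from the Moscow appraisal system.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def aitken_combine(estimates, error_cov):
    """Combine estimates of one quantity by generalized least squares.

    estimates: one estimate per model.
    error_cov: covariance matrix of the models' errors (assumed known).
    Returns (combined estimate, weights, variance of the combined estimate).
    """
    z = solve(error_cov, [1.0] * len(estimates))  # z = inverse(cov) * 1
    total = sum(z)
    weights = [zi / total for zi in z]
    combined = sum(w * y for w, y in zip(weights, estimates))
    return combined, weights, 1.0 / total


# Hypothetical example: two models value the same property (arbitrary units);
# the first has error variance 4, the second 1, and the errors are uncorrelated.
value, weights, variance = aitken_combine([100.0, 110.0], [[4.0, 0.0], [0.0, 1.0]])
```

In this example the more precise model receives the larger weight, and the variance of the combined estimate is lower than that of either model alone, which is the sense in which integration can improve accuracy.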

Consider another example: the use of models based on Bayesian belief networks for a decision on granting a loan [2]. Fig. 3 shows a model in the form of a simple belief network for estimating the probability of a positive decision on the loan. The probabilities of the events, except for the root, are determined by experts.

Figure 3 - A Bayesian belief network for the decision on granting a loan (nodes A, X1, X2, ..., Xn)

However, the accuracy and descriptive power of the model could be improved if the probabilities of certain events were assessed not by expert estimation but by binary choice models [8], that is, if the belief-network model were combined with a binary choice model that determines the probability of an event not from an expert opinion but from the actual values of the bank's financial indicators.
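A minimal sketch of such a combination is given below. It assumes a hypothetical network in which the decision node A depends on two independent binary events X1 and X2. The probability of X1 comes from a logit (binary choice) model, and the probability of X2 stays an expert estimate. The coefficients, the conditional probability table and all names are illustrative assumptions, not values from the cited works.

```python
import math
from itertools import product


def logit_probability(features, coefficients, intercept):
    """Binary choice (logit) model: P = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    score = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-score))


def probability_of_decision(parent_probs, cpt):
    """Marginal P(A = true) for a node A with independent binary parents.

    parent_probs: list of P(X_i = true).
    cpt: maps a tuple of parent states to P(A = true | those states).
    """
    total = 0.0
    for states in product([True, False], repeat=len(parent_probs)):
        weight = 1.0
        for p, state in zip(parent_probs, states):
            weight *= p if state else 1.0 - p
        total += weight * cpt[states]
    return total


# X1: probability taken from the logit model (hypothetical indicator and coefficients).
p_x1 = logit_probability(features=[0.0], coefficients=[1.0], intercept=0.0)
# X2: probability still supplied by an expert (hypothetical value).
p_x2 = 0.8
# Hypothetical conditional probabilities of a positive decision A.
cpt = {
    (True, True): 0.9,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.1,
}
p_decision = probability_of_decision([p_x1, p_x2], cpt)
```

Replacing an expert estimate with a model-based probability changes only the input to the network, so the two models can be developed and validated separately.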

These examples illustrate the relevance of creating registers (knowledge bases) that include mathematical models of various classes together with their standardized descriptions and means for their search and interaction. This task is only a particular, though very important, aspect of the general problem of the standardization, retrieval and integration of knowledge.

The accumulated and growing volume of global knowledge confronts humanity with the problem of its formalization, standardization and integration. The theory of classification and systematization of complexly organized fields of knowledge is called taxonomy. A taxonomy is a hierarchically arranged system of concepts, from simple to complex.

The mathematical model underlying a taxonomy can be represented as a tree structure over a set of objects. At the top of this structure is a single unifying classification, the root taxon, which applies to all objects of the given taxonomy. The taxa below the root are more specific classifications that apply to subsets of the total set of classified objects. For example, in Carl Linnaeus's classification of organisms the root taxon is the organism. Below it in this taxonomy come type, class, order, family, genus and species [9].
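The tree structure described above can be sketched in a few lines. The example applies it to a hypothetical classification of mathematical models rather than to organisms. The class names and the taxa are assumptions made for illustration.

```python
class Taxon:
    """A node of a taxonomy tree: a classification with narrower sub-taxa."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def lineage(self):
        """Return the names from the root taxon down to this taxon."""
        path = []
        node = self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path[::-1]

    def covers(self, other):
        """True if `other` is this taxon or lies somewhere below it."""
        node = other
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


# Hypothetical taxonomy of models, for illustration only.
root = Taxon("Mathematical model")
econometric = Taxon("Econometric model", root)
binary_choice = Taxon("Binary choice model", econometric)
bayesian = Taxon("Bayesian belief network", root)
```

The root taxon covers every object, while each lower taxon covers only its own subtree, which mirrors the description of more specific classifications applying to subsets of the classified objects.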

Knowledge description languages and the semantic web

Currently, knowledge can be represented and described in different languages (also within a hierarchical structure), including natural languages (Russian, English, etc.), the languages of formulas, graphs and tables, films, etc. Certain areas of science, such as genetics or group theory, have their own concepts and their own languages. There are languages for writing proofs, archives of proofs, and programs that can not only check proofs but also search for proofs of statements.

The possibility of creating a universal language of knowledge, even on the basis of such a universal tool and language as XML (which is the basis of KML), is problematic. Nevertheless, research in this direction appears very interesting and promising. Such research yields results that find application in expert systems, the theory of intelligent agents and artificial intelligence, theories and technologies for creating knowledge bases, computational linguistics, general systems theory, and the theory of computer analysis of natural language and computer extraction of semantics. Cases in which some part of the process of solving a problem, including a theoretical one, is carried out with the help of a computer are no longer uncommon. Humanity is beginning to entrust its knowledge to the computer.

Progress in this direction consists in the development of unified standards for the formalization of knowledge, protocols for the exchange of knowledge, and technologies for the interaction of different ontologies and for linking objects into a single network.

Since knowledge is represented in the form of information stored in different nodes of the global network, the knowledge industry requires the network to present that information consistently in a form suitable for machine processing. This direction of development of the World Wide Web is called the Semantic Web [11]. In the conventional Web, based on HTML pages, information is embedded in the text of the pages and extracted by a person using a browser. The Semantic Web instead involves recording information in the form of a semantic network with the help of ontologies. A client program can thus retrieve facts directly from the Web and draw logical conclusions from them. The Semantic Web operates in parallel with the conventional Web and on top of it, using the HTTP protocol and URI resource identifiers.
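How a client program might draw a logical conclusion from such facts can be shown with a small sketch. Facts are written as (subject, predicate, object) triples, and a single rule is applied: an object of some class also belongs to every broader class. The predicate names "type" and "subClassOf" are simplified stand-ins modeled on RDF Schema, and the identifiers are hypothetical. A real application would use URIs and an RDF library.

```python
def infer_types(triples):
    """Forward-chain one rule until no new facts appear:
    (x, "type", C) and (C, "subClassOf", D)  =>  (x, "type", D).
    """
    facts = set(triples)
    while True:
        derived = {
            (s, "type", parent)
            for (s, p, cls) in facts if p == "type"
            for (child, p2, parent) in facts
            if p2 == "subClassOf" and child == cls
        }
        if derived <= facts:
            return facts
        facts |= derived


# Hypothetical facts about a model published in the network.
published = [
    ("LogitModel", "subClassOf", "BinaryChoiceModel"),
    ("BinaryChoiceModel", "subClassOf", "EconometricModel"),
    ("bank_default_model", "type", "LogitModel"),
]
facts = infer_types(published)
```

After inference the program knows that the hypothetical bank_default_model is an econometric model, although no published fact states this directly.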

The term "Semantic Web" was first introduced by Sir Tim Berners-Lee (the inventor of the World Wide Web) in May 2001 in the journal Scientific American, where he called it "the next step in the development of the World Wide Web". Later, in his blog, he suggested as a synonym the term Giant Global Graph (GGG, by analogy with WWW). The concept of the Semantic Web was adopted and is promoted by the World Wide Web Consortium.

The Semantic Web can serve as a basis for a searchable knowledge base of economic-mathematical models. However, questions of their classification and integration remain. Integrating two or more mathematical models into a single model that is more complex but more effective requires a sequence of actions, each carried out at its own level.

Let us try to draw an analogy with UDDI (Universal Description, Discovery and Integration) [10]. UDDI is an initiative of the e-business leaders Ariba, IBM and Microsoft. It was the first attempt to bring business structures available on the Internet to a common standard. In general, UDDI accepts and organizes information in three categories:

• White pages: general information about the company, including its name, type of business, address and contact information.

• Yellow pages: a description of the services the company provides. Sectoral or intersectoral classifiers can be applied here.

• Green pages: technical information about the services exposed by the business, including references and interfaces to them.
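By analogy, a register of mathematical models could keep the same three kinds of information for each model. The sketch below is only an illustration of that analogy. It is not the UDDI data model or API, and the field names, the record contents and the example.org address are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    # "White pages": general information about the model.
    name: str
    authors: list
    # "Yellow pages": classifier entries describing what the model does.
    categories: list
    # "Green pages": technical details needed to call the model.
    endpoint: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


class ModelRegistry:
    """An in-memory register of model descriptions."""

    def __init__(self):
        self._records = {}

    def publish(self, record):
        """Add a model description or replace the one with the same name."""
        self._records[record.name] = record

    def find_by_category(self, category):
        """Return all published models classified under `category`."""
        return [r for r in self._records.values() if category in r.categories]


registry = ModelRegistry()
registry.publish(ModelRecord(
    name="bank-default-logit",
    authors=["(hypothetical author)"],
    categories=["econometric", "binary choice"],
    endpoint="https://example.org/models/bank-default-logit",
    inputs=["financial indicators"],
    outputs=["probability of default"],
))
found = [r.name for r in registry.find_by_category("binary choice")]
```

A search by classifier returns candidate models, and the technical part of each record tells a client how to connect them, which is the step that integration of models would build on.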

Conclusion

We have raised the issue of the standardization and integration of mathematical models, which should be placed in a global distributed data bank and linked into a global semantic network. Such a network should support the use of Big Data and the derivation of new knowledge from it. We have considered some of the areas of formalization and mathematical modeling that can be used to solve this problem.

Despite significant progress in these directions, the issues of standardization, retrieval and integration of mathematical models still await investigation.

Sources used

1. Katasonov V. Big Data, or Big brother is watching you. - URL: http://ruskline.ru/opp/2017/mart/06/big_data_ili_bolshoj_brat_sledit_za_toboj

2. Radio frequency identification. - URL: http://www.datakrat.ru/technology/7942.html

3. 20 brilliant technologies of the future. - URL: http://russiahousenews.info/society/20-genialnih-tehnologiy-buduschego

4. Bogomolov A. I. Models, standards, and technology interaction in the information society. Monograph. The Financial University, Moscow, 2010.

5. Chernyak L. Big Data: a new theory and practice. - URL: http://www.osp.ru/os/2011/10/13010990

6. Hinchcliffe D. How do you feel big data in the enterprise. - URL: https://www.pcweek.ru/idea/article/detail.php?ID=141009

7. Byshev V. A., Bogomolov A. I., Kostyunin V. I. Mass appraisal values of real estate: from model to system // Bulletin of Financial Academy. No. 3, M., 2007.

8. Pasechnik A. A. The use of econometric binary choice models to estimate the probability of bankruptcy of Russian banks [Text] / A. A. Pasechnik, D. Pasechnik, E. N. Lucas // The Young Scientist. - 2011. - No. 10, Vol. 1. - pp. 137-148.

9. The development of expert systems based on Bayesian networks for a decision on the loan.

10. Big encyclopedic dictionary. - URL: http://allbest.ru/o-3c0b65625b2bd68b4c53b89421216c37.html; http://dic.academic.ru/dic.nsf/enc3p/287302

11. Semantic web. Wikipedia. - URL: http://dic.academic.ru/dic.nsf/ruwiki/28440

12. An overview of UDDI. - URL: http://iso.ru/ru/press-center/journal/1863.phtml

13. Bogomolov A. I., Nevezhin V. P. Econometrics Network Information Society // DOAJ - Lund University: Koncept: Scientific and Methodological e-magazine. - Lund, No. 4 (Collected works, Best Article), 2014. - URL: http://www.doaj.net/2457/

UDC: 332.145


THE INNOVATIVE AND INVESTMENT ACTIVITY OF RUSSIAN COMPANIES

Piskun E.I., Gnitchenko A.V. Sevastopol State University, Sevastopol, Russia

Abstract: The innovative and investment activity of a modern company is a part of economic relations and ensures the company's competitiveness and performance. This type of activity involves a number of problems, in particular weak innovative activity, a lack of investment and other resources, and the low interest of top managers.

Keywords: innovative and investment activity, innovative activity, innovative infrastructure.

The ongoing transformational changes in the economy of the Russian Federation, the integration and globalization processes in the world economy, and growing market competition are pushing domestic companies to make use of the accumulated theoretical and practical experience of management.
