APPLIED SOLUTIONS BASED ON BIG DATA

Scientific article in the specialty "Computer and Information Sciences"

CC BY
Keywords
big data / indicators based on big data / solutions on big data

Abstract of the scientific article in computer and information sciences. Authors: Pleskach Valentyna, Kryvolapov Yaroslav, Krasnoshchok Viktor, Sholochov Olexiy

The article is devoted to the analysis of big data in the context of the development of the digital society. Several model solutions based on big data are described. Key trends in the growth of big data in the digital society are considered.


COMPUTER SCIENCE / «Colloquium-journal» #8(167), 2023


UDC: 004.043

Pleskach Valentyna,

Doctor of Economics, Professor of Taras Shevchenko National University of Kyiv

Kryvolapov Yaroslav,

Assistant Professor of Taras Shevchenko National University of Kyiv

Krasnoshchok Viktor,

Candidate of Technical Sciences, Docent of Taras Shevchenko National University of Kyiv

Sholochov Olexiy,

Candidate of Physical and Mathematical Sciences, Associate Professor of Taras Shevchenko National University of Kyiv

DOI: 10.24412/2520-6990-2023-8167-12-14

APPLIED SOLUTIONS BASED ON BIG DATA


Introduction

Data and statistics are strategic assets of a digital society and economy, required for well-founded decision-making in both the private and public sectors. Today, many private, national, and international companies recognize that 'big data' is a challenge of our time: a medium-term concept that requires a long-term strategy.

The main big data challenges include a shortage of qualified professionals, a poor understanding of massive data, data growth, confusion when selecting big data tools, integration of data from a range of sources, and data security. Big data keeps evolving and can provide new real-time data for economic and financial analysis.

Materials and Methods. There are many definitions of the big data concept. According to the Gartner Glossary, "Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation".

Many well-known scientists have investigated the problems of big data, including Gray J., Tukey J., Jordan M., Cutting D., and Hinton G.

Several challenges of 'big data' can be generalized. One of them is the problem of data selection criteria: it is impossible to keep all data, but it is quite possible to structure certain data and to find the necessary regularities in them using OLAP. Moreover, as is known from applied statistics, a limited data sample is enough to obtain statistics that are acceptable in terms of reliability.

This problem is well illustrated by the data of social network users. If all the data generated by each user during their activity in the network were stored, the volume of data per user would be extremely large, yet it is unlikely that the full volume would ever be needed either by the user or by large marketing companies.
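The statistical point made above, that a limited sample can be enough for reliable statistics, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data set is synthetic (exponentially distributed "session lengths"), and the function name, sample size, and seeds are hypothetical choices, not taken from the article.

```python
import random
import statistics


def sample_mean_with_error(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample.

    Returns (estimate, standard_error) computed from the sample only.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    estimate = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / sample_size ** 0.5
    return estimate, standard_error


# Hypothetical "big" data set: synthetic session lengths in minutes.
data_rng = random.Random(42)
population = [data_rng.expovariate(1 / 30) for _ in range(200_000)]

full_mean = statistics.fmean(population)
estimate, standard_error = sample_mean_with_error(population, sample_size=10_000)

print(f"mean over all records: {full_mean:.2f}")
print(f"mean over the sample:  {estimate:.2f} +/- {standard_error:.2f}")
```

The sample here holds 5% of the records, and the standard error reported with the estimate indicates how far the sample mean is likely to be from the mean over the full data set.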

Another problem is the construction and verification of useful models based on big data: even a single pass of a large volume of data through a hypothetical model in an acceptable time requires great computing power. In the future, this will require quantum computers.

Discussion. The accumulation of big data and the emergence of quantum computers form the 'cornerstone' of a new mathematics: the mathematics of big data in quantum computing. Quantum computers operate on the principles of quantum mechanics, which enable them to perform certain calculations much faster than classical computers. In particular, quantum computers can run certain types of search and optimization algorithms much faster than classical computers, which could be useful for processing big data.

Let us return to the design and modelling of large systems.

Consider the regression model $Y = X\theta + \xi$, where $Y$ is an $n \times 1$ vector containing $n$ observations of the dependent variable $y$; $X$ is the matrix of independent variables, whose elements are $n$ observations of the values of $m$ independent variables (factors), $\dim X = n \times m$; $\theta$ is an $m \times 1$ vector of unknown parameters to be estimated; and $\xi$ is an $n \times 1$ vector of random disturbances (noise):

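$$
Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,m} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,m} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n,1} & x_{n,2} & \cdots & x_{n,m}
\end{bmatrix}, \quad
\theta = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_m \end{bmatrix}, \quad
\xi = \begin{bmatrix} \xi_1 \\ \xi_2 \\ \vdots \\ \xi_n \end{bmatrix}.
$$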


To obtain the solution $\theta = (X^T X)^{-1} X^T Y$ by the least squares method (LSM), one first performs $n \times m$ multiplicative operations to find the matrix-vector product $X^T Y$ and $n \times m^2$ multiplicative operations to find the product $X^T X$. Then a consistent system is solved (and its consistency is not known in advance!), for example by the Gauss method, at a cost proportional to $m(m+1)(m+2)$ multiplicative operations in the forward elimination and to $m(m+1)$ multiplicative operations in the back substitution. When new data is received, the recursive least squares method is used. Known algorithms can reduce the number of calculations, but not by an order of magnitude. The complexity of obtaining the solution can be roughly estimated as $m^3$.

With modern computational capabilities and algorithms that need fewer steps to compute products and quotients, this is a trivial task. Adding a check of the conditioning of the Gram matrix increases the computational complexity, but not to the extent that solving the problem has to be abandoned.
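The least squares procedure described above (forming the Gram matrix and the product of the transposed factor matrix with the observations, then solving the resulting system by the Gauss method) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the pivoting strategy, the singularity tolerance, and the toy data are all assumptions made for the example.

```python
def gram_and_moment(X, Y):
    """Build X^T X (m x m) and X^T Y (length m) from n observations.

    Also returns the number of multiplications used: n*m*m for X^T X
    (no use is made of symmetry) plus n*m for X^T Y.
    """
    n, m = len(X), len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(m)]
           for i in range(m)]
    xty = [sum(X[k][i] * Y[k] for k in range(n)) for i in range(m)]
    return xtx, xty, n * m * m + n * m


def gauss_solve(A, b, tol=1e-12):
    """Solve A theta = b by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    # Forward elimination.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < tol:  # assumed tolerance, absolute
            raise ValueError("Gram matrix is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    theta = [0.0] * m
    for i in range(m - 1, -1, -1):
        known = sum(M[i][j] * theta[j] for j in range(i + 1, m))
        theta[i] = (M[i][m] - known) / M[i][i]
    return theta


def least_squares(X, Y):
    """Estimate theta from the normal equations (X^T X) theta = X^T Y."""
    xtx, xty, mults = gram_and_moment(X, Y)
    return gauss_solve(xtx, xty), mults


# Toy, noise-free data generated from y = 2 + 3*x1 - 1*x2.
# The first column of ones models the intercept.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 3]]
Y = [2, 5, 1, 4, 5]

theta, mults = least_squares(X, Y)
print("theta =", [round(t, 6) for t in theta])
print("multiplications to form X^T X and X^T Y:", mults)
```

For large $n$ and moderate $m$, forming the products dominates the cost, while the elimination step depends only on $m$; the singularity check in `gauss_solve` is a crude stand-in for a proper conditioning test of the Gram matrix.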

The task becomes significantly more complicated if there is reason to believe that the factors $x_1, x_2, \ldots, x_m$ enter the regression non-linearly. The search for the exponents of each factor that give the best least-squares estimate can go on indefinitely. If the data is heterogeneous in density, polynomial oscillation is possible.

The next complication is the multiplicative entry of factors into the regression. If the model is known in advance and can be linearized by taking logarithms, the task of finding the unknown regression parameters reduces to the linear case analyzed above. A typical example is the Cobb-Douglas production function. If the nature of the relationship between the factors is not known in advance, even though a correlation between them has been identified, the search for an adequate model on big data may be hopeless.

The conclusion that can be drawn from this model is that data pre-processing, including OLAP, is a necessary condition for solving such problems. Its main purpose is to clarify whether the influence of the factors is linear or non-linear and what the nature of their relationship is.
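The linearization mentioned above can be written out for the Cobb-Douglas case. The notation here is the conventional one and is an assumption of this sketch rather than the article's: $Q$ is output, $K$ and $L$ are capital and labour, $A$, $\alpha$, $\beta$ are the unknown parameters, and the disturbance $\xi$ is assumed to enter multiplicatively.

$$
Q = A K^{\alpha} L^{\beta} e^{\xi}
\quad \Longrightarrow \quad
\ln Q = \ln A + \alpha \ln K + \beta \ln L + \xi .
$$

With $y = \ln Q$, factors $(1, \ln K, \ln L)$, and parameter vector $(\ln A, \alpha, \beta)^T$, the model is linear in the unknown parameters, so they can be estimated by ordinary least squares.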

Big data can also be useful for macroeconomic and financial statistics, and ultimately for policy-making, at least through:

1. Answers to new questions and suggestions for new indicators.

2. Elimination of the time lags inherent in official statistics and support for timelier forecasting of existing indicators.

3. Provision of an innovative source of data for the production of official statistics.

Data quality issues, access difficulties and the need for new skills and technologies are the main challenges of big data.

The pandemic has accelerated data collection around the world: according to the International Data Corporation (IDC), an American analytical company specializing in information technology market research, the total amount of global data could reach 175 zettabytes by 2025.

As a result, the global amount of digitized information is growing exponentially. However, data acquires value only if it is stored and analyzed. The volume of big data is gigantic and growing. Data sets are so large and complex that traditional applications are insufficient to collect, process, store, and analyze them.

The suitability of a new data source should be assessed on the basis of a number of key characteristics, such as accuracy, robustness, and methodological soundness, with metadata being of great importance for the interpretation and evaluation of new data sources.

Indicators based on big data have a short time span and contain outliers, and their continuity cannot be guaranteed. Big data is often unstructured and therefore requires proper transformation into time series of observations and cleaning of the variables, such as the replacement of outliers and missing observations with estimates.
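The cleaning step described above, replacing outliers and missing observations with estimates, could look roughly as follows. This is a sketch under explicit assumptions: outliers are flagged with a median/MAD rule (modified z-score with the commonly used 3.5 threshold), gaps are filled by linear interpolation between the nearest valid neighbours, and the function name and sample series are hypothetical. Other detection and imputation rules are equally possible.

```python
import statistics


def clean_series(values, threshold=3.5):
    """Replace outliers and missing observations (None) with estimates.

    Outliers are points whose modified z-score, based on the median and
    the median absolute deviation (MAD), exceeds `threshold`. Flagged and
    missing points are filled by linear interpolation between the nearest
    valid neighbours; gaps at the edges take the nearest valid value.
    """
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    mad = statistics.median([abs(v - median) for v in observed])

    def is_valid(v):
        if v is None:
            return False
        if mad == 0:  # no spread in the data: nothing can be flagged
            return True
        return 0.6745 * abs(v - median) / mad <= threshold

    valid_idx = [i for i, v in enumerate(values) if is_valid(v)]
    cleaned = []
    for i, v in enumerate(values):
        if is_valid(v):
            cleaned.append(float(v))
            continue
        left = max((j for j in valid_idx if j < i), default=None)
        right = min((j for j in valid_idx if j > i), default=None)
        if left is None:
            cleaned.append(float(values[right]))
        elif right is None:
            cleaned.append(float(values[left]))
        else:
            weight = (i - left) / (right - left)
            cleaned.append(values[left] + weight * (values[right] - values[left]))
    return cleaned


# Hypothetical daily indicator with one gap and one spike.
series = [10.0, 11.0, None, 13.0, 500.0, 15.0, 16.0]
cleaned = clean_series(series)
print(cleaned)
```

The median and MAD are used instead of the mean and standard deviation because a single large spike barely moves them, so the spike cannot mask itself.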

Data privacy and cybersecurity risks are a serious concern when using big data. Big data contains large amounts of sensitive and personal information that may be exposed to privacy and cybersecurity violations. If this information is not adequately protected, it can be vulnerable to cyber-attacks, used for profiling certain individuals, or sold to third parties. The loss of personal data can lead to a loss of reputation as well as a loss of consumer trust. Thus, international organizations and government institutions must ensure that the data sources and indicators they use have been obtained without any violation of personal data protection and confidentiality.

Personal data protection procedures and information technology practices are key to minimizing privacy and cybersecurity risks when detailed big data sources are used. Companies, government agencies, and third-party data users should establish robust privacy protection procedures. They should invest in security and adapt traditional information technology methods, such as cryptography, anonymization, and user access control, to the characteristics of big data in order to protect privacy and to prevent data from being reconstructed and linked to an individual.

Although there are many technical IT solutions and platforms to choose from, the selection process is quite complex. Processing big data requires multidisciplinary teams. Statistical agencies engaged in big data projects are aware that making good use of big data requires cooperation across a large number of human and technical resources.

Working with big data requires a strategy for selecting the most promising applications that supplement official statistics and create additional value, such as improved timeliness, support for forecasting existing data sets, and the development of new indicators. It is necessary to go beyond the use of individual, isolated big data applications.

From a geographical point of view, the US market is currently the largest, with revenue of 100 billion dollars. Japan and Great Britain rank second and third by revenue.
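One of the measures listed above, making records harder to link back to an individual, can be illustrated with keyed pseudonymization. The sketch below uses only Python's standard `hmac` and `hashlib` modules; the record layout, field names, function names, and the demo key are hypothetical. Note the assumption built into it: this is pseudonymization, which is weaker than full anonymization, since the remaining fields may still allow re-identification.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same id always maps to the same pseudonym, so records can still
    be joined, but the mapping cannot be recomputed without the key.
    """
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identifiers(record, secret_key, direct_identifiers=("name", "email")):
    """Drop direct identifiers and pseudonymize the 'user_id' field."""
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    cleaned["user_id"] = pseudonymize(str(record["user_id"]), secret_key)
    return cleaned


# Demo key only; a real key would be kept in a secret store.
key = b"demo-key-not-for-production"
record = {"user_id": 1017, "name": "Jane Doe",
          "email": "jane@example.com", "purchases": 12}

safe_record = strip_identifiers(record, key)
print(safe_record)
```

A keyed hash is used rather than a plain hash because identifiers such as sequential user numbers are easy to enumerate, and an unkeyed hash of them could be reversed by brute force.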


International organizations responsible for official statistics should work in close cooperation with the user departments.

The following industries will drive the big data and business analytics market in the near future: the banking sector, specialized services, continuous production, and government.

At the same time, the greatest market growth in the future will come from retail trade, with a compound annual growth rate (CAGR) of 15.2%, and from securities transactions and investment services, with a CAGR of 15.3%. The main advantages of CAGR are the simplicity of its application and the universality of the approach.
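For reference, the CAGR quoted above is the constant yearly rate that turns a starting value into an ending value over a given number of years. The sketch below shows the calculation; the function names and the starting value of 100 units are hypothetical, and only the 15.2% rate comes from the text.

```python
def cagr(begin_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / begin) ** (1 / years) - 1."""
    if begin_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and the number of years must be positive")
    return (end_value / begin_value) ** (1 / years) - 1


def project(begin_value: float, rate: float, years: float) -> float:
    """Value reached after growing at a constant yearly rate."""
    return begin_value * (1 + rate) ** years


# Hypothetical market of 100 units growing at the 15.2% CAGR cited for retail.
after_five_years = project(100.0, 0.152, 5)
print(f"value after 5 years: {after_five_years:.1f}")
print(f"recovered CAGR: {cagr(100.0, after_five_years, 5):.3%}")
```

Because CAGR depends only on the first and last values, it smooths over year-to-year fluctuations, which is the source of both its simplicity and its limits.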

New approaches to big data help companies make real-time decisions.

New sources of big data should be actively sought to meet humanity's demand for digital services. The general assessment given in this paper needs to be revised constantly as big data and official statistics develop.

Conclusion. Data is an important asset of business entities. The security of private and confidential information requires special attention, since existing information technologies currently provide reliable protection for only half of the volume of big data. The growth of big data will lead to an increase in the electronic interaction of every inhabitant of the digital planet with smart devices, based on the Internet of Things and on processing by cognitive and AI systems.

Big data is expected to play a critical role in many different industries over the next five years, as businesses and organizations continue to recognize the importance of data-driven decision-making. Potential uses of big data in the coming years include: 1) improved customer insights; 2) better healthcare outcomes; 3) increased automation; 4) enhanced cybersecurity; 5) smarter cities, etc.

Overall, big data is expected to continue to play a critical role in shaping the future of many different industries in the coming years and beyond.

List of references:

1. Data Revolution Group. Data Innovation: Big Data and New Technologies, 2014 // http://www.undatarevolution.org/data-innovation.

2. Gray J. Scientific Data Management in the Coming Decade. ACM SIGMOD Record, Volume 34, Issue 4, December 2005, pp. 34-41 (https://doi.org/10.1145/1107499.1107503).

3. Tukey J. The Future of Data Analysis. Annals of Mathematical Statistics, Volume 33, Number 1, 1962, pp. 1-67 (Tukey_the_future_of_data_analysis.pdf (stanford.edu)).

4. Varian H. R. Big Data: New Tricks for Econometrics. Journal of Economic Perspectives, Volume 28, Number 2, Spring 2014, pp. 3-28.

5. Big Data. https://ec.europa.eu/eurostat/cros/content/big-data_en.

6. Kitchin, R. Big Data and Official Statistics: Opportunities, Challenges and Risks. Statistical Journal of the International Association of Official Statistics 31 (3) (9), 2015.
