USING REDIS CLUSTER IN INTERPROCESS COMMUNICATION OF INFORMATION SYSTEMS

Gladun A. M.

Anastasia Mikhailovna Gladun,

Master of Applied Mathematics and Informatics, Lead Programmer, ATON LLC. E-mail: [email protected]

Cloud technologies, as one of the results of the scientific and technical potential of the IT industry, are increasingly used in modern applications and are gradually replacing applications with other architectures. This is especially true for cloud applications that handle image data and data that quickly becomes obsolete. To meet capacity needs, horizontal scaling is applied in addition to vertical scaling (increasing server capacity by installing more powerful components). With horizontal scaling, one can either increase the number of servers of the same type and distribute tasks of the same type evenly among them, or use servers of different types, each assigned narrowly specialized tasks. Modern projects combine both approaches: they use groups of servers that work on different types of tasks.

To achieve high performance, a Redis cluster stores data in RAM. A Redis cluster is often used as a tool for storing and caching data, and it has become popular thanks to its scalability and high speed. The article discusses the features of using a Redis cluster for interprocess communication in information systems. A Redis cluster provides a way to run a Redis installation in which data is automatically distributed across several Redis nodes. A Redis cluster has more capabilities than memcached and is therefore more powerful and flexible. It is used by many companies and in many critical production environments.

A Redis cluster makes it possible to automatically split a dataset between multiple nodes and to continue operating when a subset of nodes fails or cannot communicate with the rest of the cluster. The article emphasizes the importance of properly designing interprocess communication at all levels of a three-tier architecture, which significantly reduces the overall response time in an information system with horizontal scaling.

Keywords: Redis, Redis Cluster, database, software, backup, synchronization.

Introduction

Cloud technologies, as one of the results of the scientific and technical potential of the IT industry, are increasingly used in modern applications and are gradually replacing applications with other architectures [1]. Cloud applications have an undeniable advantage: they reduce the computing-resource requirements for client devices. As a result, large datasets can be processed from slow computers, mobile devices and IoT (Internet of Things) devices, which act as a multifunctional infrastructure of the macro-digital ecosystem [2]. The main requirement for such devices is a stable Internet connection. As of the beginning of 2018, this requirement is generally met thanks to the spread of Wi-Fi, 3G and LTE technologies.

One server is not enough to serve a large number of client devices (large-scale projects may have hundreds of thousands or millions of client devices). This is especially true for cloud applications that handle image data and data that quickly becomes obsolete. To meet capacity needs, horizontal scaling is applied in addition to vertical scaling (increasing server capacity by installing more powerful components). Horizontal scaling means increasing the number of servers, nodes or processors that handle data.

With horizontal scaling, one can either increase the number of servers of the same type and distribute tasks of the same type evenly among them, or use servers of different types, each assigned narrowly specialized tasks. Modern projects combine both approaches: they use groups of servers, each group working on a different type of task.

Main body (methodology, results)

1.1 Types of communication in information systems with horizontal scaling

Interprocess communication is a key aspect of modern information systems. When communication is well organized, especially in a distributed system with horizontal scaling, the end user quickly receives a response to each action, the information system (IS) can serve many users simultaneously, and IS maintenance is inexpensive [3].

In a distributed IS, processes communicate both within a tier and between tiers.

Figure 1.1 shows a standard three-tier architecture [4]. The first type of communication occurs between the client side and the server side of the application. It runs on a many-to-one basis (when the server side is not horizontally scaled) or on a many-to-many basis. Implementing this communication type is left entirely to the information system developers. They can use ready-made software solutions with proven architectural patterns or their own developments. Regardless of the implementation, the vast majority of solutions work on top of the IP network-layer protocol, since the Internet is built on this protocol.

The second communication type occurs within the server side of the application. In a scalable system, the server side has several application servers that can perform either similar tasks or different, narrowly focused tasks.

Fig. 1.1. A three-tiered architecture in distributed systems

Data between the elements of the application server tier can be exchanged in the following ways:

1. Data exchange within the tier (Fig. 1.2a). Special software is used for data exchange. This may be standalone messaging systems between system components, key-value stores, or mechanisms built into the server processes that use sockets and other types of network interaction.

2. Data exchange using the database server tier (Fig. 1.2b). In this case, data to be transferred to a neighboring application server is written to the database, from which the second application server reads it.

Each approach has its advantages and disadvantages. Using local intra-tier tools gives better performance and saves database server resources. However, the extra mechanisms complicate the development of application server software and may sometimes reduce the fault tolerance of the system, because they add elements to the communication network.


Fig. 1.2. Application server communication options

Recently, there has been a tendency to promote direct communication between client processes. Such a solution offloads the application servers. These solutions are mainly used in voice and video communication (Skype establishes direct secure connections when two users talk), secure messaging (Secret Chat mode in Telegram [5]) and cooperative file-sharing tools (the BitTorrent protocol) (Figure 1.3).

Fig. 1.3. Sequence of communication between information system elements

Optimizing the time spent on interprocess operations is not the only task that distributed system developers face. Systems whose computations are distributed among several machines also have a number of problems of their own.

As mentioned earlier, in a three-tier model of distributed computing systems (DCS) with horizontal scaling, data held by nodes of the same tier must occasionally be synchronized. For the database tier, DBMS developers provide synchronization through built-in DBMS mechanisms and DB drivers. Application servers do not always require synchronization. In rare cases, the server logic allows its handlers to be scaled out into independent nodes with no loss of functionality. But there are situations in which the processes implementing the server logic need direct and rapid information exchange.

The first example is a system with heterogeneous application servers that perform different functions (Figure 1.4).

Fig. 1.4. Three-tier architecture with multi-task application servers

For security reasons, correctly implemented applications authorize each user action regardless of its type: reading data, or changing, adding or deleting records. Reading data is a very frequent operation, so authorization is performed just as frequently.

Storing session data in the primary database means at least twice as many requests to it. Relational databases are often highly inefficient at serving frequent, simple requests. Therefore, session data is stored and retrieved by other mechanisms.

Possible ways to synchronize data between application servers:

1. Synchronization through the database layer. The solution is simple to implement, but it is unsuitable when high performance is required and the core databases are relational.

2. Using in-RAM data caching tools that can combine the storage of several nodes (memcached).

3. Using special-purpose fast key-value stores (Redis).

4. Using interprocess messaging (RabbitMQ).

5. Using custom tools developed for a specific distributed system.

6. Using either simple solutions or their variants that add p2p connections, network self-organization and node availability monitoring.

Several options can be combined.

1.2 Synchronization through the database layer

This approach stores session data in the main database (Figure 1.5). Session data is recorded in a dedicated table (in a relational database) or in a separate document (in non-relational document-oriented databases).

Fig. 1.5. Synchronizing sessions through the primary database

This synchronization type is characterized by the following features:

1. Ease of implementation. No additional tools are needed. Sometimes additional DBMS configuration is required to speed up simple requests.

2. Increased load on the primary database server.

3. Slow operation even when the database settings are optimized. Modern databases (especially relational ones) have complex multi-stage algorithms for parsing requests and for searching and reading the requested data. For large result sets these DBMSs deliver acceptable running times, but they are excessive and slow for short, repeated requests.

4. Data loss tolerance. DBMSs have many mechanisms designed to ensure information integrity, even if the OS or hardware fails during a transaction.
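A minimal Python sketch of this scheme is given below. It uses the standard sqlite3 module as a stand-in for the primary DBMS. The table names (sessions, notes) and the function names are hypothetical and are introduced only for illustration. The sketch shows why every authorized read costs at least two database requests.

```python
import secrets
import sqlite3


def create_schema(db):
    # Hypothetical schema: one table for sessions, one for application data.
    db.execute("CREATE TABLE sessions (token TEXT PRIMARY KEY, user_id INTEGER NOT NULL)")
    db.execute(
        "CREATE TABLE notes (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT NOT NULL)"
    )


def log_in(db, user_id):
    # Store the session token in the same database as the application data.
    token = secrets.token_hex(16)
    db.execute("INSERT INTO sessions (token, user_id) VALUES (?, ?)", (token, user_id))
    return token


def read_notes(db, token):
    # Request 1: authorize the action by looking up the session.
    row = db.execute("SELECT user_id FROM sessions WHERE token = ?", (token,)).fetchone()
    if row is None:
        raise PermissionError("unknown or expired session token")
    # Request 2: read the data the user actually asked for.
    rows = db.execute(
        "SELECT body FROM notes WHERE user_id = ? ORDER BY id", (row[0],)
    ).fetchall()
    return [body for (body,) in rows]


db = sqlite3.connect(":memory:")
create_schema(db)
token = log_in(db, user_id=1)
db.execute("INSERT INTO notes (user_id, body) VALUES (?, ?)", (1, "first note"))
notes = read_notes(db, token)
```

Any application server that shares the same database sees the same sessions table, which is what makes the approach a synchronization mechanism, at the price of the extra lookup on every request.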

1.3 Memory-combining data caching tools

The primary objective of data caching is to accelerate access to repeatedly requested data. A cache is fast-access intermediate memory holding information that is likely to be requested [6].

The cache size is much smaller than the primary storage size.

The data addressing mode is an important aspect of cache memory. The data in the cache is stored as key-value pairs. Addressing occurs through the key. The key can be:

1. The entire object identifier.

2. Part of the object identifier.

3. A hash computed from the entire object or from some of its fields.

When data is accessed by a specified key, there are two possible outcomes: a cache hit (the required data is located at the specified address) and a cache miss (the data is missing, or the address holds another object). A write may go to an empty cell or overwrite a previously stored object.

Although a cache is a temporary data store, and data can be lost when the cache runs out of space or a value is overwritten, in some scenarios caching tools can safely be used for communication between servers. It is important to back the data up in persistent storage where necessary.
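The hit and miss cases can be illustrated with a small Python sketch. The class name, the fixed number of cells and the use of CRC32 as the addressing hash are assumptions made for the example; they do not describe any particular caching tool.

```python
import zlib


class DirectMappedCache:
    """Toy cache: each key is mapped by a hash to exactly one cell."""

    def __init__(self, slots):
        if slots < 1:
            raise ValueError("slots must be positive")
        self._slots = [None] * slots

    def _index(self, key):
        # The address is a hash of the key, reduced to the number of cells.
        return zlib.crc32(key.encode("utf-8")) % len(self._slots)

    def put(self, key, value):
        # Writes go to an empty cell or overwrite whatever object is there.
        self._slots[self._index(key)] = (key, value)

    def get(self, key):
        # Returns (hit, value). A miss means the cell is empty
        # or currently holds a different object.
        entry = self._slots[self._index(key)]
        if entry is not None and entry[0] == key:
            return True, entry[1]
        return False, None


cache = DirectMappedCache(slots=8)
cache.put("user:1", "Alice")
hit, value = cache.get("user:1")    # cache hit
missed, _ = cache.get("user:2")     # cache miss: never stored
```

Storing the key next to the value is what lets the cache tell a genuine hit from a cell that has been overwritten by another object.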

1.4 Memcached distributed caching system

Memcached is software that caches data in RAM as a hash table. It is free and open source software distributed under the BSD license. Memcached runs on Unix-like operating systems (Linux, macOS, FreeBSD) and on Windows.

Memcached provides a huge hash table that is distributed (if necessary) across multiple machines. When the table is full, incoming records overwrite the least recently used data. Applications that use distributed caching typically query memcached first and, on a miss, request the data from slower stores.
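The eviction and fallback behaviour described above can be sketched in Python as follows. This is a simplified model under stated assumptions: a single in-process table with a plain least-recently-used policy, and a dictionary standing in for the slow store. The names LRUCache and fetch are hypothetical, and real memcached eviction is more elaborate than this.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded table that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)          # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)   # drop the least recently used


def fetch(cache, slow_store, key):
    # Ask the cache first; on a miss, read the slow store and cache the result.
    # Simplification: a cached value of None is treated as a miss.
    value = cache.get(key)
    if value is None:
        value = slow_store[key]
        cache.set(key, value)
    return value


slow_store = {"user:1": "Alice"}              # stand-in for a database
cache = LRUCache(capacity=2)
first = fetch(cache, slow_store, "user:1")    # miss, loaded from slow_store
second = fetch(cache, slow_store, "user:1")   # hit, served from the cache
```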

Fig. 1.6. Memcached scaling. Fitzpatrick B. Distributed caching with memcached

Memcached is based on a client-server architecture (Figure 1.6). Clients make requests to memcached servers through client libraries, connecting via TCP or UDP (port 11211). A client knows the entire set of servers, while the servers have no information about the clients (requests are initiated by clients only). When a client requests data, the client library hashes the key to determine which server to use. Distributing data by splitting the key space into ranges makes the solution scalable (Figure 1.6).

Figure 1.7 shows one way to implement the session store with the memcached distributed cache. In addition to being kept in the database, sessions are cached on memcached servers and are available from all application servers.
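Client-side server selection can be sketched as below. The server addresses and the function name pick_server are hypothetical, and the MD5-modulo scheme is only one simple way of mapping a key hash to a server. Many real client libraries use consistent hashing instead, so that adding or removing a server remaps fewer keys.

```python
import hashlib

# Hypothetical server list; every client is configured with the same list.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def pick_server(key, servers):
    """Choose a server for a key from the key's hash alone."""
    if not servers:
        raise ValueError("at least one server is required")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 4 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


server_for_session = pick_server("session:42", SERVERS)
```

Because the choice depends only on the key and the server list, all clients agree on where a key lives without the servers having to know about each other or about the clients.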

Fig. 1.7. Storing sessions by using memcached

It is important to remember that memcached must not hold user data that is not backed up and cannot be recovered. It can, however, store session data, such as tokens generated after authentication. Occasional token loss only forces the user to re-authenticate, and in some cases generating new tokens is even useful in terms of security.
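The consequence of losing a token can be sketched in Python. A plain dictionary stands in for the memcached servers, and the class and method names are hypothetical. The point of the sketch is only that a lost token leads to re-authentication and never to loss of user data.

```python
import secrets


class SessionStore:
    """Keeps session tokens in a volatile cache (a dict stands in for memcached)."""

    def __init__(self, cache):
        self._cache = cache

    def log_in(self, user_id):
        # Called after the user's credentials have been verified elsewhere.
        token = secrets.token_hex(16)
        self._cache[token] = user_id
        return token

    def authorize(self, token):
        # Returns the user id, or None if the token was evicted or never
        # issued; the caller then asks the user to authenticate again.
        return self._cache.get(token)


cache = {}
store = SessionStore(cache)
token = store.log_in(user_id=7)
user_before_loss = store.authorize(token)

cache.clear()                                   # simulate eviction or a restart
user_after_loss = store.authorize(token)        # None: re-authentication needed
new_token = store.log_in(user_id=7)             # a fresh token is generated
```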

1.5 Redis key-value data store

Redis is open source software that stores key-value data in RAM and on persistent storage and is used as a database, a cache and a message transfer tool (an interprocess communication tool) [7].

Redis can store the following data types [8]:

1. Strings.

2. Lists of strings.

3. Sets (collections of unique strings).

4. Sorted sets (collections of unique strings ordered by an associated score).

5. Hash tables.

6. HyperLogLogs, used to estimate the number of unique values in a dataset.

7. Geolocation data.
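The sketch below shows how most of these types might be written through a Python client. It assumes a client object that behaves like redis.Redis from the redis-py package (version 3.5 or later for the mapping argument of hset). The key names are hypothetical. Geolocation data is left out because the geoadd call signature differs between redis-py versions.

```python
def store_examples(r):
    """Write one value of each type; `r` is assumed to behave like redis.Redis."""
    r.set("greeting", "hello")                                     # string
    r.rpush("queue:jobs", "job-1", "job-2")                        # list of strings
    r.sadd("tags", "redis", "cache", "redis")                      # set, duplicates ignored
    r.zadd("scores", {"alice": 10, "bob": 7})                      # sorted set, ordered by score
    r.hset("user:1", mapping={"name": "Alice", "role": "admin"})   # hash table
    r.pfadd("visitors", "ip-1", "ip-2", "ip-1")                    # HyperLogLog


# Usage, assuming redis-py is installed and a server listens on localhost:6379:
#
#     import redis
#     store_examples(redis.Redis(host="localhost", port=6379))
```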

Redis supports replication from a primary server to subordinate servers. As with MySQL master-slave replication, this design provides read scalability but not write scalability [9].

The Redis Sentinel subsystem has been developed to ensure more flexible scaling. The subsystem's main tasks include:

1. Monitoring. Sentinel processes constantly check that the primary and subordinate servers are up and running.

2. Alerting. In case of emergency, notifications about events that occur with Redis instances can be sent through the API to the administrator or to other programs.

3. Automatic failover. If the primary server fails, Sentinel performs reconfiguration and promotes a former subordinate server to primary.

4. Configuration. By querying Sentinel, clients can find out the current address of the primary server if reconfiguration has occurred.

As of 2021, Sentinel was already considered an outdated solution for scaling Redis. Its replacement is Redis clustering with the Redis Cluster utility suite, whose main advantage is simultaneous support for sharding and replication [10].
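Sharding in Redis Cluster rests on a fixed mapping from keys to hash slots. The Python sketch below follows the rule given in the Redis Cluster specification (CRC16 of the key modulo 16384, with optional hash tags in curly braces); it is not taken from this article, and the function name key_slot is an assumption. The standard-library function binascii.crc_hqx with an initial value of 0 computes the CRC16 variant that the specification names.

```python
import binascii

HASH_SLOTS = 16384  # number of hash slots in a Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the hash slot for a key, honouring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % HASH_SLOTS


# Keys that share a hash tag land in the same slot, and so on the same node.
slot_a = key_slot(b"{user1000}.following")
slot_b = key_slot(b"{user1000}.followers")
```

Each node of the cluster serves a subset of the slots, so a client that knows the slot of a key knows which node to contact. Hash tags let related keys be placed together when an operation needs several keys at once.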

Redis is a more flexible solution than memcached; it provides fast and reliable storage, and less effort is needed to ensure that data is kept in persistent storage. This solution also offloads the primary database server: thanks to the mechanisms that persist data and the ability to replicate it, there is no need to back this data up in the primary database.

Conclusion

Thus, Redis Cluster provides a way to run a Redis installation in which data is automatically distributed across several Redis nodes. Redis Cluster also provides a degree of availability: the ability to continue operating when some nodes fail or cannot communicate. Redis Cluster has more capabilities than memcached and is therefore more powerful and flexible. It is used by many companies and in numerous critical production environments. Redis Cluster makes it possible to:

- automatically split the dataset between multiple nodes;

- continue operation when a subset of nodes fails or cannot communicate with the rest of the cluster.

To summarize, the article emphasizes the importance of properly designing interprocess interaction at all levels of a three-tier architecture, which significantly reduces the overall response time in an information system with horizontal scaling. In addition, the testing method is far from perfect: in the future, changes are to be made to it, the list of tools used is to be expanded, and new results are to be published on the relevant Internet resources.

References

1. Zhigalov V. I., Sokolova M.V. Study of innovation processes based on the analysis of patent activity of residents and non-residents, and the scientific and technical potential of the country // Modern Science: Actual Problems of Theory and Practice. Series: ECONOMY and LAW. 2022. No. 09. pp. 37-44.

2. Zhigalov V.I. Basic conditions for the creation and development of innovation and technology parks // Innovations and investments. 2010. № 2. pp. 50-52.

3. Gorodnichev M.G. Methods of designing and developing client server applications. Information society technologies. International industry-specific scientific and technical conference: proceedings. 2017. pp. 439-440.

4. Tanenbaum E, van Steen M. Distributed systems. Foundations and framework. St. Petersburg: Piter.

5. Grashoff K., Heemskerk, B., Usta B., Vonk M. Telegram-Web.

6. Lasnitskaya M. The endless Internet has come to an end. Moscow. 8. Nelson J. Mastering Redis. Birmingham, UK: PACKT Publishing.

7. Bartenev V.V. The HTTP/2 Module in NGINX. San Francisco.

8. Tikhonov N. A., Budnikova I.K. Redis backup analysis and processing //Information technologies in construction, social and economic systems. 2020. № 2 (20). pp. 121-124.

9. Zhigalov V.I. Trends in the Formation and Use of Intangible Assets of Innovatively Active Enterprises // Innovations and Investments. 2022. No. 9. pp. 58-62.

10. Kostenko I. P., Stupina M.V. Improving web applications by using redis DBMS //Don's young researcher. 2022. № 4 (37). pp. 29-32.


