№ 10 (127)
October 2024
DOI - 10.32743/UniTech.2024.127.10.18417
ADDRESSING DATA CONSISTENCY IN HIGH-LOAD DISTRIBUTED SYSTEMS: IMPLEMENTATION CHALLENGES AND SOLUTIONS
Dmitriy Malygin
Software Architect, Technical Director of Projects at T1 Digital LLC,
Russia, St. Petersburg
E-mail: [email protected]
ОБЕСПЕЧЕНИЕ СОГЛАСОВАННОСТИ ДАННЫХ В ВЫСОКОНАГРУЖЕННЫХ РАСПРЕДЕЛЕННЫХ СИСТЕМАХ: ПРОБЛЕМЫ РЕАЛИЗАЦИИ И РЕШЕНИЯ
Малыгин Дмитрий Сергеевич
архитектор программного обеспечения, Технический директор проектов ООО Т1 Диджитал,
РФ, г. Санкт-Петербург
ABSTRACT
This article presents a review of best practices for ensuring data consistency in high-load distributed systems (HLDS). It highlights the importance of this task in the context of rapidly growing data volumes. Various consistency models such as strict, sequential, weak, and causal are analyzed. Methods and approaches for ensuring consistency are studied. Examples of successful application of these approaches in the practice of leading international companies are provided. Methods for solving typical problems arising during the implementation of consistency in HLDS are discussed.
АННОТАЦИЯ
В статье представлен обзор лучших практик обеспечения согласованности данных в высоконагруженных распределенных системах (HLDS). Подчеркивается важность этой задачи в условиях быстрорастущих объемов данных. Анализируются различные модели согласованности, такие как строгая, последовательная, слабая и причинная. Изучаются методы и подходы к обеспечению согласованности. Приводятся примеры успешного применения данных подходов в практике ведущих международных компаний. Обсуждаются методы решения типичных проблем, возникающих при реализации согласованности в HLDS.
Keywords: data consistency, high-load distributed systems, system scalability, data management, application architecture optimization, NoSQL, NewSQL, Kafka, ZooKeeper, CAP theorem.
Ключевые слова: согласованность данных, высоконагруженные распределенные системы, масштабируемость системы, управление данными, оптимизация архитектуры приложений, NoSQL, NewSQL, Kafka, ZooKeeper, теорема CAP.
Introduction
In the contemporary world, data has emerged as a new 'currency', as it enables the generation of substantial volumes of insights that businesses can leverage both to increase their revenue and to improve the efficiency of their processes [1]. However, as the volume of data increases, software systems must address challenges related to concurrency management, latency reduction, and data replication handling. Distributed systems, which consist of a number of independent nodes working together to provide a unified business service, have become a critical part of the operation of modern cloud-based applications, including e-commerce platforms, social networks, and banking services [2]. Ensuring consistency in high-load scenarios, characterized by frequent data updates from multiple sources, often conflicts with other system properties such as availability and partition tolerance, as articulated by the CAP theorem [3].
The primary goal of this study is to explore and analyze various techniques for addressing data consistency in high-load distributed systems (HLDS), develop a framework for consistency model evaluation, and formulate a set of recommendations for software developers and architects designing high-load architectures.
The tasks of this study were to: identify and articulate the primary challenges associated with maintaining data consistency in distributed systems operating under high-load conditions, including issues related to concurrency, network latency, and partition tolerance; synthesize findings from existing research and analyze emerging trends in data consistency techniques; evaluate and classify consistency models such as strong consistency, eventual consistency,
and causal consistency, focusing on their theoretical foundations and practical implications in high-load scenarios; and provide actionable insights and recommendations for practitioners and researchers, offering best practices for designing distributed systems that effectively balance data consistency, availability, and performance (in accordance with the CAP theorem) under high-load conditions.
The relevance of this work is justified by the presence of technical challenges associated with the implementation of scalable and fault-tolerant architectures for data-intensive business systems, such as e-commerce platforms, financial services, online banking, and social networks. These challenges require well-reasoned solutions.
The theoretical significance of this study lies in its provision of a comprehensive framework for the evaluation and comparative analysis of various consistency models, systematically assessing their strengths and weaknesses. Furthermore, this study can serve as a reference for teaching concepts related to data consistency, system design, and performance optimization. The practical contribution of this research includes the formulation of a set of recommendations, best practices, and guidelines for practitioners and researchers engaged in the field of distributed systems, particularly in the design of systems that prioritize both performance and data integrity.
Materials and Methods
To investigate and evaluate various methods for ensuring data consistency in distributed software systems and to achieve the stated goals, this study uses both theoretical and practical approaches. In order to build a holistic and comprehensive understanding of the topic addressed in this research, the following steps were undertaken:
• identified the core concepts, such as the CAP theorem (Consistency, Availability, and Partition Tolerance), and various consistency models: strong, eventual, causal, and hybrid.
• confirmed theoretical conclusions and obtained practical recommendations through a comprehensive review of pertinent academic journals, conference proceedings, and technical reports across various databases, including IEEE Xplore, ACM Digital Library, and Google* Scholar [4]. The key terms used in the search included "data consistency", "consistency models", "distributed systems", "high-load environments", and "scalability".
• studied the practical application of various techniques for ensuring consistency, including an analysis of how each model addresses the challenges presented by the specific scenarios and use cases associated with it.
The following stages of the research were conducted:
1. The data acquired from theoretical sources and practical use cases were systematically processed and analyzed. The inclusion criteria stipulated that the sources must be current, published within the last three to four years, and specifically focused on data consistency techniques in distributed systems. Both theoretical and empirical studies were reviewed to identify emerging trends and best practices.
2. The findings were synthesized to construct a cohesive narrative regarding the current state of research on data consistency in high-load distributed systems. The key themes identified include the evolution of consistency models in response to the growing volume of data, and the significance of hybrid consistency models in attaining a balance between performance and correctness.
3. A comparative framework was established to assess the efficacy of various data consistency techniques under high-load conditions. This framework encompassed the following criteria: a. the influence of each technique on system latency and throughput; b. scalability, i.e. the capacity of each technique to manage growing data volumes and increasing user demands; c. the effectiveness of conflict resolution mechanisms in upholding consistency.
4. A comprehensive set of recommendations was developed based on the results obtained. These recommendations are tailored to guide the selection of the optimal consistency strategy, taking into account the specific requirements of the architecture and the most efficient utilization of resource capabilities.
The proposed comparison framework enables software developers, architects, and Chief Technology Officers (CTOs) to make informed and structured decisions. For instance, they can determine whether to implement strong consistency, which ensures immediate correctness, or eventual consistency, which prioritizes availability and speed, based on the specific requirements of the application. Additionally, the framework facilitates the evaluation of trade-offs, such as latency versus consistency, cost versus performance, and complexity versus flexibility. It also aids in risk mitigation by assessing how different databases manage network partitioning and whether the system can degrade gracefully during failures. Furthermore, the framework supports performance and scalability optimizations. An additional advantage is that it keeps decision-making consistent across teams, which is crucial for long-term maintainability. For example, when selecting between various open-source libraries or frameworks, a comparison based on active development and community support can help ensure that the project is not impeded by the use of unsupported or outdated technologies.
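To make the framework tangible, the following minimal Python sketch shows how a weighted decision matrix over such criteria could be computed; the criteria weights and per-strategy scores are hypothetical values chosen purely for illustration and would have to be calibrated against the requirements of a real system:

```python
# Illustrative decision-matrix sketch: all weights and scores below are
# hypothetical and must be adjusted for a concrete system's requirements.

CRITERIA_WEIGHTS = {
    "latency": 0.30,             # impact on response time
    "throughput": 0.25,          # sustained operations per second
    "scalability": 0.25,         # behavior as data volume and users grow
    "conflict_handling": 0.20,   # effectiveness of conflict resolution
}

# Scores on a 1-5 scale (higher is better), hypothetical values.
STRATEGY_SCORES = {
    "strong consistency":   {"latency": 2, "throughput": 2, "scalability": 2, "conflict_handling": 5},
    "causal consistency":   {"latency": 3, "throughput": 3, "scalability": 4, "conflict_handling": 4},
    "eventual consistency": {"latency": 5, "throughput": 5, "scalability": 5, "conflict_handling": 2},
}

def weighted_score(scores: dict) -> float:
    """Aggregate per-criterion scores into a single weighted value."""
    return sum(CRITERIA_WEIGHTS[c] * s for c, s in scores.items())

if __name__ == "__main__":
    # Rank candidate strategies by their weighted totals.
    for name, scores in sorted(STRATEGY_SCORES.items(),
                               key=lambda kv: weighted_score(kv[1]),
                               reverse=True):
        print(f"{name}: {weighted_score(scores):.2f}")
```

Running such a sketch gives teams a shared, repeatable basis for the trade-off discussions described above, rather than ad-hoc judgments.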
The next step in the development of the comparison framework should be to consider additional factors that influence economic efficiency, including the learning curve for software engineers, the cost of technical resources, and other relevant variables. This approach will facilitate the creation of a more accurate and effective decision-making tool in the domain of data consistency.
Results and Discussion
Distributed systems have become widespread in software solutions for commercial companies due to their scalability, reliability, significant potential for cost reduction, and ability to handle ever-growing volumes of data. The primary consumers of these solutions are so-called data-intensive applications such as e-commerce platforms, social media networks, or cloud service
providers, which need to accommodate millions of users and transactions [5]. The unique characteristics of distributed systems make them essential for businesses eager to innovate, grow, and compete in a digital-first economy.
A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system. Such systems have become widespread due to the adoption of microservice architectures (Figure 1).
Figure 1. Transition from the monolith to the microservice architecture
Alongside the aforementioned advantages, distributed systems also come with several disadvantages and challenges. These downsides can add complexity and cost to system design, maintenance, and operation. Key disadvantages of distributed systems include [6]:
• Network related issues
• Fault detection and diagnosis challenges
• Security related issues
• Increased maintenance costs
• Higher resource requirements
• Data consistency challenges
One of the most important goals of a distributed system is to allow its users to access resources. Access can be implemented in a cooperative way, as in the case of communication channels, or in a competitive way, when two or more independent users or clients want to reach the same file server or the same tables in a shared database. In instances of concurrent resource utilization, it is essential that each user remains unaware of the simultaneous access by others to the same resource. This phenomenon is referred to as concurrency transparency. A critical consideration in this context is ensuring that concurrent access to a shared resource leaves the resource in a consistent state. Consistency can be attained through the implementation of locking mechanisms, which grant users exclusive access to the desired resource in a sequential manner. A more sophisticated approach involves the utilization of transactions; however, the implementation of transactions in a distributed system can be challenging, particularly when scalability presents a concern [7]. This often arises when building transaction-intensive solutions such as payment platforms [8].
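As a minimal illustration of the locking approach just described, the Python sketch below serializes concurrent writers to a shared record with a mutex; the record structure and the increment workload are hypothetical stand-ins for a real shared resource:

```python
import threading

# Hypothetical shared resource: a record guarded by a lock so that
# concurrent writers obtain exclusive access in a sequential manner.
class SharedRecord:
    def __init__(self, value: int = 0):
        self._value = value
        self._lock = threading.Lock()

    def update(self, delta: int) -> int:
        # The lock grants exclusive access; without it, concurrent
        # read-modify-write cycles could interleave and lose updates.
        with self._lock:
            self._value += delta
            return self._value

record = SharedRecord()
threads = [threading.Thread(target=record.update, args=(1,)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert record.update(0) == 100  # all 100 increments are preserved
```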
Figure 2. Transaction-intensive solutions present significant challenges in maintaining data consistency due to their inherent complexity
Understanding and applying best practices for maintaining data consistency is becoming an important aspect of ensuring the stability and efficiency of high-load distributed systems (HLDS) [9]. The selection of architectural solutions and technologies that facilitate a balance between consistency and availability is crucial for the system's capacity to adapt to evolving usage conditions.
Data consistency models are a set of principles and guarantees that define the behavior of a system when reading and writing data. They provide predictability and reliability in a distributed computing environment and define how and when changes made in one place become visible to other parts of the system. There are five main criteria that must be evaluated when designing data sharing in distributed systems [6]:
• concurrency (the extent to which conflicting read/write access is permissible)
• consistency (preservation of update dependencies and the tolerance for stale read data)
• availability (the method of access to replicas in their absence)
• visibility (when a global view should be available, once local changes have been applied to the replicated data)
• isolation (when remote updates must be observed locally)
These criteria represent a set of requirements for a consistency model appropriate to the system, in accordance with the CAP theorem [10]. Based on these requirements, many consistency models have been invented and classified (Figure 3) [7], but only a few of them have found widespread adoption:
• strict consistency (linearizability) assumes that if a write occurs before a certain point in time, then any subsequent read after that point will see that write. Achieving strict consistency is quite difficult, especially when high availability and partition tolerance are required
• sequential consistency has less stringent requirements and guarantees that if a write is seen as complete, then all subsequent reads will see the result of that write [11], but does not require that changes be immediately visible to all nodes
• weak consistency provides even more flexible guarantees by not requiring immediate or sequential visibility of changes. Instead, it guarantees that if no new writes are made, then eventually all reads will return the latest value. This greatly simplifies its implementation in environments
where latency and network partitioning are common [12], but it introduces an element of uncertainty into the system's behavior, making it less predictable for users and developers
• causal consistency ensures that operations that have a causal relationship are observed in the corresponding order (a quorum-replication sketch after Figure 3 illustrates how the stronger and weaker ends of this spectrum arise in practice).
Figure 3. Classification of different consistency models used in distributed systems
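One common way the stronger and weaker models classified in Figure 3 materialize in practice is quorum replication: with N replicas, choosing a write quorum W and a read quorum R such that R + W > N forces every read quorum to overlap the latest write quorum, yielding strongly consistent reads, while smaller quorums trade consistency for latency. The following minimal sketch assumes in-memory "replicas" and per-write version numbers; real systems add networking, failure handling, and read repair:

```python
import random

N = 3  # total number of replicas
replicas = [dict() for _ in range(N)]

def write(key, value, version, w):
    """Write to a randomly chosen write quorum of w replicas."""
    for replica in random.sample(replicas, w):
        replica[key] = (version, value)

def read(key, r):
    """Read a randomly chosen read quorum of r replicas; return the
    highest-versioned value seen (None if no sampled replica has the key)."""
    found = [rep[key] for rep in random.sample(replicas, r) if key in rep]
    return max(found)[1] if found else None

# With R + W > N the read quorum must overlap the write quorum,
# so the latest write is always observed (strongly consistent read).
write("x", "v1", version=1, w=2)
assert read("x", r=2) == "v1"   # 2 + 2 > 3: overlap guaranteed

# With R + W <= N a read can miss the newest write and return a stale
# (or no) value until replication catches up (eventual consistency).
write("x", "v2", version=2, w=1)
print(read("x", r=1))           # may print "v2", the stale "v1", or None
```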
The choice of consistency model is largely determined by the specifics of the application. Developers must carefully weigh the requirements for availability and timeliness of information to find a balance appropriate for the system under development. Figure 4 provides a visual representation of the parameters of the consistency models.
[Figure 4 arranges consistency models along a spectrum from weaker to stronger guarantees: eventual consistency, consistent prefix, session guarantees (read your own writes, monotonic reads, monotonic writes, writes follow reads), bounded staleness, causal, sequential, and strong (linearizable) consistency. Moving toward the stronger end lowers availability and throughput and raises latency, in exchange for highly consistent data.]
Figure 4. The characteristics of consistency models are dependent on their respective types
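Several of the intermediate models in Figure 4, such as read-your-own-writes and monotonic reads, are session guarantees: they constrain only what a single client may observe. Below is a minimal sketch of read-your-own-writes, assuming a version-stamped store and a hypothetical client-side session token; the staleness check is deliberately coarse, using one horizon for the whole session rather than per key:

```python
# Minimal session-guarantee sketch: the client remembers the highest
# version it has written and refuses reads from a replica that has not
# yet caught up. Store and replica classes are hypothetical stand-ins.

class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value); may lag behind others

    def get(self, key):
        return self.data.get(key)

class Session:
    """Tracks the client's write horizon to enforce read-your-writes."""
    def __init__(self):
        self.last_written_version = 0

    def write(self, replica, key, value, version):
        replica.data[key] = (version, value)
        self.last_written_version = max(self.last_written_version, version)

    def read(self, replica, key):
        entry = replica.get(key)
        # Coarse whole-session check: reject replicas older than our writes.
        if entry is None or entry[0] < self.last_written_version:
            raise RuntimeError("replica too stale for this session; retry elsewhere")
        return entry[1]

primary, lagging = Replica(), Replica()
session = Session()
session.write(primary, "profile", "updated", version=7)
print(session.read(primary, "profile"))  # "updated"
# session.read(lagging, "profile")       # would raise: stale replica rejected
```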
To ensure data consistency, various methods and approaches are used, each of which has its own characteristics, advantages, and areas of application (Table 1).
Table 1.
Features of the application of consistency assurance methods
| Method/Approach | Peculiarities | Advantages | Application areas |
| Two-phase commit (2PC) | Two phases: preparation and confirmation/rollback. | Guarantees transaction atomicity. | Distributed transaction systems. |
| Three-phase commit (3PC) | Adds a pre-commit phase to 2PC to reduce blocking. | Improved fault tolerance. | Distributed systems with high availability requirements. |
| Vector clocks | Track causal dependencies between events [13] (see the sketch below the table). | Enable the implementation of causal consistency. | Systems where the sequence of events is important. |
| Data versioning [14] | Saves the history of changes to objects. | Simplifies conflict resolution. | Systems with frequent data changes. |
| Resolving conflicting replicas | Mechanisms for merging changes to achieve consistency. | Flexibility in handling conflicts. | Systems with active replication. |
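As an illustration of the vector-clock row of Table 1, the sketch below keeps one counter per node and uses element-wise comparison to distinguish causally ordered events from concurrent (conflicting) ones; the node names and event sequence are illustrative:

```python
# Minimal vector-clock sketch: each node keeps a counter per node;
# comparing clocks reveals whether two events are causally ordered
# or concurrent (a conflict that needs resolution).

def increment(clock: dict, node: str) -> dict:
    """Return a copy of `clock` with this node's counter advanced."""
    out = dict(clock)
    out[node] = out.get(node, 0) + 1
    return out

def merge(a: dict, b: dict) -> dict:
    """Element-wise maximum: the causal join of two clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def happened_before(a: dict, b: dict) -> bool:
    """True if every counter in `a` is <= its counterpart in `b`, and a != b."""
    return a != b and all(a.get(n, 0) <= b.get(n, 0) for n in a.keys() | b.keys())

a = increment({}, "node1")   # {'node1': 1}
b = increment(a, "node2")    # causally after a
c = increment(a, "node3")    # concurrent with b
assert happened_before(a, b)
assert not happened_before(b, c) and not happened_before(c, b)  # conflict
print(merge(b, c))           # {'node1': 1, 'node2': 1, 'node3': 1}
```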
Each of these methods and approaches plays a crucial role in maintaining data consistency within the developed system. By integrating various mechanisms and considering the specifics of a particular application or service, it is possible to identify the optimal balance among consistency, availability, and efficiency, thereby enhancing the system's reliability and operational effectiveness. The implementation of microservice architecture offers significant opportunities for the application of these technologies.
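For example, the two-phase commit method from the first row of Table 1 can be sketched as follows; participants here are in-process objects, whereas a production implementation would also need timeouts and durable logs to survive coordinator failure:

```python
# Minimal two-phase commit sketch: the coordinator asks every participant
# to prepare (phase 1) and commits only on unanimous agreement, otherwise
# rolls everyone back (phase 2).

class Participant:
    def __init__(self, name: str, healthy: bool = True):
        self.name, self.healthy, self.state = name, healthy, "idle"

    def prepare(self) -> bool:
        # Vote "yes" only if this participant can make the change durable.
        self.state = "prepared" if self.healthy else "aborted"
        return self.healthy

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants) -> bool:
    # Phase 1: collect votes; any "no" aborts the transaction.
    if all(p.prepare() for p in participants):
        # Phase 2a: unanimous agreement -> commit everywhere.
        for p in participants:
            p.commit()
        return True
    # Phase 2b: at least one "no" vote -> roll back everywhere.
    for p in participants:
        p.rollback()
    return False

nodes = [Participant("orders"), Participant("payments"), Participant("stock")]
assert two_phase_commit(nodes)        # atomic commit across all nodes
nodes[1].healthy = False
assert not two_phase_commit(nodes)    # atomic rollback across all nodes
```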
Problems of implementing consistency and ways to overcome them. In the process of implementing consistency, developers face a number of problems, including the complexity of data management under conditions of high availability and fault tolerance [15]. It is necessary to ensure the scalability of the system while maintaining strict requirements for data consistency, which requires careful planning and optimization of the architecture. Table 2 presents typical problems encountered when implementing consistency, along with possible solutions:
Table 2.
Problems of implementing consistency and ways to solve them
| Problem | Description | Solution methods |
| Delays due to rigid consistency [15] | Striving for absolute consistency increases latency due to the need to synchronize data between nodes. | Applying consistency models tailored to the specific requirements of the application, such as sequential or causal consistency. |
| Concurrency and lock management | Simultaneous access to data may result in blocking and slowdowns in the system. | Use of optimistic and pessimistic locking methods and conflict resolution algorithms (see the compare-and-set sketch below the table). |
| Difficulty of scaling | System expansion can make it more difficult to manage data consistency. | Data partitioning and replication, using geographically distributed databases to improve scalability and availability [17]. |
| Network partition tolerance | Network failures may prevent data synchronization between nodes. | Using partition-tolerant consensus algorithms such as Paxos or Raft to ensure consistency even under network problems [18]. |
| Ensuring consistency in microservice architecture | Microservices increase the complexity of data management and consistency. | Creating a centralized state management service or using event-driven approaches to synchronize data between microservices. |
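The optimistic locking mentioned in the concurrency row of Table 2 can be sketched as a version-checked compare-and-set: readers take no locks, and a writer's update is rejected if another writer committed first, prompting a retry. The store layout and retry policy below are illustrative:

```python
import threading

# Minimal optimistic-locking sketch: a writer submits the version it
# read, and the store rejects the update if someone else committed first.

class VersionedStore:
    def __init__(self):
        self._data = {}                  # key -> (version, value)
        self._guard = threading.Lock()   # protects the CAS step itself

    def get(self, key):
        return self._data.get(key, (0, None))

    def compare_and_set(self, key, expected_version, new_value) -> bool:
        with self._guard:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False             # conflict: caller must retry
            self._data[key] = (version + 1, new_value)
            return True

store = VersionedStore()

def add_interest(key, rate):
    # Retry loop: on conflict, re-read the latest state and reapply.
    while True:
        version, balance = store.get(key)
        if store.compare_and_set(key, version, (balance or 0) + rate):
            return

add_interest("account:42", 10)
add_interest("account:42", 5)
assert store.get("account:42") == (2, 15)  # both updates applied in order
```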
This is not an exhaustive list of potential methods for addressing common consistency issues; however, it outlines fundamental strategies that help to develop reliable and effective distributed systems.
Conclusion
Ensuring data consistency in distributed systems is a complex task that requires a comprehensive approach and a deep understanding of both theoretical foundations and practical aspects from developers and system architects. The choice of the optimal consistency model is a key factor in determining the strategy for working with data in the developed system. Depending on the specifics of the application and the requirements for reliability and performance, it may be necessary to use strict, causal, or sequential consistency. The success of the implementation is significantly influenced by the selected technologies and tools, including NoSQL and NewSQL [19] databases, as well as messaging brokers and coordination services such as Apache Kafka and Apache ZooKeeper respectively, which provide powerful capabilities for creating scalable and reliable systems [20]. Finding a balance between performance and consistency requires careful consideration from developers and can be achieved through trade-offs aimed at optimizing the system for specific tasks. An important part of this work is devising strategies for common problems such as latency and lock management, scalability, and resilience to network partitioning, which is key to creating efficient and reliable platforms. Analysis of successful examples such as Amazon DynamoDB and Google* Spanner [21], though not presented in this study, shows that with a reasonable choice of consistency models and the effective application of modern approaches to data management, it is possible to achieve a high level of reliability and performance that meets application requirements. Successful implementation of consistency requires from developers not only technical knowledge and skills, but also a deep understanding of the specifics of the application, which allows finding flexible and effective solutions for each specific case.
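As one concrete example of such a trade-off, a Kafka producer can be configured to favor delivery guarantees over raw latency. The sketch below uses the confluent-kafka Python client; the broker address and topic name are placeholders:

```python
from confluent_kafka import Producer

# Producer tuned for consistency over speed: idempotence removes
# duplicates on retry, and acks=all waits for the full in-sync replica
# set before confirming a write. Broker address and topic are placeholders.
producer = Producer({
    "bootstrap.servers": "broker:9092",
    "enable.idempotence": True,   # no duplicates on producer retries
    "acks": "all",                # wait for all in-sync replicas
})

def on_delivery(err, msg):
    if err is not None:
        print(f"delivery failed: {err}")

producer.produce("orders", key="42", value=b"order-created", callback=on_delivery)
producer.flush()  # block until the broker has acknowledged the write
```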
References:
1. Acciarini, C., Cappa, F., Boccardelli, P. and Oriani, R., 2023. How can organizations leverage big data to innovate their business models? A systematic literature review. Technovation, 123, p. 102713.
2. Malallah, H.S., Qashi, R., Abdulrahman, L.M., Omer, M.A. and Yazdeen, A.A., 2023. Performance analysis of enterprise cloud computing: a review. Journal of Applied Science and Technology Trends, 4(01), pp. 01-12.
3. Lee, E.A., Akella, R., Bateni, S., Lin, S., Lohstroh, M. and Menard, C., 2023. Consistency vs. availability in distributed real-time systems. arXiv preprint arXiv:2301.08906.
4. Zhang, Z., Patra, B.G., Yaseen, A., Zhu, J., Sabharwal, R., Roberts, K., Cao, T. and Wu, H., 2023. Scholarly recommendation systems: a literature survey. Knowledge and Information Systems, 65(11), pp.4433-4478.
5. Abughazala, M., 2024. Architecting Data-Intensive Applications: From Data Architecture Design to Its Quality Assurance. arXiv preprint arXiv:2401.12011.
6. Susarla, S. and Carter, J. Composable consistency for large-scale peer replication. Technical Report UUCS-03-025, School of Computing.
7. Aldin, H., Deldari, H., Moattar, M. and Razavi Ghods, M., 2019. Consistency models in distributed systems: A survey on definitions, disciplines, challenges and applications. arXiv preprint arXiv:1902.03305.
8. Saga-Based Design Using Apache Camel and Kafka: Implementing Highly Reliable Distributed Business Transactions, https://portx.io/saga-based-design-using-apache-camel-and-kafka-implementing-highly-reliable-distributed-business-transactions
9. Filisov D.A. Optimization strategies for highly loaded applications: increasing overall performance / D.A. Filisov // Science Bulletin. Vol. 3, No. 7 (64). 2023, P. 233-257. doi: 10.24412/2712-8849-2023-764-233-257
10. Kuznetcov I.A. Scalable architectures for backend development: current state and prospects // Modern scientific researches and innovations. 2024. No. 2 [Electronic journal]. URL: https://web.snauka.ru/en/issues/2024/02/101564
11. Krasochkin S.G. Finding the ideal balance between query processing speed and scaling of relational and non-relational databases / S.G. Krasochkin // Innovations and Investments. No. 7. 2023. P.169-171.
12. de Freitas, D.C.A., 2023. Towards Causal Consistency in Read-Heavy Cloud-Native Systems.
13. Karpovich M.N. Features of designing microservice -event architectures for highly loaded distributed information processing systems / M.N. Karpovich // Proceedings of BSTU. Series 3: Physical and mathematical sciences and computer science, No. 1 (266). 2023. P. 89-95.
14. Aikins, M.V., 2023. Distributed storage systems and how they handle data consistency and reliability. Faculty of Natural and Applied Sciences Journal of Scientific Innovations, 5(1), pp. 83-89.
15. Sharma, P. and Prasad, R., 2023. Techniques for Implementing Fault Tolerance in Modern Software Systems to Enhance Availability, Durability, and Reliability. Eigenpub Review of Science and Technology, 7(1), pp.239-251.
16. Redrugina N.M. Models and methods for calculating delays in the provision of services by the user on service platforms of session infocommunication services / N.M. Redrugina // T-Comm Telecommunications and Transport. Vol. 17, No. 4. 2023. P. 32-38.
17. Emara, T.Z. and Huang, J.Z., 2020. Distributed data strategies to support large-scale data analysis across geo-distributed data centers. IEEE Access, 8, pp.178526-178538.
18. Alkhatib, B., Udayashankar, S., Qunaibi, S., Alquraan, A., Alfatafta, M., Al-Manasrah, W., Depoutovitch, A. and Al-Kiswany, S., 2023. Partial network partitioning. ACM Transactions on Computer Systems, 41(1-4), pp.1-34.
19. Muniswamaiah, M., Agerwala, T. and Tappert, C.C., 2023, December. Comparison of SQL, NoSQL, and NewSQL Database Technologies. In 2023 IEEE International Conference on Big Data (BigData) (pp. 6230-6232). IEEE.
20. Zhang, W. and Chen, L., 2024. Design and Optimization of a High Availability, Low Latency Messaging Broker Using Zookeeper and Kafka for Asynchronous Processing. Asian American Research Letters Journal, 1(3).
21. Dziubak, S., 2023. Review of Cloud Database Benefits and Challenges. Modern Management Review, 28(3), pp. 7-16.
* (At the request of Roskomnadzor, we inform that the foreign entity owning the Google information resources is a violator of the legislation of the Russian Federation. Editor's note.)