
INTERNATIONAL SCIENTIFIC AND TECHNICAL CONFERENCE "DIGITAL TECHNOLOGIES: PROBLEMS AND SOLUTIONS OF PRACTICAL IMPLEMENTATION IN THE SPHERES" APRIL 27-28, 2023

THE POSSIBILITY OF CUDA TECHNOLOGY IN DEEP LEARNING PROCESSES

Mekhriddin Rakhimov1, Shakhzod Javliev2

1 Associate professor of the department of Computer systems, Tashkent university of information technologies named after Muhammad al-Khwarizmi

2 Trainee-researcher of the department of Artificial intelligence, Tashkent university of information technologies named after Muhammad al-Khwarizmi

E-mail: raximov022@gmail.com 1, shajavliyev@gmail.com 2

https://doi.org/10.5281/zenodo.7854484

Abstract. As a result of technological advancement, the speed of computers is increasing day by day. This paper provides information on CUDA technology and its application, which saves time in implementing deep learning processes in artificial intelligence. Using this technology, the available computing resources of heterogeneous systems can be fully utilized for extracting important features from large datasets in deep learning problems and for working with images. In the research results, we present comparative results for tools and programming languages that support CUDA technology.

Keywords: artificial intelligence, deep learning, heterogeneous computing systems, CPU, GPU, parallel processing, CUDA technology.

In recent times, it has become difficult to imagine any field without information technology and artificial intelligence. We can see that almost all areas related to artificial intelligence and deep learning are rapidly evolving [1]. The use of artificial intelligence and deep learning methods has made it possible to solve problems that previously required a lot of time and difficult calculations, and to analyze complex data with unmatched accuracy. Moreover, with the development of technology, computers are becoming faster day by day, and deep learning algorithms are being used to extract valuable features from large amounts of data obtained from images, videos, text, speech signals, and sensors. Currently, in various fields, high-performance computing is achieved through parallel computing tools [2,4].

Deep learning is a process aimed at understanding the structure and properties of multilayer neural networks using a large amount of data related to the object under study. Deep neural networks have demonstrated remarkable performance in a variety of challenging tasks such as image classification, object detection, speech recognition, and language translation. Training deep learning models typically requires large amounts of data and computing resources, which has led to the development of specialized deep learning hardware and software, including graphics processing units (GPUs) [1,6].

Processing large amounts of data is also one of the current research directions. In addition to the computer's main processor (CPU), we use the graphics processor (GPU) and process images or videos on it [3,5]. The CPU (Central Processing Unit) is an electronic circuit that executes the instructions of a computer program and is considered the central or main processor of a computer. It performs the basic arithmetic, logic, control, and I/O operations specified in the program instructions. The GPU (Graphics Processing Unit) is a processor like the CPU; the difference is that it is a special electronic circuit designed for graphics processing. With the help of the GPU, we can handle the workload of graphics-intensive programs such as large applications, video, and images [5]. As a solution to large-scale problems,

heterogeneous computing systems have taken their place as universal computing platforms. Heterogeneous systems are systems that use multiple processors or cores [3]. These systems gain performance or energy efficiency not only by adding processors of the same type, but also by adding dissimilar coprocessors. CUDA technology is one of the efficient parallelization technologies for heterogeneous computing systems (Figure 1).

[Figure 1 diagram: a (Device) GPU containing blocks of threads — e.g. Thread (0,0), Thread (1,0), Thread (2,0) in Block (0,0) — where each block has its own Shared Memory and all blocks access Global Memory]

Figure 1. GPU memory structure using the CUDA programming model

The main difference between CPU programming and GPU programming is the degree to which the programmer can influence the GPU architecture. A basic understanding of parallelism and GPU architecture allows you to write parallel programs that span hundreds of cores as easily as you write serial programs. A CUDA program contains code for both the GPU and the CPU, written in the C programming language [3]. When writing a program in CUDA, we write a piece of sequential code that is called by only one thread, and the GPU takes that kernel and parallelizes it by running thousands of threads, all performing the same computation. The CUDA programming model provides a way to organize these threads. It allows parallel processing of images on heterogeneous computing systems by simply annotating the code with a small set of extensions to the C programming language [7]. As shown in Figure 2, we do not need to write the entire program in CUDA during data processing; on the contrary, we can use CUDA only when necessary, i.e., when a computationally important or difficult stage of the process occurs. The device code runs on the GPU, and the host code runs on the central processor, so the CPU is called the host and the GPU is called the device [5].
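The kernel-and-thread model described above can be sketched in plain Python. This is only an illustrative model, not real CUDA code: the `launch` helper and the simulated block/thread coordinates are assumptions of the sketch, mirroring how a CUDA kernel computes its global index as `blockIdx.x * blockDim.x + threadIdx.x`.

```python
# Illustrative model of the CUDA thread hierarchy: a "kernel" is one
# function that every thread executes, each with its own block and
# thread coordinates.

def saxpy_kernel(block_idx, thread_idx, block_dim, n, a, x, y, out):
    """Each simulated thread handles exactly one array element."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n:                               # guard: extra threads do nothing
        out[i] = a * x[i] + y[i]

def launch(grid_dim, block_dim, kernel, *args):
    """Sequentially emulate a <<<grid_dim, block_dim>>> kernel launch."""
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(b, t, block_dim, *args)

n = 10
x = list(range(n))
y = [1.0] * n
out = [0.0] * n
# 4 threads per block -> 3 blocks cover 10 elements (last 2 threads idle)
launch(3, 4, saxpy_kernel, n, 2.0, x, y, out)
print(out)
```

On a real GPU the two inner loops do not exist: all threads run concurrently across the cores, which is where the speedup comes from.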

CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) developed by NVIDIA for high-performance computing, which allows developers to use NVIDIA graphics processing units (GPUs) for general-purpose processing [8]. Using CUDA, the power of NVIDIA GPUs can be applied to general computing tasks such as matrix multiplication and other linear algebra operations, instead of just graphics calculations. CUDA harnesses the computer's graphics processing unit (GPU), which enables parallel processing


of large amounts of data or images. The flow of data processing using CUDA technology is carried out in the following stages (Figure 2).

Figure 2. Stages of data processing using CUDA technology
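The stages of Figure 2 — copy data from CPU memory to GPU memory, compute on the device, copy the results back — can be modeled in a short Python sketch. The dictionaries standing in for host and device memory, and the function names, are assumptions of the sketch; they only mirror the shape of the real CUDA calls (`cudaMalloc`, `cudaMemcpy`, a kernel launch).

```python
# Minimal model of the CUDA host/device data flow from Figure 2.
# "Device memory" is simulated with a separate dict, so every stage
# (allocate -> copy in -> kernel -> copy out) is explicit, as in real CUDA.

host_mem = {}
device_mem = {}

def cuda_malloc(name, size):
    device_mem[name] = [0.0] * size          # stage 1: allocate on the device

def memcpy_host_to_device(dst, src):
    device_mem[dst] = list(host_mem[src])    # stage 2: CPU -> GPU copy

def kernel_square(buf):
    # stage 3: the GPU computes; here, squaring every element
    device_mem[buf] = [v * v for v in device_mem[buf]]

def memcpy_device_to_host(dst, src):
    host_mem[dst] = list(device_mem[src])    # stage 4: GPU -> CPU copy

host_mem["a"] = [1.0, 2.0, 3.0]
cuda_malloc("d_a", 3)
memcpy_host_to_device("d_a", "a")
kernel_square("d_a")
memcpy_device_to_host("a", "d_a")
print(host_mem["a"])  # [1.0, 4.0, 9.0]
```

The explicit copies matter in practice: host-device transfers are a common bottleneck, so CUDA programs try to keep data on the device across many kernel launches.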

In the field of artificial intelligence, CUDA is widely used for training and inference of deep neural networks (DNNs) due to its ability to efficiently parallelize computation across thousands of GPU cores. This results in a significant speedup compared to performing these calculations on CPUs alone. Deep learning platforms require the integration of different types of computing cores into a single system to provide optimal performance for the time spent [6]. One of the most important advantages of heterogeneous computing is that power consumption can be optimized by integrating different types of computing cores into the system. In this case, using CUDA technology can give good results for heterogeneous systems.
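The core operation being parallelized in DNN training and inference is matrix multiplication. A hedged CPU-only sketch of the idea, using a small thread pool in place of the thousands of GPU cores the text describes (the worker count and function names are assumptions of the sketch):

```python
from concurrent.futures import ThreadPoolExecutor

# Matrix multiplication parallelized over output rows: each worker
# computes one row independently, the way CUDA assigns independent
# output elements to independent threads.

def matmul_row(i, a, b):
    """Compute row i of the product a @ b."""
    cols, inner = len(b[0]), len(b)
    return [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]

def parallel_matmul(a, b, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: matmul_row(i, a, b), range(len(a))))

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(parallel_matmul(a, b))  # [[19, 22], [43, 50]]
```

Because every output element depends only on one row of `a` and one column of `b`, there are no data dependencies between workers, which is exactly what makes the operation map so well onto GPU cores.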

CUDA can be used to accelerate a wide range of artificial intelligence tasks. Its applications in artificial intelligence include:

■ Computer vision: CUDA can be used to accelerate the training and inference of deep learning models for object detection, image segmentation, and facial recognition.

■ Natural language processing (NLP): CUDA can be used to accelerate the training and inference of deep learning models for language modeling, text classification, and machine translation.

■ Speech recognition: CUDA can be used to accelerate the training and inference of deep learning models for speech recognition, including automatic speech recognition (ASR) and speaker identification.

■ Recommendation systems: CUDA can be used to accelerate the training and inference of deep learning models for recommendation systems, which are used in e-commerce, social media, and other applications.

■ Reinforcement learning: CUDA can be used to accelerate the training of deep reinforcement learning models, which are used for gaming, robotics, and other applications that require agents to learn from their environment.

When using CUDA technology to accelerate deep learning tasks in artificial intelligence, programming languages such as Python, C++, or CUDA C are typically used to write code that uses the CUDA API to perform operations on the GPU. Additionally, CUDA is used in popular deep learning frameworks such as TensorFlow, PyTorch, and Caffe, which provide high-level APIs that allow developers to easily take advantage of GPU power without writing low-level CUDA code [8]. Based on the information presented above, we can observe how effective the CUDA technology is in terms of speed when processing images of different sizes (Figure 3).
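In a framework such as PyTorch, taking advantage of CUDA typically comes down to a one-line device choice. The following is a hedged sketch rather than a definitive recipe: it assumes PyTorch's standard `torch.cuda.is_available()` / `.to(device)` API and falls back to plain Python when PyTorch is not installed, so it runs on both GPU-equipped and CPU-only machines.

```python
# Device selection as commonly done in PyTorch code: use CUDA if an
# NVIDIA GPU is available, otherwise fall back to the CPU.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.tensor([1.0, 2.0, 3.0]).to(device)  # move data to the device
    y = (x * 2).sum().item()                      # same code runs on GPU or CPU
except ImportError:
    # PyTorch not installed: compute the same result in plain Python
    device, y = "cpu", sum(v * 2 for v in [1.0, 2.0, 3.0])
print(device, y)
```

This is the convenience the frameworks provide: the model code itself does not change between CPU and GPU execution, and no low-level CUDA kernels need to be written by the developer.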


Figure 3. CPU vs GPU side-by-side comparison

In the diagram shown in Figure 3, we can see that CUDA technology significantly reduces the time spent not only in deep learning but also in image processing, where the time spent can increase as the size of the images grows. Compared to the CPU, the GPU is much faster, which in turn demonstrates the advantages of using CUDA technology.
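The effect Figure 3 measures — per-pixel work growing with image size — can be reproduced in miniature without a GPU. This CPU-only sketch (the image sizes and the grayscale weights are illustrative choices, not values from the paper) times the same per-pixel operation at two image sizes; on a GPU, each pixel would instead be handled by its own thread.

```python
import time

def grayscale(pixels):
    # Per-pixel luminance conversion: independent per pixel, hence the
    # kind of embarrassingly parallel work a GPU excels at.
    return [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in pixels]

for side in (64, 128):  # e.g. 64x64 and 128x128 images
    img = [(100, 150, 200)] * (side * side)
    t0 = time.perf_counter()
    out = grayscale(img)
    dt = time.perf_counter() - t0
    print(f"{side}x{side}: {len(out)} pixels in {dt:.4f}s")
```

On the CPU the time grows with the pixel count; on a GPU the per-image time stays far flatter until the core count is saturated, which is the gap the figure illustrates.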

In conclusion, CUDA plays a crucial role in speeding up the training of deep neural networks, which is the basis of many artificial intelligence applications. In general, any AI task that involves large-scale matrix operations and can benefit from parallel computing can be accelerated using CUDA. Based on the research studies and the results obtained, the proposed CUDA technology can be used to run additional functions in parallel on several computing units at the same time, sending the relevant data stream from the CPU to the GPU for processing and returning the results to the CPU. Especially in deep learning processes, such a framework can speed up the extraction of important features from large datasets with the help of special additional functions, and we can see that CUDA technology gives a much faster and more efficient result.

REFERENCES

1. M. Rakhimov, R. Akhmadjonov, and S. Javliev, "Artificial Intelligence in Medicine for Chronic Disease Classification Using Machine Learning," 2022 IEEE 16th International Conference on Application of Information and Communication Technologies (AICT), 2022, pp. 1-6, doi: 10.1109/AICT55583.2022.10013587.

2. M. Rakhimov, J. Elov, U. Khamdamov, S. Aminov and S. Javliev, "Parallel Implementation of Real-Time Object Detection using OpenMP," 2021 International Conference on Information Science and Communications Technologies (ICISCT), 2021, pp. 1-4, doi: 10.1109/ICISCT52966.2021.9670146.

3. M. Rakhimov and M. Ochilov, "Distribution of Operations in Heterogeneous Computing Systems for Processing Speech Signals," 2021 IEEE 15th International Conference on Application of Information and Communication Technologies (AICT), 2021, pp. 1-4, doi: 10.1109/AICT52784.2021.9620451.

4. M. F. Rakhimov and U. A. Berdanov, "Parallel Processing Capabilities in the Process of Speech Recognition," 2017 International Conference on Information Science and Communications Technologies (ICISCT), Tashkent, 2017.

5. Mekhriddin Rakhimov Fazliddinovich and Yalew Kidane Tolcha, "Parallel Processing of Ray Tracing on GPU with Dynamic Pipelining," International Journal of Signal Processing Systems, Vol. 4, No. 3, pp. 209-213, June 2016. DOI: 10.18178/ijsps.4.3.209-213.

6. H. Khdier, W. Jasim, and S. Aliesawi, "Deep Learning Algorithms based Voiceprint Recognition System in Noisy Environment," Journal of Physics: Conference Series, vol. 1804, 012042, 2021, doi: 10.1088/1742-6596/1804/1/012042.

7. J. Sanders and E. Kandrot, CUDA by Example: An Introduction to General-Purpose GPU Programming. Addison-Wesley, July 2010, 311 p.

8. NVIDIA Corporation, "NVIDIA CUDA Compute Unified Device Architecture Programming Guide," version 11.1-2021.
