
PPSUTLSC-2024

PRACTICAL PROBLEMS AND SOLUTIONS TO THE USE OF THEORETICAL LAWS IN THE SCIENCES OF THE 21ST CENTURY

Tashkent, 6-8 May 2024, www.in-academy.uz

DEVELOP AN APPLICATION THAT CONVERTS VIDEOS INTO SLIDES UTILIZING BACKGROUND ESTIMATION AND FRAME DIFFERENCING IN OPENCV

Primbetov Abbaz1, Abdusalomov Shaxbozbek2, To'lqinov Oybek3, Rahmonova Mahbuba4, Pardaboyeva Mohichexra5

1 Senior lecturer, University of Tashkent for Applied Sciences, Gavhar Street 1, Tashkent 100149, Uzbekistan
2,3,4,5 Student, University of Tashkent for Applied Sciences, Gavhar Street 1, Tashkent 100149, Uzbekistan
abbaz0203@mail.ru, skayoker2000@gmail.com, tolqinovoybek92@gmail.com, mahbubam217@gmail.com, mohichehra.pardaboeva18@gmail.com
ORCID ID: 0009-0004-3120-1152
https://doi.org/10.5281/zenodo.13364871

Abstract: In this paper, we will guide you through the process of creating a user-friendly video-to-slides converter application. The goal is to extract slide images from slide or lecture videos using fundamental frame differencing and background subtraction techniques available in OpenCV. By following the instructions provided, you will be able to develop a straightforward and efficient solution for converting videos into slide images.

Keywords: OpenCV, KNN (K-Nearest Neighbors), GMG, PPT, PDF.

1 INTRODUCTION

This article presents a robust application designed to address a common need: obtaining slide presentations from video lectures, whether or not they contain animations. Platforms like YouTube often lack accompanying slide files for such lectures. The objective of this application is to fill that gap by leveraging techniques such as basic frame differencing and statistical background subtraction models like KNN and GMG, which are readily available in OpenCV. By utilizing these techniques, the application can accurately convert video lectures into corresponding slide images, ultimately allowing users to create PowerPoint (PPT) or Portable Document Format (PDF) presentations.

2 METHODS

Before we delve into the development of the application, it is crucial to familiarize ourselves with the concept of background subtraction, as it forms the essence of our application. Background subtraction is a method employed to differentiate foreground objects from the background in a video sequence. The underlying principle involves creating a model of the background scene and subtracting it from each frame to isolate the foreground objects. This technique finds extensive utility in numerous computer vision applications, including object tracking, activity recognition, and crowd analysis. Consequently, we can apply this concept to convert slide videos into their corresponding slides, where the concept of motion corresponds to the various animations observed throughout the video sequence.

Figure 1: Background Subtraction

Background modeling consists of two main steps:

1. Background Initialization
2. Background Update

The initial step involves computing an initial model of the background, while the subsequent step focuses on updating the model to accommodate potential changes in the scene. Background estimation can also be employed in motion-tracking applications, including traffic analysis, people detection, and more. To enhance your comprehension of background estimation for motion tracking, we highly recommend referring to the following article [1], which will undoubtedly provide valuable insights.
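The two steps above can be sketched numerically. Below is a minimal NumPy illustration (not the paper's code) of a background model maintained as an exponentially weighted running average; OpenCV's `cv2.accumulateWeighted` applies the same per-pixel recurrence, and the `alpha` value here is an arbitrary illustrative choice:

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """One Background Update step: exponentially weighted running average.
    cv2.accumulateWeighted(frame, bg, alpha) applies the same per-pixel rule."""
    return (1.0 - alpha) * bg + alpha * frame

# Background Initialization: seed the model with the first frame.
frames = [np.full((4, 4), v, dtype=np.float64) for v in (10.0, 10.0, 10.0, 200.0)]
bg = frames[0].copy()

for f in frames[1:]:
    bg = update_background(bg, f)

# The sudden 200-valued frame only nudges the model slightly, so a brief
# animation does not corrupt the background estimate.
```

Because the update weight is small, transient foreground motion decays out of the model while persistent scene changes are gradually absorbed.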

2.1. OpenCV Background Subtraction Techniques

The major background subtraction approaches from OpenCV that are popularly used are:

1. K-Nearest Neighbors (KNN): This algorithm maintains a history of the most recent K frames and calculates the pixel intensity probability distribution for each pixel. The foreground is determined based on the dissimilarity between the current pixel value and the historical distribution [2].

2. Mixture of Gaussians (MOG): MOG models each pixel's intensity as a weighted combination of Gaussian distributions. The algorithm adapts the parameters of the Gaussians over time to account for changes in the background. Pixels with low probabilities under the background model are classified as foreground [3].

3. GMG (Godbehere-Matsukawa-Goldberg): GMG models each pixel using a probabilistic background model and applies Bayesian inference to classify foreground and background pixels. It adapts to dynamic scenes and handles lighting variations effectively [4].

Of the above, we are going to use the GMG and KNN background subtraction models for our application, as they yield better results compared to MOG2.

2.2. Background Subtraction using Frame Differencing

Background subtraction through frame differencing is a relatively straightforward process. It involves retrieving the video frames in grayscale and calculating the absolute difference between consecutive frames. By applying morphological operations, we can determine the percentage of the foreground mask. If this percentage surpasses a specific threshold, we will save the corresponding frame. The flowchart below demonstrates this.

Figure 2: a) Application Flowchart for Video to Slides Converter using Frame Differencing; b) Application Flowchart for Video to Slides Converter through Background Modeling

Figure 3: Comparisons Across GMG (a) and KNN (b) Background Estimation

We ran the GMG and KNN methods across several slide videos and found that the KNN background estimation approach yielded almost four times the FPS of its GMG counterpart. However, in some video samples, the KNN approach missed a few frames.

3 POST-PROCESSING

Having obtained slide images through background modeling, a prominent issue arises: a considerable number of generated screenshots exhibit significant similarity. Hence, our primary objective in post-processing is to detect and eliminate such near-duplicate images. Cryptographic hashing techniques tend to yield completely different hashes for similar images due to minor variations in pixel values. To address this challenge, image hashing techniques come into play, providing similar or identical hashes for similar images. Among the various approaches for image hashing, including average hashing, perceptual hashing, difference hashing, and wavelet hashing, we have selected difference hashing [6] for our specific application. This choice is driven by the algorithm's exceptional speed of computation and its superior robustness compared to other hashing techniques such as average and perceptual hashing.

4 EXPERIMENTAL RESULTS AND DISCUSSION
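A minimal pure-Python sketch of difference hashing is shown below; a real pipeline would resize with OpenCV or PIL interpolation, and the naive nearest-neighbor shrink and helper names here are our simplifications for illustration:

```python
def resize_gray(img, w, h):
    """Naive nearest-neighbor shrink of a 2D list of grayscale values."""
    H, W = len(img), len(img[0])
    return [[img[y * H // h][x * W // w] for x in range(w)] for y in range(h)]

def dhash(img, hash_size=8):
    """Difference hash: shrink to (hash_size+1) x hash_size, then emit one
    bit per horizontally adjacent pixel pair (1 when the pixel brightens)."""
    small = resize_gray(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for x in range(hash_size):
            bits = (bits << 1) | (1 if row[x] < row[x + 1] else 0)
    return bits

def hamming(h1, h2):
    """Number of differing bits; small distances mean near-duplicate slides."""
    return bin(h1 ^ h2).count("1")
```

Because each bit encodes only a left-to-right brightness comparison, a uniform brightness shift leaves the hash unchanged, which is exactly the robustness property that makes difference hashing suitable for deduplicating screenshots.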

We have observed the satisfactory outcomes achieved by different background estimation methods in scenes with noticeable animations. Likewise, scenes predominantly composed of static frames can be effectively handled using a simple frame differencing approach. Nevertheless, when dealing with video sequences that primarily consist of static frames but also include facial camera movements, none of the aforementioned approaches produce satisfactory results. The facial movements are incorrectly identified as animations, resulting in an excessive number of redundant captured frames. Even after applying frame differencing and implementing post-processing, the results showed only marginal enhancement.



Figure 4: Results of videos having static frames with facial movements

It has also been noted that even after the post-processing step, there are cases where redundant slide images persist. Instead of relying on image hashing, more effective techniques like cosine similarity can be employed to identify similar images. Increasing the frame buffer history might help alleviate this issue, but neither approach guarantees satisfactory results. To address this problem, a deep learning-based approach could be adopted, where facial features are extracted and used as input for a non-parametric supervised learning classifier like K-Nearest Neighbors to obtain unique samples. The application demonstrates nearly flawless results for lectures featuring voice-over presentations. However, for lectures involving interactive sessions, alternative techniques beyond those discussed in this article may need to be explored.
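Cosine similarity over flattened image vectors can be computed as in the sketch below; this is an illustrative stdlib-only example, not part of the original application:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two flattened image vectors:
    1.0 for identical direction, near 0 for unrelated content."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```

Two screenshots would be flattened to 1-D pixel lists and compared; pairs whose similarity exceeds a chosen threshold (e.g. 0.98, an assumed value) would be treated as duplicates.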

Figure 5: Results for lectures having voice-over presentations

5 CONCLUSION

The primary objective of this paper was to develop a straightforward application that converts voice-over video lectures into slides, employing the following approaches:

1. The naive frame differencing approach, which proves effective for video lectures primarily composed of static frames.

2. Probabilistic methods like GMG and KNN (K-Nearest Neighbors) for modeling background pixels. These techniques are suitable for lectures containing substantial animations, allowing for background modeling.


Furthermore, we have acknowledged that both approaches may generate redundant slides when dealing with videos featuring static slides and facial movements. Nonetheless, even with these simple techniques, the application can still produce satisfactory results for most video lectures. We trust that this article provides sufficient guidance and insight to help you construct a simple yet efficient application for converting video lectures into slide PDFs or PowerPoint presentations.

REFERENCES

[1]. Magee, D. R. (2004). Tracking multiple vehicles using foreground, background and motion models. Image and Vision Computing, 22(2), 143-155.

[2]. Qasim, S., Khan, K. N., Yu, M., & Khan, M. S. (2021, April). Performance evaluation of background subtraction techniques for video frames. In 2021 International Conference on Artificial Intelligence (ICAI) (pp. 102-107). IEEE.

[3]. Li, B., Zhang, Y., Lin, Z., & Lu, H. (2015). Subspace clustering by mixture of gaussian regression. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2094-2102).

[4]. Trnovszky, T., Sykora, P., & Hudec, R. (2017). Comparison of background subtraction methods on near infrared spectrum video sequences. Procedia engineering, 192, 887-892.

[5]. Piccardi, M. (2004, October). Background subtraction techniques: a review. In 2004 IEEE international conference on systems, man and cybernetics (IEEE Cat. No. 04CH37583) (Vol. 4, pp. 3099-3104). IEEE.

[6]. Ng, W. W., Lv, Y., Yeung, D. S., & Chan, P. P. (2015). Two-phase mapping hashing. Neurocomputing, 151, 1423-1429.

[7]. Cioppa, A., Van Droogenbroeck, M., & Braham, M. (2020, October). Real-time semantic background subtraction. In 2020 IEEE International Conference on Image Processing (ICIP) (pp. 3214-3218). IEEE.

[8]. Braham, M., Pierard, S., & Van Droogenbroeck, M. (2017, September). Semantic background subtraction. In 2017 IEEE International Conference on Image Processing (ICIP) (pp. 4552-4556). IEEE.

[9]. Tougaard, S. (1989). Practical algorithm for background subtraction. Surface Science, 216(3), 343-360.

[10]. J. Redmon, S. Divvala, R. Girshick, A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, DOI: https://doi.org/10.1109/CVPR.2016.91

[11]. A. Bochkovskiy, C. Y. Wang, H. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," arXiv, 2020, arXiv:2004.10934.

[12]. C. Y. Wang, A. Bochkovskiy, H. Y. M. Liao, "Scaled-YOLOv4: Scaling Cross Stage Partial Network," arXiv preprint, arXiv:2011.08036, 2021.

[13]. J. Redmon, A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv, 2018, arXiv:1804.02767.


[14]. G. Jocher, A. Chaurasia, A. Stoken, J. Borovec, Nanocode012, Y. Kwon, TaoXie, J. Fang, imyhxy, K. Michael, et al. "ultralytics/yolov5: v6.1-TensorRT, TensorFlow Edge TPU and Open VINO Export and Inference", Zenodo, 2022, DOI : https://doi.org/10.5281/zenodo.6222936

[15]. C. -Y. Wang, H. -Y. Mark Liao, Y. -H. Wu, P. -Y. Chen, J. -W. Hsieh and I. -H. Yeh, "CSPNet: A New Backbone that can Enhance Learning Capability of CNN," in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 2020, pp. 1571

[16]. Kendjayeva , D. K. (2024). ZAMONAVIY TA'LIMDA RAQAMLI TEXNOLOGIYALAR VA SUN'YIY INTELLEKT TEXNOLOGIYALARIDAN FOYDALANISH. GOLDEN BRAIN, 2(5), 207-213.

[17]. Kendjayeva , D. K. (2024). ZAMONAVIY TA'LIMDA RAQAMLI TEXNOLOGIYALAR VA SUN'YIY INTELLEKT TEXNOLOGIYALARIDAN FOYDALANISH. GOLDEN BRAIN, 2(5), 207-213.

[18]. T. Y. Lin et al., "Microsoft COCO: Common Objects in Context," in European Conference on Computer Vision, 2014, arXiv:1405.0312, 2015.

[19]. Primbetov, A. (2024). Automatic Red Eye Remover using OpenCV. Modern Science and Research, 3(1), 1-3.

[20]. Primbetov, A., Saidova, F., Yembergenova, U., & Primbetov, A. (2024). REAL TIME LOGO RECOGNITION USING YOLO ON ANDROID. Modern Science and Research, 3(1), 1-5.

[21]. Giyosjon, J., Anvarjon, N., & Abbaz, P. (2023). SAFEGUARDING THE DIGITAL FRONTIER: EXPLORING MODERN CYBERSECURITY METHODS. JOURNAL OF MULTIDISCIPLINARY BULLETIN, 6(4), 77-85.
