Research article on the topic 'MUSIC RECOMMENDER SYSTEM'

MUSIC RECOMMENDER SYSTEM. Text of a research article in the field of "Computer and Information Sciences"

CC BY


Text of the research paper "MUSIC RECOMMENDER SYSTEM"

INTERNATIONAL SCIENTIFIC AND TECHNICAL CONFERENCE "DIGITAL TECHNOLOGIES: PROBLEMS AND SOLUTIONS OF PRACTICAL IMPLEMENTATION IN THE SPHERES" APRIL 27-28, 2023

MUSIC RECOMMENDER SYSTEM

Alpamis Kutlimuratov¹, Makhliyo Turaeva²

¹Department of Information-Computer Technologies and Programming, Tashkent University of Information Technologies Named after Muhammad Al-Khwarizmi, Tashkent 100200, Uzbekistan. kutlimuratov.alpamis@gmail.com

²Department of Computer Systems and Programming, Tashkent University of Information Technologies Named after Muhammad Al-Khwarizmi, Tashkent 100200, Uzbekistan. max1209uz@gmail.com

https://doi.org/10.5281/zenodo.7854462

Abstract. This article discusses the analysis of Spotify's music data and the generation of recommendations based on subscribers' preferences. The recommendation algorithm suggests content based on a user's personal listening history or on the listening histories of other users with similar tastes. Data extraction, data preparation, and prediction are the main components of the study. Correlation coefficients and the decision tree algorithm are applied to recommend content.

Keywords: Recommendation system, Decision tree, Correlation

Introduction

The proliferation of Internet-based technologies and the resulting deluge of data from all spheres of activity have contributed to the modern problem of information explosion. To address this issue, improve their service, and engage and retain loyal customers, many modern e-commerce websites use a variety of effective recommendation systems; examples include the movie recommendations on Netflix and the music recommendations on Spotify. Researchers have carried out a significant amount of work [1,2,3,4,5,6] on developing recommendation systems. In this study, we describe how to obtain Spotify's own data using the Spotify API and the Spotify library, how to create a structured CSV file, how to preprocess the data to obtain clear results from the classification algorithm (a decision tree in this case), and how to draw recommendations from the results. The study also includes an examination of the dataset, such as a brief look at the correlations among the features and between particular features and the target variable.

Dataset Explanation and Prediction

Three main stages were used in this study: a) Data extraction, in which we used the API to collect the dataset in CSV format; b) Data preparation, using feature selection as the technique of choice. The Pearson correlation coefficient was used to measure the relationships among the features and between each feature and the target variable. The data dimension was reduced to 8, including the "favorite" column. c) Classification and prediction, for which the decision tree method was applied.

a) Data extraction

This study used material from Spotify, including playlists and songs. To access user-related data through the application programming interface (API), an application must be authorized by the user with client credentials. The data were obtained through the Spotify API, which returns the result of each request as a JSON file. Spotify is a provider of music streaming and other media services.

It has more than 489 million monthly active users, of whom 205 million are paying subscribers, making it one of the most significant providers of music streaming services. The dataset [7] used for this analysis is a continuous dataset containing more than 30,000 songs and 12 audio features for each track. To obtain the audio features of every track in two playlists (one of liked songs and one of disliked songs), we first queried the API for the playlist items and then requested the audio features of each track. After the necessary data cleaning, a total of 9,996 tracks and their associated audio features are available.
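The step from JSON responses to a structured CSV file can be sketched as follows. This is a minimal illustration, not the authors' code: the response shape (an `audio_features` list with keys such as `duration_ms`) is an assumption modeled on the Spotify Web API, the helper names are hypothetical, and the sample values are invented so that the example runs without network access.

```python
import csv
import io
import json

# Column order for the CSV. The names mirror the twelve audio features listed
# in the paper; the exact key spellings are an assumption.
FEATURES = [
    "acousticness", "energy", "duration_ms", "liveness", "loudness", "key",
    "speechiness", "danceability", "tempo", "instrumentalness", "valence", "mode",
]

def features_to_rows(payload, favorite):
    """Flatten one JSON response into rows labelled with the playlist's class."""
    tracks = json.loads(payload)["audio_features"]
    rows = []
    for track in tracks:
        if track is None:  # assumption: unavailable tracks are returned as null
            continue
        row = {name: track[name] for name in FEATURES}
        row["favorite"] = favorite  # 1 = liked playlist, 0 = disliked playlist
        rows.append(row)
    return rows

def write_csv(rows, handle):
    """Write the rows with a header line to any text file handle."""
    writer = csv.DictWriter(handle, fieldnames=FEATURES + ["favorite"])
    writer.writeheader()
    writer.writerows(rows)

def sample_track(**overrides):
    """Build an invented track record for demonstration purposes only."""
    track = {name: 0.5 for name in FEATURES}
    track.update(overrides)
    return track

# Stand-ins for the responses of the liked and disliked playlists.
liked_payload = json.dumps({"audio_features": [sample_track(danceability=0.68), None]})
disliked_payload = json.dumps({"audio_features": [sample_track(danceability=0.45)]})

rows = features_to_rows(liked_payload, 1) + features_to_rows(disliked_payload, 0)
buffer = io.StringIO()
write_csv(rows, buffer)
csv_text = buffer.getvalue()
```

In a real run, the two payloads would come from the API requests described above, and `buffer` would be replaced by a file opened for writing.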

The main features are as follows (Figure 1):

• Acousticness • Energy • Duration

• Liveness • Loudness • Key

• Speechiness • Danceability • Tempo

• Instrumentalness • Valence • Mode

[Figure: sample rows of the dataset; legible column headers include DANCEABILITY, SPEECHINESS, and ACOUSTICNESS]

Figure 1. Features of Spotify music dataset

b) Data preparation

The first thing we did was check whether the dataset contained any missing values. Had there been any, we would have filled them with the median of the corresponding feature, which is a common technique. However, since the dataset contains no missing values, this step was not required. Correlation coefficients were calculated among the features themselves, as well as between each feature and the target variable. Because the features had already been extracted through the Spotify API, we chose correlation-based feature selection: no new features are extracted, and the same features are used throughout each phase. We selected features by their correlation with the target variable rather than by feature importance, which is not of interest here. To choose features, we needed a threshold on the correlation coefficient, which we set to 0.1. This value was chosen after a number of unsuccessful trials with other thresholds. In the end, we reduced the dimension from 13 to 5, which allowed us to work with five features.
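The preparation steps described above (median imputation, Pearson correlation with the target, and a 0.1 threshold) can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the toy values are invented, and comparing the absolute value of the coefficient with the threshold is an assumption, since the paper does not say how negative correlations were treated.

```python
import math
import statistics

def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def select_features(columns, target, threshold=0.1):
    """Keep features whose |correlation| with the target exceeds the threshold."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return {name: r for name, r in scores.items() if abs(r) > threshold}

# Invented toy data: six tracks, two candidate features, one binary target.
favorite = [1, 1, 1, 0, 0, 0]
columns = {
    "danceability": fill_missing_with_median([0.8, 0.7, None, 0.4, 0.5, 0.3]),
    "key": [0, 5, 7, 7, 5, 0],
}
selected = select_features(columns, favorite)
```

On this toy data, `key` is uncorrelated with the target and is dropped, while `danceability` is kept.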

c) Classification and prediction

As the number of datasets grows rapidly, several solutions, most of them based on classification algorithms, have been developed to avoid extensive task-specific feature extraction. Despite the recent development of deep learning algorithms, which are used in several well-known research papers [8,9,10,11], we employed the decision tree algorithm as our primary classification method. First, the data was divided into two parts: 70% for training and 30% for testing. Several attempts were made to determine the best way to split the data. The decision tree was selected as the classification model, and each variable was assigned to the appropriate class. A decision tree has a cutoff parameter that determines how deep the tree grows; this value is called the depth threshold. The threshold is established by taking into account the entropies of the features, which can be interpreted as the impurity of each variable. In other words, the entropy of the features is directly related to the depth of the tree. The purest feature was assigned to the final node.
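To make the roles of the 70/30 split and of entropy concrete, the sketch below shows a random split and the search for the best binary threshold on a single feature by information gain. It is a simplified illustration, not the authors' model: a full decision tree repeats this search recursively up to its depth limit, the paper does not state its exact split criterion, and the function names and toy values are assumptions.

```python
import math
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows and return (train, test) with the given test share."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / total
        result -= p * math.log2(p)
    return result

def best_threshold(values, labels):
    """Return (threshold, information gain) of the best split 'value <= t'."""
    parent = entropy(labels)
    best = (None, 0.0)
    for t in sorted(set(values))[:-1]:  # the largest value cannot split the data
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = parent - weighted
        if gain > best[1]:
            best = (t, gain)
    return best

# Invented toy data: (danceability, favorite) pairs.
rows = [(0.3, 0), (0.4, 0), (0.5, 0), (0.7, 1), (0.8, 1), (0.9, 1),
        (0.35, 0), (0.75, 1), (0.45, 0), (0.85, 1)]
train, test = train_test_split(rows)

values = [0.3, 0.4, 0.5, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
split = best_threshold(values, labels)
```

A node whose labels all belong to one class has entropy 0, which is why a pure split ends the growth of that branch.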

Earlier, we mentioned the correlation experiments, which can be used to determine the critical value of the correlation coefficient.

A heatmap of the feature correlation coefficients is shown in Figure 2. However, we are interested only in the final column of the heatmap. The optimal threshold is therefore 0.1, since a higher value would cause several important features to be skipped.

Table 1 lists the correlation coefficients between the features and the target variable (favorite = 1):

Table 1. Correlation of each feature with the target variable (favorite)

Feature            Correlation coefficient
Acousticness       0.299
Energy             0.091
Duration           0.014
Liveness           0.035
Loudness           0.238
Key                0.015
Speechiness        0.185
Danceability       0.343
Tempo              0.051
Instrumentalness   0.077
Valence            0.046
Mode               0.036
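As a worked example, the 0.1 threshold can be applied directly to the coefficients listed above. The values are copied from the table; the variable names are illustrative.

```python
# Correlation of each feature with the target, as reported in the table.
TABLE_1 = {
    "acousticness": 0.299, "energy": 0.091, "duration": 0.014,
    "liveness": 0.035, "loudness": 0.238, "key": 0.015,
    "speechiness": 0.185, "danceability": 0.343, "tempo": 0.051,
    "instrumentalness": 0.077, "valence": 0.046, "mode": 0.036,
}
THRESHOLD = 0.1

# Keep the features above the threshold, strongest correlation first.
selected = sorted(
    (name for name, r in TABLE_1.items() if r > THRESHOLD),
    key=TABLE_1.get,
    reverse=True,
)

# The target column is retained alongside the selected features.
columns_kept = selected + ["favorite"]
```

Four features pass the threshold (danceability, acousticness, loudness, and speechiness). Together with the target column this gives five columns, which appears consistent with the reduction from 13 to 5 reported in the data preparation step, if the target is counted there.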

Conclusion

In this study, to better predict which music users will like, we used a classification model based on the decision tree algorithm. The dataset was retrieved from Spotify through its API, and correlation coefficients were calculated as part of the preparation stage. The features were compared with the target variable, and certain features were selected (feature selection). Our approach to splitting the data works well for this kind of problem. We evaluated the approach using the accuracy score and obtained an accuracy of 80%. This is sufficient for a recommendation system, but it is not fully satisfactory, because the remaining error rate is significant for machine learning applications of this kind. As a future direction, it should be possible to develop an emotion-based recommender model [12] that makes use of audio features taken from the Spotify dataset.

REFERENCES

1. Ilyosov, A.; Kutlimuratov, A.; Whangbo, T.-K. Deep-Sequence-Aware Candidate Generation for e-Learning System. Processes 2021, 9, 1454. https://doi.org/10.3390/pr9081454

2. Safarov, F.; Kutlimuratov, A.; Abdusalomov, A.B.; Nasimov, R.; Cho, Y.-I. Deep Learning Recommendations of E-Education Based on Clustering and Sequence. Electronics 2023, 12, 809. https://doi.org/10.3390/electronics12040809

3. Shenbin, I.; Alekseev, A.; Tutubalina, E.; Malykh, V.; Nikolenko, S.I. RecVAE: A New Variational Autoencoder for Top-N Recommendations with Implicit Feedback. In The Thirteenth ACM International Conference on Web Search and Data Mining (WSDM '20), Houston, TX, USA, February 3-7, 2020. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3336191.3371831

4. Kutlimuratov, A.; Abdusalomov, A.; Whangbo, T.K. Evolving Hierarchical and Tag Information via the Deeply Enhanced Weighted Non-Negative Matrix Factorization of Rating Predictions. Symmetry 2020, 12, 1930.

5. Mao, K.; Zhu, J.; Xiao, X.; Lu, B.; Wang, Z.; He, X. UltraGCN: Ultra Simplification of Graph Convolutional Networks for Recommendation. In Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM '21), Virtual Event, QLD, Australia, November 1-5, 2021. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3459637.3482291

6. Kutlimuratov, A.; Abdusalomov, A.B.; Oteniyazov, R.; Mirzakhalilov, S.; Whangbo, T.K. Modeling and Applying Implicit Dormant Features for Recommendation via Clustering and Deep Factorization. Sensors 2022, 22, 8224. https://doi.org/10.3390/s22218224

7. Spotify Web API documentation. https://developer.spotify.com/documentation/web-api/

8. Yang, D.; Zhang, J.; Wang, S.; Zhang, X. A Time-Aware CNN-Based Personalized Recommender System. Complexity 2019, 2019, 1-11. https://doi.org/10.1155/2019/9476981

9. Abdusalomov, A.; Baratov, N.; Kutlimuratov, A.; Whangbo, T.K. An Improvement of the Fire Detection and Classification Method Using YOLOv3 for Surveillance Systems. Sensors 2021, 21, 6519. https://doi.org/10.3390/s21196519

10. Wang, S.; Sun, L.; Fan, W.; et al. An Automated CNN Recommendation System for Image Classification Tasks. In Proceedings of the 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, China, July 2017. IEEE Computer Society.

11. Abdusalomov, A.B.; Mukhiddinov, M.; Kutlimuratov, A.; Whangbo, T.K. Improved Real-Time Fire Warning System Based on Advanced Technologies for Visually Impaired People. Sensors 2022, 22, 7305. https://doi.org/10.3390/s22197305

12. Makhmudov, F.; Kutlimuratov, A.; Akhmedov, F.; Abdallah, M.S.; Cho, Y.-I. Modeling Speech Emotion Recognition via Attention-Oriented Parallel CNN Encoders. Electronics 2022, 11, 4047. https://doi.org/10.3390/electronics1123404
