CC BY
UDC 82:004

A.A. Yelimissova

COMPARING METHODS OF BUILDING A MOVIE RECOMMENDER SYSTEM

Collaborative filtering is one of the most successful approaches to building recommendation systems. In this work we apply it to a movie recommendation system in two variants, one based on the similarity between users (user-based) and one based on the similarity between films (item-based), and we review the corresponding code written in Python. The presented recommender system generates recommendations from data about users and about the available items, which are stored in a customized database. The user can then browse the recommendations easily and find a movie of their choice. Evaluation metrics for the recommender system are also provided.

Keywords: Recommendation systems, item based, movies, Collaborative filtering, Content based filtering

The growth of information and of the variety of goods on the Internet has created a new problem for users: an overwhelming number of choices for any particular product. Because the Internet has become an integral part of everyday life, the sheer volume of available information makes it hard to choose well. We turn to a search engine to find movies or products, but finding a suitable one is difficult because of the overabundance of options. Recommendation systems help users cope with this explosion of information. For this reason, a number of recommendation systems have been implemented and used in online stores and services, such as the recommendation system for books at Amazon.com, for movies at MovieLens.org, and for CDs at CDNow.com (from Amazon.com) [1].

Recommendation systems have increased sales at some e-commerce sites, for example Amazon.com and Netflix, which have made these systems a prominent part of their websites. A brief overview of some websites is provided in the table below [2]:

Table 1. Companies that benefit from recommendation systems

| Company | Benefit |
|---|---|
| Netflix | 2/3 of the movies watched are recommended |
| Google News | Recommendations generate 38% more click-throughs |
| Amazon | 35% of sales come from recommendations |
| Choicestream | 28% of people would buy more music if they found what they liked |

These systems generate recommendations that users can accept or ignore, and users can also provide feedback. User actions and feedback can be stored in the recommender database and used to generate new recommendations during subsequent interactions with the system. Web-based personalized recommendation systems are now widely used to deliver personalized information to users across many applications.

In this article we consider recommendations based on collaborative filtering. Why collaborative filtering? A collaborative filtering system recommends items based on measures of similarity between users or between items: it recommends the items preferred by users who are similar to the target user. Collaborative filtering has several advantages:

- it is independent of item content;
- it relies on real quality assessments of items;
- it provides intuitive recommendations.

In the proposed model, a pre-filter is applied before the algorithms are run. In our research we found that the most appropriate recommendations are those based on the ratings given to movies by previous users, so we gave the rating attribute more weight than the other attributes.

To implement the recommender system, the data was taken from www.imdb.com, because it probably has the largest collection of movies, along with ratings given to these movies by a large number of users from different parts of the world.

© Yelimissova A.A., 2021.

Another important parameter in the proposed model is the total number of votes received by a particular film. Movies with a rating below 5 were marked as the least suitable for recommendation and the least desirable for users. Users tend to want to see a good movie, and a higher rating threshold ensures that the predicted set of movies consists of movies that a large number of users like.
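The pre-filter described here could look roughly like the following sketch. The column names (`avg_rating`, `num_votes`), the sample rows and the vote cut-off are assumptions made for illustration; the article only states the rating threshold of 5.

```python
import pandas as pd

# Hypothetical movie table; column names and values are illustrative only.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "avg_rating": [7.9, 4.2, 6.5, 8.8],
    "num_votes": [120_000, 90_000, 150, 45_000],
})

MIN_RATING = 5.0   # threshold mentioned in the text
MIN_VOTES = 1_000  # assumed cut-off; the article does not state one


def prefilter(df, min_rating=MIN_RATING, min_votes=MIN_VOTES):
    """Keep movies rated at least `min_rating` that have at least `min_votes` votes."""
    mask = (df["avg_rating"] >= min_rating) & (df["num_votes"] >= min_votes)
    return df.loc[mask].reset_index(drop=True)


candidates = prefilter(movies)
print(candidates["title"].tolist())
```

Filtering before the similarity computation keeps poorly rated or rarely rated films out of the candidate set, so they cannot be recommended regardless of what the later steps compute.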

The data set turned out to be quite large, so the first 100,000 entries were selected from it. In total, the selection included 671 users and 9,066 movies. For convenience, the movie IDs were re-indexed so that they start at 1 and end at the number of selected movies.
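One possible way to re-index the movie IDs is shown below as a minimal sketch. The column names and the sample rows are assumptions; `pandas.factorize` is used here as one convenient option, not necessarily the method used in the original script.

```python
import pandas as pd

# Illustrative ratings with sparse, non-contiguous movie IDs.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "movie_id": [31, 1029, 31, 2105],
    "rating": [2.5, 3.0, 4.0, 5.0],
})

# Map every distinct original ID to a dense index 1..n_movies,
# ordered by the original ID.
codes, uniques = pd.factorize(ratings["movie_id"], sort=True)
ratings["movie_idx"] = codes + 1
n_movies = len(uniques)

print(ratings[["movie_id", "movie_idx"]])
print(n_movies)
```

With dense indices, `movie_idx - 1` can be used directly as a column index of the user-by-movie matrix.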

The entire data set was divided into two parts, training and test. The first was used for training, and the second was used to measure the quality of the predicted ratings. The split was performed with the train_test_split function from the scikit-learn module.
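A minimal sketch of such a split follows. The test fraction of 25% and the fixed random seed are assumptions, since the article does not state the proportions used; the ratings themselves are synthetic.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic ratings: 20 rows of (user, movie, rating).
ratings = pd.DataFrame({
    "user_id": [i % 4 + 1 for i in range(20)],
    "movie_id": [i % 5 + 1 for i in range(20)],
    "rating": [float(i % 5 + 1) for i in range(20)],
})

# test_size and random_state are assumed values for illustration.
train_data, test_data = train_test_split(ratings, test_size=0.25, random_state=42)

print(len(train_data), len(test_data))
```

Fixing `random_state` makes the split reproducible, which matters when RMSE values from different methods are compared on the same test set.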

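To measure the quality of the predicted ratings, we use the root-mean-square error (RMSE) [3]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|D|} \sum_{(u,i) \in D} \left(\hat{r}_{u,i} - r_{u,i}\right)^2} \tag{1}$$

where $D$ is the set of (user, movie) pairs in the test set, $\hat{r}_{u,i}$ is the rating predicted for user $u$ and movie $i$, and $r_{u,i}$ is the actual rating.

The root-mean-square error is the square root of the mean squared error over all ratings predicted by the algorithm. User-by-movie matrices were formed for the training and test sets so that the element in cell [i, j] holds the i-th user's rating of the j-th movie.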
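The matrix construction and the RMSE measure can be sketched as follows. Two assumptions are made that the article does not spell out: a value of 0 marks a movie the user has not rated, and the error is computed only over cells that hold a rating in the test matrix. The helper names are hypothetical.

```python
import numpy as np


def build_matrix(triples, n_users, n_movies):
    """Dense user-by-movie matrix from (user_id, movie_id, rating) triples.

    IDs are 1-based; 0 marks 'not rated' (an assumption of this sketch).
    """
    matrix = np.zeros((n_users, n_movies))
    for user_id, movie_id, rating in triples:
        matrix[user_id - 1, movie_id - 1] = rating
    return matrix


def rmse(predicted, actual):
    """Root-mean-square error over the cells that are rated in `actual`."""
    mask = actual != 0
    diff = predicted[mask] - actual[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


test_matrix = build_matrix([(1, 1, 4.0), (2, 3, 2.0)], n_users=2, n_movies=3)
predicted = np.array([[3.0, 0.0, 0.0],
                      [0.0, 0.0, 4.0]])
print(rmse(predicted, test_matrix))
```

Masking out the unrated cells prevents the many empty entries of a sparse rating matrix from dominating the error.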

One of the key steps in collaborative filtering is finding similar users for user-based filtering and similar objects (in our case, movies) for item-based filtering. There are different approaches to this. One of them is to use the cosine distance between the vectors describing users or objects. The scikit-learn module provides a ready-made function for this, pairwise_distances.

Here are the cosine distances between some pairs of points:

([3, 3], [2, 3]): 0.01941932430907989

([3, 3], [1, 1.5]): 0.01941932430907989

([3, 3], [1, 3]): 0.10557280900008414
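The distances listed above can be reproduced with `pairwise_distances` as in the sketch below. The small rating matrix in the second half is made up for illustration; it only shows that transposing the matrix switches from user-to-user to movie-to-movie distances.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

base = np.array([[3.0, 3.0]])
others = np.array([[2.0, 3.0], [1.0, 1.5], [1.0, 3.0]])

# Cosine distance = 1 - cosine similarity; it depends only on direction,
# which is why [2, 3] and [1, 1.5] give the same value.
point_dist = pairwise_distances(base, others, metric="cosine")[0]
print(point_dist)

# Toy user-by-movie matrix (made-up ratings, 0 = not rated).
ratings = np.array([
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 1.0],
    [0.0, 2.0, 5.0],
])
user_dist = pairwise_distances(ratings, metric="cosine")    # users vs users
item_dist = pairwise_distances(ratings.T, metric="cosine")  # movies vs movies
```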

This recommendation system calculates the predicted rating of user u for movie i using the formula:

$$\hat{r}_{u,i} = \frac{1}{N} \sum_{u' \in U} r_{u',i} \tag{2}$$

where:

• $N$ is the number of users similar to user $u$;
• $U$ is the set of the $N$ users most similar to user $u$;
• $u'$ is a user similar to user $u$ (from the set $U$);
• $r_{u',i}$ is the rating given by user $u'$ to movie $i$;
• $\hat{r}_{u,i}$ is the predicted rating of movie $i$ for user $u$.

According to this formula, the predicted rating of movie i for user u is equal to the average rating given to movie i by the N users most similar to user u.
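As a rough illustration of this naive scheme, the sketch below averages the ratings of the N nearest users. The function name is hypothetical. The distances between USER1, USER2 and USER3 and their ratings follow the worked example in the article; the USER4 row is made up so that the neighbour selection has something to exclude.

```python
import numpy as np


def predict_naive(ratings, distances, user, item, n_neighbors=2):
    """Average rating of `item` among the `n_neighbors` users closest to `user`."""
    order = np.argsort(distances[user])            # nearest first
    neighbors = [u for u in order if u != user][:n_neighbors]
    return float(np.mean(ratings[neighbors, item]))


# Rows: USER1..USER4; columns: Ant-Man, Hulk. The USER4 row is illustrative.
ratings = np.array([
    [0.0, 0.0],   # USER1 (ratings to be predicted)
    [0.0, 4.0],   # USER2
    [5.0, 3.0],   # USER3
    [1.0, 1.0],   # USER4 (assumed values)
])
distances = np.array([
    [0.0,    0.2893, 0.5957, 0.8868],
    [0.2893, 0.0,    0.4484, 0.8793],
    [0.5957, 0.4484, 0.0,    0.2036],
    [0.8868, 0.8793, 0.2036, 0.0],
])

ant_man = predict_naive(ratings, distances, user=0, item=0)
hulk = predict_naive(ratings, distances, user=0, item=1)
print(ant_man, hulk)
```

Note that a rating of 0 from a neighbour is averaged in like any other value, which mirrors the arithmetic of the article's example.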

Next, the cosine distances between users were computed:

array([[0.        , 0.59577396, 0.8867723 , 0.28933095],
       [0.59577396, 0.        , 0.2036092 , 0.4484398 ],
       [0.8867723 , 0.2036092 , 0.        , 0.87929886],
       [0.28933095, 0.4484398 , 0.87929886, 0.        ]])

In the resulting matrix, the number in cell [i, j] is the cosine distance between users i and j; the smaller the distance, the more similar the users. In our example, the number 0.59577396 in cell [0, 1] is the cosine distance between the rating vectors of the two corresponding users.

Let us assume that N equals two. The two users most similar to USER1 are USER2 (distance 0.2893) and USER3 (distance 0.5957); for USER3 they are USER4 and USER2 (distances 0.2036 and 0.4484, respectively); for USER4, USER3 and USER2 (distances 0.2036 and 0.8792, respectively); for USER2, USER1 and USER3 (distances 0.2893 and 0.4484, respectively).
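The neighbour lists can be derived from the distance matrix with a few lines of NumPy, as sketched below. The row order USER1, USER3, USER4, USER2 is an assumption inferred from the distances quoted in the text; the article does not state which row belongs to which user.

```python
import numpy as np

dist = np.array([
    [0.0,        0.59577396, 0.8867723,  0.28933095],
    [0.59577396, 0.0,        0.2036092,  0.4484398],
    [0.8867723,  0.2036092,  0.0,        0.87929886],
    [0.28933095, 0.4484398,  0.87929886, 0.0],
])
labels = ["USER1", "USER3", "USER4", "USER2"]  # assumed row order


def nearest_neighbors(dist, labels, n=2):
    """For every user, list the n other users with the smallest distance."""
    result = {}
    for i, name in enumerate(labels):
        order = [j for j in np.argsort(dist[i]) if j != i][:n]
        result[name] = [labels[j] for j in order]
    return result


neighbors = nearest_neighbors(dist, labels)
for name, closest in neighbors.items():
    print(name, closest)
```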

Following formula (2), we calculate the predicted ratings of USER1 for the movies Ant-Man and Hulk. For Ant-Man:

$$\hat{r}_{\text{USER1},\,\text{Ant-Man}} = \frac{5 + 0}{2} = 2.5$$

For Hulk:

$$\hat{r}_{\text{USER1},\,\text{Hulk}} = \frac{3 + 4}{2} = 3.5$$

The root-mean-square errors of the naive recommendations are:

User-based CF RMSE: 2.81961691384066

Item-based CF RMSE: 3.001291898703705

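Next, let us consider recommendations based on the similarity-weighted average of the ratings of similar users. The predicted rating is calculated as:

$$\hat{r}_{u,i} = \frac{\sum_{u' \in U} \operatorname{simil}(u,u')\, r_{u',i}}{\sum_{u' \in U} \left|\operatorname{simil}(u,u')\right|} \tag{3}$$

where:

• $\operatorname{simil}(u,u')$ is the "similarity" between user $u$ and user $u'$;
• $r_{u',i}$ is the rating given by user $u'$ from $U$ to movie $i$;
• $U$ is the set of similar users and $u'$ is one user from that set.

That is, the predicted rating of the movie equals the sum, over the most similar users, of each user's "similarity" multiplied by that user's rating, divided by the sum of the absolute values of the similarities.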
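A minimal sketch of the weighted prediction follows. The function name is hypothetical, and the weights and ratings are the values the article plugs into its worked example for USER1.

```python
def predict_weighted(similarities, neighbor_ratings):
    """Similarity-weighted average of the neighbours' ratings."""
    numerator = sum(s * r for s, r in zip(similarities, neighbor_ratings))
    denominator = sum(abs(s) for s in similarities)
    if denominator == 0:
        raise ValueError("all similarity weights are zero")
    return numerator / denominator


weights = [0.5957, 0.2893]            # weights used in the worked example
ant_man = predict_weighted(weights, [5, 0])
hulk = predict_weighted(weights, [3, 5])
print(round(ant_man, 4), round(hulk, 4))
```

Dividing by the sum of absolute weights keeps the prediction on the same scale as the ratings regardless of how many neighbours are used.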

Following formula (3), we calculate the predicted ratings of USER1 for the movies Ant-Man and Hulk. For Ant-Man:

$$\hat{r}_{\text{USER1},\,\text{Ant-Man}} = \frac{0.5957 \cdot 5 + 0.2893 \cdot 0}{|0.5957| + |0.2893|} \approx 3.3655$$

For Hulk:

$$\hat{r}_{\text{USER1},\,\text{Hulk}} = \frac{0.5957 \cdot 3 + 0.2893 \cdot 5}{|0.5957| + |0.2893|} \approx 3.6538$$

User-based CF RMSE: 2.821055763616836

Item-based CF RMSE: 3.245023071118644

This gives a worse root-mean-square error than the naive implementation, even though both were run on the same data.

Lastly, we consider recommendations based on average user ratings and the "similarity matrix". The prediction depends on the ratings the user has given previously (more precisely, on the average rating over all movies rated by the user), on the average ratings of "similar" users, and on the "similarity" coefficients:

$$\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{u' \in U} \operatorname{simil}(u,u') \left(r_{u',i} - \bar{r}_{u'}\right)}{\sum_{u' \in U} \left|\operatorname{simil}(u,u')\right|} \tag{4}$$

where $\bar{r}$ denotes the average rating of the corresponding user, and the other variables are as defined above.
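The mean-centred prediction can be sketched as below. The function name is hypothetical. The inputs are the weights, ratings and user averages quoted in the article for USER1, USER2 and USER3, applied to the Ant-Man example; the function follows the standard form of this formula, in which each neighbour contributes the deviation of their rating from their own average.

```python
def predict_mean_centered(user_mean, similarities, neighbor_ratings, neighbor_means):
    """User's own mean plus the weighted average deviation of the neighbours."""
    numerator = sum(
        s * (r - m)
        for s, r, m in zip(similarities, neighbor_ratings, neighbor_means)
    )
    denominator = sum(abs(s) for s in similarities)
    if denominator == 0:
        raise ValueError("all similarity weights are zero")
    return user_mean + numerator / denominator


ant_man = predict_mean_centered(
    user_mean=5.0,
    similarities=[0.2893, 0.5957],      # USER2, USER3
    neighbor_ratings=[0.0, 5.0],        # their ratings of Ant-Man
    neighbor_means=[4.666667, 3.25],    # their average ratings
)
print(round(ant_man, 3))
```

Subtracting each neighbour's mean compensates for users who systematically rate higher or lower than others, which is the usual motivation for this variant.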

Following formula (4), we calculate the predicted ratings of USER1 for the movies Ant-Man and Hulk. This requires the average ratings over all rated films for USER1, USER2 and USER3, which are 5, 4.666667 and 3.25, respectively. For Ant-Man:

$$\hat{r}_{\text{USER1},\,\text{Ant-Man}} = 5 + \frac{0.2893 \cdot (0 - 4.6667) + 0.5957 \cdot (5 - 3.25)}{|0.5957| + |0.2893|} \approx 4.652$$

For Hulk:

$$\hat{r}_{\text{USER1},\,\text{Hulk}} = 5 + \frac{0.2893 \cdot (4 - 4.6667) + 0.5957 \cdot (3 - 3.25)}{|0.5957| + |0.2893|} \approx 4.614$$

User-based CF RMSE: 1.5491818781971805

Item-based CF RMSE: 1.283294405965073

In this variant we obtained the best result for both user-based and item-based collaborative filtering: each RMSE is lower than in the two previous methods.

So, we implemented three methods for user-based and item-based collaborative filtering. The table below shows the final RMSE results:

| | Method 1 | Method 2 | Method 3 |
|---|---|---|---|
| User-based | 2.819 | 2.821 | 1.549 |
| Item-based | 3.001 | 3.245 | 1.283 |

Comparing the three methods by RMSE, we can note that the third method gave the best result on the given data set. In this article we have reviewed a recommendation system for movies that uses three methods in order to find the best match. It allows the user to make a selection from a given set of attributes and then recommends a list of movies. By its nature, the performance of such a system is not easy to evaluate, since there are no right or wrong recommendations; it is largely a matter of opinion. In an informal evaluation conducted on a small set of users, we received a positive response.

References

1. Han J., Kamber M. Data Mining: Concepts and Techniques. Morgan Kaufmann (Elsevier), 2006.

2. Kumar M. et al. A movie recommender system: Movrec // International Journal of Computer Applications. 2015. Vol. 124, No. 3.

3. Del Olmo F. H., Gaudioso E. Evaluation of recommender systems: A new approach // Expert Systems with Applications. 2008. Vol. 35, No. 3. P. 790-804.

4. Ghazanfar M., Prugel-Bennett A. An improved switching hybrid recommender system using Naive Bayes classifier and collaborative filtering. 2010.

YELIMISSOVA AINUR ASKATOVNA - master's student, D. Serikbayev East Kazakhstan State Technical University, school of information technologies and intelligent systems, Kazakhstan, Ust-Kamenogorsk.
