CC BY

JOURNAL OF ECONOMIC REGULATION (Вопросы регулирования экономики) · Vol. 11, No. 2. 2020

ISSUES OF THE DIGITAL ECONOMY

www.hjournal.ru

DOI: 10.17835/2078-5429.2020.11.2.098-112

VIDEO CONTENT POPULARITY: PREDICTION USING MACHINE LEARNING TOOLS

ILYA LEONIDOVICH SHAFIROV,

National Research University - Higher School of Economics,

Moscow, Russia, e-mail: shafirov1999@gmail.com

Video content accounts for 70% of Internet traffic; moreover, the coronavirus pandemic is changing the economic behavior of the population and increasing the share of online advertising. This makes the research problem of predicting the popularity of newly created video content pressing. In this study, the task of predicting video content popularity is formulated as binary classification of videos into “popular” and “unpopular”. Following the Pareto principle, videos in the top 20% by number of views are treated as “popular”. The article reviews studies on video content popularity prediction that use machine learning (including deep learning) methods. The author examines whether modifications of these methods can be applied to the stated task and develops a new method based on a combination of tree ensembles and neural networks. Each method is tested on a data sample of 11,000 YouTube videos collected with purpose-built software. Based on the test results, the author proposes using the best-performing method, the combination of tree ensembles and neural networks, to predict the popularity of newly created video content. The prediction quality of this method is characterized by the following metric values: 87% of videos are correctly classified as popular/unpopular (Accuracy); among the videos assigned to the popular class, 63% are popular (Precision); 49% of truly popular videos are identified correctly (Recall). The features most important for the popularity of a newly created video are identified: the number of views and dislikes of the last video published on the channel; the number of channel subscribers; the publication time of the newly created video; the title of the newly created video; the date the channel was founded.
The article outlines the limitations of the proposed method and directions for its improvement, and demonstrates the need for interdisciplinary research in this area at the intersection of the interests of marketers, data analysts, linguists and psychologists.

Keywords: digital economy; marketing; management; big data; video content; machine learning tools.

© Shafirov I. L., 2020

VIDEO CONTENT POPULARITY PREDICTION USING MACHINE LEARNING METHODS

Ilya L. SHAFIROV,

National Research University - Higher School of Economics,

Moscow, Russia, e-mail: shafirov1999@gmail.com

This paper addresses the research problem of predicting the popularity of newly created video content. The machine learning task is formulated as binary classification of videos into “popular” and “unpopular”. Based on the Pareto principle, “popular” videos are those in the top 20% of the most viewed videos. The article provides an overview of studies on video content popularity prediction using machine learning methods (including deep learning). The author explores the applicability of various modifications of existing methods to the research problem and develops a new method based on a combination of tree ensembles and neural networks. Each method is tested on a sample of data on 11,000 YouTube videos collected with purpose-built parsing software. Based on the test results, the author suggests using the method that combines tree ensembles and neural networks. The quality of prediction with this method is characterized by the following metrics: 87% of videos are correctly classified (Accuracy); among the videos classified as popular, 63% are popular (Precision); 49% of truly popular videos are correctly identified (Recall). The findings indicate the characteristics most likely to influence the popularity of a newly created video: the number of views and dislikes of the last video published on the channel; the number of channel subscribers; last video’s publishing time; the new video's title; the channel establishment date. The limitations of the method and directions for improving it are outlined; the author argues for interdisciplinary research spanning the interests of marketers, data analysts, linguists and psychologists.

Keywords: digital economy; marketing; management; Big data; video content; machine learning methods.

JEL codes: C8, C81, C88, M39

Introduction

The number of Internet users is continuously increasing, and in 2019 it exceeded 4 billion people (Clement, 2020). As a result, many brands spend considerable financial resources to promote their goods on the Internet: it had previously been predicted that in 2021 these expenses would account for more than 49% of all media advertising costs (Sherman, 2019). Moreover, because the coronavirus pandemic prevents consumers from visiting public places, experts predict that the share of online advertising in overall advertising budgets will increase in countries with many cases of the infection (Enberg, 2020).

Predicting the popularity of digital content is a widespread task. Employees of marketing departments often face this problem. They can use a model capable of predicting the popularity of video content to draw up content plans, evaluate the effectiveness of various promotion channels and plan the costs of promoting their goods on the Internet. Also, this tool can be used by bloggers and content authors for the same purposes. Social networks and video hosting services attract advertisers and provide them with data and the forecasted results of planned advertising campaigns. Therefore, they also face tasks related to the problem of popularity prediction.

The demand for forecasting the popularity of advertising materials is becoming more critical because, during the crisis, consumers are changing the structure of consumption and cutting their expenses (in the US, only 15.9% of consumers plan to maintain their level of consumer spending) (Influence Marketing Hub, 2019), while advertisers are reviewing their advertising strategies (in the USA, 59% of advertisers are cutting their expenses on media advertising, 16% plan to keep them at the same level, and 25% plan to increase advertising costs) (Ibid.). Meanwhile, the use of video content in advertising attracts an increasing number of advertisers: from 63% in 2017 to 87% in 2019 (Wyzowl, 2019). Thus, predicting online video popularity is a pressing problem.

YouTube is the leader in the number of active users among video hosting services: the number of monthly visitors exceeds 2 billion (Clement, 2019). In addition, YouTube is the second most visited website (Alexa Internet, Inc., 2020). On average, a user spends 8.4 minutes per day watching YouTube videos (YouTube, 2020). Moreover, a study conducted by the Pew Research Center in the USA suggests that the service’s user base is exceptionally diverse (Pew Research Center, 2020).

In 2019, 31 million individual channels created content for YouTube (Tubics, 2020). In a 2019 survey, 62% of business representatives said they used YouTube channels to post content (Buffer, 2020). The same study found that 36.7% of businesses published content monthly and 24.3% weekly, while only 14.5% of respondents did not create video content for YouTube at all (Ibid.).

Access to a broad audience motivates many brands to use YouTube for large-scale marketing campaigns. In 2019, YouTube’s advertising revenue amounted to $15 billion, an increase of 36% compared to the previous year (Statt, 2020). A model that accurately predicts the popularity of video content will allow companies to optimize advertising costs by promoting only the videos that are expected to become popular.

This article therefore presents the results of experimental research on forecasting the popularity of YouTube video content.

Several researchers and practitioners consider inclusion among the most viewed videos an important criterion of video popularity (Min Gyeong Choe et al., 2019). In addition, Guandan Chen et al. (2019) justify converting the task of popularity forecasting into predicting whether the number of views exceeds a certain threshold. The author of this article sets this threshold using the Pareto principle (the 80/20 rule) (Newman, 2005), which indicates that approximately 80% of people focus on 20% of the content. Accordingly, the author proposes to classify videos as popular only if they are in the top 20% of the most viewed videos.

Thus, the machine learning problem is defined. Let $X$ be the features of videos (frame sequences, video title, metadata, and data about the channel). The model is assumed to predict popularity before the video is published. The predicted indicator $Y$ is the inclusion of the video in the top 20% most viewed videos among all analyzed items. The task is to develop an algorithm that classifies videos:

$$a: X \rightarrow Y$$
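The target variable defined above can be illustrated with a minimal sketch. It assumes that view counts are already available as a plain list; the function name `label_top_share` and the sample numbers are hypothetical and are not taken from the author's software. Ties at the threshold are resolved arbitrarily by sort order.

```python
import math

def label_top_share(view_counts, share=0.2):
    """Return 1 for videos in the top `share` by views, 0 otherwise."""
    n = len(view_counts)
    if n == 0:
        return []
    # Size of the "popular" class (at least one video).
    k = max(1, math.floor(n * share))
    # Indices of videos ordered from most to least viewed.
    order = sorted(range(n), key=lambda i: view_counts[i], reverse=True)
    popular = set(order[:k])
    return [1 if i in popular else 0 for i in range(n)]

# Hypothetical view counts for ten videos.
views = [120, 5_000, 43, 980, 15_000, 310, 77, 2_400, 610, 9]
labels = label_top_share(views)
print(labels)
```

With ten videos and a 20% share, exactly two videos receive the label 1, so the resulting classes are imbalanced by construction.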

Literature Review

The literature review aims to analyze and compare existing video content popularity prediction methods.

A comparative analysis of methods for predicting the popularity of video content is carried out according to the following criteria: input data (the set of video characteristics); predicted indicator (the dependent variable that the authors predict with the proposed method); datasets (the data used to train and test the method); quality metrics (indicators of the method's performance); applicability (whether the method can be used for the author's task).

Popularity prediction method based on sentiment analysis and content of the video

Fontanini et al. (2016) suggest using the frames of a video to predict its popularity. Given video frames, metadata and the time series $(x_1(v), x_2(v), \ldots, x_{t_r}(v))$, where $x_i(v)$ is the number of views of video $v$ during day $i$ after upload, the authors develop a model that predicts the number of views of the video until day $t_t = 30$. The model is assumed to be applied $t_r$ days after upload, so this time series can be used as an independent variable.

The method consists of three main parts:

1. An Extremely Randomized Trees classifier (Geurts et al., 2006) is trained to predict the type of popularity trend of the video using metadata and the time series $(x_1(v), x_2(v), \ldots, x_{t_r}(v))$. The authors divide popularity trends into 4 distinct categories, as described in (Crane, Sornette, 2008). The classifier is trained 7 times, once for each $t_r$, $1 \le t_r \le 7$.

2. Visual features of the video are transformed for later use in the Multivariate Radial Basis Function (MRBF) model. Each video is divided into separate frames. The authors then use the output of the FC7 layer of the VGG-M-128 network, pre-trained on the ImageNet dataset, to represent the visual features of video frames. To represent the “mood” of the video, it is proposed to use the DeepSentiBank neural network (Tao Chen et al., 2014), which describes each frame using multiple adjective-noun pairs (ANPs). These two encodings are concatenated to form a representation of a single frame. To represent the whole video, frame embeddings are combined using the Fisher Vector encoding.

3. For each of the 4 groups $P_i$ from step 1, an MRBF model is trained that predicts the number of views until day $t_t$:

$$\hat{N}_{P_i}(v, t_r, t_t) = \Theta_{(t_r, t_t)} \cdot X_{t_r}(v) + \sum_{v_c \in C} \omega_{v_c} \cdot \mathrm{RBF}_{v_c}(v), \tag{1}$$

where $X_{t_r}(v) = (x_1(v), x_2(v), \ldots, x_{t_r}(v))$ and $\mathrm{RBF}_{v_c}(v) = \exp\left(-\frac{\lVert X(v) - X(v_c) \rVert^2}{2\sigma^2}\right)$.

The first part of the model uses the number of views of a video up to day $t_r$ to forecast the number of views by day $t_t$. The second part of the model is based on the assumption that videos that are similar (possess similar temporal characteristics and display similar visual features) to the same elements of the set $C$, which contains the most diverse and representative videos from the training set, will receive approximately the same number of views. As a measure of “similarity”, a Gaussian radial basis function (RBF) is used.
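The two-part structure described above (a linear term over the observed view history plus a weighted sum of Gaussian similarities to representative videos) can be sketched as follows. This is an illustrative reading of the description, not the implementation of Fontanini et al.: the parameter names, the separate `features` vector used for the similarity term, and all numeric values are assumptions.

```python
import math

def gaussian_rbf(x, center, sigma):
    """Gaussian similarity between a feature vector and a center."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def mrbf_predict(daily_views, features, theta, centers, omegas, sigma):
    """Linear term over daily views plus RBF term over representative videos."""
    linear = sum(t * x for t, x in zip(theta, daily_views))
    similarity = sum(
        w * gaussian_rbf(features, c, sigma) for w, c in zip(omegas, centers)
    )
    return linear + similarity

# Hypothetical parameters: two observed days, two representative videos.
daily_views = [10.0, 20.0]
features = [0.0, 1.0]
theta = [1.5, 2.0]
centers = [[0.0, 1.0], [3.0, 5.0]]
omegas = [4.0, 100.0]

prediction = mrbf_predict(daily_views, features, theta, centers, omegas, sigma=1.0)
print(round(prediction, 3))
```

In this toy case the video coincides with the first center, so that center contributes its full weight, while the distant second center contributes almost nothing despite its larger weight.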

Two sets of data are collected to test and train the method. The “Top” dataset contains information about 4 840 most popular videos on YouTube, while the “Random” dataset stores data about 13 144 videos on YouTube. To evaluate the quality of the model, the authors propose to use Relative Squared Error (RSE):

$$\mathrm{RSE} = \left(\frac{\hat{N}_{P_i}(v, t_r, t_t)}{N(v, t_t)} - 1\right)^2, \tag{2}$$

where $N(v, t_t)$ is the number of views of video $v$ until day $t_t$. To obtain the accuracy of the method on the test dataset, the RSE metric is averaged over all videos in the test sample.
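As a worked illustration of the averaging step, the sketch below computes the relative squared error per video and its mean over a test sample. It assumes the error has the form (predicted / actual - 1)²; the function names and numbers are hypothetical.

```python
def relative_squared_error(predicted, actual):
    """Squared relative deviation of a predicted view count from the actual one."""
    return (predicted / actual - 1.0) ** 2

def mean_rse(predictions, actuals):
    """Average the per-video RSE over the whole test sample."""
    errors = [relative_squared_error(p, a) for p, a in zip(predictions, actuals)]
    return sum(errors) / len(errors)

# Hypothetical predicted and observed view counts for three videos.
predicted = [110.0, 80.0, 1000.0]
actual = [100.0, 100.0, 1000.0]

mrse = mean_rse(predicted, actual)
print(round(mrse, 4))
```

Because the error is relative, a miss of 20 views on a video with 100 views weighs more than a miss of 20 views on a video with 1000 views.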

Table 1 shows the results of the experiments presented by Fontanini et al. (2016). Adding visual features increases the accuracy of the model's predictions. The improvement is especially noticeable at small values of $t_r$, since the number of views a video will receive is hard to predict from just a small sample of previous daily observations. Hence, the inclusion of visual features substantially improves the quality of the prediction.

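Results of the experiments (RSE)

Table 1

“Top” dataset

| Day | MRBF | Content | Sentiment | Mixed |
|---|---|---|---|---|
| 1 | 0.4329 | 0.2854 | 0.2846 | 0.2845 |
| 2 | 0.3606 | 0.2454 | 0.2439 | 0.2442 |
| 3 | 0.2963 | 0.2157 | 0.2161 | 0.2151 |
| 4 | 0.2461 | 0.1808 | 0.1796 | 0.1808 |
| 5 | 0.2093 | 0.1570 | 0.1571 | 0.1564 |
| 6 | 0.1847 | 0.1405 | 0.1407 | 0.1400 |
| 7 | 0.1614 | 0.1256 | 0.1250 | 0.1249 |
| Mean | 0.2702 | 0.1929 | 0.1924 | 0.1923 |

“Random” dataset

| Day | MRBF | Content | Sentiment | Mixed |
|---|---|---|---|---|
| 1 | 0.5071 | 0.3980 | 0.3965 | 0.3998 |
| 2 | 0.3831 | 0.3139 | 0.3133 | 0.3133 |
| 3 | 0.2985 | 0.2587 | 0.2570 | 0.2604 |
| 4 | 0.2411 | 0.2153 | 0.2143 | 0.2126 |
| 5 | 0.2052 | 0.1825 | 0.1821 | 0.1821 |
| 6 | 0.1810 | 0.1620 | 0.1641 | 0.1635 |
| 7 | 0.1599 | 0.1453 | 0.1454 | 0.1452 |
| Mean | 0.2823 | 0.2394 | 0.2390 | 0.2396 |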

Based on the data given in Table 1, it is possible to conclude that the RSE values (lower is better) for models on the “Top” dataset (above) are better than for the “Random” (below) dataset. Data in the “Content” column represents the results of testing on the video’s temporal characteristics and visual characteristics represented by the output of the FC7 layer of the pre-trained network VGG-M-128. The data in the “Sentiment” column demonstrates the results of testing on the video’s temporal characteristics and sentiment (mood) characteristics represented by the output of the DeepSentibank neural network. The column “Mixed” stores the results of tests on the representation, which consists of temporal features, outputs of the DeepSentibank neural network and FC7 layer of the pre-trained network VGG-M-128.

Based on the results presented in (Fontanini et al., 2016), it is possible to conclude that taking visual video features into account improves the accuracy of the prediction. However, the MRBF model used in (Fontanini et al., 2016) cannot be used for the author’s task, since MRBF requires input data on the number of views after publication. This information is not available for a newly created or not yet published video.

Popularity-SVR

Support Vector Regression (SVR), a modification of the well-known Support Vector Machine (SVM) method, is used in (Trzcinski, Rokita, 2017) to forecast the popularity of video content using temporal and visual features. The authors attempt to forecast the number of video views until day $t_t = 30$ using visual (video frames and video thumbnail) and temporal (number of views, likes and comments) features. The model is assumed to be applied on day $t_r$ after upload, so the daily increments of every temporal attribute for each day until $t_r$ are given.

The method consists of these main phases:

1. Features of the visual content (the number of shots in the video, the average number of people per frame, etc.) are extracted from the video frames. These features are added to the video representation obtained by averaging the outputs of the pre-trained ResNet-152 neural network over all video frames.

2. “Popularity” of the video preview is calculated using Popularity API (Facebook for Developers, 2020).

3. Temporal and visual features are concatenated to create the embedding of one video, $X(v, t_r)$.

4. The number of views the video is expected to gain by day $t_t$ is estimated using the SVR model:

$$\hat{N}(v, t_r, t_t) = \sum_{k=1}^{K} \alpha_k \cdot \Phi\big(X(v, t_r), X(k, t_r)\big), \tag{3}$$

where $\Phi(x, y) = \exp\left(-\frac{\lVert x - y \rVert^2}{2\sigma^2}\right)$ and $\{X(k, t_r)\}_{k=1}^{K}$ are the support vectors returned by the SVR model.

The method is trained and tested on a dataset of 1820 videos from Facebook. The authors consider only videos uploaded by a preselected group of content creators. The dataset was compiled using the Graph API (Facebook for Developers, 2020).

To assess the quality of the model, the authors use the Spearman rank correlation coefficient between observed and predicted views. The results are presented in Figure 1. The graph shows that for every $t_r$ the best results are achieved by the Popularity-SVR model, which uses temporal and visual features. The most significant difference between the models is observed at small values of $t_r$.

Fig. 1. Model quality as a function of the number of hours after the video's publication for which the model was trained
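The Spearman rank correlation used as the quality metric here compares the ordering of predicted and observed view counts rather than their absolute values. A minimal stdlib sketch follows, computing it as the Pearson correlation of average ranks; the helper names and the sample data are hypothetical, and constant inputs (zero variance) are not handled.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical observed and predicted view counts for four videos.
observed = [100, 2500, 40, 900]
predicted = [150, 1800, 60, 2000]
rho = spearman(observed, predicted)
print(round(rho, 3))
```

A rank-based metric suits heavy-tailed view counts, because a few viral videos do not dominate the score the way they would in a squared-error metric.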


Unfortunately, despite its impressive performance, the proposed method cannot be used for the author’s task, since it requires access to temporal features (views of the video in the first days after upload). These data are not available for a new video that has not yet been uploaded. Trzcinski and Rokita (2017) note that predicting content popularity from visual features alone yields unsatisfactory quality.

Popularity-LRCN

Trzcinski et al. (2017) introduce the Popularity-LRCN method, which combines convolutional neural networks with long-term recurrent convolutional networks to predict the popularity of a video. The authors use just 18 frames of the video to solve a binary classification problem. Each video $i$ is assigned a label $l_i$, which equals 1 if the video is “popular” and 0 otherwise. The purpose of the model is to predict the label $l_i$. To determine whether a video is “popular”, the authors calculate a normalized popularity score:

$$\text{normalized popularity score} = \log_2\left(\frac{\text{viewcount} + 1}{\text{number of publisher's followers}}\right).$$

Videos whose normalized popularity scores are greater than the median value are considered “popular”.
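The labeling rule can be sketched in a few lines. The sketch assumes the score is the base-2 logarithm of (view count + 1) divided by the publisher's follower count and that the follower count is positive; the function name and the sample values are hypothetical.

```python
import math
from statistics import median

def normalized_popularity(view_count, followers):
    """log2 of views (plus one) per follower; followers must be positive."""
    return math.log2((view_count + 1) / followers)

# Hypothetical (view count, follower count) pairs.
videos = [(999, 1000), (7999, 1000), (499, 2000), (63, 16)]

scores = [normalized_popularity(v, f) for v, f in videos]
threshold = median(scores)
# A video is "popular" if its score is strictly above the median.
labels = [1 if s > threshold else 0 for s in scores]
print(scores, threshold, labels)
```

Normalizing by followers means that a small channel whose video reaches four times its audience can be labeled popular, while a large channel whose video reaches a quarter of its audience is not. The median threshold also yields roughly balanced classes, unlike a top-20% threshold.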

The architecture of the neural network consists of the following layers:

1. Convolutional, ReLU and pooling layers, which extract features from individual frames.

2. LSTM cell, which sequentially (frame by frame) analyzes features extracted from individual frames.

3. Softmax layer, which returns the probability distribution of the object belonging to each of the classes.

The full architecture is demonstrated in Figure 2.

Fig. 2. Popularity - LRCN

The model is trained and tested on more than 37 000 Facebook videos uploaded between 06.01.2016 and 09.31.2016. The data were collected using the Graph API (Facebook for Developers, 2020). The performance of the model was measured using the accuracy metric and Spearman’s rank correlation coefficient between the predicted probability that the video will be popular and the normalized popularity score. The results of the experiments are presented in Table 2. The Popularity-LRCN method showed the best results among all models that used only video features for prediction.

Values of the quality metrics

Table 2

| Model | Feature | Classification accuracy | Spearman correlation |
|---|---|---|---|
| Logistic regression | HOG | 0.587 ± 0.006 | 0.229 ± 0.014 |
| Logistic regression | GIST | 0.609 ± 0.007 | 0.321 ± 0.008 |
| Logistic regression | CaffeNet | 0.622 ± 0.007 | 0.340 ± 0.007 |
| Logistic regression | ResNet | 0.645 ± 0.005 | 0.393 ± 0.010 |
| SVM | HOG | 0.616 ± 0.004 | 0.359 ± 0.008 |
| SVM | GIST | 0.609 ± 0.006 | 0.294 ± 0.012 |
| SVM | CaffeNet | 0.653 ± 0.003 | 0.395 ± 0.007 |
| SVM | ResNet | 0.650 ± 0.007 | 0.387 ± 0.015 |
| Popularity-LRCN | raw video frames | 0.7 ± 0.003 | 0.521 ± 0.009 |

The Popularity-LRCN method is applicable to predicting the popularity of not yet published videos, since it uses only video frames as input.

Attention-Based Popularity Prediction

Bielski and Trzcinski (2018) propose a method based on an attention mechanism that predicts the popularity of a video from its frames and title. The task and the dependent variable are identical to those of the Popularity-LRCN method.

The model consists of three main steps:

1. Video frames are processed to construct an informative embedding. First, 18 frames are extracted from the video. To create an embedding of a single frame, the authors propose using the output of a pre-trained ResNet50 neural network. A linear layer and a ReLU transformation are applied to every embedding. The representation of the video is formed as a weighted sum of the modified frame embeddings $q_i$:

$$v = \sum_{i} a_i q_i, \tag{4}$$

where the weights $a_i$ are calculated using an attention mechanism implemented as a two-layer neural network.

The first layer creates a hidden representation $u_i$:

$$u_i = \tanh(W_u q_i + b_u). \tag{5}$$

The second layer returns the “importance” of the $i$-th frame:

$$a_i = W_a u_i + b_a. \tag{6}$$

Lastly, $a_i$ is normalized using softmax, so that the weights sum to 1.

2. An embedding of the title is composed. Individual words are encoded using pre-trained GloVe vectors and fed into a biLSTM. The authors apply the same kind of two-layer neural network to the hidden states of the biLSTM to learn the “importance” of each word.

3. Text and video embeddings are concatenated to create a combined feature representation, which is fed into a two-layer network, which predicts the popularity of a video.
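The attention pooling over frames in step 1 (a hidden layer with tanh, a scoring layer, softmax normalization, then a weighted sum) can be sketched in plain Python. This is an illustration under assumed toy dimensions, not the authors' implementation: all parameter values and the helper names `matvec`, `softmax` and `attention_pool` are hypothetical.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, W_u, b_u, w_a, b_a):
    """Return the attention-weighted sum of frame embeddings and the weights."""
    scores = []
    for q in frames:
        # Hidden representation of the frame.
        u = [math.tanh(h + b) for h, b in zip(matvec(W_u, q), b_u)]
        # Scalar "importance" score of the frame.
        scores.append(sum(w * ui for w, ui in zip(w_a, u)) + b_a)
    weights = softmax(scores)
    dim = len(frames[0])
    video = [sum(a * q[d] for a, q in zip(weights, frames)) for d in range(dim)]
    return video, weights

# Hypothetical 2-dimensional embeddings of three frames and toy parameters.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_u = [[1.0, 0.0], [0.0, 1.0]]
b_u = [0.0, 0.0]
w_a = [1.0, 1.0]
b_a = 0.0

video, weights = attention_pool(frames, W_u, b_u, w_a, b_a)
print(video, weights)
```

With all-zero scoring parameters the weights become uniform and the pooling reduces to a plain mean of the frame embeddings, which corresponds to the averaging baseline that attention is compared against.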

The method is trained and tested on 37000 videos from Facebook. The dataset and quality metrics are the same as those used for the Popularity-LRCN method. The results of the experiments carried out by the authors are presented in Table 3. The method, which applies an attention mechanism to the combined embedding (video frames + headline), is applicable to the author's task.

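Models’ test results

Table 3

| Input | Feature | Acc, % | Spearman |
|---|---|---|---|
| Video frames | ResNet50 mean | 68.17 | 0.524 |
| Video frames | + attention | 68.87 | 0.526 |
| Headline | biLSTM | 69.47 | 0.542 |
| Headline | + attention | 68.70 | 0.525 |
| Multimodal | ResNet+biLSTM | 71.94 | 0.612 |
| Multimodal | + attention | 72.72 | 0.607 |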

The method offers approaches to processing data of different modalities (text and video). The authors showed that training a neural network on top of concatenated features of different modalities improves the quality of the prediction.

Literature Review Conclusions

Based on a review of the literature, a table comparing different popularity prediction methods is compiled (see Table 4).

Table 4

The comparison of the popularity prediction methods

| Method | Input data | Dependent variable | Datasets | Quality metrics | Applicability |
|---|---|---|---|---|---|
| Attention-Based Popularity Prediction | 18 video frames and video title | Binary popularity value | 37000 Facebook videos | Accuracy, Spearman rank coefficient | For word and video processing |
| Popularity-LRCN | 18 video frames | Binary popularity value | 37000 Facebook videos | Accuracy, Spearman rank coefficient | For video processing |
| Popularity-SVR | Video frames and thumbnail, number of views, likes and comments | Number of views a video receives until the 30th day | 1820 Facebook videos | Spearman rank coefficient | Not applicable, as it uses temporal independent variables (views) |
| MRBF | Video frames, metadata, number of views, likes and comments | Number of views a video receives until the 30th day | The ‘Top’ (4840) and the ‘Random’ (13144) datasets | MRSE | Not applicable, since the data on the number of views is used |

The first two methods use the input data that is available before the video is published.

To assess the quality of the forecast, the Accuracy metric can be used together with the Recall, Precision and ROC-AUC metrics, since Accuracy alone is not informative for classification problems with imbalanced classes.
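A short worked example shows why Accuracy alone can mislead when only 20% of videos are popular. The labels and predictions below are synthetic, and the helper functions are a minimal sketch (precision and recall are defined as 0 when their denominator is zero, which is an assumption of this example).

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def precision(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

def recall(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0

# Synthetic sample: 20 popular videos followed by 80 unpopular ones.
y_true = [1] * 20 + [0] * 80

# Baseline that never predicts "popular".
always_unpopular = [0] * 100

# Model that finds half of the popular videos and raises 5 false alarms.
model = [1] * 10 + [0] * 10 + [1] * 5 + [0] * 75

for name, y_pred in [("baseline", always_unpopular), ("model", model)]:
    print(name, accuracy(y_true, y_pred), precision(y_true, y_pred), recall(y_true, y_pred))
```

The trivial baseline reaches 80% accuracy while finding no popular videos at all, so only Precision and Recall reveal that the second model is actually useful.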

Data collection

The author tests the performance of every developed method on 11 000 YouTube videos published between 2 weeks and 1 year before data collection. In this way, the author tries to select relevant videos that are no longer gaining views. Videos are selected from channels that were included in the US trending list in the last two years. For every selected video, the following features are collected:

• thumbnail;

• title;

• date and time of the publication of the new video;

• length;

• first six minutes of video content;

• the exact time of creation of the channel.

Furthermore, several of the developed methods use the following characteristics of the previous video on the channel:

• number of views;

• number of likes and dislikes;

• number of comments;

• date and time of the publication of the previous video.

To collect the above-mentioned data, the author used Python 3 and the YouTube Data API v3 to develop custom parsing software.

Development of popularity prediction methods

The author modifies the applicable approaches identified in the literature review and applies them to the problem considered in this research.

Method based on CNN + LSTM architecture

This method is a modification of the Popularity-LRCN method. Similarly, it uses the content of 10 selected video frames to predict the popularity of a video. However, instead of training a convolutional neural network from scratch, the author proposes using a ResNet-152 network pre-trained on the large ImageNet dataset.

I stage. The use of the ResNet-152 network in order to obtain a representation of a single frame.

II stage. The use of Long short-term memory (LSTM) cell in order to obtain an embedding of a sequence of frames.

III stage. The use of two-layer neural network and a Softmax layer to predict the class of a video.

Testing of the CNN + LSTM architecture


Tested hypothesis. The sequence of visual features and their correspondence with the typical visual characteristics of popular videos carry enough information to predict the popularity of a newly created video.

Result. The hypothesis is not confirmed. The model failed to converge, which indicates either that the content of the video is not a significant factor in the popularity prediction problem or that the ResNet model needs to be trained on video frames.

Method based on biLSTM architecture

This method leverages the information contained in the title of a video to predict its popularity. A similar approach to text processing is used in the Attention-Based Popularity Prediction method (Bielski, Trzcinski, 2018).

I stage. Encoding of individual words of video titles using GloVe embeddings pre-trained on Twitter data.

II stage. The use of biLSTM to obtain embedding of a video title.

III stage. The use of a two-layer fully connected network and a soft-max layer to predict the popularity of a video.

Testing of the biLSTM architecture

Tested hypothesis. An attractive video title increases the chance of a video becoming popular after upload.


Result. The hypothesis is not confirmed. The model failed to converge. Perhaps the GloVe embeddings trained on Twitter data are not suitable for this task. Unfortunately, it was not possible to train custom embeddings due to the limited amount of text data.

Multimodal popularity prediction method

This method combines the two previous models to use both the title and the frames of the video to predict its popularity.

I stage. Processing of video frames. First, video frames are fed into the ResNet-152 network. Second, the outputs of the network are sequentially provided to an LSTM cell as inputs. Lastly, a ReLU activation function is applied to the last hidden state of the LSTM cell to form an embedding of the video frames.

II stage. Processing of the video title. Individual words of the title are encoded using pre-trained GloVe embeddings, and the resulting representations are fed into a biLSTM to obtain an embedding of the video title. The video title representation is formed by concatenating the penultimate and last hidden states and applying a ReLU activation function.

III stage. Combined embedding is formed by concatenating feature vectors created at stages I and II.

IV stage. The use of a two-layer fully-connected network and a soft-max layer with the combined embedding as input to predict the popularity of a video.
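Stages III and IV can be illustrated with a minimal NumPy sketch of the forward pass. All dimensions, the random weights and the ReLU between the two fully connected layers are assumptions made for illustration; the paper does not specify them, and random arrays stand in for the real frame and title embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes, chosen only to keep the example small.
batch, frame_dim, title_dim, hidden_dim = 4, 32, 16, 8

# Stand-ins for the outputs of stages I and II (already passed through ReLU).
frame_emb = relu(rng.normal(size=(batch, frame_dim)))
title_emb = relu(rng.normal(size=(batch, title_dim)))

# Stage III: concatenate the two modalities into one feature vector.
combined = np.concatenate([frame_emb, title_emb], axis=1)

# Stage IV: two fully connected layers followed by a softmax over the
# two classes. The weights are random, so the output is untrained.
w1 = rng.normal(scale=0.1, size=(frame_dim + title_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = rng.normal(scale=0.1, size=(hidden_dim, 2))
b2 = np.zeros(2)

probs = softmax(relu(combined @ w1 + b1) @ w2 + b2)
print(probs.shape)  # (4, 2)
```

Each row of `probs` holds the two class probabilities for one video and sums to one.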

Testing of the multimodal method

Tested hypothesis. The title and visual characteristics of the video are complementary factors for predicting the popularity of the video.

Result. The hypothesis is not confirmed. The model failed to converge. Even the use of both modalities does not appear to be a solution.

Methods based on a combination of neural networks and tree ensembles

The author also proposes a novel method which uses data about the performance of the previous video on the channel, video title and thumbnail.

In this method, the author attempts to process visual and text data using neural networks, combine these two modalities with numerical data and feed the obtained embeddings to a decision tree ensemble classifier.

Firstly, numerical data is converted into meaningful features. The details of these transformations are described in Table 5.

Table 5

Processing of the numerical features

| Number | Analyzed features | Transformation |
|---|---|---|
| 1 | Date and time of the publication of the new video; date and time of the publication of the previous video | Computation of the length of the time interval between the planned publication of the new video and the upload of the previous video |
| 2 | The exact time of creation of the channel; date and time of the publication of the new video | Calculation of the length of the time interval between the time of the upload of the current video and the channel's time of creation |
| 3 | Date and time of the publication of the new video | Transformation to weekday and hour of the day |
| 4 | Weekday and hour of the day of the publication of the new video | The use of OneHotEncoder to convert categorical values to binary vectors |
| 5 | Video duration; time intervals (1 and 2); features of the previous data: number of views, dislikes, likes and comments | Normalization |
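The transformations in Table 5 can be sketched in plain Python as follows. This is an illustration under assumptions, not the author's code: intervals are measured in hours, "Normalization" is taken to mean z-score standardization, and the function names and sample dates are hypothetical.

```python
from datetime import datetime

WEEKDAYS = 7
HOURS = 24

def one_hot(index, size):
    """Return a binary vector of length `size` with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

def time_features(new_pub, prev_pub, channel_created):
    """Build the interval and calendar features from rows 1-4 of Table 5."""
    # Rows 1 and 2: lengths of the two time intervals (hours are an assumption).
    since_prev = (new_pub - prev_pub).total_seconds() / 3600
    channel_age = (new_pub - channel_created).total_seconds() / 3600
    # Rows 3 and 4: weekday and hour of the day, each one-hot encoded.
    weekday = one_hot(new_pub.weekday(), WEEKDAYS)  # Monday is index 0
    hour = one_hot(new_pub.hour, HOURS)
    return since_prev, channel_age, weekday, hour

def standardize(values):
    """Row 5: rescale a column to zero mean and unit variance (assumed scheme)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0  # guard against constant columns
    return [(v - mean) / std for v in values]

# Hypothetical sample values.
since_prev, channel_age, weekday, hour = time_features(
    new_pub=datetime(2020, 4, 15, 18, 30),
    prev_pub=datetime(2020, 4, 13, 18, 30),
    channel_created=datetime(2019, 4, 15, 18, 30),
)
views_scaled = standardize([100.0, 200.0, 300.0])
print(since_prev, channel_age)  # 48.0 8784.0
```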

Secondly, video title and thumbnail are processed.

Text. In order to create a text embedding, the author uses the pre-trained BERT neural network (Devlin et al., 2019). It is a bidirectional network, which uses a Transformer encoder to analyze the whole sequence rather than individual words. Thus, this network is capable of learning contextual dependencies between words and is well suited to the representation of text data. To obtain the text embedding of a video title, the following steps are performed:

1. A tokenized video title is provided to the pre-trained BERT neural network as input.

2. The text embedding is obtained by taking the average of BERT's hidden states.

3. In order to lower the dimensionality of the embedding, principal component analysis is applied. Only the first 12 principal components are used.
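Steps 2 and 3 can be sketched with NumPy as shown below. Random numbers stand in for the real BERT output, and the batch size, sequence length and the choice to average over the token axis are assumptions; only the 12 retained components come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for BERT output: 50 titles, 16 tokens each, 768-dimensional
# hidden states. Random values replace the real network here.
hidden_states = rng.normal(size=(50, 16, 768))

# Step 2: average the hidden states over the token axis.
title_embeddings = hidden_states.mean(axis=1)  # shape (50, 768)

def pca_reduce(x, n_components):
    """Project the rows of x onto the first n_components principal components."""
    centered = x - x.mean(axis=0)
    # The rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Step 3: keep only the first 12 principal components.
reduced = pca_reduce(title_embeddings, n_components=12)
print(reduced.shape)  # (50, 12)
```

In practice the projection would be fitted on the training titles only and then reused for the test titles.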

Thumbnail. Following the approach developed in (Keyan Ding et al., 2019), the author trains a ResNet50 model to predict the popularity score of the thumbnail. The network is trained on the available dataset of pairs of Instagram images, since no dataset of comparable size is available for YouTube.

The combined embedding is constructed by concatenating the text embedding of the video title, the thumbnail popularity score and the processed numerical features.

The final step is to apply a tree ensemble classifier. The author experiments with three classifiers: GradientBoostingClassifier and XGBClassifier, which are two different implementations of the Gradient Boosting method, and RandomForestClassifier. Gradient Boosting is a machine learning approach used for classification and regression tasks. It builds an ensemble of small decision trees in which each subsequent tree corrects the errors of the previous ones. Random forest is also an ensemble of decision trees, but its trees are more complex, and each tree is trained independently on a subsample of the data.
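The idea that each subsequent tree corrects the errors of the previous ones can be shown with a toy sketch. It is a simplification under stated assumptions: a single feature, depth-1 trees (stumps) and squared loss, whereas library classifiers such as GradientBoostingClassifier and XGBClassifier use classification losses and many features. All names and data below are hypothetical.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single-threshold split of x that best fits the residual."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the maximum so both sides are non-empty
        left = x <= threshold
        left_value = residual[left].mean()
        right_value = residual[~left].mean()
        pred = np.where(left, left_value, right_value)
        loss = ((residual - pred) ** 2).sum()
        if best is None or loss < best[0]:
            best = (loss, threshold, left_value, right_value)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_trees=20, learning_rate=0.5):
    """Squared-loss gradient boosting on one feature with depth-1 trees."""
    prediction = np.full(len(y), y.mean())
    errors = [((y - prediction) ** 2).mean()]
    for _ in range(n_trees):
        residual = y - prediction        # what the ensemble still gets wrong
        stump = fit_stump(x, residual)   # the next tree targets those errors
        prediction = prediction + learning_rate * predict_stump(stump, x)
        errors.append(((y - prediction) ** 2).mean())
    return prediction, errors

x = np.linspace(0.0, 1.0, 40)
y = (x > 0.6).astype(float)  # toy "popular" label driven by one feature
prediction, errors = boost(x, y)
print(errors[0], errors[-1])
```

The training error shrinks with every added stump. A random forest, by contrast, would fit its trees independently on resampled data and average their votes.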

Testing of the methods based on a combination of neural networks and tree ensembles

The model converged, and the author tests the proposed architecture in several experiments:

1. The author tests various combinations of independent variables in order to find the most significant factors.

2. For each of the three classifiers, an exhaustive search over hyperparameters is performed to find the best set.
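An exhaustive hyperparameter search of this kind can be sketched as below. The grid values and the scoring function are hypothetical placeholders: the paper does not list the values that were searched, and in a real run the scorer would train the classifier and return a validation metric.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical grid, for illustration only.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 4, 6],
    "learning_rate": [0.05, 0.1],
}

def toy_score(params):
    """Placeholder for 'train the classifier and return a validation metric'."""
    return (
        -abs(params["n_estimators"] - 100)
        - abs(params["max_depth"] - 4)
        - params["learning_rate"]
    )

best_params, best_score = grid_search(param_grid, toy_score)
print(best_params)
```

The number of evaluations grows as the product of the grid sizes (here 3 x 3 x 2 = 18), which is what makes brute-force search expensive for larger grids.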

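Table 6

Results of the experiments

| Classifier | Data | Precision | Recall | ROC AUC | Accuracy |
|---|---|---|---|---|---|
| GradientBoostingClassifier | Numerical | 0.602 | 0.436 | 0.697 | 0.863 |
| GradientBoostingClassifier | Numerical, Title | 0.637 | 0.475 | 0.703 | 0.870 |
| GradientBoostingClassifier | Numerical, Title, Thumbnail | 0.639 | 0.483 | 0.719 | 0.872 |
| XGBClassifier | Numerical | 0.626 | 0.467 | 0.706 | 0.863 |
| XGBClassifier | Numerical, Title | 0.645 | 0.496 | 0.722 | 0.876 |
| XGBClassifier | Numerical, Title, Thumbnail | 0.632 | 0.490 | 0.718 | 0.873 |
| RandomForestClassifier | Numerical | 0.595 | 0.436 | 0.690 | 0.862 |
| RandomForestClassifier | Numerical, Title | 0.639 | 0.472 | 0.711 | 0.873 |
| RandomForestClassifier | Numerical, Title, Thumbnail | 0.632 | 0.490 | 0.718 | 0.873 |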

Results of experiments are presented in Table 6.

Firstly, it is possible to conclude that RandomForest achieves its best overall quality metrics when using thumbnail, text and numerical data for popularity prediction. However, the performance of the RandomForest model using just numerical data and the title of the video is not far behind.

Secondly, XGBClassifier demonstrates the best performance among all of the methods when it receives just the title of the video and its numerical data as input.

Lastly, GradientBoostingClassifier performs the worst among the three classifiers.

Fig. 3. Video features importance for popularity prediction

Ensemble methods are very informative since they provide ways to measure the importance of each feature of the video for its popularity prediction. From Figure 3, it is possible to conclude that the performance (view count) of the previous video on the channel is a very significant feature. Subscriber count, video duration and hour of the day of upload are other informative independent variables. On the other hand, the thumbnail popularity score does not seem to be a feature with high predictive power.

Conclusions

To conclude, this paper presents the results of research on the topical problem of video popularity prediction. As part of this work, the author conducts a literature review to identify applicable methods and important video features for predicting video content popularity. As a result, possible elements of machine learning models are identified.

A special parsing software was developed in order to collect the data about 11,000 YouTube videos. Data was grouped to construct a dataset, which would later be used for training and testing various models.

The author explores whether modifications of the methods revealed within the literature review are applicable to the research problem. The application of these modified methods was based on assumptions about the importance of visual attributes and/or video titles for predicting the popularity of a newly created video. However, as it was discovered, visual information is not sufficient to predict the popularity of the video.

Lastly, the author develops methods based on tree ensembles and neural networks, which use the performance of the previous video published on the channel, data about the channel performance, title and thumbnail of the newly created video. Further investigation revealed that the performance of the previous video and channel were the variables with the most predictive power. The best model achieves the following values of quality metrics on a test subset: Accuracy - 0.87; Precision - 0.63; Recall - 0.49, that is, 87% of videos are correctly classified as popular/unpopular; among videos classified as popular, 63% are popular; 49% of truly popular videos are correctly identified.
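The three metrics quoted above follow directly from the counts of true and false positives and negatives. The sketch below computes them for a hypothetical toy sample of ten videos in which the top 20% are labelled popular; the labels are invented for illustration and are not the paper's data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = popular)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Toy labels: 10 videos, 2 of them popular.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)  # 0.8 0.5 0.5
```

With 20% of videos labelled popular, a classifier that always answers "unpopular" already reaches an accuracy of 0.80, so precision and recall are the more informative figures for this task.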

As part of further research, it is necessary to look into constructing a larger dataset to avoid relying on pre-trained neural networks. Also, to improve the tree ensemble methods, it is necessary to collect and process data on how long and how often regular users view the channel. Further research should encompass the interests of marketers, data analysts, linguists and psychologists to provide an interdisciplinary perspective on video popularity prediction.

REFERENCES

Alexa Internet, Inc. (2020). The top 500 sites on the web (https://www.alexa.com/topsites - Accessed: 15-Apr-2020).

Bielski, A., Trzcinski, T. (2018). Pay Attention to Virality: Understanding Popularity of Social Media Videos with the Attention Mechanism. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (DOI: https://doi.org/10.1109/cvprw.2018.00309 - Accessed: 29-Apr-2020).

Buffer (2020). State of Social 2019 (https://buffer.com/state-of-social-2019 - Accessed: 11-Apr-2020).

Clement, J. (2019). Global logged-in YouTube viewers per month 2017-2019. Statista (https://www.statista.com/statistics/859829/logged-in-youtube-viewers-worldwide/ - Accessed: 9-Jan-2020).

Clement, J. (2020). Global number of internet users 2005-2019. Statista (https://www.statista.com/statistics/273018/number-of-internet-users-worldwide/ - Accessed: 12-Mar-2020).

Crane, R., Sornette, D. (2008). Viral, quality, and junk videos on YouTube: Separating content from noise in an information-rich environment. The AAAI Spring Symposium: Social Information Processing (https://www.aaai.org/Papers/Symposia/Spring/2008/SS-08-06/SS08-06-004.pdf - Accessed: 15-Jan-2020).

Devlin, J., Chang, M.-W., Lee, K., Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv (https://arxiv.org/pdf/1810.04805.pdf - Accessed: 15-May-2020).

Enberg, J. (2020). How COVID-19 Has - And Has Not - Affected Global Ad Spending. eMarketer (https://www.emarketer.com/content/how-coronavirus-affects-global-ad-spending - Accessed: 15-Apr-2020).

Facebook for Developers (2020). API Graph (https://developers.facebook.com/docs/graph-api/ - Accessed: 16-Apr-2020).

Fontanini, G., Bertini, M., Del Bimbo, A. (2016). Web Video Popularity Prediction using Sentiment and Content Visual Features. ICMR'16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval (DOI: http://dx.doi.org/10.1145/2911996.2912053 - Accessed: 29-Apr-2020).

Geurts, P., Ernst, D., Wehenkel, L. (2006). Extremely randomized trees. Mach Learn, 63, 3-42 (https://doi.org/10.1007/s10994-006-6226-1 - Accessed: 15-May-2020).

Guandan Chen, Qingchao Kong, Nan Xu, Wenji Mao (2019). NPP: A neural popularity prediction model for social media content. Neurocomputing, 333, 221-230. DOI: 10.1016/j.neucom.2018.12.039 (Accessed: 05-May-2020).

Influence Marketing Hub (2019). Coronavirus (COVID-19) Marketing & Ad Spend Impact: Report + Statistics (https://influencermarketinghub.com/coronavirus-marketing-ad-spend-report/ - Accessed: 11-May-2020).

Keyan Ding, Kede Ma, Shiqi Wang (2019). Intrinsic Image Popularity Assessment. Proceedings of ACM Conference (Conference'19). ACM, New York, NY, USA, 9 pages (https://arxiv.org/pdf/1907.01985.pdf - Accessed: 15-May-2020).

Min Gyeong Choe, Jae Hong Park, Dong Won Seo (2019). How Long Will Your Videos Remain Popular? Empirical Study of the Impact of Video Features on YouTube Trending Using Deep Learning Methodologies, pp. 190-197 / In: Jennifer J. Xu, Bin Zhu, Xiao Liu, Michael J. Shaw, Han Zhang, Ming Fan (eds.) The Ecosystem of e-Business: Technologies, Stakeholders, and Connections: 17th Workshop on e-Business, WeB 2018, Santa Clara, CA, USA, December 12, 2018, Revised Selected Papers. Springer, 199 p.

Newman, M. E. J. (2005). Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46(5), 323-351. DOI: 10.1080/00107510500052444 (Accessed: 11-May-2020).

Pew Research Center (2020). Share of US adults using social media, including Facebook, is mostly unchanged since 2018 (https://www.pewresearch.org/fact-tank/2019/04/10/share-of-u-s-adults-using-social-media-including-facebook-is-mostly-unchanged-since-2018/ - Accessed: 09-Apr-2020).

Sherman (2019). 35 Digital Marketing Statistics That Will Convince You to Advertise Online. Lyfe Marketing (https://www.lyfemarketing.com/blog/digital-marketing-statistics/ - Accessed: 12-Jan-2020).

Statt, N. (2020). YouTube is a $15 billion-a-year business, Google reveals for the first time. The Verge (https://www.theverge.com/2020/2/3/21121207/youtube-google-alphabet-earnings-revenue-first-time-reveal-q4-2019 - Accessed: 12-Apr-2020).

Tao Chen, Damian Borth, Trevor Darrell, Shih-Fu Chang (2014). DeepSentiBank: Visual Sentiment Concept Classification with Deep Convolutional Neural Networks. arXiv (https://arxiv.org/abs/1410.8586 - Accessed: 15-May-2020).

Trzcinski T., Andruszkiewicz P., Bochenski T., Rokita P. (2017). Recurrent Neural Networks for Online Video Popularity Prediction. arXiv (https://arxiv.org/pdf/1707.06807.pdf - Accessed: 29-Apr-2020).

Trzcinski, T., Rokita, P. (2017). Predicting Popularity of Online Videos Using Support Vector Regression. IEEE Transactions on Multimedia, 19(11), 2561-2570. DOI: 10.1109/TMM.2017.2695439 (Accessed: 23-Jan-2020).

Tubics (2020). How Many YouTube Channels Are There? (https://www.tubics.com/blog/number-of-youtube-channels/ - Accessed: 08-Apr-2020).

Wyzowl (2019). The State of Video Marketing 2019 (https://info.wyzowl.com/state-of-video-marketing-2019-report - Accessed: 21-Feb-2020).

YouTube (2020). Press - YouTube (https://www.youtube.com/intl/en-GB/about/press/ - Accessed: 09-Apr-2020).
