An algebraic method is proposed for finding PageRank estimates of the ranks of site pages. The volume of computations of the proposed method does not depend on the value of the damping coefficient, which makes it possible to obtain more accurate PageRank estimates than its analogues. A distinctive feature of the proposed method is that the calculations are performed step by step, concurrently with the graph traversal algorithm. A comparative analysis of graph traversal algorithms has shown that, unlike the depth-first search algorithm, the breadth-first search algorithm yields a more ordered transition matrix, which has a block Hessenberg form. Exploiting this property made it possible to reduce the volume of computations of the proposed method significantly. The resulting equations describing the proposed method have a block structure, which allows the entire volume of operations to be distributed efficiently over parallel computational threads. Since the bulk of the computations can be performed while the graph traversal algorithm is running, conditions are determined under which the proposed method yields PageRank estimates faster than the known iterative algorithms. The method is intended primarily for the direct verification of the placement of advertising materials on the corresponding web resource; its applicability is therefore limited to individual sites or segments of the Internet with no more than $10^4$-$10^5$ pages.
Keywords: site graph, page ranks, transition matrix, damping coefficient, teleportation matrix
UDC 519.688: 519.177
DOI: 10.15587/1729-4061.2018.131275
AN ALGEBRAIC METHOD FOR CALCULATING PAGERANK
V. Vlasyuk
Manager, Vertamedia LLC branch, Admiralskyi ave., 34a, Odessa, Ukraine, 65009
O. Galchonkov, PhD, Associate Professor*
A. Nevrev, PhD, Associate Professor*
*Department of Information Systems, Institute of Computer Systems, Odessa National Polytechnic University, Shevchenko ave., 1, Odessa, Ukraine, 65044
1. Introduction
The constantly increasing use of the Internet in all areas of human activity has made it one of the most important venues for the development of the advertising business [1]. However, when placing an advertisement, the advertiser does not interact directly with the owners of the websites that host it. Between the advertiser and the website there are many intermediaries, including advertising agencies, ad networks [2], arbitrage networks (Trading desks and Ad Exchanges [3]), and SSP and DSP platforms [4, 5]. On the one hand, this chain facilitates the advertiser's work, increases the effectiveness of advertising, and reduces its cost. On the other hand, it creates additional opportunities for fraud.
Therefore, verifying that advertising materials are actually placed on the relevant web resource is a highly relevant task [6-8]. As the structure of web resources changes dynamically, one of the primary tasks is to analyse the current structure of a site and the current ranks (importance) of its pages. This helps determine the effectiveness of advertisement placement in subsequent steps.
The problems of analysing the structure of sites and finding the ranks of their pages have been covered extensively in published studies, for example in [9, 10]. However, these two tasks are considered there separately: it is assumed that the structure of a site is analysed first and the ranks of its pages are determined afterwards. As a result, when multicore or multiprocessor computers are used, not all of their computing power is utilized. Therefore, it is important to develop algorithms for determining the ranks of site pages in which the bulk of the computations can be performed in parallel with the algorithm that analyses the structure of the site. This makes it possible to use the computing hardware more fully and to obtain the result faster.
One of the best-known algorithms for ranking the pages of websites is PageRank, proposed in [11]. However, because some pages have no outbound links, the problem of estimating the ranks in accordance with this approach is degenerate. To resolve this, various types of regularization are used. The stronger the regularization, the easier it is to obtain a solution, but the more the resulting estimates deviate from the idea proposed in [11]. Therefore, it is important to develop algorithms for finding PageRank estimates that offer a better trade-off between the volume of necessary computations and the level of regularization than the known ones.
2. Literature review and problem statement
The most convenient way to represent the structure of a site is a graph whose vertices are the pages of the site and whose edges are defined by the links from one page to another. There are many approaches to determining page ranks, but the best known is PageRank, proposed in [11]. The main idea of this approach is that the rank of a page u depends linearly on the ranks of the pages that link to it:
$$PR(u) = \sum_{v \in B_u} \frac{PR(v)}{L(v)}, \qquad (1)$$
where PR(v) is the PageRank of a page v; L(v) is the total number of outbound links on the page v; $B_u$ is the set of pages that link to the page u.
Assuming that each page contains at least one link to other pages, equation (1) is a particular case of the vector equation [12]:

$$A v = v, \qquad (2)$$

where v is the vector of the ranks of site pages; $A = P^T$; P is the $N \times N$ transition matrix, a nonnegative matrix with the entries

$$P_{ij} = \begin{cases} 1/\deg(i), & \text{if the } i\text{-th page has a link to the } j\text{-th page},\\ 0, & \text{if the } i\text{-th page does not have a link to the } j\text{-th page}; \end{cases}$$

deg(i) is the number of outbound links on the page i, so that $P_{ij} \ge 0$ and $\sum_j P_{ij} = 1$; N is the number of pages on the site.

The solution of equation (2) is considered either as finding the eigenvector of the matrix A corresponding to its maximum eigenvalue, which is equal to 1, or as the solution of the system of equations [12, 13]:

$$(I - A)\, v = 0, \qquad (3)$$

where I is an identity matrix.

From the theory of homogeneous Markov chains, where a similar graph model is used, it is known that equation (2) has a solution, unique up to a multiplier, if the graph is strongly connected and aperiodic [9]. For web graphs, the second condition is satisfied automatically. The first condition, however, fails if the site has so-called 'hanging' pages, i.e. pages that have no links to other pages. Hanging pages give zero rows in the transition matrix P and, accordingly, make it degenerate.

The most popular approach to the problem of hanging pages is to modify the matrix P as if hanging pages contained links to all pages [9, 12, 13]:

$$P' = P + \frac{1}{N} D, \qquad (4)$$

where N is the total number of pages on the site, and D is an $N \times N$ matrix whose rows corresponding to hanging pages consist of ones, while all its other rows are zero.

In addition to resolving the problem of hanging pages, another modification of the transition matrix, 'teleportation', is made to ensure that the graph is strongly connected. Physically, it is interpreted as if a user on any page of the site follows, with probability α, one of the links available on this page and, with probability (1 - α), goes to any other page:

$$P'' = \alpha P' + \frac{1-\alpha}{N} J, \qquad (5)$$

where J is a matrix with all elements equal to 1.

As a result, rank estimates are found from the solution of the modified equation (2):

$$A v = v, \qquad (6)$$

where v is the vector of the ranks of the site pages, and $A = (P'')^T$.
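To make equations (4)-(6) concrete, here is a minimal NumPy sketch (an illustration, not the authors' code) that builds $A = (P'')^T$ from a list of links. The link-list input format, the function name `google_matrix` and the default α = 0.85 are assumptions made for the example, and dense matrices are used only because the example is tiny.

```python
import numpy as np

def google_matrix(links, n, alpha=0.85):
    """Return A = (P'')^T built as in equations (4)-(6).

    links: iterable of (u, v) pairs meaning "page u links to page v",
    with pages numbered 0..n-1 (hypothetical input format).
    """
    P = np.zeros((n, n))
    for u, v in links:
        P[u, v] = 1.0                     # repeated links count once
    deg = P.sum(axis=1)                   # deg(i): outbound links of page i
    linked = deg > 0
    P[linked] /= deg[linked][:, None]     # P_ij = 1/deg(i)
    D = np.zeros((n, n))
    D[~linked] = 1.0                      # rows of ones for hanging pages
    P1 = P + D / n                        # equation (4)
    P2 = alpha * P1 + (1 - alpha) / n * np.ones((n, n))  # equation (5)
    return P2.T                           # A in equation (6)

# Hypothetical 4-page site; page 3 has no outbound links.
A = google_matrix([(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)], n=4)
print(A.sum(axis=0))  # every column of A sums to 1 (up to rounding)
```

Because every row of $P''$ sums to 1, the columns of A do too. This is what makes 1 the dominant eigenvalue in (6).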
The whole set of approaches to solving equation (6) can be divided into two groups [14]: algebraic methods and iterative methods.
Iterative methods successively approximate the desired vector of ranks. One of the most popular iterative methods is the power method [15]:
$$v_{k+1} = \frac{A v_k}{\lVert A v_k \rVert_1}, \qquad (7)$$

where $v_k$ is the estimate of the vector of ranks v at the k-th iteration, and

$$\lVert x \rVert_1 = \sum_{i=1}^{N} |x_i|$$

is the norm of a vector x, equal to the sum of the moduli of its components $x_i$.
As the initial approximation, we take the vector $v_0$, all of whose elements are equal to 1/N. The calculations stop once

$$\lVert v_{k+1} - v_k \rVert_1 < \delta, \qquad (8)$$

where δ is a small value.
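The iteration (7) with the stopping rule (8) can be sketched as follows. This is a minimal illustration that assumes a dense column-stochastic matrix A with positive entries, such as the one produced by (6). The function name `power_method`, the tolerance and the iteration cap are illustrative choices.

```python
import numpy as np

def power_method(A, delta=1e-10, max_iter=10_000):
    """Power method (7) with stopping rule (8); returns (v, iterations)."""
    n = A.shape[0]
    v = np.full(n, 1.0 / n)               # v_0: all elements equal to 1/N
    for k in range(1, max_iter + 1):
        w = A @ v
        w /= np.abs(w).sum()              # divide by ||A v_k||_1
        if np.abs(w - v).sum() < delta:   # ||v_{k+1} - v_k||_1 < delta
            return w, k
        v = w
    return v, max_iter                    # no convergence within max_iter

# Hypothetical 3x3 column-stochastic matrix with positive entries.
A = np.array([[0.1, 0.5, 0.3],
              [0.6, 0.2, 0.3],
              [0.3, 0.3, 0.4]])
v, iterations = power_method(A)
print(v, iterations)
```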
To obtain rank estimates faster, linear algebra methods [15, 16] such as LU decomposition and the QR algorithm are used, as are the Jacobi, Lanczos, Arnoldi [17] and Gauss-Seidel methods. Heuristic methods are also employed. For example, the extrapolation method is based on the power method [18] or the Arnoldi method [19], but instead of performing some of the iterations, it simply extrapolates to obtain the estimates of the next step. An adaptive method [20, 21] exploits the fact that the elements of the vector $v_k$ converge at different rates: no calculations are made for the elements that have practically converged and no longer change. In addition, many studies are devoted to accelerating iterative methods by adapting them to multicore or multiprocessor computers [22-27].
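The adaptive idea mentioned above can be illustrated with a simplified sketch. It is not the exact algorithm of [20, 21]. It assumes that A is column-stochastic with positive entries. Components whose change drops below a hypothetical threshold `eps` are frozen, and in later iterations only the rows of A for the remaining active components are multiplied.

```python
import numpy as np

def adaptive_power_method(A, delta=1e-10, eps=1e-13, max_iter=10_000):
    """Simplified adaptive power method: entries whose change fell below
    eps are frozen, and only rows of A for active entries are multiplied."""
    n = A.shape[0]
    v = np.full(n, 1.0 / n)
    active = np.ones(n, dtype=bool)
    for k in range(1, max_iter + 1):
        w = v.copy()
        w[active] = A[active] @ v         # skip rows of converged entries
        w /= w.sum()                      # entries are >= 0, so this is the 1-norm
        change = np.abs(w - v)
        active &= change >= eps           # freeze entries that stopped changing
        if change.sum() < delta or not active.any():
            return w, k
        v = w
    return v, max_iter

A = np.array([[0.1, 0.5, 0.3],
              [0.6, 0.2, 0.3],
              [0.3, 0.3, 0.4]])
v, iterations = adaptive_power_method(A)
print(v, iterations)
```

The saving comes from `A[active] @ v`: rows for frozen components are never multiplied again. The price is the risk of freezing a component too early, which is why practical versions recheck the frozen components.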
For example, in [22], the massive parallelism of GPUs is used to calculate PageRank. In [23], a partitioning and compact representation of the site graph is proposed so that it fits the memory of individual processors in a GPU array. In [24], a parallel algorithm for computing PageRank is proposed, based on the MapReduce distributed programming model and on adjacency lists in multiprocessor arrays. The load balancing capabilities of the multiprocessor CUDA architecture, compared with memory-based methods, are examined in [25]. In [26], a method is proposed to accelerate the calculation of PageRank on the multiprocessor CUDA architecture by searching the site graphs for special structures that allow the computations to be spread over the maximum number of threads. In [27], it is suggested to accelerate the calculation of PageRank on distributed multiprocessor computers by using the MapReduce model and the Hadoop framework.
Algebraic methods presuppose an exact solution of equation (6) in the form [14]

$$v = \frac{1-\alpha}{N}\left[I - \alpha\left(P^T + \frac{1}{N} D^T\right)\right]^{-1} e, \qquad (9)$$
where e is a vector of a dimension N, all elements of which are equal to 1; I is the identity matrix of the size N x N.
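Below is a minimal sketch of the algebraic route (9). It assumes P is a small dense transition matrix whose rows are either stochastic or zero (hanging pages). Instead of forming the inverse explicitly, it solves the equivalent linear system with `numpy.linalg.solve`, an LU-based LAPACK solver, which is the usual numerical practice. The helper name `pagerank_direct` is hypothetical.

```python
import numpy as np

def pagerank_direct(P, alpha=0.85):
    """Ranks from equation (9), computed by solving the linear system
    [I - alpha (P^T + D^T/N)] v = (1 - alpha)/N e instead of inverting."""
    n = P.shape[0]
    D = np.zeros((n, n))
    D[P.sum(axis=1) == 0] = 1.0           # rows of ones for hanging pages
    M = np.eye(n) - alpha * (P.T + D.T / n)
    return np.linalg.solve(M, (1 - alpha) / n * np.ones(n))

# Hypothetical 4-page site: row i holds 1/deg(i) for each page i links to.
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.5, 0.0, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.0]])      # page 3 is a hanging page
v = pagerank_direct(P)
print(v, v.sum())  # the ranks sum to 1 (up to rounding)
```

Summing both sides of the system shows that $e^T v = 1$ automatically, so no separate normalization step is needed.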
Search engines, which deal with very large transition matrices (N of the order of $10^7$), solve (2) iteratively with α = 0.85...0.9, where α is also called the damping coefficient. The smaller the value of α, the faster the iterative methods converge, but the less accurately the solution approximates the page ranks defined by formula (1) [12, 13, 28].
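The effect of α on the iteration count can be estimated with a common rule of thumb. For the matrix in (6), the subdominant eigenvalue is at most α in modulus, so the power-method error shrinks roughly by a factor of α per iteration, and about log δ / log α iterations are needed. The sketch below applies this estimate. It is an approximation, not an exact count, and the function name `iterations_needed` is hypothetical.

```python
import math

def iterations_needed(alpha, delta):
    """Rough count of power-method iterations: the error shrinks by about
    a factor alpha per step, so about log(delta)/log(alpha) steps are needed."""
    return math.ceil(math.log(delta) / math.log(alpha))

for alpha in (0.5, 0.85, 0.9, 0.99, 0.999):
    print(f"alpha = {alpha}: about {iterations_needed(alpha, 1e-8)} iterations")
```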
When advertising campaigns are verified, a particular web resource (site) is analysed, and N usually does not exceed $10^4$...$10^5$. It is therefore desirable to obtain the most accurate solution of formula (1). In addition, the iterative solution of equation (6) usually begins only after the structure of the web resource has been completely determined, that is, when the matrix P is known. Meanwhile, the traversal of the site graph (the determination of its structure) is itself performed iteratively and takes a significant amount of time that could be used to calculate page ranks. This is especially true when all operations are performed on a powerful computer with many cores or processors. Therefore, it is of interest to develop an algebraic method [14] for calculating page ranks whose volume of operations does not depend on how close α is to 1 and which takes into account the features of the graph traversal algorithm.
3. The aim and objectives of the study
The aim of the study is to develop a method for calculating page ranks with an algebraic approach that takes into account the features of the graph traversal algorithm.
To achieve the aim, the following tasks are set and solved:
- to analyse the features of graph traversal algorithms and identify opportunities to reduce the amount of calculations and to perform part of the calculations while the graph traversal algorithm is running;
- to take into account the structural features of the transition matrix when constructing a step-by-step calculation of the ranks;
- to determine the order of the number of operations of the obtained method and to find the conditions under which this number is smaller than for the iterative methods of determining the ranks.
4. Analysis of the features of graphs traversal algorithms
Let us consider two main approaches to the construction of algorithms for traversing graphs [10]. These are a depth-first search (DFS) algorithm and a breadth-first search (BFS) algorithm.
We represent the graph of the target site in the form

$$G = (V, E),$$

where V is the set of all pages of the site, i.e. the set of all vertices of the graph G, and E is the set of pairs (u, v), with $u, v \in V$, that are connected by an edge of the graph G. In other words, the page u contains a link to the page v.
We assume that the initial vertex $v_{0,0}$ of the graph G, which is the main page of the site, is known.
The DFS algorithm aims to move as quickly as possible into the interior of the graph G. Upon reaching a vertex $w_i$ of the graph, the algorithm forms the set $D_i$ of vertices that it has already visited:

$$D_i = D_{i-1} \cup \{w_i\}, \quad i = 1, 2, 3, \ldots, \qquad D_0 = \{w_0\}, \quad w_0 = v_{0,0}. \qquad (10)$$
Next, we find the set of vertices of the graph G to which the page $w_i$ links:

$$B_i = \{b_{i,1}, b_{i,2}, \ldots\}. \qquad (11)$$
From this set, we remove the vertices that the DFS algorithm has already visited:

$$W_i = B_i \setminus (B_i \cap D_i). \qquad (12)$$
From the set $W_i$, we select one vertex $w_{i+1}$ by some rule (for example, the first outgoing link on the page), and the algorithm proceeds to this vertex in the next step.
If the set $W_i$ is empty, the algorithm steps back once and forms a new set

$$W_{i-1} = B_{i-1} \setminus (B_{i-1} \cap D_i), \qquad (13)$$

from which the vertex $w_{i+1}$ is selected.
If the set $W_{i-1}$ is also empty, the algorithm rolls back one more step, and so on. The traversal of the graph is complete when no nonempty set $W_m$ can be formed even after rolling back to the initial vertex of the graph.
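Here is a minimal sketch of the DFS procedure (10)-(13) in plain Python. The site is represented by a hypothetical in-memory dictionary `links` that maps each page to the pages it links to; in a real crawler, each lookup would be a page download. The "first outgoing link" rule from the text is used to pick $w_{i+1}$.

```python
def dfs_traversal(links, start):
    """Depth-first traversal following (10)-(13).

    links: dict mapping a page to the list of pages it links to
    (hypothetical stand-in for downloading and parsing a page).
    Returns the pages in the order in which they are first visited.
    """
    visited = [start]                 # D_i, kept in visiting order
    seen = {start}
    path = [start]                    # w_0, ..., w_i: the current descent
    while path:
        w = path[-1]
        # W_i: links of the current page minus already visited vertices
        candidates = [b for b in links.get(w, []) if b not in seen]
        if candidates:
            nxt = candidates[0]       # rule: first outgoing link on the page
            visited.append(nxt)       # D_{i+1} = D_i + {w_{i+1}}
            seen.add(nxt)
            path.append(nxt)
        else:
            path.pop()                # roll back one step, as in (13)
    return visited

site = {"home": ["a", "b"], "a": ["c", "home"], "b": ["c"], "c": []}
print(dfs_traversal(site, "home"))
```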
In contrast to the depth-first search (DFS) algorithm, the breadth-first search (BFS) algorithm traverses the graph sequentially, layer by layer. By the i-th layer, we mean the subset $V_i$ of vertices of the graph G whose shortest distance from the vertex $v_{0,0}$ is i edges. Upon reaching the j-th vertex $v_{i,j}$ of the i-th layer, the BFS algorithm forms the set $P_{i,j}$ of vertices that it has already visited:

$$P_{i,j} = \{v_{i,j}\} \cup P_{i,j-1}, \quad j = 1, 2, \ldots, \qquad (14)$$

where

$$P_{i,0} = P_{i-1,\, j_{\max,i}}, \quad i = 1, 2, \ldots, \qquad P_{0,0} = \{v_{0,0}\},$$

and $j_{\max,i}$ is the maximum index j of the vertices $v_{i-1,j}$ in the layer $V_{i-1}$.

Further, the BFS algorithm finds all vertices of the graph G that are one edge away from the vertex $v_{i,j}$:

$$F_{i,j} = \{f_{i,j,1}, f_{i,j,2}, \ldots\}. \qquad (15)$$

The (i+1)-th layer of the graph consists of the newly detected vertices that are one edge away from the vertices of the i-th layer:

$$V_{i+1} = \bigcup_{v_{i,j} \in V_i} \left(F_{i,j} \setminus (F_{i,j} \cap P_{i,j})\right). \qquad (16)$$

The BFS traversal of the graph terminates when the next set $V_{i+1}$ is empty.

A comparison of the DFS and BFS algorithms shows that the breadth-first algorithm produces more ordered subsets of the graph's vertices, namely layers. Moreover, the rules of layer formation in the BFS algorithm imply that the vertices of the i-th layer can link only to vertices of layers up to the (i+1)-th. They never link to vertices of the (i+2)-th, (i+3)-th or further layers. Therefore, when the vertices of the graph are ordered by the layers produced by the BFS algorithm, the transition matrix P has a lower block Hessenberg form [29]. An example of the matrix P for eight layers is the following:

$$P = \begin{pmatrix}
A_{11} & A_{12} & 0 & 0 & 0 & 0 & 0 & 0\\
A_{21} & A_{22} & A_{23} & 0 & 0 & 0 & 0 & 0\\
A_{31} & A_{32} & A_{33} & A_{34} & 0 & 0 & 0 & 0\\
A_{41} & A_{42} & A_{43} & A_{44} & A_{45} & 0 & 0 & 0\\
A_{51} & A_{52} & A_{53} & A_{54} & A_{55} & A_{56} & 0 & 0\\
A_{61} & A_{62} & A_{63} & A_{64} & A_{65} & A_{66} & A_{67} & 0\\
A_{71} & A_{72} & A_{73} & A_{74} & A_{75} & A_{76} & A_{77} & A_{78}\\
A_{81} & A_{82} & A_{83} & A_{84} & A_{85} & A_{86} & A_{87} & A_{88}
\end{pmatrix}. \qquad (17)$$

The size of the matrix P is $N \times N$, where N is the total number of pages on the site (the number of vertices in the site graph). $A_{ij}$ denotes the block containing the links from the vertices of the i-th layer to the vertices of the j-th layer. The size of each block $A_{ij}$ is $n_i \times n_j$, where $n_i$ is the number of vertices in the i-th layer of the graph and $n_j$ is the number of vertices in the j-th layer. The transition matrix produced by the BFS algorithm has a block form, and the blocks $A_{ij}$ with $j > i + 1$ contain only zeros. This structure can be exploited to develop an algorithm for calculating the page ranks of the target site. In addition, the block form of the transition matrix P allows all calculations in the rank estimation algorithm to be organized in blocks. This helps distribute the computations efficiently over threads on multicore and multiprocessor computing systems.

5. Development of a modified algorithm for estimating PageRank

As the structure of websites changes dynamically over time, this structure is considered a priori unknown when advertising campaigns are verified. It therefore has to be determined starting from the main page of the website. We assume that the BFS algorithm is used to traverse the graph of the target site. As a result, we obtain a transition matrix P structured by layers.

To ensure the stability of the subsequent computational process, we modify the transition matrix P similarly to (4) and (5):

$$P'' = \alpha P + \frac{\alpha}{N} D + \frac{1-\alpha}{N} J, \qquad (18)$$

where $0 < \alpha < 1$. J is the $N \times N$ teleportation matrix corresponding to the transition matrix P: all its elements in the positions of the blocks $A_{ij}$ with $j \le i + 1$ are equal to 1, and all its elements in the positions of the zero blocks ($j > i + 1$) are equal to 0. D is the $N \times N$ matrix corresponding to the 'hanging' pages. For each hanging page (one with no outbound links) with number m, the m-th row of D contains ones in the positions corresponding to the blocks $A_{ij}$ with $j \le i + 1$ and zeros in the positions corresponding to the zero blocks ($j > i + 1$). All other elements of D are equal to zero.

The PageRank vector v is defined as the dominant eigenvector of the matrix $A = (P'')^T$:

$$A v = v. \qquad (19)$$

Taking (18) into account, equation (19) is equivalent to the following:

$$\left[I - \alpha\left(P^T + \frac{1}{N} D^T\right)\right] v = \frac{1-\alpha}{N}\, e, \qquad (20)$$

where e is a vector of dimension N with all elements equal to 1, and I is the $N \times N$ identity matrix.

The algebraic solution of (20) is given by [12]

$$v = \frac{1-\alpha}{N}\left[I - \alpha\left(P^T + \frac{1}{N} D^T\right)\right]^{-1} e. \qquad (21)$$

We introduce the following notation for the matrix being inverted, obtained at the k-th step of the BFS algorithm:

$$B_k = I_k - \alpha\left(P_k^T + \frac{1}{N_k} D_k^T\right), \qquad (22)$$

where the matrices $B_k$, $I_k$, $P_k$ and $D_k$ have dimensions $N_k \times N_k$, and $N_k$ is the total number of graph vertices obtained by the BFS algorithm in k steps:

$$N_k = \sum_{i=1}^{k} n_i. \qquad (23)$$

Similarly to the form of the matrix P in (17), and taking the transposition into account, the matrix $B_k$ is an upper block Hessenberg matrix:

$$B_k = \begin{pmatrix}
C^k_{11} & C^k_{12} & C^k_{13} & \cdots & C^k_{1(k-2)} & C^k_{1(k-1)} & C^k_{1k}\\
C^k_{21} & C^k_{22} & C^k_{23} & \cdots & C^k_{2(k-2)} & C^k_{2(k-1)} & C^k_{2k}\\
0 & C^k_{32} & C^k_{33} & \cdots & C^k_{3(k-2)} & C^k_{3(k-1)} & C^k_{3k}\\
\vdots & \ddots & \ddots & \ddots & \vdots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & C^k_{(k-2)(k-2)} & C^k_{(k-2)(k-1)} & C^k_{(k-2)k}\\
0 & 0 & 0 & \cdots & C^k_{(k-1)(k-2)} & C^k_{(k-1)(k-1)} & C^k_{(k-1)k}\\
0 & 0 & 0 & \cdots & 0 & C^k_{k(k-1)} & C^k_{kk}
\end{pmatrix}. \qquad (24)$$
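The layer structure (14)-(16) and the zero pattern of (17) can be checked with a short sketch. It assumes a hypothetical in-memory dictionary `links` as a stand-in for the site. `bfs_layers` returns the layers $V_0, V_1, \ldots$, and `is_block_lower_hessenberg` verifies that no page of layer i links to a layer j > i + 1. In other words, it checks that the blocks $A_{ij}$ with $j > i + 1$ vanish once the pages are ordered layer by layer.

```python
from collections import deque

def bfs_layers(links, start):
    """Split the pages reachable from start into BFS layers V_0, V_1, ..."""
    layer_of = {start: 0}
    layers = [[start]]
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in links.get(u, []):
            if v not in layer_of:             # newly detected vertex
                layer_of[v] = layer_of[u] + 1
                if layer_of[v] == len(layers):
                    layers.append([])         # first vertex of a new layer
                layers[layer_of[v]].append(v)
                queue.append(v)
    return layers, layer_of

def is_block_lower_hessenberg(links, layer_of):
    """True if no page of layer i links to a page of layer j > i + 1."""
    return all(layer_of[v] <= layer_of[u] + 1
               for u, targets in links.items() if u in layer_of
               for v in targets)

site = {"home": ["a", "b"], "a": ["c", "home"], "b": ["c", "d"],
        "c": ["home"], "d": []}
layers, layer_of = bfs_layers(site, "home")
print(layers, is_block_lower_hessenberg(site, layer_of))
```

For a true BFS layering, the check always succeeds. The second function is mainly useful as a guard when the ordering comes from some other source.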
Then the matrix $B_{k+1}$ can be represented in block form:

$$B_{k+1} = \begin{pmatrix} B_k & \Theta_{k+1}\\ \Omega_{k+1} & C^{k+1}_{(k+1)(k+1)} \end{pmatrix}, \qquad (25)$$

where the matrix $\Omega_{k+1}$ has dimensions $n_{k+1} \times N_k$ and the matrix $\Theta_{k+1}$ has dimensions $N_k \times n_{k+1}$:

$$\Omega_{k+1} = \begin{pmatrix} 0 & 0 & \cdots & 0 & C^{k+1}_{(k+1)k} \end{pmatrix}, \qquad (26)$$

$$\Theta_{k+1} = \begin{pmatrix} C^{k+1}_{1(k+1)}\\ C^{k+1}_{2(k+1)}\\ \vdots\\ C^{k+1}_{k(k+1)} \end{pmatrix}. \qquad (27)$$

In accordance with the Frobenius theorem [29],

$$B_{k+1}^{-1} = \begin{pmatrix} B_k^{-1} + B_k^{-1}\Theta_{k+1}H_{k+1}^{-1}\Omega_{k+1}B_k^{-1} & -B_k^{-1}\Theta_{k+1}H_{k+1}^{-1}\\ -H_{k+1}^{-1}\Omega_{k+1}B_k^{-1} & H_{k+1}^{-1} \end{pmatrix}, \qquad (28)$$

where

$$H_{k+1} = C^{k+1}_{(k+1)(k+1)} - \Omega_{k+1} B_k^{-1} \Theta_{k+1}.$$

Thus, expression (28) makes it possible to calculate the matrix $B_{k+1}^{-1}$ step by step and, using equation (21), to obtain the ranks of the graph vertices already visited by the BFS algorithm. At each (k+1)-th step, only one matrix has to be inverted: $H_{k+1}$, of size $n_{k+1} \times n_{k+1}$. All other operations are block multiplications and additions of matrices. The total number of multiplications and additions to be performed at the (k+1)-th step is given in Table 1.

Table 1
The number of multiplications and additions at the (k+1)-th step ($N_k = \sum_{i=1}^{k} n_i$)

Operation | Multiplications | Additions
$\Theta_{k+1}$ | $N_k\, n_{k+1}$ | -
$C^{k+1}_{(k+1)(k+1)}$ | $n_{k+1}^2$ | $n_{k+1}$
$\Omega_{k+1}$ | $n_k\, n_{k+1}$ | -
$\Omega_{k+1}B_k^{-1}$ | $n_{k+1}\, n_k\, N_k$ | $n_{k+1}(n_k - 1)\, N_k$
$\Omega_{k+1}B_k^{-1}\Theta_{k+1}$ | $N_k\, n_{k+1}^2$ | $(N_k - 1)\, n_{k+1}^2$
$H_{k+1}$ | - | $n_{k+1}^2$
$H_{k+1}^{-1}$ | $O(n_{k+1}^3)$ | $O(n_{k+1}^3)$
$B_k^{-1}\Theta_{k+1}$ | $N_k^2\, n_{k+1}$ | $N_k(N_k - 1)\, n_{k+1}$
$B_k^{-1}\Theta_{k+1}H_{k+1}^{-1}$ | $N_k\, n_{k+1}^2$ | $N_k\, n_{k+1}(n_{k+1} - 1)$
$-H_{k+1}^{-1}\Omega_{k+1}B_k^{-1}$ | $N_k\, n_{k+1}^2$ | $N_k\, n_{k+1}(n_{k+1} - 1)$
$B_k^{-1}\Theta_{k+1}H_{k+1}^{-1}\Omega_{k+1}B_k^{-1}$ | $N_k^2\, n_{k+1}$ | $N_k^2(n_{k+1} - 1)$
$B_k^{-1} + B_k^{-1}\Theta_{k+1}H_{k+1}^{-1}\Omega_{k+1}B_k^{-1}$ | - | $N_k^2$
$v_{k+1}$ | $N_{k+1}^2$ | $N_{k+1}(N_{k+1} - 1)$

We also assume that $k \gg 1$ and $N_{k+1} \gg n_{k+1}$. Then the total number of multiplications at the (k+1)-th step is of the order

$$U_{k+1} = O\left(N_k^2 \times (2 n_{k+1} + 1)\right). \qquad (29)$$

If, for simplicity, all $n_i$ are assumed equal to n, then the total number of multiplications in steps 1 to k for calculating $B_k^{-1}$ is of the order

$$U_\Sigma = O\left(\frac{n^3}{3} \times k \times (k+1) \times (2k+1)\right) = O\left(\frac{2}{3} N^3\right). \qquad (30)$$

The total number of additions is of the same order. For comparison, the total number of multiplications (and likewise additions) of fast iterative methods [12, 28, 30] is of the order

$$U_{FPR} = O\left(L_\delta \times N_{k+1}^2\right), \qquad (31)$$

where $L_\delta$ is the number of iterations needed to achieve

$$\lVert v_{j+1} - v_j \rVert_1 < \delta, \qquad (32)$$

and $v_j$ is the estimate of the vector v at the j-th iteration.

For typical values of δ and $N_{k+1}$ and α = 0.85, the required number of iterations for iterative methods [30] is $L_\delta = 50...100$.
However, this number increases significantly as α approaches 1 and as the required δ decreases [30].
The exact PageRank value for the page u is determined by equation (1). The parameter α is introduced in (5) to regularize the solution, and the closer α is to 1, the closer the resulting solution is to the ideal value.
In contrast, the volume of calculations of the proposed algebraic method does not depend on α, and the accuracy of the page ranks is limited only by the precision of the calculations.
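The layer-by-layer inversion (25)-(28) can be sketched with NumPy as follows. This is an illustration under simplifying assumptions. First, the whole matrix $B = I - \alpha(P^T + D^T/N)$ is built up front with a fixed N, so that $B_k$ is exactly the leading block of $B_{k+1}$. Second, the sparsity of $\Omega_{k+1}$ (only its last block is nonzero) is not exploited. The function name `layered_inverse` is hypothetical. The result is checked against a direct inverse for α = 0.99, a value at which iterative methods converge slowly.

```python
import numpy as np

def layered_inverse(B, sizes):
    """Invert B layer by layer with the block formulas (25)-(28).

    B: matrix whose rows/columns are ordered by BFS layers;
    sizes: layer sizes n_1, n_2, ... (they must add up to B.shape[0]).
    """
    n1 = sizes[0]
    B_inv = np.linalg.inv(B[:n1, :n1])    # B_1^{-1}: direct inverse of layer 1
    N_k = n1
    for n in sizes[1:]:
        Theta = B[:N_k, N_k:N_k + n]      # N_k x n_{k+1}
        Omega = B[N_k:N_k + n, :N_k]      # n_{k+1} x N_k
        C = B[N_k:N_k + n, N_k:N_k + n]   # C_{(k+1)(k+1)}
        X = B_inv @ Theta                 # B_k^{-1} Theta_{k+1}
        Y = Omega @ B_inv                 # Omega_{k+1} B_k^{-1}
        H_inv = np.linalg.inv(C - Omega @ X)  # the only new inversion: H_{k+1}
        B_inv = np.block([[B_inv + X @ H_inv @ Y, -X @ H_inv],
                          [-H_inv @ Y,            H_inv]])
        N_k += n
    return B_inv

# Hypothetical 6-page site already in BFS order: layers {0}, {1, 2}, {3, 4, 5}.
links = {0: [1, 2], 1: [3, 0], 2: [4, 5], 3: [0], 4: [1], 5: []}
N, alpha = 6, 0.99
P = np.zeros((N, N))
for u, targets in links.items():
    for t in targets:
        P[u, t] = 1.0 / len(targets)
D = np.zeros((N, N))
D[[u for u, ts in links.items() if not ts]] = 1.0  # hanging page 5
B = np.eye(N) - alpha * (P.T + D.T / N)
B_inv = layered_inverse(B, [1, 2, 3])
v = (1 - alpha) / N * B_inv @ np.ones(N)  # equation (21)
print(v)
```

The number of operations in the loop depends only on the layer sizes, not on α. This matches the claim above that α affects only the conditioning, not the workload.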
It should be noted that iterative methods begin their calculations only when the site graph is fully known. In the proposed algebraic method, by contrast, the computations estimated in (30) can be performed while the site graph is being traversed by the BFS algorithm. After the traversal, only $U_{k+1}$ operations remain (formula (29)). Comparing formulas (29) and (31), we find that for

$$L_\delta > 2 n_{k+1} + 1 \qquad (33)$$

the proposed algebraic method requires fewer computations at the last step, which makes it possible to obtain the required PageRank values faster.
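A back-of-the-envelope check of condition (33) can be written using the orders of magnitude (29) and (31) with constants dropped. The helper names and the sample values of $N_k$, $n_{k+1}$ and $L_\delta$ are hypothetical.

```python
def ops_after_traversal(N_k, n_next):
    """Order of the multiplications left after the traversal, formula (29)."""
    return N_k ** 2 * (2 * n_next + 1)

def ops_iterative(L_delta, N_next):
    """Order of the multiplications of a fast iterative method, formula (31)."""
    return L_delta * N_next ** 2

def algebraic_wins(L_delta, n_next):
    """Condition (33): L_delta > 2 n_{k+1} + 1."""
    return L_delta > 2 * n_next + 1

# Hypothetical sizes: 10**4 pages already visited, a last layer of 20 pages.
N_k, n_next, L_delta = 10_000, 20, 100
print(ops_after_traversal(N_k, n_next), ops_iterative(L_delta, N_k + n_next))
print(algebraic_wins(L_delta, n_next))
```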
6. Discussion of the developed algebraic method for calculating PageRank values
The results obtained in this study rest on the fact that, in contrast to known works, the ranks are calculated simultaneously with the breadth-first traversal of the graph. Taking into account the structural features of the transition matrix obtained at each step of the breadth-first traversal makes it possible to construct a step-by-step algorithm for calculating the ranks.
The advantage of the developed method is that its volume of calculations does not depend on the damping coefficient. The proposed step-by-step calculation of rank estimates is designed for multicore and multiprocessor architectures and makes more efficient use of the computing hardware.
The applicability of the developed method is limited to Internet sites or segments with no more than $10^4$ or $10^5$ pages. Extending this method to segments of the Internet with a larger number of pages requires further investigation of the properties of the transition matrix in particular cases.
The estimates of the volume of computations given in formulas (29) through (31) are upper bounds, since they assume that all matrices involved in the computations are completely filled. On real websites, the number of links on a page usually does not exceed 5-10. Therefore, most of the elements of the matrices $C^{k}_{ij}$, $C^{k+1}_{(k+1)(k+1)}$ and $\Omega_{k+1}$ are zero, which drastically reduces the number of floating-point operations that need to be performed.
It is also noteworthy that, at each step, the proposed algebraic method involves only additions and multiplications of small matrix blocks and a single inversion of a matrix that is also small. This makes it easy to split the entire volume of computations into parallel computational threads on multicore or multiprocessor computers.
7. Conclusions
1. The analysis of the depth-first traversal of site graphs did not reveal any exploitable structure in the resulting transition matrix. In contrast, the breadth-first traversal arranges the pages of the site into layers, which brings the transition matrix to block Hessenberg form. The large number of zero blocks in such a matrix was taken into account when developing the algorithm for calculating the ranks.
2. The algorithm for calculating the PageRank values is based on a step-by-step computation of the inverse of the modified transition matrix using the Frobenius theorem on the inversion of block matrices. At each step of the breadth-first traversal of the site graph, the next layer of graph vertices is obtained and, accordingly, one step of the algorithm for computing the inverse matrix is performed.
3. The comparison of the computational volume for the new algebraic method and iterative methods has shown an advantage of the proposed method in situations where only one site or a relatively small segment of the Internet is analysed, and it is required to use a damping factor close to 1.
References
1. Understanding the Detection of View Fraud in Video Content Portals / Marciel M., Cuevas R., Banchs A., González R., Traverso S., Ahmed M., Azcorra A. // Proceedings of the 25th International Conference on World Wide Web - WWW '16. 2016. doi: 10.1145/2872427.2882980
2. Alternative Ad Networks to Open Up New Channels of Growth in 2018. URL: https://www.singlegrain.com/blog-posts/pay-per-click/44-ad-networks-will-help-open-new-channels-growth/
3. Agency Trading Desks. URL: https://adexchanger.com/Agency_Trading_White_Paper.pdf
4. What's the Difference Between Using a DSP and an Agency Trading Desk? URL: http://weevermedia.com/uncategorised/whats-difference-using-dsp-agency-trading-desk/
5. Thompson J. What Is a DSP, SSP and Ad exchange, and how do they fit together? URL: https://www.bannerconnect.net/what-is-a-dsp-ssp-and-ad-exchange/
6. Scott S. The $8.2 Billion Adtech Fraud Problem That Everyone Is Ignoring // TechCrunch. URL: https://techcrunch.com/2016/01/06/the-8-2-billion-adtech-fraud-problem-that-everyone-is-ignoring/
7. Reichow J. TUNE Fraud Series: Types of advertising fraud // TUNE. 2017. URL: https://www.tune.com/blog/types-of-advertising-fraud/
8. Ricci M. Why it's Time to Take a Stand Against Ad Fraud // MartechAdvisor. 2017. URL: https://www.martechadvisor.com/articles/ads/why-its-time-to-take-a-stand-against-ad-fraud/
9. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Cambridge University Press, 2014. doi: 10.1017/cbo9781139924801
10. Introduction to Algorithms / Cormen T. H., Leiserson C. E., Rivest R. L., Stein C. 3rd ed. MIT press and McGraw-Hill, 2009. 1313 p.
11. Brin S., Page L. The anatomy of a large-scale hypertextual Web search engine // Computer Networks and ISDN Systems. 1998. Vol. 30, Issue 1-7. P. 107-117. doi: 10.1016/s0169-7552(98)00110-x
12. Polyak B. T., Tremba A. A. Regularization-based solution of the PageRank problem for large matrices // Automation and Remote Control. 2012. Vol. 73, Issue 11. P. 1877-1894. doi: 10.1134/s0005117912110094
13. Del Corso G. M., Gullí A., Romani F. Fast PageRank Computation via a Sparse Linear System // Internet Mathematics. 2005. Vol. 2, Issue 3. P. 251-273. doi: 10.1080/15427951.2005.10129108
14. PageRank // Wikipedia. URL: https://en.wikipedia.org/wiki/PageRank
15. Eigensystems for Large Matrices / Harris L., Papanikolaou I., Shi J., Strott D., Tan Y., Zhong C., Changhui T. // AMSC460. 2015.
16. Sargolzaei P., Soleymani F. PageRank Problem, Survey And Future Research Directions // International Mathematical Forum. 2010. Vol. 5, Issue 19. P. 937-956.
17. Wu G., Wei Y. A Power-Arnoldi algorithm for computing PageRank // Numerical Linear Algebra with Applications. 2007. Vol. 14, Issue 7. P. 521-546. doi: 10.1002/nla.531
18. Kamvar S. D., Haveliwala T. H., Golub G. H. Extrapolation methods for accelerating PageRank computations // Technical Report SCCM-03-02. Stanford University, 2003.
19. Wu G., Wei Y. An Arnoldi-Extrapolation algorithm for computing PageRank // Journal of Computational and Applied Mathematics. 2010. Vol. 234, Issue 11. P. 3196-3212. doi: 10.1016/j.cam.2010.02.009
20. Kamvar S., Haveliwala T., Golub G. Adaptive methods for the computation of PageRank // Linear Algebra and its Applications. 2004. Vol. 386. P. 51-65. doi: 10.1016/j.laa.2003.12.008
21. Yin J.-F., Yin G.-J., Ng M. On adaptively accelerated Arnoldi method for computing PageRank // Numerical Linear Algebra with Applications. 2011. Vol. 19, Issue 1. P. 73-85. doi: 10.1002/nla.789
22. Kim M.-S. Towards Exploiting GPUs for Fast PageRank Computation of Large-Scale Networks // Proceeding of the third International Conference on Emerging Databases. 2012.
23. Rungsawang A., Manaskasemsak B. Fast PageRank Computation on a GPU Cluster // 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing. 2012. doi: 10.1109/pdp.2012.78
24. Liu C., Li Y. A Parallel PageRank Algorithm with Power Iteration Acceleration // International Journal of Grid and Distributed Computing. 2015. Vol. 8, Issue 2. P. 273-284. doi: 10.14257/ijgdc.2015.8.2.24
25. Parallel PageRank Algorithms: A Survey / Srivastava A. K., Srivastava M., Garg R., Mishra P. K. // International Journal on Recent and Innovation Trends in Computing and Communication. 2017. Vol. 5, Issue 5. P. 470-473.
26. Parallel and Improved PageRank Algorithm for GPU-CPU Collaborative Environment / Choudhari P., Baikampadi E., Patil P., Gadekar S. // International Journal of Computer Science and Information Technologies. 2015. Vol. 6, Issue 3. P. 2003-2005.
27. Yang P., Zhou L. Research on PageRank Algorithm parallel computing Based on Hadoop // Proceedings of the 2016 4th International Conference on Mechanical Materials and Manufacturing Engineering. 2016. doi: 10.2991/mmme-16.2016.40
28. Gleich D., Zhukov L., Berkhin P. Fast Parallel PageRank: A Linear System Approach. URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.592.64&rep=rep1&type=pdf
29. Gantmakher F. Theory of matrices. Moscow: Nauka, 2004. 581 p.
30. Wright G. Probability, linear algebra, and numerical analysis: the mathematics behind Google's PageRank. URL: http://math.boisestate.edu/~wright/courses/m297/google_talk.pdf