Parallel algorithm for computing characteristic polynomials of matrices based on the method of homomorphic images

Scientific article in the specialty "Computer and Information Sciences"

CC BY
Keywords
computing characteristic polynomial of matrices / parallel algorithm / method of homomorphic images / cluster

Abstract of the scientific article in computer and information sciences. Author: Oksana Nikolaevna Pereslavtseva

Parallel algorithms are proposed for computing the characteristic polynomials of integer and polynomial matrices. The algorithms are based on the method of homomorphic images, applied both to the ring of integers and to the ring of polynomials in several variables. To apply the method of homomorphic images, an upper bound for the numerical coefficients of the characteristic polynomial is found. Results of experiments with the parallel algorithms, carried out on the MVS-100K cluster at the Joint Supercomputer Center of the Russian Academy of Sciences, are discussed.


PARALLEL ALGORITHMS FOR COMPUTING THE CHARACTERISTIC POLYNOMIALS BASED ON THE METHOD OF HOMOMORPHIC IMAGES

Parallel algorithms are presented for computing the characteristic polynomials of dense integer and polynomial matrices. The algorithms are based on the method of homomorphic images in the ring of integers and in the ring of polynomials. We have obtained an upper bound for the numerical coefficients of a characteristic polynomial. Results of experiments with the parallel algorithms for integer and polynomial matrices are stated and discussed. The experiments were conducted on the cluster MVS-100K of the Joint Supercomputer Center of the RAS. Supported by the Sci. Program Devel. Sci. Potent. High. School, RNP 2.1.1.1853.


UDK 519.688


© Oksana Nikolayevna Pereslavtseva

Tambov State University named after G.R. Derzhavin, Internatsionalnaya, 33, Tambov, 392000, Russia, programmer of Algebraic Computing Department, e-mail: [email protected]

Key words: computing characteristic polynomial of matrices; parallel algorithm; method of homomorphic images; cluster.


1 Introduction

Computation of the characteristic polynomials of dense matrices is a classical problem of computer algebra. We give an overview of the basic results.

In 1840 Le Verrier suggested one of the first methods for computing the characteristic polynomial of a matrix over a ring [1]. In 1943 D.K. Faddeev offered a modification of Le Verrier's method [2]; this method can also compute the adjoint matrix. Le Verrier's algorithm (with Winograd's improvement [3] (p. 656)) requires $\sim 4n^{3.5}$ ring operations, and Faddeev's algorithm requires $\sim 2n^4$ ring operations to compute the characteristic polynomial of an $n \times n$ matrix. Both algorithms are built on the computation of matrix powers, so parallel matrix multiplication yields parallel algorithms for computing characteristic polynomials. Until now Le Verrier's and Faddeev's algorithms have remained the best parallel algorithms, although sequential algorithms for characteristic polynomials have been improved considerably.

Seifullin's algorithm (2002) [4] uses fewer ring operations ($\sim \frac{1}{2}n^4$), but it cannot be parallelized because it is strictly sequential and not recursive. For the same reason Malaschonok's algorithm (1999) [5] with complexity $\sim \frac{8}{3}n^3$ and its modification (2008) [6] with complexity $\sim \frac{7}{3}n^3$ cannot be written in parallel form either. These two algorithms have the smallest number of ring operations.

Danilewsky's algorithm (1937) [7], Keller-Gehrig's algorithm (1985) [8] and Pernet-Storjohann's algorithm (2007) [9] are the best for computing characteristic polynomials over a finite field. They require $\sim 2n^3$, $O(n^{\omega}\log_2 n)$ and $O(n^{\omega})$ operations over a finite field, respectively, where $O(n^{\omega})$ is the complexity of matrix multiplication. Combined with the Chinese remainder theorem (CRT), these algorithms are asymptotically the best for computing characteristic polynomials over the ring of integers and over the ring of polynomials with integer coefficients.

This work is devoted to the development of parallel methods for computing characteristic polynomials of dense matrices. The parallel algorithms considered are based on the method of homomorphic images. The reason is that modular arithmetic is naturally parallel: the characteristic polynomial for each modulus is computed independently of the others. Section 2 describes in detail the application of the method of homomorphic images to the ring of polynomials in several variables and to the ring of integers. In Section 3 an upper bound for the numerical coefficients of a characteristic polynomial is obtained; this bound is necessary for applying the method of homomorphic images.

Algorithms for computing a characteristic polynomial over a finite field show different running times for different matrix sizes. Therefore it is necessary to implement various algorithms for computing characteristic polynomials, compare them experimentally and identify the most effective ones. Some of the methods have been implemented and experiments have been carried out. Section 4 describes the parallel algorithm for computing characteristic polynomials of integer and polynomial matrices. Section 5 discusses the results of experiments with the parallel algorithms.

2 Application of the method of homomorphic images for characteristic polynomials computation

The method of homomorphic images is described in [10]. We apply it to the computation of characteristic polynomials of polynomial matrices in several variables.

The general scheme of the method of homomorphic images applied to the ring of polynomials in several variables $\mathbb{Z}[x_1,\dots,x_t]$ is the following.

Let $A = (a_{\mu\nu}(x_1,\dots,x_t))$, $1 \le \mu \le n$, $1 \le \nu \le n$, be a polynomial matrix, $A \in \mathbb{Z}^{n\times n}[x_1,\dots,x_t]$, and let $f = (-1)^n\left(y^n + \sum_{i=1}^{n} f_i(x_1,\dots,x_t)\,y^{n-i}\right)$ be its characteristic polynomial, $f \in \mathbb{Z}[x_1,\dots,x_t,y]$.

Let $m_s$ be the highest degree of the variable $x_s$, $1 \le s \le t$, in the polynomials $f_i$, $1 \le i \le n$, and let $P$ be the greatest absolute value of their numerical coefficients.

0) Choose $h$ prime numbers $p_1,\dots,p_h$ such that the inequality $\log_2 P < \log_2(p_1 \cdots p_h)$ holds. Then pass to the homomorphic images of the elements $a_{\mu\nu}$ under the mappings

$\mathbb{Z}[x_1,\dots,x_t] \to \mathbb{Z}[x_1,\dots,x_t]/p_i\mathbb{Z}[x_1,\dots,x_t]$.

Denote

$\mathbb{Z}[x_1,\dots,x_t]/p_i\mathbb{Z}[x_1,\dots,x_t] = \mathbb{Z}_{p_i}[x_1,\dots,x_t]$.

1) Choose $m_t$ polynomials $x_t, x_t - 1, \dots, x_t - (m_t - 1)$. Then pass to the homomorphic images of the elements $a_{\mu\nu}$ under the mappings

$\mathbb{Z}_{p_i}[x_1,\dots,x_t] \to \mathbb{Z}_{p_i}[x_1,\dots,x_t]/(x_t - j)\mathbb{Z}_{p_i}[x_1,\dots,x_t]$, $0 \le j \le m_t - 1$.

The following isomorphism holds:

$\mathbb{Z}_{p_i}[x_1,\dots,x_t]/(x_t - j)\mathbb{Z}_{p_i}[x_1,\dots,x_t] \cong \mathbb{Z}_{p_i}[x_1,\dots,x_{t-1}]$.

2) Choose $m_{t-1}$ polynomials $x_{t-1}, x_{t-1} - 1, \dots, x_{t-1} - (m_{t-1} - 1)$. Then pass to the homomorphic images of the elements $a_{\mu\nu}$ under the mappings

$\mathbb{Z}_{p_i}[x_1,\dots,x_{t-1}] \to \mathbb{Z}_{p_i}[x_1,\dots,x_{t-1}]/(x_{t-1} - j) \cong \mathbb{Z}_{p_i}[x_1,\dots,x_{t-2}]$, $0 \le j \le m_{t-1} - 1$.

Continue to construct the homomorphic images of the elements $a_{\mu\nu}$ in the same way for each variable $x_s$, $s = t-2, \dots, 1$. As a result we pass to homomorphic images in $\mathbb{Z}_{p_i}$ and obtain $h\,m_1 m_2 \cdots m_t$ matrices

$M_{i j_1 \cdots j_t} \in \mathbb{Z}_{p_i}^{n\times n}$, $1 \le i \le h$, $1 \le j_1 \le m_1$, ..., $1 \le j_t \le m_t$.

Compute the characteristic polynomials of these matrices by means of some algorithm over a finite field. We obtain $h\,m_1 m_2 \cdots m_t$ polynomials $f_{i j_1 \cdots j_t}(y)$, $1 \le i \le h$, $1 \le j_1 \le m_1$, ..., $1 \le j_t \le m_t$.

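The required characteristic polynomial is then restored by means of the Chinese remainder theorem, going back through the mappings in reverse order. Starting with the $m_t$ images

$\{f_{i j_1 \cdots j_{t-1} 1}(y),\ f_{i j_1 \cdots j_{t-1} 2}(y),\ \dots,\ f_{i j_1 \cdots j_{t-1} m_t}(y)\}$

of the polynomial $f_{i j_1 \cdots j_{t-1}}(x_t, y)$, we restore this polynomial by means of the Chinese remainder theorem.

Likewise, starting with the $m_{t-1}$ images

$\{f_{i j_1 \cdots j_{t-2} 1}(x_t, y),\ f_{i j_1 \cdots j_{t-2} 2}(x_t, y),\ \dots,\ f_{i j_1 \cdots j_{t-2} m_{t-1}}(x_t, y)\}$

of the polynomial $f_{i j_1 \cdots j_{t-2}}(x_{t-1}, x_t, y)$, we restore this polynomial, and so on.

Having restored all variables, we obtain $h$ polynomials

$\{F_1(x_1,\dots,x_t,y),\ \dots,\ F_h(x_1,\dots,x_t,y)\}$ in the factor rings $\mathbb{Z}_{p_1},\dots,\mathbb{Z}_{p_h}$, respectively. From these $h$ images the characteristic polynomial of $A$ over $\mathbb{Z}[x_1,\dots,x_t]$ is restored by the Chinese remainder theorem.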
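For an integer matrix the scheme reduces to step 0 followed by Chinese remaindering, which can be sketched in Python as follows. This is a minimal illustration rather than the paper's implementation: all function names are hypothetical, the finite-field step uses the Le Verrier-Faddeev recurrence as a stand-in for Danilewsky's or any other finite-field algorithm (so every prime must exceed n), and the primes are assumed to be supplied with a product larger than twice the largest coefficient so that the symmetric lift can recover negative coefficients.

```python
from math import prod

def charpoly_mod_p(matrix, p):
    """Coefficients [1, c_1, ..., c_n] of det(yE - A) over Z_p (requires p > n)."""
    n = len(matrix)
    a = [[x % p for x in row] for row in matrix]
    b = [[int(r == c) for c in range(n)] for r in range(n)]  # B_0 = E
    coeffs = [1]
    for i in range(1, n + 1):
        # A_i = A * B_{i-1} (mod p)
        ai = [[sum(a[r][k] * b[k][c] for k in range(n)) % p
               for c in range(n)] for r in range(n)]
        # f_i = Tr(A_i) / i (mod p); the inverse of i exists because p > n >= i
        f = sum(ai[r][r] for r in range(n)) * pow(i, -1, p) % p
        coeffs.append(-f % p)
        # B_i = A_i - f_i * E (mod p)
        b = [[(ai[r][c] - (f if r == c else 0)) % p for c in range(n)]
             for r in range(n)]
    return coeffs

def crt_pair(r1, m1, r2, m2):
    """Solve x = r1 (mod m1), x = r2 (mod m2) for coprime m1, m2."""
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return r1 + m1 * t

def charpoly_int(matrix, primes):
    """Characteristic polynomial over Z from its images modulo the given primes."""
    # Each image is independent of the others, so this loop is what gets distributed.
    images = [charpoly_mod_p(matrix, p) for p in primes]
    modulus = prod(primes)
    result = []
    for k in range(len(matrix) + 1):
        r, m = 0, 1
        for p, image in zip(primes, images):
            r, m = crt_pair(r, m, image[k], p), m * p
        # Symmetric lift: move the residue into (-modulus/2, modulus/2].
        result.append(r - modulus if r > modulus // 2 else r)
    return result

print(charpoly_int([[1, 2], [3, 4]], [5, 7, 11]))  # coefficients of y^2 - 5y - 2
```

The modular inverse via `pow(x, -1, p)` needs Python 3.8 or later.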

3 Upper bound of coefficients of characteristic polynomials over rings Z and Z[x1,..., xt]

To implement the modular algorithm for computing the characteristic polynomial of a matrix, it is necessary to know the numerical moduli $p_1,\dots,p_h$ and, for a polynomial matrix, the polynomial moduli $x_1, x_1 - 1, \dots, x_1 - (m_1 - 1)$; ...; $x_t, x_t - 1, \dots, x_t - (m_t - 1)$.

The best upper bound known today for the coefficients of the characteristic polynomial of an integer matrix is obtained in [11]. According to it, the number of bits in the coefficients of the characteristic polynomial does not exceed

$\frac{n}{2}\left(\log_2 n + 2\log_2 a + 0.22\right)$,

where $n$ is the order of the matrix and $a$ is the greatest absolute value of the matrix elements.

An array of prime numbers is assumed to be given. Taking primes from it and accumulating their product, it is easy to choose a sufficient number $h$ of moduli. The inequality

$\frac{n}{2}\left(\log_2 n + 2\log_2 a + 0.22\right) \le \log_2(p_1 p_2 \cdots p_h)$  (1)

must hold.
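One possible way to select the moduli from the bit bound of [11] is sketched below. The function names, the trial-division primality test and the starting value $2^{30}$ are assumptions made for this illustration; the paper only states that an array of primes is given. The sketch enforces inequality (1) as written, and a symmetric lift of signed coefficients would need one further bit, which is not added here.

```python
from math import log2

def coefficient_bits(n, a):
    """Bit bound of [11] for an n x n integer matrix with entries |a_ij| <= a."""
    return n / 2 * (log2(n) + 2 * log2(a) + 0.22)

def is_prime(q):
    """Trial division; adequate for the small number of candidates tested here."""
    if q < 2:
        return False
    i = 2
    while i * i <= q:
        if q % i == 0:
            return False
        i += 1
    return True

def choose_primes(n, a, start=2**30):
    """Collect primes below `start` until sum(log2 p) reaches the bound in (1)."""
    bound = coefficient_bits(n, a)
    primes, bits, q = [], 0.0, start
    while bits < bound:
        q -= 1
        if is_prime(q):
            primes.append(q)
            bits += log2(q)
    return primes

# A 20 x 20 matrix with 7-bit entries needs about 185 bits of modulus.
print(len(choose_primes(20, 128)))
```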

Let us find an upper bound for the coefficients of the characteristic polynomial of a polynomial matrix in one variable [12].

Let $F(x,y) = \sum_{i,j} g_{ij}\,x^i y^j$. The numerical moduli $p_1,\dots,p_h$ should be selected so that $p_1 \cdots p_h > \max |g_{ij}|$.

Let $\|f\|$ be the norm of a polynomial $f$, that is, the greatest absolute value of the numerical coefficients of $f$.

Let $A = (a_{ij}(x))$, $1 \le i \le n$, $1 \le j \le n$, $d = \max\{\deg a_{ij}(x)\}$, $\|A\| = \max\{\|a_{ij}\|\} = a$ for $1 \le i \le n$, $1 \le j \le n$, and $s(A) = \max\{s(a_{ij}(x))\} = t$, where $s(f)$ counts the terms of the polynomial $f$ (compare Remark 1).

Theorem 1. Let

$F(x,y) = y^n + f_1(x)y^{n-1} + \cdots + f_n(x)$

be the characteristic polynomial of the matrix $A(x)$ and $m = \max\{\deg f_1(x),\dots,\deg f_n(x)\} + 1$.

Then $m \le nd + 1$, and the greatest absolute value of the numerical coefficients of the polynomial $F(x,y)$ satisfies the inequality

$\log_2 \|F(x,y)\| \le n(\log_2 n + \log_2 a + \log_2 t) - \log_2 t$.  (2)

Proof. Up to sign, the polynomial $f_i(x)$ is the sum of all $i \times i$ principal minors of $A(x)$, so $\deg f_i \le id \le nd$. Therefore $m \le nd + 1$.

To find an upper bound for $\|F(x,y)\|$ we use the Le Verrier-Faddeev algorithm [2]:

$B_0 = E$; for $i = 1,\dots,n$: $\{A_i = AB_{i-1};\ f_i = (1/i)\operatorname{Tr}A_i;\ B_i = A_i - f_iE\}$.

For the matrix $B_i$ we consider two norms, $\|B_i\|_d$ and $\|B_i\|_n$. Here $\|B_i\|_d$ is the greatest absolute value of the numerical coefficients of the elements of $B_i$ on the main diagonal, and $\|B_i\|_n$ is the greatest absolute value of the numerical coefficients of the elements of $B_i$ off the main diagonal.

For $i = 1$: $\|A_1\| \le a$, $\|f_1\| \le na$, $\|B_1\|_n \le a$, $\|B_1\|_d \le (n+1)a$.

For $i > 1$: $\min\{s(B_{i-1}), t\} = t$. Then $\|f_i\| \le (n/i)\|A_i\|$,

$\|A_i\| \le (n-1)\,ta\,\|B_{i-1}\|_n + ta\,\|B_{i-1}\|_d$,

$\|B_{i-1}\|_n \le \|A_{i-1}\|$, $\|B_{i-1}\|_d \le \|A_{i-1}\| + \|f_{i-1}\|$.

Hence $\|A_i\| \le ta\,(n\|A_{i-1}\| + \|f_{i-1}\|) \le tan\,\frac{i}{i-1}\,\|A_{i-1}\|$, which gives $\|A_i\| \le n^{i-1}t^{i-1}a^i\,i$ and $\|f_i\| \le n^i t^{i-1} a^i$ for $i > 1$.

The bound on $\|f_i\|$ is greatest for $i = n$: $\|f_n\| \le n^n t^{n-1} a^n$. Taking the base-2 logarithm of the last inequality, we obtain the upper bound (2) for the numerical coefficients of characteristic polynomials of polynomial matrices in one variable.
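As a quick check of the recurrence used in the proof, the $2 \times 2$ case can be worked out by hand. The entries $a, b, c, d$ below are generic symbols introduced only for this illustration (this $a$ is not the norm bound $a$ of Theorem 1). The recurrence produces the coefficients in the convention $\det(yE - A) = y^n - f_1 y^{n-1} - \cdots - f_n$, which differs from the $f_i$ of Theorem 1 only in sign and therefore does not affect the norms.

```latex
\begin{aligned}
A &= \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad B_0 = E,\\
A_1 &= AB_0 = A, \qquad f_1 = \operatorname{Tr}A_1 = a + d,\\
B_1 &= A_1 - f_1E = \begin{pmatrix} -d & b \\ c & -a \end{pmatrix},\\
A_2 &= AB_1 = \begin{pmatrix} bc - ad & 0 \\ 0 & bc - ad \end{pmatrix},
\qquad f_2 = \tfrac{1}{2}\operatorname{Tr}A_2 = bc - ad,\\
B_2 &= A_2 - f_2E = 0,\\
\det(yE - A) &= y^2 - f_1 y - f_2 = y^2 - (a + d)\,y + (ad - bc).
\end{aligned}
```

The last line agrees with the familiar trace and determinant form of the characteristic polynomial of a $2 \times 2$ matrix, and $B_2 = 0$ is the Cayley-Hamilton identity for this case.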

Remark 1. If a polynomial $f$ in one variable is dense, then $s(f) = d + 1$.

Remark 2. Formula (2) also holds for a dense polynomial matrix in several variables. The number of polynomial moduli is calculated for each variable $x_1,\dots,x_t$: $m_i = nd_i + 1$, where $d_i$ is the highest degree of the variable $x_i$, $i = 1,\dots,t$.
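The two quantities needed before the computation starts, the bit bound (2) and the number of polynomial moduli per variable, are simple to evaluate. The helper names below are hypothetical, and the sample call uses the parameters of a 50 x 50 matrix in two variables of degree 2 only as an illustration.

```python
from math import log2

def polynomial_moduli_counts(n, degrees):
    """m_i = n * d_i + 1 for each variable, as in Remark 2."""
    return [n * d + 1 for d in degrees]

def coefficient_bits_poly(n, a, t):
    """Right-hand side of (2): n (log2 n + log2 a + log2 t) - log2 t."""
    return n * (log2(n) + log2(a) + log2(t)) - log2(t)

# Number of evaluation points per variable for n = 50 and degrees (2, 2).
print(polynomial_moduli_counts(50, [2, 2]))
# For n = 2, a = 4, t = 2 the bound equals log2(n^n t^(n-1) a^n) = log2(128).
print(coefficient_bits_poly(2, 4, 2))
```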

4 Parallel algorithms for computing the characteristic polynomials which are based on the method of homomorphic images

4.1 The scheme of data communication

The matrix $A \in \mathbb{Z}^{n\times n}[x_1,\dots,x_t]$ and the number boundlev are the input data of the parallel algorithm. The parameter boundlev is the number of levels of the algorithm tree. It depends on the task (the order and the coefficients of the matrix) and on the computing cluster. The set $\{p_1,\dots,p_h\}$ of numerical moduli and the number of polynomial moduli for each variable $x_1,\dots,x_t$ are calculated.

The graph of the algorithm is a binary tree, shown in Figure 1. Horizontal lines divide the graph into levels. At the first level there is only the root, at the second level its two daughter nodes, at the third level their daughter nodes, and so on.

Fig. 1. The graph of the algorithm

At the input the root receives the matrix $A$ and the array

intervals = $\{[1,m_1],\dots,[1,m_t],[1,h]\}$.

In the array intervals the first pair of numbers $[1,m_1]$ corresponds to the polynomial moduli $x_1, x_1 - 1, x_1 - 2, \dots, x_1 - (m_1 - 1)$, the second pair $[1,m_2]$ corresponds to the polynomial moduli $x_2, x_2 - 1, x_2 - 2, \dots, x_2 - (m_2 - 1)$, and so on; the last pair $[1,h]$ corresponds to the numerical moduli $p_1, p_2, \dots, p_h$. As a result of the calculations, the root obtains the characteristic polynomial of the matrix $A$.

The root sends to each daughter node the matrix $A$ and the array intervals = $\{[1,m_1],\dots,[1,m_t],[1,h_1]\}$ (for the left node) or intervals = $\{[1,m_1],\dots,[1,m_t],[h_1+1,h]\}$ (for the right node), where $h_1 = \lfloor(1+h)/2\rfloor$. Thus each daughter node has half of the moduli and computes the characteristic polynomial of $A$ in the factor ring modulo the product of all moduli received from the root. For the left node the modulus is $p_1 p_2 \cdots p_{h_1}$, for the right node it is $p_{h_1+1} p_{h_1+2} \cdots p_h$.

Each node at level 2 likewise divides its array of moduli in half and sends the halves to its daughter nodes at level 3. This process continues while there are free processors and more than one modulus is available on each processor.
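The halving of the index interval of numerical moduli can be sketched as a short recursion. The function name and the convention that `levels` counts the remaining splits are assumptions of this sketch; the leaves it returns are the index ranges that end up on individual processors.

```python
def split_moduli(lo, hi, levels):
    """Recursively halve the 1-based inclusive index interval [lo, hi].

    Splitting stops when no levels remain or a single modulus is left;
    the result lists the leaf intervals from left to right.
    """
    if levels == 0 or lo == hi:
        return [(lo, hi)]
    mid = (lo + hi) // 2  # h_1 = floor((lo + hi) / 2)
    return (split_moduli(lo, mid, levels - 1)
            + split_moduli(mid + 1, hi, levels - 1))

# Ten numerical moduli distributed over four processors (two levels of splitting).
print(split_moduli(1, 10, 2))  # [(1, 3), (4, 5), (6, 8), (9, 10)]
```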

The graph of the algorithm has two types of nodes. Nodes of the 1st type correspond to numerical moduli and are drawn as squares in the figures. Nodes of the 2nd type correspond to polynomial moduli and are drawn as circles.

Node of 1st type.

The node of the 1st type with its daughter nodes is shown in Figure 2.

Fig. 2. Node of 1st type

At the input the node of the 1st type receives the matrix $A$ and the array intervals = $\{[1,m_1],\dots,[1,m_t],[i_1,i_2]\}$. The numbers $i_1$ and $i_2$ give the first and the last prime numbers from the list $\{p_1,\dots,p_h\}$.

The node of the 1st type builds the polynomial $F_{i_1 i_2}(x_1,\dots,x_t,y)$ from the remainders received from its daughter nodes. As a result it returns the polynomial $F_{i_1 i_2}$, which is the characteristic polynomial modulo $p_{i_1} \cdots p_{i_2}$.

The node of the 1st type sends numerical moduli to its two daughter nodes. The left node receives at the input the matrix $A$ and the array intervals1 = $\{[1,m_1],\dots,[1,m_t],[i_1,i_s]\}$, the right node receives the matrix $A$ and the array intervals2 = $\{[1,m_1],\dots,[1,m_t],[i_s+1,i_2]\}$, where $i_s = \lfloor(i_1+i_2)/2\rfloor$. If a daughter node has received only one numerical modulus, it is a node of the 2nd type; otherwise it is a node of the 1st type.

The node of the 1st type receives two polynomials $F_{i_1,i_s}(x_1,\dots,x_t,y) \in \mathbb{Z}_{d_1}[x_1,\dots,x_t,y]$ and $F_{i_s+1,i_2}(x_1,\dots,x_t,y) \in \mathbb{Z}_{d_2}[x_1,\dots,x_t,y]$ from its daughter nodes, where $d_1 = p_{i_1} \cdots p_{i_s}$ and $d_2 = p_{i_s+1} \cdots p_{i_2}$. After receiving these polynomials the parent node computes the polynomial $F_{i_1 i_2}(x_1,\dots,x_t,y) \in \mathbb{Z}_{p_{i_1} \cdots p_{i_2}}[x_1,\dots,x_t,y]$.
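The combination step in a node of the 1st type amounts to applying the Chinese remainder theorem to each coefficient. A minimal sketch follows, assuming a sparse dictionary representation (exponent tuple mapped to coefficient) that is chosen only for this illustration; the function name is hypothetical.

```python
def crt_combine(f1, d1, f2, d2):
    """Combine f1 (coefficients mod d1) and f2 (mod d2) into coefficients mod d1*d2.

    Polynomials are dicts {exponent_tuple: coefficient}; d1 and d2 must be coprime.
    """
    inv = pow(d1, -1, d2)  # inverse of d1 modulo d2
    out = {}
    for mono in set(f1) | set(f2):
        r1, r2 = f1.get(mono, 0), f2.get(mono, 0)
        out[mono] = (r1 + d1 * ((r2 - r1) * inv % d2)) % (d1 * d2)
    return out

# Images of x*y^1 * 17 + y^2 - 3 modulo d1 = 5*7 and d2 = 11.
true_poly = {(0, 2): 1, (1, 1): 17, (0, 0): -3}
left = {m: c % 35 for m, c in true_poly.items()}
right = {m: c % 11 for m, c in true_poly.items()}
print(crt_combine(left, 35, right, 11))
```

The result holds residues in the range 0 to $d_1 d_2 - 1$; signed coefficients are obtained only at the root, once the full product of moduli is available.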

Node of 2nd type.

The node of the 2nd type with its daughter nodes is shown in Figure 3.

Fig. 3. Node of 2nd type

At the input the node of the 2nd type receives the matrix $A$, the array

intervals = $\{[1,m_1],\dots,[b_s,e_s],[j_{s+1},j_{s+1}],\dots,[i,i]\}$

and the number $s$. Here $s$ denotes the number of the active variable, $i$ gives the number of a prime from the set of moduli $\{p_1,\dots,p_h\}$, and the interval $[b_s,e_s]$ gives the polynomial moduli of the active variable. If $v < s$ then $[b_v,e_v] = [1,m_v]$. If $v > s$ then $[b_v,e_v] = [j_v,j_v]$, i.e. the interval $[b_v,e_v]$ contains one modulus $x_v - (j_v + 1)$.

The node of the 2nd type builds the polynomial $F_{b_s e_s} \in \mathbb{Z}_{p_i}[x_1,\dots,x_s,y]$ from the polynomial remainders received from its daughter nodes.

The daughter nodes of a node of the 2nd type are also nodes of the 2nd type. The left node receives at the input the matrix $A$, the array intervals1 = $\{[1,m_1],\dots,[b_s,h_s],[j_{s+1},j_{s+1}],\dots,[i,i]\}$ and the number $r_1$ of the active variable, where $h_s = \lfloor(b_s+e_s)/2\rfloor$. If $b_s < h_s$ then $r_1 = s$, otherwise $r_1 = s - 1$. The right node receives the matrix $A$, the array intervals2 = $\{[1,m_1],\dots,[h_s+1,e_s],[j_{s+1},j_{s+1}],\dots,[i,i]\}$ and the number $r_2$. If $h_s + 1 < e_s$ then $r_2 = s$, otherwise $r_2 = s - 1$.

The node of the 2nd type receives two polynomials

$f_1 \in \mathbb{Z}_{p_i}[x_1,\dots,x_{r_1},y]$ and $f_2 \in \mathbb{Z}_{p_i}[x_1,\dots,x_{r_2},y]$

from its daughter nodes. After receiving these polynomials the parent node computes the polynomial $f \in \mathbb{Z}_{p_i}[x_1,\dots,x_s,y]$.
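Restoring a variable from images modulo $x - j$ is polynomial interpolation over $\mathbb{Z}_p$, since the image modulo $x - j$ is the value at $x = j$. The sketch below does this for a single coefficient sequence with Newton's divided differences at the points $0, 1, \dots, m-1$. It is a univariate illustration under the assumption $p \ge m$ (so that the points are distinct modulo $p$), not the node's actual merging of two partial results; in the multivariate setting it would be applied to each coefficient of the remaining variables.

```python
def newton_interpolate_mod(values, p):
    """Coefficients (lowest degree first) of the polynomial f over Z_p with
    deg f < m and f(j) = values[j] for j = 0, ..., m - 1 (requires p >= m)."""
    m = len(values)
    # Divided differences; for the points 0..m-1 the denominator at level L is L.
    dd = [v % p for v in values]
    for level in range(1, m):
        inv = pow(level, -1, p)
        for j in range(m - 1, level - 1, -1):
            dd[j] = (dd[j] - dd[j - 1]) * inv % p
    # Expand the Newton form by Horner's rule: coeffs = coeffs * (x - j) + dd[j].
    coeffs = [0] * m
    for j in range(m - 1, -1, -1):
        new = [0] * m
        for k in range(m - 1):
            new[k + 1] = coeffs[k]               # multiply by x
        for k in range(m):
            new[k] = (new[k] - j * coeffs[k]) % p  # subtract j * coeffs
        new[0] = (new[0] + dd[j]) % p
        coeffs = new
    return coeffs

# f(x) = 3 + 2x + 5x^2 over Z_7 takes the values 3, 3, 6 at x = 0, 1, 2.
print(newton_interpolate_mod([3, 3, 6], 7))  # [3, 2, 5]
```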

4.2 Parallel algorithm

Let $A \in \mathbb{Z}^{n\times n}[x_1,\dots,x_t]$ and let $k$ be the number of processors. The algorithm uses the following functions:

the function numbOfMod() computes the number of polynomial and numerical moduli, using formulas (1) and (2);

the function send(a, b, ..., c, i) sends the data a, b, ..., c from the current processor to processor i;

the function recv(a, b, ..., c, i) receives the data a, b, ..., c on the current processor from processor i;

the function go_down(intervals, r, boundlev) divides the task into two parts;

the function charPol(A, intervals, r) computes on one processor the polynomial g that is the characteristic polynomial of the matrix A modulo the product of the moduli given by intervals;

the function recoveryNewton(f1, f2, r1, r2) computes the polynomial f from f1 and f2 by means of the Chinese remainder theorem.

The parameter boundlev depends on the characteristics of the computing cluster. For the given task boundlev = $\log_2 k$, because the graph of the considered algorithm is a binary tree.

We describe the method go_down:

go_down(intervals, r, boundlev) {
    s = ⌊(intervals[r] + intervals[r + 1])/2⌋;
    intervals1 = intervals; r1 = r;
    intervals2 = intervals; r2 = r;
    intervals1[r + 1] = s; intervals2[r] = s;
    if (s == intervals1[r] + 1) r1 -= 2;
    if (s == intervals2[r + 1] - 1) r2 -= 2;
    boundlev--;
}
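One possible reading of go_down in Python is given below. It follows the description of the daughter nodes in Section 4.1 (left half $[b_s,h_s]$, right half $[h_s+1,e_s]$, active index lowered when a half holds a single modulus) rather than the flat-array pseudocode literally. The list-of-pairs layout, the returned tuples and the omission of boundlev are assumptions made for this sketch.

```python
def go_down(intervals, r):
    """Split the active interval intervals[r] = [lo, hi] (1-based, inclusive) in half.

    Returns ((intervals1, r1), (intervals2, r2)) for the left and right daughter
    nodes; r1 and r2 drop to r - 1 when the corresponding half has one modulus.
    """
    lo, hi = intervals[r]
    assert lo < hi, "an interval with a single modulus cannot be split"
    mid = (lo + hi) // 2
    left = [list(pair) for pair in intervals]
    right = [list(pair) for pair in intervals]
    left[r] = [lo, mid]
    right[r] = [mid + 1, hi]
    r1 = r if lo < mid else r - 1
    r2 = r if mid + 1 < hi else r - 1
    return (left, r1), (right, r2)

# One variable with 3 polynomial moduli and 4 numerical moduli; the primes are active.
(left, r1), (right, r2) = go_down([[1, 3], [1, 4]], 1)
print(left, r1, right, r2)
```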

The parallel algorithm consists of $2\log_2 k$ steps. The operations performed on the processors at each level are the following.

1) Processor 0.
boundlev = $\log_2 k$;
intervals[ ] = numbOfMod();
r = (k == 1) ? 2(t + 1) - 3 : 2(t + 1) - 1;
go_down(intervals, r, boundlev);
send(A, intervals2, r2, boundlev, k/2);

i) ($i = 2,\dots,\log_2 k$). Processors $jk/2^{i-1}$, $j = 0,\dots,2^{i-1} - 1$:
recv(A, intervals, r, boundlev, $(j-1)k/2^{i-1}$) for odd j;
go_down(intervals, r, boundlev);
send(A, intervals2, r2, boundlev, $(2j+1)k/2^{i}$);

$1 + \log_2 k$) Processors $j$, $j = 0, 1, \dots, k - 1$:
recv(A, intervals, r, boundlev, j - 1) for odd j;
f = charPol(A, intervals, r);
send(f, j - 1) for odd j;

$i + \log_2 k$) ($i = 2,\dots,\log_2 k$). Processors $j2^{i-1}$, $j = 0,\dots,k/2^{i-1} - 1$:
recv(f2, $(2j-1)2^{i-2}$);
f = recoveryNewton(f1, f2, r1, r2);
send(f, $(2j+1)2^{i}$) for odd j;

Remark 3. If the matrix $A \in \mathbb{Z}^{n\times n}$, then intervals = $[1,h]$.

5 Experiments with the parallel algorithm

Experiments were carried out in which the characteristic polynomials of dense integer and polynomial matrices were computed. The matrix elements were chosen at random. All numerical coefficients have the same number of bits.

To estimate the quality of the parallel algorithms we introduce the notion of efficiency. Let $t_k$ be the computation time of the algorithm on a cluster with $k$ processors. In passing from a cluster with $m$ processors to a cluster with $k$ processors, $k > m$, the efficiency equals 100% when $t_m/t_k = k/m$, and it equals zero when $t_k = t_m$. For other values of $t_m/t_k$ the efficiency is defined as a function of the time $t_k$: the efficiency of computation on $k$ processors relative to $m$ processors is the function

$\alpha_{m,k} = \dfrac{t_m/t_k - 1}{k/m - 1} \cdot 100\%$.
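The efficiency measure is a one-line computation; a small sketch (the function name is hypothetical) makes it easy to recompute the table entries from the measured times.

```python
def efficiency(t_m, t_k, m, k):
    """Efficiency (in percent) of k processors relative to m processors, k > m."""
    return (t_m / t_k - 1) / (k / m - 1) * 100

# Ideal speed-up gives 100%, no speed-up gives 0%.
print(efficiency(8.0, 4.0, 2, 4))   # 100.0
print(efficiency(10.0, 10.0, 2, 4)) # 0.0
# Algorithm N timings from Table 1: 1849 s on 16 processors, 562 s on 64.
print(round(efficiency(1849, 562, 16, 64)))  # 76
```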

In experiments 1 and 2 we used two parallel algorithms (algorithm N and algorithm D). Algorithm D computes the characteristic polynomial of a matrix in a finite field with the help of Danilewsky's algorithm [7], and algorithm N with the help of the algorithm from [6].

Experiment 1 was carried out on the supercomputer MVS-100K of the Joint Supercomputer Center of the RAS [13].

In the experiment we used a dense integer matrix of size 1000 × 1000. The number of processors ranged from 16 to 512.

The time and the efficiency of the computations are presented in Table 1.

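Table 1. The time and the efficiency of computations with algorithms N and D for matrices of order 1000 × 1000 and $\log_2 a = 7$ bits

| Number of processors $k$ | Algorithm N: time $t_k$, s | Algorithm N: efficiency $\alpha_{16,k}$, % | Algorithm D: time $t_k$, s | Algorithm D: efficiency $\alpha_{16,k}$, % |
| --- | --- | --- | --- | --- |
| 16 | 1849 | | 1507 | |
| 32 | 921 | 100 | 764 | 97 |
| 64 | 562 | 76 | 386 | 96 |
| 100 | 522 | 48 | 364 | 59 |
| 127 | 500 | 38 | 355 | 46 |
| 128 | 310 | 70 | 226 | 80 |
| 175 | 267 | 59 | 220 | 58 |
| 255 | 239 | 45 | 167 | 53 |
| 256 | 166 | 67 | 127 | 72 |
| 350 | 162 | 49 | 122 | 54 |
| 400 | 113 | 64 | 89 | 66 |
| 512 | 113 | 49 | 83 | 55 |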

As Table 1 shows, the computation time drops sharply when the number of processors is a power of two, $2^p$, where $p$ is a natural number.

Experiment 2 was carried out on a cluster of 16 processors (Intel Xeon 3 GHz, 1 Gb), installed in the laboratory of algebraic computations of the Tambov State University named after G.R. Derzhavin. In the experiment we used a dense integer matrix of size 400 × 400; if $a$ is the largest absolute value of the matrix coefficients, then $\log_2 a = 20$ bits. The number of processors ranged from 2 to 16. The time and the efficiency of the computations are presented in Table 2.

Table 2. The time and the efficiency of computations with algorithms N and D for matrices of order 400 × 400 and $\log_2 a = 20$ bits

| Number of processors $k$ | Algorithm N: time $t_k$, s | Algorithm N: efficiency $\alpha_{2,k}$, % | Algorithm D: time $t_k$, s | Algorithm D: efficiency $\alpha_{2,k}$, % |
| --- | --- | --- | --- | --- |
| 2 | 2740 | | 1648 | |
| 4 | 1369 | 100 | 816 | 102 |
| 6 | 1268 | 58 | 833 | 48 |
| 8 | 691 | 98 | 416 | 98 |
| 10 | 660 | 78 | 429 | 71 |
| 12 | 644 | 65 | 426 | 57 |
| 14 | 660 | 52 | 427 | 47 |
| 16 | 359 | 94 | 222 | 91 |

The experiments have shown that the efficiency (Table 2) lies roughly between 50% and 98%. The best efficiency is reached when the number of processors is a power of 2.

In experiments 3 and 4 with polynomial matrices we used the parallel algorithm D, which computes the characteristic polynomial in a finite field with the help of Danilewsky's algorithm [7]. These experiments were carried out on the supercomputer MVS-100K of the Joint Supercomputer Center of the RAS.

In experiment 3 we used a dense polynomial matrix in two variables: s = [2, 2], a = 10 bits, n = 50. In experiment 4 we used a dense polynomial matrix in one variable: s = [1], a = 10 bits, n = 400.

The times and the efficiency of the computations are presented in Table 3.

Table 3. The time and the efficiency of computations with algorithm D for polynomial matrices of order n × n; b = 10 bits is the largest absolute value of the numerical coefficients of matrix elements, $m_1,\dots,m_t$ are the highest degrees of the variables $x_1,\dots,x_t$

| n = 50, $m_1 = 2$, $m_2 = 2$: processors $k$ | Time $t_k$, s | Efficiency $\alpha_{1,k}$, % | n = 400, $m_1 = 1$: processors $k$ | Time $t_k$, s | Efficiency $\alpha_{16,k}$, % |
| --- | --- | --- | --- | --- | --- |
| 1 | 16558 | | 16 | 14514 | |
| 2 | 8676 | 91 | 32 | 8178 | 77 |
| 4 | 4548 | 88 | 64 | 5101 | 61 |
| 8 | 2651 | 75 | 128 | 2882 | 57 |
| 16 | 1626 | 61 | 256 | 2046 | 40 |
| 32 | 1146 | 43 | 512 | 1576 | 26 |
| 64 | 748 | 34 | 1024 | 1445 | 14 |
| 128 | 513 | 25 | 2048 | 1354 | 7 |
| 256 | 510 | 12 | 4096 | 1316 | 3 |

The experiments show that the efficiency of the computations decreases as the number of processors grows. Beyond a certain number of processors, further parallelization brings no benefit: the transferred blocks become so small that the transfer time approaches the computation time at the boundary level. For example, in experiment 3 it is ineffective to use the considered parallel algorithm with 256 or more processors, since the computation time no longer decreases. In Table 3 the smallest time on the cluster MVS-100K for the 50 × 50 matrix (b = 10 bits for the numerical coefficients of matrix elements, highest degrees [2, 2] of the variables $x_1$, $x_2$) is 510 s on 256 processors, and for the 400 × 400 matrix (b = 10 bits, highest degree [1] of the variable $x_1$) it is 1316 s on 4096 processors.

6 Conclusion

Parallel implementation of algorithms makes it possible to compute characteristic polynomials of large matrices, so it is important to construct effective parallel algorithms. Modular arithmetic allows this, because the calculations for each modulus are independent of one another. If the algorithms based on the method of homomorphic images use, over a finite field, the algorithms for characteristic polynomials that are best in the number of operations, effective parallel algorithms can be obtained.

Parallel programs were developed that implement two algorithms (algorithm N and algorithm D) for computing characteristic polynomials of numerical matrices and one algorithm for computing characteristic polynomials of polynomial matrices in several variables. In a finite field algorithm N uses the algorithm from [13], which has the best estimate of ring operations ($\sim \frac{7}{3}n^3$); algorithm D uses Danilewsky's algorithm [7], which requires $\sim 2n^3$ operations in a finite field. The graphs of algorithms N and D are binary trees, so it is effective to use a parallel computer with $2^p$ processors. Indeed, the experiments showed that the efficiency is greatest in passing from $2^p$ to $2^{p+1}$ processors, where it is 75%-94%. The experiments also showed that the computation time of algorithm D is 20-60% less than that of algorithm N.

Taking into account the results of the experiments in the ring of integers, the algorithm that uses Danilewsky's algorithm in a finite field was implemented for computing characteristic polynomials of matrices over the ring of polynomials. The experiments show that the efficiency decreases as the number of processors grows: the number of transfers increases and the amount of computation at the boundary level decreases. For some number of processors the sending time becomes equal to the computation time at the boundary level, and further parallelization is not effective. For computing characteristic polynomials of 50 × 50 matrices whose variables have highest degree 2 and whose numerical coefficients have at most 10 bits, it is not effective to use the considered parallel algorithm on 128 or more processors; for 400 × 400 matrices with 10-bit numerical coefficients, on 512 or more processors.

The considered algorithms for computing characteristic polynomials of matrices over the ring of integers and over the ring of polynomials showed good scalability. We plan to implement algorithms over a finite field using Keller-Gehrig's algorithm [8] and the Pernet-Storjohann algorithm (2007) [9], to compare them with the algorithms already implemented and to identify the most effective ones.

References

1. Le Verrier U.J.J. Sur les variations séculaires des éléments elliptiques des sept planètes principales: Mercure, Venus, La Terre, Mars, Jupiter, Saturne et Uranus // J. de Mathématiques Pures et Appliquées. 1840. N. 4. P. 220-254.

2. Faddeev D.K., Faddeeva V.N. Computational methods of linear algebra. San Francisco: W.H. Freeman, 1963.

3. Knuth D. The Art of Computer Programming. V. 2. M., 1977.

4. Seifullin T.R. Computation of determinants, adjoint matrices, and characteristic polynomials without division // Cybernetics and Systems Analysis. 2003. V. 39. N. 6. P. 805-815.

5. Malaschonok G.I. A computation of the characteristic polynomial of an endomorphism of a free module // Zapiski Nauchnyh Seminarov POMI. 1999. V. 258. P. 101-114.

6. Pereslavtseva O.N. Method for computing of matrix characteristic polynomial // Tambov University Reports. Natural and Technical Sciences. 2008. V. 13. Issue 1. P. 131-133.

7. Danilewsky A.M. About numerical solution of a secular equation // Rec. Math. 1937. V. 2(44). N. 1. P. 169-172.

8. Keller-Gehrig W. Fast algorithms for the characteristic polynomial // Theoretical Computer Science. 1985. V. 36. P. 309-317.

9. Pernet C., Storjohann A. Faster algorithms for the characteristic polynomial // ISSAC. 2007. P. 307-314.

10. Buchberger B., Collins G.E., Loos R. Computer Algebra - Symbolic and Algebraic Computation. Vienna; New York: Springer-Verlag, 1982.

11. Dumas J.-G., Pernet C., Wan Z. Efficient Computation of the Characteristic Polynomial // ISSAC'05, July 24-27, 2005, Beijing, China. Beijing, 2005. P. 140-147.

12. Pereslavtseva O.N. Computation of characteristic polynomials for matrices over polynomial ring // International Conference Polynomial Computer Algebra. St. Petersburg, PDMI RAS, 2009. P. 35-39.

13. Pereslavtseva O.N. On the computation of characteristic polynomial coefficients // Numerical Methods and Programming. 2008. V. 9. P. 366-370. URL: http://num-meth.srcc.msu.ru/.

Acknowledgements: Supported by the Sci. Program Devel. Sci. Potent. High. School, RNP 2.1.1.1853.

Accepted for publication 7.06.2010.
