Piecewise polynomial approximation with reduced tables and guaranteed accuracy

Салищев Сергей Игоревич


UDC 519.651, 004.315


Abstract

The paper proposes improvements to the hardware implementation of piecewise polynomial approximation in semiconductor devices. A new error bound is proposed for interpolation by low-degree polynomials with pointwise constraints on equidistant nodes. A method is proposed for reducing table size and for near-optimal quantization of coefficients using intersegment constraints and mixed integer programming, which guarantees a given approximation accuracy. A 60 % reduction in table size is demonstrated compared with the method that does not use intersegment constraints. Logic synthesis results for the semiconductor circuit demonstrate a significant effect of table size reduction on the device area.

Keywords: piecewise polynomial approximation, interpolation, Lebesgue numbers, logic synthesis of semiconductor circuits.

1. INTRODUCTION

Hardware blocks for calculating smooth functions are common in many hardware designs related to DSP, 2D and 3D graphics, and computer vision. Industrial component libraries contain implementations of such blocks [6; 7]. Piecewise polynomial approximation is the architecture of choice [5; 8] for computation accuracies higher than 18 bits, as it provides a good trade-off between latency and area.

On each segment the target function is usually approximated by min-max or orthogonal interpolation polynomials, as these polynomials provide strong error bounds for quantized coefficients [1; 4].

It is desirable to apply additional constraints on the values and derivatives of the polynomials at the boundaries of adjacent segments. This allows sharing table data between segments, reducing the table size. However, it complicates the quantization problem, as the constrained polynomial may be quite far from the min-max one. The quantization effects on the polynomial coefficients and computations must then be accounted for either analytically, which is a complex mathematical task, or through exhaustive verification, which is infeasible for high accuracies.

Strollo et al. [5] propose an effective empirical method, based on mixed integer programming, for finding the optimal table bit-width for piecewise polynomials with constraints on 2 adjacent segments and pointwise constraints on the polynomial values on a uniform grid. Constraints on adjacent segments allow data sharing between segments and reduce the table size by up to 40 %.

In this paper, a strong a posteriori error bound is provided for any polynomial approximation of a function with a bounded second derivative on a closed interval, with pointwise constraints on the polynomial values on a uniform grid. It allows extending method [5] with additional constraints on the polynomial values and derivatives for further table size reduction. The new error bound guarantees the method's accuracy under conservative datapath quantization, so the method can be used without exhaustive verification and is applicable to arbitrarily high bit-widths.

New constraints on polynomial values and derivatives on 4 adjacent segments are proposed. The method has been evaluated using quadratic piecewise interpolation for two elementary functions. The case study shows that applying both types of constraints can save up to 60 % of the table bit-width compared to the unconstrained case. The architecture for one function was synthesized from SystemC to gate level using High-Level Synthesis and RTL synthesis tools. The comparison shows that the additional constraints lead to a noticeable design area reduction compared to the method with only 2-segment constraints.

Section 2 gives an overview of piecewise polynomial approximation and error analysis. Section 3 describes the proposed algorithm, an a posteriori error bound, and constraints on multiple segments for table size reduction. Section 4 describes implementation results for a trigonometric function and the natural logarithm for accuracies of 24-32 bits. Section 5 compares gate-level synthesis results for different constraints and architectures for trigonometric functions. Section 6 summarizes the results and Section 7 concludes the paper.

2. PIECEWISE POLYNOMIAL APPROXIMATION

The typical formulation of the function approximation problem is the following: for a given f(x) defined on [a, b], we need to build g(x) with an error strictly less than 1 unit in the last place (ulp) of the fixed-point binary representation of the result:

$$\|f - g\| < \alpha, \qquad \alpha = 1\ \mathrm{ulp}, \qquad \mathrm{ulp} = 2^{-k}. \qquad (1)$$

Here and below $\|f\| = \|f\|_\infty = \sup_{x \in [a,b]} |f(x)|$.

Piecewise polynomial approximation is a low-order polynomial approximation on multiple segments, with a different polynomial on each segment. The polynomial coefficients are tabulated. We map each segment onto [-1, 1]:

$$f_i(x) = f\left(\frac{a_{i+1} + a_i}{2} + x\,\frac{a_{i+1} - a_i}{2}\right), \qquad x \in [-1, 1],\ i \in [0..M-1], \qquad (2)$$

$$a_0 = a, \qquad a_M = b, \qquad a_{i+1} > a_i. \qquad (3)$$

Here M is the number of segments and $f_i(x)$ is the remapped piece of f(x) on the segment with index i.
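A minimal sketch of the segment lookup implied by (2) and (3), assuming uniform segments $a_i = a + i(b - a)/M$ (the case studies below use power-of-two segment counts, for which the index is simply the leading input bits in hardware). The helper name `segment_and_local` is hypothetical; it returns the segment index i and the local coordinate (called x in (2), `t` in the code) at which $f_i$ and $p_i$ are evaluated.

```python
from fractions import Fraction


def segment_and_local(x, a, b, M):
    """Map a global argument x in [a, b) to (i, t) following (2)-(3).

    Hypothetical helper; assumes M uniform segments a_i = a + i*(b - a)/M.
    t is the local coordinate in [-1, 1) called x in (2).
    """
    if not a <= x < b:
        raise ValueError("x must lie in [a, b)")
    width = (b - a) / M
    i = int((x - a) // width)      # segment index, 0 <= i <= M - 1
    a_i = a + i * width            # left boundary a_i
    a_next = a_i + width           # right boundary a_{i+1}
    # Invert x = (a_{i+1} + a_i)/2 + t*(a_{i+1} - a_i)/2 from (2)
    t = (2 * x - (a_next + a_i)) / (a_next - a_i)
    return i, t


if __name__ == "__main__":
    # ln(x + 1) on [0, 1) split into 128 segments, as in Table 2
    i, t = segment_and_local(Fraction(3, 256), Fraction(0), Fraction(1), 128)
    print(i, t)
```

Exact rationals keep the boundary cases exact (t = -1 exactly at $a_i$); with floats the same code works, but the index at segment boundaries becomes subject to rounding.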

We need to implement an architecture for calculating the polynomial $p_i(x) = \sum_{k=0}^{N} c^i_k x^k$ approximating $f_i(x)$, $x \in [-1, 1]$, where N is the approximation order. The set $\{p_i(x)\}$ should be optimal in terms of table size.

The multiplications and additions in the datapath for polynomial calculation must be quantized. Let $g_i(x)$ be the quantized polynomial approximation on the segment with index i; then the error can be bounded as

$$\|f_i - g_i\| \le \|f_i - p_i\| + \|p_i - g_i\| \le e_{\mathrm{meth}} + e_{\mathrm{quant}}, \qquad i \in [0..M-1]. \qquad (4)$$

Here $e_{\mathrm{meth}}$ is the error bound of the approximation method computed in exact arithmetic, and $e_{\mathrm{quant}}$ is the bound on the error introduced by the limited precision of the implementation. The datapath quantization task can be solved separately, either manually or by an auto-quantization tool. The datapath quantization is conservative if the following inequality holds:

$$e_{\mathrm{quant}} + e_{\mathrm{meth}} < \alpha. \qquad (5)$$

In this case the resulting architecture has guaranteed accuracy and exhaustive verification is not required, so the approach can be used for arbitrarily high precision. For a datapath computing a class of non-constant polynomials, the quantization error includes the error of the final rounding to 1 ulp and the error of intermediate term quantization using l guard bits. Here we assume no quantization of x, i.e. $g_i(x)$ is computed only at exactly representable input points:

$$e_{\mathrm{quant}} \ge \frac{\alpha}{2}\left(1 + 2^{-l}\right). \qquad (6)$$

We need to choose the number of guard bits l and the approximation $\{p_i(x)\}$ fulfilling the $e_{\mathrm{meth}}$ requirement

$$e_{\mathrm{meth}} < \frac{\alpha}{2}\left(1 - 2^{-l}\right). \qquad (7)$$

Then we need to find a datapath quantization fulfilling (5). As the points of maximum error for $\{p_i(x)\}$ and for the datapath quantization rarely coincide, it is possible to relax the conservative requirement and further reduce the width of the intermediate datapath types. Doing so, however, requires exhaustive verification and is not applicable for high accuracy due to the test bench running time.
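The budget split in (6) and (7) can be checked with exact rational arithmetic. A minimal sketch, assuming (6) is the lower bound on the quantization error of a datapath with l guard bits as stated above; `error_budget` and `min_guard_bits` are hypothetical helper names, not part of any tool used in the paper.

```python
from fractions import Fraction


def error_budget(k, l):
    """Return (bound on e_meth from (7), minimal e_quant from (6)) for ulp = 2**-k."""
    alpha = Fraction(1, 2**k)                            # alpha = 1 ulp, see (1)
    e_meth_bound = alpha / 2 * (1 - Fraction(1, 2**l))   # right side of (7)
    e_quant_min = alpha / 2 * (1 + Fraction(1, 2**l))    # right side of (6)
    return e_meth_bound, e_quant_min


def min_guard_bits(k, e_meth, max_l=16):
    """Smallest number of guard bits l for which e_meth satisfies (7), or None."""
    for l in range(1, max_l + 1):
        if e_meth < error_budget(k, l)[0]:
            return l
    return None


if __name__ == "__main__":
    bound, quant = error_budget(24, 2)
    # The two parts add up to exactly alpha = 1 ulp
    print(bound, quant, bound + quant == Fraction(1, 2**24))
    # A method error of 0.6 * (ulp / 2) needs at least 2 guard bits
    print(min_guard_bits(24, Fraction(3, 5) * Fraction(1, 2**25)))
```

More guard bits leave more room for the method error, at the price of wider intermediate types in the datapath.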

3. APPROXIMATION POLYNOMIALS WITH POINTWISE VALUE CONSTRAINTS

Let us consider the polynomial $p(x) = \sum_{k=0}^{N} c_k x^k$ approximating f(x) on the interval [-1, 1] with error e:

$$\|f - p\| = e. \qquad (8)$$

In the context of piecewise approximation, p(x) is the approximation polynomial for one segment. We would like to represent its coefficients with minimal bit-width. This problem is hard to solve directly, so we consider a simpler problem by applying the error constraint only at a limited number of points, P + 1:

$$\max_{i \in [0..P]} |f(x_i) - p(x_i)| = e_0, \qquad x_0 = -1,\ x_P = 1,\ x_{i+1} > x_i. \qquad (9)$$

Section 3.1 describes a way to estimate g independently of p(x) such that

$$e \le g + e_0 = e_{\mathrm{meth}}. \qquad (10)$$

We can now give a constructive procedure for choosing p(x) and $e_0$ that fulfills the method requirement (7):

1. Choose a piecewise approximation segment and f(x) on this segment.

2. Find $e^* = \min_{p^*} \max_{i} |f(x_i) - p^*(x_i)|$, $p^*(x) = \sum_{k=0}^{N} c^*_k x^k$, using Linear Programming (LP).

3. Calculate g using (28).

4. Choose the number of guard bits l such that $e^* < \frac{\alpha}{2}(1 - 2^{-l}) - g$.

5. If step 4 cannot be completed, increase P or reduce the segment size and repeat from step 1.

6. Set $e_0 = \frac{\alpha}{2}(1 - 2^{-l}) - g$.

7. Choose fractional word lengths $\{d_k\} \subset \mathbb{Z}$ for the coefficients $\{c_k\}$.

8. Find $\{c_k\}$ for which $\{2^{d_k} c_k\} \subset \mathbb{Z}$ and $|f(x_i) - p(x_i)| \le e_0$ using mixed integer programming.

9. Repeat from step 7, using a branch-and-bound strategy, until the minimal total bit-width $w^* = \min w$, $w = \sum_{k=0}^{N} \lceil \log_2 |2^{d_k} c_k| \rceil$, is found.

10. If $w^*$ is infeasible for implementation, increase P or reduce the segment size and repeat from step 1.

By construction, the above algorithm provides near-optimal bit-width coefficients $\{c_k\}$ of the approximation polynomial for a given α. Assuming small changes of $c_k$ between optimization steps, we can achieve good results by optimizing $d = \sum_{k=0}^{N} d_k$ instead of w.
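Steps 7 and 8 can be illustrated with a deliberately naive search. With the fractional word lengths $d_k$ fixed, the sketch below enumerates integer mantissas in a small box around the rounded real-valued solution $c^*$, in depth-first (lexicographic) order with the nearest candidates first, and returns the first set that meets $|f(x_i) - p(x_i)| \le e_0$ on the grid. It is a stand-in under stated assumptions, not the mixed integer programming solver used in the paper; `find_quantized_coeffs` and its `radius` parameter are hypothetical.

```python
from fractions import Fraction
from itertools import product


def find_quantized_coeffs(xs, fs, c_star, d, e0, radius=2):
    """Return c_k = m_k * 2**-d_k with |f(x_i) - p(x_i)| <= e0 on the grid, or None.

    xs, fs: grid points and exact function values; c_star: real-valued
    coefficients (e.g. from step 2); d: fractional word lengths d_k (step 7).
    Each integer mantissa m_k is searched within +-radius of round(2**d_k * c*_k).
    """
    scales = [Fraction(2) ** dk for dk in d]
    centers = [round(Fraction(c) * s) for c, s in zip(c_star, scales)]
    offsets = sorted(range(-radius, radius + 1), key=abs)  # 0, -1, 1, -2, 2
    for delta in product(offsets, repeat=len(centers)):    # depth-first order
        coeffs = [Fraction(m + o) / s for m, o, s in zip(centers, delta, scales)]
        if all(abs(f - sum(c * x**k for k, c in enumerate(coeffs))) <= e0
               for x, f in zip(xs, fs)):
            return coeffs
    return None


if __name__ == "__main__":
    xs = [Fraction(i, 4) - 1 for i in range(9)]   # uniform grid on [-1, 1], P = 8
    fs = [x**3 for x in xs]
    # Quadratic fit of x**3 with 2 fractional bits per coefficient
    print(find_quantized_coeffs(xs, fs, [0.01, 0.74, -0.01], [2, 2, 2], Fraction(1, 4)))
```

In the example, $e_0 = 1/4$ is exactly the discrete minimax error of $x^3$ on this grid, so $\{0, 3/4, 0\}$ is found immediately, while any smaller $e_0$ is infeasible for every quadratic, which is the situation that sends the procedure back to step 1.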

Details on applying LP are given in Section 3.2. Section 3.3 describes a table size reduction technique that adds further constraints to the LP.

3.1. ERROR BOUND FOR UNIFORM GRID

To make the previous algorithm work, we need to estimate g, which bounds the deviation of p(x) from f(x) between the grid points. Let f have continuous first and second derivatives.

Lemma 1. If $p(x) = \sum_{k=0}^{N} c_k x^k$ approximates f(x) on a set of interpolation nodes $\{x_k\}_{k=1}^{N+1}$ on [-1, 1], then

$$\|f'' - p''\| \le \frac{2^{N-1}}{(N-2)!}\,\|f^{(N+1)}\| + \lambda_{(N+1),2}\bigl(\{x_k\}\bigr) \max_{k} |f(x_k) - p(x_k)|. \qquad (11)$$

Here $\{\lambda_{r,v}\}_{v=0}^{r-1}$ is the set of generalized Lebesgue constants characterizing the set of interpolation nodes $\{x_k\}_{k=1}^{r}$, r = N + 1; the usual Lebesgue constant is $\lambda_r = \lambda_{r,0}$:

$$\lambda_{r,v} = \sup_{x \in [-1,1]} \sum_{k=1}^{r} \bigl|l^{(v)}_{r,k}(x)\bigr|, \qquad 0 \le v < r,\ r \ge 1, \qquad (12)$$

$$l_{r,k}(x) = \frac{\omega_r(x)}{(x - x_k)\,\omega'_r(x_k)}, \qquad \omega_r(x) = \prod_{k=1}^{r} (x - x_k). \qquad (13)$$

Here $\{l_{r,k}(x)\}$ is the set of fundamental polynomials of Lagrange interpolation on the nodes $\{x_k\}_{k=1}^{r}$.

Proof:

The proof follows from equation (10.10) in [4].

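Lemma 2. The derivatives of the fundamental polynomials of Lagrange interpolation have the form

$$l'_{r,k}(x) = \sum_{m=1,\,m \ne k}^{r} \frac{l_{(r-1),k;m}(x)}{x_k - x_m} = \sum_{m=1,\,m \ne k}^{r} \frac{l_{r,k}(x)}{x - x_m}. \qquad (14)$$

Here $l_{(r-1),k;m}(x)$ is the fundamental polynomial of lower-order interpolation with the node $x_m$ removed (the second form holds for $x \ne x_m$).

Proof is by the definition of fundamental polynomials (13).

Lemma 3. For 3 and 4 equidistant nodes on [-1, 1], used in quadratic and cubic interpolation respectively, the Lebesgue constants for the second derivative have the following upper bounds:

$$\lambda_{3,2} \le 6, \qquad \lambda_{4,2} \le 162. \qquad (15)$$

Proof:

For the quadratic case, compute the second derivatives of the fundamental polynomials by applying Lemma 2 twice; by definition (12)

$$\lambda_{3,2} = \sup_{x \in [-1,1]} \sum_{k=1}^{3} \Bigl| \sum_{m=1,\,m \ne k}^{3} \sum_{q=1,\,q \ne m,\,q \ne k}^{3} \frac{1}{(x_k - x_m)(x_k - x_q)} \Bigr|. \qquad (16)$$

For 3 equidistant nodes on [-1, 1], $|x_m - x_q| \ge 1$ for $m \ne q$, and the zeroth-order fundamental polynomials in the numerators are identically 1. The sum contains $3 \cdot 2 \cdot 1 = 6$ terms, each bounded by 1, so $\lambda_{3,2} \le 6$.

For the cubic case the computations are slightly more involved:

$$\lambda_{4,2} = \sup_{x \in [-1,1]} \sum_{k=1}^{4} \Bigl| \sum_{m=1,\,m \ne k}^{4} \sum_{q=1,\,q \ne m,\,q \ne k}^{4} \frac{l_{2,k;q,m}(x)}{(x_k - x_m)(x_k - x_q)} \Bigr|. \qquad (17)$$

Here $l_{2,k;q,m}$ is a first-order fundamental polynomial with the 2 interpolation nodes $x_q$, $x_m$ removed. For 4 equidistant nodes on [-1, 1]

$$|x_k - x_q| \ge \frac{2}{3}, \qquad q \ne k, \qquad (18)$$

$$|l_{2,k;q,m}(x)| = \Bigl|\frac{x - x_s}{x_k - x_s}\Bigr| \le 3, \qquad x \in [-1, 1], \qquad (19)$$

where $x_s$, $s \ne k$, is the remaining node. The sum in (17) contains $4 \cdot 3 \cdot 2 = 24$ terms, each bounded by $3 \cdot (3/2)^2 = 27/4$, so $\lambda_{4,2} \le 24 \cdot 27/4 = 162$.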
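The bounds in (15) come from term counting and are deliberately coarse. A small exact-arithmetic sketch of definition (12) evaluates $\lambda_{r,v}$ for equidistant nodes by sampling the supremum on a fine rational grid. Sampling gives a lower bound on the supremum in general; for v = 2 and r = 3, 4 the summands are absolute values of constant or linear functions, whose sum is convex, so the maximum sits at the endpoints and the sampled value is exact. All helper names are hypothetical. For three nodes the sketch gives $\lambda_{3,0} = 5/4$ and $\lambda_{3,2} = 4$, and for four nodes $\lambda_{4,2} = 27$, well within the bounds of Lemma 3.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_deriv(a):
    """Derivative of a coefficient-list polynomial."""
    return [i * a[i] for i in range(1, len(a))] or [Fraction(0)]


def poly_eval(a, x):
    """Horner evaluation of a coefficient-list polynomial at x."""
    acc = Fraction(0)
    for coef in reversed(a):
        acc = acc * x + coef
    return acc


def fundamental_polys(nodes):
    """Lagrange fundamental polynomials l_{r,k}, k = 1..r, as in (13)."""
    polys = []
    for k, xk in enumerate(nodes):
        p = [Fraction(1)]
        for m, xm in enumerate(nodes):
            if m != k:
                # multiply by (x - x_m) / (x_k - x_m)
                p = poly_mul(p, [-xm / (xk - xm), 1 / (xk - xm)])
        polys.append(p)
    return polys


def generalized_lebesgue(nodes, v, samples=600):
    """Sampled lambda_{r,v} = sup_x sum_k |l_{r,k}^(v)(x)| on [-1, 1], see (12)."""
    derivs = []
    for p in fundamental_polys(nodes):
        for _ in range(v):
            p = poly_deriv(p)
        derivs.append(p)
    best = Fraction(0)
    for j in range(samples + 1):
        x = Fraction(-1) + Fraction(2 * j, samples)
        best = max(best, sum(abs(poly_eval(d, x)) for d in derivs))
    return best


def uniform_nodes(r):
    """r equidistant nodes on [-1, 1], r >= 2."""
    return [Fraction(-1) + Fraction(2 * i, r - 1) for i in range(r)]


if __name__ == "__main__":
    for r in (3, 4):
        print(r, generalized_lebesgue(uniform_nodes(r), 2))
```

Substituting such computed constants for the bounds in (15) would tighten the second term of g in Theorem 1 below; the paper itself uses the counting bounds.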

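Lemma 4. The usual Lebesgue constants are invariant under a linear change of variables, while the generalized ones scale with the interval length. Consider the linear mapping between [-1, 1] and [a, b]:

$$x = \frac{b - a}{2}\,s + \frac{b + a}{2}. \qquad (20)$$

The corresponding Lebesgue constant is

$$\lambda^{[a,b]}_{r,v} = \left(\frac{2}{b - a}\right)^{v} \lambda_{r,v}. \qquad (21)$$

Proof:

Directly follows from the definition of the generalized Lebesgue constants (12), since $\frac{d}{dx} = \frac{2}{b - a}\,\frac{d}{ds}$.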
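Scaling law (21) can be checked by hand in the quadratic case, which matters because Section 3.2 maps segments to [0, 1] instead of [-1, 1]. A minimal sketch (the function name is hypothetical): for three equidistant nodes each $l''_{3,k}$ is the constant $2/\prod_{m \ne k}(x_k - x_m)$, so $\lambda_{3,2}$ reduces to a finite sum.

```python
from fractions import Fraction


def lebesgue_3_2(a, b):
    """lambda_{3,2} for the equidistant nodes a, (a + b)/2, b on [a, b].

    For quadratic interpolation each l_k'' is the constant
    2 / prod_{m != k} (x_k - x_m), so the supremum in (12) is simply the
    sum of their absolute values.
    """
    nodes = [a, (a + b) / 2, b]
    total = Fraction(0)
    for k, xk in enumerate(nodes):
        prod = Fraction(1)
        for m, xm in enumerate(nodes):
            if m != k:
                prod *= xk - xm
        total += abs(2 / prod)
    return total


if __name__ == "__main__":
    a, b = Fraction(0), Fraction(1)
    # (21) predicts: lambda on [0, 1] = (2 / (b - a))**2 * lambda on [-1, 1]
    print(lebesgue_3_2(a, b), (2 / (b - a)) ** 2 * lebesgue_3_2(Fraction(-1), Fraction(1)))
```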

Theorem 1. Let $\{x_i\}_{i=0}^{P}$ be a uniform grid on [-1, 1] with step $\delta = x_{i+1} - x_i$, $x_0 = -1$, $x_P = 1$. Then there exists g fulfilling (10) which is independent of p(x) and depends only on f(x), δ and $e_0$:

$$g = \frac{2^{N-4}\,\delta^2}{(N-2)!}\,\|f^{(N+1)}\| + \frac{\delta^2}{8}\,e_0\,\lambda_{(N+1),2}. \qquad (22)$$

Here $\lambda_{r,v}$ is the generalized Lebesgue constant (12).

Proof:

Consider the point $x^*$ where the approximation error is maximal; on a compact set it always exists:

$$\|f - p\| = |f(x^*) - p(x^*)|. \qquad (23)$$

There are 2 possible cases. If it lies on the interval boundary, $x^* \in \{-1, 1\}$, then $e = e_0$ and g = 0, as -1 and 1 are grid nodes.

Otherwise $x^* \in (-1, 1)$; in this case $x^*$ is an extremum of the smooth function f - p, and so

$$f'(x^*) - p'(x^*) = 0. \qquad (24)$$

Let $x_k$ be the grid point nearest to $x^*$. Then the distance between the points is

$$s = |x^* - x_k| \le \frac{\delta}{2}. \qquad (25)$$

Let us consider the Taylor expansion of f - p at $x_k$ around the point $x^*$; by (24) the first-order term vanishes:

$$f(x_k) - p(x_k) = f(x^*) - p(x^*) + \bigl(f''(\xi) - p''(\xi)\bigr)\frac{s^2}{2}, \qquad \xi \text{ between } x^* \text{ and } x_k. \qquad (26)$$

By bounding the last term with (25) and the estimate from Lemma 1 we obtain

$$e \le e_0 + \frac{2^{N-4}\,\delta^2}{(N-2)!}\,\|f^{(N+1)}\| + \frac{\delta^2}{8}\,\lambda_{(N+1),2}\bigl(\{x_i\}_{i \in I_0}\bigr)\,e_0. \qquad (27)$$

Here $I_0 \subset [0..P]$, $|I_0| = N + 1$, is a subset of indices defining the set of interpolation nodes with the minimal Lebesgue constant.

Corollary. For the uniform grid $\{x_i\}_{i=0}^{P}$ on [-1, 1], interpolation order N = 2, 3 and P mod N = 0,

$$e - e_0 \le g \le \frac{2^{N-4}\,\delta^2}{(N-2)!}\,\|f^{(N+1)}\| + \frac{\delta^2}{8}\,e_0\,\lambda_{(N+1),2}, \qquad \lambda_{3,2} \le 6, \quad \lambda_{4,2} \le 162. \qquad (28)$$

Proof:

As the grid contains N + 1 equidistant nodes spanning [-1, 1] (this is where P mod N = 0 is needed), the proposition follows directly from Theorem 1 and Lemma 3.
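Bound (28) is cheap to evaluate per segment, which is what step 3 of the procedure requires. A minimal sketch, assuming the caller supplies an upper bound `d_max` on $\|f^{(N+1)}\|$ for the segment already remapped to [-1, 1] (a hypothetical parameter). By the chain rule applied to (2), that bound is $h^{N+1}$ times a bound on the original derivative, with h half the segment width; the second hypothetical helper does this conversion.

```python
from fractions import Fraction
from math import factorial

# Upper bounds on lambda_{N+1,2} for equidistant nodes, taken from (15)
LAMBDA_BOUND = {2: Fraction(6), 3: Fraction(162)}


def g_bound(N, delta, d_max, e0):
    """Right-hand side of (28) for interpolation order N in {2, 3}.

    delta: grid step on [-1, 1]; d_max: bound on ||f^(N+1)|| of the remapped
    segment (hypothetical input); e0: pointwise error allowed on the grid.
    """
    first = Fraction(2) ** (N - 4) * delta**2 / factorial(N - 2) * d_max
    second = delta**2 / 8 * e0 * LAMBDA_BOUND[N]
    return first + second


def remapped_derivative_bound(orig_bound, seg_width, order):
    """Chain rule for (2): d^n/dx^n f(mid + x*h) = h**n * f^(n), h = seg_width / 2."""
    return orig_bound * (Fraction(seg_width) / 2) ** order


if __name__ == "__main__":
    # Quadratic case, P = 8 grid intervals on [-1, 1] (delta = 1/4)
    d_max = remapped_derivative_bound(32, Fraction(1, 256), 3)
    print(g_bound(2, Fraction(1, 4), d_max, Fraction(1, 2**26)))
```

Because both terms scale with $\delta^2$, halving the grid step (doubling P) cuts g by a factor of four, which is one of the two remedies offered by steps 5 and 10.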

3.2. LP AND ILP METHODS FOR FINDING POLYNOMIAL COEFFICIENTS

For the sake of simplicity we assume that the approximation segment is mapped to [0, 1]. By Lemma 4, all the above results still apply after rescaling the Lebesgue constants. We need to solve the following optimization problem:

$$\max_{i \in [0..P]} |f(x_i) - p(x_i)| = e \to \min, \qquad p(x) = \sum_{k=0}^{N} c_k x^k, \qquad x_0 = 0,\ x_P = 1,\ x_{i+1} > x_i. \qquad (29)$$
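Problem (29) is a discrete Chebyshev (minimax) fit, which the paper solves with an LP in canonical form. For small P and N, a dependency-free cross-check can use a different but classical technique: on a finite point set, the minimax error equals the largest levelled error $|h|$ over all references of N + 2 points, and the levelled polynomial of the maximizing reference is the minimax polynomial. The sketch enumerates all references, so it is only practical for small grids; the function names are hypothetical.

```python
from itertools import combinations


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / M[r][r]
    return x


def discrete_minimax(f, xs, N):
    """Minimax polynomial of degree N on the finite set xs.

    Enumerates every reference of N + 2 sorted points, solves
    p(x_j) + (-1)**j * h = f(x_j) for (c_0..c_N, h), and keeps the reference
    with the largest |h|. Returns (coefficients, max grid error).
    """
    best_h, best_c = -1.0, None
    for ref in combinations(sorted(xs), N + 2):
        A = [[x**k for k in range(N + 1)] + [(-1) ** j] for j, x in enumerate(ref)]
        *coeffs, h = solve_linear(A, [f(x) for x in ref])
        if abs(h) > best_h:
            best_h, best_c = abs(h), coeffs
    err = max(abs(f(x) - sum(c * x**k for k, c in enumerate(best_c))) for x in xs)
    return best_c, err


if __name__ == "__main__":
    grid = [-1 + i / 4 for i in range(9)]   # P = 8, uniform grid on [-1, 1]
    coeffs, err = discrete_minimax(lambda x: x**3, grid, 2)
    print(coeffs, err)
```

For $x^3$ on this grid the result coincides with the continuous minimax quadratic $0.75x$ and error 1/4, because the Chebyshev alternation points ±1/2, ±1 are grid nodes. The LP formulation scales far better; the enumeration only serves to cross-check $e^*$ of step 2 on small grids.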

To use a linear programming solver, we need to convert the problem to canonical form.

Let V be the Vandermonde matrix of order N for the points $\{x_i\}$:

$$V = \{x_i^{\,k}\}_{i=0,\dots,P;\ k=0,\dots,N}. \qquad (30)$$

The vector of function values is

$$\mathbf{f} = \bigl(f(x_0), \dots, f(x_P)\bigr)^T. \qquad (31)$$

The vector of variables is

$$\mathbf{x} = (c_0, \dots, c_N, e)^T = (\mathbf{c}, e)^T. \qquad (32)$$

The minimized function is $\mathbf{l}\,\mathbf{x}$, where $\mathbf{l} = (0, \dots, 0, 1)$. In these terms the canonical linear problem is

$$\begin{pmatrix} V & -\mathbf{1} \\ -V & -\mathbf{1} \end{pmatrix} \mathbf{x} \le \begin{pmatrix} \mathbf{f} \\ -\mathbf{f} \end{pmatrix}, \qquad \mathbf{l}\,\mathbf{x} \to \min, \qquad (33)$$

where $\mathbf{1}$ is the column of P + 1 ones.

To implement the main algorithm, we need to solve the mixed integer programming problem [2] with the additional integer constraint $\{2^{d_k} c_k\} \subset \mathbb{Z}$. As the target error $e_0$ is known, we do not need to solve the optimization problem; we only need to find a feasible solution satisfying the constraints. So the usual branch-and-bound method degrades to depth-first search, which is substantially faster.

3.3. USING LINEAR CONSTRAINTS FOR TABLE SIZE REDUCTION

Strollo et al. [5] show that it is possible to share table data between 2 adjacent approximation segments by exploiting the smoothness of the approximated function.

Consider f(x), x ∈ [-1, 1]. It can be decomposed into 2 halves:

$$f(x) = \begin{cases} f_R(x), & x \in [0, 1], \\ f_L(-x), & x \in [-1, 0]. \end{cases} \qquad (34)$$

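Both $f_L$ and $f_R$ are defined on [0, 1]. We then consider polynomial approximations of order N for $f_L$ and $f_R$:

$$p(x) = \sum_{i=0}^{N} p_i x^i \approx f_L(x), \qquad q(x) = \sum_{i=0}^{N} q_i x^i \approx f_R(x). \qquad (35)$$

We are interested in p(x) and q(x) that share some coefficients. This means that the corresponding derivatives of the two pieces at the junction x = 0 are equal. The case study in [5] shows that for the quadratic and cubic cases it is possible to share all the coefficients except the highest-order one. Since $f_L$ enters (34) with a reflected argument, in local coordinates the condition reads

$$(-1)^v p^{(v)}(0) = q^{(v)}(0), \qquad 0 \le v < N. \qquad (36)$$

It is equivalent to

$$c_{L,v} = (-1)^v c_{R,v}, \qquad 0 \le v < N, \qquad (37)$$

where $c_{L,v} = p_v$ and $c_{R,v} = q_v$. We employ mixed integer programming to find the coefficients of p(x) and q(x). The canonical form of the problem is as follows: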

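$$\begin{pmatrix} V & 0 & -\mathbf{1} \\ -V & 0 & -\mathbf{1} \\ 0 & V & -\mathbf{1} \\ 0 & -V & -\mathbf{1} \end{pmatrix} \begin{pmatrix} \mathbf{c}_R \\ \mathbf{c}_L \\ e \end{pmatrix} \le \begin{pmatrix} \mathbf{f}_R \\ -\mathbf{f}_R \\ \mathbf{f}_L \\ -\mathbf{f}_L \end{pmatrix}, \qquad K \begin{pmatrix} \mathbf{c}_R \\ \mathbf{c}_L \\ e \end{pmatrix} = 0, \qquad (0, 0, 1) \begin{pmatrix} \mathbf{c}_R \\ \mathbf{c}_L \\ e \end{pmatrix} \to \min. \qquad (38)$$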

Here K is the matrix of derivative equality constraints between adjacent segments. For example, $K_{2,1}$ for sharing the first 2 coefficients in quadratic approximation is

$$K_{2,1} = \begin{pmatrix} 1 & 0 & 0 & -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 \end{pmatrix}. \qquad (39)$$
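The rows of K follow mechanically from (37). A minimal sketch, assuming the variable order $(\mathbf{c}_R, \mathbf{c}_L, e)$ of (38); `sharing_constraints` and `satisfies` are hypothetical helpers. Each row encodes $c_{R,v} - (-1)^v c_{L,v} = 0$, and for N = 2 with two shared coefficients the rows are those of $K_{2,1}$.

```python
def sharing_constraints(N, shared):
    """Rows of K in (38) enforcing c_{L,v} = (-1)**v * c_{R,v} for v < shared.

    Variable order: (c_{R,0..N}, c_{L,0..N}, e), matching (38).
    """
    if not 0 <= shared <= N + 1:
        raise ValueError("shared must be between 0 and N + 1")
    width = 2 * (N + 1) + 1
    rows = []
    for v in range(shared):
        row = [0] * width
        row[v] = 1                       # + c_{R,v}
        row[N + 1 + v] = -((-1) ** v)    # - (-1)**v * c_{L,v}
        rows.append(row)
    return rows


def satisfies(rows, x):
    """Check K x = 0 for a candidate variable vector x."""
    return all(sum(k * xi for k, xi in zip(row, x)) == 0 for row in rows)


if __name__ == "__main__":
    for row in sharing_constraints(2, 2):   # quadratic case, N = 2
        print(row)
```

In an MIP formulation these rows would be passed as equality constraints next to the inequalities of (38).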

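We can exploit the smoothness of f(x) even further by grouping 4 adjacent intervals. Consider f(x), x ∈ [0, 4]. We can divide it into 4 segments in the following way:

$$f(x) = \begin{cases} f_0(1 - x), & x \in [0, 1], \\ f_1(x - 1), & x \in [1, 2], \\ f_2(3 - x), & x \in [2, 3], \\ f_3(x - 3), & x \in [3, 4]. \end{cases} \qquad (40)$$

For these functions we consider approximation polynomials $p_0, p_1, p_2, p_3$ on [0, 1]. We can employ the following constraints (the factors $(-1)^v$ account for the reflected arguments in (40)):

$$\begin{aligned} p_0^{(v)}(0) &= (-1)^v\, p_1^{(v)}(0), \\ p_2^{(v)}(0) &= (-1)^v\, p_3^{(v)}(0), \\ \bigl|p_1^{(v)}(1) - (-1)^v\, p_2^{(v)}(1)\bigr| &\le \delta_v, \qquad 0 \le v < N, \\ e &\to \min. \end{aligned} \qquad (41)$$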

The improvement from applying the 4-segment constraints instead of the 2-segment constraints is smaller than that from the 2-segment constraints over no constraints. First, $p^{(v)}(1)$ contains more than one summand, so exploiting the data sharing requires adders in the datapath. Second, the $\delta_v$ need to be tabulated, taking some additional bits. As the highest-order coefficients are not changed, the data sharing is expected to have only a minor effect on critical path timing. The case study will show that the data sharing is beneficial for architecture area.
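The adder cost mentioned above is visible directly in the derivative formula $p^{(v)}(t) = \sum_{k \ge v} \frac{k!}{(k-v)!} c_k t^{k-v}$. At a junction at t = 0 each derivative is a single scaled coefficient, $v!\,c_v$, so sharing is a plain table lookup; at t = 1 every coefficient of order at least v contributes. A minimal sketch with a hypothetical helper name:

```python
from math import factorial


def derivative_at(coeffs, v, t):
    """v-th derivative of p(x) = sum_k c_k x**k evaluated at x = t.

    p^(v)(t) = sum_{k >= v} k! / (k - v)! * c_k * t**(k - v).
    """
    return sum(factorial(k) // factorial(k - v) * c * t ** (k - v)
               for k, c in enumerate(coeffs) if k >= v)


if __name__ == "__main__":
    c = [3, 5, 7]   # an arbitrary quadratic segment polynomial
    print([derivative_at(c, v, 0) for v in range(3)])   # one summand each
    print([derivative_at(c, v, 1) for v in range(3)])   # several summands
```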

4. IMPLEMENTATION

For the implementation, functions resembling the Synopsys DesignWare IP blocks DW_sincos [6] and DW_ln [7] were chosen, with fractional accuracy ranging from 24 to 32 bits. The first is a sin(πx) and cos(πx) approximation for x ∈ [-1, 1). The second is a natural logarithm ln(x + 1) approximation for x ∈ [0, 1).

Due to the symmetry of trigonometric functions, it is only necessary to approximate sin(πx) for x ∈ [0, 1/2).

For both functions, piecewise quadratic approximation was used with 2-segment constraints, with the 4-segment constraints proposed above, and without constraints. A Matlab script was built to generate the tables for all cases.

Tables 1 and 2 show the growth of the bit-width of the tabulated values with increased accuracy.

For sin(πx), the average bit-width reduction per segment compared to the unconstrained case is 40 % for 2-segment constraints and 57 % for 4-segment constraints (Table 1).

For ln(x + 1), the average bit-width reduction per segment compared to the unconstrained case is 40 % for 2-segment constraints and 60 % for 4-segment constraints (Table 2).

5. COMPARISON

For the comparison, a block compatible with Synopsys DesignWare DW_sincos [6] was implemented in three ways: quadratic piecewise polynomials on the optimal number of segments with 4-segment derivative constraints, the same with 2-segment derivative constraints, and cubic piecewise polynomials on 64 segments as described in the Synopsys DesignWare trigonometric architecture overview [8]. In all cases the polynomial values were computed directly, without the Horner scheme. A conservative datapath quantization was used. For the quadratic methods, the SystemC code was synthesized to gate-level RTL (Tables 3, 4).


Table 1. sin(πx) approximation

| Accuracy (bits) | Guard bits | Segments | Bits per segment, unconstrained | Bits per segment, 2-seg. constraints | Bits per segment, 4-seg. constraints |
|---|---|---|---|---|---|
| 24 | 2 | 128 | 57 | 34.5 | 25 |
| 25 | 4 | 128 | 66 | 40.5 | 29.5 |
| 26 | 2 | 256 | 60 | 36 | 24.5 |
| 27 | 2 | 256 | 63 | 38 | 27.25 |
| 28 | 4 | 256 | 72 | 44 | 31.75 |
| 29 | 2 | 512 | 66 | 39.5 | 26.75 |
| 30 | 3 | 512 | 72 | 43.5 | 30.5 |
| 31 | 4 | 512 | 78 | 47.5 | 34 |
| 32 | 2 | 1024 | 72 | 43 | 29.75 |

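Table 2. ln(x + 1) approximation

| Accuracy (bits) | Guard bits | Segments | Bits per segment, unconstrained | Bits per segment, 2-seg. constraints | Bits per segment, 4-seg. constraints |
|---|---|---|---|---|---|
| 24 | 2 | 128 | 54 | 32 | 21.5 |
| 25 | 3 | 128 | 60 | 36 | 24.5 |
| 26 | 4 | 128 | 66 | 40 | 27.25 |
| 27 | 2 | 256 | 60 | 35.5 | 23.25 |
| 28 | 3 | 256 | 66 | 39.5 | 26.25 |
| 29 | 4 | 256 | 72 | 43.5 | 29.5 |
| 30 | 2 | 512 | 66 | 39 | 26.5 |
| 31 | 3 | 512 | 72 | 43 | 28.5 |
| 32 | 4 | 512 | 78 | 47 | 32.75 |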
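The averages quoted in Section 4 can be recomputed from Tables 1 and 2. A minimal sketch, assuming "average reduction per segment" means the mean of the per-row reductions relative to the unconstrained column (the averaging rule is not stated in the text); with that reading the figures come out at roughly 40 % and 57 % for sin(πx) and roughly 40 % and 60 % for ln(x + 1).

```python
# Bits per segment from Tables 1 and 2: (unconstrained, 2-seg., 4-seg.) per row
SIN = [(57, 34.5, 25), (66, 40.5, 29.5), (60, 36, 24.5), (63, 38, 27.25),
       (72, 44, 31.75), (66, 39.5, 26.75), (72, 43.5, 30.5), (78, 47.5, 34),
       (72, 43, 29.75)]
LN = [(54, 32, 21.5), (60, 36, 24.5), (66, 40, 27.25), (60, 35.5, 23.25),
      (66, 39.5, 26.25), (72, 43.5, 29.5), (66, 39, 26.5), (72, 43, 28.5),
      (78, 47, 32.75)]


def mean_reduction(rows, col):
    """Mean per-row reduction (%) of column col relative to the unconstrained column."""
    return 100 * sum(1 - r[col] / r[0] for r in rows) / len(rows)


if __name__ == "__main__":
    for name, rows in (("sin", SIN), ("ln", LN)):
        print(name, round(mean_reduction(rows, 1), 1), round(mean_reduction(rows, 2), 1))
```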

The area comparison in Table 3 shows that the table size has a serious impact on the resulting design, and that approximations with 4-segment constraints have smaller area in practice than the 2-segment constraint designs and the cubic design, despite the additional adders needed to exploit the data sharing. Table 4 shows that the delay of the quadratic designs is also 22-30 % smaller than that of the cubic design. This is due to one less multiplier on the critical path. These numbers exhibit significant variability, as they are highly sensitive to the low-level optimizations applied during gate-level synthesis.

6. RESULTS

This paper provides a new error bound for the method, proposed in [5], of finding a piecewise polynomial approximation with finite-precision coefficients of optimal bit-width and linear constraints on derivatives for cross-segment data sharing. The error bound guarantees accuracy independently of the additional linear constraints, so the method can be extended with additional constraints and applied to arbitrary accuracy without exhaustive verification.

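Table 3. Design area relative to the cubic design, clk 5 ns

| Accuracy (bits) | Quadratic, 4-seg. constraints | Quadratic, 2-seg. constraints | Cubic, no constraints |
|---|---|---|---|
| 24 | 55 % | 72 % | 100 % |
| 32 | 68 % | 84 % | 100 % |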

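Table 4. Best timing relative to the cubic design

| Accuracy (bits) | Quadratic, 4-seg. constraints | Quadratic, 2-seg. constraints | Cubic, no constraints |
|---|---|---|---|
| 24 | 70 % | 74 % | 100 % |
| 32 | 78 % | 74 % | 100 % |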

New linear constraints on derivatives for 4 adjacent segments are proposed. The case study shows that applying the additional constraints substantially reduces the design area compared to the 2-segment constraint case and the cubic interpolation case.

7. SUMMARY

Piecewise polynomial approximation is the method of choice for hardware blocks computing smooth functions with fractional accuracies higher than 18 bits, due to its balance between performance and design complexity. It is used in multiple research papers and in industrial-strength component libraries.

The main result of this paper is a practical method for building a piecewise polynomial approximation with optimal table bit-width for given constraints and with guaranteed accuracy, based on solving an Integer Linear Programming problem.

In addition to the value and derivative constraints on 2 adjacent segments, constraints for 4 adjacent segments were added, leading to further table reduction compared to [5].

So far, only a limited case study has been performed. The results show that the table reduction positively affects the design area without a noticeable impact on timing.

The effect of datapath quantization has not been investigated yet. Manual backward error propagation was used, leading to conservative quantization. More aggressive quantization is expected to save a couple of bits in the multiplier widths, reducing the design area even further.

Bibliography/References

1. Cheney E., Light W. A Course in Approximation Theory, New York: Chelsea, 1999.

2. Gärtner B., Matousek J. Understanding and Using Linear Programming, Berlin: Springer, 2006.

3. Günttner R. On asymptotics for the uniform norms of the Lagrange interpolation polynomials corresponding to extended Chebyshev nodes, SIAM J. Numer. Anal. Vol. 25 (1988). P. 461-469.

4. Lokuzievsky O., Gavrikov M. Numerical Analysis Essentials [in Russian], Moscow: Janus, 1995.

5. Strollo A. G. M., De Caro D., Petra N. Elementary Functions Hardware Implementation Using Constrained Piecewise-Polynomial Approximations, IEEE Transactions on Computers. Vol. 60, № 3. P. 418-432, March 2011.

6. Synopsys DesignWare DW_sincos http://www.synopsys.com/dw/doc.php/doc/dwf/datasheets/ dw sincos.pdf (accessed 30.10.2012).

7. Synopsys DesignWare DW_ln http://www.synopsys.com/dw/doc.php/doc/dwf/datasheets/dw ln.pdf (accessed 30.10.2012).

8. Synopsys DesignWare trigonometry overview http://www.synopsys.com/dw/doc.php/doc/dwf/ datasheets/trig overview.pdf (accessed 30.10.2012).

Abstract

An improvement to piecewise polynomial approximation in hardware is proposed. A new error bound is given for low-order polynomial interpolation with pointwise constraints on a uniform grid. A method of table size reduction and near-optimal quantization of coefficients using intersegment constraints and mixed integer programming with guaranteed accuracy is proposed. A case study shows up to 60 % table size reduction compared to unconstrained polynomials. Gate-level RTL synthesis shows that table reduction has a noticeable impact on the design area.

Keywords: piecewise polynomial approximation, interpolation, Lebesgue numbers, RTL synthesis.

Салищев Сергей Игоревич, senior lecturer, Department of Informatics, Saint Petersburg State University; engineer, Intel laboratory; sergey.i.salishev@gmail.com.
