
References

1. Воронов А. А., Титов В. К., Новогранов Б. Н. Основы теории автоматического регулирования. М. : Высш. шк., 1977.

2. Медведев А. В. Адаптация в условиях непараметрической неопределенности // Адаптивные системы и их приложения : сб. науч. тр. Новосибирск : Наука, 1978. С. 4-34.

3. Медведев А. В. Непараметрические системы адаптации. Новосибирск : Наука, 1983.

4. Уткин В. И. Скользящие режимы и их применения в системах с переменной структурой. М. : Наука, 1974.

5. Kjaer M. A. Sliding Mode Control. Dept. of Automatic Control, Lund Institute of Technology, Sweden, 2004.

6. Slotine J.-J. E., Li W. Applied Nonlinear Control. Englewood Cliffs : Prentice-Hall, 1991.

7. Самарский А. А. Теория разностных схем. М. : Наука, 1989.

E. D. Agafonov

NONPARAMETRIC CONTROL ALGORITHM FOR NONLINEAR DYNAMIC SYSTEMS USING SLIDING MODES

The paper presents a new control algorithm for nonlinear dynamic systems based on the sliding-mode control approach. The SISO object is represented by its nonparametric finite-difference model in state space. The paper gives recommendations for tuning and optimizing the control algorithm. The proposed algorithm is implemented in the MATLAB/Simulink technical computing environment. As an illustrative example, we present the results of controlling an inverted pendulum.

Keywords: nonlinear dynamic system, finite difference model, nonparametric control, Lyapunov stability, sliding mode control.

© Агафонов Е. Д., 2010

UDC 519.234

K. Zablotskaya, S. Walter, S. Zablotskiy, W. Minker

DETERMINATION OF VARIABLES SIGNIFICANCE USING ESTIMATIONS OF THE FIRST-ORDER PARTIAL DERIVATIVE

In this paper we describe and investigate a method which allows us to detect the most informative features among all data extracted from a certain data corpus. The significance of an input feature is estimated as the average absolute value of the first-order partial derivative. The method requires the values of the objective function at certain assigned points. If there is no possibility to calculate these values (the object is not available for experiments), we use non-parametric kernel regression to approximate them. The algorithm is tested on different simulated objects and is used to investigate the dependency between linguistic features of spoken utterances and speakers' capabilities.

Keywords: non-parametric kernel regression, first-order partial derivative.

In our research we investigate whether there is a dependency between the spoken utterances of a person and his or her capabilities. For this purpose we collected a corpus of monologues and dialogues of different speakers [1], whose verbal intelligence was measured with an intelligence test [2]. From this corpus we try to extract information relevant enough for clustering, classification, regression, or other data mining tasks. There are normally many different features which could be extracted from the monologues and dialogues, but their importance or relevance is not always obvious; most of them are noisy, which makes the analysis of the data increasingly difficult. When working with high-dimensional spaces, the computational effort required by data analysis tools may be tremendous. It is therefore essential to detect irrelevant or weakly correlated features and exclude them from consideration.

There exist different solutions to this problem. One of them is the use of Pearson's coefficient or the coefficient of multiple correlation. However, a Pearson's coefficient close to 0 does not mean that the output and input variables are uncorrelated in general; it only shows that there is no linear dependency between them. Such features should not be excluded from consideration without additional analysis. Another approach to decreasing the number of features is Principal Component Analysis (PCA). This method applies a mathematical procedure that transforms correlated variables into a smaller number of uncorrelated ones called principal components, but it does not determine the contribution of a certain feature to the objective function.
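As a minimal numeric illustration of this limitation of Pearson's coefficient (hypothetical data; a Python sketch):

```python
import numpy as np

# A purely quadratic dependency: y is fully determined by x,
# yet the linear (Pearson) correlation is close to zero.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 1000)
y = x ** 2

r = np.corrcoef(x, y)[0, 1]
print(f"Pearson's r = {r:.3f}")  # near 0 despite a perfect functional dependency
```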

In this paper we describe a method which determines the most informative features even if the dependency between input variables and the output is not linear.

Determination of the Most Informative Features. To determine whether there is a dependency between the input features (or extracted features) and the output, we perform a series of experiments on the object (if it is available) or create a model using non-parametric kernel regression, and estimate the average first-order partial derivative with respect to each input feature. The feature with the largest average partial derivative is the most important. The algorithm may be described in the following way.

Non-parametric kernel regression (NPR) allows us to create a model from the data set $x_1[t], \dots, x_n[t]$, $y[t]$, $t = \overline{1, s}$, without additional knowledge about the dependency structure [3; 4]. NPR estimates the dependency between inputs and outputs using a weighted average of the observations $y[t]$:

$$\hat{y}(x) = M_s\{Y \mid x\} = \frac{\displaystyle\sum_{t=1}^{s} y[t] \prod_{i=1}^{n} \Phi\!\left(\frac{x_i - x_i[t]}{C_i}\right)}{\displaystyle\sum_{t=1}^{s} \prod_{i=1}^{n} \Phi\!\left(\frac{x_i - x_i[t]}{C_i}\right)},$$

where $C_i$ is the bandwidth (smoothing parameter) and $\Phi(z)$ is a kernel function.

The kernel function assigns a weight to each observation, and the weighted sum of the $y[t]$ estimates the output at any point $x$. The parameters $C_i$ determine how many points from the training data set contribute to $\hat{y}(x)$: observations closer to $x$ receive larger weights and influence $\hat{y}(x)$ more strongly. If the $C_i$ are too large, too many observations are taken into account and the model loses precision. These parameters should therefore be trained on the existing data set, and the $C_i$ providing the smallest mean square error (MSE) are used for further investigations.
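The paper does not specify the kernel or the bandwidth-tuning procedure. The sketch below assumes a Gaussian product kernel and a leave-one-out grid search over candidate bandwidth vectors as one plausible realization; all names are illustrative:

```python
import numpy as np

def npr_predict(x, X, y, C):
    """Nadaraya-Watson estimate of the output at point x.

    X : (s, n) observations x_i[t];  y : (s,) outputs y[t];
    C : (n,) bandwidths C_i.  A Gaussian kernel Phi is assumed here.
    """
    z = (x - X) / C                                # (s, n)
    weights = np.exp(-0.5 * z ** 2).prod(axis=1)   # product kernel per observation
    den = weights.sum()
    if den == 0.0:                                 # all points outside kernel support
        return float(y.mean())
    return float(np.sum(weights * y) / den)

def tune_bandwidths(X, y, grid):
    """Pick the C minimizing the leave-one-out MSE over a candidate grid."""
    s = len(X)
    best_C, best_mse = None, np.inf
    for C in grid:                                 # grid: iterable of (n,) vectors
        preds = [npr_predict(X[t], np.delete(X, t, 0), np.delete(y, t), C)
                 for t in range(s)]
        mse = float(np.mean((np.asarray(preds) - y) ** 2))
        if mse < best_mse:
            best_C, best_mse = C, mse
    return best_C, best_mse
```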

Let an object have an input vector $x = (x_1, x_2, \dots, x_n)$ and an output $y = f(x)$. A feature $x_i$ is informative if its average influence on the output is significant while the other $n-1$ features are held fixed. We estimate this significance as the average absolute value of the first-order partial derivative with respect to this variable.

Let the variables $x = (x_1, x_2, \dots, x_n)$ belong to the intervals $[a_1; b_1], [a_2; b_2], \dots, [a_n; b_n]$. We generate random values $\{x_1[1], \dots, x_1[m],\ x_2[1], \dots, x_2[m], \dots\}$ in the corresponding intervals, where $m$ is a predefined value. To get a precise estimation of the average first-order partial derivative, we generate these random values near one observation value $x[l]$, $l = \overline{1, s}$, so that

$$\Phi\!\left(\frac{x_i[k] - x_i[l]}{C_i}\right) > 0 \quad \text{for all } k = \overline{1, m}.$$

Then we fix the features $(x_2, \dots, x_n)$ at some points, for example at $(\tilde{x}_2[1], \tilde{x}_3[1], \dots, \tilde{x}_n[1])$.

The outputs of the goal function are estimated at the following points:

$$y_1^+ = f(x_1[1] + h_1^+,\ \tilde{x}_2[1], \dots, \tilde{x}_n[1]), \quad y_1^- = f(x_1[1] - h_1^-,\ \tilde{x}_2[1], \dots, \tilde{x}_n[1]),$$

$$y_2^+ = f(x_1[2] + h_2^+,\ \tilde{x}_2[1], \dots, \tilde{x}_n[1]), \quad y_2^- = f(x_1[2] - h_2^-,\ \tilde{x}_2[1], \dots, \tilde{x}_n[1]), \quad \dots,$$

where $h_j^+$ and $h_j^-$ are random values from a small interval (for example, $h \in [0.01; 0.5]$).

The first component of the average first-order partial derivative with respect to $x_1$ is estimated as

$$\hat{f}'_{x_1}[1] = \frac{1}{m}\left(\frac{|y_1^+ - y_1^-|}{h_1^+ + h_1^-} + \frac{|y_2^+ - y_2^-|}{h_2^+ + h_2^-} + \dots + \frac{|y_m^+ - y_m^-|}{h_m^+ + h_m^-}\right).$$

Then the features $(x_2, \dots, x_n)$ are fixed at other points, for example at $(\tilde{x}_2[2], \tilde{x}_3[1], \dots, \tilde{x}_n[1])$, and the same procedure is repeated for $x_1$. The average absolute value of the partial derivative in the neighborhood of $x[l]$ is estimated as

$$\hat{f}'_{x_1}(x[l]) = \frac{\hat{f}'_{x_1}[1] + \hat{f}'_{x_1}[2] + \dots + \hat{f}'_{x_1}[M]}{M},$$

where $M = m^{\,n-1}$ is the number of all possible combinations of the fixed features. As these random values have been generated in the neighborhood of a single observation, only a small part of the space is investigated. We therefore generate new values $\{x_1[1], \dots, x_1[m],\ x_2[1], \dots, x_2[m], \dots\}$ next to another observation point $x[l']$ and find $\hat{f}'_{x_1}(x[l'])$ in the same way. This procedure is repeated $K$ times, where $K$ is a predefined value, and the average absolute value of the partial derivative is estimated as

$$\hat{f}'_{x_1} = \frac{1}{K} \sum_{l=1}^{K} \hat{f}'_{x_1}(x[l]).$$

The average absolute values $\hat{f}'_{x_i}$, $i = \overline{1, n}$, are estimated in the same way.
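A compact sketch of the whole significance procedure, using npr_predict from the previous sketch as the surrogate objective. The enumeration of all $m^{n-1}$ combinations of fixed features is simplified here to random draws near the chosen observation, and the neighborhood radius 0.5 is an assumed value:

```python
def significance(i, X, y, C, bounds, m=2, K=20, h_range=(0.01, 0.5), rng=None):
    """Average |df/dx_i| of the NPR model, estimated by central differences.

    bounds : list of (a_j, b_j) intervals for each feature.
    """
    rng = rng or np.random.default_rng()
    f = lambda point: npr_predict(point, X, y, C)       # surrogate objective
    derivs = []
    for _ in range(K):                                  # K observation neighborhoods
        l = rng.integers(len(X))
        for _ in range(m):                              # combinations of fixed features
            # random point near observation x[l], clipped to the feature bounds
            base = np.array([rng.uniform(max(a, X[l, j] - 0.5),
                                         min(b, X[l, j] + 0.5))
                             for j, (a, b) in enumerate(bounds)])
            for _ in range(m):                          # perturbations along x_i
                h_plus, h_minus = rng.uniform(*h_range, size=2)
                up, down = base.copy(), base.copy()
                up[i] += h_plus
                down[i] -= h_minus
                derivs.append(abs(f(up) - f(down)) / (h_plus + h_minus))
    return float(np.mean(derivs))
```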

Investigation of the Algorithm. In this section we show the results of the algorithm's work when the object is not available for experiments, i.e. only collected data exist. In the following experiments the average absolute value of the partial derivative is estimated with M = 2 and K = 20. The function used to simulate the object is $f(x) = 5x_1 + 0.5x_2 - 10x_3 + 0.1x_4 + 2x_5$.

In our first experiment the non-parametric regression model is trained using all the input variables ($C_i = [0.4; 1.7; 0.3; 1.8; 0.8]$, MSE = 0.08). Then we remove the first feature $x_1$ from the data set. This simulates the situation of incomplete data, in which the most informative of the remaining features should nevertheless be found ($C_i = [1.9; 0.3; 1.9; 0.9]$, MSE = 0.39). The results of the algorithm are shown in Table 1. As can be seen, the algorithm was able to find the most important features in both cases.
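An illustrative run on the first test function might look as follows (the bandwidths are assumed rather than tuned, so the derivative estimates will differ from those in Table 1, but the ranking should match):

```python
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 5))
y = 5*X[:, 0] + 0.5*X[:, 1] - 10*X[:, 2] + 0.1*X[:, 3] + 2*X[:, 4]

C = np.full(5, 0.3)                     # assumed bandwidths; tune via MSE in practice
bounds = [(0.0, 1.0)] * 5
scores = [significance(i, X, y, C, bounds, rng=rng) for i in range(5)]
ranking = np.argsort(scores)[::-1] + 1  # most informative feature first
print("feature ranking:", ranking)      # expected order: x3, x1, x5, x2, x4
```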


Table 1. Results of the algorithm's work

| Feature | Real $f'_x$ (rank), 5 inputs | Algorithm's $\hat{f}'_x$ (rank), 5 inputs | Real $f'_x$ (rank), 4 inputs | Algorithm's $\hat{f}'_x$ (rank), 4 inputs |
|---|---|---|---|---|
| $x_1$ | 5.0 (2) | 3.42 (2) | - | - |
| $x_2$ | 0.5 (4) | 1.40 (4) | 0.5 (3) | 0.39 (3) |
| $x_3$ | 10.0 (1) | 6.51 (1) | 10.0 (1) | 7.97 (1) |
| $x_4$ | 0.1 (5) | 1.20 (5) | 0.1 (4) | 0.28 (4) |
| $x_5$ | 2.0 (3) | 2.10 (3) | 2.0 (2) | 1.39 (2) |

Table 2. Results of the algorithm's work

| Feature | Real $f'_x$ (rank), 5 inputs | Algorithm's $\hat{f}'_x$ (rank), 5 inputs | Real $f'_x$ (rank), 8 inputs | Algorithm's $\hat{f}'_x$ (rank), 8 inputs |
|---|---|---|---|---|
| $x_1$ | 3.83 (5) | 3.15 (5) | 3.83 (5) | 2.71 (5) |
| $x_2$ | 4.23 (4) | 3.29 (4) | 4.23 (4) | 3.24 (4) |
| $x_3$ | 4.38 (3) | 3.50 (3) | 4.38 (3) | 4.38 (3) |
| $x_4$ | 7.05 (2) | 4.24 (2) | 7.05 (2) | 4.43 (2) |
| $x_5$ | 30.0 (1) | 31.48 (1) | 30.0 (1) | 26.23 (1) |
| $x_6$ | - | - | 0.027 (7) | 0.43 (7) |
| $x_7$ | - | - | 0.021 (8) | 0.07 (8) |
| $x_8$ | - | - | 0.06 (6) | 0.74 (6) |

Now let us use the following function for generating the input and output data: $f(x) = 7\sin(x_1) + 6\cos(x_2) - 8\sin(x_3) - 10\cos(x_4) + 5x_5^2$. In this dependency there are no features whose influence on the output is linear, which is a more complex situation for the algorithm. However, if the model is trained well ($C_i = [1.0; 0.6; 0.7; 0.5; 0.1]$, MSE = 0.32), the algorithm gives good results (see Table 2).

Let us use the same function to simulate the data set and add three more features to the input variables, simulating a situation in which the data set is large and not all features influence the output. The additional input features are $x_6 = 0.05\sin(t)$, $x_7 = 0.03\cos(t)$, $x_8 = 0.01t^2$. The coefficients of these features are small, so $x_6$, $x_7$ and $x_8$ act as noise with respect to the output. In this case we use all the features to train the model. The results are shown in Table 2. The algorithm was able to find both the most informative and the least informative features ($C_i = [1.0; 0.6; 0.7; 0.5; 0.1; 1.9; 1.5; 1.4]$, MSE = 0.32).

Finally, let us simulate the data set with the function

$$f(x) = 0.2\sin(2x_1) + 2\cos(8x_2) + 5\sin(x_3) + 0.1x_4 + 0.5x_5 + x_6 + 2x_7 + 3x_8 + 4x_9 + 5x_{10},$$

and take away the features $x_1$, $x_2$ and $x_3$. The results of the algorithm ($C_i = [1.5; 1.4; 1.0; 0.9; 0.5; 0.5; 0.5]$, MSE = 0.3) are given in Table 3.

Analyzing the results in the tables, we may conclude that the algorithm combined with the non-parametric model is able to find the most informative features. The method can be used for analyzing high-dimensional data sets, allowing the least informative features to be excluded from consideration.


Table 3. Results of the algorithm's work

| Feature | Real $f'_x$ (rank) | Algorithm's $\hat{f}'_x$ (rank) |
|---|---|---|
| $x_1$ | - | - |
| $x_2$ | - | - |
| $x_3$ | - | - |
| $x_4$ | 0.1 (7) | 0.61 (7) |
| $x_5$ | 0.5 (6) | 0.71 (6) |
| $x_6$ | 1 (5) | 1.26 (5) |
| $x_7$ | 2 (4) | 1.80 (4) |
| $x_8$ | 3 (3) | 3.79 (3) |
| $x_9$ | 4 (2) | 4.15 (2) |
| $x_{10}$ | 5 (1) | 4.83 (1) |

Experiments with the Corpus. We analyzed different features extracted from monologues of German native speakers using the algorithm described above. The corpus consists of transcribed descriptions of a short film by different participants. German native speakers of different ages and educational levels were asked to watch a short film and to describe it in their own words. The film was about an experiment on how long people can go without sleep. The participants were also asked to take an intelligence test whose verbal part consists of 6 subtests. The first subtest, «Information», measures general knowledge with 25 culture-specific questions, for example, «What is the capital of Russia?». Overall, 56 participants were tested and 3 hours 30 minutes of audio data were collected.

To extract features from the monologues, all the words from the descriptions were compared with a special dictionary [5]. The dictionary consists of words sorted into 64 categories. For example, the category «Articles» contains the words die, das, der, ein, eine, einen, etc. Each word in the dictionary may refer to several categories; for example, the word traurig (sad) refers to the categories «Affect», «Negative emotion» and «Sadness». We analyzed all the monologues, counted the number of words belonging to each category and divided these counts by the total number of words in each monologue. In this way we obtained 64 characteristics for each of the 56 monologues. Our task was to investigate the dependency between these 64 features and the results of the subtest «Information», and to find several informative features among the 64 characteristics.
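A sketch of this feature-extraction step (the dictionary representation and all names are hypothetical; the actual dictionary of [5] may be structured differently):

```python
from collections import Counter

def category_frequencies(monologue_words, dictionary):
    """Relative frequency of each dictionary category in one monologue.

    dictionary : mapping word -> set of category names; a word may belong
    to several categories, e.g. 'traurig' -> {'Affect', 'Negative emotion',
    'Sadness'}.
    """
    counts = Counter()
    for word in monologue_words:
        for category in dictionary.get(word.lower(), ()):
            counts[category] += 1
    total = len(monologue_words)
    return {cat: n / total for cat, n in counts.items()}
```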

We combined 4 or 5 features at a time, trained the non-parametric model and applied our method. As a result, the category «Affection» had the largest value of the first-order partial derivative and was estimated as the most informative feature. «Positive emotions» and «Negative emotions» are subcategories of «Affection» and are also relevant according to our algorithm, whereas «Anger» and «Optimism» do not have large values of $\hat{f}'_x$. The category «Cognitive mechanism» is estimated as irrelevant; however, the category «Cause», which is a subcategory of «Cognitive mechanism», turns out to be more important.

Discussion and Future Work. The goal of this work was to apply the method to the corpus. In each combination of features, the category of emotions was determined to be the most informative feature. This means that there is a dependency between a speaker's general knowledge and the number of emotional words used in his or her speech. We could not find any references describing this dependency; only in LEAS [6] is emotional intelligence measured linguistically, and no correlation between the two was found there. The small size of the data set used for our algorithm also influenced the results. Moreover, these emotional words may form a subcategory of some other category which was not analyzed: for example, they may constitute a group of frequently used words, or they may be derived from abstract words, which reflect the level of intelligence in spoken utterances. This research and its results are preliminary; in future work we are going to investigate this phenomenon further, to find other linguistic features which reflect verbal intelligence and to collect more data for more precise estimations.

References

1. Zablotskaya K., Walter S., Minker W. Speech data corpus for verbal intelligence estimation // Proceedings of the International Conference on Language Resources and Evaluation (LREC). 2010.

2. Wechsler D. Handanweisung zum Hamburg-Wechsler-Intelligenztest fuer Erwachsene (HAWIE). Bern; Stuttgart; Wien : Huber, 1982.

3. Nadaraya E. On estimating regression // Theory of Probability and its Applications. 1964. Vol. 10. P. 186-190.

4. Watson G. Smooth regression analysis // Sankhya: The Indian Journal of Statistics. 1964. Vol. 26. P. 359-372.

5. Wolf M., Horn A. B., Mehl M. R. et al. Computergestuetzte quantitative Textanalyse // Diagnostica. 2008. Vol. 54, Heft 2. P. 85-98.

6. Lane R. D., Schwartz G. E. Levels of emotional awareness: a cognitive-developmental theory and its application to psychopathology // Am J Psychiatry. 1987. P. 113-143.

© Zablotskaya K., Walter S., Zablotskiy S., Minker W., 2010

UDC 519.234

S. Zablotskiy, T. Müller, W. Minker

ESTIMATION OF RADIO SIGNAL QUALITY DEGRADATION BY MEANS OF NEURAL NETWORK AND NON-PARAMETRIC REGRESSION MODEL

In this paper we present an approach which allows us to avoid expensive and time-consuming subjective assessments of audio quality degradation caused by distortions of various natures that arise while transmitting and receiving a stereo audio signal through the radio channel. The approach is based on the basic version of PEAQ (Perceptual Evaluation of Audio Quality), originally developed mainly for audio codec evaluation. The MOV (Model Output Variables) vector of the PEAQ method is mapped to the audio quality degradation scale using two different models: neural networks and non-parametric regression. The results of the two independent approaches are compared.

Keywords: PEAQ, audio quality degradation, neural network, non-parametric regression.

The manufacturers of radio receivers and other radio equipment have to estimate the quality of a new product in comparison with existing equipment. Among other things, the perceived degradation with respect to the original (reference) audio signal has to be taken into consideration, because humans with their own listening comprehension are the intended end-users of the designed equipment. Since high-quality, reliable subjective assessments are very expensive and time-consuming, a tool for automatic perceptual evaluation of audio quality degradation is strongly desired. This is the fundamental idea behind the PEAQ method, as specified in the ITU-R BS.1387 recommendation [1]. According to this recommendation, the PEAQ measurement method is applicable to most types of audio signal processing equipment, both digital
