Scientific article on the topic 'HUMAN ACTION RECOGNITION METHOD BASED ON CONFORMAL GEOMETRIC ALGEBRA AND RECURRENT NEURAL NETWORK'

HUMAN ACTION RECOGNITION METHOD BASED ON CONFORMAL GEOMETRIC ALGEBRA AND RECURRENT NEURAL NETWORK. Text of a scientific article in the specialty «Medical technologies»

CC BY
Keywords
ACTIVITY RECOGNITION / PRINCIPAL COMPONENTS ANALYSIS / CONFORMAL GEOMETRIC ALGEBRA / DEEP LEARNING

Abstract of the scientific article on medical technologies, authors: Nguyen Nang Hung Van, Pham Minh Tuan, Do Phuc Hao, Pham Cong Thang, Tachibana Kanta

Introduction: Deep Learning plays an important role in machine learning and artificial intelligence. It is widely applied in many fields with high-dimensional data, including natural language processing and image recognition. High-dimensional data can lead to problems in machine learning, such as overfitting and degradation of accuracy. To address these issues, several methods have been proposed that reduce the dimensionality of the data and the computational complexity simultaneously. The drawback of these methods is that they only work well on data distributed on a plane. For data distributed on a hyper-sphere, such as objects moving in space, the results are not as good as expected. Purpose: To use Conformal Geometric Algebra to extract features and simultaneously reduce the dimensionality of a dataset for human activity recognition with a Recurrent Neural Network. Results: Human activity data in a 3-dimensional coordinate system is pre-processed and normalized by calculating deviations from the mean coordinate. Next, the data is transformed into vectors in Conformal Geometric Algebra space and its dimensionality is reduced to obtain the feature vectors. Finally, a Recurrent Neural Network model is trained on the feature vectors. Experiments on the Motion Capture dataset with eight actions show that Conformal Geometric Algebra combined with a Recurrent Neural Network achieves a best test result of 92.5 %. Practical relevance: Some human actions, such as jumping or dancing, are performed in place, while others, such as running or walking, move through space. Therefore, a method is needed to standardize actions. For data distributed on a hyper-sphere, the developed method extracts features and simultaneously reduces the dimensionality of a dataset for human activity recognition with a Recurrent Neural Network.


Text of the scientific paper on the topic «HUMAN ACTION RECOGNITION METHOD BASED ON CONFORMAL GEOMETRIC ALGEBRA AND RECURRENT NEURAL NETWORK»

THEORETICAL AND APPLIED MATHEMATICS

UDC 004.93 Articles

doi:10.31799/1684-8853-2020-5-2-11

Human action recognition method based on conformal geometric algebra and recurrent neural network

Nguyen Nang Hung Van (a), PhD Student, Specialist, orcid.org/0000-0002-9963-7006, nguyenvan@dut.udn.vn

Pham Minh Tuan (a), PhD, Lecturer, orcid.org/0000-0001-9843-9676, pmtuan@dut.udn.vn

Do Phuc Hao (b), M. Sc., Lecturer, orcid.org/0000-0003-0645-0021, haodp@dau.edu.vn

Pham Cong Thang (a), PhD, Lecturer, orcid.org/0000-0002-6428-102X, pcthang@dut.udn.vn

Tachibana Kanta (c), PhD, Associate Professor, orcid.org/0000-0002-8675-7842, kanta@cc.kogakuin.ac.jp

(a) The University of Danang, University of Science and Technology, Information Technology Faculty, 54 Nguyen Luong Bang St., Da Nang 550000, Vietnam

(b) Danang Architecture University, 566 Nui Thanh St., Da Nang 550000, Vietnam

(c) Kogakuin University, 1-24-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo 163-8677, Japan


Keywords — activity recognition, Principal Components Analysis, Conformal Geometric Algebra, Deep Learning.

For citation: Nguyen Nang Hung Van, Pham Minh Tuan, Do Phuc Hao, Pham Cong Thang, Tachibana Kanta. Human action recognition method based on conformal geometric algebra and recurrent neural network. Informatsionno-upravliaiushchie sistemy [Information and Control Systems], 2020, no. 5, pp. 2-11. doi:10.31799/1684-8853-2020-5-2-11

Introduction

Deep Learning (DL) has become a major research trend in recent years, with applications such as image processing, object detection, and remote control [1-4]. DL has two main models: the Convolutional Neural Network (CNN), used for feature extraction in image processing [5, 6], and the Recurrent Neural Network (RNN), used for sequence and time-series recognition [7].

The drawback of a plain Neural Network (NN) model is that each input x is handled independently and produces its output y without exchanging any information between inputs [8]. An RNN contains internal loops that store this information and pass it from one step of the network to the next. An RNN can therefore take image and video data converted into sequences as input for recognition or prediction problems. However, the most challenging problem in DL remains the choice of data preprocessing and feature extraction techniques for training models.

Some commonly used machine learning methods, such as Principal Components Analysis (PCA) [9, 10], Principal Components Regression (PCR) [11], and Multi-class Linear Discriminant Analysis (MLDA) [12], were proposed to reduce the dimensionality of the data and the computational complexity of training simultaneously. These methods only work well with data distributed on a plane, as in face recognition or image classification [13]. For data distributed on a hyper-sphere, e.g. objects moving in space, it is difficult to obtain accurate results with the above methods. To address this issue, in this paper we propose to use Conformal Geometric Algebra (CGA) to extract features and reduce the number of data dimensions when training RNN models.

In recent years, a number of studies have successfully applied Geometric Algebra (GA) to dimensionality reduction in applications such as color image processing, signal processing, and time-series analysis [14-16]. CGA is a part of GA, and a vector in CGA space is called a conformal vector. Each conformal vector has $m + 2$ dimensions and represents a hyper-plane or a hyper-sphere (see [17-20] for more details).

In this work, we propose to use principal components in the $(m + 2)$-dimensional CGA space. The feature vectors are obtained by eliminating the less relevant components. We propose to transfer data from the real space $\mathbb{R}^m$ to conformal vectors, i.e. a set of points $P \in \mathcal{G}^{m+1,1}$ in CGA space. Selecting the principal components requires determining eigenvalues and eigenvectors. The eigenvectors of $A$ (conformal vectors) are arranged in descending order of their eigenvalues. Then, the $k$ eigenvectors with the smallest eigenvalues are removed to reduce the data dimensions. This yields conformal vectors $B$ with $m - k$ main components, which are used to train an RNN model.

Related works

Interest in human action recognition with DL models has grown recently. To build a training model, data must first be collected via sensors or cameras [21, 22]. Next, preprocessing and machine learning methods are used to extract object features. Finally, an RNN model performs the action recognition.

Several methods, such as PCA, LDA, and PCR, are used in feature extraction to reduce data dimensionality. However, these methods are linear, and it is hard for them to represent 3D relationships such as linear motion or rotation. For example, a joint moves and rotates around its parent joint. Hence, the motion data are distributed on a sphere (or hyper-sphere) centered at the coordinates of the parent joint.

Next, we present some feature extraction methods.

Principal Components Analysis

The Principal Components Analysis algorithm [9, 10] is commonly used to map a dataset from a high-dimensional space into a lower-dimensional space while ensuring that the variance of the input data along each new dimension is as large as possible.

Given a training set $X = \{x_i \mid x_i \in \mathbb{R}^d\}$, $i \in \{1, \ldots, n\}$, where $x_i$ is a vector in $d$-dimensional space and $n$ is the number of vectors in $X$, PCA performs a linear transformation that converts the data into a new coordinate system. The linear transformation is defined by the scalar product of the vector $x$ and a unit weight vector $w \in \mathbb{R}^d$ with $\|w\| = 1$. The problem becomes finding the weight vector for which the variance of the linear transformation $w^T x$ is largest. We need to solve the problem:

$$\max_{w} \frac{1}{n} \sum_{i=1}^{n} \left(w^T x_i - w^T \mu\right)^2 \quad \text{s.t.} \quad \|w\|^2 = 1, \tag{1}$$

where the average of all vectors in the dataset $X$ is defined as follows:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i. \tag{2}$$

To solve this optimization problem, this paper introduces the Lagrange multiplier $\lambda > 0$ and the Lagrange function

$$L(w, \lambda) = \frac{1}{n} \sum_{i=1}^{n} \left(w^T x_i - w^T \mu\right)^2 - \lambda\left(\|w\|^2 - 1\right). \tag{3}$$

Setting the derivative of $L(w, \lambda)$ with respect to $w$ to zero gives the following formula:

$$\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T w = \lambda w. \tag{4}$$
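The step from Eq. (3) to Eq. (4) can be spelled out as follows. This is a short sketch that uses only the symbols defined above, together with the identity $(w^T x_i - w^T \mu)^2 = w^T (x_i - \mu)(x_i - \mu)^T w$; the common factor 2 is cancelled in the last line.

```latex
\begin{align*}
\frac{\partial L}{\partial w}
  &= \frac{2}{n}\sum_{i=1}^{n}(x_i-\mu)(x_i-\mu)^{T}w \;-\; 2\lambda w = 0 \\
  &\Longrightarrow\quad
  \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)(x_i-\mu)^{T}w = \lambda w .
\end{align*}
```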

So the optimization problem is solved by the eigendecomposition

$$C w = \lambda w, \tag{5}$$

where

$$C = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T. \tag{6}$$

In Eq. (5), $C$ is the covariance matrix of the dataset $X$. Finally, PCA reduces the number of dimensions of the data by using the first $k$ eigenvectors, i.e. those corresponding to the $k$ largest eigenvalues. This means that the original dataset is approximated by data of lower dimensionality. The feature $f_{PCA}(x)$ can be extracted from a vector $x$ using the first $k$ eigenvectors as follows:

$$f_{PCA}(x) = \left((x - \mu)^T w_1, \ldots, (x - \mu)^T w_k\right)^T, \tag{7}$$

where $w_i$ is the $i$th eigenvector, $1 \le i \le k$.

Feature extraction with PCA applies a linear transformation to the input data. Hence, it can only represent accurately data distributed on a plane, and the feature extraction results are poor for data distributed on a sphere. Furthermore, PCA uses the 3D coordinates collected in the data only as raw values and does not analyze the 3D relationships between objects.
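The limitation described above can be illustrated numerically. The following is a minimal NumPy sketch of Eqs. (2), (6) and (7), not the authors' implementation; the function name `pca_features` and the two toy datasets (points on a line and points on a circle) are assumptions made for illustration.

```python
import numpy as np


def pca_features(X, k):
    """Project rows of X (n, d) onto the first k principal axes.

    Returns the (n, k) feature matrix of Eq. (7) and all eigenvalues of the
    covariance matrix C of Eq. (6), sorted in descending order.
    """
    mu = X.mean(axis=0)                      # Eq. (2)
    Xc = X - mu
    C = Xc.T @ Xc / X.shape[0]               # Eq. (6)
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # re-sort in descending order
    W = eigvecs[:, order[:k]]                # first k eigenvectors
    return Xc @ W, eigvals[order]


# Toy data (hypothetical): 360 points on a straight line and on a unit circle.
theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
line = np.column_stack([theta, 2.0 * theta + 1.0])
circle = np.column_stack([np.cos(theta), np.sin(theta)])

_, ev_line = pca_features(line, 1)
_, ev_circle = pca_features(circle, 1)

# Share of the total variance captured by the first principal component.
ratio_line = ev_line[0] / ev_line.sum()
ratio_circle = ev_circle[0] / ev_circle.sum()
print(f"line: {ratio_line:.3f}, circle: {ratio_circle:.3f}")
```

For the line, a single component carries essentially all of the variance. For the circle, the variance splits evenly between the two components, so no linear projection can compress the data without loss.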

Conformal Geometric Algebra

Consider a training set $X = \{x_i \mid x_i \in \mathbb{R}^d\}$, $i \in \{1, \ldots, n\}$, represented in real $d$-dimensional space. A CGA space extends the real Euclidean vector space $\mathbb{R}^d$ by adding 2 orthonormal basis vectors.

Thus, a CGA space is defined by $d + 2$ basis vectors $\{e_1, \ldots, e_d, e_+, e_-\}$, where $e_+$, $e_-$ and $e_i$, $i \in \{1, \ldots, d\}$, are defined as follows:

$$e_+^2 = e_+ \cdot e_+ = 1; \quad e_-^2 = e_- \cdot e_- = -1; \quad e_+ \cdot e_- = e_+ \cdot e_i = e_- \cdot e_i = 0, \ \forall i \in \{1, \ldots, d\}. \tag{8}$$

Thus, a CGA space can be expressed as $\mathcal{G}^{d+1,1}$. This paper then defines the converted basis vectors $e_0$ and $e_\infty$ as

$$e_0 = \frac{1}{2}(e_- - e_+), \quad e_\infty = e_- + e_+. \tag{9}$$

From Eq. (8) and (9), it is easy to see that:

$$e_0 \cdot e_0 = e_\infty \cdot e_\infty = 0; \quad e_0 \cdot e_\infty = e_\infty \cdot e_0 = -1; \quad e_0 \cdot e_i = e_\infty \cdot e_i = 0, \ \forall i \in \{1, \ldots, d\}. \tag{10}$$

This training set is re-represented by the set of points $P_i \in \mathcal{G}^{d+1,1}$ in CGA space [23, 15] as follows:

$$P_i = x_i + \frac{1}{2}\|x_i\|^2 e_\infty + e_0 \in \mathcal{G}^{d+1,1}. \tag{11}$$

Hence, a sphere in CGA space is represented as a conformal vector

$$S = s + s_\infty e_\infty + s_0 e_0. \tag{12}$$
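To see why a conformal vector of the form of Eq. (12) can describe a sphere, the inner product of a point with it can be expanded using the products of Eq. (10). The second line of the sketch below additionally assumes the normalized sphere form with center $c$ and radius $r$, which is the standard CGA convention rather than something stated in this paper.

```latex
\begin{align*}
P_i \cdot S
  &= \Bigl(x_i + \tfrac12\|x_i\|^2 e_\infty + e_0\Bigr)\cdot
     \bigl(s + s_\infty e_\infty + s_0 e_0\bigr)
   = x_i\cdot s - s_\infty - \tfrac12\|x_i\|^2 s_0 ,\\
P_i \cdot S
  &= -\tfrac12\bigl(\|x_i - c\|^2 - r^2\bigr)
  \qquad\text{for } S = c + \tfrac12\bigl(\|c\|^2 - r^2\bigr)e_\infty + e_0 .
\end{align*}
```

The inner product vanishes exactly when $x_i$ lies on the sphere, which is why its square is a natural fitting error.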

The estimation is performed by least squares on $d^2(P_i, S)$. The error function is defined as follows:

$$E = \sum_{i=1}^{n} d^2(P_i, S) = \sum_{i=1}^{n} \left(x_i \cdot s - s_\infty - \frac{1}{2}\|x_i\|^2 s_0\right)^2. \tag{13}$$

When minimizing the error function $E$, $s$ can be constrained by $\|s\|^2 = 1$. In this case, the optimization problem becomes:

$$\min \sum_{i=1}^{n} \left(x_i \cdot s - s_\infty - \frac{1}{2}\|x_i\|^2 s_0\right)^2 \quad \text{s.t.} \quad \|s\|^2 = 1. \tag{14}$$

Following [14], the optimization problem is solved by the eigendecomposition

$$A s = \lambda s, \tag{15}$$

where $A$ is the covariance matrix of the training set in CGA space:

$$A = \sum_{i=1}^{n} f(x_i) f^T(x_i). \tag{16}$$

The function $f(x_i)$ is defined as follows:

$$f(x_i) = x_i - f_\infty - \|x_i\|^2 f_0 \in \mathbb{R}^d, \tag{17}$$

where

$$f_\infty = \frac{-\Sigma_4 \sum_{i=1}^{n} x_i + \Sigma_2 \sum_{i=1}^{n} \|x_i\|^2 x_i}{(\Sigma_2)^2 - n \Sigma_4}, \tag{18}$$

$$f_0 = \frac{\Sigma_2 \sum_{i=1}^{n} x_i - n \sum_{i=1}^{n} \|x_i\|^2 x_i}{(\Sigma_2)^2 - n \Sigma_4}, \tag{19}$$

with the sum of squares $\Sigma_2 = \sum_{i=1}^{n} \|x_i\|^2$ and the sum of fourth powers $\Sigma_4 = \sum_{i=1}^{n} \|x_i\|^4$.
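Eqs. (17)-(19) follow from the error function of Eq. (13) by eliminating $s_\infty$ and $s_0$. The derivation below is a reconstruction from the definitions in this section, not a quotation from [14]; $\Sigma_2$ and $\Sigma_4$ denote the sums of squared and fourth-power norms.

```latex
\begin{align*}
\frac{\partial E}{\partial s_\infty}=0 &:\quad
  \Bigl(\sum_{i} x_i\Bigr)\cdot s - n\,s_\infty - \tfrac12\Sigma_2\,s_0 = 0,\\
\frac{\partial E}{\partial s_0}=0 &:\quad
  \Bigl(\sum_{i} \|x_i\|^2 x_i\Bigr)\cdot s - \Sigma_2\,s_\infty - \tfrac12\Sigma_4\,s_0 = 0,\\
\Longrightarrow\quad s_\infty &= f_\infty\cdot s,\qquad \tfrac12\,s_0 = f_0\cdot s,\\
E &= \sum_{i} \bigl(f(x_i)\cdot s\bigr)^2 = s^{T} A\, s .
\end{align*}
```

Minimizing $s^T A s$ under $\|s\|^2 = 1$ is then the eigenproblem of Eq. (15). The linear system is solvable only when $(\Sigma_2)^2 \neq n \Sigma_4$, i.e. when the points do not all lie at the same distance from the origin.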

An eigenvector $s_j$ defines a conformal eigenvector $S_j = s_j + s_{\infty j} e_\infty + s_{0j} e_0$, i.e. a hyper-plane or hyper-sphere fitted to a subset $X_j$; the eigenvalue $\lambda_j$ is the corresponding variance; $s_\infty$ and $s_0$ are the scalar coefficients of the basis vectors $e_\infty$ and $e_0$.
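The fitting procedure of Eqs. (15)-(19) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the function names `cga_features` and `fit_conformal_vectors` are hypothetical, the test data are synthetic points on a sphere, and the recovery of the center and radius assumes the standard CGA sphere normalization ($s_0 = 1$), which the paper does not spell out.

```python
import numpy as np


def cga_features(X):
    """Map rows of X (n, d) to f(x) = x - f_inf - ||x||^2 f_0, Eqs. (17)-(19)."""
    n = X.shape[0]
    r2 = np.sum(X ** 2, axis=1)                 # ||x_i||^2
    sigma2 = r2.sum()                           # sum of squared norms
    sigma4 = np.sum(r2 ** 2)                    # sum of fourth-power norms
    sum_x = X.sum(axis=0)
    sum_r2x = (r2[:, None] * X).sum(axis=0)
    denom = sigma2 ** 2 - n * sigma4            # zero if all norms are equal
    f_inf = (-sigma4 * sum_x + sigma2 * sum_r2x) / denom   # Eq. (18)
    f_0 = (sigma2 * sum_x - n * sum_r2x) / denom           # Eq. (19)
    F = X - f_inf - r2[:, None] * f_0                      # Eq. (17)
    return F, f_inf, f_0


def fit_conformal_vectors(X):
    """Eigen-decompose A = sum_i f(x_i) f(x_i)^T, Eqs. (15)-(16)."""
    F, f_inf, f_0 = cga_features(X)
    A = F.T @ F
    eigvals, eigvecs = np.linalg.eigh(A)        # ascending eigenvalues
    return eigvals, eigvecs, f_inf, f_0


# Synthetic data (hypothetical): 200 points on a sphere that is not centered
# at the origin, so that the denominator of Eqs. (18)-(19) is non-zero.
rng = np.random.default_rng(0)
center, radius = np.array([1.0, 2.0, 3.0]), 0.5
directions = rng.normal(size=(200, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
X = center + radius * directions

eigvals, eigvecs, f_inf, f_0 = fit_conformal_vectors(X)
s = eigvecs[:, 0]                # eigenvector of the smallest eigenvalue
s_inf = f_inf @ s                # coefficient of e_inf
s_0 = 2.0 * (f_0 @ s)            # factor 2: Eq. (19) absorbs the 1/2 of Eq. (13)

est_center = s / s_0
est_radius = np.sqrt(est_center @ est_center - 2.0 * s_inf / s_0)
print(eigvals[0], est_center, est_radius)
```

For points lying exactly on a sphere, the smallest eigenvalue is numerically zero and its eigenvector recovers the sphere, in contrast to PCA, which can only find flat directions of low variance.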

Recurrent Neural Network

Recurrent Neural Network models [7] are mostly used in natural language processing, speech recognition, and action recognition. The learned model always has the same input size, because the input of each state is the output of the previous state. This allows the same transition function with the same parameters to be used at every step. These properties make it possible to learn a single model that operates on all steps and all sequence lengths. Hence, an RNN is able to generalize to sequence lengths that do not appear in the training set, and the model can be estimated from much less data.

In this study, we apply the RNN to data from human action videos (Fig. 1). These actions pass through PCA and CGA to create feature vectors $x$ of size $n$. The computation takes the many-to-one form of the RNN model, in which multiple inputs produce a single output that predicts the action.

Each input value $x_t$ is of size $n$; after passing through the RNN, there is an output value $y$ of size $c$ (the number of classes). Each circle in Fig. 1 is called a state, and the input of each state is $x_t$ and $s_{t-1}$ (the output of the previous state). The output $s_t$ is calculated by the following formula:

$$s_t = \tanh(U x_t + W s_{t-1}), \tag{20}$$

where $s_t$ is of size $m$; $U$ and $W$ are weight matrices [24]: $U$ of size $(m \times n)$ is the coefficient matrix applied to the input $x_t$, and $W$ of size $(m \times m)$ is the coefficient matrix connecting $s_t$ and $s_{t-1}$. In Fig. 1, $V$ of size $(c \times m)$ is the coefficient matrix converting the final state to $y$.

Because there is only one output value, $y$ is determined through the softmax activation function:

$$y = \mathrm{softmax}(V s_n). \tag{21}$$

Thus, from the input sequence $x$, Eq. (21) determines the output value directly.

■ Fig. 1. The many-to-one problem of the RNN model: a) short RNN model; b) RNN model for the application
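The many-to-one computation of Eqs. (20) and (21) can be sketched as a plain NumPy forward pass. This is an untrained illustration, not the network used in the experiments: the function names are hypothetical, the weights are random, the initial state is assumed to be zero, and $W$ is taken as an $(m \times m)$ matrix so that $W s_{t-1}$ is well defined.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def rnn_many_to_one(xs, U, W, V):
    """Run Eq. (20) over the whole sequence, then apply Eq. (21) once.

    xs: (T, n) sequence of feature vectors; U: (m, n); W: (m, m); V: (c, m).
    """
    s = np.zeros(W.shape[0])                 # assumed initial state s_0 = 0
    for x_t in xs:
        s = np.tanh(U @ x_t + W @ s)         # Eq. (20)
    return softmax(V @ s)                    # Eq. (21), one output per sequence


# Hypothetical sizes: n features per frame, m hidden units, c classes, T frames.
n, m, c, T = 5, 200, 8, 30
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(m, n))
W = rng.normal(scale=0.1, size=(m, m))
V = rng.normal(scale=0.1, size=(c, m))
xs = rng.normal(size=(T, n))

y = rnn_many_to_one(xs, U, W, V)
predicted_class = int(np.argmax(y))
print(y.round(3), predicted_class)
```

Because the same `U` and `W` are reused at every step, the function accepts sequences of any length `T` without changing the number of parameters.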

Deep Learning has two major models, CNN and RNN. CNN is designed for image input, and its most common applications in computer vision are classification, object detection, and segmentation. In human action recognition, however, the input is a sequence of frames of an action and the model must predict the corresponding action label (a time-series prediction). Therefore, in this paper, we apply the many-to-one RNN model.

The proposed method

The proposed method analyzes data of moving objects and human actions recorded as markers with coordinates in 3D space. Specifically, we use the CMU (Carnegie Mellon University) [25] motion capture dataset, which consists of eight different actions; each action consists of multiple files and each file consists of a sequence of frames. Each frame contains 41 markers (41 joints), and each marker is represented by its coordinates in 3D space.

■ Fig. 2. The overview of the proposed RNN model (blocks: input data; feature extraction with PCA or CGA)

In this paper, we propose a technique that normalizes the data by shifting the coordinates of all joints back toward the origin, then uses PCA and CGA to extract features. Finally, these feature vectors are used as input values for the RNN (Fig. 2).

Transformation method of coordinates

Some human actions, such as jumping or dancing, are performed in place, while others, such as running or walking, move through space. Therefore, we need a method that standardizes actions so that they become comparable. In this study, we propose to transform all markers to new coordinates by calculating deviations from the mean coordinate before extracting features.

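Given a data set

$$\Xi = \{\xi_i \mid \xi_i \in \mathbb{R}^{t(i) \times m \times 3}\}, \quad i \in \{1, \ldots, n\}, \tag{22}$$

where $\xi_i = [\delta_{i,1}^T, \ldots, \delta_{i,t(i)}^T]^T \in \mathbb{R}^{t(i) \times m \times 3}$ is a vector of the $i$th action; $t(i)$ is the number of frames of the $i$th action; $\delta_{ij} = [\theta_{ij1}^T, \ldots, \theta_{ijm}^T]^T \in \mathbb{R}^{m \times 3}$ is a vector corresponding to the $j$th frame of the $i$th action; and $\theta_{ijk} = [\theta_{ijk1}, \theta_{ijk2}, \theta_{ijk3}]^T \in \mathbb{R}^3$ is the coordinates of the $k$th marker of the $j$th frame of the $i$th action.

Then, we convert action $\xi_i$ to the set of vectors

$$X_i = \{x_{ij} \mid x_{ij} \in \mathbb{R}^{m \times 3}\}, \quad j \in \{1, \ldots, t(i)\}, \tag{23}$$

where $x_{ij}$ is the feature vector corresponding to the $j$th frame of the $i$th action:

$$x_{ij} = [g^T(\theta_{ij1}), \ldots, g^T(\theta_{ijk}), \ldots, g^T(\theta_{ijm})]^T, \tag{24}$$

where $g(\theta_{ijk})$ is a transformation of the coordinates of the $k$th marker of the $j$th frame of the $i$th action. In this paper, we can use the function

$$g(\theta_{ijk}) = \theta_{ijk} \tag{25}$$

or the transformation method of coordinates

$$g(\theta_{ijk}) = \theta_{ijk} - \bar{\theta}_{ij}, \tag{26}$$

where $\bar{\theta}_{ij} = \frac{1}{m} \sum_{k'=1}^{m} \theta_{ijk'}$.

So, we have the new set of vectors as the training set for PCA and CGA_PCA.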
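The coordinate transformation of Eq. (26) amounts to subtracting, in every frame, the mean position of all markers. A minimal NumPy sketch follows, assuming one action is stored as an array of shape `(t, m, 3)`; the function name `center_frames` and the synthetic poses are hypothetical.

```python
import numpy as np


def center_frames(action):
    """Apply Eq. (26) to one action of shape (t, m, 3).

    Each frame has the mean of its m marker positions subtracted, and is then
    flattened to a vector of length 3m as in Eq. (24).
    """
    frame_mean = action.mean(axis=1, keepdims=True)   # (t, 1, 3)
    centered = action - frame_mean
    return centered.reshape(action.shape[0], -1)      # (t, 3m)


# Synthetic example: the same 41-marker pose, once standing still and once
# drifting 5 units along the x axis over 10 frames.
rng = np.random.default_rng(0)
pose = rng.normal(size=(41, 3))
standing = np.repeat(pose[None], 10, axis=0)                       # (10, 41, 3)
drift = np.linspace(0.0, 5.0, 10)[:, None, None] * np.array([1.0, 0.0, 0.0])
walking = pose[None] + drift                                       # (10, 41, 3)

x_standing = center_frames(standing)
x_walking = center_frames(walking)
print(x_walking.shape, np.allclose(x_standing, x_walking))
```

After centering, the drifting and the stationary sequences yield the same feature vectors, so the global displacement no longer dominates the features and only the relative configuration of the markers remains.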

Model combining PCA with RNN

A Recurrent Neural Network combines well with other models for time-series prediction. However, an RNN uses many parameters in each state, which can lead to over-fitting. When PCA is combined with the RNN, PCA reduces the feature dimensions, so the number of network nodes can be reduced while the original information is retained.

The PCA algorithm uses an orthogonal transformation to convert the data set from a high-dimensional space to a new space of lower dimension. The transformation is found by choosing the axes of the new space so that the variance of the data projected on each axis is greatest. From Eq. (22) and (23), we have the new set $X$ as follows:

$$X = \{x_{ij} \mid x_{ij} \in \mathbb{R}^{m \times 3}\}, \quad j \in \{1, \ldots, t(i)\}, \ i \in \{1, \ldots, n\}. \tag{27}$$

Now, we need to solve the problem:

$$\max_{w} \frac{1}{\sum_{i=1}^{n} t(i)} \sum_{i=1}^{n} \sum_{j=1}^{t(i)} \left(w^T x_{ij} - w^T \mu\right)^2 \quad \text{s.t.} \quad \|w\|^2 = 1, \tag{28}$$

where

$$\mu = \frac{1}{\sum_{i=1}^{n} t(i)} \sum_{i=1}^{n} \sum_{j=1}^{t(i)} x_{ij}. \tag{29}$$

To solve the optimization problem of Eq. (28), this paper introduces the Lagrange multiplier $\lambda > 0$ and the Lagrange function

$$L(w, \lambda) = \frac{1}{\sum_{i=1}^{n} t(i)} \sum_{i=1}^{n} \sum_{j=1}^{t(i)} \left(w^T x_{ij} - w^T \mu\right)^2 - \lambda\left(\|w\|^2 - 1\right). \tag{30}$$

Setting the derivative of $L(w, \lambda)$ with respect to $w$ to zero gives the following formula:

$$\frac{1}{\sum_{i=1}^{n} t(i)} \sum_{i=1}^{n} \sum_{j=1}^{t(i)} (x_{ij} - \mu)(x_{ij} - \mu)^T w = \lambda w. \tag{31}$$

Then, the eigenvalues can be obtained from

$$C w = \lambda w, \tag{32}$$

where

$$C = \frac{1}{\sum_{i=1}^{n} t(i)} \sum_{i=1}^{n} \sum_{j=1}^{t(i)} (x_{ij} - \mu)(x_{ij} - \mu)^T. \tag{33}$$

In Eq. (32), $C$ is the covariance matrix of the data set $X$. Finally, PCA reduces the number of dimensions of the data by using the first $k$ eigenvectors. The feature $f_{PCA}(x)$ can be extracted from a vector $x$ using the first $k$ eigenvectors as follows:

$$f_{PCA}(x) = \left((x - \mu)^T w_1, \ldots, (x - \mu)^T w_k\right)^T, \tag{34}$$

where $w_i$ is the $i$th eigenvector, $1 \le i \le k$.

Now, we apply the transform $f_{PCA}(x)$ to the learning model by converting the data set to $T = \{(f_{PCA}(x_{ij}), y_i) \mid x_{ij} \in \mathbb{R}^{m \times 3}, \ y_i \in \{1, \ldots, c\}\}$, $i \in \{1, \ldots, n\}$, where $f_{PCA}(x_{ij})$ and $y_i$ are the feature vector and the label after applying PCA.

Then, we use the data set $T$ as the input data for the RNN model. From Eq. (20), the formula is rewritten as follows:

$$s_j = \tanh(U x_{ij} + W s_{j-1}). \tag{35}$$

Because there is only one output value, $y_i$ is determined through the softmax activation function, and Eq. (21) is rewritten as

$$y_{PCA,i} = \mathrm{softmax}\left(\frac{1}{\sum_{i=1}^{k} t(i)} \sum_{i=1}^{k} \sum_{j=1}^{t(i)} V s_j\right). \tag{36}$$

From Eq. (36) we obtain the classification result for each action. Because PCA is a linear method and assumes that the data are distributed on a plane, it does not give good results when the data are distributed on a hyper-sphere. We therefore further propose combining CGA with the RNN.
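The pooled PCA of Eqs. (29), (33) and (34), in which the mean and covariance are computed over the frames of all actions before each sequence is projected, can be sketched as follows. The function names, the random data and the sequence lengths are assumptions for illustration only; the resulting `(t(i), k)` arrays are what would be fed to the RNN frame by frame.

```python
import numpy as np


def fit_pooled_pca(sequences, k):
    """Fit PCA on the frames of all sequences pooled together.

    sequences: list of arrays of shape (t_i, D) with a common D.
    Returns the mean vector (Eq. 29) and the (D, k) matrix of the first k
    eigenvectors of the covariance matrix (Eq. 33).
    """
    frames = np.concatenate(sequences, axis=0)        # (sum_i t_i, D)
    mu = frames.mean(axis=0)
    Xc = frames - mu
    C = Xc.T @ Xc / frames.shape[0]
    eigvals, eigvecs = np.linalg.eigh(C)              # ascending eigenvalues
    W = eigvecs[:, ::-1][:, :k]                       # k largest first
    return mu, W


def transform(sequence, mu, W):
    """Apply Eq. (34) to every frame of one sequence: (t, D) -> (t, k)."""
    return (sequence - mu) @ W


# Hypothetical data: three actions of different lengths, 41 markers x 3 coords.
rng = np.random.default_rng(0)
lengths = [12, 7, 20]
sequences = [rng.normal(size=(t, 41 * 3)) for t in lengths]

mu, W = fit_pooled_pca(sequences, k=13)
reduced = [transform(seq, mu, W) for seq in sequences]
print([r.shape for r in reduced])
```

Fitting on the pooled frames keeps a single projection for all actions, so sequences of different lengths end up in the same $k$-dimensional feature space.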

Model combining CGA with RNN

This proposal builds the RNN model on CGA space. From Eq. (22) and (23), the data are converted into points in CGA space as follows:

$$P_{ij} = x_{ij} + \frac{1}{2}\|x_{ij}\|^2 e_\infty + e_0 \in \mathcal{G}^{m \times 3 + 1, 1}. \tag{37}$$

The estimation uses least squares on $d^2(P_{ij}, S)$. The error function $E$ is as follows:

$$E = \sum_{i=1}^{n} \sum_{j=1}^{t(i)} d^2(P_{ij}, S) = \sum_{i=1}^{n} \sum_{j=1}^{t(i)} \left(x_{ij} \cdot s - s_\infty - \frac{1}{2}\|x_{ij}\|^2 s_0\right)^2. \tag{38}$$

When minimizing the error function $E$, $s$ can be constrained by $\|s\|^2 = 1$:

$$\min \sum_{i=1}^{n} \sum_{j=1}^{t(i)} \left(x_{ij} \cdot s - s_\infty - \frac{1}{2}\|x_{ij}\|^2 s_0\right)^2 \quad \text{s.t.} \quad \|s\|^2 = 1. \tag{39}$$

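Therefore, the previous problem can be expressed, using a non-negative Lagrange multiplier $\lambda$, as the minimization of

$$L(s, \lambda) = \frac{1}{\sum_{i=1}^{n} t(i)} \sum_{i=1}^{n} \sum_{j=1}^{t(i)} \left(x_{ij} \cdot s - s_\infty - \frac{1}{2}\|x_{ij}\|^2 s_0\right)^2 - \lambda\left(\|s\|^2 - 1\right). \tag{40}$$

The optimal result is obtained by solving the eigenproblem

$$A s = \lambda s, \tag{41}$$

where $A$ is the covariance matrix of the training set in CGA space:

$$A = \sum_{i=1}^{n} \sum_{j=1}^{t(i)} f_{CGA}(x_{ij}) f_{CGA}^T(x_{ij}). \tag{42}$$

The function $f_{CGA}(x_{ij})$ is defined as follows:

$$f_{CGA}(x_{ij}) = x_{ij} - f_\infty - \|x_{ij}\|^2 f_0 \in \mathbb{R}^{m \times 3}, \tag{43}$$

where

$$f_\infty = \frac{-\Sigma_4 \sum_{i=1}^{n} \sum_{j=1}^{t(i)} x_{ij} + \Sigma_2 \sum_{i=1}^{n} \sum_{j=1}^{t(i)} \|x_{ij}\|^2 x_{ij}}{(\Sigma_2)^2 - \Sigma_4 \sum_{i=1}^{n} t(i)}, \tag{44}$$

$$f_0 = \frac{\Sigma_2 \sum_{i=1}^{n} \sum_{j=1}^{t(i)} x_{ij} - \sum_{i=1}^{n} t(i) \sum_{i=1}^{n} \sum_{j=1}^{t(i)} \|x_{ij}\|^2 x_{ij}}{(\Sigma_2)^2 - \Sigma_4 \sum_{i=1}^{n} t(i)}, \tag{45}$$

with $\Sigma_2$ and $\Sigma_4$ now summed over all frames of all actions.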

Similar to the PCA model, the RNN takes $f_{CGA}(x_{ij})$ as input data after CGA has been used for feature extraction. From Eq. (20), (21) and (43), the output can be rewritten as

$$y_{CGA,i} = \mathrm{softmax}\left(\frac{1}{\sum_{i=1}^{k} t(i)} \sum_{i=1}^{k} \sum_{j=1}^{t(i)} V s_j\right). \tag{46}$$

This model is implemented on CGA space, i.e. data in real space are transferred to CGA space. CGA represents objects in space and their geometric relationships very well, so it is well suited to movements with complex distributions such as those of human joints.

Experimental

Experimental data

The CMU motion database (USA) [25] is free for all uses. Motions are captured in a working volume of approximately 3 x 8 m. The subject wears a black jumpsuit with 41 markers taped on (Fig. 3). The Vicon cameras see the markers in infrared. The images picked up by the various cameras are triangulated to obtain 3D data.

■ Fig. 3. Illustration model of markers on the body (marker labels: head, upperneck, lowerneck, rclavicle, rhumerus, rradius, rwrist, rfingers, rhipjoin, rfemur, rtibia, rfoot, rtoes, lradius, lwrist, lfingers, lfemur, ltibia, ltoes)

■ Table 1. Database experiment (number of frames)

Action        Training   Testing   Total
Dance             3305      1577    4882
Jump              1198       846    2044
Kick              1605      1163    2768
Placing Tee       1487      1096    2583
Putt              1534       974    2508
Run                452       322     774
Swing             1324       977    2301
Walk              1074       928    2002
Total            11979      7883   19862

This study uses 8 kinds of human action: dancing, jumping, kicking, placing tee, putting, running, swinging, and walking. The data are stored in the .c3d file format and comprise 19862 frames in total; the frames of each action are divided into 2 parts (60 % for training and 40 % for testing). Details are presented in Table 1.

Figure 3 shows the labels of the markers, each of which has 3D coordinates (x, y, z).

Predict with RNN

This experiment is conducted with the RNN model on the original data set without transformation, using Eq. (25), and with the proposed transformation, using Eq. (26). The parameters of the RNN are: number of neurons = 200, epochs = 20, classes = 8 (8 kinds of human action), batch_size = 5, and the Tanh activation function.

The results in Table 2 show that when the coordinates are shifted toward the origin, the test result is 84.45 %, much higher than the 70.97 % obtained when the coordinates are kept unchanged. To further improve the efficiency of the RNN, in this study we propose to apply dimensionality reduction methods before feeding the data into the RNN for prediction.

Predict with PCA_RNN

In this experiment, we apply the coordinate transformation before using PCA to extract features. Fig. 4 shows that the result gradually increases with the number of dimensions.

When the number of dimensions is 13, the Train result is 93.12 % and the Test result is 78.2 %; beyond that the result no longer increases, so it can be considered to have converged at 14 dimensions.


■ Table 2. Comparison results of two data sets using RNN, %

Part    Non-transformation   Transformation
Train                72.57            87.11
Test                 70.97            84.45

■ Fig. 4. Results of combining PCA and RNN (x-axis: number of dimensions, 1 to 15; curves: PCA_Train, PCA_Test)

Next, we experiment with the use of CGA to extract features.

Predict with CGA_RNN

In this experiment, we apply the preprocessing before using CGA to extract features. Fig. 5 shows that the CGA result converges best when the full set of attributes of the object is retained. At the same time, the results clearly show that removing some key attributes lowers the result.

The highest Train result was 98.1 % with 2 dimensions, and the highest Test result was 92.52 % with 5 dimensions.

Evaluation of results

The experiment was run 5 times for the proposed method with number of neurons = 200 and epochs = 20; the results are given in Table 3. The results of CGA_RNN are much higher than those of PCA_RNN, which is consistent with the suitability of CGA for representing objects moving in space.

In the previous study [26], we used PCA and CGA for feature extraction and prediction on the data set [25]. The best result was 88.86 %. However, when we used PCR to classify and predict, we encountered limitations in calculation speed and complexity. We therefore propose a method combining CGA with RNN.

Currently, some studies use ML models [27-29] and DL for human action recognition [5, 30, 31]. However, these studies focus only on developing RNN and Long Short Term Memory (LSTM) models for prediction and do not consider the characteristics of the object or feature extraction methods for the data (object). In the study [32], the authors used Android phones to collect action data and then trained an LSTM-RNN model, reporting a high result of 96 %. However, that study uses only 3 markers (joints), whereas this study uses 41 markers.

■ Fig. 5. Results of combining CGA and RNN (x-axis: number of dimensions, 1 to 15; curves: CGA_Train, CGA_Test)

■ Table 3. Comparison of the results of the two proposed methods, %

Part    PCA_RNN   CGA_RNN
Train     85.24     95.59
Test      72.83     88.50

■ Fig. 6. Comparison of the results of the two proposed methods (x-axis: number of epochs, 1 to 19; curves: PCA_Train, CGA_Train)

■ Fig. 7. The loss rate of the two methods with epoch = 20 (x-axis: number of epochs, 1 to 19; curves: Loss_PCA, Loss_CGA)

In Figs. 4 and 5, the maximum results are achieved by the PCA method with 13 dimensions and by the CGA method with 2 dimensions.

Figure 6 shows that the recognition accuracy increases with the number of epochs.

As the number of epochs increases, the recognition results can also increase; once it reaches a certain value, the results no longer increase (the curve is flat or descending).

Figure 7 shows the output loss at each step of the RNN. As the loss rate decreases, the recognition results increase. The figure also shows that the loss rate of the CGA method is lower than that of PCA in every epoch.

Conclusion

In this paper, we proposed a normalization method for input data that calculates deviations from the mean coordinate before PCA and CGA are used to reduce the number of dimensions and create input data for the RNN. Experimental results show that the proposed CGA_RNN method achieves a test result of 88.50 %, higher than the 72.83 % of PCA_RNN. Theoretically, an RNN can learn from distant states. In practice, however, an RNN carries forward only a limited number of previous states; beyond that, the gradients vanish, so the model can learn only from nearby states (short-term memory). Therefore, it is necessary to combine the proposed model with LSTM to improve the results. Applying CGA to DL is a research direction that lays a foundation for analyzing large data sets of moving objects in space and for other applications such as image processing, action recognition, and automatic control in the future.

Acknowledgements

This work was supported by The University of Danang, University of Science and Technology, code number of Project T2019-02-20.

References

1. Rosebrock A. Deep Learning for Computer Vision. PyImageSearch, 2017. 500 p.

2. Arthishwari K., Anand M. Design of LSTM-RNN on a sensor based HAR using Android phones. International Journal of Recent Technology and Engineering (IJRTE), 2020, vol. 8, no. 5, pp. 2277-3878. doi:10.35940/ijrte.E6821.018520

3. Pham C. T., Kopylov A. Multi-quadratic dynamic programming procedure of edge-preserving denoising for medical images. ISPRS-International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2015, XL-5/W6, pp. 101-106.

4. Goodfellow I., Bengio Y., and Courville A. Deep Learning. MIT Press, 2018. 800 p. doi:10.1007/s10710-017-9314-z

5. Liu H., and Taniguchi T. Feature extraction and pattern recognition for human motion by a deep sparse autoencoder. IEEE International Conference on Computer and Information Technology, 2014, pp. 173-181. doi:10.1109/CIT.2014.144

6. Ji S., Xu W., Yang M., and Yu K. 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, vol. 35, no. 1, pp. 221-231. doi:10.1109/TPAMI.2012.59

7. Du Y., Wang W., and Wang L. Hierarchical recurrent neural network for skeleton based action recognition. IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1110-1118.

8. Mikolov T., Bengio Y. and Pascanu R. On the difficulty of training recurrent neural networks. International Conference on Machine Learning, 2013, vol. 28, pp. 1310-1318. arXiv:1211.5063v2.

9. Jolliffe I. T. Principal Component Analysis. 2nd Ed. New York, Springer-Verlag, 2002. 518 p.

10. Smith L. I. A tutorial on Principal Components Analysis. Cornell University, USA, 2002. 27 p.

11. Jolliffe I. T. A note on the use of principal components in regression. Applied Statistics, 1982, vol. 31, iss. 3, pp. 300-303.

12. Alan J. I. Linear Discriminant Analysis. Springer, 2012. Pp. 1525-1548.

13. Nixon M. S., and Aguado A. S. Feature Extraction and Image Processing for Computer Vision. 3rd Ed. ELSEVIER, 2012. 632 p.

14. Pham M. T., Tachibana K., Hitzer E. M. S., Buchholz S., Yoshikawa T., and Furuhashi T. Feature Extractions with geometric algebra for classification of objects. IEEE World Congress on Computational Intelligence, Hongkong, 2008,pp.

15. Hildenbrand D., and E. Hitzer. Analysis of point clouds using conformal geometric algebra. 3rd International Conference on Computer Graphics Theory and Applications, Funchal, Madeira, Portugal, 2008, pp. 69-94.

16. Minh Tuan Pham, Hao Do Phuc, Kanta Tachibana.

Feature extraction for classification method using principal component based on conformal geometric algebra. IEEE World Congress on Computational Intelligence / International Joint Conference on Neural Network, 2016, no. 978-1-5090-0620, pp. 4112-4117.

17. Hestenes D., and Sobczyk G. Clifford algebra to geometric calculus. A unified language for mathematics and physics. Springer, Dordrecht, 1984. 309 p. doi:10.1007/978-94-009-6292-7

18. Doran C., and Lasenby A. Geometric algebra for physicists. Cambridge University Press, 2003. 578 p.

19. Pham M. T., Tachibana K., Hitzer E. M. S., Yoshika-wa T., and Furuhashi T. Classification and clustering of spatial patterns with geometric algebra. AGACSE 2010, Leipzig, 2010, no. 978-1-84996-107-3, pp. 231247. doi:10.1007/978-1-84996-108-0_12

20. Pham M. T., Tachibana K., Yoshikawa T., and Furu-hashi T. A clustering method for geometric data based on approximation using conformal geometric algebra. 2011 IEEE International Conference on Fuzzy Systems, 2011, pp. 2540-2545. doi:10.1109/ FUZZY.2011.6007574

21. Hachaj T., Ogiela M. R., and Piekarczyk M. Dependence of Kinect sensors number and position on ges-

tures recognition with Gesture Description Language semantic classifier. Computer Science and Information Systems (FedCSIS), Federated Conference, 2013, pp. 571-575.

22. Afsar P., Cortez P., and Santos H. Automatic human action recognition from video using hidden Markov model. IEEE 18th International Conference on Computational Science and Engineering, 2015, pp. 105109. doi.org/10.1109/CSE.2015.41

23. Eckhard Hitzer, Tohru Nitta, and Yasuaki Kuroe. Applications of Clifford's geometric algebra. Advances in Applied Clifford Algebras, 2013, vol. 23, iss. 2, pp. 377-404. arXiv:1305.5663v1.

24. Goodfellow I., Bengio Y., and Courville A. Sequence Modeling: Recurrent and Recursive Nets. In: Deep Learning. MIT Press, 2016. Pp. 367-415.

25. The Carnegie Mellon University, The Carnegie Mellon University Motion Capture Database. Available at: http://mocap.cs.cmu.edu (accessed 20 April 2020).

26. Nguyen N. H. V., Pham M. T., Do P. H. Marker selection for human activity recognition using combination of conformal geometric algebra and principal component regression. Proceedings of the Seventh International Symposium on Information and Communication Technology, December 8-9, 2016, pp. 274379. doi:10.1145/3011077.3011133

27. Md. Al Mehedi Hasan, Omar Faruqe. Face recognition using PCA and SVM. Anti-counterfeiting, Security, and Identification in Communication, ASID 2009, pp. 97-101.

28. Gehrig D., and Schultz T. Selecting relevant features for human motion recognition. 19th International Conference on Pattern Recognition, Tampa, FL., 2008, pp. 1-4. doi:10.1109/ICPR.2008.4761290

29. K. G. Manosha Chathuramali, Ranga Rodrigo. Faster human activity recognition with SVM. The International Conference on Advances in ICT for Emerging Regions, Colombo, 2012, pp. 197-203.

30. Wei L., and Shah S. K. Human activity recognition using deep neural network with contextual information. The 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017), pp. 34-43.

31. da Silva R. E., Ondrej J., and Smolic A. Using LSTM for automatic classification of human motion capture data. 14th International Conference on Computer Graphics Theory and Applications, VISIGRAPP 2019, pp. 236-243.

32. Arthishwari K., Anand M. Design of LSTM-RNN on

a sensor based HAR using Android phones. International Journal of Recent Technology and Engineering (IJRTE), 2020, vol. 8, no. 5, pp. 2277-3878.

UDC 004.93

doi:10.31799/1684-8853-2020-5-2-11

Human action recognition method based on conformal geometric algebra and recurrent neural network

Nguyen Nang Hung Van (a), PhD student, Specialist, orcid.org/0000-0002-9963-7006, nguyenvan@dut.udn.vn
Pham Minh Tuan (a), PhD, Lecturer, orcid.org/0000-0001-9843-9676, pmtuan@dut.udn.vn
Do Phuc Hao (b), M. Sc., Lecturer, orcid.org/0000-0003-0645-0021, haodp@dau.edu.vn
Pham Cong Thang (a), PhD, Lecturer, orcid.org/0000-0002-6428-102X, pcthang@dut.udn.vn
Tachibana Kanta (c), PhD, Associate Professor, orcid.org/0000-0002-8675-7842, kanta@cc.kogakuin.ac.jp

(a) The University of Danang, University of Science and Technology, Faculty of Information Technology, 54 Nguyen Luong Bang St., Danang, 550000, Vietnam
(b) Danang Architecture University, 566 Nui Thanh St., Danang, 550000, Vietnam
(c) Kogakuin University, 1-24-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo, 163-8677, Japan


Introduction: Deep learning plays an important role in machine learning and artificial intelligence. It is widely applied in many fields that handle large volumes of data, such as natural language processing and image recognition. High data dimensionality leads to machine learning problems such as overfitting and loss of accuracy. To overcome them, methods have been proposed that reduce data dimensionality and computational complexity simultaneously. The drawback of these methods is that they are geared toward data distributed on a plane. For data distributed on a hypersphere, such as an object moving in space, the processing results fall short of expectations. Purpose: To apply conformal geometric algebra to extract features and simultaneously reduce the dimensionality of a dataset for human action recognition using a recurrent neural network. Results: Human action data in a three-dimensional coordinate system are pre-processed and normalized by calculating deviations from the mean coordinates. The data are then transformed into vectors in conformal geometric algebra space, and their dimensionality is reduced to obtain feature vectors. Finally, a recurrent neural network model is trained on the feature vectors. Experimental results on a motion capture dataset with eight actions showed that combining conformal geometric algebra with a recurrent neural network gives a best result of 92.5 %. Practical relevance: Some human actions, such as jumping or dancing, do not involve displacement in space, unlike actions such as running or walking. A method for standardizing actions is therefore needed. For data distributed on a hypersphere, the developed method makes it possible to extract features while simultaneously reducing the dimensionality of the dataset for human action recognition using a recurrent neural network.

Keywords: activity recognition, principal components analysis, conformal geometric algebra, deep learning.

For citation: Nguyen Nang Hung Van, Pham Minh Tuan, Do Phuc Hao, Pham Cong Thang, Tachibana Kanta. Human action recognition method based on conformal geometric algebra and recurrent neural network. Informatsionno-upravliaiushchie sistemy (Информационно-управляющие системы) [Information and Control Systems], 2020, no. 5, pp. 2-11. doi:10.31799/1684-8853-2020-5-2-11

