

A COMPREHENSIVE LEAP MOTION DATABASE FOR HAND GESTURE RECOGNITION

Turgali B.K.

Turgali Bauyrzhan Kuanishuly - Master, Department of Computer Engineering and Telecommunications, International Information Technology University, Almaty, Republic of Kazakhstan

Abstract: touchless interaction has received considerable attention in recent years owing to the benefit of removing the burden of physical contact. The recent introduction of novel acquisition devices, like the leap motion controller, allows obtaining a very informative description of the hand pose and motion that can be exploited for accurate gesture recognition. In this work, we present an interactive application with gestural hand control using leap motion for medical visualization, focusing on the satisfaction of the user as an important component in the composition of a new specific database.

Keywords: hand gesture recognition, touchless interaction, leap motion, support vector machine.

I. INTRODUCTION

Users have long dreamt of interacting with machines in a more natural and intuitive way than conventional input devices allow. Gesture is among the richest means of human communication and can therefore be harnessed for human-computer interaction [2].

This technology is particularly interesting for applications that require interacting with a computer in special environments, such as an operating room, where sterility is a critical constraint. Our system is designed to enable touchless HMI, so that surgeons can control medical images during surgery. This paper presents a study of hand gesture recognition through the extraction, processing and interpretation of data acquired by the LM. We design recognition and classification approaches to build a gesture library suited to the required system control, and introduce several novel contributions: we collected a new dataset of specific dynamic gestures matching the recommended commands, and we created our own data format. We propose a three-dimensional structure combining spatial features, namely the arithmetic mean, the standard deviation, the root mean square and the covariance, in order to effectively classify dynamic gestures.

II. LEAP MOTION DEVICE

The LM is a compact sensor released in July 2013 by the Leap Motion company [1]. The device measures only 80 x 30 x 12.7 mm. It has a brushed aluminum body with black glass on its top surface, which hides three infrared LEDs used for scene illumination and two CMOS cameras spaced 4 centimeters apart, capturing images at a frame rate from 50 up to 200 fps, depending on whether USB 2.0 or 3.0 is used. The LMC reports information about objects located in the device's field of view, an inverted pyramid extending from 25 mm to 600 mm above the device with a 150° field of view.

Fig. 1. LMC with micro-USB plug

III. EXISTING DATABASES

Researchers began analyzing the performance of the LM Controller (LMC) after its first release in 2013. A first study of the accuracy and robustness of the LMC was presented in [4]: an industrial robot moving a reference pen with suitable position accuracy was used for the experiment. The results showed a deviation between the desired 3D position and the average measured position below 0.2 mm for static setups and 1.2 mm for dynamic ones.

To improve human-computer interaction in different fields through the LMC, we opt for recognizing the user's gestures as a means of entering commands.

Sign language is another field of study: an investigation of the LMC in [3] showed its potential for recognizing gestures and handwriting on the fly. The acquired input data were treated as a time series of 3D positions and processed using the dynamic time warping algorithm. In [5], an Indian sign language recognition system using both hands was developed with the LM sensor. The positions of the five fingertips and the palm center of both hands were used to recognize sign postures based on Euclidean distance and cosine similarity. Likewise, in [6] the authors put forward a novel pattern recognition method for symbols of the Arabic sign language. The scheme extracted meaningful characteristics from the data, such as the angles between fingers, and fed them to a classifier that decides which gesture is being performed, achieving high accuracy.

IV. DESCRIPTION OF OUR DATASET

During technical visits to hospitals, we discussed with surgeons the usability of touchless interaction systems in the sterile field, in order to finally settle on the commands indispensable for handling medical images. These commands were identified mainly on the basis of intuitiveness and memorability.

Since there are no publicly released datasets for 3D dynamic gesture recognition, we collected a gesture dataset with the LM sensor to evaluate the performance of the proposed approach. Our database contains 11 gestures performed by 10 different subjects (3 men and 7 women); only one participant is left-handed. Each subject performed five repetitions of each gesture, giving 11 x 10 x 5 = 550 samples in total. All repetitions were recorded in the same room, with lighting conditions kept similar to those of a surgical operating room.

V. OVERVIEW OF THE PROPOSED APPROACH

First, we set up the programming environment to create a library of gestures based on the gathered hand attributes. Then a set of relevant features is extracted. Finally, a trained SVM classifier is applied to the extracted features to recognize each performed gesture.

The technological solutions to be adopted are those described in Section II. The LMC is designed to be placed on a flat surface under the area where hand movements are performed; the user connects it to a computer via USB so that it can detect the hand movements above it.

The tracking data, which contain the palm and finger positions, directions and velocities, can be accessed through the SDK, which the Leap Motion company has kept updating since the first release. In this study we use the V2 desktop SDK to build the application with standard tracking; currently we use the latest version, SDK V2.3.1, available online at [1]. Since it does not provide gesture recognition out of the box, we implement the gesture recognition ourselves.
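As an illustration, the palm center and fingertip positions can be polled from the V2 SDK roughly as follows. This is a minimal sketch assuming the official Python bindings shipped with SDK V2; the polling loop and the thumb-to-pinky point ordering are our own choices, not prescribed by the SDK.

```python
import time
import Leap  # Python bindings shipped with the Leap Motion V2 SDK

def hand_points(frame):
    """Return the six tracked points of the first hand in a frame:
    palm center followed by the five fingertips (thumb to pinky)."""
    if frame.hands.is_empty:
        return None
    hand = frame.hands[0]
    p = hand.palm_position  # Leap.Vector, coordinates in millimeters
    points = [(p.x, p.y, p.z)]
    # Sort fingers by type so the order is always thumb..pinky.
    for finger in sorted(hand.fingers, key=lambda f: f.type):
        tip = finger.tip_position
        points.append((tip.x, tip.y, tip.z))
    return points

controller = Leap.Controller()
for _ in range(500):             # bounded polling loop, ~50 Hz
    pts = hand_points(controller.frame())
    if pts is not None:
        print(pts)
    time.sleep(0.02)
```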

A. Feature Extraction

The hand attributes are gathered directly from the LMC, which provides highly informative data for gesture recognition and avoids the complex computations needed to extract them from the depth and color data captured by depth sensors such as the Kinect. Accordingly, we created our own data format. It contains only the necessary positional information and allows us to easily save captured frames to files and read them back later for processing and testing purposes.
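The paper does not spell out the file layout, so the following is only one plausible realization of such a format: one frame per line, with the 18 coordinates written in a fixed point order.

```python
import csv

POINT_NAMES = ["palm", "thumb", "index", "middle", "ring", "pinky"]

def save_repetition(path, frames):
    """Save one gesture repetition; `frames` is a list of frames,
    each a list of six (x, y, z) tuples in POINT_NAMES order."""
    with open(path, "w") as f:
        writer = csv.writer(f)
        writer.writerow(["%s_%s" % (n, a) for n in POINT_NAMES for a in "xyz"])
        for frame in frames:
            writer.writerow(["%.3f" % c for point in frame for c in point])

def load_repetition(path):
    """Read a repetition back as a list of frames of six (x, y, z) tuples."""
    with open(path) as f:
        reader = csv.reader(f)
        next(reader)  # skip header row
        frames = []
        for row in reader:
            coords = list(map(float, row))
            frames.append([tuple(coords[i:i + 3]) for i in range(0, 18, 3)])
        return frames
```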

For the data acquired by the leap device in our experiments, we extract the three coordinates (X, Y, Z) of the six most dominant points of the hand: the palm center and the five fingertips.

• Palm center: P = [(Xp1, Yp1, Zp1), ..., (Xpn, Ypn, Zpn)]

• Fingertip positions: Fi = [(Xi1, Yi1, Zi1), ..., (Xin, Yin, Zin)], i = 1, ..., 5.

These data are then grouped into matrices of six rows and n columns:

A_{i,j,k}, with i = 1, ..., 11 (gesture), j = 1, ..., 10 (subject), k = 1, ..., 5 (repetition).

For gesture recognition, each repetition A_{i,j,k} is partitioned into Wt = 20 temporal windows, and the arithmetic mean, the standard deviation, the covariance and the quadratic mean of the coordinates in each window are calculated. To this purpose, we introduce the following features:

• Arithmetic mean: $\mu = \frac{1}{N}\sum_{t=1}^{N} x_t$

• Standard deviation: $S = \sqrt{\frac{1}{N}\sum_{t=1}^{N} (x_t - \mu)^2}$

• Covariance: $C = \frac{1}{N}\sum_{t=1}^{N} (x_t - \mu_x)(y_t - \mu_y)$

• RMS: $\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{t=1}^{N} x_t^2}$

where N is the number of frames in the window and x_t, y_t denote coordinate series within the window.

This set of characteristics is computed from the spatial attributes (3D positions) and normalized to the range [-1, 1]. The steps above produce six vectors whose size depends on the temporal window Wt; the complete feature set is obtained by concatenating the six vectors [Palm, Thumb, Index, Middle, Ring, Pinky].
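As a concrete sketch of this pipeline in numpy: each repetition (an array of n frames x 6 points x 3 coordinates) is split into Wt temporal windows, the four statistics are computed per window, and the resulting vector is scaled. The exact covariance pairing and the normalization scheme are not fully specified in the text, so the choices below (pairwise covariances between the x, y, z series of each point; max-absolute scaling per vector) are our assumptions.

```python
import numpy as np

def window_features(window):
    """Statistics of one temporal window of shape [n_frames, 6, 3]:
    arithmetic mean, standard deviation, RMS and covariance."""
    mu = window.mean(axis=0)                      # arithmetic mean
    s = window.std(axis=0)                        # standard deviation
    rms = np.sqrt((window ** 2).mean(axis=0))     # root mean square
    # covariances between the x, y and z series of each of the 6 points
    cov = np.array([np.cov(window[:, p, :].T, bias=True)[np.triu_indices(3, k=1)]
                    for p in range(6)])
    return np.concatenate([mu.ravel(), s.ravel(), rms.ravel(), cov.ravel()])

def repetition_features(rep, wt=20):
    """Split a repetition [n_frames, 6, 3] into wt windows, concatenate
    the per-window features and scale the vector to [-1, 1]."""
    rep = np.asarray(rep, dtype=float)
    feats = np.concatenate([window_features(w)
                            for w in np.array_split(rep, wt, axis=0)])
    peak = np.abs(feats).max()
    return feats / peak if peak > 0 else feats
```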

B. SVM Classifier

A typical issue of many machine-learning techniques is that a large dataset is required to properly train the classifier, and acquiring the training data is a critical task that can require a huge amount of manual work. To compute the results, we split the dataset into a training set and a test set: in all experiments, the first six subjects were used for learning and the last four for testing.

For classification, we test the SVM, one of the most common machine learning classifiers in use today. It derives from statistical learning theory and has been widely applied in detection and recognition.
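A sketch of this protocol with scikit-learn is given below; the kernel and its hyperparameters are not stated in the paper, so the RBF settings are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

def train_and_test(X, y, subjects):
    """Subject-wise split as in the paper: subjects 1-6 train, 7-10 test.
    X: [n_samples, n_features] feature matrix, y: gesture labels,
    subjects: subject id (1..10) of each sample."""
    train = np.isin(subjects, [1, 2, 3, 4, 5, 6])
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # assumed hyperparameters
    clf.fit(X[train], y[train])
    return clf, clf.score(X[~train], y[~train])
```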

VI. EXPERIMENTAL RESULTS AND DISCUSSION

The evaluation was performed on the new dataset. An interesting observation revealed by Table 1 is that the four descriptors capture different properties of the performed gesture: in general, the features extracted from one descriptor compensate for the drawbacks of the features extracted from the others. Hence, by combining them, it is possible to improve the recognition accuracy. We reach about 81% accuracy, which demonstrates that the proposed machine learning strategy obtains good results even when combining descriptors with different performances, without being dragged down by the weaker features.

Much of the effectiveness of this basic solution depends on how precisely we choose the temporal window Wt. Fig. 2 shows how the accuracy rate varies with the temporal window; at Wt = 20, the rate of 81% is the best accuracy that the suggested approach extracts from the LM data.


Fig. 2. Accuracy rate as a function of the temporal window Wt
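The sweep behind Fig. 2 can be reproduced along the lines below, reusing the helpers sketched above (repetition_features, train_and_test); the list of candidate window sizes is illustrative.

```python
import numpy as np

def sweep_temporal_window(reps, labels, subjects,
                          wt_values=(5, 10, 15, 20, 25, 30)):
    """Test accuracy of the subject-split SVM for several window sizes Wt.
    reps: list of repetitions (arrays [n_frames, 6, 3]) aligned with
    the labels and subjects sequences."""
    accuracy = {}
    for wt in wt_values:
        X = np.array([repetition_features(r, wt) for r in reps])
        _, accuracy[wt] = train_and_test(X, np.asarray(labels),
                                         np.asarray(subjects))
    return accuracy
```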

Table 1. Performance of LM features

Features                   Accuracy (Wt = 20)
Mean (μ)                   –
Standard deviation (S)     –
Covariance (C)             –
Root mean square (RMS)     –
μ + S + C + RMS            ≈81%

Finally, Table 2 provides the confusion matrix for the SVM when all features are combined. The diagonal of the matrix shows the correctly classified examples; dark gray cells mark the most correctly classified examples for each class, while light gray cells indicate false positives with a failure rate greater than 10%.

Looking in more detail at the results in Table 2, the accuracy is close to or above 90% for most gestures. G4, G5, G6, G7, G8 and G9 achieve very high accuracy when recognized by the device. In contrast, gestures G1, G2 and G3 frequently fail recognition. G2 and G3, two reciprocal gestures, both have a single raised finger (the index) and are sometimes confused with each other; moreover, our approach uses no spatio-temporal characteristics to differentiate between them. G10 is sometimes confused with G4, since both are performed with an open hand; this is due to the limited accuracy of the hand direction estimation in the LM software. G1 is another challenging gesture owing to its touching fingers.

Table 2. Confusion matrix for performance evaluation (%)

      G1   G2   G3   G4   G5   G6   G7   G8   G9   G10  G11
G1    60    0   15    0    0   20    0    5    0    0    0
G2     5   50   25    5    0    0    0    0   15    0    0
G3     0   20   60   10    0    0    0    0    5    5    0
G4     0    0    0  100    0    0    0    0    0    0    0
G5     0    0    0    0   90    5    0    0    0    0    5
G6     0    0    0    0    0  100    0    0    0    0    0
G7     0    0    0    0    0    0  100    0    0    0    0
G8     0    0    0    0    0    0    0  100    0    0    0
G9     0    0    5    0    0    0    0    0   95    0    0
G10    0    0    0   20    0    0    0    0    5   75    0
G11    0    0    0    5    0    5    0    5    0    0   85
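A row-normalized matrix like Table 2 can be computed with scikit-learn. A minimal sketch, where y_true and y_pred are the test-set labels and predictions (the names are ours):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def percent_confusion(y_true, y_pred, labels=tuple(range(1, 12))):
    """Confusion matrix with each row normalized to percentages,
    in the style of Table 2 (gestures G1..G11)."""
    cm = confusion_matrix(y_true, y_pred, labels=list(labels)).astype(float)
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)
```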

VII. CONCLUSION

In this paper, we have studied the influence of different parameters on the overall recognition accuracy of a gesture recognition system for the visualization and manipulation of medical images during surgical procedures. To evaluate the performance of our technique, we collected a small but challenging dataset of 11 dynamic gestures with the LM sensor, and extracted the feature vectors of all gestures. The experimental database has also been extended to support early recognition. We then used the training set to build the SVM model, while the test set measured its performance. The experimental results demonstrate the effectiveness of the proposed method.

In this work, we used features based only on positional information. Further research will be devoted to introducing novel feature descriptors and to extending the suggested approach to recognize dynamic gestures by also exploiting temporal information. In addition, we plan to incorporate an alternative model, such as a hidden Markov model, as a segmentation method to determine probable start and stop points of each gesture, and then feed the identified frames into a convolutional neural network for gesture classification.

References

1. Leap Motion Controller. [Electronic resource]. URL: https://www.leapmotion.com/ (accessed: 10.11.2014).

2. Ben Abdallah M., Kallel M., Bouhlel M.S. "An overview of gesture recognition". 6th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), IEEE. Pp. 20-24, 2012.

3. Vikram S., Li L., Russell S. "Handwriting and Gestures in the Air, Recognizing on the Fly". Proceedings of the CHI 2013 Extended Abstracts. Paris, France, 2013.

4. Weichert F., Bachmann D., Rudak B., Fisseler D. "Analysis of the accuracy and robustness of the leap motion controller". Sensors. Vol. 13. Pp. 6380-6393, 2013.

5. Mapari R.B., Kharat G. "Real Time Human Pose Recognition Using Leap Motion Sensor". ICRCICN, IEEE, 2015.

6. Khelil B., Amiri H. "Hand Gesture Recognition Using Leap Motion Controller for Recognition of Arabic Sign Language". 3rd International Conference ACECS'16, 2016.
