PREPROCESSING DATA FROM THE MOUSE MANIPULATOR FOR USE IN BEHAVIORAL BIOMETRICS ANALYSIS

Uymin A.

TECHNICAL SCIENCES


Graduate student, Vladimir State University named after Alexander Grigoryevich and Nikolai Grigoryevich Stoletov (VlSU)

https://doi.org/10.5281/zenodo.6594842

Abstract

The main tools for analyzing behavioral biometrics based on mouse manipulator data are analyzed and described, a data preprocessing scheme is described, and a scheme for preparing a dataset based on these data is developed. A model of user behavioral biometrics has been developed, and its four modules have been defined.

Keywords: biometric authentication, behavioral biometrics, RemoteTopology, data preprocessing, mouse manipulator.

Examples of biological (physiological) characteristics include fingerprints, hand geometry, facial image, iris structure, and retinal pattern. Examples of behavioral characteristics include signature, hand gestures, keystroke dynamics, and gait. The modality of a biometric system depends on the combination of the type of biometric characteristic, the type of sensor, and the algorithms used to extract and process biometric features. Biometric systems are currently attracting growing attention from researchers [1]. Biometric data collected by sensors, processed by algorithms, and classified by characteristic type or sample are used to improve the recognition accuracy of a biometric system.

The study produced the following results:

1) the main tools for developing software that captures data on mouse actions are described;

2) a method for collecting and preparing a dataset based on data from the mouse manipulator, with its main patterns, is described;

3) designs of a deep learning (DL) model for continuous authentication (CA) and anomaly detection (AD) that allow user verification are proposed.

The RemoteTopology software product, designed for organizing and conducting remote championships and training in operating systems and network technologies, was chosen as the research platform [2]. The module responsible for data collection was developed in Python. It records a set of parameters for mouse operations, such as each individual user click, in a controlled environment. The module is based on the pyHook package [3], which uses the Windows hooking API to provide callbacks for low-level global mouse and keyboard events [4]. The pyHook hook manager records the manipulator events.

The hook manager collects the various manipulator events via callbacks. This paper addresses obtaining a dataset in which all mouse events are recorded, including the message name, message ID, time, window name, and X and Y coordinates. Our software receives events as they occur and writes them to a continuously updated log file in CSV format. The data collection software does not record any data that could be classified as personal data of the users involved in testing. To collect the data, the mouse-action recording software was installed on a reference personal computer.
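The logging step described above can be sketched as a pyHook-style callback that turns each mouse event into one CSV row. This is a minimal sketch, not the author's module: it assumes pyHook's mouse-event attributes MessageName, Message, Time, Window, WindowName and Position, while the factory name make_mouse_logger, the CSV column names and the file name are hypothetical. The hook registration is shown only as comments because pyHook runs on Windows alone.

```python
import csv
from typing import Callable, TextIO

# Hypothetical CSV column names mirroring the parameters listed in the text.
FIELDS = ["message_name", "message_id", "time", "window", "window_name", "x", "y"]


def make_mouse_logger(out: TextIO) -> Callable[[object], bool]:
    """Write a CSV header to `out` and return a callback that logs one row per event."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)

    def on_mouse_event(event) -> bool:
        x, y = event.Position  # pyHook reports the cursor as an (x, y) tuple
        writer.writerow([event.MessageName, event.Message, event.Time,
                         event.Window, event.WindowName, x, y])
        out.flush()  # keep the log continuously up to date
        return True  # pass the event on to other applications

    return on_mouse_event


# Registration on Windows with pyHook installed (not executed here):
#   import pyHook, pythoncom
#   with open("session.csv", "w", newline="") as f:
#       hm = pyHook.HookManager()
#       hm.MouseAll = make_mouse_logger(f)
#       hm.HookMouse()
#       pythoncom.PumpMessages()
```

Flushing after every row keeps the CSV usable while a session is still running, at the cost of more disk writes; returning True lets the event reach the application under the cursor.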

The experiment was conducted as part of the Asia-Pacific Best Practice Marathon project, held at the Khabarovsk Regional Institute of Education Development, as confirmed by approbation certificate No. 10 dated 25.03.2022. Participants in the "Network and system administration" competence took part in the experiment. The participants set up their workstations independently, i.e., there were no requirements regarding the type or location of devices, the arrangement of furniture, lighting, etc. Only the minimum requirements for furniture, premises, and technical characteristics of equipment were set, in accordance with the competence infrastructure sheet. This experimental setting allowed us to collect a dataset consisting of 20,000 samples of mouse movement actions and 10,000 images of pointing actions with a mouse click. In total, data were collected from 12 participants. Figure 1 shows the RemoteTopology (RT) window during a task.

Figure 1. RT when completing a task

Each session produced a file of data records, one row per recorded mouse event. Each event contains 7 parameters: message name, message (event) ID, time, window, window name, and the X and Y coordinates. The message name describes the event (for example, mouse move or button down/up). The message ID identifies the event type (for example, 1024 for a mouse move, 1023 for a mouse button press, and 1022 for a mouse button release). Time is the timestamp of the event within the recorded session (the values in Table 1 are in milliseconds), provided that the parameter is recorded within 5 ms of the event. The window name contains the name of the application in use (in our task, the events occur in the Chrome web browser, so the window name parameter refers to Chrome). The X and Y parameters are the screen coordinates of the cursor. A current limitation of the project is that it works at a single screen resolution, without scaling.

We define a mouse action as a set of consecutive user events that moves the manipulator between two points on the screen. Figure 2 shows a mouse action consisting of n events, represented as a sequence of n points {P1, P2, P3, ..., Pn} [5].
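Once an action is stored as the point sequence {P1, ..., Pn}, it can be summarized by scalar features. The paper does not list the features its extraction module computes, so the ones below (path length, straight-line distance, duration, mean speed and straightness) are an illustrative assumption, as is the hypothetical function name action_features; each point is taken as a (time in ms, x, y) tuple.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (time_ms, x, y)


def action_features(points: List[Point]) -> Dict[str, float]:
    """Summarise one mouse action {P1, ..., Pn} with a few scalar features."""
    if len(points) < 2:
        raise ValueError("an action needs at least two points")
    # Total distance travelled along the trajectory P1 -> P2 -> ... -> Pn.
    path = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        path += math.hypot(x1 - x0, y1 - y0)
    (t_start, xs, ys), (t_end, xe, ye) = points[0], points[-1]
    chord = math.hypot(xe - xs, ye - ys)  # straight-line distance P1 -> Pn
    duration = t_end - t_start
    return {
        "path_length": path,
        "distance": chord,
        "duration_ms": duration,
        "mean_speed": path / duration if duration > 0 else 0.0,  # px per ms
        "straightness": chord / path if path > 0 else 1.0,  # 1.0 = straight line
    }
```

For the collinear points (0, 0, 0), (10, 3, 4), (20, 6, 8), the path length and straight-line distance are both 10 px and the straightness is 1.0; a curved or back-tracking movement pushes the straightness toward 0.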

Figure 2. Moving the mouse between a series of screen locations

Based on the literature review, and similarly to [5], mouse actions can be divided into three types, namely MM, PC, and DD: MM describes moving the mouse between two screen locations; PC describes hovering and clicking, i.e., moving the mouse over a point and then pressing one of the mouse buttons; and DD is a drag-and-drop movement that begins with pressing the main mouse button and ends with releasing it. In our study, mouse action data are divided into two categories: "Category 1" (mouse movement) and "Category 2" (hover-and-click action, i.e., pressing and releasing a button). We treat an action as a hover-and-click (PC) action when a mouse button press is immediately followed by a button release; otherwise, it is treated as a mouse move (MM) action. Two complete sets of actions are shown in Table 1.
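The PC/MM rule above can be sketched as a single pass over the logged events, using the event IDs from the text (1023 for button down, 1022 for button up, 1024 for a move). This is an assumption-laden sketch rather than the author's code: the function and type names are hypothetical, and the optional max_gap pause threshold is our addition, since the paper does not state how a long stream of pure movements is split into separate MM actions.

```python
from typing import List, NamedTuple, Optional, Tuple

# Event codes as given in the text (1024 = move); these are the collector's
# own numbering, not raw Windows WM_* message constants.
DOWN, UP = 1023, 1022


class Event(NamedTuple):
    msg_id: int
    time: int  # timestamp as logged
    x: int
    y: int


Action = Tuple[str, List[Event]]


def segment_actions(events: List[Event], max_gap: Optional[int] = None) -> List[Action]:
    """Label runs of events as PC (ending in down -> up) or MM (anything else).

    max_gap is a hypothetical pause threshold: if set, a longer pause between
    consecutive events closes the pending run as an MM action.
    """
    actions: List[Action] = []
    pending: List[Event] = []
    i = 0
    while i < len(events):
        ev = events[i]
        if max_gap is not None and pending and ev.time - pending[-1].time > max_gap:
            actions.append(("MM", pending))
            pending = []
        nxt = events[i + 1] if i + 1 < len(events) else None
        if ev.msg_id == DOWN and nxt is not None and nxt.msg_id == UP:
            # Button press immediately followed by release: hover-and-click.
            actions.append(("PC", pending + [ev, nxt]))
            pending = []
            i += 2
            continue
        pending.append(ev)
        i += 1
    if pending:
        # Movement that never ended in a click is a mouse move action.
        actions.append(("MM", pending))
    return actions
```

Applied to the eleven rows of Table 1 with no pause threshold, the sketch returns one PC action (the first eight rows, ending in the press/release pair) and one MM action (the last three rows). With max_gap=1000, the 2172 ms pause before the sixth row additionally splits off the first five moves as a separate MM action.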

Table 1

Data acquisition parameters and segmentation of mouse data into actions

Message name           | Event ID | Time     | Window  | Window name          | X axis | Y axis
mouse movement         | 1024     | 43351750 | 3999736 | ChromeLegacy Windows | 1121   | 701
mouse movement         | 1024     | 43351781 | 3999736 | ChromeLegacy Windows | 1124   | 712
mouse movement         | 1024     | 43351796 | 3999736 | ChromeLegacy Windows | 1125   | 720
mouse movement         | 1024     | 43351812 | 3999736 | ChromeLegacy Windows | 1127   | 726
mouse movement         | 1024     | 43351828 | 3999736 | ChromeLegacy Windows | 1127   | 728
mouse movement         | 1024     | 43354000 | 3999736 | ChromeLegacy Windows | 1127   | 730
left mouse button down | 1023     | 43354078 | 3999736 | ChromeLegacy Windows | 1127   | 731
mouse button release   | 1022     | 43354078 | 3999736 | ChromeLegacy Windows | 1127   | 731
mouse movement         | 1024     | 43354515 | 3999736 | ChromeLegacy Windows | 1127   | 732
mouse movement         | 1024     | 43354531 | 3999736 | ChromeLegacy Windows | 1162   | 733
mouse movement         | 1024     | 43354546 | 3999736 | ChromeLegacy Windows | 1123   | 736

Figures 3 and 4 show user behavior during mouse movement and hover-and-click actions.

Figure 3. User behavior during mouse movement and hover-and-click actions.

The main difficulty when working with data received from the manipulator lies in extracting them: the data must be brought into a form suitable for training an ML model or for analysis by other methods. We propose the data preprocessing scheme shown in Figure 5.

Figure 5. Scheme of actions for data preprocessing.

After preprocessing with this scheme, we obtain a dataset with unique timestamps in UNIX format. A sampling step is then set along the time axis, and all records for timestamps missing within the range of the dataset are filled with interpolated values. This is done with the pandas Python library. Resampling is followed by linear interpolation, which yields continuous time-series data that can conveniently be used for complex time-series analysis. In this case, resampling and interpolation serve to complete the dataset: data used for time-series analysis must have every data point either recorded or interpolated. Most time-series methods, such as ARMA/ARIMA [6] and LSTM-based neural networks [7], require the series to be complete, without any missing timestamps. Analyzing time series with missing timestamps can lead to erroneous or inaccurate results.
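The resampling and linear interpolation step can be sketched with pandas as follows. It assumes the cleaned log has a time column of UNIX timestamps in milliseconds plus numeric x and y columns (hypothetical names), averages events that share a timestamp to make timestamps unique (one possible choice, not necessarily the author's), and uses a 5 ms grid only because the text mentions 5 ms; the real step size is a tuning choice.

```python
import pandas as pd


def resample_mouse_log(df: pd.DataFrame, step: str = "5ms") -> pd.DataFrame:
    """Return x/y on a regular time grid with gaps filled by linear interpolation.

    Assumes hypothetical columns: `time` (UNIX timestamp, ms), `x`, `y`.
    """
    out = df[["time", "x", "y"]].copy()
    out["time"] = pd.to_datetime(out["time"], unit="ms")  # numeric ms -> datetime
    out = out.groupby("time").mean()  # one row per timestamp (duplicates averaged)
    grid = out.resample(step).mean()  # regular grid; empty bins become NaN
    return grid.interpolate(method="linear")  # fill the NaN bins linearly
```

For records at 0, 10, 10 and 20 ms with x = 0, 10, 20 and 30, the two records at 10 ms are averaged to x = 15, and the 5 ms grid is filled as x = 0, 7.5, 15, 22.5, 30.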

As the experimental platform, we propose a CNN architecture in which the first convolutional layer has 64 filters, the second has 32 filters, and the last has only 16 filters. Each of the three layers uses a 1 × 1 kernel and the rectified linear unit (ReLU) activation function [8]. To reduce overfitting on the training set, dropout with probability p = 0.05 is applied between consecutive layers. The outputs of the three convolutional layers pass through a max-pooling layer to the fully connected layers, which produce the final probability for each user.

As shown in Figure 5, the proposed user behavioral biometrics model consists of four modules: a data acquisition module, a feature extraction module, a classifier module, and a continuous authentication and anomaly detection module. The model decides whether a given portion of mouse data belongs to a given user. In particular, the following steps describe how the proposed model works:

• Data collection stage: Initial user data is collected.

• Feature extraction phase: pandas and NumPy were used to extract features.

• Data preparation stage: During the training stage, all user data were combined and shuffled. The dataset was then divided into two parts: the first (80% of the data) was used for training, and the second (20% of the data) was used to test the model's performance. In every experiment, the ratio between the training and evaluation sets remained unchanged to avoid biasing the classifier.

• Classifier selection phase: decision tree (DT), random forest (RF), k-nearest neighbors (KNN), and CNN classifiers were used to demonstrate the ability of the proposed model to determine whether a user is genuine or an impostor based on the user's mouse click-stream data.

• Training stage: Training began by reading the features of all users from the training dataset and loading them into the four classifiers. This step is essential because the training data contain both the user behavior itself and the class label.

• Testing phase: After training, the model was tested on new data never used for training to determine whether the user was genuine or an impostor.
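The data preparation and classifier stages above can be sketched without any ML framework. The sketch assumes that keeping "the balance of training and evaluation sets unchanged" means a stratified split, i.e. every user keeps the same 80/20 ratio, and it implements only one of the four classifiers, KNN, as a plain NumPy majority vote; the function names and k = 3 are hypothetical. DT, RF and the CNN would be plugged in at the same point.

```python
import numpy as np


def stratified_split(X: np.ndarray, y: np.ndarray, test_frac: float = 0.2, seed: int = 0):
    """Shuffle, then hold out test_frac of each user's samples (80/20 by default)."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        n_test = int(round(len(idx) * test_frac))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    # Mix users again so the sets are not ordered by label.
    train_idx = rng.permutation(np.array(train_idx, dtype=int))
    test_idx = rng.permutation(np.array(test_idx, dtype=int))
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def knn_predict(X_train: np.ndarray, y_train: np.ndarray, X_query: np.ndarray, k: int = 3) -> np.ndarray:
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dist = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    preds = []
    for neighbour_labels in y_train[nearest]:
        labels, counts = np.unique(neighbour_labels, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)
```

For verification rather than identification, a claimed identity can be accepted when the predicted label equals the claimed user and rejected as an impostor otherwise.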

Figure 5. Stages of the proposed model's operation.
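The CNN described earlier (64, 32 and 16 filters, 1 × 1 kernels, ReLU, dropout p = 0.05) can be illustrated with an untrained NumPy forward pass. This is a sketch under stated assumptions, not the author's network: the input is taken to be a (time steps, features) array, pooling is simplified to one global max-pooling step after the last convolution, and a single dense softmax layer produces one probability per user; the class name TinyMouseCNN and the weight initialization are hypothetical. Training code is omitted.

```python
import numpy as np


def conv1x1(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """1x1 convolution over a (time, channels) array: (T, C_in) -> (T, C_out).

    A 1x1 kernel maps each time step independently, so it is a matrix product.
    """
    return x @ w + b


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def dropout(x: np.ndarray, p: float, rng: np.random.Generator, train: bool) -> np.ndarray:
    """Inverted dropout: zero each unit with probability p, rescale survivors."""
    if not train or p == 0.0:
        return x
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


class TinyMouseCNN:
    """Untrained sketch: 64 -> 32 -> 16 filters (1x1), ReLU, dropout 0.05,
    global max pooling over time, dense softmax with one output per user."""

    def __init__(self, in_channels: int, n_users: int, p_drop: float = 0.05, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.p_drop = p_drop
        sizes = [in_channels, 64, 32, 16]
        # Random He-style init; real weights would come from training.
        self.convs = [
            (self.rng.normal(0.0, np.sqrt(2.0 / c_in), (c_in, c_out)), np.zeros(c_out))
            for c_in, c_out in zip(sizes[:-1], sizes[1:])
        ]
        self.w_out = self.rng.normal(0.0, np.sqrt(1.0 / 16), (16, n_users))
        self.b_out = np.zeros(n_users)

    def forward(self, x: np.ndarray, train: bool = False) -> np.ndarray:
        h = x
        for i, (w, b) in enumerate(self.convs):
            h = relu(conv1x1(h, w, b))
            if i < len(self.convs) - 1:  # dropout between consecutive conv layers
                h = dropout(h, self.p_drop, self.rng, train)
        pooled = h.max(axis=0)  # global max pooling over time -> (16,)
        return softmax(pooled @ self.w_out + self.b_out)
```

A 1 × 1 kernel mixes feature channels at each time step without looking at neighbouring steps, which is why each convolution reduces to a matrix product; in this sketch, temporal context enters only through the pooling step.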

Applying the described methods to training and to the independent assessment of employee qualifications will increase the reliability of classical tools for working with network infrastructure [9]. Biometrics will help identify dishonest participants, which will improve the quality of training and recruitment.

REFERENCES:

1. Wang R., Han C., Guo T. A novel fingerprint classification method based on deep learning // 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2016. pp. 931-936.

2. Program interface for interaction of participants of WorldSkills competitions in competence 39 "System and network administration" with remote network infrastructure [computer program]. URL: https://www.fips.ru/registers-doc-view/fips_servlet?DB=EVM&DocNumber=2021614735&TypeFile=html

3. pyHook. 2021. URL: https://pypi.org/project/pyHook/ (accessed on 8 April 2021).

4. AF S. M., Marhusin M. F., Sulaiman R. Instrumenting API Hooking for a Realtime Dynamic Analysis // 2019 International Conference on Cybersecurity (ICoCSec). IEEE, 2019. pp. 49-52.

5. Ahmed A. A. E., Traore I. Dynamic sample size detection in continuous authentication using sequential sampling // Proceedings of the 27th Annual Computer Security Applications Conference, Orlando, FL, USA, 5-9 December 2011. pp. 169-176.

6. Ahmar A. S. A comparison of α-Sutte Indicator and ARIMA methods in renewable energy forecasting in Indonesia // Int. J. Eng. Technol. 2018. Vol. 7, no. 1.6. pp. 20-22.

7. Sen S., Sugiarto D., Rochman A. Komparasi Metode Multilayer Perceptron (MLP) dan Long Short Term Memory (LSTM) dalam Peramalan Harga Beras // Ultimatics: Journal Teknik Informatika. 2020. Vol. 12, no. 1. pp. 35-41.

8. Agarap A. F. Deep learning using rectified linear units (ReLU) // arXiv preprint arXiv:1803.08375. 2018.

9. Uymin A. G., Melnikov D. A. Review of network infrastructure modeling tools for training specialists in enlarged specialty groups 09.00.00, 10.00.00 // The science. Informatization. Technologies. Education. 2021. pp. 392-405.

THERMAL CALCULATION METHODS OF A SOLAR COLLECTOR FOR HOT WATER SUPPLY

Salmanova F.,
Doctor of Philosophy in Technical Sciences, Associate Professor,
Institute of Radiation Problems, Baku

Mustafayeva R.,
Candidate of Technical Sciences, Associate Professor,
Institute of Radiation Problems, Baku

Mahmudova T.,
Candidate of Physical and Mathematical Sciences, Associate Professor,
Institute of Radiation Problems, Baku

Yusupov I.,
Engineer,
Institute of Radiation Problems, Baku

Velizade I.,
Engineer,
Institute of Radiation Problems, Baku

https://doi.org/10.5281/zenodo.6594864

Abstract

Thus, the use of flat solar collectors is an economically justified measure that can reduce expenditure on traditional types of energy used for heat supply. Solar collectors are an ideal solution for covering the seasonal heat supply load and are well suited to both warm and temperate climates. They are particularly advantageous for seasonal operation under conditions of high solar insolation.

Keywords: Solar radiation, solar collector, heat engineering calculation, heat supply.

Solar heating systems (STS) are becoming more and more popular in many countries around the world. The success of STS is particularly impressive in Europe.

This issue gained particular importance when, alongside the development and more efficient use of traditional fuel and energy resources, it became necessary to bring in new energy sources such as solar and wind energy.

Solar energy is, above all, constantly renewed, and its use has no harmful effect on the environment.

The coastal strip of the Azerbaijani sector of the Caspian Sea, and especially the Absheron Peninsula, has unique solar and wind resources suitable for the widespread use of STS. A set of basic climatic requirements necessary for the rational and efficient use of solar energy is also considered.

In connection with this and the existing methodology of "State Citizenship", a thermal engineering calculation of the SVP was carried out using the recommended data for the month with the highest solar radiation intensity, q_i = 4439 W/m².

The inlet temperature is taken as t1 = 22 °C (for June), the collector outlet temperature as t2 = 55 °C, the ambient temperature as t_amb = 25 °C, and the efficiency as η = 0.44.

All types of installations with backup sources are calculated using data for the month with the largest amount of solar radiation during the operating period, and systems without a backup source are calculated using data for the month with the smallest amount.

The required area A, m², of the sun-absorbing surface of the collectors in an installation without a backup source should be determined by the formula:

$$A = \frac{G}{\sum_{i} g_i}, \qquad (1)$$

where G is the daily hot-water consumption of the hot water supply system, kg, taken according to SNiP; g_i is the hourly productivity of the installation per 1 m² of solar collector surface, kg/m²; and i denotes the estimated hours of operation of the installation.

If hot-water consumption varies from month to month in installations without a backup source, the collector area should be calculated from the daily hot-water consumption of each month, and the largest of the resulting areas should be adopted.
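Formula (1) together with the monthly rule above can be sketched in a few lines. The function names are hypothetical, and the numbers in the usage note are illustrative only, not results from this study.

```python
from typing import Dict, Sequence, Tuple


def collector_area(daily_consumption_kg: float, hourly_productivity: Sequence[float]) -> float:
    """Formula (1): A = G / sum_i g_i.

    daily_consumption_kg -- G, daily hot-water consumption, kg
    hourly_productivity  -- g_i for each estimated operating hour, kg/m^2
    """
    total = sum(hourly_productivity)
    if total <= 0:
        raise ValueError("the sum of hourly productivities must be positive")
    return daily_consumption_kg / total


def design_area(monthly: Dict[str, Tuple[float, Sequence[float]]]) -> float:
    """Uneven monthly consumption: size the field for every month, keep the largest area."""
    return max(collector_area(G, g) for G, g in monthly.values())
```

With G = 300 kg and hourly productivities of 10, 20 and 30 kg/m², formula (1) gives A = 300 / 60 = 5 m²; if another month yields 10, 10 and 10 kg/m², that month needs 10 m², which becomes the design area.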

The hourly productivity of the installation, kg/m2, is determined by the formula:

$$g_i = \frac{0.86\,U}{\ln \frac{t_{\max,i} - t_1}{t_{\max,i} - t_2}}, \qquad (2)$$

where U is the reduced heat loss coefficient of the solar collector, W/(m²·K); in the absence of passport data, 8 W/(m²·K) may be taken for single-glazed collectors and 5 W/(m²·K) for double-glazed collectors; t1, t2 coolant
