Section 4. Computer science
Khujaev Otabek Kadambayevich, Urgench branch of Tashkent University of Information Technologies, Researcher, Computer Engineering Department. E-mail: otabek.hujaev@gmail.com
SELECTION OF ARCHITECTURE AND TRAINING ALGORITHMS OF NEURAL NETWORKS FOR CLASSIFICATION TASK SOLUTIONS
Abstract: The article describes the selection of a neural network architecture and of a training algorithm for solving classification tasks. The task is solved with a multilayer neural network trained by the back-propagation method.
Keywords: multilayer feedforward network, backpropagation, hidden layer, classification.
1. Introduction
The construction of multilayer neural networks and their application to practical (universal) tasks is an urgent and complex problem in many application domains. Examples of such tasks include exchange-rate forecasting, optical and acoustic signal recognition, automation of medical diagnosis based on medical data, and other public-sector problems.
The theory of neural networks is a universal instrument that is developed with regard to the specific task of a given domain. Defining the neural network architecture, as well as choosing (or creating) the training algorithm of the neural network, is therefore a relevant problem, especially for the task stated below.
2. Statement of the problem
The training samples are given as follows:

$$X_1 = \begin{pmatrix} x_{11}^{1} & x_{12}^{1} & \cdots & x_{1N}^{1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{11}^{m_1} & x_{12}^{m_1} & \cdots & x_{1N}^{m_1} \end{pmatrix}, \quad \ldots, \quad X_m = \begin{pmatrix} x_{m1}^{1} & x_{m2}^{1} & \cdots & x_{mN}^{1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1}^{m_m} & x_{m2}^{m_m} & \cdots & x_{mN}^{m_m} \end{pmatrix}.$$

Here the set is defined as $X = \bigcup_{p=1}^{m} X_p$, $X_i \cap X_j = \emptyset$ $(i \neq j,\ i, j = \overline{1,m})$, where $x_{pj}^{i}$ is the $j$-th feature of the $i$-th object of class $p$; $m$ is the given number of classes; $m_p$ is the number of objects of class $p$.
Suppose the objects $x = (x^1, x^2, \ldots, x^N)$ and $w = (w^1, w^2, \ldots, w^N)$ of the Euclidean space $R^N$ are given as $N$-dimensional vectors. Then the scalar product of the vectors is calculated as
$$(x, w) = \sum_{j=1}^{N} x^j w^j.$$
Let the set of objects $x_i = (x_i^1, x_i^2, \ldots, x_i^N)$, $i = \overline{1,n}$, $n = m_1 + \ldots + m_m$, be given. An $n \times m$ matrix with entries equal to 1 or $-1$ indicates the class membership of the respective objects: a value of 1 at the intersection of row $i$ and column $j$ means that object $i$ belongs to class $j$, while a value of $-1$ means that object $i$ does not belong to class $j$. The elements of the matrix $y_{ij}$, $i = \overline{1,n}$, $j = \overline{1,m}$, thus take the values 1 and $-1$, where $n$ is the number of objects and $m$ is the number of classes.
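As a small illustration, such a membership matrix can be formed in MATLAB from a vector of class labels; the variable names labels and Y below are chosen for this sketch only.

```matlab
% Build the n-by-m class membership matrix Y with elements 1 and -1.
% labels(i) is the class index (1..m) of the i-th object of the learning sample.
labels = [1 1 2 3 2 1 3];          % example class labels of n = 7 objects
n = numel(labels);                  % number of objects
m = max(labels);                    % number of classes
Y = -ones(n, m);                    % initially no object is assigned to any class
for i = 1:n
    Y(i, labels(i)) = 1;            % mark the class the i-th object belongs to
end
disp(Y)
```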
Introduce the following symbols:
1) N - number of features of a given object;
2) n - number of objects of learning sample;
3) k - number of neurons in hidden layers;
4) m - number of classes;
5) j - index of the j-th feature;
6) i - index of the i-th object; for example, $x_i^j$ denotes feature j of object i;
7) r - the r-th neuron of the hidden layer;
8) $a_i^r$ - impact value of the r-th neuron on the i-th object, usually defined by the formula $a_i^r = (x_i, w_r)$;
9) the threshold, or logistic sigmoid, function $g(a_i^r) = \dfrac{1}{1 + e^{-a_i^r}}$, which expresses the effect of the r-th neuron on the i-th object; the value of this function indicates the object's membership in a particular class (a computational sketch of items 8 and 9 is given after this list);
10) the matrix of expectations $Y = \{y_{ij}\}$, $y_{ij} \in \{1, -1\}$, $i = \overline{1,n}$, $j = \overline{1,m}$, whose elements can be 1 or $-1$.
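A minimal MATLAB sketch of items 8) and 9), under the assumption that the objects and the weight vectors of the hidden neurons are stored as the rows of matrices X and W (illustrative names):

```matlab
% Impact values a(i,r) = (x_i, w_r) and their logistic sigmoid g(a).
X = rand(5, 4);                 % 5 objects with N = 4 features each (example data)
W = rand(3, 4);                 % weight vectors of k = 3 hidden neurons
A = X * W';                     % A(i,r) is the scalar product (x_i, w_r)
G = 1 ./ (1 + exp(-A));         % logistic sigmoid applied element-wise
disp(G)
```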
Assume we are given initial data sets for the classification task, taken from http://archive.ics.uci.edu/ml/datasets.html and shown in Table 1:
Table 1.

Name of data set                 | Number of attributes | Number of samples | Type of sample           | Number of classes
Iris                             | 4                    | 150               | real numbers             | 3
Blood Transfusion Service Center | 5                    | 748               | real numbers             | 2
Spambase                         | 57                   | 4601              | real and integer numbers | 2
SPECTF Heart                     | 44                   | 267               | integer numbers          | 2
Wine                             | 13                   | 178               | real and integer numbers | 3
The first column gives the names of well-known benchmark tasks, the second column the number of attributes of the investigated objects, the third column the total number of objects in the learning sample, the fourth column the type of the object attributes, and the fifth column the number of classes.
The considered tasks are classic experimental benchmarks; newly developed methods and algorithms are generally tested on them.
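If the MATLAB Neural Network Toolbox is available, the Iris data from Table 1, for example, is included among its sample data sets and can be loaded directly; data downloaded from the UCI repository can be prepared in the same inputs/targets form.

```matlab
% Load the Iris data in the inputs/targets form used by the toolbox.
[x, t] = iris_dataset;   % x: 4-by-150 matrix of attributes, t: 3-by-150 class indicator matrix
size(x)                  % number of attributes and number of samples
size(t)                  % number of classes and number of samples
```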
Task: selection of efficient neural network training algorithms and of the neural network architecture for solving classification tasks on the classical experimental data listed in the table above.
Solving a classification task with neural networks involves the two important subtasks mentioned above. Task 1 is the selection of the neural network architecture; Task 2 is the selection of the neural network training algorithm.
3. Neural network architecture and training algorithms
The two tasks are closely interconnected: the selection of the neural network architecture and the selection of its training algorithm form a correlated process, and different choices in either of them lead to different classification results. When selecting the architecture it is necessary to select an appropriate learning algorithm; taking this matching into account gives a marked reduction of errors in classification task solutions. The way both selections are made is described below.
3.1. Construction of the neural network and its architecture
A neural network can be constructed in two ways: as a feedforward network or as a recurrent network. A typical feedforward network consists of an input and an output layer: the input layer consists of the input vector and the weight matrix linking it with the hidden layer, and the output layer consists of the output vector and the weight matrix linking the hidden layer with the output. Recurrent neural networks additionally contain feedback connections.
In constructing the neural network we will use a two-layer feedforward network; it is described in the following scheme:
Scheme 1.

Here $x_i = (x_i^1, x_i^2, \ldots, x_i^N) \in X$, $i = \overline{1,n}$, are the input vectors; $w_r = (w_r^1, w_r^2, \ldots, w_r^N)$ are the rows of $W^{(1)}$, the weight matrix linking the input and hidden layers; $a = (a^1, a^2, \ldots, a^k)$ is the vector of hidden-layer outputs; $W^{(2)}$ is the weight matrix linking the hidden and output layers; and $y = (y^1, y^2, \ldots, y^m)$ is the vector of output-layer results. Here n is the number of input nodes of the neural network for the assigned task, equal to the number of attributes of the data set, and m is the number of output nodes, equal to the number of allocated classes. The number of neurons in the hidden layer is denoted by k; in different tasks it can be chosen in different ways. Usually the multilayer neural network architecture for the assigned task is written as the triple (n x k x m). When selecting the neural network architecture for the given task it is necessary to determine the required number of neurons in the hidden layer linking the input and output parameters.
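As an illustration, such an (n x k x m) architecture can be set up in MATLAB with the toolbox function patternnet; the input and output dimensions n and m are taken from the data when the network is trained, so only k is specified explicitly. The value k = 8 below is purely an example.

```matlab
% Two-layer feedforward classification network with k neurons in the hidden layer.
k = 8;                 % number of hidden neurons (example value)
net = patternnet(k);   % the (n x k x m) architecture; n and m are set by train() from the data
view(net)              % display the network structure
```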
3.2. Neural network training
Once the neural network has been built, its training is the main stage in solving the classification task. The back-propagation method is used for training multilayer neural networks. In this method the output value of each neuron is passed to a transfer function, and the output values of the transfer function become the input vectors of the next layer. In some sources the transfer function is called an activation function or membership function. The main requirement for these functions is that they be differentiable. The logistic sigmoid and the hyperbolic tangent are used as activation functions in the hidden layers of back-propagation networks, and linear functions are used in the output layer of neurons, since these functions are easily differentiated.
$$g(a) = \frac{1}{1 + e^{-\alpha a}}, \qquad (1)$$
$$g(a) = \frac{e^{\alpha a} - e^{-\alpha a}}{e^{\alpha a} + e^{-\alpha a}}. \qquad (2)$$
Here, $\alpha$ is the slope parameter of the sigmoid activation function.
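The toolbox's built-in transfer functions logsig and tansig correspond to formulas (1) and (2) with $\alpha = 1$. A direct sketch for an arbitrary slope value (the variable alpha is illustrative):

```matlab
% Activation functions (1) and (2) with slope parameter alpha.
alpha = 1;                                   % slope of the sigmoid (assumed value)
a = linspace(-5, 5, 11);                     % sample pre-activation values
g1 = 1 ./ (1 + exp(-alpha * a));             % logistic sigmoid, formula (1)
g2 = (exp(alpha*a) - exp(-alpha*a)) ./ ...
     (exp(alpha*a) + exp(-alpha*a));         % hyperbolic tangent, formula (2)
% For alpha = 1 these coincide with logsig(a) and tansig(a) of the toolbox.
disp(max(abs(g2 - tanh(a))))                 % numerical check against MATLAB's tanh
```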
Training of the multilayer perceptron is based on minimization of the error function, i.e., of the discrepancy between the vector of network outputs and the learning sample $y_{ij}$, $i = \overline{1,n}$, $j = \overline{1,m}$, considered as a function of the weight matrix; it is expressed as follows:
$$E(w) = \frac{1}{2}\sum_{r=1}^{k}\bigl(y_{ij} - g(a_i^r)\bigr)^2, \quad i = \overline{1,n};\; j = \overline{1,m}.$$
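A sketch of evaluating this error function for one object and one class; the matrices X, W and Y and the chosen indices are illustrative only and follow the notation of Section 2:

```matlab
% Sum-of-squares error E for object i and class j, following the formula above.
X = rand(5, 4);                               % n = 5 example objects with N = 4 features
W = rand(6, 4);                               % weight vectors of k = 6 hidden neurons
A = X * W';                                   % activations a(i,r) = (x_i, w_r)
G = 1 ./ (1 + exp(-A));                       % g(a(i,r)), an n-by-k matrix
Y = 2 * (rand(5, 3) > 0.5) - 1;               % example n-by-m expectation matrix of 1 and -1
i = 1;  j = 2;                                % one object and one class
E = 0.5 * sum((Y(i, j) - G(i, :)).^2);        % sum over the k hidden neurons
disp(E)
```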
Algorithms based on back-propagation: 1) Levenberg-Marquardt [5]; 2) BFGS Quasi-Newton [5]; 3) Resilient Backpropagation [5]; 4) Scaled Conjugate Gradient [5]; 5) Conjugate Gradient with Powell/Beale Restarts [4]; 6) Fletcher-Powell Conjugate Gradient [4]; 7) Polak-Ribiere Conjugate Gradient [4]; 8) One Step Secant [4]; 9) Gradient descent with momentum and adaptive learning [4].
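In the MATLAB Neural Network Toolbox each of these algorithms is selected through a training function (trainlm, trainbfg, trainrp, trainscg, traincgb, traincgf, traincgp, trainoss, traingdx; see Table 2). A minimal sketch of how the algorithms can be compared on one data set, assuming the toolbox and its sample Iris data are available:

```matlab
% Compare back-propagation-based training algorithms on the Iris data.
[x, t] = iris_dataset;                                  % 4-by-150 inputs, 3-by-150 targets
algorithms = {'trainlm','trainbfg','trainrp','trainscg', ...
              'traincgb','traincgf','traincgp','trainoss','traingdx'};
k = 8;                                                  % hidden-layer size (example value)
for s = 1:numel(algorithms)
    net = patternnet(k, algorithms{s});                 % network trained by the s-th algorithm
    net.trainParam.showWindow = false;                  % suppress the training window
    net = train(net, x, t);
    y = net(x);
    err = 100 * sum(vec2ind(y) ~= vec2ind(t)) / size(t, 2);  % classification error over all samples, %
    fprintf('%-10s %6.2f%%\n', algorithms{s}, err);
end
```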
4. Calculation results and conclusion
Different training functions and different numbers of neurons in the hidden layer give different minimal classification errors. The results have been obtained in MATLAB. For example, for the Wine task we perform the calculations for different algorithms and different neural network architectures as follows.
1) Select at random 15% of the objects of the initial data set of size $N$ and denote their number by $l$:
$$l = \left\lfloor \frac{N \cdot 15}{100} \right\rfloor.$$
Usually this subset is called the test sample; we denote it by $V$.
2) The test sample is removed from the master training sample, and the remaining training sample goes through all stages of neural network training. The test sample does not participate in the learning process, but the class membership of its objects is known.
3) At this stage the test objects are classified on the basis of the calculated weight matrices.
4) For every object $v_p \in V$, $p = \overline{1,l}$, the output $y' = (w, v_p)$, $y' \in Y$, determines its membership in a particular class. Let $q$ be the number of test objects that are not assigned to their classes correctly; then $l - q$ objects are classified correctly, and the classification error is equal to
$$r = \frac{q}{l} \cdot 100\%.$$
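A sketch of steps 1)-4) in MATLAB, using the toolbox's data division settings to hold 15% of the samples out as the test sample; the data set, the hidden-layer size, and the chosen training function are illustrative:

```matlab
% Steps 1)-4): hold 15% of the objects out as the test sample and measure the error on it.
[x, t] = iris_dataset;                           % example data set
net = patternnet(8, 'traincgp');                 % Polak-Ribiere conjugate gradient (example choice)
net.divideFcn = 'dividerand';                    % random division of the samples
net.divideParam.trainRatio = 0.85;               % the master training sample
net.divideParam.valRatio   = 0.00;
net.divideParam.testRatio  = 0.15;               % l, about 15% of the objects
net.trainParam.showWindow  = false;
[net, tr] = train(net, x, t);                    % the test sample does not take part in training
yTest = net(x(:, tr.testInd));                   % classify the test objects with the trained weights
q = sum(vec2ind(yTest) ~= vec2ind(t(:, tr.testInd)));   % number of misclassified test objects
l = numel(tr.testInd);                           % size of the test sample
r = q / l * 100;                                 % classification error r in percent
fprintf('test objects: %d, misclassified: %d, error: %.2f%%\n', l, q, r);
```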
Table 2.

No. | Algorithm name                                        | MATLAB training function | Minimum error in classification | Minimum number of neurons in the hidden layer
1.  | Levenberg-Marquardt                                   | trainlm                  | 4.09 | 4
2.  | BFGS Quasi-Newton                                     | trainbfg                 | 3.18 | 32
3.  | Resilient Backpropagation                             | trainrp                  | 2.27 | 8
4.  | Scaled Conjugate Gradient                             | trainscg                 | 2.27 | 4
5.  | Conjugate Gradient with Powell/Beale Restarts         | traincgb                 | 2.73 | 32
6.  | Fletcher-Powell Conjugate Gradient                    | traincgf                 | 1.82 | 8
7.  | Polak-Ribiere Conjugate Gradient                      | traincgp                 | 2.27 | 2
8.  | One Step Secant                                       | trainoss                 | 1.82 | 16
9.  | Gradient descent with momentum and adaptive learning  | traingdx                 | 1.82 | 32
Table 3. Minimum errors for the given tasks using different algorithms and their comparative analysis (each cell gives the minimum error and, in parentheses, its rank among the algorithms; the last column is the arithmetical mean of the ranks).
Algorithm                                             | Wine     | Iris     | Blood transfusion | Heart     | Spambase  | Arithmetical mean
Levenberg-Marquardt                                   | 4.09 (5) | 2.16 (1) | 21.82 (4)         | 19.4 (4)  | 6.89 (1)  | 3
BFGS Quasi-Newton                                     | 3.18 (4) | 3.78 (3) | 24.17 (8)         | 22.39 (8) | 10.07 (7) | 6
Resilient Backpropagation                             | 2.27 (2) | 4.86 (5) | 21.82 (4)         | 16.12 (1) | 14.1 (8)  | 4
Scaled Conjugate Gradient                             | 2.27 (2) | 4.32 (4) | 20.53 (3)         | 20.3 (5)  | 9.44 (5)  | 3.8
Conjugate Gradient with Powell/Beale Restarts         | 2.73 (3) | 3.24 (2) | 22.03 (5)         | 21.79 (7) | 7.5 (2)   | 3.8
Fletcher-Powell Conjugate Gradient                    | 1.82 (1) | 5.95 (6) | 21.07 (3)         | 22.39 (8) | 8.43 (3)  | 4.2
Polak-Ribiere Conjugate Gradient                      | 2.27 (2) | 2.16 (1) | 18.93 (1)         | 18.51 (3) | 9.55 (6)  | 2.6
One Step Secant                                       | 1.82 (1) | 4.86 (5) | 22.78 (6)         | 17.61 (2) | 8.82 (4)  | 3.6
Gradient descent with momentum and adaptive learning  | 1.82 (1) | 4.86 (5) | 22.89 (7)         | 21.49 (6) | 28.78 (9) | 5.6
Table 4. Minimum errors for the above tasks:

No. | Name of data set  | Minimum error in classification | Minimum number of neurons in the hidden layer | Algorithm
1.  | Wine              | 1.82  | 8  | Gradient Descent Backpropagation with adaptive learning
    |                   |       | 16 | One Step Secant
    |                   |       | 32 | Fletcher-Powell Conjugate Gradient
2.  | Iris              | 2.16  | 2  | Levenberg-Marquardt
    |                   |       | 8  | Polak-Ribiere Conjugate Gradient
3.  | Spambase          | 6.89  | 32 | Levenberg-Marquardt
4.  | SPECTF Heart      | 16.12 | 8  | Resilient Backpropagation
5.  | Blood Transfusion | 18.93 | 32 | Polak-Ribiere Conjugate Gradient
From this we conclude that, when classification tasks are solved with neural networks, the choice of training algorithm and the choice of network architecture are interrelated; matching them leads to a reduction of classification errors.
The Polak-Ribiere Conjugate Gradient algorithm gave the most effective solution for the selected tasks, while the Levenberg-Marquardt and Resilient Backpropagation algorithms were the most effective for large training samples.
References:
1. Jeff Heaton. Introduction to Neural Networks with Java. Heaton Research Inc., St. Louis, 2005. P. 77-104.
2. Aksenov S. V., Novoseltsev V. B. Organization and Use of Neural Networks (Methods and Technologies). Tomsk, 2006. P. 16-21 (in Russian).
3. Simon Haykin. Neural Networks and Learning Machines. 3rd edition, 2009. P. 80-95.
4. Howard Demuth, Mark Beale. Neural Network Toolbox, User's Guide.
5. Osowski S. Neural Networks for Information Processing. Moscow, 2002. P. 50-65 (in Russian).