
MSC 68R99

DOI: 10.14529/mmp180208

A SYNTHESIS OF PSEUDO-BOOLEAN EMPIRICAL MODELS BY PRECEDENTIAL INFORMATION

V.I. Donskoy, Crimean Federal University, Simferopol, Russian Federation, [email protected]

The problem of decision-making based on partial, precedential information is among the most important in the creation of artificial intelligence systems. From observations of the behaviour of external objects or systems, it is necessary to synthesize or, more precisely, to extract from the data a mathematical optimization model of the object on the basis of accumulated empirical information in the form of a finite set of triples: "a state vector, the value of the quality of functioning of the object, a binary indicator of the admissibility of this state". The aim of the work is to create and substantiate mathematical methods and algorithms that allow one to synthesize models of scalar pseudo-Boolean optimization with a constraint in the form of a disjunctive normal form (DNF) from such precedential information. A peculiarity of pseudo-Boolean optimization models with separable objective functions and a DNF constraint whose length is bounded by a constant is their polynomial solvability. However, the complexity of bringing a problem to the form with a DNF constraint is, in general, exponential. When the model is extracted from data, the DNF constraint is synthesized approximately but with polynomial complexity, and the number of conjunctions in the extracted DNF does not exceed the number of examples in the initial precedential information. The paper shows how to use binary decision trees to construct a disjunctive constraint, proposes methods to identify the monotonicity and linearity properties of partially defined objective functions, and develops algorithms for solving problems of pseudo-Boolean scalar optimization in the presence of incomplete, precedential initial information. The scope of application of the obtained results includes intelligent control systems and intelligent agents. Although control models derived from data are approximate, their application can be more successful than the use of less realistic models that are inconsistent with the modelled objects and chosen on the basis of subjective considerations.

Keywords: pseudo-Boolean optimization; disjunctive constraint; machine learning; intelligent control; decision trees.

Introduction

Discrete pseudo-Boolean models of conditional scalar optimization generally have the form

max (min) f(x) under the condition x ∈ Ω ⊆ B^n,   (1)

where B^n = {0,1}^n is the set of vertices of the unit n-dimensional cube, Ω is the set of admissible solutions defined by constraints of various types, f : B^n → R is a pseudo-Boolean objective function, and x = (x_1, ..., x_i, ..., x_n) ∈ B^n is an arbitrary Boolean vector. The classical 0-1 linear programming case of model (1) contains restrictions in the form of linear inequalities and a linear objective function, such as

max Σ_{i=1}^{n} c_i x_i;
a_{j1}x_1 + ··· + a_{ji}x_i + ··· + a_{jn}x_n ≤ b_j, j = 1, ..., m; x_i ∈ {0,1}; c_i, a_{ji} ∈ R, i = 1, ..., n.   (2)

Model (2) is widely used to solve problems of production planning, equipment loading, investment, and many other applications. When the dimension of model (2) is large (when n and m are large), it may be difficult or even impossible to obtain the complete source numeric information in the form of the objective function coefficients c_i, the constraint matrix [a_{ji}]_{m×n}, and the values b_j. Moreover, there may even be no information on how the model constraints that define the set of admissible solutions Ω are "arranged". The statement of the problems solved in this article can be explained using the concept of a "black box". The system or controlled object under study is considered as a black box exposed to inputs x and generating responses in the form of the numbers f(x) and, possibly, the values of the predicate [x ∈ Ω]. It is required, from the given set of impacts and responses, to construct a model M̂ = ⟨f̂, Ω̂⟩ approximating as accurately as possible the true but unknown model M, which is known to belong to class (1); then to find the optimum point x* = arg max (min) {f̂(x) : x ∈ Ω̂} and, if some of the variables of the vector x are controllable, to suggest control actions based on the model M̂.

In broad terms, the problems considered in this paper fall within the scope of acquisition of optimization models from data and decision-making under incomplete information. Such problems were first investigated by Vl.D. Mazurov [2].

The source partial information is represented by the training sample {(x_j, f(x_j), γ_j)}_{j=1}^{l} containing the values of the objective function f and of the predicate γ_j = [x_j ∈ Ω] at these points. This source information is believed to reflect the properties of a regular object or system and to be error-free (correct).
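For concreteness, one observed triple "state vector, quality value, admissibility indicator" can be held in a small container type; the following minimal Python sketch is illustrative (the type names are this sketch's assumptions, not notation from the paper):

from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, ...]          # a vertex of B^n, e.g. (0, 1, 1, 0)

@dataclass(frozen=True)
class Precedent:
    x: Point       # observed state vector x_j
    f: float       # observed quality of functioning f(x_j)
    gamma: int     # admissibility indicator gamma_j = [x_j in Omega]

Sample = List[Precedent]         # the training sample of length l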

1. Basic Definitions and Statements Required

Definition 1. The set of variables (the point) x = (x_1, ..., x_i, ..., x_n) precedes the set z = (z_1, ..., z_i, ..., z_n) (denoted x ≼ z) if for all i = 1, ..., n the inequality x_i ≤ z_i is satisfied, and strictly precedes it (x ≺ z) if ∀i (x_i ≤ z_i) and at the same time ∃i (x_i < z_i).

Definition 2. A pseudo-Boolean function f is called monotone if for any pair of points x, z ∈ Dom(f) such that x ≼ z the inequality f(x) ≤ f(z) is satisfied.

Definition 3. A pseudo-Boolean function is called linear if it has the form f(x_1, ..., x_i, ..., x_n) = a_0 + a_1x_1 + ··· + a_ix_i + ··· + a_nx_n, where the a_i are real numbers.

A literal is an expression x^σ, where x is a Boolean variable and σ is a Boolean constant, such that x^σ = x when σ = 1 and x^σ = x̄ when σ = 0. To match the values of a Boolean variable x with its real 0-1 values, the formulas x̄ = 1 − x and x = 1 − (1 − x) are introduced. For variables that take only the values 0 or 1, the identities x_i · x_i · ... · x_i = x_i and x_i · x̄_i = 0 hold. Therefore pseudo-Boolean polynomials need not include terms that contain a variable raised to a power, and may include only products of distinct variables without inversions.

Theorem 1. Any pseudo-Boolean function can be represented as a polynomial.

Proof. Let an arbitrary pseudo-Boolean function of n variables f : B^n → R take the values y_0, y_1, ..., y_{2^n−1} at the points σ^0, ..., σ^j, ..., σ^{2^n−1}, σ^j = (σ_1^j, ..., σ_n^j). If one multiplies the characteristic function of each of these points by the corresponding value of the function f at it and then adds the resulting terms, the expression f(x_1, x_2, ..., x_n) = Σ_{j=0}^{2^n−1} y_j · x_1^{σ_1^j} · x_2^{σ_2^j} ··· x_n^{σ_n^j}, where x_i^{σ_i^j} = 1 ⇔ x_i = σ_i^j, is obtained. Replacing x_i^0 = 1 − x_i, multiplying out and collecting like terms gives the polynomial. □
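To illustrate the construction in this proof (an added sketch, not part of the original paper), the multilinear polynomial of a function given as a value table can be computed by Möbius-style inversion over the subset lattice, using the identity f(1_S) = Σ_{T⊆S} c_T:

from itertools import product

def pb_polynomial(f, n):
    # Compute the multilinear polynomial of f: {0,1}^n -> R (Theorem 1).
    # Returns {frozenset S: c_S} with f(x) = sum_S c_S * prod_{i in S} x_i.
    coeffs = {}
    for x in sorted(product((0, 1), repeat=n), key=sum):  # by number of ones
        S = frozenset(i for i, b in enumerate(x) if b)
        # f at the indicator vector of S equals the sum of c_T over T subset of S:
        coeffs[S] = f(x) - sum(c for T, c in coeffs.items() if T < S)
    return coeffs

# Example: f = x1 OR x2 yields the polynomial x1 + x2 - x1*x2.
print(pb_polynomial(lambda x: float(x[0] or x[1]), 2))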

In the optimization model (1), constraints, generally speaking, can be presented in different forms: algebraic, graph-based, logical, even in the form of a specific algorithm. The objective function can likewise be represented in various ways. The variety of possible representations of model (1) makes it reasonable to introduce the concept of the form of representation. For example, (2) is a model with constraints in the form of non-strict inequalities and with a linear objective function.

Definition 4. The model of scalar pseudo-Boolean optimization in the form

max (min) f(x_1, ..., x_n) provided ∨_{j=1}^{m} x_{j1}^{σ_{j1}} ∧ ··· ∧ x_{jk_j}^{σ_{jk_j}} = 1   (3)

is called the first form with a DNF constraint, and the model in the form

max (min) f(x_1, ..., x_n) provided ∨_{q=1}^{m} x_{q1}^{σ_{q1}} ∧ ··· ∧ x_{qk_q}^{σ_{qk_q}} = 0   (4)

is the second form with a DNF constraint.

Theorem 2. Any model of pseudo-Boolean conditional optimization can be represented both in the first and in the second form with a DNF constraint.

Proof. The constraint x ∈ Ω in the general form contained in model (1) can be represented in the following equivalent way. We introduce the Boolean function φ_Ω(x) = 1 ⇔ x ∈ Ω. Then the constraint takes the equivalent form φ_Ω(x) = 1. Any Boolean function can be represented in disjunctive normal form, so representing φ_Ω in a DNF yields the first form. Defining the Boolean function otherwise, ψ_Ω(x) = 0 ⇔ x ∈ Ω, one obtains the second form. □

Lemma 1. If in the linear pseudo-Boolean optimization model (2) all coefficients of the constraints are positive and

a_{j1}x_1 + ··· + a_{ji}x_i + ··· + a_{jn}x_n ≤ b_j ⇔ ψ_j(x) = 0,

then ψ_j is a monotone Boolean function and ψ_Ω = ∨_{j=1}^{m} ψ_j is a monotone Boolean function.

Proof. Let α ≼ β, i.e. ∀i (α_i ≤ β_i). Then

S_α = Σ_{i=1}^{n} a_{ji}α_i − b_j ≤ Σ_{i=1}^{n} a_{ji}β_i − b_j = S_β.

By the condition of the Lemma, S_α ≤ 0 ⇔ ψ_j(α) = 0 and S_β ≤ 0 ⇔ ψ_j(β) = 0. Because of the inequality S_α ≤ S_β the following implications are valid: ψ_j(β) = 0 ⇒ ψ_j(α) = 0 and ψ_j(α) = 1 ⇒ ψ_j(β) = 1, whence follows the inequality ψ_j(α) ≤ ψ_j(β), i.e. ψ_j is a monotone Boolean function. The class of monotone Boolean functions is closed and contains disjunction, so the function ψ_Ω = ∨_{j=1}^{m} ψ_j is also monotone. □

Theorem 3. If the problem of conditional linear pseudo-Boolean optimization (2) with positive coefficients in the constraint inequalities is presented in the second form with a DNF constraint

ψ_Ω(x) = ∨_{q=1}^{m} x_{q1}^{σ_{q1}} ∧ ··· ∧ x_{qk_q}^{σ_{qk_q}} = 0,

then ψ_Ω is a monotone function.

Proof. Using Theorem 2 and Lemma 1 it is easy to obtain a proof of the theorem. □

Theorem 4. The problem of scalar pseudo-Boolean conditional optimization of a linear function of n variables with a DNF constraint

max (min) Σ_{i=1}^{n} c_i x_i provided ∨_{j=1}^{m} x_{j1}^{σ_{j1}} ∧ ··· ∧ x_{jk_j}^{σ_{jk_j}} = 1,   (5)

which contains m conjunctions, is solvable with time complexity O(mn).

Proof. The DNF constraint in (5) contains m conjunctions x_{j1}^{σ_{j1}} ∧ ··· ∧ x_{jk_j}^{σ_{jk_j}}, each of which equals one exactly on its corresponding interval of rank k_j. The Boolean variables with the numbers j1, ..., jk_j are fixed on this interval at the values σ_{j1}, ..., σ_{jk_j}, and the other variables are free. To optimize the objective function on the interval, each free variable x_i is set equal to one if c_i > 0, equal to zero if c_i < 0, and arbitrarily when c_i = 0 (for maximization; symmetrically for minimization). Therefore, the linear pass over the m intervals requires O(mn) computation steps. □
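The linear pass from this proof admits a direct implementation. In the following illustrative Python sketch (not the paper's code), a conjunction is encoded as a dict {variable index: fixed value σ}:

def solve_linear_dnf(c, conjunctions, maximize=True):
    # O(mn) optimization of sum(c[i] * x[i]) under DNF constraint (5).
    # c            -- the n objective coefficients;
    # conjunctions -- one dict {variable index: fixed 0/1 value} per
    #                 conjunction of the DNF.
    # Returns (best value, best point).
    best_value, best_x = None, None
    for conj in conjunctions:
        x = []
        for i, ci in enumerate(c):
            if i in conj:
                x.append(conj[i])                      # fixed by the conjunction
            else:                                      # free: choose by sign of c[i]
                x.append(int(ci > 0) if maximize else int(ci < 0))
        value = sum(ci * xi for ci, xi in zip(c, x))
        if best_value is None or (value > best_value if maximize else value < best_value):
            best_value, best_x = value, x
    return best_value, best_x

With the data of Example 1 below (DNF (7): x̄_1x̄_2 ∨ x̄_3), solve_linear_dnf([2, 1, 4, 2], [{0: 0, 1: 0}, {2: 0}]) returns (6, [0, 0, 1, 1]).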

Corollary 1. If the problem of conditional optimization of a linear pseudo-Boolean function is reduced to the form with a DNF constraint in a polynomial number of steps, then it is polynomially solvable.

Most often, problems of conditional optimization of pseudo-Boolean functions, including those presented by model (2), are NP-hard. Exceptions are, in particular, models with a separable objective function whose set of constraints corresponds to the structure of a matroid. Therefore, taking Corollary 1 into account, the construction of the constraint DNF is in itself a complex problem. However, within the framework of the problems considered in this article, the approximation of the domain of admissible solutions by logical machine learning algorithms leads to the construction of DNFs whose length does not exceed the number of training sample examples.

2. DNF Constraint Synthesis as a Problem of Machine Learning: Approach Based on the Use of Decision Trees

If the initial partial information contains the predicate values γ_j = [x_j ∈ Ω] at the training sample points, then mathematical precedent-based machine learning methods can be used to construct an approximation φ̂_Ω(x) or ψ̂_Ω(x) of φ_Ω in the form of a DNF. The most suitable for this purpose are learning algorithms based on the construction of binary decision trees (BDT), which are algorithmic operators mapping training sequences of precedents into a family of Boolean functions having a tree-like structural representation [3]. Such algorithmic operators must possess the ability of empirical generalization and guarantee learnability in the sense of arbitrarily accurate approximation of the constructed empirical DNF D̂ to the true but unknown DNF D as the number of precedents in the training sample grows.

When one builds a BDT classifying the points x ∈ B^n (in the case considered in this article, into two classes: satisfying the constraints and not satisfying them), the initial information is a training sample {(x_j, γ_j)}_{j=1}^{l}, where γ_j = 1 ⇔ x_j ∈ Ω and γ_j = 0 ⇔ x_j ∉ Ω. The procedure of BDT synthesis consists in the sequential execution of same-type steps: at each step, the set B^n (at the first step) or an interval of B^n (at subsequent steps) is partitioned by a vertex of the tree to which are assigned one of the variables and two outgoing edges that correspond to the unit and zero values of the selected variable. Such a partition is called splitting or branching. Branching continues until the stopping condition is met, namely the presence in each interval of the resulting partition (or, one may say, in each leaf of the tree) of points of only one and the same class [4].
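A compact sketch of this synthesis procedure, together with reading a DNF off the finished tree, is given below (illustrative Python; the branching heuristic is this sketch's choice, the paper prescribes only the stopping condition):

from collections import Counter

def build_bdt(sample, variables):
    # Grow a BDT from precedents (x, gamma) until every leaf is pure.
    # Returns 0/1 for a leaf, or a node (i, subtree for x_i = 0, subtree for x_i = 1).
    if not sample:
        return 0                       # no evidence: an arbitrary label (sketch choice)
    labels = {g for _, g in sample}
    if len(labels) == 1:
        return labels.pop()            # pure leaf: the stopping condition
    def mixing(part):                  # points not of the majority class
        counts = Counter(g for _, g in part)
        return min(counts.values()) if len(counts) > 1 else 0
    i = min(variables,
            key=lambda v: mixing([s for s in sample if s[0][v] == 0]) +
                          mixing([s for s in sample if s[0][v] == 1]))
    rest = [v for v in variables if v != i]
    return (i,
            build_bdt([s for s in sample if s[0][i] == 0], rest),
            build_bdt([s for s in sample if s[0][i] == 1], rest))

def dnf_of_class(tree, label, path=()):
    # Paths leading to leaves of the given class; each path is a
    # conjunction, returned as a dict {variable index: value}.
    if tree == 0 or tree == 1:
        return [dict(path)] if tree == label else []
    i, zero, one = tree
    return (dnf_of_class(zero, label, path + ((i, 0),)) +
            dnf_of_class(one, label, path + ((i, 1),)))

Applied to the sample of Table 1 below, dnf_of_class(tree, 1) yields the conjunctions of an approximation φ̂ of the kind shown in (7), and dnf_of_class(tree, 0) those of ψ̂_Ω.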

The following simple example is intended to illustrate the proposed approach to the construction of a constraint in the form of a DNF by decision tree machine learning.

Example 1. Assume that the learning information was generated by the following model M with a single constraint in the form of an inequality:

max f(x_1, x_2, x_3, x_4) = 2x_1 + x_2 + 4x_3 + 2x_4; 3x_1 + x_2 + 2x_3 + x_4 ≤ 3.   (6)

Let the objective function in (6) be known exactly, while the region of admissible solutions, actually defined by the inequality in (6), be unknown but partially represented by eight precedent points (Table 1).

Table 1

Precedents for learning

x_1 x_2 x_3 x_4 γ

0 0 0 1 1

0 0 1 1 1

0 1 0 1 1

0 1 1 1 0

1 0 0 0 1

1 0 1 0 0

1 0 1 1 0

1 1 1 0 0

The indicator γ equals one if the point is admissible and zero otherwise. One of the possible BDTs built on the learning sample contained in Table 1 is presented in the Figure. This BDT is equivalent to the following DNF:

x̄_1x̄_2 ∨ x̄_1x_2x̄_3 ∨ x_1x̄_3 = x̄_1x̄_2 ∨ x̄_3.   (7)

We obtain the approximation φ̂(x) = x̄_1x̄_2 ∨ x̄_3 and, accordingly, the constraint φ̂_Ω(x) = 1 in the first form. Since ψ̂_Ω(x) = 0 ⇔ φ̂(x) = 1, we obtain the approximating function ψ̂_Ω(x) = ¬(x̄_1x̄_2 ∨ x̄_3) = x_1x_3 ∨ x_2x_3. DNF (7) defines two possibilities for choosing a solution: either x_1 = x_2 = 0, and the remaining variables can be assigned arbitrarily (since the coefficients of x_3 and x_4 in the objective function are positive, they are assigned units), or x_3 = 0, and the remaining variables can be assigned units. As a result, f(0, 0, 1, 1) = 6, f(1, 1, 0, 1) = 5, and the decision is max f(x) = 6 at the point x = (0, 0, 1, 1). The approximations φ̂_Ω(x) and ψ̂_Ω(x) differ from the true functions

φ_Ω(x) = x̄_1x̄_2 ∨ x̄_1x̄_3 ∨ x̄_1x̄_4 ∨ x̄_2x̄_3x̄_4 = 1 ⇔ 3x_1 + x_2 + 2x_3 + x_4 ≤ 3

and

ψ_Ω(x) = x_1x_2 ∨ x_1x_3 ∨ x_1x_4 ∨ x_2x_3x_4 = 0 ⇔ 3x_1 + x_2 + 2x_3 + x_4 ≤ 3

(under the conjunctions of the DNF of the function φ_Ω(x), the maximal values of the objective function 2x_1 + x_2 + 4x_3 + 2x_4 on the intervals corresponding to these conjunctions are 6, 3, 5, and 2, respectively).

Figure. Binary Decision Tree

In the particular case considered in Example 1, the constraint φ̂_Ω(x) = 1 built from the learning sample information indeed allowed the exact solution of problem (6) to be found. But in the general case there is no guarantee of obtaining an exact solution of this problem from partial learning information. □

Denote Ω̂ = {x : φ̂_Ω(x) = 1}. When constructing a BDT to obtain an approximation of the constraint region Ω, at each branching step t the set B^n is split into disjoint intervals N_1, ..., N_s, ..., N_t.

Definition 5. An interval N_s ⊆ B^n is called correct if N_s ⊆ Ω or N_s ⊆ B^n \ Ω. The union of all correct intervals of the partition obtained at branching step t is called the correct approximation region Ω_t^corr, and the set B^n \ Ω_t^corr is called the incorrect region.

Theorem 5.
1° Ω_1^corr ⊆ ··· ⊆ Ω_s^corr ⊆ ··· ⊆ Ω_t^corr for all t satisfying the double inequality 2 ≤ t ≤ T, where T is the final step of the BDT synthesis;
2° At some step t* ≤ 2^n − 1 the equality Ω_{t*}^corr = B^n can be attained provided that the length of the training sample is l ≥ t* + 1.

Proof. 1) Suppose that after branching step t − 1, t ≥ 2, the correct approximation region Ω_{t−1}^corr has been formed (this region may be empty at the first branching steps). By definition it consists only of correct intervals. At step t the correct intervals are not partitioned, so a certain interval N from the region B^n \ Ω_{t−1}^corr will be selected for partitioning. This interval will be split into two intervals N_0 and N_1, N_0 ∪ N_1 = N. If at least one of the intervals N_0, N_1 is correct, the region of correctness will expand, otherwise it will remain unchanged, so Ω_{t−1}^corr ⊆ Ω_t^corr.

2) At each branching step, some selected interval of the already obtained partition of B^n is split into two new intervals, and sooner or later the equality Ω_t^corr = B^n can be fulfilled. Indeed, at step t of the partitioning t + 1 intervals are obtained, and in the worst case the process continues until the step t = 2^n − 1, when the set B^n is split into 2^n intervals of rank n, so that each of them contains exactly one point from the correct learning sample. Therefore the equality Ω_{t*}^corr = B^n will certainly be achieved. □

Theorem 5 states the monotone refinement of the region of correct approximation of the partially given constraint as the length of the training sample grows, right up to an exact result. But herewith the correctness of all examples in the training sample is an obligatory condition. The latter condition is natural in the approach of extracting optimization models from data when regular processes and systems are investigated. In this case we are not speaking of any probabilistic distributions, but a possibility exists to assess the acceptability of the synthesized DNF on the basis of the Kolmogorov approach to the evaluation of regularity as non-randomness.

A.N. Kolmogorov emphasized the need to distinguish between actual randomness as the absence of regularity and stochastic randomness as the subject of probability theory [5, p. 42]. When a regularity is extracted empirically, the Kolmogorov approach makes it possible to estimate the non-randomness of the found regularity, in the considered case, of the empirical DNF approximating the constraints of the optimization problem.

Definition 6. A result of machine learning is called an exact tuning on a training sample when the obtained empirical decision rule accurately calculates the approximable value at each sampling point (for each precedent).

Theorem 6. [6] Let an empirical regularity be extracted from the family F, and let the appearance in the sample {(x_j, γ_j)}_{j=1}^{l} of any example from the general population B^{n+1} × ··· × B^{n+1} be equiprobable. Then the probability P(F, l) of a random exact tuning on a training sample of length l satisfies the inequality P(F, l) ≤ 2^{−l+K(F)}, where K(F) is the Kolmogorov complexity of the family F.

Corollary 2. When the conditions of Theorem 6 are met, the estimate P(F, l) ≤ 2^{−l+pVCD(F)} holds, where pVCD(F) is an upper bound on the Kolmogorov complexity of the family F obtained by the pVCD method [6].

If binary decision trees are used to extract such a regularity as a DNF constraint, then the family F is the class BDT_{n,μ} of trees with at most μ leaves over n Boolean variables. The following estimate is known [7]:

pVCD(BDT_{n,μ}) = (μ − 1)(⌈log_2 n⌉ + ⌈log_2(μ + 3)⌉).   (8)

We require the condition P(F, l) ≤ ε, which leads to an equation determining the required length of the training sample ensuring that the probability of random extraction of the DNF constraint does not exceed ε. From the equation ε = 2^{−l+(μ−1)(⌈log_2 n⌉+⌈log_2(μ+3)⌉)} we get l = log_2(1/ε) + (μ − 1)(⌈log_2 n⌉ + ⌈log_2(μ + 3)⌉). The results of calculating the required sample length l = l(n, μ, ε) are shown in Table 2.

Table 2

The required length l of the training sample for 1/ε = 128 (ε ≈ 0.0078)

n | 20  | 20  | 20   | 50  | 50   | 100
μ | 20  | 50  | 100  | 30  | 100  | 100
l | 197 | 546 | 1195 | 355 | 1294 | 1393
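The entries of Table 2 can be reproduced directly from this equation; a small illustrative Python computation (not from the paper):

from math import ceil, log2

def required_sample_length(n, mu, eps):
    # l = log2(1/eps) + (mu - 1)*(ceil(log2 n) + ceil(log2(mu + 3))),  cf. (8)
    pvcd = (mu - 1) * (ceil(log2(n)) + ceil(log2(mu + 3)))
    return ceil(log2(1 / eps)) + pvcd

# Reproduces Table 2 (1/eps = 128):
for n, mu in [(20, 20), (20, 50), (20, 100), (50, 30), (50, 100), (100, 100)]:
    print(n, mu, required_sample_length(n, mu, 1 / 128))
# -> 197, 546, 1195, 355, 1294, 1393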

3. Approximation of Partially Defined Pseudo-Boolean Functions

Assume that the objective function f is monotone, its maximum is sought, and a monotone approximation f̂ has been obtained. Let the empirical DNF constraint ∨_{j=1}^{m} K_j = 1 be constructed. Then on any interval of the set B^n corresponding to the conjunction K_j = x_{j1}^{σ_{j1}} ∧ ··· ∧ x_{ji}^{σ_{ji}} ∧ ··· ∧ x_{jk_j}^{σ_{jk_j}} the extreme value f̂(x*_{K_j}) is found immediately at the point x*_{K_j}, which is defined as follows. The variables with numbers ji such that σ_{ji} = 1 get unit values, those with σ_{ji} = 0 get zero values, and the rest of the variables, free from entering the conjunction K_j, are set to units (by monotonicity). Next, x* = arg max_j {f̂(x*_{K_j}) : x_{j1}^{σ_{j1}} ∧ ··· ∧ x_{jk_j}^{σ_{jk_j}} = 1}. It is easy to make sure that the time complexity of the described algorithm for finding the extremum of a monotone pseudo-Boolean function with a DNF constraint is estimated as O(mn).

As shown above, information on the monotonicity of the function partially defined by the sample greatly simplifies the solution of the problem. The monotonicity condition can be checked on a given training sample quite simply by the following Algorithm 1.

Algorithm 1. Checking the consistency of a function with the monotonicity property on a training sample.

Input: the correct sample {(x_j, f(x_j))}_{j=1}^{l}.

Output: M = 1 if the sample does not contradict the monotonicity condition, otherwise M = 0.

M := 1;
for j := 1 to l − 1 do
for s := j + 1 to l do
if (x_j ≼ x_s) ∧ (f(x_j) > f(x_s)) ∨ (x_s ≼ x_j) ∧ (f(x_s) > f(x_j))
then M := 0; stop end then.
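An equivalent runnable rendering of Algorithm 1 (a Python sketch under the same assumptions; the sample is a list of pairs (x, f_value) with x a 0/1 tuple):

def consistent_with_monotonicity(sample):
    # Returns True iff no pair of sample points violates
    # x <= z componentwise  =>  f(x) <= f(z)   (i.e. M = 1 in Algorithm 1).
    def precedes(x, z):
        return all(xi <= zi for xi, zi in zip(x, z))
    for j in range(len(sample) - 1):
        for s in range(j + 1, len(sample)):
            (xj, fj), (xs, fs) = sample[j], sample[s]
            if (precedes(xj, xs) and fj > fs) or (precedes(xs, xj) and fs > fj):
                return False
    return True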

Let us further denote (C, x) = Σ_{i=1}^{n} c_i x_i.

Definition 7. The function f partially defined by the training sample {(x_j, f(x_j))}_{j=1}^{l} allows a linear approximation if there is a vector C ∈ R^n such that for any pair of points (x_p, x_q) from this sample with f(x_p) < f(x_q) the inequality (C, x_p) < (C, x_q) is satisfied.

Theorem 7. The function f partially defined by the training sample {(x_j, f(x_j))}_{j=1}^{l}, where f(x_j) ≠ f(x_m) for 1 ≤ j < m ≤ l, admits a linear approximation if and only if, in the sequence of sample points x_{j1}, ..., x_{jp}, ..., x_{jl} sorted by the values of the objective function so that f(x_{j1}) < ··· < f(x_{jp}) < ··· < f(x_{jl}), for all p = 1, 2, ..., l − 1 the points x_{j1}, ..., x_{jp} can be separated from the points x_{j(p+1)}, ..., x_{jl} by a hyperplane (C*, x) = λ_p with (C*, x_{jp}) < (C*, x_{j(p+1)}).

Proof. Necessity. Let the sample allow a linear approximation. Then there exists a vector C* ∈ R^n such that (C*, x_{j1}) < ··· < (C*, x_{jp}) < (C*, x_{j(p+1)}) < ··· < (C*, x_{jl}). It is obvious that the hyperplanes (C*, x) = λ_p, p = 1, ..., l − 1, where λ_p = ½((C*, x_{jp}) + (C*, x_{j(p+1)})), satisfy the condition of the theorem.

Sufficiency. Let the separating hyperplanes (C*, x) = λ_p exist. Then the vector C* satisfies the admissibility requirement of linear approximation by the transitivity of the relation "<". □

Theorem 8. If for the sequence of points x_{j1}, ..., x_{jp}, ..., x_{jl} sorted by the values of the objective function, f(x_{j1}) < ··· < f(x_{jp}) < ··· < f(x_{jl}), for every p = 1, 2, ..., l − 1 there exists a hyperplane (C_p, x) = λ_p separating the points x_{j1}, ..., x_{jp} from the points x_{j(p+1)}, ..., x_{jl} with (C_p, x_{jp}) < (C_p, x_{j(p+1)}), then there exists a vector C* ∈ R^n which defines hyperplanes (C*, x) = β_p separating the points x_{j1}, ..., x_{jp} from the points x_{j(p+1)}, ..., x_{jl}, herewith (C*, x_{jp}) < (C*, x_{j(p+1)}).

Proof. The opposite assumption

¬∃ C* ∈ R^n ∀p: (C*, x_{jp}) < (C*, x_{j(p+1)}) ⇔ ∀ C ∈ R^n ∃p: (C, x_{jp}) ≥ (C, x_{j(p+1)})

entails the denial of the condition ∃ C_p ∈ R^n: (C_p, x_{jp}) < (C_p, x_{j(p+1)}). □

To verify the possibility of a linear approximation of the objective function according to Theorems 7 and 8, we need to check the linear separability of the points x_{j1}, ..., x_{jp}, ..., x_{jl} of the training sample, ordered by increasing values of the objective function. Namely: the point x_{j1} from all the others, the points x_{j1} and x_{j2} from all the others, and so forth, until the separability of all points with numbers j1, ..., j(l−1) from the point x_{jl}. Overall we need l − 1 separability checks.

Remark 1. If the training sample {(x_j, f(x_j))}_{j=1}^{l} contains a subset of points with equal values of the function f, all these points must be placed into one of the groups to be separated. It is easy to verify that with this addition Theorem 7 remains true, and the number of separability checks required will be one less than the number of subsets of training sample points with the same values of the objective function.

To sequentially check the separability of two groups of points with simultaneous construction of separating hyperplanes (in the direction of increasing the objective function, as in Theorem 8), it is advisable to use the iterative Rosenblatt-Novikov linear correction procedure [8]. If two finite sets of points G_1 and G_2 are linearly separable, then this procedure constructs a separating hyperplane in a finite number of correction steps k ≤ ⌈D²/ρ²⌉, where D = sup_{x ∈ G_1 ∪ G_2} ||x||, and ρ is half the distance between the convex hulls of G_1 and G_2.

Let us denote by x̃_j = (x_{j1}, ..., x_{jn}, 1) ∈ B^{n+1} the extended vector representing the description of the point x_j with an added (n+1)-th coordinate, to which the value 1 is assigned; C = (c_1, ..., c_n, c_{n+1}) ∈ R^{n+1} is an extended vector which defines the separating hyperplane c_1x_1 + ··· + c_nx_n + c_{n+1} = 0. The linear correction procedure, starting from an arbitrary initial vector C, is implemented in lines 13 and 14 of Algorithm 2 below. The parameter Max of this algorithm is the maximum admissible number of cyclic views of the sample during learning; exceeding it means that the groups G_1 and G_2 are considered linearly non-separable in the sense of Theorem 8.

Algorithm 2. Test of admissibility of a linear approximation of the function f.

Input: a sample {(x_j, f(x_j))}_{j=1}^{l} sorted by non-decreasing values of f(x_j); Max, the maximum admissible number of cyclic views of the sample during learning; the correction step α > 0.

Output: L = 1 if the sample admits a linear approximation, otherwise L = 0.

1: L := 1; \\ A flag of the result.
2: C := (0, ..., 0, 0); α := 1; \\ Initialization.
3: q := 1;
4: while q < l ∧ f(x_q) = f(x_{q+1}) do q := q + 1; \\ Equal function values go into one group.
5: if q = l then stop;
6: G1 := {x̃_1, ..., x̃_p, ..., x̃_q}; \\ The first group to be separated.
7: G2 := {x̃_{q+1}, ..., x̃_p, ..., x̃_l}; \\ The second group to be separated.
8: Count := 0; \\ The counter of cyclic views of the sample.
9: t := 1; \\ The counter of corrections.
10: if Count > Max then L := 0; stop end then; \\ The admissible number of views is exceeded.
11: LS := 1; \\ Assume that no corrections occur and the next separator is constructed.
12: for p := 1 to l do \\ Cycle over all points of the sample.
13: if (C, x̃_p) > 0 ∧ x̃_p ∈ G1 then C := C − α·x̃_p; t := t + 1; LS := 0;
14: if (C, x̃_p) < 0 ∧ x̃_p ∈ G2 then C := C + α·x̃_p; t := t + 1; LS := 0;
15: if LS = 0 then Count := Count + 1; goto 10 end then; \\ Repeat the pass after corrections.
16: if q < (l − 1) then q := q + 1; goto 4 end then. \\ To the following separation.
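A runnable Python rendering of Algorithm 2 (an illustrative sketch: the names max_views and alpha are this sketch's choices, and the handling of zero scalar products follows lines 13 and 14 above):

def admits_linear_approximation(sample, max_views=1000, alpha=1.0):
    # sample: list of (x, f_value) pairs sorted by non-decreasing f_value,
    # x a 0/1 tuple of length n. Returns True if every prefix group can be
    # separated from the corresponding suffix group, False otherwise.
    l = len(sample)
    n = len(sample[0][0])
    ext = [tuple(x) + (1,) for x, _ in sample]   # extended vectors x~
    f = [v for _, v in sample]
    C = [0.0] * (n + 1)                          # warm-started between separations
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    q = 0
    while q < l - 1:
        if f[q] == f[q + 1]:                     # equal values go into one group
            q += 1
            continue
        for _ in range(max_views):               # cyclic views of the sample
            corrected = False
            for p in range(l):
                s = dot(C, ext[p])
                if p <= q and s > 0:             # G1 point on the wrong side
                    C = [ci - alpha * xi for ci, xi in zip(C, ext[p])]
                    corrected = True
                elif p > q and s < 0:            # G2 point on the wrong side
                    C = [ci + alpha * xi for ci, xi in zip(C, ext[p])]
                    corrected = True
            if not corrected:                    # separator found for this q
                break
        else:
            return False                         # views exhausted: not separable
        q += 1
    return True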

If it turns out that the data do not allow a linear approximation, it is possible to check the possibility of a quadratic approximation. Assuming that f(x) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} c_{ij}x_i x_j + Σ_{j=1}^{n} c_{0j}x_j + c_{00} and making the change of variables x_1x_2 = y_1, x_1x_3 = y_2, ..., x_1 = y_{n(n−1)/2+1}, x_2 = y_{n(n−1)/2+2}, ..., x_n = y_{n(n+1)/2}, it is easy to pass to the admissibility check and then to the actual construction of the approximation. The number n(n+1)/2 of variables becomes very large, but iterative algorithms allow one to cope with this task.
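A sketch of this change of variables (illustrative):

from itertools import combinations

def quadratic_features(x):
    # Map x in {0,1}^n to the n(n+1)/2 variables y: all pairwise
    # products x_i * x_j (i < j) followed by the original x_i.
    pairs = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return pairs + list(x)

Algorithm 2 applied to the transformed sample {(quadratic_features(x_j), f(x_j))} then tests the admissibility of a quadratic approximation.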

For nonlinear approximation in the general case, regression trees [4] and trained neural networks are applicable.

References

1. Antamoshkin A.N., Masich I.S. Search Algorithms for Conditional Pseudo-Boolean Optimization. Systems of Control, Communication and Security, 2016, no. 1, pp. 103-145. (in Russian)

2. Mazurov Vl.D. Application of Methods of the Theory of Pattern Recognition in Optimal Planning and Management. Proceedings of the 1st All-Union Conference on Optimal Planning and National Economy Management. Moscow, 1971, p. 49. (in Russian)

3. Rokach L., Maimon O.Z. Data Mining with Decision Trees: Theory and Applications. New Jersey, London, Singapore, Beijing, Shanghai, Hong Kong, Taipei, Chennai, World Scientific, 2014. DOI: 10.1142/9097

4. Loh W.-Y. Classification and Regression Trees. Data Mining and Knowledge Discovery, 2011, vol. 1, no. 1, pp. 14-23.

5. Kolmogorov A.N. Algorithm, Information, Complexity. Moscow, Znanie, 1991. (in Russian)

6. Donskoy V.I. Complexity of Families of Learning Algorithms and Estimation of the Nonrandomness of Extraction of Empirical Regularities. Cybernetics and Systems Analysis, 2012, vol. 48, no. 2, pp. 233-241. DOI: 10.1007/s10559-012-9402-2

7. Donskoy V.I. Capacity Estimates of the Main Classes of Empirical Generalizations Derived by the pVCD Method. Scientific Notes of Taurida National V.I. Vernadsky University, 2010, vol. 23 (62), no. 2, pp. 56-65. (in Russian)

8. Nilsson N.J. Learning Machines. N.Y., McGraw-Hill, 1965.

9. Bonates T.O., Hammer P.L. Pseudo-Boolean Regression. Rutcor Research Report RRR 3-2007. New Jersey, Rutgers Center for Operations Research of Rutgers University, 2007.

Received February 25, 2018


Vladimir Iosifovich Donskoy, Doctor of Physical and Mathematical Sciences, Professor, Department of Informatics, Crimean Federal University (Simferopol, Russian Federation), [email protected].
