Scientific article on the topic 'Elaboration of a vector based semantic classification over the words and notions of the natural language'

Text of a scientific article in the field of “Linguistics and Literary Studies”

CC BY
Keywords
NATURAL LANGUAGE GENERATION / NATURAL LANGUAGE SEMANTICS

Abstract of a scientific article on linguistics and literary studies; authors: Safonov K. V., Lichargin D. V.

The problem of a vector-based semantic classification over the words and notions of natural language is discussed. A set of generative grammar rules is offered for generating the semantic classification vector. Examples of applying the classification and a theorem on the incompleteness of an arbitrary formal classification are presented. The principles of assigning meaningful phrases as functions over the word groups of the classification are analyzed.


Text of the scientific work on the topic “Elaboration of a vector based semantic classification over the words and notions of the natural language”

system analysis: 1) management of the development of regional socio-economic systems as applied to the analysis of regional industrial policy; 2) coordination of the contract between the manufacturer, the investor and the equipment supplier (a problem of firm development); 3) restructuring of large enterprises in the machine-building industry; 4) development of workstations for investment analysts in the construction industry, and mortgage lending.

A package of application programs [5] has now been developed that supports multicriteria dynamic analysis and static linear problems of economic dynamics. Using this package increases the validity of decision-making in managing global socio-economic development, taking into account the interests of many parties, when obtaining the ranges of WSES parameters and the optimal values of the control variables. This ensures stable development for as long as possible.

Bibliography

1. Forrester, J. World Dynamics / J. Forrester. Moscow: Nauka, 1978. (in Russian)

2. Makhov, S. A. Mathematical Modeling of World Dynamics and Sustainable Development on the Example of the Forrester Model: preprint of the Keldysh Institute of Applied Mathematics, Russian Academy of Sciences / S. A. Makhov. Moscow, 2005. (in Russian)

3. Solte, Dirk. Weltfinanzsystem am Limit: Einblicke in den “Heiligen Graal” der Globalisierung / D. Solte. Berlin: TerraMediaVerlag, 2007.

4. Radermacher, F. J. Balance or Destruction: Eco-Social Market Economy as the Key to Sustainable World Development / F. J. Radermacher; ForSIS, Non-Profit Partnership “For a Sustainable Information Society in Russia”. Novosibirsk, 2008. (in Russian)

5. Constructor and Solver of Discrete Optimal Control Problems (“Karma”): computer program / rights holders A. V. Medvedev, P. N. Pobedash, A. V. Smolyaninov, M. A. Gorbunov. Registered by the Federal Service for Intellectual Property, Patents and Trademarks (Rospatent) 11.09.2008, No. 2008614387. (in Russian)

© Gorbunov M. A., Medvedev A. V., Pobedash P. N., Semenkin E. S., 2009

K. V. Safonov

Siberian State Aerospace University named after academician M. F. Reshetnev, Russia, Krasnoyarsk

D. V. Lichargin

Siberian Federal University, Russia, Krasnoyarsk

ELABORATION OF A VECTOR BASED SEMANTIC CLASSIFICATION OVER THE WORDS AND NOTIONS OF THE NATURAL LANGUAGE


Keywords: natural language generation, natural language semantics.

One of the most important problems of formal language theory, a branch of theoretical computer science, is the syntactic and semantic analysis of the sentences of a given language. In the study of the structure of natural and machine languages, the foremost problem is the generation of natural language, i.e. of grammatically and semantically meaningful phrases and texts that satisfy definite meaningfulness criteria, for example the Turing test. The importance of the problem is determined by the significance of applied tasks such as building natural-language interfaces and developing expert systems, electronic translators, automatic summarization systems, e-learning systems, advertising and user-dialogue software, etc.

The principal purpose of this research is to offer a classification of natural language words and notions that makes it possible to generate meaningful speech and to define criteria of meaningful speech. The basic tasks are to determine the classification vector for natural speech words and notions; to create a dictionary classifying a set of the most common English words; to develop algorithms of meaningful speech generation based on the given classification; and to prove the theorem on the incompleteness of an arbitrary formal classification for describing the differences in natural language word meanings.

The novelty of the work lies in the distinctive features and the efficiency of the generative grammar described below for generating the vector coordinates of the classification of natural language words and notions, and in the particular ways this classification is used for natural language generation.

A great number of researchers currently work on the problem of generating the meaningful subset of a language: philologists, programmers, mathematicians, semanticists, philosophers, etc. [1; 2; 3; 4]. The results achieved today in generating grammatically correct natural language phrases are especially impressive: text editors, electronic translators and other systems effectively generate grammatically correct language structures. However, the generation of semantically meaningful speech is a less studied topic. Although many systems are based on semantic nets, Speech Graffiti, ontologies and other methods, they still do not show good results in a dialogue with a natural language user. The most popular method of sustaining a dialogue with the user comes down to using databases of natural language dialogues between people, such as forum participants. Little work has been done on representing natural language phrases and texts as functions and functional clusters over a multidimensional semantic classification, even though this method has shown its efficiency for generating meaningful speech [5; 6; 7].

Classification of Natural Language Words and Notions. Let us consider a semantic classification of natural language words and notions that is reduced to 16 classes of language semes (semantic “atoms” of meaning), further to four gene-semes (elementary particles of meaning), and then to the notion of a link (a meaning “quantum”), which can be represented with the apparatus of notional semantic nets. A definition based on the meaning quantum is a semantic net whose arcs bear the semantics of equivalence of some elements, i.e. a link between objects.

Using four elementary particles, the gene-semes {system, classification, localization, perception}, it is possible to describe natural language. Localization is defined as an object in which all levels of the subsystems are similar to one another; for example, a triangle formed by stars of a galaxy is similar to any proportional triangle formed by houses on a planet of that star system. Perception is defined as an object in which all the (perceived) subsystems are similar to the (perceiving) supersystems; for example, a real image of a vase in the light spectrum forms an information similarity pyramid, first in the pupil of the eye and then in the brain. Structure is defined as an object with heterogeneous subsystems and supersystems; for example, the body and the wheels of an automobile are heterogeneous. Classification is defined as an object in which all subsystems are similar to the supersystems; for example, crab apples possess all the properties of apples, while apples possess all the properties of fruit.

Using the four gene-semes it is possible to determine 16 classes of semes. Here are some examples of such definitions for the class of basic semes:

- creature - perceiving and localized in space;

- thing - not perceiving and localized in space;

- mind - perceiving and not localized in space;

- abstraction - not perceiving and not localized in space;

- idea - perceived and not localizing in space;

- place - not obligatorily perceived and localized in space;

- information - perceived and localizing;

- abstraction - not obligatorily perceived and not localizing.
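The eight definitions above can be read as a lookup from a pair of features to a basic seme. A minimal Python sketch, assuming the features are kept as the literal strings used in the list; `BASIC_SEMES`, `classify_basic_seme` and `features_of` are illustrative names, not part of the original work. The list assigns “abstraction” to two different feature pairs, so the reverse lookup returns both.

```python
# (perception feature, localization feature) -> basic seme, copied from the list.
BASIC_SEMES = {
    ("perceiving", "localized"): "creature",
    ("not perceiving", "localized"): "thing",
    ("perceiving", "not localized"): "mind",
    ("not perceiving", "not localized"): "abstraction",
    ("perceived", "not localizing"): "idea",
    ("not obligatorily perceived", "localized"): "place",
    ("perceived", "localizing"): "information",
    ("not obligatorily perceived", "not localizing"): "abstraction",
}


def classify_basic_seme(perception: str, localization: str) -> str:
    """Return the basic seme for a feature pair; raises KeyError for unlisted pairs."""
    return BASIC_SEMES[(perception, localization)]


def features_of(seme: str) -> list[tuple[str, str]]:
    """Return every feature pair listed for a seme."""
    return [pair for pair, name in BASIC_SEMES.items() if name == seme]


# classify_basic_seme("perceived", "localizing") -> "information"
```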

The following basic classes of meaning atoms are determined as semes of the natural language:

1. Basic semes: creature, place, information and others.

2. Semes of probability: existing, non-existing, necessary, possible and derived ones.

3. Semes-predicates: relation-x, relation-x-x, relation-creature-x and others.

4. Semes-arguments: subject, object, recipient, instrument and others.

5. Semes of localization: of, in, on, at and others.

6. Semes-relations: includes, is included in, includes and is included in, partially includes, is more than, is less than and others.

7. Semes-numbers: digits from 0 to 15.

8. Semes of indefinite number: all, many, some, few, no and others.

9-12. Semes of language stylistics: positive - negative, low - high and others.

13-16. Semes characterizing the description of images and forms: wide - narrow, stable - unstable and others.

Based on this classification of natural language semes, a five-coordinate classification vector for natural language notions is offered. The values of the coordinates of the vector G are assigned by means of a generative grammar of the following form:

1. The first level of the classification of notions corresponds to the coordinate G1 of the vector G. Let G1 = {something, relation, mind, idea, information, place, thing, creature}.

2. The second level of the classification of notions is represented by the coordinate G2. The set G2 of coordinate values is assigned by a set of generative grammar rules: {S → Fd, S → Fx, d → alive, d → not alive, x → which alive, x → which not alive, F → of, F → in, F → on, F → at}, where the notion “at” means any non-zero distance between objects.

3. The third level of the classification of notions is determined by the coordinate G3: G3 = {X-y (essence), X-X-y (essence of essence), relation-X-y (property), relation-X-X-y (connection), relation-creature-X-y (action), relation-creature-X-X-y (joining), relation-creature-creature-X-y (presenting), relation-creature-creature-X-X-y (exchange)}, where X is any of the basic semes determined on the first level of the classification and y is any sequence of such semes. X denotes the seme that is the main one by meaning. The sign “-” denotes concatenation here. Explanations are given in round brackets.

4. The set G4 of values of the coordinate G4 of the vector G is assigned by a set of generative grammar rules: {S → P1-P2-P3-P4-P5-P6-P7-P8, P1 → g-quantity, P1 → Z, P2 → g-stability, P2 → Z, P3 → g-positivity, P3 → Z, P4 → g-spectrum, P4 → Z, P5 → g-information content, P5 → Z, P6 → g-location, P6 → Z, P7 → g-size, P7 → Z, P8 → g-being artificial, P8 → Z}, where g is a linguistic scale value such as {minimal, ..., little, ..., medium, ..., big, ..., maximal, Z}. Here Z is the empty symbol.

5. The set G5 of values of the coordinate G5 of the vector G is assigned by a set of generative grammar rules: {S → x, x → (xFx), x → xFx, x → 1 (existing), x → 0 (non-existing), x → ◇ (possible), x → □ (necessary), F → includes, F → is included in, F → includes and is included in, F → partially includes, F → more than, F → less than, F → equal to, F → similar to, F → becomes, F → derives from, F → is simultaneous to, F → is not simultaneous to, F → implies, F → is determined by, F → corresponds to, F → is connected to}.
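The five rules above are small enough to execute directly. The following Python sketch, assuming the rules exactly as listed, enumerates the non-recursive G2 grammar, fills the G3 templates, builds G4 strings and draws random G5 derivations. All function and variable names are hypothetical; the G4 scale contains only the named points because the text elides the intermediate values; and the depth limit in `g5_value` is an added assumption needed to stop the recursive rules x → (xFx) and x → xFx.

```python
import random
from itertools import product

# G2: rules as listed; dictionary keys are the nonterminals S, F, d, x.
G2_RULES = {
    "S": [["F", "d"], ["F", "x"]],
    "d": [["alive"], ["not alive"]],
    "x": [["which alive"], ["which not alive"]],
    "F": [["of"], ["in"], ["on"], ["at"]],
}


def expand(symbol, rules):
    """Return every terminal string derivable from symbol (non-recursive grammars only)."""
    if symbol not in rules:  # a terminal stands for itself
        return [symbol]
    results = []
    for rhs in rules[symbol]:
        parts = [expand(s, rules) for s in rhs]
        results.extend(" ".join(combo) for combo in product(*parts))
    return results


G2 = expand("S", G2_RULES)  # 16 values such as "of alive" or "in which alive"

# G3: templates; X is the main basic seme, y an optional sequence of semes.
G3_TEMPLATES = {
    "essence": "X-y",
    "essence of essence": "X-X-y",
    "property": "relation-X-y",
    "connection": "relation-X-X-y",
    "action": "relation-creature-X-y",
    "joining": "relation-creature-X-X-y",
    "presenting": "relation-creature-creature-X-y",
    "exchange": "relation-creature-creature-X-X-y",
}


def g3_value(kind, x="X", y=()):
    """Put x into every X slot (a simplification) and splice the semes y in place of y."""
    tokens = []
    for token in G3_TEMPLATES[kind].split("-"):
        if token == "X":
            tokens.append(x)
        elif token == "y":
            tokens.extend(y)  # an empty y simply disappears
        else:
            tokens.append(token)
    return "-".join(tokens)


# G4: parameters P1..P8 and the named points of the linguistic scale g.
G4_PARAMETERS = ["quantity", "stability", "positivity", "spectrum",
                 "information content", "location", "size", "being artificial"]
SCALE = ["minimal", "little", "medium", "big", "maximal"]


def g4_value(assignment):
    """S -> P1-...-P8, where each Pi -> g-<parameter> or the empty symbol Z."""
    parts = []
    for parameter in G4_PARAMETERS:
        g = assignment.get(parameter)
        if g is None:
            continue  # Pi -> Z
        if g not in SCALE:
            raise ValueError(f"unknown scale value: {g!r}")
        parts.append(f"{g}-{parameter}")
    return "-".join(parts)


# G5: atomic values of x and the relations F.
ATOMS = ["1", "0", "◇", "□"]  # existing, non-existing, possible, necessary
RELATIONS = ["includes", "is included in", "includes and is included in",
             "partially includes", "more than", "less than", "equal to",
             "similar to", "becomes", "derives from", "is simultaneous to",
             "is not simultaneous to", "implies", "is determined by",
             "corresponds to", "is connected to"]


def g5_value(rng, depth=0, max_depth=2):
    """One random derivation from S -> x; at max_depth only x -> 1 | 0 | ◇ | □ is used."""
    rule = rng.choice(["atom"] if depth >= max_depth else ["atom", "(xFx)", "xFx"])
    if rule == "atom":
        return rng.choice(ATOMS)
    left = g5_value(rng, depth + 1, max_depth)
    relation = rng.choice(RELATIONS)
    right = g5_value(rng, depth + 1, max_depth)
    body = f"{left} {relation} {right}"
    return f"({body})" if rule == "(xFx)" else body
```

For instance, `g3_value("action")` gives `relation-creature-X` and `g3_value("presenting")` gives `relation-creature-creature-X`, which are the third coordinates that appear in the example vectors for “transport” and for the group {take, give, ...}.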

All further levels of the classification are formed by recursively repeating the offered five levels. The level index can be calculated by the formula $G_i = G_{\operatorname{mod}(i,\,5)}$, where $i$ is a natural number. Any notion or class of notions of the natural language corresponds to a definite classification vector.
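Read literally, mod(i, 5) equals 0 for i = 5, 10, 15, ..., so the formula needs a convention for those levels. A minimal sketch, assuming levels are numbered from 1 and a remainder of 0 stands for the fifth base level; `base_level` and `repetition` are hypothetical helper names. The `period` parameter also covers the 16-coordinate variant of the classification mentioned in the text.

```python
def base_level(i: int, period: int = 5) -> int:
    """Base level (1..period) whose values level i repeats.

    The text writes G_i = G_mod(i,5); here a remainder of 0 is read as the
    last base level, so levels 5, 10, 15, ... map to level 5.
    """
    if i < 1:
        raise ValueError("classification levels are numbered from 1")
    return (i - 1) % period + 1


def repetition(i: int, period: int = 5) -> int:
    """0-based index of the repetition of the base block that level i belongs to."""
    if i < 1:
        raise ValueError("classification levels are numbered from 1")
    return (i - 1) // period


# Levels 1..12 map to base levels 1 2 3 4 5 1 2 3 4 5 1 2.
```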

For example, the group of words {take, give, buy, sell, accept, present, etc.} corresponds to the vector [thing\\relation-creature-creature-X].

The group of words {shop, kiosk, supermarket, etc.} corresponds to the vector [thing\in which alive\X] + [thing\\relation-creature-creature-X].

The word “transport” corresponds to the vector [thing\in which alive\X] + [place\\relation-creature-X].
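In these examples a backslash separates the coordinates, an empty field (two backslashes in a row) leaves a coordinate unspecified, and “+” joins the several notion points a word group corresponds to. A minimal Python sketch, assuming exactly this notation, parses the vectors into coordinate tuples and finds the points two word groups share; `WORD_VECTORS`, `parse_vectors` and `shared_vectors` are hypothetical names.

```python
import re

# Word groups and their classification vectors, copied from the examples.
# A backslash separates coordinates; two in a row leave a coordinate empty.
WORD_VECTORS = {
    "take/give/buy/sell": r"[thing\\relation-creature-creature-X]",
    "shop/kiosk/supermarket": r"[thing\in which alive\X]+[thing\\relation-creature-creature-X]",
    "transport": r"[thing\in which alive\X]+[place\\relation-creature-X]",
}

_VECTOR = re.compile(r"\[([^\]]*)\]")


def parse_vectors(notation: str) -> list[tuple[str, ...]]:
    """Turn '[..]+[..]' notation into coordinate tuples; empty coordinates become ''."""
    return [tuple(body.split("\\")) for body in _VECTOR.findall(notation)]


def shared_vectors(group_a: str, group_b: str) -> set[tuple[str, ...]]:
    """Notion points (vectors) that two word groups have in common."""
    return set(parse_vectors(WORD_VECTORS[group_a])) & set(parse_vectors(WORD_VECTORS[group_b]))
```

With this reading, “shop” shares the exchange point with {take, give, ...} and the “in which alive” point with “transport”, which is how the sums of vectors place one word group relative to others.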

Each word corresponds to a set of semantic notions, i.e. points of the notion space. However, using only five coordinates of the multidimensional classification vector is a definite simplification: in its most complete form the classification can be based on 16 coordinates of a recursively repeating vector of values.

The principle of meaningful speech generation based on the offered classification has been tested in software such as the “Electronic Dictionary”.

The Incompleteness of a Formal Classification Theorem. To provide a basis for the given classification, let us introduce a definition of a conditionally complete classification and prove the theorem of semantic classification incompleteness.

Definition 1. Let us call a system representing word semantics as points of a vector space conditionally complete if, for arbitrary elements $a \in \{a', a'', a''', \dots\}$, $b \in \{b', b'', b''', \dots\}$, ..., $c \in \{c', c'', c''', \dots\}$ and the vector $v[a, b, \dots, c]$, it is true that for any notion $A$, $A \sim a' \vee a'' \vee a''' \vee \dots$; for any notion $B$, $B \sim b' \vee b'' \vee b''' \vee \dots$; ...; for any notion $C$, $C \sim c' \vee c'' \vee c''' \vee \dots$, where “$\sim$” denotes correspondence.

Theorem. Any system representing word semantics as points of a vector space is incomplete. In other words, for any classification there exist words whose meaning elements are not completely classified by it. For any classification $A$ of the word set $\{a_k\}$, where every $a_k \sim v[a_x, b_y, \dots, c_z]$, the meaning $S(a_k)$ of the word includes a meaning shade $S(a_k).L_n$ such that $\neg(S(a_k).L_n \sim S(v[a_x, b_y, \dots, c_z]))$, that is, $\neg(S(a_k) \in S(v[a_x, b_y, \dots, c_z]))$, where $a = \{a_1, a_2, a_3, \dots\}$, $b = \{b_1, b_2, b_3, \dots\}$, ..., $c = \{c_1, c_2, c_3, \dots\}$.

Let us show an example of meaning that goes beyond the meaning determined by a classification. The word light $\sim v[\text{action}, \dots, \text{from the surface}, \dots, \text{intensive}, \dots]$, while $\neg(S(\text{light}).\text{shining} \sim S(v[\text{action}, \dots, \text{from the surface}, \dots, \text{intensive}, \dots]))$: the emotional and associative rows that a person connects with a word cannot be completely expressed by a formal classification. Consequently, some meaning elements cannot be expressed by any formalism; for example, it is impossible to explain to a blind person what the sensation of a color such as red is, and therefore he cannot imagine it. Thus words correspond to positions in a classification according to the law of the excluded middle, while their meanings are not reducible to this division.
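The light example can be mimicked with a toy model in which a meaning is a set of feature values and classifying a word means projecting that meaning onto a fixed set of coordinates. This only illustrates the argument under that assumption and is not the authors' formalism; the coordinate names `kind`, `source`, `intensity` and the feature `shade` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical names for a three-coordinate slice of
# v[action, ..., from the surface, ..., intensive, ...].
COORDINATES = ("kind", "source", "intensity")


def classify(meaning: Dict[str, str]) -> Tuple[Optional[str], ...]:
    """Project a meaning (feature -> value) onto the classification vector."""
    return tuple(meaning.get(c) for c in COORDINATES)


def unclassified(meaning: Dict[str, str]) -> Dict[str, str]:
    """Meaning elements that no coordinate of the classification captures."""
    return {f: v for f, v in meaning.items() if f not in COORDINATES}


# "light" with the shade "shining", and the same meaning without that shade.
light = {"kind": "action", "source": "from the surface", "intensity": "intensive",
         "shade": "shining"}
light_without_shade = {k: v for k, v in light.items() if k != "shade"}

# Both meanings fall on the same point of the classification ...
assert classify(light) == classify(light_without_shade)
# ... although they differ, and the difference is exactly the unclassified shade.
assert light != light_without_shade
assert unclassified(light) == {"shade": "shining"}
```

Adding a `shade` coordinate would capture this particular element, but any further feature (an association, say) would again fall outside the coordinates; the proof of the theorem builds on the same situation with its extra coordinate g.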

Lemma 1. A word meaning can have arbitrarily large power. Proof. Let the power of a word meaning be the power of the set $\{S(a_k).L_n\}$ for a given word $a_k$. Let the meaning of the word $a_k$ be assigned by a definition in the form of a semantic net $\{L_i(L_{i'}, L_{i''})\}$. The word $a_k$ is correlated with an object of reality that stands in a system of relations with outer objects, with parts of the system and with the perception of the system. Relations with outer objects of reality (distance, concatenation, simultaneity) determine the meaning $\{L_i(L_{i'}, L_{i''})\}$, where $L$ is an outer object, and reality (for example, the set of points of space, the number of literary worlds, time, subsets of sets of objects and points) is in principle endless; therefore the set of word meanings is in principle unlimited: $\{L\} = \infty \vdash \{L_i(L_{i'}, L_{i''})\} = \infty$.

For example, the meaning of any word can always be extended: a reading student, a student reading a book, a student sitting and reading a book, etc., without limit.

Lemma 2. The number of possible words with different semantics is endless. Proof. $\{L_i(L_{i'}, L_{i''})\} = \infty \vdash \{L\} = \infty \vdash \{S(a_k).L_j\} = \infty$, since the word $a_k$ can be arbitrary.

Proof of the theorem. Let classification $A$ be assigned by a vector of coordinates $v[a, b, \dots, c]$, where $a_k \sim v[a_x, b_y, \dots, c_z]$, $S(a_k).L_i$ is an element of the meaning of the word $a_k$, and $a = \{a_1, a_2, a_3, \dots\}$, $b = \{b_1, b_2, b_3, \dots\}$, ..., $c = \{c_1, c_2, c_3, \dots\}$. For any $a_k$, let $v[a_x, b_y, \dots, c_z] \sim S(a_k)$ hold. For any coordinate $d$ of the vector $v[a, b, \dots, c]$: $d = \{d' \vee \neg d'\} \cup \{d'' \vee \neg d''\} \cup \dots$ Let us assign the value $g = \{g' \vee \neg g'\} \cup \{g'' \vee \neg g''\} \cup \dots$, where $g \neq a$, $g \neq b$, ..., $g \neq c$. The union of the classification vectors $v[a, b, \dots, c]$ and $v'[g]$ gives the vector $v''[a, b, \dots, c, g]$. By Lemma 2, an element $S(a).L_h$ can be found such that $S(a) \in v''[a, b, \dots, c, g]$. Consider the sets of such meaning elements $E = \{S(a).L_h\}$ and $I = \{S(a_k).L_i\}$, with $I \subset E$. The sets differ if $g$ is not empty, because $g \neq a$, $g \neq b$, ..., $g \neq c$. Let $Y = E \setminus I$.

If a classification $v[a, b, \dots, c]$ could exist for which the set $Y$ is always empty, then it would always hold that either $S(g) = \varnothing$ or $S(g) \in S(a) \cup S(b) \cup \dots \cup S(c)$. Let us assign a word $a_t$ such that $S(a_t) = S(v[a, b, \dots, c])$. Let us show that $g$ can always be selected so that $Y$ is not empty: by Lemma 2, a word $a_f$ can be found such that $S(a_f) = S(v[a, b, \dots, c, g])$ and its semantics always differs from the semantics of the word $a_t$. Otherwise $\{S(a_f)\} \neq \infty$, where $S(a_f)$ is any meaning of the word-classification. Correspondingly, the set $Y$ is not empty, and a complete classification $v[a, b, \dots, c]$ cannot exist. The theorem is proved.

Thus, a meta-notion always exists that adds extra meaning to the classification of words. This means that no word classification can generate all word meanings. The theorem correlates with Gödel's incompleteness theorem for formal systems.

The Principles of Meaningful Natural Language Generation. Let us consider the principles of meaningful speech generation based on the offered vector-based classification (see the figure).

Structures of different levels are formed over the given semantic classification of natural language words and notions. On the first level there are the word groups of the language. On the second level they are united into word combinations, i.e. pairs of words linked semantically and grammatically; at this level, combinations of words are rated as more or less useful word combinations. On the third level the words are united into patterns, for example: “Determiner + Attribute + Subject + Modality + Predicate + Determiner + Attribute + Object + Link + Determiner + Attribute + Nominal Group (Modifier of Time) + Link + Determiner + Attribute + Nominal Group (Modifier of Place) + Link + Determiner + Attribute + Nominal Group (Modifier of Purpose) + ...”. Semantic chains of this type are presented as follows: “this/that/... + hungry/full/... + vegetarian/gourmand/... + can/wants to/... + eat/cook/... + the/a/... + tasty/aromatic/... + pie/salad/... + after/before/... + five/six/... + hours + in/for/... + five/six/... + minutes + ... + in + a + big/beautiful/... + restaurant/canteen/... + on + a + big/beautiful/... + street/square/... + named after + Smith/Brown/... + in + a + big/beautiful/... + city/village/... + Ababa/Acaca/... + in order to/to/... + taste/know/... + a + pungent/spicy/... + taste/aftertaste/... + ...”. On the fourth level the words are separated into subsets of these patterns: “I/he/... + have eaten/tasted/... + on a street/square/... + named after + Smith/Brown/...”. On the fifth level fragments of the patterns are united into semantic patterns of the second rank: “The taste of a pie surprised me in the morning” (pattern type: Relation-Attribute_of_Object-Time), “The restaurant gladdened me with a crunching crust” (pattern class: Relation-Place-Part_of_Object). Generating and ordering the semantic patterns of the second rank is an important task that determines the success of a software system for natural speech generation. Examples of semantic patterns for natural speech generation are shown in the table:
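A semantic chain of the third level can be expanded mechanically by picking one alternative per slot. A minimal Python sketch, assuming only the first eight slots of the chain quoted above with their two listed alternatives each (the elided “...” alternatives are not filled in); `CHAIN` and `generate` are hypothetical names.

```python
from itertools import product

# First eight slots of the semantic chain, with the alternatives listed in the text.
CHAIN = [
    ("Determiner", ["this", "that"]),
    ("Attribute", ["hungry", "full"]),
    ("Subject", ["vegetarian", "gourmand"]),
    ("Modality", ["can", "wants to"]),
    ("Predicate", ["eat", "cook"]),
    ("Determiner", ["the", "a"]),
    ("Attribute", ["tasty", "aromatic"]),
    ("Object", ["pie", "salad"]),
]


def generate(chain):
    """Yield every phrase obtained by picking one alternative for each slot."""
    for words in product(*(alternatives for _, alternatives in chain)):
        yield " ".join(words)


phrases = list(generate(CHAIN))
# len(phrases) == 2 ** 8 == 256
# phrases[0] == "this hungry vegetarian can eat the tasty pie"
```

The sketch does no article agreement, so it also produces “a aromatic salad”; the pure Cartesian product shows why the higher levels (subsets of patterns, second-rank patterns) are needed to select and order the useful combinations.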

Columns of the table (English words, with Russian equivalents in parentheses):

1. “the ...” (этот ...): taste (вкус), after-taste (привкус), smack (привкус), flavor (вкус)

2. “of the ...” (чего ...): berry (ягода), strawberry (клубника), raspberry (малина), gooseberry (крыжовник), currant (смородина), bilberry (черника), blackberry (ежевика), cranberry (клюква), sweet cherry (черешня), cherry (вишня), grape (виноград), raisin (изюм)

3. “is ...” (является ...): sweet (сладкий), sour (кислый), salty (соленый), bitter (горький), pungent (острый), weak (слабый), strong (сильный)

4. “...” (imperative; Russian ending -а/у-йте(сь)): enjoy (наслаждаться), feel (чувствовать), savor (смаковать), discuss (обсуждать), identify (узнать), notice (заметить), learn (узнавать), experience (испытать)

5. “the ...” (этот ...): good (хороший), great (великолепный), excellent (отличный), wonderful (чудесный), superior (превосходный), splendid (великолепный), magnificent (сказочный), surprising (удивительный), lovely (красивый), worthy (стоящий), useful (полезный), funny (забавный)

6. “stuff” (предмет): thing (вещь), object (объект), gem (прелесть), must (важная вещь), trifle (мелочь), process (процесс), time (время), moment (момент)
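The table can be read as six slots of a single sentence frame. A minimal sketch, assuming the frame “The <1> of the <2> is <3>. <4> the <5> <6>!”, inferred from the column headers (the original does not spell the frame out); the list names and `make_phrase` are hypothetical.

```python
import random

# English side of the six table columns, in column order.
TASTE = ["taste", "after-taste", "smack", "flavor"]
BERRY = ["berry", "strawberry", "raspberry", "gooseberry", "currant", "bilberry",
         "blackberry", "cranberry", "sweet cherry", "cherry", "grape", "raisin"]
QUALITY = ["sweet", "sour", "salty", "bitter", "pungent", "weak", "strong"]
VERB = ["enjoy", "feel", "savor", "discuss", "identify", "notice", "learn", "experience"]
PRAISE = ["good", "great", "excellent", "wonderful", "superior", "splendid",
          "magnificent", "surprising", "lovely", "worthy", "useful", "funny"]
STUFF = ["thing", "object", "gem", "must", "trifle", "process", "time", "moment"]


def make_phrase(rng: random.Random) -> str:
    """Fill the (assumed) sentence frame with one random word per column."""
    return (f"The {rng.choice(TASTE)} of the {rng.choice(BERRY)} is {rng.choice(QUALITY)}. "
            f"{rng.choice(VERB).capitalize()} the {rng.choice(PRAISE)} {rng.choice(STUFF)}!")


# make_phrase(random.Random(7)) returns one phrase such as
# "The flavor of the cherry is sweet. Savor the lovely gem!"
```

Even this one frame admits 4 × 12 × 7 × 8 × 12 × 8 = 258,048 phrases, which is why ordering and selecting second-rank patterns matters for the generator.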

In conclusion, it should be noted that semantic classification, with structures of different levels assigned over it, is a promising method for the analysis and synthesis of natural language and for meaningful speech generation; the offered classification is new, and its efficiency in meaningful speech generation has been demonstrated with corresponding software products.

Figure. Semantic Classification Tree: tree nodes are notions in the classification; truth functions over subsets of the classification; tables of semantic meaningful speech generation as classification subsets; natural language phrases as functions over a tree of semantic classification.

Bibliography

1. Agamdjanova, V. I. Contextual Redundancy of the Lexical Meaning of a Word / V. I. Agamdjanova. M.: Higher School, 1977. (in Russian)

2. Apresyan, Yu. D. Ideas and Methods of Modern Structural Linguistics / Yu. D. Apresyan. M.: Science, 1966. (in Russian)

3. Verdieva, Z. N. Semantic Fields in the Modern English Language / Z. N. Verdieva. M.: Higher School, 1986. (in Russian)

4. Nikitin, M. V. Lexical Meaning of a Word / M. V. Nikitin. M.: Higher School, 1983. (in Russian)

5. Lichargin, D. V. Operations over the Natural Language Words Semes in Machine Translation / D. V. Lichargin // Works of the Conf. of Young Scientists. Krasnoyarsk, 2003. P. 23-31. (in Russian)

6. Lichargin, D. V. Elimination of Semantic Noise as the Means of Adequate Translation / D. V. Lichargin // Questions of the Theory and Practice of Translation: Works of the All-Russian Conf. Penza, 2003. P. 90-92. (in Russian)

7. Lichargin, D. V. Generation of the Natural Language Phrases within the Task of Creating Natural Language Interface with Software / D. V. Lichargin // Materials of the Eighth All-Russian Conf. “Problems of the Territory Information Development”. Vol. 2. Krasnoyarsk, 2003. P. 152-156. (in Russian)

© Safonov K. V., Lichargin D. V., 2009

S. A. Matyunin, V. D. Paranin, V. I. Levchenko

Siberian State Aerospace University named after academician M. F. Reshetnev, Russia, Krasnoyarsk

MODELING PHASE FUNCTION OF CONTROLLED DIFFRACTION ELEMENTS ON THE BASIS OF LINEAR ELECTRO-OPTICAL EFFECT*

A design of controlled diffractive optical elements based on the electro-optic effect is suggested. The influence of the electro-optical crystal orientation, the direction of light wave propagation and the electric field distribution on the characteristics of controlled diffractive optical elements is considered. Efficiency indicators for the structure and material of controlled diffraction elements are proposed, and their values are calculated for the basic elements.


Keywords: electrostatic field, electro-optical effect, controlled diffraction element.

Elements and devices based on the electro-optical effect are widely used to control the parameters of optical radiation, e.g. intensity, phase, polarization state and spectral composition [1]. Their advantages are high speed (units of GHz) and a wide range of functional materials with various physical properties. Bulk and planar modulators, switches, deflectors of broadband and laser radiation, tunable spectral filters, etc. have now been developed on the basis of the electro-optical effect.

The development of electro-optical controlled diffraction structures (CDS) with a tunable phase function [2-4] is one of the promising directions in creating devices of this kind. In general, the design of such elements includes an electro-optical material; control electrodes with individual or group addressing that ensure the required electric field distribution in the material; and a set of functional coatings with electrical insulation, protective, spectrum-forming or polarization-selective functions (fig. 1).

Changing the form of the phase function with a single-channel or multichannel voltage source produces a particular directivity pattern of such a structure and changes its spectral composition.

The aim of this study is to model the phase function of controlled diffraction structures.

Since CDSs based on diffraction gratings form periodic structures, a system of conventional symbols has been developed to designate them for convenience. The system of designations of controlled diffraction structures is based on the design features of the basic element. The basic element is the elementary part of the structure which, repeated many times, forms the CDS. The system takes into account the number and type of electrodes (continuous, discrete) on each surface of the element, the distribution of potentials over the electrodes, and the presence of functional coatings. The following structural formula is proposed for designating basic optical elements:

$N_1(X_1 - Y_1 - Z_1) : P_1(M_1 - R_1 - K_1) : T_1$

where $N_1$ is the number of times the basic structure is repeated; $X_1$ ($M_1$) is the number of electrodes in the top (bottom) layer of the basic structure; $Y_1, R_1 \in \{N, D\}$ is the type of electrodes in the top (bottom) layer (continuous N or discrete D); $Z_1$ ($K_1$) indicates the potential distribution over the top (bottom) electrodes (0 for equal potentials of all electrodes, 1 for different potentials); $P_1, T_1 \in \{0, 1\}$ indicate the presence or absence of a functional top (bottom) layer of the basic structure.
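The designation is a fixed-format string over nine fields, so it maps naturally onto a small record type with validation. A minimal Python sketch under stated assumptions: the field names are hypothetical, and the source does not say which of 0 and 1 means “present” for $P_1$ and $T_1$, so 1 = present is assumed here. The example values are hypothetical and are not taken from table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicElement:
    """Fields of the designation N1(X1-Y1-Z1):P1(M1-R1-K1):T1."""
    n: int  # N1: number of repetitions of the basic structure
    x: int  # X1: number of electrodes in the top layer
    y: str  # Y1: top electrode type, "N" (continuous) or "D" (discrete)
    z: int  # Z1: top potentials, 0 = all equal, 1 = different
    p: int  # P1: functional top layer flag (assumed 1 = present)
    m: int  # M1: number of electrodes in the bottom layer
    r: str  # R1: bottom electrode type, "N" or "D"
    k: int  # K1: bottom potentials, 0 = all equal, 1 = different
    t: int  # T1: functional bottom layer flag (assumed 1 = present)

    def __post_init__(self) -> None:
        if self.y not in ("N", "D") or self.r not in ("N", "D"):
            raise ValueError("electrode type must be 'N' or 'D'")
        if any(v not in (0, 1) for v in (self.z, self.p, self.k, self.t)):
            raise ValueError("Z1, P1, K1 and T1 are binary flags")

    def designation(self) -> str:
        """Render the structural formula."""
        return f"{self.n}({self.x}-{self.y}-{self.z}):{self.p}({self.m}-{self.r}-{self.k}):{self.t}"


# Hypothetical element: 100 repetitions, two discrete top electrodes at different
# potentials with a top coating, one continuous bottom electrode, no bottom coating.
example = BasicElement(n=100, x=2, y="D", z=1, p=1, m=1, r="N", k=0, t=0)
# example.designation() == "100(2-D-1):1(1-N-0):0"
```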

Examples of the main types of basic elements with their designations and descriptions are given in table 1.

Analysis of the electric field structure in diffraction CDSs shows that the field structure is quite complex; therefore, the type of electro-optical effect used depends essentially on the relationship between the CDS geometric dimensions and the properties of the electro-optical material. Because of this, to assess the efficiency of the

* This work was supported by the Ministry of Education and Science within the analytical departmental program “Development of the Scientific Potential of Universities”, project number 10v-B001-053.
