Integration of object-oriented software engineering models on the basis of the ontology
transformation approach
H. Abdulrab,
Professor of Computer Science, LITIS laboratory, INSA (Rouen, France).
E. Babkin,
Associate Professor, Faculty of Business Informatics and Applied Mathematics, State University — Higher School of Economics (Nizhny Novgorod). LITIS laboratory, INSA (Rouen, France).
This article proposes a combination of information flow theory and relational logic for achieving semantic interoperability during the integration of heterogeneous UML models. The suggested approach is illustrated by the integration of CIM and SID object-oriented models, which are now widely used in telecommunications.
Introduction
The issue of interoperability in modern software systems has a great impact on their productivity and usefulness. In its simplest form, interoperability means the ability of two or more systems or components to exchange information and to use the information that has been exchanged. The need for consistent and flexible methods of information exchange between components of software systems has become a significant challenge in modern software engineering, owing to the dramatic increase in the complexity and distribution scale of modern economic, sociological and technical systems all over the world. In the presence of such diversity, business analysts and software developers face many problems when they apply empirical engineering methods to describe the structural and dynamic characteristics of communication processes between components and to maintain a smooth flow of divergent information inside distributed software systems.
This article presents a new approach to achieving interoperability of object-oriented models represented as UML class diagrams and OCL constraints. The approach is called ontology transformation because a shared description of domain concepts (an ontology) [SOW00] guides the connection of partial UML models into a coherent structure and defines the starting points for the detailed manual definition of mappings between the different models.
Theoretical foundations of ontology transformation
We develop a theory of hierarchical ontology transformation suitable for supporting dynamic integration in multi-agent and other complex distributed systems. The process of ontology transformation is initiated at the level of conceptual modeling (the level of domain ontology), continues at the architectural software levels (the corresponding information models), and is finally implemented at the data level, at runtime, as the execution of mappings between semi-structured data models. Without loss of generality, in describing the theory of ontology transformation we use three abstraction levels of the modeling hierarchy and two components C1, C2 of the distributed system (fig. 1).
The topmost level of the modeling hierarchy consists of the domain core ontology, which captures the important business concepts of the application domain and their relations. The second level contains information models of common software engineering concepts. The bottom level contains implementation-specific information models, which are instantiated at runtime in the form of semi-structured data models. These semi-structured data models participate in the "composition-processing-delivery" process of ontology transformation.
Figure 1. The illustrative structure of the distributed system.
In the course of developing the theory, our first task is to find a mathematical structure that permits a consistent representation of the models' properties at all three levels. At the level of ontology, such a mathematical structure should be isomorphic to the refined Karlsruhe mathematical structure $S = (C, \leq_C, R, \sigma)$ of the core domain ontology [EHR07], where
$C$ is a set of concept identifiers (concepts for short);
$R$ is a set of relation identifiers (relations for short);
$\leq_C$ is a partial order on $C$, called the concept hierarchy or taxonomy;
$\sigma : R \to C \times C$ is a function called the signature, such that $\sigma(r) = (\mathrm{dom}(r), \mathrm{ran}(r))$ for $r \in R$, with domain $\mathrm{dom}(r)$ and range $\mathrm{ran}(r)$.
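The structure $S$ can be made concrete with a small executable sketch. The Python fragment below is only an illustration under stated assumptions: the concept and relation names are hypothetical and do not come from the article, and the helper functions are not part of any tool mentioned here. It builds a concept hierarchy from direct sub-concept pairs and checks that the result is a partial order and that the signature maps every relation into $C \times C$.

```python
from itertools import product

# Hypothetical toy ontology; all names are illustrative assumptions.
concepts = {"Equipment", "Rack", "Chassis", "Location"}
relations = {"holds", "located_at"}

# Direct (sub-concept, super-concept) pairs; the full order is derived below.
direct_sub = {("Rack", "Equipment"), ("Chassis", "Equipment")}

# Signature sigma: R -> C x C, stored as (dom(r), ran(r)) per relation r.
sigma = {"holds": ("Rack", "Chassis"), "located_at": ("Rack", "Location")}


def reflexive_transitive_closure(pairs, universe):
    """Return the smallest reflexive, transitive relation containing pairs."""
    closure = set(pairs) | {(c, c) for c in universe}
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def is_partial_order(order, universe):
    """Check reflexivity, antisymmetry and transitivity over universe."""
    reflexive = all((c, c) in order for c in universe)
    antisymmetric = all(a == b for (a, b) in order if (b, a) in order)
    transitive = all((a, d) in order
                     for (a, b) in order for (c, d) in order if b == c)
    return reflexive and antisymmetric and transitive


def signature_is_well_formed(sig, rels, cons):
    """sigma must be defined for every relation and map into C x C."""
    return set(sig) == set(rels) and all(
        dom in cons and ran in cons for (dom, ran) in sig.values())


leq_c = reflexive_transitive_closure(direct_sub, concepts)
```

With this toy data both `is_partial_order(leq_c, concepts)` and `signature_is_well_formed(sigma, relations, concepts)` hold, while adding the reverse pair `("Equipment", "Rack")` breaks antisymmetry.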
At the same time, the selected mathematical structure should express the major concepts of object-oriented modeling, which will be exploited at the levels of Common Software Engineering Models and Implementation-specific Models.
Given these requirements, we selected a single formalism, namely relational logic and the language of the Alloy system, for representing the structure and relationships in both ontological and software engineering models. In this case the set C of the ontology structure S is mapped directly onto the set of atoms of the relational logic, and the set of relation identifiers R is mapped onto the relational operators of the relational logic [JAC06]. Additional axioms in terms of first-order logic define domain-specific constraints.
Further refinement of the theory consists of restricting the available modeling techniques and end-user modeling languages for software engineering models. Pragmatic considerations force us to use widely accepted object-oriented modeling, specifically UML and OCL. Because [CRA01] shows the applicability of UML and OCL to the representation of ontologies as well, our theoretical framework obtains an elegant uniform language for the models of all three levels of the modeling hierarchy. In our case the same language of UML class diagrams is used to represent the structure S of the domain core ontology and the models of the lower levels. OCL is used as the language for the axiomatization of domain- or implementation-specific constraints, both for the domain core ontology and for the software engineering models. To avoid the semantic ambiguity of pure UML and OCL structures, they are translated into the form of relational logic.
Fig. 1 shows that different components of the distributed system are modeled by different software engineering models, labeled $C_i^j$, $i, j = 1..2$, correspondingly. At the same time, the developers of the components share the same domain core ontology $O$. Semantic interoperability, which should be achieved as a result of ontology transformation, makes it possible to interpret data structures and to share the meaning of the data between remote parts of the integrated system. In such a case we can say that, when semantic interoperability is achieved, the information models are integrated and a seamless information flow between remote parts of the integrated system is maintained. Integration can be viewed as the creation of new constraints, which restrict the diversity of information models at the software engineering levels in accordance with the logic of the whole system, and of mappings between the partial views of remote components and the whole system.
Such a formulation of integration leads to the application of the major principles of information flow theory to the expression of ontology transformation. According to information flow theory, in the general case a distributed system $\mathcal{A}$ consists of an indexed family $\mathrm{cla}(\mathcal{A}) = \{A_i\}_{i \in I}$ of classifications together with a set $\mathrm{inf}(\mathcal{A})$ of infomorphisms, all having both domain and codomain in $\mathrm{cla}(\mathcal{A})$ [BAR97]. In our specific case of the layered information modeling of two components, each separate level of the software engineering models is represented as a classification
$C_i^j = (\mathrm{Tok}(C_i^j), \mathrm{Typ}(C_i^j), \models_{C_i^j})$, $i, j = 1..2$.
Similarly, the level of ontology is represented as a classification
$O = (\mathrm{Tok}(O), \mathrm{Typ}(O), \models_O)$.
In our theory we propose to use the language of relational logic L as the foundation for defining the types and tokens of the classifications $C_i^j$ and $O$. In this case types are correct sentences of L, which define a certain logical micro-theory of UML class diagrams, and tokens are L-structures, which in fact become models of the corresponding micro-theory.
Corresponding to the proposed classifications, five local logics are defined to express valid constraints on the classifications:
$\mathfrak{L}_i^j = (C_i^j, \vdash_{C_i^j}, N_{C_i^j})$, $i, j = 1..2$;
$\mathfrak{L} = (O, \vdash_O, N_O)$.
The theory of each classification has as constraints just the sentences of the relational logic L that are valid in the usual sense, and the consequence relation $\vdash$ is logical consequence in the language of relational logic.
Figure 2. The structure of infomorphisms between different classifications of information models
According to fig. 2, infomorphisms are defined between the classifications of the models and the ontology. The structure of infomorphisms makes up the information channel $\mathcal{C} = \{f_i : C_i^1 \to O\}_{i = 1..2}$, in which the classification of the ontology $O$ plays the role of the core of the information channel. On tokens, $f_i$ assigns to each ontology token the corresponding L-structure of the software engineering models. On types, $f_i$ assigns certain concepts of the ontology, defined by L-sentences, to the subset of software engineering models. The core of the information channel represents the view of the whole system, and its tokens define connections between the information models of the separate components.
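To make the notion of an infomorphism more tangible, the following Python sketch checks the fundamental property from [BAR97]: for an infomorphism from a classification A to a core classification C, a core token t satisfies the image of a type if and only if the token assigned to t in A satisfies the original type. The class `Classification`, the function `is_infomorphism` and all token and type names are assumptions introduced for illustration only; finite sets stand in for L-sentences and L-structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    tokens: frozenset
    types: frozenset
    holds: frozenset  # pairs (token, type): "token is of type"


def is_infomorphism(a, c, up, down):
    """Check f : A <-> C with up: Typ(A) -> Typ(C), down: Tok(C) -> Tok(A)."""
    # Both component maps must be total and land in the right sets.
    if set(up) != set(a.types) or not set(up.values()) <= c.types:
        return False
    if set(down) != set(c.tokens) or not set(down.values()) <= a.tokens:
        return False
    # Fundamental property: down(t) |=_A alpha  iff  t |=_C up(alpha).
    return all(
        ((down[t], alpha) in a.holds) == ((t, up[alpha]) in c.holds)
        for t in c.tokens
        for alpha in a.types
    )


# Hypothetical toy data: a software engineering model and an ontology core.
model = Classification(
    tokens=frozenset({"m_rack", "m_card"}),
    types=frozenset({"CIM_Rack", "CIM_Card"}),
    holds=frozenset({("m_rack", "CIM_Rack"), ("m_card", "CIM_Card")}),
)
ontology = Classification(
    tokens=frozenset({"o_rack", "o_card"}),
    types=frozenset({"RackConcept", "CardConcept"}),
    holds=frozenset({("o_rack", "RackConcept"), ("o_card", "CardConcept")}),
)
up = {"CIM_Rack": "RackConcept", "CIM_Card": "CardConcept"}
down = {"o_rack": "m_rack", "o_card": "m_card"}
swapped_down = {"o_rack": "m_card", "o_card": "m_rack"}
```

Here `is_infomorphism(model, ontology, up, down)` holds, whereas replacing `down` with `swapped_down` violates the property because the ontology token for the rack would be sent to a card instance.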
Given the described structure of infomorphisms between classifications, the sum of the classifications of the implementation-specific models, $C_1^2 + C_2^2$, can be defined (fig. 3). It is the classification defined as follows [BAR97]:
1. The set $\mathrm{Tok}(C_1^2 + C_2^2)$ is the Cartesian product of $\mathrm{Tok}(C_1^2)$ and $\mathrm{Tok}(C_2^2)$.
2. The set $\mathrm{Typ}(C_1^2 + C_2^2)$ is the disjoint union of $\mathrm{Typ}(C_1^2)$ and $\mathrm{Typ}(C_2^2)$.
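The two clauses above can be sketched directly in Python. The fragment is a minimal illustration, not the authors' implementation: classifications are plain triples of finite sets, the instance names are hypothetical, and the classification relation of the sum (a pair satisfies a tagged type exactly when the matching component satisfies it) follows the standard definition in [BAR97] rather than anything spelled out in this article.

```python
from itertools import product


def classification_sum(a, b):
    """Sum of two classifications given as (tokens, types, holds) triples."""
    tok_a, typ_a, holds_a = a
    tok_b, typ_b, holds_b = b
    # Clause 1: tokens of the sum are pairs of tokens.
    tokens = set(product(tok_a, tok_b))
    # Clause 2: types are tagged (0 or 1) to keep the union disjoint.
    types = {(0, t) for t in typ_a} | {(1, t) for t in typ_b}
    # A pair satisfies a tagged type when the matching component does.
    holds = {((x, y), (0, t))
             for (x, y) in tokens for t in typ_a if (x, t) in holds_a}
    holds |= {((x, y), (1, t))
              for (x, y) in tokens for t in typ_b if (y, t) in holds_b}
    return tokens, types, holds


# Hypothetical instances of the CIM-based and SID-based specific models.
cim = (
    {"MyRack0", "MySlot0"},
    {"CIM_Rack", "CIM_SLot"},
    {("MyRack0", "CIM_Rack"), ("MySlot0", "CIM_SLot")},
)
sid = (
    {"MySIDRack0"},
    {"Rack", "SLot"},
    {("MySIDRack0", "Rack")},
)

sum_tokens, sum_types, sum_holds = classification_sum(cim, sid)
```

Each token of the sum pairs one instance from each model, which is why a token of the sum can be read as a candidate connection between a CIM instance and a SID instance.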
Using the Universal Mapping Property for Sums [BAR97], the following diagram is constructed.
Figure 3. The sum of classifications in the information channel
Since the sum $(C_1^2 + C_2^2)$ shares the common properties of classifications and infomorphisms in information flow theory, it has its own local logic. According to [BAR97], the local logic of the sum $(C_1^2 + C_2^2)$ is called the distributed logic of the information channel $\mathcal{C}$ generated by $\mathfrak{L}$. This logic is denoted by $\mathrm{DLog}_{\mathcal{C}}(\mathfrak{L})$. The distributed logic $\mathrm{DLog}_{\mathcal{C}}(\mathfrak{L})$ represents reasoning about relations among the information models of the components of the distributed system, justified by the logic of the core. In other words, the created structure of infomorphisms allows local logics to be moved from one classification to another.
Analysis of the formal properties of $\mathrm{DLog}_{\mathcal{C}}(\mathfrak{L})$, carried out by Barwise in the framework of information flow theory, shows that it is not sound in general. Because moving local logics does not preserve soundness and completeness, we have a theoretical justification for the necessity of manual procedures in the course of ontology transformation. The concepts of infomorphism and information channel answer the "what" and "why" questions: they explain what part of the information is moved and why elements of one information model carry information about other information models in the context of the whole system. But in order to produce real-time data elements of semi-structured data models, we need to perform manual mapping procedures, answering the important pragmatic question "how". In terms of our theory, this is a manually defined mapping $M$ between the tokens $\mathrm{Tok}(C_1^2)$ and $\mathrm{Tok}(C_2^2)$ of the sum classification $(C_1^2 + C_2^2)$:
$M : \mathrm{Tok}(C_1^2) \xrightarrow{\text{map}} \mathrm{Tok}(C_2^2)$
Our theory thus shows that mapping the tokens of the sum classification is the minimally necessary manual procedure in the process of ontology transformation. In principle, different computational paradigms may be applied to define and execute that procedure. In our work we adopted the workflow paradigm, and the procedure is defined in a graph-oriented manner. The corresponding engineering methodology for the design and implementation of ontology transformation will be presented in the next chapter.
Illustration of the theory application
To gain insight into the presented theory of ontology transformation and its practical implications, a simple but important example of identity transformation is considered. In this case the shared domain core ontology has a single entity, which connects two fragments of different software engineering models. The process of ontology transformation therefore defines a valid translation of one engineering model into another.
The presented example considers the application domain of telecommunications, and the shared domain ontology defines the generic concept of rack-mounted physical equipment for connection purposes. There are two common software engineering UML models that describe the engineering concept of rack-mounted equipment in terms of different modeling paradigms. These models become the models $C_1^1$ and $C_2^1$ in accordance with the theory of ontology transformation (fig. 1). In detail, the first common software engineering model $C_1^1$ contains a fragment of the physical CIM model describing rack-mounted equipment in terms of the CIM modeling paradigm (fig. 4). Accordingly, the second model represents the same concept of rack-mounted equipment expressed in terms of the SID modeling paradigm (fig. 5). Because the common engineering models represent only the most important structural modeling constituents and do not precisely define application-specific details, the classes of the common models are abstract. Also, for the sake of simplicity, the models contain only relations between classes, and attributes are not shown.
[Class diagram: common classes CIM_Rack, CIM_Chassis, CIM_Slot, CIM_Card and CIM_Location with their associations; specific classes MyRack, MyChassis, MySlot, MyCard and SiteLocation.]
Figure 4. The common and specific engineering models of rack mounted equipment in terms of CIM-based physical domain. Classes of the specific model are marked by a color
[Class diagram: common classes Rack, Chassis and Place with the associations holderInHolder, holderInHolder2, equipmentInHolder and placed_at; specific classes MySIDRack, MySIDChassis, MySIDSlot, MySIDCard and MySIDPlace.]
Figure 5. The common and specific engineering models of rack mounted equipment in terms of SID-based physical domain. Classes of the specific model are marked by a color
Because the common engineering models leave many specific issues outside their scope, additional refinement of the common concepts is needed to describe application-specific information. A detailed application-specific refinement of the concept of rack-mounted equipment is given in the specific software engineering models $C_1^2$ and $C_2^2$. These models contain concrete classes and additional OCL constraints. The classes of the concrete CIM- and SID-based software engineering models are marked by a light color in fig. 4 and fig. 5 correspondingly.
Applying the principles of the theory of ontology transformation to the given models, we need to define precisely the semantics of the infomorphisms $f_i$ between the classifications that comprise the information channel $\mathcal{C} = \{f_i : C_i^1 \to O\}_{i = 1..2}$ and correspond to the software engineering models and the shared domain ontology. Such formal semantics allows the UML structure of the ontology and of the software engineering models to be expressed consistently in the language of relational logic.
In the case of ontology transformation, the most important part of the infomorphism $f_i$ defines the mapping between the types of the classifications: $\mathrm{Typ}(C_i^1) \to \mathrm{Typ}(O)$. The semantics of this part of the infomorphism can be naturally represented as a "one-to-many" UML association between the ontology concept and the semantically related UML classes of the common engineering models (fig. 6).
[Diagram: the «domain ontology» class GenericConcept is associated with the «software engineering» classes Class1, Class2 and Class3, annotated as "The group of semantically related UML classes which describe implementation details of the same domain concept".]
Figure 6. Formal representation of mapping between the concept of domain ontology and correspondent classes of the software engineering model
This approach defines foundation principles for reusable and expandable expression of ontology concepts and correspondent classes of UML models in terms of relational logic and Alloy language. In accordance with this approach reusable definition of shared domain core ontology includes the following signature and logical micro-theory which is expressed in Alloy language (fig. 7):
abstract sig OntologyConcept
{
  -- classes of the two software engineering models linked by this concept
  disj se_concepts1, se_concepts2: some univ,
  -- associations of the two software engineering models
  disj assoc1, assoc2: univ -> univ
}
{
  -- closure: associations of linked classes stay inside the linked classes
  all disj p, p1: se_concepts1 | p in se_concepts1 and p1 in p.assoc1 implies p1 in se_concepts1
  all disj p, p1: se_concepts2 | p in se_concepts2 and p1 in p.assoc2 implies p1 in se_concepts2
  assoc1 in se_concepts1 -> se_concepts1
  assoc2 in se_concepts2 -> se_concepts2
}
-- different ontology concepts never share classes
fact { all disj o1, o2: OntologyConcept | o1.se_concepts1 & o2.se_concepts1 = none }
fact { all disj o1, o2: OntologyConcept | o1.se_concepts2 & o2.se_concepts2 = none }
Figure 7. Foundation L-sentences of ontology (micro theory)
This micro-theory states that, as a result of the mapping between the types of the classifications, any ontology concept links two different software engineering models via the class relations se_concepts1 and se_concepts2 and the association relations assoc1 and assoc2. The infomorphism mapping should possess the property of closure: for all connected classes, their associations should be included in the infomorphism.
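The closure property can be illustrated outside Alloy with a few lines of Python. This is a simplified sketch under explicit assumptions: it works at the level of class names rather than Alloy atoms, the association endpoints are read off the signatures of the CIM-based model in fig. 8, and the class `CIM_PowerSupply` is a hypothetical name used only to show disjointness between two ontology concepts.

```python
def is_closed(concepts, associations):
    """True if every association edge starts and ends inside concepts."""
    return all(src in concepts and dst in concepts
               for edges in associations.values()
               for (src, dst) in edges)


def pairwise_disjoint(concept_sets):
    """True if no class belongs to more than one of the given sets."""
    seen = set()
    for current in concept_sets:
        if seen & current:
            return False
        seen |= current
    return True


# Classes linked to the ontology concept (CIM side).
cim_classes = {"CIM_Rack", "CIM_Chassis", "CIM_SLot", "CIM_Card",
               "CIM_Location"}

# Associations as class-level (source, target) pairs.
cim_assoc = {
    "located_at": {("CIM_Rack", "CIM_Location")},
    "rack_chassis": {("CIM_Rack", "CIM_Chassis")},
    "toslot": {("CIM_Chassis", "CIM_SLot")},
    "cim_card": {("CIM_SLot", "CIM_Card")},
    "cim_slot": {("CIM_Card", "CIM_SLot")},
    "adj1": {("CIM_SLot", "CIM_SLot")},
    "adj2": {("CIM_SLot", "CIM_SLot")},
    "cim_card1": {("CIM_Card", "CIM_Card")},
}

# Hypothetical class set of a second, unrelated ontology concept.
other_concept_classes = {"CIM_PowerSupply"}
```

With the full class set the check `is_closed(cim_classes, cim_assoc)` succeeds; removing `CIM_Location` makes it fail, because `located_at` would then lead outside the linked classes.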
The next step is the reformulation of the UML class diagrams from fig. 4 and fig. 5 in terms of relational logic. With the help of the freely available UML2Alloy tool [U2A07] this task is almost fully automated, and the corresponding sentences are shown in fig. 8 and fig. 9 respectively. These sentences contain signatures, facts about the properties of associations, and several generic OCL constraints.
abstract sig CIM_Rack { located_at: lone CIM_Location, rack_chassis: set CIM_Chassis }
abstract sig CIM_Chassis { toslot: set CIM_SLot }
abstract sig CIM_SLot { cim_card: lone CIM_Card, adj1: lone CIM_SLot, adj2: lone CIM_SLot }
abstract sig CIM_Card { cim_slot: set CIM_SLot, cim_card1: lone CIM_Card }
abstract sig CIM_Location {}
fact { cim_slot in (CIM_Card) lone -> set (CIM_SLot) }
fact { cim_card in (CIM_SLot) set -> lone (CIM_Card) }
fact CIM_slot_card { cim_slot = ~cim_card }
fact { adj1 in (CIM_SLot) lone -> lone (CIM_SLot) }
fact { adj2 in (CIM_SLot) lone -> lone (CIM_SLot) }
fact { located_at in (CIM_Rack) lone -> lone (CIM_Location) }
fact { rack_chassis in (CIM_Rack) lone -> set (CIM_Chassis) }
fact { toslot in (CIM_Chassis) lone -> set (CIM_SLot) }
a)
Original OCL constraint:
context CIM_SLot inv Loop: CIM_SLot.allInstances->forAll(sl1: CIM_SLot | sl1.adj1.adj2 = sl1)
Alloy sentences:
fact { Loop[] }
pred Loop() { all sl1: CIM_SLot | sl1.adj1.adj2 = sl1 }

Original OCL constraint:
context CIM_SLot inv notWithTheSame: CIM_SLot.allInstances->forAll(sl1: CIM_SLot | sl1.adj1 <> sl1)
Alloy sentences:
fact { notWithTheSame[] }
pred notWithTheSame() { all sl1: CIM_SLot | sl1.adj1 != sl1 }
b)
Figure 8. L-sentences of the CIM-based model: a) expression of the semantics of the class diagram; b) expression of the OCL constraints.
abstract sig Place {}
abstract sig Rack { placed_at: lone Place, holded: set Chassis }
abstract sig Chassis { sidslot: some SLot }
abstract sig SLot { chassis: lone Chassis, holdedEquipment: lone Card }
abstract sig Card { slot: set SLot }
fact { chassis in (SLot) set -> lone (Chassis) }
fact { sidslot in (Chassis) lone -> set (SLot) }
fact { slot in (Card) lone -> set (SLot) }
fact { placed_at in (Rack) lone -> lone (Place) }
fact { holded in (Rack) lone -> set (Chassis) }
fact { holdedEquipment in (SLot) set -> lone (Card) }
fact chassis_slot { chassis = ~sidslot }
fact slot_card { slot = ~holdedEquipment }
Figure 9. L-sentences of the SID-based model, which express the semantics of the class diagram
Given the Alloy-based definition of the UML class diagrams, the infomorphism mappings between the L-sentences of the ontology and the L-sentences of the CIM- and SID-based engineering models are defined as a specialization of the generic micro-theory from fig. 7. In the Alloy language this is done by specifying a subsignature and defining constraints (fig. 10).
sig OntologyRackMounted extends OntologyConcept {}
fact { #OntologyRackMounted = 1 }
fact {
  OntologyRackMounted.se_concepts1 = CIM_Rack + CIM_Chassis + CIM_SLot + CIM_Card + CIM_Location
}
fact { all o: OntologyRackMounted | #(CIM_Rack <: o.se_concepts1) = 1 }
fact {
  OntologyRackMounted.assoc1 = located_at + rack_chassis + toslot + cim_card + adj1 + adj2 + cim_slot + cim_card1
}
fact { OntologyRackMounted.se_concepts2 = Place + Rack + Chassis + SLot + Card }
fact {
  all o: OntologyRackMounted |
    #(Rack <: o.se_concepts2) = 1 and #(Place <: o.se_concepts2) = 1 and
    #(Chassis <: o.se_concepts2) = 1 and #(SLot <: o.se_concepts2) = 1
}
fact { OntologyRackMounted.assoc2 = placed_at + holded + sidslot + chassis + holdedEquipment + slot }
Figure 10. L-sentences of the specific mapping between different definitions of RackMounted Equipment
In a similar manner, the specific software engineering models should be reformulated in terms of the Alloy language, and the following mappings should be established between the L-sentences of the common models and the L-sentences of the specific models:
$\mathrm{Typ}(C_1^1) \xrightarrow{\text{map}} \mathrm{Typ}(C_1^2)$,
$\mathrm{Typ}(C_2^1) \xrightarrow{\text{map}} \mathrm{Typ}(C_2^2)$.
Since the structure of the specific software engineering models is restricted to concrete subclasses of the abstract classes in the common models, the mappings are defined in a straightforward way: all signatures of the concrete software engineering models extend the corresponding signatures of the common software engineering models (fig. 11 and fig. 12).
sig MyRack extends CIM_Rack {}
sig SiteLocation extends CIM_Location {}
sig MyChassis extends CIM_Chassis {}
sig MySlot extends CIM_SLot {}
sig MyCard extends CIM_Card {}
Figure 11. L-sentences of the mapping between $C_1^1$ and $C_1^2$ (CIM domain)
sig MySIDRack extends Rack {}
sig MYSIDPlace extends Place {}
sig MYSIDChassis extends Chassis {}
sig MYSIDSLot extends SLot {}
sig MYSIDCard extends Card {}
Figure 12. L-sentences of the mapping between $C_2^1$ and $C_2^2$ (SID domain)
Additionally domain-specific OCL constraints are translated to the appropriate facts in Alloy language (fig. 13).
To complete the definition of the information channel, the mapping between tokens (i.e. the concrete L-structures of instances that make the model true) should be defined:
$\mathrm{Tok}(O) \xrightarrow{\text{map}} \mathrm{Tok}(C_i^1)$, $i = 1..2$;
$\mathrm{Tok}(C_i^1) \xrightarrow{\text{map}} \mathrm{Tok}(C_i^2)$, $i = 1..2$.
Owing to the capabilities of the Alloy Analyzer, these L-structures can be found automatically as a result of model analysis. To do this, an auxiliary predicate Model should be defined and the run command issued.
As a result, the Alloy Analyzer returns several instances of L-structures that satisfy the posed constraints. These L-structures form the token part of the sum classification $(C_1^2 + C_2^2)$ and relate the connected instances of the CIM and SID models for the given application-specific constraints.
In the given example there is a restriction on the number of slots in the rack, and the following L-structures are found (fig. 14 and fig. 15).
Original OCL constraint:
context MyRack inv hasLoc: MyRack.allInstances->forAll(r: MyRack | r.located_at->size = 1)
Alloy sentences:
fact { hasLoc[] }
pred hasLoc() { all r: MyRack | #r.located_at = 1 }

Original OCL constraint:
context MyRack inv hasChassis: MyRack.allInstances->forAll(r: MyRack | r.rack_chassis->notEmpty)
Alloy sentences:
fact { hasChassis[] }
pred hasChassis() { all r: MyRack | #r.rack_chassis > 0 }

Original OCL constraint:
context MyChassis inv has2Slots: MyChassis.allInstances->forAll(c: MyChassis | c.toslot->size = 2)
Alloy sentences:
fact { has2Slots[] }
pred has2Slots() { all c: MyChassis | #c.toslot = 2 }

Original OCL constraint:
context MyChassis inv insertedIntoRack: MyChassis.allInstances->forAll(c: MyChassis | c.chassis_rack->notEmpty)
Alloy sentences:
fact { insertedIntoRack[] }
pred insertedIntoRack() { all c: MyChassis | one r: CIM_Rack | c in r.rack_chassis }

Original OCL constraint:
context MySlot inv insertedIntoChassis: MySlot.allInstances->forAll(s: MySlot | s.chass->notEmpty)
Alloy sentences:
fact { insertedIntoChassis[] }
pred insertedIntoChassis() { all slot: MySlot | one c: CIM_Chassis | slot in c.toslot }

Original OCL constraint:
context MyCard inv insertedIntoSlot: MyCard.allInstances->forAll(s: MyCard | s.CIM_CardInSlot->notEmpty)
Alloy sentences:
fact { insertedIntoSlot[] }
pred insertedIntoSlot() { all card: MyCard | one s: CIM_SLot | s.cim_card = card and card.cim_slot = s }
Figure 13. Additional L-sentences of the model
Figure 14. The first found pair of L-structures for the sum classification $(C_1^2 + C_2^2)$
Figure 15. The second found pair of L-structures for the sum classification $(C_1^2 + C_2^2)$
Conclusion
The presented method helps to resolve the problem of interoperability by providing an automated procedure for inferring semantically related runtime data structures of the corresponding UML data models.
Because in practice the definition of intermediate mappings between the classes of the concrete software engineering models is not always possible, a detailed specification of the correspondences between the tokens of the sum classification cannot be produced automatically within the framework of the theory of ontology transformation. Such a specification should be performed manually in accordance with a certain engineering methodology.
Bibliography
[SOW00] Sowa J.F. Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks Cole, Pacific Grove, CA. 2000.
[EHR07] Ehrig M. Ontology Alignment: Bridging the Semantic Gap. Series: Semantic Web and Beyond, Vol. 4. Springer-Verlag. (ISBN 978-0-387-32805-8). 2007.
[JAC06] Jackson D. Software Abstractions: Logic, Language and Analysis. The MIT Press, Cambridge, Massachusetts. 2006.
[CRA01] Cranefield S. UML and the Semantic Web. In Proc. of the Intl. Semantic Web Working Symposium, SWWS'01. July 2001.
[BAR97] Barwise J., Seligman J. Information Flow. Cambridge University Press, 1997.
[U2A07] UML2Alloy Project. Online: http://www.cs.bham.ac.uk/~bxb/UML2Alloy/index.php. 2007.