DOI: 10.15514/ISPRAS-2021-33(4)-12
A Multilayer Approach to Subgraph Matching
in HP-graphs
N.M. Suvorov, ORCID: 0000-0003-2871-9757 <SuvorovNM@gmail.com> L.N. Lyadova, ORCID: 0000-0001-5643-747X <LNLyadova@gmail.com>
HSE University, 20, Myasnitskaya Ulitsa, Moscow, 101978, Russia
Abstract. Visual modeling is widely used nowadays, but the existing modeling platforms cannot meet all the user requirements. Visual languages are usually based on graph models, but the graph types used have significant restrictions. A new graph model, called HP-graph, whose main element is a set of poles, the subsets of which are combined into vertices and edges, has been previously presented to solve the problem of insufficient expressiveness of the existing graph models. Transformations and many other operations on visual models face a problem of subgraph matching, which slows down their execution. A multilayer approach to subgraph matching can be a solution for this problem if a modeling system is based on the HP-graph. In this case, the search is started on the higher level of the graph model, where vertices and hyperedges are compared without revealing their structures, and only when a candidate is found, it moves to the level of poles, where the comparison of the decomposed structures is performed. The description of the idea of the multilayer approach is given. A backtracking algorithm based on this approach is presented. The Ullmann algorithm and VF2 are adapted to this approach and are analyzed for complexity. The proposed approach incrementally decreases the search field of the backtracking algorithm and helps to decrease its overall complexity. The paper proves that the existing subgraph matching algorithms except ones that modify a graph pattern can be successfully adapted to the proposed approach.
Keywords: DSM platform; visual model; subgraph matching; isomorphism; graph model; HP-graph; algorithms on graphs.
For citation: Suvorov N.M., Lyadova L.N. A Multilayer Approach to Subgraph Matching in HP-graphs. Trudy ISP RAN/Proc. ISP RAS, vol. 33, issue 4, 2021, pp. 163-176. DOI: 10.15514/ISPRAS-2021-33(4)-12
1. Introduction
The study of objects and processes, as well as their design, can hardly be done without modeling; that is why software tools that allow specialists to build various models and formalize descriptions of objects and processes, or use modeling as a method of analysis, are becoming more popular. Models are described and built with the help of a visual modeling language, which is a fixed set of graphical symbols and rules for constructing visual models from these symbols [1]. Visual languages can be represented as various types of graphs, including oriented graphs [2], hypergraphs [3], hi-graphs [4], meta-graphs [5] and P-graphs [6].
Previously, a new graph model, called HP-graph, was proposed as a formalism for representing visual languages [7]. This model unites expressive possibilities of all the mentioned graph types and, thus, it can be used for building more complicated models than those which can be built with the help of the other graph models. The paper [7] proved that this graph model allows the creation of a flexible visual model editor based on it.
This model is proposed as a basis for domain-specific modeling, one of the key aspects of which is model transformations. Such transformations allow users to move from one level of abstraction to another (a vertical transformation) or from one modeling language to another (a horizontal transformation) [5]. Different approaches can be used to transform visual models, but the current standard is the algebraic approach, which is based on graph grammars [9]. In this approach, a transformation r = (L, R) consists of a left and a right part, where L is a subgraph to be found in a source graph, and R is a subgraph that replaces L in the source graph. For the HP-graph, only the main operations were described, including adding and removing graph elements and decomposition; no algorithm was proposed for isomorphic subgraph search. The structural complexity of the model requires modifying the existing algorithms to adapt them to this model.
The HP-graph has a multilayer structure which consists of the layer of vertices and hyperedges and the layer of poles and links, sets of which are combined into the elements of the former layer. This multilayer structure makes it possible to reduce the time complexity of search algorithms. The number of operations decreases because the initial search and matching are performed on the layer of vertices and hyperedges, and only after a subgraph with the desired characteristics is found does the algorithm move to the more detailed level, where the already selected sets of corresponding poles and ordinary links are compared.
In practice, the task of finding an isomorphic subgraph has a wide range of applications, including chemical compound search [10], social network analysis [11], pattern recognition [12], and protein interaction analysis [13]. However, subgraph matching is a performance bottleneck for most of these applications because the task is NP-hard [14]. For instance, the node count for protein structure analysis can reach tens of thousands [15]; that is why active efforts are currently being made to find an optimal algorithm for subgraph matching.
In visual modeling the problem is the same. The thesis [5] proposes to represent all the models in the form of a single graph, which allows users to maintain links between the models and automatically propagate changes from the source model to the target ones associated with it. For instance, a change in the metamodel of the subject area should be propagated to all the models built on this metamodel. However, storing all the models as a single graph increases the computational complexity of the algorithms on this graph, which requires developing an efficient subgraph search algorithm for the graph model used.
The contributions of this paper are:
1) a new multilayer approach to decrease complexity of subgraph matching algorithms,
2) a backtracking algorithm based on this approach,
3) applications of this approach in several existing subgraph matching algorithms.
The paper is organized as follows. Section 2 discusses related work and the main algorithms for finding subgraph isomorphism. Section 3 presents the proposed graph model, definitions of the HP-subgraph and isomorphism of the HP-graphs, and the multilayer approach to subgraph matching. Section 4 introduces a backtracking algorithm based on this approach. Section 5 presents several applications of the approach in the existing subgraph matching algorithms. Section 6 describes the obtained results. Section 7 concludes the paper.
2. Related work
The problem of subgraph matching has been investigated for many years. The works of many scientists, such as [16]-[18], are dedicated to exploring applicability, time complexity and limitations of the existing subgraph matching algorithms. These algorithms are generally divided into two classes:
• Algorithms that observe many graphs {G1, ..., Gn} and retrieve those which contain a query graph Q.
• Algorithms that observe a single graph G and retrieve all its subgraphs which are isomorphic to a query graph Q.
In both of these approaches, algorithms can either return a correct and complete answer (having an exponential time complexity) or return an approximate answer (having a polynomial time complexity). While the complete answers describe all subgraphs exactly isomorphic to a pattern, the approximate answers are generally obtained using specific similarity measures and, thus, may also contain false positive subgraphs.
This work belongs to the second class of the algorithms. Most of these algorithms use backtracking to move through the built search tree and find an appropriate combination of corresponding vertices of the source graph and the graph-pattern. Algorithms in this class include the Ullmann algorithm [19], VF2 [20] (and also VF2 Plus [21] and VF3 [22]), TurboISO [23], CFL-Match [24], QuickSI [25], SPath [26] and others. These algorithms implement various techniques to decrease the time needed for the matching process.
Exploiting Pruning Rules. The Ullmann algorithm uses a refining procedure on each step by comparing degrees of corresponding neighbors of the added pair of vertices. VF2 [20] provides feasibility rules that are checked before a vertex is added to a graph-candidate. These rules check the consistency of graph-candidates with this vertex and check for a sufficient number of neighboring vertices of these graph-candidates. SPath [26] uses a neighborhood signature for each vertex to store information about the surrounding vertices. These signatures are compared with the corresponding signatures of the query graph and are used for search space pruning before subgraph matching. TurboISO [23] compares the number of neighborhood labels of corresponding vertices and prunes out unpromising ones. CFL-Match [24] proposes a compact-path-index (CPI) structure presented as a tree which is built from the source graph vertices with the same labels as query graph vertices and then refined by exploiting matching operations.
Graph Pattern Modification. The Ullmann algorithm and VF2 [20] do not modify the graph pattern and search for its embeddings in the source graph. SPath [26] changes the way of graph query processing from vertex-at-a-time to path-at-a-time, which tends to be more cost-effective than traditional graph matching methods. TurboISO [23] presents a NEC-tree structure which merges similar vertices together and presents a query graph as a tree. CFL-Match [24] transforms a query into a set of dense subgraphs, forests, and leaves. The source graph in this algorithm is only probed for non-tree edge validation, whereas other query parts are checked in the CPI structure.
Optimizing Matching Order. The Ullmann algorithm [19] does not specify the matching order of the vertices, whereas VF2 [20] starts from a random query vertex and then recursively adds those vertices that are connected with the already matched ones. QuickSI [25] exploits an order which is based on the vertex label frequency, and the algorithm starts the matching process from the least frequent labels. TurboISO [23] implements a concept of candidate region exploration and produces a matching order for every region where a NEC-tree was found. CFL-Match [24] presents all candidates as a CPI structure, where all the pattern embeddings are filtered and validated by traversing this tree structure.
Most theoretical research on this problem was conducted specifically for ordinary graphs [18]; that is why the approaches of these algorithms have to be adapted to the HP-graph model. In particular, this paper presents adaptations of a standard backtracking algorithm for subgraph matching, the Ullmann algorithm [19] and the VF2 algorithm [20], which are optimized for the multilayer structure of this graph model.
3. Graph-Matching Approach for HP-graphs
Let Pol be a set of all poles of the graph, including external poles and internal poles of vertices and hyperedges. Then, an HP-graph is an ordered triple G = (P, V, W), where $P = \{\pi_1, \ldots, \pi_n\}$ is a set of external poles, $V = \{v_1, \ldots, v_m\}$ is a non-empty set of vertices, and $W = \{w_1, \ldots, w_l\}$ is a set of hyperedges [7]. An example of the graph model is shown in Fig. 1.
Fig. 1. Example of an HP-graph
In this figure, external poles are represented by the set $P = \{\pi_1, \pi_2\}$, hyperedges by the set $W = \{w_1, \ldots, w_5\}$, and vertices by the set $V = \{v_1, \ldots, v_5\}$. The set Pol includes all the poles of the graph and is presented as $\{p_1, \ldots, p_{12}\} \cup \{\pi_1, \pi_2\}$.
Every hyperedge w of the HP-graph G can be presented by ordinary links, which are defined as a set $E_w = \{e_1, \ldots, e_n\}$, where every link $e \in E_w$ is a pair of connected poles (p, r), where p is a source pole and r is a target pole of the link. An example of this decomposition is presented in Fig. 2. The hyperedge $w_2$ defines a set $E_{w_2} = \{(p_4, p_8), (p_4, p_6), (p_6, p_8)\}$. Every vertex and hyperedge can also be decomposed into a new HP-graph, which is described in detail in [7].
Fig. 2. Decomposition of the hyperedge $w_2$
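The definitions above can be sketched as a small data structure. This is only an illustration under assumptions, not the representation used in [7]: the class and field names (`Vertex`, `Hyperedge`, `HPGraph`, `links`, `all_poles`) are hypothetical, and only the hyperedge w2 from Fig. 2 is encoded because its links are listed in the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Vertex:
    """A vertex is a named subset of poles (hypothetical representation)."""
    name: str
    poles: frozenset


@dataclass(frozen=True)
class Hyperedge:
    """A hyperedge is a subset of poles plus its decomposition into links."""
    name: str
    poles: frozenset
    links: frozenset = frozenset()  # ordinary links (source pole, target pole)


@dataclass
class HPGraph:
    """Ordered triple G = (P, V, W): external poles, vertices, hyperedges."""
    external_poles: set = field(default_factory=set)
    vertices: list = field(default_factory=list)
    hyperedges: list = field(default_factory=list)

    def all_poles(self):
        """Pol: external poles plus the poles of vertices and hyperedges."""
        poles = set(self.external_poles)
        for element in self.vertices + self.hyperedges:
            poles |= element.poles
        return poles


# Hyperedge w2 from Fig. 2 with E_w2 = {(p4, p8), (p4, p6), (p6, p8)}.
w2 = Hyperedge(
    name="w2",
    poles=frozenset({"p4", "p6", "p8"}),
    links=frozenset({("p4", "p8"), ("p4", "p6"), ("p6", "p8")}),
)
```

Storing poles as sets makes the two layers explicit: the upper layer works with `Vertex` and `Hyperedge` objects as opaque elements, while the lower layer inspects their `poles` and `links`.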
3.1 Definitions of a Subgraph and Isomorphism
To define subgraph matching operations, a definition of a subgraph of the HP-graph is needed. An HP-graph G' = (P', V', W') is a subgraph of an HP-graph G = (P, V, W) iff G' is a part of the graph G ($P' \subseteq P \;\&\; (\forall v' \in V'\ \exists v \in V: [v' \subseteq v]) \;\&\; W' \subseteq W$) and meets condition (1), which makes transformation operations possible [7]. A subgraph can contain vertices called incomplete, whose sets of poles are only part of the sets of poles of the vertices of the original graph:
$$\forall w \in W \left( \exists v \in V' \setminus V'_{partial}\ [Pol(w) \cap Pol(v) \neq \emptyset] \Rightarrow w \in W' \right), \quad (1)$$
where $V'_{partial} \subseteq V'$ is the set of incomplete vertices of the subgraph.
To define the isomorphism mapping, it is necessary to establish one-to-one correspondences between the same-type elements of the graphs that preserve the incidence relations. Thus, two HP-graphs G = (P, V, W) and G' = (P', V', W') are isomorphic iff there exists a bijection $f: 2^{Pol(G)} \to 2^{Pol(G')}$ such that for $\forall t \in 2^{Pol(G)}$:
$$(t \in W \Leftrightarrow f(t) \in W') \;\&\; (t \in V \Leftrightarrow f(t) \in V') \;\&\; (t \in P \Leftrightarrow f(t) \in P').$$
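As an illustration of the isomorphism definition, the following sketch checks whether a given pole mapping carries the elements of one graph exactly onto the elements of the other. It rests on assumptions that are not part of the definition: the mapping on pole subsets is taken to be induced by a mapping `f` on single poles, graphs are plain dictionaries with keys `'P'`, `'V'` and `'W'`, and ordinary links are ignored. The function name `is_isomorphism` is hypothetical.

```python
def is_isomorphism(f, g, h):
    """Check that the pole mapping f maps the elements of g exactly onto those of h.

    g and h are dicts with keys 'P' (set of external poles) and 'V', 'W'
    (sets of frozensets of poles). f maps every pole of g to a pole of h.
    """
    if len(set(f.values())) != len(f):
        return False  # f is not injective on poles

    def image(pole_set):
        # Image of a subset of poles under the induced mapping.
        return frozenset(f[p] for p in pole_set)

    return (
        {f[p] for p in g["P"]} == set(h["P"])
        and {image(v) for v in g["V"]} == set(h["V"])
        and {image(w) for w in g["W"]} == set(h["W"])
    )
```

Because vertices and hyperedges are sets of poles, an induced mapping of this kind preserves incidence automatically: two elements share a pole in the first graph exactly when their images share a pole in the second.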
3.2 A Multilayer Approach to Graph Matching
As the graph model is proposed to store all the models together, search algorithms for this formalism have to be optimized for this task. A possible solution is to divide the HP-graph into two main levels: the level of vertices and hyperedges, and the level of poles and ordinary links between them. In this case, the search starts on the higher level, and when a candidate is found, it moves to the lower level, where a more detailed comparison of graph elements is performed. Fig. 3(a) illustrates an example of a query graph Q, which is a pattern for subgraph matching in the data graph G from Fig. 1. As can be seen, it contains 4 vertices, 2 hyperedges and 4 poles. Its higher (or first) level is presented in Fig. 3(b). It contains only the 4 vertices and 2 hyperedges, whereas all the poles are eliminated. This layer is compared with the first layer of the graph G (Fig. 4), and when a potential subgraph is found, the matrix of vertex correspondence is built.
Fig. 3. Query graph Q and its first level
Fig. 4. First level of the graph G
The found correspondences between vertices of Q and G can be presented as a set {(v1', v2), (v3', v3), (v2', v4), (v4', v5)}. If a subgraph is found, the algorithm moves to the next level, where the corresponding hyperedges and their poles are compared.
All the candidate hyperedges are grouped by their incidence with each other, depending on the poles they consist of. For instance, hyperedges w1' and w2' are presented as a single group because of the pole p3', which both of them own. Thus, the corresponding pair (w3, w4) is also presented as a single group. All these groups are compared for exact isomorphism on the layer of poles and ordinary links. Fig. 5 demonstrates this layer for a pair of candidate groups (w1', w2') and (w3, w4). All these hyperedges are decomposed, and only their poles and links are considered at this stage. As these graphs are identical, the found correspondences between poles of incident hyperedges of graphs Q and G can be presented as a set {(p3', p9), (p4', p11), (p2', p7), (p1', p4)}.
Fig. 5. Comparison of hyperedges (w3, w4) and (w1', w2')
If the validation of a hyperedge group succeeds, the algorithm moves to the next group of hyperedges and validates it, until all the hyperedges are traversed. If a validation fails, the algorithm returns to the upper level and tries to find and validate new pairs of vertices and hyperedges.
Lastly, the algorithm verifies that for every pole of the pattern graph only one pole of the source graph has been found. Otherwise, the found subgraph is considered non-isomorphic and the search continues.
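The grouping of hyperedges by shared poles and the final uniqueness check can be sketched as follows. This is a minimal sketch under assumptions: the function names are hypothetical, the pole sets of the hyperedges are invented for illustration (the text only states that w1' and w2' share the pole p3'), and the uniqueness check is applied in both directions, which is slightly stronger than the wording above.

```python
def group_by_incidence(hyperedge_poles):
    """Group hyperedges that are transitively connected through shared poles.

    hyperedge_poles maps a hyperedge name to the set of its poles.
    Returns a list of groups, each a sorted list of hyperedge names.
    """
    names = sorted(hyperedge_poles)
    seen = set()
    groups = []
    for start in names:
        if start in seen:
            continue
        # Depth-first traversal over the "shares a pole" relation.
        group, stack = [], [start]
        seen.add(start)
        while stack:
            current = stack.pop()
            group.append(current)
            for other in names:
                if other not in seen and hyperedge_poles[current] & hyperedge_poles[other]:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


def is_one_to_one(pole_pairs):
    """Final check: no pattern pole and no source pole is matched twice."""
    pattern = [q for q, _ in pole_pairs]
    source = [g for _, g in pole_pairs]
    return len(set(pattern)) == len(pattern) and len(set(source)) == len(source)


# Illustrative pole sets (assumed); "wx'" is a hypothetical extra hyperedge
# that shares no pole with the others and therefore forms its own group.
query_hyperedges = {
    "w1'": {"p1'", "p3'"},
    "w2'": {"p2'", "p3'", "p4'"},
    "wx'": {"p9'"},
}
groups = group_by_incidence(query_hyperedges)

# Pole correspondences listed in the text for the groups (w1', w2') and (w3, w4).
pairs = [("p3'", "p9"), ("p4'", "p11"), ("p2'", "p7"), ("p1'", "p4")]
```

Each group can then be matched on the layer of poles and links independently of the others, which keeps the candidate matrices on that layer small.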
4. Backtracking Graph Matching Algorithm based on the Multilayer Approach
The algorithm presented in this section uses as a basis a backtracking algorithm presented in [19]. This algorithm traverses a search tree using DFS until an isomorphic subgraph is found. If a pair of corresponding elements cannot be found at a certain step, a transition to an earlier step is carried out.
Considering the division of the subgraph matching into several levels, the search algorithm should be modified to perform the isomorphism search operation separately at the vertex level, separately at the hyperedge level, and separately at the level of poles and links.
Let CompElems define a set of compared elements: vertices, hyperedges or poles. Then, an algorithm for matching the corresponding sets of graph elements can be presented as follows (listing 1):
Function FindIsomorphism(G, Q, CompElems, args):
    M0, M, H, F, k, d = InitializeValues(G, Q, CompElems, args);
    do:
        k = GetNextNonVisitedColumn(M, F, k);
        if (k = -1):
            if (d = 1):
                return null;
            else:
                MakeStepBack(F, d, M, k);
                continue;
        M = ChangeRowElementsToZerosExceptChosen(M, d, k);
        MakeStepForward(k, d, F, H, M);
    while (d < |CompElems(Q)|);
    return ValidateIsomorph(M, CompElems(G), CompElems(Q));
Listing 1. Pseudocode of the algorithm that matches the corresponding sets of graph elements
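The core of Listing 1 can be illustrated with a compact recursive sketch. It is not the procedure from the listing itself: the iterative bookkeeping (the vectors H and F, the counters k and d) is replaced by recursion, and the names `find_assignment` and `validate` are assumptions. The candidate matrix is a list of rows of zeros and ones, with one row per query element.

```python
def find_assignment(m0, validate=lambda assignment: True):
    """Depth-first search for an injective row-to-column assignment.

    m0[i][j] == 1 means that element j of the source graph is a candidate
    for element i of the query graph. Returns a list `assignment` where
    assignment[i] is the chosen column for row i, or None if none exists.
    """
    rows = len(m0)
    cols = len(m0[0]) if rows else 0
    assignment = []          # chosen column per depth (row)
    used = [False] * cols    # columns already taken on the current branch

    def search(depth):
        if depth == rows:
            # All query elements are paired: run the final validation.
            return validate(assignment)
        for k in range(cols):
            if m0[depth][k] == 1 and not used[k]:
                used[k] = True            # step forward
                assignment.append(k)
                if search(depth + 1):
                    return True
                assignment.pop()          # step back
                used[k] = False
        return False

    return list(assignment) if search(0) else None


m0 = [[1, 1, 0],
      [1, 0, 0],
      [0, 1, 1]]
print(find_assignment(m0))  # [1, 0, 2]
```

In the example, the first choice (row 0 to column 0) leaves row 1 without a free candidate, so the search steps back and tries column 1 for row 0.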
At the beginning, this algorithm initializes a matrix $M^0$ which defines possible candidates between corresponding elements of the graphs. If $m^0_{ij} = 1$, then the j-th element of the source graph G is a candidate for isomorphism for the i-th element of the query graph Q. Otherwise, they cannot form a pair of corresponding elements. At each step, a modification of this matrix is used to determine appropriate pairs of elements. Thus, rules for building this matrix need to be defined for each set of HP-graph elements.
For vertex matching, external poles and vertices can be combined into one set and called vertices (for simplicity). Thus, the matrix $M^0$ of size $|Q_V \cup Q_P| \times |G_V \cup G_P|$ is filled according to rule (2); if the condition is not met, $m^0_{ij} = 0$:
$$m^0_{ij} = \{\,1 \mid Deg(v_{G_j}) \geq Deg(v_{Q_i}) \;\&\; Count(v_{G_j}) \geq Count(v_{Q_i})\,\}, \quad (2)$$
where Deg(v) is the number of hyperedges incident to the vertex v, and Count(v) is the number of poles of the vertex.
For hyperedge matching, the matrix $M^0$ of size $|Q_W| \times |G_W|$ is filled according to rule (3):
$$m^0_{ij} = \{\,1 \mid Vertices(w_{G_j}) = Vertices(w_{Q_i})\,\}, \quad (3)$$
where Vertices(w) is the set of vertices incident to the hyperedge w.
For pole matching, the matrix $M^0$ is created for each pair of grouped hyperedges; thus $M^0$ has size $|Pol(W_{Q_l})| \times |Pol(W_{G_m})|$. The matrix is filled according to rule (4), considering that the graphs G and Q at this stage contain only those hyperedges that are present in the current groups:
$$m^0_{ij} = \{\,1 \mid vertex(p_{G_j}) = vertex(p_{Q_i}) \;\&\; deg(p_{G_j}) \geq deg(p_{Q_i}) \;\&\; \forall edge(p_{Q_i})\ \exists edge(p_{G_j})\ [edge(p_{G_j}) = edge(p_{Q_i})] \;\&\; \forall edge(p_{G_j})\ \exists edge(p_{Q_i})\ [edge(p_{G_j}) = edge(p_{Q_i})]\,\}, \quad (4)$$
where vertex(p) is the vertex which contains the pole p, edge(p) is an edge which is incident to the pole p, and deg(p) is the degree of the pole (the number of ordinary links incident to it).
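Rule (2) can be illustrated with a short sketch that fills the candidate matrix for the vertex layer. The input format (a list of `(degree, pole_count)` tuples per graph) and the function name are assumptions made for the example. The comparisons are taken to be non-strict, which is also an assumption: with strict comparisons a vertex could never be a candidate for a vertex with the same degree and pole count.

```python
def build_vertex_candidates(query, source):
    """Build M0 for the vertex layer following rule (2).

    Each graph is a list of (degree, pole_count) tuples, one per vertex:
    degree is the number of incident hyperedges, pole_count the number of poles.
    Rows correspond to query vertices, columns to source vertices.
    """
    return [
        [1 if g_deg >= q_deg and g_cnt >= q_cnt else 0
         for (g_deg, g_cnt) in source]
        for (q_deg, q_cnt) in query
    ]


# Hypothetical degrees and pole counts, not taken from the figures.
query_vertices = [(2, 3), (1, 1)]
source_vertices = [(1, 1), (2, 3), (3, 4)]
m0 = build_vertex_candidates(query_vertices, source_vertices)
print(m0)  # [[0, 1, 1], [1, 1, 1]]
```

The first query vertex loses the first source vertex as a candidate because that source vertex has too few incident hyperedges and poles, so the corresponding branch is never explored by the backtracking search.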
Listing 2 illustrates how an isomorphic subgraph for the proposed graph structure can be found. Vectors VCorr, WCorr and PolCorr contain pairs of corresponding elements of the graphs. The FindIsomorphism method is presented above and is assumed to be able to continue the search from the position where the last candidate was found. For this purpose, the current correspondence (VCorr for vertices and WCorr for hyperedges) is passed to the algorithm as the last argument. GroupByIncidence combines the given hyperedges into groups of incident hyperedges.
Function FindHPGraphIsomorphism(G, Q):
    VCorr = [|V(Q) ∪ P(Q)|], WCorr = [|W(Q)|]; PolCorr = [|Pol(Q)|];
    do:
        VCorr = FindIsomorphism(G, Q, V(Q) ∪ P(Q), VCorr);
        if (VCorr = ∅): continue;
        do:
            WCorr = FindIsomorphism(G, Q, W(Q), VCorr, WCorr);
            if (WCorr = ∅ & |W(Q)| > 0): break;
            incidentHyperedges = GroupByIncidence(WCorr, G, Q);
            for ∀ (W'Q, W'G) ∈ incidentHyperedges:
                polWCorr = FindIsomorphism(G, Q, Pol(W'Q), VCorr, WCorr);
                if (!PolCorr.TryAppend(polWCorr)):
                    PolCorr = ∅; break;
            if (PolCorr ≠ ∅ or |W(Q)| = 0):
                unlinkedCorr = MatchUnlinked(G, Q, PolCorr, VCorr, WCorr);
                GenerateAnswer(PolCorr, unlinkedCorr, VCorr, WCorr);
        while (PolCorr = ∅);
    while (VCorr ≠ ∅ & PolCorr = ∅);
Listing 2. Pseudocode of the algorithm that finds an isomorphic subgraph in an HP-graph
The main idea of this algorithm is to incrementally shorten the search field. While the search for vertices traverses all the vertices of the original graph, the search for hyperedges only moves through those hyperedges that are connected with the already chosen vertices and utilizes information about their correspondence with the vertices of the query graph. Pole matching is performed for each group of incident hyperedges, where a significant number of combinations is pruned out by exploiting information about the corresponding vertices and hyperedges. The algorithm also checks and matches the unlinked poles if they exist, which can be done in linear or close to linear time as all the corresponding vertices are already found. For simplicity, the algorithm is given for searching for the first isomorphic subgraph, but it can be transformed to search for all embeddings of a pattern.
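The narrowing of the hyperedge search can be sketched as a filter that uses the vertex correspondence already found. This is an interpretation rather than the authors' code: rule (3) is read as equality of incident vertex sets after translating query vertices through the correspondence, the function name is hypothetical, and the incidences of the hyperedges in the example are invented (only the vertex pairs come from the text).

```python
def hyperedge_candidates(query_edges, source_edges, v_corr):
    """Return, per query hyperedge, the source hyperedges allowed by rule (3).

    query_edges / source_edges map a hyperedge name to the set of incident
    vertex names; v_corr maps each query vertex to its matched source vertex.
    """
    result = {}
    for q_name, q_vertices in query_edges.items():
        image = {v_corr[v] for v in q_vertices}   # vertices translated into G
        result[q_name] = sorted(
            g_name for g_name, g_vertices in source_edges.items()
            if g_vertices == image
        )
    return result


# Vertex correspondence from the example in Section 3.2.
v_corr = {"v1'": "v2", "v3'": "v3", "v2'": "v4", "v4'": "v5"}
# Hypothetical incidences, chosen only to make the example concrete.
query_edges = {"w1'": {"v1'", "v2'"}, "w2'": {"v3'", "v4'"}}
source_edges = {"w3": {"v2", "v4"}, "w4": {"v3", "v5"}, "w5": {"v1", "v2"}}
candidates = hyperedge_candidates(query_edges, source_edges, v_corr)
```

Hyperedges of the source graph that touch unmatched vertices, such as the hypothetical `w5`, never enter the candidate lists, which is how the search field shrinks between the layers.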
5. Exploiting Pruning Techniques of the Existing Algorithms
To optimize algorithms certain existing techniques can be used. Adaptation of the main techniques of the existing algorithms to the proposed graph model can prove the possibility of adapting these algorithms as a whole and improve the efficiency of the algorithm presented above.
5.1 Ullmann Algorithm
The Ullmann algorithm [19] is one of the first algorithms for subgraph matching. It uses the backtracking algorithm presented above and, at each step, performs a refinement procedure to prune out unpromising pairs.
The refinement is performed at each node of the search tree. It traverses the matrix M and changes some of its values from ones to zeros. The condition for preserving a 1 is that if a vertex j of the original graph is a candidate for a vertex i of the pattern graph, then each neighbor of the vertex i must have at least one candidate among the neighbors of the vertex j. Otherwise, j cannot be a candidate for the vertex i.
The refinement can be applied to both vertex matching and pole matching to eliminate unpromising element pairs. The refining algorithm for vertices can be presented as follows (listing 3):
Function RefineV(G, Q, M):
    do:
        anyChanges = false;
        for ∀i ∈ Range(|V(Q)|):
            if (¬∃j: [M_ij = 1]):
                return false;
            for ∀j ∈ Range(|V(G)|):
                for ∀x ∈ V(Q)\{vQi} where ∃w ∈ W(Q) [w ∩ vQi ≠ ∅ & w ∩ x ≠ ∅]:
                    if (¬∃y ∈ V(G)\{vGj} where ∃w ∈ W(G) [w ∩ vGj ≠ ∅ & w ∩ y ≠ ∅] & M_xy = 1):
                        M_ij = 0; anyChanges = true;
    while (anyChanges);
    return true;
Listing 3. Pseudocode of the algorithm that runs refining for vertices of the HP-graph
The algorithm goes through all the neighbors of the current query vertex, that is, the vertices which have at least one common hyperedge with it, and checks whether the source graph contains a corresponding neighbor vertex. The algorithm for poles is similar, but poles and ordinary links are used instead of vertices and hyperedges.
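The refinement step can be sketched in a few lines. The sketch makes two assumptions beyond the text: neighborhoods are precomputed as sets of vertex indices (two vertices are neighbors when they share at least one hyperedge), and only entries that are still equal to 1 are examined. The function name `refine` is hypothetical.

```python
def refine(m, q_neighbors, g_neighbors):
    """Ullmann-style refinement of the candidate matrix m (modified in place).

    q_neighbors[i] / g_neighbors[j] are sets of indices of vertices that share
    at least one hyperedge with query vertex i / source vertex j.
    Returns False as soon as some query vertex has no candidate left.
    """
    changed = True
    while changed:
        changed = False
        for i, row in enumerate(m):
            if not any(row):
                return False
            for j, is_candidate in enumerate(row):
                if not is_candidate:
                    continue
                # Every neighbor x of i needs a candidate among neighbors of j.
                for x in q_neighbors[i]:
                    if not any(m[x][y] for y in g_neighbors[j]):
                        row[j] = 0
                        changed = True
                        break
    return True


# Query: two vertices joined by a hyperedge. Source: vertices 0 and 1 are
# joined by a hyperedge, vertex 2 is isolated (hypothetical data).
m = [[1, 1, 1], [1, 1, 1]]
ok = refine(m, q_neighbors=[{1}, {0}], g_neighbors=[{1}, {0}, set()])
print(ok, m)  # True [[1, 1, 0], [1, 1, 0]]
```

The isolated source vertex is removed from both candidate rows because it cannot offer a neighbor for the neighbor of either query vertex.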
5.2 VF2 Algorithm
VF2 [20] has been proposed for performing subgraph matching on large graphs. An efficient representation of data structures and the use of feasibility rules significantly reduce both the average time complexity of the search and the amount of memory used.
The idea of the algorithm is to use special rules, called feasibility rules, at each node of the search tree to evaluate the feasibility of further progress on this branch of the tree before adding a pair of vertices to the graph-candidates. These rules check the consistency of the graph-candidates and whether the graph-candidate has a sufficient number of neighboring vertices. If all the checks are passed, the algorithm can move to the next level of the tree.
The approach of checking feasibility rules can be applied on both the vertex and the pole layers. As the pole layer is presented as an ordinary graph, the feasibility rules from [20] can be used without any significant modifications. However, feasibility rules for the vertex layer have to be defined.
The first rule checks the consistency of the existing candidate graphs by checking the correctness of connections with the already added vertices. Let $core_G$ be a list of found pair vertices for the graph G and $core_Q$ be a list of found pair vertices for the graph Q. Accordingly, let $conn_G$ be a list of vertices which already have a pair or have a connection to the current graph-candidate G', and $conn_Q$ be a similar list for the graph-candidate Q'. Then, the first rule for a candidate pair (n, m) can be presented as follows:
$$\forall n' [core_G[n'] \neq \emptyset \;\&\; n' \in Conn(G', n)]: \exists m' [m' \in Conn(Q', m) \;\&\; core_Q[m'] = n'] \;\&\; \forall m' [core_Q[m'] \neq \emptyset \;\&\; m' \in Conn(Q', m)]: \exists n' [n' \in Conn(G', n) \;\&\; core_G[n'] = m'],$$
where Conn(G, v) is the set of vertices of the candidate-graph G which are connected to the vertex v.
Let PC define a set of vertices that can be connected to the vertex u, but the graph G does not include them; then it can be represented as follows:
$$PC(G, u) = \{v \mid v \in Conn(G, u) \;\&\; core_G[v] = \emptyset \;\&\; conn_G[v] \neq \emptyset\}.$$
Thus, a new rule, which compares the numbers of newly added connections to the graphs, appears:
$$|PC(G', n)| \geq |PC(Q', m)|.$$
The last rule performs a two-look-ahead in the searching process. Let N be a set of vertices which are connected to the target vertex but are not connected to the graph-candidate:
$$N(G, u) = \{v \mid v \in Conn(G, u) \;\&\; conn_G[v] = \emptyset\}.$$
Then, the last rule is presented by the condition:
$$|N(G', n)| \geq |N(Q', m)|.$$
The algorithm for traversing vertices can be presented as follows (listing 4):
Procedure RecurseV(G, Q, vectors):
    if (∀item ∈ vectors.coreQ [item ≠ ∅]):
        polesQ = RecurseW(vectors.coreG, vectors.coreQ, ∅, G, Q);
        if (polesQ ≠ ∅):
            GenerateAnswer(polesQ);
        else:
            vectors = RestoreVectors(vectors);
    else:
        P = GetAllCandidatePairs(vectors);
        for ∀p ∈ P:
            if (CheckVFeasibilityRules(p, vectors, G, Q)):
                vectors = UpdateVectors(vectors, G, Q);
                RecurseV(G, Q, vectors);
                vectors = RestoreVectors(vectors);
Listing 4. Pseudocode of the algorithm that traverses vertices of the HP-graph based on the VF2 algorithm
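The two cardinality rules built on the sets PC and N can be sketched as follows. The sketch rests on several assumptions: each graph is a dictionary with the hypothetical keys `'neighbors'`, `'core'` and `'conn'`, the rules are evaluated for a candidate pair (n, m) with n from G and m from Q, and the comparisons are non-strict, as in the look-ahead rules of VF2 for subgraph matching. The consistency rule is not covered here.

```python
def pc_set(neighbors, core, conn, u):
    """PC(G, u): neighbors of u that are unmatched but already in conn."""
    return {v for v in neighbors[u] if v not in core and v in conn}


def n_set(neighbors, conn, u):
    """N(G, u): neighbors of u that are not yet in conn."""
    return {v for v in neighbors[u] if v not in conn}


def passes_lookahead(g, q, n, m):
    """Cardinality rules for the candidate pair (n, m), n from G and m from Q.

    g and q are dicts with keys 'neighbors' (vertex -> set of vertices),
    'core' (vertex -> matched partner) and 'conn' (set of vertices that are
    matched or adjacent to a matched vertex).
    """
    return (
        len(pc_set(g["neighbors"], g["core"], g["conn"], n))
        >= len(pc_set(q["neighbors"], q["core"], q["conn"], m))
        and len(n_set(g["neighbors"], g["conn"], n))
        >= len(n_set(q["neighbors"], q["conn"], m))
    )


# Hypothetical state: a is matched with x; the pair (b, y) is being tested.
g = {"neighbors": {"a": {"b"}, "b": {"a", "c", "d"}, "c": {"b"}, "d": {"b"}},
     "core": {"a": "x"}, "conn": {"a", "b"}}
q = {"neighbors": {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}},
     "core": {"x": "a"}, "conn": {"x", "y"}}
feasible = passes_lookahead(g, q, "b", "y")
```

In this state the source vertex b still has two unexplored neighbors while the query vertex y needs only one, so the pair passes both checks; with the roles of the graphs swapped it would be rejected.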
5.3 Graph Pattern Modification Algorithms
The usage of algorithms such as TurboISO [23], CFL-Match [24] and others that change a graph pattern is complicated in the presented multilayer approach because these algorithms are made specifically for ordinary graphs. Their usage on the layer of vertices and hyperedges is a subject for future research as it requires reformulation of their main aspects and ideas. Nevertheless, all these algorithms can be successfully used on the layer of poles and links and can find an isomorphic subgraph in the single-layer approach.
6. Complexity of the Algorithms
The presented algorithms can decrease the complexity of subgraph search by implementing matching on different graph layers. The search field shortens at each stage whereas the usage of pruning rules can also eliminate unpromising combinations of elements. Table 1 shows computational complexity of the backtracking algorithm at its main stages.
Table 1. Complexity of the backtracking algorithm
Algorithm                          Best Case    Worst Case
Isomorphic Vertices Matching       O(N^2)       O(N × N!)
Isomorphic Hyperedges Matching     O(N^2)       O(N × N!)
Isomorphic Poles Matching          O(N^2)       O(N × N!)
The evaluation of the backtracking algorithms based on the Ullmann refinement is presented in Table 2. As the algorithm of hyperedge matching does not implement this technique, its complexity stays the same.
Table 2. Complexity of the Ullmann algorithm
Algorithm                          Best Case    Worst Case
Isomorphic Vertices Matching       O(N^3)       O(N^3 × N!)
Isomorphic Hyperedges Matching     O(N^2)       O(N × N!)
Isomorphic Poles Matching          O(N^3)       O(N^3 × N!)
The evaluation of the algorithms based on the VF2 approach is demonstrated in Table 3. The modification of the GetAllCandidatePairs procedure according to rules (2)-(4) slightly increases the worst-case complexity from N × N! to N^2 × N! and the best-case complexity from N^2 to N^3 but significantly shortens the search field.
Table 3. Complexity of the VF2 algorithm
Algorithm                          Best Case    Worst Case
Isomorphic Vertices Matching       O(N^3)       O(N^2 × N!)
Isomorphic Hyperedges Matching     O(N^3)       O(N^2 × N!)
Isomorphic Poles Matching          O(N^3)       O(N^2 × N!)
7. Conclusion
This paper proposed a solution to the problem of identifying isomorphic subgraphs in HP-graphs. The proposed approach is based on implementing matching on different graph layers of the graph model and incrementally shortening the search field at each layer.
The designed algorithms for subgraph matching based on the multilayer approach and evaluations of their complexity are presented above. The proposed approach incrementally decreases the search field of the algorithm and helps to decrease its overall complexity. The pruning rules of the existing algorithms can eliminate unpromising candidates at each stage of the proposed algorithm and thus significantly reduce the size of the search tree.
It is planned to evaluate actual time complexity of these algorithms on various data sets and develop a visual modeling system using the proposed approach to subgraph matching.
References
[1]. Koznov D.V. Methodology and tools for domain-specific modeling. Doctoral thesis. Saint Petersburg, 2016, 430 p. (in Russian).
[2]. Struchkov I.V. A formalism for describing software systems and computational processes for cyclic parallel processing of real time data. Information and control systems, 2006, no. 2, pp. 8-13 (in Russian).
[3]. Courcelle B. Recognizable Sets of Graphs, Hypergraphs and Relational Structures: A Survey. Lecture Notes in Computer Science, vol. 3340, 2005, pp. 1-11.
[4]. Power J., Tourlas K. Abstraction in Reasoning about Higraph-Based Systems. Lecture Notes in Computer Science, vol. 2620, 2003, pp. 392-408.
[5]. Sukhov A.O. Development of tools for creating visual subject-oriented languages. PhD thesis, Moscow, 2013, 256 p. (in Russian).
[6]. Mikov A.I. Performance evaluation: textbook. Krasnodar, Kuban State University, 2013, 89 p.
[7]. Suvorov N.M., Lyadova L.N. HP-Graph as a Basis of a DSM Platform Visual Model Editor. Trudy ISP RAN/Proc. ISP RAS, vol. 32, issue 2, 2020, pp. 149-160. DOI: 10.15514/ISPRAS-2020-32(2)-12.
[8]. Parra F., Dean T. Survey of Graph Rewriting applied to Model Transformations. In Proc. of the 2nd International Conference on Model-Driven Engineering and Software Development, 2014, pp. 431-441.
[9]. Ehrig H., Ehrig K., Prange U., Taentzer G. Fundamentals of Algebraic Graph Transformation. Springer, 2006, 403 p.
[10]. Yan X., Yu P.S., Han J. Graph Indexing: A Frequent Structure-based Approach. In Proc. of the ACM SIGMOD International Conference on Management of Data, 2004, pp. 335-346.
[11]. Fan W. Graph pattern matching revised for social network analysis. In Proc. of the 15th International Conference on Database Theory, 2012, pp. 8-21.
[12]. Liu C., Luo B., Kropatsch W., eds. Advances in Graph-based Pattern Recognition. Pattern Recognition Letters, vol. 87, 2017, 230 p.
[13]. Pržulj N., Corneil D.G., Jurisica I. Efficient Estimation of Graphlet Frequency Distributions in Protein-Protein Interaction Networks. Bioinformatics, vol. 22, no. 8, 2006, pp. 974-980.
[14]. Han M., Kim H. et al. Efficient Subgraph Matching: Harmonizing Dynamic Programming, Adaptive Matching Order, and Failing Set Together. In Proc. of the ACM SIGMOD International Conference on Management of Data, 2019, pp. 1429-1446.
[15]. Carletti V., Foggia P. et al. Challenging the Time Complexity of Exact Subgraph Isomorphism for Huge and Dense Graphs with VF3. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, 2018, pp. 804-818.
[16]. Ren X., Wang J. Exploiting Vertex Relationships in Speeding up Subgraph Isomorphism over Large Graphs. Proceedings of the VLDB Endowment, vol. 8, no. 5, 2015, pp. 617-628.
[17]. Lee J., Han W. et al. An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases. Proceedings of the VLDB Endowment, vol. 6, no. 2, 2012, pp. 133-144.
[18]. Seriy A.P., Lyadova L.N. An Approach to Graph Matching in the Component of Model Transformations. In Proc. of the 7th Spring/Summer Young Researchers' Colloquium on Software Engineering, 2013, pp. 41-46.
[19]. Ullmann J.R. An Algorithm for Subgraph Isomorphism. Journal of the ACM, vol. 23, no. 1, 1976, pp. 31-42.
[20]. Cordella L.P., Foggia P. et al. A (Sub)Graph Isomorphism Algorithm for Matching Large Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 10, 2004, pp. 1367-1372.
[21]. Carletti V., Foggia P., Vento M. VF2 Plus: An Improved version of VF2 for Biological Graphs. Lecture Notes in Computer Science, vol. 9069, 2015, pp. 168-177.
[22]. Carletti V., Foggia P. et al. Challenging the time complexity of exact subgraph isomorphism for huge and dense graphs with VF3. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, 2018, pp. 804-818.
[23]. Han W. et al. TurboISO: Towards UltraFast and Robust Subgraph Isomorphism Search in Large Graph Databases. In Proc. of the ACM SIGMOD International Conference on Management of Data, 2013, pp. 337-348.
[24]. Bi F., Chang L., Lin X., Qin L., Zhang W. Efficient Subgraph Matching by Postponing Cartesian Products. In Proc. of the ACM SIGMOD International Conference on Management of Data, 2016, pp. 1199-1214.
[25]. Shang H., Zhang Y., Lin X., Yu J.X. Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. Proceedings of the VLDB Endowment, vol. 1, no. 1, 2008, pp. 364-375.
[26]. Zhao P., Han J. On graph query optimization in large networks. Proceedings of the VLDB Endowment, vol. 3, 2010, pp. 340-351.
Information about authors
Nikolai Mikhailovich SUVOROV - bachelor's student at HSE University (Perm). His research interests include language-oriented programming, modeling, and language toolkits.
Lyudmila Nickolaevna LYADOVA - Candidate of Physical and Mathematical Sciences, associate professor of the Department of Information Technology in Business of the HSE (Perm). Research interests: modeling languages, modeling tools, domain specific modeling, language toolkits, semantic modeling.