URAL MATHEMATICAL JOURNAL, Vol. 4, No. 2, 2018, pp. 99-110

DOI: 10.15826/umj.2018.2.011

D2-synchronization in nondeterministic automata¹

Hanan Shabana

Institute of Natural Sciences and Mathematics, Ural Federal University, 51 Lenin aven., Ekaterinburg, Russia, 620000

Faculty of Electronic Engineering, Menoufia University, Egypt

hananshabana22@gmail.com

Abstract: We approach the problem of computing a D2-synchronizing word of minimum length for a given nondeterministic automaton via its encoding as an instance of SAT and invoking a SAT solver. In addition, we report some of the experimental results obtained when we tested our method on randomly generated automata and certain benchmarks.

Keywords: Nondeterministic automata, Synchronizing word, SAT solver

Introduction

A nondeterministic finite automaton (NFA) is a triple A = (Q, Σ, δ), where Q is a finite non-empty set of states, Σ is a finite non-empty set of input symbols, and δ is a map Q × Σ → P(Q), where P(Q) is the power set of Q. The map δ is called the transition function of A; it describes the action of symbols in Σ at states in Q. As usual, we represent the NFA A by the labeled digraph with the vertex set Q, the label alphabet Σ, and the set of labeled edges

{q --s--> q' | q, q' ∈ Q, s ∈ Σ, q' ∈ δ(q, s)}.

A word over Σ is a finite (maybe, empty) sequence of symbols from Σ. The set of all words over Σ including the empty word is denoted by Σ*. If w = a_1 ⋯ a_ℓ with a_1, ..., a_ℓ ∈ Σ is a non-empty word over Σ, the number ℓ is said to be the length of w and is denoted by |w|. The length of the empty word is defined to be 0. The set of all words of a given length ℓ over Σ is denoted by Σ^ℓ.

For every NFA A = (Q, Σ, δ), we extend the function δ to a function P(Q) × Σ* → P(Q) (still denoted by δ) by induction on the length of w ∈ Σ*. If |w| = 0, that is, w is the empty word, then, for each X ⊆ Q, we let δ(X, w) := X. If |w| > 0, we represent w as w = sw' with w' ∈ Σ* and s ∈ Σ and, for each X ⊆ Q, let δ(X, w) := ⋃_{q∈X} δ(δ(q, s), w') (the right hand side of the latter equality is defined by the induction assumption since |w'| < |w|). To lighten the notation, we write q.w for δ(q, w) and X.w for δ(X, w) whenever we deal with a fixed automaton.
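For concreteness, the following sketch (ours, not part of the original text) shows how the extended transition function can be computed when an NFA over the alphabet {0,1} is stored as a table of successor sets; the type aliases and function names are illustrative assumptions.

#include <cstddef>
#include <set>
#include <string>
#include <vector>

// States are 0..n-1, symbols are '0' and '1'.
// delta[q][a] is the set delta(q, a); an empty set models an undefined transition.
using StateSet = std::set<std::size_t>;
using Nfa = std::vector<std::vector<StateSet>>;   // delta[q][a]

// X.a : image of a set of states under one input symbol.
StateSet step(const Nfa& delta, const StateSet& X, int a) {
    StateSet result;
    for (std::size_t q : X)
        result.insert(delta[q][a].begin(), delta[q][a].end());
    return result;
}

// X.w : image of a set of states under a word, by induction on |w|.
StateSet apply_word(const Nfa& delta, StateSet X, const std::string& w) {
    for (char c : w)
        X = step(delta, X, c - '0');
    return X;
}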

The present note is a follow-up of the paper [12] by Volkov and the present author. We briefly recall the problem approached in [12] and, in parallel, introduce the problem that we tackle here. We are interested in synchronization of finite automata. The basic idea of synchronization is as follows: for a given automaton, we look for a sequence of input signals that allows us to predict the behaviour of the automaton after consuming these signals, no matter at which state the automaton was at the beginning. This input is called a synchronizing word, and if an automaton possesses such a word, it is called synchronizing.

1 Supported by the Competitiveness Enhancement Program of Ural Federal University.

The above informal idea of synchronization is fairly easy to formalize for deterministic automata but for NFAs it admits several non-equivalent formalizations. We are not going to survey all formalizations that appear in the literature and restrict ourselves to the two following versions of synchronization, both originating in [7].

Let A = (Q, Σ, δ) be an NFA. A word w ∈ Σ* is said to be D3-synchronizing if ⋂_{q∈Q} q.w ≠ ∅. In terms of the labeled digraph representing A, this condition amounts to saying that for each q ∈ Q, there exists a directed path, whose consecutive labels form the word w, that starts at q and terminates at a certain common state, independent of q. Observe that this definition implies that the action of any D3-synchronizing word must be defined at every state of A. An NFA is called D3-synchronizing if it admits a D3-synchronizing word.

A word w ∈ Σ* is said to be D2-synchronizing for A = (Q, Σ, δ) if q.w = q'.w for all q, q' ∈ Q. To understand the 'physical meaning' of this concept, imagine a big quantity of identical NFAs which get the same input sequence and work on it in parallel. If the sequence constitutes a D2-synchronizing word, then after consuming the input, the NFAs will demonstrate identical (that is, synchronous) behaviour, even though originally they all might have been in different states that were unknown to us.

In contrast to the condition ⋂_{q∈Q} q.w ≠ ∅, the equality q.w = q'.w does not imply that the action of w must be everywhere defined. However, the equality ensures that if a D2-synchronizing word is undefined at some state, the word must be nowhere defined. Thus, a D2-synchronizing word is either nowhere or everywhere defined; in the latter case, it is easy to see that the word is also D3-synchronizing. (The converse is not true: a D3-synchronizing word can fail to be D2-synchronizing.) An NFA is called D2-synchronizing if it has a D2-synchronizing word.

We mention that both D3- and D2-synchronization get very transparent meanings within a standard matrix representation of NFAs. In this representation, an NFA A = (Q, Σ, δ) becomes a collection of |Σ| Boolean Q × Q-matrices where each input symbol s ∈ Σ is encoded by a matrix M(s) whose (q, q')-entry is 1 if q' ∈ δ(q, s) and 0 otherwise. It is not hard to check that the automaton A is D3-synchronizing if and only if some product of the matrices M(s), s ∈ Σ, has a column consisting entirely of 1s, and A is D2-synchronizing if and only if in some product of the matrices M(s), s ∈ Σ, every column consists either entirely of 0s or entirely of 1s.
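The following sketch (our illustration, with assumed type and function names) implements this matrix criterion: it multiplies Boolean matrices and tests the two column conditions just described.

#include <cstddef>
#include <vector>

using BoolMatrix = std::vector<std::vector<int>>;   // entries 0/1

// Boolean matrix product: (AB)[p][q] = OR over r of A[p][r] AND B[r][q].
BoolMatrix bool_product(const BoolMatrix& A, const BoolMatrix& B) {
    std::size_t n = A.size();
    BoolMatrix C(n, std::vector<int>(n, 0));
    for (std::size_t p = 0; p < n; ++p)
        for (std::size_t r = 0; r < n; ++r)
            if (A[p][r])
                for (std::size_t q = 0; q < n; ++q)
                    C[p][q] |= B[r][q];
    return C;
}

// D3 criterion: some column consists entirely of 1s.
bool witnesses_d3(const BoolMatrix& M) {
    std::size_t n = M.size();
    for (std::size_t q = 0; q < n; ++q) {
        bool all_ones = true;
        for (std::size_t p = 0; p < n; ++p)
            if (M[p][q] != 1) { all_ones = false; break; }
        if (all_ones) return true;
    }
    return false;
}

// D2 criterion: every column is constant (all 0s or all 1s),
// which is the same as saying that all rows of M coincide.
bool witnesses_d2(const BoolMatrix& M) {
    for (std::size_t p = 1; p < M.size(); ++p)
        if (M[p] != M[0]) return false;
    return true;
}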

The problems of determining whether or not a given NFA is either D3- or D2-synchronizing are known to be PSPACE-complete, and there is no polynomial in n upper bound on the length of D3- or D2-synchronizing words for NFAs with n states. (These facts follow from results found by Rystsov in the early 1980s [10, 11] and later rediscovered (and strengthened) by Martyugin [9].) Thus, given an NFA, finding a D3- or D2-synchronizing word of minimum length for it is computationally hard. In [12] the author and Volkov have approached the problem of finding a D3-synchronizing word of minimum length for a given NFA via the SAT-solver method. This method of treating computationally hard problems consists in encoding them as instances of the Boolean satisfiability problem (SAT) that are then fed to a SAT solver, that is, a specialized program designed to solve instances of SAT. Modern SAT solvers are extremely powerful: they can solve instances with hundreds of thousands of variables and millions of clauses within a few minutes. Therefore the SAT-solver method has a very wide range of applications, see [5] for a survey. In particular, the method has been successfully invoked for studying synchronization in deterministic automata, see [6, 13]. Our results in [12] have demonstrated that the SAT-solver method can also be applied in the realm of NFAs. Here we extend the approach to the case of D2-synchronization.

The paper is organized as follows. Sect. 1 describes the encoding reducing our problem to SAT. Sect. 2 presents the main algorithm, outlines the settings of our experiments and gives samples of our experimental results. Sect. 3 collects a few final remarks, including a discussion of the future work in the direction of the present paper.

1. Encoding to SAT

We start with a precise formulation of the problem which we are going to study here.

D2W (the existence of a D2-synchronizing word of a given length):
Input: An NFA A with two input symbols and a positive integer ℓ.
Output: YES if A has a D2-synchronizing word of length ℓ; NO otherwise.

In [12] the present author and Volkov have considered the problem D3W that has exactly the same instances as D2W but asks whether or not A has a D3-synchronizing word of length ℓ. For both D2W and D3W, the integer ℓ is assumed to be given in unary; as explained in [12, Sect. 2], with ℓ given in binary, it is not feasible to expect the existence of a polynomial reduction from D3W to SAT, and the very same argument applies to D2W.

It is fair to say that our encoding of D2W has been obtained as a modification of the encoding of D3W suggested in [12]. However, restricting here to the "new" part of the encoding only would make the present paper difficult to follow without looking at [12] at every single step of the way. Therefore, we have preferred to describe our encoding in a self-contained manner, even though this causes a few overlaps with [12].

Recall that an instance of SAT is a pair (V, C), where V is a set of Boolean variables and C is a collection of clauses over V. (A clause over V is a disjunction of literals and a literal is either a variable in V or the negation of a variable in V.) The answer to an instance (V, C) is YES if (V, C) has a satisfying assignment (i.e., a truth assignment on V that satisfies C) and NO otherwise. We aim to construct a polynomial reduction of D2W to SAT. For this, we have to find two polynomials v(x, y) and c(x, y) (preferably of low degrees in x and y) with the following property: given an arbitrary instance (A, ℓ) of D2W, where A = (Q, Σ, δ) is an NFA with two input symbols, we are able to produce an instance (V, C) of SAT such that the answer to (A, ℓ) is YES if and only if so is the answer to (V, C), while |V| ≤ v(|Q|, ℓ) and |C| ≤ c(|Q|, ℓ).

Throughout our encoding, we let Σ := {0, 1} and Q := {q_0, ..., q_{n−1}}. For a state q ∈ Q, we use the expressions P_0(q) and P_1(q) to denote the sets of all preimages of q under the actions of the input symbols 0 and 1 respectively; that is, if a is either of the two symbols, then

P_a(q) := {p ∈ Q | q ∈ p.a}.

First we define the set V of variables. We need two sorts of variables: letter variables and token variables.

The letter variables are x_1, ..., x_ℓ. The variable x_t, 1 ≤ t ≤ ℓ, plays the role of an indicator for the t-th symbol a_t in the input word w := a_1 ⋯ a_ℓ ∈ Σ^ℓ: the value of x_t is 1 if and only if a_t = 1.

The token variables are y_{ij}^t where i, j = 0, ..., n − 1 and t = 0, 1, 2, ..., ℓ. To explain the role of these variables, we use a solitaire-like game Γ on the labeled digraph representing the NFA A. In the initial position of Γ, each state q_i ∈ Q holds exactly one token denoted i. In the course of the game, tokens migrate and may multiply or disappear according to the previous position of the game and the action of the player. Namely, at each move an input symbol a ∈ Σ is chosen. Then for each state q ∈ Q such that q.a ≠ ∅, all tokens that were held by q slide along the edges labeled a to all states in the set q.a. (If |q.a| > 1, then every token held by q gives rise to |q.a| identical tokens, one for each state in q.a.) If q.a = ∅, then all tokens that were held by q disappear. Thus, after the move, the token i occurs at a state p ∈ Q if and only if p ∈ q.a for some state q that had held i just prior to the move. For an illustration, Fig. 1 (borrowed from [12]) demonstrates the initial position for a 5-state NFA with the input alphabet {0, 1} (top), along with the outcomes of the first move, depending on whether 0 or 1 has been chosen for the move (bottom left and bottom right, respectively).

Figure 1. Redistribution of tokens after the first move

The intended meaning of the variables y_{ij}^t (which will be enforced by the condition we impose on them later) is as follows: y_{ij}^t = 1 should mean that after t rounds of the game Γ, one of the tokens held by the state q_j is i.

Perhaps, it makes sense to add a matrix interpretation of the game Γ as the token variables get quite a clear meaning under this interpretation. The initial position of Γ can be thought of as the identity Boolean Q × Q-matrix. At each move, an input symbol a ∈ Σ is chosen and the matrix of the current position is right multiplied by the matrix M(a). Then for each fixed t, the values of the variables y_{ij}^t are exactly the entries of the matrix corresponding to the position of Γ after t moves. For instance, the matrices that correspond to two possible positions of the game played on the 5-state NFA in Fig. 1 are

1 1 0 0 0          0 0 1 1 0
0 0 1 0 0          0 1 0 0 0
0 0 0 1 0   and    0 0 0 0 0
0 0 0 1 0          0 0 0 0 1
1 0 0 0 0          0 1 0 0 0

Altogether, V consists of n²(ℓ + 1) + ℓ variables so that we can take the polynomial x²(y + 1) + y to play the role of v(x, y) from the above definition of polynomial reduction. For the reduction from D3W to SAT in [12], an extra set of n variables (the so-called synchronization variables) was used. Here we have managed to slightly decrease the number of variables.

Now we describe the set C of clauses over V corresponding to the instance (A, ℓ). As in [12], C is the disjoint union of the set C_0 of initial clauses, the sets C_t, t = 1, ..., ℓ, of transition clauses, and the set S of synchronization clauses. The clauses in C_0, C_1, ..., C_ℓ are constructed exactly as in [12] (but we will recall the construction for the reader's convenience) while the clauses in S are essentially different as these are the clauses that reflect the essence of D2-synchronization.

The clauses in C_0 describe the initial position of the game Γ. In this position, each state q_i ∈ Q holds the token i and nothing else. Therefore C_0 consists of n² one-literal clauses, namely, the clauses y_{00}^0, ..., y_{n−1 n−1}^0 along with all clauses of the form ¬y_{ij}^0 with i ≠ j.

In order to define C_t for t = 1, ..., ℓ, consider for all i, j = 0, ..., n − 1, the following formulas:

ψ_{ij}^t :  y_{ij}^t ↔ (x_t ∧ ⋁_{q_k ∈ P_1(q_j)} y_{ik}^{t−1}) ∨ (¬x_t ∧ ⋁_{q_h ∈ P_0(q_j)} y_{ih}^{t−1}).

The equivalence is nothing but a direct translation of the above propagation rule for the tokens in the language of propositional logic. Indeed, it says that the token i occurs at the state q_j after t moves if and only if one of the following alternatives takes place:

• the t-th move was done with the input symbol 1 and one of the preimages of q_j under the action of 1 was holding i after t − 1 moves, or

• the t-th move was done with the input symbol 0 and one of the preimages of q_j under the action of 0 was holding i after t − 1 moves.

The following fact is a special instance of [12, Lemma 2]:

Lemma 1. Every truth assignment φ: {x_1, ..., x_ℓ} → {0, 1} on the letter variables has a unique extension φ̄ to the token variables y_{ij}^t that makes all the clauses in C_0 and all the formulas ψ_{ij}^t hold true (i, j = 0, ..., n − 1, t = 1, ..., ℓ). The token variable y_{ij}^t gets value 1 under φ̄ if and only if after the moves φ(x_1), ..., φ(x_t) of the game Γ one of the tokens held by the state q_j is i.

Now, for each t = 1, ..., ℓ, we let C_t be the set of all clauses of a suitable CNF (conjunctive normal form) equivalent to ⋀_{0≤i,j≤n−1} ψ_{ij}^t. Of course, there are many ways to convert the latter formula to an equivalent CNF, but in order to reuse a part of code written for [12], we retain for C_t the following set of clauses:

¬y_{ij}^t ∨ x_t ∨ ⋁_{q_h ∈ P_0(q_j)} y_{ih}^{t−1},    ¬y_{ij}^t ∨ ¬x_t ∨ ⋁_{q_k ∈ P_1(q_j)} y_{ik}^{t−1},    (1.1)

y_{ij}^t ∨ ¬x_t ∨ ¬y_{ik}^{t−1}  for each q_k ∈ P_1(q_j),    (1.2)

y_{ij}^t ∨ x_t ∨ ¬y_{ih}^{t−1}  for each q_h ∈ P_0(q_j).    (1.3)

Clauses of the form (1.1)-(1.3) simplify if one of the sets P_0(q_j) or P_1(q_j) is empty. In (1.1) the disjunctions over the empty sets are omitted so that if, say, P_0(q_j) = ∅, then the first clause in (1.1) reduces to ¬y_{ij}^t ∨ x_t. As for (1.2) or (1.3), these clauses disappear whenever P_1(q_j) or, respectively, P_0(q_j) is empty. Thus, if the state q_j is such that P_0(q_j) = P_1(q_j) = ∅, then both (1.2) and (1.3) vanish and the two clauses in (1.1) reduce to ¬y_{ij}^t ∨ x_t and ¬y_{ij}^t ∨ ¬x_t. The latter pair of clauses is clearly equivalent to just ¬y_{ij}^t, whence all clauses (1.1)-(1.3) reduce to ¬y_{ij}^t for this particular j and for all i = 0, ..., n − 1 and t = 1, ..., ℓ. This fact amounts to expressing the following simple idea: if the state q_j has no incoming edges, then no token can arrive at q_j after any move of the game Γ.

Let m stand for the number of all transitions in A, that is, triples (q, a, q') ∈ Q × Σ × Q with q' ∈ δ(q, a). Clearly, m ≤ 2n². For each fixed i, the number Σ_{j=0}^{n−1}(|P_1(q_j)| + |P_0(q_j)|) of clauses of the forms (1.2) and (1.3) is equal to m, whence the total number of such "short" clauses is mn. As for "long" clauses in (1.1), there are at most two such clauses for each fixed pair (i, j), whence their total number does not exceed 2n². Altogether, |C_t| ≤ n(m + 2n) ≤ 2n²(n + 1) for each t = 1, ..., ℓ.
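For illustration, here is a sketch of how the clauses (1.1)-(1.3) could be emitted as DIMACS-style integer lists. The variable numbering used in it is our own simplifying assumption and differs from the interleaved numbering actually employed in Sect. 2; the helper names are likewise illustrative.

#include <vector>

// Each clause is a vector of nonzero integers, as in the DIMACS format.
using Clause = std::vector<int>;

// Variable numbering assumed in this sketch (chosen for readability):
//   y^t_{ij} -> t*n*n + i*n + j + 1,      x_t -> n*n*(ell+1) + t.
static int y_var(int n, int t, int i, int j) { return t * n * n + i * n + j + 1; }
static int x_var(int n, int ell, int t)      { return n * n * (ell + 1) + t; }

// Emit the transition clauses (1.1)-(1.3) for all t, i, j.
// pre0[j] and pre1[j] list the preimage sets P_0(q_j) and P_1(q_j).
std::vector<Clause> transition_clauses(int n, int ell,
                                       const std::vector<std::vector<int>>& pre0,
                                       const std::vector<std::vector<int>>& pre1) {
    std::vector<Clause> clauses;
    for (int t = 1; t <= ell; ++t)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                int y = y_var(n, t, i, j), x = x_var(n, ell, t);
                Clause c0 = {-y, x};                       // first clause of (1.1)
                for (int h : pre0[j]) c0.push_back(y_var(n, t - 1, i, h));
                Clause c1 = {-y, -x};                      // second clause of (1.1)
                for (int k : pre1[j]) c1.push_back(y_var(n, t - 1, i, k));
                clauses.push_back(c0);
                clauses.push_back(c1);
                for (int k : pre1[j])                      // clauses (1.2)
                    clauses.push_back({y, -x, -y_var(n, t - 1, i, k)});
                for (int h : pre0[j])                      // clauses (1.3)
                    clauses.push_back({y, x, -y_var(n, t - 1, i, h)});
            }
    return clauses;
}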

While the clauses in ⋃_{t=0}^{ℓ} C_t coincide with those used in [12], the sets of synchronization clauses in [12] and here are different. The present set S contains n² disjunctions of the following form:

¬y_{ij}^ℓ ∨ y_{i+1 (mod n) j}^ℓ,    i, j = 0, ..., n − 1.    (1.4)

Clearly, for each fixed j, the clauses (1.4) are equivalent to the cycle of implications

y_{0j}^ℓ → y_{1j}^ℓ,  y_{1j}^ℓ → y_{2j}^ℓ,  ...,  y_{n−1 j}^ℓ → y_{0j}^ℓ

that expresses the idea of D2-synchronization as follows: if the state q_j holds some token after ℓ moves, then q_j must hold all n tokens 0, 1, ..., n − 1. Observe that the clauses (1.4) are satisfied if all variables y_{ij}^ℓ get value 0; by Lemma 1 this happens exactly when all tokens disappear after ℓ moves, which means that the word w ∈ Σ^ℓ corresponding to the chosen sequence of moves is nowhere defined. In this paper we are interested in finding only those D2-synchronizing words that are somewhere defined; we refer to them as proper D2-synchronizing words. Therefore, we add to the set S the following clause:

⋁_{0≤j≤n−1} y_{0j}^ℓ.    (1.5)

The clause (1.5) is satisfied if and only if some state holds the token 0 after ℓ moves; in the presence of (1.4), the latter fact is equivalent to the claim that some state holds some token after ℓ moves, which in turn means that the word w is somewhere defined.
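A companion sketch (same assumed numbering as in the previous sketch, names illustrative) emits the initial clauses C_0 and the synchronization clauses (1.4)-(1.5).

#include <vector>

using Clause = std::vector<int>;

// Same assumed variable numbering as in the previous sketch.
static int y_var(int n, int t, int i, int j) { return t * n * n + i * n + j + 1; }

// Initial clauses C_0 and synchronization clauses (1.4)-(1.5).
std::vector<Clause> initial_and_sync_clauses(int n, int ell) {
    std::vector<Clause> clauses;
    for (int i = 0; i < n; ++i)                       // C_0: q_i holds exactly the token i
        for (int j = 0; j < n; ++j)
            clauses.push_back({ (i == j ? 1 : -1) * y_var(n, 0, i, j) });
    for (int i = 0; i < n; ++i)                       // (1.4): token i at q_j after ell moves
        for (int j = 0; j < n; ++j)                   // forces token i+1 (mod n) there as well
            clauses.push_back({ -y_var(n, ell, i, j),
                                 y_var(n, ell, (i + 1) % n, j) });
    Clause proper;                                    // (1.5): some state holds the token 0
    for (int j = 0; j < n; ++j) proper.push_back(y_var(n, ell, 0, j));
    clauses.push_back(proper);
    return clauses;
}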

The whole set C = S ∪ ⋃_{t=0}^{ℓ} C_t consists of at most 2n²((n + 1)ℓ + 1) + 1 clauses. Thus, the polynomial 2x²((x + 1)y + 1) + 1 can be taken as c(x, y) from the definition of polynomial reduction. Summarizing the above discussion, we arrive at the following result parallel to [12, Theorem 3].

Theorem 1. An NFA A has a proper D2-synchronizing word of length ℓ if and only if the instance (V, C) of SAT constructed above is satisfiable, and the construction takes time polynomial in the size of A and the value of ℓ. Moreover, a word w = a_1 ⋯ a_ℓ with a_1, ..., a_ℓ ∈ {0, 1} is proper D2-synchronizing for A if and only if the map x_t ↦ a_t, t = 1, ..., ℓ, extends to a satisfying assignment for (V, C).
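If the instance turns out to be satisfiable, Theorem 1 says that a proper D2-synchronizing word can be read off the letter variables of the satisfying assignment. A minimal sketch, assuming the model has already been parsed into a list of true literals and that the variable numbering of the preceding sketches is used:

#include <string>
#include <vector>

// Recover the word a_1...a_ell from a satisfying assignment given as the list
// of literals reported true by the solver (how that list was parsed from the
// solver's output is left outside this sketch).
std::string decode_word(int n, int ell, const std::vector<int>& true_literals) {
    std::string w(ell, '0');
    int base = n * n * (ell + 1);             // letter variables are base+1 .. base+ell
    for (int lit : true_literals)
        if (lit > base && lit <= base + ell)
            w[lit - base - 1] = '1';          // x_t true means a_t = 1
    return w;
}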

2. Experimental results

The general scheme of our experiments follows [12] mutatis mutandis. We outline our basic procedure, commenting on similarities with and differences from the procedure implemented in [12].

1. A positive integer n (the number of states) is fixed. As in [12], we have considered n ≤ 100.

2. A random NFA A with n states and 2 input symbols is generated. We have used the same two models of random generation that were used in [12] but we provide details below for the reader's convenience. As in [12], we disregard NFAs that have no everywhere defined input symbol because such NFAs possess neither D3-synchronizing nor proper D2-synchronizing words.

3. A positive integer ℓ_0 (the hypothetical length of the shortest D2-synchronizing word for A) is chosen. Taking into account the fact that proper D2-synchronization is more restrictive than D3-synchronization, we have used slightly larger values of ℓ_0 than in [12]. We introduce three integer variables ℓ_min, ℓ, and ℓ_max and initialize them as follows: ℓ_min := 1, ℓ := ℓ_0, ℓ_max := 2ℓ_0.

4. The pair (A, 1) is encoded into a SAT instance (V',C') as described in Sect. 1.

5. The instance (V', C') is scaled to the instance (V, C) that encodes the pair (A, ℓ), see Remark 1 below.

6. The SAT solver MiniSat 2.2.0 is invoked to solve the SAT instance (V, C). We refer to [3] for a description of the underlying ideas of MiniSat and to [4] for a discussion and the source code of the solver.

7. The binary search on ℓ is performed. If MiniSat returns YES on the instance (V, C), we check whether or not ℓ = ℓ_min. If ℓ = ℓ_min, then ℓ is the minimum length of proper D2-synchronizing words for A, and we pass to Step 2 to generate another NFA. If ℓ > ℓ_min, we keep the value of ℓ_min, update ℓ_max and ℓ by letting

ℓ_max := ℓ,    ℓ := ⌊(ℓ_min + ℓ)/2⌋,

and pass to Step 5.

If MiniSat returns NO on the instance (V, C), we check whether or not ℓ = ℓ_max. If ℓ = ℓ_max, we interpret this as the evidence that the NFA A fails to be properly D2-synchronizing² and go to Step 2 to generate another NFA. If ℓ < ℓ_max, we keep the value of ℓ_max, update ℓ_min and ℓ by letting

ℓ_min := ℓ + 1,    ℓ := ⌊(ℓ_min + ℓ_max)/2⌋,

and pass to Step 5. (A sketch of this binary search in code is given right after this step list.)
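The binary search of Step 7 can be summarized in code as follows; this is a sketch under the assumption that solve_d2w(l) encapsulates Steps 4-6, i.e. it encodes the pair (A, l), runs the SAT solver, and reports its verdict.

#include <optional>

// Binary search for the minimum length of a proper D2-synchronizing word,
// following Steps 3-7.  Returns an empty optional when the NFA is treated
// as not properly D2-synchronizing (the footnote case l = l_max).
template <typename SolveFn>
std::optional<int> shortest_d2_length(int l0, SolveFn solve_d2w) {
    int lmin = 1, l = l0, lmax = 2 * l0;
    while (true) {
        if (solve_d2w(l)) {                        // solver returns YES
            if (l == lmin) return l;               // minimum length found
            lmax = l;
            l = (lmin + l) / 2;
        } else {                                   // solver returns NO
            if (l == lmax) return std::nullopt;
            lmin = l + 1;
            l = (lmin + lmax) / 2;
        }
    }
}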

Remark 1. In the course of the binary search outlined above, we have to consider instances of D2W with the same NFA A but different values of ℓ. An important feature of the encoding presented in Sect. 1 is that as soon as we have constructed the "primary" SAT instance (V', C') that encodes the D2W instance (A, 1), we are in a position to scale (V', C') to the SAT instance encoding the D2W instance (A, ℓ) for any value of ℓ. In order to explain this feature, recall that MiniSat accepts its input in the following text format (the so-called simplified DIMACS CNF format). Every line beginning with c is a comment. The first non-comment line is of the form:

p cnf NUMBER_OF_VARIABLES NUMBER_OF_CLAUSES

Variables are represented by integers from 1 to NUMBER_OF_VARIABLES. The first non-comment line is followed by NUMBER_OF_CLAUSES non-comment lines each of which defines a clause. Every such line starts with a space-separated list of different non-zero integers corresponding to the literals of the clause: a positive integer corresponds to a literal which is a variable, and a negative integer corresponds to a literal which is the negation of a variable; the line ends in a space and the number 0.

Given an NFA A with n states, we write the SAT instance (V', C'), which corresponds to (A, 1), in DIMACS CNF format, representing the variables y_{ij}^0, y_{ij}^1, and x_1 by the numbers in + j + 1, n² + in + j + 2, and n² + 1, respectively. Consider, for a simple illustration, the NFA E2 shown in Fig. 2.

Table 1 below presents our encoding of the D2W instance (E2, 1) as a SAT instance. In the left column the SAT instance is shown as a list of clauses while the right column shows it in DIMACS CNF format.

² Of course, the equality ℓ = ℓ_max only means that A has no proper D2-synchronizing word of length ≤ 2ℓ_0, and it is not excluded, in principle, that the NFA is properly D2-synchronizing but its shortest proper D2-synchronizing word is very long. However, by choosing appropriate values of the parameter ℓ_0, we have drastically minimized the number of the "bad" cases when the SAT solver returns NO and ℓ = ℓ_max so that we have been able to analyze each of them individually.

Clauses                                     DIMACS CNF lines

                                            p cnf 9 25
C'_0:
  y_{00}^0                                  1 0
  ¬y_{01}^0                                 -2 0
  ¬y_{10}^0                                 -3 0
  y_{11}^0                                  4 0
C'_1:
  ¬y_{00}^1 ∨ x_1 ∨ y_{00}^0 ∨ y_{01}^0     -6 5 1 2 0
  ¬y_{00}^1 ∨ ¬x_1                          -6 -5 0
  y_{00}^1 ∨ x_1 ∨ ¬y_{00}^0                6 5 -1 0
  y_{00}^1 ∨ x_1 ∨ ¬y_{01}^0                6 5 -2 0
  ¬y_{01}^1 ∨ x_1 ∨ y_{01}^0                -7 5 2 0
  ¬y_{01}^1 ∨ ¬x_1 ∨ y_{00}^0               -7 -5 1 0
  y_{01}^1 ∨ ¬x_1 ∨ ¬y_{00}^0               7 -5 -1 0
  y_{01}^1 ∨ x_1 ∨ ¬y_{01}^0                7 5 -2 0
  ¬y_{10}^1 ∨ x_1 ∨ y_{10}^0 ∨ y_{11}^0     -8 5 3 4 0
  ¬y_{10}^1 ∨ ¬x_1                          -8 -5 0
  y_{10}^1 ∨ x_1 ∨ ¬y_{10}^0                8 5 -3 0
  y_{10}^1 ∨ x_1 ∨ ¬y_{11}^0                8 5 -4 0
  ¬y_{11}^1 ∨ x_1 ∨ y_{11}^0                -9 5 4 0
  ¬y_{11}^1 ∨ ¬x_1 ∨ y_{10}^0               -9 -5 3 0
  y_{11}^1 ∨ ¬x_1 ∨ ¬y_{10}^0               9 -5 -3 0
  y_{11}^1 ∨ x_1 ∨ ¬y_{11}^0                9 5 -4 0
S':
  ¬y_{00}^1 ∨ y_{01}^1                      -6 7 0
  ¬y_{01}^1 ∨ y_{00}^1                      -7 6 0
  ¬y_{10}^1 ∨ y_{11}^1                      -8 9 0
  ¬y_{11}^1 ∨ y_{10}^1                      -9 8 0
  y_{00}^1 ∨ y_{01}^1                       6 7 0

Table 1. The SAT encoding of the D2W instance (E2, 1)

Now, in order to scale (V', C') to the SAT instance (V, C) that encodes the pair (A, ℓ) for some given ℓ > 1, we perform the following transformations on the DIMACS CNF representation of C' = C'_0 ∪ C'_1 ∪ S' (a code sketch of these steps is given after the list):

1. In the first non-comment line, replace NUMBER_OF_VARIABLES and NUMBER_OF_CLAUSES by n²(ℓ + 1) + ℓ and ℓN + 2n² + 1 respectively, where N is the number of clauses in C'_1.

2. Keep the lines corresponding to the clauses in C'_0 and C'_1.

3. For each t = 2, ..., ℓ, add all the lines obtained from the ones corresponding to the clauses in C'_1 by keeping the sign of every non-zero integer and adding (t − 1)(n² + 1) to its absolute value.

4. In each line corresponding to a clause in S', substitute every non-zero integer ±k by the integer ±(k + (ℓ − 1)(n² + 1)).
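The four steps above admit a compact implementation once the primary instance has been parsed into clause lists; the sketch below treats that parsing as given and only shows the scaling itself (names are illustrative).

#include <vector>

using Clause = std::vector<int>;

// Scale the primary instance for (A, 1) to the instance for (A, l), following
// steps 1-4 of Remark 1.  c0, c1, s are the parsed clause groups C'_0, C'_1, S'.
std::vector<Clause> scale_instance(int n, int l,
                                   const std::vector<Clause>& c0,
                                   const std::vector<Clause>& c1,
                                   const std::vector<Clause>& s) {
    auto shift = [](const Clause& c, int d) {     // shift |literal| by d, keep the sign
        Clause out;
        for (int lit : c) out.push_back(lit > 0 ? lit + d : lit - d);
        return out;
    };
    std::vector<Clause> scaled(c0);                        // step 2: keep C'_0 ...
    scaled.insert(scaled.end(), c1.begin(), c1.end());     // ... and C'_1
    for (int t = 2; t <= l; ++t)                           // step 3: shifted copies of C'_1
        for (const Clause& c : c1)
            scaled.push_back(shift(c, (t - 1) * (n * n + 1)));
    for (const Clause& c : s)                              // step 4: shift S' to level l
        scaled.push_back(shift(c, (l - 1) * (n * n + 1)));
    return scaled;                                         // step 1: the new header is
}                                                          // "p cnf n*n*(l+1)+l  l*|C'_1|+2*n*n+1"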

Our experiments have been performed on a personal computer equipped with an Intel(R) Core(TM) i5-2520M processor with 2.5 GHz CPU and 4GB of RAM. We have implemented the described algorithm in C++ and compiled it with GCC 4.9.2. For various fixed n ≤ 100, up to 1000 NFAs with n states have been generated and analyzed. We have generated 1000 automata for each n ∈ {5, 10, ..., 30}, 700 automata for each n ∈ {35, 40, ..., 60}, 500 automata for each n ∈ {65, 70, ..., 80}, and 200 automata for each n ∈ {90, 100}. The calculations have taken ≈ 400 seconds for n = 10 and ≈ 1.2·10⁵ seconds for n = 80.

As in [12], the two models used for random generation of an NFA A = (Q, Σ, δ) with n states and 2 input symbols were the uniform model based on the uniform distribution and the Poisson model based on the Poisson distribution with some parameter λ. For each state q ∈ Q and each symbol s ∈ Σ, we first choose a number k ∈ {0, 1, 2, ..., n} that serves as the cardinality of the set δ(q, s). In the uniform model, each k is chosen with probability 1/(n + 1) while in the Poisson model with parameter λ, each k < n is chosen with probability e^{−λ}λ^k/k! and n is chosen with probability 1 − e^{−λ} Σ_{k=0}^{n−1} λ^k/k!. With k having been chosen, we proceeded in the same way in both models, by choosing δ(q, s) from all subsets of Q with cardinality k uniformly at random.
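A sketch of how one transition set δ(q, s) can be drawn under either model (the helper names are ours; the paper only fixes the two distributions):

#include <algorithm>
#include <random>
#include <vector>

// Generate delta(q, s) for one state-symbol pair: first draw the cardinality k,
// then a uniformly random k-subset of Q = {0, ..., n-1}.
std::vector<int> random_transition(int n, bool poisson_model, double lambda,
                                   std::mt19937& rng) {
    int k;
    if (!poisson_model) {
        k = std::uniform_int_distribution<int>(0, n)(rng);   // uniform on {0, ..., n}
    } else {
        std::poisson_distribution<int> pd(lambda);
        k = std::min(pd(rng), n);        // values > n are truncated to n, so n receives
                                         // the whole remaining probability mass
    }
    std::vector<int> states(n);
    for (int q = 0; q < n; ++q) states[q] = q;
    std::shuffle(states.begin(), states.end(), rng);          // uniform k-subset of Q
    states.resize(k);
    return states;
}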

For NFAs generated under the uniform model, we have observed that for an overwhelming majority of properly D2-synchronizing NFAs, the length of the shortest proper D2-synchronizing word is 3, and this conclusion does not depend on the number of states within the range of our experiments. Recall that the experiments in [12] revealed quite a similar phenomenon for D3-synchronization: if an NFA generated under the uniform model is D3-synchronizing (which happens with probability ≈ 60%, see [12, Proposition 5]), then its shortest D3-synchronizing word has length 2, and this fact does not depend on the number of states. An informal explanation of the latter phenomenon can be found in [12]; similar arguments apply also in the present situation.

Thus, the uniform model fails to produce any "slowly synchronizing" NFA. This indicates that using SAT solvers in the uniform setting was not really necessary since a brute-force approach would suffice. Indeed, given an NFA A = (Q, Σ, δ), one can write all words over Σ up to a given length in the short-lex order and apply each of these words to A until one finds a D2-synchronizing word. As our experiments reveal, for a majority of NFAs generated under the uniform model, the brute-force approach requires checking only words of length at most 3.
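For completeness, here is a sketch of such a brute-force check (our own illustration; the enumeration order within one length differs from the lexicographic order, which does not affect the minimum length found).

#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

using StateSet = std::set<std::size_t>;
using Nfa = std::vector<std::vector<StateSet>>;   // delta[q][a], a in {0,1}

// Image of a single state under a word (the empty set models "undefined").
static StateSet image(const Nfa& delta, std::size_t q, const std::string& w) {
    StateSet X{q};
    for (char c : w) {
        StateSet next;
        for (std::size_t p : X)
            next.insert(delta[p][c - '0'].begin(), delta[p][c - '0'].end());
        X = next;
    }
    return X;
}

// Try all words over {0,1} of length 1, 2, ..., max_len and return the first
// proper D2-synchronizing word found, if any.
std::optional<std::string> brute_force_d2(const Nfa& delta, int max_len) {
    for (int len = 1; len <= max_len; ++len)
        for (unsigned long code = 0; code < (1UL << len); ++code) {
            std::string w(len, '0');
            for (int t = 0; t < len; ++t)
                if (code & (1UL << t)) w[t] = '1';
            StateSet first = image(delta, 0, w);
            bool ok = !first.empty();                  // proper: somewhere defined
            for (std::size_t q = 1; q < delta.size() && ok; ++q)
                ok = (image(delta, q, w) == first);    // all states must agree
            if (ok) return w;
        }
    return std::nullopt;
}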

For random NFAs generated under the Poisson model, our experiments show that if the parameter λ is fixed, the length of the shortest proper D2-synchronizing word grows with the number of states but the growth rate is rather small. Some sample experimental results are presented in Fig. 3. The three graphs in Fig. 3 correspond to NFAs with 30, 45, and 60 states generated under the Poisson model with λ = 2 and demonstrate how these NFAs are distributed according to the length of their shortest proper D2-synchronizing words. The horizontal axis is the minimum length of proper D2-synchronizing words and the vertical axis is the number of NFAs. We have applied the method of least squares to our experimental data, searching for an explicit function of n that approximates the mean value E_λ(n) of the minimum lengths of proper D2-synchronizing words for n-state NFAs generated under the Poisson model with a given parameter λ. The best approximations have been provided by logarithmic functions; for instance, for λ = 2, we have found the following solution:

E_2(n) ≈ −0.39 + 2.2 ln(n).

Figure 3. Distributions of random NFAs with 30, 45, and 60 states generated under the Poisson model with λ = 2 according to the minimum lengths of their proper D2-synchronizing words

Figure 4. The relative standard deviation of the minimum lengths of proper D2-synchronizing words for n-state NFAs as a function of n

Fig. 4 shows the relation between the relative standard deviation of our datasets and the number of states (for λ = 2).

Besides experimenting with randomly generated NFAs, we have tested our approach on certain provably "slowly synchronizing" NFAs considered in the literature. Here we report a set of results in which we used as benchmarks several automata from the series P_n suggested by de Bondt, Don, and Zantema [2]. The state set of P_n is {1, 2, ..., n}, n ≥ 3, and the input alphabet consists of two letters a and b whose actions are defined as follows:

q.a :=  q + 1  if q = 1, 2,
        q      if q = 3, ..., n;

q.b :=  undefined  if q = 1,
        q + 1      if q = 2, ..., n − 1,
        1          if q = n.
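A sketch of this construction in code (the representation of states and the encoding of the undefined transition by −1 are our choices):

#include <vector>

// The partial deterministic automaton P_n from [2]; states are 1..n and
// index 0 is unused.  -1 marks the undefined transition 1.b.
struct PartialDfa {
    std::vector<int> a, b;   // a[q] = q.a, b[q] = q.b
};

PartialDfa make_Pn(int n) {
    PartialDfa P{std::vector<int>(n + 1, -1), std::vector<int>(n + 1, -1)};
    for (int q = 1; q <= n; ++q) {
        P.a[q] = (q <= 2) ? q + 1 : q;               // q.a = q+1 for q = 1, 2; q otherwise
        P.b[q] = (q == 1) ? -1                       // 1.b is undefined
                          : (q < n ? q + 1 : 1);     // q.b = q+1 for 2 <= q <= n-1; n.b = 1
    }
    return P;
}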

Thus, the automata P_n are partial deterministic, and it is easy to see that for partial deterministic automata, proper D2-synchronizing words coincide with D3-synchronizing words and coincide with the so-called carefully synchronizing words considered in [2]. Hence we can compare the information about the length of shortest synchronizing words for P_n obtained in [2, Theorem 3] and the results produced by an application of our procedure. In our experiments, we have examined all automata P_n with n = 4, 5, ..., 11, and for each of them, our result has matched the theoretical value predicted by [2, Theorem 3]. The time consumed ranges from 0.301 sec for n = 4 to 4303 sec for n = 11. Observe that in the latter case the shortest synchronizing word has length 116; clearly, this value is out of reach for any brute-force method.

3. Conclusion and future work

We have presented a modification of the approach that originated in [12], which has allowed us to find shortest proper D2-synchronizing words for nondeterministic automata with two input letters and up to 100 states. The size of automata that we are able to analyze may seem modest in comparison with the results of [8] whose authors describe sophisticated methods to compute shortest synchronizing words for deterministic automata with up to 350 states. However, two important nuances should be taken into account. First, for the time being, the approach of [12] and the present paper appears to be the only one that has proved to work in the realm of nondeterministic automata. Second, it is well known that nondeterministic automata may be exponentially more succinct than equivalent deterministic ones, and, say, an NFA with 100 states may encode the same amount of information as a DFA with 2^100 states.

We have concentrated on D2-synchronizing words which are everywhere defined. In fact, shortest nowhere defined words are even easier to find with a similar method. The point is that in terms of our game Γ from Sect. 1, a nowhere defined word is just a word whose application removes all tokens. However, if all tokens are going to be eventually removed, there is no need to distinguish between them! Therefore one can drastically reduce the number of variables and clauses used in the encoding. Instead of the 3-parameter set of variables {y_{ij}^t} used in Sect. 1, it suffices to consider the 2-parameter set {y_j^t} where y_j^t = 1 should mean that after t rounds of the game Γ, the state q_j holds a token; similarly, the role of the 3-parameter set of formulas {ψ_{ij}^t} can be played by the 2-parameter set consisting of the formulas

ψ_j^t :  y_j^t ↔ (x_t ∧ ⋁_{q_k ∈ P_1(q_j)} y_k^{t−1}) ∨ (¬x_t ∧ ⋁_{q_h ∈ P_0(q_j)} y_h^{t−1})

for all j = 0, ..., n − 1 and t = 1, ..., ℓ. Similar simplifications apply to the sets of initial and synchronization clauses. Therefore, it made no sense to search for nowhere and everywhere defined D2-synchronizing words simultaneously, although it was possible (for this, one just had to remove the clause (1.5) from the set of synchronization clauses).

Yet another version of synchronization for nondeterministic automata suggested in [7] is D1-synchronization. A word w ∈ Σ* is said to be D1-synchronizing for A = (Q, Σ, δ) if q.w = q'.w and |q.w| = 1 for all q, q' ∈ Q. Clearly, every D1-synchronizing word is everywhere defined and is D2-synchronizing but the converse is not true: an everywhere defined D2-synchronizing word need not be D1-synchronizing. We can use encodings similar to those in [12] and the present paper in order to find shortest D1-synchronizing words for NFAs of reasonable sizes; one only has to adjust the set of synchronization clauses.

We think that the results presented here and in [12] demonstrate that our approach works in principle but, of course, its present implementation is only a toy prototype for a system that could be used for real-world applications. There are several resources, on both software and hardware sides, which can be employed to speed up our calculations and enlarge their range. In particular, one can try more advanced SAT-solvers, such as CryptoMiniSat [14] and lingeling [1], and run a version of our program on a multiprocessor grid.

Acknowledgments. The author thanks the anonymous referees for their constructive comments and recommendations.

REFERENCES

1. Biere A. Yet another local search solver and lingeling and friends entering the SAT Competition 2014. In: Proceedings of SAT Competition 2014: Solver and Benchmark Descriptions. University of Helsinki, 2014. P. 39-40. URL: http://fmv.jku.at/papers/Biere-SAT-Competition-2014.pdf

2. de Bondt M., Don H., Zantema H. Lower bounds for synchronizing word lengths in partial automata. Preprint, 2018. URL: https://arxiv.org/abs/1801.10436

3. Eén N., Sörensson N. An extensible SAT-solver. Lect. Notes Comput. Sci., Vol. 2919: Theory and Applications of Satisfiability Testing (SAT 2003). 2004. P. 502-518. DOI: 10.1007/978-3-540-24605-3_37

4. Eén N., Sörensson N. The MiniSat Page. URL: http://minisat.se

5. Gomes C. P., Kautz H., Sabharwal A., Selman B. Satisfiability Solvers. Ch. 2. In: Handbook of Knowledge Representation, Elsevier, 2008. P. 89-134. DOI: 10.1016/S1574-6526(07)03002-7

6. Güniçen C., Erdem E., Yenigün H. Generating shortest synchronizing sequences using Answer Set Programming. In: Proceedings of Answer Set Programming and Other Computing Paradigms (ASPOCP 2013). P. 117-127. URL: https://arxiv.org/abs/1312.6146

7. Imreh B., Steinby M. Directable nondeterministic automata. Acta Cybernetica. 1999. Vol. 14, no. 1. P. 105-115.

8. Kisielewicz A., Kowalski J., Szykuła M. Computing the shortest reset words of synchronizing automata. J. Comb. Optim. 2015. Vol. 29, no. 1. P. 88-124. DOI: 10.1007/s10878-013-9682-0

9. Martyugin P. Synchronization of automata with one undefined or ambiguous transition. Lect. Notes Comput. Sci., Vol. 7381: Implementation and Application of Automata (CIAA 2012). 2012. P. 278-288. DOI: 10.1007/978-3-642-31606-7_24

10. Rystsov I.K. Polynomial complete problems in automata theory. Inf. Process. Lett. 1983. Vol. 16, no. 3. P. 147-151. DOI: 10.1016/0020-0190(83)90067-4

11. Rystsov I. K. Asymptotic estimate of the length of a diagnostic word for a finite automaton. Cybernetics. 1980. Vol. 16, no. 1. P. 194-198. DOI: 10.1007/bf01069104

12. Shabana H., Volkov M. V. Using Sat solvers for synchronization issues in nondeterministic automata. Siberian Electronic Math. Reports. 2018. Vol. 15. P. 1426-1442.

URL: http://semr.math.nsc.ru/v15/p1426-1442.pdf

13. Skvortsov E., Tipikin E. Experimental study of the shortest reset word of random automata. Lect. Notes Comput. Sci., Vol. 6807: Implementation and Application of Automata (CIAA 2011). 2011. P. 290-298. DOI: 10.1007/978-3-642-22256-6_27

14. Soos M. CryptoMiniSat 2. URL: http://www.msoos.org/cryptominisat2/
