[Solved]: Finding maximal factorization of regular languages

Problem Detail: Let language $mathcal{L} subseteq Sigma^*$ be regular. A factorization of $mathcal{L}$ is a maximal pair $(X,Y)$ of sets of words with

$X cdot Y subseteq mathcal{L}$
$X neq emptyset neq Y$,

where $X cdot Y = {xy$ | $x in X, y in Y}$. $(X,Y)$ is maximal if for each pair $(X’,Y’) neq (X,Y)$ with $X’cdot Y’ subseteq mathcal{L} $ either $X not subseteq X’$ or $Y not subseteq Y’$. Is there a simple procedure to find out which pairs are maximal? Example: Let $mathcal{L} = Sigma^∗ab Sigma^∗$. The set $F = {u, v, w}$ is computed:

$u =(Sigma^∗, Sigma^∗abSigma^∗)$
$v = (Sigma^∗aSigma^∗, Sigma^∗bSigma^∗)$
$w = (Sigma^∗abSigma^∗, Sigma^∗) $

where $Sigma = {a,b}$. Another example: $Sigma = {a, b}$ and $mathcal{L} = Sigma^*aSigma$ Factorization set $F = {q, r, s, t}$ with

$q = (Sigma^*, mathcal{L})$
$r = (Sigma^*a, Sigma + mathcal{L})$
$s = (Sigma^*aa, epsilon + Sigma + mathcal{L})$
$t = (mathcal{L}, epsilon + mathcal{L}) $

Asked By : Laura

Answered By : Cornelius Brand

As suggested in the comments to the question, I will try to give an (unfortunately partial) answer to the question, at least to the extent that I have understood the problem myself (this implies that you may well find mistakes, and if you find a way to more briefly or clearly explain one of the below points, feel free to edit the answer accordingly): First, one should note that we do not actually have to compute the universal automaton of a language if we want to compute the factorizations of a language. From the paper mentioned in my comment¹, there is a 1-1 correspondence between left and right factors of a regular language, that is, given a left factor of the language, the corresponding right factor is uniquely determined and vice versa. More precisely, we have the following: Let $(X,Y)$ be a factorization of $L$. Then $$Y = bigcap_{x in X}x^{-1}L, X = bigcap_{y in Y}Ly^{-1},$$ that is, any left factor is an intersection of right quotients, and any right factor is an intersection of left quotients. Conversely, any intersection of left quotients of $L$ is a right factor of $L$, and any intersection of right quotients of $L$ is a left factor of $L$. Note that for a regular language, there is only a finite set of left and right quotients, and thus or problem reduces to computing the left and right quotients of a language, and then to compute their $cap$-stable closure, that is, a minimal superset of the quotients that is closed under intersection. These are then precisely the right factors and left factors, and then it is usually easy to see which pairs are subsets of $L$.

Example

In order to illustrate the above points, consider the first example in the question (of which I also think that it is incorrect in the paper): Let $L = Sigma^ast ab Sigma^ast$. Now, the left quotients of $L$ are the sets $x^{-1}L$ for $xin Sigma^ast$, that is, those words $u$ in $Sigma^ast$ that can be prefixed with $x$, i.e. $xu in L$. When is $y^{-1}L=x^{-1}L$ for distinct $x,y$? This is the case if and only if $x$ and $y$ can be augmented to words in $L$ with precisely the same suffixes. This means, to put it into more familiar terms, they are Nerode-equivalent, and the suffixes that are needed to append to words in a Nerode class are precisely the respective left quotients. For $L$, we see that our Nerode-equivalence classes are

$N_1$, the set of words not containing $ab$ as a factor and ending with $a$,
$N_2$, the set of words ending with $b$ and not containing $ab$ as a factor, and
$N_3$, the set of words containing $ab$ as a factor, that is, $N_3 = L$

They can be augmented with the following sets (that is, these are the left quotients of the words in the respective classes):

$S_1 = x^{-1}L$ for $x$ in $N_1$ consists of all words in $L$ (any word can be augmented with a word containing $ab$ as a factor and thus becomes a word in $L$) and $bSigma^ast$, that is $S_1 = L cup bSigma^ast$
$S_2 = x^{-1}L$ for $x$ in $N_2$ is the language itself, that is, $S_2 = L$ and
$S_3 = x^{-1}L$ for $x$ in $N_3$ is obviously $Sigma^ast$. That is, we have found three right factors of $L$. As $S_2subset S_1subset S_3$, their $cap$-stable closure is trivially ${S_1,S_2,S_3}$, and those are then precisely the right factors.

Hence, our factorization set $mathcal{F}_L$ is of the form $(P_1,S_1),(P_2,S_2),(P_3,S_3)$. Now, for the left factors $P_i$, we use the equations of the beginning of this answer: $$ P_i = bigcap_{xin S_i} Lx^{-1} $$. For $P_1$, this yields $L cup Sigma^ast a$, for $P_2$ we get $Sigma^ast$ and for $P_3$, we obtain $L$. You can see this by inspection (the most popular excuse for being too lazy to state a formal proof) or by explicitly computing the right quotients (which is fairly analogous, although not completely, to computing the left quotients). Our factorizations are thus given by $mathcal{F}_L = {u,v,w}$ where

$u = (P_1,S_1) = (Sigma^ast ab Sigma^ast cup Sigma^ast a, Sigma^ast ab Sigma^ast cup bSigma^ast)$
$v = (P_2, S_2) = (Sigma^ast, Sigma^ast ab Sigma^ast)$ and
$w = (P_3, S_3) = (Sigma^ast ab Sigma^ast, Sigma^ast)$

Summary

To summarize (as you were asking for a simple procedure):

For computing the factorizations of a language $L$, first compute the left quotients of $L$.
You can do so, in the language of the paper, by constructing a minimal DFA $A$ for $L$ and then for each state $q$ in $A$ (corresponding, as a Nerode-equivalence class, to a left quotient) compute the future of $q$ in $A$, thus obtaining one left quotient of the language for each state.
The collection of left quotients obtained in this way yields, in general, a subset $S_R$ of the right factors.
Compute then the $cap$-stable closure of $S_R$, which can be done in practice by forming the intersection of any subset of $S_R$ and adding any subset obtained in this way to $S_R$.
The set $S_R$ together with all the intersections from the previous step is then the set of right factors of $L$.
In order to obtain the left factors, we can compute the right quotients of $L$.
These are sets of the form $Ly^{-1}$, for $yin Sigma^ast$. Now, these are again only finitely many, and for $xneq y$, we have $Ly^{-1} = Lx^{-1}$ if and only if for all $uin Sigma^ast$, $ux in L Leftrightarrow uy in L$, that is they can be prefixed to words in the language with precisely the same set of strings.
To compute $Lx^{-1}$, consider those states $q$ in $A$ such that $x$ is contained in the future of $q$. The union of the pasts of those states constitute one right quotient. Find all these quotients.
You know you are done when you have found as many left factors as you have right factors.
Find those pairs of left and right factors $X,Y$ such that $Xcdot Y subseteq L$. This is $mathcal{F}_L$.

The Universal Automaton by Lombardy and Sakarovitch (in Texts in Logic and Games, Vol 2: Logic and Automata: History and Perspectives, 2007)

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/13617 Ask a Question Download Related Notes/Documents