[Solved]: Can POSIX BRE express all regular languages?

Problem Detail: It appears that “Basic Regular Expressions” as defined by POSIX.1-2008 do not support alternation, a|b (although some grep implementations recognize the escaped version, |). Since the regular languages are closed under union by definition, does this mean that POSIX BRE has less expressive power than a finite automaton? Or is there some way to simulate alternation using other constructs?

Asked By : Steve Kobes

Answered By : Gilles

Indeed the POSIX BRE language cannot express all regular expressions because it lacks alternation. It can’t even recognize all finite languages, let alone all regular languages. For example, ${ab, ba}$ is not recognizable as a BRE. To prove this, consider what the toplevel syntactic form could be:

It can’t be one of the single-character forms since the language has words of length $gt 1$.
It can’t be $R^*$ because that would match the empty string.
It can’t be $R^{{m,n}}$ except with $m=n=1$ (in which case we’re back to the original problem) because that would match strings of different lengths or the empty string.
So it has to be concatenation: $R_1 R_2$. Now consider how $ab$ is recognized:
- If $R_1$ recognizes $ab$ then $R_2$ must not recognize anything anything other than the empty string. So $R_1$ must recognize ${ab,ba}$ and we’re back to the original problem.
- If $R_1$ recognizes $a$ but not $ab$ then $R_2$ must recognize $b$. But then $R_1R_2$ recognizes all words of the form $u b$ where $R_1$ recognizes $u$, so $R_1$ must not recognize anything other than $a$. There’s no way to recognize $ba$.
- If $R_1$ recognizes neither $ab$ nor $a$ then the only way for $R$ to recognize $ab$ is if $R_1$ recognizes the empty string, in which case we’re back to the original problem as above, but for $R_2$ this time.

When “we’re back to the original problem”, that means that the only solution to find a BRE recognizes the language is to find a smaller BRE that has the same property. This is an infinite descent, so there’s no BRE that has the desired property. I don’t think there’s a “nice” characterization of BRE-recognizable languages, for example as languages recognizable by a “nice” class of automata. Note that BRE-recognizable languages are actually not a subclass of regular languages, since backreferences add expressive power. For example ${w w mid w in {a,b}^*}$ is recognized by the BRE (.*)1 but is famously not regular. BRE without backreferences are just syntactic sugar over regular expressions so the languages they can recognize are a subclass of the regular languages.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/62944

Asked By : Steve Kobes

Answered By : Gilles

Best Answer from StackOverflow

Related