[Solved]: Does a regular expression model the empty language if it contains symbols not in the alphabet?

Problem Detail: Suppose $Sigma = { a,b }$ and the regular expression $(a^*b+dc)^*(b^*d + ad)^*$. Is it equal to $varnothing$? So I have a regular expression like this: $(a^*b+dc)^*$. As only $(a,b) in Sigma$, I see that:

  • $dc=varnothing$

So $(a^*b+dc)^*=(a^*b)^*$. Then:

  • $(b^*d + ad)^* = (varnothing + varnothing)^*=(varnothing)^*$ and as $(varnothing)^*=epsilon$, $(b^*d + ad)^*$ becomes $epsilon$.

So my regular expression is simply $ (a^*b)^*$? Am I correct, or does the fact that the regular expression contain at least one symbol not in alphabet make it wrong immediately?

Asked By : John Mayne

Answered By : David Richerby

Regular expressions only use characters from the alphabet so, if you’ve fixed your alphabet to be ${a,b}$, then $(a^∗b+dc)^∗(b^∗d+ad)^∗$ isn’t a regular expression. It doesn’t describe any language, in just the same way that “seventy red” doesn’t describe any number. In particular, it doesn’t describe the empty language, in the same way that “seventy red” isn’t equal to zero. Now, if your alphabet includes all the symbols $a$, $b$, $c$ and $d$, you might want to ask what are the strings in $L((a^∗b+dc)^∗(b^∗d+ad)^∗)$ that contain only $a$’s and $b$’s. That is, what is $L((a^∗b+dc)^∗(b^∗d+ad)^∗)cap {a,b}^*$? And the answer to that is that it’s all strings matching $(a^*b)^*$, as you derive in the question. Appendix. One could attempt to sidestep these issues by redefining regular expressions in a way that gives them well-defined semantics if they include symbols not in the alphabet. However, the standard definition does not do this. Indeed, trying to do so would seem to open up a huge can of worms. For example, suppose your alphabet is ${a,b}$. We could agree that, since $c$ is not a symbol in the alphabet, the regular-expression-like-object $abc$ matches nothing. OK, but $+$ is also a symbol that’s not in the alphabet. Maybe you’re happy with $ab+$ matching nothing because it’s syntactically invalid – note, $ab+$, not $ab^+$! But, now, what does $a+b$ match? Does it mean “$a$ or $b$” or “$a$ followed by some symbol that’s not in the alphabet followed by $b$, which is impossible, so it matches nothing”?
Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/64990  Ask a Question  Download Related Notes/Documents