Problem Detail: Here’s a conjecture for regular expressions:
For regular expression $R$, let the length $|R|$ be the number of symbols in it, ignoring parentheses and operators. E.g. $|0 cup 1| = |(0 cup 1)^*| = 2$ Conjecture: If $|R| > 1$ and $L(R)$ contains every string of length $|R|$ or less, then $L(R) = Sigma^*$.
That is, if $L(R)$ is ‘dense’ up to $R$’s length, then $R$ actually generates everything. Some things that may be relevant:
- Only a small part of $R$ is needed to generate all strings. For example in binary, $R = (0 cup 1)^* cup S$ will work for any $S$.
- There needs to be a Kleene star in $R$ at some point. If there isn’t, it will miss some string of size less than $|R|$.
It would be nice to see a proof or counterexample. Is there some case where it’s obviously wrong that I missed? Has anyone seen this (or something similar) before?
Asked By : Lucas Cook
Answered By : Yuval Filmus
Your conjecture is disproved by Keith Ellul, Bryan Krawetz, Jeffrey Shallit and Ming-wei Wang in their paper “Regular Expressions: New Results and Open Problems”. While the paper is not available on-line, a talk is. In the paper, they define the measure $|mathrm{alph}(R)|$, which is the number of symbols in $R$, not counting $epsilon$ or $emptyset$. However, $emptyset$ can be eliminated from every expression not generating the empty language, and the expression can be “cleaned up” so that the number of $epsilon$ it contains is at most $|mathrm{alph}(R)|$ (Lemma on page 10 of the talk). In page 51, for every $n geq 3$ they construct a regular expression of size $O(n)$ over ${0,1}$ which generates all strings of size at most $Omega(2^n n)$, but doesn’t generate all strings. Note that “size” here is both in your sense and theirs, since we’re using big-O notation. They also pose an open question to find the best dependence between the two parameters.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/1511