[Solved]: Chomsky normal form: epsilon rule

Problem Detail: I have pretty simple question, but still can’t find an answer just googling it. I’m trying to understand Chomsky Normal Form (CNF). There are three production rules:

  1. $A to BC$
  2. $A to alpha$
  3. $S to epsilon$

First two I understand. But last one $epsilon$ doesn’t makes sense for me. Why do we need this rule? What is use of having this?

Asked By : Igor Konoplyanko

Answered By : Yuval Filmus

The last rule is necessary for languages containing $epsilon$. A context-free grammar in which $S$ does not generate $epsilon$ can be converted to Chomsky normal form without the rule $S to epsilon$. Conversely, simple induction shows that all other rules do not generate $epsilon$, so this rule is needed for languages generating $epsilon$. The rule arises during the $epsilon$-elimination step. For each non-terminal generating $epsilon$, we “optionally” delete it from any production containing in in the right-hand side, and continue the process inductively (eliminating one non-terminal overtly generating $epsilon$ may reveal that under generates $epsilon$). Finally, it could happen that $epsilon$ is pushed all the way to the starting symbol $S$; there is no way to eliminate it there.
Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/24012