Connection between KMP prefix function and string matching automaton

Problem Detail: Let $A_P = (Q,Sigma,delta,0,{m})$ the string matching automaton for pattern $P in Sigma^m$, that is

  • $Q = {0,1,dots,m}$
  • $delta(q,a) = sigma_P(P_{0,q}cdot a)$ for all $qin Q$ and $ain Sigma$

with $sigma_P(w)$ the length of the longest prefix of $P$ that is a Suffix of $w$, that is $qquad displaystyle sigma_P(w) = max left{k in mathbb{N}_0 mid P_{0,k} sqsupset w right}$. Now, let $pi$ the prefix function from the Knuth-Morris-Pratt algorithm, that is $qquad displaystyle pi_P(q)= max {k mid k < q wedge P_{0,k} sqsupset P_{0,q}}$. As it turns out, one can use $pi_P$ to compute $delta$ quickly; the central observation is:

Assume above notions and $a in Sigma$. For $q in {0,dots,m}$ with $q = m$ or $P_{q+1} neq a$, it holds that $qquad displaystyle delta(q,a) = delta(pi_P(q),a)$

But how can I prove this? For reference, this is how you compute $pi_P$:

m ← length[P ] π[0] ← 0 k ← 0 for q ← 1 to m − 1 do   while k > 0 and P [k + 1] =6 P [q] do     k ← π[k]     if P [k + 1] = P [q] then        k ← k + 1     end if     π[q] ← k  end while end for  return π 
Asked By : Bob

Answered By : Raphael

First of all, note that by definition

  • $delta(q,a) = sigma_P(P_{0,q}cdot a) =: s_1$ and
  • $delta(pi_P(q),a) = sigma_P(P_{0,pi_P(q)}cdot a) =: s_2$.

Let us investigate $s_1$ and $s_2$ in a sketch: sketch
[source] Now assume $s_2 > s_1$; this contradicts the maximal choice of $s_1$ directly. If we assume $s_1 > s_2$ we contradict the fact that both $s_2$ and $pi_P(q)$ are chosen maximally, in particular because $pi_P(q) geq s_1 – 1$. As both cases cases lead to contradictions $s_1=s_2$ holds, q.e.d. As requested, a more elaborate version of the proof: Now we have to show $s_1=s_2$; we do this by showing that the opposite leads to contradictions.

  • Assume $s_2 > s_1$. Note that $P_{0,s_2} sqsupset P_{0,q}cdot a$ because $P_{0,s_2} sqsupset P_{0,pi_P(q)}cdot a$ and $P_{0,pi_P(q)} sqsupset P_{0,q}$ by definition of $s_2$. Therefore, $P_{0,s_2}$ — a prefix of $P$ and a suffix of $P_{0,q}cdot a$ — is longer than $P_{0,s_1}$, which is by definition of $s_1$ the longest prefix of $P$ that is a suffix of $P_{0,q}cdot a$. This is a contradiction.

Before we continue with the other case, let us see that $pi_P(q) geq s_1 – 1$. Observe that because $P_{0,s_1} sqsupset P_{0,q}cdot a$, we have $P_{0,s-1} sqsupset P_{0,q}$. Assuming that $pi_P(q) < s_1 – 1$ immediately contradicts the maximal choice of $pi_P(q)$ ($s_1 – 1$ is in the set $pi_P(q)$ is chosen from).

  • Assume $s_1 > s_2$. We have just shown $|P_{0,pi_P(q)}cdot a| geq s_1$, and remember that $P_{0,pi_P(q)}cdot a sqsupset P_{0,q} cdot a$. Therefore, $s_1 > s_2$ contradicts the maximal choice of $s_2$ ($s_1$ is in the set $s_2$ is chosen from).

As neither $s_1 > s_2$ nor $s_2 > s_1$ can hold, we have proven that $s_1 = s_2$, q.e.d.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/1669