[Solved]: Shortest sub-sequence of one string, that’s not a sub-sequence of another string

Problem Detail: Given two strings $x$ and $y$ over the alphabet ${A,C,G,T}$, I’m trying to determine a shortest string $z$ such that $z$ is a subsequence of $x$ and not a subsequence of $y$.

Example: a shortest string that is a subsequence of AATGAAC but not a subsequence of ATCGATC is AAG.

Asked By : Mike

Answered By : Aryabhata

A dynamic programming algorithm seems to work. Given a prefix $X_k = x_1 x_2dots x_k$ of string $X$ ($X$ is of length $s$, so $X = X_s$), a prefix $Y_m = y_1 y_2 dots y_m$ of string $Y$ ($Y$ is of length $t$, so $Y_t = Y$), and a length $L$, we determine if there is a subsequence of $X_k$ which is also a subsequence of $Y_m$, such that the sub-sequence is of length exactly $L$ and ends at $x_k$. Call that (boolean) value: $text{is_there}[k,m,L]$. This satisfies the recurrence: $$text{is_there}[k,m,L] = text{is_there}[k, m-1, L] vee (x_k = y_m wedge text{is_there}[k-1,m-1,L-1])$$ The smallest $L$ for which there is some $k$, such that $text{is_there}[k, t, L] = false$ gives you the result. (Note that $t$ is the length of $Y$). If the length of the shortest such subsequence is $S$, This can be implemented in $O(stS)$ time and $O(stS)$ space by a triply nested for-loop with $L$ on the outside, then $k$, then $m$. Something like:

Compute isThere for L = 1. foreach L in (2 ... s)     foreach k in (1 ... s)        foreach m in (1 ... t)          is_there[k,m,L] = blah 

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/12037