[Solved]: compressing a set of binary strings with fixed length

Problem Detail: I’m looking for a data structure / algorithm to store an unordered set S of binary strings of a fixed length n (i.e. all matching the following regular expression: [01]{n}). Insertion and lookup (“Is element x in the S?”) should maximally take polynomial time to |S|. The space complexity should be logarithmic to |S| for large sets. In other words, the space should not be exponential to n if for example 2^n / 2 random and unique strings are inserted, but polynomial to n. Is such a thing known to exist?

Asked By : Tobias Hermann

Answered By : Yuval Filmus

Suppose $S$ consists of $m$ strings in ${0,1}^n$ and we don’t allow query arrows. Any two different sets $S_1,S_2 subseteq {0,1}^n$ of size $m$ respond differently to some lookup query: there must be some $x in S_1setminus S_2$, for example, and the lookup of $x$ should succeed in $S_1$ and fail in $S_2$. For this reason, the contents of your data structure should be different for $S_1,S_2$. Since there are $binom{2^n}{m}$ different choices for $S$, any data structure supporting all of them should have at least $binom{2^n}{m}$ different settings. In particular, if you use $M$ bits to store it then $2^M geq binom{2^n}{m}$. When $m = 2^n/2$, for example, this forces $M geq 2^n – O(n) = 2m – O(log m)$.
Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/32466