[Solved]: Are probabilistic search data structures useful?

Problem Detail: A skip list provides the same $O(\log n)$ bounds for search as a balanced tree, with the advantage that no rebalancing is necessary. Since the skip list is constructed using random coin flips, these bounds only hold as long as the structure of the skip list is sufficiently “balanced”. In particular, with probability $1/n^c$ for some constant $c>0$, the balanced structure might be lost after inserting an element. Say I want to use a skip list as the storage backend in a web application that potentially runs forever. Then, after some polynomial number of operations, the balanced structure of the skip list is very likely to be lost. Is my reasoning correct? Do such probabilistic search/storage data structures have practical applications, and if so, how is the above problem avoided? Edit: I’m aware that there are deterministic variants of the skip list, but they are much more complicated to implement than the (classic) randomized skip list.

Asked By : somebody

Answered By : adrianN

I don’t think there is a polynomially large probability of losing ‘balance’. After you insert an element into a skip list, you build a tower of copies above it by flipping a coin until it comes up heads, so the layers contain fewer and fewer elements as you move toward the top. Since a tower has height $k$ with probability $2^{-k}$, by the union bound there is some element at height $k$ with probability less than $n/2^k$. Hence having an element at level $c\log n$ has probability less than $1/n^c$, and towers of height $\omega(\log n)$ occur with subpolynomial probability. Let $M$ be the maximum level; then we have $$E[M] = \sum_{k\geq 1} \Pr(M\geq k) \leq \log n + \sum_{k \geq \log n} \frac{n}{2^k} = \log n + 2.$$

Furthermore, on level $k$ there are about $n/2^k$ elements with very high probability, since the number of elements at that level is a sum of $n$ independent random variables, so you can apply Chernoff’s bound. Since you can also show that you only take a constant number of steps per level (again with very high probability!), search costs are logarithmic.

So you would have to be very unlucky indeed to end up with an unbalanced list. Note that ‘luck’ here is independent of your data, unlike, for example, in unbalanced search trees: the coin flips in a skip list are always random, so no input sequence can force a bad structure. As far as I know, skip lists are of great practical interest because it is relatively easy to implement them as lock-free search structures, with the obvious benefits. B-trees, on the other hand, are rather difficult to make performant under concurrent access.
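To see the bound $E[M] \approx \log n + O(1)$ empirically, here is a minimal Python sketch (the helper names `tower_height` and `max_level` are my own, not from the answer) that simulates the coin-flip towers and compares the observed maximum level to $\log_2 n$:

```python
import math
import random

def tower_height(rng: random.Random) -> int:
    """Flip a fair coin until it comes up heads; the number of flips
    is the tower height, so Pr(height >= k) = 2^-(k-1)."""
    h = 1
    while rng.random() < 0.5:
        h += 1
    return h

def max_level(n: int, rng: random.Random) -> int:
    """Maximum tower height over n independently inserted elements."""
    return max(tower_height(rng) for _ in range(n))

rng = random.Random(42)
n = 1 << 16  # 65536 elements
trials = 50
avg_max = sum(max_level(n, rng) for _ in range(trials)) / trials
print(f"log2(n) = {math.log2(n):.1f}, average max level = {avg_max:.1f}")
```

On a run of this sketch the average maximum level stays within a small additive constant of $\log_2 n = 16$, matching the expectation bound above.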

Question Source : http://cs.stackexchange.com/questions/10229