- Suppose the $i$th number in the stream is $S_i$.
- Keep a max heap of size $K$.
- If the heap contains fewer than $K$ elements, add $S_i$ to the heap.
- Otherwise: if $S_i$ is smaller than the maximum element in the heap (i.e., the root of the heap), replace the root of the heap with $S_i$ and apply Max-Heapify to the heap; if $S_i$ is greater than or equal to the max element, do nothing.
The problem now is to find the expected number of times the Max Heapify operation will be called, when the stream of integers is of length $R$ and each element of the stream is (iid) uniformly distributed on $[1,N]$. If the stream were guaranteed to contain only distinct elements, then the answer is easy: $$E[X] = E[X_1] + E[X_2] + dots + E[X_R],$$ where $X_i$ is the random indicator variable for occurrence of the Max Heapify operation at the $i$th number in the stream. Now $$E[X_i] = Pr[text{$S_i$ is ranked $le K$ among first $i$ elements}] = min(K/i, 1).$$ Hence, $$E[X] = K + K/(K+1) + cdots + K/R.$$ That case is relatively easy. But how do we handle the case where the elements are iid uniformly distributed? [This was actually a Google interview question.]
Asked By : Piyush
Answered By : FrankW
$c_{j,x} = binom{i-1}{j} (x-1)^j (N-x+1)^{i-1-j}$, and thus overall (after shifting $i$ and $x$ by $1$) $$E[X] = sum_{i=0}^{R-1}frac{1}{N^{i+1}} sum_{x=0}^{N-1} sum_{j=0}^{K-1} binom{i}{j} x^j (N-x)^{i-j}.$$ Unfortunately, all my attempts at bringing this formula into a simpler/closed form have been unsuccessful.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/14661