Suppose you have an array of $n$ integers in the range $\{0, 1, 2, \ldots, n\log n\}$. Show and explain an algorithm that is efficient in the worst case and determines whether there are two elements with the same value. Note: you may use extra memory of order $O(n)$.
I don't really care how the algorithm will look. The idea behind this proof is to radix-sort in $O(n)$ and then, since the array is sorted, look for two adjacent elements with the same value. The proof outline: suppose we work over the larger domain $\{0, 1, 2, \ldots, n^2 - 1\}$. Now we treat each number according to its binary representation and radix-sort it bit by bit. Right after this comes the part I can't understand at all: as we know, the running time of radix sort is $\Theta(d(n+k))$, and therefore all we have to decide is which $a$ to choose so that the total is $O(n)$, using the formula $\frac{2\log n}{a}(n + 2^a)$. After this step, you just choose $a = \log n$ and you are done. My questions are:
- How did the writer conclude the following formula: $\frac{2\log n}{a}(n + 2^a)$?
- Also, why does the writer prefer to work over the domain $\{0, 1, 2, \ldots, n^2 - 1\}$ instead of the one given in the question?
Asked By : SyndicatorBBB
Answered By : William Macrae
- In the formula $\Theta(d(n+k))$, $d$ is the number of digits (passes of the sort) and $k$ is the number of buckets (the base you are working in). So, if you were sorting the numbers 0 through 999 with 10 buckets, there would be $d = 3$ passes (since the numbers have length 3 in base 10) and we would have $k = 10$, which is the base. If we leave the base $2^a$ to be determined later (by using a power of two we can read $a$ bits at a time and let them determine the bucket), there are $\log_{2^a}(n^2) = \frac{\log_2 n^2}{\log_2 2^a} = \frac{2\log n}{a}$ passes, and $k = 2^a$ is just the base. So $d(n+k) = \frac{2\log n}{a}(n + 2^a)$.
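To see why $a = \log n$ makes this $O(n)$: the cost becomes $\frac{2\log n}{\log n}(n + 2^{\log n}) = 2(n + n) = 4n$. A small sketch checking this arithmetic (`radix_cost` is a hypothetical helper evaluating the cost model above, not code from the answer):

```python
import math

def radix_cost(n, a):
    """Evaluate the cost model d*(n + k) with d = 2*log2(n)/a passes
    (rounded up) and k = 2**a buckets."""
    d = math.ceil(2 * math.log2(n) / a)
    return d * (n + 2 ** a)

n = 1024                   # n = 2^10, so log2(n) = 10
a = int(math.log2(n))      # the suggested choice a = log n
# With a = log n there are d = 2 passes and k = n buckets,
# so the total cost is 2 * (n + n) = 4n, i.e. O(n).
print(radix_cost(n, a))    # 4 * 1024 = 4096
```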
- The expanded range is there to make the math prettier. Above we took $\log n^2$, which worked out nicely to give a constant factor of 2 in our asymptotics. If we instead had $\log(n \log n)$, things would not be as nice. It should still work (possibly for a different choice of $a$), and if you're comfortable with big-O manipulation you can do it as an exercise, but the proof is much cleaner when you don't have to argue that $O(\log n + \log\log n) = O(\log n)$.
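Putting the pieces together, here is a minimal Python sketch of the approach (my own illustration under the assumptions above, not the writer's code): radix-sort the keys in base $2^a$ with $a \approx \log_2 n$, using $\frac{2\log n}{a}$ stable counting-sort passes, then scan the sorted array for adjacent equal values.

```python
import math

def has_duplicate(arr):
    """Detect a duplicate in worst-case O(n) for keys in {0, ..., n^2 - 1}:
    radix-sort with base 2^a where a = ceil(log2 n), then check whether any
    two adjacent elements of the sorted array are equal."""
    n = len(arr)
    if n < 2:
        return False
    a = max(1, math.ceil(math.log2(n)))        # bits consumed per pass
    base = 1 << a                              # k = 2^a buckets
    mask = base - 1
    passes = math.ceil(2 * math.log2(n) / a)   # d = 2 log n / a, rounded up
    # Stable bucket sort on each a-bit digit, least significant first.
    for p in range(passes):
        shift = p * a
        buckets = [[] for _ in range(base)]
        for x in arr:
            buckets[(x >> shift) & mask].append(x)
        arr = [x for b in buckets for x in b]
    # The array is now sorted, so equal values are adjacent.
    return any(arr[i] == arr[i + 1] for i in range(n - 1))

print(has_duplicate([3, 8, 1, 5, 8]))  # True
print(has_duplicate([0, 2, 4, 6]))     # False
```

Each pass touches $n$ elements and $2^a \le 2n$ buckets, and there are a constant number of passes, so the whole procedure runs in $O(n)$ time using $O(n)$ extra memory for the buckets.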
Best Answer from StackExchange
Question Source : http://cs.stackexchange.com/questions/7051