[Solved]: Is there a “sorting” algorithm which returns a random permutation when using a coin-flip comparator?

Problem Detail: Inspired by this question in which the asker wants to know if the running time changes when the comparator used in a standard search algorithm is replaced by a fair coin-flip, and also Microsoft’s prominent failure to write a uniform permutation generator, my question is thus: Is there a comparison based sorting algorithm which will, depending on our implementation of the comparator:

  1. return the elements in sorted order when using a true comparator (that is, the comparison does what we expect in a standard sorting algorithm)
  2. return a uniformly random permutation of the elements when the comparator is replaced by a fair coin flip (that is, return x < y = true with probability 1/2, regardless of the value of x and y)

The code for the sorting algorithm must be the same. It is only the code inside the comparison “black box” which is allowed to change.

Asked By : Joe

Answered By : frafl

The following deterministic (without the comparator) algorithm works for an input tuple $(a_1,dots,a_n)$:

  1. Do the Fisher-Yates shuffle using your comparator with some static pair (say $a_1 < a_2$) as a coin flip (doing acceptance-rejection sampling). If the comparator outputs $1$ the first time, use it inverted to avoid an endless rejection loop in the deterministic case.
  2. (optional speedup: Try a single pair $n$ times, where $n$ is the length or your input. If any two of the outputs differ return the permutation obtained in (1))
  3. Sort your array using merge sort.

Given a deterministic order relation as comparator this algorithm sorts an array in time $mathcal{O}(n log n)$ since the Fisher-Yates shuffle runs in $mathcal{O}(n)$ using maximal $mathcal{O}(log n)$ nonrandom “random bits” (e.g. calls to your comparator) in each step and merge sort has the same asymptotic complexity. The result of (1) is totally useless in this case, but since it’s followed by a real sort, this does no harm. Given a real coin flip as comparator (1) permutes the array with equal probability for each permutation and if you really have to do (3) (you left out (2) or (2) failed to determine the randomness), this is no harm because the distribution of its result only depends on the order of its input which is uniformly distributed among all permutations because of (1), so the result of the entire algorithm is also uniformly distributed. The number of times each acceptance-rejection sampling has to be repeated is geometrically distributed (reject with probability $< frac{1}{2}$) and therefore it has an expected value $< 2$. Each repetition uses at most $log n$ bits, so the runtime analysis is nearly the same as in the deterministic case, but we only get an expected runtime of $mathcal{O}(n log n)$, with the possibility of nontermination (terminates only almost surely). As Joe pointed out: If you don’t like the test for the first bit in (1), do (3) then (1) and use $a_n < a_1$ which is always $0$, since the array is already sorted in the deterministic case. Additionally you have to subtract your random number from the upper bound on the range in the loop, because the upper bound for the random number yields the identical permutation. But be aware that (2) is forbidden then, because you always have to do the shuffle in the ransom case. You can even use the same calls to your comparator for (1) and (3), but then proving that the result is uniformly distributed is at least a lot harder, if possible at all. The following algorithm is has no distinct phases to shuffle and sort, but is asymptotically slower. It’s essentially insertion sort with binary search. I will use $a=(a_1,dots,a_n)$ to denote the input and $b_k=(b_{k,1},dots,b_{k,k})$ to denote the result after the $k$-th round:

  1. Set $b_{1,1} = a_1$
  2. If $a_2 < a_1$ then $b_2 = (a_2,a_1)$ and $(c,d):= (2,1)$ else $b_2 = (a_1,a_2)$ and $(c,d):= (1,2)$. In either case $a_d < a_c$ will always be $0$ (i.e. false) for a nonrandom comparator.
  3. To obtain $b_{k}$ for $k geq 3$ obtain $b_{k-1}$ first.
  4. Let $l=lceil log_2 k rceil$ and $k’ = 2^l$, i.e. $k’$ is the least power of $2$ not smaller than $k$.
  5. Let $i_0 = 0$. For every $j in {1,dots,l}$ let $$i_j = begin{cases} i_{j-1} + 2^{l-j} & i_{j-1} + 2^{l-j} > k-1 wedge a_d < a_c i_{j-1} & i_{j-1} + 2^{l-j} > k-1 wedge neg (a_d < a_c) i_{j-1} + 2^{l-j} & i_{j-1} + 2^{l-j} leq k-1 wedge b_{k-1,i_{j-1} + 2^{l-j}} < a_k i_{j-1} & i_{j-1} + 2^{l-j} leq k-1 wedge neg(b_{k-1,i_{j-1} + 2^{l-j}} < a_k) end{cases}$$
  6. If $i_l > k$ repeat (5.) else $b_k=(b_{k-1,1},dots,b_{k-1,i_l -1},a_k,b_{k-1,i_l},dots,b_{k-1,k-1})$
  7. Output $b_n$

Random case: 5 + the if clause of 6 is essentially acceptance-rejection sampling. The rest of the algorithm is a naive shuffle: shuffle the first $k-1$ elements and add the $k$-th element to each position with equal probability. If we used the normal insertion sort, we would get a binomial distribution instead. Note that this algorithm is inefficient in both modes compared to the Fisher-Yates shuffle and merge sort as inserting an element to an arbitrary position is expensive if using an array and binary search needs linear time if using a list. But perhaps a modification of heap sort or tree sort in a similar way could lead to a faster algorithm.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/10656

Leave a Reply