[Solved]: Algorithm to partially sort list into equal-sized buckets

Problem Detail: Suppose I have a large list of numbers that I want to divide into equal-sized buckets so that every bucket contains only larger numbers than buckets to its left. Numbers within each bucket don’t need to be sorted. For example, for the input

[64, 72, 57, 47, 5, 4, 64, 21, 64, 65, 36, 43, 81, 44, 19, 87, 17, 86,   73, 21, 19, 64, 94, 91, 34, 49, 8, 52, 18, 37]

that I want to divide into 5 buckets, some valid outputs would be

[[4, 5, 8, 17, 18], [19, 19, 21, 21, 34], [36, 37, 43, 44, 47],   [49, 52, 57, 64, 64], [64, 64, 65, 72, 73]]  [[8, 17, 18, 5, 4], [21, 19, 21, 34, 19], [43, 37, 44, 36, 47],   [52, 57, 49, 64, 64], [64, 64, 72, 73, 65]]  [[8, 18, 5, 4, 17], [19, 19, 21, 21, 34], [47, 36, 44, 43, 37],   [52, 64, 49, 64, 57], [64, 72, 64, 65, 73]]  [[18, 8, 17, 5, 4], [21, 19, 21, 19, 34], [36, 44, 43, 37, 47],   [57, 64, 64, 49, 52], [64, 72, 64, 65, 73]]

One approach to do this would be to sort the list, and then divide it at equal intervals. Can this be done faster? I would also be happy with a result where the buckets are of only roughly equal size, but note that I can’t assume a uniform distribution for the numbers, so bucket sort doesn’t help.

Asked By : mrueg

Answered By : Raphael

Assuming you want to create $k$ buckets, to the following.

Obtain elements of rank $lceil n/k rceil, 2 cdot lceil n/k rceil dots, (k-1) cdot lceil n/k rceil$.
Perform multi-way partitioning (cf. Quicksort) with these elements as pivot.

Both steps take time $Theta(kn)$. All buckets are within $pm 1$ of the same size. In the case of duplicate elements, you can “uniquify” them in a $Theta(n)$ preprocessing run; otherwise bucket sizes might differ more. Essentially, this is performing only the top-level call of a $k$-way Quicksort with perfect pivots. All variants regarding pivot selection and partitioning apply and will lead to different runtime and bucket size characteristics.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/28951 Ask a Question Download Related Notes/Documents