Problem Detail: Given lists $A_1, A_2, dots, A_n$ of non-negative numbers, I want to find the $k$ smallest elements of the Cartesian product $A_1 times A_2 times dots times A_n$ ordered by the value $x_1 + x_2 + dots + x_n$, where $x_iin A_i$. I can’t simply create all combinations in memory and sort them because of large amount of data. For example, I have 1000 drinks, 1000 main courses and 1000 desserts. I want to find the 100 cheapest meals but I can’t store all 1,000,000,000 elements in memory. Additional details
- I have Drinks, Main Courses and Desserts already sorted.
- I’ve found some solutions like Efficient sorted cartesian product of 2 sorted array of integers but only for 2 lists. For greater amount of sets it still memory-consuming.
- Lazy generating next “sorted” value will be useful.
- If no efficient exact algorithm, it could be heuristic or approximation algorithm.
- It’s similar to X + Y sorting problem , efficient faster than O(n^2 * log n) solution is unknown
Question: Are there an efficient ways to get only $k$ cheapest(*) elements of Cartesian product?
(*) Price of the result element is sum of all numbers in that element.
Asked By : Dmitry Yaremenko
Answered By : Klaus Draeger
If $k$ is known in advance, we can do the following. I will assume that each list $A_j$ is sorted according to cost. What we will do is a weighted form of breadth-first exploration of index tuples $(i_1,dots,i_n)$, storing the current candidates in a (bounded length) priority queue $Q$ according to their cost $A_1[i_1]+dots+A_n[i_n]$. Initially $Q$ contains just the initial tuple $(0,dots,0)$. In each iteration, pop the first tuple $t=(i_1,dots,i_n)$ off $Q$; this is a cheapest previously unexplored solution. Then generate its successors of the form $t^{(j)}=(i_1,dots,i_j+1,dots,i_n)$, compute their cost, and insert them into $Q$ accordingly. We can avoid creating duplicates by only creating the successors $t^{(j)}$ up to the first index $j$ for which $i_j>0$; e.g. for $t=(0,1,0,2)$ we would only create $(1,1,0,2)$ and $(0,2,0,2)$ (the other tuples $(0,1,1,2)$ and $(0,1,0,3)$ instead get created as successors of $(0,0,1,2)$ and $(0,0,0,3)$, respectively). This eliminates the need for a closed list of explored tuples. We still need to store the priority queue $Q$, but it can be bounded to length $k-j$, where $j$ is the current iteration, with any entries beyond that discarded. For example, suppose we have $A_1=(1,5,7),A_2=(1,2,5),A_3=(1,3,4)$, and $k=4$.
- Initially, $Q=[(0,0,0)]$.
- Pop $(0,0,0)$, with cost $3$, and push $(1,0,0),(0,1,0),(0,0,1)$. Their costs are $7,4,5$, respectively, and $Q$ is now $[(0,1,0),(0,0,1),(1,0,0)]$.
- Pop $(0,1,0)$ (cost $4$), and push $(1,1,0)$ (cost $8$) and $(0,2,0)$ (cost $7$). $Q$ is now $[(0,0,1),(1,0,0),(0,2,0),(1,1,0)]$; since we only need two more solutions, we can drop the last two, getting $[(0,0,1),(1,0,0)]$.
- Pop $(0,0,1)$ (cost $5$), and push $(1,0,1)$ (cost $9$), $(0,1,1)$ (cost $6$), and $(0,0,2)$ (cost $6$). $Q$ is now $[(0,1,1),(0,0,2),(1,0,0),(1,0,1)]$; again, we can drop unneeded entries since we only want one more, getting $[(0,1,1)]$.
- Pop $(0,1,1)$; done.
To see that this method is correct, we just need to prove the following:
- Given the above restrictions, any index tuple $i=(i_1,dots,i_k)$ (with $0le i_jle size(A_j)$) is reachable along the unique path $(0,dots,0)todotsto(0,dots,0,i_k)todotsto(0,dots,0,i_{k-1},i_k)todotsto i$.
- The cost of the tuples along that path is nondecreasing, since the weights are nonnegative.
- If there is another tuple $i’$ with a higher cost, and $i’$ is popped off $Q$ during the algorithm, then so are all tuples along the path to $i$ (using induction).
- If a tuple $i’$ is discarded due to the length restriction, then there are at least $k$ tuples whose cost is no greater than that of $i’$.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/46910 Ask a Question Download Related Notes/Documents