- Take all possible subsets of 0..35, except the empty subset. Each subset denotes which elements we should compare.
- For each subset, project all 24 arrays of our given set onto the indices in the subset, and check whether the 24 projections are pairwise distinct. Subsets for which they are distinct are valid: comparing only at those indices still identifies each array uniquely.
- Among the valid subsets, the one with the fewest elements is our solution.
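The brute force described above can be sketched as follows. This is a minimal illustration, not a practical implementation for 36 indices; enumerating subsets in order of increasing size means the first valid subset found is already a smallest one. The toy arrays below are made up for the example.

```python
from itertools import combinations

def distinguishing_subset(arrays):
    """Brute-force search for a smallest index subset whose projections
    are pairwise distinct across all the given arrays."""
    n = len(arrays[0])
    for size in range(1, n + 1):          # smallest subsets first
        for subset in combinations(range(n), size):
            projections = {tuple(a[i] for i in subset) for a in arrays}
            if len(projections) == len(arrays):   # all arrays distinguished
                return subset
    return tuple(range(n))

# Toy example: 4 short arrays instead of 24 arrays of length 36.
arrays = [
    [1, 2, 3, 4],
    [1, 2, 9, 4],
    [5, 2, 3, 4],
    [5, 2, 9, 7],
]
print(distinguishing_subset(arrays))   # a smallest distinguishing subset
```

For 24 arrays of length 36 this loop is hopeless in the worst case (up to $2^{36}-1$ subsets), which is exactly the motivation for the question below.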
Unfortunately, the number of possible subsets of 0..35, excluding the empty subset, is $2^{36}-1$. I was wondering if there could be another approach to the problem. Many different arrays X will be matched against the given set of 24 arrays, so any preprocessing of the 24 arrays is well worth the effort. User JarkkoL suggested precomputing the CRC32 of each array of the given set and then comparing it with the CRC32 of the array X. Still, I'm wondering if there's a smart way to avoid computing the CRC32 over the whole array. I'm looking for the smallest possible number of elements over which such a check must be computed.
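JarkkoL's CRC32 suggestion can be sketched like this, using Python's standard `zlib.crc32`. The byte packing below assumes the array values fit in a byte; for wider values you would serialize differently. Note that a matching CRC only gives a candidate, so a final full comparison guards against collisions.

```python
import zlib

def crc_of(array):
    """CRC32 of an array's byte representation (assumes values fit in a byte)."""
    return zlib.crc32(bytes(array))

# Preprocessing: done once for the given set of arrays.
given = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
table = {crc_of(a): a for a in given}

def match(x):
    """Look up a candidate by CRC32, then verify to rule out collisions."""
    cand = table.get(crc_of(x))
    return cand if cand == x else None

print(match([4, 5, 6]))   # [4, 5, 6]
print(match([4, 5, 0]))   # None
```

This still reads the whole of X to compute its CRC32, which is precisely what the question hopes to avoid.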
Asked By : egwspiti
Answered By : babou
A faster solution for large values of $n$, assuming $v$ remains small.
Actually, though the worst case $O(n)$ cannot be improved, one can do better than the first solution I proposed above, assuming $v$ remains small. The idea is that one does not need to use a binary tree: with more branching, one can reach the answer even faster.

For that purpose, we need to build a function $\phi$ that maps the $v$ values used into the integer range $[1..v]$. This can be done by hashing or by other means. We can then assume without loss of generality that the values used are the integers in $[1..v]$.

One could find a minimal set $I$ of indices for the arrays, so that for any pair of arrays $A_j$ and $A_k$ there is an index $i \in I$ such that $A_j[i] \neq A_k[i]$. Then, for any array $X$ known to be equal to some array $A_h \in S$, we can identify $A_h$ by looking only at the values at the indices in $I$. One could then build a decision tree based on these indices considered in succession. However, finding such a minimal set of indices may be difficult (I did not look into its complexity), and there may be better solutions that use independent indices for the different nodes on a given level of the decision tree.

So we can probably get an even faster result by proceeding as follows. Choose a first index $i_0$ such that the $p$ different values at that index partition the set $S$ of arrays into $p$ subsets $S_x$ for $x \in [1..p]$. Depending on whether you want to reduce the average cost or the maximum cost, you may try to choose $i_0$ so as to maximize $p$, or to minimize the maximum size of the subsets $S_x$ (though this is a heuristic). These $p$ subsets are the daughters of the root of the decision tree being built. Then you repeat this operation for each subset, finding independently for each subset $S_x$ a new index $i_{x,1}$ so as to partition it further. You repeat the operation until you reach the leaves, which correspond to singleton sets, i.e., a specific array in $S$.
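The greedy construction above can be sketched as follows. This is a minimal illustration under the assumption that all arrays in $S$ are distinct (which guarantees the recursion terminates); the heuristic used is the first one mentioned, maximizing the branching factor $p$ at each node. A node is represented as a pair `(index, branches)` and a leaf as the array itself.

```python
def build_tree(arrays, indices=None):
    """Greedily build a decision tree: at each node, pick the index that
    splits the current set of arrays into the most subsets."""
    if indices is None:
        indices = list(range(len(arrays[0])))
    if len(arrays) == 1:
        return arrays[0]                  # leaf: a single array of S
    # choose the index with the most distinct values (maximize p)
    best = max(indices, key=lambda i: len({a[i] for a in arrays}))
    children = {}
    for a in arrays:
        children.setdefault(a[best], []).append(a)
    rest = [i for i in indices if i != best]
    return (best, {v: build_tree(sub, rest) for v, sub in children.items()})

# Toy example: 4 arrays of length 3.
arrays = [[1, 2, 3], [1, 2, 9], [5, 2, 3], [5, 2, 9]]
tree = build_tree(arrays)
print(tree[0])   # the index chosen at the root
```

Since distinct arrays that agree on all already-used indices must differ at some remaining index, every node with more than one array always splits into at least two subsets.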
You get a widely branching decision tree, where each non-leaf node (corresponding to a subset of arrays) is labeled by an index, each branch is labeled by a value found at that index, and each leaf is labeled by an array in $S$. This tree must be implemented so that each branch can be accessed in one step by direct indexation. Then, given an array $X$, you read its value at the index $i_0$ labeling the root, and use $\phi(X[i_0])$ to access by indexation the next node in the tree. You repeat this step until you reach a leaf. When $v$ is large, the algorithm has to be adapted by using a different function $\phi$ at each node.
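The lookup step can be sketched as follows, assuming the tree is represented as nested `(index, {value: child})` pairs with arrays at the leaves, and values already mapped to small integers so a plain dictionary plays the role of $\phi$ plus direct indexation. The example tree is hand-built over toy arrays.

```python
def lookup(tree, x):
    """Descend the decision tree, branching on x's value at each node's index."""
    while isinstance(tree, tuple):        # internal node: (index, branches)
        index, branches = tree
        tree = branches[x[index]]         # one step by direct indexation
    return tree                           # leaf: the matching array of S

# Hand-built tree: the root tests index 0, its children test index 2.
tree = (0, {1: (2, {3: [1, 2, 3], 9: [1, 2, 9]}),
            5: (2, {3: [5, 2, 3], 9: [5, 2, 9]})})
print(lookup(tree, [1, 2, 9]))   # [1, 2, 9]
```

The cost of a lookup is the depth of the tree, not the length $n$ of the arrays, which is the point of the construction.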
Question Source : http://cs.stackexchange.com/questions/29539