Asked By : texasflood
Answered By : Sumik
- low computational cost
- low probability of generating the same hash value for two input vectors = evenly distributed hashes
If you language/library have some function for hashing float array, it should be ok. For example, this function is what Java would do with an float array (it’s not a direct copy of Java library):
int hashFloatArray(float[] arr) { //this is done in Arrays.hashCode() int h = 1; for (int i = 0; i < arr.length; i++){ int floatAsInt = Float.floatToIntBits(arr[i]); //in C: int floatAsInt = *(int*)(&arr[i]); h = 31 * h + floatAsInt; } //this is done in HashMap h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); }
Finding matching hashes Storing hashes instead of whole float arrays reduces amount of memory needed. Space of hashes is smaller than the space of float arrays, so matching hashes never give 100% certainty that original objects (float arrays) where identical. The longer hashes the smaller is the change of hash collision (situation when two different inputs produce the same hash). Array indexed with hashes (as OP proposed) Indexing array with hashes means that the length of the array must be 2^k where k is length of key in bits. For example: Assuming single field of array occupies 1 byte (boolean variable usually occupy 1 byte). Using 30 bit key requires 2^30 * 1 bytes = 1GB of memory (O(2^k) space complexity). As mentioned before:
- shorter key = smaller array = higher chance of hash collision
- longer key = bigger array = lower chance of hash collision
Hash table (Hash set) Hash table is a data structure that can be used to implement hash set. Main advantage of hash set is it’s O(1) average time complexity combined with O(n) space complexity. You can use hash set to store and effectively find:
- long hashes (64 bit?) of float arrays without wasting memory
- float arrays themselves (if you have enough memory)
Hash sets are available in standard libraries of almost all programming languages.
- java.util.HashSet in Java
- unordered_set in C++11
- set in Python
Why comparing and hashing floats is a problem? Let’s take two float numbers: 1.2345678 and 1.2345679 The difference between them is small enough to have no influence on the quality of your Robot, but comparing 1.2345678 and 1.2345679 results in “not equal” and hashing them results in two different hashes. Introducing an “almost equal” term may be a good idea, but that can’t be done with hashes.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/37952