[Solved]: String Comparison vs. Hashing

Problem Detail: I recently learned about the rolling hash data structure, and basically one of its prime uses to searching for a substring within a string. Here are some advantages that I noticed:

Comparing two strings can be expensive so this should be avoided if possible
Hashing the strings and comparing the hashes is generally much faster than comparing strings, however rehashing the new substring each time traditionally takes linear time
A rolling hash is able to rehash the new substring in constant time, making it much quicker and more efficient for this task

I went ahead and implemented a rolling hash in JavaScript and began to analyze the speed between a rolling hash, traditional rehashing, and just comparing the substrings against each other. In my findings, the larger the substring, the longer it took for the traditional rehashing approach to run (as expected) where the rolling hash ran incredibly fast (as expected). However, comparing the substrings together ran much faster than the rolling hash. How could this be? For the sake of perspective, let’s say the running times for the functions searching through a ~2.4 million character string for a 100 character substring were the following:

Rolling Hash – 0.691 seconds
Traditional Rehashing – 71.009 seconds
Just comparing the strings (no hashing) 0.100 seconds

How could the string comparing be so much faster than the rolling hash? Could it just have something to do with the particular language I tested this in? Strings are a primitive type in JavaScript; would this cause string comparisons to run in constant time?

Asked By : Nick Zuber

Answered By : D.W.

One possible hypothesis: Javascript is interpreted, which can slow things down. When you compare two strings, this will invoke a string comparison routine that is coded directly in assembly language or C, and thus string comparison will probably be about as fast as the architecture can possibly compare strings. In contrast, when you hash strings, you have to execute multiple instructions in Javascript, which might incur overhead from an interpreter. Another possible hypothesis: Numbers in Javascript are a floating-point type. There is no integer type in Javascript. Your rolling hash probably involves what looks like integer arithmetic. However, under the covers this is probably getting interpreted as operations on double-precision floats, which can be a bit slower. This might also make your hash-based schemes slower. Some Javascript engines might optimize this to integer arithmetic in certain circumstances, but no guarantees about whether any particular Javascript engine will or won’t be able to do that in any particular case.

Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/51933

Asked By : Nick Zuber

Answered By : D.W.

Best Answer from StackOverflow

Related