balanced-tree: worst(19) average(18.84) edlin-tree: worst(47) average(22.72) edlin-tree(SHA2): worst(46) average(23.25) regular BST: worst(468685) average(234342.50)
So it’s not going to replace your language library algorithms but, as stated in the accepted answer, it is obviously logarithmic. As for the hash function(Which for reference is add+xorshift as in:
do{h+=*s;h^=h<<left1;h^=h>>right;h^=h<<left2;}while(*s);
), I have also tested how bad it is by comparing collisions to the expected value.
8 bits: expected : 468428.99 axs : 468429 SHA2-256 : 468429 16 bits: expected : 403200.35 axs : 403189 SHA2-256 : 403198 24 bits: expected : 6485.99 axs : 6451 SHA2-256 : 6433 31 bits: expected : 51.14 axs : 59 SHA2-256 : 47 32 bits: expected : 25.57 axs : 30 SHA2-256 : 20 33 bits: expected : 12.79 axs : 14 SHA2-256 : 11 35 bits: expected : 3.19 axs : 7 SHA2-256 : 2 36 bits: expected : 1.59 axs : 1 SHA2-256 : 2 37 bits: expected : 0.80 axs : 0 SHA2-256 : 0
So, while it appears to lag for some bitnesses, it generally follows the expected values, both for tree depth and collisions. What would a hash which is not good for your input mean? That your input generates a sequence when passed to it? That would be ridiculously unlikely statistically speaking. Worst case is O(n) but you are not going to see it. The algorithm is not good compared to AVL or red-black, even though insertion will probably be faster generally, however it beats a linked list or betting on input for a BST any day.
Asked By : Edlin
Answered By : Edlin
Reed, Bruce (2003), "The height of a random binary search tree", Journal of the ACM 50 (3): 306–332, doi:10.1145/765568.765571.
The author shows that the expected height of a random binary tree is $E(H_n ) = alpha log_e n − betalog_e log_e n + O(1)$, where $α = 4.311ldots$ and $β = 1.953ldots$. The magic number represents the upper bound. The expected height happens to be $approxlog_{frac{4}{3}} n$, which is somewhat intuitive when you consider that each parent node represents a random pivot partitioning the node search space like in random quicksort. Finally, the average access time for any given node is actually $approxlog_{frac{4}{3}} sqrt{n}$(I can’t find a reference for this). That makes using hash as the key $approx1.205log_2n$ or 20% slower on average than a self-balancing binary tree.* Similar structures with various trade-offs include the hash-as-priority treap, hash+b-tree, and hash+suffix tree. The suffix tree is only a bit less lazy than the proposed structure and provides expected $log_2n$ access time. [*] Many balanced tree algorithms also have a worst-case height upper bound of $log_phi n$
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/28194 Ask a Question Download Related Notes/Documents