Premium
Self‐adjusting trees in practice for large text collections
Author(s) -
Williams Hugh E.,
Zobel Justin,
Heinz Steffen
Publication year - 2001
Publication title -
software: practice and experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.437
H-Index - 70
eISSN - 1097-024X
pISSN - 0038-0644
DOI - 10.1002/spe.394
Subject(s) - binary search tree , heuristics , computer science , ternary search tree , binary tree , tree (set theory) , hash function , overhead (engineering) , binary number , vocabulary , self balancing binary search tree , theoretical computer science , algorithm , mathematics , tree structure , k ary tree , interval tree , arithmetic , combinatorics , operating system , linguistics , philosophy , computer security
Splay and randomized search trees (RSTs) are self‐balancing binary tree structures with little or no space overhead compared to a standard binary search tree (BST). Both trees are intended for use in applications where node accesses are skewed, for example in gathering the distinct words in a large text collection for index construction. We investigate the efficiency of these trees for such vocabulary accumulation. Surprisingly, unmodified splaying and RSTs are on average around 25% slower than using a standard binary tree. We investigate heuristics to limit splay tree reorganization costs and show their effectiveness in practice. In particular, a periodic rotation scheme improves the speed of splaying by 27%, while other proposed heuristics are less effective. We also report the performance of efficient bit‐wise hashing and red–blacktrees for comparison. Copyright © 2001 John Wiley & Sons, Ltd.