UniIndex: An index and query middleware for parallel file systems
Author(s) - Cheng Peng, Wang Yong, Lu Yutong, Du Yunfei, Chen Zhiguang
Publication year - 2019
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5609
Subject(s) - computer science, search engine indexing, database, file system, data mining, middleware (distributed applications), index (typography), speedup, granularity, set (abstract data type), information retrieval, operating system, world wide web, programming language
Summary - As data analysis workloads become more common on high-performance computing systems, the ability to select a small fraction of data from a large volume of scientific data sets is vital to accelerating scientific discovery. However, parallel file systems cannot efficiently locate data at either file or record granularity. Existing methods for identifying and indexing data are often domain-specific and do not scale to large scientific data sets. In this paper, we describe the design and implementation of the UniIndex framework, which combines our proposed techniques for user-annotation extraction, an in-memory cache layer, in-situ indexing, and parallel query processing. Acting as middleware on top of production file systems, UniIndex enables efficient data locating services with minimal user effort. Our evaluations show that UniIndex can locate target files in directories containing millions of files within microseconds. By applying in-situ indexing and a lightweight range-bitmap index, record-level index building time can be dramatically reduced while still delivering a query speedup of up to two orders of magnitude over scanning the entire data set.
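
The summary does not describe how the range-bitmap index works internally. As a rough illustration of the general idea only, the sketch below builds a generic binned bitmap index: each value range (bin) gets one bitmap marking the records that fall into it, so a range query can combine bitmaps instead of scanning every record. The class name `RangeBitmapIndex`, the bin edges, and the in-memory layout are hypothetical and are not taken from UniIndex.

```python
from bisect import bisect_right


def _set_bits(bits):
    """Yield the positions of the 1-bits in an integer bitmap."""
    row = 0
    while bits:
        if bits & 1:
            yield row
        bits >>= 1
        row += 1


class RangeBitmapIndex:
    """Toy binned bitmap index (illustrative, not the UniIndex design)."""

    def __init__(self, values, edges):
        # edges are sorted bin boundaries; bin i covers [edges[i], edges[i+1]).
        # Assumption: values outside [edges[0], edges[-1]) are not indexed.
        self.values = list(values)
        self.edges = list(edges)
        self.bitmaps = [0] * (len(self.edges) - 1)
        for row, v in enumerate(self.values):
            b = bisect_right(self.edges, v) - 1
            if 0 <= b < len(self.bitmaps):
                self.bitmaps[b] |= 1 << row  # mark this record in its bin

    def query(self, lo, hi):
        """Return sorted record ids whose value satisfies lo <= value < hi."""
        sure, maybe = 0, 0
        for i, bits in enumerate(self.bitmaps):
            b_lo, b_hi = self.edges[i], self.edges[i + 1]
            if b_hi <= lo or b_lo >= hi:
                continue  # bin does not overlap the query range
            if lo <= b_lo and b_hi <= hi:
                sure |= bits  # bin fully inside the range: no value check
            else:
                maybe |= bits  # edge bin: candidates need a value check
        rows = list(_set_bits(sure))
        rows += [r for r in _set_bits(maybe) if lo <= self.values[r] < hi]
        return sorted(rows)
```

With values `[3, 15, 27, 8, 42, 19]` and edges `[0, 10, 20, 30, 40, 50]`, `query(10, 20)` is answered from a single bitmap, while `query(5, 25)` reads the raw values only for the two partially covered edge bins.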
