UniIndex: An index and query middleware for parallel file systems

Cheng Peng, Wang Yong, Lu Yutong, Du Yunfei, Chen Zhiguang



Author(s) - Cheng Peng, Wang Yong, Lu Yutong, Du Yunfei, Chen Zhiguang

Publication year - 2019

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.5609

Subject(s) - computer science, search engine indexing, database, file system, data mining, middleware (distributed applications), index (typography), speedup, granularity, set (abstract data type), information retrieval, operating system, world wide web, programming language

Summary - As data analysis scenarios keep increasing on high‐performance computing systems, the ability to select a small fraction of data from a large volume of scientific data sets is vital to accelerating scientific discovery. However, parallel file systems cannot provide efficient data‐locating services at the granularity of both a file and a record. Existing methods for identifying and indexing data are often domain‐specific and do not scale to large scientific data sets. In this paper, we describe the design and implementation of the UniIndex framework, which combines our proposed techniques for user‐annotation extraction, an in‐memory cache layer, in‐situ indexing, and parallel query processing. Acting as middleware on top of production file systems, UniIndex enables efficient data‐locating services with minimal user effort. Our evaluations show that UniIndex can locate target files in directories containing millions of files within microseconds. By applying in‐situ indexing and the lightweight range‐bitmap index, record‐level index building time can be dramatically reduced while maintaining a query speedup of up to two orders of magnitude over scanning the entire data set.
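The summary does not spell out how the range‐bitmap index is laid out, so the following is only a minimal sketch of the general binned‐bitmap idea, not the UniIndex implementation. It assumes each bitmap covers one value range and marks the records that fall inside it; the class name `RangeBitmapIndex`, the helper `_set_bits`, and the bin boundaries are all hypothetical. A range query ORs the bitmaps of fully covered bins and checks the raw values only for bins that straddle a query edge, which is why it can avoid scanning every record.

```python
import bisect


def _set_bits(bitmap):
    """Yield the positions of the 1-bits in an integer bitmap, lowest first."""
    row = 0
    while bitmap:
        if bitmap & 1:
            yield row
        bitmap >>= 1
        row += 1


class RangeBitmapIndex:
    """Hypothetical binned bitmap index: one bitmap (a Python int) per value range."""

    def __init__(self, values, boundaries):
        # boundaries are sorted bin edges; bin i covers [boundaries[i], boundaries[i + 1])
        self.values = values
        self.boundaries = boundaries
        self.bitmaps = [0] * (len(boundaries) - 1)
        for row, v in enumerate(values):
            if not (boundaries[0] <= v < boundaries[-1]):
                raise ValueError(f"value {v!r} lies outside the indexed range")
            b = bisect.bisect_right(boundaries, v) - 1
            self.bitmaps[b] |= 1 << row  # mark this record in its bin

    def query(self, lo, hi):
        """Return the record ids whose value v satisfies lo <= v < hi."""
        hits = 0
        for i, bitmap in enumerate(self.bitmaps):
            b_lo, b_hi = self.boundaries[i], self.boundaries[i + 1]
            if b_hi <= lo or b_lo >= hi:
                continue  # bin is disjoint from the query range
            if lo <= b_lo and b_hi <= hi:
                hits |= bitmap  # bin fully covered: no raw data access needed
            else:
                # bin straddles a query edge: verify its candidates against raw values
                for row in _set_bits(bitmap):
                    if lo <= self.values[row] < hi:
                        hits |= 1 << row
        return list(_set_bits(hits))


# Usage with made-up data: six records, three bins of width 10.
index = RangeBitmapIndex([3.5, 12.0, 7.2, 25.1, 18.9, 0.4], [0, 10, 20, 30])
print(index.query(5, 20))  # -> [1, 2, 4]
```

In this sketch, coarser bins keep the index small and cheap to build at the cost of more candidate checks at the query edges; how UniIndex itself balances that trade‐off is not stated in the summary.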
