
Keyword-Driven Suffix Arrays for On-Line Keyword Searching from Documents in Chinese
Author(s) - Yanhua Zhang
Publication year - 2012
Publication title - International Journal of Artificial Intelligence and Applications
Language(s) - English
Resource type - Journals
eISSN - 0976-2191
pISSN - 0975-900X
DOI - 10.5121/ijaia.2012.3503
Subject(s) - computer science, suffix, keyword search, keyword spotting, information retrieval, natural language processing, artificial intelligence, line (geometry), linguistics, mathematics, philosophy, geometry
On-line keyword searching over documents in Chinese has mainly relied on inverted indexing, which has its own difficulties. The suffix array is widely used for processing text in Western languages, but it has not been widely adopted for Chinese because of the particular characteristics of the language. Although the suffix array is a powerful tool, it requires too much space, and this is its major bottleneck. This paper proposes a data structure called the Keyword-driven Suffix Array for on-line keyword searching over documents in Chinese, based on observations of on-line search patterns and of the traits of Chinese. The data structure substantially improves space efficiency: when the document database is large enough, space efficiency improves by about 5/6 without sacrificing time efficiency.
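For context, the sketch below shows an ordinary suffix array (not the paper's Keyword-driven Suffix Array, whose details are not given here) applied to Chinese text. It illustrates the space cost the abstract refers to: a plain suffix array stores one integer position for every character of the text. A keyword is then located by binary search, because all suffixes that begin with the keyword form a contiguous block in sorted order. The function names `build_suffix_array` and `find_occurrences` are hypothetical, and the naive construction is chosen for clarity rather than speed.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    # Naive construction: sort every suffix start position by the suffix text.
    # The result holds one integer per character, which is the space bottleneck.
    return sorted(range(len(text)), key=lambda i: text[i:])


def _bound(text: str, sa: List[int], keyword: str, inclusive: bool) -> int:
    # Binary search over the length-m prefixes of the sorted suffixes.
    # inclusive=False finds the first prefix >= keyword (lower bound);
    # inclusive=True finds the first prefix > keyword (upper bound).
    m = len(keyword)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + m]
        if prefix < keyword or (inclusive and prefix == keyword):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_occurrences(text: str, sa: List[int], keyword: str) -> List[int]:
    # Suffixes starting with the keyword occupy sa[lo:hi]; return their positions.
    lo = _bound(text, sa, keyword, inclusive=False)
    hi = _bound(text, sa, keyword, inclusive=True)
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    text = "数据结构与数据库"
    sa = build_suffix_array(text)
    print(len(sa) == len(text))                 # True: one entry per character
    print(find_occurrences(text, sa, "数据"))    # [0, 5]
    print(find_occurrences(text, sa, "数据库"))  # [5]
```

Because Chinese text has no spaces between words, every character position is a potential starting point for a keyword, so the plain index grows with the full character count of the document collection.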