
Full Text Search and Indexing in Languages With Two Alphabets
Author(s) - Tijana Talić
Publication year - 2014
Publication title - Journal of Information Technology and Applications
Language(s) - English
Resource type - Journals
eISSN - 2233-0194
pISSN - 2232-9625
DOI - 10.7251/jit1401041t
Subject(s) - search engine indexing , affix , computer science , information retrieval , natural language processing , index (typography) , vocabulary , artificial intelligence , world wide web , linguistics , philosophy
The languages spoken in Bosnia and Herzegovina use the Cyrillic and Latin alphabets equally, which poses an additional problem for indexing and full-text search. In this paper, we analyze this problem and, using the tools available in PostgreSQL together with ispell dictionaries, develop a solution. As part of the solution, we created a stop-word dictionary, adapted the affix file to both alphabets, and built working dictionaries for indexing and searching from a word list. The result is a full-text search configuration suitable for indexing texts written in either alphabet.
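The abstract does not spell out how the stop-word list was made to cover both alphabets, so the following is only a hedged illustration, not the author's procedure. It assumes the standard one-to-one letter correspondence between the Latin and Cyrillic alphabets used for Bosnian and Serbian (the Latin digraphs lj, nj and dž each map to a single Cyrillic letter). The helper names `cyr_to_lat`, `lat_to_cyr` and `both_alphabets` are hypothetical. The output could then be written one word per line, UTF-8 encoded, to a `.stop` file referenced by the `StopWords` parameter of a PostgreSQL ispell dictionary.

```python
"""Hypothetical sketch: expand a stop-word list so it covers both alphabets.

Assumption: the standard one-to-one Latin <-> Cyrillic letter mapping used
for Bosnian/Serbian. This is an illustration, not the method from the paper.
"""

# Lowercase Cyrillic letter -> Latin letter or digraph.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

# Reverse mapping; multi-character Latin keys (lj, nj, dž) are tried first.
LAT_TO_CYR = {lat: cyr for cyr, lat in CYR_TO_LAT.items()}
_DIGRAPHS = sorted((k for k in LAT_TO_CYR if len(k) > 1), key=len, reverse=True)


def cyr_to_lat(text: str) -> str:
    """Transliterate lowercase Cyrillic to Latin; other characters pass through."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)


def lat_to_cyr(text: str) -> str:
    """Transliterate lowercase Latin to Cyrillic, preferring digraphs.

    Caveat: a few words (e.g. "injekcija") contain an n+j sequence that is
    not the letter nj; a real word list would need an exception table.
    """
    out = []
    i = 0
    while i < len(text):
        for dg in _DIGRAPHS:
            if text.startswith(dg, i):
                out.append(LAT_TO_CYR[dg])
                i += len(dg)
                break
        else:
            # No digraph matched: map a single character (or keep it as-is).
            out.append(LAT_TO_CYR.get(text[i], text[i]))
            i += 1
    return "".join(out)


def both_alphabets(words):
    """Return a sorted list containing every word in both alphabets."""
    result = set()
    for w in words:
        w = w.strip().lower()
        if not w:
            continue
        lat = cyr_to_lat(w)  # normalize Cyrillic input to Latin first
        result.add(lat)
        result.add(lat_to_cyr(lat))
    return sorted(result)
```

For example, `both_alphabets(["i", "ali", "njih"])` returns `["ali", "i", "njih", "али", "и", "њих"]`, since Python sorts by code point and Latin letters precede Cyrillic ones. Going through Latin as the pivot is a design choice: Cyrillic-to-Latin is unambiguous, while the reverse direction is where the digraph caveat above applies.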