Open Access
AN IMPROVED HAUSA WORD STEMMING ALGORITHM
Author(s) - Sirajo Musa, G. N. Obunadike, Muhammad Muntasir Yakubu
Publication year - 2022
Publication title - FUDMA Journal of Sciences
Language(s) - English
Resource type - Journals
ISSN - 2616-1370
DOI - 10.33003/fjs-2022-0601-899
Subject(s) - computer science , word (group theory) , natural language processing , affix , hausa , spelling , lemmatisation , factor (programming language) , search engine indexing , information retrieval , population , artificial intelligence , linguistics , algorithm , programming language , philosophy , demography , sociology
The explosion of scientific publications in different domains, coupled with the introduction and widespread adoption of the internet over the last few decades, has made information more available than ever before. Consequently, digital storage capacity has been doubling consistently to keep pace with this geometric increase in information. In view of this, Information Retrieval (IR), nowadays considered the dominant form of information access, has become even more critical. However, problems with using free text in indexing and retrieval, arising from spelling mistakes, alternative spellings, affixes and abbreviations, continue to bedevil the field of IR. To mitigate these problems, stemming algorithms were introduced in the 1960s. Stemming is an automated process of stripping inflectional affixes from all derived forms of a word in order to obtain the word's stem. Because stemming is language specific, stemming algorithms have been designed specifically for most of the major languages of the world. With a speaker population of about 150 million, the Hausa language stands in need of a better stemming algorithm. This research is an attempt to improve upon the existing Hausa word stemming algorithm. The affix stripping method of conflation, with reference lookup, was used. Evaluated using Sirsat’s method, the algorithm achieved a Correctly Stemmed Word Factor (CSWF) of 96.9%, an Index Compression Factor of 74.76%, a Words Stemmed Factor (WSF) of 70.44% and an Average Word Conflation Factor of 59.47%.
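To make the idea of affix stripping with reference lookup concrete, the following is a minimal Python sketch and not the algorithm reported in this paper. The suffix list, the lookup table, the minimum stem length and all function names are illustrative assumptions. The two helper metrics follow commonly used definitions (share of unique words removed by stemming, and share of words changed by stemming), which may differ in detail from those in Sirsat's method.

```python
# Illustrative only: the affixes, lexicon entries and threshold below are
# assumptions for this sketch, not the rule set or reference list of the paper.
SUFFIXES = ("n", "r")   # genitive linkers, e.g. gidan -> gida, motar -> mota
LOOKUP = {              # reference lookup consulted before any rule is applied
    "gidaje": "gida",   # plural of gida (house), not reachable by the rules
    "motoci": "mota",   # plural of mota (car)
    "idan": "idan",     # "if": ends in -n but carries no suffix
}
MIN_STEM_LEN = 3        # hypothetical guard against over-stripping short words


def stem(word: str) -> str:
    """Return a stem: reference lookup first, then suffix stripping."""
    w = word.lower()
    if w in LOOKUP:
        return LOOKUP[w]
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM_LEN:
            return w[: -len(suffix)]
    return w


def index_compression_factor(words) -> float:
    """Percentage reduction in unique index terms after stemming (assumed definition)."""
    unique_words = {w.lower() for w in words}
    if not unique_words:
        return 0.0
    unique_stems = {stem(w) for w in unique_words}
    return 100.0 * (len(unique_words) - len(unique_stems)) / len(unique_words)


def words_stemmed_factor(words) -> float:
    """Percentage of words whose form was changed by stemming (assumed definition)."""
    words = list(words)
    if not words:
        return 0.0
    changed = sum(1 for w in words if stem(w) != w.lower())
    return 100.0 * changed / len(words)


if __name__ == "__main__":
    sample = ["gida", "gidan", "gidaje", "mota", "motar", "motoci", "idan"]
    for w in sample:
        print(w, "->", stem(w))
    print("ICF:", round(index_compression_factor(sample), 2))
    print("WSF:", round(words_stemmed_factor(sample), 2))
```

Consulting the lookup table before the rules lets it serve two purposes in this sketch: it maps irregular forms to their stems, and it protects words that merely look inflected from being over-stemmed.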