Open Access
BnVec: Towards the Development of Word Embedding for Bangla Language Processing
Author(s) -
Md. Kowsher,
Md. Jashim Uddin,
Anik Tahabilder,
Nusrat Jahan Prottasha,
Mahid Ahmed,
Kazi Masudul Alam,
Tamanna Sultana
Publication year - 2021
Publication title -
International Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
ISSN - 2227-524X
DOI - 10.14419/ijet.v10i2.31538
Subject(s) - bengali , word2vec , computer science , word embedding , artificial intelligence , natural language processing , word (group theory) , embedding , mathematics , geometry
Progress in machine learning and statistical inference is driving advances in domains such as computer vision, natural language processing (NLP), and automation and robotics. Among the many improvements in NLP, word embedding is one of the most widely used and influential techniques. In this paper, we present BnVec, an open-source library for Bangla word extraction systems that aims to provide the Bangla NLP research community with several powerful word embedding techniques. BnVec is split into two parts. The first is a class designed for Bangla that embeds words and gives access to the six most popular word embedding schemes (CountVectorizer, TF-IDF, Hash Vectorizer, Word2vec, fastText, and GloVe). The second is based on pre-trained distributed word embedding models of Word2vec, fastText, and GloVe. The pre-trained models were built from content collected from newspapers, social media, and Bangla wiki articles. The total number of tokens used to build the models exceeds 395,289,960. The paper also describes the performance of these models under various hyper-parameter settings and analyzes the results.
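
To make the three frequency-based schemes named in the abstract (count vectors, TF-IDF, and hashed vectors) concrete, the following is a minimal sketch in plain Python. It does not use or reproduce the BnVec API, which is not shown here. The toy corpus, the whitespace tokenizer, the smoothed IDF formula, the CRC32 hash, and all function names are assumptions chosen only for illustration.

```python
import math
import zlib
from collections import Counter

# Toy Bangla corpus (illustrative only, not taken from the BnVec training data).
corpus = [
    "আমি বাংলা ভালোবাসি",
    "আমি বাংলা ভাষা শিখি",
    "ভাষা শিখি",
]


def tokenize(text):
    # Assumption: whitespace splitting is enough for this toy example.
    return text.split()


docs = [tokenize(d) for d in corpus]
vocab = sorted({tok for doc in docs for tok in doc})


def count_vector(doc):
    # One dimension per vocabulary word, holding its raw frequency.
    counts = Counter(doc)
    return [counts[w] for w in vocab]


def tfidf_vector(doc):
    # Term frequency times a smoothed inverse document frequency
    # (one common variant; no length normalization is applied).
    counts = Counter(doc)
    n_docs = len(docs)
    vec = []
    for w in vocab:
        df = sum(1 for d in docs if w in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        vec.append(counts[w] * idf)
    return vec


def hash_vector(doc, n_features=8):
    # Hashing trick: map each token to a fixed-size index without a vocabulary.
    # CRC32 is used because it is deterministic across runs; collisions are possible.
    vec = [0] * n_features
    for tok in doc:
        idx = zlib.crc32(tok.encode("utf-8")) % n_features
        vec[idx] += 1
    return vec


count_vecs = [count_vector(d) for d in docs]
tfidf_vecs = [tfidf_vector(d) for d in docs]
hash_vecs = [hash_vector(d) for d in docs]
```

In this sketch, a word that occurs in fewer documents receives a larger TF-IDF weight than an equally frequent word that occurs in more documents, while the hashed vectors keep a fixed length regardless of vocabulary size. Distributed schemes such as Word2vec, fastText, and GloVe instead learn dense vectors from context and are not covered by this example.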
