Open Access

Text representation and classification based on bi-gram alphabet

Author(s) - Fatma El-Ghannam

Publication year - 2019

Publication title - Journal of King Saud University - Computer and Information Sciences

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.617

H-Index - 33

eISSN - 2213-1248

pISSN - 1319-1578

DOI - 10.1016/j.jksuci.2019.01.005

Subject(s) - n-gram, computer science, natural language processing, feature vector, artificial intelligence, alphabet, feature (linguistics), vector space model, support vector machine, Arabic, information retrieval, language model, linguistics

In text classification, texts must be transformed into numeric representations suitable for learning algorithms. The main problems with the commonly used bag-of-words method are the high dimensionality of the vector space and the need for language-dependent tools. In the present study, text classification is performed using a novel bi-gram alphabet approach to construct feature terms. The proposed approach makes two main contributions to the text classification area. First, it demonstrates that constant feature terms based on the standard alphabet can be used without relying on the documents' vocabularies, which reduces the dimensionality of the vector space for large corpora. Second, it does not require natural language processing tools. The study shows that the approach can successfully classify collections of Arabic or English text documents. On the Arabic Aljazeera News dataset, it achieved approximately 80% savings in vector space and a 2% performance improvement over the best previously reported results.
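The abstract does not spell out how the feature terms are built or weighted, so the Python sketch below is only one plausible reading of the bi-gram alphabet idea: every ordered pair of letters from a fixed alphabet is a feature term, so every document vector has the same length regardless of the corpus vocabulary, and no tokenizer, stemmer, or other NLP tool is needed. The 26-letter English alphabet, the rule that bi-grams never cross non-letter characters, the relative-frequency weighting, and the helper names `bigram_vector`, `train`, and `classify` are all assumptions of this sketch. The nearest-centroid classifier with cosine similarity is swapped in purely for illustration and is not claimed to be the classifier used in the study.

```python
import math
import string
from collections import Counter
from itertools import product

# Assumption: a fixed lowercase English alphabet. An Arabic variant would
# substitute the Arabic letters here.
ALPHABET = string.ascii_lowercase

# Every ordered pair of letters is a feature term, so the feature space is
# constant (26 * 26 = 676 dimensions) and independent of the corpus vocabulary.
FEATURES = ["".join(pair) for pair in product(ALPHABET, repeat=2)]
INDEX = {bigram: i for i, bigram in enumerate(FEATURES)}


def bigram_vector(text: str) -> list[float]:
    """Map a document to relative frequencies of alphabet bi-grams.

    Assumption: bi-grams are taken only inside runs of alphabet letters,
    so pairs never span spaces, digits, or punctuation.
    """
    counts = Counter()
    run = []
    for ch in text.lower() + " ":  # trailing space flushes the last run
        if ch in ALPHABET:
            run.append(ch)
        else:
            for a, b in zip(run, run[1:]):
                counts[a + b] += 1
            run = []
    vec = [0.0] * len(FEATURES)
    total = sum(counts.values())
    if total:
        for bigram, c in counts.items():
            vec[INDEX[bigram]] = c / total
    return vec


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def train(docs_by_label: dict[str, list[str]]) -> dict[str, list[float]]:
    """Hypothetical nearest-centroid model: one mean vector per label."""
    model = {}
    for label, docs in docs_by_label.items():
        vectors = [bigram_vector(d) for d in docs]
        model[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return model


def classify(model: dict[str, list[float]], text: str) -> str:
    """Return the label whose centroid is most similar to the document."""
    v = bigram_vector(text)
    return max(model, key=lambda label: cosine(v, model[label]))


if __name__ == "__main__":
    model = train({
        "sport": ["the team won the football match", "the striker scored a late goal"],
        "economy": ["the bank raised interest rates", "inflation and market prices rose"],
    })
    print(len(FEATURES))  # 676
    print(classify(model, "the goalkeeper saved the match"))
```

With 26 letters the space is fixed at 676 dimensions, and a basic 28-letter Arabic alphabet would give 28 × 28 = 784. Whether this is smaller than a bag-of-words space depends on the corpus vocabulary; the 80% saving reported in the abstract is the study's own result and is not something this sketch reproduces.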
