Open Access
Text data-augmentation using Text Similarity with Manhattan Siamese long short-term memory for Thai language
Author(s) - Thananya Phreeraphattanakarn, Boonserm Kijsirikul
Publication year - 2021
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1780/1/012018
Subject(s) - computer science , leverage (statistics) , artificial intelligence , similarity (geometry) , natural language processing , long short term memory , training set , semantic similarity , data set , term (time) , set (abstract data type) , language model , artificial neural network , information retrieval , machine learning , recurrent neural network , image (mathematics) , physics , quantum mechanics , programming language
In this paper, we address the problem of training neural networks on small text datasets. Data augmentation is widely used with image and sound datasets to improve model performance, and we adapt this technique to expand the training set of textual data. A major challenge in our dataset is that the amount of data is insufficient for training models. For this reason, we propose a text augmentation method designed specifically for the Thai language. The method is based on text similarity and uses a model to determine the semantic relationship between two sentences. The experimental results indicate that the proposed method improves the performance of text classification.
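The abstract does not spell out how the similarity model feeds the augmentation step, so the following is only a minimal sketch under stated assumptions. It uses the Manhattan similarity usually associated with Siamese LSTM models, exp(-||a - b||_1), applied to sentence vectors that are assumed to come from an already trained encoder. The `augment` function, its `threshold` parameter, and the rule of copying the label of the most similar labeled sentence are hypothetical illustrations, not the authors' procedure.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def manhattan_similarity(a: Vector, b: Vector) -> float:
    """Return exp(-L1 distance): 1.0 for identical vectors, near 0 for distant ones."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.exp(-sum(abs(x - y) for x, y in zip(a, b)))


def augment(
    labeled: List[Tuple[Vector, str]],
    candidates: List[Vector],
    threshold: float = 0.5,
) -> List[Tuple[Vector, str]]:
    """Hypothetical rule (an assumption, not taken from the paper):
    give each unlabeled candidate the label of its most similar labeled
    sentence, but only when that similarity reaches the threshold."""
    augmented = []
    for cand in candidates:
        best_label, best_sim = None, 0.0
        for vec, label in labeled:
            sim = manhattan_similarity(cand, vec)
            if sim > best_sim:
                best_label, best_sim = label, sim
        if best_label is not None and best_sim >= threshold:
            augmented.append((cand, best_label))
    return augmented


# Toy vectors standing in for encoder outputs (made up for illustration).
labeled_examples = [([0.0, 0.0], "a"), ([1.0, 1.0], "b")]
unlabeled = [[0.1, 0.0], [5.0, 5.0]]
new_examples = augment(labeled_examples, unlabeled)
```

In this toy run the first candidate lies close to the example labeled "a" and is kept, while the second is far from both labeled examples and is discarded. A real pipeline would replace the toy vectors with the final hidden states of the trained sentence encoder.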
