
A HYBRID MODEL USING THE PRETRAINED BERT AND DEEP NEURAL NETWORKS WITH RICH FEATURE FOR EXTRACTIVE TEXT SUMMARIZATION
Author(s) -
Tuan Minh Luu,
Hanh N. D. Le,
Tan Minh Hoang
Publication year - 2021
Publication title -
Journal of Computer Science and Cybernetics (Vietnam Academy of Science and Technology)
Language(s) - English
Resource type - Journals
eISSN - 2815-5939
pISSN - 1813-9663
DOI - 10.15625/1813-9663/37/2/15980
Subject(s) - automatic summarization, computer science, artificial intelligence, sentence, convolutional neural network, Vietnamese, feature (linguistics), natural language processing, relevance, artificial neural network, machine learning
Deep neural networks have been applied successfully to extractive text summarization tasks when accompanied by large training datasets. However, when the training dataset is not large enough, these models reveal certain limitations that affect the quality of the system's summary. In this paper, we propose an extractive summarization system based on a Convolutional Neural Network and a Fully Connected network for sentence selection. The pretrained multilingual BERT model is used to generate embedding vectors from the input text. These vectors are combined with TF-IDF values to produce the input of the text summarization system. Redundant sentences are eliminated from the output summary by the Maximal Marginal Relevance method. Our system is evaluated on both English and Vietnamese using the CNN and Baomoi datasets, respectively. Experimental results show that our system achieves better results compared to existing works using the same datasets. This confirms that our approach can be effectively applied to summarize both English and Vietnamese text.
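The redundancy-elimination step the abstract mentions can be illustrated with a minimal sketch of greedy Maximal Marginal Relevance (MMR) selection. This is not the authors' implementation: the sentence vectors (in the paper, derived from BERT embeddings and TF-IDF values), the trade-off parameter `lam`, and the use of cosine similarity are illustrative assumptions.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors; 0.0 if either is zero.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mmr_select(doc_vec, sent_vecs, k, lam=0.7):
    """Greedily pick k sentence indices balancing relevance to the
    document against redundancy with already-selected sentences."""
    selected = []
    candidates = list(range(len(sent_vecs)))
    while candidates and len(selected) < k:
        best, best_score = None, -float("inf")
        for i in candidates:
            relevance = cosine(sent_vecs[i], doc_vec)
            redundancy = max(
                (cosine(sent_vecs[i], sent_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, given a document vector `[1, 1]` and sentence vectors `[[1, 0], [1, 0], [0, 1]]`, selecting two sentences skips the duplicate and returns indices `[0, 2]`.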