TEMPORAL CONDENSATION OF TAMIL NEWS
Author(s) - S Shreenidhi, Sridhar Ranganathan
Publication year - 2021
Publication title - Engineering and Technology Journal
Language(s) - English
Resource type - Journals
ISSN - 2456-3358
DOI - 10.47191/etj/v6i7.03
Subject(s) - automatic summarization, newspaper, credibility, tamil, the internet, computer science, news media, journalism, information retrieval, advertising, political science, internet privacy, world wide web, business, linguistics, law, philosophy
Since the dawn of the Internet, we have been inundated with information, and the volume available online is expected to keep growing exponentially. This creates a need for summarization, making it one of the most sought-after topics in natural language processing. It is essential to stay informed about important events, and newspapers have long served this purpose. However, there is a perception among the general public that no news agency today can be unequivocally trusted, so the credibility of any single news article is uncertain. One therefore has to read articles from several sources to get an unbiased view of a topic. When a query about an event is entered into a search engine such as Google, the search returns an overwhelming number of results, and it is humanly impossible to read all of them. To address these problems, news articles covering the Tamilnadu Legislative Assembly election were condensed. The articles were collected from various news sources over a period of two months and translated from Tamil to English. Because they covered a variety of events, k-means clustering was performed on the dataset to separate the Tamilnadu-related news from the rest. The relevant articles were then pre-processed to remove ambiguity and translation errors. Each article was summarized individually using a linear regression model that gave importance to features such as named entities and the number of words shared with the title. The individual summaries were then summarized together using a BERT extractive summarizer to reduce redundancy. When the generated summary was compared with the article's introduction, or with its title when no introduction was available, a precision of 0.512, a recall of 0.25, and an F-measure of 0.31 were obtained.
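The per-article scoring and the evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation. The two features (words shared with the title, and capitalized words as a crude stand-in for named entities), the fixed values in `WEIGHTS`, and the unigram-overlap definition of precision, recall, and F-measure are all assumptions made for illustration. The abstract says the weights come from a linear regression model and does not say how overlap was measured.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_features(sentence, title):
    """Return [title_overlap, capitalized_count] for one sentence.

    Both features are illustrative assumptions, not the paper's feature set.
    """
    title_words = set(tokens(title))
    title_overlap = sum(1 for w in tokens(sentence) if w in title_words)
    # Skip the first word: it is capitalized whether or not it is a name.
    capitalized = sum(1 for w in sentence.split()[1:] if w[:1].isupper())
    return [title_overlap, capitalized]


# Hypothetical fixed weights; a fitted linear regression would supply these.
WEIGHTS = [1.0, 0.5]
BIAS = 0.0


def score(sentence, title):
    """Linear score: bias plus the weighted sum of the sentence features."""
    feats = sentence_features(sentence, title)
    return BIAS + sum(w * f for w, f in zip(WEIGHTS, feats))


def summarize(sentences, title, k=1):
    """Return the k highest-scoring sentences in their original order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(sentences[i], title),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]


def overlap_prf(candidate, reference):
    """Unigram-overlap precision, recall, and F-measure (an assumed metric)."""
    cand = Counter(tokens(candidate))
    ref = Counter(tokens(reference))
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

Calling `summarize(sentences, title, k)` on a sentence-split article yields an extractive summary, and `overlap_prf(summary_text, reference_text)` compares it with a reference such as the article's introduction or title. The k-means filtering and the BERT-based second pass are left out because they depend on external libraries.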
