
Transfer Learning for Spam Text Classification
Author(s) -
Pratiksha Bongale
Publication year - 2021
Publication title -
International Journal for Research in Applied Science and Engineering Technology
Language(s) - English
Resource type - Journals
ISSN - 2321-9653
DOI - 10.22214/ijraset.2021.37349
Subject(s) - computer science , transfer learning , artificial intelligence , machine learning , training set , labeled data , binary classification , support vector machine
Today’s world is largely data-driven. To cope with the enormous volume of data, Machine Learning and Data Mining strategies are put to use. Traditional ML approaches presume that the model will be tested on a dataset drawn from the same domain as the training data. Nevertheless, some real-world situations require machines to produce good results with very little domain-specific training data. This creates room for models that can predict accurately after being trained on easily obtained data, and Transfer Learning is the key to it: the practice of applying knowledge gained while learning one task to another task that resembles it in some way. This article focuses on building a model capable of separating text data into two classes, one covering text that is spam and the other text that is not, using BERT’s pre-trained model (bert-base-uncased). This pre-trained model has been trained on Wikipedia and Book Corpus data, and the goal of this paper is to highlight the pre-trained model’s ability to transfer the knowledge it learned from that training (Wiki and Book Corpus) to classifying spam texts from the rest.
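The transfer-learning setup the abstract describes can be sketched with the Hugging Face transformers library. This is a minimal illustration, not the paper's exact pipeline: it loads bert-base-uncased with a two-label classification head and runs a forward pass over a pair of hypothetical spam/ham messages. The classification head is randomly initialized here; in the paper's workflow it would be fine-tuned on a labeled spam corpus.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the pre-trained encoder named in the paper, with a fresh
# binary-classification head (num_labels=2: spam vs. not spam).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Hypothetical example messages, one spam-like and one legitimate.
texts = [
    "WINNER!! Claim your FREE prize now by replying to this message",
    "Are we still meeting for lunch at noon tomorrow?",
]

# Tokenize into BERT's input format (input_ids, attention_mask).
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Forward pass; logits have shape (batch_size, num_labels).
model.eval()
with torch.no_grad():
    logits = model(**batch).logits

print(logits.shape)  # torch.Size([2, 2])
```

Fine-tuning would then update both the head and (optionally) the BERT encoder weights on the spam dataset, e.g. with the transformers Trainer API or a plain PyTorch training loop, which is where the knowledge transferred from Wikipedia and Book Corpus pre-training pays off.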