
Machine learning methods for classification of sensitive data
Author(s) - Gints Rudusans, Gatis Vītols
Publication year - 2021
Publication title - Research for Rural Development / Research for Rural Development (Online)
Language(s) - English
Resource type - Conference proceedings
eISSN - 2255-923X
pISSN - 1691-4031
DOI - 10.22616/rrd.27.2021.046
Subject(s) - computer science, machine learning, artificial intelligence, naive bayes classifier, classifier (uml), data classification, big data, statistical classification, general data protection regulation, labeled data, data mining, support vector machine, data protection act 1998, computer security
The era of Big Data brings many new challenges: understanding, processing, and securing data, assuring data quality, and coping with data growth, among others. One of these challenges is to identify and classify the data sets, spread across different systems, that must comply with the conditions set by various regulations. The classification of these data sets can be automated using machine learning methods. The aim of the research is to present machine learning methods for classifying sensitive data. The research is based on an analysis and comparison of European Union legislation and of scientific literature that addresses data classification using machine learning methods. Special attention is paid to sensitive data as defined by the General Data Protection Regulation (GDPR). The main focus of the research is on supervised learning algorithms, among which the Naïve Bayes classifier is one of the most effective. Achieving good results requires a suitable training data set. Hybrid methods offer a further way to increase classifier performance.
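To make the supervised approach named in the abstract more concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing, written in plain Python. It is an illustration only and not the authors' implementation: the class name `NaiveBayesTextClassifier`, the crude whitespace tokenizer, the two labels, and the tiny training set are all assumptions invented for this example. The "sensitive" examples are loosely modelled on the special categories of personal data listed in GDPR Article 9 (health, religious beliefs, trade union membership, biometric data).

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (hypothetical helper)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, text):
        """Return the unnormalised log posterior log P(c) + sum log P(w|c) per class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        tokens = [t for t in tokenize(text) if t in self.vocabulary]  # skip unseen words
        scores = {}
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)  # class prior
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                # Add-one smoothing keeps the probability non-zero for words
                # that never occurred in this class.
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy training set (hypothetical, not taken from the paper).
TRAIN_TEXTS = [
    "patient diagnosis and health record",
    "religious beliefs of the employee",
    "trade union membership list",
    "biometric fingerprint data of the patient",
    "quarterly sales report for the region",
    "meeting room booking schedule",
    "product catalogue and price list",
    "server maintenance schedule",
]
TRAIN_LABELS = ["sensitive"] * 4 + ["other"] * 4

model = NaiveBayesTextClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict("health record of the patient"))
print(model.predict("sales meeting schedule"))
```

With this toy data the two calls print `sensitive` and `other`. The example also shows why the abstract stresses the training set: the classifier only knows the words it has seen, so a set that does not cover the vocabulary of the target systems would leave most tokens ignored and the prediction driven by the class prior alone.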