
Confidential Data Identification Using Data Mining Techniques in Data Leakage Prevention System
Author(s) -
Peneti Subhashini,
Padmaja Rani B
Publication year - 2015
Publication title -
International Journal of Data Mining and Knowledge Management Process
Language(s) - English
Resource type - Journals
eISSN - 2231-007X
pISSN - 2230-9608
DOI - 10.5121/ijdkp.2015.5505
Subject(s) - computer science , leakage (economics) , confidentiality , identification (biology) , data mining , computer security , botany , biology , economics , macroeconomics
Data leakage is the sending of confidential data to an unauthorized person. Identifying which data is confidential remains a major challenge for organizations. We developed a system that uses data mining techniques to identify an organization's confidential data. First, we create clusters from the training data set. Next, we identify confidential terms and context terms for each cluster. Finally, based on the confidential terms and context terms, we calculate the confidentiality level of the document under inspection as a score. If the score exceeds a predefined threshold, the document is blocked and marked as confidential.
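
The scoring and thresholding step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the function names, the term weights, the rule that a nearby context term doubles a confidential term's weight, the window size, and the threshold value. The clustering step is skipped, and the terms for a single cluster are taken as already known.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def score_document(text, confidential_terms, context_terms, window=3):
    """Sum the weight of every confidential term occurrence.

    Assumed rule (not from the paper): the weight is doubled when a
    context term appears within `window` tokens on either side.
    """
    tokens = tokenize(text)
    score = 0.0
    for i, token in enumerate(tokens):
        weight = confidential_terms.get(token)
        if weight is None:
            continue
        nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        has_context = any(t in context_terms for t in nearby)
        score += weight * (2.0 if has_context else 1.0)
    return score

def is_confidential(text, confidential_terms, context_terms, threshold):
    """Flag the document when its score exceeds the threshold."""
    return score_document(text, confidential_terms, context_terms) > threshold

# Hypothetical terms and weights for one cluster.
CONF_TERMS = {"salary": 1.0, "merger": 2.0}
CTX_TERMS = {"employee", "acquisition"}

if __name__ == "__main__":
    doc = "The merger acquisition plan"
    print(score_document(doc, CONF_TERMS, CTX_TERMS))
    print(is_confidential(doc, CONF_TERMS, CTX_TERMS, threshold=3.0))
```

In a full system, each incoming document would first be assigned to its closest cluster, and that cluster's term sets would be passed to `score_document`.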