Open Access
A novel unsupervised feature‐based approach for electricity theft detection using robust PCA and outlier removal clustering algorithm
Author(s) - Hussain Saddam, Mustafa Mohd Wazir, Jumani Touqeer Ahmed, Baloch Shadi Khan, Saeed Muhammad Salman
Publication year - 2020
Publication title - International Transactions on Electrical Energy Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.428
H-Index - 42
ISSN - 2050-7038
DOI - 10.1002/2050-7038.12572
Subject(s) - cluster analysis, computer science, outlier, anomaly detection, data mining, principal component analysis, categorization, artificial intelligence, pattern recognition (psychology)
Abstract - This paper presents a novel data‐oriented, unsupervised machine learning‐based theft detection approach for efficiently identifying fraudulent consumers. It combines the robust principal component analysis (ROBPCA) algorithm with the outlier removal clustering (ORC) algorithm. To mitigate irregularities in the consumer data acquired from a power utility, statistical features are extracted from each consumer's consumption pattern using an anomalous time series extension. Based on the extracted features, the ROBPCA algorithm first groups the consumers with the most similar features into two categories. To avoid any overlap between these two groups, the ORC algorithm is then used to classify each consumer distinctly as “suspicious” or “non‐suspicious”. Finally, a highly selective on‐site inspection is proposed, saving utilities considerable time, resources, and overall cost. The effectiveness of the proposed method is validated by comparing its performance with that of nine widely used outlier detection methods on seven prominent performance metrics. The proposed technique achieves an accuracy of 94.34% and a detection rate of 92.52%, significantly higher than those of the other conventional methods studied.
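The abstract describes the clustering stage only in outline. The following is a minimal Python sketch of the generic ORC idea (k‐means followed by repeated removal of points whose distance to their own centroid, divided by the largest such distance, exceeds a threshold) together with the two headline metrics. It is not the authors' implementation. It assumes that consumers have already been reduced to low‐dimensional feature scores (the ROBPCA step is omitted). It treats the points ORC discards as the "suspicious" set, which is an assumption because the abstract does not say how ORC's output maps to the two labels. It also takes "detection rate" to mean TP/(TP+FN). The function names, the 0.7 threshold, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda k: _dist(p, centroids[k]))
            for p in points]


def kmeans(points: List[Point], centroids: List[Point], iters: int = 50):
    """Plain Lloyd's k-means, warm-started from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        labels = _assign(points, centroids)
        new = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # An empty cluster keeps its previous centroid.
            new.append(tuple(sum(v) / len(members) for v in zip(*members))
                       if members else c)
        if new == centroids:
            break  # converged: memberships no longer change the centroids
        centroids = new
    return centroids, _assign(points, centroids)


def orc(points: List[Point], init: List[Point],
        threshold: float = 0.7, rounds: int = 3):
    """Generic outlier removal clustering (hypothetical parameters).

    Each round computes o_i = d_i / d_max (distance to own centroid over the
    largest such distance), drops points with o_i > threshold, and re-runs
    k-means on the survivors. Returns (centroids, kept_idx, removed_idx).
    """
    kept = list(range(len(points)))
    centroids, labels = kmeans(points, init)
    removed: List[int] = []
    for _ in range(rounds):
        d = [_dist(points[i], centroids[lab]) for i, lab in zip(kept, labels)]
        d_max = max(d)
        if d_max == 0.0:
            break  # every point sits on its centroid; nothing is outlying
        survivors = [i for i, di in zip(kept, d) if di / d_max <= threshold]
        if not survivors:
            break  # stop before a round would discard every remaining point
        removed += [i for i, di in zip(kept, d) if di / d_max > threshold]
        kept = survivors
        centroids, labels = kmeans([points[i] for i in kept], centroids)
    return centroids, kept, removed


def accuracy_and_detection_rate(flagged: Set[int], thieves: Set[int], n: int):
    """Accuracy and detection rate (assumed here to be recall, TP/(TP+FN))."""
    tp = len(flagged & thieves)
    fp = len(flagged - thieves)
    fn = len(thieves - flagged)
    tn = n - tp - fp - fn
    return (tp + tn) / n, tp / (tp + fn)


# Hypothetical 2-D feature scores (e.g. after a dimensionality-reduction step).
POINTS = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2),
          (0.0, 3.0)]

if __name__ == "__main__":
    cents, kept, removed = orc(POINTS, [(0.0, 0.0), (5.0, 5.0)])
    print("flagged as suspicious:", removed)
    # Hypothetical ground truth: consumers 3 and 8 are the actual thieves.
    print(accuracy_and_detection_rate(set(removed), {3, 8}, len(POINTS)))
```

The threshold controls how aggressively ORC prunes. The farthest point always has a ratio of exactly 1, so every round with a threshold below 1 removes at least one point. For that reason, the sketch caps the number of rounds and stops before a round would remove every remaining point.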