Application of Genetic Algorithm and K-Nearest Neighbour Method in Real World Medical Fraud Detection Problem

Hongxing He; Simon Hawkins; Warwick Graco; Xin Yao

Open Access

Publication year - 2000

Publication title - Journal of Advanced Computational Intelligence and Intelligent Informatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.172

H-Index - 20

eISSN - 1883-8014

pISSN - 1343-0130

DOI - 10.20965/jaciii.2000.p0130

Subject(s) - euclidean distance, k-nearest neighbors algorithm, computer science, metric (unit), naive bayes classifier, artificial intelligence, class (philosophy), sample (material), algorithm, statistical classification, nearest neighbour, classification rule, genetic algorithm, pattern recognition (psychology), data mining, machine learning, support vector machine, operations management, chemistry, chromatography, economics

In the k-Nearest Neighbour (kNN) algorithm, the classification of a new sample is determined by the classes of its k nearest neighbours. The performance of the kNN algorithm is influenced by three main factors: (1) the distance metric used to locate the nearest neighbours; (2) the decision rule used to derive a classification from the k nearest neighbours; and (3) the number of neighbours used to classify the new sample. Using k = 1, 3, or 5 nearest neighbours, this study uses a Genetic Algorithm (GA) to find the optimal non-Euclidean distance metric in the kNN algorithm and examines two alternative decision rules (Majority Rule and Bayes Rule) for deriving a classification from the k nearest neighbours. The modified algorithm was evaluated on two real-world medical fraud problems. The General Practitioner (GP) database is a 2-class problem in which GPs are classified as practising either appropriately or inappropriately. The ‘Doctor-Shoppers’ database is a 5-class problem in which patients are classified according to the likelihood that they are ‘doctor-shoppers’. Doctor-shoppers are patients who consult many physicians in order to obtain multiple prescriptions of drugs of addiction in excess of their own therapeutic need. In both applications, classification accuracy was improved by optimising the distance metric in the kNN algorithm. The agreement rate improved from around 70% (using Euclidean distance) to 78% (using an optimised distance metric) on the GP dataset, and from about 55% to 82% on the Doctor-Shoppers dataset. Differences in either the decision rule or the number of nearest neighbours had little or no impact on the classification performance of the kNN algorithm. The excellent performance of the kNN algorithm when the distance metric is optimised using a genetic algorithm paves the way for its application to the real-world fraud detection problems faced by the Health Insurance Commission (HIC).
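The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the form of the non-Euclidean metric or the GA operators, so the sketch assumes a per-feature weighted Euclidean distance, leave-one-out agreement rate as the fitness, truncation selection with elitism, uniform crossover, and Gaussian mutation. All function names and parameter defaults are hypothetical, and only the Majority Rule is shown.

```python
import random
from collections import Counter


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance (assumed metric form); all-ones weights give plain Euclidean."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5


def knn_predict(train_X, train_y, query, weights, k=3):
    """Classify `query` by Majority Rule over its k nearest neighbours."""
    order = sorted(range(len(train_X)),
                   key=lambda i: weighted_distance(train_X[i], query, weights))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, weights, k=3):
    """Leave-one-out agreement rate, used here as the GA fitness (an assumption)."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], weights, k) == y[i]
    return hits / len(X)


def evolve_weights(X, y, k=3, pop_size=20, generations=20, seed=0):
    """Search for feature weights with a simple elitist GA (hypothetical operators)."""
    rng = random.Random(seed)
    n = len(X[0])
    # Include the Euclidean baseline so the result is never worse than it.
    pop = [[1.0] * n] + [[rng.random() for _ in range(n)]
                         for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda w: loo_accuracy(X, y, w, k), reverse=True)
        parents = ranked[:pop_size // 2]  # truncation selection; parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p, q)]  # uniform crossover
            j = rng.randrange(n)
            child[j] = max(0.0, child[j] + rng.gauss(0, 0.2))  # mutate one weight
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: loo_accuracy(X, y, w, k))
```

Calling `evolve_weights(X, y, k=3)` on a list of feature vectors `X` and labels `y` returns one non-negative weight per feature; comparing `loo_accuracy(X, y, best)` with `loo_accuracy(X, y, [1.0] * n)` mirrors the optimised-versus-Euclidean comparison reported above. Because the best individuals are carried into each new generation, the returned weights score at least as well as the Euclidean baseline on the fitness data, though that says nothing about performance on unseen samples.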
