Minimization of Gini Impurity: NP-completeness and Approximation Algorithm via Connections with the k-means Problem
Author(s) -
Eduardo Sany Laber,
Lucas Murtinho
Publication year - 2019
Publication title -
electronic notes in theoretical computer science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.242
H-Index - 60
ISSN - 1571-0661
DOI - 10.1016/j.entcs.2019.08.050
Subject(s) - partition (number theory) , mathematics , completeness (order theory) , cluster analysis , combinatorics , minification , set (abstract data type) , key (lock) , discrete mathematics , mathematical optimization , computer science , statistics , mathematical analysis , computer security , programming language
The Gini impurity is a very popular criterion to select attributes during decision trees construction. In the problem of finding a partition with minimum weighted Gini impurity (PMWGP), the one faced during the construction of decision trees, a set of vectors must be partitioned into k different clusters such that the partition's overall Gini impurity is minimized. We show that PMWGP is APX-hard for arbitrary k and admits a randomized PTAS when the number of clusters is fixed. These results significantly improve the current knowledge on the problem. The key idea to obtain these results is to explore connections between PMWGP and the geometric k-means clustering problem.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom