Semi‐supervised incremental feature extraction algorithm for large‐scale data stream
Author(s) - Tan Chao, Ji Genlin
Publication year - 2016
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3914
Subject(s) - computer science , artificial intelligence , feature extraction , pattern recognition (psychology) , machine learning , k nearest neighbors algorithm , feature (linguistics) , scale (ratio) , spark (programming language) , subspace topology , data mining , algorithm , quantum mechanics , programming language , philosophy , physics , linguistics
Summary In the big data era, processing large‐scale data streams remains an open challenge. Feature extraction has attracted much attention because of its effectiveness for data classification. Traditional classification algorithms may make little use of the information in labeled samples. Online learning and the out‐of‐sample problem have also become active research topics. To address these problems, this paper proposes a novel semi‐supervised incremental feature extraction algorithm. First, features are extracted incrementally in an unsupervised way. Second, a semi‐supervised subspace learning algorithm is proposed that uses class information to adjust the k‐nearest neighbor weights. Third, the unsupervised and semi‐supervised feature extraction approaches are combined into a single objective function in order to solve the out‐of‐sample learning problem. Experiments were carried out on University of California Irvine (UCI) machine learning datasets and on real‐world face image datasets (Olivetti faces (ORL), Yale, YaleB, and Rendered face). To demonstrate that the proposed algorithm scales to large‐scale data streams, classification experiments were run with Spark in a parallel computing environment and compared against several related semi‐supervised feature extraction methods. The experimental results and a comparison of computational complexity demonstrate that the proposed algorithm can obtain good performance. Copyright © 2016 John Wiley & Sons, Ltd.
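The summary does not give the weighting formula, so the following is only a minimal sketch of what "using class information to adjust k‐nearest neighbor weights" could look like, not the authors' method. It assumes a heat‐kernel weight on each k‐nearest‐neighbor edge, which is then scaled up when both endpoints carry the same label and scaled down when their labels differ; edges touching an unlabeled sample are left unchanged. The function name `label_adjusted_knn_weights` and the parameters `same` and `diff` are hypothetical.

```python
import math


def label_adjusted_knn_weights(X, y, k=2, same=2.0, diff=0.0):
    """Build a symmetric k-NN weight matrix adjusted by partial labels.

    X: list of feature vectors; y: list of labels, None for unlabeled.
    `same` and `diff` are illustrative scaling factors (an assumption,
    not taken from the paper).
    """
    n = len(X)
    # Pairwise squared Euclidean distances.
    d2 = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
          for i in range(n)]
    # Indices of the k nearest neighbors of each sample (self excluded).
    nn = [sorted((j for j in range(n) if j != i), key=lambda j: d2[i][j])[:k]
          for i in range(n)]
    # Heat-kernel bandwidth: mean squared distance over all k-NN edges.
    edge_d2 = [d2[i][j] for i in range(n) for j in nn[i]]
    sigma2 = sum(edge_d2) / len(edge_d2) + 1e-12

    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in nn[i]:
            w = math.exp(-d2[i][j] / sigma2)
            # Class information is used only when both endpoints are labeled.
            if y[i] is not None and y[j] is not None:
                w *= same if y[i] == y[j] else diff
            W[i][j] = W[j][i] = max(W[i][j], w)
    return W


# Two small clusters; samples 3 and 5 are close but carry different labels.
X = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [5.0, 5.0], [5.1, 5.0], [5.2, 5.0]]
y = [0, None, 0, 1, None, 0]
W = label_adjusted_knn_weights(X, y, k=2)
```

With `diff=0.0`, the edge between the differently labeled neighbors 3 and 5 is cut, while the edge between the same‐class samples 0 and 2 is doubled relative to the unadjusted graph. A weight matrix of this kind is the usual input to graph‐based subspace learning, where a projection is chosen to keep strongly connected samples close together.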