Semi‐supervised incremental feature extraction algorithm for large‐scale data stream

Author(s) - Tan Chao, Ji Genlin

Publication year - 2016

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.3914

Subject(s) - computer science, artificial intelligence, feature extraction, pattern recognition (psychology), machine learning, k nearest neighbors algorithm, feature (linguistics), scale (ratio), spark (programming language), subspace topology, data mining, algorithm, quantum mechanics, programming language, philosophy, physics, linguistics

Summary In the big data era, processing large‐scale data streams remains an open challenge. Feature extraction has attracted much attention because of its effectiveness for data classification. Traditional classification algorithms may make little use of the information in labeled samples. Online learning and the out‐of‐sample problem have also become active research topics. To address these problems, this paper proposes a novel semi‐supervised incremental feature extraction algorithm. First, features are extracted incrementally in an unsupervised way. Second, a semi‐supervised subspace learning algorithm is proposed that uses class information to adjust the k‐nearest neighbor weights. Third, the unsupervised and semi‐supervised feature extraction approaches are combined into a single objective function in order to solve the out‐of‐sample learning problem. Experiments were carried out on machine learning datasets from the University of California Irvine (UCI) and on real‐world face image datasets (Olivetti faces (ORL), Yale, YaleB, and Rendered face). To demonstrate that the proposed algorithm scales to large‐scale data streams, classification experiments were performed with Spark in a parallel computing environment and compared against related semi‐supervised feature extraction methods. The experimental results and the computational complexity comparison show that the proposed algorithm performs well. Copyright © 2016 John Wiley & Sons, Ltd.
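The summary does not give the weighting formula, so the following is only a minimal sketch of what "using class information to adjust k‐nearest neighbor weights" could look like, not the authors' method. It assumes a heat‐kernel weight on each k‐nearest‐neighbor edge, a label of -1 for unlabeled samples, and hypothetical parameters `sigma`, `boost`, and `damp`: an edge is strengthened when both endpoints are labeled with the same class, weakened when they are labeled with different classes, and left unchanged otherwise.

```python
import numpy as np


def knn_weights(X, labels, k=3, sigma=1.0, boost=2.0, damp=0.5):
    """Heat-kernel k-NN weights, rescaled where both endpoints are labeled.

    labels uses -1 for unlabeled samples. sigma, boost and damp are
    illustrative assumptions, not values taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]

    # Pairwise squared Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=2)

    W = np.zeros((n, n))
    for i in range(n):
        # The k closest samples to i, excluding i itself.
        order = np.argsort(d2[i])
        neighbors = [j for j in order if j != i][:k]
        for j in neighbors:
            w = np.exp(-d2[i, j] / (2.0 * sigma ** 2))
            # Class information is only usable when both samples are labeled.
            if labels[i] >= 0 and labels[j] >= 0:
                w *= boost if labels[i] == labels[j] else damp
            W[i, j] = w

    # Make the graph undirected by keeping the larger of the two directions.
    return np.maximum(W, W.T)
```

A matrix built this way could then feed a graph‐based subspace learning step; how the paper combines it with the unsupervised incremental stage is not specified in the summary.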
