
Document Segmentation Method Based on Style Feature Fusion
Author(s) - Gang Liu, Kai Wang, Wangyang Liu, Xiao Cheng, Tao Li
Publication year - 2019
Publication title - IOP Conference Series: Materials Science and Engineering
Language(s) - English
Resource type - Journals
eISSN - 1757-899X
pISSN - 1757-8981
DOI - 10.1088/1757-899x/646/1/012044
Subject(s) - computer science, artificial intelligence, pattern recognition, segmentation, feature extraction, cluster analysis, data mining, linguistics
A style crack is the position in a multi-author document at which authorship changes from one writer to another. This paper reviews the current state of research and the underlying theory in related fields, both domestically and internationally, and proposes a multi-feature document segmentation method for plagiarism detection. Seven text style features are used to recognize style cracks. After the features are extracted, they are combined through multi-feature fusion and processed with an unsupervised machine learning algorithm: a clustering algorithm groups the style features, and the resulting clusters are used to locate the style cracks. Experiments show that the method is effective and achieves good results.
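The abstract describes the pipeline only at a high level: extract style features for each text segment, fuse them, and cluster them without supervision so that changes in cluster membership mark candidate style cracks. The Python sketch below illustrates that idea under our own assumptions, none of which the source confirms: the seven features are hypothetical stand-ins (the abstract does not name the paper's seven), "fusion" is taken to mean z-score normalization followed by concatenation, and clustering is a plain two-cluster k-means, which presumes exactly two authors.

```python
import math
import re
import string

# HYPOTHETICAL feature set: the abstract says seven style features are used
# but does not name them, so these seven are illustrative stand-ins.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "it"}


def style_features(chunk: str) -> list[float]:
    """Compute seven simple stylometric measurements for one text chunk."""
    words = re.findall(r"[A-Za-z']+", chunk)
    lower = [w.lower() for w in words]
    sentences = [s for s in re.split(r"[.!?]+", chunk) if s.strip()]
    n_words = max(len(words), 1)
    n_chars = max(len(chunk), 1)
    return [
        sum(len(w) for w in words) / n_words,                   # mean word length
        len(words) / max(len(sentences), 1),                    # words per sentence
        len(set(lower)) / n_words,                              # type-token ratio
        sum(c in string.punctuation for c in chunk) / n_chars,  # punctuation rate
        sum(w in FUNCTION_WORDS for w in lower) / n_words,      # function-word rate
        sum(c.isupper() for c in chunk) / n_chars,              # uppercase rate
        sum(len(w) > 6 for w in words) / n_words,               # long-word rate
    ]


def fuse(vectors: list[list[float]]) -> list[list[float]]:
    """ASSUMED fusion: z-score each feature column so that features on
    different scales contribute comparably, then keep them concatenated."""
    stats = []
    for col in zip(*vectors):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        stats.append((mean, std))
    return [[(x - m) / s for x, (m, s) in zip(v, stats)] for v in vectors]


def _dist2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans2(points: list[list[float]], iters: int = 50) -> list[int]:
    """Deterministic 2-means: seed with point 0 and the point farthest from it."""
    first = points[0]
    centroids = [list(first), list(max(points, key=lambda p: _dist2(p, first)))]
    labels: list[int] = []
    for _ in range(iters):
        new_labels = [0 if _dist2(p, centroids[0]) <= _dist2(p, centroids[1]) else 1
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def style_cracks(chunks: list[str]) -> list[int]:
    """Return indices of chunks whose cluster differs from the previous chunk's."""
    labels = kmeans2(fuse([style_features(c) for c in chunks]))
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


if __name__ == "__main__":
    plain = "The cat sat on the mat. It was a hot day. The dog ran to it."
    ornate = ("Notwithstanding considerable methodological heterogeneity, "
              "contemporary computational stylometry demonstrates remarkable "
              "discriminative capability; furthermore, normalized multivariate "
              "representations consistently outperform univariate baselines.")
    print(style_cracks([plain, plain, plain, ornate, ornate]))  # → [3]
```

In this sketch a crack is reported at the index of the first chunk whose cluster label differs from its predecessor's. Fixing k = 2 keeps the example short; a document with more authors would need a larger or estimated number of clusters, and real text would also need a deliberate chunking scheme (for example, by paragraph or with a sliding window).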