Similarity measures for sequential data
Author(s) - Rieck Konrad
Publication year - 2011
Publication title - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.506
H-Index - 47
eISSN - 1942-4795
pISSN - 1942-4787
DOI - 10.1002/widm.36
Subject(s) - similarity (geometry), computer science, data mining, string (physics), task (project management), edit distance, key (lock), domain (mathematical analysis), word (group theory), information retrieval, artificial intelligence, theoretical computer science, machine learning, mathematics, image (mathematics), mathematical analysis, management, computer security, geometry, economics, mathematical physics
Expressive comparison of strings is a prerequisite for the analysis of sequential data in many areas of computer science. However, comparing strings and assessing their similarity is not a trivial task, and there exist several contrasting approaches for defining similarity measures over sequential data. In this paper, we review three major classes of such similarity measures: edit distances, bag‐of‐word models, and string kernels. Each of these classes originates from a particular application domain and models similarity of strings differently. We present these classes and the underlying comparisons in detail, highlight their advantages and differences, and provide basic algorithms supporting practical applications. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 296–304 DOI: 10.1002/widm.36

This article is categorized under:
Algorithmic Development > Biological Data Mining
Algorithmic Development > Text Mining
Fundamental Concepts of Data and Knowledge > Data Concepts
Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining
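
As a rough illustration of the three classes named in the abstract, the sketch below gives one common textbook representative of each: the Levenshtein edit distance, a cosine similarity over bag-of-words counts, and an n-gram (spectrum) string kernel. These are not taken from the article itself. The function names, the whitespace tokenization, and the default n = 3 are assumptions made for this example only.

```python
from collections import Counter
from math import sqrt


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def bag_of_words_cosine(a: str, b: str) -> float:
    """Cosine similarity of word-count vectors.
    Assumption: words are split on whitespace."""
    wa, wb = Counter(a.split()), Counter(b.split())
    dot = sum(wa[w] * wb[w] for w in wa.keys() & wb.keys())
    norm = (sqrt(sum(v * v for v in wa.values()))
            * sqrt(sum(v * v for v in wb.values())))
    return dot / norm if norm else 0.0


def spectrum_kernel(a: str, b: str, n: int = 3) -> int:
    """n-gram (spectrum) kernel: inner product of the n-gram count
    vectors of a and b. Assumption: character n-grams, n = 3."""
    ga = Counter(a[i:i + n] for i in range(len(a) - n + 1))
    gb = Counter(b[i:i + n] for i in range(len(b) - n + 1))
    return sum(ga[g] * gb[g] for g in ga.keys() & gb.keys())
```

For example, `edit_distance("kitten", "sitting")` yields 3, while `spectrum_kernel("abcabc", "abc")` yields 2 because the 3-gram `abc` occurs twice in the first string and once in the second. The first measure is a distance (smaller means more similar), whereas the other two are similarities (larger means more similar).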
