Document similarity based on concept tree distance

Open Access

Author(s) - Praveen Lakkaraju, Susan Gauch, Mirco Speretta

Publication year - 2008

Publication title - KU ScholarWorks (The University of Kansas)

Language(s) - English

Resource type - Conference proceedings

DOI - 10.1145/1379092.1379118

Subject(s) - computer science, information retrieval, similarity (geometry), search engine, tree (set theory), similarity measure, data mining, vector space model, measure (data warehouse), classifier (uml), similitude, artificial intelligence, mathematics, mathematical analysis, image (mathematics)

The Web is quickly moving from the era of search engines to the era of discovery engines. Whereas search engines help you find information you are looking for, discovery engines help you find things that you never knew existed. A common discovery technique is to automatically identify and display objects similar to ones previously viewed by the user. Core to this approach is an accurate method to identify similar documents. In this paper, we present a new approach to identifying similar documents based on a conceptual tree-similarity measure. We represent each document as a concept tree using the concept associations obtained from a classifier. Then, we employ a tree-similarity measure based on a tree edit distance to compute similarities between concept trees. Experiments on documents from the CiteSeer collection showed that our algorithm performed significantly better than document similarity based on the traditional vector space model.
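
The abstract does not give the details of the measure, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It computes a unit-cost ordered tree edit distance (insert, delete, relabel) between two small concept trees using the standard rightmost-root recursion, then turns the distance into a similarity score. The helper names (`tree`, `forest_distance`, `tree_similarity`), the example concept labels, the unit costs, and the normalization by total node count are all assumptions made for this example. Because the distance is order-sensitive, concept children would need a canonical order (for instance, sorted by label) before comparison.

```python
from functools import lru_cache

# A tree is (label, children); children is a tuple of trees.
# Nested tuples are hashable, so results can be memoized.
def tree(label, *children):
    return (label, tuple(children))

def size(forest):
    """Number of nodes in a forest (a tuple of trees)."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost ordered edit distance between two forests."""
    if not f1 and not f2:
        return 0
    if not f1:
        return size(f2)          # insert every node of f2
    if not f2:
        return size(f1)          # delete every node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the rightmost root of f1; its children join the forest.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the rightmost root of f2.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Match the two rightmost roots (relabel costs 1 if labels differ).
    match = (forest_distance(f1[:-1], f2[:-1])
             + forest_distance(kids1, kids2)
             + (label1 != label2))
    return min(delete, insert, match)

def tree_similarity(t1, t2):
    """Assumed normalization: 1 - distance / (total nodes in both trees)."""
    distance = forest_distance((t1,), (t2,))
    return 1 - distance / (size((t1,)) + size((t2,)))

# Hypothetical concept trees, as a classifier might assign to two documents.
doc_a = tree("Computer Science",
             tree("Information Retrieval", tree("Search Engines")),
             tree("Artificial Intelligence"))
doc_b = tree("Computer Science",
             tree("Information Retrieval", tree("Similarity Measures")),
             tree("Artificial Intelligence"))
doc_c = tree("Computer Science", tree("Databases"))

print(tree_similarity(doc_a, doc_b))  # one relabel apart
print(tree_similarity(doc_a, doc_c))  # two deletions and one relabel apart
```

Unlike a vector space comparison, which treats "Search Engines" and "Similarity Measures" as unrelated terms, this sketch scores `doc_a` and `doc_b` as close because they share the same parent concepts. The memoized recursion is adequate for small trees; a production system would more likely use a polynomial-time algorithm such as Zhang-Shasha.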
