
Implementation of Weighted Tree Similarity and Cosine Sorensen-Dice Algorithms for Semantic Search in Document Repository Information System
Author(s) -
Abdurrosyiid Amrullah,
Indra Gita Anugrah
Publication year - 2021
Publication title -
Journal of Development Research
Language(s) - English
Resource type - Journals
eISSN - 2579-9347
pISSN - 2579-9290
DOI - 10.28926/jdr.v5i1.143
Subject(s) - cosine similarity , information retrieval , computer science , metadata , dice , similarity (geometry) , tree (set theory) , semantic similarity , data mining , artificial intelligence , world wide web , cluster analysis , mathematics , image (mathematics) , mathematical analysis , geometry
As the number of documents we manage grows, searching them becomes more difficult, and information retrieval becomes increasingly important. An information retrieval system helps users find documents that match their keywords. Document searches usually consider only the name of the document (file) the user is looking for, without regard to the document's content or metadata, and so may fail to meet the user's information needs. Document search has several approaches, including full-text search, plain metadata search, and semantic search. This study uses the Weighted Tree Similarity algorithm together with the Cosine Sorensen-Dice algorithm to calculate similarity for semantic search. Document metadata is represented as a tree with labeled nodes, labeled branches, and weighted branches. The similarity of subtree edge labels is calculated with Cosine Sorensen-Dice, while the total similarity of a document is calculated with Weighted Tree Similarity. The document metadata structure uses the taxonomy owner, description, title, disposition content, and type. The result of this research is a document search application that uses taxonomic weights on file storage.
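To make the approach more concrete, the following is a minimal Python sketch of how such a similarity could be computed, not the authors' implementation. It assumes a one-level metadata tree whose branches are the five taxonomy fields. The branch weights, the whitespace tokenizer, and all function names are hypothetical. The abstract does not say how the cosine and Sorensen-Dice measures are combined, so the sketch simply takes their mean, which is an assumption.

```python
import math
from collections import Counter

# Hypothetical branch weights for the metadata taxonomy (they sum to 1.0).
WEIGHTS = {
    "owner": 0.1,
    "description": 0.3,
    "title": 0.3,
    "disposition_content": 0.2,
    "type": 0.1,
}


def tokens(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between term-frequency vectors of two strings."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def dice(a, b):
    """Sorensen-Dice coefficient between the token sets of two strings."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def label_similarity(a, b):
    """Assumed combination: the mean of cosine and Sorensen-Dice."""
    return (cosine(a, b) + dice(a, b)) / 2


def tree_similarity(query, doc, q_weights=WEIGHTS, d_weights=WEIGHTS):
    """Weighted sum over branches, averaging the two trees' branch weights."""
    total = 0.0
    for field, qw in q_weights.items():
        weight = (qw + d_weights.get(field, 0.0)) / 2
        total += weight * label_similarity(query.get(field, ""), doc.get(field, ""))
    return total


if __name__ == "__main__":
    query = {"title": "budget report", "type": "pdf"}
    doc = {
        "owner": "finance",
        "title": "annual budget report",
        "description": "budget report for the year",
        "disposition_content": "archive",
        "type": "pdf",
    }
    print(round(tree_similarity(query, doc), 3))
```

Averaging the two trees' branch weights follows the general idea of weighted tree similarity, in which a branch contributes in proportion to its weight in both trees. A deeper metadata tree would apply the same weighted sum recursively at each subtree.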