Penerapan Metode N-Gram dan Cosine Similarity Dalam Pencarian Pada Repositori Artikel Jurnal Publikasi

Open Access

Author(s) - Indra Gita Anugrah

Publication year - 2021

Publication title - Building of Informatics, Technology and Science

Language(s) - English

Resource type - Journals

eISSN - 2685-3310

pISSN - 2684-8910

DOI - 10.47065/bits.v3i3.1058

Subject(s) - cosine similarity, computer science, information retrieval, similarity (geometry), relevance (law), trigonometric functions, preprocessor, artificial intelligence, mathematics, pattern recognition (psychology), geometry, political science, law, image (mathematics)

A digital repository is one source of data for meeting information needs, especially within an organization. It stores digital documents that users can draw on; a repository of published journal articles is one example. The number of articles in such a repository grows by hundreds or even thousands every day, and the articles typically come in various formats and languages. As a result, search results tend to have a relatively low level of relevance. Applying an information retrieval system to the repository is therefore important for optimizing search results. Preprocessing is one of the most important stages in developing a retrieval system, in particular the choice of stemming algorithm used to produce base words (terms), which are later used to determine the level of similarity between queries and documents during a search. N-Gram is a method that decomposes a string into character sequences; it can be used to analyze words or sentences and identify which language they are written in, which in turn determines the stemming algorithm to apply. Cosine Similarity is a method for determining the level of similarity by calculating the cosine of the angle between the query vector and the document vector. In this study, a repository is built that implements a retrieval system using N-Gram and Cosine Similarity, and the system's performance is then measured: averaged over Indonesian-language and English-language queries, accuracy is 0.967, precision is 0.851, and recall is 0.869.
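The two techniques named in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names (`char_ngrams`, `term_vector`, `cosine_similarity`, `rank_documents`) are hypothetical, the term vectors are raw word counts with no stemming or weighting, and the language-identification step that the study uses N-Gram for is not reproduced here. The sketch only shows how a string decomposes into character n-grams and how a query vector and a document vector are compared by the cosine of the angle between them.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Decompose a string into overlapping character n-grams."""
    # Lowercase and collapse runs of whitespace (an assumed normalization).
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        # Too short to form a full n-gram: return the string itself, if any.
        return [normalized] if normalized else []
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


def term_vector(text: str) -> Counter:
    """Build a sparse term-frequency vector from whitespace-separated words.

    Simplification: a real retrieval pipeline would stem the terms first.
    """
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty vector has no direction; treat as no similarity.
    return dot / (norm_a * norm_b)


def rank_documents(query: str, documents: list[str]) -> list[tuple[float, str]]:
    """Score every document against the query, most similar first."""
    query_vec = term_vector(query)
    scored = [(cosine_similarity(query_vec, term_vector(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

For example, `char_ngrams("jurnal", 3)` yields `['jur', 'urn', 'rna', 'nal']`, and `rank_documents("cosine similarity search", [...])` places documents that share more query terms ahead of documents that share none. A production system would typically replace the raw counts with weighted, stemmed terms, but the similarity computation has the same shape.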
