Finding nuggets in documents: A machine learning approach

Open Access

Author(s) - Brook Wu Yifang, Li Quanzhi, Bot Razvan Stefan, Chen Xin

Publication year - 2006

Publication title - Journal of the American Society for Information Science and Technology

Language(s) - English

Resource type - Journals

eISSN - 1532-2890

pISSN - 1532-2882

DOI - 10.1002/asi.20341

Subject(s) - computer science, automatic summarization, information retrieval, personalization, phrase, thesaurus, function (biology), metadata, automatic indexing, document clustering, glossary, search engine indexing, cluster analysis, world wide web, natural language processing, artificial intelligence, linguistics, philosophy, evolutionary biology, biology

Document keyphrases provide a concise summary of a document's content and serve as semantic metadata for it. They can be used in many applications related to knowledge management and text mining, such as automatic text summarization, development of search engines, document clustering, document classification, thesaurus construction, and browsing interfaces. Because only a small portion of documents have keyphrases assigned by authors, and manually assigning keyphrases is time‐consuming and costly, an algorithm that automatically generates keyphrases for documents is needed. This paper describes a Keyphrase Identification Program (KIP), which extracts document keyphrases by using prior positive samples of human-identified phrases to assign weights to the candidate keyphrases. The logic of our algorithm is: the more keywords a candidate keyphrase contains and the more significant these keywords are, the more likely this candidate phrase is a keyphrase. KIP's learning function can enrich the glossary database by automatically adding newly identified keyphrases to the database. KIP's personalization feature lets users build a glossary database specifically suited to their area of interest. The evaluation results show that KIP performs better than the systems it was compared with and that the learning function is effective.
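
The scoring logic described in the abstract can be illustrated with a small sketch. The abstract does not give KIP's actual weighting formula, so everything here is an assumption made for illustration: the function names (`keyword_weights`, `score_candidate`, `extract_keyphrases`), the rule that a word inherits the highest weight of any glossary phrase containing it, the use of a plain sum as the candidate score, and the default weight given to newly learned phrases.

```python
from collections import defaultdict


def keyword_weights(glossary):
    """Derive per-word weights from a glossary of {phrase: weight}.

    Assumption (not from the paper): a word inherits the highest weight
    of any glossary phrase that contains it.
    """
    weights = defaultdict(float)
    for phrase, weight in glossary.items():
        for word in phrase.lower().split():
            weights[word] = max(weights[word], weight)
    return dict(weights)


def score_candidate(candidate, weights):
    """Score a candidate phrase as the sum of its known keyword weights.

    More keywords and more significant keywords both raise the score,
    which mirrors the logic stated in the abstract.
    """
    return sum(weights.get(word, 0.0) for word in candidate.lower().split())


def extract_keyphrases(candidates, glossary, top_n=2, learn=False, new_weight=0.5):
    """Rank candidates by score and return the top_n as keyphrases.

    With learn=True, the selected phrases are added to the glossary
    (hypothetical stand-in for the learning function), using an
    arbitrary default weight for phrases not already present.
    """
    weights = keyword_weights(glossary)
    ranked = sorted(candidates,
                    key=lambda c: score_candidate(c, weights),
                    reverse=True)
    selected = ranked[:top_n]
    if learn:
        for phrase in selected:
            glossary.setdefault(phrase.lower(), new_weight)
    return selected


if __name__ == "__main__":
    # Toy glossary of human-identified phrases with made-up weights.
    glossary = {"text mining": 0.9, "document clustering": 0.8}
    candidates = ["document text mining", "clustering", "browsing interface"]
    print(extract_keyphrases(candidates, glossary, top_n=2, learn=True))
    print(sorted(glossary))
```

A personalized glossary, in this sketch, is simply a different `glossary` dictionary per user or subject area; how KIP stores and updates its glossary database is not specified in the abstract.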
