
Building topic models in a federated digital library through selective document exclusion
Author(s) - Miles Efron, Peter Organisciak, Katrina Fenlon
Publication year - 2011
Publication title - Proceedings of the American Society for Information Science and Technology
Language(s) - English
Resource type - Journals
eISSN - 1550-8390
pISSN - 0044-7870
DOI - 10.1002/meet.2011.14504801048
Subject(s) - metadata , computer science , digital collections , digital library , information retrieval , world wide web , topic model , quality (philosophy) , data science , art , philosophy , literature , poetry , epistemology
Building topic models over federated digital collections presents numerous challenges stemming from metadata inconsistencies. The quality of topical metadata is difficult to ascertain, and topical metadata is often interspersed with irrelevant administrative metadata. In this study, we propose a way to improve topic modeling in large collections by identifying documents that convey only weak topical information. These documents are excluded when training topic models; their topical associations are instead inferred after model training. We outline a method for identifying weakly topical documents by detecting runs of highly similar documents in a collection. In a preliminary evaluation on a corpus from the Institute of Museum and Library Services Digital Collections and Content aggregation, results show an increase in coherence among the words within topics. This suggests that it may be beneficial to induce topic models from less, but higher‐quality, data.
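The exclude-then-infer pattern described above can be illustrated with a short sketch. The snippet below is not the authors' implementation; it assumes the gensim library, and the run-detection heuristic (`flag_weakly_topical`, a Jaccard-similarity comparison of neighbouring documents with a 0.9 threshold) is a hypothetical stand-in for the paper's method of defining runs of similar documents.

```python
# A minimal sketch of training an LDA model on a filtered subset of documents
# and inferring topic mixtures for the excluded documents afterward.
from gensim import corpora, models

def jaccard(a, b):
    """Token-set Jaccard similarity between two tokenized documents."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def flag_weakly_topical(texts, threshold=0.9):
    """Hypothetical heuristic: flag documents sitting in runs of near-duplicate neighbours."""
    flags = [False] * len(texts)
    for i in range(1, len(texts)):
        if jaccard(texts[i], texts[i - 1]) >= threshold:
            flags[i] = flags[i - 1] = True
    return flags

def train_with_exclusion(texts, num_topics=20):
    """texts: list of tokenized documents (lists of strings)."""
    weak = flag_weakly_topical(texts)
    dictionary = corpora.Dictionary(texts)
    full_corpus = [dictionary.doc2bow(t) for t in texts]

    # Train only on documents judged to carry strong topical signal.
    training_corpus = [bow for bow, w in zip(full_corpus, weak) if not w]
    lda = models.LdaModel(training_corpus, num_topics=num_topics, id2word=dictionary)

    # Infer topic associations for the excluded documents after training.
    inferred = {i: lda.get_document_topics(full_corpus[i])
                for i, w in enumerate(weak) if w}
    return lda, inferred
```

Under this sketch, the excluded documents still receive topic assignments, but they do not influence the word distributions learned during training, which is the property the abstract credits with improving topic coherence.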