
Author gender metadata augmentation of HathiTrust digital library
Author(s) - Peng Zong, Chen Miao, Kowalczyk Stacy, Plale Beth
Publication year - 2014
Publication title - Proceedings of the American Society for Information Science and Technology
Language(s) - English
Resource type - Journals
eISSN - 1550-8390
pISSN - 0044-7870
DOI - 10.1002/meet.2014.14505101098
Subject(s) - metadata , computer science , world wide web , digital library , information retrieval , domain (mathematical analysis) , set (abstract data type) , public domain , resource (disambiguation) , scale (ratio) , quality (philosophy) , library science , geography , cartography , art , mathematical analysis , computer network , philosophy , literature , poetry , mathematics , archaeology , epistemology , programming language
Bibliographic metadata is essential for describing digital library resources. As the size and number of bibliographic entities grow, high-quality metadata enables richer forms of digital library access, search, and use. Metadata records can be enriched through automated techniques; for example, a digital humanities scholar might use the gender of a set of authors in a literary analysis. In this study, we enriched the metadata description of a large-scale digital library, the HathiTrust (HT) digital library, by determining the gender of the authors in the public domain portion of the collection. The results are stored in a separate Solr index accessible through the HathiTrust Research Center services. The study resolved author gender in 78.9% of cases in the HT public domain corpus, and it suggests two future research directions: capturing and representing the provenance of the contributing sources to enhance trust, and applying machine learning to resolve the remaining names.
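The abstract says only that gender was resolved by drawing on contributing sources, that some names remained unresolved, and that provenance of those sources matters for trust. The sketch below is a hypothetical illustration of that general idea, not the authors' pipeline. It assumes catalog author headings in a "Surname, Forename, dates." form, consults two toy lookup tables (`authority_file` and `first_name_list`, invented stand-ins for real sources) in priority order, records which table supplied each value as provenance, and reports the share of names resolved. The field names in `to_solr_doc` are likewise assumptions, not the HathiTrust Research Center schema.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup sources, consulted in priority order.
# These toy tables stand in for real name sources; their contents are illustrative only.
SOURCES: list[tuple[str, dict[str, str]]] = [
    ("authority_file", {"austen, jane": "female", "dickens, charles": "male"}),
    ("first_name_list", {"mary": "female", "george": "male", "emily": "female"}),
]


@dataclass
class GenderResult:
    name: str               # original heading as found in the record
    gender: Optional[str]   # None when no source could resolve the name
    source: Optional[str]   # provenance: which source supplied the value


def normalize(heading: str) -> str:
    """Drop trailing life dates and punctuation from an assumed 'Surname, Forename, 1775-1817.' heading."""
    heading = re.sub(r",?\s*\d{4}-(\d{4})?\.?$", "", heading.strip())
    return heading.rstrip(" .,").lower()


def forename(normalized: str) -> Optional[str]:
    """Return the first token after the first comma, or None if there is no forename."""
    parts = normalized.split(",", 1)
    if len(parts) < 2 or not parts[1].strip():
        return None
    return parts[1].split()[0]


def resolve(heading: str) -> GenderResult:
    key = normalize(heading)
    first = forename(key)
    for source_name, table in SOURCES:
        # Within each source, try the full normalized heading, then only the forename.
        for candidate in (key, first):
            if candidate and candidate in table:
                return GenderResult(heading, table[candidate], source_name)
    # Leave the name unresolved rather than guessing.
    return GenderResult(heading, None, None)


def resolution_rate(results: list[GenderResult]) -> float:
    """Fraction of names for which some source supplied a gender value."""
    if not results:
        return 0.0
    return sum(r.gender is not None for r in results) / len(results)


def to_solr_doc(r: GenderResult) -> dict:
    # Hypothetical field names; a real index would define its own schema.
    return {"author": r.name, "gender": r.gender or "unknown", "gender_source": r.source or "none"}


if __name__ == "__main__":
    headings = [
        "Austen, Jane, 1775-1817.",
        "Dickens, Charles, 1812-1870.",
        "Shelley, Mary Wollstonecraft, 1797-1851.",
        "Brontë, Emily, 1818-1848.",
        "Smith, J.",
    ]
    results = [resolve(h) for h in headings]
    print(json.dumps([to_solr_doc(r) for r in results], ensure_ascii=False, indent=2))
    print(f"resolved: {resolution_rate(results):.1%}")  # 4 of 5 toy names -> 80.0%
```

Keeping the source next to each value is what makes provenance capture possible, and leaving names such as "Smith, J." as `unknown` isolates the residual set that the abstract suggests machine learning might address.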