
MetaCrowd: Crowdsourcing Biomedical Metadata Quality Assessment
Author(s) -
Amrapali Zaveri,
Wei Hu,
Michel Dumontier
Publication year - 2019
Publication title -
Human Computation
Language(s) - English
Resource type - Journals
ISSN - 2330-8001
DOI - 10.15346/hc.v6i1.98
Subject(s) - metadata, crowdsourcing, computer science, world wide web, interoperability, key (lock), geospatial metadata, controlled vocabulary, reuse, quality (philosophy), annotation, information retrieval, metadata repository, data science, meta data services, biology, ecology, artificial intelligence, philosophy, computer security, epistemology
To reuse the enormous amounts of biomedical data available on the Web, there is an urgent need for good-quality metadata. This is essential to ensure that data are maximally Findable, Accessible, Interoperable, and Reusable (FAIR). The Gene Expression Omnibus (GEO) allows users to specify metadata in the form of textual key: value pairs (e.g. sex: female). However, since there is no structured vocabulary or format available, the 44,000,000+ key: value pairs suffer from numerous quality issues. Relying on domain experts for curation is not only time-consuming but also does not scale. Thus, in our approach, MetaCrowd, we apply crowdsourcing as a means for GEO metadata quality assessment. Our results show that crowdsourcing is a reliable and feasible way to identify similar as well as erroneous metadata in GEO. This is extremely useful for data consumers and producers in curating and providing good-quality metadata.
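The quality problem described in the abstract can be made concrete with a small sketch: because GEO keys are free text, the same field is often spelled in several different ways. The following Python example is purely illustrative (the sample pairs are invented, not real GEO records, and this is not the paper's method) and flags candidate duplicate keys via simple syntactic normalization:

```python
import re
from collections import defaultdict

# Hypothetical GEO-style key:value metadata pairs (made up for illustration).
pairs = [
    ("sex", "female"),
    ("Sex", "F"),
    ("gender", "female"),
    ("age", "34"),
    ("Age (years)", "34"),
    ("tissue", "liver"),
]

def normalize(key):
    """Lowercase and strip punctuation/units so trivially different
    spellings of the same field collapse to one canonical form."""
    key = key.lower()
    key = re.sub(r"\(.*?\)", "", key)        # drop parenthesised units, e.g. "(years)"
    return re.sub(r"[^a-z0-9]+", " ", key).strip()

# Group the original keys by their normalized form.
clusters = defaultdict(set)
for key, _ in pairs:
    clusters[normalize(key)].add(key)

# Keys whose normalized forms collide are candidate duplicates;
# a human (e.g. a crowd worker) would confirm they mean the same thing.
duplicates = {norm: keys for norm, keys in clusters.items() if len(keys) > 1}
print(duplicates)  # → {'sex': {'sex', 'Sex'}, 'age': {'age', 'Age (years)'}}
```

Note that purely syntactic normalization misses semantic duplicates such as "sex" vs. "gender" — exactly the kind of judgment that motivates delegating similarity assessment to crowd workers rather than to string matching alone.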