Open Access
occCite: Tools for querying and managing large biodiversity occurrence datasets
Author(s) - Owens Hannah L., Merow Cory, Maitner Brian S., Kass Jamie M., Barve Vijay, Guralnick Robert P.
Publication year - 2021
Publication title - Ecography
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.973
H-Index - 128
eISSN - 1600-0587
pISSN - 0906-7590
DOI - 10.1111/ecog.05618
Subject(s) - metadata, workflow, computer science, documentation, data science, citation, information retrieval, database, data mining, world wide web, programming language
The amount of observational and specimen‐based biodiversity data available to researchers is increasing exponentially, yet the ability to manage and cite large, complex biodiversity datasets lags behind. This management and citation gap impedes reproducibility for data users and the ability of data publishers to track use and accumulate use citations, ultimately harming the longer‐term sustainability of the still‐emerging enterprise of research data‐sharing. Here we present an R package, occCite (v. 0.4.7), to aid researchers in querying large species occurrence data aggregators (specifically, the Global Biodiversity Information Facility, GBIF, and the Botanical Information and Ecology Network, BIEN) and in storing metadata such as primary data providers, database accession dates, DOIs, and the taxonomic source used for search terms. occCite also includes tools to summarize and visualize query results and to generate citation lists of all data providers and software packages used during the query process. We provide examples of a basic occurrence search and citation workflow as well as an advanced workflow using features for custom optimized searches, visualization, and summary procedures. occCite improves upon existing R packages by uniting data from powerful API‐based query packages (rgbif and BIEN) into a unified object‐based framework, while maintaining metadata vital to best‐practice recommendations for documenting biodiversity analysis workflows. occCite aims to efficiently close the gap in the citation cycle between primary data providers and final research products, allowing researchers to meet dataset documentation standards without sacrificing time and resources to the demands of providing increasing levels of detail on their datasets.