Open Access
Is the Supporting Information the Venue for Reproducibility and Transparency?
Author(s) - Benjamin Rudshteyn, Atanu Acharya, Víctor S. Batista
Publication year - 2017
Publication title - The Journal of Physical Chemistry A
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.756
H-Index - 235
eISSN - 1520-5215
pISSN - 1089-5639
DOI - 10.1021/acs.jpca.7b11663
Subject(s) - transparency (behavior) , citation , computer science , social media , world wide web , library science , information retrieval , computer security
Making research data, software, and data processing tools readily available to the public could significantly enhance the impact of scientific publications. Openness and transparency could address reproducibility concerns and accelerate scientific progress. While archives and repositories would be preferable for securing data in the long term, the Supporting Information (SI) already allocates significant space for auxiliary files, links, and the essential information needed to make scientific findings immediately reproducible and to make data processing protocols and numerical procedures executable. Furthermore, the SI document could ensure that data that might not fit in the tight confines of a journal article lives online even when laboratories move on to other projects, close, or lose track of their data. The shift toward openness and transparency might be difficult to embrace, depending on the specific nature of the project. Nevertheless, the trend to make raw research data and open-source software packages available to the public has been building steady momentum. Openness in research data and software sharing is already making transformative contributions to the communication of research findings, which is critical for ensuring reproducibility as well as for training the next generation of scientists. Significant progress on transparency has already been made in various fields, such as molecular and protein crystallography, and will likely continue to inspire the broader scientific community. Databases and repositories in the public domain have been successfully established and have proved transformational for a wide range of studies of small molecules and proteins, including extensive theoretical work and detailed analysis that builds upon data provided by expert crystallographers. Much of this work is expedited by using machine-readable formats such as PDB and CIF, providing essential data that assists the task of peer review.
At the same time, comparative studies of reported model structures as well as homology models are routinely performed, enabling studies that would never be possible if crystallography data were confined only to the research groups who reported the data. Similarly, sophisticated data search engines, such as Google Scholar and PubMed, together with the tools of data science have transformed the way the scientific community operates, allowing for instant accessibility of publications from anywhere and at any time. In the theoretical/computational field, resources such as GitHub allow for massive dissemination of open-source software and codes under a stable URL. In addition to commercial software, open-source software allows the global community to build upon codes developed by other researchers when they opt not to invest time and resources to redevelop tools that have already been published. Such resources ensure reproducibility of reported results and secure codes that might otherwise run the risk of being lost in obscurity when developers no longer support them. An outstanding question is whether the physical chemistry community has sufficiently embraced the trend for openness or whether there are opportunities for further developments. Some enthusiasts of transparency suggest that the SI should provide enough data, instructions, and information for direct reproducibility of the reported results by the reviewers. Others suggest that immediate reproducibility should be a requirement for publication. While that suggestion is under debate, simple steps could already be taken to make publications more impactful and reported findings readily reproducible. For example, some researchers suggest that the universal, but difficult to process, PDF might not always be the most appropriate form for the SI. In particular, the PDF format is not ideal for data mining and facile analysis in the context of other data. 
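To illustrate the point about PDF versus machine-readable SI files, the following is a minimal sketch (not a procedure from the article) of how a small data table could be shipped as CSV text and read back without loss. The column names and values are hypothetical placeholders, and the helper names `to_csv` and `from_csv` are introduced here only for illustration.

```python
import csv
import io

# Hypothetical placeholder rows; these are NOT data from any publication.
ROWS = [
    {"compound": "A", "shift_ppm": 7.26, "linewidth_hz": 1.2},
    {"compound": "B", "shift_ppm": 2.05, "linewidth_hz": 0.8},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into dicts, converting numeric columns to float."""
    parsed = []
    for row in csv.DictReader(io.StringIO(text)):
        parsed.append(
            {
                "compound": row["compound"],
                "shift_ppm": float(row["shift_ppm"]),
                "linewidth_hz": float(row["linewidth_hz"]),
            }
        )
    return parsed


if __name__ == "__main__":
    text = to_csv(ROWS)
    print(text)
    # A reader can recover the table directly, with no manual transcription.
    print(from_csv(text) == ROWS)
```

A table deposited this way can be loaded by a reader's own scripts and compared against other data sets, which is difficult when the same numbers appear only as typeset text in a PDF.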
The limitations of the PDF format could be most relevant to methodology papers that aim to introduce approaches that others would adopt after reproducing the results reported in the publication. On the experimental front, the SI could include access to the raw data as well as the files necessary for data processing (e.g., Igor Pro, Excel, or Prism files). For example, NMR studies could benefit from access to raw data provided through the SI. While the free induction decay (FID) contains all of the research data of an NMR experiment, it is routinely withheld in favor of Fourier-transformed plots and tables, which are themselves subject to unspecified data processing methods and human error. The transformation of the data may result in the loss of valuable information such as line widths, which report on the dynamical nature of the system, as well as information such as the field strengths essential to reproduce the experiments. Including the FID as part of the Supporting Information (and making it part of a repository system) could allow for faster correction of mischaracterized molecules and identification of potential impurities that, while minute, may still be important (e.g., could affect biological activity). While FID data is typically found in proprietary formats, free software packages could convert the data into a standard format and enable readers to regenerate the plots in the paper for themselves. The field of computational chemistry is especially conducive to making full use of the SI. Traditionally, publications have provided information for “reproducibility” in the form of keywords and parameters in sentence form as well as figures of molecular structures, which might not always be sufficient for reproducibility. More recently, researchers have started to provide XYZ coordinates in the SI. However, some researchers suggest that input files should be provided because they are often essential for immediate reproducibility.
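As a rough sketch of what machine-readable coordinates plus calculation settings might look like in an SI file, the example below writes and re-reads the plain XYZ format (atom count, comment line, then one `symbol x y z` line per atom) and bundles it with a settings record in JSON. The water geometry is an approximate illustrative structure, and every key in `SETTINGS` is a hypothetical example rather than the input syntax of any particular program; an actual SI would include the real input files.

```python
import json

# Approximate, illustrative water geometry in angstroms (not a published result).
WATER = [
    ("O", 0.0, 0.0, 0.0),
    ("H", 0.757, 0.586, 0.0),
    ("H", -0.757, 0.586, 0.0),
]

# Hypothetical settings record; keys and values are assumptions for illustration.
SETTINGS = {
    "functional": "B3LYP",
    "basis_set": "6-31G(d)",
    "scf_convergence": 1e-8,
    "geometry_optimizer": "quasi-Newton",
}


def write_xyz(atoms, comment=""):
    """Return XYZ-format text: atom count, comment line, then coordinates."""
    lines = [str(len(atoms)), comment]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    return "\n".join(lines) + "\n"


def read_xyz(text):
    """Parse XYZ-format text into a list of (symbol, x, y, z) and the comment."""
    lines = text.splitlines()
    count = int(lines[0])
    comment = lines[1]
    atoms = []
    for line in lines[2:2 + count]:
        symbol, x, y, z = line.split()
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != count:
        raise ValueError("atom count in header does not match coordinate lines")
    return atoms, comment


def bundle(atoms, settings, comment=""):
    """Combine coordinates and settings into one JSON document for deposit."""
    record = {"xyz": write_xyz(atoms, comment), "settings": settings}
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    deposited = bundle(WATER, SETTINGS, comment="water, illustrative geometry")
    print(deposited)
    restored = json.loads(deposited)
    atoms, comment = read_xyz(restored["xyz"])
    print(atoms == WATER, restored["settings"] == SETTINGS)
```

Keeping the settings next to the coordinates in one parseable record means a reader does not have to reconstruct them from sentences in the methods section.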
In addition to coordinates, reproducibility of electronic structure calculations might require essential information from input files, such as the convergence criteria and implemented algorithms, which are often decisive in reproducibility even when using the same XYZ coordinates. Molecular dynamics (MD) simulations also require many “details” of the simulation conditions, beyond
