The Role of Data Repositories in Reproducible ResearchPosted: June 19, 2014
By Limor Peer in Institution for social and policy studies blog
Who is responsible for the quality of data deposited in repositories? And what is quality data, anyway?
These questions were on my mind as I was preparing to present a poster at the Open Repositories 2013 conference in Charlottetown, PEI earlier this month. The annual conference brings the digital repositories community together with stakeholders, such as researchers, librarians, publishers and others to address issues pertaining to “the entire lifecycle of information.” The conference theme this year, “Use, Reuse, Reproduce,” could not have been more relevant to the ISPS Data Archive. Two plenary sessions bookended the conference, both discussing the credibility crisis in science. In the opening session, Victoria Stodden set the stage with her talk about the central role of algorithms and code in the reproducibility and credibility of science. In the closing session, Jean-Claude Guédon made a compelling case that open repositories are vital to restoring quality in science.
My poster, titled, “The Repository as Data (Re) User: Hand Curating for Replication,” illustrated the various data quality checks we undertake at the ISPS Data Archive. The ISPS Data Archive is a small archive, for a small and specialized community of researchers, containing mostly small data. We made a key decision early on to make it a “replication archive,” by which we mean a repository that holds data and code for the purpose of being used to replicate and verify published results.
The poster presents ISPS Data Archive’s answer to the questions of who is responsible for the quality of data and what that means: We think that repositories do have a responsibility to examine the data and code we receive for deposit before making the files public, and that this data review involves verifying and replicating the original research outputs. In practice, this means running the code against the data to validate published results. These steps in effect expand the role of the repository and more closely integrate it into the research process, with implications for resources, expertise, and relationships, which I will explain here.
First, a word about what data repositories usually do, the special obligations reproducibility imposes, and who is fulfilling them now. This ties in with a discussion of data quality, data review, and the role of repositories.
Read More Here