HPS 64th Annual Meeting

7-11 July 2019

Single Session






EV32 - PEP 1C: Fundamentals of Reproducible Research (LaBone, Chalmers, Brackett)

Lake Down   08:00 - 10:00

 
Here we define research as the process in which we:
• Ask a question.
• Acquire data that we hope is capable of answering the question.
• Analyze the data.
• Draw conclusions from the analysis that are generally applicable to similar situations and data not yet observed.

Research can be high-stakes, such as a clinical trial for a new cancer treatment, or fairly mundane, such as deciding whether your GM counter is operating properly. The gold standard for demonstrating that the conclusions you reached at the end of your research are valid is replication. Research is replicated when another person independently acquires another dataset, analyzes it, and arrives at more or less the same conclusions. Replication is not always feasible because it can be expensive, time-consuming, unethical, or impossible.

A lesser standard is reproduction. Research is reproduced when another person can recreate all the numbers and graphs in your report given your data, code, and associated documentation. There is a bit of a crisis in modern research because an uncomfortable amount of published research can't be replicated or reproduced. Failure to replicate someone's work is called science. Failure to reproduce someone's work is actually more troubling, because at first glance one might think reproduction should be easy. But, at a personal level, who has not experienced the situation where a plot in a report can't be reproduced by the author (much less someone else) at a later date? One can't help but be suspicious of any research that can't be reproduced. The idea of reproducible research centers on configuring your research workflow so that someone else can readily reproduce all the numerical results and graphs in your report, starting with the original data and the documentation of how you manipulated that data.

Today we are going to discuss details of reproducible research, including:
• asking a good question,
• acquiring adequate data,
• cleaning data,
• using appropriate analytical methods, and
• reaching conclusions that are based on the data and analysis.

To a large extent, the software tools you use for these activities have a huge impact on the effort involved in creating reproducible research, and hence on the chances of your work being reproducible. The ubiquitous Microsoft Word/Excel applications do not easily lend themselves to the production of reproducible research, but there are other software packages that do. We will review some freely available applications that make this task easier: the statistical programming language R, the word-processing/typesetting software LyX, and the version control software Git. The goal of this software review is not necessarily to convert you to using these tools, but to illustrate what you should be trying to do with Microsoft Word/Excel if you use them to do your research.

