Instruction in Enhancing Reproducibility

INSTRUCTION IN METHODS FOR ENHANCING REPRODUCIBILITY

Integration of reproducibility training in the overall HGEN curriculum: All students in the biomedical PhD programs at Vanderbilt are required to take training in the Responsible Conduct of Research (RCR) developed by Vanderbilt’s Office of Biomedical Research Education and Training (BRET) at the end of their first year of graduate school. This full-day workshop includes specific content on methods for enhancing research reproducibility, including defining and avoiding plagiarism, data fabrication and falsification. That training also includes details on evaluating conflicts of interest, a review of policies for human subjects and animal subjects, and consideration of the mentor-mentee responsibilities to each other and to the scientific community. Many aspects of data, including acquisition, management, sharing, and ownership are discussed in this training workshop, as is the concept of responsible authorship.

In required HGEN coursework, trainees gain a solid understanding of statistical analysis for both quantitative and qualitative studies and of research design that reinforces scientific rigor, and they learn how to minimize bias and appropriately account for biological variables such as sex. Discussions in several HGEN classes cover topics such as the importance of authenticating biological resources and the many methods scientists use to accomplish this authentication. Journal clubs consistently reinforce the goal of transparent reporting of research results. To reinforce these lessons, we build further discussion of these issues into PhD Qualifying Examinations, work-in-progress meetings, and discussion sessions after seminars.

In the required core courses HGEN 8340 and 8341, didactic classroom lectures discuss how even the order of the quality control studies routinely conducted for -omics data of all sorts influences the composition of the final dataset used for analysis. The lectures also reinforce that quality standards for data are themselves data dependent: quality control studies that are adequate for sample sizes of hundreds or thousands of subjects may be inadequate for studies of hundreds of thousands to millions of subjects.
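As a minimal sketch of the point about ordering, the Python example below applies two common call-rate filters (one on samples, one on variants) to a toy genotype table in both orders. The data, the 0.75 threshold, and all function names are hypothetical and chosen only for illustration; they are not taken from the course materials.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical toy data: rows are samples, columns are variants,
# and None marks a missing genotype call.
VARIANTS: List[str] = ["v1", "v2", "v3", "v4"]
GENOTYPES: Dict[str, List[Optional[int]]] = {
    "s1": [0, 1, 2, 0],
    "s2": [1, 0, 1, 1],
    "s3": [None, 1, None, 2],   # poorly genotyped sample
    "s4": [0, 2, 0, 1],
    "s5": [1, 1, None, 0],
}
MIN_CALL_RATE = 0.75  # assumed threshold, for illustration only


def call_rate(calls: Iterable[Optional[int]]) -> float:
    """Fraction of genotype calls that are not missing."""
    calls = list(calls)
    return sum(c is not None for c in calls) / len(calls)


def filter_samples(samples: Set[str], variants: List[str]) -> Set[str]:
    """Keep samples whose call rate over the given variants passes."""
    cols = [VARIANTS.index(v) for v in variants]
    return {
        s for s in samples
        if call_rate(GENOTYPES[s][c] for c in cols) >= MIN_CALL_RATE
    }


def filter_variants(samples: Set[str], variants: List[str]) -> List[str]:
    """Keep variants whose call rate over the given samples passes."""
    return [
        v for v in variants
        if call_rate(GENOTYPES[s][VARIANTS.index(v)] for s in samples)
        >= MIN_CALL_RATE
    ]


def samples_then_variants() -> Tuple[Set[str], List[str]]:
    kept_samples = filter_samples(set(GENOTYPES), VARIANTS)
    return kept_samples, filter_variants(kept_samples, VARIANTS)


def variants_then_samples() -> Tuple[Set[str], List[str]]:
    kept_variants = filter_variants(set(GENOTYPES), VARIANTS)
    return filter_samples(set(GENOTYPES), kept_variants), kept_variants


samples_first = samples_then_variants()
variants_first = variants_then_samples()
print("samples first: ", sorted(samples_first[0]), samples_first[1])
print("variants first:", sorted(variants_first[0]), variants_first[1])
```

In this toy table the same samples survive either way, but variant v3 is retained only when the poorly genotyped sample is removed first, so the final dataset depends on the order of the filters.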

In the required course HGEN 8370, the students present critical readings of recent genetics papers in Science and Nature. The students are instructed to approach these papers as if they were referees for the journal, and to specifically address the rigor of both the experimental and statistical analysis methods, and the reproducibility of the research given the level of detail in the methods section of the paper.

In the required course HGEN 8371, the students are taught how to write scientific papers. This course reinforces the messages from the previous semester's course, HGEN 8370 (above), on communicating detail in their manuscripts so that the rigor of their work is conveyed and the methods section is complete enough to make their work reproducible.

In the required course HGEN 8385, the students read in depth several papers fundamental to the development of genetics over the past century and more. This course deals directly with changes in concepts of rigor and reproducibility in science over this period. The purpose of this course is to give the students a critical evaluation of the foundations of the field of genetics.

In the elective course HGEN 8394, we teach Python programming. This course presents the concept of “containerized” software tools that offer automatic versioning of software and data input files, time-stamps for when analyses were completed, and straightforward turnkey reproducibility (the ability to completely reproduce a set of analyses), which has become a routine requirement in quantitative fields such as human genetics.
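The kind of metadata described here can be illustrated with a short Python sketch that records a completion time-stamp, the interpreter version, and a checksum for each input file. This is not a container and not the tooling used in the course; the function names and the record layout are assumptions made for illustration.

```python
import hashlib
import json
import platform
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Iterable


def sha256_of_file(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_run_record(input_paths: Iterable[Path]) -> Dict[str, object]:
    """Collect a hypothetical provenance record for one analysis run."""
    return {
        # When the analysis finished, in UTC.
        "completed_at_utc": datetime.now(timezone.utc).isoformat(),
        # Version of the software environment (here, only the interpreter).
        "python_version": platform.python_version(),
        # A checksum per input file, so later runs can confirm identical inputs.
        "inputs": {str(p): sha256_of_file(Path(p)) for p in input_paths},
    }


# Demonstration with a throwaway input file.
with tempfile.TemporaryDirectory() as workdir:
    example_input = Path(workdir) / "phenotypes.csv"
    example_input.write_text("id,value\n1,0.5\n")
    print(json.dumps(build_run_record([example_input]), indent=2))
```

Storing such a record next to the results makes it possible to check later whether a rerun used the same inputs and software version; a container image extends the same idea to the entire software environment.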

In the elective course HGEN 8383, we teach our students how to deal with the difficult questions of rigor and reproducibility in a modern biobank. Rigor in diagnosis and phenome studies has long been a special emphasis at Vanderbilt because of research in our biobank (BioVU). Among the key points taught in this course is that laboratory values and vital signs are part of EHR data for a reason: there is often some suspicion of a particular disease or diagnosis that prompts a physician to order a laboratory test or to take vital signs. This differs from data from carefully controlled prospective studies, where the data acquisition can be designed to avoid bias. Students in this course are trained to carry out and quantify assessments of phenotype algorithms through comparisons against gold-standard manual review by trained clinicians.
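A comparison of a phenotype algorithm against manual review is commonly summarized with measures such as positive predictive value and sensitivity. The Python sketch below computes these from paired labels; the ten toy patients and the function name are hypothetical and are not drawn from BioVU or the course.

```python
from typing import Dict, Sequence


def evaluate_phenotype_algorithm(
    algorithm_calls: Sequence[bool], chart_review: Sequence[bool]
) -> Dict[str, float]:
    """Compare algorithm calls with gold-standard manual review labels."""
    if len(algorithm_calls) != len(chart_review):
        raise ValueError("Both label sequences must cover the same patients.")
    pairs = list(zip(algorithm_calls, chart_review))
    tp = sum(a and g for a, g in pairs)            # true positives
    fp = sum(a and not g for a, g in pairs)        # false positives
    fn = sum(not a and g for a, g in pairs)        # false negatives
    tn = sum(not a and not g for a, g in pairs)    # true negatives
    return {
        "ppv": tp / (tp + fp),          # of algorithm positives, fraction confirmed
        "sensitivity": tp / (tp + fn),  # of true cases, fraction detected
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical labels for ten reviewed patients.
algorithm = [True, True, True, True, True, False, False, False, False, False]
review = [True, True, True, False, False, True, False, False, False, False]

metrics = evaluate_phenotype_algorithm(algorithm, review)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these toy labels the algorithm has three true positives, two false positives, one false negative, and four true negatives. The sketch assumes each denominator is non-zero; a real evaluation would also report confidence intervals, since review samples are usually small.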

In both the elective HGEN 8383 and the required HGEN 8341, our trainees are alerted to the pitfalls of using polygenic risk scores in association studies across the medical phenome (PheWAS) or across the set of clinically measured laboratory values (LabWAS). For example, a polygenic risk score for schizophrenia may show associations with several phenotypes that are common complications of drugs used to treat schizophrenia unless patients with schizophrenia are removed from the analysis as part of the study design.
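The schizophrenia example can be made concrete with a deliberately simple Python sketch. It uses a hand-built toy cohort (all values hypothetical) and a crude difference in mean risk score in place of the regression models a real PheWAS would use.

```python
from statistics import mean
from typing import List, NamedTuple


class Patient(NamedTuple):
    prs: float                # polygenic risk score for schizophrenia
    has_schizophrenia: bool
    has_side_effect: bool     # a phenotype that is a known drug complication


# Hypothetical cohort: the side effect is concentrated in treated cases,
# who also carry higher risk scores.
COHORT: List[Patient] = [
    Patient(1.5, True, True), Patient(1.2, True, True),
    Patient(1.8, True, False), Patient(1.1, True, True),
    Patient(0.2, False, False), Patient(-0.3, False, False),
    Patient(0.1, False, True), Patient(-0.1, False, False),
    Patient(0.0, False, True), Patient(0.4, False, False),
    Patient(-0.2, False, False), Patient(0.3, False, False),
]


def mean_prs_difference(cohort: List[Patient]) -> float:
    """Mean score in patients with the phenotype minus mean in those without."""
    with_phenotype = [p.prs for p in cohort if p.has_side_effect]
    without_phenotype = [p.prs for p in cohort if not p.has_side_effect]
    return mean(with_phenotype) - mean(without_phenotype)


# Naive analysis on everyone versus the design that removes cases.
full_cohort_difference = mean_prs_difference(COHORT)
non_cases = [p for p in COHORT if not p.has_schizophrenia]
restricted_difference = mean_prs_difference(non_cases)

print(f"all patients:        {full_cohort_difference:.2f}")
print(f"schizophrenia removed: {restricted_difference:.2f}")
```

In this toy cohort the apparent association between the risk score and the side-effect phenotype is driven entirely by the treated cases: the difference in mean score is clearly positive in the full cohort and essentially zero once patients with schizophrenia are removed.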

As these examples show, a part of our instruction in methods for enhancing reproducibility in science is structured and systematic. But part of our instruction is also about how to think about interpretation of results from complex systems, and to look for systematic biases that may compromise the interpretation of results.

Training in rigorous experimental design and data interpretation: This topic is specifically addressed in the course HGEN 8341, with regard to genome-wide association studies and other genome-wide methods such as polygenic risk scores, and in the course HGEN 8383. Teaching proper experimental design and proper data interpretation methods is a fundamental part of the science of human genetics. We assess our students' understanding of these fundamental concepts in phase 2 of our Qualifying Exam, where the students prepare a 5-page written research plan for their proposed thesis research and then defend that proposal in an oral exam. The primary focus of that phase of the qualifying exam is an assessment of the rigor of the research plan and the details of the planned data analysis and interpretation.

Rigor and reproducibility training in the context of mentored research. Once our students pass their qualifying exam, the assessment of rigorous experimental design and proper data interpretation in the student’s research continues through the activity of the student’s thesis committee. The student meets with this committee at least every 6 months to present their research work. Much of the content of these committee meetings revolves around proper study design and advice on how that study design could be improved. As the student progresses in their research, the focus shifts increasingly toward proper data interpretation, with discussions of potential unrecognized biases affecting the student’s analyses.

Consideration of relevant biological variables such as sex. This is a fundamental aspect of almost all modern human genetics research and is dealt with in all aspects of our PhD training.

Data & material sharing and record keeping. For our students carrying out computational research, the requirements of data sharing are a fundamental part of their training. This is dealt with in the courses HGEN 8371 (teaching scientific writing) and HGEN 8394 (Python programming). Students carrying out wet-lab research in our program often deal with cell lines and/or animal models for which material sharing is fundamental. Education on this practical topic is carried out through the course of their mentored research by both the student’s primary mentor and the members of their thesis committee.

Transparency in reporting: This topic is treated throughout the scientific writing course (HGEN 8371).  The students also deal with this issue directly in the annual research-in-progress talk that each student is required to give and at which they are rigorously questioned by both the faculty and their trainee peers. All other experiences that the students have in reporting their research results (at group meetings, national and international conferences, and to their thesis committees) give the students direct experience in the importance of transparency in reporting. This process culminates in the preparation and defense of the PhD thesis.

Further institutional resources for Rigor and Reproducibility education: VUMC and VU have created many resources to enable trainees to improve their understanding of concepts related to research reproducibility, and the VGI keeps track of these resources to facilitate their use in relevant courses and for self-study by our students (and faculty) in preparation for preliminary and qualifying exams. These concepts are also emphasized in grant writing workshops, provided throughout the institution for both K- and R-awards, that students often take as they near their PhD defense. These workshops cover how to write text that appropriately assesses the rigor of prior research related to the premise of the research, how to build scientific rigor into study design and interpretation of results, transparency in reporting data and results, and how to ensure that your studies can be reproduced.

Specific VUMC resources that can be tapped for Rigor and Reproducibility training include the Translational Bridge Seminars that occur monthly and include seminars on unconscious bias, responsible collaboration, responsible authorship, and how to write career development award applications. The Vanderbilt Institute for Clinical and Translational Research offers Studios to provide expert consultation on research design, recruitment strategies, implementation strategies and dissemination of research results. We encourage trainees to request a studio prior to completing an important manuscript. Biostatistics Clinics are offered monthly by the Dept. of Biostatistics to give anyone who needs it feedback from biostatisticians on data analysis and interpretation, and on transparent presentation of results.