Importance of Model Selection in Systems Biology

The rapidly growing field of systems biology attempts to develop mathematical models that both describe and predict complex cellular behavior. Our increasing sophistication in computational statistical approaches has made it almost routine to develop models that “fit” a dataset extremely well. However, most such models fail to predict system behavior under new conditions. A commonly proposed solution to this conundrum is to acquire more and better data, but even the ability to obtain thousands of measurements of biochemical processes with single-cell resolution has not solved the problem.

This led Vanderbilt Basic Sciences investigator Gregor Neuert, his collaborator Brian Munsky (Colorado State University), and their laboratories to hypothesize that the problem lies not in the quality of the data, but in the models themselves. The premise of this hypothesis is that most modeling approaches assume that the data, or the mean of the data, follow a normal (Gaussian) distribution. This assumption does not always hold, however, particularly for processes such as gene transcription, which are highly random and may vary widely from cell to cell.

To test this hypothesis, the investigators obtained a dataset capturing the gene transcription response of more than 65,000 individual yeast cells to a high-salt environment. They analyzed the data using four different models: three were based on the assumption that the data were normally distributed, and one, called finite state projection (FSP), was not. In every case, the researchers were able to develop a mathematical model that fit the data used for its development. However, only the FSP model was able to predict properties that had not previously been incorporated in the fitting process, such as the average number of full-length mRNAs generated per active transcription site.
The results clearly show that, before model development commences, it is extremely important to verify that the data fulfill every assumption built into the mathematical approach. Otherwise, the resulting model will fail to achieve its primary purpose: predicting complex biological behavior. The work is published in the journal Proceedings of the National Academy of Sciences U.S.A. [B. Munsky, et al., Proc. Natl. Acad. Sci. U.S.A., (2018) published online June 29, DOI: 10.1073/pnas.1804060115].
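
To make the point about the Gaussian assumption concrete, the following is a minimal Python sketch. It is not the investigators' FSP method and does not use their data. It simulates hypothetical single-cell mRNA counts from a simple two-state "bursty" process, fits a normal distribution to them by matching the mean and variance, and compares what that fit implies with what was actually simulated. The function names and every parameter value (p_on, lam_off, lam_on) are assumptions chosen purely for illustration.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_bursty_counts(n_cells, p_on=0.2, lam_off=0.5, lam_on=40.0, seed=0):
    """Simulate mRNA counts for n_cells cells (all parameters are hypothetical).

    Each cell is transcriptionally active with probability p_on; active cells
    draw counts from Poisson(lam_on), inactive cells from Poisson(lam_off).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_cells):
        lam = lam_on if rng.random() < p_on else lam_off
        counts.append(sample_poisson(rng, lam))
    return counts


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def summarize(counts):
    """Compare the simulated counts with a moment-matched Gaussian fit."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    std = math.sqrt(variance)
    return {
        "mean": mean,
        "variance": variance,
        # A Fano factor far above 1 signals strongly non-Poisson, bursty data.
        "fano_factor": variance / mean,
        "observed_zero_fraction": sum(1 for c in counts if c == 0) / n,
        # Probability the Gaussian assigns to the bin around a count of zero.
        "gaussian_zero_fraction": normal_cdf(0.5, mean, std)
        - normal_cdf(-0.5, mean, std),
        # Probability the Gaussian assigns to impossible negative counts.
        "gaussian_negative_mass": normal_cdf(-0.5, mean, std),
    }


if __name__ == "__main__":
    for name, value in summarize(simulate_bursty_counts(20000)).items():
        print(f"{name}: {value:.3f}")
```

Under these assumed parameters, the moment-matched Gaussian reproduces the mean and variance of the simulated counts by construction, yet it places a sizeable share of its probability on negative mRNA counts and greatly underestimates the fraction of cells with no transcripts at all. This is a toy version of the mismatch described above: a model can match summary statistics of a dataset while misrepresenting the underlying cell-to-cell distribution.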

 

 
