Optimization of search engines and post-processing approaches to maximize peptide/protein identification for high-resolution mass data.


AUTHORS

Tu C, Sheng Q, Li J, Ma D, Shen X, Wang X, Shyr Y, Yi Z, Qu J. Journal of Proteome Research. 2015 Sep 21.
  • NIHMSID: 101128775

ABSTRACT

The two key steps in analyzing proteomic data generated by high-resolution MS are database searching and post-processing. Although the two steps are interrelated, their combined effects and the optimization of these procedures have not been adequately studied. Here we investigated the performance of three popular search engines, SEQUEST, Mascot and MS Amanda, in conjunction with five filtering approaches: each engine's own score-based filtering, a group-based approach, local false discovery rate (LFDR), PeptideProphet, and Percolator. Eight datasets from various proteomes (e.g. E. coli, yeast and human), produced by various instruments with high-accuracy MS1 and high- or low-accuracy MS2 (LTQ-Orbitrap, Orbitrap-Velos, Orbitrap-Elite, Q-Exactive, Orbitrap-Fusion and Q-TOF), were analyzed. For all datasets, combinations involving Percolator achieved markedly more peptide/protein identifications at the same FDR level than the other twelve combinations. Among these, SEQUEST-Percolator performed slightly better than the other methods for datasets with low-accuracy MS2 (ion trap, IT), and MS Amanda-Percolator did so for datasets with high-accuracy MS2 (Orbitrap or TOF). Among approaches without Percolator, SEQUEST-Group performed best for datasets with MS2 produced by collision-induced dissociation (CID) and IT analysis; Mascot-LFDR gave more identifications for datasets generated by higher-energy collisional dissociation and analyzed in the Orbitrap (HCD-OT), and for HCD-IT datasets from the Orbitrap Fusion; and MS Amanda-Group excelled for the Q-TOF dataset and the Orbitrap Velos HCD-OT dataset. Therefore, if Percolator is not used, a specific combination should be chosen for each type of dataset.
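Comparisons "at the same FDR level" are commonly made with target-decoy estimation, where decoy matches approximate the number of false target matches above a score threshold. As an illustration only, the minimal sketch below assumes a single search against a concatenated target-decoy database, higher-is-better scores, and FDR estimated as decoys/targets; it counts the target PSMs accepted at a q-value cutoff. The function names are hypothetical, tied scores are not grouped (a simplification), and this is not the implementation used in Proteome Discoverer, Scaffold or BuildSummary.

```python
from typing import List, Tuple

def q_values(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Assign target-decoy q-values to (score, is_decoy) PSMs.

    Assumes higher scores are better. At each rank, FDR is estimated as
    (#decoys) / (#targets) among PSMs ranked at or above that position.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    decoys = targets = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # max(..., 1) avoids division by zero when the top hits are decoys
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at which this PSM would still be accepted,
    # i.e. the running minimum of FDR taken from the bottom of the list up.
    qs = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qs[i] = running_min
    return [(s, d, q) for (s, d), q in zip(ranked, qs)]

def targets_at_fdr(psms: List[Tuple[float, bool]], fdr: float = 0.01) -> int:
    """Number of target PSMs whose q-value is at or below the FDR cutoff."""
    return sum(1 for _, is_decoy, q in q_values(psms) if not is_decoy and q <= fdr)

# Toy usage: six PSMs, two of them decoys.
example = [(50, False), (45, False), (40, True), (38, False), (30, True), (20, False)]
print(targets_at_fdr(example, 0.01))
```

Evaluating every engine/filter combination with the same cutoff (here `fdr`) is what makes their identification counts comparable.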
Moreover, when technical replicates were analyzed with Percolator-associated combinations, a higher percentage of proteins was identified by multiple peptides and protein spectral counts varied less; Percolator therefore enhanced the reliability of both identification and quantification. The analyses were performed using the programs embedded in Proteome Discoverer and Scaffold, and an in-house algorithm, BuildSummary. These results provide guidelines for the optimal interpretation of proteomic results and for developing fit-for-purpose protocols under different situations.
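The abstract does not state exactly how spectral-count variation was quantified. One plausible sketch, assuming the coefficient of variation (sample standard deviation divided by mean) across technical replicates, computes that metric alongside the fraction of proteins supported by at least two distinct peptides. The input layout and function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Set

def multi_peptide_fraction(protein_peptides: Dict[str, Set[str]]) -> float:
    """Fraction of identified proteins supported by two or more distinct peptides."""
    if not protein_peptides:
        return 0.0
    multi = sum(1 for peps in protein_peptides.values() if len(peps) >= 2)
    return multi / len(protein_peptides)

def spectral_count_cvs(counts: Dict[str, List[int]]) -> Dict[str, float]:
    """Per-protein coefficient of variation of spectral counts across replicates.

    Proteins with fewer than two replicate values or a mean count of zero
    are skipped, since the CV is undefined for them.
    """
    cvs = {}
    for protein, reps in counts.items():
        if len(reps) >= 2 and mean(reps) > 0:
            cvs[protein] = stdev(reps) / mean(reps)
    return cvs

# Toy usage with hypothetical protein and peptide identifiers.
peptides = {"P1": {"AAK", "GGR"}, "P2": {"LLK"}, "P3": {"VVR", "TTK", "SSK"}, "P4": {"MMK"}}
replicate_counts = {"P1": [10, 10, 10], "P2": [2, 4, 6]}
print(multi_peptide_fraction(peptides))
print(spectral_count_cvs(replicate_counts))
```

Under these assumptions, a pipeline yielding a higher multi-peptide fraction and lower CVs on the same replicates would be the more reproducible one.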

