UWEE Tech Report Series

Filtering tandem mass spectra for quality


UWEETR-2012-0001

Author(s):
Sergey Feldman, Barbara Frewen, Michael J. MacCoss, and Maya R. Gupta

Keywords:
random forests, mass spectrometry, feature selection

Abstract

Accurate protein and peptide identification by database search depends on the quality of the tandem mass spectra being searched. Large numbers of low-quality spectra consume valuable computing resources and can decrease the overall accuracy of peptide and protein identifications. We present a fast spectrum quality filter, called French Press, that removes low-quality spectra without database searching. The filter's speed comes from a tuned random forest classifier and a greedily optimized subset of classification features, culled from features that appear in prior research on spectrum filtering and modeling. Results on diverse data sets of mass spectrometer runs show that the filter can remove roughly $50\%$ of low-quality spectra while retaining $99\%$ of identifiable spectra.
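The trade-off quoted in the last sentence of the abstract can be illustrated with a minimal sketch. It assumes that the classifier assigns each spectrum a quality score (higher meaning more likely identifiable) and that the filter discards spectra scoring below a cutoff; the abstract does not say how French Press sets its operating point. The function names and the toy scores are hypothetical and are not taken from the report.

```python
import math


def threshold_for_retention(scores, labels, retain=0.99):
    """Return the highest cutoff that keeps at least `retain` of the positives.

    scores: per-spectrum quality scores, higher = more likely identifiable
            (an assumption about the classifier output).
    labels: 1 for identifiable spectra, 0 for low-quality spectra.
    """
    positives = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positives:
        raise ValueError("need at least one identifiable spectrum")
    # Number of identifiable spectra that must survive the filter.
    must_keep = math.ceil(retain * len(positives))
    # The cutoff is the score of the last positive that must be kept.
    return positives[must_keep - 1]


def filter_rates(scores, labels, cutoff):
    """Return (fraction of positives retained, fraction of negatives removed)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    retained = sum(s >= cutoff for s in pos) / len(pos)
    removed = sum(s < cutoff for s in neg) / len(neg)
    return retained, removed


# Toy, made-up scores for illustration only (not data from the report).
scores = [0.9, 0.8, 0.7, 0.2, 0.75, 0.5, 0.3, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

cutoff = threshold_for_retention(scores, labels, retain=0.75)
retained, removed = filter_rates(scores, labels, cutoff)
```

On real data the cutoff would be chosen on held-out spectra with `retain=0.99`, and the fraction of low-quality spectra removed at that cutoff would then be read off, as in the figures quoted in the abstract.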
