An important prerequisite for the development and benchmarking of novel analysis methods is a well-designed, comprehensive LC-MS/MS data set. Here, we present a data set consisting of 59 LC-MS/MS analyses of 50 protein samples extracted individually from Escherichia coli K12 and spiked with different concentrations of bovine carbonic anhydrase II and/or chicken ovalbumin, according to a 2 × 3 full factorial design. Using the well-annotated and commonly used E. coli proteome as the sample background ensures that the complexity of the data is comparable to that of most current proteomic analyses. Data were acquired over a 2-month period using multiple reversed-phase columns and instrument calibrations, so that the data set reflects the real-life challenges faced when analyzing large proteomics data sets. Moreover, so-called "ground truth" data, comprising LC-MS/MS measurements of the pure spikes, are included in the data set. This manuscript describes this comprehensive benchmark data set, which is intended for the future development and evaluation of analysis methods and software.
© 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.