Posted by Patrick Emery (January 24, 2022)

Default or prof_prof? Peak picking Thermo .RAW data with Mascot Distiller

We supply a number of processing options files for each of the main vendor raw file types with Mascot Distiller. These .opt files are designed as a reasonable starting point for peak picking your own data, but to get the very best results you’ll need to tweak the parameters on a typical raw file from your instruments and then use those settings.

For Thermo .RAW files, we supply several sets of options, but the main two are:

default.ThermoXcalibur.opt
prof_prof.ThermoXcalibur.opt

Two support questions we are asked very often are: what are the differences between these options files, and which one is best to use?

Differences between the default and prof_prof processing options

The main difference between the two sets of processing options is how they treat MS/MS scans that are saved as centroids in your ‘raw’ data. The default settings take those centroids directly for the MS/MS peaklists, whereas the prof_prof processing options always uncentroid such scans and then carry out their own peak picking. As a result, the MS/MS peaklists generated using the prof_prof method carry additional information, such as fragment ion charge states. These are required if you wish to carry out de novo sequencing in Mascot Distiller, and they can be used to de-charge the peaklists (either in Distiller, or by Mascot if you have version 2.7 or later). De-charging is important if you are expecting charge states greater than 2+, as you would in top-down or middle-down experiments.
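To make the idea of uncentroiding more concrete, here is a minimal Python sketch that turns a list of centroids back into a sampled profile. It is a toy illustration only and is not the algorithm Mascot Distiller uses: the Gaussian peak shape, the `fwhm` value and the function and parameter names (`uncentroid`, `points_per_da`) are all assumptions made for the example. What it does show is that the reconstructed profile grows in proportion to the sampling density, which is one reason uncentroiding costs processing time.

```python
import math


def uncentroid(centroids, mz_min, mz_max, points_per_da, fwhm=0.02):
    """Toy reconstruction: add one Gaussian per centroid on a regular m/z grid.

    centroids      list of (mz, intensity) pairs
    points_per_da  sampling density of the reconstructed profile
    fwhm           assumed peak width in Da (hypothetical value)
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    n_points = int(round((mz_max - mz_min) * points_per_da)) + 1
    step = 1.0 / points_per_da
    grid = [mz_min + i * step for i in range(n_points)]
    profile = [0.0] * n_points
    for mz, intensity in centroids:
        # Only visit grid points within 4 sigma of the centroid.
        lo = max(0, int((mz - 4 * sigma - mz_min) * points_per_da))
        hi = min(n_points - 1, int((mz + 4 * sigma - mz_min) * points_per_da) + 1)
        for i in range(lo, hi + 1):
            profile[i] += intensity * math.exp(-0.5 * ((grid[i] - mz) / sigma) ** 2)
    return grid, profile


# One centroid in a 10 Da window, sampled at two different densities.
grid, profile = uncentroid([(105.0, 1000.0)], 100.0, 110.0, 600)
coarse_grid, coarse_profile = uncentroid([(105.0, 1000.0)], 100.0, 110.0, 200)
print(len(grid), len(coarse_grid))
```

A peak picker would then run on `profile` exactly as it would on data acquired in profile mode.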

In both cases, Distiller will use profile data for peak detection in the MS scans, which is required if you want to carry out quantitation using survey-scan-based methods (e.g. intensity-based label-free, SILAC, etc.).

Therefore, the answer to the question of which are the best processing options to use depends on how your MS/MS scans are saved (as centroids or profile data), and on exactly what you’re trying to do with the data.

MS/MS scans saved as profile data

This is the simple case: if your MS/MS scans are saved as profile data, you should use the prof_prof.ThermoXcalibur.opt settings as your starting point for peak picking. Processing is fast with either set of options, but you’ll get much better results using the prof_prof options. To illustrate this, we took a .RAW file which has the MS/MS scans saved as profile data, processed it using Mascot Distiller with default.ThermoXcalibur.opt and with prof_prof.ThermoXcalibur.opt, then searched the generated peaklists using identical search settings. Results are summarised in Table 1 below:

Processing options	#Sig. matches (1% FDR)	Processing time (HH:MM:SS)
default.ThermoXcalibur.opt	3820	00:01:45
prof_prof.ThermoXcalibur.opt	9329	00:03:18

Table 1: Comparison of search results and processing time using default and prof_prof options on a .RAW file with 42021 MS/MS scans which have been saved as profile data.

Although processing time almost doubled using prof_prof (from 1 min 45 s to 3 min 18 s), peak picking was still very fast and we got about 2.4 times as many significant matches (9329 compared with 3820). So, if you’re saving your MS/MS scans as profile data we’d strongly recommend using the prof_prof.ThermoXcalibur.opt options file as the starting point for your peak picking.

MS/MS scans saved as centroids

In general, we’d expect peaklists generated using the prof_prof options to give slightly higher scores when searched in Mascot than those produced using the default options. This is because peak picking using the Distiller MDRO library should result in a cleaner peak list with fewer noise peaks, and this improved signal-to-noise gives better Mascot scores for equivalent matches. However, because the centroids have to be uncentroided back into profile data to do this, it comes at the expense of increased processing time.

If you need the additional fragment ion peak information, such as the charge state for de novo searching or de-charging for higher charge states, then of course you have no choice but to uncentroid the MS/MS ‘raw’ data. For other use cases it’s likely to be a trade-off between speed and results.

To examine this further, we took four .RAW files from the same project in PRIDE, processed them using Mascot Distiller with either the default or prof_prof options, and then searched the generated peaklists using identical search settings. Results are summarised in Table 2 below:

Processing options	#Sig. matches (1% FDR)	Average score of significant peptides	# peptides score >=70	Processing time (HH:MM:SS)
default.ThermoXcalibur.opt	15981	40	1192	00:04:58
prof_prof.ThermoXcalibur.opt	16173	42	1353	03:57:39

Table 2: Comparison of search results and processing time using default and prof_prof options on a set of 4 .RAW files with a total of 99745 MS/MS scans which have been saved as centroids.

As you can see, using the prof_prof processing options instead of default gives an increase of approximately 1% in the number of significant matches at a 1% PSM FDR, and the average score of significant peptide matches rises from 40 to 42.

If we look at the higher scoring peptide matches, the effect is larger: 1353 matches with a score of 70 or above from the prof_prof processed peaklists, compared with 1192 from the default processing options, an improvement of ~13.5%. The average identity threshold for current releases of NCBInr is 70, so if you are searching a very large database, that is a significant improvement.

However, this comes at a large cost in processing time, which increases from ~5 minutes using default.ThermoXcalibur.opt to ~4 hours using prof_prof.ThermoXcalibur.opt. That is a high price to pay for slightly improved coverage of the data.
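For anyone who wants to check the trade-off on their own data, the arithmetic behind these percentages can be sketched in a few lines of Python. The numbers are copied from Table 2; the helper names (`to_seconds`, `pct_gain`) and variable names are just illustrative.

```python
def to_seconds(hhmmss):
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pct_gain(new, old):
    """Relative increase of new over old, in percent."""
    return 100.0 * (new - old) / old


# Values from Table 2.
default_run = {"matches": 15981, "score70": 1192, "time": "00:04:58"}
prof_prof_run = {"matches": 16173, "score70": 1353, "time": "03:57:39"}

match_gain = pct_gain(prof_prof_run["matches"], default_run["matches"])
score70_gain = pct_gain(prof_prof_run["score70"], default_run["score70"])
slowdown = to_seconds(prof_prof_run["time"]) / to_seconds(default_run["time"])

print(f"Significant matches: +{match_gain:.1f}%")
print(f"Matches scoring >= 70: +{score70_gain:.1f}%")
print(f"Processing time: x{slowdown:.0f}")
```

Swapping in the figures from your own runs gives a quick way to judge whether the extra matches justify the extra processing time.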

The prof_prof.ThermoXcalibur.opt processing options uncentroid the MS/MS centroids at a resolution of 600 points per Da, which is a high resolution that we’ve found gives good search results, but which accounts for a significant proportion of the processing time when processing centroided MS/MS scans. Table 3 below shows the effect of decreasing the uncentroiding resolution to 400 and 200 points per Da respectively:

Uncentroiding points per Da	#Sig. matches (1% FDR)	Processing time (HH:MM:SS)
200	15895	00:52:50
400	15913	02:09:15

Table 3: Effects of changing the uncentroiding points per Da used by the prof_prof processing options on the number of significant matches and processing time

As you can see, reducing the uncentroiding points per Da does give a substantial improvement in processing speed, but at the cost of some significant matches, whose number falls slightly below the 15981 obtained using the default processing options. However, if you need the fragment charge state information for de novo sequencing, or for de-charging the peaklists, the trade-off may be acceptable.
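The same comparison can be laid out side by side with a short Python sketch. The 200 and 400 rows come from Table 3; the 600 row is the prof_prof row of Table 2, since 600 points per Da is the setting used there. The variable and key names are illustrative only.

```python
def to_seconds(hhmmss):
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


DEFAULT_MATCHES = 15981  # default.ThermoXcalibur.opt, Table 2

# points per Da -> (significant matches at 1% FDR, processing time)
runs = {
    200: (15895, "00:52:50"),
    400: (15913, "02:09:15"),
    600: (16173, "03:57:39"),  # prof_prof row of Table 2
}

baseline_seconds = to_seconds(runs[600][1])
summary = {}
for points_per_da, (matches, elapsed) in runs.items():
    summary[points_per_da] = {
        "speedup_vs_600": baseline_seconds / to_seconds(elapsed),
        "matches_vs_default": matches - DEFAULT_MATCHES,
    }
    print(points_per_da, summary[points_per_da])
```

On this data set, processing time falls faster than the number of matches as the resolution is lowered, but only the 600 points per Da run beats the default options on significant matches.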

In general though, if you don’t require fragment ion charge state information, we’d recommend using the default.ThermoXcalibur.opt processing options with .RAW data files which have MS/MS scans saved as centroids. You’ll generally get good results from the peaklists if your data are good, and the processing time is significantly reduced.
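The recommendations in this article boil down to two questions, which can be written as a small Python helper. This is only a summary of the guidance above, not part of Mascot Distiller; the function name and its arguments are hypothetical, and the returned file is meant as a starting point to tune from rather than a final answer.

```python
DEFAULT_OPT = "default.ThermoXcalibur.opt"
PROF_PROF_OPT = "prof_prof.ThermoXcalibur.opt"


def suggest_options(msms_saved_as_profile, need_fragment_charge_states):
    """Suggest a starting options file for a Thermo .RAW data set.

    msms_saved_as_profile        True if the MS/MS scans are stored as profile data
    need_fragment_charge_states  True for de novo sequencing or de-charging
    """
    if msms_saved_as_profile:
        # Profile MS/MS data: prof_prof gave far more matches for little extra time.
        return PROF_PROF_OPT
    if need_fragment_charge_states:
        # Centroided MS/MS data must be uncentroided to recover charge states.
        return PROF_PROF_OPT
    # Centroided MS/MS data, no charge states needed: favour speed.
    return DEFAULT_OPT


print(suggest_options(msms_saved_as_profile=False, need_fragment_charge_states=False))
```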

Keywords: Mascot Distiller, peak picking, Thermo
