Peak list arcana: Area vs. signal to noise
A recent paper from Hijazi et al. [1] raised some interesting points about how a peaklist file can differ from the appearance of the underlying raw file. In particular, it highlighted differences in the peaklists generated by Mascot Distiller if you output the signal to noise (S/N) as the fragment ion intensity instead of the default, which is the peak area.
In the paper, the authors are looking at data produced from highly modified histone peptides. These modifications produce low m/z, high intensity diagnostic ions which can help distinguish between different isobaric modification combinations as outlined in table 1 below:
| Lysine Modification | Mass (Diagnostic ion) | Mass (Immonium ion) |
|---|---|---|
| Non-modified | 84.0813 | 101.1079 |
| Methyl | 98.097 | 115.1236 |
| Formyl | 112.0757 | 129.1023 |
| Acetyl | 126.0917 | 143.1183 |
| Propionyl | 140.1075 | 157.1341 |
| Butyryl | 154.1226 | 171.1492 |
| Crotonyl | 152.107 | 169.1336 |
| Hydroxybutyryl | 170.1176 | 187.1442 |
| Lactyl | 156.1019 | 173.1285 |
Table 1: Diagnostic and immonium ion masses for various common histone lysine modifications (adapted from Hseiky et al. [2])
Their raw data are saved in profile mode for both the survey and MS/MS scans. When the peaklists generated by Mascot Distiller were exported with the default setting of fragment ion areas as the peak intensity value, Hijazi et al. found that the intensity of the diagnostic ions was apparently suppressed when compared with the raw data, while the higher mass fragments were closer to the raw trace. If they exported using the alternative setting of S/N, this was reversed – the lower mass diagnostic ions were now reported with high intensity values while the higher mass fragment ions appeared to be suppressed. An example of this is shown in figure 1 below:
Figure 1: Comparison of the same spectrum output with peak area or S/N output as the fragment ion intensity value
Possible diagnostic peaks for Methyl, Acetyl and Propionyl can be clearly seen in the spectrum using S/N, but appear far less intense in the spectrum using peak area. In the higher mass region, the fragment ions appear much more intense in the spectrum using peak area than in the spectrum using S/N.
This is because these lower mass diagnostic ions, while having a high maximum intensity, are in fact very narrow, so the peaks contain quite a limited amount of total signal (area) but have very good S/N. At the higher mass end, the situation is reversed, as the fragment ions are, on average, wider but with a lower maximum intensity.
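As a rough illustration of why the two measures can rank peaks differently, the sketch below models each peak as a Gaussian and compares area with S/N. The peak heights, widths and noise level are hypothetical numbers chosen for illustration, and treating S/N as apex height divided by a constant noise level is an assumption, not a description of how Mascot Distiller calculates it.

```python
import math


def gaussian_area(height, fwhm):
    """Area under a Gaussian peak given its apex height and full width at half maximum."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return height * sigma * math.sqrt(2.0 * math.pi)


def signal_to_noise(height, noise):
    """Simplified S/N: apex height over a constant noise level (an assumption)."""
    return height / noise


# Hypothetical values, for illustration only
noise = 100.0
diagnostic = {"height": 50_000.0, "fwhm": 0.002}  # tall, narrow low m/z peak
fragment = {"height": 20_000.0, "fwhm": 0.010}    # shorter, wider high m/z peak

diag_area = gaussian_area(**diagnostic)
frag_area = gaussian_area(**fragment)
diag_sn = signal_to_noise(diagnostic["height"], noise)
frag_sn = signal_to_noise(fragment["height"], noise)

print(f"diagnostic ion: area={diag_area:.1f}  S/N={diag_sn:.0f}")
print(f"fragment ion:   area={frag_area:.1f}  S/N={frag_sn:.0f}")
```

With these numbers the narrow peak has the higher S/N but the smaller area, which is the same reversal seen between the two export settings.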
One question is what difference this makes to the Mascot search results. When matching a spectrum, Mascot splits the peaklist into 100 Da bins and selects up to the ten most intense peaks in each bin. Changing the peaklist from area to S/N can therefore change either the peaks that are selected or the order in which they are picked. There is also a factor in the scoring which accounts for unmatched intensity, and again this could be affected by changing between the two intensity options.
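The bin-and-select step described above could be sketched as follows. This is a simplified illustration and not Mascot's actual implementation: the function name `select_peaks`, the assumption that bins start at 0 Da, and the example peaks are all hypothetical. A `top_n` of 2 is used in the example so that a three-peak spectrum is enough to show the effect.

```python
from collections import defaultdict


def select_peaks(peaks, bin_width=100.0, top_n=10):
    """Keep up to top_n most intense peaks per m/z bin.

    peaks: list of (mz, intensity) tuples. Returns the kept peaks sorted by m/z.
    Bin boundaries are assumed to start at 0 Da.
    """
    bins = defaultdict(list)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)].append((mz, intensity))

    selected = []
    for index in sorted(bins):
        ranked = sorted(bins[index], key=lambda p: p[1], reverse=True)
        selected.extend(ranked[:top_n])
    return sorted(selected)


# The same three hypothetical peaks, reported once as area and once as S/N
area_peaks = [(126.0917, 50.0), (147.1128, 120.0), (175.1190, 200.0)]
sn_peaks = [(126.0917, 400.0), (147.1128, 90.0), (175.1190, 150.0)]

area_kept = [mz for mz, _ in select_peaks(area_peaks, top_n=2)]
sn_kept = [mz for mz, _ in select_peaks(sn_peaks, top_n=2)]
print("area:", area_kept)
print("S/N: ", sn_kept)
```

In this toy case the peak at 126.0917 is dropped when intensities are areas but kept, at the expense of the peak at 147.1128, when they are S/N values.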
To test this, we downloaded one of the files used in the paper from PXD057347, processed it in Mascot Distiller 2.8.5.1 using the shipped prof_prof.ThermoXcalibur.opt options and searched against a histone database, once using S/N as the fragment ion intensity values, once using the area. Results are summarised in table 2 below:
| Intensity value type | Significant PSMs (1% FDR) | Protein score for Histone H3.3-like protein |
|---|---|---|
| Signal to noise | 520 | 5827 |
| Peak area | 574 | 6722 |
Table 2: Comparison between using signal to noise and total peak area as the fragment intensity value in a Mascot search.
Using peak area as the intensity value gives us better results in this case. With S/N, the low m/z region contains apparently high intensity peaks that Mascot does not match, as they’re not part of either the b- or y-ion series being used for the peptide sequence match. These peaks dominate the low m/z bins, and since they’re unmatched, the penalty term Mascot applies for unmatched intensity is higher than in the area case, where the same ions are much less intense. This can be seen in the example matches in figure 1: using S/N the match gets a score of 23, while using peak area gives a score of 36, despite the same fragment ions being used to get the match.
So, for these data, using peak area gives the most accurate representation of the actual signal in the spectra, leading to better Mascot results. However, it does make the diagnostic ions for the various modifications much harder to spot on the peptide view report, so you may wish to use S/N and accept the slight reduction in the quality of the search results. Another option would be to pre-process the peaklists prior to searching to add text to the query title to highlight which diagnostic ions are present in the spectrum. This could be done as a script and run in Mascot Daemon before the search is submitted. For a similar approach, see this blog article.
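A minimal sketch of such a pre-processing script is shown below, assuming the peaklist is in MGF format with a `TITLE=` line in each query. The diagnostic ion masses are taken from table 1; the 0.005 Da tolerance, the ` diag=` suffix and the function names are hypothetical choices for illustration, and how the script is hooked into Mascot Daemon is not covered here.

```python
# Diagnostic ion masses from table 1
DIAGNOSTIC_IONS = {
    "Non-modified": 84.0813, "Methyl": 98.097, "Formyl": 112.0757,
    "Acetyl": 126.0917, "Propionyl": 140.1075, "Butyryl": 154.1226,
    "Crotonyl": 152.107, "Hydroxybutyryl": 170.1176, "Lactyl": 156.1019,
}


def find_diagnostic_ions(peaks, tol=0.005):
    """Return names of diagnostic ions with a peak within tol Da (tol is an assumption)."""
    return [name for name, mass in DIAGNOSTIC_IONS.items()
            if any(abs(mz - mass) <= tol for mz, _ in peaks)]


def _annotate_block(block, tol):
    """Append a note to the TITLE line of one BEGIN IONS ... END IONS block."""
    peaks = []
    for text in block:
        parts = text.split()
        # Peak lines start with a digit: "m/z intensity [charge]"
        if len(parts) >= 2 and parts[0][0].isdigit():
            peaks.append((float(parts[0]), float(parts[1])))
    found = find_diagnostic_ions(peaks, tol)
    if not found:
        return block
    note = " diag=" + ",".join(found)
    return [t + note if t.startswith("TITLE=") else t for t in block]


def annotate_mgf(lines, tol=0.005):
    """Return MGF lines with diagnostic ion notes added to each query title."""
    out, block, in_query = [], [], False
    for line in lines:
        text = line.rstrip("\n")
        if text.strip() == "BEGIN IONS":
            in_query, block = True, [text]
        elif in_query and text.strip() == "END IONS":
            block.append(text)
            out.extend(_annotate_block(block, tol))
            in_query = False
        elif in_query:
            block.append(text)
        else:
            out.append(text)
    if in_query:  # unterminated block: pass it through unchanged
        out.extend(block)
    return out


example = ["BEGIN IONS", "TITLE=scan 1", "PEPMASS=500.0", "CHARGE=2+",
           "126.0915 1000", "175.1190 500", "END IONS"]
annotated = annotate_mgf(example)
print("\n".join(annotated))
```

To process a file, the lines can be read with `open(path)` and the returned list written back out with newline separators. Queries with no diagnostic ions are left untouched, so the search input is otherwise identical to the original peaklist.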
References:
- Hijazi, H.; Manessier, J.; Brugiere, S.; Ravnsborg, T.; Courçon, M.; Brule, B.; Merienne, K.; Jensen, O.N.; Hesse, A.M.; Bruley, C.; Pflieger, D. Mind Your Spectra: Points to be Aware of When Validating the Identification of Isobaric Histone Peptidoforms. J Proteome Res. 2025. https://doi.org/10.1021/acs.jproteome.4c01056
- Hseiky, A.; Crespo, M.; Kieffer-Jaquinod, S.; Fenaille, F.; Pflieger, D. Small Mass but Strong Information: Diagnostic Ions Provide Crucial Clues to Correctly Identify Histone Lysine Modifications. Proteomes 2021, 9, 18. https://doi.org/10.3390/proteomes9020018
Keywords: Mascot Distiller, peak picking, site analysis