Posted by Patrick Emery (January 21, 2021)

NIST Human HCD Spectral Libraries

Mascot 2.6 and later can search spectral libraries using the MSPepSearch spectral library search engine from the US National Institute of Standards and Technology (NIST). Spectral libraries can be searched alongside FASTA sequence databases to give an integrated report, and you can easily generate spectral libraries from your own search results.

When we introduced spectral library searches in Mascot 2.6, we also added a number of predefined Database Manager definitions for some freely available and commonly used spectral libraries, including libraries from NIST and the European Bioinformatics Institute (EBI). One of these was the “NIST_Human_HCD” library, a consensus library derived from over 10,000 raw data files. Until now, the definition of this library in Mascot pointed to the release from 2016/05/03. However, NIST updated the Human HCD library in May 2020, and as part of that update the spectra were split by quality into three separate libraries:

Library               Description
human_hcd_tryp_best   high-quality spectra, mostly tryptic peptides without missed cleavages
human_hcd_tryp_good   medium-quality spectra, mostly tryptic peptides with missed cleavages
human_hcd_semitryp    high- and medium-quality spectra, mostly semi-tryptic peptides

Table 1: Descriptions of the three NIST Human HCD consensus spectral libraries released in May 2020

According to the information on the NIST website, “the new libraries of consensus spectra contain 86% more peptides than the previous version with 4-15% increase in the number of positive IDs returned for typical samples at FDR=0.01”.

To confirm whether or not we see the expected improvement from the updated libraries, we searched the iPRG2016 dataset against the 2016 NIST_Human_HCD library, then separately against each of the newer libraries, and finally against all three of the updated libraries combined. We then counted the peptide-spectrum matches (PSMs) and peptide-sequence matches with scores above the default threshold of 300. Results are summarised in Table 2 below:

Library                                                      No. significant PSMs   No. significant sequences
Human_HCD 2016                                               2883                   505
human_hcd_tryp_best                                          3421                   513
human_hcd_tryp_good                                          805                    140
human_hcd_semitryp                                           850                    216
human_hcd_tryp_best+human_hcd_tryp_good+human_hcd_semitryp   5056                   829

Table 2: The number of peptide-spectrum and peptide-sequence matches with scores greater than the default threshold of 300 from the iPRG2016 dataset searched against each spectral library separately and then against all three of the 2020 library releases.
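
To make the counting rule concrete, here is a minimal Python sketch of how the two columns could be tallied from a list of matches. The record layout (spectrum identifier, peptide sequence, score) and the example values are hypothetical and are not a Mascot export format; only the threshold of 300 and the "greater than" comparison come from the text above.

```python
from typing import NamedTuple


class Match(NamedTuple):
    """One library match. Field names are hypothetical, not a Mascot format."""
    spectrum: str   # identifier of the query spectrum
    sequence: str   # peptide sequence of the matched library entry
    score: float    # spectral library search score


def count_significant(matches, threshold=300):
    """Return (number of PSMs, number of distinct sequences) with score > threshold."""
    significant = [m for m in matches if m.score > threshold]
    return len(significant), len({m.sequence for m in significant})


# Made-up records, for illustration only
example = [
    Match("scan_1", "LVNELTEFAK", 512.0),
    Match("scan_2", "LVNELTEFAK", 430.0),  # same sequence, second spectrum
    Match("scan_3", "AEFVEVTK", 301.0),
    Match("scan_4", "YLYEIAR", 120.0),     # below threshold, not counted
]

psms, sequences = count_significant(example)
print(psms, sequences)
```

Several spectra can match the same peptide, which is why the sequence count in Table 2 is always much smaller than the PSM count.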

From these results, we can see that searching just the updated ‘Best’ library gave us an additional 538 PSMs but only 8 new sequences compared with the 2016 library, so there is a strong overlap in the results of those two searches, and presumably in the peptides represented in the two libraries. The average quality of the spectra in the ‘Best’ library does appear to exceed that of the original 2016 release, though. We can also see that there is very little overlap in the peptide sequences identified by the searches against the ‘Best’, ‘Good’ and ‘Semi-tryptic’ spectral libraries. By searching all three of the updated libraries, we gain an additional 2173 PSMs and 324 peptide sequences over the 2016 library.
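
The differences quoted above follow directly from Table 2. The short Python sketch below reproduces them; the dictionary keys are labels chosen here for illustration, and the overlap figure is only a rough indicator, since it assumes the combined search reports the union of the three separate searches.

```python
# Counts copied from Table 2: (significant PSMs, significant sequences)
table2 = {
    "Human_HCD 2016": (2883, 505),
    "human_hcd_tryp_best": (3421, 513),
    "human_hcd_tryp_good": (805, 140),
    "human_hcd_semitryp": (850, 216),
    "combined 2020": (5056, 829),  # label chosen here for the three-library search
}


def gain(library, baseline="Human_HCD 2016"):
    """Extra PSMs and sequences found with `library` compared with `baseline`."""
    psms, seqs = table2[library]
    base_psms, base_seqs = table2[baseline]
    return psms - base_psms, seqs - base_seqs


best_gain = gain("human_hcd_tryp_best")   # (538, 8)
combined_gain = gain("combined 2020")     # (2173, 324)

# Rough overlap estimate: sequences counted more than once across the three
# separate searches, assuming the combined search reports their union.
separate = ["human_hcd_tryp_best", "human_hcd_tryp_good", "human_hcd_semitryp"]
redundant_sequences = sum(table2[name][1] for name in separate) - table2["combined 2020"][1]

print(best_gain, combined_gain, redundant_sequences)
```

The three separate searches report 869 sequences in total against 829 for the combined search, a difference of 40, which is consistent with the small overlap described above.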

We have updated the NIST_Human_HCD definition to download the human_hcd_tryp_best library and added new definitions for NIST_Human_HCD_2_good and NIST_Human_HCD_3_semitryp. If you have enabled NIST_Human_HCD in Database Manager, updating is as simple as clicking the Update or Get New Files button. If you are using Mascot 2.6, there is a known issue with modification handling for the updated libraries, and we recommend updating to Mascot 2.7 if you wish to use the NIST libraries.
