Mascot 2.6 and later can search spectral libraries in parallel with a FASTA database. This page deals with opening spectral library search results using the standard Parser interface. If you need to read spectral library files (in MSP format), see Classes for reading and writing spectral libraries.
By default, Parser opens spectral library results in backwards-compatible mode, where spectral library matches are invisible. Enabling spectral library support in code using Parser is more involved than just setting a constructor flag. Please study the list of methods under Spectral library API summary. Many of the methods listed either take a new argument in Parser 2.6 to differentiate between FASTA and library matches, or they interpret existing arguments differently depending on mode.
For example, ms_peptidesummary::getPeptideThreshold() takes an optional rank parameter. You should always specify it, because Mascot and library matches have different score thresholds. If you don't give a rank to the method, it will choose the score threshold based on the rank 1 match.
To detect whether a results file contains spectral library matches, the easiest test is ms_mascotresfile::anyPeptideSummaryMatches(). If the test returns true for SEC_LIBRARYPEPTIDES, the results file has a non-empty section containing spectral library matches and the file can be opened as a spectral library search.
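Three modes are available: FASTA-only (the backwards-compatible default, in which spectral library matches are invisible), SL-only (MSPEPSUM_SL_ONLY) and integrated (MSPEPSUM_SL_INTEGRATED).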
The helper function ms_mascotresfile::get_ms_mascotresults_params() does not set MSPEPSUM_SL_ONLY or MSPEPSUM_SL_INTEGRATED automatically. This means code written for Parser 2.5 and earlier always opens spectral library search results in FASTA-only mode.
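One possible policy for picking a mode can be sketched as plain logic. This is an illustration only and does not call Parser: `choose_open_mode` and the returned mode names are hypothetical, and the two booleans stand in for whatever tests you run on the file, for example ms_mascotresfile::anyPeptideSummaryMatches() with SEC_LIBRARYPEPTIDES for the library side.

```python
def choose_open_mode(has_fasta_matches: bool, has_library_matches: bool) -> str:
    """Pick a mode name for opening a results file (hypothetical helper).

    The returned strings are labels for this sketch, not Parser constants.
    """
    if not has_library_matches:
        # No library section to show: stay in the backwards-compatible mode.
        return "FASTA_ONLY"
    if has_fasta_matches:
        # Both kinds of match are present: show them side by side.
        return "SL_INTEGRATED"
    # Only library matches are present.
    return "SL_ONLY"


mode = choose_open_mode(has_fasta_matches=True, has_library_matches=True)
```

In real code, the chosen label would decide whether to add MSPEPSUM_SL_ONLY or MSPEPSUM_SL_INTEGRATED to the flags yourself, since the helper function does not do so. A file containing both kinds of match can still be opened in any of the three modes; the policy above is a design choice, not a requirement.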
Parser 2.6 and Mascot 2.6 do not support the following search types in combination with spectral libraries:
(*) Reporter quantitation may work; it depends on spectral library matches having the correct set of modifications (see How modifications are encoded). Other types of quantitation will not work with library matches.
The default value for minProbability, 0.05, corresponds to a score threshold of 300 (see Library scores and thresholds). To convert a raw score to a value suitable as minProbability, see ms_peptidesummary::getMinProbabilityForSLScore().
In Mascot 2.6, the spectral library search tool is the NIST MSPepSearch. Library scores range from 0 to 1000, with 0 meaning the observed spectrum and the library match are entirely dissimilar, and 1000 meaning they are identical. Library scores are on a very different scale from Mascot scores, which have no particular upper limit (although a score of a couple of hundred is extremely unusual). FASTA matches and library matches must therefore have different score thresholds.
If a query contains both FASTA and library matches, and you open the file in integrated mode, there are three thresholds of interest: the Mascot identity threshold, the Mascot homology threshold and the library score threshold.
The library score threshold is global -- it applies to all queries in the search -- whereas Mascot thresholds are specific to a query. However, in order to retain a compatible API, functions that return the identity threshold may return either the Mascot identity or the library score threshold, depending on function arguments. Functions that return the homology threshold will always return zero for spectral library matches.
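The relationship between the global library threshold and the per-query Mascot thresholds can be modelled in a few lines. This is a sketch of the behaviour described above, not the Parser API: the `Thresholds` class and both functions are hypothetical, and the Mascot threshold values are made up for illustration. The library value of 300 is the score threshold that corresponds to the default minProbability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    library_score: float    # one value for the whole search
    mascot_identity: dict   # query number -> identity threshold
    mascot_homology: dict   # query number -> homology threshold


def identity_threshold(t: Thresholds, query: int, is_library_match: bool) -> float:
    """Return the library threshold for library matches, else the query's own."""
    return t.library_score if is_library_match else t.mascot_identity[query]


def homology_threshold(t: Thresholds, query: int, is_library_match: bool) -> float:
    """Library matches have no homology threshold, so report zero for them."""
    return 0.0 if is_library_match else t.mascot_homology[query]


# Illustrative numbers only; real thresholds come from the results file.
t = Thresholds(
    library_score=300.0,
    mascot_identity={1: 42.0, 2: 38.5},
    mascot_homology={1: 30.0, 2: 27.0},
)
```

Note that the query number changes the answer only for Mascot matches: `identity_threshold(t, 1, True)` and `identity_threshold(t, 2, True)` return the same library value.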
For example, the following functions take a new argument that determines which threshold is of interest:
The expect value of a spectral library match is a function of its score and threshold, as usual.
For details on how the library score threshold is derived, see Advanced reading: calculating the spectral library score threshold.
Protein inference in SL-only or integrated mode uses the same rules as in FASTA-only mode and previous versions of Parser. The main differences are the source of protein data for spectral library matches, and the treatment of "ties" between otherwise equivalent accessions, e.g., when choosing a representative family member or sameset protein.
During a library search, a library match can be mapped to a FASTA accession, library accession or both. FASTA accession in this context means an accession from a FASTA database that is part of the same search. Library accession means an accession from the reference database of the spectral library, or an accession from the MSP file if the sequence was not found in the reference database; see the Mascot help for more information on reference databases.
The search can be opened in integrated or SL-only mode. In SL-only mode, spectral library matches appear only under library accessions. It is as if the FASTA database had not been searched at all, so protein inference is the same as in a single database search.
In integrated mode, a spectral library match can be mapped to a FASTA accession or a library accession or both. There are therefore potentially many more sameset and subset proteins. The situation is even more complicated if the spectral library has a reference database that is not part of the search.
Here are some of the possible database, library and reference database combinations:
The string in parentheses (AA, SL, SLREF) is the database type. The numbers refer to the database number. This is normally in the interval 1..ms_searchparams::getNumberOfDatabases(), except that reference databases have index numbers after this range. It is possible for a FASTA database to be both part of the search and a spectral library's reference database.
Parser assigns reference accessions the database number of the relevant spectral library, so most of the time you do not need to worry about the difference. For example, in case C of two libraries with the same reference database, an accession from SwissProt (e.g., KPYK1_YEAST) can appear under either database number, as "1::KPYK1_YEAST" and "2::KPYK1_YEAST". Matches from the first library searched appear under "1::KPYK1_YEAST", and matches from the second under "2::KPYK1_YEAST". (Protein inference may of course make one a subset of the other.) Database numbers above ms_searchparams::getNumberOfDatabases() do not appear in ms_protein::getDB().
You can access the database type and the mapping between spectral library index and reference database index with ms_mascotresfile::getDatabaseType(), ms_mascotresfile::getReferenceDatabaseNumberOfSL() and ms_mascotresfile::getSLDatabaseNumbersOfReference().
Parser prioritises FASTA accessions over reference accessions and reference accessions over MSP accessions in protein inference. For example, if a FASTA accession and a reference accession have the same set of significant sequences, Parser will choose the FASTA accession as the representative protein and relegate the reference accession to sameset status. If a FASTA accession is the superset protein of a library accession (of either kind), the library accession is removed entirely. This prioritisation is necessary to simplify the output of protein inference and remove redundancies.
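The two prioritisation rules can be illustrated with a small model. This is a sketch under stated assumptions, not Parser's implementation: the function names, the source labels ("FASTA", "REFERENCE", "MSP"), the PROT_* accessions and the peptide sequences are all hypothetical, and "superset" is taken to mean that the FASTA accession's sequences strictly contain those of the library accession.

```python
# Lower number = higher priority when accessions are otherwise equivalent.
PRIORITY = {"FASTA": 0, "REFERENCE": 1, "MSP": 2}


def pick_representative(tied):
    """Return the (accession, source) pair that wins a tie.

    `tied` holds accessions that share the same set of significant sequences.
    """
    return min(tied, key=lambda entry: PRIORITY[entry[1]])


def drop_covered_library_accessions(proteins):
    """Remove library accessions whose sequences a FASTA accession strictly contains.

    `proteins` maps accession -> (source, frozenset of peptide sequences).
    """
    fasta_sets = [seqs for source, seqs in proteins.values() if source == "FASTA"]
    return {
        acc: (source, seqs)
        for acc, (source, seqs) in proteins.items()
        if source == "FASTA" or not any(seqs < fs for fs in fasta_sets)
    }


# A FASTA accession and a reference accession with identical sequences.
tied = [("2::KPYK1_YEAST", "REFERENCE"), ("1::KPYK1_YEAST", "FASTA")]
representative = pick_representative(tied)

# Made-up proteins: PROT_B is covered by the FASTA accession PROT_A.
proteins = {
    "1::PROT_A": ("FASTA", frozenset({"PEPTIDEA", "PEPTIDEB", "PEPTIDEC"})),
    "2::PROT_B": ("REFERENCE", frozenset({"PEPTIDEA", "PEPTIDEB"})),
    "2::PROT_C": ("MSP", frozenset({"PEPTIDED"})),
}
kept = drop_covered_library_accessions(proteins)
```

Here the FASTA accession becomes the representative of the tie, and `2::PROT_B` is dropped because a FASTA accession accounts for all of its sequences, while `2::PROT_C` survives because no FASTA accession covers it.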
The mass and description of reference accessions are not saved in the results file. This means ms_mascotresults::getProteinDescription() returns the empty string for reference accessions. The description of an MSP accession may or may not be available, depending on what the MSP file contained at the time of the search.
When you need to look up a protein attribute like mass, description, pI or taxonomy information from an external source, you need to know what the correct source database is. If the protein comes from a FASTA file (database type AA or NA), the source database is the FASTA file. But if the protein comes from a spectral library (database type SL), you need to target the query at the reference database. You can find the reference database number with ms_mascotresfile::getReferenceDatabaseNumberOfSL() and the reference database name with ms_searchparams::getDB().
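The lookup rule can be written as a short function. This is a minimal sketch that does not call Parser: `source_database_number` is a hypothetical helper, `db_types` stands in for the values returned by ms_mascotresfile::getDatabaseType(), and `reference_of_sl` stands in for ms_mascotresfile::getReferenceDatabaseNumberOfSL(). The database layout in the example is made up.

```python
def source_database_number(db_number, db_types, reference_of_sl):
    """Return the database number to query for protein attributes.

    db_types:        database number -> "AA", "NA" or "SL"
    reference_of_sl: spectral library number -> reference database number
    Returns None when a library has no reference database in the mapping.
    """
    db_type = db_types[db_number]
    if db_type in ("AA", "NA"):
        # The protein comes from a FASTA file, which is its own source.
        return db_number
    if db_type == "SL":
        # The protein comes from a library: go to its reference database.
        return reference_of_sl.get(db_number)
    raise ValueError(f"unexpected database type: {db_type!r}")


# Illustrative search: database 1 is FASTA, database 2 is a spectral library
# whose reference database has index 3 (outside the searched range).
db_types = {1: "AA", 2: "SL"}
reference_of_sl = {2: 3}

fasta_source = source_database_number(1, db_types, reference_of_sl)
library_source = source_database_number(2, db_types, reference_of_sl)
```

With this layout, a protein from database 1 is looked up in database 1, while a protein from the library in database 2 is looked up in reference database 3.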
Modifications are encoded differently between FASTA matches and spectral library matches. In a FASTA-only search, you can specify both fixed and variable modifications, as well as have query-level modifications and modifications as part of a quantitation method. In an SL-only or integrated library search, none of these modifications are used for spectral library matches.
Spectral library matches can contain their own modifications, which you can access with ms_mascotresults::getLibraryModString(). The modifications originate from the spectral library entry. The results file contains a list of all possible library modifications in the spectral libraries searched; see ms_searchparams::getLibraryModName(). It is possible for fixed and variable modifications to have the same name as a library modification, and vice versa. Whether they are the same modification depends entirely on how the spectral library was created.
The results file contains a back reference for each spectral library match that allows you to look up the corresponding library entry from the spectral library file. See ms_peptidesummary::getLibraryEntryId() for more details.
For more details on spectral library specific behaviour, see the following classes and functions:
Copyright © 2016 Matrix Science Ltd. All Rights Reserved. Generated on Fri Jun 2 2017 01:44:51