
Error tolerant searches

There are two types of error tolerant search, both described on the Matrix Science website. In the Mascot Parser documentation, these are referred to as the Error tolerant search and the Integrated error tolerant search. This section assumes familiarity with both types.

Original error tolerant search

In Mascot Server 1.8 and later, an error tolerant search may be run as a repeat search. In this case, one or more ACCESSIONs will have been specified, and the results file will just contain the error tolerant search results. The accessions in the search may be retrieved using ms_searchparams::getACCESSION().

In the peptide summary report, peptide matches are ignored if

Significance level is taken from the parameter ignoreIonsScoreBelow given to the ms_peptidesummary constructor. If this value is zero, then a default threshold of 1 in 20 is used.

The filename for the 'parent' non-error-tolerant search is saved in the parameters section as _errortolerantsearchparent, and can be accessed with ms_searchparams::getErrTolParentFilename(). It is not possible to use getINTERMEDIATE() because an error tolerant search can be performed as a repeat search from another error tolerant search. If MSRES_SHOW_ALL_FROM_ERR_TOL is not specified, and the file specified by _errortolerantsearchparent cannot be found, then the error ERR_NO_ERR_TOL_PARENT will be set, and the results will be shown as if MSRES_SHOW_ALL_FROM_ERR_TOL had been specified.

Always use standard protein grouping (MSRES_GROUP_PROTEINS) with manual error tolerant searches. If you enable protein clustering, the parent file is ignored and the results reported may be incorrect.

Integrated error tolerant search

In Mascot Server 2.2 and later, a single search can be performed which contains both the standard search results and the error tolerant search results. This is known as the integrated error tolerant search.

For an integrated error tolerant search, if MSRES_ERR_TOL is specified, then the results will just be taken from the error tolerant sections of the results file, and these results are handled in exactly the same way as for the Original error tolerant search. If MSRES_INTEGRATED_ERR_TOL is specified, then the results will contain matches from both the standard and error tolerant sections. If neither of these flags is specified, then the results will just be derived from the standard peptides section.
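
The flag rules above can be summarised in a small sketch. This is a plain-Python model of the selection logic only and does not call Mascot Parser. The function name sections_used and its two boolean arguments are hypothetical stand-ins for whether MSRES_ERR_TOL and MSRES_INTEGRATED_ERR_TOL were passed. The text does not say what happens when both flags are given, so the sketch raises an error rather than guessing.

```python
def sections_used(err_tol, integrated_err_tol):
    """Return which result sections feed the report for an integrated search.

    err_tol / integrated_err_tol are hypothetical booleans standing in for
    the presence of MSRES_ERR_TOL / MSRES_INTEGRATED_ERR_TOL.
    """
    if err_tol and integrated_err_tol:
        # Not covered by the documentation text, so this model refuses to guess.
        raise ValueError("behaviour with both flags set is not documented here")
    if err_tol:
        return ["error_tolerant"]
    if integrated_err_tol:
        return ["standard", "error_tolerant"]
    return ["standard"]
```

For example, sections_used(False, True) returns both section names, mirroring the case where only MSRES_INTEGRATED_ERR_TOL is specified.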

When MSRES_INTEGRATED_ERR_TOL is specified, the results are first combined at the query level, so there will be up to 20 matches for each query. The methods ms_peptide::getRank() and ms_protein::getPeptideP() will therefore return a number in the range 1 to 20. This rank value can be used for any ms_peptidesummary method that requires a 'p' (rank) value. Use the method ms_peptide::getIsFromErrorTolerant() to find out if the results are from the standard results section or the error tolerant section.
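
As a rough illustration of combining results at the query level, the sketch below merges the standard and error tolerant matches for one query and numbers them from 1. It is a plain-Python model, not Mascot Parser code. The function name, the (peptide, score) input tuples and the dictionary keys are all hypothetical, and ordering ranks by descending score (with standard matches first on ties) is an assumption made for the example.

```python
def combine_query_matches(standard, error_tolerant):
    """Merge two lists of (peptide, score) pairs for a single query.

    Assumption: rank follows descending score. The real ranking is done
    inside Mascot Parser and may apply further rules.
    """
    tagged = [(pep, score, False) for pep, score in standard]
    tagged += [(pep, score, True) for pep, score in error_tolerant]
    # sorted() is stable, so equal scores keep standard matches first.
    ordered = sorted(tagged, key=lambda m: m[1], reverse=True)
    return [
        {
            "rank": rank,                     # models ms_peptide::getRank()
            "peptide": pep,
            "score": score,
            "from_error_tolerant": from_et,   # models getIsFromErrorTolerant()
        }
        for rank, (pep, score, from_et) in enumerate(ordered, start=1)
    ]
```

With up to 10 matches in each input list, the resulting ranks fall in the range 1 to 20, as described above.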

Score thresholds and score filtering (Mascot Server 2.8 and later)

Error tolerant searches from Mascot Server 2.8 and later are subject to two kinds of filtering. First, if a query has a significant first pass match (its score exceeds the threshold), then error tolerant (ET) matches for that query are suppressed. This is equivalent to only searching unassigned/unmatched queries in the error tolerant pass.

Secondly, ET matches are suppressed if they score lower than the rank 1 first pass match. The flag MSRES_SHOW_ALL_FROM_ERR_TOL can be used to override this behaviour.
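
The two filtering rules can be written down as a single predicate. This is a minimal sketch in plain Python that models the rules as stated here and does not call Mascot Parser. The function and parameter names are hypothetical, the MSRES_SHOW_ALL_FROM_ERR_TOL override is not modelled, and treating a query with no first pass match as "nothing to compare against" is an assumption.

```python
def et_match_is_suppressed(et_score, first_pass_top_score, first_pass_threshold):
    """Return True if an ET match would be suppressed under the 2.8+ rules.

    first_pass_top_score is the score of the rank 1 first pass match for the
    query, or None if the query has no first pass match (an assumption of
    this sketch, not something the documentation spells out).
    """
    if first_pass_top_score is None:
        return False
    # Rule 1: the query already has a significant first pass match.
    if first_pass_top_score > first_pass_threshold:
        return True
    # Rule 2: the ET match scores lower than the rank 1 first pass match.
    return et_score < first_pass_top_score
```

For instance, with a threshold of 40, an ET match scoring 50 is kept when the best first pass score is 30, but suppressed when the best first pass score is 60.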

First and second pass results may have different significance thresholds. If the search is a target-decoy search, and the constructor parameter TargetFDR is defined, Parser chooses independent first and second pass significance thresholds that yield the target FDR. If TargetFDR is not defined, or the search is not a target-decoy search, Parser uses the same significance threshold for both passes (by default, 0.05).

The score threshold for queries searched in the error tolerant pass is derived from the combined first and second pass identity threshold (number of trials, or qmatch). The expect value of an ET match is calculated in the same way as first pass matches. See ms_mascotresults::getPeptideExpectationValue().

Protein score is derived from the highest scoring non-error-tolerant match for each query, and this value can be found by calling ms_protein::getPeptideIonsScore().

Score thresholds and score filtering (Mascot Server 2.7 and earlier)

If the error tolerant search is from Mascot Server 2.7 or earlier, different rules apply.

An error tolerant match is discarded if its score is below the average identity threshold (getAvePeptideIdentityThreshold()) or below the maximum standard result for the query. In practice, it is therefore rare to get 20 matches for a particular query: all 10 error tolerant matches would have to be the top 10 scores, and all would have to be above the average peptide identity threshold. To show all matches, including those that would be discarded by default, pass the flag MSRES_SHOW_ALL_FROM_ERR_TOL when constructing the ms_peptidesummary object.
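
The discard rule for these older searches can be sketched as follows. Again this is a plain-Python model of the rule as described, not Mascot Parser code. The function and parameter names are hypothetical; show_all stands in for MSRES_SHOW_ALL_FROM_ERR_TOL, and allowing None for a query with no standard result is an assumption of the sketch.

```python
def et_match_is_discarded(et_score, ave_identity_threshold,
                          max_standard_score=None, show_all=False):
    """Return True if an ET match would be discarded under the 2.7-and-earlier rules.

    show_all models MSRES_SHOW_ALL_FROM_ERR_TOL. max_standard_score is the
    best standard score for the query, or None if there is none (assumed).
    """
    if show_all:
        return False
    if et_score < ave_identity_threshold:
        return True
    return max_standard_score is not None and et_score < max_standard_score
```

So a match scoring 45 against an average identity threshold of 30 survives only if no standard match for the same query scores higher than 45.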

The average identity threshold is calculated using the minProbability value passed to the ms_peptidesummary constructor. If a value of <= 0 or >= 0.1 is passed to the constructor, then a default of 1 in 20 is assumed for calculating the threshold. The flag MSRES_MAXHITS_OVERRIDES_MINPROB should normally be used so that the maxHitsToReport value is not overridden by the minProbability value.
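
The defaulting rule for minProbability amounts to a simple range check. The helper below is a hypothetical plain-Python illustration of that check using the bounds quoted above; it is not part of Mascot Parser.

```python
def effective_min_probability(min_probability):
    """Return the probability used for the average identity threshold.

    Values <= 0 or >= 0.1 fall back to the default of 1 in 20, following
    the bounds quoted in the text above.
    """
    if min_probability <= 0 or min_probability >= 0.1:
        return 1 / 20
    return min_probability
```

Passing 0 or 0.5 therefore gives 0.05, while a value such as 0.01 is used unchanged.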

Statistical significance of error tolerant matches is not defined, so ms_mascotresults::getPeptideExpectationValue() returns -1.

Useful functions for both types of search

To determine if a search is an error tolerant search, use ms_mascotresfile::isErrorTolerant().

The following functions can be used from both peptide and protein summary to get information about error tolerant modifications or residue substitutions.

See also ms_mascotresfile::getNumEtSeqsSearched().


Copyright © 2022 Matrix Science Ltd.  All Rights Reserved. Generated on Thu Mar 31 2022 01:12:30