
Target-decoy searches and false discovery rate
[Mascot results file module]

Decoy search results

In Mascot Server 2.2 and later, any search can be run against a decoy database generated on the fly. Each sequence in the database is randomised or reversed and then searched along with the original sequence. Results for matches against the original sequence are stored as normal, while results against the randomised sequence are saved in the decoy_summary, decoy_mixture and decoy_peptides sections. The decoy algorithm type is saved in the header section and can be retrieved with ms_mascotresfile::getDecoyTypeForDB.

Switching between the decoy results and the standard results is straightforward. To view the decoy results, specify the MSRES_DECOY flag when creating the ms_peptidesummary or ms_proteinsummary object.

The functions anyPeptideSummaryMatches() and getNumHits() take an optional section name parameter, so they can also be used with decoy or error tolerant results.

Calculating the false discovery rate (FDR)

The false discovery rate is an estimate of the proportion of false discoveries (false positives) among all significant matches. The number of significant hits in the decoy database serves as the estimated number of false positives among the target hits. The FDR is then (number of decoy hits) / (number of target hits), with both counts taken at the same significance threshold.

The following methods return the number of hits at different thresholds:

For a peptide summary, the number of hits is the number of peptide matches, while for a protein summary, the number of hits is the number of protein matches.
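The ratio described above can be sketched in a few lines of Python. This is a minimal illustration only, not part of Mascot Parser: the function name estimate_fdr and the hit counts are hypothetical, and in practice the two counts would come from the target and decoy results at the same threshold.

```python
from typing import Optional


def estimate_fdr(num_decoy_hits: int, num_target_hits: int) -> Optional[float]:
    """Estimate the FDR as (number of decoy hits) / (number of target hits).

    Returns None when there are no target hits, because the ratio is undefined.
    """
    if num_decoy_hits < 0 or num_target_hits < 0:
        raise ValueError("hit counts must be non-negative")
    if num_target_hits == 0:
        return None
    return num_decoy_hits / num_target_hits


# Hypothetical counts at one significance threshold (not from a real search).
fdr = estimate_fdr(num_decoy_hits=12, num_target_hits=800)
if fdr is not None:
    print(f"Estimated FDR: {fdr:.2%}")
```

Returning None for an empty target set is a design choice of this sketch; a caller could equally decide not to display an FDR at all in that case.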

You can use ms_searchparams::getDECOY() to decide whether to display false discovery rates.

Automatically adjusting the significance threshold

You can use ms_mascotresults_params::setTargetFDR() to set a target FDR. Parser then chooses the significance threshold that yields an FDR as close as possible to the target. The chosen significance threshold is available through ms_mascotresults::getProbabilityThreshold().
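To make "as close as possible to the target" concrete, the following Python sketch scans candidate thresholds and keeps the one whose estimated FDR is nearest the target. It is not Parser's actual algorithm, which is not described here. It assumes each match carries an expect value and counts as significant when that value is at or below the threshold; the function names and the example values are hypothetical.

```python
from typing import List, Optional, Tuple


def count_significant(expect_values: List[float], threshold: float) -> int:
    """Count matches whose expect value is at or below the threshold."""
    return sum(1 for e in expect_values if e <= threshold)


def choose_threshold(
    target_expects: List[float],
    decoy_expects: List[float],
    target_fdr: float,
) -> Optional[Tuple[float, float]]:
    """Return (threshold, achieved_fdr) with the FDR closest to target_fdr.

    Candidate thresholds are the observed expect values, because the hit
    counts only change at those points. Returns None if no candidate
    threshold yields any target hits.
    """
    best: Optional[Tuple[float, float]] = None
    for threshold in sorted(set(target_expects) | set(decoy_expects)):
        n_target = count_significant(target_expects, threshold)
        if n_target == 0:
            continue
        fdr = count_significant(decoy_expects, threshold) / n_target
        # On ties, keep the larger threshold, which admits more target hits.
        if best is None or abs(fdr - target_fdr) <= abs(best[1] - target_fdr):
            best = (threshold, fdr)
    return best


# Hypothetical expect values for target and decoy matches.
target = [0.001, 0.002, 0.004, 0.01, 0.02, 0.03, 0.04, 0.06]
decoy = [0.015, 0.05]
print(choose_threshold(target, decoy, target_fdr=0.2))
```

The achieved FDR is returned alongside the threshold because an exact match to the target is not always possible, especially when there are few matches.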

If the search is an error tolerant target-decoy search, the target FDR applies to both first and second pass matches; see Score thresholds and score filtering (Mascot Server 2.8 and later).

Another procedure, available in Parser 2.7 and earlier, was to use ms_mascotresults::getThresholdForFDRAboveIdentity or ms_mascotresults::getThresholdForFDRAboveHomology to automatically determine a significance threshold that yields the desired target FDR. This procedure is no longer recommended, because it requires creating an auxiliary ms_peptidesummary object. The steps were:

  1. Create ms_mascotresfile as usual; call it resfile.
  2. Create ms_peptidesummary as usual; call it pepsum. The minProbability parameter can be the default (0.05).
  3. Decide on a target FDR, say 0.01, and get a new minProbability with ms_mascotresults::getThresholdForFDRAboveHomology. You may want to check the return value and closestFDR, since it's not always possible to find a threshold that achieves the target FDR, especially in small searches.
  4. Create a new ms_peptidesummary object using the same resfile but the new minProbability. The rest of the parameters can be the same as before. You don't need to keep the original pepsum object in memory; it is only needed for finding the desired significance threshold.

Note that the FDR is a global FDR: it applies to the whole search rather than to any individual match.


Copyright © 2022 Matrix Science Ltd.  All Rights Reserved. Generated on Thu Mar 31 2022 01:12:30