Warning -- this is a complex issue that only applies to peptide summary reports, and only needs to be understood by advanced users. The default flag MSRES_DUPE_DEFAULT is suitable for most use cases.
In a typical results file derived from an LC-MS-MS data set, the same peptide will appear multiple times. In a perfect system, identical peptides would rarely be seen. Ideally, peptides would be separated perfectly by chromatography, and even if that failed, peak detection software would combine similar peptides before the data were submitted to Mascot Server.
Back to reality... The standard Mascot Server reports display most 'duplicate' peptides, but the scores are shown with brackets (parentheses) to indicate that these peptides don't affect the total protein score. Mascot Parser has full flexibility to customise the treatment of such 'duplicate' peptides.
In peptide summary reports, each candidate protein contains one or more peptides. Some of these peptides will be duplicates of others, and these generally do not add to the confidence of the protein match, and therefore can optionally be ignored.
There are four possible situations that will cause Mascot Parser to mark a peptide as a duplicate:
Same query number. For example, if a protein contains the two peptides
BACIDK, and query 3 matches both of these (possibly with different scores), then the two peptides are duplicates of each other. Alternatively, the same query could match the same peptide twice but with different modifications.
Same peptide sequence. The peptide matches may or may not be from the same query. Also, they may or may not have the same start and end position in case there are repeated peptides in the protein.
Same modifications. If the peptide sequences are different, then (to reduce permutations) the modifications are defined as being different.
There are 4 rules to cover the cases above, which generates 16 possibilities. However, because of the definitions described above, only 9 are possible combinations:
Use of the 'Rule ID' column is described below.
There are two flags that control how these rules are applied to the ms_mascotresults object:
MSRES_DUPE_REMOVE_[RULE_ID]: Each protein initially contains all possible matching peptides. If this flag is given to the constructor, duplicate peptides are then removed according to the specified rules.
A removed peptide will never be considered to be part of that protein -- so, for example, it won't be included in the score. In rare cases, this will also affect how proteins are grouped together. (See Grouping proteins together). If no duplicates are ever to be removed, specify MSRES_DUPE_REMOVE_NONE.
MSRES_DUPE_INCL_IN_SCORE_[RULE_ID]: When calculating the protein score, it may be desirable to include some duplicates in the score. If this flag is given to the constructor, duplicates matching the rule will be included in protein scoring. These flags apply to standard Mascot protein scoring and not to MudPIT scoring.
For the standard Mascot Server reports, no duplicates are included when calculating the protein score. (And duplicates are shown in brackets.) Any peptides removed by MSRES_DUPE_REMOVE_[RULE_ID] have already been removed before protein scoring is performed, so it is pointless to try to override, for example, MSRES_DUPE_REMOVE_D by using MSRES_DUPE_INCL_IN_SCORE_D.
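The ordering described above (removal first, scoring second) can be sketched with a small self-contained model. This is not the Parser API: the Rule indices, the PeptideMatch struct, and the bit layout behind removeFlag() and inclInScoreFlag() are hypothetical stand-ins for the MSRES_DUPE_REMOVE_ and MSRES_DUPE_INCL_IN_SCORE_ constants, and summing ions scores is a deliberate simplification of the real protein score.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Rule indices for this sketch only; NOT_DUPLICATE marks a non-duplicate peptide.
enum Rule { NOT_DUPLICATE = -1, RULE_A = 0, RULE_B = 1, RULE_C = 2, RULE_D = 3 };

struct PeptideMatch {
    double ionsScore;
    int rule;  // duplicate rule this match falls under, or NOT_DUPLICATE
};

// Hypothetical bit layout: remove flags in the low 16 bits,
// include-in-score flags in the high 16 bits.
inline std::uint32_t removeFlag(int rule) {
    assert(rule >= 0 && rule < 16);
    return 1u << rule;
}
inline std::uint32_t inclInScoreFlag(int rule) {
    assert(rule >= 0 && rule < 16);
    return 1u << (16 + rule);
}

// Step 1: peptides matching a "remove" rule are dropped from the protein.
std::vector<PeptideMatch> removeDuplicates(const std::vector<PeptideMatch>& all,
                                           std::uint32_t flags) {
    std::vector<PeptideMatch> kept;
    for (const PeptideMatch& p : all) {
        const bool removed = p.rule >= 0 && (flags & removeFlag(p.rule)) != 0;
        if (!removed) kept.push_back(p);
    }
    return kept;
}

// Step 2: scoring only ever sees the peptides that survived step 1.
// A surviving duplicate counts only if its include-in-score flag is set.
double proteinScore(const std::vector<PeptideMatch>& kept, std::uint32_t flags) {
    double score = 0.0;
    for (const PeptideMatch& p : kept) {
        const bool counts = p.rule < 0 || (flags & inclInScoreFlag(p.rule)) != 0;
        if (counts) score += p.ionsScore;
    }
    return score;
}
```

In this model, combining removeFlag(RULE_D) with inclInScoreFlag(RULE_D) gives the same score as removeFlag(RULE_D) alone, because the rule D peptide is already gone by the time proteinScore() runs.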
When displaying or storing a list of peptides, it may be desirable to inhibit the display of some duplicate peptides. ms_protein::getPeptideDuplicate() can be used to that end. Note that peptides that have been removed due to the MSRES_DUPE_REMOVE_[RULE_ID] setting need never be specifically inhibited by client code, since they will not be present anyway.
ms_peptidesummary::getProteinsWithThisPepMatch() will return a list of all proteins that contained this peptide. The list only includes proteins in which the peptide was not discarded due to the MSRES_DUPE_REMOVE_[RULE_ID] setting. There is currently no other override for this function.
For Mascot 1.9 and Mascot 2.0, the default for MSRES_DUPE_REMOVE_[RULE_ID] is MSRES_DUPE_REMOVE_A | MSRES_DUPE_REMOVE_D. This means that peptides matching rule A or rule D are not included in any proteins.
No duplicates are ever added into the score, so MSRES_DUPE_INCL_IN_SCORE_NONE needs to be specified.
MSRES_DUPE_DEFAULT is defined as MSRES_DUPE_REMOVE_A | MSRES_DUPE_REMOVE_D | MSRES_DUPE_INCL_IN_SCORE_NONE.
This functionality was introduced in Mascot Parser 1.2. No changes are required to client code that was used before Mascot Parser 1.02:
If no MSRES_DUPE_REMOVE_[RULE_ID] flags are supplied, then the default MSRES_DUPE_REMOVE_A | MSRES_DUPE_REMOVE_D is assumed. (In the unlikely event that it is required that no duplicates are ever to be removed, then MSRES_DUPE_REMOVE_NONE must be specified.)
MSRES_DUPE_INCL_IN_SCORE_NONE needs to be specified. However, since this is defined as '0', it does not need to be passed to the constructor.
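The backward-compatibility behaviour can be illustrated with a minimal sketch of how a flags word might be resolved. Everything prefixed SK_ is hypothetical: the bit positions are invented, and treating the "remove none" flag as a distinct non-zero bit is an assumption inferred from the fact that supplying no remove flags selects the default. Only the value 0 for the "include none" flag comes from the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit values, chosen for this sketch only.
const std::uint32_t SK_REMOVE_NONE = 1u << 0;    // assumed non-zero, so "none" differs from "not supplied"
const std::uint32_t SK_REMOVE_A    = 1u << 1;
const std::uint32_t SK_REMOVE_D    = 1u << 4;
const std::uint32_t SK_REMOVE_MASK = 0x3FFu;     // every remove bit, including SK_REMOVE_NONE
const std::uint32_t SK_INCL_IN_SCORE_NONE = 0u;  // the text states this flag is defined as 0
const std::uint32_t SK_DEFAULT = SK_REMOVE_A | SK_REMOVE_D | SK_INCL_IN_SCORE_NONE;

// Returns the remove rules that would actually be applied for a flags word.
std::uint32_t effectiveRemoveFlags(std::uint32_t flags) {
    const std::uint32_t removeBits = flags & SK_REMOVE_MASK;
    if (removeBits == 0u) {
        // Older client code passes no remove flags: fall back to rules A and D.
        return SK_REMOVE_A | SK_REMOVE_D;
    }
    // An explicit "remove none" request clears its own marker bit.
    return removeBits & ~SK_REMOVE_NONE;
}

// Usage example: the three cases discussed in the text.
void exampleLegacyFlags() {
    assert(effectiveRemoveFlags(0u) == SK_DEFAULT);                // nothing supplied
    assert(effectiveRemoveFlags(SK_REMOVE_NONE) == 0u);            // explicit "remove none"
    assert((SK_REMOVE_A | SK_INCL_IN_SCORE_NONE) == SK_REMOVE_A);  // OR-ing in 0 changes nothing
}
```

The last assertion shows why a flag defined as 0 never has to be passed explicitly: combining it with any other flag leaves the value unchanged.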
Earlier client code would typically check the following before displaying peptides or adding them to a database:
prot->getPeptideDuplicate(i) != ms_protein::DUPE_DuplicateSameQuery
This test is still required since generally peptides with the same query, same sequence and different modifications are not shown in the report, but would be seen in the yellow popup.
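As a rough illustration of that display filter, the sketch below applies the same comparison to a plain list of rows. It does not call Mascot Parser: DupeStatus and PeptideRow are hypothetical stand-ins, and only the "duplicate, same query" case corresponds to a value named in the text (ms_protein::DUPE_DuplicateSameQuery). In real client code the status would come from prot->getPeptideDuplicate(i).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the duplicate status of a peptide; names are for this sketch only.
enum DupeStatus { SK_NotDuplicate, SK_Duplicate, SK_DuplicateSameQuery };

struct PeptideRow {
    int query;
    std::string sequence;
    DupeStatus status;
};

// Keep every row except those flagged as a duplicate from the same query.
std::vector<PeptideRow> rowsToDisplay(const std::vector<PeptideRow>& all) {
    std::vector<PeptideRow> shown;
    for (const PeptideRow& row : all) {
        if (row.status != SK_DuplicateSameQuery) {
            shown.push_back(row);
        }
    }
    return shown;
}

// Usage example: query 3 matched the same protein twice, so one row is hidden.
void exampleDisplayFilter() {
    const std::vector<PeptideRow> rows = {
        {3, "AAAK", SK_NotDuplicate},
        {3, "AAAK", SK_DuplicateSameQuery},
        {7, "CCCK", SK_Duplicate},
    };
    const std::vector<PeptideRow> shown = rowsToDisplay(rows);
    assert(shown.size() == 2);
}
```

Other kinds of duplicate pass through this filter unchanged; whether to show them (for example with bracketed scores) is left to the caller.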
Warning -- this is a complex issue that only applies to peptide summary reports, and only needs to be understood by advanced users. It is best to always specify MSPEPSUM_REMOVE_CHIMERIC_DUPES or always use ms_mascotresfile::get_ms_mascotresults_params().
Mascot 2.5 added support for chimeric spectra. A chimeric spectrum contains MS/MS data from multiple precursors. Mascot divides each input spectrum with multiple precursor masses into a set of subsidiary queries, linked by the source index of the original spectrum. Each subsidiary query is matched separately, so that the number of output queries is the total number of precursor masses in the input file.
The subsidiary queries in each set can have duplicate matches, called chimeric duplicates. This is easiest to explain by example. Suppose we have a chimeric spectrum with two precursors, forming queries 1 and 2:
The two queries originate from the same chimeric spectrum and have nearly the same precursor masses. (The precursor mass tolerance is quite wide in this example.) As a result, both queries match the same peptide sequences. However, all but the rank 1 match in the first query have a much larger mass delta than in query 2. This means the rank 2-5 matches in query 1 are all chimeric duplicates. Conversely, the rank 1 match in query 2 is a duplicate.
Parser removes chimeric duplicates from consideration if you open the results file as a peptide summary (ms_peptidesummary) and use the flag MSPEPSUM_REMOVE_CHIMERIC_DUPES. In the above case, you would only see one match in query 1 and four in query 2. Ranks are renumbered accordingly (e.g. query 2 rank 2 becomes query 2 rank 1 when the chimeric duplicate is removed). Chimeric duplicate removal is done at query level before protein grouping and before considering peptide match duplicates within protein hits.
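The effect of the flag on the two-query example can be modelled in a few lines. This is a sketch under stated assumptions, not Parser's implementation: the Match and Query types are hypothetical, the peptide sequences and mass deltas are made up, and the criterion used here (a match is a chimeric duplicate when a sibling query matches the same sequence with a smaller absolute mass delta) is inferred from the example rather than taken from the Parser source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

struct Match {
    std::string sequence;
    double massDelta;
};
typedef std::vector<Match> Query;  // matches in rank order, rank 1 first

// 'siblings' holds the subsidiary queries of one chimeric spectrum.
// Surviving matches keep their relative order, so their position in the
// output vector is the renumbered rank.
std::vector<Query> removeChimericDupes(const std::vector<Query>& siblings) {
    std::vector<Query> out(siblings.size());
    for (std::size_t q = 0; q < siblings.size(); ++q) {
        for (const Match& m : siblings[q]) {
            bool duplicate = false;
            for (std::size_t other = 0; other < siblings.size() && !duplicate; ++other) {
                if (other == q) continue;
                for (const Match& s : siblings[other]) {
                    // Assumed criterion: the sibling explains the same peptide better.
                    if (s.sequence == m.sequence &&
                        std::fabs(s.massDelta) < std::fabs(m.massDelta)) {
                        duplicate = true;
                        break;
                    }
                }
            }
            if (!duplicate) out[q].push_back(m);
        }
    }
    return out;
}

// Usage example mirroring the text: five matches per query before removal.
void exampleTwoPrecursors() {
    const Query q1 = {{"AAAK", 0.01}, {"CCCK", 1.00}, {"DDDK", 1.00}, {"EEEK", 1.00}, {"FFFK", 1.00}};
    const Query q2 = {{"AAAK", -0.99}, {"CCCK", 0.01}, {"DDDK", 0.01}, {"EEEK", 0.01}, {"FFFK", 0.01}};
    const std::vector<Query> result = removeChimericDupes({q1, q2});
    assert(result[0].size() == 1);               // only the rank 1 match survives in query 1
    assert(result[1].size() == 4);               // the old rank 1 match is dropped from query 2
    assert(result[1][0].sequence == "CCCK");     // old rank 2 is now rank 1
}
```

With equal absolute mass deltas neither match is removed in this model; how Parser breaks such ties is not covered here.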
Chimeric duplicates are removed by default if you use ms_mascotresfile::get_ms_mascotresults_params() to construct the default flags to ms_peptidesummary. Otherwise you need to specify the flag yourself when creating the peptide summary object.
Note that the flag has no effect if the results file was created by Mascot 2.4 or earlier, or if the results file has no chimeric spectra.
Copyright © 2016 Matrix Science Ltd. All Rights Reserved. Generated on Fri Jun 2 2017 01:44:51