ms_protein Class Reference
[Mascot results file module]

This class encapsulates a protein in the mascot results file.

#include <ms_mascotresprotein.hpp>

Public Types

enum  DISTINCT_PEPTIDE_FLAGS {
  DPF_SEQUENCE = 0x0001,
  DPF_CHARGE = 0x0002,
  DPF_MODS = 0x0004,
  DPF_UNIQUE = 0x0008,
  DPF_NODUPSAMEQUERY = 0x0010
}
 

Enum for getNumDistinctPeptides().

enum  DUPLICATE {
  DUPE_NotDuplicate,
  DUPE_Duplicate,
  DUPE_DuplicateSameQuery,
  DUPE_HighestScoringDuplicate,
  DUPE_Ignored
}
 

Enum for each peptide in the protein, indicating whether it is a duplicate.

enum  GROUP {
  GROUP_UNKNOWN,
  GROUP_NO,
  GROUP_SUBSET,
  GROUP_COMPLETE,
  GROUP_FAMILY
}
 

Enum to say if a protein is similar to another higher scoring protein.

enum  MASS_FLAGS {
  MASS_NON_SELECT_NON_MATCH = 0x0001,
  MASS_SELECT_NON_MATCH = 0x0010,
  MASS_NON_SELECT_MATCH = 0x0100,
  MASS_SELECT_MATCH = 0x1000
}
 

Enum for each protein to specify which masses to select.


Public Member Functions

 ms_protein (const double score, const std::string accession, const bool updateScoreFromPepScores, const int proteinSummaryHit=0)
 Constructors - used from ms_proteinsummary and ms_peptidesummary.
 ms_protein (const ms_protein &src)
 Copying constructor.
 ~ms_protein ()
 Destructor - called automatically - don't call explicitly from Perl or Java.
bool anyBoldRedPeptides (const ms_mascotresults &results) const
 Returns true if any of the peptides in the match were top scoring and not seen before.
bool anyMatchToQuery (const int query) const
 See if any match to this query.
bool anyMatchToQueryAndP (const int query, const int P) const
 See if any match to this query and 'P' (rank / hit).
void copyFrom (const ms_protein *src)
 Copies all content from another instance of the class.
std::string getAccession () const
 Return the accession string for a protein.
const ms_protein * getComponent (const int componentNumber) const
 For UniGene and PMF mixture return the 'component' protein.
long getCoverage () const
 Return the number of residues covered.
int getDB () const
 Return the index of the database where the sequence is found.
ms_peptide getDistinctPeptide (int distinctIndex, int repeatIndex=1, bool aboveThreshold=false, DISTINCT_PEPTIDE_FLAGS flags=DPF_SEQUENCE) const
 Return the peptide repeat of the distinct peptide in the protein's peptide matches.
int getFrame () const
 Returns the frame number for the protein.
GROUP getGrouping () const
 Returns a flag showing whether this protein contains only the same peptides as those in another protein.
int getHitNumber () const
 Returns the hit number in the results list.
void getIgnoredQPs (std::vector< int > &q, std::vector< int > &p) const
 Return a list of queries and ranks that would have been part of this protein hit had they not been removed by IgnoreIonsScoreBelow.
int getLongestPeptideLen () const
 Return the length (in residues) of the longest peptide in the protein.
int getLongestSigPeptideLen () const
 Return the length (in residues) of the longest significant peptide in the protein.
std::string getMasses (ms_mascotresfile &resfile, const ms_proteinsummary &summary, const unsigned int flags=MASS_SELECT_MATCH, const int numDecimalPlaces=2) const
 Return a list of comma separated experimental masses according to a specified filter.
int getMemberNumber () const
 Returns the member number within a family in the results list.
double getNonMudpitScore () const
 Will only return a different score from getScore() if the MSRES_MUDPIT_PROTEIN_SCORE flag has been specified.
int getNumComponents () const
 For UniGene and PMF mixture, return number of 'component' proteins.
int getNumDisplayPeptides (bool aboveThreshold=false) const
 Return the number of peptides, excluding duplicate matches to the same query.
int getNumDistinctPeptideRepeats (int distinctIndex, bool aboveThreshold=false, DISTINCT_PEPTIDE_FLAGS flags=DPF_SEQUENCE) const
 Return the number of repeats of the distinct peptide in the protein's peptide matches.
int getNumDistinctPeptides (bool aboveThreshold=false, DISTINCT_PEPTIDE_FLAGS flags=DPF_SEQUENCE) const
 Return the number of distinct peptides in the protein sequence.
int getNumObservedForEmPAI () const
 Return the number of peptides 'observed' for emPAI quantitation calculation.
int getNumPeptides () const
 Return the number of peptides that had a match in this protein.
int getPepNumber (const int q, const int p) const
 Return the pepNumber given query and rank.
int getPeptideComponentID (const int pepNumber) const
 Returns 0 except for a UniGene entry or a PMF mixture entry.
DUPLICATE getPeptideDuplicate (const int pepNumber) const
 Return the DUPLICATE status given the peptide 'number'.
long getPeptideEnd (const int pepNumber) const
 Return the peptide end residue given the peptide 'number'.
int getPeptideFrame (const int pepNumber) const
 Return the frame number given the peptide 'number'.
double getPeptideIonsScore (const int pepNumber) const
 Return the ions score within this protein context given the peptide 'number'.
bool getPeptideIsBold (const int pepNumber) const
 Returns true if this peptide should be displayed in bold in a Mascot report.
long getPeptideMultiplicity (const int pepNumber) const
 Return the number of precursor matches in this protein for the specified peptide 'number'.
int getPeptideP (const int pepNumber) const
 Return the 'rank' number given the peptide 'number'.
int getPeptideQuery (const int pepNumber) const
 Return the query number given the peptide 'number'.
char getPeptideResidueAfter (const int pepNumber) const
 Returns the residue immediately after the peptide.
char getPeptideResidueBefore (const int pepNumber) const
 Returns the residue immediately before the peptide.
bool getPeptideShowCheckbox (const int pepNumber) const
 Returns true if a check box for repeat searches should be shown in a Mascot report.
long getPeptideStart (const int pepNumber) const
 Return the peptide start residue given the peptide 'number'.
int getProteinSummaryHit () const
 For a protein from the protein summary only.
double getRMSDeltas (const ms_mascotresults &results) const
 Return the RMS value of the deltas between the calculated and experimental value.
double getScore () const
 Return the protein score for this protein.
double getScoreWithET () const
 Return the protein score including ET matches for this protein.
int getSimilarProteinDB () const
 Return the database index of a protein that contains the same set (or a superset) of the peptides in this protein.
std::string getSimilarProteinName () const
 Return the accession of a protein that contains the same set (or a superset) of the peptides in this protein.
int getSimilarProteins (std::vector< std::string > &accessions, std::vector< int > &dbIdxs) const
 Return a list of proteins that contain the same set (or a superset) of the peptides in this protein.
std::string getUnmatchedMasses (ms_mascotresfile &resfile, const int numDecimalPlaces=2) const
 Return a list of comma separated experimental masses that don't match.
bool isASimilarProtein (const ms_protein *prot, const ms_mascotresults *results, const bool groupByQueryNumber=false)
 Find a protein in the results.
bool isPMFMixture () const
 Returns true if the 'protein' is actually a PMF mixture.
bool isSimilarProtein (const std::string &acc, const int dbIdx) const
 Returns true if the specified protein has the same set, or a superset, of the peptides that this protein has.
bool isUnigene () const
 Returns true if the 'protein' is actually a UniGene entry.
ms_protein & operator= (const ms_protein &right)
 C++ assignment operator.
void setDB (int dbIdx)
 Set database index.
void setPeptideIsBold (const int pepNumber)
void setPeptideShowCheckbox (const int pepNumber)
void sortPeptides (const ms_mascotresults &results, bool keepAlive=false, int keepAlivePercent=0, const char *keepAliveAccession="", int keepAliveCount=0)
 Sorts the peptides into ascending query number.

Friends

bool operator< (const ms_protein &lhs, const ms_protein &rhs)
 Protein objects perform a simple sort of themselves by database ID and then accession.

Detailed Description

This class encapsulates a protein in the mascot results file.

Pointers to ms_protein objects are returned from ms_peptidesummary::getHit() or ms_proteinsummary::getHit(), so there should be no need to create one of these from outside the library.

Examples:

peptide_list.cpp, repeat_search.cpp, and resfile_summary.cpp.


Member Enumeration Documentation

enum DISTINCT_PEPTIDE_FLAGS

Enum for getNumDistinctPeptides().

See Using enumerated values in Perl, Java, Python and C#.

There are several possible definitions for 'distinct'!

One or more of these flags can be combined using a bitwise 'OR' operator to determine which peptide matches are treated as distinct matches when counting up matches. Imagine a protein that has the following matches:

  • AGCMK - Charge state 2
  • AGCMK - Charge state 3
  • AGCMK - Charge state 2, Oxidised methionine
  • HSMTMR - Charge state 2
  • HSMTMR - Charge state 2
  • HSMTMR - Charge state 2, Oxidised methionine

In this case:

For completeness, getNumDisplayPeptides() will return a count of 6 and getNumPeptides() could return a count of 6 or could return 7 if HSM*TMR and HSMTM*R (where the asterisk indicates the oxidised methionine) both appear in the top 10 matches to the final query. (Some of these functions apply a threshold to the match scores, so this example assumes either no threshold is used or all matches are above threshold.)

A further complication is uniqueness within the whole database search. A peptide sequence that is distinct in one protein hit may also appear in another protein hit in the search. If you specify the flag DPF_UNIQUE, then only peptide matches that are unique within the whole search are counted, subject to the other flags. For example, if DPF_SEQUENCE | DPF_UNIQUE is specified in the example above, then getNumDistinctPeptides() may return 0, 1 or 2, depending on which other protein hits contain the distinct peptide sequences assigned to the current protein hit.

The MCP guidelines require a count of "the total number of peptides assigned to the protein. To compute this number, multiple matches to peptides with the same primary sequence count as one, even if they represent different charge states or modification states". Specify DPF_SEQUENCE by itself to obtain this value.

A flags value that does not include DPF_SEQUENCE is unlikely to give a useful return value from getNumDistinctPeptides().
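
The effect of combining the flags on the six example matches can be sketched with a small stand-alone program. This is an illustration only, not the library's implementation: the `Match` struct, `countDistinct()` and `exampleMatches()` are hypothetical helpers, the flag values are copied from the enum above, and modification state is reduced to a single string (so positional variants such as HSM*TMR and HSMTM*R are not distinguished). DPF_UNIQUE and DPF_NODUPSAMEQUERY are left out because they need information from the whole search.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Flag values copied from ms_protein::DISTINCT_PEPTIDE_FLAGS.
enum DistinctFlags {
    DPF_SEQUENCE = 0x0001,
    DPF_CHARGE   = 0x0002,
    DPF_MODS     = 0x0004
};

// Hypothetical stand-in for one peptide match; not a Parser type.
struct Match {
    std::string sequence;
    int charge;
    std::string mods;   // empty string means unmodified
};

// Counts matches that differ in at least one property selected by flags.
int countDistinct(const std::vector<Match>& matches, unsigned int flags) {
    // A flags value without DPF_SEQUENCE is unlikely to be useful.
    assert((flags & DPF_SEQUENCE) != 0);
    std::set<std::tuple<std::string, int, std::string>> keys;
    for (const Match& m : matches) {
        keys.insert(std::make_tuple(
            m.sequence,
            (flags & DPF_CHARGE) ? m.charge : 0,
            (flags & DPF_MODS) ? m.mods : std::string()));
    }
    return static_cast<int>(keys.size());
}

// The six matches listed in the example above.
std::vector<Match> exampleMatches() {
    return {
        {"AGCMK", 2, ""}, {"AGCMK", 3, ""}, {"AGCMK", 2, "Oxidation (M)"},
        {"HSMTMR", 2, ""}, {"HSMTMR", 2, ""}, {"HSMTMR", 2, "Oxidation (M)"}
    };
}
```

Under these simplifying assumptions, DPF_SEQUENCE alone gives 2, adding DPF_CHARGE gives 3, adding DPF_MODS instead gives 4, and all three together give 5.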

Enumerator:
DPF_SEQUENCE 

Peptide matches must have different primary sequences to be counted as distinct matches.

DPF_CHARGE 

Peptide matches must have different charge states to be counted as distinct matches.

DPF_MODS 

Peptide matches must have different modification states to be counted as distinct matches.

DPF_UNIQUE 

Peptide matches must be unique in the whole search to be counted as distinct matches.

DPF_NODUPSAMEQUERY 

Duplicate peptide matches from the same query should be excluded (see DUPE_DuplicateSameQuery ).

enum DUPLICATE

Enum for each peptide in the protein, indicating whether it is a duplicate.

See Using enumerated values in Perl, Java, Python and C#.

A protein match is made up of one or more peptides. Duplicate peptides don't increase the coverage of the protein. They also do not increase the score except for MudPIT scoring.

Enumerator:
DUPE_NotDuplicate 

There are no other peptides with the same sequence in this protein - from this query or other queries.

DUPE_Duplicate 

Another peptide from a different query with the same sequence as this got a higher score.

DUPE_DuplicateSameQuery 

Another match for the same query with the same peptide string got a higher score (different mods).

DUPE_HighestScoringDuplicate 

There is at least one other peptide the same as this with a lower score.

DUPE_Ignored 

The peptide match was ignored due to IgnoreIonsScoreBelow.

enum GROUP

Enum to say if a protein is similar to another higher scoring protein.

See Grouping proteins together and Using enumerated values in Perl, Java, Python and C#.

Note that if there are say 3 proteins with the same 4 peptide matches, then the highest scoring protein will have GROUP_NO, and the other two will have GROUP_COMPLETE. Calling getSimilarProteinName() on the highest scoring protein will return an empty string. Calling it for the other two proteins will return the accession for the highest scoring protein.

See also:
ms_mascotresults::getNextSimilarProtein(), ms_mascotresults::getNextSubsetProteinOf()
Enumerator:
GROUP_UNKNOWN 

No information about grouping.

GROUP_NO 

Does not contain the same set (or a subset) of peptides as another protein. A 'lead' protein.

GROUP_SUBSET 

Contains a subset of peptides in one or more other proteins.

GROUP_COMPLETE 

Contains an identical set of peptides to one or more other proteins.

GROUP_FAMILY 

Second or subsequent family member when using new family grouping introduced in Mascot 2.3.
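
The note above about three proteins sharing the same four peptide matches can be mimicked with a toy model. The sketch below is not the grouping algorithm used by the library; it only compares peptide sets against higher scoring 'lead' proteins. `Hit`, `assignGrouping()` and `groupedExample()` are hypothetical names, and `similarTo` merely plays the role of getSimilarProteinName().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Enumerator names copied from ms_protein::GROUP (GROUP_FAMILY omitted).
enum Group { GROUP_UNKNOWN, GROUP_NO, GROUP_SUBSET, GROUP_COMPLETE };

// Hypothetical stand-in for a protein hit; not a Parser type.
struct Hit {
    std::string accession;
    double score;
    std::set<std::string> peptides;
    Group grouping;
    std::string similarTo;   // plays the role of getSimilarProteinName()
};

// Toy grouping: the highest scoring protein with a given peptide set leads.
void assignGrouping(std::vector<Hit>& hits) {
    std::stable_sort(hits.begin(), hits.end(),
                     [](const Hit& a, const Hit& b) { return a.score > b.score; });
    for (std::size_t i = 0; i < hits.size(); ++i) {
        assert(!hits[i].peptides.empty());
        hits[i].grouping = GROUP_NO;
        hits[i].similarTo.clear();
        for (std::size_t j = 0; j < i; ++j) {
            if (hits[j].grouping != GROUP_NO) continue;  // compare with lead proteins only
            const std::set<std::string>& lead = hits[j].peptides;
            const std::set<std::string>& mine = hits[i].peptides;
            // std::set iterates in sorted order, as std::includes requires.
            if (!std::includes(lead.begin(), lead.end(), mine.begin(), mine.end())) continue;
            hits[i].grouping = (mine.size() == lead.size()) ? GROUP_COMPLETE : GROUP_SUBSET;
            hits[i].similarTo = hits[j].accession;
            break;
        }
    }
}

// Three proteins with the same four peptide matches (made-up data).
std::vector<Hit> groupedExample() {
    const std::set<std::string> peptides = {"PEP1", "PEP2", "PEP3", "PEP4"};
    std::vector<Hit> hits = {
        {"P2", 90.0, peptides, GROUP_UNKNOWN, ""},
        {"P1", 120.0, peptides, GROUP_UNKNOWN, ""},
        {"P3", 80.0, peptides, GROUP_UNKNOWN, ""}
    };
    assignGrouping(hits);
    return hits;
}
```

In this model the highest scoring protein ends up with GROUP_NO and an empty `similarTo`, while the other two get GROUP_COMPLETE and point at the lead accession.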

enum MASS_FLAGS

Enum for each protein to specify which masses to select.

See Using enumerated values in Perl, Java, Python and C#.

Only a subset of all masses is used for scoring proteins. However, all matching masses are usually reported for each protein. Using these flags one can specify more precisely what sub-set of masses one is interested in. The flags can be combined with binary OR ("|"-operator in C++).

Enumerator:
MASS_NON_SELECT_NON_MATCH 

Only masses that are not selected and couldn't match.

MASS_SELECT_NON_MATCH 

Only masses that are selected but do not match.

MASS_NON_SELECT_MATCH 

Only masses that are not selected but otherwise would match.

MASS_SELECT_MATCH 

Only masses that are selected and do match.
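
As a rough illustration of how these flags combine with the "|" operator, the following stand-alone sketch filters a list of peaks by category and formats the survivors as a comma separated string. It does not call the library: `Peak`, `categoryOf()`, `filterMasses()` and `examplePeaks()` are hypothetical, the mass values are made up, and only the flag values are taken from the enum above.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Flag values copied from ms_protein::MASS_FLAGS.
enum MassFlags {
    MASS_NON_SELECT_NON_MATCH = 0x0001,
    MASS_SELECT_NON_MATCH     = 0x0010,
    MASS_NON_SELECT_MATCH     = 0x0100,
    MASS_SELECT_MATCH         = 0x1000
};

// Hypothetical stand-in for one experimental mass; not a Parser type.
struct Peak {
    double mass;
    bool selected;   // used for scoring
    bool matched;    // matches (or would match) the protein
};

// Maps a peak to exactly one of the four categories.
unsigned int categoryOf(const Peak& p) {
    if (p.selected) return p.matched ? MASS_SELECT_MATCH : MASS_SELECT_NON_MATCH;
    return p.matched ? MASS_NON_SELECT_MATCH : MASS_NON_SELECT_NON_MATCH;
}

// Returns the masses whose category bit is set in flags, comma separated.
std::string filterMasses(const std::vector<Peak>& peaks, unsigned int flags,
                         int numDecimalPlaces = 2) {
    assert(numDecimalPlaces >= 0);
    std::ostringstream out;
    out << std::fixed << std::setprecision(numDecimalPlaces);
    bool first = true;
    for (const Peak& p : peaks) {
        if ((categoryOf(p) & flags) == 0) continue;
        if (!first) out << ',';
        out << p.mass;
        first = false;
    }
    return out.str();
}

// One made-up peak per category.
std::vector<Peak> examplePeaks() {
    return {
        {1000.5, true, true}, {1200.25, true, false},
        {1500.75, false, true}, {1800.0, false, false}
    };
}
```

Because each category occupies its own bit, OR-ing two flags selects the union of the two categories, for example all peaks that could match.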


Constructor & Destructor Documentation

ms_protein ( const ms_protein &  src )

Copying constructor.

Calling this function ensures that all the data is loaded into memory in the case where ms_peptidesummary::MSPEPSUM_DISCARD_RELOADABLE is specified.

Parameters:
src  is the ms_protein object that will be copied.

Member Function Documentation

bool anyMatchToQuery ( const int  query ) const

See if any match to this query.

This could be useful if you need to give a list of unmatched queries for a given protein.

Parameters:
query  Query number
Returns:
true if any match was found, false otherwise
bool anyMatchToQueryAndP ( const int  query,
const int  P 
) const

See if any match to this query and 'P' (rank / hit).

This is useful for finding if this protein matched an identical peptide to another protein.

Parameters:
query  query number
P  rank number
Returns:
true if any match was found, false otherwise
void copyFrom ( const ms_protein *  src )

Copies all content from another instance of the class.

Calling this function ensures that all the data is loaded into memory in the case where ms_peptidesummary::MSPEPSUM_DISCARD_RELOADABLE is specified.

Parameters:
src  is a pointer to the source object
std::string getAccession (  ) const

Return the accession string for a protein.

This will always be available, for every protein.

Returns:
accession string of the protein
Examples:
resfile_summary.cpp.
const ms_protein * getComponent ( const int  componentNumber ) const

For UniGene and PMF mixture return the 'component' protein.

For UniGene and PMF mixture, each 'protein' is made up of a number of components. Call this method to get the protein 'components' that were used to make up this 'pseudo' protein.

See Peptide mass fingerprint mixtures and Maintaining object references: two rules of thumb.

Parameters:
componentNumber  must be in the range 1..getNumComponents(), or a null value will be returned. No error message is generated for an out of range call.
Returns:
A protein object if the componentNumber is valid, or a null value if there is an error.
long getCoverage (  ) const

Return the number of residues covered.

If two peptides overlap, then the overlapped ones are only counted once. Getting the coverage as a percentage is not possible from the results file because the length of the protein is not stored in the file. An approximate value could be calculated using ms_mascotresults::getProteinMass() and dividing by 110.
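
The two ideas in this paragraph, counting overlapping residues once and estimating a percentage from the protein mass, can be sketched as follows. Both functions are hypothetical helpers written for illustration; they do not reproduce the library's internals, and the percentage is only as good as the 110 Da per residue approximation mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Number of residues covered by 1-based, inclusive (start, end) ranges.
// Residues covered by more than one range are counted once.
long residuesCovered(std::vector<std::pair<long, long>> ranges) {
    std::sort(ranges.begin(), ranges.end());
    long covered = 0;
    long lastEnd = 0;   // highest residue position counted so far
    for (const auto& r : ranges) {
        assert(r.first >= 1 && r.second >= r.first);
        const long from = std::max(r.first, lastEnd + 1);
        if (r.second >= from) {
            covered += r.second - from + 1;
            lastEnd = r.second;
        }
    }
    return covered;
}

// Rough percentage coverage, estimating the protein length as mass / 110.
double approxCoveragePercent(long coverage, double proteinMass) {
    assert(proteinMass > 0.0);
    const double approxLength = proteinMass / 110.0;
    return 100.0 * static_cast<double>(coverage) / approxLength;
}
```

In real code the first argument of `approxCoveragePercent()` would come from getCoverage() and the second from ms_mascotresults::getProteinMass().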

This function currently returns '0' for a PMF mixture and for a UniGene entry.

Returns:
coverage of the protein
Examples:
resfile_summary.cpp.
int getDB (  ) const

Return the index of the database where the sequence is found.

Use ms_searchparams::getDB() to retrieve the database name using the index returned by this function.

Returns:
A number in the range 1..ms_searchparams::getNumberOfDatabases(), except for a UniGene or protein mixture where the returned number will be zero.
Examples:
resfile_summary.cpp.
ms_peptide getDistinctPeptide ( int  distinctIndex,
int  repeatIndex = 1,
bool  aboveThreshold = false,
DISTINCT_PEPTIDE_FLAGS  flags = DPF_SEQUENCE 
) const

Return the peptide repeat of the distinct peptide in the protein's peptide matches.

getNumDistinctPeptides() and getNumDistinctPeptideRepeats() indicate how many distinct peptides the protein has and how often they recur.

The individual occurrences of the peptide are sorted by their ions scores in decreasing order. The first one (repeatIndex = 1) is the occurrence with the highest score.

When determining the valid ranges for the indexes, the same values should be passed for the aboveThreshold and flags parameters.

This can be used to retrieve the peptides to construct a tree as seen on the Mascot Search Results report or the Mascot Distiller protein tab.

To create a tree that matches the Mascot Search Results report, set aboveThreshold to false and use flags DPF_SEQUENCE, DPF_CHARGE, DPF_MODS and DPF_NODUPSAMEQUERY.

To create a tree that matches the Mascot Distiller protein tab, set aboveThreshold to false and use flags DPF_SEQUENCE, DPF_MODS and DPF_NODUPSAMEQUERY.

The distinct peptides are ordered into increasing molecular weight (see ms_peptide::getMrCalc()). The lowest Mr(calc) is the first peptide (distinctIndex = 1).

The repeats are ordered into decreasing score (see ms_peptide::getIonsScore()). The highest score is always the first repeat (repeatIndex = 1).
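
The ordering rules above can be illustrated with a small model of the two-level tree. This is a sketch under simplifying assumptions, not the library's code: peptides are grouped by sequence only (as with DPF_SEQUENCE), each group is ordered by the Mr(calc) of its best scoring member, and `PepMatch`, `buildTree()`, `pick()` and `exampleTree()` are hypothetical names with made-up mass and score values.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a peptide match; not a Parser type.
struct PepMatch {
    std::string sequence;
    double mrCalc;
    double ionsScore;
};

// Outer level: distinct peptides by increasing Mr(calc).
// Inner level: repeats of each peptide by decreasing ions score.
std::vector<std::vector<PepMatch>> buildTree(const std::vector<PepMatch>& matches) {
    std::map<std::string, std::vector<PepMatch>> bySequence;
    for (const PepMatch& m : matches) bySequence[m.sequence].push_back(m);

    std::vector<std::vector<PepMatch>> tree;
    for (auto& entry : bySequence) {
        std::vector<PepMatch>& repeats = entry.second;
        std::stable_sort(repeats.begin(), repeats.end(),
                         [](const PepMatch& a, const PepMatch& b) {
                             return a.ionsScore > b.ionsScore;
                         });
        tree.push_back(repeats);
    }
    std::stable_sort(tree.begin(), tree.end(),
                     [](const std::vector<PepMatch>& a, const std::vector<PepMatch>& b) {
                         return a.front().mrCalc < b.front().mrCalc;
                     });
    return tree;
}

// 1-based lookup mirroring the distinctIndex and repeatIndex parameters.
const PepMatch& pick(const std::vector<std::vector<PepMatch>>& tree,
                     int distinctIndex, int repeatIndex = 1) {
    assert(distinctIndex >= 1 && distinctIndex <= static_cast<int>(tree.size()));
    const std::vector<PepMatch>& repeats = tree[distinctIndex - 1];
    assert(repeatIndex >= 1 && repeatIndex <= static_cast<int>(repeats.size()));
    return repeats[repeatIndex - 1];
}

// Made-up matches for two peptide sequences.
std::vector<std::vector<PepMatch>> exampleTree() {
    return buildTree({
        {"HSMTMR", 747.3, 30.0}, {"AGCMK", 508.2, 25.0},
        {"AGCMK", 508.2, 40.0}, {"HSMTMR", 747.3, 55.0}
    });
}
```

With the real API, the outer loop bound would be getNumDistinctPeptides() and the inner bound getNumDistinctPeptideRepeats(), both called with the same aboveThreshold and flags values that are later passed to getDistinctPeptide().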

Parameters:
distinctIndex  Index into a list of distinct peptides, 1..getNumDistinctPeptides().
repeatIndex  Index into a list of repeats of the distinct peptide, 1..getNumDistinctPeptideRepeats(). Set this to 1 to get the primary (highest score) instance of the distinct peptide.
aboveThreshold  If true, the function will only count the number of peptides above the threshold. The threshold used will be ms_mascotresults::getPeptideIdentityThreshold() unless ms_peptidesummary::MSPEPSUM_USE_HOMOLOGY_THRESH was specified in the ms_peptidesummary constructor in which case the threshold is returned by ms_peptidesummary::getHomologyThreshold().
flags  See ms_protein::DISTINCT_PEPTIDE_FLAGS for details.
Returns:
The chosen repeat of the chosen distinct peptide.
int getFrame (  ) const

Returns the frame number for the protein.

A value of -1 will be returned if the peptides come from different frames. For a protein database, a value of zero will be returned. Frames 1 to 3 are the 'forward' strand, and 4 to 6 are the 'reverse' strand.
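
A caller might translate the return value into a label along these lines. `describeFrame()` is a hypothetical helper, and the label strings are arbitrary; only the numeric ranges come from the description above.

```cpp
#include <cassert>
#include <string>

// Maps a frame number, as described for getFrame(), to a readable label.
std::string describeFrame(int frame) {
    assert(frame >= -1 && frame <= 6);
    if (frame == -1) return "mixed frames";       // peptides from different frames
    if (frame == 0) return "protein database";    // no translation involved
    return frame <= 3 ? "forward strand" : "reverse strand";
}
```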

Returns:
frame of the protein
Examples:
resfile_summary.cpp.
ms_protein::GROUP getGrouping (  ) const

Returns a flag showing whether this protein contains only the same peptides as those in another protein.

See Grouping proteins together.

Returns:
The group flag.
int getHitNumber (  ) const

Returns the hit number in the results list.

For a protein that is a subset, the value returned is the hit number of the 'main' protein.

Returns:
hit number of the protein
void getIgnoredQPs ( std::vector< int > &  q,
std::vector< int > &  p 
) const

Return a list of queries and ranks that would have been part of this protein hit had they not been removed by IgnoreIonsScoreBelow.

Note that this method is only useful when the search is an integrated spectral library search, the results file has been opened in integrated library mode (ms_peptidesummary::MSPEPSUM_SL_INTEGRATED) and IgnoreIonsScoreBelow is set to a non-zero value.

When all the requirements are met, this method returns the queries and ranks (q,p) of the peptide matches removed due to IgnoreIonsScoreBelow but which would have been part of this protein hit otherwise. The q,p values are needed when iterating over all peptide matches assigned to this protein hit, because evidence for a peptide sequence can come from either search engine.

For example, suppose query 4339 has two matches to the same sequence: significant rank 1 match from the FASTA file and non-significant rank 2 match from the spectral library. Let the rank 1 match be in FASTA accession FAS1 and the rank 2 match in library accession LIB1. Since the matches have the same sequence, the rank 1 match provides evidence for it (it is significant) and the sequence is used in protein grouping. FAS1 and LIB1 end up in the same family, since they match the same significant sequence.

Suppose the rank 2 match is now hidden due to IgnoreIonsScoreBelow. Suppose further that FAS1 ends up as a subset of LIB1. If you iterate over the visible peptide matches of LIB1, query 4339 is nowhere to be seen, because it's hidden. But FAS1 might not appear anywhere either, since it's a subset protein. Query 4339 won't be in the unassigned list, because its rank 1 match was used as peptide evidence in protein grouping.

The only way to discover query 4339 is by iterating over the ignored matches in LIB1. There, query 4339 has the rank 2 match, and you can inspect the other ranks in the query to discover the rank 1 match.
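
Since the queries and ranks come back in two separate vectors, a caller will usually want to walk them together. The sketch below assumes the two vectors are parallel (element i of each describes the same ignored match), which the (q,p) wording suggests but which should be checked against the library. `pairQueriesWithRanks()` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Pairs up parallel query and rank vectors, such as those filled in by
// getIgnoredQPs(), into (query, rank) tuples that are easier to iterate.
std::vector<std::pair<int, int>> pairQueriesWithRanks(const std::vector<int>& q,
                                                      const std::vector<int>& p) {
    assert(q.size() == p.size());   // assumption: the vectors are parallel
    std::vector<std::pair<int, int>> result;
    result.reserve(q.size());
    for (std::size_t i = 0; i < q.size(); ++i) {
        result.emplace_back(q[i], p[i]);
    }
    return result;
}
```

For the scenario described above, the pair (4339, 2) would appear in the list for LIB1, and the other ranks of query 4339 could then be inspected to find the rank 1 match.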

See also:
Spectral libraries.
Parameters:
[out] q  Vector in which the list of queries is returned.
[out] p  Vector in which the list of ranks is returned.
int getLongestPeptideLen (  ) const

Return the length (in residues) of the longest peptide in the protein.

Returns:
The length of the longest peptide in the protein
int getLongestSigPeptideLen (  ) const

Return the length (in residues) of the longest significant peptide in the protein.

The threshold can be the identity or the homology threshold and is determined using ms_mascotresults::getPeptideThreshold().

Returns:
the length of the longest peptide with a score greater than the relevant threshold.
std::string getMasses ( ms_mascotresfile &  resfile,
const ms_proteinsummary &  summary,
const unsigned int  flags = MASS_SELECT_MATCH,
const int  numDecimalPlaces = 2 
) const

Return a list of comma separated experimental masses according to a specified filter.

This is useful for selecting and displaying the list of observed mass values that satisfy the selection criteria. A number of different flag combinations can produce various sets of masses. An incomplete list of possible combinations follows:

  • peaks selected for scoring that match: MASS_SELECT_MATCH
  • peaks selected for scoring that don't match: MASS_SELECT_NON_MATCH
  • peaks not used for scoring that could match: MASS_NON_SELECT_MATCH
  • peaks not used for scoring that wouldn't match: MASS_NON_SELECT_NON_MATCH
  • all peaks that could match: MASS_SELECT_MATCH | MASS_NON_SELECT_MATCH
  • all peaks that wouldn't match: MASS_NON_SELECT_NON_MATCH | MASS_SELECT_NON_MATCH

See Maintaining object references: two rules of thumb.

Parameters:
resfile  file object to extract information from.
summary  summary-object to extract information from.
flags  controls what masses should be returned (see ms_protein::MASS_FLAGS for the complete list of possible values).
numDecimalPlaces  number of decimal places for a formatted mass.
Returns:
string of concatenated observed masses separated by commas
int getMemberNumber (  ) const

Returns the member number within a family in the results list.

For a protein that is not in a family, the value returned is 0. For a 'main' protein in a family, the value returned is 1. For other members of a family, the value returned is 2 or more.

Returns:
member number of the protein
double getNonMudpitScore (  ) const

Will only return a different score from getScore() if the MSRES_MUDPIT_PROTEIN_SCORE flag has been specified.

For a protein summary, this will be the same as returned by getScore().

For a peptide summary, if the ms_mascotresults::MSRES_MUDPIT_PROTEIN_SCORE has been specified as a flag when creating the ms_peptidesummary object, then the protein score will be calculated differently to offset some artifacts created when the number of spectra approaches the number of sequences in the database (e.g. for MudPIT data sets). See getScore() for details.

If the MUDPIT flag was specified, then the old score can be obtained using this function.

For a peptide summary where ms_mascotresults::MSRES_MUDPIT_PROTEIN_SCORE was not specified, this function will return the same value as getScore().

Returns:
non mudpit score of the protein
int getNumComponents (  ) const

For UniGene and PMF mixture, return number of 'component' proteins.

For UniGene and PMF mixture, each 'protein' is made up of a number of components. Call this method to see how many 'components' were used to make up this 'pseudo' protein. Then call getComponent() to get each of the proteins in turn.

For a 'real' protein (i.e. not a mixture or UniGene entry), this method will return zero.

See Peptide mass fingerprint mixtures.

Returns:
The number of components.
int getNumDisplayPeptides ( bool  aboveThreshold = false ) const

Return the number of peptides, excluding duplicate matches to the same query.

There can be multiple matches to a peptide from the same query; this will occur when there are matches with different mods or mods in different locations. In this case, it is normal to display them using the following loop:

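 // prot is a pointer to an ms_protein, e.g. as returned by ms_peptidesummary::getHit()
 for (int i = 1; i <= prot->getNumPeptides(); i++)
 {
   int query = prot->getPeptideQuery(i);
   int p     = prot->getPeptideP(i);
 
   // Skip invalid entries and lower scoring duplicates from the same query
   if (p != -1
       && query != -1
       && prot->getPeptideDuplicate(i) != ms_protein::DUPE_DuplicateSameQuery)
   {
       // Display peptide match
   }
 }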

For an error tolerant search, if the top match to a query is an error tolerant match, then the query does not contribute to the number of matches above the threshold even if the rank 2 match for the query is above the threshold.
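
The filtering in the loop above can be tried out without a results file by modelling each peptide as a plain record. This is only a sketch: `PeptideRow`, `countDisplayRows()` and `exampleRows()` are hypothetical, the scores are made up, treating a score equal to the threshold as passing is an assumption, and the error tolerant rule just described is not modelled.

```cpp
#include <cassert>
#include <vector>

// Enumerator names copied from ms_protein::DUPLICATE.
enum Duplicate {
    DUPE_NotDuplicate,
    DUPE_Duplicate,
    DUPE_DuplicateSameQuery,
    DUPE_HighestScoringDuplicate,
    DUPE_Ignored
};

// Hypothetical stand-in for one peptide entry of a protein; not a Parser type.
struct PeptideRow {
    int query;
    int p;
    Duplicate duplicate;
    double ionsScore;
};

// Counts the rows the display loop would show, optionally applying a threshold.
int countDisplayRows(const std::vector<PeptideRow>& rows,
                     bool aboveThreshold = false, double threshold = 0.0) {
    int count = 0;
    for (const PeptideRow& row : rows) {
        assert(row.query >= -1 && row.p >= -1);   // -1 marks an invalid entry
        if (row.p == -1 || row.query == -1) continue;
        if (row.duplicate == DUPE_DuplicateSameQuery) continue;
        if (aboveThreshold && row.ionsScore < threshold) continue;
        ++count;
    }
    return count;
}

// Four made-up entries, one of which duplicates another match to query 11.
std::vector<PeptideRow> exampleRows() {
    return {
        {10, 1, DUPE_NotDuplicate, 45.0},
        {11, 1, DUPE_HighestScoringDuplicate, 60.0},
        {11, 2, DUPE_DuplicateSameQuery, 52.0},
        {12, 1, DUPE_Duplicate, 20.0}
    };
}
```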

Parameters:
aboveThreshold  If true, the function will only count the number of peptides above the threshold. The threshold used will be ms_mascotresults::getPeptideIdentityThreshold() unless ms_peptidesummary::MSPEPSUM_USE_HOMOLOGY_THRESH was specified in the ms_peptidesummary constructor in which case the threshold is returned by ms_peptidesummary::getHomologyThreshold().
Returns:
the number of peptides that would be displayed.
Examples:
resfile_summary.cpp.
int getNumDistinctPeptideRepeats ( int  distinctIndex,
bool  aboveThreshold = false,
DISTINCT_PEPTIDE_FLAGS  flags = DPF_SEQUENCE 
) const

Return the number of repeats of the distinct peptide in the protein's peptide matches.

getNumDistinctPeptides() returns the number of distinct peptides (distinct as defined by the calling parameters). Each distinct peptide may occur more than once in the complete set of peptide matches for the protein.

This returns the number of times the distinct peptide occurs. The individual occurrences can then be accessed with getDistinctPeptide().

When determining the valid ranges for the index, the same values should be passed for the aboveThreshold and flags parameters.

Parameters:
distinctIndex  Index into a list of distinct peptides, 1..getNumDistinctPeptides()
aboveThreshold  If true, the function will only count the number of peptides above the threshold. The threshold used will be ms_mascotresults::getPeptideIdentityThreshold() unless ms_peptidesummary::MSPEPSUM_USE_HOMOLOGY_THRESH was specified in the ms_peptidesummary constructor in which case the threshold is returned by ms_peptidesummary::getHomologyThreshold().
flags  See ms_protein::DISTINCT_PEPTIDE_FLAGS for details.
Returns:
the count of the number of repeats of the chosen distinct peptide.
int getNumDistinctPeptides ( bool  aboveThreshold = false,
DISTINCT_PEPTIDE_FLAGS  flags = DPF_SEQUENCE 
) const

Return the number of distinct peptides in the protein sequence.

Useful, for example, for MCP reports.

Parameters:
aboveThreshold  If true, the function will only count the number of peptides above the threshold. The threshold used will be ms_mascotresults::getPeptideIdentityThreshold() unless ms_peptidesummary::MSPEPSUM_USE_HOMOLOGY_THRESH was specified in the ms_peptidesummary constructor in which case the threshold is returned by ms_peptidesummary::getHomologyThreshold().
flags  See ms_protein::DISTINCT_PEPTIDE_FLAGS for details.
Returns:
the count of the number of distinct peptides.
int getNumObservedForEmPAI (  ) const

Return the number of peptides 'observed' for emPAI quantitation calculation.

This function should not normally be called directly but is called by ms_mascotresults::getProteinEmPAI().

The count of observed peptides only includes peptide matches with scores at or above the homology threshold, or the identity threshold, if there is no homology threshold. Ishihama et al. obtained best proportionality for a standard protein mixture by counting unique parent ions, including different charge states from the same peptide sequence. This function counts the number of unique parent ions ignoring charge state, which produces better results when the number of charge states is large (e.g. 2+, 3+, 4+, 5+, 6+ and 7+). The differences are negligible when the data are only singly or doubly charged.

This function will still return a value even if ms_mascotresults::isEmPAIallowed() returns false.

The value returned is stored in the cache file when using the ms_peptidesummary cache (see Using the ms_peptidesummary cache).

Returns:
the count of peptides used for the emPAI value. Will be 0 if no peptides are above the specified threshold.
int getNumPeptides (  ) const

Return the number of peptides that had a match in this protein.

This includes peptides that are duplicates. See also getNumDisplayPeptides().

Returns:
number of peptides in the protein
Examples:
resfile_summary.cpp.
int getPepNumber ( const int  q,
const int  p 
) const

Return the pepNumber given query and rank.

A matched protein contains a number of peptides. Further information about the peptide in the context of the protein can be obtained by calling getPeptideFrame(), getPeptideStart(), getPeptideEnd() etc. These functions all require a pepNumber, and this may be found using this function in cases where only q and p are readily available.

Parameters:
q  is the query number in the range 1 to ms_mascotresfile::getNumQueries().
p  is the 'rank' number. For a peptide summary, the top 10 matches are saved and hence p would normally be in the range 1 to 10. See ms_peptidesummary::getMaxRankValue() and ms_proteinsummary::getMaxRankValue().
Returns:
pepNumber will be -1 if there is no matching peptide with these q and p values. Otherwise it will be in the range 1 to getNumPeptides().
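
The lookup contract described here (1-based result, -1 when nothing matches) can be sketched with a plain vector standing in for the protein's peptide list. `findPepNumber()` is a hypothetical helper and says nothing about how the library stores or searches its data.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// peptides[i] holds the (query, rank) pair for pepNumber i + 1.
// Returns the 1-based pepNumber, or -1 if no entry has these q and p values.
int findPepNumber(const std::vector<std::pair<int, int>>& peptides, int q, int p) {
    assert(q >= 1 && p >= 1);
    for (std::size_t i = 0; i < peptides.size(); ++i) {
        if (peptides[i].first == q && peptides[i].second == p) {
            return static_cast<int>(i) + 1;
        }
    }
    return -1;
}
```

A caller of the real getPepNumber() should likewise check for -1 before passing the result to getPeptideStart(), getPeptideEnd() and similar functions.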
int getPeptideComponentID ( const int  pepNumber ) const

Returns 0 except for a UniGene entry or a PMF mixture entry.

A matched protein contains a number of peptides. Further information about the peptide can be obtained by getting an ms_peptide object using getPeptideQuery(), getPeptideP() and ms_mascotresults::getPeptide().

If this protein is really just a UniGene entry, or a PMF mixture entry then it is not a 'real' protein but just a container for a number of component proteins. In this case, each peptide originates from one of the components. The 'real' protein that corresponds to each component can be found using getComponent().

In the case of the same peptide being found in multiple components, the ID returned will be any one of these components.

Parameters:
pepNumber  must be in the range 1 to getNumPeptides().
Returns:
component id for the peptide whose index is passed
ms_protein::DUPLICATE getPeptideDuplicate ( const int  pepNumber ) const

Return the DUPLICATE status given the peptide 'number'.

A matched protein contains a number of peptides. Further information about the peptide can be obtained by getting an ms_peptide object using getPeptideQuery(), getPeptideP() and ms_mascotresults::getPeptide().

However, duplicate peptides could possibly be different for different proteins, so this value is not available in the ms_peptide object, but can be found here in the protein object.

Parameters:
pepNumber  must be in the range 1 to getNumPeptides().
Returns:
duplicate status for peptide whose index is passed in parameter
Examples:
resfile_summary.cpp.
long getPeptideEnd ( const int  pepNumber ) const

Return the peptide end residue given the peptide 'number'.

A matched protein contains a number of peptides. Further information about the peptide can be obtained by getting an ms_peptide object using getPeptideQuery(), getPeptideP() and ms_mascotresults::getPeptide().

However, the same peptide sequence may occur in different places in different proteins, so the start and end residue information is not available in the ms_peptide object, but can be found here using this function and getPeptideStart(). The returned number is 1-based.

A value of -1 is returned if there is an error.

Parameters:
pepNumber  must be in the range 1 to getNumPeptides().
Returns:
end of the peptide whose index is passed in parameter
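
Because the start and end positions are 1-based and inclusive, a conversion is needed before they can be used with 0-based string functions. The sketch below is one way to do that, assuming the caller already holds the protein sequence as a string (ms_protein is not shown here as providing it); `extractPeptide` is a hypothetical helper, not part of Mascot Parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: cut a peptide out of a protein sequence using
// 1-based, inclusive start/end positions such as those returned by
// getPeptideStart() and getPeptideEnd().
std::string extractPeptide(const std::string& proteinSeq, long start, long end) {
    if (start == -1 || end == -1) {
        return std::string();  // the lookup reported an error
    }
    // Anything else outside the sequence is treated as a caller bug here.
    assert(start >= 1 && start <= end &&
           end <= static_cast<long>(proteinSeq.size()));
    // Shift the 1-based start to a 0-based offset; length is inclusive.
    return proteinSeq.substr(static_cast<std::size_t>(start - 1),
                             static_cast<std::size_t>(end - start + 1));
}
```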
int getPeptideFrame ( const int  pepNumber ) const

Return the frame number given the peptide 'number'.

A matched protein contains a number of peptides. Further information about the peptide can be obtained by getting an ms_peptide object using getPeptideQuery(), getPeptideP() and ms_mascotresults::getPeptide().

However, the same peptide sequence may occur in different frames in different proteins, so the frame information is not available in the ms_peptide object, but can be found here in the protein object.

Parameters:
pepNumber  must be in the range 1 to getNumPeptides().
Returns:
frame of the peptide whose index is passed in parameter
double getPeptideIonsScore ( const int  pepNumber ) const

Return the ions score within this protein context given the peptide 'number'.

A matched protein contains a number of peptides. Further information about the peptide can be obtained by getting an ms_peptide object using getPeptideQuery(), getPeptideP() and ms_mascotresults::getPeptide().

However, there are minor corrections to the score for each peptide depending on the protein that it is found in.

The Mascot results pages display the score returned from ms_peptide::getIonsScore() because results from similar proteins are displayed together. For ms_proteinsummary results, the return values from ms_peptide::getIonsScore() and ms_protein::getPeptideIonsScore() will be identical.

For an integrated error tolerant search where ms_mascotresults::MSRES_INTEGRATED_ERR_TOL is specified, the protein score is derived from the highest scoring non error tolerant match for each query, and this is the value returned by this function.

In an integrated spectral library search, if the peptide at pepNumber is a library match, the return values from ms_peptide::getIonsScore() and ms_protein::getPeptideIonsScore() will be identical. Library scores are not affected by multiplicity correction.

Parameters:
pepNumber  must be in the range 1 to getNumPeptides().
Returns:
the peptide ions score as used in the protein score.
bool getPeptideIsBold ( const int  pepNumber ) const

Returns true if this peptide should be displayed in bold in a Mascot report.

A matched protein contains a number of peptides. Further information about the peptide can be obtained by getting an ms_peptide object using getPeptideQuery(), getPeptideP() and ms_mascotresults::getPeptide().

This function returns true if this peptide should be displayed in bold in a Mascot report. Bold is used for the first time a query is shown in a report. See also ms_peptide::getFirstProtAppearedIn().
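
As a rough illustration of the "first time a query is shown" rule, the sketch below flags first appearances in a list of query numbers given in report order. This is only an assumption about how such a flag could be derived; Mascot Parser computes it internally, and `firstAppearance` is a hypothetical name.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Given the query number of every peptide row in report order, flag the
// rows that show their query for the first time.
std::vector<bool> firstAppearance(const std::vector<int>& queriesInReportOrder) {
    std::set<int> seen;
    std::vector<bool> bold;
    bold.reserve(queriesInReportOrder.size());
    for (int q : queriesInReportOrder) {
        assert(q >= 1);  // query numbers start at 1
        // insert() reports whether q was new, i.e. not shown earlier.
        bold.push_back(seen.insert(q).second);
    }
    return bold;
}
```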

Parameters:
pepNumber  must be in the range 1 to getNumPeptides().
Returns:
whether the peptide whose index is passed in parameter is bold
Examples:
resfile_summary.cpp.
long getPeptideMultiplicity ( const int  pepNumber ) const

Return the number of precursor matches in this protein for the specified peptide 'number'.

A matched protein contains a number of peptides. Further information about the peptide can be obtained by getting an ms_peptide object using getPeptideQuery(), getPeptideP() and ms_mascotresults::getPeptide().

The multiplicity value is the number of times that the precursor mass for the specified peptide got a match in this protein. With a tight tolerance and no variable modifications, this will normally be a small number. For a large protein, with no enzyme specificity and a large number of modifications (or an error tolerant search), this can be a large number.

This value is used internally for standard protein scoring and is not normally required outside Mascot Parser.

See also:
ms_mascotresults::getIonsScoreCorrected() and getScore() for details of how this value is used for protein scoring.
Parameters:
pepNumber  must be in the range 1 to getNumPeptides().
Returns:
The number of matches to the precursor in this protein.
int getPeptideP ( const int  pepNumber ) const

Return the 'rank' number given the peptide 'number'.

A matched protein contains a number of peptides. These peptides all originate from a 'query'. For peptide summary information, the top 10 scoring results are kept. This is the 'rank' number.

For protein summary information, 'P' refers to the protein hit number.

To get an ms_peptide object, call ms_mascotresults::getPeptide() using the return value from this function and the 'query' from ms_protein::getPeptideQuery().

Parameters:
pepNumber  must be in the range 1 to getNumPeptides().
Returns:
rank number for the peptide whose index is passed in parameter
Examples:
resfile_summary.cpp.
int getPeptideQuery ( const int  pepNumber ) const

Return the query number given the peptide 'number'.

A matched protein contains a number of peptides. These peptides all originate from a 'query'. The query number is returned by this function.

To get an ms_peptide object, call ms_mascotresults::getPeptide() using the return value from this function and the 'p' value from ms_protein::getPeptideP().

Parameters:
pepNumber  must be in the range 1 to getNumPeptides().
Returns:
query number for the peptide whose index is passed in parameter
Examples:
resfile_summary.cpp.
char getPeptideResidueAfter ( const int  pepNumber ) const

Returns the residue immediately after the peptide.

A matched protein contains a number of peptides. Further information about the peptide can be obtained by getting an ms_peptide object using getPeptideQuery(), getPeptideP() and ms_mascotresults::getPeptide().

The residue before and after are only saved in the results files for Mascot 2.1 and later. For files created with earlier versions of Mascot, a '?' will be returned.

If the peptide is a C-terminal peptide, then this function will return '-'.

If the search was against nucleic acid data and the peptide is just before a stop codon, then this function will return '@'.

Parameters:
pepNumber  must be in the range 1 to getNumPeptides().
Returns:
residue after the peptide whose index is passed in parameter
char getPeptideResidueBefore ( const int  pepNumber ) const

Returns the residue immediately before the peptide.

A matched protein contains a number of peptides. Further information about the peptide can be obtained by getting an ms_peptide object using getPeptideQuery(), getPeptideP() and ms_mascotresults::getPeptide().

The residue before and after are only saved in the results files for Mascot 2.1 and later. For files created with earlier versions of Mascot, a '?' will be returned.

If the peptide is an N-terminal peptide, then this function will return '-'.

If the search was against nucleic acid data and the peptide is just after a stop codon, then this function will return '@'.

Parameters:
pepNumber  must be in the range 1 to getNumPeptides().
Returns:
residue before the peptide whose index is passed in parameter
bool getPeptideShowCheckbox ( const int  pepNumber ) const

Returns true if a check box for repeat searches should be shown in a Mascot report.

A matched protein contains a number of peptides. Further information about the peptide can be obtained by getting an ms_peptide object using getPeptideQuery(), getPeptideP() and ms_mascotresults::getPeptide().

A check box is displayed if this is the first rank 1 match that has been displayed for this query. See also ms_peptide::getRank() and ms_peptide::getPrettyRank().

By definition, all unassigned queries will need a check box.

Parameters:
pepNumber  must be in the range 1 to getNumPeptides().
Returns:
whether a check box is displayed for the peptide whose index is passed in parameter
Examples:
resfile_summary.cpp.
long getPeptideStart ( const int  pepNumber ) const

Return the peptide start residue given the peptide 'number'.

A matched protein contains a number of peptides. Further information about the peptide can be obtained by getting an ms_peptide object using getPeptideQuery(), getPeptideP() and ms_mascotresults::getPeptide().

However, the same peptide sequence may occur in different places in different proteins, so the start and end residue information is not available in the ms_peptide object, but can be found here using this function and getPeptideEnd(). The returned number is 1-based.

A value of -1 is returned if there is an error.

Parameters:
pepNumber  must be in the range 1 to getNumPeptides().
Returns:
start of the peptide whose index is passed in parameter
int getProteinSummaryHit (  ) const

For a protein from the protein summary only.

There should be no real reason to use this method outside the library, apart from when determining a value to pass as the singleHit parameter when calling ms_proteinsummary::ms_proteinsummary().

If the protein came from the summary section (or mixture section) then this will return the hit number. For a protein that came from a ms_peptidesummary, this will return zero.

Used within the library to keep the sort order for a PMF the same as in the results file (this only matters for proteins with a similar score).

Returns:
hit number of the protein in the protein summary
double getRMSDeltas ( const ms_mascotresults results ) const

Return the RMS value of the deltas between the calculated and experimental value.

The value is returned in ppm.

Parameters:
results  reference to an ms_mascotresults object
Returns:
RMS value of all deltas in the protein
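
The text does not spell out how each delta is converted to ppm, so the following is only a sketch under one common assumption: each error is taken relative to the calculated mass, (experimental - calculated) / calculated × 10^6. `MassPair` and `rmsDeltaPpm` are hypothetical names and not part of Mascot Parser.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical pair of masses for one peptide match.
struct MassPair {
    double calculated;    // theoretical mass
    double experimental;  // observed mass
};

// Root-mean-square of the per-match mass errors, expressed in ppm.
// Assumption: each error is taken relative to the calculated mass.
double rmsDeltaPpm(const std::vector<MassPair>& matches) {
    if (matches.empty()) {
        return 0.0;  // nothing to average
    }
    double sumSquares = 0.0;
    for (const MassPair& m : matches) {
        assert(m.calculated > 0.0);  // avoid dividing by zero
        const double ppm =
            (m.experimental - m.calculated) / m.calculated * 1e6;
        sumSquares += ppm * ppm;
    }
    return std::sqrt(sumSquares / static_cast<double>(matches.size()));
}
```

Squaring before averaging means that positive and negative errors do not cancel, which is why an RMS figure is more informative here than a plain mean.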
double getScore (  ) const

Return the protein score for this protein.

Two protein scoring algorithms are available: MudPIT scoring (recommended) and standard scoring. For a protein summary, only standard scoring is supported.

MudPIT Scoring (recommended)

If the flag ms_mascotresults::MSRES_MUDPIT_PROTEIN_SCORE is specified, protein score is calculated by:

  Protein score = 0
  For each peptide match {                                                
    If there is a homology threshold and ions score > homology threshold  
    {                                                                     
      Protein score += ions score - homology threshold                    
    } else if ions score > identity threshold {                           
      Protein score += ions score - identity threshold                    
    }                                                                     
  }                                                                       
  Protein score += 1 * average of all the subtracted thresholds           
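
The pseudocode above can be written out as a small stand-alone C++ function. This is a sketch for illustration, not the library's implementation: `MatchScore` and `mudpitScore` are hypothetical names, the inputs are assumed to be already available per match, and the handling of the case where no threshold was subtracted (nothing is added) is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-match inputs; real values come from the results file.
struct MatchScore {
    double ionsScore;
    double identityThreshold;
    bool   hasHomology;        // true if a homology threshold exists
    double homologyThreshold;  // ignored when hasHomology is false
};

double mudpitScore(const std::vector<MatchScore>& matches) {
    double score = 0.0;
    double thresholdSum = 0.0;  // thresholds that were actually subtracted
    int    subtracted = 0;
    for (const MatchScore& m : matches) {
        assert(m.identityThreshold >= 0.0);
        if (m.hasHomology && m.ionsScore > m.homologyThreshold) {
            score += m.ionsScore - m.homologyThreshold;
            thresholdSum += m.homologyThreshold;
            ++subtracted;
        } else if (m.ionsScore > m.identityThreshold) {
            score += m.ionsScore - m.identityThreshold;
            thresholdSum += m.identityThreshold;
            ++subtracted;
        }
    }
    // Assumption: with no qualifying match there is nothing to average.
    if (subtracted > 0) {
        score += 1.0 * (thresholdSum / subtracted);
    }
    return score;
}
```

Matches that fall below both thresholds contribute nothing, so a protein made up only of weak matches scores 0 in this sketch.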

In spectral library searches (Parser 2.6 and later), the algorithm is the same but the score excess over threshold has a different form depending on library mode:

  • In SL-only mode, proteins contain only spectral library matches, so MudPIT score is calculated as above using the raw library scores and library score threshold.
  • In integrated library mode, proteins can contain a mixture of FASTA and library matches. Library scores are first converted to a score excess on the same scale as Mascot scores. (For a detailed discussion, see Advanced reading: calculating the spectral library score threshold.)

In versions prior to 2.2, the thresholds were not affected by the minProbability parameter of the ms_peptidesummary constructor; a default value of 1 in 20 was always used.

Standard Scoring

The standard protein score is the sum of ions scores for each match. For duplicate peptides, just the highest score is taken.

The standard score can be lower than the sum of the ions scores, particularly when the protein is large. This is because a correction is applied to compensate for the accumulation of random ions scores from random matches. The difference is more substantial when doing a no-enzyme search, because there are orders of magnitude more random matches. See the function ms_mascotresults::getIonsScoreCorrected() for details of how the correction is calculated. If the correction causes the ions score to become negative, then this ions score is ignored when calculating the protein score.

Note that the match score correction is only used with standard scoring. MudPIT scoring always uses the uncorrected score.
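
A minimal sketch of the standard scoring rules described above follows. It assumes the correction has already been applied to each ions score (for example via ms_mascotresults::getIonsScoreCorrected()) and that duplicates can be recognised by a caller-supplied key; `CorrectedMatch`, `peptideKey` and `standardScore` are hypothetical names, not part of Mascot Parser.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct CorrectedMatch {
    std::string peptideKey;  // hypothetical identity used to spot duplicates
    double correctedScore;   // ions score after the correction
};

double standardScore(const std::vector<CorrectedMatch>& matches) {
    std::map<std::string, double> best;  // highest score per distinct peptide
    for (const CorrectedMatch& m : matches) {
        assert(!m.peptideKey.empty());
        if (m.correctedScore < 0.0) {
            continue;  // negative after correction: ignored
        }
        auto it = best.find(m.peptideKey);
        if (it == best.end() || m.correctedScore > it->second) {
            best[m.peptideKey] = m.correctedScore;
        }
    }
    double total = 0.0;
    for (const auto& entry : best) {
        total += entry.second;
    }
    return total;
}
```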

See also:
getNonMudpitScore(), ms_mascotoptions::getMudpitSwitch()
Returns:
score of the protein
Examples:
repeat_search.cpp, and resfile_summary.cpp.
double getScoreWithET (  ) const

Return the protein score including ET matches for this protein.

Returns the score of the protein including ET matches. This is used mainly for Family grouping, to differentiate between two otherwise identical proteins.

Returns:
score of the protein including error tolerant matches
int getSimilarProteinDB (  ) const

Return the database index of a protein that contains the same set (or a superset) of the peptides in this protein.

Deprecated:
Use ms_protein::getSimilarProteins(), ms_mascotresults::getNextSimilarProtein() or ms_mascotresults::getNextSimilarProteinOf() instead.

This function returns a single protein database ID. When using the ms_mascotresults::MSRES_CLUSTER_PROTEINS flag, a subset protein may be a subset of more than one parent protein. To find the complete list of proteins that it is a subset of, call getSimilarProteins(). This function just returns the 'first' protein in the list of superset proteins. There will only be multiple 'similar' proteins in cases where ms_mascotresults::MSRES_CLUSTER_PROTEINS is specified and where this protein is a GROUP_SUBSET. There will only be a single 'similar' protein where this protein is GROUP_COMPLETE.

See Grouping proteins together, getGrouping() and getSimilarProteinName().

Returns:
the database ID for the sameset or superset protein.
std::string getSimilarProteinName (  ) const

Return the accession of a protein that contains the same set (or a superset) of the peptides in this protein.

Deprecated:
Use ms_protein::getSimilarProteins(), ms_mascotresults::getNextSimilarProtein() or ms_mascotresults::getNextSimilarProteinOf() instead.

This function returns a single protein accession string. When using the ms_mascotresults::MSRES_CLUSTER_PROTEINS flag, a subset protein may be a subset of more than one parent protein. To find the complete list of proteins that it is a subset of, call getSimilarProteins(). This function just returns the 'first' protein in the list of superset proteins. There will only be multiple 'similar' proteins in cases where ms_mascotresults::MSRES_CLUSTER_PROTEINS is specified and where this protein is a GROUP_SUBSET. There will only be a single 'similar' protein where this protein is GROUP_COMPLETE.

See Grouping proteins together, getGrouping() and getSimilarProteinDB().

Returns:
the accession for the sameset or superset protein.
int getSimilarProteins ( std::vector< std::string > &  accessions,
std::vector< int > &  dbIdxs 
) const

Return a list of proteins that contain the same set (or a superset) of the peptides in this protein.

See Using MSRES_CLUSTER_PROTEINS.

Parameters:
accessions  Is the list of accessions for which this protein is a sameset or a subset. See Using STL classes in Perl, Java, Python and C#.
dbIdxs  Is the corresponding list of databases for the accessions. This array will be the same size as the accessions array. For a search against a single database, all the IDs will be 1. See Using STL classes in Perl, Java, Python and C#.
Returns:
The number of accessions and database IDs returned in the vectors.
std::string getUnmatchedMasses ( ms_mascotresfile resfile,
const int  numDecimalPlaces = 2 
) const

Return a comma-separated list of experimental masses that don't match.

This is useful for displaying the list of observed mass values that failed to get a match to a protein hit in a PMF (listed at end of each hit in a protein summary report).

Parameters:
resfile  reference to a ms_mascotresfile object
numDecimalPlaces  decimal precision
Returns:
string of concatenated observed masses separated by commas
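
To give a feel for the shape of the returned string, here is a stand-alone sketch that joins mass values with a fixed number of decimal places. It does not reproduce the selection of unmatched masses, and the exact separator (a bare comma with no space) is an assumption; `joinMasses` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Format each mass with a fixed number of decimals and join with commas.
std::string joinMasses(const std::vector<double>& masses,
                       int numDecimalPlaces = 2) {
    assert(numDecimalPlaces >= 0);
    std::ostringstream out;
    out << std::fixed << std::setprecision(numDecimalPlaces);
    for (std::size_t i = 0; i < masses.size(); ++i) {
        if (i > 0) {
            out << ',';  // separator goes between values only
        }
        out << masses[i];
    }
    return out.str();
}
```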
bool isASimilarProtein ( const ms_protein prot,
const ms_mascotresults results,
const bool  groupByQueryNumber = false 
)

Find a protein in the results.

Note:
This function should not need to be called from outside Mascot Parser.

The function looks to see if 'self' contains the same set or a subset of matching peptides as the passed 'prot'. If it does, then it sets its group to be GROUP_COMPLETE or GROUP_SUBSET and also sets the similar protein accession.

Parameters:
prot  Is the protein to compare.
results  Needs to be passed for access to the peptide information.
groupByQueryNumber  Is used to determine whether peptide similarity is just by query number (for PMF) or by peptide string (for MS-MS).
Returns:
TRUE if the proteins are similar.
bool isPMFMixture (  ) const

Returns true if the 'protein' is actually a PMF mixture.

To find out what 'component' proteins were used to get this entry, see getComponent() and Peptide mass fingerprint mixtures.

Returns:
true if the protein originates from a PMF mixture, false otherwise
void setDB ( int  idx )

Set database index.

Parameters:
idx  database index from 1 to ms_searchparams::getNumberOfDatabases().
void setPeptideIsBold ( const int  pepNumber )
Note:
This function should normally only be used internally.

A matched protein contains a number of peptides. Further information about the peptide can be obtained by getting an ms_peptide object using getPeptideQuery(), getPeptideP() and ms_mascotresults::getPeptide().

Parameters:
pepNumber  must be in the range 1 to getNumPeptides().
void setPeptideShowCheckbox ( const int  pepNumber )

A matched protein contains a number of peptides. Further information about the peptide can be obtained by getting an ms_peptide object using getPeptideQuery(), getPeptideP() and ms_mascotresults::getPeptide().

A check box is displayed if this is the first rank 1 match that has been displayed for this query. See also ms_peptide::getRank() and ms_peptide::getPrettyRank().

By definition, all unassigned queries will need a check box.

Parameters:
pepNumber  must be in the range 1 to getNumPeptides().

Friends And Related Function Documentation

bool operator< ( const ms_protein lhs,
const ms_protein rhs 
) [friend]

Protein objects perform a simple sort of themselves by database ID and then accession.

Note:
This method can only be used by C++ programs.

Final sorting for proteins by score and then accession is more complex.
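
The "database ID, then accession" ordering can be sketched with a lexicographic comparison. The example uses a hypothetical `ProteinKey` struct rather than ms_protein itself, so it illustrates the ordering rule only and is not the library's operator.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical minimal key; ms_protein itself carries much more state.
struct ProteinKey {
    int dbIdx;
    std::string accession;
};

// Order by database index first, then by accession.
bool operator<(const ProteinKey& lhs, const ProteinKey& rhs) {
    return std::tie(lhs.dbIdx, lhs.accession) <
           std::tie(rhs.dbIdx, rhs.accession);
}

// Sort in place using the operator above.
void sortByKey(std::vector<ProteinKey>& proteins) {
    std::sort(proteins.begin(), proteins.end());
    assert(std::is_sorted(proteins.begin(), proteins.end()));
}
```

Comparing tuples of references keeps the two-level rule in one expression and avoids hand-written tie-breaking logic.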

Parameters:
lhs  left element to compare
rhs  right element to compare

Copyright © 2016 Matrix Science Ltd.  All Rights Reserved. Generated on Fri Jun 2 2017 01:44:53