
Crosslinked search results

Mascot 2.7 and later can search for intact crosslinked peptides as well as peptides with cleavable crosslinks. A typical example of intact crosslinking is disuccinimidyl suberate (DSS), which chemically bonds lysines in two different peptides. An MS-cleavable linker like disuccinimidyl sulfoxide (DSSO) also links lysines but mostly cleaves during CID.

Intact crosslinked peptide matches consist of the alpha peptide sequence, the beta peptide sequence and the linker. The intact link can occasionally cleave during MS/MS, which leaves behind a linker fragment called the monolink. Monolinks are modelled as ordinary variable modifications.

Conversely, cleavable crosslinks are always modelled as monolinks. The alpha and beta peptides hold different ends of the linker fragment, and there is one peptide-spectrum match for alpha and another for beta. However, if cleaving efficiency is less than 100%, some of the links may survive MS or MS/MS intact, so the search results may still contain intact crosslinked peptide matches. The rules are defined in the crosslinking method.

A linker is often chemically quenched during sample processing when one end reacts with a non-peptide molecule. The distinction between quenched linkers and linker fragments is made in the neutral loss elements of the linker definition. Mascot treats both types simply as monolinks.

By default, Parser opens crosslinked search results in a backwards-compatible mode, where intact crosslinked matches are invisible. Adding support in client code is more involved than just setting a constructor flag. Many of the methods either take a new argument in Parser 2.7, or they interpret existing arguments differently.

Glossary of terminology and concepts

| Term | Synonyms | Definition |
| --- | --- | --- |
| linear peptide | | a single peptide sequence without branches or cycles |
| (intact) crosslink | chemical crosslink; non-cleavable link | a link between two peptides that survives MS/MS (e.g. DSS/BS3, DST, ...) |
| cleavable crosslink | | a link between two peptides that cleaves during CID or is cleaved by other means before MS/MS |
| monolink | type 0 modification; deadend link | a linear peptide with a linker fragment whose other end is unattached |
| looplinked peptide | type 1 modification; intrapeptide link; intramolecular link | a single peptide where two sites are linked, forming a loop |
| crosslinked peptide | type 2 modification; interpeptide link; intermolecular link | two linear peptides joined by the intact crosslink |
| alpha peptide, beta peptide | alpha chain, beta chain | the two linear peptides that make up a crosslinked unit; alpha is the heavier or longer one |
| homo-crosslinked peptide | | type 2 link where alpha and beta peptides have the same sequence |
| hetero-crosslinked peptide | | type 2 link where alpha and beta peptides have different sequences |
| protein intralink | | a protein where two non-overlapping peptides have an intact crosslink |
| protein interlink | | two proteins with an intact crosslink (either homo-crosslinked or hetero-crosslinked proteins) |
| crosslinking method | | a set of parameters and settings that define which types of links to search |

Note that a looplinked peptide (type 1) is different from a homo-crosslinked peptide (type 2).

Note also that either or both of the alpha or beta peptide in a crosslinked pair may contain looplinks. For simplicity of terminology, a linear peptide denotes both a single peptide sequence (with or without looplinks) and the alpha or beta peptide in a crosslinked pair (with or without looplinks).

Detecting and opening crosslinked search results

To determine whether a results file might contain matches with intact crosslinks, looplinks or monolinks, use ms_mascotresfile::getCrosslinkingMethod(). If the search has a crosslinking method, some of the linear peptide matches in the 'peptides' section may contain monolinks or looplinks. There is currently no easy check to determine whether there are any matches with monolinks or looplinks, other than iterating through all peptide matches. No special constructor flags are needed for accessing linear peptide data.

To determine whether there actually are crosslinked matches, use ms_mascotresfile::anyPeptideSummaryMatches(). If the test returns true for SEC_CROSSLINK_PEPTIDES, then the results file has a non-empty section containing crosslinked matches and the file should be opened in integrated crosslink mode.

Three modes are available:

  1. Linear-only mode (no flag): The backwards-compatible default is to hide crosslinked matches.
    If the search contains only crosslinked matches, then opening the file in this mode means there are no matches available at all.
  2. Integrated crosslink mode (MSPEPSUM_CROSSLINK_INTEGRATED): Show matches from all sections of the results file.
  3. Crosslink-only mode (MSPEPSUM_CROSSLINK_ONLY): Hide linear matches (not recommended).

The helper function ms_mascotresfile::get_ms_mascotresults_params() does not set MSPEPSUM_CROSSLINK_INTEGRATED automatically. This means code written for Parser 2.6 and earlier always opens crosslinked search results in linear-only mode.

Parser 2.7 and Mascot 2.7 do not support the following search types in combination with crosslinking:

These restrictions may be lifted in a future release.

Opening the file in integrated mode

To open the file in integrated mode, pass the flag MSPEPSUM_CROSSLINK_INTEGRATED to the matrix_science::ms_peptidesummary constructor.

The integrated crosslink mode is similar to the integrated error tolerant mode (see Integrated error tolerant search) and the integrated spectral library mode (see Opening the file in integrated mode). A query can contain a mixture of up to 20 linear and intact crosslinked peptide matches (see ms_mascotresults::getMaxRankValue()).

The identity and homology thresholds are determined by pooling match data from both linear and crosslinked peptides. For example, the total number of trials (qmatch) is the sum of qmatch from the linear summary section and qmatch from the crosslinked summary section.

Integrated mode is the preferred mode for opening crosslinked search results.

Opening the file in crosslink-only mode

To open the file in crosslink-only mode, pass the flag MSPEPSUM_CROSSLINK_ONLY to the matrix_science::ms_peptidesummary constructor.

If the search contains only linear matches, then opening the file in this mode means there are no matches available at all. Using this mode is not recommended, because you will get misleading significance thresholds.

The crosslinking method

The crosslinking method defines the linkers used in the search as well as parameters like whether to search for protein intralinks or protein interlinks. See the Mascot help page for more detail.

Use ms_mascotresfile::getCrosslinkingMethod() or ms_peptidesummary::getCrosslinkingMethod() to access the crosslinking method object.

How crosslinked and looplinked match data are encoded

When Mascot parses the crosslinking method at the beginning of the search, it assigns a variable mod number to each linker specificity. For example, if the method defines linker specificities Xlink:DSS (Protein N-term) and Xlink:DSS (K), the first one could be varmod number 1 and the second varmod number 2. The linked sites are encoded in the variable mods string, and additionally in the linked_sites attribute.

Here is an example of a match with one intact link between alpha K4 (1:4:2) and beta K5 (2:5:2) and no variable modifications.

q9913_p1=-0.000397,11,66,16.87,1010001000200001201,0,0
q9913_p1_sequence_1=1,801.470831,NLGKVGSK,0000200000;"ALBU_HUMAN":0:453:460:1
q9913_p1_sequence_2=1,746.403463,ASSAKQR,000002000;"ALBU_HUMAN":0:215:221:1
q9913_p1_linked_sites=1:4:2,2:5:2
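
As an illustration of the linked_sites encoding, here is a minimal standalone sketch that splits such a value into its two ends. It does not use Parser (in client code, ms_peptide::getIntactLinks() provides this data). The struct and function names are hypothetical, and the field order component:position:varmod, with 1 meaning alpha and 2 meaning beta, is inferred from the example above rather than taken from a format specification.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One end of an intact link, as read from a linked_sites value (hypothetical type).
struct LinkedSite {
    int component;  // 1 = alpha, 2 = beta (inferred from the example)
    int position;   // index into the variable mods string of that component
    int varMod;     // variable mod number of the linker specificity
};

// Parses a value such as "1:4:2,2:5:2" into its comma-separated ends.
std::vector<LinkedSite> parseLinkedSites(const std::string& value) {
    std::vector<LinkedSite> sites;
    std::istringstream ends(value);
    std::string end;
    while (std::getline(ends, end, ',')) {
        std::istringstream fields(end);
        LinkedSite site{};
        char sep1 = 0, sep2 = 0;
        // Each end is expected to look like component:position:varmod.
        fields >> site.component >> sep1 >> site.position >> sep2 >> site.varMod;
        assert(!fields.fail() && sep1 == ':' && sep2 == ':');
        sites.push_back(site);
    }
    return sites;
}
```

For the example match, the first parsed end has position 4, and character 4 of the alpha variable mods string 0000200000 is indeed 2 (index 0 is the N-terminus).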

If a match has a monolink, the variable mod number of the linker is output in the variable mods string. The monolink string records the index of the corresponding neutral loss element in the linker definition. In the next example, the alpha peptide has a monolink at K8, and the intact link is between alpha K2 (1:2:2) and beta K5 (2:5:2). The monolink has neutral loss index 1, which (in this case) corresponds to monolink code [A]. NL index 0 (in this case) corresponds to the intact link [I]. The monolink string is only present if the peptide match has monolinks or looplinks.

q17957_p1=-0.000976,10,65,29.12,0010001011200001001,0,0
q17957_p1_sequence_1=2,1590.855163,LKCASLQKFGER,00200000200000;"ALBU_HUMAN":0:222:233:1
q17957_p1_sequence_2=1,746.403463,ASSAKQR,000002000;"ALBU_HUMAN":0:215:221:1
q17957_p1_linked_sites=1:2:2,2:5:2
q17957_p1_monolink_1=00000000100000
q17957_p1_monolink_2=000000000
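
The relationship between the variable mods string and the monolink string can be sketched as follows. This is a standalone illustration on raw strings, not Parser code; the type and function names are hypothetical. It assumes, as in the example above, that the monolink string is either absent or has the same length as the variable mods string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A modified position in one linear peptide (hypothetical type).
struct ModifiedPosition {
    int position;  // index into the variable mods string (0 = N-terminus)
    char varMod;   // variable mod character at that position, never '0'
    char nlIndex;  // character from the monolink string, '0' if the string is absent
};

// Pairs the variable mods string with the monolink string, position by position.
std::vector<ModifiedPosition> modifiedPositions(const std::string& varMods,
                                                const std::string& monoLink) {
    // Assumption: the monolink string is absent or aligned with the mods string.
    assert(monoLink.empty() || monoLink.size() == varMods.size());
    std::vector<ModifiedPosition> result;
    for (std::size_t i = 0; i < varMods.size(); ++i) {
        if (varMods[i] == '0') continue;  // no mod or linker at this position
        const char nl = monoLink.empty() ? '0' : monoLink[i];
        result.push_back({static_cast<int>(i), varMods[i], nl});
    }
    return result;
}
```

Applied to the alpha peptide above, this yields position 2 with NL index 0 (the intact link end listed in linked_sites) and position 8 with NL index 1 (the monolink). The NL index alone does not identify the type; a zero is only meaningful together with a non-zero entry in the variable mods string.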

If the match has a looplink, one end of the looplink is encoded exactly like a monolink. The other end can be inferred from the looplinked_sites attribute. Linear peptides may have monolinks and looplinks, as in the example below. The peptide has a looplink between K2 and K4. One end, K2, is encoded like a monolink, except its NL index corresponds to the intact link [I]. The peptide also has a normal monolink at K7.

q9022_p1=3,1599.934799,0.000313,8,HKPKATKEQLK,39,0020000200000,17.70,1000011012000002000,0,0;"ALBU_HUMAN":0:559:569:1
q9022_p1_monolink=0000000100000
q9022_p1_looplinked_sites=2:4
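
The way the second end of a looplink is inferred can be sketched on the raw strings. The code below is a standalone illustration with hypothetical function names, not Parser code. It assumes a looplinked_sites value holding a single start:end pair, as in the example, and that only one end of the loop is marked in the variable mods string.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Parses a single looplink such as "2:4" into its two positions.
std::pair<int, int> parseLoopLink(const std::string& value) {
    std::istringstream in(value);
    int first = 0, second = 0;
    char sep = 0;
    in >> first >> sep >> second;
    assert(!in.fail() && sep == ':');
    return {first, second};
}

// Returns the loop end that has no entry in the variable mods string,
// or -1 if both ends or neither end is marked there.
int inferredEnd(const std::string& varMods, const std::pair<int, int>& loop) {
    auto marked = [&varMods](int pos) {
        return pos >= 0 && pos < static_cast<int>(varMods.size()) && varMods[pos] != '0';
    };
    const bool firstMarked = marked(loop.first);
    const bool secondMarked = marked(loop.second);
    if (firstMarked == secondMarked) return -1;
    return firstMarked ? loop.second : loop.first;
}
```

For the match above, the variable mods string 0020000200000 marks K2, so the inferred end is position 4. The remaining non-zero entry at position 7 is the ordinary monolink.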

Finally, the most complicated case is a crosslinked peptide where the alpha and beta peptides have variable mods (like Oxidation (M)), monolinks and looplinks, and the variable mods additionally have fragment neutral losses. Matches this complicated are not commonly seen.

Parser API for accessing intact link, monolink and looplink data

ms_peptide methods for intact crosslinked peptide match

The ms_peptide class represents the peptide-spectrum match. The original API design assumes a linear peptide. Some interface changes have been made to allow fetching data specific to alpha and beta peptides without introducing a new class or a new set of methods.

Most ms_peptide methods take a new argument psmComponent of type ms_peptide::PSM. This is one of PSM_COMPLETE, PSM_CROSSLINK_ALPHA or PSM_CROSSLINK_BETA.

The rules for whether the parameter needs to be specified are:

In either case, the match may contain monolinks encoded as variable modifications, as well as looplinks.

In an intact crosslinked match, PSM_COMPLETE represents only the match-level data, such as match score. Where reasonable, methods may return data concatenated or summed from the alpha and beta peptides. For example, ms_peptide::getPeptideStr(true, ms_peptide::PSM_COMPLETE) returns a string where the alpha sequence is concatenated with the beta sequence. The two sequences are separated by "][", where the characters mark the alpha C-terminus and beta N-terminus, respectively. This ensures the concatenated variable mods string, primary NL string and related strings align correctly with the concatenated sequence string.
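
The alignment argument can be checked with a small standalone sketch that works on plain strings rather than on Parser objects. The function names are hypothetical. It assumes that each per-peptide variable mods string includes the two terminus positions, as in the results file examples, so that the concatenated mods string is two characters longer than the concatenated sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Joins alpha and beta sequences in the form described for PSM_COMPLETE.
std::string joinCrosslinked(const std::string& alpha, const std::string& beta) {
    return alpha + "][" + beta;
}

// Splits a concatenated sequence back into its alpha and beta parts.
std::pair<std::string, std::string> splitCrosslinked(const std::string& complete) {
    const std::string::size_type pos = complete.find("][");
    assert(pos != std::string::npos);
    return {complete.substr(0, pos), complete.substr(pos + 2)};
}

// Character of the concatenated sequence that a position in a concatenated
// per-position string refers to; '-' for the two outer termini.
char residueAt(const std::string& complete, std::size_t modsIndex) {
    if (modsIndex == 0 || modsIndex > complete.size()) return '-';
    return complete[modsIndex - 1];
}
```

With alpha LKCASLQKFGER and beta ASSAKQR, the alpha C-terminus position of the concatenated mods string lines up with "]" and the beta N-terminus position lines up with "[", so residue positions in the beta part keep the same one-character offset as in the alpha part.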

The table below summarises the return values when the match is an intact crosslinked peptide and the method depends on psmComponent. Methods that always return match-level data, like ms_peptide::getDelta(), do not take the new parameter.

| Method | Return value with PSM_COMPLETE | Return value with PSM_CROSSLINK_ALPHA | Return value with PSM_CROSSLINK_BETA |
| --- | --- | --- | --- |
| ms_peptide::getAmbiguityString() | "" | alpha string | beta string |
| ms_peptide::getAnyProteinTermination() | false, false | alpha values for isNterminus, isCterminus | beta values for isNterminus, isCterminus |
| ms_peptide::getComponentStr() | alpha component if alpha component == beta component, otherwise "" | alpha component | beta component |
| ms_peptide::getLocalModsNlStr() | alpha string + beta string | alpha string | beta string |
| ms_peptide::getLocalModsStr() | alpha string + beta string | alpha string | beta string |
| ms_peptide::getLoopLinks() | alpha looplinks followed by beta looplinks | alpha looplinks | beta looplinks |
| ms_peptide::getMissedCleavages() | -1 if alpha/beta value is -1, otherwise alpha value + beta value | alpha value | beta value |
| ms_peptide::getMonoLinkStr() | alpha string + beta string | alpha string | beta string |
| ms_peptide::getMrCalc() | alpha value + beta value + linker mass | alpha value | beta value |
| ms_peptide::getPeptideLength() | alpha length + length("][") + beta length | alpha length | beta length |
| ms_peptide::getPeptideStr() | alpha string + "][" + beta string | alpha string | beta string |
| ms_peptide::getPrimaryNlStr() | alpha string + beta string | alpha string | beta string |
| ms_peptide::getSummedModsNlStr() | alpha string + beta string | alpha string | beta string |
| ms_peptide::getSummedModsStr() | alpha string + beta string | alpha string | beta string |
| ms_peptide::getVarModsStr() | alpha string + beta string | alpha string | beta string |

ms_peptide methods for linear peptide match

The table below summarises the return values when the match is a linear peptide and the method depends on psmComponent. The argument is ignored in all cases.

| Method | Return value with PSM_COMPLETE, PSM_CROSSLINK_ALPHA, PSM_CROSSLINK_BETA |
| --- | --- |
| ms_peptide::getAmbiguityString() | ambiguity string |
| ms_peptide::getAnyProteinTermination() | isNterminus, isCterminus |
| ms_peptide::getComponentStr() | peptide component |
| ms_peptide::getLocalModsNlStr() | local mods NL string |
| ms_peptide::getLocalModsStr() | local mods string |
| ms_peptide::getLoopLinks() | looplinks |
| ms_peptide::getMissedCleavages() | missed cleavages |
| ms_peptide::getMonoLinkStr() | monolink string |
| ms_peptide::getMrCalc() | MrCalc |
| ms_peptide::getPeptideLength() | peptide sequence length |
| ms_peptide::getPeptideStr() | peptide sequence |
| ms_peptide::getPrimaryNlStr() | primary NL string |
| ms_peptide::getSummedModsNlStr() | summed mods NL string |
| ms_peptide::getSummedModsStr() | summed mods string |
| ms_peptide::getVarModsStr() | variable mods string |

Procedure for differentiating between intact links, looplinks and monolinks

For backwards compatibility, the delta returned by ms_searchparams::getVarModsDelta() is the mass of the intact link. If you open crosslinked search results in Parser 2.6 or earlier, which is unaware of the new attributes, it will appear as if monolinks and looplinks all have the same delta. This is correct for looplinks but not for monolinks.

Determining the type of the variable mod number is a sequential procedure.

  1. Get the list of intact link positions from ms_peptide::getIntactLinks().
  2. Get the list of looplink positions from ms_peptide::getLoopLinks().
  3. When looping over the variable mods string, if the mod number is 0, there's no mod or linker at this position.
  4. Otherwise:
    1. Call intactLinks.getVarModIdxOfLinkedSite() to determine whether there is an intact link at this position.
    2. If not an intact link, call loopLinks.getVarModIdxOfLinkedSite() to determine whether there is a looplink at this position.
    3. If not a looplink, extract the monolink number from the monolink string, ms_peptide::getMonoLinkStr(), at the same position. Use the variable mod number and the monolink number as arguments to ms_mascotresfile::getMonoLinkModification(). If it returns a valid object, this position encodes a monolink.
    4. Otherwise, this position is an ordinary variable modification.

Processing intact link and looplink data is easiest in a separate loop. For intact links:

  1. Get the list of intact link positions from ms_peptide::getIntactLinks().
    This class encapsulates the linked_sites line.
  2. Loop over the contents of the ms_linker_site_vector:
    1. Access the linker start and end positions and the variable mod numbers using the ms_linker_site methods.
    2. Note that the PSM component is different at either end of the link.

Processing looplink data is the same, except it is typically easiest to do it separately for alpha and beta peptides:

  1. Get the list of looplink positions from ms_peptide::getLoopLinks(PSM_CROSSLINK_ALPHA).
    This class encapsulates the looplinked_sites line.
  2. Loop over the contents of the ms_linker_site_vector:
    1. Access the looplink start and end positions and the variable mod numbers using the ms_linker_site methods.
    2. Note that the PSM component is PSM_CROSSLINK_ALPHA at both ends of the link.
  3. Repeat for PSM_CROSSLINK_BETA.

Note that the monolink index zero could mean either lack of modification (if the variable mods string also has a zero) or the first monolink in the linker definition. It's important to check both the variable mods string and the monolink string.

The following C++ code illustrates processing the variable mods, intact links, looplinks and monolinks of a linear peptide match.

C++
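// Assumes 'peptide' is an ms_peptide from the results, 'resfile' is the
// ms_mascotresfile it came from, and convertToInt() is a caller-supplied helper
// that maps a character of the mods string to its number.
std::string varModsStr = peptide.getVarModsStr();
std::string monoLinkStr = peptide.getMonoLinkStr();
ms_linker_site_vector intactLinks = peptide.getIntactLinks();
ms_linker_site_vector loopLinks = peptide.getLoopLinks(ms_peptide::PSM_COMPLETE);
double delta = 0.0;  // mass delta of the modification found at the current position

for (int i = 0; i < static_cast<int>(varModsStr.length()); ++i) {
  int modNum = convertToInt(varModsStr[i]);
  int monoLinkNum = 0;

  // Monolink string could be empty -- check first.
  if (!monoLinkStr.empty())
    monoLinkNum = convertToInt(monoLinkStr[i]);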
  if (modNum == 0)
    continue;

  // Is it an intact link?
  if (0 < intactLinks.getVarModIdxOfLinkedSite(ms_peptide::PSM_COMPLETE, i)) {
    // Yes, it's an intact link. Process in a separate loop as described above.
    continue;
  }

  // Is it a looplink in either PSM component?
  if (0 < loopLinks.getVarModIdxOfLinkedSite(ms_peptide::PSM_COMPLETE, i)) {
    // Yes, it's a looplink. Process in a separate loop as described above.
    continue;
  }

  // Is it a monolink?
  const ms_modification *monoLinkMod = resfile.getMonoLinkModification(modNum, monoLinkNum);
  if (monoLinkMod) {
    delta = monoLinkMod->getDelta(MASS_TYPE_MONO);
    continue;
  }

  // Not intact link, monolink or looplink, so it's a regular variable mod.
  delta = resfile.params().getVarModsDelta(modNum);
}

Processing the alpha or beta peptide in a crosslinked match is very similar; simply replace ms_peptide::PSM_COMPLETE with ms_peptide::PSM_CROSSLINK_ALPHA or ms_peptide::PSM_CROSSLINK_BETA where relevant.

Protein inference

Protein inference and the ms_protein API are not affected by the presence of monolinks or looplinks, since these are just variable modifications.

However, there are a few details to consider with intact crosslinked peptides. A peptide match is assigned to a protein hit when the alpha sequence, the beta sequence or both appear in the protein. Methods that return sequence-level data, such as ms_protein::getPeptideStart(), now take a new parameter psmComponent. The change is analogous to the one described in "ms_peptide methods for intact crosslinked peptide match". In fact, the easiest way to check whether psmComponent is assigned to the protein is by looking at the return value of getPeptideStart(): if it's -1, the psmComponent is not assigned to the protein hit.

The definition of duplicate peptide has been extended:

No client code changes are needed; duplicate checking is done internally, as before. (For more details, see Peptide match duplicates.)

Protein inference is not affected by intact crosslinks. That is, the intact crosslink between two proteins is not enough to cluster them in the same protein family. Clustering requires sharing the same significant alpha or beta sequence.

If a protein has any crosslinked peptides, these are ignored when emPAI is calculated.

ms_protein methods for intact crosslinked peptide match

The table summarises the return values when at least one psmComponent is assigned to the protein hit.

| Method | Return value with PSM_COMPLETE | Return value with PSM_CROSSLINK_ALPHA | Return value with PSM_CROSSLINK_BETA |
| --- | --- | --- | --- |
| ms_protein::getPeptideStart() | -1 | alpha value (if alpha is in this protein) or -1 | beta value (if beta is in this protein) or -1 |
| ms_protein::getPeptideEnd() | -1 | alpha value (if alpha is in this protein) or -1 | beta value (if beta is in this protein) or -1 |
| ms_protein::getPeptideMultiplicity() | -1 | alpha value (if alpha is in this protein) or -1 | beta value (if beta is in this protein) or -1 |
| ms_protein::getPeptideFrame() | -1 | alpha value (if alpha is in this protein) or -1 | beta value (if beta is in this protein) or -1 |
| ms_protein::getPeptideResidueBefore() | '?' | alpha value (if alpha is in this protein) or '?' | beta value (if beta is in this protein) or '?' |
| ms_protein::getPeptideResidueAfter() | '?' | alpha value (if alpha is in this protein) or '?' | beta value (if beta is in this protein) or '?' |

ms_protein methods for linear peptide match

When the peptide match assigned to the protein hit is a linear peptide, the psmComponent argument is ignored.

| Method | Return value with PSM_COMPLETE, PSM_CROSSLINK_ALPHA, PSM_CROSSLINK_BETA |
| --- | --- |
| ms_protein::getPeptideStart() | start position |
| ms_protein::getPeptideEnd() | end position |
| ms_protein::getPeptideMultiplicity() | multiplicity |
| ms_protein::getPeptideFrame() | frame or -1 |
| ms_protein::getPeptideResidueBefore() | residue before |
| ms_protein::getPeptideResidueAfter() | residue after |

Fragmenting crosslinked and looplinked peptides

ms_aahelper::calcFragmentsEx() and related methods can fragment a crosslinked peptide. Fragmentation produces single-cleavage product ions. First, the beta sequence is treated as a modification attached to the alpha peptide, and the alpha sequence is fragmented as usual. Then the roles are reversed. The final list of fragments contains single-cleavage ions from alpha and single-cleavage ions from beta. There are no double-cleavage ions where the alpha and beta fragment simultaneously.

The ms_fragment class has two new methods: isFromAlpha() and isFromBeta(). Fragmenting a linear peptide produces ms_fragment objects where both methods return false. Fragmenting a crosslinked peptide produces fragments where one or the other flag is true. Because there are no double-cleavage ions, isFromAlpha() and isFromBeta() cannot both be true at the same time.
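
The single-cleavage idea can be illustrated with a simplified standalone sketch. It is not a substitute for ms_aahelper::calcFragmentsEx(): it only produces singly charged b ions, the Fragment struct is a hypothetical stand-in for ms_fragment, and the residue mass table covers only the residues of the example peptides. The linker mass passed in is an assumption supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a fragment record; not the Parser ms_fragment class.
struct Fragment {
    std::string label;  // e.g. "b3"
    double mass;        // singly protonated monoisotopic mass
    bool fromAlpha;
    bool fromBeta;
};

// Monoisotopic residue masses for the residues used in the example only.
double residueMass(char aa) {
    static const std::map<char, double> masses = {
        {'G', 57.02146}, {'A', 71.03711}, {'S', 87.03203}, {'V', 99.06841},
        {'L', 113.08406}, {'N', 114.04293}, {'Q', 128.05858}, {'K', 128.09496},
        {'R', 156.10111}};
    const auto it = masses.find(aa);
    assert(it != masses.end());
    return it->second;
}

// b ions of one chain. Everything attached at linkPos (1-based) is added as a
// single modification mass, which is how the text treats the partner peptide.
void appendBIons(const std::string& seq, std::size_t linkPos, double attachedMass,
                 bool isAlpha, std::vector<Fragment>& out) {
    const double proton = 1.007276;
    double sum = 0.0;
    for (std::size_t i = 1; i < seq.size(); ++i) {  // b1 .. b(n-1)
        sum += residueMass(seq[i - 1]);
        if (i == linkPos) sum += attachedMass;
        out.push_back({"b" + std::to_string(i), sum + proton, isAlpha, !isAlpha});
    }
}

// Fragments alpha with beta as a modification, then reverses the roles.
std::vector<Fragment> fragmentCrosslinked(const std::string& alpha, std::size_t alphaPos,
                                          double alphaMr, const std::string& beta,
                                          std::size_t betaPos, double betaMr,
                                          double linkerMass) {
    std::vector<Fragment> out;
    appendBIons(alpha, alphaPos, betaMr + linkerMass, true, out);
    appendBIons(beta, betaPos, alphaMr + linkerMass, false, out);
    return out;
}
```

Every fragment in the returned list comes from exactly one chain, which mirrors the rule that isFromAlpha() and isFromBeta() cannot both be true. From the link position onwards, each alpha b ion carries the whole beta peptide plus the linker.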

There is a helper method for linking two peptide objects: ms_aahelper::createCrosslinkedPeptide().

Fragmenting a peptide with looplinks is exactly the same as fragmenting a peptide without looplinks, with one exception. The looplink is assumed to be stronger than the peptide backbone, so there are no fragments that start or end in the region spanned by a looplink. This applies to regular series as well as internals.
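
The looplink exception can be sketched as a filter on backbone cleavage sites. This standalone example uses a hypothetical function name and reflects one reading of the rule: a cleavage between the two linked residues yields no fragment because the two halves stay joined by the linker. How Parser treats the boundaries of the loop region is an assumption here and may differ.

```cpp
#include <cassert>
#include <vector>

// Backbone cleavage sites that can still yield a fragment when residues
// loopStart and loopEnd (1-based, loopStart < loopEnd) are joined by a looplink.
// Site i means the bond between residue i and residue i + 1.
std::vector<int> cleavableSites(int length, int loopStart, int loopEnd) {
    assert(1 <= loopStart && loopStart < loopEnd && loopEnd <= length);
    std::vector<int> sites;
    for (int i = 1; i < length; ++i) {
        // Assumption: a bond inside the loop leaves both halves joined by the
        // linker, so it produces no separate fragment.
        const bool insideLoop = (i >= loopStart && i < loopEnd);
        if (!insideLoop) sites.push_back(i);
    }
    return sites;
}
```

For an 11-residue peptide such as HKPKATKEQLK with a looplink between positions 2 and 4, this reading removes cleavage sites 2 and 3 and keeps the rest.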

Crosslinking changes in ms_peptidesummary

There is a new method that returns the number of discovered intact crosslinks and looplinks: ms_peptidesummary::getNumDiscoveredIntactLinks().

Because a monolink shares the variable mod number of its linker, ms_peptidesummary::getNumDiscoveredVariableMods() has been extended. The method returns the modification names and deltas as well as counts, positions and sites.

ms_peptidesummary::getAllProteinsWithThisPepMatch() has been extended to return the psmComponent assigned to the protein hit.

There are no changes to the parameters of ms_peptidesummary::findPeptides(). If you search for a peptide sequence with findPeptides(), it will compare the input string to both the alpha sequence and beta sequence. If either one is a match, findPeptides() adds it to the return vector.

Calling ms_peptidesummary::getReadableVarMods() without psmComponent produces a human-readable string of variable modifications, monolinks, looplinks and intact links contained in the peptide match.


Copyright © 2022 Matrix Science Ltd.  All Rights Reserved. Generated on Thu Mar 31 2022 01:12:30