
Crosslink search

Identification of crosslinked peptides is a new feature in Mascot Server 2.7. Crosslinking is widely used to elucidate protein tertiary structure and to probe protein-protein interactions. In most cases, the link is created artificially, by a chemical reaction, but there are also natural crosslinks, such as disulfide bridge formation between a pair of cysteines.

Crosslinks can be cleaved prior to analysis by chemical means or by irradiation, or inside the mass spectrometer by collision-induced dissociation. Ideally, cleavage leaves behind some fragment of the linker to mark the site of attachment. The individual peptides can then be identified by a conventional database search that includes appropriate variable modifications.

Identification of pairs of peptides linked by intact crosslinks is a challenging problem. If the number of candidate peptides that could be linked is N, a blind search for a linked pair becomes an N^2 problem. Usually, this means that the database must be greatly restricted – possibly just two proteins of interest. The MS/MS spectrum is likely to contain fragments from both peptides, and fragmentation will usually be less complete than for an isolated peptide, particularly where one peptide is much shorter than the other.
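
To give a feel for the scaling, the following Python sketch counts candidate pairs for a given number of linkable peptides. It assumes that the order within a pair does not matter and that a peptide may pair with a second copy of itself; both are illustrative assumptions, not a description of how Mascot enumerates pairs. The function name is hypothetical.

```python
def candidate_pairs(n_peptides: int) -> int:
    """Number of unordered peptide pairs, self-pairs included: N(N+1)/2."""
    if n_peptides < 0:
        raise ValueError("n_peptides must be non-negative")
    return n_peptides * (n_peptides + 1) // 2


# Growth is quadratic: ten times more peptides gives roughly 100 times more pairs.
for n in (100, 1_000, 10_000):
    print(f"{n:>6} peptides -> {candidate_pairs(n):>12,} candidate pairs")
```

Even with unordered pairs, the count grows as N^2, which is why restricting the database to a few proteins of interest matters so much.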

Isotopic labels can be used in various ways with both cleavable and non-cleavable crosslinkers to show whether an analyte contains a link and to quantify the extent of linking.

Crosslinked products can be classified as:

  • intralinks – the linked peptides are from the same protein
  • interlinks – the linked peptides are from two different proteins
  • looplinks – connecting two amino acids within a single peptide
  • monolinks – a linker with one end attached and the other free

The variety and complexity of crosslinking experiments mean that analysis software needs a large number of parameters to define how to process the data. We have chosen a similar approach to that used for quantitation, and placed these new parameters into named crosslinking methods, one of which can be selected for a search.

Peak picking

If the most common charge states for single tryptic peptides are 2+ and 3+ then the likely charge states for a linked pair of peptides will be 4+ to 6+. Make sure your peak picking allows for this. Also, if data quality permits, de-charge the fragment masses to 1+, because fragments that include the linked peptide are likely to have charge states higher than 2+, which means they will not be matched as m/z values. If your peak picking software doesn’t support de-charging of fragments, give Mascot Distiller a try.
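
The arithmetic behind both points can be sketched in Python. The sketch assumes positive-mode, protonated ions and that the charge of a linked pair is the sum of the charges the two peptides would carry separately; the function name is hypothetical and is not part of Mascot or Mascot Distiller.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (rounded)


def decharge_to_1plus(mz: float, charge: int) -> float:
    """Convert an observed m/z at the given charge to the equivalent 1+ m/z.

    The neutral mass is (mz - proton) * charge; adding back one proton
    gives mz * charge - (charge - 1) * proton.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - (charge - 1) * PROTON_MASS


# Charge states expected for a linked pair if single peptides are 2+ or 3+.
pair_charges = sorted({a + b for a in (2, 3) for b in (2, 3)})
print(pair_charges)

# A fragment observed at m/z 500.0 with charge 3+, expressed as a 1+ value.
print(round(decharge_to_1plus(500.0, 3), 4))
```

A fragment that is already 1+ is returned unchanged, so the conversion can be applied uniformly to a peak list in which fragment charges have been assigned.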

Search and reporting parameters

(Information about global settings can be found in Chapter 6 of the Mascot Server Installation and Setup manual.)

The following apply independently to each peptide in a linked pair: missed cleavages, MaxPepNumVarMods, MaxPepNumModifiedSites

The following apply to the linked pair as a whole: MaxPepModArrangements, 16 kDa mass limit for standard Mascot Server

The following apply to the longer peptide in a linked pair: MinPepLenInSearch, MinPepLenInPepSummary

Enzyme can be specific or semi-specific. Non-specific enzyme cleavage cannot be used. Nucleic acid or spectral library databases cannot be searched.

The following functionality cannot be combined with a crosslink search: error tolerant search, auto-decoy, quantitation, Percolator.

Unimod

For most crosslinks, there is a master entry in Unimod that contains all the information needed by a crosslink-aware application, e.g. Xlink:DSSO. There are separate Unimod ‘legacy’ entries to enable standard software to calculate the masses of intact links, link fragments, and monolinks, e.g. Xlink:DSSO[176]. These legacy entries are easily identified by a name that begins “Xlink:” and ends with an integer mass in square brackets.
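
The naming rule for legacy entries can be expressed as a regular expression. This is a minimal Python sketch based only on the rule stated above; the function name is hypothetical, and the pattern has not been checked against every entry in Unimod.

```python
import re

# Begins with "Xlink:" and ends with an integer mass in square brackets.
LEGACY_NAME = re.compile(r"Xlink:.*\[\d+\]")


def is_legacy_entry(name: str) -> bool:
    """Return True if a Unimod entry name follows the legacy naming rule."""
    return LEGACY_NAME.fullmatch(name) is not None


for name in ("Xlink:DSSO", "Xlink:DSSO[176]", "Oxidation"):
    kind = "legacy" if is_legacy_entry(name) else "not legacy"
    print(f"{name}: {kind}")
```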

On Mascot Server 2.7 or later, the master entries for crosslinks are in a separate file from the other Unimod data. This is because the additional information needed for crosslink entries required XML schema changes, which would break earlier releases.

In the Mascot Server Configuration Editor, there are separate entries for Modifications and Linkers.

Modifications: When you update Unimod, all entries in the public Unimod database except for master crosslink entries are downloaded to a file on the Mascot Server called master.xml. When you create or edit entries using the Modifications Configuration Editor, changes are saved to usermod.xml. When either file changes, the two are automatically merged into a single file called mascot/config/unimod.xml. This means that you can always get the latest changes from the public database without losing your local changes.

Linkers: When you update Unimod, the master crosslink entries in the public Unimod database are downloaded to a file on the Mascot Server called master_xl.xml. When you create or edit entries using the Linkers Configuration Editor, changes are saved to usermod_xl.xml. When either file changes, the two are automatically merged into a single file called mascot/config/unimod_xl.xml. This means that you can always get the latest changes from the public database without losing your local changes. Linkers are only used in crosslink methods, and do not appear in the modifications lists in the search form, etc.

Note that some crosslink entries can generate large numbers of variable mods when specified in a crosslinking method. For example, Xlink:DSS has two specificities (K and Protein N-term), each of which has four delta masses (intact link and three monolinks). If all of these are included in a search, this counts as eight variable modifications. If two more varmods are selected in the search form, this will exceed the default limit of nine, and the search will terminate with an error message. Every additional varmod reduces specificity and increases the search time, so think carefully about whether some monolinks or specificities can be dropped. If not, then it may be necessary to increase the limit on the number of varmods. (The global limit is MaxVarMods in Configuration Options, or it can be set for a user group in Mascot security.)
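
The Xlink:DSS arithmetic above can be checked with a few lines of Python. The function name is hypothetical; the limit of nine is the default quoted in the text, and the way Mascot itself counts variable modifications may involve details not modelled here.

```python
DEFAULT_MAX_VARMODS = 9  # default limit quoted in the text above


def total_varmods(n_specificities: int, deltas_per_specificity: int,
                  n_form_varmods: int) -> int:
    """Variable mods generated by the linker plus those chosen in the search form."""
    return n_specificities * deltas_per_specificity + n_form_varmods


# Two specificities, four delta masses each, plus two varmods from the search form.
everything = total_varmods(2, 4, 2)
print(everything, everything > DEFAULT_MAX_VARMODS)

# Dropping two of the three monolinks leaves two delta masses per specificity.
reduced = total_varmods(2, 2, 2)
print(reduced, reduced > DEFAULT_MAX_VARMODS)
```

In this example, removing two monolinks brings the total back under the default limit without touching the global setting.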

Export

Search results that use a crosslink method can be exported in CSV, XML or mzIdentML format. They can also be exported to xiView for visualisation; see the exporting guidelines.

Crosslinking method reference

Provided you have the necessary security rights, the Mascot Configuration Editor can be accessed by following a link from your local Mascot home page. Choose Crosslinking to edit, copy, or delete existing methods and create new ones. Choosing Print displays a printer-friendly summary of the crosslinking method.

Click on the method name to edit a method. The Method tab provides a control-based interface. The XML-formatted method can be inspected or edited in the XML tab. String values are case sensitive, but element and attribute labels are not. The schema can be found on your Mascot Server under html/xmlns/schema/crosslinking_1.

Method

Name attribute – The name to be displayed in the drop-down list on the search form. Safest to avoid non-ASCII characters.

Description attribute – Space for a more complete description. Can be left empty.

Strategy attribute – Only two values are currently supported:

  • Strategy=None means the rest of the method parameters are completely ignored.
  • Strategy=Brute-force means all N^2 peptide pairs are generated and matched against the input queries.

Linkers

The Linkers element must contain at least one Linker element, with a ModFileName attribute that specifies the mod_file-style name, e.g. "Xlink:DSS (K)". Available crosslinks are defined in unimod_xl.xml. There will often be more than one Linker element. Currently all ModFileName attributes in a method must refer to the same Unimod entry, and can only differ in specificity.

Linker elements can optionally enable monolinks, which behave as variable modifications. The monolink code identifies the neutral loss element in Unimod that encodes the monolink definition.

      <mxm:linker ModFileName="Xlink:DSS (K)">
        <mxm:monolink>A</mxm:monolink>
        <mxm:monolink>W</mxm:monolink>
        <mxm:monolink>T</mxm:monolink>
      </mxm:linker>

The optional does_not_pair_with element can be used to restrict links between certain pairs of specificities. EDC, as shown in one of the crosslink examples, links a primary amine (K, Protein N-term) to a carboxyl (D, E, Protein C-term). The does_not_pair_with elements prevent testing for links such as K-K. If there are no does_not_pair_with elements, a linker can connect any two linker specificities.

Although it seems redundant, it is necessary to define both senses for each restriction. That is, if the linker element for K contains a does_not_pair_with element for Protein N-term, then the linker element for Protein N-term must contain a does_not_pair_with element for K.

If a Linker element contains both monolink and does_not_pair_with elements, the monolink elements must come first.
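
Because every restriction has to be stated in both senses, it can be worth checking a method for one-sided entries before saving it. The Python sketch below works on a plain dictionary that maps each linker specificity to the set of specificities it must not pair with; that dictionary, and the function name, are illustrative assumptions rather than anything read from or provided by Mascot.

```python
from typing import Dict, List, Set, Tuple


def missing_reverse_restrictions(
    restrictions: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Return (linker, partner) restrictions that are absent but required.

    If A lists B as does-not-pair-with, then B must also list A.
    """
    missing = set()
    for spec, partners in restrictions.items():
        for partner in partners:
            if spec not in restrictions.get(partner, set()):
                missing.add((partner, spec))
    return sorted(missing)


# K excludes K and Protein N-term, but Protein N-term forgets to exclude K.
example = {
    "K": {"K", "Protein N-term"},
    "Protein N-term": {"Protein N-term"},
}
print(missing_reverse_restrictions(example))
```

Each tuple in the result names a linker element and the does_not_pair_with value that still needs to be added to it.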

Accessions

An Accessions element specifies which database entries will be considered for crosslinks. In the EDC example, only MND1_ARATH and HOP2_ARATH are tested for crosslinks. If the accession by itself might be ambiguous, an optional DatabaseName attribute can be included. For example:

    <mxm:accessions>
      <mxm:accession DatabaseName="scratch">seq_a</mxm:accession>
      <mxm:accession DatabaseName="scratch">seq_b</mxm:accession>
      <mxm:accession DatabaseName="scratch">seq_c</mxm:accession>
    </mxm:accessions>

If all entries in a FASTA file should be tested for crosslinks, the Accession element can be empty, e.g.:

    <mxm:accessions>
      <mxm:accession DatabaseName="scratch"/>
    </mxm:accessions>

Note that an empty Accessions element or an empty Accession element without a DatabaseName attribute means that no crosslinks will be reported.
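
The rules for empty and populated elements can be summarised with a short Python sketch that reads an Accessions fragment and reports what it selects. The namespace URI bound to the mxm prefix is a placeholder added so that the fragment parses on its own; the real URI is defined by the schema on your Mascot Server. The function name is hypothetical, and the sketch matches element and attribute names with exact case.

```python
import xml.etree.ElementTree as ET
from typing import List

# Placeholder namespace URI (an assumption), NOT the real schema URI.
NS = {"mxm": "urn:example:placeholder"}


def describe_accessions(fragment: str) -> List[str]:
    """List what an accessions fragment selects; an empty list means no crosslinks."""
    root = ET.fromstring(fragment)
    selected = []
    for acc in root.findall("mxm:accession", NS):
        database = acc.get("DatabaseName")
        name = (acc.text or "").strip()
        if name:
            selected.append(f"entry {name}" + (f" in {database}" if database else ""))
        elif database:
            selected.append(f"all entries in {database}")
        # An empty accession without DatabaseName selects nothing.
    return selected


fragment = """
<mxm:accessions xmlns:mxm="urn:example:placeholder">
  <mxm:accession DatabaseName="scratch">seq_a</mxm:accession>
  <mxm:accession DatabaseName="scratch"/>
  <mxm:accession/>
</mxm:accessions>
"""
print(describe_accessions(fragment))
```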

Scope

The Scope element can contain up to three elements, specifying the type of crosslinks to be considered. The default for each is false, so at least one must be set true for there to be any crosslinks in the results, as in the EDC crosslink example.

  • InterLink – Look for crosslinks between two peptides from two different protein entries
  • IntraLink – Look for crosslinks between two peptides from the same protein entry
  • LoopLink – Look for crosslinks within a single peptide
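
The effect of the three scope settings can be illustrated with a small Python sketch. It classifies a candidate link from the two protein accessions involved and then checks it against a dictionary of scope flags, each defaulting to false as described above. The function names and the dictionary are hypothetical, and the sketch ignores subtleties such as identical peptides from two copies of the same protein.

```python
from typing import Dict


def classify(protein_a: str, protein_b: str, single_peptide: bool = False) -> str:
    """Name the scope element that covers a candidate crosslink."""
    if single_peptide:
        return "LoopLink"
    return "IntraLink" if protein_a == protein_b else "InterLink"


def in_scope(kind: str, scope: Dict[str, bool]) -> bool:
    """A link type is considered only if its flag is explicitly true."""
    return scope.get(kind, False)


scope = {"InterLink": True}  # IntraLink and LoopLink left at their default

for a, b, loop in [
    ("MND1_ARATH", "HOP2_ARATH", False),
    ("MND1_ARATH", "MND1_ARATH", False),
    ("MND1_ARATH", "MND1_ARATH", True),
]:
    kind = classify(a, b, loop)
    print(f"{a} / {b}: {kind}, considered = {in_scope(kind, scope)}")
```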

Filters

The following settings can be used to exclude spectra from crosslink matching:

  • MinPrecursorMr – Only look for crosslinks if the precursor mass is at least this value (default 1300 Da)
  • MinLen – Only look for crosslinks where the length of one candidate peptide is at least this value (default MinPepLenInSearch)
  • MinCharge – Only look for crosslinks if the precursor charge is at least this value (default 2)

This element is optional. If included, it might look like this:

    <mxm:filters>
      <mxm:parameter name="MinPrecursorMr">2000</mxm:parameter>
      <mxm:parameter name="MinLen">2</mxm:parameter>
      <mxm:parameter name="MinCharge">4</mxm:parameter>
    </mxm:filters>
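
As a rough illustration of how these three filters combine, the Python sketch below tests a single spectrum and candidate pair against them. It assumes that all three conditions must hold and that MinLen is satisfied when the longer of the two peptides is long enough; the function name is hypothetical. Because the default for MinLen is taken from MinPepLenInSearch, it is passed in explicitly rather than given a default here.

```python
def passes_filters(precursor_mr: float, charge: int, len_a: int, len_b: int,
                   min_len: int, min_precursor_mr: float = 1300.0,
                   min_charge: int = 2) -> bool:
    """Return True if a candidate pair is eligible for crosslink matching."""
    return (
        precursor_mr >= min_precursor_mr
        and charge >= min_charge
        and max(len_a, len_b) >= min_len  # one peptide must reach MinLen
    )


# Values from the example filters element above.
custom = dict(min_len=2, min_precursor_mr=2000.0, min_charge=4)

print(passes_filters(2500.0, 4, 8, 3, **custom))  # all three conditions met
print(passes_filters(2500.0, 3, 8, 3, **custom))  # charge below MinCharge
print(passes_filters(1800.0, 4, 8, 3, **custom))  # precursor below MinPrecursorMr
```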

Settings

Currently, the only miscellaneous setting is:

  • MaxProteins – If the number of protein entries specified by the Accessions element (or the number of entries in the search if there is no Accessions element) exceeds this value, then terminate the search (default 100)

This element is optional. If included, it might look like this:

    <mxm:settings>
      <mxm:parameter name="MaxProteins">10</mxm:parameter>
    </mxm:settings>