ms_spectral_lib_file Class Reference
[Mascot utilities and tools module]

This class is used to encapsulate a complete NIST .msp, SpectraST .sptxt or X!Hunter ASL MGF file.

#include <ms_spectral_lib_file.hpp>


Public Member Functions

 ms_spectral_lib_file (const char *fileName, const char *regexForAccession, const char *cdbFileName=0)
 Constructor.
 ms_spectral_lib_file (const char *fileName, const char *regexForAccession, const char *cdbFileName, const std::map< std::string, std::string > &modificationAliases)
 Constructor just for C++, accepting a list of modification aliases.
 ~ms_spectral_lib_file ()
 Destructor.
void appendErrors (const ms_errors &src)
 Copies all errors from another instance and appends them at the end of own list.
void clearAllErrors ()
 Remove all errors from the current list of errors.
void copyFrom (const ms_errors *right)
 Use this member to make a copy of another instance.
std::vector< int > findEntries (const char *sequence, const char *checksum=0, const char *accession=0, const char *mods=0) const
 Returns a list of entries that match the parameters.
std::string getAccessionFromNumber (const int number) const
 Returns the accession given the offset into the file.
std::vector< std::string > getAllMods () const
 Returns the complete list of mods named in the file.
std::string getChecksumFromNumber (const int number) const
 Returns the spectrum checksum given the offset into the file.
ms_spectral_lib_entry getEntryFromNumber (const int number) const
 Returns the individual spectrum from the msp file.
std::vector< std::string > getEntryFromNumberAsText (const int number) const
 Returns the individual spectrum from the msp file as a vector of strings.
const ms_errs * getErrorHandler () const
 Retrieve the error object using this function to get access to all errors and error parameters.
std::string getFileName () const
 Returns the full file path passed to the constructor.
ms_spectral_lib::FILE_FORMAT getFormat () const
 Returns the format of the file specified in the constructor.
int getLastError () const
 Return the error number of the last error that occurred.
std::string getLastErrorString () const
 Return the error description of the last error that occurred.
std::string getModsFromNumber (const int number) const
 Returns the mods given the offset into the file.
int getNumEntries () const
 Returns the number of spectra in the msp file.
int getNumResidues () const
 Returns the number of residues in the msp file.
int getPrecursorChargeFromNumber (const int number) const
 Returns the precursor charge given the offset into the file.
double getPrecursorMZFromNumber (const int number) const
 Returns the precursor m/z value given the offset into the file.
long getQmatch (double minMz, double maxMz) const
 Return the number of spectra in the library with a precursor mass within the passed m/z range.
std::string getSequenceFromNumber (const int number) const
 Returns the spectrum peptide sequence given the offset into the file.
std::string getStatsInformation () const
 Returns some unstructured text giving some statistics for the file.
bool isValid () const
 Call this function to determine if there have been any errors.
bool saveAs (const char *fileName, const bool replaceProteinName=true, ms_spectral_lib::FILE_FORMAT fileFormat=ms_spectral_lib::FORMAT_NIST_MSP, const int startNumber=1, const int endNumber=-1, const ms_spectral_lib_entry::WHAT_TO_ANNOTATE whatToAnnotate=ms_spectral_lib_entry::ANNOTATE_REPLACE_QUESTION_MARKS, const double annotateTol=0.6, const char *annotateTolu="Da", const ms_umod_configfile *unimod=0) const
 Save a copy of the file in the specified format.
bool verifyThatModsAreInUnimod (const ms_umod_configfile &unimod)
 The modifications listed should all be in the passed unimod file.

Detailed Description

This class is used to encapsulate a complete NIST .msp, SpectraST .sptxt or X!Hunter ASL MGF file.

Support for spectral library searches was added in Mascot Server 2.6 and Mascot Parser 2.6. Although external NIST software is used for the spectral library search itself, there is a requirement to process the msp and SpectraST files which are plain text.

This class is used to open an msp file. Individual entries can then be obtained using ms_spectral_lib_file::getEntryFromNumber().

Each entry in an msp file consists of multiple lines, each of the form [param][colon][space][value]. For example,

 Name: AAAAAAAAAAAAAAAGAGAGAK/3
 MW: 1598.861
 Comment: Spec=Consensus Pep=Tryp 
 Num peaks: 150

followed by, in this case, 150 lines of peak data, followed by a blank line.
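
The [param][colon][space][value] layout can be illustrated with a small stand-alone helper. This is only a sketch and not part of Mascot Parser: the function name splitMspHeaderLine is hypothetical, and it assumes the first ": " on a header line separates the parameter from its value.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper, not a Mascot Parser function.
// Splits "Param: value" at the first ": " separator.
// Returns std::nullopt when the separator is missing (for example a
// blank line) or when the line starts with the separator.
std::optional<std::pair<std::string, std::string>>
splitMspHeaderLine(const std::string &line) {
    const std::string::size_type pos = line.find(": ");
    if (pos == std::string::npos || pos == 0) {
        return std::nullopt;
    }
    return std::make_pair(line.substr(0, pos), line.substr(pos + 2));
}
```

Applied to the example above, "MW: 1598.861" would split into the parameter "MW" and the value "1598.861". The helper is intended for header lines only; peak lines should be handled separately.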

The format is defined here.

The SpectraST files (.sptxt) have minor differences, for example additional lines and a different format for the peak list. See saveAs() for details.

The X!Hunter ASL MGF format is described here.

See Spectral libraries for related information.


Constructor & Destructor Documentation

ms_spectral_lib_file ( const char *  fileName,
const char *  regexForAccession,
const char *  cdbFileName = 0 
)

Constructor.

Commonly used constructor.

The use of a cdb index file is optional. If one is specified in the constructor, and the file doesn't exist, or is incompatible, then the whole spectral library will be read and the index (re-) created during this constructor call. If no cdbFileName is specified, then the spectral library file will be opened, but only read on demand. For example, calling getNumEntries() obviously requires the whole file to be read, but calling getEntryFromNumber(1) only requires the first entry to be read.
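
The difference between reading on demand and reading the whole file can be sketched with a stand-alone reader. This is an illustration of the idea only, not the library's implementation: the class LazyEntryReader and its members are hypothetical, and it assumes entries are blocks of non-blank lines separated by blank lines.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical on-demand reader; nothing here is a Mascot Parser type.
class LazyEntryReader {
public:
    explicit LazyEntryReader(std::istream &in) : in_(in) {}

    // Returns entry `number` (1-based), reading only as far as needed.
    // Returns an empty vector if `number` is out of range.
    std::vector<std::string> getEntry(int number) {
        while (static_cast<int>(entries_.size()) < number && readNext()) {
        }
        if (number < 1 || number > static_cast<int>(entries_.size())) {
            return {};
        }
        return entries_[number - 1];
    }

    // Counting the entries forces the rest of the stream to be read.
    int getNumEntries() {
        while (readNext()) {
        }
        return static_cast<int>(entries_.size());
    }

    int entriesReadSoFar() const { return static_cast<int>(entries_.size()); }

private:
    // Reads one block of non-blank lines; returns false at end of input.
    bool readNext() {
        std::vector<std::string> lines;
        std::string line;
        while (std::getline(in_, line)) {
            if (line.empty()) {
                if (!lines.empty()) break;  // blank line ends the entry
                continue;                   // skip leading blank lines
            }
            lines.push_back(line);
        }
        if (lines.empty()) return false;
        entries_.push_back(lines);
        return true;
    }

    std::istream &in_;
    std::vector<std::vector<std::string>> entries_;
};
```

In this sketch, getEntry(1) stops after the first blank line, while getNumEntries() consumes the remaining input, which mirrors the behaviour described above.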

The cdb index contains a lookup for the accessions found in the spectral library, and so is dependent on the regexForAccession. This means that if you try to create an ms_spectral_lib_file with a different regular expression, then the cdb file will be automatically re-created.

Parameters:
fileName is the full path to the spectral library file.
regexForAccession is the regular expression used to extract the accession from the protein description. If a regular expression is not defined, then it is not possible to search for or return an accession. If the regexForAccession is of the incorrect format, then ms_errs::ERR_MSP_COMPILE_PARSE_RULE errors will be raised.
cdbFileName is the full path to the cdb index file for the library file.
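
As a rough illustration of extracting an accession from a protein description, the sketch below uses std::regex from the C++ standard library. It rests on assumptions that are not confirmed by this page: that the accession is the first capture group of the rule, and that the rule is written in the ECMAScript grammar. The syntax accepted for regexForAccession by Mascot Parser may differ, and extractAccession is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not a Mascot Parser function.
// Returns the first capture group of `rule` applied to the description,
// or an empty string when the rule does not match or has no group.
// Throws std::regex_error if `rule` cannot be compiled.
std::string extractAccession(const std::string &proteinDescription,
                             const std::string &rule) {
    const std::regex re(rule);  // ECMAScript grammar: an assumption
    std::smatch m;
    if (std::regex_search(proteinDescription, m, re) && m.size() > 1) {
        return m[1].str();
    }
    return std::string();
}
```

With the illustrative rule `sp\|([^|]+)\|`, the description "sp|P12345|ALBU_HUMAN Serum albumin" would yield "P12345" in this sketch.
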
ms_spectral_lib_file ( const char *  fileName,
const char *  regexForAccession,
const char *  cdbFileName,
const std::map< std::string, std::string > &  modificationAliases 
)

Constructor just for C++, accepting a list of modification aliases.

Constructor for C++

The use of a cdb index file is optional. If one is specified in the constructor, and the file doesn't exist, or is incompatible, then the whole spectral library will be read and the index (re-) created during this constructor call. If no cdbFileName is specified, then the spectral library file will be opened, but only read on demand. For example, calling getNumEntries() obviously requires the whole file to be read, but calling getEntryFromNumber(1) only requires the first entry to be read.

The cdb index contains a lookup for the accessions found in the spectral library, and so is dependent on the regexForAccession. This means that if you try to create an ms_spectral_lib_file with a different regular expression, then the cdb file will be automatically re-created.

Parameters:
fileName is the full path to the spectral library file.
regexForAccession is the regular expression used to extract the accession from the protein description. If a regular expression is not defined, then it is not possible to search for or return an accession. If the regexForAccession is of the incorrect format, then ms_errs::ERR_MSP_COMPILE_PARSE_RULE errors will be raised.
cdbFileName is the full path to the cdb index file for the library file.
modificationAliases is a map of modification aliases, such as "CAM" => "Carbamidomethyl". In Mascot Server, these are read from the library_mod_aliases file. The modification aliases are used in the created CDB file and when saveAs() is called.

Member Function Documentation

void appendErrors ( const ms_errors & src ) [inherited]

Copies all errors from another instance and appends them at the end of own list.

Parameters:
src The object to copy the errors across from. See Maintaining object references: two rules of thumb.
void clearAllErrors (  ) [inherited]

Remove all errors from the current list of errors.

The list of 'errors' can include fatal errors, warning messages, information messages and different levels of debugging messages.

All messages are accumulated into a list in this object, until clearAllErrors() is called.

See Error Handling.

See also:
isValid(), getLastError(), getLastErrorString(), getErrorHandler()
Examples:
common_error.cpp, resfile_error.cpp, and resfile_summary.cpp.
void copyFrom ( const ms_errors * right ) [inherited]

Use this member to make a copy of another instance.

Parameters:
right is the source to initialise from
std::vector< int > findEntries ( const char *  sequence,
const char *  checksum = 0,
const char *  accession = 0,
const char *  mods = 0 
) const

Returns a list of entries that match the parameters.

The library may contain multiple spectra for the same sequence and/or multiple spectra with the same checksum.

The function 'ands' all the parameters, so if sequence, checksum, accession and mods are all supplied, it will only return spectra which match all four parameters.

If a cdb index file has been created, then the lookup will be fast because an index is saved in the cdb file.

Use getEntryFromNumber() with each value from the returned list to get the relevant ms_spectral_lib_entry objects.
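
The 'and' behaviour described above can be sketched on an in-memory list. This is not how the library is implemented: LibRecord, fieldMatches and findMatching are hypothetical names, and treating a null pointer as "do not filter on this field" is an assumption made for the illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory record; not a Mascot Parser type.
struct LibRecord {
    std::string sequence, checksum, accession, mods;
};

// A null pointer means "do not filter on this field" (an assumption).
static bool fieldMatches(const char *wanted, const std::string &actual) {
    return wanted == nullptr || actual == wanted;
}

// Returns the 1-based numbers of all records that satisfy every
// supplied criterion (logical AND).
std::vector<int> findMatching(const std::vector<LibRecord> &records,
                              const char *sequence,
                              const char *checksum = nullptr,
                              const char *accession = nullptr,
                              const char *mods = nullptr) {
    std::vector<int> numbers;
    for (std::size_t i = 0; i < records.size(); ++i) {
        const LibRecord &r = records[i];
        if (fieldMatches(sequence, r.sequence) &&
            fieldMatches(checksum, r.checksum) &&
            fieldMatches(accession, r.accession) &&
            fieldMatches(mods, r.mods)) {
            numbers.push_back(static_cast<int>(i) + 1);
        }
    }
    return numbers;
}
```

Supplying only a sequence returns every spectrum for that peptide; each additional argument can only narrow the result.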

See also:
ms_peptidesummary::getLibraryEntryId

Example code:

        matrix_science::ms_peptidesummary pepsum(...);
        std::vector<int> dbIdx, offset;
        std::vector<std::string> checksum, mods;
        pepsum.getLibraryEntryId(query, p, dbIdx, offset, checksum, mods);

        if (!dbIdx.empty()) {
            matrix_science::ms_peptide *pep;
            if (pepsum.getPeptide(query, 1, pep)) {
                std::vector<int> found = SLFile.findEntries(pep->getPeptideStr().c_str(), checksum[0].c_str(), 0, mods[0].c_str());

                // The 'offset' values from the results file equal the numbers
                // returned by findEntries() only while the spectral library
                // file is unchanged since the search was run. If the file has
                // changed but still contains the original spectrum at a
                // different offset, findEntries() will still locate it.

                // Now, get the entry for each one (normally only one)
                for (size_t x = 0; x < found.size(); ++x) {
                    matrix_science::ms_spectral_lib_entry entry = SLFile.getEntryFromNumber(found[x]);
                    if (!entry.isValid()) {
                        continue; // entry failed to load; see getEntryFromNumber()
                    }
                }
            }
        }
Parameters:
sequence is the peptide sequence to find. It should just contain upper case A-Z.
checksum is the string that would be returned by ms_spectral_lib_entry::getPeakListChecksum().
accession is matched to the accession retrieved from the Protein= entry in the comment line. The accession is extracted from the protein using the regular expression passed to the constructor.
mods is in the form exactly as in the spectral library file and as described in ms_spectral_lib_entry::getMods.
Returns:
a vector of 'numbers'. See Using STL vector classes vectori, vectord and VectorString in Perl, Java, Python and C#.
std::string getAccessionFromNumber ( const int  number ) const

Returns the accession given the offset into the file.

The accession is retrieved from the Protein= entry in the comment line. It is extracted from the protein using the regular expression passed to the constructor. The number supplied has no relation to the spectrum id returned by mspepsearch.exe and is normally obtained using findEntries().

Parameters:
number must be in the range of 1..getNumEntries()
Returns:
accession for the spectrum number passed in parameter
std::vector< std::string > getAllMods (  ) const

Returns the complete list of mods named in the file.

The order of the names is just the order that they appear in the file and there will be no duplicate names.
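
The ordering rule (first appearance wins, no duplicates) can be sketched as follows. The function uniqueInOrder is a hypothetical stand-alone helper that works on a list of names already extracted from the file; it does not parse the mods format itself.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper, not a Mascot Parser function.
// Keeps the first occurrence of each name and preserves input order.
std::vector<std::string> uniqueInOrder(const std::vector<std::string> &names) {
    std::vector<std::string> result;
    std::unordered_set<std::string> seen;
    for (const std::string &name : names) {
        // insert() reports whether the name was new to the set
        if (seen.insert(name).second) {
            result.push_back(name);
        }
    }
    return result;
}
```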

See ms_spectral_lib_entry::getMods for a description of the format in the msp file

See Using STL vector classes vectori, vectord and VectorString in Perl, Java, Python and C#

Returns:
the list of modification names
std::string getChecksumFromNumber ( const int  number ) const

Returns the spectrum checksum given the offset into the file.

The number supplied has no relation to the spectrum id returned by mspepsearch.exe and is normally obtained using findEntries().

Parameters:
number must be in the range of 1..getNumEntries()
Returns:
checksum for the spectrum number passed in parameter
ms_spectral_lib_entry getEntryFromNumber ( const int  number ) const

Returns the individual spectrum from the msp file.

The number supplied has no relation to spectrum id returned by mspepsearch.exe. Either iterate through the whole file, using number = 1..getNumEntries() or use findEntries() to get the number.

If the function fails to load the entry, then a fatal error: ms_errs::ERR_MSP_NIST_FAILED_TO_LOAD_ENTRY or ms_errs::ERR_MSP_NIST_INDEX_OUT_OF_RANGE is set in the returned ms_spectral_lib_entry object. Call ms_spectral_lib_entry::isValid() on the returned object to determine if it is safe to use it.

See also getEntryFromNumberAsText()

Parameters:
number must be in the range of 1..getNumEntries()
Returns:
an ms_spectral_lib_entry object
std::vector< std::string > getEntryFromNumberAsText ( const int  number ) const

Returns the individual spectrum from the msp file as a vector of strings.

The number supplied has no relation to spectrum id returned by mspepsearch.exe. Either iterate through the whole file, using number = 1..getNumEntries() or use findEntries() to get the number.

If the function fails to load the entry, then a fatal error: ms_errs::ERR_MSP_NIST_FAILED_TO_LOAD_ENTRY or ms_errs::ERR_MSP_NIST_INDEX_OUT_OF_RANGE is set in this object. Call isValid() to determine if it is safe to use the returned vector of strings.

See also getEntryFromNumber()

Parameters:
number must be in the range of 1..getNumEntries()
Returns:
a vector of strings
const ms_errs * getErrorHandler (  ) const [inherited]

Retrieve the error object using this function to get access to all errors and error parameters.

See Error Handling.

Returns:
Constant pointer to the error handler
See also:
isValid(), getLastError(), getLastErrorString(), clearAllErrors(), getErrorHandler()
Examples:
common_error.cpp, and http_helper_getstring.cpp.
std::string getFileName (  ) const

Returns the full file path passed to the constructor.

Returns:
the file path.
ms_spectral_lib::FILE_FORMAT getFormat (  ) const

Returns the format of the file specified in the constructor.

The format of the file specified in the constructor is detected automatically.

Returns:
the auto detected file format
int getLastError (  ) const [inherited]

Return the error number of the last error that occurred.

All errors are accumulated into a list in this object, until clearAllErrors() is called. This function returns the last error that occurred.

See Error Handling.

See also:
isValid(), getLastErrorString(), clearAllErrors(), getErrorHandler()
Returns:
the error number of the last error, or 0 if there have been no errors.

Reimplemented in ms_mascotresfile.

std::string getLastErrorString (  ) const [inherited]

Return the error description of the last error that occurred.

All errors are accumulated into a list in this object, until clearAllErrors() is called. This function returns the last error that occurred.

Returns:
Most recent error, warning, information or debug message

See Error Handling.

See also:
isValid(), getLastError(), clearAllErrors(), getErrorHandler()

Reimplemented in ms_mascotresfile.

Examples:
common_error.cpp, config_enzymes.cpp, config_fragrules.cpp, config_license.cpp, config_mascotdat.cpp, config_masses.cpp, config_modfile.cpp, config_procs.cpp, config_quantitation.cpp, config_taxonomy.cpp, http_helper_getstring.cpp, and tools_aahelper.cpp.
std::string getModsFromNumber ( const int  number ) const

Returns the mods given the offset into the file.

The number supplied has no relation to the spectrum id returned by mspepsearch.exe and is normally obtained using findEntries().

See ms_spectral_lib_entry::getMods for a description of the format in the msp file.

Parameters:
number must be in the range of 1..getNumEntries()
Returns:
mods for the spectrum number passed in parameter
int getNumEntries (  ) const

Returns the number of spectra in the msp file.

This value is retrieved from the cdb file if possible, or if no cdb file is specified, then the whole msp file will have to be parsed.

Returns:
the number of spectra in the file.
int getNumResidues (  ) const

Returns the number of residues in the msp file.

This value is retrieved from the cdb file if possible, or if no cdb file is specified, then the whole msp file will have to be parsed.

This is a slightly confusing number! It is the sum of the lengths of all the sequences in the library. It is provided because, at the end of each Mascot search, the number of sequences and residues searched is reported.
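
A minimal sketch of that count, assuming the peptide sequences have already been extracted from the library. The function countResidues is hypothetical and is not part of Mascot Parser.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: total number of residues across all sequences,
// i.e. the sum of the sequence lengths.
long countResidues(const std::vector<std::string> &sequences) {
    long total = 0;
    for (const std::string &s : sequences) {
        total += static_cast<long>(s.size());
    }
    return total;
}
```

For a library holding only "AAK" and "CCCK", this sketch gives 2 sequences and 7 residues.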

Returns:
the number of residues in the file.
int getPrecursorChargeFromNumber ( const int  number ) const

Returns the precursor charge given the offset into the file.

For MSP files, the charge is taken from the Name: line.
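
As an illustration, the charge can be read from a Name: line such as the one shown in the Detailed Description. The helper below is a sketch, not the library's parser: chargeFromNameLine is a hypothetical name, and it assumes the charge is the run of digits that follows the first '/'.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not a Mascot Parser function.
// Returns the charge from a line such as "Name: PEPTIDEK/3",
// or 0 if there is no '/' followed by at least one digit.
int chargeFromNameLine(const std::string &nameLine) {
    const std::string::size_type slash = nameLine.find('/');
    if (slash == std::string::npos) {
        return 0;
    }
    int charge = 0;
    for (std::string::size_type i = slash + 1;
         i < nameLine.size() &&
         std::isdigit(static_cast<unsigned char>(nameLine[i]));
         ++i) {
        charge = charge * 10 + (nameLine[i] - '0');
    }
    return charge;
}
```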

Parameters:
number must be in the range of 1..getNumEntries()
Returns:
the precursor charge for the spectrum
double getPrecursorMZFromNumber ( const int  number ) const

Returns the precursor m/z value given the offset into the file.

There are several possible values in the Comment line to use for the precursor mz: Parent=865.409, Mz_exact=865.4092, Mz_av=865.898

This function returns the Mz_exact value if it exists, otherwise it returns the Parent= value.

In .sptxt files, there is also a separate PrecursorMZ: line, but this is currently not used.
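
The selection rule (Mz_exact if present, otherwise Parent) can be sketched on a single Comment line. This is an illustration only: commentNumber and precursorMz are hypothetical names, and the sketch assumes that each key=value pair starts the comment or follows a space.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper: finds "key=" at the start of the comment or
// after a space and parses the number that follows it.
std::optional<double> commentNumber(const std::string &comment,
                                    const std::string &key) {
    const std::string token = key + "=";
    std::string::size_type pos = 0;
    while ((pos = comment.find(token, pos)) != std::string::npos) {
        if (pos == 0 || comment[pos - 1] == ' ') {
            try {
                return std::stod(comment.substr(pos + token.size()));
            } catch (const std::logic_error &) {
                return std::nullopt;  // value was not a usable number
            }
        }
        pos += token.size();
    }
    return std::nullopt;
}

// Prefers Mz_exact and falls back to Parent, as described above.
std::optional<double> precursorMz(const std::string &comment) {
    if (auto exact = commentNumber(comment, "Mz_exact")) {
        return exact;
    }
    return commentNumber(comment, "Parent");
}
```

For a comment containing all three values listed above, this sketch returns the Mz_exact value and ignores Mz_av.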

See also getQmatch()

Parameters:
number must be in the range of 1..getNumEntries()
Returns:
the precursor m/z value for the spectrum
long getQmatch ( double  minMz,
double  maxMz 
) const

Return the number of spectra in the library with a precursor mass within the passed m/z range.

This function calls getPrecursorMZFromNumber() to find the number of matches within a precursor mass range.
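
A minimal sketch of such a count over a list of precursor m/z values. The function countInRange is hypothetical, and treating both limits as inclusive is an assumption; this page does not state how the boundaries are handled.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: number of values with minMz <= mz <= maxMz.
// Inclusive limits are an assumption of this sketch.
long countInRange(const std::vector<double> &precursorMz,
                  double minMz, double maxMz) {
    return static_cast<long>(std::count_if(
        precursorMz.begin(), precursorMz.end(),
        [minMz, maxMz](double mz) { return mz >= minMz && mz <= maxMz; }));
}
```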

Parameters:
minMz is the lower limit to consider
maxMz is the upper limit to consider
Returns:
the number of entries in the library with a precursor mass between minMz and maxMz
std::string getSequenceFromNumber ( const int  number ) const

Returns the spectrum peptide sequence given the offset into the file.

The number supplied has no relation to the spectrum id returned by mspepsearch.exe and is normally obtained using findEntries().

Parameters:
number must be in the range of 1..getNumEntries()
Returns:
sequence for the spectrum number passed in parameter
std::string getStatsInformation (  ) const

Returns some unstructured text giving some statistics for the file.

Returns:
a free form multi-line text string
bool isValid (  ) const [inherited]

Call this function to determine if there have been any errors.

This will return true unless there have been any fatal errors.

See Error Handling.

Returns:
True if no fatal error occurred
See also:
getLastError(), getLastErrorString(), clearAllErrors(), getErrorHandler()
Examples:
common_error.cpp, config_enzymes.cpp, config_fragrules.cpp, config_license.cpp, config_mascotdat.cpp, config_masses.cpp, config_modfile.cpp, config_procs.cpp, config_quantitation.cpp, config_taxonomy.cpp, http_helper_getstring.cpp, peptide_list.cpp, resfile_summary.cpp, and tools_aahelper.cpp.
bool saveAs ( const char *  fileName,
const bool  replaceProteinName = true,
ms_spectral_lib::FILE_FORMAT  fileFormat = ms_spectral_lib::FORMAT_NIST_MSP,
const int  startNumber = 1,
const int  endNumber = -1,
const ms_spectral_lib_entry::WHAT_TO_ANNOTATE  whatToAnnotate = ms_spectral_lib_entry::ANNOTATE_REPLACE_QUESTION_MARKS,
const double  annotateTol = 0.6,
const char *  annotateTolu = "Da",
const ms_umod_configfile *  unimod = 0 
) const

Save a copy of the file in the specified format.

SpectraST format files can be converted to files that can be read by NIST tools using this function.

SpectraST and NIST (MSP) files differ in the following ways:

  • SpectraST files commonly have comments at the top. These are never written if the format is FORMAT_NIST_MSP
  • MSP files have the line: "Num peaks:" while SpectraST files have the line "NumPeaks:"
  • The peak list, following the "Num peaks:" or "NumPeaks:" in each entry has a different format
    • MSP files have 3 columns. The first column for m/z, the second for intensity and the third, in double quotes, is used for quoted annotations.
    • SpectraST files have 4 columns. The first two are identical to MSP files, but there are then two further columns for peak annotations that are unquoted.
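
The column difference in the peak list can be sketched with two formatting helpers. These are illustrations only: formatMspPeak and formatSptxtPeak are hypothetical names, and the tab separator and default number formatting are assumptions rather than details taken from either format definition.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper. MSP style: m/z, intensity, then one annotation
// column in double quotes. The tab separator is an assumption.
std::string formatMspPeak(double mz, double intensity,
                          const std::string &annotation) {
    std::ostringstream out;
    out << mz << '\t' << intensity << "\t\"" << annotation << '"';
    return out.str();
}

// Hypothetical helper. SpectraST style: m/z, intensity, then two
// unquoted annotation columns.
std::string formatSptxtPeak(double mz, double intensity,
                            const std::string &annotation1,
                            const std::string &annotation2) {
    std::ostringstream out;
    out << mz << '\t' << intensity << '\t' << annotation1 << '\t'
        << annotation2;
    return out.str();
}
```

The same peak therefore carries one quoted annotation column in the MSP layout and two unquoted columns in the SpectraST layout.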
Parameters:
fileName is the file to write to.
replaceProteinName specifies whether the Protein="..." field in the comment section should be replaced with Protein="[1..n]:checksum" where

fileFormat specifies the format as described above.
startNumber is the (index) number of the first spectrum to be saved, and should be in the range 1..getNumEntries(). The default value is 1.
endNumber is the (index) number of the last spectrum to be saved, and should be greater than startNumber and in the range 1..getNumEntries(). A value of -1 (the default) specifies that saving should continue to the end of the file.
whatToAnnotate specifies whether existing annotation should be replaced.
annotateTol is the tolerance, in the units specified by annotateTolu, for matching to the calculated data. Only peaks within this tolerance will be annotated. Other peaks will be annotated with a "?".
annotateTolu must be "Da", "mmu" or "ppm".
unimod is required if any entry has any modifications that are just specified by name. Otherwise, there is no way to calculate the fragment ion masses.
Returns:
true if the file can be saved, false otherwise.
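
How the three annotateTolu units relate to each other can be sketched with a small conversion helper. This is not the library's code: toleranceInDa and withinTolerance are hypothetical names, and applying a ppm tolerance relative to the calculated m/z is an assumption of the sketch.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a tolerance to Da at a given m/z.
// 1 mmu is 0.001 Da; ppm is taken relative to `mz`.
double toleranceInDa(double tol, const std::string &unit, double mz) {
    if (unit == "Da") return tol;
    if (unit == "mmu") return tol * 1e-3;
    if (unit == "ppm") return tol * mz * 1e-6;
    throw std::invalid_argument("unit must be \"Da\", \"mmu\" or \"ppm\"");
}

// Hypothetical helper: true if the observed peak lies within the
// tolerance of the calculated value.
bool withinTolerance(double observedMz, double calculatedMz,
                     double tol, const std::string &unit) {
    const double diff = observedMz > calculatedMz
                            ? observedMz - calculatedMz
                            : calculatedMz - observedMz;
    return diff <= toleranceInDa(tol, unit, calculatedMz);
}
```

With the default of 0.6 Da, a peak 0.5 Da from a calculated value of 500 would be annotated in this sketch, whereas a 10 ppm tolerance (0.005 Da at m/z 500) would leave it marked with "?".
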
bool verifyThatModsAreInUnimod ( const ms_umod_configfile & unimod )

The modifications listed should all be in the passed unimod file.

If any mods are not found, then a warning ms_errs::ERR_MSP_NIST_MODIFICATION_NOT_FOUND is added.

Parameters:
unimod is a reference to the unimod file
Returns:
true if all modifications are listed in the passed unimod list

Copyright © 2022 Matrix Science Ltd.  All Rights Reserved. Generated on Thu Mar 31 2022 01:12:38