Posted by David Creasy (August 14, 2014)

Integrating Mascot into a proteomics pipeline

A number of academic groups and commercial companies have written their own software ‘pipelines’ to automate peak picking, searching and post-processing of their raw MS data. The aim is generally to reduce the amount of researcher time spent after the data have been collected. However, we’d encourage you to look at existing software before embarking on your own project. The software provided by the instrument manufacturers, or a combination of Mascot Distiller, Mascot Daemon and Mascot Insight, may do 99% of what you need.

In our experience, the real problem with home-built pipelines starts at the design stage. It’s surprisingly difficult to specify exactly what you want implemented and then to communicate this to a developer who may have little MS experience. Inevitably, many of these projects seem to fail. However, if you do decide to develop your own pipeline, there are tools from Matrix Science that will help with this. There are three main tasks that will be covered in this and a subsequent blog article:

  • Creating peak lists from raw data
  • Submitting searches to a Mascot Server
  • Parsing the results from a Mascot search.

For the first two stages, it may well be that Mascot Daemon, possibly in combination with the Mascot Distiller Daemon Toolbox, will do all you need. You should certainly look at that and try it out before writing your own software. Mascot Daemon can monitor a directory for new files, and it can run custom software using “External Processes” before and after each search. From version 2.5 onwards, you can even specify that your own software will be used to convert raw files to peak lists. It’s hard to see why you might want to write your own software for these steps, but if you do, keep reading!

Creating peak lists from raw data

There are a number of tools available to do this, some free and some provided by the instrument manufacturers.

From Matrix Science, the Mascot Distiller Developer toolbox provides a uniform application programming interface (API) to all of the different raw file formats, greatly reducing development time if you have instruments from more than one manufacturer. A surprisingly small amount of code is required to open a raw file and produce a peak list. And, if you want to do some special processing of the data, there’s rich functionality available in the toolkit to enable this. The thirty day trial licence gives you access to the SDK.

The best known open source software has been developed by the OpenMS group. There are libraries and utilities to read the raw data from some instruments and convert this to a number of different text formats.
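Whichever tool reads the raw data, the output of this stage is usually a text peak list. As a minimal sketch, assuming your pipeline writes Mascot Generic Format (MGF), the code below serialises one MS/MS spectrum. The Spectrum struct and the function names are hypothetical and are not part of Mascot Distiller or Mascot Parser.

```cpp
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory spectrum; not a Mascot Distiller or Mascot Parser type.
struct Spectrum {
    std::string title;
    double precursorMz;
    int precursorCharge;                           // 0 or less means "unknown"
    std::vector<std::pair<double, double>> peaks;  // (m/z, intensity)
};

// Format a number with four decimal places without touching the caller's stream flags.
std::string fixed4(double value) {
    std::ostringstream text;
    text << std::fixed << std::setprecision(4) << value;
    return text.str();
}

// Write one MS/MS spectrum as an MGF query block (BEGIN IONS ... END IONS).
void writeMgfSpectrum(std::ostream& out, const Spectrum& s) {
    out << "BEGIN IONS\n";
    out << "TITLE=" << s.title << '\n';
    out << "PEPMASS=" << fixed4(s.precursorMz) << '\n';
    if (s.precursorCharge > 0) {
        out << "CHARGE=" << s.precursorCharge << "+\n";
    }
    for (const auto& peak : s.peaks) {
        out << fixed4(peak.first) << ' ' << fixed4(peak.second) << '\n';
    }
    out << "END IONS\n";
}

// Convenience wrapper: all spectra in one string, separated by blank lines.
std::string toMgf(const std::vector<Spectrum>& spectra) {
    std::ostringstream out;
    for (const auto& s : spectra) {
        writeMgfSpectrum(out, s);
        out << '\n';
    }
    return out.str();
}
```

Writing to a std::ostream rather than directly to a file keeps the sketch easy to test and lets the same function feed a file, a string or a socket.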

Submitting searches to a Mascot Server

It is highly advisable to use HTTP for all communication with a Mascot server. One reason is that Mascot may be running on Linux while your pipeline runs on Windows, or vice versa. It’s quite possible to write all your own code using the information provided in the Mascot Installation and Setup manual, but we’ve provided some interfaces in Mascot Parser to make this relatively easy. Mascot Parser is free unless you want to include it in a commercial product. It can be accessed from Java, Perl, Python and C++. The steps for submitting a search to a Mascot Server using Mascot Parser are:

Connect to a Server

Create a new ms_http_client, passing the base URL and suitable settings.

matrix_science::ms_connection_settings settings;
settings.setUserAgent("your_application_name");
matrix_science::ms_http_client server("http://your_mascot_server/mascot/cgi/", settings);

Get a Session

Log in, passing a username and password if required, to obtain a new ms_http_client_session. If security has not been enabled, then the username and password can be blank.

matrix_science::ms_http_client_session session;
matrix_science::ms_http_client::LoginResultCode loginResult = 
  server.userLogin("your_username", "your_password", session);
// ... check success

Create a Search

Get a new ms_http_client_search from the session.

std::string httpHeader = "Content-Type: multipart/mixed; boundary=---MyPipelineMascotSearch";
matrix_science::ms_http_client_search search;
matrix_science::ms_http_helper_progress progress;
// 'session' is the ms_http_client_session obtained at login above
bool isOk = session.submitSearch(search, httpHeader, "", "your_source_mime_filename", "", progress);
// ... check success
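The call above takes the name of a MIME file. As a rough sketch only, assuming that file holds a multipart body whose boundary matches the one given in the HTTP header, it could be assembled along these lines. The function name, the use of form-data parts and the "FILE" field name are assumptions made for illustration; the parameter names and values that Mascot actually expects are described in the Mascot Installation and Setup manual.

```cpp
#include <map>
#include <sstream>
#include <string>

// Build a multipart MIME body: one part per search parameter, then the peak
// list as a file part, then the closing delimiter.
// Assumption: the peak list is sent in a part named "FILE"; check the manual.
std::string buildMimeBody(const std::string& boundary,
                          const std::map<std::string, std::string>& params,
                          const std::string& peakListName,
                          const std::string& peakListText) {
    std::ostringstream body;
    for (const auto& kv : params) {
        // Every part starts with "--" followed by the boundary string.
        body << "--" << boundary << "\r\n"
             << "Content-Disposition: form-data; name=\"" << kv.first << "\"\r\n"
             << "\r\n"
             << kv.second << "\r\n";
    }
    body << "--" << boundary << "\r\n"
         << "Content-Disposition: form-data; name=\"FILE\"; filename=\""
         << peakListName << "\"\r\n"
         << "\r\n"
         << peakListText << "\r\n"
         // The final delimiter carries a trailing "--".
         << "--" << boundary << "--\r\n";
    return body.str();
}
```

Note that the delimiter lines are the boundary string prefixed with two extra hyphens, so a boundary of ---MyPipelineMascotSearch appears in the body as -----MyPipelineMascotSearch.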

Follow its Progress

Follow the progress of the search on the server until it is complete.

matrix_science::ms_http_client_search::SearchStatusCode code;
int progressPercent;
do {
    bool isOk = search.getStatus(code, progressPercent);
    // ... check success
    // ... report progressPercent if required, then pause
    //     before asking the server for the status again
} while (code == matrix_science::ms_http_client_search_status::TS_RUNNING);
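A loop like this is usually combined with a pause between requests and an overall time limit, so that the pipeline neither floods the server nor waits forever. The sketch below shows that pattern in isolation. It does not call Mascot Parser: PollStatus, PollResult and pollUntilDone are hypothetical names, and the getStatus callback stands in for whatever status call your code makes.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical status values; stand-ins for whatever the server reports.
enum class PollStatus { Running, Complete, Error };

struct PollResult {
    PollStatus status;  // last status seen
    int polls;          // number of status requests made
    bool timedOut;      // true if the time limit expired while still running
};

// Call getStatus repeatedly, sleeping 'interval' between calls, until the
// status is no longer Running or 'timeout' has elapsed.
PollResult pollUntilDone(const std::function<PollStatus(int&)>& getStatus,
                         std::chrono::milliseconds interval,
                         std::chrono::milliseconds timeout,
                         const std::function<void(int)>& onProgress = nullptr) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    PollResult result{PollStatus::Running, 0, false};
    for (;;) {
        int percent = 0;
        result.status = getStatus(percent);
        ++result.polls;
        if (onProgress) {
            onProgress(percent);  // e.g. update a progress bar or a log
        }
        if (result.status != PollStatus::Running) {
            return result;        // finished, successfully or not
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            result.timedOut = true;
            return result;
        }
        std::this_thread::sleep_for(interval);
    }
}
```

Passing the status call in as a std::function keeps the timing logic separate from the HTTP code, which also makes it possible to test the loop with a fake status source.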

Download the Results

Once the search is complete, download the results file from the server to a local file.

bool isOk = search.downloadResultsFile("your_target_download_filename");
// ... check success

Parsing the results from a Mascot search

For MS/MS data, the Mascot results files contain scores and related information for peptide spectrum matches (PSMs) but no protein inference. Protein inference is complex, and writing new software to perform it from scratch is not something to be undertaken lightly. Mascot Parser provides two different algorithms for this, and all the standard Mascot reports and export functions use Mascot Parser. Further tips will be provided in a later blog article.
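To give a feel for the problem, here is a toy sketch of one simple approach: greedily choosing proteins until every matched peptide is explained. This is not either of the algorithms in Mascot Parser, and greedyProteinSet is a hypothetical name. It ignores scores, same-set and sub-set proteins and many other details that a real implementation has to handle.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy parsimony: repeatedly pick the protein that explains the most peptides
// that are still unexplained. Ties go to the protein whose name sorts first.
std::vector<std::string> greedyProteinSet(
        const std::map<std::string, std::set<std::string>>& proteinToPeptides) {
    // Start with every peptide that matched at least one protein.
    std::set<std::string> unexplained;
    for (const auto& entry : proteinToPeptides) {
        unexplained.insert(entry.second.begin(), entry.second.end());
    }

    std::vector<std::string> chosen;
    while (!unexplained.empty()) {
        std::string best;
        std::size_t bestCount = 0;
        for (const auto& entry : proteinToPeptides) {
            std::size_t count = 0;
            for (const auto& peptide : entry.second) {
                if (unexplained.count(peptide) > 0) {
                    ++count;
                }
            }
            if (count > bestCount) {
                best = entry.first;
                bestCount = count;
            }
        }
        // At least one protein covers an unexplained peptide, so 'best' is set.
        chosen.push_back(best);
        for (const auto& peptide : proteinToPeptides.at(best)) {
            unexplained.erase(peptide);
        }
    }
    return chosen;
}
```

Even this small example has to make an arbitrary choice when two proteins explain the same peptides, which is one reason a well tested implementation is preferable to writing your own.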
