Mascot: The trusted reference standard for protein identification by mass spectrometry for 25 years

Posted by Ville Koskinen (May 15, 2023)

How old is the software in that publication?

Most proteomics data analysis requires complex and sophisticated software. When you read the methods section of a journal article, it should state the software name and version number, as this is the minimum required for reproducibility of the data analysis. Often it is useful to also know how old the software package is. If it’s Mascot Server, there is now a trivial way to find out: we’ve added the full Mascot Server release history to the technical support page. It lists the date of every major release since 2004 as well as the date of the latest patch release.

Screenshot of the Mascot Server release history page

The information on the release history page has always been available in the News section of this website. However, it was becoming tedious to find, as you had to scroll through several years of history to reach the release announcement. The new release history page also lists the major new functionality and improvements in each version, which should make it easy to compare versions.

Next time you are writing a methods section, do state the release date, or even just the year. Many software packages have frequent releases, and it is rarely obvious from the version number how old the software is. The lead time from planning and running the experiment to publishing a paper can be long, and sometimes there is also a good reason to stick to an old version. On the other hand, if the software used is more than a decade old, how much could the data analysis have been improved by using a more recent version? And if you wanted to reproduce it, would you be able to access and install that old version on a modern PC?

For example, suppose you identified endogenous peptides using Mascot Server 2.4, which was released a decade ago. Repeating the search in version 2.8 against the same database should find at least the same peptides. This is because we try hard to maintain reproducibility when updating Mascot, so an update should never have a negative impact on the results. It may, however, have a positive effect. Indeed, version 2.8 greatly improves the identification rate of endogenous peptides, so the latest version will find both the previous set of peptides and many more that the old version could not.

Reproducibility isn’t a feature of many proteomics software packages. Some have wild swings in behaviour between releases, so it makes sense to standardise your workflow to a specific version and never update it again. On the other hand, anyone who tries to reproduce the data analysis will need the exact same version. If important scientific conclusions hinge on using a specific version of an old software package, are they truly reproducible?

When the software package version is stated, but the release date isn’t, here is an example of the detective work that is sometimes required. Recently, we came across a paper published in 2023 that used ProteoWizard (msconvert) 3.0.4468. All fine and good, but it’s impossible to say for certain when that version was released. The latest version in May 2023 is 3.0.23121. A web search finds 3.0.4468 mentioned in scientific publications as long ago as 2013. Was it released in 2013? Or is it even older? At the time of writing, the ProteoWizard website has no publicly accessible release history and we haven’t heard back from their support team, so it’s impossible to tell. If one of the ProteoWizard developers is reading this, please add the information as a comment!

The raw data for the study is available, so could the peak lists be reproduced meaningfully in a later version of ProteoWizard? Would a later version, in fact, have improved the quality of the data analysis? The old version is no longer available for download from any official source. It’s probably possible to get the source code of 3.0.4468 from the project’s GitHub repository, but compiling code from ten years ago is hardly trivial.
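A version number on its own only tells you the order of releases, not their dates, and even the ordering is easy to get wrong if the numbers are compared as text. The following is a minimal sketch, not part of ProteoWizard or Mascot, that compares the two ProteoWizard version numbers mentioned above. The helper names `parse_version` and `is_older` are hypothetical, and the sketch assumes the version string consists only of dot-separated integers.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version string such as '3.0.4468' into integers.

    Assumes every component is a plain integer (no 'beta' or similar suffix).
    """
    return tuple(int(part) for part in version.strip().split("."))


def is_older(a: str, b: str) -> bool:
    """Return True if version a precedes version b in numeric order."""
    return parse_version(a) < parse_version(b)


old, new = "3.0.4468", "3.0.23121"
print(is_older(old, new))  # numeric, component-by-component comparison
print(old < new)           # plain string comparison, shown for contrast
```

The numeric comparison ranks 3.0.4468 before 3.0.23121, whereas the plain string comparison ranks it after, because the character '4' sorts after '2'. Neither result says anything about when either version was released, which is why a release date or year in the methods section is still needed.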

If you have a Mascot Server 2.8 licence, you are eligible to install version 2.8 or any earlier version. The same applies to Mascot Distiller. If you’re under support and you need to temporarily install an older version to repeat a database search, please ask us for a temporary 30-day product key. The installer packages for all the historical versions remain available, although installing very old versions on the latest Windows may be difficult. If you just need to view old results, you can drop the results file in the ‘data’ directory and open it in Protein Family Summary. Mascot Server 2.8 can read results files from all past versions.

Finally, if your colleague has forgotten to write the software version number in your next publication draft, please fix their mistake. To find the Mascot Server version number, go to your local Mascot home page, click Database Status and look at the top of the status page. It’s also saved in the results file. The Mascot Distiller version is under the Help menu, About Mascot Distiller. The Mascot Daemon version number is less important, as Daemon does not perform any data analysis on its own, but it can be found in the Daemon GUI under Help, About.
