Mascot: The trusted reference standard for protein identification by mass spectrometry for 25 years

Posted by Ville Koskinen (February 16, 2024)

How to run the public Mascot service

The public Mascot service running on this website recently had its 25th birthday: the service was launched in November 1998. The purpose is to let you evaluate the product before buying, but it’s also useful for small data sets and proteomics training courses. We’re a small company and our main activity is software development. The secrets to running the public service are minimising ongoing costs, reducing the amount of system administration and avoiding unnecessary development costs.

In January 2024, we upgraded the server hardware and operating system, but there is a bit more to the story. The hardware has, of course, been upgraded several times between 1998 and 2023. Most recently, we ran a Mascot cluster with up to 12 search nodes (blade servers) using IBM BladeCenter hardware, colocated in a data centre not far from London, UK. The building remained the same for all 25 years, although the data centre went through seven different owners as IT and telecoms companies consolidated.

The IBM hardware was incredibly robust. We upgraded the blade servers, RAM and disks a couple of times, but the chassis and system modules remained the same throughout. We also replaced the head node with a more powerful Dell server in 2018. As the IBM blades wore out, we had to replace one and eventually ran the cluster down from 12 to 10 and then to 6 active blades. The capacity reduction was possible because we’ve made Mascot searches more efficient, and also because we moved NCBIprot behind a login at the start of the Covid-19 pandemic.

Before 2020, a lot of computing time was wasted on running PMF searches against NCBIprot without a taxonomy filter. This won’t get you statistically significant matches due to the size of the database (hundreds of millions of protein sequences), but one of the lessons of running a service open to the public is: if it’s possible to submit a wonky search, it will happen. We also get a lot of searches against the alphabetically first database, which is often the contaminants database (247 sequences).
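The effect of database size on significance can be illustrated with a rough calculation. This is a minimal sketch, assuming the score threshold behaves like a Bonferroni-style correction of the form -10·log10(p/N), where N is the number of sequences searched. The function name and the figure of 300 million sequences are illustrative assumptions, not Mascot's actual implementation or the real size of NCBIprot.

```python
import math


def score_threshold(num_sequences: int, p: float = 0.05) -> float:
    """Score a match must exceed to be significant at level p.

    Assumption: threshold = -10 * log10(p / N), i.e. the chance of a
    random match is corrected for N candidate sequences.
    """
    if num_sequences < 1:
        raise ValueError("num_sequences must be at least 1")
    if not 0 < p < 1:
        raise ValueError("p must be between 0 and 1")
    return -10 * math.log10(p / num_sequences)


# Contaminants database size is from the text; the large size is illustrative.
for label, size in [("contaminants", 247), ("very large database", 300_000_000)]:
    print(f"{label}: {size} sequences, threshold {score_threshold(size):.1f}")
```

Under this assumption, the threshold rises by 10 points for every tenfold increase in the number of sequences, so a peptide mass fingerprint has to clear a far higher bar against an unfiltered database of hundreds of millions of entries than against a small or taxonomy-filtered one.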

Replacing the old hardware became necessary, as it was well beyond end of life and we were having to make frequent visits to the data centre to fix issues. There were two options. We could either continue colocating our own hardware and make a big capital purchase, or we could choose among the many cloud providers. The costs of cloud computing and server rental have fallen in the last decade while colocation costs have been increasing, so cloud computing was the obvious choice. We opted to rent the new hardware from a small UK-based provider, as they offer fixed monthly costs at a much lower price than giants like AWS.

The new server is a single ‘bare metal’ box with an Intel Xeon Gold 24-core processor, 256 GB RAM and RAID1 NVMe disks. Although processor performance had a lull in the 2010s, speeds have again been increasing in the last few years. The Gold 24-core processor is as powerful as the 80+ core BladeCenter from a decade ago. The new server not only uses less power and takes up less space, it also runs in a recently built data centre powered by renewable energy, so it’s a win all around.

Hardware is only half of the story. Running the public service for free is possible for a small company because there isn’t a lot of active maintenance that needs to be done, saving on personnel time and costs. This is thanks to several Mascot Server qualities:

  • Designed for remote use – Search submission, reports and most admin tasks are done through the web browser.
  • Fully automated – Once the system is set up, little manual admin is required.
  • Runs on commodity hardware – Any standard Intel/AMD server is suitable, no expensive GPUs or exotic hardware required.
  • Few system dependencies – Use any Windows or Linux version. We run Mascot on Debian Linux, which is stable and trouble-free.
  • Fault tolerance – All kinds of dodgy input get submitted to a public service. Mascot copes with invalid, faulty and incomplete input.
  • Mascot Security – Automatically applies limits on the size and number of concurrent ‘guest’ searches to ensure everyone gets a fair slice.

Migrating Mascot Server from the old to the new system was just a matter of copying the ‘mascot’ directory and adjusting filepaths in configuration files. The public server also runs the company website and other things, and it took longer to test and validate those than Mascot Server itself.
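The path adjustment step can be sketched as a small script. This is only an illustration under stated assumptions: the `*.dat` file pattern, the function name and any directory prefixes passed to it are hypothetical, and it treats configuration files as plain UTF-8 text. It is not a description of how the migration was actually carried out.

```python
from pathlib import Path


def rewrite_paths(config_dir, old_prefix, new_prefix, pattern="*.dat"):
    """Replace old_prefix with new_prefix in matching text files.

    A backup copy (<name>.bak) is written next to each file that changes.
    Returns the list of files that were modified.
    """
    changed = []
    for path in sorted(Path(config_dir).glob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        if old_prefix not in text:
            continue  # nothing to adjust in this file
        backup = path.with_name(path.name + ".bak")
        backup.write_text(text, encoding="utf-8")
        path.write_text(text.replace(old_prefix, new_prefix), encoding="utf-8")
        changed.append(path)
    return changed
```

A call such as `rewrite_paths("config", "/old/location/mascot", "/new/location/mascot")` (both prefixes hypothetical) returns the files it touched, which makes it easy to review the changes before starting the service on the new machine.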

We use various monitoring tools like Checkmk to get alerts if the system is running low on resources or a hardware fault develops. Mascot Server also sends admin e-mails if a search ends with a fatal error or some other error occurs. This means nobody needs to babysit the system to guarantee its operation.
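The kind of resource check such tools perform can be sketched in a few lines. This is a generic illustration, not Checkmk's API or plugin format: the function names and the 80% and 90% thresholds are assumptions chosen for the example.

```python
import shutil


def classify(percent_used: float, warn: float = 80.0, crit: float = 90.0) -> str:
    """Map a usage percentage to a simple alert level."""
    if percent_used >= crit:
        return "CRIT"
    if percent_used >= warn:
        return "WARN"
    return "OK"


def disk_percent_used(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    pct = disk_percent_used("/")
    print(f"{classify(pct)}: {pct:.1f}% of disk space used")
```

A monitoring system would run a check like this on a schedule and send an alert whenever the level is anything other than OK, which is what removes the need for someone to watch the server by hand.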

Finally, the public service runs the unmodified retail version of Mascot Server to avoid extra development cost. When we prepare and release a new version, we always install it on the public website. This means every major version gets, on average, a couple of years of 24/7 real-life testing with everything thrown at it. Any bugs discovered and fixed for the public website always end up in a patch release, benefiting everyone.
