Mascot Server in the cloud

Overview

Cloud computing is an increasingly popular model for hosting services and client-server applications. There are several public clouds, such as Amazon Web Services, Microsoft Azure and Google Cloud Platform. The cloud providers own, run and maintain physical data centres, which are partitioned into virtual machines. The virtual machines, or even smaller computing units, are typically offered as a pay-as-you-go service with no upfront capital cost.

Mascot Server can be run on any infrastructure-as-a-service (IaaS) cloud platform. The only requirement is that the provider offers virtual machines with suitable CPU, RAM and disk resources. Whether hosting Mascot in the cloud is the right choice depends on a number of factors, which are described below.

We don’t currently provide Mascot under a software-as-a-service (SaaS) model.

Benefits

There are several benefits to cloud hosting compared to physical hardware:

  • Reduced up-front costs – Effectively, you are only paying for the computing time you use.
  • High performance – Connectivity to the Internet for downloading sequence databases is excellent and most cloud providers have powerful hardware.
  • Security – Data centres have a high level of physical security, and cloud VMs typically have a very secure default configuration.
  • Snapshots, cloud storage – Easy and secure data backup.
  • Convenience – Hardware maintenance and upgrades are outsourced to a third party.

Drawbacks

On the other hand, the ongoing cost of running a cloud service can be higher than that of physical hardware. Every case is different. It depends on whether you have to pay overheads for an in-house server, such as power, Internet bandwidth and IT support; whether the server runs continuously; how often you update in-house hardware; and whether you transfer large volumes of data to and from the cloud platform. Some organisations also have legal, regulatory or organisational requirements that prevent sending sensitive data to a public cloud.

Mascot Server architecture

Mascot Server is a client-server system. The server can be hosted anywhere, as long as clients have network access to it. For simplicity, think of Mascot Server as having two parts. The web interface lets users submit searches, view search reports and upload or update sequence databases. The backend preprocesses and executes searches.

The web interface has low processor requirements and uses a modest amount of RAM. When you browse search results, the workload is very typical of the average case for which public cloud platforms are optimised. However, the execution model of the backend sets certain constraints on effective utilisation in the cloud.

The Mascot backend requires sequence databases to be initialised (compressed) before use, which takes anything from minutes (SwissProt) to several hours (NCBIprot). The compressed files are then memory mapped. The assumption is that each database will be searched hundreds or thousands of times between updates. When there is enough available RAM, the operating system keeps the memory-mapped database entirely in memory, giving the best possible performance per search. There is no limit to the number of active databases, because the other assumption is that different users may search different databases, and all of them should be available to client programs.

The Mascot database search is designed for maximum processor utilisation through a high level of threading. Data sets are split into parts, and the parts are searched independently on any number of threads and processor cores. This is not a typical workload for public clouds, which tend to be priced for low-CPU applications. Depending on your usage pattern, Mascot processor usage may be bursty (short, infrequent searches), sustained (long, continuous or concurrent searches) or somewhere in between.
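The split-and-search pattern described above can be illustrated with a short Python sketch. It is not Mascot's implementation: the query format (a list of peak intensities) and the `score_query` function are hypothetical placeholders. Because of CPython's global interpreter lock, a thread pool here only demonstrates the partitioning and merging steps; genuinely CPU-bound work would need processes or native threads to keep every core busy.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(items, n_parts):
    """Split a list into at most n_parts contiguous parts of near-equal size."""
    # Never create more parts than there are items (and always at least one).
    n_parts = max(1, min(n_parts, len(items))) if items else 1
    size, extra = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` parts take one additional item each.
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


def score_query(query):
    """Hypothetical placeholder score: the sum of a query's peak intensities."""
    return sum(query)


def search_part(part):
    """Score every query in one part, independently of the other parts."""
    return [score_query(q) for q in part]


def parallel_search(queries, n_workers):
    """Search the parts concurrently and merge the results in input order."""
    parts = split_into_parts(queries, n_workers)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        # pool.map yields results in the same order as `parts`.
        results = pool.map(search_part, parts)
        return [score for part in results for score in part]
```

Because the parts are independent, more cores can work on the same search at once, which is why sustained Mascot workloads favour VMs with many physical cores.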

Mascot Server is designed to run any number of database searches concurrently. It has no built-in job control or queueing system. If a long search is running and a user submits a short search, the short search runs in parallel with the long one. This slows both searches down slightly, but the benefit is that the long search does not block short searches. In general, it is very difficult to predict search duration from the search parameters.

Provisioning requirements

The consequence of the above facts is that Mascot Server functions optimally when it stays resident on dedicated hardware or a dedicated VM. It is not designed to be stopped and restarted for every database search.

As a rule of thumb, if you need to run Mascot searches continuously throughout the day and require searches to start without delay, it is often better to install Mascot on a physical server. The upfront cost of buying the hardware is higher, but the annual running costs will be lower than those of a high-performance virtual machine running 24/7 in a public cloud. You could also consider renting a dedicated server for a flat monthly fee rather than paying as you go, which can work out cheaper.

On the other hand, if you need Mascot only intermittently, it can be cheaper to run a cloud VM on demand and suspend it when it’s not needed. This is because you typically pay only for storage while a cloud VM is suspended, and cloud storage is inexpensive. Some cloud platforms, such as Amazon Web Services, offer so-called spot instances at very low cost, where compute speed is not guaranteed and the VM may be terminated prematurely if there is not enough capacity. This can be an inexpensive way to run Mascot, although the unpredictability is not suitable for everyone.
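The trade-off between an always-on VM, an on-demand VM and a flat-fee dedicated server comes down to simple arithmetic. The Python sketch below assumes a single hourly compute rate and a per-GB monthly storage charge that applies whether or not the VM is running; real cloud pricing has many more components, and every figure in the example that follows is hypothetical rather than a quoted price.

```python
HOURS_PER_MONTH = 8760 / 12  # average hours in a month (730)


def storage_cost(storage_gb, storage_rate_gb_month):
    """Monthly disk charge, paid whether the VM is running or suspended."""
    return storage_gb * storage_rate_gb_month


def monthly_cost_always_on(hourly_rate, storage_gb, storage_rate_gb_month):
    """A VM that runs 24/7 pays compute for every hour of the month, plus storage."""
    return hourly_rate * HOURS_PER_MONTH + storage_cost(storage_gb, storage_rate_gb_month)


def monthly_cost_on_demand(hourly_rate, hours_used, storage_gb, storage_rate_gb_month):
    """A VM suspended when idle pays compute only for the hours it runs."""
    return hourly_rate * hours_used + storage_cost(storage_gb, storage_rate_gb_month)


def break_even_hours(hourly_rate, flat_monthly_fee, storage_gb, storage_rate_gb_month):
    """Hours of use per month at which on-demand costs as much as a flat fee."""
    return (flat_monthly_fee - storage_cost(storage_gb, storage_rate_gb_month)) / hourly_rate
```

For example, with a hypothetical rate of 1.50 per hour and 500 GB of storage at 0.08 per GB-month, `monthly_cost_always_on(1.5, 500, 0.08)` comes to about 1135 a month, while 100 hours of on-demand use costs about 190. Against a hypothetical flat fee of 600 a month, `break_even_hours(1.5, 600, 500, 0.08)` is about 373 hours: above that, the dedicated server is the cheaper option.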

If your licence is for 4 CPUs or more, a middle ground is to configure Mascot in cluster mode. Set up a master node with modest resources, and one or more cluster nodes using more powerful VM templates. When you want to be able to browse search results but don’t want to pay for the cluster nodes, you can suspend one or more of them. Mascot will detect that a node is no longer available and remove it from the cluster. When the node is restarted (at the same IP address), Mascot will add it back to the cluster. Mascot does not currently support dynamic scaling, such as Amazon EC2 Auto Scaling, so suspending and restarting nodes has to be done manually or controlled by your client program. There will also be a startup delay while any new or updated sequence databases and configuration are propagated to the nodes.
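If your cluster nodes run on AWS, a client program could stop and start them through the EC2 API. The sketch below assumes the AWS SDK for Python (boto3) and uses its EC2 client methods `start_instances`, `stop_instances` and `get_waiter`; the function name and instance IDs are hypothetical. It only changes the VM state: Mascot itself detects that a node has gone and re-adds it when it returns at the same IP address. The client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
def set_cluster_nodes_running(ec2_client, instance_ids, running, wait=True):
    """Start (running=True) or stop (running=False) the EC2 instances that
    host Mascot cluster nodes, optionally waiting for the new state.

    ec2_client is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Leave the master node out of instance_ids so search results stay browsable.
    """
    ids = list(instance_ids)
    if not ids:
        return
    if running:
        ec2_client.start_instances(InstanceIds=ids)
        waiter_name = "instance_running"
    else:
        ec2_client.stop_instances(InstanceIds=ids)
        waiter_name = "instance_stopped"
    if wait:
        # boto3 waiters poll the instance state until every instance
        # reaches the requested state (or the waiter gives up).
        ec2_client.get_waiter(waiter_name).wait(InstanceIds=ids)
```

For example, `set_cluster_nodes_running(boto3.client("ec2"), ["i-0123456789abcdef0"], running=False)` would stop a hypothetical cluster node at the end of the day, and calling it again with `running=True` would bring the node back before the next batch of searches, subject to the propagation delay described above.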

Virtual hardware specification

Whether you have a 1-CPU or a cluster licence, it is simplest to initially start with a single, standalone virtual machine. Most cloud providers offer a wide range of VM types optimised for different workloads, such as high memory or high CPU. The pricing can be very complex, and the right type depends on your specific workload, computing requirements, data access requirements and budget.

In short, the cloud VM will need an Intel or AMD processor with a sufficient number of physical cores, at least 16GB of RAM (preferably 64GB) and plenty of disk space. The considerations are similar to those for general hardware virtualisation, except that you usually have less control over things like CPU mapping. There is no need to select a RAID disk type if one is offered. Although it is beneficial for the virtual machine host to have its disks in a RAID configuration, we have not found any difference in Mascot Server performance when the cloud VM itself has a virtualised RAID array.
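To judge whether a VM has enough RAM to keep the memory-mapped databases resident, you can total the sizes of the compressed database files and add headroom for the operating system, the web interface and running searches. The sketch below is a rough rule of thumb, not a Matrix Science sizing formula: the 8 GiB reserve is an arbitrary assumption, and only the 16GB minimum comes from the recommendation above.

```python
GIB = 1024 ** 3
MIN_RAM_GIB = 16  # minimum recommended above (64 preferred)


def ram_check(db_file_sizes_bytes, vm_ram_gib, reserve_gib=8):
    """Return (fits, needed_gib) for a proposed VM size.

    db_file_sizes_bytes: sizes of the compressed (memory-mapped) database
    files that should stay in memory. reserve_gib: assumed headroom for the
    OS, web interface and search processes (a guess; tune for your workload).
    """
    needed_gib = sum(db_file_sizes_bytes) / GIB + reserve_gib
    fits = vm_ram_gib >= MIN_RAM_GIB and needed_gib <= vm_ram_gib
    return fits, needed_gib
```

With hypothetical compressed databases of 3 GiB and 40 GiB, `ram_check([3 * GIB, 40 * GIB], 64)` reports that about 51 GiB is needed, so a 64GB VM fits and a 32GB VM does not. On an existing installation, the file sizes could be collected with `os.path.getsize`.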

The Mascot licence locks to the virtualised hardware MAC address. At least one of the network interfaces assigned to the virtual machine must have a stable, fixed MAC address.
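On a Linux VM, one way to confirm that a MAC address survives a stop/start cycle is to record the addresses exposed under `/sys/class/net` before stopping the VM and compare them after it restarts. The sketch below assumes a Linux guest and skips the all-zero address reported by the loopback interface; the function names are hypothetical, and how the Mascot licence chooses among interfaces is not modelled here.

```python
import os


def list_mac_addresses(sys_net="/sys/class/net"):
    """Map each network interface name to its MAC address (Linux sysfs)."""
    macs = {}
    for iface in sorted(os.listdir(sys_net)):
        path = os.path.join(sys_net, iface, "address")
        try:
            with open(path) as f:
                mac = f.read().strip().lower()
        except OSError:
            continue  # entry without a readable address file
        # Loopback reports 00:00:00:00:00:00, which cannot identify the machine.
        if mac and mac != "00:00:00:00:00:00":
            macs[iface] = mac
    return macs


def stable_interfaces(before, after):
    """Interfaces whose MAC address is identical before and after a restart."""
    return {iface: mac for iface, mac in before.items() if after.get(iface) == mac}
```

Save the result of `list_mac_addresses()` before stopping the VM, then pass it and the post-restart result to `stable_interfaces`. An empty result means no interface kept its MAC address, so the licence would not remain valid on that VM.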

Cost and licensing

There is no additional licensing charge for running Mascot Server in the cloud; the cost is the same whether the server runs on physical or virtualised hardware. You will need to pay the cloud provider a usage-based or monthly fee, which depends on the platform, but this is separate from the software licence.

We advise giving it a try to see what your actual monthly bill looks like. Provided you have a Mascot Server licence under warranty or support, we can give you a free, 1-CPU, 30-day licence for this purpose.

Example: Amazon Web Services

We provide a turn-key Amazon Machine Image (AMI) for provisioning Mascot Server on Amazon Web Services (AWS). See Hosting Mascot Server on Amazon Web Services.