Posted by Chris Allen (July 15, 2013)

Running Mascot in a virtual machine

One of the most significant trends in IT in recent years has been the shift towards virtual machines. Virtualisation can offer a host of advantages such as consolidation, elastic provisioning, high availability / disaster recovery, multi-tenant isolation, and legacy OS support. At Matrix Science for example, being able to run many different operating system configurations on a single PC makes our work of testing Mascot Server a lot easier.

Do the same considerations apply to a production server running Mascot searches 24/7? In some cases the answer may be yes, but more often installing Mascot Server in a VM is not the best solution unless very careful consideration is given to resource allocation.

A Mascot licence is priced according to the number of processor cores available for searches. Unless the licence is for many processors, it is likely to cost more than the hardware, so it is important to choose fast hardware and make maximum use of it. A VM is an additional layer of software that can only cause some reduction in efficiency, but let’s assume for the moment that this effect is negligible. A much more important factor is whether other VMs are running simultaneously on the same machine, because unless they are isolated appropriately they may compete with Mascot for processor time. For example, if you have a 3 CPU (12 core) Mascot licence and a physical server with 16 cores, running other VMs will probably be perfectly fine as long as their combined virtual CPUs never exceed the 4 “spare” cores not covered by the licence. An extreme example of virtualisation being misused would be someone with a Mascot Cluster licence for 6 CPU who created 3 VMs on a PC with two quad-core processors, each VM configured as a 2 CPU (8 core) search node. Everything would appear to run OK, but all three nodes would be sharing the same 8 physical cores, so the system would be no faster, and almost certainly slower, than if they had saved their money and bought a 2 CPU Mascot licence.
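The core arithmetic in the two examples above can be checked with a few lines of code. This is a minimal sketch under simplifying assumptions: it treats every vCPU as needing one dedicated physical core and ignores hyper-threading and hypervisor overhead. The function names are hypothetical; the only Mascot-specific figure is the 4 cores per licensed CPU used by Mascot 2.3 and later.

```python
# Mascot 2.3 and later use up to 4 cores per licensed CPU.
CORES_PER_LICENSED_CPU = 4


def licensed_cores(licensed_cpus: int) -> int:
    """Number of cores a Mascot licence of `licensed_cpus` CPUs can use."""
    return licensed_cpus * CORES_PER_LICENSED_CPU


def host_is_oversubscribed(physical_cores: int, vcpus_per_vm: list[int]) -> bool:
    """True if the VMs on one host have more vCPUs in total than it has physical cores.
    Simplification (assumption): one vCPU needs one physical core, no hyper-threading."""
    return sum(vcpus_per_vm) > physical_cores


def usable_search_cores(physical_cores: int, licensed_cpus: int) -> int:
    """Upper bound on cores doing Mascot work at once on a single host: you cannot
    use more cores than the licence allows, nor more than physically exist."""
    return min(physical_cores, licensed_cores(licensed_cpus))


# Example 1: 3 CPU (12 core) licence on a 16-core host; other VMs use the 4 spare cores.
print(host_is_oversubscribed(16, [12, 2, 2]))  # False

# Example 2: 6 CPU cluster licence as 3 VMs of 8 vCPUs on a dual quad-core (8 core) PC.
print(host_is_oversubscribed(8, [8, 8, 8]))  # True
print(usable_search_cores(8, 6))  # 8, the same as a 2 CPU (8 core) licence
```

In the second example the extra licensed cores buy nothing: the host can never deliver more than its 8 physical cores, however the VMs are configured.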

If this is understood and you still wish to install Mascot Server in a VM, you need to ensure that the virtual processors are correctly configured. Mascot 2.3 and later will use up to 4 cores per licensed CPU, and it doesn’t matter how these are grouped into processor “sockets”: you just need to make sure that the VM has the correct total number of cores. Earlier versions of Mascot used socket-based licensing, details of which can be found on this page.
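Because only the total core count matters for Mascot 2.3 and later, any sockets × cores-per-socket combination with the right product will do. The sketch below is illustrative only; the function names are hypothetical, and your hypervisor may restrict which topologies it actually allows.

```python
# Mascot 2.3 and later use up to 4 cores per licensed CPU.
CORES_PER_LICENSED_CPU = 4


def required_vm_cores(licensed_cpus: int) -> int:
    """Total number of virtual cores the VM should have for this licence."""
    return licensed_cpus * CORES_PER_LICENSED_CPU


def vm_matches_licence(sockets: int, cores_per_socket: int, licensed_cpus: int) -> bool:
    """Only the total matters; how cores are grouped into sockets is irrelevant."""
    return sockets * cores_per_socket == required_vm_cores(licensed_cpus)


def possible_topologies(licensed_cpus: int) -> list[tuple[int, int]]:
    """All (sockets, cores_per_socket) pairs that give exactly the required total."""
    total = required_vm_cores(licensed_cpus)
    return [(s, total // s) for s in range(1, total + 1) if total % s == 0]


print(vm_matches_licence(1, 8, 2))  # True
print(vm_matches_licence(2, 4, 2))  # True
print(vm_matches_licence(2, 2, 2))  # False: only 4 cores for a 2 CPU licence
print(possible_topologies(2))  # [(1, 8), (2, 4), (4, 2), (8, 1)]
```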

Another factor to consider is that search speed depends heavily on the sequence database files being held in memory during the search. This makes it important that the VM is given access to as much physical RAM as practical. Default VM configurations commonly assign only a limited amount of memory to each VM, on the assumption that it will be one of many.
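One rough way to judge whether a VM has enough memory is to compare the size of the sequence database files with the RAM assigned to the VM. This is a minimal sketch, not a Mascot tool: it assumes that every file under the directory you point it at needs to be cached, and the `reserve_bytes` figure for the operating system and search processes is an assumption you must choose yourself. The function names and the example path are hypothetical.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under `path`, including subdirectories."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def databases_fit_in_ram(db_dir: str, vm_ram_bytes: int, reserve_bytes: int) -> bool:
    """Rough check (assumption): can every file under db_dir be held in memory
    while still leaving reserve_bytes for the OS and the search processes?"""
    return directory_size_bytes(db_dir) + reserve_bytes <= vm_ram_bytes


# Hypothetical usage: a VM with 32 GiB of RAM, keeping 8 GiB in reserve.
# databases_fit_in_ram("/path/to/sequence/databases", 32 * 2**30, 8 * 2**30)
```

If the check fails, the VM is likely to be reading database files from disk during searches, which is exactly the situation the paragraph above warns about.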

Storage arrangements can also affect Mascot performance. If several VMs have their virtual disks on a single physical drive, this creates a potentially significant bottleneck whenever two or more VMs are doing I/O at the same time. You also need to plan for enough future storage capacity for sequence databases (some of which grow in size each month) and for your search results and the related result report cache files, both of which are kept under the Mascot “data” directory. And of course you do have a suitable backup regime, don’t you?
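A simple projection can help when sizing the virtual disks. The sketch below assumes a growth model of its own: sequence databases grow by a fixed fraction each month (compounding), while search results and cache files grow by a fixed amount each month. The model, the function name and all input figures are assumptions; substitute numbers measured on your own system.

```python
def projected_storage_gb(
    db_gb: float,
    db_monthly_growth: float,
    results_gb: float,
    results_gb_per_month: float,
    months: int,
) -> float:
    """Projected total storage after `months`, under an assumed growth model:
    - sequence databases grow by `db_monthly_growth` (e.g. 0.02 = 2%) per month, compounding;
    - search results and result report cache files grow by a fixed amount per month."""
    db_future = db_gb * (1 + db_monthly_growth) ** months
    results_future = results_gb + results_gb_per_month * months
    return db_future + results_future


# Hypothetical inputs: 100 GB of databases growing 2% a month,
# 50 GB of results growing 10 GB a month, projected 3 years ahead.
print(round(projected_storage_gb(100, 0.02, 50, 10, 36)))
```

Compounding is used for the databases because the paragraph notes that some of them grow each month; if your results volume is also growing faster over time, a linear estimate will understate it.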

A word of caution: don’t be tempted to use VM snapshot functionality as a Mascot data backup mechanism. Aside from other issues, if the disk on which the snapshots are stored dies, you lose everything. Using snapshot volumes can also incur a performance penalty even during normal system operation. At the very least, the very large sequence database, taxonomy and “data” directories should be kept on snapshot-independent volumes; otherwise snapshot operations will be extremely slow, and you are likely to lose search results when reverting to an earlier snapshot. Quite apart from that, in a production environment snapshots should generally be used only for very short-term rollbacks, for various important reasons unrelated to Mascot.

One comment on “Running Mascot in a virtual machine”

  1. Samuel G. said:

    I agree with most of the points. Virtualisation has large benefits when managing many services that should be isolated and redundant, but it might not be the right answer in the scientific community when doing intensive computation and/or working with big data. Whether or not to virtualise must take into account the real use and needs. At our place, Mascot is virtualised because queries complete within an acceptable time and the number of queries does not overload the CPUs. The disks are mapped as RAW (direct access). As in our previous non-virtualised installation, sequence files and cache are kept separate from data, and cached files older than two months are removed automatically. We are quite happy with that setup.
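    The automatic removal of old cache files mentioned in this comment could be done with a scheduled script along the following lines. This is a minimal sketch, not the commenter's actual setup: the cache directory path is a hypothetical placeholder, "two months" is approximated as 60 days, and age is judged by file modification time. It defaults to a dry run so you can check what would be deleted before pointing it at a real directory; never point it at the directory holding your search results.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def remove_old_files(cache_dir: str, max_age_days: float = 60, dry_run: bool = True) -> list[str]:
    """List (dry_run=True) or delete (dry_run=False) files under cache_dir whose
    modification time is older than max_age_days. Returns the affected paths."""
    cutoff = time.time() - max_age_days * SECONDS_PER_DAY
    affected = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                affected.append(path)
                if not dry_run:
                    os.remove(path)
    return affected


# Hypothetical usage, e.g. from a nightly scheduled task:
# for path in remove_old_files("/path/to/mascot/cache", dry_run=True):
#     print("would remove", path)
```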
