Posted by Ville Koskinen (November 16, 2021)

Retiring the File Transfer Protocol (FTP)

Mascot Server includes Database Manager, which can be configured to download protein sequence databases from many sources using FTP (File Transfer Protocol), HTTP (Hypertext Transfer Protocol) and HTTPS (HTTP Secure). FTP is by far the oldest of these protocols, making its first appearance in 1971. Until some years ago, FTP was the best, and often the only, choice for large file downloads, and FASTA files were typically available only on FTP servers. However, its technical advantages have diminished over time, and it has several problems that newer protocols have solved:

  • FTP has no error checking;
  • FTP servers can be difficult to access through firewalls and proxy servers;
  • FTP has little support for encryption.

HTTPS is increasingly used as the default protocol. Earlier this year, the neXtProt project retired their FTP server, making downloads available by HTTPS only. This seems to be part of a general trend, perhaps triggered by the 50th anniversary of the protocol. Mozilla announced that the integrated FTP client will be removed from Firefox. Google removed FTP support in Chrome 95. Microsoft Edge, based on the same Chromium project as Google Chrome, removed FTP support in version 88.

At the time of writing, all the major protein sequence database servers are available via both FTP and HTTPS, including NCBI, EBI and UniProt. Some servers also allow access via HTTP, although most will redirect the request to HTTPS. (The only exception is ftp.thegpm.org, which supports only FTP and HTTP; trying to use HTTPS gives a certificate error.)
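
For a custom definition that still points at an FTP URL, the switch may amount to changing the URL scheme. The Python sketch below illustrates the idea. It assumes the server exposes the same host name and path over HTTPS, which should be checked for each server. The function name and the example paths are hypothetical, and the code is not part of Mascot.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts assumed not to offer working HTTPS. The post names ftp.thegpm.org;
# extend this set as needed.
HTTP_ONLY_HOSTS = {"ftp.thegpm.org"}


def ftp_to_web_url(url: str) -> str:
    """Rewrite an ftp:// URL to https://, or to http:// for known exceptions."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        # Already HTTP(S) or some other scheme: leave it untouched.
        return url
    scheme = "http" if parts.hostname in HTTP_ONLY_HOSTS else "https"
    # Keep host, path, query and fragment exactly as they were.
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


# Hypothetical paths, for illustration only.
print(ftp_to_web_url("ftp://ftp.example.org/pub/db.fasta.gz"))
print(ftp_to_web_url("ftp://ftp.thegpm.org/pub/db.fasta"))
```

The rewritten URL should still be tested in a browser or with a command-line client before it is saved in a database definition.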

One of the advantages of HTTPS over FTP is built-in error detection. This is a consequence of how the protocol is implemented. First, the HTTPS client (e.g. a web browser) establishes an encrypted TLS connection to the server. Then, the client tunnels HTTP traffic through the TLS connection. TLS is a low-level cryptographic protocol that encrypts data and, as a side effect, also checks data integrity. If the data is damaged in transit, for example on an unreliable network, the integrity check fails and the connection terminates with an error. The HTTPS client can then resume the download from the last known good position. FTP has no error detection, so any transfer problems can silently result in corrupt data.
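
Resuming from the last known good position can be illustrated with an HTTP Range request, the standard HTTP mechanism for fetching part of a file. The Python sketch below is not how Database Manager is implemented. The function names are made up for illustration, and it assumes the server supports range requests.

```python
import os
import urllib.request


def range_header(bytes_on_disk: int) -> dict:
    """Build a header asking for everything from bytes_on_disk onward."""
    if bytes_on_disk > 0:
        return {"Range": f"bytes={bytes_on_disk}-"}
    return {}  # nothing downloaded yet: request the whole file


def resume_download(url: str, dest: str, chunk_size: int = 1 << 20) -> None:
    """Download url to dest, continuing a partial file if one exists."""
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url, headers=range_header(have))
    # Note: if dest is already complete, a server will typically answer
    # 416 and urlopen raises HTTPError; a real tool would handle that case.
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the range; 200 means it is
        # sending the whole file again, so start from scratch.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
```

Calling `resume_download` again after a dropped connection continues from the bytes already on disk instead of starting over.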

Database Manager continues to support FTP downloads with custom database definitions. However, from November 2021, the predefined databases no longer use FTP. Download URLs have been switched over to HTTP. This includes commonly used databases such as SwissProt and contaminants, all the EST databases, and the taxonomy files downloaded from NCBI. Database Manager will automatically upgrade the connection to HTTPS if the target server has a redirect from HTTP to HTTPS. All the other functionality remains the same.

A small side effect of the switchover concerns database version numbering. Some databases, like SwissProt, TrEMBL and the EST databases, have formal versioning. At the time of writing, the latest SwissProt version is 2021_03. If you downloaded SwissProt 2021_03 via Database Manager before November 2021 and then update the database after the switchover, while no new version is available yet, Database Manager will download version 2021_03 again and call it 2021_03x. The same version is re-downloaded because the URL has changed. The ‘x’ is appended to avoid a version conflict. When SwissProt 2022_01 becomes available, updating SwissProt via Database Manager will download the new version.
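
The naming rule described above can be sketched in a few lines of Python. This only illustrates the behaviour as described in this post; it is not Mascot's code, and the function name, the dictionary layout and the example URLs are assumptions.

```python
from typing import Mapping


def version_label(version: str, url: str, installed: Mapping[str, str]) -> str:
    """Pick a label for a download.

    installed maps each existing label to the URL it was fetched from
    (a hypothetical bookkeeping structure). An 'x' is appended while the
    label is already taken by a copy that came from a different URL.
    """
    label = version
    while label in installed and installed[label] != url:
        label += "x"
    return label


# A copy of 2021_03 fetched over FTP before the switchover (example URLs).
installed = {"2021_03": "ftp://ftp.example.org/sprot.fasta.gz"}

print(version_label("2021_03", "http://ftp.example.org/sprot.fasta.gz", installed))
print(version_label("2022_01", "http://ftp.example.org/sprot.fasta.gz", installed))
```

The first call yields `2021_03x` because the same version now comes from a new URL. The second yields `2022_01` unchanged because no copy with that label exists yet.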

Whether you will notice any other change depends on your version of Mascot.

Mascot Server 2.7 and later

Mascot versions 2.7 and later have native HTTPS support, so database updates will work as before.

Mascot Server 2.6

If you have version 2.6, please install patch 2.6.2 for full HTTPS support. The patch is available for free.

Mascot Server 2.5

Version 2.5 has partial HTTPS support, in that it is able to follow a redirect from HTTP to HTTPS for simple URLs. Database updates of predefined definitions should continue to work. The exception is the predefined UniProt proteomes, whose URLs contain a complicated query string that Mascot 2.5 is not able to download via HTTPS. This is a known issue that predates the switch from FTP to HTTP.

Mascot Server 2.4

Mascot Server 2.4 doesn’t support HTTPS. If you enable a predefined definition like SwissProt in Mascot Server 2.4, the software is able to download the latest configuration from matrixscience.com. However, when Database Manager tries to download any database files, it will fail to do so. The workaround is to download and extract the FASTA file and any taxonomy files manually and copy them to the correct directories. The instructions can be found in the sequence database setup help.
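
The manual download and extraction can be scripted outside Mascot. The following Python sketch assumes the database is distributed as a gzip-compressed FASTA file. The function names are hypothetical, and the URL and the destination directories must be taken from the database provider and the sequence database setup help; none of them are prescribed here.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: Path) -> None:
    """Fetch url (HTTP or HTTPS) and write the response body to dest."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


def extract_gzip(archive: Path, fasta: Path) -> None:
    """Decompress a .gz archive into a plain FASTA file."""
    with gzip.open(archive, "rb") as src, open(fasta, "wb") as out:
        shutil.copyfileobj(src, out)
```

A typical run would call `download` with the provider's HTTPS URL, call `extract_gzip` on the result, and then copy the extracted file into the directory given in the help page.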

Mascot Server 2.3 and earlier

Versions 2.3 and earlier don’t have Database Manager, so they are unaffected.
