Sequence database setup: UniProt proteomes
A UniProt complete proteome consists of the set of proteins thought to be expressed by an organism whose genome has been completely sequenced. A reference proteome is the complete proteome of a representative, well-studied model organism or an organism of interest for biomedical research.
First, you need to find the Proteome ID for your proteome of interest. For example, go to http://www.uniprot.org/proteomes/ and search for rice by name or by taxonomy ID. The Proteome ID for Oryza sativa subsp. japonica is UP000059680.
In Database Manager, create a new custom definition, as follows:
- Fasta or New database; Create New
- Use pre-defined template; UniProt_proteome_template
- Download from remote URL; Next
- Set up download URL
- Paste the following into the FASTA file URL field, where the proteome ID is for your proteome of interest
- Save; Start downloading
The complete configuration for the rice proteome in Database Manager would look similar to this (except for the URL, which is shown in an outdated format).
Once configured, you can enable automatic updating by clicking on the database name and then choosing Edit schedule.
- Locate the proteome for your organism of interest by searching by name or by taxonomy ID at
- Click on the Proteome ID link
- Click on the Download button and choose All protein entries, Fasta (Canonical and isoform), compressed
Taxonomy is not required for a single organism database.
When a single entry is expanded into separate entries for its isoforms, they all share the same ID, so the accession (AC) must be used as the unique identifier.
>sp|Q67W82-2|4CL4_ORYSJ Isoform 2 of Probable 4-coumarate--CoA ligase 4 OS=Oryza sativa subsp. japonica GN=4CL4
AC from Fasta title: ">..|\([^|]*\)"
Description from Fasta title: ">[^ ]* \(.*\)"
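To see what these two rules extract, they can be tried against the example title outside Mascot. The following is a minimal Python sketch, assuming the rules use basic regular expression syntax in which \( and \) delimit the captured group and | is a literal character. The function name parse_title is hypothetical and is not part of Mascot.

```python
import re

# Python equivalents of the two parse rules. The \( \) of the original rules
# become ( ), and the literal | is escaped because Python treats a bare | as
# alternation. This translation is an assumption, not Mascot's own code.
AC_RULE = re.compile(r">..\|([^|]*)")
DESCRIPTION_RULE = re.compile(r">[^ ]* (.*)")


def parse_title(title):
    """Return (accession, description) extracted from a Fasta title line."""
    ac = AC_RULE.match(title)
    desc = DESCRIPTION_RULE.match(title)
    if ac is None or desc is None:
        raise ValueError("title does not match the parse rules: " + title)
    return ac.group(1), desc.group(1)


# The example title from this page, split only to keep the line short.
title = (">sp|Q67W82-2|4CL4_ORYSJ Isoform 2 of Probable 4-coumarate--CoA "
         "ligase 4 OS=Oryza sativa subsp. japonica GN=4CL4")
accession, description = parse_title(title)
print(accession)
print(description)
```

The accession rule skips the two-character database code and the first | and then captures everything up to the next |. The description rule captures everything after the first space.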
A Fasta file containing canonical and isoform sequences for the rice proteome was downloaded to /usr/local/mascot/sequence/rice_proteome/current, and renamed to rice_proteome_20120414.fasta.
Full text for individual entries can be retrieved across the web from UniProt:
Parse rule: RULE_23 "\(.*\)"
Always test a new definition before applying the changes to mascot.dat.
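As a quick local check alongside the Database Manager test, the uniqueness of the chosen identifier can be verified with a short script. This is only a sketch and not a replacement for testing the definition in Mascot. ID_RULE, the function duplicates, the canonical title and the placeholder sequences are illustrative assumptions. The accession rule is a Python translation of the AC rule given above.

```python
import re
from collections import Counter

# Python translation of the AC parse rule (assumed equivalent).
AC_RULE = re.compile(r">..\|([^|]*)")
# Hypothetical rule capturing the ID (third field), for comparison only.
ID_RULE = re.compile(r">[^|]*\|[^|]*\|([^ ]*)")


def duplicates(lines, rule):
    """Return the sorted identifiers that occur on more than one title line."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            match = rule.match(line)
            if match:
                counts[match.group(1)] += 1
    return sorted(key for key, n in counts.items() if n > 1)


# Illustrative input: a canonical entry and one isoform of the same protein.
# The sequences are placeholders, not real data.
fasta_lines = [
    ">sp|Q67W82|4CL4_ORYSJ Probable 4-coumarate--CoA ligase 4",
    "MPLACEHOLDER",
    ">sp|Q67W82-2|4CL4_ORYSJ Isoform 2 of Probable 4-coumarate--CoA ligase 4",
    "MPLACEHOLDER",
]

print("Duplicate IDs:", duplicates(fasta_lines, ID_RULE))
print("Duplicate ACs:", duplicates(fasta_lines, AC_RULE))
```

For a real file, the list can be replaced by an open file handle, since the function only iterates over lines. An empty result for the accession rule indicates that AC is safe to use as the unique identifier.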