Welcome to SIMAP.

SIMAP is a database containing the similarity space formed by nearly all amino-acid sequences from public databases and completely sequenced genomes.

You can find sequences and protein entries of interest with the full-text search, which uses an index of protein IDs, accession numbers, descriptions, and the Biothesaurus.
Starting from your query sequence, you can find the nearest sequences in SIMAP. Because this search looks up parts of your query in a suffix array of all SIMAP sequences (generated by VMATCH), it runs much faster than BLAST.


  • 2014, Jun 5: New release of SIMAP including upgrade to InterPro 47.0
    The new release of SIMAP is now online. All pre-calculated features have been upgraded to InterPro 47.0.
  • Total service downtime on Apr 28 and 29: We are moving our entire server infrastructure to another building. All our IT services, including BOINC, will therefore be offline on Monday, 28th of April and Tuesday, 29th of April.
  • 2014, Mar 15: New release of SIMAP including upgrade to InterPro 46.0
    The new release of SIMAP is now online. All pre-calculated features have been upgraded to InterPro 46.0.
  • 2013, Dec 14: Next release scheduled for Feb 2014
    The next release of SIMAP is scheduled for February 2014. The delay is caused by the approx. 6 million new sequences imported from the restructured NCBI RefSeq database in Aug 2013. Because our computational capacity is limited to about 1.2 million sequences per month, the calculation of hits for these sequences is still ongoing. The new release will also be upgraded to the latest InterPro release.
  • 2013, Dec 14: Removal of metagenomes and re-calculation of SIMAP
    We are going to migrate the entire SIMAP database in 2014 to a new algorithm for sequence similarity calculation. We will no longer use FASTA heuristics but will switch to the exact Smith-Waterman algorithm. Scoring by BLOSUM50 will continue, but will incorporate composition-based score adjustment, as in BLAST. During the migration we will maintain the current, FASTA-based SIMAP, thus keeping SIMAP up-to-date and online. However, in order to minimize the computational and storage requirements, we will temporarily remove those metagenomes from SIMAP that were imported from IMG/M, Camera, HMP and ENA/WGS. Only environmental sequences in ENA (from UniProt/TrEMBL) will remain. Metagenomes will be integrated again after the migration of the non-metagenomic SIMAP to the new algorithm has been finished.
    If you urgently need sequences of particular metagenomes, please inform thomas.rattei@univie.ac.at as soon as possible. We will then check if we can keep these in SIMAP.
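
For readers unfamiliar with the Smith-Waterman algorithm mentioned in the migration notice above, the following is a minimal sketch of its scoring recurrence. It is an illustration only, not the SIMAP implementation: it assumes simple match/mismatch scores and a linear gap penalty in place of BLOSUM50 and composition-based score adjustment, and the function name and parameter values are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of sequences a and b.

    Assumed scoring (not BLOSUM50): +2 for a match, -1 for a mismatch,
    -2 for each gap position (linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            # Local alignment: a cell never drops below zero.
            score[i][j] = max(0, diagonal, up, left)
            best = max(best, score[i][j])
    return best
```

Unlike a heuristic search, this recurrence fills the full dynamic-programming matrix, so it is guaranteed to find the optimal local alignment score under the chosen scoring scheme, at the cost of time proportional to the product of the two sequence lengths.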