SIMAP web-service

New:

  • Axis 2 Web Service:
    A completely new Web Service has been implemented based on Axis 2. It is easier to use, more stable and provides better performance. We suggest changing to the new Web Service as soon as possible. The previous Web Service, however, will remain online until 2014.
    URL of the new Web Service: http://ws.csb.univie.ac.at/simapwebservice/services/SimapWS?wsdl
  • Distributed Annotation System (DAS) service:
    The DAS interface of SIMAP has a new URL: http://ws.csb.univie.ac.at:8080/das

Accessing SIMAP using the web-service

If you want to create a web-service client with a tool of your choice, you will find the WSDL [here]. Please note that the service uses the document/wrapped style, so some client toolkits, such as SOAP::Lite, will fail to use it. If you depend on such an environment, you can either process the SOAP messages 'by hand' or create a wrapper around the SimpAT package (see below).

If you are developing in Java, you can simply integrate the SimpAT libraries.

This web-service has been implemented within the HOBIT project (http://hobit.sf.net/).

Accessing SIMAP using the SimpAT Package

SimpAT (SIMAP Access Tools) provides easy access to the SIMAP database through SOAP-based web-services. The package is written in Java and makes it easy to integrate SIMAP queries into your own applications.

Requirements

The SimpAT package is written in Java 1.5. To access SIMAP, you need an Internet connection. Since the amount of data transferred can be quite high, a broadband connection is recommended.

Download

Download the package as a tarball from http://fileshare.csb.univie.ac.at/simpat/simpat1.3.2.tar.gz
After decompressing, you should obtain the following two directories:
lib: contains all needed libraries
eccb: contains example code to use SimpAT

You can now import the examples into your favorite development tool. Don't forget to add the libraries in the lib folder to your project.

Querying SIMAP

SIMAP is queried using the unique MD5 key of an amino-acid sequence. The sequence must be in upper case and must not contain spaces or newlines. The SimpAT package can compute this MD5 key for you:

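  try {
    SimapAccessWebService simap=new SimapAccessWebService();
    String sequence="MSELKKNVTQDNLWQETSPKK";
    // compute the MD5 key of the sequence and use it as the query
    String md5=simap.computeMD5(sequence);
    simap.setMD5(md5);
  } catch (Exception e) {
    // the exact exception types are not documented here, so catch broadly
    e.printStackTrace();
  }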
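
If you cannot use SimpAT (for example, when calling the web-service from another toolkit), the key can also be computed with the Java standard library. The following is only a minimal sketch: the class and method names are hypothetical, and it assumes that the key is the lower-case hexadecimal MD5 digest of the normalized sequence, which is not confirmed here. Compare the result with simap.computeMD5 before relying on it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not part of SimpAT.
class SequenceKey {

    // Remove all whitespace (spaces, tabs, newlines) and convert to upper case.
    static String normalize(String rawSequence) {
        return rawSequence.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
    }

    // Return the MD5 digest of the text as 32 lower-case hex characters.
    // Assumption: SIMAP expects this encoding of the digest.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.US_ASCII));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // lower case and line breaks are removed before hashing
        String key = md5Hex(normalize("mselkknvtq\ndnlwqetspkk"));
        System.out.println(key);
    }
}
```

Normalizing before hashing ensures that the same sequence always yields the same key, regardless of how it was formatted in the input file.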

Before retrieving a result list, you should set some cut-offs for the search. The more restrictive a cut-off is, the faster the query will be. Three cut-offs are built in. They can be combined and are then applied together, which means that all three are evaluated in each case.

The cut-offs are:
a) the E-Value cut-off
b) maximum number of hits (upper limit: 5000)
c) minimum alignment score

The cut-offs can be set on the SimapAccessWebService object:

  simap.setMax_evalue(10e-25);
  simap.setMax_number_hits(5);
  simap.setMin_swscore(120);
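
To illustrate how the three cut-offs act together, the following sketch applies the same kind of filter to a local list of hits. It is an illustration only: the Hit class and the applyCutoffs method are hypothetical, the real filtering happens on the SIMAP server, and the sketch assumes that the hits arrive ranked from best to worst.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical local model of the three cut-offs, not part of SimpAT.
class CutoffSketch {

    static final class Hit {
        final String id;
        final double evalue;
        final int swScore;

        Hit(String id, double evalue, int swScore) {
            this.id = id;
            this.evalue = evalue;
            this.swScore = swScore;
        }
    }

    // Keep a hit only if it passes the E-value AND the score cut-off,
    // and stop as soon as the maximum number of hits is reached.
    static List<Hit> applyCutoffs(List<Hit> rankedHits, double maxEvalue,
                                  int minSwScore, int maxHits) {
        List<Hit> kept = new ArrayList<>();
        for (Hit hit : rankedHits) {
            if (kept.size() >= maxHits) {
                break;
            }
            if (hit.evalue <= maxEvalue && hit.swScore >= minSwScore) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("A", 1e-30, 200),
            new Hit("B", 1e-10, 300),   // fails the E-value cut-off
            new Hit("C", 1e-40, 100),   // fails the score cut-off
            new Hit("D", 1e-26, 150));
        for (Hit hit : applyCutoffs(hits, 10e-25, 120, 5)) {
            System.out.println(hit.id);
        }
    }
}
```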

If you want SIMAP to also report the sequences and alignments for each hit, you must activate this explicitly:

  simap.alignments(true);
  simap.sequences(true);

By default, SIMAP reports XML output:

  System.out.println(simap.getHitsXML());

Alternatively, SIMAP can report BLASTML style output:

  System.out.println(simap.getHitsByMD5BLASTML());

Alternatively, SimpAT can parse the XML into convenient Java objects for further analysis:

  ArrayList<HitSet> result = ResultParser.parseResult(simap.getHitsXML());

Each HitSet object contains all information on a hit. This information can now be accessed by getter methods:

  HitSet second=result.get(1);
			
  // a hit consists of alignment data and hit data
  System.out.println(second.getHitAlignment().getAlignment_hit());
  System.out.println(second.getHitAlignment().getAlignment_markup());
  System.out.println(second.getHitAlignment().getAlignment_query());
		
  System.out.println("Bitscore\t"+second.getHitAlignment().getBits());
  System.out.println("E-Value\t"+second.getHitAlignment().getEvalue());
  System.out.println("Coverage in Hit\t"+second.getHitAlignment().getHit_coverage());
  System.out.println("Coverage in Query\t"+second.getHitAlignment().getQuery_coverage());
  System.out.println("Percentage matched residues\t"+second.getHitAlignment().getPositives()+" %");
  System.out.println("Score ratios: Hit,Query\t"+second.getHitAlignment().getHit_ScoreRatio()+","+second.getHitAlignment().getQuery_ScoreRatio());
			
  System.out.println("Identity\t"+second.getHitAlignment().getIdentity()+" %");
  System.out.println("in "+second.getHitAlignment().getOverlap()+" aa overlap");
  System.out.println("from: "+second.getHitAlignment().getQuery_start()+" to "+second.getHitAlignment().getQuery_stop()+ " in query sequence");
  System.out.println("from: "+second.getHitAlignment().getHit_start()+" to "+second.getHitAlignment().getHit_stop()+ " in hit sequence");			
			
  // print out data concerning the hit-sequence
  System.out.println("\nLength of the sequence:\t"+second.getHitData().getLength());
  System.out.println("Selfscore\t"+second.getHitData().getSelfscore());
  System.out.println("Number of Hits in SIMAP\t"+second.getHitData().getNumber_hits());
  System.out.println("Sequence:\n"+second.getHitData().getSequence());
  System.out.println("with checksum:\t"+second.getHitData().getMd5());
			
  // print out data of the protein instances
  System.out.println("Instances:");
  for (HitProtein o : second.getHitData().getProteins()) {
    System.out.println(o.getTitle()+" Description "+o.getDatabase_description()+" Tax-Node"+o.getTax_node());
    System.out.println(o.getLinkoutUrl());
    System.out.println(o.getTaxonomy());
    System.out.println("----");
  }

 

Search-Spaces

One of the most important features of SIMAP is the virtual definition of search-spaces. A search-space comprises a sub-selection of datasets, for example SWISSPROT+PDB or all complete PEDANT databases. The data retrieved by a client is then automatically restricted to the selected search-space. Most importantly, the E-Values are recomputed for the size of the selected search-space, since they depend on the size of the database used. By default, the search-space is set to the whole of SIMAP. As soon as the user starts to define their own selections, the search-space is restricted to these selections.
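
The effect of the search-space size on E-Values can be sketched as follows. In BLAST-style statistics, the E-Value of a given alignment score is proportional to the size of the database searched, so restricting the search-space lowers the E-Value of the same hit. The exact formula used by SIMAP is not documented here; the sketch assumes a simple linear rescaling, and the class name, method name and sizes are hypothetical.

```java
// Hypothetical illustration, not part of SimpAT.
class EvalueRescale {

    // Rescale an E-Value computed against the full database to a smaller
    // search-space. Assumption: E-Values scale linearly with database size.
    static double rescale(double evalue, long fullSize, long subsetSize) {
        if (fullSize <= 0 || subsetSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return evalue * ((double) subsetSize / (double) fullSize);
    }

    public static void main(String[] args) {
        // a subset holding 1% of the residues reduces the E-Value a hundredfold
        double rescaled = rescale(1e-10, 1000000L, 10000L);
        System.out.println(rescaled);
    }
}
```

This is also why E-Values obtained with different search-spaces should not be compared directly.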

There are three ways to manipulate the search-space:

1. Using Database-ids

Databases are added exclusively: once a database has been added, only hits from the added databases are retrieved. For example, adding PDB will retrieve hits in PDB only.

  int dbid=0;
  // get all available databases to look up the id of the dataset we want
  ArrayList databases=simap.getAllDatabases();
  for (Object entry : databases) {
    Hashtable mydatab=(Hashtable)entry;
    System.out.println(mydatab.get("taxon_id")+"\t"+mydatab.get("name")+"\t"+mydatab.get("id")+"\t"+mydatab.get("source"));
    if ("PDB".equals(mydatab.get("name"))) {
      System.out.println(mydatab.get("id"));
      dbid=Integer.parseInt((String)mydatab.get("id"));
      break;
    }
  }
  // restrict the search-space to PDB
  simap.addDatabase(dbid);

2. Using taxonomy-ids

Taxonomy-ids can be used to include only certain taxa, e.g. to search only in eukaryotes. Excluding taxa is also possible. Combining both allows queries such as "look for all human proteins in PDB homologous to my query sequence x from organism/dataset y". The query sequence does not need to be contained in the selected subsets. If it is missing, no self-hit is reported (as it does not exist in the search-space), but the hits in the activated subsets are reported as usual. The ids used here are the taxonomy-ids from the NCBI homepage (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Taxonomy).

Code example:

  simap.addDatabase(dbid);
  // include eukaryotes (NCBI taxonomy-id 2759) ...
  simap.includeTaxon(2759);
  // ... but exclude human (NCBI taxonomy-id 9606)
  simap.excludeTaxon(9606);
  simap.getHitsXML();
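
The combination of included and excluded taxa can be pictured as a test on the taxonomic lineage of each hit. The following sketch is an assumption about the intended semantics, not the SIMAP implementation: a hit is accepted if its lineage contains at least one included taxon (or no taxon was included at all) and none of the excluded taxa. The class and method names are hypothetical, and the lineages are abbreviated.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of taxon include/exclude filtering, not part of SimpAT.
class TaxonFilterSketch {

    // lineage: NCBI taxonomy-ids from the root down to the organism
    static boolean accepts(List<Integer> lineage, Set<Integer> included,
                           Set<Integer> excluded) {
        boolean isIncluded = included.isEmpty();
        for (Integer taxon : lineage) {
            if (excluded.contains(taxon)) {
                return false;            // exclusion always wins
            }
            if (included.contains(taxon)) {
                isIncluded = true;
            }
        }
        return isIncluded;
    }

    public static void main(String[] args) {
        Set<Integer> included = new HashSet<>(Arrays.asList(2759)); // Eukaryota
        Set<Integer> excluded = new HashSet<>(Arrays.asList(9606)); // Homo sapiens

        List<Integer> human = Arrays.asList(2759, 33208, 9606);
        List<Integer> mouse = Arrays.asList(2759, 33208, 10090);
        List<Integer> ecoli = Arrays.asList(2, 562);

        System.out.println("human: " + accepts(human, included, excluded));
        System.out.println("mouse: " + accepts(mouse, included, excluded));
        System.out.println("E. coli: " + accepts(ecoli, included, excluded));
    }
}
```

With these settings, only the mouse lineage is accepted: human is excluded explicitly, and E. coli is not a eukaryote.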

3. Using resource types

Resource types describe different kinds of data origin. Each is referred to by an integer value. Currently, the following types are provided:

GENRE_WZW=1;
PEDANT2_COMPLETE_GENOMES=2;
SPUTNIK_EST=3;
PEDANT3_COMPLETE_GENOMES=4;
UNIPROT=5;
MULTIFASTA=6;
EMBL_TAX=7;
PLANTSDB=8;
GENEBANK=9;
GENEBANK_TAX=10;
GENRE=11;
PEDANT2_INCOMPLETE_GENOMES=12;
PEDANT3_INCOMPLETE_GENOMES=13;

In most cases, you will not need to restrict resource types. However, an interesting use-case is the restriction to complete genomes, which can be done by adding the PEDANT complete genomes and thereby excluding the incomplete ones.

Code example:

  simap.addSource(ResourceTypes.PEDANT2_COMPLETE_GENOMES);
  simap.addSource(ResourceTypes.PEDANT3_COMPLETE_GENOMES);

In case of further questions, please contact sysadmin.csb@univie.ac.at