Help pages

How to use WebScipio

To obtain the gene structure corresponding to a certain protein query, first select a species and a genome dataset, then enter your protein query, optionally adjust Scipio's settings, and finally start Scipio.

Selecting a species and a genome dataset

Species are selected using an autocompletion form. Scientific names, common names, or taxa may be entered. Note that fungi often have different names for their teleomorph and anamorph. Both names can be searched for, but currently only the anamorph name is listed. Strains are listed separately. Typing a taxon name is very useful if the user wants to perform a cross-species search and is looking for closely related species. In the example, "mamm" (the first letters of Mammalia) has been typed, and the autocompletion lists the first 10 of the 33 mammalian species for which genome data is available.

Select Species

As soon as a species has been selected, the available genome datasets are listed. Different version numbers refer to different versions of the genome assembly. Note that the larger the file, the longer the search will take. For many species, contig and supercontig data are available. For some species, the supercontigs have already been ordered into chromosomes. Uchromosomes contain the contigs that could not yet be placed onto chromosomes.

Entering the protein query and setting options

When a genome dataset has been chosen, the protein query can be entered and the Scipio and BLAT options can be set. The protein query can be entered in either plain text or FASTA format. Scipio can then be started either with the default settings, which are most suitable for in-species searches, or with adjusted options, e.g. to allow more mismatches or to use a smaller BLAT tile size so that very small exons are also found. To set those options, click "Expert-mode":
Best Size: The minimum fraction of the query that has to be found on one single contig. E.g. a Best Size of 0.3 means that at least 30 % of the protein query must be found on one contig.
Min. Identity: The minimum identity that a stretch of DNA must have in order to be taken into account by WebScipio.
Max. Mismatch: The maximum number of mismatches allowed on a contig for it to be included in the results. The value "∞" allows an unlimited number of mismatches.
Region Size: The length of the up- and downstream regions that will be retrieved.
Max. Results per Query: Sometimes there are gene duplicates in the genome that are very similar. Increasing this value may return several hits, as long as all hits are above the cutoffs given by the other options.
BLAT Tilesize: Determines the width of the search window used to scan the genome. Decreasing this value makes it more likely that small exons are found, but also slows down the search.
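
As a rough illustration of how the Best Size, Min. Identity, and Max. Mismatch cutoffs interact, the following Python sketch filters candidate hits. The Hit record, its field names, and the passes_cutoffs function are hypothetical and are not part of Scipio. The default values are placeholders, not WebScipio's actual defaults, and treating identity as one number per hit is a simplification.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical summary of one candidate hit on a single contig."""
    query_length: int    # residues in the protein query
    matched_length: int  # query residues found on this contig
    identity: float      # fraction of identical residues, 0.0 to 1.0
    mismatches: int      # mismatching residues on this contig


def passes_cutoffs(hit, best_size=0.3, min_identity=0.9, max_mismatch=math.inf):
    """Return True if the hit satisfies all three cutoffs (placeholder defaults)."""
    covered = hit.matched_length / hit.query_length
    return (covered >= best_size
            and hit.identity >= min_identity
            and hit.mismatches <= max_mismatch)


hits = [
    Hit(query_length=500, matched_length=200, identity=0.98, mismatches=4),
    Hit(query_length=500, matched_length=100, identity=0.99, mismatches=1),
]
# Only the first hit covers at least 30 % of the query on one contig.
kept = [h for h in hits if passes_cutoffs(h)]
```

With math.inf as the Max. Mismatch default, the sketch mirrors the "∞" setting: no hit is rejected for its mismatch count unless a finite limit is passed.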

Enter sequence
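
Since the query may be pasted either as plain text or in FASTA format, it can help to see how both inputs reduce to the same sequence string. The following Python sketch is only an illustration under the assumption that a single sequence is entered; normalize_query is a hypothetical helper and not WebScipio's own parser.

```python
def normalize_query(text):
    """Return (name, sequence) for plain-text or single-record FASTA input."""
    name = None
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A FASTA header line; this sketch accepts only one record.
            if name is not None or parts:
                raise ValueError("this sketch handles a single sequence only")
            name = line[1:].strip()
        else:
            # Drop whitespace inside the line and use upper-case residues.
            parts.append("".join(line.split()).upper())
    return name, "".join(parts)


fasta_input = ">my_query\nMALVIPYT\nYIQCPCSD\n"
plain_input = "malvipyt yiqcpcsd"
```

Both fasta_input and plain_input yield the same sequence; only the FASTA form carries a name.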

Viewing Scipio's results

WebScipio generates a graphical representation of the gene that clearly indicates the length and position of exons and introns and shows where discrepancies are located. It also shows the identifiers of the target sequences. So that small exons do not vanish when very large intronic stretches are found, the scaling of introns and exons is automatically balanced to keep the picture visually meaningful. Tooltips show additional information.

For detailed inspection of the hits, WebScipio generates an easy-to-read alignment of the query and the genome. It is grouped by exons, and mismatches and frame shifts are highlighted. Different stretches of DNA can be viewed: up- and downstream DNA, genomic DNA from the first to the last exon including introns, or the coding DNA. The translation of the coding DNA as determined by the algorithm can also be viewed.

Five types of files can be downloaded: a FASTA file containing all types of DNA sequences described above, a FASTA file containing the protein translation, a log file with alignments and detailed reports, a GFF file for use with genome software, and a YAML file that contains all information generated by WebScipio.

Results

Scaling in gene pictures

When exons are short compared to introns, the unscaled picture is dominated by introns and it is hard to see the length of the exons (see first picture). To improve the visualization, WebScipio scales down the introns and scales up the exons so that the average length of the introns equals the average length of the exons (see second picture).

Unscaled sequence picture

Scaled sequence picture
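
The balancing described above can be sketched in a few lines of Python. This is an illustration only: the function name balance_scaling and the choice of the midpoint between the two means as the common target are assumptions, and WebScipio's actual scaling may differ. The sketch also assumes the gene has at least one intron.

```python
def balance_scaling(exon_lengths, intron_lengths):
    """Return (exon_factor, intron_factor) that equalize the two mean lengths."""
    mean_exon = sum(exon_lengths) / len(exon_lengths)
    mean_intron = sum(intron_lengths) / len(intron_lengths)
    # Assumed target: the midpoint between the two unscaled means.
    target = (mean_exon + mean_intron) / 2
    return target / mean_exon, target / mean_intron


exons = [90, 120, 60]    # three short exons
introns = [3000, 1500]   # two long introns between them
exon_factor, intron_factor = balance_scaling(exons, introns)

# Exons are stretched and introns are shrunk for drawing.
scaled_exons = [n * exon_factor for n in exons]
scaled_introns = [n * intron_factor for n in introns]
```

After scaling, the mean drawn exon length and the mean drawn intron length coincide, while the relative sizes within each group are preserved.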

Evaluating the results in the YAML-file

In the YAML file, the "status" of a hit can be one of the following:
auto: the complete query is correctly matched by the hit
partial: part of the query is correctly matched by the hit
incomplete: one of the cases "A" to "E" occurred
(manual: the hit has been edited manually)

In the log file, the "status" of a query can be:
complete: the query is matched completely by one single hit or by several partial hits
incomplete: one of the cases "A" to "E" occurred

The various cases causing status: !incomplete

A (!missing stopcodon): There is no stop codon in the genomic DNA after the last amino acid of the query.
B (!bad intron): At least one of the introns does not show appropriate 5' and/or 3' splice sites.
C (!mismatches): At least one of the amino acids of the query does not match the translation of the genomic DNA.
D (!sequence shift): Additional or missing bases have been identified that would lead to frame shifts during translation. Those are most probably due to sequencing/assembly problems, but might also hint at the existence of pseudogenes.
E.1 (!gap to querystart/queryend): There are unmatched amino acids at the N-/C-terminus of the query: the first hit for a query doesn't start with the first amino acid of the query (the last hit for a query doesn't end with the last amino acid).
E.2 (!gap to previous/next hit): There are unmatched amino acids between hits for the same query on different targets.
E.3 (!gap): There are unmatched amino acids between two exons of a single hit.
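
When going through many results, it can be convenient to translate a reported reason back into its case letter. The Python sketch below does this with a simple prefix lookup. It assumes that reasons appear as text resembling the entries listed above; the exact strings written to the YAML or log file are not specified here, and the names CASES and case_for are hypothetical.

```python
# Reason text (as listed above, without the leading "!") and its case label.
CASES = [
    ("missing stopcodon", "A"),
    ("bad intron", "B"),
    ("mismatches", "C"),
    ("sequence shift", "D"),
    ("gap to querystart", "E.1"),
    ("gap to queryend", "E.1"),
    ("gap to previous", "E.2"),
    ("gap to next", "E.2"),
    ("gap", "E.3"),  # checked last so the longer "gap to ..." reasons win
]


def case_for(reason):
    """Return the case label for one reason string, or None if unknown."""
    text = reason.strip().lstrip("!").lower()
    for prefix, label in CASES:
        if text.startswith(prefix):
            return label
    return None


labels = [case_for(r) for r in ["!bad intron", "gap to next hit", "gap"]]
```

The order of the entries matters: the plain "gap" entry comes last so that it does not shadow the more specific gap reasons.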

Example searches

Gene containing a mismatch and a frameshift

Organism: Fusarium oxysporum f. sp. lycopersici 4286
Genome file: v1_supercontigs
Query sequence:
MALVIPYTYIQCPCSDQSPPDLPQARQSQSSDERTFDPRDPRSNYSLYPLEYLLYCEDCQ
QIRCPRCVNEEVVTYYCPNCLFEVPSSNLRSDGNRCTRSCYQCPVCIGPLQVMETPIEKD
QSHLGADIPGPQYALYCQYCNWTSTEIGIKFDKPNGIHSQLSKINNGGDLKLTAKELKER
RKENPDEPPLADSDVDTDLQYANLKSFYQSQLADTNAASSGISPLNDTTGYGSPAASLSR
IMAMYTGHGHARKRNGPSXVMREALSAEEGLKLADLDESAQIKKLHQEGWDATATIQQNL
EQAEVQRFQDGLRPIPHLLRTKRSKRCSVCRHIISKPENKVTSTRFKIRLVAKSYIPTIT
IKPLNPTAGTVPTTQRPQILEERPLKPLTPHHYIITFKNPLFDGIKVTLATPNSTPGRFS
SKVTILCPQFDIDANTDMWDDALKDDDRDKKRKGEESSGQPEAGKIWERGRNWVSIILEV
VPASLRLDGQKDKSPLKEDEDILEIPMFVRMEWEPDSQQDVGAASAKEKDAQERRELAYW
CVLGVGRISHD
Search parameters: standard parameters
Result
Result file: ScipioResult_Fusarium_oxysporum.yaml

Gene spread on several contigs and containing a gap (query sequence from cDNA)

Organism: Bombyx mori str. Dazao
Genome file: v1_supercontigs
Query sequence:
MEHSLQHRERVGVQDFVLLEDYRSEAAFIDNLKKRFHENIIYTYIGNVLISVNPYKNLPI
YTEEKTKLYFKKAFFEAPPHVFAIADNAYRSLVYEHREQCILISGESGSGKTEASKKVLE
YIAARTNHLRNVENVKDKLLQSNPLLEAFGNAKTHRNDNSSRFGKYMDIQFNYEGGPEGG
HILNYLLEKSRVVSQMHGERNFHIFYQLLASSDQSLMTHLKLQGRPEAYKYTSDSTSHMS
QRANDQEQFRVVQEAMKVIEIGESEQREIFEIVASVLHLGNVKFVQNDKGYAEILSHDAN
SGNAADLLKVNATALREALTNRTIEARGDVVSTPLDVEQAQYARDALAKAIYDKHFSWLV
SRLNSSLAPIEKDAKSSVIGILDIYGFEIFPKNSFEQFCINFCNEKLQQLFIQLTLRQEQ
EEYLREGIEWEPVEYFNNIIICDLIEARHKGIISILDDECLRPGDATDASFLDKLNQHLD
GHQHYKSHRKSDTKTQKLMGRDEFCLVHYAGEVTYNVNTFLEKNNDLLFRDIQSLMASSD
NTIVGCCFKVTFSNREPSYIRCIKPNDFKAPMQFDDKLVSHQVKYLGLMENLRVRRAGFA
YRRTYEAFLERYKCLSAETWPNYRGAARDGVQRLVEALQYEKEEYRMGNTKIFVRFPKTL
FATEDAFQIKKNDIATIIQSRWRGYYLRKRYLRMRNAAIVIQKWVRRFLAQRLRERRRKA
ADVIRAFIKGFITRNGPETPENRRFLGVAKVHWLKRLSAQLPTKLLDLSWPPCPSTCREA
SEELHRLHRAHLARKYRLALSPDDKKQFELKVLAEKIFKYSCEAVKYDRRGYKARARGLL
ASRAALYVLDAGGRRTFRLKHRLPLDRLTVVVTNESDSLLLVKVPRDLKKDKGDLIISVT
HLIEALTIVTDYTKKPELIEIVDTRTIAHSLVNGKQGGTIEVTKGTQPAIQRAKSGNLLV
VATP
Search parameters: standard parameters
Result
Result file: ScipioResult_Bombyx_mori.yaml

Gene with a 5' exon just encoding a methionine

Organism: Magnaporthe grisea 70-15
Genome file: v5_supercontigs
Query sequence:
MGITRRGKDKAAAGQAVAGGASGGRARPKKATFETSKKKDVGVSDLTLLSKVSNEAINEN
LQKRFEGREIYTYIGHVLVSVNPFRDLGIYTDQVLDSYKGKNRLEMPPHVFAIAESAYYN
MKAYKDNQCVIISGESGAGKTEAAKRIMQYIASVSGGDSTDIQQIKDMVLATNPLLESFG
NAKTLRNNNSSRFGKYLQIHFNSVGEPVGADITNYLLEKSRVVGQITNERNFHIFYQFTK
GASEHYRQMFGIQKPETYIYTSRSKCLDVDGIDDLAEFQDTLNAMKVIGLSQEEQDSVFR
ILAAILWTGNLVFREDDEGYAAVTDQSVVEFLAYLLEVDPQQLIKAITIRILTPRSGEVI
ESPANVAQAMATRDALAKSLYNNLFDWIVERINQSLKARQPTSNSVGILDIYGFEIFEKN
SFEQLCINYVNEKLQQIFIQLTLKAEQDEYAREQIKWTPIKYFDNKIVCDLIESVRPPGV
FSALKDATKTAHADPAACDRTFMQSVNGMSNAHLIPRQGSFIIKHYAGDVAYTVDGITDK
NKDQLLKGLLGMFQVSQNPFLHTLFPNQVDQDNRKQPPTAGDRIRTSANALVETLMKCQP
SYIRTIKPNENKSPTEYNVPNVLHQIKYLGLQENVRIRRAGFAYRQSFEKFVDRFFLLSP
ATSYAGEYTWQGSYEAAVKQILKDTSIPQEEWQMGVTKAFIKSPETLFALEHMRDRYWHN
MATRIQRMWRAYLAYRAESATRIQTFWRKKRTGAEYLQLRDHGHRVLQGRKERRRMSILG
SRRFIGDYLGINASSGPGAHIRNAIGIGSNEKTVFSCRGEILEAKFGRSSKASPRILIVT
NSKFYVVAQMLVNGQVQITAEKAIPLGAIKFIGASSSRDDWFSLGVGSPQEADPLLNCVL
KTEMFTQMERVMPGGFNLKIGDSIEYAKKPGKMQVVKVLKDSPNPVDFYKSGAVHTQQGE
PPNSVSRPTPKGKPVPPRPITRGKLIRPGGPNGRPARGTTNRTPQPRPGGASASAVASRP
VPQAQPQAQAQVAASIPVRTQQQSQTSSASVRAPPPPPPAAPPAKAKIMAKVLYDFAGQK
ENEMSIKEGDLIEIVQKENNGWWLAKSGNQQAWVPAAYVEEQKQAPPPVAASRPPPPAPP
AANGKNKPLPPAKRPAAGKKPASLQPRDSGMSLNGSDGSRSNTPTPSLGNSLADALLARK
QAMAKKDDDDDW
Search parameters: standard parameters
Result
Result file: ScipioResult_Magnaporthe_grisea.yaml
