Help pages
How to use WebScipio
To obtain the gene structure corresponding to a certain protein query, first select a species and a genome dataset, then enter your protein query, optionally adjust Scipio's settings, and finally start Scipio.
Selecting a species and a genome dataset
Species are selected using an autocompletion form. Scientific names, common names, or taxa may be entered. Note that fungi often have different names for their teleomorph and anamorph. Both names can be searched for, but only the anamorph name is listed at the moment. Strains are listed separately. Typing a taxon name is very useful if the user wants to perform a cross-species search and is looking for closely related species. In the example, "mamm" has been typed, the first letters of Mammalia, and the autocompletion lists the first 10 of the 33 mammalian species for which genome data is available.
Entering the protein query and setting options
Once a genome dataset has been chosen, the protein query can be entered and the Scipio and BLAT options can be set. The protein query can be entered in either plain text or FASTA format. Scipio can then be started with the default settings, which are most suitable for in-species searches, or options can be adjusted first, e.g. to allow more mismatches or to use a smaller BLAT tile size so that very small exons are also found. To set those options, click "Expert-mode":
| Best Size | The minimum fraction of the query that has to be found on a single contig. E.g. a Best Size of 0.3 means that at least 30 % of the protein query must be found on one contig. |
| Min. Identity | The minimum identity a stretch of DNA must have to be taken into account by WebScipio. |
| Max. Mismatch | The maximum number of mismatches allowed on a contig for it to be included in the results. The value "∞" allows an unlimited number of mismatches. |
| Region Size | The length of the up- and downstream regions that will be retrieved. |
| Max. Results per Query | Genomes sometimes contain gene duplicates that are very similar. Increasing this value may return several hits, as long as all hits pass the cutoffs given by the other options. |
| BLAT Tilesize | Determines the width of the search window used to scan the genome. Decreasing this value makes it more likely that small exons are found but also slows the search process. |
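The meaning of the Best Size and Max. Mismatch cutoffs can be illustrated with a small sketch. This is not WebScipio's code: the function name, its parameters, and the idea of checking both cutoffs in one place are assumptions made only to show how a fraction of 0.3 and a value of "∞" would behave.

```python
import math


def passes_cutoffs(query_length, residues_on_contig, mismatches,
                   best_size, max_mismatch):
    """Return True if a candidate contig satisfies both cutoffs.

    Hypothetical helper for illustration only, not part of WebScipio.
    query_length       -- number of amino acids in the protein query
    residues_on_contig -- query residues found on this single contig
    mismatches         -- mismatches counted on this contig
    best_size          -- minimum fraction of the query on one contig
    max_mismatch       -- allowed mismatches; math.inf stands for "∞"
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    fraction_found = residues_on_contig / query_length
    return fraction_found >= best_size and mismatches <= max_mismatch


# A 600-residue query with Best Size 0.3 needs at least 180 residues
# on a single contig; 200 residues are enough, 150 are not.
enough = passes_cutoffs(600, 200, 4, best_size=0.3, max_mismatch=math.inf)
too_few = passes_cutoffs(600, 150, 4, best_size=0.3, max_mismatch=math.inf)
```

With `max_mismatch=math.inf` the mismatch test always succeeds, which mirrors the "∞" setting described in the table.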

Viewing Scipio's results
WebScipio generates a graphical representation of the gene that clearly indicates the length and position of exons and introns and shows where discrepancies are located. It also shows the identifiers of the target sequences. So that small exons do not vanish when very large intronic stretches are found, the scaling of introns and exons is automatically balanced to keep the picture visually meaningful. Tooltips show additional information.
For detailed inspection of the hits, WebScipio generates an easy-to-read alignment of the query and the genome. It is grouped by exons, and mismatches and frame shifts are highlighted. Different stretches of DNA can be viewed: up- and downstream DNA, genomic DNA from the first to the last exon including introns, or the coding DNA. The translation of the coding DNA as determined by the algorithm can also be viewed.
Five types of files can be downloaded: a FASTA file containing all types of DNA sequences described above, a FASTA file containing the protein translation, a log file with alignments and detailed reports, a GFF file for use with genome software, and a YAML file that contains all information generated by WebScipio.
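The downloaded GFF file can also be inspected with a few lines of code. The sketch below relies only on the general GFF layout (nine tab-separated columns, 1-based inclusive start and end coordinates). Which feature types WebScipio writes is not stated here, so the sample lines, the sequence name "contig_1", and the function name are made up for illustration.

```python
from collections import defaultdict


def feature_lengths(gff_text):
    """Sum the covered length per feature type in GFF-formatted text."""
    totals = defaultdict(int)
    for line in gff_text.splitlines():
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a regular nine-column feature line
        feature_type = fields[2]
        start, end = int(fields[3]), int(fields[4])
        # GFF coordinates are 1-based and inclusive on both ends.
        totals[feature_type] += end - start + 1
    return dict(totals)


# Hypothetical sample data, not real WebScipio output.
sample = (
    "##gff-version 3\n"
    "contig_1\texample\texon\t100\t199\t.\t+\t.\tID=e1\n"
    "contig_1\texample\texon\t300\t349\t.\t+\t.\tID=e2\n"
)
lengths = feature_lengths(sample)
```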
Scaling in gene pictures
When exons are short compared to introns, the unscaled picture is dominated by introns and it is hard to see the length of the exons (see first picture). To improve the visualization, WebScipio scales down the introns and scales up the exons so that the average length of the introns equals the average length of the exons (see second picture).
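One way to obtain such a balanced picture is sketched below. The text only states that the average intron and exon lengths end up equal. Keeping the total drawn length unchanged is an additional assumption of this sketch, and the function name is hypothetical, so WebScipio's actual scale factors may differ.

```python
def balanced_scale_factors(exon_lengths, intron_lengths):
    """Return (exon_scale, intron_scale) that equalize the mean lengths.

    Assumption of this sketch: the total drawn length stays the same,
    so both classes are scaled to the mean length of all segments.
    """
    if not exon_lengths or not intron_lengths:
        return 1.0, 1.0  # nothing to balance
    mean_exon = sum(exon_lengths) / len(exon_lengths)
    mean_intron = sum(intron_lengths) / len(intron_lengths)
    if mean_exon == 0 or mean_intron == 0:
        return 1.0, 1.0  # avoid division by zero for degenerate input
    total = sum(exon_lengths) + sum(intron_lengths)
    target_mean = total / (len(exon_lengths) + len(intron_lengths))
    return target_mean / mean_exon, target_mean / mean_intron


exons = [100, 200, 150]      # mean 150
introns = [5000, 3000]       # mean 4000
exon_scale, intron_scale = balanced_scale_factors(exons, introns)
scaled_exons = [length * exon_scale for length in exons]
scaled_introns = [length * intron_scale for length in introns]
```

In this example the exons are enlarged and the introns are shrunk until both classes have a mean drawn length of 1690, while the picture keeps its overall width of 8450 units.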

Evaluating the results in the YAML-file
In the YAML-file, the "status" of a hit can be one of the following:
| auto | the complete query is correctly matched by the hit |
| partial | part of the query is correctly matched by the hit |
| incomplete | one of the cases "A" to "E" occurred |
| (manual) | the hit has been edited manually |
In the Log-file, the "status" of a query can be:
| complete | the query is matched completely by one single hit or several partial hits |
| incomplete | one of the cases "A" to "E" occurred |
The various cases causing status: !incomplete
| A | !missing stopcodon | There is no stop codon in the genomic DNA after the last amino acid of the query. |
| B | !bad intron | At least one of the introns does not show appropriate 5' and/or 3' splice sites. |
| C | !mismatches | At least one of the amino acids of the query does not match the translation of the genomic DNA. |
| D | !sequence shift | Additional or missing bases have been identified that would lead to frame shifts during translation. These are most probably due to sequencing/assembly problems, but might also hint at the existence of pseudogenes. |
| E.1 | !gap to querystart/queryend | There are unmatched amino acids at the N-/C-terminus of the query: the first hit for a query does not start with the first amino acid of the query, or the last hit for a query does not end with its last amino acid. |
| E.2 | !gap to previous/next hit | There are unmatched amino acids between hits for the same query on different targets. |
| E.3 | !gap | There are unmatched amino acids between two exons of a single hit. |
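When many result files are post-processed, it can help to translate the reason strings back into the case labels of the table above. The sketch below is only a lookup built from that table. It assumes the reasons have already been read from the YAML file as a list of strings, with or without the leading "!". The function name and the expansion of the combined entries (for example "gap to querystart/queryend" into two separate strings) are assumptions, not part of WebScipio.

```python
# Lookup built from the table above. Splitting the combined entries
# ("querystart/queryend", "previous/next hit") is an assumption.
REASON_TO_CASE = {
    "missing stopcodon": "A",
    "bad intron": "B",
    "mismatches": "C",
    "sequence shift": "D",
    "gap to querystart": "E.1",
    "gap to queryend": "E.1",
    "gap to previous hit": "E.2",
    "gap to next hit": "E.2",
    "gap": "E.3",
}


def classify_reasons(reasons):
    """Map reason strings to case labels; unknown reasons map to None."""
    cases = []
    for reason in reasons:
        # Accept the strings with or without the leading "!" tag marker.
        key = reason.strip().lstrip("!").strip().lower()
        cases.append(REASON_TO_CASE.get(key))
    return cases


labels = classify_reasons(["!mismatches", "sequence shift"])
```

Unknown strings are returned as `None` rather than raising an error, so a script can report reasons that this table does not cover.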
Example searches
Gene containing a mismatch and a frameshift
| Organism | Fusarium oxysporum f. sp. lycopersici 4286 |
| Genome file | |
| Query sequence | MALVIPYTYIQCPCSDQSPPDLPQARQSQSSDERTFDPRDPRSNYSLYPLEYLLYCEDCQ QIRCPRCVNEEVVTYYCPNCLFEVPSSNLRSDGNRCTRSCYQCPVCIGPLQVMETPIEKD QSHLGADIPGPQYALYCQYCNWTSTEIGIKFDKPNGIHSQLSKINNGGDLKLTAKELKER RKENPDEPPLADSDVDTDLQYANLKSFYQSQLADTNAASSGISPLNDTTGYGSPAASLSR IMAMYTGHGHARKRNGPSXVMREALSAEEGLKLADLDESAQIKKLHQEGWDATATIQQNL EQAEVQRFQDGLRPIPHLLRTKRSKRCSVCRHIISKPENKVTSTRFKIRLVAKSYIPTIT IKPLNPTAGTVPTTQRPQILEERPLKPLTPHHYIITFKNPLFDGIKVTLATPNSTPGRFS SKVTILCPQFDIDANTDMWDDALKDDDRDKKRKGEESSGQPEAGKIWERGRNWVSIILEV VPASLRLDGQKDKSPLKEDEDILEIPMFVRMEWEPDSQQDVGAASAKEKDAQERRELAYW CVLGVGRISHD |
| Search parameters | standard parameters |
| Result | |
| Resultfile | ScipioResult_Fusarium_oxysporum.yaml |
Gene spread over several contigs and containing a gap (query sequence from cDNA)
| Organism | Bombyx mori str. Dazao |
| Genome file | |
| Query sequence | MEHSLQHRERVGVQDFVLLEDYRSEAAFIDNLKKRFHENIIYTYIGNVLISVNPYKNLPI YTEEKTKLYFKKAFFEAPPHVFAIADNAYRSLVYEHREQCILISGESGSGKTEASKKVLE YIAARTNHLRNVENVKDKLLQSNPLLEAFGNAKTHRNDNSSRFGKYMDIQFNYEGGPEGG HILNYLLEKSRVVSQMHGERNFHIFYQLLASSDQSLMTHLKLQGRPEAYKYTSDSTSHMS QRANDQEQFRVVQEAMKVIEIGESEQREIFEIVASVLHLGNVKFVQNDKGYAEILSHDAN SGNAADLLKVNATALREALTNRTIEARGDVVSTPLDVEQAQYARDALAKAIYDKHFSWLV SRLNSSLAPIEKDAKSSVIGILDIYGFEIFPKNSFEQFCINFCNEKLQQLFIQLTLRQEQ EEYLREGIEWEPVEYFNNIIICDLIEARHKGIISILDDECLRPGDATDASFLDKLNQHLD GHQHYKSHRKSDTKTQKLMGRDEFCLVHYAGEVTYNVNTFLEKNNDLLFRDIQSLMASSD NTIVGCCFKVTFSNREPSYIRCIKPNDFKAPMQFDDKLVSHQVKYLGLMENLRVRRAGFA YRRTYEAFLERYKCLSAETWPNYRGAARDGVQRLVEALQYEKEEYRMGNTKIFVRFPKTL FATEDAFQIKKNDIATIIQSRWRGYYLRKRYLRMRNAAIVIQKWVRRFLAQRLRERRRKA ADVIRAFIKGFITRNGPETPENRRFLGVAKVHWLKRLSAQLPTKLLDLSWPPCPSTCREA SEELHRLHRAHLARKYRLALSPDDKKQFELKVLAEKIFKYSCEAVKYDRRGYKARARGLL ASRAALYVLDAGGRRTFRLKHRLPLDRLTVVVTNESDSLLLVKVPRDLKKDKGDLIISVT HLIEALTIVTDYTKKPELIEIVDTRTIAHSLVNGKQGGTIEVTKGTQPAIQRAKSGNLLV VATP |
| Search parameters | standard parameters |
| Result | |
| Resultfile | ScipioResult_Bombyx_mori.yaml |
Gene with a 5' exon just encoding a methionine
| Organism | Magnaporthe grisea 70-15 |
| Genome file | |
| Query sequence | MEHSLQHRERVGVQDFVLLEDYRSEAAFIDNLKKRFHENIIYTYIGNVLISVNPYKNLPI YTEEKTKLYFKKAFFEAPPHVFAIADNAYRSLVYEHREQCILISGESGSGKTEASKKVLE MGITRRGKDKAAAGQAVAGGASGGRARPKKATFETSKKKDVGVSDLTLLSKVSNEAINEN LQKRFEGREIYTYIGHVLVSVNPFRDLGIYTDQVLDSYKGKNRLEMPPHVFAIAESAYYN MKAYKDNQCVIISGESGAGKTEAAKRIMQYIASVSGGDSTDIQQIKDMVLATNPLLESFG NAKTLRNNNSSRFGKYLQIHFNSVGEPVGADITNYLLEKSRVVGQITNERNFHIFYQFTK GASEHYRQMFGIQKPETYIYTSRSKCLDVDGIDDLAEFQDTLNAMKVIGLSQEEQDSVFR ILAAILWTGNLVFREDDEGYAAVTDQSVVEFLAYLLEVDPQQLIKAITIRILTPRSGEVI ESPANVAQAMATRDALAKSLYNNLFDWIVERINQSLKARQPTSNSVGILDIYGFEIFEKN SFEQLCINYVNEKLQQIFIQLTLKAEQDEYAREQIKWTPIKYFDNKIVCDLIESVRPPGV FSALKDATKTAHADPAACDRTFMQSVNGMSNAHLIPRQGSFIIKHYAGDVAYTVDGITDK NKDQLLKGLLGMFQVSQNPFLHTLFPNQVDQDNRKQPPTAGDRIRTSANALVETLMKCQP SYIRTIKPNENKSPTEYNVPNVLHQIKYLGLQENVRIRRAGFAYRQSFEKFVDRFFLLSP ATSYAGEYTWQGSYEAAVKQILKDTSIPQEEWQMGVTKAFIKSPETLFALEHMRDRYWHN MATRIQRMWRAYLAYRAESATRIQTFWRKKRTGAEYLQLRDHGHRVLQGRKERRRMSILG SRRFIGDYLGINASSGPGAHIRNAIGIGSNEKTVFSCRGEILEAKFGRSSKASPRILIVT NSKFYVVAQMLVNGQVQITAEKAIPLGAIKFIGASSSRDDWFSLGVGSPQEADPLLNCVL KTEMFTQMERVMPGGFNLKIGDSIEYAKKPGKMQVVKVLKDSPNPVDFYKSGAVHTQQGE PPNSVSRPTPKGKPVPPRPITRGKLIRPGGPNGRPARGTTNRTPQPRPGGASASAVASRP VPQAQPQAQAQVAASIPVRTQQQSQTSSASVRAPPPPPPAAPPAKAKIMAKVLYDFAGQK ENEMSIKEGDLIEIVQKENNGWWLAKSGNQQAWVPAAYVEEQKQAPPPVAASRPPPPAPP AANGKNKPLPPAKRPAAGKKPASLQPRDSGMSLNGSDGSRSNTPTPSLGNSLADALLARK QAMAKKDDDDDW |
| Search parameters | standard parameters |
| Result | |
| Resultfile | ScipioResult_Magnaporthe_grisea.yaml |











