Contents Download_button Download this help page as PDF

General Back to top

WebScipio offers an advanced option to search for exons of tandem gene duplications. It is based on the exon-intron structure determined by Scipio. The algorithm of the search divides into several steps, which are conducted for each known exon. It takes three assumptions into account: Firstly, duplicated exons have a similar length; secondly, their splice sites and reading frames are conserved; thirdly, they are homologues. Several parameters of the search are adjustable.

Options Back to top

Allowed length difference for exons

Determines the maximal difference between the length of an exon and the length of the duplicated exon. The length is defined by the number of amino acids coded by the exon.

Minimal score for exons

The score is defined by the global alignment score of the translated exon sequence to the duplicated exon translation divided by the global alignment score of the exon translation to itself. The minimal score is a threshold above which the duplicated exon is taken into account.

Minimal tandem gene score

The score is defined by number of amino acids of the original gene found in the duplicated gene divided by the number of amino acids of the original gene. This means that mismatches are counted, but gaps are not counted.

Minimal exon length

Just exons, which are longer than the minimal exon length, are taken into account for the search for tandem gene duplications. The length is defined by the number of amino acids coded by the exon.

Search for concatenated exons

If this option is enabled, the algorithm searches for duplicated exons with pairs (2-tuples) of neighbouring exons, triplets (3-tuples), 4-tuples, 5-tuples and so on up to all exons by concatenating the translations of the neighbouring exons. So helps to find duplicated exons in which introns are lost.

Search for splitted exons

If this option is enabled, the algorithm searches for exons splitted in two exons in the duplicated gene. Only exons, which were not found in the first round of the algorithm are used for this search. The missing exons are splitted in all possible two parts, whereby the smaller part must be longer than the minimal exon length.

Search with start codon for first exon

The first exon does not have a conserved splice site pattern at the 5' end but its beginning is determined by the start codon (ATG) coding for a Methionin. So the algorithm tries to find duplicated exons beginning with ATG (and not with a splice site pattern at the 5' end of the exon).

  • Auto: This option makes it possible to autodetect, whether the first exon in the Scipio result is the beginning of the protein. It searches for duplicated exons with the ATG start codon (instead of a splice site pattern) if the first exon begins at the start of the protein sequence and starts with ATG.
  • Enable: This option forces the algorithm to look for alternative exons with ATG as start codon for the first exon (and does not use the splice site pattern).
  • Disable: This option forces the algorithm to use splice site patterns of the first exon to look for duplicated exons as if it is a normal exon in the middle.

Search with stop codons for last exon

The last exon does not have a splice site pattern at the 3' end but its end is determined by a stop codon (TAG, TAA, TGA). So the algorithm tries to find duplicated exons followed by a stop codon (and not with a splice site pattern at the 3' end of the exon).

  • Auto: This option makes it possible to autodetect, whether the last exon in the Scipio result is the end of the protein. It searches for duplicated exons with stop codons (instead of a splice site pattern) if the last exon ends at the end of the protein sequence and is followed by TAG, TAA or TGA.
  • Enable: This option forces the algorithm to look for duplicated exons followed by stop codons for the last exon (and does not use the splice site pattern).
  • Disable: This option forces the algorithm to use splice site patterns of the last exon to look for duplicated exons as if it is a normal exon in the middle.

Examples Back to top

Oreochromis niloticus mysion heavy chain 13 (Mhc13) gene with two tandem gene duplications

OrganismOreochromis niloticus
Genome file supercontigs v 1.2.0
Query sequenceMNDTDMEVFGVAAPYLRKSERERIAAQNVPFDAKSAVFVPHPKQEYVKGKIRSQDGTKVN
VEIEDGKVVTVHVDDIRPMNPPKFDKIEDMALLTHLHEPAVLFNLKERYAAWMIYTYSGL
FCVTVNPYKWLPVYNPEVVAGYRGKKRQEAPPHIFSISDNAYQYMLTDRENQSILITGES
GAGKTVNTKRVIQYFATITAMGESSKKEQLGSKMQGTLEDQIIQANPLLEAFGNAKTVRN
DNSSRFGKFIRIHFGTKGKLASADIETYLLEKSRVTFQLLAERSYHIFYQILSNKKPDLI
EMLLITSNPYDYPFISQGEITVLSINDAEELMASDRAIDILGFSTEEKVGIYKLTGAVMH
NGNMKFKQKQREEQAEPDGTEVADKVAYLMGLNSADLLKALCCPRVKVGNEYVTKGQTPQ
QVNNAVGALSKAVYEKLFLWMVTRINQQLDTKLPRQHFIGVLDIAGFEIFEINSLEQLCI
NFTNEKLQQFFNHHMFVLEQEEYKKEGIEWEFIDFGMDLAACIELIEKPMGIFSILEEEC
MFPKATDGSFKNKLYDQHLGKNSIFQKPKPSKAKTEAHFSLMHYAGTVDYNISGWLEKNK
DPLNDTVVQLYQKASLKLLCQLFATYASADAAADGNKKNYKKKGSSFQTVSALFRENLNK
LMANLRSTHPHFVRCIIPNETKIPGIMDHHLVLHQLRCNGVLEGIRICRKGFPSRILYGD
FRQRYRILNASVIPEGQFIDSKKASEKLLSSIDVDHTQYRFGYTKVFFKAGLLGLLEEMR
DERLAVLMTRIQAVARGYVTRLRLKEMMKKREAVYIIQYNIRSFMNVKNWPWMKLFFKIK
PLLRSAEAEKEMQTMKEEFARLREELAKSEARRKELEEKMVMLMQEKNDLYLQIQAEREN
LCDAEERCEGLIKSKIHLEAKAKEFSERMEEEEEINAEITAKKRKLEDECSELKRDIDDL
ELTIAKVEKEKYATENKVKNLIEELTTLEENLLKSSKEMKALQEVHQQTLDDLQAEEDRV
NSLIKTKTKLEQQIDDLEGLVEQEKKLRADLERSRRKLEGDLKLNQETIMDLENERQQAE
EQLKKKDFEISSLQSKIDDEQALSTQLQKKIKELQARTEELEEEIEAERAARAKVEKQRS
DLSRELEEITERLEEAGGASAAQAELNKKREAEIQRLRHELEESTLQHESIAVALRKKQA
DSVAELGDQIENLQRVKQKLEKEKSEMKMEIDDMARSMETVLKSKANVEKQCRSLEDQMN
EYKTKADEAQRSLSDYTTLSARLQTENGELTRLLEEKESILSQVNRGKTAAGHKIEEMKR
LLDEEIKTKNALAHSLQSSRHDCELLREQYEEEQEAKAELQRCLSKVNSDVAQWRNKYET
DTIQRTEELEEAKKKLVQRLQESEEMTEAANVKCASLEKTKQRLQAEVEDLMVELERSNA
ANATLDKKQKNFDKVLAEWKQKYEECQSDLEVSQRESRALNTELFKLKNSYEEVLDHLES
MKRDNKNLQQEITDINEQVGESTKMLRELEKATKHAEQEKRDTQAALEEAESSLEQEESK
ILQLELELNQIKSEVERKVAEKDEEIDQLKRNYQRTVDYLQSTLDAETRSRNDAVRMKKK
MEGDLNEMEIQLGHANRQAAEATKQLRNLQTQLKDTQVHLDEALQRQEDLKEELAIIERR
NNLMMAENEELRASLEQSDRSRKLAEQELMEVSERVQLLHSQNTSLVNSKKKMEADLTQL
QSEMEETMQEARNADEKAKKAITDAAMMAEELKKEQDTSAHLERMKKNLEATVKDLQQRL
DEAEQVALKGGKKEIQKLEAKVRELENELEAEQKRSGEAVKGVRKYERKIKELTYQGEEE
KKNSARLQDLVNKLQLKVKAYKRQFEEAEEQSSIHLAKFRKVQHELEEAEERADVAESQL
NKLRARSRDVAGKGELKS
YAML-file for uploadDownload_button ScipioResult_Orn_Mhc13.yaml
Expert optionsdefaults, except "Region Size = 50000"
Search for tandem gene duplicationsdefaults, except "Minimal score for exons = 30" and "Region Size = 50000"
Enable search!
Result

Drosophila melanogaster CG30047 gene with five tandem gene duplications

OrganismDrosophila melanogaster
Genome file chromosome v 5.0.0
YAML-file for uploadDownload_button ScipioResult_Dm_CG30047.yaml
Query sequenceMKSREKNGSAASNSDVALVNVLSQQKLRRHRLPWYYAPSFLLLWVALFYAVVYPLYHRLP
DSVLISHESSKPGQFVAERAQRLLYKYDKIGPKVVGSVANEVTTVAFLEEEVENIRAAMR
SDLYELQLDVQHPSGAYMHWQMVNMYQGVTNVVVKISSRSSNSSSYLLVNSHFDSKPSSP
GSGDDGTMVVVMLEVLRQVAISDTPFEHPIVFLFNGAEENPLEASHGFITQHKWAGNCKA
LINLEVAGSGGRDLLFQSGPNNPWLIKYYYQNAKHPFATTMAEEIFQSGILPSDTDFRIF
RDYGQLPGLDMAQISNGYVYHTIFDNVQAVPIDSLQSSGDNALSLVRAFADAPEMQNPED
HSEGHAVFFDYLGLFFVYYTENTGIVLNCCIAVASLVLVVCSLLRMGRESDVSIGRVSIW
FAIILVLHVLGMILSLGLPLLMAVLFDAGDRSMTYFSNNWLVIGLFIVPAIIGQILPLTL
YYTLKPNDEISHPNHIHMSLHAHCVLLSLIAIILTAISLRTPYLCMMSLLFYGGALLINL
LSTLHDRGYYWVVLVQVLQLVPFLYFCYLFYTFLLVFFPMLGRFGHGTNPDLLIALICAV
GTFFALGFVAPLINIFRWPKLALLGLGVVTFIFSMIAVSEVGFPYRAKTSVMRIHFLHVR
RIFYEYDGSVSLSDSGYYFDFQDRRLYYPLENTSVNLTGLASTSSECDKYLMCGVPCFNH
RWCKTRAKSHWLPREQEVAIPGATSLKLLSKAVLDSGKVARFEFEISGPPHMSLYIQPLD
GVEVEDWSFIRNMLDEPDTYSPPYQIFFAYGADNTPLKFHIDFAKSSGDFSTPTFQLGFA
ASFVSYDYDRDAAGLKFISDFPDFAHVMEWPTLYERYIF
Expert optionsdefaults, except "BLAT Tilesize = 5"
Search for tandem gene duplicationsdefaults, except "Search for concatenated exons = enabled" and "Allowed length difference for exons = 5"
Enable search!
Result

Overview of search parameters as activity diagram Back to top

Activity diagram of search for mutually exclusive spliced exons
link to kassiopeia
link to diark
link to cymobase
link to motorprotein.de
MPG
MPI for biophysical chemistry
Uni-Goettingen
Informatik Uni-Goettingen