Schema for Augustus - Augustus Gene Predictions
Database: hub_32_GCA_031877795.1
Primary Table: hub_32_augustus
Data last updated: 2023-12-01
Big Bed File: https://hgdownload.soe.ucsc.edu/hubs/GCA/031/877/795/GCA_031877795.1/bbi/GCA_031877795.1_bStrAlu1.hap1.augustus.bb
Item Count: 23,981
Format description: bigGenePred gene models
field | example | description |
chrom | CM062876.1 | Reference sequence chromosome or scaffold |
chromStart | 115408632 | Start position in chromosome |
chromEnd | 115435893 | End position in chromosome |
name | g231.t1 | Name or ID of item, ideally both human readable and unique |
score | 0 | Score (0-1000) |
strand | - | + or - for strand |
thickStart | 115408632 | Start of where display should be thick (start codon) |
thickEnd | 115435893 | End of where display should be thick (stop codon) |
reserved | 0 | RGB value (use R,G,B string in input file) |
blockCount | 14 | Number of blocks |
blockSizes | 129,165,46,255,68,210,54,111,104,137,82,162,148,2769, | Comma separated list of block sizes |
chromStarts | 0,3692,6948,8437,11480,12475,13036,13581,15471,18714,19774,22092,22446,24492, | Start positions relative to chromStart |
name2 | g231 | Alternative/human readable name |
cdsStartStat | cmpl | Status of CDS start annotation (none, unknown, incomplete, or complete) |
cdsEndStat | cmpl | Status of CDS end annotation (none, unknown, incomplete, or complete) |
exonFrames | 0,0,2,2,0,0,0,0,1,2,1,1,0,0, | Reading frame of the start of the CDS region of the exon, in the direction of transcription (0,1,2), or -1 if there is no CDS region. |
type | | Transcript type |
geneName | g231.t1 | Primary identifier for gene |
geneName2 | g231 | Alternative/human readable gene name |
geneType | | Gene type |
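The blockSizes and chromStarts fields together encode the exon layout: each block starts at chromStart plus its chromStarts offset and runs for its blockSizes length. A minimal sketch of that arithmetic follows, assuming the usual BED convention of 0-based, half-open coordinates. The function name exon_intervals is hypothetical and the input values are copied from the g233.t1 sample row.

```python
def exon_intervals(chrom_start, block_sizes, chrom_starts):
    """Return absolute (start, end) pairs for each block of one row.

    block_sizes and chrom_starts are the comma-separated strings as
    they appear in the table, including the trailing comma.
    """
    sizes = [int(s) for s in block_sizes.split(",") if s]
    offsets = [int(s) for s in chrom_starts.split(",") if s]
    if len(sizes) != len(offsets):
        raise ValueError("blockSizes and chromStarts differ in length")
    # Each offset is relative to chromStart, so add it back in.
    return [(chrom_start + off, chrom_start + off + size)
            for off, size in zip(offsets, sizes)]


# Values copied from the g233.t1 sample row.
exons = exon_intervals(115535153, "318,131,164,149,", "0,1223,3720,4553,")
for start, end in exons:
    print(start, end)
```

The end of the last block should equal chromEnd (115539855 for this row), which is a quick sanity check on a parsed record.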
Sample Rows
chrom | chromStart | chromEnd | name | score | strand | thickStart | thickEnd | reserved | blockCount | blockSizes | chromStarts | name2 | cdsStartStat | cdsEndStat | exonFrames | type | geneName | geneName2 | geneType |
CM062876.1 | 115408632 | 115435893 | g231.t1 | 0 | - | 115408632 | 115435893 | 0 | 14 | 129,165,46,255,68,210,54,111,104,137,82,162,148,2769, | 0,3692,6948,8437,11480,12475,13036,13581,15471,18714,19774,22092,22446,24492, | g231 | cmpl | cmpl | 0,0,2,2,0,0,0,0,1,2,1,1,0,0, | | g231.t1 | g231 | |
CM062876.1 | 115408632 | 115435893 | g231.t2 | 0 | - | 115408632 | 115435893 | 0 | 13 | 129,79,255,68,210,54,111,104,137,82,162,148,2769, | 0,6948,8437,11480,12475,13036,13581,15471,18714,19774,22092,22446,24492, | g231 | cmpl | cmpl | 0,2,2,0,0,0,0,1,2,1,1,0,0, | | g231.t2 | g231 | |
CM062876.1 | 115521210 | 115532146 | g232.t1 | 0 | + | 115521210 | 115532146 | 0 | 15 | 48,171,104,137,101,93,192,66,90,111,52,101,66,135,153, | 0,782,2211,2392,3140,3619,4221,4497,5090,5625,6180,6823,8927,10122,10783, | g232 | cmpl | cmpl | 0,0,0,2,1,0,0,0,0,0,0,1,0,0,0, | | g232.t1 | g232 | |
CM062876.1 | 115535153 | 115539855 | g233.t1 | 0 | - | 115535153 | 115539855 | 0 | 4 | 318,131,164,149, | 0,1223,3720,4553, | g233 | cmpl | cmpl | 0,1,2,0, | | g233.t1 | g233 | |
CM062876.1 | 115535153 | 115548866 | g233.t2 | 0 | - | 115535153 | 115548866 | 0 | 5 | 318,131,164,188,60, | 0,1223,3720,8420,13653, | g233 | cmpl | cmpl | 0,1,2,0,0, | | g233.t2 | g233 | |
CM062876.1 | 115563551 | 115613376 | g234.t1 | 0 | + | 115563551 | 115613376 | 0 | 12 | 138,97,96,72,45,77,238,124,135,115,107,151, | 0,3652,6245,13376,15553,18835,20220,32450,33147,33944,47040,49674, | g234 | cmpl | cmpl | 0,0,1,1,1,1,0,1,2,2,0,2, | | g234.t1 | g234 | |
CM062876.1 | 115637712 | 115658931 | g235.t1 | 0 | - | 115637712 | 115658931 | 0 | 6 | 177,179,144,182,169,49, | 0,3067,7981,8806,20605,21170, | g235 | cmpl | cmpl | 0,1,1,2,1,0, | | g235.t1 | g235 | |
CM062876.1 | 115637712 | 115658931 | g235.t2 | 0 | - | 115637712 | 115658931 | 0 | 5 | 177,179,182,169,49, | 0,3067,8806,20605,21170, | g235 | cmpl | cmpl | 0,1,2,1,0, | | g235.t2 | g235 | |
CM062876.1 | 115637712 | 115741675 | g235.t3 | 0 | - | 115637712 | 115741675 | 0 | 12 | 177,179,144,182,169,114,152,169,115,63,908,16, | 0,3067,7981,8806,20605,34273,37541,44649,61198,85831,101891,103947, | g235 | cmpl | cmpl | 0,1,1,2,1,1,2,1,0,0,1,0, | | g235.t3 | g235 | |
CM062876.1 | 116738718 | 116787465 | g236.t1 | 0 | + | 116738718 | 116787465 | 0 | 19 | 85,74,78,190,166,109,96,93,126,65,70,147,71,88,153,153,165,62,154, | 0,8872,15818,18162,20864,21918,22458,23979,24426,25136,27366,30909,31926,32448,33464,39304,43168,44484,48593, | g236 | cmpl | cmpl | 0,1,0,0,1,2,0,0,0,0,2,0,0,2,0,0,0,0,2, | | g236.t1 | g236 | |
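The exonFrames column can be cross-checked against blockSizes. The sketch below assumes that every block is entirely coding (true for the sample rows, where thickStart equals chromStart and thickEnd equals chromEnd) and that the frame of a block is the number of coding bases preceding it in the direction of transcription, modulo 3. That convention agrees with the rows checked here; the helper names expected_frames and parse_list are hypothetical.

```python
def parse_list(field):
    """Parse a comma-separated integer field with a trailing comma."""
    return [int(x) for x in field.split(",") if x]


def expected_frames(block_sizes, strand):
    """Frames in genomic order, assuming all blocks are fully coding."""
    sizes = parse_list(block_sizes)
    # Walk the blocks in transcription order: reversed on the minus strand.
    ordered = sizes if strand == "+" else sizes[::-1]
    frames, coding_bases = [], 0
    for size in ordered:
        frames.append(coding_bases % 3)
        coding_bases += size
    # Report the frames back in genomic (left to right) order.
    return frames if strand == "+" else frames[::-1]


# g233.t1 (minus strand) and g232.t1 (plus strand) from the sample rows.
minus_ok = expected_frames("318,131,164,149,", "-") == parse_list("0,1,2,0,")
plus_ok = expected_frames(
    "48,171,104,137,101,93,192,66,90,111,52,101,66,135,153,", "+"
) == parse_list("0,0,0,2,1,0,0,0,0,0,0,1,0,0,0,")
print(minus_ok, plus_ok)
```

Rows with untranslated blocks would need the thickStart and thickEnd boundaries taken into account, which this sketch does not attempt.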
Augustus (hub_32_augustus) Track Description
Description
This track shows ab initio predictions from the program
AUGUSTUS (version 3.1)
for the 25 Sep 2023 Strix aluco/GCA_031877795.1_bStrAlu1.hap1 genome assembly.
The predictions are based on the genome sequence alone.
Gene count: 23,981; Bases covered: 411,701,066
Data Access
Download the GCA_031877795.1_bStrAlu1.hap1.augustus.gtf.gz GTF file.
Methods
Statistical signal models were built for splice sites, branch-point
patterns, translation start sites, and the poly-A signal.
Furthermore, models were built for the sequence content of
protein-coding and non-coding regions as well as for the length distributions
of different exon and intron types. Detailed descriptions of most of these different models
can be found in Mario Stanke's
dissertation.
This track shows the most likely gene structure according to a
Semi-Markov Conditional Random Field model.
Alternative splicing transcripts were obtained with
a sampling algorithm (--alternatives-from-sampling=true --sample=100 --minexonintronprob=0.2
--minmeanexonintronprob=0.5 --maxtracks=3 --temperature=2).
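As an illustration of how the sampling options listed above fit into an AUGUSTUS command line, the sketch below assembles an argument list without running anything. The option strings are copied from this description. The function name build_augustus_command, the input file name assembly.fa, and the choice of "chicken" as the species value are assumptions for the example; the exact invocation used to produce this track is not stated here.

```python
import shlex

# Sampling options exactly as quoted in the Methods text.
SAMPLING_OPTIONS = [
    "--alternatives-from-sampling=true",
    "--sample=100",
    "--minexonintronprob=0.2",
    "--minmeanexonintronprob=0.5",
    "--maxtracks=3",
    "--temperature=2",
]


def build_augustus_command(species, fasta_path):
    """Return an argv list: program, species model, options, query file."""
    return ["augustus", f"--species={species}", *SAMPLING_OPTIONS, fasta_path]


# "assembly.fa" is a placeholder path, not a file named in this document.
cmd = build_augustus_command("chicken", "assembly.fa")
print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids shell quoting problems if it is later passed to subprocess.run.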
The models used by Augustus were trained on a number of species-specific
gene sets, which included 1000-2000 training gene structures. The --species option
selects the species used for training the models. Different training species were
passed to --species when generating these predictions for different groups of
assemblies.
Assembly Group | Training Species |
Fish | zebrafish |
Birds | chicken |
Human and all other vertebrates | human |
Nematodes | caenorhabditis |
Drosophila | fly |
A. mellifera | honeybee1 |
A. gambiae | culex |
S. cerevisiae | saccharomyces |
This table lists the training species used for each group of assemblies.
When available, the most closely related training species was used.
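The assembly group to training species table can be expressed as a simple lookup. This is only a sketch: the dictionary keys and values are copied from the table, while the names TRAINING_SPECIES and species_option are hypothetical and not part of AUGUSTUS or the browser.

```python
# Assembly group -> value passed to --species, as listed in the table.
TRAINING_SPECIES = {
    "Fish": "zebrafish",
    "Birds": "chicken",
    "Human and all other vertebrates": "human",
    "Nematodes": "caenorhabditis",
    "Drosophila": "fly",
    "A. mellifera": "honeybee1",
    "A. gambiae": "culex",
    "S. cerevisiae": "saccharomyces",
}


def species_option(assembly_group):
    """Return the --species argument for a group named in the table."""
    try:
        return f"--species={TRAINING_SPECIES[assembly_group]}"
    except KeyError:
        raise ValueError(f"no training species listed for {assembly_group!r}") from None


print(species_option("Birds"))
```

Raising on an unknown group, rather than falling back to a default, keeps a missing table entry from silently selecting the wrong model.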
Credits
Thanks to the
Stanke lab
for providing the AUGUSTUS program. The training for the chicken version was
done by Stefanie König and the training for the
human and zebrafish versions was done by Mario Stanke.
References
Stanke M, Diekhans M, Baertsch R, Haussler D.
Using native and syntenically mapped cDNA alignments to improve de novo gene finding.
Bioinformatics. 2008 Mar 1;24(5):637-44.
PMID: 18218656
Stanke M, Waack S.
Gene prediction with a hidden Markov model and a new intron submodel.
Bioinformatics. 2003 Oct;19 Suppl 2:ii215-25.
PMID: 14534192