Schema for Augustus - Augustus Gene Predictions
Database: hub_32_GCA_031877795.1
Primary Table: hub_32_augustus
Data last updated: 2023-12-01
Big Bed File: https://hgdownload.soe.ucsc.edu/hubs/GCA/031/877/795/GCA_031877795.1/bbi/GCA_031877795.1_bStrAlu1.hap1.augustus.bb
Item Count: 23,981
Format description: bigGenePred gene models
field | example | description
chrom | CM062876.1 | Reference sequence chromosome or scaffold
chromStart | 115408632 | Start position in chromosome
chromEnd | 115435893 | End position in chromosome
name | g231.t1 | Name or ID of item, ideally both human readable and unique
score | 0 | Score (0-1000)
strand | - | + or - for strand
thickStart | 115408632 | Start of where display should be thick (start codon)
thickEnd | 115435893 | End of where display should be thick (stop codon)
reserved | 0 | RGB value (use R,G,B string in input file)
blockCount | 14 | Number of blocks
blockSizes | 129,165,46,255,68,210,54,111,104,137,82,162,148,2769, | Comma separated list of block sizes
chromStarts | 0,3692,6948,8437,11480,12475,13036,13581,15471,18714,19774,22092,22446,24492, | Start positions relative to chromStart
name2 | g231 | Alternative/human readable name
cdsStartStat | cmpl | Status of CDS start annotation (none, unknown, incomplete, or complete)
cdsEndStat | cmpl | Status of CDS end annotation (none, unknown, incomplete, or complete)
exonFrames | 0,0,2,2,0,0,0,0,1,2,1,1,0,0, | Reading frame of the start of the CDS region of the exon, in the direction of transcription (0,1,2), or -1 if there is no CDS region.
type | | Transcript type
geneName | g231.t1 | Primary identifier for gene
geneName2 | g231 | Alternative/human readable gene name
geneType | | Gene type
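
The blockSizes and chromStarts fields describe each block relative to chromStart, so absolute block coordinates have to be derived. The following is a minimal Python sketch of that derivation, using the example values from the table above. The function names are hypothetical, and the sketch assumes the usual BED convention of 0-based, half-open coordinates.

```python
from typing import List, Tuple


def parse_int_list(text: str) -> List[int]:
    """Parse a comma-separated list of integers, tolerating a trailing comma."""
    return [int(item) for item in text.split(",") if item]


def block_coordinates(chrom_start: int, block_sizes: str, chrom_starts: str) -> List[Tuple[int, int]]:
    """Return absolute (start, end) pairs for each block of one record."""
    sizes = parse_int_list(block_sizes)
    offsets = parse_int_list(chrom_starts)
    if len(sizes) != len(offsets):
        raise ValueError("blockSizes and chromStarts must have the same length")
    # Each offset is relative to chromStart; the end is start plus block size.
    return [(chrom_start + off, chrom_start + off + size) for off, size in zip(offsets, sizes)]


# Example values for g231.t1, copied from the field table.
exons = block_coordinates(
    115408632,
    "129,165,46,255,68,210,54,111,104,137,82,162,148,2769,",
    "0,3692,6948,8437,11480,12475,13036,13581,15471,18714,19774,22092,22446,24492,",
)
print(len(exons))   # 14, matching blockCount
print(exons[0])     # (115408632, 115408761)
print(exons[-1])    # (115433124, 115435893), ending at chromEnd
```

The end of the last block equals chromEnd, which is a quick consistency check for any record.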

Sample Rows
 
chrom | chromStart | chromEnd | name | score | strand | thickStart | thickEnd | reserved | blockCount | blockSizes | chromStarts | name2 | cdsStartStat | cdsEndStat | exonFrames | type | geneName | geneName2 | geneType
CM062876.1 | 115408632 | 115435893 | g231.t1 | 0 | - | 115408632 | 115435893 | 0 | 14 | 129,165,46,255,68,210,54,111,104,137,82,162,148,2769, | 0,3692,6948,8437,11480,12475,13036,13581,15471,18714,19774,22092,22446,24492, | g231 | cmpl | cmpl | 0,0,2,2,0,0,0,0,1,2,1,1,0,0, | | g231.t1 | g231 |
CM062876.1 | 115408632 | 115435893 | g231.t2 | 0 | - | 115408632 | 115435893 | 0 | 13 | 129,79,255,68,210,54,111,104,137,82,162,148,2769, | 0,6948,8437,11480,12475,13036,13581,15471,18714,19774,22092,22446,24492, | g231 | cmpl | cmpl | 0,2,2,0,0,0,0,1,2,1,1,0,0, | | g231.t2 | g231 |
CM062876.1 | 115521210 | 115532146 | g232.t1 | 0 | + | 115521210 | 115532146 | 0 | 15 | 48,171,104,137,101,93,192,66,90,111,52,101,66,135,153, | 0,782,2211,2392,3140,3619,4221,4497,5090,5625,6180,6823,8927,10122,10783, | g232 | cmpl | cmpl | 0,0,0,2,1,0,0,0,0,0,0,1,0,0,0, | | g232.t1 | g232 |
CM062876.1 | 115535153 | 115539855 | g233.t1 | 0 | - | 115535153 | 115539855 | 0 | 4 | 318,131,164,149, | 0,1223,3720,4553, | g233 | cmpl | cmpl | 0,1,2,0, | | g233.t1 | g233 |
CM062876.1 | 115535153 | 115548866 | g233.t2 | 0 | - | 115535153 | 115548866 | 0 | 5 | 318,131,164,188,60, | 0,1223,3720,8420,13653, | g233 | cmpl | cmpl | 0,1,2,0,0, | | g233.t2 | g233 |
CM062876.1 | 115563551 | 115613376 | g234.t1 | 0 | + | 115563551 | 115613376 | 0 | 12 | 138,97,96,72,45,77,238,124,135,115,107,151, | 0,3652,6245,13376,15553,18835,20220,32450,33147,33944,47040,49674, | g234 | cmpl | cmpl | 0,0,1,1,1,1,0,1,2,2,0,2, | | g234.t1 | g234 |
CM062876.1 | 115637712 | 115658931 | g235.t1 | 0 | - | 115637712 | 115658931 | 0 | 6 | 177,179,144,182,169,49, | 0,3067,7981,8806,20605,21170, | g235 | cmpl | cmpl | 0,1,1,2,1,0, | | g235.t1 | g235 |
CM062876.1 | 115637712 | 115658931 | g235.t2 | 0 | - | 115637712 | 115658931 | 0 | 5 | 177,179,182,169,49, | 0,3067,8806,20605,21170, | g235 | cmpl | cmpl | 0,1,2,1,0, | | g235.t2 | g235 |
CM062876.1 | 115637712 | 115741675 | g235.t3 | 0 | - | 115637712 | 115741675 | 0 | 12 | 177,179,144,182,169,114,152,169,115,63,908,16, | 0,3067,7981,8806,20605,34273,37541,44649,61198,85831,101891,103947, | g235 | cmpl | cmpl | 0,1,1,2,1,1,2,1,0,0,1,0, | | g235.t3 | g235 |
CM062876.1 | 116738718 | 116787465 | g236.t1 | 0 | + | 116738718 | 116787465 | 0 | 19 | 85,74,78,190,166,109,96,93,126,65,70,147,71,88,153,153,165,62,154, | 0,8872,15818,18162,20864,21918,22458,23979,24426,25136,27366,30909,31926,32448,33464,39304,43168,44484,48593, | g236 | cmpl | cmpl | 0,0,2,2,0,0,0,0,1,2,1,1,0,0, | | g236.t1 | g236 |
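
The exonFrames values in the sample rows can be cross-checked against the block sizes. The sketch below assumes that a block's frame is the number of coding bases preceding it in the direction of transcription, modulo 3. That reading is consistent with the rows shown here but is an assumption, not something this page states. The function name is hypothetical.

```python
from typing import List


def expected_exon_frames(
    strand: str,
    chrom_start: int,
    thick_start: int,
    thick_end: int,
    block_sizes: List[int],
    block_offsets: List[int],
) -> List[int]:
    """Recompute per-block frames; blocks without coding bases get -1."""
    n = len(block_sizes)
    frames = [-1] * n
    # Walk blocks in transcription order: left to right on '+', right to left on '-'.
    order = range(n) if strand == "+" else range(n - 1, -1, -1)
    coding_so_far = 0
    for i in order:
        start = chrom_start + block_offsets[i]
        end = start + block_sizes[i]
        # Coding portion is the overlap of the block with [thickStart, thickEnd).
        coding = min(end, thick_end) - max(start, thick_start)
        if coding > 0:
            frames[i] = coding_so_far % 3
            coding_so_far += coding
    return frames


# Sample row g233.t1 (minus strand).
frames = expected_exon_frames(
    "-", 115535153, 115535153, 115539855,
    [318, 131, 164, 149],
    [0, 1223, 3720, 4553],
)
print(frames)  # [0, 1, 2, 0], matching the exonFrames column for g233.t1
```

In these sample rows thickStart equals chromStart and thickEnd equals chromEnd, so every block is entirely coding and the block sizes of a complete transcript sum to a multiple of 3 (762 for g233.t1).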

Augustus (hub_32_augustus) Track Description
 

Description

This track shows ab initio predictions from the program AUGUSTUS (version 3.1) for the 25 Sep 2023 Strix aluco/GCA_031877795.1_bStrAlu1.hap1 genome assembly.

The predictions are based on the genome sequence alone.

Gene count: 23,981; Bases covered: 411,701,066

Data Access

Download the GTF file: GCA_031877795.1_bStrAlu1.hap1.augustus.gtf.gz

Methods

Statistical signal models were built for splice sites, branch-point patterns, translation start sites, and the poly-A signal. Furthermore, models were built for the sequence content of protein-coding and non-coding regions as well as for the length distributions of different exon and intron types. Detailed descriptions of most of these different models can be found in Mario Stanke's dissertation. This track shows the most likely gene structure according to a Semi-Markov Conditional Random Field model. Alternative splicing transcripts were obtained with a sampling algorithm (--alternatives-from-sampling=true --sample=100 --minexonintronprob=0.2 --minmeanexonintronprob=0.5 --maxtracks=3 --temperature=2).

The different models used by Augustus were trained on a number of different species-specific gene sets, which included 1000-2000 training gene structures. The --species option allows one to choose the species used for training the models. Different training species were used for the --species option when generating these predictions for different groups of assemblies.

Assembly Group | Training Species
Fish | zebrafish
Birds | chicken
Human and all other vertebrates | human
Nematodes | caenorhabditis
Drosophila | fly
A. mellifera | honeybee1
A. gambiae | culex
S. cerevisiae | saccharomyces

This table describes which training species was used for a particular group of assemblies. When available, the closest related training species was used.
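
As an illustration of how the training-species table and the sampling options from the Methods text fit together, here is a minimal Python sketch that assembles an AUGUSTUS argument list. It only builds the list and does not run the program. The executable name `augustus`, the placement of the input file as the last argument, and the file name `genome.fa` are assumptions for illustration. The exact command used to produce this track is not given on this page.

```python
from typing import List

# Assembly group -> value for --species, copied from the table above.
TRAINING_SPECIES = {
    "Fish": "zebrafish",
    "Birds": "chicken",
    "Human and all other vertebrates": "human",
    "Nematodes": "caenorhabditis",
    "Drosophila": "fly",
    "A. mellifera": "honeybee1",
    "A. gambiae": "culex",
    "S. cerevisiae": "saccharomyces",
}

# Sampling options quoted in the Methods section.
SAMPLING_OPTIONS = [
    "--alternatives-from-sampling=true",
    "--sample=100",
    "--minexonintronprob=0.2",
    "--minmeanexonintronprob=0.5",
    "--maxtracks=3",
    "--temperature=2",
]


def augustus_command(assembly_group: str, genome_fasta: str) -> List[str]:
    """Build an argument list; the input path is a hypothetical placeholder."""
    species = TRAINING_SPECIES[assembly_group]
    return ["augustus", f"--species={species}", *SAMPLING_OPTIONS, genome_fasta]


# Strix aluco is a bird, so the chicken models apply.
cmd = augustus_command("Birds", "genome.fa")
print(" ".join(cmd))
```

Keeping the species lookup separate from the fixed sampling options mirrors the text: the sampling options are the same everywhere, while the species varies by assembly group.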

Credits

Thanks to the Stanke lab for providing the AUGUSTUS program. The training for the chicken version was done by Stefanie König and the training for the human and zebrafish versions was done by Mario Stanke.

References

Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44. PMID: 18218656

Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003 Oct;19 Suppl 2:ii215-25. PMID: 14534192