TwoBit Sequence Archives

A twoBit file is a highly efficient way to store genomic sequence. The format is defined here. Note that lower-case nucleotides are considered masked in twoBit, which can cause such sequence to be ignored when using the -mask option with gfServer; therefore, you may wish to convert lower-case sequence to upper-case when preparing the FASTA file.
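The upper-case conversion mentioned above can be done with standard tools before running faToTwoBit. The following is a minimal sketch, assuming a plain-text (uncompressed) FASTA file; the function name `fa_to_upper` and the file names are illustrative and are not part of the UCSC utilities. It removes all soft-masking, so it is only appropriate when the masking information is not needed.

```shell
# Upper-case every sequence line of a FASTA stream.
# Header lines (those starting with ">") pass through unchanged.
# Reads the named files, or standard input when no file is given.
fa_to_upper() {
    awk '/^>/ { print; next } { print toupper($0) }' "$@"
}

# Example usage (file names are placeholders):
#   fa_to_upper genome.fa > genome.upper.fa
```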

To complete the steps below you must first download the faToTwoBit, twoBitInfo, and twoBitToFa utilities. For more information on downloading our command-line utilities, see these instructions.

To create a twoBit file, follow these steps:

Prepare the sequence for your twoBit file in a FASTA-formatted file (e.g. genome.fa).
Run the faToTwoBit program on your FASTA file:
```
    faToTwoBit genome.fa genome.2bit
```
Use twoBitInfo to verify the sequences in this assembly and create a chrom.sizes file, which is useful to construct the big* files in later processing steps:
```
    twoBitInfo genome.2bit stdout | sort -k2rn > genome.chrom.sizes
```
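Before using the chrom.sizes file in later steps, it can be worth a quick sanity check. The sketch below assumes the file has one sequence per line in the form name, tab, size, which is what the twoBitInfo command above is expected to write; the function name `chrom_sizes_summary` is hypothetical. It reports the number of sequences and the total number of bases, and returns a non-zero status if any line does not match that layout.

```shell
# Summarize a chrom.sizes file: count sequences and sum their sizes.
# Assumes two tab-separated columns: sequence name, then size in bases.
chrom_sizes_summary() {
    awk -F '\t' '
        NF != 2 || $2 !~ /^[0-9]+$/ {
            # Report malformed lines on stderr and remember the failure.
            printf "line %d is malformed: %s\n", NR, $0 > "/dev/stderr"
            bad = 1
            next
        }
        { total += $2; n++ }
        END {
            # %.0f avoids integer overflow in awk variants with 32-bit %d.
            printf "%d sequences, %.0f bases\n", n, total
            exit bad
        }
    ' "$1"
}

# Example usage:
#   chrom_sizes_summary genome.chrom.sizes
```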

The twoBit commands also accept a URL in place of a local .2bit file (the -udcDir option sets the directory used to cache the remote data):

    twoBitInfo -udcDir=. http://your-website.edu/~user/genome.2bit | sort -k2nr > genome.chrom.sizes

Sequence can be extracted from the .2bit file with the twoBitToFa command, for example:

    twoBitToFa -seq=chr1 -udcDir=. http://your-website.edu/~user/genome.2bit stdout > genome.chr1.fa

Examples of extracting sequences

See this series of blog posts, Accessing the Genome Browser Programmatically, for examples of extracting sequences remotely, such as the following:

    $ twoBitToFa http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit:chr1:100100-100200 stdout
    >chr1:100100-100200
    gcctagtacagactctccctgcagatgaaattatatgggatgctaaatta
    taatgagaacaatgtttggtgagccaaaactacaacaagggaagctaatt
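The range 100100-100200 in the example above returns 100 bases, that is, end minus start, which is consistent with a zero-based start and a non-inclusive end. A small helper makes this easy to check for any extracted region. This is a sketch only; the function name `fa_base_count` is hypothetical and it simply counts the characters on non-header lines.

```shell
# Count the bases in a FASTA stream by summing the length of every
# line that is not a header (headers start with ">").
# Reads the named files, or standard input when no file is given.
fa_base_count() {
    awk '!/^>/ { n += length($0) } END { printf "%d\n", n }' "$@"
}

# Example usage: for the region above the count should equal
# 100200 - 100100 = 100.
#   twoBitToFa genome.2bit:chr1:100100-100200 stdout | fa_base_count
```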

Also see the API getData functions for examples of retrieving sequence by URL, such as the following:

https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chr1;start=100100;end=100200

  downloadTime:      "2022:05:19T18:45:56Z"
  downloadTimeStamp: 1652985956
  genome:            "hg38"
  chrom:             "chr1"
  start:             100100
  end:               100200
  dna:               "gcctagtacagactctccctgcagatgaaattatatgggatgctaaattataatgagaacaatgtttggtgagccaaaactacaacaagggaagctaatt"
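The same request can be issued from the command line. The sketch below builds the URL in the form shown above; the function name `ucsc_sequence_url` is hypothetical, and the use of curl is an assumption (any HTTP client would do). Because the URL contains semicolons, which the shell treats as command separators, it must be quoted when passed to a command.

```shell
# Build a getData/sequence URL in the same form as the example above.
# Arguments: genome, chromosome, start, end.
ucsc_sequence_url() {
    printf 'https://api.genome.ucsc.edu/getData/sequence?genome=%s;chrom=%s;start=%s;end=%s\n' \
        "$1" "$2" "$3" "$4"
}

# Example usage (requires network access; note the quotes around the URL):
#   curl -sS "$(ucsc_sequence_url hg38 chr1 100100 100200)"
```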