Vervet-AGM (ChlSab1.1)

Vervet-AGM assembly and gene annotation

Assembly

Vervet-AGM (Chlorocebus sabaeus), also known as the vervet monkey or African green monkey, is native to West Africa and was introduced to the Caribbean in the late 1600s. The species is important in studying high blood pressure, and also AIDS because it is a host for simian immunodeficiency virus (SIV). This release features the assembly ChlSab1.1 (GCA_000409795.2), which became available in March 2014.

The assembly comprises 2,003 toplevel sequences (including 21 chromosomes) built from 162,723 contigs. The scaffold N50 is 81.8 Mb. The N50 is the length such that 50% of the assembled genome lies in sequences of that length or longer.
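The N50 definition above can be illustrated with a minimal sketch. The function name `n50` and the toy lengths are hypothetical and are not taken from the ChlSab1.1 assembly. The sketch only shows how the statistic is commonly computed from a list of sequence lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort the lengths from longest to shortest and walk down the list
    until the running total reaches half of the summed length; the
    length at which that happens is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical, for illustration only).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 70: the two longest sequences cover 150 of 300
```

Applied to the real scaffold lengths of an assembly, the same procedure would yield the reported scaffold N50.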

Gene annotation

The gene set was built using a mixed approach. Because few species-specific sequences were available, whereas RNASeq data for Vervet-AGM were available from Washington University, the final gene set combines three sources: models based on orthologous proteins from the vertebrate division of UniProtKB, the longest translations of some human gene models from Ensembl 73, and models built from RNASeq data.

11,258 gene models were made exclusively from RNASeq data. The data were also used to add UTRs to gene models. The total gene set contains 19,165 protein-coding genes with a further 8,218 ncRNAs and 505 pseudogenes.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: ChlSab1.1, INSDC Assembly GCA_000409795.2, Mar 2014
Base Pairs: 2,789,656,328
Golden Path Length: 2,789,656,328
Annotation provider: Ensembl
Annotation method: Full genebuild
Genebuild started: Mar 2014
Genebuild released: Oct 2014
Genebuild last updated/patched: Feb 2015
Database version: 111.1

Gene counts

Coding genes: 19,165
Non coding genes: 8,245
Small non coding genes: 6,326
Long non coding genes: 3
Misc non coding genes: 1,916
Pseudogenes: 575
Gene transcripts: 28,078
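
As a quick consistency check on the gene counts above, the following sketch confirms that the three non-coding subcategories add up to the non-coding total. It also derives a rough transcripts-per-gene figure. The variable names are illustrative, and the sketch assumes that the transcript count covers coding genes, non-coding genes and pseudogenes together, which the page does not state.

```python
# Counts copied from the gene counts listed above.
coding = 19_165
non_coding = 8_245
small_nc, long_nc, misc_nc = 6_326, 3, 1_916
pseudogenes = 575
transcripts = 28_078

# The non-coding subcategories should sum to the non-coding total.
nc_sum = small_nc + long_nc + misc_nc
print(nc_sum == non_coding)  # True

# Assumption: transcripts are counted across all three gene classes.
total_genes = coding + non_coding + pseudogenes
transcripts_per_gene = transcripts / total_genes
print(total_genes)                     # 27985
print(round(transcripts_per_gene, 3))  # 1.003
```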

Other

Genscan gene predictions: 88,465
Short Variants: 0