
Macaque assembly and gene annotation


The Mmul_8.0.1 assembly was submitted by the Baylor College of Medicine in 2015. The assembly is on the chromosome level, consisting of 348,493 contigs assembled into 286,262 scaffolds/22 chromosomes.

The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The contig N50 is 107,172 bp, while the scaffold N50 is 4,193,270 bp. These large contig and scaffold N50 values have allowed a much more complete annotation of the rhesus macaque genome than the previous annotation on Mmul_1.0.
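The N50 definition above can be illustrated with a minimal sketch. The function name `n50` and the toy contig lengths are assumptions made for illustration; they are not part of the Ensembl pipeline or the Mmul_8.0.1 data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort the lengths from longest to shortest and accumulate them until
    the running total reaches at least half of the summed length. The
    length at which that happens is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical toy assembly: nine contigs, 54 bases in total.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # 10 + 9 + 8 = 27, half of 54, so this prints 8
```

Applied to the real contig or scaffold lengths of an assembly, the same calculation would give values such as those quoted above.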

The genome assembly represented here corresponds to GenBank Assembly ID GCA_000772875.3.

Gene annotation

The annotation of the protein-coding genes was carried out using three main techniques:

  • Rhesus macaque RNA-seq data from thirteen tissues.
  • Whole genome alignment against the human GRCh38 assembly followed by projection of GENCODE Basic protein-coding transcripts in regions of sufficient alignment conservation.
  • Splice-aware alignment of a subset of UniProt proteins to the Mmul_8.0.1 assembly.

The subset of UniProt proteins used was our 'primates basic' set. This consisted of the proteins from the following clades and protein existence (PE) levels:

  • Human PE level 1 & 2 proteins.
  • Other primates PE level 1, 2 & 3 proteins.
  • Mouse PE level 1 & 2 proteins.
  • Other mammals PE level 1 & 2 proteins.
  • Other vertebrates PE level 1 & 2 proteins.
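The selection rule in the list above can be written down as a small lookup, shown here as a sketch only. The clade labels, the `MAX_PE_LEVEL` table, and the function `in_primates_basic` are hypothetical names chosen for illustration; the source does not describe how the filter is actually implemented.

```python
# Highest protein existence (PE) level accepted for each clade,
# transcribed from the list above. Clade labels are hypothetical.
MAX_PE_LEVEL = {
    "human": 2,
    "other_primates": 3,
    "mouse": 2,
    "other_mammals": 2,
    "other_vertebrates": 2,
}


def in_primates_basic(clade, pe_level):
    """Return True if a protein of this clade and PE level is in the set."""
    max_level = MAX_PE_LEVEL.get(clade)
    # Unknown clades are excluded; PE levels start at 1.
    return max_level is not None and 1 <= pe_level <= max_level


print(in_primates_basic("other_primates", 3))  # PE 3 is accepted for other primates
print(in_primates_basic("human", 3))           # PE 3 is not accepted for human
```

Keeping the thresholds in one table makes the single exception (PE level 3 accepted only for non-human primates) easy to see.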

UTRs were obtained (where possible) from the RNA-seq data and alignments of RefSeq rhesus macaque cDNAs.

Small ncRNAs were obtained using a combination of BLAST and Infernal/RNAfold.

Long intergenic ncRNAs were annotated from transcripts produced during the RNA-seq pipeline that had either a poor BLAST hit or no BLAST hit to any UniProt vertebrate PE level 1 or 2 protein. These transcripts were then scanned for evidence of protein domains. If no evidence was found, the model was marked as lincRNA.

The annotation process is described in the document below.

More information

General information about this species can be found in Wikipedia.



Assembly: Mmul_8.0.1, INSDC Assembly GCA_000772875.3, Nov 2015
Base Pairs: 3,146,411,622
Golden Path Length: 3,236,224,332
Annotation provider: Ensembl
Annotation method: Full genebuild
Genebuild started: Feb 2016
Genebuild released: Oct 2016
Genebuild last updated/patched: Oct 2016
Database version: 94.801

Gene counts

Coding genes: 21,099
Non coding genes: 11,001
Small non coding genes: 5,995
Long non coding genes: 2,945
Misc non coding genes: 2,061
Gene transcripts: 56,748


Genscan gene predictions: 57,398
Short Variants: 53,041,736
Structural variants: 110

About this species