
Mouse assembly and gene annotation

Assembly

This site features the latest major assembly release for mouse. The primary assembly, GRCm38, was released by the Genome Reference Consortium in January 2012. It is based on Mus musculus strain C57BL/6J. This assembly is used by UCSC to create their mm10 database.

The GRCm38 primary assembly comprises 21 chromosomes and 22 unplaced scaffolds. As with the human genome assembly, the Genome Reference Consortium releases additional sequence for GRCm38 in the form of minor releases (patches).

To convert your data from the previous mouse assembly (NCBIM37) to GRCm38, click the 'Tools' link in the header bar of any page and select 'Assembly converter'.
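For scripted conversions, the Ensembl REST API offers a coordinate-mapping endpoint, `map/:species/:asm_one/:region/:asm_two`. The sketch below only builds the request URL and fetches the result. It assumes the public server at `https://rest.ensembl.org`, the assembly names `NCBIM37` and `GRCm38`, and a JSON response containing a `mappings` list. Check these assumptions against the current REST documentation. The helper names `assembly_map_url` and `map_region` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the public Ensembl REST server.
ENSEMBL_REST = "https://rest.ensembl.org"


def assembly_map_url(species: str, from_asm: str, region: str, to_asm: str) -> str:
    """Build a URL for the REST endpoint map/:species/:asm_one/:region/:asm_two.

    `region` uses the REST syntax 'chrom:start..end:strand', e.g. '1:1000000..1000100:1'.
    """
    # ':' and '.' are legal in URL paths, so keep them unescaped for readability.
    parts = ("map", species, from_asm, region, to_asm)
    path = "/".join(quote(p, safe=":.") for p in parts)
    return f"{ENSEMBL_REST}/{path}?content-type=application/json"


def map_region(species: str, from_asm: str, region: str, to_asm: str) -> list:
    """Fetch the mapped segments (requires network access)."""
    with urlopen(assembly_map_url(species, from_asm, region, to_asm), timeout=30) as resp:
        # Assumption: the JSON body holds a 'mappings' list of {'original', 'mapped'} pairs.
        return json.load(resp)["mappings"]


if __name__ == "__main__":
    print(assembly_map_url("mouse", "NCBIM37", "1:1000000..1000100:1", "GRCm38"))
```

A region can map to several segments in the new assembly, or to none. For that reason the helper returns the whole list instead of a single coordinate pair.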

Patches

As the GRC maintains and improves the mouse reference assembly, it releases patches. These patches do not change the coordinates of the primary assembly. For more information, please see our Genome Assemblies help document.

The genome assembly represented here corresponds to GenBank Assembly ID GCA_000001635.8.

Gene annotation

The mouse primary assembly GRCm38 was annotated using Ensembl's automatic annotation system. This includes an updated mouse-specific repeat library, RefSeq and UniProt protein sequence data for annotating the coding regions of protein-coding genes, and mouse cDNAs and ESTs for annotating the untranslated regions (UTRs) of protein-coding genes.

In the current release, we continue to display a joint gene set based on the merge between the automatic annotation from Ensembl and the manually curated annotation from Havana. See the statistics table below for the corresponding GENCODE version number. The Consensus Coding Sequence (CCDS) identifiers have also been mapped to the annotations. More information is available from the CCDS project.

Updated manual annotation from Havana is merged into the Ensembl annotation every release. Transcripts from the two annotation sources are merged if they share the same internal exon-intron boundaries (i.e. have identical splicing pattern) with slight differences in the terminal exons allowed. Importantly, all Havana transcripts are included in the final Ensembl/Havana merged (GENCODE) gene set.
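The merge criterion above can be expressed as a comparison of intron chains. Two multi-exon transcripts share a splicing pattern when their internal exon-intron boundaries match exactly, even if the outer ends of their first and last exons differ. The following is a simplified illustration, not Ensembl's actual merge code. The exon representation, the function names, and the optional `max_terminal_diff` tolerance are hypothetical stand-ins for "slight differences in the terminal exons".

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def intron_chain(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Return the internal exon-intron boundaries as (intron_start, intron_end) pairs."""
    ordered = sorted(exons)
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])]


def mergeable(t1: List[Exon], t2: List[Exon], max_terminal_diff: Optional[int] = None) -> bool:
    """Hypothetical check: identical intron chains, optionally with bounded terminal-exon differences."""
    c1, c2 = intron_chain(t1), intron_chain(t2)
    # Single-exon transcripts have no intron chain; this sketch never merges them.
    if not c1 or c1 != c2:
        return False
    if max_terminal_diff is None:
        return True  # any difference in the outer ends of terminal exons is tolerated
    start_diff = abs(min(s for s, _ in t1) - min(s for s, _ in t2))
    end_diff = abs(max(e for _, e in t1) - max(e for _, e in t2))
    return start_diff <= max_terminal_diff and end_diff <= max_terminal_diff


if __name__ == "__main__":
    havana = [(100, 200), (300, 400), (500, 600)]
    ensembl = [(150, 200), (300, 400), (500, 700)]  # same introns, different outer ends
    print(mergeable(havana, ensembl), mergeable(havana, ensembl, max_terminal_diff=50))
```

Comparing intron chains captures "identical splicing pattern" directly, because the chain ignores only the outer ends of the first and last exons.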

In addition to the gene set, we display alignments of mouse cDNA and EST sequences. The mouse cDNA alignments are updated for every Ensembl release. We also display alignments of sequences from UniProt, UniGene and the ENA vertebrate RNA collection, and ab initio gene predictions from Genscan.

HEROIC

Additional functional genomics data produced by the HEROIC project (High-throughput Epigenetic Regulatory Organisation In Chromatin) is available to download from the Ensembl Projects HEROIC portal.

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: GRCm38.p6 (Genome Reference Consortium Mouse Reference 38), INSDC Assembly GCA_000001635.8, Jan 2012
Base pairs: 3,486,944,526
Golden path length: 2,730,871,774
Annotation provider: Ensembl
Annotation method: Full genebuild
Genebuild started: Jan 2012
Genebuild released: Jul 2012
Genebuild last updated/patched: Jun 2018
Database version: 94.38
GENCODE version: GENCODE M19

Gene counts (Primary assembly)

Coding genes: 22,619 (incl. 268 readthrough)
Non-coding genes: 15,795
Small non-coding genes: 5,531
Long non-coding genes: 9,702 (incl. 69 readthrough)
Misc non-coding genes: 562
Pseudogenes: 12,958 (incl. 5 readthrough)
Gene transcripts: 137,862

Gene counts (Alternative sequence)

Coding genes: 350 (incl. 3 readthrough)
Non-coding genes: 227
Small non-coding genes: 110
Long non-coding genes: 111
Misc non-coding genes: 6
Pseudogenes: 194
Gene transcripts: 1,877
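In both gene count tables, the non-coding figure is the sum of the small, long, and misc non-coding subcategories. This can be checked with a few lines of arithmetic. The "total genes" value below is a derived sum of the three top-level categories (coding, non-coding, pseudogenes) and is not a figure reported by Ensembl. The readthrough numbers are subsets ("incl.") of their categories, so they are not added again. The dictionary keys and function names are illustrative.

```python
# Gene counts copied from the statistics tables (primary assembly and alternative sequence).
PRIMARY = {"coding": 22_619, "small_nc": 5_531, "long_nc": 9_702, "misc_nc": 562, "pseudo": 12_958}
ALTERNATIVE = {"coding": 350, "small_nc": 110, "long_nc": 111, "misc_nc": 6, "pseudo": 194}


def non_coding(counts: dict) -> int:
    """Non-coding genes = small + long + misc non-coding genes."""
    return counts["small_nc"] + counts["long_nc"] + counts["misc_nc"]


def total_genes(counts: dict) -> int:
    """Derived sum of coding, non-coding and pseudogenes (readthrough counts are already included)."""
    return counts["coding"] + non_coding(counts) + counts["pseudo"]


if __name__ == "__main__":
    print(non_coding(PRIMARY), total_genes(PRIMARY))          # 15795 51372
    print(non_coding(ALTERNATIVE), total_genes(ALTERNATIVE))  # 227 771
```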

Other

Genscan gene predictions: 57,381
Short variants: 83,761,978
Structural variants: 791,878
