Zebrafish assembly and gene annotation

Assembly

After 2.5 years of assembly curation, the GRC presents the new zebrafish reference genome assembly, GRCz11. This latest assembly has been refined by the addition of nearly 1,000 finished clone sequences and the resolution of more than 400 genome issues. GRCz11 shows a significant reduction in scaffold numbers and an increase in scaffold N50, whilst the overall genome size was not affected. For the first time in a zebrafish assembly, GRCz11 also features alternate loci scaffolds (ALT_REF_LOCI) that represent variant sequences. The alignments of the alternate loci scaffolds to the primary chromosomal path are also included in the GRCz11 assembly to provide the chromosome context for these alternate sequences.
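To make the scaffold N50 statistic concrete, here is a minimal sketch of how it is commonly computed: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the summed length. The function name `n50` and the toy lengths are assumptions made for illustration; they are not GRCz11 data and this is not the GRC's own tooling.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Invented toy scaffold lengths, NOT real assembly data.
fragmented = [40, 30, 20, 10, 10, 10]  # six scaffolds, total 120
merged = [70, 30, 20]                  # three scaffolds, same total 120

print(n50(fragmented), n50(merged))
```

In the toy data, joining scaffolds lowers the scaffold count and raises N50 while the total length stays the same, which is the pattern the paragraph above describes for GRCz11.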

More information about zebrafish research can be found at the Wellcome Trust Sanger Institute and GRC Zebrafish.

The genome assembly represented here corresponds to GenBank Assembly ID GCA_000002035.4.

Gene annotation

The Ensembl GRCz11 assembly was annotated using Ensembl's automatic annotation pipeline. Predictions from zebrafish proteins have been given priority over predictions from other non-mammalian vertebrate species. All UniProt proteins were filtered to remove predictions (PE level 3 and above). Aligned zebrafish cDNAs and zebrafish RNA-seq data have been used to add UTRs. RNA-seq data from embryonic and olfactory epithelium tissues were also used to produce gene models. Genes are named based on the alignment of their coding regions to known entries in public databases; ZFIN genes have priority in this process.
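The protein evidence filter can be sketched as follows. UniProt assigns each entry a protein existence (PE) level from 1 (evidence at protein level) to 5 (uncertain); removing level 3 and above leaves only entries with evidence at protein or transcript level. The `ProteinEntry` class, the `filter_by_evidence` function and the accessions are hypothetical names invented for this illustration and are not part of the Ensembl pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinEntry:
    accession: str  # placeholder identifier, not a real UniProt accession
    pe_level: int   # UniProt protein existence level, 1 to 5


# Levels 1 and 2 are kept; level 3 and above are removed, as stated in the text.
MAX_PE_LEVEL = 2


def filter_by_evidence(entries, max_pe_level=MAX_PE_LEVEL):
    """Keep entries whose PE level is at or below the cutoff."""
    return [e for e in entries if e.pe_level <= max_pe_level]


# Invented example entries, one per PE level.
entries = [
    ProteinEntry("PROT_A", 1),  # evidence at protein level
    ProteinEntry("PROT_B", 2),  # evidence at transcript level
    ProteinEntry("PROT_C", 3),  # inferred from homology
    ProteinEntry("PROT_D", 4),  # predicted
    ProteinEntry("PROT_E", 5),  # uncertain
]

kept = filter_by_evidence(entries)
print([e.accession for e in kept])
```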

The Ensembl annotations were then merged with Vega annotations at the transcript level. Transcripts were merged if they shared the same internal exon-intron boundaries (i.e. had an identical splicing pattern), with slight differences in the terminal exons allowed. Importantly, all Vega source transcripts (regardless of merge status) were included in the final merged gene set.
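The merge criterion can be illustrated by comparing intron chains: two transcripts share a splicing pattern when their introns have identical coordinates, regardless of where the first exon starts or the last exon ends. This is only a sketch under stated assumptions. The function names, the inclusive (start, end) exon tuples and the handling of single-exon transcripts are choices made for this example. The source does not quantify "slight" terminal differences, so the sketch ignores the outer exon ends entirely, and it assumes both transcripts lie on the same strand of the same sequence.

```python
def intron_chain(exons):
    """Return the introns implied by a list of (start, end) exons.

    Coordinates are inclusive. Each intron spans the gap between two
    consecutive exons, so the outer ends of the terminal exons do not
    appear in the result.
    """
    ordered = sorted(exons)
    return tuple(
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    )


def can_merge(exons_a, exons_b):
    """True if both transcripts have the same non-empty intron chain.

    Assumption: single-exon transcripts (no introns) are never merged here;
    the source does not say how the real pipeline treats them.
    """
    chain_a = intron_chain(exons_a)
    chain_b = intron_chain(exons_b)
    return len(chain_a) > 0 and chain_a == chain_b


# Invented coordinates for illustration only.
ensembl_tx = [(100, 200), (300, 400), (500, 650)]
vega_tx = [(120, 200), (300, 400), (500, 600)]   # differs only at the outer ends
other_tx = [(100, 200), (310, 400), (500, 650)]  # internal boundary differs

print(can_merge(ensembl_tx, vega_tx), can_merge(ensembl_tx, other_tx))
```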

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly: GRCz11 (Genome Reference Consortium Zebrafish Build 11), INSDC Assembly GCA_000002035.4, May 2017
Base pairs: 1,674,207,132
Golden path length: 1,373,471,384
Annotation provider: Ensembl
Annotation method: Full genebuild
Genebuild started: Aug 2017
Genebuild released: Mar 2018
Genebuild last updated/patched: Apr 2018
Database version: 94.11

Gene counts (Primary assembly)

Coding genes: 25,592 (including 47 readthrough)
Non-coding genes: 6,599
Small non-coding genes: 3,227
Long non-coding genes: 3,278 (including 6 readthrough)
Misc non-coding genes: 94
Pseudogenes: 315
Gene transcripts: 59,876

Gene counts (Alternative sequence)

Coding genes: 4,721
Gene transcripts: 6,029

Other

Genscan gene predictions: 50,550
Short variants: 17,297,641
Structural variants: 5,735

About this species