Zebrafish assembly and gene annotation

Assembly

After 2.5 years of assembly curation, the GRC presents the new zebrafish reference genome assembly, GRCz11. This latest assembly has been refined by the addition of nearly 1000 finished clone sequences and the resolution of more than 400 genome issues. Compared with the previous assembly, GRCz11 has significantly fewer scaffolds and a higher scaffold N50, whilst the overall genome size was not affected. For the first time in a zebrafish assembly, GRCz11 also features alternate loci scaffolds (ALT_REF_LOCI) that represent variant sequences. The alignments of the alternate loci scaffolds to the primary chromosomal path are also included in the GRCz11 assembly to provide the chromosome context for these alternate sequences.
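
Scaffold N50 is the length of the shortest scaffold among the longest scaffolds that together cover at least half of the assembly, so joining scaffolds raises N50 even when the total size stays the same. The following is a minimal sketch of that calculation; the function name and the toy lengths are illustrative assumptions and are not GRCz11 values.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy lengths (hypothetical, not taken from GRCz11); total size is 300.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(scaffold_n50(toy_lengths))  # 70

# Joining the two longest scaffolds keeps the total at 300 but raises N50.
joined_lengths = [150, 50, 40, 30, 20, 10]
print(scaffold_n50(joined_lengths))  # 150
```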

More information about zebrafish research can be found at the Wellcome Trust Sanger Institute and GRC Zebrafish.

Gene annotation

The GRCz11 assembly was annotated using Ensembl's automatic annotation pipeline. Predictions from zebrafish proteins have been given priority over predictions from other non-mammalian vertebrate species. All UniProt proteins were filtered to remove predictions (protein existence, PE, level 3 and above). Aligned zebrafish cDNAs and zebrafish RNA-seq data have been used to add UTRs. RNA-seq data from embryonic and olfactory epithelium tissues were also used to produce gene models. Genes are named based on the alignment of their coding regions to known entries in public databases; ZFIN genes have priority in this process.
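
The protein existence filter can be illustrated with a minimal sketch, assuming the input is a list of UniProtKB-style FASTA headers that carry a `PE=` field. The accessions and gene names below are hypothetical, the function name is invented for illustration, and this is not the code used by the Ensembl pipeline.

```python
import re

# UniProtKB FASTA headers carry the protein existence level as "PE=<1-5>".
PE_FIELD = re.compile(r"\bPE=([1-5])\b")


def keep_header(header, max_pe=2):
    """Keep entries with PE level 1 or 2, i.e. drop PE level 3 and above."""
    match = PE_FIELD.search(header)
    if match is None:
        # Assumption made for this sketch: entries without a PE field are dropped.
        return False
    return int(match.group(1)) <= max_pe


# Hypothetical headers, made up for illustration only.
headers = [
    ">tr|X00001|X00001_DANRE Example protein A OS=Danio rerio OX=7955 GN=exa PE=1 SV=1",
    ">tr|X00002|X00002_DANRE Example protein B OS=Danio rerio OX=7955 GN=exb PE=2 SV=1",
    ">tr|X00003|X00003_DANRE Example protein C OS=Danio rerio OX=7955 GN=exc PE=3 SV=1",
    ">tr|X00004|X00004_DANRE Example protein D OS=Danio rerio OX=7955 GN=exd PE=4 SV=1",
]

kept = [h for h in headers if keep_header(h)]
print(len(kept))  # 2
```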

The Ensembl annotations were then merged with Vega annotations at the transcript level. Transcripts were merged if they shared the same internal exon-intron boundaries (i.e. had identical splicing pattern) with slight differences in the terminal exons allowed. Importantly, all Vega source transcripts (regardless of merge status) were included in the final merged gene set.
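
The merge criterion can be sketched as a comparison of intron chains. The example below assumes each transcript is a list of (start, end) exon coordinates on the same strand of the same sequence; the function names and coordinates are hypothetical. It ignores the outer ends of the terminal exons entirely, whereas the merge described above allows only slight differences there, and the size of that tolerance is not stated here.

```python
def intron_chain(exons):
    """Return the internal exon-intron boundaries as (exon end, next exon start) pairs."""
    ordered = sorted(exons)
    return [(left[1], right[0]) for left, right in zip(ordered, ordered[1:])]


def same_splicing_pattern(exons_a, exons_b):
    """True if both transcripts are spliced and share every internal boundary."""
    chain_a = intron_chain(exons_a)
    chain_b = intron_chain(exons_b)
    # Assumption made for this sketch: single-exon transcripts are never merged,
    # because they have no internal boundaries to compare.
    return bool(chain_a) and chain_a == chain_b


# Hypothetical exon coordinates, for illustration only.
ensembl_model = [(100, 200), (300, 400), (500, 650)]
vega_model = [(120, 200), (300, 400), (500, 600)]     # differs only at the outer ends
shifted_model = [(100, 200), (310, 400), (500, 650)]  # internal boundary differs

print(same_splicing_pattern(ensembl_model, vega_model))     # True
print(same_splicing_pattern(ensembl_model, shifted_model))  # False
```

A transcript pair that fails this test is simply not merged; as stated above, every Vega source transcript is retained in the final gene set either way.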

Detailed information on genebuild (PDF)

More information

General information about this species can be found in Wikipedia.

Statistics

Summary

Assembly	GRCz11 (Genome Reference Consortium Zebrafish Build 11), INSDC Assembly GCA_000002035.4, May 2017
Base Pairs	1,373,471,384
Golden Path Length	1,373,471,384
Annotation provider	Ensembl
Annotation method	Full genebuild
Genebuild started	Aug 2017
Genebuild released	Mar 2018
Genebuild last updated/patched	Apr 2018
Database version	111.11

Gene counts (Primary assembly)

Coding genes	25,545 (excl 47 readthrough)
Non coding genes	6,599
Small non coding genes	3,227
Long non coding genes	3,278
Misc non coding genes	94
Pseudogenes	315
Gene transcripts	59,876

Coding gene: a gene/transcript that contains an open reading frame (ORF).
Readthrough: a readthrough transcript has exons that overlap exons from transcripts belonging to two or more different loci (in addition to the locus to which the readthrough transcript itself belongs).
Pseudogene: a gene that has homology to known protein-coding genes but contains a frameshift and/or stop codon(s) that disrupt the ORF. Thought to have arisen through duplication followed by loss of function.
Gene transcript: a transcript is the operational unit of a gene. In a genomic context, transcripts consist of one or more exons, with adjoining exons being separated by introns. The exons and introns are transcribed and then the introns are spliced out. Transcripts may or may not encode a protein.

Gene counts (Alternative sequence)

Coding genes	4,721
Gene transcripts	6,029

Other

Genscan gene predictions	50,550
Short Variants	18,225,999
Structural variants	5,735
