
Sheep assembly and gene annotation


The sheep (Ovis aries) genome was produced by the International Sheep Genome Consortium (ISGC). A single Texel ewe and a single Texel ram were sequenced using Illumina technology. The assembly is based on the Texel ewe data set; the Texel ram data set and Roche 454 reads from the previous assembly, v1.0 (ACIV000000000), were used to fill gaps. A total of 39,042 SNP markers and Ovine SNP50 genotyping linkage data were used to check scaffold integrity and to anchor scaffolds and super-scaffolds to chromosomes.

The assembly comprises 5,697 toplevel sequences, among them 27 chromosomes (including the X chromosome), assembled from 130,765 contigs. The contig N50 is 40.4 kb and the scaffold N50 is 100.1 Mb. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer.
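The N50 definition above can be turned into a short calculation: sort the block lengths from longest to shortest and accumulate them until at least half of the total length is covered. The following is a minimal Python sketch; the function name `n50` is our own and is not part of any Ensembl tool.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence (contig or scaffold) lengths.

    N50 is the largest length L such that blocks of length >= L together
    cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest block down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first block at which half the total is reached.
        if 2 * running >= total:
            return length


# Toy example: total = 54; 10 + 9 + 8 = 27 reaches half, so N50 = 8.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Applied to contig lengths this gives the contig N50, and applied to scaffold lengths it gives the scaffold N50.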

Gene annotation

The gene set was built using a mixed approach. The similarity pipeline generated 66,797 models from orthologous vertebrate proteins in UniProtKB. The RNASeq pipeline used 8.2 billion paired-end reads provided by the ISGC. The RNASeq data set contains samples from a trio (ram, ewe and lamb), 7 tissue types from the reference sheep and samples from different breeds.

We pooled the tissues to avoid creating too many fragmented models. Using the RNASeq pipeline, we created 19,604 models from the pooled set. Where a pooled model was missing but the tissue models agreed on a consensus, the consensus model was added to the pooled set, bringing the number of RNASeq models to 25,832. By combining the orthologous set, the RNASeq set and the output of our ncRNA pipeline, we built the final gene set: 20,921 protein-coding gene models, 291 pseudogenes and 3,985 short non-coding RNAs.
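The gap-filling step described above (adding a tissue consensus model where the pooled run produced nothing) can be illustrated with a small sketch. The text does not say how a "consensus" is defined, so everything here is an assumption for illustration only: the `Model` type, the rule that a consensus means an identical model (same region, span and intron chain) seen in at least `min_support` tissues, and the rule that a pooled model is "missing" when no pooled model overlaps the locus. This is not the Ensembl pipeline's code.

```python
from collections import Counter
from typing import NamedTuple, Tuple


class Model(NamedTuple):
    """Hypothetical minimal transcript model used only for this sketch."""
    seq_region: str
    start: int
    end: int
    introns: Tuple[Tuple[int, int], ...]  # ordered (intron_start, intron_end) pairs


def overlaps(a: Model, b: Model) -> bool:
    """True if both models are on the same sequence and their spans overlap."""
    return a.seq_region == b.seq_region and a.start <= b.end and b.start <= a.end


def add_consensus_models(pooled, tissue_sets, min_support=2):
    """Return the pooled models plus tissue consensus models at uncovered loci.

    Assumed rules: a model is a consensus if an identical model appears in at
    least `min_support` tissue sets, and it is added only if no pooled model
    overlaps it (i.e. the pooled model is missing at that locus).
    """
    # Count each distinct model at most once per tissue.
    support = Counter(m for models in tissue_sets for m in set(models))
    result = list(pooled)
    for model, n_tissues in support.items():
        if n_tissues >= min_support and not any(overlaps(model, p) for p in pooled):
            result.append(model)
    return result


# Usage with toy coordinates (hypothetical data).
pooled = [Model("1", 100, 900, ((200, 300),))]
covered = Model("1", 150, 850, ((200, 300),))        # locus already has a pooled model
gap = Model("2", 10, 500, ((100, 200), (250, 300)))  # locus missing from the pooled set
single = Model("3", 1, 50, ())                        # supported by one tissue only
merged = add_consensus_models(pooled, [[covered, gap], [covered, gap], [single]])
print(merged)  # the pooled model followed by `gap`
```

Only `gap` is added: `covered` has enough support but overlaps an existing pooled model, and `single` is seen in just one tissue.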

RNASeq data set

In addition to the main set, we predicted gene models for each tissue type using the RNASeq pipeline. To confirm the open reading frames (ORFs), we ran BLASTp on these models against UniProt proteins with protein existence (PE) level 1 or 2. The best BLAST hit is displayed as transcript supporting evidence.
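The ORF check above amounts to keeping, for each tissue model, the best BLASTp hit among UniProt entries with PE level 1 or 2. A minimal Python sketch follows, assuming BLASTp was run with tabular output (`-outfmt 6`, the 12 default columns) and that a set of PE 1/2 subject identifiers has been prepared beforehand. The function name, the identifier format and ranking by bit score are our assumptions, not a description of the Ensembl pipeline.

```python
def best_supporting_hits(blast_lines, allowed_subjects):
    """Return {query: (subject, evalue, bitscore)} for the best allowed hit.

    blast_lines: BLAST tabular output (-outfmt 6), one hit per line with the
        12 default columns: qseqid sseqid pident length mismatch gapopen
        qstart qend sstart send evalue bitscore.
    allowed_subjects: identifiers of UniProt entries assumed to have
        protein existence level 1 or 2 (hypothetical, prepared beforehand).
    """
    best = {}
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if subject not in allowed_subjects:
            continue  # only PE 1/2 proteins count as ORF support
        current = best.get(query)
        # Rank hits by bit score; on ties, keep the first hit seen.
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Usage with made-up identifiers: sp|P2|B is treated as lacking PE 1/2 evidence.
hits = [
    "model_1\tsp|P1|A\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-100\t550.0",
    "model_1\tsp|P2|B\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-120\t600.0",
    "model_2\tsp|P3|C\t40.0\t100\t60\t2\t1\t100\t5\t104\t1e-5\t50.0",
]
print(best_supporting_hits(hits, {"sp|P1|A", "sp|P3|C"}))
```

Filtering by PE level before ranking means that a higher-scoring hit to a less well-evidenced protein (here `sp|P2|B`) is never chosen as supporting evidence.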

The tissue-specific sets of transcript models built using our RNASeq pipeline are as follows:

Tissue                        Number of gene models
Ewe kidney medulla            8350
Ewe abomasum                  8694
Ewe adrenal gland             8447
Ewe alveolar macrophages      7747
Ewe cerebellum                7714
Ewe cervix                    6305
Ewe colon                     9576
Ewe corpus luteum             8096
Ewe heart ventricle           7256
Ewe liver                     8301
Ewe lung                      9402
Ewe lymph node mesenteric     8340
Ewe mammary gland             9466
Ewe muscle biceps             7650
Ewe muscle long dorsal        7185
Ewe omentum                   8081
Ewe ovary                     8077
Ewe peyers patch              9227
Ewe pituitary                 7957
Ewe placenta membranes        7703
Ewe rectum                    8980
Ewe rumen                     8856
Ewe skin side                 8795
Ewe thyroid gland             8167
Ewe uterus                    9071
Lamb abomasum                 8606
Lamb adrenal gland            7813
Lamb caecum                   7770
Lamb cerebellum               8405
Lamb cerebrum                 8977
Lamb cervix                   8924
Lamb colon                    8310
Lamb hypothalamus             9009
Lamb kidney cortex            8416
Lamb kidney medulla           9115
Lamb lung                     9029
Lamb lymph node mesenteric    8315
Lamb lymph node prescapular   8913
Lamb mammary gland            9924
Lamb muscle biceps            7633
Lamb muscle long dorsal       7155
Lamb omentum                  8085
Lamb ovarian follicles        8026
Lamb ovary                    8729
Lamb peyers patch             9758
Lamb pituitary gland          8702
Lamb rectum                   8546
Lamb rumen                    8655
Lamb skin back                8629
Lamb spleen                   8444
Lamb thyroid gland            8307
Lamb uterus                   9259
Lamb ventricle                8175
Ram kidney cortex             8398
Ram kidney medulla            8754
Ram abomasum mucosa           9003
Ram adrenal gland             8116
Ram alveolar macrophages      7676
Ram brain stem                9277
Ram caecum                    9616
Ram cerebellum                8662
Ram cerebrum                  9034
Ram colon                     8839
Ram duodenum                  9068
Ram hypothalamus              9153
Ram liver                     7674
Ram lung                      9041
Ram lymph node mesenteric     7857
Ram lymph node prescapular    7018
Ram muscle biceps             6392
Ram muscle long dorsal        6073
Ram omentum                   8640
Ram pituitary gland           8627
Ram rectum                    8925
Ram rumen                     8519
Ram skin back                 8631
Ram spleen                    8619
Ram testes epididymis         8805
Ram testes                    10891
Ram thyroid gland             7817
Ram tonsil                    8477
Ram ventricle                 8058
Whole embryo                  10965
Reference kidney              6422
Reference brain               5737
Reference heart               5379
Reference liver               5396
Reference lung                7487
Reference ovarian             7112
Reference white adipose       6755
Merino skin                   5418

More information

General information about this species can be found in Wikipedia.



Assembly                        Oar_v3.1, INSDC Assembly GCA_000298735.1, Aug 2012
Base Pairs                      2,534,344,180
Golden Path Length              2,619,054,388
Annotation provider             Ensembl
Annotation method               Mixed strategy build
Genebuild started               Dec 2012
Genebuild released              Dec 2013
Genebuild last updated/patched  May 2015
Database version                100.31

Gene counts

Coding genes            20,921
Non coding genes        5,843
Small non coding genes  3,624
Long non coding genes   1,858
Misc non coding genes   361
Gene transcripts        29,118
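The non-coding total above appears to be the sum of the three non-coding subcategories listed beneath it; the reported figures are consistent with that reading:

```latex
5{,}843 = \underbrace{3{,}624}_{\text{small}} + \underbrace{1{,}858}_{\text{long}} + \underbrace{361}_{\text{misc}}
```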


Genscan gene predictions  43,449
Short Variants            60,323,418
Structural variants       2
