MANE (Matched Annotation from NCBI and EMBL-EBI)
Accurate annotation of the human genome is essential for genomics research and clinical applications. RefSeq (NCBI) and Ensembl/GENCODE (led by EMBL-EBI) independently produce human gene annotation. Since 2005, the joint Consensus Coding Sequence (CCDS) project has defined over 30,000 coding sequences (CDS) for 95% of coding genes. However, the large number of alternatively spliced transcripts and the lack of a standardised default transcript displayed across resources present challenges, especially in the clinical context. Building on our past collaboration, we have launched a new initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) project, to jointly converge on a high-confidence, genome-wide transcript set.
During phase 1, we will release the MANE Select transcript set, which includes one well-supported transcript per protein-coding locus. Every transcript in the set will align perfectly to the GRCh38 reference assembly, and the Ensembl (ENST) transcript and its corresponding RefSeq (NM) transcript will be 100% identical across the 5’UTR, CDS and 3’UTR. MANE Select transcripts are identified using independent computational methods, complemented by manual review and discussion. The methods use evidence of functional potential such as expression levels, evolutionary conservation and clinical significance. Transcript ends are defined using CAGE data from the FANTOM consortium and polyA site data from conventional and next-generation sequencing.
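To illustrate the 100% identity criterion, the short Python sketch below checks whether an Ensembl transcript and a RefSeq transcript describe the same molecule. It is not the pipeline used by either group. It assumes both transcripts align perfectly to GRCh38, so that identical exon coordinates and identical CDS boundaries imply identical 5’UTR, CDS and 3’UTR sequence. The `Transcript` class, the `is_matched_pair` function, the accessions and the coordinates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Transcript:
    """Hypothetical, minimal transcript model on GRCh38 (1-based, inclusive)."""
    accession: str                      # placeholder ID, not a real ENST/NM accession
    chrom: str
    strand: str                         # "+" or "-"
    exons: Tuple[Tuple[int, int], ...]  # (start, end) genomic coordinates per exon
    cds_start: int                      # lowest genomic coordinate of the CDS
    cds_end: int                        # highest genomic coordinate of the CDS


def is_matched_pair(ensembl: Transcript, refseq: Transcript) -> bool:
    """Return True if both transcripts share strand, exon structure and CDS bounds.

    Assumption: both transcripts align perfectly to the same assembly, so equal
    coordinates imply equal sequence in the 5'UTR, CDS and 3'UTR.
    """
    return (
        ensembl.chrom == refseq.chrom
        and ensembl.strand == refseq.strand
        and sorted(ensembl.exons) == sorted(refseq.exons)
        and (ensembl.cds_start, ensembl.cds_end)
        == (refseq.cds_start, refseq.cds_end)
    )


# Made-up example data: two identical models and one with a shorter last exon.
ens = Transcript("ENST_EXAMPLE.1", "chr1", "+", ((100, 200), (300, 450)), 150, 400)
ref = Transcript("NM_EXAMPLE.1", "chr1", "+", ((100, 200), (300, 450)), 150, 400)
ref_short_utr = Transcript("NM_EXAMPLE.2", "chr1", "+", ((100, 200), (300, 420)), 150, 400)

matched = is_matched_pair(ens, ref)                  # same exons and CDS bounds
mismatched = is_matched_pair(ens, ref_short_utr)     # 3'UTR end differs
```

In this sketch the third transcript has the same CDS as the first but a shorter 3’UTR, so it does not qualify as a match; identity of the coding sequence alone is not sufficient under the criterion described above.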
The MANE project is being carried out only for human genes on GRCh38. There is no plan to extend it to other species or to retroactively add these data to our archived GRCh37 gene annotation.