Biotypes

  • Biotype: A gene or transcript classification.
    • IG gene: Immunoglobulin gene that undergoes somatic recombination, annotated in collaboration with IMGT http://www.imgt.org/.
      • IG C gene: Constant chain immunoglobulin gene that undergoes somatic recombination before transcription.
      • IG D gene: Diversity chain immunoglobulin gene that undergoes somatic recombination before transcription.
      • IG J gene: Joining chain immunoglobulin gene that undergoes somatic recombination before transcription.
      • IG V gene: Variable chain immunoglobulin gene that undergoes somatic recombination before transcription.
    • Nonsense Mediated Decay: A transcript with a premature stop codon considered likely to be subjected to targeted degradation. Nonsense-Mediated Decay is predicted to be triggered where the in-frame termination codon is found more than 50bp upstream of the final splice junction.
    • Processed transcript: Gene/transcript that doesn't contain an open reading frame (ORF).
      • Long non-coding RNA (lncRNA): A non-coding gene/transcript >200bp in length
        • 3' overlapping ncRNA: Transcripts where ditag and/or published experimental data strongly support the existence of long (>200bp) non-coding transcripts that overlap the 3'UTR of a protein-coding locus on the same strand.
        • Antisense: Transcripts that overlap the genomic span (i.e. exons or introns) of a protein-coding locus on the opposite strand.
        • Macro lncRNA: Unspliced lncRNAs that are several kb in size.
        • Non coding: Transcripts which are known from the literature to not be protein coding.
        • Retained intron: An alternatively spliced transcript believed to contain intronic sequence relative to other, coding, transcripts of the same gene.
        • Sense intronic: A long non-coding transcript in introns of a coding gene that does not overlap any exons.
        • Sense overlapping: A long non-coding transcript that contains a coding gene in its intron on the same strand.
        • lincRNA (long intergenic ncRNA): Long intergenic non-coding RNA loci with a length >200bp. These require a lack of coding potential and may not be conserved between species.
      • ncRNA: A non-coding gene.
        • miRNA: A small RNA (~22bp) that silences the expression of target mRNA.
        • miscRNA: Miscellaneous RNA. A non-coding RNA that cannot be classified.
        • piRNA: An RNA that interacts with piwi proteins involved in genetic silencing.
        • rRNA: The RNA component of a ribosome.
        • siRNA: A small RNA (20-25bp) that silences the expression of target mRNA through the RNAi pathway.
        • snRNA: Small RNA molecules that are found in the cell nucleus and are involved in the processing of pre-messenger RNAs.
        • snoRNA: Small RNA molecules that are found in the cell nucleolus and are involved in the post-transcriptional modification of other RNAs.
        • tRNA: A transfer RNA, which acts as an adaptor molecule for translation of mRNA.
        • vaultRNA: Short non-coding RNA genes that form part of the vault ribonucleoprotein complex.
    • Protein coding: Gene/transcript that contains an open reading frame (ORF).
    • Protein coding CDS not defined: Alternatively spliced transcript of a protein coding gene for which we cannot define a CDS.
    • Protein coding LOF: Not translated in the reference genome owing to a SNP/DIP but in other individuals/haplotypes/strains the transcript is translated. Replaces the polymorphic_pseudogene transcript biotype.
    • Pseudogene: A gene that has homology to known protein-coding genes but contains a frameshift and/or stop codon(s) that disrupt the ORF. Thought to have arisen through duplication followed by loss of function.
      • IG pseudogene: Inactivated immunoglobulin gene.
      • Polymorphic pseudogene: Pseudogene owing to a SNP/indel but in other individuals/haplotypes/strains the gene is translated.
      • Processed pseudogene: Pseudogene that lacks introns and is thought to arise from reverse transcription of mRNA followed by reinsertion of DNA into the genome.
      • Transcribed pseudogene: Pseudogene where protein homology or genomic structure indicates a pseudogene, but the presence of locus-specific transcripts indicates expression. These can be classified into 'Processed', 'Unprocessed' and 'Unitary'.
      • Translated pseudogene: Pseudogenes that have mass spec data suggesting that they are also translated. These can be classified into 'Processed' and 'Unprocessed'.
      • Unitary pseudogene: A species-specific unprocessed pseudogene without a parent gene, as it has an active orthologue in another species.
      • Unprocessed pseudogene: Pseudogene that can contain introns, since it is produced by gene duplication.
    • Readthrough: A readthrough transcript has exons that overlap exons from transcripts belonging to two or more different loci (in addition to the locus to which the readthrough transcript itself belongs).
    • Stop codon readthrough: The coding sequence contains a stop codon that is translated (as supported by experimental evidence), and termination occurs instead at a canonical stop codon further downstream. It is currently unknown which codon is used to replace the translated stop codon, hence it is represented by 'X' in the protein sequence.
    • TEC (To be Experimentally Confirmed): Regions with EST clusters that have polyA features that could indicate the presence of protein coding genes. These require experimental validation, either by 5' RACE or RT-PCR to extend the transcripts, or by confirming expression of the putatively-encoded peptide with specific antibodies.
    • TR gene: T cell receptor gene that undergoes somatic recombination, annotated in collaboration with IMGT http://www.imgt.org/.
      • TR C gene: Constant chain T cell receptor gene that undergoes somatic recombination before transcription.
      • TR D gene: Diversity chain T cell receptor gene that undergoes somatic recombination before transcription.
      • TR J gene: Joining chain T cell receptor gene that undergoes somatic recombination before transcription.
      • TR V gene: Variable chain T cell receptor gene that undergoes somatic recombination before transcription.
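
The Nonsense Mediated Decay entry above gives a positional rule: the in-frame termination codon lies more than 50bp upstream of the final splice junction. The following is a minimal sketch of that rule only, not the annotation pipeline's actual implementation. The function names, the representation of a transcript as a list of exon lengths in transcript order, and the use of a 1-based transcript coordinate for the last base of the stop codon are all assumptions made for illustration.

```python
from typing import Sequence

# Threshold taken from the glossary entry: "more than 50bp upstream".
NMD_DISTANCE_THRESHOLD_BP = 50


def final_junction_offset(exon_lengths: Sequence[int]) -> int:
    """Return the number of transcript bases upstream of the final splice junction.

    exon_lengths is a hypothetical input: exon lengths in 5'-to-3' transcript order.
    """
    if len(exon_lengths) < 2:
        raise ValueError("a transcript with fewer than two exons has no splice junction")
    # Everything except the last exon lies upstream of the final junction.
    return sum(exon_lengths[:-1])


def is_predicted_nmd(exon_lengths: Sequence[int], stop_codon_end: int) -> bool:
    """Apply the 50bp rule to one transcript.

    stop_codon_end is assumed to be the 1-based transcript position of the
    last base of the in-frame termination codon.
    """
    if len(exon_lengths) < 2:
        # No splice junction, so the rule cannot be triggered.
        return False
    distance = final_junction_offset(exon_lengths) - stop_codon_end
    # A negative distance means the stop codon sits in the final exon.
    return distance > NMD_DISTANCE_THRESHOLD_BP
```

With exons of 100, 200 and 150 bases, the final junction follows base 300. Under these assumptions a stop codon ending at base 240 is 60bp upstream and is flagged, one ending at base 250 is exactly 50bp upstream and is not, and one ending at base 320 lies in the final exon and is not.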