GFF3 File Format - Definition and supported options

The General Feature Format (GFF) consists of one line per feature, each containing 9 columns of data, plus optional track definition lines. The following documentation is based on the Version 3 specification (GFF3).

Fields

The first line of a GFF3 file must be a comment that identifies the version, e.g.

##gff-version 3

Fields must be tab-separated. All but the final field in each feature line must contain a value; "empty" columns should be denoted with a '.'.
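The rules above (version comment first, tab-separated fields, '.' for empty columns) can be checked mechanically before uploading a file. The following is a minimal sketch, not Ensembl's own validator; the function name `check_gff3_lines` is hypothetical, and skipping lines that begin with `track ` or `browser ` is an assumption about how track definition lines look.

```python
def check_gff3_lines(lines):
    """Return a list of (line_number, message) problems found in GFF3 text lines."""
    problems = []

    # The first line must be the version comment.
    if not lines or not lines[0].startswith("##gff-version 3"):
        problems.append((1, "first line must be '##gff-version 3'"))

    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines, comments and directives.
        if not line or line.startswith("#"):
            continue
        # Assumption: track definition lines start with 'track ' or 'browser '.
        if line.startswith(("track ", "browser ")):
            continue

        columns = line.split("\t")
        if len(columns) != 9:
            problems.append(
                (number, f"expected 9 tab-separated columns, found {len(columns)}")
            )
            continue

        # Every column except the last (attributes) must contain a value.
        for index, value in enumerate(columns[:8], start=1):
            if value == "":
                problems.append((number, f"column {index} is empty; use '.'"))

    return problems
```

A line that was pasted with spaces instead of tabs is reported as having a single column, which is the most common cause of an upload being rejected under these rules.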

  1. seqid - name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix. Important note: the seq ID must be one used within Ensembl, i.e. a standard chromosome name or an Ensembl identifier such as a scaffold ID, without any additional content such as species or assembly. See the example GFF output below.
  2. source - name of the program that generated this feature, or the data source (database or project name)
  3. type - type of feature. Must be a term or accession from the SOFA sequence ontology
  4. start - Start position of the feature, with sequence numbering starting at 1.
  5. end - End position of the feature, with sequence numbering starting at 1.
  6. score - A floating point value.
  7. strand - defined as + (forward) or - (reverse).
  8. phase - One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and '2' that the third base is the first base of a codon.
  9. attributes - A semicolon-separated list of tag-value pairs, providing additional information about each feature. Some of these tags are predefined, e.g. ID, Name, Alias, Parent - see the GFF documentation for more details.
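To make the nine columns concrete, here is a minimal sketch of how one feature line could be split into typed fields. The `Feature` class and `parse_feature` function are hypothetical names introduced for illustration and are not part of any Ensembl API. The sketch treats '.' as "no value" and decodes percent-escaped characters in attribute values, as GFF3 uses URL-style escaping.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import unquote


@dataclass
class Feature:
    seqid: str
    source: Optional[str]
    type: str
    start: int                 # 1-based start position
    end: int                   # 1-based end position
    score: Optional[float]
    strand: Optional[str]      # '+' or '-', None if given as '.'
    phase: Optional[int]       # 0, 1 or 2, None if given as '.'
    attributes: Dict[str, str] = field(default_factory=dict)


def parse_feature(line):
    """Parse one tab-separated GFF3 feature line into a Feature."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 columns, found {len(columns)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = columns

    # Column 9: semicolon-separated tag=value pairs.
    attributes = {}
    for pair in attrs.split(";"):
        if not pair.strip():
            continue
        tag, _, value = pair.partition("=")
        attributes[tag.strip()] = unquote(value)

    return Feature(
        seqid=seqid,
        source=None if source == "." else source,
        type=ftype,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=None if strand == "." else strand,
        phase=None if phase == "." else int(phase),
        attributes=attributes,
    )
```

For example, `parse_feature("ctg123\t.\tmRNA\t1300\t9000\t.\t+\t.\tID=mrna0001;Name=sonichedgehog")` yields a feature whose `start` is the integer 1300 and whose `attributes` map `ID` to `mrna0001`.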

Note that where the attributes contain Parent identifiers, these will be used by Ensembl to display the features as joined blocks.

##gff-version 3
ctg123 . mRNA            1300  9000  .  +  .  ID=mrna0001;Name=sonichedgehog
ctg123 . exon            1300  1500  .  +  .  ID=exon00001;Parent=mrna0001
ctg123 . exon            1050  1500  .  +  .  ID=exon00002;Parent=mrna0001
ctg123 . exon            3000  3902  .  +  .  ID=exon00003;Parent=mrna0001
ctg123 . exon            5000  5500  .  +  .  ID=exon00004;Parent=mrna0001
ctg123 . exon            7000  9000  .  +  .  ID=exon00005;Parent=mrna0001
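The grouping implied by the Parent attribute can be illustrated with a short sketch that collects the child features of each parent into an ordered list of blocks. This only illustrates the idea; it is not how Ensembl implements its display, and the function name `blocks_by_parent` is hypothetical. It assumes tab-separated input and allows a comma-separated list of parents in a single Parent attribute, as the GFF3 specification permits.

```python
from collections import defaultdict


def blocks_by_parent(lines):
    """Map each Parent ID to the sorted (start, end) spans of its children."""
    blocks = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a feature line

        start, end = int(columns[3]), int(columns[4])
        attributes = dict(
            pair.split("=", 1) for pair in columns[8].split(";") if "=" in pair
        )
        # A feature may list several parents, separated by commas.
        for parent in attributes.get("Parent", "").split(","):
            if parent:
                blocks[parent].append((start, end))

    return {parent: sorted(spans) for parent, spans in blocks.items()}


example = [
    "##gff-version 3",
    "ctg123\t.\tmRNA\t1300\t9000\t.\t+\t.\tID=mrna0001;Name=sonichedgehog",
    "ctg123\t.\texon\t1300\t1500\t.\t+\t.\tID=exon00001;Parent=mrna0001",
    "ctg123\t.\texon\t1050\t1500\t.\t+\t.\tID=exon00002;Parent=mrna0001",
    "ctg123\t.\texon\t3000\t3902\t.\t+\t.\tID=exon00003;Parent=mrna0001",
    "ctg123\t.\texon\t5000\t5500\t.\t+\t.\tID=exon00004;Parent=mrna0001",
    "ctg123\t.\texon\t7000\t9000\t.\t+\t.\tID=exon00005;Parent=mrna0001",
]
print(blocks_by_parent(example))
```

The mRNA line itself has no Parent, so it does not appear as a block; the five exons are returned under the key `mrna0001`, ordered by start position.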

More information

For more information about this file format, see the documentation on the GMOD wiki.
