Plant Ontology: Principles and Rationales
Objectives
The main objective of the Plant Ontology (PO) project is to create a set of precisely defined terms that can be uniformly applied to describe the anatomy, morphology, and development stages of all plants, providing a semantic framework for meaningful cross-species queries across databases. In order to make meaningful queries, the terms themselves must be organized in a way that reflects their known biological relationships. A unified ontology for all plants can integrate existing species-specific vocabulary terms and will facilitate functional annotation efforts, such as the annotation of gene expression data and phenotypes.
The original task of the Plant Ontology Consortium (POC) was to efficiently integrate the diverse vocabularies used to describe Arabidopsis, maize, and rice anatomy, morphology, and development. Thus, the first version of the PO spanned two major taxonomic divisions: monocots and dicots. Recent revisions of the PO have extended this controlled vocabulary to encompass not only other angiosperm families (such as Fabaceae and Solanaceae), but also groups such as gymnosperms, pteridophytes, bryophytes, and even algae.
It is important to emphasize that an ontology is not just an extensive collection of botanical terms, but rather a complex hierarchical structure in which botanical concepts are described by their meaning and by their relationships to each other. While the educational aspects of earlier versions of the PO were to some extent limited by the software available, the current AmiGO web browser allows easy access to the PO by novice users. Work is underway to include links to images for many of the terms, making the PO a valuable educational tool.
Organizing principles and rationales
Plant descriptors (e.g., gynoecium, leaf, flowering stage) are often common English words that have been applied with varying degrees of precision; the same term can be applied to quite different structures (e.g., calyptra in mosses and eucalypts), or conversely, different terms can be applied to similar structures (e.g., leaf, needle). The Plant Ontology represents a major step toward a unified vocabulary for all plants. While conflicts and exceptions are inevitable in an ontology with such a broad scope, ontology practices, such as the use of synonyms, taxonomic restrictions, and subsets, can resolve many conflicts. The current version of the PO is a working solution, so that annotation of genes and phenotypes can proceed, but it is a work in progress and will need to be continuously updated and refined as our knowledge of biology grows.
The PO is a candidate ontology in the Open Biomedical Ontologies (OBO) Foundry, and the POC follows OBO Foundry principles. These include principles for ontology management (e.g., appoint a person responsible for liaison with the OBO Foundry, provide a tracker for additions and corrections, provide a help desk for inquiries), principles for collaboration with the developers of other ontologies (e.g., reuse terms from other ontologies whenever possible), and principles pertaining to specific aspects of developing the ontology files (e.g., version tracking).
General considerations
The following principles were adopted by PO developers:
- To create a biologically accurate ontology, while at the same time keeping in mind its applications (i.e., annotations and query results) as a driving force. We have come to the realization that the practical use of the ontology, for both annotation and querying, will in many ways dictate its structure.
- To keep the ontology simple, we feel it is important to avoid the tendency to be too inclusive, which leads to a massive proliferation of terms. Such over-population of terms defeats the purpose of having a simple, broadly applicable ontology. In keeping with this, many terms that can be post-composed by combining a superclass and a part_of relation have not been added to the PO, unless specifically requested by a user.
- High-level nodes (for instance, inflorescence) must remain general to encompass the enormous morphological diversity of these structures in plants.
- Avoid creation of 'species-specific' terminology. Even if a structure or growth stage is known to occur in only a limited set of taxa, rather than create a species-specific name, we choose a more general name based on structural or positional characteristics (e.g., spore capsule calyptra, rather than moss calyptra) and take advantage of the synonymy and filtering options available in the ontology browsing software, which allow for species-specific queries.
- Reuse existing tools and resources as much as possible. For the most part, we have adopted the current structure of the Gene Ontology (GO), as well as the software tools developed by the Gene Ontology Consortium (GOC). All of the top-level terms in the PO are defined with references to other ontologies (primarily the Common Anatomy Reference Ontology, CARO, but also the GO).
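
The post-composition mentioned in the list above can be illustrated with a minimal sketch: rather than adding a dedicated term to the ontology, an annotator combines an existing superclass with a part_of relation to another existing class. The class name `PostComposed`, the helper `post_compose`, and the labels "epidermis" and "leaf" are illustrative assumptions and have not been checked against the PO files; this is not how any PO tool represents such expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostComposed:
    """A class expression built at annotation time instead of a stored term.

    Hypothetical structure, for illustration only.
    """
    superclass: str  # the more general class being narrowed
    relation: str    # the relation used to narrow it
    filler: str      # the class on the other side of the relation

    def label(self) -> str:
        # Read as: "<superclass> that is <relation> some <filler>"
        return f"{self.superclass} that is {self.relation} some {self.filler}"


def post_compose(superclass: str, whole: str) -> PostComposed:
    """Combine a superclass with a part_of relation to a containing structure."""
    return PostComposed(superclass, "part_of", whole)


# Illustrative labels only: no pre-composed "leaf epidermis" term is needed.
expr = post_compose("epidermis", "leaf")
print(expr.label())  # epidermis that is part_of some leaf
```

Keeping such combinations out of the ontology itself is what limits the proliferation of terms: the two component classes already exist, so nothing new has to be defined or maintained.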
Top nodes of the Plant Ontology
The PO is divided into two major branches: plant anatomical entity and plant structure development stage. The plant anatomical entity branch of the PO covers material and immaterial anatomical entities such as plant structure (which includes classes such as whole plant, plant cell, and plant organ), portion of plant substance (such as cutin), and plant anatomical space (such as axil or micropyle). The plant structure development stage branch of the PO includes development stages of the whole plant and of other plant structures, such as sporophyte development stage, flower development stage, or secondary xylem development stage. These two branches are part of a single ontology file (plant_ontology.obo or plant_ontology.owl), which allows the ontology editors to create relations and logical definitions that span the two branches.
Ontology structure rationale
What constitutes a term in the PO?
The domain of the PO is encompassed by the top-level classes: plant anatomical entity (PO:0025131) and plant structure development stage (PO:0009012). Generic terms describing anatomical or morphological parts of a plant, spanning from organs to tissues to cells, are included in the PO. Also, a number of 'grouping' terms were created for the purpose of classifying the main branches of the ontology (terms such as collective plant structure or phyllome). Subcellular structures (such as filiform apparatus, sieve plate, or primary endosperm nucleus) are excluded from the PO, because they fall within the domain of the Gene Ontology's cellular component branch.
Terms for plant structure development stages describe stages in the life of a whole plant or plant part during which the structure undergoes developmental processes such as growth, differentiation, or senescence. The PO does not define the developmental processes themselves, which fall within the domain of the Gene Ontology. Instead, it uses the relevant GO terms to define the stages in the life cycle of a plant or of part of a plant that are delimited by particular developmental landmarks. Development stages may be included for any plant structure, from trichome development stage to whole plant development stage, although plant structure development stages are generally for multicellular structures, as cellular developmental processes are already covered in the GO.
Attributes of anatomical parts are generally avoided in the PO. Users wishing to include attributes in their annotations should refer to the Phenotypic Quality Ontology (PATO).
Relations in the Plant Ontology
All relationships in the PO are defined according to the OBO Foundry Relation Ontology or the Basic Formal Ontology.
A complete list of relations used by PO is available on our wiki.
All relationships in the PO are OWL 'existential restrictions', also known as all-some relationships. A relationship R, between any two terms A and B should be read as 'all instances of A stand in relation R to some instance of B'. For example, 'parenchyma cell part_of parenchyma tissue' means that all instances of parenchyma cell are part of some instance of parenchyma tissue.
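
The all-some reading can also be written as a first-order formula. The following is a sketch in notation introduced here, not taken from the PO documentation: the predicate inst(x, A) stands for "x is an instance of A", and x and y range over individual plant parts.

```latex
A \;R\; B
\quad\equiv\quad
\forall x \,\bigl(\, \mathrm{inst}(x, A) \rightarrow
  \exists y \,(\, \mathrm{inst}(y, B) \wedge R(x, y) \,) \,\bigr)
```

With A = parenchyma cell, R = part_of, and B = parenchyma tissue, this says that for every parenchyma cell x there is some parenchyma tissue y such that x is part of y. Note that the formula makes no claim in the other direction: it does not say that every parenchyma tissue has a parenchyma cell as a part.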
Is_a completeness and part_of relationships
Every term in the PO has an is_a parent, that is, every PO class is a subclass of some other PO class. Ultimately, each term's ancestry can be traced to one of the top level terms: plant anatomical entity or plant structure development stage. Is_a completeness is a best practice for ontology development, because, logically, every entity must be an instance of a more general class of entities (notice that the ultimate root of the PO is the entity 'all').
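
Is_a completeness can be checked mechanically by following is_a links upward from every term. The sketch below does this over a small hand-written fragment. The `IS_A` mapping, the function names, and the restriction to a single is_a parent per term are simplifying assumptions for illustration. The parent assignments are read off the branch descriptions in this document, not extracted from the ontology file, and the real PO may place intermediate classes between them.

```python
# The two top-level terms named in this document.
ROOTS = {"plant anatomical entity", "plant structure development stage"}

# child -> is_a parent. Illustrative fragment only (one parent per term).
IS_A = {
    "plant structure": "plant anatomical entity",
    "plant organ": "plant structure",
    "plant cell": "plant structure",
    "whole plant": "plant structure",
    "flower development stage": "plant structure development stage",
}


def ancestry(term, is_a):
    """Return the chain of is_a ancestors starting at `term`."""
    path = [term]
    seen = {term}
    while path[-1] in is_a:
        parent = is_a[path[-1]]
        if parent in seen:  # guard against accidental cycles
            raise ValueError(f"is_a cycle detected at {parent!r}")
        path.append(parent)
        seen.add(parent)
    return path


def terms_missing_root(terms, is_a, roots):
    """List the terms whose ancestry does not end at a top-level term."""
    return [t for t in terms if ancestry(t, is_a)[-1] not in roots]


print(ancestry("plant organ", IS_A))
# ['plant organ', 'plant structure', 'plant anatomical entity']

# A term with no is_a parent ("orphan term" is hypothetical) is reported.
print(terms_missing_root(list(IS_A) + ["orphan term"], IS_A, ROOTS))
# ['orphan term']
```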
Wherever possible, terms in the plant structure branch of the PO are classified as part_of another structure. However, because all relations in the PO must be true in every case, some common part_of relations cannot be applied in an ontology for all plants. For example, we cannot include the relation "seed part_of fruit", because it is not true in gymnosperms, nor the relation "flower part_of inflorescence", because it is not true of all flowering plants. In many cases where a part_of relation cannot be used, the inverse has_part relation is used instead (e.g., inflorescence has_part flower).
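
The asymmetry between part_of and has_part follows directly from the all-some reading of relations. The sketch below checks both directions over a toy set of instances. The instance names, the `holds_all_some` helper, and the data are invented for illustration and do not come from any PO annotation set.

```python
def holds_all_some(subjects, relation, objects):
    """True if every subject stands in `relation` to at least one object.

    `relation` is a set of (subject, object) pairs.
    """
    return all(
        any((s, o) in relation for o in objects)
        for s in subjects
    )


# Toy instances: one flower borne in an inflorescence, one solitary flower.
flowers = {"flower_1", "flower_2_solitary"}
inflorescences = {"inflorescence_1"}

# Instance-level facts; has_part is simply the inverse of part_of here.
part_of = {("flower_1", "inflorescence_1")}
has_part = {(whole, part) for part, whole in part_of}

# "flower part_of inflorescence" fails: the solitary flower is a counterexample.
print(holds_all_some(flowers, part_of, inflorescences))   # False

# "inflorescence has_part flower" holds: every inflorescence has some flower.
print(holds_all_some(inflorescences, has_part, flowers))  # True
```

A single solitary flower is enough to rule out the part_of statement at the class level, while the has_part statement is unaffected by it, which is why the inverse relation can be asserted safely.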
Issue of granularity (taxon-specific terms, synonyms)
Taxon-specific terms are included as separate terms only when they have structural or developmental characteristics that distinguish them from the more general parent class and are necessary for annotation work (e.g., ear inflorescence). In many cases, granular or taxon-specific terms are included as synonyms of a generic term. For example, the term inflorescence (PO:0009049) currently has 36 synonyms (e.g., cob, cyme, panicle, raceme).
The PO uses synonyms of four different scopes: exact, narrow, broad, and related. Exact synonyms are important when the same structure has multiple names, e.g., phellogen is an exact synonym of cork cambium. Narrow synonyms are often used for structures that have different names in different taxa. For example, pod and achene are narrow synonyms of fruit. Broad synonyms are used when the synonym may encompass multiple entities, e.g., adventitious root is a broad synonym of both basal root and shoot-borne root. Related synonyms are used when a word or phrase has been used synonymously with the primary term name in the literature, but the usage is not strictly correct. For example, carpel septum is a related synonym of ovary septum.
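
To show how scoped synonyms can support queries, here is a minimal sketch of a lookup table built from the examples in the paragraph above. The `Scope` enum, the `SYNONYMS` list, and the `resolve` function are hypothetical names chosen for this illustration. They do not reflect the API of AmiGO or of any other PO tool.

```python
from enum import Enum


class Scope(Enum):
    EXACT = "exact"
    NARROW = "narrow"
    BROAD = "broad"
    RELATED = "related"


# (synonym, scope, primary term), using only the examples given in the text.
SYNONYMS = [
    ("phellogen", Scope.EXACT, "cork cambium"),
    ("pod", Scope.NARROW, "fruit"),
    ("achene", Scope.NARROW, "fruit"),
    ("adventitious root", Scope.BROAD, "basal root"),
    ("adventitious root", Scope.BROAD, "shoot-borne root"),
    ("carpel septum", Scope.RELATED, "ovary septum"),
]


def resolve(query, scopes=frozenset(Scope)):
    """Return the primary terms that `query` is a synonym of.

    `scopes` restricts which synonym scopes are accepted (default: all four).
    """
    wanted = query.lower()
    return sorted(
        {term for syn, scope, term in SYNONYMS
         if syn == wanted and scope in scopes}
    )


print(resolve("Phellogen"))            # ['cork cambium']
print(resolve("adventitious root"))    # ['basal root', 'shoot-borne root']
print(resolve("pod", {Scope.EXACT}))   # [] since pod is only a narrow synonym
```

Filtering by scope lets a query tool decide how strict to be: an annotation pipeline might accept only exact synonyms, while a search box aimed at novice users might accept all four scopes.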