Biosystematics and Ecology :
Research Article
Corresponding author: Luis Antonio González-Montaña (lagonzalezmo@unal.edu.co)
Academic editor: Christian Sturmbauer
Received: 15 Jul 2021 | Accepted: 28 Sep 2021 | Published: 11 Nov 2021
© 2021 Luis González-Montaña
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
González-Montaña LA (2021) Semantic-based methods for morphological descriptions: An applied example for Neotropical species of genus Lepidocyrtus Bourlet, 1839 (Collembola: Entomobryidae). Biosystematics and Ecology 1: e71620. https://doi.org/10.1553/biosystecol.1.e71620
The production of semantic annotations has gained renewed attention due to the development of anatomical ontologies and the documentation of morphological data. Two methods have been proposed for this production, differing in their methodological and philosophical approaches: the class-based method and the instance-based method. In the first, semantic annotations are established as class expressions, while in the second, the annotations incorporate individuals. Both methods were evaluated empirically in the morphological description of Neotropical species of the genus Lepidocyrtus (Collembola: Entomobryidae: Lepidocyrtinae). The semantic annotations are expressed as RDF triples, a syntax more flexible than the Entity-Quality syntax commonly used in the description of phenotypes. The morphological descriptions were built in Protégé 5.4.0 and stored in an RDF store created with Apache Jena Fuseki. Semantic annotations based on RDF triples increase the interoperability and integration of data from diverse sources, e.g., museum data. However, computational challenges remain, related to the development of semi-automatic methods for generating RDF triples, the interchange between texts and RDF triples, and access by non-expert users.
class-based methods, instance-based methods, resource description framework (RDF), Neotropical region, Hexapoda
In recent years, anatomical ontologies have gained attention in the formalization of semantic-based morphological descriptions, which encompass the standardization of anatomical terminology and interoperability between morphological data coming from diverse sources (
chaeta (Entity): ciliated (Quality)
chaeta (character): ciliated (character state)
This software was initially based on the NeXML language and later incorporated mx, a web-based application for gathering information about specimens and building descriptive matrices (
A second approach called “semantic instance anatomy” is developed by
The philosophical differences lie in the objects described by each method: classes or instances (Fig.
Graphs of the internal structure of (a) class-based and (b) instance-based methods. The orange points represent classes within the descriptive template, which in this example refer to the chaeta Ps2, part of the cephalic chaetae. The purple points represent instances or individuals of each class and are named with provisional labels. The SubClassOf relation links classes, and the has_individual relation links a class to its instantiated individual.
Unfortunately, approaches to producing semantic annotations in a standardized language are scarce or not accessible to non-expert users. One method is to build semantic spreadsheet templates in which classes, individuals, and properties are declared and from which an ontology is built. A good example is developed by
Continuing with the example above, the class antenna represents a set of individuals that share some property, e.g., being part of the head capsule. The individuals are described by object property or data property assertions, which could be made as:
“antenna has_individual some antenna123”, where the individual labeled as “antenna123” is the instance of the class antenna.
and
“antenna123 part_of headcapsule123”, where “part_of headcapsule123” is the object property assertion and “headcapsule123” is a provisional label for an individual that is an instance of the class head capsule.
Under the class-based method, the template has one “layer”, the class expressions; under the instance-based method, the template has two “layers”, the class expressions and the individuals of each class.
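The difference between the two templates can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure followed in Protégé: triples are modelled as plain tuples, the helper `layers` is hypothetical, and the class expression "antenna part_of some head capsule" is inferred from the antenna example in the text.

```python
# Triples are modelled as plain (subject, predicate, object) tuples.
# Class-based template: a single layer of class expressions.
class_based = [
    ("antenna", "part_of some", "head capsule"),
]

# Instance-based template: the class layer plus a layer of individuals.
# "antenna123" and "headcapsule123" are the provisional labels used in the text.
instance_based = class_based + [
    ("antenna", "has_individual", "antenna123"),
    ("head capsule", "has_individual", "headcapsule123"),
    ("antenna123", "part_of", "headcapsule123"),
]


def layers(triples):
    """Return 2 if the template declares individuals, otherwise 1."""
    individuals = {o for (_, p, o) in triples if p == "has_individual"}
    return 2 if individuals else 1


print(layers(class_based), layers(instance_based))
```

The second layer is detected here only through the has_individual relation, which mirrors how the text distinguishes the two templates.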
The morphological descriptions follow the HYA model proposed by
Predicates or relations employed during the expression of RDF triple statement under class-based and instance-based methods. Relation Ontology (RO); Phenotype And Trait Ontology (PATO).
Term | Definition | Ontology identifier |
globular | A spheroid quality inhering in a bearer by virtue of the bearer's resembling a ball | PATO_0001499 |
cylindrical | A convex 3-D shape quality inhering in a bearer by virtue of the bearer's exhibiting a consistently sized round cross section | PATO_0001873 |
absent | A quality denoting the lack of an entity. | PATO_0000462 |
rounded | A shape quality inhering in a bearer by virtue of the bearer's being such that every part of the surface or the circumference is equidistant from the center | PATO_0000411 |
protruding | A quality inhering in a bearer by virtue of the bearer's extending out above or beyond a surface or boundary. | PATO_0001598 |
truncated | A shape quality inhering in a bearer by virtue of the bearer's terminating abruptly by having or as if having an end or point cut off | PATO_0000936 |
lanceolate | A shape quality inhering in a bearer by virtue of the bearer's being shaped like a lance-head, considerably longer than wide, tapering towards the tip from below the middle; attached at the broad end | PATO_0001877 |
serrated | A shape quality inhering in a bearer by virtue of having sharp straight-edged teeth pointing to the apex. | PATO_0001206 |
asymmetrically curved | A curvature quality inhering in a bearer by virtue of the bearer's being curved asymmetrically. | PATO_0001848 |
domed | A curvature quality inhering in a bearer by virtue of the bearer's having a shape resembling a dome. | PATO_0001789 |
increased curvature | A curvature which is relatively high. | PATO_0001592 |
distributed | A spatial pattern inhering in a bearer by virtue of the bearer's being spread out or scattered about or divided up. | PATO_0001566 |
subulate | A shape quality inhering in a bearer by virtue of the bearer's being linear, very narrow, tapering to a very fine point from a narrow base. | PATO_0001954 |
filamentous | A shape quality inhering in a bearer by virtue of the bearer's having thin filamentous extensions at its edge. | PATO_0001360 |
semicircular | A 2-D shape quality inhering in a bearer by virtue of the bearer's having shape or form of half a circle. | PATO_0002232 |
branched | A branchiness quality inhering in a bearer by virtue of the bearer's having branches. | PATO_0000402 |
increased amount | An amount which is relatively high. | PATO_0000470 |
decreased amount | An amount which is relatively low. | PATO_0001997 |
right side of | A spatial quality inhering in a bearer by virtue of the bearer's being located on the right side of another entity. | PATO_0001793 |
left side of | A spatial quality inhering in a bearer by virtue of the bearer's being located on the left side of another entity. | PATO_0001792 |
aligned with | An alignment quality inhering in a bearer by virtue of the bearer's being in a proper spatial positioning with respect to an additional entity. | PATO_0001653 |
normal | A quality inhering in a bearer by virtue of the bearer's exhibiting no deviation from normal or average. | PATO_0000461 |
Predicates or relations employed in the expression of RDF triple statements under the class-based and instance-based methods. Relation Ontology (RO); Phenotype And Trait Ontology (PATO). The relation bearer_of is an alternative term for has_characteristic (RO:0000053), the label under which it may be available.
Imported relation property | Ontology | PURL |
has_part | RO | http://purl.obolibrary.org/obo/BFO_0000051 |
adjacent_to | RO | http://purl.obolibrary.org/obo/RO_0002220 |
aligned_with | RO | http://purl.obolibrary.org/obo/RO_0002001 |
anterior_to | PATO | http://purl.obolibrary.org/obo/PATO_0001632 |
bearer_of | RO | http://purl.obolibrary.org/obo/RO_0000053 |
decreased_in_magnitude_relative_to | PATO | http://purl.obolibrary.org/obo/pato#decreased_in_magnitude_relative_to |
external_to | RO | http://purl.obolibrary.org/obo/PATO_0002483 |
increased_in_magnitude_relative_to | PATO | http://purl.obolibrary.org/obo/pato#increased_in_magnitude_relative_to |
internal_to | Not available | |
is_approximately_equivalent_to | RO | http://purl.obolibrary.org/obo/RO_0002603 |
lateral_to | PATO | http://purl.obolibrary.org/obo/PATO_0001193 |
located_in | PATO | http://purl.obolibrary.org/obo/PATO_0002261 |
part_of | RO | http://purl.obolibrary.org/obo/BFO_0000050 |
posterior_to | PATO | http://purl.obolibrary.org/obo/PATO_0001633 |
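As an illustration of how the relation labels in this table can be tied to their identifiers, the following sketch stores a subset of the listed PURLs in a Python dictionary. The names `RELATION_PURL` and `resolve` are hypothetical, and only a few rows are copied for brevity.

```python
# Labels and PURLs copied from the table of imported relations (subset).
RELATION_PURL = {
    "has_part": "http://purl.obolibrary.org/obo/BFO_0000051",
    "part_of": "http://purl.obolibrary.org/obo/BFO_0000050",
    "bearer_of": "http://purl.obolibrary.org/obo/RO_0000053",
    "anterior_to": "http://purl.obolibrary.org/obo/PATO_0001632",
    "posterior_to": "http://purl.obolibrary.org/obo/PATO_0001633",
}


def resolve(predicate):
    """Return the PURL for a relation label, or None when no PURL is listed."""
    return RELATION_PURL.get(predicate)


print(resolve("part_of"))
print(resolve("internal_to"))  # no PURL is listed for internal_to, so None
```

Returning None for internal_to reflects the "Not available" entry in the table rather than raising an error.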
An RDF store was built with Apache Jena Fuseki, an HTTP interface for querying RDF graphs, which can be explored in a browser at http://localhost:3030//query.html, employing SPARQL according to W3C recommendations (The World Wide Web Consortium). Apache Jena Fuseki was chosen for its simple installation compared with other platforms that offer a SPARQL endpoint, for instance, OpenLink Virtuoso, Ontotext, or Neo4j. The resulting RDF store holds only the class-based morphological descriptions of each described species.
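Besides the browser interface, a Fuseki endpoint can be queried over HTTP following the SPARQL 1.1 Protocol. The sketch below only builds the request with the Python standard library; the dataset name "lepidocyrtus" and the helper `build_query_request` are assumptions for illustration, since the dataset name used in the local store is not given here.

```python
import urllib.parse
import urllib.request


def build_query_request(endpoint, sparql):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": sparql})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


# "lepidocyrtus" is a hypothetical dataset name; use the one configured in Fuseki.
ENDPOINT = "http://localhost:3030/lepidocyrtus/query"
QUERY = (
    "SELECT ?s ?o WHERE "
    "{ ?s <http://www.w3.org/2000/01/rdf-schema#subClassOf> ?o } LIMIT 5"
)

request = build_query_request(ENDPOINT, QUERY)
print(request.full_url)

# With a running server, the result rows could be read with:
# import json
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)["results"]["bindings"]
```

The network call is left commented out so that the example runs without a local server.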
Semantic-based morphological descriptions were made for 22 species, and the corresponding files in RDF/XML format are available at https://github.com/luis-gonzalez-m/Lepidocyrtus-RDF-Store. These descriptions have an average of 592 anatomical terms, of which 260 refer to the chaetotaxy. The RDF triples (see below) express (a) the part-whole relation between anatomical entities (Fig.
chaeta am6.ab3 part_of some abdominal segment 3 (a)
chaeta am6.ab3 bearer_of some macrochaeta (b)
chaeta am6.ab3 not (part_of some abdominal segment 3) (c)
Under the instance-based method, the above is more complex because a second “layer” must be added (Fig.
Screenshot of Protégé showing panels used in the class-based method. Left-hand side, the class “chaeta m6.ab1”, the chaeta “m6” located on the abdominal segment 1, and right-hand side, the individual identified by the label “chaeta11” and object property assertion expressed for the species Lepidocyrtus biphasis Mari Mutt, 1986.
chaeta145 part_of abdominalsegment3 (d)
This is an object property assertion that uses the provisional labels “chaeta145” and “abdominalsegment3” to name the individuals, i.e., the parts of organisms perceived in reality. Some examples of RDF triples and descriptive statements are presented in Table
RDF triple statement | Descriptive statement (natural language) |
chaeta A0.h bearer_of some microchaeta | chaeta A0, size: microchaeta |
chaeta A0.h bearer_of some macrochaeta | chaeta A0, size: macrochaeta |
chaeta A0.h has_part exactly 1 microchaeta | chaeta A0, number: 1 |
chaeta Ps4 part_of some cephalic chaeta | chaeta Ps4: present |
chaeta a3.ab1 part_of some abdominal segment 1 and (anterior_to some chaeta m3.ab1) | chaeta a3, position: anterior to chaeta m3 |
chaeta a2.ab2 bearer_of some triangular | chaeta a2, shape: triangular |
chaeta a2.ab3 aligned_with some chaeta m2.ab3 | chaeta a3, alignment: aligned with chaeta m2 |
chaeta as.ab2 bearer_of some length) and (inheres_in some chaeta m3e.ab2) and (is_approximately_equivalent_to some length) | chaeta as, length: equal to chaeta m3e |
chaeta D1p bearer_of some smooth | chaeta D1p, texture: smooth |
dental tubercle bearer_of some domed | dental tubercle, curvature: domed |
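The correspondence between RDF triple statements and descriptive statements shown in the table can be sketched as a small template-based conversion. This is an assumption-laden illustration, not a tool used in this study: the lookup `CATEGORY` and the functions `short_name` and `describe` are hypothetical, and only three predicates are covered.

```python
# Hypothetical lookup from quality term to descriptive category,
# read off the example table above.
CATEGORY = {
    "microchaeta": "size",
    "macrochaeta": "size",
    "triangular": "shape",
    "smooth": "texture",
    "domed": "curvature",
}


def short_name(term):
    """Drop the locator suffix: 'chaeta A0.h' becomes 'chaeta A0'."""
    return term.split(".")[0]


def describe(subject, predicate, obj):
    """Turn one (subject, predicate, object) triple into a descriptive statement."""
    name = short_name(subject)
    if predicate == "bearer_of":
        return f"{name}, {CATEGORY[obj]}: {obj}"
    if predicate == "anterior_to":
        return f"{name}, position: anterior to {short_name(obj)}"
    if predicate == "aligned_with":
        return f"{name}, alignment: aligned with {short_name(obj)}"
    raise ValueError(f"no template for predicate {predicate!r}")


print(describe("chaeta A0.h", "bearer_of", "microchaeta"))
print(describe("dental tubercle", "bearer_of", "domed"))
print(describe("chaeta a3.ab1", "anterior_to", "chaeta m3.ab1"))
```

A fixed template per predicate is the simplest possible design; compound class expressions, such as the length comparison in the table, would need a proper parser.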
RDF stores for biological data are oriented mainly toward molecular data, as in UniProt (https://uniprot.org) and Bio2RDF (https://bio2rdf.org), while RDF stores for morphological data have not been developed. Once the RDF data are available, the next step is the creation of a semantic web service to put the semantic data on the web (
A list of RDF triples in which the subclass relation is retrieved for the chaetae that make up the class cephalic chaeta (Table
ID | SUBJECT | PREDICATE | OBJECT |
1 | chaeta A1.h | subClassOf | cephalic chaeta |
2 | chaeta A2.h | subClassOf | cephalic chaeta |
3 | chaeta A3.h | subClassOf | cephalic chaeta |
4 | chaeta A4.h | subClassOf | cephalic chaeta |
5 | chaeta A5.h | subClassOf | cephalic chaeta |
ID | SUBJECT | PREDICATE | OBJECT |
1 | chaeta A1 | bearer_of | microchaeta |
2 | chaeta A1.h | bearer_of | microchaeta |
3 | chaeta A3.h | posterior_to | chaeta A5.h |
4 | chaeta a4.lb | bearer_of | smooth |
5 | chaeta A5.lb | bearer_of | serrated |
Example 1 (Table
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
# ?p is bound explicitly so that the PREDICATE column is filled in the results
SELECT DISTINCT ?s ?p ?o
WHERE {
  VALUES ?p { rdfs:subClassOf }
  ?s ?p ?o .
}
Example 2 (Table
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?s ?p ?o
WHERE {
  ?s rdf:type owl:Class ;
     rdfs:subClassOf [ rdf:type owl:Restriction ;
                       owl:onProperty ?p ;
                       owl:someValuesFrom ?o ] .
}
<!-- http://purl.obolibrary.org/obo/L.americanus.owl#e.lb -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/L.americanus.owl#e.lb">
  <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/L.americanus.owl#labial_chaeta"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/L.americanus.owl#bearer_of"/>
      <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/L.americanus.owl#ciliated"/>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:label>chaeta e.lb</rdfs:label>
</owl:Class>
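The link between the RDF/XML serialization above and the triples returned by Example 2 can be sketched by reading the class definition with the Python standard library. This is a minimal illustration, not a substitute for an RDF parser: the wrapper element rdf:RDF with its namespace declarations is added so that the fragment is well-formed on its own, and the function `existential_restrictions` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RESOURCE = "{%s}resource" % NS["rdf"]
BASE = "http://purl.obolibrary.org/obo/L.americanus.owl#"

# The class definition from the text, wrapped in rdf:RDF (wrapper added here).
DOC = f"""<rdf:RDF xmlns:rdf="{NS['rdf']}" xmlns:rdfs="{NS['rdfs']}" xmlns:owl="{NS['owl']}">
  <owl:Class rdf:about="{BASE}e.lb">
    <rdfs:subClassOf rdf:resource="{BASE}labial_chaeta"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="{BASE}bearer_of"/>
        <owl:someValuesFrom rdf:resource="{BASE}ciliated"/>
      </owl:Restriction>
    </rdfs:subClassOf>
    <rdfs:label>chaeta e.lb</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def existential_restrictions(xml_text):
    """Return (class label, property, filler) for each owl:someValuesFrom restriction."""
    rows = []
    for cls in ET.fromstring(xml_text).findall("owl:Class", NS):
        label = cls.findtext("rdfs:label", namespaces=NS)
        for r in cls.findall("rdfs:subClassOf/owl:Restriction", NS):
            prop = r.find("owl:onProperty", NS).get(RESOURCE)
            filler = r.find("owl:someValuesFrom", NS).get(RESOURCE)
            rows.append((label, prop.split("#")[-1], filler.split("#")[-1]))
    return rows


print(existential_restrictions(DOC))
```

The result corresponds to the statement "chaeta e.lb bearer_of some ciliated", which is the kind of row Example 2 retrieves from the store.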
Querying individuals in Fuseki is similar to querying ontology classes; the main difference is that individuals or objects are queried within named graphs. For example, a SPARQL query that accesses a named graph follows the general syntax (
SELECT *
WHERE {
  GRAPH <123> {   # 123 is the named graph
    ?s ?p ?o .
  }
}
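The effect of restricting a query to a named graph can be mimicked with quads, i.e., triples that carry the name of their graph. In this sketch, "123" is the placeholder graph name from the query above, while the second graph "456" and the function `graph_pattern` are hypothetical.

```python
# Quads are (graph, subject, predicate, object) tuples.
quads = [
    ("123", "chaeta145", "part_of", "abdominalsegment3"),
    ("456", "antenna123", "part_of", "headcapsule123"),  # hypothetical second graph
]


def graph_pattern(quads, graph):
    """Mimic a GRAPH pattern: keep only the triples stored in the given graph."""
    return [(s, p, o) for (g, s, p, o) in quads if g == graph]


print(graph_pattern(quads, "123"))
```

Only the assertions stored in graph "123" are returned, which is what the GRAPH keyword achieves in the store.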
Semantic-based methods for the documenting of morphological data have gained interest in recent years, motivated by the potential application in phylogenetics (
The instance-based method, “semantic instance anatomy”, is more complex when the ABox assertions are built with Protégé because the individuals of each class need to be specified, resulting in a template composed of two layers: ontology classes and instantiated objects. Recently, proto.morphdbase.de has incorporated instance-based methods, promising to increase the use of semantic-based tools in morphological descriptions. This application is complementary to other tools that integrate morphological data with multimedia representations, for instance, MorphoNet (
Currently, there is an imperative need to document biological diversity, which implies the use of computational tools for processing the different data generated in the biology domain. Unfortunately, this urgency is directed mostly to the storing and processing of molecular data, while morphology is continually displaced. Morphological descriptions are a useful source of data, but due to their nature and complexity they require “creative” solutions: new automatic or semi-automatic methods that permit the interchange between the natural language commonly employed in published morphological descriptions and RDF triple syntax. The use of ontologies uncovers the subtle process that leads from morphological data (expressed as RDF triple statements) to character statements, where the character states (properties) arise from the comparison between species, before the character matrix is built.
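The step from RDF triple statements to character statements can be illustrated with a small comparison between species. The sketch below rests on assumptions: the two species and their statements are placeholders rather than data from this study, and the functions `character_matrix` and `variable_characters` are hypothetical.

```python
# Hypothetical class-based statements for two placeholder species.
descriptions = {
    "species A": [
        ("chaeta A0.h", "bearer_of", "microchaeta"),
        ("chaeta D1p", "bearer_of", "smooth"),
    ],
    "species B": [
        ("chaeta A0.h", "bearer_of", "macrochaeta"),
        ("chaeta D1p", "bearer_of", "smooth"),
    ],
}


def character_matrix(descriptions):
    """Map each (subject, predicate) character to the state recorded per species."""
    matrix = {}
    for species, triples in descriptions.items():
        for s, p, o in triples:
            matrix.setdefault((s, p), {})[species] = o
    return matrix


def variable_characters(matrix):
    """Keep only the characters whose states differ between species."""
    return {c: st for c, st in matrix.items() if len(set(st.values())) > 1}


print(variable_characters(character_matrix(descriptions)))
```

In this toy case only the size of chaeta A0 varies, so it is the only character that would enter a comparative matrix.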
Initiatives about morphological descriptions that employ standardized languages are not new (
However, taxonomic tradition weighs heavily on the language employed in morphological descriptions, and this language is taxon-dependent. It is necessary to reconcile the needs of taxonomists with user-friendly tools that incorporate these methods. It is not the goal here to evaluate the multiple RDF stores available, which differ in properties such as storage size, querying time, and applicability (
The author thanks MINCIENCIAS for financial support (National doctorate scholarship 727) and Lars Vogt for comments on Semantic Instance Anatomy.
MINCIENCIAS
Scholarship for doctoral studies 727
Universidad Nacional de Colombia, Facultad de Ciencias Agrarias.
None