Semantics In Kepler

This is version 3 of this document.

Intended audience

This document is intended for SEEK and Kepler developers. It is a DRAFT DESIGN DOCUMENT and does not reflect functionality as it currently exists in Kepler or SEEK. Comments and feedback are appreciated.

Use of Semantics in Kepler

We intend to further develop the use of semantic mediation technologies within the Kepler scientific workflow environment. Semantic tools will be used for both discovery and integration of actors and data within the Kepler system. This document outlines some of the semantic functions, ontology issues, and GUI features for each of the major areas of functionality:

Data and Actor Discovery
Data and Actor Classification and Organization
Data and Actor Integration

Data and Actor Discovery

Overview

We intend to use ontology-driven search to improve discovery of both data and actors that have been annotated with terms from formal ontologies. For both actors and data, we will make use of two levels of annotation. The first level addresses the general topic of the data or the function of the actor; we call this the "topical ontology". The second level describes the semantic signature of the data or actor; we call this the "signature ontology". For data, the signature describes the semantic type of each attribute within the data and how the attributes relate to one another. For actors, it describes the semantic type of each input and output port and how the ports relate to one another. Broadly, the topical ontology describes what something represents or does, while the signature ontology describes its data requirements.
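One way to record both annotation levels is sketched below, assuming a simple in-memory representation in which ontology terms are plain strings. The class, port, and attribute names and the term "DiversityIndexValue" are hypothetical. Relationships among ports or attributes are not modeled, and treating a data set's attributes as outputs is an assumption made so that data and actors can share one structure.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticAnnotation:
    """Hypothetical two-level annotation for an actor or a data set."""

    # Topical ontology: what the item represents or does.
    topics: set[str] = field(default_factory=set)
    # Signature ontology: semantic type of each input port, keyed by port name.
    # Data sets have no inputs, so this stays empty for them.
    inputs: dict[str, str] = field(default_factory=dict)
    # Signature ontology: semantic type of each output port (actors) or of
    # each attribute (data sets), keyed by port or attribute name.
    outputs: dict[str, str] = field(default_factory=dict)


# A hypothetical biodiversity-index actor: one input port, one output port.
shannon_wiener = SemanticAnnotation(
    topics={"BiodiversityIndex"},
    inputs={"density": "SpeciesArealDensity"},
    outputs={"index": "DiversityIndexValue"},
)

# A hypothetical data set whose single column carries a signature type.
psychotria_plots = SemanticAnnotation(
    topics={"Biodiversity"},
    outputs={"density": "PsychotriaLimonensisArealDensity"},
)
```

Keeping both levels on one record lets topical search and signature matching work from the same annotation without a second lookup.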

Envisioned GUI

We envision several mechanisms for discovering relevant data and actors via their semantic annotations. First, in the left-hand list of data and actors, users should be able to search for terms from an ontology and have all related results displayed in the list. For example, one might search for data about "biodiversity" and see a data set that contains the abundance of reptile species at a site, or search for "SpeciesDistributionModels" and see the GARP model. In this scenario, we need to decide how the user chooses ontology terms for searching. Currently, users simply type in a text string, which is compared to the ontology terms; if there is a match, the matching terms are used in the search. We may want a more precise way for users to select terms.
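The text-matching search described above might be sketched as follows. This assumes a toy topical ontology stored as child-to-parent is-a links and case-insensitive substring matching; the document does not specify how text is compared to terms. After matching, the search also accepts items annotated with any subclass of a matched term, so a query for "biodiversity" finds data annotated with a more specific topic. The terms "SpeciesAbundance", "ReptileAbundance", "EcologicalModels", and "Climate", the catalog entries, and the function names are hypothetical.

```python
# Hypothetical toy topical ontology: each term maps to its parent (is-a).
PARENT = {
    "SpeciesAbundance": "Biodiversity",
    "ReptileAbundance": "SpeciesAbundance",
    "SpeciesDistributionModels": "EcologicalModels",
    "SpatialSpeciesDistributionModel": "SpeciesDistributionModels",
}

# Hypothetical catalog: item name -> topical annotations.
CATALOG = {
    "reptile_site_survey": {"ReptileAbundance"},
    "GARP": {"SpatialSpeciesDistributionModel"},
    "rainfall_series": {"Climate"},
}


def ancestors(term):
    """Yield `term` itself followed by all of its superclasses."""
    while term is not None:
        yield term
        term = PARENT.get(term)


def all_terms():
    """Every term that appears anywhere in the ontology."""
    return set(PARENT) | set(PARENT.values())


def search(text):
    """Match free text to ontology terms, then return every catalog item
    annotated with a matched term or with any subclass of one."""
    needle = text.lower()
    matched = {t for t in all_terms() if needle in t.lower()}
    return sorted(
        item
        for item, topics in CATALOG.items()
        if any(set(ancestors(topic)) & matched for topic in topics)
    )
```

With this toy data, `search("biodiversity")` returns the reptile survey and `search("SpeciesDistributionModels")` returns GARP. Substring matching is deliberately loose, which is one reason a more precise term-selection mechanism may be wanted.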

Second, we would like scientists to be able to discover semantically compatible actors and data while composing a workflow. This could involve two GUI mechanisms. First, a toggle could restrict the left-hand pane to semantically compatible actors and data. When the toggle is on, the system would screen each potential actor against the current workflow to see whether it could be added in any semantically compatible way, based on the semantic types of its I/O signature. When the workflow canvas is blank, all actors and data would show up in the list, but each actor or data set added to the canvas imposes constraints that reduce the number of actors displayed. Second, the user should be able to select one or more actors currently on the canvas and right-click to "Show compatible actors". This effectively launches a semantic query against the semantic I/O signatures of the selected actors and displays compatible actors in the left-hand pane.
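The compatibility filter behind the toggle could work roughly as sketched here. It assumes each signature is a dict of input and output semantic types, that a port accepts any type equal to or more specific than its own (checked by walking a child-to-parent is-a table), and that a candidate is compatible if it could sit either upstream or downstream of some item already placed. The ontology fragment, the term "ArealDensity", the library entries, and all function names are hypothetical.

```python
# Hypothetical signature-ontology fragment: semantic type -> parent type (is-a).
PARENT = {
    "PsychotriaLimonensisArealDensity": "SpeciesArealDensity",
    "SpeciesArealDensity": "ArealDensity",
}


def is_subtype(sub, sup):
    """True if `sub` equals `sup` or is a descendant of it."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT.get(sub)
    return False


def can_feed(producer_type, consumer_type):
    # Data can flow if the produced type is at least as specific as
    # the type the consuming port requires.
    return is_subtype(producer_type, consumer_type)


def compatible(candidate, placed):
    """True if `candidate` could be wired to any item in `placed`.

    Each signature is a dict with "inputs" and "outputs" lists of
    semantic types. An empty `placed` list imposes no constraint.
    """
    if not placed:
        return True
    for item in placed:
        # Candidate downstream of a placed item.
        if any(can_feed(o, i) for o in item["outputs"] for i in candidate["inputs"]):
            return True
        # Candidate upstream of a placed item.
        if any(can_feed(o, i) for o in candidate["outputs"] for i in item["inputs"]):
            return True
    return False


def show_compatible(library, placed):
    """Names of library entries to show in the left-hand pane."""
    return [name for name, sig in library.items() if compatible(sig, placed)]


# Hypothetical actor library and canvas contents.
library = {
    "ShannonWienerIndex": {"inputs": ["SpeciesArealDensity"], "outputs": ["DiversityIndexValue"]},
    "GARP": {"inputs": ["SpeciesOccurrence"], "outputs": ["SpeciesDistributionMap"]},
}
canvas = [{"inputs": [], "outputs": ["PsychotriaLimonensisArealDensity"]}]
```

Here `show_compatible(library, canvas)` keeps only `ShannonWienerIndex`, while an empty canvas keeps both entries. The right-click "Show compatible actors" query could reuse the same function by passing only the selected actors as `placed`. This sketch ignores whether ports are already connected, which a fuller implementation would likely need to consider.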

Ontology issues

There is some question as to exactly what our needs are for the topical ontology and the signature ontology. These could probably both be part of one overall ontology, but that may also complicate matters. The topical ontology is in many ways more general, and would represent one or more classification axes that allow users to label actors and data with relevant terms. For example, an actor that calculates the Shannon-Wiener index might be labeled as a "BiodiversityIndex", and a model that generates spatially explicit maps of species distributions might be labeled a "SpatialSpeciesDistributionModel" (not necessarily as an explicit ontology term, but that would be the meaning of the annotation). The signature ontology would specifically be used to label the semantic types of data contained within a data set or flowing between two or more actors. Thus, this ontology would contain terms that are concretely tied to real-world measurements as represented by data. For example, the ontological annotation for a particular column in a biodiversity data set might be "PsychotriaLimonensisArealDensity" (the column would also have a structural type describing units, etc.). A particular "BiodiversityIndex" actor might take as input data of the type "SpeciesArealDensity", and the ontology would allow us to deduce that the Psychotria limonensis data is semantically compatible with the actor's input requirement.
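The deduction in the last example can be sketched as a walk up the is-a hierarchy that returns the chain of terms justifying the match. This assumes a toy signature ontology stored as child-to-parent links, and it represents the structural type by a unit string that must match exactly. The parent term "ArealDensity", the unit strings, and the names `ColumnType`, `is_a_chain`, and `accepts` are hypothetical; a fuller system might convert between compatible units rather than reject them.

```python
from dataclasses import dataclass

# Hypothetical signature-ontology fragment (child -> parent, is-a).
IS_A = {
    "PsychotriaLimonensisArealDensity": "SpeciesArealDensity",
    "SpeciesArealDensity": "ArealDensity",
}


@dataclass(frozen=True)
class ColumnType:
    semantic: str  # term from the signature ontology
    unit: str      # stand-in for the structural type (units, etc.)


def is_a_chain(term, target):
    """Return the is-a path from `term` up to `target`, or None if
    `target` is neither `term` nor one of its ancestors."""
    path = [term]
    while path[-1] != target:
        parent = IS_A.get(path[-1])
        if parent is None:
            return None
        path.append(parent)
    return path


def accepts(port: ColumnType, column: ColumnType) -> bool:
    """An input port accepts a column if the column's semantic type is
    subsumed by the port's type and the units match exactly."""
    semantic_ok = is_a_chain(column.semantic, port.semantic) is not None
    return semantic_ok and port.unit == column.unit


# Hypothetical column annotation and actor input requirement.
psychotria = ColumnType("PsychotriaLimonensisArealDensity", "individuals/m^2")
index_input = ColumnType("SpeciesArealDensity", "individuals/m^2")
```

Returning the chain rather than a bare boolean would let the GUI explain why a data set was judged compatible with an actor's input.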