Science Environment for Ecological Knowledge

Semantics In Kepler

This is version 5. It is not the current version, and thus it cannot be edited.


Intended audience

This document is intended for SEEK and Kepler developers. It is a DRAFT DESIGN DOCUMENT and does not reflect functionality as it currently exists in Kepler or SEEK. Comments and feedback are appreciated.

Use of Semantics in Kepler

We intend to further develop the use of semantic mediation technologies within the Kepler scientific workflow environment. Semantic tools will be used for both discovery and integration of actors and data within the Kepler system. This document outlines some of the semantic functions, ontology issues, and GUI features for each of the major areas of functionality:

  • Data and Actor Discovery
  • Data and Actor Classification and Organization
  • Data and Actor Transformation and Integration
  • Semantic Workflow Design

Data and Actor Discovery

Overview

We intend to use ontology-driven search to improve discovery of both data and actors that have been annotated with terms from formal ontologies. For both actors and data, we will make use of two levels of annotation. The first level addresses the general topic of the data or the function of the actor; we call this the "topical ontology". The second level describes the semantic signature of the data or actor; we call this the "signature ontology". For data, the signature describes the semantic type of each attribute within the data and how the attributes relate to one another. For actors, it describes the semantic type of each input and output port and how the ports relate to one another. In many ways the topical ontology describes what something represents or does, while the signature ontology describes its data requirements.
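As a rough illustration of the two annotation levels, the sketch below attaches topical terms and a port-level signature to one actor. It assumes a simple in-memory record; the class names (`Port`, `ActorAnnotation`), the field names, and the `DiversityValue` term are hypothetical and do not reflect Kepler code or any SEEK ontology.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Port:
    """One input or output port and its semantic type."""
    name: str
    semantic_type: str  # a term from the signature ontology


@dataclass
class ActorAnnotation:
    """Both annotation levels for a single actor (hypothetical layout)."""
    name: str
    topics: Set[str]  # terms from the topical ontology
    inputs: List[Port] = field(default_factory=list)
    outputs: List[Port] = field(default_factory=list)


# Topical level: what the actor does. Signature level: what its ports carry.
shannon = ActorAnnotation(
    name="ShannonIndex",
    topics={"BiodiversityIndex"},
    inputs=[Port("abundance", "SpeciesArealDensity")],
    outputs=[Port("index", "DiversityValue")],
)
```

A data set could be described the same way, with one entry per attribute in place of the ports. Relationships between ports or attributes are left out of this sketch.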

Envisioned GUI

We envision several mechanisms for discovering relevant data and actors via their semantic annotations. First, in the left-hand list of data and actors, we envision users being able to search for terms from an ontology and have all related results displayed in the list. For example, one might search for data about "biodiversity" and see a data set that contains the abundance of reptile species at a site. Or one might search for "SpeciesDistributionModels" and see the GARP model. In this scenario, we need to decide how the user chooses terms from the ontology for searching. Currently, we simply allow them to type in a text string, which is then compared to the ontology terms; if there is a match, those ontology terms are used in the search. We may want the ability to be more precise.
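The text-to-term search described above could be sketched as follows. This is only an illustration under stated assumptions: the toy ontology, the term names other than "SpeciesDistributionModels", the item names, and the case-insensitive exact-match rule are all hypothetical, and the current Kepler matching logic may differ.

```python
# Hypothetical topical ontology: each term maps to its direct subterms.
SUBTERMS = {
    "Biodiversity": ["SpeciesAbundance"],
    "SpeciesAbundance": ["ReptileAbundance"],
    "ReptileAbundance": [],
    "SpeciesDistributionModels": [],
}

# Hypothetical annotations: item name -> topical terms applied to it.
ANNOTATIONS = {
    "reptile_abundance_dataset": {"ReptileAbundance"},
    "GARP": {"SpeciesDistributionModels"},
}


def expand(term):
    """Return the term together with all of its transitive subterms."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(SUBTERMS.get(current, []))
    return found


def search(text):
    """Match text against term names, then list items annotated with a related term."""
    needle = text.strip().lower()
    wanted = set()
    for term in SUBTERMS:
        if term.lower() == needle:  # assumption: case-insensitive exact match
            wanted |= expand(term)
    return sorted(name for name, terms in ANNOTATIONS.items() if terms & wanted)


print(search("biodiversity"))
print(search("SpeciesDistributionModels"))
```

With this toy data, the first query returns the reptile data set because its annotation is a subterm of "Biodiversity", and the second returns GARP. A more precise mechanism, such as picking terms from a list, would replace only the matching step.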

Second, we would like scientists to be able to discover semantically compatible actors and data while composing a workflow. This would involve two possible GUI mechanisms. One is a toggle that allows the user to display only semantically compatible actors and data in the left-hand pane. When toggled, the system would impose a constraint in which each potential actor is screened against the current workflow to see if it could be added in any semantically compatible way, based on the semantic type of its I/O signature. When the workflow canvas is blank, all actors and data would show up in the list, but as actors and data are added to the canvas, they impose constraints that reduce the number of actors that are displayed. The other mechanism is that the user should be able to select any combination of one or more actors currently on the canvas and right click to "Show compatible actors". This effectively launches a semantic query against the semantic I/O signature of the selected actors and displays compatible actors in the left-hand pane.
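One possible reading of the screening step is sketched below: a candidate actor is shown if any of its ports could be connected to a port of an actor already on the canvas (or in the current selection). The ontology, the actor names, their signatures, and the single-parent hierarchy are hypothetical simplifications, not the actual SEEK design.

```python
# Hypothetical signature ontology: term -> its direct superterm.
PARENT = {
    "PsychotriaLimonensisArealDensity": "SpeciesArealDensity",
    "SpeciesArealDensity": "Measurement",
    "DiversityValue": "Measurement",
}

# Hypothetical actor signatures: name -> (input types, output types).
ACTORS = {
    "DensityReader": ([], ["PsychotriaLimonensisArealDensity"]),
    "ShannonIndex": (["SpeciesArealDensity"], ["DiversityValue"]),
    "MapRenderer": (["RasterMap"], ["Image"]),
}


def is_a(term, ancestor):
    """True if term equals ancestor or is subsumed by it."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


def can_connect(producer, consumer):
    """True if some output of producer satisfies some input of consumer."""
    outputs = ACTORS[producer][1]
    inputs = ACTORS[consumer][0]
    return any(is_a(out, inp) for out in outputs for inp in inputs)


def compatible_actors(canvas):
    """Actors that could be wired to at least one actor in `canvas`."""
    if not canvas:  # blank canvas: nothing constrains the list yet
        return sorted(ACTORS)
    return sorted(
        actor for actor in ACTORS
        if any(can_connect(c, actor) or can_connect(actor, c) for c in canvas)
    )


print(compatible_actors([]))
print(compatible_actors(["DensityReader"]))
```

The same function could serve both mechanisms: the toggle would pass every actor on the canvas, while "Show compatible actors" would pass only the selected ones.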

Ontology issues

There is some question as to what exactly our needs are for the topical ontology and the signature ontology. These could probably both be part of one overall ontology, but that may also complicate matters. The topical ontology is in many ways more general, and would represent one or more classification axes that allow users to label the actors and data with relevant terms. For example, an actor that calculates the Shannon-Wiener index might be labeled as a "BiodiversityIndex", and a model that generates spatially explicit maps of species distributions might be labeled a "SpatialSpeciesDistributionModel" (not that this would be an explicit term, but it would be the meaning of the annotation).

The signature ontology would specifically be used to label the semantic types of data that are contained within a data set or that flow between two or more actors. Thus, this ontology would contain terms that are concretely tied to real-world measurements as represented by data. For example, the ontological annotation for a particular column in a biodiversity data set might be "PsychotriaLimonensisArealDensity" (it would also have a structural type describing units, etc.). A particular "BiodiversityIndex" actor might take as input data with the type "SpeciesArealDensity", and the ontology would allow us to deduce that the Psychotria limonensis data is semantically compatible with the actor's input requirement.
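The deduction in the Psychotria limonensis example amounts to a subsumption check over the signature ontology. A minimal sketch follows, assuming the ontology is available as a map from each term to its direct superterms. The intermediate term "PlantSpeciesArealDensity" and the terms "ArealDensity" and "SpeciesAbundance" are hypothetical, added only to show a multi-step, multi-parent hierarchy.

```python
# Hypothetical fragment of the signature ontology: term -> direct superterms.
SUPERTERMS = {
    "PsychotriaLimonensisArealDensity": {"PlantSpeciesArealDensity"},
    "PlantSpeciesArealDensity": {"SpeciesArealDensity"},
    "SpeciesArealDensity": {"ArealDensity", "SpeciesAbundance"},
}


def ancestors(term):
    """Collect every term that subsumes `term`, directly or transitively."""
    found, stack = set(), [term]
    while stack:
        for parent in SUPERTERMS.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_compatible(data_type, required_type):
    """Data fits an input if its type is the required type or a subtype of it."""
    return data_type == required_type or required_type in ancestors(data_type)


# The column type is subsumed by the actor's input type, so the data can be used.
print(is_compatible("PsychotriaLimonensisArealDensity", "SpeciesArealDensity"))
```

The check is directional: data typed only as "SpeciesArealDensity" would not satisfy an input that requires "PsychotriaLimonensisArealDensity". A real implementation would more likely delegate this to an ontology reasoner than walk the hierarchy by hand.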

Data and Actor Classification and Organization

Overview

When composing workflows, scientists can create new composite actors that wrap up other actors into a functional unit that performs a typical task. They can also create new atomic actors that are introduced into Kepler using the KSW file format that is being designed. We want scientists to be able to label these new actors with the proper terms from the topical and signature ontologies so that they can be saved and then discovered later using the semantic techniques outlined above. When annotating these new actors, a scientist may need new terms in the ontology to properly classify the new actor. These could be topical terms or new semantic data types that are not yet represented. In addition, the ontology may not accurately capture a particular scientist's views and knowledge of the domain area, so they may wish to reorganize the ontology to better reflect their semantic worldview. In both of these cases the Kepler GUI needs to accommodate modification of the ontologies.

Data that are created by executing a workflow will be saved locally and possibly remotely, and will also need to be semantically labeled to maximize their utility. We hope that part of the semantic annotation may be propagated through the workflow automatically, but in the absence of this advanced feature the scientist would need to be able to annotate the derived data with its appropriate semantic type. This operation is closely related to annotating the ports of a new actor.

Envisioned GUI

How exactly this would work is an open topic. In the current Kepler GUI the topical ontology is represented as a tree control on the left side of the window. Each category shows up in the tree under each of its parent categories, and annotated actors show up in the tree node for each ontology term that applies. Thus, an actor may show up in one, two, or more parts of the tree.

To annotate a new actor, we envision being able to drag a composite actor from the canvas and drop it onto an ontology category in the tree. It could then be dragged again to add it to another category as well. This operation adds the new actor to the subsumption hierarchy at the appropriate place(s). If the ontology ends up representing more than just subsumption, we will need a GUI mechanism for creating these additional relationships.

One of our concerns is that the folder view currently shown in the tree conveys a much more informal representation of the ontology than is appropriate, so we have considered changing the icons from folders to another 'container' icon that might reduce some of the implications of a folder. Strictly speaking, one physical item cannot be in more than one folder at a time, so the folder metaphor isn't really appropriate for our case. The tree itself isn't really accurate either, as the ontology is a graph, so we should consider whether we want to use a more appropriate representation of the ontology in the left pane. Of course, switching to a graph representation could be very confusing, so any benefit in accuracy of the visualization would need to be weighed against the cost of using a less familiar and more complicated representation.
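To make the tree-versus-graph issue concrete, the sketch below unfolds a small category graph into tree paths, so a category with two parents (and every actor in it) appears twice. It also models the drag-and-drop annotation as adding the actor to a category. The category names, the `annotate` helper, and the data layout are hypothetical and are not taken from the Kepler GUI code.

```python
# Hypothetical category graph: category -> direct subcategories.
# "BiodiversityIndex" has two parents, so the structure is a graph, not a tree.
SUBCATEGORIES = {
    "Root": ["Ecology", "Statistics"],
    "Ecology": ["BiodiversityIndex"],
    "Statistics": ["BiodiversityIndex"],
    "BiodiversityIndex": [],
}

# Actors annotated with each category.
MEMBERS = {}


def annotate(actor, category):
    """Model dropping an actor onto a category in the tree."""
    MEMBERS.setdefault(category, []).append(actor)


def tree_paths(node, prefix=()):
    """Yield every tree row as a path, repeating shared nodes under each parent."""
    path = prefix + (node,)
    yield "/".join(path)
    for actor in MEMBERS.get(node, []):
        yield "/".join(path + (actor,))
    for child in SUBCATEGORIES.get(node, []):
        yield from tree_paths(child, path)


annotate("ShannonIndex", "BiodiversityIndex")
for row in tree_paths("Root"):
    print(row)
```

Here the actor is stored once but is listed under both Root/Ecology/BiodiversityIndex and Root/Statistics/BiodiversityIndex, which is the behavior that makes the folder metaphor misleading.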

An alternative approach might be to right click on an actor and be shown a dialog for classifying the actor. This dialog would show both the ontology and the actor with its ports, and would allow the components of the actor to be annotated with terms from the ontology.

Another alternative, proposed by B. Ludaescher, is to allow the user to enter Sparrow expressions in a text box to describe actor annotations. This is simpler to implement but raises significant usability questions, partly because it assumes that the user has detailed knowledge of the terms in the ontology and the relationships between them.

In many ways, annotating the semantic type of the ports (the actor's I/O signature) involves a different set of operations in the GUI than annotating what the actor is 'about' in terms of its function and algorithm.

Data and Actor Transformation and Integration

Overview

Envisioned GUI

Ontology issues

Semantic Workflow Design

Overview

Different levels of abstraction...drill down...

Envisioned GUI

Comments on the Draft



This particular version was published on 01-Mar-2005 10:49:37 PST by NCEAS.jones.