Kepler Meeting SMS Notes

The Semantic Mediation System and KEPLER


Exploiting Ontologies

In SEEK we want to exploit "eco" ontologies to do "smart discovery and integration"
The goal is to "tag" (annotate) data and workflows (and their components/actors) using ontology terms
Our solutions are meant to be generic, and so applicable to KEPLER

Ontology Languages

An ontology is:

a set of concept (class) names,
subconcept (subclass) links,
named (directed, binary) relationships between concepts,
and constraints (cardinality, equivalence, conjunction, disjunction, etc.)

In SEEK, we've adopted the Web Ontology Language (OWL)
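
To make the definition above concrete, here is a minimal in-memory sketch of such a structure. It is not OWL and not the SEEK implementation: the class name Ontology, its methods, and the concepts AbundanceMeasurement, SpeciesCount, and SpatialContext are assumptions made for illustration, and constraints (cardinality, equivalence, etc.) are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: concept names, is-a links, and named binary relationships."""
    concepts: set = field(default_factory=set)
    # concept -> set of direct superconcepts (subclass links)
    parents: dict = field(default_factory=dict)
    # (source concept, relationship name) -> target concept
    relations: dict = field(default_factory=dict)

    def add_concept(self, name, *supers):
        self.concepts.add(name)
        self.concepts.update(supers)
        self.parents.setdefault(name, set()).update(supers)

    def add_relation(self, source, name, target):
        self.concepts.update((source, target))
        self.relations[(source, name)] = target

    def is_a(self, concept, ancestor):
        """True if concept equals ancestor or is a transitive subconcept of it."""
        seen, stack = set(), [concept]
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return False


onto = Ontology()
onto.add_concept("Measurement")
onto.add_concept("AbundanceMeasurement", "Measurement")   # hypothetical concept
onto.add_concept("SpeciesCount", "AbundanceMeasurement")  # hypothetical concept
onto.add_relation("Measurement", "spatialContext", "SpatialContext")
```

In practice an OWL reasoner would answer the is-a question; the hand-written traversal only shows what is being asked.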

Semantic Annotations

A semantic annotation assigns an "item" to an ontology "expression".

Items

Datasets: An entire dataset or some portion (a single table, one or more attributes, one or more data values, etc.)
Workflows and components: A workflow, a workflow actor, or some portion (parameters, ports, substructures of a port type, etc.).

Selecting Items

Can be as simple as an LSID, e.g., one that identifies an actor or dataset
Simple query expressions can also be used, e.g., like XPath/XPointer addressing, using EML attribute identifiers, etc.
More generally, expressed as a query.

Ontology Expressions

Defines the semantic "context" of the item selected
Can be as simple as a single concept id (like "Measurement")
Simple expressions can also be used, e.g., as paths in an ontology

Example: Measurement.spatialContext.loc.latDeg specifies the location of a Measurement's spatialContext as a latitude in degrees

More generally, update queries, e.g., SQL-style update queries
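
A rough sketch of an annotation as an (item, expression) pair, with the path example above resolved step by step. The annotation language is not finalized, so everything here is an assumption: the Annotation class, the resolve_path helper, the target concepts SpatialContext, Location, and LatitudeInDegrees, and the made-up item identifier.

```python
from dataclasses import dataclass

# Hypothetical relationship table: (source concept, relationship) -> target concept.
RELATIONS = {
    ("Measurement", "spatialContext"): "SpatialContext",
    ("SpatialContext", "loc"): "Location",
    ("Location", "latDeg"): "LatitudeInDegrees",
}


@dataclass(frozen=True)
class Annotation:
    item: str        # selector for the annotated item (an ID or a simple query)
    expression: str  # ontology expression: a concept id or a dotted path


def resolve_path(expression, relations=RELATIONS):
    """Walk a dotted path and return the concept it ends at.

    Raises KeyError if a step names a relationship the table lacks.
    """
    concept, *steps = expression.split(".")
    for step in steps:
        concept = relations[(concept, step)]
    return concept


ann = Annotation(
    item="urn:lsid:example.org:dataset:1#attribute=lat",  # made-up identifier
    expression="Measurement.spatialContext.loc.latDeg",
)
end_concept = resolve_path(ann.expression)
```

A single concept id such as "Measurement" is the degenerate case: a path with no steps, which resolves to itself.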

Architecture Issues

SMS-Based Applications

Browsing/Keyword Search

Categorize workflows, actors, datasets according to their position in the ontology concept hierarchy.
Search based on individual concepts (as a keyword), providing "term expansion" capabilities
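
One plausible reading of "term expansion" is that a query for a concept also matches items annotated with any of its subconcepts. The sketch below assumes that reading; the hierarchy, the item ids, and the function names expand and search are all hypothetical.

```python
# Hypothetical is-a hierarchy: concept -> direct subconcepts.
SUBCONCEPTS = {
    "Measurement": ["AbundanceMeasurement", "BiomassMeasurement"],
    "AbundanceMeasurement": ["SpeciesCount"],
}

# Hypothetical annotation repository: item id -> concepts it is tagged with.
ANNOTATIONS = {
    "actor:CountSpecies": {"SpeciesCount"},
    "actor:WeighSample": {"BiomassMeasurement"},
    "dataset:plots-2004": {"AbundanceMeasurement"},
    "actor:ReadFile": {"FileInput"},
}


def expand(term, subconcepts=SUBCONCEPTS):
    """Return the term plus all of its transitive subconcepts."""
    expanded, stack = set(), [term]
    while stack:
        concept = stack.pop()
        if concept not in expanded:
            expanded.add(concept)
            stack.extend(subconcepts.get(concept, ()))
    return expanded


def search(term, annotations=ANNOTATIONS):
    """Return ids of items annotated with the term or any of its subconcepts."""
    wanted = expand(term)
    return sorted(item for item, tags in annotations.items() if tags & wanted)
```

Searching for "Measurement" here returns the two measurement actors and the dataset but not the file reader, which is also the grouping a concept-hierarchy browser would show under that node.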

Find "compatible" workflow actors

Given a workflow actor, find actors that can be connected to it (either as input or output) based on semantic annotations. If the annotations are "compatible" according to the ontology(ies), the actor is returned.
Could result in "data binding" -- a dataset may be a "compatible" input.
Note that semantic compatibility does not imply structural compatibility (the i/o types may not match; see below)
Requires port inputs/outputs to be semantically annotated
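
A sketch of the compatible-actor search, under the assumption that an output port can feed an input port when the output's concept is the same as, or a subconcept of, the input's concept. That rule is only one possible definition of "compatible"; the Actor class, the port annotations, and the concept names are made up for illustration, and structural (i/o type) compatibility is not checked.

```python
from dataclasses import dataclass, field

# Hypothetical is-a links: concept -> direct superconcept.
PARENT = {
    "SpeciesCount": "AbundanceMeasurement",
    "AbundanceMeasurement": "Measurement",
    "Density": "Measurement",
}


def is_a(concept, ancestor, parent=PARENT):
    """True if concept equals ancestor or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = parent.get(concept)
    return False


@dataclass
class Actor:
    name: str
    inputs: dict = field(default_factory=dict)   # port name -> concept
    outputs: dict = field(default_factory=dict)  # port name -> concept


def can_feed(producer, consumer):
    """True if some output of producer is-a the concept of some input of consumer."""
    return any(
        is_a(out, inp)
        for out in producer.outputs.values()
        for inp in consumer.inputs.values()
    )


def compatible_actors(actor, library):
    """Return (downstream, upstream) names of actors connectable to actor."""
    others = [a for a in library if a is not actor]
    downstream = [a.name for a in others if can_feed(actor, a)]
    upstream = [a.name for a in others if can_feed(a, actor)]
    return downstream, upstream


counter = Actor("CountSpecies", outputs={"count": "SpeciesCount"})
summary = Actor("SummarizeAbundance", inputs={"data": "AbundanceMeasurement"},
                outputs={"stats": "Measurement"})
plotter = Actor("PlotDensity", inputs={"values": "Density"})
library = [counter, summary, plotter]
```

The same can_feed test, applied to every connection of an existing workflow, would give the connection-by-connection check described under workflow analysis.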

Workflow "analysis"

Given a workflow of connected actors, check that each connection (input/output) is semantically compatible.
Analysis may take advantage of annotation propagation (this is still research)

Workflow actor structural integration

Given two actors that are semantically compatible, determine one or more transformations (either by inserting new actors or deriving transformation "code") to make them structurally compatible.

In general, actor integration is a planning-style search problem (and still research)
May be a place where SCIA can contribute, to derive the structural transformation code and help users refine mappings

Dataset merging and integration

Search for "similar" datasets based on semantic annotations of current dataset
Given two datasets, merge them (data fusion) into a single dataset based on their semantic annotations + metadata
Define a dataset of interest (as a query---the classic approach---or as a target, annotated schema), then find/integrate datasets to populate result (classic data integration).

Perhaps places for SCIA to contribute?
In general, still research
Integration depends on the granularity/quality of the annotations, ontologies, etc.
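
As a very small illustration of merging driven by annotations, the sketch below lines up the columns of two datasets by the concept each column is annotated with. It assumes one concept per column and does nothing else a real fusion step would need (unit conversion, duplicate detection, conflict resolution); the column names, concepts, and function names are hypothetical.

```python
def to_concepts(rows, column_concepts):
    """Rename each annotated column to its concept; drop unannotated columns."""
    return [
        {column_concepts[col]: value
         for col, value in row.items() if col in column_concepts}
        for row in rows
    ]


def merge(*annotated_datasets):
    """Concatenate datasets after mapping their columns onto shared concepts."""
    merged = []
    for rows, column_concepts in annotated_datasets:
        merged.extend(to_concepts(rows, column_concepts))
    return merged


# Each dataset is (rows, column -> concept annotation); all values are made up.
site_a = (
    [{"sp": "Quercus alba", "n": 12}],
    {"sp": "SpeciesName", "n": "SpeciesCount"},
)
site_b = (
    [{"taxon": "Acer rubrum", "count": 7, "obs": "jd"}],
    {"taxon": "SpeciesName", "count": "SpeciesCount"},
)
merged = merge(site_a, site_b)
```

The unannotated column "obs" is dropped, which shows how the result depends on the granularity of the annotations.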

Repositories

Ontology(ies)
Datasets (or metadata stating how to obtain the datasets)
Workflows and Actors (or metadata, etc.)
Semantic Annotations

"Smart discovery and integration" needs access to these components:

To search for a workflow component, we would search through semantic annotations. When an annotation matches, obtain the corresponding component.
To organize (for browsing) all actors according to their annotations, we might iterate over the actors and look up each one's annotations; similarly for datasets.

Required Tools

Ontology Editors/Browsers

The KR group in SEEK

Semantic Annotation Editors/Browsers

For creating, editing, registering annotations
KR and SMS group in SEEK

Ontology-based query rewriting/answering

Classification based on ontology (Jena, Racer, etc.)
Efficiently using rewriting to find components
Testing semantic compatibility
Annotation propagation (research)

Actor integration reasoning

Structural transformation algorithms (SCIA? CLIO? Schema Mapping?)
Search a la planners

Data merging and integration reasoning

Algorithms and rules for fusing together data
Structural transformation algorithms (see above)
Basic conversions like count/area = density
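
The count/area = density conversion could be a rule applied row by row before merging. A minimal sketch, assuming rows are plain dictionaries with the hypothetical field names count, area, and density:

```python
def with_density(row):
    """Return a copy of row with 'density' derived as count / area when possible.

    Rows that already have a density, lack a count, or have a missing or
    zero area are returned unchanged (as a copy).
    """
    derived = dict(row)
    area = row.get("area")
    if "density" not in row and "count" in row and area:
        derived["density"] = row["count"] / area
    return derived


plots = [
    {"plot": "A", "count": 30, "area": 12.0},
    {"plot": "B", "density": 1.5},
    {"plot": "C", "count": 5},
]
converted = [with_density(row) for row in plots]
```

A real rule would also need the units of count and area, which would presumably come from the metadata or the annotations.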

Explanation viewers/systems

To explain why an answer was obtained
Closely tied to ontology editors/browsers

"Smart" Actor Search in Kepler

A very simple keyword-based search we (Chad and I) implemented within Kepler.

Integrated with the actor 'quick search' frame
Allows dynamic actor classification (for browsing)
Allows runtime annotation and re-classification of actors
Term expansion for individual concept queries

Required a number of new features in Kepler:

ID mechanism for actors
Repositories: fakes out a component repository (as a Ptolemy XML config file), an annotation repository (XML file), and an ontology repository (simple is-a hierarchy, no relationships)
Provides a very naive, hand-coded, local ID service (like for LSIDs)

What's needed for KEPLER

Ontologies and Ontology Tools

Need more example ontologies.
There also aren't tools within Kepler for creating, browsing, or editing ontologies (coupling tools within Kepler? importing OWL files? etc.).

Annotations

Need to formalize/finalize the annotation language
Annotation interface for Kepler
Also, may want:

GUI design
A uniform way to access/browse a component/dataset and its attributes, such as ports and their input/output types.
Perhaps SCIA can help with specifying annotations?

Basic Kepler GUI Hooks

Like for toolbar, menus, etc.
Checking semantic compatibility (can steal unit resolver?).
Explanation of results (like for searching, etc.)
Joint development with Ptolemy group for customizable menus, etc.
Use a personal ontology. Swap out default ontologies.

Algorithms

Need to understand the integration/merging algorithms better (working on examples/test cases currently ...)
The other types of search algorithms (compatible actors/datasets) could be written today

Repositories

Basically none of the repositories exist (except perhaps for Data, not sure)
I think the Kepler Obj. Manager can help with this. What we need from it is:

Ability to register components, data sets, ontologies, and annotations with the obj. manager
Ability to access all LSIDs of a certain type, e.g., components, data sets, ontologies, annotations
Ability to retrieve the object for an LSID
Some form of annotation indexing (this is similar to metadata indexing perhaps)

A search can be executed directly against an in-memory annotation file (e.g., obtained dynamically from all registered objects)
This is in contrast to asking the obj. mngr. for all LSIDs that are annotations and, for each, retrieving the annotation file, etc.

For efficiency, we probably want multiple access paths via LSIDs, e.g., get all the workflow components and for each, retrieve its annotation (if there are a lot more annotations than just for components); or build an annotation index based on these LSIDs, etc.

The types of indexing needed should be driven by development/testing
We may consider an obj. mngr. architecture that can easily support "extensible" indexing strategies (e.g., through listeners, etc.)
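
A sketch of what the requirements above might look like, with indexing added through listeners. This is not the Kepler object manager's actual interface: the class and method names, the "kind" strings, the annotation record layout, and the example LSIDs are all assumptions.

```python
from collections import defaultdict


class ObjectManager:
    """Toy registry: register objects by LSID, list LSIDs by type, fetch by LSID."""

    def __init__(self):
        self._objects = {}    # lsid -> (kind, object)
        self._listeners = []  # callables notified on every registration

    def add_listener(self, callback):
        self._listeners.append(callback)

    def register(self, lsid, kind, obj):
        self._objects[lsid] = (kind, obj)
        for callback in self._listeners:
            callback(lsid, kind, obj)

    def lsids_of_type(self, kind):
        return [lsid for lsid, (k, _) in self._objects.items() if k == kind]

    def get(self, lsid):
        return self._objects[lsid][1]


class AnnotationIndex:
    """One pluggable indexing strategy: concept -> LSIDs of annotated items."""

    def __init__(self):
        self.by_concept = defaultdict(set)

    def __call__(self, lsid, kind, obj):
        if kind == "annotation":
            self.by_concept[obj["concept"]].add(obj["item"])


manager = ObjectManager()
index = AnnotationIndex()
manager.add_listener(index)  # further indexes could be added the same way

actor_id = "urn:lsid:example.org:actor:1"            # made-up LSIDs
annotation_id = "urn:lsid:example.org:annotation:1"
manager.register(actor_id, "component", {"name": "CountSpecies"})
manager.register(annotation_id, "annotation",
                 {"item": actor_id, "concept": "SpeciesCount"})
```

With the index in place, finding the components for a concept is a dictionary lookup followed by get, rather than retrieving and scanning every annotation.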


This page last changed on 20-Jan-2005 14:48:24 PST by SDSC.bowers.