Kepler Meeting SMS Notes

This is version 31.

The Semantic Mediation System and KEPLER


Exploiting Ontologies

In SEEK we want to exploit "eco" ontologies to do "smart discovery and integration"
The goal is to "tag" (annotate) data and workflows (and their components) using ontology terms
Our solutions are meant to be generic, applicable to KEPLER

Ontology Languages

An ontology is:

a set of concept (class) names,
subconcept (subclass) links,
named (directed, binary) relationships between concepts,
and constraints (cardinality, equivalence, conjunction, disjunction, etc.)
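To make these ingredients concrete, here is a minimal Python sketch of an in-memory ontology, assuming plain string concept names. It is not OWL and not a SEEK or Kepler API. Constraints are left out, and the class, its methods, and the example concept "Biomass" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: concept names, subconcept links, named binary relationships.

    Constraints (cardinality, equivalence, etc.) are deliberately omitted.
    """
    concepts: set = field(default_factory=set)
    # child concept -> set of direct parent concepts (subclass links)
    parents: dict = field(default_factory=dict)
    # (domain concept, relationship name) -> range concept
    relations: dict = field(default_factory=dict)

    def add_concept(self, name, *supers):
        """Add a concept and, optionally, its direct superconcepts."""
        self.concepts.add(name)
        self.parents.setdefault(name, set()).update(supers)
        for s in supers:
            self.concepts.add(s)
            self.parents.setdefault(s, set())

    def add_relation(self, domain, name, range_):
        """Add a named, directed relationship from domain to range_."""
        self.relations[(domain, name)] = range_

    def is_subconcept(self, sub, sup):
        """True if sub == sup or sup is reachable from sub via subclass links."""
        stack, seen = [sub], set()
        while stack:
            c = stack.pop()
            if c == sup:
                return True
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents.get(c, ()))
        return False
```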

In SEEK, we've adopted the Web Ontology Language (OWL)

Semantic Annotations

A semantic annotation assigns an "item" to an ontology "expression".

Items

Datasets: An entire dataset or some portion (a single table, one or more attributes, one or more data values, etc.)
Workflows and components: A workflow, a workflow component, or some portion (parameters, ports, substructures of a port type, etc.).

Selecting Items

Can be as simple as an LSID, e.g., one that identifies an entire component or dataset
Simple query expressions can also be used, e.g., XPath/XPointer-style addressing, EML attribute identifiers, etc.
More generally, the selection is expressed as a query.
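A semantic annotation can then be modelled as a pair (item selector, ontology expression). The sketch below assumes an item is selected by an LSID plus an optional XPath-like path string, and treats the ontology expression as a plain string; the class names, the helper function, and the example LSIDs are all illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemSelector:
    """Selects an item: an LSID, optionally narrowed by a simple path expression."""
    lsid: str
    path: str = ""  # empty path selects the whole component or dataset


@dataclass(frozen=True)
class SemanticAnnotation:
    """Assigns a selected item to an ontology expression (here just a string)."""
    item: ItemSelector
    expression: str


def annotations_for(annotations, lsid):
    """Return every annotation whose selected item lives in the object with this LSID."""
    return [a for a in annotations if a.item.lsid == lsid]
```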

Ontology Expressions

Defines the semantic "context" of the item selected
Can be as simple as a single concept id (like "Measurement")
Simple expressions can also be used, e.g., as paths in an ontology

Example: Measurement.spatialContext.loc.latDeg specifies the location of a Measurement's spatialContext as a latitude in degrees

More generally, an ontology expression can be an update query, e.g., an SQL-style update query
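A path expression such as Measurement.spatialContext.loc.latDeg can be checked against an ontology by following one named relationship per step. This sketch assumes relationships are stored as a (concept, relationship name) -> range concept map and ignores inherited relationships; the range concept names used in the test (SpatialContext, Location, LatitudeDegrees) are hypothetical.

```python
def resolve_path(relations, path):
    """Follow a dotted path such as 'Measurement.spatialContext.loc.latDeg'.

    relations maps (concept, relationship name) -> range concept.
    Returns the concept reached at the end of the path, or raises KeyError
    naming the first step that the ontology does not define.
    """
    concept, *steps = path.split(".")
    for step in steps:
        key = (concept, step)
        if key not in relations:
            raise KeyError(f"{concept} has no relationship '{step}'")
        concept = relations[key]
    return concept
```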

Architecture Issues

SMS-Based Applications

Browsing/Keyword Search

Categorize workflows, components, datasets according to their position in the ontology concept hierarchy.
Search based on individual concepts (as a keyword), providing "term expansion" capabilities
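Keyword search with term expansion might work like this: expand the query concept to itself plus all of its (transitive) subconcepts, then return every item annotated with any of those concepts. The dict-based structures and the concept names in the test are assumptions for illustration.

```python
def expand_term(children, concept):
    """Return concept plus all of its transitive subconcepts.

    children maps a concept to the list of its direct subconcepts.
    """
    result, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(children.get(c, ()))
    return result


def keyword_search(children, annotations, keyword):
    """annotations maps item id -> set of concepts; return matching item ids, sorted."""
    terms = expand_term(children, keyword)
    return sorted(item for item, concepts in annotations.items() if concepts & terms)
```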

Find "compatible" workflow components

Given a workflow component (an actor), find components that can be connected to it (either as input or output) based on semantic annotations. If the annotations are "compatible" according to the ontology(ies), the component is returned.
Could result in "data binding": a dataset may be a "compatible" input.
Note that semantic compatibility does not imply structural compatibility (the i/o types may not match; see below)
Requires port inputs/outputs to be semantically annotated
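One simple reading of "compatible" is that an output annotated with concept C may feed an input annotated with C or any superconcept of C. A sketch under that assumption, with components represented as dicts of input/output port concepts (all names hypothetical):

```python
def is_subconcept(parents, sub, sup):
    """True if sub == sup or sup is reachable from sub via parent links."""
    stack, seen = [sub], set()
    while stack:
        c = stack.pop()
        if c == sup:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return False


def compatible_components(parents, target, components):
    """Return names of components that could be connected to target.

    Each component is a dict {"inputs": [concept, ...], "outputs": [concept, ...]}.
    A connection out -> in is treated as compatible when the output's concept
    is the same as, or a subconcept of, the input's concept.
    """
    found = []
    for name, comp in components.items():
        feeds_target = any(is_subconcept(parents, o, i)
                           for o in comp["outputs"] for i in target["inputs"])
        fed_by_target = any(is_subconcept(parents, o, i)
                            for o in target["outputs"] for i in comp["inputs"])
        if feeds_target or fed_by_target:
            found.append(name)
    return sorted(found)
```

Note that this only tests semantic compatibility; the port types may still need structural transformation.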

Workflow "analysis"

Given a workflow of connected components, check that each connection (input/output) is semantically compatible.
Analysis may take advantage of annotation propagation (this is still research)
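Under the same subsumption rule, checking a whole workflow amounts to testing every connection. This sketch assumes each port carries exactly one concept and reports connections touching unannotated ports as failures; it does no annotation propagation, and the port-id format is hypothetical.

```python
def is_subconcept(parents, sub, sup):
    """True if sub == sup or sup is reachable from sub via parent links."""
    stack, seen = [sub], set()
    while stack:
        c = stack.pop()
        if c == sup:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return False


def check_workflow(parents, port_concepts, connections):
    """Return the connections whose annotations are not semantically compatible.

    port_concepts maps a port id such as "A.out" to its annotated concept;
    connections is a list of (output port id, input port id) pairs.
    Connections involving an unannotated port are reported, since they cannot be checked.
    """
    problems = []
    for src, dst in connections:
        s, d = port_concepts.get(src), port_concepts.get(dst)
        if s is None or d is None or not is_subconcept(parents, s, d):
            problems.append((src, dst))
    return problems
```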

Workflow-component structural integration

Given two components that are semantically compatible, determine one or more transformations (either by inserting new components or deriving transformation "code") to make them structurally compatible.

In general, component integration is a planning-style search problem (and still research)
May be a place where SCIA can contribute, to derive the structural transformation code and help users refine mappings
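To illustrate the planning-style view, the sketch below runs a breadth-first search over a hypothetical library of transformation components, each converting one structural type to another, and returns the shortest chain to insert between two components. Real structural integration (e.g., what SCIA might derive) involves schema mappings rather than bare type names.

```python
from collections import deque


def plan_transformations(source_type, target_type, transformers):
    """Breadth-first search for the shortest chain of transformers.

    transformers is a list of (name, input_type, output_type) triples.
    Returns the transformer names to insert, [] if the types already match,
    or None if no chain exists.
    """
    queue = deque([(source_type, [])])
    visited = {source_type}
    while queue:
        current, chain = queue.popleft()
        if current == target_type:
            return chain
        for name, t_in, t_out in transformers:
            if t_in == current and t_out not in visited:
                visited.add(t_out)
                queue.append((t_out, chain + [name]))
    return None
```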

Dataset merging and integration

Search for "similar" datasets based on semantic annotations of current dataset
Given two datasets, merge them (data fusion) into a single dataset based on their semantic annotations + metadata
Define a dataset of interest (either as a query, the classic approach, or as a target annotated schema), then find and integrate datasets to populate the result (classic data integration).

Perhaps places for SCIA to contribute?
In general, still research
Integration depends on the granularity/quality of the annotations, ontologies, etc.
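As one possible measure for the "similar datasets" search (a swapped-in technique, not a SEEK design), the sketch below ranks candidates by the Jaccard overlap of their annotated concept sets. The threshold and the flat set-of-concepts representation are assumptions, and the ontology hierarchy is ignored entirely.

```python
def jaccard(a, b):
    """Jaccard similarity of two concept sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_datasets(current, candidates, threshold=0.5):
    """Rank candidate datasets by annotation overlap with the current dataset.

    current is a set of concepts; candidates maps dataset id -> set of concepts.
    Returns (dataset id, score) pairs with score >= threshold, best first.
    """
    scored = [(ds, jaccard(current, concepts)) for ds, concepts in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))
```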

Repositories

Ontology(ies)
Datasets (or metadata stating how to obtain the datasets)
Workflows and Workflow Components (or metadata, etc.)
Semantic Annotations

"Smart discovery and integration" needs access to these components:

To search for a workflow component, we would search through semantic annotations. When an annotation matches, obtain the corresponding component.
To organize all actors (for browsing) according to their annotations, we might iterate over the actors; datasets could be organized the same way.
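The browsing case might look like this: iterate over the annotations, resolve each LSID to its object, and group objects under their concepts. The (lsid, concept) pair format and the plain dict standing in for a repository are assumptions.

```python
from collections import defaultdict


def classify_for_browsing(annotations, repository):
    """Group repository objects under the concepts they are annotated with.

    annotations is a list of (lsid, concept) pairs; repository maps lsid -> object.
    Annotations whose LSID cannot be resolved are skipped.
    """
    groups = defaultdict(list)
    for lsid, concept in annotations:
        obj = repository.get(lsid)
        if obj is not None:
            groups[concept].append(obj)
    return dict(groups)
```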

Required Tools

Ontology Editors/Browsers

The KR group in SEEK

Semantic Annotation Editors/Browsers

For creating, editing, registering annotations
KR and SMS group in SEEK

Ontology-based query rewriting/answering

Classification based on ontology (Jena, Racer, etc.)
Efficiently using rewriting to find components
Testing semantic compatibility
Annotation propagation (research)

Component integration reasoning

Structural transformation algorithms (SCIA? CLIO? Schema Mapping?)
Search a la planners

Data merging and integration reasoning

Algorithms and rules for fusing together data
Structural transformation algorithms (see above)
Basic conversions like count/area = density
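A rule like count/area = density might be applied while merging records, as below. The field names (count, area_m2, density) are hypothetical stand-ins for attributes annotated with the corresponding concepts, and areas are assumed to be already harmonized to square metres.

```python
def derive_density(record):
    """Add 'density' (count per square metre) when count and a nonzero area are present.

    An existing 'density' value is left untouched. Returns a new dict;
    the input record is not modified.
    """
    out = dict(record)
    if "density" not in out and "count" in out and out.get("area_m2"):
        out["density"] = out["count"] / out["area_m2"]
    return out
```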

Explanation viewers/systems

To explain why an answer was obtained
Closely tied to ontology editors/browsers

"Smart" Actor Search in Kepler

A very simple keyword-based search we (Chad and I) implemented within Kepler.

Integrated with the component 'quick search' frame
Allows dynamic actor classification (for browsing)
Allows runtime annotation and re-classification of actors
Term expansion for individual concept queries

Required a number of new features in Kepler:

ID mechanism for actors
Repositories: fakes out the component repository (as a Ptolemy XML config file), the annotation repository (an XML file), and the ontology repository (a simple is-a hierarchy, no relationships)
Provides a very naive, hand-coded, local ID service (like for LSIDs)
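A naive local ID service of this kind could mint identifiers in the LSID shape urn:lsid:&lt;authority&gt;:&lt;namespace&gt;:&lt;object&gt;[:&lt;revision&gt;]. The class below is a hypothetical sketch, not the Kepler implementation, and the authority name in the test is a placeholder.

```python
import itertools


class LocalIdService:
    """Naive in-memory service that mints LSID-style identifiers per namespace."""

    def __init__(self, authority):
        self.authority = authority  # placeholder, not a registered LSID authority
        self._counters = {}

    def mint(self, namespace):
        """Return the next sequential identifier in the given namespace."""
        counter = self._counters.setdefault(namespace, itertools.count(1))
        return f"urn:lsid:{self.authority}:{namespace}:{next(counter)}"

    @staticmethod
    def parse(lsid):
        """Split an LSID into (authority, namespace, object, revision or None)."""
        parts = lsid.split(":")
        if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
            raise ValueError(f"not an LSID: {lsid}")
        authority, namespace, obj = parts[2:5]
        revision = parts[5] if len(parts) == 6 else None
        return authority, namespace, obj, revision
```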

What's needed for KEPLER

Ontologies and Ontology Tools

There basically aren't any.
There also aren't any tools in Kepler for creating, browsing, or editing ontologies.

Annotations

Need to extend the annotation "language"
Desperately need an annotation editor/browser

Need a reasonable/practical GUI design
Need a good way to access/browse a component/dataset and its attributes, such as ports and their input/output types.

Basic Kepler GUI Hooks

Like for toolbar, menus, etc.
Checking semantic compatibility (can steal unit resolver?).
Explanation of results (like for searching, etc.)

Algorithms

Need to understand the integration/merging algorithms
Could write the other types of search algorithms today

Repositories

Basically none of the repositories exist (except perhaps for Data, not sure)
I think the Kepler Obj. Manager can help with this. What we need from it is:

Ability to register components, data sets, ontologies, and annotations with the obj. manager
Ability to access all LSIDs of a certain type, e.g., components, data sets, ontologies, annotations
Ability to retrieve the object for an LSID
Some form of annotation indexing (this is similar to metadata indexing perhaps)
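The first three capabilities amount to a small interface. A hypothetical in-memory sketch (the class, its method names, and the type labels are not an existing Kepler API):

```python
from collections import defaultdict


class ObjectManager:
    """Stores objects by LSID, tagged with a type such as "component" or "annotation"."""

    def __init__(self):
        self._objects = {}
        self._by_type = defaultdict(list)

    def register(self, lsid, obj_type, obj):
        """Register an object under its LSID; re-registering an LSID is an error."""
        if lsid in self._objects:
            raise ValueError(f"already registered: {lsid}")
        self._objects[lsid] = obj
        self._by_type[obj_type].append(lsid)

    def lsids_of_type(self, obj_type):
        """Return all LSIDs registered with the given type, in registration order."""
        return list(self._by_type.get(obj_type, []))

    def get(self, lsid):
        """Return the object for an LSID (KeyError if unknown)."""
        return self._objects[lsid]
```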

A search can be executed directly against an in-memory annotation file (e.g., one assembled dynamically from all registered objects), rather than asking the obj. mngr. for all LSIDs that are annotations and then retrieving each annotation file in turn.

For efficiency, we probably want multiple access paths via LSIDs, e.g., get all the workflow components and retrieve each one's annotation (useful if there are many more annotations than just those for components), or build an annotation index keyed on these LSIDs, etc.

Exactly which types of indexing are needed should be driven by development/testing, but we may want an obj. mngr. architecture that can easily support "extensible" indexing strategies (e.g., through listeners).
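One way to get "extensible" indexing is to have the object manager notify listeners on every registration, so that each index keeps itself up to date. The sketch assumes annotations are dicts with item and concept keys; both class names are hypothetical.

```python
from collections import defaultdict


class ListeningObjectManager:
    """Object store that notifies pluggable index listeners on each registration."""

    def __init__(self):
        self._objects = {}
        self._listeners = []

    def add_listener(self, listener):
        self._listeners.append(listener)

    def register(self, lsid, obj_type, obj):
        self._objects[lsid] = obj
        for listener in self._listeners:
            listener.on_register(lsid, obj_type, obj)

    def get(self, lsid):
        return self._objects[lsid]


class ConceptIndex:
    """Index listener mapping each concept to the LSIDs of items annotated with it."""

    def __init__(self):
        self.items_by_concept = defaultdict(set)

    def on_register(self, lsid, obj_type, obj):
        # Only annotations contribute to this index; other object types are ignored.
        if obj_type == "annotation":
            self.items_by_concept[obj["concept"]].add(obj["item"])
```

New strategies (e.g., a full-text index over annotations) could then be added as further listeners without changing the object manager.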


This particular version was published on 20-Jan-2005 12:32:06 PST by SDSC.bowers.