Science Environment for Ecological Knowledge

Kepler Meeting SMS Notes



The Semantic Mediation System and KEPLER



Exploiting Ontologies

  • In SEEK we want to exploit "eco" ontologies to do "smart discovery and integration"
  • The goal is to "tag" (annotate) data and workflows (and their components) using ontology terms
  • Our solutions are meant to be generic, and so also applicable to KEPLER

Ontology Languages

  • An ontology is:
    1. a set of concept (class) names,
    2. subconcept (subclass) links,
    3. named (directed, binary) relationships between concepts,
    4. and constraints (cardinality, equivalence, conjunction, disjunction, etc.)
  • In SEEK, we've adopted the Web Ontology Language (OWL)
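
A minimal sketch of the ontology structure listed above, covering concept names, subconcept links, and named relationships (constraints such as cardinality are left out). This is plain Python for illustration, not OWL and not the SEEK implementation; the class name `Ontology` and the concept `AbundanceMeasurement` are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: concepts, subconcept links, and named relationships."""
    concepts: set = field(default_factory=set)
    # child concept -> set of direct parent concepts
    parents: dict = field(default_factory=dict)
    # (source concept, relationship name) -> target concept
    relations: dict = field(default_factory=dict)

    def add_concept(self, name, parent=None):
        self.concepts.add(name)
        if parent is not None:
            self.concepts.add(parent)
            self.parents.setdefault(name, set()).add(parent)

    def add_relation(self, source, name, target):
        # Directed, binary relationship between two concepts.
        self.concepts.update({source, target})
        self.relations[(source, name)] = target

    def is_subconcept(self, child, ancestor):
        """True if child equals ancestor or reaches it via subconcept links."""
        stack, seen = [child], set()
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return False


onto = Ontology()
onto.add_concept("Measurement")
onto.add_concept("AbundanceMeasurement", parent="Measurement")  # hypothetical concept
onto.add_relation("Measurement", "spatialContext", "SpatialContext")

print(onto.is_subconcept("AbundanceMeasurement", "Measurement"))  # True
```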

Semantic Annotations

  • A semantic annotation assigns an "item" to an ontology "expression".

    • Items
      • Datasets: An entire dataset or some portion (a single table, one or more attributes, one or more data values, etc.)
      • Workflows and components: A workflow, a workflow component, or some portion (parameters, ports, substructures of a port type, etc.).

    • Selecting Items
      • Can be as simple as an LSID that identifies, e.g., an entire component or dataset
      • More generally, expressed as a query.
      • In practice, however, simple query expressions are used, e.g., XPath/XPointer-style addressing or EML attribute identifiers.

    • Ontology Expressions
      • Defines the semantic "context" of the item selected
      • Can be as simple as a single concept id (like "Measurement")
      • More generally, update queries, e.g., SQL-style update queries
      • In practice, however, simpler expressions are used, such as paths in an ontology
      • Path Example: Measurement.spatialContext.loc.latDeg specifies the location of a Measurement's spatialContext as a latitude in degrees
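
A rough sketch of how an annotation could pair an item selector with an ontology path, and how such a path might be resolved by following named relationships. Only the path `Measurement.spatialContext.loc.latDeg` comes from the notes; the intermediate concept names (`SpatialContext`, `Location`, `LatitudeDegrees`), the LSID string, and the helper `resolve_path` are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical relationship table: (source concept, relationship) -> target concept.
RELATIONS = {
    ("Measurement", "spatialContext"): "SpatialContext",
    ("SpatialContext", "loc"): "Location",
    ("Location", "latDeg"): "LatitudeDegrees",
}


def resolve_path(path, relations=RELATIONS):
    """Walk a dotted ontology path and return the concept it ends at."""
    start, *steps = path.split(".")
    current = start
    for step in steps:
        key = (current, step)
        if key not in relations:
            raise KeyError(f"{current} has no relationship named {step!r}")
        current = relations[key]
    return current


@dataclass(frozen=True)
class SemanticAnnotation:
    item_selector: str  # e.g. an LSID or an XPath-like expression
    ontology_path: str  # e.g. "Measurement.spatialContext.loc.latDeg"


annotation = SemanticAnnotation(
    item_selector="urn:lsid:example.org:dataset:1#attribute[@id='lat']",  # made-up selector
    ontology_path="Measurement.spatialContext.loc.latDeg",
)
print(resolve_path(annotation.ontology_path))  # LatitudeDegrees
```

A path that names a relationship the ontology does not define raises `KeyError`, which is one simple way an annotation editor could flag an invalid expression.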

Architecture

  • Repositories
    • Ontology(ies)
    • Datasets (or metadata stating how to obtain the datasets)
    • Workflows and Workflow Components (or metadata, etc.)
    • Semantic Annotations

    • "Smart discovery and integration" needs access to each of these components:
      • For example, to search for a workflow component, the SMS engine would search through the semantic annotations and, when an annotation matches, obtain the corresponding component.
      • You might also want to organize all actors (for browsing) according to their annotations, which requires iterating over the actors; similarly for datasets.

  • SMS-Based Applications
    1. Browsing/Keyword Search
      • Categorize workflows, components, datasets according to their position in the ontology concept hierarchy.
      • Search based on concepts-as-keywords, providing "term expansion" capabilities
    2. Find "compatible" workflow components
      • Given a workflow component (an actor), find actors that can be connected to it (either as input or output) based on semantic annotations. If the annotations are "compatible" according to the ontology(ies), the component is returned.
      • Note that this could also result in "data binding," e.g., a dataset may be a "compatible" input.
      • Also note that semantic compatibility does not imply structural compatibility (the i/o types may not match)
      • Requires port inputs/outputs to be semantically annotated
    3. Workflow "analysis"
      • Given a workflow, check that each connection (input/output) is semantically compatible.
      • As part of analysis, annotation propagation ...
    4. Workflow-component structural integration
      • Given two components that are semantically compatible, determine a structural transformation (either another component or a transformation step) to make them structurally compatible.
      • May be a place where SCIA can contribute, to derive structural transformations.
    5. Dataset merging/integration
      • Search for "similar" datasets (that could be potentially "merged" or integrated)
      • Define a dataset of interest (via an ontology-style query), find/combine datasets as integrated "view".
        • Perhaps a place for SCIA to contribute?
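
As an illustration of applications 2 and 3 above, here is a small sketch that checks each connection in a workflow for semantic compatibility. The notes do not define "compatible"; the sketch assumes one plausible rule, namely that an output may feed an input when its concept is the same as, or a subconcept of, the input's concept. The hierarchy, port names, and function names are all hypothetical, and structural (i/o type) compatibility is not checked.

```python
# Hypothetical single-inheritance hierarchy: child concept -> parent concept.
PARENT = {
    "AbundanceMeasurement": "Measurement",
    "BiomassMeasurement": "Measurement",
}


def is_subconcept(concept, ancestor, parent=PARENT):
    """True if concept equals ancestor or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = parent.get(concept)
    return False


def semantically_compatible(output_concept, input_concept):
    """Assumed rule: the output must be the same concept or more specific."""
    return is_subconcept(output_concept, input_concept)


def check_workflow(connections, port_annotations):
    """Return the connections whose port annotations are missing or incompatible."""
    problems = []
    for out_port, in_port in connections:
        out_c = port_annotations.get(out_port)
        in_c = port_annotations.get(in_port)
        if out_c is None or in_c is None or not semantically_compatible(out_c, in_c):
            problems.append((out_port, in_port))
    return problems


port_annotations = {
    "reader.out": "AbundanceMeasurement",
    "stats.in": "Measurement",
    "stats.out": "Measurement",
    "biomassPlot.in": "BiomassMeasurement",
}
connections = [("reader.out", "stats.in"), ("stats.out", "biomassPlot.in")]

print(check_workflow(connections, port_annotations))
# [('stats.out', 'biomassPlot.in')]
```

Unannotated ports are reported as problems here, which reflects the requirement that port inputs/outputs be semantically annotated.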

  • Tools

    • Ontology Editors/Browsers
    • Semantic Annotation Editor
    • Ontology-based query rewriting/answering

  • "Smart" Actor Search in Kepler

    • A very simple keyword-based search implementation within Kepler.
    • Fakes out: workflow component LSIDs, an actor repository (as a Ptolemy XML config file), an annotation repository (an XML file), and an LSID service.
    • The "ontology" is a simple hierarchy, with no relationships, etc.
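
The kind of keyword search described here, with "term expansion" over a simple concept hierarchy, can be sketched roughly as follows. This is not the Kepler prototype's code: the in-memory dictionaries stand in for the faked-out XML repositories, and all concept names, LSIDs, and function names are invented for the example.

```python
# Hypothetical concept hierarchy: concept -> direct subconcepts.
SUBCONCEPTS = {
    "Model": ["PopulationModel", "NicheModel"],
    "PopulationModel": ["LogisticGrowthModel"],
}

# Hypothetical annotation repository: actor id -> annotated concepts.
ANNOTATIONS = {
    "urn:lsid:example.org:actor:1": ["LogisticGrowthModel"],
    "urn:lsid:example.org:actor:2": ["NicheModel"],
    "urn:lsid:example.org:actor:3": ["Measurement"],
}


def expand_term(concept, subconcepts=SUBCONCEPTS):
    """Return the concept plus all of its descendants in the hierarchy."""
    expanded, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in expanded:
            expanded.add(current)
            stack.extend(subconcepts.get(current, []))
    return expanded


def search_actors(keyword, annotations=ANNOTATIONS):
    """Find actors annotated with the keyword concept or any of its subconcepts."""
    terms = expand_term(keyword)
    return sorted(actor for actor, concepts in annotations.items()
                  if terms & set(concepts))


print(search_actors("Model"))
# ['urn:lsid:example.org:actor:1', 'urn:lsid:example.org:actor:2']
```

Searching for `Model` also finds the actor annotated only with `LogisticGrowthModel`, which is the point of expanding the search term down the hierarchy.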

  • What's Needed for KEPLER

    • Ontologies
      • There basically aren't any.
      • There also aren't any tools. No tools within Kepler.

    • Repositories ...
      • The Object Manager can help! We don't have repositories for workflows/components, ontologies, annotations, or datasets in KEPLER.
      • For annotations, we need a searchable "index" of annotations and ids (for components, datasets, etc.), and a mechanism to "retrieve" those items.
      • For performance, though, I wonder if the "index" should be held in memory.
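
One way to picture the in-memory "index" idea: a mapping from concept to item ids for searching, plus a lookup from id to item for retrieval. This is only a sketch under that assumption; the class `AnnotationIndex`, its methods, and the example ids are hypothetical, and a real version would resolve ids through the repositories rather than store items directly.

```python
from collections import defaultdict


class AnnotationIndex:
    """Toy in-memory index from ontology concepts to annotated item ids."""

    def __init__(self):
        self._by_concept = defaultdict(set)  # concept -> item ids
        self._items = {}                     # item id -> item (or a locator for it)

    def add(self, item_id, concepts, item):
        """Register an item together with the concepts it is annotated with."""
        self._items[item_id] = item
        for concept in concepts:
            self._by_concept[concept].add(item_id)

    def find(self, concept):
        """Return the ids of all items annotated with the given concept."""
        return sorted(self._by_concept.get(concept, ()))

    def retrieve(self, item_id):
        """Return the item registered under the given id."""
        return self._items[item_id]


index = AnnotationIndex()
index.add("urn:lsid:example.org:actor:1", ["Measurement"], {"kind": "component"})
index.add("urn:lsid:example.org:dataset:7", ["Measurement", "Location"], {"kind": "dataset"})

for item_id in index.find("Measurement"):
    print(item_id, index.retrieve(item_id)["kind"])
```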

    • Semantic Annotation Editor
      • This doesn't exist either ... lots of ways/approaches here.
      • Need a GUI design for this. Also need a good way to access/browse a component and its attributes, such as its ports and their input/output types.
      • Similarly for datasets.
      • The challenges are making this tool easy to use, and accessible within Kepler.

    • Basic Kepler Interfaces / GUI Design
      • Needed, e.g., for searching, for checking semantic compatibility (can steal from the unit resolver), and for explaining semantics (as with searching, etc.)



This particular version was published on 20-Jan-2005 11:25:15 PST by SDSC.bowers.