Exploiting Ontologies
- In SEEK we want to exploit "eco" ontologies to do "smart discovery and integration"
- The goal is to "tag" (annotate) data and workflows (and their components) using ontology terms
- Our solutions are meant to be generic, and thus applicable to KEPLER
Ontology Languages
- An ontology is:
- a set of concept (class) names,
- subconcept (subclass) links,
- named (directed, binary) relationships between concepts,
- and constraints (cardinality, equivalence, conjunction, disjunction, etc.)
- In SEEK, we've adopted the Web Ontology Language (OWL)
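The definition above can be pictured as a small data structure. The following is only a minimal sketch, not OWL and not SEEK's implementation: the class `Ontology`, its method names, and concept names such as `AbundanceMeasurement` and `SpatialContext` are hypothetical, and constraints (cardinality, equivalence, etc.) are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: concept names, subconcept links, named binary relationships."""
    concepts: set = field(default_factory=set)
    parents: dict = field(default_factory=dict)        # concept -> set of direct superconcepts
    relationships: dict = field(default_factory=dict)  # (source concept, name) -> target concept

    def add_concept(self, name, superconcepts=()):
        self.concepts.add(name)
        self.parents.setdefault(name, set())
        for sup in superconcepts:
            self.add_concept(sup)
            self.parents[name].add(sup)

    def add_relationship(self, source, name, target):
        self.add_concept(source)
        self.add_concept(target)
        self.relationships[(source, name)] = target

    def is_subconcept(self, concept, ancestor):
        """True if `concept` equals `ancestor` or reaches it via subconcept links."""
        seen, stack = set(), [concept]
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return False


onto = Ontology()
onto.add_concept("AbundanceMeasurement", superconcepts=["Measurement"])  # hypothetical subconcept
onto.add_relationship("Measurement", "spatialContext", "SpatialContext")
print(onto.is_subconcept("AbundanceMeasurement", "Measurement"))
```

A real OWL ontology would be loaded through an OWL toolkit rather than built by hand; the sketch only shows which pieces the later discovery and integration steps rely on.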
Semantic Annotations
- A semantic annotation assigns an "item" to an ontology "expression".
- Items
- Datasets: An entire dataset or some portion (a single table, one or more attributes, one or more data values, etc.)
- Workflows and components: A workflow, a workflow component, or some portion (parameters, ports, substructures of a port type, etc.).
- Selecting Items
- Can be as simple as an LSID, e.g., that identifies an entire component or dataset
- Simple query expressions can also be used, e.g., like XPath/XPointer addressing, using EML attribute identifiers, etc.
- More generally, expressed as a query.
- Ontology Expressions
- Defines the semantic "context" of the item selected
- Can be as simple as a single concept id (like "Measurement")
- Simple expressions can also be used, e.g., as paths in an ontology
- Example: Measurement.spatialContext.loc.latDeg specifies the location of a Measurement's spatialContext as a latitude in degrees
- More generally, expressed as an update query, e.g., an SQL-style update query
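To make the pairing of an item selector with an ontology expression concrete, here is a small sketch of how a path expression such as Measurement.spatialContext.loc.latDeg might be resolved. Everything beyond that path is an assumption for illustration: the `SemanticAnnotation` class, the `resolve_path` function, the target concept names in `RELATIONSHIPS`, and the selector string are all made up.

```python
from dataclasses import dataclass

# Hypothetical relationship table: (source concept, relationship name) -> target concept.
RELATIONSHIPS = {
    ("Measurement", "spatialContext"): "SpatialContext",
    ("SpatialContext", "loc"): "Location",
    ("Location", "latDeg"): "LatitudeInDegrees",
}


@dataclass(frozen=True)
class SemanticAnnotation:
    item_selector: str        # e.g. an LSID or a path-style address into a dataset
    ontology_expression: str  # a concept id or a dotted path starting at a concept


def resolve_path(expression, relationships):
    """Follow a dotted path through the ontology and return the concept it ends at."""
    concept, *steps = expression.split(".")
    for step in steps:
        if (concept, step) not in relationships:
            raise KeyError(f"concept {concept!r} has no relationship {step!r}")
        concept = relationships[(concept, step)]
    return concept


annotation = SemanticAnnotation(
    item_selector="urn:lsid:example.org:dataset:1/attribute[3]",  # made-up selector
    ontology_expression="Measurement.spatialContext.loc.latDeg",
)
target = resolve_path(annotation.ontology_expression, RELATIONSHIPS)
```

A single concept id such as "Measurement" is the degenerate case: the path has no steps and resolves to the concept itself.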
Architecture Issues
- SMS-Based Applications
- Browsing/Keyword Search
- Categorize workflows, components, datasets according to their position in the ontology concept hierarchy.
- Search based on individual concepts (as a keyword), providing "term expansion" capabilities
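One way to read "term expansion" is that a search for a concept also matches items annotated with any of its subconcepts. The sketch below assumes exactly that reading; the hierarchy, the item ids, and the function names `expand_term` and `search` are hypothetical.

```python
# Hypothetical concept hierarchy: concept -> direct subconcepts.
SUBCONCEPTS = {
    "Measurement": ["AbundanceMeasurement", "BiomassMeasurement"],
    "AbundanceMeasurement": ["SpeciesAbundance"],
}

# Hypothetical annotations: item id -> concept the item is tagged with.
ANNOTATIONS = {
    "actor:regression": "Measurement",
    "dataset:plots-2003": "SpeciesAbundance",
    "actor:biomass-model": "BiomassMeasurement",
    "dataset:soil": "SoilSample",
}


def expand_term(term, subconcepts):
    """Return the term plus every concept below it in the hierarchy."""
    expanded, stack = set(), [term]
    while stack:
        concept = stack.pop()
        if concept not in expanded:
            expanded.add(concept)
            stack.extend(subconcepts.get(concept, ()))
    return expanded


def search(term, annotations, subconcepts):
    """Find items annotated with the term or with any of its subconcepts."""
    wanted = expand_term(term, subconcepts)
    return sorted(item for item, concept in annotations.items() if concept in wanted)


matches = search("Measurement", ANNOTATIONS, SUBCONCEPTS)
```

The same expanded concept set could also drive browsing, by grouping items under each concept in the hierarchy.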
- Find "compatible" workflow components
- Given a workflow component (an actor), find components that can be connected to it (either as input or output) based on semantic annotations. If the annotations are "compatible" according to the ontology(ies), the component is returned.
- Could result in "data binding" -- a dataset may be a "compatible" input.
- Note that semantic compatibility does not imply structural compatibility (the i/o types may not match; see below)
- Requires port inputs/outputs to be semantically annotated
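As a rough illustration of the search for "compatible" components, the sketch below assumes one particular notion of compatibility that these notes do not fix: an output can feed an input when the output's concept equals, or is a subconcept of, the input's concept. The actor names, port annotations, and helper names are hypothetical, and structural (i/o type) compatibility is not checked.

```python
# Hypothetical hierarchy: concept -> direct superconcept (None for a root).
SUPERCONCEPT = {
    "Measurement": None,
    "AbundanceMeasurement": "Measurement",
    "SpeciesAbundance": "AbundanceMeasurement",
    "Location": None,
}

# Hypothetical port annotations: actor name -> concepts on its input and output ports.
ACTORS = {
    "CensusReader": {"inputs": [], "outputs": ["SpeciesAbundance"]},
    "AbundancePlot": {"inputs": ["AbundanceMeasurement"], "outputs": []},
    "MapViewer": {"inputs": ["Location"], "outputs": []},
}


def is_a(concept, ancestor):
    """Walk up the hierarchy from `concept` looking for `ancestor`."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUPERCONCEPT.get(concept)
    return False


def downstream_candidates(actor, actors=ACTORS):
    """Actors with some input port that can accept one of `actor`'s outputs."""
    outputs = actors[actor]["outputs"]
    return sorted(
        name
        for name, ports in actors.items()
        if name != actor
        and any(is_a(out, inp) for out in outputs for inp in ports["inputs"])
    )


candidates = downstream_candidates("CensusReader")
```

Finding upstream candidates (components whose outputs can feed this actor's inputs) is the mirror image of the same check.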
- Workflow "analysis"
- Given a workflow of connected components, check that each connection (input/output) is semantically compatible.
- Analysis may take advantage of annotation propagation (this is still research)
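A connection-by-connection check of this kind might look like the following sketch. It assumes each port carries at most one concept and that compatibility means "same concept or subconcept"; both are assumptions, as are all names below. Annotation propagation is not attempted.

```python
# Hypothetical port annotations: (actor, port) -> concept.
PORT_CONCEPT = {
    ("CensusReader", "out"): "SpeciesAbundance",
    ("AbundancePlot", "in"): "AbundanceMeasurement",
    ("MapViewer", "in"): "Location",
}

# Hypothetical (concept, superconcept) pairs, already closed under transitivity.
SUBCONCEPT_PAIRS = {
    ("SpeciesAbundance", "AbundanceMeasurement"),
    ("SpeciesAbundance", "Measurement"),
    ("AbundanceMeasurement", "Measurement"),
}


def compatible(produced, expected):
    return produced == expected or (produced, expected) in SUBCONCEPT_PAIRS


def analyze(connections, port_concept=PORT_CONCEPT, check=compatible):
    """Return the connections that are unannotated or semantically incompatible."""
    problems = []
    for source, target in connections:
        produced = port_concept.get(source)
        expected = port_concept.get(target)
        if produced is None or expected is None:
            problems.append((source, target, "missing annotation"))
        elif not check(produced, expected):
            problems.append((source, target, f"{produced} is not a {expected}"))
    return problems


workflow = [
    (("CensusReader", "out"), ("AbundancePlot", "in")),
    (("CensusReader", "out"), ("MapViewer", "in")),
]
report = analyze(workflow)
```

Reporting unannotated ports separately reflects the requirement that port inputs/outputs be semantically annotated before any such check can be made.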
- Workflow-component structural integration
- Given two components that are semantically compatible, determine one or more transformations (either by inserting new components or deriving transformation "code") to make them structurally compatible.
- In general, component integration is a planning-style search problem (and still research)
- May be a place where SCIA can contribute, to derive the structural transformation code and help users refine mappings
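To illustrate why component integration resembles a planning-style search, the sketch below does a breadth-first search over a set of transformation components to find a chain that converts one structural type into another. This is a deliberately simplified stand-in, not a proposed solution: the type names, the `TRANSFORMS` table, and the function `plan_transformations` are hypothetical, and real port types are far richer than plain strings.

```python
from collections import deque

# Hypothetical transformation components: name -> (input type, output type).
TRANSFORMS = {
    "CsvToTable": ("csv", "table"),
    "TableToMatrix": ("table", "matrix"),
    "TableToXml": ("table", "xml"),
    "XmlToMatrix": ("xml", "matrix"),
}


def plan_transformations(source_type, target_type, transforms=TRANSFORMS):
    """Breadth-first search for a shortest chain of transforms, or None if there is none."""
    queue = deque([(source_type, [])])
    visited = {source_type}
    while queue:
        current, chain = queue.popleft()
        if current == target_type:
            return chain
        for name, (accepts, produces) in transforms.items():
            if accepts == current and produces not in visited:
                visited.add(produces)
                queue.append((produces, chain + [name]))
    return None


plan = plan_transformations("csv", "matrix")
```

The returned chain names the components that would be inserted between the two semantically compatible actors; deriving transformation "code" instead of inserting components is outside what this sketch covers.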
- Dataset merging and integration
- Search for "similar" datasets based on semantic annotations of current dataset
- Given two datasets, merge them (data fusion) into a single dataset based on their semantic annotations + metadata
- Define a dataset of interest (as a query---the classic approach---or as a target, annotated schema), then find/integrate datasets to populate result (classic data integration).
- Perhaps places for SCIA to contribute?
- In general, still research
- Integration depends on the granularity/quality of the annotations, ontologies, etc.
- Repositories
- Ontology(ies)
- Datasets (or metadata stating how to obtain the datasets)
- Workflows and Workflow Components (or metadata, etc.)
- Semantic Annotations
- "Smart discovery and integration" needs access to these components:
- To search for a workflow component, we would search through semantic annotations. When an annotation matches, obtain the corresponding component.
- To organize all actors for browsing according to their annotations, we might iterate over the actors and look up each one's annotations; similarly for datasets.
*** Ontology Editors/Browsers
*** Semantic Annotation Editor
*** Ontology-based query rewriting/answering
- "Smart" Actor Search in Kepler
A very simple keyword-based search implementation within Kepler.
It fakes the following: workflow component LSIDs, an actor repository
(as a Ptolemy XML config file), an annotation repository (an XML file),
and an LSID service.
The "ontology" is a simple hierarchy, with no relationships, etc.
** Ontologies
There basically aren't any.
There also aren't any tools. No tools within Kepler.
** Repositories ...
The Object Manager can help! We don't have repositories for
workflows/components, ontologies, annotations, or datasets in
KEPLER.
For annotations, need a searchable "index" of annotations and ids
(for components, datasets, etc.), and a mechanism to "retrieve"
those items.
For performance, I wonder though if the "index" should be in
memory.
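As one possible shape for such an in-memory "index", the sketch below keeps an inverted index from concept ids to item ids and then retrieves the items from a separate repository. It is only an illustration under stated assumptions: the class `AnnotationIndex`, the `retrieve` function, the LSID-like ids, and the dictionary standing in for a repository are all hypothetical.

```python
from collections import defaultdict


class AnnotationIndex:
    """In-memory inverted index from concept id to the ids of annotated items."""

    def __init__(self):
        self._by_concept = defaultdict(set)

    def add(self, item_id, concept):
        self._by_concept[concept].add(item_id)

    def lookup(self, concept):
        return sorted(self._by_concept.get(concept, ()))


# Hypothetical repository: item id -> stored item (here just a description string).
REPOSITORY = {
    "urn:lsid:example.org:actor:1": "actor: AbundancePlot",
    "urn:lsid:example.org:dataset:7": "dataset: plot census",
}

index = AnnotationIndex()
index.add("urn:lsid:example.org:actor:1", "AbundanceMeasurement")
index.add("urn:lsid:example.org:dataset:7", "AbundanceMeasurement")


def retrieve(concept, index=index, repository=REPOSITORY):
    """Search the index first, then fetch each matching item by id."""
    return [repository[item_id] for item_id in index.lookup(concept)]


items = retrieve("AbundanceMeasurement")
```

Keeping the index separate from the repository means searches never touch the stored items until a match is found, which is the main argument for holding the index in memory.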
** Semantic Annotation Editor
This doesn't exist either ... lots of ways/approaches here.
Need GUI design for this.
Also, need a good way to access/browse a component/dataset and its
attributes, such as its ports and their input/output types.
Similarly for datasets.
The challenges are making this tool easy to use, and accessible
within Kepler.
** Basic Kepler Interfaces / GUI Design
Like for searching, checking semantic compatibility (can steal
unit resolver), explanation of semantics (like for searching, etc.)