Science Environment for Ecological Knowledge

Kepler Meeting SMS Notes

Difference between version 40 and version 2:

Removed lines 1-2
- !SMS Notes for Kepler Meeting
-
Lines 5-181 were replaced by line 3
- __"Eco" ontologies__
-
- In SEEK we want to exploit "eco" ontologies for semantic
- annotation of services to do "smart discovery and integration"
-
-
- ** Generic
-
- Our solutions are meant to be generic, and applicable for KEPLER
-
-
- * Ontologies
-
- An ontology is a set of concept (class) names, subconcept (subclass)
- links, named (directed, binary) relationships between concepts, and
- constraints (cardinality, equivalence, conjunction, disjunction,
- etc.)
-
-
- * Semantic Annotations
-
- A semantic annotation assigns a 'selected item' to an ontology
- expression.
-
-
- ** 'Items'
-
- *** Datasets
-
- An entire dataset or some portion (a single table,
- one or more attributes, one or more data values, etc.)
-
- *** Workflows and components
-
- A workflow, a workflow component, or some portion
- (parameters, ports, substructures of a port type, etc.).
-
-
- ** Item Selection
-
- Expressed in general as queries, however, simpler expressions can
- be used, e.g., XPath/XPointer addresses or possibly using EML
- identifiers
-
-
- ** Ontology expression
-
- An ontology expression defines the semantic "context" of the item
- selected
-
- Ontology expressions can be as complex as SQL-style update
- queries or simpler, e.g., as single concepts or paths in an
- ontology
-
-
- *** Path Example:
-
- Measurement.spatialContext.loc.latDeg, specifying the
- location of a Measurement's spatialContext as a latitude in
- degrees
-
-
- * Architecture
-
- ** Repositories
-
- *** Ontology(ies)
-
- *** Datasets (or metadata stating how to obtain the datasets)
-
- *** Workflows and Workflow Components (or metadata, etc.)
-
- *** Semantic Annotations
-
- Basically, SMS "smart discovery and integration" needs access to
- these components.
-
- For example, to search for a workflow component, the sms engine
- would search through semantic annotations, and when an annotation
- matches, obtain the corresponding component.
-
- You might also want to organize (for browsing) all actors
- according to their annotations; so need to iterate over actors,
- or similarly, for datasets.
-
-
- ** SMS-Based Applications
-
- *** Browsing/Keyword Search
-
- Workflow and/or dataset browsing and keyword-search based on
- ontology concepts
-
- *** Find "compatible" actors
-
- Given an actor, find semantically compatible actors that can
- be connected as an input or output to the actor (this might
- result in a dataset as input, e.g.)
-
- Requires port inputs/outputs to be semantically annotated
-
- *** Workflow analysis
-
- Given a workflow, check that each connection (input/output)
- is semantically compatible.
-
- As part of analysis, annotation propagation.
-
-
- *** Workflow component structural integration
-
- Given two components that are semantically compatible,
- determine a structural transformation (either another
- component or a transformation step) to make them structurally
- compatible.
-
- May be a place where SCIA can contribute, to derive
- structural transformations.
-
- *** Dataset merging/integration
-
- Search for "similar" datasets (that could be potentially
- "merged" or integrated)
-
- Define a dataset of interest (via an ontology-style query?),
- find/combine datasets as integrated "view".
-
- Perhaps a place for SCIA to contribute?
-
-
- ** Tools
-
- *** Ontology Editors/Browsers
-
- *** Semantic Annotation Editor
-
- *** Ontology-based query rewriting/answering
-
-
- * "Smart" Actor Search in Kepler
-
- A very simple keyword-based search implementation within Kepler.
-
- Fakes out: workflow component LSIDs, an actor repository (as a
- ptolemy xml config file), annotation repository (xml file), LSID
- service.
-
- The "ontology" is a simple hierarchy. No rels, etc.
-
-
-
- * What's Needed for KEPLER
-
- ** Ontologies
-
- There basically aren't any.
-
- There also aren't any tools. No tools within Kepler.
-
-
- ** Repositories ...
-
- The Obj. Mngr. can help! We don't have repositories for
- workflows/components, ontologies, annotations, or datasets in
- KEPLER.
-
- For annotations, need a searchable "index" of annotations and ids
- (for components, datasets, etc.), and a mechanism to "retrieve"
- those items.
-
- For performance, I wonder though if the "index" should be in
- memory.
-
-
- ** Semantic Annotation Editor
-
- This doesn't exist either ... lots of ways/approaches here.
+ Back to [Kepler Meeting Agenda|http://seek.ecoinformatics.org/Wiki.jsp?page=KeplerMeetingAgendaJanuary2004]
Lines 183-186 were replaced by line 5
- Need GUI design for this.
-
- Also, need a good way to access/browse a component/dataset and its
- attributes, such as its ports and their input/output types.
+ ----
Line 188 was replaced by line 7
- Similar with datasets
+ __Exploiting Ontologies__
Lines 190-191 were replaced by lines 9-153
- The challenges are making this tool easy to use, and accessible
- within Kepler.
+ * In SEEK we want to exploit "eco" ontologies to do "smart discovery and integration"
+ * The goal is to "tag" (annotate) data and workflows (and their components/actors) using ontology terms
+ * Our solutions are meant to be generic, applicable to KEPLER
+
+ __Ontology Languages__
+
+ * An ontology is:
+ *# a set of concept (class) names,
+ *# subconcept (subclass) links,
+ *# named (directed, binary) relationships between concepts,
+ *# and constraints (cardinality, equivalence, conjunction, disjunction, etc.)
+ * In SEEK, we've adopted the Web Ontology Language (OWL)
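The ontology components listed above can be pictured as a small in-memory structure. This is a minimal sketch, assuming plain Python dictionaries rather than OWL: the {{Ontology}} class, its methods, and the concept names used with it ({{AbundanceMeasurement}}, {{Density}}, {{SpatialContext}}) are hypothetical, and constraints (cardinality, disjointness, etc.) are left out. Subconcept checks follow subclass links transitively.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: concept names, subconcept links, named binary relations.

    Hypothetical in-memory model for illustration only; constraints are not
    represented, and nothing here reads or writes OWL.
    """
    # concept -> set of its direct superconcepts (subclass links)
    parents: dict[str, set[str]] = field(default_factory=dict)
    # (domain concept, relation name) -> range concept (directed, binary)
    relations: dict[tuple[str, str], str] = field(default_factory=dict)

    def add_concept(self, name: str, *supers: str) -> None:
        self.parents.setdefault(name, set()).update(supers)
        for s in supers:
            self.parents.setdefault(s, set())

    def add_relation(self, domain: str, name: str, range_: str) -> None:
        self.add_concept(domain)
        self.add_concept(range_)
        self.relations[(domain, name)] = range_

    @property
    def concepts(self) -> set[str]:
        return set(self.parents)

    def is_subconcept(self, sub: str, sup: str) -> bool:
        """True if sub == sup or sup is reachable by following subclass links."""
        stack, seen = [sub], set()
        while stack:
            c = stack.pop()
            if c == sup:
                return True
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents.get(c, ()))
        return False
```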
+
+
+ __Semantic Annotations__
+
+ * A semantic annotation assigns an "item" to an ontology "expression".
+
+ ** ''Items''
+ *** ''Datasets'': An entire dataset or some portion (a single table, one or more attributes, one or more data values, etc.)
+ *** ''Workflows and components'': A workflow, a workflow actor, or some portion (parameters, ports, substructures of a port type, etc.).
+
+ ** ''Selecting Items''
+ *** Can be as simple as an LSID, e.g., that identifies an actor or dataset
+ *** Simple query expressions can also be used, e.g., XPath/XPointer addressing, EML attribute identifiers, etc.
+ *** More generally, expressed as a query.
+
+
+ ** ''Ontology Expressions''
+ *** Defines the semantic "context" of the item selected
+ *** Can be as simple as a single concept id (like "Measurement")
+ *** Simple expressions can also be used, e.g., as paths in an ontology
+ **** Example: {{Measurement.spatialContext.loc.latDeg}} specifies the location of a Measurement's spatialContext as a latitude in degrees
+ *** More generally, expressed as update queries (e.g., SQL-style update queries)
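To make the item/expression split concrete, here is a sketch of an annotation whose ontology expression is a path such as {{Measurement.spatialContext.loc.latDeg}}. Only that path comes from these notes; the relation table, the range concepts ({{SpatialContext}}, {{Location}}, {{LatitudeDegrees}}), the {{Annotation}} class, and the LSID and attribute id are assumptions for illustration. Each path step after the first is read as a relation followed from the current concept.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical relation table: (domain concept, relation) -> range concept.
# Only the path Measurement.spatialContext.loc.latDeg comes from the notes;
# the range concept names are made up.
RELATIONS = {
    ("Measurement", "spatialContext"): "SpatialContext",
    ("SpatialContext", "loc"): "Location",
    ("Location", "latDeg"): "LatitudeDegrees",
    ("Location", "lonDeg"): "LongitudeDegrees",
}


def resolve_path(path: str) -> str:
    """Follow a dotted ontology path and return the concept it ends at.

    The first segment names a concept; each later segment names a relation
    followed from the current concept. Raises KeyError on an undefined step.
    """
    concept, *steps = path.split(".")
    for rel in steps:
        if (concept, rel) not in RELATIONS:
            raise KeyError(f"{concept} has no relation {rel!r}")
        concept = RELATIONS[(concept, rel)]
    return concept


@dataclass(frozen=True)
class Annotation:
    """Assigns a selected item to an ontology expression (here, a path)."""
    item_lsid: str           # identifies the dataset or actor
    selector: Optional[str]  # optional sub-item selector, e.g. an EML attribute id
    expression: str          # ontology path expression

    def target_concept(self) -> str:
        return resolve_path(self.expression)


ann = Annotation(
    item_lsid="urn:lsid:example.org:dataset:42",  # hypothetical LSID
    selector="latitude",                           # hypothetical attribute id
    expression="Measurement.spatialContext.loc.latDeg",
)
```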
+
+
+ __Architecture Issues__
+
+ * ''SMS-Based Applications''
+ *# Browsing/Keyword Search
+ *** Categorize workflows, actors, datasets according to their position in the ontology concept hierarchy.
+ *** Search based on individual concepts (as a keyword), providing "term expansion" capabilities
+ *# Find "compatible" workflow actors
+ *** Given a workflow actor, find actors that can be connected to it (either as input or output) based on semantic annotations. If the annotations are "compatible" according to the ontology(ies), the actor is returned.
+ *** Could result in "data binding": a dataset may itself be a "compatible" input.
+ *** Note that semantic compatibility does not imply structural compatibility (the i/o types may not match; see below)
+ *** Requires port inputs/outputs to be semantically annotated
+ *# Workflow "analysis"
+ *** Given a workflow of connected actors, check that each connection (input/output) is semantically compatible.
+ *** Analysis may take advantage of annotation propagation (this is still research)
+ *# Workflow actor structural integration
+ *** Given two actors that are semantically compatible, determine one or more transformations (either by inserting new actors or deriving transformation "code") to make them structurally compatible.
+ **** In general, actor integration is a planning-style search problem (and still research)
+ **** May be a place where SCIA can contribute, to derive the structural transformation code and help users refine mappings
+ *# Dataset merging and integration
+ *** Search for "similar" datasets based on semantic annotations of current dataset
+ *** Given two datasets, merge them (data fusion) into a single dataset based on their semantic annotations + metadata
+ *** Define a dataset of interest (either as a query, the classic approach, or as a target annotated schema), then find and integrate datasets to populate the result (classic data integration).
+ **** Perhaps places for SCIA to contribute?
+ **** In general, still research
+ **** Integration depends on the granularity/quality of the annotations, ontologies, etc.
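The "compatible actors" search and the workflow "analysis" above both reduce to a subsumption test between port annotations. A minimal sketch, assuming a single-inheritance is-a hierarchy and the rule that an output may feed an input when the output's concept equals or specializes the input's concept; the concept names, the {{Actor}} class, and the functions are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical is-a hierarchy (concept -> direct parent); single inheritance
# keeps the sketch short.
PARENT = {
    "SpeciesCount": "Count",
    "Count": "Measurement",
    "Density": "Measurement",
    "Area": "Measurement",
}


def subsumes(general: str, specific: str) -> bool:
    """True if `specific` is `general` or lies below it in the hierarchy."""
    c = specific
    while c is not None:
        if c == general:
            return True
        c = PARENT.get(c)
    return False


def compatible(out_concept: str, in_concept: str) -> bool:
    # Assumed rule: an output may feed an input when the output's concept
    # is the same as, or more specific than, the input's concept.
    return subsumes(in_concept, out_concept)


@dataclass
class Actor:
    name: str
    inputs: dict[str, str]   # input port -> annotated concept
    outputs: dict[str, str]  # output port -> annotated concept


def downstream_candidates(actor: Actor, library: list[Actor]) -> list[tuple[str, str, str]]:
    """(output port, other actor, input port) triples that are semantically compatible."""
    return [
        (op, other.name, ip)
        for other in library
        for op, oc in actor.outputs.items()
        for ip, ic in other.inputs.items()
        if compatible(oc, ic)
    ]


def check_workflow(actors: dict[str, Actor],
                   links: list[tuple[str, str, str, str]]) -> list[tuple[str, str, str, str]]:
    """Return the links (src, out port, dst, in port) that are NOT semantically compatible."""
    return [
        (src, op, dst, ip)
        for src, op, dst, ip in links
        if not compatible(actors[src].outputs[op], actors[dst].inputs[ip])
    ]
```

As the notes point out, a match here says nothing about structural compatibility: a {{SpeciesCount}} output and a {{Count}} input may still carry different data types.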
+
+ * ''Repositories''
+ *# Ontology(ies)
+ *# Datasets (or metadata stating how to obtain the datasets)
+ *# Workflows and Actors (or metadata, etc.)
+ *# Semantic Annotations
+
+ ** "Smart discovery and integration" needs access to these components:
+ *** To search for a workflow component, we would search through semantic annotations. When an annotation matches, obtain the corresponding component.
+ *** To organize all actors (or, similarly, datasets) according to their annotations for browsing, we might iterate over the actors and look up each one's annotations.
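The two access patterns above (scan the annotations and fetch each matching component; group everything by annotation for browsing) might look like this in their simplest form. The two dictionaries are hypothetical stand-ins for the annotation and component repositories, their LSIDs and contents are made up, and matching is on exact concept names only. A linear scan like this is what the annotation indexing discussed under "What's needed for KEPLER" would replace.

```python
from __future__ import annotations

# Hypothetical stand-ins for the annotation and component repositories.
ANNOTATIONS = [
    # (annotated item LSID, ontology concept)
    ("urn:lsid:example.org:actor:1", "Density"),
    ("urn:lsid:example.org:actor:2", "Count"),
    ("urn:lsid:example.org:data:7", "Density"),
]
COMPONENTS = {
    "urn:lsid:example.org:actor:1": "DensityCalculator",
    "urn:lsid:example.org:actor:2": "SpeciesCounter",
    "urn:lsid:example.org:data:7": "plot_densities.csv",
}


def search(concept: str) -> list[str]:
    """Scan every annotation; for each match, retrieve the annotated object."""
    return [COMPONENTS[lsid] for lsid, c in ANNOTATIONS if c == concept]


def browse() -> dict[str, list[str]]:
    """Group all annotated objects by concept, e.g. to build a browsing tree."""
    groups: dict[str, list[str]] = {}
    for lsid, c in ANNOTATIONS:
        groups.setdefault(c, []).append(COMPONENTS[lsid])
    return groups
```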
+
+ * ''Required Tools''
+ *# Ontology Editors/Browsers
+ *** The KR group in SEEK
+ *# Semantic Annotation Editors/Browsers
+ *** For creating, editing, registering annotations
+ *** KR and SMS group in SEEK
+ *# Ontology-based query rewriting/answering
+ *** Classification based on ontology (Jena, Racer, etc.)
+ *** Efficiently using rewriting to find components
+ *** Testing semantic compatibility
+ *** Annotation propagation (research)
+ *# Actor integration reasoning
+ *** Structural transformation algorithms (SCIA? CLIO? Schema Mapping?)
+ *** Search a la planners
+ *# Data merging and integration reasoning
+ *** Algorithms and rules for fusing together data
+ *** Structural transformation algorithms (see above)
+ *** Basic conversions like count/area = density
+ *# Explanation viewers/systems
+ *** To explain why an answer was obtained
+ *** Closely tied to ontology editors/browsers
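The "count/area = density" conversion listed above is simple arithmetic once the units agree. This sketch assumes a hypothetical {{Quantity}} record and a small table of area conversion factors (1 ha = 10,000 m^2, 1 km^2 = 1,000,000 m^2), and normalizes the area to square metres before dividing; a real system would take units from the metadata or the ontology.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical conversion factors to square metres.
AREA_TO_M2 = {"m^2": 1.0, "ha": 10_000.0, "km^2": 1_000_000.0}


@dataclass(frozen=True)
class Quantity:
    value: float
    unit: str  # plain label, e.g. "individuals" or "ha"


def density(count: Quantity, area: Quantity) -> Quantity:
    """count / area, with the area first normalized to square metres."""
    if area.unit not in AREA_TO_M2:
        raise ValueError(f"unknown area unit {area.unit!r}")
    area_m2 = area.value * AREA_TO_M2[area.unit]
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    return Quantity(count.value / area_m2, f"{count.unit}/m^2")
```

For example, 500 individuals on 2 ha gives 500 / 20,000 m^2 = 0.025 individuals/m^2.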
+
+
+ * ''"Smart" Actor Search in Kepler''
+ ** A very simple keyword-based search we (Chad and I) implemented within Kepler.
+ *** Integrated with the actor 'quick search' frame
+ *** Allows dynamic actor classification (for browsing)
+ *** Allows runtime annotation and re-classification of actors
+ *** Term expansion for individual concept queries
+ ** Required a number of new features in Kepler:
+ **# ID mechanism for actors
+ **# Repositories: Fakes out component repository (as a ptolemy xml config file), annotation repository (xml file), ontology repository (simple is-a hierarchy, no rels)
+ **# Provides a very naive, hand-coded, local ID service (like for LSIDs)
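A rough sketch of keyword search with term expansion: the query concept is expanded to itself plus all of its subconcepts in the is-a hierarchy, and any actor annotated with one of those concepts is returned. Annotations can be added at runtime, and the browsing classification is recomputed from the current annotations. The {{ActorIndex}} class, its methods, and the concept and actor names are hypothetical; this is not the actual Kepler implementation.

```python
from __future__ import annotations

from collections import defaultdict


class ActorIndex:
    """Toy keyword search over actor annotations with term expansion."""

    def __init__(self, parents: dict[str, str]) -> None:
        # Invert the is-a links (concept -> parent) into parent -> children.
        self.children: dict[str, set[str]] = defaultdict(set)
        for concept, parent in parents.items():
            self.children[parent].add(concept)
        self.annotations: dict[str, set[str]] = defaultdict(set)  # actor -> concepts

    def annotate(self, actor: str, concept: str) -> None:
        """Runtime annotation; searches and classification see it immediately."""
        self.annotations[actor].add(concept)

    def expand(self, term: str) -> set[str]:
        """Term expansion: the concept plus all of its subconcepts."""
        result: set[str] = set()
        stack = [term]
        while stack:
            c = stack.pop()
            if c not in result:
                result.add(c)
                stack.extend(self.children.get(c, ()))
        return result

    def search(self, term: str) -> list[str]:
        terms = self.expand(term)
        return sorted(a for a, cs in self.annotations.items() if cs & terms)

    def classify(self) -> dict[str, list[str]]:
        """Group actors under the concepts they are directly annotated with."""
        tree: dict[str, list[str]] = defaultdict(list)
        for actor, cs in self.annotations.items():
            for c in cs:
                tree[c].append(actor)
        return {c: sorted(actors) for c, actors in tree.items()}
```

A browser could nest the groups returned by {{classify}} using the same hierarchy.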
+
+
+
+ __What's needed for KEPLER__
+
+ * Ontologies and Ontology Tools
+ ** Need more example ontologies.
+ ** There also aren't tools within Kepler for creating, browsing, or editing ontologies (coupling tools within Kepler? importing OWL files? etc.).
+
+ * Annotations
+ ** Need to formalize/finalize the annotation language
+ ** Annotation interface for Kepler
+ ** Also, may want:
+ *** GUI design
+ *** A uniform way to access/browse a component/dataset and its attributes, such as ports and their input/output types.
+ *** Perhaps SCIA can help with specifying annotations?
+
+ * Basic Kepler GUI Hooks
+ ** Like for toolbar, menus, etc.
+ ** Checking semantic compatibility (can steal unit resolver?).
+ ** Explanation of results (like for searching, etc.)
+ ** Joint development with Ptolemy group for customizable menus, etc.
+ ** Use a personal ontology. Swap out default ontologies.
+
+ * Algorithms
+ ** Need to understand the integration/merging algorithms better (working on examples/test cases currently ...)
+ ** Could today write the other types of search algorithms (compatible actors/datasets)
+
+ * Repositories
+ ** Basically none of the repositories exist (except perhaps for Data, not sure)
+ ** I think the Kepler Obj. Manager can help with this. What we need from it:
+ *** Ability to register components, data sets, ontologies, and annotations with the obj. manager
+ *** Ability to access all LSIDs of a certain type, e.g., components, data sets, ontologies, annotations
+ *** Ability to retrieve the object for an LSID
+ *** Some form of ''annotation indexing'' (this is similar to metadata indexing perhaps)
+ **** A search can be executed directly against an in-memory annotation file (e.g., obtained dynamically from all registered objects)
+ **** In contrast to asking the obj mngr for all lsids that are annotations, and for each retrieving the annotation file, etc.
+ *** For efficiency, we probably want multiple access paths via LSIDs, e.g., get all the workflow components and retrieve each one's annotation (useful if there are many more annotations than just those for components), or build an annotation index based on these LSIDs, etc.
+ **** The types of indexing needed should be driven by development/testing
+ **** We may consider an obj. mngr. architecture that can easily support "extensible" indexing strategies (e.g., through listeners, etc.)
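One way to get a listener-driven, extensible annotation index is sketched below. {{ObjectManager}} is a hypothetical stand-in, not the Kepler Obj. Manager API: it supports the three operations listed above (register objects by LSID, list the LSIDs of a given type, retrieve the object for an LSID). {{AnnotationIndex}} keeps an in-memory concept-to-LSID map current by listening for registrations; the annotation format it expects is an assumption.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable

Listener = Callable[[str, str, Any], None]  # (lsid, kind, object)


class ObjectManager:
    """Hypothetical stand-in for an object manager: LSID -> (kind, object)."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[str, Any]] = {}
        self._listeners: list[Listener] = []

    def add_listener(self, fn: Listener) -> None:
        self._listeners.append(fn)

    def register(self, lsid: str, kind: str, obj: Any) -> None:
        self._objects[lsid] = (kind, obj)
        for fn in self._listeners:
            fn(lsid, kind, obj)  # let indexers react to the new registration

    def lsids_of_type(self, kind: str) -> list[str]:
        return [lsid for lsid, (k, _) in self._objects.items() if k == kind]

    def get(self, lsid: str) -> Any:
        return self._objects[lsid][1]


class AnnotationIndex:
    """In-memory concept -> annotated-item LSIDs, kept current via a listener."""

    def __init__(self, manager: ObjectManager) -> None:
        self._by_concept: dict[str, set[str]] = defaultdict(set)
        # Index annotations registered before this index existed...
        for lsid in manager.lsids_of_type("annotation"):
            self._on_register(lsid, "annotation", manager.get(lsid))
        # ...and every annotation registered from now on.
        manager.add_listener(self._on_register)

    def _on_register(self, lsid: str, kind: str, obj: Any) -> None:
        if kind == "annotation":
            # Assumed annotation shape: {"item": <LSID>, "concept": <name>}
            self._by_concept[obj["concept"]].add(obj["item"])

    def items_for(self, concept: str) -> list[str]:
        return sorted(self._by_concept.get(concept, ()))
```

Other indexing strategies could be plugged in the same way, by registering additional listeners.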
Removed line 194
- ** Basic Kepler Interfaces / GUI Design
Removed lines 196-197
- Like for searching, checking semantic compatibility (can steal
- unit resolver), explanation of semantics (like for searching, etc.)
