Science Environment for Ecological Knowledge

Semantic Data Integration


Scientific data integration poses a number of challenges beyond the traditional ones in data integration and database mediation. While data volume can be, and often is, a problem, in a number of disciplines (e.g., ecology, the life sciences in general, and the geosciences) the **semantic heterogeneity and complexity** of the data can be a significant impediment in itself. For example, an ecologist may want to combine a number of different data sources as part of an analytical pipeline or scientific workflow. To facilitate data integration, additional semantic information is often necessary, for example, on the unit type of a measurement or the protocol by which data were created or derived, or simply to describe the data at the conceptual/ontological level.

The purpose of a semantic mediation system is to put semantic annotations to use, e.g., for smarter (ontology-enabled) data discovery and for semantic type checking and conversion when linking analytical steps to one another or when binding data sets to analysis steps.
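To make the idea of semantic type checking and conversion concrete, here is a minimal Python sketch. It is not taken from SEEK or MyGrid: the tiny ontology, the unit table, and the names `SemanticType` and `link` are all hypothetical, chosen only to illustrate how an annotation might be checked when the output of one step is bound to the input of another.

```python
from dataclasses import dataclass

# Hypothetical mini-ontology: each concept maps to its parent concept.
SUBCLASS_OF = {
    "AirTemperature": "Temperature",
    "Temperature": "Measurement",
    "Biomass": "Measurement",
}

# Hypothetical unit conversions, keyed by (from_unit, to_unit).
CONVERSIONS = {
    ("fahrenheit", "celsius"): lambda f: (f - 32.0) * 5.0 / 9.0,
    ("celsius", "fahrenheit"): lambda c: c * 9.0 / 5.0 + 32.0,
}


@dataclass(frozen=True)
class SemanticType:
    concept: str  # ontology concept the values are instances of
    unit: str     # unit of measurement


def is_subconcept(concept, ancestor):
    """Walk up the subclass chain to test whether `concept` is-a `ancestor`."""
    current = concept
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def link(output, expected):
    """Return a function converting `output` values into `expected` values.

    Raises TypeError if the concepts are incompatible or if no unit
    conversion is known.
    """
    if not is_subconcept(output.concept, expected.concept):
        raise TypeError(f"{output.concept} is not a kind of {expected.concept}")
    if output.unit == expected.unit:
        return lambda value: value
    try:
        return CONVERSIONS[(output.unit, expected.unit)]
    except KeyError:
        raise TypeError(
            f"no conversion from {output.unit} to {expected.unit}"
        ) from None


# A data set annotated as air temperature in Fahrenheit is bound to an
# analysis step that expects any temperature in Celsius.
sensor = SemanticType("AirTemperature", "fahrenheit")
model_input = SemanticType("Temperature", "celsius")
convert = link(sensor, model_input)
boiling_celsius = convert(212.0)
```

In this sketch the link is accepted because `AirTemperature` is a subconcept of `Temperature`, and the returned function performs the unit conversion. Binding a `Biomass` data set to the same step would raise a `TypeError` instead.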

The goal of this meeting is to identify common techniques and procedures for semantic mediation and to explore opportunities for collaborations between UK and US research groups.

In particular, we plan to concentrate on the semantic registration of data sets, provenance sets, parameter sets, workflow sets, and service sets. We will attempt to address questions like the following:

* What kinds of resources should be semantically typed (datasets, databases, services), and what is the semantic typing language for those?
* How is semantic typing employed for data discovery, query rewriting, and scientific workflow planning?
* What does a registry of semantic types look like, and how are data and services registered to it? What is the semantic registration procedure?
* What tools exist to support semantic registration, querying, and reasoning with ontologies, schemas, etc.?
* What standards can be employed and extended? In particular, what support does EML already provide, and what else is needed?
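As one possible starting point for the registry question, the following Python sketch shows a toy registry in which resources are annotated with ontology concepts and discovery follows the subclass hierarchy. The class `SemanticRegistry`, the concept names, and the resource identifiers are hypothetical; an actual registry would presumably rest on a real ontology language and a reasoner rather than a parent dictionary.

```python
from collections import defaultdict


class SemanticRegistry:
    """Toy registry mapping ontology concepts to registered resources."""

    def __init__(self, subclass_of):
        # concept -> parent concept (None for a root concept)
        self._parent = dict(subclass_of)
        # concept -> set of resource identifiers annotated with it
        self._resources = defaultdict(set)

    def register(self, resource_id, concept):
        """Annotate a data set or service with a known concept."""
        if concept not in self._parent:
            raise KeyError(f"unknown concept: {concept}")
        self._resources[concept].add(resource_id)

    def _ancestors(self, concept):
        """Yield the concept itself, then each of its ancestors."""
        while concept is not None:
            yield concept
            concept = self._parent.get(concept)

    def discover(self, concept):
        """Return resources annotated with `concept` or any subconcept."""
        return sorted(
            resource_id
            for annotated, resource_ids in self._resources.items()
            if concept in self._ancestors(annotated)
            for resource_id in resource_ids
        )


# Hypothetical ontology fragment and resource identifiers.
ontology = {
    "Observation": None,
    "SpeciesAbundance": "Observation",
    "SpeciesRichness": "Observation",
}
registry = SemanticRegistry(ontology)
registry.register("dataset:plot-counts", "SpeciesAbundance")
registry.register("service:richness-estimator", "SpeciesRichness")

all_observations = registry.discover("Observation")
richness_only = registry.discover("SpeciesRichness")
```

A query for the general concept `Observation` finds both the data set and the service, although neither was registered under that concept directly. This is the kind of behavior "ontology-enabled" discovery is meant to provide.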

Prior to the meeting, attendees will be given some **homework**, i.e., to illustrate their approach to the semantic registration of data sets, web services, and workflows using two examples. One of them (tentative) can be found here: http://www.sdsc.edu/~ludaesch/Paper/dils04.html.

09:00-10:30 PRESENTATIONS: SEMANTIC EXTENSIONS IN SEEK

The presentations will specifically address requirements and current
architecture in the context of the SEEK Semantic Mediation System
(SMS):

1. How to register data sets, ontologies, workflows, and associations
between them (semantic registration).

2. How to put the above to good use, e.g., for "smart data discovery",
semantics-enhanced data integration and mediation, semantics-enhanced
workflow design and execution (this also includes, e.g., required
reasoning services).

Hopefully we will also be able to report on the linkage between the
SEEK EcoGrid and the SEEK SMS: what are the "structural and semantic
commitments" of the EcoGrid that can be used by SMS.

In a sense, SMS uses the EcoGrid to do (1). Conversely, applications
such as the SEEK workflow system (AMS/Kepler) use both the EcoGrid and
SMS. This overall picture should also be fleshed out to some extent as
part of this session.

10:30-11:00 TEA AND COFFEE

11:00-12:30 PRESENTATIONS: SEMANTIC EXTENSIONS IN MYGRID

The same topics as in the SEEK session above, but now for MyGrid. As
part of the "homework assignment", both SEEK and MyGrid folks use the
same running example(s) to illustrate their approaches.

12:30-14:00 LUNCH

14:00-15:30 INTEROPERABLE SEMANTIC REGISTRATION, MEDIATION, WORKFLOWS I

This session (2.5 hours in total across parts I and II) might be parallelized into break-out
sessions. The goal is to flesh out an interoperable semantic
registration approach that will work across SEEK, MyGrid, and related
"semantics-aware" systems.  A MyGrid semantically registered service
should be usable from a semantics-aware SEEK workflow. Conversely, a
semantics-aware MyGrid workflow should be able to invoke SEEK services
and take advantage of semantic types.

Part of this discussion should also deal with non-procedural data
integration, i.e., based on declarative views as opposed to procedural
workflows.
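The contrast between declarative views and procedural workflows can be sketched as follows. In this hypothetical Python example the integration is described as data (a mapping from each source's local field names to a shared schema), and a generic evaluator materializes the view. The source records, field names, and the function `evaluate_view` are all assumptions made for illustration.

```python
# Hypothetical source records with differing local schemas.
SITE_A = [{"sp": "Quercus alba", "cnt": 12}]
SITE_B = [{"species_name": "Acer rubrum", "abundance": 7}]

# Declarative view: for each source, map shared-schema fields to local fields.
VIEW = {
    "site_a": {"species": "sp", "count": "cnt"},
    "site_b": {"species": "species_name", "count": "abundance"},
}


def evaluate_view(view, sources):
    """Materialize the integrated view by applying each source's mapping."""
    rows = []
    for name, mapping in view.items():
        for record in sources[name]:
            row = {target: record[local] for target, local in mapping.items()}
            row["source"] = name  # keep track of where each row came from
            rows.append(row)
    return rows


integrated = evaluate_view(VIEW, {"site_a": SITE_A, "site_b": SITE_B})
```

Adding a further source then means adding one mapping entry rather than writing another processing step, which is the main appeal of the view-based approach.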

15:30-16:00 TEA AND COFFEE

16:00-17:00 INTEROPERABLE SEMANTIC REGISTRATION, MEDIATION, WORKFLOWS II

17:00-18:00 PLENARY SESSION: REPORTING, NEXT STEPS

18:00 CLOSE

This page last changed on 06-Jul-2004 14:56:34 PDT by LTER.stekell.