Science Environment for Ecological Knowledge

EcoGrid Registry Design

DRAFT FOR COMMENTS

EcoGrid needs to provide a registry of the available services, along with some brief metadata about each of them. This registry will be used to locate services and to choose among alternative implementations of a service when more than one exists. Services include core EcoGrid interfaces such as the query, authentication, and put interfaces, as well as custom services for computational tasks. Initial design of the registry was done in Seattle (see the notes and the diagram).

Registry Metadata

We need to both identify and describe services.

In general, I think we should keep the registry metadata brief at first, and then add more fields later as the need arises. So, it would be best if the metadata storage schema were pretty flexible. Here's the info I think we could use now:

Core (all services)

  1. Logical name of service
  2. URL of service WSDL
  3. Service type (for GWSDL, this is derivable from the WSDL, and I think it is a namespace URI)
  4. Endpoint (we can get this from the WSDL, but it would be convenient to store)
  5. ServiceClassification (from an ontology of service functions)

Query service

  1. Search languages accepted (e.g., EML, Darwin Core) (identify using namespace URI)
  2. Types of documents in collection (e.g., EML, Darwin Core) (identify using namespace URI)
  3. Spatial, temporal, taxonomic coverage of collection (use eml-coverage module) -- I'm not actually sure this is a good idea; it might be too hard to represent or keep up to date
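
As a rough sketch of how the fields above could be held in a flexible structure, here is one possible shape for a registry entry. All class names, field names, and example values (including the example.org URIs) are made up for illustration; nothing here is an agreed schema. The open-ended `extra` dictionary is just one way to leave room for fields we add later.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CoreMetadata:
    """Core fields shared by every registered service (names are illustrative)."""
    logical_name: str
    wsdl_url: str
    service_type: str      # namespace URI identifying the service type
    endpoint: str          # derivable from the WSDL, stored for convenience
    classification: str    # term from an ontology of service functions


@dataclass
class QueryServiceMetadata:
    """Extra fields that only make sense for query services."""
    search_languages: List[str] = field(default_factory=list)  # namespace URIs
    document_types: List[str] = field(default_factory=list)    # namespace URIs
    coverage: Optional[str] = None  # e.g. an eml-coverage fragment, if we keep it


@dataclass
class RegistryEntry:
    core: CoreMetadata
    query: Optional[QueryServiceMetadata] = None
    # Open-ended bag for fields added later without changing the schema.
    extra: Dict[str, str] = field(default_factory=dict)


# Placeholder values only; none of these URIs refer to real services.
entry = RegistryEntry(
    core=CoreMetadata(
        logical_name="Example Query Service",
        wsdl_url="http://example.org/ecogrid/query.gwsdl",
        service_type="http://example.org/ns/ecogrid/query",
        endpoint="http://example.org/ecogrid/query",
        classification="query",
    ),
    query=QueryServiceMetadata(
        search_languages=["http://example.org/ns/eml"],
        document_types=["http://example.org/ns/eml", "http://example.org/ns/darwin-core"],
    ),
)
```

Leaving `coverage` optional reflects the doubt noted in item 3: an entry is still complete without it.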

Implementation

Each service type will need its own set of metadata fields to describe it properly, in addition to the core fields. The Globus design seems excellent in this regard, so in many ways we may just want to emulate it for infrastructure (or use it directly, since the 'findServiceData' method is implemented for every one of our services anyway). Doing it this way would mean that the registry wouldn't need special methods for getting the additional service-specific metadata. Really, the only thing that would be actively registered would be the GWSDL URL -- from there the registry should be able to call the service's 'findServiceData' method and get any and all additional metadata.

In addition, because the Globus service definition allows any given service to support multiple service data elements (i.e., xml document types), the service data for a service is extensible by definition.
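
To make the idea concrete, here is a minimal sketch of a registry that stores only GWSDL URLs and asks the service itself for everything else. It does not use the Globus toolkit: the `fetch_service_data` callable is a hypothetical stand-in for invoking a service's 'findServiceData' method, and the URL, element names, and canned values are invented for the example.

```python
from typing import Callable, Dict, Set

# Hypothetical stand-in for a call to a service's 'findServiceData' method:
# given a GWSDL URL, return that service's data elements keyed by name.
ServiceDataFetcher = Callable[[str], Dict[str, str]]


class Registry:
    """Stores only GWSDL URLs; all other metadata is pulled from the service."""

    def __init__(self, fetch_service_data: ServiceDataFetcher) -> None:
        self._fetch = fetch_service_data
        self._gwsdl_urls: Set[str] = set()

    def register(self, gwsdl_url: str) -> None:
        # The GWSDL URL is the only thing that is actively registered.
        self._gwsdl_urls.add(gwsdl_url)

    def describe(self, gwsdl_url: str) -> Dict[str, str]:
        # No service-specific methods here: whatever data elements the
        # service exposes are returned as-is.
        if gwsdl_url not in self._gwsdl_urls:
            raise KeyError(gwsdl_url)
        return self._fetch(gwsdl_url)


def fake_fetch(gwsdl_url: str) -> Dict[str, str]:
    """Canned responses used in place of a real remote call."""
    canned = {
        "http://example.org/ecogrid/query.gwsdl": {
            "logicalName": "Example Query Service",
            "searchLanguages": "http://example.org/ns/eml",
        }
    }
    return canned[gwsdl_url]


registry = Registry(fake_fetch)
registry.register("http://example.org/ecogrid/query.gwsdl")
metadata = registry.describe("http://example.org/ecogrid/query.gwsdl")
```

A real registry would probably cache what it fetches, which raises the question of when to refresh it; the sketch sidesteps that by fetching on every call.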

Service identifiers

We need to decide how to handle identification of services. Globus' GSH is the right idea, but we have recently been discussing using LSID identifiers for identifying data sets, actors in Kepler, taxonomic concepts, and other things [1]. It would only make sense for us to treat services uniformly from an identification perspective. But we need to decide how this will fit into the WSDL/GWSDL framework.
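
For reference while we decide, an LSID has the general form urn:lsid:authority:namespace:object, with an optional trailing revision. The sketch below only splits such an identifier into its parts; the example identifier is hypothetical, and how (or whether) an LSID would map onto a GWSDL URL is exactly the open question, so that is left out.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError("expected 5 or 6 colon-separated parts: %r" % text)
    # The 'urn:lsid' prefix is matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not an LSID: %r" % text)
    if any(part == "" for part in parts[2:5]):
        raise ValueError("authority, namespace, and object must be non-empty: %r" % text)
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier for a registered service.
service_id = parse_lsid("urn:lsid:example.org:services:query:1")
```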

Distributed registry services

We've discussed that a single, centralized registry is not adequate; instead we need a registry that is distributed across several hosts on the internet. When a new registry comes online, it would announce its presence by registering with one of the existing registries. Existing registries would take registration events and send them to all of the other known registries. The diagram in our Seattle meeting notes describes this at least schematically.
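
The announce-and-forward behaviour described above can be sketched as a small in-process simulation. This is only an illustration under simplifying assumptions: registries are plain objects calling each other directly rather than hosts exchanging messages, the class and method names are invented, and a new registry does not catch up on services registered before it joined.

```python
from typing import Dict, Set


class RegistryNode:
    """One registry in the distributed set (in-process simulation only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: Dict[str, "RegistryNode"] = {}
        self.services: Set[str] = set()

    def join(self, existing: "RegistryNode") -> None:
        """A new registry announces itself to one existing registry."""
        existing.receive_registry(self)

    def receive_registry(self, newcomer: "RegistryNode") -> None:
        # Ignore ourselves and registries we already know; this check is
        # what stops the forwarding from looping forever.
        if newcomer.name == self.name or newcomer.name in self.peers:
            return
        already_known = list(self.peers.values())
        self.peers[newcomer.name] = newcomer
        newcomer.receive_registry(self)        # let the newcomer learn about us
        for peer in already_known:             # forward the event to known registries
            peer.receive_registry(newcomer)

    def register_service(self, gwsdl_url: str) -> None:
        # A service registration is stored once, then forwarded to every peer.
        if gwsdl_url in self.services:
            return
        self.services.add(gwsdl_url)
        for peer in list(self.peers.values()):
            peer.register_service(gwsdl_url)


a, b, c = RegistryNode("a"), RegistryNode("b"), RegistryNode("c")
b.join(a)
c.join(a)                                      # c contacts only a, but b hears of it
c.register_service("http://example.org/ecogrid/query.gwsdl")
```

After these calls every node knows the other two and all three hold the service URL. Over a real network we would also have to deal with unreachable registries and with ordering of events, neither of which the sketch attempts.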

Registry prototype

An initial registry prototype has been developed and can currently be viewed at http://kuecogrid.ittc.ku.edu:8080/ogsa/registry.jsp. It is not complete.

[1] See the EcoGrid Identifiers discussion and the Identifiers in Kepler discussion.


This page last changed on 10-Sep-2004 12:45:02 PDT by NCEAS.jones.