This is version 9.
DRAFT FOR COMMENTS AND REVISION
To help scientists manage and share data sets, workflows, and workflow computation steps (i.e., actors), as well as semantic descriptions (i.e., metadata and ontology-based annotations) of each of these, we propose adding a new file-management middleware subsystem to Kepler. The main goals of the file-management subsystem are listed below. In particular, the subsystem should provide the infrastructure for enabling scientists to:
The figure below outlines the architectural components of the subsystem. The subsystem assumes the use of Life Science Identifiers (LSIDs) as logical identifiers for actors, workflows, data sets, and annotations (and possibly for local "libraries"?). Thus, each file (representing an actor, workflow, data set, or annotation) is assigned an LSID and is accessed via that LSID.
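To make the idea of assigning LSIDs to files and accessing files by LSID more concrete, here is a minimal in-memory sketch. The class `LsidIndex`, its methods `assign` and `resolve`, the authority `example.org`, the namespace `kepler`, and the counter-based object IDs are all hypothetical choices for illustration; none of them is an existing Kepler API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch: an LSID assigner combined with a logical/physical
 * index that maps LSIDs (logical names) to local files (physical names).
 */
public class LsidIndex {
    private final String authority;  // assumed LSID authority, e.g. "example.org"
    private final String namespace;  // assumed LSID namespace, e.g. "kepler"
    private long nextId = 1;         // simple counter used to mint object IDs
    private final Map<String, Path> index = new HashMap<>();

    public LsidIndex(String authority, String namespace) {
        this.authority = authority;
        this.namespace = namespace;
    }

    /** Mints a fresh LSID for a local file and records the LSID-to-file mapping. */
    public String assign(Path file) {
        String lsid = "urn:lsid:" + authority + ":" + namespace + ":" + (nextId++);
        index.put(lsid, file);
        return lsid;
    }

    /** Looks up the physical file for an LSID; empty if it is not known locally. */
    public Optional<Path> resolve(String lsid) {
        return Optional.ofNullable(index.get(lsid));
    }

    public static void main(String[] args) {
        LsidIndex idx = new LsidIndex("example.org", "kepler");
        String wf = idx.assign(Paths.get("workflow.xml"));
        System.out.println(wf);                        // urn:lsid:example.org:kepler:1
        System.out.println(idx.resolve(wf).isPresent()); // true
        System.out.println(idx.resolve("urn:lsid:example.org:kepler:99").isPresent()); // false
    }
}
```

A real subsystem would presumably persist this index and, when a lookup fails locally, defer to the remote manager; the in-memory map here only illustrates the separation between logical identifiers and physical files.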
Fig 1: A high-level architecture for file management in Kepler. The main components provide operations over a local data store of files. The remote manager provides the operations required to interact with the EcoGrid; it may also provide a local cache for performance. The LSID manager assigns unique identifiers to items, which serve as logical identifiers for actors, workflows, data sets, and semantic annotations. The logical/physical index manager provides operations to relate physical files to LSIDs and to access files by LSID. The directory/view manager provides operations for organizing items locally into libraries.
The following is a set of possible supported operations (note that these are half-baked, if that):
MOTIVATION FOR LSIDs
To link metadata and semantic annotations to the actors and workflows used in Kepler, we need a consistent scheme for uniquely identifying these components. Currently, MoML refers to the implementing Java class as the principal definition of an actor, but this does not allow for later specializations that constrain and define the actor's I/O signature and functionality. For example, the 'Expression' actor can be specialized by providing a particular expression to be evaluated, and the I/O signature of this specialized actor can be far more constrained than that of the Expression actor in general. In SEEK, we wish to provide both a structural and a semantic description of the signature and behavior of the actors and services used in models. These descriptions will let us build more powerful search and browsing services and help integrate and compose workflows. The EcoGrid and Taxon communities within SEEK are adopting Life Science Identifiers (LSIDs) as the principal syntax for creating identifiers. These identifiers carry no semantics about the identified object, which makes it far easier to maintain consistent identifiers for a set of changing objects. LSIDs are described more thoroughly in EcoGridIdentifiers.
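An LSID is a URN of the form urn:lsid:authority:namespace:objectID, optionally followed by :revision. Because the identifier is independent of the implementing Java class, a specialized actor (such as an Expression actor configured with a particular expression) could receive its own object ID, with revisions tracking later changes. The following is a minimal parsing sketch under that assumption; the class `Lsid`, its `parse` and `sameObject` methods, and the sample authority and namespace are hypothetical and not part of Kepler or any LSID library.

```java
import java.util.Optional;

/** Hypothetical value type for urn:lsid:authority:namespace:objectID[:revision]. */
public final class Lsid {
    public final String authority;
    public final String namespace;
    public final String objectId;
    public final String revision; // null when the LSID carries no revision

    private Lsid(String authority, String namespace, String objectId, String revision) {
        this.authority = authority;
        this.namespace = namespace;
        this.objectId = objectId;
        this.revision = revision;
    }

    /** Parses an LSID string; returns empty if it does not have the expected shape. */
    public static Optional<Lsid> parse(String s) {
        // limit -1 keeps trailing empty fields so inputs like "...:1001:" are rejected
        String[] parts = s.split(":", -1);
        if (parts.length < 5 || parts.length > 6) {
            return Optional.empty();
        }
        // the "urn" and "lsid" prefixes are compared case-insensitively
        if (!parts[0].equalsIgnoreCase("urn") || !parts[1].equalsIgnoreCase("lsid")) {
            return Optional.empty();
        }
        // authority, namespace, object ID (and revision, if present) must be non-empty
        for (int i = 2; i < parts.length; i++) {
            if (parts[i].isEmpty()) {
                return Optional.empty();
            }
        }
        String rev = parts.length == 6 ? parts[5] : null;
        return Optional.of(new Lsid(parts[2], parts[3], parts[4], rev));
    }

    /** True if both LSIDs name the same object, ignoring any revision. */
    public boolean sameObject(Lsid other) {
        return authority.equals(other.authority)
                && namespace.equals(other.namespace)
                && objectId.equals(other.objectId);
    }

    @Override
    public String toString() {
        return "urn:lsid:" + authority + ":" + namespace + ":" + objectId
                + (revision == null ? "" : ":" + revision);
    }

    public static void main(String[] args) {
        Lsid v1 = Lsid.parse("urn:lsid:example.org:actors:1001:1").get();
        Lsid v2 = Lsid.parse("urn:lsid:example.org:actors:1001:2").get();
        System.out.println(v1.objectId);        // 1001
        System.out.println(v1.sameObject(v2));  // true
        System.out.println(Lsid.parse("urn:lsid:example.org:actors").isPresent()); // false
    }
}
```

Keeping the revision separate from the object ID lets a browser group successive versions of one specialized actor while still addressing each version exactly.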
Proposal: Use LSID identifiers for actors, services, and components
Potential components needing identification
References
LSID Specification
LSID Java Tutorial
AnalysisAndModelingCommunity
This material is based upon work supported by the National Science Foundation under award 0225676. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF). Copyright 2004 Partnership for Biodiversity Informatics, University of New Mexico, The Regents of the University of California, and University of Kansas