Science Environment for Ecological Knowledge

Identifiers In Kepler

Difference between version 13 and version 1:

Line 1 was replaced by line 1
- __DRAFT FOR COMMENTS__
+ __DRAFT FOR COMMENTS AND REVISION__
Line 3 was replaced by line 3
- EcoGrid needs to provide a consistent identification scheme for data objects and metadata records in the system, among other things. This identifier format should allow for the creation of globally unique identifiers, hopefully with a local part that is assigned at the participating service rather than centrally.
+ To help scientists manage and share data sets, workflows, and workflow computation steps (i.e., actors) as well as semantic descriptions (i.e., metadata and ontology-based annotations) of each of these, we propose adding a new file-management middleware subsystem to Kepler.
Lines 5-6 were replaced by line 5
- !!Proposed Identifier Formats
- In past meetings we've discussed trying to standardize our identifier formats, but so far none of the systems (SRB, Metacat, etc.) will accept identifiers of the format we've discussed. In Edinburgh, our proposed ID format was:
+ The main goals of the file-management subsystem are listed below. We use the term "Kepler item" loosely below to refer to actors, workflows, and data sets, and "annotations" to refer to metadata (like EML) and semantic descriptions. The subsystem should provide the infrastructure for enabling scientists to:
Line 8 was replaced by lines 7-13
- {{{urn:ecogrid://scope/localIdentifier}}}
+ * Search for all known Kepler items of interest.
+ * Organize Kepler items of interest using a file-directory metaphor (currently called "actor libraries" in Ptolemy). For example, a scientist should be able to create and persist a personal library, and to browse and search that library.
+ * Allow Kepler items to be organized into multiple libraries, and in multiple places within a library.
+ * Persist Kepler items in a network-accessible repository (i.e., in the EcoGrid).
+ * Retrieve new Kepler items from a network-accessible repository (the EcoGrid) and update changes to local items.
+ * Track both revisions of Kepler items and new versions (off-shoots or branches) of an item.
+ * Provide similar functionality for annotations, i.e., store annotations of actors, workflows, and data sets and publish those annotations to a network repository (the EcoGrid), and retrieve new annotations from the network repository.
Line 10 was replaced by line 15
- where scope is a symbolic name from the registry that can be used to look up the wsdl (and therefore endpoint) of the EcoGrid query service.
+ The figure below outlines the architectural components of the subsystem. We treat annotations and Kepler items uniformly below. That is, the management subsystem does not have separate storage components for managing annotations and items. The subsystem assumes the use of [Life-Sciences Identifiers|http://www.i3c.org/wgr/ta/resources/lsid/docs/index.asp] (LSIDs) as logical identifiers for items and annotations. (QUESTION: and also for local "libraries" -- do these need to be stored in the EcoGrid, e.g.?). Thus, relevant files are assigned LSIDs and are accessed via LSIDs. We assume that an LSID will store the type of file (e.g., the type of annotation, or the type of Kepler item) associated with the ID.
Line 12 was replaced by line 17
- However, we've also discussed using LSIDs as the taxon group is doing, which take the form:
+ [http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/~checkout~/seek/projects/kr-sms/docs/KeplerFileManagement/architectuure.png]
Line 14 was replaced by line 19
- {{{URN:LSID:authority:namespace:objectId:revision}}}
+ __Fig 1:__ A high-level architecture for file-management in Kepler.
Line 16 was replaced by line 21
- The revision is optional, and taxon and ecogrid have both agreed that it should NOT be used (and that any versioning information belongs in the metadata). The authority is a DNS name that resolves via DNS to a service that can be used to resolve the LSID to actual physical locations. 'Namespace' is analogous to our 'scope', and 'objectId' is analogous to our 'localIdentifier'. So, this is similar to our proposal above, but relies on DNS instead of our registry for the resolution service.
+ The remote manager component provides the operations required to interact with the EcoGrid. It may also provide a local cache for better performance. The LSID manager creates and assigns unique identifiers for items and annotations. The logical/physical index manager provides operations to relate physical files to LSIDs, and to access physical files based on LSIDs. The directory/view manager provides operations to support the creation and retrieval of local, customized libraries.
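
A minimal sketch of what the logical/physical index manager might look like, assuming LSIDs are handled as plain strings and physical files as local paths. The class and method names (`LogicalPhysicalIndex`, `relate`, `files_for`) and the example LSID are hypothetical and are not part of Kepler.

```python
from pathlib import Path


class LogicalPhysicalIndex:
    """Hypothetical index relating LSID strings to physical file paths."""

    def __init__(self):
        # Maps an LSID string to the list of files registered under it.
        self._files_by_lsid = {}

    def relate(self, lsid, path):
        """Record that the file at `path` belongs to the item named by `lsid`."""
        self._files_by_lsid.setdefault(lsid, []).append(Path(path))

    def files_for(self, lsid):
        """Return the files registered for `lsid` (empty list if unknown)."""
        return list(self._files_by_lsid.get(lsid, []))


index = LogicalPhysicalIndex()
index.relate("urn:lsid:example.org:actor:1", "actors/expression.xml")
print(index.files_for("urn:lsid:example.org:actor:1"))
```

Keeping the index as a separate mapping means a file can move on disk without its LSID changing; only the index entry would need updating.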
At line 17 added 34 lines.
+ The following operations (these are very half-baked!) may be provided by the Kepler file-management subsystem:
+
+ * {{Retrieve(ID)}}. Given an LSID, returns the relevant files associated with the item.
+ * {{LocalIDs(IDMetadata)}}. Retrieve a list of local LSIDs, based on LSID metadata.
+ * {{RemoteIDs(IDMetadata)}}. Retrieve a list of remote LSIDs, based on LSID metadata.
+ * {{Store(ItemHandle, IDMetadata)}}. Construct an LSID with the given metadata for the item, and store the item in the local repository.
+ * {{GetRemoteUpdates()}}. Retrieve a list of new or updated LSIDs from a remote source.
+ * {{Update(LSID)}}. Update the given LSID item from a remote source. This retrieves a new revision if one exists.
+ * {{Branch(LSID)}}. Create a new version of an item. This creates and returns a new LSID for the item.
+ * {{CommitToRemoteServer(LSID)}}. Update this item in a remote server.
+
+
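
To make the local operations listed above more concrete, here is a rough in-memory sketch of `Store`, `Retrieve`, `LocalIDs`, and `Branch`. Everything beyond the operation names is an assumption: the `LocalRepository` class, the counter-based LSID assignment, the `example.org` authority, the `branched_from` metadata key, and the choice to have `store` return the new LSID are illustrative only. The remote operations are left out.

```python
import itertools


class LocalRepository:
    """Hypothetical in-memory stand-in for the local side of the subsystem."""

    def __init__(self, authority="example.org", namespace="kepler"):
        self._authority = authority
        self._namespace = namespace
        self._counter = itertools.count(1)
        self._items = {}     # LSID -> item handle
        self._metadata = {}  # LSID -> metadata dict

    def _new_lsid(self):
        # Assumed scheme: a simple counter supplies the object id.
        return f"urn:lsid:{self._authority}:{self._namespace}:{next(self._counter)}"

    def store(self, item_handle, id_metadata):
        """Store(ItemHandle, IDMetadata): assign an LSID and keep the item."""
        lsid = self._new_lsid()
        self._items[lsid] = item_handle
        self._metadata[lsid] = dict(id_metadata)
        return lsid

    def retrieve(self, lsid):
        """Retrieve(ID): return the item stored under the LSID."""
        return self._items[lsid]

    def local_ids(self, id_metadata):
        """LocalIDs(IDMetadata): LSIDs whose metadata matches every given key."""
        return [
            lsid
            for lsid, meta in self._metadata.items()
            if all(meta.get(key) == value for key, value in id_metadata.items())
        ]

    def branch(self, lsid):
        """Branch(LSID): copy the item under a new LSID and return that LSID."""
        meta = dict(self._metadata[lsid], branched_from=lsid)
        return self.store(self._items[lsid], meta)


repo = LocalRepository()
original = repo.store("expression-actor.xml", {"type": "actor"})
variant = repo.branch(original)
print(original, variant, repo.local_ids({"type": "actor"}))
```

Recording the parent LSID in the branch's metadata is one way to keep version lineage out of the identifier itself.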
+ Another feature that we may consider is support for change management. In particular, when a file within the subsystem is changed, a notification of the change can be cached or stored for use by components within Kepler. (Isn't there already a change API in Kepler, and can we piggyback on it?)
+
+
+ __MOTIVATION FOR LSIDs__
+
+ To link metadata and semantic annotations to actors and workflows that are utilized in Kepler, we need a consistent scheme for uniquely identifying these components. Currently, MoML refers to the implementing Java class as the principal definition of the actor, but this does not allow for later specializations that constrain and define the actor's I/O signature and functionality. For example, the 'Expression' actor can be specialized by providing a particular expression to be evaluated, and the I/O signature of this specialized actor can be far more constrained than that of the Expression actor in general.
+
+ In SEEK, we wish to provide both a structural and a semantic description of the signature and behavior of the actors and services used in models. This will allow us to use these descriptions to construct more powerful search and browsing services and to help integrate and compose workflows.
+
+ The EcoGrid and Taxon communities within SEEK are adopting Life Science Identifiers (LSIDs) as the principal syntax for creating identifiers. These identifiers are free of semantics relating to the identified object, which makes it far easier to maintain consistent identifiers for a set of changing objects. LSIDs are described more thoroughly in [EcoGridIdentifiers].
+
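
As an illustration of the syntax, the following sketch splits an identifier of the form `URN:LSID:authority:namespace:objectId:revision` (the form quoted in the removed text earlier in this diff, with the revision optional) into its parts. The `Lsid` tuple and `parse_lsid` function are hypothetical names, the example identifiers are made up, and the validation is deliberately minimal.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Hypothetical container for the parts of an LSID."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split 'urn:lsid:authority:namespace:objectId[:revision]' into an Lsid."""
    parts = text.split(":")
    # Expect the two-part prefix plus three required parts and an optional revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The 'urn:lsid' prefix is matched case-insensitively in this sketch.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:]):
        raise ValueError(f"empty component in LSID: {text!r}")
    return Lsid(*parts[2:])


print(parse_lsid("URN:LSID:example.org:kepler:42"))
```

Note that the parser only separates the fields; it attaches no meaning to them.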
+
+ !! Proposal: Use [LSID|http://www.i3c.org/wgr/ta/resources/lsid/docs/index.asp] identifiers for actors, services, and components
+
+ ! Potential components needing identification
+ * An actor or service overall
+ ** This would enable annotations regarding the behavior of the actor overall
+ ** Would need to attach these identifiers to Java implementations of actors, to specializations of actors, and to web service descriptions (e.g., WSDL), among other things
+ * A port from an actor or service
+ ** This enables annotation of the actor signature, both before and after specialization
+ * Combinations of ports? Probably not, as they can be referred to as compound objects
+
Line 22 was replaced by line 61
- [EcoGridCommunity]
+ [AnalysisAndModelingCommunity]
