Science Environment for Ecological Knowledge

Identifiers In Kepler

Difference between version 7 and version 6:

Line 3 was replaced by lines 3-6
- In order to link various types of annotations to actors, services, and I/O ports in models that are utilized in Kepler, we need a consistent scheme for identifiying unique actors and services and their components. Currently, MOML refers to the implementing Java class as the principal definition of the actor, but this does not allow for the specializations that might occur later that constrain and define the actors I/O signatures and functionality. For example, the 'Expression' actor can be specialized by providing a particular expression to be evaluated, and the I/O signature of this specialized actor can be far more constrained than the Expression actor is generally.
+ To help scientists manage and share data sets, workflows, and workflow
+ computation steps (i.e., actors) as well as semantic descriptions (i.e.,
+ metadata and ontology-based annotations) of each of these, we propose
+ adding a new file-management middleware subsystem to Kepler.
At line 4 added 79 lines.
+ The main goals of the file-management subsystem are listed below. In
+ particular, the subsystem should provide the infrastructure for
+ enabling scientists to:
+
+ * Search for all known items of interest, including actors, workflows,
+ and data sets.
+
+ * Easily organize actors, workflows, and data sets of interest using the
+ file-directory metaphor (called "actor libraries" in Ptolemy). For
+ example, a scientist should be able to create and persist a personal
+ library, and to browse and search that library. Items can be
+ organized into multiple local actor libraries, and in multiple
+ locations within a library.
+
+ * Persist actors, workflows, and data sets in a network-accessible
+ repository (i.e., in the EcoGrid).
+
+ * Retrieve new actors, workflows, and data sets from a network-accessible
+ repository (from the EcoGrid) and update changes to existing items.
+
+ * Track both revisions of actors, workflows, and data sets as well as new
+ versions (off-shoots or branches) of an item.
+
+ * Store semantic annotations of actors, workflows, and data sets and
+ publish those annotations to a network repository (the
+ EcoGrid). Retrieve annotations from the network repository.
+
+
+ The figure below outlines the architectural components of the
+ subsystem. The subsystem assumes the use of [Life-Sciences Identifiers|http://www.i3c.org/wgr/ta/resources/lsid/docs/index.asp]
+ (LSIDs) as logical identifiers for actors, workflows, data sets, and
+ annotations (and possibly for local "libraries"?). Thus, particular
+ files (representing actors, workflows, data sets, or annotations) are
+ assigned LSIDs and are accessed via LSIDs.
+
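As a rough illustration of what "accessed via LSIDs" involves, the sketch below parses the standard LSID URN layout, `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. The `LSID` class, its field names, and the `example.org` identifier are assumptions made for this example; they are not part of Kepler or of this proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LSID:
    """Hypothetical value type for urn:lsid:authority:namespace:object[:revision]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    @classmethod
    def parse(cls, text: str) -> "LSID":
        parts = text.split(":")
        # The "urn" and "lsid" prefixes are matched case-insensitively.
        if (len(parts) not in (5, 6)
                or parts[0].lower() != "urn"
                or parts[1].lower() != "lsid"):
            raise ValueError(f"not an LSID: {text!r}")
        if any(part == "" for part in parts[2:5]):
            raise ValueError(f"empty LSID component in {text!r}")
        # The revision component is optional.
        revision = parts[5] if len(parts) == 6 and parts[5] else None
        return cls(parts[2], parts[3], parts[4], revision)

    def __str__(self) -> str:
        base = f"urn:lsid:{self.authority}:{self.namespace}:{self.object_id}"
        return base if self.revision is None else f"{base}:{self.revision}"


# Made-up identifier for an actor, revision 3.
lsid = LSID.parse("urn:lsid:example.org:actor:42:3")
```

Keeping the revision as a separate, optional component is what would let the subsystem distinguish "the item" from "this revision of the item".
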
+ [http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/~checkout~/seek/projects/kr-sms/docs/KeplerFileManagement/architectuure.png]
+ __Fig 1:__ A high-level architecture for file-management in Kepler.
+
+ The main components provide operations over a local data store of
+ files. The remote manager provides the operations required to interact
+ with the EcoGrid. It also may provide a local cache for
+ performance. The LSID manager is used to assign unique identifiers for
+ items, e.g., serving as logical identifiers for actors, workflows,
+ data sets, and semantic annotations. The logical/physical index
+ manager provides operations to relate physical files to LSIDs, and to
+ access files based on LSIDs. The directory/view manager provides
+ operations for local organizations of items into libraries.
+
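The roles of the logical/physical index manager and the directory/view manager described above could look roughly like the following. This is a minimal in-memory sketch: the class names, method names, file paths, and LSID strings are all invented for illustration and do not correspond to existing Kepler code.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class LogicalPhysicalIndex:
    """Hypothetical index relating an LSID to the physical files that represent it."""

    def __init__(self):
        self._files = defaultdict(list)  # LSID string -> list of paths

    def relate(self, lsid, path):
        path = PurePosixPath(path)
        if path not in self._files[lsid]:  # ignore duplicate registrations
            self._files[lsid].append(path)

    def files_for(self, lsid):
        return list(self._files.get(lsid, []))


class LibraryView:
    """Hypothetical local library: folders hold LSIDs, never the files themselves."""

    def __init__(self, name):
        self.name = name
        self._folders = defaultdict(set)  # folder name -> set of LSIDs

    def place(self, folder, lsid):
        self._folders[folder].add(lsid)

    def list_folder(self, folder):
        return sorted(self._folders.get(folder, set()))

    def locations_of(self, lsid):
        return sorted(f for f, ids in self._folders.items() if lsid in ids)


index = LogicalPhysicalIndex()
actor = "urn:lsid:example.org:actor:1:1"  # made-up identifier
index.relate(actor, "actors/expression.xml")

library = LibraryView("my-library")
library.place("statistics", actor)
library.place("favorites", actor)  # same item, second location
```

Because a library only stores LSIDs, one item can appear in several folders or libraries without its files being copied.
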
+ The following is a set of operations the subsystem might support (note
+ that these are half-baked, if that):
+
+ * {{Retrieve(ID)}}. Given an id, the subsystem returns the files
+ associated with the item.
+
+ * {{LocalIDs(IDMetadata)}}. Retrieve a list of local LSIDs, based on LSID
+ metadata.
+
+ * {{RemoteIDs(IDMetadata)}}. Retrieve a list of remote LSIDs, based on LSID
+ metadata.
+
+ * {{Store(ItemHandle, IDMetadata)}}. Construct an LSID with the given
+ metadata for the item, and store the item in the local repository.
+
+ * {{GetRemoteUpdates()}}. Retrieve a list of new or updated LSIDs from a remote
+ source.
+
+ * {{Update(LSID)}}. Update the given LSID item from a remote source. This
+ retrieves a new revision if one exists.
+
+ * {{Branch(LSID)}}. Create a new version of an item. This creates and
+ returns a new LSID for the item.
+
+ * {{CommitToRemoteServer(LSID)}}. Update this item in a remote server.
+
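To make the operations above a little more concrete, here is a toy in-memory sketch. It assumes that a revision increments the last LSID component while a branch mints a new object identifier; the proposal does not settle this. The `Repository` class, the `revise` helper, the function names, and the `example.org` authority are hypothetical.

```python
import itertools


def _split(lsid):
    """Split an LSID string into (everything before the revision, revision number)."""
    head, _, rev = lsid.rpartition(":")
    return head, int(rev)


class Repository:
    """In-memory stand-in for the local store or for a remote (EcoGrid-like) server."""

    def __init__(self, authority="example.org"):
        self.authority = authority
        self.items = {}  # LSID string -> (content, metadata dict)
        self._next_object = itertools.count(1)

    def store(self, content, metadata):
        """Store(ItemHandle, IDMetadata): mint an LSID (revision 1) and keep the item."""
        lsid = (f"urn:lsid:{self.authority}:{metadata['type']}:"
                f"{next(self._next_object)}:1")
        self.items[lsid] = (content, dict(metadata))
        return lsid

    def retrieve(self, lsid):
        """Retrieve(ID): return the stored content for the item."""
        return self.items[lsid][0]

    def ids(self, **wanted):
        """LocalIDs / RemoteIDs(IDMetadata): LSIDs whose metadata matches."""
        return sorted(l for l, (_, md) in self.items.items()
                      if all(md.get(k) == v for k, v in wanted.items()))

    def branch(self, lsid):
        """Branch(LSID): copy the item under a brand-new object identifier."""
        content, metadata = self.items[lsid]
        return self.store(content, metadata)

    def revise(self, lsid, content):
        """Hypothetical helper: same object, next revision number."""
        head, rev = _split(lsid)
        new_lsid = f"{head}:{rev + 1}"
        self.items[new_lsid] = (content, self.items[lsid][1])
        return new_lsid


def get_remote_updates(local, remote):
    """GetRemoteUpdates(): remote LSIDs not yet held locally."""
    return sorted(set(remote.items) - set(local.items))


def update(local, remote, lsid):
    """Update(LSID): fetch the newest remote revision of the same object, if any."""
    head, rev = _split(lsid)
    newer = [(r, l) for l in remote.items
             for h, r in [_split(l)] if h == head and r > rev]
    if not newer:
        return lsid
    _, newest = max(newer)
    local.items[newest] = remote.items[newest]
    return newest


def commit_to_remote(local, remote, lsid):
    """CommitToRemoteServer(LSID): publish the local item under the same LSID."""
    remote.items[lsid] = local.items[lsid]


local, remote = Repository(), Repository()
a1 = local.store("x + 1", {"type": "actor", "name": "AddOne"})
commit_to_remote(local, remote, a1)
a2 = remote.revise(a1, "x + 2")   # someone else publishes a new revision
fetched = update(local, remote, a1)
b1 = local.branch(a1)             # off-shoot with its own identity
```

In this sketch `Update` returns the LSID that is now current locally, so a caller can tell whether anything changed by comparing it with the LSID passed in.
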
+
+ __MOTIVATION FOR LSIDs__
+
+ To link metadata and semantic annotations to actors and workflows that are utilized in Kepler, we need a consistent scheme for uniquely identifying these components. Currently, MoML refers to the implementing Java class as the principal definition of the actor, but this does not allow for specializations that might occur later and that constrain and define the actor's I/O signature and functionality. For example, the 'Expression' actor can be specialized by providing a particular expression to be evaluated, and the I/O signature of this specialized actor can be far more constrained than that of the Expression actor in general.
+
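The Expression example can be illustrated with a small sketch: two actor definitions share the same implementing Java class, so the class name alone cannot tell them apart, while distinct LSIDs can. The `ActorDefinition` record, the LSID strings, the port names, and the annotation text are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActorDefinition:
    """Hypothetical record describing one (possibly specialized) actor."""
    lsid: str
    java_class: str
    parameters: tuple = ()
    inputs: tuple = ()
    outputs: tuple = ()


EXPRESSION_CLASS = "ptolemy.actor.lib.Expression"

# The general-purpose actor: no fixed expression, no fixed input ports.
generic = ActorDefinition(
    lsid="urn:lsid:example.org:actor:10:1",  # made-up identifier
    java_class=EXPRESSION_CLASS,
    outputs=("output",),
)

# A specialization: the expression is fixed, so the I/O signature is too.
to_celsius = ActorDefinition(
    lsid="urn:lsid:example.org:actor:11:1",  # made-up identifier
    java_class=EXPRESSION_CLASS,
    parameters=(("expression", "(f - 32) * 5 / 9"),),
    inputs=("f",),
    outputs=("output",),
)

definitions = [generic, to_celsius]

# Looking up by class is ambiguous; looking up by LSID is not.
by_class = [d for d in definitions if d.java_class == EXPRESSION_CLASS]
by_lsid = {d.lsid: d for d in definitions}

# Annotations attach to the LSID, so they describe the specialization only.
annotations = {to_celsius.lsid: ["converts Fahrenheit to Celsius"]}
```
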
