Science Environment for Ecological Knowledge

Semantics In Kepler

Difference between version 39 and version 9:

At line 1 added 1 line.
+
Line 3 was replaced by line 4
-
+
At line 7 added 1 line.
+
Line 15 was replaced by line 17
- Our proposed Kepler semantics demonstration paper submitted to SSDBM 2005 also describes some of this material: see [Incorporating Semantics in Scientific Workflow Authoring|http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/~checkout~/kepler/docs/pubs/kepler-SSDBM2005.pdf?rev=1.1&content-type=application/pdf].
+ Our proposed Kepler semantics demonstration paper submitted to SSDBM 2005 also describes some of this material: see [Incorporating Semantics in Scientific Workflow Authoring|http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/~checkout~/kepler-docs/pubs/kepler-SSDBM2005.pdf?rev=1.1&content-type=application/pdf].
Removed line 20
- We intend to use ontology-driven search capabilities to improve search capabilities for both data and actors that have been annotated with terms from formal ontologies. For both actors and data, there are two levels of annotation that we will make use of. The first level is an annotation that addresses the general topic of the data or function of the actor. We can call this the "topical ontology". The second level is another annotation that describes the semantic signature of the data or actor. We can call this the "signature ontology". In the case of data, this describes the semnatic type of each attribute within the data and how the attributes relate to one another. For actors, this describes the semantic type of each input and output port and how the ports relate to one another. In many ways the topical ontology describes what something represents or does, while the signature ontology describes its data requirements.
At line 21 added 2 lines.
+ We intend to use ontology-driven search to improve the discovery of both data and actors that have been annotated with terms from formal ontologies. For both actors and data, we will use two levels of annotation. The first level of annotation addresses the general topic of the data or function of the actor. We can call this style of annotation the "topical ontology annotation". The second level of annotation describes the semantic "signature" of the data or actor. (By signature we mean a data-structure definition, e.g., a database schema.) We can call this style of annotation the "signature ontology annotation". In the case of data, signature annotations describe the semantic type of each attribute within the data and (possibly) how the attributes relate to one another. For actors, signature annotations describe the semantic type of each input and output port and (possibly) how the ports relate to one another. The topical annotation is meant to describe generally what something represents or does, while the signature annotation describes how such semantic information is encoded (i.e., realized) by the data or actor. In some cases, the signature annotation may be used to "infer" the topical annotation.
+
Line 23 was replaced by lines 26-27
- We envision several mechanisms for discovering relevant data and actors via their semantic annotations. First, in the left-hand list of data and actors, we envision users being able to search for terms from an ontology and have all related results display in the list of results. For example, one might search for data about "biodiveristy" and see a data set that contains abundance of reptile species at a site. Or one might search for "SpeciesDistributionModels" and see the GARP model. In this scenario, we need to decide how the user chooses terms from the ontology for searching. Currently, we simply allow them to type in a text string which is then compared to the ontology terms, and if there is a match, those ontology terms are used in the search. We may want the ability to be more precise.
+
+ We envision several mechanisms for discovering relevant data and actors via their semantic annotations within Kepler. First, in the left-hand list of data and actors, we envision users being able to search for terms from an ontology and have all related results displayed in the list of results. For example, one might search for data about "biodiversity" and see a data set that contains abundance of reptile species at a site. Or one might search for "SpeciesDistributionModels" and see the GARP model. In this scenario, we need to decide how the user chooses terms from the ontology for searching. Currently, we simply allow them to type in a text string that is then compared to the ontology terms, and if there is a match, those ontology terms are used in the search. We may want to allow users to be more precise, e.g., by providing an advanced search interface or through an additional interactive step that suggests possible ontology concept expressions given the original text string.
Line 25 was replaced by line 29
- Second, we would like scientists to be able to discover semantically compatible actors and data while composing a workflow. This would involve two possible GUI mechanisms. First, there could be a toggle that allowed the user to only display semantically compatible actors and data in the left-hand pane. When toggled, the system would impose a constraint in which each potential actor is screened against the current workflow to see if it could be added in any semantically compatible way based on the semantic type of its I/O signature. When the workflow canvas is blank, all actors and data woud show up in the list, but as actors and data are added to the canvas, this imposes constraints that reduces the number fo actors that are displayed. Second, the user should be able to select any combination of one or more actors that is currently on the canvas and right click to "Show compatible actors". This effectively launches a semantic query against the semantic I/O signature of the selected actors and displays compatible actors in the left-hand pane.
+ Second, we would like scientists to be able to discover semantically compatible actors and data while composing a workflow. This type of search would involve two possible GUI mechanisms. First, a toggle could be used that allows the user to display only those actors and data in the left-hand pane that are semantically compatible with the current workflow displayed in the Kepler canvas. When a user selects the toggle, the system would impose a constraint in which each potential actor or data set is screened to see if it could be connected in any "semantically compatible" way, based on I/O signature annotations, to actors and data in the workflow. When the workflow canvas is blank, all actors and data would appear in the list, but as actors and data are added to the canvas, constraints are imposed that reduce the number of actors and data displayed in the left-hand pane. Second, the user should be able to select any combination of one or more actors and data sets that are currently on the canvas (i.e., part of the current workflow) and right-click to "Show compatible actors and data sets." This search effectively launches a semantic query against the I/O signature annotations of the selected actors and data, and displays the corresponding semantically compatible actors and data in the left-hand pane.
Removed line 28
- There is some question as to what exactly our needs are for the topical ontology and the signature ontology. These could probably both be part of one overall ontology, but that may also complicate matters. The topical ontology is in many ways more general, and would represent one or more classification axes that allowed users to label the actors and data with relevant terms. For example, an actor that calculates the Shannon-Weiner index might be labeled as a "BiodiversityIndex" and a model that generates spatially explicit maps of species distributions might be labeled a "SpatialSpeciesDistributionModel" (not that that would be an explicit term, but that would be the meaning of the annotation). The signature ontology would specifically be used to label the semantic types of data that are contained within a data set or that flow between two or more actors. Thus, this ontology would contain terms that are concretely tied to real-world measurements as represented by data. For example, the ontological annotation for a particular column in a biodiversity data set might be "PsychotriaLimonensisArealDensity" (it would also have a structural type describing units, etc). A particular "BiodiversityIndex" actor might take as input data with the type "SpeciesArealDensity", and the ontology would allow us to deduce that the Psychotria limonensis data is semantically compatible with the actor's input requirement.
At line 29 added 2 lines.
+ There is some question as to what types of ontologies are required for topical annotations as opposed to signature annotations. There may be one overall ontology for both styles of annotation, or it may be convenient to have separate ontologies for each. The ontology for topical annotations may be inherently more general, representing one or more simple classification axes that allow users to label actors and data with relevant terms. To illustrate, an actor that calculates the Shannon-Wiener index might be labeled simply as a "BiodiversityIndex" and a model that generates spatially explicit maps of species distributions might be labeled simply as a "SpatialSpeciesDistributionModel".[1] The signature ontology would specifically be used to label the semantic types of data that are contained within a data set or that flow between two or more actors. Thus, this ontology would contain terms that are concretely tied to real-world measurements (and models) as represented by data. For example, the annotation for a particular column in a biodiversity data set might be "PsychotriaLimonensisArealDensity" (it would also have a structural type describing units, etc.). A particular "BiodiversityIndex" actor might take input data with the type "SpeciesArealDensity", and the ontology would allow us to deduce that the Psychotria limonensis data is semantically compatible with the actor's input requirement.[2]
+
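A minimal sketch of the compatibility deduction described above, assuming the ontology is reduced to a plain subsumption table. The concept names are taken from the text; the hierarchy, the `PARENTS` table, and the function names are hypothetical and are not part of Kepler.

```python
# Hypothetical toy ontology: each concept maps to its direct parent concepts.
# The concept names come from the text; the hierarchy itself is an assumption.
PARENTS = {
    "PsychotriaLimonensisArealDensity": {"SpeciesArealDensity"},
    "SpeciesArealDensity": {"ArealDensity"},
    "ArealDensity": set(),
}


def subsumes(general, specific, parents=PARENTS):
    """Return True if `general` equals `specific` or is one of its ancestors."""
    stack, seen = [specific], set()
    while stack:
        concept = stack.pop()
        if concept == general:
            return True
        if concept in seen:
            continue
        seen.add(concept)
        stack.extend(parents.get(concept, ()))
    return False


def compatible(output_type, input_type):
    """Data of `output_type` may feed a port that requires `input_type`
    when the required type subsumes the type of the data."""
    return subsumes(input_type, output_type)


print(compatible("PsychotriaLimonensisArealDensity", "SpeciesArealDensity"))  # True
print(compatible("ArealDensity", "SpeciesArealDensity"))  # False
```

The second call returns False because a more general type cannot be assumed to satisfy a more specific input requirement.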
Line 33 was replaced by lines 38-39
- When composing workflows scientists can create new composite actors that wrap up other actors into a functional unit that performs a typical task. They can also create new atomic actors that are introduced into Kepler using the [KSW file format|http://kepler-project.org/Wiki.jsp?page=KSWEncapsulationSpecification] that is being designed. We want scientists to be able to label these new actors with the proper terms from the topical and signature ontologies so that they can be saved and then discovered later using the semantic techniques outlined above. When annotating these new actors, the scientist may have a need for new terms in the ontology to properly classify their new actor. These could be topical terms or new semantic data types that are not yet represented. In addition, the ontology may not capture any particular scientists views and knowledge of the domain area accurately, so they may wish to reorganize the ontology to better reflect their semantic worldview. In both of these cases the Kepler GUI needs to accomodate modification of the ontologies.
+
+ When composing workflows, scientists can create new composite actors that "wrap up" other actors into a functional unit that performs a typical task. They can also create new atomic actors that are introduced into Kepler using the [KSW file format|http://kepler-project.org/Wiki.jsp?page=KSWEncapsulationSpecification], which is currently being designed. We want scientists to be able to label these new actors using topical and signature annotations so that they can be saved and then discovered later using the semantic techniques outlined above. When annotating these new actors, the scientist may have a need for new terms in the ontology to properly classify their new actor. These could be topical terms or new semantic data types that are not yet represented. In addition, the ontology may not accurately capture a particular scientist's assumptions and knowledge of the domain area, so they may wish to add to or reorganize the ontology to better reflect their semantic worldview. In both of these cases, the Kepler GUI needs to accommodate modification of the ontologies. In addition, basic capabilities are needed for scientists to browse and search the ontologies themselves, e.g., to determine whether appropriate terms exist for annotation.
Line 35 was replaced by line 41
- Data that are created by executing a workflow will be saved locally and possibly remotely and will need to be semantically labeled as well to maximize its utility. We hope that part of the semantic annotation may be propagated through the workflow automatically, but in the absence of this advanced feature the scientist would need to be able to annotate the derived data with its appropriate semantic type. This operation is closely related to annotating the ports of a new actor.
+ Data that are created by executing a workflow will be saved locally and possibly remotely and will need to be semantically labeled as well to maximize their utility. Under certain conditions, signature annotations can be propagated through the workflow automatically (see Bowers and Ludaescher, 2005), but in the absence of this advanced feature, the scientist may also desire to annotate the derived data with their appropriate semantic type. This operation is closely related to annotating the ports of a new actor.
Line 38 was replaced by lines 44-45
- How exactly this would work is an open topic. In the current Kepler GUI the topical ontology is represented as a tree control on the left side of the window. Each category shows up in the tree in all of its parent categories, and actors that have been annotated show up i the tree node for each ontology temr that applies. Thus, actors may show up in one, two, or more parts of the tree. To annotate a new actor, we envision being able to drag a composite actor from the canvas and drop it onto the ontology category in the tree. It could then be dragged again to add it to another category as well. This operation adds the new actor to the subsumption hierarchy at the appropriate place(s). If the ontology ends up representing more than just subsumption, we will need an GUI ability to create these additional relationships. One of our concerns is that the folder view that currently is shown in the tree conveys a much more informal representation of the ontology than is appropriate. So, we have considered changing the icons from folders to another 'container' icon that might reduce some implications of a folder. Strictly speaking, one physical item can not be in more than one folder at a time, so the folder metaphor isn't really appropriate for our case. The tree itself isn't really accurate as the ontology is a graph, so we could and should consider whether we want to use a more appropriate representation of the ontology in the left pane. Of course, switching to a graph representation could be very confusing and so any benefits in accuracy of the visualization would need to be weighed against the cost of using a less familiar and more complicated representation.
+
+ The best approach for presenting ontologies to Kepler users remains an open topic. In the current Kepler GUI, the topical ontology is represented as a tree control on the left side of the window. Each category shows up in the tree in all of its parent categories, and actors that have been annotated show up in the tree node for each (most specific) ontology term that applies. Thus, actors may show up in one, two, or more parts of the tree. To annotate a new actor, one can currently interact with the tree control to insert an actor under the desired term. The current implementation uses a highly simplified ontology (i.e., a single hierarchy with no part-of relationships). If the ontology becomes more complex, we may need extensions to display and represent, e.g., part-of relationships or multiple hierarchies.
Line 40 was replaced by line 47
- An alternative approach might be to right click on an actor and be shown a dilaog for classifying the actor. This dialog would both show the ontology and show the actor and its ports and allow the components of the actor to be annotated with the terms from the ontology.
+ Another addition we have implemented is the ability to add new concepts dynamically to the tree control. One of our concerns is that the folder view currently shown in the tree conveys a much more informal representation of an ontology than is appropriate. An ontology structure differs significantly from a folder structure. For example, in a folder structure, a physical item cannot be in more than one folder at a time, whereas an object can be classified by multiple concepts. Another major difference is that folders do not transitively associate their elements, whereas in an ontology, concept subsumption is transitive. In other words, a file in a folder is not a file in a parent folder. However, an object that is an instance of a concept B is also considered an instance of all concepts A that subsume B. Additionally, an ontology with part-of relationships, i.e., a graph, is cumbersome to represent using a tree structure.
Line 42 was replaced by line 49
- Another alternative proposed by B. Ludaescher is to allow the user to enter Sparrow expressions in a text box for describing actor annotations. This is simpler to implement but has significant usability questions, partly because it assumes that the user has detailed knowledge of the terms and relationships between terms inthe ontology.
+ The following are some possible ways to handle these discrepancies and provide better support for annotation and ontology GUIs in Kepler:
Line 44 was replaced by lines 51-55
- In many ways, annotating the semantic tpye of the ports (the actor's I/O signature) involves a different set of operations in the GUI than annotating what the actor is 'about' in terms of its function and algorithm.
+ * Replace the current tree widget with a graph widget. Switching to a graph representation, however, could be very confusing, so any benefits in accuracy of the visualization would need to be weighed against the cost of using a less familiar and more complicated representation.
+ * Change the icons from folders to another 'container' icon. The goal of switching icons is to signal a different user model than folders.
+ * Permit right-click actions on an actor that display an annotation dialog. This dialog would both show the ontology and show the actor and its ports and allow the components of the actor to be annotated with the terms from the ontology.
+ * Allow the user to enter text-based (e.g., Sparrow) expressions in a text box for providing actor annotations. This is simpler to implement but raises usability questions, partly because it assumes that the user has detailed knowledge of the terms and relationships between terms in the ontology. It may be possible to "mix" modes, i.e., have one mode to enter the annotation and another to browse and search for terms.
+ * Provide separate "modes" or operations for topical and signature annotations.
At line 48 added 1 line.
+
Line 51 was replaced by line 63
- A related aspect is being able to construct a transformation path to get between two compatible actors if they can't be linked. The semantic type checker might be able to determine that two actors are incompatible for direct linking but might be compatibel with some series of transformations that might be applied to the data stream. For a trivial example, if a data set contains species counts for individual species over a particular area and an actor requires as input the number of species present in an area, a simple transformation can be applied (filter where count > 0) to allow the data to be used. Thus, the data can be used after a simple transformation. Of course, realistic transformations would probably require multiple steps, so the challenge here is to generate the appropiate composite workflow that takes the semantic type of the original data set as input and outputs the sematic type of the original actor.
+ A related case is when two actors (or a data set and an actor) are semantically compatible but structurally incompatible. For this case, we want to help the user construct a transformation path that allows the two actors to be connected structurally (while maintaining the semantic compatibility). An example of a transformation path is a series of transformation (or "shim") actors that sit between the two actors and re-structure the output data stream to be structurally compatible with the input data stream. For a trivial example, if a data set contains species counts for individual species over a particular area and an actor requires as input the number of species present in an area, a simple transformation can be applied (filter where count > 0) to allow the data to be used. Thus, the data can be used after a simple transformation. Our goal is to use the signature annotations, existing actors, and simple expressions to help generate these transformation paths. Of course, realistic transformations may require multiple and complex steps. In general, we may only be able to provide a partial transformation path requiring the user to provide additional "shims" to perform the needed conversions.
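The trivial "filter where count > 0" example above can be sketched as a single shim function. This is an illustration only: the function name and the sample counts are made up, and a real shim would be a Kepler actor rather than a Python function.

```python
def species_richness(counts):
    """Shim: reduce per-species counts for an area to the number of
    species present, i.e., those whose count is greater than zero."""
    return sum(1 for count in counts.values() if count > 0)


# Made-up sample data: species name -> individuals counted in the area.
counts = {
    "Psychotria limonensis": 12,
    "Species B": 0,
    "Species C": 3,
}

print(species_richness(counts))  # 2
```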
Line 53 was replaced by line 65
- An equivalent process can be applied when aggregating/integrating data such that multiple semantic and structural types incoming can result in one outgoing structural and semantic type.
+ A similar process can be applied when aggregating/integrating data for "binding" data to a workflow or for merging semantically compatible datasets.
At line 55 added 1 line.
+
Line 58 was replaced by line 71
- For semantic type checking, it has been proposed that a buttonbe placed on the Kepler UI that can invoke the type checking algorithm, which would then color code components based on their compatibility. For example, in a workflow A - B - C - D, the link from B to C might be semantically invalid and it could be highlighted. If the operation is not computatinally complex (ie, time consuming), the type check operation could be run in the background and the results highlighted on the canvas dynamically as changes are made to the workflow (akin to the red squiggly line in word processors indicating a spelling error). This would eliminate the need for a separate button and give the user immediate feedback about their workflow composition.
+ For semantic type checking, it has been proposed that a button be placed on the Kepler UI that can invoke the type checking algorithm, which would then color-code components based on their compatibility. For example, in a workflow {{A -> B -> C -> D}}, the link from B to C might be semantically invalid and it could be highlighted. If the operation is not computationally complex (i.e., time consuming), the type check operation could be run in the background and the results highlighted on the canvas dynamically as changes are made to the workflow (akin to the red squiggly line in word processors indicating a spelling error or the infamous, somersaulting paper clip). This approach would eliminate the need for a separate button and give the user immediate feedback about their workflow composition.
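As a rough sketch of such a check on the linear workflow {{A -> B -> C -> D}}, assume each actor carries one input and one output semantic type and that a link is valid when the upstream output type is subsumed by the downstream input type. The hierarchy, the actor annotations, and the function names below are all hypothetical.

```python
# Hypothetical single-parent concept hierarchy (child -> parent).
PARENT = {
    "SpeciesArealDensity": "ArealDensity",
    "ArealDensity": "Measurement",
    "SpeciesRichness": "Measurement",
}


def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or lies below it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


# Each actor: (name, input semantic type, output semantic type).
# These annotations are made up for illustration.
workflow = [
    ("A", None, "SpeciesArealDensity"),
    ("B", "ArealDensity", "SpeciesRichness"),
    ("C", "ArealDensity", "Measurement"),
    ("D", "Measurement", None),
]


def invalid_links(actors):
    """Return the (source, target) pairs whose link is semantically invalid."""
    bad = []
    for (src, _, out_type), (dst, in_type, _) in zip(actors, actors[1:]):
        if not is_a(out_type, in_type):
            bad.append((src, dst))
    return bad


print(invalid_links(workflow))  # [('B', 'C')]
```

Because the check is one pass over the links, a workflow of this size could plausibly be rechecked on every edit; whether that holds for real ontologies and workflows is an open question in the text.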
Line 60 was replaced by line 73
- For the transformations, we have discussed putting two actors on screen that are semantically compatible through a transformation and selecting a context menu item 'Generate transformation step'. This algorithm would generate a composite actor with the appropriate semantic types and create the workflow needed to handle the semantic transformation. In doing this, the semantic mediation system would need to know the correspondence between various types of semantic incompatibilities and how particular actors can be used in resolving those incompatibilities. For simple cases this seems tractable (e.g., convert density to occurence), but for more complex cases its an open question as to whether it is tractable -- this would essentially be an automated program generator.
+ For the transformations, we have discussed putting two actors on the screen that are semantically compatible through a transformation and selecting a context menu item 'Generate transformation step'. This algorithm would generate a composite actor with the appropriate semantic types and create the workflow needed to handle the semantic transformation. In doing this, the semantic mediation system would need to know the correspondence between various types of semantic incompatibilities and how particular actors can be used in resolving those incompatibilities. For simple cases this seems tractable (e.g., convert density to occurrence), but for more complex cases it's an open question as to whether it is tractable -- this would essentially be an automated program generator.
Removed line 65
- Semantic workflow design takes the semantic type checking and transformation discussion to a greater level of sophistication. Here we envision a system where a user can create workflows as an 'abstract model' which is then specified at successively more concrete levels. For example, a user might want a simple workflow that takes 'SpeciesDistribution' data, connects it to a 'SpeciesDistributionPredictionModel', which creates a 'PredictedSpeciesDistributionMap'. This high-level, abstract workflow is not in itself executable, but it constrains the world of possibilites. The user must then drill down to provide more concrete (although still possibly not executable) specifications for each of the abstract components. At some level of concreteness the semantic mediation system should be able to recognize specific executable actors that could be used to realize the abstract model. These could be suggested to the workflow designer or automatically inserted in cases where the use is unambiguously correct.
At line 66 added 2 lines.
+ Semantic workflow design takes the semantic type checking and transformation discussion to a greater level of sophistication. Here we envision a system where a user can create workflows as an 'abstract model', which is then specified at successively more concrete levels. For example, a user might want a simple workflow that takes 'SpeciesDistribution' data, connects it to a 'SpeciesDistributionPredictionModel', which creates a 'PredictedSpeciesDistributionMap'. This high-level, abstract workflow is not in itself executable, but it constrains the world of possibilities. The user must then drill down to provide more concrete (although still possibly not executable) specifications for each of the abstract components. At some level of concreteness the semantic mediation system should be able to recognize specific executable actors that could be used to realize the abstract model. These could be suggested to the workflow designer or automatically inserted in cases where the use is unambiguously correct.
+
Line 72 was replaced by lines 86-87
- The GUI in Kepler is already quite amenable to this approach in that it supports hierarchy in the models now. Right now composite actors are a form of abstraction that could be the basis for abstract models. The GUI changes needed would involve being able to drag abstract components from the ontology onto the workflow and visually indicate that these portions of the workflow are abstract and need to be further specified. All of the semantic type checking and tranformation tools described above could be utilized. The semantic type checker should be able to determine if the model has been specified to a sufficiently concrete level to be executed, and that would in turn trigger a visual indication on the canvas or in the GUI that the model was ready to be executed. For example, the director icon (if such a thing were present) might change shape or color, or the 'Run' button in the toolbar might switch from deactivated to actived with corresponding visual changes.
+
+ The GUI in Kepler is already amenable to this approach in that it currently supports hierarchical models. Composite actors are a form of abstraction that could be the basis for abstract models. The GUI changes needed would involve being able to drag abstract components from the ontology onto the workflow and visually indicate that these portions of the workflow are abstract and need to be further specified. All of the semantic type checking and transformation tools described above could then be utilized. The semantic compatibility checker should be able to determine if the model has been specified to a sufficiently concrete level to be executed, and that would in turn trigger a visual indication on the canvas or in the GUI that the model was ready to be executed. For example, the director icon (if such a thing were present) might change shape or color, or the 'Run' button in the toolbar might switch from deactivated to activated with corresponding visual changes.
At line 75 added 8 lines.
+
+ [#1] The term "SpatialSpeciesDistributionModel" is highly specialized, and might not exist as such in an ontology. In practice, such terms are "built from" existing terms and ontology structures. In this example, the term "SpatialSpeciesDistributionModel" may be explicitly defined within an ontology as a sub-concept of one or more terms (i.e., placed within a subsumption hierarchy). For example, the term may be defined as a sub-concept of a "Model", or possibly a sub-concept of a "DistributionModel", or even a sub-concept of both a "SpatialDistributionModel" and a "SpeciesDistributionModel" (assuming these terms are present). Alternatively, an ontology may define the term "Model" as a composite structure having multiple parts (e.g., what the model takes as input and computes as output, dependencies, and so on). As such, the term "SpatialSpeciesDistributionModel" may be defined as a specialization of a "Model" structure by specializing its parts, e.g., asserting that it is a "Model" that "computes" a "SpatialMap" "from" a set of "SpatialDistribution" "over" "Species", where the terms in double-quotes are drawn from an ontology. These two approaches capture the different styles of ontology referred to above. Note that the latter is more precise, and potentially more appropriate for signature annotations because it "breaks" the term into input and output parts.
+
+
+ [#2] Note that in this case, it is also common that two columns are used to represent a species' areal density (one for the species in question, and another for its areal density value), resulting in a much more complex annotation stating that the species in one column has the areal density of another column. Using the annotation and the ontology, we can also deduce compatibility for these more complex cases.
+
+
+
Lines 79-81 were replaced by lines 102-134
- !! Your Name
- Your comment goes here.
-
+ !! Extract from email by Sergui Krivov
+ Coming back to the issues of "folder tree" versus some other
+ interface to ontologies- for deciding this issue it is important to
+ consider what information from ontology is relevant during this
+ annotation process. Let's go down to matrixes and imagine the process:
+
+ User has to select a concept X to annotate a port of an actor A. For
+ that he/she has to look at ready made ontology which is already in
+ Kepler and make a selection. What he/she has to know about a concept in
+ order to select or not select it? What "variables" related to concept
+ are relevant?
+
+ # Is position of concept in class hierarchy important?
+ # Is possible multiple inheritance important?
+ # Do roles (properties and properties restriction) come into picture? Does user need to know them to make a selection?
+ # Does user need to see the set of instances of a given concept?
+ # Does user have to see the other actors (besides actor A) during selection process?
+ # Does user have to see the semantic type of other actors?
+
+ Answers to questions 5 and 6 would dictate if concept selector work as
+ dialog or as a pane. The answer to questions 1-4 would dictate the
+ type of ontology browser/selector. If only #1 has positive answer then
+ tree control is OK. While advocating to use something as GrOWL in
+ capacity of "concept selection dialog" (concept selection panel or
+ buffer) I presumed that answers to the questions 1, 2, 3 are positive.
+ If they are not positive then my suggestion is void. If they are
+ positive then what could be another viable alternative to GrOWL as
+ concept selector? (May be it is hypertext based Sparrow rendering of
+ assertions pertained to certain concept where user could click on a
+ concept and get all Sparrow statement related to it assembled in one
+ place. Perhaps it would be easy to make such dynamic generator of
+ sparrow statement based on Jena or GrOWL model. I will certainly include
+ one into GrOWL in a near future.)
