Science Environment for Ecological Knowledge




SMS_KR Developers Meeting Notes May 2005

Kepler Architecture Notes (Monday, May 2nd, 2005)

  • The problems:
    • Versioning
      • Conflicts in Dependency Chains (underlying classes and their dependencies)
      • instead of class names, use lsids
    • no way to uniformly manage objects (workflows, data, libraries/dependencies, etc.)
    • no way to "seamlessly" transport objects
      • both publish and download
    • no way to manage local versus remote objects
      • lsids plus object cache ("intelligent" object cache)
      • dynamically load classes into current Java Classloader
      • no central repository(ies)
        • periodic indexing of HDD in background
    • no good way to create functional groups
      • e.g., the core group, plus packages for geospatial, rexpression, etc.
      • a lighter weight kepler
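
The "lsids plus object cache" idea above could look roughly like the following. This is a minimal sketch, not Kepler code: the names ObjectCache and RemoteRepository are hypothetical, and objects are treated as opaque byte arrays. The point is only that lookups are keyed by LSID, tried locally first, and fall back to a remote repository.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a remote repository that serves objects by LSID.
interface RemoteRepository {
    Optional<byte[]> fetch(String lsid);
}

// Hypothetical cache: objects are keyed by LSID string rather than by class or
// file name, so two versions of an object can coexist under different identifiers.
final class ObjectCache {
    private final Map<String, byte[]> local = new HashMap<>();
    private final RemoteRepository remote;
    private int remoteFetches = 0;

    ObjectCache(RemoteRepository remote) {
        this.remote = remote;
    }

    // Store an object locally, e.g., after the user creates or imports it.
    void put(String lsid, byte[] content) {
        local.put(lsid, content);
    }

    // Look up locally first; otherwise ask the remote repository and keep the result.
    Optional<byte[]> get(String lsid) {
        byte[] cached = local.get(lsid);
        if (cached != null) {
            return Optional.of(cached);
        }
        Optional<byte[]> fetched = remote.fetch(lsid);
        if (fetched.isPresent()) {
            remoteFetches++;
            local.put(lsid, fetched.get());
        }
        return fetched;
    }

    // Number of successful remote fetches so far (useful for checking cache hits).
    int remoteFetchCount() {
        return remoteFetches;
    }
}
```

With a cache like this, the caller does not need to know whether an object is local or remote; a second request for the same LSID is served from the local map.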

Search/Discovery Kepler Notes (Tuesday, May 3rd, 2005)

  • Topics
    • Actor browsing/searching
    • Data browsing/searching
    • Unify them
    • Helping understand why a result was included

  • Ideas
    • go with the tree approach for actors and data
    • introduce a "shopping cart"
    • experiment (via mockups) with different approaches -- laura et al
      • advanced search function
      • separate window for searching / results

Annotation Interfaces (Tuesday, May 3rd, 2005)

  • For actors and datasets, we are planning on the following
    • hijack the port configuration dialog by adding a new column called "semantic type"
    • keep the unit column
    • add a (...) button for the semtype column, which will open up a new annotation dialog
    • the new dialog will open a list of the ontology "variables", which will let one select a "class" and then show the direct relationships of that class, e.g., subclasses, superclasses, and properties

Workflow Validation (Wednesday, May 4th, 2005)

  • User interface mockup: What should the interface look like for validating connections?
    • Every time a connection is made
    • Need six (?) types of connections: datatype valid, invalid, unknown; and semtype valid, invalid, unknown.
    • Laura will look at the various ways to do this, and try to find some good approaches

  • SMS backend services
    • when a relation is connected to an input and output port, a change event will be caught, and an sms method will be called, passing in the output actor-port pair, the input actor-port pair, and the workflow itself.
    • the sms method will reply with one of three states: valid, invalid, or unknown, which the event handler will use to code the relation (to show these cases). Also, a textual description of why the match is invalid may be useful ... but we may get to this more in the next step, which is helping a user "fix" the connection
    • it would be useful to do the same thing with datatypes
   public SemtypeValidityCheckResult isSemtypeCompatibleConnection(LSID workflowId, Connection connection); // connection within wf
    • one issue is that if we consider the use of constraints (like in units), then one connection may affect other connections (later down in the workflow), in which case, we would need to extend the return type. Also, this is the reason for including the workflow in the call. The Connection object has four components: OutputActorId, OutputPortName, InputActorId, InputPortName.
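
A rough sketch of how the check above might be shaped, under stated assumptions. Only the method name, the three states, and the four Connection components come from the notes. Everything else is hypothetical: the SemtypeChecker class, the toy subclass table standing in for the ontology, the use of a String in place of an LSID type, and the compatibility rule itself (an output type is accepted if it is the same as, or a subclass of, the input type).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// The three states named in the notes.
enum Validity { VALID, INVALID, UNKNOWN }

// The four components listed for the Connection object.
final class Connection {
    final String outputActorId;
    final String outputPortName;
    final String inputActorId;
    final String inputPortName;

    Connection(String outputActorId, String outputPortName,
               String inputActorId, String inputPortName) {
        this.outputActorId = outputActorId;
        this.outputPortName = outputPortName;
        this.inputActorId = inputActorId;
        this.inputPortName = inputPortName;
    }
}

// Result carries the state plus a textual explanation (empty when valid).
final class SemtypeValidityCheckResult {
    final Validity validity;
    final String explanation;

    SemtypeValidityCheckResult(Validity validity, String explanation) {
        this.validity = validity;
        this.explanation = explanation;
    }
}

// Hypothetical checker backed by in-memory tables instead of a real ontology.
final class SemtypeChecker {
    private final Map<String, String> portSemtype = new HashMap<>();
    private final Map<String, String> superclassOf = new HashMap<>();

    void annotate(String actorId, String portName, String semtype) {
        portSemtype.put(actorId + "/" + portName, semtype);
    }

    void addSubclass(String subclass, String superclass) {
        superclassOf.put(subclass, superclass);
    }

    // workflowId is unused in this sketch; the notes pass the workflow so that
    // constraints spanning several connections could be checked later.
    SemtypeValidityCheckResult isSemtypeCompatibleConnection(String workflowId, Connection c) {
        String out = portSemtype.get(c.outputActorId + "/" + c.outputPortName);
        String in = portSemtype.get(c.inputActorId + "/" + c.inputPortName);
        if (out == null || in == null) {
            return new SemtypeValidityCheckResult(Validity.UNKNOWN, "missing semantic type annotation");
        }
        if (isSameOrSubclass(out, in)) {
            return new SemtypeValidityCheckResult(Validity.VALID, "");
        }
        return new SemtypeValidityCheckResult(Validity.INVALID, out + " is not a kind of " + in);
    }

    // Walk up the superclass chain; the seen-set guards against accidental cycles.
    private boolean isSameOrSubclass(String sub, String sup) {
        Set<String> seen = new HashSet<>();
        for (String c = sub; c != null && seen.add(c); c = superclassOf.get(c)) {
            if (c.equals(sup)) {
                return true;
            }
        }
        return false;
    }
}
```

An event handler could call this each time a relation is connected and color the relation according to the returned state. Handling constraints that propagate down the workflow would need a richer return type, as noted above.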

Ontologies (Wednesday, May 4th, 2005)

  • Mark presented slides (ref should go here) going over:
    • Gene Ontology
    • Sem. web. comm. references (CO-ODE group); common errors, practical difficulties, rules of thumb.
    • terminological overload: stick to class/property/individual/restriction
      • restrictions:
        • primitive/complete; partial/defined; necessary/necessary and sufficient
        • existential (some) vs universal (only) quantifiers
        • disjointness/overlap; exclusive vs. exhaustive
        • negation not thru absence, but contradiction ("all" quantifier is trivially satisfied if "no value" for property in question)
        • natural language "paraphrasing" to assist
    • Rich's ontologies in terms of protege, and growl
    • kr's current direction: keep ontologies simple, intuitive with clear goals
      • implement deana's biodiv and productivity concept maps in OWL
      • test capability of Units.OWL to do simple units conversion (X g -> .00X kg) in Kepler
        • we are going to adopt Rich's approach for units which is to mimic the uml unit dictionary model
        • sms will provide an operation that takes two annotations (i.e., actor metadata), and returns a properly configured expression actor that can perform the necessary unit conversion (an adapter)
        • there are some special cases for "unit reasoning", in particular, when it is appropriate to do conversion (matt will find Rich's example) and precision
      • develop simple actor ontology for parameterizing garp and {gam/nn?}
      • simple data integration use cases (measurement, experimental design)
        • need a mechanism to "integrate" merge operations into search, either as part of the search, or else being able to identify results from a search and then tell kepler those are the ones to be merged
        • the "merge set" should automatically be put, e.g., into a composite actor, connected to the "merge" actors
      • kr postdoc successor for rich williams
    • ferdinando: keep it simple, but not simpler
      • put "google" into GrOWL, so one can search annotation properties, concept names, etc.,
    • shawn/matt: need a CAD summarization view for users, and for giving default "views" to help summarize, simplify ontologies
  • More issues:
    • deana and mark need to become competent at one of protege, growl, or swoop
    • need user guide to ontologies for ecologists
    • growl interface suggestions (incorporate into kepler), for exploration and annotation selection for domain scientists
    • biodiversity wf-simple data integration, simple dataset "transpose", r-scripts
    • community wiki to discuss ontologies
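
The units item above (an sms operation that returns a configured adapter for a conversion such as X g -> .00X kg) might be sketched as follows. This is an illustration only: the Unit and UnitAdapters names, the in-memory dictionary, and the "factor to a base unit" representation are assumptions, and nothing here reflects the actual contents of Units.OWL. A plain function stands in for the expression actor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.DoubleUnaryOperator;

// Hypothetical unit entry: a dimension plus a multiplicative factor to that
// dimension's base unit (e.g., gram -> kilogram is 0.001).
final class Unit {
    final String name;
    final String dimension;
    final double toBase;

    Unit(String name, String dimension, double toBase) {
        this.name = name;
        this.dimension = dimension;
        this.toBase = toBase;
    }
}

// Hypothetical service that hands back a ready-to-use conversion function.
final class UnitAdapters {
    private final Map<String, Unit> dictionary = new HashMap<>();

    void define(String name, String dimension, double toBase) {
        dictionary.put(name, new Unit(name, dimension, toBase));
    }

    // Returns an adapter only when both units are known and share a dimension;
    // otherwise no purely multiplicative conversion is offered.
    Optional<DoubleUnaryOperator> adapter(String from, String to) {
        Unit f = dictionary.get(from);
        Unit t = dictionary.get(to);
        if (f == null || t == null || !f.dimension.equals(t.dimension)) {
            return Optional.empty();
        }
        double factor = f.toBase / t.toBase;
        return Optional.of(x -> x * factor);
    }
}
```

Only multiplicative conversions are covered here; offset-based units and the precision questions raised above would need more than a single factor.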

Semi-automated data integration (Wednesday, May 4th, 2005)

  • Jenny presented slides (ref here) on SCIA ("sie-ah"):
    • the schema mapping problem
      • schema matching (inter-schema correspondences) versus schema mapping (view generation)
      • schemas: dtd, xml schema, relational schema, OWL-full
      • automatic matching impossible
    • requirements for schema mapping
      • correctness depends on application
      • needs to be quick
      • semantics easier with interactive approaches
    • approach:
      • scia, semi-automatic approach
      • critical points; users do critical points (hardest ones); system does the rest (simplest ones); iterate until satisfied
      • ...
    • critical points
      • occurs when a core context has either no good matches, or else has more than one good one-to-one match
      • core contexts are most important contextualizing elements for tags within subtrees
      • correctness of core context matches greatly affects correctness of matches for all nodes under them
    • examples
      • (n:1) order contact might match contact for billing, shipping and/or supplier
      • (n:1) concat of first name and last name; species count divided by area for density
      • if bib/book/author is found as best match candidate to arts/article/author
    • incorporate similarity flooding, but make small change
    • use cases
      • bibliography use case
      • Note: the examples use a thesaurus, which derives name matches, e.g., site -> station; the name matches are weighted to provide 2/10 of the overall matching accuracy calculation
      • ...
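
The critical-point rule above (a core context with no good match, or with more than one good match) can be illustrated with a small sketch. It is not SCIA's implementation: the CriticalPoints class, the score map, and the threshold for a "good" match are all assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical helper: decides whether a core context needs user input.
final class CriticalPoints {

    // candidateScores maps each candidate target path to a similarity in [0, 1].
    // A context is a critical point unless exactly one candidate is "good",
    // i.e., scores at or above the (assumed) threshold.
    static boolean isCriticalPoint(Map<String, Double> candidateScores, double goodThreshold) {
        long good = candidateScores.values().stream()
                .filter(score -> score >= goodThreshold)
                .count();
        return good != 1;
    }
}
```

In the interactive loop described above, contexts flagged this way would be handed to the user, and the system would match the remaining ones on its own.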

Semi-automated workflow composition (Thursday, May 5th, 2005)

  • Given a connection (relation) between two actors that has been flagged as incompatible, what next?
    • We want to allow for adapter insertion
    • Where the adapter can be "implemented" according to a number of approaches
      • Search for actors/composites that can perform the conversion
        • We want to use the "parameter dependencies" to insert the simple adapters
        • We want to provide simple structural conversions, like map
        • This could employ "planning" style approaches as well
      • If structural and not semantic, apply a DILS-like approach, given as input to SCIA
      • If structural and semantic, apply SCIA
      • Allow user to fill in the adapter itself

  • Changes to SCIA that may be needed
    • Performance issues:
      • Generating XQuery, converting to XML, etc., may not be efficient in certain situations
      • Try outputting R instead
      • Try compiling directly to Java code
      • Search for existing conversions as a user elaborates a mapping (needs to call sms search capabilities ?)

Demos and Ontologies (Thursday, May 5th, 2005)

  • More with Laura on Symbology
    • need to expand # of icons in current proposal (Laura will put up wiki page)
    • consider mechanism for communicating both composite form and its function (kind of composite)
    • add "external program" icon
    • need to define small set of types of computations
    • add "graphs" icon to display set of icons which would include scatter plots, bar graphs etc.
    • consider defining small set of atomic actors that go across ontologies and small set of icons for ecology domain (issue here would still be that some kind of functions might be represented differently across ontologies)
    • consider tying symbols to semantics in ontology but this might still lead to lots of icons
    • consider providing a baseline set of symbols while controlling color but allow others to add their own symbols -- this could result in a large number of symbols and also could result in multiple symbols representing the same concept.
  • SCIA demo
  • Ferdinando

Tasks, Priorities, Milestones, Assignments (Thursday, May 5th, 2005)

  • Matt has the list ...

This page last changed on 12-May-2005 13:53:06 PDT by LTER.ldowney.