SMS_KR Developers Meeting Notes May 2005

Kepler Architecture Notes (Monday, May 2nd, 2005)

The problems:

Versioning

Conflicts in Dependency Chains (underlying classes and their dependencies)
instead of class names, use lsids

no way to uniformly manage objects (workflows, data, libraries/dependencies, etc.)
no way to "seamlessly" transport objects

both publish and download

no way to manage local versus remote objects

lsids plus object cache ("intelligent" object cache)
dynamically load classes into current Java Classloader
no central repository(ies)

periodic indexing of HDD in background
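
A rough sketch of the "lsids plus object cache" idea above, assuming objects are stored as raw bytes and keyed by their LSID string. The class names Lsid and ObjectCache are made up for illustration and are not existing Kepler classes; the only thing taken as given is the general LSID URN layout (urn:lsid:authority:namespace:object, with an optional revision).

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Parsed LSID of the form urn:lsid:authority:namespace:object[:revision]. */
final class Lsid {
    final String authority;
    final String namespace;
    final String object;
    final String revision; // null when the LSID carries no revision

    private Lsid(String authority, String namespace, String object, String revision) {
        this.authority = authority;
        this.namespace = namespace;
        this.object = object;
        this.revision = revision;
    }

    static Lsid parse(String urn) {
        String[] p = urn.split(":");
        boolean wellFormed = (p.length == 5 || p.length == 6)
                && p[0].equalsIgnoreCase("urn")
                && p[1].equalsIgnoreCase("lsid");
        if (!wellFormed) {
            throw new IllegalArgumentException("Not an LSID: " + urn);
        }
        return new Lsid(p[2], p[3], p[4], p.length == 6 ? p[5] : null);
    }

    @Override
    public String toString() {
        String base = "urn:lsid:" + authority + ":" + namespace + ":" + object;
        return revision == null ? base : base + ":" + revision;
    }
}

/** Hypothetical local cache: object content keyed by LSID string. */
final class ObjectCache {
    private final Map<String, byte[]> local = new ConcurrentHashMap<>();

    void put(Lsid id, byte[] content) {
        local.put(id.toString(), content);
    }

    /** Empty when the object is not held locally and would have to be fetched remotely. */
    Optional<byte[]> get(Lsid id) {
        return Optional.ofNullable(local.get(id.toString()));
    }

    boolean isLocal(Lsid id) {
        return local.containsKey(id.toString());
    }
}
```

Because the revision is part of the key, two versions of the same object can sit in the cache side by side, which is one possible way to approach the versioning and dependency-conflict problems listed above.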

no good way to create functional groups

e.g., the core group, plus packages for geospatial, rexpression, etc.
a lighter weight kepler

Search/Discovery Kepler Notes (Tuesday, May 3rd, 2005)

Topics

Actor browsing/searching
Data browsing/searching
Unify them
Helping understand why a result was included

Ideas

go with the tree approach for actors and data
introduce a "shopping cart"
experiment (via mockups) with different approaches -- laura et al

advanced search function
separate window for searching / results

Annotation Interfaces (Tuesday, May 3rd, 2005)

For actors and datasets, we are planning on the following

hijack the port configuration dialog by adding a new column called "semantic type"
keep the unit column
add a (...) button for the semtype column, which will open up a new annotation dialog
the new dialog will open a list of the ontology "variables", which will let one select a "class" and then show the direct relationships of that class, e.g., subclasses, superclasses, and properties

Workflow Validation (Wednesday, May 4th, 2005)

User interface mockup: What should the interface look like for validating connections?

Every time a connection is made
Need six (?) types of connections: datatype valid, invalid, unknown; and semtype valid, invalid, unknown.
Laura will look at the various ways to do this, and try to find some good approaches

SMS backend services

when a relation is connected to an input and output port, a change event will be caught, and an sms method will be called, passing in the output actor-port pair, the input actor-port pair, and the workflow itself.
the sms method will reply with one of three states: valid, invalid, or unknown, which the event handler will use to code the relation (to show these cases). Also, a textual description of why the match is invalid may be useful ... but we may get to this more in the next step, which is helping a user "fix" the connection
it would be useful to do the same thing with datatypes

   public SemtypeValidityCheckResult isSemtypeCompatibleConnection(LSID workflowId, Connection connection); // connection within wf

one issue is that if we consider the use of constraints (like in units), then one connection may affect other connections (later down in the workflow), in which case, we would need to extend the return type. Also, this is the reason for including the workflow in the call. The Connection object has four components: OutputActorId, OutputPortName, InputActorId, InputPortName.
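
To make the shapes above concrete, here is a minimal sketch of the Connection object, the three-state result, and a toy checker. Only the four Connection components, the three states, and the name SemtypeValidityCheckResult come from the notes. ToySemtypeChecker, its annotate method, and the use of plain string equality in place of real ontology reasoning are assumptions for illustration, and the workflow argument is left out to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Map;

/** The three states the sms method replies with. */
enum Validity { VALID, INVALID, UNKNOWN }

/** The four components listed in the notes. */
final class Connection {
    final String outputActorId;
    final String outputPortName;
    final String inputActorId;
    final String inputPortName;

    Connection(String outputActorId, String outputPortName,
               String inputActorId, String inputPortName) {
        this.outputActorId = outputActorId;
        this.outputPortName = outputPortName;
        this.inputActorId = inputActorId;
        this.inputPortName = inputPortName;
    }
}

/** A state plus a textual reason (empty when there is nothing to explain). */
final class SemtypeValidityCheckResult {
    final Validity validity;
    final String explanation;

    SemtypeValidityCheckResult(Validity validity, String explanation) {
        this.validity = validity;
        this.explanation = explanation;
    }
}

/** Toy checker (assumption): semantic types are strings compared by equality. */
final class ToySemtypeChecker {
    private final Map<String, String> semtypeByPort = new HashMap<>();

    void annotate(String actorId, String portName, String semtype) {
        semtypeByPort.put(actorId + "#" + portName, semtype);
    }

    SemtypeValidityCheckResult check(Connection c) {
        String out = semtypeByPort.get(c.outputActorId + "#" + c.outputPortName);
        String in = semtypeByPort.get(c.inputActorId + "#" + c.inputPortName);
        if (out == null || in == null) {
            // No annotation on one side: nothing can be said either way.
            return new SemtypeValidityCheckResult(Validity.UNKNOWN, "missing annotation");
        }
        if (out.equals(in)) {
            return new SemtypeValidityCheckResult(Validity.VALID, "");
        }
        return new SemtypeValidityCheckResult(Validity.INVALID,
                out + " does not match " + in);
    }
}
```

A real check would presumably test subsumption in the ontology rather than equality, and a constraint-aware version would need a richer return type covering the other connections affected.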

Ontologies (May 4th, 2005)

Mark presented slides (ref should go here) going over:

Gene Ontology
Sem. web. comm. references (CO-ODE group); common errors, practical difficulties, rules of thumb.
terminological overload: stick to class/property/individual/restriction

restrictions:

primitive/complete; partial/defined; necessary/necessary and sufficient
existential (some) vs universal (only) quantifiers
disjointness/overlap; exclusive vs. exhaustive
negation not through absence, but contradiction (the "all" quantifier is trivially satisfied if there is "no value" for the property in question)
natural language "paraphrasing" to assist

Rich's ontologies in terms of protege, and growl
kr's current direction: keep ontologies simple, intuitive with clear goals

implement deana's biodiv and productivity concept maps in OWL
test capability of Units.OWL to do simple units conversion (X g -> .00X kg) in Kepler

we are going to adopt Rich's approach for units which is to mimic the uml unit dictionary model
sms will provide an operation that takes two annotations (i.e., actor metadata), and returns a properly configured expression actor that can perform the necessary unit conversion (an adapter)
there are some special cases for "unit reasoning", in particular, when it is appropriate to do conversion (matt will find Rich's example) and precision
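
As a sketch of the simple conversion case above (X g -> .00X kg), assuming each unit records a dimension and a multiplier to that dimension's base unit. The Unit and UnitAdapter names and the fields are hypothetical; they do not reflect the actual contents of Units.OWL or the planned sms operation, which is meant to return a configured expression actor rather than a number.

```java
/** Hypothetical unit entry: a dimension name and a multiplier to the base unit. */
final class Unit {
    final String name;
    final String dimension;
    final double toBase; // value in base unit = value in this unit * toBase

    Unit(String name, String dimension, double toBase) {
        this.name = name;
        this.dimension = dimension;
        this.toBase = toBase;
    }
}

final class UnitAdapter {
    /** Returns f such that (value in 'to') = (value in 'from') * f. */
    static double factor(Unit from, Unit to) {
        if (!from.dimension.equals(to.dimension)) {
            // Conversion is not appropriate across dimensions (e.g. mass to length).
            throw new IllegalArgumentException(
                    "Cannot convert " + from.name + " to " + to.name);
        }
        return from.toBase / to.toBase;
    }

    static double convert(double value, Unit from, Unit to) {
        return value * factor(from, to);
    }
}
```

With grams defined as 0.001 of the base unit kg, converting 5 g gives approximately 0.005 kg. The floating-point arithmetic here is one place where the precision question raised above would show up.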

develop simple actor ontology for parameterizing garp and {gam/nn?}
simple data integration use cases (measurement, experimental design)

need a mechanism to "integrate" merge operations into search, either as part of the search, or else being able to identify results from a search and then tell kepler those are the ones to be merged
the "merge set" should automatically be put, e.g., into a composite actor, connected to the "merge" actors

kr postdoc successor for rich williams

ferdinando: keep it simple, but not simpler

put "google" into GrOWL, so one can search annotation properties, concept names, etc.,

shawn/matt: need a CAD summarization view for users, and for giving default "views" to help summarize, simplify ontologies

More issues:

deana and mark need to become competent at either protege, growl, or swoop
need user guide to ontologies for ecologists
growl interface suggestions (incorporate into kepler), for exploration and annotation selection for domain scientists
biodiversity wf-simple data integration, simple dataset "transpose", r-scripts
community wiki to discuss ontologies

Semi-automated data integration (Wednesday, May 4th, 2005)

Jenny presented slides (ref here) on SCIA ("sie-ah"):

the schema mapping problem

schema matching (inter-schema correspondences) versus schema mapping (view generation)
schemas: dtd, xml schema, relational schema, OWL-full
automatic matching impossible

requirements for schema mapping

correctness depends on application
needs to be quick
semantics easier with interactive approaches

approach:

scia, semi-automatic approach
critical points: users do critical points (hardest ones); system does the rest (simplest ones); iterate until satisfied
...

critical points

occurs when a core context has either no good matches, or else has more than one good one-to-one match
core contexts are most important contextualizing elements for tags within subtrees
correctness of core context matches greatly affects correctness of matches for all nodes under them
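
The critical point condition above can be sketched as a simple test over candidate match scores. This assumes each candidate one-to-one match for a core context has a numeric score and that "good" means at or above some threshold; the threshold, the score scale, and the CriticalPoints class are assumptions, not details taken from SCIA.

```java
import java.util.List;

final class CriticalPoints {
    /**
     * A core context is a critical point when it has no good match,
     * or more than one good one-to-one match.
     */
    static boolean isCriticalPoint(List<Double> candidateScores, double goodThreshold) {
        long good = candidateScores.stream()
                .filter(score -> score >= goodThreshold)
                .count();
        return good != 1;
    }
}
```

Only the case with exactly one good match is left to the system; the other cases are handed to the user.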

examples

(n:1) order contact might match contact for billing, shipping and/or supplier
(n:1) concat of first name and last name; species count divided by area for density
if bib/book/author is found as best match candidate to arts/article/author

incorporate similarity flooding, but make small change
use cases

bibliography use case
Note: the examples use a thesaurus, which derives name matches, e.g., site -> station; the name matches are weighted to provide 2/10 of the overall matching accuracy calculation
...
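
One possible reading of the 2/10 weighting in the note above is a weighted sum in which the thesaurus-based name match contributes 0.2 and everything else contributes 0.8. The sketch below follows that reading. How SCIA actually combines its scores is not stated in these notes, so the otherScore parameter and the NameMatchScoring class are purely illustrative.

```java
import java.util.Map;

final class NameMatchScoring {
    /** Weight of the name match, per the 2/10 figure in the note. */
    static final double NAME_WEIGHT = 0.2;

    /** 1.0 when the two names agree after thesaurus lookup, else 0.0. */
    static double nameScore(String a, String b, Map<String, String> thesaurus) {
        String ca = thesaurus.getOrDefault(a.toLowerCase(), a.toLowerCase());
        String cb = thesaurus.getOrDefault(b.toLowerCase(), b.toLowerCase());
        return ca.equals(cb) ? 1.0 : 0.0;
    }

    /** otherScore stands in for whatever makes up the remaining 8/10 (assumption). */
    static double overall(double nameScore, double otherScore) {
        return NAME_WEIGHT * nameScore + (1.0 - NAME_WEIGHT) * otherScore;
    }
}
```

With a thesaurus entry mapping site to station, the names "Site" and "station" receive a name score of 1.0.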

Semi-automated workflow composition (Thursday, May 5th, 2005)

Given a connection (relation) between two actors that has been flagged as incompatible, what next?

We want to allow for adapter insertion
Where the adapter can be "implemented" according to a number of approaches

Search for actors/composites that can perform the conversion

We want to use the "parameter dependencies" to insert the simple adapters
We want to provide simple structural conversions, like map
This could employ "planning" style approaches as well

If structural and not semantic, apply a DILS-like approach, given as input to SCIA
If structural and semantic, apply SCIA
Allow user to fill in the adapter itself
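
The choice among the last three approaches above can be sketched as a small dispatch. It assumes the validation step can report whether a flagged connection has a structural mismatch, a semantic mismatch, or both. The enum and class names are invented for illustration, and treating every remaining case as "user fills in the adapter" is an assumption, since the notes do not say what to do with a purely semantic mismatch.

```java
/** Adapter approaches named in the notes (identifiers are illustrative). */
enum AdapterStrategy { DILS_LIKE_THEN_SCIA, SCIA, USER_AUTHORED }

final class AdapterPlanner {
    static AdapterStrategy choose(boolean structuralMismatch, boolean semanticMismatch) {
        if (structuralMismatch && !semanticMismatch) {
            // Structural only: DILS-like approach, whose output is given to SCIA.
            return AdapterStrategy.DILS_LIKE_THEN_SCIA;
        }
        if (structuralMismatch) {
            // Structural and semantic: apply SCIA.
            return AdapterStrategy.SCIA;
        }
        // Fallback (assumption): let the user fill in the adapter.
        return AdapterStrategy.USER_AUTHORED;
    }
}
```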

Changes to SCIA that may be needed

Performance issues:

Generating XQuery, converting to XML, etc., may not be efficient in certain situations
Try outputting R instead
Try compiling directly to Java code
Search for existing conversions as a user elaborates a mapping (needs to call sms search capabilities ?)

Demos and Ontologies (Thursday, May 5th, 2005)

More with Laura on Symbology

need to expand # of icons in current proposal (Laura will put up wiki page)
consider mechanism for communicating both composite form and its function (kind of composite)
add "external program" icon
need to define small set of types of computations
add "graphs" icon to display set of icons which would include scatter plots, bar graphs etc.
consider defining small set of atomic actors that go across ontologies and small set of icons for ecology domain (issue here would still be that some kind of functions might be represented differently across ontologies)
consider tying symbols to semantics in ontology but this might still lead to lots of icons
consider providing a baseline set of symbols while controlling color but allow others to add their own symbols -- this could result in a large number of symbols and also could result in multiple symbols representing the same concept.

SCIA demo
Ferdinando

Tasks, Priorities, Milestones, Assignments (Thursday, May 5th, 2005)

Matt has the list ...

This page last changed on 12-May-2005 13:53:06 PDT by LTER.ldowney.