E Science Link Up Oct 04




Meeting notes and updates on the e-Science Link-Up Meeting

(The following notes taken by S. Bowers)

Semantic Registration in Taverna (Pinar Alper)

    • Feta Architecture
      • Ontologist (Chris Wroe) -> Ontology Editor -> DL Reasoner -> Classification (in RDF(S)) -> obtain classification -> Feta, PeDRo
      • Store WSDL descriptions (in a special XML schema), then annotate them and give them to Feta
      • The classified ontology and the annotated WSDL are merged into a single graph
      • The Taverna Workflow Workbench issues "semantic discovery via conceptual descriptions" against Feta ... a set of canned queries
    • Feta Engine
      • The Feta Loader uses the myGrid service ontology and the domain ontology
      • uses Jena, e.g., to run RDQL queries, etc.
    • Feta Data Model (a Java sketch follows after this list)
      • Operation (name, description, task -- from a bio service ontology, method -- a particular type of algorithm/code, also from the ontology but not used much, resource, application, hasInput : Parameter, hasOutput : Parameter)
      • Parameter (name, desc, semantic type, format, transport type, collection type, collection format)
      • Service (name, description, author, organizations)
      • WSDL based operation is a subclass of Operation
      • WSDL based Web Service is a subclass of Service (hasOperation : WSDL based operation)
      • workflow, bioMoby service, soaplab service, and local Java code are subclasses of Service and Operation
      • seqHound service is an operation
      • Each parameter can have a semantic type stating that the parameter is an instance of an ontology class; an operation's "task" and "method" are also semantic types
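
A minimal Java sketch of the Feta data model described above (class and field names are illustrative only; the real model is an RDF(S) graph queried via Jena, not Java classes):

{{{
import java.util.List;

// Hypothetical Java rendering of the Feta data model as listed above.
class Parameter {
    String name, description;
    String semanticType;              // class from the domain ontology
    String format, transportType;
    String collectionType, collectionFormat;
}

class Operation {
    String name, description;
    String task;                      // from the bio service ontology
    String method;                    // algorithm/code type, also from the ontology
    String resource, application;
    List<Parameter> hasInput;
    List<Parameter> hasOutput;
}

class Service {
    String name, description, author;
    List<String> organizations;
}

// WSDL-based specialisations, mirroring the subclass relations in the notes.
class WsdlOperation extends Operation { }

class WsdlWebService extends Service {
    List<WsdlOperation> hasOperation;
}
}}}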

SHIM breakout (Jim leads discussion)

    • SHIM (need acronym)
      • semantically compatible, syntactically incompatible services
      • uniprot database (uniprot_record) -> parser and filter shim -> blastp analysis (protein_sequence)
      • working definition: a software component whose main purpose is to syntactically match otherwise incompatible resources; it takes some input, performs some task, and produces an output. Depending on usage, a shim can be semantically neutral ...
      • in myGrid, shims basically do type manipulations (mapping between abstract and concrete types), e.g., embl, genbank, and fasta are concrete types; dna_sequence is an abstract type
      • examples:
        • parser / filter
        • dereferencer
        • syntax translator
        • mapper
        • iterator
      • dereferencer
        • service a (genbank id) -> dereferencer -> service b (genbank record)
        • retrieves information from a URL
      • syntax translator
        • service a (dna seq; bsml) -> syntax translator -> service b (dna seq; agave)
      • mapper
        • service a (genbank id) -> mapper -> service b (embl id)
      • iterator
        • service a (collection of x) -> iterator -> service b (a single x)
      • seven steps to shim "nirvana"
        • recognize that 2 services are not compatible (syntactically, possibly semantically)
        • recognize the degree of mismatch
          • everything connected to everything
        • identify what type of shim(s) is/are needed
        • find or manufacture the shim
        • advise user on "semantic safety" of the shim
          • not clear what this means ...
        • invoke the shim
        • record provenance
      • my (Shawn's) proposal: a shim is an actor/service whose input semantic type is the same as or more general than the output semantic type (a small sketch follows below)
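
A minimal Java sketch of the proposal in the last bullet, assuming the ontology has been reduced to a precomputed superclass map (all names here are hypothetical):

{{{
import java.util.Map;
import java.util.Set;

// Hypothetical check of the "semantic safety" condition proposed above:
// a shim's input semantic type must be the same as, or more general than,
// its output semantic type. In myGrid the subsumption information would
// come from the classified ontology rather than a hand-built map.
class ShimCheck {

    // child -> set of (transitive) superclasses, e.g. "embl" -> {"dna_sequence"}
    private final Map<String, Set<String>> superClasses;

    ShimCheck(Map<String, Set<String>> superClasses) {
        this.superClasses = superClasses;
    }

    /** true if 'general' is the same as, or a superclass of, 'specific'. */
    boolean sameOrMoreGeneral(String general, String specific) {
        return general.equals(specific)
            || superClasses.getOrDefault(specific, Set.of()).contains(general);
    }

    /** A candidate shim is semantically neutral if its input type subsumes its output type. */
    boolean isSemanticallyNeutralShim(String inputType, String outputType) {
        return sameOrMoreGeneral(inputType, outputType);
    }
}
}}}

For example, a syntax translator taking dna_sequence (bsml) to dna_sequence (agave) keeps the same semantic type and so passes the check.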

Workflow management and AI Planning (Jim Blythe)

  • Motivation
    • workflows in grid-using communities
    • challenges in supporting workflow management
  • research on workflow planning at usc/isi
    • using ai techniques in Pegasus to generate executable grid workflows
  • using metadata descriptions as a first step, to get away from the file encodings of VDL and Pegasus
  • an operator is specified generally in an "(if preconditions then add <stuff>)" form, in Lisp/Scheme syntax (see the sketch after this list)
    • example: user can say: I want the results of a pulsar search at this time and location
  • the operator definitions are generated by hand ... they have begun looking at how to construct them automatically
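
A rough Java rendering of the operator form mentioned above (the talk showed these in Lisp/Scheme syntax; all names here are invented for illustration):

{{{
import java.util.HashSet;
import java.util.Set;

// Hypothetical "(if preconditions then add <stuff>)" operator. A planner
// chains such operators until the requested result, e.g. "the results of a
// pulsar search at this time and location", is entailed by the state.
class Operator {
    final String name;
    final Set<String> preconditions;  // facts that must already hold
    final Set<String> addList;        // facts the operator makes true

    Operator(String name, Set<String> preconditions, Set<String> addList) {
        this.name = name;
        this.preconditions = preconditions;
        this.addList = addList;
    }

    boolean applicable(Set<String> state) {
        return state.containsAll(preconditions);
    }

    Set<String> apply(Set<String> state) {
        Set<String> next = new HashSet<>(state);
        next.addAll(addList);
        return next;
    }
}
}}}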

Access Grid Meeting

  • The information model
    • Organization of people, projects, experiments, and so on
    • Operations, ... (Pinar)
    • every data item can be annotated with various type information ... some slides
    • mime types
    • primary objective is to model e-Science processes, not the domain -- capturing the process provides added value: facilitates contextualization, data-model contracts between components, visualize integrated result objects (as a result of a workflow), ...
    • data fusion/integration not guided by this model
  • The aim
    • providing more direct support for the implementation of e-Science processes by:
      • increasing the synergy between components
      • facilitating data-model contracts between myGrid components
      • defining a coherent myGrid architecture
  • Some benefits:
    • automatically capturing provenance and context information that is relevant to the interpretation and sharing of the results of the e-science experiments
    • facilitating personalization and collaboration
  • Implementation
    • a database with a web service interface ... as canned queries
    • generic interface, i.e., SQL query
    • performance penalty -- overhead, access calls, etc.
  • Questions
    • Does the model support "synthetic" versus "raw/natural" data?
    • What about the set-up and calibration of tools?
    • Also, predicted data versus experimentally observed
    • The model is based on the CCRC model
    • There are also a lot of standards that should be incorporated, so need some kind of extensibility
    • There needs to be place-holders for these within the information model
    • Related issue is where the results should be stored
    • three stores: one is the third-party databases (e.g., arrayexpress gene expression database ...) and link back
    • this is encompassed by the MIR -- myGrid Info. Repository; like a notebook
  • First thing done with information model
    • Workbench: MIR browser, metadata browser, WF model editor/explorer, feta search gui
    • Taverna execution environment: freefluo, and various plug-ins for MIR, Metadata Storage, and Feta
    • MIR external
    • Interestingly, the information model is "viewed" through a tree browser
  • The Mediator
    • Application oriented
      • directly supports the e-Scientist by:
        • providing pre-configured e-Science process templates (i.e., system-level workflows)
        • helping to capture and maintain context information that is relevant to the interpretation and sharing of the results of the e-Science experiments
        • facilitating personalization and collaboration
    • middleware-oriented
      • contributes to the synergy between mygrid services by
        • acting as a sink for e-Science events initiated by myGrid components
        • interpreting the intercepted events and triggering interactions w/ other related components entailed by the semantics of those events
        • compensating for possible impedance mismatches with other services both in terms of data types and interaction protocols
          • not really an issue -- won't do much here -- but might be some other components that want to participate, and would need to have this service
        • inspired, etc., by WSMF, WSMO, WSMX, WSML, ..., DERI web services -- Dieter Fensel, et al.
  • Supporting the e-Scientist
    • recurring use-cases can be captured
    • find workflows use-case
    • etc.
  • mediating between services
    • fully service based approach
      • the whole myGrid as a service
      • all communication done through web services (the mediator acts as the front door / gateway)
    • the name "mediator" is taken from the Gang of Four pattern of the same name
    • internals (sketched in code after this list)
      • mediation layer: action decision logic, event handlers, etc.
      • interface aggregation layer: request router
      • component access layer: mir proxy, enactor proxy, registry proxy, mds store proxy, dqpproxy, etc.
  • all of these docs are under the MIR portion of the Wiki
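
A minimal Java sketch of the mediator layering described under "internals" above (all names hypothetical; this only illustrates the routing idea, not the actual myGrid interfaces):

{{{
import java.util.List;

// Component access layer: proxies for MIR, enactor, registry, etc.
interface ComponentProxy {
    void handle(String event, String payload);
}

// Mediation layer: event handlers plus action decision logic.
interface EventHandler {
    boolean accepts(String event);
    void onEvent(String event, String payload, List<ComponentProxy> components);
}

// Interface aggregation layer: the request router / single front door.
class Mediator {
    private final List<EventHandler> handlers;
    private final List<ComponentProxy> components;

    Mediator(List<EventHandler> handlers, List<ComponentProxy> components) {
        this.handlers = handlers;
        this.components = components;
    }

    /** All communication is routed through this one gateway method. */
    void route(String event, String payload) {
        for (EventHandler h : handlers) {
            if (h.accepts(event)) {
                h.onEvent(event, payload, components);
            }
        }
    }
}
}}}

The single route() entry point mirrors the "front door / gateway" role described above.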

Grid Workflow Case Studies / Use Cases

  • Peter Li: Large data set transfer use case from Graves' disease scenario
    • Graves' disease: autoimmune thyroid disease; lymphocytes attack thyroid gland cells causing hyperthyroidism; symptoms: increased pulse rate, sweating, heat intolerance, goitre, exophthalmos; inherited
    • In silico experiments: microarray data analysis, gene annotation pipeline, design of genotype assays for SNP variations
    • large data set transfer problem: ~9 data sets x 60 MB of GD array data; affyR service integrates data sets, ...
    • demo

  • Tom Oinn
    • service a passes data to service b
    • service b may start before service a finished execution
    • need a comprehensive solution
    • LSIDs won't work
    • to get the data out of one, you have to use SOAP calls, and you get all the data at once, or none
    • the only way is if the LSID points to a stream -- otherwise the LSID architecture won't support it
    • Inferno ... Reading e-Science Centre (?) in the UK ... Inferno e-service
    • take any command line tool, wrap it up in this mechanism, deal with the reference passing, automatically
    • inputs are URLs; the protocol is called Styx
    • basically, a naming convention that lets you denote streams (see the sketch below)
    • http://www.vitanuova.com/solutions/grid/grid.html
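
A minimal Java sketch of the reference-passing idea above: service A publishes a URL, and service B reads from the stream incrementally, so it can start before A has finished. This is independent of Inferno/Styx, and all names are hypothetical:

{{{
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

// Service B consumes a reference (URL) to service A's output instead of
// receiving the whole value in a SOAP call, handling records as they arrive.
class StreamConsumer {
    static void consume(String dataUrl) throws IOException {
        URL url = new URL(dataUrl);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                process(line);        // handle each record as it arrives
            }
        }
    }

    static void process(String record) {
        // placeholder for service B's per-record work
    }
}
}}}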

  • Chris Wroe
    • use case from integrative biology
    • oxford and new zealand
    • from dna to whole organism modeling
      • cardiac vulnerability to acute ischemia: step 1; import mechanical model from Auckland data
      • get mechanical model of heart
        • take slice, place in perfusion bath, top and bottom surfaces isolated, site pacing ...
        • finite element approach
      • properties of the perfusion bath
      • protocol for what they do in the experiment: pace at 250 ms, apply shock, repeat with different intervals, etc.
      • each simulation takes a week
      • perturb initial conditions; stage 1 hypoxia (lack of oxygen), stage 2 hypoxia
      • data analysis: construct activation map, measure activation potential duration, threshold for fibrillation, file produced every 1ms, big
      • perl/shell scripts for all of this
    • want to "e-ify" this.
      • simulation step
      • long running, no other examples of this in myGrid
      • finite element bidomain solver: mechanical model, electrophysio model, simulation protocol, initial conditions, parameters -> a result file is produced every 1 ms, 7.3 MB each
      • monitor, stop, checkpoint, discard, restart with different parameters (see the sketch after this list)
      • a mesh problem ... so more computation and you still run it for a week
    • http://www.geodise.org Simon Cox
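
A hypothetical control loop for the monitor/stop/checkpoint/restart requirement above (the Solver interface is invented for illustration and does not correspond to the actual bidomain solver):

{{{
// Long-running simulation step: advance the solver, checkpoint periodically,
// and allow a monitored stop so the run can be restarted with new parameters.
interface Solver {
    void step();                       // advance the simulation (e.g. by 1 ms)
    boolean finished();
    byte[] checkpoint();               // serialisable solver state
    void restore(byte[] state);
    void setParameters(double[] p);
}

class SimulationRunner {
    void run(Solver solver, double[] params, int checkpointEvery) {
        solver.setParameters(params);
        byte[] lastCheckpoint = solver.checkpoint();
        int steps = 0;
        while (!solver.finished()) {
            solver.step();
            if (++steps % checkpointEvery == 0) {
                lastCheckpoint = solver.checkpoint();   // allow later restart
            }
            if (shouldAbort()) {                        // monitor / user decision
                solver.restore(lastCheckpoint);
                return;                                 // e.g. restart with new parameters
            }
        }
    }

    boolean shouldAbort() { return false; }             // placeholder monitor
}
}}}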

  • Jeffrey Grethe
    • BIRN workflow requirements (Biomedical informatics research network)
    • enable new understanding of neurological disease by integrating data across multiple scales from macroscopic brain function etc.
    • telescience portal enabled tomography workflow
      • composed of the sequence of steps required to acquire, process, visualize, and extract useful information from a 3D volume
    • morphometry workflow
      • structural analysis of data
      • large amounts of pre-processing
        • normalization, calibration, etc., to get data in a form to be analyzed
        • most methods in the pre-process stream can lead to errors
        • requires manual editing, etc., with a set of checkpoints where a user interacts
      • moving towards high-performance computing resources
    • parameter sweeps (see the sketch after this list)
      • taking birn-miriad numbers and comparing to what scientist has done ...
      • a researcher traced out different areas of the brain; need to compare with the fully automated approach
      • looking for correct parameters to use for the imaging
      • get as close as you can to what the actual, trained researcher can do: correlate minute changes in actual brain structure with saying to a patient "we should put you on some drug regime because you have Alzheimer's" -- or with some preventive course of action
      • has picture/slide of the workflow
      • baseline preprocessing can take upwards of a day
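
A hypothetical Java sketch of the parameter sweep described above: run the automated pipeline over a set of parameter settings and keep the one whose output agrees best with the researcher's manual tracing (segment and similarity are placeholders for real pipeline steps):

{{{
import java.util.List;

class ParameterSweep {
    // Return the parameter setting whose automated result best matches the manual tracing.
    static double[] bestSetting(List<double[]> settings,
                                double[][] image,
                                double[][] manualTracing) {
        double best = Double.NEGATIVE_INFINITY;
        double[] bestParams = null;
        for (double[] p : settings) {
            double[][] automated = segment(image, p);
            double score = similarity(automated, manualTracing);
            if (score > best) {
                best = score;
                bestParams = p;
            }
        }
        return bestParams;
    }

    static double[][] segment(double[][] image, double[] params) {
        return image;                  // placeholder for the real imaging pipeline
    }

    static double similarity(double[][] a, double[][] b) {
        // placeholder score: negative total difference, so higher means closer
        double s = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                s -= Math.abs(a[i][j] - b[i][j]);
        return s;
    }
}
}}}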


