E Science Link Up Oct 04

Meeting notes and updates on the e-Science Link-Up Meeting

(The following notes were taken by S. Bowers.)

Semantic Registration in Taverna (Pinar Alper)

    • Feta Architecture
      • Ontologist (Chris Wroe) -> Ontology Editor -> DL Reasoner -> Classification (in RDF(S)) -> obtain classification -> Feta, PeDRo
      • Store WSDL Descriptions (in special XML schema), then annotate, and give to Feta
      • The classified ontology and the annotated WSDL are merged into a single graph
      • Taverna Workflow Workbench issues "semantic discovery via conceptual descriptions" against feta ... a set of canned queries
    • Feta Engine
      • Feta Loader uses the myGrid service ontology and domain ontology
      • use Jena, e.g., to do RDQL queries, etc.
    • Feta Data Model (a rough sketch in Java follows this list)
      • Operation (name, description, task -- from a bio service ontology, method -- a particular type of algorithm/code, also from the ontology but not used much, resource, application, hasInput : Parameter, hasOutput : Parameter)
      • Parameter (name, desc, semantic type, format, transport type, collection type, collection format)
      • Service (name, description, author, organizations)
      • WSDL based operation is a subclass of Operation
      • WSDL based Web Service is a subclass of Service (hasOperation : WSDL based operation)
      • workflow, bioMoby service, soaplab service, and local java code are subclasses of Service and Operation
      • seqHound service is an operation
        • Each parameter can have a semantic type, stating that the parameter is an instance of an ontology class; an operation can likewise have a "task" and a "method", which are also semantic types
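
The Feta data model above can be read as a small set of record types. Below is a minimal sketch in Java, with class and field names taken from the notes; it is illustrative only, not the actual Feta schema or API.

// Illustrative sketch of the Feta data model as presented; names follow the
// notes, not the real Feta schema.
import java.util.List;

class Parameter {
    String name;
    String description;
    String semanticType;     // ontology class the parameter is an instance of
    String format;
    String transportType;
    String collectionType;
    String collectionFormat;
}

class Operation {
    String name;
    String description;
    String task;             // from the bio service ontology
    String method;           // type of algorithm/code, also from the ontology
    String resource;
    String application;
    List<Parameter> hasInput;
    List<Parameter> hasOutput;
}

class Service {
    String name;
    String description;
    String author;
    List<String> organizations;
}

// WSDL-based specialisations, as described in the notes
class WsdlBasedOperation extends Operation { }

class WsdlBasedWebService extends Service {
    List<WsdlBasedOperation> hasOperation;
}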

SHIM breakout (Jim leads discussion)

    • SHIM (need acronym)
      • semantically compatible, syntactically incompatible services
      • uniprot database (uniprot_record) -> parser and filter shim -> blastp analysis (protein_sequence)
      • working definition: a software component whose main purpose is to syntactically match otherwise incompatible resources; it takes some input, performs some task, and produces an output. depending on usage, a shim can be semantically neutral ...
      • in myGrid, basically doing type manipulations (mapping between abstract and concrete types), e.g., embl, genbank, fasta are concrete types; dna_sequence is an abstract type
      • examples:
        • parser / filter
        • de-referencer
        • syntax translator
        • mapper
        • iterator
      • dereferencer
        • service a (genbank id) -> dereferencer -> service b (genbank record)
        • retrieves information from a URL
      • syntax translator
        • service a (dna seq; bsml) -> syntax translator -> service b (dna seq; agave)
      • mapper
        • service a (genbank id) -> mapper -> service b (embl id)
      • iterator
        • service a (collection of x) -> iterator -> service b (a single x)
      • seven steps to shim "nirvana"
        • recognize that 2 services are not compatible (syntactically, possibly semantically)
        • recognize the degree of mismatch
          • everything connected to everything
        • identify what type of shim(s) is/are needed
        • find or manufacture the shim
        • advise user on "semantic safety" of the shim
          • not clear what this means ...
        • invoke the shim
        • record provenance
      • my (Shawn's) proposal: a shim is an actor/service whose input semantic type is the same as or more general than the output semantic type (a rough interface sketch follows this list)
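
Here is a minimal sketch of the shim idea in Java, including Shawn's proposal that the output semantic type be the same as or narrower than the input semantic type. The interface and class names are hypothetical, not myGrid/Taverna code.

// Hypothetical sketch: a shim only adapts representations; it should not
// broaden the semantics of what flows through it.
import java.util.Iterator;
import java.util.List;

interface Shim<I, O> {
    O apply(I input);
}

// "Iterator" shim: service a emits a collection of x, service b wants a
// single x at a time, so the shim hands the elements over one by one.
class IteratorShim<X> implements Shim<List<X>, Iterator<X>> {
    public Iterator<X> apply(List<X> collection) {
        return collection.iterator();
    }
}

// "Syntax translator" shim: same dna sequence, different formats
// (e.g., BSML in, AGAVE out); the actual translation is elided here.
class BsmlToAgaveShim implements Shim<String, String> {
    public String apply(String bsmlDocument) {
        throw new UnsupportedOperationException("format translation elided");
    }
}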

Workflow management and AI Planning (Jim Blythe)

  • Motivation
    • workflows in grid-using communities
    • challenges in supporting workflow management
  • research on workflow planning at usc/isi
    • using ai techniques in Pegasus to generate executable grid workflows
  • using metadata descriptions as a first step, to get away from the file encodings of VDL and Pegasus
  • an operator is specified generally as an (if preconditions then add <stuff>) form, in Lisp/Scheme syntax (a rough illustration follows this list)
    • example: user can say: I want the results of a pulsar search at this time and location
  • the operator definitions are generated by hand ... began looking at how to construct them automatically
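
As a rough illustration of the "(if preconditions then add <stuff>)" operator form mentioned above, here is a hypothetical rendering as a Java data structure; the actual USC/ISI encoding is in Lisp/Scheme and is not reproduced here.

// Hypothetical planning operator: if all preconditions hold in the current
// state, applying the operator adds its effects. Not the ISI/Pegasus encoding.
import java.util.List;
import java.util.Set;

class Operator {
    final String name;
    final List<String> preconditions;  // literals that must already hold
    final List<String> effects;        // literals added when the operator fires

    Operator(String name, List<String> preconditions, List<String> effects) {
        this.name = name;
        this.preconditions = preconditions;
        this.effects = effects;
    }

    boolean applicable(Set<String> state) {
        return state.containsAll(preconditions);
    }

    void apply(Set<String> state) {
        if (applicable(state)) {
            state.addAll(effects);
        }
    }
}

// e.g., a toy operator for the pulsar-search request in the notes:
// new Operator("pulsar-search",
//              List.of("have(time)", "have(location)"),
//              List.of("have(pulsar-search-results)"));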

Access Grid Meeting

  • The information model
    • Organization of people, projects, experiments, and so on
    • Operations, ... (Pinar)
    • every data item can be annotated with various type information ... some slides
    • mime types
    • primary objective is to model escience processes, not the domain -- capturing the process provides added value: facilitates contextualization, data-model contracts between components, visualize integrated result object (as a result of a workflow), ...
    • data fusion/integration not guided by this model
  • The aim
    • providing more direct support for the implementation of e-Science processes by:
      • increasing the synergy between components
      • facilitating data-model contracts between myGrid components
      • defining a coherent myGrid architecture
  • Some benefits:
    • automatically capturing provenance and context information that is relevant to the interpretation and sharing of the results of the e-science experiments
    • facilitating personalization and collaboration
  • Implementation
    • a database with a web service interface ... as canned queries
    • generic interface, i.e., sql query
    • performance penalty -- overhead, access calls, etc.
  • Questions
    • Does the model support "synthetic" versus "raw/natural" data?
    • What about the set-up and calibration of tools?
    • Also, predicted data versus experimentally observed
    • The model is based on CCRC model
    • There are also a lot of standards that should be incorporated, so need some kind of extensibility
    • There needs to be place-holders for these within the information model
    • Related issue is where the results should be stored
    • three stores: one is the third-party databases (e.g., the ArrayExpress gene expression database ...) and link back
    • this is encompassed by the MIR -- myGrid Info. Repository; like a notebook
  • First thing done with information model
    • Workbench: MIR browser, metadata browser, WF model editor/explorer, feta search gui
    • Taverna execution environment: freefluo, and various plug-ins for MIR, Metadata Storage, and Feta
    • MIR external
    • Interestingly, the information model is "viewed" through a tree browser
  • The Mediator
    • Application oriented
      • directly supports the e-Scientist by:
        • providing pre-configured e-Science process templates (i.e., system level workflows)
        • helping to capture and maintain context information that is relevant to the interpretation and sharing of the results of the e-science experiments
        • facilitating personalization and collaboration
    • middleware-oriented
      • contributes to the synergy between mygrid services by
        • acting as a sink for e-Science events initiated by myGrid components
        • interpreting the intercepted events and triggering interactions w/ other related components entailed by the semantics of those events
        • compensating for possible impedance mismatches with other services both in terms of data types and interaction protocols
          • not really an issue -- won't do much here -- but might be some other components that want to participate, and would need to have this service
        • inspired, etc., by WSMF, WSMO, WSMX, WSML, ..., DERI web services -- Dieter Fensel, et al.
  • Supporting the e-Scientist
    • recurring use-cases can be captured
    • find workflows use-case
    • etc.
  • mediating between services
    • fully service based approach
      • the whole myGrid as a service
      • all communication done through web services (the mediator acts as the front door / gateway)
    • the name "mediator" is taken from the Gang of Four pattern of the same name (a rough sketch follows this list)
    • internals
      • mediation layer: action decision logic, event handlers, etc.
      • interface aggregation layer: request router
      • component access layer: mir proxy, enactor proxy, registry proxy, mds store proxy, dqpproxy, etc.
  • all of these docs are under the MIR portion of the Wiki
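
Here is a minimal sketch of the mediator idea in Java: components raise e-Science events, and the mediator acts as a sink for them and triggers the entailed interactions via component proxies. All names (EScienceEvent, ComponentProxy, etc.) are hypothetical, not myGrid code.

// Hypothetical sketch of the mediator as an event sink that routes e-Science
// events to the component proxies (MIR, enactor, registry, ...) that react.
import java.util.ArrayList;
import java.util.List;

interface EScienceEvent {
    String kind();   // e.g., "workflow-completed", "data-stored"
}

interface ComponentProxy {
    void handle(EScienceEvent event);
}

class Mediator {
    private final List<ComponentProxy> proxies = new ArrayList<>();

    void register(ComponentProxy proxy) {
        proxies.add(proxy);
    }

    // Sink for events initiated by myGrid components; interpreting an event
    // here could mean storing provenance in the MIR, notifying the user, etc.
    void onEvent(EScienceEvent event) {
        for (ComponentProxy proxy : proxies) {
            proxy.handle(event);
        }
    }
}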

Grid Workflow Case Studies / Use Cases

  • Peter Li: Large data set transfer use case from Graves' disease scenario
    • Graves' disease: autoimmune thyroid disease; lymphocytes attack thyroid gland cells causing hyperthyroidism; symptoms: increased pulse rate, sweating, heat intolerance, goitre, exophthalmos; inherited
    • In silico experiments: microarray data analysis, gene annotation pipeline, design of genotype assays for SNP variations
    • large data set transfer problem: ~9 data sets x 60 MB of GD array data; affyR service integrates data sets, ...
    • demo

  • Tom Oinn
    • service a passes data to service b
    • service b may start before service a has finished execution
    • need a comprehensive solution
    • LSIDs won't work
    • to get the data out of an LSID, you have to use SOAP calls, and you get all the data at once, or none
    • the only way is if the LSID points to a stream -- otherwise the LSID architecture won't support it
    • Inferno ... Reading e-Science centre (?) in the UK ... Inferno e-service
    • take any command line tool, wrap it up in this mechanism, deal with the reference passing, automatically
    • inputs are urls, protocol called styx
    • basically, a naming convention that lets you denote streams (a rough sketch of the reference-passing idea follows this list)
    • http://www.vitanuova.com/solutions/grid/grid.html
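
Below is a rough sketch of the reference-passing idea: service a hands service b a URL (or other name for a stream) instead of the data itself, and b reads the bytes as they arrive. This is illustrative only; the Inferno/Styx machinery itself is not shown.

// Illustrative only: pass a reference between services and let the consumer
// stream the bytes, instead of moving the whole value in a SOAP message.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

class StreamingConsumer {
    // Service b receives only the URL that service a produced; if the endpoint
    // streams, b can start reading before a has finished writing.
    void consume(String reference) throws IOException {
        URL url = new URL(reference);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                process(line);
            }
        }
    }

    void process(String line) {
        // application-specific handling elided
    }
}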

  • Chris Wroe
    • use case from integrative biology
    • oxford and new zealand
    • from dna to whole organism modeling
      • cardiac vulnerability to acute ischemia: step 1: import mechanical model from Auckland data
      • get mechanical model of heart
        • take slice, place in perfusion bath, top and bottom surfaces isolated, site pacing ...
        • finite element approach
      • properties of the perfusion bath
      • protocol for what they do in the experiment: pace at 250ms, apply shock, repeat with different intervals, etc.
      • each simulation takes a week
      • perturb initial conditions; stage 1 hypoxia (lack of oxygen), stage 2 hypoxia
      • data analysis: construct activation map, measure action potential duration, threshold for fibrillation, file produced every 1ms, big
      • perl/shell scripts for all of this
    • want to "e-ify" this, i.e., turn it into an e-Science workflow.
      • simulation step
      • long running, no other examples of this in myGrid
      • finite element bidomain solver: mechanical model, electrophysio model, simulation protocol, initial conditions, parameters -> a result file is produced every 1 ms, 7.3 MB each
      • monitor, stop, checkpoint, discard, restart with different parameters (a rough control-interface sketch follows this list)
      • a mesh problem ... so more computation and you still run it for a week
    • www.GeoDise.org? Simon Cox
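
Below is a rough sketch of the kind of control interface the notes call for on a long-running simulation (monitor, stop, checkpoint, discard, restart with different parameters). The names are hypothetical, not from the integrative biology project.

// Hypothetical control interface for a week-scale simulation job; it only
// names the operations listed in the notes.
import java.util.Map;

interface SimulationJob {
    enum Status { RUNNING, STOPPED, FINISHED, FAILED }

    Status status();                        // monitor progress
    void stop();                            // halt the run
    String checkpoint();                    // persist state, return a checkpoint id
    void discard();                         // throw away the run and its outputs
    SimulationJob restartFrom(String checkpointId,
                              Map<String, Object> newParameters);
}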

  • Jeffrey Grethe
    • BIRN workflow requirements (Biomedical Informatics Research Network)
    • enable new understanding of neurological disease by integrating data across multiple scales from macroscopic brain function etc.


