(The following notes taken by S. Bowers)
- Feta Architecture
- Ontologist (Chris Wroe) -> Ontology Editor -> DL Reasoner -> Classification (in RDF(S)) -> obtain classification -> Feta, PeDRo
- Store WSDL Descriptions (in special XML schema), then annotate, and give to Feta
- The ontology, classified, and the annotated wsdl are merged into a single graph
- Taverna Workflow Workbench issues "semantic discovery via conceptual descriptions" against feta ... a set of canned queries
- Feta Engine
- Feta Loader uses the myGrid service ontology and domain ontology
- use Jena, e.g., to do RDQL queries, etc.
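The notes say Feta uses Jena (Java) to run RDQL queries over the merged graph of ontology plus annotated WSDL. As a language-neutral sketch of the same idea, here is a minimal triple-pattern matcher in plain Python; all triples and URIs are hypothetical, and this stands in for, rather than reproduces, the Jena/RDQL API:

```python
# Minimal triple-store sketch (stand-in for Jena/RDQL; all data hypothetical).
triples = {
    ("op:getBlastReport", "feta:performsTask", "onto:SequenceAlignment"),
    ("op:getBlastReport", "feta:hasInput", "param:seq"),
    ("param:seq", "feta:hasSemanticType", "onto:ProteinSequence"),
}

def query(pattern):
    """Match an (s, p, o) pattern against the graph; None acts as a variable."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]

# A "canned query": which operations perform a sequence-alignment task?
ops = [s for s, _, _ in query((None, "feta:performsTask", "onto:SequenceAlignment"))]
```

The "canned queries" mentioned above would each be a fixed pattern like this one, parameterized by the concept the user picks.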
- Feta Data Model
- Operation (name, description, task -- from a bio service ontology, method -- particular type of algorithm/code, also from the ontology but not used much, resource, application, hasInput : Parameter, hasOutput : Parameter)
- Parameter (name, desc, semantic type, format, transport type, collection type, collection format)
- Service (name, description, author, organizations)
- WSDL based operation is a subclass of Operation
- WSDL based Web Service is a subclassof Service (hasOperation : WSDL based operation)
- workflow, bioMoby service, soaplab service, local java code subclasses of Service and Operation
- seqHound service is an operation
- Each parameter can have a semantic type, stating that the parameter is an instance of a class, and the operation can have a "task" which is also a "semantic type" and "method"
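The Feta data model described above (Operation, Parameter, Service and their fields) can be sketched as plain data classes. This is an illustrative reconstruction from the notes, not the actual Feta schema; field names are my own rendering:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Parameter:
    # Fields follow the Feta data model listed in the notes.
    name: str
    description: str = ""
    semantic_type: Optional[str] = None    # ontology class the parameter instantiates
    format: Optional[str] = None
    transport_type: Optional[str] = None
    collection_type: Optional[str] = None
    collection_format: Optional[str] = None

@dataclass
class Operation:
    name: str
    description: str = ""
    task: Optional[str] = None       # from the bio service ontology
    method: Optional[str] = None     # algorithm type, also from the ontology
    resource: Optional[str] = None
    application: Optional[str] = None
    inputs: List[Parameter] = field(default_factory=list)    # hasInput
    outputs: List[Parameter] = field(default_factory=list)   # hasOutput

@dataclass
class Service:
    name: str
    description: str = ""
    author: str = ""
    organization: str = ""
    operations: List[Operation] = field(default_factory=list)  # hasOperation
```

The WSDL-based operation/service subclasses from the notes would specialize Operation and Service in the same way.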
- SHIM (need acronym)
- semantically compatible, syntactically incompatible services
- uniprot database (uniprot_record) -> parser and filter shim -> blastp analysis (protein_sequence)
- working definition: a software component whose main purpose is to syntactically match otherwise incompatible resources. it takes some input, performs some task and produces an output. depending on usage, a shim can be semantically neutral ...
- in myGrid, basically doing type manipulations (map between abstract types to concrete types), e.g., embl, genbank, fasta concrete types, dna_sequence is an abstract type
- examples:
- parser / filter
- de-referencer
- syntax translator
- mapper
- iterator
- dereferencer
- service a (genbank id) -> dereferencer -> service b (genbank record)
- retrieves information from a URL
- syntax translator
- service a (dna seq; bsml) -> syntax translator -> service b (dna seq; agave)
- mapper
- service a (genbank id) -> mapper -> service b (embl id)
- iterator
- service a (collection of x) -> iterator -> service b (a single x)
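The shim types above can be sketched as ordinary functions sitting between two services. A minimal illustration of the iterator and mapper shims, with a made-up identifier table (all names and ids are hypothetical):

```python
def iterator_shim(collection):
    """Iterator shim: turns a collection of x into one x at a time."""
    for item in collection:
        yield item

def mapper_shim(genbank_id, id_map):
    """Mapper shim: translates an identifier between namespaces
    (e.g. GenBank id -> EMBL id); id_map is a hypothetical lookup table."""
    return id_map[genbank_id]

# Wiring two otherwise incompatible services together:
id_map = {"GB:U49845": "EMBL:U49845"}   # illustrative only
records = ["GB:U49845"]                 # output of "service a"
for gb_id in iterator_shim(records):
    embl_id = mapper_shim(gb_id, id_map)  # now consumable by "service b"
```

A dereferencer or syntax translator would have the same shape: one input, one output, no change to the semantic type.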
- seven steps to shim "nirvana"
- recognize 2 services are not compatible (syntactically, possibly semantically)
- recognize the degree of mismatch
- everything connected to everything
- identify what type of shim(s) is/are needed
- find or manufacture the shim
- advise user on "semantic safety" of the shim
- not clear what this means ...
- invoke the shim
- record provenance
- my (Shawn's) proposal: a shim is an actor/service whose input semantic type is the same or more general than the output semantic type
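This proposal amounts to a subsumption check over the ontology's class hierarchy: the shim's input semantic type must equal the output type or be one of its ancestors. A minimal sketch, assuming a hypothetical is-a hierarchy (the real myGrid ontology would come from the DL reasoner's classification):

```python
# Hypothetical is-a hierarchy: child -> parent.
IS_A = {
    "protein_sequence": "sequence",
    "dna_sequence": "sequence",
    "sequence": "data",
}

def is_same_or_more_general(general, specific):
    """True if `general` equals `specific` or is an ancestor of it."""
    while specific is not None:
        if specific == general:
            return True
        specific = IS_A.get(specific)
    return False

def is_semantically_safe_shim(input_type, output_type):
    # Per the proposal: the input semantic type is the same as,
    # or more general than, the output semantic type.
    return is_same_or_more_general(input_type, output_type)

is_semantically_safe_shim("sequence", "protein_sequence")      # safe
is_semantically_safe_shim("dna_sequence", "protein_sequence")  # not safe
```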
- Motivation
- workflows in grid-using communities
- challenges in supporting workflow management
- research on workflow planning at usc/isi
- using ai techniques in Pegasus to generate executable grid workflows
- using metadata descriptions as first step, to get away from the file encodings of VDL and Pegasus
- an operator is specified generally as an (if preconditions then add <stuff>) form, in Lisp/Scheme syntax
- example: user can say: I want the results of a pulsar search at this time and location
- the operator definitions are generated by hand ... began looking at how to construct them automatically
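The "(if preconditions then add <stuff>)" operator form described above is the classic STRIPS-style planning operator. A minimal Python sketch under stated assumptions (the pulsar-search facts and names are invented for illustration, not taken from Pegasus):

```python
# STRIPS-style operator sketch: applicable only if all preconditions
# hold in the current state; applying it adds the effects to the state.
def make_operator(preconditions, add_effects):
    def apply(state):
        if preconditions <= state:       # all preconditions present?
            return state | add_effects   # add the <stuff>
        return None                      # operator not applicable
    return apply

pulsar_search = make_operator(
    preconditions={"have(raw_telescope_data)", "have(time_location)"},
    add_effects={"have(pulsar_search_results)"},
)

state = {"have(raw_telescope_data)", "have(time_location)"}
new_state = pulsar_search(state)   # search results are now available
```

A planner would chain such operators backwards from the user's goal ("results of a pulsar search at this time and location") to an executable grid workflow.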
- The information model
- Organization of people, projects, experiments, and so on
- Operations, ... (Pinar)
- every data item can be annotated with various type information ... some slides
- mime types
- primary objective is to model escience processes, not the domain -- capturing the process provides added value: facilitates contextualization, data-model contracts between components, visualize integrated result object (as a result of a workflow), ...
- data fusion/integration not guided by this model
- The aim
- providing more direct support for the implementation of e-Science processes by:
- increasing the synergy between components
- facilitating data-model contracts between myGrid components
- defining a coherent myGrid architecture
- Some benefits:
- automatically capturing provenance and context information that is relevant to the interpretation and sharing of the results of the e-science experiments
- facilitating personalization and collaboration
- Implementation
- a database with a web service interface ... as canned queries
- generic interface, i.e., sql query
- performance penalty -- overhead, access calls, etc.
- Questions
- Does the model support "synthetic" versus "raw/natural" data?
- What about the set-up and calibration of tools?
- Also, predicted data versus experimentally observed
- The model is based on CCRC model
- There are also a lot of standards that should be incorporated, so need some kind of extensibility
- There needs to be place-holders for these within the information model
- Related issue is where the results should be stored
- three stores: one is the third-party databases (e.g., arrayexpress gene expression database ...) and link back
- this is encompassed by the MIR -- myGrid Info. Repository; like a notebook
- First thing done with information model
- Workbench: MIR browser, metadata browser, WF model editor/explorer, feta search gui
- Taverna execution environment: freefluo, and various plug-ins for MIR, Metadata Storage, and Feta
- MIR external
- Interestingly, the information model is "viewed" through a tree browser
- The Mediator
- Application oriented
- directly supports the e-Scientist by:
- providing pre-configured e-Science process templates (i.e., system level workflows)
- helping to capture and maintain context information that is relevant to the interpretation and sharing of the results of the e-science experiments
- facilitating personalization and collaboration
- middleware-oriented
- contributes to the synergy between mygrid services by
- acting as a sink for e-Science events initiated by myGrid components
- interpreting the intercepted events and triggering interactions w/ other related components entailed by the semantics of those events
- compensating for possible impedance mismatches with other services, both in terms of data types and interaction protocols
- not really an issue -- won't do much here -- but might be some other components that want to participate, and would need to have this service
- inspired, etc., by WSMF, WSMO, WSMX, WSML, ..., DERI web services -- Dieter Fensel, et al.
- Supporting the e-Scientist
- recurring use-cases can be captured
- find workflows use-case
- etc.
- mediating between services
- fully service based approach
- the whole myGrid as a service
- all communication done through web services (the mediator acts as the front door / gateway)
- the name mediator taken from Gang of Four pattern with the same name
- internals
- mediation layer: action decision logic, event handlers, etc.
- interface aggregation layer: request router
- component access layer: mir proxy, enactor proxy, registry proxy, mds store proxy, dqpproxy, etc.
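Since the name comes from the Gang of Four Mediator pattern, the internals above can be sketched as a mediator that sinks e-Science events and routes them to the component proxies. A minimal sketch; the event names and handler actions are hypothetical stand-ins for the MIR/enactor/registry proxies:

```python
class Mediator:
    """GoF Mediator sketch: components talk only to the mediator,
    which routes intercepted events to registered handlers."""
    def __init__(self):
        self.handlers = {}

    def register(self, event_type, handler):
        # e.g. an MIR proxy or provenance recorder subscribing to an event
        self.handlers.setdefault(event_type, []).append(handler)

    def notify(self, event_type, payload):
        # action decision logic would go here; this sketch just fans out
        for handler in self.handlers.get(event_type, []):
            handler(payload)

log = []
mediator = Mediator()
mediator.register("workflow_completed", lambda p: log.append(("store_in_MIR", p)))
mediator.register("workflow_completed", lambda p: log.append(("record_provenance", p)))
mediator.notify("workflow_completed", "wf-42")
```

One event from the enactor thus triggers the related interactions (storage, provenance) without the components knowing about each other.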
- all of these doc's are under the MIR portion of the Wiki
- Peter Li: Large data set transfer use case from Graves' disease scenario
- Graves' disease: autoimmune thyroid disease; lymphocytes attack thyroid gland cells causing hyperthyroidism; symptoms: increased pulse rate, sweating, heat intolerance, goitre, exophthalmos; inherited
- In silico experiments: microarray data analysis, gene annotation pipeline, design of genotype assays for SNP variations
- large data set transfer problem: ~9 data sets x 60 MB of GD array data; affyR service integrates data sets, ...
- demo
- Tom Oinn
- service a passes data to service b
- service b may start before service a finished execution
- need a comprehensive solution
- lsid's won't work
- to get the data out of it, you have to use soap calls, and you get all the data at once, or none
- the only way is if the lsid points to a stream -- otherwise lsid arch. won't support it
- Inferno ... Reading e-Science Centre (?) in the UK ... Inferno e-service
- take any command line tool, wrap it up in this mechanism, deal with the reference passing, automatically
- inputs are urls, protocol called styx
- basically, a naming convention that lets you denote streams
- http://www.vitanuova.com/solutions/grid/grid.html
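The streaming behaviour described above (service b starting before service a has finished, instead of an LSID-style all-or-nothing transfer) can be sketched with a generator pipeline; the service names and records are illustrative only:

```python
def service_a():
    """Produces records incrementally rather than as one complete blob
    (contrast with an LSID SOAP call: all the data at once, or none)."""
    for i in range(3):
        yield f"record-{i}"

def service_b(stream):
    """Starts consuming as soon as the first record arrives,
    i.e. before service_a has finished producing."""
    out = []
    for record in stream:
        out.append(record.upper())
    return out

results = service_b(service_a())
```

A Styx-style stream URL plays the role of the generator here: a reference that can be read incrementally instead of dereferenced in one shot.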
- Chris Wroe
- use case from integrative biology
- oxford and new zealand
- from dna to whole organism modeling
- cardiac vulnerability to acute ischemia: step 1; import mechanical model from Auckland data
- get mechanical model of heart
- take slice, place in perfusion bath, top and bottom surfaces isolated, site pacing ...
- finite element approach
- properties of the perfusion bath
- protocol for what they do in the experiment: pace at 250ms, apply shock, repeat with different intervals, etc.
- each simulation takes a week
- perturb initial conditions; stage 1 hypoxia (lack of oxygen), stage 2 hypoxia
- data analysis: construct activation map, measure activation potential duration, threshold for fibrillation, file produced every 1ms, big
- perl/shell scripts for all of this
- want to "e-Science-ify" this.
- simulation step
- long running, no other examples of this in myGrid
- finite element bidomain solver: mechanical model, electrophysio model, simulation protocol, initial conditions, parameters -> result file produced for every 1ms, 7.3 MB each
- monitor, stop, checkpoint, discard, restart with different parameters
- a mesh problem ... so more computation and you still run it for a week
- http://www.geodise.org Simon Cox
- Jeffrey Grethe
- BIRN workflow requirements (Biomedical informatics research network)
- enable new understanding of neurological disease by integrating data across multiple scales from macroscopic brain function etc.
- telescience portal enabled tomography workflow
- composed of the sequence of steps required to acquire, process, visualize, and extract useful information from a 3D volume
- morphometry workflow
- structural analysis of data
- large amounts of pre-processing
- normalization, calibration, etc., to get data in a form to be analyzed
- most methods in the pre-process stream can lead to errors
- requires manual editing, etc., and have a set of checkpoints, where a user interacts
- moving towards high-performance computing resources
- parameter sweeps
- taking birn-miriad numbers and comparing to what scientist has done ...
- researcher traced out diff area of the brain, need to compare fully automated approach
- looking for correct parameters to use for the imaging
- get the automated result as close as you can to what a trained researcher can do: correlate minute changes in actual brain structure with saying to some patient, "we should put you on some drug regime because you have Alzheimer's" -- some preventive course of action
- has picture/slide of the workflow
- baseline preprocessing can take upwards of a day
- Karan Vahi
- Abstract Workflow (DAX): expressed in terms of logical entities; specifies all logical files required to generate the desired data product from scratch; dependencies between the jobs; analogous to a build-style DAG
- format for specifying the abstract workflow, id's the recipe for creation
- xml syntax / format
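Since the abstract workflow is a build-style DAG over logical files, the core idea can be sketched with a dependency graph and a topological execution order. This is an illustration of the concept only, not the real DAX XML schema, and the job names are invented:

```python
from graphlib import TopologicalSorter

# Abstract workflow as a DAG: job -> set of jobs it depends on
# (i.e. the jobs that produce its required logical files). Illustrative only.
deps = {
    "extract": set(),
    "transform_a": {"extract"},
    "transform_b": {"extract"},
    "combine": {"transform_a", "transform_b"},
}

# A valid execution order that respects the dependencies;
# the two transforms are independent and could run in parallel.
order = list(TopologicalSorter(deps).static_order())
```

Mapping this abstract DAG onto concrete resources and replicas is the concretization step described next.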
- Concrete workflow ...
- alternate replica mechanisms
- how to manage replicas of the same service?
- haven't been looking at that, because of the mandate of the Pegasus ...
- all jobs run independently, wrapped around java executables, shell scripts, etc.
- leveraging condor, and condor-g, which don't go further with web-services, etc.