Jones, Zhang, Pereira, Higgins, Tao, Schildhauer, Spears, Berkley
- Dan gave overview of pipeline refactoring
- Discussion of how to handle distributed execution
- Dan has "bulletin-board" model in mind right now
- Matt would like a more 'cluster' oriented approach with a controller
- Ricardo suggests reducing the granularity of inputs to make the problem computationally tractable, then addressing parallelization more comprehensively in a second iteration
- Ricardo: cluster at Kansas should be examined to discover issues relevant to the parallelization effort for Kepler
- Rod gave overview of DiGIR/DarwinCore data sources in Kepler
- Exposes data as fields, rows, tables
- Jianting's progress on the GIS actors
- Convex hull, rasterization, buffering
- Delay parallelization effort until we can do it right
- Implement a simpler GARP workflow that can run in one day on one machine
- Preprocessing to a reasonable grid density
- Choose fewer species (e.g., 10-50)
- Possibly eliminate the best subsets approach to reduce computational demand
- Implement whole end-to-end iteration in the workflow for demonstration purposes
- Rod will create another option in the DarwinCore data source to allow aggregations by species (across providers)
- Dan and Chad will handle format conversion from the .raw files to ASCII
- Dan will enumerate tasks to complete ENM pipeline in Bugzilla, with an overall tracker bug
- Will assign developers as needed to get these steps done
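
Sketch of the "aggregate by species (across providers)" option noted above. This is only an illustration of the idea, not the DarwinCore data source code in Kepler. The record layout and the key names (`provider`, `scientific_name`, `lat`, `lon`) and the function name `aggregate_by_species` are assumptions made up for the example.

```python
from collections import defaultdict


def aggregate_by_species(records):
    """Group occurrence records by species name, regardless of provider.

    Each record is assumed to be a dict with the hypothetical keys
    'provider', 'scientific_name', 'lat', and 'lon'.
    """
    grouped = defaultdict(list)
    for rec in records:
        # Normalize the name so that case and stray whitespace differences
        # between providers do not split one species into several groups.
        key = rec["scientific_name"].strip().lower()
        grouped[key].append(rec)
    return dict(grouped)


# Made-up sample records from two hypothetical providers.
records = [
    {"provider": "A", "scientific_name": "Puma concolor", "lat": 38.9, "lon": -95.2},
    {"provider": "B", "scientific_name": "puma concolor ", "lat": 35.1, "lon": -106.6},
    {"provider": "B", "scientific_name": "Lynx rufus", "lat": 34.4, "lon": -119.7},
]

by_species = aggregate_by_species(records)
for name, recs in by_species.items():
    providers = sorted({r["provider"] for r in recs})
    print(name, len(recs), providers)
```

Grouping on a normalized name is the simplest choice here; it does not resolve synonyms or misspellings, which would need separate handling.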