Science Environment for Ecological Knowledge

ENM Pipeline Conference Call 20 Oct 2004


Participants

Jones, Zhang, Pereira, Higgins, Tao, Schildhauer, Spears, Berkley

Discussion

  • Dan gave overview of pipeline refactoring
    • Discussion of how to handle distributed execution
      • Dan has a "bulletin-board" model in mind right now
      • Matt would like a more 'cluster' oriented approach with a controller
    • Ricardo suggests that we should reduce granularity of inputs to make it computationally tractable and then address the parallelization more comprehensively in a second iteration
    • Ricardo: cluster at Kansas should be examined to discover issues relevant to the parallelization effort for Kepler
  • Rod gave overview of DiGIR/DarwinCore data sources in Kepler
    • Exposes data as fields, rows, tables
  • Jianting's progress on the GIS actors
    • Convex hull, rasterization, buffering
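
For reference, the convex hull operation mentioned for the GIS actors can be sketched as follows. This is only an illustration of the geometric operation using the standard monotone-chain method; it is not the Kepler actor code, and the function names and the (x, y) tuple representation are assumptions made for this sketch.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order.

    `points` is an iterable of (x, y) tuples, e.g. occurrence
    coordinates (hypothetical input format for this sketch).
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    # Build the lower hull, left to right.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper hull, right to left.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]
```

Interior and collinear points are dropped, so only the corner vertices of the hull are returned.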

Decisions

  • Delay parallelization effort until we can do it right
  • Implement a simpler GARP workflow that can run in one day on one machine
    • Preprocessing to a reasonable grid density
    • Choose fewer species (e.g., 10-50)
    • Possibly eliminate the best subsets approach to reduce computational demand
    • Implement whole end-to-end iteration in the workflow for demonstration purposes
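
One possible reading of "preprocessing to a reasonable grid density" is to coarsen each environmental layer before it reaches GARP. The call did not settle on a method; the sketch below assumes simple block averaging, and the function name and the list-of-rows grid representation are hypothetical.

```python
def coarsen(grid, factor):
    """Reduce grid density by averaging factor x factor blocks of cells.

    `grid` is a list of equal-length rows of numbers (assumed layout).
    Block averaging is an assumption of this sketch, not a decision
    recorded in the minutes.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")

    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Collect the cells of one block and store their mean.
            block = [grid[r + i][c + j]
                     for i in range(factor)
                     for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out
```

Coarsening by a factor of 2 cuts the number of cells by 4, which is the kind of reduction that could help a run fit on one machine in one day.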

Action items

  • Rod will create another option in the DarwinCore data source to allow aggregations by species (across providers)
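
As a rough illustration of what aggregation by species across providers could look like, the sketch below groups occurrence records by species name regardless of which provider returned them. The record layout (dictionaries with "ScientificName", "Latitude", "Longitude" and a "provider" key) is an assumption for this example and is not taken from the Kepler data source itself.

```python
from collections import defaultdict


def aggregate_by_species(records):
    """Group occurrence records by species name across all providers.

    `records` is an iterable of dicts; the key names used here are
    assumptions for this sketch. Returns a dict mapping each species
    name to a list of (latitude, longitude, provider) tuples.
    """
    grouped = defaultdict(list)
    for rec in records:
        # Strip stray whitespace so the same name from two providers
        # lands in the same group.
        name = rec["ScientificName"].strip()
        grouped[name].append(
            (rec["Latitude"], rec["Longitude"], rec["provider"])
        )
    return dict(grouped)
```

The result has one entry per species, so a downstream step can iterate over species rather than over providers.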



This particular version was published on 20-Oct-2004 12:15:28 PDT by NCEAS.jones.