Science Environment for Ecological Knowledge

ENM Pipeline Conference Call 20 Oct 2004

Participants

Jones, Zhang, Pereira, Higgins, Tao, Schildhauer, Spears, Berkley

Discussion

  • Dan gave overview of pipeline refactoring
    • Discussion of how to handle distributed execution
      • Dan has "bulletin-board" model in mind right now
      • Matt would like a more 'cluster' oriented approach with a controller
    • Ricardo suggests that we should reduce granularity of inputs to make it computationally tractable and then address the parallelization more comprehensively in a second iteration
    • Ricardo: cluster at Kansas should be examined to discover issues relevant to the parallelization effort for Kepler
  • Rod gave overview of DiGIR/DarwinCore data sources in Kepler
    • Exposes data as fields, rows, tables
  • Jianting's progress on the GIS actors
    • Convex hull, rasterization, buffering
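
As an illustration of the first GIS operation listed above, here is a minimal convex hull sketch over species occurrence points. It uses Andrew's monotone chain algorithm, which is an assumption: the notes do not say which algorithm the Kepler GIS actors use. The function names are hypothetical, and coordinates are treated as planar (x, y) pairs.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain).

    `points` is an iterable of (x, y) tuples; duplicates are ignored.
    Collinear points on the hull boundary are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    # Build the lower hull, left to right.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper hull, right to left.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]
```

An interior point such as (1, 1) inside a square of occurrence points is excluded from the returned hull.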

Decisions

  • Delay parallelization effort until we can do it right
  • Implement a simpler GARP workflow that can run in one day on one machine
    • Preprocessing to a reasonable grid density
    • Choose fewer species (e.g., 10-50)
    • Possibly eliminate the best subsets approach to reduce computational demand
    • Implement whole end-to-end iteration in the workflow for demonstration purposes
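
One way to read "preprocessing to a reasonable grid density" is to aggregate each environmental layer to a coarser resolution before modeling. The sketch below averages square blocks of cells. The function name, the block-mean choice, and the `nodata` handling are all assumptions made for illustration, not the method agreed on the call.

```python
def coarsen_grid(grid, factor, nodata=None):
    """Aggregate a 2-D grid (list of rows) by averaging factor x factor blocks.

    Cells equal to `nodata` are skipped; a block with no valid cells
    yields `nodata`. Partial blocks at the right and bottom edges are
    averaged over the cells that exist.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    out = []
    for r0 in range(0, n_rows, factor):
        out_row = []
        for c0 in range(0, n_cols, factor):
            # Collect the valid values in this block.
            values = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, n_rows))
                for c in range(c0, min(c0 + factor, n_cols))
                if grid[r][c] != nodata
            ]
            out_row.append(sum(values) / len(values) if values else nodata)
        out.append(out_row)
    return out
```

With `factor=2`, a 4 x 4 layer becomes 2 x 2, so the number of cells the model has to process drops by a factor of four.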

Action items

  • Rod will create another option in the DarwinCore data source to allow aggregations by species (across providers)
  • Dan and Chad will handle format conversion from the .raw files to ASCII
  • Dan will enumerate tasks to complete ENM pipeline in Bugzilla, with an overall tracker bug
    • Will assign developers as needed to get these steps done
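
The aggregation by species across providers in the first action item could look roughly like the sketch below. It assumes each provider returns a list of records as dictionaries with a `species` key; that key, the `provider` tag, and the function name are hypothetical and are not taken from the DiGIR or DarwinCore interfaces.

```python
from collections import defaultdict


def aggregate_by_species(provider_results):
    """Merge per-provider record lists into one list of records per species.

    `provider_results` maps a provider name to a list of record dicts,
    each with a "species" key (hypothetical field name). Each output
    record is tagged with the provider it came from.
    """
    by_species = defaultdict(list)
    for provider, records in provider_results.items():
        for record in records:
            name = record["species"].strip()
            by_species[name].append(
                {**record, "species": name, "provider": provider}
            )
    return dict(by_species)
```

Records for the same species name from different providers end up in a single list, which a downstream workflow step can then consume one species at a time.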



This page last changed on 20-Oct-2004 12:29:05 PDT by NCEAS.jones.