SEEK-Wiki: Beam Knowledge Rep Sept 04

Difference between version 59 and version 32:

Lines 7-19 were replaced by lines 7-19
- ** Steve Cox (Stephen.Cox@tiehh.ttu.edu|[mailto:Stephen.Cox@tiehh.ttu.edu])
- ** David Chalcraft ([mailto:CHALCRAFTD@MAIL.ECU.EDU])
- ** Shawn Bowers ([mailto:bowers@sdsc.edu])
- ** Bertram Ludaescher ([mailto:ludaesch@sdsc.edu])
- ** Deana Pennington ([mailto:dpennington@lternet.edu])
- ** Mark Schildhauer ([mailto:schild@nceas.ucsb.edu])
- ** Katy Suding ([mailto:ksuding@uci.edu])
- ** Kristin Vanderbilt ([mailto:vanderbi@sevilleta.unm.edu])
- ** Evan Weiher ([mailto:weiher@uwec.edu])
- ** Rich Williams ([mailto:rwilliams@nceas.ucsb.edu])
- ** Chad Berkley ([mailto:berkley@nceas.ucsb.edu])
- ** Dan Higgins ([mailto:higgins@nceas.ucsb.edu])
- ** Jianting Zhang ([mailto:jzhang@lternet.edu])
+ ** Steve Cox ([Stephen.Cox@tiehh.ttu.edu|mailto:Stephen.Cox@tiehh.ttu.edu])
+ ** David Chalcraft ([CHALCRAFTD@MAIL.ECU.EDU|mailto:CHALCRAFTD@MAIL.ECU.EDU])
+ ** Shawn Bowers ([bowers@sdsc.edu|mailto:bowers@sdsc.edu])
+ ** Bertram Ludaescher ([ludaesch@sdsc.edu|mailto:ludaesch@sdsc.edu])
+ ** Deana Pennington ([dpennington@lternet.edu|mailto:dpennington@lternet.edu])
+ ** Mark Schildhauer ([schild@nceas.ucsb.edu|mailto:schild@nceas.ucsb.edu])
+ ** Katy Suding ([ksuding@uci.edu|mailto:ksuding@uci.edu])
+ ** Kristin Vanderbilt ([vanderbi@sevilleta.unm.edu|mailto:vanderbi@sevilleta.unm.edu])
+ ** Evan Weiher ([weiher@uwec.edu|mailto:weiher@uwec.edu])
+ ** Rich Williams ([rwilliams@nceas.ucsb.edu|mailto:rwilliams@nceas.ucsb.edu])
+ ** Chad Berkley ([berkley@nceas.ucsb.edu|mailto:berkley@nceas.ucsb.edu])
+ ** Dan Higgins ([higgins@nceas.ucsb.edu|mailto:higgins@nceas.ucsb.edu])
+ ** Jianting Zhang ([jzhang@lternet.edu|mailto:jzhang@lternet.edu])
Line 25 was replaced by line 25
- * __Introductions (8:30-3:45)__
+ * __Introductions and Presentations__
Line 39 was replaced by lines 39-40
- *** Q: Are most of these datasets freely available on the web; are people sharing them? A: Few available on web ... Q: But is it possible to get them?  A: It is a very divergent / diverse community. Q: W
+ *** Q: Are most of these datasets freely available on the web; are people sharing them? A: Few available on web ... Q: But is it possible to get them?  A: It is a very divergent / diverse community ...
+ ** Shawn Bowers on semantic integration
At line 40 added 1 line.
+
Line 47 was replaced by line 49
- ** Mark: Goal is to compartmentalize that provides general utility for next project, for some other analysis; a standard type of function that we want to capture and describe to use for others
+ ** Mark: Goal is to compartmentalize so as to provide general utility for the next project or for some other analysis; capture and describe standard functions so that others can use them
Line 60 was replaced by line 62
- ** Katy
+ ** Katy (note, the following mixes discussion with Katy's presentation)
Lines 65-70 were replaced by lines 67-71
- **** Every experiment found that as you increase productivy it decreases diversity, which doesn't follow the natural productivity/species-richness curve (this is gaining interest: we are increasing productivity of systems in general with environmental change going on, e.g., urbanization increases nitrogen/fertizilation, and a desire to know the result on diversity)
- **** What is Primary Productivity "can of worms"
- ***** Want to aggregate data at many different scales / communities to get many types of graphs; you want to ultimately go from smallest possible scale to largest scale
- ***** decision making that goes in, before any analysis happens.  This goes into the data discovery/integration. Deana: can we build a repository of methodologies.
- ***** primarily using herbaceous sites
- ***** you do a broad category of the dataset, but not the details (a little bit of woody stuff, ...)
+ **** Every experiment found that increasing productivity decreases diversity, which doesn't follow the natural productivity/species-richness curve (this is gaining interest in the community: we are increasing productivity of systems in general with environmental change going on, e.g., urbanization increases nitrogen/fertilization, and there is a desire to know the result on diversity)
+ **** What is Primary Productivity "can of worms" discussion
+ ***** Want to aggregate data at many different scales/communities to get many types of graphs; you want to ultimately go from smallest possible scale to largest scale (infer curve of graph)
+ ***** What decision making occurs before any analysis happens?  This goes into the data discovery/integration. Deana: can we build a repository of methodologies?
+ ***** You do a broad category of the dataset, but not the details (a little bit of woody stuff, and so on...)
Lines 74-77 were replaced by lines 75-78
- **** most of the time, productivity scales well, i.e., it is a linear scaling, e.g., anpp vs. area is a linear relationship. so for exmaple, even though they are smaller plots, you can get g/m^2 measures. Basically, productivity doubles as area doubles...
- **** n addition decresases species diversity: plot at lter sites of anpp (above-ground primary productivity) versus relative species density; species density is the number of species observed in a given area
- **** two studies, measures at different scales: either you simulate or measure a species area curve, and extrapolate to different areas.
- **** N fertilization positivevly effects productivity negatively effects diversity
+ **** Most of the time, productivity scales well, i.e., it is a linear scaling, e.g., ANPP vs. area is a linear relationship. So for example, even though they are smaller plots, you can get g/m^2 measures. Basically, productivity doubles as area doubles...
+ **** N addition decreases species diversity: plot at LTER sites of ANPP (aboveground net primary productivity) versus relative species density; species density is the number of species observed in a given area
+ **** Two studies, measures at different scales: either you simulate or measure a species-area curve, and extrapolate to different areas.
+ **** N fertilization positively affects productivity; negatively affects diversity
Lines 79-84 were replaced by lines 80-85
- **** species response to fertilization:
- ****** in the case of increased fertility, can we predict what species we will lose?  What species will become dominant?
- ****** are these responses contingent on system characteristics?
- **** dataset: 8 lter sites, 28 community types (mainly vegetation classifications, based on the experiments/manipulations done at the sites); 831 species
- **** dataset characteristics
- ***** n added (g/m^2/yr), 10 (ARC), 9.5 (CDR), 60/wk (GCE), etc.
+ **** Species response to fertilization:
+ ****** In the case of increased fertility, can we predict what species we will lose?  What species will become dominant?
+ ****** Are these responses contingent on system characteristics?
+ **** Dataset: 8 LTER sites, 28 community types (mainly vegetation classifications, based on the experiments/manipulations done at the sites); 831 species
+ **** Dataset characteristics
+ ***** N added (g/m^2/yr), 10 (ARC), 9.5 (CDR), 60/wk (GCE), etc.
Line 90 was replaced by line 91
- **** Often times, a matrix like this is constructed at the beginning of doing a "synthesis", and the most important points to track for each site are: treatment, sample size, duration, and replication.
+ **** Often, a matrix like this is constructed at the beginning of doing a "synthesis", and the most important points to track for each site are: treatment, sample size, duration, and replication.
Lines 95-96 were replaced by lines 96-97
- **** herbaceous Term applied to a nonwoody stem/plant with minimal secondary growth
- **** example dataset:
+ **** Herbaceous: term applied to a nonwoody stem/plant with minimal secondary growth
+ **** Example dataset:
Line 105 was replaced by line 106
- ** Both projects pretty muched took from the same original, "raw" data sets
+ ** Both projects pretty much took from the same original, "raw" data sets
Line 108 was replaced by line 109
- ** brainstorming:
+ ** Brainstorming:
Lines 110-118 were replaced by lines 111-303
- *** General Tasks: Identifying data that is relevant (talking with people), permission to obtain of the data, understanding the structure and content of the data (sampling design, how or what attributes mean), and then determining which can be appropriately can be integrated to do an analysis. From Jornada: [http://jornada-www.nmsu.edu/]
- *** anpp versus r and a bunch of data points. how can the data points be linked back to a table of other features ... of the species that are involved in the point. the point represents the set of species in an area (the species "richness") ... the auxiliary table is characteristics of the species
- *** as another example, compare what happens to the species between the points (as area increases); and more importantly the functional traits of the change
- **** these are just visualization things
- *** null model, and the null expectation
- *** abundance vs loss prob for n-fix and not n-fix
- *** standard toolset
- *** measuring functional diversity is a big problem -- a computationally complex problem
- *** for loss-prob and n-fixers; you need to do data integration: how many sites are needed, and so on ...
+ *** General tasks: Identifying data that is relevant (talking with people), permission to obtain the data, understanding the structure and content of the data (sampling design, how or what attributes mean), and then determining which can be appropriately integrated to do an analysis. From Jornada: [http://jornada-www.nmsu.edu/] (go to "Research Data" > "LTER Data" > "Plant")
+ *** ANPP versus r and a bunch of data points. How can the data points be linked back to a table of other features ... of the species that are involved in the point? The point represents the set of species in an area (the species "richness") ... the auxiliary table is characteristics of the species
+ *** As another example, compare what happens to the species between the points (as area increases); and more importantly the functional traits of the change
+ **** Comment: These are just visualization things
+ *** Null model, and the null expectation
+ *** Abundance vs loss prob for n-fix and not n-fix
+ *** Standard toolset
+ *** Measuring functional diversity is a big problem -- a computationally complex problem
+ *** For loss-probability and N-fixers you need to do data integration: how many sites are needed, and so on ...
+
+ * __Biodiversity Change Analysis__
+ ** How to understand change in biodiversity: what are the factors causing the change in biodiversity?
+ *** Given an attribute matrix, which attributes are changing the most or least at particular segments in the species-area curve?
+ *** Focus on community structure
+ *** Niche breadth
+ *** Predict why diversity would decline / change
+
+ * __Deana's Example__
+ ** Query
+ *** Some data by keyword: "biodiversity", "species counts", "abundance", species names, functional trait
+ *** Location, e.g., place name, coordinates, bounding box
+ *** Sample method
+ *** Analysis type, e.g., counts, abundance
+ ** Construct logically and semantically equivalent views
+ ** Group on sample methods (transect vs plot)
+ ** Data
+ *** data1: transect 20 m -> rarefaction (species-area curve) -> scale to 1 m interpolation
+ *** data3: transect 20 m -> rarefaction (species-area curve) -> scale to 1 m interpolation
+ *** data2: plot 5 m^2 -> species-area curve -> scale to 1 m interpolation
+ *** data4: plot 1m^2
+ ** Integrate transect data
+ ** Integrate plot data
+ ** Construct graphs
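The grouping and rarefaction steps in Deana's example could be sketched roughly as follows. This is only an illustration under stated assumptions: the notes do not say which rarefaction method was meant, so the sketch uses the classical individual-based formula (expected species count in a random subsample of n individuals), and the function names `group_by_method` and `rarefy` are hypothetical. Only the dataset names and their sample methods come from the notes.

```python
from math import comb

# Dataset names and sample methods as listed in the notes.
DATASETS = {
    "data1": "transect",
    "data2": "plot",
    "data3": "transect",
    "data4": "plot",
}


def group_by_method(datasets):
    """Group dataset names by sample method (transect vs plot)."""
    groups = {}
    for name, method in datasets.items():
        groups.setdefault(method, []).append(name)
    return groups


def rarefy(counts, n):
    """Expected number of species in a random subsample of n individuals.

    counts holds the number of individuals observed per species.
    Assumption: classical individual-based rarefaction; the notes do not
    specify the method actually used.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of individuals")
    # math.comb(a, b) is 0 when b > a, so a species too abundant to be
    # missed by the subsample contributes exactly 1.
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts)


if __name__ == "__main__":
    print(group_by_method(DATASETS))
    # Made-up abundance counts for three species, rarefied to 4 individuals.
    print(rarefy([5, 3, 2], 4))
```

Evaluating `rarefy` over a range of n gives the points of a rarefaction curve, which is one way the transect and plot data might be brought to a common scale before integration.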
+
+
+ ''September 23rd''
+
+ * __Agenda Setting__
+ ** Next meeting
+ ** Convert Deana's starting workflow to a specific analysis ...
+
+ * __Next Meeting__
+ ** February or March time frame
+ ** What we might speak about / do for next meeting
+ *** Get data, scripts, etc.
+ *** Get the biodiv workflow into Kepler
+ *** Maybe get familiar with Kepler / ecological tutorial for feedback
+ *** Data integration examples
+ *** Given the technology -- what do you want to do?
+ *** IRC channel for BEAM? Weekly, bi-weekly scheduled meeting?
+
+ * __Group Breakouts__
+ ** Biodiversity ontology breakout
+ *** Participants: Deana Pennington, Kristin Vanderbilt, Evan Weiher and Rich Williams.
+ ** Biodiversity workflow breakout
+ *** Participants: Steve Cox, David Chalcraft, Shawn Bowers, Bertram Ludaescher, Mark Schildhauer, Chad Berkley, Dan Higgins, Jianting Zhang
+
+
+ * __Notes from ontology breakout__
+ ** The discussion broadly covered traits of plant communities important in biodiversity and productivity experiments and experimental methodologies.  The following notes are raw and will require considerable work to formalize.  As such, the categorizations suggested by the indented formatting should be regarded as preliminary.
+
+ ** __Traits of a Population__ (aggregated group of individuals) of Plant Species:
+ *** Abundance
+ **** Count
+ **** Cover
+ **** Biomass
+ *** Size
+ **** Height
+ ***** Mean, variance of “average height of the highest photosynthetic organ of a well-grown individual”
+ **** Biomass
+ ***** Avg above ground biomass of an individual
+ **** Avg Canopy size (area)
+ *** Path of Resource Uptake
+ **** Photosynthesis (C3, C4, CAM)
+ **** Nitrogen fixing
+ **** Mycorrhizal fungi associations (Yes/no, Endo/ecto)
+ *** Modes of Reproduction
+ **** Clonality (None/clumping/branching).
+ ***** What is the definition used to separate clumping and branching?
+ **** Resprouting ability
+ *** Life Form/Habit
+ **** Grass
+ **** Forb
+ **** Subshrub
+ **** Shrub
+ **** Tree
+ **** Vine
+ ***** All these may be definable using other traits (height, woodiness, leaf shape, self-supportingness, perhaps others).
+ *** Life Span
+ **** Annual, biennial, perennial
+ *** Phenological Traits
+ **** Seasonality
+ **** Sprouting cue
+ *** Native/naturalized/non-native
+
+ ** __Traits of Parts of Plants__:  (note the important plant parts given by the trait categories)
+ *** Leaf Traits (Photosynthetic organ traits?)
+ **** Evergreen, deciduous (defined based on leaf longevity?)
+ **** Specific leaf area (Area/mass) or mass per unit area.
+ **** Water content or % dry mass
+ **** Many others ...
+ *** Root Traits
+ *** Stem Traits
+ **** Stem density
+ ***** Mass per unit volume
+ ***** Woody/nonwoody
+ **** Branching pattern
+ *** Seed Traits
+ **** Size
+ ***** Mass
+ **** Shape
+ **** Appendages/Fruits  -- closely related to dispersal categories, often highly correlated with seed size.
+ ***** Fly through the air
+ ***** Stick to an animal
+ ***** Eaten and excreted
+ ***** Cached for later consumption
+ ***** Ballistic dispersal
+ ***** Floating
+
+ ** __Traits of Interactions__
+ *** Competitive ability
+ **** Measure how an individual suppresses the growth of a neighbor
+ *** Interaction strength
+ *** Effect on environment (ability to reduce resources) (Tilman)
+
+ ** __Experimental Methods__
+ *** Experiment
+ **** Field Experiment
+ ***** Observational/Empirical Experiment
+ ***** Manipulation
+ *** All field experiments have
+ **** Where (site, plots etc)
+ **** When (sampling regime)
+ **** What (properties of organism/population/community/system)
+ *** (An empirical experiment is a field experiment with no manipulation)
+ *** Manipulations have one or more Treatments
+ *** Treatment has
+ **** What was treated?
+ **** Strength (amount), can be positive (addition) or negative (exclusion)
+ **** Temporal extent
+ *** When defining a treatment, a scientist might describe a substance (nutrient, presence of an organism) as being manipulated, or describe the manipulation of a process.
+ *** Sampling Regime
+ **** Random
+ **** Stratified
+ **** Stratified random
+ **** Nested
+ **** Regular (uniform)
+ **** Haphazard
+ **** Random haphazard
+ *** Note that the choice of a sampling regime (and of plot layout?) constrains the possible statistical analysis techniques that can be applied.
+ *** Traits of Experiments
+ **** Balanced or unbalanced sampling
+ **** Replication
+ *** Traits of Treatment Regime
+ **** Factorial (all possible combinations of treatments) or not
+ **** Random factors (treatments along a natural gradient)
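One possible way to write down the experimental-methods categories above as data structures is sketched here. It is not a formalization agreed at the meeting: the class and field names (`SamplingRegime`, `Treatment`, `FieldExperiment`, `temporal_extent`) are hypothetical, and the example values other than the 10 g/m^2/yr nitrogen addition at ARC mentioned earlier in the notes are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SamplingRegime(Enum):
    """Sampling regimes listed in the breakout notes."""
    RANDOM = "random"
    STRATIFIED = "stratified"
    STRATIFIED_RANDOM = "stratified random"
    NESTED = "nested"
    REGULAR = "regular"
    HAPHAZARD = "haphazard"
    RANDOM_HAPHAZARD = "random haphazard"


@dataclass
class Treatment:
    target: str                             # what was treated
    strength: float                         # amount; sign gives the direction
    temporal_extent: Optional[str] = None   # free text; unit is an assumption

    @property
    def kind(self) -> str:
        """Positive strength is an addition, negative an exclusion."""
        if self.strength > 0:
            return "addition"
        if self.strength < 0:
            return "exclusion"
        return "none"


@dataclass
class FieldExperiment:
    where: str                  # site, plots, etc.
    when: SamplingRegime        # sampling regime
    what: List[str]             # properties measured
    treatments: List[Treatment] = field(default_factory=list)

    @property
    def is_manipulation(self) -> bool:
        """A manipulation has one or more treatments."""
        return bool(self.treatments)

    @property
    def is_empirical(self) -> bool:
        """An empirical experiment is a field experiment with no manipulation."""
        return not self.treatments


if __name__ == "__main__":
    n_addition = Treatment(target="nitrogen", strength=10.0)  # g/m^2/yr
    exp = FieldExperiment(
        where="ARC",
        when=SamplingRegime.RANDOM,   # placeholder regime
        what=["ANPP", "species density"],
        treatments=[n_addition],
    )
    print(exp.is_manipulation, n_addition.kind)
```

Deriving "manipulation" and "empirical" from the presence of treatments, rather than storing them as separate subclasses, is a design choice made here for brevity; a real ontology might model them as distinct classes.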
+
+
+
+ * __Notes from workflow breakout__
+ ** Workflow based on part of Steve and David's Jornada analysis
+ ** [http://jornada-www.nmsu.edu/studies/lter/datasets/plants/nppqdbio/data/nppqdbio.htm]
+ ** General steps outlined:
+ *** Data Request
+ *** Quality Control and Assurance (if from different sites)
+ *** Data Integration
+ *** Quality Control and Assurance (of the integration)
+ *** Analysis
+ *** Capture result of analysis …
+ ** Workflow we examined:
+ [http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/~checkout~/seek/projects/kr-sms/docs/beam_kr_sms_meeting_sept_04_workflow.png]
+ ** Useful Actors
+ *** List Summarizer
+ **** A set of values in a data column
+ *** List Comparator
+ **** Given two sets (lists), do they match?
+ **** Which ones in the first list aren’t in the second?
+ **** Assign first list values to new values
+ *** Nested Transpose
+ **** (site, taxon, count)
+ **** {(A, x, 3), (A, y, 1), (B, y, 4), (C, z, 2)}
+ **** Transpose to:
+ ***** (site, x, y, z)
+ ***** {(A, 3, 1, 0), (B, 0, 4, 0), (C, 0, 0, 2)}
+ **** Notes about this from Bertram and Shawn after meeting:
+ ***** Given an annotated schema S, denoted S*. And a white-box actor q s.t. q(S*) -> S’. We want to “push through” the annotations to obtain S’*.
+ ***** The “nested” transpose is basically a combination of various lower-level algebraic operators, such as (theoretical) group-by, matrix transpose, projection, etc. So, given q as such a plan of operators, can we reason over the plan (white-box actor) q to obtain S’*? Using symbolic manipulation? Using the chase, e.g., for similar problems in integrity constraints?
+ ** Often-found pattern of computation
+ *** Can Kepler/Ptolemy efficiently and conveniently support the following pattern?
+ *** Given a data set, construct a scatter plot for pairs of variables, allow user to select a subset of the plots -or- pairs of variables of interest, return data subsets based on chosen pairs (with no extraneous variables)
+ *** Similarly, given data sets, an actor computes a set of regressions, the user is shown the results, the user selects the regressions of interest, and the workflow then proceeds using only those selected regressions
+ *** These "patterns" can be supported now (with lots of plumbing) using the browser actor. Can we also add functionality to better support/model these patterns?
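The three actors listed under "Useful Actors" above might behave roughly like the plain Python functions below. This is a sketch of the intended behaviour only, not Kepler actor code; all function names are hypothetical. The example rows are the (site, taxon, count) tuples from the notes.

```python
def summarize(column):
    """List Summarizer: the distinct values of a data column, in first-seen order."""
    return list(dict.fromkeys(column))


def lists_match(first, second):
    """List Comparator: do the two lists contain the same set of values?"""
    return set(first) == set(second)


def missing_from(first, second):
    """List Comparator: values of the first list that are not in the second."""
    second_values = set(second)
    return [v for v in summarize(first) if v not in second_values]


def remap(values, mapping):
    """List Comparator: assign first-list values to new values where a mapping exists."""
    return [mapping.get(v, v) for v in values]


def nested_transpose(rows):
    """Turn (site, taxon, count) rows into one row per site with a column per taxon.

    Taxa that were not recorded at a site get a count of 0.
    """
    taxa = sorted({taxon for _, taxon, _ in rows})
    table = {}
    for site, taxon, count in rows:
        table.setdefault(site, dict.fromkeys(taxa, 0))[taxon] += count
    header = ("site", *taxa)
    body = [(site, *(counts[t] for t in taxa)) for site, counts in table.items()]
    return header, body


if __name__ == "__main__":
    rows = [("A", "x", 3), ("A", "y", 1), ("B", "y", 4), ("C", "z", 2)]
    header, body = nested_transpose(rows)
    print(header)  # ('site', 'x', 'y', 'z')
    print(body)    # [('A', 3, 1, 0), ('B', 0, 4, 0), ('C', 0, 0, 2)]
```

The transpose is written as a group-by on site followed by a pivot on taxon, which mirrors the remark that it is a combination of lower-level operators.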
+
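The select-then-continue pattern described in the workflow breakout notes can be illustrated outside Kepler with a small sketch. It assumes the data set is a mapping from variable name to a column of values and stands in for the interactive step with a callback; `variable_pairs`, `subsets_for_pairs`, `run_pattern`, and the example column values are all hypothetical.

```python
from itertools import combinations


def variable_pairs(data):
    """All unordered pairs of variables; each pair would get its own scatter plot."""
    return list(combinations(data.keys(), 2))


def subsets_for_pairs(data, chosen_pairs):
    """For each chosen pair, return only its two columns (no extraneous variables)."""
    return [{a: data[a], b: data[b]} for a, b in chosen_pairs]


def run_pattern(data, select):
    """Offer all pairs to the user, then continue with the selected ones only.

    select stands in for the interactive step: it receives the list of
    candidate pairs and returns the pairs of interest.
    """
    candidates = variable_pairs(data)
    chosen = select(candidates)
    return subsets_for_pairs(data, chosen)


if __name__ == "__main__":
    # Placeholder values, not data from the meeting.
    data = {
        "anpp": [120.0, 340.0, 210.0],
        "richness": [14, 9, 11],
        "n_added": [0.0, 10.0, 5.0],
    }
    # Simulated user choice: keep only the plots that involve ANPP.
    subsets = run_pattern(data, lambda pairs: [p for p in pairs if "anpp" in p])
    for subset in subsets:
        print(sorted(subset))
```

The regression variant of the pattern has the same shape: replace `variable_pairs` with a step that fits a set of regressions and let the callback pick which results the rest of the workflow receives.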


          

      

      

      
      Back to Beam Knowledge Rep Sept 04,
       or to the Page History.
 Beam Knowledge Rep Sept 04
 Your trail: StatusReport19July2004 | StatusReport12July2004 | StatusReport28June2004 | StatusReport21June2004 | StatusReport14June2004 | StatusReport08June2004 | BEAM | EducationOutreachandTrainingTeam | Usability | SEEKSpecificKeplerItems
 Lines 7-19 were replaced by lines 7-19
-- ** Steve Cox (Stephen.Cox@tiehh.ttu.edu|[mailto:Stephen.Cox@tiehh.ttu.edu])
-- ** David Chalcraft ([mailto:CHALCRAFTD@MAIL.ECU.EDU])
-- ** Shawn Bowers ([mailto:bowers@sdsc.edu])
-- ** Bertram Ludaescher ([mailto:ludaesch@sdsc.edu])
-- ** Deana Pennington ([mailto:dpennington@lternet.edu])
-- ** Mark Schildhauer ([mailto:schild@nceas.ucsb.edu])
-- ** Katy Suding ([mailto:ksuding@uci.edu])
-- ** Kristin Vanderbilt ([mailto:vanderbi@sevilleta.unm.edu])
-- ** Evan Weiher ([mailto:weiher@uwec.edu])
-- ** Rich Williams ([mailto:rwilliams@nceas.ucsb.edu])
-- ** Chad Berkley ([mailto:berkley@nceas.ucsb.edu])
-- ** Dan Higgins ([mailto:higgins@nceas.ucsb.edu])
-- ** Jianting Zhang ([mailto:jzhang@lternet.edu])
++ ** Steve Cox ([Stephen.Cox@tiehh.ttu.edu|mailto:Stephen.Cox@tiehh.ttu.edu])
++ ** David Chalcraft ([CHALCRAFTD@MAIL.ECU.EDU|mailto:CHALCRAFTD@MAIL.ECU.EDU])
++ ** Shawn Bowers ([bowers@sdsc.edu|mailto:bowers@sdsc.edu])
++ ** Bertram Ludaescher ([ludaesch@sdsc.edu|mailto:ludaesch@sdsc.edu])
++ ** Deana Pennington ([dpennington@lternet.edu|mailto:dpennington@lternet.edu])
++ ** Mark Schildhauer ([schild@nceas.ucsb.edu|mailto:schild@nceas.ucsb.edu])
++ ** Katy Suding ([ksuding@uci.edu|mailto:ksuding@uci.edu])
++ ** Kristin Vanderbilt ([vanderbi@sevilleta.unm.edu|mailto:vanderbi@sevilleta.unm.edu])
++ ** Evan Weiher ([weiher@uwec.edu|mailto:weiher@uwec.edu])
++ ** Rich Williams ([rwilliams@nceas.ucsb.edu|mailto:rwilliams@nceas.ucsb.edu])
++ ** Chad Berkley ([berkley@nceas.ucsb.edu|mailto:berkley@nceas.ucsb.edu])
++ ** Dan Higgins ([higgins@nceas.ucsb.edu|mailto:higgins@nceas.ucsb.edu])
++ ** Jianting Zhang ([jzhang@lternet.edu|mailto:jzhang@lternet.edu])
 Line 25 was replaced by line 25
-- * __Introductions (8:30-3:45)__
++ * __Introductions and Presentations__
 Line 39 was replaced by lines 39-40
-- *** Q: Are most of these datasets freely available on the web; are people sharing them? A: Few available on web ... Q: But is it possible to get them?  A: It is a very divergent / diverse community. Q: W
++ *** Q: Are most of these datasets freely available on the web; are people sharing them? A: Few available on web ... Q: But is it possible to get them?  A: It is a very divergent / diverse community ...
++ ** Shawn Bowers on semantic integration
 At line 40 added 1 line.
++
 Line 47 was replaced by line 49
-- ** Mark: Goal is to compartmentalize that provides general utility for next project, for some other analysis; a standard type of function that we want to capture and describe to use for others
++ ** Mark: Goal is to compartmentalize so as to provides general utility for next project, for some other analysis; capture standard functions that we want to capture and describe to use for others
 Line 60 was replaced by line 62
-- ** Katy
++ ** Katy (note, the following mixes discussion with Katy's presentation)
 Lines 65-70 were replaced by lines 67-71
-- **** Every experiment found that as you increase productivy it decreases diversity, which doesn't follow the natural productivity/species-richness curve (this is gaining interest: we are increasing productivity of systems in general with environmental change going on, e.g., urbanization increases nitrogen/fertizilation, and a desire to know the result on diversity)
-- **** What is Primary Productivity "can of worms"
-- ***** Want to aggregate data at many different scales / communities to get many types of graphs; you want to ultimately go from smallest possible scale to largest scale
-- ***** decision making that goes in, before any analysis happens.  This goes into the data discovery/integration. Deana: can we build a repository of methodologies.
-- ***** primarily using herbaceous sites
-- ***** you do a broad category of the dataset, but not the details (a little bit of woody stuff, ...)
++ **** Every experiment found that as you increase productivy it decreases diversity, which doesn't follow the natural productivity/species-richness curve (this is gaining interest in the community: we are increasing productivity of systems in general with environmental change going on, e.g., urbanization increases nitrogen/fertizilation, and a desire to know the result on diversity)
++ **** What is Primary Productivity "can of worms" discussion
++ ***** Want to aggregate data at many different scales/communities to get many types of graphs; you want to ultimately go from smallest possible scale to largest scale (infer curve of graph)
++ ***** What decision making occurs before any analysis happens?  This goes into the data discovery/integration. Deana: can we build a repository of methodologies?
++ ***** You do a broad category of the dataset, but not the details (a little bit of woody stuff, and so on...)
 Lines 74-77 were replaced by lines 75-78
-- **** most of the time, productivity scales well, i.e., it is a linear scaling, e.g., anpp vs. area is a linear relationship. so for exmaple, even though they are smaller plots, you can get g/m^2 measures. Basically, productivity doubles as area doubles...
-- **** n addition decresases species diversity: plot at lter sites of anpp (above-ground primary productivity) versus relative species density; species density is the number of species observed in a given area
-- **** two studies, measures at different scales: either you simulate or measure a species area curve, and extrapolate to different areas.
-- **** N fertilization positivevly effects productivity negatively effects diversity
++ **** Most of the time, productivity scales well, i.e., it is a linear scaling, e.g., anpp vs. area is a linear relationship. So for example, even though they are smaller plots, you can get g/m^2 measures. Basically, productivity doubles as area doubles...
++ **** N addition decreases species diversity: plot at lter sites of anpp (above-ground primary productivity) versus relative species density; species density is the number of species observed in a given area
++ **** Two studies, measures at different scales: either you simulate or measure a species area curve, and extrapolate to different areas.
++ **** N fertilization positivevly effects productivity; negatively effects diversity
 Lines 79-84 were replaced by lines 80-85
-- **** species response to fertilization:
-- ****** in the case of increased fertility, can we predict what species we will lose?  What species will become dominant?
-- ****** are these responses contingent on system characteristics?
-- **** dataset: 8 lter sites, 28 community types (mainly vegetation classifications, based on the experiments/manipulations done at the sites); 831 species
-- **** dataset characteristics
-- ***** n added (g/m^2/yr), 10 (ARC), 9.5 (CDR), 60/wk (GCE), etc.
++ **** Species response to fertilization:
++ ****** In the case of increased fertility, can we predict what species we will lose?  What species will become dominant?
++ ****** Are these responses contingent on system characteristics?
++ **** Dataset: 8 lter sites, 28 community types (mainly vegetation classifications, based on the experiments/manipulations done at the sites); 831 species
++ **** Dataset characteristics
++ ***** N added (g/m^2/yr), 10 (ARC), 9.5 (CDR), 60/wk (GCE), etc.
 Line 90 was replaced by line 91
-- **** Often times, a matrix like this is constructed at the beginning of doing a "synthesis", and the most important points to track for each site are: treatment, sample size, duration, and replication.
++ **** Often, a matrix like this is constructed at the beginning of doing a "synthesis", and the most important points to track for each site are: treatment, sample size, duration, and replication.
 Lines 95-96 were replaced by lines 96-97
-- **** herbaceous Term applied to a nonwoody stem/plant with minimal secondary growth
-- **** example dataset:
++ **** Herbaceous term applied to a nonwoody stem/plant with minimal secondary growth
++ **** Example dataset:
 Line 105 was replaced by line 106
-- ** Both projects pretty muched took from the same original, "raw" data sets
++ ** Both projects pretty much took from the same original, "raw" data sets
 Line 108 was replaced by line 109
-- ** brainstorming:
++ ** Brainstorming:
 Lines 110-118 were replaced by lines 111-303
-- *** General Tasks: Identifying data that is relevant (talking with people), permission to obtain of the data, understanding the structure and content of the data (sampling design, how or what attributes mean), and then determining which can be appropriately can be integrated to do an analysis. From Jornada: [http://jornada-www.nmsu.edu/]
-- *** anpp versus r and a bunch of data points. how can the data points be linked back to a table of other features ... of the species that are involved in the point. the point represents the set of species in an area (the species "richness") ... the auxiliary table is characteristics of the species
-- *** as another example, compare what happens to the species between the points (as area increases); and more importantly the functional traits of the change
-- **** these are just visualization things
-- *** null model, and the null expectation
-- *** abundance vs loss prob for n-fix and not n-fix
-- *** standard toolset
-- *** measuring functional diversity is a big problem -- a computationally complex problem
-- *** for loss-prob and n-fixers; you need to do data integration: how many sites are needed, and so on ...
++ *** General tasks: Identifying data that is relevant (talking with people), permission to obtain the data, understanding the structure and content of the data (sampling design, how or what attributes mean), and then determining which can be appropriately integrated to do an analysis. From Jornada: [http://jornada-www.nmsu.edu/] (go to "Research Data" > "LTER Data" > "Plant")
++ *** anpp versus r and a bunch of data points. how can the data points be linked back to a table of other features ... of the species that are involved in the point. The point represents the set of species in an area (the species "richness") ... the auxiliary table is characteristics of the species
++ *** As another example, compare what happens to the species between the points (as area increases); and more importantly the functional traits of the change
++ **** Comment: These are just visualization things
++ *** Null model, and the null expectation
++ *** Abundance vs loss prob for n-fix and not n-fix
++ *** Standard toolset
++ *** Measuring functional diversity is a big problem -- a computationally complex problem
++ *** For loss-probability and N-fixers you need to do data integration: how many sites are needed, and so on ...
++
++ * __Biodiversity Change Analysis__
++ ** How to understand change in biodiversity: what are the factors causing the change in biodiversity
++ *** Given an attribute matrix, which are changing the most or least at particular segments in the species-area curve
++ *** Focus on Community structure
++ *** Niche breadth
++ *** Predict why diversity would declince / change
++
++ * __Deana's Example__
++ ** Query
++ *** Some data by keyword: "biodiversity", "species counts", "abundance", species names, functional trait
++ *** Location, e.g., place name, coordinates, bounding box
++ *** Sample method
++ *** Analysis type, e.g., counts, abundance
++ ** Construct logically and semantically equivalent views
++ ** Group on sample methods (transect vs plot)
++ ** Data
++ *** data1: transection 20m -> rarefaction (species-area curve) -> scale to 1 m interpolation
++ *** data3: transection 20m -> rarefaction (species-area curve) -> scale to 1 m interpolation
++ *** data2: plot 5m^2 -> species-area curve -> scale to 1 m interpolation
++ *** data4: plot 1m^2
++ ** Integrate transect data
++ ** Integrate plot data
++ ** Construct graphs
++
++
++ ''September 23rd''
++
++ * __Agenda Setting__
++ ** Next meeeting
++ ** Convert Deana's starting workflow to a specific analysis ...
++
++ * __Next Meeting__
++ ** February or March time frame
++ ** What we might speak about / do for next meeting
++ *** Get data, scripts, etc.
++ *** Get the biodiv workflow into Kepler
++ *** Maybe get familiar with kepler / ecological tutorial for feedback
++ *** Data integration examples
++ *** Given the technology -- what do you want to do?
++ *** IRC channel for BEAM? Weekly, bi-weekly scheduled meeting?
++
++ * __Group Breakouts__
++ ** Biodiversity ontology breakout
++ *** Participants: Deana Pennington, Kristin Vanderbilt, Evan Weiher and Rich Williams.
++ ** Biodiversity workflow breakout
++ *** Participants: Steve Cox, David Chalcraft, Shawn Bowers, Bertram Ludaescher, Mark Schildhauer, Chad Berkley, Dan Higgins, Jianting Zhang
++
++
++ * __Notes from ontology breakout__
++ ** The discussion broadly covered traits of plant communities important in biodiversity and productivity experiments and experimental methodologies.  The following notes are raw and will require considerable work to formalize.  As such, the categorizations suggested by the indented formatting should be regarded as preliminary.
++
++ ** __Traits of a Population__ (aggregated group of individuals) of Plant Species:
++ *** Abundance
++ **** Count
++ **** Cover
++ **** Biomass
++ *** Size
++ **** Height
++ ***** Mean, variance of “average height of the highest photosynthetic organ of a well-grown individual”
++ **** Biomass
++ ***** Avg above ground biomass of an individual
++ **** Avg Canopy size (area)
++ *** Path of Resource Uptake
++ **** Photosynthesis (C3, C4, CAM)
++ **** Nitrogen fixing
++ **** Mycorrhizal fungi associations (Yes/no, Endo/ecto)
++ *** Modes of Reproduction
++ **** Clonality (None/clumping/branching).
++ ***** What is the definition used to separate clumping and branching?
++ **** Resprouting ability
++ *** Life Form/Habit
++ **** Grass
++ **** Forb
++ **** Subshrub
++ **** Shrub
++ **** Tree
++ **** Vine
++ ***** All these may be definable using other traits (height, woodiness, leaf shape, self-supportingness, perhaps others).
++ *** Life Span
++ **** Annual, biennial, perennial
++ *** Phenological Traits
++ **** Seasonality
++ **** Sprouting cue
++ *** Native/naturalized/non-native
++
++ ** __Traits of Parts of Plants__:  (note the important plant parts given by the trait categories)
++ *** Leaf Traits (Photosynthetic organ traits?)
++ **** Evergreen, deciduous (defined based on leaf longevity?)
++ **** Specific leaf area (Area/mass) or mass per unit area.
++ **** Water content or % dry mass
++ **** Many others ...
++ *** Root Traits
++ *** Stem Traits
++ **** Stem density
++ ***** Mass per unit volume
++ ***** Woody/nonwoody
++ **** Branching pattern
++ *** Seed Traits
++ **** Size
++ ***** Mass
++ **** Shape
++ **** Appendages/Fruits  -- closely related to dispersal categories, often highly correlated with seed size.
++ ***** Fly through the air
++ ***** Stick to an animal
++ ***** Eaten and excreted
++ ***** Cached for later consumption
++ ***** Ballistic dispersal
++ ***** Floating
++
++ ** __Traits of Interactions__
++ *** Competitive ability
++ **** Measure how an individual suppresses the growth of a neighbor
++ *** Interaction strength
++ *** Effect on environment (ability to reduce resources) (Tilman)
++
++ ** __Experimental Methods__
++ *** Experiment
++ **** Field Experiment
++ ***** Observational/Empirical Experiment
++ ***** Manipulation
++ *** All field experiments have
++ **** Where (site, plots etc)
++ **** When (sampling regime)
++ **** What (properties of organism/population/community/system)
++ *** (An empirical experiment is a field experiment with no manipulation)
++ *** Manipulations have one or more Treatments
++ *** Treatment has
++ **** What was treated?
++ **** Strength (amount), can be positive (addition) or negative (exclusion)
++ **** Temporal extent
++ *** When defining a treatment, a scientist might describe a substance (nutrient, presence of an organism) as being manipulated, or describe the manipulation of a process.
++ *** Sampling Regime
++ **** Random
++ **** Stratified
++ **** Stratified random
++ **** Nested
++ **** Regular (uniform)
++ **** Haphazard
++ **** Random haphazard
++ *** Note that the choice of a sampling regime (and of plot layout?) constrains the possible statistical analysis techniques that can be applied.
++ *** Traits of Experiments
++ **** Balanced or unbalanced sampling
++ **** Replication
++ *** Traits of Treatment Regime
++ **** Factorial (all possible combinations of treatments) or not
++ **** Random factors (treatments along a natural gradient)
++
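The experimental-methods notes above (where/when/what, treatments with a target, strength and temporal extent, factorial designs) could be modelled roughly as follows. This is a preliminary sketch in the same spirit as the notes, not a formal ontology: the class names, field names and example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List


@dataclass
class Treatment:
    target: str           # what was treated (a substance or a process)
    strength: float       # positive = addition, negative = exclusion
    temporal_extent: str  # free text in this sketch


@dataclass
class FieldExperiment:
    where: str            # site, plots, etc.
    when: str             # sampling regime
    what: List[str]       # properties of organism/population/community/system
    treatments: List[Treatment] = field(default_factory=list)

    @property
    def is_manipulation(self) -> bool:
        # An empirical experiment is a field experiment with no manipulation.
        return len(self.treatments) > 0


def factorial_design(factors: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """All possible combinations of treatment levels (a full factorial design)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*(factors[n] for n in names))]


# Hypothetical example values.
observational = FieldExperiment("site A, plots 1-10", "stratified random", ["cover"])
manipulated = FieldExperiment(
    "site A, plots 11-20", "stratified random", ["cover"],
    treatments=[Treatment("nitrogen", 10.0, "one growing season")],
)
design = factorial_design({"nitrogen": ["control", "added"],
                           "grazers": ["present", "excluded"]})
print(len(design))
```

Deriving "manipulation" from the presence of treatments, rather than storing it as a separate flag, follows the note that an empirical experiment is simply a field experiment without manipulation.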
++
++
++ * __Notes from workflow breakout__
++ ** Workflow based on part of Steve and David's Jornada analysis
++ ** [http://jornada-www.nmsu.edu/studies/lter/datasets/plants/nppqdbio/data/nppqdbio.htm]
++ ** General steps outlined:
++ *** Data Request
++ *** Quality Control and Assurance (if from different sites)
++ *** Data Integration
++ *** Quality Control and Assurance (of the integration)
++ *** Analysis
++ *** Capture result of analysis …
++ ** Workflow we examined:
++ [http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/~checkout~/seek/projects/kr-sms/docs/beam_kr_sms_meeting_sept_04_workflow.png]
++ ** Useful Actors
++ *** List Summarizer
++ **** A set of values in a data column
++ *** List Comparator
++ **** Given two sets (lists), do they match?
++ **** Which ones in the first list aren’t in the second
++ **** Assign first list values to new values
++ *** Nested Transpose
++ **** (site, taxon, count)
++ **** {(A, x, 3), (A, y, 1), (B, y, 4), (C, z, 2)}
++ **** Transpose to:
++ ***** (site, x, y, z)
++ ***** {(A, 3, 1, 0), (B, 0, 4, 0), (C, 0, 0, 2)}
++ **** Notes about this from Bertram and Shawn after meeting:
++ ***** Given an annotated schema S, denoted S*, and a white-box actor q s.t. q(S*) -> S’, we want to “push through” the annotations to obtain S’*.
++ ***** The “nested” transpose is basically a combination of various lower-level algebraic operators, such as (theoretical) group-by, matrix transpose, projection, etc. So, given q as such a plan of operators, can we reason over the plan (white-box actor) q to obtain S’*? Using symbolic manipulation? Using the chase, e.g., as for similar problems in integrity constraints?
++ ** Often-found pattern of computation
++ *** Can Kepler/Ptolemy efficiently and conveniently support the following pattern?
++ *** Given a data set, construct a scatter plot for each pair of variables, allow the user to select a subset of the plots (or pairs of variables of interest), and return data subsets based on the chosen pairs (with no extraneous variables)
++ *** Similarly, given data sets, an actor computes a set of regressions, the user is shown the results, the user selects the regressions of interest, and the workflow then proceeds using only those selected regressions
++ *** These "patterns" can be supported now (with lots of plumbing) using the browser actor. Can we also add functionality to better support/model these patterns?
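The select-then-continue pattern described above can be sketched in plain Python, independent of Kepler/Ptolemy. Nothing here uses an actual Kepler actor or API: the function names, the sample rows and the `choose` callback (a stand-in for the interactive user selection) are all hypothetical.

```python
from itertools import combinations


def variable_pairs(columns):
    """Every unordered pair of variables, i.e. one candidate scatter plot per pair."""
    return list(combinations(columns, 2))


def subset_for_pairs(rows, selected_pairs):
    """Project the rows onto each selected pair, dropping all extraneous variables."""
    return {pair: [{v: row[v] for v in pair} for row in rows]
            for pair in selected_pairs}


def choose(pairs):
    """Stand-in for the user's selection; here it keeps pairs that do not involve height."""
    return [p for p in pairs if "height" not in p]


# Hypothetical sample data.
rows = [
    {"biomass": 1.0, "cover": 0.2, "height": 5.0},
    {"biomass": 2.0, "cover": 0.4, "height": 7.0},
]

pairs = variable_pairs(["biomass", "cover", "height"])
selected = choose(pairs)
subsets = subset_for_pairs(rows, selected)
print(subsets)
```

The regression variant has the same shape: compute one result per candidate, let the selection step filter them, and pass only the survivors downstream.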
++
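The "Useful Actors" listed in the workflow breakout (List Summarizer, List Comparator, Nested Transpose) can be illustrated with a minimal sketch. These are plain functions, not Kepler actors, and their names and signatures are assumptions. Summing counts for a repeated (site, taxon) pair is also an assumption, since the notes do not say how duplicates should be handled.

```python
def summarize_list(values):
    """List Summarizer: the distinct values found in a data column, sorted."""
    return sorted(set(values))


def compare_lists(first, second):
    """List Comparator: (do the value sets match?, values of first missing from second)."""
    second_set = set(second)
    missing = [v for v in summarize_list(first) if v not in second_set]
    return set(first) == second_set, missing


def remap(values, mapping):
    """Assign first-list values to new values; unmapped values pass through unchanged."""
    return [mapping.get(v, v) for v in values]


def nested_transpose(rows):
    """Pivot (site, taxon, count) rows into (site, taxon_1, ..., taxon_n), using 0 where absent."""
    counts = {}
    for site, taxon, count in rows:
        # Assumption: repeated (site, taxon) pairs are summed.
        counts[(site, taxon)] = counts.get((site, taxon), 0) + count
    taxa = sorted({taxon for _, taxon in counts})
    sites = sorted({site for site, _ in counts})
    header = ("site", *taxa)
    table = [(site, *(counts.get((site, t), 0) for t in taxa)) for site in sites]
    return header, table


rows = [("A", "x", 3), ("A", "y", 1), ("B", "y", 4), ("C", "z", 2)]
header, table = nested_transpose(rows)
print(header)
print(table)
```

Run on the example from the notes, this reproduces the transposed table given there, with `(site, x, y, z)` as the header.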

This material is based upon work supported by the National Science Foundation under award 0225676. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).


Long Term Ecological Research Network, UNM	National Center for Ecological Analysis and Synthesis, UCSB	Biodiversity Research Center, KU	San Diego Supercomputer Center, UCSD


Arizona State University	Napier University	University of North Carolina	University of Vermont


UC Davis Genome Center