BEAM Knowledge Rep, Jan 04
- Use Case Development
- What are the reusable components for niche-modeling (GARP, GRASP, layer integration, etc.)?
- What other models/actors would be used pre- and post-GARP?
- It would be nice to have some concrete examples of scenarios (see Matt's scenario below as an example), as well as use cases for the various versions of niche modeling being considered (native species, climate change, and so on).
- Data Integration:
- What are some examples of data integration problems for niche modeling (again, see below)?
- What about component integration, for reusing algorithms/components within niche modeling or post-niche modeling?
- Identify recognized example datasets that we can use (ones that weren't crafted to fit a specific model implementation).
- Integration / Reuse
- What components exist, for example, for layer integration and so on? What components need to be explicitly written, and which are functions of SMS?
- What are the input and output data types (schemas) for niche modeling components?
- How should we handle GARP parameters?
- (Deana): Figure out whether we can/should split up GARP into separate actors. I think we should, if it's not too big of a programming issue.
- (Deana): Reusability of actors for other purposes. Example: the GARP model was developed for ecological niche modelers, but it is really just a logistic model that works on independent variables that are in the form of environmental layers, and dependent variables that are point data. It could be used in any other discipline that had similar needs. For example, if one wanted to predict locations in Florida that are susceptible to sinkhole formation, you could input the location of known sinkholes (points) and relevant environmental layers (hydrology, chemistry, topography, etc), and the GARP model would work just fine. I think this is a "user scenario", like Shawn is asking for. I can spend some time next week coming up with some of these, maybe with the help of the other domain scientists who are there.
- Semantics of Niche Modeling:
- What is the ontology for niche modeling / GARP / GRASP?
- What is the ontology for inputs/outputs of components?
- What ontological information is required for pre- and post-niche modeling?
(Deana): I think, if I understand everything correctly, we need ontologies for:
- Data discovery
- Integration of selected datasets
- Actor semantics
- Pipeline semantics
(Deana): I think we also need to consider some sort of standardized metadata for actors/pipelines (metamodel? meta-analysis?) so that actors/pipelines can be searched and reused.
A scientist is interested in the native range of an oak species. The scientist first creates a semantic query -- a query posed against ontological information -- requesting (ecogrid) datasets that can be used as occurrence data for a particular oak species ('Quercus rubra'), over a specific spatial footprint, and over a specific time period. (This example is expressed over the space, time, and taxa context of measurement.)
The scientist then issues the query using the semantic mediation system, which performs a series of steps to construct the necessary underlying queries (query rewritings) to the ecogrid. The underlying queries return a set of datasets. These returned datasets are then further manipulated by the mediation system. For example, the datasets returned may need to be joined (to extract the occurrence data), pruned to fit into the desired footprint, converted to the correct presence measure (for example, the value '1' for presence), and irrelevant fields removed. At this point, the scientist may wish to remove some of the candidate datasets from further analysis. The datasets are then combined (unioned) to form a single, uniform input table.
Next, the mediation system uses the implied footprint of the input table to query for (again, using the ecogrid) relevant environmental layers. The resulting layers are then integrated, which involves clipping the returned layers to the implied footprint, re-gridding the datasets to the same scale (based on the density of the presence/absence point datasets and environmental layers), and re-projecting the datasets to a common projection scheme (so that points are correctly placed on a flat map).
The integrated layers and occurrence table are then run through GARP. Finally, the resulting rule set and prediction map are stored (in the ecogrid), with appropriate metadata.
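The dataset-manipulation steps in the scenario (clip to the footprint, convert to a uniform presence measure, drop irrelevant fields, union the candidates) could be sketched roughly as follows. All helper names and the record/footprint shapes are assumptions for illustration, not ecogrid or mediation-system APIs.

```python
# Hypothetical sketch of the mediation system's dataset manipulation:
# clip records to the footprint, normalize to presence = 1, and union.

def clip_to_footprint(records, footprint):
    """Keep only records whose (lat, lon) fall inside the bounding box."""
    min_lat, min_lon, max_lat, max_lon = footprint
    return [r for r in records
            if min_lat <= r["lat"] <= max_lat and min_lon <= r["lon"] <= max_lon]

def to_presence(records):
    """Convert each occurrence record to a uniform presence row (value 1),
    dropping any fields irrelevant to the analysis."""
    return [{"lat": r["lat"], "lon": r["lon"], "presence": 1} for r in records]

def union_datasets(datasets, footprint):
    """Clip, normalize, and union candidate datasets into one input table."""
    table = []
    for ds in datasets:
        table.extend(to_presence(clip_to_footprint(ds, footprint)))
    return table
```

In a real pipeline the join/prune/convert steps would be driven by the query rewritings and the datasets' metadata rather than hard-coded field names.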
- Take the equations of Roughgarden text and import into Kepler (then send it to her to integrate into her textware)
- How should MOML / EML / OWL be glued together? (Rich, Shawn, Chad)
- Consensus that we should define a set of reusable models (...)
- Rich gave GrOWL demo and ezOWL demo (ontology visualization tools)
- SPECIFY group at KU (a platform for capturing data); Stan Blum has a nice model of the interrelationships between concepts in biodiversity data, but it disintegrates when you want to discuss how to use the data
- (Town) Experiment using datasets with the GARP model in Kepler
- //Point data// (latitude/longitude for a specific species, e.g., a barred owl...)
- //Pure validation/sampling// (sample of data points): plot points, impose an appropriate sized grid (automatically doing this needs additional specification and is based on the purpose of your study), then select grids (like checkerboard) for samples and data
- (Deana) Build a sampling ontology for this (in terms of spatial ontologies)?
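The checkerboard sampling described above could be sketched as follows (the grid-cell size is an assumed input, since, as noted, choosing it automatically needs additional specification based on the purpose of the study):

```python
# Sketch of checkerboard sampling: impose a grid over the plotted points,
# then assign points in alternating cells to the two partitions.
import math

def checkerboard_split(points, cell_size):
    """Split (x, y) points into two sets by checkerboard parity of their cell."""
    sample, data = [], []
    for (x, y) in points:
        col = math.floor(x / cell_size)
        row = math.floor(y / cell_size)
        (sample if (col + row) % 2 == 0 else data).append((x, y))
    return sample, data
```

Points in "black" cells go to one partition and points in "white" cells to the other, which keeps the two partitions spatially interleaved.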
- (Town) Desktop Garp (lifemapper / desktop garp)
- Select Data points
- Load (one) data set (ESRI shapefile, Excel, text file: species, longitude, latitude)
- Gives a list of the different species in the species column
- Independent validation is a pure random sample (finest grid possible -- one occurrence point per grid cell)
- A percentage (default is one half) of the data points is used for training
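The two data-point steps above (thinning to one occurrence per grid cell for independent validation, then holding out a percentage for training) could be sketched as below; the cell size and random seed are assumed inputs, and the helper names are hypothetical.

```python
# Sketch: thin occurrences to at most one per grid cell, then take a random
# fraction (default one half) of the remaining points for training.
import math
import random

def thin_one_per_cell(points, cell_size):
    """Keep at most one (x, y) occurrence point per grid cell."""
    seen, thinned = set(), []
    for (x, y) in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell not in seen:
            seen.add(cell)
            thinned.append((x, y))
    return thinned

def training_split(points, fraction=0.5, seed=0):
    """Randomly partition points into (training, validation) sets."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    k = int(len(shuffled) * fraction)
    return shuffled[:k], shuffled[k:]
```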
- Select layers (note that layers and data points are chosen in any order)
- Select a layer dataset; a subset of the layers in the dataset can be chosen for use
- can choose from all selected layers (default), all combinations of the selected layers, or all combinations of size n
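The three layer-selection modes above enumerate straightforwardly with `itertools` (the function name and mode strings are illustrative, not Desktop GARP's interface):

```python
# Sketch of the layer-selection modes: all selected layers (default),
# all non-empty combinations, or all combinations of size n.
from itertools import combinations

def layer_sets(layers, mode="all", n=None):
    if mode == "all":            # default: one run using every selected layer
        return [tuple(layers)]
    if mode == "combinations":   # every non-empty subset of the layers
        return [c for k in range(1, len(layers) + 1)
                  for c in combinations(layers, k)]
    if mode == "size_n":         # only subsets of exactly n layers
        return list(combinations(layers, n))
    raise ValueError(mode)
```

Note that "all combinations" grows exponentially in the number of selected layers, which is presumably why the size-n option exists.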
- Optimization Parameters
- number of runs
- convergence limit
- max iterations (number of generations to converge)
- rule types considered (atomic, range, negated range, logistic regression)
- all combinations of the selected rules
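A rough sketch of how the convergence limit and max iterations interact in one run of the genetic algorithm (all names are assumed; this is not Desktop GARP's actual implementation):

```python
# Sketch: iterate the genetic algorithm until the fitness improvement falls
# below the convergence limit or the generation cap is reached.
def run_until_converged(step, max_iterations, convergence_limit):
    """step(generation) -> fitness; returns (best_fitness, generations_used)."""
    best = step(0)
    for generation in range(1, max_iterations):
        fitness = step(generation)
        if abs(fitness - best) < convergence_limit:
            return max(best, fitness), generation + 1
        best = max(best, fitness)
    return best, max_iterations
```

The "number of runs" parameter would then repeat this whole loop, producing one rule set (model) per run.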
- Best Subset Selection Parameters
- percent omission
- total models under the hard omission threshold (a model is a rule set that results from one run of the genetic algorithm)
- % of distribution (commission threshold)
- Projection Layers (where to apply the resulting rule sets)
- Available datasets
- Current datasets for project
- maps (bitmaps, ascii grids, arc/info grids)
- models (all or best subset)
- output directory
- Do we want to reuse models (rulesets)? Store them? Metadatify them?
- (Town) For doing various types of predictions (invasive species, genetic modification, ...)
- (Town) Storing a model (ruleset) for reuse is often useful for different types of predictions (as outputs of GARP)
- (Town) There are some motivations for integrating rule sets, e.g., for predicting based on predator-prey interactions
- Semantic Mediation / KR within niche modeling and BEAM:
- Consensus that there aren't semantic challenges in the current vision/implementation of GARP in Kepler
- Many of even the interpolation tasks are not KR problems (geospatial)
- There are some types of data that are semantically difficult, like productivity data, soil, biodiversity
- Ecological niche modeling: not much has been done on soil characteristics
- Terry's soils aggregation tool (takes soil data, water data, etc., and generates new, derived datasets useful for plant niche modeling)
- "High-resolution" route (finer resolution); as you go to the fine scale, the issues of resolution becomes more of an issue
- Climate and climatic variables are a big problem in terms of semantic issues; there is a group in LTER that we could bring in to deal with the issue (if we wish to incorporate climate info into BEAM)
- Taxonomic issues, e.g., all mammals
- Issues related to productivity, what is meant by diversity and biodiversity
- What is meant by mean daily temperature, growing days, and so on (the different climate variables)?
- The interpretation of the model (ruleset) is dependent on the framework of running the model (the assumptions used in building the models)
- Continental scale mammal climate change; what are the semantic issues?
- Mammal Taxonomic Concepts
- US National Museum data would be heterogeneous with respect to DiGIR data
- Current, future, and past climatologies need to "speak" the same language
- Climate layers (future and current) need to be compatible
- Even for the current climate, you need to understand how to integrate monthly and daily temperature datasets
- Process that involves going from weather data to climate data
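The weather-to-climate step noted above could be sketched as a simple aggregation, e.g., daily temperature records into a monthly mean climatology (field names and record shape are assumed), which is one way of making daily and monthly temperature datasets "speak the same language":

```python
# Sketch: aggregate daily temperature records into monthly means.
from collections import defaultdict

def monthly_mean_temperature(daily_records):
    """daily_records: iterable of (year, month, day, temp_c) tuples;
    returns {(year, month): mean temperature} over the available days."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (year, month, _day, temp) in daily_records:
        sums[(year, month)] += temp
        counts[(year, month)] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

The semantic issues flagged above (what "mean daily temperature" itself means, missing days, station vs. gridded sources) are exactly what such an aggregation glosses over.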
- Papers: Grinnell, 1917, 1924; Hutchinson, 1959; MacArthur 1972 (Geographic Ecology);
- Mark Schildhauer, Rich Williams, Shawn Bowers, Bill Michener, Dave Vieglais, Ricardo Pereira, Deana Pennington, Chad Berkley, Steve Tekell, Terry Dawson, Town Peterson, Bob Waide, Manu Juyal.