Science Environment for Ecological Knowledge

Knowledge Representation

Difference between version 10 and version 9:

At line 8 added 1 line.
+ !!The attribute entity
At line 9 added 28 lines.
+ Do data attributes refer to the same entity or thing? For example, it would not be sensible to merge an attribute for a spatial area with one for a temporal duration.
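A minimal sketch of this first test, assuming each data attribute has already been annotated with the entity it describes. The `Attribute` class, its fields, and the `same_entity` helper are hypothetical illustrations, not part of any SEEK software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical annotation of one data-set column with the entity it describes."""
    name: str    # column name as it appears in the data set
    entity: str  # the thing the column is about, e.g. "plot" or "sampling period"


def same_entity(a: Attribute, b: Attribute) -> bool:
    """First merge test: both attributes must describe the same kind of entity."""
    return a.entity == b.entity


plot_area = Attribute("plot_area", entity="plot")
quadrat_size = Attribute("quadrat_size", entity="plot")
survey_length = Attribute("survey_length", entity="sampling period")

print(same_entity(plot_area, quadrat_size))   # True
print(same_entity(plot_area, survey_length))  # False
```

A spatial area (a property of a plot) and a temporal duration (a property of a sampling period) fail the test, matching the example above.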
+
+ The word "observation" is an overloaded term, and what constitutes a scientific observation is debatable. An observation may be thought of as a tuple (a row of a data table) that groups associated attributes along some common thread in space and time. Alternatively, each individual measurement or cell, such as date, time, place, species, or height, can be thought of as an observation. A whole data set can even be thought of as an observation of some broader scientific concept, such as "productivity" or "ecosystem functioning."
+
+ In the observation ontology, we define an observation as an entity that is distinguishable from the other entities in a data set, for example a location, a time, or an organism. More than one characteristic (or property) may be recorded for a given entity, translating to more than one attribute (or column) in a data set. Our goal is to distinguish the different entities in a data set so that we can describe how they are contextually related to each other. For example, a study location may provide context for a focal organism, and we may record several characteristics of that organism, such as its taxonomic identity and weight. Because these characteristics both belong to the organism, they form a single observation of the organism, while the location is a separate observation that provides its context.
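To make the definition concrete, the sketch below splits one data row into one observation per entity and records the context link between them. The `Observation` class, `split_row`, and the column-to-entity mapping are hypothetical; the mapping stands in for metadata a user would supply.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Observation:
    """Hypothetical model: one observation of one distinguishable entity."""
    entity: str
    measurements: Dict[str, object] = field(default_factory=dict)  # column -> value
    context: Optional["Observation"] = None  # observation that provides context


def split_row(row: Dict[str, object], column_entity: Dict[str, str]) -> Dict[str, Observation]:
    """Group the cells of one data row into one Observation per entity."""
    observations: Dict[str, Observation] = {}
    for column, value in row.items():
        entity = column_entity[column]
        if entity not in observations:
            observations[entity] = Observation(entity)
        observations[entity].measurements[column] = value
    return observations


row = {"site": "A", "taxon": "Poa pratensis", "weight_g": 1.7}
column_entity = {"site": "location", "taxon": "organism", "weight_g": "organism"}

obs = split_row(row, column_entity)
obs["organism"].context = obs["location"]  # the study location is context for the organism
print(obs["organism"].measurements)  # {'taxon': 'Poa pratensis', 'weight_g': 1.7}
```

Three columns collapse into two observations: the taxon and weight share the organism observation, and the site becomes the location observation that contextualizes it.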
+
+ !!The attribute characteristic
+
+ Are the data attributes capturing the same characteristic of the entity being recorded? For example, two attributes might both pertain to an organism, but one records the organism's height and the other its weight. Attributes must refer to dimensionally or semantically compatible characteristics (or properties) of the entity.
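One way to sketch this test is to compare the dimensions of two characteristics first and then require a semantic match, since dimensions alone cannot tell height from width. The `DIMENSIONS` registry and `EQUIVALENT` set are hypothetical stand-ins for what an ontology would provide, and requiring both checks is a design choice of this sketch.

```python
# Hypothetical registry: each characteristic's dimensions as exponents of base quantities.
# Weight is treated as mass here, as it usually is in ecological data.
DIMENSIONS = {
    "height":  {"length": 1},
    "stature": {"length": 1},
    "width":   {"length": 1},
    "weight":  {"mass": 1},
}

# Hypothetical semantic equivalences that would come from an ontology.
EQUIVALENT = {frozenset({"height", "stature"})}


def compatible(a: str, b: str) -> bool:
    """Return True if characteristics a and b can be merged."""
    if DIMENSIONS[a] != DIMENSIONS[b]:
        return False  # e.g. height (length) vs weight (mass)
    # Matching dimensions is not enough: height and width are both lengths.
    return a == b or frozenset({a, b}) in EQUIVALENT


print(compatible("height", "weight"))   # False
print(compatible("height", "width"))    # False
print(compatible("height", "stature"))  # True
```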
+
+ !!The attribute (measurement) standard
+
+ Were the data attributes recorded using the same standard? Characteristics of entities can be recorded as data in many ways, including as physical quantities, names, or dates. For example, the height of an organism might be measured in meters in one data attribute, in feet in a second, and nominally as "tall" or "short" in a third. It should be possible not only to convert among measurement standards, but also to map qualitative standards to quantitative ones when the necessary information exists as metadata (e.g., "tall" = 10-20 meters).
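A sketch of converting all three standards to meters, assuming the nominal mapping arrives as metadata. Only "tall" = 10-20 m comes from the text; because no range is given for "short", the sketch refuses to convert it rather than guess. The table names and `height_in_meters` are hypothetical.

```python
# Conversion factors to the common standard, meters.
TO_METERS = {"m": 1.0, "ft": 0.3048}

# Qualitative-to-quantitative mapping, usable only when supplied as metadata.
# "tall" = 10-20 m is the example from the text; other classes are unknown here.
NOMINAL_RANGES_M = {"tall": (10.0, 20.0)}


def height_in_meters(value, standard):
    """Return a (low, high) interval in meters for one recorded height.

    Exact measurements become degenerate intervals so that quantitative and
    nominal values share one representation.
    """
    if standard == "nominal":
        if value not in NOMINAL_RANGES_M:
            raise ValueError(f"no metadata maps {value!r} to meters")
        return NOMINAL_RANGES_M[value]
    meters = value * TO_METERS[standard]
    return (meters, meters)


print(height_in_meters(12.0, "m"))          # (12.0, 12.0)
print(height_in_meters("tall", "nominal"))  # (10.0, 20.0)
```

Returning an interval for every value lets a measured height be checked against a nominal class without pretending the class is a single number.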
+
+ !!Attribute precision
+
+ If attributes are quantitative, with what precision were they recorded? If two attributes were measured with different precision, the more precise one must be reduced to the lower precision before merging. Because precision depends on units, it should be normalized after unit conversion.
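The sketch below expresses precision as a resolution (the smallest recorded increment), converts that resolution along with the values, and then rounds both series to the coarser one. The helper names and the example data are hypothetical.

```python
CM_PER_M = 100.0


def round_to_resolution(value, resolution):
    """Round value to the nearest multiple of resolution (both in the same unit).

    Python's round() sends exact halves to the nearest even integer; the outer
    round() trims binary noise such as 1.2000000000000002.
    """
    return round(round(value / resolution) * resolution, 10)


def harmonize_precision(a, res_a, b, res_b):
    """Round both series to the coarser (larger) of the two resolutions."""
    coarse = max(res_a, res_b)
    return ([round_to_resolution(v, coarse) for v in a],
            [round_to_resolution(v, coarse) for v in b])


# Data set A: heights in meters, recorded to the nearest 0.1 m.
a_m, res_a_m = [1.2, 3.4], 0.1
# Data set B: heights in centimeters, recorded to the nearest 1 cm.
b_cm, res_b_cm = [123.0, 347.0], 1.0

# Convert B's values *and* its resolution to meters before comparing precision.
b_m = [v / CM_PER_M for v in b_cm]
res_b_m = res_b_cm / CM_PER_M  # 0.01 m, so A's 0.1 m is the coarser resolution

a_out, b_out = harmonize_precision(a_m, res_a_m, b_m, res_b_m)
print(a_out, b_out)  # [1.2, 3.4] [1.2, 3.5]
```

Comparing the raw resolutions (0.1 vs 1.0) before conversion would have picked the wrong data set as the less precise one, which is why normalization follows unit conversion.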
+
+ !!The attribute context
+
+ Possibly the most important and non-trivial aspect of attribute merging is the correct alignment of contextual (mereological) dependencies. When aligning multiple attributes, the spatial, temporal, and material containment hierarchies must align or, at least, be made explicit. For example, a nested sampling design "location <- biomass" is not directly compatible with a second design "location <- plot <- biomass". Merging the data sets while ignoring "plot" will deflate biomass estimates in the second data set. Knowing the nesting structure of the data sets shows that biomass in the first data set must be scaled by "location" area before merging.
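A sketch of detecting this mismatch from the nesting paths, assuming "a <- b" is read as "b is nested within a" (outermost level first). The functions are hypothetical; they only flag the conflict, leaving the scaling decision to the user or a later step.

```python
def parse_design(text):
    """Parse a nesting path such as 'location <- plot <- biomass' (outermost first)."""
    return [part.strip() for part in text.split("<-")]


def measured_within(design, variable):
    """Return the containment level that directly holds the measured variable."""
    i = design.index(variable)
    if i == 0:
        raise ValueError(f"{variable!r} has no containing level")
    return design[i - 1]


def context_conflicts(design_a, design_b, variable):
    """List the hierarchy mismatches that must be resolved before merging."""
    problems = []
    unshared = sorted(set(design_a) ^ set(design_b))
    if unshared:
        problems.append(f"levels not shared by both designs: {unshared}")
    unit_a = measured_within(design_a, variable)
    unit_b = measured_within(design_b, variable)
    if unit_a != unit_b:
        problems.append(f"{variable} is recorded per {unit_a} in one design "
                        f"and per {unit_b} in the other")
    return problems


design_1 = parse_design("location <- biomass")
design_2 = parse_design("location <- plot <- biomass")
for problem in context_conflicts(design_1, design_2, "biomass"):
    print(problem)
```

For the two designs above, the sketch reports both the missing "plot" level and the fact that biomass is recorded per location in one data set and per plot in the other.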
+
+ !!The attribute (spatial or temporal) scale
+
+ Were the data collected at the same spatial or temporal scale? Biomass of plants collected in a 1 square meter plot cannot be merged directly with biomass collected in a 2 square meter plot without normalizing the spatial scales. Sometimes such normalization can be handled by simple scaling (i.e., multiplication), but at other times it may require more complex curve fitting or rarefaction techniques. Although fully automated scaling may not be feasible, it must at least be possible to detect the potential need for scaling.
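A sketch of the two steps described above: detecting that sampling areas differ, then applying the simple linear case. The function names and biomass values are hypothetical; only the 1 and 2 square meter plot sizes come from the text.

```python
import math


def needs_scaling(area_a_m2, area_b_m2):
    """Detection step: flag attributes whose sampling areas differ."""
    return not math.isclose(area_a_m2, area_b_m2)


def biomass_per_m2(biomass_g, plot_area_m2):
    """Simple linear scaling to a per-square-meter density."""
    return biomass_g / plot_area_m2


# Hypothetical values for the two plot sizes mentioned in the text.
plot_1 = {"biomass_g": 250.0, "area_m2": 1.0}
plot_2 = {"biomass_g": 420.0, "area_m2": 2.0}

if needs_scaling(plot_1["area_m2"], plot_2["area_m2"]):
    densities = [biomass_per_m2(p["biomass_g"], p["area_m2"]) for p in (plot_1, plot_2)]
    print(densities)  # [250.0, 210.0]
```

Dividing by area assumes biomass grows in proportion to plot size. Quantities such as species richness do not scale linearly with area, which is where curve fitting or rarefaction would replace `biomass_per_m2`.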
+
+
+
