Knowledge Representation

Introduction

The range of different data structures used for ecological information makes data sets difficult to align and merge for synthetic research. The Ecological Metadata Language (EML) has accomplished much in making ecological data discoverable and accessible. But once data are accessed, time must still be spent determining whether the various data sources are both semantically compatible and structurally convertible, so that they can be normalized before merging. The aim of the Knowledge Representation group was to develop a knowledge model that addresses the semantic considerations necessary for aligning and merging disparate ecological data. In pursuing this practical aim, other powerful semantic capabilities have been realized, such as semantically enhanced data discovery methods that improve upon current text-based and keyword methods. These improvements to data discovery include the ability to explore data (which we consider part of the discovery process) using summarization techniques that are enabled by the knowledge model.

Most observational data sets are a series of attributes (e.g., data columns in a table), in which instances among the attributes are related in space, time, or part. A related group of data instances is generically called a tuple, but can be thought of as a row in a typical data table. The objective for data set integration is to ascertain whether two or more attributes are semantically compatible, and whether any structural conversion or scaling must be undertaken before merging them. At the scale of a single attribute, integration may appear trivial. However, the context and scale at which the data were collected must also be compatible, which requires cross-attribute knowledge. For example, an attribute "weight 1" might be compatible with a second attribute "weight 2" in that both are continuous quantities with easily convertible measurement units. However, the first weight might pertain to all the grass biomass in a 1 m² plot and the second to a 2 m² plot; or one might pertain to trees and the other to fish; or one might have been collected in Alaska and the other in Indonesia. To automate the alignment and integration of ecological data sets, the knowledge model must contain the machinery needed to reason not only between attributes in different data sets, but also among the attributes within the same data set.
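The "weight 1" and "weight 2" example can be sketched in code. The following Python fragment is only an illustration, not the observation ontology itself: the Attribute class, its field names (entity, characteristic, unit, plot_area_m2), the TO_GRAMS table, and the third attribute "weight 3" are all hypothetical names introduced here. It shows a semantic check (do two attributes describe the same kind of thing in convertible units?) followed by a structural conversion that uses cross-attribute context, in this case the plot area.

```python
from dataclasses import dataclass

# Hypothetical conversion factors to grams; an assumption for this sketch,
# not taken from any published unit vocabulary.
TO_GRAMS = {"g": 1.0, "kg": 1000.0}


@dataclass(frozen=True)
class Attribute:
    name: str
    entity: str          # what was observed, e.g. "grass"
    characteristic: str  # what was measured, e.g. "biomass"
    unit: str            # measurement unit, expected to be a key of TO_GRAMS
    plot_area_m2: float  # sampling context: plot size in square metres


def compatible(a: Attribute, b: Attribute) -> bool:
    """Semantic check: same entity and characteristic, convertible units."""
    return (
        a.entity == b.entity
        and a.characteristic == b.characteristic
        and a.unit in TO_GRAMS
        and b.unit in TO_GRAMS
    )


def normalize(value: float, attr: Attribute) -> float:
    """Structural conversion: express a value in grams per square metre."""
    return value * TO_GRAMS[attr.unit] / attr.plot_area_m2


weight_1 = Attribute("weight 1", "grass", "biomass", "g", 1.0)
weight_2 = Attribute("weight 2", "grass", "biomass", "kg", 2.0)
weight_3 = Attribute("weight 3", "fish", "biomass", "g", 1.0)

print(compatible(weight_1, weight_2))  # same entity, convertible units
print(compatible(weight_1, weight_3))  # grass versus fish
print(normalize(0.5, weight_2))        # 0.5 kg measured in a 2 m2 plot
```

In this sketch the unit conversion alone would not make "weight 1" and "weight 2" comparable; the values only line up once the plot area, which lives outside the weight attribute, is taken into account. Collection location (Alaska versus Indonesia) is left out for brevity and would be a further piece of context to compare.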

The objective of this technical note is to present a knowledge model specifically designed for ecological data integration. Called the observation ontology, the model breaks down scientific observation and measurement into the components required to determine whether data are semantically compatible and structurally convertible for merging.
