Science Environment for Ecological Knowledge

Definitions, Assumptions, and General Architecture

Definitions:

  • Taxonomic Schema (Tax Schema): A group of attributes that defines a taxonomic concept. The only required element is a name. Other elements, such as publication date, author, and circumscription, define a taxonomic concept more fully but may not be present. The current version is defined at http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/seek/projects/taxon/schemas/

  • Concept database (Concept DB) : Database containing Taxonomy 'trees' from different authorities, such as ITIS, Species2000, and others. Taxonomic concepts are stored and hierarchical relationships are defined here.

  • Taxonomic Object (Tax Object) : A group of attributes that follow the SEEK Taxonomic Schema. In the Concept DB, Tax Objects are nodes in trees from Name Providers. These objects are returned in XML from queries. Since a name is the only required value in the Tax Schema, that is the only required value in a Tax Object.

  • Taxonomic Concept (Tax Concept) : A Tax Concept is a Tax Object within the Concept DB. It may or may not have Tax Relationships and may or may not be part of a Taxonomy Tree.

  • Taxonomic Relationship (Tax Relationship) : Relationships can be parent, child, sibling within a tree, or custom relationships defined by "authorities" or calculated by the system. A schema will be defined for a relationship.

  • Data Provider (DP) : Online datasets, databases, and similar sources that contain names of organisms. The names are usually formal Latin names, but common names used in data sets are also included. Examples: ecological data sets, museum catalog databases, and abstract databases such as BIOSIS's Zoological Record.

  • Taxonomic Concept Provider (TCP) : Online sources of names of organisms, usually taxonomic and Latin, but a TCP could also be a common name server. Some examples: ITIS, SPECIES 2000, Prometheus, and other taxonomists who provide details of a particular taxon. The definition could be extended to sources of names for other kinds of things besides organisms. TCPs have occasionally been called Authorities, Name Providers, Concept Providers, and Aggregators (because they may aggregate concept hierarchies from other sources).

  • User : An Ecogrid or outside application that sends a request to the SEEK system.

  • Queries : Requests sent to the system by a user. Responses are returned in XML format according to a pre-defined schema and contain Taxonomic Objects and Taxonomic Relationships.
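
The Tax Object and Tax Relationship definitions above can be pictured as a small data structure. The following is only a minimal sketch: the class names, field names, and XML element names are hypothetical and are not taken from the SEEK Taxonomic Schema, which is defined at the CVS location given above. The one property it does take from the definitions is that a name is the only required value.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass
class TaxObject:
    """Hypothetical in-memory form of a Tax Object; only `name` is required."""
    name: str
    author: Optional[str] = None
    publication_date: Optional[str] = None
    circumscription: Optional[str] = None


@dataclass
class TaxRelationship:
    """Hypothetical link between two Tax Objects, e.g. 'parent' or 'child'."""
    source: TaxObject
    target: TaxObject
    kind: str


def tax_object_to_xml(obj: TaxObject) -> str:
    """Serialize a Tax Object, leaving out optional elements that are absent."""
    root = ET.Element("taxObject")  # element names are illustrative only
    ET.SubElement(root, "name").text = obj.name
    optional = (
        ("author", obj.author),
        ("publicationDate", obj.publication_date),
        ("circumscription", obj.circumscription),
    )
    for tag, value in optional:
        if value is not None:
            ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")
```

A Tax Object built from a bare name, such as TaxObject("Quercus alba"), serializes to a single name element, which mirrors the case where a query supplies nothing but a name.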


Some Assumptions:

  • SEEK maintains the Concept DB defined above. Implementation details are not specified here - it could be distributed or centralized. For prototype and testing purposes, it will be centralized.

  • All DPs registered with SEEK provide a service to query for all concepts/names stored for each dataset in their system according to an agreed XML schema. This return schema will include all or part of the Taxonomic Schema.

  • The SEEK system will have a function to extract (using the above DP service) and store concepts from individual datasets. Storage will not be in the Concept DB. This is not a primary function of the Taxon group.

  • The Tax Schema and Tax Relationship Schema (yet to be defined) are common formats to be used in any data exchange.

  • A TCP must have some interface that allows SEEK to consume its information.

  • DPs must mark up datasets using standard format for taxonomic information (EML or modified EML).

  • The workflow might include a review -> publish step.
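
The assumption that DPs mark up taxonomic information in EML, and that SEEK extracts names from it, can be illustrated with a short sketch. It assumes the names sit in an EML-style taxonomicCoverage fragment with nested taxonomicClassification elements; the sample fragment, the function name, and the returned dictionary keys are made up for illustration, and a real DP service may return a different (agreed) schema.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; not taken from a real dataset.
SAMPLE_EML_FRAGMENT = """
<taxonomicCoverage>
  <taxonomicClassification>
    <taxonRankName>Genus</taxonRankName>
    <taxonRankValue>Quercus</taxonRankValue>
    <taxonomicClassification>
      <taxonRankName>Species</taxonRankName>
      <taxonRankValue>Quercus alba</taxonRankValue>
      <commonName>white oak</commonName>
    </taxonomicClassification>
  </taxonomicClassification>
</taxonomicCoverage>
"""


def extract_names(eml_xml):
    """Return one dict per taxonomicClassification element, in document order."""
    root = ET.fromstring(eml_xml)
    names = []
    for node in root.iter("taxonomicClassification"):
        # findtext() looks at direct children only, so a nested
        # classification does not leak into its parent's entry.
        names.append({
            "rank": node.findtext("taxonRankName"),
            "name": node.findtext("taxonRankValue"),
            "common_name": node.findtext("commonName"),
        })
    return names
```

Each returned entry carries at least a name, so it could be turned into a Tax Object even when rank and common name are missing.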


General Plans (Architecture for newest schema to follow)

  • Python continues to be the language of choice for Taxon development. MySQL is the database of choice. SOAP will be the communication protocol for Taxonomic Name resolution.

  • The current prototype, including the query of the KNB Metacat server for datasets, will be completed first. The general architecture will then be reviewed, modified, and published, after which development using the new taxonomic schema will begin.

  • We will extend the Python SOAP library to meet our needs.

  • IR algorithms for scoring equivalence between concepts will not be addressed until a new prototype (with the new schema) has been brought to the stage of the current one.

  • Nearly all UseCases for the Taxon group are built on two main methods: Resolve-Concept and Compare-Concepts.

    • Resolve-Concept : method takes a Tax Object (could be just a name), and returns a ranked, scored list of matching concepts.

    • Compare-Concepts : method takes two or more concepts and returns an equivalence score (for each combination?). The score is calculated in the same way as in Resolve-Concept.

  • An additional API will be built for the testing phase of this project - Resolve-EML-To-Concept.

  • We will depend on another part of SEEK to convert any metadata (including but not limited to EML) to a taxonomic object (based on our schema) to be used for querying the Taxonomic Concept Database. For testing purposes, we will implement a tool to query the KNB Metacat server for datasets and return taxonomic objects from EML.
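
The two main methods named above, Resolve-Concept and Compare-Concepts, might take roughly the following shape. This is a sketch under loud assumptions: the scoring algorithm has not been chosen (see the note on IR algorithms above), so plain string similarity from Python's difflib stands in as a placeholder, the Concept DB is replaced by a hypothetical in-memory list, and the function names and signatures are invented for illustration. It scores every pairwise combination, which is one possible reading of the open "for each combination?" question.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical stand-in for the Concept DB: a flat list of concept names.
CONCEPT_DB = ["Quercus alba", "Quercus rubra", "Acer rubrum"]


def score(name_a, name_b):
    """Placeholder equivalence score in [0, 1], based on string similarity."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def resolve_concept(name, concepts=CONCEPT_DB, limit=5):
    """Return up to `limit` (concept, score) pairs, best match first."""
    ranked = sorted(
        ((concept, score(name, concept)) for concept in concepts),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:limit]


def compare_concepts(*names):
    """Return a score for every pairwise combination of the given names."""
    return {(a, b): score(a, b) for a, b in combinations(names, 2)}
```

Both functions share the same score() helper, which reflects the statement that Compare-Concepts calculates its score in the same way as Resolve-Concept. Swapping in a real IR-based scorer later would only require replacing that one helper.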



This page last changed on 06-Jul-2004 14:32:12 PDT by LTER.stekell.