Science Environment for Ecological Knowledge

Nico Taxon Mammals

SEEK Taxon and SEEK's Prime Use Cases - North American Mammals (NMF - Nov. 2, 2004)

  • This is an attempt to summarize our thoughts and positions coming out of the November 1 SEEK Taxon afternoon meeting. We mostly discussed our role in SEEK's Prime Use Cases. We have decided that thinking about Use Cases and how to attract expert user communities are our top priorities now.

  • Our current understanding is that BEAM has two major prototype Use Cases: one dealing with long-term productivity analyses of plants (closely related to LTER New Mexico activities), and the other concerning the distributional dynamics of North American mammals. As far as we understand, the former Use Case involves ecological units, things like "grasses" or "shrubs" (or plants grouped according to physiological similarities), but not taxonomic names. We therefore think that SEEK Taxon's role in it will be minimal. The latter Use Case, on the other hand, will use taxonomic names and concepts, and it is the one we talked about. In order to do a solid job with this Use Case, SEEK Taxon needs to clarify a number of issues very soon.


  • Apparently no detailed document about the aims and procedures of the NA Mammal Use Case exists. The CVS has a few shorter (2-page) informal papers, but none of them describes the Taxon-related activities sufficiently. We know we are dealing with at most a few thousand species (no more than 400-500 in the United States specifically). The specimen records come (among other sources) from museum collections, so their time range may span more than one or even two centuries. We also know that most of the specimen records will come from MANIS, which can be accessed through DiGIR. Dave Vieglais estimates there may be 100,000 to 200,000 records in total.

  • The NA mammal distribution records in MANIS (and other sources) are tied to taxonomic names. It is SEEK Taxon's role to mark up these records as taxonomic concepts, i.e. to attach a "sec." (according to…) label that specifies a reasonable literature source from which the name definitions (may) stem. In our view, these are identifications of observational records to existing taxonomic concepts published elsewhere. So we are not dealing with 100,000 to 200,000 concepts per se (i.e. one for each specimen record), but with a much smaller number, which will likely be only somewhat higher than the number of taxonomic names/species, since mammal taxonomy has been relatively stable.

  • Nevertheless, even if we had only 400-500 taxonomic names to process for North America, we would still face possibly 100,000 to 200,000 record-with-a-name-to-concept identifications in order to do our job.

  • If we want to achieve the 100,000 to 200,000 record-to-concept identifications and also honor the times and localities at which the mammal records were created, then we are dealing with an unknown number of corresponding taxonomic concepts. For example, we cannot just identify all DiGIR records to concepts published in a recent (2000 or later) authoritative checklist of NA mammals. That would mean that a specimen from (say) 1950 would be identified to a concept published in a reference that did not exist at the time when the specimen was identified. It would thus shift all the tasks SEEK Taxon is supposed to make transparent and dynamic in the database back into the heads of the persons who immediately resolve all name/concept ambiguities to a current view. The taxonomic concept integrations and relationship assessments would have happened in someone's head and would not be aided by SEEK Taxon "cognitive" services and tools. So for the record-to-concept identifications to be done right, we need to use a concept pool (more or less) adjusted to the time range over which the specimen collection records were created.

  • We do not know just how many reasonable standard NA mammal references have been published over the past 100-200 years. But those references make up the initial pool of candidate concepts. If, for example, there are 10 major references from 1800 to 2000, and on average each contains 300 concepts for the NA mammal species, then we have a concept pool of 3000. The 100,000 to 200,000 records would have to be identified to concepts making up that pool.
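
The pool-size estimate and the idea of a time-adjusted concept pool can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reference titles and publication years are invented placeholders, and the rule of choosing the most recent reference published no later than the identification year is one possible reading of "adjusted to the time range", not an agreed SEEK Taxon procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    """A hypothetical standard NA mammal reference (title and year are placeholders)."""
    title: str
    year: int
    n_concepts: int


def pool_size(references: list[Reference]) -> int:
    """Total number of candidate concepts across all references."""
    return sum(r.n_concepts for r in references)


def reference_for_record(identified_year: int,
                         references: list[Reference]) -> Optional[Reference]:
    """Assumed rule: pick the latest reference that already existed
    when the specimen was identified; None if no reference is old enough."""
    candidates = [r for r in references if r.year <= identified_year]
    return max(candidates, key=lambda r: r.year) if candidates else None


# Ten invented references, one every 20 years from 1820 to 2000,
# each with 300 concepts (the average used in the text).
REFERENCES = [Reference(f"Hypothetical checklist {y}", y, 300)
              for y in range(1820, 2001, 20)]

if __name__ == "__main__":
    print(pool_size(REFERENCES))                     # 10 * 300 concepts
    print(reference_for_record(1950, REFERENCES))    # the 1940 placeholder
    print(reference_for_record(1805, REFERENCES))    # no reference old enough
```

Under this rule a specimen identified in 1950 is never tied to a reference published after 1950, which is the constraint argued for above. How to choose among several concepts for the same name within one reference is left open here.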


  • From the above it is clear that we have a problem. We clearly want and need to contribute to one of SEEK's prime Use Cases. But we have almost no idea how to solve the issue of associating each of the NA mammal records with a pool of potentially a few thousand concepts, and no one leading the Use Case has approached us or told us how.


  • Of course we could make it "work" somehow. For example, we could use the Taxonomic Exchange Schema and transform all the names associated with the DiGIR records into concepts. The references specified by the "sec." following the names would then be the people who made the identifications at the time, e.g. Canis lupus "sec." Smithsonian mammal curator Hank Hankster, 1950. DiGIR makes this kind of information available from specimen ID labels wherever it exists (sometimes all we have may be "sec." Smithsonian Institution). But that would mean creating 100,000 to 200,000 new concepts. It would thus create the kind of unwarranted "inflation" of poorly circumscribed concepts that (as we all realize) most threatens a wider acceptance of the concept approach. If every occurrence of a name in any publication (very broadly defined to include DiGIR records) qualifies as a concept, then the whole approach will fail. So this solution would significantly hurt SEEK Taxon's long-term strategy and aims, and make us look a bit hypocritical.

  • Another feasible solution would be to restrict the concept pool for the record-to-concept identifications to the NA mammal checklists currently available on-line, e.g. ITIS, the Smithsonian mammal web pages, and a few others (regional on-line checklists, etc.). But then we would incur the "intellectual sloppiness" of identifying old specimen records to concepts published much later. Because the concept pool would be small and temporally restricted to recent publications, we would also minimize the potential for resolving concepts among each other. This solution is messy but perhaps still better than the first. But again SEEK Taxon would fall far short of practicing what it has preached.

  • Of course the core issue with the NA mammal Use Case is the presence or absence of taxonomic experts contributing to it. We currently have no experts involved. In order to illustrate SEEK Taxon's worth, we need experts to assist us in the 100,000 to 200,000 record-to-concept identifications. We need to try to recruit them now. If we cannot muster enough support from mammal taxonomists, then we should opt for a relatively convenient solution (likely the second option above) to the NA mammal Use Case, and concentrate most of our efforts on another, still unidentified Use Case (e.g. an NSF-Planetary Biodiversity Inventory project) where expert input will be available. We think that would be in the long-term interests of everyone involved in SEEK.
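
To make the trade-off between the two fallback options above more concrete, the following Python sketch counts the concepts each one would generate for a handful of records. Everything in it is illustrative: apart from the Canis lupus / Hank Hankster example taken from the text, the records, identifier names, and the checklist label are invented, and a concept is modelled simply as a (name, "sec." source) pair.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """A specimen record as it might arrive with a name and ID-label data."""
    name: str
    identified_by: str
    year: int


# Invented sample records; only the first mirrors the example in the text.
RECORDS = [
    Record("Canis lupus", "Hank Hankster", 1950),
    Record("Canis lupus", "Smithsonian Institution", 1921),
    Record("Canis lupus", "A. Curator", 1987),
    Record("Ursus americanus", "Smithsonian Institution", 1921),
    Record("Ursus americanus", "A. Curator", 1990),
]


def concepts_per_identifier(records):
    """First option: every name 'sec.' identifier-and-year becomes a concept."""
    return {(r.name, f"{r.identified_by}, {r.year}") for r in records}


def concepts_per_checklist(records, checklist):
    """Second option: every name is identified to one recent on-line checklist."""
    return {(r.name, checklist) for r in records}


if __name__ == "__main__":
    option_1 = concepts_per_identifier(RECORDS)
    option_2 = concepts_per_checklist(RECORDS, "hypothetical on-line checklist")
    print(len(option_1), "concepts under the first option")
    print(len(option_2), "concepts under the second option")
```

With these five sample records the first option yields one concept per record, while the second collapses them to one concept per name. Scaled to the real data, that is the difference between roughly as many concepts as records and roughly as many concepts as names.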



This page last changed on 04-Nov-2004 14:00:04 PST by NCEAS.franz.