Science Environment for Ecological Knowledge

All Hands Meeting 2005 Taxon Agenda And Notes



Tuesday Afternoon

  • Laura Downey, Going Forward Presentation
    • Reviewed Group Plans and RoadMap from Winter 2005 EstesPark meeting
  • Xianhua Liu, Concept Mapping and authoring tool: demo and discussion
    • Laura identified usability issues of interest and will follow-up
    • Discussion of next steps, including generation of TCS and object model considerations
  • Martin Graham, Collections visualization tool: review of prototype
    • Limited discussion of next steps, first impressions

Wednesday Morning

Taxon-Kepler Interaction Design and Engineering Discussion with Dan Higgins

  • Integration of DiGIR and Kepler in two SEEK use case work flows
  • Responsibilities of TOS versus those of Kepler
  • Need for user interaction
  • Usage scenarios for TOS within Kepler

The 'selection problem': how do users *find* and then *select* the focal concept they want to work with when searching and parameterizing an actor? The first generation will have no interactivity beyond filling in an Actor parameter file. Rob has wrapped the current work flow into a custom actor; interactivity would come later, and it might then become possible to pass parameter files to the actor from an application outside of Kepler.

Problem: names coming out of a TOS query can be mapped to more than one of the concepts specified for the DiGIR query for GARP input.

Looping issue in the GARP use case workflow: going through the list of concepts to find synonyms in order to identify overlaps (names with 2 or more concepts), Gales & Jones. Action: make a custom TOS actor instead of modifying the web services actor (which came from GEON).
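The overlap check described above (names with 2 or more concepts) can be sketched as follows. This is only an illustration: the mapping shape, the concept IDs and the placeholder names are assumptions, not the TOS data model or any Kepler actor interface.

```python
from collections import defaultdict


def find_ambiguous_names(concept_names):
    """Return the names that are attached to two or more concepts.

    concept_names maps a concept ID to the names (accepted name plus
    synonyms) returned for it by a lookup. The shape is hypothetical.
    """
    concepts_by_name = defaultdict(set)
    for concept_id, names in concept_names.items():
        for name in names:
            concepts_by_name[name].add(concept_id)
    # Keep only names shared by more than one concept.
    return {
        name: sorted(ids)
        for name, ids in concepts_by_name.items()
        if len(ids) > 1
    }


# Made-up placeholder data, not real taxonomy.
example = {
    "concept-A": ["Aus bus", "Aus cus"],
    "concept-B": ["Aus cus"],
    "concept-C": ["Aus dus"],
}
ambiguous = find_ambiguous_names(example)
```

Inverting the mapping once avoids looping over every pair of concepts, which is the part of the workflow the notes flag as awkward.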

SEEK-Taxon would like to see more possibility for an interactive UI within Kepler for TOS query and selection tasks. Alternative is to stop, restart and repeat short workflows as the way to introduce 'interactivity' for the user to test steps before running the complete workflow.

SEEK Use Case #2 needs a taxon concept merging tool. SMS is working on merging other types of parameters across site data sets. (Get a list of species from LTER data sets, assume they are in EML, input to TOS, output a unique list of merged names.)

  • Must mark up taxon names as concepts with GUIDs in the EML data sets first, GUIDs would be the concept IDs. Need a tool to do that, Morpho is the likely app. If no GUID in the data set, a call to TOS could match on data set taxon name and a taxon reference.

  • Kepler work flow scenario: Kepler has Actors that can actually merge the data, need some user interaction with TOS to decide which level of lumping the user wants, include synonyms, concept overlaps, go up a level, etc.

    • if the GUIDs match between data sets, we merge
    • if the names from two data sets match (derived from GUIDs in the data set) in TOS, we combine them
    • if concepts match in TOS (other TOS operations)
    • (Incomplete, Jessie and Bob discussed)
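A minimal sketch of the rule cascade listed above, under loud assumptions: the `TaxonEntry` shape, the `match_level` function and the `concepts_match` callback are hypothetical, the name comparison is a plain string test standing in for a TOS lookup, and the notes themselves mark the rule list as incomplete.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TaxonEntry:
    """One taxon mention in a data set (hypothetical shape)."""
    guid: Optional[str]  # concept GUID marked up in the data set, if any
    name: str            # taxon name as it would be resolved through TOS


def match_level(a: TaxonEntry, b: TaxonEntry,
                concepts_match: Callable[[TaxonEntry, TaxonEntry], bool]) -> str:
    """Return the first rule that links two entries: guid, name, concept or none."""
    if a.guid is not None and a.guid == b.guid:
        return "guid"      # GUIDs match between data sets: merge
    if a.name == b.name:
        return "name"      # names match: combine
    if concepts_match(a, b):
        return "concept"   # concepts match via some other TOS operation
    return "none"


def never(a: TaxonEntry, b: TaxonEntry) -> bool:
    """Placeholder for a TOS concept comparison; always answers no."""
    return False


# Made-up placeholder entries.
x = TaxonEntry(guid="guid-1", name="Aus bus")
y = TaxonEntry(guid="guid-1", name="Aus bus")
z = TaxonEntry(guid=None, name="Aus bus")
```

Returning the level instead of merging directly leaves room for the user to choose how much lumping is wanted, as the work flow scenario asks.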

Wednesday Afternoon

  • Extracting concepts from online and monographic sources, just mammals?, (Susan, Aravind)
    • Currently: 1600 PDF documents obtained, 100 are bat taxonomy papers, extracting data next
    • Desired: extract parent-child hierarchies, descriptions, synonyms

  • TOS Data acquisition roadblocks, process, role of software tools, getting data into TOS with an import tool for usability testers.

    • Currently: ITIS ("relational" concepts), Bats from MSW 2005 (MSW concepts from original pubs and synonyms without bib references), MSW 1993, FNA in TCS, German Mosses (concepts with concept maps),

    • Requirements: TOS Actor could be tested with plant data,

    • Possible: Use Bob Peet's plant concept data (44,000 with relationships?) with PLANTS county level distribution data as a supplement to the bat use case scenario; Ranunculus data set, 8 classifications, NA & Mexico.

    • Action: Stinger Guala to identify in 2 weeks the number of versions of PLANTS and send them to KU to DiGIRize. Bob will send a rich treatment of plant concepts of the US Southeast. Available now: Ranunculus data; it needs to be upgraded to the latest TCS, and Xianhua will do that in 1 week. Jan or Feb: All USDA Plants in version 4, mapped against all plants in FNA, and also all of the Alan Weakley collaboration, his version mapped against 8 different classifications of plants.

  • Broader TDWG and community issues for TCS and TOS. What are their expectations? Our responsibilities? Can SEEK demonstrate the utility of TCS and TOS within the next year? What tools do we need to complete and harden for the community to buy into TCS and TOS? What can we expect from the GBIF community? Short term versus ultimate objectives

    • S-T TOS/TCS External Roles/Personas
      • S-T might be a concept provider for other people to test their concept applications,
      • S-T might be a concept repository for projects looking for a place to store them, might need a batch import process if requested.
      • S-T *must* use GUIDs because there will be multiple concept object servers
      • S-T if TOS is a global reference implementation, then we need to implement the whole TCS schema in TOS, we could not implement just some TCS fields, we would need to input and output 100% standard TCS documents.
      • Schema changes over time, do we need to maintain records in each version forever?

    • S-T TOS/TCS Internal Roles/Personas
      • See SEEK use cases 1 & 2 and other related logic

Discussion of the data independence problem with DiGIR queries that have the same name for 2 or more concepts. Solutions: give the user the option to allow duplicates, eliminate duplicates, or combine overlaps; or go interactive and alert the user that a name is used for multiple concepts; or just log errors in a sideband pipeline.
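The non-interactive options above could be expressed as a single policy switch. The sketch below is one reading of them, not an agreed design: `DuplicatePolicy`, `apply_policy` and the `(name, concept_id)` pair format are hypothetical, "eliminate" is interpreted as keeping the first concept seen for each name, and the sideband is just a list of ambiguous names.

```python
from collections import defaultdict
from enum import Enum


class DuplicatePolicy(Enum):
    ALLOW = "allow"          # pass every (name, concept) pair through
    ELIMINATE = "eliminate"  # keep one concept per name (first seen)
    COMBINE = "combine"      # group all concepts under each name
    LOG = "log"              # drop ambiguous names, report them on the side


def apply_policy(pairs, policy):
    """Apply a duplicate policy to (name, concept_id) pairs.

    Returns (result, sideband); sideband lists ambiguous names for LOG.
    """
    by_name = defaultdict(list)
    for name, concept_id in pairs:
        if concept_id not in by_name[name]:
            by_name[name].append(concept_id)
    ambiguous = {name for name, ids in by_name.items() if len(ids) > 1}

    if policy is DuplicatePolicy.ALLOW:
        return list(pairs), []
    if policy is DuplicatePolicy.ELIMINATE:
        return [(name, ids[0]) for name, ids in by_name.items()], []
    if policy is DuplicatePolicy.COMBINE:
        return [(name, tuple(ids)) for name, ids in by_name.items()], []
    if policy is DuplicatePolicy.LOG:
        kept = [(name, ids[0]) for name, ids in by_name.items()
                if name not in ambiguous]
        return kept, sorted(ambiguous)
    raise ValueError(f"unknown policy: {policy!r}")


# Made-up placeholder pairs.
pairs = [("Aus bus", "c1"), ("Aus bus", "c2"), ("Aus cus", "c3")]
```

The interactive option is left out on purpose, since it depends on what user interaction Kepler ends up supporting.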

  • Status and review of Usability process (Laura Downey, with status PowerPoint slides)
Discussed plans for interviewing Alan Weakley and Mouse guy. Discussed plans for getting user feedback on mapping tools and on visualization tools.

  • May 2006 EOT Workshop Planning (Pennington, Beach)
Reviewed Beach draft agenda, majority felt it was too broad, too many topics with time for limited detail, and all topics would not be of interest to a diverse workshop audience. Suggestion that we pick a research theme like doing niche modelling and targeting a workshop for just niche modellers, or alternatively having a 2-day workshop on just taxon tools and concept data management.

Thursday Morning

  • Modifying EML to accommodate taxonomic concepts with Matt Jones
    • Issue: Need to add concept metadata to EML, so that Kepler Actors could work with concept data in data sets. Could be just the GUID, if available, or at least a name and a reference, or a relation of the concept in the data set to a known concept in TOS.
    • Matt: context of current EML handling of taxon data. EML is already designed to contain species name information. There are 9000 EML data records now, and a version upgrade is being planned. Matt needs a full, well fleshed-out proposal before he can put the proposed changes into the EML maintenance process. The US Federal Biological Data Profile (BDP) wanted a classification tree in the metadata. BDP does not allow repeatable taxonomic classifications, but EML does, to allow for taxa in different trees (e.g. animals and plants) in one record. Matt has asked them to allow for multiple classifications, but the BDP committee is now largely defunct and the agencies are no longer meeting on this.

    • 1 possible plan (Matt): Taxon would need to tell the EML group what is minimally needed to meet the needs of the SEEK project.

Discussion of other standards that handle names: DC, ABCD, SDD should handle concepts. Standards need to be crosswalked. ABCD overlaps with EML on collections metadata. The Natureserve Observation group overlaps with occurrence data. There needs to be some coordination. Standards need the GUID bit and the human readable reference. A small common concept citation (reference) schema, for a few fields, across standards would be very useful, in addition to the taxon name.
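To make the "few fields" idea concrete, here is a hedged sketch of what a small concept citation record might carry: a taxon name, plus a GUID and/or a human readable reference. The class name, field names and the identifier and reference strings are illustrative assumptions; none of them come from EML, TCS or any other standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConceptCitation:
    """Hypothetical minimal concept citation shared across standards."""
    name: str                        # taxon name as used in the data set
    guid: Optional[str] = None       # concept GUID, if one is available
    reference: Optional[str] = None  # human readable reference for the concept

    def identifies_concept(self) -> bool:
        """True if the name is qualified by a GUID or a reference."""
        return bool(self.guid) or bool(self.reference)


# Made-up placeholder values.
bare = ConceptCitation(name="Aus bus")
by_guid = ConceptCitation(name="Aus bus", guid="guid-1")
by_reference = ConceptCitation(name="Aus bus", reference="Author 1993")
```

A bare name fails the check, which mirrors the point made in the discussion that a name alone does not pin down a concept.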

Two options: modify EML to support the minimally required needs, or work with the community to get an agreement across projects.

We could really use a poster which describes the overlap and activity of these various infrastructure projects.

Discussion about using TSNs, and about the lack of versioning and consistency in ITIS and PLANTS IDs, which makes them less valuable.

Whatever is proposed as required fields must be useful to end users. Ecologists must see added value. Morpho has an ITIS plugin (ITIS-lib) that looks up names in ITIS and grabs the synonyms when the record is stored. Getting the synonyms at cataloging time could be added functionality for the end user.

Morpho: we still need a way in Morpho to add concept data. Matt: the problem is how to convince people to add the data. Right now code definitions (e.g. taxon name codes for a study, which map data set codes to names in the data set) can be put into EML. The section on taxon coverage in EML does not currently handle mapping information of any kind.

Morpho: Adding TOS lookup to Morpho (Java, Swing), 4 weeks? A simple plugin that uses GetBestConcept from a lookup popup in the Taxon data entry table. Xianhua might be able to work on this; his tool also needs to work with data from the TOS, and that upgrade needs to be added.

UBER-Discussion

Actions:

Nico will analyze the design needs for modifying Morpho to add a TOS lookup during meta record creation.

Xianhua will nominally consider working on the implementation of that using existing GetBestConcept service.

Xianhua will make the relationship mapper compliant with TCS. TCS record import and export.

Bob will make his data available: Ranunculus data, which needs to be upgraded to the latest TCS; Xianhua will do that in 1 week. Jan or Feb 2006: All USDA Plants in version 4, mapped against all plants in FNA, and also all of the Alan Weakley collaboration, his version mapped against 8 different classifications of plants.

  • Bat Data Subproject for Demonstration Purposes

    • Objective: to exercise TOS and mapping, and to produce a nice demo.

    • Tasks:

      • Jessie: It would be useful to have bat treatments before 1993 to map them to 1993 treatment. Nico: we need to have a person to do that, we need a taxonomic expert to make those mappings.

      • MSW Bat data need to be mapped between the 1993 and 2005 versions. Kate Jones demonstrated how they did the mapping in an interview with Nico and Laura; the actual mappings can be identified from explicit annotations in the 2005 publication. But the publication notes will likely not be explicit as to the exact mapping operator. We need someone to read the treatment and try to extract the mappings. We do not currently have a copy of the treatment. Diane Reeder and Don Wilson are the authors of this information and created the source documents.

      • Susan's project will look at the 100 treatments she has since 1993 and will try to see if the concepts in those were adopted and/or mapped in the 2005 treatment. Then we can compare the actual 2005 tree with the pieces that would have been predicted from the treatments, and evaluate how close we get to truth.

Thursday Afternoon


This particular version was published on 27-Oct-2005 11:17:14 PDT by KU.beach.