Science Environment for Ecological Knowledge

All Hands Meeting 2005 Taxon Agenda And Notes

Monday Afternoon

  • Jessie Kennedy presented the SEEK Taxon update and plans; the PowerPoint will be put in CVS as part of the plenary session

Tuesday Morning

  • Plenary Session

Tuesday Afternoon

  • Laura Downey, Going Forward Presentation (see attachment)
    • Reviewed group plans and RoadMap from the Winter 2005 Estes Park meeting
  • Xianhua Liu, Concept Mapping and authoring tool: demo and discussion
    • Laura identified usability issues of interest and will follow-up
    • Discussion of next steps, including generation of TCS and object model considerations
  • Martin Graham, Collections visualization tool: review of prototype
    • Limited discussion of next steps, first impressions

Wednesday Morning

Taxon-Kepler Interaction Design and Engineering Discussion with Dan Higgins

  • Integration of DiGIR and Kepler in two SEEK use case workflows
  • Responsibilities of TOS versus those of Kepler
  • Need for user interaction
  • Usage scenarios for TOS within Kepler

The 'selection problem': how do users *find* and then *select* the focal concept they want to work with, both for searching and for parameterizing an actor? The 1st generation offers no interactivity beyond filling in an actor parameter file. Rob has wrapped the current workflow into a custom actor; interactivity would come later, and it might then be possible to pass parameter files to the actor from an application outside of Kepler.

Problem: names coming out of a TOS query can map to more than one of the concepts specified for the DiGIR query that supplies the GARP input.

Looping issue in the GARP use case workflow: the workflow goes through the list of concepts to find synonyms and identify overlaps (names with 2 or more concepts) (Gales & Jones). Action: make a custom TOS actor instead of modifying the web services actor (which came from GEON).
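The overlap check in the GARP workflow, finding names that belong to two or more concepts, amounts to building an inverted index from names to concepts. A minimal Python sketch, assuming a hypothetical in-memory dict from concept IDs to their names and synonyms; it is not the TOS web service or actor interface, and the data below are placeholders:

```python
from collections import defaultdict


def find_overlapping_names(concept_names):
    """Return the names that belong to two or more concepts.

    concept_names: hypothetical dict mapping a concept ID to the list of
    names (accepted name plus synonyms) reported for that concept.
    """
    name_to_concepts = defaultdict(set)
    for concept_id, names in concept_names.items():
        for name in names:
            name_to_concepts[name].add(concept_id)
    # Keep only names shared by more than one concept, with sorted IDs.
    return {
        name: sorted(ids)
        for name, ids in name_to_concepts.items()
        if len(ids) > 1
    }


if __name__ == "__main__":
    # Placeholder data, not real TOS output.
    concepts = {
        "concept-A": ["name-1", "name-2"],
        "concept-B": ["name-2", "name-3"],
        "concept-C": ["name-4"],
    }
    print(find_overlapping_names(concepts))  # {'name-2': ['concept-A', 'concept-B']}
```

Doing this in a single pass inside one custom actor avoids looping a generic web services actor once per concept.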

SEEK-Taxon would like to see more possibilities for an interactive UI within Kepler for TOS query and selection tasks. The alternative is to stop, restart and repeat short workflows as the way to introduce 'interactivity', so the user can test steps before running the complete workflow.

SEEK Use Case #2 needs a taxon concept merging tool. SMS is working on merging other types of parameters across site data sets. (Get a list of species from LTER data sets, assumed to be in EML; input them to TOS; output a unique list of merged names.)

  • Taxon names in the EML data sets must first be marked up as concepts with GUIDs; the GUIDs would be the concept IDs. A tool is needed to do that, and Morpho is the likely application. If there is no GUID in the data set, a call to TOS could match on the data set's taxon name and a taxon reference.

  • Kepler workflow scenario: Kepler has actors that can actually merge the data, but some user interaction with TOS is needed to decide which level of lumping the user wants (include synonyms, concept overlaps, go up a level, etc.).

    • if the GUIDs match between data sets, we merge
    • if the names from two data sets match (derived from GUIDs in the data set) in TOS, we combine them
    • if concepts match in TOS (other TOS operations)
    • (Incomplete, Jessie and Bob discussed)
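The first two matching rules above (merge when GUIDs match, otherwise combine when names match) can be read as a cascade. A minimal Python sketch, assuming each data set record is a hypothetical dict with an optional `guid` and a `name`, and using exact string equality as a stand-in for a TOS name match; the third rule (concept matches via other TOS operations) was left incomplete and is not modeled:

```python
def merge_taxa(records_a, records_b):
    """Group taxon records from two data sets.

    Rule 1: a record whose GUID was already seen joins that group.
    Rule 2: otherwise, a record whose name was already seen joins that group.
    Any other record starts a new group.
    """
    groups = []
    by_guid = {}  # GUID -> group (list of records)
    by_name = {}  # name -> group
    for record in list(records_a) + list(records_b):
        guid = record.get("guid")
        name = record["name"]
        if guid is not None and guid in by_guid:
            group = by_guid[guid]      # rule 1: GUIDs match, merge
        elif name in by_name:
            group = by_name[name]      # rule 2: names match, combine
        else:
            group = []                 # no match: new group
            groups.append(group)
        group.append(record)
        if guid is not None:
            by_guid.setdefault(guid, group)
        by_name.setdefault(name, group)
    return groups
```

The level of lumping the user picks (synonyms, concept overlaps, going up a level) would decide which further rules are added after these two.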

Wednesday Afternoon

  • Extracting concepts from online and monographic sources (just mammals?) (Susan, Aravind)
    • Currently: 1600 PDF documents obtained, 100 are bat taxonomy papers, extracting data next
    • Desired: extract parent-child hierarchies, descriptions, synonyms

  • TOS data acquisition: roadblocks, process, role of software tools, and getting data into TOS with an import tool for usability testers.

    • Currently: ITIS ("relational" concepts), bats from MSW 2005 (MSW concepts from original publications, and synonyms without bibliographic references), MSW 1993, FNA in TCS, German mosses (concepts with concept maps).

    • Requirements: the TOS actor could be tested with plant data.

    • Possible: use Bob Peet's plant concept data (44,000 with relationships?) with PLANTS county-level distribution data as a supplement to the bat use case scenario; Ranunculus data set, 8 classifications, NA & Mexico.

    • Action: Stinger Guala to identify within 2 weeks the number of versions of PLANTS and send them to KU to DiGIRize; Bob will send a rich treatment of plant concepts for the US Southeast. Available now: Ranunculus data, which needs to be upgraded to the latest TCS; Xianhua will do that in 1 week. Jan or Feb: all USDA PLANTS in version 4, mapped against all plants in FNA, and also all of the Alan Weakley collaboration, his version mapped against 8 different classifications of plants.

  • Broader TDWG and community issues for TCS and TOS. What are their expectations? Our responsibilities? Can SEEK demonstrate the utility of TCS and TOS within the next year? What tools do we need to complete and harden for the community to buy into TCS and TOS? What can we expect from the GBIF community? Short-term versus ultimate objectives.

    • S-T TOS/TCS External Roles/Personas
      • S-T might be a concept provider for other people to test their concept applications.
      • S-T might be a concept repository for projects looking for a place to store them, might need a batch import process if requested.
      • S-T *must* use GUIDs because there will be multiple concept object servers
      • S-T: if TOS is a global reference implementation, we need to implement the whole TCS schema in TOS; we could not implement just some TCS fields, and we would need to input and output 100% standard TCS documents.
      • Schema changes over time, do we need to maintain records in each version forever?

    • S-T TOS/TCS Internal Roles/Personas
      • See SEEK use cases 1 & 2 and other related logic

Discussion of the data independence problem with DiGIR queries that have the same name for 2 or more concepts. Solutions: give the user the option to allow duplicates, eliminate duplicates or combine overlaps; go interactive and alert the user that a name is used for multiple concepts; or just log errors in a sideband pipeline.
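The non-interactive solutions listed above could be exposed as one actor parameter. A minimal Python sketch, assuming hypothetical `(name, concept_id)` result rows and a hypothetical `policy` parameter; "eliminate" is interpreted here as keeping the first concept seen for each name, the sideband pipeline is stood in for by a plain list, and the interactive alert is left out because it needs a UI:

```python
def resolve_duplicates(rows, policy="allow", sideband=None):
    """Apply a duplicate-handling policy to (name, concept_id) rows.

    policy (hypothetical parameter):
      "allow"     - return the rows unchanged
      "eliminate" - keep only the first concept seen for each name
      "combine"   - one row per name, with all its concept IDs as a tuple
      "log"       - like "eliminate", but record each dropped concept
                    in the sideband list
    """
    if policy == "allow":
        return list(rows)
    grouped = {}  # name -> concept IDs in first-seen order (dicts keep insertion order)
    for name, concept_id in rows:
        ids = grouped.setdefault(name, [])
        if concept_id not in ids:
            ids.append(concept_id)
    if policy == "combine":
        return [(name, tuple(ids)) for name, ids in grouped.items()]
    if policy in ("eliminate", "log"):
        if policy == "log" and sideband is not None:
            for name, ids in grouped.items():
                for extra in ids[1:]:
                    sideband.append(f"{name!r} also maps to {extra!r}; dropped")
        return [(name, ids[0]) for name, ids in grouped.items()]
    raise ValueError(f"unknown policy: {policy!r}")
```

Keeping every policy behind a single parameter lets a first-generation, non-interactive actor cover most options while the interactive alert waits for better UI support in Kepler.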

  • Status and review of the usability process (Laura Downey, with status PowerPoint slides)
Discussed plans for interviewing Alan Weakley and the mouse guy. Discussed plans for getting user feedback on the mapping tools and on the visualization tools.

  • May 2006 EOT Workshop Planning (Pennington, Beach)
Reviewed Beach's draft agenda. The majority felt it was too broad, with too many topics and time for only limited detail, and not all topics would be of interest to a diverse workshop audience. Suggestion: pick a research theme such as niche modelling and target a workshop at niche modellers only, or alternatively hold a 2-day workshop on just taxon tools and concept data management.

Thursday Morning

  • Modifying EML to accommodate taxonomic concepts, with Matt Jones
    • Issue: we need to add concept metadata to EML so that Kepler actors can work with concept data in data sets. This could be just the GUID, if available, or at least a name and a reference, or a relation of the concept in the data set to a known concept in TOS.
    • Matt gave the context of current EML handling of taxon data. EML is already designed to contain species name information. There are 9000 EML data records now, and a version upgrade is being planned. Matt: we need a full, well fleshed-out proposal before he can put the proposed changes into the EML maintenance process. The US Federal Biological Data Profile (BDP) wanted a classification tree in the metadata. The BDP does not allow repeatable taxonomic classifications, but EML does, to allow for taxa in different trees (e.g. animals and plants) in one record. Matt has asked the BDP to allow for multiple classifications, but the BDP committee is now largely defunct and the agencies no longer meet on this.

    • One possible plan (Matt): Taxon would need to tell the EML group what is minimally needed to meet the needs of the SEEK project.

Discussion of other standards that handle names: DC, ABCD and SDD should handle concepts. Standards need to be crosswalked. ABCD overlaps with EML on collections metadata. The NatureServe Observation group overlaps with occurrence data. There needs to be some coordination. Standards need the GUID bit and the human-readable reference. A small common concept citation (reference) schema of a few fields, shared across standards, would be very useful in addition to the taxon name.

Two options: modify EML to support the minimally required needs, or work with the community to get an agreement across projects.

We could really use a poster that describes the overlap and activity of these various infrastructure projects.

Discussion about using TSNs: ITIS and PLANTS lack versioning and consistency in their IDs, making them less valuable.

Whatever is proposed as required fields must be useful to end users; ecologists must see added value. Morpho has an ITIS plugin (ITIS-lib) to look up names in ITIS, and it grabs the synonyms when the record is stored. Getting the synonyms at cataloging time could be added functionality for the end user.

Morpho: we still need a way in Morpho to add concept data. Matt: the problem is how to convince people to add the data. Right now, code definitions (e.g. taxon name codes for a study, mapping data set codes to names in the data set) can be put into EML. The section on taxon coverage in EML does not currently handle mapping information of any kind.

Morpho: adding a TOS lookup to Morpho (Java, Swing), 4 weeks? A simple plugin that uses GetBestConcept from a lookup popup in the taxon data entry table. Xianhua might be able to work on this; his tool also needs to work with data from TOS, and that upgrade needs to be added.

UBER-Discussion

Actions:

  • Morpho
    • Nico will analyze the functional design requirements for modifying Morpho to add a TOS lookup during metadata record creation: a description of the placement, function, insertion and behavior of such a function (end of November).

    • Xianhua will nominally consider working on the implementation, using the existing GetBestConcept service (dependent on knowing the requirements from Nico, and after other tasks).

    • Xianhua will upgrade the Ranunculus data to the latest TCS (1-2 weeks). Bob will make his data available:

    • (Bob) In Jan or Feb 2006, all USDA PLANTS in version 4, mapped against all plants in FNA, and also all of the Alan Weakley collaboration, his version mapped against 8 different classifications of plants.

  • Bat Data Subproject for Demonstration Purposes

    • Objective: to exercise TOS and mapping, and to produce a nice demo.
    • Tasks and Subprojects:

      • Jessie: it would be useful to have bat treatments from before 1993 to map them to the 1993 treatment. Nico: we need a person to do that; we need a taxonomic expert to make those mappings.

      • MSW bat data need to be mapped between the 1993 and 2005 versions. Kate Jones demonstrated how they did the mapping in an interview with Nico and Laura; the actual mappings can be identified from explicit annotations in the 2005 publication. However, the publication notes will likely not be explicit as to the exact mapping operator, so we need someone to read the treatment and try to extract the mappings. We do not currently have a copy of the treatment. Diane Reeder and Don Wilson are the editors of MSW and created the source documents.

        • Beach will meet with Don Wilson on November 2nd to explain our need and interest, and will ask him for a copy of the 2005 MSW bat treatment.

        • Nico will then use Xianhua's mapping tool to author the relationships, reserving the right to pass on the task if it is overly complex. Nico will use the TCS-included subset of his symbolic annotation codes (dependent on receipt of the MSW Bats 2005 text from Beach).

      • Susan's project will look at the 100 treatments she has from since 1993 and will try to see whether the concepts in those were adopted and/or mapped in the 2005 treatment. We can then compare the actual 2005 tree with the pieces that would have been predicted from the treatments, and evaluate how close we get to the truth (January 2006).

  • PLANTS (Stinger)
    • Stinger will send the PLANTS database with county records (October 31).
    • Vieglais will put the county records into a DiGIR server (dependent on receipt of data from Stinger).
    • Updated PLANTS nomenclature for 4.0 to be given to Bob.(October 31)
    • Stinger will get archived version of PLANTS.(October 31)

  • Kepler-TOS Objectives (Gales)

    • Tasks
      • Rob will produce a GET-TAXA description of actors he is developing (November).
      • Version upgrade of TOS 1.01 (Gales and Stewart)
      • Rob will work with Matt to identify parameters for the ecological niche workflow.
      • Write one large "GET TAXA" TOS actor to handle what the three actors Rob now has do, without a lot of parameters, just the basic workflow needs (January).
      • The GET TAXA actor will get added parameters to handle the matching scenarios discussed on Wednesday: concept overlaps, transitive links, etc.
      • Additional actors need to be specified based on requirements of the other workflows.

      • Set up the SEEK-ITTC server for TOS.
      • Hibernate upgrade and TOS deployment; also see Aimee and Rob's list from the week before the meeting.
      • Web application to take a GUID and output a subtree of all related concepts and descendants, for ConceptMapper queries on TOS.

  • ConceptMapper Tool Objectives (Xianhua)

    • Translate Alan Weakley's Excel data into TCS (timing unknown), send it to KU, and import it into TOS
    • Incorporate Laura's heuristic evaluation comments and the results of the task analysis not currently in the tool (Nico and Laura have the information) (2 weeks)
    • ConceptMapper modified to import TCS documents
    • ConceptMapper modified to export TCS documents
    • ConceptMapper modified to create concepts

    • User testing planned for Q1 2006 (Laura, Xianhua, Bob?, Alan, Brett, Bat Person, Nico?). Nico will organize the schedule, Laura will organize the entire testing script, Xianhua will organize the software, and Bob will need to make local arrangements. Jim will confirm with Michener and Griego.

    • Load the Ranunculus data and prepare a scenario and demonstration for taxonomists/ecologists to compare taxa, to help resolve concepts (dependent on receiving the data in TCS).

    • Explore the effect of the granularity of matching. If you match only on concepts, what do you discover? If you match on names, what knowledge do you discover? Etc. Pass that document to Laura and Bob to plan an evaluation session with Bob in early 2006.

  • GUID Issues
    • TDWG Workshop in February, Attending: Peet, Gales or Perry, Spears, Kennedy(?)
    • Need to discuss how to configure GUID services for SEEK, which data get returned?

  • Future Meetings

    • Next SEEK-Taxon conference call: November 22, 2005, 4PM GMT, 11AM EST, 10AM CST, 9AM MST, 8AM PST

    • SEEK Early Faculty Ecoinformatics Training for Ecologists, January 2006.
      • Nico presenting SEEK taxon concept talk

    • SEEK Developers Meeting, April 30--May 5, 2006

    • SPNHC Meeting, May 23-27, Albuquerque
      • good opportunity for outreach to collections managers and museum directors

    • Workshop/Feedback Event
      • Tentatively planned for May 2006, to overlap with SPNHC? Specify, usability interviews and testing; not a 5-day general informatics meeting
    • DEADLINE to decide: two weeks

    • Ecological Society of America Annual Meeting, 6-11 August 2006.
      • 3500-4000 scientists, Memphis, Tennessee
      • Things to potentially demonstrate: Taxon Comparison tool; Kepler ENM use case with two different classifications (Peterson, two bird classifications); biological reserve planning demonstration; ConceptMapper. Need a very large graphic display! Try a comparison of the 1993 and 2005 bat data with distribution data.

    • TDWG, October 2006, somewhere in the U.S.: St. Louis or Durham.
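The web application listed under Kepler-TOS Objectives, which takes a GUID and outputs a subtree of related concepts and descendants for ConceptMapper queries, rests on a graph traversal. A minimal Python sketch of the traversal only, assuming hypothetical in-memory `children` and `related` maps keyed by GUID; a real service would read these links from the TOS database, and no web framework or TOS API is implied:

```python
from collections import deque


def concept_subtree(root_guid, children, related=None):
    """Collect GUIDs reachable from root_guid, breadth first.

    children: hypothetical dict mapping a GUID to a list of child GUIDs
    related:  optional hypothetical dict mapping a GUID to a list of
              related-concept GUIDs (e.g. concept relationship links)
    Returns GUIDs in visiting order, starting with root_guid.
    """
    related = related or {}
    seen = {root_guid}
    order = []
    queue = deque([root_guid])
    while queue:
        guid = queue.popleft()
        order.append(guid)
        for nxt in children.get(guid, []) + related.get(guid, []):
            # Relationship links can form cycles, so visit each GUID once.
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order
```

Following relationship links transitively, as this sketch does, also pulls in the descendants of related concepts; whether the service should do that is one of the matching-scope choices still to be specified.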

Thursday Afternoon

Review of objectives and timetables (above). Stinger discussed the availability of concepts with character data for TOS: grasses from various projects, family lists.

Taxon-related Plenary Notes

  • Need a set of scenarios for the different tasks we are trying to improve and/or enable for our various user groups. They are in peoples' heads and discussed but not really written down anywhere.

  • TOS user interaction:
    • could be embedded within data search so the user can decide which data set to get.
    • or used to extract the rows from a data set?
    • What do the user interfaces for the user to interact with look like?
      • What about asking user to pick the best match?
      • What about asking user to rank the authoritative sources then having the system do the concept resolution based on that?
      • Have user specify some level of precision for matching/concept resolution etc.?

  • Establish plan for showing Martin's visualization to collection managers. This is a near term activity. We will structure the feedback and conduct the feedback most likely remotely using technology. General plan is demo the product, demonstrate the current tasks it can support, then get user feedback on tasks they would like to do that it doesn't support.

  • ConceptMapper: change from connecting to the DB to reading and writing TCS documents

  • Kepler actors:
    • There is an actor that does the querying of data and returns concepts (for several species) so this would replace the user using the data tab and getting results and then dragging that data set on to the canvas. (Laura's question). This would then feed into the ENM workflow.
    • Should we have a tool outside or within Kepler for users to configure the data (based on TOS)? Users could configure the actor to fire automatically or manually.
    • Jessie feels strongly that the searching should be part of the workflow: why is there a separate data tab?
      • One reason is because a search returns multiple objects and then a decision needs to be made of which data sets to use. This was seen as a separate step since the workflow objects/actors are seen as configurable but not necessarily interactive.
      • There are some technical issues within Kepler that have prevented more interactive actors.
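The TOS user interaction idea of asking the user to rank the authoritative sources and then having the system do the concept resolution can be sketched as "take the concept from the highest-ranked source that has one". A minimal Python sketch, assuming a hypothetical candidate structure (name to `{source: concept_id}`) and arbitrary source labels; it is not a TOS interface:

```python
def resolve_by_ranked_sources(candidates, source_ranking):
    """Pick one concept per name using the user's source ranking.

    candidates: hypothetical dict mapping a name to {source: concept_id}
    source_ranking: list of source labels, most authoritative first
    Names with no candidate from any ranked source map to None, so they
    can be flagged for the user to pick the best match by hand.
    """
    resolved = {}
    for name, by_source in candidates.items():
        resolved[name] = next(
            (by_source[src] for src in source_ranking if src in by_source),
            None,
        )
    return resolved
```

Returning None for unresolved names keeps this compatible with the other option discussed, asking the user to pick the best match, for just the leftover cases.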


Attachments:


This page last changed on 01-Nov-2005 12:18:57 PST by KU.stewart.