Science Environment for Ecological Knowledge

Taxon WG Meetings In Edinburgh Agenda



SEEK Taxon Meetings - Preliminary Schedule


Monday-Tuesday, May 10-11

1. Transfer Schema Report - Dry Run

  • our internal presentation and discussion related to the transfer schema; progress, issues, outlook, etc.
  • coordinators: 'Jessie Kennedy, Robert Kukla'

2. Use Cases

  • review the use case revisions prepared by Nico in light of the subsequent concept mapping activities
  • coordinators: 'Nico & Bob'

3. Concept Synonymy

  • Bob & Nico have tried "name/concept" translation with real datasets and now have some experience with it
  • handling "concept synonymies" (including P/C relations) seems to be one of the harder problems (not conclusively solved even in the moss dataset)
  • a set of rules for taxonomists to deal with these issues seems necessary
  • coordinators: 'Nico & Bob'

Peet_CaryaDemo.ppt: Peet -- Carya: Lessons from constructing a demonstration database of taxonomic concept data

Peet_WorkFlow.ppt: Peet -- Use case lessons: Components of the SEEK architecture (cancelled for lack of time)

4. GUIDs

  • what is our progress and position on this topic prior to coming to the eScience Workshop?
  • coordinator: 'Dave Thau'

5. API Comparison - KU Taxon Efforts vs. SMS Working Group

  • what are the differences/similarities; 1 or several APIs?
  • coordinator: 'Susan Gauch'

6. Peer Review System (cancelled for lack of time)

  • description of the peer review system for taxon and community concept taxonomy being developed by Bob and Xianhua
  • coordinator: 'Bob'


Additional topics - not yet allocated

X1. Visualisation Tools

  • demo of what we've done and discussion of what visualisation tools we'd like to have - for a grant application to EPSRC to support Martin Graham working collaboratively with SEEK; duration: 1 hour max including discussion; possible date: 'Tuesday morning?' (would be ok)

X2. Schema Mapping Tool

  • demo of a project by Napier student Marc Schaffer, working with Robert, which attempts to take an XSD, let users define mappings from their database to the schema elements, and then automatically generate a valid XML document representing the database in terms of the XSD; 'possibly 30 mins at lunch on Monday or Tuesday' (either would work)
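The mapping step described for the schema mapping tool can be sketched roughly as follows. This is a minimal illustration, not the Napier tool: it assumes a flat, hypothetical mapping from schema element names to database columns (FIELD_MAP, rows_to_xml and the element names are invented here), uses an in-memory SQLite table with an invented sample row as the "database", and does not validate the output against an XSD, which the Python standard library cannot do.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping from schema element names to database columns.
# The element names are illustrative only, not taken from a real XSD.
FIELD_MAP = {
    "Name": "name",
    "Rank": "rank",
    "AccordingTo": "author",
}


def rows_to_xml(conn, table, root_tag, record_tag, field_map):
    """Emit one record element per table row, with one child element per mapped column."""
    root = ET.Element(root_tag)
    # The table name is interpolated directly, so it must come from trusted configuration.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        element = ET.SubElement(root, record_tag)
        for element_name, column in field_map.items():
            child = ET.SubElement(element, element_name)
            value = record.get(column)
            child.text = "" if value is None else str(value)
    return root


# Usage with an invented sample row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxa (name TEXT, rank TEXT, author TEXT)")
conn.execute("INSERT INTO taxa VALUES ('Carya ovata', 'species', 'Author 1900')")
root = rows_to_xml(conn, "taxa", "TaxonConcepts", "TaxonConcept", FIELD_MAP)
print(ET.tostring(root, encoding="unicode"))
```

A real tool would also need nested elements, repeated elements and a validation pass against the target XSD. The flat dictionary only shows the core idea of user-defined mappings that drive document generation.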

Trevor's notes on Tuesday's presentations (which were actually from Monday's itinerary...)

Dave Thau talk on GUIDs

  • for presentation to the wider taxonomic community tomorrow at the e-Science meeting
  • therefore we need to make sure this is not a SEEK-specific solution addressing SEEK-specific requirements, but rather something of benefit to the wider taxonomic community
  • investigation of the need for, benefits of and possible solutions for GUIDs
  • everyone agreed they are necessary (for at least some of the entities referred to by the Taxonomic Concept Server)

ie
  • Publication, Taxonomic Concept, Vouchers

less clear if we need/want them (or they are feasible) for
  • Data Providers, Authors and Journals

  • Dave Vieglais confirmed that Voucher IDs would be provided by DiGIR for every specimen

  • Who assigns GUIDs, and how and when, is somewhat political (best avoided in a general discussion); SEEK can choose to implement its own prototype system anyway

  • recognized need to develop good policies/business rules for GUID assignment

    • what is a new/unique concept
    • how different/new does it have to be
    • what are insignificantly minor changes that don't warrant new IDs (eg typos)

  • is ownership an issue? (Nico wonders who can change or even reference an author's concepts, but this is insoluble for old concepts)

  • is GUID explosion a problem we should beware of? Bob thought it might be, on the basis of his analysis of North American hickories

  • Jessie pointed out that GUIDs don't solve resolution problems for relationships between concepts, and can just shift the burden to other parts of the system

  • Generally recognized that any system for entering new concepts and assigning new GUIDs would ideally identify potential trivial conflicts (eg with similar existing concepts) and ask the user whether they really want a new concept to be created or would rather accept a pre-existing similar (or better) one

  • Dave T. showed HANDLE GUIDs to be probably a better candidate solution than the other main candidate (LSIDs), mainly because they are more mature and better supported, and make some centralized gatekeeping possible

  • gatekeeping can be delegated though: whilst SEEK could handle its own GUIDs, other users (eg taxon DBs) can independently have GUIDs

  • there was some discussion over what a 'one GUID = one concept' rule would mean, how it would work, and whether it was a critical idea; most thought it was a good goal, since it simplifies maintenance and resolution and helps constrain the problem

  • Bob raised his ideas about Natural Entities/Super Concepts again at this point: could the 'name+reference' component of a Taxon Concept have its own independent GUID, allowing it to be used as a less specific concept? Others felt that this moves us away from the concept-based approach back to a name-based approach. Taxonomic concepts representing Bob's concepts could be constructed in the present schema as name+reference (Bob) + included concepts, making these larger, less specific, natural entities

  • Later on there was discussion of whether it was envisioned that data markup (EML) might ever reasonably include GUIDs for concepts, and whether 'tool' support for this would be useful or used. How would you guide the user in picking a concept? (rules favouring good, complete, recent concepts) Would you allow a user to pick multiple SEEK GUIDs to match his data concept (ie it might be GUID_x, GUID_y or GUID_z, and the user can't decide, doesn't care, or thinks that it is all of them)? Bob's super concepts might be useful to a working field ecologist here; alternatively you just choose to follow a given authority, ITIS for example
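The conflict check discussed above (identify similar existing concepts before a new GUID is created) might look roughly like the sketch below. Everything in it is an assumption for illustration. Concepts are reduced to sets of constituent identifiers. Similarity is a Jaccard score with an arbitrary 0.8 threshold. The class and function names are hypothetical. A random UUID URN stands in for the Handle or LSID identifiers that were actually under discussion.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    name: str
    according_to: str
    constituents: frozenset  # e.g. specimen or child-concept identifiers (assumed)
    guid: str


def jaccard(a, b):
    """Share of constituents two concepts have in common (1.0 = identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class ConceptRegistry:
    """Hypothetical registry that checks for similar concepts before minting a GUID."""

    def __init__(self, threshold=0.8):
        self.concepts = []
        self.threshold = threshold

    def find_similar(self, constituents):
        scored = [(jaccard(constituents, c.constituents), c) for c in self.concepts]
        hits = [pair for pair in scored if pair[0] >= self.threshold]
        return sorted(hits, key=lambda pair: pair[0], reverse=True)

    def register(self, name, according_to, constituents, force=False):
        constituents = frozenset(constituents)
        candidates = self.find_similar(constituents)
        if candidates and not force:
            # No GUID is minted; the caller should ask the user whether an
            # existing concept is acceptable before retrying with force=True.
            return None, candidates
        # A UUID URN stands in for a Handle or LSID here.
        guid = "urn:uuid:" + str(uuid.uuid4())
        concept = TaxonConcept(name, according_to, constituents, guid)
        self.concepts.append(concept)
        return concept, candidates


registry = ConceptRegistry(threshold=0.8)
first, _ = registry.register("Carya ovata", "Author A 1900", {"s1", "s2", "s3", "s4", "s5"})
# 5 shared constituents out of 6 in total gives about 0.83, above the threshold.
second, candidates = registry.register(
    "Carya ovata", "Author B 2000", {"s1", "s2", "s3", "s4", "s5", "s6"}
)
```

The registry returns the candidates instead of silently minting a GUID, which mirrors the idea of asking the user whether an existing concept will do. Creating a new concept anyway remains an explicit choice (force=True).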

Nico Franz's talk on concept synonymy

Franz -- Comparing a trad./concept-implementing catalog; and: "constituents vs. properties"

  • addressed the complexity of how one recognizes valid (new) taxConcepts, especially translating from existing data

  • which may have concepts attached to unavailable names, concepts described with varying richness, and a variety of correlations/relations between concepts

  • synonymy: not all synonym relationships seem acceptable
    • can be real/true (eg just a typo)
    • but might be apparently true at some level, but at a deeper level not true
    • how deep do you have to go for synonym: eg one taxonomic rank of constituents (Nico thinks this is appropriate)

  • there is no need, or not enough information, to translate all names used into valid concepts

  • The German mosses revision is the best detailed example of a concept based revision to date

  • even this shows some problems and inconsistencies interpreting the various synonymy/congruence relationships

  • we need to get a complete list of these relationships for the schema, from these sources: this moss work, Bob Peet, and Berendsohn's Taxon paper (if we can understand them all! Jessie/Robert should ask Berendsohn to explain)

  • Congruence is measured differently by humans and machines: we can choose to consider a mix of constituents (child circumscription?) and properties, but machines only really handle the former. However, the way congruence is recognized and described depends somewhat on taxonomic rank: at higher ranks, property/character circumscription is easier/more natural for humans

  • Dave T. raised a question of geographical restriction: if one body of work asserts that two concepts are related/synonymous in one country, this might not be true for a different geographic range, where the concepts might not be considered synonymous

    • is it necessary to include geographical info on concepts/assertions
    • where/how in the schema?
    • is this necessary for reasoning/SMS

  • Jessie tried to convince everyone that resolution would be between the concepts, and that the direction of a synonymy relation would be important, especially when considered historically: according to one revision/view two taxa might be synonymous and according to another revision they might not be, but each would be a real observation. If it were possible to traverse the synonymy relation in either direction, the system would find that yes, the concepts had been related somehow in some circumstance...

  • change in time is also important: older workers might only have been able to consider fewer species/specimens etc., so simple numeric measures of the child constituents of a concept would give poor measures of congruence between old and new concepts
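One way to make the "constituents" view and the point about direction concrete is sketched below. It is an illustration under stated assumptions, not the schema's method. Concepts are reduced to sets of constituent identifiers, which is the part machines can handle. The five relation labels are an assumed vocabulary of set relations. The names relate, Assertion and inverse are hypothetical. Each assertion records which revision makes it, so opposing views can coexist and still be read in either direction.

```python
from dataclasses import dataclass

# Inverse of each relation, so an assertion can be read in either direction.
INVERSE = {
    "is congruent with": "is congruent with",
    "includes": "is included in",
    "is included in": "includes",
    "overlaps": "overlaps",
    "excludes": "excludes",
}


def relate(a, b):
    """Classify how the constituent set of concept a relates to that of concept b."""
    a, b = frozenset(a), frozenset(b)
    if a == b:
        return "is congruent with"
    if a > b:
        return "includes"
    if a < b:
        return "is included in"
    if a & b:
        return "overlaps"
    return "excludes"


@dataclass(frozen=True)
class Assertion:
    subject: str
    relation: str
    target: str
    according_to: str  # the revision/view that makes this assertion


def inverse(assertion):
    """The same assertion, read from the target concept back to the subject."""
    return Assertion(assertion.target, INVERSE[assertion.relation],
                     assertion.subject, assertion.according_to)


old = {"sp1", "sp2"}                       # older concept: fewer specimens examined
new = {"sp1", "sp2", "sp3", "sp4", "sp5"}  # newer concept
claim = Assertion("X sec. Old", relate(old, new), "X sec. New", "New revision")
print(claim.relation)           # is included in
print(inverse(claim).relation)  # includes
```

For these two invented concepts, a count-based score such as Jaccard would give only 2/5 = 0.4. The set relation still records that the older concept lies wholly within the newer one, which is the kind of information a simple numeric measure loses.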

Bob Peet case study of North American Hickories: mapping concepts

  • Bob's study back-projected how ITIS seemed to have modelled/used concepts through various versions of their database: using recognized description authorities, and changing when a better authority became available

  • nicely illustrated the explosion of concepts when moving from 22 taxa to 500+ concepts

  • whilst this approach is integrating old data: it is also projecting future preferred use: ie recommending what synonyms to use

  • for any mark up of old data it may be necessary to make informed guesses for some taxonomic concepts: but this should be clearly 'authored' by the modern worker

  • Bob sees his approach as extracting/defining/using Berendsohn's potential taxa; Jessie sees it more as creating new concepts, which contain/are synonymous with various older concepts and are now sec. Peet 2004

  • Dave V. thought this was old ground that we couldn't progress on until we got more data sets to do real implementations with: they need to develop their prototypes with various test data at this stage

Susan Gauch et al

  • only had time for a very brief overview of the Kansas development work on TCS APIs

  • main limitations to their progress are

    • lack of taxonomic concept providers
    • lack of data sets
    • lack of real algorithms/requirements for taxonomic comparisons/searches (until then they can only put in simplistic functionality)

      • eg how deep should searches go, or to what distance?
      • should it be possible for a user to specify depth?
      • will loops be a problem once we have multiple concept hierarchies?
      • what will the measures of similarity be: children at one depth? parents?
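The depth and loop questions above can be illustrated with a plain breadth-first traversal. This is only a sketch under assumptions. The concept graph is a hypothetical adjacency dict whose edges might be parent/child or synonymy links. max_depth is exposed as a caller-supplied parameter, which is one possible answer to whether users should specify depth. related_within is an invented name, not part of the KU API. Tracking visited concepts is what keeps a loop from causing an endless traversal.

```python
from collections import deque


def related_within(graph, start, max_depth):
    """Breadth-first search returning {concept: hop count} up to max_depth hops."""
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        concept = queue.popleft()
        depth = depth_of[concept]
        if depth == max_depth:
            continue  # do not expand past the requested depth
        for neighbour in graph.get(concept, ()):
            if neighbour not in depth_of:  # skip visited concepts, so loops terminate
                depth_of[neighbour] = depth + 1
                queue.append(neighbour)
    return depth_of


# A loop A -> B -> C -> A, as could arise once several hierarchies are merged.
graph = {"A": ["B"], "B": ["C"], "C": ["A", "D"]}
print(related_within(graph, "A", 2))  # {'A': 0, 'B': 1, 'C': 2}
print(related_within(graph, "A", 3))  # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
```

Hop count is only the crudest similarity measure. Whether children, parents or particular relation types should be weighted differently is exactly the open question listed above.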

Shawn Bowers and Dave T

  • had only a couple of minutes to demo a data discovery prototype integrating taxonomic concept resolution (in a dummy format), KR, and SMS, using a bespoke ant ontology written up in Sparrow

  • some concern about the scalability of an approach that mines the data itself, not just the metadata

  • some confusion about where the divisions in functionality lie between SMS and TCS

  • some discussion of whether a separate tool for taxonomic mark-up of data is appropriate (no) or would be a transparent plug-in calling up TCS functionality as far as the user is concerned (yes)


Topics

  • At this point we plan to have 'four half-day sessions,' two each on May 10 and on May 11. Lunch seems to be scheduled from 12:30 to 14:00, so we could have morning sessions from 09:00 to 12:30, and afternoon sessions from 14:00 to 18:00. 'On Tuesday the session finishes at 15:30 for the plenary session.'


Participants

Please add/subtract your name if you are (or aren't) planning to attend the pre-eScience SEEK Taxon Meetings

  • Nico Franz (NCEAS)
  • Robert Gales (KU)
  • Susan Gauch (KU)
  • Martin Graham (Napier)
  • Matt Jones? (NCEAS)
  • Jessie Kennedy (Napier)
  • Robert Kukla (Napier)
  • Trevor Paterson (Napier)
  • Bob Peet (UNC)
  • Rod Spears (KU)
  • Aimee Stewart (KU)
  • Dave Thau (KU)
  • Dave Vieglais (KU)
  • Shawn Bowers (SDSC)


Attachments:
Peet_CaryaDemo.ppt (76800 bytes)
ConceptSynonymies1.ppt (8197120 bytes)
Peet_WorkFlow.ppt (54784 bytes)


This particular version was published on 29-Jun-2004 14:21:10 PDT by LTER.stekell.