Science Environment for Ecological Knowledge

Taxon Meeting_23_January_2004


Day 1 Fri

January 23, 2004

Participants: Peet, Kennedy, Pyle, Kukla, Lee, Franz, Liu, Anderson

Peet presentation

  • Powerpoint = VegBank_taxon_Jan2004.ppt

Day 2 Sat

January 24, 2004

Participants: Peet, Kennedy, Jones, Pyle, Kukla, Lee, Franz, Liu, Anderson Notetaker: Jones

Pyle presentation

R. Pyle gave an overview of the FishBase concept model. Need to upload slides from Rich Pyle.
  • Draft paper at http://www2.bishopmuseum.org/schema/Phyloinformatics-Pyle.pdf (soon to be formally published & available at http://phyloinformatics.org)
  • Protonym -- an Assertion that also was the original use of a name (replaces Name)
  • name author and concept author can be distinct. Protonym always contains the name author. See Pyle examples.
  • variant spellings end up encoded in the assertion table
  • one type specimen can be used to support multiple concept definitions
  • critical to have slots that reference the type specimen for a concept
  • the cache field sensu Berendsohn can contain information not present in "cached" fields, and so differs from Pyle's "cheat" fields, which are purely derived from the parsed data fields
  • discussion of whether all assertions need to be in a centralized database -- Rich wants GUIDs for everything; Jessie and Matt argued that full centralization is unrealistic in the extreme case of including all uses of taxa (e.g., in ecological data sets)
  • Rich says all GUIDs should come from the same global scope, but that storage need not be centralized
  • pools of GUIDs for a) reference, b) name (text string or *original description*?), c) intersection of name and reference (assertion/concept, ...); using all three allows implementation flexibility, but we agree that probably only two (Assertions & References) are needed to represent the information, and two *may* allow the same level of implementation flexibility (need to think this through)
  • mapping domain among assertions classified as: unspecified, congruent, includes, included in, overlaps, excludes (see http://www.bgbm.org/BioDivInf/Projects/MoreTax/standard_liste_en.htm for a formal treatment)
  • museum determination for specimens is very similar to ecological observations (Rich called ecological observations "unvouchered specimens")
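
As an illustration of the points above, the assertion model and the mapping categories could be sketched in Python roughly as follows. This is only a sketch and not the schema from Pyle's paper: the class and field names (Reference, Assertion, Mapping, protonym_guid) are assumptions made here, and the identifiers and citations are made-up placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MappingType(Enum):
    """Relationship categories between assertions, as listed in the notes."""
    UNSPECIFIED = "unspecified"
    CONGRUENT = "congruent"
    INCLUDES = "includes"
    INCLUDED_IN = "included in"
    OVERLAPS = "overlaps"
    EXCLUDES = "excludes"


@dataclass(frozen=True)
class Reference:
    guid: str
    citation: str


@dataclass(frozen=True)
class Assertion:
    """Intersection of a name and a reference (field names are hypothetical)."""
    guid: str
    name_string: str      # spelling as used in the reference
    reference: Reference
    protonym_guid: Optional[str] = None  # None means this assertion is the protonym

    @property
    def is_protonym(self) -> bool:
        return self.protonym_guid is None


@dataclass(frozen=True)
class Mapping:
    source: Assertion
    target: Assertion
    kind: MappingType = MappingType.UNSPECIFIED


# Only the two "includes" directions swap; the other categories are symmetric.
_INVERSE = {
    MappingType.INCLUDES: MappingType.INCLUDED_IN,
    MappingType.INCLUDED_IN: MappingType.INCLUDES,
}


def inverse(mapping: Mapping) -> Mapping:
    """Return the same relationship read from target to source."""
    kind = _INVERSE.get(mapping.kind, mapping.kind)
    return Mapping(mapping.target, mapping.source, kind)


# Made-up placeholder data, not real taxa or publications.
ref_a = Reference("ref-1", "Author A, original description")
ref_b = Reference("ref-2", "Author B, later revision")
original = Assertion("a-1", "Aus bus", ref_a)
later_use = Assertion("a-2", "Aus bus", ref_b, protonym_guid="a-1")
broader = Mapping(later_use, original, MappingType.INCLUDES)
```

In this sketch only References and Assertions carry GUIDs, which corresponds to the two-pool option mentioned above; a third pool for names would add a separate identified Name class.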

Quick overview of use cases

Categories of potential users:

  1. Taxonomist -- in the business of creating, modifying, and labeling concepts, including phylogenetic information
  2. Synthesizers of taxon data -- authorities who maintain lists and trees
    • indexers of names who don't make authority recommendations (e.g., IPNI, ZooRecord, SEEK, GBIF)
    • aggregators who make authoritative checklists (e.g., ITIS, USDA Plants)
      Use cases: DataLocationRegistrationUseCase
  3. Taxonomic data users who make determinations/identification
  4. Aggregators/providers of ecological and biodiversity data (e.g., SEEK, GBIF, Lifemapper)
    Use cases: DataLocationRegistrationUseCase
  5. Biodiversity & Ecological data users (data has name or concept information included)

Day 3 Sun

January 25, 2004

Participants: Peet, Kennedy, Jones, Pyle, Kukla, Franz, Liu, Anderson Notetaker: Jones

Started by completing our discussion of TaxonUseCases.

Draft Agenda for rest of week

Mon morn

  • 8:30 - 10:00 Brief Reports from current activities Pyle (15), Bob (5), Jessie (15), DaveV DiGIR(15), Susan, RobertG, and Amy IR (30)
  • 10:15 - Noon Ontology and Reasoning approach to concept system (Thau, Shawn)

Mon aft

  • 1:15 - 4:00 System architecture and product drawing
  • 4:00 - 6:00 Discuss GUID issues and decide on implementation approach (Thau, Matt)

Tue morn

  • 8:30 - Noon Consolidate / Finalize / Rewrite / Prioritize Use cases in context of system architecture

Tue aft

  • 1:15 - 4:00 System architecture and product drawing
  • 4:00 - 6:00 Generate deliverable products list

Wed morn

  • 8:30 - Noon Review Architecture, generate task assignments and target milestones
    Identify person(s) to gather demonstration data sets (e.g., Angelfishes)

Wed aft

  • Informal discussions among remaining participants

Day 4 Mon

January 26, 2004

Participants: Gauch, Beach, Peet, Anderson, Kennedy, Kukla, Thau, Pyle, Jones, Bowers, Stewart, Franz, Liu, Trajkova

Started with brief progress reports from each group:

  • ...
  • Gauch: concept retrieval service that pulls data from ITIS; has a website showing the mapping between the ITIS fields and the current SEEK taxon model (http://www.ittc.ku.edu/SEEK)
  • Franz: deep versus shallow definitions of a concept. Deep concepts are formally defined; shallow concepts might only be defined as an assertion from having been used (e.g., an identification event)
  • Trajkova: document outlining the 7 ways in which OWL can be used to model biological taxonomy; documents are now in CVS and accessible here: http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/seek/projects/taxon/docs/ontology/
  • Gayles: architecture diagram for taxon name service and interactions with other SEEK components such as EcoGrid (will provide document)

Thau gave overview of ontologies for taxonomic information

Discussion of system architecture centered around diagrams provided by Susan

Beach: suggests taxonomic editing/revision tool is a good way to get people to enter concepts, but need to motivate them to participate

Day 5 Tue

January 27, 2004

Participants: Gauch, Beach, Peet, Kennedy, Kukla, Thau, Pyle, Jones, Bowers, Stewart, Franz, Liu, Trajkova, Gayles

Thau overview of GUID approaches

  • UUID -- algorithmic hash that can generate unique IDs; non-readable and long; no resolution information built into the UUID (e.g., no clues about where it was generated)
  • LSID and URIs: one part is the domain, the second part is a local ID; the resolver is based on the DNS system (e.g., urn:lsid:ku.edu:ecogrid:3214); a disadvantage of LSID is that resolution depends on DNS hosts, which can disappear, causing system instability
  • DOI: global and local parts; IDF is the authority and collects a fee for the authority and for each ID; the global part is not particularly readable; uses the CNRI Handle System (http://www.handle.net/), which can be used independently of DOI
  • See PLOS article on public DOIs at http://www.plosbiology.org/plosonline/?request=get-document&doi=10.1371%2Fjournal.pbio.0000057
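
To make the differences between the three identifier styles concrete, here is a small Python sketch. It uses only the standard library; the helper names (parse_lsid, split_doi) are made up for this illustration, and it does not resolve anything over the network. The LSID and DOI values are the ones quoted in the notes above.

```python
import uuid
from typing import NamedTuple, Tuple
from urllib.parse import unquote


class LSID(NamedTuple):
    authority: str   # DNS name of the issuing authority
    namespace: str
    object_id: str   # local ID within the namespace


def parse_lsid(text: str) -> LSID:
    """Split an LSID of the form urn:lsid:authority:namespace:object."""
    parts = text.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(parts[2], parts[3], parts[4])


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into its global part (prefix) and local part (suffix)."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


# A random UUID: unique, but it says nothing about where it was generated.
random_id = str(uuid.uuid4())

# The LSID example from the notes carries its DNS authority in the ID itself.
lsid = parse_lsid("urn:lsid:ku.edu:ecogrid:3214")

# The DOI from the PLOS link above, after decoding the URL-escaped slash.
doi_prefix, doi_suffix = split_doi(unquote("10.1371%2Fjournal.pbio.0000057"))
```

The sketch shows why the choice matters for resolution: the LSID exposes an authority (ku.edu) that a resolver could look up, the DOI exposes a registered prefix, and the UUID exposes nothing.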

What are the outstanding issues:

  • is there a verification system for ID assignments that checks that the name/ref pair doesn't already exist
  • GUIDs: Jessie argues for centralization of creation of concepts (i.e., to be issued a GUID, a data provider must ask a central authority if that concept already has been entered; if not, a GUID is issued, if so the existing GUID is sent back)
  • Jessie: need to separate out the idea of the person who authors a concept from the person who enters the existence of a concept into a data system
  • Jessie: need ability to change attributes of a concept, which will probably result in a new GUID which is a "duplicate" of the original (with the typo fixed), so should extend concept correlation table to include "DUPLICATE" as a correlation level (in addition to congruence, includes, included in, intersects, excludes)
  • revision semantics, lineage could and maybe should be tracked
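
The centralized issuing rule and the proposed "DUPLICATE" correlation level can be sketched as follows. This is a minimal in-memory sketch under stated assumptions, not a design decided at the meeting: the class name ConceptRegistry, its methods, and the use of random UUIDs as GUIDs are all hypothetical, and the name and reference values are placeholders.

```python
import uuid
from typing import Dict, List, Tuple


class ConceptRegistry:
    """Hypothetical central authority that issues one GUID per name/ref pair."""

    def __init__(self) -> None:
        self._guid_by_pair: Dict[Tuple[str, str], str] = {}
        # (older GUID, newer GUID, correlation level)
        self.correlations: List[Tuple[str, str, str]] = []

    def request_guid(self, name: str, reference_id: str) -> Tuple[str, bool]:
        """Return (guid, created); an existing pair gets its existing GUID back."""
        key = (name, reference_id)
        if key in self._guid_by_pair:
            return self._guid_by_pair[key], False
        guid = str(uuid.uuid4())
        self._guid_by_pair[key] = guid
        return guid, True

    def correct(self, old_guid: str, corrected_name: str, reference_id: str) -> str:
        """Changing an attribute yields a new GUID linked to the old one as DUPLICATE."""
        new_guid, _ = self.request_guid(corrected_name, reference_id)
        self.correlations.append((old_guid, new_guid, "DUPLICATE"))
        return new_guid


registry = ConceptRegistry()
first, created = registry.request_guid("Aus buss", "ref-1")       # entered with a typo
again, created_again = registry.request_guid("Aus buss", "ref-1")  # same pair asked twice
fixed = registry.correct(first, "Aus bus", "ref-1")                # typo fixed later
```

Keeping the correlations as an append-only list is one simple way to retain the revision lineage mentioned in the last bullet.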

What can we assume:

  • Jessie: DOIs will exist (eventually) for every reference
  • Jessie: Specimen can use LSID
  • Jessie: need mechanism for concept IDs
  • Jessie: shouldn't be able to get a GUID for a concept twice
  • Rich: which system we use partly determines how ids can be assigned
  • GUID resolver service needs to check for already entered GUIDs based on similarity and issue warnings to the issuer
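
One possible reading of the similarity check in the last point is a fuzzy comparison of the submitted name against names that already have GUIDs. The sketch below uses Python's standard difflib for this; the function name similar_entries, the 0.85 cutoff, and the placeholder names and GUIDs are assumptions for illustration only, and a real service would presumably compare references as well.

```python
import difflib
from typing import Dict, List


def similar_entries(name: str, registered: Dict[str, str], cutoff: float = 0.85) -> List[str]:
    """Return GUIDs of registered names whose spelling is close to `name`."""
    by_lower = {existing.lower(): guid for existing, guid in registered.items()}
    close = difflib.get_close_matches(name.lower(), list(by_lower), n=5, cutoff=cutoff)
    return [by_lower[match] for match in close]


# Placeholder registry contents (name -> GUID), not real data.
registered = {"Aus bus": "guid-1", "Cus dus": "guid-2"}

# A near-identical spelling should trigger a warning for the issuer.
warnings = similar_entries("Aus buss", registered)
```

The issuer would then decide whether the new entry really is distinct or should reuse the GUID that was flagged.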

Day 6 Wed

January 28, 2004

Participants: Beach, Peet, Kennedy, Kukla, Thau, Pyle, Jones, Bowers, Stewart, Franz, Liu, Trajkova, Gayles

Target Milestones

  • ES: Escience meeting May 2004
  • TDWG: TDWG October 2004
  • PA: Prototypes Available January 2005
  • SR: Site review (April 2005?)

Tasks

  1. ES: Discussions with various ITIS partners about status and involvement (Jessie, Bob, Rich, Dave)
    • Pyle can be working on interfacing with ITIS Pacific node (report back to SEEK Taxon)
    • Jessie talks to Paula
    • Bob talks to USDA (Mark and Scott)
    • Dave contacts ITIS Canada participants
  2. ES: First draft of schema for presentation, seek-taxon reviews ahead of time based on call for reviews (Jessie)
  3. TDWG: Final SEEK schema so that we can make progress (Jessie)
  4. Finish architecture diagram as blueprint of activities (Aimee, RobertG, DaveT, draft end of Feb 2004, more stable by end of March 2004) (see http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/~checkout~/seek/projects/taxon/docs/taxon-arch-diagrams-20040126.pdf?content-type=application/pdf)
  5. Finish SEEK Taxon Use Cases document to be consistent with architecture diagram (Nico, Bob, Matt, Dave) (aim for middle to end of March 2004) (see http://seek.speciesanalyst.net/ow.asp?TaxonUseCasesTemplatesMarch04)
  6. Draft interface definitions in WSDL (Aimee, RobertG, DaveT, next several months)
  7. ES: Proposed Interface definitions in WSDL with full documentation (Aimee, RobertG, DaveT)
  8. ES: Write a whitepaper describing the data and API interfaces with SMS, determine which system does which operation (i.e., which system reasons about equivalence of concepts) (DaveT, Shawn)
    • define the operations at a detailed level (e.g., what's needed in terms of processing power)
    • where do those operations fit into the overall architecture
  9. Define OWL model for taxon information, how it relates to the Taxon Exchange Schema (DaveT, Shawn, Joana, Xianhua)
  10. ES: Develop specific use cases for SMS system using taxonomic info in OWL, use GARP as inspiration (DaveT, Shawn, Nico, DaveV)
  11. ES: Simple prototype of Data Discovery use case on limited data set that has been fully tagged with concepts (Shawn, Rich, DaveT)
  12. ES: Assemble example data sets, including concept data, and tag Darwin Core / EML data with concepts (Bob, Rich); try to send initial data dump to Jessie for testing early, then later provide rest
    • Angelfishes (Rich)
    • Plants Juglandaceae (Bob, Xianhua)
    • More extensive but shallow mapping between Flora of NA and Plants from USDA/ITIS (Bob, Xianhua)
    • Maybe generate a virtual data world that is clean but illustrates the common mapping issues
  13. TDWG: Update current implementation to be consistent with interfaces described in (5,6) (Robert, Aimee)
    • data proxying methods
    • example data provided (Rich, Bob) is put into exchange syntax (Jessie) and loaded in implementation using the implemented population APIs (Robert)
    • findConcepts operation is high priority
    • Robert and Aimee will email a proposed priority order for implementation after discussing with Susan
    • Robert and Aimee produce status report for implementation for eScience Meeting
  14. TDWG: Test harness to check system integrity based on well-known test data from 12 ()
  15. PA: Design mapping tools, i.e., the interface for taxonomists and ecologists to do mappings (Xianhua, Nico, Matt)
  16. ES: Build a makeshift GUID system using Handle system (RobertK, Matt, Rich, Jessie, DaveT) (end of Feb)
    • Comparison table of pros/cons of GUID systems, precursor to publication on same topic (DaveT, Jim, Nico) (deliver by end of first week of Feb)
  17. TDWG: Visualization tools prototype in May, something more complete by TDWG (Jessie, MartinG)

Future Meetings and Conference Calls

  • Beach: would be good to have monthly calls (Bob will set up calls)
    • conf call week 1 of March
    • conf call week 1 of April
    • In-house KU taxon meeting 2nd or 3rd week of March
    • conf call to coordinate eScience meeting, maybe not everybody


This particular version was published on 29-Jun-2004 12:41:52 PDT by LTER.stekell.