January 23, 2004
Participants: Peet, Kennedy, Pyle, Kukla, Lee, Franz, Liu, Anderson
- Powerpoint = VegBank_taxon_Jan2004.ppt
January 24, 2004
Participants: Peet, Kennedy, Jones, Pyle, Kukla, Lee, Franz, Liu, Anderson
Notetaker: Jones
R. Pyle gave an overview of the FishBase concept model. Need to upload slides from Rich Pyle.
- Draft paper at http://www2.bishopmuseum.org/schema/Phyloinformatics-Pyle.pdf (soon to be formally published & available at http://phyloinformatics.org)
- Protonym -- an Assertion that also was the original use of a name (replaces Name)
- name author and concept author can be distinct. Protonym always contains the name author. See Pyle examples.
- variant spellings end up encoded in the assertion table
- one type specimen can be used to support multiple concept definitions
- critical to have slots that reference the type specimen for a concept
- cache field sensu Berendsohn can contain information not present in "cached" fields, and differs from Pyle's "cheat" fields, which are purely derived from the parsed data fields
- discussion of whether all assertions need to be in centralized database -- Rich wants GUIDs for everything, Jessie and Matt argued full centralization is unrealistic in the extreme case of including all uses of taxa (e.g., in ecological data sets)
- Rich says all GUIDs should come from the same global scope, but that storage need not be centralized
- pools of GUIDs for a) reference, b) name (text string or *original description*?), c) intersection of name and reference (assertion/concept, ...); using all three allows implementation flexibility, but we agree that probably only two (Assertions & References) are needed to represent the information, and two *may* allow the same level of implementation flexibility (need to think this through)
- mapping domain among assertions classified as: unspecified, congruent, includes, included in, overlaps, excludes (see http://www.bgbm.org/BioDivInf/Projects/MoreTax/standard_liste_en.htm for a formal treatment)
- museum determination for specimens is very similar to ecological observations (Rich called ecological observations "unvouchered specimens")
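The relationship categories listed above for mapping between assertions could be represented as a small enumeration. This is only a sketch: the names `Relationship` and `inverse` are hypothetical, and it assumes the categories read as "A <relationship> B" and behave like set relations, so that only the two inclusion categories change when the direction is reversed.

```python
from enum import Enum


class Relationship(Enum):
    """Relationship categories from the notes for mapping one assertion to another."""
    UNSPECIFIED = "unspecified"
    CONGRUENT = "congruent"
    INCLUDES = "includes"
    INCLUDED_IN = "included in"
    OVERLAPS = "overlaps"
    EXCLUDES = "excludes"


# Assumption: relationships behave like set relations, so reversing the
# direction of a mapping only swaps the two inclusion categories.
_INVERSE = {
    Relationship.INCLUDES: Relationship.INCLUDED_IN,
    Relationship.INCLUDED_IN: Relationship.INCLUDES,
}


def inverse(rel: Relationship) -> Relationship:
    """Return the relationship seen from the other assertion's side."""
    return _INVERSE.get(rel, rel)
```

For example, `inverse(Relationship.INCLUDES)` gives `Relationship.INCLUDED_IN`, while the symmetric categories are returned unchanged.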
Categories of potential users:
- Taxonomist -- business of creating, modifying, and labeling concepts, including phylogenetic information
- Synthesizers of taxon data -- authorities who maintain lists and trees
- indexers of names who don't make authority recommendations (e.g., IPNI, ZooRecord, SEEK, GBIF)
- aggregators who make authoritative checklists (e.g., ITIS, USDA Plants)
Use cases: DataLocationRegistrationUseCase
- Taxonomic data users who make determinations/identification
- Aggregators/providers of ecological and biodiversity data (e.g., SEEK, GBIF, Lifemapper)
Use cases: DataLocationRegistrationUseCase
- Biodiversity & Ecological data users (data has name or concept information included)
January 25, 2004
Participants: Peet, Kennedy, Jones, Pyle, Kukla, Franz, Liu, Anderson
Notetaker: Jones
Started by completing our discussion of TaxonUseCases.
- 8:30 - 10:00 Brief Reports from current activities Pyle (15), Bob (5), Jessie (15), DaveV DiGIR(15), Susan, RobertG, and Amy IR (30)
- 10:15 - Noon Ontology and Reasoning approach to concept system (Thau, Shawn)
- 1:15 - 4:00 System architecture and product drawing
- 4:00 - 6pm Discuss GUID issues and decide on implementation approach (Thau, Matt)
- 8:30 - Noon Consolidate / Finalize / Rewrite /Prioritize Use cases in context of system architecture
- 1:15 - 4 System architecture and product drawing
- 4:00 - 6:00 Generate deliverable products list
- 8:30 - Noon Review Architecture, generate task assignments and target milestones
Identify person(s) to gather demonstration data sets (e.g., Angelfishes)
- Informal discussions among remaining participants
January 26, 2004
Participants: Gauch, Beach, Peet, Anderson, Kennedy, Kukla, Thau, Pyle, Jones, Bowers, Stewart, Franz, Liu, Trajkova
Started with brief progress reports from each group:
- ...
- Gauch: concept retrieval service, pull data from ITIS, has website showing the mapping between the ITIS fields and the current SEEK taxon model (http://www.ittc.ku.edu/SEEK)
- Franz: deep versus shallow definition of concept. Deep concepts are formally defined; shallow concepts might only be defined as an assertion from having been used (e.g., an identification event)
- Trajkova: document outlining the 7 ways in which OWL can be used to model biological taxonomy, documents now in cvs and accessible here: http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/seek/projects/taxon/docs/ontology/
- Gayles: architecture diagram for taxon name service and interactions with other SEEK components such as EcoGrid (will provide document)
Thau gave overview of ontologies for taxonomic information
Discussion of system architecture centered around diagrams provided by Susan
Beach: suggests taxonomic editing/revision tool is a good way to get people to enter concepts, but need to motivate them to participate
January 27, 2004
Participants: Gauch, Beach, Peet, Kennedy, Kukla, Thau, Pyle, Jones, Bowers, Stewart, Franz, Liu, Trajkova, Gayles
- UUID -- algorithmic hash that can generate unique IDs, non-readable/long, no resolution information built into the UUID (eg, no clues about where it was generated)
- LSID and URIs: one part is the domain, the second part is a local ID, and the resolver is based on the DNS system (urn:lsid:ku.edu:ecogrid:3214); the disadvantage of LSID is that resolution depends on DNS hosts, which can disappear and cause system instability
- DOI: global and local parts; IDF is the authority and gets a fee for the authority and for each ID; the global part is not particularly readable; uses the CNRI Handle System (http://www.handle.net/), which can be used independently of DOI
- See PLOS article on public DOIs at http://www.plosbiology.org/plosonline/?request=get-document&doi=10.1371%2Fjournal.pbio.0000057
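To make the contrast between the first two schemes concrete, the sketch below generates a UUID with Python's standard `uuid` module and splits the LSID quoted above into its parts. The helper names `new_uuid` and `parse_lsid` are hypothetical. The field names follow the general `urn:lsid:authority:namespace:object[:revision]` layout. `uuid4` is random-based, which is only one of several UUID variants and not necessarily the one discussed at the meeting.

```python
import uuid
from typing import Optional, TypedDict


class LsidParts(TypedDict):
    authority: str
    namespace: str
    object: str
    revision: Optional[str]


def new_uuid() -> str:
    """Return a random UUID string; it carries no hint of where it was generated."""
    return str(uuid.uuid4())


def parse_lsid(lsid: str) -> LsidParts:
    """Split an LSID into authority (a DNS name), namespace, object ID and optional revision."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    return {
        "authority": parts[2],   # resolvable via DNS, e.g. ku.edu
        "namespace": parts[3],
        "object": parts[4],      # the local ID
        "revision": parts[5] if len(parts) == 6 else None,
    }
```

Unlike the UUID, the LSID exposes its issuing domain (`ku.edu` in the example from the notes), which is what makes DNS-based resolution possible and also what ties it to the lifetime of that host.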
What are the outstanding issues:
- is there a verification system for ID assignments that checks that the name/ref pair doesn't already exist
- GUIDs: Jessie argues for centralization of creation of concepts (i.e., to be issued a GUID, a data provider must ask a central authority if that concept already has been entered; if not, a GUID is issued, if so the existing GUID is sent back)
- Jessie: need to separate out the idea of the person who authors a concept from the person who enters the existence of a concept into a data system
- Jessie: need ability to change attributes of a concept, which will probably result in a new GUID which is a "duplicate" of the original (with the typo fixed), so should extend concept correlation table to include "DUPLICATE" as a correlation level (in addition to congruence, includes, included in, intersects, excludes)
- revision semantics, lineage could and maybe should be tracked
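The centralized issuing rule Jessie describes (ask the authority; get back the existing GUID if the concept is already entered, otherwise a new one) can be sketched as an in-memory registry keyed on the name/reference pair. Everything here is an assumption for illustration: the class `ConceptRegistry`, the method `request_guid`, the whitespace and case normalization, and the use of random UUIDs as the GUID format, which the group had not decided on.

```python
import uuid
from typing import Dict, Tuple


def _normalize(text: str) -> str:
    """Collapse whitespace and ignore case so trivial variants map to the same key."""
    return " ".join(text.split()).casefold()


class ConceptRegistry:
    """Hypothetical central authority that issues one GUID per name/reference pair."""

    def __init__(self) -> None:
        self._by_pair: Dict[Tuple[str, str], str] = {}

    def request_guid(self, name: str, reference: str) -> Tuple[str, bool]:
        """Return (guid, newly_issued); an existing pair gets its existing GUID back."""
        key = (_normalize(name), _normalize(reference))
        existing = self._by_pair.get(key)
        if existing is not None:
            return existing, False
        guid = str(uuid.uuid4())  # assumption: stand-in for whatever GUID scheme is chosen
        self._by_pair[key] = guid
        return guid, True
```

A provider that submits the same pair twice receives the same identifier both times, with the flag indicating whether it was newly issued. Exact-match lookup like this does not catch typos or spelling variants, which is why the notes also raise similarity checks and a "DUPLICATE" correlation level.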
What can we assume:
- Jessie: DOIs will exist (eventually) for every reference
- Jessie: Specimen can use LSID
- Jessie: need mechanism for concept IDs
- Jessie: shouldn't be able to get a GUID for a concept twice
- Rich: which system we use partly determines how ids can be assigned
- GUID resolver service needs to check for already entered GUIDs based on similarity and issue warnings to the issuer
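One simple way to approximate the similarity check in the last point is plain string similarity over the entries already registered, here using `difflib.get_close_matches` from the Python standard library. The function name `similar_entries`, the 0.85 cutoff, and the idea of comparing "name + reference" strings are all assumptions; the notes do not say how similarity should be measured.

```python
import difflib
from typing import List, Sequence


def similar_entries(candidate: str, existing: Sequence[str], cutoff: float = 0.85) -> List[str]:
    """Return already-entered strings that closely resemble the candidate.

    A non-empty result would be turned into a warning for the issuer
    rather than a hard rejection.
    """
    return difflib.get_close_matches(candidate, list(existing), n=5, cutoff=cutoff)


# Illustrative entries only, not data from the meeting.
registered = ["Quercus alba L., 1753", "Juglans nigra L. 1753"]
warnings = similar_entries("Quercus alba L. 1753", registered)
```

Here `warnings` contains the near-identical "Quercus alba" entry but not the "Juglans nigra" one. A production resolver would likely need taxonomically aware matching (parsed names, author abbreviations) rather than raw character similarity.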
January 28, 2004
Participants: Beach, Peet, Kennedy, Kukla, Thau, Pyle, Jones, Bowers, Stewart, Franz, Liu, Trajkova, Gayles
- ES: Escience meeting May 2004
- TDWG: TDWG October 2004
- PA: Prototypes Available January 2005
- SR: Site review (April 2005?)
- ES: Discussions with various ITIS partners about status and involvement (Jessie, Bob, Rich, Dave)
- Pyle can be working on interfacing with ITIS Pacific node (report back to SEEK Taxon)
- Jessie talks to Paula
- Bob talks to USDA (Mark and Scott)
- Dave contacts ITIS Canada participants
- ES: First draft of schema for presentation, seek-taxon reviews ahead of time based on call for reviews (Jessie)
- TDWG: Final SEEK schema so that we can make progress (Jessie)
- Finish architecture diagram as blueprint of activities (Aimee, RobertG, DaveT, draft end of feb 2004, more stable by end of march 2004) (see http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/~checkout~/seek/projects/taxon/docs/taxon-arch-diagrams-20040126.pdf?content-type=application/pdf)
- Finish SEEK Taxon Use Cases document to be consistent with architecture diagram (Nico, Bob, Matt, Dave) (aim for middle to end of March 2004) (see http://seek.speciesanalyst.net/ow.asp?TaxonUseCasesTemplatesMarch04)
- Draft interface definitions in WSDL (Aimee, RobertG, DaveT, next several months)
- ES: Proposed Interface definitions in WSDL with full documentation (Aimee, RobertG, DaveT)
- ES: Write a whitepaper describing the data and API interfaces with SMS, determine who does which operation (ie, which system reasons about equivalence of concepts) (DaveT, Shawn)
- define the operations at a detailed level (e.g., what's needed in terms of processing power)
- where do those operations fit into the overall architecture
- Define OWL model for taxon information, how it relates to the Taxon Exchange Schema (DaveT, Shawn, Joana, Xianhua)
1. ES: Develop specific use cases for SMS system using taxonomic info in OWL, use GARP as inspiration (DaveT, Shawn, Nico, DaveV)
1. ES: Simple prototype of Data Discovery use case on limited data set that has been fully tagged with concepts (Shawn, Rich, DaveT)
1. ES: Assemble example data sets, including concept data and tag Darwin Core / EML data with concepts (Bob, Rich), try to send initial data dump to Jessie for testing early, then later provide rest
* Angelfishes (Rich)
* Plants Juglandaceae (Bob, Xianhua)
* More extensive but shallow mapping between Flora of NA and Plants from USDA/ITIS (Bob, Xianhua)
* Maybe generate a virtual data world that is clean but illustrates the common mapping issues
1. TDWG: Update current implementation to be consistent with interfaces described in (5,6) (Robert, Aimee)
* data proxying methods
* example data provided (Rich, Bob) is put into exchange syntax (Jessie) and loaded in implementation using the implemented population APIs (Robert)
* findConcepts operation is high priority
* Robert and Aimee will email a proposed priority order for implementation after discussing with Susan
* Robert and Aimee produce status report for implementation for eScience Meeting
1. TDWG: Test harness to check system integrity based on well-known test data from 12 ()
1. PA: Design mapping tools, ie the interface for taxonomists and ecologists to do mappings (Xianhua, Nico, Matt)
1. ES: Build a makeshift GUID system using Handle system (RobertK, Matt, Rich, Jessie, DaveT) (end of Feb)
* Comparison table of pros/cons of GUID systems, precursor to publication on same topic (DaveT, Jim, Nico) (deliver by end of first week of Feb)
1. TDWG: Visualization tools prototype in May, something more complete by TDWG (Jessie, MartinG)
- Beach: would be good to have monthly calls (Bob will set up calls)
- conf call week 1 of March
- conf call week 1 of April
- In-house KU taxon meeting 2nd or 3rd week of March
- conf call to coordinate eScience meeting, maybe not everybody