Taxon WG Meeting, San Diego Supercomputer Center

Opening Session, Tuesday May 13, 2003, 1:30 PM

Present: Trevor Paterson, Aimee Stewart, Paula Huddleston, Bob Peet, Jessie Kennedy, Dave Vieglais, Hannu Saarenmaa, Jim Beach

Paula Huddleston briefly reviewed the status of the ITIS project. Hannu Saarenmaa briefly reviewed GBIF plans for serving biological names; the electronic catalog of life consortium will likely provide the names.

Bob: three basic functions: retrieval and mapping of tagged and untagged names for data set queries
GUIDs for taxonomic concepts JK:
DV: We need to consider what kinds of queries will be asked of a taxon data provider within the SEEK architecture for SEEK analysis needs, for example for ecological niche modeling. Queries might need to start with a common name. The service needs to provide the mappings between names and concepts from the start. There are many implementation issues with returning this kind of data, for example DoS problems ("Have any bananas?"). Concept resolution needs to be done both fully automatically for naive users and with intermediate results for expert browsing, so that experts can see which of the names returned from which data sets are what the user meant. There is also a need for taxon comparison support functions for expert users, who need to map between the name in their query and the results. It is clear that we need a federated schema for concepts, with client and server APIs, to enable multiple different systems.

JK reviewed her summary of concept classifications (diagram). How taxonomic concepts are defined: Rank, Label, Publication, Definition, Author. A quality score is based on what kind of information was provided with the concept; for example, a concept with no name, no publication, no author and no definition scores 0.0.0.0.
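The quality score can be illustrated with a minimal Python sketch. The notes fix only the all-absent case (0.0.0.0), so the position order (name, publication, author, definition) and the rule that each position is 1 when the component is present and 0 otherwise are assumptions, as are the hypothetical names `TaxonConcept` and `quality_score`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaxonConcept:
    """Hypothetical concept record holding the components listed in the notes."""
    rank: Optional[str] = None
    label: Optional[str] = None        # the name attached to the concept
    publication: Optional[str] = None
    author: Optional[str] = None
    definition: Optional[str] = None


def quality_score(concept: TaxonConcept) -> str:
    """Return a dotted score such as '1.0.0.0'.

    Assumed scheme: one position each for name, publication, author and
    definition (in that order); 1 if the component is present and non-empty,
    otherwise 0. Rank is not scored in this sketch.
    """
    components = (concept.label, concept.publication,
                  concept.author, concept.definition)
    return ".".join("1" if value else "0" for value in components)


# Usage
print(quality_score(TaxonConcept()))                               # 0.0.0.0
print(quality_score(TaxonConcept(label="Monacanthus ciliatus")))  # 1.0.0.0
```

A per-component flag keeps the score readable at a glance; a graded value per position would fit the same dotted shape if the WG chose one.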
4:15 Break

Use Cases
Ecogrid providers and consumers
Use cases for the architecture and not for any particular database
Do retrieval first
5:15 Adjourn
Wednesday Morning, May 14, 8:30 AM, SDSC

Present: David Stockwell, Trevor Paterson, Aimee Stewart, Susan Gauch, Paula Huddleston, Bob Peet, Jessie Kennedy, Dave Vieglais, Hannu Saarenmaa, Jim Beach

Discussion of daily agenda priorities

Bob Peet presentation: his perspective on the Taxon WG vision, objectives and deliverables.
We need to parse out what the responsibilities are for development in the next two days.

Funded Parties of the WG
Components
JK: The most fundamental decision is what our design requirements are, based on the expectations of the other WGs; what the taxon name resolution service should be depends a lot on how the Ecogrid group expects it to operate. Integration with the overall project architecture is pretty important.

DV: The BEAM WG has identified a typical use case for how SEEK would work (PowerPoint slide): searching for concepts related to the Fringed Filefish, then querying specimen databases, pipelining the results into GARP, identifying environmental data sets, and running the GARP model.

Hannu: An architectural document would be important for sharing with the community. GBIF has one; a copy was circulated at the meeting, and it is on the GBIF web site, http://www.gbif.org. GBIF has about 8 use cases that would be valuable for SEEK to look at. GBIF architecture: the DiGIR architecture should be up and running next year, and GBIF will support it. GBIF does not have a project in place to deal with a taxon concept architecture.

Susan: Would like to see one use case, build one function, and then at the next meeting talk about what we would like to add to it. Would like to pursue the spiral model of development.

Bob would like use case 1 to be what is needed to store a concept-level record in an archive.

SG: Then my interest would be to see what would be needed to bring legacy data into that archive, how those data could be mapped to fully qualified taxon records, and how the data could be retrieved using those records within an overall architecture.

There is a comfort level with approaching WG modeling and development activities one use case at a time, with a user function such as archiving as case 1.
DV presented the overall SEEK use case (PowerPoint slides).

HS: Defining the interfaces for accessing multiple name providers to do this kind of thing would be very useful for GBIF. The APIs and architectural requirements are key.

DV: We should just start with this; let's not try to solve all of the taxonomic problems and issues with concepts. Let's just use ITIS for name lookups and call it our first demonstration.

JK: I don't see a requirement here for the SEEK concept architecture.

DV: Yes, but it is a starting point and a point of departure for concepts.

SG: Use case 2 should be that a common name goes to two or more scientific names: how do these map? How could a user deal with multiple concepts and then do a query?

JK: (Need to get Jessie's model comparison slides on here.)

PH: SEEK will need a repository of its own to store data, because other people may not have a database.

JK: I am worried that working with a single provider, ITIS, for the first use case is too limited; we will do a lot of things that do not generalize, and we will not be thinking about multiple taxonomies. Maybe the internal one is more important to do first.

DV: The first thing we do is define the data structure and the API, and then look for existing repositories that could support our requirements. Still, the first step is to build the API and the data structures.

JK: We cannot design for a single application use case; we also need to dig into the ways to represent concepts and classifications in a database. If we don't look at the broader requirements of the architecture, we are going to waste our time on one-off demonstrations.

SG: DV: Don't disagree.
Coffee Break 10:15-10:45 AM

Hannu: Do it the way the DiGIR group did it:
Stockwell: It would be most useful if this could be generalized to other systems, e.g. vegetation systems and gene name data. DS: If Jessie's tools could be generalized to vegetation types, that would create a great buzz.

JK: But it is not easy. Classifications are not trees and there is a ton of semantics; we are not just doing name comparisons, and all of my systems are built to deal with taxon concepts. Talk about generalizing the software to other types of classifications without actually getting into the semantics of the data. One could simply create a system that would allow people to map between concepts manually; it would not need to include any machine reasoning to compare things like vegetation types.

Bob: Let's take the diagrams that we have and come up with a draft schema for concepts, including the metadata. Links would have to be modeled separately, with a distinct schema.
Lunch 12:00-1:30

Agreed to:
Implementation options

JK: Use Prometheus as a database to serve concepts in the SEEK schema. Use Prometheus visualization tools to read concepts out of one or more concept data servers.

Discussion notes below are from both Wednesday and Thursday, juxtaposed where appropriate.

Wed: Discussion of IDs for concepts and concept instances. JK: We should not use usage as the basis for a unique concept ID; if there is no additional information then the identity is unknowable, and we can't use that data. We have to decide what level of concept we want to capture: do institutions have concepts, do individuals have concepts? Or do we want to say that we don't know what it is, so all we can say is that it falls into the unknown concept bucket? So collection catalogs have classifications associated with them, but we assume that their concepts are not new; they are simply uses of existing concepts from unknown sources.

Thurs: What is acceptable and workable for SEEK depends on whether the classification and concepts are explicitly defined or implicit. Usage of concepts in an implicit classification does not "create" new concepts. Alternative classifications, non-scientific classifications, special-purpose classifications and amateur birder classifications are OK if explicitly created and defined. MS: What about the formal rules of classification? Do they constrain what is acceptable for the architecture? BP: No, we are not going to restrict concept creation and traffic to a subset of authors. But the sources must be explicit taxonomies.
Wed: Where do we draw the line on what we accept as a new concept? Weak concepts might be species field names such as Melastome #1, Melastome #2, moth "A", "big brown moth B", etc. But what about a box of insects with no name, or a specimen with just a catalog number? (Hannu argues that we need to accommodate these, but are they implicit or explicit?)
JK: if concepts are well described, regardless of the source, then they should be considered concepts.
BP (revised): Any time anyone uses a name on a permanent record, e.g. a specimen, it is a concept and we should be able to handle it. The concept has to be explicitly authored to be recognized. Someone has to put those concepts into the SEEK concept schema, e.g. Gentry's concept of "green plant" according to Bob Peet. JK: This is a classification of Peet's; fine.

MJ: One of the best times to do concept mappings is when researchers are trying to integrate two or more datasets; someone then has an interest in doing that mapping for a particular analysis. Mappings of data set field names should not be done automatically.

BP, SG: All names should be tagged with an explicit concept by the authors of incoming data sets. If a name does not already exist in the federation, there should be no automatic concept generation; the policy should be that the author of the data set must register the name by linking it to a new or existing concept.

MJ: I see lots of field names like "A" and "B"; we have to allow those data sets to exist in the SEEK architecture without forcing the author to map the field names to formal names. (BP: They must make the mappings for the data to be stored in SEEK; otherwise the data are useless and not wanted.) MJ: A data set that is not completely mapped to formal classifications is NOT useless.

Summary Thurs: We are not going to create concepts in the SEEK federation implicitly; we are not going to scan data sets and create concepts or mappings from field data.

Wed: JK: It is a concept, but of no use: an informationless label is meaningless and not useful. This is not accepting a new concept; it is an application of a name to some unknowable object.
DS: This is a policy decision; the data model should accommodate all potential uses of names. E.g. the data model should be able to handle a concept that is just a name in use, applied in some way by somebody. SG: We should accommodate all kinds of concepts, including "small black moth". Susan: What about specimen or observation data in ecological data sets with no names? There will be these pathological issues with some data; we should work with the data that follows the rules and not worry about things without names. Trevor: But without names there is very limited value in having the data.
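The registration policy summarized above (no implicit concept creation; data set authors link each name to a new or existing concept) can be sketched as a small Python class. This is a hypothetical illustration, not a SEEK interface: `ConceptRegistry`, its methods and the concept identifiers are invented for the example. An unregistered name resolves to `None` instead of triggering concept generation, so a data set with unmapped field names can still exist while no concept is created for it.

```python
from typing import Dict, Optional, Set


class ConceptRegistry:
    """Hypothetical registry: names are linked to concepts only by explicit authoring."""

    def __init__(self) -> None:
        self._concepts: Set[str] = set()              # explicitly authored concept IDs
        self._name_to_concept: Dict[str, str] = {}    # registered name -> concept ID

    def add_concept(self, concept_id: str) -> None:
        """Record a concept that someone has explicitly authored."""
        self._concepts.add(concept_id)

    def register_name(self, name: str, concept_id: str) -> None:
        """Link a data-set name to a concept already recorded with add_concept."""
        if concept_id not in self._concepts:
            raise KeyError(f"unknown concept {concept_id!r}; author it with add_concept first")
        self._name_to_concept[name] = concept_id

    def resolve(self, name: str) -> Optional[str]:
        """Return the linked concept ID, or None for an unregistered name.

        No concept is ever generated here, so unknown names stay unresolved.
        """
        return self._name_to_concept.get(name)


# Usage (identifiers are made up for illustration)
registry = ConceptRegistry()
registry.add_concept("concept:peet-green-plant")
registry.register_name("green plant", "concept:peet-green-plant")
print(registry.resolve("green plant"))       # concept:peet-green-plant
print(registry.resolve("big brown moth B"))  # None
```

Authoring a concept (`add_concept`) and linking a name to it (`register_name`) are kept as separate explicit steps in this sketch; whether unmapped names should be stored at all was left open in the discussion.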
Thursday, May 15, 2003, 8:30 AM, SDSC

Present: Kennedy, Huddleston, Saarenmaa, Schildhauer, Jones, Stewart, Vieglais, Pereira, Paterson, Gauch, Stockwell, Ludaescher

Agenda
Brief review: Aimee Stewart
Review of remaining agenda items
Discussion of the metamodel for concept classifications. The metamodel simply shows how similar the different classes of models are; it does not suggest how they might be reconciled, only how the data can be accommodated. Notes from the Thursday morning discussion are interleaved above.
Break 10:30-10:50
List of priorities
Multiple classification architecture
Discussion of EML and DiGIR and museum databases.
DV: To participate in Ecogrid, data providers will be required to have Ecogrid interfaces.
Prior to lunch Susan and Jessie were asked to work on diagrams of the architecture as they saw it.
Lunch 12:00-1:30 PM
Post Lunch
TWG Development Tasks to enable the Fringed Filefish Concept Retrieval Demonstration with the SEEK architecture

Add Concepts to the Concept Database
Task Schedule
Date - Task
May 26 - 1a strawman
June 2 - 2 draft, 6
June 9 - 10
June 16 - 1b, draft
June 18 (Wednesday) - conference call, 10 AM US CDT
June 23 - 3+6; 5+9 draft
June 30 - NSF reporting deadline for previous year
July 7 - 2 final
July 14 - 7 final
July 16 (Wednesday) - conference call, 10 AM US CDT
July 21 -
July 28 - 4+8 initial
Taxon WG Human Resources
Original Agenda
AGENDA

Tuesday May 13
Afternoon:
Review draft use cases from the Jan 30 notes, with additions from WG members made before the meeting.
Quickly review current approaches and literature on classification concept mapping and retrieval.
Beach: Beach, Pramanik and Beaman model for classification concepts (Taxon), James Ytow's paper
Wednesday May 14
Morning:
Go through literature, mini-reviews (continued, if needed)
From the UPDATED use case list, identify functional requirements (FR) and primary deliverables (PD) for Year 1 objectives
Identify FR and PD for Year 3 objectives
Identify FR and PD for Year 5 objectives
Afternoon:
Collaboration relationships with other projects:
Our own software overlap, e.g. Prometheus, ITIS, Vegbank, BIOTICS
Requirements overlap for services, data, objectives, etc.
Data overlap, e.g. ITIS; the Specify Project will be a DiGIR source of multiple classifications of collection catalogs
Other community projects, e.g. GBIF Octopus
Thursday May 15
Morning:
Briefly review any unresolved issues with use cases, functional requirements, primary deliverables, staging, staffing
Begin development of project deliverables, software and publications
Afternoon:
Development activities continued
Future planning, meetings (SEEK, side and outreach), new hires, next steps
Attachments:
This material is based upon work supported by the National Science Foundation under award 0225676. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF). Copyright 2004 Partnership for Biodiversity Informatics, University of New Mexico, The Regents of the University of California, and University of Kansas