Taxon Meeting KU_UNC_19_March_2004

Present: Jim Beach, Bob Peet, Nico Franz, Xianhua Liu, Joana Trajkova, Aimee Stewart, Ricardo Pereira, Robert Gales, Dave Vieglais, Susan Gauch, Dave Thau (by phone)

19 March 2004

Bob: Would like an overall vision for the next 2 or 3 years. He had just come from Arlington, where he is collaborating with NatureServe to build a concept database for North American flora based on the 2004 release of the John Kartesz checklist. Kartesz expects to have roughly 14K concepts for VegBank by June 1 (of the roughly 35K total taxa of all ranks). The concepts will be used by the VegBank, ITIS, and NatureServe Biotics databases. NatureServe is doing QA/QC now on the first 7K. Going to use some SEEK money to fill in the holes (exotics, spell check, some ambiguities, etc.); enough to get more NSF money to finish. The project will also identify the 100 most problematic cases and enter these in SEEK to have more complete concepts for plants. VegBank is also developing a peer review system for authoritative concepts, which should be applicable to taxa.

Nico - SEEK taxon use cases update - to be cleaned up with Bob Peet by end of March; for the moment see Taxon Use Cases Update March04

Xianhua - familiarizing himself with data models and working on the concept peer review system.

Jessie has German mosses data, downloadable from cvs.

Concepts - Nico goes through Use Cases, ClassifyConcepts - what is the difference between concepts, assertions, names, etc. Aimee: do we need to have a definition written in stone? What do we want a scientist to see when querying on "hickory"? What should a taxonomist get when entering a new taxon? Ricardo: the similarity compare algorithm is based on fields in TES; the Relate algorithm uses trees.

Use Cases, Associated Credit for person digitizing or person who discovered species - enters into SEEK but does not publish.

SEEK stores preferred resolutions between different concepts, names, etc
Register datasets
Merge datasets

Jim: what are some Use Cases that address the Data Providers' expectations? We could end up as the Walmart of data - not discriminating.

When an ecologist marks up a dataset - does that information go into the Taxon Cache? Maybe into taxon Context database

possible 5-year target of Taxon project - have minimally Nico's Name-Relation Concept, SEEK will never be source of data. Not headed towards searching on character descriptions.

Dave Thau: installed LSID server and handle system. Handle system was slow, but is now as fast as the db. Remaining questions:

what is result of handle? what does it point to? what is response?
look into metadata - what do we want
how does it fit into overall architecture
does rest of seek benefit from a particular type of GUID (LSID, handle, etc)
look at structure - maybe a checksum to make sure it's a valid number, code in data provider or put that in metadata?
post question to seek-dev about preferences of LSID over handle system, or something else
tie in more closely to current data model
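
The checksum idea in the list above could look something like the following. This is only a sketch: the notes do not say which scheme was under consideration, so the ISO 7064 MOD 97-10 style check (the one used for IBANs) is an assumption, as are the function names and the sample number.

```python
def add_check_digits(number: str) -> str:
    """Append two check digits (ISO 7064 MOD 97-10 style) to a digit string.

    Assumption: identifiers are purely numeric. The scheme is chosen for
    illustration only; the meeting did not settle on one.
    """
    if not number.isdigit():
        raise ValueError("identifier must contain only decimal digits")
    # Choose check digits so that the full number is congruent to 1 mod 97.
    check = 98 - (int(number) * 100) % 97
    return f"{number}{check:02d}"


def is_valid(identifier: str) -> bool:
    """Return True if the trailing two check digits match the rest."""
    return (
        identifier.isdigit()
        and len(identifier) > 2
        and int(identifier) % 97 == 1
    )


if __name__ == "__main__":
    guid = add_check_digits("12345")  # "12345" is a made-up local number
    print(guid, is_valid(guid))
```

A check like this catches most single-digit typos and transpositions before a lookup is attempted. Whether it lives in the data provider's code or is recorded in metadata is the open question noted above.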

2 tables

handle table - who registered, who can change, when registered - has gui tools to distribute system so many people can change handles, duplicates some info, but makes it more widespread / easily available
some other table

programmatic access: all in Java, can plug into whatever db, register GUID info in db. The DOI system runs on the handle system; registered with CNRI (1883 for Thau). The handle system will have validity outside of the SEEK system: a general URL (handle.net?) can resolve a concept. The handle system does not have metadata support - we would have to implement that ourselves. LSID has metadata built into the system.

how stable is DOI system in the long term? should be good - lots of investment
what about RFID: mostly a different domain, commercial application, like bar code issuance
what things were others trying to solve with this unique ID system? pointer to information which remains valid no matter where data resides or moves; numbers easily, robustly, quickly resolved; a system which will be used. Handles try to free info from a specific domain
what kind of revolutionary impact can our handle system have? like new industry - revolution in cross-linking, ex: used in Nature, big revolution in scientific online, dynamic links tied together in grid system, cross-referencing by humans or applications, ex: good for reproducing experiments (doi on dataset, doi on algorithm), persistent identifier that exists after publication, frees publishers to change websites
what about Ecogrid's use of GUIDs? If we can show utility of our choice, Ecogrid will take that seriously when making their decision
what are implications of location of GUID server - how does it play with Taxon DB
LSID - standard soap call to get metadata, publicly known interface, could add metadata to our db
Link to outside world, most difficult for legacy data
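
For the LSID option discussed above, identifiers have the form urn:lsid:authority:namespace:object with an optional revision. A minimal sketch of splitting one into its parts follows; the authority "example.org", the namespace "concepts" and the helper name parse_lsid are hypothetical, and no SEEK naming scheme is implied.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into authority, namespace, object and revision."""
    parts = text.split(":")
    # Expect "urn", "lsid", three required parts and an optional revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


if __name__ == "__main__":
    # Made-up identifier for illustration only.
    print(parse_lsid("urn:lsid:example.org:concepts:42"))
```

Parsing only checks the shape of the identifier. Resolving it to data or metadata is a separate call to the LSID authority.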

Susan: Need to spend some uninterrupted time on implementing last Python version. Have spent some time on architecture diagram, creating software architecture for that architecture and taxonomic exchange schema, will send guts of algorithms out for

?? ask Nico: synonymy: most searches will be limited to name: congruent with, included in, includes

Bob: what are the implications of changing TES? Should be relatively easy - code is modular and changes should be limited to algorithms.

Nico: wrote to George Gerrity, paper on patent of db record structure?, Marc Geoffroy in Berlin

20 March 2004

Present: Jim Beach, Bob Peet, Nico Franz, Xianhua Liu, Joana Trajkova, Aimee Stewart, Robert Gales, Dave Vieglais, Susan Gauch, Jessie Kennedy (by phone)

Next version of Jessie's taxon transfer schema should be ready for developers to code against in May. In October we should have a final, solid transfer (TDWG) schema. Differences between SEEK schema and standards schema for TDWG: the SEEK schema should be very close to TDWG (which takes other providers' schemas into account). Should be able to see relationships between the transfer schema and anyone else's schema.

Between now and May, Bob will download latest xml from Kukla, try to use that to represent his data, talk with Jessie about schema.

Jessie will make sure that she understands schemas and structures from others based on their data, by transferring the following data into our XML schema: Berlin, VegBank, ITIS, Rich Pyle's - should help convince others that the transfer schema is workable.

Bob will send developers data when sending to Jessie, download schema, iterate on schema design with Jessie until May, populate small but detailed example into Access. Nico and Bob will work on Use Cases.

Robert K. - will send raw xml data. Susan: maybe Edinburgh can develop tools to populate db since they are halfway there? Robert Kukla - the code is not ready for prime-time, just proof of concept

Matt: Could use extension feature of EML so we don't have to worry about changing EML immediately. Existing tools for marking up datasets into EML will work, esp. if not worried about user efficiency right away. Would be good to focus on BEAM test case - mammal, climate change, GARP, data from MANIS (Town Peterson and others), also activities of SMS and KR group - ontologies, OntoBrowser, semantic registration = concept markup. Taxon group doesn't want to create user interface for Taxon markup.

Nico, Use Cases:

assign status (valid, invalid, accepted, etc) as of particular person/entity - action Bob, Nico should talk to Jessie about including this in schema
register data providers (name providers) - Aimee thinks this is an Ecogrid function
get provider data
register data providers (taxonomists) - Aimee thinks Ecogrid, Nico is thinking of a credit system for taxonomists and access rules (maybe 2 different things), credit system is metadata for the concepts/relationships
find concept with different fields of TES filled,
visualization tools for comparing groups of relationships (tree, synonymy, lineage). similar to work of Jessie pre-SEEK, good for buy-in of taxonomists, for literature search, show lineage relationships with dates

Concept - guid, name, reference
Relationship - guid, relationshiptype, guid

Taxonomic exchange schema - could we have a flat-file of list of concepts, then list of relationships - not hierarchical?
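
As a rough illustration of the flat, non-hierarchical layout raised above (one list of concepts, one list of relationships), the sketch below uses the fields named in these notes. The class names, field spellings, GUID values and references are invented for the example; the relationship type "includes" is one of those mentioned at the 19 March session.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Concept:
    guid: str
    name: str
    reference: str


@dataclass(frozen=True)
class Relationship:
    from_guid: str
    relationship_type: str  # e.g. "congruent with", "included in", "includes"
    to_guid: str


# Flat list of concepts: the same name under two (made-up) references.
concepts = [
    Concept("c1", "Carya", "Hypothetical Flora A (1990)"),
    Concept("c2", "Carya", "Hypothetical Checklist B (2004)"),
]

# Flat list of relationships that point at concepts by GUID only.
relationships = [
    Relationship("c1", "includes", "c2"),
]


def related(guid: str, rels: List[Relationship]) -> List[Tuple[str, str]]:
    """Return (relationship type, target GUID) pairs starting at one concept."""
    return [(r.relationship_type, r.to_guid) for r in rels if r.from_guid == guid]


if __name__ == "__main__":
    print(related("c1", relationships))
```

Because relationships refer to concepts only by GUID, any tree or lineage view would be built by the consumer rather than stored in the file.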


This page last changed on 30-Jun-2004 12:31:48 PDT by LTER.stekell.