Science Environment for Ecological Knowledge

Taxon WG Meeting_24_October_2003

Present:

  • Jim Beach
  • Jessie Kennedy
  • Bob Peet
  • Martin Pullan
  • Susan Gauch
  • Dave Thau
  • Nico Franz
  • 2 guys from Paris
  • Aimee Stewart

Agenda

  1. potential SEEK role in TDWG concept standard activities
    a. Nico - couple of slides relating to TDWG concepts
  2. Dave Thau finish SMS/Taxon possibilities - ontologies
  3. update from Susan and Aimee and concept matching
  4. Bob - use cases working towards overall architecture
  5. Plan 2-3 month activities, January meeting
    • Ask Matt - week of Jan 19
    • Bob - want use cases (maybe a subset stay longer and work on them)
  6. Things that Nico and Liu(?) will be doing
  7. Bob - what are our goals? need a blueprint
    • How does this help an ecologist looking around?
    • what does a dataset markup look like?
    • We need tools - prompt user to markup data nicely
    • registry for datasources?
    • how do we deal with old data
    • many other projects have pieces that would work for us
    • would like to talk through use case - fringed filefish
    • do we do the markup for approval by ecologist, supplement

Our potential role: take the lead in standards development

  • would look for additional money - 1 year time frame, in parallel with rest of our tasks
  • Walter, GBIF, Frank Bisby are all interested

Bob - concept-based system that many institutions could use; institutions that would serve and use concepts. Standards alone are necessary but not enough. We need something to test in one year.

Jim - we need to be application driven, but with prototype tests that interact with other systems (VegBank, Prometheus, etc.). Needs to be funded.

Jessie - must show that it works, not just a SEEK model

Martin - risk management - what is the risk to SEEK goals if we get involved with TDWG standards?

Jim

  1. money - maybe allocate 20%, but most external funded
  2. can we rationalize this within SEEK goals?

Bob - would get ecologists to mark up data correctly

Martin - GBIF has different immediate goals that are not necessarily the same as SEEK's.

Jessie - do it our way in broader context, then hope others like it. Can't guarantee that others will buy in. We must do it anyway.

Jim - others may drop out; what is our risk in spending time and money? Could be major disagreements between different perspectives.

Susan - can make systems work together with wrappers even if they are somewhat different

Jim - need all of: schema, representation methods, documentation of schema, transfer protocol. To prove our stuff, others will want software; we are committing to having stuff for others. Formalizing the schema is the only additional work. We needed to do this anyway.

Jessie - we need our own schema, which is a subset of the federation schema. We will need to be able to anticipate others' needs for interfaces - no, it's others' problem.

Bob - must ask for more money to interface with others

Jessie - extra work - must really study/communicate with other systems to collaborate

Jim - MOU with GBIF to say how far we will go - with whoever is materially involved. Everyone should understand it is under the umbrella of TDWG, but they have no material investment. An understanding with TDWG, not vested interests of it. They must have buy-in. Self-nomination for standards - if there is no stake in it, then it is just noise; must have money, people, something involved.

Bob - supplemental request 20% of SEEK taxon money, need to submit to stinger in 30 days (over 100K). Walter

who is involved in management, if we get financial/in-kind investments, what is our commitment to them

Susan - we are to develop union of existing schema and how it maps to existing datasets

Bob - need buy in from datasources

Susan - no, just need a portal

Jessie - we will contribute a % of existing work to the broader effort - what will others contribute? Only commit year by year. Need $ for travel, individual meetings.

Jim - must meet with Frank, Walter, GBIF - we are prepared to do it, see if they will contribute materially

Jessie - meet with people with models, try to extend our model to a workable schema, produce something, discuss proposal with everyone, meet with everyone. Need money for Jessie and one hire to come along, learn on the job, and write up. Discuss in Edinburgh.

Susan - in year 2 or 3, money for wrapper, generic tool for dataset merging

Thau:

SMS, OWL, Taxon
mock up fringed filefish, Jessie's schema, merge with OWL
Protege - ontology editing app
ITIS - Species2000
OWL built on RDF built on XML - enhanced XML
OWL specifies to merge
any OWL reader will do these functions
can query merged ontology
instance based approach - if instances are defined, reasoning tools can merge
GBIF - missed the point that reasoning about instances is the big missing link
description _ - extension of predicate logic
Genna reasons over OWL files
not great at probabilistic reasoning
Susan - need to be able to put in complex rules - criteria for merging on the fly
what is the performance with 100k nodes - does it die?
should things be represented as subclasses vs instances
genus is a union of these species
taxon is resource with properties, property/values are instances
OWL - built for people who are building ontologies and want to view them as they work. People are just now starting to look at what you can do with OWL after you have an ontology in OWL (overlay rules, etc.). Just now accepted as a standard. Could spit stuff out into OWL.
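The instance-based approach noted above (if instances are defined, reasoning tools can merge) can be illustrated without any OWL tooling. The following is a minimal sketch in plain Python, not what Protege or an OWL reasoner actually does: two concepts are compared purely by the sets of instances assigned to them, and a genus is treated as the union of its species. The specimen identifiers, taxon names, and the `compare_concepts` function are hypothetical.

```python
def compare_concepts(instances_a, instances_b):
    """Classify how concept A relates to concept B using only their instance sets."""
    a, b = set(instances_a), set(instances_b)
    if a == b:
        return "congruent"
    if a > b:
        return "includes"
    if a < b:
        return "is included in"
    if a & b:
        return "overlaps"
    return "disjoint"


# Hypothetical specimen identifiers assigned to species concepts by one source.
species = {
    "Aus bus": {"spec-001", "spec-002"},
    "Aus cus": {"spec-003"},
}

# "genus is a union of these species": the genus concept is the union of
# the instance sets of its member species.
genus_aus = set().union(*species.values())

# A second (hypothetical) source circumscribes "Aus bus" more broadly.
other_aus_bus = {"spec-001", "spec-002", "spec-003"}

print(compare_concepts(species["Aus bus"], other_aus_bus))
print(compare_concepts(genus_aus, other_aus_bus))
```

This only covers exact set comparison; the probabilistic matching and on-the-fly merge rules raised by Susan are outside what the sketch shows.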

Context - why does OWL make sense in Taxon and other working groups? SMS group is using OWL and these tools (Genna - GEON too). This is the language of the semantic web.

Jessie - nobody else is doing things in OWL at this scale. Taxonomists do not want computers to infer anything. Internal representations of our classes in OWL would be death.

Susan - could dump out small pieces into OWL, then play and reason with them

Jim - would inferences be more important with other stuff?

Nico - could represent with indentation - nobody looks at 1000s at a time, so not a problem

Dave - when someone registers datasets and enter tax info, SMS finding datasources, could use language to constrain the way that ecologists enter info. Language for computers to talk to each other

Jessie - OWL/ ontologies to represent other data (did I get this point?) don't know how we can represent everything in OWL

Susan - could spit out a small piece in OWL, then SMS can do other work to compare pieces

How do you compare models to each other - how do you represent the models

OWL is good: standard for describing ontologies

  • concept providers can share taxonomic info in a standard way
  • has standard tools to view, query, add to
  • mechanisms to combine, check, and reason over taxonomies
  • SMS can use the representation to
    • register datasets
    • plug tax info into their work flows
  • Data providers can easily control access to data - data is updated dynamically
    • to publish - put a text file on web server
    • taxonomic compilations point to their sources
    • can archive by controlling file naming
    • access to data is logged via the web

OWL is bad:

  • tags are too constraining
  • no need to move beyond XML
  • combining / reasoning, etc ...?

Next:
try doing something like GEON in tax domain?

  • register data to one taxonomy (ITIS)
  • map ITIS to another taxonomy
  • see if it scales
and more ?
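The proposed experiment (register data to one taxonomy, then map that taxonomy to another) could be prototyped roughly as follows. This is a sketch under stated assumptions, not the GEON approach itself: the taxonomies are plain dictionaries, and every name, identifier, and function here (`taxonomy_a`, `a_to_b`, `register`, `translate`) is hypothetical rather than real ITIS data.

```python
# Hypothetical first taxonomy: scientific name -> concept identifier.
taxonomy_a = {"Aus bus": "A:1", "Aus cus": "A:2"}

# Hypothetical mapping from concepts in taxonomy A to concepts in taxonomy B.
# One concept may correspond to several concepts in the other taxonomy.
a_to_b = {"A:1": ["B:10"], "A:2": ["B:20", "B:21"]}


def register(names, taxonomy):
    """Attach each dataset name to a concept id; collect names that do not match."""
    registered, unmatched = {}, []
    for name in names:
        concept_id = taxonomy.get(name)
        if concept_id is None:
            unmatched.append(name)
        else:
            registered[name] = concept_id
    return registered, unmatched


def translate(registered, mapping):
    """Re-express registered names as concept ids of the second taxonomy."""
    return {name: mapping.get(cid, []) for name, cid in registered.items()}


dataset_names = ["Aus bus", "Aus cus", "Aus dus"]
registered, unmatched = register(dataset_names, taxonomy_a)
print(translate(registered, a_to_b))
print(unmatched)
```

Whether this scales is exactly the open question in the notes; the sketch only fixes the shape of the two steps so that larger taxonomies can be substituted.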

Other possibilities:

  • work out with KR/SMS group their needs from taxon (unique IDs, defined operations)
  • focus on ontology representation of schema and compare to XML
  • ??

Dave - useful aspects of OWL/ontologies that you can't get from XML Schema?

Jessie - taxonomic classification =? ontology (or is it a very narrow subset). If we allow others to build sub-ontologies, OWL might be a good tool to create them, then store in another way. other - need to reason between other hierarchies and tie in with SMS

could just give SMS unique IDs, use RDF

RDF - subject --> property --> value, standard format for unique id, resource
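The subject, property, value pattern can be shown with a few hand-built triples. This is a minimal sketch in plain Python rather than an RDF library, and the `urn:example:` identifiers, the property names, and the `values_of` helper are hypothetical, chosen only to show how a unique ID lets another group (such as SMS) refer to a taxon.

```python
from collections import namedtuple

# One RDF-style statement: subject --> property --> value.
Triple = namedtuple("Triple", ["subject", "property", "value"])

# Hypothetical unique ids and property names, for illustration only.
triples = [
    Triple("urn:example:taxon:1", "hasName", "Aus"),
    Triple("urn:example:taxon:2", "hasName", "Aus bus"),
    Triple("urn:example:taxon:2", "hasParent", "urn:example:taxon:1"),
]


def values_of(subject, prop, store):
    """Return every value stated for the given subject and property."""
    return [t.value for t in store if t.subject == subject and t.property == prop]


print(values_of("urn:example:taxon:2", "hasParent", triples))
```

Because the value of one statement can be the subject of another, a consumer holding only the unique ID can follow the links without knowing the internal schema.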

G - ontology is not a distribution mechanism

Dave - yes, just makes it easy/convenient. RDF Schema - fundamental to semantic web

Nico: TODO: GET SLIDES FROM NICO

  1. Need clarity of vision, conceptual unity, hard problems ahead of us
  2. In systematics, the 60s-70s saw a difference in conceptual outlook on what systematics is: phylogenic vs cladistic. Cladistic won; it most closely followed what happens in the real world.

Need to work on YET another model

Aim - 3 dim past "Genospace" / "Phenospace" soup --> Evolution --> present non-homogeneous distribution of clusters

The (typically imperfect) stability of clusters of properties in space and time (kinds) is a precondition for taxonomic names.

Discover properties of unique names

aim of systematics is to discover and name Kinds

The continuity of reference in the absence of an exact relation between predicate (name) and referent (species) is possible because of a causal chain among the communicators (taxonomists) that stretches from the initial naming event to the present.

Must model frequency or stability of use.

Susan - would be good to see differences between versions of taxonomies

Martin - revisions can take small portions based on groups one is familiar with - there are ways to decide what makes up the subset of revision, could be based on character data

Bob - find something as example of best practice

Martin - cannot do cladistics in a vacuum, no way to tell whether your patent is better/worse than another, just different

Jessie - Nico should come up with what is wrong with our model

Susan - give us an alternate proposal

Jessie - keep in mind the difference between individual and species. Species is an idea. This is an individual taxonomist's view of what they are trying to do, not taxonomists as a whole. This is what the individual thinks he has achieved.

Jim - How does this fit in with Tree of Life, cladistic model, intermediate nodes without names? Could just have diagnoses, characteristics.

Martin - only need to name terminal nodes, internal nodes (clades?) only need internal reference. cladogram is not a formal classification with???

Jim - Most phylogeneticists are not doing this post-classification. Can our system handle this in 10 years when ???

Martin - Concepts are in terminal taxa (cladistic classification)

Susan - store the links between the nodes not a problem ??? can handle them as long as there is a link to something that we recognize

Jim - most systematists do not go back and name all the leaf nodes

Nico - ??? clarification of our mission

Jim - formal classification is fading away - breaking down on the edges, we decide whether we can deal with it with our model


Jessie - get info from ecologist what field guide is used

Susan - get a 2nd hierarchy, then

Implications of new taxonomic schema

Must get a data source with real concepts to fill new schema - better test

  • Nico can do full data for weevils for test case
  • Walter - German mosses as test
  • Bob can put together some plant data w/in 2 months
  • New data
    • tools to import programmatically
    • simple user interface form to put data in our system for testing

datasets - Birds of Mexico does Town have bird concepts too?

  • Bob will export his data in XML
  • Jessie will give us input on mapping of fields
  • Dave will write XSLT translator to go to our latest schema
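The translation step is planned as XSLT; the same idea (rename exported fields according to an agreed mapping) can be sketched in Python with the standard library's `xml.etree.ElementTree`. Everything specific below is an assumption for illustration: the source tags (`export`, `record`, `sciName`, `authority`), the target tags (`TaxonConcepts`, `TaxonConcept`, `Name`, `AccordingTo`), and the `FIELD_MAP` are hypothetical and do not come from Bob's export or from the group's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical field mapping (source tag -> target tag), standing in for
# the mapping of fields that Jessie is to provide.
FIELD_MAP = {"sciName": "Name", "authority": "AccordingTo"}


def translate(source_xml, field_map):
    """Copy mapped fields of each <record> into a new <TaxonConcept> element."""
    source = ET.fromstring(source_xml)
    target = ET.Element("TaxonConcepts")
    for record in source.findall("record"):
        concept = ET.SubElement(target, "TaxonConcept")
        for old_tag, new_tag in field_map.items():
            field = record.find(old_tag)
            if field is not None:  # unmapped or missing fields are skipped
                ET.SubElement(concept, new_tag).text = field.text
    return ET.tostring(target, encoding="unicode")


# Hypothetical export with one record.
export = (
    "<export><record>"
    "<sciName>Aus bus</sciName><authority>Example 2003</authority>"
    "</record></export>"
)
print(translate(export, FIELD_MAP))
```

Keeping the mapping in one table means a revised schema only requires editing `FIELD_MAP`, which mirrors the division of labour in the action items.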

January meeting:

Jim - How can we interact more effectively between meetings? Use cases - haven't done that.

Susan - Have a mission for each meeting
What's the goal

  • assign to a person
  • have a plan unless they are stopped

Maybe we should look at a tool that takes a dataset with taxa names and produces EML (taxonomic coverage only)

Jessie - Do we ever record just genus or family? If so, we should never expand it (in the EML)
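A tool of this kind might look like the sketch below, which builds a taxonomic coverage fragment from (rank, name) pairs and follows Jessie's rule: a record identified only to genus or family is written at that rank and is never expanded into its species. The element names (`taxonomicCoverage`, `taxonomicClassification`, `taxonRankName`, `taxonRankValue`) are given as recalled from EML's coverage module and should be checked against the EML version in use; the function name and sample records are hypothetical.

```python
import xml.etree.ElementTree as ET


def taxonomic_coverage(records):
    """Build a coverage element from (rank, name) pairs exactly as recorded."""
    coverage = ET.Element("taxonomicCoverage")
    seen = set()
    for rank, name in records:
        if (rank, name) in seen:  # list each recorded taxon only once
            continue
        seen.add((rank, name))
        classification = ET.SubElement(coverage, "taxonomicClassification")
        ET.SubElement(classification, "taxonRankName").text = rank
        ET.SubElement(classification, "taxonRankValue").text = name
    return coverage


# Hypothetical dataset: one record identified only to genus, one to species.
records = [("Genus", "Aus"), ("Species", "Bus cus"), ("Genus", "Aus")]
coverage = taxonomic_coverage(records)
print(ET.tostring(coverage, encoding="unicode"))
```

The genus-only record stays a single genus-level entry, so the coverage never claims species that the ecologist did not record.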

Susan - need great data on German mosses and datasets with great EML detailing taxonomic coverage of German mosses, good

If we have another taxonomy - it is fine if we have

Use case:

  • ITIS has full tree of weak concepts
  • have another tree of full concepts, small tree
  • tools for importing conceptual data
  • map concepts between trees

Dave tasks: look at use cases?

  • what kind of questions will be asked by users
  • types of matches to be done
  • what matches are experts interested in
  • how are they represented
  • expansion, weighting
  • business rules from Bob and Nico
  • semantic tools can't take probabilistic stuff into account
  • talk to Joanna about problems, weaknesses strengths
  • instances vs subclasses
  • be able to export our stuff to OWL

Jessie - will work on visualization tools to do the mapping between trees (for someone who knows the data)

Nico - Treemap - maps topologies - hypothetical extensions, doubling of trees, minimizes steps between trees

We should all email more often

Commitment for others within management process (decision making):

  • virtual project parallel to SEEK taxon
  • $$$ for a half-time person to follow Jessie, learn, and write
  • travel money
  • Jessie visits all the places
  • then create a proposal and get a meeting to reach consensus



This page last changed on 29-Jun-2004 10:21:12 PDT by LTER.stekell.