Science Taxon_12_May_2004

Berendsohn: we should emphasize that the proposed systems for exchanging taxonomic information are not just for the benefit of the narrow taxonomic community but for the wider scientific and global community
Bisby: raised the issue of taxonomic treatments as a data source (= aggregators), which give complete views of a genus etc. composed of existing concepts/constituents; sees these compilations as being as important as concepts
Kennedy: pointed out that aggregators would be represented as composing their own new concepts, which might have relationships to (e.g. include) existing concepts; this relates somewhat to Peet's wish for higher-level natural entities, i.e. a name-based view of a concept...
Kennedy: says that one could choose to accept an aggregator's concepts, and if these changed with time one could accept this, or the aggregator (e.g. ITIS) should be obliged to take care of versioning etc.; in any case it would be represented in the transfer schema as an ITIS concept

2. Reactions to Kukla presentation (I)

various participants wanted to make sure that the schema was available on-line (through NeSC or the SEEK repository), and wished they had received it in advance (they should have)
Berendsohn: drew attention to the TDWG ABCD and SDD groups' ongoing resolution of metadata elements, and thought that we shouldn't be reproducing this; also proposed using units instead of vouchers/specimens; this is in the ABCD schema and allows one to refer to statements that don't refer to actual/extant specimens; also drew attention to the SDD description schema, which he suggested could be added as a placeholder within descriptive circumscriptions
Berendsohn: said that IOPI has dropped a similar representation of relationships as directed from one concept to another; as some authors only make observations about relationships, is there a need to create a new concept to capture this, or could an authored relationship just reference existing concepts?
Kennedy: thinks that this ultimately is a new concept, i.e. a new view on taxonomy, and that it simplifies things to have a single representation of concepts and relationships, which may not be much more expensive; regardless of how the schema represents relationships, individual implementations could still represent them in their own manner
Kukla: speculated whether a new concept type, purely for referencing such relationships, might be one solution...
Berendsohn: still asserted that having an author for a relationship was a sensible solution, though he stressed that he was not trying to solve things but merely raising issues
Peet: pointed out that again this was up to individual implementations, and was reflected in Kukla's denormalisation of data in the transfer schema, which again doesn't impose this on the data providers/implementations
Ytow: asked what happens in the special case of ambiregnal species (classified in two kingdoms), which can have two valid names (with different ranks etc.); he thought it would be good to have these represented by a single concept (name-based again), whereas Kennedy viewed them as two separate concepts, which can then be related as congruent etc.
Ytow: also raised the question of what happens if a record contains both sensu and "sec." assertions; Hinchcliffe and Kennedy agreed that these would be separate concepts, each with its own "according to" data

Bisby: was concerned that vernacular names are not represented in the ABCD decomposition of names, so how are they represented? The transfer schema would hold them in a Simple-Name field without decomposing or normalizing them; this would be up to individual implementations that might want to parse them (broad bean example)
Bisby: can the "publication" (referenced) element in the "according to" element hold any type of reference, e.g. electronic, web, database etc.? (answered:) yes

Bisby: raised the question of relationship types: are there enough, are they well defined, and what happens if they are poorly defined in the source? Kennedy and Peet: said that this is very much a work in progress, where it is necessary to collate a list of required relationships from a variety of sources, and that "synonymy" itself is a woolly relationship that can be used for ill-defined relationships; there are different types of relationships, some set-based, others classical, nomenclatural, etc.
Berendsohn: has a large list of relationship types; "misapplied" might be a useful one, but others are for other types of information where it is not easy to see how they would be applied in the transfer schema, for example "conservation status"
Hinchcliffe: confirmed this was an issue with IPNI, e.g. could invalid names be represented just with a flag, or does this information require a new concept?
Kennedy: remarked that the question of when a concept changed sufficiently to be thought of as new is a big issue, and not resolved yet
Berendsohn and Hinchcliffe: seemed to think that names do need a bigger/better representation (with more attributes?) in the transfer schema; Berendsohn agreed that although names and concepts are in general modeled differently, it is not necessary to reflect this in a transfer schema, as long as all the information required by the people who use name concepts is captured; this may overlap with ownership issues: who can amend, correct, or add to another person's concepts
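The set-based relationship types mentioned above can be made concrete by comparing two concepts' specimen circumscriptions. A minimal Python sketch, assuming each concept is circumscribed by a complete set of voucher identifiers (an assumption the discussion itself questions for real data); the relationship labels and voucher IDs are illustrative only, not the transfer schema's actual enumeration:

```python
from enum import Enum


class SetRelation(Enum):
    """Illustrative set-based relationship labels (not the schema's own list)."""
    CONGRUENT = "congruent"
    INCLUDES = "includes"
    INCLUDED_IN = "included_in"
    OVERLAPS = "overlaps"
    EXCLUDES = "excludes"


def relate(a: set[str], b: set[str]) -> SetRelation:
    """Classify concept A relative to concept B from their voucher sets.

    Assumes both circumscriptions are complete; incomplete data would
    make any of these answers unreliable.
    """
    if not a or not b:
        raise ValueError("an empty circumscription cannot be compared")
    if a == b:
        return SetRelation.CONGRUENT
    if a > b:  # proper superset
        return SetRelation.INCLUDES
    if a < b:  # proper subset
        return SetRelation.INCLUDED_IN
    if a & b:  # share some, but not all, vouchers
        return SetRelation.OVERLAPS
    return SetRelation.EXCLUDES


# Hypothetical voucher identifiers for two concepts.
concept_a = {"V1", "V2", "V3"}
concept_b = {"V2", "V3"}
print(relate(concept_a, concept_b).value)  # prints "includes"
```

A real relationship list would also need the classical and nomenclatural types, which cannot be derived from specimen sets at all.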

Hobern: wondered whether relationships should be split up (as mutually exclusive sets?), but was reassured when it was pointed out that more than one relationship can be created for a concept
Berendsohn: questioned how one would extract the original concept from a revised concept; Kennedy thought that this was again getting more into implementation issues
Berendsohn: wondered whether there really was a useful distinction between revised and original concepts; Kennedy thought that original concepts would be a useful baseline: if a system started off by populating these, later concepts would be able to refer to them

Bisby: questioned why circumscription is not treated as any other relationship; Kukla said that at least simplistically, circumscriptions would only refer to the same author's concept (within hierarchies), whereas relationships would be across hierarchies to other authors' concepts
Bisby: also wondered about the direction of circumscription, for example, would character circumscriptions be inherited from parents? At the moment the transfer schema circumscription only deals with downward traversal of the tree, not upward; Kennedy mentioned that interpreting circumscriptions against real data is always problematic, since it is not always clear whether the circumscriptions are complete, or whether they represent an amalgam of specimen and character circumscriptions, etc.; the latter was found in the moss data, where not all the data used to decide about congruence were necessarily captured (see Franz's presentation on Tuesday)
Schenck (sic.?, Austrian participant): asked whether there were any plans to investigate whether relationships could be represented in OWL (or similar applications) to allow reasoning; Kennedy: sees taxonomy as too large a problem to be tackled by current ontology/reasoning approaches, especially when one moves from schema to instances; the size/number/scope of ontologies required would be vast
Thau: reported that he has played with representing the schema in OWL to allow reasoning, though "queries" over OWL are difficult
Hobern: mentioned that from a processing point of view, bi-directional (or at least non-directed) relationships could be traversed more efficiently, whilst Kennedy pointed out that an implementation could choose to traverse a relationship in either or both directions; the schema asserts the direction from the specified concept to another concept in one direction only (from the owning concept/author); more on this later...
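Kennedy's point that an implementation can traverse one-way relationships in either direction can be sketched by building an inverse index at load time. A minimal Python sketch, assuming relationships are held as (source, kind, target) triples; the concept identifiers and relationship names are hypothetical, and this is not the transfer schema's own representation:

```python
from collections import defaultdict
from typing import NamedTuple


class Relationship(NamedTuple):
    source: str  # the owning (asserting) concept
    kind: str    # e.g. "includes", "congruent"; names are illustrative
    target: str  # the concept being referred to


def build_indexes(rels):
    """Index relationships by source (as asserted) and by target (inverse)."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for rel in rels:
        outgoing[rel.source].append(rel)
        incoming[rel.target].append(rel)
    return outgoing, incoming


# Hypothetical assertions, each owned by its source concept.
rels = [
    Relationship("concept:B", "includes", "concept:A"),
    Relationship("concept:C", "congruent", "concept:A"),
]
outgoing, incoming = build_indexes(rels)

# Following the asserted direction: what does concept:B say about others?
print([r.target for r in outgoing["concept:B"]])  # ['concept:A']
# Traversing "backwards": which concepts make assertions about concept:A?
print(sorted(r.source for r in incoming["concept:A"]))  # ['concept:B', 'concept:C']
```

The same inverse index would also support the upward traversal of taxon circumscriptions that the transfer schema itself does not express.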

3. Reactions to Thau presentation

Alistair (last name?): pointed out that many GUIDs are only truly globally unique within a defined domain; LSIDs are potentially globally unique because they are domain-linked; 64-bit GUIDs etc. can be truly globally unique, but are not necessarily resolvable, so they don't solve the problem
Ytow: wanted clarification that the GUIDs refer to digital objects (rather than physical objects), i.e. they resolve to electronic information/a data object
Hinchcliffe: pointed out that there was no guarantee of permanence for distributed GUIDs; if an LSID server disappeared, so might the resolvability of GUIDs that it had issued; one has to trust them to maintain the service and not to change the resources that the GUIDs point to; Handle domains can be transferred, so that this might be less of an issue
Franz: thought that getting into detailed issues of GUIDs was a diversion; we should just decide to adopt and try out a standard, unless we do this we will not learn what the issues and questions are
Thau: affirmed that the Handle System was free, whereas DOIs - which are an implementation on top of Handles - are expensive (and all their extra functionality could be home-made...); LSIDs are also relatively expensive to set up...

Kennedy: raised the point that in her opinion the main question to think about (regarding GUIDs) was the type of uniqueness; that it is an important goal to at least try for a system giving a concept a single GUID from the start; otherwise we will create a resolution nightmare; however, this does have the implication that there is a need for strong policies and possible centralized gatekeeper(s); will this be a problem when trying to get the user community to accept and adopt GUIDs?; this will need a careful balancing act

4. Reactions to Kukla presentation (II)

nobody seemed to object to Kukla's proposal to call the exchange format TCML (Taxonomic Concept Mark-up Language), though the name will need to be checked for availability
Hobern: asked everyone to consider what now were the main things that we need to do to test the exchange process
Peet: wanted to know if he could get his data back from the exchange format...
Kukla: said if the algorithm used in the conversion was known, this would theoretically be possible, apart from data that had been discarded, or ignored (what about all the denormalized fields...?)
Thau: asked whether, if the data were exported from the database as XML, one could not use an XSLT transformation to produce the TCML format; he reported some previous success doing this with quite large datasets; Kukla was worried about the size of the XML datasets, and preferred the string/parsing functions available in Java, etc.
someone asked Kukla what the trickiest part of the conversion is; he replied that it is identifying the actual concepts within the data set
Hinchcliffe: was keen to know if any of the data overlapped with another set, where it would be really interesting to start looking at how concepts relate among them; Kukla said that the ITIS and moss data overlap, that Martin Graham is looking at ways to visualise overlaps between the hierarchies created, and that this is one of the ultimate goals of SEEK: to allow this cross-hierarchy resolution and exploration
Berendsohn: raised some ideas about using wrappers (wrapper applications such as BioCase?), but the details were missed (though see below)
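Thau's suggestion was an XSLT transformation from a database's XML export to TCML. Python's standard library has no XSLT processor, so this minimal sketch swaps in a plain ElementTree mapping that does the same kind of flat-to-nested reshaping. The row attributes and the output tag names are hypothetical, not actual TCML elements, and the sketch sidesteps the hard part Kukla identified: here every row is naively treated as one concept.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat export: one <row> per name usage, as a database dump might produce.
EXPORT = """<export>
  <row id="1" name="Abies alba" author="Author" year="1900" pub="pub1"/>
  <row id="2" name="Picea abies" author="Other" year="1950" pub="pub2"/>
</export>"""


def rows_to_concepts(export_xml: str) -> ET.Element:
    """Map each flat row to a nested TaxonConcept element (illustrative tag names)."""
    source = ET.fromstring(export_xml)
    concepts = ET.Element("TaxonConcepts")
    for row in source.iter("row"):
        concept = ET.SubElement(concepts, "TaxonConcept", id="tc" + row.get("id"))
        ET.SubElement(concept, "Name").text = row.get("name")
        acc = ET.SubElement(concept, "AccordingTo")
        ET.SubElement(acc, "Author").text = row.get("author")
        ET.SubElement(acc, "Date").text = row.get("year")
        ET.SubElement(acc, "PublicationRef", ref=row.get("pub"))
    return concepts


result = rows_to_concepts(EXPORT)
print(ET.tostring(result, encoding="unicode"))
```

Because ElementTree builds the whole tree in memory, Kukla's concern about very large XML exports applies here too; a streaming parser such as `ET.iterparse` would be the usual remedy.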

5. Reactions to Kukla presentation (III) and final discussion

Hobern: started off by saying that from GBIF's standpoint, they were interested in providing users with suitable web interfaces to retrieve taxonomic name data, along the lines of what SPICE has already implemented, but that this exchange format would be used to provide fuller concept information from resources where this is available
GBIF has 22 use cases for such query/exchange, as well as protocols for exchange of full datasets between tools/resources, which is not seen as requiring a transfer format
Franz: has looked at these in comparison with 15 SEEK use cases developed with Peet; all these use cases are on the web, people should look at them and consider overlap, what is missed, etc., though Kennedy reminds us that the TCML is just a data exchange format, which doesn't directly implement/support APIs
it was agreed that the development of TCML should be separate from the development of implementations, but that the implementations being developed by Bisby/SPICE and Dave Vieglais (KU/SEEK) should clearly leverage and feed back into the TCML development; we will need two reference implementations of the standard when it is considered for adoption by TDWG. Walter pointed out that 3 months' notice before a TDWG meeting is required prior to voting on/adopting a proposal, so this couldn't happen at this October's meetings
various participants (especially Bisby) were keen to have another workshop/meeting scheduled just prior to the full plenary TDWG meetings in October

Kukla: is unclear about how much more development is envisaged for TCML, i.e. is a query language against TCML required, or would queries against the databases return results in TCML format, etc.; the discussion did, however, clarify that TCML development should be kept separate from implementation development; Hobern saw the purpose of TCML as being the format used by queried resources to return retrieved data
Bisby: was keen to stress that SPICE was happy to develop protocols for dealing with taxonomic concept resources when they became available, but they weren't out there yet...
it was felt that there definitely should be more cross consideration of development work between SPICE and SEEK etc., and that it would help if it was clear that data models and project specifications, APIs, etc., were all easily available
Berendsohn: encapsulated this TCML project as providing the data definition for all processes or processors who want to use the data; and that now is the time to examine the use cases and ensure that the TCML can support them, especially so that we avoid having to download whole datasets to answer simple queries, and avoid everyone doing the same functions in incompatible ways...
Kennedy/Peet: agreed that there was coordination needed between SEEK/GBIF/SPICE use cases

Kennedy: asked everyone to raise any issues with the schema now
Bisby: congratulated Kennedy and Kukla on overall structure; the main issue is "what is one taxonomic concept?"
Hobern: thinks the system works because it isn't dogmatic about what a concept is; the schema allows multiple optional parts to the definition
Franz: agreed; shallow concepts (name+reference) are allowed, with a new concept for each change; rich concepts are supported by the schema, but the problem is that few if any databases provide rich concepts; it should be up to the scientists/taxonomists/botanists etc. to push for databases that capture such information
Bisby: opined that it is impossible for machines to resolve taxonomic concepts, and that this is the job of expert taxonomists; as these are a rare and dwindling resource, we have to capture as much of their knowledge as possible; for example, if there are four different concepts, only an expert can resolve them; in Kennedy's parlance this creates a fifth concept which asserts synonymy relationships; however, many other participants were unhappy that one cannot simply record the synonymy assertions (in the transfer schema) by authoring a relationship, and instead has to create a new concept
Berendsohn: is most concerned that some relationships would be lost in the current model, and indeed were lost in the moss work in the original (and similar) IOPI model, because there was no way of recording relationships between other people's concepts; most people agreed that this seemed to be an important omission from the model; Kennedy justified this omission on the basis that some taxonomists had told her that they would be unhappy for third parties to change/make relationships from other people's concepts, so currently it is not allowed unless it is tied to your own concept; Berendsohn countered that authoring a relationship makes it clear that one is interpreting someone else's work, which remains "untouched" in its original version
Bisby: thinks there is a lot of work to be done on these relationships, suggested experimentation (?) to see if TCML can capture all of the relationships that the various providers want/need
Kennedy: says she is happy to set up lists/wikis to allow everyone to pool the relationship types that are needed, and then look at this set with the experts to see if it is comprehensive, has exclusivities/incompatibilities, etc.; it is worth considering whether the list of relationship types should be expandable, or fixed at some time...
Bisby: liked the idea of separate plug-in libraries of relationships that people might choose among (how appropriate is this for a general transfer schema?), for example at one level one can categorise relations into ambiguous vs. unambiguous, but this categorisation cannot be mixed with fuller enumerations/lists
there was basic agreement that people should get together to work on this list of relationships, and on whether they should be directed or not (Berendsohn, Bisby, Peet, White, and some zoologists confirmed their basic willingness to work on these issues)

Peet: came back to his GUID inflation worries, and when is a concept changed/new, which will need good rules or a deciding authority
Hobern, Kennedy, and everyone seemed clear that it is not desirable to represent slight changes/typos/different entries/documentations with separate GUIDs, e.g. there should only be one GUID for a Linnean original concept, not different GUIDS caused by slightly different accuracies of data entry
Berendsohn: suggested the idea of a dummy concept for every name in use, this would be a very weak concept, like a reference concept?; but it could be one way to allow easy access into the data, and capture uncertainties
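The worry about GUID inflation from slightly different data entry can be illustrated with one possible technique that was not discussed at the meeting: deriving the identifier from a normalized key, here name plus "according to", using a name-based UUID (RFC 4122 version 5). A minimal Python sketch; the namespace URL and the example citations are hypothetical, and the normalization only absorbs trivial differences (case, spacing, punctuation), not genuine typos such as a wrong page number, which would still need the rules or deciding authority Peet mentioned.

```python
import re
import uuid

# Hypothetical namespace for this illustration; a real registry would choose its own.
CONCEPT_NS = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/taxon-concepts")


def normalize(text: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def concept_id(name: str, according_to: str) -> uuid.UUID:
    """Derive a stable identifier from a normalized name + 'according to' key."""
    key = normalize(name) + "|" + normalize(according_to)
    return uuid.uuid5(CONCEPT_NS, key)


# Hypothetical entries of the same original concept, typed slightly differently.
a = concept_id("Abies alba", "Author, 1900")
b = concept_id("abies  alba", "author 1900.")
c = concept_id("Abies alba", "Author, 1950")  # a different "according to"
print(a == b)  # True: only case, spacing and punctuation differ
print(a == c)  # False: a different concept gets a different identifier
```

Deterministic identifiers of this kind only help if every provider applies the same normalization, which is itself the sort of policy question raised here.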

Berendsohn: tried summing up that we should

dump vouchers or specimens and replace them with ABCD unit elements,
dump metadata as SDD and ABCD are working on a generic solution,
dump description as this conflicts with SDD; but Kennedy says that since description can be of any type, this is not necessary: it can accommodate SDD descriptions, or anyone else's

Bisby: was concerned about using EndNote as the standard for publication fields, as there are actually TDWG guidelines for this (though there is no XSD representation)
Berendsohn: says he still has some issues about using historical names, etc.

Meeting Notes, Version 2 (Nico Franz, both talks & discussion)

PPTs still need to be added

1. Bisby Introduction

British efforts (SEEK-homolog: Biodiversity World); US efforts
larger picture: TDWG - development of standards for taxonomy, since 1985; though a general, world-wide, collaborative pattern in biodiversity research has been going on for many more years

2. Kennedy Presentation

overview: goals, progress so far
today: there are many name-based taxonomic databases; relations among uses of names are not always clear; yet they are rich resources of information
a few modern databases try to capture "taxonomic concepts", though in different ways; yet these do not hold much data that translates well into a "concept format", raising the question of whether "concepts" are necessary for reliable scientific work
so then: "what is a taxonomic concept?," - perspective-dependent, many notions are out there, though a common view will facilitate data exchange, etc.
towards agreement: Portugal TDWG
towards a transfer schema: consult with main stakeholders in the taxonomic database development community: what was essential to transport?, what are the important issues and motivations?
analysis of differences and similarities, strengths and weaknesses, core interests and specializations; main motivation: to share information across multiple systems

today: half-way point in development of a transfer standard, final, revised version will be presented at TDWG in New Zealand
listing of entities/projects consulted
main perspectives: (1) 'revisionary taxonomist' (Berlin Model, Prometheus): full, classificatory revision of a particular taxon, differences in handling synonymies, more/less focus on specimens; (2) 'taxonomy as recorded in publications' (Taxonomer, Nomencurator): assertions from (any?) publications, also identifications and circumscriptions (though not always); (3) 'species-focused taxonomy' (Species 2000, ITIS, Biodiversity World): under the assumption that a "definitive" list is achievable and "finished" at some point (mostly name-centered approach); (4) 'name-based taxonomy' (IPNI, APNI): published names and validity according to the Codes (traditional view); (5) 'database taxonomy' (GBIF, SEEK, VegBank): collecting and understanding taxonomic assertions as stored in actual databases (not just traditional publications), the most wide-ranging approach, attempting to capture "everything"
in general: taxonomy encompasses classification, nomenclature, and identification; sometimes these activities are not separated well enough, causing misunderstandings
names versus concepts distinction: names require definitions
full scientific name - implies an 'original concept' - according to author/date attached to the name
revisionary taxonomy: multiple definitions can be associated with a name - 'revised concept'
'reference concept:' not well defined, ambiguous; a link to someone else's concepts (in other publications), e.g. just a mention of a full scientific name in a publication that really emphasizes its own concepts
over time: 'reference concepts should be replaced in the database by either original or revised concepts'
finally: 'vernacular concepts' - "labels" - used to allow possible later replacement by deeper concepts
'thus: "names" are not really needed in the database'

how to define a concept: are character circumscriptions necessary (language is context-dependent, ambiguous), currently: optional in the transfer schema
or: concept has a taxon circumscription (?) - in terms of lower level taxa (as done in Prometheus)
yet another alternative way to define concepts: through specimens (minimally: type specimens - original concepts)
what about relationships to other taxa: part of the concept definition?
finally: the "publication as is" (available on-line) and "vouchers" as support for the definition
now: where the project is so far, clarify the semantics, highlight the main features
Robert Kukla: overview of schema; then Dave Thau: GUIDs (what are the possible roles, reference to concepts across databases); Kukla: experiences with ITIS/Berlin Model data in terms of applying the transfer schema
'outstanding questions/issues:' can we talk about names independently of concepts? are common names important? is it possible to change the definition of an existing concept (e.g. in terms of information added)? when should we consider having a new concept (in the deep sense)? how do we separate "creation of" from "identification to"? who "owns" concepts? who owns GUIDs? also: we don't even have a comprehensive list of names, so why worry about concepts already? what about errors (spelling mistakes, quality, ...)? however: 'problems need to be tackled sooner rather than later!' (problem of inflation if no regulations exist)

Questions/Discussion

Berendsohn: '"why?" do we need concepts' - usability as a justification - possibility to exchange information tied to taxa
Bisby: affirming this, "we" need to emphasize that concepts will be the basis of the knowledge structure, 'involve taxonomists in connecting information to which ecologies etc. is linked'
Bisby: goal of species-focused lists: not necessarily depth in concepts, but a comprehensive, all-encompassing, internally consistent treatment of (name-based) taxa; consistency is the measuring stick, not necessarily depth; this is what aggregators/compilers like ITIS are engaged in
Kennedy: rearranging constituents during this process "creates" new concepts; this is like a "revision" of the entire taxonomic tree, with changes at various levels
Bisby: necessary to have a "notion" of upper-level concepts that are consistent throughout
Berendsohn: some databases change concepts without necessarily "labeling" those changes, and we need to discuss the implications of this practice on ...
Bisby: biological "integrity" is desirable/necessary - we should have a term for this
Kennedy: this might be addressed in the transfer schema

3. Kukla Presentation (I)

'transfer schema' - used to pass around taxonomic concept information
XML, tree structure; extra conventions - particular to transfer schema
element names, attributes, repeated elements, complex elements
top level of transfer schema: root = TDWG
the 4 container elements: 'metadata (what's contained, who created it), taxon concepts, vouchers, publications'
'little normalization' since different databases de-/normalize different items in idiosyncratic ways
description of metadata (creator, etc.), with indexing service
now: voucher - recording information about the location of physical specimens (catch phrase: actual repository information)
next: publication - simple version (human-readable) + detailed, atomized version for computers to reason about - EndNote as a de facto standard, generic list of terms
now: 'taxonomic concepts - 4 core types:' original/revised (= "own" concept - everything that is available (known) about the concept is actually present in that publication) /referenced (incomplete, ambiguous, not completely specified like it is in other publications) /vernacular (odd type - downstream possibility of further resolution/specification as original or revised concept)
components: '(1) name: simple/detailed' (bacterial/botanical/zoological/viral), based on ABCD schema (some structures added to accommodate higher-level taxa); '(2) "according to"' author/date/publication + microreference; '(3) relationships' - reflecting views of authors - concept synonymies + name synonymies; '(4) specimen circumscription - vouchers' (types, etc.); '(5) taxon concept circumscription' - based on other concepts (at a lower rank - typically only one level lower); '(6) kingdom, (7) rank, (8) payload, (9) character circumscription'
summary: denormalized XML, shorthands for atomized versions, GUID support, ABCD schema used
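To make the container structure described above concrete, a minimal Python sketch that builds a skeleton document with the standard-library ElementTree: root `TDWG`, the four containers, and one concept carrying a name, an "according to" with a microreference, a relationship and a rank. Apart from the root `TDWG`, every element and attribute name here is an illustrative guess, not the actual transfer-schema tag, and the publication and concept values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Root and the four containers: metadata, taxon concepts, vouchers, publications.
root = ET.Element("TDWG")
metadata = ET.SubElement(root, "Metadata")
ET.SubElement(metadata, "Creator").text = "Example Project"  # hypothetical
concepts = ET.SubElement(root, "TaxonConcepts")
ET.SubElement(root, "Vouchers")
publications = ET.SubElement(root, "Publications")

# A publication with only its simple (human-readable) form filled in.
pub = ET.SubElement(publications, "Publication", id="pub1")
ET.SubElement(pub, "Simple").text = "Author, A. (1900). Example title."

# One "original" concept: name, according to (+ microreference), relationship, rank.
concept = ET.SubElement(concepts, "TaxonConcept", id="tc1", type="original")
name = ET.SubElement(concept, "Name")
ET.SubElement(name, "Simple").text = "Abies alba"
acc = ET.SubElement(concept, "AccordingTo")
ET.SubElement(acc, "PublicationRef", ref="pub1")
ET.SubElement(acc, "MicroReference").text = "p. 12"
rels = ET.SubElement(concept, "Relationships")
# Directed from the owning concept (tc1) to another author's concept (tc2).
ET.SubElement(rels, "Relationship", type="congruent", toTaxonConcept="tc2")
ET.SubElement(concept, "Rank").text = "species"

ET.indent(root)  # pretty-printing; requires Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```

The publication is referenced by `id` here for brevity; the denormalized style described in the presentation would allow the same details to be repeated inline instead.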

Questions/Discussion

is the 'schema accessible to others for revision?' answer: yes - in CVS ecoinformatics, project "Taxon"
Berendsohn: it would be nice to have the schema in XML (XMLSpy format) directly available in the e-mail as an attachment (in general, not just here)
Berendsohn: '3 of 4 core parts are in essence references to other projects:' 1. metadata: description standards (ABCD), 2. vouchers should be renamed "units" as in the ABCD schema (metadata envelope), to record observations as well (when no vouchers are available); versions of ABCD and SDD are still being revised, 3. circumscriptions are also already covered elsewhere (SDD, ABCD)
Berendsohn: re - relationships being directed, problem case: 'sometimes authors make comments about relationships without saying anything else about concepts,' synonymy relations are "sitting on top" of concepts, otherwise things get complicated (assuming the author of concept is also author of concept relationship, which is not always the case, and/or useful)
Kennedy replying: those authors have "another notion" of, e.g., Abies alba - and relate that new one to all previous ones through synonymy, ergo '"just relationships" are interpreted as if they were new concepts;' also: transfer schema is not a data model, so others might do this differently
Peet: this issue is a reflection of 'different people's priorities to de-/normalize things'
Ytow: issue of identical names classified in two kingdoms - Kennedy - these are two concepts that need to be synonymized after the fact; this illustrates the name/concept difference
Ytow: the name is more precise in this case: 1 name entity referring to 1 natural entity, independent of the 2 ways of classifying the name in separate publications
Kennedy: it's ultimately simpler to treat each as a concept and relate them afterwards

Bisby: name acc. to 4 codes of nomenclature - vernacular names are handled only in the simple system, even though their structures could be atomized; Kennedy: if that's the case then the transfer schema supports that
Bisby: difference between "sec." and "publication" - publication here sensu lato, including e.g. webpages
Bisby: name and concept synonymy, most are name synonymies (99%) which are ambiguous; Kennedy: "ambiguous" is a case of synonymy that's accommodated
Bisby: 'what about misapplications?;' Kennedy: these are candidate concepts as well
Peet: how to denote concept synonymies is an outstanding issue
Berendsohn: 'nomenclatural relationships' are an issue, but the Codes require other things about names (e.g. 'names conserved' in themselves, not in relation to anything else); there are some genuine name issues that need to be addressed, and traditional standards about how to handle names need to be accommodated, which is currently not the case in the transfer schema; Kennedy: this might be addressed, also talking to IPNI, things like "not available"; Berendsohn: certain properties of names need to be addressed in a fully fleshed-out concept view; this is not covered by ABCD, not covered in the current schema, yet necessary, and a property of the name (which creates potential issues in a full-scale concept world view)
Berendsohn: we can solve these issues by stipulating a new concept when something about the status of a name is modified/issued (e.g., conservation, or not, etc.)

Hobern: maybe the relationships need to be fleshed out because some are orthogonal to each other; Kennedy: this is possible in the current concept
Ytow: to support vernacular names, a "local field" (ISO standard) is helpful (to specify region of use); also: some names have both "sensu" and "sec." associated with them; Kennedy: again, 2 concepts are created
Berendsohn: every revised concept necessarily has to have a relation to the original concept; how is that accommodated?; Kennedy: some databases can do this, others need to come to terms with how this is addressed (e.g. ITIS, when effecting the transfer process, must specify how its merely implicit assumptions about relationships should be made explicit according to the preferred ITIS view)
Peet: relationships are optional
Kennedy (to Berendsohn): original/revised concept distinction helps us understand things about various entities' use (meaning) of the term "concept", but has no bearing on the transfer schema structure
Bisby: issue about "inclusion" as synonymy vs. parent/child relation (the two can sometimes merge into each other), also character circumscriptions (inheritance of characteristics from above)
Kennedy: some of these intricacies in circumscriptions can't be accommodated through set relationships alone, plus there is the matter of going up & down the tree; there is a basic difference (as represented in the transfer schema) between relationships within a treatment and relationships among treatments (point has now been clarified for Bisby)

question: will this be mapped onto an 'ontological format' like OWL, taking advantage of the work that has been done there? Kennedy: SMS explores that, so far nothing is planned, since traditional ontologies are trivial in comparison to what happens in taxonomy in terms of range and contextuality; it's just too complex
Thau: OWL representations of transfer schema have been explored, though actual utility is largely unclear
Thau: it is difficult to query OWL, which really means reverting to XML (which is neutral to the object (nested)/relational (flat) distinction)
Hobern: the transfer schema structure implies that it is difficult to work one's way up the hierarchy (since relationships are unidirectional and flattened out); Kennedy: this is a transfer schema, not an optimal way to display and explore the data

4. Thau Presentation

GUIDs - how to access/use taxonomic concepts
(1) what are GUIDs: ISBNs, patents, GenBank (launched through scientific publishers), DOIs, LSIDs (Life Science Identifiers); common features: short, useful for locating information about entities, identifying only one entity (uniquely or not), a sense of permanence; differences: whether an entity may have only one GUID or several, and who is in charge of issuing GUIDs (decentralized or centralized)
(2) why GUIDs: useful internally to manage taxonomic information, and to enable communication across systems that are not under one's immediate control and across user communities (e.g. the scientific publishing industry); plus the typical GUID motivations (see above), and resolvability
(3) when: taxonomic concepts, publications, specimens (ability to reason across concept synonymies through the inclusion/exclusion of specimens); less clear whether to assign GUIDs to various other things: providers, journals, authors, etc.; when is a concept "new"? optimal: one GUID per concept; issues of implementation: which minor changes are allowable (e.g. discovery of an error in inputting the page number of a concept)? assign these implementation issues to the actual domain specialists and empower information providers
(4) which kind of GUIDs: serious candidates are LSIDs and the Handle System (used by the DOI system); the Handle System is currently the most used (3 million resolutions per year) and 10 years old, though it is proprietary (somewhat centralized) and its code is not available (not cheap); the two are relatively similar; SEEK chose the Handle System in a prototyping effort; it is neutral to web services (which is advantageous when "owners" change)
(5) how to encourage users to assign and use GUIDs; is it necessary, which system, is a 1-GUID-1-concept rule necessary?

Questions/Discussion

comment: GUIDs are unique only within certain domains of discourse, so the boundaries of these domains must be specified; Thau: in the case of LSIDs, the domain is the Internet; boundaries can be specified in various ways: through agreement, through stipulation, or by leaving each party free to choose which GUIDs to accept
Ytow: do GUIDs identify physical specimens too?; Thau: only as digital proxies thereof
Hinchcliffe: issuing GUIDs does not solve the permanence issue; Thau: this is not an issue particular to taxonomic databases
question: what does issuing/maintaining/resolving GUIDs cost? (though the DOI system provides extra services, which are paid for)
Kennedy: a 1-to-1 system is necessary to avoid a ridiculous inflation of GUIDs
Peet: what counts as a concept in the GUID context has not yet been addressed!; lots of room for argument

5. Kukla Presentation (II)

experience from mapping existing models to transfer schema
issue of identifying concepts, extracting relationships, circumscriptions; no character/specimen circumscriptions
(1) ITIS: synonymy relations are ambiguous; circumscriptions through children are viewed as "incomplete"
(2) Berlin Model: easiest to implement (mosses/plants); many different relationships
(3) Taxonomer: complex model; assertions; protonyms (original concepts); some difficulties in translating Pyle's degrees of confidence into concepts; relationships not yet populated
various technical aspects

Questions/Discussion

Hobern: necessary to also identify "things to do" in the future
Peet: what about getting the structure back into the source database?; Kukla: possible if the algorithms are known; Hobern: transfer packages should come as input/output packages
Thau: what about using XML documents as repositories?; Kukla: they might be too large, and look-ups are tedious
main problem: how to interpret data as concepts (the taxonomic meaning of the information)
Hinchcliffe: what about transferring the mosses as treated in the German list and in ITIS?; Kukla: visual comparisons are still pending (Martin Graham); Kennedy: this is also planned in SEEK; Hinchcliffe: this could be a way to see whether relating more complex concepts to simpler ones adds value
Berendsohn: future task: use mapping/wrapper protocols (Digger, BioCase, BioDigger) to extract information from other species-oriented databases

6. Kukla Presentation (III)

Hobern (moderating): GBIF soon needs a suitable web interface for users to access taxonomic information (web service interface), along with use cases for users; how much of this is covered by the transfer schema?; Kennedy: the transfer schema is about acquiring data; there are various ways of displaying it
Hobern: transfer of metadata (who authored the concepts?) needs to be covered; the data format being adopted should support the kinds of use cases the major centralized providers are trying to address; also: what about compatibility with Digger etc.?
Kukla now presents TCML (Taxonomic Concept Markup Language): create, transport, manipulate, transform
research and record concepts; transfer checklists

7. General Discussion

Hobern: TCML as a way to mark up a database in order to describe queries into it; a TCML document would contain information about how concepts are conceived in the database
Bisby: Species 2000 is funded to develop a protocol to do these things; although it is focused on names, Species 2000 should be emphasizing these processes; Thau: have the APIs been published?; Bisby: yes, they are online (not an API, however, but a protocol)
Berendsohn: necessary to distinguish between distributed and centralized views
Bisby: the same interface could handle both kinds of queries, across one particular set or many sets of (accepted) databases (Spice)
White: actually the Species 2000 protocol is more like an API than just a protocol, and it's accessible
Thau: does it have the full concept functionality?; answer: this is intended, but not yet implemented

Vieglais: SEEK has already published an openly accessible API for this purpose
Kennedy: who has a prototype protocol?
Berendsohn: distinguish between protocols and data definitions; protocols need to be more efficient, able to query things without having to access all possible/available data
Hobern: 'protocol/API must be able to handle the potential complexity of the taxonomic information; need to organize various efforts'
Berendsohn: important to bring together the people who are working towards standards for access to taxonomic data; related issue: examine the structure of the transfer schema
Kennedy: 'how does progress on the protocol relate to progress on the schema?'
Hobern: 3 tasks: 1. transfer information (ok for protocol); 2. support the kinds of queries that SEEK/Spice envision (already more difficult); 3. have something like a matching Digger protocol - i.e. TCML; 'transfer schema should drive protocol development'
Bisby: how are we planning to allocate time for these issues at TDWG in New Zealand?

Kennedy: 'are there any major objections to the transfer schema?'
Bisby: general acceptance, but "what is a concept?" remains an issue
Berendsohn: 'relationships must "sit on top"' (triangular drawing of concepts A, B and C: A is not equal to B, and A is not equal to C; how can the equality of B and C be annotated without referencing A?): his only objection to the current transfer schema
Thau: this problem also comes up in cases of names that are not valid (according to someone)
Bisby: necessary to capture relations among concepts as experts conceive them
Berendsohn: 'let's have one set of people explore concept synonymies further'
Peet: 'inflation of GUIDs'; Kennedy: legacy data: each publication (original concept) should get only one GUID
Hobern: 'GUIDs are probably the key issue we still need to address!'
Peet: SEEK envisions a tool to mark up datasets, to achieve name/concept relationships, hierarchy of qualities
Berendsohn: use the term "units" (sec. ABCD)
Bisby: 'TDWG standards' - 1. names from ABCD; 2. vouchers/units (ABCD); 3. metadata (SDD); 4. descriptive data (STD)
Hensley-Project at Smithsonian - Biologia Centrali-Americana - new insights about how to reference the legacy publications (see http://www.sil.si.edu/bcaproject/resources.htm)
'important to keep on getting feedback'

Miscellaneous Additional Notes (by Aimee Stewart)

Ytow: why do we need concepts? why not use names only?
Peet: this is just an exchange schema, a way of denormalizing; you can do what you want and normalize in your own way for your own purposes
Bisby: wants support for complex common names and relationships; wants a synonymy relationship that need not be explicit
Berendsohn: relationship between concepts and names: non-relationship, names conserved?, invalid names; thinks the name should be a separate entity/concept; every revised concept is attached to an original concept
Ytow: vernacular names should support the ISO standard with language, etc.
question: any plans to represent this schema in OWL?; JK: is this an ontology problem? no!
comment: GUIDs are ONLY unique within a certain domain; issues of trust, permanence
Kennedy: we must decide on the importance of the 1-GUID-per-1-concept rule; she says it is important, since otherwise the problem becomes more complex than the one we have now

Attachments:

TDWG_robert3.ppt		69632 bytes
TDWG_robert1.ppt		173056 bytes
Why_do_we_need_a_taxonomic_concept_transfer.ppt		130048 bytes
SeekTaxonMay12Summary.doc		34304 bytes
TDWG_robert2.ppt		81408 bytes


This page last changed on 07-Jul-2004 10:03:12 PDT by LTER.stekell.


PPT presentations

Meeting Notes, Version 1 (Trevor Paterson, focus on discussion)

1. Reactions to Kennedy presentation

2. Reactions to Kukla presentation (I)

3. Reactions to Thau presentation

4. Reactions to Kukla presentation (II)

5. Reactions to Kukla presentation (III) and final discussion

Meeting Notes, Version 2 (Nico Franz, both talks & discussion)

1. Bisby Introduction

2. Kennedy Presentation

Questions/Discussion

3. Kukla Presentation (I)

Questions/Discussion

4. Thau Presentation

Questions/Discussion

5. Kukla Presentation (II)

Questions/Discussion

6. Kukla Presentation (III)

7. General Discussion

Miscellaneous Additional Notes (by Aimee Stewart)