SEEK-Wiki: Taxon Meeting Jan 30

+            Taxon Meeting Jan 30_2003
         
          
          

          
At line 0 added 220 lines.
+ ; Polycom address : 205.253.57.82
+
+ !! Attendees
+ * Jim Beach
+ * Crispin Wilson
+ * Bob Peet
+ * Jesse Kennedy
+ * Dave Vieglais
+ * Bill Michener
+ * Matthew Jones
+ * Ricardo Pereira
+ * Scott Downie
+ * Greg Vorontsov
+ * Aimee Stewart
+ * Meg Kumin
+
+ !! Links
+ * [SEEK-Home | WelcomeToSEEK]
+ * [ Use cases on SEEK CVS site | http://cvs.ecoinformatics.org/cvs/cvsweb.cgi/seek/docs/design/design-ams-1.0.0.txt?rev=1.1&content-type=text/x-cvsweb-markup]
+ * [Prometheus | http://www.dcs.napier.ac.uk/~prometheus/]
+ * [David Remsen's uBio site | http://www.ubio.org/people/team.html]
+ * Natural History Museum, London
+ * [James Ytow's Nomencurator site | http://www.nomencurator.org/]
+ * [James Ytow's comparison between models work - presented at TDWG 2001 still in progress | http://www.nomencurator.org/TDWG.html]
+ * [Richard Pyle's schema for concept based taxonomy | http://www2.bishopmuseum.org/PBS/schema/PyleSchema.pdf]
+ * [Nature Serve | http://www.natureserve.org/]
+ * [FGDC Biological Data Working Group -Biological Nomenclature/Taxonomy Meeting Summary | http://biology.usgs.gov/fgdc.bio/FGDC_TaxNom.doc]
+ * [TDWG Subgroup on Biological Collection Data | http://www.bgbm.org/TDWG/acc/Referenc.htm]
+ * [IOPI | http://www.bgbm.org/IOPI/IOPIModel73/7301root.htm]
+ * [Perobase - J Rose, South Carolina | http://wotan.cse.sc.edu/perobase/datamodel/Taxonomy/taxonomy.htm]
+ * [SPICE | http://www.systematics.reading.ac.uk/spice/]
+ * [LITCHI | http://litchi.biol.soton.ac.uk/]
+
+
+
+ !! Crispin Wilson
+ * [http://65.205.36.26/taxon/index.asp] (login: bcis, 12@bcis)
+
+ Multiple taxonomy resolution with annotation.  Primarily for browsing; it should not be too hard to add a programmatic interface.  Data are stored within a single database (a centralized repository).
+
+
+ !! Use Cases
+
+ There is now a TaxonUseCases page, which contains the current list of use cases for the SEEK taxon group.
+
+
+ !! Questions
+ * What is a Taxonomic Concept?
+ * What sort of mapping methods?  Hard coded by specialist? Automated?
+
+ ----
+ !! Revised Scope
+ Services:
+
+ * An Internet taxonomic concept (assertion) resolution service employing a semantic mediation engine would exploit the SEEK architecture to enable precise species concept based data discovery and integration.
+ * Service to provide a measure of relative equivalence of two (or more) taxonomic names.
+ ** e.g. Algae taxonomy is very dynamic; a mechanism is needed to determine whether the data in two experiments (such as density measurements) can be compared: was each experimenter working with the same organism?
+
+ * Mechanism for integrating existing and ongoing efforts rather than creating another stand-alone system
+ ** Develop standard interchange model
+ ** Standard API for accessing system(s) regardless of back-end architecture and implementation
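
As a rough illustration of what a "measure of relative equivalence" between two names could look like, the sketch below scores two names by the overlap of the concepts each has been used for. The notes do not define a measure; the Jaccard overlap, the function name `relative_equivalence`, and the sample data are all assumptions made for this example.

```python
def relative_equivalence(name_a, name_b, name_to_concepts):
    """Jaccard overlap of the concept sets two names have been applied to.

    Hypothetical measure: 1.0 means both names map to exactly the same
    concepts, 0.0 means they share none (or neither name is known).
    """
    concepts_a = name_to_concepts.get(name_a, set())
    concepts_b = name_to_concepts.get(name_b, set())
    union = concepts_a | concepts_b
    if not union:
        return 0.0
    return len(concepts_a & concepts_b) / len(union)


# Invented sample data: each name maps to identifiers of the concepts
# (published usages) it has been applied to.
usages = {
    "Name A": {"c1", "c2"},
    "Name B": {"c2", "c3"},
    "Name C": {"c4"},
}

score_ab = relative_equivalence("Name A", "Name B", usages)  # one shared concept of three
score_ac = relative_equivalence("Name A", "Name C", usages)  # no shared concepts
```

A score near 1.0 would suggest that data recorded under the two names can be compared directly, while a low score would flag the comparison for review by a specialist.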
+
+
+ Architecture:
+
+ * Central facility for the SEEK environment
+ * Shall support distributed classification and concept maps with standard schemas for storage and programmatic interfaces
+ * Must maintain autonomy for data providers.  Architecture provides value to authors for contributions.
+ * Interface specification not implementation.  Providers must be able to support the interface defined by this project.
+ * Support multiple classifications with arbitrary number of levels
+
+ Why Distributed?
+ * Contributors retain control
+ * Scalability
+ * Distributed in the sense of contribution and editing, not necessarily for database infrastructure.
+
+ Why Not?
+ * Need to provide a fast, reliable service.  Distributed model can make this quite complicated.  This is primarily a problem for the database, not the activity.
+
+ Populating the Resource:
+ * Populate initially with weak concept lists (such as ITIS)
+ * Populate in more detail through several tiers of concepts and relatedness for some portions to demonstrate capability and functionality
+ * Prioritize population process by the needs of demonstration experiments for the project.
+
+ * What is the minimum content for a concept entry?
+ {{{
+ concept = Full name + (author + publication, date)  +  usage reference [to taxonomic work]
+ }}}
+ ; '''Full name''' : can be a "proper" scientific name or any other string that provides a handle that was used in a publication (e.g. an experiment label such as "spp#1")
+
+ ; '''Minimal e.g.''' : ITIS as a dynamic system does not provide a set of concepts, but a published version of ITIS would qualify as a set of concepts.
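
The minimal concept entry above can be sketched as a small record type. This is only an illustration under assumptions: the class name `Concept`, the field names, the string-typed date, and the sample values are invented here, and the notes do not prescribe any representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Minimal concept entry: full name + (author, publication, date) + usage reference."""

    full_name: str        # scientific name or any published handle, e.g. "spp#1"
    author: str
    publication: str
    date: str             # kept as text; the notes do not fix a date format
    usage_reference: str  # the taxonomic work in which the name was used

    def label(self) -> str:
        """Human-readable handle combining the name with its usage reference."""
        return f"{self.full_name} ({self.author}, {self.date}) as used in {self.usage_reference}"


# Two invented entries that share a name but cite different taxonomic works.
a = Concept("Aus bus", "Smith", "Journal of Examples", "1950", "Flora A 1968")
b = Concept("Aus bus", "Smith", "Journal of Examples", "1950", "Flora B 1997")

same_name = a.full_name == b.full_name  # True: the name strings match
same_concept = a == b                   # False: the usage references differ
```

Because equality covers every field, two entries with the same name and authorship still count as different concepts when they refer to different taxonomic works.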
+
+
+
+
+
+
+ Names by themselves are a useful entity
+
+ Names are not substitutes for concepts
+
+ ----
+
+ !!! From the Proposal
+
+ An Internet taxonomic concept (assertion) resolution service employing a semantic
+ mediation engine would exploit the SEEK architecture to enable precise species concept based data discovery and integration. This specific concept identity resolution problem is representative of a large class of problems (e.g. with classifications for biotic communities, soils, rocks, places) where there exist many-to-many relationships between concepts and names. The solutions we develop should have fundamental utility far beyond biological nomenclature and biodiversity.
+
+ !! IT Research Challenges
+ * Development of a comprehensive conceptual model that can represent all relevant aspects of biological classification and nomenclature semantics, specifically models of multiple interpretations depending on explicit representations of context information, e.g., temporal, hierarchical and circumscription dimensions.
+ * Development of logic representations that allow reasoning about the consistency and consequences of multiple, possibly competing interpretations. For example, using formalizations in modal and many-valued logic [84, 85], an automated deduction system may be devised that allows one to systematically compute all consequences of different taxonomic interpretations and feed those into the semantic mediation system, which in turn would show the different data and analysis views arising from the different nomenclature interpretations.
+ * Deducing concepts rather than species name strings from distributed taxonomic data sources
+
+ !! Deliverables
+ * Conceptual schema and data model for concept-based nomenclature data leveraging previous research by collaborators and colleagues.
+ * Data entry software that allows scientists to add new or published assertions as needed and to map institutional and personal perspectives on the relationships among assertions.
+ * Desktop visualization tools for data discovery and management of multiple classifications will be based on previous work by the Napier University Prometheus Project and others.
+ * Database implementation for an operational, Web-accessible prototype database with representative data from several different taxonomic groups (e.g. higher plants, fishes) aimed ultimately at a global, distributed and federated system of taxonomic concept servers.
+ * Internet service for automated name/concept resolution, accessible via EcoGrid, for several groups using information from synonymies currently available in public databases.
+ * Usability analysis of the functional requirements by working group members would evaluate all applications and tools developed for nomenclature resolution.
+
+    +++ Added during workshop +++
+
+ !! Milestones
+
+ ! Year 1
+ * Communications and outreach activities required to avoid duplication of effort (e.g. TDWG effort)
+ ** All classification database mafia
+ ** attend meetings / workshops etc to familiarize ourselves with other projects
+ ** populate working group activity with reps. from other groups
+ * Schedule working group meeting soon
+ * Draft schema for taxonomic concept object - as annotated XML-Schema document
+ * Reports by Bob and Jesse on their systems
+ * Jesse, Bob and Jim will take lead in analyzing all the other models and developing a plan for communicating with the rest of the group
+ * By February meeting
+ ** working group will summarize research on data models
+ ** This document including use cases will be completed
+ * Identify and hire human resources (early)
+
+ ! Year 2
+
+
+
+    +++
+
+
+ !!! From CVS Document
+
+ !! Scope
+
+ Includes any type of analysis or model in ecology and biodiversity science.
+
+ Goal is to massively streamline the analysis and modeling process, and provide
+ for archiving analyses and their outputs. Includes support for analyses in SAS,
+ Matlab, R, SysStat and custom models written in various languages (e.g., C).
+ The system should allow the addition of various back-end analytical engines as
+ they become available or as new versions are released.
+
+ The system as a whole should not be tied to any one metadata standard, back-end
+ system or operating system/platform.  Flexibility should be a major concern
+ in the design process due to the heterogeneous makeup of the ecological
+ scientific community.
+
+ The system should include features that assist users in determining the
+ appropriateness of combining various analytical steps and data sources based
+ on semantic mediation.  Semantic mediation should occur in three areas.  First,
+ to determine whether it is appropriate to link together particular analytic
+ steps. Second, to mediate between multiple data sets to determine in what ways
+ they can be combined.  Third, to determine whether the selected data sources are
+ appropriate inputs for the selected analysis.
+
+
+ !! Functional requirements
+ {{{
+   FR1: Analyses and models documented in declarative language (e.g., XML)
+   FR2: Must support 'pipelining' of models in a graph
+   FR3: Ability to archive analyses and their outputs
+   FR4: Ability to version analyses and their outputs
+   FR5: Must have an easy-to-use front end GUI to assist scientists with
+        building and executing pipelines
+   FR6: Allows the sharing of analytical processes amongst scientists
+   FR7: Flexibility in input, processing and output.  e.g. not binding the
+        system to one metadata standard, back-end system or platform
+ }}}
+
+ !! Use cases
+ {{{
+   UC1: Scientist can create new analytic steps
+   UC2: Scientist can use a graphical interface to arrange analytical steps
+        into a pipeline, save it, bind data to the inputs, and execute it
+   UC3: Scientist can execute an analysis or model described in a declarative
+        language
+   UC4: Scientist can archive various intermediate and endpoint results of an
+        analytical process
+   UC5: Scientist can create new versions of analytical steps, and can return
+        to old versions
+   UC6: Scientist can share coded pipelines or sub-pipeline steps and results
+        of pipeline analyses with other scientists
+   UC7: Administrators can add support for additional metadata processors
+        and back-end systems when needed
+   UC8: Scientist can work backwards through a pipeline of interest and so by
+        starting with knowledge of the semantics of the result of interest
+        is able to determine the type of data needed as inputs to the pipeline
+   UC9: Given a particular data set and set of pipelines, the scientist can
+        use the semantic mediation system to determine the types of analyses
+        that are possible to carry out on the data set.
+ }}}
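
To make FR2 (pipelining in a graph) and UC8 (working backwards from a result to the inputs it needs) concrete, here is a minimal sketch. The step names, the dictionary layout, and both helper functions are assumptions for illustration; the CVS document does not specify a representation. It uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of items it depends on.
# Items that no step produces ("raw_counts", "plot_area") are external data inputs.
pipeline = {
    "clean": {"raw_counts"},
    "density": {"clean", "plot_area"},
    "report": {"density"},
}


def execution_order(graph):
    """Return every item in an order where dependencies come before dependents."""
    return list(TopologicalSorter(graph).static_order())


def required_inputs(graph, result):
    """Walk backwards from a result and collect the items with no upstream step."""
    needed, seen, stack = set(), set(), [result]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        upstream = graph.get(node)
        if not upstream:
            needed.add(node)  # nothing produces this item, so it must be supplied
        else:
            stack.extend(upstream)
    return needed


order = execution_order(pipeline)
inputs = required_inputs(pipeline, "report")
```

A real system would also attach semantic metadata to each edge so that the mediation step can check whether two steps may be linked; this sketch only covers the graph structure.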
+ !! Software components
+ {{{
+   SW1: Metadata language for formal description of analyses
+   SW2: Metadata language for the formal description of data and model semantics
+   SW3: Server-side system for execution of analyses and models
+   SW4: Server-side system for processing semantic metadata
+   SW5: Client interface for creating and executing analyses and models
+ }}}
+ ----
+
+ [Old version of this page | http://seek.speciesanalyst.net/ow.asp?p=MeetingJan30_2003]


          

      

      

      

This material is based upon work supported by the National Science Foundation under award 0225676. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).


Long Term Ecological Research Network, UNM	National Center for Ecological Analysis and Synthesis, UCSB	Biodiversity Research Center, KU	San Diego Supercomputer Center, UCSD


Arizona State University	Napier University	University of North Carolina	University of Vermont


UC Davis Genome Center