This is version 159.
Intended audience
This document is intended for SEEK and Kepler developers. It is a DRAFT DESIGN DOCUMENT and does not reflect functionality as it currently exists in Kepler or SEEK. Comments and feedback are appreciated (see Comments Page).
ToDo
Introduction
This document describes an interchange syntax that can be used to express semantic types. We describe three versions of the format: a canonical version, a version for embedding within MoML files, and a version for embedding within EML files. For the case of embeddings, we show how the embedding can be converted to the canonical version.
KR/SMS Semantic Types
A semantic type classifies and constrains the semantic, as opposed to structural, interpretation of a resource. Datasets, workflows, actors (and other workflow components), and actor input and output ports are examples of resources that may have semantic types within SEEK. A semantic type is an ontology expression that is "linked" to a resource using a semantic annotation. Generally, a semantically typed resource will have an annotation to a single concept. In addition, annotations on resource "sub-structure" (e.g., particular attributes of a dataset) may further link portions of the resource to parts of the resource's more general concept (examples are given below). The following XML representation shows the general form of the canonical semantic-type interchange syntax (see [3] below for the XML Schema).
<sms:semtype id="id3" xmlns:sms="http://seek.ecoinformatics.org/sms">
  <sms:resource name="R" id="id1"/>
  ...
  <sms:ontology name="C" id="id2"/>
  ...
  <sms:annotation object="R" meaning="C"/>
  ...
</sms:semtype>

The element semtype starts a semantic type definition. A semantic type can have an optional unique identifier, which is given in the id attribute. Unique identifiers are (preferably) expressed as an LSID, allowing the semantic type to be managed as an LSID data object. Alternatively, if a semantic type is embedded within a document, the semantic-type id can also be expressed as a fragment identifier (for example, when used within EML). As shown above, a semantic type consists of a set of resource elements, ontology elements, and annotation elements. We consider each of these elements, in turn, below.
Semantic-Type Resource References
Items that are semantically typed are called resources. A resource element identifies a resource using the resource's unique identifier. A resource element also assigns a name to the resource, which is used to reference it within annotations.
Labels within a semantic-type description provide a mechanism to identify and name resources and ontology terms. Label names are used within annotations to refer to resources and ontology terms. In a Label element, the name attribute is assigned to the resource identified by the resource attribute. Each Label element is required to have exactly one name attribute and exactly one resource attribute. A SemanticType element must contain at least two Label elements: one identifying the resource to be annotated and the other identifying an ontology concept. Further, no two Label elements within a semantic type may have the same name attribute. Two label definitions are shown below. The first label associates a data set with the name crops and the second label associates an ontology concept with the name Biomass.
<sms:Label name="crops" resource="KBS019-003"/>
<sms:Label name="Biomass" resource="http://seek.ecoinformatics.org/seek/ontos/DefaultOnto#Biomass"/>
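The Label rules above (at least two labels, a name and a resource attribute on each, and no repeated names) can be checked mechanically. The following is a minimal sketch in Python, not part of SEEK or Kepler: the function name check_labels and the wrapping SemanticType document are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

SMS_NS = "http://seek.ecoinformatics.org/sms"


def check_labels(semtype_xml):
    """Return a list of violations of the Label rules (empty if none).

    Hypothetical helper for illustration; not an existing SEEK/Kepler API.
    """
    root = ET.fromstring(semtype_xml.strip())
    labels = root.findall(f"{{{SMS_NS}}}Label")
    problems = []
    # Rule: one label for the resource plus at least one for a concept.
    if len(labels) < 2:
        problems.append("fewer than two Label elements")
    seen = set()
    for label in labels:
        name = label.get("name")
        resource = label.get("resource")
        # XML itself forbids repeated attributes, so only presence is checked.
        if name is None or resource is None:
            problems.append("Label missing name or resource attribute")
            continue
        # Rule: label names must be unique within one semantic type.
        if name in seen:
            problems.append(f"duplicate Label name: {name}")
        seen.add(name)
    return problems


EXAMPLE = """
<sms:SemanticType xmlns:sms="http://seek.ecoinformatics.org/sms">
  <sms:Label name="crops" resource="KBS019-003"/>
  <sms:Label name="Biomass"
             resource="http://seek.ecoinformatics.org/seek/ontos/DefaultOnto#Biomass"/>
</sms:SemanticType>
"""

print(check_labels(EXAMPLE))
```

The sketch only checks the constraints stated in this section; it does not verify that a label actually points at a resource or at an ontology concept.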
Semantic Annotations
An annotation asserts that an object of a resource has a particular meaning according to an ontology. The object and meaning attributes of an Annotation element hold the resource-object expression and the ontology expression, respectively. We provide a uniform annotation language for identifying resource objects and for creating corresponding ontology instances. Some resources, such as data sets and actors with input/output ports, can have complex data structures. For example, a data set is typically structured according to a schema, which specifies, among other things, a relation name (that is, the name of the table), a name for each attribute of the relation, and the attribute data types. Actor ports can also have complex structure, including arbitrarily nested relations. The annotation language facilitates the selection of the various (sub-) objects of structured resources. The annotation language has two forms: an abbreviated syntax, and a more complex, full syntax.
The Abbreviated Annotation-Language Syntax
For expressing annotation objects, the abbreviated syntax permits the following atoms given a resource label T and attributes A1 to An.
T
T.A1
T.A1.A2. ... .An
The first atom T selects corresponding objects of the resource. For example, if the resource is a data set, T selects the tuple objects of the resource. If the resource is an actor, T selects instances of the actor. The second atom T.A1 selects A1 objects contained within T objects. For T representing a data set, T.A1 selects the values of attribute A1 for tuples of T. The last atom selects nested attributes of complex structures, such as those used by actor input/output ports. For instance, if T represents an input port to some actor[1], T.A1.A2 selects the A2 objects nested within A1 objects contained in T objects.

Atoms can be combined to form expressions. In particular, an expression is either: (1) a single atom or (2) a comma-separated list of atoms of the form T.A1 or T.A1.A2. ... .An. In the abbreviated syntax, ontology expressions consist only of a single concept label C. To illustrate, consider the following semantic-type description for the crops data set.
<sms:SemanticType id="mySemType"
    xmlns:sms="http://seek.ecoinformatics.org/sms"
    xmlns:ont="http://seek.ecoinformatics.org/seek/ontos/DefaultOnto#">
  <sms:Label name="crops" resource="KBS019-003"/>
  <sms:Label name="Measurement" resource="ont:Measurement"/>
  <sms:Label name="Biomass" resource="ont:Biomass"/>
  <sms:Label name="Species" resource="ont:Species"/>
  <sms:Label name="Year" resource="ont:Year"/>
  <sms:Label name="Location" resource="ont:Location"/>
  <sms:Annotation object="crops" meaning="Measurement"/>
  <sms:Annotation object="crops.bm" meaning="Biomass"/>
  <sms:Annotation object="crops.spp" meaning="Species"/>
  <sms:Annotation object="crops.yr" meaning="Year"/>
  <sms:Annotation object="crops.station" meaning="Location"/>
</sms:SemanticType>
In this simple example, we (1) associate the label crops with the data-set resource identified as KBS019-003, (2) associate the remaining labels with corresponding ontology concepts (simplifying their identifiers using XML namespaces), (3) state with the first annotation that each crops tuple is a Measurement instance, (4) state with the second annotation that each bm attribute value is a Biomass instance, (5) state with the third annotation that each spp attribute value is a Species instance, and so on.
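The atom forms of the abbreviated syntax are simple enough to parse by splitting on commas and dots. The sketch below is one possible reading of the grammar described above, written in Python for illustration; the function name parse_abbreviated and the tuple representation of atoms are assumptions, not part of the design.

```python
def parse_abbreviated(expression):
    """Split an abbreviated object expression into (label, [attributes]) atoms.

    Hypothetical helper: 'crops.bm' becomes ('crops', ['bm']).
    """
    atoms = []
    for raw in expression.split(","):
        parts = [p.strip() for p in raw.strip().split(".")]
        # Reject empty labels or attributes such as 'crops..bm' or ''.
        if not all(parts):
            raise ValueError(f"malformed atom: {raw!r}")
        atoms.append((parts[0], parts[1:]))
    # Per the text, a comma-separated list may only contain atoms of the
    # form T.A1 or T.A1.A2. ... .An, so a bare label T is not allowed in it.
    if len(atoms) > 1 and any(not attrs for _, attrs in atoms):
        raise ValueError("a bare label T cannot appear in a list of atoms")
    return atoms


print(parse_abbreviated("crops"))
print(parse_abbreviated("crops.bm, crops.spp"))
```

Resolving the label (here crops) against the Label elements of the enclosing semantic type is left out of the sketch.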
Semantic-Type Ontology Definitions
For convenience, we permit ontology concept definitions to be directly included within a semantic type using the OntologyDefinitions element. The purpose of this feature is to allow specialized concept definitions that annotate objects more accurately, without having to go through the process of creating a new ontology or editing an existing one. These concept definitions are expressed using OWL[2]. To illustrate, part of the previous semantic type is shown below with an embedded concept. (Note that to simplify the definition below we take liberties with the use of namespaces in OWL.) This embedded concept definition states that MyMeasurement is both a Measurement and a SubjectiveObservation.
<sms:SemanticType id="mySemType"
    xmlns:sms="http://seek.ecoinformatics.org/sms"
    xmlns:ont="http://seek.ecoinformatics.org/seek/ontos/DefaultOnto#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <sms:Label name="crops" resource="KBS019-003"/>
  <sms:Label name="MyMeasurement" resource="MyMeasurement"/>
  <sms:Annotation object="crops" meaning="MyMeasurement"/>
  <sms:OntologyDefinitions>
    <owl:Class rdf:ID="MyMeasurement">
      <owl:equivalentClass>
        <owl:intersectionOf rdf:parseType="Collection">
          <owl:Class rdf:resource="ont:Measurement"/>
          <owl:Class rdf:resource="ont:SubjectiveObservation"/>
        </owl:intersectionOf>
      </owl:equivalentClass>
    </owl:Class>
  </sms:OntologyDefinitions>
</sms:SemanticType>
The Full Annotation-Language Syntax
The full annotation-language syntax provides more access to the various parts of a complex structure and the ability to assign those parts to more complex ontology instances. To support a wide variety of structural models (primarily relational, XML, and the Ptolemy type system), we consider a generic model consisting of nested-relational-style constructs. In addition, we permit multi-valued attributes, in which an attribute can have an associated collection of values. The abbreviated annotation-language syntax is shorthand for a subset of the full syntax (we give more details of the relationship below). In the full syntax, resource expressions consist of lists of atoms (separated by commas) taking one of the following forms.
x:T
x[A1=y]

Here, symbols x and y denote either constants, variables, or skolem terms. Variables are prefixed with a '$' sign. Constants that contain spaces must be delimited using single quotes. A skolem term takes the form f(z1, ..., zn) for constants and/or variables z1 to zn and n > 0. For x and y constants, the atom x:T is true if x is a T object, and the atom x[A1=y] is true if x is an object that has y as one of its A1 attribute values.

Complex expressions are constructed as follows: each atom is an expression; expressions x:T and x[A1=y] can be composed to form the expression x:T[A1=y]; expressions x[A1=y] and x[A2=z] can be composed to form the expression x[A1=y, A2=z]; expressions y:T1[A2=z] and x:T[A1=y] can be composed to form the expression x:T[A1=y:T1[A2=z]]; and so on.

This same syntax is used to describe ontology expressions, where T can be replaced with a concept label C and A1 represents a property label. For x and y constants, the atom x:C is true if x is an instance of concept C, and the atom x[A1=y] is true if x has y as one of its A1 property values.

The meaning of an annotation can be interpreted as follows. Assume we have an annotation A such that R is the expression selecting resource objects (the expression in the object attribute) and O is the expression creating ontology instances (the expression in the meaning attribute). The annotation is a constraint that says whenever R is true, O is true. Let Rvar be the set of variables in R and Ovar be the set of variables in O that are not in R. The annotation A asserts (forall Rvar) R -> (exists Ovar) O. That is, for each variable assignment making R true there are variable assignments for Ovar that make O true. In this way, an annotation can also be viewed as a mapping from R to O.

Consider the crops data set again. The following semantic type provides a more detailed description of crops using the full annotation syntax.
<sms:SemanticType id="mySemType"
    xmlns:sms="http://seek.ecoinformatics.org/sms"
    xmlns:ont="http://seek.ecoinformatics.org/seek/ontos/DefaultOnto#">
  <sms:Label name="crops" resource="KBS019-003"/>
  <sms:Label name="Measurement" resource="ont:Measurement"/>
  <sms:Label name="Biomass" resource="ont:Biomass"/>
  <sms:Label name="Species" resource="ont:Species"/>
  <sms:Label name="Year" resource="ont:Year"/>
  <sms:Label name="Location" resource="ont:Location"/>
  <sms:Label name="measProp" resource="ont:measurementProperty"/>
  <sms:Label name="measItem" resource="ont:measurementItem"/>
  <sms:Label name="measContext" resource="ont:measurementContext"/>
  <sms:Annotation object="$x:crops" meaning="$x:Measurement"/>
  <sms:Annotation object="$x:crops[bm=$y]" meaning="$x[measProp=$y:Biomass]"/>
  <sms:Annotation object="$x:crops[spp=$y]" meaning="$x[measItem=$y:Species]"/>
  <sms:Annotation object="$x:crops[yr=$y]" meaning="$x[measContext=$y:Year]"/>
  <sms:Annotation object="$x:crops[station=$y]" meaning="$x[measContext=$y:Location]"/>
</sms:SemanticType>

The advantage of using the full syntax here is that we can properly relate attributes of a given tuple according to the ontology. For example, we are able to say that for a given tuple, the bm value represents the biomass of the species represented by the corresponding spp value.

Another advantage of using the full syntax is that it allows annotations of resources that mix schema and data. Consider the following semantic-type description for a data set with attributes site, MEDSA, and GLYMX, where MEDSA and GLYMX are species codes used as attribute names whose values are biomass measurements.
<sms:SemanticType id="mySemType"
    xmlns:sms="http://seek.ecoinformatics.org/sms"
    xmlns:ont="http://seek.ecoinformatics.org/seek/ontos/DefaultOnto#">
  <sms:Label name="ds" resource="..."/>
  <sms:Label name="Measurement" resource="ont:Measurement"/>
  <sms:Label name="Biomass" resource="ont:Biomass"/>
  <sms:Label name="Species" resource="ont:Species"/>
  <sms:Label name="Location" resource="ont:Location"/>
  <sms:Label name="measProp" resource="ont:measurementProperty"/>
  <sms:Label name="measItem" resource="ont:measurementItem"/>
  <sms:Label name="measContext" resource="ont:measurementContext"/>
  <sms:Annotation object="$x:ds[site=$y, MEDSA=$z]"
                  meaning="f1($x):Measurement[measContext=$y:Location, measProp=$z:Biomass, measItem=MEDSA:Species]"/>
  <sms:Annotation object="$x:ds[site=$y, GLYMX=$z]"
                  meaning="f2($x):Measurement[measContext=$y:Location, measProp=$z:Biomass, measItem=GLYMX:Species]"/>
</sms:SemanticType>

Here, each tuple of the dataset represents two distinct measurements of biomass: one for the MEDSA species and the other for the GLYMX species. The skolem terms f1($x) and f2($x) distinguish these two observations given a tuple $x, that is, the skolem terms can be seen as creating two objects from the original object $x.
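To make the role of the skolem terms concrete, the following Python sketch applies the two annotations above to a single tuple and produces two Measurement instances. It is an illustration under stated assumptions only: the function name annotate, the dictionary representation of tuples and instances, the tuple identifier t1, and the attribute values are all hypothetical.

```python
def annotate(rows):
    """Apply the two MEDSA/GLYMX annotations to each tuple.

    rows maps a tuple identifier (the binding of $x) to its attribute values.
    Returns one Measurement instance per annotation and tuple.
    """
    measurements = []
    for tuple_id, row in rows.items():
        # One annotation per species column; f1 and f2 are the skolem symbols.
        for skolem, species in (("f1", "MEDSA"), ("f2", "GLYMX")):
            measurements.append({
                "id": f"{skolem}({tuple_id})",  # skolem term: new object per tuple
                "type": "Measurement",
                "measContext": row["site"],     # $y:Location
                "measProp": row[species],       # $z:Biomass
                "measItem": species,            # constant typed as Species
            })
    return measurements


# Hypothetical tuple; the values are made up for illustration.
rows = {"t1": {"site": "S1", "MEDSA": 12.5, "GLYMX": 3.0}}
for m in annotate(rows):
    print(m["id"], m["measItem"], m["measProp"])
```

Because f1 and f2 are distinct symbols, the two instances derived from the same tuple get distinct identifiers, while applying the same annotation to the same tuple always yields the same identifier.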
The abbreviated syntax has a natural translation to the full syntax. For expressions T in the abbreviated syntax, the following two annotations are equivalent.
<sms:Annotation object="T" meaning="C"/>
<sms:Annotation object="$x:T" meaning="$x:C"/>

For expressions T.A1 in the abbreviated syntax, the following two annotations are equivalent.
<sms:Annotation object="T.A1" meaning="C"/>
<sms:Annotation object="$x:T[A1=$y]" meaning="$y:C"/>

For expressions T.A1.A2. ... .An in the abbreviated syntax, the following two annotations are equivalent.
<sms:Annotation object="T.A1.A2. ... .An" meaning="C"/>
<sms:Annotation object="$x:T[A1=$y1], $y1[A2=$y2], ..., $yn-1[An=$yn]" meaning="$yn:C"/>

And finally, for expressions T.A1, T.A2, ..., T.Am in the abbreviated syntax, the following two annotations are equivalent, where f is a unique skolem symbol.
<sms:Annotation object="T.A1, T.A2, ..., T.Am" meaning="C"/>
<sms:Annotation object="$x:T, $x[A1=$y1], $x[A2=$y2], ..., $x[Am=$ym]" meaning="f($x, $y1, $y2, ..., $ym):C"/>
The original semantic type for the crops data set expressed in the abbreviated syntax is translated below into an equivalent semantic type in the full syntax. Compared with the crops semantic type given above in the full syntax (relating tuple values via the ontology), the semantic type below is "less precise." However, in many cases where the semantic type is fairly simple, the abbreviated annotation syntax will be sufficient to describe the desired semantics.
<sms:SemanticType id="mySemType"
    xmlns:sms="http://seek.ecoinformatics.org/sms"
    xmlns:ont="http://seek.ecoinformatics.org/seek/ontos/DefaultOnto#">
  <sms:Label name="crops" resource="KBS019-003"/>
  <sms:Label name="Measurement" resource="ont:Measurement"/>
  <sms:Label name="Biomass" resource="ont:Biomass"/>
  <sms:Label name="Species" resource="ont:Species"/>
  <sms:Label name="Year" resource="ont:Year"/>
  <sms:Label name="Location" resource="ont:Location"/>
  <sms:Annotation object="$x:crops" meaning="$x:Measurement"/>
  <sms:Annotation object="$x:crops[bm=$y]" meaning="$y:Biomass"/>
  <sms:Annotation object="$x:crops[spp=$y]" meaning="$y:Species"/>
  <sms:Annotation object="$x:crops[yr=$y]" meaning="$y:Year"/>
  <sms:Annotation object="$x:crops[station=$y]" meaning="$y:Location"/>
</sms:SemanticType>
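The four translation rules from the abbreviated to the full syntax can be sketched as a small function. This is an illustrative Python sketch rather than an implementation from SEEK or Kepler: the name to_full_syntax is hypothetical, the list case assumes every atom has the form T.Ai with the same label T, and the skolem symbol is passed in by the caller, who is assumed to keep it unique.

```python
def to_full_syntax(obj, concept, skolem="f"):
    """Translate an abbreviated (object, meaning) pair to the full syntax.

    Returns the (object, meaning) attribute values of the equivalent
    full-syntax annotation.
    """
    atoms = [a.strip().split(".") for a in obj.split(",")]
    if len(atoms) > 1:
        # T.A1, T.A2, ..., T.Am: one skolem term over the tuple and its values.
        # Assumes each atom is T.Ai for a shared label T.
        label = atoms[0][0]
        ys = [f"$y{i}" for i in range(1, len(atoms) + 1)]
        parts = [f"$x:{label}"] + [f"$x[{a[1]}={y}]" for a, y in zip(atoms, ys)]
        return ", ".join(parts), f"{skolem}({', '.join(['$x'] + ys)}):{concept}"
    label, attrs = atoms[0][0], atoms[0][1:]
    if not attrs:
        # T
        return f"$x:{label}", f"$x:{concept}"
    if len(attrs) == 1:
        # T.A1
        return f"$x:{label}[{attrs[0]}=$y]", f"$y:{concept}"
    # T.A1.A2. ... .An: chain the nested attributes through $y1 ... $yn.
    parts = [f"$x:{label}[{attrs[0]}=$y1]"]
    parts += [f"$y{i - 1}[{attrs[i - 1]}=$y{i}]" for i in range(2, len(attrs) + 1)]
    return ", ".join(parts), f"$y{len(attrs)}:{concept}"


for obj, concept in [("crops", "Measurement"), ("crops.bm", "Biomass")]:
    print(to_full_syntax(obj, concept))
```

Applied to the crops annotations of the abbreviated example, the sketch yields the object and meaning values used in the translated semantic type above.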
[#1] We note that actor ports may not always be represented as an identifiable resource, and instead may be modeled as components of an actor. For example, consider an actor A having two ports P1 and P2. For the case where P1 and P2 are not separate resources, we can define the structural type of A as having two attributes P1 and P2 where A.P1 denotes port P1 and A.P2 denotes port P2.
[#2] Perhaps originally converted from a Sparrow expression.
[#3] The semantic-type interchange-syntax XML Schema is: ...
This material is based upon work supported by the National Science Foundation under award 0225676. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).
Copyright 2004 Partnership for Biodiversity Informatics, University of New Mexico, The Regents of the University of California, and University of Kansas