This is version 144.
Intended audience

This document is intended for SEEK and Kepler developers. It is a DRAFT DESIGN DOCUMENT and does not reflect functionality as it currently exists in Kepler or SEEK. Comments and feedback are appreciated (see Comments Page).
Introduction

This document describes an interchange syntax that can be used to express semantic types.
KR/SMS Semantic Types

A semantic type classifies and constrains the semantic (as opposed to structural) interpretation of a resource. Datasets, actors (services), and actor input and output ports are examples of resources that may have semantic types within SEEK. A semantic type is expressed as a set of semantic annotations. The purpose of a semantic annotation is to assign a "meaning" to objects of a resource using ontology terms. A semantic annotation serves to "link" a portion of a resource to a portion of an ontology. In this way, the semantic interpretation of a resource (its semantic type) is built from the annotations of its parts. Semantic types can be expressed using the following XML representation (see note [3] for the XML Schema).
    <sms:SemanticType id="..." xmlns:sms="http://seek.ecoinformatics.org/sms">
      <sms:Label name="..." resource="..."/>
      ...
      <sms:Annotation object="..." meaning="..."/>
      ...
      <sms:OntologyDefinitions>
        ...
      </sms:OntologyDefinitions>
    </sms:SemanticType>

To be used within the SEEK architecture, semantic types must be uniquely identified. The unique identifier of a semantic type is given by the id attribute of the SemanticType element. The identifier is preferably an LSID, with the semantic type managed as an LSID data object. Alternatively, if a semantic type is embedded within another document (for example, within EML), its id can be expressed as a fragment identifier. As shown above, a semantic type consists of a set of labels, a set of annotations, and an optional ontology-definitions section. The rest of this page describes these components.
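As a rough illustration of this structure, the sketch below uses Python's standard xml.etree.ElementTree module to split a SemanticType document into its id, labels, annotations, and optional OntologyDefinitions element. The function name parse_semantic_type is hypothetical, because the draft does not define a processing API. The sketch checks only the root element and does no other validation.

```python
import xml.etree.ElementTree as ET

SMS_NS = "http://seek.ecoinformatics.org/sms"
NS = {"sms": SMS_NS}


def parse_semantic_type(xml_text):
    """Split an sms:SemanticType document into its components (sketch only)."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{SMS_NS}}}SemanticType":
        raise ValueError("root element must be sms:SemanticType")
    # Each Label contributes a (name, resource) pair.
    labels = [(e.get("name"), e.get("resource")) for e in root.findall("sms:Label", NS)]
    # Each Annotation contributes an (object, meaning) pair of expressions.
    annotations = [(e.get("object"), e.get("meaning"))
                   for e in root.findall("sms:Annotation", NS)]
    return {
        "id": root.get("id"),
        "labels": labels,
        "annotations": annotations,
        # OntologyDefinitions is optional, so this may be None.
        "ontology_definitions": root.find("sms:OntologyDefinitions", NS),
    }


EXAMPLE = """<sms:SemanticType id="mySemType" xmlns:sms="http://seek.ecoinformatics.org/sms">
  <sms:Label name="crops" resource="KBS019-003"/>
  <sms:Label name="Biomass"
             resource="http://seek.ecoinformatics.org/seek/ontos/DefaultOnto#Biomass"/>
  <sms:Annotation object="crops.bm" meaning="Biomass"/>
</sms:SemanticType>"""

st = parse_semantic_type(EXAMPLE)
print(st["id"])           # mySemType
print(st["labels"][0])    # ('crops', 'KBS019-003')
print(st["annotations"])  # [('crops.bm', 'Biomass')]
```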
Semantic-Type Labels

Labels within a semantic-type description provide a mechanism to identify and name the resources and ontology terms used in the corresponding annotations. In a Label element, the name attribute names the resource identified by the resource attribute. Each Label element must have exactly one name attribute and exactly one resource attribute. A SemanticType element must contain at least two Label elements: one identifying the resource to be annotated and one identifying an ontology concept. Further, no two Label elements within a semantic type may have the same name attribute. In the example below, the first label associates a dataset with the name crops, and the second associates an ontology concept with the name Biomass.
    <sms:Label name="crops" resource="KBS019-003"/>
    <sms:Label name="Biomass" resource="http://seek.ecoinformatics.org/seek/ontos/DefaultOnto#Biomass"/>
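The label rules above can be checked mechanically. The sketch below is a hypothetical helper and is not part of SEEK or Kepler. The draft does not say how a checker would distinguish ontology concepts from annotated resources, so the sketch assumes the caller supplies the set of concept resources. Any XML parser already rejects repeated attributes, so "exactly one name and resource attribute" reduces here to checking that both are present.

```python
def check_labels(labels, concept_resources):
    """Return rule violations for a list of (name, resource) label pairs.

    concept_resources is an assumed input: the set of resource identifiers
    the caller knows to be ontology concepts.
    """
    errors = []
    if len(labels) < 2:
        errors.append("at least two Label elements are required")
    seen = set()
    for name, resource in labels:
        # Repeated attributes are a parse error in XML, so only presence is checked.
        if not name or not resource:
            errors.append(f"label {name!r} lacks a name or resource attribute")
        if name in seen:
            errors.append(f"duplicate label name {name!r}")
        seen.add(name)
    is_concept = [res in concept_resources for _, res in labels]
    if not any(is_concept):
        errors.append("no label identifies an ontology concept")
    if all(is_concept):
        errors.append("no label identifies the resource being annotated")
    return errors


BIOMASS = "http://seek.ecoinformatics.org/seek/ontos/DefaultOnto#Biomass"
good = [("crops", "KBS019-003"), ("Biomass", BIOMASS)]
print(check_labels(good, {BIOMASS}))                       # []
print(check_labels(good + [("crops", BIOMASS)], {BIOMASS}))  # ["duplicate label name 'crops'"]
```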
Semantic Annotations

An annotation asserts that an object of a resource has a particular meaning according to the definitions within an ontology. The object attribute of an Annotation element holds an expression selecting resource objects, and the meaning attribute holds an ontology expression. We provide a uniform annotation language both for identifying resource objects and for specifying ontology expressions. Some resources (in particular, data sets and actors with input/output ports) can have complex data structures. For example, a data set is typically structured according to a schema. Among other things, the schema specifies a relation name (that is, the name of the table), the names of the relation's attributes, and their data types. Actor ports can also have complex structure, including arbitrarily nested relations. The annotation language supports selecting the various (sub-)objects of structured resources, as well as the entire resource itself. The annotation language has two forms: an abbreviated syntax and a more expressive full syntax.
The Abbreviated Annotation-Language Syntax

For expressing annotation objects, the abbreviated syntax permits the following atoms, given a resource label T and attributes A1 to An:

    T
    T.A1
    T.A1.A2. ... .An
The first atom, T, selects the corresponding objects of the resource. For example, if the resource is a data set, T selects the tuple objects of the resource; if the resource is an actor, T selects instances of the actor. The second atom, T.A1, selects the A1 objects contained within T objects. For T representing a data set, T.A1 selects the values of attribute A1 for tuples of T. The last atom selects nested attributes of complex structures occurring, for example, in actor input/output ports. For instance, if T represents an input port to some actor[1], T.A1.A2 selects the A2 objects nested within the A1 objects contained in T objects. Atoms can be combined to form expressions. In particular, an expression is either (a) a single atom or (b) a comma-separated list of atoms of the form T.A1 or T.A1.A2. ... .An. In the abbreviated syntax, ontology expressions consist only of a single concept label C. To illustrate, consider the following semantic-type description for the crops data-set resource.
    <sms:SemanticType id="mySemType"
        xmlns:sms="http://seek.ecoinformatics.org/sms"
        xmlns:ont="http://seek.ecoinformatics.org/seek/ontos/DefaultOnto#">
      <sms:Label name="crops" resource="KBS019-003"/>
      <sms:Label name="Measurement" resource="ont:Measurement"/>
      <sms:Label name="Biomass" resource="ont:Biomass"/>
      <sms:Label name="Species" resource="ont:Species"/>
      <sms:Label name="Year" resource="ont:Year"/>
      <sms:Label name="Location" resource="ont:Location"/>
      <sms:Annotation object="crops" meaning="Measurement"/>
      <sms:Annotation object="crops.bm" meaning="Biomass"/>
      <sms:Annotation object="crops.spp" meaning="Species"/>
      <sms:Annotation object="crops.yr" meaning="Year"/>
      <sms:Annotation object="crops.station" meaning="Location"/>
    </sms:SemanticType>
In this simple example, we:

(1) associate the label crops with the data-set resource identified as KBS019-003;
(2) associate the remaining labels with the corresponding ontology concepts, abbreviating their identifiers using XML namespaces;
(3) state with the first annotation that each crops tuple is a Measurement instance;
(4) state with the second annotation that each bm attribute value is a Biomass instance;
(5) state with the third annotation that each spp attribute value is a Species instance; and so on.
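To make the selection behavior of the abbreviated atoms concrete, here is a minimal sketch that applies the crops annotations to a small in-memory table. Several parts of the sketch are illustrative assumptions:

- the row values;
- the use of (label, position) pairs as tuple identities;
- the function names select and annotate.

Comma-separated atom lists (form (b) above) are not handled.

```python
def select(atom, resources):
    """Objects selected by an abbreviated atom T, T.A1, or T.A1.A2. ... .An.

    resources maps a resource label to a list of tuples (dicts, possibly
    nested). Tuple objects are identified by (label, position), an
    illustrative choice; attribute values stand for themselves.
    """
    label, *path = atom.split(".")
    rows = resources[label]
    if not path:
        return [(label, i) for i in range(len(rows))]  # T: the tuple objects
    values = rows
    for attr in path:
        # Descend one level: keep the attr objects nested in the current objects.
        values = [v[attr] for v in values if isinstance(v, dict) and attr in v]
    return values


def annotate(annotations, resources):
    """(object, concept label) pairs asserted by single-atom annotations."""
    return [(o, meaning) for obj, meaning in annotations for o in select(obj, resources)]


# Hypothetical rows standing in for the KBS019-003 data set.
resources = {"crops": [
    {"bm": 12.5, "spp": "MEDSA", "yr": 1999, "station": "T1"},
    {"bm": 8.0, "spp": "GLYMX", "yr": 1999, "station": "T2"},
]}
annotations = [("crops", "Measurement"), ("crops.bm", "Biomass"),
               ("crops.spp", "Species"), ("crops.yr", "Year"),
               ("crops.station", "Location")]

print(select("crops.bm", resources))         # [12.5, 8.0]
print(annotate(annotations, resources)[:3])
# [(('crops', 0), 'Measurement'), (('crops', 1), 'Measurement'), (12.5, 'Biomass')]
```

Note that each value is classified on its own. Nothing in the output ties 12.5 to the tuple it came from, which is the gap the full syntax addresses.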
Semantic-Type Ontology Definitions

For convenience, we permit ontology concept definitions to be included directly within a semantic type using the OntologyDefinitions element. The purpose of this feature is to allow specialized concept definitions that annotate objects more accurately, without having to create a new ontology or edit an existing one. These concept definitions are expressed in OWL[2]. To illustrate, part of the previous semantic type is shown below with an embedded concept. (Note that, to simplify the definition below, we take liberties with the use of namespaces in OWL.) This embedded concept definition states that MyMeasurement is both a Measurement and a SubjectiveObservation.
    <sms:SemanticType id="mySemType"
        xmlns:sms="http://seek.ecoinformatics.org/sms"
        xmlns:ont="http://seek.ecoinformatics.org/seek/ontos/DefaultOnto#"
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
      <sms:Label name="crops" resource="KBS019-003"/>
      <sms:Label name="MyMeasurement" resource="MyMeasurement"/>
      <sms:Annotation object="crops" meaning="MyMeasurement"/>
      <sms:OntologyDefinitions>
        <owl:Class rdf:ID="MyMeasurement">
          <owl:equivalentClass>
            <owl:Class>
              <owl:intersectionOf rdf:parseType="Collection">
                <owl:Class rdf:resource="ont:Measurement"/>
                <owl:Class rdf:resource="ont:SubjectiveObservation"/>
              </owl:intersectionOf>
            </owl:Class>
          </owl:equivalentClass>
        </owl:Class>
      </sms:OntologyDefinitions>
    </sms:SemanticType>
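A consumer of a semantic type may need to read the embedded OWL definitions. The following sketch collects, for each owl:Class with an rdf:ID inside OntologyDefinitions, the rdf:resource values listed in its owl:intersectionOf. The helper name embedded_intersections is hypothetical, and the sketch uses only the standard-library ElementTree module. It treats values such as ont:Measurement as plain strings, in keeping with the namespace liberties noted above, and it performs no OWL reasoning.

```python
import xml.etree.ElementTree as ET

NS = {
    "sms": "http://seek.ecoinformatics.org/sms",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def embedded_intersections(xml_text):
    """Map each embedded owl:Class rdf:ID to its intersectionOf operands."""
    root = ET.fromstring(xml_text)
    defs = root.find("sms:OntologyDefinitions", NS)
    result = {}
    if defs is None:  # the section is optional
        return result
    for cls in defs.findall("owl:Class", NS):
        operands = []
        # iter() searches all descendants, so wrapper elements are skipped over.
        for inter in cls.iter(f"{{{NS['owl']}}}intersectionOf"):
            operands += [m.get(f"{{{RDF}}}resource") for m in inter.findall("owl:Class", NS)]
        result[cls.get(f"{{{RDF}}}ID")] = operands
    return result


DOC = """<sms:SemanticType id="mySemType"
    xmlns:sms="http://seek.ecoinformatics.org/sms"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <sms:OntologyDefinitions>
    <owl:Class rdf:ID="MyMeasurement">
      <owl:equivalentClass>
        <owl:Class>
          <owl:intersectionOf rdf:parseType="Collection">
            <owl:Class rdf:resource="ont:Measurement"/>
            <owl:Class rdf:resource="ont:SubjectiveObservation"/>
          </owl:intersectionOf>
        </owl:Class>
      </owl:equivalentClass>
    </owl:Class>
  </sms:OntologyDefinitions>
</sms:SemanticType>"""

print(embedded_intersections(DOC))
# {'MyMeasurement': ['ont:Measurement', 'ont:SubjectiveObservation']}
```

Because ElementTree is a strict XML parser, a mismatched closing tag inside OntologyDefinitions would raise xml.etree.ElementTree.ParseError here rather than being silently accepted.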
The Full Annotation-Language Syntax

The full annotation-language syntax provides finer-grained access to the parts of a complex structure and allows those parts to be mapped to more complex ontology expressions. To support a wide variety of structural models (primarily the relational model, XML, and the Ptolemy type system), we consider a generic model built from nested-relational-style constructs. In addition, we permit multi-valued attributes, in which an attribute can have an associated collection of values. The abbreviated annotation-language syntax is shorthand for a subset of the full syntax (the relationship is described in more detail below). In the full syntax, resource expressions consist of comma-separated lists of atoms, each taking one of the following forms.
    x:T
    x[A1=y]

Here, the symbols x and y denote constants, variables, or Skolem terms. Variables are prefixed with a '$' sign. Constants that contain spaces must be delimited by single quotes. A Skolem term takes the form f(z1, ..., zn) for symbols z1 to zn and n > 0. For constants x and y, the atom x:T is true if x is a T object, and the atom x[A1=y] is true if x is an object that has y as one of its A1 attribute values.

Complex expressions are constructed as follows:

- each atom is an expression;
- expressions x:T and x[A1=y] can be composed to form the expression x:T[A1=y];
- expressions x[A1=y] and x[A2=z] can be composed to form the expression x[A1=y, A2=z];
- expressions y:T1[A2=z] and x:T[A1=y] can be composed to form the expression x:T[A1=y:T1[A2=z]];
- and so on.

The same syntax is used to describe ontology expressions, where T is replaced by a concept label C and A1 by a property label. For constants x and y, the atom x:C is true if x is an instance of concept C, and the atom x[A1=y] is true if x has y as one of its A1 property values.

The meaning of an annotation in the full syntax is interpreted as follows. Assume an annotation A in which R is the expression selecting resource objects (the expression in the object attribute) and O is the expression selecting ontology objects (the expression in the meaning attribute). The annotation is a constraint stating that whenever R is true, O is also true. Let Vo be the set of variables in the object expression, and let Vm be the set of variables in the meaning expression that are not in Vo. We interpret A as:

    (forall Vo) R => (exists Vm) O

That is, the annotation asserts that for each variable assignment making R true, there is an assignment to the variables in Vm that makes O true. For instance, consider the semantic type below, which is a more detailed version of the crops semantic type given earlier.
    <sms:SemanticType id="mySemType"
        xmlns:sms="http://seek.ecoinformatics.org/sms"
        xmlns:ont="http://seek.ecoinformatics.org/seek/ontos/DefaultOnto#">
      <sms:Label name="crops" resource="KBS019-003"/>
      <sms:Label name="Measurement" resource="ont:Measurement"/>
      <sms:Label name="Biomass" resource="ont:Biomass"/>
      <sms:Label name="Species" resource="ont:Species"/>
      <sms:Label name="Year" resource="ont:Year"/>
      <sms:Label name="Location" resource="ont:Location"/>
      <sms:Label name="measProp" resource="ont:measurementProperty"/>
      <sms:Label name="measItem" resource="ont:measurementItem"/>
      <sms:Label name="measContext" resource="ont:measurementContext"/>
      <sms:Annotation object="$x:crops" meaning="$x:Measurement"/>
      <sms:Annotation object="$x:crops[bm=$y]" meaning="$x[measProp=$y:Biomass]"/>
      <sms:Annotation object="$x:crops[spp=$y]" meaning="$x[measItem=$y:Species]"/>
      <sms:Annotation object="$x:crops[yr=$y]" meaning="$x[measContext=$y:Year]"/>
      <sms:Annotation object="$x:crops[station=$y]" meaning="$x[measContext=$y:Location]"/>
    </sms:SemanticType>

The advantage of the full syntax here is that each attribute value is connected, through an ontology property, to the Measurement instance that represents its tuple. The abbreviated syntax, by contrast, only classifies each value on its own. Another advantage of the full syntax is that it supports data sets that have "promoted" data to schema. Consider the following semantic-type description for a data set with attributes site, MEDSA, and GLYMX, where MEDSA and GLYMX are species codes whose values are biomass measurements.
    <sms:SemanticType id="mySemType"
        xmlns:sms="http://seek.ecoinformatics.org/sms"
        xmlns:ont="http://seek.ecoinformatics.org/seek/ontos/DefaultOnto#">
      <sms:Label name="ds" resource="..."/>
      <sms:Label name="Measurement" resource="ont:Measurement"/>
      <sms:Label name="Biomass" resource="ont:Biomass"/>
      <sms:Label name="Species" resource="ont:Species"/>
      <sms:Label name="Location" resource="ont:Location"/>
      <sms:Label name="measProp" resource="ont:measurementProperty"/>
      <sms:Label name="measItem" resource="ont:measurementItem"/>
      <sms:Label name="measContext" resource="ont:measurementContext"/>
      <sms:Annotation object="$x:ds[site=$y, MEDSA=$z]"
          meaning="f1($x):Measurement[measContext=$y:Location, measProp=$z:Biomass, measItem=MEDSA]"/>
      <sms:Annotation object="$x:ds[site=$y, GLYMX=$z]"
          meaning="f2($x):Measurement[measContext=$y:Location, measProp=$z:Biomass, measItem=GLYMX]"/>
    </sms:SemanticType>

Here, each tuple of the data set represents two distinct biomass measurements: one for the MEDSA species and one for the GLYMX species. The Skolem terms f1($x) and f2($x) distinguish these two measurements for a given tuple $x. That is, the Skolem terms can be seen as creating new objects from the original object $x.
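The Skolem-term construction can be illustrated by materializing the ontology facts that the two annotations require for a few rows of the data set. Those facts can then be checked against the interpretation (forall Vo) R => (exists Vm) O. None of the following representations come from the draft:

- the row values, which are made up;
- Skolem terms written as Python tuples such as ("f1", x);
- facts stored as (subject, predicate, object) triples.

In these two annotations the meaning expressions introduce no variables outside Vo, since f1($x) and f2($x) are terms over $x. Vm is therefore empty, and the check only has to look up the specific Skolem objects.

```python
def skolem(symbol, *args):
    """A Skolem term symbol(z1, ..., zn), kept symbolic as a tuple so that
    different symbols or arguments always denote different objects."""
    return (symbol,) + args


# Each promoted column is covered by its own annotation and Skolem symbol.
PROMOTED = (("f1", "MEDSA"), ("f2", "GLYMX"))


def required_facts(i, row):
    """Facts the meaning expressions demand for tuple number i of ds."""
    x = ("ds", i)  # illustrative identity for the tuple bound to $x
    out = set()
    for symbol, code in PROMOTED:
        if "site" not in row or code not in row:
            continue  # the object expression R is false for this tuple
        m, y, z = skolem(symbol, x), row["site"], row[code]
        out |= {
            (m, "type", "Measurement"),
            (m, "measContext", y), (y, "type", "Location"),
            (m, "measProp", z), (z, "type", "Biomass"),
            (m, "measItem", code),  # MEDSA / GLYMX appear as constants
        }
    return out


def materialize(rows):
    """Add every required fact; the result satisfies all annotations."""
    facts = set()
    for i, row in enumerate(rows):
        facts |= required_facts(i, row)
    return facts


def satisfies(rows, facts):
    """(forall Vo) R => O: every tuple's required facts must be present."""
    return all(required_facts(i, row) <= facts for i, row in enumerate(rows))


rows = [{"site": "T1", "MEDSA": 3.2, "GLYMX": 4.1},
        {"site": "T2", "MEDSA": 2.7, "GLYMX": 5.0}]
facts = materialize(rows)
measurements = {s for s, p, o in facts if p == "type" and o == "Measurement"}
print(len(measurements))     # 4 (two per tuple)
print(satisfies(rows, facts))  # True
```

Removing any single required fact, such as the measItem link of f2 for the second tuple, makes satisfies return False.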
The abbreviated syntax has a natural "translation" to the full syntax. In particular, the following two annotations are equivalent.
    <sms:Annotation object="T" meaning="C"/>
    <sms:Annotation object="$x:T" meaning="$x:C"/>

For atoms T.A1, the following two annotations are equivalent.
    <sms:Annotation object="T.A1" meaning="C"/>
    <sms:Annotation object="$x:T[A1=$y]" meaning="$y:C"/>

For atoms T.A1.A2. ... .An, the following two annotations are equivalent.
    <sms:Annotation object="T.A1.A2. ... .An" meaning="C"/>
    <sms:Annotation object="$x:T[A1=$y1], $y1[A2=$y2], ..., $yn-1[An=$yn]" meaning="$yn:C"/>

And finally, for expressions of the form T.A1, T.A2, ..., T.Am, the following two annotations are equivalent, where f is a unique Skolem symbol.
    <sms:Annotation object="T.A1, T.A2, ..., T.Am" meaning="C"/>
    <sms:Annotation object="$x:T, $x[A1=$y1], $x[A2=$y2], ..., $x[Am=$ym]" meaning="f($x, $y1, $y2, ..., $ym):C"/>
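The four equivalences above amount to a small rewriting procedure. The sketch below is one possible implementation. It assumes well-formed abbreviated expressions with no whitespace inside atoms, and it chains the fresh variables $y1 ... $yn through nested paths. The function name to_full and the default Skolem symbol "f" are hypothetical. Keeping the Skolem symbol unique across annotations is left to the caller.

```python
def to_full(obj, meaning, skolem_symbol="f"):
    """Rewrite an abbreviated annotation (object, meaning) in the full syntax.

    Returns the new (object, meaning) pair. skolem_symbol is a hypothetical
    parameter; the caller must keep it unique across annotations.
    """
    atoms = [a.strip() for a in obj.split(",")]
    if len(atoms) > 1:
        # T.A1, T.A2, ..., T.Am: one Skolem object per tuple and value combination.
        parts = [a.split(".") for a in atoms]
        label = parts[0][0]
        if any(len(p) != 2 or p[0] != label for p in parts):
            raise ValueError("lists are only handled in the form T.A1, ..., T.Am")
        ys = [f"$y{i}" for i in range(1, len(parts) + 1)]
        body = ", ".join(f"$x[{p[1]}={y}]" for p, y in zip(parts, ys))
        args = ", ".join(["$x"] + ys)
        return f"$x:{label}, {body}", f"{skolem_symbol}({args}):{meaning}"
    label, *path = atoms[0].split(".")
    if not path:  # T
        return f"$x:{label}", f"$x:{meaning}"
    if len(path) == 1:  # T.A1
        return f"$x:{label}[{path[0]}=$y]", f"$y:{meaning}"
    # T.A1.A2. ... .An: each step binds the next fresh variable
    steps = [f"$x:{label}[{path[0]}=$y1]"]
    for i, attr in enumerate(path[1:], start=2):
        steps.append(f"$y{i - 1}[{attr}=$y{i}]")
    return ", ".join(steps), f"$y{len(path)}:{meaning}"


print(to_full("crops.bm", "Biomass"))  # ('$x:crops[bm=$y]', '$y:Biomass')
print(to_full("T.A1, T.A2", "C"))
# ('$x:T, $x[A1=$y1], $x[A2=$y2]', 'f($x, $y1, $y2):C')
```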
[1] Actor ports may not always be represented as identifiable resources and may instead be modeled as components of an actor. For example, consider an actor A having two ports P1 and P2. When P1 and P2 are not separate resources, we can define the structural type of A as having two attributes P1 and P2, where A.P1 denotes port P1 and A.P2 denotes port P2.

[2] Perhaps originally converted from a Sparrow expression.

[3] The semantic-type interchange-syntax XML Schema is: ...
This material is based upon work supported by the National Science Foundation under award 0225676. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF). Copyright 2004 Partnership for Biodiversity Informatics, University of New Mexico, The Regents of the University of California, and University of Kansas