Intended audience
This document is intended for SEEK and Kepler developers. It is a DRAFT DESIGN DOCUMENT and does not reflect functionality as it currently exists in Kepler or SEEK. Comments and feedback are appreciated (see Comments Page).
Introduction
Semantic annotations leverage ontologies to describe the conceptual aspects of structured resources, e.g., information sources such as data sets and services (workflows, actors, and web services). Each resource is assumed to have a well-defined schema that describes the structure of its associated data (in the case of services, e.g., inputs and outputs). In addition to providing metadata for resources, ontology-based semantic annotations can enable improved discovery and integration of data. Properly describing the semantics of a resource often requires "fine-grain" annotation, in which different parts of the resource are annotated with distinct semantic information, possibly including the assertion of semantic relations among the parts. The challenge is to provide an appropriate language for accessing, annotating, and relating portions of resources. This technical note describes basic aspects of semantic annotation templates, which are designed to support these fine-grain resource annotations. We use an XML-based language for representing semantic annotations, in which annotations take the form:
    <annotation id="...">
      <!-- header -->
      <resource label="R" uri="http://resources.org/resource" type="..."/>
      <ontology label="ont" uri="http://ontologies.org/ont"/>
      ...
      <!-- annotation assertions -->
      ontology instantiation templates
      ...
    </annotation>
The resources being annotated and the ontologies used for annotation are assigned labels in the annotation header. For ontologies, we typically refer to these labels as prefixes. Ontologies are assumed to be expressed in the Web Ontology Language (OWL). Annotation headers may also record who authored the annotation, when it was created, who manages it, and so on. The template information, which is the focus of this technical note, specifies fine-grain semantic annotations as mappings from resources to instances of the ontologies listed in the header.
This technical note gives an introduction to semantic annotation templates. Section 2 provides a short overview of annotation templates, and Section 3 gives a more detailed explanation. A number of the terms introduced in this technical note are defined in the Glossary at the end of the document, which is followed by a list of footnotes. Some familiarity with RDF, OWL, and basic first-order logic is assumed.
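The header's role is to bind short labels to resource and ontology URIs. The minimal Python sketch below shows this by reading those bindings with `xml.etree.ElementTree` and expanding a prefixed name such as ont:C. It is not part of any SEEK or Kepler code base. The annotation id, the resource type value "table", and the convention of joining the ontology URI and local name with '#' are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative annotation: the id, the resource type "table", and the
# template content are hypothetical placeholders.
ANNOTATION = """
<annotation id="a1">
  <resource label="R" uri="http://resources.org/resource" type="table"/>
  <ontology label="ont" uri="http://ontologies.org/ont"/>
  <individual type="ont:C" foreach="DISTINCT R.x"/>
</annotation>
"""


def read_header(xml_text):
    """Return (resources, prefixes), each a dict mapping a label to a URI."""
    root = ET.fromstring(xml_text.strip())
    resources = {r.get("label"): r.get("uri") for r in root.findall("resource")}
    prefixes = {o.get("label"): o.get("uri") for o in root.findall("ontology")}
    return resources, prefixes


def expand(qname, prefixes):
    """Expand 'ont:C' to a full URI (assumes '#' separates namespace and name)."""
    prefix, local = qname.split(":", 1)
    return prefixes[prefix] + "#" + local


resources, prefixes = read_header(ANNOTATION)
print(resources)                  # {'R': 'http://resources.org/resource'}
print(expand("ont:C", prefixes))  # http://ontologies.org/ont#C
```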
2 Overview of Annotation Templates
An annotation template consists of one or more instantiation patterns for constructing OWL individuals from resources. Instantiation patterns are typically based on (or driven by) resource structure and content. The simplest form of an instantiation pattern is:
    <individual type="ont:C"/>
This expression creates a single, unique instance of the C class in the ontology referred to by ont (assumed to be an ontology prefix defined in the annotation header). The OWL document that results from running (executing) the pattern is:
    <rdf:RDF xmlns="local-ns" ...>
      <owl:Ontology rdf:about="">
        <owl:imports rdf:resource="http://ontologies.org/ont"/>
      </owl:Ontology>
      <ont:C rdf:ID="id1"/>
    </rdf:RDF>
In this example, the pattern maps the resources given in the annotation to a single OWL individual[1]. Note that the identifier for the instance above is generated automatically as a result of executing the pattern[2]. A more common use of templates is to relate data values in a resource to class instances in the ontology. Assume we are annotating a relational table labeled R with attributes x, y, and z[3]. The following pattern, which uses a foreach attribute, creates an instance of class C for every unique value of x in the dataset.
    <individual type="ont:C" foreach="DISTINCT R.x"/>
This pattern can be read as "For each unique x value of R, create an instance of C." In this example, the term "R.x" is a resource variable. Executing this pattern results in the following document, assuming there are n unique values of x in R[4]. As above, identifiers are generated as a result of executing the pattern over R.
    <rdf:RDF xmlns="local-ns" ...>
      <owl:Ontology rdf:about="">
        <owl:imports rdf:resource="http://ontologies.org/ont"/>
      </owl:Ontology>
      <ont:C rdf:ID="id-val1"/>
      <ont:C rdf:ID="id-val2"/>
      ...
      <ont:C rdf:ID="id-valn"/>
    </rdf:RDF>
Note that this document has a different namespace than the corresponding ontology(ies), but imports the ontologies referenced in the annotation header. Thus, the individuals listed in this document are treated as distinct from the ontology itself, but OWL-based tools (such as Protege or a description-logic reasoner) can display and reason over the individuals as though they were part of the original ontology. We use rules expressed in first-order logic to formalize how instantiation patterns should be interpreted[5]. For example, the first-order logic rule for the above pattern is:
    (Axyz) R(x, y, z), u=id_p1(x) -> triple(u, rdf:type, ont:C).
Here, the predicate triple asserts an RDF triple (subject, property, value), and id_p1 is a (Skolem) function mapping values to identifiers. The function id_p1 is meant to apply only within this rule, where p1 stands for "pattern 1." We say in this case that each x value of R constitutes a particular C. Instantiation patterns have a number of additional features for describing fine-grain semantic annotations; these are discussed in more detail in the next section.
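The rule can be read operationally:

1. Scan R.
2. Compute id_p1(x) for each tuple.
3. Collect the resulting triples in a set, so that repeated x values yield a single individual.

The Python sketch below assumes that R is an in-memory list of (x, y, z) tuples and renders each Skolem term as a string such as "id_p1(1)". Both choices are illustrative and not prescribed by the template language. Following footnote [4], tuples whose x value is null (None here) are skipped.

```python
def skolem(function_name, *values):
    """Render a Skolem term: the same arguments always give the same identifier."""
    return function_name + "(" + ", ".join(str(v) for v in values) + ")"


def pattern_p1(relation):
    """<individual type="ont:C" foreach="DISTINCT R.x"/>

    (Axyz) R(x, y, z), u=id_p1(x) -> triple(u, rdf:type, ont:C).
    """
    triples = set()  # a set gives the DISTINCT behavior
    for x, _y, _z in relation:
        if x is None:  # footnote [4]: null foreach values create no instance
            continue
        u = skolem("id_p1", x)
        triples.add((u, "rdf:type", "ont:C"))
    return triples


# Hypothetical sample data for R(x, y, z).
R = [(1, "a", 10), (1, "b", 20), (2, "a", 30), (None, "c", 40)]
for triple in sorted(pattern_p1(R)):
    print(triple)
# ('id_p1(1)', 'rdf:type', 'ont:C')
# ('id_p1(2)', 'rdf:type', 'ont:C')
```

Using a deterministic Skolem string rather than a counter means that re-running the pattern produces the same identifiers. This is one way to obtain the provenance benefit mentioned in footnote [2].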
<a name="section3"></a> <h2>3. Template Instantiation Patterns</h2>
3.1 Individuals
Iteration. More than one variable can be given in a foreach expression. For example, the following pattern creates an instance of C for every unique pair of x and y values occurring together in tuples of R.
    <individual type="ont:C" foreach="DISTINCT R.x, R.y"/>
This pattern can be read as "For each unique x, y (tuple) value pair of R, create an instance of C." The corresponding first-order rule for this pattern is:
    (Axyz) R(x, y, z), u=id_p2(x, y) -> triple(u, rdf:type, ont:C).
In this example, we say that each x, y value pair constitutes a particular C. Resource labels (such as R) in annotations are used in a similar way to tuple variables (i.e., "range variables") in SQL. In particular, different labels can be applied to the same resource in an annotation header. For example, if R1 and R2 are both labels for the Employee relation, the iteration expression "R1.x, R2.x" is equivalent to the SQL cross-product projection:
    SELECT DISTINCT R1.x, R2.x
    FROM Employee R1, Employee R2
In a similar way, one can view foreach expressions as group constructors, similar to the GROUP BY clause in SQL.
Conditions. Conditions can be added to restrict the application of a pattern. For example, the following pattern restricts the creation of C instances based on positive values of x.
    <individual type="ont:C" foreach="DISTINCT R.x, R.y" if="R.x>0"/>
This pattern can be read as "For each unique x, y (tuple) value pair of R in which x is greater than 0, create an instance of C." The corresponding first-order rule for this pattern is:
    (Axyz) R(x, y, z), x>0, u=id_p3(x, y) -> triple(u, rdf:type, ont:C).
In general, conditions are Boolean expressions of the form term op term. Here a term is a constant or variable (such as R.x or the value 5), and op is a comparison operator such as <, >, <=, >=, or =. Inside an XML attribute value, < must be written as &lt;. As with foreach expressions, if expressions can be given as a conjunction of comma-separated conditions. Further, condition expressions may contain resource variables that are outside the condition's <a href="#context">iteration context</a>, i.e., the set of variables (or particular bindings of the variables) used in the condition's corresponding foreach expression. Note that any given binding of foreach variables may have many associated values for an "out-of-context" variable. In these cases, the if expression is satisfied whenever the condition is true for any one of these values (similar to the ANY keyword in SQL).
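The Python sketch below assumes that R is an in-memory list of (x, y, z) tuples and renders Skolem terms as strings. It evaluates three cases:

* the conditional pattern above;
* the cross product produced when the labels R1 and R2 both refer to R;
* a hypothetical condition on z, a variable outside the iteration context {x}. It is satisfied if ANY tuple with the current x value meets it.

The function names and the identifier prefix id_px are illustrative only.

```python
def skolem(function_name, *values):
    """Render a Skolem term such as id_p3(1, a)."""
    return function_name + "(" + ", ".join(str(v) for v in values) + ")"


def pattern_p3(relation):
    """foreach="DISTINCT R.x, R.y" if="R.x>0" (rule p3)."""
    return {(skolem("id_p3", x, y), "rdf:type", "ont:C")
            for x, y, _z in relation if x > 0}


def cross_product(relation):
    """foreach="DISTINCT R1.x, R2.x" where R1 and R2 both label R."""
    return {(x1, x2) for x1, _, _ in relation for x2, _, _ in relation}


def pattern_any(relation):
    """Hypothetical: foreach="DISTINCT R.x" if="R.z>0".

    z is out of context, so an x value qualifies if any of its tuples has z > 0.
    """
    xs = {x for x, _y, _z in relation}
    return {(skolem("id_px", x), "rdf:type", "ont:C")
            for x in xs
            if any(z > 0 for x2, _y, z in relation if x2 == x)}


# Hypothetical sample data for R(x, y, z).
R = [(1, "a", 10), (1, "a", -5), (-2, "b", 7), (3, "c", -1)]
print(sorted(pattern_p3(R)))   # id_p3(1, a) and id_p3(3, c)
print(len(cross_product(R)))   # 9: three distinct x values, squared
print(sorted(pattern_any(R)))  # id_px(-2) and id_px(1); x = 3 has no positive z
```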
3.2 Object Properties
A property expression assigns OWL properties to corresponding individuals within an instantiation pattern. For example, the following pattern creates instances of C containing properties P:
    <individual type="ont:C" foreach="DISTINCT R.x">
      <property type="ont:P" valuetype="ont:D"/>
    </individual>
This pattern can be read as "For each unique x value of R, create an instance of C that has a property P to an instance of D." Executing this pattern results in the following document, assuming there are n unique values of x in R[6].
    <rdf:RDF xmlns="local-ns" ...>
      <owl:Ontology rdf:about="">
        <owl:imports rdf:resource="http://ontologies.org/ont"/>
      </owl:Ontology>
      <ont:C rdf:ID="id-val1">
        <ont:P>
          <ont:D/>
        </ont:P>
      </ont:C>
      ...
      <ont:C rdf:ID="id-valn">
        <ont:P>
          <ont:D/>
        </ont:P>
      </ont:C>
    </rdf:RDF>
The first-order rule for this pattern is:
    (Axyz) R(x, y, z), u=id_p4(x) -> (Ev) triple(u, rdf:type, ont:C), triple(u, ont:P, v), triple(v, rdf:type, ont:D).
Note that in this rule, v is existentially quantified, which we assume is interpreted as an RDF anonymous identifier (a blank node). Alternatively, we could have introduced a new Skolem function over x values (similar to id_p4) for generating the appropriate D identifiers.
Nested Properties. Property expressions corresponding to OWL object properties can be arbitrarily nested within instantiation patterns. For example, the following pattern further elaborates the D instances above with Q properties:
    <individual type="ont:C" foreach="DISTINCT R.x">
      <property type="ont:P" valuetype="ont:D">
        <property type="ont:Q" valuetype="ont:E"/>
      </property>
    </individual>
This pattern can be read as "For each unique x value of R, create an instance of C that has a property P to an instance of D such that the D instance has a property Q to an instance of E." The first-order rule for this pattern is:
    (Axyz) R(x, y, z), u=id_p5(x) -> (Evw) triple(u, rdf:type, ont:C), triple(u, ont:P, v), triple(v, rdf:type, ont:D), triple(v, ont:Q, w), triple(w, rdf:type, ont:E).
Multiple Properties. Individuals can be assigned more than one property. The following pattern assigns two properties, P1 and P2.
    <individual type="ont:C" foreach="DISTINCT R.x">
      <property type="ont:P1" valuetype="ont:D1"/>
      <property type="ont:P2" valuetype="ont:D2"/>
    </individual>
This pattern can be read as "For each unique x value of R, create an instance of C that has two properties, P1 to an instance of D1, and P2 to an instance of D2." The first-order rule for this pattern is:
    (Axyz) R(x, y, z), u=id_p6(x) -> (Evw) triple(u, rdf:type, ont:C), triple(u, ont:P1, v), triple(v, rdf:type, ont:D1), triple(u, ont:P2, w), triple(w, rdf:type, ont:D2).
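One way to realize the existentially quantified variables v and w is as fresh RDF blank nodes. The Python sketch below assumes an in-memory R and string-rendered Skolem terms. It creates one fresh blank node per existential variable for each distinct x value, which matches the reading "for each unique x value of R, create an instance of C that has a property P to an instance of D." The blank-node naming scheme (_:b1, _:b2, ...) is an assumption, as are the function names.

```python
import itertools


def skolem(function_name, *values):
    """Render a Skolem term such as id_p5(1)."""
    return function_name + "(" + ", ".join(str(v) for v in values) + ")"


def pattern_p5(relation):
    """Nested properties: C --P--> D --Q--> E (rule p5)."""
    fresh = itertools.count(1)  # blank-node counter, local to one execution
    triples = []
    for x in sorted({x for x, _y, _z in relation}):
        u = skolem("id_p5", x)
        v, w = f"_:b{next(fresh)}", f"_:b{next(fresh)}"  # the (Evw) variables
        triples += [(u, "rdf:type", "ont:C"), (u, "ont:P", v),
                    (v, "rdf:type", "ont:D"), (v, "ont:Q", w),
                    (w, "rdf:type", "ont:E")]
    return triples


def pattern_p6(relation):
    """Multiple properties: C --P1--> D1 and C --P2--> D2 (rule p6)."""
    fresh = itertools.count(1)
    triples = []
    for x in sorted({x for x, _y, _z in relation}):
        u = skolem("id_p6", x)
        v, w = f"_:b{next(fresh)}", f"_:b{next(fresh)}"
        triples += [(u, "rdf:type", "ont:C"),
                    (u, "ont:P1", v), (v, "rdf:type", "ont:D1"),
                    (u, "ont:P2", w), (w, "rdf:type", "ont:D2")]
    return triples


R = [(1, "a", 10), (2, "b", 20), (1, "c", 30)]  # hypothetical data
for triple in pattern_p5(R)[:5]:
    print(triple)
# ('id_p5(1)', 'rdf:type', 'ont:C')
# ('id_p5(1)', 'ont:P', '_:b1')
# ('_:b1', 'rdf:type', 'ont:D')
# ('_:b1', 'ont:Q', '_:b2')
# ('_:b2', 'rdf:type', 'ont:E')
```

Iterating over distinct x values rather than over raw tuples keeps a repeated x (here, 1) from producing a second, redundant D instance.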
The general form of a pattern consists of an individual expression, followed by any number of (possibly nested) property expressions:
    <individual type="..." foreach="..." if="..." ...>
      <property type="..." ...>
        ... nested property expressions ...
      </property>
      ... additional property expressions ...
    </individual>
The additional attributes of individual and property statements are described further below (as well as in the footnotes).
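Patterns are nested trees of individual and property elements, so an implementation would plausibly parse them into a small in-memory structure before executing them. The Python sketch below shows one such structure built with dataclasses and `xml.etree.ElementTree`. The class and function names are hypothetical. The only validation performed is the rule, stated in the Datatype Properties subsection, that property statements using a value attribute cannot be further nested.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Property:
    type: str
    valuetype: Optional[str] = None  # object property: type of a new individual
    value: Optional[str] = None      # constant, $resource value, or @pattern
    foreach: Optional[str] = None
    condition: Optional[str] = None  # the XML 'if' attribute
    label: Optional[str] = None
    properties: List["Property"] = field(default_factory=list)


@dataclass
class Individual:
    type: str
    foreach: Optional[str] = None
    condition: Optional[str] = None
    label: Optional[str] = None
    properties: List[Property] = field(default_factory=list)


def parse_property(elem):
    """Recursively convert a <property> element and its nested properties."""
    prop = Property(
        type=elem.get("type"), valuetype=elem.get("valuetype"),
        value=elem.get("value"), foreach=elem.get("foreach"),
        condition=elem.get("if"), label=elem.get("label"),
        properties=[parse_property(child) for child in elem.findall("property")],
    )
    if prop.value is not None and prop.properties:
        raise ValueError(f"{prop.type}: a property with a value cannot be nested")
    return prop


def parse_pattern(xml_text):
    """Parse one <individual> instantiation pattern into an Individual tree."""
    elem = ET.fromstring(xml_text)
    return Individual(
        type=elem.get("type"), foreach=elem.get("foreach"),
        condition=elem.get("if"), label=elem.get("label"),
        properties=[parse_property(child) for child in elem.findall("property")],
    )


pattern = parse_pattern(
    '<individual type="ont:C" foreach="DISTINCT R.x">'
    '<property type="ont:P" valuetype="ont:D">'
    '<property type="ont:Q" valuetype="ont:E"/>'
    '</property></individual>'
)
print(pattern.properties[0].properties[0].valuetype)  # ont:E
```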
3.3 Datatype Properties
The examples so far assume the use of OWL object properties, whose ranges (i.e., what the properties "point" to) are individuals. Here we describe support for annotating with datatype properties, whose ranges are assumed to be atomic data values (e.g., strings, integers, or doubles). Datatype property statements use the attribute value instead of valuetype. In general, a value attribute is used to assign a specific data value or individual identifier to a property, whereas a valuetype attribute is used to give the type of the individual linked to the property. Thus, valuetype attributes are used exclusively for object properties, and value attributes can be used for assigning both object and datatype properties. Property statements that use a value attribute (for either an object or datatype property) cannot be further nested.
Constants. One use of a datatype property annotation is for assigning constant values to each corresponding individual generated by a pattern. For example, the following pattern assigns a property P with the value 5 to each generated C instance.
    <individual type="ont:C" foreach="DISTINCT R.x">
      <property type="ont:P" value="5"/>
    </individual>
This pattern can be read as "For each unique x value of R, create an instance of C that has a property P with the value 5." The corresponding first-order rule for this pattern is:
    (Axyz) R(x, y, z), u=id_p7(x) -> triple(u, rdf:type, ont:C), triple(u, ont:P, 5).
Resource Values. Another common use of datatype property annotations is for capturing associated resource values. For example, the following pattern assigns each instance a property P whose value is taken from the resource variable x.
    <individual type="ont:C" foreach="DISTINCT R.x">
      <property type="ont:P" value="$R.x"/>
    </individual>
This pattern can be read as "For each unique x value of R, create an instance of C that has a property P with the value x." Note that the symbol '$' is used to distinguish references to resource values from constants. The corresponding first-order rule for this pattern is:
    (Axyz) R(x, y, z), u=id_p8(x) -> triple(u, rdf:type, ont:C), triple(u, ont:P, x).
Resource variables can also be used outside of the current iteration context (i.e., the enclosing foreach expression). In this case, the current iteration context is used to determine the particular resource values that are accessed. Note that multiple properties may be created when the resource variables are outside of the iteration context. For example, the following pattern assigns to each instance associated with x a property P for each of x's corresponding y values.
    <individual type="ont:C" foreach="DISTINCT R.x">
      <property type="ont:P" value="$R.y"/>
    </individual>
This pattern can be read as "For each unique x value of R, create an instance of C that has a property P with value y, for each unique y value associated with x"[7]. The corresponding first-order rule for this pattern is:
    (Axyz) R(x, y, z), u=id_p9(x) -> triple(u, rdf:type, ont:C), triple(u, ont:P, y).
In this case, if a particular x value has multiple y values, each such y value will result in a P property. Note that if x and y were not related (e.g., if the expressions were R1.x and R2.y, respectively), the result would be a cross product in which every x value would be P-related to every y value. For example, the following pattern:
    <individual type="ont:C" foreach="DISTINCT R1.x">
      <property type="ont:P" value="$R2.y"/>
    </individual>
corresponds to the following first-order rule, assuming R1 and R2 both represent relation R:
    (Axvzwyt) R(x, v, z), R(w, y, t), u=id_p10(x) -> triple(u, rdf:type, ont:C), triple(u, ont:P, y).
In an instantiation pattern, value expressions must evaluate to a single value. Although not considered here, it may be useful to define functions for use in value expressions, such as concatenation, addition, and so on.
Conditional Properties. A property statement can be conditionally applied using an if expression. In particular, the conditions of the if expression must hold for the property to be added to the corresponding individual. For example, the following pattern only adds P to the individual if x is a positive value.
    <individual type="ont:C" foreach="DISTINCT R.x">
      <property type="ont:P" value="$R.x" if="R.x>0"/>
    </individual>
This pattern must be represented using two first-order rules:
    (Axyz) R(x, y, z), u=id_p11(x) -> triple(u, rdf:type, ont:C).
    (Axyz) R(x, y, z), x>0, u=id_p11(x) -> triple(u, ont:P, x).
Thus, the condition on the property does not affect whether the individual is created, only whether the individual has a P property. Using property conditions, it is possible to define simple mappings from resource values to standard property values, e.g., for converting coded values in a dataset to their corresponding "full" names. As with conditions on individual statements, no restrictions are placed on the variables that can be used in property statement conditions. Some variables in a property condition may lie outside the property's iteration context (other than variables within value expressions[8]). For such variables, only one associated value needs to satisfy the condition for the property to be applied (again, similar to the ANY keyword in SQL).
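The datatype-property rules above can be evaluated tuple by tuple. Each tuple of R contributes the individual's type triple unconditionally, and contributes a property triple only when the property's condition holds for that tuple. Collecting the triples in a set then yields the behaviors described in this section:

* one P value per x for constants and in-context values;
* one P per associated y for out-of-context values;
* ANY semantics for out-of-context conditions.

In the Python sketch below, the following are all illustrative assumptions: R, the column map, the id_p identifier prefix, the code-to-name mapping, and the use of Python lambdas in place of if expressions. Only the '$' prefix convention comes from the text.

```python
def skolem(function_name, *values):
    """Render a Skolem term such as id_p(1)."""
    return function_name + "(" + ", ".join(str(v) for v in values) + ")"


COLUMNS = {"R.x": 0, "R.y": 1, "R.z": 2}  # resource variable -> tuple position


def resolve(value_expr, row):
    """'$R.y' reads the current tuple's y value; anything else is a constant."""
    if value_expr.startswith("$"):
        return row[COLUMNS[value_expr[1:]]]
    return value_expr


def evaluate(relation, properties):
    """<individual type="ont:C" foreach="DISTINCT R.x"> with datatype properties.

    properties: list of (property type, value expression, condition or None).
    """
    triples = set()
    for row in relation:
        u = skolem("id_p", row[COLUMNS["R.x"]])
        triples.add((u, "rdf:type", "ont:C"))  # property conditions never block this
        for prop, value_expr, condition in properties:
            if condition is None or condition(row):
                triples.add((u, prop, resolve(value_expr, row)))
    return triples


def p_triples(triples):
    """Keep only the non-type triples, sorted for easier inspection."""
    return sorted(t for t in triples if t[1] != "rdf:type")


R = [(1, "a", 10), (1, "b", 20), (-2, "a", 30)]  # hypothetical data

print(p_triples(evaluate(R, [("ont:P", "5", None)])))     # constant, once per x
print(p_triples(evaluate(R, [("ont:P", "$R.y", None)])))  # one P per y of each x
print(p_triples(evaluate(R, [("ont:P", "$R.x", lambda r: r[0] > 0)])))
# Coded values mapped to full names (hypothetical codes and names):
print(p_triples(evaluate(R, [("ont:name", "Alpha", lambda r: r[1] == "a"),
                             ("ont:name", "Beta", lambda r: r[1] == "b")])))
```

In the third call, the individual for x = -2 is still created but receives no P property. This mirrors the two separate rules given for the conditional pattern.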
3.4 Complex Instantiation Patterns
We have described two mechanisms to link individuals to object properties: through valuetype expressions that generate new, anonymous individuals "in place"; and through value expressions containing predefined individual identifiers. Here, we introduce pattern labels and pattern references, which additionally allow object properties to link to individuals created in other instantiation patterns.
Pattern Labels. Each individual instantiation pattern can be assigned a unique label. For example, the following pattern is assigned the label 'o1'.
    <individual label="o1" type="ont:C" foreach="DISTINCT R.x"/>
The first-order rule for this pattern is:
    (Axyz) R(x, y, z), u=o1(x) -> triple(u, rdf:type, ont:C).
The use of labels in this way does not change the interpretation of the pattern; thus, the first-order rule associated with this pattern is the same as above (p1). However, for convenience, we use the label name as the Skolem function here.
Referencing Patterns. Properties can reference patterns using pattern labels in value expressions[9]. To distinguish pattern references from constants and resource variables, pattern references are prefixed with an '@' sign. For example, the following pattern contains a reference to the pattern labeled 'o1' above.
    <individual label="o2" type="ont:D" foreach="DISTINCT R.x, R.y">
      <property type="ont:P" value="@o1"/>
    </individual>
This pattern can be read as "For each unique x, y (tuple) value pair in R, create an instance of D that has a property P to the corresponding instance of C." The first-order rule for this pattern is:
    (Axyz) R(x, y, z), u=o1(x), v=o2(x, y) -> triple(v, rdf:type, ont:D), triple(v, ont:P, u).
As with resource variables, pattern references are interpreted with respect to the current iteration context. In this example, P's iteration context is "R.x, R.y" and o1's iteration context is "R.x"; that is, o1's foreach expression is contained in P's enclosing foreach expression. Therefore, the added P property is assigned the individual corresponding to the current x value of the iteration context. The iteration context of a property is not required to be a superset of the iteration context of its referenced pattern. For example, in the following pattern:
    <individual label="o3" type="ont:D" foreach="DISTINCT R.y, R.z">
      <property type="ont:P" value="@o1"/>
    </individual>
each unique y, z pair for R will be assigned a property P for every corresponding x value of the pair. Note that in this example, any given y, z pair may have multiple associated x values. Also, the iteration context of a property only applies to the referenced pattern, and does not apply to additionally nested pattern references. For example, consider the following two patterns.
    <individual label="o5" type="ont:E" foreach="DISTINCT R.x, R.z">
      <property type="ont:Q" value="@o4"/>
    </individual>
    <individual label="o4" type="ont:D" foreach="DISTINCT R.y">
      <property type="ont:P" value="@o1"/>
    </individual>
The corresponding first-order rules for these patterns are:
    (Axyz) R(x, y, z), u=o1(x), v=o4(y) -> triple(v, rdf:type, ont:D), triple(v, ont:P, u).
    (Axyz) R(x, y, z), v=o4(y), w=o5(x, z) -> triple(w, rdf:type, ont:E), triple(w, ont:Q, v).
Thus, although property values containing pattern references are assigned values from within the context of the enclosing foreach expression, each distinct pattern is still executed within its own context. To illustrate, let R be defined as follows.
    x    y    z
    ---  ---  ---
    1    4    8
    2    4    9
The triples created from pattern o1 are:
    triple(o1(1), rdf:type, ont:C)
    triple(o1(2), rdf:type, ont:C)
The triples created from pattern o4 are:
    triple(o4(4), rdf:type, ont:D)
    triple(o4(4), ont:P, o1(1))
    triple(o4(4), ont:P, o1(2))
And the triples created from pattern o5 are:
    triple(o5(1, 8), rdf:type, ont:E)
    triple(o5(1, 8), ont:Q, o4(4))
    triple(o5(2, 9), rdf:type, ont:E)
    triple(o5(2, 9), ont:Q, o4(4))
Notice that both individuals of pattern o5 are Q-related to the same o4 individual. Similarly, this o4 individual is P-related to both individuals of o1, corresponding to both x values of R, and thus going "out of context" for pattern o5. To use the iteration context of o5 for o1 while still generating intermediate instances of D, we can use the following pattern, combining o5 and o4:
    <individual label="o6" type="ont:E" foreach="DISTINCT R.x, R.z">
      <property type="ont:Q" valuetype="ont:D">
        <property type="ont:P" value="@o1"/>
      </property>
    </individual>
The corresponding first-order rule for this pattern is:
    (Axyz) R(x, y, z), u=o1(x), w=o6(x, z) -> (Ev) triple(w, rdf:type, ont:E), triple(w, ont:Q, v), triple(v, rdf:type, ont:D), triple(v, ont:P, u).
Note, however, that in this case we generate only one D instance per x, z pair (instead of one for every value of y). Also, with pattern o6, we can no longer reference the D instances in other patterns.
Property Iteration and Labels. It is also possible to apply foreach expressions to property statements. This allows one, for example, to additionally specify how intermediate individuals (as in o6 above) should be constructed. For example, the following pattern:
    <individual label="o7" type="ont:E" foreach="DISTINCT R.x, R.z">
      <property type="ont:Q" valuetype="ont:D" foreach="DISTINCT R.y" label="o8">
        <property type="ont:P" value="@o1"/>
      </property>
    </individual>
results in the first-order rule:
    (Axyz) R(x, y, z), u=o1(x), w=o7(x, z), v=o8(x, y, z) -> triple(w, rdf:type, ont:E), triple(w, ont:Q, v), triple(v, rdf:type, ont:D), triple(v, ont:P, u).
As shown, labels may also be applied to intermediate individuals (via their corresponding property statements), allowing these individuals to be referenced from within other patterns. In this case, the iteration context of the nested pattern is the union of its foreach expression with each of its ancestors' foreach expressions.
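The worked example with the two-tuple relation R can be reproduced mechanically. The Python sketch below makes the following choices:

* Skolem terms are rendered as strings such as "o5(1, 8)".
* Each pattern is evaluated within its own iteration context.
* A pattern reference @target is resolved by applying the target's Skolem function to values taken from the same tuple, as in the first-order rules above.

The helper names individual and reference are hypothetical, and columns are identified by position.

```python
X, Y, Z = 0, 1, 2           # column positions in R(x, y, z)
R = [(1, 4, 8), (2, 4, 9)]  # the example relation from the text


def skolem(label, *values):
    """Render a Skolem term such as o5(1, 8)."""
    return label + "(" + ", ".join(str(v) for v in values) + ")"


def individual(relation, label, cls, context):
    """Type triples for a pattern whose foreach covers the given columns."""
    return {(skolem(label, *(row[i] for i in context)), "rdf:type", cls)
            for row in relation}


def reference(relation, label, context, prop, target, target_context):
    """Property value="@target" inside pattern `label`: for each tuple, link the
    current individual to the target individual built from the same tuple."""
    return {(skolem(label, *(row[i] for i in context)), prop,
             skolem(target, *(row[i] for i in target_context)))
            for row in relation}


o1 = individual(R, "o1", "ont:C", [X])
o4 = (individual(R, "o4", "ont:D", [Y])
      | reference(R, "o4", [Y], "ont:P", "o1", [X]))
o5 = (individual(R, "o5", "ont:E", [X, Z])
      | reference(R, "o5", [X, Z], "ont:Q", "o4", [Y]))

# o7/o8: the nested property's context is {y} plus its ancestor's {x, z}.
o7 = (individual(R, "o7", "ont:E", [X, Z])
      | reference(R, "o7", [X, Z], "ont:Q", "o8", [X, Y, Z])
      | individual(R, "o8", "ont:D", [X, Y, Z])
      | reference(R, "o8", [X, Y, Z], "ont:P", "o1", [X]))

for triple in sorted(o4):
    print(triple)
# ('o4(4)', 'ont:P', 'o1(1)')
# ('o4(4)', 'ont:P', 'o1(2)')
# ('o4(4)', 'rdf:type', 'ont:D')
```

Compare the ont:P triples of o4 with those of o8. The single o4 individual is linked to both o1 individuals. Each o8 individual, by contrast, is linked only to the o1 individual for its own x value.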
Glossary
Footnotes
[#1] In OWL, instances of classes are called 'individuals.'
[#2] Alternatively, we could use anonymous identifiers for generated OWL individuals. However, using explicit as opposed to anonymous identifiers has a number of advantages. For example, identifiers can be used for "provenance": using conventions for identifier names, one could go from the created OWL individuals back to the resource items used to generate them. Explicit identifiers also make it easier to formalize the interpretation of patterns in first-order logic.
[#3] The examples of resources in this document are assumed to be relational data sets. However, the approach described here can be used with a variety of resource structures, including nested relational data (e.g., XML).
[#4] By default, variables in foreach expressions that are null in the resource do not generate corresponding ontology class instances.
[#5] We use the notation (Axy) for universal quantification over variables x and y; (Exy) for existential quantification over variables x and y; and -> for implication.
[#6] Note that the use of property expressions in this way is useful in two cases: (i) the property is not defined (or is optional) in the ontology for the associated class; or (ii) the property is required, but the valuetype expression gives a subclass of the property's defined range.
[#7] Implicitly, this pattern is equivalent to the pattern:
    <individual type="ont:C" foreach="DISTINCT R.x">
      <property type="ont:P" value="$R.y" foreach="DISTINCT R.y"/>
    </individual>
That is, for each unique x, y pair, assign a P property with value y to the corresponding C instance. Additional uses of foreach attributes on properties are discussed under Property Iteration and Labels.
[#8] Because the variable used within a value attribute is implicitly carried over to the property statement's foreach expression (see [7]), these resource variables are considered to be part of the property statement's iteration context.
[#9] Pattern references can be cyclic, i.e., a pattern p can contain a property that refers back to p.
This material is based upon work supported by the National Science Foundation under award 0225676. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF). Copyright 2004 Partnership for Biodiversity Informatics, University of New Mexico, The Regents of the University of California, and University of Kansas