Saturday, May 15, 2010

Semantic Web Introduction - Part 3: RDF Schema

This post is the third in a series that aims to introduce some of the concepts and technologies used as part of the Semantic Web. The previous post briefly introduced RDF, which allows entities to be described using statements made up of subjects, predicates, and objects. This post discusses the RDF Schema language.

The RDF Schema (RDFS) language was developed to permit the definition of vocabularies that can be used with RDF data. RDFS includes a small set of predefined classes and properties, each with an associated meaning, that can be used to define classes and the relationships between them. Unlike schema languages such as Document Type Definition (DTD), which defines the structure of XML data in a separate syntax, RDFS uses RDF itself to describe vocabularies. In using the language itself to define the schema, RDFS is in some ways more like XML Schema. One significant difference between the two is that RDFS does not validate RDF data, while XML Schema is used to validate XML.

One of the core types of relationships between classes is generalization and specialization, that is, super class and sub class relationships. RDFS includes a predefined property named rdfs:subClassOf to describe such relationships. In RDFS there is no limit on the number of classes a given class can be a sub class of. In the example below, Penicillin is declared to be a sub class of both Antibiotic and USRegulatedMedication. Any resource that is a member of Penicillin is therefore also a member of both of those classes, and whatever holds for their members holds for it.

Penicillin rdfs:subClassOf Antibiotic.
Penicillin rdfs:subClassOf USRegulatedMedication.

In addition to permitting the construction of class hierarchies using the rdfs:subClassOf property, RDFS provides for similar relationships between properties. This is accomplished with rdfs:subPropertyOf, as the following example demonstrates.

hasSideEffect rdf:type rdf:Property.
hasAllergicReaction rdfs:subPropertyOf hasSideEffect.

The RDFS specification provides additional expressivity through the ability to define which elements can be used as the subject and object of triples that use a given property. Borrowing terminology from mathematics, RDFS defines the properties domain (rdfs:domain) and range (rdfs:range). In RDFS, the domain of a property is the set of elements that can be used as the subject in triples using that property, and the range is the set of elements that can appear as the object. As the subject of a triple must be a resource, the domain must be defined as a class. Since the object of a triple can be either a resource or a literal, the range can be expressed as a class or an RDF datatype. The example below defines two classes, Medication and Condition, and asserts that they are the domain and range respectively of the hasSideEffect property.

Medication rdf:type rdfs:Class.
Condition rdf:type rdfs:Class.
hasSideEffect rdf:type rdf:Property.
hasSideEffect rdfs:domain Medication.
hasSideEffect rdfs:range Condition.

There is no limit on the number of domain and range statements that can be made for an individual property. For example, if two classes are defined as the domain of a property, the set of entities that can be used as the subject with that property is the intersection of the two classes.
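The intersection can be pictured with ordinary sets. The following is only a minimal sketch: the class members listed, and the choice of USRegulatedMedication as a second domain for hasSideEffect, are assumptions made for illustration and are not part of the examples in this post.

```python
# Hypothetical class memberships (illustrative only, not from the post).
class_members = {
    "Medication": {"Penicillin", "Aspirin", "Morphine"},
    "USRegulatedMedication": {"Penicillin", "Morphine", "Fentanyl"},
}

# Assumption: both classes have been declared as rdfs:domain of hasSideEffect.
domains_of_has_side_effect = ["Medication", "USRegulatedMedication"]


def subjects_consistent_with(domains, members):
    """Return the resources that belong to every declared domain class."""
    extensions = [members[name] for name in domains]
    return set.intersection(*extensions)


valid_subjects = subjects_consistent_with(domains_of_has_side_effect, class_members)
print(sorted(valid_subjects))  # ['Morphine', 'Penicillin']
```

Only the resources that appear in both classes remain. Keep in mind that RDFS does not reject a triple whose subject is outside this set; it concludes that the subject belongs to both classes.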

RDFS includes a few other predefined properties that are not discussed here (e.g., rdfs:label, used for labeling entities in human readable form, and rdfs:comment, used for embedding notes in RDFS content). Additional information can be found in the References and Resources section at the end of this post.

Inferences and RDFS

Having now introduced the basic constructs of RDF and RDFS, the benefit of modeling in the semantic languages of RDFS and OWL (described in the next post) can be discussed. While expressing stated information in a standardized way has its own benefits, the true power of modeling in these languages is the ability to infer additional data. This is accomplished by combining the stated triples with the model (which is also described as triples). Triples that have been stated explicitly are known as asserted triples, while those that have been arrived at based on the data and the model are called inferred triples.

The RDFS constructs discussed above, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, and rdfs:range, can be used to infer additional triples according to a set of inference rules. Not all of the rules are discussed here, but two of the most important are type propagation and type inference through domain and range.

Type propagation means that a resource declared to be a member of a class is also a member of every class that class is a sub class of, and so whatever holds for members of those classes holds for the resource as well. The rdfs:subClassOf property is transitive, so membership propagates not only to the immediate parent class but also to every class that the parent is itself a sub class of.
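As a rough illustration, type propagation can be sketched in a few lines of Python, with triples held as plain tuples. This is only a sketch under stated assumptions: the statement that Antibiotic is a sub class of Medication and the instance PenicillinV are hypothetical additions to the running example, and the function names are invented here.

```python
# Triples are modelled as (subject, predicate, object) tuples of strings.
schema = {
    ("Penicillin", "rdfs:subClassOf", "Antibiotic"),
    ("Antibiotic", "rdfs:subClassOf", "Medication"),  # hypothetical link
}
data = {("PenicillinV", "rdf:type", "Penicillin")}  # hypothetical instance


def subclass_closure(schema):
    """Return every (sub, super) pair implied by transitivity of rdfs:subClassOf."""
    pairs = {(s, o) for s, p, o in schema if p == "rdfs:subClassOf"}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs


def propagate_types(schema, data):
    """Infer (x rdf:type Super) for each asserted (x rdf:type Sub)."""
    closure = subclass_closure(schema)
    inferred = set()
    for s, p, o in data:
        if p == "rdf:type":
            for sub, sup in closure:
                if sub == o:
                    inferred.add((s, "rdf:type", sup))
    return inferred - data


propagated = propagate_types(schema, data)
for triple in sorted(propagated):
    print(triple)
```

With these inputs, PenicillinV is inferred to be both an Antibiotic and, through transitivity, a Medication.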

Previously it was mentioned that unlike XML Schema, RDFS does not validate. The reason for this is that in RDFS additional inferences are drawn instead of validation errors. This is at the heart of the application of type inference through property domain and range. As a result, if a property is defined with a specific class for its domain and an assertion is made that a resource has the property, it is inferred that the resource is a member of the class. The same is true if a class is defined as the range of a property and a resource is used as the object of a triple using the property. Continuing with the previous example, the triple below asserts that Penicillin has a potential side effect of rash.

Penicillin hasSideEffect Rash.

The following inferences can be made based on the previous assertions that the hasSideEffect property has a domain of Medication and a range of Condition.

Penicillin rdf:type Medication.
Rash rdf:type Condition.
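The same two inferences can be reproduced with a small Python sketch. It is a simplified illustration rather than a real reasoner: triples are plain tuples, the function name is invented for this example, and the property names use the lowercase spellings rdfs:domain and rdfs:range as defined in the RDFS specification.

```python
# Schema and asserted triple taken from the running example.
schema = {
    ("hasSideEffect", "rdfs:domain", "Medication"),
    ("hasSideEffect", "rdfs:range", "Condition"),
}
asserted = {("Penicillin", "hasSideEffect", "Rash")}


def infer_types_from_domain_and_range(schema, data):
    """Type the subject by each domain and the object by each range of the property."""
    inferred = set()
    for s, p, o in data:
        for prop, constraint, cls in schema:
            if prop != p:
                continue
            if constraint == "rdfs:domain":
                inferred.add((s, "rdf:type", cls))
            elif constraint == "rdfs:range":
                inferred.add((o, "rdf:type", cls))
    return inferred


inferred_types = infer_types_from_domain_and_range(schema, asserted)
for triple in sorted(inferred_types):
    print(triple)
```

Note that nothing is rejected. Had the subject been something other than a medication, the rule would still have concluded that it is one, which is exactly the difference from validation described above.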

While the examples presented here are intentionally very simplistic, the inference rules allow for the addition of inferred triples into the data. These inferred triples are treated in the same way as asserted triples in that they can then be used as the basis for additional inference.
The software component that performs the inferencing over the triples is referred to as a reasoner.
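To make the idea of inference building on inference concrete, here is a toy forward-chaining sketch in Python. It is not how any particular reasoner is implemented; it simply reapplies four RDFS-style rules until nothing new appears. The function names and the resource Hives are assumptions made for this illustration.

```python
def apply_rules(triples):
    """Run one pass of four RDFS-style rules and return the triples they produce."""
    new = set()
    for s, p, o in triples:
        for s2, p2, o2 in triples:
            if p2 == "rdfs:subPropertyOf" and s2 == p:
                new.add((s, o2, o))            # sub-property rule
            elif p2 == "rdfs:domain" and s2 == p:
                new.add((s, "rdf:type", o2))   # domain rule
            elif p2 == "rdfs:range" and s2 == p:
                new.add((o, "rdf:type", o2))   # range rule
            elif p2 == "rdfs:subClassOf" and p == "rdf:type" and s2 == o:
                new.add((s, "rdf:type", o2))   # type propagation
    return new


def materialize(triples):
    """Repeat the rules until a pass adds nothing new (a fixed point)."""
    known = set(triples)
    while True:
        fresh = apply_rules(known) - known
        if not fresh:
            return known
        known |= fresh


facts = {
    ("hasAllergicReaction", "rdfs:subPropertyOf", "hasSideEffect"),
    ("hasSideEffect", "rdfs:domain", "Medication"),
    ("hasSideEffect", "rdfs:range", "Condition"),
    ("Penicillin", "hasAllergicReaction", "Hives"),  # hypothetical assertion
}

inferred = materialize(facts) - facts
for triple in sorted(inferred):
    print(triple)
```

The first pass derives that Penicillin has the side effect Hives. Only once that inferred triple exists can the second pass conclude that Penicillin is a Medication and Hives is a Condition.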

Reasoners have been implemented using different techniques to perform the inferencing and are available for multiple programming environments. Several of the most popular ontology development tools have integrated support for reasoners allowing the ontology creator to easily determine the effect that their model is having by the inferences that are being drawn.

With RDFS, it is possible to model a domain by describing the relationships between classes of resources and their properties. It is also possible to infer new data from asserted statements, adding to the total amount of known information. However, RDFS provides only limited expressiveness, which may not be adequate for all modeling needs. For example, there is no way to express the cardinality of properties, or to state that two classes are disjoint. These and other features are included in the Web Ontology Language (OWL) to enable the creation of more expressive models. OWL will be the topic of the next post.

References and Resources
