Tuesday, October 26, 2010

Filling a Commute with Tech Podcasts

As someone who commutes to work by train on a regular basis, I have over the past few years attempted to find productive uses for that time. Technical podcasts have filled many an hour and on occasion provided an idea, concept, or piece of news applicable to work.

Recently I was chatting with a co-worker, James Lorenzen, about podcasts and thought it might be interesting to post some of the ones that I commonly listen to. The list below contains some of my regulars that continue to produce new episodes. They are listed in no particular order.

Oracle Technology Network TechCasts - http://feeds.feedburner.com/OTN_TechCasts

Software Engineering Radio - http://www.se-radio.net/rss

ESRI Speaker Series Podcasts - http://feeds.esri.com/podspeaker

A recent addition:

IEEE Software's “On Architecture” with Grady Booch - http://feeds.feedburner.com/onarchitecture

I'm always interested in any other recommendations people may have.

Sunday, July 18, 2010

Starting down the trail with Silverlight and REST Services

The current project that I am working on includes a user interface that can be characterized as a browser-based rich client application. To date, the client components have been written in either JavaScript or Flex. The components interact with a JEE application server through a set of RESTful APIs. I recently started investigating the potential to implement components based on Microsoft Silverlight in anticipation of possible client requirements.

While not explicitly defined, my goals included the following:
1. Assess how easy it would be for someone like me to create user interfaces with Silverlight
2. Determine what level of effort is involved in working with our REST-based APIs in Silverlight
and, as a bonus,
3. Determine whether such components could run on Linux through Moonlight

This post is a running summary of what I have been able to accomplish to date:

Initial Steps

To start out, I set up a development environment in a VM consisting of Windows 7 and Visual Studio 2010 Ultimate. To help familiarize myself with Silverlight, I started reading through Matthew MacDonald's Pro Silverlight 3 in C#. I chose to implement the components in C# because of our team's strong background in Java and the similarities between the two languages. The book provided most of the information I needed and I'm sure it will be a good reference as I continue with Silverlight.

After some initial experimenting with the tools, I selected one of our existing components and decided to attempt to reproduce it in Silverlight. The component I chose is very simple. It retrieves a list of resources from the server, let's call them widgets, and allows the user to set their active / inactive state.

Visual Layout

To be clear, the focus of my experimentation was not to create the best looking UI, but I did want to see if, using the standard settings, you could create something presentable. One of the new features that Microsoft added as part of Visual Studio 2010 is a graphical XAML (Extensible Application Markup Language) editor. XAML files allow the developer / designer to lay out components (layout can also be done in code). Microsoft produces another editor named Expression Blend which targets designers. I decided to stick with the VS2010 editor for this quick project.

My layout was fairly basic. It used a simple Grid layout that included a DataGrid to hold the records to be displayed and manipulated. The layout concepts should be easily picked up by developers accustomed to other UI layout libraries (e.g., Java Swing etc.). The DataGrid can be configured to automatically create columns for each public property of the object bound to it. This is a great way to quickly get the data into a UI. After initially using this approach to make sure the binding was working, I switched to explicitly defining the columns and set up three named active, name, and title.
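
For reference, the explicit column definition in XAML looks roughly like the sketch below. The namespace prefix, headers, and binding paths are my assumptions based on the standard Silverlight 3 DataGrid rather than an exact copy of the component:

<!-- Assumes xmlns:data="clr-namespace:System.Windows.Controls;assembly=System.Windows.Controls.Data" -->
<data:DataGrid x:Name="gridWidgets" AutoGenerateColumns="False">
  <data:DataGrid.Columns>
    <data:DataGridCheckBoxColumn Header="Active" Binding="{Binding active}" />
    <data:DataGridTextColumn Header="Name" Binding="{Binding name}" />
    <data:DataGridTextColumn Header="Title" Binding="{Binding title}" />
  </data:DataGrid.Columns>
</data:DataGrid>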

Retrieving and Displaying the Data

The constructor for the code behind class of MainPage was defined as follows:

public MainPage()
{
    InitializeComponent();

    // Set up the cell editing delegate
    this.gridWidgets.CellEditEnded +=
        new EventHandler<DataGridCellEditEndedEventArgs>(gridWidgets_CellEditEnded);

    // Request the widget list data
    this.requestWidgetList();
}

The first instruction registers an event handler to handle when the user changes the state of the active value of a record. The second instruction calls the requestWidgetList() method, which starts the process of requesting the data from the server. As this was a simple HTTP GET request, I was able to use the WebClient .NET class. You simply create an instance of the client, register a delegate to read the response, and pass an instance of the Uri class to the client to initiate the request. The event handler for the read complete event receives the data, deserializes it from JSON using the built-in .NET class, and populates the DataGrid through its ItemsSource property. The two methods are shown below.

private void requestWidgetList()
{
    WebClient client = new WebClient();
    Uri address = new Uri(getApplicationUrl() + "/widgets");

    client.OpenReadCompleted += client_OpenReadCompleted;
    client.OpenReadAsync(address);
}

void client_OpenReadCompleted(object sender, OpenReadCompletedEventArgs e)
{
    DataContractJsonSerializer serializer =
        new DataContractJsonSerializer(typeof(WidgetSet));
    this.widgetSet = (WidgetSet)serializer.ReadObject(e.Result);
    this.gridWidgets.ItemsSource = widgetSet.widgets;
}
 
As you may notice, the deserializer was initialized with the type of data expected in the results. This is one area where working with the built-in .NET serialization classes could be time consuming if the JSON you're working with has a complex structure and is still in development. There is an open source library named Json.NET started by James Newton-King that aims to make working with JSON in .NET much easier. I haven't tried using the library yet, but plan to shortly.
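
I haven't verified it myself, but based on the Json.NET documentation, the read-complete handler above might reduce to something like this sketch (Newtonsoft.Json and System.IO namespaces assumed):

// Hypothetical alternative to DataContractJsonSerializer using Json.NET.
void client_OpenReadCompleted(object sender, OpenReadCompletedEventArgs e)
{
    using (StreamReader reader = new StreamReader(e.Result))
    {
        // Deserialize the entire JSON response in a single call.
        this.widgetSet = JsonConvert.DeserializeObject<WidgetSet>(reader.ReadToEnd());
        this.gridWidgets.ItemsSource = widgetSet.widgets;
    }
}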

Another thing to notice about the above code is the call to getApplicationUrl(). This method obtains the URL of our application's base REST API context. In our application, the REST API host may not be the same as the host the Silverlight component is downloaded from. Therefore, the properties provided by the Silverlight environment could not be used and I needed a different way to obtain this information. For the sake of this simple example, I decided to pass this information into the Silverlight component as initialization parameters. This isn't the way you would want to go for a production application as it can be easily hacked, but for my purposes I thought it acceptable.
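
A rough sketch of the initialization parameter approach is shown below. The apiUrl parameter name is hypothetical, and the value would be supplied by the hosting page through the initParams param of the Silverlight object tag:

// In App.xaml.cs -- a sketch of reading initialization parameters.
// The hosting page would pass something like:
//   <param name="initParams" value="apiUrl=http://host:8080/app/rest" />
private void Application_Startup(object sender, StartupEventArgs e)
{
    string apiUrl;
    if (e.InitParams.TryGetValue("apiUrl", out apiUrl))
    {
        // Stash the value where the page can retrieve it later.
        this.Resources.Add("apiUrl", apiUrl);
    }
    this.RootVisual = new MainPage();
}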

Updating the Data

The update process starts with the event handler registered in the MainPage constructor for DataGrid changes. It first checks to verify that the user is committing a change as opposed to canceling one. It then starts the process of sending the POST to the REST API to update the data. Unlike the GET request used to initially retrieve the data, sending a POST cannot be done using the WebClient .NET class. It requires the use of the WebRequest .NET class. This class is more flexible but adds an additional asynchronous step in the request process.
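
I won't reproduce the full update code here, but the two-step asynchronous flow looks roughly like the sketch below. The /widgets/{name} resource path and the Widget class are assumptions for illustration, not our actual API (System.Net, System.IO, and System.Runtime.Serialization.Json namespaces assumed):

// Sketch of POSTing an update with WebRequest. Note that the callbacks
// run on a background thread, so UI updates would need to be marshaled
// back through the Dispatcher.
private void postWidgetUpdate(Widget widget)
{
    WebRequest request = WebRequest.Create(
        new Uri(getApplicationUrl() + "/widgets/" + widget.name));
    request.Method = "POST";
    request.ContentType = "application/json";

    // First asynchronous step: obtain the request stream and write the body.
    request.BeginGetRequestStream(streamResult =>
    {
        using (Stream body = request.EndGetRequestStream(streamResult))
        {
            DataContractJsonSerializer serializer =
                new DataContractJsonSerializer(typeof(Widget));
            serializer.WriteObject(body, widget);
        }

        // Second asynchronous step: send the request and read the response.
        request.BeginGetResponse(responseResult =>
        {
            WebResponse response = request.EndGetResponse(responseResult);
            response.Close();
        }, null);
    }, null);
}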

Testing and Experimentation

After I got the component working using Visual Studio's built-in test server, I deployed the Silverlight application to my JEE server as an embedded object in an HTML/JSP page. The Silverlight application is packaged up as a XAP file. This is essentially a ZIP file containing the compiled C# code, supporting libraries, and any static resources that the developer wanted to include. After starting our application, I brought up the new component and things worked as expected. Deciding to go a step further, I then installed Moonlight 3 Preview on Ubuntu 9.04 64-bit and decided to give it a try. As the DataGrid control was introduced in Silverlight 3, I needed Moonlight 3 to try out my component. I was pleasantly surprised that it worked as expected. The runtime included a warning banner indicating that it was a preview release and it seemed a bit more sluggish than running on Windows, but all-in-all I was really impressed. Nice work Moonlight team!

Conclusions

Overall, I felt that the experience of creating a Silverlight component to interact with a JEE-provided REST service went pretty smoothly. Using the Json.NET library should hopefully make any similar work easier. And finally, the Moonlight team appears to be doing a great job at implementing the Silverlight environment. I'm curious now to see how working with the project would be in MonoDevelop ...

Tuesday, May 18, 2010

Semantic Web Introduction - Part 4: OWL

In this fourth part of the introduction to Semantic Web technologies blog series, the OWL (Web Ontology Language) is discussed along with its profiles.

A quick note on tools: So far the discussion has focused on the languages that are used in the Semantic Web and has not mentioned any of the tools that can be used to aid in the development of models. The following discussion of OWL includes references to many additional properties and classes. While it is possible to work with these languages in a text editor, having a graphical toolset makes the job much easier. Fortunately, excellent commercial and open source packages exist to fill this need. The two packages that I have used are the open source Protégé 4.0 and the commercially supported TopBraid Composer from TopQuadrant. Both tools are implemented in Java and are therefore cross platform. TopBraid Composer utilizes the Eclipse platform and is available in both trial and free versions. Both include integrated reasoner support. Links to these packages can be found in the Resources and References section at the end of this post.

Web Ontology Language

The Web Ontology Language builds on the constructs of RDF and RDFS. Like RDFS, OWL is defined using triples and is valid RDF. The goal of OWL is to add to the expressivity of RDFS. This is accomplished using a set of additional defined classes and properties. Using OWL, domain models can be created that are much more expressive than those created with RDFS. With the power of additional expressiveness comes the potential for additional complexity. For example, using OWL it is very easy to define an ontology that includes contradictions or classes that are unsatisfiable (i.e., cannot possibly include any members).

OWL 2 is the latest version of the OWL language. It became a W3C recommendation during the fall of 2009. OWL 2 maintains compatibility with OWL 1, as valid OWL 1 ontologies are also valid in OWL 2. The latest version adds new constructs that increase the expressiveness of the language, including complex data types, property chains, and multi-property keys for classes, to name a few. Everything in this discussion will be valid in OWL 2.

As mentioned above, OWL defines several standard classes and properties to aid in the expressivity of models. One of the first things to note regarding OWL is that it defines its own class to indicate that a resource is a class instead of an individual. The class owl:Class is used in OWL instead of the version defined in RDFS, rdfs:Class. Additionally, OWL defines several new sub classes of the RDF property class rdf:Property. These include owl:ObjectProperty, owl:DatatypeProperty, owl:AnnotationProperty, and owl:OntologyProperty.

The first three new property type classes represent concepts that have been previously discussed; OWL is essentially adding classes to group them. For example, any property that is declared to be an owl:ObjectProperty cannot have a literal as the object of a triple. The annotation property class includes several of the non-semantic RDFS properties such as rdfs:label and rdfs:comment. It also includes a property to specify version information, owl:versionInfo. Properties of type owl:OntologyProperty allow for the specification of version compatibility with the ontology using owl:priorVersion, owl:compatibleWith, and owl:incompatibleWith. Likely the most significant of these is owl:imports, as it is through this property that one ontology can include the assertions of another. This plays an important role in the re-use and extension of ontologies.

One of the new classes, owl:Thing, is defined as the base class from which all other classes are sub classed. Therefore, owl:Thing is the most generic class in OWL; all classes are sub classes of it, and all individuals are inferred to be of its type. At the opposite end of the spectrum, OWL defines owl:Nothing as the most specific class, which cannot have any members. All OWL classes can thus be thought of as lying between owl:Thing and owl:Nothing. OWL includes a similar concept for properties by defining a property that is the most generic and another which is the most specific. These are referred to as top and bottom properties respectively, and OWL includes a set of these properties for both object and data type properties (e.g., owl:topObjectProperty, owl:bottomObjectProperty, etc.).

OWL includes a wealth of property classes which can be used in ontologies to model a domain. A complete discussion of the individual properties and the inferences that can be drawn from their use is beyond the scope of this post; however, the following table provides a sampling of some of the property classes available in OWL. The Resources and References section at the end of this post contains links to information on the use of these properties.

OWL Property – Description

owl:inverseOf – If P is an inverse property of Q, then if A Q B is asserted, B P A can be inferred, and vice versa

owl:FunctionalProperty – If P is a functional property, then if A P B, there can be only a single value of B for a given A

owl:InverseFunctionalProperty – If P is an inverse functional property, then if A P B, there can be only a single value of A for a given B

owl:hasKey – If A owl:hasKey (X, Y, Z), then there can be only one value of A for the unique combination of X, Y, and Z

owl:ReflexiveProperty – If P is a reflexive property, then A P A holds for all As

owl:IrreflexiveProperty – If P is an irreflexive property, then A P A holds for no As

owl:SymmetricProperty – If P is a symmetric property, then if A P B is asserted, B P A can be inferred, and vice versa

owl:AsymmetricProperty – If P is an asymmetric property, then if A P B is asserted, B P A is a contradiction

owl:TransitiveProperty – If P is a transitive property, then if A P B and B P C are asserted, then A P C can be inferred

OWL significantly increases the level of expressivity possible in ontologies through the use of property restrictions. Property restrictions allow the ontology designer to place conditions on the property relationships that define a class. Restrictions are defined using the owl:Restriction class and the owl:onProperty property. There are two different types of restrictions in OWL: value and cardinality. Value restrictions limit the individuals that can be members of the class, while cardinality restrictions limit the number of values of a specific property that can be defined for a class. Value restriction properties include owl:allValuesFrom, owl:someValuesFrom, and owl:hasValue. In the example below, a chemical compound class is defined as the sub class of an anonymous class whose molecularFormula property must have string values. OWL specifies that restriction classes must be anonymous.

ChemicalCompound rdf:type owl:Class;
                 rdfs:subClassOf [
                 rdf:type owl:Restriction;
                 owl:onProperty molecularFormula;
                 owl:allValuesFrom xsd:string
                 ].

Cardinality restrictions include owl:cardinality, owl:minCardinality, and owl:maxCardinality. They are specified in the same manner as value restrictions except that the cardinality property restriction includes an integer defining the cardinality limit.
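
For example, extending the compound definition above, a sketch of a restriction asserting that a compound can have at most one molecular formula might look like the following (the typed literal follows the OWL 2 convention of a non-negative integer):

ChemicalCompound rdfs:subClassOf [
                 rdf:type owl:Restriction;
                 owl:onProperty molecularFormula;
                 owl:maxCardinality "1"^^xsd:nonNegativeInteger
                 ].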

During the discussion of RDFS, it was noted that class membership for an individual entity could either be asserted or inferred. Rules such as the type propagation rule are used in conjunction with the rdfs:subClassOf property to infer class membership based on their use. OWL includes the ability to infer class membership through the use of restrictions and class equivalency. In the previous example the restriction was used in a sub class relationship. This specifies that all members of the class must adhere to the restriction. OWL provides a property, owl:equivalentClass, that allows for the declaration that two classes are the same. Used with a restriction, it asserts the additional statement that the restriction is sufficient to define membership in the class (i.e., if an entity meets the restriction but is not asserted to be a member of the class, it can be inferred to be a member). This is a powerful concept used often in ontologies to categorize individual entities without the need for them to be declared members of the class. It enables new classes to be defined independently of the data and applied for analysis. The Protégé ontology editor refers to classes that have an equivalent class assertion as “Defined” classes, while those that do not are referred to as “Primitive” classes.
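
As a sketch of a defined class (the regulatedBy property and USFDA individual are invented for illustration), the following asserts that anything regulated by the USFDA can be inferred to be a USRegulatedMedication, whether or not it was asserted as one:

USRegulatedMedication owl:equivalentClass [
                 rdf:type owl:Restriction;
                 owl:onProperty regulatedBy;
                 owl:hasValue USFDA
                 ].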

Defining classes using equivalence and restrictions is not the only use of the owl:equivalentClass property. It also plays an important role in data integration. In part one of this blog post series, it was mentioned that one of the core problems that the Semantic Web aims to address is the ability to apply consistent concepts across data repositories on the Internet. Currently, data exposed by systems will often use different data formats to represent their data. If the data was exposed using an OWL ontology, the equivalency properties within OWL could be used to align the data from both systems even though they use different ontologies. For example, two systems dealing with consumers of medical services expose data using different ontologies. System 1 refers to the concept of a consumer as a “patient”, while system 2 refers to the same concept as a “client”. If the two concepts are semantically the same, the following statement can be used to align the classes. Once they have been defined to be equal, members of one class are also members of the second.

sys1:Patient owl:equivalentClass sys2:Client.

In addition to owl:equivalentClass, OWL defines owl:equivalentProperty to assert that two properties are the same. So far, the discussion of equivalency has only dealt with classes. OWL includes the owl:sameAs property to assert that individuals are the same.

Sometimes it is also important to assert that two classes or individuals are not the same. For individuals, OWL includes the owl:differentFrom property. OWL provides a similar concept for classes by specifying that the classes are disjoint. This means that members of one class cannot be members of the other. OWL provides two different ways to indicate that classes are disjoint: owl:disjointWith specifies that two classes are disjoint with each other, and owl:AllDisjointClasses provides a shorthand construct to assert that a set of classes are mutually disjoint.
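
Continuing with the medical examples, the two forms might be written as follows (the class names are illustrative):

Medication owl:disjointWith Condition.

[] rdf:type owl:AllDisjointClasses;
   owl:members (Antibiotic Analgesic Antiviral).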

So far, this post has attempted to introduce some of the main concepts, features, and constructs of OWL; however, there are many other properties, forms, etc. that were not discussed. The Resources and References section at the end of the post contains links to the full specifications and additional material to aid in getting started with OWL.

OWL Profiles

The various OWL constructs increase the expressiveness of the language but they can also have a cost measured in the computational complexity required to perform such activities as determining class membership. As a result, there is a trade-off between expressiveness and computational complexity. Different uses of OWL might therefore want to select a subset of OWL in order to have better computational characteristics. These limited subsets are referred to as OWL Profiles. The concept of defining targeted subsets of language features as profiles is one that is used in other modeling languages. For example, the Geography Markup Language (GML) defines several profiles including the Point Profile and Simple Features Profile to limit the scope of the language.

OWL 1 defined three profiles: Full, Lite, and DL. The Full profile is the complete OWL specification. All OWL ontologies are valid under the Full profile. The first profile that limited OWL was the Lite profile. Its intent was to define a subset of OWL geared toward easy adoption by tool developers. The profile did not gain much acceptance as a number of tools chose to support other profiles. The final profile defined in OWL 1 is the DL profile, which refers to Description Logic. The intent of the profile was to support a subset of axioms that would ensure that models were decidable. One of the most significant limits the DL profile imposes is that a given resource cannot be treated as both a class and an individual. The fact that a model is expressed in a decidable profile does not guarantee good performance.

With OWL 2, three additional profiles were introduced that are further limitations of OWL DL. The first of these is the EL profile, which is designed to provide polynomial time determination of a model’s consistency and individual class membership. The EL profile is a good choice for ontologies with large and complex class structures. OWL 2 also defines an RL profile targeted for use with rules processing. The final profile defined by OWL 2 is QL, which is a good choice for query processing; it is designed to perform queries in log time based on the number of assertions. One other profile of note is the EL++ profile. This profile is designed to provide polynomial time performance for satisfiability, subsumption, classification, and instance checking reasoning problems. Two of the main limitations of EL++ relative to OWL DL are the prohibition on the use of owl:allValuesFrom and the cardinality restrictions. Again, the Resources and References section at the end of the post contains links to information on the specific limitations imposed by each of the profiles.

This post presented a very brief (ok, not that brief as blog posts go) introduction to the Web Ontology Language. The next post will attempt to tie the series together by presenting an example ontology built to model a problem domain.

Resources and References

Saturday, May 15, 2010

Semantic Web Introduction - Part 3: RDF Schema

This post represents the third in a series of posts that aim to introduce some of the concepts and technologies used as part of the Semantic Web. The previous post briefly introduced RDF, which allows for the description of entities using statements comprised of subjects, predicates, and objects. In this post the RDF Schema language is discussed.

The RDF Schema (RDFS) language was developed to permit the definition of vocabularies that can be used with RDF data. RDFS includes a small set of predefined resource classes with associated meaning that can be used to define classes and the relationships between them. Unlike some schema languages, such as Document Type Definition (DTD), which is one way to define the structure of XML data, RDFS uses RDF itself to describe the vocabularies. In using the language itself to define the schema, RDFS is in some ways more like XML Schema. One significant difference between RDFS and XML Schema is that RDFS does not validate RDF data, while XML Schema is used to validate XML.

One of the core types of relationships between classes is generalization and specialization: super class and sub class relationships. RDFS includes a predefined property named rdfs:subClassOf to describe such relationships. In RDFS there is no limitation on the number of classes a given class can be a sub class of. In the example below, Penicillin is declared to be a sub class of both Antibiotic and USRegulatedMedication. It will therefore inherit the properties of those classes.

Penicillin rdfs:subClassOf Antibiotic.
Penicillin rdfs:subClassOf USRegulatedMedication.

In addition to permitting the construction of class hierarchies using the rdfs:subClassOf property, RDFS provides for creating similar relationships between properties. This is accomplished through the use of rdfs:subPropertyOf as the following example demonstrates.

hasSideEffect rdf:type rdf:Property.
hasAllergicReaction rdfs:subPropertyOf hasSideEffect.

The RDFS specification provides for additional expressivity through the ability to define which elements can be used as the subjects and objects of triples using individual properties. Borrowing the terminology from mathematics, RDFS defines the properties of domain (rdfs:domain) and range (rdfs:range). In RDFS, the domain is the set of elements that can be used as the subject in RDF triples and the range is the set of elements that can be expressed as the object. As the subject of a triple must be a resource, the domain must be defined as a class. Since the object of a triple can be either a resource or a literal, the range can be expressed as a class or an RDF data type. The example below defines two classes, Medication and Condition, and asserts that they are the domain and range respectively of the hasSideEffect property.

Medication rdf:type rdfs:Class.
Condition rdf:type rdfs:Class.
hasSideEffect rdf:type rdf:Property.
hasSideEffect rdfs:domain Medication.
hasSideEffect rdfs:range Condition.

There is no limitation to the number of domain and range restrictions that can be defined for an individual property. For example, if two classes are defined as the domain for a property, the set of entities that can be used as subject with the property is the intersection of the two classes.
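
For instance, if a second class were asserted as the domain of hasSideEffect (the TreatmentAgent class is invented for illustration), any subject of the property would be inferred to be a member of both classes:

hasSideEffect rdfs:domain Medication.
hasSideEffect rdfs:domain TreatmentAgent.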

RDFS includes a few other defined properties that are not discussed here (e.g., rdfs:label used for labeling entities in human readable form, rdfs:comment used for embedding notes in RDFS content). Additional information can be found in the References and Resources section at the end of this post.

Inferences and RDFS

Having now introduced the basic constructs of RDF and RDFS, the benefit of modeling in the semantic languages of RDFS and OWL (described in the next post) can be discussed. While expressing stated information in a standardized way has its own benefits, the true power of modeling in these languages is the ability to infer additional data. This is accomplished through the combination of the stated triples and the model (also described as triples). Triples that have been stated explicitly are known as asserted triples, while those that have been arrived at based on the data and model are called inferred triples.

The RDFS constructs that have been discussed above, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, and rdfs:range, can be used to infer additional triples as governed by inference rules. While not all of the rules will be discussed here, two of the most important are type propagation and type inference through domain and range.

Type propagation means that a resource declared to be of a specific type will inherit the properties of that type plus the properties of any class that its type is a sub class of. The rdfs:subClassOf property is a transitive property, meaning that the resource inherits not only the properties of the class it is a sub class of but also those of any class its parent is a sub class of.
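
As a small sketch building on the earlier example (the Amoxicillin individual and the Antibiotic to Medication relationship are invented for illustration), the asserted triples

Amoxicillin rdf:type Penicillin.
Penicillin rdfs:subClassOf Antibiotic.
Antibiotic rdfs:subClassOf Medication.

support the inference of both Amoxicillin rdf:type Antibiotic and Amoxicillin rdf:type Medication.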

Previously it was mentioned that unlike XML Schema, RDFS does not validate. The reason for this is that in RDFS additional inferences are drawn instead of validation errors. This is at the heart of the application of type inference through property domain and range. As a result, if a property is defined with a specific class for its domain and an assertion is made that a resource has the property, it is inferred that the resource is a member of the class. The same is true if a class is defined as the range of a property and a resource is used as the object of a triple using the property. Continuing with the previous example, the triple below asserts that Penicillin has a potential side effect of rash.

Penicillin hasSideEffect Rash.

The following inferences can be made based on the previous assertions that the hasSideEffect property has a domain of Medication and a range of Condition.

Penicillin rdf:type Medication. Rash rdf:type Condition.

While the examples presented here are intentionally very simplistic, the inference rules allow for the addition of inferred triples into the data. These inferred triples are treated in the same way as asserted triples in that they can then be used as the basis for additional inference. The software component that performs this inferencing over the triples is referred to as a reasoner.

Reasoners have been implemented using different techniques to perform the inferencing and are available for multiple programming environments. Several of the most popular ontology development tools have integrated support for reasoners, allowing the ontology creator to easily see the effect of their model through the inferences that are being drawn.

With RDFS, it is possible to model a domain by describing the relationships between classes of resources and their properties. It is possible to infer data from asserted statements that can be added to the total amount of known information. However, RDFS provides only very limited expressiveness, which may not be adequate for all modeling needs. For example, there is no way to express cardinality in properties, or to state that the members of two classes are disjoint. These and other features are included in the Web Ontology Language (OWL) to enable the creation of more expressive models. This will be the topic of the next post.

References and Resources

Thursday, May 13, 2010

Semantic Web Introduction - Part 2: RDF

This post represents the second in a series of posts that aim to introduce some of the concepts and technologies used as part of the Semantic Web. The first in the series attempted to provide a brief overview of the motivation behind the Semantic Web. In this post, one of the core technologies used to describe entities is discussed.

The Resource Description Framework is one of the fundamental Semantic Web standards in that it specifies how data, in the form of statements, is structured. A statement is made up of three parts: a subject, a predicate, and an object. For example, a statement about the origins of Penicillin could be written in the following form.

AlexanderFleming discovered Penicillin.

In this example AlexanderFleming is the subject, discovered is the predicate, and Penicillin is the object. Statements in this form are referred to as triples and a software component that provides storage and access to them is called a triple store. A set of triples can be represented as a graph. For example, if we further want to state that Penicillin is used to treat Staphylococcus we might represent the statements in the form of triples as the following.

AlexanderFleming discovered Penicillin.
Penicillin treats Staphylococcus.


A graph representation of the above statements could be shown as in Figure 1.

Figure 1 – Basic RDF Graph

In a graph representation, the subjects and objects from the triples are depicted as nodes and the predicates form the edges. In RDF, there are two types of nodes: resources and literals. A resource is an entity that is represented by the node and has associated with it a Uniform Resource Identifier (URI), while a literal is a constant value. While resources can be the subject or the object in a triple, literals can only serve as the object. An additional feature of RDF is that the predicates are themselves resources. It is common in RDF to refer to predicates as properties. When a property relates two resources it is referred to as an object property, and when it relates a resource to a literal it is referred to as a data property.

Similar to other data formats like XML, RDF supports the concept of namespaces. This is an important aspect of the specification as it allows the same name to be used in different contexts or vocabularies without conflicting or causing ambiguity. Namespaces are identified by URIs and prefix the resource names that use them. The prefixes and resource names are typically separated by a colon, and most RDF serializations support the definition of abbreviations to make the data more compact and readable.
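
For example, in Turtle an abbreviated prefix might be declared and used as follows (the namespace URI is hypothetical):

@prefix med: <http://example.org/medications#>.

med:AlexanderFleming med:discovered med:Penicillin.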

The RDF specification is itself a namespace and includes a few resources that can be used to help define data. One of the most commonly used is rdf:type, which allows a resource to be included in a class or category of resources. In a triple that uses rdf:type as the property, the subject is defined as being an instance of the class represented by the object. A subject is not limited to being declared an individual of only a single class. While RDF allows for declaring that a resource is an individual of a class, it does not support the modeling of classes. This is the topic of the next two Semantic Web technologies, RDFS and OWL, which are discussed in future posts.
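
For example, the following triples declare Penicillin to be an instance of two classes (the class names are illustrative):

Penicillin rdf:type Antibiotic.
Penicillin rdf:type Medication.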

Data modeled in RDF can be serialized in a number of different formats including RDF/XML, N3 (Notation 3 RDF), N-Triples, and Turtle (Terse RDF Triple Language). The individual serializations have different benefits and drawbacks. For example, RDF/XML is supported by many tools and can be processed by standard XML tools such as XPath, but it is harder for humans to read than Turtle. Fortunately, tools often support multiple serializations, and libraries such as the open source Jena package make it easy to convert between them. Resources for the different serialization formats can be found at the end of this post.

The Turtle RDF serialization is being used for the examples in this blog post series as it is compact and relatively readable. The basic form of a triple in Turtle (hint: the above examples are using it) is to write the subject, predicate, and object separated by spaces and ended with a period. Turtle includes a few shortcuts for writing multiple statements about the same subject or using the same predicate. When the next statement is about the same subject, a semi-colon is inserted in place of the period after the first statement.

AlexanderFleming discovered Penicillin;
                 bornIn “Scotland”.


When the next statement regards the same subject and predicate, a comma is used to separate the objects. Note also that literals are enclosed in double quotes.

AlexanderFleming profession “Biologist”, “Pharmacologist”.

As was mentioned above, the subject of a triple must be a resource and all named resources must have a URI. However, it is often desirable to make a statement about a resource where it is impractical to make it a named resource with a URI. To address these situations, RDF includes the concept of blank nodes. Blank nodes can have local names whose scope is local to the document they are used in. This technique makes it possible to make statements about statements (i.e., reification). In Turtle, blank nodes are enclosed in square brackets. The example below demonstrates this construct in that the side effect is not named but statements about its properties are made.

Penicillin hasSideEffect [ duration “2 days”;
                           acute “No”].


In the next part in the series, the RDF Schema standard will be discussed. It will be shown how RDFS can be used to define hierarchies of classes and properties.

References and Resources

Monday, May 10, 2010

Semantic Web Introduction - Part 1

This blog post represents the first in what I hope is a series of posts that review some of the concepts and core technologies used in the Semantic Web. The content is drawn from a recent independent study class that I completed. In this first entry, some of the motivations behind the semantic web are discussed.

Introduction

The web as initially conceived was designed primarily to link documents, images, and files without capturing information on the meaning of the relationships. The human user was required to infer meaning from the context of the linked information. This ambiguity severely limits the potential for the data to be leveraged as part of automated machine processes. Several technologies and standards are emerging which aim to address these and other problems by providing a means to relate data stored in different formats using potentially different terminology. Their goal is to provide meaning to data not just structure; semantics in addition to syntax. The implementation of these technologies on the Internet is referred to as the Semantic Web.

The general concept behind the semantic web is the addition of meaning to the data exposed on the web. Today, most websites are formatted such that the data and presentation are one and the same. This relies on an individual’s ability to infer the meaning of the data from its surrounding context on the page. Unfortunately, this makes it very difficult for programs to extract the meaning. What is needed is a means to access data that is geared towards machine consumption and exploitation. The semantic web technologies approach this problem by enabling the construction of a web of machine interpretable data with the capability to include meaning. This is largely achieved through the use of data standards that formalize how data in the form of statements about entities are constructed.

Several key assumptions that are intuitive to most users of the Internet were designed into the Semantic Web. The first of these assumptions is known as the AAA slogan, or “Anyone can Say Anything about Any Topic”. It means that statements about a subject can be made from multiple sources and that these statements may in some cases even conflict. The second is what is known as the Open World Assumption. It means that additional information may be learned in the future and therefore conclusions cannot be made under the premise that all information is known. The last assumption deals with the fact that the same entity may be referred to in different ways by different data sources. For example, a specific data source may include statements about an individual using their Social Security Number (SSN) for identification while a second system may make statements about the same individual using a driver’s license ID, and a third using a passport number. All three sets of statements refer to the same individual and the ability to relate all of the statements may yield additional information. The ability to associate data from multiple sources in order to help answer questions, or draw additional conclusions is the basis of the Semantic Web’s notions of linked data.

The statements about data in the semantic web can be represented in the form of a graph, with each statement representing two nodes and an edge. Linking data from one source to another is equivalent to adding links between two separate graphs of data. The Semantic Web includes standards that permit the definition of vocabularies that can be utilized to describe data from multiple sources in the same way. Building upon this are standards that allow for the construction of information models. They enable class hierarchies to be defined with properties describing the entities they represent. Through the inclusion of data into these models, inferences about the data that may not have existed previously can potentially be drawn.

The Semantic Web Stack

The technologies used in the semantic web build upon those used in the document-based web, such as unique naming and the Unicode character set. The technologies of the semantic web are often represented in a layered manner in that the higher level technologies build upon the base technologies. This arrangement is often referred to as the Semantic Web Stack or the Semantic Layer Cake and is depicted below.

Source: Wikipedia semantic-web-stack


It is my intent to discuss the Resource Description Framework (RDF), RDF Schema (RDFS), and the Web Ontology Language (OWL) portions of the Semantic Web Stack in future parts of this blog post series. In addition, I hope to include an example of how these technologies can be used to model a domain and draw inferences.

References

Monday, February 1, 2010

Two Semantic Web titles that I found very informative

Over the past few months I've been working on a side project involving Semantic Web technologies (more on that in a later post). In addition to the many on-line references, I have found two texts to be very helpful in climbing the learning curve. As a result, I wanted to pass along some information on them in case anyone else was looking for a good place to get started with the Semantic Web.

Semantic Web Programming by John Hebeler, Matthew Fisher, Ryan Blace, and Andrew Perez-Lopez, with a foreword by Mike Dean, provides a thorough introduction to the most common Semantic Web standards, libraries, and tools. The authors approach the Semantic Web from a developer's standpoint, focusing on how to implement semantic functionality. A working example is built upon throughout the book, demonstrating many of the concepts and techniques discussed. The example application shows how data in multiple different formats and types of source systems can be linked together. In addition to discussing tools and libraries, an introduction to some of the most important semantic web standards is provided. These include the Resource Description Framework (RDF), RDF Schema (RDFS), and the Web Ontology Language (OWL). Despite the very ambitious amount of material that they attempt to cover, in my opinion the authors do a very good job in 600 pages of providing a good place to jump into Semantic Web programming.

While the Semantic Web Programming text introduces the major semantic web standards, it doesn't delve too deeply into the creation of ontologies. This is where Semantic Web for the Working Ontologist - Effective Modeling in RDFS and OWL by Dean Allemang and James Hendler is really strong. The authors provide a comprehensive discussion of modeling in RDF, RDFS, and OWL. Sections include practical advice on how to handle challenging modeling situations. Following the coverage of the standards, two real-world ontologies are discussed as examples of how modeling decisions impact the ontologies. Finally, good and bad modeling practices are presented. While the text covers the first version of OWL, and OWL 2 became a W3C recommendation last fall, OWL 2 primarily adds to the language. As a result, this text remains very relevant.

Wednesday, January 20, 2010

GeoServer Doesn't Start with a Corrupted Coordinate System DB

This week I ran into a problem in our CI (Continuous Integration) environment that really had me stumped for a while. I wanted to post the outcome in case anyone else sees similar behavior and is in search of the solution.

We've been using the open source map server GeoServer for a while on at least two different projects. Let me say first, that I think the GeoServer team is doing a wonderful job and has produced a very functional, flexible, and stable solution. Earlier this week I had checked in configuration changes that added additional data to our default deployment. The next time that our CI environment rebuilt and redeployed GeoServer, it failed to start with an out of memory error. I thought, no problem, I'll adjust the JVM initialization parameters and restart. At this point, I received the following exception during startup and the application failed to deploy.

org.opengis.referencing.NoSuchAuthorityCodeException: No code "EPSG:4326" from authority "European Petroleum Survey Group" found for object of type "IdentifiedObject"

An EPSG code is a common way of representing the coordinate system that spatial data is associated with. Since new data was recently added, my initial thought was that I made a mistake configuring the new data. The weird thing was that the EPSG code referenced, 4326, corresponds to one of the most commonly used coordinate systems for spatial data. If it wasn't available, something larger was going on. After sifting through a number of log files, newsgroup posts, etc., it appeared that the most likely culprit was an HSQLDB database that is written to disk by GeoServer to aid in coordinate system look-ups. If the database exists but is corrupted, GeoServer will not be able to read it and will indicate that it can't determine referenced coordinate systems. The solution is easy: delete these files and let them be recreated on the next restart. Locating the files was a little challenging. Below are the locations where I have seen the files being written. Delete all of the files in the directory, restart, and the problem should go away.

UNIX/Linux:
/tmp/Geotools/Databases/HSQL

Windows:
%SystemDrive%\Documents and Settings\\Local Settings\tmp\Geotools\Databases\HSQL\


or

%SystemRoot%\tmp\Geotools\Databases\HSQL