Semantics is about meaning. Not the meaning of life, but the meaning of words. This meaning can come from how words relate to each other, or from the distinctions between them. To us it is obvious that a blender and a mixer are alike, that ‘let’s play ball’ and ‘having a ball’ mean different things, and that ‘having a ball’ is similar to ‘having a blast’. A computer doesn’t know these simple things yet. Technologies like text and language recognition are not yet widespread, let alone available in all languages and at sufficient quality. It is therefore necessary to add context that supplies more meaning. This context can be used for gathering information, displaying information, and providing more relevant search results. This is the basis of semantic search. Wikipedia defines it as follows: semantic search seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results.
This context can be added by defining content to be of a specific type. When content has a type, a page is no longer a blob of unstructured text; instead it contains, for example, a recipe or a product. Each page has metadata: information that describes certain characteristics of the data. For example, a book has a writer and a publication date; a recipe has a cooking time, ingredients and instructions. So far so good; this is all still very logical and nothing new. Content Management Systems (CMS) have used this approach for a long time by creating templates, such as a template for an article or a product. These templates are filled in and can then be added to parts of the website in a structured manner. However, with this approach nobody but the content owner knows the structure of the content. All the information that lies within these templates is hidden from the public.
Using metadata technologies, this hidden information can be stored in the page itself, so that everyone can access it. There are several ways of adding this metadata: RDF(a), Microdata (used in HTML5) and Microformats. A whole blog post could be dedicated to these techniques, their pros and cons, who is using which technique and so on, but for now it is enough to know that there are plenty of ways to add metadata.
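To make the step from a hidden CMS template to public metadata a bit more concrete, here is a minimal sketch in Python. It assumes a hypothetical `Recipe` content type and a hypothetical `to_microdata` helper (neither comes from any particular CMS), and it uses the schema.org Recipe vocabulary as one possible choice of property names.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Recipe:
    """Hypothetical content type: a CMS template with typed fields."""
    name: str
    cook_time: str  # ISO 8601 duration, e.g. "PT20M" for 20 minutes
    ingredients: list[str] = field(default_factory=list)


def to_microdata(recipe: Recipe) -> str:
    """Render the recipe as HTML whose Microdata attributes expose the fields."""
    items = "\n".join(
        f'    <li itemprop="recipeIngredient">{escape(i)}</li>'
        for i in recipe.ingredients
    )
    return (
        '<div itemscope itemtype="https://schema.org/Recipe">\n'
        f'  <h1 itemprop="name">{escape(recipe.name)}</h1>\n'
        f'  <meta itemprop="cookTime" content="{escape(recipe.cook_time)}">\n'
        f'  <ul>\n{items}\n  </ul>\n'
        '</div>'
    )


# Example data, made up for illustration.
pancakes = Recipe("Pancakes", "PT20M", ["flour", "milk", "eggs"])
print(to_microdata(pancakes))
```

The structure that used to live only in the template (name, cooking time, ingredients) now travels with the page, where any visitor or crawler can read it.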
With this context, different kinds of functionality can be created. Faceted navigation is a simple example that puts the context of search results to good use. It is a technique for drilling down into search results using filters, which are built from information the results have in common. A simple example can be found on Amazon.com. When searching for Wilde (as in the Irish writer Oscar Wilde), results are shown in the main pane and the filters are placed in the left pane. Two levels of filters are shown. The top level contains the type: books, movies, e-books, and home and kitchen. Within a top-level filter a subdivision has been made: literature & fiction, British literature and social sciences. The filters placed on one level are comparable to each other; it would be silly if classic literature and DVD were placed on the same level. That’s like comparing apples with oranges. While Amazon provides an example of faceted navigation, it doesn’t provide an example of this data being made available to others.
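The two filter levels described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Amazon implements it: the result list, the field names `type` and `category`, and the helpers `facet_counts` and `apply_filter` are all made up for this example.

```python
from collections import Counter

# Made-up search results; the fields mirror the two filter levels in the text.
RESULTS = [
    {"title": "The Picture of Dorian Gray", "type": "Books", "category": "Literature & Fiction"},
    {"title": "The Importance of Being Earnest", "type": "Books", "category": "British Literature"},
    {"title": "De Profundis", "type": "Books", "category": "Literature & Fiction"},
    {"title": "Wilde", "type": "Movies", "category": "Drama"},
]


def facet_counts(results, field):
    """Count how many results fall under each value of one metadata field."""
    return Counter(r[field] for r in results)


def apply_filter(results, field, value):
    """Drill down: keep only the results whose field equals the chosen value."""
    return [r for r in results if r[field] == value]


# Top-level filter: the type of each result.
top_level = facet_counts(RESULTS, "type")

# Second-level filter: categories within the chosen type only.
books = apply_filter(RESULTS, "type", "Books")
second_level = facet_counts(books, "category")

print(dict(top_level))
print(dict(second_level))
```

Because the second level is computed only after the first filter is applied, values on one level stay comparable with each other: book categories never end up next to DVDs.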
One of the big advocates of opening up data is the White House. Its own website, whitehouse.gov, contains RDFa attributes that expose data in a structured manner. For example, the title of the introduction of President Obama looks like this: <h1 property="dc:title">President Barack Obama</h1>. This markup sits right in the middle of the content in the HTML, lying around to be picked up by anyone.
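To show how little it takes to pick up such a value, here is a rough sketch using only Python's standard `html.parser` module. It is a deliberate simplification rather than a real RDFa processor: it ignores prefixes, the `content` attribute and nested elements, and the `PropertyExtractor` class and the sample page string are made up for this example.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect the text of elements that carry a `property` attribute.

    Simplified on purpose: no prefix resolution, no nesting support.
    """

    def __init__(self):
        super().__init__()
        self.found = []     # list of (property, text) pairs
        self._open = None   # (tag, property) of the element being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop and self._open is None:
            self._open = (tag, prop)
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            self.found.append((self._open[1], "".join(self._text).strip()))
            self._open = None


# Sample page built around the snippet quoted above.
page = '<html><body><h1 property="dc:title">President Barack Obama</h1></body></html>'
parser = PropertyExtractor()
parser.feed(page)
print(parser.found)
```

For real pages a dedicated RDFa library would be the better choice, but the sketch shows the point: the metadata is part of the page, so anyone with a parser can read it.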
Another example comes from the same source: data.gov, an official website of the United States Government. This website provides datasets that anyone can download and turn into something comprehensible. For example, one dataset lists all M1+ earthquakes of the past seven days. Instead of the government having to process all this data by itself, anyone can make it comprehensible. An example of the usage of this M1+ earthquake dataset can be found here: http://data-gov.tw.rpi.edu/demo/stable/demo-34-earthquake-exhibit.html. Besides this example, another 200+ applications have been developed by citizens using the data exposed by data.gov! FlyOnTime finds the most on-time scheduled flights between two airports and checks the average flight delay in different weather conditions. The National Obesity Comparison Tool gives an overview, per state, of the percentage of people who are obese.
Many, many more examples can be thought of once data from different sources is available and can be freely combined: a vacancy search tool, a product comparison tool, a location-based bargain overview, houses for sale from all real estate agents, and so on.
When this context is publicly available, the question won’t be what the government can provide for the public, but what the public can add on top.