Planet RDF

It's triples all the way down

September 08

Sebastian Trueg: Faceted Browsing in KDE or The Blog Entry that Missed a Catchy Name

It is about time I blogged again. So I will just blabber out what I have been doing for the last two days: Facets – again. Why again? Well, this is my third attempt at creating a generic facet framework for KDE and the fifth in total I think.

Posted at 10:28

September 07

Yves Raimond: Geocaching for music wins the 7digital prize at London Music Hack Day!

We had a really good time last week-end at the London Music Hack Day. If you haven't seen the list of hacks that were done over the week-end yet, go take a look! I found Speakatron, BumbleTab and Earth Destroyers to be particularly funny :)

Andrew, Chris and I (as well as our beloved video producer Patrick) created an Android application to create and find 'musical treasures'. Think geocaching for music. You wander around and can drop tracks from your personal music collection, and you see what tracks people have dropped near you. If you're close enough, you can fetch and play the tracks. And we won the 7digital prize!
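
The post does not say how "close enough" is decided. A minimal Python sketch of one way to do it, assuming each dropped track is stored with a latitude/longitude and that a haversine great-circle distance against a hypothetical 50 m pickup radius decides what can be fetched (the radius, coordinates and track names are invented for illustration):

```python
import math

PICKUP_RADIUS_M = 50.0        # hypothetical radius; the post only says "close enough"
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fetchable(drops, lat, lon, radius_m=PICKUP_RADIUS_M):
    """Return the dropped tracks within radius_m of (lat, lon), nearest first."""
    nearby = []
    for drop in drops:
        d = haversine_m(lat, lon, drop["lat"], drop["lon"])
        if d <= radius_m:
            nearby.append((d, drop["track"]))
    return [track for _, track in sorted(nearby)]


drops = [
    {"track": "Track A", "lat": 51.5246, "lon": -0.1340},
    {"track": "Track B", "lat": 51.5300, "lon": -0.1340},  # about 600 m north of Track A
]
print(fetchable(drops, 51.5246, -0.1340))  # ['Track A']
```

Listing "tracks dropped nearby" would be the same query with a larger radius; only the pickup check needs the tight one.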

Here is a small video demonstration, and below are some screenshots:

On the data side, we use the recently linked 7digital and Echonest APIs. The back-end was written using Rails (check out the Cucumber tests!), and the Android application using the Android SDK. A list of clues for recently dropped tracks appears on the main website and on a Twitter feed. And of course, here is the APK for the Android application.

Posted at 08:34

September 06

Ebiquity research group UMBC: SWSA seeks ISWC 2012 bids, 11th Int. Semantic Web Conf.

The Semantic Web Science Association (SWSA) is seeking statements of interest from organizations or consortia interested in hosting the 11th International Semantic Web Conference, ISWC 2012. The conference series moves regularly between the Americas, Europe, and the Asia/Pacific region, and we expect that the 2012 edition will be held in the Americas in late October or early November 2012.

Organizations wishing to host ISWC 2012 should contact SWSA President Professor James Hendler (swsa-president@aifb.uni-karlsruhe.de) who will work with the SWSA members who are co-ordinating the bidding process for ISWC 2012.

The process comprises two stages. During the first stage, statements of interest are solicited through an open call. Once the first stage is complete, SWSA will shortlist a number of applicants, who will be invited to submit a full proposal using a standard form and budget template. More information about the ISWC Conference Series and the bidding process for hosting a conference in the series can be found in the ISWC Conference Guide.

The important dates for applying to host a Conference in 2012 are:

  • September 30, 2010: Deadline for receiving statements of interest
  • November 15, 2010: Notifications to shortlisted bids are sent out
  • January 15, 2011: Formal applications received from shortlisted bids
  • March 1, 2011: SWSA decides on location for the 2012 Conference

Posted at 22:33

Semantic Web Company (Austria): The review in a car

Imagine the following: a car full of Semantic Web experts is on its way back from Graz. They pass an iPhone around to record some first impressions of the just-ended 6th International Conference on Semantic Systems, I-SEMANTICS. The car, by the way, was a Volvo, occupied by Thomas Schandl, Helmut Nagy, Tassilo Pellegrini and Andreas Blumauer.

Andreas Blumauer: “I think this year’s I-Semantics was a big step forward. I had the impression that a lot of industry representatives are again looking for serious solutions there, after they had already ‘burned their fingers’ with first-generation semantics. The second generation presented now is much more about running applications and less about unproven concepts.”

Tassilo Pellegrini: “This is what I also noticed this year. People now build on solid common knowledge of the topic and are much more aware of the possibilities of the existing technologies and methods. And as this conference was also attended by quite an international crowd, a very homogeneous discussion incorporating a lot of the international trends was possible. So the resulting view of the topic was quite clear. In this respect, the keynote by Peter A. Gloor was a notable and impressive look into the near future. It seems that the powerful technique of Cool Farming will be on our agenda in the coming years when we talk about prognosis tools, sentiment analysis, aggregated expert data, etc.”

Andreas Blumauer: “In terms of a look at the coming trends, the keynote by Rafael Sidi was also impressive to me, as he drew a really amazing picture of how his company Elsevier is on the way to transforming its whole business model into a new paradigm. And this gives a clue that LOD has now arrived in real industry environments.”

Tom Schandl: “I think this real-life aspect of the Semantic Web was one of the unspoken focal points of the conference. In this respect, Richard Cyganiak gave a brilliant talk about how corporate data integration can benefit from RDF solutions, because an RDF-based data concept can be developed step by step, in contrast to ‘conservative corporate data integration’, which always comes with a general redesign of a company’s whole data structure. Richard calls this ‘pay as you go’, and I think this is what the industry is looking for.”

Helmut Nagy: “This is also my impression, having spent a lot of time at our booth. The industry is looking for very concrete semantic solutions, and some of them are already there and ready to use. So, to do a little self-promotion: our PoolParty demozone received a lot of attention and comments. And this is not only because I served Tropical Banana Cocktails there.”

So the talk went on: in the car, in the blogosphere and in the Semantic Web community.

Posted at 13:16

September 05

AKSW Group - University of Leipzig: AKSW coordinates EU-funded research project LOD2 aiming to take the Web of Linked Data to the next level

A wealth of information is already widely available on the Internet or in company-wide intranets. In many situations, however, we tend to perceive this plethora of information as information overload, since it is still rarely possible to answer search queries that go beyond simple keyword searches, and it is tedious to integrate information from different sources in unforeseen ways. Enabling such intelligent ways of processing information on the Web is the key aim of the Semantic Web vision, but it seems that its realization based on logic and reasoning will take more time than initially anticipated.

Recently, however, the Linked Data paradigm, a more lightweight and pragmatic approach to integrating information on the Web, has gained traction. It is based on representing information as facts consisting of subject, predicate and object (aka RDF triples), publishing these on the Web and interlinking them with the same mechanism used to link web pages: URIs. With more than 20 billion facts already published this way as Linked Open Data (LOD), the document Web is enriched with a data commons comprising, for example, all of the BBC's programming, Wikipedia as a structured knowledge base (DBpedia) and statistical information from Eurostat and the US census.
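
To make the triple model concrete, here is a minimal Python sketch (not LOD2 code): triples are plain tuples, merging two datasets is a set union, and a shared URI plus an owl:sameAs link lets a lookup cross from one dataset into the other. The example.org URIs and predicates are hypothetical; only the owl:sameAs and DBpedia URI forms are real.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EX = "http://example.org/"  # hypothetical namespace for this sketch
DBPEDIA_GRAZ = "http://dbpedia.org/resource/Graz"

# Dataset A describes a programme and links its local place URI to DBpedia.
dataset_a = {
    (EX + "programme/42", EX + "about", EX + "place/graz"),
    (EX + "place/graz", OWL_SAME_AS, DBPEDIA_GRAZ),
}
# Dataset B says something about the DBpedia resource.
dataset_b = {
    (DBPEDIA_GRAZ, EX + "country", "Austria"),
}


def match(triples, s=None, p=None, o=None):
    """Yield triples matching a pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def follow_links(triples, start, predicate):
    """Objects of `predicate` for `start` or for anything it is owl:sameAs."""
    nodes = {start} | {o for _, _, o in match(triples, s=start, p=OWL_SAME_AS)}
    return {o for n in nodes for _, _, o in match(triples, s=n, p=predicate)}


web = dataset_a | dataset_b  # publishing both on the Web: just more triples
place = next(o for _, _, o in match(web, s=EX + "programme/42", p=EX + "about"))
print(follow_links(web, place, EX + "country"))  # {'Austria'}
```

Neither dataset alone answers "which country is programme 42 about?"; the link between them does.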

Co-funded by the European Union with 6.5 million euros, as well as by companies and research institutions from 6 European countries, the project LOD2 aims to realize the Web of Linked Data by developing crucial technological building blocks for applying the Linked Data paradigm in companies, Web communities and governmental institutions. In particular, the LOD2 project will develop:

  • enterprise-ready tools and methodologies for exposing and managing very large amounts of structured information on the Data Web.
  • a testbed and bootstrap network of high-quality multi-domain, multi-lingual ontologies from sources such as Wikipedia and OpenStreetMap.
  • algorithms based on machine learning for automatically interlinking and fusing data from the Web.
  • standards and methods for reliably tracking provenance, ensuring privacy and data security as well as for assessing the quality of information.
  • adaptive tools for searching, browsing, and authoring of Linked Data.

The resulting tools, methods and data sets have the potential to change the Web as we know it today. This makes LOD2 relevant for researchers, industry and citizens alike. Whether it is about the efficient integration of enterprise data, open, standards-based access to scientific publications and experimental data, or the opening of governmental data silos for creative use by citizens, LOD2 will improve the usability of the Web for integrating heterogeneous information.

The 4-year collaborative research and development project, which is coordinated by the AKSW research group at Universität Leipzig, starts in September 2010. It involves the partners Centrum Wiskunde & Informatica from the Netherlands, the National University of Ireland, Galway, Freie Universität Berlin, UK-based OpenLink Software, Semantic Web Company from Vienna, the Belgian IT service provider TenForce, the French enterprise search specialist Exalead, the international publishing house Wolters Kluwer and the non-profit NGO Open Knowledge Foundation.

For companies and organizations owning large datasets of public interest and interested in publishing and interlinking these on the Data Web, the LOD2 partners offer a Linked Open Data Starter Service (LODS). The application deadline for this free consulting and development support is 15th of December 2010. Further information is available from the LOD2 website http://lod2.eu.

Posted at 14:51

Semantic Web Company (Austria): Winners of Triplification Challenge 2010

On Friday, September 3, 2010, the winners and honorary mentions of the 3rd Triplification Challenge were awarded at the I-SEMANTICS conference in Graz. This year’s challenge consisted of an Open Track and a special Open Government Data Track.

In total we received 28 submissions, from which 15 nominees were selected by the organizing committee. In a second round, an international reviewing team of scientific and industrial experts selected 3 equal winners and 3 honorary mentions. The winners were each awarded prize money of 1,000 euros, sponsored by Wolters Kluwer Germany, Semantic Universe and Semantic Web Company.

In the Open Government Data Track the awards went to:

Winner:

Self-Service Linked Government Data with dcat and Gridworks
Richard Cyganiak, Fadi Maali and Vassilios Peristeras

Honorary Mention:

Linking Open Government Data: What Journalists Wish They Had Known
Christoph Boehm, Felix Naumann, Markus Freitag, Stefan George, Norman Höfler, Martin Köppelmann, Claudia Lehmann, Andrina Mascher and Tobias Schmidt

Honorary Mention:

Geographical Linked Data: a Spanish Use Case
Alexander De Leon, Victor Saquicela, Luis M. Vilches-Blázquez, Boris Villazón-Terrazas, Freddy Priyatna, Oscar Corcho, Carlos Buil, Jose Mora and Jean Paul Calbimonte

In the Open Track the awards went to:

Winner:

Live Open Linked Sensor Database
Danh Le Phuoc, Josiane Xavier Parreira, Michael Hausenblas, Yuanbo Han, Manfred Hauswirth

Winner:

Twarql: Tapping Into the Wisdom of the Crowd
Pablo Mendes, Pavan Kapanipathi and Alexandre Passant

Honorary Mention:

BibBase Triplified – http://data.bibbase.org
Christian Fritz, Oktie Hassanzadeh, Yang Yang, Reynold Xin and Renée J. Miller

The winners, from left to right: Christian Dirschl (sponsor, Wolters Kluwer), Alex Passant, Richard Cyganiak, Danh Le Phuoc and Pavan Kapanipathi

Cordial congratulations from the organizing team & look out for the 4th Triplification Challenge in 2011, which will again take place at the I-SEMANTICS conference, September 7 – 9, 2011 in Graz / Austria.

Posted at 07:50

September 03

AKSW Group - University of Leipzig: Triplification Challenge Winners

Today we announced the winners of this year’s Triplification Challenge, which have been selected from 23 submissions.

Open Government Data Track

  • Winner: Richard Cyganiak, Fadi Maali and Vassilios Peristeras, “Self-Service Linked Government Data with dcat and Gridworks”
  • Honorary Mention: Christoph Boehm, Felix Naumann, Markus Freitag, Stefan George, Norman Höfler, Martin Köppelmann, Claudia Lehmann, Andrina Mascher and Tobias Schmidt, “Linking Open Government Data: What Journalists Wish They Had Known”
  • Honorary Mention: Alexander De Leon, Victor Saquicela, Luis M. Vilches-Blázquez, Boris Villazón-Terrazas, Freddy Priyatna, Oscar Corcho, Carlos Buil, Jose Mora and Jean Paul Calbimonte, “Geographical Linked Data: a Spanish Use Case”

Open Track

  • Winner: Danh Le Phuoc, “Live Open Linked Sensor Database”
  • Winner: Pablo Mendes, Pavan Kapanipathi and Alexandre Passant, “Twarql: Tapping Into the Wisdom of the Crowd”
  • Honorary Mention: Oktie Hassanzadeh, Reynold S. Xin, Christian Fritz, Yang Yang and Renée J. Miller, “BibBase Triplified”

We thank all participants for their submissions, which were of extraordinarily high quality, and we also thank the members of the reviewer committee for their help in selecting the winners. We are also incredibly thankful to the sponsors of this year’s prizes: Wolters Kluwer and Semantic Universe.

We are very much looking forward to next year’s challenge, which will again be organized in conjunction with the annual I-Semantics conference in Graz in September 2011.

Posted at 10:32

Jeen Broekstra: Sesame 2 Windows Client - or is that SPARQL Windows/Linux Client?

I've just released a new version of the Sesame 2 Windows Client. Thanks to a new co-developer, Anton Andreev (of OntoText), the SWC tool is now a full-fledged SPARQL client: it can connect to any SPARQL endpoint, not just Sesame servers.

Apart from this new feature, the tool also has a couple of other improvements:
  • context information fetching can now be disabled when connecting to large repositories;
  • namespace clauses can now be automatically generated, based on the prefixes used in the query.

The only problem with this new release is that the name of the tool is now even more of a mismatch. Not only is the Sesame 2 Windows Client not just for Windows (it also runs on Linux under Mono), but now it is also not just for Sesame. A free drink to whoever suggests a good new name!
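
The release notes do not say how the client builds its namespace clauses. A rough Python sketch of the idea, assuming a fixed table of well-known prefixes and a regular-expression scan for prefixed names (a heuristic, not a SPARQL parser, and not the tool's actual code):

```python
import re

# Hypothetical table of known prefixes; a real client might instead use the
# namespaces reported by the repository it is connected to.
KNOWN_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# A prefixed name such as rdf:type. The look-behind skips IRIs like <http://...>
# and the middle of longer tokens; local names starting with digits are missed.
PREFIXED_NAME = re.compile(r"(?<![\w<:/#])([A-Za-z][\w-]*):(?=[A-Za-z_])")
DECLARED = re.compile(r"^\s*PREFIX\s+([A-Za-z][\w-]*)?:", re.IGNORECASE | re.MULTILINE)


def add_prefix_clauses(query, known=KNOWN_PREFIXES):
    """Prepend PREFIX clauses for known prefixes that are used but not declared."""
    declared = set(DECLARED.findall(query))
    missing = []
    for name in PREFIXED_NAME.findall(query):
        if name in known and name not in declared and name not in missing:
            missing.append(name)  # keep order of first use
    header = "".join(f"PREFIX {name}: <{known[name]}>\n" for name in missing)
    return header + query


query = "SELECT ?s WHERE { ?s rdf:type owl:Class . ?s rdfs:label ?l }"
print(add_prefix_clauses(query))
```

Unknown prefixes are left alone, so the endpoint still reports them as errors rather than the client guessing a namespace.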

As always, the new release can be found on the project homepage on SourceForge. From now on, the source code can also be found in the SourceForge SVN repository.

Posted at 08:38

September 02

Ebiquity research group UMBC: Is Twitter’s plan to log all clicks a privacy loss?

Twitter’s planned shortening of all links via its t.co service is about to happen. The initial motivation was security, according to Twitter:

“Twitter’s link service at http://t.co is used to better protect users from malicious sites that engage in spreading malware, phishing attacks, and other harmful activity. A link converted by Twitter’s link service is checked against a list of potentially dangerous sites. When there’s a match, users can be warned before they continue.”

Declan McCullagh reports that Twitter announced in an email message that when someone clicks “on these links from Twitter.com or a Twitter application, Twitter will log that click.” Such information is extremely valuable. Given Twitter’s tens of millions of active users, just knowing how often certain URLs are clicked indicates which entities and topics are of interest at the moment.

“Our link service will also be used to measure information like how many times a link has been clicked. Eventually, this information will become an important quality signal for our Resonance algorithm—the way we determine if a Tweet is relevant and interesting.�

Associating the clicks with a user, IP address, location or device can yield even more information, like what you are interested in right now. Moreover, Twitter now has a way to associate arbitrary annotation metadata with each tweet. Analyzing all of this data can identify, for example, communities of users with common interests and the influential members within them.

Note that Twitter has not said it will do this or even that it will record and keep any user-identifiable information along with the clicks. They might just log the aggregate number of clicks in a window of time. But going the next step and capturing the additional information would be, in my mind, irresistible, even if there was no immediate plan to use it.
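
The difference between the two options in the paragraph above is small in code. A minimal Python sketch of the less invasive one, counting clicks per link per time window while discarding who clicked; the event format, the example short links and the one-hour window are assumptions, and nothing here reflects how t.co is actually built:

```python
from collections import Counter

WINDOW_SECONDS = 3600  # hypothetical aggregation window (one hour)


def aggregate_clicks(events, window=WINDOW_SECONDS):
    """Count clicks per (short link, window index).

    Each event is a (timestamp_seconds, short_url, user_id) tuple. The user id
    is read but deliberately not kept in the result.
    """
    counts = Counter()
    for ts, url, _user in events:
        counts[(url, int(ts // window))] += 1
    return counts


events = [
    (0, "t.co/a", "u1"),
    (10, "t.co/a", "u2"),
    (3700, "t.co/a", "u1"),
    (20, "t.co/b", "u3"),
]
counts = aggregate_clicks(events)
```

Putting the user id, IP address or device into the key instead would turn this into exactly the per-user record the post is worried about.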

Search engines like Google already link clicks to users and IP addresses and use the information to improve their ranking algorithms and probably in many other ways. But what is troubling is the seemingly inexorable erosion of our online privacy. There will be no way to opt out of having your link wrapped by the t.co service and no announced way to opt out of having your clicks logged.

Posted at 13:12

August 31

Norm Walsh: Mexico

Cancun and a day trip to the Riviera Maya brings me to country number 16.

Posted at 22:32

Semantic Web Company (Austria): Why SKOS thesauri matter – the next generation of semantic technologies

As a matter of fact, a lot of “semantic technologies” are still around that do nothing more than pure statistical analysis of text. Sure, this is better than simple full-text search, but there are still quite a lot of opportunities to improve search, especially for more sophisticated applications like “similarity search”, the search for similar documents to enable cross-reading or recommendation systems.

Providers of first-generation semantic technologies calculate rather basic “semantic networks” by co-occurrence analysis, which sometimes produces disappointing results. Bearing in mind that Google just bought a company (“Google buys Metaweb”) that has been working on one of the largest knowledge bases in the world, we could assume that some of the last miles towards a semantic search engine can be covered by applying thesauri or other structured knowledge bases.

The PoolParty team recently developed a demo application that shows how thesauri can improve search results on top of second-generation semantic technologies. With PoolParty, SKOS-based controlled vocabularies can be managed and also enriched with linked data. The PoolParty Tag & Content Recommender analyzes virtually any text or website to recommend corresponding tags, concepts from (in this case) STW (Standard-Thesaurus Wirtschaft, a thesaurus for economics), DBpedia and the respective articles from Wikipedia.

STW, which was developed by the German National Library of Economics (ZBW), provides vocabulary on any economic subject: about 6,000 standardized subject headings and about 18,000 entry terms to support individual keywords.

This background knowledge is used in this demo app to improve the search for similar documents dramatically:

Similarity between two documents can be calculated not only on a key-phrase basis but also on a rather conceptual basis. Even if two documents do not have one single word or phrase in common they can be identified as “similar documents”.

This works because thousands of important relations between economic subjects are represented in the domain-specific thesaurus. Thus, in this special case the best results are achieved with documents from economics (for instance from EconStor), but of course other recommender systems can use thesauri from other domains instead of STW.

Nevertheless, this approach can also be improved, and that development is underway: SKOS thesauri enriched with Linked Data do an even better job. This kind of third-generation semantic technology is currently being developed by the LASSO and LOD2 projects, two innovative projects in the area of linked data and the semantic web.
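
The post does not describe PoolParty's actual similarity measure. A minimal Python sketch of the general idea, assuming a toy SKOS-like thesaurus (entry terms mapped to concepts, plus skos:broader-style links) and cosine similarity over concept counts instead of word counts; the terms and concept ids are invented, not taken from STW:

```python
import math
from collections import Counter

# Toy thesaurus (invented): entry term -> preferred concept, concept -> broader concept.
ENTRY_TERMS = {
    "inflation": "ex:PriceLevel",
    "price increase": "ex:PriceLevel",
    "unemployment": "ex:LabourMarket",
    "jobless rate": "ex:LabourMarket",
}
BROADER = {
    "ex:PriceLevel": "ex:Macroeconomics",
    "ex:LabourMarket": "ex:Macroeconomics",
}


def concept_vector(text):
    """Count thesaurus concepts found in `text` (naive substring matching),
    adding one count to the broader concept for every hit."""
    text = text.lower()
    bag = Counter()
    for term, concept in ENTRY_TERMS.items():
        hits = text.count(term)
        if hits:
            bag[concept] += hits
            if concept in BROADER:
                bag[BROADER[concept]] += hits
    return bag


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc1 = "Inflation hit a record high."
doc2 = "Every price increase hurts savers."  # no word in common with doc1
doc3 = "The jobless rate fell."
print(cosine(concept_vector(doc1), concept_vector(doc2)))  # ≈ 1.0, same concept
print(cosine(concept_vector(doc1), concept_vector(doc3)))  # ≈ 0.5, shared broader concept
```

The broader links are what give doc1 and doc3 partial credit; a plain word-overlap measure would score both pairs at zero.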

Posted at 04:44

Andrew Matthews: Note to Self: Convert UTF-8 w/ BOM to ASCII (WIX + DB) using GNU uconv

Posted at 00:02

August 30

Dublin Core Metadata Initiative: New Task Groups for revising the User Guide and reviewing the DCMI Abstract Model

2010-08-30, Two new DCMI Task Groups have been formed: the DCMI User Guide Task Group that will work on a revision of the popular but outdated document "Using Dublin Core" and the DCMI Abstract Model Review Task Group that will prepare a review of the DCMI Abstract Model, both for discussion at DC-2010 in October 2010. Discussion will take place on the DC-Glossary and DC-Architecture mailing lists, respectively. Participation by interested members of the Dublin Core community is welcomed and encouraged; please contact Tom Baker for further information.

Posted at 23:59

Dublin Core Metadata Initiative: NISO/DCMI Webinar slides published

2010-08-30, The slides from the Joint NISO/DCMI Webinar "Dublin Core: The Road from Metadata Formats to Linked Data" held on 25 August 2010 are now available at the Metadata Training Resources page.

Posted at 23:59

Norm Walsh: Reconsidering specialization, part the first

It's been a few years since I first considered DITA specialization. I wonder if I missed the point? I think that might depend on the assumptions that I brought to the table.

Posted at 17:19

Jeen Broekstra: Moving to New Zealand

Karen and I are taking the plunge together. Well, another plunge: we are emigrating to New Zealand at the end of this year.

This is not, of course, something you decide on a whim. We've both long felt that we might like to make a "big change" for the better, and a long holiday spent getting to know New Zealand and its people convinced us that it is everything we were hoping it would be. New Zealand is an absolutely breathtakingly beautiful place, its people are friendly and easy-going, and for what is basically a predominantly Anglo-Saxon culture they actually make very good coffee as well.

The plan is quite simple. We have both obtained our permanent residence visas, and will move to New Zealand, to the Wellington area, in December. Although neither of us has secured a job yet, I am actively on the lookout, and confident that something suitable will turn up. NZ, and especially Wellington, is full of smaller and larger IT firms, and every job vacancy site I check literally posts 2 to 6 positions for (Java) developers a day, at least. However, what I haven't been able to find much of yet are companies or institutes that work in Semantic Web/Ontologies, or that need expertise in that area. If any of you have tips for me (or a job offer ;-)), it would be much appreciated!

I will post updates on developments now and then on this weblog. Oh and yes: we will of course be giving a farewell party later this year ;-)

Posted at 10:11

August 29

Dean Allemang: Graphical "more like this" Query Building

I promised in an earlier blog post to talk about how to create queries over OWL in RDF. So here it is.

As Ivan alluded to in his comment, there are some syntax issues with talking about OWL restrictions in RDF. What is he referring to? Well, let's take the same example as in the last blog post, a datatype restriction about things with age >= 21. We could write this in Manchester Syntax as

hasAge only xsd:integer [>=21]

But the OWL/RDF rendition of this is where the 'arcane' syntax comes in. We can see it just by looking at the source code in Turtle, where it looks like this:

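@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix :     <http://example.org/adult#> .  # placeholder; adult.rdf may use another namespace

[] a owl:Restriction ;
   owl:allValuesFrom [
       a rdfs:Datatype ;
       owl:onDatatype xsd:integer ;
       owl:withRestrictions ( [ xsd:minInclusive 21 ] )
   ] ;
   owl:onProperty :hasAge .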

In the last blog entry, we saw a rule that would match this sort of definition, so that we could classify persons of appropriate ages as Adults.  That rule looked like this:

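PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

# Classify ?x as a member of the restriction when its value meets the minInclusive bound
CONSTRUCT {
    ?x a ?restriction .
}
WHERE {
    # the datatype: xsd:integer restricted by a list of facets
    ?datatype owl:onDatatype xsd:integer .
    ?datatype owl:withRestrictions ?var .
    ?datatype a rdfs:Datatype .
    # the restriction that uses it, on some property
    ?restriction owl:allValuesFrom ?datatype .
    ?restriction a owl:Restriction .
    ?restriction owl:onProperty ?datatypeproperty .
    # the first facet in the list and its lower bound
    ?var rdf:first ?var1 .
    ?var1 xsd:minInclusive ?mval .
    # data values on that property that meet the bound
    ?x ?datatypeproperty ?val .
    FILTER (?val >= ?mval) .
}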
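
To see what the rule actually does, here is a minimal Python sketch that evaluates the same pattern by hand over a small in-memory set of triples. The blank-node ids and the individuals :p1 to :p3 (given the ages 23, 18 and 45 used in the adult.rdf example) are stand-ins, not the real contents of the file, and this is not how SPIN executes the rule.

```python
# Triples as (subject, predicate, object); prefixed names are plain strings.
# "_:r", "_:dt", "_:list", "_:facet" and :p1..:p3 are hypothetical stand-ins.
triples = {
    ("_:r", "rdf:type", "owl:Restriction"),
    ("_:r", "owl:onProperty", ":hasAge"),
    ("_:r", "owl:allValuesFrom", "_:dt"),
    ("_:dt", "rdf:type", "rdfs:Datatype"),
    ("_:dt", "owl:onDatatype", "xsd:integer"),
    ("_:dt", "owl:withRestrictions", "_:list"),
    ("_:list", "rdf:first", "_:facet"),
    ("_:facet", "xsd:minInclusive", 21),
    (":p1", ":hasAge", 23),
    (":p2", ":hasAge", 18),
    (":p3", ":hasAge", 45),
}


def objects(s, p):
    return [o for (s2, p2, o) in triples if s2 == s and p2 == p]


def has(s, p, o):
    return (s, p, o) in triples


def apply_min_inclusive_rule():
    """Mirror the CONSTRUCT: infer (?x rdf:type ?restriction) when ?val >= ?mval."""
    inferred = set()
    for (restriction, p, datatype) in triples:
        if p != "owl:allValuesFrom":
            continue
        if not (has(restriction, "rdf:type", "owl:Restriction")
                and has(datatype, "rdf:type", "rdfs:Datatype")
                and has(datatype, "owl:onDatatype", "xsd:integer")):
            continue
        for prop in objects(restriction, "owl:onProperty"):
            for facet_list in objects(datatype, "owl:withRestrictions"):
                for facet in objects(facet_list, "rdf:first"):
                    for mval in objects(facet, "xsd:minInclusive"):
                        for (x, p2, val) in triples:
                            if p2 == prop and val >= mval:  # the FILTER
                                inferred.add((x, "rdf:type", restriction))
    return inferred


print(sorted(apply_min_inclusive_rule()))
```

Like the CONSTRUCT, the sketch stops at membership in the restriction (here :p1 and :p3); it does not model how the rest of the model connects that restriction to Adult.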

How do you write a rule like that? By looking up in the standard how to express datatype restrictions, and how to link those to restricted value sets, and . . . . if that seems labor-intensive and error-prone to you, then you're right. It is.

But we can use a power tool to help make this happen. The power tools aren't included in the free version of TopBraid Composer, so if you want to follow along here, you'll need the Maestro Edition; a 30-day trial is available for free.

Start by loading http://workingontologist.org/Examples/adult.rdf into Composer, just as shown before, and open it. We're going to use the model itself as a prototype to create a query. Let's start by looking at an example of the restriction we want to match - look at the definition of Adult in the model:


You can type it in just like that. But that doesn't help us write a SPARQL query to match any restriction of this form. How can we do that? If you click on "Graph" at the bottom of the pane, you can explore this definition in RDF. If you drill down to the datatype restriction itself, you get a view like the top of this figure:


This is just a graphic representation of triples in the model - you can see all the structure of the RDF representation of the restriction. 

Now comes the fun part: let's turn this image into a query (which, to avoid suspense, is already shown at the bottom of the figure). We want a query that will match "things like this" restriction. What does "like this" mean? That's what we have to specify. Some aspects of this example should be included in the match (that it is an owl:Restriction, on an rdfs:Datatype over xsd:integer, and that it is an xsd:minInclusive restriction), and others should not (that the property is :hasAge; after all, we want this to match restrictions on any property). So we select the things that we want to keep in the query, marked with a small "x" (you can set or reset the "x" by clicking on the small box in each node in the graph).

Once you have selected the aspects that specify what you mean by "like this" (a datatype restriction, on some property, with minInclusive over xsd:integers), you can generate the query automatically by clicking the Star button. You can see the generated query at the bottom of the figure.

All the generator did was take the triples shown in the figure and render them in the query. Selected nodes (with "x") appear in the query as themselves; unselected nodes (no "x") become variables. Properties always show up as themselves. Best guesses are made for meaningful variable names, using type information.
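
TopBraid's generator code is not part of this post, so the following is only a guess at the mapping the paragraph describes, as a minimal Python sketch: selected nodes stay constants, unselected nodes become variables named after their rdf:type (falling back to ?var, ?var1, ...), and predicates are copied unchanged. The triple encoding and the `types` lookup are assumptions.

```python
def generate_where(triples, selected, types):
    """Render example triples as a SPARQL WHERE clause.

    Nodes in `selected` are kept as constants; every other subject or object
    becomes a variable named after its type. Predicates are always kept.
    """
    names = {}  # node -> variable, so the same node always gets the same variable
    used = set()

    def variable(node):
        if node not in names:
            # e.g. owl:Restriction -> "restriction"; untyped nodes fall back to "var"
            base = types.get(node, "var").split(":")[-1].lower()
            name, i = base, 1
            while name in used:  # distinct nodes must not share a variable
                name, i = f"{base}{i}", i + 1
            used.add(name)
            names[node] = "?" + name
        return names[node]

    def term(node):
        return str(node) if node in selected else variable(node)

    body = "\n".join(f"    {term(s)} {p} {term(o)} ." for s, p, o in triples)
    return "WHERE {\n" + body + "\n}"


# The example restriction from the figure (blank-node ids are hypothetical).
example = [
    ("_:r", "a", "owl:Restriction"),
    ("_:r", "owl:onProperty", ":hasAge"),
    ("_:r", "owl:allValuesFrom", "_:dt"),
    ("_:dt", "a", "rdfs:Datatype"),
    ("_:dt", "owl:onDatatype", "xsd:integer"),
    ("_:dt", "owl:withRestrictions", "_:list"),
    ("_:list", "rdf:first", "_:facet"),
    ("_:facet", "xsd:minInclusive", 21),
]
selected = {"owl:Restriction", "rdfs:Datatype", "xsd:integer", 21}  # the "x" marks
types = {"_:r": "owl:Restriction", "_:dt": "rdfs:Datatype", ":hasAge": "owl:DatatypeProperty"}
query = generate_where(example, selected, types)
print(query)
```

Because :hasAge is left unselected it becomes ?datatypeproperty, while the selected constant 21 stays in the query, which is exactly the part the post goes on to replace by hand.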

There are a few differences between the generated query and the WHERE clause of the rule:

WHERE {
    ?datatype owl:onDatatype xsd:integer .
    ?datatype owl:withRestrictions ?var .
    ?datatype a rdfs:Datatype .
    ?restriction owl:allValuesFrom ?datatype .
    ?restriction a owl:Restriction .
    ?restriction owl:onProperty ?datatypeproperty .
    ?var rdf:first ?var1 .
    ?var1 xsd:minInclusive ?mval .
    ?x ?datatypeproperty ?val .
    FILTER (?val >= ?mval) .
}

The first difference is ordering of triples - the generator isn't very fussy about the order in which triples are generated, so it is different each time (if you are following along at home, your generated query will probably be different from the one shown here, and also from the rule).  

The second difference is the inclusion of a triple to match data, to wit:

 ?x ?datatypeproperty ?val .

After all, in a rule, we want to say "when some data satisfies this restriction, ..." This clause uses the same variable for the property (?datatypeproperty) as used in the rest of the query. 

The final difference has to do with the constant "21". The generated query includes the constant, whereas the rule turns it into a variable (?mval) and adds a filter to compare it to the actual data (?val). After all, the value "21" comes from the model, and shouldn't be built into the rule.

So yes, these modifications have to be made by hand (using the SPARQL editor, where the generator put the query).  The query generator should be seen as a power tool; you still need an operator who knows how to use it, but it simplifies a lot of the heavy lifting for query writing.  In this case, we have a rule with 10 clauses (9 triples and a filter).  The generator created seven of the triples, and most of the eighth one; the human only had to write the last two clauses.  That is, the power tool took care of the "arcane syntax" that Ivan referred to, leaving the human to figure out what they really want the rule to mean.

I use this feature of TopBraid Composer all the time, in this pattern.  I want to write a query that matches some 'arcane' bit of RDF (e.g., from dbpedia, the OWL in RDF standard, the XML DOM, SKOS, etc.). Instead of trying to write a query from scratch, I find (or even build) an example of the thing I want to match.  Then I generate the query - automatically guaranteeing that I didn't leave out any triples, that I got all the namespaces and property names correct, that I didn't accidentally collide bnodes by giving them the same variable name, etc.  Then I beat up the result to create the query that I really want - in which I define what I want to do with the match. 

So when you see an elaborate query with dozens of triples in it, and you wonder what sort of geek can write or maintain such a thing, keep in mind that it might not have been written at all; it might have been generated from an example.

Posted at 16:17

August 26

Dave Beckett: Leaving Yahoo – Joining Digg

I’m heading to a new adventure at Digg in San Francisco to be a lead software engineer working on APIs and syndication.

I’ve been at Yahoo! nearly 5 years so it is both a happy and sad time for me, and I wish all the excellent people I worked with the best of luck in future.

Here is a summary of the main changes:

  • Silicon Valley -> San Francisco
  • 15,000 staff -> 100 staff
  • Architect -> Software engineer
  • strategizing, meeting -> coding
  • Powerpoint, OmniGraffle, twiki -> emacs, eclipse, …?
  • (No coding!) -> Python, Java, Hadoop, Cassandra, …?
  • Sunny days -> Foggy days
  • 15 min commute -> 2.5hr commute (until I move to SF)
  • Public company -> private company

Exciting!

Posted at 20:44

Dean Allemang: Extending OWL RL

I've always been a fan of describing OWL in terms of rules. When introducing someone to a new technology, it is nice to be able to describe it simply (a lesson that Facebook taught us again recently). And while it is a bit of a white lie to say that OWL is defined just by a set of rules, it makes it very easy to explain what something in OWL (or RDFS) means, by stating a rule that it follows.

I've actually been using a rule-based definition of OWL for years now, starting back at Intellidimension years ago, and then using OWLIM, and nowadays SPIN. All of these technologies have been 'approximating' OWL for years using variations of Datalog technology - implementing OWL as a set of rules.

While OWL 2's creation of three profiles and a subset hardly counts as keeping the standard simple, I have to say I appreciate the legitimacy that the OWL 2 RL profile has given to a practice that many of us (more than just the ones I have listed) have been doing for years now - of using rule-based systems to process OWL. And the RIF folks have even done us the favor of writing out just what rules OWL 2 RL is made of.

One of the things I have always liked about this approach is the flexibility it gives the system builder in trading off performance vs. expressiveness in the modeling language. You don't need someValuesFrom restrictions? Fine - take those rules out, and speed up the system. I've taken systems from intractable 20-minute response times down to almost instant by fine-tuning the rule system, while still maintaining the same semantics - because my model didn't use the discarded rules.
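
A minimal Python sketch of that trade-off, assuming triples are plain tuples and rules are functions: two OWL 2 RL rules (cax-sco for rdfs:subClassOf and cls-svf1 for owl:someValuesFrom) are run to a fixpoint, and leaving cls-svf1 out of the list simply switches that inference off. The data and names are hypothetical, and this naive loop is an illustration, not how SPIN or OWLIM evaluate rules.

```python
# Triples are (subject, predicate, object) tuples; all names are hypothetical.

def cax_sco(g):
    """OWL 2 RL cax-sco: ?c1 rdfs:subClassOf ?c2 . ?x a ?c1  =>  ?x a ?c2"""
    return {
        (x, "rdf:type", c2)
        for (c1, p, c2) in g if p == "rdfs:subClassOf"
        for (x, p2, c) in g if p2 == "rdf:type" and c == c1
    }


def cls_svf1(g):
    """OWL 2 RL cls-svf1: ?r owl:someValuesFrom ?y . ?r owl:onProperty ?p .
    ?u ?p ?v . ?v a ?y  =>  ?u a ?r"""
    inferred = set()
    for (r, p, y) in g:
        if p != "owl:someValuesFrom":
            continue
        for prop in {o for (s, p2, o) in g if s == r and p2 == "owl:onProperty"}:
            for (u, p3, v) in g:
                if p3 == prop and (v, "rdf:type", y) in g:
                    inferred.add((u, "rdf:type", r))
    return inferred


def infer(graph, rules):
    """Naive forward chaining: apply every rule until nothing new is derived."""
    g = set(graph)
    while True:
        new = set().union(*(rule(g) for rule in rules)) - g
        if not new:
            return g
        g |= new


data = {
    ("_:r", "owl:someValuesFrom", ":Dog"),
    ("_:r", "owl:onProperty", ":hasPet"),
    ("_:r", "rdfs:subClassOf", ":DogOwner"),
    (":Dog", "rdfs:subClassOf", ":Animal"),
    (":alice", ":hasPet", ":rex"),
    (":rex", "rdf:type", ":Dog"),
}

full = infer(data, [cax_sco, cls_svf1])
lite = infer(data, [cax_sco])  # someValuesFrom support switched off
print((":alice", "rdf:type", ":DogOwner") in full)  # True
print((":alice", "rdf:type", ":DogOwner") in lite)  # False
```

If no model uses someValuesFrom, the two runs give the same answers, and the cheaper rule set is the one to keep.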

But today I want to talk about another advantage of this approach - that you can extend your model semantics as well. Suppose there is something in OWL-Full that you want to use, but it doesn't appear in the OWL 2 RL list of rules? What can you do about it? You could switch approaches, and use another style of reasoner, but then you lose the advantage of being able to tune your rule base. Another approach is to encode just the extensions that you want in rules.

Let's take a simple example of this, using SPIN as our rule language. You can follow along yourself if you like - all you need is the Free Edition of TopBraid Composer.

OWL-Full allows something called Data Range Expressions, in which you can define a range to be a set of values. A simple example of this is the notion of Adult, that is, a person whose age is greater than or equal to 21. An example of a model with this definition can be found at http://www.workingontologist.org/Examples/adult.rdf.

You can import this file into TopBraid Composer by right-clicking on the TopBraid project, selecting "Import RDF or OWL File from the Web" and pasting in the URL of the model, http://www.workingontologist.org/Examples/adult.rdf (see first figure). 

[Figure: CreateFile]

Open the file adult.rdf by double-clicking, then expand owl:Thing to see the ontology. Click on "Adult" to see its definition: a Person who hasAge only from values greater than or equal to 21 (see second figure).

[Figure: OpenAdult]

Notice that there are also three instances of the class Person - with ages 23, 18 and 45. Evidently, two of these are adults, and one is not.

[Figure: Persons]

Now we run SPIN inferences (by pressing the Inference button), and we see that indeed just the people of appropriate age are classified as Adults.

[Figure: Done]

How did this work?

SPIN works by expressing the rules for OWL in SPARQL. Thanks to the RIF effort, mentioned above, we at TopQuadrant were able to write out all the OWL 2 RL rules in SPIN (since SPARQL has the same expressive power as RIF). This example simply imported these rules from http://topbraid.org/spin/owlrl-all. The SPIN inferencer finds these rules, and executes them when you press the Inference button. We can see one of these rules in the following figure - it is a familiar rule, telling us how rdfs:subPropertyOf works.

[Figure: SubPRule]
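For readers who can't see the figure, here is a sketch of what such a rule amounts to. The SPARQL string is the OWL 2 RL rule prp-spo1 written as a CONSTRUCT in the general style of a SPIN rule; it is illustrative and not copied from owlrl-all. The Python function evaluates the same graph pattern over an in-memory set of triples, and the property names in the example data are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# prp-spo1 as a SPARQL CONSTRUCT (illustrative text, not taken from owlrl-all).
PRP_SPO1 = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT { ?x ?p2 ?y . }
WHERE {
    ?p1 rdfs:subPropertyOf ?p2 .
    ?x ?p1 ?y .
}
"""


def prp_spo1(triples: Set[Triple]) -> Set[Triple]:
    """Evaluate the same pattern as PRP_SPO1 over an in-memory set of triples."""
    inferred: Set[Triple] = set()
    for p1, pred, p2 in triples:
        if pred != "rdfs:subPropertyOf":
            continue
        for x, p, y in triples:  # ?x ?p1 ?y
            if p == p1:
                inferred.add((x, p2, y))  # CONSTRUCT { ?x ?p2 ?y }
    return inferred - triples  # report only genuinely new triples


# Hypothetical data: :hasMother is declared a subproperty of :hasParent.
data = {
    (":hasMother", "rdfs:subPropertyOf", ":hasParent"),
    (":Person_1", ":hasMother", ":Person_2"),
}
print(prp_spo1(data))  # {(':Person_1', ':hasParent', ':Person_2')}
```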

But that doesn't explain the whole thing. If you know OWL 2 RL well, you know that Data Range Expressions are not part of the OWL 2 RL profile. There are good technical reasons why they were left out, but that doesn't keep us from wanting to do these inferences. So we express them in SPARQL and add them to our rule set for the SPIN inferencer to work on. One such rule is shown in the next figure.

[Figure: MinInclusive]

Most of the rule matches the RDF rendition of the OWL data restriction. It matches restrictions of xsd:integer where all the values come from the set defined by xsd:minInclusive for some value (in our case, 21). When all these things match, we assert that the instance is a member of the restriction.

So in the case of :Person_1, who is 23 years old, the property :hasAge matches the variable ?datatypeproperty, 21 matches the variable ?mval, and the actual age 23 matches the variable ?val. Since 23 > 21, ?val > ?mval and the rule matches. Hence :Person_1 is a member of the restriction and, by the rest of the rules from OWL 2 RL, is an :Adult.
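The same classification can be mimicked outside SPIN in a few lines. This sketch assumes the restriction uses xsd:minInclusive 21, as the Adult definition describes. Only :Person_1 (age 23) is named in the post, so :Person_2 and :Person_3 are hypothetical names for the people aged 18 and 45. Like a forward-chaining rule, it only inspects the values it actually has.

```python
from typing import Dict, List

# Hypothetical in-memory copy of the example data: each person's :hasAge values.
HAS_AGE: Dict[str, List[int]] = {
    ":Person_1": [23],
    ":Person_2": [18],
    ":Person_3": [45],
}

# Facet of the Adult restriction: values of :hasAge are xsd:integer >= 21.
MIN_INCLUSIVE = 21


def in_restriction(values: List[int], min_value: int) -> bool:
    """True if there is at least one value and every known value is >= min_value.

    This is a closed-world reading over the values we happen to have, which is
    the same approximation a forward-chaining rule makes.
    """
    return bool(values) and all(v >= min_value for v in values)


adults = sorted(p for p, ages in HAS_AGE.items() if in_restriction(ages, MIN_INCLUSIVE))
print(adults)  # [':Person_1', ':Person_3']
```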

This approach to OWL gives a lot of control to the modeler; they can use standard models (like the OWL 2 RL model we used here), but they can also augment this reasoning with new rules that do just as much inferencing as is needed for the application. These new rules can be consistent with the standard OWL-Full rules, or they could even be domain-specific business rules. In any case, the power lies in the hands of the modeler. In the particular case of SPIN, we have the added advantage that the modeler can write these rules in the standard SPARQL language.

Posted at 04:25

August 25

Leigh Dodds: Gridworks Reconciliation API Implementation

Gridworks is a really fantastic tool and there’s scope to extend it in all kinds of interesting ways. Jeni Tennison has recently published a great blog post describing how to use Gridworks for generating Linked Data. I strongly encourage you to read her posting as it not only provides a good introduction to Gridworks itself, but also shows a nice real world example of generating RDF using its built-in data cleaning and templating tools.

I was lucky enough to meet David Huynh at a workshop recently and chatted to him briefly about another aspect of Gridworks: its ability to match field values in a dataset to entities in Freebase, e.g. identifying a place based on just its name. Within Gridworks this process is known as “reconciliation”.

Reconciliation is an important step for generating good Linked Data as you’ll often need to correlate values in a dataset with URIs in existing datasets in order to generate links. E.g. matching company names to their URIs. While it is possible to generate identifiers algorithmically during a conversion this typically just defers the reconciliation work until a later stage, when you carry out cross-linking to introduce equivalence links.

Recognising that the ability to introduce new reconciliation services would be a powerful extension to Gridworks, David Huynh has been creating a draft specification that will allow third-parties to create and deploy their own reconciliation services. He’s been documenting his progress on implementing the client side of this protocol and has published a testing service.

It occurred to me that the reconciliation API is essentially a structured search over a dataset and thus could be implemented against the search interface exposed by Talis Platform stores. The RSS 1.0 feeds that the Platform returns include enough information to rank and filter results as required by the API.
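As a rough illustration of "reconciliation as structured search", here is a Python sketch that turns search hits into ranked candidates, optionally filtered by type, with the top-ranked one flagged as a match. The hits are made up, and the response shape (a `result` list of objects with `id`, `name`, `type`, `score` and `match`) is an assumption modelled on later published versions of the reconciliation protocol; the draft and the Ruby implementation may differ.

```python
from typing import Any, Dict, List, Optional

# Hypothetical search hits, standing in for items parsed out of a store's
# search results: a URI, a label, rdf:type URIs and a relevance score.
HITS: List[Dict[str, Any]] = [
    {"uri": "http://example.org/id/bath-constituency", "label": "Bath",
     "types": ["http://example.org/def/Constituency"], "score": 0.91},
    {"uri": "http://example.org/id/banes-lea", "label": "Bath and North East Somerset",
     "types": ["http://example.org/def/LocalEducationAuthority"], "score": 0.74},
    {"uri": "http://example.org/id/bath-ward", "label": "Bath (example ward)",
     "types": ["http://example.org/def/Ward"], "score": 0.52},
]


def reconcile(hits: List[Dict[str, Any]], type_uri: Optional[str] = None,
              limit: int = 3) -> Dict[str, List[Dict[str, Any]]]:
    """Filter hits by type, rank them by score and mark the best one as a match."""
    candidates = [h for h in hits if type_uri is None or type_uri in h["types"]]
    candidates.sort(key=lambda h: h["score"], reverse=True)
    return {
        "result": [
            {"id": h["uri"], "name": h["label"], "type": h["types"],
             "score": h["score"], "match": rank == 0}
            for rank, h in enumerate(candidates[:limit])
        ]
    }
```

Always flagging the top candidate is a simplification; a real service would more likely require a minimum score or a clear gap to the runner-up before claiming a match.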

I’ve created a simple Ruby application, using the Sinatra web framework, that implements the reconciliation API for any Talis Platform store. You can find the code on github if you want to have a play with it. As I note in the README there are some areas where customisation is useful to get the most from the service. So while in principle it can be used against any existing Platform store you can create a simple JSON config to tweak it for particular datasets.

There’s a live version of the code running on my server here: http://ldodds.com/gridworks/.

That page has a simple API console for carrying out queries, but consult the draft specification for more details. I think I’ve covered all of the basic features (but bug reports welcome!). Consult the README for notes on configuration options and implementation decisions.

As a simple illustration, let’s say that I have the value “Bath” in a dataset and want to match that to some area in the UK administrative geography. This information is available from the Linked Data exposed by statistics.data.gov.uk, which happens to be hosted in this Platform store. The reconciliation API we need can therefore be found at: http://ldodds.com/gridworks/govuk-statistics/reconcile. An HTTP GET on that location retrieves the service metadata.

If we use the API explorer we can use a simple HTML form to try out examples. Select govuk-statistics from the Store drop-down and then type Bath into the search box. You’ll get this result. This is not very readable by default, so if you’re using Firefox I recommend you install the JSONView extension which provides a nicely formatted display.

Our initial search returns a number of results, the highest ranked of which is the Westminster Constituency for Bath. That seems like a pretty good initial result to me. As it is the most relevant result in the search it’s marked as an exact match, so once integrated with Gridworks it will capture and store the reconciled identifier for you.

However, we may know that, in the imaginary dataset we’re working with, a particular field doesn’t contain names of constituencies. It may instead refer to a Local Education Authority. We can refine our search by adding the URI that defines that type of resource into the type field in the API explorer.

Try pasting http://statistics.data.gov.uk/def/geography/LocalEducationAuthority into the type field and running the search again. You’ll find that this time you get a single result, which is Bath and North East Somerset. Job done.
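Outside the API explorer, the same typed query could be sent as an HTTP GET. The sketch below only builds the URL and does not call the service. It assumes the query travels as JSON in a `query` parameter with `query`, `type` and optional `limit` keys, which is how later versions of the reconciliation protocol do it; check the draft specification for the exact form this service expects.

```python
import json
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SERVICE = "http://ldodds.com/gridworks/govuk-statistics/reconcile"
LEA_TYPE = "http://statistics.data.gov.uk/def/geography/LocalEducationAuthority"


def build_reconcile_url(service: str, text: str, type_uri: Optional[str] = None,
                        limit: Optional[int] = None) -> str:
    """Build a GET URL carrying one reconciliation query as JSON (assumed format)."""
    query: dict = {"query": text}
    if type_uri is not None:
        query["type"] = type_uri  # restrict candidates to this rdf:type
    if limit is not None:
        query["limit"] = limit
    return service + "?" + urlencode({"query": json.dumps(query)})


url = build_reconcile_url(SERVICE, "Bath", type_uri=LEA_TYPE)
print(url)

# Round-trip check: decode the query parameter again.
decoded = json.loads(parse_qs(urlparse(url).query)["query"][0])
```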

Of course, to get the most from this you need to know what URIs you can use for filtering by types (and properties). But this is something that the Gridworks UI will help with. It can integrate with “suggestion services” that can be used to help map values to properties and types within a schema. I’ll be looking at how to expose those as my next piece of work.

Hopefully you can see how the overall system works. Feel free to have a play with the API to try it out for yourself. If you have comments on the implementation then I’d love to hear them, but I’d suggest that comments on the specification are best addressed to the gridworks mailing list.

I also suspect the Reconciliation API has uses outside of just Gridworks. For example, I wonder how easy it would be to introduce reconciliation into Google Spreadsheets using Google Apps Script? It’s also another nice demonstration of how easy it is to map simple RESTful APIs onto RDF datasets: this implementation works for any data in the Platform, no matter what schema it conforms to. Neat.

Posted at 21:19

Ebiquity research group UMBC: Yahoo! using Bing search engine in US and Canada

Microsoft’s Bing team announced on their blog that the Bing search engine is “powering Yahoo!’s search results” in the US and Canada for English queries. Yahoo also has a post on their Yahoo! Search Blog.

The San Jose Mercury News reports:

“Tuesday, nearly 13 months after Yahoo and Microsoft announced plans to collaborate on Internet search in hopes of challenging Google’s market dominance, the two companies announced that the results of all Yahoo English language searches made in the United States and Canada are coming from Microsoft’s Bing search engine. The two companies are still racing to complete the transition of paid search, the text advertising links that run beside and above the standard search results, before the make-or-break holiday period — a much more difficult task.”

Combining the traffic from Microsoft and Yahoo will give Bing a more significant share of the Web search market. That should help them by providing both companies with a larger stream of search-related data that can be exploited to improve search relevance, ad placement and trend spotting. It will also help to foster competition with Google focused on developing better search technology.

Hopefully, Bing will be able to benefit from the good work done at Yahoo! on adding more semantics to Web search.

Posted at 04:08

August 24

Talis: Talis Training: Intro to the Web of Data

Intro to the Web of Data

21-22 September

76 Portland Place, London

26-27 October

Talis Offices, Birmingham

So, we’ve been running a series of Open Days which you can’t have failed to notice here on the Nodalities blog. We’ve covered very broad topics related to the Semantic Web and Linked Data, giving an overview of graph-thinking with data, URIs and some direction.

But the question keeps coming up: “How does my team actually use Linked Data?”

We’ve done quite a bit of training, both bespoke consulting and as a set course, and you can read a bit more about that over on our consulting page. We’re now hosting a series of open-registration training courses: A 2-day introduction to the Web of Data.

The course provides an in-depth introduction to all of the core technologies that a developer will encounter when working with and publishing Linked Data. It includes a thorough introduction to the RDF model; modelling of data using RDF Schema; publishing of data to the web as Linked Data, and querying RDF datasets using SPARQL.

We’re offering a discounted price for the first two courses of £1,000 per attendee (ex VAT), including lunch and our now-famous SPARQL blend coffee from Union Hand-Roasted Coffee.

The first course will be on 21 and 22 September at No 76 Portland Place, London. The second will be at our offices in Birmingham on 26 and 27 October.

Posted at 17:53

Dean Allemang: Facebook OGP for SVSW

I went to the Silicon Valley Semantic Web Meetup last night about Facebook's Open Graph Protocol. The presentation by Austin Haugen and Paul Tarjan was short and sweet and gave the best overview of OGP that I have seen. It included a live demo of using OGP to link a page into facebook - it only took a couple minutes, but made it very clear what was and wasn't going on.

On the one hand, I see why Jim Hendler has said that the more he sees of OGP, the better he likes it; these guys really 'get' the Semantic Web, they understand what it means to link a page in a web of data rather than just point to it, and they can demonstrate it very cleanly inside the facebook infrastructure. 

But on the other hand, when I asked the speakers how I could query the Open Graph, their answer was, for us linked data fans, a bit disappointing (though I applaud the speakers for their honesty).  Not only is there no way to query the graph, there won't be one any time soon.  One speaker went so far as to say, "we're sort of faking the semantic web here; there's no Virtuoso behind this." 

My hopes of using Facebook as a sort of clearinghouse of interesting RDF data for classes, demos, etc. were dashed right away.  

My question attracted a number of discussions afterward, many of which had a cynical edge; one fellow said to me that it is clear that facebook doesn't really understand or care about the Semantic Web; they just want a way to drive more traffic to their site, and that the techies just made it look like Semantic Web to jump on the buzzword bandwagon.  I guess it's nice that we're a buzzword bandwagon now, that someone like facebook wants to be part of.  The discussion also wandered to speculations about Google's intentions with our darling Metaweb, and what plans the not-evil giant has for her.

Be this as it may (and of course it is true to some extent; after all, facebook lives in a capitalist economy, so making money has to be a big part of what drives their decisions), that didn't stop me from putting a facebook "like" button on workingontologist.org .

On the other hand (how many hands do I have now?), one can see a motivation for facebook to include a query interface to the Open Graph Protocol - after all, they do want to encourage a cottage industry of app builders to add functionality to their site. And we know how successful RDF has been in doing that - just look at SearchMonkey...., oh wait.  Maybe not.

One of the things I found most informative about the talk came in the discussion in response to various questions about design decisions.  From the point of view of metatags, the Open Graph Protocol is really simple; just a handful of required tags with a simplified syntax (simpler even than standard RDFa).  Even so, facebook user studies showed that this was almost too complicated.  Even very small complications - additional namespaces, some slightly twisty syntax from RDFa - were found to have a severe damping effect on technology adoption.  It seems that even the levels of simplicity we argue for in our Semantic Universe blog entry on technology adoption are not enough; for some audiences, simple really has to be simple.  This is a tough pill for any technologist to swallow; looking at OGP makes it look as if the baby has been thrown out with the bathwater.  But there are now hundreds of millions of new 'like' buttons around the web; simplicity pays off.  As another commenter pointed out, regardless of the purity (or lack thereof) of the facebook approach, OGP has still made the biggest splash in terms of bringing semantic web to the attention of the public at large.  So who's the bandwagon, and who's riding?
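To show just how small the markup is, here is a Python sketch that renders the four basic Open Graph properties (og:title, og:type, og:image and og:url) as `<meta property=… content=…>` tags for a page's head. The page details in the usage call are illustrative and the image URL is hypothetical. Facebook's own integration may ask for additional properties beyond these.

```python
from html import escape


def og_meta_tags(title: str, og_type: str, image: str, url: str) -> str:
    """Render the four basic Open Graph properties as HTML <meta> elements."""
    properties = [
        ("og:title", title),
        ("og:type", og_type),
        ("og:image", image),
        ("og:url", url),
    ]
    # Escape values so quotes or ampersands can't break the attribute.
    return "\n".join(
        f'<meta property="{name}" content="{escape(value, quote=True)}" />'
        for name, value in properties
    )


print(og_meta_tags(
    "Semantic Web for the Working Ontologist",
    "book",
    "http://www.workingontologist.org/cover.jpg",  # hypothetical image URL
    "http://www.workingontologist.org/",
))
```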

Posted at 17:06

Bill Roberts: Scottish government spending - CSVs please?

The Scottish Government is now publishing monthly reports of all items of expenditure over £25,000.

That’s great and a positive step towards transparency and open data. But can we have them as CSV files? or even Excel? Currently they are only available as PDF which makes it a pain to do anything with the data.

Chris Taggart has converted the first couple of reports to CSV and helpfully made them available here, but since this data was probably in a database or spreadsheet before being made into a PDF, it’s a bit daft to create all that extra work for anyone who wants to analyse the data.

They could leap in a single bound from one star to three on the Berners-Lee 5 stars of open linked data.

Posted at 15:09

Bill Roberts: RDF datasets and graphs

I think we need to establish a common practice to link RDF datasets and named graphs.

I’ve been working recently on discovery of linked data: how people can find out easily what is available and how they can use it, either directly or by building it into new applications.

The dataset is clearly an important concept in this, albeit a rather vague one – essentially it’s a bunch of data that belong together somehow.

As most of you will already know, an ever increasing list of UK government datasets is catalogued at http://data.gov.uk/data, each with some supporting information on what it’s about, where it came from and where to go to access all the details.

Some of that data has been made available as Linked Data, with dereferenceable URIs and SPARQL endpoints, for example the information on schools available here. The Linked Data approach can be very powerful, but for a ‘data consumer’ it can also be difficult to know where to start. One of the things I’m currently working on is to publish some simple additional information that will hopefully make it easier to exploit the Linked Data part of data.gov.uk.

Which brings me back to describing Linked Data datasets. The de facto standard for this (and the only show in town) is voiD, which defines a vocabulary and recommends some good practices for describing a dataset.

Via the void:dataDump property, you can point to a location where you can get a copy of all the data in the dataset. And using dcterms:isPartOf, you can link the description of a resource back to the dataset that it’s part of.
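Here is a small Python sketch that writes such a description as N-Triples. It types the dataset as void:Dataset, points to a dump with void:dataDump and to an endpoint with void:sparqlEndpoint, and links documents back with dcterms:isPartOf. The vocabulary URIs are the standard voiD and Dublin Core ones; all the example.org URIs are hypothetical.

```python
from typing import List

VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(uri: str) -> str:
    """Write a URI as an N-Triples IRI reference."""
    return f"<{uri}>"


def describe_dataset(dataset: str, dump: str, endpoint: str, documents: List[str]) -> str:
    """Emit N-Triples describing a voiD dataset and linking documents back to it."""
    lines = [
        f"{iri(dataset)} {iri(RDF_TYPE)} {iri(VOID + 'Dataset')} .",
        f"{iri(dataset)} {iri(VOID + 'dataDump')} {iri(dump)} .",
        f"{iri(dataset)} {iri(VOID + 'sparqlEndpoint')} {iri(endpoint)} .",
    ]
    for doc in documents:
        lines.append(f"{iri(doc)} {iri(DCTERMS + 'isPartOf')} {iri(dataset)} .")
    return "\n".join(lines)


print(describe_dataset(
    "http://example.org/dataset/schools",
    "http://example.org/dumps/schools.nt",
    "http://example.org/sparql",
    ["http://example.org/doc/school/1", "http://example.org/doc/school/2"],
))
```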

OK, so far so good. However, one important thing that seems to be missing in this picture is how to restrict a SPARQL query to a particular dataset or shortlist of datasets.

The typical approach with statistical data has been to use SCOVO, soon to be superseded by the RDF Data Cube vocabulary from Dave Reynolds et al. (still work in progress but hopefully soon to be released). Those approaches link individual observations back to the dataset they belong to, which makes it easy to limit a query to a particular dataset. But not all data fits that kind of pattern.

However this is exactly the reason that the named graph approach was created – as a convenient way of grouping a bunch of triples together and letting us talk about them – and it’s already supported by most RDF databases/quad stores.

VoiD lets you link a dataset to a SPARQL endpoint, but the voiD guide says “Note: It is assumed that the default graph of the SPARQL endpoint is the dataset itself”. This seems unnecessarily restrictive as it implies only one dataset per endpoint, and a lot of the value of SPARQL and Linked Data in general is the ability to connect stuff across multiple datasets.

We can link named graphs to voiD datasets (by using dcterms:isPartOf for example) and if the data available through SPARQL endpoints is grouped into those named graphs, then it provides an easy mechanism for adding metadata and finding aids for the steadily increasing list of linked data datasets.
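A minimal sketch of that idea in Python, assuming the metadata takes the form (graph URI, dcterms:isPartOf, dataset URI) as suggested above. The quad store is just an in-memory list, all URIs are hypothetical, and `sparql_for` shows how the same restriction could be written as a SPARQL query using FROM NAMED and GRAPH.

```python
from typing import Iterable, List, Set, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, graph)
IS_PART_OF = "http://purl.org/dc/terms/isPartOf"


def graphs_of(metadata: Iterable[Tuple[str, str, str]], dataset: str) -> Set[str]:
    """Named graphs that the metadata says are dcterms:isPartOf the dataset."""
    return {g for g, p, d in metadata if p == IS_PART_OF and d == dataset}


def match_in_dataset(quads: Iterable[Quad], graphs: Set[str], predicate: str) -> List[Quad]:
    """Return quads with the given predicate, restricted to the chosen graphs."""
    return [q for q in quads if q[3] in graphs and q[1] == predicate]


def sparql_for(graphs: Set[str], pattern: str) -> str:
    """The same restriction expressed as a SPARQL query over named graphs."""
    from_named = "\n".join(f"FROM NAMED <{g}>" for g in sorted(graphs))
    return f"SELECT *\n{from_named}\nWHERE {{ GRAPH ?g {{ {pattern} }} }}"


metadata = [
    ("http://example.org/graph/schools-2010", IS_PART_OF, "http://example.org/dataset/schools"),
    ("http://example.org/graph/transport", IS_PART_OF, "http://example.org/dataset/transport"),
]
quads = [
    ("http://example.org/school/1", "http://example.org/name", "Hill School",
     "http://example.org/graph/schools-2010"),
    ("http://example.org/station/1", "http://example.org/name", "Hill Station",
     "http://example.org/graph/transport"),
]

schools = graphs_of(metadata, "http://example.org/dataset/schools")
print(match_in_dataset(quads, schools, "http://example.org/name"))
print(sparql_for(schools, "?s <http://example.org/name> ?name"))
```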

Posted at 14:50

Talis: Linked Open Data and Pavlova

If Sir Tim Berners-Lee can equate Linked Data with a packet of crisps/potato chips, I thought I would take a stab at another food metaphor for this post.

Linked Open Data (LOD) is a concept that many believe they understand. Take yourself to almost any conference that has a connection with data, or the web, or the Internet at the moment, and it will not be long before you see a slide of the Linked Open Data cloud diagram, or of Sir Tim imploring us to give him our raw data now, or, if you are very lucky, a shot of him doing his imploring whilst stood in front of a shot of the LOD cloud. Simple really: just publish your data as Linked Open Data and all will be wonderful as we move towards the sunlit Semantic Web uplands. Unfortunately life is never that simple – LOD is not a single identifiable thing. As Paul Walk eloquently puts it:

  1. data can be open, while not being linked
  2. data can be linked, while not being open
  3. data which is both open and linked is increasingly viable
  4. the Semantic Web can only function with data which is both open and linked

As with any recipe for success, the majority concentrate on the final result, praising or criticising it as a whole without identifying the benefits, or otherwise, of the individual ingredients. Take a strawberry pavlova, for instance. If you are into that kind of thing, it is a delightful culmination of the culinary arts designed to send your taste buds into raptures. Unless, that is, you don’t like cream, or you don’t like strawberries, or can’t abide meringue, in which case the whole thing seems a little pointless.

What has this got to do with Linked Open Data (LOD), I hear you ask. Well, I am increasingly seeing LOD being presented as the goal for those wishing to publish their data online. My position is that the eventual goal, from which a Semantic Web will spring, is a global web of linked and open data. However, there are many steps from where we are now to achieving that goal. In audiences that I present to, and/or sit amongst, I see people who for whatever reason do not ‘get’ one or more of the components of LOD – they cannot envisage opening up any of their data, or think that using a web address for an identifier is overly complex, or have a religious aversion to RDF. As a result they dismiss the whole recipe as not for them or, worse still, as something impractical that will become nothing more than the plaything of a few passionate enthusiasts.

When someone who is still struggling with the concept of opening up their organisation’s data, or with why RDF might be a more useful format than csv, is shown the ubiquitous Linked Open Data cloud diagram with encouragement to join in, it is hardly surprising they remain a little unconvinced. This isn’t a criticism of presenters either. In only 20 minutes on a stage, it is difficult to go into underlying detail.

Let me try, in a few paragraphs, to break the LOD pavlova into its ingredients:

  • Data – In the context of this post, by data I mean machine-readable information, produced in a format that can be consumed and processed by other machines. Inevitably, this means file formats such as csv, XML, RDF, etc., but not something like pdf, html, or Word, which, although transferable, are designed for human consumption rather than machine analysis.

    For some, just this step from their current human targeted format, to a machine readable one, is a significant one.

  • Open Data  – Data (see above) which is accessible for all to download, view, and consume in a way that is not encumbered by licensing that restricts its use.  For example, the licensing used by data.gov.uk data.  By definition data which is restricted for certain uses is not fully open.  

    In our internet based world, openness can also be defined in terms of technical accessibility.  If it is only available after a login process, or it is only available to users behind a firewall, it couldn’t be considered as open. 

  • Linked Data – Data (see above) which contains URIs as identifiers for concepts described in the data and URIs to identify the relationships between those concepts.  The four Linked Data Principles, as published as a design note by Tim Berners-Lee, provide a bit more detail on this.

    I am in danger of stirring the embers of a religious fire fight here, between those that believe that Linked Data must be described in RDF and contain URIs as identifiers, and those that maintain that you can have data linked across the web without those constraints.  All I am going to say on that at this time, is that the Linked Open Data cloud of data sets has been successful, based on the first of those two views. (if you want to follow that particular debate in more detail, Paul Miller’s post and associated comments would be a good starting point)

So, how can data be open, but not linked? – by publishing it in a non-Linked Data form such as a text file, an html page or a pdf. Where would you find this? – all over the web. As Sir Tim encourages with “give us your raw data now”, and as I detailed in my previous “data publishing three-step” post, this is often the first element of getting your data out there for others to consume.

How can data be Linked but not open? – by publishing it in accordance with the principles, in RDF, with URIs, but restricting access, either by imposing restrictive licensing conditions or by restricting access to the data itself. Where would you find this? – again all over the web, but often hiding behind restrictive licensing terms such as “non-commercial use only”. It can also be found inside organisational firewalls. For example, commercial organisations can realise the benefits of using Linked Data techniques with their internal private data, potentially linking it to publicly visible concepts across the web to add even more value for their employees.

Data that is Linked and Open, like that strawberry pavlova, has the power to deliver value beyond the sum of its individual ingredients. Providing data in a form that is linked to other data, and easy for others to link to, without restrictions on who links to it or how, lays the foundation for a web of linked data built on the same principles that fostered the growth of the web of documents that has so changed our world over the last decade and a half.

The ingredients that formed that World Wide Web of documents – html, http, open publishing of web sites without restrictions on others’ ability to consume and/or link to them – were individually important developments. However, when those elements were blended together their effects were multiplied many fold and resulted in the web we experience today.

So [as I stretch my culinary metaphor to its limits] if you are hoping to take people with you in building a Linked Open Data future, you not only have to show them a picture of the final dish, you need to describe the individual ingredients and their relevance to the eventual result.

Pictures from Flickr by PhOtOnQuAnTiQuE and avixyz

Posted at 08:18

August 22

Dave Beckett: Rasqal RDF Query Library 0.9.20

I just released a new version of my Rasqal RDF Query Library for two main new features:

  1. Support more of the new W3C SPARQL working drafts of 1 June 2010 for SPARQL 1.1 Query and SPARQL 1.1 Update.
  2. Support building with Raptor V2 API as well as Raptor V1 API.

The main change is the start of support for the new SPARQL 1.1 working drafts in Rasqal’s APIs and query engine. This release adds support for the syntax of all the changes for Query and Update. The new draft syntax is available via the ‘laqrs’ query language name, until the SPARQL 1.1 syntax is finalized. The ‘sparql’ query language provides SPARQL 1.0 support.

For Query 1.1, the additions are primarily syntax and API support for the new syntax. There is expression execution for the new functions IF(), URI(), STRLANG(), STRDT(), BNODE(), IN() and NOT IN(), which are now usable as part of the normal expression grammar. The existing aggregate function support was extended to add the new SAMPLE() and GROUP_CONCAT(), but these remain syntax-only. Finally, the new GROUP BY with HAVING conditions were added to the syntax, with consequent API updates but no query engine execution of them.

For Update 1.1, syntax for the full set of update operations was added, and it creates API structures. Note, however, that there seem to be some ambiguities in the draft syntax, especially around multiple optional tokens in a row near WITH, which are particularly hard to implement in flex and bison (aka “lex and yacc”).

The main change not related to SPARQL 1.1 is to allow building Rasqal with the Raptor V2 API rather than V1. Raptor V2 is in beta, so this is not a final API and thus not the default build; it has to be enabled with --enable-raptor2 with configure. When Raptor V2 is stable (2.0.0), Rasqal will require it.

The changes to Rasqal in this release, in summary, are:

  • Updated to handle more of the new syntax defined by the SPARQL 1.1 Query and SPARQL 1.1 Update W3C working drafts of 1 June 2010
  • Added execution support for new SPARQL 1.1 query built-in expressions IF(), URI(), STRLANG(), STRDT(), BNODE(), IN() and NOT IN().
  • Added an ‘html’ query result table format from patch by Nicholas J Humfrey
  • Added API support for group by HAVING expressions.
  • Added XSD Date comparison support.
  • Support building with Raptor V2 API if configured with --with-raptor2.
  • Many other bug fixes and improvements were made.
  • Fixed Issues: #0000352, #0000353, #0000354, #0000360, #0000374, #0000377 and #0000378

See the Rasqal 0.9.20 Release Notes for the full details of the changes.

Get it at http://download.librdf.org/source/rasqal-0.9.20.tar.gz.

PS The source code control has also moved to Git and is hosted at GitHub.

Posted at 21:33

August 20

Talis: Best Buy: Semantic Web and Retail

In this Nodalities Podcast, I speak with Jay Myers from Best Buy about how he and his team are working within the retail giant to better harness their data. Jay tells us about his use of blogs and RDFa to better manage “open-box” products returned to Best Buy’s many stores in an effort to surface deals to the public and make savings on otherwise costly problems.

Jay also explains how Best Buy are publishing the machine-readable data out on the public web and touches on the next steps Best Buy will be taking. He also calls on the Semantic Web community to take an active role in promoting work like this by voting for his panel at South by Southwest, which you can see here.

Jay Myers is a Lead Web Development Engineer for Best Buy, and is an active supporter of the GoodRelations vocabulary for ecommerce, utilizing it for modeling consumer products, stores, and services in both RDF/XML and RDFa. For more information, you can read his blog or catch him on Twitter.

Posted at 13:53

Georgi Kobilarov: Data Journalism Meetup Berlin, September 1st 2010

After two successful Web of Data meetups in London with 200 guests each, it was time to bring the Web of Data meetup to Berlin.

Data Journalism and the new and exciting possibilities that the Web of Data opens up for creators and consumers of news and media online will be the topic of this first meetup in Berlin on September 1st 2010.

We have a brilliant lineup of speakers from media organisations like the BBC, The Guardian, the Deutsche Presse Agentur, the Bertelsmann Foundation, and ZEIT Online coming to Berlin and talking about data journalism and the latest developments and projects in this field. Join the Berlin meetup group and sign up for the event now. Thanks to my friends at Fjord and at the Open Knowledge Foundation for their help and support!

Posted at 13:48

Talis: A conversation about The Interactive Knowledge Stack

My guests on this Talking with Talis podcast are Wernher Behrendt and John Pereira of Salzburg Research. They are part of the team behind IKS (The Interactive Knowledge Stack), an Integrating Project part-funded by the European Commission.

The four-year project started in January 2009 to provide an open source technology platform for semantically enhanced content management systems. The idea is that, once developed, the stack can be bolted on to many different CMS products to add semantic and Semantic Web capabilities. Even though the project is open source, and its obvious use is with open source CMS tools, it could be of equal value to commercial products.

Their target is to engage with 40 small to medium organisations for whom developing such capabilities would not be possible with their limited resources. They are already well on the way, with many joining in via the project website and participating in the first early adopters workshop in Salzburg in June.


Posted at 07:35

August 19

W3C QA Blog Semantic Web News: New opportunities for linked data nose-following

For those of you interested in deploying RDF on the Web, I'd like to draw your attention to three new proposed standards from the IETF: "Web Linking", "Defining Well-Known URIs", and "Web Host Metadata". They create new follow-your-nose tricks that semantic web clients could use to obtain RDF connected to a URI. That RDF presumably defines what the URI 'means' and/or describes the thing the URI is supposed to refer to.

Most semantic web application developers are probably familiar with three ways to nose-follow from a URI:

1. For # URIs: for X#F, the document X tells you about
2. When the response to GET X is a 303: the redirect target tells you about
3. When the response to GET X is a 200: the content may tell you about

In case 3, X refers to what I'll call a "web page" (a more technical term is used in the TAG's httpRange-14 resolution). One of the new RFCs extends case 3 to situations where the RDF can't be embedded in the content. This happens either because the content type doesn't provide a place to put it (e.g. text/plain), or because the content can't be modified to include it for administrative reasons (e.g. a web archive that has to deliver the original bytes faithfully). The others cover this case as well, and also improve performance in case 2.

Web pages as RDF subjects

Before getting into the new nose-following protocols, I'll amplify case 3 above by listing a few applications of RDF in which a web page occurs as a subject. I'll rather imprecisely call such RDF "metadata".

Bibliographic metadata: tools such as Zotero might be interested in obtaining Dublin Core, BIBO, or other citation data for the web page.

Stability metadata: for annotation and archiving purposes, it may be useful to know whether the page's content is committed to be stable over time (e.g. this has changing content versus this has unchanging content). See TimBL's Generic Resources note.
Historical and archival metadata: it is useful to have links to other versions of a document, including future versions.

All sorts of other statements can be made about a web page, such as a type (wiki page, blog post, etc.), SKOS concepts, links to comments and reviews, the duration of a recording, how to edit it, who controls it administratively, and so on. Anything you might want to say about a web page can be said in RDF.

Embedded metadata is easy to deploy and to access, and should be used when possible. Embedded metadata has the advantage of traveling around with the content. However, a protocol that lets the server responsible for the URI provide metadata over a separate "channel" has two advantages over embedded metadata. First, the metadata doesn't have to be put into the content. Second, it doesn't have to be parsed out of the content. And it's not either/or: there is no reason not to provide metadata through both channels when possible.

Link: header

The 'Web Linking' proposed standard defines the HTTP Link: header, which provides a way to communicate links rooted at the requested resource. These links can either encode interesting information directly in the HTTP response, or provide a link to a document that packages metadata relevant to the resource. In the former case, one might have:

Link: ; rel="http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

meaning that the request URI refers to something of type foaf:Document. In the latter case, one might have:

Link: ; rel="describedby"; type=application/rdf+xml

meaning that metadata can be found in , and hinting that the latter resource might have a 'representation' with media type application/rdf+xml.

Host-wide nose-following rules

The motivation for the "well-known URIs" RFC is to collect all "well-known URIs" (analogous to "robots.txt") in a single place, a root-level ".well-known" directory, and to create a registry of them to avoid collisions. The most pressing need comes from protocols such as webfinger and OpenID; see Eran Hammer-Lahav's blog post for the whole story.

For linked data, .well-known provides a way to supply metadata for web pages. It can also make it more efficient to obtain RDF associated with other "slash URIs", which is currently done using 303 responses. Ever since the TAG's httpRange-14 decision in 2005, there have been concerns that it takes two round trips to collect RDF associated with a slash URI. Some might ask why those complaining aren't using hash URIs. Either way, the "well-known URIs" mechanism offers a way to reduce the number of round trips in many cases, eliminating many GET/303 exchanges.

The trick is to obtain, for each host, a generic rule that transforms the URI at that host that you want RDF for into the URI of a document that carries that RDF. This generic rule is stored in a file in the .well-known space, at a path that is the same on every host. That is, to find RDF for http://example.com/foo, follow these steps:

1. Obtain the host name, "example.com".
2. Form the URI with that host name and the path "/.well-known/host-meta", i.e. "http://example.com/.well-known/host-meta" (see here).
3. If not already cached, fetch the document at that URI.
4. In that document, find a rule generically transforming original-URI -> about-URI.
5. Apply the rule to "http://example.com/foo", obtaining (say) "http://example.com/about/foo".
6. Find RDF about "http://example.com/foo" in the document "http://example.com/about/foo".

The form of the about-URI is chosen by the particular host, e.g. "http://example.com/foo,about" or "http://about.example.com/foo" or whatever works best.

Why does this take fewer round trips than using 303? Because you can fetch and cache the generic rule once per site. The first use of the rule still costs an extra round trip, but subsequent URIs for a given site can be nose-followed without any extra web accesses. A worked example can be found here.
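To make the mechanics concrete, here is a rough Python sketch of a client that combines the nose-following routes described above. It covers hash URIs, 303 redirects, a Link: header with rel="describedby", and a cached host-wide rule from /.well-known/host-meta. It illustrates the idea under stated assumptions and does not implement any of the specifications. The class and function names and all example.com URIs are hypothetical. The Link: parser handles only simple header values. The host-meta document is reduced to a single template string with a made-up "{path}" placeholder rather than its real format. No network access happens here: the caller supplies the function that fetches the rule.

```python
import re
from urllib.parse import urldefrag, urljoin, urlsplit


def parse_link_header(value):
    """Parse an HTTP Link: header value into (target, params) pairs.

    Simplified: assumes parameter values contain no '<' or ';'.
    """
    links = []
    for match in re.finditer(r"<([^>]*)>([^<]*)", value):
        target, rest = match.group(1), match.group(2)
        params = {}
        for segment in rest.split(";"):
            key, sep, val = segment.strip().rstrip(",").partition("=")
            if sep:
                params[key.strip().lower()] = val.strip().strip('"')
        links.append((target, params))
    return links


class NoseFollower:
    """Find the URI of a document that may carry RDF about a given URI.

    `fetch_rule` is a caller-supplied (hypothetical) callable: it takes a
    host-meta URI and returns that host's rule, reduced here to a template
    string with a made-up "{path}" placeholder.
    """

    def __init__(self, fetch_rule):
        self.fetch_rule = fetch_rule
        self.rules = {}  # host -> cached rule template

    def from_response(self, uri, status=None, headers=None):
        """Classic routes: hash URIs, 303 redirects, 200 plus Link: header."""
        doc, fragment = urldefrag(uri)
        if fragment:
            return doc  # for X#F, look in the document X
        headers = {k.lower(): v for k, v in (headers or {}).items()}
        if status == 303:
            location = headers.get("location")
            return urljoin(uri, location) if location else None
        if status == 200:
            for target, params in parse_link_header(headers.get("link", "")):
                if "describedby" in params.get("rel", "").split():
                    return urljoin(uri, target)  # metadata on a separate channel
            return uri  # fall back to metadata embedded in the content
        return None

    def from_host_meta(self, uri):
        """Host-wide rule: fetch host-meta once per host, then reuse it."""
        parts = urlsplit(uri)
        if parts.netloc not in self.rules:
            host_meta = f"{parts.scheme}://{parts.netloc}/.well-known/host-meta"
            self.rules[parts.netloc] = self.fetch_rule(host_meta)
        return self.rules[parts.netloc].replace("{path}", parts.path.lstrip("/"))


if __name__ == "__main__":
    fetched = []

    def fake_fetch_rule(host_meta_uri):
        fetched.append(host_meta_uri)
        return "http://example.com/about/{path}"

    nf = NoseFollower(fake_fetch_rule)
    print(nf.from_response("http://example.com/doc#me"))
    print(nf.from_response("http://example.com/alice", 303, {"Location": "/alice.rdf"}))
    print(nf.from_response("http://example.com/page", 200, {
        "Link": '<http://example.com/page,about>; rel="describedby"; type=application/rdf+xml'}))
    print(nf.from_host_meta("http://example.com/foo"))
    print(nf.from_host_meta("http://example.com/bar"))
    print(fetched)  # the host-meta URI for example.com appears only once
```

Resolving two slash URIs on example.com through the cached rule costs three requests in total: one host-meta fetch plus one fetch of each about-document. The 303 route costs four, a GET/303 exchange plus a GET of the redirect target for each URI. Each further URI on that host costs one request instead of two.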
Next steps

As with any new protocol, figuring out exactly how to apply the new proposed standards will require coordination and consensus-building. For example, the "describedby" link relation and the "host-meta" well-known URI need to be confirmed for linked data. Agreement also needs to be reached on whether using multiple Link: headers is in good taste or poor taste. (Link: and .well-known put interesting content in a peculiarly obscure place, and it might be a good idea to limit their use.)

Consideration should be given to Larry Masinter's suggestion to use multiple relations reflecting the different attitudes the server might have toward the various metadata sources. For example, the server may choose to announce that it wants the Link: metadata to override any embedded metadata, or vice versa. Agreement should also be reached on the use of Link: and host-meta with redirects (302 and so on). Personally, I think this would be a great thing, as you could then use a value-added forwarding service to provide metadata that the target host doesn't or can't provide.

This is not a particularly heavy coordination burden; the design odds and ends and the implementations are all simple. The impetus might come from inside W3C (e.g. via SWIG) or from the bottom up. All we really need to get this going is a bit of community discussion, a server, and a cooperating client. If the protocols actually fill a need, they will take off.

For past TAG work on this topic, please see TAG issue 62 and the "Uniform Access to Metadata" memo.

Posted at 19:06

Copyright of the postings is owned by the original blog authors. Contact us.